"Kechuangban Daily" reported on July 16 (Reporter Huang Xinyi) Recently, in the latest issue of "Singer" program, the slight score difference between Sun Nan and foreign singers triggered netizens to question who is bigger and who is smaller, 13.8% and 13.11% controversy. Some ne

"Science and Technology Innovation Board Daily" July 16 (Reporter Huang Xinyi) Recently, in the latest issue of " Singer " program, Sun Nan 's slight score difference with foreign singers triggered netizens' concerns about 13.8% and . 13.11% Argument about who is older and who is younger.

Some netizens actually gave the wrong answer that "13.11% is greater than 13.8%", and the reporter found that many large models, like those netizens, were tripped up by this fourth-grade elementary school knowledge point.

In tests by reporters from the Science and Technology Innovation Board Daily, large-model applications such as Kimi, Zhipu Qingyan, and Tongyi stumbled one after another, while Baidu's Wenxin Yiyan and ByteDance's Doubao upheld the dignity of large models.

After the reporter asked the question, Kimi said that 13.11 is greater than 13.8. After some guidance, including asking it to explain what the negative result of 13.11 minus 13.8 means, Kimi gave the correct answer.

The reporter asked Kimi several times which is larger, 13.11 or 13.8, and Kimi answered correctly only some of the time. Judging from its wrong answers, Kimi, like some netizens, mistook 13.8 for 13.08 and therefore concluded that 13.11 is larger.

The reporter then asked Kimi whether it knew about the controversy over which is larger, 13.11 or 13.8, sparked by the rankings on Hunan Satellite TV's "Singer". Kimi answered without difficulty and apologized for its earlier wrong answer.

Afterwards, the reporter also tested other decimal comparisons, and Kimi's accuracy was 50%.

Kimi stumbles on basic mathematical logic, so can other large models answer accurately? In the reporter's test, both Wenxin Yiyan and Doubao gave the correct answer.

Of the two, Wenxin Yiyan laid out its specific reasoning process and also addressed the recent news event.

Doubao also withstood the test.

Zhipu Qingyan made the same digit error as the netizens: because it treated 11 as larger than 8, it deduced that 13.11 is larger than 13.8. Tongyi also firmly maintained that 13.11 is greater than 13.8.

Zhipu Qingyan’s answer

Tongyi’s answer
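The digit error described above can be illustrated with a short Python sketch. This is only an illustration of the flawed rule (comparing the digits after the decimal point as whole numbers), not a description of how any of the tested models works internally; the function names `naive_is_greater` and `correct_is_greater` are hypothetical and introduced here for illustration.

```python
from decimal import Decimal


def naive_is_greater(a: str, b: str) -> bool:
    """Flawed rule (hypothetical): compare the digits after the
    decimal point as if they were whole numbers."""
    a_int, a_frac = a.split(".")
    b_int, b_frac = b.split(".")
    if int(a_int) != int(b_int):
        return int(a_int) > int(b_int)
    # The mistake: 11 > 8 as integers, so 13.11 is judged larger than 13.8.
    return int(a_frac) > int(b_frac)


def correct_is_greater(a: str, b: str) -> bool:
    """Compare the two values as exact decimal numbers."""
    return Decimal(a) > Decimal(b)


naive = naive_is_greater("13.11", "13.8")        # the flawed rule says 13.11 is larger
correct = correct_is_greater("13.11", "13.8")    # exact comparison says it is not
difference = Decimal("13.11") - Decimal("13.8")  # negative, so 13.11 is the smaller value

print(naive, correct, difference)
```

The negative difference is the same check the reporter used when guiding Kimi: if 13.11 minus 13.8 is below zero, 13.11 cannot be the larger number.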

It is worth noting that ChatGPT also gave a wrong answer. It produced the correct answer only after 13.8 was padded with a trailing zero to 13.80.
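The zero-padding step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: inputs are non-negative decimal strings that each contain a decimal point, and the helper names `pad_fraction` and `larger` are hypothetical.

```python
from typing import Tuple


def pad_fraction(a: str, b: str) -> Tuple[str, str]:
    """Pad the fractional parts with trailing zeros so both have the same length."""
    a_int, a_frac = a.split(".")
    b_int, b_frac = b.split(".")
    width = max(len(a_frac), len(b_frac))
    return (
        f"{a_int}.{a_frac.ljust(width, '0')}",
        f"{b_int}.{b_frac.ljust(width, '0')}",
    )


def larger(a: str, b: str) -> str:
    """Return whichever input is larger, comparing digit groups after padding."""
    pa, pb = pad_fraction(a, b)

    def key(s: str) -> Tuple[int, ...]:
        # After padding, the fractional parts have equal length, so comparing
        # them as integers is valid (80 vs 11 rather than 8 vs 11).
        return tuple(int(part) for part in s.split("."))

    return a if key(pa) >= key(pb) else b


padded = pad_fraction("13.8", "13.11")
winner = larger("13.8", "13.11")
print(padded, winner)
```

Once both numbers have two fractional digits, the comparison becomes 80 against 11, and the digit-by-digit intuition that misled the netizens gives the right result.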

In the industry, this tendency of large models to produce nonsense is known as large-model hallucination. A review paper published earlier by research teams from Harbin Institute of Technology and Huawei concluded that model hallucinations have three major sources: data, the training process, and inference. Large models may over-rely on certain patterns in the training data, such as positional proximity, co-occurrence statistics, and related document counts, which leads to hallucinations. In addition, large models may have insufficient recall of long-tail knowledge and may struggle with complex reasoning.

An industry insider told a reporter from the Science and Technology Innovation Board Daily that the hallucination rate of large models is still relatively high, which is one of the reasons the industry lacks truly disruptive applications. The industry is working together to solve this core problem and make large models more controllable in business processes.

(Huang Xinyi, reporter of Science and Technology Innovation Board Daily)