Daily Economic News reporter: Ke Yang    Editor: Zhang Haini
After the round-table discussion ended, a wall of people quickly formed around Dark Side of the Moon (Moonshot AI) CEO Yang Zhilin, filling the empty space in the front row of the venue. Attendees held up their cell phones, hoping an outstretched arm might manage to scan Yang Zhilin's WeChat QR code. Yang Zhilin was encircled by on-site participants at least three times before staff helped him leave the venue.
From June 14 to 15, 2024, the high-profile AI event, the 2024 Beijing Zhiyuan Conference, was held at the Zhongguancun Exhibition Center. Daily Economic News reporters noticed on site that this conference, known as the "Spring Festival Gala of the AI world," has taken on an increasingly heated atmosphere in recent years, driven by the wave of large models, and star domestic large-model companies have become the focus of attendees' attention.
Unlike previous conferences, which were attended mainly by foreign technologists and practitioners and centered on technological exploration, this year domestic large-model companies such as Baidu, Dark Side of the Moon, Zhipu AI, Zero One Thing, and Wall-Facing Intelligence became the protagonists of the forum. At the same time, as large models gradually move from technical competition to practical application, some new changes are taking place.
At this year's "Spring Festival Gala of the AI world," domestic large-model companies such as Baidu, Dark Side of the Moon, Zhipu AI, Zero One Thing, and Wall-Facing Intelligence became the protagonists of the forum. The picture shows the scene of the 2024 Beijing Zhiyuan Conference. Photo provided by the organizer
New protagonists: Domestic large models take center stage
At this year's "Spring Festival Gala of the AI world," domestic large-model companies became the protagonists.
"Entering 2023, large models gradually moved from the research results of scientific institutions into industry. We have also seen a hundred flowers bloom, with more and more large models released over the past year," Wang Zhongyuan, dean of the Zhiyuan Research Institute, said in his speech.
Wang Zhongyuan believes that, taking 2023 as the dividing line, artificial intelligence can roughly be divided into two stages. Before 2023 was the era of weak artificial intelligence: models were built for specific scenarios and specific tasks, requiring specific data to be collected and specific models to be trained. AlphaGo, for example, which defeated the human world Go champion, performed extremely well at Go but could not be used directly to solve medical problems; although its methods offer lessons, data must be collected and models trained anew for each scenario and task. Entering 2023, with the development of large models, artificial intelligence gradually entered the era of general artificial intelligence, whose biggest features are very large scale, emergent capabilities, and cross-domain applicability.
The 2023 and 2024 Beijing Zhiyuan Conferences are like two contrasting scenes, especially regarding the development and application of large-model technology. The changes in guest lineups and topics between the two conferences have become a footnote to the rapid development of the large-model era.
At the 2024 Zhiyuan Conference, the guest lineup changed significantly. Most eye-catching were the CEOs and CTOs of star domestic large-model companies such as Baidu, Dark Side of the Moon, Zero One Thing, Zhipu AI, and Wall-Facing Intelligence, along with representatives from top domestic universities and research institutions. This conference focused more on the key technical paths and application scenarios of artificial intelligence, taking a big step from theoretical discussion toward practical application.
At the 2023 conference, ChatGPT had been launched for only half a year, domestic large models were just starting to follow, and the "Hundred-Model War" had only begun. The protagonists of that conference were top scholars and technology giants from around the world, while on the Chinese side it was dominated by academia. In the main forum, the two pairs of dialogue guests were Meta chief AI scientist and New York University professor Yann LeCun with Tsinghua University computer science professor Zhu Jun, and Future of Life Institute founder Max Tegmark with Zhang Yaqin, dean of Tsinghua University's Institute for AI Industry Research (AIR); the conversations focused on the exploration of AI technology.
Today the change is obvious. The intensifying "Hundred-Model War" reflects the rapid rise of the domestic large-model market and a significant improvement in independent innovation capability.
As large models move from scientific research to industry, people have greater imagination about AGI (artificial general intelligence). Wang Zhongyuan also mentioned that when a multimodal large model can understand, perceive, and make decisions about the world, it becomes possible for it to enter the physical world. If it enters the macroscopic world and is combined with hardware, that is the direction of embodied large models; if it enters the microscopic world to understand and generate the molecules of life, that is AI for science. Whether embodied models, AI for science, or multimodal models, all will promote the development of world models and ultimately push artificial intelligence technology toward AGI.
A consensus: Implementation! Implementation! Implementation!
Despite the challenges it faces, the popularization and implementation of large-model technology has accelerated significantly, indicating that artificial intelligence is moving toward a new stage of development. An important consensus: on the road to bringing the ideal of AGI into the real world, implementation is a must-answer question.
"Zero One Thing is determined to do To C (consumer-facing business) and not to do money-losing To B (enterprise-facing business). If we find To B that can make money, we will do it; if it doesn't make money, we won't," said Kai-fu Lee.
Regarding the application of large models, Kai-fu Lee believes that in the short term there are more To C opportunities in China, while abroad there are opportunities on both sides. On the To C side, large models, like the new technologies and platforms of the Internet or PC eras, will bring new applications, which is a huge opportunity. He judged that in the AI era the first stage to break through should be productivity tools; the second may be entertainment, music, and games; the third may be search; the next may be e-commerce; and after that perhaps social networking, short video, and O2O (online-to-offline). This, he said, is an unchanging law.
Zhang Yaqin believes that, looking layer by layer, what really makes money now is To B, at the hardware, chip, and infrastructure layers; that has already happened. In applications, however, it is To C first and then To B. Zhang Yaqin divides current AI into information intelligence, physical intelligence (also called embodied intelligence), and biological intelligence. In the embodied-intelligence stage, enterprise-facing applications may develop even faster; by the biological-intelligence stage the situation may be just the opposite, with consumer-facing applications surpassing enterprise-facing ones. Each field may differ, but in general both enterprise- and consumer-facing applications, including open-source models, commercial closed-source models, foundation models, vertical-industry models, and edge models, will all exist.
As for B-side applications, Kai-fu Lee believes To B is where large models bring the greater value and should be monetized faster. Unfortunately, however, the To B field faces several huge challenges.
On the one hand, some large and traditional companies do not understand large-model technology and dare not attempt disruptive applications. At the same time, the greatest value large models bring enterprises this year is cost reduction rather than value creation, and frankly, cost reduction means replacing human work. Many senior executives and middle managers at large companies are unwilling to do this: if they do, their teams may be cut, their standing in the company diminished, their power reduced, and even their own jobs eliminated. So even when the CEO of a large company wants to act, there is resistance from below. For these reasons, To B, which in theory could be implemented immediately, in practice moves much more slowly.
Another serious problem in China is that many large companies do not recognize the value of software and are unwilling to pay for it. Moreover, with so many large-model companies bidding for the same projects, prices are driven lower and lower with each round of competition, until every order taken is an order at a loss. "We saw this phenomenon in the AI 1.0 era, and now unfortunately it has reappeared in the AI 2.0 era," Kai-fu Lee lamented.
Baidu CTO Wang Haifeng's view is that, throughout human history, the core technologies of each industrial revolution, whether mechanical, electrical, or information technology, share some common characteristics. First, the core technologies are highly general and can be applied widely across fields. Second, once these technologies take on the standardization, modularization, and automation characteristic of industrial mass production, they enter the stage of industrial mass production, changing people's production and lifestyles more quickly and bringing enormous value. At present, artificial intelligence based on deep learning and large-model engineering platforms has strong generality as well as good standardization, automation, and modularization. Wang Haifeng therefore believes that the combination of deep learning and large-model engineering platforms is pushing artificial intelligence into the stage of industrial mass production, thereby accelerating the arrival of general artificial intelligence.
A disagreement: Do you still believe in the scaling law?
Disagreements around the "scaling law" have begun to emerge. On whether and when the scaling law will stop holding, the leaders of star large-model companies have given different judgments.
Yang Zhilin remains a firm believer in the scaling law. "There is no essential problem with the scaling law, and I think the next 3 to 4 orders of magnitude are very certain. The more important question is how you can scale very efficiently."
Yang Zhilin pointed out that simply continuing to scale on web text, as is done now, may not be the right direction, because challenges such as reasoning ability may not be effectively solved along the way. The key therefore lies in how the scaling law is defined and what its essence is. If one merely follows the existing approach of next-token prediction and then scales it up by several orders of magnitude, the upper limit is obvious given the current data distribution.
However, the scaling law itself is not subject to this limitation. Its core is that as long as there is more computing power and data and the parameter scale is expanded, the model can continue to produce more intelligence; it does not specify the model's concrete form, such as how many modalities it has or the characteristics and sources of its data. Yang Zhilin therefore believes the scaling law is a first principle that will continue to hold, though the method of scaling may change greatly along the way.

Wang Xiaochuan, CEO of Baichuan Intelligent Technology, believes the scaling law has not yet hit a boundary and is still at work. "We have seen Elon Musk in the United States claim he will buy 300,000 B100 and B200 chips."
In Wang Xiaochuan's view, in terms of the scaling law itself, China clearly has to follow the United States, but strategically there are paradigm shifts to look for beyond the scaling law; only by stepping outside that system is there a chance to move toward AGI and compete at the technological cutting edge.

Zhang Peng, CEO of Zhipu AI, and Li Dahai, CEO of Wall-Facing Intelligence, are cautiously optimistic. Zhang Peng believes that every law mankind has recognized so far, including the scaling law, may eventually be overturned; the only question is how long it remains valid. So far, however, there is no sign that the scaling law is failing, and it will remain effective for a considerable time to come. "As everyone's understanding of the law deepens, its essence is increasingly revealed, and mastering that essence is the key to the future. Based on the current understanding of that essence, at least in our view, the scaling law will still hold and will be the direction our main force advances in the future," Zhang Peng said.
Li Dahai also said the scaling law is an empirical formula, a summary of the industry's experience in observing a complex system like a large model. As more experiments are run during training and understanding becomes clearer, the industry will gain finer-grained knowledge of it. For example, the training method itself has a significant impact on the scaling law and on intelligence, an impact that becomes especially important once model parameters are controlled to a certain scale. While ensuring that terminal chips can support a model of that scale, factors such as data quality and training methods are also crucial to achieving high-quality intelligence.
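As an aside for readers unfamiliar with the term: the "empirical formula" Li Dahai refers to is typically a power law relating training loss to scale, fitted to observed training runs. The sketch below illustrates the idea with entirely synthetic numbers (not data from any company mentioned here): a power law is linear in log-log space, so an ordinary least-squares line fit recovers its exponent and lets one extrapolate.

```python
import numpy as np

# Synthetic, illustrative data: loss assumed to fall as L(C) = a * C**(-b).
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # training compute (FLOPs)
loss = np.array([4.0, 3.2, 2.56, 2.05, 1.64])       # observed loss (made up)

# In log-log space the power law becomes a line: log L = log a - b * log C,
# so a degree-1 polynomial fit recovers the exponent b and prefactor a.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(intercept), -slope
print(f"fitted exponent b = {b:.3f}")

# Extrapolate the fitted law one more order of magnitude of compute.
predicted = a * (1e23) ** (-b)
print(f"predicted loss at 1e23 FLOPs = {predicted:.2f}")
```

The caveat in Li Dahai's remark maps directly onto this sketch: the fitted constants hold only for the training recipe and data that produced the observations, so a change in training method can shift both `a` and `b`.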
There is no doubt that the scaling law remains an important theoretical basis driving the development of large models at the current stage, but the ways it is applied and extended may face more challenges and changes. As technology advances and understanding of the law's essence deepens, the industry may need to further optimize model training methods to cope with higher-level challenges such as intelligent reasoning.
Daily Economic News