From June 14 to 15, the 2024 Beijing Zhiyuan Conference was held in Beijing. On site, Aditya Ramesh, head of OpenAI's Sora team, gave a technical talk; Kai-Fu Lee, CEO of Zero One Everything (01.AI) and chairman of Sinovation Ventures, held a fireside chat with Zhang Yaqin, an academician of the Chinese Academy of Engineering; and the "Four Little Dragons" of domestic large-model startups made a rare joint appearance on the same stage.
Zhiyuan Research Institute (the Beijing Academy of Artificial Intelligence, BAAI) is a new-style R&D institution established in November 2018 by the Beijing Municipal Science and Technology Commission and the Haidian District Government, under the guidance of the Ministry of Science and Technology and the Beijing Municipal Party Committee and Municipal Government. In 2023, former president Huang Tiejun took over the chairman's "baton" from Zhang Hongjiang, and Wang Zhongyuan, previously vice president of technology at Kuaishou, was appointed the new president. The annual Zhiyuan Conference is known in the industry as the "AI Spring Festival Gala."
At the conference, Kang Xiangwu, deputy director of the Strategic Planning Department of the Ministry of Science and Technology, said that artificial intelligence is currently at the starting point of a wave of mass technological change, is moving toward a new stage of multi-intelligence integration, and will become a standard feature of the fourth industrial revolution, triggering far-reaching changes in social development. At the same time, the large-scale, cross-domain application of artificial intelligence will bring multiple security risks and challenges. How to anticipate and manage these risks so that AI better benefits human society while remaining safe and controllable has become a major issue facing humanity worldwide.
Domestic large models iterated rapidly within a year
In the interview, Wang Zhongyuan discussed how domestic large-model technology has developed over the past year. He said that in 2023 the industry consensus was that domestic large models were still chasing GPT-3.5. This year, the average level of domestic large models has surpassed GPT-3.5 and is closing in on GPT-4; in some capabilities in Chinese-language contexts, domestic models even surpass GPT-4. But GPT-4 itself keeps iterating: the newly released GPT-4o, for example, shows significant improvements in overall performance, quality, and even efficiency, so domestic large models as a whole are still in a catching-up stage.
At the conference, Wang Zhongyuan disclosed the progress of Zhiyuan Research Institute across language, multimodal, embodied, and biological-computing large models. These include Tele-FLM-1T, billed as the world's first low-carbon, monolithic dense trillion-parameter language model, jointly developed and launched by Zhiyuan Research Institute and the China Telecom Artificial Intelligence Research Institute (TeleAI). To address problems such as large-model hallucination, Zhiyuan Research Institute independently developed the BGE (BAAI General Embedding) series of general semantic vector models. And to realize a unified, end-to-end next-generation multimodal model, it launched Emu3, a native multimodal world model.
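For readers unfamiliar with semantic vector models, the sketch below shows the typical way an embedding model like BGE is used to ground a large model's answers in retrieved text, which is how such models help curb hallucination. The sentence-transformers library and the BAAI/bge-small-zh-v1.5 checkpoint are illustrative assumptions on our part, not details confirmed at the conference.

```python
# Minimal sketch (assumptions: sentence-transformers library and the
# publicly released BAAI/bge-small-zh-v1.5 checkpoint on Hugging Face).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-small-zh-v1.5")

# A tiny document store; in a real pipeline these would be knowledge-base chunks.
docs = [
    "The 2024 Zhiyuan Conference was held in Beijing on June 14-15.",
    "Tele-FLM-1T is a dense trillion-parameter language model.",
    "Emu3 is a native multimodal model from Zhiyuan Research Institute.",
]
doc_emb = model.encode(docs, normalize_embeddings=True)

# Embed the query and rank documents by cosine similarity.
query = "Which model is a trillion-parameter dense LLM?"
query_emb = model.encode(query, normalize_embeddings=True)
scores = util.cos_sim(query_emb, doc_emb)[0]

best = int(scores.argmax())
print(docs[best], float(scores[best]))
```

In a retrieval-augmented pipeline, the top-ranked chunks would then be inserted into the LLM's prompt as grounding context, so the model answers from retrieved facts rather than from memory alone.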
Wang Zhongyuan said that domestic large models have reached a level that is usable but not yet truly easy to use. After GPT-4, large models can enter real scenarios and iterate quickly, but the accompanying breakthroughs are also very difficult, spanning computing resources, core algorithms, and systems engineering; interconnecting GPU chips in clusters above 10,000 cards, in particular, remains a challenge.
Among the factors driving the rapid development of large models over the past year, the scaling law was the key that conference guests returned to again and again. Kai-Fu Lee said that AI 2.0 is the greatest technological and platform revolution in history, and that this era has highlighted the importance of the large-model scaling law: humans can use more compute and data to keep increasing the intelligence of large models. This law, whose validity has been verified by many parties, is still playing out and is far from hitting its ceiling.
Yang Zhilin, CEO of Moonshot AI (Dark Side of the Moon), acknowledged that scaling is the first principle for large models and that model scale needs to keep growing, but said the biggest challenge is that some kinds of data may simply not exist in such large quantities. Zhang Peng, CEO of Zhipu AI, took a pragmatic view: the scaling law is still effective and still advancing, but whether it can carry large models to their peak is a question the industry cannot yet answer definitively. Looking from the end point of AGI, Wang Xiaochuan, CEO of Baichuan Intelligence, argued that beyond scale, achieving AGI requires paradigm changes: large models rely on data-driven learning for compression, for example, and the current scaling law alone cannot reach AGI. Li Dahai, CEO of ModelBest (Wall-Facing Intelligence), said the scaling law is an empirical formula, a summary of the industry's experience observing a complex system like a large model. As more training experiments accumulate and understanding sharpens, the picture will become more fine-grained; the training methods used in model training, for instance, have a significant impact on the scaling law and on intelligence.
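For context, the "scaling law" the panelists invoke is usually cited in the power-law form reported by Kaplan et al. (2020), in which test loss falls predictably as model size, data, or compute grows; the notation below follows that paper rather than anything presented at the conference.

```latex
% Power-law scaling of LLM test loss (Kaplan et al., 2020), with each
% resource taken in the regime where it is the binding constraint:
% N = non-embedding parameters, D = dataset tokens, C = training compute.
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}
```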
Plans after approaching GPT-4
In the interview, Wang Zhongyuan said that domestic large models have reached the stage where they can support applications, so he personally predicts a wave of large-model applications over the next two to three years. On the specifics, Wang Zhongyuan believes B-end applications are very clear and cover almost every industry. On the C-end, the industry is generally hoping for breakout hit applications. But by analogy with the mobile Internet era, when a new technology or technological revolution appears, it takes a certain cycle, plus the right timing, the right place, the right people, and the technical capability.
On landing large models in C-end products, Wang Zhongyuan believes the models need to be low-cost and easy to use while solving users' real pain points, so some patience is required before hit C-end applications emerge. "Even on the other side of the ocean, a hit C-end application has not yet appeared," Wang Zhongyuan said.
If the AGI era arrives, what might the technological evolution route look like? Wang Zhongyuan noted that in the past few years most scientific and industrial attention has gone to breakthroughs in large language models, which remain single-modality language models, yet beyond text there are also vast amounts of image, video, audio, and other multimodal data. When a multimodal large model can understand, perceive, and make decisions about the world, it becomes possible for it to enter the physical world. Entering the macro world and combining with hardware is the development direction of embodied large models; entering the micro world to understand and generate the molecules of life is AI for science.
Aditya Ramesh, head of the OpenAI Sora team, said in a dialogue with New York University assistant professor Xie Saining that the language modality is indeed very important for building a more intelligent system with reasoning capabilities, but in a sense, integrating language information and visual signals through some common interface may unlock the ability to simulate anything. As model size increases, its dependence on language decreases.
Recently there have been frequent releases in the multimodal field at home and abroad, including AI startup Luma AI's video generation model Dream Machine and short-video company Kuaishou's Kling large model. On the state of the industry, Aditya Ramesh said the team currently cares most about the safety of video generation models and their impact on society: he hopes people will not use Sora to spread misinformation, and hopes the model's behavior conforms to human expectations. It is great to see other labs and companies working on video generation models, he said, and having many people experiment with different approaches is important for spurring innovation in art and in diffusion models. "Improving controllability" and "reducing randomness" are the most important feature requests the Sora team is currently receiving from partners.
AI safety was another major topic at this Zhiyuan Conference. Yang Zhilin likewise believes AI safety is very important: it is not necessarily the most urgent matter right now, but it is something to prepare for in advance, because under the scaling law compute multiplies tenfold every N months and intelligence improves accordingly. In Yang Zhilin's view, AI safety covers both a model acting maliciously because of user intent and the practice of injecting an AI "constitution" at the base of the model to constrain its behavior.
Li Dahai believes safety work at this stage focuses on two directions: basic security and content safety. Today's large models are essentially read-only: the weights are fixed, and inference does not change them. In the future, when users deploy models to terminals such as robots and the models can dynamically update their weights, safety will become a very important issue.
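As a concrete illustration of the "read-only" point, here is a minimal PyTorch sketch, an assumption of this article rather than anything shown at the conference: a forward pass during inference leaves the weights untouched, while a training step updates them.

```python
# Minimal sketch (illustrative, not from the conference): inference does not
# change a model's weights, whereas a gradient-descent training step does.
import torch

model = torch.nn.Linear(4, 2)
before = model.weight.detach().clone()

# Inference: weights are effectively read-only.
with torch.no_grad():
    _ = model(torch.randn(1, 4))
assert torch.equal(model.weight, before)  # unchanged

# Training: an optimizer step updates the weights in place.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(1, 4)).pow(2).sum()
loss.backward()
opt.step()
assert not torch.equal(model.weight, before)  # weights have moved
```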
On the recent price war, Wang Xiaochuan said that price cuts have let more individuals and companies enter the market, and at the same time many companies have begun to wake up, no longer building large models themselves but "retreating" to become users of large models, reducing the waste of resources.
(This article comes from China Business News)