In February 2024, the emergence of Sora kept many large-model companies from enjoying the Spring Festival.
"The Spring Festival Gala was still being replayed, and we were already urgently organizing group discussions," an employee of a leading AI company told "City Circle". Seeing Sora's silky-smooth results, even bloggers who sell courses rushed to launch online tutorials and cash in.
Who could "copy" a domestic version of Sora the fastest? Most people were watching Alibaba, Baidu, and the "Big Five". But no one expected that the winner would be Kuaishou, a company whose technical profile had been relatively "Buddha-like" (low-key).
On June 7, Kuaishou suddenly launched the text-to-video model "Keling" (Kling), which supports video generation of up to 2 minutes. In addition, compared with Sora, which is still in the "futures" stage, Keling was opened for testing as soon as it was announced, and the generated results are remarkable.
"Keling is the most discussed topic in the industry recently," a person in the venture capital industry told "City Circle". According to official data, more than 500,000 people applied for access to Keling within one month of its launch. It has been opened to more than 300,000 users and has generated more than 7 million short videos.
Keling's unexpected popularity inevitably leaves Byte a little embarrassed. In May of this year, Byte also opened testing of its text-to-video model "Jimeng", but its results show no clear advantage in the currently popular text-to-video track.
Suddenly left behind by an opponent it once "ignored", Byte needs to catch up. According to TMTPost, ByteDance has recently made large AI models the group's "P0", or highest-priority, direction. Teams such as Douyin and Jianying are also working hard on AI video model applications, which are expected to be announced in the near future.
1. Building Keling: fast, rough and fierce
Many developers told "City Circle" that the launch and outstanding performance of Keling came as a surprise to the industry.
Recently, "City Circle" ran a test built around a "black cat", entering the same prompt into Keling, Jimeng, and "Qingying", which had just been launched by Zhipu AI: "On a rainy day, the city street is deserted, and a cute black cat runs past. It has green eyes, a yellow collar with a bell around its neck, and long black fur all over its body. The video is shot from the perspective of the camera, and the water on the ground reflects the black cat."
Of the three videos generated, the Keling version failed to show the cat running fast, but the video generally obeyed real-world rules.
In contrast, there is no water on the ground in the Jimeng version, and the black cat does not move forward. The Qingying version does show standing water, but the black cat walks with a strange gait, and its tail drops frames.
▲ (The videos were generated by Keling, Jimeng, and Qingying, in that order)
According to "Silicon Star People", Kuaishou built Keling in 3 months. The team is very small, with only a little over 20 people, and is led by Wan Pengfei, head of the Kuaishou Visual Generation and Interaction Center. His research directions include image/video signal processing, computational photography and computer vision, loss function reduction, and visual generation.
Keling grew out of an inconspicuous project, "Puji", which Kuaishou restarted in October 2023: a tool that uses AI to turn static images into 2-second GIF stickers. In early March this year, Kuaishou held a small internal meeting. Wan Pengfei's idea was endorsed by Gai Kun (Yu Yue), Kuaishou's senior vice president, and Puji was quickly designated a pre-research product.
According to "Silicon Star People", "When building Keling, there was a consensus at the execution level: be fast, rough and fierce."
Less than a month after the Keling project started, it won the backing of Kuaishou founder Cheng Yixiao, who treated it as a company-level strategic project. Gai Kun also often said: "All of the company's cards are yours to play, and the company fully supports you."
Ke Ran, an entrepreneur in the digital human track, told "City Circle": "Keling's success is largely due to the video data that Kuaishou has accumulated. Across the country, few companies can compete with it in this regard. Only Douyin can."
While Keling is flourishing, Byte seems a bit lonely.
Jimeng was officially announced on May 9, and on June 17 it appeared as the chief AI technical supporter of the AIGC short series "Sanxingdui: Future Apocalypse". But whether in terms of its performance with consumers or compared with Kuaishou's AIGC short drama "Mountains and Seas Strange Mirror", launched on July 13, Jimeng has made little noise.
On July 17, word spread in the market that Byte would announce progress on its Sora-like text-to-video technology. The outside world interpreted this as a sign that Byte was going to catch up and compete head-on with Keling.
However, Byte later told "City Circle" that the news was not accurate. On July 17, "City Circle" observed that the event was more like a technology sharing session. It was mainly hosted by Feng Jiashi, head of the foundational vision research team for the Doubao large model, and was presented entirely in English by Byte research scientists, academics, and others.
It seems that Byte's "big move" may still have to wait for some time.
2. Byte has not recovered yet
So why did Byte miss the feast in the text-to-video track that has been in full swing recently, and what has Byte been busy with?
To some extent, this may be because Kuaishou, by betting on Keling, could "overcome ten skills with one burst of force", whereas Byte's large-model layout is more complicated. In the first half of this year, Byte's more important opponents were Tencent and Alibaba.
On large models, Byte's pace has hardly been cautious. After all, it was Byte that took the lead in launching the industry's large-model price war more than 2 months ago.
On May 15, at ByteDance's "Force" conference, Byte launched an API service based on its self-developed Doubao model. At the same event, Tan Dai, president of Volcano Engine, revealed Doubao's latest price: 0.0008 yuan per thousand tokens, which he described as a "floor price" 99.3% below the industry's.
At that point, Byte's "attack" had taken the lead. "City Circle" learned from several sources that leading players were unprepared for it; although all parties felt helpless, they could only follow passively.
In the following days, Alibaba Cloud, Baidu's Wenxin model, and Tencent Cloud successively announced significant price cuts for large-model inference input tokens and API calls. As a result, consumer-facing access to the leading large models is now almost free, and the industry has begun to move toward the next level: the ecosystem.
According to the founder of a legal AI application company, almost as soon as the API service opened, Volcano Engine's sales staff began actively contacting customers and promoting products. This also supports the speculation circulating in the market that Byte has made large models its highest-level strategy.
▲ (Tan Dai at the 2024 "Force" conference. Image source/Volcano Engine)
Recently, Byte's flagship product Doubao has grown noticeably.
According to QuestMobile data, as of June 2024, among domestic AIGC apps, Doubao, Tiangong, Kimi Smart Assistant, and Cat Box have achieved eye-catching growth, with Doubao ranking first in traffic.
▲ (Source/QuestMobile)
Compared with Kuaishou, what Byte cares about more now may be full-stack ecosystem competition, from foundation models to the AI application layer. In addition, Volcano Engine only officially launched its cloud business in 2021, making it the "youngest" among the giant cloud vendors. For more than three years, Volcano Engine has been regarded as a challenger in the cloud market. How Byte coordinates its foundation models, application layer, and cloud business is a comprehensive proposition.
Recently, according to a "Photon Planet" report, a large number of users of Byte's "Kouzi" platform are looking for ways to connect the agents and bots they have created to WeChat official accounts or mini programs, and discussions are very active.
In December last year, Byte launched the AI application development platform "Coze" overseas. In February this year, the domestic version, "Kouzi", went online. A large number of Douyin merchants also hope to make a quick fortune from it.
Tencent, by contrast, was late: it only released its AI agent creation and distribution platform, "Tencent Yuanqi", in May this year. By then, visits to Kouzi had already reached 2.33 million. As of now, Tencent Yuanqi has not yet opened up the WeChat ecosystem of mini programs, official accounts, and customer service and subscription accounts.
After all, AI development is still in its early stages. Byte, like Tencent, still needs to spend a lot of time educating users. Competing for distribution power in the AI era and gaining a head start may be the bigger homework Byte has to do in taking on Tencent.
3. Striking later, there is still time
At the industry level, in today's Internet, Byte has no shortage of content traffic, e-commerce traffic, or financial ammunition. Even if it is "one step behind" in text-to-video in the short term, in the long term it still has the potential to strike later.
Using an aggressive market strategy to make up for a late start, and letting brute force work miracles, is also Byte's specialty.
Recently, with Alibaba in its sights, Byte has also been working on integrating large models. At the DingTalk Ecosystem Conference on June 26, DingTalk President Ye Jun announced that, in addition to Alibaba's own Tongyi, large models from six third-party companies would also be brought into DingTalk: MiniMax, Moonshot AI, Zhipu AI, OrionStar, 01.AI, and Baichuan AI, covering almost all of China's well-known large-model startups. The ambition to "build the most open AI ecosystem in China" is self-evident.
Similar to DingTalk's approach, Byte's Kouzi platform supports not only its own Doubao but also major external models such as Tongyi Qianwen, Moonshot AI, and MiniMax. On June 14, Kouzi also launched a "Model Square" feature, which lets users pick two anonymous models and score them based on the content they generate.
In addition, recent reports indicate that Byte's exploration of "AI + hardware" is accelerating, and that it has not hesitated to bring in talent through acquisitions.
According to "Tech Planet", Pico, a subsidiary of Byte, has been developing multiple wearable devices since the second half of last year, including headphones and speakers, and these devices will also be equipped with AI. The Doubao team has also explored combining large-model software with hardware, and this combination has gradually been applied to devices such as learning machines, robot dogs, and robots.
According to 36Kr, the person in charge of Byte's AI hardware "D line" is Li Haoqian, founder of Oladance, the OWS (open wearable stereo) headphone brand that Byte acquired in March this year. The person in charge of another AI hardware line, the "O line", is likewise the founder of a company that Byte has acquired. He reports to Hong Dingkun, Vice President of ByteDance Technology.
In text-to-video, despite the track's recent popularity, the pursuers, Byte included, still have time.
Recently, a developer told "City Circle": "I now use Keling for picture composition, to reduce the workload in my workflow. I don't yet fully use it for creation, so I am not dependent on it."
On the other hand, in the eyes of a developer and short-video AIGC blogger, Keling still has a lot of room for optimization: "Relying on Keling's text-to-video cannot guarantee the consistency of a virtual human IP. I usually use Keling's image-to-video function, which amounts to giving Keling a picture, letting it generate dynamic videos from different angles on that basis, and then splicing them together to simulate camera movement. In fact, the proportion of human work is greater."
A member of the R&D team at a domestic AI dating-simulation product said: "In the current large-model application market, everyone is crossing the river by feeling the stones. How to commercialize is a question that is too far away and too vague. But what is certain is that the more people use it and the more they play with it, the better the product can be optimized and iterated."
(Ke Ran is a pseudonym)
author | Dong Wenshu
editor | Li Yuan
operation | Liu Shan