
Reporters: Wang Yubiao, Yang Xinyi | Editor: Zhang Haini

ByteDance and Kuaishou, the two short video giants, are heading into a head-on confrontation in the AI field.

On November 8, Jimeng AI, an AI content platform owned by ByteDance, announced that Seaweed, a video generation model developed by ByteDance, is officially open to platform users.

According to ByteDance, the Seaweed model opened this time is the standard version of the Doubao video generation model. It takes only 60 seconds to generate a high-quality 5-second AI video, well ahead of the 3 to 5 minutes of generation time typical in the domestic industry.

"Daily Economic News" reporters also found, in hands-on tests of the first and latest versions of Jimeng and Keling, that after iteration the video generation results of both products have improved in many respects, though to varying degrees. Keling is more accurate in spatial layout and picture detail, and more flexible and convenient in adjusting the effects of generated content; Jimeng has advantages in generation time and video style.

A large-model technician told the "Daily Economic News" reporter that it is very difficult for a video generation model to produce content in different "styles": "Besides technology, it mainly depends on the richness of the data source."

As short video enters the AI era, who between ByteDance and Kuaishou will come out on top?

Original version vs. iteration: what has changed in Jimeng and Keling over half a year?

With ByteDance's self-developed video generation model Seaweed now open for use, the most interesting pair in the domestic video generation model competition, Jimeng and Keling, are finally competing head to head.

Both carry the "AI dream-making" ambition of understanding the physical world and amplifying imagination as much as possible while staying grounded in "reality". At the same time, Jimeng and Keling each shoulder the development prospects of ByteDance and Kuaishou.

In fact, both Jimeng and Keling have completed several iterations in less than a year. Jimeng began internal testing of its video generation function at the end of March; half a year later, ByteDance released two video generation models in the Doubao model family, Seaweed and Pixeldance, and invited small-scale testing through Jimeng AI and Volcano Engine. Now, Seaweed is officially open to platform users.

Pan Helin, a member of the Information and Communication Economy Expert Committee of the Ministry of Industry and Information Technology, told the "Daily Economic News" reporter that the generation speed of Jimeng's new model has improved, giving users a better experience. "Jimeng AI is still relatively leading in the domestic generation field."

Keling became a blockbuster upon its "birth" in June and has gone through more than ten updates since release, including important ones such as the image-to-video function and the 1.5 model. To date, Keling has more than 3.6 million users and has generated a total of 37 million videos, and it will officially launch a standalone app in the near future.

"Daily Economic News" reporters selected five Sora video prompts officially released by OpenAI (a lady on the streets of Tokyo, an astronaut, a coast from a drone's perspective, a 3D animated little monster, and a young man reading in the clouds) and ran them through the first and latest versions of Jimeng and Keling respectively, vertically comparing the output of the two video generation models.

Comparing videos produced by the original and latest versions of Jimeng, we found two especially obvious areas of improvement: first, in depicting dynamic "people and things", motion capture and coherence have clearly improved; second, the differentiated presentation of picture styles has also made great progress.

Take "Lady on the Streets of Tokyo" as an example. The characters created by the first-generation Jimeng move stiffly, especially in the legs and feet, and the overall effect is blurred and distorted. In the iterated new version, character movements are natural and smooth, and the handling of foot dynamics is clearer and more consistent with real-world logic.

In terms of picture style, the updated Jimeng is more differentiated, presenting distinct styles for real-world scenes and surreal scenes.

This contrast is clear in the videos generated for "Young Man Reading in the Clouds". The first generation of Jimeng treated this surreal scene in a fully animated style, while the new version renders the character far more realistically.

Screenshots of the video of the first version of Jimeng "Young People Reading in the Cloud"

Screenshots of the new version of Jimeng "Young People Reading in the Cloud"

Screenshots of the video of Keling "Young People Reading in the Cloud"

The same is true of the "Astronaut" videos: the astronauts generated by the first version of Jimeng had a heavy "game modeling" feel, while the new version is fully realistic.

Compared with Keling's first generation, the 1.5 model after several iterations improves video generation even more markedly. One change is that spatial layout and picture detail are more refined: in the "coast from a drone's perspective" results, the picture has greater depth, the spatial layout is more complex, and details such as houses and roads are richer.

Jimeng vs. Keling: differences in understanding, capture, and imagination

After iteration, both models generate more stable results with better image quality, and their fluency and detail handling stand up better to scrutiny. But they still differ markedly in semantic understanding, in capturing and amplifying keywords, and in balancing creative imagination against creative relevance.

We horizontally compared the latest version of Jimeng with Keling's 1.5 model, again pitting them against the five Sora video prompts (a lady on the streets of Tokyo, an astronaut, a coast from a drone's perspective, a 3D animated little monster, and a young man reading in the clouds).

Differences in semantic understanding and keyword capture give Jimeng's and Keling's videos distinct presentations.

In the "Coast from a Drone Perspective" video, Jimeng relatively blurred the "island with a lighthouse" in the prompt, whereas for both Keling and Sora the island is the focus of the scene. In depicting the "coastal highway", Jimeng's rendering also defies real-world logic.

Screenshot from Jimeng's "Island from the Perspective of a Drone" video

Screenshots from the video of Keling's "Island from the Perspective of a Drone"

As for the "Astronaut" videos, Jimeng's output did not depict the "adventure" in the description, and on regeneration, an astronaut holding a cup of coffee and riding a motorcycle again ignored the "adventure" setting. Keling emphasized "adventure" through the characters' expressions and camera movements. However, both Jimeng and Keling largely ignored the "movie trailer" setting; by contrast, Sora's astronaut video feels more cinematic.

Jimeng "Astronaut" video screenshot

Keling "Astronaut" video screenshot

In the "3D animated little monster" generation, the little monster Jimeng created is almost identical to "Sulley" from the animated film Monsters, Inc. Jimeng's rendering of the monster described in the prompt is also relatively inaccurate, for instance in implementing the "short-haired" setting. In addition, on artistic style, where the prompt emphasizes "lighting and texture", Jimeng's execution is weaker than Keling's.

Screenshot of Jimeng's "Little Monster" video

Screenshot of Keling's "Little Monster" video

In the "Lady on the Streets of Tokyo" video, Jimeng handles complex multi-subject interaction less well than Keling. Its "lady", the subject of the picture, and its spatial description are both relatively accurate, but pedestrians in the frame are generally blurred, and those in close-up are distorted.

Jimeng’s “Lady on the Streets of Tokyo” video screenshot

Keling’s “Lady on the Streets of Tokyo” video screenshot

However, Jimeng AI has revealed that the pro versions of the two video generation models, Seaweed and Pixeldance, will also open for use in the near future. The pro models will optimize multi-subject interaction and the coherence of actions across multiple shots, and tackle problems such as consistency when switching between shots.

In function and experience, Keling, after several rounds of iteration, lets users adjust a "creative imagination vs. creative relevance" parameter when generating videos, so the balance between the two can be tuned. Keling can also specify content not to be presented, such as blur, collage, transformation, or animation; its generation workflow is more flexible and its effects more adjustable.

Generating videos with Jimeng, for its part, is more convenient. Moreover, in our tests, Jimeng's generation time was shorter: each video for Sora's five prompts took no more than half a minute, while Keling's 1.5 model took more than 10 minutes to generate a 10-second high-quality video.

It should be noted, however, that the Jimeng and Keling videos above were generated in the reporters' own tests; different versions and prompt details will yield different results. Moreover, Sora is not yet open to the public, and its published videos are all official releases; users' actual results later may differ somewhat from the official videos.

Amid the melee in AI video generation, who will come out ahead?

For the two short video giants, ByteDance and Kuaishou, the competition in AI video generation involves far more opponents than just each other.

For example, on November 8, Zhipu, one of the "Six Little Dragons of AI", rolled out a major upgrade to its video generation tool Qingying. Notably, the upgraded Qingying supports video generation from images of any aspect ratio and has multi-channel generation: the same prompt or image can produce four videos at once. In addition, the new Qingying can generate sound effects matched to the picture, a feature that will enter public beta this month.

Emerging players have already broken out of the pack.

On August 31, MiniMax released its first AI high-definition video generation model, abab-video-1, which drew wide attention in its first month. MiniMax's official public account disclosed that in the first month after the video model launched on Conch AI, visits to Conch AI's web version grew by more than 800%, its users came to cover more than 180 countries and regions, and in September the product topped both the global growth list and the domestic growth list of the AI product rankings (web).

Wang Peng, an associate researcher at the Institute of Management of the Beijing Academy of Social Sciences, told the "Daily Economic News" reporter that AI video products at home and abroad are in a stage of rapid development: overseas, technology giants such as Meta and Google are actively moving into AI video; domestically, products such as Kuaishou's Keling and Jimeng AI keep iterating to improve user experience and commercialization capability.

Meanwhile, a research report released by Soochow Securities in August also noted that domestic AI video models are iterating and landing quickly amid fierce competition. At the technical level, rapid improvement in new models' generation time, resolution, and frame rate has narrowed the gap with Sora; at the product level, many new products and model upgrades are open to all users, and some have been used to create micro short dramas. Domestic companies are advancing rapidly in opening to users and in commercialization.

On commercialization, the report estimated that under a neutral assumption of 15% AI penetration, China's AI video generation industry has a potential market of 317.8 billion yuan; under a fully AI-driven model, production costs for films, long dramas, animation, and short dramas would fall by more than 95% compared with traditional models.

This huge potential market, and the "superpower" of cutting costs while raising efficiency, can also be glimpsed in Keling's usage data.

At the 2024 China Computer Conference in October, Zhang Di, vice president of Kuaishou and head of its large model team, revealed that since its June release, Kuaishou's Keling AI has gained more than 3.6 million users and has generated a total of 37 million videos and over 100 million images.

Pan Helin told the "Daily Economic News" reporter that Keling, backed by Kuaishou, has traffic support, so its commercialization has been very fast. "AI video products still need the backing of an internet platform; only with traffic is there commercial potential."

Similarly, ByteDance puts the commercialization of its video models at the top of its task list. When the two video generation models launched in September, Tan Dai, president of Volcano Engine, publicly stated that the new Doubao video generation models "have been designed with commercialization in mind since launch," with use cases including e-commerce marketing, animation education, urban culture and tourism, and micro-scripts.

"AI video will show different commercialization potential on the B side and the C side." Wang Peng believes that for businesses, AI video can provide more efficient, lower-cost video production and distribution solutions; for consumers, AI video can meet users' demand for personalized, high-quality video content, and can be combined with e-commerce, advertising and other industries for more precise marketing and monetization.

On commercialization, however, there are also views like that of Yan Junjie, founder of MiniMax: "At this stage the most important thing is not commercialization, but getting the technology to a level of broad 'usability'." What is beyond doubt is that, under the combined influence of the rivalry between the two short video giants, startup unicorns' "new approaches", and differing paces of commercialization, the race in AI video generation has become ever more interesting.


Daily Economic News