
2024-03-02 10:09:02 entertainment

Daily Economic News editors: He Xiaotao, Bi Luming

In the early morning of February 16, OpenAI dropped another bombshell by releasing Sora, its first text-to-video model. According to reports, Sora can directly generate videos up to 60 seconds long that feature highly detailed backgrounds, complex multi-angle shots, and multiple characters with vivid emotions. OpenAI has so far posted 48 demo videos on its official website.

In these demos, Sora not only renders details accurately but also understands how objects exist in the physical world, and it generates characters with rich emotions. The model can also generate videos from text prompts or still images, and can even fill in missing frames in existing videos.

For example, one prompt (the text instruction given to a large model) reads: "A fashionable woman walks down a Tokyo street full of warm neon lights and animated city signs."

In the video Sora generated, a woman in a black leather jacket and a red skirt walks down a neon-lit street. The subject stays coherent and stable throughout, and the clip contains multiple shots: it slowly cuts from the street scene to a close-up of the woman's face, while the wet pavement reflects the glow of the neon lights.

Another prompt: "A movie trailer featuring the adventures of a 30-year-old astronaut wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors."

AI imagines Chinese New Year in the Year of the Dragon: red flags wave above huge crowds. Children follow the dragon dance troupe, looking up curiously, and many people take out their phones to follow along and take pictures. The scene contains a large number of people, each behaving in their own way.

In an extreme close-up shot in vertical format, a lizard is rendered in rich detail.

Netizens declared it "game over" and predicted that jobs will be lost.

Some people have even begun to "mourn" an entire industry.

Some netizens said the film industry is bound to be completely upended.

YouTuber Paddy Galloway shared his thoughts on Sora, saying it is no exaggeration that the content creation industry has changed forever. "I've been in the YouTube world for 15 years, but what OpenAI just showed left me speechless... Animators and 3D artists are in trouble, stock footage websites will become irrelevant, anyone can produce incredible work without any barrier to entry, and the 'ideas' and stories behind the content will become more important."

OpenAI is open about Sora's current weaknesses, noting that it may struggle to accurately simulate the physics of complex scenes and may not understand cause and effect.

For example, given the prompt "Five gray wolf cubs playing and chasing each other on a remote gravel road," the number of wolves changes, with some appearing or disappearing out of thin air.

The model may also confuse spatial details in a prompt, such as left and right, and may struggle to precisely depict events that unfold over time, such as following a specific camera trajectory.

For example, for the prompt "A basketball passes through the hoop and then explodes," the generated video does not show the hoop correctly blocking the ball.

OpenAI said it is teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems requiring real-world interaction.

OpenAI then explained how Sora works. Sora is a diffusion model: it starts from a video that looks like static noise and gradually removes the noise over many steps, transforming the initial random pixels into a clear scene. Sora uses a transformer architecture, which scales well.
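The "start from noise, remove it step by step" idea can be illustrated with a toy loop. This is a minimal sketch, not Sora's sampler. The linear noise schedule, the deterministic DDIM-style update, the helper names `linear_alpha_bars` and `denoise`, and the "oracle" denoiser that already knows the clean answer are all illustrative assumptions. In a real system, a trained transformer would play the role of `predict_clean`, and the sample would be a video rather than a short list of numbers.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def linear_alpha_bars(num_steps: int) -> List[float]:
    # Hypothetical schedule: alpha_bar[t] is the share of clean signal at step t.
    # alpha_bar[0] == 1.0 (fully clean); alpha_bar[num_steps] is close to 0 (mostly noise).
    return [1.0 - t / (num_steps + 1) for t in range(num_steps + 1)]


def denoise(noise: Vector,
            predict_clean: Callable[[Vector, int], Vector],
            num_steps: int) -> List[Vector]:
    """Deterministic (DDIM-style) reverse process; returns every intermediate sample."""
    alpha_bars = linear_alpha_bars(num_steps)
    x = list(noise)
    trajectory = [x]
    for t in range(num_steps, 0, -1):
        x0_hat = predict_clean(x, t)  # the model's current guess at the clean sample
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        # Noise implied by the current sample and the model's guess.
        eps_hat = [(xi - math.sqrt(a_t) * ci) / math.sqrt(1.0 - a_t)
                   for xi, ci in zip(x, x0_hat)]
        # Re-mix guess and noise with a smaller noise share for the next step.
        x = [math.sqrt(a_prev) * ci + math.sqrt(1.0 - a_prev) * ei
             for ci, ei in zip(x0_hat, eps_hat)]
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.5, -1.0, 2.0]                    # stand-in for a "clear" frame, flattened
    noise = [rng.gauss(0.0, 1.0) for _ in target]
    oracle = lambda x, t: target                 # hypothetical perfect denoiser
    steps = denoise(noise, oracle, num_steps=10)
    print(steps[-1])                             # the last step keeps no noise, so it matches target
```

Each pass blends less noise back in than the pass before. On the final step, the noise weight drops to zero, so only the model's estimate of the clean signal remains.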

Videos and images are represented as collections of smaller units of data called "patches," each of which is similar to a token in GPT. Because the data is represented in a unified way, the diffusion model can be trained on a wider range of visual data spanning different durations, resolutions, and aspect ratios.
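Cutting a video into patch "tokens" can be sketched in a few lines. This is a minimal illustration with several assumptions. It uses raw grayscale pixels stored as nested lists, and each patch has a fixed size that spans a few frames as well as a small spatial window. The article does not give Sora's real patch sizes or any preprocessing, and the helpers `patchify` and `make_video` are hypothetical.

```python
from typing import List

Video = List[List[List[float]]]  # frames x height x width (grayscale, for simplicity)


def patchify(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Split a video into non-overlapping pt x ph x pw patches, each flattened into one token."""
    t, h, w = len(video), len(video[0]), len(video[0][0])
    if t % pt or h % ph or w % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, t, pt):          # step through time
        for y0 in range(0, h, ph):      # then rows
            for x0 in range(0, w, pw):  # then columns
                tokens.append([video[t0 + dt][y0 + dy][x0 + dx]
                               for dt in range(pt)
                               for dy in range(ph)
                               for dx in range(pw)])
    return tokens


def make_video(frames: int, height: int, width: int) -> Video:
    # Hypothetical test clip: pixel value encodes its (t, y, x) position.
    return [[[float(t * 100 + y * 10 + x) for x in range(width)]
             for y in range(height)]
            for t in range(frames)]


if __name__ == "__main__":
    square = make_video(4, 4, 6)   # 4 frames of 4 x 6
    wide = make_video(2, 2, 12)    # 2 frames of 2 x 12 (a wider aspect ratio)
    print(len(patchify(square, 2, 2, 3)), len(patchify(wide, 2, 2, 3)))  # 8 4
```

Clips of different lengths and shapes produce different numbers of tokens, but every token has the same length (12 values here). This uniform token shape is what lets one model consume varied durations, resolutions, and aspect ratios.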

Sora builds on past research on DALL·E and GPT. It uses the recaptioning technique from DALL·E 3, which generates highly descriptive captions for the visual training data, so the model can follow text instructions more faithfully.

Sora is now available to a select group of people who will assess potential harms or risks in critical areas. At the same time, OpenAI has invited a number of visual artists, designers, and filmmakers to try it, hoping to gain valuable feedback that will help advance the model and make it more useful to creative professionals. OpenAI is sharing its research progress early in order to work with and get feedback from people outside the company, and to give the public a sense of the AI capabilities on the horizon.

Editors | He Xiaotao, Bi Luming, Gai Yuanyuan

Proofreading | Liu Siqi

Daily Economic News, compiled from Jiemian News, QbitAI, OpenAI's official website, and other sources.

