People have been scared of new technology for as long as there has been new technology, but I'd never fallen into that camp until now. I have to admit, this really does frighten me.
This is so much better than any text-to-video model currently available. I'm looking forward to reading the paper, but I'm afraid they won't say much about how they did it.
Even if the examples are cherry-picked, this is mind-blowing!
I wonder if in the 1800s people saw the first photograph and thought… “well, that’s the end of painters.” Others probably said “look! it’s so shitty it can’t even reproduce colors!!!”.
What it actually ended was the career of talentless painters who were just copying what they saw. Painting stopped being a service and became an art. That's where software development is headed.
I have worked with hundreds of software developers over the last 20 years, and half of them were copy-pasters who got into software because they tricked people into thinking it was magic. In the future we will still code; we just won't bother with the things a prompt engineer can do in five seconds.
This is still so bizarre to me. I've worked on 3D rendering engines trying to create realistic lighting, and even the most advanced 3D games look pretty artificial. And now, all of a sudden, this stuff is just BAM, super realistic. Not only that, but as a game designer you could create an entire game by writing text and some logic.
If this goes well, video compression might take a massive leap. Imagine downloading a two-hour movie as a 20 KB file because it's just a bunch of prompts under the hood.
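Just for fun, here's a back-of-the-envelope sketch of what that compression ratio would look like. All the numbers (1080p at 24 fps, 24-bit RGB, a 20 KB prompt file) are illustrative assumptions, not anything OpenAI has published:

```python
# Back-of-the-envelope: uncompressed 2-hour 1080p24 video vs. a 20 KB prompt file.
# Every figure below is a hypothetical for illustration.

frame_bytes = 1920 * 1080 * 3          # one 1080p frame, 24-bit RGB
fps = 24
seconds = 2 * 60 * 60                  # 2-hour movie

raw_bytes = frame_bytes * fps * seconds   # ~1.07 TB of raw pixels
prompt_bytes = 20 * 1024                  # the hypothetical 20 KB of prompts

ratio = raw_bytes / prompt_bytes
print(f"raw video:          {raw_bytes / 1e12:.2f} TB")
print(f"compression ratio:  {ratio:,.0f}x")
```

Roughly a 50-million-to-one ratio versus raw pixels, which is why "the model *is* the codec" is such a tantalizing (if very speculative) idea: the actual pixel data would be regenerated on the viewer's machine, not transmitted.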
Looks good, but it still has the AI hallmarks: rotating legs, a messed-up gait. Impressive, though, and it's going to be wild to see what results from this latest pox on the tubes.
Sora is capable of creating “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” according to OpenAI’s introductory blog post.
The company also notes that the model can understand how objects “exist in the physical world,” as well as “accurately interpret props and generate compelling characters that express vibrant emotions.”
Many have some telltale signs of AI — like a suspiciously moving floor in a video of a museum — and OpenAI says the model “may struggle with accurately simulating the physics of a complex scene,” but the results are overall pretty impressive.
A couple of years ago, it was text-to-image generators like Midjourney that were at the forefront of models’ ability to turn words into images.
But recently, video has begun to improve at a remarkable pace: companies like Runway and Pika have shown impressive text-to-video models of their own, and Google’s Lumiere figures to be one of OpenAI’s primary competitors in this space, too.
It notes that the existing model might not accurately simulate the physics of a complex scene and may not properly interpret certain instances of cause and effect.