OpenAI, the company behind artificial intelligence programs such as ChatGPT, has revealed a text-to-video model that generates photo-realistic videos up to one minute long. The new tool, called Sora, took the internet by storm when CEO Sam Altman shared videos demonstrating its capabilities on X.
On February 15th, Altman took to X to ask users for Sora prompts. He wrote, “We’d like to show you what Sora can do. Please reply with captions for videos you’d like to see, and we’ll start making some!” He followed up, “Don’t hold back on the detail or difficulty!”
The prompts ranged in content and detail. One user requested, “Two golden retrievers podcasting on top of a mountain.” Another included more detail, requesting “An instructional cooking session for homemade gnocchi hosted by a grandmother social media influencer set in a rustic Tuscan country kitchen with cinematic lighting.” Both videos were equally impressive.
Users marveled at how effortlessly realistic the final videos turned out. Film producer Ben Everard commented, “The stakes have been raised. Considerably.” The video of the grandmother filming a gnocchi tutorial gained particular traction for its realistic human hands, which AI notoriously struggles to get right.
Sora’s capabilities
Sora’s website reads: “Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt but also how those things exist in the physical world.”
It continues: “The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.”
Sora’s ability to create continuity between shots is one of the software’s most impressive features. Bill Peebles, a researcher on the project, said, “There’s actually multiple shot changes—these are not stitched together, but generated by the model in one go.” He continued, “We didn’t tell it to do that; it just automatically did it.”
One of the most popular videos was titled ‘Bling Zoo.’ The seventeen-second video accrued almost three million views on X. It demonstrates shot changes and continuity, making it one of the most cinematic examples shared on the social media platform.
As of now, Sora is not available for public use. It is currently accessible to a selection of visual artists and designers, who are providing feedback, as well as to red teamers “to assess critical areas for harms or risks.” OpenAI has not revealed when Sora will become available for general use.