Google today unveiled its latest model, Gemini 1.5, to developers and enterprise users, marking a step forward for its AI portfolio. On the same day, OpenAI, one of Google’s main rivals, made an announcement of its own: a new text-to-video AI model named Sora. The unveiling featured a variety of video clips, ranging from a celebration of the Chinese Lunar New Year to an animated monster adoring a red candle.
OpenAI has opened access to Sora to a select group of red teamers, who are tasked with identifying potential for harm or misuse in areas such as misinformation, discrimination, and bias. This step is intended to ensure that Sora meets the stringent safety protocols set for DALL·E 3. OpenAI has also committed to developing verification tools for identifying Sora-generated videos.
While ventures by Pika and Stability AI into AI-generated video precede Sora, several attributes set Sora apart. Remarkably, Sora is capable of producing up to 60 seconds of continuous video, a significant leap over the roughly four-second outputs of its predecessors, and boasts superior sharpness, resolution, and environmental fidelity.
Over 35 examples showcased on OpenAI’s website demonstrate Sora’s capabilities, albeit with an admission of its current imperfections. OpenAI candidly acknowledges:
“The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.
The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”
OpenAI
This is observable in the very first video featured in the blog, where a woman is shown walking through Tokyo. A closer inspection reveals anomalies such as her legs intermittently switching or stuttering, her feet seemingly sliding over the ground, and noticeable alterations in her outfit and hairstyle towards the end of the clip.