On Thursday, OpenAI revealed Sora, their new text-to-video AI generation model. I had the same reaction to it that I have had to every new release of a so-called artificial intelligence application: this is the same scam I have seen in Silicon Valley one thousand times.
At first, the idea of generating videos with motion from nothing but a text prompt sounds like magic. This is indeed a cutting-edge technology, and one that sounds like it’s from a cyberpunk fever dream. But looking at the generated videos for even a couple of minutes reveals how uncanny and unpolished they are. Look at the way that the people in this video clap—it makes the hairs on the back of my neck stand on end.
More than that, the video does not match the details in the prompt. I don’t see an expression of pure joy and happiness on this grandmother’s face. Her friends aren’t seated around the table, but at a separate table behind her. At no point does she blow out the candles either. You can see the figure make a few gestures as if she’s going to blow out the candles before wriggling around like a snake wearing human skin. Even a second’s glance at this moving image causes it to fall apart.
While people breathlessly circulated these videos as both a significant achievement in artificial intelligence and also a threat to the very fabric of reality, all I can see are the obvious flaws. I look at moving images all day, and have done so for over thirty years. I know what a human being looks like, how human bodies move, the conscious and unconscious gestures we make as we all move through life. There is more information stored in my brain about the moving image than could be crammed into an AI model like Sora; the human brain and eye are always going to outclass a machine designed to mimic them.
Brian Merchant, author of Blood in the Machine: The Origins of the Rebellion Against Big Tech, posted on Twitter that just like AI models that create still images or text, Sora isn’t actually creating new moving images. It’s just recreating moving images that are extremely similar to the ones in the data set it was trained on.
“I think this whole 'Sora will end reality' framing is similar to the Skynet-type talk we saw with OpenAI last year,” Merchant told me over Twitter direct messages. “[OpenAI CEO] Sam Altman invites the idea that this tech is so powerful that it threatens humanity itself, all while selling it, of course.”
Rather than technological innovation, OpenAI’s business model can be understood as an iteration on the way that Uber and WeWork gamed the financial sector. WeWork never had a coherent business model, but it was able to use raw charisma and some razzle-dazzle to get cash injections from investors, convincing them that what was essentially an office real estate company was guaranteed to make them rich. Similarly, Uber spent well over a decade losing billions of dollars, using venture capital money to keep the company afloat.
Would it surprise you to learn that OpenAI is also not making any money, and relies on venture capital investment to stay afloat? Because they are not making any money, and they rely on venture capital investment to stay afloat. Part of this is the technology itself: according to Microsoft, which has invested $13 billion into OpenAI, the company loses money each time a user makes a request using its AI models. In this light, Sora seems less like an existential threat and more like an advertisement for the “potential” of OpenAI as a business. They need the hype on Twitter to boost their valuation, and every viral tweet declaring Sora a threat to Hollywood is another tool for pumping that valuation up.
“If we're all worried that this deeply imperfect if technically impressive system is going to collapse reality itself we won't be thinking as much about what's really happening: OpenAI is going to sell the software to bosses as a labor-saving technology,” Merchant told me.
There are aspects of AI video models like Sora that are mildly worrisome—before we all learned how to spot a photoshopped image, it was genuinely annoying to have to explain to my parents that something had clearly been photoshopped. But the idea that this is a breakdown of reality, or of the art of the moving image, is overblown. AI will change the world in the same way that NFTs and crypto did: it will make a few people very, very rich before disappearing.