This week, X launched an AI image generator that lets paid subscribers of Elon Musk's social platform create their own art. Unsurprisingly, some users quickly produced images of Donald Trump flying a plane into the World Trade Center, Mickey Mouse wielding an assault rifle, and Mickey Mouse enjoying a cigarette and a beer on the beach. Some of the images created with the tool are deeply unsettling; others are strange and kind of funny. They depict completely different scenarios and characters, yet they all feel somehow similar, bearing the unmistakable hallmarks of the AI art that has emerged in recent years from products like Midjourney and DALL-E.
Two years into the generative-AI boom, what these programs churn out is technologically impressive. The Trump images look better than, say, the similarly jarring SpongeBob image that Microsoft's Bing Image Creator produced last October. But they are stuck with a distinctive aesthetic: the colors are bright and saturated, the people are beautiful, the lighting is dramatic. Many of the images look airbrushed, carefully smoothed out like the icing on a wedding cake. At times, the images look exaggerated, almost cartoonish. (And errors such as extra fingers are common.) Users can get around this algorithmic monotony with more specific prompts: requesting, say, an image of a dog riding a horse in the style of Andy Warhol rather than just an image of a dog riding a horse. But when users don't specify, these tools seem to default to a bizarre blend of cartoon and dreamscape.
These programs are becoming more and more common. Google announced a new AI image-creation app, Pixel Studio, that lets you make such art on your Pixel phone; the app will come preinstalled on all of the company's latest devices. Apple plans to release Image Playground as part of its Apple Intelligence suite of AI tools later this year. And OpenAI now lets ChatGPT users generate two images per day for free from DALL-E 3, its latest text-to-image model (access previously required a paid plan). So I wondered: Why does so much AI art look the same?
The AI companies themselves haven't been particularly forthcoming. X replied with a boilerplate email in response to a request for comment about the new product and the images users are creating. The four companies behind popular image generators — OpenAI, Google, Stability AI, and Midjourney — either didn't respond or offered no comment. A Microsoft spokesperson pointed to some of the company's prompting guides and recommended that people refer technical questions to OpenAI, since Microsoft uses a version of DALL-E in products like Bing Image Creator.
So I consulted outside experts, who offered four explanations. The first focuses on the data used to train the models. Text-to-image generators learn from vast libraries of photos paired with text descriptions, which they then use to create new, original images. The tools can inadvertently pick up biases in that data, whether racial or gender bias or something as simple as a preference for bright colors and good lighting. The internet is full of decades of filtered, artificially brightened photos and fantasy illustrations. “There's a lot of fantasy-inspired art and stock photography, and that feeds into the models themselves bit by bit,” said Ziv Epstein, a scientist at the Stanford Institute for Human-Centered AI. Philip Isola, a professor at the MIT Computer Science & Artificial Intelligence Laboratory, added that only a limited number of good datasets are available for building image models, which means different models often overlap in what they're trained on. (One popular dataset, CelebA, contains 200,000 labeled photos of celebrities. Another, LAION-5B, is an open-source option with 5.8 billion photo-text pairs.)
The second explanation has to do with the technology itself. Most modern models use a technique called diffusion. During training, “noise” is progressively added to existing images, each paired with a text description. “Think of it as TV static,” Apolinário Passos, a machine-learning art engineer at Hugging Face, a company that makes its own open-source models, told me. The model is then trained to remove that noise, over and over, across tens of thousands, or even millions, of images. Eventually, it learns to run the process in reverse: starting from pure static, it can generate an original image, guided only by a text prompt.
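The forward half of that training recipe can be sketched in a few lines. This is a toy illustration of my own, not any production system: real diffusion models use carefully tuned noise schedules and a learned neural network for the denoising step, both of which are simplified away here in favor of a bare linear schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, t, num_steps=1000):
    """Forward diffusion, toy version: blend the image with Gaussian noise.

    At t=0 the image is untouched; by t=num_steps it is essentially
    pure "TV static". A real model would be trained to undo each step.
    """
    alpha = 1.0 - t / num_steps                  # how much of the image survives
    noise = rng.standard_normal(image.shape)
    return np.sqrt(alpha) * image + np.sqrt(1.0 - alpha) * noise

# A random 8x8 array standing in for a training photo.
image = rng.random((8, 8))

# Early in the schedule the picture is mostly intact...
slightly_noisy = add_noise(image, t=10)
# ...late in the schedule it is almost entirely static.
mostly_noise = add_noise(image, t=990)
```

Generation then runs this process backward: the trained network starts from a grid of pure noise and repeatedly subtracts its predicted noise, steered by the text prompt, until an image emerges.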
Many companies use this technique. “I think these models are all very similar, technically,” Isola said, noting that the most recent tools are built on transformer models. It's possible the technology itself is biased toward a certain look. Take an example from the not-so-distant past: five years ago, Isola explained, image generators tended to produce very blurry outputs. Researchers realized this was the result of a mathematical accident: the models were essentially averaging all the images they had trained on, and averaging, it turns out, “makes things look blurry.” Something similarly technical may be happening in today's generation of image models that nudges them toward dramatic, highly stylized output, but researchers haven't fully figured it out yet. Additionally, “most models have 'aesthetic' filters on both the input and output that reject images that don't meet certain aesthetic criteria,” Hany Farid, a professor at the University of California, Berkeley's School of Information, told me in an email. “This kind of filtering on input and output is almost certainly a big reason why all of the AI-generated images have a certain ethereal quality.”
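Isola's averaging accident is easy to reproduce. The sketch below uses one-dimensional step edges as stand-ins for training images, a deliberately simplified assumption of mine: each individual “image” contains a maximally sharp edge, yet the average of many of them is a soft ramp with no sharp edge left at all.

```python
import numpy as np

rng = np.random.default_rng(1)

def step_edge(position, width=64):
    """A maximally sharp 1-D 'image': black, then white, at `position`."""
    img = np.zeros(width)
    img[position:] = 1.0
    return img

# Every training image is perfectly sharp on its own...
images = [step_edge(rng.integers(16, 48)) for _ in range(500)]

# ...but a model that effectively averages them produces a blur.
average = np.mean(images, axis=0)

def sharpness(img):
    """Largest jump between adjacent pixels: 1.0 for a hard edge."""
    return np.abs(np.diff(img)).max()
```

Any single `images[i]` has a jump of 1.0 somewhere; in `average`, that jump is smeared across 32 positions, so its steepest step is a small fraction of that.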
The third theory concerns the humans who use these tools. Some of these models incorporate human feedback, learning as people use them. They do so by registering signals such as which photos users download. In other models, human trainers manually rate which images they like and which they don't, Isola explained, and that feedback may then be folded back into the model. If the artworks people download tend to feature dramatic sunsets or absurdly beautiful seascapes, a tool may learn that this is what humans want and serve up more of them. Alexandru Costin, Adobe's vice president of generative AI, and Zeke Koch, vice president of product management for Adobe Firefly (the company's AI imaging tool), said in an email that user feedback can indeed be a factor for some AI models, through a process called “reinforcement learning from human feedback,” or RLHF. They also pointed to training data and ratings by human evaluators as influencing factors. “Art generated by AI models can sometimes have a peculiar look (especially when created using simple prompts). This is typically caused by a combination of the images used to train the image output and the preferences of the person training or rating the images,” they said in a statement.
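That feedback loop can be caricatured in a few lines. In this toy simulation, which is my own construction and not any company's actual RLHF pipeline, each generated image is reduced to a single “saturation” number, and the model is nudged toward whatever users download. Over successive generations, its output drifts toward maximum vividness.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend each generated image is summarized by one number: its saturation,
# on a 0 (muted) to 1 (maximally vivid) scale.
model_mean = 0.5  # the model starts out producing mid-saturation images

for generation in range(20):
    # The model samples a batch of images around its current tendency.
    batch = rng.normal(model_mean, 0.1, size=1000).clip(0.0, 1.0)
    # Users "download" the 10% most vivid images; that signal is the feedback.
    downloaded = np.sort(batch)[-100:]
    # Naive update: nudge the model toward what people downloaded.
    model_mean = 0.9 * model_mean + 0.1 * downloaded.mean()
```

Because the downloaded images are always more saturated than the batch average, every update pushes `model_mean` upward, even though no one ever told the model to prefer vivid output.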
A fourth theory has to do with the creators of these tools. An Adobe representative told me the company doesn't do anything to encourage a particular aesthetic, but it's possible that other AI creators have picked up on human preferences and coded them into their models—telling them to paint more dreamy beach scenes or elfin women. This could be intentional: If there's a market for such images, companies might start rallying around them. Or it could be unintentional: Companies put a lot of manual effort into their models to fight bias, for example, and various tweaks to favor one kind of image over another could unintentionally result in a certain look.
One or more of these explanations could be true; in fact, that's probably what's happening. Experts say the style we see is likely the result of several factors at once. Ironically, all of these explanations suggest that the uncanny scenes we associate with AI-generated images are actually an extreme reflection of our own human tastes. So it's no surprise that Facebook is flooded with crude AI-generated images that earn money for their creators, that Etsy recently asked sellers to label AI-created products after a surge in junk listings, or that the arts-and-crafts store Michaels was recently caught selling canvases that were partially AI-generated (the company called it an “unacceptable error” and pulled the products).
AI imagery is poised to become even more prevalent in everyday life. For now, these works are usually visually identifiable as machine-made. But that could change. The technology could get better. Passos told me that new models tend to “try to deviate” from the current aesthetic. Computer-generated artworks may one day shed their strange, cartoonish look and pass unnoticed. Then we might feel nostalgic for the bland style that was once their telltale sign.