In a separate analysis this week, data journalist Ben Welsh found that just over a quarter of the news sites he surveyed (294 of 1,167 mostly English-language, U.S.-based publications) blocked Applebot-Extended. By comparison, 53 percent of the sites in his sample blocked OpenAI's bot, and about 43 percent blocked Google-Extended, the AI-specific bot Google introduced last September, which suggests Applebot-Extended may still be flying under the radar. Still, Welsh told WIRED that the number has been “gradually increasing” since he began his investigation.
Welsh is currently running a project monitoring how news outlets approach major AI agents. “News outlets are divided on whether to block these bots,” Welsh says. “I don't know why every outlet made the decision it did. Of course, we hear that a lot of them have licensing deals and are being paid in exchange for letting the bots in, so that's probably a factor.”
Last year, The New York Times reported that Apple was looking to strike AI deals with publishers. Since then, competitors like OpenAI and Perplexity have announced partnerships with various news outlets, social platforms, and other popular websites. “Many of the world's largest publishers are clearly taking a strategic approach,” says Jon Gillham, founder of Originality AI. “In some cases, I think there's a business strategy involved, like not releasing data until a partnership agreement is in place.”
There is some evidence to support Gillham's theory. For example, Condé Nast's websites once blocked OpenAI's web crawlers; the publisher unblocked them after announcing a partnership with OpenAI last week (Condé Nast declined to comment publicly on the matter). Meanwhile, BuzzFeed spokesperson Juliana Clifton told WIRED that the company, which now blocks Applebot-Extended, adds every AI web crawler bot it can identify to its block list unless the bot's owner has a partnership, usually a paid one, with the company, which also owns The Huffington Post.
Robots.txt must be edited manually, and with so many new AI agents on the market, it can be hard to keep a block list up to date: “You don't know what to block,” says Gavin King, founder of Dark Visitors. Dark Visitors offers a freemium service that automatically updates a client site's robots.txt, and King says the majority of his clients are publishers because of copyright concerns.
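To make the mechanics concrete: a robots.txt file is just a plain-text list of user-agent names and the paths each is asked not to crawl, served from a site's root. A minimal sketch of the kind of block list publishers maintain might look like the following; Applebot-Extended, GPTBot, and Google-Extended are the documented user-agent tokens for Apple's, OpenAI's, and Google's AI crawlers, and the comments are illustrative.

```
# robots.txt: opt out of AI training crawlers while leaving
# ordinary search indexing alone

User-agent: Applebot-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers, including regular search bots, stay unrestricted
User-agent: *
Disallow:
```

Note that robots.txt is purely advisory: a crawler honors these directives only if its operator chooses to, which is part of why services like Dark Visitors focus on keeping the list current rather than on enforcement.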
Robots.txt may seem like the esoteric domain of webmasters, but given its outsized importance to digital publishers in the age of AI, it has become the domain of media executives: WIRED found that two CEOs of major media companies directly decide which bots to block.
Some media outlets have stated outright that they block AI scraping tools whose owners they have no deal with. “As with many other AI scraping tools, we block Applebot-Extended on all Vox Media properties if we do not have a commercial agreement with the other party,” said Lauren Stark, senior vice president of communications at Vox Media. “We believe in protecting the value of published works.”