TollBit
The 28-year-old founders of TollBit, a New York-based startup that's just six months old, believe we're living through an AI “Napster era,” with companies pirating vast swaths of the internet without paying the rights holders, in the same way that a generation of people downloaded digital music. They want TollBit to be the iTunes of the AI world.
“Right now it's kind of a lawless zone,” Olivia Joslin, co-founder and COO, told Engadget in an interview. “We want to make it easier for AI companies to pay for the data they need.” Their idea is simple: create a marketplace that connects AI companies that need access to fresh, high-quality data with the publishers that actually spend money creating it.
In fact, it's only recently that AI companies have started paying for (some of) the data they need from news publishers. OpenAI kicked off the arms race at the end of 2022, but it was only a year ago that the company signed the first of many licensing deals with the Associated Press. Later that year, OpenAI announced a partnership with German publisher Axel Springer, which runs Business Insider and Politico in the US. Since then, several publishers, including Vox, Financial Times, News Corp and TIME, have signed deals with OpenAI and Google.
But that still leaves countless other publishers and creators out in the cold, with no option to enter into this Faustian bargain even if they wanted to. This is the “long tail” of publishers that TollBit wants to target.
“Powerful AI models already exist, they're already trained,” TollBit co-founder and CEO Toshit Panigrahi told Engadget, “and right now, thousands of applications are pulling these existing models off the shelf. What's needed is fresh content. But right now, the infrastructure isn't there, which means they can't buy it, and content makers can't sell it seamlessly.”
Neither Joslin nor Panigrahi was particularly familiar with the media industry, but they did know how online marketplaces and platforms work: they were colleagues at Toast, a platform that manages restaurant billing and reservations. When Panigrahi saw the licensing deals and lawsuits piling up in the AI space, he reached out to Joslin.
Their first conversation was about RAG, which in the AI world stands for Retrieval-Augmented Generation. In RAG, the AI model first looks up information from a specific source (such as a scrapable part of the internet) and then uses that information to synthesize a response, rather than relying on its training data alone. Services like ChatGPT don't inherently know current house prices or the latest news; to answer such queries, they typically fetch that information from websites. Without fresh data, AI chatbots are often stumped by questions about recent events.
“We thought that using content for RAG was fundamentally different from using it for training,” Panigrahi said.
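The retrieve-then-generate flow described above can be illustrated with a minimal Python sketch. It is not how ChatGPT or any particular product works: the `Document` type, the `retrieve` and `build_prompt` functions, and the toy keyword-overlap ranking are all assumptions made for illustration, and the final call to a language model is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical publisher identifier
    text: str    # content fetched from that publisher


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Return up to k documents sharing the most words with the query (toy ranking)."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc.text.lower().split())), doc) for doc in corpus
    ]
    # Drop documents with no overlap, then sort by score, highest first.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Place the retrieved text in the prompt so the model answers from fresh content."""
    context = "\n".join(f"[{doc.source}] {doc.text}" for doc in docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

In a real system, the string returned by `build_prompt` would be sent to a language model, and the retrieval step would query a search index or live websites rather than an in-memory list.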
Some see RAG as the future of search engines. More and more people are asking questions on the internet and expect complete answers, not just a list of blue links. In just over a year, startups like Perplexity, backed by the likes of Jeff Bezos and NVIDIA, have emerged with the ambition of taking on Google. Even OpenAI plans to one day turn ChatGPT into a search engine. Google has acted quickly in response, gathering relevant information from search results and presenting it as a coherent answer at the top of the results page. This feature is called AI Overviews (it doesn't always work, but it seems like it's here to stay).
The rise of RAG-based search engines has publishers shaking in their boots. After all, who benefits if AI reads the internet for us? After Google released its AI Overviews earlier this year, at least one report estimated that publishers will lose more than $2 billion in advertising revenue because people have less reason to visit their websites. “AI companies also need ongoing access to high-quality content and data,” Joslin said. “But if we don't come up with some kind of economic model here, no one will have an incentive to create content, and that will be the end of AI applications.”
TollBit's model aims to reward publishers on an ongoing basis, rather than issuing one-off checks: if a publisher's content is used in 1,000 AI-generated answers, the publisher is paid 1,000 times, at a price it sets itself and can change on the fly.
Each time an AI company accesses fresh data from a publisher through TollBit, the company can pay a small fee set by the publisher. Panigrahi and Joslin believe this fee should be roughly equivalent to what publishers earn from traditional page views. The platform can also block unregistered AI companies from accessing a publisher's data.
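The pay-per-access idea described above can be sketched in a few lines of Python. This is a hypothetical illustration, not TollBit's implementation: the `TollGate` class, its method names, and the prices are all assumptions. Amounts are stored as integer millionths of a dollar so that very small fees add up exactly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TollGate:
    """Hypothetical per-publisher gate: registered AI companies pay per access."""

    price_micros: int  # fee per access in millionths of a dollar, set by the publisher
    registered: set[str] = field(default_factory=set)
    owed_micros: dict[str, int] = field(default_factory=dict)

    def set_price(self, price_micros: int) -> None:
        # The publisher can change the price at any time; it applies to later accesses.
        self.price_micros = price_micros

    def access(self, company: str) -> bool:
        # Unregistered companies are blocked and charged nothing.
        if company not in self.registered:
            return False
        self.owed_micros[company] = self.owed_micros.get(company, 0) + self.price_micros
        return True


gate = TollGate(price_micros=2_000, registered={"ai-co"})  # $0.002 per access (made-up figure)
for _ in range(1_000):
    gate.access("ai-co")
total_dollars = gate.owed_micros["ai-co"] / 1_000_000  # 1,000 accesses at the set price
```

Under these made-up numbers, 1,000 accesses at $0.002 each accrue $2.00, and a price change through `set_price` affects only subsequent accesses.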
The founders claim that 100 publishers have joined TollBit since it launched in February, and that they are piloting it with three AI companies. They declined to say which publishers and AI companies have signed up, citing confidentiality clauses, but did not deny having spoken to OpenAI, Anthropic, Google, and Meta. So far, they say, no money has changed hands between AI companies and publishers on their platform.
Until that happens, their model remains a huge hypothetical. However, investors have pumped in $7 million so far. TollBit's investors include Sunflower Capital, Lerer Hippeau, Operator Collective, AIX, and Liquid 2 Ventures, with Joslin claiming that more are currently “knocking on their door.” In April, TollBit brought on Campbell Brown as a senior advisor. Brown is a former TV anchor who served as head of news partnerships at Meta for nearly a decade.
Despite some high-profile lawsuits, AI companies still scrape the internet for free and largely get away with it. Why would they have an incentive to actually pay publishers for this data? According to the founders, there are three big reasons: scraping the web has become more difficult and expensive, as more websites have taken steps to block scrapers since generative AI went mainstream; no one wants to deal with ongoing copyright litigation; and, most importantly, a marketplace lets AI companies easily pay for content from smaller, more niche publications, since striking individual licensing agreements with every website would be impossible. Joslin also noted that several of TollBit's investors have invested in AI companies that are concerned they could face lawsuits for using content without permission.
Getting AI companies to pay for content could provide a continuous revenue stream not just for big publishers, but for anyone who publishes something online. Last month, Perplexity, which was accused of illegally scraping content from Forbes, Wired, and Condé Nast, launched a publisher program that plans to give publishers a cut of the revenue it makes when it uses their content to generate answers with AI. However, the success of the program depends on how much revenue Perplexity makes when it introduces ads to its app later this year. Like TollBit, this is entirely hypothetical.
“TollBit's argument is that if you lose pageviews today, you should be compensated immediately, instead of having tech companies figure out an advertising program for you years from now,” Panigrahi said of Perplexity's program.
Despite existing licensing agreements and technological advances, AI-powered chatbots remain terrible news sources, making up facts and confidently fabricating links to articles that don't actually exist. But tech companies are stuffing AI chatbots into every crevice, and in the not-too-distant future, many of us will be getting our news from these products.
A more cynical take on TollBit's premise is that the startup is, in effect, offering hush money to publishers whose work is likely to be laced with misinformation. Unsurprisingly, the company's founders don't agree with this view. “We're careful about the AI partners we bring on board,” Panigrahi said. “These companies care very much about the quality of the input material and the accuracy of the responses. We find that paying for content, even if it's a small amount, creates an incentive to respect the raw inputs into the system rather than treating them as a free and replaceable commodity.”