Helen Toner remembers when every person who worked in AI safety could fit onto a school bus. The year was 2016. Toner hadn’t yet joined OpenAI’s board and hadn’t yet played a crucial role in the (short-lived) firing of its CEO, Sam Altman. She was working at Open Philanthropy, a nonprofit associated with the effective-altruism movement, when she first connected with the small community of intellectuals who care about AI risk. “It was, like, 50 people,” she told me recently by phone. They were more of a sci-fi-adjacent subculture than a proper discipline.
But things were changing. The deep-learning revolution was drawing new converts to the cause. AIs had recently started seeing more clearly and doing advanced language translation. They were developing fine-grained notions about what videos you, personally, might want to watch. Killer robots weren’t crunching human skulls underfoot, but the technology was advancing quickly, and the number of professors, think tankers, and practitioners at big AI labs concerned about its dangers was growing. “Now it’s hundreds or even thousands of people,” Toner said. “Some of them seem smart and great. Some of them seem crazy.”
After ChatGPT’s release in November 2022, that whole spectrum of AI-risk experts—from measured philosopher types to those convinced of imminent Armageddon—achieved a new cultural prominence. People were unnerved to find themselves talking fluidly with a bot. Many were curious about the new technology’s promise, but some were also frightened by its implications. Researchers who worried about AI risk had been treated as pariahs in elite circles. Suddenly, they were able to get their case across to the masses, Toner said. They were invited onto serious news shows and popular podcasts. The apocalyptic pronouncements that they made in these venues were given due consideration.
But only for a time. After a year or so, ChatGPT ceased to be a sparkly new wonder. Like many marvels of the internet age, it quickly became part of our everyday digital furniture. Public interest faded. In Congress, bipartisan momentum for AI regulation stalled. Some risk experts—Toner in particular—had achieved real power inside tech companies, but when they clashed with their overlords, they lost influence. Now that the AI-safety community’s moment in the sun has come to a close, I wanted to check in on them—especially the true believers. Are they licking their wounds? Do they wish they’d done things differently?
The ChatGPT moment was particularly heady for Eliezer Yudkowsky, the 44-year-old co-founder of the Machine Intelligence Research Institute, an organization that seeks to identify potential existential risks from AI. Yudkowsky is something of a fundamentalist about AI risk; his entire worldview orbits around the idea that humanity is hurtling toward a confrontation with a superintelligent AI that we won’t survive. Last year, Yudkowsky was named to Time’s list of the world’s most influential people in AI. He’d given a popular TED Talk on the subject; he’d gone on the Lex Fridman Podcast; he’d even had a late-night meetup with Altman. In an essay for Time, he proposed an indefinite international moratorium on developing advanced AI models like those that power ChatGPT. If a country refused to sign and tried to build computing infrastructure for training, Yudkowsky’s favored remedy was air strikes. Anticipating objections, he stressed that people should be more concerned about violations of the moratorium than about a mere “shooting conflict between nations.”
The public was generally sympathetic, if not to the air strikes, then to broader messages about AI’s downsides—and understandably so. Writers and artists were worried that the novels and paintings they’d labored over had been strip-mined and used to train their replacements. People found it easy to imagine slightly more accurate chatbots competing seriously for their job. Robot uprisings had been a pop-culture fixture for decades, not only in pulp science fiction but also at the multiplex. “For me, one of the lessons of the ChatGPT moment is that the public is really primed to think of AI as a bad and dangerous thing,” Toner told me. Politicians started to hear from their constituents. Altman and other industry executives were hauled before Congress. Senators from both sides of the aisle asked whether AIs might pose an existential risk to humanity. The Biden administration drafted an executive order on AI, possibly its “longest ever.”
AI-risk experts were suddenly in the right rooms. They had input on legislation. They’d even secured positions of power within each of the big-three AI labs. OpenAI, Google DeepMind, and Anthropic all had founders who emphasized a safety-conscious approach. OpenAI was famously formed to benefit “all of humanity.” Toner was invited to join its board in 2021 as a gesture of the company’s commitment to that principle. During the early months of last year, the company’s executives insisted that it was still a priority. Over coffee in Singapore that June, Altman himself told me that OpenAI would allocate a whopping 20 percent of the company’s computing power—the industry’s coin of the realm—to a team dedicated to keeping AIs aligned with human goals. It was to be led by OpenAI’s risk-obsessed chief scientist, Ilya Sutskever, who also sat on the company’s board.
That might have been the high-water mark for members of the AI-risk crowd. They were dealt a grievous blow soon thereafter. During OpenAI’s boardroom fiasco last November, it quickly became clear that whatever nominal titles these people held, they wouldn’t be calling the shots when push came to shove. Toner had by then grown concerned that it was becoming difficult to oversee Altman, because, according to her, he had repeatedly lied to the board. (Altman has said that he does not agree with Toner’s recollection of events.) She and Sutskever were among those who voted to fire him. For a brief period, Altman’s ouster seemed to vindicate the company’s governance structure, which was explicitly designed to prevent executives from sweeping aside safety considerations—to enrich themselves or participate in the pure exhilaration of being at the technological frontier. Yudkowsky, who had been skeptical that such a structure would ever work, admitted in a post on X that he’d been wrong. But the moneyed interests that funded the company—Microsoft in particular—rallied behind Altman, and he was reinstated. Yudkowsky withdrew his mea culpa. Sutskever and Toner subsequently resigned from OpenAI’s board, and the company’s superalignment team was disbanded a few months later. Young AI-safety researchers were demoralized.
Yudkowsky told me that he is in despair about the way these past few years have unfolded. He said that when a big public-relations opportunity had suddenly materialized, he and his colleagues weren’t set up to handle it. Toner told me something similar. “There was almost a dog-that-caught-the-car effect,” she said. “This community had been trying so long to get people to take these ideas seriously, and suddenly people took them seriously, and it was like, ‘Okay, now what?’”
Yudkowsky did not expect an AI that works as well as ChatGPT this soon, and it concerns him that its creators don’t know exactly what’s happening underneath its hood. If AIs become much more intelligent than us, their inner workings will become even more mysterious. The big labs have all formed safety teams of some kind. It’s perhaps no surprise that some tech grandees have expressed disdain for these teams, but Yudkowsky doesn’t like them much either. “If there’s any trace of real understanding [on those teams], it is really well hidden,” he told me. The way he sees it, it is ludicrous for humanity to keep building ever more powerful AIs without a clear technical understanding of how to keep them from escaping our control. It’s “an unpleasant game board to play from,” he said.
ChatGPT and bots of its ilk have improved only incrementally so far. Without seeing more big, flashy breakthroughs, the general public has been less willing to entertain speculative scenarios about AI’s future dangers. “A lot of people sort of said, ‘Oh, good, I can stop paying attention again,’” Toner told me. She wishes more people would think about longer trajectories rather than near-term dangers posed by today’s models. It’s not that GPT-4 can make a bioweapon, she said. It’s that AI is getting better and better at medical research, and at some point, it is surely going to get good at figuring out how to make bioweapons too.
Toby Ord, a philosopher at Oxford University who has worked on AI risk for more than a decade, believes that it’s an illusion that progress has stalled out. “We don’t have much evidence of that yet,” Ord told me. “It’s difficult to appropriately calibrate your intuitive responses when something moves forward in these big lurches.” The leading AI labs sometimes take years to train new models, and they keep them out of sight for a while after they’re trained, to polish them up for consumer use. As a result, there is a bit of a staircase effect: Massive changes are followed by a flatline. “You can find yourself incorrectly oscillating between the sensation that everything is changing and nothing is changing,” Ord said.
In the meantime, the AI-risk community has learned a few things. They have learned that solemn statements of purpose drafted during a start-up’s founding aren’t worth much. They have learned that promises to cooperate with regulators can’t be trusted either. The big AI labs initially advertised themselves as being quite friendly to policy makers, Toner told me. They were surprisingly prominent in conversations, both in the media and on Capitol Hill, about AI potentially killing everyone, she said. Some of this solicitousness might have been self-interested—to distract from more immediate regulatory concerns, for instance—but Toner believes that it was in good faith. When those conversations led to actual regulatory proposals, things changed. A lot of the companies no longer wanted to riff about how powerful and dangerous this tech would be, Toner said: “They sort of realized, ‘Hang on, people might believe us.’”
The AI-risk community has also learned that novel corporate-governance structures cannot constrain executives who are hell-bent on acceleration. That was the big lesson of OpenAI’s boardroom fiasco. “The governance model at OpenAI was supposed to prevent financial pressures from overrunning things,” Ord said. “It didn’t work. The people who were meant to hold the CEO to account were unable to do so.” The money won.
No matter what the initial intentions of their founders, tech companies tend to eventually resist external safeguards. Even Anthropic—the safety-conscious AI lab founded by a splinter cell of OpenAI researchers who believed that Altman was prioritizing speed over caution—has recently shown signs of bristling at regulation. In June, the company joined an “innovation economy” trade group that is opposing a new AI-safety bill in California, although Anthropic also recently said that the bill’s benefits would outweigh its costs. Yudkowsky told me that he’s always considered Anthropic a force for harm, based on “personal knowledge of the founders.” They want to be in the room where it happens, he said. They want a front-row seat to the creation of a greater-than-human intelligence. They aren’t slowing things down; they’ve become a product company. A few months ago, they released a model that some have argued is better than ChatGPT.
Yudkowsky told me that he wishes AI researchers would all shut down their frontier projects forever. But if AI research is going to continue, he would slightly prefer for it to take place in a national-security context—in a Manhattan Project setting, perhaps in a handful of rich, powerful countries. There would still be arms-race dynamics, of course, and considerably less public transparency. But if some new AI proved existentially dangerous, the big players—the United States and China in particular—might find it easier to form an agreement not to pursue it, compared with a teeming marketplace of 20 to 30 companies spread across several global markets. Yudkowsky emphasized that he wasn’t absolutely sure this was true. This kind of thing is hard to know in advance. The precise trajectory of this technology is still so unclear.
For Yudkowsky, only its conclusion is certain. Just before we hung up, he compared his mode of prognostication to that of Leo Szilard, the physicist who in 1933 first beheld a fission chain reaction, not as an experiment in a laboratory but as an idea in his mind’s eye. Szilard chose not to publish a paper about it, despite the great acclaim that would have flowed to him. He understood at once how a fission reaction could be used in a terrible weapon. “He saw that Hitler, specifically, was going to be a problem,” Yudkowsky said. “He foresaw mutually assured destruction.” He did not, however, foresee that the first atomic bomb would be dropped on Japan in August 1945, nor did he predict the precise conditions of its creation in the New Mexico desert. No one can know in advance all the contingencies of a technology’s evolution, Yudkowsky said. No one can say whether there will be another ChatGPT moment, or when it might occur. No one can guess what particular technological development will come next, or how people will react to it. The end point, however, he could predict: If we keep on our current path of building smarter and smarter AIs, everyone is going to die.