In a sprawling lab at Google's headquarters in Mountain View, California, hundreds of server racks hum across several aisles, performing tasks far less ubiquitous than running the world's dominant search engine or executing workloads for Google Cloud's millions of customers.
Instead, they're running tests on Google's own microchips, called Tensor Processing Units (TPUs).
Originally deployed for internal workloads, Google's TPUs have been available to cloud customers since 2018. In July, Apple revealed that it uses TPUs to train the AI models underlying Apple Intelligence, and Google also relies on TPUs to train and run its Gemini chatbot.
“There's a fundamental belief out there that all of the AI, all of the large language models, are being trained on Nvidia, and of course Nvidia has the lion's share of training volume. But Google took its own path here,” said Daniel Newman, CEO of Futurum Group, who has covered Google's custom cloud chips since their launch in 2015.
Google was the first cloud provider to build custom AI chips. Amazon Web Services announced its first cloud AI chip, Inferentia, three years later, and Microsoft's first custom AI chip, Maia, wasn't announced until late 2023.
But being first in AI chips hasn't translated into the top spot in the overall generative AI race: Google has faced criticism for botched product releases, and Gemini arrived more than a year after OpenAI's ChatGPT.
Still, Google Cloud is gaining momentum, thanks in part to its AI offerings. Parent company Alphabet reported that cloud revenue rose 29% in the most recent quarter, topping $10 billion in quarterly revenue for the first time.
“The AI cloud era has completely reordered the way companies are viewed, and this silicon differentiation, the TPU itself, may be one of the biggest reasons Google has gone from being the third cloud to being seen as truly on parity with, and in some eyes even ahead of, the other two clouds for its AI capabilities,” Newman said.
“A simple but powerful thought experiment”
In July, CNBC got its first on-camera tour of Google's chip lab and interviewed Amin Vahdat, the executive in charge of its custom cloud chips, who has been at Google since the company first toyed with the idea of making chips in 2014.
Amin Vahdat, Google's vice president of machine learning, systems and cloud AI, holds up TPU version 4 at Google headquarters in Mountain View, California, on July 23, 2024.
Marc Ganley
“It all started with a simple but powerful thought experiment,” Vahdat said. “A number of leaders at the company asked the question: what if a Google user wanted to interact with Google by voice for just 30 seconds a day? How much computing power would it take to support that user?”
The group concluded that Google would need to double the number of computers in its data centers, so it went looking for a better solution.
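The arithmetic behind that conclusion is easy to sketch. Below is a purely illustrative back-of-envelope estimate in Python; every constant (user count, compute per second of speech, server throughput) is a hypothetical placeholder rather than a Google figure, meant only to show how a modest per-user feature multiplies into data-center-scale demand.

```python
# Back-of-envelope estimate of the compute behind "30 seconds of voice per
# user per day." All constants are hypothetical placeholders, not Google data.
USERS = 1_000_000_000                # assumed number of daily users
SECONDS_OF_SPEECH_PER_USER = 30      # the 30 seconds from the thought experiment
FLOPS_PER_SECOND_OF_SPEECH = 1e12    # assumed compute to recognize 1 s of audio
SERVER_FLOPS = 1e13                  # assumed sustained throughput of one server
SECONDS_PER_DAY = 86_400

# Total daily compute demand, and the extra servers needed to serve it.
total_flops_per_day = USERS * SECONDS_OF_SPEECH_PER_USER * FLOPS_PER_SECOND_OF_SPEECH
servers_needed = total_flops_per_day / (SERVER_FLOPS * SECONDS_PER_DAY)
print(f"~{servers_needed:,.0f} additional servers")  # ~34,722 with these placeholders
```

With these placeholder numbers, a single half-minute voice feature would demand tens of thousands of additional general-purpose servers, which is why custom hardware with a roughly 100x efficiency edge changes the picture entirely.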
“We realized that by building custom hardware, in this case Tensor Processing Units, rather than using general-purpose hardware, we could support this much more efficiently — in fact, 100 times more efficiently than the alternative,” Vahdat said.
Google's data centers rely on general-purpose central processing units (CPUs) and Nvidia's graphics processing units (GPUs). Google's TPUs are a different type of chip, an application-specific integrated circuit (ASIC), which is custom-built for a single purpose. The TPU is built for AI; Google also makes Video Coding Units, ASICs built for video workloads.
Google also makes custom chips for its devices, similar to Apple's custom silicon strategy: the Tensor G4 is in Google's new AI-enabled Pixel 9, and its new A1 chip is in the Pixel Buds Pro 2.
But it's the TPU that sets Google apart: it was the first of its kind when it launched in 2015, and the Google TPU still dominates custom cloud AI accelerators with 58% market share, according to The Futurum Group.
Google coined the term TPU from the algebraic term “tensor,” referring to the large-scale matrix multiplications that advanced AI applications must perform rapidly.
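For a concrete picture of that operation, here is a minimal sketch in JAX, Google's open-source numerical library that targets TPUs. The matrix shapes and variable names are illustrative, not Google's code, and the same snippet runs on a CPU or GPU if no TPU is attached.

```python
# A minimal JAX sketch of the core operation TPUs accelerate: a large
# matrix multiplication ("tensor" contraction). Shapes are illustrative.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)

# Two large matrices, standing in for a layer's activations and weights.
activations = jax.random.normal(k1, (4096, 8192))
weights = jax.random.normal(k2, (8192, 4096))

# jit compiles through XLA; on a TPU backend the multiply is lowered onto
# the chip's dedicated matrix units, the operation TPUs are built around.
matmul = jax.jit(lambda a, w: jnp.dot(a, w))
result = matmul(activations, weights)
print(result.shape)  # (4096, 4096)
```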
With the second TPU release in 2018, Google expanded its focus from inference to training and made the chips available to cloud customers to run workloads, alongside market-leading chips such as Nvidia's GPUs.
“If you're using GPUs, they're more programmable and more flexible, but supplies have been tight,” said Stacy Rasgon, senior semiconductor analyst at Bernstein Research.
The AI boom has sent shares in chipmaker Nvidia soaring, giving it a market capitalization of $3 trillion in June, overtaking Alphabet and putting it in contention with Apple and Microsoft for the title of world's most valuable publicly traded company.
“Frankly, these specialized AI accelerators are not as flexible or as powerful as Nvidia's platform, and the market is excited to see if someone can step up in this space,” Newman said.
Apple is known to be using Google's TPUs to train its AI models, but the real test will come next year, when its full AI capabilities roll out on iPhones and Macs.
Broadcom and TSMC
Still, developing alternatives to Nvidia's AI engines is no small feat. Google's sixth-generation TPU, Trillium, is due out later this year.
Google showed off the sixth version of its TPU, Trillium, to CNBC on July 23, 2024 in Mountain View, California. Trillium is scheduled to be released in the second half of 2024.
Marc Ganley
“It's costly. It requires significant scale,” Rasgon said. “So it's not something that just anyone can do. But the hyperscalers have the scale and the money and the resources to go down that path.”
The process is so complex and costly that even the hyperscalers can't do it alone. Since the first TPU, Google has partnered with Broadcom, a chip developer that also helps Meta design its AI chips. Broadcom says it has spent more than $3 billion to make these partnerships happen.
“AI chips are very complex, with lots of different pieces, so Google brings the compute,” Rasgon said. “Broadcom does all the peripheral parts: the I/O, the SerDes, all the different pieces that go around that compute. They also do the packaging.” (SerDes are the serializer/deserializer circuits that shuttle data on and off a chip.)
The final design is then sent to a manufacturing facility, or fab, to be built — primarily factories owned by Taiwan Semiconductor Manufacturing Co. (TSMC), the world's largest chipmaker, which makes 92% of the world's most advanced semiconductors.
Asked whether Google has safeguards in place should the worst happen geopolitically between China and Taiwan, Vahdat said: “It's certainly something we're preparing for and considering, but hopefully it's not something we actually have to trigger.”
Protecting against these risks is a key reason the White House is doling out $52 billion in CHIPS Act funding to companies building factories in the US, with Intel, TSMC and Samsung getting the lion's share so far.
Processors and Power
Google showed off its new Axion CPU to CNBC.
Marc Ganley
“Now we're able to bring in that last piece of the puzzle, the CPU,” Vahdat said of Axion, Google's first custom Arm-based CPU. “A lot of our internal services are already running on Axion, including BigQuery, Spanner and YouTube ads.”
Google is late to the CPU game: Amazon released its Graviton processor in 2018, Alibaba released its server chip in 2021, and Microsoft announced its Cobalt CPU in November 2023.
When asked why Google didn't build a CPU sooner, Vahdat replied: “We've always focused on areas where we can provide the most value to our customers, and that started with the TPU, the video coding unit and networking. We really thought the time was right.”
These CPUs from non-chipmakers, Google's included, are all built on Arm chip architecture, which is gaining popularity as a more customizable, power-efficient alternative to the traditional x86 designs from Intel and AMD. Power efficiency matters: by 2027, AI servers are projected to consume as much electricity every year as a country like Argentina. Google's latest environmental report showed that its emissions rose nearly 50% from 2019 to 2023, driven in part by the growth of data centers needed to power AI.
“Without the efficiency of these chips, the numbers could have ended up in a very different place,” Vahdat said. “We will keep working to reduce the carbon footprint of our infrastructure, around the clock, and drive it closer to zero.”
Cooling the servers that train and run AI requires vast amounts of water, which is why, starting with its third-generation TPUs, Google has used direct-to-chip cooling, which consumes far less water. It's also how Nvidia cools its latest Blackwell GPUs.
Despite the challenges, from geopolitics to power and water, Google remains committed to its generative AI tools and to making its own chips.
“We've never seen anything like this before, and it's not showing any signs of slowing down yet,” Vahdat said. “That's where hardware is going to play a really important role.”