Elon Musk's supercomputing ambitions continued to advance this week, as he released a video on X of the newly renamed AI supercluster “Cortex” at Tesla's “Giga Texas” factory. Tesla's recent expansion of the facility houses 70,000 AI servers, requiring 130 megawatts (MW) of power and cooling at launch, with plans to scale to 500 MW by 2026.
“Video from today inside Cortex, the massive new AI training supercluster being built to solve real-world AI problems at Tesla HQ in Austin.” — Elon Musk via X, August 26, 2024
Musk's video of the Cortex supercluster shows the assembly of a massive number of server racks. From the blurry footage, the racks appear to be arranged in rows of 16 compute racks each, with around four non-GPU racks breaking up the rows, and each compute rack housing eight servers. With 16 to 20 rows of server racks visible in the 20-second clip, a rough calculation suggests we're looking at roughly 2,000 GPU servers, less than 3% of the estimated full-scale deployment.
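The back-of-the-envelope estimate above can be reproduced in a few lines. Note that the rack and row counts are rough figures read from the blurry clip, and the 70,000-server full-scale total comes from Tesla's stated Giga Texas expansion plans:

```python
# Rough estimate of GPU servers visible in the Cortex video.
compute_racks_per_row = 16    # compute racks visible per row (non-GPU racks excluded)
servers_per_rack = 8          # servers housed in each compute rack
rows_low, rows_high = 16, 20  # rows of racks visible in the 20-second clip

low = compute_racks_per_row * servers_per_rack * rows_low
high = compute_racks_per_row * servers_per_rack * rows_high

full_scale = 70_000  # Tesla's stated server count for the expansion
print(f"Visible servers: about {low} to {high}")
print(f"Share of full deployment: {low / full_scale:.1%} to {high / full_scale:.1%}")
```

At the low end this works out to 2,048 servers, just under 3% of the planned 70,000-server deployment.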
Musk said during Tesla's July earnings call that the Cortex supercluster will be Tesla's largest training cluster to date, featuring “50,000 (Nvidia) H100s and 20,000 of our own hardware.” That Dojo figure is smaller than Musk had previously stated: a June tweet estimated that Cortex would feature 50,000 units of Tesla's Dojo AI hardware. The Tesla CEO's previous statements also suggested that Tesla's own hardware would come online at a later date, with Cortex expected to be fully Nvidia-powered at launch.
According to Musk's tweet, the Cortex training cluster is being built to “solve real-world AI.” Per Tesla's Q2 2024 earnings call, that means training Tesla's Full Self-Driving (FSD) autopilot system, which will power consumer Teslas and the promised “cyber taxi” product, as well as the AI for Optimus, an autonomous humanoid robot due to enter limited production in 2025 for use in Tesla's manufacturing process.
Cortex first garnered press attention in June, when Musk showed off the giant fans under construction to cool the entire supercluster. The fan stacks will serve a liquid cooling solution provided by Supermicro, built to eventually handle the full 500 MW of power and cooling. For perspective, the average coal-fired power plant has an output of about 600 MW.
(Image credit: Elon Musk via X)
Cortex will join a fleet of supercomputers being developed by Elon Musk. The first of Musk's data centers to go live is the Memphis Supercluster, owned by xAI and equipped with 100,000 Nvidia H100s. All 100,000 GPUs in Memphis are connected to a single RDMA (Remote Direct Memory Access) fabric, also cooled with the help of Supermicro. Musk has also announced plans to build a $500 million Dojo supercomputer in Buffalo, New York, another Tesla venture.
The Memphis Supercluster is also slated to augment its H100 base with 300,000 B200 GPUs, but this large order has been delayed by several months due to a design flaw that set back Blackwell production. As one of the largest single customers of Nvidia AI GPUs, Musk appears to be betting on CEO Jensen Huang's maxim that “the more you buy, the more you save.” Only time will tell whether that holds for Musk and his growing collection of supercomputers.