Tesla Dojo training tiles now have wiring and piping complete
At Hot Chips 2024, we had the opportunity to learn about TTPoE (Tesla Transport Protocol over Ethernet). The talk is mostly about V1 of TTP, but the headline we can share from Hot Chips is that Tesla decided to create its own network protocol for its AI clusters instead of using TCP.
Please pardon any typos. This is being written in real time during Hot Chips at Stanford.
Tesla DOJO Exascale Lossy AI Network using Tesla Transport Protocol over Ethernet TTPoE
For Tesla's DOJO supercomputer, the company developed not only AI accelerators but also its own transport protocol over Ethernet, called Tesla Transport Protocol over Ethernet (TTPoE).
Tesla Dojo Hot Chips 2024_Page_01
Tesla says TCP/IP is too slow, while RDMA relies on PFC (Priority Flow Control) to provide a lossless fabric, and that has its own impact on the network.
Tesla Dojo Hot Chips 2024_Page_02
TTPoE is a peer-to-peer transport layer protocol that runs in hardware, with the advantage that it does not require special switches since Tesla primarily uses switches for layer 2 transport.
Tesla Dojo Hot Chips 2024_Page_03
Here are the OSI layers for DOJO. You can see that Tesla replaces the transport layer.
Tesla Dojo Hot Chips 2024_Page_04
Here is an example of a TTP transition over a TTP link:
Tesla Dojo Hot Chips 2024_Page_05
This is a comparison of the TCP state machine and the TTP state machine.
Tesla Dojo Hot Chips 2024_Page_06
This is a TTP header frame built on top of Ethernet-II framing.
Tesla Dojo Hot Chips 2024_Page_07
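To make the framing idea concrete, here is a minimal sketch of a transport header carried directly inside an Ethernet-II frame with no IP layer in between. The header layout (`opcode`, `flags`, `length`, `tx_seq`, `rx_seq`) and the helper names are hypothetical and are not taken from Tesla's slide. The EtherType is the IEEE local experimental value, used here only as a stand-in.

```python
import struct

# Stand-in EtherType: 0x88B5 is an IEEE 802 local experimental value.
# The EtherType that TTPoE really uses is not stated in this article.
ETHERTYPE_EXPERIMENTAL = 0x88B5

# Hypothetical header layout (an assumption, not Tesla's):
# opcode (1 byte), flags (1), payload length (2), tx_seq (4), rx_seq (4)
TTP_HEADER = struct.Struct("!BBHII")
ETH_HEADER = struct.Struct("!6s6sH")  # destination MAC, source MAC, EtherType


def mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def build_frame(dst, src, opcode, tx_seq, rx_seq, payload):
    """Place the transport header directly after the Ethernet-II header.

    Padding to the Ethernet minimum size and the FCS are left out.
    """
    eth = ETH_HEADER.pack(mac(dst), mac(src), ETHERTYPE_EXPERIMENTAL)
    ttp = TTP_HEADER.pack(opcode, 0, len(payload), tx_seq, rx_seq)
    return eth + ttp + payload


def parse_frame(frame):
    """Split a frame built by build_frame back into its fields."""
    dst, src, ethertype = ETH_HEADER.unpack_from(frame, 0)
    opcode, flags, length, tx_seq, rx_seq = TTP_HEADER.unpack_from(frame, ETH_HEADER.size)
    start = ETH_HEADER.size + TTP_HEADER.size
    return {
        "ethertype": ethertype,
        "opcode": opcode,
        "tx_seq": tx_seq,
        "rx_seq": rx_seq,
        "payload": frame[start:start + length],
    }


frame = build_frame("02:00:00:00:00:01", "02:00:00:00:00:02", 1, 7, 3, b"hello")
fields = parse_frame(frame)
```

Because nothing sits between the Ethernet header and the transport header, an ordinary layer 2 switch can forward such a frame based on the MAC addresses alone.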
Unlike lossless RDMA networks, TTPoE expects packet loss and retransmits lost packets. That makes it more like TCP than UDP.
Tesla Dojo Hot Chips 2024_Page_08
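The retry behavior can be illustrated with a small simulation. This is a generic stop-and-wait sketch, not Tesla's hardware algorithm: the function names, the `max_tries` limit, and the idea of modeling an ACK timeout as a `False` return value are all assumptions made for illustration.

```python
def send_with_retry(packets, channel, max_tries=4):
    """Send each packet until it is acknowledged or max_tries is exhausted.

    Returns a dict mapping sequence number to the number of transmissions used.
    """
    attempts = {}
    for seq, data in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            # True stands for "an ACK arrived before the timeout".
            if channel(seq, data):
                attempts[seq] = attempt
                break
        else:
            raise TimeoutError(f"packet {seq} not acknowledged after {max_tries} tries")
    return attempts


def lossy_channel(drop_plan):
    """Build a channel that drops the first N transmissions of chosen packets.

    drop_plan maps a sequence number to how many times it gets dropped.
    """
    remaining = dict(drop_plan)
    delivered = []

    def channel(seq, data):
        if remaining.get(seq, 0) > 0:
            remaining[seq] -= 1
            return False  # lost in the fabric, so no ACK comes back
        delivered.append((seq, data))
        return True

    channel.delivered = delivered
    return channel


link = lossy_channel({1: 2})  # packet 1 is dropped twice before getting through
attempts = send_with_retry([b"a", b"b", b"c"], link)
```

The point of the sketch is that the fabric is allowed to drop traffic and the endpoints recover, rather than the switches pausing senders to avoid drops in the first place.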
Congestion management is handled by the local link channel, rather than at the network or switch level. Tesla says that TTP supports QoS, but it is disabled.
Tesla Dojo Hot Chips 2024_Page_09
Tesla designed this as an IP block that can be built into FPGAs and silicon to put packets onto the wire.
Tesla Dojo Hot Chips 2024_Page_10
This is the TTP microarchitecture. What makes it unique is that it looks a lot like an L3 cache. The 1MB TX buffer was described as being "in this generation," so it will likely change in the next one. The last line on the slide, HBM2HBM fabric memory, is a notable feature.
Tesla Dojo Hot Chips 2024_Page_11
The 100Gbps NIC for Dojo is called Mojo. It runs at under 20W and includes 8GB of DDR4 memory and the Dojo DMA Engine, which we covered in HC34's Tesla Dojo custom AI supercomputer.
Tesla Dojo Hot Chips 2024_Page_12
Tesla then went back to its 2022 presentation, where it showcased the D1 die.
Tesla Dojo Hot Chips 2024_Page_13
Here, we see a 5×5 array of D1 dies packaged together.
Tesla Dojo Hot Chips 2024_Page_14
There is also a 32GB HBM Dojo interface processor with TTPoE support. The 900GB/s TTP interface is internal. TTPoE is wrapped in an Ethernet frame.
Tesla Dojo Hot Chips 2024_Page_15
Tesla showed how the Dojo is connected.
Tesla 100G NIC to V1 Dojo Interface Card to Dojo
It starts with the assemblies of packaged D1 tiles, with all of the SerDes cabled.
Tesla Dojo training tiles now have wiring and piping complete
Those connect to the interface card.
Tesla V1 Dojo Interface Processor Card 2
It is then connected to a low-cost 100G NIC.
Tesla Dojo 100G NIC
Here's another view of what was on the table.
Tesla Dojo Hot Chips 2024_Page_16
This is Mojo Dojo Compute Hall (MDCH) in New York, where we see a 2U compute node with no 2.5 inch storage on the front, which is very interesting.
Tesla Dojo Hot Chips 2024_Page_17
This is a 4 ExaFLOP engineering system with 40PB of local storage and loads of bandwidth and compute. Having 4EF (BF16/FP16) in an engineering system is pretty crazy.
Tesla Dojo Hot Chips 2024_Page_18
Arista provided the switches for this. As the network grows and the number of hops increases, the added latency impacts bandwidth.
Tesla Dojo Hot Chips 2024_Page_19
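One way to see why latency eats into bandwidth: if a sender can only keep a fixed amount of unacknowledged data in flight, its throughput is capped at that window divided by the round-trip time. The sketch below borrows the 1MB TX buffer figure from earlier in the talk as the window, which is an assumption for illustration, as are the function name and the example round-trip times.

```python
def max_throughput_gbps(window_bytes, rtt_us, line_rate_gbps=100.0):
    """Upper bound on throughput with at most window_bytes unacknowledged.

    Throughput cannot exceed window / RTT, and cannot exceed the line rate.
    """
    window_limited_gbps = window_bytes * 8 / (rtt_us * 1e-6) / 1e9
    return min(line_rate_gbps, window_limited_gbps)


# Assumption: treat the 1MB TX buffer as the in-flight window (1MB = 10**6 bytes).
WINDOW_BYTES = 1_000_000

# Example round-trip times in microseconds (illustrative, not measured values).
results = {rtt: max_throughput_gbps(WINDOW_BYTES, rtt) for rtt in (10, 80, 160, 320)}
for rtt_us, gbps in results.items():
    print(f"RTT {rtt_us:>3} us -> at most {gbps:.1f} Gbps")
```

Under these assumptions, a 1MB window takes 80 microseconds to drain at 100Gbps, so any round trip longer than that leaves the link idle part of the time. Each extra switch hop adds to the round trip and pushes the sender closer to that limit.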
Tesla is joining UEC and offering TTPoE. Very cool!
Tesla Dojo Hot Chips 2024_Page_20
Tesla Dojo Hot Chips 2024_Page_21
From the photos it looks like Tesla also uses Arista switches.
Tesla Dojo Hot Chips 2024_Page_22
Here's something interesting: Tesla also says that TTPoE can deliver lower one-way write latency across switches, and that comparison includes NVLink.
Tesla Dojo Hot Chips 2024_Page_23
Tesla's conclusion is that they are in the microsecond realm.
Final Words
This was an interesting talk, and it would be great to see this used outside of Dojo one day. It seems like a lot of work to build custom NICs, custom protocols, and so on for your own system rather than leverage economies of scale. It would be great to see Tesla bring this to the Ultra Ethernet Consortium.