Today at Hot Chips 2024, Meta is presenting its next-generation MTIA, a processor designed specifically for recommendation inference.
Please pardon any typos; this is being written live during the presentation.
Meta AI Acceleration for Recommendation Inference with Next-Gen Meta MTIA
Meta uses AI heavily, and one of the big uses of AI within Meta is its recommendation engine.
Meta MTIA Hot Chips 2024_Page_04
The company says there are many challenges to using GPUs for its recommendation engine.
Meta MTIA Hot Chips 2024_Page_05
As a result, the next generation MTIA has been designed to improve TCO and handle multiple services efficiently.
Meta MTIA Hot Chips 2024_Page_06
Here are the key features of the new MTIA. The company has significantly increased compute with this generation.
Meta MTIA Hot Chips 2024_Page_07
The new chips are manufactured on TSMC 5nm and run at a 90W TDP. Another interesting point here is that Meta uses LPDDR5 for memory. This is a lower-TDP device since it is designed for the recommendation engine, but it still comes with 128GB of memory.
Meta MTIA Hot Chips 2024_Page_09
Besides the 16-channel 128GB LPDDR5 memory, there is also 256MB of on-chip SRAM for the 8×8 compute grid.
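As a quick sanity check on those numbers, here is a back-of-the-envelope sketch in Python. The 16-channel and 128GB figures come from the talk; the LPDDR5-6400 speed grade and 16-bit channel width are my assumptions for illustration, not confirmed specs.

```python
# Hypothetical back-of-the-envelope for MTIA's off-chip memory.
# 16 channels / 128 GB are from the slides; the data rate and
# channel width below are assumed, not published figures.

CHANNELS = 16
TOTAL_CAPACITY_GB = 128
DATA_RATE_MT_S = 6400      # assumed LPDDR5 speed grade
CHANNEL_WIDTH_BITS = 16    # assumed channel width

capacity_per_channel_gb = TOTAL_CAPACITY_GB / CHANNELS
per_channel_gb_s = DATA_RATE_MT_S * CHANNEL_WIDTH_BITS / 8 / 1000
total_gb_s = per_channel_gb_s * CHANNELS

print(f"{capacity_per_channel_gb:.0f} GB/channel, "
      f"{per_channel_gb_s:.1f} GB/s/channel, {total_gb_s:.1f} GB/s total")
# -> 8 GB/channel, 12.8 GB/s/channel, 204.8 GB/s total
```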
Meta MTIA Hot Chips 2024_Page_10
Each accelerator uses a PCIe Gen5 x8 host interface and is managed by RISC-V cores. It is interesting that not only is Meta not using a GPU here, it is also using RISC-V rather than Arm for control.
Meta MTIA Hot Chips 2024_Page_11
The new network-on-chip (NoC) is faster than its predecessor.
Meta MTIA Hot Chips 2024_Page_12
The processing elements are based on RISC-V cores with scalar and vector units. What is interesting here is that we can see some similarities between this and Tenstorrent's Blackhole RISC-V approach.
Meta MTIA Hot Chips 2024_Page_13
There is also a Dot Product Engine (DPE).
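For readers unfamiliar with the term, here is a minimal NumPy sketch of the kind of operation a dot product engine accelerates: batches of narrow-integer dot products with wide accumulation. The shapes and dtypes are illustrative assumptions; Meta has not published the DPE's exact configuration.

```python
import numpy as np

# Sketch of what a dot product engine accelerates: many int8 dot
# products accumulated in int32. Shapes/dtypes are illustrative only.

def dot_product_engine(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """activations: (rows, k) int8, weights: (k,) int8 -> (rows,) int32."""
    # Widen before multiplying so partial products do not overflow int8.
    return activations.astype(np.int32) @ weights.astype(np.int32)

acts = np.random.randint(-128, 128, size=(8, 64), dtype=np.int8)
wts = np.random.randint(-128, 128, size=64, dtype=np.int8)
print(dot_product_engine(acts, wts))
```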
Meta MTIA Hot Chips 2024_Page_14
Local memory is 384KB per PE. Meta said that memory and internal bandwidth must not throttle the compute in order to maintain utilization.
Meta MTIA Hot Chips 2024_Page_15
Meta has built a high-precision dynamic integer quantization engine that runs in hardware.
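MTIA does this in hardware, but the underlying math is simple. Below is a hedged NumPy sketch of dynamic quantization, where the int8 scale is derived from the live tensor's range at inference time; the symmetric, per-tensor scheme here is my assumption for illustration, not necessarily Meta's exact scheme.

```python
import numpy as np

def dynamic_quantize(x: np.ndarray):
    # Derive the scale from the runtime range of this tensor (hence "dynamic").
    max_abs = float(np.abs(x).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(4, 8).astype(np.float32)
q, scale = dynamic_quantize(x)
print("max abs error:", np.abs(dequantize(q, scale) - x).max())
```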
Meta MTIA Hot Chips 2024_Page_16
Eager mode is used to improve job startup time and responsiveness.
Meta MTIA Hot Chips 2024_Page_17
Meta is building a hardware decompression engine to allow compressed data to move around the system while saving bandwidth.
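As a rough illustration of the bandwidth argument, here is a small Python sketch, with zlib standing in for whatever codec the real engine implements (Meta did not specify one):

```python
import zlib
import numpy as np

# Ship compressed bytes over the link, decompress at the consumer.
# Pruned (sparse) weights are used here because they compress well.

rng = np.random.default_rng(0)
weights = rng.standard_normal(1 << 16).astype(np.float32)
weights[np.abs(weights) < 1.0] = 0.0   # zeroed weights aid compression

raw = weights.tobytes()
compressed = zlib.compress(raw, 6)
print(f"link traffic: {len(compressed)} vs {len(raw)} bytes "
      f"({len(raw) / len(compressed):.1f}x less bandwidth)")

assert zlib.decompress(compressed) == raw   # lossless round trip
```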
Meta MTIA Hot Chips 2024_Page_18
Meta also does weight reduction.
Meta MTIA Hot Chips 2024_Page_19
Meta's new Table Batched Embedding (TBE) operator is said to improve execution time by 2-3x, which is a big leap.
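TBE comes out of Meta's FBGEMM library. Conceptually, it fuses pooled lookups across many embedding tables into one batched operator instead of one kernel launch per table. Here is a toy NumPy sketch of the idea; the table sizes and sum pooling are made up for illustration.

```python
import numpy as np

# Four embedding tables of 1000 rows x 16 dims (sizes are illustrative).
tables = [np.random.randn(1000, 16).astype(np.float32) for _ in range(4)]

def tbe_forward(tables, indices_per_table):
    """For each table, gather rows by index and sum-pool them, then
    concatenate the pooled vectors. A real TBE fuses this into one kernel."""
    pooled = [t[idx].sum(axis=0) for t, idx in zip(tables, indices_per_table)]
    return np.concatenate(pooled)

indices = [np.random.randint(0, 1000, size=5) for _ in tables]
out = tbe_forward(tables, indices)
print(out.shape)  # (64,) = one pooled 16-dim vector per table, concatenated
```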
Meta MTIA Hot Chips 2024_Page_20
Here is the accelerator module. Each card has two MTIA chips, yet it has a relatively easy-to-cool 220W TDP. Each MTIA uses a PCIe Gen5 x8 interface, for a total of x16 per card, making efficient use of PCIe lanes.
Meta MTIA Hot Chips 2024_Page_22
Meta uses dual CPUs, wow! What are the PCIe switch and memory expansion connected to the CPUs? This is a 2024 architecture, so is this CXL or something similar?
Meta MTIA Hot Chips 2024_Page_23
Well, I've just asked the first Hot Chips question in ages, and Meta says there is an option to add more memory to the chassis, but it's not currently rolled out.
Meta also uses 12 modules per chassis, but these appear to be low power density racks with only 3 chassis per rack and 72 MTIA accelerators (~8kW for the accelerator modules, probably less than 3kW for the CPUs); they do not appear to be designed for 40kW+ racks.
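For reference, here is the arithmetic behind those rack numbers, using only figures from the talk:

```python
# Back-of-the-envelope rack power from the talk's figures:
# 3 chassis x 12 dual-MTIA modules per rack, 220 W per module.

chassis_per_rack = 3
modules_per_chassis = 12
chips_per_module = 2
module_tdp_w = 220

accelerators = chassis_per_rack * modules_per_chassis * chips_per_module
accel_power_kw = chassis_per_rack * modules_per_chassis * module_tdp_w / 1000

print(accelerators, "MTIA chips,", accel_power_kw, "kW for the accelerator modules")
# -> 72 MTIA chips, 7.92 kW
```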
Meta MTIA Hot Chips 2024_Page_25
Below is the performance relative to a baseline model on Meta's internal workloads.
Meta MTIA Hot Chips 2024_Page_26
Without knowing what the baseline is, it is a bit hard to judge how good this is.
Final Words
Overall, it is very cool to see a new recommendation accelerator from Meta. The fact that its system architecture uses some sort of shared memory over PCIe is notable, as is its choice of RISC-V over Arm. Meta is also one of the most open hyperscalers when it comes to its hardware.