Microsoft MAIA Hot Chips 2024_Page_02
A few months ago, Microsoft Azure posted a blog post about MAIA, and the company went into more detail about its custom AI accelerator at Hot Chips 2024. Let's take a closer look.
Please excuse any typos. I'm writing this live.
Microsoft MAIA 100 AI Accelerator for Azure
Microsoft built the MAIA 100 as a custom AI accelerator to run OpenAI models, with the headline benefit being clear cost savings compared to using NVIDIA GPUs.
Key specs: the chip is built on TSMC 5nm and uses TSMC CoWoS-S packaging. It has 64GB of HBM2E; using HBM2E means Microsoft is not competing with NVIDIA or AMD for cutting-edge HBM supply. Amazingly, there is a large 500MB of L1/L2 SRAM, and the chip has 12x 400GbE of network bandwidth. It is rated at a 700W TDP, but for inference Microsoft runs each accelerator at 500W in production.
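As a quick sanity check on the I/O figure, the 12x 400GbE links work out to 4.8Tbps, or 600GB/s, of aggregate network bandwidth per accelerator:

```python
# Aggregate network bandwidth from the stated 12x 400GbE links.
links = 12
gbps_per_link = 400

total_gbps = links * gbps_per_link   # 4800 Gbps
total_gb_per_s = total_gbps / 8      # 600 GB/s (decimal gigabytes)

print(f"{total_gbps} Gbps total, {total_gb_per_s:.0f} GB/s")
```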
Microsoft MAIA Hot Chips 2024_Page_03
A diagram of the tiles is shown below. Each cluster has 4 tiles, and each SoC has 16 clusters. Microsoft also has image decoders and confidential computing capabilities.
Microsoft MAIA Hot Chips 2024_Page_04
The accelerator supports a wide range of data types, including 9-bit and 6-bit formats.
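The slide does not spell out the exact encodings, but as an illustration of what a narrow 6-bit format trades away, here is a minimal symmetric quantization round-trip in plain Python. The scale choice and signed-integer grid are assumptions for the sketch, not Maia's actual formats:

```python
def quantize(values, bits=6):
    """Symmetric round-to-nearest quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 31 for a 6-bit signed format
    scale = max(abs(v) for v in values) / qmax
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

vals = [0.12, -0.5, 0.33, 0.9]
q, s = quantize(vals, bits=6)
approx = dequantize(q, s)
print(q)       # small signed integers in [-31, 31]
print(approx)  # close to, but not exactly, the originals
```

Narrower formats shrink memory traffic and widen effective compute, at the cost of the rounding error visible in the round-trip.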
Microsoft MAIA Hot Chips 2024_Page_05
Here we can see the 16 clusters laid out in the NoC (network-on-chip) topology.
Microsoft MAIA Hot Chips 2024_Page_06
Microsoft wants to use an Ethernet-based interconnect rather than something like InfiniBand, with a custom RoCE-like protocol running over Ethernet. Microsoft is also a driving force behind the Ultra Ethernet Consortium (UEC), so it makes sense that this is Ethernet-based.
Microsoft MAIA Hot Chips 2024_Page_07
On the software side, there's the Maia SDK.
Microsoft MAIA Hot Chips 2024_Page_08
The asynchronous programming model is as follows:
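The slide is light on detail, but accelerator async models of this kind typically decouple data movement from compute and synchronize the two with semaphores. A toy illustration in Python follows; the engine names and structure are my own, not the Maia SDK's:

```python
import threading
import queue

# Toy async pipeline: a "DMA engine" thread stages tiles while a
# "compute engine" thread consumes them, synchronized by the bounded
# queue's internal locking. This mirrors the general decoupled-engine
# idea, not the actual Maia programming model.
tiles = queue.Queue(maxsize=2)  # bounded: DMA may run ahead by 2 tiles
results = []

def dma_engine(data):
    for tile in data:
        tiles.put(tile)  # blocks when the staging buffer is full
    tiles.put(None)      # sentinel: no more tiles

def compute_engine():
    while (tile := tiles.get()) is not None:
        results.append(sum(tile))  # stand-in for real compute

data = [[1, 2], [3, 4], [5, 6]]
t1 = threading.Thread(target=dma_engine, args=(data,))
t2 = threading.Thread(target=compute_engine)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [3, 7, 11]
```

Because the queue is bounded, the producer naturally throttles to the consumer, which is the same back-pressure a hardware semaphore provides.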
Microsoft MAIA Hot Chips 2024_Page_09
Maia supports programming through either Triton or the Maia API. Triton is higher level, while the Maia API offers more control.
Microsoft MAIA Hot Chips 2024_Page_10
Here are the partitions and schedules for GEMM. The Excel sheet is a must-see.
Microsoft MAIA Hot Chips 2024_Page_11
The partitions and schedule for GEMM are as follows:
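As a rough illustration of the idea behind partitioning and scheduling a GEMM (block the output matrix and accumulate over K in chunks, then map blocks onto hardware tiles), here is a pure-Python blocked GEMM. The block size is arbitrary and nothing here is Maia-specific:

```python
def blocked_gemm(A, B, tile=2):
    """C = A @ B computed in tile x tile output blocks, accumulating
    over K in chunks of `tile` -- the same partitioning a scheduler
    would distribute across hardware tiles."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # output row block
        for j0 in range(0, n, tile):      # output column block
            for k0 in range(0, k, tile):  # reduction chunk
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(blocked_gemm(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

The schedule question is then which loop order and block sizes keep each tile's 500MB-class SRAM fed, which is exactly what a partitioning spreadsheet helps reason about.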
Microsoft MAIA Hot Chips 2024_Page_12
Maia 100 comes with the ability to use PyTorch models out of the box.
Microsoft MAIA Hot Chips 2024_Page_13
In this flow, we import the Maia backend and load models onto the Maia device instead of cuda.
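Out-of-tree PyTorch backends generally work this way: register a new device type, then model code just swaps the device string. A schematic of that selection logic in plain Python (the "maia" backend name and the availability check here are placeholders for the sketch, not the real Maia SDK API):

```python
def pick_device(available):
    """Prefer the custom accelerator, then CUDA, then CPU -- the same
    swap described above ('maia' instead of 'cuda'). `available` stands
    in for whatever availability checks the real backends expose."""
    for dev in ("maia", "cuda", "cpu"):
        if dev in available:
            return dev
    raise RuntimeError("no usable device")

print(pick_device({"maia", "cpu"}))  # 'maia'
print(pick_device({"cuda", "cpu"}))  # 'cuda'
```

The appeal of this model is that existing PyTorch code needs essentially a one-line change to target the new hardware.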
Microsoft MAIA Hot Chips 2024_Page_14
This is the Maia collective communication library.
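As a generic illustration of what a collective communication library provides, here is an all-reduce simulated across ranks in plain Python. The real library runs collectives like this over the Ethernet fabric described earlier; this is not its API:

```python
def all_reduce_sum(rank_buffers):
    """Simulated all-reduce: every rank ends up with the elementwise
    sum of all ranks' buffers -- the workhorse collective for gradient
    and activation exchange in multi-accelerator training."""
    summed = [sum(vals) for vals in zip(*rank_buffers)]
    return [list(summed) for _ in rank_buffers]  # one copy per rank

ranks = [[1, 2], [3, 4], [5, 6]]  # 3 ranks, 2 elements each
print(all_reduce_sum(ranks))      # [[9, 12], [9, 12], [9, 12]]
```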
Microsoft MAIA Hot Chips 2024_Page_15
Here are the tools in the Maia SDK. I have used nvidia-smi for years and now use rocm-smi regularly, so it is nice to see a maia-smi.
Microsoft MAIA Hot Chips 2024_Page_16
This gave us a bit more information and was in line with what we expected.
Final Words
Overall, we now have a lot of information about the Maia 100 accelerator. What is very interesting is that it is a 500W/700W part with 64GB of HBM2E. Due to the lower HBM capacity, it is unlikely to perform at the NVIDIA H100's level, but it also draws less power. In today's power-constrained world, we bet Microsoft can make these for a lot less than buying NVIDIA GPUs.