Intel Gaudi 3 for AI Training and Inference

Intel Gaudi 3 OAM working sample package 1

Until Falcon Shores arrives, Intel's main AI chip is the Intel Gaudi 3, and some new details emerged at Hot Chips 2024. We have been covering the part for a while now (e.g. in April 2024), and it is expected to move from sampling to production in 2024.

This is being covered live, so please excuse any typos. My fingers are getting rough by 5pm.

Intel Gaudi 3 for AI Training and Inference

This is the third generation of Gaudi, a line that dates back to around 2019. It brings further improvements in compute, memory bandwidth, and memory capacity.

Intel Gaudi 3 Hot Chips 2024_Page_02

This is the OAM module. It has two interconnected compute dies that are mirror images of each other.

Intel Gaudi 3 Hot Chips 2024_Page_03

Here is the block diagram. What is really interesting here is that we have 14 decoders for HEVC, H.264, JPEG, and VP9, which matter for video inference. We also get a lot of speeds and feeds.

Intel Gaudi 3 Hot Chips 2024_Page_04

Each die has two DCOREs (Deep Learning Cores). Each DCORE has a pair of Matrix Multiplication Engines (MMEs), 16 Tensor Processor Cores (TPCs), and 24MB of cache.

Intel Gaudi 3 Hot Chips 2024_Page_05
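As a quick sanity check on those numbers, here is a minimal Python sketch that rolls the figures up to the full two-die package. It assumes the MME, TPC, and cache counts are per-DCORE quantities; the constant and function names are made up for illustration, and the totals are derived rather than quoted from the slides.

```python
# Figures taken from the text above, read as per-DCORE quantities (assumption).
DIES_PER_PACKAGE = 2
DCORES_PER_DIE = 2
MMES_PER_DCORE = 2
TPCS_PER_DCORE = 16
CACHE_MB_PER_DCORE = 24


def package_totals():
    """Derive whole-package totals from the per-DCORE figures."""
    dcores = DIES_PER_PACKAGE * DCORES_PER_DIE
    return {
        "dcores": dcores,
        "mmes": dcores * MMES_PER_DCORE,
        "tpcs": dcores * TPCS_PER_DCORE,
        "cache_mb": dcores * CACHE_MB_PER_DCORE,
    }


print(package_totals())
# {'dcores': 4, 'mmes': 8, 'tpcs': 64, 'cache_mb': 96}
```

Under that reading, the package works out to 8 MMEs, 64 TPCs, and 96MB of cache, which lines up with the headline totals Intel has published for Gaudi 3.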

The Matrix Multiplication Engine (MME) is the large-scale matrix calculation engine of the Gaudi 3 accelerator.

Intel Gaudi 3 Hot Chips 2024_Page_06

The Tensor Processor Cores (TPCs) handle the non-matmul computations.

Intel Gaudi 3 Hot Chips 2024_Page_07

L2, L3, and HBM all sit in a unified memory space. A memory context ID allows shared cache lines to be tagged. There is also a near-memory compute capability that takes some work off the TPCs.

Intel Gaudi 3 Hot Chips 2024

Gaudi 3 also has its own control path and runtime drivers.

Intel Gaudi 3 Hot Chips 2024_Page_09

A quick word here about the Intel Gaudi software suite: I wish Intel had gone a step further and just talked about the Gaudi suite for Falcon Shores. If Falcon Shores is a 2025 product, that should be on the table.

Intel Gaudi 3 Hot Chips 2024_Page_10

The graph compiler orchestrates how work is divided among the accelerators. The network-on-chip (NoC) bandwidth is designed to support the MMEs and TPCs working in parallel.

Intel Gaudi 3 Hot Chips 2024_Page_11
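To make the MME/TPC split concrete, here is a toy Python sketch of routing the operations in a small graph to one engine type or the other. This is not Intel's graph compiler or its API: the `Op` type, the `MATMUL_KINDS` set, and the routing rule are all hypothetical, and a real compiler also has to deal with dependencies, tiling, and pipelining.

```python
from dataclasses import dataclass

# Hypothetical set of op kinds this sketch treats as matrix multiplications.
MATMUL_KINDS = {"matmul", "conv2d", "linear"}


@dataclass(frozen=True)
class Op:
    name: str
    kind: str


def assign_engine(op: Op) -> str:
    """Send matmul-like ops to the MME and everything else to the TPC."""
    return "MME" if op.kind in MATMUL_KINDS else "TPC"


def partition(ops):
    """Group op names into one queue per engine type, preserving order."""
    queues = {"MME": [], "TPC": []}
    for op in ops:
        queues[assign_engine(op)].append(op.name)
    return queues


# A tiny attention-like chain of ops, used only as example input.
graph = [
    Op("qk", "matmul"),
    Op("scale", "mul"),
    Op("probs", "softmax"),
    Op("out", "matmul"),
]
print(partition(graph))
# {'MME': ['qk', 'out'], 'TPC': ['scale', 'probs']}
```

With both queues populated, the two engine types can in principle be kept busy at the same time, which is the kind of parallel work the on-chip bandwidth is said to be sized for.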

When I saw Habana Labs at Hot Chips 31 in 2019 (when Hot Chips was last held at Stanford Memorial Auditorium), this was one of the cool features: Habana uses RDMA Ethernet networking directly from the accelerators to connect them to each other and into larger topologies.

Intel Gaudi 3 Hot Chips 2024_Page_12
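For a sense of what connecting each accelerator to every other one implies, here is a small Python sketch that enumerates the peer pairs in a fully connected group. The group size of 8 and the idea of one logical link per pair are assumptions for illustration; the text above does not give port counts or per-link bandwidth, and the function names are made up.

```python
from itertools import combinations


def full_mesh_links(num_accelerators: int):
    """Return every unordered pair of accelerators in an all-to-all group."""
    return list(combinations(range(num_accelerators), 2))


def peers_of(links, node: int):
    """List the accelerators that share a link with the given node."""
    return sorted(b if a == node else a for a, b in links if node in (a, b))


links = full_mesh_links(8)  # 8 accelerators is an assumed group size
print(len(links))           # 28
print(peers_of(links, 0))   # [1, 2, 3, 4, 5, 6, 7]
```

The pair count grows as n(n-1)/2, so a direct all-to-all mesh only stays practical within a small group. Larger topologies need switched Ethernet between groups.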

Here are some performance benchmarks. Despite the scaling shown, it appears that Llama3-8B is still being optimized.

Intel Gaudi 3 Hot Chips 2024_Page_13

Gaudi 3 is designed to scale out easily over standard Ethernet networks.

Intel Gaudi 3 Hot Chips 2024_Page_14

At the same time, there is the question of whether “at any scale” is meant literally, or whether it has actually been tested on high-end systems with 65,000 or 100,000+ accelerators.

Final Words

This is a chip that is ramping into production, so we should start seeing more of it soon. After Gaudi 2 was showcased in the Intel Developer Cloud last year, we got our first look at the Gaudi 3 UBB earlier this year.

In April 2024, the Supermicro Gaudi 3 box was also unveiled.

Supermicro SYS 822GA NGR3 Intel Gaudi 3 8 Way 2

There is a lot here, and we would like to see it rolled out at scale.
