We are hearing more and more about sustainable AI computing, and FuriosaAI's RNGD is almost the opposite of many of the AI computing platforms we have covered today: instead of striving for maximum performance at maximum power, it is a low-power computing solution.
This is the final talk of the day after over 10 talks, and it's being done live so please excuse any typos.
FuriosaAI RNGD Processor for Sustainable AI Computing
Here are the specs for the card. RNGD is not designed to be the fastest AI chip on the market; the focus is efficiency.
Furiosa AI RNGD Hot Chips 2024_Page_05
Here is the card with the cooler.
FuriosaAI RNGD without cooler and with cooler
For air-cooled data centers, the target TDP is only 150W.
Furiosa AI RNGD Hot Chips 2024_Page_06
The chip is built on a TSMC 5nm process, using 12-Hi HBM3 stacks and CoWoS-S packaging.
Furiosa AI RNGD Hot Chips 2024_Page_07
Rather than focusing on the H100 or B100, FuriosaAI is targeting the NVIDIA L40S. We've written extensively about the L40S before. The goal is not only to offer similar performance, but to offer that performance at a lower power.
Furiosa AI RNGD Hot Chips 2024_Page_08
Efficiency comes from hardware, software and algorithms.
Furiosa AI RNGD Hot Chips 2024_Page_09
One of the challenges FuriosaAI has been addressing is the abstraction layer between hardware and software.
Furiosa AI RNGD Hot Chips 2024_Page_11
Tensor contraction is one of the dominant operations in FuriosaAI's target workloads; on BERT, it accounted for over 99% of the FLOPS.
Furiosa AI RNGD Hot Chips 2024_Page_12
Usually, the hardware primitive is matrix multiplication, and tensor contractions are lowered to it.
Furiosa AI RNGD Hot Chips 2024_Page_13
Instead, the abstraction happens at the tensor contraction level.
Furiosa AI RNGD Hot Chips 2024_Page_14
Furiosa adds low-level einsum to its primitives.
Furiosa AI RNGD Hot Chips 2024_Page_15
Now we multiply matrices A and B to produce C.
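To make the idea concrete, here is a small NumPy sketch (my own illustration, not Furiosa's code): the A×B multiply above is just one einsum-style tensor contraction, and the same primitive also covers contractions that a plain matmul cannot express directly.

```python
import numpy as np

# A matrix multiply C[i,j] = sum_k A[i,k] * B[k,j] is one specific tensor
# contraction: the contracted index k appears in the inputs but not the output.
A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
C = np.einsum("ik,kj->ij", A, B)
assert np.array_equal(C, A @ B)  # identical to the matmul primitive

# The same contraction primitive handles shapes a bare matmul does not,
# e.g. a batched attention-style score computation contracting over dim d:
Q = np.random.rand(2, 8, 16)  # (batch, seq, dim)
K = np.random.rand(2, 8, 16)
scores = np.einsum("bqd,bkd->bqk", Q, K)
```

Treating the contraction itself as the primitive, rather than lowering everything to matmul first, is the abstraction-level choice the slides describe.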
Furiosa AI RNGD Hot Chips 2024_Page_16
Furiosa takes this and schedules it on a real architecture with memory and compute units.
Furiosa AI RNGD Hot Chips 2024_Page_17
From here on, the entire tensor contraction becomes a primitive.
Furiosa AI RNGD Hot Chips 2024_Page_18
Considering spatial and temporal orchestration can increase efficiency and utilization.
Furiosa AI RNGD Hot Chips 2024_Page_19
According to Furiosa, it has flexible reconfiguration capabilities, which are important for keeping performance high as batch sizes change.
Furiosa AI RNGD Hot Chips 2024_Page_20
Let's look at the implementation of RNGD.
Furiosa AI RNGD Hot Chips 2024_Page_21
Here is the interconnection network for accessing the scratchpad memory:
Furiosa AI RNGD Hot Chips 2024_Page_22
Furiosa uses PCIe Gen5 x16 for chip-to-chip communication, and P2P over a PCIe switch for direct card-to-card communication, so if XConn can get this right it will be a great product.
Furiosa AI RNGD Hot Chips 2024_Page_23
Furiosa supports SR-IOV for virtualization.
Furiosa AI RNGD Hot Chips 2024_Page_24
The company has worked on signal and power integrity for reliability.
Furiosa AI RNGD Hot Chips 2024_Page_25
Here is how the Furiosa LLM works in flowchart form.
Furiosa AI RNGD Hot Chips 2024_Page_27
The compiler compiles each partition of the model, with partitions mapped across multiple devices.
Furiosa AI RNGD Hot Chips 2024_Page_28
The compiler optimizes the model for better performance and energy efficiency.
Furiosa AI RNGD Hot Chips 2024_Page_29
The serving framework performs optimizations such as continuous batching to increase utilization.
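Furiosa's framework internals aren't shown, but continuous batching in general means admitting new requests into the running batch as soon as earlier sequences finish, rather than waiting for the whole batch to drain. A minimal sketch with hypothetical names (`serve`, `decode_step` are my own, purely illustrative):

```python
from collections import deque

def serve(requests, decode_step, max_batch=4):
    """Toy continuous-batching loop.

    `decode_step` advances every active sequence by one token and returns
    the sequences that finished on this step.  New requests join the batch
    immediately as slots free up, keeping utilization high.
    """
    queue = deque(requests)
    active, done = [], []
    while queue or active:
        # Admit waiting requests into any free batch slots.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        finished = decode_step(active)  # one decode iteration over the batch
        done.extend(finished)
        active = [r for r in active if r not in finished]
    return done

# Example: each request needs a different number of decode steps.
reqs = [{"id": i, "remaining": n} for i, n in enumerate([1, 3, 2, 5, 1])]

def step(batch):
    for r in batch:
        r["remaining"] -= 1
    return [r for r in batch if r["remaining"] == 0]

out = serve(list(reqs), step)
```

The point of the technique: a short request (like `id` 0 above) frees its slot after one step, and a queued request takes its place on the very next iteration instead of idling until the longest sequence completes.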
Furiosa AI RNGD Hot Chips 2024_Page_30
The company has a graph-based automated tool to assist with quantization. Furiosa can support a variety of formats, including FP8 and INT4.
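The slide doesn't detail the algorithm, but the basic per-tensor affine quantization that such tools automate can be sketched as follows (symmetric INT4 for illustration; Furiosa's actual scheme is not specified here):

```python
import numpy as np

def quantize_int4_symmetric(w):
    """Symmetric per-tensor quantization to the signed INT4 range [-8, 7]."""
    scale = np.abs(w).max() / 7.0              # map the largest magnitude to 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int4_symmetric(w)
w_hat = dequantize(q, s)
# Rounding to the nearest level bounds the error by half a quantization step.
max_err = np.abs(w - w_hat).max()
```

A graph-based tool automates exactly this kind of scale selection per tensor (or per channel) across the whole model graph, inserting the quantize/dequantize pairs where the hardware formats (FP8, INT4, etc.) require them.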
Furiosa AI RNGD Hot Chips 2024_Page_31
This is the company's development methodology.
Furiosa AI RNGD Hot Chips 2024_Page_32
Final Words
There's a lot of information here, but the quick summary is that the company is leaning on its compiler and software stack to map AI inference efficiently onto a low-power SoC.