As businesses race to adopt generative AI and bring new services to market, demands on data center infrastructure are at an all-time high. Training large language models is one challenge, but delivering real-time services that leverage LLMs is another.
In the latest round of MLPerf industry benchmarks, Inference v4.1, the NVIDIA platform was the top performer in all data center tests. The first submission for the upcoming NVIDIA Blackwell platform showed that, using its second-generation Transformer Engine and FP4 Tensor Cores, it delivered up to 4x the performance of NVIDIA H100 Tensor Core GPUs on MLPerf's largest LLM workload, Llama 2 70B.
NVIDIA H200 Tensor Core GPUs achieved outstanding results across all benchmarks in the Data Center category, including the latest addition to the benchmark, the Mixtral 8x7B Mixture of Experts (MoE) LLM, which has 46.7 billion parameters in total, with 12.9 billion active per token.
MoE models are gaining popularity as a way to bring more versatility to LLM deployments, as they can answer a wider variety of questions and perform a wider variety of tasks in a single deployment. They are also more efficient, as they only activate a smaller number of experts per inference, meaning they deliver results much faster than dense models of a similar size.
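To make that property concrete, here is a minimal, illustrative sketch in PyTorch of top-k expert routing: a router scores each token and only the selected experts run, so only a fraction of the layer's parameters are used per token. This is not Mixtral's actual implementation; the layer sizes, expert count and top-k value are placeholders chosen for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Illustrative mixture-of-experts layer: a router picks the top-k
    experts for each token, so only a small subset of the total
    parameters is active per token."""

    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 64)                     # 4 tokens, d_model=64
print(TinyMoELayer()(tokens).shape)             # torch.Size([4, 64])
```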
The continued growth of LLMs demands more computing power to handle inference requests. To meet the real-time latency requirements of today's LLMs, and to serve as many users as possible, multi-GPU computing is a must. Based on the NVIDIA Hopper architecture, NVIDIA NVLink and NVSwitch provide high-bandwidth communication between GPUs, delivering significant benefits for real-time, cost-effective large-model inference. The Blackwell platform further extends the capabilities of NVLink Switch with a larger, 72-GPU NVLink domain.
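As a rough illustration of what multi-GPU serving looks like in practice, the sketch below uses the open-source vLLM library to shard a large model across several GPUs with tensor parallelism, where the shards exchange activations over the GPU interconnect. The checkpoint name and GPU count are illustrative, and this is not the configuration used in NVIDIA's MLPerf submissions.

```python
from vllm import LLM, SamplingParams

# Illustrative only: shard an (assumed) 70B checkpoint across 8 GPUs.
llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # placeholder model name
    tensor_parallel_size=8,                  # split weights across 8 GPUs
)

outputs = llm.generate(
    ["Explain why MoE models activate only a few experts per token."],
    SamplingParams(max_tokens=128, temperature=0.7),
)
print(outputs[0].outputs[0].text)
```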
In addition to NVIDIA's submissions, 10 NVIDIA partners – ASUSTek, Cisco, Dell Technologies, Fujitsu, Giga Computing, Hewlett Packard Enterprise (HPE), Juniper Networks, Lenovo, Quanta Cloud Technology and Supermicro – all had solid MLPerf inference submissions, highlighting the broad availability of the NVIDIA platform.
Continuous software innovation
The NVIDIA platform is under continuous software development, delivering monthly improvements in performance and features.
The latest inference round delivered breakthrough performance improvements on NVIDIA products, including the NVIDIA Hopper architecture, the NVIDIA Jetson platform and the NVIDIA Triton Inference Server.
The NVIDIA H200 GPU delivered up to 27 percent higher generative AI inference performance than in the previous round, highlighting the long-term added value customers can derive from their investment in the NVIDIA platform.
Part of the NVIDIA AI platform and available in NVIDIA AI Enterprise software, the Triton Inference Server is a full-featured open-source inference server that helps organizations consolidate framework-specific inference servers into a single, unified platform, lowering the total cost of ownership for serving AI models in production and reducing model deployment time from months to minutes.
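For a sense of what that unified interface looks like from the client side, the sketch below sends an inference request to a running Triton server using its Python HTTP client. The model name and tensor names here are hypothetical and would need to match a model already loaded in the server's model repository.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server assumed to be running locally on port 8000.
client = httpclient.InferenceServerClient(url="localhost:8000")

# "INPUT0", "OUTPUT0" and "my_model" are placeholders; they must match
# the model's config.pbtxt in the server's model repository.
inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```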
This MLPerf round saw the Triton Inference Server deliver performance nearly on par with NVIDIA's bare-metal submission, demonstrating that organizations no longer have to choose between having a feature-rich, production-grade AI inference server and achieving peak throughput performance.
Heading to the Edge
Generative AI models deployed at the edge can transform sensor data, such as images and videos, into actionable insights in real time with strong contextual awareness. The NVIDIA Jetson platform for edge AI and robotics is uniquely capable of running any kind of model locally, including LLMs, vision transformers and Stable Diffusion.
In this MLPerf benchmarking round, the NVIDIA Jetson AGX Orin system-on-module achieved more than 6.2x higher throughput and 2.4x lower latency on the GPT-J LLM workload compared with the previous round. Rather than building models for specific use cases, developers can now use this general-purpose, 6-billion-parameter model to interface seamlessly with human language, bringing generative AI to the edge.
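As an illustration of how accessible that model is, the sketch below loads the public GPT-J 6B checkpoint with Hugging Face Transformers and generates text locally on a GPU. This is a generic example, not the optimized inference stack used in NVIDIA's MLPerf submission, and the prompt is invented.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the public GPT-J 6B checkpoint in half precision.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
).to("cuda")  # e.g. the integrated GPU on a Jetson AGX Orin module

prompt = "Summarize the camera feed: two forklifts are idle near dock 3."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
generated = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```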
Demonstrating performance leadership in all areas
MLPerf Inference demonstrates the versatility and performance of the NVIDIA platform across all benchmark workloads, from the data center to the edge, powering the most innovative AI-powered applications and services. For more details on these results, read our tech blog.
H200 GPU-powered systems are available today from CoreWeave, the first cloud service provider to announce general availability, as well as server manufacturers ASUS, Dell Technologies, HPE, QCT and Supermicro.
Please see the software product information notice.