Hardware is always a main focus of Nvidia's GPU Technology Conferences, and this year's event saw a preview of the "Blackwell" data center GPU, the basis of a 2025 platform that includes the "Grace" CPU, the NVLink Switch 5 chip, the BlueField-3 DPU, and other components, all of which Nvidia will be highlighting again at the Hot Chips 2024 conference this week.
What has received less attention is Nvidia's NIM strategy for making it easier and faster for developers to create AI applications. There has been plenty of buzz about Nvidia Inference Microservices, but with the likes of Blackwell on the horizon, it has been hard for NIM to get traction.
Still, NIM is important to Nvidia's larger plans to enable users to develop AI software with generative AI tools like chatbots. Nvidia says NIM delivers everything software engineers need as pre-built, containerized microservices that can be deployed to the cloud, data centers, workstations, and other systems. Built for Kubernetes, a NIM container includes open source large language models, a cloud-native stack, Nvidia's TensorRT and TensorRT-LLM libraries, the Triton inference server, and standard APIs, and it is part of Nvidia's larger AI Enterprise strategy.
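Those standard APIs are the point of the packaging: a deployed NIM exposes an OpenAI-compatible chat completions endpoint, so existing client code works against it. A minimal sketch of building such a request, assuming a hypothetical locally mapped endpoint and a Llama 3.1 model name:

```python
import json
from urllib import request

# Hypothetical local NIM endpoint; the container exposes an
# OpenAI-compatible /v1/chat/completions route on the mapped port.
NIM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> request.Request:
    """Assemble an OpenAI-style chat completion request for a NIM service."""
    payload = {
        "model": model,  # e.g. a Llama 3.1 NIM
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return request.Request(
        NIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("meta/llama-3.1-8b-instruct",
                         "Summarize our Q3 supply chain report.")
```

Sending the request (`urllib.request.urlopen(req)`) would return a JSON body in the OpenAI response shape; the sketch stops at request construction since the endpoint URL and model name are assumptions.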
Justin Boitano, vice president of enterprise AI software products at Nvidia, said NIM is part of what he calls the "second wave of generative AI," which is occurring in enterprises and enables companies to leverage organizational knowledge to run their businesses, engage with customers, and innovate much faster. The first wave, fueled by the enthusiasm following the release of OpenAI's ChatGPT in late November 2022, was driven by foundation model makers and concerned with embedding generative AI in internet services and improving individual productivity through written language and code.
In this new wave, "generative AI will help teams understand complex business processes and supply chain dependencies and bring new products and services to market at a speed no company has been able to achieve before," Boitano told journalists and analysts in a briefing ahead of the Hot Chips show in California this week. "This started with the introduction of open models such as Meta Platforms' Llama 3.1. These models represent an incredible advancement, giving companies a new level of intelligence that was almost unimaginable running in the data center just a few years ago."
NIM was created to make it possible to run such models at scale, in production, and securely, he said, adding that Nvidia is currently working with various AI model builders to use NIM to essentially give their models a high-performance, efficient runtime.
“These NIMs deliver performance optimizations, delivering token throughput efficiency two to five times faster than other solutions, optimizing the total cost of ownership for businesses running generative AI on Nvidia systems,” Boitano said. “By working with an ecosystem of community model builders, proprietary model builders and our own models, we ensure every modality for every business works seamlessly, resulting in the best token efficiency for customers using Nvidia AI Enterprise.”
At Hot Chips, Nvidia is taking another step with NIM, introducing NIM Agent Blueprints for developers who want to create custom generative AI applications. These are reference AI workflows that include sample applications based on NIM and partner microservices, reference code, documentation outlining customizations, and Helm charts (files that detail the resources for a Kubernetes cluster and package them as an application) for deploying the apps. Developers can modify the blueprints.
“It's a catalog of reference applications built for common use cases, codifying best practices Nvidia has learned from its experience with early adopters. Nvidia NIM blueprints are executable AI workflows that are pre-trained for specific use cases and can be modified by any developer. They're a starting point for executing what we believe are some of the most critical business tasks in enterprises,” Boitano said.
The NIM Blueprint is part of what Nvidia calls a “data flywheel,” which goes beyond accelerating models. Models must be enhanced and customized to address the specific needs of an organization and its use cases. In the flywheel idea, as an AI application runs and interacts with users, it generates data that can be fed back into the process and used to improve the model in a continuous learning cycle, he said.
“Nvidia NeMo is the engine that powers this flywheel,” Boitano said, adding, “The Nvidia AI Foundry is the factory that powers the NeMo flywheel, and these customized generative AI applications enable businesses to deliver better, higher-quality experiences to their customers and employees.”
He added, “The application building process actually starts with NIM, but to build a data flywheel, the Nvidia NeMo framework can be used to curate data, customize models and evaluate them to power the application and bring it back to production. NeMo accelerates all the compute-intensive stages of the generative AI app development lifecycle, and we have a broad partner ecosystem building on top of NeMo and NIM, making it easy for enterprises to develop their own generative AI applications.”
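The flywheel Boitano describes is a loop: deploy the application, collect user interactions, curate the useful ones, customize the model on them, evaluate, and redeploy. A minimal sketch of one turn of that loop, with all stage functions as hypothetical placeholders (not NeMo APIs):

```python
# Hypothetical stage functions sketching the "data flywheel" cycle.

def curate(interactions):
    """Filter raw user interactions down to useful training examples."""
    return [x for x in interactions if x.get("helpful")]


def customize(model, examples):
    """Stand-in for fine-tuning: record how many examples were applied."""
    return {**model, "tuned_on": model.get("tuned_on", 0) + len(examples)}


def evaluate(model):
    """Stand-in for an eval harness; here 'quality' just tracks tuning volume."""
    return model["tuned_on"]


model = {"name": "base-llm", "tuned_on": 0}

# Interactions gathered while the deployed app serves users.
interactions = [
    {"prompt": "q1", "helpful": True},
    {"prompt": "q2", "helpful": False},
    {"prompt": "q3", "helpful": True},
]

# One turn of the flywheel: collect -> curate -> customize -> evaluate.
examples = curate(interactions)
model = customize(model, examples)
score = evaluate(model)
```

In the real pipeline each placeholder maps to a compute-intensive NeMo stage (data curation, model customization, evaluation); the skeleton only shows how the loop's output feeds the next iteration.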
Since the early days of generative AI efforts, organizations have spoken about the need to customize their AI efforts by incorporating enterprise data into the training and inference mix. This impetus gave birth to retrieval-augmented generation (RAG).
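The core of RAG is a retrieval step: rank enterprise documents by similarity to the user's question, then prepend the top hits to the prompt before calling the model. A toy sketch using bag-of-words cosine similarity (production systems use learned embeddings and a vector database; the documents here are invented examples):

```python
import math
from collections import Counter


def _vec(text: str) -> Counter:
    """Toy bag-of-words vector; real RAG systems use learned embeddings."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query; in a RAG
    pipeline these are prepended to the prompt sent to the model."""
    q = _vec(query)
    ranked = sorted(docs, key=lambda d: _cosine(q, _vec(d)), reverse=True)
    return ranked[:k]


docs = [
    "Invoice totals for the Berlin warehouse, Q3 2024.",
    "Employee onboarding checklist and IT setup guide.",
    "Supplier contract renewal terms for raw materials.",
]
hits = retrieve("What are our supplier contract terms?", docs)
```

Here the supplier-contract document ranks first because it shares the most query terms; grounding the model's answer in that retrieved text is what lets a generic LLM answer from enterprise data it was never trained on.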
Nvidia will initially release blueprints for three scenarios. The first is Digital Humans for Customer Experience, which creates 3D digital humans that can interact with users, enabling multi-channel communication and connecting to a RAG system; the second is multimodal PDF data extraction for enterprise RAG.
“Trillions of PDFs are generated across enterprises every year, and these PDFs contain multiple data types, including text, images, graphs, tables, and more,” he said. “The Multimodal PDF Data Extraction Blueprint helps organizations accurately extract the knowledge contained within vast amounts of enterprise data, allowing users to effectively access this data through chat interfaces and quickly turn digital humans into experts on any topic, empowering employees to make smarter, faster decisions.”
The third blueprint accelerates drug discovery, using generative AI to simulate molecules that can target and bind to proteins.
Nvidia has brought on Accenture, Deloitte, SoftServe, Quantiphi, and World Wide Technology to provide NIM Agent Blueprints; Dataiku and DataRobot for model fine-tuning and monitoring; LlamaIndex and LangChain for workflow building; Weights & Biases for application assessment; and CrowdStrike, Datadog, Fiddler AI, New Relic, and Trend Micro for cybersecurity. Enterprise portfolios from Nutanix, Red Hat, and Broadcom support the blueprints.
The blueprints also run on systems from OEMs such as Cisco, Dell Technologies, Hewlett Packard Enterprise, and Lenovo, as well as on hyperscale infrastructure from Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure.