Nvidia CEO Jensen Huang believes physical AI will be the next big trend: New robots will take many forms, but they will all be powered by AI.
Lately, Nvidia has been celebrating a future where robots are everywhere: intelligent machines taking over repetitive tasks in kitchens, factories, doctors' offices, on highways and beyond. And, of course, Jensen's company intends to supply the software and hardware needed to train and run all that AI.
What is Physical AI?
Jensen describes the current stage of AI as pioneering AI: building the foundation models and the tools needed to refine them for specific roles. The next stage, already underway, is enterprise AI, where chatbots and AI models make companies' employees, partners and customers more productive. At the apex of this stage, everyone will have their own personal AI assistant, or a fleet of AIs helping them perform specific tasks.
In these first two phases, the AI tells us or shows us things by generating a sequence of words, or tokens, predicting the most likely next one at each step. But the third and final phase, Jensen says, is physical AI, where intelligence takes physical form and interacts with the world around it. Doing that well requires integrating input from sensors and manipulating objects in three-dimensional space.
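To make the distinction concrete, here is a minimal sketch (in Python, purely illustrative and not based on any Nvidia code) of what a physical-AI control loop looks like: instead of predicting the next word, the model turns a stream of sensor readings into the next motor command.

```python
# Illustrative only: a minimal sense-think-act loop for a "physical AI" policy.
# The model, sensors and actuators here are stand-ins, not any real Nvidia API.
import random

def read_sensors():
    # Stand-in for camera/IMU/joint-encoder input.
    return {"camera": [random.random() for _ in range(8)], "joints": [0.0] * 6}

def policy(observation):
    # Stand-in for a foundation model: instead of predicting the next word,
    # it predicts the next motor command from sensor input.
    return [0.1 * sum(observation["camera"])] * 6

def send_to_actuators(command):
    print("joint targets:", [round(c, 3) for c in command])

for _ in range(3):  # a few ticks of the control loop
    obs = read_sensors()
    action = policy(obs)
    send_to_actuators(action)
```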
Jensen introduced two robot friends at GTC'24.
NVIDIA
“Building a foundational model for a general-purpose humanoid robot is one of the most exciting challenges to solve in AI today,” said Jensen Huang, founder and CEO of NVIDIA. “Enabling technologies are coming together to help leading robotics researchers around the world make a giant leap toward artificial general-purpose robots.”
So we need to design a robot and its brain – clearly a job for AI. But how do we test it against the myriad situations it might encounter, many of which can't be predicted or reproduced in the physical world, and how do we control it? As you might imagine, we use AI to simulate the world the robot will occupy and the many devices and living things it will interact with.
“You'll need three computers: one to create the AI, one to simulate the AI and one to run the AI,” Jensen said.
The Three-Computer Problem
The “three-body problem” in robotics.
NVIDIA
Jensen is talking, of course, about Nvidia's portfolio of hardware and software: the process starts with Nvidia H100 and B100 servers to create the AI, moves to Nvidia Omniverse-powered workstations and servers with RTX GPUs to simulate and test the AI and its environments, and ends with Nvidia Jetson (soon to feature Blackwell GPUs) providing on-board, real-time sensing and control.
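Strung together, the three computers form a pipeline: train, then simulate, then deploy. The sketch below is a toy illustration of that flow; the function bodies are placeholders, not Nvidia tooling.

```python
# Illustrative pipeline only -- the stage names mirror Jensen's "three computers",
# but the functions are placeholders, not Nvidia software.

def train_model(dataset):
    # Stage 1: "create the AI" on DGX-class training servers (H100/B100).
    print(f"training on {len(dataset)} demonstrations...")
    return {"weights": "checkpoint-0001"}

def simulate(model, scenarios):
    # Stage 2: "simulate the AI" in an Omniverse/Isaac Sim-style virtual world.
    results = {s: "pass" for s in scenarios}
    print("simulation results:", results)
    return all(r == "pass" for r in results.values())

def deploy(model):
    # Stage 3: "run the AI" on an embedded computer such as Jetson.
    print("flashing", model["weights"], "to the robot's onboard computer")

model = train_model(dataset=["demo_a", "demo_b", "demo_c"])
if simulate(model, scenarios=["kitchen", "warehouse", "loading dock"]):
    deploy(model)
```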
Nvidia also unveiled GR00T (Generalist Robot 00 Technology), a general-purpose foundation model for humanoid robots. Robots built on GR00T observe human behavior in order to understand and emulate it, learning the coordination, dexterity and other skills needed to navigate, adapt and interact with the real world. During the GTC keynote, Huang demonstrated several such robots onstage.
Two new AI NIMs let robotics engineers build generative physical AI simulation workflows in NVIDIA Isaac Sim, the reference application for robot simulation built on the NVIDIA Omniverse platform. The MimicGen NIM microservice generates synthetic motion data from teleoperation recordings captured with spatial computing devices such as Apple Vision Pro, while the Robocasa NIM microservice generates robot tasks and simulation-ready environments in OpenUSD, the universal framework underlying Omniverse for development and collaboration in 3D worlds.
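NIMs are packaged as containerized microservices that expose HTTP endpoints, so a robotics pipeline calls them over the network. The sketch below is a rough illustration of that pattern only; the URLs, ports and payload fields are invented for the example and are not the actual MimicGen or Robocasa APIs.

```python
# Hypothetical sketch of calling NIM-style microservices over HTTP.
# Endpoints, ports and payload schemas are invented for illustration
# (and assume the services are running locally).
import json
import urllib.request

def call_nim(url, payload):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Hand recorded teleoperation trajectories to a motion-generation service...
trajectories = call_nim(
    "http://localhost:8000/v1/generate-motions",   # hypothetical endpoint
    {"teleop_session": "vision_pro_capture_042", "num_variations": 100},
)
# ...and ask a scene-generation service for a simulation-ready environment.
scene = call_nim(
    "http://localhost:8001/v1/generate-scene",     # hypothetical endpoint
    {"task": "put mug in dishwasher", "format": "openusd"},
)
```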
Finally, NVIDIA OSMO is a cloud-native management service that enables users to orchestrate and scale complex robotics development workflows across distributed computing resources, whether on premises or in the cloud.
OSMO simplifies the creation of robot training and simulation workflows, reducing deployment and development cycle times from months to less than a week. Users can visualize and manage a wide range of tasks, including generating synthetic data, training models, conducting reinforcement learning, and testing humanoids, autonomous mobile robots, and industrial manipulators at scale.
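Conceptually, such a workflow is a set of named stages with dependencies that the service schedules across GPU resources. The toy scheduler below illustrates only the idea: the stage names come from the list above, but everything else is invented and bears no relation to OSMO's actual interface.

```python
# Illustrative only: a toy version of the kind of workflow an orchestration
# service manages -- named stages with dependencies, run in dependency order.
workflow = {
    "generate_synthetic_data": [],
    "train_model": ["generate_synthetic_data"],
    "reinforcement_learning": ["train_model"],
    "simulation_test": ["reinforcement_learning"],
}

def run(stages):
    done = set()
    while len(done) < len(stages):
        for stage, deps in stages.items():
            if stage not in done and all(d in done for d in deps):
                print("running", stage)   # a real service would dispatch this
                done.add(stage)           # to cloud or on-prem GPU nodes

run(workflow)
```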
So how do you design a robot that can grab objects without crushing or dropping them? Nvidia Isaac Manipulator delivers state-of-the-art dexterity and AI capabilities for robotic arms, built on a collection of foundation models. Initial ecosystem partners include Yaskawa, Universal Robots (a Teradyne company), PickNik Robotics, Solomon, READY Robotics and Franka Robotics.
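The "without crushing or dropping" part is a real constraint: the gripper has to squeeze hard enough that friction holds the object, but not so hard that it damages it. The back-of-the-envelope sketch below illustrates that trade-off with made-up numbers; it is not how Isaac Manipulator computes grasps.

```python
# Toy illustration of the "don't crush, don't drop" constraint: pick a gripper
# force between the slip threshold and the crush limit. Numbers are made up.
def grip_force(weight_newtons, friction_coeff, crush_limit_newtons, margin=1.5):
    # Minimum normal force so friction holds the object in a two-finger grip.
    slip_threshold = weight_newtons / (2 * friction_coeff)
    target = slip_threshold * margin
    if target > crush_limit_newtons:
        raise ValueError("object too heavy or fragile for this gripper")
    return target

# A 0.5 kg glass (about 4.9 N) with rubbery fingertips (mu ~ 0.8), crush limit 40 N.
print(round(grip_force(4.9, 0.8, 40.0), 2), "N")
```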
So how do you train a robot to “see”? Isaac Perceptor provides multi-camera 3D surround vision capabilities that are increasingly being used by autonomous mobile robots in manufacturing and fulfillment operations to reduce error rates and costs while improving efficiency and worker safety. Early adopters include ArcBest, BYD, and KION Group, who are looking to achieve new levels of autonomy in material handling operations and more.
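The core idea behind multi-camera surround vision is straightforward: each camera reports what it sees in its own frame, and the robot transforms those observations into one shared frame before planning a path. A minimal sketch, with invented camera placements and points, is below; it shows only the geometric idea, not Isaac Perceptor itself.

```python
# Sketch of the basic idea behind multi-camera surround perception: points seen
# in each camera's frame are transformed into a common robot frame and merged.
# Camera poses and detected points are made up for illustration.
import numpy as np

def to_robot_frame(points_cam, rotation, translation):
    # points_cam: (N, 3) points expressed in the camera frame.
    return points_cam @ rotation.T + translation

front = np.array([[0.0, 0.0, 2.0], [0.1, 0.0, 1.5]])  # obstacles seen ahead
rear = np.array([[0.0, 0.0, 1.0]])                    # obstacle seen behind

identity = np.eye(3)
flip = np.diag([-1.0, 1.0, -1.0])                     # rear camera faces backwards

cloud = np.vstack([
    to_robot_frame(front, identity, np.array([0.3, 0.0, 0.5])),
    to_robot_frame(rear, flip, np.array([-0.3, 0.0, 0.5])),
])
print(cloud)  # merged obstacle points in the robot's frame
```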
To run the robot, the new Jetson Thor SoC includes a Blackwell-architecture GPU with a transformer engine that delivers 800 teraflops of 8-bit floating-point AI performance to run multimodal generative AI models such as GR00T. It features a functional safety processor, a high-performance CPU cluster and 100GB of Ethernet bandwidth, greatly simplifying the design and integration effort.
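To put 800 teraflops in context, a rough back-of-the-envelope estimate (using an assumed model size and utilization figure, not Nvidia numbers) suggests comfortable headroom for real-time control:

```python
# Back-of-envelope only: roughly how fast could an 800-TFLOP FP8 chip serve a
# generative model? Assumes ~2 FLOPs per parameter per generated token and an
# illustrative utilization figure; real throughput also depends on memory
# bandwidth, batch size and the model itself.
peak_flops = 800e12          # 8-bit floating-point peak, per the announcement
params = 2e9                 # hypothetical 2-billion-parameter policy model
flops_per_token = 2 * params
utilization = 0.3            # assumed fraction of peak actually achieved

tokens_per_second = peak_flops * utilization / flops_per_token
print(f"~{tokens_per_second:,.0f} tokens per second")  # ~60,000 under these assumptions
```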
Conclusion
Just when you think it might be safe to get back in the water... dum-dum. Dum-dum. In comes the robot. Jensen believes robots need to take a human form because the factories and environments they'll operate in were designed for human operators: it's far more economical to build a human-like robot than to redesign the factories and spaces it will work in.
Even if it's just your kitchen.