As a result of their research, a team from Google and Tel Aviv University has developed a generative AI game engine that can simulate the cult favorite DOOM at over 20 frames per second.
The research, detailed in a paper (PDF) published yesterday, shows how reinforcement learning and diffusion models can be combined to simulate a game engine in real time.
The model, called GameNGen (pronounced “game engine”), was trained on DOOM, but the researchers note that the approach used is not specific to that game and can be applied to a variety of titles.
Traditional game engines are hand-coded to follow a loop that tracks user input, updates game state, and renders pixels to the screen. If they do this fast enough, it creates the illusion of moving through and interacting with a virtual environment.
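That classic loop can be sketched in a few lines of Python. This is a generic illustration of the pattern, not DOOM's actual engine code; the callback names are placeholders:

```python
import time

def run_game_loop(read_input, update_state, render, target_fps=60):
    """Minimal fixed-rate game loop: poll input, advance state, draw a frame."""
    frame_budget = 1.0 / target_fps
    state = {"running": True, "ticks": 0}
    while state["running"]:
        start = time.perf_counter()
        actions = read_input()                 # 1. track user input
        state = update_state(state, actions)   # 2. update game state
        render(state)                          # 3. render pixels to the screen
        # Sleep off any leftover budget to hold a steady frame rate.
        elapsed = time.perf_counter() - start
        if elapsed < frame_budget:
            time.sleep(frame_budget - elapsed)
```

Run the loop fast enough (60 iterations per second is typical today) and the discrete frames blur into continuous motion.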
[Embedded YouTube video]
GameNGen works differently: every frame is generated on the fly by the model, conditioned on the player's actions and the past few frames. You might expect the researchers to have trained it on hours of footage from real players, but according to the researchers, collecting human gameplay at that scale was not practical.
Instead, the first phase of GameNGen's training was to create a reinforcement learning agent that learned how to play DOOM. The data generated from these training sessions was used to train a custom diffusion model based on Stable Diffusion v1.4 that renders the game.
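The two-phase pipeline can be sketched as follows. The `env`, `agent`, and `model` interfaces here are hypothetical stand-ins to show the data flow, not the paper's actual code:

```python
# Phase 1: an RL agent plays the game, and its trajectories are logged.
# Phase 2: a diffusion model is trained on that log to render the next frame.

def collect_trajectories(env, agent, episodes):
    """Let the agent play; record (frame, action, next_frame) transitions."""
    dataset = []
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            action = agent.act(obs)
            next_obs, reward, done = env.step(action)
            dataset.append((obs, action, next_obs))
            agent.learn(obs, action, reward, next_obs)  # RL update
            obs = next_obs
    return dataset

def train_diffusion_model(model, dataset):
    """Train the renderer: conditioned on past frame + action, denoise
    toward the true next frame (a Stable Diffusion v1.4 variant in the paper)."""
    for past_frame, action, next_frame in dataset:
        model.training_step(context=(past_frame, action), target=next_frame)
```

The key design choice is that the agent's job is only to generate diverse, plausible gameplay; the diffusion model never sees the game's code, only pixels and actions.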
According to the researchers, when running on a single TPU v5, GameNGen was able to achieve around 20 FPS. That's a far cry from the 60+ FPS expected of most modern first-person shooters, but it's worth noting that OG DOOM maxed out at 35 FPS.
The researchers note that reducing the number of denoising steps to one speeds things up to 50 FPS, but at the expense of image quality.
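To put those numbers in frame-time terms: assuming a simple linear cost model (a fixed cost per denoising step plus constant per-frame overhead) and taking four denoising steps for the ~20 FPS configuration (the paper's default; our assumption here, not stated in this article), the two reported frame rates imply roughly 10 ms per denoising step:

```python
def fit_linear_cost(steps_a, ms_a, steps_b, ms_b):
    """Fit frame_time_ms = per_step * steps + overhead from two measurements."""
    per_step = (ms_a - ms_b) / (steps_a - steps_b)
    overhead = ms_a - per_step * steps_a
    return per_step, overhead

# ~20 FPS -> 50 ms/frame at (assumed) 4 denoising steps
# ~50 FPS -> 20 ms/frame at 1 denoising step
per_step, overhead = fit_linear_cost(4, 1000 / 20, 1, 1000 / 50)
# per_step = 10.0 ms, overhead = 10.0 ms under this model
```

Under these assumptions, denoising dominates the frame budget, which is why cutting steps buys so much speed.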
In terms of visual quality, the researchers claim that the generated frames are comparable to lossy JPEG compression, and that “human raters were only slightly better than chance at distinguishing between short clips of the game and clips of the simulation.” We've embedded the video so you can judge for yourself, but it's worth noting that these “short clips” only last between 1.6 and 3.2 seconds of gameplay.
As you might expect, GameNGen is currently only a proof-of-concept and has many limitations, as highlighted in the paper. One of the biggest is memory: the model conditions only on a short window of past frames, giving it an effective memory of about 3 seconds of gameplay.
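At roughly 20 FPS, 3 seconds of memory works out to about 60 frames of conditioning context (a back-of-the-envelope figure, not one from the paper). A bounded buffer illustrates the effect: once the window is full, every new frame pushes the oldest one out of the model's "memory":

```python
from collections import deque

FPS = 20
CONTEXT_SECONDS = 3
CONTEXT_FRAMES = FPS * CONTEXT_SECONDS  # ~60 frames of conditioning context

# maxlen makes this a sliding window: appending to a full deque
# silently evicts the oldest entry, so older frames are forgotten.
context = deque(maxlen=CONTEXT_FRAMES)
for frame_id in range(200):   # simulate 10 seconds of play
    context.append(frame_id)
```

Anything that happened more than ~3 seconds ago is simply gone, which is what makes persistent game state (ammo counts, opened doors) so hard for this architecture.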
The fact that game logic can function despite this limitation is, in the researchers' words, “remarkable.”
Another limitation highlighted in the paper is that relying on a reinforcement learning agent as the source of training data means not every corner of the original game gets covered: “Our agent, even at the end of training, has not explored every location and interaction in the game, which leads to erroneous behavior.”