New AI tool creates Doom gameplay


Researchers from Google and Tel Aviv University have unveiled a new AI model that can simulate the classic 1993 first-person shooter game Doom in real time using AI image generation techniques. The neural network system, known as GameNGen, functions as a limited game engine and could open up new possibilities in real-time video game synthesis. Instead of using traditional rendering techniques to draw each video frame, future games might use an AI engine to generate, or “hallucinate,” graphics in real time.

App developer Nick Dobos said, “Why write complex rules for software by hand when the AI can just think every pixel for you?”

GameNGen can generate new frames of gameplay at over 20 frames per second using a single tensor processing unit (TPU), a specialized processor optimized for machine learning tasks. In tests, human raters sometimes failed to distinguish between actual game footage and outputs generated by GameNGen, identifying true gameplay footage only 58 to 60 percent of the time. In one example, GameNGen interactively simulates Doom using only its image synthesis model.

This form of real-time video game synthesis has been hinted at by Nvidia CEO Jensen Huang, who suggested that most video game graphics could be AI-generated in real time within the next five to 10 years. GameNGen builds on previous work, including a 2020 project and Google’s own research in March. Earlier this year, university researchers trained an AI model to simulate vintage Atari video games using a diffusion model.

Research into AI video synthesis models like Runway’s Gen-3 Alpha and OpenAI’s Sora is heading in a similar direction.

Simulating Doom with AI-generated frames

In a preprint research paper, authors Dani Valevski, Yaniv Leviathan, Moab Arar, and Shlomi Fruchter explain how GameNGen works.

It uses a modified version of Stable Diffusion 1.4, an image synthesis diffusion model released in 2022, to generate each frame. After being trained on extensive footage of Doom in action, the diffusion model predicts the next game state from the previous ones. Developing GameNGen involved a two-phase training process.
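The paper's actual implementation is not reproduced here, but a minimal sketch can make the idea of action-conditioned next-frame prediction concrete. Everything below is an illustrative assumption: the `unet` and `vae` objects, their call signatures, and the deliberately simplified denoising update are placeholders, not GameNGen's code.

```python
# Hypothetical sketch of action-conditioned next-frame prediction with a
# latent diffusion model. Object names and the simplified denoising update
# are illustrative assumptions, not the GameNGen implementation.
import torch

def predict_next_frame(unet, vae, past_frames, past_actions, num_steps=4):
    """Generate one new game frame conditioned on recent frames and actions.

    past_frames:  (T, 3, H, W) tensor of the most recent RGB frames
    past_actions: (T,) tensor of discrete action ids (move, turn, fire, ...)
    """
    with torch.no_grad():
        # Encode the frame history into latents, Stable-Diffusion style.
        context = vae.encode(past_frames)             # (T, C, h, w) latents
        # Start the new frame from pure noise.
        latent = torch.randn_like(context[-1:])
        # Iteratively denoise, conditioning on the frame and action history.
        for t in reversed(range(num_steps)):
            noise_pred = unet(latent, timestep=t,
                              frame_context=context,
                              action_context=past_actions)
            latent = latent - noise_pred              # deliberately simplified
        # Decode the clean latent back to an image.
        return vae.decode(latent)
```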

Researchers first trained a reinforcement learning agent to play Doom, recording its gameplay to create a training dataset. They then used this data to train the custom Stable Diffusion model. However, using Stable Diffusion introduced some graphical glitches, particularly affecting small details and the heads-up display (HUD) bar at the bottom of the screen.
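A rough outline of that two-phase pipeline might look like the following. The Gym-style `env`/`agent` interface and helpers such as `batch_histories` and `denoising_loss` are assumptions made for illustration, not the authors' published training code.

```python
# Illustrative two-phase pipeline: (1) an RL agent plays Doom and records
# gameplay, (2) a diffusion model is fine-tuned on those recordings.
# The env/agent interface and helper names are assumed, not from the paper.

def collect_gameplay(env, agent, num_episodes):
    """Phase 1: let the trained agent play and log (frame, action) pairs."""
    dataset = []
    for _ in range(num_episodes):
        frame = env.reset()
        done = False
        while not done:
            action = agent.act(frame)
            dataset.append((frame, action))
            frame, _, done, _ = env.step(action)  # Gym-style step (assumed)
    return dataset

def train_world_model(diffusion_model, dataset, epochs=1):
    """Phase 2: fine-tune the diffusion model to predict each frame from
    the frames and actions that came before it."""
    for _ in range(epochs):
        # batch_histories is an assumed helper that slices the recordings
        # into (context_frames, context_actions, target_frame) examples.
        for ctx_frames, ctx_actions, target_frame in batch_histories(dataset):
            loss = diffusion_model.denoising_loss(target_frame,
                                                  context_frames=ctx_frames,
                                                  context_actions=ctx_actions)
            loss.backward()
            diffusion_model.optimizer_step()
```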

Maintaining the visual clarity and consistency of images over time, known as “temporal coherency,” is another challenge. The researchers noted that “interactive world simulation is more than just very fast video generation.” The need to condition on a stream of input actions during generation can break some assumptions of existing diffusion model architectures, leading to instability and a decline in the quality of the generated world over time.

The potential for AI to revolutionize real-time video game graphics continues to grow, and GameNGen’s advancements hint at a future where classic games might be reimagined in new ways.
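As a closing illustration of why that drift is hard to avoid, here is a hypothetical sketch of the autoregressive loop such a system has to run: each generated frame is fed back in as context for the next one, so any per-frame error compounds over time. It reuses the `predict_next_frame` sketch from above, and `read_player_action`, `context_len`, and the other names are assumptions, not GameNGen's interface.

```python
# Hypothetical interactive loop: generated frames feed back in as context,
# which is how small per-frame errors can compound into temporal drift.
import torch
from collections import deque

def run_interactive_session(unet, vae, initial_frames, initial_actions,
                            read_player_action, context_len=32, max_steps=1000):
    frames = deque(initial_frames, maxlen=context_len)    # (3, H, W) tensors
    actions = deque(initial_actions, maxlen=context_len)  # int action ids
    for _ in range(max_steps):
        actions.append(read_player_action())  # keyboard/controller input
        # Condition on generated (not ground-truth) history.
        new_frame = predict_next_frame(unet, vae,
                                       torch.stack(list(frames)),
                                       torch.tensor(list(actions)))
        frames.append(new_frame.squeeze(0))
        yield new_frame
```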