Did you know you can play Doom on a diffusion model now? It’s true, Google just announced it! Just don’t read the paper too closely. In their paper “Diffusion models are real-time game engines,” Go…
This is conceptually different: it just generates a few seconds of Doom-like video that you can slightly influence by sending inputs, and pretends that In The Future™ entire games could be generated from scratch and playable on Sufficiently Advanced™ autocomplete machines.
I skimmed the paper, but I don't see the part where the game engine was actually being played. They trained an "agent" to play Doom using ViZDoom, and trained the diffusion model on the agent's "trajectories". But I didn't see anything about giving the agent the diffusion model's output to play on, or about the diffusion model reacting to input.
It seems like it was able to generate Doom video from a given trajectory, and they assume that trajectory could just as well be real-time human input? That's the best I can come up with. And the experiment was just some people watching video clips, which doesn't track with the claims at all.
I didn't seek out the video before; I'd only read about all the glaring problems. But one thing no one pointed out was... why is the entire thing in slow motion?
Like, you know Doom? The game where the brisk pace and constant movement are a core part of its DNA? Witness it running at 0.5x speed and like 5 FPS in the year of our acausal robot lord 2024.
It's running slowly because it's running at such a low framerate. The speed and the framerate are tied. Old console games used to work that way, which was a problem because the same game would run at different speeds in different regions (PAL vs. NTSC). This is a solved problem in modern games: just separate the game logic from the display logic. But this AI can't do that, because there is nothing but the video.
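For anyone curious what "separate the game logic from the display logic" usually looks like, here is a minimal sketch of a fixed-timestep loop next to a frame-coupled one. It simulates time instead of sleeping, so it runs instantly. The player speed, the 60 FPS tuning of the coupled version, and the function names are all hypothetical; the 35 Hz logic tick is borrowed from the original Doom engine's tic rate.

```python
SPEED = 10.0                    # hypothetical player speed, units per second of game time
STEP_PER_FRAME = SPEED / 60.0   # coupled design: per-frame movement tuned assuming 60 FPS
LOGIC_DT = 1.0 / 35.0           # decoupled design: fixed logic tick (Doom's 35 Hz tic rate)


def coupled_distance(render_fps: float, seconds: float) -> float:
    """Logic advances once per drawn frame, so game speed scales with FPS."""
    frames = int(render_fps * seconds)
    return frames * STEP_PER_FRAME


def decoupled_distance(render_fps: float, seconds: float) -> float:
    """Fixed-timestep loop: logic ticks every LOGIC_DT no matter how often we draw."""
    frame_time = 1.0 / render_fps
    accumulator = 0.0
    position = 0.0
    for _ in range(int(render_fps * seconds)):
        accumulator += frame_time          # real time elapsed since the last drawn frame
        while accumulator >= LOGIC_DT:     # run as many logic ticks as that time covers
            position += SPEED * LOGIC_DT
            accumulator -= LOGIC_DT
        # render(position) would go here; drawing never changes the game clock
    return position


if __name__ == "__main__":
    for fps in (60, 5):
        print(fps, round(coupled_distance(fps, 10), 2), round(decoupled_distance(fps, 10), 2))
```

Over 10 simulated seconds, the coupled version covers about a twelfth of the distance at 5 FPS that it covers at 60 FPS, while the fixed-timestep version covers roughly the same distance at either rate. A model whose only state is the previous frames has no separate clock to tick, so it behaves like the coupled version.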
Add to that the fact that the AI was probably trained on high-framerate footage but can only generate low-framerate footage, and you get (gestures wildly) this.
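The slow-motion effect comes down to one ratio. As a back-of-the-envelope sketch, assume each generated frame stands in for one frame interval of the training footage. Then apparent speed = generated FPS ÷ training FPS. The function name and the example frame rates are hypothetical, picked to reproduce the roughly 0.5x eyeballed above, and are not figures from the paper.

```python
def apparent_speed(training_fps: float, generated_fps: float) -> float:
    """Playback speed relative to real time, assuming (hypothetically) that each
    generated frame advances the game by one training-frame interval."""
    game_seconds_per_frame = 1.0 / training_fps   # game time each frame represents
    return generated_fps * game_seconds_per_frame  # game seconds shown per real second


# Hypothetical numbers: footage at 10 FPS, generation at 5 FPS -> half speed.
print(apparent_speed(10, 5))
```

If generation ever kept up with the training frame rate, the ratio would reach 1.0 and the slow motion would go away. The choppiness would too, since the two are the same bottleneck.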