The difference between the easy and the virtually impossible

I saw a very interesting tweet the other day.

In case you missed it, the AI Minecraft that this tweet refers to is this. It’s basically a neural network, trained on endless Minecraft gameplay footage from YouTube, that generates a Minecraft video on the fly. However, there is no actual game engine running in the background. No update cycle, no database, no object persistence, no if-else conditions, nothing. 100% neural network. Similar to image generation models, but generating a somewhat coherent video stream.

Well. Generating video is impressive by itself. Models like Sora have obviously achieved much better-looking video generation. However, in this case, the video is generated not only in real time, but also based on player input. That’s a huge difference: the player can actually interact with the video. The player can build, destroy, explore, fight, and so on. And the stream will react to the player’s actions.
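To make the “no object persistence” point concrete, here is a toy sketch, not the demo’s actual architecture. Everything in it is a hypothetical stand-in: `EngineWorld` keeps explicit world state like a conventional game engine, while `FrameWindowModel` only sees the last few frames it produced, and a hand-written rule takes the place of the trained neural network. The context length of 2 is an arbitrary assumption chosen for illustration.

```python
from collections import deque


class EngineWorld:
    """Conventional engine: explicit state that persists until changed."""

    def __init__(self):
        self.blocks = set()  # the "database" of placed blocks

    def step(self, action, pos, view):
        if action == "place":
            self.blocks.add(pos)
        elif action == "destroy":
            self.blocks.discard(pos)
        # Render only the blocks that fall inside the current view.
        return frozenset(b for b in self.blocks if b in view)


class FrameWindowModel:
    """Toy stand-in for a frame predictor: no world state, only recent frames.

    A real model would be a learned network; this hand-written rule just
    mimics the constraint that it can only use what it recently "saw".
    """

    def __init__(self, context_len=2):
        self.context = deque(maxlen=context_len)  # last few frames only

    def step(self, action, pos, view):
        # Everything the model "knows" is what appeared in recent frames.
        known = set().union(*self.context)
        if action == "place":
            known.add(pos)
        elif action == "destroy":
            known.discard(pos)
        frame = frozenset(b for b in known if b in view)
        self.context.append(frame)
        return frame


def play(step):
    """Place a block, look away for three frames, then look back."""
    home = {(0, 0), (1, 0)}
    away = {(5, 5)}
    inputs = (
        [("place", (0, 0), home)]
        + [("look", None, away)] * 3
        + [("look", None, home)]
    )
    return [step(action, pos, view) for action, pos, view in inputs]


engine_frames = play(EngineWorld().step)
model_frames = play(FrameWindowModel(context_len=2).step)

print("engine, after looking back:", set(engine_frames[-1]))
print("model,  after looking back:", set(model_frames[-1]))
```

In this toy, the engine still shows the placed block after the player looks back, while the frame-window model has lost it, because the block scrolled out of its short context. The point is only that a pure frame predictor has to produce plausible worlds with nothing but recent pixels and player input to go on.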

And the reaction to this demo was… let’s say people were not impressed. At all. Most made fun of it and treated it as a joke.

This made me realize that it is really hard to explain why this demo is crazy impressive to someone without a basic understanding of what’s happening on a technical level. Without that understanding, one is just looking at a very, very bad copy of Minecraft.

In CS, it can be hard to explain the difference between the easy and the virtually impossible.

Naturally, this reminds me of the classic xkcd comic called “Tasks”. Ironically, the second example in it is also trivial today. But it is still hard to explain why.

Maybe this is the inevitable nature of groundbreaking technology - the better it works, the more it gets taken for granted. After all, most of us use smartphones every day without marveling at the fact that we’re carrying supercomputers in our pockets that would have been science fiction just a few decades ago.

The AI Minecraft demo might look unimpressive at first glance, but it represents something profound.

Perhaps in a few years, we’ll look back at this moment and realize it was one of those quiet breakthroughs that seemed underwhelming at the time, but actually marked the beginning of something revolutionary, just like how that xkcd comic’s “impossible” task of determining whether a photo contains a bird is now a trivial exercise for modern AI.


Published: 2024-11-16