[Previous entries on DZone: 52 Papers in 52 Weeks: Turing’s "On Computable Numbers..."]
“On a scale of OMFG to WTFLOL I was like whaaaaaaat?”
That’s the line that got me hooked. While casually scrolling through candidates, the words OMFG and WTFLOL shined out from the sea of dry academese.
I loved it.
But it was published on April 1st, reads like a giant joke and contains assurances that it’s 100% real to boot. And it turns out the conference – SIGBOVIK 2013 – is an annual conference devoted to spoof research. Hmm …
After reading the paper and watching the videos I’m convinced this is either the best thought out April fool’s joke ever, or real research published in an awesome way. Either way, the nitty gritty of the paper looks believable. There’s also source code and plenty of video evidence.
I’m assuming the paper is real because I want to believe!
In this paper, @Tom7 writes about an algorithm he’s developed over several weekends that plays classical NES games. The software consists of two utilities: learnfun and playfun.
learnfun watches you play a game and figures out what it means to win.
playfun uses that knowledge to play the game. In some cases failing miserably, in others playing impressively well and even exploiting bugs.
The best part of Tom7′s approach is that it’s mathematically elegant and very stupid simple. He says so himself.
A smarter system might emulate how a human plays the game – looking at the score, lives, and looking at enemies. Instead, this system relies only on RAM snapshots and tries to make bytes go up. Doesn’t really matter what they mean, just that they go up.
After being tweaked on Super Mario the approach does smashingly well on left-to-right platformers, does a decent job of Pacman, and fails miserably on Tetris. But it’s smart enough to pause Tetris riiiiight before losing and never unpauses it.
“The Nintendo Entertainment System is probably the best video game console, citation not needed,” which I agree with. But back then hardware was much slower than even the most basic computers we have today.
Based around an 8-bit processor running at 1.79MHz (the latest iPhone’s got two 1,500MHz processors) and with just 2048 bytes of RAM the NES is relatively easy to emulate on modern computers. In fact, this whole paper is based on the idea that handling NES’s memory footprint as a data point is possible.
As a direct result of these limitations, NES programs use fixed memory locations for most things. For instance, Super Mario’s lives are always stored at address
Tom7 used the FCEUX emulator, which let him simulate inputs, play at speeds faster than native hardware allows and gave him the ability to record user inputs and RAM snapshots.
What is “win”?
The central idea of Tom7′s approach is that certain memory locations contain interesting data about the player’s progress. But looking at just these locations going up would be too naive.
You can go from level 1-4 to level 2-1, which makes a byte go down and another byte go up. So we’re going to look at lexicographic orderings and make sure those go up instead. Lexicographic orderings are mathematically beautiful and also generalize to multi-byte sequences (highest 1 byte score would be 255).
A lexicographic ordering is essentially a pair of values (w1, l1) that is smaller than (w2, l2) when w1 equals w2 and l1 < l2 or w1 < w2.
To find the objective function
playfun has to go through our recording and look for maximal tight valid lexicographic orderings in the memory snapshots. Now I’ll be perfectly honest with you and admit I’m not entirely certain what those are.
But on “a scale from simple to fancy the algorithm for finding them is a 3.″ The algorithm uses a list of locations and memories, looks for locations where memories differ and uses those as prefixes to find pairs of locations that globally go up as lexicographic orderings.
The orderings are tight when every location participates in the ordering at least once, and maximal when you can’t find a longer ordering.
In practice we ignore all RAMs until the first player input and slice the recording to avoid problems with momentarily decreasing values. The whole memory output is sliced every few frames with the idea of finding objective functions that increase steadily and don’t get confused by local noise.
To further reduce the influence of randomness and the arbitrary slicing, objective functions are weighted by how far away from ideal they are. The values of each objective function’s memory locations are ordered lexicographically and duplicates removed to produce a vector.
This vector’s value fraction is calculated as i/|V| where i is the lowest index where the vector is less than or equal to Vi, which gives us a chance to compare objectives on a global scale.
We then weight them with VF(Mm) – VF(M0) where M’s are memory slices.
Learning How to Play
playfun knows what to aim for, it needs to know what to do. Generally speaking,
playfun will be a search function that finds the optimal sequence of inputs to satisfy the objective functions from before.
The simplest approach is trying everything and seeing what works. But the search space at each frame is 28 (8 inputs), which makes it far too big. The video above shows you that this approach just makes Mario jump up until he dies of old age.
But hey, the music counter always went up, so surely
playfun was winning, right?
A better approach is taking motifs of 10 keystrokes from player inputs and weighting them by how often they appear in the sequence. “On a scale from gravitational constant to pulled it out of my arse, this is an 8.”, but 10 works well enough so just roll with it.