Big Data/BI Zone is brought to you in partnership with:

Justin Bozonier is the Product Optimization Specialist at GrubHub formerly Sr. Developer/Analyst at Cheezburger. He's engineered a large, scalable analytics system, worked on actuarial modeling software. As Product Optimization Specialist he is currently leading split test design, implementation, and analysis. The opinions expressed here represent my own and not those of my employer. Justin is a DZone MVB and is not an employee of DZone and has posted 27 posts at DZone. You can read more from them at their website. View Full User Profile

Understanding Multi-Armed Bandit Algorithms

  • submit to reddit

The Scenario

Imagine you are in front of three slot machines, each with different likelihoods of paying out. How should you play them in order to maximize your payout?

This is the problem the multi-armed bandit algorithm solves.

You might say the solution lies in just trying them all out and seeing which one does better and you’d be right! But how much should you play each slot machine? How little is too little? Here is where it gets tricky. It isn’t too bad though. A little Bayesian stats trick can make this pretty easy. Really all we need to do is play the machine most likely to payout as often as possible.

Reducing the Unknown Iteratively

Now that we know we don’t need to know the exact likelihood for any of the options let’s explore how we can find the best option.

Let’s say we start with machine A, drop a quarter in, and pull back the handle. No pay out. How does that affect our belief in this machine? Turns out it’s just like the way we updated our beliefs in a coin flip in my earlier post. (You can read that here: lity_distributions.html)

A key feature we want is experimentation of the different options. We can do that by collecting a set of credible hypotheses for the likelihood of a payout and randomly choosing one to try. There is a really handy statistical function we can use to accomplish this called the “Beta” function. The Beta function is defined like this:

random likelihood = Beta(1 + plays resulting in payout, 1 + plays NOT

resulting in payout)

Imagine you’ve played slot machine A ten times without a payout. The beta function would look like this:


%load_ext rmagic
from pretty import pprint

The rmagic extension is already loaded. To reload it, use: %reload_ext rmagic


curve(dbeta(x, 1, 11))

Example Beta Curve

This shows how the likelihoods nearest zero are most likely. Anything above .4 is practically impossible. Of course zero is the single most likely likelihood, but it is not at all the only possibility. So now if we were to sample a likelihood from the above beta function we should see that we get a low lielihood for this machine.



[1] 0.04452788

Here’s where it gets cool! We can use this same technique even for the slot machines we haven’t tried yet. Let’s see if we should try slot machine B next. We’ll evaluate the beta function with no prior information. That’s a Beta(1,1). Above we use rbeta with an extra parameter (the first) to specify how many samples we want to pull from the function. So this will be all one’s.


print("Likelihood for Slot Machine B to payout")

[1] "Likelihood for Slot Machine B to payout"

[1] 0.655021

Well shucks. If Slot Machine A only has a 4.5% chance to pay out but Slot Machine B has a 65.5% chance to pay out, well, we’d be crazy to play Machine A. But, let’s see what Slot Machine C’s likelihood is.

Published at DZone with permission of Justin Bozonier, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)