For a little more than a year now, I have spent a significant amount of time learning new programming languages and frameworks. With each new language or framework, there’s a chance that it will give me more value than any language or framework I have learned so far. However, the opposite is also true.
There is a chance that the value I get from a new language or framework won’t be worth the time it takes to learn it. With each new language I want to learn, I ask myself: Will learning this language be worth the cost (time) it takes?
It turns out that this is a well-known problem: the Multi-Armed Bandit Problem. It begins with a gambler and some slot machines.
The Multi-Armed Bandit
We are introduced to a gambler who is sitting in front of a row of slot machines, also known as one-armed bandits. The gambler’s goal is to leave with as much value as possible from a finite number of pulls. No other patrons will tell the gambler which slot machines are good and which are bad. To find the best machine, the gambler must try different slot machines and compare each outcome with the outcomes of the levers already pulled.
On every pull, the gambler can either keep using the best machine found so far or pull a new lever. Pulling a new lever gains information about another machine, but it carries an opportunity cost: if the gambler pulls an unknown lever instead of the best lever so far, they give up the value they have come to expect from the best machine in exchange for the uncertain value of the new one. The new machine could be better than any of the previous machines, but the gambler can’t know that until they pull its lever.
The Solution: Explore/Exploit
A common strategy for this problem is to start with a period of exploration followed by a period of exploitation. Since the gambler doesn’t know which slot machine is the best, they must spend some pulls exploring their options to learn which of the machines they have tried offer the most value. Eventually, the opportunity cost of trying a new machine becomes high enough, partly because fewer pulls remain in which to profit from any new discovery, that the gambler should exploit the best machine found so far.
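The explore-then-exploit strategy can be sketched as a small Python simulation. This is a minimal illustration under assumptions the story leaves open, not a definitive implementation: each machine pays 1 or 0 with a fixed hidden probability, and the gambler decides up front how many pulls to spend on each machine while exploring. The function `explore_then_exploit`, its parameters, and the example probabilities are all hypothetical.

```python
import random


def explore_then_exploit(payout_probs, total_pulls, explore_pulls_per_machine, seed=0):
    """Simulate a gambler who explores every machine, then exploits the best one.

    Assumption (not from the post): machine i pays 1 with hidden probability
    payout_probs[i] and 0 otherwise. The gambler never reads payout_probs
    directly; it is only used to simulate pulls.
    Returns (index of the machine chosen for exploitation, total winnings).
    """
    rng = random.Random(seed)
    n = len(payout_probs)
    exploration_budget = explore_pulls_per_machine * n
    if exploration_budget > total_pulls:
        raise ValueError("not enough pulls to explore every machine")

    def pull(machine):
        # One simulated pull: win 1 with the machine's hidden probability.
        return 1 if rng.random() < payout_probs[machine] else 0

    winnings = 0
    observed = [0] * n  # winnings seen from each machine while exploring

    # Exploration phase: try every machine the same number of times.
    for machine in range(n):
        for _ in range(explore_pulls_per_machine):
            reward = pull(machine)
            observed[machine] += reward
            winnings += reward

    # Every machine got the same number of pulls, so the highest total is
    # also the highest average payout observed so far.
    best = max(range(n), key=lambda m: observed[m])

    # Exploitation phase: spend every remaining pull on the best machine so far.
    for _ in range(total_pulls - exploration_budget):
        winnings += pull(best)

    return best, winnings


if __name__ == "__main__":
    # Hypothetical machines: the middle one pays out most often.
    best, winnings = explore_then_exploit([0.2, 0.5, 0.35], total_pulls=1000,
                                          explore_pulls_per_machine=50)
    print(f"exploited machine {best}, won {winnings} of 1000 pulls")
```

The only tuning knob is `explore_pulls_per_machine`: more exploration makes it likelier that the gambler commits to the truly best machine, but every exploratory pull spent on a worse machine is part of the opportunity cost.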
How Does This Apply to Learning New Languages?
The question about which or how many programming languages you should learn is very much a version of the Multi-Armed Bandit Problem, but there are a few key differences worth mentioning:
- Patrons will tell you a lot about the machines before you use them. This might seem like a benefit, but it can actually do more harm than good, since other patrons may have different ideas about what makes a slot machine valuable, e.g. Is the lever on the right or left side of the machine? Is the text on the machine indented with tabs or spaces?
- The casino is continuously adding new machines. The casino is never satisfied with the machines it currently has, so it keeps adding slot machines of various shapes and sizes. This can be very problematic, since the opportunity cost shifts with each new slot machine, and the urge to try the newest ones keeps growing.
- The new machines are often new versions of machines you’ve already used. Suppose that, out of all the slot machines you’ve tried so far, the B 1.0 model is by far the most valuable. The casino recently added the B 2.0 model to the row, and it comes with plenty of fancy lights and new buttons to try out. Having already enjoyed a machine from the “B” series, you might assume the new one can only be better than the B 1.0 model. But although the B 1.0 model may be the best machine you’ve used so far, the B 2.0 model being newer doesn’t guarantee that it will be more valuable than the original.
Do these differences change the original solution to this problem? Yes, but in a good way. The key is knowing when to switch between exploration and exploitation. Instead of a single exploration phase followed by a single exploitation phase, you should switch back and forth between the two as the opportunity cost fluctuates.
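One way to sketch this back-and-forth is epsilon-greedy, a standard bandit strategy swapped in here purely for illustration (the post doesn’t prescribe an algorithm). On most pulls the gambler exploits the best machine so far; with a small probability `epsilon` they explore a random machine instead. To model a casino that keeps adding machines, this sketch also assumes each new machine gets a single trial pull when it arrives. The class, function, payout probabilities, and arrival times are all hypothetical.

```python
import random


class EpsilonGreedyGambler:
    """Epsilon-greedy play over a row of machines that can grow over time.

    With probability epsilon the gambler explores a random machine; otherwise
    they exploit the machine with the best average payout observed so far.
    A newly added machine gets one trial pull before it competes on averages.
    """

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = []   # number of pulls on each machine
        self.totals = []  # total winnings observed from each machine

    def add_machine(self):
        # The casino adds a machine; nothing is known about it yet.
        self.pulls.append(0)
        self.totals.append(0)

    def choose(self):
        # Give any untried machine a single trial pull first.
        for machine, count in enumerate(self.pulls):
            if count == 0:
                return machine
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.pulls))  # explore
        # Exploit: best average payout observed so far.
        return max(range(len(self.pulls)),
                   key=lambda m: self.totals[m] / self.pulls[m])

    def record(self, machine, reward):
        self.pulls[machine] += 1
        self.totals[machine] += reward


def simulate(payout_probs, arrival_pulls, total_pulls, epsilon=0.1, seed=0):
    """Hypothetical setup: machine i pays 1 with probability payout_probs[i]
    and joins the row at pull number arrival_pulls[i] (sorted, first is 0)."""
    if len(arrival_pulls) != len(payout_probs) or not arrival_pulls or arrival_pulls[0] != 0:
        raise ValueError("need one arrival time per machine, starting at pull 0")
    if any(later < earlier for earlier, later in zip(arrival_pulls, arrival_pulls[1:])):
        raise ValueError("arrival_pulls must be sorted")

    gambler = EpsilonGreedyGambler(epsilon, seed)
    casino = random.Random(seed + 1)  # separate RNG for machine payouts
    winnings = 0
    for t in range(total_pulls):
        # Add every machine whose arrival time has been reached.
        while (len(gambler.pulls) < len(payout_probs)
               and arrival_pulls[len(gambler.pulls)] <= t):
            gambler.add_machine()
        machine = gambler.choose()
        reward = 1 if casino.random() < payout_probs[machine] else 0
        gambler.record(machine, reward)
        winnings += reward
    return gambler, winnings


if __name__ == "__main__":
    # Three hypothetical machines; the best one arrives after 300 pulls.
    gambler, winnings = simulate([0.3, 0.4, 0.6], arrival_pulls=[0, 100, 300],
                                 total_pulls=2000)
    print("pulls per machine:", gambler.pulls, "winnings:", winnings)
```

A fixed `epsilon` is the simplest choice, but it never stops exploring. A schedule that raises `epsilon` when new machines arrive and lowers it afterward would track the shifting opportunity cost more closely, at the price of another parameter to tune.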
For example, after learning a new language such as Python or F#, it’s important to exploit that language for a while instead of running off to learn Clojure. This doesn’t mean that Clojure will never be worth your time now that you’ve learned Python or F#. There will come a time when exploring a new language is once again worth it, and recognizing those moments is crucial to your growth as a maker.
Knowing which levers to pull is essential, but it’s important to realize that this isn’t your first visit to the casino, and it won’t be your last.
I’m interested to hear how you’ve experienced the Multi-Armed Bandit Problem in your life!