Welcome to the very first post of The Perceptron! Today, we’re talking about Monte Carlo methods.
Talk random to me
In Fooled by Randomness, Nassim Nicholas Taleb very famously wrote,
I have two ways of learning from history: from the past, by reading the elders; and from the future, thanks to my Monte Carlo toy.
How wonderful! Monte Carlo methods are a beautiful family of computational algorithms that, in simple terms, use a bunch of random numbers (sampling) to estimate otherwise hard-to-predict outcomes. They are used in a variety of fields such as mathematics, economics, particle physics, chemical physics, computational biology, astronomy, epidemic simulation, artificial intelligence and many others. Now, why should you trust random numbers to give you accurate predictions?
Glad you asked. The Law of Large Numbers (LLN) says that the average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer to the expected value as more trials are performed.
To illustrate this, imagine a square of dimension 1, with a quadrant of radius 1 inscribed within it.
We randomly choose different points inside this square. The ratio of the number of points inside the quadrant to the total number of points inside the square gives us an estimate of the ratio of the area of the quadrant to the area of the square, which is theoretically π/4. Multiplying that by 4 gives us the value of π.
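A minimal sketch of this experiment in Python (not the exact notebook linked below; the function name and sample sizes are my own):

```python
import random

def estimate_pi(n_points, seed=0):
    """Estimate pi by sampling points in the unit square and
    counting how many land inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quadrant
            inside += 1
    return 4 * inside / n_points  # (inside / total) estimates pi/4

for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n))
```

Each run with a different seed gives a slightly different estimate, but the spread shrinks as the number of points grows.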
As we can see from our experiments, when we increase the number of random points that we sample, our calculated value of Pi gets closer to its true value. If we increase the number of points further, we get even closer to the actual value. That's the Law of Large Numbers in action, and there goes our first Monte Carlo Simulation to calculate the approximate value of Pi!
NOTE: There seem to be certain minor differences between what we call a Monte Carlo Method and a Monte Carlo Simulation, but they are not that relevant in the context of this post, and we can think of Monte Carlo Simulations as simulations done using Monte Carlo methods.
You’d say, “But aren’t there other better ways to calculate Pi? Why use Monte Carlo for this?”
Right. Estimating Pi might not be the most compelling reason to use a Monte Carlo Simulation, but it is certainly the easiest one to grasp. Let's consider a different (and more interesting) scenario.
Suppose your friend Gyanand wants to play a game with you. He asks you to choose a parity (odd or even), and after you tell him your choice, he rolls a fair die. If the number on the die is of the same parity as your choice, Gyanand gives you One Rupee coins equal to the number on the die. However, if the number is of the opposite parity, you give him coins equal to the number on the die. What parity would you choose?
Since both parities are equally likely to be rolled, it doesn't matter what you choose. You have equal chances of winning or losing. Pretty easy. Would your answer change if instead of rolling once, Gyanand rolls the die ten thousand times? Well, since it's a fair die and each roll is independent of the others, you still have the same equal probability of winning or losing on each roll, so it shouldn't matter. Right?
I beg to differ. If I tell you that I have rolled an odd number, the expected value (or the average value) of the number would be:

E[roll | odd] = (1 + 3 + 5) / 3 = 3
On the other hand, if I have rolled an even number, its expected value would be:

E[roll | even] = (2 + 4 + 6) / 3 = 4
Since the probability of rolling an even or an odd number is equal (0.5 each), if you choose even parity, your expected reward per roll is:

0.5 × 4 + 0.5 × (-3) = 0.5
Similarly, the expected reward per roll if you choose odd parity is:

0.5 × 3 + 0.5 × (-4) = -0.5
You can see where this is going. The expected value is better if you choose the even parity, and by virtue of the LLN, the expected reward over ten thousand rolls is 10000 × 0.5 = 5000 if you choose even parity and 10000 × (-0.5) = -5000 if you choose odd. You better be even with that Gyanand.
Let’s simulate this with Monte Carlo!
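Here is a sketch of such a simulation (my own minimal version, not necessarily the linked notebook; the function name and seed are assumptions):

```python
import random

def simulate_game(choice, n_rolls=10_000, seed=42):
    """Simulate Gyanand's dice game: win the face value when the
    roll matches our chosen parity, lose the face value otherwise."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rolls):
        roll = rng.randint(1, 6)
        same_parity = (roll % 2 == 0) == (choice == "even")
        total += roll if same_parity else -roll
    return total

print("even:", simulate_game("even"))  # typically around +5000
print("odd:", simulate_game("odd"))    # typically around -5000
```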
There we go! If you run this for multiple runs, you’d still pretty much get the same results. Go on and try it out for yourself here.
We saw that Monte Carlo gives us a pretty good simulation of the outcome, especially when it's hard to predict and the answer is not deterministic (a fancy way of saying the output is not the same fixed quantity every time). And thanks to the LLN, we know that our simulation is pretty close to the expected value.
Thank you. End of this post. I hope you see how powerful Monte Carlo methods are. No?
Is it actually useful?
Apart from calculating Pi and rolling dice? Definitely! Monte Carlo has a wide range of applications in different fields.
One such field is Physics, where it's hard to determine results in dynamic systems involving a lot of variables. These slides by Prof. S. B. Santra from IIT Guwahati cover this in greater detail if you want to read further.
Another area where Monte Carlo methods give surprisingly good results is Integration, especially ones involving multi-dimensional integral variables. Going back to our Pi example, we calculated the area under the curve that defined the quadrant. That’s exactly the integration of the equation of a circle, without the crazy stuff! In fact, it’s highly likely that if you’ve searched about Monte Carlo Integration before, you’d be greeted with the same example of approximating Pi. Let’s do one more single-variable integration. We know that the Gaussian Integral
∫₋∞^∞ e^(−x²) dx = √π

is notorious for having no elementary antiderivative: the single-variable integration methods we are taught in school won't crack it, and the standard evaluation squares the integral and switches to polar coordinates. Its graph looks like this:
Using the same Monte Carlo technique on this graph, we get an estimate close to the true value of √π ≈ 1.7725 in just 30 lines of code, without using any fancy integration formulas! You can try it for yourself here. Of course, this estimate was generated in just a couple of seconds and is not the exact value. Again, as you increase the number of points further, you'll get a much closer estimate. For most scenarios, we are happy to trade off some precision for faster running times, and Monte Carlo might just be the tool to go for.
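A sketch of such an estimator (my own version, not the linked notebook; the cutoff of ±5 and the bounding-box approach are assumptions):

```python
import math
import random

def mc_gaussian_integral(n_points, x_max=5.0, seed=0):
    """Estimate the integral of exp(-x^2) over the real line by
    throwing darts at the bounding box [-x_max, x_max] x [0, 1].
    The tails beyond +/- x_max contribute negligibly."""
    rng = random.Random(seed)
    under = 0
    for _ in range(n_points):
        x = rng.uniform(-x_max, x_max)
        y = rng.uniform(0.0, 1.0)
        if y <= math.exp(-x * x):  # dart landed under the curve
            under += 1
    box_area = 2 * x_max * 1.0
    return box_area * under / n_points

print(mc_gaussian_integral(1_000_000), "vs", math.sqrt(math.pi))
```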
And what if you want to solve a multi-variable integration? It's not that hard. Most Monte Carlo methods follow a particular pattern:
Define a domain of possible inputs
Generate inputs randomly from a probability distribution over the domain
Perform a deterministic computation on the inputs
Aggregate the results
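The four steps above can be sketched on a small multi-variable example, estimating the volume of the unit ball in three dimensions (my own toy example, not from the linked notes):

```python
import random

def unit_ball_volume(n_samples, seed=0):
    # 1. Define the domain: the cube [-1, 1]^3, which has volume 8.
    # 2. Generate inputs uniformly at random from that cube.
    # 3. Deterministic computation: is the point inside the ball?
    # 4. Aggregate: fraction inside, scaled by the cube's volume.
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(n_samples)
        if sum(rng.uniform(-1, 1) ** 2 for _ in range(3)) <= 1.0
    )
    return 8 * inside / n_samples

print(unit_ball_volume(1_000_000))  # true value is 4*pi/3 ~ 4.18879
```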
You can read more about Monte Carlo Integration on these lecture notes by Richard Fitzpatrick from The University of Texas at Austin.
Portfolio Optimization

Say you have $100 and want to buy RELIANCE, HDFCBANK, INFY, TCS and HINDUNILVR (from the NIFTY50) for your portfolio. How would you divide the amount among them? One simple way is to divide the $100 equally and buy $20 worth of each asset. This means that each asset has equal weight in the portfolio. Can we do better than this?
The Modern Portfolio Theory (MPT) is a Nobel-Prize winning theory that allows investors to construct more efficient portfolios. It helps them understand the risks and returns associated with a particular portfolio based on its assigned weight to each asset. How do we know how much weight to assign to maximize returns according to our risk appetite? That’s where Monte Carlo comes to the rescue!
Instead of ‘calculating’ an efficient distribution of weights, we can simulate different portfolios with different distributions, using Monte Carlo and get the one that has the maximum returns for our risk appetite. Below, I have simulated 3000 portfolios for the assets mentioned above, from the past year’s data, to show the portfolio with maximum Sharpe Ratio (Return/Volatility) and the one with minimum volatility. So yes, Monte Carlo can help you optimize your investment portfolio and increase your wealth over time.
We can see that the portfolio with equal weights for each asset carries much more risk for its returns, and we can optimize it for better returns. The code for this simulation can be found here.
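The core of such a simulation is short. This sketch uses toy expected returns and a toy covariance matrix for five hypothetical assets, not the real NIFTY50 data used in the linked code; all numbers and names here are placeholders:

```python
import random

def random_portfolios(returns, cov, n_portfolios=3000, seed=0):
    """Sample random weight vectors and keep the portfolio with the
    highest Sharpe ratio (expected return / volatility)."""
    rng = random.Random(seed)
    n = len(returns)
    best = None
    for _ in range(n_portfolios):
        w = [rng.random() for _ in range(n)]
        s = sum(w)
        w = [x / s for x in w]  # normalize so weights sum to 1
        ret = sum(wi * ri for wi, ri in zip(w, returns))
        var = sum(w[i] * w[j] * cov[i][j]
                  for i in range(n) for j in range(n))
        sharpe = ret / var ** 0.5
        if best is None or sharpe > best[0]:
            best = (sharpe, w)
    return best

# Toy annual returns and covariance -- NOT real market data.
mu = [0.12, 0.10, 0.15, 0.11, 0.09]
cov = [[0.04 if i == j else 0.01 for j in range(5)] for i in range(5)]
sharpe, weights = random_portfolios(mu, cov)
print(round(sharpe, 3), [round(w, 2) for w in weights])
```

Swapping in estimated returns and covariances from historical price data gives the simulation described above.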
Sampling from a probability distribution
Markov Chain Monte Carlo (MCMC) methods are a class of algorithms that use Markov Chains to sample from a probability distribution. Instead of the statistically independent random points we used earlier, the random points in MCMC have some correlation, described by the Markov Chain. The Metropolis-Hastings algorithm is a popular example of an MCMC algorithm. This article by Rahul Agarwal excellently explains some applications of MCMC algorithms.
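A minimal Metropolis-Hastings sketch, targeting a standard normal distribution (my own toy example; step size and sample count are arbitrary choices):

```python
import math
import random

def metropolis_normal(n_samples, step=1.0, seed=0):
    """Metropolis-Hastings sampler targeting a standard normal.
    Each proposal is a small symmetric random step from the current
    point, accepted with probability min(1, p(new) / p(old))."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + rng.uniform(-step, step)
        # Ratio of unnormalized densities exp(-x^2 / 2)
        accept = math.exp((x * x - proposal * proposal) / 2)
        if rng.random() < accept:
            x = proposal  # move; otherwise stay at the current point
        samples.append(x)
    return samples

samples = metropolis_normal(100_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # close to 0 and 1
```

The successive samples are correlated (each depends on the previous one), which is exactly the Markov-chain structure described above.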
Dropout as a Bayesian Approximation in Neural Networks
Trust me, it isn't as scary as it sounds. Gal et al., in their paper titled "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", discuss that while Neural Networks have shown tremendous improvements in a lot of fields, we don't know how certain a model is about its prediction. For example, a simple classifier trained on pictures of cats and dogs, if shown a picture of a truck, would classify it into one of the two labels because that's the only thing it can do. On the other hand, a human, even one who has seen only cats and dogs their entire life and nothing else, would still understand that a picture of a truck should be classified as neither a cat nor a dog.
So if we want to include this type of uncertainty in our Neural Network models, one way is to sample from various similar models and calculate the mean and variance of their predictions to see how confident those predictions are. Most neural networks use Dropout to reduce over-fitting to the training data by randomly shutting down some connections in the network. The authors propose that this can be seen as random sampling from a population of similar networks. You can already see where we're going with this.
While predicting, we can use multiple forward passes of the same data to get different predictions (since Dropout kept active during prediction shuts down different connections on every pass, we get slightly different results each time). We can then see how confident our network is on the given data. This type of Dropout is known as Monte Carlo Dropout. Tobias Sterbak has written a very good article, complete with code, on this topic.
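The idea fits in a few lines even without a deep-learning framework. This sketch uses a toy one-layer "network" (a single weighted sum) to show the mechanics of repeated stochastic forward passes; the weights and inputs are made up, not from the paper:

```python
import random
import statistics

def forward(x, weights, drop_p, rng):
    """One stochastic forward pass of a toy linear 'network': each
    weight is dropped with probability drop_p and the survivors are
    rescaled by 1 / (1 - drop_p), as standard dropout does."""
    kept = [w / (1 - drop_p) if rng.random() > drop_p else 0.0
            for w in weights]
    return sum(w * xi for w, xi in zip(kept, x))

def mc_dropout_predict(x, weights, n_passes=200, drop_p=0.5, seed=0):
    """Run many stochastic passes; the spread of the predictions
    serves as an uncertainty estimate."""
    rng = random.Random(seed)
    preds = [forward(x, weights, drop_p, rng) for _ in range(n_passes)]
    return statistics.mean(preds), statistics.stdev(preds)

mean, std = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -0.2, 0.1])
print(round(mean, 2), round(std, 2))  # prediction and its uncertainty
```

A real implementation does the same thing with dropout layers left enabled at inference time.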
With this, we move to the most interesting section of this post.
What about Games? Does Monte Carlo work there too?
Absolutely. Games have always been an important part of human lives. In the quest to understand them better, we created AI that can now play many of these games better than we do. While some games can be modelled easily, some others are not so straightforward.
You must remember playing Tic-Tac-Toe in your childhood days. It is a perfect-information game, which means that the players have no information hidden from each other (unlike games such as Poker, Bluff, Bridge or Scrabble). Other examples of perfect-information games are Chess, Go and Checkers. For these types of games, certain algorithms can be used to predict the best possible moves.
Minimax, and Minimax with Alpha-Beta Pruning, are well-known algorithms that use tree search to find the best possible move in a game like Tic-Tac-Toe. What Minimax does, on a basic level, is construct a tree of all possible states of the game from a given point and use an evaluation function to assign a value to each state. Finding the best possible move is then a tree search that alternates between choosing the maximum or the minimum value, according to whose turn it is, as it proceeds down the tree.
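On a toy game tree, minimax is only a few lines. In this sketch (my own example, not a full Tic-Tac-Toe engine) the tree is given as nested lists whose leaves are evaluation scores:

```python
def minimax(node, maximizing):
    """Minimax over a tree given as nested lists; leaves are
    evaluation scores. Levels alternate between max and min."""
    if isinstance(node, (int, float)):
        return node  # leaf: return its evaluation
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Depth-2 tree: our move (max), then the opponent's reply (min).
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree, True))  # -> 3, the best guaranteed outcome
```

The opponent minimizes within each branch (3, 2, 0), and we pick the branch whose worst case is best.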
Now, tree searches take a huge amount of time for trees with a large number of moves available in each turn (the branching factor). Think of Chess, which on average has around 30 moves in a given state. Though Alpha-Beta pruning reduces the search space in such cases, it requires extensive domain knowledge to formulate a good evaluation function, which can be difficult. This is a challenge especially in games like Go. This is where Monte Carlo comes in again, in the form of Monte Carlo Tree Search (MCTS). It works well for games with higher branching factors, and since it doesn't require domain-specific knowledge, it can be generalized easily to other games.
As you might have guessed, we simulate a lot of random plays from the state we are at. If, across all the random plays, selecting a particular state results in the most wins, then (by virtue of the LLN) that state should put us at an advantage in actual play.
In mathematical terms, we use the "Upper Confidence Bound 1" (UCB1) formula:

UCB1 = wᵢ/nᵢ + C √(ln N / nᵢ)

where wᵢ is the number of wins recorded after visiting node i, nᵢ is the number of times node i has been visited, N is the number of times its parent has been visited, and C is the exploration parameter.
We calculate this for each state that we visit in our different random plays, and when deciding which node to visit next, we select the one with the highest value. By repeating this over our gameplay, MCTS allows us to predict the next move in a much simpler way. Since we have to run many random iterations, we can control the "explore vs exploit" trade-off by tweaking the exploration parameter C. A larger value prefers exploration, visiting unvisited nodes, while a smaller one exploits the already-visited ones.
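The selection step can be sketched directly from the formula (a toy example with made-up win/visit counts, not a full MCTS implementation):

```python
import math

def ucb1(wins, visits, parent_visits, c=1.41):
    """UCB1 score: exploitation (win rate) plus an exploration
    bonus that grows for rarely visited children."""
    if visits == 0:
        return float("inf")  # always try unvisited children first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Children as (wins, visits) pairs under a parent visited 100 times.
children = [(60, 79), (3, 10), (0, 0)]
scores = [ucb1(w, n, 100) for w, n in children]
best = max(range(len(children)), key=lambda i: scores[i])
print(best)  # -> 2: the unvisited child is selected and explored
```

Once every child has been visited, the trade-off kicks in: the win-rate term favors strong children, while the bonus term pulls the search back toward neglected ones.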
Given how simple MCTS is, it is widely used in various game-playing programs. In fact, DeepMind extended MCTS with Deep Neural Networks in AlphaGo, which was so powerful that it defeated the Go world champion Lee Sedol 4-1.
To read more about Tree Search (and how you can write a program for it), check out the following posts:
That was a long one! I hope this post helped you appreciate the beauty of Monte Carlo methods a little more. We started with understanding how Monte Carlo methods work and saw examples of simulating Pi and playing a simple dice game. Then we moved on to how they are used in different areas, and finally saw how they are used to create AI capable of playing different games (and even beating some of the best players) while saving computation costs in the process. The code that I used for some of the simulations in this post can be found in this Kaggle Kernel.
Discovering the different applications was the most fun part for me, and I'm sure it was for you too! It's amazing how many insights we can gain, all from randomness.
Since we started this post with a quote from Nassim Nicholas Taleb, let’s end it with one of his tweets. Can you use Monte Carlo and find the answer to the tweet below? Let me know in the comments!
If you’re new to the Perceptron and are interested to know more about AI, why not subscribe?
Update 1 [01/08/2021]: Changed the Cryptocurrency to NIFTY50 stocks in the Portfolio Optimization example and added the “Dropout as a Bayesian Approximation in Neural Networks” section.