Monday, December 11, 2017

Monte Carlo Techniques: Estimating the Value of Pi with Random Numbers

I want to write a series of articles on using Monte Carlo methods in computational finance.  Monte Carlo algorithms are numerical techniques that use repeated sampling of random numbers to perform the desired calculation.  Such techniques are becoming increasingly frequent even down to the level of the individual trader with some brokerages using Monte Carlo simulations to estimate the probability of reaching a given percentage of maximum profit on short-premium options trades.

While that is not as complicated as one might think, I wanted to start out with a very simple article first.  In this article, we will write a simple Python script to estimate the value of $\pi$. In a follow-up post, we will introduce the concept of a random walk and use it to model a stock's price action. In the last article of this series, we will show how to predict the probability of reaching a given percentage of profit on an option trade.

So how can we use random numbers to estimate a constant like $\pi$?  We start with a unit circle circumscribed by a square.  Since the circle has a radius of one, the side of the square must have a length of two.
Figure 1:  A unit circle centered at the origin circumscribed by a square.
For the sake of coding convinces, we'll only look at the upper right quadrant as shown below in Fig 2.  This is because we'll use the built in rand function in numpy, which selects random values between zero and one. If we restrict ourselves in this way, we won't have to rescale this range to between -1 and 1.  Rescaling is no big deal, but if we can simply avoid doing so, we will take that opportunity.
Figure 2:  We will only consider the upper left quadrant.
The square shaded above in light gray now has sides $L = 1$ and an area of $L^2 = 1$.  The quadrant in the shaded region has an area of one-quarter the original, in other words, $\pi R^2 / 4$.  If we throw a large number of darts that hit randomly in that upper right quadrant,  we would expect the ratio of the number that hit within the circle to the total number of darts thrown to approach the ratio of the circle's area to that of the square.  In other words,
$$\frac{N_{inside}}{N_{total}} \approx \frac{A_{circle}}{A_{square}} = \frac{ \frac{1}{4} \pi R^2}{1}.$$

Fig. 3 shows an animation of this process.  For the sake of clarity, we only include 25 points in this animation whereas the actual calculation we will use significantly more.
Figure 3:  25 XY-pairs drawn at random.  For large numbers of pairs, the ratio of those within the circle to the total number of points should converge to the ration of the circle's area to that of the square.
Finding if a given point is within the circle is straightforward.  Since the circle has a radius of 1, the point is inside if
$$\sqrt{ x^2 + y^2} \leq 1^2$$.
The code below implements this calculation using the Python's Numpy library.  It should print "pi =  3.140936" to the screen.


# Python 2.7 script to estimate the value of pi using Monte Carlo techniques
import numpy as np

#  Use a constant random seed for debugging purposes
np.random.seed(0)

#  Number of random points to choose
N = 1000000

#  Create a 2-dimensional numpy array.  Each row will correspond to one
#  point with the entry in the first column being the X value and the
#  second column providing the Y values.
p = np.random.rand(N, 2)

#  Use the built-in norm function to calculate the distance from the origin
n = np.linalg.norm(p, axis = 1, keepdims = True)

#  Sum up the number of entries in the circle.  That is count the number of
#  points who have a norm less than or equal to one
number_in_circle =  np.sum(n <= 1)

#  Print out the results
print 'pi = ', 4 * float(number_in_circle) / float(N)

While this example borders on the trivial, it serves as a good starting point for more sophisticated calculations.  As noted in the introduction, I've used Monte Carlo methods in financial modeling (which we will discuss in later posts), as well as in complex physical systems such as modeling the loss of active material from cycling batteries.

Some comments on vectorization

The code above uses a "vectorized" implementation.  If you are familiar with the concept, there is no need to read further.  If not, I'd like to take a moment to explain the concept because we will use it heavily in future articles.

Numpy is a package for numerical calculations in Python.  It provides a significant speed increase over writing code in straight Python because it is written in C rather than Python itself.  It also provides the ability to "vectorize" calculations meaning operation can be performed on whole vectors or matrices without having to explicitly loop through the elements.  This can further enhance speed considerably.
Figure 4:  Snapshot of a Jupyter notebook.  We load in the numpy and math libraries (the math functionality will be used later), create two numpy arrays, add them together, and finally print out the results.
As an example, in Fig 4, we create two arrays p1, and p2, and compute their elementwise sum.  There is no need to loop over each index and add them in a piecemeal manner.  Numpy also provides a number of functions that can perform vectorized operation on arrays.  Notice in the code to estimate $\pi$,  we use the functions np.linalg.norm and np.sum to calculate the distance of the point from the origin and count the number of points with the circle, respectively.

Figure 5:  Compute the L2 norm of a vector $p$, and compare the result to that of calculating the distance between $p$ and the point (0, 0) using the Pythagorean theorem.
The function np.linalg.norm calculates the length of a vector (L2 norm).  In the code snippet shown in Fig. 5 we show that calculating the L2 norm via this  functionality is the same as explicitly calculating the length between the origin and a point $p$.  Lastly in Fig 6. we show the equivalence of looping over an array and calculating the distance of each point in that array to simply using the built in norm function.

Figure 6:  Combining the notions from Figs 4 & 5, we compare the vectorized implementation of the norm of an array of points to that of explicitly looping over the array and calculating the L2 norm via the Pythagorean theorem.

The vectorization used here is straightforward and pretty intuitive, but it is important to have a handle on the concept.  In subsequent articles where we discuss ransom walks in the context of modeling stock behavior, we will lean heavily on this as well as SciPy's ability to deal with sparse matrices  rapidly.  This is a bit more abstract, and thus it is a good idea to make sure one understands the simple case before jumping into the deep end.


Articles in this series

Part II:  Monte Carlo Techniques: Modeling Stock Price Action
Part III:  Monte Carlo Techniques:  Calculating the Probability of Making 50% of Max Profit on Short Option Positions
Part IV:  Monte Carlo Techniques:  Estimating Correlations between Option Positions and  and Their Underlying Stock


1 comment: