Friday, December 29, 2017

Monte Carlo Techniques: Modeling Stock Price Action




A random walk is a stochastic process that describes a path as a succession of random steps.  To understand the basics of random walks, we will consider the case where we start off at position $y = 0$ and take a step in the vertical direction if we flip a heads on a coin toss, and one step down if we throw a tails.  The animated image in Fig 1 shows this process for four different trials.  Any given path will, of course, be random, but repeating this process numerous times allows us to calculate statistics on the collection of paths and draw some conclusions.  For example with a large number of simulations, we could estimate the probability of ending up at a given point after a specified number of steps.  This is a useful technique when it is inconvenient or impossible to work out such details by hand, and in the age of fast computers is being used more and more.

Figure 1:  A random walk along the integers.  Four different walks are shown with the position at each step being shown on the Y-axis.


Stock returns-- the percent change (we'll use day-to-day changes)-- are assumed to be normally distributed.  The assumption is there an upward drift that depends on the risk-free interest rates at the time, plus some sort of volatility term.  This volatility term is the piece that is assumed to be normally distributed.  As an aside, we can assume any type of distribution we'd like.  That's an advantage of Monte Carlo techniques--  we can do these calculations quickly with a variety of assumptions just by tweaking a bit of code.  But for this article, we'll assume a normal distribution which is not far from what is actually observed looking at historical stock data.  We will implement a random walk to simulate stock behavior using Python 2.7 along with the Numpy and Scipy libraries.

Expressed as an equation, we have,
\begin{equation} \frac{S_{i+1} - S_i}{S_i} = r\Delta t + \sqrt{\Delta t}\sigma \epsilon_i. \label{daily_returns} \end{equation}

Here, $S_i$ is the stock price on the $i$th day, $r$ is the annualized risk-free ratem that has been scaled down to whatever time step we're using, $\Delta t$ is the time step (in our case it will be one day), $\sigma$ will be the volatility that has also been scaled down to the proper time step,  $\epsilon$ is a random number sampled from the standard normal distribution (mean = 0 and standard deviation = 1). Numpy provides a built-in function that does this sampling for us, np.random.normal

While we could rearrange Eq. \ref{daily_returns} as,
\begin{equation} S_{i+1} = S_i \left(r\Delta t + \sqrt{\Delta t}\sigma \epsilon_i\right) + S_i, \label{next_day} \end{equation}
and loop over the desired number of days, we will eventually want to reuse this code for more complicated problems where we might need to do millions of simulations.  Thus, speed is of the essence.  Therefore, we'll write a vectorized implementation instead of one with loops.  At first glance, the math invoked in doing so seems a bit abstract, but it just amounts to solving a bunch of algebraic equations. 

We will introduce a bit of notation just to keep things neater and set the right-hand side of Eq. \ref{daily_returns} to
$$r\Delta t + \sqrt{\Delta t}\sigma \epsilon_i = \Lambda_i.$$

If we do a bit of algebra and write out the first few iterations of Eq.  \ref{next_day} , we see
that
\begin{equation}\begin{array}{lcl}S_0                     & = & S_0 \\S_1 - S_0 \Lambda_1 - S_0 & = &  0\\\S_2 - S_1 \Lambda_2 - S_1 & = & 0 \\S_3 - S_2 \Lambda_3 - S_2 & = & 0 \\& \vdots & \\S_i - _{i-} \Lambda_i - S_{i-1} & = & 0 .\\\end{array}\label{unrolled}\end{equation}
We can rewrite Eq. \ref{unrolled} in matrix form as,
\begin{equation} \left( \begin{matrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ \Lambda_1 + 1 & -1 & 0 & 0 & \cdots & 0 \\ 0 & \Lambda_2 + 1 & -1 & 0 & \cdots & 0 \\ 0 & 0 & \Lambda_3 + 1 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \Lambda_5 + 1 & -1 \end{matrix} \right) \left( \begin{array}{c} S_0 \\ S_1 \\ S_2 \\ S_3 \\ \vdots \\ S_i \\ \end{array} \right) = \left( \begin{array}{c} S_0 \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ \end{array} \right). \label{simple_matrix} \end{equation}

So now we've reduced our entire problem to solving one equation.  The solution to Eq. \ref{simple_matrix} amounts to finding the inverse of a matrix.  Notice also that almost all of the entries in this matrix are zeros.  Only the entries along the main diagonal and the one immediately below are non-zero.  This is known as a sparse matrix.

Sparse matrices have several advantages.  First, since most of the entries are zero, we can store only the non-zero values in memory.  In our case, we have finite values only on the main diagonal and diagonal immediately below.  For our specific problem then, we have $2N_{\mbox{days}} + 1$ numbers to story in memory as opposed to $(N_{\mbox{days}}+1)^2$ if every entry have to be accounted for.  The second advantage is we can use sparse matrix algorithms to calculate the inverse.  This speeds up the computation immensely.

Many numerical methods libraries have have these sparse matrix algorithms included in the package and Python is no expectation with the scipy.sparse and scipy.sparse.linalg packages containing what we need.

A couple of notes and assumptions before we begin to code this:

  1.   We'll ignore the effect of weekends.  We won't try to account for the weekend by bumping up volatility for the simulated Mondays or try to account for it in some other way.
  2.  We will take the number of days per year as 252, the number of trading days rather than 365, the number of calendar days.  It is a bit ambiguous as which number to use and in practice should probably be determined by testing each against actual market data.  Since we are ignoring weekends, however, I thought it would be clearer to use the number of trading days.
  3.   We will state without justification that to scale the volatility from yearly to daily we use the formula, $\sigma_{\mbox{day}} = \sigma_{\mbox{year}} / \sqrt{252}$.
  4.  We will assume the volatility $\sigma$ remains constant over our time period.  In reality, this is not true.
  5.  For the most part, determining the values of the parameters that go into the model is not a problem, but deciding what to use for volatility is not immediately obvious.  Since we are assuming returns are normally distributed, we could look at historical price data and calculate the standard deviation of returns and use that value.  This is referred to as historical volatility.  In practice however,  it is best to use the implied volatility derived from option prices.  This is the market's estimate of what volatility will be in the time period until the expiration of the option contract.  This number is given by your trading platform or can be calculated directly from the option prices.  I have an article on how to do this calculation here.

Let's begin to translate this into Python code.  We will need the numpy and scipy libraries mentioned above as well as some basic math functions.  We will also load in the matplotlib library to plot the results.

import numpy as np
from math import sqrt
import matplotlib.pyplot as plt
import scipy.sparse
import scipy.sparse.linalg

We'll set the random seed to a constant for debugging purposes.  This makes the random number generator use the same numbers each time the program is run, which will allow one to discern whether odd numerical values are caused by fluctuating random numbers or a bug in the code.  We will also set the parameters of our simulation:  the initial stock price, the risk-free rate, time-step , and volatility.

np.random.seed(0)               #  Set a constant random seed for debugging
                                #  purposes

N_days = 30                     #  Number of days to simulate

S0 = 100.0                      #  Initial stock price
r = 0.01 / 252.0                #  Risk-free rate (daily percent)
dt = 1.0                        #  Time step (days)
sigma = 0.15 / sqrt(252.0)      #  Volatility (daily percent)

Next, we create a vector of normally distributed numbers and calculate the daily percent change of the stock.

#  Create a vector of normally distributed entries with each entry
#  corresponding to  one day
epsilon = np.random.normal( size = (N_days) )

ds_s = r * dt + sigma * sqrt(dt) * epsilon  #  Vector of daily percent
                                            #  change in the stock price

Next, we set up our sparse matrix.  The list $d$ will contain our two vectors which will form the diagonals, and the list $k$ will designate which of these vector goes along which diagonal.  Then we create the matrix $M$ with the sparse.diags command.  The option format = 'csc' tells the code to use compressed sparse column format to represent the matrix internally and is not really important for the purpose of this article.

#  We need to build the diagonals of our sparse matrix
#  All of the main diagonal is equal to -1 expect the first entry which is
#  equal to one
ones = -np.ones( (N_days + 1) ); ones[0] = 1.;

#  Define our two diagonals,  the lower and main
d = [ds_s + 1, ones]

#  the K vector tells Python which of the vectors defined in 'd' go in
#  which diagonal.  Zero corresponds to the main, and -1 to the diag
#  immediately below.
K = [-1, 0]

#  Define the sparse matrix
M = scipy.sparse.diags(d, K, format = 'csc')

Lastly, we'll define the vector $p$ which corresponds to the right-hand side of Eq. \ref{simple_matrix}.  Then we will solve for our vector of stock prices, $s$.

#  Define a column vector off all zeros expect for the first entry which is
#  our initial stock price.
p = np.zeros( (N_days + 1, 1) )
p[0] = S0

#  Solve the system  M * s = p for the vector s
s = scipy.sparse.linalg.spsolve(M, p)

Now that we have the solution, let's visualize the results.

#  Plot the results
t = np.arange(0, N_days + 1)
plt.plot(t, s, 'k')
plt.xlabel('Time (days)')
plt.ylabel('Stock Price ($)')
plt.grid(True)
plt.show()

The code above should produces the plot shown in Fig. 1.
Figure 1:  The results of a single simulation using the parameters noted above.
In and of itself, the above isn't very useful.  It returns only one random walk simulated run of a stock.  To get something useful we would need to do a large number of such simulations.  Again, we could loop over the above code to do so but we will do this in a fully vectorized way by "stacking" up our vectors in Eq. \ref{simple_matrix}.   An explicit example might be clearer.  Let's adopt the notation where $S_{i,j}$ is the price on the $i$th day in the $j$th simulation.  The if we have two simulations for five days, the column vectors in \ref{simple_matrix} would be,
\begin{equation} \left( \begin{array}{c} s_0 \\ s_{1,1} \\ s_{2,1} \\ s_{3,1} \\ s_{4,1} \\ s_{5,1} \\ s_0 \\ s_{1,2} \\ s_{2,2} \\ s_{3,2} \\ s_{4,2} \\ s_{5,2} \\ \end{array} \right), \mbox{ and} \left( \begin{array}{c} s_0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ s_0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{array} \right).\end{equation}
The matrix is constructed similarly, stacking a matrix for each day along the diagonal.  In the example above, this would be,
$$\left( \begin{array}{rrrrrrrrrrrr} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \Lambda_{1,1} + 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \Lambda_{2,1} + 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \Lambda_{3,1} + 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \Lambda_{4,1} + 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \Lambda_{5,1} + 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \Lambda_{1,2} + 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \Lambda_{2,2} + 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \Lambda_{3,2} + 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \Lambda_{4,2} + 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \Lambda_{5,2} + 1 & -1 \end{array} \right).$$

Since we are going to want to use this code in other models, we will encapsulate it all in a function.   The code below constructs the vectors and matrix for multiple simulations according to the user's input.

def stock_monte_carlo(init_price, N_days, N_sims, r, sigma):

    #  Scale interest rates and volatility.  Define time-step.
    r = r / 252.
    dt = 1.0
    sigma = sigma / sqrt(252.0)

    #  Calculate vector of normally distributed numbers and use it to
    #  calculate the daily percent change.
    epsilon = np.random.normal( size = (N_sims * N_days + N_sims - 1) )
    ds_s = r * dt + sigma * sqrt(dt) * epsilon

    #  Step up matrix diagonals
    ones = -np.ones( (N_sims * N_days + N_sims) )
    ones[0:-1:N_days+1] = 1.

    ds_s[N_days:N_days * N_sims + N_sims:N_days+1] = -1
    d = [ds_s + 1, ones]
    K = [-1, 0]

    #  Solve the system of equations
    M = scipy.sparse.diags(d, K, format = 'csc')
    p = np.zeros( (N_sims * N_days + N_sims, 1) )
    p[0:-1:N_days+1] = init_price
    s = scipy.sparse.linalg.spsolve(M, p)

    #  Reshape the column vector so the function returns a matrix where
    #  each row is a single simulation with each day corresponding the the
    #  columns
    return np.reshape(s, (N_sims, N_days+1))

Let's make use of this and generate a plot of all the simulated runs.


num_sims = 1000
N = 30

r = 0.01
sigma = 0.15
S0 = 100.0

s = stock_monte_carlo(S0, N, num_sims,  r, sigma)

t = np.arange(0, N + 1)
for i in range(num_sims):
    plt.plot(t, s[i,:], 'k', alpha = 0.05)

plt.xlabel('Time (days)')
plt.ylabel('Stock Price ($)')
plt.grid(True)
plt.show()

The above code should produce the following plot.  Since we're plotting 1000 different runs, we  make each plot transparent with an alpha of 0.05 to avoid the whole thing being a giant mess.  The results are what we'd intuitively expect --  the most frequent outcome is the stock drifting slightly higher or lower than the initial price.  The outlier moves, either substantially up or down, are rather rare.
Figure 2:  A plot of 1000 different stock simulations.
We can use a histogram to visualize the price on any given day:


plt.hist( s[:,-1], bins = 50, edgecolor = 'k', linewidth = 0.5 )
plt.xlabel('Stock Price ($)')
plt.ylabel('Count' )
plt.show()
Figure 4:  Histogram showing price distribution on day 30 for 1000 simulations.
The histogram confirms what we'd expect by looking at Fig. 3, the bulk of the prices end up a bit above or below our initial stock price of $100.   If we calculate the mean and median price on day 30, we find they are 100.20 and 100.04, respectively.

We've constructed a simple Monte Carlo simulation for the price action of a stock and written a function that outputs the values for all runs.  The next article in this series will show how to use the above code to calculate the probability of making 50% (or any specified percentage) of maximum profit on a short options position.

Articles in this series

Part I:  Monte Carlo Techniques: Estimating the Value of Pi with Random Numbers
Part III:  Monte Carlo Techniques: Calculating the Probability of Making 50% of Max Profit on Short Option Positions
Part IV:  Monte Carlo Techniques:  Estimating Correlations between Option Positions and  and Their Underlying Stock

No comments:

Post a Comment