Sunday, June 2, 2019

Solving Partial Differential Equations with Python

Despite having a plan in mind for the subjects of these posts, I tend to write them based on what is going on at the moment rather than sticking to the original schedule.  The last article was inspired by a couple of curve-fitting questions that came up at work in short succession, and this one, also inspired by questions from our scientists and engineers, is based on questions about using Python to solve ordinary and partial differential equations (ODEs and PDEs).  One question involved needing to estimate how long a cylindrical battery cell would take to cool down given various thermal contacts with a heat sink and electrical pulsing criteria.  They wanted to run through a bunch of scenarios quickly and didn't have the time or desire to set something up with a full-blown finite-element package like Ansys or COMSOL.  That exact problem is beyond the scope of this post, but we can still look at solving simple heat-transfer problems.

We'll look at a couple of examples of solving the diffusion equation for different geometries and boundary conditions.  We'll start off with the common Python libraries numpy and scipy and solve these problems in a somewhat "hacky" sort of way.  This will enable us to solve Dirichlet boundary value problems.  You'll see why I say this is a bit of a hack as we set up the problem.  Then we'll look at solving the same types of problems using the Assimulo package, which is a Python interface built around the Sundials differential algebraic equation solvers put out by Lawrence Livermore National Laboratory.  This will enable us to solve problems with Neumann boundary conditions as well.

The reason for not just using Assimulo from the start is that scipy is a pretty standard library while Assimulo is more specialized and needs a C and FORTRAN compiler to install on Linux.  I haven't yet tried installing it on Windows.  I wanted this to be useful for those who don't have the tools available, or the permission to build the package from scratch, as might be the case in a company's work environment.

In the interest of brevity, I'll only do 1-dimensional problems here and, in a short follow-up, expand the ideas to more complicated 2-dimensional problems.

As a last bit of administrivia, I will build the time derivative vectors using a for-loop for the main part of this code.  I think this is a bit clearer for the most part, but in general it is not as computationally efficient.  The building of the derivatives and residuals should ideally be done in a vectorized manner.  I'll present code that does this in the appendix with a little description, but will stick to the for-loop for the most part.

So, with all that said, let's begin.

We'll start off with a 1-dimensional diffusion equation and look to solve for the temperature distribution in a rod whose end points are clamped at different fixed temperatures.  For the sake of simplicity, we'll assume the diffusion coefficient is constant.

Our strategy will be to approximate the PDE by a system of ordinary differential equations which can be solved by existing ODE solvers.  This is done by discretizing the spatial derivatives, leading to an ordinary differential equation that describes the time evolution of the temperature at each grid point.  Boundary and initial conditions give us some algebraic equations that provide constraints on this system.

The temperature as a function of position and time is given by the 1-d diffusion equation,
\begin{equation}
\frac{\partial T}{\partial t} = D\frac{\partial^2 T}{\partial x^2},
\label{eq:1d_diffusion}
\end{equation}
where $t$ is time, $x$ is the position along the rod, $D$ is the diffusion coefficient, and $T$ is the temperature at a given position and time.

We will impose the boundary conditions $T(t, x = 0) = T_L$ and $T(t, x = L) = T_R$ where $L$ is the length of the rod.  These are Dirichlet boundary conditions, where the value of the function is specified at the boundaries.  Later on, we'll touch briefly on Neumann conditions, where the derivative at the boundaries is known.

We will use a finite difference method to discretize the spatial component of our equation, using a central difference formula to approximate the second derivative at the $i$th point as,
\begin{equation}
\left.\frac{\partial^2 T}{\partial x^2}\right|_i \approx \frac{ T_{i+1} - 2T_i + T_{i-1}}{\Delta^2}.
\label{eq:central_difference}
\end{equation}
The subscript $i$ in Eq. \ref{eq:central_difference} again refers to the $i$th grid point on our rod, and $\Delta$ is the distance between grid points.  Here, we've assumed a uniform spacing for simplicity.
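
As a quick sanity check of Eq. \ref{eq:central_difference}, here is a minimal snippet comparing the approximation against a function whose second derivative is known exactly:

#  Check the central difference formula on f(x) = sin(x), whose second
#  derivative is -sin(x).  The error should scale as delta**2.
import numpy as np

delta = 0.01
x = 1.0
approx = (np.sin(x + delta) - 2.0 * np.sin(x) + np.sin(x - delta)) / delta**2
print(approx, -np.sin(x))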

If there are a total of $N$ points, we can rewrite Eq. \ref{eq:1d_diffusion} as a system of $N-2$ ODEs and two algebraic constraint equations for our boundary conditions,
\begin{align}
T_1 & = T_L \nonumber \\
\frac{dT_2}{dt} & = D\left(\frac{ T_3 - 2T_2 + T_1}{\Delta^2}\right)\nonumber \\
\frac{dT_3}{dt} & = D\left(\frac{ T_4 - 2T_3 + T_2}{\Delta^2}\right)\nonumber \\
& \vdots \label{eq:difference_loop} \\
\frac{dT_{N-2}}{dt} & = D\left(\frac{ T_{N-1} - 2T_{N-2} + T_{N-3}}{\Delta^2}\right)\nonumber \\
\frac{dT_{N-1}}{dt} & = D\left(\frac{ T_{N} - 2T_{N-1} + T_{N-2}}{\Delta^2}\right) \nonumber \\
T_N & = T_R. \nonumber
\end{align}

Now to code this.  Scipy has a built-in differential equation solver, solve_ivp, included in the scipy.integrate package.  It solves equations of the form $dy/dt = f(t, y)$ and uses a Runge-Kutta type algorithm by default.  The function passed to the solver does not take additional arguments.  Therefore, to pass other parameters to the solver, we need to use either a nested function or create a class and use class variables to hold additional information.  For the first examples, we will use the former, and when we look at Assimulo later, we will use the latter.

Consider the following code:

#  Import numpy and the ODE solver from Scipy
import numpy as np
from scipy.integrate import solve_ivp

#  Import plotting functionality
import matplotlib.pyplot as plt

def diffusion(T, tspan, D, N, L, Tl, Tr):

    #  Calculate spacing between the N grid points spanning the rod
    delta = float(L) / float(N - 1)

    def equations(t, T):
        #  Boundary conditions set explicitly.  This is probably redundant
        #  as these numbers are set before the function is called and the
        #  code takes the time derivative at these points to be zero, but
        #  we'll ensure the proper boundary conditions anyway.
        T[0] = Tl
        T[-1] = Tr

        #  Initialize time derivative vector
        Tprime = np.zeros( (N) )

        #  We are being lazy and forcing the temperature of the boundaries
        #  not to change by setting the derivative to zero at those points
        Tprime[0] = 0.0
        Tprime[-1] = 0.0

        #  Implement the diffusion equation in the interior points using the
        #  central difference formula.  This can be vectorized for better
        #  performance, but I am just using a loop here.
        for i in range(1, N - 1):
            Tprime[i] = D * (T[i+1] - 2 * T[i] + T[i-1]) / delta**2

        return Tprime

    #  Solve the equation by calling solve_ivp and return the solution
    sol = solve_ivp(equations, [tspan[0], tspan[-1]], T, t_eval = tspan)
    return sol


#  Divide the rod into N points, 100 in this case
N = 100
L = 1           #  Rod length
D = 0.01        #  Diffusion coefficient
Tl = 0.         #  Left-hand boundary condition
Tr = 10.0       #  Right-hand boundary condition

#  We are interested in times from t = 0 to t = 2 seconds
tspan = np.linspace(0, 2)

#  Initial conditions
T = 5.0 * np.ones( (N) )

#  These are the boundary conditions.
T[0] = Tl
T[-1] = Tr

#  Solve the problem by calling our function which invokes the solver
sol  = diffusion(T, tspan, D, N, L, Tl, Tr)

#  Plot the solution
plt.plot(np.linspace(0, 1, N), sol.y[:, -1], 'k')
plt.xlabel('Position (m)')
plt.ylabel('Temperature ($^\circ$C)')
plt.grid(True)
plt.show()

This is rather simple, so I've just presented the entire thing here in one block.  We took the diffusion constant $D$ to be 0.01, the length of the rod $L$ to be 1 m, and the initial temperature to be 5 degrees.  The important part of this is the function equations, which does the discretization and gives an ODE for each point of our grid.  I've set the boundary conditions explicitly and set the time derivative at the two end points to zero to ensure they do not change.

Running this gives the following result for the temperature profile after two seconds.
Figure 1:  Temperature profile in the rod after two seconds
Simple enough.  You can also see why it is a bit of a hack.  The solver assumes each equation is an ODE, so I had to set the time derivatives equal to zero and explicitly set the boundary temperatures in the code.  This works well enough in this case, but with more complicated boundary conditions, such as constant flux at the boundary, we can't use this trick.

The proper way to approach this is to use a differential algebraic equation (DAE) solver.  DAEs, as the name implies, are systems of both ODEs and algebraic equations.  They are generally written in the form of,
\begin{align}
\frac{dx}{dt} & = f(x(t), y(t), t) \label{eq:DAE} \\
0 & = g(x(t), y(t), t). \nonumber
\end{align}

Assimulo comes with a DAE solver as part of the package and we can make use of that directly.  This will enable us to handle more involved boundary conditions.

The code is more or less the same as above with a couple of important exceptions.  The first is the form of the function that defines Eq. \ref{eq:difference_loop}.  The solver expects an equation in the form,
\begin{equation}
 0 = F\left( t, y, \frac{dy}{dt} \right).
\label{eq:DAE_res}
\end{equation}
So we write our function to have the form res = f(t, y, yprime), where the variable res is the residual.  As I noted earlier, I wrapped the solver function in a class this time rather than using nested functions.
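
For our problem, that means the residual vector contains one entry per grid point: the interior entries come from moving everything in Eq. \ref{eq:difference_loop} to one side, and the first and last entries encode the boundary conditions,
\begin{align*}
res_1 & = T_1 - T_L \\
res_i & = \frac{dT_i}{dt} - D\left(\frac{ T_{i+1} - 2T_i + T_{i-1}}{\Delta^2}\right), \qquad i = 2, \ldots, N-1 \\
res_N & = T_N - T_R.
\end{align*}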


#  Import numpy
import numpy as np

#  Import needed Assimulo libraries
from assimulo.solvers import IDA
from assimulo.problem import Implicit_Problem

#  Import plotting libraries
import matplotlib.pyplot as plt


class diffusion:

    def __init__(self, t, T, D, N, L, Tl, Tr):

        #  Calculate spacing between the N grid points spanning the rod
        self.delta = float(L) / float(N - 1)

        #  Assign the variables passed into the constructor to class
        #  variables.
        self.N = N
        self.D = D
        self.L = L
        self.Tl = Tl
        self.Tr = Tr

        #  Initial guesses for T and Tprime (T should be OK as that is
        #  just our initial temperature distribution)
        T0 = T
        Tprime0 = np.zeros( T.shape )
 
        t0 = 0.0

        #  Set which variables are to be treated as differential or
        #  algebraic.
        algvar     = [True] * T.shape[0]
        algvar[0]  = False
        algvar[-1] = False

        #  Create the model object and pass initial guesses at T and
        #  Tprime.  The code will then solve for consistent initial
        #  conditions.
        model = Implicit_Problem(self.equations, T0, Tprime0, t0)

        #  Create a simulation object from the IDA DAE solver and pass in
        #  our model information.
        sim = IDA(model)

        #  Bind our algvar variable to the sim object
        sim.algvar = algvar

        #  Calculate consistent initial conditions.  See IDA and Sundials
        #  documentation for more information.
        sim.make_consistent('IDA_YA_YDP_INIT')

        #  Integrate the system and bind the results to class variables.
        self.t, self.T, self.Tprime = sim.simulate(t[-1], 100)


    #  Define the system of equations
    def equations(self, t, T, Tprime):

        #  Allocate the residual vector
        res = np.zeros( T.shape )

        #  Set up boundary conditions
        res[0] = T[0] - self.Tl
        res[-1] = T[-1] - self.Tr

        #  Calculate second derivative terms
        for i in range(1, self.N - 1):
            res[i] = Tprime[i] - self.D * (T[i+1] - 2 * T[i] + T[i-1]) \
                / self.delta**2

        #  Return the residual vector
        return res


#  Divide the rod into N points, 100 in this case
N = 100
L = 1           #  Rod length
D = 0.01        #  Diffusion coefficient
Tl = 0.         #  Left-hand boundary condition
Tr = 10.0       #  Right-hand boundary condition

#  Define a time vector and set the initial temperature
t = np.linspace(0, 2)
T = 5.0 * np.ones( (N) )

#  Run the code to solve the problem
problem = diffusion(t, T, D, N, L, Tl, Tr)

#  plot the results
x = np.linspace(0, L, N)
plt.plot(x, problem.T[-1, :], 'k')
plt.xlabel('Position (m)')
plt.ylabel('Temperature ($^\circ$C)')
plt.grid(True)
plt.show()

The other issue is that unlike ODEs, where for a first-order equation we need only provide the initial values, in the case of DAEs we need to provide both the initial values and the initial values of the first derivatives.  We don't know the values of these derivatives, which presents a problem.  We could write our own code to calculate them, but fortunately, the IDA solver can work this out for us without us having to do much other than tell the solver which variables are to be treated as algebraic and which are differential.

The line,

sim.make_consistent('IDA_YA_YDP_INIT')
instructs the solver to calculate the initial values of the algebraic components of $y$ and the differential components of $y^\prime$.  Therefore, we need to tell the solver which variables are algebraic and which are differential.  This is done in the following lines.

        #  Set which variables are to be treated as differential or
        #  algebraic.
        algvar     = [True] * T.shape[0]
        algvar[0]  = False
        algvar[-1] = False
and

        #  Bind our algvar variable to the sim object
        sim.algvar = algvar
See the Sundials documentation for more information.

The rest of the code is essentially the same as the previous.  Running this gives,
Figure 2:  Output of the Assimulo integration.
The results are identical to those presented in Fig. 1.

You can see the utility of using this method as opposed to Scipy's internal solvers right away.  Say we had the boundary constraint,
\begin{equation}
\left.\frac{dT}{dx}\right|_{x = L} = C.
\end{equation}
We can handle this by replacing the current boundary condition for the right-hand side with this line of code:
res[-1] = (T[-1] - T[-2]) / self.delta - C
It's that simple.
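
To make that concrete, below is a minimal sketch of the equations method with the left end still clamped and a constant-flux (Neumann) condition on the right.  The flux value self.C is a hypothetical class variable, assumed to have been stored in the constructor; it is not part of the class shown above.

    #  Sketch only:  Dirichlet condition on the left, Neumann
    #  (constant-flux) condition on the right.  The flux C is assumed
    #  to be stored as self.C in __init__.
    def equations(self, t, T, Tprime):

        #  Allocate the residual vector
        res = np.zeros( T.shape )

        #  Left boundary:  temperature clamped at Tl
        res[0] = T[0] - self.Tl

        #  Right boundary:  one-sided difference approximation of dT/dx
        #  at x = L, set equal to the prescribed flux C
        res[-1] = (T[-1] - T[-2]) / self.delta - self.C

        #  Interior points:  same central difference formula as before
        for i in range(1, self.N - 1):
            res[i] = Tprime[i] - self.D * (T[i+1] - 2 * T[i] + T[i-1]) \
                / self.delta**2

        return res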

Appendix - Vectorization

As I have said in other articles, one wants to avoid looping over arrays and matrices when using libraries like Numpy and Scipy.  These packages are typically written in highly optimized C or FORTRAN code and compiled into libraries called by Python.  As these binary libraries are much faster than code running through an interpreter, it is a good idea to write things to take advantage of this fact.  When I learned numerical techniques, we were writing the code in C or C++, so this was less of an issue.  I presented the code above using loops because I find that a little more intuitive, but I would be remiss if I didn't at least give a vectorized version.

Consider Eqs. \ref{eq:central_difference} and \ref{eq:difference_loop}.  With the exception of the first and last rows, these equations can be thought of as the following matrix equation,
\begin{equation}
\vec{T}^{\,\prime} = \frac{D}{\Delta^2}\,\overline{\overline{\sigma}}\,\vec{T},
\label{eq:vector}
\end{equation}
where $\vec{T}^{\,\prime}$ and $\vec{T}$ are column vectors of the time derivative of the temperature and the temperature, respectively.  The matrix $\overline{\overline{\sigma}}$ is a sparse, tridiagonal matrix whose elements are,
\begin{equation}
\left(\begin{matrix}
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots\\
1 & -2 & 1 & 0 & 0 & 0 & \cdots& 0 \\
0 & 1 & -2 & 1 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & 0 & 1 & -2 & 1 \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots\\
\end{matrix}\right).
\label{eq:matrix}
\end{equation}
The first and last rows of the matrix are where we'd set the boundary conditions.  When we build the matrix, we explicitly tell Scipy to use a sparse matrix representation.  Most of the entries in Eq. \ref{eq:matrix} are zeros.  Scipy is smart enough to not allocate memory for all the zero entries and to not multiply out all the zeros when evaluating Eq. \ref{eq:vector}.  This is very useful, as these matrices can get large fast.
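
As a quick illustration of the savings, consider a 1000-point grid.  A dense matrix would store $10^6$ entries, while the sparse representation holds only the three diagonals.  A minimal sketch:

#  Build a 1000 x 1000 tridiagonal matrix in sparse form
import numpy as np
import scipy.sparse

n = 1000
diagonals = [ np.ones(n - 1), -2.0 * np.ones(n), np.ones(n - 1) ]
M = scipy.sparse.diags(diagonals, [-1, 0, 1])

#  Only the nonzero entries are stored:  2998 values instead of the
#  1,000,000 a dense matrix would need
print(M.nnz)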

The code is essentially the same as before, just with the loop replaced by matrix multiplication.

#  Import numpy
import numpy as np

import scipy.sparse

#  Import needed Assimulo libraries
from assimulo.solvers import IDA
from assimulo.problem import Implicit_Problem

#  Import plotting libraries
import matplotlib.pyplot as plt

#  Import time module to compare speed differences
import time

class diffusion:

    def __init__(self, t, T, D, N, L, Tl, Tr):

        #  Calculate spacing between the N grid points spanning the rod
        self.delta = float(L) / float(N - 1)

        #  Assign the variables passed into the constructor to class
        #  variables.
        self.N = N
        self.D = D
        self.L = L
        self.Tl = Tl
        self.Tr = Tr

        #  Initial guesses for T and Tprime (T should be OK as that is
        #  just our initial temperature distribution)
        T0 = T
        Tprime0 = np.zeros( T.shape )
 
        t0 = 0.0

        #  Set which variables are to be treated as differential or
        #  algebraic.
        algvar     = [True] * T.shape[0]
        algvar[0]  = False
        algvar[-1] = False

        #  Create the model object and pass initial guesses at T and
        #  Tprime.  The code will then solve for consistent initial
        #  conditions.
        model = Implicit_Problem(self.equations, T0, Tprime0, t0)

        #  Create a simulation object from the IDA DAE solver and pass in
        #  our model information.
        sim = IDA(model)

        #  Bind our algvar variable to the sim object
        sim.algvar = algvar

        #  Calculate consistent initial conditions.  See IDA and Sundials
        #  documentation for more information.
        sim.make_consistent('IDA_YA_YDP_INIT')

        #  Integrate the system and bind the results to class variables.
        self.t, self.T, self.Tprime = sim.simulate(t[-1], 100)


    #  Define the system of equations
    def equations(self, t, T, Tprime):

        #  Allocate the residual vector
        res = np.zeros( T.shape )

        #  Set up boundary conditions
        res[0] = T[0] - self.Tl
        res[-1] = T[-1] - self.Tr

        #  Calculate second derivative terms
        for i in range(1, self.N - 1):
            res[i] = Tprime[i] - self.D * (T[i+1] - 2 * T[i] + T[i-1]) \
                / self.delta**2

        #  Return the residual vector
        return res

class vectorized_diffusion:

    def __init__(self, t, T, D, N, L, Tl, Tr):

        #  Calculate spacing between the N grid points spanning the rod
        self.delta = float(L) / float(N - 1)


        #  Assign the variables passed into the constructor to class
        #  variables.
        self.N = N
        self.D = D
        self.L = L
        self.Tl = Tl
        self.Tr = Tr

        diagonals = [  np.ones((self.N-1,)),
            -2. * np.ones((self.N,)), 
            np.ones((self.N-1,)) ]
        self.M = scipy.sparse.diags(diagonals, [-1, 0, 1])

        #  Initial guesses for T and Tprime (T should be OK as that is
        #  just our initial temperature distribution)
        T0 = T
        Tprime0 = np.zeros( T.shape )
 
        t0 = 0.0

        #  Set which variables are to be treated as differential or
        #  algebraic.
        algvar     = [True] * T.shape[0]
        algvar[0]  = False
        algvar[-1] = False

        #  Create the model object and pass initial guesses at T and
        #  Tprime.  The code will then solve for consistent initial
        #  conditions.
        model = Implicit_Problem(self.equations, T0, Tprime0, t0)

        #  Create a simulation object from the IDA DAE solver and pass in
        #  our model information.
        sim = IDA(model)

        #  Bind our algvar variable to the sim object
        sim.algvar = algvar

        #  Calculate consistent initial conditions.  See IDA and Sundials
        #  documentation for more information.
        sim.make_consistent('IDA_YA_YDP_INIT')

        #  Integrate the system and bind the results to class variables.
        self.t, self.T, self.Tprime = sim.simulate(t[-1], 100)


    #  Define the system of equations
    def equations(self, t, T, Tprime):

        #  Compute the residual at every interior point in one shot with
        #  a sparse matrix-vector product
        res = Tprime - self.D / self.delta**2 * self.M * T

        #  Overwrite the first and last entries with the boundary
        #  conditions
        res[0] = T[0] - self.Tl
        res[-1] = T[-1] - self.Tr

        #  Return the residual vector
        return res




#  Divide the rod into N points, 100 in this case
N = 100
L = 1           #  Rod length
D = 0.01        #  Diffusion coefficient
Tl = 0.         #  Left-hand boundary condition
Tr = 10.0       #  Right-hand boundary condition

#  Define a time vector and set the initial temperature
t = np.linspace(0, 2)
T = 5.0 * np.ones( (N) )

#  Run the code with the loop to solve the problem.  Also calculate the
#  time taken to run the code.
start_time = time.time()
problem = diffusion(t, T, D, N, L, Tl, Tr)
end_time = time.time()
print('Run time with for-loop = ', end_time - start_time)

#  Run the vectorized version and calculate the time.
start_time = time.time()
problem_vectorized = vectorized_diffusion(t, T, D, N, L, Tl, Tr)
end_time = time.time()
print('Run time with vectorized = ', end_time - start_time)

#  plot the results
x = np.linspace(0, L, N)
plt.plot(x, problem_vectorized.T[-1, :], 'b.', label = 'Vectorized')
plt.plot(x, problem.T[-1, :], 'k', label = 'For-loop')
plt.xlabel('Position (m)')
plt.ylabel('Temperature ($^\circ$C)')
plt.grid(True)
plt.legend()
plt.show()

For small numbers of grid points, the vectorization speedup is between a factor of three and four.  With $N = 1000$, the vectorized code takes 72 seconds to run vs. 6000 seconds for the nonvectorized code.  So use vectorization if you can!

See Also

For more information on finite difference methods, check out the Wikipedia article on the subject.
The Wikipedia article on DAEs has some basic information.
The Assimulo project
The SUNDIALS page at Lawrence Livermore National Laboratory.

Saturday, March 30, 2019

Curve Fitting and Parameter Extraction

"I have this model and some experimental data.  I need to fit the model to the data to extract parameters but the problem doesn't lend itself to a neat, canned routine.  How do I do that?"  It's a question I've gotten here, at work, and in other venues.  In fact, I had the question myself in grad school when trying to pull a parameter out from a system of equations that described the heat capacity of $^3$He-$^4$He mixtures near the lambda point.  In retrospect, that was an easy problem, but at the time, most of the fitting I did was much simpler in nature like linear or polynomial fits.  The answer I am tempted to give when asked this question, and the answer I was given 20 years ago is, "The same way you'd do any other curve fit."  This isn't particularly useful, so I want to go through some examples starting with the very simple and working to more complicated problems.  I don't want to delve much into the math; rather, I want this to be more of a practical guide on coding such problems.  

Let's assume we have taken some data and wish to model it with a straight line.  Fig. 1 contains eight data points that lie along a line.  I added a random number to each point just to make things a little more realistic.  A common way to fit the model is to vary the parameters that go into it in such a way as to minimize the sum of the squared error between the model prediction and the data points, a so-called least-squares analysis.

Figure 1

In mathematical terms we need to find,
$$\min_p \sum_i \left(f(x_i, p) - y_i\right)^2,$$
where $p$ are our parameters and $x_i$ and $y_i$ are the x and y-values of each of our data points.  In the case of simple linear problems, every spreadsheet program I know of will do this for you, and most programming languages have libraries that will do the same.

In Python (using Scipy) the code to do this is straightforward using canned linear regression routines.  We don't even need to consider the above equation unless we want to get under the hood and mess around or do other forms of customization.  This is also simple to do in a software package like Excel, which contains basic curve-fitting tools.  We will limit ourselves to Python here.


import numpy as np
from scipy import stats

#  Set a constant random seed so the random numbers selected will always be
#  the same
np.random.seed(2)

#  Set number of points
num_points = 8

#  Generate the points that will serve as our experimental data
x = np.linspace(0, 10, num_points)
y = 3 * x + 5
random_number =  7 * np.random.random(num_points)
y = y +  random_number

#  Do the regression
slope, intercept, r, p, std_err = stats.linregress(x, y)

#  Print out the values of the slope, intercept, and correlation
#  coefficient.
print('Slope = '+ str(slope))
print('Intercept = ' + str(intercept))
print('r = ' + str(r))

This code will print out:

    Slope = 3.08771040246238
    Intercept = 7.205285187141509
    r = 0.9929123805249126


The built-in function also returns some statistics about the fit including the correlation coefficient between the experimental Y-values and those predicted by the model.  Plotting that slope and intercept on top of our experimental data gives:
Figure 2:  Data points along with fit results
This problem is very simple, and, being linear, can be worked out by hand.  For the sake of argument, let's assume we couldn't do that, and furthermore, that there weren't already canned routines to do a linear regression.  How might we approach this problem in that case?

One answer would be to use an optimization routine and minimize the sum of the squared errors ourselves.  We can use the Python function scipy.optimize.minimize to do this.  In this case we do need to calculate the least-squares error, as it is the function we are actively trying to minimize.


import numpy as np
from scipy.optimize import minimize

#  The model we are using.  In this case, a straight line
def equation(x, m, b):
    return m * x + b

# This is the function we actually want to minimize
def objective(p, x_data, y_data):
    m = p[0]
    b = p[1]
    model_values = equation(x_data, m, b)

    res = np.sum( np.square( y_data - model_values ) )
    return res

#  Set a constant random seed so the random numbers selected will always be
#  the same
np.random.seed(2)

#  Set number of points
num_points = 8

#  Generate the points that will serve as our experimental data
x = np.linspace(0, 10, num_points)
y = 3 * x + 5
random_number =  7 * np.random.random(num_points)
y = y +  random_number

#  Initial guess of the slope and intercept
p = [0., 0.]

#  Run the minimization routine and extract the results
results = minimize(objective, p, args=(x, y))
slope = results.x[0]
intercept = results.x[1]

#  We need to manually calculate the correlation
y_model = equation(x, slope, intercept)
r = np.corrcoef(y, y_model)

#  Print out the values of the slope, intercept, and correlation
#  coefficient.
print('Slope = '+ str(slope))
print('Intercept = ' + str(intercept))
print('r = ' + str(r[0][1]))

This code gives the same results to within $10^{-8}$.
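
If you want to check the agreement yourself, a quick comparison along the following lines works, assuming both fits were run in the same session with the linregress results saved under the hypothetical names slope_lr and intercept_lr:

#  Compare the linregress fit to the minimize-based fit
print(np.allclose([slope_lr, intercept_lr], results.x, atol=1.0e-8))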

Figure 3:  Specific heat at constant pressure and constant chemical potential difference between $^3$He and $^4$He plotted as a function of reduced temperature, $t$.  The molar concentration of $^3$He is 14.85%.  For the sake of clarity, data above $T_\lambda$ are plotted in red and data below $T_\lambda$ are in blue.

OK.  That's an overly simple problem and is linear in nature, too.  Let's look at something more complicated.

Back in the day, I did measurements of the thermodynamic properties of confined liquid helium (pure $^4$He) and mixtures of the two helium isotopes $^3$He and $^4$He as they went through the superfluid transition.  We are going to concentrate on one of the mixtures here.

As part of the analysis, we need to compare our results on the confined system to those of bulk helium.  The bulk specific heat at constant pressure and molar concentration of $^3$He, denoted by $C_{px}$, can be calculated by interpolating previously published work.  The difficulty lies in the fact that the analysis needs the specific heat at constant pressure and chemical potential difference between $^3$He and $^4$He.  This quantity is denoted by $C_{p\phi}$.  The calculation of this transformation involves derivatives of various thermodynamic quantities at $T_\lambda$.  Again, a lot of this has been measured historically, and we'll just state that we can transform from $C_{px}$ to $C_{p\phi}$ easily.

The difficult part is that we also need to transform the reduced temperature $t$, which is the distance from the $\lambda$-line along a path of constant temperature, to a different reduced temperature $\theta$, the distance to the $\lambda$-line along a path of constant chemical potential difference.  The expression "$\lambda$-line" denotes that the phase transition temperature, $T_\lambda$, changes as a function of concentration.  Thus, if we vary the concentration $x$, we could plot a locus of phase transitions as a function of $x$ and $T$.  This curve is where the superfluid transition occurs and where we need to evaluate certain thermodynamic derivatives in our analysis, hence the repeated reference to $\lambda$-line derivatives.

Fig. 3 summarizes what we have.  It shows the specific heat of a mixture as it goes through the superfluid transition at the temperature $T_\lambda$.  However, the data are plotted as a function of the reduced temperature, $t = (T - T_\lambda) / T_\lambda$, where we want them as a function of $\theta$.

The relation between $C_{p\phi}$ and $\theta$ is given by,
\begin{equation} C_{p\phi} = \frac{A}{\alpha}\theta^{-\alpha}(1+D\theta^\Delta)+B.\label{eq:specific_heat} \end{equation}
Here, the parameters to be determined are $A$, $\alpha$, $D$, and $B$.  $\Delta$ has a value of 0.5.  The value of $A$, the amplitude of the specific heat, is different on either side of $T_\lambda$, as is the value of $D$.  I will use $A^\prime$ to denote the amplitude below $T_\lambda$ and $A$ above.  Likewise, $D^\prime$ is for $T < T_\lambda$ and $D$ is for temperatures above the superfluid transition, while $\alpha$ and $B$ are shared between both sides.

The process of converting $t$ to $\theta$ is as follows:
  1. Initially assume $t = \theta$.
  2. Fit Eq. \ref{eq:specific_heat}.
  3. Use parameters to convert $t$ to $\theta$.
  4. Iterate steps 2 and 3 until the ratio $A/A^\prime$ converges.
It's the fitting part that we're interested in here, so I will just put the $t$-to-$\theta$ equation in the code along with the functions needed to calculate the $\lambda$-line derivatives.  We'll also use a fairly unsophisticated approach and do a brute-force fit to accomplish this.

First we code up Eq. \ref{eq:specific_heat}.


#  This is Eq. 1
def equation(p, theta):
    #  Extract individual parameters from p vector
    A     = p[0]
    alpha = p[1]
    B     = p[2]
    D     = p[3]

    #  Return specific heat
    return (A/alpha) * np.power(theta, -alpha) * (1.0 +
        D * np.power(theta, 0.5)) + B

Next, we code up our least-squares function that will actually be minimized.  This function takes our parameter vector $q$ as well as additional arguments for the reduced temperatures on either side of the transition and the known bulk specific heats, also on both sides of the transition.


def objective(q, theta_warm, theta_cold, C_warm_exp, C_cold_exp):

    #  Extract the parameters from the array q and calculate the specific
    #  heat on the warm side
    #  Format is (A, APrime, alpha, B, D, Dprime)
    p = [q[0], q[2], q[3], q[4]]
    C_warm = equation(p, theta_warm)

    #  Extract the parameters from the array q and calculate the specific
    #  heat on the cold side
    p = [q[1], q[2], q[3], q[5]]
    C_cold = equation(p, theta_cold)
    error_warm_sq = np.square(C_warm - C_warm_exp)
    error_cold_sq = np.square(C_cold - C_cold_exp)

    res = np.sum(error_warm_sq) + np.sum(error_cold_sq)
    return res
Next, we set up some variables specifically for this concentration.


conc = 0.1485                   #  mixture 1
Tlambda = 1.963165835           #  mixture 1

#  Calculate lambda-line derivatives
dX_dT = calc_dX_dT(conc)
dphi_dT = calc_dphi_dT(conc)
dS_dT = calc_dS_dT(conc)

#  Take theta = t initially
theta_warm = t_warm
theta_cold = t_cold

Lastly, we fit and iterate.



#  Initial guess for our parameters
q = [7.0, 10.0, -0.021, 375.0, -0.01, -0.01]

for i in range(5):
    results = minimize(objective, q, args=(theta_warm, theta_cold, 
        bulk_warm, bulk_cold), method='Powell' )

    #  Format is (A, APrime, alpha, B, D, Dprime)
    A      =  results.x[0]
    Aprime =  results.x[1]
    alpha  =  results.x[2]
    B      =  results.x[3]
    D      =  results.x[4]
    Dprime =  results.x[5]
    print(A/Aprime)
    print('Iteration', i+1, results.success)

    #  t-to-theta calculation
    theta_warm = t_warm * ( 1. - dX_dT**(-1.) * dphi_dT**(-1.) * (
        dS_dT -(1./Tlambda)*(A/alpha*theta_warm**(-alpha)* (1./(1.-alpha) +
        D*theta_warm**0.5 /(0.5 - alpha + 1) ) + B)))**(-1)

    theta_cold = t_cold * ( 1. - dX_dT**(-1.) * dphi_dT**(-1.) * (
    dS_dT -(1./Tlambda)*(Aprime/alpha*theta_cold**(-alpha)*( 1./(1.-alpha) +
    Dprime*theta_cold**0.5 /(0.5 - alpha + 1.) ) + B)))**(-1.)

    #  Use newly fitted parameters as our new initial guesses
    q = results.x

The number of iterations is hard-coded in the example above.  In practice, one would likely look at the ratio $A/A^\prime$ each iteration and see how much it changes.  If the change falls below some tolerance, one would stop the process.
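
A sketch of what that convergence check might look like is below; the tolerance tol is chosen arbitrarily, and the theta updates elided in the loop body are the same as in the code above.

#  Iterate until the ratio A/A' stops changing appreciably
tol = 1.0e-6
ratio_old = np.inf
while True:
    results = minimize(objective, q, args=(theta_warm, theta_cold,
        bulk_warm, bulk_cold), method='Powell')

    #  Stop once the change in A/A' falls below the tolerance
    ratio = results.x[0] / results.x[1]
    if abs(ratio - ratio_old) < tol:
        break
    ratio_old = ratio

    #  ...update theta_warm and theta_cold exactly as above...

    #  Use the newly fitted parameters as the new initial guesses
    q = results.x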


Figure 4:  Bulk specific heat plotted along with the fit results.

This example is simple enough that we could probably get away with the curve_fit function included in scipy.optimize.

The code will return the following results:
$A = 9.334426752192908$
$A^\prime = 8.69867448279229$
$\alpha = -0.02185117731336304$
$B = 412.4595717146204$
$D = -0.07295632878279142$
$D^\prime = -0.3560880053296793$

Plugging these numbers back into our model and plotting the results on top of our data gives Fig. 4.  But perhaps a clearer way to see the results in this case is a semilog plot.  Since the specific heat follows a near power law, the semilog plot should be almost linear.  This is shown below in Fig. 5.  Note, we use the absolute value of $\theta$ for $T < T_\lambda$ because of the issue in taking logs of negative numbers.
Figure 5:  A plot of the specific heat and fit on a semilog scale for clarity.  The absolute value of $\theta$ is plotted on the X-axis to avoid issues with taking the log of a negative number.

In solving this problem, there are a couple of issues I didn't address.  There is a possibility of local minima.  This is actually the biggest issue with what I've done above and one which I ignored entirely.

The other issue is that the parameters vary over several orders of magnitude.  The biggest parameter, $B$, turns out to be on the order of $10^2$, and the smallest, $\alpha$, is on the order of $10^{-2}$, so there are about four orders of magnitude between them.  One can often run into convergence issues if the parameters differ in size over a vast range.  We can get away with it here, but in a lot of problems I work on, the parameters can vary over 19 orders of magnitude.  One potential way to deal with this is to fit to the log of the parameters rather than the parameters themselves.  This will tend to put them all on an even footing.  For example, if one parameter is on the order of $10^{-15}$ and the other of order $10^4$, then $\log(10^{-15}) = -15$ and $\log(10^4) = 4$.  These numbers are much easier to work with in terms of getting convergence.
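
A minimal sketch of fitting in log space is below.  This assumes all the parameters are positive; a signed parameter would need to be handled separately, for example by fitting the log of its magnitude and keeping the sign fixed.  Here, objective stands for whichever least-squares function is being minimized, and the data arguments are placeholders.

#  Fit the base-10 logs of the parameters to put widely varying
#  magnitudes on an even footing (assumes all parameters are positive)
def objective_log(log_q, *args):
    q = np.power(10.0, log_q)   #  Transform back to physical values
    return objective(q, *args)

#  Initial guesses supplied as log10 of the parameter values
log_q0 = np.log10( [1.0e-15, 1.0e4] )
results = minimize(objective_log, log_q0, args=(x_data, y_data))
q_fit = np.power(10.0, results.x)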

The last issue is the potential for certain parameter values used by the code during the optimization to produce a value of $\pm\infty$ for either the model or its gradient.  I neglected to say anything about this previously, but it is why I set the method to 'Powell' in the minimization function.  This is a gradient-free method, which allows for a work-around in this case.
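
One simple guard that pairs well with a gradient-free method is to have the objective return a large but finite penalty whenever the model blows up; since Powell only compares function values, this steers the search away from the offending region.  A sketch:

#  Wrap the objective so parameter values that drive the model to
#  infinity or NaN return a large finite penalty instead
def objective_safe(q, *args):
    res = objective(q, *args)
    if not np.isfinite(res):
        return 1.0e30
    return res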

Neither of the above examples is particularly difficult, but the helium example has different equations on either side of the transition with some parameters shared between both sides while other parameters differ.  I wanted to sketch out a general approach on how to set up and solve these sorts of problems.



Saturday, April 14, 2018

Polynomial Interpolation

I want to write a couple of articles on orthogonal collocation and orthogonal collocation on finite elements, techniques for solving differential and partial differential equations numerically.  I found the academic papers on the subject to be somewhat opaque, so my goal here is to come up with something that conveys the details of the method in an easier-to-understand manner.  The ideas behind these methods are fairly straightforward, but they are much simpler to comprehend if we walk through the concepts piece by piece.  In that light, I want to write the following sequence of posts on the following topics:

  • Polynomial interpolation (this article)
  • Technique of orthogonal collocation for the solution of ordinary differential equations
  • Orthogonal collocation for the solution of partial differential equations
  • Piecewise polynomial interpolation focusing on quadratic and cubic splines
  • Orthogonal collocation on finite elements for the solution of ordinary differential equations
  • Orthogonal collocation on finite elements for the solution of partial differential equations
I will largely be using MATLAB for this series though I'll make an effort to also provide the equivalent Python code.

So let's start with the idea of polynomial interpolation.  Consider the case where we have $N + 1$ points and we wish to generate an $N$th order polynomial that passes through all points.

These $N+1$ points are referred to as knots, breaks, or collocation points in the literature.  Let's denote the X-coordinate of each point by $x_i = x_1, x_2, \cdots, x_N, x_{N+1}$, and the corresponding Y-coordinate by $y_i = y_1, y_2, \cdots, y_N, y_{N+1}$.

An $N$th order polynomial has the form,
\begin{equation}
P(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3 + \cdots + \alpha_{N-1} x^{N-1} + \alpha_N x^N.
\label{polynomial}
\end{equation}
Our task is to find the coefficients, $\alpha_i$ such that the polynomial passes through all our points.

Since we know the polynomial passes through all of our points $(x_i, y_i)$, we can write out an equation for each of the $N+1$ points.  Specifically,
\begin{align*}
y_1 = & \alpha_0 + \alpha_1 x_1 + \alpha_2 x_1^2 + \alpha_3 x_1^3 + \cdots + \alpha_{N-1} x_1^{N-1} + \alpha_N x_1^N \\
y_2 =&  \alpha_0 + \alpha_1 x_2 + \alpha_2 x_2^2 + \alpha_3 x_2^3 + \cdots + \alpha_{N-1} x_2^{N-1} + \alpha_N x_2^N \\
y_3 =&  \alpha_0 + \alpha_1 x_3 + \alpha_2 x_3^2 + \alpha_3 x_3^3 + \cdots + \alpha_{N-1} x_3^{N-1} + \alpha_N x_3^N \\
\vdots &  \\
y_N =&  \alpha_0 + \alpha_1 x_N + \alpha_2 x_N^2 + \alpha_3 x_N^3 + \cdots + \alpha_{N-1} x_N^{N-1} + \alpha_N x_N^N \\
y_{N+1} =&  \alpha_0 + \alpha_1 x_{N+1} + \alpha_2 x_{N+1}^2 + \alpha_3
x_{N+1}^3 + \cdots + \alpha_{N-1} x_{N+1}^{N-1} + \alpha_N x_{N+1}^N.
\end{align*}
Notice that this is a linear system of equations (in terms of the unknown values of $\alpha_i$) and can be written in matrix form as,
\begin{equation} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \\ y_{N+1} \\ \end{pmatrix} = \begin{pmatrix} x_1^0  & x_1^1  & x_1^2  & x_1^3  & \cdots & x_1^N  \\ x_2^0  & x_2^1  & x_2^2  & x_2^3  & \cdots & x_2^N  \\ x_3^0  & x_3^1  & x_3^2  & x_3^3  & \cdots & x_3^N  \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ x_N^0  & x_N^1  & x_N^2  & x_N^3  & \cdots & x_N^N  \\ x_{N+1}^0  & x_{N+1}^1  & x_{N+1}^2  & x_{N+1}^3  & \cdots & x_{N+1}^N  \\ \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{N-1} \\ \alpha_N \\ \end{pmatrix}. \label{eq:matrix}\end{equation}

Let's introduce a bit of notation that will be helpful in subsequent articles.  We'll denote the column vector of Y-values as $\vec{Y}$, the column vector of unknown values of $\alpha_i$ as $\vec{\alpha}$ and the matrix as $\bf{X}$.
Using this notation, our problem can be written as,
\begin{equation}\vec{Y} = \bf{X}\vec{\alpha}\label{eq:poly_matrix}\end{equation}
So
\begin{equation}\vec{\alpha} = \bf{X}^{-1}\vec{Y}\label{eq:polynomial_inverse}\end{equation}

Our problem of finding the coefficient vector boils down to simply finding the inverse of the matrix $\bf{X}$.  Let's work out a problem.

I've chosen a bunch of X-values, in this case the integers from one to six, and six corresponding Y-values at random from between zero and ten.  Those happened to be 9.6703, 5.4723, 9.7268, 7.1482, 6.9773, and 2.1609.  These six points are plotted below in Fig. 2.
Figure 2:  Six points whose Y-values are randomly generated.  We wish to use Eqs. \ref{eq:poly_matrix} and \ref{eq:polynomial_inverse}  to calculate the coefficients of our interpolating polynomial.


Plugging our X- and Y-values into Eq. \ref{eq:matrix} gives,
\begin{equation} \begin{pmatrix} 9.6703 \\ 5.4723 \\ 9.7268 \\ 7.1482 \\ 6.9773 \\ 2.1609 \end{pmatrix} = \begin{pmatrix} 1  &   1  &   1  &   1 &    1 &    1 \\ 1  &   2  &   4  &   8 &   16 &   32 \\ 1  &   3  &   9  &  27 &   81 &  243 \\ 1  &   4  &  16  &  64 &  256 & 1024 \\ 1  &   5  &  25  & 125 &  625 & 3125 \\ 1  &   6  &  36  & 216 & 1296 & 7776 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5 \end{pmatrix}\label{eq:numerical_matrix}\end{equation}

We can invert the matrix in Eq. \ref{eq:numerical_matrix} numerically in either MATLAB or numpy and calculate our coefficients from Eq. \ref{eq:polynomial_inverse}.  The MATLAB code that generates the points, fills in the matrix values, and calculates the inverse and coefficients is given below.  The code also plots the result.  The plot is shown in Fig. 3.

clear all; clc; close all;

%  Set random seed to a constant value for reproducibility.
rng(4);

%  Our X values will just be evenly spaced points.
%  The Y values will be randomly chosen numbers between zero and ten.
x = [1; 2; 3; 4; 5; 6];
y = 10 * rand(length(x),1);

%  Plot these points
plot(x, y, 'o', 'markerfacecolor', [0 0 0]);
xlim( [0, 7] );
ylim( [0, 12] );

%  Define our square matrix
X = [ x.^0, x.^1, x.^2, x.^3, x.^4, x.^5];

%  Solve for unknown coefficients
alpha = inv(X) * y;

%  The way I wrote the code, the values of alpha go from lower to higher
%  powers of X as we go down the vector.  MATLAB's polyval function assumes
%  the direction is reversed, so we'll flip the alpha vector to use
%  MATLAB's built-in functionality.
alpha = flipud(alpha);

%  Plot the polynomial
z = linspace(1, 6);
hold on;
plot(z, polyval(alpha, z), 'k');
hold off;
grid on;
xlabel('x');
ylabel('y');
set(gca, 'FontSize', 14);

Figure 3:  Points connected by interpolating polynomial
The code calculates the coefficients as (rounded to two decimal places) $\alpha_0 = 102.95$, $\alpha_1 = -189.21$, $\alpha_2 = 131.82$, $\alpha_3 = -41.68$, $\alpha_4 = 6.12$, and $\alpha_5 = -0.34$.

If you don't have access to MATLAB, below is the equivalent calculation using Python.


import numpy as np
import matplotlib.pyplot as plt

#  Set random seed to a constant value for reproducibility.
np.random.seed(4)

#  Our X values will just be evenly spaced points.
#  The Y values will be randomly chosen numbers between zero and ten.
x = np.array( [1, 2, 3, 4, 5, 6] )
y = 10.0 * np.random.rand( x.shape[0] )

plt.plot(x, y, 'ko', markerfacecolor = 'k')
plt.xlim( [0, 7] )
plt.ylim( [0, 12] )
plt.grid(True)
plt.xlabel('x')
plt.ylabel('y')

#  Build the matrix and solve for the unknown coefficients
X = np.column_stack( (x**0, x**1, x**2, x**3, x**4, x**5) )
alpha = np.linalg.solve(X, y)

#  Reverse the coefficient order; np.polyval expects the highest power
#  first
alpha = alpha[::-1]

#  Plot the polynomial
z = np.linspace(1, 6)
plt.plot(z, np.polyval(alpha, z), 'k')
plt.show()

Of course, in this example, the size of the matrices is pre-determined, so I constructed them by hand.  Obviously, that could be done programmatically if variable numbers of points are needed.
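
For instance, numpy can build this matrix for any number of points in a single call.  A short sketch, reusing the x and y arrays from above:

#  Build the interpolation matrix programmatically.  increasing=True
#  orders the columns as x^0, x^1, ..., matching the construction above.
X = np.vander(x, increasing=True)
alpha = np.linalg.solve(X, y)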

There are a couple of issues which limit this technique as a general-purpose interpolation method.  Problems arise when we have a large number of points.  The solid curve in Fig. 3 doesn't seem to be an unreasonable representation of our data.  In Fig. 4, we repeat the calculation for eight points whose Y-values are also chosen to be between zero and ten.  As we can see, there is a large "bump" between the first and second points where the interpolating polynomial gets above 20.  This wiggling phenomenon becomes more and more pronounced as we increase the number of points.

Figure 4:  Interpolating polynomial for eight points whose Y-values are chosen randomly in the interval [0, 10]

In Fig. 5, we take that a step further and look at fifteen points.  Obviously, the resulting polynomial is a very poor representation of our data, and MATLAB throws a warning, shown in Fig. 6, regarding the condition number of the matrix $\bf{X}$.  As the message says, the matrix is close to singular, meaning its determinant is very close to zero.  In practical terms, this means small uncertainties (due to numerical rounding) in our calculated values will have a huge effect on the shape of the resulting polynomial.  In Fig. 5, this problem isn't bad enough to cause the interpolating curve to miss our points, but the wiggle effect makes this unusable for interpolation unless we confine ourselves to the middle points away from the two spikes.
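
You can watch the problem develop by computing the condition number of $\bf{X}$ as the number of points grows.  A small sketch:

#  Condition number of the interpolation matrix for increasing numbers
#  of evenly spaced points.  Large values mean rounding errors are
#  amplified enormously when solving for the coefficients.
for n in (6, 8, 15):
    x = np.arange(1, n + 1)
    print(n, np.linalg.cond(np.vander(x, increasing=True)))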

Figure 5:  Interpolating polynomial for 15 points.


Figure 6:  Warning message given when calculating coefficients for the 15-point interpolation

In Fig. 7, we try this technique on a large number of points and find the results fail entirely.

Figure 7:  An attempt at using a simple polynomial interpolation to draw a smooth curve through a large number of points.  Due to the almost singular matrix, the resulting answer misses the data entirely.

We will address these issues later in an article on splines, but for the next few posts, I want to show how we can use these simple interpolating polynomials to numerically solve differential and partial differential equations.

Update 11/25/2018:  I've been meaning to note that the method presented here is for pedagogical purposes and is not the way I'd actually perform this operation.  In practice, it is computationally inefficient, the method of Lagrange interpolating polynomials being preferable.
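
For reference, scipy exposes a numerically stable form of this through its BarycentricInterpolator class.  A minimal sketch for the six-point example above:

#  Evaluate the interpolating polynomial via the barycentric form of
#  Lagrange interpolation instead of solving for coefficients directly
from scipy.interpolate import BarycentricInterpolator

P = BarycentricInterpolator(x, y)
z = np.linspace(1, 6)
plt.plot(z, P(z), 'k')
plt.show()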