Let's assume we have taken some data and wish to model it with a straight line. Fig. 1 contains eight data points that lie along a line; I added a random number to each point just to make things a little more realistic. A common way to fit such a model is to vary the parameters that go into it so as to minimize the sum of the squared errors between the model prediction and the data points, a so-called least-squares analysis.
Figure 1
In mathematical terms we need to find,
$$\min \sum_i \left(\mbox{f}(x_i, p) - y_i\right)^2,$$
where $p$ are our parameters and $x_i$ and $y_i$ are the x and y-values of each of our data points. In the case of simple linear problems, every spreadsheet program I know of will do this for you, and most programming languages have libraries that will do the same.
In Python (using SciPy) the code to do this is straightforward using canned linear regression routines. We don't even need to consider the above equation unless we want to get under the hood and mess around or do other forms of customization. This is also simple to do in a software package like Excel, which contains basic curve-fitting tools. We will limit ourselves to Python here.
import numpy as np
from scipy import stats

# Set a constant random seed so the random numbers selected will always be
# the same
np.random.seed(2)

# Set number of points
num_points = 8

# Generate the points that will serve as our experimental data
x = np.linspace(0, 10, num_points)
y = 3 * x + 5
random_number = 7 * np.random.random(num_points)
y = y + random_number

# Do the regression
slope, intercept, r, p, std_err = stats.linregress(x, y)

# Print out the values of the slope, intercept, and correlation
# coefficient.
print('Slope = ' + str(slope))
print('Intercept = ' + str(intercept))
print('r = ' + str(r))
This code will print out:
Slope = 3.08771040246238
Intercept = 7.205285187141509
r = 0.9929123805249126
The built-in function also returns some statistics about the fit including the correlation coefficient between the experimental Y-values and those predicted by the model. Plotting that slope and intercept on top of our experimental data gives:
Figure 2: Data points along with fit results
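A plot like Fig. 2 can be made with a few lines of matplotlib (assuming it is installed; the styling choices below are my own, and the variables come from the regression code above):

import matplotlib.pyplot as plt

# Plot the noisy data points and the fitted line from the regression above
plt.plot(x, y, 'o', label='data')
plt.plot(x, slope * x + intercept, '-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()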
This problem is very simple, and, being linear, can be worked out by hand. For the sake of argument, let's assume we couldn't do that, and furthermore, that there weren't already canned routines to do a linear regression. How might we approach this problem in that case?
One answer would be to use an optimization routine and minimize the sum of the squared errors ourselves. We can use the Python function scipy.optimize.minimize to do this. In this case we do need to calculate the least-squares error ourselves, as it is the function we are actively trying to minimize.
import numpy as np
from scipy.optimize import minimize

# The model we are using; in this case, a straight line
def equation(x, m, b):
    return m * x + b

# This is the function we actually want to minimize
def objective(p, x_data, y_data):
    m = p[0]
    b = p[1]
    model_values = equation(x_data, m, b)
    res = np.sum(np.square(y_data - model_values))
    return res

# Set a constant random seed so the random numbers selected will always be
# the same
np.random.seed(2)

# Set number of points
num_points = 8

# Generate the points that will serve as our experimental data
x = np.linspace(0, 10, num_points)
y = 3 * x + 5
random_number = 7 * np.random.random(num_points)
y = y + random_number

# Initial guess of the slope and intercept
p = [0., 0.]

# Run the minimization routine and extract the results
results = minimize(objective, p, args=(x, y))
slope = results.x[0]
intercept = results.x[1]

# We need to manually calculate the correlation
y_model = equation(x, slope, intercept)
r = np.corrcoef(y, y_model)

# Print out the values of the slope, intercept, and correlation
# coefficient.
print('Slope = ' + str(slope))
print('Intercept = ' + str(intercept))
print('r = ' + str(r[0][1]))
This code gives the same results to within $10^{-8}$.
OK. That's an overly simple problem and is linear in nature, too. Let's look at something more complicated.
Back in the day, I did measurements of the thermodynamic properties of confined liquid helium (pure $^4$He) and mixtures of the two helium isotopes $^3$He and $^4$He as it went through the superfluid transition. We are going to concentrate on one of the mixtures here.
As part of the analysis, we need to compare our results on the confined system to those for bulk helium. The bulk specific heat at constant pressure and constant molar $^3$He concentration, denoted by $C_{px}$, can be calculated by interpolating previously published work. The difficulty is that the analysis needs the specific heat at constant pressure and constant chemical potential difference between $^3$He and $^4$He, denoted by $C_{p\phi}$. The transformation between the two involves derivatives of various thermodynamic quantities at $T_\lambda$. Again, much of this has been measured historically, so we'll simply state that we can transform from $C_{px}$ to $C_{p\phi}$ without difficulty.
The difficult part is that we also need to transform the reduced temperature $t$, which is the distance from the $\lambda$-line along a path of constant concentration, to a different reduced temperature $\theta$, the distance to the $\lambda$-line along a path of constant chemical potential difference. The term "$\lambda$-line" reflects the fact that the phase transition temperature, $T_\lambda$, changes as a function of concentration. Thus, if we vary the concentration $x$, we can plot a locus of phase transitions as a function of $x$ and $T$. This curve is where the superfluid transition occurs and where we need to evaluate certain thermodynamic derivatives in our analysis, hence the repeated reference to $\lambda$-line derivatives.
Fig. 3 summarizes what we have. It shows the specific heat of a mixture as it goes through the superfluid transition at the temperature $T_\lambda$. However, the data are plotted as a function of the reduced temperature $t = (T - T_\lambda) / T_\lambda$, where we want them as a function of $\theta$.
The relation between $C_{p\phi}$ and $\theta$ is given by,
\begin{equation} C_{p\phi} = \frac{A}{\alpha}\theta^{-\alpha}(1+D\theta^\Delta)+B.\label{eq:specific_heat} \end{equation}
Here, the parameters to be determined are $A$, $\alpha$, $D$, and $B$; $\Delta$ has a fixed value of 0.5. The value of $A$, the amplitude of the specific heat, is different on either side of $T_\lambda$, as is the value of $B$. I will use $A^\prime$ to denote the amplitude below $T_\lambda$ and $A$ above. Likewise, $B^\prime$ is for $T < T_\lambda$ and $B$ is for temperatures above the superfluid transition.
The process of converting $t$ to $\theta$ is as follows:
- Initially assume $t = \theta$.
- Fit Eq. \ref{eq:specific_heat}.
- Use parameters to convert $t$ to $\theta$.
- Iterate steps 2 and 3 until the ratio $A/A^\prime$ converges.
First we code up Eq. 1.
# This is Eq. 1
def equation(p, theta):
    # Extract individual parameters from p vector
    A = p[0]
    alpha = p[1]
    B = p[2]
    D = p[3]

    # Return specific heat
    return (A/alpha) * np.power(theta, -alpha) * (1.0 + D * np.power(theta, 0.5)) + B
Next, we code up our least-squares function that will actually be minimized. This function takes our parameter vector $q$ as well as additional arguments: the reduced temperatures on either side of the transition and the known bulk specific heats, also on both sides of the transition.
def objective(q, theta_warm, theta_cold, C_warm_exp, C_cold_exp):
    # Extract the parameters from the array q and calculate the specific
    # heat on the warm side
    # Format is (A, Aprime, alpha, B, D, Dprime)
    p = [q[0], q[2], q[3], q[4]]
    C_warm = equation(p, theta_warm)

    # Extract the parameters from the array q and calculate the specific
    # heat on the cold side
    p = [q[1], q[2], q[3], q[5]]
    C_cold = equation(p, theta_cold)

    # Sum the squared errors from both sides of the transition
    error_warm_sq = np.square(C_warm - C_warm_exp)
    error_cold_sq = np.square(C_cold - C_cold_exp)
    res = np.sum(error_warm_sq) + np.sum(error_cold_sq)
    return res
Next, we set up some variables specifically for this concentration.
conc = 0.1485          # mixture 1
Tlambda = 1.963165835  # mixture 1

# Calculate lambda-line derivatives
dX_dT = calc_dX_dT(conc)
dphi_dT = calc_dphi_dT(conc)
dS_dT = calc_dS_dT(conc)

# Take theta = t initially
theta_warm = t_warm
theta_cold = t_cold
Lastly, we fit and iterate.
# Initial guess for our parameters
q = [7.0, 10.0, -0.021, 375.0, -0.01, -0.01]

for i in range(5):
    results = minimize(objective, q,
                       args=(theta_warm, theta_cold, bulk_warm, bulk_cold),
                       method='Powell')

    # Format is (A, Aprime, alpha, B, D, Dprime)
    A = results.x[0]
    Aprime = results.x[1]
    alpha = results.x[2]
    B = results.x[3]
    D = results.x[4]
    Dprime = results.x[5]

    print(A/Aprime)
    print('Iteration', i+1, results.success)

    # t-to-theta conversion
    theta_warm = t_warm * (1. - dX_dT**(-1.) * dphi_dT**(-1.) *
        (dS_dT - (1./Tlambda)*(A/alpha*theta_warm**(-alpha) *
        (1./(1. - alpha) + D*theta_warm**0.5/(0.5 - alpha + 1.)) + B)))**(-1.)
    theta_cold = t_cold * (1. - dX_dT**(-1.) * dphi_dT**(-1.) *
        (dS_dT - (1./Tlambda)*(Aprime/alpha*theta_cold**(-alpha) *
        (1./(1. - alpha) + Dprime*theta_cold**0.5/(0.5 - alpha + 1.)) + B)))**(-1.)

    # Use newly fitted parameters as our new initial guesses
    q = results.x
The number of iterations is hard-coded in the example above. In practice, one would likely look at the ratio $A/A^\prime$ each iteration and see how much it changes. If the change falls below some tolerance, one would stop the process.
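A minimal sketch of what that stopping criterion might look like (the tolerance and iteration cap below are placeholder values, not tuned for this data):

max_iterations = 20   # placeholder cap on the number of fits
tolerance = 1.e-6     # placeholder convergence tolerance
ratio_old = np.inf

for i in range(max_iterations):
    results = minimize(objective, q,
                       args=(theta_warm, theta_cold, bulk_warm, bulk_cold),
                       method='Powell')
    ratio = results.x[0] / results.x[1]   # A / Aprime

    # Stop once the amplitude ratio stops changing appreciably
    if abs(ratio - ratio_old) < tolerance:
        break
    ratio_old = ratio

    # (the t-to-theta conversion and the q = results.x update from the
    #  loop above would go here)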
Figure 4: Bulk specific heat plotted along with the fit results.
This example is simple enough that we could probably get away with the curve_fit function included in scipy.optimize.
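A rough sketch of that interface, fitting only the warm side of the transition for illustration (curve_fit does not directly handle the shared-parameter bookkeeping that the hand-rolled objective above takes care of), might look like this:

from scipy.optimize import curve_fit

# curve_fit expects a model of the form f(x, *params)
def warm_model(theta, A, alpha, B, D):
    return (A/alpha) * np.power(theta, -alpha) * (1.0 + D * np.power(theta, 0.5)) + B

# Initial guesses taken from the warm-side values used above
p0 = [7.0, -0.021, 375.0, -0.01]
popt, pcov = curve_fit(warm_model, theta_warm, bulk_warm, p0=p0)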
The iterative fitting code above returns the following results:
$A = 9.334426752192908$
$A^\prime = 8.69867448279229$
$\alpha = -0.02185117731336304$
$B = 412.4595717146204$
$D = -0.07295632878279142$
$D^\prime = -0.3560880053296793$
Plugging these numbers back into our model and plotting the result on top of our data gives Fig. 4. But perhaps a clearer way to see the results in this case is to plot them on a semilog scale. Since the specific heat follows a near power law, the semilog plot should be almost linear. This is shown below in Fig. 5. Note that we use the absolute value of $\theta$ for $T < T_\lambda$ to avoid taking the log of a negative number.
Figure 5: A plot of the specific heat and fit on a semilog scale for clarity. The absolute value of $\theta$ is plotted on the x-axis to avoid issues with taking the log of a negative number.
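A plot like Fig. 5 could be generated along these lines (again assuming matplotlib is installed; the labels and markers are my own choices, and equation and the fitted parameters come from the code above):

import matplotlib.pyplot as plt

# Semilog plot of the bulk data and the fit against |theta|
plt.semilogx(np.abs(theta_warm), bulk_warm, 'o', label='data, T > T_lambda')
plt.semilogx(np.abs(theta_cold), bulk_cold, 's', label='data, T < T_lambda')
plt.semilogx(np.abs(theta_warm), equation([A, alpha, B, D], np.abs(theta_warm)), '-', label='fit, warm side')
plt.semilogx(np.abs(theta_cold), equation([Aprime, alpha, B, Dprime], np.abs(theta_cold)), '-', label='fit, cold side')
plt.xlabel('|theta|')
plt.ylabel('C_p_phi')
plt.legend()
plt.show()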
In solving this problem, there are a few issues I didn't address. The first is the possibility of local minima. This is actually the biggest issue with what I've done above, and one I ignored entirely.
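One common way to guard against local minima, which I did not do here, is to restart the fit from a handful of perturbed initial guesses and keep the best result. A minimal sketch (the number of restarts and the perturbation scale are arbitrary choices for illustration):

rng = np.random.default_rng(0)
best = None

for _ in range(20):
    # Randomly perturb the nominal initial guess
    q0 = np.asarray(q) * (1.0 + 0.5 * rng.standard_normal(len(q)))
    trial = minimize(objective, q0,
                     args=(theta_warm, theta_cold, bulk_warm, bulk_cold),
                     method='Powell')
    # Keep whichever restart reaches the lowest objective value
    if best is None or trial.fun < best.fun:
        best = trial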
Another issue is that the parameters vary over several orders of magnitude. The biggest parameter, $B$, turns out to be on the order of $10^2$, and the smallest, $\alpha$, is on the order of $10^{-2}$, so there are about four orders of magnitude between them. One can often run into convergence issues if the parameters differ in size over a vast range. We can get away with it here, but in a lot of problems I work on, the parameters can vary over 19 orders of magnitude. One potential way to deal with this is to fit the log of the parameters rather than the parameters themselves, which tends to put them all on an even footing. For example, if one parameter is on the order of $10^{-15}$ and another is of order $10^4$, then $\log(10^{-15}) = -15$ and $\log(10^4) = 4$; these numbers are much easier to work with in terms of getting convergence.
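As a sketch of that idea for a generic objective whose parameters are all positive (not quite the case here, since $\alpha$ and the $D$'s are negative): objective_sq, x_data, and y_data below are placeholders rather than functions or variables from the code above.

def objective_log(log_p, x_data, y_data):
    # The optimizer sees log10 of the parameters; convert back before
    # evaluating the underlying objective. Assumes all parameters > 0.
    p = np.power(10.0, log_p)
    return objective_sq(p, x_data, y_data)   # placeholder sum-of-squares objective

# Parameters of order 1e-15 and 1e4 become -15 and 4 in log space,
# which puts them on a much more even footing for the optimizer.
log_p0 = np.log10([1.e-15, 1.e4])
results = minimize(objective_log, log_p0, args=(x_data, y_data))
best_p = np.power(10.0, results.x)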
The last issue is the potential for certain parameter values used by the code in the optimization to return a value of $\pm\infty$ for either the model or its gradient. I neglected to say anything about this previously, but it is why I set the method equal to 'Powell' in the minimization function. This is a gradient-free method which allows for a work-around in this case.
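A complementary guard, not used in the analysis above, is to have the objective itself return a large but finite penalty whenever the model produces a non-finite value, so the optimizer simply backs away from that region. A sketch wrapping the objective defined earlier:

def safe_objective(q, theta_warm, theta_cold, C_warm_exp, C_cold_exp):
    res = objective(q, theta_warm, theta_cold, C_warm_exp, C_cold_exp)
    # Replace inf or nan with a large (but finite) penalty value
    if not np.isfinite(res):
        return 1.e30
    return res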
Neither of the above examples is particularly difficult, but the helium example has different equations on either side of the transition with some parameters shared between both sides while other parameters differ. I wanted to sketch out a general approach on how to set up and solve these sorts of problems.