Mount Sinai School of Medicine
Systems Biology—Biomedical Modeling
Development of Models II:
Model Fitting and Error Estimation
Kevin D. Costa
with
Steven H. Kleinstein and Uri Hershberg
Biomathematical Model
• A system of mathematical equations or
computer simulations that provides a
quantitative picture of how a complex
biological system functions under
healthy and diseased conditions
• Computational models use numerical
methods to examine mathematical
equations or systems of equations
too complex for analytical solution
Advantages of the
Modeling Approach
• Concise summary of present knowledge of
operation of a particular system
• Predict outcomes of modes of operation not
easily studied experimentally in a living system
• Provide diagnostic tools to test theories about
the site of suspected pathology or effect of
drug treatment
• Clarify or simplify complex experimental data
• Suggest new experiments to advance
understanding of a system
Limitations of the
Modeling Approach
• Models often require many simplifying
assumptions
– beware of garbage in, garbage out
• Validation of model predictions is essential
– examination of behavior under known limiting
conditions
– experimental validation
– limits of model performance can point out
what we don’t understand about a system
Forward Model
• A detailed mathematical model designed to incorporate a desired level of anatomic, physical, or physiologic features
– Can have arbitrary complexity as desired
– Parameter values often obtained from published literature
– Ex: cardiac electromechanical coupling (Trayanova and Tice, 2009), cell signaling networks
• Used for simulating realistic experimental data under precisely defined conditions to test hypotheses in silico
• Can help design better experiments and reduce animal use
• Generally too complicated for fitting to experimental data
• Allows generation of synthetic data sets with prescribed noise characteristics (Monte Carlo simulation) for evaluating parameters obtained by inverse modeling
Inverse Model
• A mathematical model designed to fit experimental data
so as to explicitly quantify physical or physiological
parameters of interest
• Values of model elements are obtained using parameter
estimation techniques aimed at providing a “best fit” to
the data
• Generally involves an iterative process to minimize the
average difference between the model and the data
• Evaluating the quality of an inverse model involves a
combination of established mathematical techniques as
well as intuition and creative insight
Characteristics of a
Good Inverse Model
• Fit is good—model should be able to adequately
describe a relatively noise-free data set
(of course a poor fit provides some insight also)
• Model parameters are unique
– Theoretically identifiable for noise-free data
– Well-determined model parameters in presence of
measurement noise
• Values of parameter estimates are consistent
with hypothesized physical or physiologic
meanings and change appropriately in response
to alterations in the actual system
Forward-Inverse Modeling
• A process of combined data simulation and
model fitting used for evaluating the robustness,
uniqueness, and sensitivity of parameters
obtained from an inverse model of interest
• A powerful tool for improving data analysis and
understanding the limitations on model
parameters used for system characterization
and distinguishing normal from abnormal
populations
Six Steps for
Inverse-Modeling of Data
1. Select an appropriate mathematical model
• Polynomial or other functional form
• Based on underlying theoretical equations
2. Define a “figure of merit” function
• Measures agreement between data & model for given parameters
3. Adjust model parameters to get a “best fit”
• Typically involves minimizing the figure of merit function
4. Examine “goodness of fit” to data
• Never perfect due to measurement noise
5. Determine whether a much better fit is possible
• Tricky due to possible local minima vs. global minimum
• F-test for comparing models of different complexity
6. Evaluate accuracy of best-fit parameter values
• Provide confidence limits and determine uniqueness
• Assess physical reasonability of estimated parameter values
Selecting the Model
• “Trend lines”
– Polynomial, exponential, and other standard
functions are often used when a data set seems to
follow a mathematical trend but the governing
formula is not known
• Physically-based equations
– Given knowledge of a governing physical process,
the desired model equations are derived from
underlying theoretical principles
– Resulting model parameters have a specific
physical interpretation
Least-Squares Error Minimization
• Goal is to fit N data points (xi, yi), i = 1..N
• The model is a function with M adjustable parameters ak, k = 1..M, used to generate N model points (xi, ŷi), where
ŷi = ŷ(xi, a1..aM)
• The residual measures the difference between a data point and the corresponding model estimate:
yi − ŷ(xi, a1..aM)
• Since residuals can be positive or negative, a sum of residuals,
Σi=1..N [yi − ŷ(xi, a1..aM)],
is not a good measure of overall error in the fit
• A better measure is the sum of squared residuals,
E = Σi=1..N [yi − ŷ(xi, a1..aM)]²,
which is only zero if each and every residual is zero
[Figure: data points (xi, yi) scattered about a model curve ŷ(xi, a1..aM)]
Maximum Likelihood Estimation
• Not meaningful to ask “What is the probability
that my set of model parameters is correct?”
– There is only one correct parameter set, known only to Mother Nature!
• Better to ask “Given my set of model
parameters, what is the probability that this data
set could be obtained?”
– What is the likelihood of the parameters given the
data?
• Inverse modeling is also known as “maximum
likelihood estimation”.
The Chi-Square Error Measure and Maximum Likelihood Estimation
• For a Gaussian distribution of measurement noise with varying standard deviation, σi, the probability, P, of the data set coming from the model parameters is given by
P ∝ Πi=1..N exp( −[yi − ŷ(xi)]² / (2σi²) )
• Maximizing this probability involves maximizing ln(P) or minimizing −ln(P), yielding the chi-square function of weighted residuals:
χ² = Σi=1..N [yi − ŷ(xi)]² / σi²
– The "weight" is the inverse of the variance of each measurement (wi = σi⁻²)
– Other functions may be useful for non-Gaussian measurement noise, yielding so-called "robust estimation" methods
• If variance is assumed to be uniform, then let σ = constant = 1, and the chi-square function yields the sum of squared residuals function defined earlier:
χ²|σ=1 = Σi=1..N [yi − ŷ(xi)]² = E
Minimizing Chi-Square
• Since the error in the model fit depends on the model parameters, ak, minimizing the chi-square function requires finding where the derivatives are zero:
χ² = Σi=1..N [yi − ŷ(xi)]² / σi²
∂(χ²)/∂ak = −2 Σi=1..N ( [yi − ŷ(xi)] / σi² ) ∂ŷ(xi, a1..aM)/∂ak = 0 ;  k = 1..M
• This yields a general set of M (nonlinear) equations for the M unknowns ak
• The model derivatives ∂ŷ/∂ak are often known exactly, or may be approximated numerically using finite differences (see the sketch below)
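A minimal sketch (not from the slides) of the finite-difference approximation of the model derivatives, using a hypothetical two-parameter model:

% Forward-difference approximation of dyhat/da_k (illustrative model)
model = @(a,x) a(1)*exp(-a(2)*x);     % hypothetical 2-parameter model
x = (0:0.5:5)';
a = [1.5; 0.8];                       % current parameter estimates
h = 1e-6;                             % small finite-difference step
dyda = zeros(numel(x), numel(a));
for k = 1:numel(a)
    ah = a;  ah(k) = ah(k) + h;       % perturb parameter k only
    dyda(:,k) = (model(ah,x) - model(a,x)) / h;
end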
Linear Regression Analysis
• Consider a set of measurements of photodetector voltage (dependent variable) as a function of incident laser intensity (independent variable)
• We propose to examine a linear relationship between voltage (y) and intensity (x):
ŷ = a + bx
data x = [x1, x2, …, xN] ;  data y = [y1, y2, …, yN] ;  model ŷ = [ŷ1, ŷ2, …, ŷN]
• Define the least-squares error norm quantifying the "goodness" of the linear fit, then adjust model parameters a and b to minimize this error
• We need to find the best set of values of a and b to fit the data:
E(a,b) = Σi=1..N [yi − (a + bxi)]²
Computing Model Parameters for Linear Regression
• We can determine the best values of a and b by calculating the partial derivatives of E with respect to a and b, and setting these to zero:
E(a,b) = Σi=1..N [yi − (a + bxi)]²
∂E(a,b)/∂a = 0 ;  ∂E(a,b)/∂b = 0
This yields 2 equations in terms of the mean values x̄ and ȳ to be solved for the 2 unknowns a and b, yielding:
b = (Σ xi yi − N x̄ ȳ) / (Σ xi² − N x̄²) ;  a = ȳ − b x̄
• Standard error of the estimate, sy·x, approximates the standard deviation of the population about the line of means in terms of the regression slope, b, and the x and y standard deviations, sx and sy:
sy·x = sqrt[ (N − 1)/(N − 2) · (sy² − b² sx²) ]
• Standard errors of the intercept and slope, sa and sb, are used for t-tests of a,b = 0 or to compute confidence intervals:
sa = sy·x · sqrt[ 1/N + x̄² / ((N − 1) sx²) ]
sb = sy·x / (sx · sqrt(N − 1))
Regression versus Correlation
• Correlation coefficient describes the strength of the association between the two variables:
r = Σ (xi − x̄)(yi − ȳ) / sqrt[ Σ (xi − x̄)² · Σ (yi − ȳ)² ]
– r → +1 if they increase together
– r → −1 if one decreases as the other increases
– r → 0 if they do not relate to one another
• The correlation coefficient can be related to results of the regression:
r² = 1 − [(N − 2)/(N − 1)] · (sy·x² / sy²)
with b = (Σ xi yi − N x̄ ȳ) / (Σ xi² − N x̄²) and a = ȳ − b x̄ as before
• Unlike the regression parameters, a and b, the correlation coefficient, r, is symmetric in x and y and therefore does not require choosing independent and dependent variables
Linearization of Nonlinear Models
• Many nonlinear equations can be "linearized" by selecting a suitable change of variables
– (A) Nonlinear Michaelis-Menten model of enzymatic reaction rate, V, in terms of substrate concentration, [S], and kinetic constants, Vmax and Km:
V = Vmax [S] / (Km + [S])
– (B) Linearized double-reciprocal Lineweaver-Burk representation of the Michaelis-Menten equation:
1/V = (Km/Vmax)(1/[S]) + 1/Vmax
• Historically this has been a common approach in analysis of scientific data, mainly due to ease of implementation
• However, "linearization" often distorts the error structure, violates key assumptions, and affects resulting model parameter values, which may lead to incorrect conclusions
• In our modern era of computers it is usually wisest to perform nonlinear least squares analysis when dealing with nonlinear data sets (as sketched below)
adapted from Lobemeier, 2000
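To illustrate the recommended approach, a minimal sketch (not from the slides) fitting the Michaelis-Menten equation directly by nonlinear least squares; the data values are made up, and lsqcurvefit requires the Optimization Toolbox:

% Direct nonlinear fit vs. Lineweaver-Burk linearization (illustrative)
mm = @(p,S) p(1)*S./(p(2) + S);         % p(1) = Vmax, p(2) = Km
S = [0.5 1 2 4 8 16]';                  % hypothetical substrate concentrations
V = [0.42 0.68 1.02 1.33 1.57 1.72]';   % hypothetical measured rates
p = lsqcurvefit(mm, [2 2], S, V);       % nonlinear least-squares estimates
% linearized double-reciprocal fit, for comparison:
c = polyfit(1./S, 1./V, 1);             % line: 1/V = c(1)*(1/S) + c(2)
Vmax_LB = 1/c(2);  Km_LB = c(1)/c(2);   % back-transformed; error structure distorted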
General Model Fitting
• It is important to understand where these regression
equations come from, but this is rarely done by hand.
• Spreadsheet programs typically have several trend-line functions built in, including nonlinear models that follow the same idea but cannot readily be solved analytically.
• Often in biomedical experiments, a data set is
governed by a system of equations determined by
underlying physical principles rather than just the
apparent shape of the curve.
Nonlinear Model Fitting
• The selected model ŷ is a nonlinear function of the model parameters ak, k = 1..M, comprising the vector a:
ŷi = ŷ(xi, a)
• The χ² merit function is still given by
χ²(a) = Σi=1..N [yi − ŷ(xi, a)]² / σi²
• The gradient of χ² with respect to the model parameters ak must approach zero at the minimum of χ²:
∂(χ²)/∂ak = −2 Σi=1..N ( [yi − ŷ(xi, a)] / σi² ) ∂ŷ(xi, a)/∂ak
• However, because the gradients are nonlinear functions of a, minimization must proceed by iteratively updating a until χ² stops decreasing
• In the steepest descent method, a is stepped along the downhill gradient direction,
a_next = a_current − γ ∇χ²(a_current)
where the constant γ must be small enough to not exhaust the downhill direction
• Alternative numerical methods include the inverse-Hessian method, the popular hybrid Levenberg-Marquardt method, and the robust but inefficient full Newton-type methods (a steepest-descent sketch follows below)
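A minimal steepest-descent sketch (not the slides' code; model, data, and step size are illustrative):

% Iteratively update a until chi-square stops decreasing (steepest descent)
model = @(a,x) a(1)*(1 - exp(-a(2)*x));        % hypothetical nonlinear model
x = (0.5:0.5:5)';  sigma = 0.05;
y = model([2; 1], x) + sigma*randn(size(x));   % synthetic data, true a = [2; 1]
chi2 = @(a) sum((y - model(a,x)).^2) / sigma^2;
a = [1; 2];              % initial guess
gamma = 1e-4;            % step size; must be small enough to stay downhill
h = 1e-6;                % finite-difference step for the gradient
for iter = 1:5000
    g = zeros(size(a));
    for k = 1:numel(a)   % numerical gradient of chi-square
        ah = a;  ah(k) = ah(k) + h;
        g(k) = (chi2(ah) - chi2(a)) / h;
    end
    a_new = a - gamma*g;                       % steepest-descent update
    if chi2(a_new) >= chi2(a), break; end      % chi-square stopped decreasing
    a = a_new;
end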
Goodness of Fit and the Residuals Plot
• The squared correlation coefficient (r²) is often used to characterize the goodness of fit between model and data
• However, a high correlation can exist even for a model that systematically differs from the data (all 3 examples have r² > 0.99)
• One must also examine the distribution of residuals—a good model fit should yield residuals equally distributed along x and normally distributed around zero with no systematic trends, as in A rather than B or C (see the plotting sketch below)
[Figure: model fits (top) and residuals (bottom) for examples A, B, and C]
adapted from Lobemeier, 2000
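A minimal sketch (not from the slides) of generating such a residuals plot for a deliberately mismatched model:

% Fit a straight line to mildly nonlinear data and inspect the residuals
x = (1:20)';
y = 0.5*x.^1.3 + 0.3*randn(size(x));    % hypothetical, mildly nonlinear data
p = polyfit(x, y, 1);                   % deliberately simple linear model
res = y - polyval(p, x);                % residuals of the linear fit
subplot(2,1,1); plot(x, y, 'x', x, polyval(p,x), '-'); title('model fit');
subplot(2,1,2); plot(x, res, 'o', x, 0*x, '-');        title('residuals');
% a systematic trend in the residuals flags a poor model despite high r^2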
Global Error Minimization
• The error function depends on the model parameters ak, and can be thought of as an M-dimensional "surface" of which we seek the minimum
• Depending on the complexity of the model (i.e., the number of model degrees of freedom) the error surface may be quite "bumpy"
• A challenge is to ensure that a given set of "optimal" model parameters represents the true global minimum of the error surface, and not a local minimum
• This can be tested by varying the initial guesses and comparing the resulting model parameters and fitting error (see the multi-start sketch below)
from Numerical Recipes online
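A minimal multi-start sketch (not from the slides); the model echoes the MATLAB example that follows, and fminsearch is base MATLAB:

% Repeat the fit from many random initial guesses; compare the optima
model = @(a,x) a(1) + a(2)*x.^2 + a(3)*sin(a(4)*x);  % bumpy example model
x = linspace(0, 3, 40)';
y = model([1; 2; -1; 6], x) + 0.01*randn(size(x));   % synthetic data
sse = @(a) sum((y - model(a,x)).^2);                 % fitting error
best = inf;  a_best = [];
for trial = 1:25
    a0 = [rand; 3*rand; 4*rand - 2; 8*rand];         % random initial guess
    [a, fval] = fminsearch(sse, a0);                 % derivative-free minimizer
    if fval < best, best = fval; a_best = a; end
end
% widely differing optima across restarts reveal a bumpy error surface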
Implementation in MATLAB
% basic model fitting routine
global known;
filename = input('Enter the name of file: ','s');
data = dlmread(filename);            % read x and y data from file
x_data = data(:,1);
y_data = data(:,2);
known = 5;                           % assign known model parameters
guess = [.1 .1 1 1];                 % guess initial values of unknown model parameters
LB = [-1000, -1000, -1000, -1000];   % LB and UB can be used to enforce lower and upper
UB = [1000, 1000, 1000, 1000];       % bounds on parameter values if desired
[optimum,resnorm,residual] = lsqcurvefit(@model,guess,x_data,y_data,LB,UB)
% optimum contains the fitted model parameters
% resnorm is the sum of squared residuals fitting error
% residual is a vector of residuals for further inspection
y_model = model(optimum,x_data);     % generate vector of simulated data for plotting
plot(x_data,y_data,'bx',x_data,y_model,'r-');
xlabel('Independent Variable (units)');
ylabel('Dependent Variable (units)');
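Note: lsqcurvefit is part of the Optimization Toolbox; if it is unavailable, the base-MATLAB fminsearch can minimize the summed squared error instead, as sketched earlier.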
file model.m
function y = model(a,x)
% model may depend on known variables
global known;
y = a(1) + a(2)*x.^2 + a(3).*sin(a(4).*x) - known;
return
MATLAB Example (continued)
y = a(1) + a(2)*x.^2 + a(3).*sin(a(4).*x) - known,  with known = 5.0000
true    = [ 1.0000   2.0000  -1.0000   6.0000]

Initial guess 1 (converges to a local minimum):
guess   = [ 0.8000   1.5000   2.0000   2.0000]
optimum = [ 0.0288   2.2895   0.9220   1.7929]
resnorm = 12.2049

Initial guess 2 (converges to the global minimum):
guess   = [ 0.8000   1.5000  -2.0000   5.0000]
optimum = [ 1.0000   2.0000  -1.0000   6.0000]
resnorm = 8.0159e-10

[Figure: Y values versus X values for each initial guess, showing the data, the initial guess, and the optimal fit]
Comparing Two Model Fits
• The number of data points, N, must exceed the number of model parameters, M, yielding the degrees of freedom (DOF = N − M, so M ≤ N − 1)
• Increasing M using a more complex model will generally improve the quality of fit and reduce χ²
• The mean squared error, MSE, can be used to compare two models fit to a given data set:
MSE = χ² / DOF = χ² / (N − M)
• Increasing MSE with decreasing DOF can reveal an over-parameterized model
• An F-statistic can be computed to compare the results of two model fits (see the sketch below):
F = [ (χ²simple − χ²complex) / (DOFsimple − DOFcomplex) ] / [ χ²complex / DOFcomplex ]
df1 = DOFsimple − DOFcomplex = Mcomplex − Msimple
df2 = DOFcomplex = N − Mcomplex
– F ≈ 1: the simpler model is adequate
– F > 1: the more complex model is better, or random error led to a better fit with the complex model
– P-value defines the probability of such a "false positive" result (look up in an F table)
[Figure: data fit by model 1 (simple) and model 2 (complex)]
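A minimal F-test sketch (not from the slides), comparing simple and complex polynomial fits to one hypothetical data set; fcdf requires the Statistics Toolbox:

% Compare two nested model fits with an F-statistic
x = (1:15)';  y = 0.2*x.^2 + x + randn(size(x));   % hypothetical data
N = numel(x);
p1 = polyfit(x, y, 1);  chi2_s = sum((y - polyval(p1,x)).^2);  % simple, M = 2
p2 = polyfit(x, y, 2);  chi2_c = sum((y - polyval(p2,x)).^2);  % complex, M = 3
DOF_s = N - 2;  DOF_c = N - 3;
F = ((chi2_s - chi2_c)/(DOF_s - DOF_c)) / (chi2_c/DOF_c);
P = 1 - fcdf(F, DOF_s - DOF_c, DOF_c);   % probability of a "false positive"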
Accuracy of Estimated Model Parameters
• The underlying true set of system parameters, atrue, is known to Mother Nature but unknown to the experimenter
• True parameters are statistically realized, along with measurement errors, as the measured data set D(0)
• Fitting D(0) using χ² minimization yields the estimated model parameters a(0)
• Other experiments could have resulted in data sets D(1), D(2), etc., which would have yielded model parameters a(1), a(2), etc.
• We wish to estimate the probability distribution of a(j) − atrue without knowing atrue and without a limitless number of hypothetical data sets. Hmmmm…
[Figure: hypothetical data sets D(0), D(1), D(2), D(3) realized from atrue]
from Numerical Recipes online
Monte Carlo Simulation of Synthetic Data Sets
• Assume that if a(0) is a reasonable estimate of atrue, then the distribution of a(j) − a(0) should be similar to that of a(j) − atrue
• With the assumed a(0), and some understanding of the characteristics of the measurement noise, we can generate "synthetic data sets" DS(1), DS(2), … at the same xi values as the actual data set, D(0), that have the same relationship to a(0) as D(0) has to atrue
• For each DS(j), perform a model fit to obtain the corresponding aS(j), yielding one point aS(j) − a(0) for simulating the desired M-dimensional probability distribution. This is a very powerful technique!! (a sketch follows below)
• Note: if the experimental σi² are not known, they can be estimated after the fit as
σ² ≈ Σi=1..N [yi − ŷ(xi, a(0))]² / (N − M)
and synthetic noise generated with the randn function in MATLAB
[Figure: synthetic data sets DS(1)–DS(4) generated from a(0); 2-parameter probability distribution for 1,000 Monte Carlo simulations]
from Numerical Recipes online
The Bootstrap Method
• If you don’t know enough about the measurement errors
(i.e. cannot even say they are normally distributed) then
Monte Carlo simulation cannot be used.
• Bootstrap Method uses actual data set D(0), with its N data
points, to generate synthetic data sets DS(1), DS(2),… also
with N data points.
• Randomly select N data points from D(0) with replacement,
which makes DS(j) differ from D(0) with a fraction of the
original points replaced by duplicated original points.
• The 2 merit function does not depend on the order of (xi,yi),
so fitting the DS(j) data yields model parameter sets aS(j) as
with Monte Carlo, except using actual measurement noise.
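A minimal bootstrap sketch (not from the slides); the model, data, and starting guess are illustrative, and prctile requires the Statistics Toolbox:

% Resample the actual data with replacement and refit each synthetic set
model = @(a,x) a(1) + a(2)*x.^2;             % hypothetical 2-parameter model
x = linspace(0, 2, 25)';
y = model([1 2], x) + 0.1*randn(size(x));    % stand-in for the real data D(0)
N = numel(x);  nboot = 1000;
aS = zeros(nboot, 2);
for j = 1:nboot
    idx = randi(N, N, 1);                    % N draws with replacement
    aS(j,:) = lsqcurvefit(model, [0.5 0.5], x(idx), y(idx));
end
ci = prctile(aS, [5 95]);                    % percentile confidence intervals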
Confidence Intervals and Accuracy of Model Parameters
• The probability distribution is a function defined on the M-dimensional space of parameters a
• A confidence interval is a region that contains a high percentage of the total distribution relative to model parameters of interest
• You choose the confidence level (e.g. 68.3%, 90%, etc.) and the region shape
– e.g. lines, ellipses, ellipsoids
• You want a region that is compact and reasonably centered on a(0)
In MATLAB: y = prctile(x,[5 95])
from Numerical Recipes online
Validating Physical Interpretation of
Model Parameters
• Physical sensibleness
– Chemical rate constant cannot be negative
– Poisson’s ratio cannot exceed 0.5
– Can enforce lower and upper bounds on parameters,
but should examine closely if these end up “optimal”
• Independent measurements of key physical quantities
– Comparison with published values or limiting behavior
– Measure steady state modulus of viscoelastic material
• Experimentally alter specific parameters, collect data, and
examine results of model fit
– May involve building a physical model for testing
• Compare model fitting results using data from normal and
abnormal populations
– In asthma patients, airway resistance should be higher than normal
Problem Set
B lymphocytes in the immune response
The movie is available at http://www.youtube.com/watch?v=iDYL4x1Q6uU
[Images: www.EnCognitive.com; Harnett, Science, 2006]
Problem Set
• Ordinary differential equation (ODE) model of BrdU labeling to estimate proliferation and death rates of B cells
U – number of unlabeled B cells
L – number of BrdU-labeled B cells
p – rate of proliferation (per hour)
d – rate of death (per hour)
s – rate of B cell inflow from source (cells/hr)
• Given experimental data on the fraction of total B cells labeled with BrdU versus time, develop a model to fit the data, estimate values of p, s, and d, and evaluate the model performance (a starter sketch follows below)
Steven Kleinstein and Uri Hershberg
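As a starting point, a minimal ode45 scaffold (not from the slides); the right-hand-side equations shown are one common assumption for BrdU labeling, in which dividing unlabeled cells produce labeled daughters, and may differ from the model you derive:

% Starter sketch with hypothetical equations; rate values are illustrative
p = 0.01;  d = 0.02;  s = 100;            % per-hour rates and source (cells/hr)
rhs = @(t,c) [ s - (p + d)*c(1);          % dU/dt: inflow, division, death (assumed)
               2*p*c(1) + (p - d)*c(2) ]; % dL/dt: newly labeled + division (assumed)
[t, c] = ode45(rhs, [0 120], [1e4; 0]);   % 120 hr from an all-unlabeled start
fracL = c(:,2) ./ sum(c, 2);              % labeled fraction, to compare with data
plot(t, fracL); xlabel('Time (hr)'); ylabel('Fraction BrdU labeled');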
Resources
• Numerical Recipes online
http://www.nrbook.com/a/bookfpdf.php
• MATLAB online help
http://www.mathworks.com/access/helpdesk/help/techdoc/
• GraphPad PRISM online guide to curve fitting
http://www.graphpad.com/manuals/prism4/regressionbook.pdf
• References
Anderson SM, Khalil A, Uduman M, Hershberg U, Louzoun Y, Haberman AM,
Kleinstein SH, Shlomchik MJ. Taking advantage: high-affinity B cells in the
germinal center have lower death rates, but similar rates of division, compared
to low-affinity cells, J Immunol, 183:7314-7325, 2009.
Glantz SA. Primer of Biostatistics, 6th Ed., McGraw-Hill, 2005.
Lobemeier ML. Linearization plots: time for progress in regression, BioMedNet,
issue 73, March 3, 2000.
Lutchen KR and Costa KD. Physiological interpretations based on lumped element models fit to respiratory impedance data: use of forward-inverse modeling, IEEE Trans Biomed Eng, 37:1076-1086, 1990.
Slides from a lecture in the course Systems Biology—Biomedical Modeling.
Citation: K. D. Costa, S. H. Kleinstein, U. Hershberg, Biomedical model fitting and error analysis. Sci. Signal. 4, tr9 (2011).
www.sciencesignaling.org