CHEE434/821 Module 0 Slides
CHEE801
Module 5: Nonlinear Regression
Notation

Model:

  Y_i = f(x_i, θ) + ε_i

where ε_i is the random noise component, x_i contains the explanatory variables (the i-th run conditions), and θ is a p-dimensional vector of parameters.

Model specification –
– the model equation is f(x_i, θ)
– with n experimental runs, we have

  η(θ) = [f(x_1, θ), f(x_2, θ), …, f(x_n, θ)]ᵀ

– η(θ) defines the expectation surface
– the nonlinear regression model is Y = η(θ) + ε
Parameter Estimation – Gauss-Newton Iteration
Least squares estimation – minimize

  S(θ) = eᵀe = (Y − η(θ))ᵀ(Y − η(θ)) = ‖Y − η(θ)‖²

A numerical optimization procedure is required.
One possible method:
1. Linearization about the current estimate of the parameters
2. Solution of the linear(ized) regression problem to obtain the next
parameter estimate
3. Iteration until a convergence criterion is satisfied
Linearization about a nominal parameter vector
Linearize the expectation function η(θ) in terms of the parameter
vector θ about a nominal vector θ0:
  η(θ) ≈ η(θ0) + V0 (θ − θ0)

where the sensitivity matrix

  V0 = ∂η(θ)/∂θᵀ evaluated at θ = θ0

     ⎡ ∂f(x_1, θ)/∂θ_1  …  ∂f(x_1, θ)/∂θ_p ⎤
   = ⎢         ⋮                   ⋮        ⎥   evaluated at θ0
     ⎣ ∂f(x_n, θ)/∂θ_1  …  ∂f(x_n, θ)/∂θ_p ⎦

– V0 is the Jacobian of the expectation function
– it contains first-order sensitivity information
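The sensitivity matrix can be approximated numerically when analytic derivatives are awkward. A minimal sketch, using a hypothetical model f(x, θ) = θ₁ exp(−θ₂ x) (an illustration, not a model from these slides) and central finite differences:

```python
import numpy as np

# Hypothetical model, chosen only to illustrate building V0:
# f(x, theta) = theta1 * exp(-theta2 * x)
def f(x, theta):
    return theta[0] * np.exp(-theta[1] * x)

def sensitivity_matrix(f, x, theta, h=1e-6):
    """n-by-p Jacobian of the expectation function by central differences."""
    n, p = len(x), len(theta)
    V = np.zeros((n, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = h
        V[:, j] = (f(x, theta + step) - f(x, theta - step)) / (2.0 * h)
    return V

x = np.linspace(0.0, 5.0, 8)
theta0 = np.array([2.0, 0.5])
V0 = sensitivity_matrix(f, x, theta0)

# Analytic columns for this particular model, for comparison:
# df/dtheta1 = exp(-theta2*x), df/dtheta2 = -theta1*x*exp(-theta2*x)
V_exact = np.column_stack([np.exp(-theta0[1] * x),
                           -theta0[0] * x * np.exp(-theta0[1] * x)])
```

Each column of V0 is the first-order sensitivity of the predicted responses to one parameter, evaluated at the nominal θ0.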
Parameter Estimation – Gauss-Newton Iteration
Iterative procedure consisting of:

1. Linearization about the current estimate of the parameters:

   Y − η(θ^(i)) ≈ V^(i) δ^(i+1) + e

2. Solution of the linearized regression problem to obtain the next parameter update:

   δ^(i+1) = (V^(i)ᵀ V^(i))⁻¹ V^(i)ᵀ (y − η(θ^(i)))

   θ^(i+1) = θ^(i) + δ^(i+1)

3. Iterate until the parameter estimates converge
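The iteration above can be sketched in a few lines. A minimal, assumed example: the same hypothetical model f(x, θ) = θ₁ exp(−θ₂ x), noise-free synthetic data, an analytic Jacobian, and the normal-equations step from step 2:

```python
import numpy as np

# Hypothetical model (not from the slides): f(x, theta) = theta1*exp(-theta2*x)
def f(x, theta):
    return theta[0] * np.exp(-theta[1] * x)

def jacobian(x, theta):
    # Analytic sensitivity matrix V for this model
    e = np.exp(-theta[1] * x)
    return np.column_stack([e, -theta[0] * x * e])

def gauss_newton(x, y, theta, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        r = y - f(x, theta)                      # residuals y - eta(theta)
        V = jacobian(x, theta)
        # delta = (V^T V)^{-1} V^T r : the linearized least-squares step
        delta = np.linalg.solve(V.T @ V, V.T @ r)
        theta = theta + delta
        if np.linalg.norm(delta) < tol:          # convergence criterion
            break
    return theta

x = np.linspace(0.0, 5.0, 10)
theta_true = np.array([2.0, 0.5])
y = f(x, theta_true)                             # noise-free data for illustration
theta_hat = gauss_newton(x, y, np.array([1.5, 0.6]))
```

With noise-free data and a reasonable starting guess the iteration recovers the true parameters; with real data it converges to the least squares estimates instead.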
Computational Issues in Gauss-Newton Iteration
The Gauss-Newton iteration can be subject to poor numerical conditioning for some parameter values:
» Conditioning problems arise in the inversion of VᵀV
» Solution – use a decomposition technique
• QR decomposition
• Singular Value Decomposition (SVD)
» Use a different optimization technique
» Don’t try to estimate so many parameters
• Simplify the model
• Fix some parameters at reasonable values
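A small illustration of why the decomposition route helps (the matrix below is an assumed example, not from the slides): with nearly collinear columns, cond(VᵀV) is the square of cond(V), so solving the least-squares step by SVD (what `np.linalg.lstsq` does; a QR solve behaves similarly) avoids the worst of the conditioning while giving essentially the same fitted step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative, mildly ill-conditioned sensitivity matrix: two nearly
# collinear columns make V^T V much worse conditioned than V itself.
t = np.linspace(0.0, 1.0, 20)
V = np.column_stack([t, t + 1e-4 * rng.standard_normal(20)])
r = rng.standard_normal(20)

# Normal-equations step: (V^T V)^{-1} V^T r
delta_normal = np.linalg.solve(V.T @ V, V.T @ r)

# SVD-based least-squares step
delta_svd, *_ = np.linalg.lstsq(V, r, rcond=None)

# Conditioning: cond(V^T V) is cond(V) squared
c_V = np.linalg.cond(V)
c_VtV = np.linalg.cond(V.T @ V)

# The fitted values from either step agree even when the parameter
# components themselves are sensitive to round-off.
fit_normal = V @ delta_normal
fit_svd = V @ delta_svd
```

When even the decomposition step struggles, the slide's other remedies apply: simplify the model or fix some parameters.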
Other numerical estimation methods
• Nonlinear least-squares is a minimization problem
• Use any good optimization technique to find parameter estimates
to minimize the sum of squares of the residuals
Inference – Joint Confidence Regions
• Approximate confidence regions for parameters and predictions can be obtained by using a linearization approach
• Approximate covariance matrix for the parameter estimates:

  Σ̂_θ̂ ≈ (V̂ᵀ V̂)⁻¹ σ²

• where V̂ is the Jacobian of η(θ) evaluated at the least squares parameter estimates
• This covariance matrix tends to the true covariance matrix of the parameter estimates asymptotically, as the number of data points becomes infinite

100(1−α)% joint confidence region for the parameters:

  (θ − θ̂)ᵀ V̂ᵀ V̂ (θ − θ̂) ≤ p s² F_{p, n−p, α}

» Compare to the linear regression case
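A sketch of the covariance computation, reusing the hypothetical model f(x, θ) = θ₁ exp(−θ₂ x) from earlier (the Jacobian and the noise variance s² = 0.01 below are assumed, illustrative values):

```python
import numpy as np

# Illustrative Jacobian for the assumed model f(x, theta) = theta1*exp(-theta2*x)
# evaluated at assumed least squares estimates theta_hat = (2.0, 0.5).
x = np.linspace(0.0, 5.0, 10)
theta_hat = np.array([2.0, 0.5])
e = np.exp(-theta_hat[1] * x)
V_hat = np.column_stack([e, -theta_hat[0] * x * e])

s2 = 0.01                                   # assumed noise variance estimate
cov = s2 * np.linalg.inv(V_hat.T @ V_hat)   # approximate covariance matrix

std_err = np.sqrt(np.diag(cov))             # marginal standard errors
corr = cov / np.outer(std_err, std_err)     # parameter-estimate correlation matrix
```

The diagonal of `cov` feeds the marginal intervals on the next slide; the off-diagonal entries of `corr` are the parameter-estimate correlations used in the diagnostics later in the module.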
Inference – Marginal Confidence Intervals
• Marginal confidence intervals
» Confidence intervals on individual parameters:

  θ̂_i ± t_{ν, α/2} s_{θ̂_i}

where s_{θ̂_i} is the approximate standard error of the parameter estimate
• square root of the i-th diagonal element of the approximate parameter estimate covariance matrix, with the noise variance estimated as in the linear case:

  Σ̂_θ̂ ≈ (V̂ᵀ V̂)⁻¹ s²
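A minimal sketch of the marginal interval, with assumed estimates and standard errors; the quantile t_{8, 0.025} = 2.306 is taken from a t-table (ν = n − p = 8 assumed):

```python
import numpy as np

theta_hat = np.array([2.0, 0.5])   # assumed least squares estimates
std_err = np.array([0.08, 0.03])   # assumed sqrt of the covariance diagonal
t_crit = 2.306                     # t_{8, 0.025} from a t-table (assumed nu = 8)

lower = theta_hat - t_crit * std_err
upper = theta_hat + t_crit * std_err
```

A parameter whose interval covers zero is a candidate for the "drop or fix" decisions discussed in the diagnostics slides.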
Precision of the Predicted Responses
– Linear Case
From the linear regression module
The predicted response from an estimated model has uncertainty, because
it is a function of the parameter estimates which have uncertainty:
e.g., Solder Wave Defect Model - first response at the point -1,-1,-1
  ŷ₁ = β̂₀ + β̂₁(−1) + β̂₂(−1) + β̂₃(−1)

If the parameter estimates were uncorrelated, the variance of the predicted response would be:

  Var(ŷ₁) = Var(β̂₀) + Var(β̂₁) + Var(β̂₂) + Var(β̂₃)

Why?
Precision of the Predicted Responses - Linear
In general, both the variances and covariances of the parameter estimates
must be taken into account.
For prediction at the k-th data point:

  Var(ŷ_k) = x_kᵀ (XᵀX)⁻¹ x_k σ²

           = [x_{k1}  x_{k2}  …  x_{kp}] (XᵀX)⁻¹ [x_{k1}  x_{k2}  …  x_{kp}]ᵀ σ²

  Var(ŷ_k) = x_kᵀ (XᵀX)⁻¹ x_k σ² = x_kᵀ Σ̂_β̂ x_k
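The linear formula can be checked directly. A sketch with an assumed 2³ full factorial design (so XᵀX = 8I) and an assumed σ² = 0.25, predicting at the (−1, −1, −1) point from the solder wave example:

```python
import numpy as np

# Assumed design: 2^3 full factorial with an intercept column, so X^T X = 8 I.
X = np.array([[1.0, a, b, c]
              for a in (-1.0, 1.0)
              for b in (-1.0, 1.0)
              for c in (-1.0, 1.0)])
sigma2 = 0.25                              # assumed noise variance

XtX_inv = np.linalg.inv(X.T @ X)
cov_beta = sigma2 * XtX_inv                # parameter covariance matrix

x_k = np.array([1.0, -1.0, -1.0, -1.0])    # prediction at (-1, -1, -1)
var_pred = x_k @ XtX_inv @ x_k * sigma2    # direct formula
var_pred2 = x_k @ cov_beta @ x_k           # via the covariance matrix
```

For this orthogonal design the two routes agree and reduce to (p/n)σ² = (4/8)(0.25) = 0.125, matching the "sum of variances" intuition from the previous slide because the estimates are uncorrelated here.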
Precision of the Predicted Responses - Nonlinear
Linearize the prediction equation about the least squares estimate:

  ŷ_k = f(x_k, θ̂);  f(x_k, θ) ≈ f(x_k, θ̂) + v̂_kᵀ (θ − θ̂),  where v̂_k = ∂f(x_k, θ)/∂θ evaluated at θ̂

For prediction at the k-th data point:

  Var(ŷ_k) = v̂_kᵀ (V̂ᵀV̂)⁻¹ v̂_k σ²

           = [v̂_{k1}  v̂_{k2}  …  v̂_{kp}] (V̂ᵀV̂)⁻¹ [v̂_{k1}  v̂_{k2}  …  v̂_{kp}]ᵀ σ²

Note – Var(ŷ_k) = v̂_kᵀ (V̂ᵀV̂)⁻¹ v̂_k σ² = v̂_kᵀ Σ̂_θ̂ v̂_k
Estimating Precision of Predicted Responses
Use an estimate of the inherent noise variance:

  s²_{ŷ_k} = x_kᵀ (XᵀX)⁻¹ x_k s²      (linear)

  s²_{ŷ_k} = v̂_kᵀ (V̂ᵀV̂)⁻¹ v̂_k s²     (nonlinear)
The degrees of freedom for the estimated variance of the predicted
response are those of the estimate of the noise variance
» replicates
» external estimate
» MSE
Confidence Limits for Predicted Responses
Linear and Nonlinear Cases:
Follow an approach similar to that for parameters - 100(1-α)% confidence
limits for the mean value of a predicted response are:
  ŷ_k ± t_{ν, α/2} s_{ŷ_k}

» degrees of freedom are those of the inherent noise variance estimate

If the prediction is for a new data value, confidence intervals are:

  ŷ_k ± t_{ν, α/2} √(s_e² + s²_{ŷ_k})
Why?
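The two intervals differ only by the extra s_e² term, which accounts for the inherent noise in a single new observation. A sketch with assumed numbers (prediction, variances, and t_{8, 0.025} = 2.306 from a t-table):

```python
import numpy as np

y_hat_k = 3.2     # assumed predicted response
s2_e = 0.04       # assumed inherent noise variance estimate, s_e^2
s2_pred = 0.010   # assumed variance of the predicted response, s_yhat^2
t_crit = 2.306    # t_{8, 0.025} from a t-table (assumed nu = 8)

half_mean = t_crit * np.sqrt(s2_pred)        # half-width, mean response
half_new = t_crit * np.sqrt(s2_e + s2_pred)  # half-width, new observation
ci_mean = (y_hat_k - half_mean, y_hat_k + half_mean)
ci_new = (y_hat_k - half_new, y_hat_k + half_new)
```

The new-observation interval is always the wider one, which answers the slide's "Why?": a new value carries its own noise on top of the uncertainty in the fitted mean.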
Properties of LS Parameter Estimates
Key Point - parameter estimates are random variables
» because stochastic variation in data propagates through
estimation calculations
» parameter estimates have a variability pattern - probability
distribution and density functions
Unbiased

  E{θ̂} = θ
» “average” of repeated data collection / estimation sequences will
be true value of parameter vector
Properties of Parameter Estimates
Linear Regression Case
– Least squares estimates are –
» Unbiased
» Consistent
» Efficient
Nonlinear Regression Case
– Least squares estimates are –
» Asymptotically unbiased – as number of data points becomes
infinite
» Consistent
» Efficient
Diagnostics for nonlinear regression
• Similar to linear case
• Qualitative – residual plots
– Residuals vs.
» Factors in model
» Sequence (observation) number
» Factors not in model (covariates)
» Predicted responses
– Things to look for:
» Trend remaining
» Non-constant variance
• Qualitative – plot of observed and predicted responses
– Predicted vs. observed – slope of 1
– Predicted and observed – as function of independent variable(s)
Diagnostics for nonlinear regression
• Quantitative diagnostics
– Ratio tests:
» 3 tests are the same as for linear case
» R-squared
• coarse measure of significant trend
• squared correlation of observed and predicted values
» adjusted R-squared
• accounts for the number of parameters estimated
Diagnostics for nonlinear regression
• Quantitative diagnostics
– Parameter confidence intervals:
» Examine marginal intervals for parameters
• Based on linear approximations
• Can also use hypothesis tests
» Consider dropping parameters that aren’t statistically significant
» What should we do if parameters are
• Not significantly different from zero
• Not significantly different from the initial guesses
» In nonlinear models, parameters are more likely to be involved
in complex expressions with factors and other parameters
• E.g., Arrhenius reaction rate expression
» If possible, examine joint confidence regions
Diagnostics for nonlinear regression
• Quantitative diagnostics
– Parameter estimate correlation matrix:
» Examine correlation matrix for parameter estimates
• Based on linear approximation
• Compute covariance matrix, then normalize using pairs of
standard deviations
» Note significant correlations and keep these in mind when
retaining/deleting parameters using marginal significance tests
» Significant correlation between some parameter estimates may
indicate over-parameterization relative to the data collected
• Consider dropping some of the parameters whose estimates
are highly correlated
• Further discussion – Chapter 3 of Bates and Watts (1988),
Chapter 5 of Seber and Wild (1989)
Practical Considerations
– What kind of stopping conditions should be used to determine
convergence?
– Problems with local minima?
– Reparameterization to reduce correlation between parameter
estimates
• Ensuring physically realistic parameter estimates
– Common problem – we know that some parameters should be
positive or should be bounded between reasonable values
– Solutions
» Constrained optimization algorithm to enforce non-negativity of parameters
» Reparameterization tricks – estimate φ instead of θ:
• θ = exp(φ)  →  θ positive
• θ = 10^φ  →  θ positive
• θ = 1/(1 + e^(−φ))  →  θ bounded between 0 and 1
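The transforms above can be sketched directly; the optimizer works with the unconstrained φ, and θ is recovered inside the model. The grid of φ values below is illustrative:

```python
import numpy as np

phi = np.linspace(-5.0, 5.0, 11)       # unconstrained working parameter

theta_pos = np.exp(phi)                # theta = exp(phi)        -> theta > 0
theta_01 = 1.0 / (1.0 + np.exp(-phi))  # theta = 1/(1 + e^-phi)  -> 0 < theta < 1

# Round trips: phi is recoverable from theta, so nothing is lost
phi_back_pos = np.log(theta_pos)
phi_back_01 = np.log(theta_01 / (1.0 - theta_01))
```

Because the mappings are invertible, standard errors for θ can be carried back from φ by linearization of the transform.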
Practical considerations
• Correlation between parameter estimates
– Reduce by reparameterization
– Exponential example –

  θ₁ exp(θ₂ x) = θ₁ exp(θ₂ (x − x₀ + x₀))
              = [θ₁ exp(θ₂ x₀)] exp(θ₂ (x − x₀))
              = κ exp(θ₂ (x − x₀)),  where κ = θ₁ exp(θ₂ x₀)
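The effect on the estimates can be checked numerically. An illustrative sketch (assumed model values, with x₀ taken as the mean of the run conditions): the parameter-estimate correlation, read off the normalized off-diagonal of (VᵀV)⁻¹, shrinks when the exponential is centred:

```python
import numpy as np

def est_correlation(V):
    # Correlation of the two parameter estimates from (V^T V)^{-1}
    cov = np.linalg.inv(V.T @ V)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

x = np.linspace(0.0, 10.0, 11)
theta1, theta2 = 2.0, -0.05        # assumed parameter values
x0 = x.mean()

# Original parameterization: eta = theta1 * exp(theta2 * x)
e = np.exp(theta2 * x)
V_orig = np.column_stack([e, theta1 * x * e])

# Centred: eta = kappa * exp(theta2*(x - x0)), kappa = theta1*exp(theta2*x0)
kappa = theta1 * np.exp(theta2 * x0)
ec = np.exp(theta2 * (x - x0))
V_cent = np.column_stack([ec, kappa * (x - x0) * ec])

rho_orig = est_correlation(V_orig)
rho_cent = est_correlation(V_cent)
```

For these assumed values the centred parameterization gives a much smaller correlation magnitude, which is exactly the conditioning benefit the slide describes.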
Practical considerations
• Particular example – Arrhenius rate expression

  k = k₀ exp(−E/(R T))
    = k₀ exp(−E/(R T_ref)) exp(−(E/R)(1/T − 1/T_ref))
    = k_ref exp(−(E/R)(1/T − 1/T_ref)),  where k_ref = k₀ exp(−E/(R T_ref))

– Reduces correlation between parameter estimates and improves
conditioning of the estimation problem
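The two parameterizations are algebraically identical, which a quick check confirms; all parameter values below (k₀, E, T_ref, and the temperature range) are illustrative assumptions:

```python
import numpy as np

R = 8.314           # gas constant, J/(mol K)
k0 = 1.0e8          # assumed pre-exponential factor
E = 6.0e4           # assumed activation energy, J/mol
T_ref = 350.0       # assumed reference temperature, K
T = np.linspace(300.0, 400.0, 5)

k_original = k0 * np.exp(-E / (R * T))
k_ref = k0 * np.exp(-E / (R * T_ref))
k_reparam = k_ref * np.exp(-(E / R) * (1.0 / T - 1.0 / T_ref))
```

Estimating (k_ref, E) instead of (k₀, E) works well when T_ref is chosen inside the experimental temperature range, since k_ref is then directly informed by the data.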
Practical considerations
• Scaling – of parameters and responses
• Choices
– Scale by nominal values
» Nominal values – design centre point, typical value over range,
average value
– Scale by standard errors or initial uncertainty ranges for parameters
» Parameters – estimate of the standard deviation of the parameter estimate
» Responses – by the standard deviation of the observations – the noise
standard deviation
• Scaling can improve conditioning of the estimation problem (e.g.,
scale sensitivity matrix V), and can facilitate comparison of terms
on similar (dimensionless) bases
Practical considerations
• Initial parameter guesses are required
– From prior scientific knowledge
– From prior estimation results
– By simplifying model equations
Things to learn in CHEE 811
• Estimating parameters in differential equation models:
  dy/dt = f(y, u, t; θ),  y(t₀) = y₀
• Estimating parameters in multi-response models
• Deriving model equations based on chemical engineering
knowledge and stories about what is happening
• Solving model equations numerically
• Deciding which parameters to estimate and which to leave at
initial guesses when data are limited.