CHEE434/821 Module 0 Slides

CHEE801
Module 5: Nonlinear Regression
Notation
Model:
$$Y_i = f(\mathbf{x}_i, \boldsymbol{\theta}) + \varepsilon_i$$
where $\varepsilon_i$ is the random noise component, $\mathbf{x}_i$ contains the explanatory variables (the i-th run conditions), and $\boldsymbol{\theta}$ is a p-dimensional vector of parameters.
Model specification –
– the model equation is $f(\mathbf{x}_i, \boldsymbol{\theta})$
– with n experimental runs, we have
$$\boldsymbol{\eta}(\boldsymbol{\theta}) = \begin{bmatrix} f(\mathbf{x}_1, \boldsymbol{\theta}) \\ f(\mathbf{x}_2, \boldsymbol{\theta}) \\ \vdots \\ f(\mathbf{x}_n, \boldsymbol{\theta}) \end{bmatrix}$$
– $\boldsymbol{\eta}(\boldsymbol{\theta})$ defines the expectation surface
– the nonlinear regression model is $\mathbf{Y} = \boldsymbol{\eta}(\boldsymbol{\theta}) + \boldsymbol{\varepsilon}$
Parameter Estimation – Gauss-Newton Iteration
Least squares estimation – minimize
$$S(\boldsymbol{\theta}) = \left\| \mathbf{Y} - \boldsymbol{\eta}(\boldsymbol{\theta}) \right\|^2 = \mathbf{e}^T \mathbf{e}$$
A numerical optimization procedure is required.
One possible method:
1. Linearization about the current estimate of the parameters
2. Solution of the linear(ized) regression problem to obtain the next
parameter estimate
3. Iteration until a convergence criterion is satisfied
Linearization about a nominal parameter vector
Linearize the expectation function η(θ) in terms of the parameter
vector θ about a nominal vector θ0:
 ( )   ( 0 )  V 0 (   0 )
  ( 0 )  V 0 
Sensitivity Matrix
-Jacobian of the expectation
function
-contains first-order
sensitivity information
V0 
  ( )

T
0
  f ( x1 ,  )

 1

 

 f ( x n , )

 1



 f ( x1 ,  ) 

 p



f ( x n , ) 

 p

0
4
Parameter Estimation – Gauss-Newton Iteration
Iterative procedure consisting of:
1. Linearization about the current estimate of the parameters
$$\mathbf{Y} - \boldsymbol{\eta}(\boldsymbol{\theta}^{(i)}) \approx \mathbf{V}^{(i)} \boldsymbol{\delta}^{(i+1)} + \mathbf{e}$$
2. Solve the linearized regression problem to obtain the next parameter update
$$\boldsymbol{\delta}^{(i+1)} = \left( \mathbf{V}^{(i)T} \mathbf{V}^{(i)} \right)^{-1} \mathbf{V}^{(i)T} \left( \mathbf{y} - \boldsymbol{\eta}(\boldsymbol{\theta}^{(i)}) \right), \qquad \boldsymbol{\theta}^{(i+1)} = \boldsymbol{\theta}^{(i)} + \boldsymbol{\delta}^{(i+1)}$$
3. Iterate until the parameter estimates converge
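To make the iteration concrete, here is a minimal sketch in Python/NumPy, using an invented two-parameter exponential model, synthetic data, and a finite-difference Jacobian (none of these choices come from the course material):

```python
import numpy as np

def jacobian(f, x, theta, h=1e-7):
    """Finite-difference sensitivity matrix V: V[i, j] = d f(x_i, theta) / d theta_j."""
    p = len(theta)
    V = np.empty((len(x), p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = h
        V[:, j] = (f(x, theta + step) - f(x, theta - step)) / (2 * h)
    return V

def gauss_newton(f, x, y, theta0, tol=1e-8, max_iter=50):
    """Gauss-Newton: linearize, solve the linearized LS problem, iterate."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r = y - f(x, theta)                    # residuals at current estimate
        V = jacobian(f, x, theta)              # linearization about theta
        delta, *_ = np.linalg.lstsq(V, r, rcond=None)  # linearized LS step
        theta = theta + delta
        if np.linalg.norm(delta) < tol * (1 + np.linalg.norm(theta)):
            break
    return theta

# Invented model and synthetic data: f(x, theta) = theta_1 * exp(-theta_2 * x)
f = lambda x, th: th[0] * np.exp(-th[1] * x)
x = np.linspace(0.0, 5.0, 20)
rng = np.random.default_rng(0)
y = f(x, np.array([2.0, 0.7])) + 0.05 * rng.standard_normal(x.size)
print(gauss_newton(f, x, y, theta0=[1.0, 1.0]))  # should land near [2.0, 0.7]
```

Solving the linearized step with `lstsq` gives the same update as $\left(\mathbf{V}^{(i)T}\mathbf{V}^{(i)}\right)^{-1}\mathbf{V}^{(i)T}\left(\mathbf{y}-\boldsymbol{\eta}(\boldsymbol{\theta}^{(i)})\right)$ without explicitly forming the inverse.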
Computational Issues in Gauss-Newton Iteration
The Gauss-Newton iteration can be subject to poor numerical conditioning for some parameter values:
» Conditioning problems arise in inverting $\mathbf{V}^T \mathbf{V}$
» Solution – use a decomposition technique (see the sketch after this list)
• QR decomposition
• Singular Value Decomposition (SVD)
» Use a different optimization technique
» Don’t try to estimate so many parameters
• Simplify the model
• Fix some parameters at reasonable values
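As a brief illustration of the QR option, a sketch with arbitrary random matrices standing in for a real sensitivity matrix; the point is that the step is computed without forming $\mathbf{V}^T\mathbf{V}$, whose condition number is the square of that of $\mathbf{V}$:

```python
import numpy as np

# V: n-by-p sensitivity matrix, r: residual vector (illustrative random data)
rng = np.random.default_rng(1)
V = rng.standard_normal((20, 3))
r = rng.standard_normal(20)

# Naive normal-equations step: squares the condition number of V
delta_normal = np.linalg.solve(V.T @ V, V.T @ r)

# QR-based step: solve R delta = Q^T r, better conditioned
Q, R = np.linalg.qr(V)                    # thin QR: Q is 20x3, R is 3x3
delta_qr = np.linalg.solve(R, Q.T @ r)

print(np.allclose(delta_normal, delta_qr))  # same step, computed more stably
```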
Other numerical estimation methods
• Nonlinear least-squares is a minimization problem
• Use any good optimization technique to find parameter estimates that minimize the sum of squares of the residuals (see the sketch below)
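For example, SciPy's general-purpose `least_squares` solver can be applied directly to the residual vector; a sketch with the same invented exponential model used earlier:

```python
import numpy as np
from scipy.optimize import least_squares

# Residuals for the invented model f(x, theta) = theta_1 * exp(-theta_2 * x)
def residuals(theta, x, y):
    return y - theta[0] * np.exp(-theta[1] * x)

x = np.linspace(0.0, 5.0, 20)
rng = np.random.default_rng(0)
y = 2.0 * np.exp(-0.7 * x) + 0.05 * rng.standard_normal(x.size)

fit = least_squares(residuals, x0=[1.0, 1.0], args=(x, y))
print(fit.x)     # parameter estimates
print(fit.cost)  # one half of the sum of squared residuals at the solution
```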
Inference – Joint Confidence Regions
• Approximate confidence regions for parameters and predictions can be obtained by using a linearization approach
• Approximate covariance matrix for parameter estimates:
$$\boldsymbol{\Sigma}_{\hat{\boldsymbol{\theta}}} \approx \left( \hat{\mathbf{V}}^T \hat{\mathbf{V}} \right)^{-1} \sigma^2$$
where $\hat{\mathbf{V}}$ is the Jacobian of $\boldsymbol{\eta}(\boldsymbol{\theta})$ evaluated at the least squares parameter estimates
• This covariance matrix is asymptotically the true covariance matrix for the parameter estimates as the number of data points becomes infinite
• 100(1-α)% joint confidence region for the parameters:
$$\left( \boldsymbol{\theta} - \hat{\boldsymbol{\theta}} \right)^T \hat{\mathbf{V}}^T \hat{\mathbf{V}} \left( \boldsymbol{\theta} - \hat{\boldsymbol{\theta}} \right) \le p\, s^2\, F_{p,\, n-p,\, \alpha}$$
» Compare to the linear regression case
Inference – Marginal Confidence Intervals
• Marginal confidence intervals
» Confidence intervals on individual parameters:
$$\hat{\theta}_i \pm t_{\nu,\,\alpha/2}\, s_{\hat{\theta}_i}$$
where $s_{\hat{\theta}_i}$ is the approximate standard error of the parameter estimate
• the square root of the i-th diagonal element of the approximate parameter estimate covariance matrix, with the noise variance estimated as in the linear case:
$$\boldsymbol{\Sigma}_{\hat{\boldsymbol{\theta}}} \approx \left( \hat{\mathbf{V}}^T \hat{\mathbf{V}} \right)^{-1} s^2$$
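A minimal sketch of the marginal interval calculation (the fit quantities `theta_hat`, `V_hat`, and `resid` are assumed to come from an earlier estimation; using the MSE for the noise variance is one of several options):

```python
import numpy as np
from scipy.stats import t

def marginal_cis(theta_hat, V_hat, resid, alpha=0.05):
    """Approximate 100(1-alpha)% marginal intervals from a nonlinear LS fit.

    theta_hat, V_hat (Jacobian at theta_hat), and resid are assumed to come
    from an earlier fit; the noise variance is estimated by the MSE.
    """
    n, p = V_hat.shape
    s2 = resid @ resid / (n - p)               # noise variance estimate (MSE)
    cov = s2 * np.linalg.inv(V_hat.T @ V_hat)  # approximate covariance matrix
    se = np.sqrt(np.diag(cov))                 # approximate standard errors
    tcrit = t.ppf(1 - alpha / 2, df=n - p)
    return theta_hat - tcrit * se, theta_hat + tcrit * se
```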
Precision of the Predicted Responses
– Linear Case
From the linear regression module
The predicted response from an estimated model has uncertainty, because
it is a function of the parameter estimates which have uncertainty:
e.g., Solder Wave Defect Model - first response at the point -1,-1,-1
y1  0  1 (  1)  2 (  1)  3 (  1)
If the parameter estimates were uncorrelated, the variance of the predicted
response would be:
$$\mathrm{Var}(\hat{y}_1) = \mathrm{Var}(\hat{\beta}_0) + \mathrm{Var}(\hat{\beta}_1) + \mathrm{Var}(\hat{\beta}_2) + \mathrm{Var}(\hat{\beta}_3)$$
Why?
Precision of the Predicted Responses - Linear
In general, both the variances and covariances of the parameter estimates
must be taken into account.
For prediction at the k-th data point:
$$\mathrm{Var}(\hat{y}_k) = \mathbf{x}_k^T \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{x}_k\, \sigma^2 = \begin{bmatrix} x_{k1} & x_{k2} & \cdots & x_{kp} \end{bmatrix} \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \begin{bmatrix} x_{k1} \\ x_{k2} \\ \vdots \\ x_{kp} \end{bmatrix} \sigma^2$$
$$\mathrm{Var}(\hat{y}_k) = \mathbf{x}_k^T \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{x}_k\, \sigma^2 = \mathbf{x}_k^T\, \boldsymbol{\Sigma}_{\hat{\boldsymbol{\beta}}}\, \mathbf{x}_k$$
Precision of the Predicted Responses - Nonlinear
Linearize the prediction equation about the least squares estimate:
$$\hat{y}_k = f(\mathbf{x}_k, \hat{\boldsymbol{\theta}}), \qquad f(\mathbf{x}_k, \boldsymbol{\theta}) \approx f(\mathbf{x}_k, \hat{\boldsymbol{\theta}}) + \hat{\mathbf{v}}_k^T (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}), \qquad \hat{\mathbf{v}}_k^T = \left. \frac{\partial f(\mathbf{x}_k, \boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T} \right|_{\hat{\boldsymbol{\theta}}}$$
For prediction at the k-th data point:
$$\mathrm{Var}(\hat{y}_k) \approx \hat{\mathbf{v}}_k^T \left( \hat{\mathbf{V}}^T \hat{\mathbf{V}} \right)^{-1} \hat{\mathbf{v}}_k\, \sigma^2 = \begin{bmatrix} \hat{v}_{k1} & \hat{v}_{k2} & \cdots & \hat{v}_{kp} \end{bmatrix} \left( \hat{\mathbf{V}}^T \hat{\mathbf{V}} \right)^{-1} \begin{bmatrix} \hat{v}_{k1} \\ \hat{v}_{k2} \\ \vdots \\ \hat{v}_{kp} \end{bmatrix} \sigma^2$$
Note – $\mathrm{Var}(\hat{y}_k) \approx \hat{\mathbf{v}}_k^T \left( \hat{\mathbf{V}}^T \hat{\mathbf{V}} \right)^{-1} \hat{\mathbf{v}}_k\, \sigma^2 = \hat{\mathbf{v}}_k^T\, \boldsymbol{\Sigma}_{\hat{\boldsymbol{\theta}}}\, \hat{\mathbf{v}}_k$
Estimating Precision of Predicted Responses
Use an estimate of the inherent noise variance
$$s^2_{\hat{y}_k} = \mathbf{x}_k^T \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{x}_k\, s^2 \qquad \text{(linear)}$$
$$s^2_{\hat{y}_k} = \hat{\mathbf{v}}_k^T \left( \hat{\mathbf{V}}^T \hat{\mathbf{V}} \right)^{-1} \hat{\mathbf{v}}_k\, s^2 \qquad \text{(nonlinear)}$$
The degrees of freedom for the estimated variance of the predicted
response are those of the estimate of the noise variance
» replicates
» external estimate
» MSE
Confidence Limits for Predicted Responses
Linear and Nonlinear Cases:
Follow an approach similar to that for parameters - 100(1-α)% confidence
limits for the mean value of a predicted response are:
y k  t , / 2 s y
k
» degrees of freedom are those of the inherent noise variance
estimate
If the prediction is for a new data value, confidence intervals are:
yˆ k  t , / 2
s
2
2
 se
yˆ k
Why?
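A sketch covering both cases for a nonlinear model (again, `y_k_hat`, `v_k`, `V_hat`, and `resid` are assumed fit results; the noise variance is estimated from the MSE):

```python
import numpy as np
from scipy.stats import t

def prediction_interval(y_k_hat, v_k, V_hat, resid, alpha=0.05, new_obs=False):
    """Approximate interval for a predicted response at one point x_k.

    v_k holds the sensitivities of f(x_k, theta) at the estimates; V_hat and
    resid are assumed fit results, with the noise variance estimated by MSE.
    """
    n, p = V_hat.shape
    s2 = resid @ resid / (n - p)                     # noise variance estimate
    var_pred = v_k @ np.linalg.inv(V_hat.T @ V_hat) @ v_k * s2
    if new_obs:
        var_pred += s2   # a new observation also carries the inherent noise
    half = t.ppf(1 - alpha / 2, df=n - p) * np.sqrt(var_pred)
    return y_k_hat - half, y_k_hat + half
```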
Properties of LS Parameter Estimates
Key Point - parameter estimates are random variables
» because stochastic variation in data propagates through
estimation calculations
» parameter estimates have a variability pattern - probability
distribution and density functions
Unbiased:
$$E\{\hat{\boldsymbol{\theta}}\} = \boldsymbol{\theta}$$
» the "average" of repeated data collection / estimation sequences will be the true value of the parameter vector
Properties of Parameter Estimates
Linear Regression Case
– Least squares estimates are –
» Unbiased
» Consistent
» Efficient
Nonlinear Regression Case
– Least squares estimates are –
» Asymptotically unbiased – as number of data points becomes
infinite
» Consistent
» Efficient
Diagnostics for nonlinear regression
• Similar to linear case
• Qualitative – residual plots
– Residuals vs.
» Factors in model
» Sequence (observation) number
» Factors not in model (covariates)
» Predicted responses
– Things to look for:
» Trend remaining
» Non-constant variance
• Qualitative – plot of observed and predicted responses
– Predicted vs. observed – slope of 1
– Predicted and observed – as function of independent variable(s)
Diagnostics for nonlinear regression
• Quantitative diagnostics
– Ratio tests:
» the three tests are the same as for the linear case
» R-squared
• coarse measure of significant trend
• squared correlation of observed and predicted values
• adjusted R-squared – the same measure, adjusted for the number of parameters
Diagnostics for nonlinear regression
• Quantitative diagnostics
– Parameter confidence intervals:
» Examine marginal intervals for parameters
• Based on linear approximations
• Can also use hypothesis tests
» Consider dropping parameters that aren't statistically significant
» What should we do if parameters are
• not significantly different from zero?
• not significantly different from the initial guesses?
» In nonlinear models, parameters are more likely to be involved in more complex expressions involving factors and other parameters
• E.g., Arrhenius reaction rate expression
» If possible, examine joint confidence regions
Diagnostics for nonlinear regression
• Quantitative diagnostics
– Parameter estimate correlation matrix:
» Examine correlation matrix for parameter estimates
• Based on linear approximation
• Compute covariance matrix, then normalize using pairs of
standard deviations
» Note significant correlations and keep these in mind when
retaining/deleting parameters using marginal significance tests
» Significant correlation between some parameter estimates may
indicate over-parameterization relative to the data collected
• Consider dropping some of the parameters whose estimates
are highly correlated
• Further discussion – Chapter 3 – Bates and Watts (1988), Chapter 5 – Seber and Wild (1989)
Practical Considerations
– What kind of stopping conditions should be used to determine
convergence?
– Problems with local minima?
– Reparameterization to reduce correlation between parameter
estimates
• Ensuring physically realistic parameter estimates
– Common problem – we know that some parameters should be
positive or should be bounded between reasonable values
– Solutions
» Constrained optimization algorithm to enforce non-negativity of parameters
» Reparameterization tricks – estimate $\phi$ instead of $\theta$ (see the sketch below):
• $\theta = \exp(\phi)$ – positive
• $\theta = 10^{\phi}$ – positive
• $\theta = \dfrac{1}{1 + e^{-\phi}}$ – bounded between 0 and 1
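A brief sketch of the $\theta = \exp(\phi)$ trick inside a residual function (illustrative model and data, SciPy assumed):

```python
import numpy as np
from scipy.optimize import least_squares

# Enforce theta_2 > 0 by estimating phi = log(theta_2) instead of theta_2.
def residuals(params, x, y):
    theta1, phi = params
    theta2 = np.exp(phi)          # positive for any real phi
    return y - theta1 * np.exp(-theta2 * x)

x = np.linspace(0.0, 5.0, 20)
rng = np.random.default_rng(0)
y = 2.0 * np.exp(-0.7 * x) + 0.05 * rng.standard_normal(x.size)

fit = least_squares(residuals, x0=[1.0, 0.0], args=(x, y))
print(fit.x[0], np.exp(fit.x[1]))  # theta_1 and theta_2 (guaranteed positive)
```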
Practical considerations
• Correlation between parameter estimates
– Reduce by reparameterization
– Exponential example –
$$\theta_1 e^{-\theta_2 x} = \theta_1 e^{-\theta_2 (x - x_0 + x_0)} = \theta_1 e^{-\theta_2 x_0}\, e^{-\theta_2 (x - x_0)} = \phi_1\, e^{-\theta_2 (x - x_0)}$$
where $\phi_1 = \theta_1 e^{-\theta_2 x_0}$ is estimated in place of $\theta_1$.
Practical considerations
• Particular example – Arrhenius rate expression
$$k_0 \exp\!\left(-\frac{E}{RT}\right) = k_0 \exp\!\left(-\frac{E}{R}\left(\frac{1}{T} - \frac{1}{T_{ref}} + \frac{1}{T_{ref}}\right)\right) = k_0 \exp\!\left(-\frac{E}{R T_{ref}}\right) \exp\!\left(-\frac{E}{R}\left(\frac{1}{T} - \frac{1}{T_{ref}}\right)\right) = k_{ref} \exp\!\left(-\frac{E}{R}\left(\frac{1}{T} - \frac{1}{T_{ref}}\right)\right)$$
where $k_{ref} = k_0 \exp\!\left(-\dfrac{E}{R T_{ref}}\right)$ is estimated in place of $k_0$.
– Reduces correlation between parameter estimates and improves
conditioning of estimation problem
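A numerical sketch of this effect, comparing the linearized correlation between the two estimates under each parameterization (temperatures and parameter values are invented for illustration):

```python
import numpy as np

# Invented run conditions and parameter values for illustration only
R, Tref = 8.314, 350.0
T = np.linspace(300.0, 400.0, 15)     # temperatures of the experimental runs
k0, E = 1.0e8, 5.0e4                  # assumed "true" Arrhenius parameters
kref = k0 * np.exp(-E / (R * Tref))
k = k0 * np.exp(-E / (R * T))         # same predicted rates in both forms

def corr(V):
    """Correlation of the two estimates from the linearized covariance
    (V^T V)^{-1}; the sigma^2 factor cancels out of the correlation."""
    cov = np.linalg.inv(V.T @ V)
    d = np.sqrt(np.diag(cov))
    return cov[0, 1] / (d[0] * d[1])

# Raw form k0*exp(-E/(R*T)): columns are dk/dk0 and dk/dE
V_raw = np.column_stack([k / k0, -k / (R * T)])
# Centred form kref*exp(-(E/R)(1/T - 1/Tref)): columns are dk/dkref and dk/dE
V_ctr = np.column_stack([k / kref, -k / R * (1.0 / T - 1.0 / Tref)])

print(corr(V_raw))   # typically very close to +/-1
print(corr(V_ctr))   # much smaller in magnitude
```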
Practical considerations
• Scaling – of parameters and responses
• Choices
– Scale by nominal values
» Nominal values – design centre point, typical value over range,
average value
– Scale by standard errors or initial uncertainty ranges for parameters
» Parameters – estimate of standard deviation of parameter estimate
» Responses – by standard deviation of observations – noise standard deviation
• Scaling can improve conditioning of the estimation problem (e.g.,
scale sensitivity matrix V), and can facilitate comparison of terms
on similar (dimensionless) bases
Practical considerations
• Initial parameter guesses are required
– From prior scientific knowledge
– From prior estimation results
– By simplifying model equations
Things to learn in CHEE 811
• Estimating parameters in differential equation models (a small sketch follows this list):
$$\frac{dy}{dt} = f(y, u, t; \boldsymbol{\theta}), \qquad y(t_0) = y_0$$
• Estimating parameters in multi-response models
• Deriving model equations based on chemical engineering
knowledge and stories about what is happening
• Solving model equations numerically
• Deciding which parameters to estimate and which to leave at
initial guesses when data are limited.
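As a small preview of the first item, a sketch that nests an ODE solver inside a least-squares routine (an invented first-order decay model, SciPy assumed; this is not CHEE 811 material):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Invented first-order decay model: dy/dt = -theta * y, y(0) = y0
def residuals(theta, t_data, y_data, y0):
    sol = solve_ivp(lambda t, y: -theta[0] * y,
                    (t_data[0], t_data[-1]), [y0], t_eval=t_data)
    return y_data - sol.y[0]

t_data = np.linspace(0.0, 4.0, 10)
rng = np.random.default_rng(0)
y_data = 3.0 * np.exp(-0.8 * t_data) + 0.05 * rng.standard_normal(t_data.size)

fit = least_squares(residuals, x0=[0.5], args=(t_data, y_data, 3.0))
print(fit.x)   # estimate of theta, near 0.8
```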