
```
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
CHAPTER 3
RECURSIVE ESTIMATION FOR LINEAR MODELS
•Organization of chapter in ISSO
–Linear models
•Relationship between least-squares and mean-square
–LMS and RLS estimation
–LMS, RLS, and Kalman filter for time-varying solution
–Case study: Oboe reed data
Basic Linear Model
•Consider estimation of vector $\theta$ in a model that is linear in $\theta$
•Model has classical linear form
$z_k = h_k^T\theta + v_k$,
where $z_k$ is kth measurement, $h_k$ is corresponding “design
vector,” and $v_k$ is unknown noise value
•Model used extensively in control, statistics, signal
processing, etc.
•Many estimation/optimization criteria based on “squared-error”-type loss functions
– Unique (global) estimate of $\theta$
3-2
Least-Squares Estimation
•Most common method for estimating $\theta$ in linear model is
by method of least squares
•Criterion (loss function) has form
$\frac{1}{2n}\sum_{k=1}^{n}\left(z_k - h_k^T\theta\right)^2 = \frac{1}{2n}\left(Z_n - H_n\theta\right)^T\left(Z_n - H_n\theta\right)$
where $Z_n = [z_1, z_2, \ldots, z_n]^T$ and $H_n$ is the $n \times p$ concatenated
matrix of $h_k^T$ row vectors
•Classical batch least-squares estimate is
$\hat\theta^{(n)} = \left(H_n^T H_n\right)^{-1} H_n^T Z_n$
•Popular recursive estimates (LMS, RLS, Kalman filter)
may be derived from batch estimate
3-3
Geometric Interpretation of Least-Squares
Estimate when p = 2 and n = 3
3-4
Recursive Estimation
•Batch form not convenient in many applications
– E.g., data arrive over time and want “easy” way to update
estimate at time k to estimate at time k+1
•Least-mean-squares (LMS) method is very popular
recursive method
– Stochastic analogue of steepest descent algorithm
•LMS recursion:
$\hat\theta_{k+1} = \hat\theta_k - a\,h_{k+1}\left(h_{k+1}^T\hat\theta_k - z_{k+1}\right),\quad a > 0$
•Convergence theory based on stochastic approximation
(e.g., Ljung, et al., 1992; Gerencsér, 1995)
– Less rigorous theory based on connections to steepest
descent (ignores noise) (Widrow and Stearns, 1985;
Haykin, 1996)
3-5
LMS in Closed-Loop Control
•Suppose process is modeled according to autoregressive
(AR) form:
$x_{k+1} = \varphi_0 x_k + \varphi_1 x_{k-1} + \cdots + \varphi_m x_{k-m} + \gamma u_k + w_k$,
where $x_k$ represents state, $\gamma$ and $\varphi_i$ are unknown
parameters, $u_k$ is control, and $w_k$ is noise
•Let target (“desired”) value for $x_k$ be $d_k$
•Optimal control law known (minimizes mean-square
tracking error):
$u_k = \dfrac{d_{k+1} - \varphi_0 x_k - \varphi_1 x_{k-1} - \cdots - \varphi_m x_{k-m}}{\gamma}$
•Certainty equivalence principle justifies substitution of
parameter estimates for unknown true parameters
– LMS used to estimate $\gamma$ and $\varphi_i$ in closed-loop mode
3-6
LMS in Closed-Loop Control for
First-Order AR Model
3-7
Recursive Least Squares (RLS)
•Alternative to LMS is RLS
– Recall LMS is stochastic analogue of steepest descent
(“first order” method)
– RLS is stochastic analogue of Newton-Raphson (“second
order” method) ⇒ faster convergence than LMS in practice
•RLS algorithm (2 recursions):
$P_{k+1} = P_k - \dfrac{P_k h_{k+1} h_{k+1}^T P_k}{1 + h_{k+1}^T P_k h_{k+1}}$
$\hat\theta_{k+1} = \hat\theta_k - P_{k+1} h_{k+1}\left(h_{k+1}^T\hat\theta_k - z_{k+1}\right)$
•Need $P_0$ and $\hat\theta_0$ to initialize RLS recursions
3-8
Recursive Methods for Estimation of Time-Varying Parameters
•It is common to have the underlying true $\theta$ evolve in time
(e.g., target tracking, adaptive control, sequential
experimental design, etc.)
– Time-varying parameters imply that $\theta$ is replaced with $\theta_k$
•Consider modified linear model
$z_k = h_k^T\theta_k + v_k$
•Prototype recursive form for estimating $\theta_k$ is
$\hat\theta_{k+1} = A_k\hat\theta_k - \Gamma_{k+1}\left(h_{k+1}^T A_k\hat\theta_k - z_{k+1}\right)$,
where choice of $A_k$ and $\Gamma_k$ depends on specific algorithm
3-9
Three Important Algorithms for Estimation
of Time-Varying Parameters
• LMS
– Goal is to minimize instantaneous squared-error criterion
across iterations
– Allows general form for evolution of true parameters $\theta_k$
• RLS
– Goal is to minimize weighted sum of squared errors
– Sum criterion creates “inertia” not present in LMS
– Allows general form for evolution of $\theta_k$
• Kalman filter
– Minimizes instantaneous squared-error criterion
– Requires precise statistical description of evolution of $\theta_k$
via state-space model
• Details for above algorithms in terms of prototype
algorithm (previous slide) are in Section 3.3 of ISSO
3-10
Case Study: LMS and RLS with Oboe Reed Data
…an ill wind that nobody blows good.
—Comedian Danny Kaye, speaking of the oboe in “The
Secret Life of Walter Mitty” (1947)
•Section 3.4 of ISSO reports on linear and curvilinear
models for predicting quality of oboe reeds
– Linear model has 7 parameters; curvilinear has 4
parameters
•This study compares LMS and RLS with batch least-squares estimates
– 160 data points for fitting models (reeddata-fit); 80
(independent) data points for testing models (reeddata-test)
– reeddata-fit and reeddata-test data sets
available from ISSO Web site
3-11
Oboe with Attached Reed (photo not reproduced)
3-12
Comparison of Fitting Results for
reeddata-fit and reeddata-test
• To test similarity of fit and test data sets, performed
model fitting using test data set
• This comparison is for checking consistency of the two
data sets; not for checking accuracy of LMS or RLS
estimates
• Compared model fits for parameters in
– Basic linear model (eqn. (3.25) in ISSO) (p = 7)
– Curvilinear model (eqn. (3.26) in ISSO) (p = 4)
• Results on next slide for basic linear model
3-13
Comparison of Batch Parameter Estimates for
Basic Linear Model. Approximate 95%
Confidence Intervals Shown in [·, ·]
Parameter            reeddata-fit            reeddata-test
Constant, const      -0.156 [-0.52, 0.21]    -0.240 [-0.75, 0.28]
Top close, T          0.102 [0.01, 0.19]      0.067 [-0.12, 0.25]
Appearance, A         0.055 [-0.08, 0.19]     0.178 [-0.03, 0.39]
Ease of Gouge, E      0.175 [0.05, 0.30]      0.095 [-0.15, 0.34]
Vascular, V           0.044 [-0.08, 0.17]     0.125 [-0.06, 0.31]
Shininess, S          0.056 [-0.06, 0.17]     0.066 [-0.13, 0.26]
First blow, F         0.579 [0.41, 0.74]      0.541 [0.24, 0.84]
3-14
Comparison of Batch and RLS with
Oboe Reed Data
• Compared batch and RLS using 160 data points in
reeddata-fit and 80 data points for testing models
in reeddata-test
• Two slides to follow present results
– First slide compares parameter estimates in pure linear
model
– Second slide compares prediction errors for linear and
curvilinear models
3-15
Batch and RLS Parameter Estimates for Basic
Linear Model (Data from reeddata-fit)
Parameter            Batch Estimates   RLS Estimates
Constant, const         -0.156            -0.079
Top close, T             0.102             0.101
Appearance, A            0.055             0.046
Ease of Gouge, E         0.175             0.171
Vascular, V              0.044             0.043
Shininess, S             0.056             0.056
First blow, F            0.579             0.540
3-16
Mean and Median Absolute Prediction
Errors for the Linear and Curvilinear Models
(Model fits from reeddata-fit; Prediction
Errors from reeddata-test)
          Batch linear   RLS linear   Batch curvilinear   RLS curvilinear
          model          model        model               model
Mean      0.242          0.242        0.235               0.235
Median    0.243          0.250        0.227               0.224
•Ran matched-pairs t-test on linear versus curvilinear
models; used one-sided test
•P-value for Batch/linear versus Batch/curvilinear is 0.077
•P-value for RLS/linear versus RLS/curvilinear is 0.10
•Modest evidence for superiority of curvilinear model
3-17
```
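The least-squares material on slides 3-2 and 3-3 can be made concrete with a small pure-Python sketch (no NumPy, so it runs as-is). It solves the normal equations (H_n^T H_n) θ = H_n^T Z_n by Gaussian elimination rather than forming the inverse explicitly, and it evaluates the (1/2n)-scaled squared-error criterion. The helper names `solve`, `batch_ls`, and `loss` and the simulated data are illustrative assumptions, not code from ISSO.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def batch_ls(H, Z):
    """theta_hat = (H^T H)^{-1} H^T Z, obtained by solving the normal equations."""
    p = len(H[0])
    HtH = [[sum(h[i] * h[j] for h in H) for j in range(p)] for i in range(p)]
    HtZ = [sum(h[i] * z for h, z in zip(H, Z)) for i in range(p)]
    return solve(HtH, HtZ)

def loss(theta, H, Z):
    """(1/2n) * sum_k (z_k - h_k^T theta)^2, the criterion on slide 3-3."""
    return sum((z - sum(a * t for a, t in zip(h, theta))) ** 2
               for h, z in zip(H, Z)) / (2 * len(Z))

# Hypothetical data from z_k = h_k^T theta + v_k with p = 3 and n = 200.
random.seed(0)
theta_true = [1.0, -2.0, 0.5]
H = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
Z = [sum(a * t for a, t in zip(h, theta_true)) + random.gauss(0, 0.1) for h in H]
theta_hat = batch_ls(H, Z)
```

Solving the normal equations is adequate for small, well-conditioned problems; for ill-conditioned H_n a QR- or SVD-based solver is the usual safer choice.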
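As a numerical companion to the geometric picture on slide 3-4 (p = 2, n = 3), the sketch below uses a hypothetical 3 × 2 design matrix and checks that the least-squares residual is orthogonal to both columns of H. In other words, H θ̂ is the orthogonal projection of Z onto the column space of H.

```python
H = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]          # hypothetical n x p = 3 x 2 design matrix
Z = [1.0, 2.0, 4.0]       # hypothetical measurements

# Normal equations (H^T H) theta = H^T Z, solved with the explicit 2 x 2 inverse.
a = sum(h[0] * h[0] for h in H)
b = sum(h[0] * h[1] for h in H)
d = sum(h[1] * h[1] for h in H)
g0 = sum(h[0] * z for h, z in zip(H, Z))
g1 = sum(h[1] * z for h, z in zip(H, Z))
det = a * d - b * b
theta_hat = [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

fitted = [h[0] * theta_hat[0] + h[1] * theta_hat[1] for h in H]  # projection of Z
residual = [z - f for z, f in zip(Z, fitted)]

# Inner product of the residual with each column of H; both should be ~0.
col_dots = [sum(h[j] * r for h, r in zip(H, residual)) for j in range(2)]
print(theta_hat, residual, col_dots)
```

For these numbers θ̂ = (4/3, 7/3) and the residual is (-1/3, -1/3, 1/3), whose inner product with each column of H is zero.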
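Here is a minimal sketch of the LMS recursion on slide 3-5. It assumes a fixed true θ, a constant gain `a`, and simulated uniform design vectors; all of these are hypothetical choices for illustration.

```python
import random

def lms(H, Z, a, theta0):
    """LMS recursion from slide 3-5:
    theta_{k+1} = theta_k - a * h_{k+1} * (h_{k+1}^T theta_k - z_{k+1}),  a > 0."""
    theta = list(theta0)
    for h, z in zip(H, Z):
        err = sum(hi * ti for hi, ti in zip(h, theta)) - z  # h^T theta_k - z_{k+1}
        theta = [ti - a * hi * err for ti, hi in zip(theta, h)]
    return theta

# Hypothetical data: fixed theta, design vectors uniform on [-1, 1]^2, small noise.
random.seed(1)
theta_true = [2.0, -1.0]
H = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(5000)]
Z = [h[0] * theta_true[0] + h[1] * theta_true[1] + random.gauss(0, 0.05) for h in H]
theta_hat = lms(H, Z, a=0.05, theta0=[0.0, 0.0])
```

In the noise-free case each step shrinks the parameter error along h_{k+1} provided 0 < a‖h_{k+1}‖² < 2. That is why `a` has to be small relative to the size of the design vectors.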
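The closed-loop scheme on slides 3-6 and 3-7 can be sketched for the first-order case (m = 0). Writing the AR coefficient as φ_0 and the control gain as γ, the model is x_{k+1} = φ_0 x_k + γ u_k + w_k. The sketch applies the certainty-equivalence control u_k = (d_{k+1} - φ̂_0 x_k)/γ̂ and updates (φ̂_0, γ̂) by LMS, using h_k = (x_k, u_k) as the design vector and x_{k+1} as the measurement. The true parameters, initial estimates, gain, sinusoidal target, and floor on γ̂ are all assumptions made for this illustration.

```python
import math
import random

random.seed(2)
phi_true, gamma_true = 0.8, 1.0     # hypothetical true parameters
phi_hat, gamma_hat = 0.5, 0.8       # hypothetical initial estimates
a = 0.02                            # LMS gain (assumed)
x = 0.0                             # initial state x_0
abs_track_err = []

for k in range(4000):
    d_next = math.sin(0.1 * (k + 1))  # hypothetical target d_{k+1}
    # Certainty-equivalence control: estimates replace the true parameters in the
    # optimal law. The floor on gamma_hat is an added guard against dividing by ~0.
    u = (d_next - phi_hat * x) / max(gamma_hat, 0.1)
    x_next = phi_true * x + gamma_true * u + random.gauss(0, 0.02)
    # LMS step with design vector h_k = (x_k, u_k) and measurement x_{k+1}.
    err = phi_hat * x + gamma_hat * u - x_next
    phi_hat -= a * x * err
    gamma_hat -= a * u * err
    abs_track_err.append(abs(x_next - d_next))
    x = x_next

late_err = sum(abs_track_err[-500:]) / 500  # mean |x_{k+1} - d_{k+1}| near the end
```

The sinusoidal target keeps (x_k, u_k) varying enough for the estimates to keep improving. With a constant target the design vectors point in nearly the same direction, so the parameter estimates may stall even while the tracking error stays small.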
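The following is a minimal pure-Python sketch of the two RLS recursions on slide 3-8. The check relies on a standard identity: RLS started from θ̂_0 = 0 and P_0 = cI gives exactly the regularized solution (I/c + H^T H)^{-1} H^T Z, which approaches the batch estimate as c grows. The helper `regularized_ls` (written for p = 2) and the data are hypothetical.

```python
def rls(H, Z, P0, theta0):
    """RLS recursions (slide 3-8) over lists of design vectors H and measurements Z."""
    p = len(theta0)
    P = [list(row) for row in P0]
    theta = list(theta0)
    for h, z in zip(H, Z):
        Ph = [sum(P[i][j] * h[j] for j in range(p)) for i in range(p)]    # P_k h_{k+1}
        denom = 1.0 + sum(h[i] * Ph[i] for i in range(p))                 # 1 + h^T P_k h
        # P_{k+1} = P_k - P_k h h^T P_k / denom (P_k symmetric, so h^T P_k = (P_k h)^T)
        P = [[P[i][j] - Ph[i] * Ph[j] / denom for j in range(p)] for i in range(p)]
        err = sum(h[i] * theta[i] for i in range(p)) - z                  # h^T theta_k - z
        gain = [sum(P[i][j] * h[j] for j in range(p)) for i in range(p)]  # P_{k+1} h_{k+1}
        theta = [theta[i] - gain[i] * err for i in range(p)]
    return theta, P

def regularized_ls(H, Z, c):
    """Closed form (I/c + H^T H)^{-1} H^T Z for p = 2 (hypothetical helper)."""
    a = 1.0 / c + sum(h[0] * h[0] for h in H)
    b = sum(h[0] * h[1] for h in H)
    d = 1.0 / c + sum(h[1] * h[1] for h in H)
    g0 = sum(h[0] * z for h, z in zip(H, Z))
    g1 = sum(h[1] * z for h, z in zip(H, Z))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

# Hypothetical data with p = 2.
H = [[1.0, 0.5], [0.3, -1.0], [2.0, 1.0], [-1.0, 0.7], [0.4, 0.4]]
Z = [1.2, 1.9, 3.1, -2.4, 0.1]
theta_big, _ = rls(H, Z, P0=[[1e6, 0.0], [0.0, 1e6]], theta0=[0.0, 0.0])
theta_small, _ = rls(H, Z, P0=[[1.0, 0.0], [0.0, 1.0]], theta0=[0.0, 0.0])
```

A large P_0 expresses little confidence in θ̂_0, so the recursion is effectively initialized at the batch solution after the first p informative measurements.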
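Writing the gain in the prototype recursion of slide 3-9 as Γ_{k+1}, the fixed-parameter LMS and RLS recursions of slides 3-5 and 3-8 are recovered with A_k = I, which is a quick consistency check. The Kalman filter lines give a standard form under an assumed state model θ_{k+1} = A_k θ_k + w_k with Cov(w_k) = Q_k and measurement-noise variance σ_v². They are not taken from Section 3.3 of ISSO, whose notation may differ.

```latex
% Prototype recursion (slide 3-9)
\hat\theta_{k+1} = A_k\hat\theta_k - \Gamma_{k+1}\left(h_{k+1}^T A_k\hat\theta_k - z_{k+1}\right)

% LMS (slide 3-5)
A_k = I, \qquad \Gamma_{k+1} = a\,h_{k+1}

% RLS (slide 3-8)
A_k = I, \qquad \Gamma_{k+1} = P_{k+1}h_{k+1} = \frac{P_k h_{k+1}}{1 + h_{k+1}^T P_k h_{k+1}}

% Kalman filter (assumed model \theta_{k+1} = A_k\theta_k + w_k)
P_{k+1}^{-} = A_k P_k A_k^T + Q_k, \qquad
\Gamma_{k+1} = \frac{P_{k+1}^{-}h_{k+1}}{h_{k+1}^T P_{k+1}^{-}h_{k+1} + \sigma_v^2}, \qquad
P_{k+1} = \left(I - \Gamma_{k+1}h_{k+1}^T\right)P_{k+1}^{-}
```

Setting A_k = I, Q_k = 0, and σ_v² = 1 in the Kalman filter lines reproduces the RLS gain exactly.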
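To illustrate the trade-off on slide 3-10, the sketch below tracks a parameter that drifts as a random walk. A single function implements the prototype form with A_k = I. With a drift covariance q > 0 it acts as a Kalman filter for that assumed random-walk model. With q = 0 and unit measurement variance it reduces to the ordinary (unweighted) RLS of slide 3-8, which has no model of the drift. The random-walk model, noise levels, and names such as `prototype_step` are assumptions for illustration.

```python
import random

def prototype_step(theta, P, h, z, q, r):
    """One step of theta_{k+1} = theta_k - Gamma_{k+1} (h^T theta_k - z_{k+1}) with A_k = I.
    Gamma is a Kalman gain for the assumed model theta_{k+1} = theta_k + w_k,
    Cov(w_k) = q I, Var(v) = r. With q = 0 and r = 1 this is the RLS recursion."""
    p = len(theta)
    Pm = [[P[i][j] + (q if i == j else 0.0) for j in range(p)] for i in range(p)]  # P^-
    Ph = [sum(Pm[i][j] * h[j] for j in range(p)) for i in range(p)]               # P^- h
    s = r + sum(h[i] * Ph[i] for i in range(p))                                    # h^T P^- h + r
    gain = [v / s for v in Ph]                                                     # Gamma_{k+1}
    err = sum(h[i] * theta[i] for i in range(p)) - z
    theta = [theta[i] - gain[i] * err for i in range(p)]
    P = [[Pm[i][j] - gain[i] * Ph[j] for j in range(p)] for i in range(p)]        # (I - K h^T) P^-
    return theta, P

random.seed(3)
p, n, q_true, r_true = 2, 2000, 1e-4, 0.01
theta_true = [1.0, -1.0]
identity = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
kf = ([0.0] * p, identity)
rls = ([0.0] * p, identity)
kf_errs, rls_errs = [], []

for k in range(n):
    theta_true = [t + random.gauss(0, q_true ** 0.5) for t in theta_true]  # random-walk drift
    h = [random.gauss(0, 1) for _ in range(p)]
    z = sum(a * t for a, t in zip(h, theta_true)) + random.gauss(0, r_true ** 0.5)
    kf = prototype_step(*kf, h, z, q=q_true, r=r_true)
    rls = prototype_step(*rls, h, z, q=0.0, r=1.0)
    if k >= n // 2:  # score the second half, after start-up transients
        kf_errs.append(sum(abs(e - t) for e, t in zip(kf[0], theta_true)) / p)
        rls_errs.append(sum(abs(e - t) for e, t in zip(rls[0], theta_true)) / p)

kf_mae, rls_mae = sum(kf_errs) / len(kf_errs), sum(rls_errs) / len(rls_errs)
```

Unweighted RLS behaves like an average over the whole history (the "inertia" noted on slide 3-10), so it lags a drifting parameter. The Kalman filter's gain does not decay to zero because q adds uncertainty at every step.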
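Slide 3-14 reports approximate 95% intervals but does not say how they were computed. This sketch assumes one common construction: θ̂_i ± 1.96·sqrt(σ̂² [(H^T H)^{-1}]_ii) with σ̂² = RSS/(n - p). A t quantile with n - p degrees of freedom gives a slightly wider alternative for small n. The sketch handles p = 2 (a constant plus one regressor) on hypothetical data.

```python
import math

def ls_with_ci(H, Z, zcrit=1.96):
    """Batch LS for p = 2 plus approximate intervals
    theta_i +/- zcrit * sqrt(sigma2_hat * [(H^T H)^{-1}]_{ii})."""
    n = len(Z)
    a = sum(h[0] * h[0] for h in H)
    b = sum(h[0] * h[1] for h in H)
    d = sum(h[1] * h[1] for h in H)
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]    # (H^T H)^{-1}
    g = [sum(h[i] * z for h, z in zip(H, Z)) for i in range(2)]
    theta = [inv[i][0] * g[0] + inv[i][1] * g[1] for i in range(2)]
    rss = sum((z - h[0] * theta[0] - h[1] * theta[1]) ** 2 for h, z in zip(H, Z))
    sigma2 = rss / (n - 2)                               # unbiased noise-variance estimate
    half = [zcrit * math.sqrt(sigma2 * inv[i][i]) for i in range(2)]
    return theta, [(theta[i] - half[i], theta[i] + half[i]) for i in range(2)]

# Hypothetical data: constant term plus one regressor (not the oboe data).
H = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]
Z = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
theta, ci = ls_with_ci(H, Z)
```

Because each interval is symmetric about its estimate, the interval midpoint recovers θ̂_i.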
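The batch and RLS columns on slide 3-16 differ slightly. One standard explanation, offered here only as an interpretation because the slides do not report the P_0 and θ̂_0 used, is that RLS started from a finite P_0 solves a regularized least-squares problem rather than the batch one. Each RLS step on slide 3-8 is a Sherman–Morrison update of P^{-1}:

```latex
% One RLS step (slide 3-8), written in information form
P_{k+1}^{-1} = P_k^{-1} + h_{k+1}h_{k+1}^T,
\qquad
\hat\theta_{k+1} = P_{k+1}\left(P_k^{-1}\hat\theta_k + h_{k+1}z_{k+1}\right)

% Unrolling from k = 0 to n
P_n = \left(P_0^{-1} + H_n^T H_n\right)^{-1},
\qquad
\hat\theta_n = P_n\left(P_0^{-1}\hat\theta_0 + H_n^T Z_n\right)
```

Equivalently, θ̂_n minimizes (θ - θ̂_0)^T P_0^{-1} (θ - θ̂_0) + Σ_{k=1}^{n} (z_k - h_k^T θ)². RLS therefore matches the batch estimate (H_n^T H_n)^{-1} H_n^T Z_n only in the limit P_0^{-1} → 0, and otherwise pulls the estimate toward θ̂_0.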
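The summary statistics and the one-sided matched-pairs t-test on slide 3-17 can be sketched as follows. The slides do not say whether the paired quantities were absolute or squared errors. This sketch assumes absolute errors, pairs them by test point, and tests whether the linear model's errors are larger on average. The Python standard library has no t-distribution CDF, so the p-value uses a normal approximation. That approximation is reasonable for around 80 pairs but crude for the 8 hypothetical pairs used here.

```python
import math
import statistics

def abs_error_summary(pred, actual):
    """Mean and median absolute prediction error, as tabulated on slide 3-17."""
    errs = [abs(p - a) for p, a in zip(pred, actual)]
    return statistics.mean(errs), statistics.median(errs)

def paired_one_sided_t(err_a, err_b):
    """Matched-pairs t statistic for H1: E[err_a - err_b] > 0 (model B has smaller errors).
    The p-value is a normal approximation to the t distribution (assumption: the
    standard library has no t CDF), adequate for large n but crude for small n."""
    diffs = [a - b for a, b in zip(err_a, err_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    p_value = 1.0 - statistics.NormalDist().cdf(t)
    return t, p_value

# Hypothetical absolute errors at 8 test points (illustrative, not the oboe data).
lin_abs_err = [0.30, 0.10, 0.25, 0.40, 0.20, 0.15, 0.35, 0.22]
curv_abs_err = [0.25, 0.12, 0.20, 0.33, 0.21, 0.10, 0.30, 0.20]
t_stat, p_val = paired_one_sided_t(lin_abs_err, curv_abs_err)
```

Pairing by test point removes the point-to-point variation that both models share. This is why a matched-pairs test can detect differences as small as the 0.242 versus 0.235 mean errors on slide 3-17.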