ENGRD 241 / CEE 241: Engineering Computation 1


Engineering Computation
Curve Fitting
Part 7: Curve Fitting by Least-Squares Regression and Spline Interpolation
Engineering Computation
Curve fitting
Curve Fitting:
Given a set of points:
- experimental data
- tabular data
- etc.
Fit a curve (surface) to the points so that we can easily evaluate
f(x) at any x of interest.
If x is within the data range → interpolating (generally safe)
If x is outside the data range → extrapolating (often dangerous)
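For a quick illustration in MATLAB (not from the original slides; the sample data and the use of interp1 are assumptions), an estimate inside the data range stays close to the underlying function, while an extrapolated one can drift badly:

% Interpolation vs. extrapolation on assumed tabulated data.
x = 0:1:5;                                    % sample points
y = sin(x);                                   % tabulated values of a known function
yi = interp1(x, y, 2.5, 'spline');            % inside the range: interpolation
yo = interp1(x, y, 7.0, 'spline', 'extrap');  % outside the range: extrapolation
fprintf('interpolated %.4f (true %.4f)\n', yi, sin(2.5));
fprintf('extrapolated %.4f (true %.4f)\n', yo, sin(7.0));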
Engineering Computation
Curve fitting
Curve Fitting:
Two main methods will be covered:
1. Least-Squares Regression
• Function is "best fit" to data.
• Does not necessarily pass through points.
• Used for scattered data (experimental)
• Can develop models for analysis/design.
2. Interpolation
• Function passes through all (or most) points.
• Interpolates values of well-behaved (precise) data or for
geometric design.
Engineering Computation
Curve Fitting & Interpolation
Curve Fitting:
1. We have discussed Least-Squares Regression where the
function is "best fit" to points but does not necessarily pass
through the points.
2. We now discuss Interpolation & Extrapolation
The function passes through all (or at least most) points.
[Figure: fitted curve through data points, with the interpolation region inside the data range and the extrapolation region beyond it]
Engineering Computation
Least Squares Regression: General Procedure
Engineering Computation
Least Squares Regression
Curve Fitting by Least-Squares Regression:
Objective:
Obtain low order approximation (curve or surface) that
"best fits" data
Note:
• Because the order of the approximation is lower than the number
of data points, the curve or surface cannot pass through
all points.
• We will need a consistent criterion for determining the
"best fit."
Typical Usage:
Scattered (experimental) data
Develop empirical models for analysis/design.
Engineering Computation
Least Squares Regression
Least-Squares Regression:
1. In laboratory, apply x, measure y, tabulate data.
2. Plot data and examine the relationship.
[Figure: scatter plot of measured data points (xi, yi)]
Engineering Computation
Least Squares Regression
Least-Squares Regression:
3. Develop a "model" – an approximate relationship between y
and x:
y = mx + b
4. Use the model to predict or estimate y for any given x.
5. "Best fit" of the data requires:
• An optimal way of finding the parameters (e.g., the slope and
intercept of a straight line).
• Perhaps optimize the selection of the model form
(i.e., linear, quadratic, exponential, ...).
• That the magnitudes of the residual errors do not vary in
any systematic fashion. [In statistical applications, the
residual errors should be independent and identically
distributed.]
Engineering Computation
Least Squares Regression
Least-Squares Regression
Given: n data points: (x1,y1), (x2,y2), … (xn,yn)
Obtain: "Best fit" curve:
f(x) = a0 Z0(x) + a1 Z1(x) + a2 Z2(x) + … + am Zm(x)
ai's are unknown parameters of model
Zi's are known functions of x.
We will focus on two of the many possible types of regression
models:
Simple Linear Regression
Z0(x) = 1 & Z1(x) = x
General Polynomial Regression
Z0(x) = 1, Z1(x) = x, Z2(x) = x^2, …, Zm(x) = x^m
b = REGRESS(y,X) returns the vector of regression
coefficients, b, in the linear model y = Xb,
(X is an nxp matrix, y is the nx1 vector of observations).
[B,BINT,R,RINT,STATS] = REGRESS(y,X,alpha) uses the input,
ALPHA to calculate 100(1 - ALPHA) confidence intervals for
B and the residual vector, R, in BINT and RINT
respectively. The vector STATS contains the R-square
statistic along with the F and p values for the regression.
>> x = linspace(0,1,20)';
>> y = 2*x + 1 + 0.1*randn(20,1);
>> plot(x,y,'.')
>> xx = [ones(20,1), x];
>> b = regress(y,xx)
b =
    1.0115
    1.9941
>> yy = xx*b;
>> hold on
>> plot(x,yy,'k-')
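The same regress call handles general polynomial regression when each column of the design matrix holds one basis function Z_j(x). A minimal sketch with assumed data (the coefficients and noise level are illustrative only):

% Quadratic fit with regress; columns of X are Z0(x) = 1, Z1(x) = x, Z2(x) = x^2.
x = linspace(0,1,20)';
y = 3*x.^2 - 2*x + 1 + 0.05*randn(20,1);   % assumed "measured" data
X = [ones(20,1), x, x.^2];
a = regress(y, X);                          % a(1) = a0, a(2) = a1, a(3) = a2
yfit = X*a;
plot(x, y, '.', x, yfit, 'k-')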
Engineering Computation
Least Squares Regression: General Procedure
Least Squares Regression (cont'd):
General Procedure:
For the ith data point, (xi,yi) we find the set of coefficients for
which:
yi = a0 Z0(xi) + a1 Z1(xi) + … + am Zm(xi) + ei
where ei is the residual error = the difference between the reported
value and the model:
ei = yi – a0 Z0(xi) – a1 Z1(xi) – … – am Zm(xi)
Our "best fit" will minimize the total sum of the squares of the
residuals:
S_r = \sum_{i=1}^{n} e_i^2
Engineering Computation
Least Squares Regression: General Procedure
[Figure: data point (xi, yi) showing the measured value, the modeled value, and the residual ei]
Our "best fit" will be the function which minimizes the sum of
squares of the residuals:
S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - \sum_{j=0}^{m} a_j Z_j(x_i) \right]^2
S_r = \sum_{i=1}^{n} \left[ y_i - a_0 Z_0(x_i) - a_1 Z_1(x_i) - a_2 Z_2(x_i) - \cdots - a_m Z_m(x_i) \right]^2
Engineering Computation
Least Squares Regression: General Procedure
Least Squares Regression (cont'd):
S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - a_0 Z_0(x_i) - \cdots - a_m Z_m(x_i) \right]^2
To minimize this expression with respect to the unknowns a0, a1,
…, am, take derivatives of Sr and set them to zero:
\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} Z_0(x_i) \left[ y_i - a_0 Z_0(x_i) - \cdots - a_m Z_m(x_i) \right]
\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} Z_1(x_i) \left[ y_i - a_0 Z_0(x_i) - \cdots - a_m Z_m(x_i) \right]
\vdots
\frac{\partial S_r}{\partial a_m} = -2 \sum_{i=1}^{n} Z_m(x_i) \left[ y_i - a_0 Z_0(x_i) - \cdots - a_m Z_m(x_i) \right]
Engineering Computation
Least Squares: Linear Algebra
Least Squares Regression (cont'd):
In Linear Algebra form:
{Y} = [Z] {A} + {E} or {E} = {Y} – [Z] {A}
where: {E} and {Y} are n x 1
[Z] is n x (m+1)
{A} is (m+1) x 1
n = # points
(m+1) = # unknowns
{E}T = [e1 e2 ... en],
{Y}T = [y1 y2 ... yn],
{A}T = [a0 a1 a2 ... am]
[Z] = \begin{bmatrix} Z_0(x_1) & Z_1(x_1) & \cdots & Z_m(x_1) \\ Z_0(x_2) & Z_1(x_2) & \cdots & Z_m(x_2) \\ \vdots & \vdots & & \vdots \\ Z_0(x_n) & Z_1(x_n) & \cdots & Z_m(x_n) \end{bmatrix}
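A minimal MATLAB sketch of these quantities (the variable names and the quadratic basis are assumptions for illustration):

% Build [Z] for a quadratic basis, then evaluate {E} and Sr for a trial {A}.
n = numel(x);
Z = [ones(n,1), x(:), x(:).^2];   % columns are Z0(x), Z1(x), Z2(x), so m = 2
A = [1; 2; 3];                    % an assumed trial coefficient vector
E = y(:) - Z*A;                   % {E} = {Y} - [Z]{A}
Sr = E'*E;                        % sum of squared residuals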
Engineering Computation
Least Squares: Sum of Squared Errors
Least Squares Regression (cont'd):
{E} = {Y} – [Z]{A}
Then
Sr = {E}^T{E} = ({Y} – [Z]{A})^T ({Y} – [Z]{A})
= {Y}^T{Y} – {A}^T[Z]^T{Y} – {Y}^T[Z]{A} + {A}^T[Z]^T[Z]{A}
= {Y}^T{Y} – 2{A}^T[Z]^T{Y} + {A}^T[Z]^T[Z]{A}
Setting ∂Sr/∂ai = 0 for i = 0, 1, …, m yields:
0 = 2[Z]^T[Z]{A} – 2[Z]^T{Y}
or
[Z]^T[Z]{A} = [Z]^T{Y}
Engineering Computation
Least Squares: Normal Equations
Least Squares Regression (cont'd):
[Z]T[Z]{A} = [Z]T{Y}
(C&C Eq. 17.25)
This is the general form of the Normal Equations.
They provide (m+1) equations in (m+1) unknowns.
(Note that we end up with a system of linear equations.)
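A hedged MATLAB sketch of forming and solving the normal equations (the data, the quadratic basis, and the noise level are assumptions):

% Solve [Z]'[Z]{A} = [Z]'{Y} for a quadratic model.
x = (0:0.5:5)';                                % assumed sample points
y = 1 + 2*x - 0.5*x.^2 + 0.2*randn(size(x));   % assumed noisy measurements
Z = [ones(size(x)), x, x.^2];                  % Z0 = 1, Z1 = x, Z2 = x^2
A = (Z'*Z) \ (Z'*y);                           % normal-equations solution
% Note: A = Z\y solves the same least-squares problem via QR and is usually
% better conditioned than explicitly forming Z'*Z.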
Engineering Computation
Least Squares: Simple Linear Regression
Simple Linear Regression (m = 1):
Given: n data points, (x1,y1),(x2,y2),…(xn,yn)
with n > 2
Obtain: "Best fit" curve:
f(x) = a0 + a1x
from the n equations:
y1 = a0 + a1 x1 + e1
y2 = a0 + a1 x2 + e2
⋮
yn = a0 + a1 xn + en
Or, in matrix form, [Z]^T[Z]{A} = [Z]^T{Y}:
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix}
=
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{Bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{Bmatrix}
Engineering Computation
Least Squares: Simple Linear Regression
Simple Linear Regression (m = 1):
Normal Equations
[Z]^T[Z]{A} = [Z]^T{Y}
upon multiplying the matrices become
\begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix}
=
\begin{Bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{Bmatrix}
Normal Equations
for Linear Regression
C&C Eqs. (17.4-5)
(This form works well
for spreadsheets.)
Engineering Computation
Least Squares: Simple Linear Regression
Simple Linear Regression (m = 1):
[Z]^T[Z]{A} = [Z]^T{Y}
Solving for {A}:
a_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}
a_0 = \frac{\sum_{i=1}^{n} y_i - a_1 \sum_{i=1}^{n} x_i}{n} = \bar{y} - a_1 \bar{x}
C&C equations (17.6) and (17.7)
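A minimal sketch of these closed-form sums, assuming x and y are column vectors of the data:

% Slope and intercept from C&C Eqs. (17.6) and (17.7).
n  = numel(x);
a1 = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);
a0 = (sum(y) - a1*sum(x)) / n;     % equivalently, a0 = mean(y) - a1*mean(x)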
Engineering Computation
Least Squares: Simple Linear Regression
Simple Linear Regression (m = 1):
[Z]^T[Z]{A} = [Z]^T{Y}
A better version of the first normal equation is:
a_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
which is easier and numerically more stable, but the 2nd equation
remains the same:
a_0 = \bar{y} - a_1 \bar{x}
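The same fit in the centered form (again assuming column vectors x and y):

% Numerically more stable centered form of the slope.
xbar = mean(x);
ybar = mean(y);
a1 = sum((y - ybar).*(x - xbar)) / sum((x - xbar).^2);
a0 = ybar - a1*xbar;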
ENGRD 241 / CEE 241: Engineering Computation
Curve Fitting
Common Nonlinear Relations:
Objective: Use linear equations for simplicity.
Remedy: Transform data into linear form and perform
regressions.
Given: data which appears as:
(1) Exponential-like curve:
y = a_1 e^{b_1 x}
(e.g., population growth, radioactive decay,
attenuation of a transmission line)
Linearize: ln(y) = ln(a_1) + b_1 x
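A hedged sketch of this log-transform fit; the data and the "true" parameter values below are assumptions:

% Fit y = a1*exp(b1*x) by regressing ln(y) on x.
x = linspace(0, 4, 25)';
y = 2.5*exp(0.8*x) .* exp(0.05*randn(25,1));   % assumed noisy measurements
p  = polyfit(x, log(y), 1);                    % fits ln(y) = ln(a1) + b1*x
b1 = p(1);                                     % slope     -> b1
a1 = exp(p(2));                                % intercept -> ln(a1)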
ENGRD 241 / CEE 241: Engineering Computation
Curve Fitting
Common Nonlinear Relations:
(2) Power-like curve:
y = a_2 x^{b_2}
Linearize: ln(y) = ln(a_2) + b_2 ln(x)
(3) Saturation growth-rate curve:
y = \frac{a_3 x}{b_3 + x}
(e.g., population growth under limiting conditions)
Linearize: \frac{1}{y} = \frac{1}{a_3} + \frac{b_3}{a_3} \cdot \frac{1}{x}
[Figure: saturation growth-rate curves for a3 = 5, b3 = 1…10]
Be careful about the implied distribution of the errors. Always
use the untransformed values for error analysis.
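A hedged sketch of the saturation fit, with the residuals evaluated on the untransformed values as advised above (the data and "true" parameters are assumed):

% Fit y = a3*x/(b3 + x) via the linearized form 1/y = 1/a3 + (b3/a3)*(1/x).
x = (1:10)';
y = 5*x./(2 + x) + 0.05*randn(10,1);   % assumed data (a3 = 5, b3 = 2)
p  = polyfit(1./x, 1./y, 1);           % slope = b3/a3, intercept = 1/a3
a3 = 1/p(2);
b3 = p(1)*a3;
Sr = sum((y - a3*x./(b3 + x)).^2);     % residuals in the original y units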
Engineering Computation
Goodness of fit
Major Points in Least-Squares Regression:
1. In all regression models one is solving an
overdetermined system of equations, i.e., more
equations than unknowns.
2. How good is the fit?
Often based on the coefficient of determination, r^2
Engineering Computation
Goodness of fit
r^2 compares the spread of the data about the regression line
with the spread of the data about the mean.
Spread of the data around the regression line:
S_r = \sum e_i^2 = \sum (y_i - y'_i)^2
Spread of the data around the mean:
S_t = \sum (y_i - \bar{y})^2
Engineering Computation
Goodness of fit
Coefficient of determination
describes how much of the variance is "explained" by
the regression equation:
r^2 = \frac{S_t - S_r}{S_t}
• Want r2 close to 1.0.
• Doesn't work for comparing models with different numbers of parameters.
• Be careful when using different transformations – always do the
analysis on the untransformed data.
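A minimal sketch of the computation, assuming x, y, and fitted straight-line coefficients a0 and a1 already exist:

% Coefficient of determination for a straight-line fit.
yhat = a0 + a1*x;                  % modeled values y'_i
Sr = sum((y - yhat).^2);           % spread about the regression line
St = sum((y - mean(y)).^2);        % spread about the mean
r2 = (St - Sr)/St;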
Engineering Computation
Standard Error of the Estimate
Precision:
If the spread of the points around the line is of similar
magnitude along the entire range of the data,
then one can use
s_{y/x} = \sqrt{ \frac{S_r}{n - (m+1)} }
= standard error of the estimate
(standard deviation in y)
to describe the precision of the regression estimate (in which m+1
is the number of coefficients calculated for the fit, e.g., m+1=2 for
linear regression)
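A minimal sketch, assuming y and the modeled values yhat come from a straight-line fit (so m + 1 = 2):

% Standard error of the estimate for a straight-line fit.
n   = numel(y);
Sr  = sum((y - yhat).^2);          % residual sum of squares
syx = sqrt(Sr/(n - 2));            % s_y/x with m+1 = 2 coefficients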
Engineering Computation
Standard Error of the Estimate
Statistics
Chapra and Canale in sections PT5.2, 17.1.3 and 17.4.3 discuss
the statistical interpretation of least squares regression and some
of the associated statistical concepts.
The statistical theory of least squares regression is elegant,
powerful, and widely used in the analysis of real data
throughout the sciences.
See Lecture Notes pages X-14 through X-16.