
Linear discrete inverse problems
(parameter estimation)
Least squares and all that…
Least squares problems
Least squares is the basis of many parameter estimation and data-fitting procedures. A concise tutorial can be found in Chapter 15 of Numerical Recipes (Press et al., 1992, Cambridge Univ. Press), available for free online at http://www.nrbook.com. A good explanation of the essentials is given in Aster et al. (2005).
Linear discrete inverse problems
Can a and b be resolved?
Under-determined
Over-determined
Even-determined
Over-determined: Linear discrete inverse problem
To find the best-fit model we can minimize the prediction error of the solution. But the data contain errors. Let's assume these are independent and normally distributed; then we weight each residual inversely by the standard deviation of the corresponding (known) error distribution. We can obtain a least squares solution by minimizing the weighted prediction error of the solution.
Over-determined: Linear discrete inverse problem
We seek the model vector m which minimizes the weighted prediction error

φ(m) = (d − G m)^T C_D^{−1} (d − G m)

(compare with maximum likelihood). Note that this is a quadratic function of the model vector. Solution: differentiate with respect to m and solve for the model vector which gives a zero gradient,

∇_m φ = −2 G^T C_D^{−1} (d − G m) = 0

This gives the normal equations

G^T C_D^{−1} G m = G^T C_D^{−1} d

and their solution

m_LS = (G^T C_D^{−1} G)^{−1} G^T C_D^{−1} d

is the least-squares solution.
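As a minimal MATLAB sketch (not from the original slides), assuming an N × M design matrix G, a data vector d, and per-datum standard deviations sd:

% Weighted least squares via the normal equations (illustrative sketch).
W = diag(1 ./ sd.^2);                 % CD^-1 for independent data errors
mLS = (G' * W * G) \ (G' * W * d);    % solves G'*W*G*m = G'*W*d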
Over-determined: Linear discrete inverse problem
How does the least-squares solution compare to the standard equations of linear regression? Given N data y_i with independent normally distributed errors and standard deviations σ_i, what are the expressions for the model parameters m = [a, b]^T?
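For reference (filled in from the standard results of Numerical Recipes, Chapter 15, not transcribed from the slide images), for the straight line y = a + b x the weighted least-squares estimates are

a = (S_xx S_y − S_x S_xy) / Δ,  b = (S S_xy − S_x S_y) / Δ

where S = Σ 1/σ_i², S_x = Σ x_i/σ_i², S_y = Σ y_i/σ_i², S_xx = Σ x_i²/σ_i², S_xy = Σ x_i y_i/σ_i², and Δ = S S_xx − S_x².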
Linear discrete inverse problem: Least squares
What happens in the under- and even-determined cases?
Under-determined, N = 1: the matrix has a zero determinant and a zero eigenvalue, so an infinite number of solutions exist. What is m?
Even-determined, N = 2: r = d − G m = 0. What is the prediction error? The prediction error is zero!
Example: Over-determined linear discrete inverse problem
The ballistics example. Given the data and noise, calculate G.
Is the data fit good enough? And how do errors in the data propagate into the solution?
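As an illustrative sketch, the ballistic trajectory model of Aster et al. (2005) is y(t) = m1 + m2 t − ½ m3 t²; assuming a column vector t of observation times, G can be built in MATLAB as:

% Design matrix for the ballistics model y(t) = m1 + m2*t - 0.5*m3*t^2.
G = [ones(size(t)), t, -0.5 * t.^2];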
The two questions in parameter estimation
We have our fitted model parameters…
…but we are far from finished! We need to:
1. Assess the quality of the data fit. Goodness of fit: does the model fit the data to within the statistical uncertainty of the noise?
2. Estimate how errors in the data propagate into the model. What are the errors on the model parameters?
Goodness of fit
Once we have our least squares solution m_LS, how do we know whether the fit is good enough given the errors in the data? Use the prediction error at the least squares solution! If the data errors are Gaussian, this is a chi-square statistic:

χ² = Σ_{i=1}^N (d_i − (G m_LS)_i)² / σ_i²
Why do we always assume errors are Gaussian?
[Figure: probability density functions of five random variables X1–X5, each estimated from 100,000 deviates; the sum of the variables approaches a Gaussian. The Central Limit Theorem.]
Mathematical Background:
Probability distributions
A random variable x follows a Normal or Gaussian probability density function if

f(x) = (1 / (σ √(2π))) exp( −(x − μ)² / (2σ²) )

Multivariate case:

f(x) = (2π)^{−N/2} |C|^{−1/2} exp( −½ (x − μ)^T C^{−1} (x − μ) )

If the data covariance matrix is diagonal, then the data errors are independent.
Mathematical Background:
Probability distributions
Expectation operator: E[X] = ∫ x f(x) dx
Expectation of a Gaussian random variable: E[X] = μ
Variance: Var(X) = E[(X − E[X])²]
Covariance: Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
If X and Y are independent then Cov(X, Y) = 0.
Mathematical Background:
Probability distributions
Multi-dimensional Gaussian
Expectation of a Gaussian random vector: E[x] = μ
Covariance matrix, e.g. for four variables:

C = [ σ_{1,1}  σ_{1,2}  σ_{1,3}  σ_{1,4}
      σ_{2,1}  σ_{2,2}  σ_{2,3}  σ_{2,4}
      σ_{3,1}  σ_{3,2}  σ_{3,3}  σ_{3,4}
      σ_{4,1}  σ_{4,2}  σ_{4,3}  σ_{4,4} ]

Correlation matrix: ρ_{i,j} = σ_{i,j} / √(σ_{i,i} σ_{j,j})
ρ_{i,j} = 0 indicates independence between X_i and X_j; ρ_{i,j} ≠ 0 indicates positive or negative correlation.
Background: Chi-square distribution
If x follows a Normal distribution, what distribution does the square of x follow? Answer: a chi-square distribution with 1 degree of freedom.
If x_1 and x_2 are Normal, what distribution does y = x_1² + x_2² follow? Answer: a chi-square distribution with 2 degrees of freedom.
Goodness of fit
For Gaussian data errors the data prediction error is a sum of squares of Gaussian random variables, hence it has a chi-square probability density function with N − M degrees of freedom.

[Figure: chi-square probability density function with the p = 0.05 and p = 0.95 percentage points marked.]

ndf   χ²(5%)   χ²(50%)   χ²(95%)
  5     1.15     4.35     11.07
 10     3.94     9.34     18.31
 20    10.85    19.34     31.41
 50    34.76    49.33     67.50
100    77.93    99.33    124.34

The χ² test provides a means of testing the assumptions that went into producing the least squares solution. It gives the likelihood that the fit actually achieved is reasonable.
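A sketch of the test in MATLAB (chi2cdf is from the Statistics and Machine Learning Toolbox; G, d, sd, and mLS are the assumed variables from the earlier sketches):

% Chi-square goodness-of-fit test (sketch).
r = (d - G * mLS) ./ sd;        % weighted residuals
chi2 = sum(r.^2);               % chi-square statistic
nu = numel(d) - numel(mLS);     % degrees of freedom, N - M
p = 1 - chi2cdf(chi2, nu);      % probability of exceeding chi2 by chance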
Example: Goodness of fit
The ballistics problem. Given the data and noise, how many degrees of freedom?
ν = N − M = 10 − 3 = 7
In practice, p-values between 0.1 and 0.9 are plausible.
Variance ratio
Another way of looking at the chi-square statistic is as the variance ratio of two distributions. Given two sets of random samples, the question is: did they come from the same distribution or not? The ratio of the variances of the two distributions follows a chi-square distribution, and chi-square tables tell us the likelihood that the two sets of observables come from the same distribution.
Goodness of fit
For Gaussian data errors the chi-square statistic has a chi-square distribution with ν = N − M degrees of freedom.

ndf   χ²(5%)   χ²(50%)   χ²(95%)
  5     1.15     4.35     11.07
 10     3.94     9.34     18.31
 20    10.85    19.34     31.41
 50    34.76    49.33     67.50
100    77.93    99.33    124.34

Exercise:
If I fit 7 data points with a straight line and get a χ² value of …, what would you conclude?
If I fit 102 data points with a straight line and get a χ² value of …, what would you conclude?
If I fit 52 data points with a straight line and get a χ² value of …, what would you conclude?
Goodness of fit
For Gaussian data errors the chi-square statistic has a chi-square distribution with ν = N − M degrees of freedom.

ndf   χ²(5%)   χ²(50%)   χ²(95%)
  5     1.15     4.35     11.07
 10     3.94     9.34     18.31
 20    10.85    19.34     31.41
 50    34.76    49.33     67.50
100    77.93    99.33    124.34

What could be the cause if the prediction error is much too large (poor data fit)?
Truly unlikely data errors
Errors in the forward theory
Under-estimated data errors
What could be the cause if the prediction error is too small (too good a data fit)?
Truly unlikely data errors
Over-estimated data errors
Fraud!
Solution error
Once we have our least squares solution m_LS and we know that the data fit is acceptable, how do we find the likely errors in the model parameters arising from errors in the data?

m_LS = G^{−g} d

The data set we actually observed is only one realization of the many that could have been observed:

d → m_LS
d + ε → m_LS + ε_m

m′_LS = G^{−g} d′
m_LS + ε_m = G^{−g} (d + ε)

The effect of adding noise to the data is to add noise to the solution:

ε_m = G^{−g} ε

The model noise is a linear combination of the data noise!
Solution error: Model Covariance
Multivariate Gaussian data error distribution:

P(ε) ∝ exp( −½ ε^T C_D^{−1} ε )

How do we turn this into a probability distribution for the model errors? We know that the solution error is a linear combination of the data error,

ε_m = G^{−g} ε

The covariance of any linear combination A d of Gaussian-distributed random variables d is

cov(A d) = A C_D A^T

So we have the covariance of the model parameters:

C_M = G^{−g} C_D (G^{−g})^T
Solution error: Model Covariance
The model covariance for a least squares problem depends on the data errors and not the data itself! G is controlled by the design of the experiment. Substituting the least squares solution m_LS = G^{−g} d, with G^{−g} = (G^T C_D^{−1} G)^{−1} G^T C_D^{−1}, gives

C_M = G^{−g} C_D (G^{−g})^T = (G^T C_D^{−1} G)^{−1}

The data error distribution gives a model error distribution!
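Continuing the MATLAB sketch (same assumed variables G and sd):

% Model covariance for the weighted least squares solution (sketch).
W = diag(1 ./ sd.^2);          % CD^-1 for independent errors
CM = inv(G' * W * G);          % model covariance matrix
sigma_m = sqrt(diag(CM));      % standard errors of the model parameters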
Solution error: Model Covariance
For the special case of independent data errors, C_D = σ² I, we get

C_M = σ² (G^T G)^{−1}

Independent data errors can still produce correlated model errors. For the linear regression problem this gives the familiar covariance matrix of the intercept and gradient.
Confidence intervals by projection (1-D)
The model covariance is a symmetric M × M matrix. In the multi-dimensional model space, the value of

D² = (m − m_LS)^T C_M^{−1} (m − m_LS)

follows a χ² distribution with M degrees of freedom. Projecting onto the m_i axis, the 1-D confidence interval becomes

m_i ± D √((C_M)_{i,i}),  where D² follows a χ²_1 distribution

e.g. for the 90% interval on m_1: m_1 ± 1.645 √((C_M)_{1,1}), since χ²_1(90%) = 2.71.

Note this is the 90% confidence interval on m_1 alone. The joint (m_1, m_2) 90% confidence ellipse is wider than this.
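In MATLAB, with mLS and CM from the earlier sketches (chi2inv is from the Statistics and Machine Learning Toolbox):

% 90% confidence interval on m1 alone (sketch).
delta = sqrt(chi2inv(0.90, 1));                 % = 1.645
ci90 = mLS(1) + delta * sqrt(CM(1,1)) * [-1, 1];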
Example: Model Covariance and confidence intervals
For the ballistics problem, the 95% confidence interval for parameter i is m_i ± 1.96 √((C_M)_{i,i}).
Confidence intervals by projection
The M-dimensional confidence ellipsoid can be projected onto any subset (or combination) of D parameters to obtain the corresponding confidence ellipsoid.

Full M-dimensional ellipsoid: (m − m_LS)^T C_M^{−1} (m − m_LS) ≤ Δ²
Projected D-dimensional ellipsoid: (m′ − m′_LS)^T (C′_M)^{−1} (m′ − m′_LS) ≤ Δ²

Here m′ is the projected model vector, C′_M the projected covariance matrix (the rows and columns of C_M corresponding to the chosen parameters), and Δ² the chosen percentage point of the χ²_D distribution.

To find the 90% confidence ellipse for (x, y) from a 3-D (x, y, z) ellipsoid, take the 2 × 2 (x, y) block of C_M and Δ² = χ²_2(90%) = 4.61.

Can you see that this procedure gives the same formula for the 1-D case obtained previously?
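A sketch of the projection step in MATLAB, assuming x and y are parameters 1 and 2 of CM:

% Project the 3-D confidence ellipsoid onto (x, y) (sketch).
CMp = CM(1:2, 1:2);            % projected covariance: rows/cols for x and y
delta2 = chi2inv(0.90, 2);     % = 4.61, 90% point of chi-square with 2 dof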
Recap: Goodness of fit and model covariance
Once a best fit solution has been obtained, we test goodness of fit with a chi-square test (assuming Gaussian statistics). If the model passes the goodness of fit test we may proceed to evaluating model covariance (if not, then your data errors are probably too small).
Evaluate the model covariance matrix C_M = G^{−g} C_D (G^{−g})^T.
Plot the model, or projections of it onto chosen subsets of parameters.
Calculate confidence intervals using the projected equation m_i ± D √((C_M)_{i,i}), where D² follows a χ²_1 distribution.
Robust data fitting with the L1 norm
Least squares solutions are not robust to outliers. Instead we can minimize the L1 misfit

Σ_{i=1}^N |d_i − (G m)_i| / σ_i

We can calculate an L1 solution with the IRLS (iteratively reweighted least squares) algorithm,

m^{(k+1)} = (G^T R^{(k)} G)^{−1} G^T R^{(k)} d

where R^{(k)} is a diagonal weighting matrix that depends on the current model, with elements R_{i,i} = 1 / |d_i − (G m^{(k)})_i|. See section 2.4 of Aster et al. (2005).
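A minimal IRLS sketch in MATLAB, assuming G and d as before (and, for simplicity, data pre-scaled by their standard deviations); the damping constant tol and the iteration limits are implementation details not specified on the slide:

% Iteratively reweighted least squares for the L1-norm solution (sketch).
m = G \ d;                               % start from the L2 solution
tol = 1e-6;                              % damping to avoid divide-by-zero
for k = 1:50
    r = d - G * m;                       % current residuals
    R = diag(1 ./ max(abs(r), tol));     % L1 reweighting matrix
    m_new = (G' * R * G) \ (G' * R * d);
    if norm(m_new - m) < 1e-8 * norm(m)
        m = m_new;
        break
    end
    m = m_new;
end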
Monte Carlo error propagation
It is possible to define an approximate p statistic for the L1 solution and hence test the goodness of fit of the solution. However, there is no analytical solution for the error propagation…
…but Monte Carlo error propagation can be used (a sketch follows the list):
1. Calculate the data prediction from the solution.
2. Add a random realization of the noise to the data and repeat the IRLS algorithm.
3. Repeat Q times and generate difference vectors.
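A sketch of the procedure, where irls is assumed to be the IRLS loop above wrapped as a function m = irls(G, d), mL1 its solution for the observed data, and sd as before:

% Monte Carlo error propagation for the L1 solution (sketch).
Q = 1000;                                % number of noise realizations
dpre = G * mL1;                          % 1. data predicted by the solution
dm = zeros(numel(mL1), Q);
for q = 1:Q
    dq = dpre + sd .* randn(size(dpre)); % 2. add a noise realization
    dm(:, q) = irls(G, dq) - mL1;        % difference vector
end
CM_mc = cov(dm');                        % 3. empirical model covariance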
Monte Carlo error propagation
For the ballistics problem we obtain the Monte Carlo error estimates. Compare to the LS solution without the outlier.
What if we do not know the errors on the data?
Both chi-square goodness of fit tests and model covariance calculations require knowledge of the variance of the data. What can we do if we do not know σ? Consider the case of independent data errors, C_D = σ² I, with σ² calculated from the least squares solution:

σ̂² = (1 / (N − M)) Σ_{i=1}^N (d_i − (G m_LS)_i)²

So we can still estimate model errors using the calculated data errors, but we can no longer claim anything about goodness of fit.
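In MATLAB this estimate is a one-liner (a sketch, same assumed variables):

% Estimate the data variance from the least squares fit (sketch).
r = d - G * mLS;
sigma2 = sum(r.^2) / (numel(d) - numel(mLS));   % = sum(r^2) / (N - M)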
Example: Over-determined linear discrete inverse problem
MATLAB exercise
Generate data with Gaussian noise for a linear regression problem and invert for the best fitting gradient and intercept (a sketch implementing these steps is given below):
1. Generate x_i points randomly between 0 and 10.
2. Calculate the data y_i.
3. Add N(0, σ²) noise to the data y_i.
4. Calculate the G matrix.
5. Use MATLAB matrix routines to solve the normal equations.
6. Plot the data, the data errors, and the least squares solution.
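One possible implementation (a sketch; the intercept a, gradient b, noise level sigma, and N are arbitrary example values):

% Linear regression exercise (sketch).
N = 20; a = 1.0; b = 2.0; sigma = 0.5;  % arbitrary example values
x = 10 * rand(N, 1);                    % 1. random x in [0, 10]
y = a + b * x + sigma * randn(N, 1);    % 2./3. data with N(0, sigma^2) noise
G = [ones(N, 1), x];                    % 4. design matrix for y = a + b*x
m = (G' * G) \ (G' * y);                % 5. solve the normal equations
errorbar(x, y, sigma * ones(N, 1), 'o'); hold on
xs = [0 10];
plot(xs, m(1) + m(2) * xs, 'r-');       % 6. best-fit line
hold off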
Model Resolution matrix
If we obtain a solution to an inverse problem we can ask what its relationship is to the true solution:

m_est = G^{−g} d

But we know d = G m_true, and hence

m_est = G^{−g} G m_true = R m_true

The matrix R measures how 'good an inverse' G^{−g} is. It shows how the elements of m_est are built from linear combinations of the true model m_true; hence R measures the amount of blurring produced by the inverse operator.

For the least squares solution we have

G^{−g} = (G^T C_D^{−1} G)^{−1} G^T C_D^{−1}  ⇒  R = G^{−g} G = I
Example: Model resolution in a tomographic experiment

m_est = R m_true

If the calculated model resolution matrix looked like this:

R = [  0.75  −0.25   0.25   0.25
      −0.25   0.75   0.25   0.25
       0.25   0.25   0.75  −0.25
       0.25   0.25  −0.25   0.75 ]

[Figure: spike test, showing the true model and the recovered model.]

What units do the elements of R have?
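A sketch of the spike test with the R above (the spike position is an arbitrary choice):

% Spike test with the resolution matrix from the slide.
R = [ 0.75 -0.25  0.25  0.25;
     -0.25  0.75  0.25  0.25;
      0.25  0.25  0.75 -0.25;
      0.25  0.25 -0.25  0.75];
m_true = [0; 0; 1; 0];        % spike in parameter 3 (arbitrary choice)
m_est = R * m_true            % recovered (blurred) model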
Data Resolution matrix
If we obtain a solution to an inverse problem we can ask how it compares to the data:

d_pre = G m_est

But we know m_est = G^{−g} d_obs, and hence

d_pre = G G^{−g} d_obs = D d_obs

The matrix D is analogous to the model resolution matrix R, but measures how independently the model produced by G^{−g} can reproduce the data. If D = I then the data are fit exactly and the prediction error d − G m is zero.
Recap: Linear discrete inverse problems
The least squares solution minimizes the prediction error.
Goodness of fit criteria tell us whether the least squares model adequately fits the data, given the level of noise: chi-square with N − M degrees of freedom.
The covariance matrix describes how noise propagates from the data to the estimated model: chi-square with M degrees of freedom gives confidence intervals.
The resolution matrix describes how the estimated model relates to the true model.