Estimating uncertainty in surrogate

Uncertainty in fall time surrogate
• Prediction variance vs. data sensitivity
– Non-uniform noise
– Example 3.2.1
• Uncertainty in fall time data
• Bootstrapping
– Estimating accuracy of statistics
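Linear Regression
• Functional form
$$\hat{y}(\mathbf{x}) = \sum_{i=1}^{n} \beta_i\, \xi_i(\mathbf{x})$$
• For linear approximation
$$\xi_1 = 1, \qquad \xi_2 = x$$
• Define $x_i^{(m)} = \xi_i(\mathbf{x})$, then $\hat{y} = \mathbf{x}^{(m)T}\mathbf{b}$
• $X^T X\,\mathbf{b} = X^T \mathbf{y}$
• Regression coefficients
$$\mathbf{b} = (X^T X)^{-1} X^T \mathbf{y}$$
• Altogether
$$\hat{y}(\mathbf{x}) = \mathbf{x}^{(m)T}(X^T X)^{-1} X^T \mathbf{y}$$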
• Differentiate with respect to the ith component of y:
$$\frac{\partial \hat{y}(\mathbf{x})}{\partial y_i} = \left[\mathbf{x}^{(m)T}(X^T X)^{-1} X^T\right]_i$$
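The formulas above can be checked numerically. Below is a minimal sketch in Python (standard library only; the slides use MATLAB elsewhere), with hypothetical helper names `solve`, `fit`, and `sensitivity`. It solves the normal equations by Gaussian elimination and computes $\partial\hat{y}/\partial y_i$ without forming $(X^TX)^{-1}$: because $X^TX$ is symmetric, $\mathbf{x}^{(m)T}(X^TX)^{-1}X^T$ is the transpose of $X\mathbf{z}$, where $(X^TX)\mathbf{z} = \mathbf{x}^{(m)}$.

```python
# Sketch: least-squares surrogate y_hat(x) = sum_i beta_i * xi_i(x).
# Helper names are hypothetical; basis functions xi_i are passed as callables.

def solve(A, rhs):
    """Solve the square system A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def design_matrix(basis, xs):
    """Row j is x^(m) evaluated at data point j."""
    return [[xi(x) for xi in basis] for x in xs]


def normal_matrix(X):
    """X^T X."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]


def fit(basis, xs, ys):
    """Regression coefficients b from X^T X b = X^T y."""
    X = design_matrix(basis, xs)
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(len(basis))]
    return solve(normal_matrix(X), Xty)


def sensitivity(basis, xs, x):
    """d y_hat(x) / d y_i = [x^(m)T (X^T X)^{-1} X^T]_i, computed as X z with (X^T X) z = x^(m)."""
    X = design_matrix(basis, xs)
    z = solve(normal_matrix(X), [xi(x) for xi in basis])
    return [sum(a * b for a, b in zip(row, z)) for row in X]


# Usage: straight-line basis xi_1 = 1, xi_2 = x with the data of Example 3.2.1
linear = [lambda x: 1.0, lambda x: x]
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.5, -1.5, 0.0, 1.25, 1.75]
b = fit(linear, xs, ys)
dydy = sensitivity(linear, xs, 3.0)
```

Because $\hat{y}$ is linear in $\mathbf{y}$, the sensitivity vector depends only on the data locations and the prediction point, not on the measured values.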
Example 3.2.1
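• Given data

x    -2     -1     0     1      2
y    -1.5   -1.5   0     1.25   1.75

• Linear fit, $\mathbf{x}^{(m)} = [1\;\; x]^T$
$$X = \begin{bmatrix}1 & -2\\ 1 & -1\\ 1 & 0\\ 1 & 1\\ 1 & 2\end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix}-1.5\\ -1.5\\ 0\\ 1.25\\ 1.75\end{bmatrix}$$
$$X^T X = \begin{bmatrix}5 & 0\\ 0 & 10\end{bmatrix}, \qquad (X^T X)^{-1} = \begin{bmatrix}0.2 & 0\\ 0 & 0.1\end{bmatrix}$$
$$(X^T X)^{-1} X^T = \begin{bmatrix}0.2 & 0.2 & 0.2 & 0.2 & 0.2\\ -0.2 & -0.1 & 0 & 0.1 & 0.2\end{bmatrix}$$
$$\frac{\partial \hat{y}}{\partial \mathbf{y}} = \mathbf{x}^{(m)T}(X^T X)^{-1} X^T = 0.1\,\begin{bmatrix}2-2x & 2-x & 2 & 2+x & 2+2x\end{bmatrix}$$
$$\hat{y} = \mathbf{x}^{(m)T}(X^T X)^{-1} X^T \mathbf{y} = 0.925\,x$$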
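The example's numbers can be reproduced with exact rational arithmetic. This is a minimal Python sketch using the standard `fractions` module; the variable names and the helper `dyhat_dy` are illustrative, not from the slides. It builds $X^TX$ for the basis $[1, x]$, inverts the 2×2 matrix directly, and evaluates $\mathbf{b}$ and the sensitivity row.

```python
from fractions import Fraction as F

# Example 3.2.1 data, as exact fractions
xs = [F(-2), F(-1), F(0), F(1), F(2)]
ys = [F(-3, 2), F(-3, 2), F(0), F(5, 4), F(7, 4)]
n = len(xs)

# X^T X for the straight-line basis [1, x]
sx = sum(xs)
sxx = sum(x * x for x in xs)
XtX = [[F(n), sx], [sx, sxx]]

# Explicit inverse of the 2x2 matrix
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[XtX[1][1] / det, -XtX[0][1] / det],
       [-XtX[1][0] / det, XtX[0][0] / det]]

# (X^T X)^{-1} X^T: column j belongs to data point j, whose design row is [1, x_j]
A = [[inv[i][0] + inv[i][1] * x for x in xs] for i in range(2)]

# Regression coefficients b = (X^T X)^{-1} X^T y
b = [sum(a * y for a, y in zip(row, ys)) for row in A]


def dyhat_dy(x):
    """Sensitivity row x^(m)T (X^T X)^{-1} X^T at the point x."""
    return [A[0][j] + x * A[1][j] for j in range(n)]


print(XtX, b, dyhat_dy(F(3)))
```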
Prediction variance with variable noise
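• Prediction variance based on the standard noise assumptions (independent errors with the same variance $\sigma^2$)
$$\mathrm{Var}[\hat{y}(\mathbf{x})] = \sigma^2\,\mathbf{x}^{(m)T}(X^T X)^{-1}\mathbf{x}^{(m)}$$
• Variance of surrogate prediction, for independent noise in $y_i$ with variance $\sigma_i^2$
$$\mathrm{Var}[\hat{y}(\mathbf{x})] = \sum_i \left(\frac{\partial \hat{y}(\mathbf{x})}{\partial y_i}\right)^2 \sigma_i^2$$
• Allows different variances.
• Note that with different variances it is better to use weighted least squares.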
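A minimal Python sketch of the two ideas on this slide: the sensitivity-based prediction variance, and a weighted least-squares straight-line fit with weights $w_i = 1/\sigma_i^2$ (the standard WLS choice; the slide does not specify the weighting). The function names `prediction_variance` and `wls_line` are hypothetical, and the unequal variances in the usage lines are made up for illustration.

```python
# Sketch: prediction variance with point-wise noise variances, and a
# weighted least-squares (WLS) straight-line fit. Helper names are hypothetical.

def prediction_variance(sens, sigma2):
    """Var[y_hat(x)] = sum_i (d y_hat / d y_i)^2 * sigma_i^2, assuming independent noise."""
    return sum(s * s * v for s, v in zip(sens, sigma2))


def wls_line(xs, ys, sigma2):
    """Fit y_hat = b0 + b1*x from X^T W X b = X^T W y with W = diag(1/sigma_i^2)."""
    w = [1.0 / v for v in sigma2]
    s0 = sum(w)
    s1 = sum(wi * x for wi, x in zip(w, xs))
    s2 = sum(wi * x * x for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = s0 * s2 - s1 * s1  # determinant of the 2x2 matrix X^T W X
    b0 = (s2 * t0 - s1 * t1) / det
    b1 = (s0 * t1 - s1 * t0) / det
    return b0, b1


# Example 3.2.1 data; with equal variances WLS reduces to ordinary least squares
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.5, -1.5, 0.0, 1.25, 1.75]
b_ols = wls_line(xs, ys, [1.0] * 5)
b_wls = wls_line(xs, ys, [1.0, 1.0, 1.0, 1.0, 4.0])  # hypothetical: noisier last point
print(b_ols, b_wls)
```

Down-weighting a noisy point reduces its influence on the coefficients, which is why WLS is preferred when the $\sigma_i$ differ.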
Comparison for example at x=3
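• Prediction variance (surprisingly small, why?)
$$\mathrm{Var}[\hat{y}(\mathbf{x})] = \sigma^2\,\mathbf{x}^{(m)T}(X^T X)^{-1}\mathbf{x}^{(m)} = \sigma^2 \begin{bmatrix}1 & x\end{bmatrix}\begin{bmatrix}0.2 & 0\\ 0 & 0.1\end{bmatrix}\begin{bmatrix}1\\ x\end{bmatrix} = \sigma^2\,(0.2 + 0.1x^2) = 1.1\,\sigma^2 \quad (x = 3)$$
• Variance of prediction
$$\frac{\partial \hat{y}}{\partial \mathbf{y}} = 0.1\,\begin{bmatrix}2-2x & 2-x & 2 & 2+x & 2+2x\end{bmatrix} = 0.1\,\begin{bmatrix}-4 & -1 & 2 & 5 & 8\end{bmatrix}$$
$$\mathrm{Var}[\hat{y}(3)] = 0.01\,\left(16\sigma_1^2 + \sigma_2^2 + 4\sigma_3^2 + 25\sigma_4^2 + 64\sigma_5^2\right)$$
• If all data variances are the same, check that you get the same $1.1\,\sigma^2$.
• If not, the variance of $y_5$ is the most important.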
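A short Python check of this comparison at $x = 3$, using the sensitivities from the slide. The function names are hypothetical. With equal variances both routes give $1.1\,\sigma^2$, and the share of the variance coming from each $y_i$ shows why $y_5$ dominates: its coefficient is 64 out of a total of 110, about 58%.

```python
# Example 3.2.1 at x = 3. Sensitivities taken from the slide:
# d y_hat / d y = 0.1 * [2 - 2x, 2 - x, 2, 2 + x, 2 + 2x]

def sens(x):
    return [0.1 * (2 - 2 * x), 0.1 * (2 - x), 0.1 * 2, 0.1 * (2 + x), 0.1 * (2 + 2 * x)]


def var_equal(x, sigma2):
    """sigma^2 * x^(m)T (X^T X)^{-1} x^(m) = sigma^2 * (0.2 + 0.1 x^2) for this data."""
    return sigma2 * (0.2 + 0.1 * x * x)


def var_general(x, sigma2s):
    """Sum of squared sensitivities times point-wise variances (independent noise)."""
    return sum(s * s * v for s, v in zip(sens(x), sigma2s))


x = 3.0
weights = [s * s for s in sens(x)]            # coefficients of sigma_i^2: 0.01*[16, 1, 4, 25, 64]
share = [w / sum(weights) for w in weights]   # fraction of the variance from each y_i (equal sigmas)
print(var_equal(x, 1.0), var_general(x, [1.0] * 5), share)
```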
Bootstrapping
• When we calculate statistics from random data, bootstrapping can provide error estimates.
• If we had multiple samples, we could use them to estimate the error in the computation.
• With bootstrapping we perform the amazing feat of getting the error from a single sample.
• This is done by resampling the same data with replacement.
• We draw samples from the original data without removing the drawn values, so that the new sample may have repetitions.
• We repeat for many bootstrap samples to get a distribution
of the statistic of interest.
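The resampling procedure in these bullets can be sketched in a few lines of Python (standard library only; the slides use MATLAB's bootstrp, described below). The function name `bootstrap` and its return layout are assumptions for illustration: MATLAB's bootsam is n-by-nboot with 1-based indices, whereas here each bootstrap sample's 0-based indices are stored as one list.

```python
import random
import statistics


def bootstrap(data, stat, nboot, rng):
    """Draw nboot bootstrap samples (n draws with replacement from the n data points).

    Returns (bootstat, bootsam): the statistic of each sample and the
    0-based indices drawn for it."""
    n = len(data)
    bootstat, bootsam = [], []
    for _ in range(nboot):
        idx = [rng.randrange(n) for _ in range(n)]  # repetitions are allowed
        bootsam.append(idx)
        bootstat.append(stat([data[i] for i in idx]))
    return bootstat, bootsam


# Usage with the ten values of x from the example below; the seed is arbitrary
x = [0.5377, 1.8339, -2.2588, 0.8622, 0.3188,
     -1.3077, -0.4336, 0.3426, 3.5784, 2.7694]
bootstat, bootsam = bootstrap(x, statistics.mean, 1000, random.Random(0))
spread = statistics.stdev(bootstat)  # bootstrap estimate of the std of the mean
print(statistics.mean(bootstat), spread)
```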
Example with sample mean
x=randn(1,10)
x = 0.5377 1.8339 -2.2588 0.8622 0.3188 -1.3077 -0.4336 0.3426 3.5784 2.7694
[bootstat,bootsam]=bootstrp(1000,@mean,x);
bootsam(:,1:5)
ans =
     1     2     5     2     5
     1     8     1    10    10
     6     4     3     1     2
     8     6    10     8     3
    10     2     2     9     2
     2     7     9     9     2
     6     3     6     1     9
     5     7    10     4     6
     1     7     1     3     6
     4     8     5     9     2
Each column contains the indices of one bootstrap sample. For example, the last column indicates that we drew x(2)=1.8339 four times, x(6) twice, and x(3), x(5), x(9), and x(10) once each.
What is the probability of
getting no repetitions?
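One way to answer the question: a bootstrap sample has no repetitions only if its n indices are a permutation of 1..n, so the probability is $n!/n^n$. For n = 10 this is 3628800/10^10 ≈ 3.6 × 10^-4, so almost every bootstrap sample of size 10 contains repeats. The Python sketch below (hypothetical function names) computes the exact value and adds a Monte Carlo cross-check for a small n.

```python
import math
import random


def p_no_repeats(n):
    """Exact probability that n draws with replacement from n items are all distinct."""
    return math.factorial(n) / n ** n


def p_no_repeats_mc(n, trials, rng):
    """Monte Carlo check: fraction of simulated bootstrap index draws with no repeats."""
    hits = 0
    for _ in range(trials):
        idx = [rng.randrange(n) for _ in range(n)]
        hits += len(set(idx)) == n
    return hits / trials


print(p_no_repeats(10))
print(p_no_repeats_mc(3, 20000, random.Random(0)))  # compare with 3!/3^3 = 2/9
```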
Matlab bootstrp routine
• bootstat = bootstrp(nboot,bootfun,d1,...) draws nboot
bootstrap data samples, computes statistics on each
sample using bootfun, and returns the results in the
matrix bootstat. bootfun is a function handle specified
with @. Each row of bootstat contains the results of
applying bootfun to one bootstrap sample.
• [bootstat,bootsam] = bootstrp(...) returns an n-by-nboot matrix of bootstrap indices, bootsam. Each column in bootsam contains indices of the values that were drawn from the original data sets to constitute the corresponding bootstrap sample.
Statistics for sample mean
mean(x) = 0.6243
mean(bootstat) = 0.6091
std(x) = 1.7699
std(bootstat) = 0.5191
• In this case we know that the standard deviation of the mean is the standard deviation of x divided by the square root of the sample size; with the sample estimate this is 1.7699/√10 ≈ 0.56.
$$\mathrm{Var}(\bar{x}) = \mathrm{Var}(x)/n$$
• In other cases we may not have a formula, and we may use bootstrapping instead, for example to estimate the accuracy of a probability.
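A sketch of this comparison in Python (standard library; the seed and nboot = 20000 are arbitrary choices). It also shows why std(bootstat) = 0.5191 sits a little below 0.56: resampling treats the sample as the population, so as nboot grows the bootstrap standard deviation of the mean tends to the plug-in value $\sqrt{\tfrac{1}{n}\sum_i (x_i-\bar{x})^2}\,/\sqrt{n}$, which is $\sqrt{(n-1)/n} \approx 0.949$ times $s/\sqrt{n}$, about 0.53 here.

```python
import math
import random
import statistics

# The ten values of x from the slide
x = [0.5377, 1.8339, -2.2588, 0.8622, 0.3188,
     -1.3077, -0.4336, 0.3426, 3.5784, 2.7694]
n = len(x)

rng = random.Random(0)  # arbitrary seed for reproducibility
bootstat = [statistics.mean(rng.choices(x, k=n)) for _ in range(20000)]

se_boot = statistics.stdev(bootstat)             # bootstrap estimate of the std of the mean
se_formula = statistics.stdev(x) / math.sqrt(n)  # s / sqrt(n), about 0.56
se_ideal = statistics.pstdev(x) / math.sqrt(n)   # limit of se_boot as nboot grows
print(se_boot, se_formula, se_ideal)
```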
Sample standard deviation
[bootstat,bootsam]=bootstrp(10000,@std,x);
mean(bootstat)=1.6387
std(bootstat)=0.3415
• Check the ratio std/mean against direct sampling (10,000 samples of size 10 from N(0,1)):
a=randn(10,10000);
s=std(a);
mean(s) = 0.9728
std(s)=0.2302
• Bootstrap ratio is 0.3415/1.6387 ≈ 0.208; the ratio from direct sampling is 0.2302/0.9728 ≈ 0.237.
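For normally distributed data the "actual ratio" can also be computed exactly rather than by simulation. A sketch, assuming normal data: $E[s] = c_4(n)\,\sigma$ with $c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$, and since $E[s^2] = \sigma^2$, $\mathrm{std}(s) = \sigma\sqrt{1-c_4^2}$. For n = 10 this gives $c_4 \approx 0.973$ (compare mean(s) = 0.9728) and a ratio of about 0.239, close to the simulated 0.237. The function names are hypothetical.

```python
import math


def c4(n):
    """E[s] / sigma for a normal sample of size n (s uses the n-1 divisor)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def std_over_mean_of_s(n):
    """std(s) / E[s] for normal data: sqrt(1 - c4^2) / c4."""
    c = c4(n)
    return math.sqrt(1.0 - c * c) / c


print(c4(10), std_over_mean_of_s(10))
```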
Exercise
• The variables x and y are normally distributed
with N(0,1) marginal distributions and a
correlation coefficient of 0.7.
1. Generate a sample of 10 pairs and use bootstrap
to estimate the accuracy of the correlation
coefficient you obtain from the sample.
2. Compare to the accuracy you can get from a
formula or by repeating step 1 many times.
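One possible sketch of the exercise in Python (standard library only). Correlated pairs are generated as $y = \rho x + \sqrt{1-\rho^2}\,z$ with independent N(0,1) variables x and z, and the bootstrap resamples whole (x, y) pairs. For step 2 it offers two comparisons: brute-force repetition of step 1, and an approximate formula from Fisher's z-transformation ($\operatorname{atanh} r$ roughly normal with standard deviation $1/\sqrt{n-3}$), mapped back to r with the delta method. All helper names, the seed, and the replication counts are illustrative assumptions.

```python
import math
import random
import statistics


def pearson(pairs):
    """Sample correlation coefficient; returns nan if a resample has zero spread."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(n, rho, rng):
    """n pairs with N(0,1) marginals and correlation rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        pairs.append((x, rho * x + math.sqrt(1.0 - rho * rho) * z))
    return pairs


def bootstrap_sd(pairs, nboot, rng):
    """Std of r over bootstrap resamples of whole (x, y) pairs."""
    rs = [pearson(rng.choices(pairs, k=len(pairs))) for _ in range(nboot)]
    return statistics.stdev([r for r in rs if not math.isnan(r)])


def repeated_sd(n, rho, reps, rng):
    """Std of r over many independent samples (step 2 by brute force)."""
    return statistics.stdev([pearson(correlated_pairs(n, rho, rng)) for _ in range(reps)])


def fisher_sd(r, n):
    """Approximate std of r: delta method applied to atanh(r) with std 1/sqrt(n-3)."""
    return (1.0 - r * r) / math.sqrt(n - 3)


rng = random.Random(1)  # arbitrary seed
sample = correlated_pairs(10, 0.7, rng)
r = pearson(sample)
print(r, bootstrap_sd(sample, 2000, rng), fisher_sd(r, 10), repeated_sd(10, 0.7, 5000, rng))
```

With only 10 pairs the bootstrap estimate itself varies noticeably from sample to sample, which is worth observing when comparing it with the other two estimates.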