Simple Method for Outlier Detection in Fitting
On Sequential Experimental Design
for Empirical Model-Building
under Interval Error
Sergei Zhilin,
[email protected]
Altai State University,
Barnaul, Russia
Outline
• Regression under interval error
• Experimental design: refining context
• Classical and “interval” design optimality criteria
• Sequential experimental design for regression
models under interval error
• Comparative simulation study of classical and
“interval” sequential design procedures
• Conclusions
Regression under Interval Error
• Model structure: $y = \beta^T x + \xi$
– Input variables $x = (x_1, \ldots, x_p)^T$, measured without error
– Linear-parameterized modeling function $\beta^T x$
– Model parameters $\beta = (\beta_1, \ldots, \beta_p)^T$, to be estimated
– Output variable $y$, measured with error
– Measurement error $\xi$
• “Interval” error means “unknown but bounded”: $\xi \in [-\varepsilon, \varepsilon]$
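• Each row $(x_j, y_j, \varepsilon_j)$ of the measurements table constrains possible values of the parameter $\beta$ with the set
$S_j = \{\beta \in \mathbb{R}^p : y_j - \varepsilon_j \le x_j^T \beta \le y_j + \varepsilon_j\}, \quad j = 1, \ldots, n.$
• Values of the parameter $\beta$ consistent with all constraints form an uncertainty set
$A = \bigcap_{j=1}^{n} S_j$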
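The definition of the uncertainty set can be checked directly: a parameter vector belongs to A exactly when it satisfies every row constraint. The sketch below is only an illustration. The function name `in_uncertainty_set` and the data are hypothetical, chosen for the straight-line model so that each row of X is (1, x).

```python
def in_uncertainty_set(beta, X, Y, E, tol=1e-12):
    """True if y_j - eps_j <= x_j^T beta <= y_j + eps_j holds for every row j."""
    for x, y, eps in zip(X, Y, E):
        prediction = sum(x_i * b_i for x_i, b_i in zip(x, beta))
        if prediction < y - eps - tol or prediction > y + eps + tol:
            return False
    return True


# Hypothetical measurements table for y = beta_1 + beta_2 * x (rows are (1, x)).
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
Y = [1.1, 2.9, 5.2]
E = [0.4, 0.4, 0.4]

print(in_uncertainty_set((1.0, 2.0), X, Y, E))  # consistent with all three rows
print(in_uncertainty_set((2.0, 2.0), X, Y, E))  # violates the first row
```

Since each S_j is a strip between two parallel hyperplanes, A is a convex polytope whenever it is bounded.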
• Fitting data with the model $y = \beta_1 + \beta_2 x$
[Figure: two panels. In the (x, y) domain: the set of feasible models. In the $(\beta_1, \beta_2)$ domain: the uncertainty set A.]
• Uncertainty set A is unbounded = not enough data to build the model
• Problems that may be stated with respect
to uncertainty set A
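– Model parameters estimation
  • Interval estimates of $\beta$:
  $I_A = [\underline{\beta}_1, \overline{\beta}_1] \times \ldots \times [\underline{\beta}_p, \overline{\beta}_p]$, where $\underline{\beta}_i = \min_{\beta \in A} \beta_i$, $\overline{\beta}_i = \max_{\beta \in A} \beta_i$, $i = 1, \ldots, p.$
  • Point estimates of $\beta$:
  $\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_p)$: $\hat{\beta}_i = \tfrac{1}{2}(\underline{\beta}_i + \overline{\beta}_i)$, $i = 1, \ldots, p.$
[Figure: uncertainty set A in the $(\beta_1, \beta_2)$ domain with its bounding box $I_A$ and the point estimate $\hat{\beta}$]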
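For two parameters the interval and point estimates can be computed without an LP solver, because a linear function on a bounded polygon attains its extremes at vertices. The sketch below is a minimal illustration under that assumption (p = 2, A bounded). The helper names `vertices_2d` and `interval_estimates` and the data are hypothetical, and for larger p a linear programming solver would be the usual tool.

```python
from itertools import combinations


def vertices_2d(X, Y, E, tol=1e-9):
    """Vertices of A = {b : y_j - e_j <= x_j . b <= y_j + e_j} for p = 2 (A assumed bounded)."""
    # Every row contributes two boundary lines: x_j . b = y_j - e_j and x_j . b = y_j + e_j.
    lines = [(x, y + s * e) for x, y, e in zip(X, Y, E) for s in (-1.0, 1.0)]
    verts = []
    for (u, c), (v, d) in combinations(lines, 2):
        det = u[0] * v[1] - u[1] * v[0]
        if abs(det) < tol:  # parallel boundaries never meet
            continue
        # Cramer's rule for the 2x2 system u . b = c, v . b = d
        b = ((c * v[1] - u[1] * d) / det, (u[0] * d - c * v[0]) / det)
        if all(y - e - tol <= x[0] * b[0] + x[1] * b[1] <= y + e + tol
               for x, y, e in zip(X, Y, E)):
            verts.append(b)
    return verts


def interval_estimates(X, Y, E):
    """Bounding box I_A of A and its centre (the point estimate)."""
    verts = vertices_2d(X, Y, E)
    lower = tuple(min(v[i] for v in verts) for i in range(2))
    upper = tuple(max(v[i] for v in verts) for i in range(2))
    point = tuple(0.5 * (lo + hi) for lo, hi in zip(lower, upper))
    return lower, upper, point


# Hypothetical data: three measurements with error bound 0.4 each.
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
Y = [1.0, 2.0, 3.0]
E = [0.4, 0.4, 0.4]
lower, upper, point = interval_estimates(X, Y, E)
print(lower, upper, point)
```

In this example A is a hexagon, while $I_A$ is the smallest axis-aligned box that contains it, so the box is in general a conservative description of A.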
• Problems that may be stated with respect
to uncertainty set A
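– Prediction of the output variable value for fixed values of input variables
  • Interval estimate of $y$:
  $\mathbf{y}(x) = [\underline{y}(x), \overline{y}(x)]$, where $\underline{y}(x) = \min_{\beta \in A} \beta^T x$, $\overline{y}(x) = \max_{\beta \in A} \beta^T x$
  • Point estimate of $y$:
  $\hat{y}(x) = \tfrac{1}{2}(\underline{y}(x) + \overline{y}(x))$
[Figure: in the (x, y) domain, the bounds $\underline{y}(x)$, $\overline{y}(x)$ and the point estimate $\hat{y}(x)$ at a fixed x]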
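When the vertices of a bounded uncertainty set are available, the prediction interval follows by evaluating $\beta^T x$ at each vertex, since a linear function reaches its minimum and maximum over a polytope at vertices. The sketch below assumes the vertex list is given. The function name `predict_interval` and the square-shaped A are hypothetical.

```python
def predict_interval(vertices, x):
    """Lower bound, upper bound and midpoint of beta^T x over the polytope with the given vertices."""
    values = [sum(b_i * x_i for b_i, x_i in zip(beta, x)) for beta in vertices]
    lower, upper = min(values), max(values)
    return lower, upper, 0.5 * (lower + upper)


# Hypothetical uncertainty set: the box [0.6, 1.4] x [1.6, 2.4] in the (beta_1, beta_2) domain.
vertices = [(0.6, 1.6), (1.4, 1.6), (1.4, 2.4), (0.6, 2.4)]

# Model y = beta_1 + beta_2 * x evaluated at x = 0.5, so the input vector is (1, 0.5).
lower, upper, point = predict_interval(vertices, (1.0, 0.5))
print(lower, upper, point)
```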
Experimental Design:
Refining Context
• Product or process optimization
• Model quality optimization
– Simultaneous experimental design:
Begin → Design for N observations → Experiment → Analysis → End
– Sequential experimental design:
Begin → Design for ~1 observation → Experiment → Analysis (Is the model quality satisfactory?) → End; if not, return to the design step
Experimental Design for
Regression under Interval Error
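• Notations
$y = x^T \beta + \xi$ – model
$x \in \chi \subseteq \mathbb{R}^p$, $\chi$ – design space
$X = (x_1, \ldots, x_n)^T$ – design matrix (rows $x_1^T, \ldots, x_n^T$)
$Y = (y_1, \ldots, y_n)^T$ – measurements
$E = (\varepsilon_1, \ldots, \varepsilon_n)^T$ – error bounds
$M = X^T X$ – information matrix
$D = M^{-1}$ – covariance matrix
$d(x) = x^T D x$ – standardized variance function of $y(x, \beta)$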
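The three matrix quantities above are easy to compute by hand for p = 2. The sketch below is a minimal pure-Python illustration. The function names and the design matrix are hypothetical, and the closed-form 2x2 inverse stands in for a general linear algebra routine.

```python
def information_matrix(X):
    """M = X^T X for a design matrix with rows x_j = (x_j1, x_j2)."""
    m11 = sum(x[0] * x[0] for x in X)
    m12 = sum(x[0] * x[1] for x in X)
    m22 = sum(x[1] * x[1] for x in X)
    return ((m11, m12), (m12, m22))


def covariance_matrix(M):
    """D = M^{-1} via the closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("information matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def variance_function(D, x):
    """d(x) = x^T D x."""
    return sum(x[i] * D[i][j] * x[j] for i in range(2) for j in range(2))


# Hypothetical design with three points.
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
M = information_matrix(X)
D = covariance_matrix(M)
print(M)
print(variance_function(D, (1.0, 1.0)), variance_function(D, (1.0, -1.0)))
```

In this example d(x) is three times larger along (1, -1) than along (1, 1), which reflects that the design contains no point in the (1, -1) direction.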
• Design optimality criteria
– Classical
Name            Minimizes
D-optimality    $\det D$ (volume of the joint confidence region)
G-optimality    $\max_{x \in \chi} d(x)$ (maximal variance of prediction)
Here $D = (X^T X)^{-1}$ and $d(x) = x^T D x$.
Both criteria depend only on X, hence are applicable for interval error as well
– Interval (by M.P. Dyvak)
Name            Minimizes
ID-optimality   squared volume of A
IE-optimality   squared maximal diagonal of A
IG-optimality   maximal prediction error
IE- and IG-optimality are equivalent for spherical design space and n > p
• Motivation
– Classical methods of experimental design use only the information that X brings, and neither Y nor E
– Interval methods of experimental design developed by Dyvak work for saturated designs (p = n) and use X and E, but not Y
– Does using the information that Y contains improve the quality of the constructed model or increase the “speed” of the sequential experimental design procedure?
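• How to use the information which Y brings?
$x_{next} = \mathrm{IEDesign}(\chi, X, Y, E)$
1. Find the direction $a$ of maximal spread of the uncertainty set $A(X, Y, E)$:
$\{\beta^{1*}, \beta^{2*}\} = \arg\max_{\beta^1, \beta^2 \in A} \|\beta^1 - \beta^2\|^2$, with $a$ directed along the segment joining $\beta^{1*}$ and $\beta^{2*}$
2. The next experimental point $x_{next}$ is selected in such a way that it
• induces the constraint orthogonal to $a$
• has maximal norm (the width of the constraint is $w = 2\varepsilon / \|x_{next}\|$)
$x_{next} = k^* a, \quad k^* = \max_{k \in \mathbb{R},\ ka \in \chi} |k|$
[Figure: uncertainty set $A(X, Y, E)$ in the $(\beta_1, \beta_2)$ domain with the points $\beta^{1*}$, $\beta^{2*}$, the direction $a$, and the new constraint of width $w$]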
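The two steps above can be sketched for the case where the vertices of A are already known and the design space is the unit ball. Both are assumptions made here for illustration, as is the function name `ie_design_step`. Because the squared distance is convex, its maximum over a polytope is reached at a pair of vertices, so a brute-force search over vertex pairs is enough for small problems.

```python
import math
from itertools import combinations


def ie_design_step(vertices, eps):
    """Return (x_next, w) for the unit-ball design space, given the vertices of A."""
    # Step 1: longest chord of A, found by brute force over vertex pairs.
    v1, v2 = max(combinations(vertices, 2),
                 key=lambda pair: math.dist(pair[0], pair[1]))
    a = [p - q for p, q in zip(v1, v2)]
    norm_a = math.sqrt(sum(c * c for c in a))
    # Step 2: x_next = k* a with the largest |k| such that ||k a|| <= 1.
    k_star = 1.0 / norm_a
    x_next = tuple(k_star * c for c in a)
    # Width of the strip y - eps <= x_next^T beta <= y + eps measured along a.
    width = 2.0 * eps / math.sqrt(sum(c * c for c in x_next))
    return x_next, width


# Hypothetical uncertainty set: a parallelogram in the (beta_1, beta_2) domain.
vertices = [(0.0, 0.0), (4.0, 0.0), (6.0, 2.0), (2.0, 2.0)]
x_next, width = ie_design_step(vertices, eps=0.4)
print(x_next, width)
```

The sign of `x_next` is irrelevant here, since x and -x induce the same pair of bounding hyperplanes up to the sign of the measured y.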
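• IE-optimal sequential design
$(X_0, Y_0, E_0)$ – initial dataset
i = 0;
repeat
    x = IEDesign($\chi$, $X_i$, $Y_i$, $E_i$);
    y = measurement at x with error bound $\varepsilon$;
    $X_{i+1} = \begin{pmatrix} X_i \\ x^T \end{pmatrix}$; $Y_{i+1} = \begin{pmatrix} Y_i \\ y \end{pmatrix}$; $E_{i+1} = \begin{pmatrix} E_i \\ \varepsilon \end{pmatrix}$;
    i = i + 1;
until i > N or $I_A(X_i, Y_i, E_i)$ is small;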
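The loop above can be tried out end to end for p = 2. The sketch below is an illustration under several assumptions that are not part of the slides: the design space is the unit disc, measurements are error-free, A is handled by brute-force vertex enumeration, and the loop runs for a fixed number of steps instead of testing whether $I_A$ is small. All function names are hypothetical.

```python
import math
from itertools import combinations


def vertices_2d(X, Y, E, tol=1e-9):
    """Vertices of A = {b : y_j - e_j <= x_j . b <= y_j + e_j} for p = 2 (A assumed bounded)."""
    lines = [(x, y + s * e) for x, y, e in zip(X, Y, E) for s in (-1.0, 1.0)]
    verts = []
    for (u, c), (v, d) in combinations(lines, 2):
        det = u[0] * v[1] - u[1] * v[0]
        if abs(det) < tol:  # parallel boundaries never meet
            continue
        b = ((c * v[1] - u[1] * d) / det, (u[0] * d - c * v[0]) / det)
        if all(y - e - tol <= x[0] * b[0] + x[1] * b[1] <= y + e + tol
               for x, y, e in zip(X, Y, E)):
            verts.append(b)
    return verts


def ie_design(X, Y, E):
    """Unit-length point along the longest chord of A (design space: unit disc)."""
    verts = vertices_2d(X, Y, E)
    v1, v2 = max(combinations(verts, 2),
                 key=lambda pair: math.dist(pair[0], pair[1]))
    a = (v1[0] - v2[0], v1[1] - v2[1])
    norm = math.hypot(a[0], a[1])
    return (a[0] / norm, a[1] / norm)


def box_volume(X, Y, E):
    """Volume (area) of the bounding box I_A of the uncertainty set."""
    verts = vertices_2d(X, Y, E)
    widths = [max(v[i] for v in verts) - min(v[i] for v in verts) for i in range(2)]
    return widths[0] * widths[1]


def sequential_ie_design(beta, eps, X0, n_steps):
    """Run n_steps of the sequential procedure with error-free measurements."""
    X = list(X0)
    Y = [beta[0] * x[0] + beta[1] * x[1] for x in X]
    E = [eps] * len(X)
    volumes = [box_volume(X, Y, E)]
    for _ in range(n_steps):
        x = ie_design(X, Y, E)
        y = beta[0] * x[0] + beta[1] * x[1]  # simulated measurement with zero error
        X.append(x)
        Y.append(y)
        E.append(eps)
        volumes.append(box_volume(X, Y, E))
    return X, volumes


# Initial points, parameter and error bound as printed in simulation study 1.
X0 = [(0.26, 0.61), (0.59, 0.24), (0.49, 0.31)]
X, volumes = sequential_ie_design((1.0, 2.0), 0.4, X0, n_steps=10)
print([round(v, 3) for v in volumes])
```

Every new observation only adds a constraint, so the printed bounding-box volumes cannot increase from one step to the next.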
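• Simulation study 1. Comparison of IE- and D-optimal sequential designs under zero errors
$\chi = \{x \in \mathbb{R}^2 : x^T x \le 1\}$, $\beta = (1, 2)^T$, $\varepsilon = 0.4$, $X_0 = \begin{pmatrix} 0.26 & 0.61 \\ 0.59 & 0.24 \\ 0.49 & 0.31 \end{pmatrix}$
IE-optimal sequential design
i = 0
repeat
    $Y_i = X_i \beta$
    $x_{next}$ = IEDesign($\chi$, $X_i$, $Y_i$, $\varepsilon$)
    $X_{i+1} = \begin{pmatrix} X_i \\ x_{next}^T \end{pmatrix}$
    i = i + 1
until i > 9
D-optimal sequential design
i = 0
repeat
    $x_{next}$ = DDesign($\chi$, $X_i$)
    $X_{i+1} = \begin{pmatrix} X_i \\ x_{next}^T \end{pmatrix}$
    i = i + 1
until i > 9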
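The slides do not spell out DDesign. A common reading, assumed here, is that each step picks the point of the design space that maximizes the determinant of the updated information matrix. Since $\det(M + x x^T) = \det M \,(1 + x^T M^{-1} x)$, on the unit disc this is a unit eigenvector of M for its smallest eigenvalue. The sketch below implements that reading for p = 2. The function names are hypothetical.

```python
import math


def information_matrix(X):
    """Entries (m11, m12, m22) of the symmetric matrix M = X^T X."""
    m11 = sum(x[0] * x[0] for x in X)
    m12 = sum(x[0] * x[1] for x in X)
    m22 = sum(x[1] * x[1] for x in X)
    return m11, m12, m22


def det_after(X, x):
    """Determinant of the information matrix after adding the point x."""
    m11, m12, m22 = information_matrix(list(X) + [x])
    return m11 * m22 - m12 * m12


def d_design_unit_disc(X):
    """Point of the unit disc that maximizes det(M + x x^T), assuming M is nonsingular."""
    m11, m12, m22 = information_matrix(X)
    # Smallest eigenvalue of the symmetric 2x2 matrix M.
    lam_min = 0.5 * (m11 + m22) - math.hypot(0.5 * (m11 - m22), m12)
    # An eigenvector for lam_min solves (M - lam_min I) v = 0.
    v = (m12, lam_min - m11)
    if math.hypot(v[0], v[1]) < 1e-12:
        v = (lam_min - m22, m12)
    if math.hypot(v[0], v[1]) < 1e-12:
        v = (1.0, 0.0)  # M is a multiple of the identity: every direction is optimal
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


# Initial design as printed in simulation study 1.
X0 = [(0.26, 0.61), (0.59, 0.24), (0.49, 0.31)]
x_next = d_design_unit_disc(X0)
print(x_next, det_after(X0, x_next))
```

Unlike IEDesign, this step looks only at X, which is why the D-optimal procedure in the study never uses the measured values.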
• Simulation study 1. D-optimal sequential design results
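[Figure: D-optimal design points in the variables domain, numbered 1 to 10 and labeled in four groups (1, 5, 9; 2, 6, 10; 3, 7; 4, 8), and the resulting uncertainty set A in the parameters domain]
Volume(A) = 0.6400 = $4\varepsilon^2$
$I_A = [0.45, 1.55] \times [1.45, 2.55]$
Volume($I_A$) = 1.21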
• Simulation study 1. IE-optimal sequential design results
[Figure: IE-optimal design points in the variables domain and the resulting uncertainty set A in the parameters domain]
Volume(A) = 0.5077
$I_A = [0.59, 1.41] \times [1.60, 2.40]$
Volume($I_A$) = 0.66
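• Simulation study 2. Comparison of IE- and D-optimal sequential designs under error which follows a truncated normal distribution
$\chi = \{x \in \mathbb{R}^d : x^T x \le 1\}$, $\beta = (1, 2)^T$, $\varepsilon = 0.4$,
$X_0$ = {3 uniformly distributed points from $\chi$}
Errors are simulated by $N_T(\varepsilon)$ – truncated normal distribution
[Figure: density of $N_T(\varepsilon)$; the truncation bound is marked as $3\sigma$]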
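One simple way to simulate such errors is rejection sampling: draw from a zero-mean normal distribution and keep only values inside the bound. The sketch below assumes the truncation point equals three standard deviations, which is how the figure label is read here. The function name `truncated_normal` and the default for `sigma` are hypothetical.

```python
import random


def truncated_normal(eps, rng, sigma=None):
    """Draw from N(0, sigma^2) conditioned on [-eps, eps] by rejection sampling."""
    if sigma is None:
        sigma = eps / 3.0  # assumption: the bound eps corresponds to 3 sigma
    while True:
        value = rng.gauss(0.0, sigma)
        if -eps <= value <= eps:
            return value


rng = random.Random(0)  # fixed seed so the run is reproducible
errors = [truncated_normal(0.4, rng) for _ in range(1000)]
print(max(abs(e) for e in errors) <= 0.4)
```

With the bound at three standard deviations only about 0.3% of the draws are rejected, so the loop almost always finishes on its first pass.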
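Simulation study 2
k = 0;
for r = 1 to 1500 do
    i = 0; $\Xi_0$ = {3 random values from $N_T(\varepsilon)$};
    $X_0$ = {3 uniformly distributed points from $\chi$}; $Y_0 = X_0 \beta + \Xi_0$;
    $X_0^D = X_0$; $Y_0^D = Y_0$;
    $X_0^I = X_0$; $Y_0^I = Y_0$;
    repeat
        $\xi$ = random value from $N_T(\varepsilon)$;
        $x^I$ = IEDesign($\chi$, $X_i^I$, $Y_i^I$, $\varepsilon$);
        $y^I = \beta^T x^I + \xi$;
        $X_{i+1}^I = \begin{pmatrix} X_i^I \\ (x^I)^T \end{pmatrix}$; $Y_{i+1}^I = \begin{pmatrix} Y_i^I \\ y^I \end{pmatrix}$;
        $x^D$ = DDesign($\chi$, $X_i^D$);
        $y^D = \beta^T x^D + \xi$;
        $X_{i+1}^D = \begin{pmatrix} X_i^D \\ (x^D)^T \end{pmatrix}$; $Y_{i+1}^D = \begin{pmatrix} Y_i^D \\ y^D \end{pmatrix}$;
        i = i + 1;
    until i > N
    if Volume($I_A(X_N^I, Y_N^I, \varepsilon)$) < Volume($I_A(X_N^D, Y_N^D, \varepsilon)$) then k = k + 1;
end for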
• Simulation study 2. Results for $\chi = \{x \in \mathbb{R}^2 : x^T x \le 1\}$
[Figure: number of winnings k, (1500 – k), on a scale from 0 to 1500 (0% to 100%), versus the number of selected points N (0 to 25); series: IE-Design, D-Design]
• Simulation study 2. Results for $\chi = \{x \in \mathbb{R}^3 : x^T x \le 1\}$
[Figure: number of winnings k, (1500 – k), on a scale from 0 to 1500 (0% to 100%), versus the number of selected points N (0 to 25); series: IE-Design, D-Design]
• The “cost” of IE-optimal design
– The problem of finding the maximal spread direction of A
$\{\beta^{1*}, \beta^{2*}\} = \arg\max_{\beta^1, \beta^2 \in A} \|\beta^1 - \beta^2\|^2$
is a concave quadratic programming problem (CQPP)
– CQPP is proved to be NP-hard, so the solving time of known exact methods grows exponentially with the problem dimension (the number of input variables p)
– To overcome these difficulties we need to use special computational means (such as parallel computers), or we can limit ourselves to near-optimal solutions
Conclusions
• The interval model of error makes it possible to use the information about measured values of the output variable for effective sequential experimental design
• The results of the simulation study call for a careful analytical investigation of the properties of IE-optimal sequential design procedures
• IE-optimal sequential design for high-dimensional models demands special computational techniques