Prediction variance


Sampling plans

• Given a domain, we can reduce the prediction error by a good choice of the sampling points
• The choice of sampling locations is called "design of experiments" (DOE)
• The simplest DOE is the full factorial design, where we sample each variable (factor) at a fixed number of values (levels)
• Example: with four factors and three levels each we will sample $3^4 = 81$ points
• Full factorial design is not practical except in low dimensions
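As a minimal sketch of the idea (plain Python; the helper name `full_factorial` is mine, not a library routine), a full factorial design is just the Cartesian product of the levels:

```python
from itertools import product

def full_factorial(levels, n_factors):
    """All combinations of the given levels over n_factors variables."""
    return list(product(levels, repeat=n_factors))

# Four factors at three levels each: 3^4 = 81 sample points
points = full_factorial([-1.0, 0.0, 1.0], 4)
print(len(points))  # 81
```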

Prediction variance for full factorial design

• Recall that the standard error (the square root of the prediction variance) is
  $s_{\hat y} = \hat\sigma \sqrt{\mathbf{x}^T (X^T X)^{-1} \mathbf{x}}$
• We start with the simplest design domain: a box
• The cheapest full factorial design uses two levels (not good for quadratic polynomials)
• For a linear polynomial in $n$ variables (so $2^n$ points) the standard error is then
  $s_{\hat y} = \hat\sigma \sqrt{\dfrac{1 + x_1^2 + x_2^2 + \cdots + x_n^2}{2^n}}$
• Maximum error at the vertices:
  $s_{\hat y} = \hat\sigma \sqrt{\dfrac{n+1}{2^n}}$
• Why do we get this result?
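The boxed result can be checked numerically. A sketch in Python/NumPy, assuming the box domain $[-1,1]^n$ and $\hat\sigma = 1$ (the helper `std_err` is mine):

```python
import numpy as np
from itertools import product

n = 3                                            # number of factors
# Two-level full factorial: the 2^n vertices of the box [-1, 1]^n
pts = np.array(list(product([-1.0, 1.0], repeat=n)))
X = np.hstack([np.ones((len(pts), 1)), pts])     # linear model: 1, x1, ..., xn

XtX_inv = np.linalg.inv(X.T @ X)                 # here X^T X = 2^n * I

def std_err(x, sigma=1.0):
    """Standard error of the prediction at x (sigma = noise std)."""
    xv = np.concatenate([[1.0], x])
    return sigma * np.sqrt(xv @ XtX_inv @ xv)

# Maximum error at a vertex matches sigma * sqrt((n+1)/2^n)
print(std_err(np.ones(n)), np.sqrt((n + 1) / 2**n))
```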

Designs for linear RS

• Traditionally, only two levels are used
• A design is orthogonal when $X^T X$ is diagonal
• The full factorial design is orthogonal; it is not so easy to produce other orthogonal designs with fewer points
• Stability: small variation of the prediction variance over the domain is also a desirable property

Example 4.2.2

• Compare an orthogonal array based on an equilateral triangle, with vertices $(0,\sqrt{2})$, $(\sqrt{3/2},-1/\sqrt{2})$, $(-\sqrt{3/2},-1/\sqrt{2})$, to a right triangle at the box vertices $(-1,-1)$, $(1,-1)$, $(-1,1)$ (both designs are saturated)
• Linear polynomial $y = b_1 + b_2 x_1 + b_3 x_2$
• For the equilateral triangle
  $X = \begin{pmatrix} 1 & \sqrt{3/2} & -1/\sqrt{2} \\ 1 & -\sqrt{3/2} & -1/\sqrt{2} \\ 1 & 0 & \sqrt{2} \end{pmatrix}, \qquad X^T X = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}$
• For the right triangle we obtained
  $X^T X = \begin{pmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{pmatrix}$, which is not diagonal

[Figure: the two triangular designs plotted inside the box $[-1,1]^2$]

Comparison

• Prediction variance for the equilateral triangle, with $\mathbf{x}^T = (1, x_1, x_2)$:
  $\mathbf{x}^T (X^T X)^{-1} \mathbf{x} = \tfrac{1}{3}\left(1 + x_1^2 + x_2^2\right)$
• The maximum variance, at $(1,1)$, is three times larger than the lowest one
• For the right triangle
  $\mathbf{x}^T (X^T X)^{-1} \mathbf{x} = 0.5\left(1 + x_1 + x_2 + x_1^2 + x_1 x_2 + x_2^2\right)$
• Maximum variance is nine times the lowest
• A fairer comparison is when we restrict the equilateral triangle to lie inside the box (top vertex at $(0,1)$); the prediction variance is then doubled
• Maximum error and stability are still better, but the variance in the coefficients is not as good

Quadratic RS

• Need at least $(n+1)(n+2)/2$ points
• Need at least three points in every direction
• The simplest DOE is the three-level full factorial design
• Impractical for $n>5$
• Also an unreasonable ratio between the number of points and the number of coefficients
• For example, for $n=8$ we get $3^8 = 6561$ samples for 45 coefficients

My rule of thumb is that you want twice as many points as coefficients
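The counts above are easy to tabulate; a small Python sketch (the helper `n_coeffs` is mine):

```python
def n_coeffs(n):
    """Coefficients of a full quadratic polynomial in n variables."""
    return (n + 1) * (n + 2) // 2

for n in (2, 5, 8):
    print(n, 3**n, n_coeffs(n), 2 * n_coeffs(n))
# For n=8: 6561 three-level full factorial points for only 45 coefficients;
# the rule of thumb above would ask for about 90 points instead.
```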

Central Composite Design

• Includes the $2^n$ vertices, $2n$ face points, plus $n_c$ repetitions of the central point
• The distance $\alpha$ of the face points varies
• Can choose $\alpha$ so as to
  – achieve a spherical design
  – achieve rotatability (prediction variance is a spherical function)
  – stay in the box (face-centered CCD, FCCCD)
• Still impractical for $n>8$
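The three point groups can be sketched directly; a Python/NumPy illustration (the function name `ccd` is mine, not the MATLAB `ccdesign` routine used later):

```python
import numpy as np
from itertools import product

def ccd(n, alpha, n_center):
    """Central composite design points: 2^n box vertices, 2n axial points
    at distance alpha, and n_center copies of the center point."""
    vertices = np.array(list(product([-1.0, 1.0], repeat=n)))
    axial = np.zeros((2 * n, n))
    for i in range(n):
        axial[2 * i, i] = -alpha
        axial[2 * i + 1, i] = alpha
    center = np.zeros((n_center, n))
    return np.vstack([vertices, axial, center])

# Spherical CCD for n=2: alpha = sqrt(2) puts the axial points on the
# same sphere as the vertices
d = ccd(2, np.sqrt(2.0), 1)
print(d.shape)  # (9, 2): 4 vertices + 4 axial points + 1 center point
```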

From Myers and Montgomery’s Response Surface Methodology. Figure 7.4 in 1995 edition (Fig. 7.5 on next slide)

Spherical CCD

Spherical CCD for n=3

Repeated observations at origin

• Unlike linear designs, the prediction variance is high at the origin
• Repetition at the origin decreases the variance there and improves stability
• What other rationale is there for choosing the origin for repetition?
• Repetition also gives an independent measure of the magnitude of the noise
• It can also be used for lack-of-fit tests

Without repetition (9 points)

Contours of prediction variance for spherical CCD design.

From Myers and Montgomery’s Response Surface Methodology. Figure 7.10 in 1995 edition (Fig. 7.11 on next slide)


Center repeated 5 times (13 points)

d = ccdesign(2,'center','uniform')
d =
   -1.0000   -1.0000
   -1.0000    1.0000
    1.0000   -1.0000
    1.0000    1.0000
   -1.4142         0
    1.4142         0
         0   -1.4142
         0    1.4142
         0         0
         0         0
         0         0
         0         0
         0         0

Variance optimal designs

• Full factorial and CCD designs are not flexible in the number of points
• Standard error in the coefficients:
  $se(b_i) = \hat\sigma \sqrt{\left[(X^T X)^{-1}\right]_{ii}}$
• A key to most optimal DOE methods is the moment matrix
  $M = \dfrac{X^T X}{n_y}$
• A good design of experiments will maximize the terms in this matrix, especially the diagonal elements
• D-optimal designs maximize the determinant of the moment matrix
• The determinant is inversely proportional to the square of the volume of the confidence region on the coefficients
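The D-criterion is a one-liner; a NumPy sketch (the helper `d_criterion` is mine), evaluated on the two saturated designs of Example 4.2.2:

```python
import numpy as np

def d_criterion(X):
    """Determinant of the moment matrix M = X^T X / n_y (larger is better)."""
    return np.linalg.det(X.T @ X / X.shape[0])

# The two saturated designs of Example 4.2.2, model y = b1 + b2*x1 + b3*x2
equi = np.array([[1.0,  np.sqrt(1.5), -1/np.sqrt(2)],
                 [1.0, -np.sqrt(1.5), -1/np.sqrt(2)],
                 [1.0,  0.0,           np.sqrt(2)]])
right = np.array([[1.0, -1.0, -1.0],
                  [1.0,  1.0, -1.0],
                  [1.0, -1.0,  1.0]])
print(d_criterion(equi), d_criterion(right))  # the equilateral triangle wins
```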

Example

• Given the model $y = b_1 x_1 + b_2 x_2$ and the two data points $(0,0)$ and $(1,0)$, find the optimum third data point $(p,q)$ in the unit square
• We have
  $X = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ p & q \end{pmatrix}, \qquad X^T X = \begin{pmatrix} 1+p^2 & pq \\ pq & q^2 \end{pmatrix}, \qquad \det(X^T X) = q^2$
• So the third point is $(p,1)$, for any value of $p$
• Finding a D-optimal design in higher dimensions is a difficult optimization problem, often solved heuristically
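The determinant result is easy to confirm numerically; a NumPy sketch (the helper `det_XtX` is mine):

```python
import numpy as np

def det_XtX(p, q):
    """det(X^T X) for data points (0,0), (1,0), (p,q), model y = b1*x1 + b2*x2."""
    X = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [p,   q]])
    return np.linalg.det(X.T @ X)

# The determinant equals q^2, independent of p, so any (p, 1) is D-optimal
print(det_XtX(0.2, 1.0), det_XtX(0.9, 1.0), det_XtX(0.5, 0.5))
```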

Matlab example

>> ny=6; nbeta=6;
>> [dce,x]=cordexch(2,ny,'quadratic');
>> dce'
ans =
     1     1    -1    -1     0     1
    -1     1     1    -1    -1     0
>> scatter(dce(:,1),dce(:,2),200,'filled')
>> det(x'*x)/ny^nbeta
ans =
    0.0055


With 12 points:

>> ny=12;
>> [dce,x]=cordexch(2,ny,'quadratic');
>> dce'
ans =
    -1     1    -1     0     1     0     1    -1     1     0    -1     1
     1    -1    -1    -1     1     1    -1    -1     0     0     0     1
>> scatter(dce(:,1),dce(:,2),200,'filled')
>> det(x'*x)/ny^nbeta
ans =
    0.0102


Other criteria

• A-optimal designs minimize the trace of the inverse of the moment matrix, i.e., the sum of the variances of the coefficients
• G-optimal designs minimize the maximum of the prediction variance

Example

• For the previous example, find the A-optimal design
• We have
  $X = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ p & q \end{pmatrix}, \qquad X^T X = \begin{pmatrix} 1+p^2 & pq \\ pq & q^2 \end{pmatrix}, \qquad \det(X^T X) = q^2$
• $(X^T X)^{-1} = \dfrac{1}{q^2}\begin{pmatrix} q^2 & -pq \\ -pq & 1+p^2 \end{pmatrix}, \qquad \mathrm{trace}\left[(X^T X)^{-1}\right] = 1 + \dfrac{1+p^2}{q^2}$
• Minimum at $(0,1)$
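The A-optimal point can also be found by brute force; a NumPy sketch (the helper `a_criterion` and the grid are mine):

```python
import numpy as np

def a_criterion(p, q):
    """trace of (X^T X)^{-1} for points (0,0), (1,0), (p,q);
    analytically this is 1 + (1 + p^2)/q^2."""
    X = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [p,   q]])
    return np.trace(np.linalg.inv(X.T @ X))

# Grid search over the unit square (q > 0 keeps X^T X invertible)
candidates = [(p, q) for p in np.linspace(0.0, 1.0, 21)
                     for q in np.linspace(0.05, 1.0, 20)]
best = min(candidates, key=lambda pq: a_criterion(*pq))
print(best, a_criterion(*best))  # (0.0, 1.0) with trace 2
```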