Uncertainty in Computer Models


Gaussian process modelling
Outline
Emulators
The basic GP emulator
Practical matters
Emulators
Simulator, meta-model, emulator
I’ll refer to a computer model as a simulator
It aims to simulate some real-world phenomenon
A meta-model is a simplified representation or
approximation of a simulator
Built using a training set of simulator runs
Importantly, it should run much more quickly than the simulator
itself
So it serves as a quick surrogate for the simulator, for any task
that would require many simulator runs
An emulator is a particular kind of meta-model
More than just an approximation, it makes fully probabilistic
predictions of what the simulator would produce
And those probability statements correctly reflect the training
information
Meta-models
Various kinds of meta-models have been
proposed by modellers and model users
Notably regression models and neural networks
But these can misrepresent the training data
The fitted line does not pass through the points
The variance around the line also has the wrong form
Emulation
Desirable properties for a meta-model
If asked to predict the simulator output at one of the training data
points, it returns the observed output with zero variance
Assuming the simulator output doesn’t have random noise
So it must be sufficiently flexible to pass through all the training
data points
Not restricted to some regression form
If asked to predict output at any other point, its predictions will have
non-zero variance, reflecting realistic uncertainty
Given enough training data it should be able to predict simulator
output to any desired accuracy
These properties characterise what we call an emulator
2 code runs
Consider one input and one output
Emulator estimate interpolates data
Emulator uncertainty grows between data points
[Figure "dat2": emulator mean interpolating the 2 training points, with uncertainty bands widening between them; output plotted against input x]
3 code runs
Adding another point changes estimate and
reduces uncertainty
[Figure "dat3": emulator mean through the 3 training points, with visibly reduced uncertainty bands; output plotted against input x]
5 code runs
And so on
[Figure "dat5": emulator fitted to 5 training runs; output plotted against input x]
The basic GP emulator
Gaussian processes
A Gaussian process (GP) is a probability distribution for
an unknown function
A kind of infinite-dimensional multivariate normal distribution
If a function f(x) has a GP distribution we write
f(.) ~ GP(m(.), c(.,.))
m(.) is the mean function
c(.,.) is the covariance function
f(x) has a normal distribution with mean m(x) and variance c(x,x)
c(x,x') is the covariance between f(x) and f(x')
A GP emulator represents the simulator as a GP
Conditional on some unknown parameters
Estimated from the training data
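To make the definition concrete, here is a minimal sketch (NumPy only) of drawing realisations f(.) from a GP prior; the squared-exponential covariance and the parameter values are illustrative assumptions, not GEM-SA's internal choices.

```python
# Draw sample functions from GP(m, c) with zero mean function and a
# squared-exponential covariance (illustrative assumption).
import numpy as np

def sq_exp_cov(x1, x2, variance=1.0, corr_length=1.0):
    """c(x, x') = sigma^2 * exp(-((x - x') / delta)^2)."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-(diff / corr_length) ** 2)

x = np.linspace(0.0, 5.0, 200)
m = np.zeros_like(x)              # mean function m(x) = 0
C = sq_exp_cov(x, x)              # covariance matrix [c(x_i, x_j)]

# Each draw is one realisation of the unknown function; the jitter
# keeps the covariance matrix numerically positive definite.
rng = np.random.default_rng(0)
realisations = rng.multivariate_normal(m, C + 1e-10 * np.eye(len(x)), size=3)
```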
The mean function
The emulator’s mean function provides the central
estimate for predicting the model output f(x)
It has two parts
1. A conventional regression component
r(x) = μ + β1h1(x) + β2h2(x) + … + βphp(x)
The regression terms hj(x) are a modelling choice
– Should reflect how we expect the simulator to respond to its inputs
– E.g. r(x) = μ + β1x1 + β2x2 + … + βpxp models a general linear trend
The coefficients μ and βj are estimated from the training data
2. A smooth interpolator of the residuals yi – r(xi) at the training points
Smoothness is controlled by correlation length parameters
Also estimated from the training data
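Putting the two parts together, the emulator's mean takes the standard form (sketched here in the usual notation, see e.g. O'Hagan 2006):
m*(x) = h(x)ᵀβ̂ + t(x)ᵀA⁻¹(y − Hβ̂)
where h(x) = (1, h1(x), …, hp(x))ᵀ collects the regression terms, β̂ holds the estimated coefficients, y is the vector of training outputs, H has rows h(xi)ᵀ at the training inputs, A is the matrix of covariances c(xi, xj) between training points, and t(x) is the vector of covariances c(x, xi). The first term is the regression component r(x); the second smoothly interpolates the residuals.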
The mean function – example
[Figure: two panels. Left: training output y against input x; red dots are the training data, the green line is the regression line, the black line is the emulator mean. Right: residuals against x; red dots are the residuals from the regression at the training points, the black line is the smoothed residuals.]
The prediction variance
The variance of f(x) depends on where x is
relative to training data
At a training data point, it is zero
Moving away from a training point, it grows
Growth depends on correlation lengths
When far from any training point (relative to
correlation lengths), it resolves into two components
1. The usual regression variance
2. An interpolator variance
– Estimated from the observed variance of the residuals
The mean function is then just the regression part
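In the same notation as the mean function above, the standard result (ignoring the extra uncertainty from estimating the parameters) is
v*(x) = c(x, x) − t(x)ᵀA⁻¹t(x) + g(x)ᵀ(HᵀA⁻¹H)⁻¹g(x), where g(x) = h(x) − HᵀA⁻¹t(x)
At a training point xi, t(xi) is the i-th column of A, so the first two terms cancel and g(xi) = 0, giving zero variance. Far from all training points t(x) ≈ 0, leaving the interpolator variance c(x, x) plus the regression variance h(x)ᵀ(HᵀA⁻¹H)⁻¹h(x), the two components listed above.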
Correlation length
Correlation length parameters are crucial
But difficult to estimate
There is one correlation length for each input
Points less than one correlation length away in a single input are highly
correlated
Learning f(x') says a lot about f(x)
So if x' is a training point, the predictive uncertainty about f(x) is small
But if we go more than about two correlation lengths away, the
correlation is minimal
We now ignore f(x') when predicting f(x)
Just use regression
Large correlation length signifies an input with very smooth and
predictable effect on simulator output
Small correlation length denotes an input with a more variable,
fine-scale influence on the output
Correlation length and variance
[Figure: examples of GP realisations for different parameter values.] GEM-SA uses a roughness parameter b, which is the inverse square of the correlation length; σ² is the interpolation variance.
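Written out, the Gaussian covariance function consistent with this parametrisation is
c(x, x′) = σ² exp( − Σj bj (xj − xj′)² )
where bj = 1/δj² is the roughness of input j and δj is its correlation length: large δj (small bj) gives a smooth, slowly varying response in that input; small δj (large bj) gives a fine-scale, rapidly varying one.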
Practical matters
Modelling
The main modelling decision is to choose the regression
terms hj(x)
Want to capture the broad shape of the response of the
simulator to its inputs
Then residuals are small
Emulator predicts f(x) with small variance
And predicts realistically for x far from training data
If we get it wrong
Residuals will be unnecessarily large
Emulator has unnecessarily large variance when interpolating
And extrapolates wrongly
Design
Another choice is the set of training data points
This is a kind of experimental design problem
We want points spread over the part of the input space
for which the emulator is needed
So that no prediction is too far from a training point
We want this to be true also when we project the points
into lower dimensions
So that prediction points are not too far from training points in
dimensions (inputs) with small correlation lengths
We also want some points closer to each other
To estimate correlation lengths better
Conventional designs don’t take account of this yet!
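As an illustrative sketch (not GEM-SA's own design facility), a Latin hypercube gives a simple space-filling design whose points remain well spread in every one-dimensional projection; this assumes SciPy 1.7 or later.

```python
# A space-filling training design via a Latin hypercube (SciPy >= 1.7
# assumed; GEM-SA offers its own design options, this is illustrative).
from scipy.stats import qmc

d = 3                                    # number of simulator inputs
n = 30                                   # number of training runs
sampler = qmc.LatinHypercube(d=d, seed=1)
design = sampler.random(n)               # n points in the unit cube [0, 1]^d

# Rescale to the region of input space where the emulator is needed
# (bounds here are made up for the example).
l_bounds = [0.0, 10.0, -1.0]
u_bounds = [5.0, 20.0, 1.0]
X_train = qmc.scale(design, l_bounds, u_bounds)
```

Note that a plain Latin hypercube does not supply the close-together pairs mentioned above; one pragmatic workaround is to add a few extra runs jittered near existing design points.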
Validation
No emulator is perfect
The GP emulator is based on assumptions
A particular form of covariance function parametrised by just one
correlation length parameter per input
Homogeneity of variance and correlation structure
Simulators rarely behave this nicely!
Getting the regression component right
Normality
Not usually a big issue
Estimating parameters accurately from the training data
Can be a problem for correlation lengths
Failure of these assumptions will mean the emulator
does not predict faithfully
f(x) will too often lie outside the range of its predictive distribution
So we need to apply suitable diagnostic checks
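One simple diagnostic is to compare held-out simulator runs with the emulator's predictive distribution via standardised errors. The sketch below assumes a hypothetical predict function returning predictive means and variances; it is not a GEM-SA API.

```python
# Standardised prediction errors at held-out simulator runs.
# `predict` is a hypothetical emulator interface (an assumption for
# this sketch) returning predictive means and variances.
import numpy as np

def standardised_errors(predict, X_valid, y_valid):
    """Return (y - predictive mean) / predictive sd per validation run."""
    mean, var = predict(X_valid)
    return (y_valid - mean) / np.sqrt(var)

# If the emulator predicts faithfully, these should look roughly like
# standard normal draws; many values outside about +/-2, or a
# systematic pattern, suggests one of the assumptions has failed.
```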
When to use GP emulation
The simulator output should vary smoothly in response
to changing its inputs
Discontinuities are difficult to emulate
Very rapid and erratic responses to inputs may also need
unreasonably many training data points
The simulator is computer intensive
So it’s not practical to run many thousands of times for Monte
Carlo methods
But not so expensive that we can't run it a few hundred times to
build a good emulator
Not too many inputs
Fitting the emulator is hard
Particularly if more than a few inputs influence the output
strongly
Stochastic simulators
Throughout this course we are assuming the
simulator is deterministic
Running it again at the same inputs will produce the
same outputs
If there is random noise in the outputs we can
modify the emulation theory
Mean function doesn’t have to pass through the data
Noise increases predictive variance
The benefits of the GP emulator are less compelling
But we are working on this!
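A standard modification is to add a "nugget" (noise) term to the covariance: c(x, x′) becomes c(x, x′) + τ²·1[x = x′], where τ² is the noise variance. The training covariance matrix A then becomes A + τ²I, so the mean function no longer interpolates the data exactly, and τ² is added to every prediction variance.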
References
1. O'Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290-1300.
2. Santner, T. J., Williams, B. J. and Notz, W. I. (2003). The Design and Analysis of Computer Experiments. New York: Springer.
3. Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.