Uncertainty and Sensitivity Analysis of Complex Computer Codes


Uncertainty and Sensitivity Analysis of Complex Computer Codes
John Paul Gosling
University of Sheffield
Slide 1
Outline
• Uncertainty and computer models
• Uncertainty analysis (UA)
• Sensitivity analysis (SA)
Slide 2
Why worry about uncertainty?
• How accurate are model predictions?
• There is increasing concern about uncertainty in model outputs
  - Particularly where model predictions are used to inform scientific debate or environmental policy
  - Are model predictions robust enough for high-stakes decision-making?
Slide 3
For instance…
• Models for climate change produce different predictions for the extent of global warming or other consequences
  - Which ones should we believe?
  - What error bounds should we put around these?
  - Are model differences consistent with the error bounds?
• Until we can answer such questions convincingly, decision makers can continue to dismiss our results
Slide 4
Where is the uncertainty?
• Several principal sources of uncertainty:
  - Accuracy of parameters in model equations
  - Accuracy of data inputs
  - Accuracy of the model in representing the real phenomenon, even with accurate values for parameters and data
• In this section, we will be concerned with the first two
Slide 5
Inputs
• We will interpret “inputs” widely:
  - Initial conditions
  - Other data defining the particular context being simulated
  - Forcing data (e.g. rainfall in hydrology models)
  - Parameters in model equations
    (These are often hard-wired, which is a problem!)
Slide 6
Input uncertainty
• We are typically uncertain about the values of many of the inputs:
  - Measurement error
  - Lack of knowledge
  - Parameters with no real physical meaning
• However, we must have beliefs about the parameters.
• The elicitation of these beliefs must be done carefully, as there will often be no data to support or contradict them.
Slide 7
Output Uncertainty
• Input uncertainty induces uncertainty in the output y
• The output y therefore also has a probability distribution
• In theory, this distribution is completely determined by
  - the probability distribution on the inputs x
  - and the model f
• In practice, finding this distribution and its properties is not straightforward
Slide 8
A trivial model
• Suppose we have just two inputs and a simple linear model
  y = x1 + 3*x2
• Suppose that x1 and x2 have independent uniform distributions over [0, 1]
  - i.e. they define a point that is equally likely to be anywhere in the unit square
• Then we can determine the distribution of y exactly
Slide 9
A trivial model – y’s distribution
[Figure: probability density of y, plotted over the range 0 to 4]
• The distribution of y has this trapezium form (see the sketch below)
Slide 10
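
The trapezium arises because the density of y is the convolution of a U(0, 1) density with a U(0, 3) density. As a minimal sketch, not part of the original slides and assuming NumPy and Matplotlib are available, the exact density can be checked against a Monte Carlo sample:

    import numpy as np
    import matplotlib.pyplot as plt

    def exact_density(t):
        """Density of y = x1 + 3*x2 for independent x1, x2 ~ U(0, 1):
        the convolution of U(0, 1) with U(0, 3), a trapezium on [0, 4]."""
        t = np.asarray(t, dtype=float)
        return np.clip(np.minimum(1.0, t) - np.maximum(0.0, t - 3.0), 0.0, None) / 3.0

    rng = np.random.default_rng(1)
    x1, x2 = rng.uniform(size=(2, 100_000))
    y = x1 + 3 * x2

    grid = np.linspace(0.0, 4.0, 401)
    plt.hist(y, bins=80, density=True, alpha=0.4, label="Monte Carlo")
    plt.plot(grid, exact_density(grid), "r", label="exact trapezium density")
    plt.xlabel("y")
    plt.ylabel("density")
    plt.legend()
    plt.show()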
A trivial model – y’s distribution
[Figure: probability density of y, plotted over the range -1 to 5]
• If x1 and x2 instead have normal distributions (x1, x2 ~ N(0.5, 0.25²)), we get a normal output distribution
Slide 11
A slightly less trivial model
• Now consider the simple nonlinear model
  y = sin(x1)/{1 + exp(x1 + x2)}
• We still have only two inputs and quite a simple equation
• But even for nice input distributions, we cannot get the output distribution exactly
• The simplest way to compute it would be by Monte Carlo
Slide 12
Monte Carlo output distribution
• This is for the normal inputs
• 10,000 random normal pairs were generated and y calculated for each pair
Slide 13
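
A sketch of this Monte Carlo uncertainty analysis in Python (not from the slides; it assumes NumPy and Matplotlib and uses the N(0.5, 0.25²) input distributions described earlier):

    import numpy as np
    import matplotlib.pyplot as plt

    def model(x1, x2):
        """The simple nonlinear model y = sin(x1) / (1 + exp(x1 + x2))."""
        return np.sin(x1) / (1.0 + np.exp(x1 + x2))

    rng = np.random.default_rng(42)
    x1 = rng.normal(0.5, 0.25, size=10_000)   # x1 ~ N(0.5, 0.25^2)
    x2 = rng.normal(0.5, 0.25, size=10_000)   # x2 ~ N(0.5, 0.25^2)
    y = model(x1, x2)

    plt.hist(y, bins=60, density=True)
    plt.xlabel("y")
    plt.ylabel("density")
    plt.show()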
Uncertainty analysis (UA)
• The process of characterising the distribution of the output y is called uncertainty analysis
• Plotting the distribution is a good graphical way to characterise it
• Quantitative summaries are often more important:
  - Mean, median
  - Standard deviation, quartiles
  - Probability intervals
Slide 14
UA of slightly nonlinear model
• Mean = 0.117, median = 0.122
• Std. dev. = 0.049
• 50% range (quartiles) = [0.093, 0.148]
• 95% range = [0.002, 0.200]
Slide 15
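
These summaries can be read straight off a Monte Carlo sample. A minimal sketch follows; the exact numbers will vary slightly with the random seed and sample size:

    import numpy as np

    rng = np.random.default_rng(42)
    x1, x2 = rng.normal(0.5, 0.25, size=(2, 10_000))
    y = np.sin(x1) / (1.0 + np.exp(x1 + x2))

    print("mean      =", y.mean())
    print("median    =", np.median(y))
    print("std. dev. =", y.std(ddof=1))
    print("50% range =", np.percentile(y, [25, 75]))
    print("95% range =", np.percentile(y, [2.5, 97.5]))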
UA versus plug-in
• Even if we just want to estimate y, UA does better than the “plug-in” approach of running the model at estimated values of x
  - For the simple nonlinear model, the central estimates of x1 and x2 are 0.5, but
    sin(0.5)/(1 + exp(1)) = 0.129
    is slightly too high an estimate of y compared with the mean of 0.117 or the median of 0.122
• The difference can be much more marked for highly nonlinear models
Slide 16
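
The plug-in calculation is just a single model run at the central input estimates, for example:

    import numpy as np

    # Plug-in estimate: evaluate the model once at the central values (0.5, 0.5).
    # This gives about 0.129, noticeably above the UA mean of about 0.117.
    print(np.sin(0.5) / (1.0 + np.exp(0.5 + 0.5)))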
Summary
Why UA?
• Proper quantification of output uncertainty
  - Needs proper probabilistic expression of input uncertainty
• Improved central estimate of output
  - Better than the usual plug-in approach
Slide 17
Which inputs affect output most?
• This is a common question
• Sensitivity analysis (SA) attempts to address it
• There are various forms of SA
• The methods most frequently used are not the most helpful!
Slide 18
Local sensitivity analysis
• To measure the sensitivity of y to input xi, compute the derivative of y with respect to xi
• Nonlinear model:
  - At x1 = x2 = 0.5, the derivatives are 0.142 with respect to x1 and –0.094 with respect to x2
• What does this tell us?
Slide 19
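
A sketch of the local sensitivity calculation, using the analytic partial derivatives of y = sin(x1)/(1 + exp(x1 + x2)) and checking them with finite differences (NumPy assumed):

    import numpy as np

    def model(x1, x2):
        return np.sin(x1) / (1.0 + np.exp(x1 + x2))

    def gradient(x1, x2):
        """Analytic partial derivatives of y = sin(x1)/(1 + exp(x1 + x2))."""
        d = 1.0 + np.exp(x1 + x2)
        dy_dx1 = np.cos(x1) / d - np.sin(x1) * np.exp(x1 + x2) / d**2
        dy_dx2 = -np.sin(x1) * np.exp(x1 + x2) / d**2
        return dy_dx1, dy_dx2

    x1 = x2 = 0.5
    print(gradient(x1, x2))   # approximately (0.142, -0.094), as on the slide

    # Central finite differences as an independent check of the algebra.
    h = 1e-6
    print((model(x1 + h, x2) - model(x1 - h, x2)) / (2 * h),
          (model(x1, x2 + h) - model(x1, x2 - h)) / (2 * h))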
Local SA – deficiencies
• Derivatives are evaluated at the central estimate
  - Could be quite different at other points nearby
• Doesn’t capture interactions between inputs
  - E.g. the sensitivity of y to increasing both x1 and x2 could be greater or less than the sum of their individual sensitivities
• Not invariant to a change of units
Slide 20
One-way SA
• Vary inputs one at a time from the central estimate
• Nonlinear model:
  - Vary x1 to 0.25 and 0.75: the output is 0.079 and 0.152
  - Vary x2 to 0.25 and 0.75: the output is 0.154 and 0.107
• Is this really a good idea?
Slide 21
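
A sketch of the one-way analysis, holding one input at its central estimate of 0.5 while the other is moved to 0.25 and 0.75; it reproduces the four values quoted on the slide:

    import numpy as np

    def model(x1, x2):
        return np.sin(x1) / (1.0 + np.exp(x1 + x2))

    # Hold one input at 0.5 and move the other to 0.25 and then 0.75.
    for value in (0.25, 0.75):
        print(f"x1 = {value}: y = {model(value, 0.5):.3f}")   # 0.079 and 0.152
        print(f"x2 = {value}: y = {model(0.5, value):.3f}")   # 0.154 and 0.107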
One-way SA – deficiencies
• Depends on how far we vary each input
  - Relative sensitivities of different inputs change if we change the ranges
• Also fails to capture interactions
  - Statisticians have known for decades that varying factors one at a time is bad experimental design!
Slide 22
Multi-way SA
• Vary factors two or more at a time
  - Perhaps using a statistical factorial design
  - Full factorial designs require very many runs
• Can find interactions, but hard to interpret
  - Often we just look for the biggest change of output among all runs
• Still dependent on how far we vary each input
Slide 23
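
A sketch of a small full factorial design for the nonlinear model, using three levels per input (the levels here are illustrative, not taken from the slides):

    from itertools import product
    import numpy as np

    def model(x1, x2):
        return np.sin(x1) / (1.0 + np.exp(x1 + x2))

    # Three levels per input gives a 3 x 3 = 9-run full factorial design.
    levels = (0.25, 0.5, 0.75)
    runs = [(x1, x2, model(x1, x2)) for x1, x2 in product(levels, levels)]
    for x1, x2, y in runs:
        print(f"x1 = {x1:.2f}, x2 = {x2:.2f} -> y = {y:.3f}")

    # The crude "biggest change of output" summary mentioned above.
    ys = [y for _, _, y in runs]
    print("largest change in output:", max(ys) - min(ys))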
Probabilistic SA (PSA)
• Inputs are varied according to their probability distributions
  - As in UA
• Gives an overall picture and can identify interactions
Slide 24
Variance decomposition
• One way to characterise the sensitivity of the output to individual inputs is to compute how much of the UA variance is due to each input
• For the simple nonlinear model, we have (see the sketch below):

  Input               Contribution
  x1                  80.30 %
  x2                  16.77 %
  x1·x2 interaction    2.93 %
Slide 25
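
These percentages are variance-based sensitivity indices: the main-effect contribution of an input xi is Var(E[y | xi]) / Var(y), with the remainder attributed to interaction. GEM-SA computes them with an emulator; the brute-force Monte Carlo sketch below (slow but transparent, NumPy assumed) should give broadly similar figures:

    import numpy as np

    def model(x1, x2):
        return np.sin(x1) / (1.0 + np.exp(x1 + x2))

    rng = np.random.default_rng(0)

    # Total output variance from a plain Monte Carlo sample (inputs ~ N(0.5, 0.25^2)).
    total_var = model(rng.normal(0.5, 0.25, 100_000),
                      rng.normal(0.5, 0.25, 100_000)).var()

    # E[y | x_i] on a sample of x_i values, averaging over fresh draws of the other input.
    outer = rng.normal(0.5, 0.25, size=(2_000, 1))
    inner = rng.normal(0.5, 0.25, size=(1, 2_000))
    cond_mean_x1 = model(outer, inner).mean(axis=1)   # E[y | x1]
    cond_mean_x2 = model(inner, outer).mean(axis=1)   # E[y | x2]

    s1 = cond_mean_x1.var() / total_var               # main-effect share of x1
    s2 = cond_mean_x2.var() / total_var               # main-effect share of x2
    print(f"x1 main effect: {100 * s1:.1f}%")          # x1 dominates, as in the table
    print(f"x2 main effect: {100 * s2:.1f}%")
    print(f"interaction:    {100 * (1 - s1 - s2):.1f}%")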
Main effects
• We can also plot the effect of varying one input averaged over the others
• Nonlinear model:
  - Averaging y = sin(x1)/{1 + exp(x1 + x2)} with respect to the uncertainty in x2, we can plot it as a function of x1
  - Similarly, we can plot it as a function of x2, averaged over the uncertainty in x1
• We can also plot interaction effects
Slide 26
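
A sketch of how such main-effect curves can be computed by simple averaging (NumPy and Matplotlib assumed; in practice an emulator would replace the raw model evaluations):

    import numpy as np
    import matplotlib.pyplot as plt

    def model(x1, x2):
        return np.sin(x1) / (1.0 + np.exp(x1 + x2))

    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 50)[:, None]        # values of the input being varied
    other = rng.normal(0.5, 0.25, size=(1, 5_000))   # draws of the input averaged over

    main_x1 = model(grid, other).mean(axis=1)        # E[y | x1] over the grid
    main_x2 = model(other, grid).mean(axis=1)        # E[y | x2] over the grid

    plt.plot(grid.ravel(), main_x1, "r", label="main effect of x1")
    plt.plot(grid.ravel(), main_x2, "b", label="main effect of x2")
    plt.xlabel("x")
    plt.ylabel("E[y | x]")
    plt.legend()
    plt.show()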
Nonlinear example – main effects
[Figure: main effect curves for x1 and x2, plotted against x over [0, 1]; vertical axis from 0.00 to 0.15]
• Red is the main effect of x1 (averaged over x2)
• Blue is the main effect of x2 (averaged over x1)
Slide 27
Summary
Why SA?
• For the model user: identifies which inputs it would be most useful to reduce uncertainty about
• For the model builder: main effect and interaction plots demonstrate how the model is behaving
  - Sometimes surprisingly!
Slide 28
What’s this got to do with emulators?
• Computation of UA and (particularly) SA by conventional methods such as Monte Carlo can be an enormous task for complex environmental models
  - Typically at least 10,000 model runs are needed
  - Not very practical when each run takes 1 minute (a week of computing)
  - And out of the question if a run takes 30 minutes
• Emulators use only a fraction of the model runs, and their probabilistic framework helps keep track of all the uncertainty
Slide 29
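
GEM-SA implements its own fully Bayesian emulation, but the basic idea can be sketched with off-the-shelf Gaussian process regression; the example below uses scikit-learn purely as an illustration, not the GEM-SA method. The emulator is fitted to a small number of model runs, and the Monte Carlo analysis is then done on the emulator rather than the model:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    def model(x):
        """The cheap nonlinear test model; imagine each run took 30 minutes."""
        return np.sin(x[:, 0]) / (1.0 + np.exp(x[:, 0] + x[:, 1]))

    rng = np.random.default_rng(0)

    # A small design of only 20 model runs (here random; in practice a Latin
    # hypercube or similar space-filling design would be used).
    X_train = rng.uniform(-0.25, 1.25, size=(20, 2))
    y_train = model(X_train)

    gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF([0.5, 0.5]),
                                  normalize_y=True)
    gp.fit(X_train, y_train)

    # The 10,000-run Monte Carlo UA now costs almost nothing: it uses the emulator.
    X_mc = rng.normal(0.5, 0.25, size=(10_000, 2))
    y_em = gp.predict(X_mc)
    print("emulator-based mean and sd:   ", y_em.mean(), y_em.std())
    print("direct Monte Carlo mean and sd:", model(X_mc).mean(), model(X_mc).std())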
What’s to come?
• GEM-SA is the first stage of the GEM project
• GEM = “Gaussian Emulation Machine”
• It uses highly efficient emulation methods based on Bayesian statistics
• The fundamental idea is that of “emulating” the physical model by a statistical representation called a Gaussian process
• GEM-SA does UA and SA
  - Future stages of GEM will add more functionality
Slide 30
References
• There are two papers that cover the material in these slides:
  - Oakley and O’Hagan (2002). Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89, 769-784.
  - Oakley and O’Hagan (2004). Probabilistic sensitivity analysis of complex models: a Bayesian approach. J. R. Statist. Soc. B, 66, 751-769.
Slide 31