GEM-SA: a tutorial

John Paul Gosling
University of Sheffield
Slide 1
Overview
• GEM-SA: Gaussian Emulation Machine for Sensitivity Analysis
• It’s a Windows-based program with a graphical interface, created by Marc Kennedy during his time in CTCD
• It does emulation for prediction, uncertainty analysis and sensitivity analysis
• It also has a facility to create experimental designs for the analysis of computer models.
mucm.group.shef.ac.uk
Slide 2
Starting the program
• On the desktop there is a folder <GEM-SA tutorial>; opening it reveals two other folders.
• Inside the folder <GEM-SA1.1> is the program.
• Double-clicking it starts the program.
Slide 3
Main window
Labelled parts of the main window: menu, toolbar, Sensitivity Analysis output grid, log window
Slide 4
Generating input designs
Press this button to create a file of inputs for your computer model
• There are two designs available: LP-TAU and Maximin Latin Hypercube. Both have good space-filling properties.
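To give a feel for what "maximin Latin hypercube" means, here is a minimal sketch in Python. It is not GEM-SA's algorithm, which these slides do not describe: it simply draws a number of random Latin hypercubes and keeps the one whose closest pair of points is furthest apart. The function names, the number of candidates and the seed are all assumptions made for illustration.

```python
import math
import random


def latin_hypercube(n_points, n_inputs, rng):
    """Return one random Latin hypercube design on the unit cube."""
    columns = []
    for _ in range(n_inputs):
        # One point in each of n_points equal-width strata, in shuffled order.
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_points for s in strata])
    # Transpose the per-input columns into a list of design points.
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(design)
        for q in design[i + 1:]
    )


def maximin_lhs(n_points, n_inputs, n_candidates=200, seed=0):
    """Keep the candidate hypercube whose closest pair is furthest apart."""
    rng = random.Random(seed)
    candidates = (
        latin_hypercube(n_points, n_inputs, rng) for _ in range(n_candidates)
    )
    return max(candidates, key=min_distance)


# A 20-point design for two inputs on the unit square.
design = maximin_lhs(n_points=20, n_inputs=2)
```

Each input gets exactly one point in each of its 20 strata, which is the Latin hypercube property; the maximin step then discourages points from clustering.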
Slide 5
Generating input designs
• Then we specify the range of interest for each input
• Each range must cover every value you believe that input could plausibly take
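Space-filling designs are usually generated on the unit cube and then stretched onto the ranges of interest. A minimal sketch of that rescaling is given below; the function name and the three example ranges are hypothetical and are not taken from GEM-SA.

```python
def scale_design(unit_design, ranges):
    """Map points from the unit cube onto the given (low, high) input ranges."""
    return [
        tuple(low + u * (high - low) for u, (low, high) in zip(point, ranges))
        for point in unit_design
    ]


# Hypothetical ranges for three inputs; use ranges that cover your own beliefs.
ranges = [(0.0, 1.0), (10.0, 20.0), (-1.0, 1.0)]
unit_points = [(0.0, 0.5, 1.0), (0.25, 0.25, 0.75)]
scaled = scale_design(unit_points, ranges)
```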
Slide 6
The design
• Here’s a 50-point LP-TAU design for three inputs
• The design points have also been written to the file you specified (LP_TAU50.txt) in GEM-SA’s working directory
Slide 7
Creating/Editing a project
• Now, we’ll run through some of the options available to us for emulator building.
• We can create a new project or edit an existing project by selecting the appropriate item from the project menu.
• Or we can use these toolbar buttons: New, Edit.
Slide 8
Edit Project - Files
Names of input files
Names of output files
Slide 9
Edit Project - Options
Edit input names
How many inputs?
Slide 10
Edit Project - Options
What should be calculated, and how?
Which joint effects should be calculated?
Slide 11
Edit Project - Options
What prior mean for the output?
Are the inputs uncertain?
Slide 12
Edit Project - Options
What kind of predictions and cross-validation?
Slide 13
Edit Project - Simulations
MCMC control parameters
Number of realisations for prediction and for main effects/joint effects (ME/JE)
How many points are used to calculate main effects and joint effects
Slide 14
Input names
• Clicking the <Names…> button opens a window that allows us to name each of the inputs.
• This can be handy when viewing the variance decomposition results and main effects plots.
Slide 15
Distributions for inputs
• When we click the <OK> button, the following window opens.
• This window allows us to specify our beliefs about the inputs.
Slide 16
A first run through
• Consider the simple nonlinear model we saw earlier:
  y = sin(x1) / (1 + exp(x1 + x2))
• We have 2 inputs, x1 and x2, and we assume both take values in the range [0,1].
• 20 points will give us decent coverage of the unit square that is the input space here.
• Two files have already been saved in the folder <Examples\Eg1> to save us time.
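For readers without the example files to hand, the sketch below shows how a pair of training files for this model could be produced in Python. The design is a hypothetical 5 x 4 grid rather than the one saved with the tutorial, and the whitespace-separated, one-run-per-row layout is an assumption about the file format, not something these slides specify.

```python
import math


def model(x1, x2):
    """The tutorial's toy simulator: y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


# Hypothetical 20-point design: cell midpoints of a 5 x 4 grid on the unit square.
design = [((i + 0.5) / 5, (j + 0.5) / 4) for i in range(5) for j in range(4)]
outputs = [model(x1, x2) for x1, x2 in design]

# One run per row; whitespace-separated text is an assumed layout.
inputs_text = "\n".join(f"{x1:.6f} {x2:.6f}" for x1, x2 in design)
outputs_text = "\n".join(f"{y:.6f}" for y in outputs)
```

The two strings can then be saved as the input and output files named in the project.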
Slide 17
Monte Carlo method
• Here’s the result of a Monte Carlo analysis using 30 input pairs.
• Mean = 0.139, median = 0.142
• Std. dev. = 0.053
• Variance = 0.0028
Slide 18
Monte Carlo method
• Here’s the result of a Monte Carlo analysis using 10,000 input pairs.
• Mean = 0.114, median = 0.115
• Std. dev. = 0.054
• Variance = 0.0029
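A Monte Carlo analysis of this kind can be sketched in a few lines of Python. The sketch assumes that x1 and x2 are independent and uniform on [0,1]; the slides only give the range, so the uniform distribution is an assumption, as are the function names and the seed. The exact figures depend on the random draws, so they will not match the slides digit for digit.

```python
import math
import random
import statistics


def model(x1, x2):
    """The tutorial's toy simulator: y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


def monte_carlo(n_pairs, seed):
    """Run the model at n_pairs random inputs and summarise the outputs."""
    rng = random.Random(seed)
    ys = [model(rng.random(), rng.random()) for _ in range(n_pairs)]
    return {
        "mean": statistics.mean(ys),
        "median": statistics.median(ys),
        "std_dev": statistics.stdev(ys),
        "variance": statistics.variance(ys),
    }


small = monte_carlo(30, seed=1)       # noisy: only 30 model runs
large = monte_carlo(10_000, seed=1)   # stable, but needs 10,000 model runs
```

With only 30 runs the summaries move around noticeably from seed to seed, which is why a cheap emulator is attractive when each model run is expensive.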
Slide 19
Prediction
• Predictions can be
  - Correlated realisations of outputs at the prediction inputs
    - Similar to main effect outputs
  - Marginal means and variances of outputs at the prediction inputs
    - Faster to compute, especially with many prediction points
    - Easy to interpret
Slide 20
A plot of the predictions
• Here are the prediction outputs plotted alongside the real function, with x2 fixed at 0.5.
Slide 21
Cross validation
• Choice of none, leave-one-out or leave final 20% out
• Leave-one-out
  - Hyperparameters are estimated using all the data and are then fixed when prediction is carried out for each omitted point
• Leave final 20% out
  - Hyperparameters are estimated using the reduced data subset
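The leave-one-out scheme can be sketched with a small Gaussian process emulator in plain Python. This is an illustration under stated assumptions, not GEM-SA's implementation: the roughness values are fixed by hand rather than estimated, the prior mean is a constant, the design is a hypothetical 20-point lattice, and a small nugget is added for numerical stability. The mean and variance scale are computed once from all the data and then held fixed while each point is omitted in turn.

```python
import math

NUGGET = 1e-6  # small diagonal jitter for numerical stability (an assumption)


def model(x1, x2):
    """The tutorial's toy simulator."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


def corr(p, q, roughness):
    """Gaussian correlation exp(-sum_k b_k (p_k - q_k)^2) with roughness b."""
    return math.exp(-sum(b * (u - v) ** 2 for b, u, v in zip(roughness, p, q)))


def corr_matrix(points, roughness):
    return [
        [corr(p, q, roughness) + (NUGGET if i == j else 0.0)
         for j, q in enumerate(points)]
        for i, p in enumerate(points)
    ]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def predict(train_x, train_y, new_x, roughness, mean, sigma2):
    """Emulator mean and variance at new_x with all hyperparameters held fixed."""
    t = [corr(new_x, p, roughness) for p in train_x]
    w = solve(corr_matrix(train_x, roughness), t)
    m = mean + sum(wi * (y - mean) for wi, y in zip(w, train_y))
    v = sigma2 * max(1.0 + NUGGET - sum(wi * ti for wi, ti in zip(w, t)), 1e-12)
    return m, v


# Hypothetical 20-point lattice design on the unit square (not the tutorial's file).
xs = [((i + 0.5) / 20, ((7 * i) % 20 + 0.5) / 20) for i in range(20)]
ys = [model(*x) for x in xs]

# Roughness is fixed by hand here; mean and sigma2 are computed from ALL the data.
roughness = (1.0, 1.0)
mean = sum(ys) / len(ys)
resid = [y - mean for y in ys]
alpha = solve(corr_matrix(xs, roughness), resid)
sigma2 = sum(a * r for a, r in zip(alpha, resid)) / len(ys)

# Leave-one-out: drop each point in turn, predict it, and standardise the error.
loo_means, std_errors = [], []
for i in range(len(xs)):
    m, v = predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i],
                   roughness, mean, sigma2)
    loo_means.append(m)
    std_errors.append((ys[i] - m) / math.sqrt(v))
```

Standardised errors that are mostly of moderate size suggest the emulator's uncertainty is about right; a few very large ones are a warning sign. The "leave final 20% out" option would instead re-estimate the hyperparameters from the first 80% of the runs before predicting the rest.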
Slide 22
A real example
• A dynamic vegetation model is being used to predict the NBP of deciduous broadleaf woodland in the vicinity of Whitby, North Yorkshire.
• The scientists are uncertain about ten inputs of the model and want to know how this uncertainty affects the NBP output of the model. Monte Carlo methods are out of the question as the model is too complex.
• When they used their best guesses for these inputs, the model returned an NBP of 146.4 gC/m2.
Slide 23
The input names in order
No.  Input                            Distribution
1    Maximum age (years)              N(200, 625)
2    Water potential (MPa)            N(3, 0.25)
3    Leaf life span (days)            N(190, 1600)
4    Leaf mortality index             N(0.005, 6.25e-6)
5    Bud burst limit (degree days)    N(135, 6.25)
6    Seeding density (m2)             N(0.1, 0.0001)
7    Soil sand (%)                    N(43.27, 222.12)
8    Soil clay (%)                    N(22.36, 49.21)
9    log(stem growth rate)            N(-5.116, 0.041209)
10   Bulk density                     N(1.214, 0.0325)
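The sketch below holds these distributions in Python and draws one random set of inputs. It reads N(a, b) as a normal distribution with mean a and variance b; that reading is an assumption, although it gives sensible standard deviations such as 25 years for maximum age. The dictionary and function names are illustrative only.

```python
import math
import random

# (mean, variance) pairs copied from the slide; the variance reading is assumed.
INPUTS = {
    "Maximum age (years)": (200, 625),
    "Water potential (MPa)": (3, 0.25),
    "Leaf life span (days)": (190, 1600),
    "Leaf mortality index": (0.005, 6.25e-6),
    "Bud burst limit (degree days)": (135, 6.25),
    "Seeding density (m2)": (0.1, 0.0001),
    "Soil sand (%)": (43.27, 222.12),
    "Soil clay (%)": (22.36, 49.21),
    "log(stem growth rate)": (-5.116, 0.041209),
    "Bulk density": (1.214, 0.0325),
}

# random.gauss takes a standard deviation, so take square roots of the variances.
std_devs = {name: math.sqrt(var) for name, (_, var) in INPUTS.items()}


def sample_inputs(rng):
    """Draw one value for every input from its normal distribution."""
    return {name: rng.gauss(mean, std_devs[name]) for name, (mean, _) in INPUTS.items()}


draw = sample_inputs(random.Random(0))
```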
Slide 24
Main effects plots
• The plug-in estimate of the NBP is far from our mean for the NBP because the main effect plot for bulk density is concave around its expected value of 1.214.
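The same effect can be seen on a much smaller scale in the toy model from the first run through. The sketch below compares the plug-in output, obtained by running the model once at the input means, with a Monte Carlo estimate of the mean output. It assumes both inputs are uniform on [0,1], so that each has mean 0.5; the vegetation model itself is not reproduced here.

```python
import math
import random
import statistics


def model(x1, x2):
    """The tutorial's toy simulator: y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


# Plug-in estimate: run the model once at the mean of each input.
plug_in = model(0.5, 0.5)

# Mean output: average the model over the (assumed uniform) input uncertainty.
rng = random.Random(0)
mc_mean = statistics.mean(model(rng.random(), rng.random()) for _ in range(10_000))

gap = plug_in - mc_mean
```

Here the plug-in value is roughly 0.129 against a mean of roughly 0.114, so the plug-in approach overstates the output even for this simple function.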
Slide 25
Producing main/joint effects plots for publication
• In the files section of the edit project window, there are two fields that allow the user to specify where the main/joint effects data should be written.
• These files can be used to produce graphs like the one I showed earlier.
• The main effects file is structured as follows:
  - There are a number of blocks of function realisations, one for each input.
  - These are controlled by
Slide 26
Limitations of GEM-SA
• In theory, the methods used by GEM-SA are limitless; however, the program itself isn’t.
• It can handle up to 30 inputs and 400 training data points.
• Also, the distributions that can be used to express our uncertainty about the inputs are limited to uniform or normal.
Slide 27
When it all goes wrong…
• How do we know when the emulator is not working?
  - Large roughness parameters
    - Especially ones hitting the limit of 99
  - Large emulation variance on the uncertainty analysis (UA) mean
  - Poor cross-validation (CV) standardised prediction errors
    - Especially when some are extremely large
• In such cases, see if a larger training set helps
  - Other ideas include transforming the output scale
Slide 28
Where to find the program
• GEM-SA is available on the web, along with tutorial slides from a longer course and further example data sets.
• Links to it can be found on my website, where there is also a technical report explaining the perils of using the “plug-in” approach:
j-p-gosling.staff.shef.ac.uk
Slide 29