Getting started with GEM-SA Marc Kennedy Tony O’Hagan, Jeremy Oakley


Getting started with GEM-SA
Marc Kennedy
Central Science Laboratory, York
Tony O’Hagan, Jeremy Oakley
University of Sheffield
Part 1: Getting started




Starting GEM-SA program
Creating input and output files
Explanation of the menus, toolbars, etc.
Description of the project window
Starting GEM-SA
 Double-click the GEM-SA icon to start
 The main window appears, with
– Menu
– Toolbar
– Sensitivity analysis output grid
 Tab windows for other types of output
– Log window
[Screenshot: main window, with the menu, toolbar, sensitivity analysis output grid and log window labelled]
Toolbar icons
New project
Open project
Save project
Print output report
Edit project
Generate input design points
Rescale an input
Standardise design
Copy input design to clipboard
Convert input to integer
Run the analysis
Help
Sensitivity analysis output grid
 This will report the sensitivity results after the
analysis is complete
– One line for each input parameter
– One line for each pair of inputs, if joint
effects are selected
Log Window output
 Tells us
– Which training data are being loaded/saved
– Transformations applied to the data
– Fitted Gaussian process parameters
– Summary of uncertainty analysis results
Creating a GEM project
 To build the emulator we first need 3 files:
– Data file of code inputs
– Data file of code outputs
– GEM-SA project file
Restrictions on input/output data
 Single output
– Multiple outputs must be treated individually
– GEM can read a file containing multiple outputs, but a
single column is selected within a project
 Max 30 input parameters
 Max 400 training points
 The data files should be plain text files
– One line for each point
– Input file can be space or tab delimited
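A quick check of the training data against these restrictions can save a failed run. The sketch below is a minimal illustration, not part of GEM-SA: the function name `check_training_data` is hypothetical, and it assumes the files contain only numbers, with no header line.

```python
MAX_INPUTS = 30   # limit stated on the slide
MAX_POINTS = 400  # limit stated on the slide


def check_training_data(input_text, output_text):
    """Return a list of problems found; an empty list means the data look usable.

    Assumption: both files hold plain numbers only, one line per training point.
    """
    problems = []
    # str.split() with no argument splits on any run of spaces or tabs.
    input_rows = [line.split() for line in input_text.splitlines() if line.strip()]
    output_rows = [line.split() for line in output_text.splitlines() if line.strip()]

    if not input_rows:
        return ["input file is empty"]

    n_inputs = len(input_rows[0])
    if any(len(row) != n_inputs for row in input_rows):
        problems.append("input rows have differing numbers of columns")
    if n_inputs > MAX_INPUTS:
        problems.append(f"{n_inputs} inputs exceeds the maximum of {MAX_INPUTS}")
    if len(input_rows) > MAX_POINTS:
        problems.append(f"{len(input_rows)} points exceeds the maximum of {MAX_POINTS}")
    if len(output_rows) != len(input_rows):
        problems.append("input and output files have different numbers of lines")

    # Every field must parse as a number.
    for row in input_rows + output_rows:
        try:
            [float(value) for value in row]
        except ValueError:
            problems.append(f"non-numeric value in line: {' '.join(row)}")
            break
    return problems


# Two training points, two inputs, mixed tab and space delimiters.
print(check_training_data("0.1\t0.5\n0.3 0.9\n", "1.2\n3.4\n"))
```

In practice the two strings would be read from the input and output files before calling the function.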
Generating a new input design
 Designs can be generated using the “Generate input design points”
toolbar icon
or the menu: Input -> Generate…
 The design dialog appears
Generating a new input design
 Click OK and fill in the required range for each
input
 Click OK again
Editing input designs
 If you select a column, you can rescale the values of that
input or round them to integers
 Designs can be loaded into or saved from this window
using the Inputs menu. Use the “Copy input design to clipboard”
toolbar icon to copy the points for use in other programs
Types of design
 GEM-SA can generate 2 types of design
– LP-τ
– Maximin Latin Hypercube designs
 Both have good space-filling properties
– Ensure all regions of the input space are
well represented
 LP-τ is quick to generate, good for increasing
the input design sequentially
 MmLH can be better in high dimensions
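To make the space-filling idea concrete, here is a minimal sketch of a maximin Latin hypercube built by random search: generate many random Latin hypercubes and keep the one whose closest pair of points is furthest apart. This is an illustration only and is not claimed to be the algorithm GEM-SA uses; all function names and the default of 200 candidates are assumptions.

```python
import math
import random


def latin_hypercube(n_points, n_inputs, rng):
    """One random Latin hypercube on [0, 1): each input's range is cut into
    n_points equal strata and every stratum is used exactly once."""
    columns = []
    for _ in range(n_inputs):
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_points for s in strata])
    # Transpose the columns into a list of design points (rows).
    return [list(point) for point in zip(*columns)]


def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(design)
        for b in design[i + 1:]
    )


def maximin_latin_hypercube(n_points, n_inputs, n_candidates=200, seed=0):
    """Keep the candidate whose closest pair of points is furthest apart."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_points, n_inputs, rng) for _ in range(n_candidates))
    return max(candidates, key=min_distance)


def rescale(design, ranges):
    """Map each unit-interval column onto its (lower, upper) input range."""
    return [
        [lower + u * (upper - lower) for u, (lower, upper) in zip(point, ranges)]
        for point in design
    ]


design = maximin_latin_hypercube(n_points=10, n_inputs=2)
scaled = rescale(design, [(80, 200), (0.6, 1.0)])
for point in scaled:
    print(" ".join(f"{value:.4f}" for value in point))
```

The printed rows are space delimited, one design point per line, which matches the layout the input file needs.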
Creating output data from these inputs
 Each row from the input design must be used
to generate a single output, e.g. using
– Spreadsheet
 Simple, but requires functional form
– Script
 Only need executable code
 Loop through inputs, modify code input file
– Modify code to loop through the points
 Can be difficult, need source code
Example: using a spreadsheet
 Copy the input design to
the clipboard using the “Copy input design to clipboard” toolbar icon
 Open Excel and paste
inputs
 Create formula in final
column
 Copy formula for all
rows of the design
 Cut and paste special
(values) in a new sheet
 Save as text file
Example: using a script
 Read base input file (read by executable code)
 Loop through lines of input design file
– Replace selected inputs in base input file
– Run executable code with new input file
– Calculate single output and add to training
output file
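The script approach above might look like the following sketch. It assumes the base input file marks the values to replace with `$NAME` placeholders, which is a convention chosen here for illustration. The function names, the `./model` command and the `model.in` file name are all hypothetical; a real code will need its own way of passing inputs and extracting the single output.

```python
import subprocess
from string import Template


def make_input_file(base_text, names, values):
    """Fill the $NAME placeholders in the base input file with one design point."""
    return Template(base_text).substitute(dict(zip(names, values)))


def run_design(design_lines, base_text, names, run_model):
    """Return one output per non-blank design line, in design order."""
    outputs = []
    for line in design_lines:
        if not line.strip():
            continue
        values = line.split()  # space or tab delimited
        outputs.append(run_model(make_input_file(base_text, names, values)))
    return outputs


def run_executable(input_text, command=("./model",), input_path="model.in"):
    """Hypothetical runner: write the input file, run the executable on it,
    and read a single number from its standard output."""
    with open(input_path, "w") as handle:
        handle.write(input_text)
    result = subprocess.run(
        list(command) + [input_path], capture_output=True, text=True, check=True
    )
    return float(result.stdout.split()[0])


# Stand-in for the executable so the loop can be tried without a real model.
def toy_model(input_text):
    settings = dict(line.split(" = ") for line in input_text.splitlines())
    return float(settings["cover"]) * float(settings["lai"])


base = "cover = $COVER\nlai = $LAI\n"
design = ["0.6 3.75", "0.8\t6.5"]
outputs = run_design(design, base, ["COVER", "LAI"], toy_model)
print("\n".join(str(value) for value in outputs))
```

Swapping `toy_model` for `run_executable` gives the real loop; the printed column is what would be saved as the training output file.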
The project window
 Appears whenever you
– Load a project
– Edit a project
– Create new project
 This window has 3 tabs
– Files
– Options
– Simulations
Annotations on the project window tabs:
– Names for the input files
– Names for the output files
– How many inputs?
– Which column from output file?
– What are the input names?
– What should be calculated, and how?
– Which joint effects should be calculated?
– Are the inputs uncertain?
– What prior mean for the output?
– What kind of prediction?
– What kind of cross validation?
– MCMC control parameters
– How many realisations of predictions, main and joint effects to generate
– How many points used to calculate main effects, joint effects
Part 2
Uncertainty Analysis Using GEM-SA
Part 2: Outline
 Setting up the project
 Running a simple analysis
 More complex analyses
Setting up the project
Create a new project
 Select Project -> New,
or click toolbar icon
 Project dialog appears
 We’ll specify the data
files first
Files
 Using “Browse”
buttons, select
input and output
files
 The “Inputs” file contains one column for each
parameter and one row for each model
training run (the design)
 The “Outputs” file contains the outputs from
those runs (one column, in this example)
Our example
 We’ll use the example “model1” in the GEM-SA
DEMO DATA directory
 This example is based on a vegetation model
with 7 inputs
– RESAEREO, DEFLECT, FACTOR, MO,
COVER, TREEHT, LAI
 The model has 16 outputs, but for the present
we will consider output 4
– June monthly GPP
Number of inputs
 Click on Options tab
 Select the number of inputs,
or click “From Inputs File”
Define input names
 Click on “Names …”
 The “Input
parameter
names” dialog
opens
 Enter parameter
names
 Click “OK”
Complete the project
 We will leave all other settings at their default
values for now
 Click “OK”
The Input Parameter
Ranges window
appears
Close and save project
 Click “Defaults from
input ranges” button
 Click “OK”
 Select Project -> Save
– Or click toolbar icon
 Choose a name and
click “Save”
Running a simple analysis
Build the emulator
 Click the “Run the analysis” toolbar icon
to build the emulator
 A lot of things now start to happen!
– The log window at the bottom starts to
record various bits of information
– A little window appears showing progress of
minimisation of the roughness parameter
estimation criterion
– A new window appears in the “Main Effects”
tab and several graphs appear
 Progress bar at the bottom
Focus on the log window
 Ignore the outputs in the “Main Effects” and
“Sensitivity Analysis” windows for now
– These will be explained later
 Focus on the log window
 This reports two key things
– Diagnostics of the emulator build
– The basic uncertainty analysis results
 These also appear in the “Output Summary”
window and can be printed using the “Print output report” toolbar icon
Emulation diagnostics
 Note where the log window reports …
Estimating emulator parameters by maximising probability distribution...
maximised posterior for emulator parameters: sigma-squared =
0.342826, roughness = 0.217456 0.0699709 0.191557 16.9933
0.599439 0.459675 1.01559
 The first line says roughness parameters have
been estimated by the simplest method
 The values of these indicate how non-linear the
effect of each input parameter is
– Note the high value for input 4 (MO)
Uncertainty analysis – mean
 Below this, the log reports
Estimate of mean output is 24.145, with variance 0.00388252
 So the best estimate of the output (June GPP)
is 24.1 (mol C/m2)
– This is averaged over the uncertainty in the
7 inputs
 Better than just fixing inputs at best estimates
– There is an emulation standard error of
0.062 in this figure
Uncertainty analysis – variance
 The final line of the log is
Estimate of total output variance = 73.9033
 This shows the uncertainty in the model output
that is induced by input uncertainties
– The variance is 73.9
– Equal to a standard deviation of 8.6
– So although the best estimate of the output
is 24.1, the uncertainty in inputs means it
could easily be as low as 16 or as high as
33
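The figures quoted on the last two slides follow directly from the three numbers in the log. A short check, using only the logged values (the variable names are ours):

```python
import math

mean_output = 24.145             # "Estimate of mean output" from the log
emulation_variance = 0.00388252  # "with variance ..." on the same log line
total_output_variance = 73.9033  # "Estimate of total output variance"

# Uncertainty about the mean that comes from using an emulator.
emulation_std_error = math.sqrt(emulation_variance)

# Uncertainty in the output that comes from the uncertain inputs.
output_std_dev = math.sqrt(total_output_variance)

low = mean_output - output_std_dev
high = mean_output + output_std_dev

print(f"emulation standard error:  {emulation_std_error:.3f}")
print(f"output standard deviation: {output_std_dev:.1f}")
print(f"mean -/+ one std dev:      {low:.1f} to {high:.1f}")
```

One standard deviation either side of the mean gives roughly 15.5 to 32.7, which the slide rounds to 16 and 33. The emulation standard error is tiny by comparison, so almost all of the uncertainty here comes from the inputs rather than from the emulator.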
More complex analyses
Input distributions
 Default is to assume the uncertainty in each
input is represented by a uniform distribution
– Range determined by the range of values
found in the input file, or input manually
 A normal (Gaussian) distribution
is generally a more realistic
representation of uncertainty
– Range unbounded
– More probability in the
middle
[Figure: normal density curve plotted against x, for x from -4 to 4]
Changing input distributions
 In Project dialog,
Options tab, click
the button for
“All unknown,
product normal”
 Then OK
 A new dialog
opens to specify
means and
variances
Model 1 example
 Uniform
distributions from
input ranges
 Normal
distributions to
match
– Range is 4
std devs
 Except for MO
– Narrower
distribution
Parameter    Uniform            Normal
             Lower    Upper     Mean    Variance
RESAEREO     80       200       140     900
DEFLECT      0.6      1         0.8     0.01
FACTOR       0.1      0.5       0.3     0.01
MO           30       100       60      100
COVER        0.6      0.99      0.8     0.01
TREEHT       10       40        25      100
LAI          3.75     9         6.5     1
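The “range is 4 std devs” rule can be written out as a small calculation: centre the normal on the middle of the uniform range and set the standard deviation to a quarter of its width. The helper name `normal_from_range` is hypothetical. The rule reproduces the table exactly for RESAEREO, DEFLECT and FACTOR; the other rows of the table do not follow it exactly (MO deliberately so), so treat it as a starting point rather than a description of how every value was chosen.

```python
def normal_from_range(lower, upper, n_std_devs=4.0):
    """Mean and variance of a normal whose central n_std_devs standard
    deviations span the uniform range [lower, upper]."""
    mean = (lower + upper) / 2
    std_dev = (upper - lower) / n_std_devs
    return mean, std_dev ** 2


# Uniform ranges taken from the table for the three rows the rule matches.
for name, lower, upper in [
    ("RESAEREO", 80, 200),
    ("DEFLECT", 0.6, 1),
    ("FACTOR", 0.1, 0.5),
]:
    mean, variance = normal_from_range(lower, upper)
    print(f"{name:9s} mean = {mean:g}, variance = {variance:g}")
```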
Effect on UA
 After running the revised model, we see:
– It runs faster, with no need to rebuild the
emulator
 The emulator fit is unchanged
– The mean is changed a little and variance is
halved
Estimate of mean output is 26.2698, with variance 0.00784475
Estimate of total output variance = 38.1319
Reducing the MO uncertainty further
 If we reduce the variance of MO even more, to
49:
– UA mean changes a little more and
variance reduces again
Estimate of mean output is 26.3899, with variance 0.0108792
Estimate of total output variance = 27.1335
– Notice also how the emulation variance of the mean
has increased, to 0.011 (it was 0.004 for uniform)
– This is because the design points cover the
new ranges less thoroughly
Cross-validation
 In the Project dialog, look at the bottom menu
box, labelled “Cross-validation”
 There are 3 options
– None
– Leave-one-out
– Leave final 20% out
 CV is a way of checking the emulator fit
– Default is None because CV takes time
Leave-one-out CV
 After estimating roughness and other
parameters, GEM predicts each training run
point using only the remaining n-1 points
 Results appear in log window
– The root mean-squared standardised error should be close to 1 (here 1.15)
Cross Validation Root Mean-Squared Error = 0.907869
Cross Validation Root Mean-Squared Relative Error = 4.34773 percent
Cross Validation Root Mean-Squared Standardised Error = 1.15273
Largest standardised error is 4.32425 for data point 61
Cross Validation variances range from 0.18814 to 3.92191
Written cross-validation means to file cvpredmeans.txt
Written cross-validation variances to file cvpredvars.txt
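The summary lines in the log can be understood through the sketch below, which computes the same kinds of statistics from the observed outputs and the cross-validation predictive means and variances. The formulas are the usual definitions and are an assumption here; the slides do not state exactly how GEM-SA computes each figure. The function name and dictionary keys are hypothetical, and data points are numbered from 1.

```python
import math


def cv_summary(observed, cv_means, cv_variances):
    """Cross-validation summaries under the usual (assumed) definitions."""
    n = len(observed)
    errors = [y - m for y, m in zip(observed, cv_means)]
    # Standardised error: prediction error divided by its predictive std dev.
    standardised = [e / math.sqrt(v) for e, v in zip(errors, cv_variances)]
    worst = max(range(n), key=lambda i: abs(standardised[i]))
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "relative_percent": 100 * math.sqrt(
            sum((e / y) ** 2 for e, y in zip(errors, observed)) / n
        ),
        "standardised": math.sqrt(sum(z * z for z in standardised) / n),
        "largest_standardised": abs(standardised[worst]),
        "worst_point": worst + 1,  # 1-based, as in the log
    }


# Tiny made-up example: two held-out points.
summary = cv_summary(observed=[10.0, 20.0], cv_means=[11.0, 18.0], cv_variances=[1.0, 4.0])
for key, value in summary.items():
    print(key, value)
```

A standardised figure well above 1 would suggest the emulator is over-confident (its predictive variances are too small); a figure well below 1 would suggest the opposite.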
Leave out final 20% CV
 This is an even better check, because it tests
the emulator on data that have not been used
in any way to build it
 Emulator is built on first 80% of data and used
to predict last 20%
Cross Validation Root Mean-Squared Error = 1.46954
Cross Validation Root Mean-Squared Relative Error = 7.4922 percent
Cross Validation Root Mean-Squared Standardised Error = 1.73675
Largest standardised error is 5.05527 for data point 22
Cross Validation variances range from 0.277304 to 4.88653
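The split itself is simple to reproduce outside GEM-SA, for example to check which runs end up in the held-out set. A minimal sketch follows; the function name is hypothetical, and the way a non-integer 20% is rounded is an assumption.

```python
def split_final_fraction(rows, holdout_fraction=0.2):
    """Split training rows into (build set, final hold-out set), keeping order."""
    n_holdout = round(len(rows) * holdout_fraction)  # rounding rule is an assumption
    n_build = len(rows) - n_holdout
    return rows[:n_build], rows[n_build:]


runs = list(range(1, 121))  # run numbers for a 120-run design
build, holdout = split_final_fraction(runs)
print(len(build), "runs to build the emulator;", len(holdout), "runs held out")
```

Because the hold-out set is always the final rows, the order of the design matters: if the design was generated sequentially, the last 20% may not be spread over the input space in the same way as the first 80%.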
Other options
 There are various other options associated with
the emulator building that we have not dealt
with
 But we’ve done the main things that should be
considered in practice
 And it’s enough to be going on with!
When it all goes wrong
 How do we know when the emulator is not
working?
– Large roughness parameters
 Especially ones hitting the limit of 99
– Large emulation variance on UA mean
– Poor CV standardised prediction error
 Especially when some are extremely large
 In such cases, see if a larger training set helps
– Other ideas like transforming output scale
Part 3
Sensitivity Analysis in GEM-SA
Example
 Again we use the ForestETP vegetation model
– 7 input parameters
– 120 model runs
 Objective: conduct a variance-based sensitivity
analysis to identify which uncertain inputs are
driving the output uncertainty.
Exploratory scatter plots
Sensitivity Analysis Walkthrough
1. Project -> New
2. Click “Browse” for the Inputs File
– From the GEM-SA Demo Data/Model1/
folder, select “emulator7x120inputs.txt”
3. Click “Browse” for the Outputs File
– From the GEM-SA Demo Data/Model1/
folder, select “out11.txt”
4. Select the Options tab
Sensitivity Analysis Walkthrough
5. Change the Number of Inputs to 7.
6. Leave the other options unchanged
– Input uncertainty options: All unknown, uniform
– Prior mean options: Linear term for each input
– Generate predictions as: function realisations (correlated points)
Sensitivity Analysis Walkthrough
7. Click OK
8. Select “Default from input ranges” then
OK
9. Project -> Run, or use the “Run the analysis” toolbar icon
Main effect plots
Fixing X6 = 18, this point shows the expected value of the output (obtained by
averaging over all other inputs).
X6
Simply fixing all the other inputs at their central values and comparing X6=10
with X6=40 would underestimate the influence of this input
(The thickness of the band shows emulator uncertainty)
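The difference between a main effect and a one-at-a-time comparison can be seen on a toy function. The sketch below is an illustration only: the model `x1 * x2**2`, the input levels and the function names are all made up, and the average is taken over a small grid of equally likely values rather than with an emulator as GEM-SA does.

```python
from itertools import product


def main_effect(model, levels, index, value):
    """Expected output with input `index` fixed at `value`, averaging over
    every combination of the (equally likely) levels of the other inputs."""
    others = [lv for i, lv in enumerate(levels) if i != index]
    outputs = []
    for combo in product(*others):
        point = list(combo)
        point.insert(index, value)  # put the fixed input back in position
        outputs.append(model(*point))
    return sum(outputs) / len(outputs)


def one_at_a_time(model, central, index, value):
    """Output with every other input held at its central value."""
    point = list(central)
    point[index] = value
    return model(*point)


def toy_model(x1, x2):
    return x1 * x2 ** 2


levels = [[0.0, 1.0, 2.0], [-1.0, -0.5, 0.0, 0.5, 1.0]]
central = [1.0, 0.0]

for x1 in levels[0]:
    averaged = main_effect(toy_model, levels, 0, x1)
    fixed = one_at_a_time(toy_model, central, 0, x1)
    print(f"x1 = {x1}: main effect {averaged}, one-at-a-time {fixed}")
```

With x2 held at its central value of 0 the output never moves, so a one-at-a-time comparison reports no influence for x1 at all, while the averaged main effect rises with x1. This is an extreme case of the underestimation described on the slide.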
Variance of main effects
Main effects for each
input. Input 6 has the
greatest individual
contribution to the
variance
Main effects sum to 66.8% of the total variance
Interactions and total effects
 Main effects explain 2/3 of the variance
– Model must contain interactions
 Any input can have small main effect, but large
interaction effect, so overall this input is still
‘important’
 Can ask GEM-SA to compute all pair-wise
interaction effects
– 435 in total for a 30 input model – can take
some time!
 Useful to know what to look for
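The count of pair-wise interactions is the number of ways of choosing 2 inputs from the total, which grows quickly with the number of inputs. A short check of the figure quoted above, alongside the 7-input example (the helper name is ours):

```python
import math


def n_pairwise_interactions(n_inputs):
    """Number of distinct pairs of inputs."""
    return math.comb(n_inputs, 2)


for n in (4, 7, 30):
    print(f"{n} inputs -> {n_pairwise_interactions(n)} pair-wise interactions")
```

Restricting the joint effects to a handful of suspect inputs keeps the count, and hence the run time, small.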
Interactions and total effects
 For each input Xi
Total effect of Xi = main effect for Xi + all
interactions involving Xi
 Total effect >> main effect implies interactions
in the model
 So for any input with large total effect relative to
the main effect
– investigate possible interactions involving
that input
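A small worked example shows how an input can have no main effect yet a large total effect. The sketch below is a toy illustration, not GEM-SA's method: the model `x1 + x2 * x3`, the three equally likely levels per input and the function names are all assumptions, and the expectations are computed exactly by enumerating the grid.

```python
from itertools import product
from statistics import fmean, pvariance

LEVELS = (-1.0, 0.0, 1.0)  # each input takes these values with equal probability
N_INPUTS = 3


def model(x1, x2, x3):
    return x1 + x2 * x3


def conditional_means(fixed):
    """E[Y | inputs in `fixed`], one value per combination of those inputs."""
    groups = {}
    for point in product(LEVELS, repeat=N_INPUTS):
        key = tuple(point[i] for i in fixed)
        groups.setdefault(key, []).append(model(*point))
    return [fmean(values) for values in groups.values()]


total_variance = pvariance([model(*p) for p in product(LEVELS, repeat=N_INPUTS)])


def main_effect_share(i):
    """Var(E[Y | Xi]) as a fraction of the total variance."""
    return pvariance(conditional_means([i])) / total_variance


def total_effect_share(i):
    """Variance left unexplained when every input except Xi is known."""
    others = [j for j in range(N_INPUTS) if j != i]
    return 1.0 - pvariance(conditional_means(others)) / total_variance


for i in range(N_INPUTS):
    print(
        f"X{i + 1}: main effect {100 * main_effect_share(i):.0f}%, "
        f"total effect {100 * total_effect_share(i):.0f}%"
    )
```

Here X1 acts additively, so its main and total effects agree at 60% of the variance. X2 and X3 each have a main effect of zero but a total effect of 40%, entirely due to their interaction, which is the pattern to look for when deciding which joint effects to request.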
Interactions and total effects
Total effects for inputs 4
and 7 are much larger than
their main effects. This implies
the presence of interactions
Interaction effects
10. Project -> Edit, or click the “Edit project” toolbar icon
11. In Options tab, tick calculate joint effects
12. De-select all inputs under “Inputs to include in
joint effects”, then select X4, X5, X6, X7
Interaction effects
13. Click OK, then OK again
14. Project -> Run, or click the “Run the analysis” toolbar icon
Interaction effects
Note interactions
involving inputs 4 and 7
Main effects and
selected interactions
now sum to almost 92%
of the total variance
Exercise
1. Set up a new project using SAex1_inputs.txt for
the inputs and SAex1_outputs.txt for the output
– 8 input parameters (uniform on [0,1])
– 100 model runs
2. Estimate the main effects only for this model
and identify the influential input variables
3. By comparing main effects with total effects,
can you spot any interactions?
4. Estimate any suspected interactions to test
your intuition!