Uncertainty Quantification & the
PSUADE Software
Mahmoud Khademi
Supervisor: Prof. Ned Nedialkov
Department of Computing and Software
McMaster University, Hamilton, Ontario
Canada
2012
Outline
Introduction to Uncertainty Quantification (UQ)
Identification
Characterization
Propagation
Analysis
Common algorithms and methods
PSUADE: UQ software library and environment
https://computation.llnl.gov/casc/uncertainty_quantification/
Conclusions & future research directions
Introduction to UQ
Quantitative characterization and reduction of
uncertainty
Estimating the probability of certain outcomes when
some aspects of the system are unknown
Advances in simulation-based scientific discovery
have driven the emergence of verification and
validation (V&V) and UQ
Many problems in the natural sciences and
engineering have uncertainty
Identification
Model structure: models are only approximations to
reality
Numerical approximation: methods are not exact
Input and model parameters may only be known
approximately
Variations in inputs and model parameters due to
differences between instances of the same object
Noise, measurement errors, and lack of data
Characterization
Aleatoric (statistical) uncertainties: differ each time
we run the same experiment
Monte Carlo methods are used; the probability
density function (PDF) can be represented by its
moments
Epistemic (systematic) uncertainties: due to things
we could in principle know but don't in practice
Fuzzy logic or generalizations of Bayesian theory
are used
Propagation
How does uncertainty evolve?
Analyzing the impact that parameter uncertainties
have on outputs
Finding the major sources of uncertainty (sensitivity
analysis)
Exploring “interesting” regions in parameter space
(model exploration)
Analysis
Assessing "anomalous" regions in parameter space
(risk analysis)
Establishing the integrity of a simulation model (validation)
Providing information on which additional physical
experiments are needed to improve understanding
of the system (experimental guidance)
Selecting Proper Methods
Is there a nonlinear relationship between the uncertain
variables and the output variables?
Is the uncertain parameter space high-dimensional?
Are there model-form uncertainties?
What is the computational cost per simulation?
Which experimental data are available?
Monte Carlo Algorithms
Based on repeated random sampling to compute
their results
Used when it is not feasible to compute an exact
result with a deterministic algorithm
Useful for simulating systems with many degrees of
freedom, e.g. cellular structures
Monte Carlo Method: Outline
Define a domain of possible inputs
Generate inputs randomly from a probability density
function over the domain
Perform a deterministic computation on the inputs
Aggregate the results
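The four steps above can be sketched with the classic Monte Carlo estimate of π (a minimal illustration added here, not part of the original slides):

```python
import random

# Monte Carlo estimate of pi, following the four steps:
# domain = unit square; inputs drawn from a uniform PDF;
# a deterministic inside-the-quarter-circle test per point;
# aggregation by averaging.
random.seed(42)

n = 100_000
inside = 0
for _ in range(n):
    x, y = random.random(), random.random()   # sample uniformly over the domain
    if x * x + y * y <= 1.0:                  # deterministic computation on the input
        inside += 1

pi_estimate = 4.0 * inside / n                # aggregate the results
```

The estimate converges at the usual Monte Carlo rate O(1/√n), independent of dimension, which is why the method remains usable for systems with many degrees of freedom.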
Polynomial Regression
Input data: (x_i, y_i), i = 1, …, n
Unknown parameters: a_i, i = 0, …, m
ε_i: random errors with mean zero conditioned on x
y_i = a_0 + a_1 x_i + a_2 x_i^2 + … + a_m x_i^m + ε_i   (i = 1, …, n)
In matrix form, with X the n × (m+1) Vandermonde matrix whose i-th row is (1, x_i, x_i^2, …, x_i^m):

Y = X a + ε,   Y = (y_1, …, y_n)^T,  a = (a_0, …, a_m)^T,  ε = (ε_1, …, ε_n)^T

The least-squares estimate is

â = (X^T X)^{-1} X^T Y
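The least-squares formula can be exercised directly by forming the normal equations (X^T X) a = X^T Y and solving the small linear system. This is an illustrative sketch in plain Python, not the numerically preferred approach (QR or SVD would be used in practice):

```python
# Least-squares polynomial fit via the normal equations
# a_hat = (X^T X)^{-1} X^T Y, with X the Vandermonde matrix.

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def polyfit(xs, ys, m):
    # Vandermonde matrix: row i is (1, x_i, x_i^2, ..., x_i^m)
    X = [[x ** j for j in range(m + 1)] for x in xs]
    XtX = [[sum(X[i][r] * X[i][c] for i in range(len(xs)))
            for c in range(m + 1)] for r in range(m + 1)]
    XtY = [sum(X[i][r] * ys[i] for i in range(len(xs))) for r in range(m + 1)]
    return solve(XtX, XtY)

# Noiseless data from y = 2 + 3x + 0.5x^2; the fit recovers the coefficients.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 + 3.0 * x + 0.5 * x * x for x in xs]
a_hat = polyfit(xs, ys, 2)
```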
MARS
MARS (multivariate adaptive regression splines) is a
weighted sum of basis functions:
f̂(x) = Σ_i c_i B_i(x)
Each basis function is a constant 1, a hinge function,
or a product of hinge functions:
max(0, x − c) or max(0, c − x)
Each step of the forward pass finds the pair of basis
functions that gives the maximum reduction in error
The backward pass prunes the model
MARS Versus Linear Regression
MARS: ŷ = 25 + 6.1 max(0, x − 13) − 3.1 max(0, 13 − x)
Linear regression: ŷ = −37 + 5.1x
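The two fits on this slide can be evaluated directly: the MARS model is piecewise linear with a knot at c = 13, with a different slope on each side, while the linear fit has a single slope everywhere (a small sketch, not part of the original slides):

```python
# The two models from the slide: a MARS fit with a knot at x = 13
# versus an ordinary linear fit.

def hinge_pos(x, c):
    return max(0.0, x - c)

def hinge_neg(x, c):
    return max(0.0, c - x)

def mars_model(x):
    return 25.0 + 6.1 * hinge_pos(x, 13.0) - 3.1 * hinge_neg(x, 13.0)

def linear_model(x):
    return -37.0 + 5.1 * x

# At the knot both hinges vanish, so mars_model(13) = 25; away from
# it the slope is 6.1 to the right and 3.1 to the left.
```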
Principal Component Analysis
Consider a set of N points in n-dimensional space:
{x 1 , x 2 ,... , x N }
Principal Component Analysis (PCA) looks for an n × m
linear transformation matrix W mapping the original
n-dimensional space into an m-dimensional feature
space, where m < n:
y_k = W^T x_k   (k = 1, …, N)
High variance is associated with more information
Principal Component Analysis
The scatter matrix of the transformed feature vectors is:
S_y = Σ_k (y_k − m_y)(y_k − m_y)^T = W^T S_x W
where S_x is the scatter matrix of the input vectors and
m_y is the mean of the y_k's
The projection is chosen to maximize the determinant of
the total scatter matrix of the projected samples:
W_opt = argmax_W det(W^T S_x W) = [w_1 w_2 … w_m]
{w_i : i = 1, …, m} is the set of eigenvectors
corresponding to the m largest eigenvalues of the scatter
matrix of the input vectors
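The eigenvector construction can be carried out by hand for a tiny 2-D data set: form the scatter matrix, take its leading eigenvector (closed form for a symmetric 2×2 matrix), and project. An illustrative sketch only, with deterministic data lying exactly along the direction (1, 2):

```python
import math

# PCA on a small 2-D data set: scatter matrix, leading eigenvector,
# projection onto the first principal direction.

points = [(t * 1.0, t * 2.0) for t in range(-2, 3)]   # data along direction (1, 2)

# mean and centered scatter matrix S = sum (x - m)(x - m)^T
mx = sum(p[0] for p in points) / len(points)
my = sum(p[1] for p in points) / len(points)
sxx = sum((p[0] - mx) ** 2 for p in points)
sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
syy = sum((p[1] - my) ** 2 for p in points)

# leading eigenvalue/eigenvector of [[sxx, sxy], [sxy, syy]]
tr, det = sxx + syy, sxx * syy - sxy * sxy
lam = 0.5 * (tr + math.sqrt(tr * tr - 4.0 * det))     # largest eigenvalue
vx, vy = sxy, lam - sxx                                # from (S - lam I) v = 0
norm = math.hypot(vx, vy)
w = (vx / norm, vy / norm)                             # first principal direction

# 1-D features y_k = w^T x_k capture all the variance of this data set
features = [w[0] * p[0] + w[1] * p[1] for p in points]
```

Because the data are exactly rank one, the second eigenvalue is zero and the single direction w carries all of the variance, matching the slide's point that high variance corresponds to retained information.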
PSUADE: How It Works
The input section allows the user to specify the number
of inputs, their names, their ranges, their distributions,
etc.
The driver program can be in any language, provided
that it is executable.
Run PSUADE with: [Linux] psuade psuade.in
At the completion of the runs, information is displayed
and a data file is created for further analysis
PSUADE Capabilities
Can study first-order sensitivities of individual input
parameters (main effect)
Can construct a relationship between input parameters
of the model and the output (response surface
modeling)
Can quantify the impact of a subset of parameters on
the output (global sensitivity analysis)
Can identify the subset of parameters accounting for
output variability (parameter screening)
PSUADE Capabilities
Monte Carlo, quasi-Monte Carlo, Latin hypercube
and variants, factorial, Morris method, Fourier
Amplitude Sensitivity Test (FAST), etc.
Simulator execution environment
Markov chain Monte Carlo for parameter estimation
and basic statistical analysis
Many different types of response surfaces
Many methods for main, second-order, and total-order
effect analyses
y = sin(x_1) + 7 sin^2(x_2) + 0.1 x_3^4 sin(x_1) (the Ishigami function)
Scatter plot of x1 and y
Linear regression (y with respect to x1)
Quadratic regression (y with respect to x1)
MARS (y with respect to x1)
y = sin(x_1) + 7 sin^2(x_2) + 0.1 x_3^4 sin(x_1)
Scatter plot of x2 and y
Linear regression (y with respect to x2)
Quadratic regression (y with respect to x2)
MARS (y with respect to x2)
y = sin(x_1) + 7 sin^2(x_2) + 0.1 x_3^4 sin(x_1)
Scatter plot of x3 and y
Linear regression (y with respect to x3)
Quadratic regression (y with respect to x3)
MARS (y with respect to x3)
Sensitivity Analysis
y = sin(x_1) + 7 sin^2(x_2) + 0.1 x_3^4 sin(x_1)
MARS screening rankings :
* Rank 1 : Input = 1 (score = 100.0)
* Rank 2 : Input = 3 (score = 0.0)
* Rank 3 : Input = 2 (score = 0.0)
MOAT Analysis (ordered):
Input 1 (mu*, sigma, dof) = 1.1011e-04 6.9425e-05 17
Input 3 (mu*, sigma, dof) = 0.0000e+00 0.0000e+00 -1
Input 2 (mu*, sigma, dof) = 0.0000e+00 0.0000e+00 -1
delta_test: perform Delta test:
Order of importance (based on 20 best configurations):
(D)Rank 1 : input 1 (score = 80 )
(D)Rank 2 : input 3 (score = 48 )
(D)Rank 3 : input 2 (score = 38 )
Sensitivity Analysis
y = sin(x_1) + 7 sin^2(x_2) + 0.1 x_3^4 sin(x_1)
Gaussian process-based sensitivity analysis:
* Rank 1 : Input = 1 (score = 100.0)
* Rank 2 : Input = 2 (score = 75.9)
* Rank 3 : Input = 3 (score = 5.9)
Sum-of-trees-based sensitivity analysis:
* SumOfTrees screening rankings (with bootstrapping)
* Minimum points per node = 10
* Rank 1 : Input = 1 (score = 100.0)
* Rank 2 : Input = 3 (score = 0.9)
* Rank 3 : Input = 2 (score = 0.0)
Correlation Analysis
y = sin(x_1) + 7 sin^2(x_2) + 0.1 x_3^4 sin(x_1)
Pearson correlation coefficients (PEAR) - linear relationship which gives a measure of relationship between X_i's & Y.
* Pearson Correlation coeff. (Input 1) = -8.526593e-01
* Pearson Correlation coeff. (Input 2) = -3.777038e-18
* Pearson Correlation coeff. (Input 3) = -2.356118e-18
Spearman coefficients (SPEA) - nonlinear relationship which gives a measure of relationship between X_i's & Y.
* Spearman coefficient(ordered) (Input 1 ) = 8.833944e-01
* Spearman coefficient(ordered) (Input 2 ) = 6.837607e-02
* Spearman coefficient(ordered) (Input 3 ) = 5.189255e-02
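Both coefficients can be reproduced from a plain Monte Carlo sample of the test function. A sketch only: the input ranges are assumed here to be [−π, π] (a common choice for this function; the slide does not state the ranges used, so the numbers will differ from the PSUADE output above):

```python
import math
import random

# Pearson (linear) and Spearman (rank/monotone) association between
# input x1 and the output y of the test function.
random.seed(0)

def f(x1, x2, x3):
    return math.sin(x1) + 7.0 * math.sin(x2) ** 2 + 0.1 * x3 ** 4 * math.sin(x1)

n = 20_000
xs1, ys = [], []
for _ in range(n):
    x1, x2, x3 = (random.uniform(-math.pi, math.pi) for _ in range(3))
    xs1.append(x1)
    ys.append(f(x1, x2, x3))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = math.sqrt(sum((u - ma) ** 2 for u in a))
    vb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (va * vb)

def ranks(a):
    order = sorted(range(len(a)), key=a.__getitem__)
    r = [0.0] * len(a)
    for rank, idx in enumerate(order):
        r[idx] = float(rank)
    return r

pear = pearson(xs1, ys)                 # linear association
spea = pearson(ranks(xs1), ranks(ys))   # Spearman = Pearson on ranks
```

Computing Spearman as Pearson on the ranks makes the relationship between the two measures explicit: Spearman detects any monotone association, not just a linear one.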
Main Effect Analysis
y = sin(x_1) + 7 sin^2(x_2) + 0.1 x_3^4 sin(x_1)
RS-based 1-input Sobol' decomposition:
RSMSobol1: Normalized VCE (ordered) for input 1 = 1.003211e+00
RSMSobol1: Normalized VCE (ordered) for input 2 = 9.395314e-32
RSMSobol1: Normalized VCE (ordered) for input 3 = 4.440130e-33
McKay's correlation ratio:
INPUT 1 = 7.27e-01 (raw = 2.02e-09)
INPUT 2 = 1.14e-11 (raw = 3.17e-20)
INPUT 3 = 1.77e-35 (raw = 4.92e-44)
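A crude main-effect (first-order variance contribution) estimate can be made by binning each input and comparing the variance of the per-bin output means with the total variance, Var(E[y|x_i]) / Var(y). A sketch only, again assuming input ranges of [−π, π] (the slide does not state the ranges, and the normalized values depend on them):

```python
import math
import random

# Binned estimate of the first-order variance contribution
# (main effect) of each input to the test function.
random.seed(1)

def f(x1, x2, x3):
    return math.sin(x1) + 7.0 * math.sin(x2) ** 2 + 0.1 * x3 ** 4 * math.sin(x1)

n, nbins = 100_000, 20
samples = []
for _ in range(n):
    xs = [random.uniform(-math.pi, math.pi) for _ in range(3)]
    samples.append((xs, f(*xs)))

ybar = sum(y for _, y in samples) / n
vtot = sum((y - ybar) ** 2 for _, y in samples) / n   # total variance Var(y)

def main_effect(i):
    # variance of the per-bin conditional means, normalized by Var(y)
    sums = [0.0] * nbins
    counts = [0] * nbins
    for xs, y in samples:
        b = min(nbins - 1, int((xs[i] + math.pi) / (2.0 * math.pi) * nbins))
        sums[b] += y
        counts[b] += 1
    means = [s / c for s, c in zip(sums, counts) if c > 0]
    m = sum(means) / len(means)
    return sum((mu - m) ** 2 for mu in means) / len(means) / vtot

vce = [main_effect(i) for i in range(3)]
```

Note that x3 has essentially zero first-order effect even though it matters through its interaction with x1; that interaction only shows up in second-order or total-order indices, which is why the slides report those separately.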
y = 100(x_2 − x_1^2)^2 + (1 − x_1)^2,   x_1, x_2 ∈ [−2, 2] (the Rosenbrock function)
Response surface analysis (MARS)
Response surface analysis (Linear regression)
y = 100(x_2 − x_1^2)^2 + (1 − x_1)^2,   x_1, x_2 ∈ [−2, 2]
Response surface analysis (Quadratic)
Response surface analysis (Cubic)
y = 100(x_2 − x_1^2)^2 + (1 − x_1)^2,   x_1, x_2 ∈ [−2, 2]
Response surface analysis (Sum-of-trees)
Response surface analysis (Quartic)
Future Research Directions
Resolving the curse of dimensionality
Representation of uncertainty
Bayesian computation & machine learning
techniques, e.g. stochastic multi-scale systems, for
model selection, classification & decision making
Visualization in high-dimensional spaces