Transcript Document

Error and Uncertainty
Scott Ferson, [email protected]
4 September 2007, Stony Brook University, MAR 550, Challenger 165
Scientific hubris
• Imprudent extrapolations
• Overfitting crimes against Occam
– e.g., 40 parameters, 25 data points
• Neglecting uncertainty
– in estimates, models and decisions
• Wishful thinking
– using values or models because they are convenient,
or because you hope they are true
Kansai International Airport
• 30 km from Kobe in Osaka Bay
• Artificial island made with fill
• Engineers told planners it’d sink [6, 8] m
• Planners elected to design for 6 m
• It’s sunk 9 m so far and is still sinking
(The operator of the airport denies these media reports)
[Diagram: 2 × 2 grid of outcomes]

                    Prudent analysis      Wishful thinking
Success             Good engineering      Dumb luck
Failure             Honorable failure     Negligence
“Uncertainties appear everywhere! … When using a
mathematical model, careful attention must be given to
uncertainties in the model.”
Richard Feynman
“Uncertainty quantification is the missing piece of the
puzzle in large scale computations.”
Tim Barth
“We have to make the best model we possibly can, and
then not trust it.”
Robert Costanza
Credible uncertainty analysis
• Decision makers far more likely to use
modeling results because they’d know the
outputs are good enough
• Program managers could focus research on
areas where uncertainty is intolerable
So how to do it?
• Direct statistical analysis of mechanistic model
– Monte Carlo simulation
– Latin hypercube and stratified sampling
– Response surface approaches
• Recast model as stochastic PDE and solve it
– Perturbation expansion methods for random fields
– Stochastic operator expansions
We need simple methods that don’t require
unreasonable assumptions or inordinate effort
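As a concrete illustration of the sampling methods named above, here is a minimal sketch contrasting plain Monte Carlo with Latin hypercube sampling for a single input. The lognormal input, its parameters, and the sample size are hypothetical choices for illustration, not values from the lecture; numpy and scipy are assumed.

```python
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(1)
N = 1000  # hypothetical sample size

# plain Monte Carlo: N independent uniform draws
u_mc = rng.uniform(size=N)

# Latin hypercube: one draw from each of N equal-probability strata, in random order
u_lhs = (rng.permutation(N) + rng.uniform(size=N)) / N

# push both through the inverse CDF of a hypothetical lognormal input
x_mc = lognorm.ppf(u_mc, s=0.5)
x_lhs = lognorm.ppf(u_lhs, s=0.5)

# LHS covers the input range more evenly, so sample statistics stabilize faster
print(x_mc.mean(), x_lhs.mean())
```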
Traditional uncertainty analyses
• Worst case bounding analysis
• Taylor series approximations (delta method)
• Normal theory propagation (ISO/NIST)
• Monte Carlo simulation
• Two-dimensional Monte Carlo
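For instance, the Taylor series (delta) method in the list above propagates small uncertainties through a model using first derivatives. The sketch below uses a hypothetical two-input function and numerical derivatives, and assumes the inputs are independent.

```python
import numpy as np

def f(x, y):
    # hypothetical model, standing in for any smooth function
    return x * np.exp(0.1 * y)

# means and standard deviations of the inputs (hypothetical values)
mx, my = 10.0, 2.0
sx, sy = 0.5, 0.3

# first-order (delta method) approximation, assuming independence:
#   var(f) ≈ (df/dx)^2 var(x) + (df/dy)^2 var(y), evaluated at the means
h = 1e-5
dfdx = (f(mx + h, my) - f(mx - h, my)) / (2 * h)
dfdy = (f(mx, my + h) - f(mx, my - h)) / (2 * h)

mean_f = f(mx, my)
sd_f = np.sqrt(dfdx**2 * sx**2 + dfdy**2 * sy**2)
print(mean_f, sd_f)
```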
Untenable assumptions
• Uncertainties are small
• Sources of variation are independent
• Uncertainties cancel each other out
• Linearized models good enough
• Underlying mechanisms are known and modeled
• Computations are inexpensive to make
Need ways to relax assumptions
• Possibly large uncertainties
• Non-independent, or unknown dependencies
• Uncertainties that may not cancel
• Arbitrary mathematical operations
• Model uncertainty
Kinds of uncertainty
• Variability
– aleatory uncertainty, stochasticity, randomness, Type A
• Incertitude
– epistemic uncertainty, imprecision, uncertainty, Type B
• Vagueness
– semantic uncertainty, fuzziness, multivalent uncertainty
• Confusion, etc.
Incertitude
• Arises from incomplete knowledge
• Incertitude arises from
– limited sample size
– mensurational limits (‘measurement error’)
– use of surrogate data
• Reducible with empirical effort
Variability
• Arises from natural stochasticity
• Variability arises from
– spatial variation
– temporal fluctuations
– genetic or manufacturing differences
• Not reducible by empirical effort
Propagating variability
• Probability theory can project variability in
inputs through mathematical models
• Suppose
– Doses of an environmental contaminant vary
among individuals
– Susceptibilities also vary independently among
those individuals
• Model both by probability distributions
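A minimal Monte Carlo sketch of this setup: dose and susceptibility are each modeled by a probability distribution, sampled independently, and combined. The lognormal shapes, their parameters, and the product model for individual effect are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000  # number of simulated individuals

# hypothetical distributions for dose and susceptibility (both vary among individuals)
dose = rng.lognormal(mean=0.0, sigma=0.5, size=N)
susceptibility = rng.lognormal(mean=-2.0, sigma=0.3, size=N)  # sampled independently

# illustrative response model: individual effect = dose × susceptibility
effect = dose * susceptibility

# the Monte Carlo sample approximates the distribution of effects in the population
print(np.percentile(effect, [5, 50, 95]))
```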
Propagating incertitude
Suppose
A is in [2, 4]
B is in [3, 5]
What can be said about the sum A+B?
The right answer is [5, 9]
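The same conclusion can be reached mechanically with interval arithmetic; a minimal sketch in plain Python, with no special library:

```python
# interval addition: [a_lo, a_hi] + [b_lo, b_hi] = [a_lo + b_lo, a_hi + b_hi]
def interval_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

A = (2, 4)
B = (3, 5)
print(interval_add(A, B))  # (5, 9): all that can be said about A + B
```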
They must be treated differently
• Variability should be modeled as randomness
with the methods of probability theory
• Incertitude should be modeled as ignorance
with the methods of interval analysis
Incertitude is common
• Periodic observations
When did the fish in my aquarium die during the night?
• Plus-or-minus measurement uncertainties
Coarse measurements, measurements from digital readouts
• Non-detects and data censoring
Chemical detection limits, studies prematurely terminated
• Privacy requirements
Epidemiological or medical information, census data
• Theoretical constraints
Concentrations, solubilities, probabilities, survival rates
• Bounding studies
Presumed or hypothetical limits in what-if calculations
Basic problems
• Representation of what’s (un)known
• Aggregation and updating
• Prediction
– Arithmetic expressions
– Logical expressions (fault or event trees)
– Differential equations
• Sensitivity analysis
• Validation
• Decision making
• Backcalculation
• Optimization
• Etc.
Two basic approaches
Deterministic calculation
Probabilistic convolution
Interval analysis
Example applications
• Plume travel time
• Dike reliability
• Endangered species
• Environmental pollution
Example: contaminant plume
• Hydrocarbon in groundwater near some wells
• Constant, one-dimensional, uniform Darcian flow
• Homogeneous properties (e.g., no pipes, conduits,
barriers or differential permeability among layers)
• Linear retardation
• No dispersion
• How long before the contaminant reaches the wells?
Plume travel time

T = (n + BD · foc · Koc) · L / (K · i)
Parameter                             Units    Min      Max     Mean      Stdv
L    source-receptor distance         m        80       120     100       11.55
i    hydraulic gradient               m/m      0.0003   0.0008  0.00055   0.0001443
K    hydraulic conductivity           m/yr     300      3000    1000      750
n    effective soil porosity          –        0.2      0.35    0.25      0.05
BD   soil bulk density                kg/m3    1500     1750    1650      100
foc  fraction organic carbon          –        0.0001   0.005   0.00255   0.001415
Koc  organic partition coefficient    m3/kg    5        20      10        3
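Given the formula above (as reconstructed here) and the Min/Max columns of the table, a worst-case interval bound on the travel time T follows directly; a minimal sketch, using only the tabulated ranges and the fact that every input is positive:

```python
# plume travel time: T = (n + BD*foc*Koc) * L / (K * i)
# with every input positive, T increases with n, BD, foc, Koc, L
# and decreases with K and i, so the interval endpoints are:
lo = dict(L=80,  i=0.0003, K=300,  n=0.20, BD=1500, foc=0.0001, Koc=5)
hi = dict(L=120, i=0.0008, K=3000, n=0.35, BD=1750, foc=0.005,  Koc=20)

def travel_time(L, i, K, n, BD, foc, Koc):
    return (n + BD * foc * Koc) * L / (K * i)

T_min = travel_time(lo["L"], hi["i"], hi["K"], lo["n"], lo["BD"], lo["foc"], lo["Koc"])
T_max = travel_time(hi["L"], lo["i"], lo["K"], hi["n"], hi["BD"], hi["foc"], hi["Koc"])
print(T_min, T_max)  # bounds on T in years; note how wide the interval is
```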
Example: dike reliability
[Figure: cross-section of the dike, showing the revetment blocks of thickness D on a slope of angle α, an incoming wave, and the sea level]
Reliability is strength minus stress:

Z = Δ·D − H·tan(α) / (cos(α) · M · √s)

Δ   relative density of the revetment blocks
D   revetment block thickness
H   significant wave height
α   slope of the revetment
s   offshore peak wave steepness
M   model parameter

What kind of information might be available about these variables?
[Figure: reliability function, shown as risk (cumulative probability, 0 to 1) versus Z]
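A minimal Monte Carlo sketch of the failure probability P(Z < 0) for this reliability function. All of the input distributions below are hypothetical placeholders chosen only to show the mechanics; they are not the values used in the actual dike study.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# hypothetical input distributions (placeholders, not the study's values)
delta = rng.normal(1.60, 0.05, N)              # relative density of the revetment blocks
D     = rng.normal(0.70, 0.05, N)              # revetment block thickness (m)
H     = rng.lognormal(np.log(1.0), 0.2, N)     # significant wave height (m)
alpha = np.full(N, np.radians(20.0))           # slope of the revetment (held fixed here)
s     = rng.normal(0.040, 0.005, N)            # offshore peak wave steepness
M     = rng.normal(4.0, 0.3, N)                # model parameter

# reliability: Z = delta*D - H*tan(alpha) / (cos(alpha) * M * sqrt(s))
Z = delta * D - H * np.tan(alpha) / (np.cos(alpha) * M * np.sqrt(s))

print("estimated failure probability P(Z < 0):", np.mean(Z < 0))
```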
Example: endangered species
• Northern spotted owl Strix occidentalis caurina
• Olympic Peninsula, Washington State
• Leslie matrix model (with composite age)
• Environmental and demographic stochasticity
• Density dependence (territorial, Allee effects)
• Catastrophic windstorms
IUCN threat criteria
Extinct (not sighted in the wild for 50 years)
Critical (50% risk of extinction in 18 years)
Endangered (20% risk of extinction in 89 years)
Vulnerable (10% risk of extinction in 100 years)
Nonthreatened (better than any of the above)
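The criteria above amount to a simple decision rule. A sketch of that logic, assuming a function risk(years) that returns the estimated probability of extinction within a given horizon (the function name and the example curve are hypothetical):

```python
def iucn_category(risk):
    """Classify threat level from an extinction-risk curve.

    `risk(years)` is a hypothetical interface: it returns the estimated
    probability of extinction within that many years. (The 'extinct'
    category depends on sightings, not modeled risk, so it is omitted.)
    """
    if risk(18) >= 0.50:
        return "critical"
    if risk(89) >= 0.20:
        return "endangered"
    if risk(100) >= 0.10:
        return "vulnerable"
    return "nonthreatened"

# illustrative use with a made-up risk curve
print(iucn_category(lambda years: min(1.0, 0.0015 * years)))  # -> "vulnerable"
```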
Leslie matrix model
| juveniles t+1 |   |      0       Fsubadults   Fadults |   | juveniles t |
| subadults t+1 | = | Sjuveniles       0           0    | × | subadults t |
| adults t+1    |   |      0       Ssubadults   Sadults |   | adults t    |
What kind of information might be available about these variables?
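A minimal numpy sketch of one time step of this Leslie matrix projection; the survival and fecundity values and the initial abundances are hypothetical placeholders.

```python
import numpy as np

# hypothetical stage-specific survivals (S) and fecundities (F)
S_juv, S_sub, S_adult = 0.30, 0.80, 0.85
F_sub, F_adult = 0.10, 0.30

# Leslie matrix with composite adult age class (structure from the slide)
A = np.array([
    [0.0,   F_sub, F_adult],   # juveniles produced by subadults and adults
    [S_juv, 0.0,   0.0    ],   # juveniles surviving to subadult
    [0.0,   S_sub, S_adult],   # subadults maturing, adults surviving
])

n_t = np.array([100.0, 50.0, 200.0])   # juveniles, subadults, adults at time t
n_next = A @ n_t                        # abundances at time t + 1
print(n_next)
```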
Risk of quasi-extinction
[Figure: cumulative probability of quasi-extinction (0 to 1) versus time (0 to 100 years), compared with the critical, endangered and vulnerable threshold curves]
Example: environmental pollution
Location:        Bayou d’Inde, Louisiana
Receptor:        generic piscivorous small mammal
Contaminant:     mercury
Exposure route:  diet (fish and invertebrates)
Based on the assessment described in “Appendix I2: Assessment of Risks to Piscivorus [sic]
Mammals in the Calcasieu Estuary”, Calcasieu Estuary Remedial Investigation/Feasibility Study
(RI/FS): Baseline Ecological Risk Assessment (BERA), prepared October 2002 for the U.S.
Environmental Protection Agency. See http://www.epa.gov/earth1r6/6sf/pdffiles/appendixi2.pdf.
Total daily intake from diet
FMR                normalized free metabolic rate
Cfish, Cinverts    mercury concentration in fish or invertebrate tissue
Pfish, Pinverts    proportion of fish or inverts in the mammal’s diet
BW                 body mass of the mammal
AEfish, AEinverts  assimilation efficiency for dietary fish or inverts
GEfish, GEinverts  gross energy of fish or invertebrate tissue
What kind of information might be available about these variables?
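The equation itself did not survive in this transcript; the sketch below therefore assumes one commonly used structural form of a dietary intake model, TDI = FMR × Σᵢ Pᵢ·Cᵢ / (AEᵢ·GEᵢ) / BW, which may not match the exact expression in the EPA assessment, and every numeric value is a hypothetical placeholder.

```python
# Assumed (not verbatim) structural form of the dietary intake model:
#   TDI = FMR * sum_i( P_i * C_i / (AE_i * GE_i) ) / BW
# Every numeric value below is a hypothetical placeholder.

def total_daily_intake(FMR, BW, diet):
    """diet: iterable of (P, C, AE, GE) tuples, one per prey type (fish, inverts)."""
    return FMR * sum(P * C / (AE * GE) for P, C, AE, GE in diet) / BW

diet = [
    (0.9, 0.3, 0.91, 1.2),   # fish:          Pfish, Cfish, AEfish, GEfish
    (0.1, 0.2, 0.87, 1.0),   # invertebrates: Pinverts, Cinverts, AEinverts, GEinverts
]
print(total_daily_intake(FMR=0.6, BW=1.0, diet=diet))
```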
Results
[Figure: exceedance risk (0 to 1) versus total daily intake, TDI, in mg kg⁻¹ day⁻¹ (0 to 0.2)]
How to use uncertainty results
When uncertainty makes no difference
(because results are so clear), bounding gives
confidence in the reliability of the decision
When uncertainty swamps the decision
(i) use results to identify inputs to study better, or
(ii) use other criteria within probability bounds
More complicated models
• It will not always be easy to propagate
uncertainty correctly through very complex
process models
• New methods are under development to do it
• It must be done
Contentions
• Biometry is insufficient
– Need decision analysis, ways to handle poor data
• Worst case analysis is misleading
– Usually ignores some knowledge or information
• Monte Carlo simulation alone is obsolete
– Need methods that handle incertitude
Ethic
• Failing to report uncertainty is lying
• Overstating uncertainty is cowardice
• Assumptions are a playground where honesty
and courage are developed
Everyone makes assumptions
• But not all sets of assumptions are equal
Point value          →  Interval range         →  Entire real line
Linear function      →  Monotone function      →  Any function
Normal distribution  →  Unimodal distribution  →  Any distribution
Independence         →  Known correlation      →  Any dependence
• Want to discharge unwarranted assumptions
“Certainties lead to doubt; doubts lead to certainty”
End
For next time
• Discuss an example from your discipline
where ignoring uncertainty led to a poor
result
• Discuss a situation in which you made an
assumption you knew was probably false
• Read Nikolaidis and Haftka