Transcript Document
Error and Uncertainty
Scott Ferson, [email protected]
4 September 2007, Stony Brook University, MAR 550, Challenger 165

Scientific hubris
• Imprudent extrapolations
• Overfitting (crimes against Occam)
  – e.g., 40 parameters, 25 data points
• Neglecting uncertainty
  – in estimates, models and decisions
• Wishful thinking
  – using values or models because they are convenient, or because you hope they are true

Kansai International Airport
• 30 km from Kobe in Osaka Bay
• Artificial island made with fill
• Engineers told planners it’d sink [6, 8] m
• Planners elected to design for 6 m
• It’s sunk 9 m so far and is still sinking
(The operator of the airport denies these media reports)

            Prudent analysis     Wishful thinking
Success     Good engineering     Dumb luck
Failure     Honorable failure    Negligence

“Uncertainties appear everywhere! … When using a mathematical model, careful attention must be given to uncertainties in the model.”  (Richard Feynman)

“Uncertainty quantification is the missing piece of the puzzle in large scale computations.”  (Tim Barth)

“We have to make the best model we possibly can, and then not trust it.”  (Robert Costanza)

Credible uncertainty analysis
• Decision makers far more likely to use modeling results because they’d know the outputs are good enough
• Program managers could focus research on areas where uncertainty is intolerable

So how to do it?
• Direct statistical analysis of mechanistic model
  – Monte Carlo simulation
  – Latin hypercube and stratified sampling
  – Response surface approaches
• Recast model as stochastic PDE and solve it
  – Perturbation expansion methods for random fields
  – Stochastic operator expansions
We need simple methods that don’t require unreasonable assumptions or inordinate effort

Traditional uncertainty analyses
• Worst case bounding analysis
• Taylor series approximations (delta method)
• Normal theory propagation (ISO/NIST)
• Monte Carlo simulation
• Two-dimensional Monte Carlo

Untenable assumptions
• Uncertainties are small
• Sources of variation are independent
• Uncertainties cancel each other out
• Linearized models good enough
• Underlying mechanisms are known and modeled
• Computations are inexpensive to make

Need ways to relax assumptions
• Possibly large uncertainties
• Non-independent, or unknown dependencies
• Uncertainties that may not cancel
• Arbitrary mathematical operations
• Model uncertainty

Kinds of uncertainty
• Variability – aleatory uncertainty, stochasticity, randomness, Type A
• Incertitude – epistemic uncertainty, imprecision, uncertainty, Type B
• Vagueness – semantic uncertainty, fuzziness, multivalent uncertainty
• Confusion, etc.

Incertitude
• Arises from incomplete knowledge
• Incertitude arises from
  – limited sample size
  – mensurational limits (‘measurement error’)
  – use of surrogate data
• Reducible with empirical effort

Variability
• Arises from natural stochasticity
• Variability arises from
  – spatial variation
  – temporal fluctuations
  – genetic or manufacturing differences
• Not reducible by empirical effort

Propagating variability
• Probability theory can project variability in inputs through mathematical models
• Suppose
  – Doses of an environmental contaminant vary among individuals
  – Susceptibilities also vary independently among those individuals
• Model both by probability distributions (a minimal simulation sketch follows the next slide)

Propagating incertitude
• Suppose A is in [2, 4] and B is in [3, 5]
• What can be said about the sum A + B?
• The right answer is [5, 9]
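For the dose/susceptibility setting two slides above, here is a minimal Monte Carlo sketch in Python; the lognormal parameters and the "affected when dose exceeds susceptibility" response rule are illustrative assumptions made for this sketch, not values or a model from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000   # simulated individuals

# Variability: doses and susceptibilities vary independently among individuals.
# The lognormal parameters are illustrative placeholders, not lecture values.
dose = rng.lognormal(mean=0.0, sigma=0.5, size=n)            # exposure per individual
susceptibility = rng.lognormal(mean=0.2, sigma=0.3, size=n)  # tolerated dose per individual

# Illustrative response rule: an individual is affected when its dose
# exceeds its own susceptibility threshold.
affected_fraction = np.mean(dose > susceptibility)
print(f"Estimated fraction of individuals affected: {affected_fraction:.3f}")
```

Because both inputs describe variation across individuals, the simulated fraction is itself a frequency, which is exactly the kind of quantity probability theory is designed to carry through a model.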
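For the incertitude example just above, interval analysis gives the answer directly; a minimal sketch with a toy Interval class written only for this illustration (not a library type):

```python
class Interval:
    """Toy closed interval [lo, hi]; only addition is needed for this example."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # The set of all possible sums of a value in self and a value in other.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

A = Interval(2, 4)
B = Interval(3, 5)
print("A + B =", A + B)   # prints A + B = [5, 9]
```

No assumption about where A and B lie inside their ranges, or about any distribution over them, is needed; that is the point of treating incertitude differently from variability.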
They must be treated differently
• Variability should be modeled as randomness with the methods of probability theory
• Incertitude should be modeled as ignorance with the methods of interval analysis

Incertitude is common
• Periodic observations
  – When did the fish in my aquarium die during the night?
• Plus-or-minus measurement uncertainties
  – Coarse measurements, measurements from digital readouts
• Non-detects and data censoring
  – Chemical detection limits, studies prematurely terminated
• Privacy requirements
  – Epidemiological or medical information, census data
• Theoretical constraints
  – Concentrations, solubilities, probabilities, survival rates
• Bounding studies
  – Presumed or hypothetical limits in what-if calculations

Basic problems
• Representation of what’s (un)known
• Aggregation and updating
• Prediction
  – Arithmetic expressions
  – Logical expressions (fault or event trees)
  – Differential equations
• Sensitivity analysis
• Validation
• Decision making
• Backcalculation
• Optimization
• Etc.

Two basic approaches (both generalize a deterministic calculation)
• Probabilistic convolution
• Interval analysis

Example applications
• Plume travel time
• Dike reliability
• Endangered species
• Environmental pollution

Example: contaminant plume
• Hydrocarbon in groundwater near some wells
• Constant, one-dimensional, uniform Darcian flow
• Homogeneous properties (e.g., no pipes, conduits, barriers or differential permeability among layers)
• Linear retardation
• No dispersion
• How long before the contaminant reaches the wells?

Plume travel time
T = L (n + BD foc Koc) / (K i)

Parameter  Description                    Units   Min     Max     Mean     Stdv
L          source-receptor distance       m       80      120     100      11.55
i          hydraulic gradient             m/m     0.0003  0.0008  0.00055  0.0001443
K          hydraulic conductivity         m/yr    300     3000    1000     750
n          effective soil porosity        –       0.2     0.35    0.25     0.05
BD         soil bulk density              kg/m3   1500    1750    1650     100
foc        fraction organic carbon        –       0.0001  0.005   0.00255  0.001415
Koc        organic partition coefficient  m3/kg   5       20      10       3

(An interval sketch of this travel-time calculation appears after the Leslie matrix slide below.)

Example: dike reliability
[Figure: dike cross-section showing the revetment blocks of thickness D, the sea level and an incoming wave]
Reliability is strength minus stress:
Z = Δ D - (H tan α) / (cos α M √s)
Δ   relative density of the revetment blocks
D   revetment block thickness
H   significant wave height
s   offshore peak wave steepness
α   slope of the revetment
M   model parameter
What kind of information might be available about these variables?
[Figure: reliability function shown as risk (cumulative probability) against Z]

Example: endangered species
• Northern spotted owl Strix occidentalis caurina
• Olympic Peninsula, Washington State
• Leslie matrix model (with composite age)
• Environmental and demographic stochasticity
• Density dependence (territorial, Allee effects)
• Catastrophic windstorms

IUCN threat criteria
Extinct (not sighted in the wild for 50 years)
Critical (50% risk of extinction in 18 years)
Endangered (20% risk of extinction in 89 years)
Vulnerable (10% risk of extinction in 100 years)
Nonthreatened (better than any of the above)

Leslie matrix model
| juveniles(t+1) |   | 0            F_subadults   F_adults |   | juveniles(t) |
| subadults(t+1) | = | S_juveniles  0             0        | × | subadults(t) |
| adults(t+1)    |   | 0            S_subadults   S_adults |   | adults(t)    |
What kind of information might be available about these variables?
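A minimal deterministic projection sketch for the stage-structured matrix above; the fecundity and survival values are placeholders rather than the spotted owl estimates, and the stochasticity, density dependence and catastrophes listed on the earlier slide are omitted.

```python
import numpy as np

# Stage-structured projection matrix from the slide above
# (stages: juveniles, subadults, adults; F = fecundity, S = survival).
# The numbers are placeholders for illustration, not the owl estimates.
F_sub, F_ad = 0.3, 0.4
S_juv, S_sub, S_ad = 0.3, 0.6, 0.85

A = np.array([
    [0.0,   F_sub, F_ad],
    [S_juv, 0.0,   0.0 ],
    [0.0,   S_sub, S_ad],
])

n = np.array([50.0, 30.0, 100.0])   # initial (juveniles, subadults, adults)
for _ in range(10):
    n = A @ n                        # one deterministic yearly projection

print("Abundances after 10 years:", np.round(n, 1))
print("Asymptotic growth rate:", round(float(np.abs(np.linalg.eigvals(A)).max()), 3))
```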
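Returning to the plume travel time example above: a minimal sketch that pushes the Min/Max columns of the parameter table through the reconstructed formula T = L (n + BD foc Koc) / (K i) with simple endpoint arithmetic. The endpoint pairing relies on the formula being monotone in each input, so treat it as an illustration of interval propagation rather than a definitive analysis.

```python
# Interval bounds (Min, Max) taken from the table above.
L   = (80, 120)          # source-receptor distance, m
i   = (0.0003, 0.0008)   # hydraulic gradient, m/m
K   = (300, 3000)        # hydraulic conductivity, m/yr
n   = (0.2, 0.35)        # effective soil porosity
BD  = (1500, 1750)       # soil bulk density, kg/m3
foc = (0.0001, 0.005)    # fraction organic carbon
Koc = (5, 20)            # organic partition coefficient, m3/kg

def travel_time(L, i, K, n, BD, foc, Koc):
    # Reconstructed retarded travel time: T = L (n + BD foc Koc) / (K i), in years
    return L * (n + BD * foc * Koc) / (K * i)

# T increases with L, n, BD, foc and Koc and decreases with K and i, and each
# input appears only once, so pairing the endpoints gives the exact interval for T.
T_lo = travel_time(L[0], i[1], K[1], n[0], BD[0], foc[0], Koc[0])
T_hi = travel_time(L[1], i[0], K[0], n[1], BD[1], foc[1], Koc[1])
print(f"Travel time T lies in [{T_lo:.0f}, {T_hi:.0f}] years")
```

The resulting interval is very wide, which is typical of pure bounding and is one reason the later slides contrast worst-case analysis with methods that also use distributional information where it exists.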
Risk of quasi-extinction
[Figure: cumulative probability of quasi-extinction over 0–100 years, with the critical, endangered and vulnerable thresholds marked]

Example: environmental pollution
Location: Bayou d’Inde, Louisiana
Receptor: generic piscivorous small mammal
Contaminant: mercury
Exposure route: diet (fish and invertebrates)
Based on the assessment described in “Appendix I2: Assessment of Risks to Piscivorus [sic] Mammals in the Calcasieu Estuary”, Calcasieu Estuary Remedial Investigation/Feasibility Study (RI/FS): Baseline Ecological Risk Assessment (BERA), prepared October 2002 for the U.S. Environmental Protection Agency. See http://www.epa.gov/earth1r6/6sf/pdffiles/appendixi2.pdf.

Total daily intake from diet
FMR               normalized free metabolic rate
Cfish, Cinverts   mercury concentration in fish or invertebrate tissue
Pfish, Pinverts   proportion of fish or invertebrates in the mammal’s diet
BW                body mass of the mammal
AEfish, AEinverts assimilation efficiency for dietary fish or invertebrates
GEfish, GEinverts gross energy of fish or invertebrate tissue
What kind of information might be available about these variables?
(An illustrative sketch of this intake calculation appears at the end of this transcript.)

Results
[Figure: exceedance risk against total daily intake, TDI, in mg/kg/day over 0–0.2]

How to use uncertainty results
• When uncertainty makes no difference (because results are so clear), bounding gives confidence in the reliability of the decision
• When uncertainty swamps the decision, (i) use results to identify inputs to study better, or (ii) use other criteria within probability bounds

More complicated models
• It will not always be easy to propagate uncertainty correctly through very complex process models
• New methods are under development to do it
• It must be done

Contentions
• Biometry is insufficient
  – Need decision analysis, ways to handle poor data
• Worst case analysis is misleading
  – Usually ignores some knowledge or information
• Monte Carlo simulation alone is obsolete
  – Need methods that handle incertitude

Ethic
• Failing to report uncertainty is lying
• Overstating uncertainty is cowardice
• Assumptions are a playground where honesty and courage are developed

Everyone makes assumptions
• But not all sets of assumptions are equal
    Point value           Interval range          Entire real line
    Linear function       Monotone function       Any function
    Normal distribution   Unimodal distribution   Any distribution
    Independence          Known correlation       Any dependence
• Want to discharge unwarranted assumptions

“Certainties lead to doubt; doubts lead to certainty”

End

For next time
• Discuss an example from your discipline where ignoring uncertainty led to a poor result
• Discuss a situation in which you made an assumption you knew was probably false
• Read Nikolaidis and Haftka
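Referring back to the Bayou d’Inde dietary exposure example above: the intake equation itself is not reproduced in this transcript, so the sketch below assumes a generic energy-normalized dietary intake of the form TDI = FMR · Σ C·P/(AE·GE) / BW built from the listed variables, with made-up input values. It illustrates only the shape of the calculation, not the BERA’s actual model or numbers.

```python
def total_daily_intake(FMR, BW, diet):
    """Assumed energy-normalized dietary intake (mg contaminant per kg body mass per day).

    diet: iterable of (C, P, AE, GE) tuples, one per prey type, where
      C  = contaminant concentration in prey tissue
      P  = proportion of that prey type in the diet (fractions summing to 1)
      AE = assimilation efficiency for that prey type
      GE = gross energy of the prey tissue
    FMR = normalized free metabolic rate, BW = body mass of the mammal.
    """
    return FMR * sum(C * P / (AE * GE) for C, P, AE, GE in diet) / BW

# Placeholder inputs for a small piscivorous mammal (illustrative only).
diet = [
    (0.3, 0.9, 0.91, 1200.0),   # fish:          C, P, AE, GE
    (0.1, 0.1, 0.87, 1000.0),   # invertebrates: C, P, AE, GE
]
print(f"TDI = {total_daily_intake(FMR=550.0, BW=0.55, diet=diet):.3f} mg/kg/day")
```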