#### Transcript Statistics * Bayes v Classical

UQ: from end to end Tony O’Hagan Outline Session 1: Quantifying input uncertainty Information Modelling Elicitation Coffee break: Propagation! Session 2: Model discrepancy and calibration All models are wrong Impact of model discrepancy Modelling model discrepancy 12/8/2014 UQ Summerschool 2014 2 UQ: from end to end Session 1: Quantifying input uncertainty Context You have a model To simulate or predict some real-world process I’ll call it a simulator For a given use of the simulator you are unsure of the true or correct values of inputs This uncertainty is a major component of UQ Propagating it through the simulator is a fundamental step in UQ We need to express that uncertainty in the form of probability distributions But how? I feel that this is a neglected area in UQ Distributions assumed, often with no discussion of where they came from 12/8/2014 UQ Summerschool 2014 4 Focus of this session Probability distributions for inputs Representing the analyst’s knowledge/uncertainty What they mean Interpretation of probability Where they come from Analysis of data and/or judgement Elicitation Principles Single input Multiple inputs Multiple experts 12/8/2014 UQ Summerschool 2014 5 The analyst The distributions should represent the best knowledge of the model user about the inputs I will refer to the model user as the analyst They are the analyst’s responsibility The analyst is the one who is interested in the simulator output For a particular application And some or all of the inputs refer specifically to that application The analyst must own the input distributions They should represent best knowledge Obviously! Anything else is unscientific Less input uncertainty means (generally) less output uncertainty 12/8/2014 UQ Summerschool 2014 6 What probability? Before we go further, we need to understand how a probability distribution represents someone’s knowledge The question goes right to the heart of what probability means Example: We are interested in X = the proportion of people infected with HIV who will develop AIDS within 10 years when treated with a new drug X will be an input to a clinical trial simulator To assist the pharmaceutical company in designing the drug’s development plan Analyst Mary expresses a probability distribution for X 12/8/2014 UQ Summerschool 2014 7 Mary’s distribution The stated distribution is shown on the right It specifies how probable any particular values of X are E.g. It says there is a probability of almost 0.7 that X is below 0.4 And the expected value of X is 0.35 It even gives a nontrivial probability to X being less than 0.2 Which would represent a major reduction in HIV progression 12/8/2014 UQ Summerschool 2014 8 How can X have probabilities? Almost everyone learning probability is taught the frequency interpretation The probability of something is the long run relative frequency with which it occurs in a very long sequence of repetitions How can we have repetitions of X? It’s a one-off: it will only ever have one value It’s that unique value we’re interested in Simulator inputs are almost always like this – they’re one-off! Mary’s distribution can’t be a probability distribution in that sense So what do her probabilities actually mean? And does she know? 12/8/2014 UQ Summerschool 2014 9 Mary’s probabilities Mary’s probability 0.7 that X < 0.4 is a judgement She thinks it’s more likely to be below 0.4 than above So in principle she would bet even money on it In fact she would bet £2 to win £1 (because 0.7 > 2/3) Her expectation of 0.35 is a kind of best estimate Not a long run average over many repetitions Her probabilities are an expression of her beliefs They are personal judgements You or I would have different probabilities We want her judgements because she’s the expert! We need a new definition of probability 12/8/2014 UQ Summerschool 2014 10 Subjective probability The probability of a proposition E is a measure of a person’s degree of belief in the truth of E If they are certain that E is true then P(E) = 1 If they are certain it is false then P(E) = 0 Otherwise P(E) lies between these two extremes Exercise 1 – How many Muslims in Britain? Refer to the two questions on your sheet The first asks for a probability Make your own personal judgement If you don’t already have a good feel for the probability scale, you may find it useful to think about betting odds The second asks for another probability 12/8/2014 UQ Summerschool 2014 11 Subjective includes frequency The frequency and subjective definitions of probability are compatible If the results of a very long sequence of repetitions are available, they agree Frequency probability equates to the long run frequency All observers who accept the sequence as comprising repetitions will give that frequency as their (personal/subjective) probability for the next (or any future) result in the sequence Subjective probability extends frequency probability But also seamlessly covers propositions that are not repeatable It’s also more controversial 12/8/2014 UQ Summerschool 2014 12 It doesn’t include prejudice etc! The word “subjective” has derogatory overtones Subjectivity should not admit prejudice, bias, superstition, wishful thinking, sloppy thinking, manipulation ... Subjective probabilities are judgements but they should be careful, honest, informed judgements As “objective” as possible without ducking the issue Using best practice Formal elicitation methods Bayesian analysis Probability judgements go along with all the other judgements that a scientist necessarily makes And should be argued for in the same careful, honest and informed way 12/8/2014 UQ Summerschool 2014 13 What about data? I’ve presented the analyst’s probability distributions as a matter of pure subjective judgement – what about data? Many possible scenarios: X is a parameter for which there is a published value Analyst has one or more direct experimental evaluations for X Analyst has data relating more or less directly to X Analyst has some hard data but also personal expertise about X Analyst relies on personal expertise about X Analyst seeks input from an expert on X … 12/8/2014 UQ Summerschool 2014 14 The case of a published value The published value may come with a completely characterised probability distribution for X representing uncertainty in the value The analyst simply accepts this distribution as her own judgement Or it may not The analyst needs to consider the uncertainty in X around the published value P X = P + E, where E is the error Analyst formulates her own probability distribution for E The published value P may simply come with a standard deviation The analyst accepts this as one judgement about E 12/8/2014 UQ Summerschool 2014 15 Using data – principles The appropriate framework for using data is Bayesian statistics Because it delivers a probability distribution for X Classical frequentist statistics can’t do that Even a confidence interval is not a probability statement about X The data are related to X through a likelihood function Derived from a statistical model This is combined with whatever additional knowledge the analyst may have In the form of a prior distribution Combination is performed by Bayes’ theorem The result is the analyst’s posterior distribution for X 12/8/2014 UQ Summerschool 2014 16 Using data – practicalities If data are highly informative about X, prior information may not matter Use a conventional non-informative prior distribution Otherwise the analyst formulates her own prior distribution for X Bayesian analysis can be complex Analyst is likely to need the services of a Bayesian statistician The likelihood/model is also a matter of judgement! Although I will not delve into this today 12/8/2014 UQ Summerschool 2014 17 Summary We have identified several situations where distributions need to be formulated by personal judgement No good data – analyst formulates distribution for X Published data does not have complete characterisation of uncertainty – analyst formulates distribution for E Data supplemented by additional expertise – analyst formulates prior distribution for X Analyst may seek judgements of one or more experts Rather than relying on her own Particularly when the stakes are high We have identified just one situation where personal judgements are not needed Published data with completely characterised uncertainty 12/8/2014 UQ Summerschool 2014 18 Elicitation The process of representing the knowledge of one or more persons (experts) concerning an uncertain quantity as a probability distribution for that quantity Typically conducted as a dialogue between the experts – who have substantive knowledge about the quantity (or quantities) of interest – and a facilitator – who has expertise in the process of elicitation ideally face to face but may also be done by video-conference, teleconference or online 12/8/2014 UQ Summerschool 2014 19 Some history The idea of formally representing uncertainty using subjective probability judgements began to be taken seriously in the 1960s For instance, for judgement of extreme risks Psychologists became interested How do people make probability judgements? What mental processes are used, and what does this tell us about the brain’s processing generally? They found many ways that we make bad judgements The heuristics and biases movement And continued to look mostly at how we get it wrong Since this told them a lot about our mental processes 12/8/2014 UQ Summerschool 2014 20 Meanwhile ... Statisticians increasingly made use of subjective probabilities Growth of Bayesian statistics Some formal elicitation but mostly unstructured judgements Little awareness of the work in psychology Reinforced recently by UQ with uncertain simulator inputs Our interests are more complex Not really interested in single probabilities Whole probability distributions Multivariate distributions We want to know how to get it right Psychology provides almost no help with these challenges 12/8/2014 UQ Summerschool 2014 21 Heuristics and biases Our brains evolved to make quick decisions Heuristics are short-cut reasoning techniques Allow us to make good judgements quickly in familiar situations Judgement of probability is not something that we evolved to do well The old heuristics now produce biases Anchoring and adjustment Availability Overconfidence And many others 12/8/2014 UQ Summerschool 2014 22 Anchoring and adjustment Exercise 1 was designed to exhibit this heuristic The probabilities should on average be different in the two groups When asked to make two related judgements, the second is affected by the first The second is judged relative to the first By adjustment away from the first judgement The first is called the anchor Adjustment is typically inadequate Second response too close to the first (anchor) Anchoring can be strong even when obviously not really relevant to the second question Just putting any numbers into the discussion creates anchors Exercise 1 12/8/2014 UQ Summerschool 2014 23 Availability The probability of an event is judged more likely if we can quickly bring to mind instances of it Things that are more memorable are deemed more probable High profile train accidents in the UK lead people to imagine rail travel is more risky than it really is My judgement of the risk of dying from a particular disease will be increased if I know (of) people who have the disease or have died from it Important for analyst to review all the evidence 12/8/2014 UQ Summerschool 2014 24 Overconfidence It is generally said that experts are overconfident When asked to give 95% intervals, say, far fewer than 95% contain the true value Several possible explanations Wish to demonstrate expertise Anchoring to a central estimate Difficulty of judging extreme events Not thinking ‘outside the box’ Expertise often consists of specialist heuristics Situations we elicit judgements on are not typical Probably over-stated as a general phenomenon Experts can be under-confident if afraid of consequences A matter of personality and feeling of security Evidence of over-confidence is not from real experts making judgements on serious questions 12/8/2014 UQ Summerschool 2014 25 The keys to good elicitation First, pay attention to the literature on psychology of elicitation How you ask a question influences the answer Second, ask about the right things Things that experts are likely to assess most accurately Third, prepare thoroughly Provide help and training for experts These are built into the SHELF system Sheffield Elicitation Framework 12/8/2014 UQ Summerschool 2014 26 The SHELF system SHELF is a package of documents and simple software to aid elicitation General advice on conducting the elicitation Templates for recording the elicitation Suitable for several different basic methods Annotated versions of the templates with detailed guidance Some R functions for fitting distributions and providing feedback SHELF is freely available and comments and suggestions for additions are welcomed Developed by Tony O’Hagan and Jeremy Oakley R functions by Jeremy http://tonyohagan.co.uk/shelf 12/8/2014 UQ Summerschool 2014 27 A SHELF template Word document Facilitator follows a carefully constructed sequence of questions Final step invites experts to give their own feed-back The tertile method One of several supported in SHELF 12/8/2014 UQ Summerschool 2014 28 Annotated template For facilitator’s guidance Advice on each field of the template Ordinary text says what is required in each field Text in brackets gives advice on how to work with experts Text in italics says why we are doing it this way Based on findings in psychology 12/8/2014 UQ Summerschool 2014 Let’s see how it works SHELF templates provide a carefully structured sequence of steps Informed by psychology and practical experience I’ll work through these, using the following illustrative example An Expert is asked for her judgements about the distance D between the airports of Paris Charles de Gaulle and Chicago O’Hare in miles She has experience of flying distances but has not flown this route before She knows that from LHR to JFK is about 3500 miles 12/8/2014 UQ Summerschool 2014 30 Credible range L to U Expert is asked for lower and upper credible bounds Expert would be very surprised if X was found to be below the lower credible bound or above the upper credible bound It’s not impossible to be outside the credible range, just highly unlikely Practical interpretation might be a probability of 1% that X is below L and 1% that it’s above U Example Expert sets lower bound L = 3500 CDG to ORD surely more than LHR to JFK Upper bound U = 5000 Additional flying distance for CDG to ORD surely less than 1500 12/8/2014 UQ Summerschool 2014 31 The median M The value of x for which the expert judges X to be equally likely to be above or below x Probability 0.5 (or 50%) below and 0.5 above Like a toss of a coin Or chopping the range into two equally probable parts If the expert were asked to choose either to bet on X < x or on X > x, he/she should have no preference It’s a specific kind of ‘estimate’ of X Need to think, not just go for mid-point of the credible range L = 0, U = 1 M = 0.36 Example Expert chooses median M = 4000 12/8/2014 UQ Summerschool 2014 32 The quartiles Q1 and Q3 The lower quartile Q1 is the p = 25% quantile The expert judges X < x to have probability 0.25 Like tossing two successive Heads with a coin Equivalently, x divides the range below the median into two equi-probable parts ‘Less than Q1’ & ‘between Q1 and M’ Should generally be closer to M than Q1 L = 0, U = 1 M = 0.36 Q1 = 0.25 Q3 = 0.49 Similarly, upper quartile Q3 is p = 75% Q1, M and Q3 divide the range into four equi-probable parts Example Expert chooses Q1 = 3850, Q3 = 4300 12/8/2014 UQ Summerschool 2014 33 Then fit a distribution Any convenient distribution As long as it fits the elicited summaries adequately SHELF has software for fitting a range of standard distributions At this point, the choice should not matter The idea is that we have elicited enough Any reasonable distribution choice will be similar to any other Elicitation can never be exact The elicited summaries are only approximate anyway If the choice does matter i.e. different fitted distributions give different answers to the problem for which we are doing the elicitation We can try to remove the sensitivity by eliciting more summaries Or involving more experts 12/8/2014 UQ Summerschool 2014 34 Exercise 2 So let’s do it! We’re going to elicit your beliefs about one of the following (you can choose!) Number of gold medals to be won by China in 2016 Olympics Length of the Yangtze River Population of Beijing in 2011 Proportion of the total world land area covered by China 12/8/2014 UQ Summerschool 2014 35 Do we need a facilitator? Yes, if the simulator output is sufficiently important A skilled facilitator is essential to get the most accurate and reliable representation of the expert’s knowledge At least for the most influential inputs Otherwise, no The analyst can simply quantify her own judgements But it’s still very useful to follow the SHELF process In effect, the analyst interrogates herself Playing the role of facilitator as well as that of expert 12/8/2014 UQ Summerschool 2014 36 Multiple inputs Hitherto we’ve basically considered just one input X In practice, simulators almost always have multiple inputs Then we need to think about dependence Two or more uncertain quantities are independent if: When you learn something about one of them it doesn’t change your beliefs about the others It’s a personal judgement, like everything else in elicitation! They may be independent for one expert but not for another Independence is nice Independent inputs can just be elicited separately 12/8/2014 UQ Summerschool 2014 37 Exercise 3 1. 2. 3. 4. Which of the following sets of quantities would you consider independent? The average weight B of black US males aged 40 and the average weight W of white US males aged 40 My height H and my age A The time T taken by the Japanese bullet train to travel from Tokyo to Kyoto and the distance D travelled The atomic numbers of Calcium (Ca), Silver (Ag) and Arsenic (As) 12/8/2014 UQ Summerschool 2014 38 Eliciting dependence If quantities are not independent we must elicit the nature and magnitude of dependence between them Remembering that probabilities are the best summaries to elicit Joint probabilities Probability that X takes some x-values and Y takes some y-values Conditional probabilities Probability that Y takes some y-values if X takes some x-values Much harder to think about than probabilities for a single quantity Perhaps the simplest is the quadrant probability Probability both X and Y are above their individual medians 12/8/2014 UQ Summerschool 2014 39 Bivariate quadrant probability First elicit medians Now elicit quadrant probability ? 0.5 Value indicates direction and strength of dependence Median 0.5 Median 12/8/2014 It can’t be negative Or more than 0.5 0.25 if X and Y are independent Greater if positively correlated 0.5 if when one is above its median the other must be Less than 0.25 if negatively correlated Zero if they can’t both be above their medians UQ Summerschool 2014 40 Higher dimensions This is already hard Just for two uncertain quantities In order to elicit dependence in any depth we will need to elicit several more joint or conditional probabilities More than two variables – more complex still! Even with just three quantities... Three pairwise bivariate distributions With constraints The three-way joint distribution is not implied by those, either We can’t even visualise or draw it! There is no clear understanding among elicitation practitioners on how to elicit dependence 12/8/2014 UQ Summerschool 2014 41 Avoiding the problem It would be so much easier if the quantities we chose to elicit were independent i.e. no dependence or correlation between them Then eliciting a distribution for each quantity would be enough We wouldn’t need to elicit multivariate summaries The trick is to ask about the right quantities Redefine inputs so they become independent This is called elaboration Or structuring 3 2 1 -3 -2 -1 y z0 x 0 1 2 3 -1 -2 -3 12/8/2014 UQ Summerschool 2014 42 Example – two treatment effects A clinical trial will compare a new treatment with an existing treatment Existing treatment effect A is relatively well known Expert has low uncertainty But added uncertainty due to the effects of the sample population New treatment effect B is more uncertain Evidence mainly from small-scale Phase III trial A and B will not be independent Mainly because of the trial population effect If A is at the high end of the expert’s distribution, she would expect B also to be relatively high Can we break this dependence with elaboration? 12/8/2014 UQ Summerschool 2014 43 Relative effect In the two treatments example, note that in clinical trials attention often focuses on the relative effect R = B/A When effect is bad, like deaths, this is called relative risk Expert may judge R to be independent of A Particularly if random trial effect is assumed multiplicative If additive we might instead consider A independent of D = B – A But this is unusual So elicit separate distributions for R and A The joint distribution of (A, B) is now implicit Can be derived if needed But often the motivating task can be rephrased in terms of (A, R) 12/8/2014 UQ Summerschool 2014 44 Trial effect Instead of simple structuring with the relative risk R, we can explicitly recognise the cause of the correlation Let T be the trial effect due to difference between the trial patients and the wider population Let E and N be efficacies of existing and new treatments in the wider population Then A = E x T and B = N x T Expert may be comfortable with independence of T, E and N With E well known, T fairly well known and N more uncertain We now have to elicit distributions for three quantities instead of two But can possibly assume them independent 12/8/2014 UQ Summerschool 2014 45 General principles Independence or dependence are in the head of the expert Two quantities are dependent if learning about one of them would change his/her beliefs about the other Explore possible structures with the expert(s) Find out how they think about these quantities Expertise often involves learning how to break a problem down into independent components SHELF does not yet handle multivariate elicitation But it does include an explicit structuring step Which we can now see is potentially very important! Templates for some special cases expected in the next release 12/8/2014 UQ Summerschool 2014 46 Multiple experts The case of multiple experts is important When elicitation is used to provide expert input to a decision problem with substantial consequences, we generally want to use the skill of as many experts as possible But they will all have different opinions Different distributions How do we aggregate them? In order to get a single elicited distribution 12/8/2014 UQ Summerschool 2014 47 Aggregating expert judgements 1. Two approaches Aggregate the distributions Elicit a distribution from each expert separately Combine them using a suitable formula For instance, simply average them Called ‘mathematical aggregation’ or ‘pooling’ 2. Aggregate the experts Get the experts together and elicit a single distribution Called ‘behavioural aggregation’ Neither is without problems 12/8/2014 UQ Summerschool 2014 48 Multiple experts in SHELF SHELF uses behavioural aggregation However, distributions are first elicited from experts separately After sharing of key information Allows facilitator to see the range of belief before aggregation Then experts discuss their differences With a view to assigning an aggregate distribution To represent what an impartial, intelligent observer might reasonably believe after seeing the experts’ judgements and hearing their discussions Facilitator can judge whether degree of compromise is appropriate to the intervening discussion 12/8/2014 UQ Summerschool 2014 49 Challenges in behavioural aggregation More psychological hazards Group dynamic – dominant/reticent experts Tendency to end up more confident Block votes Requires careful management What to do if they can’t agree? End up with two or more composite distributions Need to apply mathematical pooling to these But this is rare in practice 12/8/2014 UQ Summerschool 2014 50 Conclusions – Session 1 The analyst needs to supply probability distributions for uncertain inputs Probabilities are personal judgements But as objective and scientific as possible Distributions should represent her best knowledge A range of scenarios for specifying distributions From pure judgement When no good data are available To simply using a published value With quantification of uncertainty around that value Almost always, some part of the task will require distributions based on personal judgement E.g. prior distributions for Bayesian analysis of data 12/8/2014 UQ Summerschool 2014 51 Elicitation is the process of formulating knowledge about an uncertain quantity as a probability distribution Many pitfalls, practical and psychological SHELF is a set of protocols designed to avoid pitfalls An example of best practice in elicitation Templates to guide the facilitator or analyst through a structured sequence of steps Particular challenges arise when eliciting judgements about multiple inputs or from multiple experts 12/8/2014 UQ Summerschool 2014 52 And so to the coffee break Once we have specified input distributions, the next task is propagation of uncertainty through the simulator A well studied problem in UQ Polynomial chaos favoured by engineers, mathematicians Gaussian process emulators preferred by statisticians In my opinion, a more powerful and comprehensive UQ approach 12/8/2014 UQ Summerschool 2014 53 UQ: from end to end Session 2: Model discrepancy and calibration Case study – carbon flux Vegetation can be a major factor in mitigating the increase of CO2 in the atmosphere And hence reducing the greenhouse effect Through photosynthesis, plants take atmospheric CO2 Carbon builds new plant material and O2 is released But some CO2 is released again Respiration, death and decay The net reduction of CO2 is called Net Biosphere Production (NBP) I will refer to it as the carbon flux Complex processes modelled in SDGVM Sheffield Global Dynamic Vegetation Model 12/8/2014 UQ Summerschool 2014 55 SDGVM C flux outputs for 2000 Map of SDGVM estimates shows positive flux (C sink) in North, but negative (C source) in Midlands Total estimated flux is 9.06 Mt C Highly dependent on weather, so will vary greatly between years 12/8/2014 UQ Summerschool 2014 56 Quantifying input uncertainties Plant functional type parameters (growth characteristics) Expert elicitation Soil composition (nutrients and decomposition) Simple analysis from extensive (published) data Land cover (which PFTs are where) More complex Bayesian analysis of ‘confusion matrix’ data 12/8/2014 UQ Summerschool 2014 57 Elicitation Beliefs of expert (developer of SDGVMd) regarding plausible values of PFT parameters Four PFTs – Deciduous broadleaf (DBL), evergreen needleleaf (ENL), crops, grass Many parameters for each PFT Key ones identified by preliminary sensitivity analysis Important to allow for uncertainty about mix of species in a site and role of parameter in the model In the case of leaf life span for ENL, this was more complex 12/8/2014 UQ Summerschool 2014 58 ENL leaf life span 12/8/2014 UQ Summerschool 2014 59 Correlations PFT parameter value at one site may differ from its value in another Because of variation in species mix Common uncertainty about average over all species induces correlation Elicit beliefs about average over whole UK ENL joint distributions are mixtures of 25 components, with correlation both between and within years 12/8/2014 UQ Summerschool 2014 60 Soil composition Percentages of sand, clay and silt, plus bulk density Soil map available at high resolution Multiple values in each SDGVM site Used to form average (central estimate) And to assess uncertainty (variance) Augmented to allow for uncertainty in original data (expert judgement) Assumed independent between sites 12/8/2014 UQ Summerschool 2014 61 Land cover map LCM2000 is another high resolution map Obtained from satellite images Vegetation in each pixel assigned to one of 26 classes Aggregated to give proportions of each PFT at each site But data are uncertain Field data are available at a sample of pixels Countryside Survey 2000 Table of CS2000 class versus LCM2000 class is called the confusion matrix 12/8/2014 UQ Summerschool 2014 62 CS2000 versus LCM2000 matrix LCM2000 DBL CS2000 ENL Grass Crop Bare DBL 66 3 19 4 5 Enl Grass Crop Bare 8 31 7 2 20 5 1 0 1 356 41 3 0 22 289 8 0 15 9 81 Not symmetric Rather small numbers Bare is not a PFT and produces zero NBP 12/8/2014 UQ Summerschool 2014 63 Modelling land cover The matrix tells us about the probability distribution of LCM2000 class given the true (CS2000) class Subject to sampling errors But we need the probability distribution of true PFT given observed PFT Posterior probabilities as opposed to likelihoods We need a prior distribution for land cover We used observations in a neighbourhood Implicitly assuming an underlying smooth random field And the confusion matrix says nothing about spatial correlation of LCM2000 errors We again relied on expert judgement Using a notional equivalent number of independent pixels per site 12/8/2014 UQ Summerschool 2014 64 Overall proportions Red lines show LCM2000 proportions Clear overall biases Analysis gives estimates for all PFTs in each SDGVM site With variances and correlations 12/8/2014 UQ Summerschool 2014 65 Case study – results Following on the carbon flux case study, input uncertainties were propagated through the SDGVM simulator Extensive use of Gaussian process emulators 12/8/2014 UQ Summerschool 2014 66 Mean NBP corrections 12/8/2014 UQ Summerschool 2014 67 NBP standard deviations 12/8/2014 UQ Summerschool 2014 68 Aggregate across 4 PFTs Mean NBP 12/8/2014 Standard deviation UQ Summerschool 2014 69 England & Wales aggregate Plug-in estimate (Mt C) Mean (Mt C) Variance (Mt C2) Grass 5.28 4.37 0.2453 Crop 0.85 0.43 0.0327 Deciduous 2.13 1.80 0.0221 Evergreen 0.80 0.86 0.0048 PFT Covariances Total 12/8/2014 -0.0081 9.06 7.46 UQ Summerschool 2014 0.2968 70 Sources of uncertainty The total variance of 0.2968 is made up as follows Variance due to PFT and soil inputs = 0.2642 Variance due to land cover uncertainty = 0.0105 Variance due to interpolation/emulation = 0.0222 Land cover uncertainty much larger for individual PFT contributions Dominates for ENL But overall tends to cancel out Changes estimates Larger mean corrections and smaller overall uncertainty But we haven’t addressed what is probably the biggest source of uncertainty in this carbon flux problem … 12/8/2014 UQ Summerschool 2014 71 Notation A simulator takes a number of inputs and produces a number of outputs We can represent any output y as a function y = f (x) of a vector x of inputs 12/8/2014 UQ Summerschool 2014 72 Example: A simple machine (SM) A machine produces an amount of work y which depends on the amount of effort x put into it Ideally, y = f(x, β) = βx Parameter β is the rate at which effort can be converted to work True value of β is β* = 0.65 Data zi = yi + εi Graph shows observed data Points lie below y = 0.65x For large enough x Because of losses due to friction etc. Large relative to observation errors 12/8/2014 UQ Summerschool 2014 73 The SM as a simulator A simulator produces output from inputs When we consider calibration we divide its inputs into Calibration parameters – unknown but fixed Control variables – known features of application context Calibration concerns learning about the calibration parameters Using observations of the real process Extrapolation concerns predicting the real process At control variable values beyond where we have observations We can view the SM as a (very simple) simulator x is a control variable, β is a calibration parameter 12/8/2014 UQ Summerschool 2014 74 Tuning and physical parameters Calibration parameters may be physical or just for tuning We adjust tuning parameters so the model fits reality better We are not really interested in their ‘true’ values We calibrate tuning parameters for prediction Physical parameters are different We are often really interested in true physical values The SM’s efficiency parameter β is physical It’s the theoretically achievable efficiency in the absence of friction We like to think that calibration can help us learn about them 12/8/2014 UQ Summerschool 2014 75 Exercise 4 Look at the four datasets, and in each case estimate the best fitting slope β Draw a line by eye through the origin Using a straight-edge Read off the slope as the y value on the line when x = 1 Write that value beside the graph The actual best-fitting calibrated values are: Dataset 1 – 0.58 Dataset 2 – 0.58 Dataset 3 – 0.66 Dataset 4 – 0.57 12/8/2014 UQ Summerschool 2014 76 Calibrating the SM It’s basically a simple linear regression through the origin zi = βxi + εi Calibration Posterior distribution misses the true value completely More data makes things worse More and more tightly concentrated on the wrong value We could use a quadratic regression but the problem would remain 12/8/2014 UQ Summerschool 2014 77 The problem is completely general Calibrating (inverting, tuning, matching) a wrong model gives parameter estimates that are wrong Not equal to their true physical values – biased With more data we become more sure of these wrong values The SM is a trivial model, but the same conclusions apply to all models All models are wrong In more complex models it is just harder to see what is going wrong Even with the SM, it takes a lot of data to see any curvature in reality What can we do about this? 12/8/2014 UQ Summerschool 2014 78 Model discrepancy The SM example suggests that we need to allow that the model does not correctly represent reality For any values of the calibration parameters The simulator outputs deviate systematically from reality Model discrepancy (or model bias or model error) There is a difference between the model with best/true parameter values and reality r(x) = f(x, θ) + δ(x) where δ(x) represents this discrepancy and will typically itself have uncertain parameters We observe zi = r(xi) + εi = f(xi, θ) + δ(xi) + εi 12/8/2014 UQ Summerschool 2014 79 SM revisited Kennedy and O’Hagan (2001) introduced this model discrepancy Modelled it as a zero-mean Gaussian process They claimed it acknowledges additional uncertainty And mitigates against over-fitting of θ So add this model discrepancy term to the linear model of the simple machine r(x) = βx + δ(x) With δ(x) modelled as a zero-mean GP Posterior distribution of β now behaves quite differently 12/8/2014 UQ Summerschool 2014 80 SM – calibration, with discrepancy Posterior distribution much broader and doesn’t get worse with more data But still misses the true value 12/8/2014 UQ Summerschool 2014 81 Interpolation Main benefit of simple GP model discrepancy is prediction E.g. at x = 1.5 Prediction within the range of the data is possible And gets better with more data 12/8/2014 UQ Summerschool 2014 82 But when it comes to extrapolation … … at x = 6 More data doesn’t help because it’s all in the range [0, 4] Prediction OK here but gets worse for larger x 12/8/2014 UQ Summerschool 2014 83 Extrapolation One reason for wish to learn about physical parameters Should be better for extrapolation than just tuning Without model discrepancy The parameter estimates will be biased Extrapolation will also be biased Because best fitting parameter values are different in different parts of the control variable space With more data we become more sure of these wrong values With GP model discrepancy Extrapolating far from the data does not work No information about model discrepancy Prediction just uses the (calibrated) simulator 12/8/2014 UQ Summerschool 2014 84 We haven’t solved the problem With simple GP model discrepancy the posterior distribution for θ is typically very wide Increases the chance that we cover the true value But is not very helpful And increasing data does not improve the precision Similarly, extrapolation with model discrepancy gives wide prediction intervals And may still not be wide enough What’s going wrong here? 12/8/2014 UQ Summerschool 2014 85 Nonidentifiability Formulation with model discrepancy is not identifiable For any θ, there is a δ(x) to match reality perfectly Reality is r(x) = f(x, θ) + δ(x) Given θ, model discrepancy is δ(x) = r(x) – f(x, θ) Suppose we had an unlimited number of observations We would learn reality’s true function r(x) exactly Within the range of the data Interpolation works But we would still not learn θ It could in principle be anything And we would still not be able to extrapolate reliably 12/8/2014 UQ Summerschool 2014 86 The joint posterior Calibration leads to a joint posterior distribution for θ and δ(x) But nonidentifiability means there are many equally good fits (θ, δ(x)) to the data Induces strong correlation between θ and δ(x) This may be compounded by the fact that simulators often have large numbers of parameters (Near-)redundancy means that different θ values produce (almost) identical predictions Sometimes called equifinality Within this set, the prior distributions for θ and δ(x) count 12/8/2014 UQ Summerschool 2014 87 The importance of prior information The nonparametric GP term allows the model to fit and predict reality accurately given enough data Within the range of the data But it doesn’t mean physical parameters are correctly estimated The separation between original model and discrepancy is unidentified Estimates depend on prior information Unless the real model discrepancy is just the kind expected a priori the physical parameter estimates will still be biased To learn about θ in the presence of model discrepancy we need better prior information And this is also crucial for extrapolation 12/8/2014 UQ Summerschool 2014 88 Better prior information For calibration Prior information about θ and/or δ(x) We wish to calibrate because prior information about θ is not strong enough So prior knowledge of model discrepancy is crucial In the range of the data For extrapolation All this plus good prior knowledge of δ(x) outside the range of the calibration data That’s seriously challenging! In the SM, a model for δ(x) that says it is zero at x = 0, then increasingly negative, should do better 12/8/2014 UQ Summerschool 2014 89 Inference about the physical parameter We conditioned the GP δ(0) = 0 δ′(0) = 0 δ′(0.5) < 0 δ′(1.5) < 0 12/8/2014 UQ Summerschool 2014 90 Prediction x = 1.5 12/8/2014 x=6 UQ Summerschool 2014 91 Where is the uncertainty? Return to the general case How might the simulator output y = f (x) differ from the true real-world value z that the simulator is supposed to predict? Error in inputs x Initial values Forcing inputs Model parameters Error in model structure or solution Wrong, inaccurate or incomplete science Bugs, solution errors 12/8/2014 UQ Summerschool 2014 92 Quantifying uncertainty The ideal is to provide a probability distribution p(z) for the true real-world value The centre of the distribution is a best estimate Its spread shows how much uncertainty about z is induced by uncertainties on the previous slide How do we get this? Input uncertainty: characterise p(x), propagate through to p(y) Model discrepancy: characterise p(z - y) 12/8/2014 UQ Summerschool 2014 93 The hard part We know pretty well how to do uncertainty propagation Uncertainties associated with the simulator output The hard part is the link to reality The difference between the real system z and the simulator output y = f(x) using best input values Because all models are wrong (Box, 1979) It was through thinking about this link … Particularly in the context of calibration i.e. learning about uncertain parameters in the model And also extrapolation … that I was led to think more deeply about parameters And to realise just how important model discrepancy is 12/8/2014 UQ Summerschool 2014 94 Modelling model discrepancy 1. Three rules: Must account for model discrepancy Ignoring it leads to biased calibration, over-optimistic predictions 2. Discrepancy term must be modelled nonparametrically Allows learning about reality and interpolative prediction 3. Model must incorporate realistic knowledge about discrepancy To get unbiased learning about physical parameters and extrapolation But following these rules is hard Ongoing research 12/8/2014 UQ Summerschool 2014 95 Managing uncertainty To understand the implications of different uncertainty sources Probabilistic, variance-based sensitivity analysis Helps with targeting and prioritising research To reduce uncertainty, get more information! Informal – more/better science Tighten p(x) through improved understanding Tighten p(z - y) through improved modelling or programming Formal – using real-world data Calibration – learn about model parameters Data assimilation – learn about the state variables Learn about model discrepancy z - y Validation (another talk!) 12/8/2014 UQ Summerschool 2014 96 Conclusions – Session 2 Without model discrepancy Inference about physical parameters will be wrong And will get worse with more data The same is true of prediction Both interpolation and extrapolation With crude GP model discrepancy Interpolation inference is OK And gets better with more data But we still get physical parameters and extrapolation wrong The better our prior knowledge about model discrepancy The more chance we have of getting physical parameters right Also extrapolation But then we need even better prior knowledge 12/8/2014 UQ Summerschool 2014 97 Any final questions? It remains just to say thank you for sitting through this morning’s sessions! 12/8/2014 UQ Summerschool 2014 98