Analyzing input and structural uncertainty
of a hydrological model with stochastic,
time-dependent parameters
Peter Reichert
Eawag Dübendorf and ETH Zürich,
Switzerland
Eawag: Swiss Federal Institute of Aquatic Science and Technology
Contents
• Motivation
• Approach
• Implementation
• Application
• Discussion
Data-driven and physically-based models, IMS, Singapore, Jan. 2008
Motivation
• Environmental modelling is often based on deterministic models that describe substance and organism mass balances in environmental compartments.
• Statistical inference with such models is often based on the assumption that the data are independently and identically distributed around the predictions of the deterministic model at "true" parameter values.
• The concept underlying this approach is that the deterministic model describes the "true" system behaviour and that the probability distributions centered at the model predictions describe the measurement process.
• Empirical evidence often demonstrates the invalidity of these statistical assumptions:
  • Residuals are often heteroscedastic and autocorrelated.
  • The residual error is usually (much) larger than the measurement error.
• This leads to incorrect results of statistical inference. In particular, parameter and model output uncertainty are usually underestimated.
• These obviously wrong results lead to the abandonment of the statistical approach and to the development of conceptually poorer techniques in the applied sciences.
• We are interested in a statistically satisfying approach to this problem.
Suggested solution (Kennedy and O'Hagan, 2001, and many earlier, more case-specific approaches): Extend the model by a discrepancy or bias term.
Replace:
$Y_M(x, \theta) = y_M(x, \theta) + E_y$
by:
$Y_M(x, \theta) = y_M(x, \theta) + B + E_y$
where $y_M$ = deterministic model, $x$ = model inputs, $\theta$ = model parameters, $E_y$ = observation error, $B$ = bias or model discrepancy, $Y_M$ = random variable representing model results.
The bias term is usually formulated as a non-parametric statistical description of the model deficits (typically as a Gaussian stochastic process).
Advantage of this approach: A statistical description of the model discrepancy improves the uncertainty analysis.
Disadvantage: Lack of understanding of the cause of the discrepancy still makes it difficult to extrapolate.
We are interested in a technique that supports the identification of the causes of these discrepancies and their reduction.
There are three generic causes of failure of the description of nature with a deterministic model plus measurement error:
1. Errors in deterministic model structure.
2. Errors in model input.
3. Inadequacy of a deterministic description of systems that contain intrinsic non-deterministic behaviour due to
  • influence factors not considered in the model,
  • model simplifications (e.g. aggregation, adaptation, etc.),
  • chaotic behaviour not represented by the model.
Because of these deficits we cannot expect a deterministic model to describe nature appropriately.
Pathway for improving models:
1. Reduce errors in deterministic model structure to improve average behaviour.
2. Add adequate stochasticity to the model structure to account for random influences.
This requires the combination of statistical analyses with scientific judgment.
This talk is about supporting this process with statistical techniques.
Approach
Questions:
1. How to make a deterministic, continuous-time model stochastic?
2. How to distinguish between deterministic and stochastic model deficits?
• Replacement of differential equations (representing conservation laws) by stochastic differential equations can violate conservation laws and does not address the cause of stochasticity directly.
• It seems conceptually more satisfying to replace model parameters (such as rate coefficients) by stochastic processes, as stochastic external influence factors usually affect rates and fluxes rather than states directly. The model then consists of an extended set of stochastic differential equations, some of which have zero noise.
Note that the basic idea of this approach is very old. The original formulation was, however, limited to linear or weakly nonlinear, discrete-time systems with slowly varying driving forces (e.g. Beck 1987).
The bias term approach is a special case of our approach that consists of an additive output parameter.
Our suggestion is to
• extend this original approach to continuous-time and nonlinear models;
• allow for rapidly varying external forces;
• embed the procedure into an extended concept of statistical "bias-modelling" techniques.
Implementation
Model
Deterministic model: $y_M(x, \theta^M)$
Consideration of observation error: $Y_M(x, \theta^M) = y_M(x, \theta^M) + E_y$
[Diagram: the inputs $x$ and the parameters $\theta^M$ enter the model $M$, which yields the output $Y_M$.]
Model with parameter $i$ time-dependent:
[Diagram: the parameters $\theta^P_i$ govern the time-dependent parameter $\theta^M_{t,i}$, which enters the model $M^{(i)}$ together with the inputs $x$ and the remaining constant parameters $\theta^M_{-i}$ to yield the output $Y_M^{(i)}$.]
Time-Dependent Parameter
The time-dependent parameter is modelled by a mean-reverting Ornstein-Uhlenbeck process.
This has the advantage that we can use the analytical solution, either in its original form or after reparameterization.
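For orientation, one common way to write a mean-reverting Ornstein-Uhlenbeck process and its analytical transition distribution is sketched here. The symbols $\gamma$ (rate of mean reversion), $\mu$ (mean) and $\sigma_W$ (noise intensity) are assumptions introduced for this sketch, and the notation on the original slide may differ. The reparameterized form uses a correlation time $\tau = 1/\gamma$ and an asymptotic standard deviation $\sigma = \sigma_W/\sqrt{2\gamma}$. The second argument of $N(\cdot,\cdot)$ is a variance.

```latex
\begin{align}
  d\theta_t &= -\gamma\,(\theta_t - \mu)\,dt + \sigma_W\,dW_t \\
  \theta_t \mid \theta_s &\sim N\!\Big(\mu + (\theta_s - \mu)\,e^{-\gamma (t-s)},\;
      \frac{\sigma_W^2}{2\gamma}\big(1 - e^{-2\gamma (t-s)}\big)\Big), \qquad t > s \\
  \theta_t \mid \theta_s &\sim N\!\Big(\mu + (\theta_s - \mu)\,e^{-(t-s)/\tau},\;
      \sigma^2\big(1 - e^{-2 (t-s)/\tau}\big)\Big), \qquad
      \tau = \frac{1}{\gamma},\quad \sigma = \frac{\sigma_W}{\sqrt{2\gamma}}
\end{align}
```

Because the transition distribution is Gaussian and known in closed form, realizations at arbitrary time points can be drawn exactly, without numerically integrating the stochastic differential equation.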
Inference
We combine the estimation of
• constant model parameters, $\theta^M_{-i}$, with
• state estimation of the time-dependent parameter(s), $\theta^M_{t,i}$, and with
• the estimation of (some of the) constant parameters of the Ornstein-Uhlenbeck process of the time-dependent parameter(s), $\theta^P_i$.
Gibbs sampling for the three different types of parameters (Tomassini et al. 2007).
[Diagram of the conditional distributions for $\theta^P_i$, $\theta^M_{-i}$ and $\theta^M_{t,i}$, given the inputs $x$ and the output $Y_M^{(i)}$: factors involving the simulation model are expensive to evaluate, factors involving only the Ornstein-Uhlenbeck process are cheap.]
Metropolis-Hastings sampling for each type of parameter (Tomassini et al. 2007):
Multivariate normal jump distributions for the parameters $\theta^M$ and $\theta^P$. This requires one simulation per proposed new value of $\theta^M$.
The discretized Ornstein-Uhlenbeck parameter, $\theta^M_{t,i}$, is split into subintervals for which OU-process realizations conditional on the initial and end points are sampled. This requires as many simulations as there are subintervals per complete new time series of $\theta^M_{t,i}$.
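The cost argument for the constant parameters can be illustrated with a minimal random-walk Metropolis sketch (the symmetric-proposal special case of Metropolis-Hastings) using a multivariate normal jump distribution. This is not the implementation of Tomassini et al.; the function names, the toy log-posterior standing in for the expensive simulation model, and the jump covariance are all hypothetical.

```python
import numpy as np

def metropolis(log_post, theta0, jump_cov, n_iter, rng):
    """Random-walk Metropolis with a multivariate normal jump distribution."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)                       # one evaluation for the start value
    chain = np.empty((n_iter, theta.size))
    n_accept = 0
    for k in range(n_iter):
        proposal = rng.multivariate_normal(theta, jump_cov)
        lp_prop = log_post(proposal)           # one evaluation per proposed value
        # symmetric proposal: accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            n_accept += 1
        chain[k] = theta
    return chain, n_accept / n_iter

# Hypothetical toy target standing in for "simulation model + likelihood + prior".
calls = {"n": 0}

def toy_log_post(theta):
    calls["n"] += 1                            # count the "simulations"
    return -0.5 * np.sum((theta - np.array([1.0, -2.0])) ** 2)

rng = np.random.default_rng(0)
chain, acceptance = metropolis(toy_log_post, [0.0, 0.0], 0.5 * np.eye(2), 5000, rng)
```

The counter `calls["n"]` makes the cost explicit: every proposed value triggers exactly one evaluation of the (here trivial) model, which is the expensive step in a real application.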
Application
Hydrological Model
Simple Hydrological Watershed Model (1) (Kuczera et al. 2006):
[Diagram: three reservoirs (soil, ground water, river) linked by the fluxes q_rain, q_et, q_runoff, q_gw, q_lat, q_bf, q_dp and q_r.]
Simple Hydrological Watershed Model (2) (Kuczera et al. 2006):
$q_{gw} = f_{sat}\, q_{gw,max}$
$q_{bf} = k_{bf}\, h_{gw}$
$q_{dp} = k_{dp}\, h_{gw}$
$q_{lat} = f_{sat}\, q_{lat,max}$
$q_r = k_r\, h_r$
$Q_r = f_Q\, A_w\, q_r$
[Diagram as in (1), annotated with the labels 1 to 8 and A to C.]
Parameter count: 8 model parameters, 3 initial conditions, 1 standard deviation of the observation error, 3 "modification parameters".
Simple Hydrological Watershed Model (3):
Model Application
• Data set of the Abercrombie watershed, New South Wales, Australia (2770 km²), kindly provided by George Kuczera (Kuczera et al. 2006).
• Box-Cox transformation applied to model results and data to decrease the heteroscedasticity of the residuals.
• Step function input to account for input data in the form of daily sums of precipitation and potential evapotranspiration.
• Daily averaged output to account for output data in the form of daily averaged discharge.
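As a reminder of what the transformation does, here is a minimal sketch of a two-parameter Box-Cox transformation and its inverse. The slides do not state the transformation parameters; the values `lam1 = 0.3` and `lam2 = 0.1` used below are assumptions chosen purely for illustration.

```python
import numpy as np

def box_cox(y, lam1, lam2=0.0):
    """Two-parameter Box-Cox transform: ((y + lam2)**lam1 - 1) / lam1, log for lam1 == 0."""
    y = np.asarray(y, dtype=float)
    if lam1 == 0.0:
        return np.log(y + lam2)
    return ((y + lam2) ** lam1 - 1.0) / lam1

def box_cox_inverse(z, lam1, lam2=0.0):
    """Inverse of box_cox for the same lam1 and lam2."""
    z = np.asarray(z, dtype=float)
    if lam1 == 0.0:
        return np.exp(z) - lam2
    return (lam1 * z + 1.0) ** (1.0 / lam1) - lam2

# Assumed, illustrative parameter values (not given in the slides).
lam1, lam2 = 0.3, 0.1
discharge = np.array([0.0, 1.0, 10.0, 100.0])      # m^3/s
transformed = box_cox(discharge, lam1, lam2)
recovered = box_cox_inverse(transformed, lam1, lam2)
```

For `lam1 < 1` the transformation compresses large values more strongly than small ones, so a constant error variance on the transformed scale corresponds to an error that grows with discharge on the original scale.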
Analysis with Constant Parameters
• Estimation of 12 model parameters:
  8 rate parameters
  3 initial conditions
  1 measurement standard deviation
• Priors: Independent lognormal distributions for all parameters with the exception of the measurement standard deviation $\sigma$ (prior proportional to $1/\sigma$).
• Modification factors ($f_{rain}$, $f_{pet}$, $f_Q$) kept equal to unity.
Results for Constant Parameters
[Figure: one panel per estimated parameter: hs.ini, hgw.ini, hr.ini, ks, sF, ket, qlat.max, qgw.max, kdp, kbf, kr, Q.trans.]
The results show the typical deficiencies of deterministic models:
• Residuals are heteroscedastic and autocorrelated.
• The standard deviation of the residuals is larger than the measurement error (increasing from 0.24 m³/s at a discharge of zero to 30 m³/s at 100 m³/s).
• Model predictions are overconfident.
In addition, the ground water level trend seems unrealistic.
Deficiency Analysis
Deficiency Analysis / Step 1
Step 1: Estimation of time-dependent parameters
• Estimation of 11 time-dependent parameters:
  8 rate parameters
  3 modification factors ($f_{rain}$, $f_{pet}$, $f_Q$)
• Ornstein-Uhlenbeck process applied to the log of each parameter in turn.
  Hyperparameters: $\tau$ = 1 d and $\sigma$ = 0.2 (22%) fixed; only the initial value and the mean are estimated (mean fixed at 0 for log $f_{rain}$, log $f_{pet}$, log $f_Q$).
• Constant parameters as before.
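To give a feeling for what such a prior on a log-parameter looks like, the following sketch draws an exact realization of an Ornstein-Uhlenbeck process with correlation time 1 d and asymptotic standard deviation 0.2, and exponentiates it to obtain a multiplicative factor. The function name, the time grid and the random seed are hypothetical; the closed-form Gaussian transition is the standard one for this process.

```python
import numpy as np

def simulate_ou(times, mean, sd, tau, x0, rng):
    """Draw an OU path at the given times using the exact Gaussian transition."""
    times = np.asarray(times, dtype=float)
    x = np.empty(times.size)
    x[0] = x0
    for k in range(1, times.size):
        decay = np.exp(-(times[k] - times[k - 1]) / tau)
        cond_mean = mean + (x[k - 1] - mean) * decay      # pulled back towards the mean
        cond_sd = sd * np.sqrt(1.0 - decay ** 2)          # approaches sd for long steps
        x[k] = rng.normal(cond_mean, cond_sd)
    return x

rng = np.random.default_rng(1)
times = np.arange(0.0, 2000.0, 0.25)                      # days, assumed grid
log_factor = simulate_ou(times, mean=0.0, sd=0.2, tau=1.0, x0=0.0, rng=rng)
factor = np.exp(log_factor)                               # e.g. a rain multiplier
```

A standard deviation of 0.2 on the log scale means the factor typically varies by about 20% around unity; since exp(0.2) is roughly 1.22, this is presumably the origin of the 22% quoted on the slide.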
Deficiency Analysis / Step 2
Step 2: Analyzing Degree of Bias Reduction
• As the quality of fit is insufficient (residual standard deviation larger than measurement error), the quality of fit is a primary indicator of bias (provided that care is taken to avoid overfitting).
• Reduction of autocorrelation can be checked as a secondary criterion (it is likely to be accompanied by a reduction of the residual standard deviation).
Improvement of fit:
Nash-Sutcliffe indices (parameter made time-dependent, index):
frain      0.90
ks         0.84
fQ         0.67
sF         0.63
fpet       0.60
kr         0.57
ket        0.54
qlat,max   0.54
kdp        0.53
qgw,max    0.52
kbf        0.52
base       0.51
Assessment:
• Input (frain) and output (fQ) modifications.
• Potential for soil / runoff model (ks, sF) improvements.
• Some potential for river and evaporation improvements.
Random or deterministic?
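For readers unfamiliar with the index, a minimal sketch of the Nash-Sutcliffe efficiency follows. It is one minus the ratio of the residual sum of squares to the sum of squares of the observations around their mean. The slides do not say whether the index was computed on transformed or original discharge, and the small data vector below is made up for illustration.

```python
import numpy as np

def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 for a perfect fit, 0 for a mean-only model."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual_ss = np.sum((observed - simulated) ** 2)
    total_ss = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual_ss / total_ss

# Made-up observations, only to exercise the two reference cases.
obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
perfect = nash_sutcliffe(obs, obs)
mean_only = nash_sutcliffe(obs, np.full_like(obs, obs.mean()))
```

On this scale, a model that reproduces the observations exactly scores 1, and a model that only predicts the observed mean scores 0.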
Deficiency Analysis / Step 3
Step 3: Identification of Potential Dependences
• Despite an exploratory analysis of the dependence of the time-dependent parameter values on all model states and inputs, no significant dependences could be found.
• This is an indication that it may be difficult to improve the deterministic model, or that the improvement will be restricted to a small number of data points.
Deficiency Analysis / Step 4
Step 4: Improvement of Deterministic Model
Extension 1: Modification of runoff flux.
Extension 2: Modification of saturated area function.
Extension 1 has two, extension 2 three additional model parameters.
Model Extensions:
Previous results:
Nash-Sutcliffe indices: frain 0.90, ks 0.84, fQ 0.67, sF 0.63, fpet 0.60, kr 0.57, ket 0.54, qlat,max 0.54, kdp 0.53, qgw,max 0.52, kbf 0.52, base 0.51.
Extended models:
Nash-Sutcliffe indices: ext. 1 0.73, ext. 2 0.51.
[Figures: three comparisons of the original model with the modified model.]
Conclusions of Step 4
• The significant increase in the Nash-Sutcliffe index is caused by the elimination of a small number of outliers.
• All other deficiencies remain.
• This is the reason why the improvement could not be detected in the exploratory analysis.
• It seems questionable whether the remaining deficiencies could be significantly reduced by improvements of the deterministic model.
Deficiency Analysis / Step 5
Step 5: Addition of Stochasticity to the Model
Major sources of indeterminism:
• Spatial aggregation: Aggregation of distributed reservoirs into a much smaller number of reservoirs in the model leads to the same model results for different "states of nature" (which lead to different results in nature).
• Rainfall uncertainty: Spatial heterogeneity of rainfall intensity is not well captured by point rainfall measurements.
It seems reasonable to summarize these sources of indeterminism by a stochastic rain modification factor $f_{rain}$.
To quantify input uncertainty (combined with aggregation error) we need an informative prior for the measurement error.
We choose $\sigma_{Q,trans} \sim N(0.5, 0.05)$.
A value of 0.5 corresponds to a standard deviation in original units increasing from 0.1 m³/s at a discharge of zero to 12.6 m³/s at a discharge of 100 m³/s.
The standard deviation of the Ornstein-Uhlenbeck process for log $f_{rain}$ is now estimated from the data.
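The correspondence between a standard deviation on the transformed scale and one in original units can be sketched with a first-order (delta-method) approximation: the original-scale standard deviation is the transformed-scale value divided by the derivative of the transformation. The Box-Cox parameters are not given in the slides; the values `lam1 = 0.3` and `lam2 = 0.1` below are assumptions, chosen because they approximately reproduce the quoted figures.

```python
def sd_original_units(discharge, sd_trans, lam1, lam2):
    """First-order back-transformation of a standard deviation.

    For g(q) = ((q + lam2)**lam1 - 1) / lam1 the derivative is
    g'(q) = (q + lam2)**(lam1 - 1), so sd_orig ~ sd_trans / g'(q).
    """
    return sd_trans * (discharge + lam2) ** (1.0 - lam1)

# Assumed transformation parameters (not stated in the slides).
lam1, lam2 = 0.3, 0.1
sd_at_zero = sd_original_units(0.0, 0.5, lam1, lam2)     # close to 0.1 m^3/s
sd_at_100 = sd_original_units(100.0, 0.5, lam1, lam2)    # close to 12.6 m^3/s
```

Under the same assumptions, a transformed-scale standard deviation of 1.2 would give roughly 0.24 m³/s at zero discharge and about 30 m³/s at 100 m³/s, which is the range quoted for the residuals of the constant-parameter model.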
Time-dependent parameter frain:
[Figure: two panels showing the time series labelled 2_frain against time (0 to 1500).]
[Figure: one panel per estimated parameter: hs.ini, hgw.ini, hr.ini, ks, sF, ket, qlat.max, qgw.max, kdp, kbf, kr, Krain, Q.trans, log f rain.]
[Figure: q_rain and Q over the range 850 to 1050 for the original model and for the modified model with time-dependent parameter frain.]
[Figure: hs and hgw over the range 0 to 1500 for the original model and for the modified model with time-dependent parameter frain.]
[Figure: nondimensional residuals over the range 0 to 1500 for the original model and for the modified model with time-dependent parameter frain.]
Discussion
• The suggested procedure seems to fulfil the expectations of supporting the identification of model deficits and of introducing stochasticity into a deterministic model.
• It is related to and can be viewed as a generalization of previous work on
  • time-dependent parameters using Kalman filtering (e.g. Beck and Young 1976),
  • modelling of bias of deterministic models (Craig et al. 1996, Kennedy and O'Hagan 2001, Bayarri et al. 2005, etc.),
  • rainfall multipliers (Kuczera 1990, Kavetski et al. 2001, etc.).
• There is a need for future research in the following areas:
  • Explore alternative ways of learning from the identified parameter time series.
  • Different formulations of time-dependent parameters (for some applications, smoother behaviour).
  • Include multiple time-dependent parameters in the analysis.
  • Use a more specific model to represent input uncertainty.
  • Improve efficiency (linearization, emulation).
  • Learn from more applications.
Acknowledgements
• Collaboration for this paper: Johanna Mieleitner.
• Development of the technique: Hans-Rudolf Künsch, Roland Brun, Christoph Buser, Lorenzo Tomassini, Mark Borsuk.
• Hydrological example and data: George Kuczera.
• Interactions at SAMSI: Susie Bayarri, Tom Santner, Gentry White, Ariel Cintron, Fei Liu, Rui Paulo, Robert Wolpert, John Paul Gosling, Tony O'Hagan, Bruce Pitman, Jim Berger, and many more.