- Lorentz Center

Download Report

Transcript - Lorentz Center

Modelling of forecast errors in
Data Assimilation for NWP
"Uncertainty Analysis in Geophysical Inverse Problems" Lorentz Centre Workshops in
Sciences series, Leiden, Netherlands. Nov 2011. Andrew Lorenc
© Crown copyright Met Office
Content
0. Audience-specific orientation!
1. Historical context – what is best for NWP?
2. Bayesian methods and error modelling
3. Adding non-Gaussianity
© Crown copyright Met Office Andrew Lorenc 2
© Crown copyright Met Office Andrew Lorenc 2
© Crown copyright Met Office Andrew Lorenc ECMWF Seminar 2011 3
Data Assimilation for
Numerical Weather Prediction
An assimilation cycle is an adaptive filter process:
1.Every 6~12 hours we make a forecast from the current best
estimate, using best available computer forecast model
2.The DA process also provides an estimate of the error
distribution of this forecast
3.We do a Bayesian combination of the prior from 1 & 2 with a
batch of observations, to give a new best estimate valid 6~12
hours later.
4.We measure the properties of the actual forecast error, and at
intervals adapt the algorithm used in 2 (and also in 1 & 3).
Data Assimilation = The Scientific Method
© Crown copyright Met Office Andrew Lorenc ECMWF Seminar 2011 4
Historical Background:
What has been important for getting the
best NWP forecast?(over last 3 decades)
NWP systems are improving by 1 day of predictive
skill per decade. This has been due to:
1. Model improvements, especially resolution.
2. Careful use of forecast & observations, allowing
for their information content and errors.
3. Advanced assimilation using forecast model:
4D-Var has been successful.
4. Better observations.
© Crown copyright Met Office Andrew Lorenc 5
© Crown copyright Met Office Andrew Lorenc 5
Performance Improvements
“Improved by about a day per decade”
Met Office RMS surface pressure error over the N. Atlantic & W. Europe
© Crown copyright Met Office Andrew Lorenc 6
Peak Flops
60 Years of Met Office Computers
1.E+15
Moore’s Law
1.E+14
18month doubling
time
1.E+13
1.E+12
1.E+11
1.E+10
1.E+09
1.E+08
1.E+07
1.E+06
1.E+05
KDF 9
1.E+04
1.E+03
Mercury
LEO 1
1.E+02
1950
1960
1970
IBM Power -Phase 1&2
Cray T3E
NEC SX6/8
Cray C90
Cray YMP
Cyber 205
IBM 360
1980
1990
Year of First Use
© Crown copyright Met Office Andrew Lorenc 7
2000
2010
Historical Background:
Continuing Improvement of a Complex System
• NWP improvements are due to a synergistic combination
of improvements of forecast model, DA and observations.
• Each helps the other.
• Total NWP system is very large, complex and expensive.
• We cannot expect to understand it completely as a single entity.
• We cannot afford thorough testing of each improvement.
• Best to base each improvement on scientific insight and
analysis of one component.
• With belief (checked by testing) that theoretically better parts will
eventually give a better system.
© Crown copyright Met Office Andrew Lorenc 8
Implications for R&D strategy of
causes of improvements to NWP
1. Model improvements, especially resolution.
• NWP models are very large (109 variables) and
expensive to run. Nevertheless there are known
shortcomings which could be addressed by making
then larger and more expensive.
• DA improvements have to compete with the model for
the use of computer upgrades. We cannot expect a
large increase in the relative resources devoted to
assimilation.
• Rely on linear DA methods to cope with 109 variables
and millions of observations. Selected important
nonlinearities can be added as corrections.
© Crown copyright Met Office Andrew Lorenc 9
Ratio of global computer costs:
ratio of supercomputer costs:
11 day’s
(total incl. /FC)
/ 1forecast
day’s forecast.
day's DA
assimilation
1 day
100
Computer power increased by 1M in 30 years.
Only 0.04% of the Moore’s Law increase over
this time went into improved DA algorithms,
rather than improved resolution!
10
31
20
4D-Var
with
outer_loop
simple
4D-Var
on SX8
8
3D-Var on
T3E
5
AC scheme
1
1985
1990
© Crown copyright Met Office Andrew Lorenc 10
1995
2000
2005
2010
Operational NWP Models: 20th July 2011
Global
25km 70L
Hybrid 4DVAR – 60km inner loop
60h forecast twice/day
144h forecast twice/day
+24member EPS at 60km 2x/day
NAE
12km 70L
4DVAR – 24km
60h forecast
 4 times per day
 +24member EPS at 18km 2x/day
UK-V (& UK-4)
1.5km 70L
3DVAR (3 hourly)
36h forecast
Crown Copyright
Office
Met
Office Global Regional Ensemble Prediction System = MOGREPS
 4 ©times
per day2011. Source: Met
Implications for R&D strategy of
causes of improvements to NWP
2. Careful use of forecast & observations,
allowing for their information content and
errors.
• The forecast background summarises information
from past observations. It is as accurate as most
observations.
• Use incremental DA methods which correct the
background based on a Bayesian updating with
observed information.
• Need to understand the real information content of
observations, e.g. satellite soundings observe
radiances, not temperature profiles.
© Crown copyright Met Office Andrew Lorenc 12
Simplest possible Bayesian NWP
analysis
© Crown copyright Met Office Andrew Lorenc 13
Simplest possible example – 2 grid-points,
1 observation. Standard notation:
Ide, K., Courtier, P., Ghil, M., and Lorenc, A.C. 1997: "Unified notation for data assimilation:
Operational, Sequential and Variational" J. Met. Soc. Japan, Special issue "Data
Assimilation in Meteorology and Oceanography: Theory and Practice." 75, No. 1B, 181—189
 x1 
x= 
x 
 2
Model is two grid points:
y o = y o 
1 observed value yo midway (but use notation for >1):
Can interpolate an estimate y of the observed value:
y  H x   x1  x2  Hx  
1
2
1
2
1
2
1
2
 x1 
 
 x2 
This example H is linear, so
we can use matrix notation for
fields as well as increments.
© Crown copyright Met Office Andrew Lorenc 14
Bayes theorem in continuous
form, to estimate a value x given an
observation yo
o
p(
| x)p(x)
y
o
p(x| y ) =
p( yo )
p(xyo)
p(x)
p(yox)
Can get
p(yo)
is the posterior distribution,
is the prior distribution,
is the likelihood function for x
o
o
p(
)
=

p(
| x)p(x)dx
y
y
by integrating over all x:
© Crown copyright Met Office Andrew Lorenc 15
background pdf

px  = 2V  exppx1  = 2Vb  exp - 12
We have prior estimate
xb1 with error variance Vb:
- 12
2
b
But errors in x1 and x2 are usually correlated
 must use a multi-dimensional Gaussian:
p( x1  x 2 ) = px  = (2 )
2
1
2
x  x 
x  x 

V 
1
b 2
1
Vb
2
b 2
2
b
x ~ N (x : x b , B)

 1
b T
1
b 
| B | exp - x - x  B x - x 
 2

- 12
where B is the covariance matrix:
© Crown copyright Met Office Andrew Lorenc 16
1
2
 1 

B =V b
  1


background pdf
© Crown copyright Met Office Andrew Lorenc 17
Observational errors
Lorenc, A.C. 1986: "Analysis methods for numerical weather prediction."
Quart. J. Roy. Met. Soc., 112, 1177-1194.

o
t
y ~ N y ,E


instrumental error
o
- 12
p y | y = 2 |E| exp - 12 y o  y



 E y  y 
T
1
o
Difference between the actual instrument and a hypothetical perfect instrument.
For instance, for a wind observation from a radiosonde, the errors in tracking the
balloon might lead to an instrumental error of about 1ms-1.
y t ~ N H x ,F 
error of representativeness
 




T 1
 1 t

t
pt y |x = 2π|F| exp - y  H x  F y  H x 
 2

t
- 12
Errors from H and model resolution in predicting the value from a perfect instrument.
For instance, the error predicting a radiosonde wind from a model with grid-length
200km would be about 3ms-1, and a grid-length of 20km would reduce the error of
representativeness to about 1ms-1.
© Crown copyright Met Office Andrew Lorenc 18
Observational errors
Lorenc, A.C. 1986: "Analysis methods for numerical weather prediction."
Quart. J. Roy. Met. Soc., 112, 1177-1194.
Observational error
combines these 2 :
y ~ N H x ,E+F 
o
 

  
p y o|x =  p y o|y t pt y t|x dy


 E+F  y
= 2π|E+F| exp - y  H x 
- 12
1
2
o
T
-1
o
Instrumental and representativeness errors are convolved to give a
combined observational error: R=E+F.
Note that we have assumed observations and background are unbiased.
Treatment of biases is discussed in another lecture.
We have also assumed that observational errors are uncorrelated with
background errors.
© Crown copyright Met Office Andrew Lorenc 19

 H x 
background pdf
obs likelihood function
© Crown copyright Met Office Andrew Lorenc 20
Bayesian analysis equation
 
 
o
p
|x px 
y
p x| yo =
p yo
 
x ~ N x a , A
Property of Gaussians that, if H is linearisable :
where
xa
and A are defined by:
A 1  B 1  HT R 1H

x a  xb  AHT R 1 y o  H (xb )
Remember that observational error: R=E+F.
© Crown copyright Met Office Andrew Lorenc 21

background pdf
obs likelihood function
posterior analysis PDF
© Crown copyright Met Office Andrew Lorenc 22
Summary Equations
- all equivalent.
• Variational
x minimises J  x  
a
1
2
x  x 
• Kalman Filter. Kalman Gain=K.
• Observation space
b T
B
1
x  x   y
b
1
2
 H  x   R 1  y o  H  x  
 2 J 
1
A   2   B 1  HT R 1H
 x 

x a  xb  K y o  H  xb 
K  BH  HBH  R 
T
T
T
o

A   I  KH  B
1
Demonstrate equivalence using Sherman–Morrison–Woodbury formula
• Model space
K   HT R 1H  B 1  HT R 1
1
B  Z f Z f
• Ensemble space Square-root Filters, e.g. ETKF
T
Za  Z f T
A  Z Z
a
© Crown copyright Met Office Andrew Lorenc 23


a T
Options in solution methods
1. Global or local
•
Local allows observation selection, localising correlations
•
Global avoids seams, simplifies logic, allows use of model
operators & hence extends to 4D-Var.
2. Explicit or iterative
•
•
Iterative is cheaper (allowing bigger problems to be solved)
and can be extended to weakly nonlinear analysis.
Explicit matrix inversion give analysis error covariance.
Useful in QC and in some ensemble methods.
© Crown copyright Met Office Andrew Lorenc 24
Deterministic 4D-Var
Initial PDF is approximated by a Gaussian.
Descent algorithm only explores a small part of the PDF,
on the way to a local minimum.
4D analysis is a trajectory of the full model,
optionally augmented by a model error correction term.
© Crown copyright Met Office Andrew Lorenc 25
When does deterministic 4D-Var using
“automatic” adjoint methods not work?
Thermostats: - Fast processes which are modulated to maintain a
longer-time-scale “balance” (e.g. boundary layer fluxes).
Limits to growth: - Fast processes which in a nonlinear model are limited
by some available resource (e.g. evaporation of raindrops).
Butterflies: - Fast processes which are not predictable over a long 4DVar
time-window. (e.g. eddies with short space- & time-scales).
Observations of intermittent processes: - If something (e.g. a cloud or
rain) is missing from a state, then the gradient does not say what to
do to make it appear.
These are fundamental atmospheric processes – it is impossible to
write a good NWP model without representing them.
© Crown copyright Met Office Andrew Lorenc 26
Statistical, incremental 4D-Var
PF model evolves any simplified perturbation,
and hence covariance of PDF
Simplified
Gaussian
PDF t1
Simplified
Gaussian
PDF t0 Full
model evolves mean of PDF
Statistical 4D-Var approximates entire PDF by a Gaussian.
4D analysis increment is a trajectory of the PF model,
optionally augmented by a model error correction term.
© Crown copyright Met Office Andrew Lorenc 27
Statistical 4D-Var - equations
Independent, Gaussian
background and model
errors  non-Gaussian
pdf for general y:

P  x,  η y
o
  exp    x   x

exp  
exp 
Incremental linear approximations in
forecasting model predictions of
observed values converts this to an
approximate Gaussian pdf:
The mean of this
approximate pdf is identical
to the mode, so it can be
found by minimising:
Lorenc (2003a)
© Crown copyright Met Office Andrew Lorenc 28
1
2
1
2
 η  η 
1
2
y  y 
g
T
o

x
b
T
g

T

B 1  x   xb  x g 

Q 1  η  ηg

R 1 y  y o

 

y  HM  x, η  H M x g , ηg



  x   x  x   B  x   x
  η  η  Q  η  η 
 y  y  R y  y 
J  x,  η 
b
1
2
g
1
2
1
2
g
o
T
T
T
1
1
1
g
o

b
 xg 

Modelling and representing prior
background error covariances B.
• Explicit point-point [multivariate] covariance functions.
• Transformed control variables to deal with inter-variable
covariances.
• Vertical – horizontal split
• EOF decomposition into modes.
• Spectral decomposition into waves.
• Wavelets.
• Recursive filters or diffusion operators to give local
variations.
• Evolved covariances – Ensemble members.
© Crown copyright Met Office Andrew Lorenc 29
Schlatter’s (1975) multivariate covariances
Specified as
multivariate 2-point
functions.
Not easy to ensure
that specified
functions are
actually valid
covariances.
Used in OI and
related observationspace methods.
© Crown copyright Met Office Andrew Lorenc 30
Transformed control variable.
• Choose a “balanced” variable from which we can
calculate balanced flow in all variables: e.g.
streamfunction .
• Define transforms from (U) or to (T) this variable
and residual variables; by construction/hypothesis
inter-variable correlations are then zero.
• Continue with each unbalanced residual variable
in turn (velocity potential , pressure p, moisture
) making B block diagonal. (Compare EOFs)
• Transformed variables still need a spatial
covariance model, but not multivariate.
(Further transforms are used to represent these.)
ψ
 
χ 
x  U p    U p v
b
 

 
0
0
0 
 B ψ 


B χ 
0
0 
 0
Bv   
0
0 B b 
0 


 0
0
0 B    

B x   U p B  v  U p
T
 x  U p U v U h v  Uv. B v   I.
B x   UUT .
Lorenc, A. C., S. P. Ballard, R. S. Bell, N. B. Ingleby, P. L. F. Andrews, D. M. Barker, J. R. Bray, A. M. Clayton, T. Dalby, D. Li, T.
J. Payne and F. W. Saunders. 2000: The Met. Office Global 3-Dimensional Variational Data Assimilation Scheme. Quart. J.
Roy. Met. Soc., 126, 2991-3012.
© Crown copyright Met Office Andrew Lorenc 31
Estimating PDFs or
covariances
• Even if we knew the “truth”, we could never run enough
experiments in the lifetime of an NWP system to estimate its
error PDF, or even its error covariance B.
• Simplifying assumptions are essential (e.g. Gaussian, ...)
• Even a simplified error model has so many parameters that we
cannot determine them by NWP trials to determine which give
the best forecasts.
• In practice we can only measure innovations – cannot get
separate estimates of B & R without assumptions (Talagrand).
• Need to understand physics!
© Crown copyright Met Office Andrew Lorenc 32
Effect of the null space of B
• B should be positive semi-definite. If it has any (near) zero eigenvalues,
then no analysis increments are permitted in the direction of the
corresponding eigenmodes.
• Causes of a null space:
• Strong constraints used in the definition of B. E.g.
• Hydrostatic & geostrophic relationships
• Model equation in 4D-Var
• Too small a sample when estimating B
• Effects of a null space:
 Synergistic use of complementary observations
 Spurious long-range correlations and over confidence in accuracy of analysis.
© Crown copyright Met Office Andrew Lorenc 33
Analysis error for 500hPa height for different combinations of error-free
observations.
Z500
1000hPa
height (m)
(surface P)
1000-500hPa 500hPa w ind
thickness (m) component
(layer-mean T)
(m/ s)
w eights
0.143
0.419
0.611
0.192
0.520
0.853
0.699
1.147
0.441
0.628
0.461
0.880
© Crown copyright Met Office Andrew Lorenc 34
V500
500hPa
height (m)
Error (m)
21.0
20.8
19.1
18.9
14.4
18.4
16.7
1.9
T1000-500
Z1000
Lorenc, A.C. 1981: "A global three-dimensional
multivariate statistical analysis scheme." Mon.
Wea. Rev., 109, 701-721.
Background error (prior) covariance
B modelling assumptions
The first operational 3D multivariate statistical analysis method (Lorenc 1981) made
the following assumptions about the B which characterizes background errors,
all of which are wrong!
• Stationary – time & flow invariant
• Balanced – predefined multivariate relationships exist
• Homogeneous – same everywhere
• Isotropic – same in all directions
• 3D separable – horizontal correlation independent of vertical levels or
structure & vice versa.
Since then many valiant attempts have been made to address them individually,
but with limited success because of the errors remaining in the others.
The most attractive ways of addressing them all are long-window 4D-Var or
hybrid ensemble-VAR.
© Crown copyright Met Office Andrew Lorenc 36
© Crown copyright Met Office Andrew Lorenc 36
Implications for R&D strategy of
causes of improvements to NWP
3. Advanced assimilation using forecast model.
• 4D-Var is used by 5 of the top 7 global NWP centres.
Ensemble Kalman filter methods popular in other
applications.
Hybrid and Ensemble-Var methods are exciting much
interest in operational NWP R&D.
• Multiple forecasts allow the evolution of error
covariances and hence the definition of a 4D covariance.
• The NWP model is the best tool for defining the
“attractor” of plausible “balanced” states.
Incremental methods designed such that background
is only altered based on observed evidence.
© Crown copyright Met Office Andrew Lorenc 37
Evolved covariances
The Kalman Filter evolves
covariances by pre- & postmultiplying by a linear forecast
model.
4D-Var implicitly uses evolved
covariances.
© Crown copyright Met Office Andrew Lorenc 38
P  ti 1   M i P  ti  M iT  Qi
3D Covariances dynamically generated
by 4D-Var
If the time-period is long enough, the evolved 3D
covariances also depend on the dynamics:
B xtn   M n 1
M1M 0 B xt0  M T0 M1T
M Tn 1
Cross-section of the 4D-Var
structure function (using a
24 hour window).
Thépaut, Jean-Nöel, P. Courtier, G. Belaud and G
Lemaître: 1996 "Dynamical structure functions in a
four-dimensional variational assimilation: A case
study" Quart. J. Roy. Met. Soc., 122, 535-561
© Crown copyright Met Office Andrew Lorenc 39
2-Way Coupled DA/E
yn  H(xnf ),  o ,...
x1a
.
.
.
x
a
N
x1f
UM1
.
.
.

x
UMN

x
f
N
OPS

MOGREPS
yo
Deterministic
UM

f
N
y1
E
.
.
.
T

yN
F
N=23

(UM = Unified Model)
(OPS = Observation Preprocessing System)
xa
x1a
x1f
OPS
xf
OPS
© Crown Copyright 2011. Source: Met Office
xan  xa   xan
Ensemble
Covariances
y
K

x1a
.
.
.
a
x N
x aN

B
4D-VAR
xa
Covariances calculated directly from a sample:
=
=
=
© Crown copyright Met Office
Flavours of Ensemble Kalman
Filter
•
EnKF: closest to KF. Allows Schur product
localisation. Uses perturbed observations to get
correct spread. (e.g. Houtekamer & Mitchell Canada)
•
SQRT filters: Allows Schur product localisation.
Deterministic equation gives correct spread.
Efficient with serial processing of obs. (e.g. Tippett,
Anderson, Bishop, Hamill, Whitaker)
•
ETKF: Localised by data selection. Deterministic
equation gives correct spread. Efficient because
matrices are order ensemble size. (e.g. Bowler Met
Office, Kalnay, Ott, Hunt et al. Univ Maryland, Miyoshi Japan)
© Crown copyright Met Office
Hybrid VAR formulation
• Basic code written in late 90’s! (Barker and Lorenc)
• VAR with climatological covariance Bc:
Bc  UUT
wc  Uv  Up Uv Uh v
• VAR with localised ensemble covariance Pe ○ Cloc:

Cloc  U U
T

αi  U v
α
i
we 
K
1
( xi -x )  α i

K  1 i 1
• Note: We are now modelling Cloc rather than the full covariance Bc.
• Hybrid VAR:
w  cwc  ewe
© Crown copyright Met Office Andrew Lorenc 44
1 T
1 T 
J  v v  v v  Jo  Jc
2
2
Single observation tests
u response to a single u observation at centre of window
Horizontal
Standard 3D-Var
Pure ensemble 3D-Var
Ensemble RMS
Standard 4D-Var
Adam Clayton
© Crown copyright Met Office Andrew Lorenc 45
50/50 hybrid 3D-Var
The need to localise Pe
• Ensemble covariances are noisy. In
particular, there are spurious longrange correlations:
Pe
• Solution is to “localise” the
covariances, by multiplying pointwise
with a localising covariance Cloc:
Pe → Pe ○ Cloc:
.
(Lorenc 2003)
Pe
• Crucially, localisation also increases the “rank” of the ensemble covariances: the
number of independent structures available to fit the observations.
• (No localisation implies just 23 global structures!)
Errors in sampled EnKF covariances
1.5
N=100
covariance
1
0.5
0
-0.5
0
500
1000
1500
2000
2500
3000
distance (km)
© Crown copyright Met Office Andrew Lorenc 47
Page 47
Localisation: The Schur or Hadamard Product
1.5
n=100 * compact support
covariance
1
0.5
0
-0.5
0
500
1000
1500
2000
2500
3000
distance (km)
Page 48
The nonlinear “Hólm” humidity
transform
• Several centres have implemented a nonlinear humidity
transform to compensate for the non-Gaussian errors of
humidity forecasts (Hólm 2003, Gustafsson et al. 2011, Ingleby et al. in preparation)
• The “principle of symmetry” suggests a non-Gaussian prior:

P RH RH
b


 exp  RH  RH

b 2
2 S  RH , RH b 

• This makes the variational minimisation implicit;
ECMWF and HIRLAM iterate this term in the outer-loop,
The Met Office include it in a non-quadratic inner
minimisation.
© Crown copyright Met Office Andrew Lorenc 49
Effect of 0% & 100% limits on RH
This would damage
forecasts of cloud
and rain!!
Diagram from Lorenc (2007)
© Crown copyright Met Office Andrew Lorenc 50
TRUE
 “best” estimate
obtained by
modifying xb away
from limits.
---------
P(x│xb) is biased,
with mean given by
blue line.
Histogram of RH (b-o) and (b+o)/2
© Crown copyright Met Office Andrew Lorenc 51
Principle of symmetry and Hólm
transform – a Bayesian interpretation.
What are the prior and loss function which make this optimal?
• The distribution of values in the background, generated by
the model, is close to correct – we have the right cloud cover
on average.
• It is important to us to retain this correct distribution – more
so than to reduce the expected RMS error at each point.
• The Hólm transform constructs a (skewed) prior whose
mode is the background.
• We rely on a minimisation which finds this mode (not the
mean) and hence returns the model background unaltered in
the absence of observations.
© Crown copyright Met Office Andrew Lorenc 52
Mean b-o, classified by cloud top in b
© Crown copyright Met Office Andrew Lorenc 53
Mean b-o, classified by cloud top in o
© Crown copyright Met Office Andrew Lorenc 54
Vertical correlation with level 5
© Crown copyright Met Office Andrew Lorenc 55
Nonlinearity – benefitting
from the attractor
• The atmospheric state is fundamentally governed by nonlinear effects, e.g.
convective-radiative equilibrium, condensation, cloud & precipitation.
Nonlinear chaotic systems have a fuzzy attractor manifold of states that
occur in reality – far fewer than all possible states. This gives us recognisable
weather systems and practical weather prediction!
• Usual minimum variance “best” estimate is not on the attractor.
• The best practical way of defining the attractor in by using the full model, as
we have for years in methods for spin-up and diabatic initialisation.
• Methods based on Gaussian PDFs can only approximate near-linear aspects
of this balance. We need to add an additional prior that we want the
analysis to be a state which the model might generate.
• It is very hard to formulate a practical Bayesian algorithm to do this. We
might try engineering solutions:
 Incremental methods with an outer-loop.
 Multiple DA “particles” sampling plausible solutions.
© Crown copyright Met Office Andrew Lorenc 56
© Crown copyright Met Office Andrew Lorenc 56
Questions and answers
© Crown copyright Met Office