Transcript Slide 1

Terrestrial carbon cycle parameter estimation
from the ground-up: A case study for Australia.
Parameter estimation of a terrestrial C-Cycle
model using multiple datasets of ground based
observations.
Model-Data Integration and Network Design for Biogeochemical
Research Advanced Study Institute,
National Center for Atmospheric Research,
May 2002.
Dr Damian Barrett
CSIRO Plant Industry,
GPO Box 1600
Canberra ACT Australia.
[email protected]
Thanks: Michael Raupach, Dean Graetz, Ying Ping
Wang, Peter Rayner, Ray Leuning, John Finnigan
and other ‘Carbon Dreaming’ participants...
© CSIRO Land & Water
Topics:
• Background: Motivations for developing ‘yet
another’ terrestrial BGC model of the C-cycle.
• Forward Model: Conservation equations,
parameters, state variables, forcing functions, and
driver data.
• Parameter estimation: Recast the forward model
as a steady state model, multiple observation
datasets, search algorithm (GAs) and parameter
covariance matrix
• Output: Parameter estimates (turnover time of C
in soil and litter pools, depth profiles of soil C
efflux, light use efficiency of NPP).
Science Motivations 1: Reducing uncertainty in carbon cycle
• Large uncertainties in the global C-cycle particularly
with terrestrial biogeochemistry, particularly below
ground processes.
• Limited capability to observe below-ground
dynamics, fluxes and transformations of carbon.
• Depth distribution of turnover time of C in soil?
• Depth distribution of soil C flux?
• Limited observations: very patchy & disparate data
• Limited process understanding:
• Some processes are well understood
(photosynthesis & d13C discrimination,
decomposition of litter and soil organic matter).
© CSIRO Land & Water
• Other processes are poorly understood (Callocation among plant tissues, T sensitivity of
humus decomposition, d13C discrimination of
decomposition…)
Science Motivations 2: Approach in a nutshell
• An application of the parameter estimation problem
• using a forward model of NEE of C (VAST) &
• multiple datasets of observations of plant
production and pool sizes to constrain
parameters in VAST.
• Approach is distinct from Data Assimilation:
• are not estimating initial conditions, updating
model state variables in time nor estimating time
dependent control parameters
• are estimating steady state model parameters
(ie no time- or space-dependency in parameters)
• We use algebraic ‘scaling functions’ in the forward
model to introduce time- and space-dependency in
parameters
© CSIRO Land & Water
Scene setting: Australia and North America
• Statistics
• Land area (km2):
• MAR:
• Evaporation:
• Runoff:
• onset of agriculture:
Australia
7.6 x 106
479 mm
437 mm
50 mm
1860
N. America
24.2 x 106
630 mm
301 mm
329 mm
~1750
(Conterminous USA)
(9.2 x 106 km2)
Australia is characterized by high year-year climate variability, high vapor pressure deficits, highly
weathered soils, high biodiversity and an evergreen vegetation evolved in isolation and adapted to
these conditions
VAST1.1: Forward Model – schematic diagram
• A linear compartmental model
of C-dynamics of the terrestrial
biosphere
Pg
r
Pn
aL
Ra
qL
Littermass
tL
qF
h
aW
qW
aR
z1
qR1
z2
qR2
z3
qR3
tW
tR1
tR2
qC
tC
• Linear dependence of qk and
Pn on parameters.
Rh
tF
qF
• 10 pools
qC
qS1
qS1
qS2
tS2
qS2
qS3
tR3
Biomass
• Plant Production: Light Use
Efficiency approach
tS1
Soil-C
tS3
• Forced T, P, NDVI, fn,fs.
qS
• 3 classes of parameters
• 12 Partitioning
• 10 Timescale
• Additional process e*, r
3
aL, aW, aR, zR1, zR2, zR3, qF, qC, h, qS1, qS2, qS3
tL, tW, tR1, tR2, tR3, tF, tC, tS1, tS2, tS3
• Mortality and Decomposition:
modeled as first-order kinetics.
VAST1.1: Forward Model
• VAST1.1 Input C-flux: Light Use Efficiency model
n
Pn  e 1  r  i  p , i
*
i 1
• Mass conservation equations: a system of 10
coupled first-order ODE.
dqL dt  a L Pn  qL t L
dqW dt  aW Pn  qW t W
dqRj dt z j a R Pn  qRj t Rj
dqF dt  qL t L  h qC t C  qF t F
dqC dt  qW tW  qC t C
dqS1 dt  qR1 t R1  q F qF t F  qC qC t C  qS1 t S1
© CSIRO Land & Water
dqS j dt  qRj t Rj  qS j 1 qS j 1 t S j 1  qS j t S j
VAST1.1: Forcing data
Data:
• Climate data: BoM 0.25o Monthly max/min
T(oC) (1950 - 2000) & Monthly rainfall(mm)
(1890 - 2000)
•
NASA PAL 8km-10day NDVI: Noise Filtered
(Lovell & Graetz) & re-georef (Barrett) (1981 2000)
•
NASA Langely SRB: Monthly Shortwave down
& net radiation (1983 - 1991)
•
Digital Atlas of Australian soils + Interpretation
(McKenzie and Hook 1992): depth, ksat.
•
Digital Atlas of Australian historic vegetation:
“Pre-European” growth form of tallest stratum
and FPC.
VAST1.0: Multiple observations dataset
VAST 1.0 Observation Dataset:
183 obs NPP
105 obs above ground biomass
94 obs fine littermass
346 obs soil [C]
55 obs soil bulk density
From 174 published studies.
Available:
http://www-eosdis.ornl.gov
VAST: Multiple observation datasets: interpretation
• Observation sites: vegetation is ‘minimally disturbed’
• where the return period of stand replacing disturbance
is longer than the recovery period of vegetation to
maximum biomass.
• Assume vegetation is at steady-state
• ie when averaged over space and time, the rate of
change of C-mass in any pool is zero (where C influx
into the biosphere = C efflux from biosphere).
• Criteria to meet steady state assumption
• Author’s description of overstorey vegetation was
equivalent with AUSLIG 1788 Historical Australian
Vegetation Classification
• Age sequence: oldest age vegetation used
• Multiple sites at a single lat/long: data were averaged
among sites.
• Only data from published literature was used
(Quality control = peer review)
• Only geo-referenced data used (lat/long)
VAST: Observation datasets in climate space
AS Savannahs
TF
• Open circles depict
individual grid cells of
continental raster in climate
space.
30
Mean annual temperature (oC)
25
20
• Colored circles show
location of observations in
climate space.
15
10
qL + qW
NPP
• NPP, Biomass and
Littermass observations are
biased towards higher
rainfall/productivity sites
5
25
20
• Bias in the landscape (more
productive sites)
15
10
Soil [C]
qF
5
0
500
1000
1500
2000
0
500
1000
1500
Mean annual rainfall (mm)
2000
• Soil observations are more
representatively distributed
2500
VAST: Parameter estimation – weighted least squares
• Aim: Estimate a by minimising the objective function, O(a), given ŷ, x & y, :
O  a    y  yˆ (x; a)  Cy1  y  yˆ (x; a) 
T
where
• y vector of observations for multiple datasets (ie. the VAST 1.0 Obs Dataset)
• ŷ(.) corresponding vector of model predictions (based on steady state equations)
• x vector of forcing variables (climate, radiation, NDVI…)
• a vector of model parameters
• Cy-1 is inverse of the error covariance matrix (a symmetric weighting matrix containing
information on measurement error and correlations among measurement errors).
• where measurement errors are gaussian, uncorrelated and errors constant variance
(Cii are equal & Cij = 0): Ordinary least squares
• where Cii are not equal & Cij = 0: Weighted least squares.
• where Cii are not equal & Cij ≠ 0: Generalised least squares.
• Multiple constraints case: need to deal with observations of unequal magnitude &
consequently have unequal variances.
• VAST1.1: we assume that measurement errors are independent and gaussian and that
the error variances are equal to the sample variances for each dataset.
VAST: Specification of steady state model
• Since observations are from ‘minimally’ disturbed
sites (ie. are assumed to represent steady state
conditions) we need to express the conservation
equations in steady state form.
• Recalling that at steady state:
dq
 0,
dt
I
q
t
0
• Re-arrange conservation equations steady-state
form:
qk  Pn f k t k
Where fk is the fraction of NPP which has passed
through pools upstream of qk.
© CSIRO Land & Water
VAST: Specification of inverse model
• fk in VAST1.1 are :
fL  aL
fW  a W
fR j  z j aR
f F  a L  h aW 
f C  aW
f S1  z 1a R  q F a L  haW   q C aW 
f S 2  z 2 a R  q S1 z 1 a R  q F a L  h aW   q C aW 
f S 3  z 3 a R  q S 2 z 2 a R  q S1 z 1 a R  q F a L  h aW   q C aW 
• Subject to
3
a L  a W  a R  1, and z j  1
j 1
VAST: Uncertainty in estimated parameters
• The uncertainty in parameters is given by the parameter covariance matrix
Cb = sy [JT J]-1
where
J is the Jacobian; the matrix of model derivatives with respect to parameters
The Jacobian is of dimensions n rows x p columns (n = Total Number of
observations and p = No. of parameters)
Each element of the Jacobian is
J ijk
 yˆij

 ak
;
i  1,..., n; j  1,..., m; k  1,..., p
xn , m
sy is the residual sum of squares
sy = [y – ŷ(x; a)]T [y – ŷ(x; a)] / (n – p)
Parameter estimation using multiple datasets
• Equations in each ŷ must share at least some
parameters in common
• otherwise there is no mutual constraint imposed by the
multiple datasets (off diagonal elements of [JT J] = 0)
• This is equivalent to independent parameter estimates
• Shared parameters between equations must be on an
equivalent SCALE
• eg. Photosynthesis models at leaf and canopy scales.
• leaf scale Jmax: e- transport processes in chloroplast
• canopy scale Jmax: a statistical average over a pop.
• observations used to constrain the canopy model cannot
constrain estimates of the leaf scale parameter (unless
we have a way of disaggregating the large scale
information among the population leaves).
© CSIRO Land & Water
Parameter estimation using multiple datasets (continued)…
•Highly correlated datasets add little information to
constrain parameter estimates
• eg. Do N concentration datasets provide a further
constraint on C fluxes?
• Due to conserved C:N ratios in terrestrial pools, C & N
data are highly correlated
• Therefore including N data does not necessarily add
much new information to more tightly constrain
parameters.
• In practice,
Cy may be very difficult to specify
• particularly for multiple datasets where information on
error correlation between datasets is unavailable.
© CSIRO Land & Water
Search method: Genetic algorithms
• a type of stochastic search technique that is particularly useful in
optimisation where...
the region of the global minimum of O occupies a small fraction
of parameter space
1
global minimum
parameter space is rough (numerous local minima)
parameter space is discontinuous (jacobian is unavailable)
• Example: shows the evolution of a solution to the global minimum
of a particularly difficult function
• Start with a random selection of population members which are
solutions to the problem and evolves the population towards the
60
global minimum within 90 trials
• Note:
 local minima are not retained in the population if other “fitter”
members are found
 even though the global minimum is found in < 90 trials, mutation
90
maintains diversity in the search of parameter space
Genetic algorithms: a Primer
• Population is made up of a set of “Chromosomes” = (a
gene
chromosome
{...pi-1,
pi, pi+1...}
3.14
set of parameters) comprising “Genes” (1 per parameter).
• Each parameter value (gene) is encoded into a binary
string.
314
encoding
• Crossover operator: generates offspring from mating
"
P1 ...10010111 | 100111010 | 100010111... parent chromosomes.
• Mutation operator: creates new genes stochastically.
P2
...1100000 | 1110110 | 111111101...
• Selection Operator: selects chromosomes based on a
mutation
‘Fitness’ function.
crossover
O1
...10010111 | 1001110 | 101111101...
• GAs generate solutions to problems by evolving the
O2
...1100000 | 111011010 | 100010111...
population over time and selecting for fitter solutions. They
decoding
O1
O2
4.74
0.78
selection
0.78
increase the average “fitness” of a population of model
solutions by exploiting prior knowledge of parameter
values retained in the population.
• For difficult objective functions: Can use monte carlo
approaches to obtain estimates of parameter
uncertainties. (not done here)
VAST: Turnover time of soil C pools
Turnover Time (years)
0
Depth (cm)
-20
-40
-60
-80
-100
0
50
100
150
200
250
• Estimated turnover time of C as a function of
soil depth (+/- 1s) corrected for temperature and
moisture effects on decomposition
•In situ turnover time at any time and place is
modified by climate, soil moisture content of the
soil and vegetation type.
• Faster turnover of carbon in surface soil.
• Turnover time of C not significantly different
between 20 – 50 cm and 50 – 100 cm depths.
• Increasing turnover time with depth reflects
decreasing decomposition rate, due to less
labile C, lower nutrient or oxygen availability,
increasing physical protection of C by higher soil
clay contents,…
VAST: Depth profiles of soil carbon flux
0.00
-20
CWD
Fine
0.20
0.40
0.60
0.00
-20
0.20
0.40
0.60
0.00
-20
0
0
20
20
20
40
40
40
60
60
60
80
80
80
100
100
100
Soil Depth (cm)
0
0.20
0.40
0.60
‘Tall’ productive forests
Open Woodlands
Arid shrublands
• Plots show realizations of the fraction of soil C-flux originating from fine and coarse
litter pools and from different soil horizons for each of the 3 vegetation types.
• More than 89% of total soil C-flux originates from < 20cm depth (>98% < 50cm)
• Largest source of C flux from soil is fine litter
• Model is relatively insensitive to uncertainty in the estimated turnover time
(predicted soil respiration in 50 - 100 cm horizon has low uncertainty).
Summary points:
To integrate inventory data, remote sensing, flux station and atmospheric [CO2] data
for parameter estimation we need to consider the following:
• We need a comprehensive set of forward models to predict system state
• predict fluxes = f(NPP, stores, …)
• predict fluxes = f(near surface [CO2], u, …)
• predict radiance measures = f(LAI, n, …)
• predict atmospheric [CO2] = f(fluxes, atmospheric transport, …)
• We need the forward models to share common parameters
• otherwise no benefit obtained using multiple constraints approach
• Need to address issues of scale in order to relate data obtained on different time
and space scales
• eg the need to relate near surface [CO2] to atmospheric [CO2] in order to combine eddy
flux and atmospheric CO2 datasets
Summary points: continued…
• We need an objective means for specifying the network design (ie. a quantitative
means of identifying the types and locations of measurements)
• How do you decide who’s network is better?
• network design is an optimization problem!
• so its possible to include in the objective function a cost term for new observations
• “Is it better to generate extensive datasets of cheap and uncertain observations over the
scale of interest, than few expensive accurate observations?”
• We need analysis of the information content of different types of datasets
• because adding new datasets may not lead to further constraint on model parameters if:
• new data are highly correlated with existing data
• if by adding new data we also add new model equations having new unknown
parameters (just shifts the problem of insufficient information to one of estimation of
new parameters).