2013 Workshop on Quantitative
Evaluation of Downscaled Data
“Perfect Model” Experiments:
Testing Stationarity in Statistical
Downscaling (NCPP Protocol 2)
IS PAST PERFORMANCE AN
INDICATION OF FUTURE RESULTS?
Keith Dixon1, Katharine Hayhoe2, John Lanzante1,
Anne Stoner2, Aparna Radhakrishnan3, V. Balaji4, Carlos Gaitán5
1 NOAA/GFDL
2 Texas Tech Univ.
3 DRC Inc.
4 Princeton Univ.
5 Univ. of Oklahoma
Goals of this presentation
1. Define the ‘stationarity assumption’ inherent to statistical downscaling of future climate projections.
2. Present our ‘perfect model’ (aka ‘big brother’) approach to quantitatively assess the extent to which the stationarity assumption holds.
3. Illustrate with a few examples the kind of results one can generate using this evaluation framework (Anne Stoner will show more detail next…)
4. Introduce options to extend and supplement the method (setting the hurdle at different heights).
5. Invite statistical downscalers to consider testing their methods within the perfect model framework.
6. Garner feedback from workshop participants.
This project does not produce any data files that one would use in
real world applications.
The aim at GFDL is to gain knowledge about aspects of commonly used SD methods (& GCMs), and to communicate that knowledge so that better informed decisions can be made.
REAL WORLD APPLICATION:
Start with 3 types of data sets: observations (the TARGET) plus historical and/or future GCM output (the PREDICTORS).
Compute transform functions in a training step (1 example below).
Produce downscaled data -- statistical refinements of the GCM output (historical and/or future).
Skill: compare downscaled historical output to observations (e.g., cross-validation).
Cannot evaluate downscaling skill for the future -- left assuming the transform functions apply equally well to past & future -- “The Stationarity Assumption”
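To make the workflow above concrete, here is a purely illustrative sketch that trains a simple empirical quantile-mapping transform on hypothetical historical predictor/target series and applies it. This is a stand-in SD method, not ARRM or any method evaluated in this work, and all names and data below are made up.

```python
import numpy as np

def train_transform(coarse_hist, obs_hist, n_quantiles=100):
    """Training step: pair quantiles of the coarse GCM predictor with
    quantiles of the observed target over the same historical period."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(coarse_hist, q), np.quantile(obs_hist, q)

def apply_transform(values, transform):
    """Produce 'downscaled' values by passing each input through the
    trained quantile-to-quantile mapping."""
    coarse_q, target_q = transform
    return np.interp(values, coarse_q, target_q)

# Historical skill can be checked against observations (e.g., via
# cross-validation), but applying the same transform to future GCM output
# implicitly assumes the mapping still holds -- "The Stationarity Assumption".
rng = np.random.default_rng(0)
obs_hist = 30.0 + 4.0 * rng.standard_normal(3000)          # synthetic target series
coarse_hist = obs_hist - 1.5 + rng.standard_normal(3000)   # synthetic predictor series
downscaled_hist = apply_transform(coarse_hist, train_transform(coarse_hist, obs_hist))
```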
“PERFECT MODEL” EXPERIMENTAL DESIGN:
Start with 4 types of data sets -- Hi-res GCM output as proxy for obs & coarsened versions of the Hi-res GCM output as proxy for the usual GCM output.
Proxy for observations in a real-world application: daily time resolution, ~25 km grid spacing, 194 x 114 grid (~22,000 points).
Proxy for GCM output in a real-world application: daily time resolution, ~200 km grid spacing; ~64 of the smaller 25 km resolution grid cells fit within each coarser cell (a 64:1 ratio).
Follow the same interpolation/regridding sequence to produce coarsened data sets for the future climate projections.
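As a minimal illustration of how such a coarsened proxy could be built, the sketch below block-averages a hi-res field by a factor of 8 in each direction (roughly the 64:1 cell ratio noted above). It is not GFDL's actual interpolation/regridding sequence, and the array shapes are placeholders.

```python
import numpy as np

def coarsen(field_hi, factor=8):
    """Block-average a 2-D hi-res field onto a grid 'factor' times coarser,
    trimming edge rows/columns that do not fill a complete coarse cell."""
    ny, nx = field_hi.shape
    ny_c, nx_c = ny // factor, nx // factor
    trimmed = field_hi[:ny_c * factor, :nx_c * factor]
    return trimmed.reshape(ny_c, factor, nx_c, factor).mean(axis=(1, 3))

# Example on a field with the 114 x 194 shape mentioned above:
hi_res = np.random.default_rng(1).normal(size=(114, 194))
coarse_proxy = coarsen(hi_res)   # shape (14, 24): the ~200 km "proxy GCM" grid
```

The same coarsening would be applied to both the historical and the future hi-res output, mirroring the design described above.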
“PERFECT MODEL” EXPERIMENTAL DESIGN:
Compute transform functions in the training step (1 example below).
Produce downscaled output for both the historical and future time periods.
Can directly evaluate downscaling skill for both the historical period and the future using the Hi-Res GCM output as “truth” -- Test Stationarity.
Quantitative Tests of Stationarity:
The extent to which
SKILL computed for the FUTURE CLIMATE PROJECTIONS
is diminished relative to the
SKILL computed for the HISTORICAL PERIOD
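As a hedged sketch of this comparison: assuming hypothetical daily arrays of downscaled estimates and the hi-res "truth" for the training period and for a future projection at some grid point, the degradation can be summarized as a simple MAE ratio. The names below are illustrative, not the project's actual code.

```python
import numpy as np

def mae(downscaled, truth):
    """Mean absolute downscaling error: sum(|estimate - truth|) / NumDays."""
    return float(np.mean(np.abs(np.asarray(downscaled) - np.asarray(truth))))

def stationarity_ratio(ds_hist, truth_hist, ds_future, truth_future):
    """Ratio of future MAE to historical (training-period) MAE.
    Values near 1.0 mean the stationarity assumption largely holds;
    values well above 1.0 mean skill is diminished in the future."""
    return mae(ds_future, truth_future) / mae(ds_hist, truth_hist)
```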
GFDL-HiRAM-C360 experiments
All data used in the perfect model tests were derived from GFDL-HiRAM-C360 (C360) model simulations conducted following CMIP5 high-resolution time-slice experiment protocols.
For the period 2086-2095, we make use of a pair of 3-member ensembles -- a total of six 10-year long experiments, identified as the “C” and “E” ensembles to the right (C warms more than E)...
We consider a pair of 30-year long “historical” experiments (1979-2008).
[Figure: The same GCM was used to generate the 1979-2008 model climatology and 2 sets of future projections (3 members each): the 2086-2095 projection “C” (3x10yr) and the 2086-2095 projection “E” (3x10yr). Regional warming relative to 1979-2008 is roughly 6.2-7.2 K for “C” and 4.1-5.0 K for “E” (all-points and land averages).]
Q: How high a hurdle does this Perfect Model approach present?
A: Varies geographically, by variable of interest, time period of interest, etc.
August Daily Max Temperatures for a point in Oklahoma:
Coarsened data is slightly cooler, with a slightly smaller std dev. … not too challenging.
(2.5C histogram bins)
August Daily Max Temperatures for a point in Oklahoma:
Approx. +7C mean warming.
(2.5C histogram bins)
August Daily Max Temperatures for a point just NE of San Francisco:
Coarsened data is more ‘maritime’ & the HiRes target is more ‘continental’.
Presents more of a challenge during the historical period than the Oklahoma example.
(2.5C histogram bins)
August Daily Max Temperatures for a point just NE of San Francisco:
(2.5C histogram bins)
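A quick sketch of how such 2.5 C histograms could be tabulated, assuming a hypothetical 1-D array of August daily max temperatures (degrees C) at one grid point; the function and variable names are illustrative only.

```python
import numpy as np

def tmax_histogram(tasmax_aug, bin_width=2.5):
    """Bin August daily-max temperatures (deg C) into 2.5 C wide bins."""
    lo = np.floor(tasmax_aug.min() / bin_width) * bin_width
    hi = np.ceil(tasmax_aug.max() / bin_width) * bin_width
    edges = np.arange(lo, hi + bin_width, bin_width)
    counts, _ = np.histogram(tasmax_aug, bins=edges)
    return counts, edges

# Comparing counts for the coarsened predictor vs. the hi-res target at a
# point reveals, e.g., the cooler / less variable coarse data in Oklahoma
# or the more 'maritime' coarse data near San Francisco.
```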
Map produced by the NCPP Evaluation Environment ‘Machinery’.
Note that in several locations the ARRM cross-validated downscaled output tends to have standard deviations that are too low just offshore and too high over coastal land.
NEXT: A sampling of results…
* Intended to be illustrative… not exhaustive, nor systematic.
* Anne Stoner (TTU) will show more.
Start with a summary: ARRM downscaling errors for daily max temp are larger at the end of the 21st century than for 1979-2008.
[Bar chart: area-mean, time-mean absolute downscaling errors for “All Pts” and “Land” during 1979-2008 and during the “E” and “C” 2086-2095 projections; values range from about 0.63 to 1.14, increasing from the historical period to “E” to “C”.]
Area Mean Time Mean Absolute Downscaling Error = Σ | (Downscaled Estimate – HiRes GCM) | / (NumDays)
Geographic Variations: Downscaling MAE for 1979-2008
[Map: Mean Absolute downscaling Error (MAE) during 1979-2008; color scale from 0.0 to 1.6.]
MAE = Σ | (Downscaled Estimate – HiRes GCM) | / (60 * 365)
Geographic Variations: MAE pattern for “E” projections (roughly +4 to +5 C warming)
[Map: Mean absolute downscaling error during 2086-2095, “E” projections (3-member ensemble); color scale from 0.0 to 1.6.]
Geographic Variations: MAE pattern for “C” projections (roughly +6 to +7 C warming)
[Map: Mean absolute downscaling error during 2086-2095, “C” projections (3-member ensemble); color scale from 0.0 to 1.6.]
Geographic Variations: Bias pattern for “C” projections (roughly +6 to +7 C warming)
[Map: Mean Climate Change Signal Difference (ARRM minus Target), “C” Projection.]
Looking at how well the stationarity
assumption holds in different seasons
[Line plot: ratio of future to historical mean absolute downscaling error for tasmax, by month (J F M A M J J A S O N D), over the entire US48 domain; y-axis from 1 to 4; red = “C”, blue = “E”.]
Where and when the ratio = 1.0, the stationarity assumption fully holds
(i.e., no degradation in mean absolute downscaling error during 2086-2095 vs. the 1979-2008 period used in training).
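A minimal sketch of how such month-by-month ratios could be computed, assuming hypothetical daily downscaled and hi-res "truth" series stored as pandas Series with datetime indexes; the names are illustrative, not the project's code.

```python
import pandas as pd

def monthly_mae(downscaled, truth):
    """Mean absolute downscaling error by calendar month (1-12), assuming
    pandas Series with a daily DatetimeIndex."""
    err = (downscaled - truth).abs()
    return err.groupby(err.index.month).mean()

def monthly_stationarity_ratio(ds_hist, truth_hist, ds_fut, truth_fut):
    """Month-by-month ratio of future to historical MAE; values near 1.0
    indicate the stationarity assumption holds for that month."""
    return monthly_mae(ds_fut, truth_fut) / monthly_mae(ds_hist, truth_hist)
```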
Looking at how well the stationarity
assumption holds in different seasons
…comparing near-coastal land vs. interior
Looking at how well the stationarity
assumption holds in different seasons
[Line plot: monthly (Jan-Dec) MAE ratios for the “C” ensemble; blue = coastal SC-CSC, green = interior SC-CSC, black = full US48+ domain; y-axis from 1 to 4.]
Looking at how well the stationarity
assumption holds in different seasons
“C” ensemble, tasmax: a clear intra-month MAE trend in some but not all months.
The “sawtooth” has lower values in the cooler part of the month and higher values in the warmer part of the month.
Options for extending this ‘perfect model-based’
exploration of statistical downscaling stationarity
* More SD Methods
* Different Emissions Scenarios & Times
* Mix & Match GCMs’ Target & Predictors
* More Climate Variables & Indices
* Use of Synthetic Time Series
* Different HiRes Climate Models
* Alter GCM-based data in known ways to ‘Raise the Bar’ (a minimal sketch of this idea follows below)
* ?? More Ideas ??
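For the last idea in the list above, here is a hedged, minimal sketch of what "altering GCM-based data in known ways" could look like: an imposed extra warming plus a variance change, so that the correct downscaled answer is known by construction. The function name, parameters, and specific perturbations are purely illustrative, not the project's actual procedure.

```python
import numpy as np

def raise_the_bar(coarse_series, extra_warming=2.0, variance_scale=1.2):
    """Return a synthetic version of the coarse predictor with a known mean
    shift and a known inflation of variability, so an SD method's ability to
    recover the imposed change can be checked directly."""
    coarse_series = np.asarray(coarse_series, dtype=float)
    anomalies = coarse_series - coarse_series.mean()
    return coarse_series.mean() + extra_warming + variance_scale * anomalies
```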
GFDL-HiRAM-C360 experiments
Data sets used in the perfect model tests are being
made available to the community.
This enables others to use our experimental design to test the stationarity assumption for their own SD method.
Also, we invite SD developers to consider
collaborating with us, so that their techniques &
perspectives may be more fully incorporated into
this evolving perfect model-based research &
assessment system.
For more info, visit www.gfdl.noaa.gov/esd_eval &
www.earthsystemcog.org/projects/gfdl-perfectmodel
In other words, by accessing these types of files, folks can generate their own SD results and test how well the stationarity assumption holds for their favorite SD method.
[Diagram: predictor and target data files from us feed into the user's SD method of choice.]
Obviously, one does not need to perform tests using all
22,000+ grid points. Below we indicate the locations of
a set of 16 points we’ve used for some development work.
(color areas are 3x3 with grid point of interest in the center)
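For anyone prototyping on just a few locations, here is a small sketch of pulling such a 3x3 block of hi-res cells around a point of interest; the indices and array shape below are placeholders, not the actual 16 development points.

```python
import numpy as np

def neighborhood_3x3(field, j_center, i_center):
    """Return the 3x3 block of hi-res grid cells centered on one point."""
    return field[j_center - 1:j_center + 2, i_center - 1:i_center + 2]

# e.g., on the ~25 km 114 x 194 grid (indices here are placeholders):
field = np.zeros((114, 194))
block = neighborhood_3x3(field, 57, 97)   # shape (3, 3)
```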
NEXT: Anne Stoner and Katharine Hayhoe with more ‘perfect model’ analyses…
Odds-n-ends from day 1:
* Not all SD codes run on a cell phone.
* Not all GCMs have a 200 km atmospheric grid (50 km).
* GCMs are developed primarily as research tools -- not prediction tools.