2013 Workshop on Quantitative Evaluation of Downscaled Data
"Perfect Model" Experiments: Testing Stationarity in Statistical Downscaling (NCPP Protocol 2)
IS PAST PERFORMANCE AN INDICATION OF FUTURE RESULTS?

Keith Dixon (1), Katharine Hayhoe (2), John Lanzante (1), Anne Stoner (2), Aparna Radhakrishnan (3), V. Balaji (4), Carlos Gaitán (5)
(1) NOAA/GFDL, (2) Texas Tech Univ., (3) DRC Inc., (4) Princeton Univ., (5) Univ. of Oklahoma

Goals of this presentation
1. Define the 'stationarity assumption' inherent to statistical downscaling of future climate projections.
2. Present our 'perfect model' (aka 'big brother') approach to quantitatively assess the extent to which the stationarity assumption holds.
3. Illustrate with a few examples the kind of results one can generate using this evaluation framework (Anne Stoner will show more detail next…).
4. Introduce options to extend and supplement the method (setting the hurdle at different heights).
5. Invite statistical downscalers to consider testing their methods within the perfect model framework.
6. Garner feedback from workshop participants.

This project does not produce any data files that one would use in real-world applications. The aim at GFDL is to gain knowledge about aspects of commonly used SD methods (and GCMs), and to communicate that knowledge so that better-informed decisions can be made.

REAL WORLD APPLICATION:
* Start with 3 types of data sets: observations, plus GCM output for the historical period and/or the future (TARGET = observations; PREDICTORS = GCM output).
* Compute transform functions in a training step (one example below).
* Produce downscaled data: refinements of GCM output (historical and/or future).
* Skill: can compare and evaluate SD skill for the historical period (e.g., cross-validation).
* Cannot evaluate SD skill for the future: we are left lacking observations of the future, assuming the transform functions apply equally well to past & future -- "The Stationarity Assumption".

"PERFECT MODEL" EXPERIMENTAL DESIGN:
Start with 4 types of data sets: high-resolution GCM output serves as the proxy for observations, and a coarsened version of the high-resolution GCM output serves as the proxy for the usual GCM output.
* Proxy for observations in a real-world application: daily time resolution, ~25 km grid spacing, 194 x 114 grid (~22k points).
* Proxy for GCM output in a real-world application: daily time resolution, ~200 km grid spacing; roughly 64 of the smaller 25 km resolution grid cells fit within each coarser cell (64:1).
* Follow the same interpolation/regridding sequence to produce coarsened data sets for the future climate projections (see the sketch below).

"PERFECT MODEL" EXPERIMENTAL DESIGN:
* Compute transform functions in a training step (one example below).
* Produce downscaled output for historical and future time periods.
* Can directly evaluate skill for both the historical period and the future, using the Hi-Res GCM output as "truth" -- Test Stationarity.

Quantitative Tests of Stationarity: the extent to which SKILL computed for the FUTURE CLIMATE PROJECTIONS is diminished relative to the SKILL computed for the HISTORICAL PERIOD.
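To make the coarsening step of the experimental design above concrete, here is a minimal sketch (not from the presentation) of how a ~200 km predictor proxy could be built from the ~25 km fields. The array shapes, the 8 x 8 block mean, and the function name coarsen_blockmean are illustrative assumptions; the actual interpolation/regridding sequence applied to the GFDL data may differ.

```python
# Minimal sketch of building the coarsened predictor proxy (assumptions noted above).
import numpy as np

def coarsen_blockmean(hires, factor=8):
    """Average factor x factor blocks of high-res cells (approx. 64:1 areal ratio)."""
    ndays, ny, nx = hires.shape
    ny_c, nx_c = ny // factor, nx // factor
    blocks = hires[:, :ny_c * factor, :nx_c * factor].reshape(
        ndays, ny_c, factor, nx_c, factor)
    return blocks.mean(axis=(2, 4))

# Synthetic stand-in for one year of daily ~25 km tasmax on a grid trimmed to
# multiples of 8 (the real grid is 194 x 114, ~22k points):
hires_target = 300.0 + np.random.randn(365, 112, 192)   # proxy for "observations"
coarse_predictor = coarsen_blockmean(hires_target)      # proxy for typical GCM output
print(coarse_predictor.shape)                           # (365, 14, 24)
```

The same operation would be applied to the future high-resolution projections, mirroring the "follow the same interpolation/regridding sequence" step above.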
GFDL-HiRAM-C360 experiments: All data used in the perfect model tests were derived from GFDL-HiRAM-C360 (C360) model simulations conducted following high-resolution CMIP5 time-slice protocols. For the 2086-2095 period we make use of a pair of 3-member ensembles -- a total of six 10-year long experiments, identified as the "C" and "E" ensembles (C warms more than E). We also consider a pair of 30-year long "historical" experiments (1979-2008)…

[Figure: the same GCM was used to generate the 1979-2008 model climatology and two sets of 2086-2095 future projections (3 members x 10 years each). Regional warming [K]: "C" ensemble ~7.2 (land) and ~6.2 (all points); "E" ensemble ~5.0 (land) and ~4.1 (all points).]

Q: How high a hurdle does this perfect model approach present?
A: It varies geographically, by variable of interest, time period of interest, etc.

[Figure: August daily max temperatures for a point in Oklahoma (2.5C histogram bins). The coarsened data is slightly cooler with a slightly smaller std dev … not too challenging.]
[Figure: August daily max temperatures for the same Oklahoma point, future period (2.5C histogram bins): approx. +7C mean warming.]
[Figure: August daily max temperatures for a point just NE of San Francisco (2.5C histogram bins). The coarsened data is more 'maritime' and the HiRes target is more 'continental'; this presents more of a challenge during the historical period than the Oklahoma example.]
[Figure: August daily max temperatures for the same point just NE of San Francisco, future period (2.5C histogram bins).]

[Map produced by the NCPP Evaluation Environment 'machinery'.] Note that in several locations there is a tendency for ARRM cross-validated downscaled output to have standard deviations that are too low just offshore and too high over coastal land.

NEXT: A sampling of results… intended to be illustrative, not exhaustive or systematic. Anne Stoner (TTU) will show more.

Start with a summary: ARRM downscaling errors are larger for daily max temperature at the end of the 21st century than for 1979-2008.
[Figure: bar chart of area-mean, time-mean absolute downscaling errors (values roughly 0.6 to 1.1) for All Pts and Land, for 1979-2008 and for the "C" and "E" 2086-2095 ensembles; both future periods exceed 1979-2008.]
Area-mean, time-mean absolute downscaling error: MAE = Σ | Downscaled Estimate - HiRes GCM | / NumDays

Geographic Variations: Downscaling MAE for 1979-2008
[Figure: map of mean absolute downscaling error (MAE) during 1979-2008, computed as Σ | Downscaled Estimate - HiRes GCM | / (60 * 365).]

Geographic Variations: MAE pattern for "E" projections (approx. +5C land / +4C all points)
[Figure: map of mean absolute downscaling error during 2086-2095, "E" projections (3-member ensemble).]

Geographic Variations: MAE pattern for "C" projections (approx. +7C land / +6C all points)
[Figure: map of mean absolute downscaling error during 2086-2095, "C" projections (3-member ensemble).]

Geographic Variations: Bias pattern for "C" projections (approx. +7C land / +6C all points)
[Figure: map of the mean climate change signal difference (ARRM minus Target), "C" projection.]

Looking at how well the stationarity assumption holds in different seasons
[Figure: monthly (Jan-Dec) ratio of future to historical MAE for tasmax over the entire US48 domain; red = "C", blue = "E"; y-axis roughly 1 to 4.]
Where and when the ratio = 1.0, the stationarity assumption fully holds (i.e., no degradation in mean absolute downscaling error during 2086-2095 vs. the 1979-2008 period used in training).
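The error metric and stationarity ratio behind these plots can be written compactly. Below is a minimal sketch, not from the presentation, of computing the area-mean, time-mean absolute downscaling error (the MAE defined above) and the future-to-historical MAE ratio by calendar month; the array names and function signatures are illustrative assumptions.

```python
# Hedged sketch: MAE and monthly stationarity ratios, per the definitions above.
# 'downscaled' and 'target' arrays are assumed to be daily fields of shape
# (ndays, ny, nx); 'months' is an integer array (1..12) giving each day's month.
import numpy as np

def mean_abs_error(downscaled, target):
    """Area-mean, time-mean absolute downscaling error: mean of |Downscaled - HiRes|."""
    return np.abs(downscaled - target).mean()

def monthly_stationarity_ratio(ds_hist, tgt_hist, months_hist,
                               ds_fut, tgt_fut, months_fut):
    """Future-period MAE divided by historical-period MAE for each calendar month.
    A ratio of 1.0 means the stationarity assumption fully holds for that month."""
    ratios = {}
    for m in range(1, 13):
        mae_hist = mean_abs_error(ds_hist[months_hist == m], tgt_hist[months_hist == m])
        mae_fut = mean_abs_error(ds_fut[months_fut == m], tgt_fut[months_fut == m])
        ratios[m] = mae_fut / mae_hist
    return ratios
```

Land-only, coastal, or interior versions of the same ratio (as in the regional comparison that follows) would simply apply a spatial mask before averaging.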
Looking at how well the stationarity assumption holds in different seasons … comparing near-coastal land vs. interior
[Figure: monthly (Jan-Dec) MAE ratios for the "C" ensemble; blue = coastal SC-CSC, black = full US48+ domain, green = interior SC-CSC; y-axis roughly 1 to 4.]

Looking at how well the stationarity assumption holds in different seasons ("C" ensemble, tasmax): there is a clear intra-month MAE trend in some, but not all, months. The "C" ensemble tasmax "sawtooth" has lower values in the cooler part of a month and higher values in the warmer part of the month.

Options for extending this 'perfect model-based' exploration of statistical downscaling stationarity:
* More SD Methods
* Different Emissions Scenarios & Times
* Mix & Match GCMs' Targets & Predictors
* More Climate Variables & Indices
* Use of Synthetic Time Series
* Different HiRes Climate Models
* Alter GCM-based data in known ways to 'Raise the Bar'
* ?? More Ideas ??

GFDL-HiRAM-C360 experiments: Data sets used in the perfect model tests are being made available to the community. This enables others to use our experimental design to test the stationarity assumption for their own SD method. Also, we invite SD developers to consider collaborating with us, so that their techniques & perspectives may be more fully incorporated into this evolving perfect model-based research & assessment system. For more info, visit www.gfdl.noaa.gov/esd_eval & www.earthsystemcog.org/projects/gfdl-perfectmodel

In other words, by accessing these types of files, folks can generate their own SD results and test how well the stationarity assumption holds for their favorite SD method. [Figure: workflow schematic -- predictor and target data files provided by GFDL ("from us") are combined with the user's "SD method of choice".] A minimal workflow sketch is given after the closing notes below.

Obviously, one does not need to perform tests using all 22,000+ grid points. Below we indicate the locations of a set of 16 points we've used for some development work (colored areas are 3x3, with the grid point of interest in the center).
[Figure: map showing the 16 development points.]

Anne Stoner and Katharine Hayhoe follow with more 'perfect model' analyses…

Odds-n-ends from day 1
* Not all SD codes run on a cell phone
* Not all GCMs have a 200 km atmosphere grid (50 km)
* GCMs are developed primarily as research tools -- not prediction tools
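As invited above, a statistical downscaling developer could slot their own method into this perfect-model framework. The sketch below is illustrative only: load_point_series, train_sd, and apply_sd are hypothetical placeholders standing in for, respectively, reading the distributed GFDL-HiRAM-C360 files and the user's SD method of choice; the actual file formats are documented at the project URLs listed earlier, and the real protocol uses cross-validation for the historical skill score.

```python
# Hedged sketch of the perfect-model evaluation loop for a user's own SD method.
# load_point_series(), train_sd(), and apply_sd() are hypothetical placeholders
# (they are not functions provided by GFDL or NCPP).
import numpy as np

def evaluate_point(coarse_hist, hires_hist, coarse_fut, hires_fut, train_sd, apply_sd):
    """Train on historical coarse->hi-res pairs, then score historical and future skill.
    (The actual protocol uses cross-validation for the historical score; omitted here.)"""
    model = train_sd(predictors=coarse_hist, target=hires_hist)         # training step
    mae_hist = np.abs(apply_sd(model, coarse_hist) - hires_hist).mean()
    mae_fut = np.abs(apply_sd(model, coarse_fut) - hires_fut).mean()
    return mae_hist, mae_fut, mae_fut / mae_hist   # ratio > 1.0 => stationarity degraded

# Example loop over a handful of locations, such as the 16 development points:
# for pt in development_points:
#     coarse_hist, hires_hist, coarse_fut, hires_fut = load_point_series(pt)
#     print(pt, evaluate_point(coarse_hist, hires_hist, coarse_fut, hires_fut,
#                              my_train_sd, my_apply_sd))
```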