PM2.5 Model Performance: Lessons Learned and Recommendations
Naresh Kumar and Eladio Knipping, EPRI
February 11, 2004
Acknowledgements
• Atmospheric & Environmental Research, Inc. (AER)
  – Betty Pun, Krish Vijayaraghavan and Christian Seigneur
• Tennessee Valley Authority (TVA)
  – Elizabeth Bailey, Larry Gautney, Qi Mao and others
• University of California, Riverside
  – Zion Wang, Chao-Jung Chien and Gail Tonnesen
Overview
• Model Performance Issues
• Need for Performance Guidelines/Benchmarking
• Review of Statistics
• Summary
Model Performance Issues
• Evaluation of Modeling Systems
• Local vs. Regional Evaluation
• Daily/Episodic/Seasonal/Annual Averaging
• Threshold and Outliers
• What Species to Evaluate?
• Sampling/Network Issues
Examples from Model Applications
• Two applications of CMAQ-MADRID
  – Southeastern U.S. (SOS 1999 Episode)
  – Big Bend National Park, Texas (BRAVO); Four-Month Study
• Statistical performance for SO₄²⁻, EC, OM, PM2.5
Application in Southeastern U.S.
• Southern Oxidant Study (SOS 1999)
  – June 29 to July 10, 1999
• Meteorology processed from MM5 simulations using MCIP2.2
• Emissions files courtesy of TVA
• Simulation
  – Continental U.S. Domain
  – 32-km horizontal resolution without nesting
Application to Big Bend National Park
[Figure: REMSAD and CMAQ-MADRID modeling domains; figure labels: 45, 49]
• The Georgia Tech/Goddard Global Ozone Chemistry Aerosol Radiation Transport (GOCART) model prescribed boundary conditions for SO₂ and SO₄²⁻ to the REMSAD domain.
• Preliminary Base Case simulation used boundary conditions as prescribed from a simulation of the larger outer domain by REMSAD.
• SO₂ and SO₄²⁻ concentrations were scaled at CMAQ-MADRID boundary according to CASTNet and IMPROVE Network observations.
BRAVO Monitoring Network
[Map: BRAVO monitoring sites. Site labels as they appear on the map: Wichita Mtns Hagerman Esperanza Guadalupe Presidio Mtns McDonald Marathon Persimmon Gap Big Bend K-Bar Lake Colorado City Stephenville Monahans Ft McKavett Ft Stockton Ft Lancaster Sanderson Langtry LBJ Amistad Rio Grande Eagle Pass Brackettville Pleasanton Everton Ranch Purtis Creek Stillhouse Somerville Wright Patman Center Big Thicket San Bernard Laredo Aransas Lake Corpus Christi Padre Island Falcon Dam Laguna Atascosa]
Local vs. Regional (SOS 1999)
[Map: SEARCH network sites, classified as rural, urban or suburban: Yorkville (YRK), North Birmingham (BHM), Centreville (CTR), Jefferson Street (JST), Oak Grove (OAK), Outlying Landing Field #8 (OLF), Gulfport (GFP), Pensacola (PNS)]
[Chart: SEARCH network, MNB (%) and MNE (%) for SO₄²⁻, OM, EC and PM2.5. Labeled values: 20%, 72%, 14%, -19%, 51%, 72%, 52%, 32%]
[Chart: IMPROVE network, MNB (%) and MNGE (%) for SO₄²⁻, OM, EC and PM2.5. Labeled values: 51%, 89%, -25%, -8%, -8%, 46%, 54%, 49%]
Local vs. Regional (BRAVO)
SO₄²⁻ Statistic       Big Bend NP   BRAVO Network
Mean Norm. Bias       20%           37%
Mean Norm. Error      55%           65%

[Map: Spatial Distribution of Mean Normalized Bias for SO₄²⁻ (CMAQ Sulfate Statistics, Normalized Bias, %)]
[Map: Spatial Distribution of Mean Normalized Error for SO₄²⁻ (CMAQ Sulfate Statistics, Normalized Error, %)]
Legend bins for both maps (%): < 15, 15 to 30, 30 to 45, 45 to 60, 60 to 75, 75 to 90, 90 to 105, 105 to 120, > 120
Labeled site values: 5.3, 13.2, 52.8, 23.4, 83.9, 129.9, 40, 40.8, 54.5, 90.3, 63
Daily SO₄²⁻ P:O Pairs with Different Averaging
[Figure: four scatter plots of predicted versus observed SO₄²⁻, one per averaging period]
• Daily: MNB = 37%, MNE = 65%, y = 1.11x + 0.54, R² = 0.47
• Weekly: MNB = 28%, MNE = 43%, y = 1.32x - 0.11, R² = 0.56
• Monthly: MNB = 29%, MNE = 35%, y = 1.30x + 0.01, R² = 0.50
• Four-Month: MNB = 26%, MNE = 26%, y = 1.73x - 1.41, R² = 0.87
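The way averaging lowers the error statistic can be illustrated with a minimal sketch. The 14-day record below is hypothetical (it is not BRAVO data), and the helper names `mean_normalized_error` and `block_average` are introduced here only for illustration. Daily errors of opposite sign cancel inside each averaging block, so the weekly MNE comes out well below the daily MNE even though the model is unchanged.

```python
def mean_normalized_error(pred, obs):
    """MNE: mean of |P_i - O_i| / O_i over paired values."""
    return sum(abs(p - o) / o for p, o in zip(pred, obs)) / len(obs)


def block_average(values, size):
    """Average consecutive blocks of `size` values (last block may be shorter)."""
    blocks = [values[i:i + size] for i in range(0, len(values), size)]
    return [sum(block) / len(block) for block in blocks]


# Hypothetical 14-day record: constant observations, predictions that
# alternate between 50% under and 50% over.
obs_daily = [2.0] * 14
pred_daily = [1.0 if day % 2 == 0 else 3.0 for day in range(14)]

daily_mne = mean_normalized_error(pred_daily, obs_daily)
weekly_mne = mean_normalized_error(block_average(pred_daily, 7),
                                   block_average(obs_daily, 7))
print(f"daily MNE = {daily_mne:.0%}, weekly MNE = {weekly_mne:.0%}")
```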
Daily SO₄²⁻ P:O Pairs for Each Month
[Figure: four scatter plots of predicted versus observed daily SO₄²⁻, one each for July, August, September and October 1999. Panel annotations (panel assignment not preserved in the transcript):]
• MNB = 2%, MNE = 52%, y = 0.71x + 0.46, R² = 0.26
• MNB = 37%, MNE = 61%, y = 1.08x + 0.46, R² = 0.51
• MNB = 8%, MNE = 49%, y = 0.99x + 0.22, R² = 0.58
• MNB = 85%, MNE = 92%, y = 1.53x + 0.77, R² = 0.51
Effect of Threshold
MNB and MNE as a Function of Threshold (PM2.5 Sulfate - BRAVO - All Sites - Daily Data)
[Figure: statistics (%) versus Threshold Concentration (µg/m³). Labeled values: FB = 9%, FE = 48%]
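A minimal sketch of how a threshold enters the calculation, assuming the threshold is applied to the observed concentration and that pairs below it are dropped before the statistics are computed. The data and the function name `mnb_mne_above_threshold` are hypothetical; the slide does not state the exact screening rule.

```python
def mnb_mne_above_threshold(pred, obs, threshold):
    """Return (MNB, MNE, n) using only pairs whose observation >= threshold."""
    pairs = [(p, o) for p, o in zip(pred, obs) if o >= threshold]
    if not pairs:
        return None  # nothing left to evaluate
    n = len(pairs)
    mnb = sum((p - o) / o for p, o in pairs) / n
    mne = sum(abs(p - o) / o for p, o in pairs) / n
    return mnb, mne, n


# Hypothetical daily concentrations in ug/m3 (not BRAVO data).
obs = [0.1, 0.5, 2.0, 4.0]
pred = [0.5, 1.0, 2.5, 3.0]

no_threshold = mnb_mne_above_threshold(pred, obs, 0.0)
with_threshold = mnb_mne_above_threshold(pred, obs, 1.0)
print(no_threshold)
print(with_threshold)
```

In this example the two low observations dominate the unscreened statistics, which is why the choice of threshold needs to be stated alongside any reported MNB or MNE.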
Mean-Normalized/Fractional Statistics
Difference in Choosing Mean Normalized Statistics and Fractional Statistics
[Figure: MNB, MNE, FB and FE (%) versus Data Interval (% of Data Centered at Median of Observations). Labeled values: MNB = 29%, 31%, 38% and MNE = 63%, 61%, 66% at the 50%, 80% and 99% intervals; FB = 9%; FE = 48%]
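One possible reading of "% of Data Centered at Median of Observations" is sketched below: sort the pairs by observed value and keep the central fraction, trimming the lowest and highest observations equally. This trimming rule is an assumption, as are the data and the names `central_interval` and `mean_normalized_bias`; the slides do not describe how the interval was built.

```python
def central_interval(pred, obs, percent):
    """Keep the central `percent` of (obs, pred) pairs, ordered by observation."""
    pairs = sorted(zip(obs, pred))  # order pairs by observed value
    keep = max(1, round(len(pairs) * percent / 100))
    start = (len(pairs) - keep) // 2
    return pairs[start:start + keep]


def mean_normalized_bias(pairs):
    """MNB over a list of (obs, pred) pairs."""
    return sum((p - o) / o for o, p in pairs) / len(pairs)


# Hypothetical data: errors concentrated at the lowest and highest observations.
obs = [0.2, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 20.0]
pred = [1.0, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 10.0]

mnb_50 = mean_normalized_bias(central_interval(pred, obs, 50))
mnb_100 = mean_normalized_bias(central_interval(pred, obs, 100))
print(f"MNB, 50% interval = {mnb_50:.1%}; MNB, 100% interval = {mnb_100:.1%}")
```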
Need for Model Performance Guidelines
• If no guidelines exist
  – Conduct model simulation with best estimate of emissions and meteorology
  – Perform model evaluation using favorite statistics
    • Difficult to compare across models
  – State that model performance is “quite good” or “adequate” or “reasonable” or “not bad” or “as good as it gets”
  – Use relative reduction factors
• With guidelines for ozone modeling
  – If model didn’t perform within specified guidelines
    • Extensive diagnostics performed to understand poor performance
    • Improved appropriate elements of modeling system
    • Enhanced model performance
Issues with Defining Performance Guidelines for PM2.5 Models
• What is “reasonable”, “acceptable” or “good” model performance?
  – Past experience: How well have current models done?
• What statistical measures should be used to evaluate the models?
Criteria to Select Statistical Measures I
• Simple yet Meaningful
  – Easy to Interpret
  – Relevant to Air Quality Modeling Community
• Properties of Statistics
  – Normalized vs. Absolute
  – Paired vs. Unpaired
  – Non-Fractional vs. Fractional
• “Symmetry”
  – Underestimates and overestimates must be perceived equally
  – Underestimates and overestimates must be weighted equally
  – Scalable: biases scale appropriately in statistics
Criteria to Select Statistical Measures II
• Statistics that can attest to
  – Bias
  – Error
  – Ability to capture variability
  – Peak accuracy (to some extent)
  – Normalizes daily predictions paired with corresponding daily observations
  – Inherently minimizes effect of outliers
  – Some statistics/figures may be preferable for EVALUATION, whereas others may be preferred for DIAGNOSTICS
Problems with Thresholds & Outliers
• Issues with addressing low-end comparisons via threshold
  – Instrumental uncertainty: detection limit, signal-to-noise
  – Operational uncertainty
  – Additional considerations: network, background concentration, geography, demographics
• Inspection for outliers
  – Outlier vs. valid high observation
    • Definition of outlier must be objective and unambiguous
• Clear guidance necessary for performance analysis.
Review of Statistics
• Ratio of Means (Bias of Means) or Quantile-Quantile Comparisons
  – Defeats purpose of daily observations: completely unpaired
  – “Hides” any measure of true model performance
• Normalized Mean Statistics (not to be confused with Mean Normalized)
  – Defeats purpose of daily observations: equally weighs all errors regardless of magnitude of individual daily observations
  – Masks results in bias (e.g., numerator zero effect)
• Based on Linear Regressions
  – Slope of Least Squares Regression; Root (Normalized) Mean Square Error
  – Slope of Least Median of Squares Regression (Rousseeuw regression)
  – Can be skewed; neglects magnitude of observations; good for cross-comparisons.
• Fractional Statistics
  – Taints integrity of statistics by placing predictions in denominator; not scalable
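The distinction between Mean Normalized and Normalized Mean statistics can be sketched as follows. The definitions used for the Normalized Mean Bias and the Ratio of Means are the commonly used ones and are assumed here, since the slides do not write them out; the two data pairs are hypothetical. The example shows the "numerator zero effect": the summed differences cancel, so the Normalized Mean Bias reports no bias although each day is clearly mispredicted.

```python
def mean_normalized_bias(pred, obs):
    """MNB: each difference is normalized by its own paired observation."""
    return sum((p - o) / o for p, o in zip(pred, obs)) / len(obs)


def normalized_mean_bias(pred, obs):
    """NMB (assumed definition): summed differences over summed observations."""
    return sum(p - o for p, o in zip(pred, obs)) / sum(obs)


def ratio_of_means(pred, obs):
    """Mean prediction over mean observation; pairing plays no role."""
    return sum(pred) / sum(obs)


# Hypothetical pairs: a clean day overpredicted by 3x, a polluted day
# underpredicted by 20%.
pred = [3.0, 8.0]
obs = [1.0, 10.0]

mnb = mean_normalized_bias(pred, obs)
nmb = normalized_mean_bias(pred, obs)
rom = ratio_of_means(pred, obs)
print(f"MNB = {mnb:.0%}, NMB = {nmb:.0%}, ratio of means = {rom:.2f}")
```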
Bias Statistics
• Mean Normalized Bias/Arithmetic Bias Factor

$$\mathrm{ABF} = \frac{1}{N}\sum_{i=1}^{N}\frac{P_i}{O_i} = 1 + \mathrm{MNB}, \qquad \mathrm{MNB} = \frac{1}{N}\sum_{i=1}^{N}\frac{P_i - O_i}{O_i}$$

  where P_i and O_i are the paired prediction and observation and N is the number of pairs.
  – Same statistic: ABF is the style for “symmetric” perception
    • ABF = 2:1 for 100% MNB, ABF = 1:2 for –50% MNB
    • MNB in % can be useful during diagnostics due to simple and meaningful comparison to MNE, but the comparison is flawed.
  – The statistics give less weight to underpredictions than to overpredictions.
• Logarithmic Bias Factor/Logarithmic-Mean Normalized Bias

$$\mathrm{LBF} = \exp\left(\frac{1}{N}\sum_{i=1}^{N}\ln\frac{P_i}{O_i}\right), \qquad \mathrm{LMNB} = \mathrm{LBF} - 1$$

  – Wholly symmetric representation of bias that satisfies all criteria
  – Can be written in “factor” form or in “percentage” form
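A minimal sketch of the two bias statistics, written directly from the definitions above. The function names and the `as_ratio` formatting helper are hypothetical, and the two data pairs (one factor-of-two overprediction, one factor-of-two underprediction) are illustrative only.

```python
import math


def arithmetic_bias_factor(pred, obs):
    """ABF: arithmetic mean of the paired ratios P_i / O_i."""
    return sum(p / o for p, o in zip(pred, obs)) / len(obs)


def mean_normalized_bias(pred, obs):
    """MNB = ABF - 1."""
    return arithmetic_bias_factor(pred, obs) - 1.0


def logarithmic_bias_factor(pred, obs):
    """LBF: exponential of the mean log ratio (geometric mean of P_i / O_i)."""
    return math.exp(sum(math.log(p / o) for p, o in zip(pred, obs)) / len(obs))


def as_ratio(factor):
    """Write a factor as 'a : 1' (overprediction) or '1 : a' (underprediction)."""
    return f"{factor:.2f} : 1" if factor >= 1 else f"1 : {1 / factor:.2f}"


# One prediction too high by a factor of two, one too low by a factor of two.
pred = [2.0, 0.5]
obs = [1.0, 1.0]

mnb = mean_normalized_bias(pred, obs)
lbf = logarithmic_bias_factor(pred, obs)
print(f"MNB = {mnb:+.0%}, ABF = {as_ratio(1 + mnb)}")
print(f"LMNB = {lbf - 1:+.0%}, LBF = {as_ratio(lbf)}")
```

The arithmetic statistic reports a positive bias for this pair of equal and opposite factor errors, while the logarithmic statistic reports none, which is the symmetry property the slide refers to.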
Error Statistics
• Mean Normalized Error

$$\mathrm{MNE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|P_i - O_i\right|}{O_i}$$

  – Each data point normalized with paired observation
  – Similar flaw as Arithmetic Mean Normalized Bias: the statistic gives less weight to underpredictions than to overpredictions.
• Logarithmic Error Factor/Logarithmic-Mean Normalized Error

$$\mathrm{LEF} = \exp\left(\frac{1}{N}\sum_{i=1}^{N}\left|\ln\frac{P_i}{O_i}\right|\right), \qquad \mathrm{LMNE} = \mathrm{LEF} - 1$$

  – Satisfies all criteria
  – Comparisons between logarithmic-based statistics (bias and error) are visibly meaningful when expressed in “factor” form
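The asymmetry of MNE and the symmetry of LEF can be checked with a short sketch based on the definitions above. The function names are hypothetical and the single-pair inputs are illustrative: a factor-of-two overprediction and a factor-of-two underprediction of the same observation.

```python
import math


def mean_normalized_error(pred, obs):
    """MNE: mean of |P_i - O_i| / O_i."""
    return sum(abs(p - o) / o for p, o in zip(pred, obs)) / len(obs)


def logarithmic_error_factor(pred, obs):
    """LEF: exponential of the mean absolute log ratio."""
    total = sum(abs(math.log(p / o)) for p, o in zip(pred, obs))
    return math.exp(total / len(obs))


# Same observation, predicted too high and too low by a factor of two.
mne_over = mean_normalized_error([2.0], [1.0])
mne_under = mean_normalized_error([0.5], [1.0])
lef_over = logarithmic_error_factor([2.0], [1.0])
lef_under = logarithmic_error_factor([0.5], [1.0])

print(f"MNE: over = {mne_over:.0%}, under = {mne_under:.0%}")
print(f"LEF: over = {lef_over:.2f}, under = {lef_under:.2f}")
```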
Comparing Bias and Error Statistics Based on Arithmetic and Logarithmic Means
Test cases (paired predictions and observations)

Case   Pred            Obs
I      2, 0.5          1, 1
II     0.5, 0.5        1, 1
III    1, 2, 3, 4, 1   1, 2, 3, 4, 4
IV     4, 1, 2, 3, 4   1, 1, 2, 3, 4

Bias statistics

       Arithmetic                   Logarithmic
Case   MNB     BF          FB      LMNB    LBF
I      25%     1.25 : 1    0%      0%      1 : 1
II     -50%    1 : 2       -67%    -50%    1 : 2
III    -15%    1 : 1.18    -24%    -24%    1 : 1.32
IV     60%     1.6 : 1     24%     32%     1.32 : 1

Error statistics

       Arithmetic          Logarithmic
Case   MNE     FE          LMNE    LEF
I      75%     67%         100%    2.00
II     50%     67%         100%    2.00
III    15%     24%         32%     1.32
IV     60%     24%         32%     1.32

[Figure: scatter plots of Pred versus Obs for Cases I to IV]
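The four cases can be recomputed with a short script, shown here as a sketch. The fractional statistics use the common definitions FB = mean of 2(P - O)/(P + O) and FE = mean of 2|P - O|/(P + O), which are assumed because the slides do not write them out; with these definitions the script reproduces the percentages listed for Cases I to IV. The names `CASES`, `mean` and `stats` are introduced for illustration.

```python
import math

# Predictions and observations for the four cases on the slide.
CASES = {
    "I": ([2, 0.5], [1, 1]),
    "II": ([0.5, 0.5], [1, 1]),
    "III": ([1, 2, 3, 4, 1], [1, 2, 3, 4, 4]),
    "IV": ([4, 1, 2, 3, 4], [1, 1, 2, 3, 4]),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def stats(pred, obs):
    """Arithmetic, fractional (assumed definitions) and logarithmic statistics."""
    pairs = list(zip(pred, obs))
    return {
        "MNB": mean((p - o) / o for p, o in pairs),
        "MNE": mean(abs(p - o) / o for p, o in pairs),
        "FB": mean(2 * (p - o) / (p + o) for p, o in pairs),
        "FE": mean(2 * abs(p - o) / (p + o) for p, o in pairs),
        "LMNB": math.exp(mean(math.log(p / o) for p, o in pairs)) - 1,
        "LMNE": math.exp(mean(abs(math.log(p / o)) for p, o in pairs)) - 1,
    }


results = {name: stats(pred, obs) for name, (pred, obs) in CASES.items()}
for name, values in results.items():
    print(name, "  ".join(f"{key}={value:+.0%}" for key, value in values.items()))
```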
Mean Normalized/Fractional Statistics
Difference in Choosing Mean Normalized Statistics and Fractional Statistics: PM2.5
[Figure: MNB, MNE, FB and FE (%) versus Data Interval (% of Data Centered at Median of Observations). Labeled values: MNB = 13%, 13%, 14% and MNE = 55%, 55%, 56% at the 50%, 80% and 99% intervals; FB = -7%; FE = 52%]
Logarithmic/Arithmetic Statistics
Comparing Logarithmic and Arithmetic Statistics: PM2.5
[Figure: MNB, MNE, FB, FE, LMNB and LMNE (%) versus Data Interval (% of Data Centered at Median of Observations). Labeled values: MNB = 13%, 13%, 14% and MNE = 55%, 55%, 56% at the 50%, 80% and 99% intervals; FB = -7%; FE = 52%; LMNB = -8%; LMNE = 78%]
Logarithmic/Arithmetic Statistics

Bias Statistics

          Arithmetic                  Logarithmic
Species   MNB     BF          FB      LMNB    LBF
SO₄²⁻     37%     1.37 : 1    18%     22%     1.22 : 1
EC        342%    4.42 : 1    64%     128%    2.28 : 1
OM        -50%    1 : 2.01    -78%    -61%    1 : 2.57
PM2.5     13%     1.13 : 1    -7%     -8%     1 : 1.09

Error Statistics

          Arithmetic          Logarithmic
Species   MNE     FE          LMNE    LEF
SO₄²⁻     59%     45%         64%     1.64
EC        364%    104%        301%    4.01
OM        55%     82%         167%    2.67
PM2.5     55%     51%         78%     1.78

Note: MNB/ABF & MNE use 95% data interval. FB, FE, LMNB/LBF and LMNE/LEF use 100% of data.
Relating Criteria for LBF/LMNB and LEF/LMNE
• Criterion for Logarithmic EF/MNE can be established from criterion for Logarithmic BF/MNB
  – For example: Error twice the amplitude of Bias
• Logarithmic Bias Factor/Logarithmic-Mean Normalized Bias
  – LBF: 1.25:1 to 1:1.25 = LMNB: 25% to -20%
• Logarithmic Error Factor/Logarithmic-Mean Normalized Error
  – LEF: ≤ 1.56 = LMNE: ≤ 56%
Relating Criteria for LBF/LMNB and LEF/LMNE
• Criterion for Logarithmic EF/MNE can be established from criterion for Logarithmic BF/MNB
  – For example: Error twice the amplitude of Bias
• Logarithmic Bias Factor/Logarithmic-Mean Normalized Bias
  – LBF: 1.50:1 to 1:1.50 = LMNB: 50% to -33%
• Logarithmic Error Factor/Logarithmic-Mean Normalized Error
  – LEF: ≤ 2.25 = LMNE: ≤ 125%
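The numbers in both examples are consistent with reading "error twice the amplitude of bias" in logarithmic space, that is ln(LEF) = 2 ln(LBF), or LEF = LBF². That reading is an interpretation rather than something the slides state, and the function name `criteria_from_bias_factor` is hypothetical; the sketch below only shows that it reproduces the listed limits.

```python
def criteria_from_bias_factor(lbf_limit, error_multiple=2.0):
    """Derive LMNB, LEF and LMNE limits from a symmetric LBF limit.

    Assumes the error criterion is `error_multiple` times the bias
    criterion in log space: ln(LEF) = error_multiple * ln(LBF).
    """
    lef_limit = lbf_limit ** error_multiple
    return {
        "LMNB_upper": lbf_limit - 1,        # overprediction bound
        "LMNB_lower": 1 / lbf_limit - 1,    # underprediction bound
        "LEF_limit": lef_limit,
        "LMNE_limit": lef_limit - 1,
    }


for limit in (1.25, 1.50):
    c = criteria_from_bias_factor(limit)
    print(f"LBF {limit:.2f}:1 to 1:{limit:.2f} -> "
          f"LMNB {c['LMNB_upper']:.0%} to {c['LMNB_lower']:.0%}, "
          f"LEF <= {c['LEF_limit']:.2f}, LMNE <= {c['LMNE_limit']:.0%}")
```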
Variability Statistics
• Coefficient of Determination: R²
  – Should not be used in absence of previous statistics
• Coefficient of Determination of Linear Regressions
  – Least Squares Regression through Origin: R_o²
    • Used by some in global model community as a measure of performance and ability to capture variability
  – Least Median of Squares Regression
    • More robust, inherently minimizes effects of outliers
• Comparison of Coefficients of Variation
  – Comparison of Standard Deviation/Mean of predictions and observations
• Other statistical metrics?
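Why R² should not stand alone can be shown with a small sketch. It uses the squared Pearson correlation for R² and the population standard deviation for the coefficient of variation; both choices, the function names and the data are assumptions made for illustration. Predictions that are exactly twice the observations give a perfect R² and matching coefficients of variation while carrying a 100% bias.

```python
import statistics


def r_squared(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)


def coefficient_of_variation(values):
    """Standard deviation divided by mean (population form assumed)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def mean_normalized_bias(pred, obs):
    return sum((p - o) / o for p, o in zip(pred, obs)) / len(obs)


# Hypothetical data: every prediction is double its observation.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 4.0, 6.0, 8.0]

r2 = r_squared(pred, obs)
cv_pred = coefficient_of_variation(pred)
cv_obs = coefficient_of_variation(obs)
mnb = mean_normalized_bias(pred, obs)
print(f"R2 = {r2:.2f}, CV(pred) = {cv_pred:.2f}, CV(obs) = {cv_obs:.2f}, MNB = {mnb:.0%}")
```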
Summary Items for Discussion
• What spatial scales to use for model performance?
  – Single Site; Local/Region of Interest
  – Large Domain/Continental
• What statistics should be used?
  – What are the guidelines/benchmarks for performance evaluation?
  – Should the same guidelines be used for all components:
    • Sulfate, Nitrate, Carbonaceous, PM2.5
    • Ammonium, Organic Mass, EC, “Fine Soil”, “Major Metal Oxides”
  – How are network considerations taken into account in guidelines?
• Should models meet performance guidelines for an entire year and/or other time scales (monthly, seasonal)?
  – Should there be separate guidelines for different time scales?
  – Statistics based on daily P:O pairs
  – Average daily results to create weekly, monthly, seasonal or annual statistics
More Examples
• More examples of Comparison of Statistics:
  – Fractional
  – Arithmetic-Mean Normalized
  – Logarithmic-Mean Normalized
Mean Normalized/Fractional Statistics
Difference in Choosing Mean Normalized Statistics and Fractional Statistics: SO₄²⁻
[Figure: MNB, MNE, FB and FE (%) versus Data Interval (% of Data Centered at Median of Observations). Labeled values: MNB = 37%, 36%, 43% and MNE = 62%, 60%, 64% at the 50%, 80% and 99% intervals; FB = 18%; FE = 45%]
Logarithmic/Arithmetic Statistics
Comparing Logarithmic and Arithmetic Statistics: SO₄²⁻
[Figure: MNB, MNE, FB, FE, LMNB and LMNE (%) versus Data Interval (% of Data Centered at Median of Observations). Labeled values: MNB = 37%, 36%, 43% and MNE = 62%, 60%, 64% at the 50%, 80% and 99% intervals; FB = 18%; FE = 45%; LMNB = 22%; LMNE = 64%]
Mean Normalized/Fractional Statistics
Difference in Choosing Mean Normalized Statistics and Fractional Statistics: EC2.5
[Figure: MNB, MNE, FB and FE (%) versus Data Interval (% of Data Centered at Median of Observations). Labeled values: MNB = 256%, 284%, 309%, 496% and MNE = 268%, 302%, 329%, 520% at the 50%, 80%, 90% and 99% intervals; FB = 64%; FE = 104%]
Logarithmic/Arithmetic Statistics
Comparing Logarithmic and Arithmetic Statistics: EC2.5
[Figure: MNB, MNE, FB, FE, LMNB and LMNE (%) versus Data Interval (% of Data Centered at Median of Observations). Labeled values: MNB = 256%, 284%, 309%, 496% and MNE = 268%, 302%, 329%, 520% at the 50%, 80%, 90% and 99% intervals; FB = 64%; FE = 104%; LMNB = 128%; LMNE = 301%]
Mean Normalized/Fractional Statistics
Difference in Choosing Mean Normalized Statistics and Fractional Statistics: OM2.5
[Figure: MNB, MNE, FB and FE (%) versus Data Interval (% of Data Centered at Median of Observations). Labeled values: MNB = -52%, -52%, -50% and MNE = 56%, 56%, 55% at the 50%, 80% and 99% intervals; FB = -78%; FE = 82%]
Logarithmic/Arithmetic Statistics
Comparing Logarithmic and Arithmetic Statistics: OM2.5
[Figure: MNB, MNE, FB, FE, LMNB and LMNE (%) versus Data Interval (% of Data Centered at Median of Observations). Labeled values: MNB = -52%, -52%, -50% and MNE = 56%, 56%, 55% at the 50%, 80% and 99% intervals; FB = -78%; FE = 82%; LMNB = -61%; LMNE = 167%]