PM2.5 Model Performance: Lessons Learned and Recommendations

Naresh Kumar and Eladio Knipping
EPRI
February 11, 2004

Acknowledgements

• Atmospheric & Environmental Research, Inc. (AER) – Betty Pun, Krish Vijayaraghavan and Christian Seigneur
• Tennessee Valley Authority (TVA) – Elizabeth Bailey, Larry Gautney, Qi Mao and others
• University of California, Riverside – Zion Wang, Chao-Jung Chien and Gail Tonnesen

Overview

• Model Performance Issues
• Need for Performance Guidelines/Benchmarking
• Review of Statistics
• Summary

Model Performance Issues

• Evaluation of Modeling Systems
• Local vs. Regional Evaluation
• Daily/Episodic/Seasonal/Annual Averaging
• Threshold and Outliers
• What Species to Evaluate?
• Sampling/Network Issues

Examples from Model Applications

• Two applications of CMAQ-MADRID
   – Southeastern U.S. (SOS 1999 Episode)
   – Big Bend National Park, Texas (BRAVO); Four-Month Study
• Statistical performance for SO₄²⁻, EC, OM, PM2.5

Application in Southeastern U.S.

• Southern Oxidant Study (SOS 1999)
   – June 29 to July 10, 1999
• Meteorology processed from MM5 simulations using MCIP2.2
• Emissions files courtesy of TVA
• Simulation
   – Continental U.S. domain
   – 32-km horizontal resolution without nesting

Application to Big Bend National Park

[Figure: REMSAD and CMAQ-MADRID modeling domains]

• The Georgia Tech/Goddard Global Ozone Chemistry Aerosol Radiation Transport (GOCART) model prescribed boundary conditions for SO₂ and SO₄²⁻ to the REMSAD domain.
• Preliminary Base Case simulation used boundary conditions as prescribed from a simulation of the larger outer domain by REMSAD.
• SO₂ and SO₄²⁻ concentrations were scaled at CMAQ-MADRID boundary according to CASTNet and IMPROVE Network observations.

BRAVO Monitoring Network

[Figure: map of BRAVO monitoring sites. Site labels: Wichita Mtns, Hagerman, Esperanza, Guadalupe Mtns, Presidio, McDonald, Marathon, Persimmon Gap, Big Bend K-Bar, Lake Colorado City, Stephenville, Monahans, Ft McKavett, Ft Stockton, Ft Lancaster, Sanderson, Langtry, LBJ, Amistad, Rio Grande, Eagle Pass, Brackettville, Pleasanton, Everton Ranch, Purtis Creek, Stillhouse, Somerville, Wright Patman, Center, Big Thicket, San Bernard, Laredo, Aransas, Lake Corpus Christi, Padre Island, Falcon Dam, Laguna Atascosa]

Local vs. Regional (SOS 1999)

SEARCH sites (each marked on the slide as rural, urban or suburban): Yorkville (YRK), North Birmingham (BHM), Centreville (CTR), Jefferson Street (JST), Oak Grove (OAK), Outlying Landing Field #8 (OLF), Gulfport (GFP), Pensacola (PNS)

[Figure: MNB (%) and MNE (%) at SEARCH sites for SO₄²⁻, OM, EC and PM2.5. Values in extraction order: 20%, 72%, 14%, -19%, 51%, 72%, 52%, 32%]

[Figure: MNB (%) and MNGE (%) at IMPROVE sites for SO₄²⁻, OM, EC and PM2.5. Values in extraction order: 51%, 89%, -25%, -8%, -8%, 46%, 54%, 49%]

Local vs. Regional (BRAVO)

CMAQ Sulfate Statistics

Statistic (SO₄²⁻)        Big Bend NP    BRAVO Network
Mean Normalized Bias     20%            37%
Mean Normalized Error    55%            65%

[Figure: Spatial Distribution of Mean Normalized Bias for SO₄²⁻. Legend, Normalized Bias (%): < 15, 15 to 30, 30 to 45, 45 to 60, 60 to 75, 75 to 90, 90 to 105, 105 to 120, > 120. Values printed on the map: 5.3, 13.2, 52.8, 23.4, 83.9, 129.9, 40]

[Figure: Spatial Distribution of Mean Normalized Error for SO₄²⁻. Legend, Normalized Error (%): < 15, 15 to 30, 30 to 45, 45 to 60, 60 to 75, 75 to 90, 90 to 105, 105 to 120, > 120. Values printed on the map: 40.8, 54.5, 90.3, 63]

Daily SO₄²⁻ P:O Pairs with Different Averaging

[Figure: four scatter plots of predicted vs. observed SO₄²⁻ (Daily, Weekly, Monthly, Four-Month averaging), each with a linear fit]

Averaging     MNB    MNE    Linear fit          R²
Daily         37%    65%    y = 1.11x + 0.54    0.47
Weekly        28%    43%    y = 1.32x - 0.11    0.56
Monthly       29%    35%    y = 1.30x + 0.01    0.50
Four-Month    26%    26%    y = 1.73x - 1.41    0.87
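The effect of averaging before computing statistics can be sketched as below. This is only an illustration under stated assumptions: the helper names (`mnb_mne`, `average_pairs`), the single hypothetical site and the synthetic values are not from the study. It averages predictions and observations per site and period, then computes MNB and MNE on the averaged pairs.

```python
from collections import defaultdict

def mnb_mne(pairs):
    """Return (MNB, MNE) as fractions for (predicted, observed) pairs."""
    errors = [(p - o) / o for p, o in pairs]
    n = len(errors)
    return sum(errors) / n, sum(abs(e) for e in errors) / n

def average_pairs(daily, period_of):
    """Average P and O within each (site, period) before computing statistics.

    daily: list of (site, day_index, predicted, observed) records
    period_of: maps a day index to a period label, e.g. a week number
    """
    groups = defaultdict(list)
    for site, day, p, o in daily:
        groups[(site, period_of(day))].append((p, o))
    averaged = []
    for members in groups.values():
        mean_p = sum(p for p, _ in members) / len(members)
        mean_o = sum(o for _, o in members) / len(members)
        averaged.append((mean_p, mean_o))
    return averaged

# Hypothetical record: one site, 14 days, observation fixed at 2.0,
# prediction alternating between 3.0 and 1.0.
daily = [("site1", d, 3.0 if d % 2 == 0 else 1.0, 2.0) for d in range(14)]

daily_mnb, daily_mne = mnb_mne([(p, o) for _, _, p, o in daily])
weekly_mnb, weekly_mne = mnb_mne(average_pairs(daily, lambda d: d // 7))

print(f"daily : MNB={daily_mnb:+.0%}  MNE={daily_mne:.0%}")
print(f"weekly: MNB={weekly_mnb:+.0%}  MNE={weekly_mne:.0%}")
```

In this synthetic case the day-to-day errors largely cancel inside each week, so the weekly MNE is much smaller than the daily MNE even though the model has not changed.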

Daily SO₄²⁻ P:O Pairs for Each Month

[Figure: four scatter plots of predicted vs. observed daily SO₄²⁻, one each for July, August, September and October 1999, each with a linear fit. Panel statistics in extraction order; the transcript does not tie each set to a month:]

MNB    MNE    Linear fit          R²
2%     52%    y = 0.71x + 0.46    0.26
37%    61%    y = 1.08x + 0.46    0.51
8%     49%    y = 0.99x + 0.22    0.58
85%    92%    y = 1.53x + 0.77    0.51

Effect of Threshold

[Figure: MNB and MNE as a Function of Threshold (PM2.5 Sulfate - BRAVO - All Sites - Daily Data). Horizontal axis: Threshold Concentration (µg/m³). Reference values: FB = 9%, FE = 48%]
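A threshold screen of this kind can be sketched as follows. The function name `mnb_mne_above`, the rule of screening on the observed value, and the five synthetic pairs are all assumptions for illustration, not the procedure used in the study.

```python
def mnb_mne_above(pairs, threshold):
    """MNB and MNE (fractions) using only pairs with observation >= threshold.

    Returns (MNB, MNE, pairs_kept); both statistics are None if nothing is kept.
    """
    kept = [(p, o) for p, o in pairs if o >= threshold]
    if not kept:
        return None, None, 0
    errors = [(p - o) / o for p, o in kept]
    n = len(errors)
    return sum(errors) / n, sum(abs(e) for e in errors) / n, n

# Hypothetical (predicted, observed) pairs in µg/m³; the lowest observation
# carries a large relative error.
pairs = [(0.3, 0.1), (0.6, 0.5), (1.2, 1.0), (2.0, 2.0), (3.6, 4.0)]

for threshold in (0.0, 0.5, 1.0):
    bias, error, kept = mnb_mne_above(pairs, threshold)
    print(f"threshold={threshold:.1f}  n={kept}  MNB={bias:+.1%}  MNE={error:.1%}")
```

With these numbers a single low observation dominates both statistics, which is why the choice of threshold needs objective guidance.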

Mean-Normalized/Fractional Statistics

Difference in Choosing Mean Normalized Statistics and Fractional Statistics

[Figure: MNB, MNE, FB and FE (%) vs. Data Interval (% of Data Centered at Median of Observations), 50 to 100. Labeled values:]

Data interval    MNB    MNE
50%              29%    63%
80%              31%    61%
99%              38%    66%

FB = 9%, FE = 48%
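One way to read "data interval (% of data centered at median of observations)" is to rank the pairs by observed value and trim equal numbers from both tails. The sketch below assumes that reading. The helper `central_interval`, its round-down trimming rule and the synthetic data are hypothetical and may differ from how the plotted values were produced.

```python
def central_interval(pairs, percent):
    """Keep the central `percent` (an integer) of pairs, ranked by observation."""
    ordered = sorted(pairs, key=lambda pair: pair[1])
    n = len(ordered)
    drop = (n * (100 - percent)) // 200  # pairs trimmed from each tail, rounded down
    return ordered[drop:n - drop]

def mnb(pairs):
    """Mean normalized bias as a fraction."""
    return sum((p - o) / o for p, o in pairs) / len(pairs)

# Hypothetical data: observations 1..10 with a constant additive bias of +1,
# so the normalized bias is largest at the lowest observations.
pairs = [(o + 1.0, float(o)) for o in range(1, 11)]

for percent in (50, 80, 100):
    subset = central_interval(pairs, percent)
    print(f"{percent:3d}% interval  n={len(subset):2d}  MNB={mnb(subset):.1%}")
```

Because the mean normalized statistics divide by each observation, widening the interval to include the lowest observations raises MNB here.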

Need for Model Performance Guidelines

• If no guidelines exist
   – Conduct model simulation with best estimate of emissions and meteorology
   – Perform model evaluation using favorite statistics
      • Difficult to compare across models
   – State that model performance is “quite good” or “adequate” or “reasonable” or “not bad” or “as good as it gets”
   – Use relative reduction factors
• With guidelines for ozone modeling
   – If model didn’t perform within specified guidelines
      • Extensive diagnostics performed to understand poor performance
      • Improved appropriate elements of modeling system
      • Enhanced model performance

Issues with Defining Performance Guidelines for PM2.5 Models

• What is “reasonable”, “acceptable” or “good” model performance?
   – Past experience: How well have current models done?
• What statistical measures should be used to evaluate the models?

Criteria to Select Statistical Measures I

• Simple yet Meaningful
   – Easy to Interpret
   – Relevant to Air Quality Modeling Community
• Properties of Statistics
   – Normalized vs. Absolute
   – Paired vs. Unpaired
   – Non-Fractional vs. Fractional
• “Symmetry”
   – Underestimates and overestimates must be perceived equally
   – Underestimates and overestimates must be weighted equally
   – Scalable: biases scale appropriately in statistics

Criteria to Select Statistical Measures II

• Statistics that can attest to
   – Bias
   – Error
   – Ability to capture variability
   – Peak accuracy (to some extent)
• Normalizes daily predictions paired with corresponding daily observations
• Inherently minimizes effect of outliers
• Some statistics/figures may be preferable for EVALUATION, whereas others may be preferred for DIAGNOSTICS

Problems with Thresholds & Outliers

• Issues with addressing low-end comparisons via threshold
   – Instrumental uncertainty: detection limit, signal-to-noise
   – Operational uncertainty
   – Additional considerations: network, background concentration, geography, demographics
• Inspection for outliers
   – Outlier vs. valid high observation
• Definition of outlier must be objective and unambiguous
• Clear guidance necessary for performance analysis

Review of Statistics

• Ratio of Means (Bias of Means) or Quantile-Quantile Comparisons
   – Defeats purpose of daily observations: completely unpaired
   – “Hides” any measure of true model performance
• Normalized Mean Statistics (not to be confused with Mean Normalized)
   – Defeats purpose of daily observations: weighs all errors equally regardless of magnitude of individual daily observations
   – Masks results in bias (e.g., numerator zero effect)
• Based on Linear Regressions
   – Slope of Least Squares Regression; Root (Normalized) Mean Square Error
   – Slope of Least Median of Squares Regression (Rousseeuw regression)
   – Can be skewed; neglects magnitude of observations; good for cross-comparisons
• Fractional Statistics
   – Taints integrity of statistics by placing predictions in denominator; not scalable
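The "numerator zero effect" can be illustrated with two hypothetical pairs whose raw errors cancel. The function names and data below are illustrative assumptions. The formulas follow the usual definitions of ratio of means, normalized mean bias (sum of errors over sum of observations) and mean normalized bias (mean of per-pair relative errors).

```python
def ratio_of_means(pairs):
    """Mean prediction divided by mean observation (completely unpaired)."""
    return sum(p for p, _ in pairs) / sum(o for _, o in pairs)

def normalized_mean_bias(pairs):
    """Sum of (P - O) divided by sum of O."""
    return sum(p - o for p, o in pairs) / sum(o for _, o in pairs)

def mean_normalized_bias(pairs):
    """Mean of the per-pair relative errors (P - O) / O."""
    return sum((p - o) / o for p, o in pairs) / len(pairs)

# Hypothetical pairs: one overprediction by +1, one underprediction by -1.
pairs = [(2.0, 1.0), (3.0, 4.0)]

print("ratio of means      :", ratio_of_means(pairs))
print("normalized mean bias:", normalized_mean_bias(pairs))
print("mean normalized bias:", mean_normalized_bias(pairs))
```

The first two statistics report a perfect model because the numerator sums to zero, while the paired statistic still shows a bias.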

Bias Statistics

• Mean Normalized Bias/Arithmetic Bias Factor

\[
\mathrm{ABF} = \frac{1}{N}\sum_{i=1}^{N}\frac{P_i}{O_i} = 1 + \mathrm{MNB} = 1 + \frac{1}{N}\sum_{i=1}^{N}\frac{P_i - O_i}{O_i}
\]

where P_i and O_i are the paired prediction and observation.

   – Same statistic: ABF is the style for “symmetric” perception
      • ABF = 2:1 for 100% MNB, ABF = 1:2 for –50% MNB
      • MNB in % can be useful during diagnostics due to simple and meaningful comparison to MNE, but the comparison is flawed.
   – The statistics give less weight to underpredictions than to overpredictions.

• Logarithmic Bias Factor/Logarithmic-Mean Normalized Bias

\[
\mathrm{LBF} = \exp\left[\frac{1}{N}\sum_{i=1}^{N}\ln\left(\frac{P_i}{O_i}\right)\right], \qquad \mathrm{LMNB} = \mathrm{LBF} - 1
\]

   – Wholly symmetric representation of bias that satisfies all criteria
   – Can be written in “factor” form or in “percentage” form
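A minimal sketch of the bias statistics follows, using the formulas for ABF, MNB, LBF and LMNB as given on this slide. The fractional bias is not defined in the slides; the form used here, FB = (2/N) Σ (P − O)/(P + O), is the commonly used definition and is an assumption. The function names and the `as_factor` formatting helper are illustrative.

```python
import math

def abf(pairs):
    """Arithmetic bias factor: mean of P/O."""
    return sum(p / o for p, o in pairs) / len(pairs)

def mnb(pairs):
    """Mean normalized bias (fraction): ABF - 1."""
    return abf(pairs) - 1.0

def lbf(pairs):
    """Logarithmic bias factor: exp of the mean of ln(P/O)."""
    return math.exp(sum(math.log(p / o) for p, o in pairs) / len(pairs))

def lmnb(pairs):
    """Logarithmic-mean normalized bias (fraction): LBF - 1."""
    return lbf(pairs) - 1.0

def fb(pairs):
    """Fractional bias (assumed definition, see lead-in)."""
    return 2.0 * sum((p - o) / (p + o) for p, o in pairs) / len(pairs)

def as_factor(factor):
    """Write a factor as 'x : 1' (overprediction) or '1 : x' (underprediction)."""
    return f"{factor:.2f} : 1" if factor >= 1.0 else f"1 : {1.0 / factor:.2f}"

# Two (predicted, observed) pairs: a factor-of-two overprediction and a
# factor-of-two underprediction.
pairs = [(2.0, 1.0), (0.5, 1.0)]

print("MNB :", f"{mnb(pairs):+.0%}", " ABF:", as_factor(abf(pairs)))
print("LMNB:", f"{lmnb(pairs):+.0%}", " LBF:", as_factor(lbf(pairs)))
print("FB  :", f"{fb(pairs):+.0%}")
```

For this pair of equal and opposite factor-of-two errors the arithmetic statistic reports a positive bias, while the logarithmic and fractional statistics report none.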

Error Statistics

• Mean Normalized Error

\[
\mathrm{MNE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|P_i - O_i\right|}{O_i}
\]

   – Each data point normalized with paired observation
   – Similar flaw as Arithmetic Mean Normalized Bias: the statistic gives less weight to underpredictions than to overpredictions.

• Logarithmic Error Factor/Logarithmic-Mean Normalized Error

\[
\mathrm{LEF} = \exp\left[\frac{1}{N}\sum_{i=1}^{N}\left|\ln\left(\frac{P_i}{O_i}\right)\right|\right], \qquad \mathrm{LMNE} = \mathrm{LEF} - 1
\]

   – Satisfies all criteria
   – Comparisons between logarithmic-based statistics (bias and error) are visibly meaningful when expressed in “factor” form
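The error statistics can be sketched the same way, using the formulas for MNE, LEF and LMNE as given on this slide. The fractional error is not defined in the slides; FE = (2/N) Σ |P − O|/(P + O) is the commonly used form and is an assumption here, as are the function names.

```python
import math

def mne(pairs):
    """Mean normalized error (fraction): mean of |P - O| / O."""
    return sum(abs(p - o) / o for p, o in pairs) / len(pairs)

def lef(pairs):
    """Logarithmic error factor: exp of the mean of |ln(P/O)|."""
    return math.exp(sum(abs(math.log(p / o)) for p, o in pairs) / len(pairs))

def lmne(pairs):
    """Logarithmic-mean normalized error (fraction): LEF - 1."""
    return lef(pairs) - 1.0

def fe(pairs):
    """Fractional error (assumed definition, see lead-in)."""
    return 2.0 * sum(abs(p - o) / (p + o) for p, o in pairs) / len(pairs)

# A factor-of-two overprediction and a factor-of-two underprediction,
# evaluated separately to show how each statistic weighs them.
over = [(2.0, 1.0)]
under = [(0.5, 1.0)]

for label, pairs in (("over ", over), ("under", under)):
    print(f"{label}: MNE={mne(pairs):.0%}  FE={fe(pairs):.0%}  "
          f"LMNE={lmne(pairs):.0%}  LEF={lef(pairs):.2f}")
```

MNE assigns the underprediction half the error of the overprediction, whereas LEF treats the two factor-of-two misses identically.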

Comparing Bias and Error Statistics Based on Arithmetic and Logarithmic Means

Case    Pred             Obs
I       2, 0.5           1, 1
II      0.5, 0.5         1, 1
III     1, 2, 3, 4, 1    1, 2, 3, 4, 4
IV      4, 1, 2, 3, 4    1, 1, 2, 3, 4

Bias statistics (MNB and BF are arithmetic; LMNB and LBF are logarithmic)

Case    MNB     BF          FB      LMNB    LBF
I       25%     1.25 : 1    0%      0%      1 : 1
II      -50%    1 : 2       -67%    -50%    1 : 2
III     -15%    1 : 1.18    -24%    -24%    1 : 1.32
IV      60%     1.6 : 1     24%     32%     1.32 : 1

Error statistics (MNE is arithmetic; LMNE and LEF are logarithmic)

Case    MNE     FE      LMNE    LEF
I       75%     67%     100%    2.00
II      50%     67%     100%    2.00
III     15%     24%     32%     1.32
IV      60%     24%     32%     1.32

[Figure: predicted vs. observed scatter plot for Case III and Case IV]

Mean Normalized/Fractional Statistics

Difference in Choosing Mean Normalized Statistics and Fractional Statistics: PM2.5

[Figure: MNB, MNE, FB and FE (%) vs. Data Interval (% of Data Centered at Median of Observations), 50 to 100. Labeled values:]

Data interval    MNB    MNE
50%              13%    55%
80%              13%    55%
99%              14%    56%

FB = -7%, FE = 52%

Logarithmic/Arithmetic Statistics

Comparing Logarithmic and Arithmetic Statistics: PM2.5

[Figure: MNB, MNE, FB, FE, LMNB and LMNE (%) vs. Data Interval (% of Data Centered at Median of Observations), 50 to 100. Labeled values:]

Data interval    MNB    MNE
50%              13%    55%
80%              13%    55%
99%              14%    56%

FB = -7%, FE = 52%, LMNB = -8%, LMNE = 78%

Logarithmic/Arithmetic Statistics

Bias Statistics (MNB and BF are arithmetic; LMNB and LBF are logarithmic)

Species    MNB     BF          FB      LMNB    LBF
SO₄²⁻      37%     1.37 : 1    18%     22%     1.22 : 1
EC         342%    4.42 : 1    64%     128%    2.28 : 1
OM         -50%    1 : 2.01    -78%    -61%    1 : 2.57
PM2.5      13%     1.13 : 1    -7%     -8%     1 : 1.09

Error Statistics (MNE is arithmetic; LMNE and LEF are logarithmic)

Species    MNE     FE      LMNE    LEF
SO₄²⁻      59%     45%     64%     1.64
EC         364%    104%    301%    4.01
OM         55%     82%     167%    2.67
PM2.5      55%     51%     78%     1.78

Note: MNB/ABF & MNE use 95% data interval. FB, FE, LMNB/LBF and LMNE/LEF use 100% of data.

Relating Criteria for LBF/LMNB and LEF/LMNE

Criterion for Logarithmic EF/MNE can be Established from Criterion for Logarithmic BF/MNB

For example: Error twice the amplitude of Bias

• Logarithmic Bias Factor/Logarithmic-Mean Normalized Bias
   – LBF: 1.25:1 to 1:1.25 = LMNB: 25% to -20%
• Logarithmic Error Factor/Logarithmic-Mean Normalized Error
   – LEF: ≤ 1.56 = LMNE: ≤ 56%

Relating Criteria for LBF/LMNB and LEF/LMNE

Criterion for Logarithmic EF/MNE can be Established from Criterion for Logarithmic BF/MNB

For example: Error twice the amplitude of Bias

• Logarithmic Bias Factor/Logarithmic-Mean Normalized Bias
   – LBF: 1.50:1 to 1:1.50 = LMNB: 50% to -33%
• Logarithmic Error Factor/Logarithmic-Mean Normalized Error
   – LEF: ≤ 2.25 = LMNE: ≤ 125%
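The two examples can be checked with a short calculation, assuming that "error twice the amplitude of bias" is meant on the logarithmic scale. That reading is inferred from the numbers on the slides rather than stated there.

```latex
\ln\left(\mathrm{LEF}_{\max}\right) = 2\,\ln\left(\mathrm{LBF}_{\max}\right)
\quad\Longrightarrow\quad
\mathrm{LEF}_{\max} = \mathrm{LBF}_{\max}^{2}

\mathrm{LBF}_{\max} = 1.25:\quad
\mathrm{LEF}_{\max} = 1.25^{2} = 1.5625 \approx 1.56,\qquad
\mathrm{LMNB} \in \left[\tfrac{1}{1.25} - 1,\; 1.25 - 1\right] = \left[-20\%,\; +25\%\right]

\mathrm{LBF}_{\max} = 1.50:\quad
\mathrm{LEF}_{\max} = 1.50^{2} = 2.25,\qquad
\mathrm{LMNB} \in \left[\tfrac{1}{1.50} - 1,\; 1.50 - 1\right] \approx \left[-33\%,\; +50\%\right]
```

In both cases the percentage form follows from LMNE = LEF − 1, giving 56% and 125% respectively.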

Variability Statistics

• Coefficient of Determination: R²
   – Should not be used in absence of previous statistics
• Coefficient of Determination of Linear Regressions
   – Least Squares Regression through Origin: R_o²
      • Used by some in global model community as a measure of performance and ability to capture variability
   – Least Median of Squares Regression
      • More robust, inherently minimizes effects of outliers
• Comparison of Coefficients of Variation
   – Comparison of Standard Deviation/Mean of predictions and observations
• Other statistical metrics?
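The following sketch shows why R² should not stand alone, using hypothetical data in which every prediction is exactly twice the observation. The function names are illustrative. R² is computed as the squared Pearson correlation, and only the slope of the through-origin fit is shown, not R_o². Least median of squares regression is not covered.

```python
from statistics import mean, pstdev

def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    mo, mp = mean(obs), mean(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)

def slope_through_origin(obs, pred):
    """Least squares slope b for the model pred = b * obs (no intercept)."""
    return sum(o * p for o, p in zip(obs, pred)) / sum(o * o for o in obs)

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)

# Hypothetical data: the model overpredicts every observation by a factor of two.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 4.0, 6.0, 8.0]

print("R^2                 :", r_squared(obs, pred))
print("slope through origin:", slope_through_origin(obs, pred))
print("CV of observations  :", round(coefficient_of_variation(obs), 3))
print("CV of predictions   :", round(coefficient_of_variation(pred), 3))
```

Here the variability statistics look ideal (perfect correlation, matching coefficients of variation) although the mean normalized bias is 100%, so they only become informative alongside the bias and error statistics.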

Summary Items for Discussion

• What spatial scales to use for model performance?
   – Single Site; Local/Region of Interest
   – Large Domain/Continental
• What statistics should be used?
• What are the guidelines/benchmarks for performance evaluation?
   – Should the same guidelines be used for all components:
      • Sulfate, Nitrate, Carbonaceous, PM2.5
      • Ammonium, Organic Mass, EC, “Fine Soil”, “Major Metal Oxides”
   – How are network considerations taken into account in guidelines?
• Should models meet performance guidelines for an entire year and/or other time scales (monthly, seasonal)?
   – Should there be separate guidelines for different time scales?
   – Statistics based on daily P:O pairs
   – Average daily results to create weekly, monthly, seasonal or annual statistics

More Examples

• More examples of Comparison of Statistics:
   – Fractional
   – Arithmetic-Mean Normalized
   – Logarithmic-Mean Normalized

Mean Normalized/Fractional Statistics

Difference in Choosing Mean Normalized Statistics and Fractional Statistics: SO₄²⁻

[Figure: MNB, MNE, FB and FE (%) vs. Data Interval (% of Data Centered at Median of Observations), 50 to 100. Labeled values:]

Data interval    MNB    MNE
50%              37%    62%
80%              36%    60%
99%              43%    64%

FB = 18%, FE = 45%

Logarithmic/Arithmetic Statistics

Comparing Logarithmic and Arithmetic Statistics: SO₄²⁻

[Figure: MNB, MNE, FB, FE, LMNB and LMNE (%) vs. Data Interval (% of Data Centered at Median of Observations), 50 to 100. Labeled values:]

Data interval    MNB    MNE
50%              37%    62%
80%              36%    60%
99%              43%    64%

FB = 18%, FE = 45%, LMNB = 22%, LMNE = 64%

Mean Normalized/Fractional Statistics

Difference in Choosing Mean Normalized Statistics and Fractional Statistics: EC2.5

[Figure: MNB, MNE, FB and FE (%) vs. Data Interval (% of Data Centered at Median of Observations), 50 to 100. Labeled values:]

Data interval    MNB     MNE
50%              256%    268%
80%              284%    302%
90%              309%    329%
99%              496%    520%

FB = 64%, FE = 104%

Logarithmic/Arithmetic Statistics

Comparing Logarithmic and Arithmetic Statistics: EC2.5

[Figure: MNB, MNE, FB, FE, LMNB and LMNE (%) vs. Data Interval (% of Data Centered at Median of Observations), 50 to 100. Labeled values:]

Data interval    MNB     MNE
50%              256%    268%
80%              284%    302%
90%              309%    329%
99%              496%    520%

FB = 64%, FE = 104%, LMNB = 128%, LMNE = 301%

Mean Normalized/Fractional Statistics

Difference in Choosing Mean Normalized Statistics and Fractional Statistics: OM2.5

[Figure: MNB, MNE, FB and FE (%) vs. Data Interval (% of Data Centered at Median of Observations), 50 to 100. Labeled values:]

Data interval    MNB     MNE
50%              -52%    56%
80%              -52%    56%
99%              -50%    55%

FB = -78%, FE = 82%

Logarithmic/Arithmetic Statistics

Comparing Logarithmic and Arithmetic Statistics: OM2.5

[Figure: MNB, MNE, FB, FE, LMNB and LMNE (%) vs. Data Interval (% of Data Centered at Median of Observations), 50 to 100. Labeled values:]

Data interval    MNB     MNE
50%              -52%    56%
80%              -52%    56%
99%              -50%    55%

FB = -78%, FE = 82%, LMNB = -61%, LMNE = 167%