Quality, Uncertainty and Bias – by way of example(s)
Peter Fox
Data Science – ITEC/CSCI/ERTH-6961
Week 11, November 13, 2012
Where are we with respect to the data challenge?

"The user cannot find the data; if he can find it, he cannot access it; if he can access it, he doesn't know how good it is; if he finds it good, he cannot merge it with other data." – The Users' View of IT, NAS, 1989

Definitions – atmospheric science

• Quality
  – Is in the eye of the beholder – a worst-case scenario… or a good challenge
• Uncertainty
  – Has aspects of accuracy (how accurately the real-world situation is assessed; this also includes bias) and precision (down to how many digits)

Definitions – atmospheric science (continued)

• Bias has two aspects:
  – Systematic error resulting in the distortion of measurement data, caused by prejudice or a faulty measurement technique
  – A vested interest, or a strongly held paradigm or condition, that may skew the results of sampling, measuring, or reporting the findings of a quality assessment:
    • Psychological: for example, when data providers audit their own data, they usually have a bias toward overstating its quality.
    • Sampling: sampling procedures that result in a sample that is not truly representative of the population sampled.

Data quality needs: fitness for purpose

• Measuring climate change:
  – Model validation: gridded contiguous data with uncertainties
  – Long-term time series: bias assessment is a must, especially for sensor degradation, orbit change, and spatial-sampling change
• Studying phenomena using multi-sensor data:
  – Cross-sensor bias assessment is needed
• Realizing societal benefits through applications:
  – Near-real-time use for transport/event monitoring – in some cases coverage and timeliness may matter more than accuracy
  – Pollution monitoring (e.g., air-quality exceedance levels) – accuracy matters most
• Education (users are generally not well versed in the intricacies of quality; taking all the data as usable can impair the lesson) – only the best products

Producers | Consumers
Quality Control | Quality Assessment
Fitness for Purpose | Fitness for Use
Trustee | Trustor

Quality Control vs. Quality Assessment

• Quality Control (QC) flags in the data (assigned by the algorithm) reflect the "happiness" of the retrieval algorithm: all the necessary channels indeed had data, there were not too many clouds, the algorithm converged to a solution, etc.
• Quality assessment is done by analyzing the data "after the fact" through validation, intercomparison with other measurements, self-consistency checks, etc. It is presented as bias and uncertainty. It is rather inconsistent, and is scattered across papers and validation reports.

Level 2 data

• Swath for MISR, orbit 192 (2001)

Factors contributing to uncertainty and bias in L2

• Physical: instrument, retrieval algorithm, aerosol spatial and temporal variability…
• Input: ancillary data used by the retrieval algorithm
• Classification: erroneous flagging of the data
• Simulation: the geophysical model used for the retrieval
• Sampling: the averaging within the retrieval footprint
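To make the QC-flag and L2-to-L3 ideas concrete, here is a minimal sketch with synthetic data and a hypothetical QC convention (3 = best, 0 = do not use; real products differ): filter Level 2 swath pixels by QC flag, then aggregate to a 1-degree Level 3 grid with the per-cell mean, standard deviation, and pixel count that the following slides discuss.

```python
import numpy as np

# Hypothetical Level 2 swath: per-pixel lat/lon, retrieved AOD, and a QC flag.
rng = np.random.default_rng(0)
n = 10_000
lat = rng.uniform(-90, 90, n)
lon = rng.uniform(-180, 180, n)
aod = rng.lognormal(mean=-1.5, sigma=0.8, size=n)
qc  = rng.integers(0, 4, n)            # 0..3, assumed convention

# Keep only pixels the retrieval algorithm was "happy" with.
good = qc >= 2
lat, lon, aod = lat[good], lon[good], aod[good]

# Aggregate to a 1-degree Level 3 grid.
rows = np.floor(lat + 90).astype(int)          # 0..179
cols = np.floor(lon + 180).astype(int)         # 0..359
cell = rows * 360 + cols

count = np.bincount(cell, minlength=180 * 360)
total = np.bincount(cell, weights=aod, minlength=180 * 360)
sq    = np.bincount(cell, weights=aod**2, minlength=180 * 360)

with np.errstate(invalid="ignore", divide="ignore"):
    mean = total / count                        # empty cells become NaN
    std  = np.sqrt(sq / count - mean**2)

l3_mean  = mean.reshape(180, 360)   # grid-box mean (the usual L3 value)
l3_std   = std.reshape(180, 360)    # variability within the grid box
l3_count = count.reshape(180, 360)  # input pixel count
```

Note that the per-cell standard deviation here mixes natural variability with sensor/retrieval noise, which is exactly the convolution problem the next slides raise for Level 3 accuracy.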
Level 3 data

What is Level 3 accuracy? It is not often defined in Earth science….

• If Level 2 errors are known, the corresponding Level 3 error can in principle be computed, but…
• Processing from L2 → L3 daily → L3 monthly may reduce random noise, but it can also exacerbate systematic bias and introduce additional sampling bias
• Quality is usually presented in the form of standard deviations (i.e., variability within a grid box); sometimes pixel counts and quality histograms are provided. QC flags at Level 3 are rare (the MODIS land-surface products are an exception).
• Natural variability is convolved with sensor/retrieval uncertainty and bias – we need to understand their relative contributions to differences between datasets
• None of this solves sampling bias

MODIS vs. MERIS

• Same parameter, same space and time – different results. Why?
• A threshold used in MERIS processing effectively excludes high aerosol values.
• Note: MERIS was designed primarily as an ocean-color instrument, so aerosols are "obstacles", not signal.

Why is it so difficult?

• Quality is perceived differently by data providers and data recipients.
• There are many different qualitative and quantitative aspects of quality.
• Methodologies for dealing with data quality are only just emerging.
• Almost nothing exists for remote-sensing data quality.

ISO model for data quality (DQ_Element hierarchy)

• DQ_Completeness: DQ_CompletenessCommission, DQ_CompletenessOmission
• DQ_ThematicAccuracy: DQ_ThematicClassificationCorrectness, DQ_NonQuantitativeAttributeAccuracy, DQ_QuantitativeAttributeAccuracy
• DQ_LogicalConsistency: DQ_ConceptualConsistency, DQ_DomainConsistency, DQ_FormatConsistency, DQ_TopologicalConsistency
• DQ_TemporalAccuracy: DQ_AccuracyOfATimeMeasurement, DQ_TemporalConsistency, DQ_TemporalValidity
• DQ_PositionalAccuracy: DQ_AbsoluteExternalPositionalAccuracy, DQ_GriddedDataPositionalAccuracy, DQ_RelativeInternalPositionalAccuracy

But beware the limited nature of the ISO model

• Example: the land surface temperature anomaly from the Advanced Very High Resolution Radiometer shows a trend artifact from orbital drift and a discontinuity artifact from a change in satellites.
• Q: Are these "Temporal Consistency" quality issues? ISO defines Temporal Consistency as the "correctness of the order of events" – which does not cover them.

Going beyond ISO for data quality

• Drilling down on completeness
• Expanding consistency
• Examining representativeness
• Generalizing accuracy

Drill down on completeness

• Spatial completeness: coverage of the daily product. Due to its wider swath, MODIS AOD covers more area than MISR; the seasonal and zonal patterns are rather similar. (A sketch of this kind of coverage computation appears below.)

MODIS Aqua AOD: average daily spatial coverage by region and season

Region | DJF | MAM | JJA | SON
Global | 38% | 42% | 45% | 41%
Arctic | 0% | 5% | 19% | 4%
Subarctic | 3% | 26% | 49% | 25%
N Temperate | 43% | 43% | 51% | 52%
Tropics | 46% | 48% | 49% | 44%
S Temperate | 45% | 59% | 60% | 49%
Subantarctic | 32% | 17% | 10% | 24%
Antarctic | 5% | 0% | 0% | 1%

This table is quality evidence for the spatial completeness (a quality property) of the MODIS Aqua dataset.

Expanding consistency

• Temporal "consistency": relative to Aeronet, Terra is biased +0.005 before 2004 and -0.005 after 2004; Aqua shows no change over time relative to Aeronet.
• From Levy, R., L. Remer, R. Kleidman, S. Mattoo, C. Ichoku, R. Kahn, and T. Eck, 2010: Global evaluation of the Collection 5 MODIS dark-target aerosol products over land, Atmos. Chem. Phys., 10, 10399-10420, doi:10.5194/acp-10-10399-2010.

Examining temporal representativeness

• How well does a daily Level 3 file represent the AOD for that day?
• [Figure: chance of a Terra or Aqua overpass during a given hour of the day (local time, 0-24 h), by latitude zone from the Arctic to the Antarctic]
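As promised above, here is a minimal sketch of a spatial-completeness statistic, assuming a daily Level 3 product stored as a 180x360 numpy array with NaN marking cells that had no retrievals. The zone boundaries, names, and data are illustrative, not those used for the MODIS table.

```python
import numpy as np

def zonal_coverage(l3_grid,
                   lat_edges=(-90, -60, -23.5, 23.5, 60, 90),
                   zone_names=("S high-lat", "S temperate", "Tropics",
                               "N temperate", "N high-lat")):
    """Percent of 1-degree cells with a valid retrieval, per latitude zone.

    l3_grid: 180x360 array of daily L3 values, NaN where nothing was retrieved.
    """
    lat_centers = np.arange(-89.5, 90.5, 1.0)   # row index -> latitude
    valid = ~np.isnan(l3_grid)                  # cells with retrievals
    for name, lo, hi in zip(zone_names, lat_edges[:-1], lat_edges[1:]):
        rows = (lat_centers >= lo) & (lat_centers < hi)
        print(f"{name:12s} {100.0 * valid[rows].mean():5.1f}%")

# Synthetic example: retrievals missing at random, plus a polar gap
# (e.g., no dark-target retrievals during polar night).
rng = np.random.default_rng(1)
grid = rng.lognormal(-1.5, 0.8, (180, 360))
grid[rng.random((180, 360)) > 0.45] = np.nan   # ~45% random coverage
grid[:25, :] = np.nan                          # south-polar rows empty
zonal_coverage(grid)
```

A production coverage statistic would also weight cells by area (cosine of latitude), since 1-degree cells shrink toward the poles; the percentages in the slide's table are per region as defined by the product team.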
How well does a monthly product represent all the days of the month?

• Example: MODIS Aqua AOD vs. MISR Terra AOD, July 2009.
• Completeness: the MODIS dark-target algorithm does not work over deserts.
• Representativeness: monthly aggregation is not enough for MISR, and not even for MODIS.
• Spatial sampling patterns differ between MODIS Aqua and MISR Terra: the "pulsating" areas over the ocean are oriented differently because the orbital direction differs during the daytime measurement.

Examining spatial representativeness

• Neither pixel count nor standard deviation alone expresses how representative the grid-cell value is.
• [Figure: MODIS aerosol optical thickness at 0.55 microns – a Level 2 swath alongside the Level 3 grid AOT mean, AOT standard deviation, and input pixel count (1-122)]

Generalizing accuracy

• Aerosol optical depth has different sources of uncertainty:
  – Low AOD: surface reflectance
  – High AOD: the assumed aerosol models
• Also, the AOD distribution is closer to lognormal than normal.
• Thus, the "normal" accuracy expressions are problematic:
  – Slope relative to ground truth (Aeronet)
  – Correlation coefficient
  – Root-mean-square error
• Instead, common practice with MODIS data is to report the percentage of retrievals falling within expected error bounds, e.g., ±(0.05 + 0.2 × Aeronet).

Intercomparison of data from multiple sensors

Data from multiple sources to be used together:
• Current sensors/missions: MODIS, MISR, GOES, OMI.
Harmonization needs:
• It is not sufficient just to have the data from different sensors, and their provenance, in one place.
• Before comparing and fusing data, things need to be harmonized:
  – Metadata: terminology, standard fields, units, scale
  – Data: format, grid, spatial and temporal resolution, wavelength, etc.
  – Provenance: source, assumptions, algorithm, processing steps
  – Quality: bias, uncertainty, fitness-for-purpose, validation

Dangers of easy data access without proper assessment of joint data usage: it is easy to use data incorrectly.

Example: South Pacific anomaly

• The MODIS Level 3 data-day definition leads to an artifact in correlation…
• …which is caused by an overpass time difference.

Investigation of artifacts in the AOD correlation between MODIS and MISR near the dateline

• Standard MODIS Terra and MISR
• Using a calendar data-day definition for each pixel
• Using a local-time-based data-day definition for each pixel
• Artifacts are progressively removed by applying the appropriate data-day definition when generating Level 3 daily data for both MODIS Terra and MISR.

Different kinds of reported data quality

• Pixel-level quality: an algorithmic guess at the usability of a data point
  – Granule-level quality: a statistical roll-up of pixel-level quality
• Product-level quality: how closely the data represent the actual geophysical state
• Record-level quality: how consistent and reliable the data record is across generations of measurements
• Different quality types are often erroneously assumed to have the same meaning.
• Ensuring data quality at these different levels requires a different focus and different actions.

Sensitivity of the aerosol-chlorophyll relationship to the data-day definition

• The standard Level 3 daily MODIS Aerosol Optical Depth (AOD) at 550 nm generated by the atmosphere group uses a granule UTC-time-based data-day, while the standard Level 3 daily SeaWiFS Chlorophyll (Chl) generated by the ocean group uses a pixel-based Local Solar Time (LST) data-day.
• The correlation coefficients between Chl and AOT differ significantly near the dateline purely because of the data-day definitions.
• This study suggests that the same or similar statistical aggregation methods, using the same or similar data-day definitions, should be used when creating climate time series for different parameters, to reduce the potential emergence of artifacts. (A sketch of the two data-day definitions follows.)
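To see why the two conventions disagree near the dateline, here is a minimal sketch (the function and variable names are hypothetical, not from any product's code) of assigning the same observation to a data-day by granule UTC time versus by pixel local solar time:

```python
from datetime import datetime, timedelta, timezone

def utc_dataday(obs_utc: datetime) -> str:
    """Calendar data-day from the observation's UTC time (atmosphere-group style)."""
    return obs_utc.date().isoformat()

def lst_dataday(obs_utc: datetime, lon_deg: float) -> str:
    """Data-day from pixel local solar time (ocean-group style):
    shift UTC by lon/15 hours so the day boundary follows the sun."""
    local = obs_utc + timedelta(hours=lon_deg / 15.0)  # 15 deg longitude = 1 h
    return local.date().isoformat()

# A pixel just west of the dateline, observed late in the UTC day:
t = datetime(2007, 4, 1, 23, 30, tzinfo=timezone.utc)
lon = 179.0
print(utc_dataday(t))       # 2007-04-01
print(lst_dataday(t, lon))  # 2007-04-02 (local solar time ~11:26 the next day)
```

Near the dateline the two conventions routinely put the same observation into different days, so correlating a UTC-binned product against an LST-binned product there compares partly non-overlapping sets of observations – which is exactly the artifact the following sensitivity study isolates.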
Sensitivity study: effect of the data-day definition on the correlation of ocean-color data with aerosol data

• Starting with aerosols alone: only half of the data-day artifact is present, because the ocean group uses the better data-day definition. Shown: the correlation between MODIS Aqua AOD (ocean-group product) and MODIS Aqua AOD (atmosphere-group product), with the pixel-count distribution.
• Continuing with chlorophyll and aerosols: the data-day effect is quite visible. Shown: the correlation between MODIS Aqua chlorophyll and MODIS Aqua AOD at 550 nm (atmosphere-group product) for Apr 1 – Jun 4, 2007, with the pixel-count distribution.
• GEO-CAPE impact: observation-time differences with ACE and other sensors may lead to artifacts in comparison statistics.

Sensitivity of the aerosol-Chl relationship to the data-day definition

• Correlation coefficients of MODIS AOT at 550 nm and SeaWiFS Chl.
• Artifact: the difference between using the LST and the calendar UTC-based data-day, i.e., the difference between correlation A (MODIS AOT on the LST data-day vs. SeaWiFS Chl) and correlation B (MODIS AOT on the UTC data-day vs. SeaWiFS Chl).

Presenting data quality to users

Split quality (viewed broadly here) into two categories:
• Global or product-level quality information, e.g. consistency, completeness, etc., which can be presented in tabular form.
• Regional/seasonal information, with various approaches:
  – maps with outlined regions, one map per sensor/parameter/season
  – scatter plots with error estimates, one per combination of Aeronet station, parameter, and season, with different colors representing different wavelengths, etc.
But really, what works is…

Quality labels

• Quality labels generated for a request for 20-90 deg N, 0-180 deg E.
• Advisory report (dimension comparison detail): comparable input parameters and their semantic equivalence.
• Advisory report (expert advisories detail): expert advisories.

Quality comparison table for Level 3 AOD (global example)

Quality aspect | MODIS | MISR
Total time range | Terra: 2/2/2000-present; Aqua: 7/2/2002-present | Terra: 2/2/2000-present
Local overpass time | Terra: 10:30 AM; Aqua: 1:30 PM | Terra: 10:30 AM
Revisit time | Global coverage of the entire Earth in 1 day; coverage overlap near the poles | Global coverage of the entire Earth in 9 days; coverage within 2 days in the polar regions
Swath width | 2330 km | 380 km
Spectral AOD | Over ocean at 7 wavelengths (466, 553, 660, 860, 1240, 1640, 2120 nm); over land at 4 wavelengths (466, 553, 660, 2120 nm) | Over land and ocean at 4 wavelengths (446, 558, 672, 866 nm)
AOD uncertainty or expected error (EE) | ±0.03 ± 5% (over ocean, QAC >= 1); ±0.05 ± 20% (over land, QAC = 3) | 63% fall within 0.05 or 20% of Aeronet AOD; 40% are within 0.03 or 10%
Successful retrievals | 15% of the time | 15% of the time (slightly more, because MISR also retrieves over the glint region)

Going down to the individual (pixel) level

The quality of data can vary considerably. AIRS Version 5 Level 2 standard retrieval statistics:

Parameter | Best (%) | Good (%) | Do Not Use (%)
Total Precipitable Water | 38 | 38 | 24
Carbon Monoxide | 64 | 7 | 29
Surface Temperature | 5 | 44 | 51
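As a toy illustration of how pixel-level flags roll up into product-level statistics like the AIRS table above, here is a short sketch; the flag scheme and percentages are synthetic, chosen only to resemble the three-level AIRS convention.

```python
import numpy as np

# Hypothetical per-pixel quality flags for one retrieved parameter:
# 0 = Best, 1 = Good, 2 = Do Not Use (AIRS-style three-level scheme).
rng = np.random.default_rng(2)
quality = rng.choice([0, 1, 2], size=100_000, p=[0.38, 0.38, 0.24])

labels = ["Best", "Good", "Do Not Use"]
counts = np.bincount(quality, minlength=3)
for name, c in zip(labels, counts):
    print(f"{name:11s} {100 * c / quality.size:5.1f}%")
```

Such a roll-up is a granule- or product-level summary only: it says nothing about whether the "Do Not Use" pixels are actually biased, which is precisely the point the next slides make.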
Data quality

• Validation of aerosol data shows that not all data pixels labeled "bad" are actually bad when examined from a bias perspective.
• But many pixels are biased, in different ways and for various reasons. (From Levy et al., 2009.)

The percent of biased data in MODIS aerosols over land increases as the confidence flag decreases

• [Figure: fractions of compliant*, biased-low, and biased-high retrievals for the Very Good, Good, Marginal, and Bad confidence flags]
• *Compliant data are within ±(0.05 + 0.2 × Aeronet).
• Statistics from Hyer, E., J. Reid, and J. Zhang, 2010: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech. Discuss., 3, 4091-4167.

The effect of bad-quality data is often not negligible

• Example: Hurricane Ike, 9/10/2008 – total column precipitable water (kg/m²) at the quality levels Best, Good, and Do Not Use.

…or the flags can be more complicated

• Hurricane Ike viewed by the Atmospheric Infrared Sounder (AIRS): air temperature at 300 mbar.
• PBest: the maximum pressure for which the quality value is "Best" in the temperature profile.

Quality flags are also sometimes packed together into bytes

Bitfield arrangement for the Cloud_Mask_SDS variable in atmospheric products from the Moderate Resolution Imaging Spectroradiometer (MODIS); a decoding sketch follows at the end of this subsection:

Field | Values
Cloud Mask Status Flag | 0 = Undetermined, 1 = Determined
Cloud Mask Cloudiness Flag | 0 = Confident cloudy, 1 = Probably cloudy, 2 = Probably clear, 3 = Confident clear
Day/Night Flag | 0 = Night, 1 = Day
Sunglint Flag | 0 = Yes, 1 = No
Snow/Ice Flag | 0 = Yes, 1 = No
Surface Type Flag | 0 = Ocean, deep lake/river; 1 = Coast, shallow lake/river; 2 = Desert; 3 = Land

So, replace bad-quality pixels with fill values!

• Take the original data array (total column precipitable water), build a mask from user criteria (e.g., quality level < 2), and retain only the good-quality data pixels.
• The output file has the same format and structure as the input file (except for the extra mask and original_data fields).

Visualizations help users see the effect of different quality filters

• Example: all data within the product file vs. best quality only vs. best + good quality.

Or, let users select their own criteria…

• Initial settings are based on the science team's recommendation. (Note: "Good" retains retrievals that are Good or better.)
• You can choose settings for all parameters at once, or parameter by parameter.

Types of bias correction

Type of correction | Spatial basis | Temporal basis | Pros | Cons
Relative (cross-sensor), linear, climatological | Region | Season | Not influenced by data in other regions; good sampling | Difficult to validate
Relative (cross-sensor), non-linear, climatological | Global | Full data record | Complete sampling | Difficult to validate
Anchored, parameterized, linear | Near Aeronet stations | Full data record | Can be validated | Limited areal sampling
Anchored, parameterized, non-linear | Near Aeronet stations | Full data record | Can be validated | Limited insight into the correction

Quality and bias assessment using FreeMind

• FreeMind allows capturing the many relations between the various aspects of aerosol measurements, algorithms, conditions, validation, etc.
• "Traditional" worksheets do not support the complex, multi-dimensional nature of the task.
• Example drawn from the Aerosol Parameter Ontology.
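Here is the decoding sketch promised above for byte-packed flags. The bit offsets below follow the order of the Cloud_Mask table, but treat them as illustrative; consult the product's file specification for the authoritative layout.

```python
import numpy as np

def decode_cloud_mask(byte0: np.ndarray) -> dict:
    """Unpack the first cloud-mask byte into named flag arrays.

    Assumed layout (from the table above; verify against the file spec):
      bit 0    : status       (0 undetermined, 1 determined)
      bits 1-2 : cloudiness   (0 confident cloudy .. 3 confident clear)
      bit 3    : day/night    (0 night, 1 day)
      bit 4    : sunglint     (0 yes, 1 no)
      bit 5    : snow/ice     (0 yes, 1 no)
      bits 6-7 : surface type (0 ocean .. 3 land)
    """
    b = byte0.astype(np.uint8)
    return {
        "determined":  (b >> 0) & 0b1,
        "cloudiness":  (b >> 1) & 0b11,
        "day":         (b >> 3) & 0b1,
        "no_sunglint": (b >> 4) & 0b1,
        "no_snow_ice": (b >> 5) & 0b1,
        "surface":     (b >> 6) & 0b11,
    }

# Example: one pixel flagged determined, confident clear, daytime, land.
sample = np.array([0b11001111], dtype=np.uint8)
flags = decode_cloud_mask(sample)
clear_over_land = (flags["determined"] == 1) & \
                  (flags["cloudiness"] == 3) & (flags["surface"] == 3)
print(flags, clear_over_land)
```

The same boolean masks can then drive the fill-value replacement described above: keep pixels where the user's criteria hold, and set the rest to the product's fill value.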
Example quality assessment: MODIS Terra Collection 5 AOD vs. Aeronet during Aug-Oct biomass burning in central Brazil, South America

• (General statement) Collection 5 MODIS AOD at 550 nm during Aug-Oct over central South America greatly over-estimates at large AOD and, in the non-burning season, under-estimates at small AOD, as compared with Aeronet; good agreement is found at moderate AOD.
• (Region and season characteristics) The central region of Brazil is a mix of forest, cerrado, and pasture, and is known to have low AOD most of the year except during the biomass-burning season. Stations: Alta Floresta, Mato Grosso, Santa Cruz (central South America).
• (Dominating factors leading to bias in the aerosol estimate)
  1. The large positive bias in the AOD estimate during the biomass-burning season may be due to a wrong assignment of the aerosols' absorbing characteristics. (Specific explanation) A constant single scattering albedo of ~0.91 is assigned for all seasons, while the true value is closer to ~0.92-0.93. [Notes or exceptions: biomass-burning regions in southern Africa do not show as large a positive bias as in this case; this may be due to different optical characteristics (single scattering albedo) of the smoke particles, and Aeronet observations of SSA confirm this.]
  2. Low AOD is common in the non-burning season, and in low-AOD cases the biases depend strongly on the lower boundary conditions. In general a negative bias is found, due to uncertainty in the surface-reflectance characterization, which dominates when the signal from atmospheric aerosol is low.
• (Example) A scatter plot of MODIS AOD at 550 nm vs. Aeronet, from Hyer et al. (2011), shows severe over-estimation by MODIS Collection 5 AOD (dark-target algorithm) at large AOD during Aug-Oct 2005-2008 over Brazil. (Constraints) Only the best quality of MODIS data (Quality = 3) was used; data with scattering angle > 170 deg were excluded. (Symbols) Red lines define the region of expected error (EE); green is the fitted slope.
• (Results) Tolerance = 62% within EE; RMSE = 0.212; r² = 0.81; slope = 1.00. For low AOD (< 0.2), slope = 0.3; for high AOD (> 1.4), slope = 1.54.
• Reference: Hyer, E. J., Reid, J. S., and Zhang, J., 2011: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 4, 379-408, doi:10.5194/amt-4-379-2011.

Completeness: observing conditions for MODIS AOD at 550 nm over ocean (see the EE sketch after this table)

Region | Ecosystem | % of retrievals within expected error | Average Aeronet AOD | AOD estimate relative to Aeronet
US Atlantic Ocean | Dominated by fine-mode aerosols (smoke and sulfate) | 72% | 0.15 | Over-estimated (by 7%)*
Indian Ocean | Dominated by fine-mode aerosols (smoke and sulfate) | 64% | 0.16 | Over-estimated (by 7%)*
Asian Pacific oceans | Dominated by fine aerosol, not dust | 56% | 0.21 | Over-estimated (by 13%)
"Saharan" ocean | Outflow regions in the Atlantic dominated by dust in spring | 56% | 0.31 | Random bias (1%)*
Mediterranean | Dominated by fine aerosol | 57% | 0.23 | Under-estimated (by 6%)*

*Remer, L. A., et al., 2005: The MODIS aerosol algorithm, products and validation. Journal of the Atmospheric Sciences, Special Section, 62, 947-973.
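The "% of retrievals within expected error" column above, and the 62%-within-EE result in the Brazil example, follow a simple recipe. Here is a minimal sketch using synthetic data and the over-land EE envelope ±(0.05 + 0.2 × Aeronet AOD) quoted earlier; the noise model is invented for illustration only.

```python
import numpy as np

def fraction_within_ee(aod_sat, aod_aeronet, abs_err=0.05, rel_err=0.2):
    """Fraction of satellite retrievals inside the expected-error envelope
    |sat - aeronet| <= abs_err + rel_err * aeronet.
    Defaults are the MODIS over-land EE; the over-ocean EE is +-0.03 +- 5%."""
    envelope = abs_err + rel_err * aod_aeronet
    return np.mean(np.abs(aod_sat - aod_aeronet) <= envelope)

# Synthetic "validation": Aeronet-like lognormal truth, satellite values
# with multiplicative noise and a small additive positive bias.
rng = np.random.default_rng(3)
truth = rng.lognormal(-1.5, 0.9, 5000)
sat = truth * rng.normal(1.0, 0.25, truth.size) + 0.02
print(f"{100 * fraction_within_ee(sat, truth):.0f}% within EE")
```

Because the envelope widens with AOD, this statistic tolerates larger absolute errors in heavy-aerosol conditions, which is why it suits the near-lognormal AOD distribution better than a single RMSE number.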
Completeness: observing conditions for MODIS AOD at 550 nm over land

Region | Ecosystem | % of retrievals within expected error | Correlation vs. Chinese ground-based sun photometer | AOD estimate relative to ground-based sensor
Yanting, China | Agricultural site (central China) | 45% | slope = 1.04; offset = -0.063; r² = 0.83 | Slightly over-estimated
Fukang, China | Semi-desert site (north-west China) | 7% | slope = 1.65; offset = 0.074; r² = 0.58 | Over-estimated (by more than 100% at large AOD values)
Beijing, China | Urban site, industrial pollution | 35% | slope = 0.38; offset = 0.086; r² = 0.46 | Severely under-estimated (by more than 100% at large AOD values)

*Li, Z., et al., 2007: Validation and understanding of Moderate Resolution Imaging Spectroradiometer aerosol products (C5) using ground-based measurements from the handheld sun photometer network in China, JGR, 112, D22S07, doi:10.1029/2007JD008479.

Summary

• Quality is very hard to characterize; different groups will focus on different, and inconsistent, measures of quality. HOW WOULD YOU ADDRESS THIS?
• Products with known quality (whether that quality is good or bad) are more valuable than products with unknown quality: known quality lets you correctly assess fitness for use.
• Harmonizing data quality across products is even more difficult than characterizing the quality of a single data product.

What is next

• Project discussions….
• A3 – coming back to you this week.
• Week 12 – Nov. 20: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Integration. Project write-ups due.
• Reading for this week – see the web site.
• The last class is week 13, Nov. 27 – project presentations (and the final assignment is due).