Quality, Uncertainty and Bias
– by way of example(s)
Peter Fox
Data Science – ITEC/CSCI/ERTH-6961
Week 11, November 13, 2012
Where are we in respect to the data challenge?
“The user cannot find the data;
if he can find it, he cannot access it;
if he can access it, he doesn't know how good they are;
if he finds them good, he cannot merge them with other data.”
The User's View of IT, NAS 1989
Definitions – atmospheric science
• Quality
– is in the eye of the beholder – a worst-case scenario… or a good challenge
• Uncertainty
– has aspects of accuracy (how accurately the real-world situation is assessed; this also includes bias) and precision (down to how many digits)
Definitions – atmospheric science
• Bias has two aspects:
– Systematic error resulting in the distortion of
measurement data caused by prejudice or
faulty measurement technique
– A vested interest, or strongly held paradigm or
condition that may skew the results of
sampling, measuring, or reporting the findings
of a quality assessment:
• Psychological: for example, when data providers audit their
own data, they usually have a bias to overstate its quality.
• Sampling: Sampling procedures that result in a sample that is
not truly representative of the population sampled.
Data quality needs: fitness for purpose
• Measuring climate change:
– Model validation: gridded contiguous data with uncertainties
– Long-term time series: bias assessment is a must, especially for sensor degradation and changes in orbit and spatial sampling
• Studying phenomena using multi-sensor data:
– Cross-sensor bias assessment is needed
• Realizing societal benefits through applications:
– Near-real-time transport/event monitoring – in some cases, coverage and timeliness may be more important than accuracy
– Pollution monitoring (e.g., air-quality exceedance levels) – accuracy
• Educational (users are generally not well-versed in the intricacies of quality; treating all the data as usable can impair educational lessons) – only the best products
Producer and consumer perspectives pair up as follows:
Producers ↔ Consumers
Quality Control ↔ Quality Assessment
Fitness for Purpose ↔ Fitness for Use
Trustee ↔ Trustor
Quality Control vs. Quality Assessment
• Quality Control (QC) flags in the data (assigned by the algorithm) reflect the “happiness” of the retrieval algorithm, e.g., all the necessary channels indeed had data, there were not too many clouds, the algorithm converged to a solution, etc.
• Quality assessment is done by analyzing the data “after the fact” through validation, intercomparison with other measurements, self-consistency checks, etc. It is presented as bias and uncertainty, but is reported rather inconsistently, scattered across papers and validation reports.
Level 2 data
• Swath for MISR, orbit 192 (2001)
Factors contributing to uncertainty and bias in L2
• Physical: instrument, retrieval algorithm, aerosol spatial and temporal variability…
• Input: ancillary data used by the retrieval algorithm
• Classification: erroneous flagging of the data
• Simulation: the geophysical model used for the retrieval
• Sampling: the averaging within the retrieval footprint

Level 3 data
What is Level 3 accuracy?
It is not often defined in Earth science…
• If Level 2 errors are known, the corresponding Level 3 error can be computed, in principle, but…
• Processing from L2 → L3 daily → L3 monthly may reduce random noise but can also exacerbate systematic bias and introduce additional sampling bias (see the sketch below)
• Quality is usually presented in the form of standard deviations (i.e., variability within a grid box), and sometimes pixel counts and quality histograms are provided; QC flags are rare (MODIS Land Surface is an exception)
• Natural variability is convolved with sensor/retrieval uncertainty and bias – we need to understand their relative contributions to differences between datasets
• None of this solves sampling bias
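To make the noise-vs-bias point concrete, here is a minimal numpy sketch (all values are illustrative, not from any real product) showing that grid-box averaging shrinks random error roughly as 1/√N while leaving a systematic retrieval bias untouched:

```python
import numpy as np

rng = np.random.default_rng(0)

true_aod = 0.25   # assumed "true" grid-box AOD (illustrative)
bias = 0.05       # assumed systematic retrieval bias
sigma = 0.10      # assumed random per-pixel retrieval noise
n_pixels = 100    # L2 pixels falling into one L3 grid box

# Simulated L2 retrievals: truth + systematic bias + random noise
l2 = true_aod + bias + rng.normal(0.0, sigma, n_pixels)

# L3 aggregation is just the grid-box mean
l3_mean = l2.mean()

# The random error of the mean shrinks ~ sigma / sqrt(N)...
print(f"std of the mean ~ {sigma / np.sqrt(n_pixels):.3f}")
# ...but averaging does nothing to the systematic bias:
print(f"L3 mean error = {l3_mean - true_aod:+.3f} (bias was {bias:+.3f})")
```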
MODIS vs. MERIS
Same parameter, same space & time – different results. Why?
A threshold used in MERIS processing effectively excludes high aerosol values. Note: MERIS was designed primarily as an ocean-color instrument, so aerosols are “obstacles,” not signal.
Why is it so difficult?
• Quality is perceived differently by data providers and data recipients.
• There are many different qualitative and quantitative aspects of quality.
• Methodologies for dealing with data quality are only just emerging.
• Almost nothing exists for remote-sensing data quality.
ISO Model for Data Quality
DQ_Element
├─ DQ_Completeness
│  ├─ DQ_CompletenessCommission
│  └─ DQ_CompletenessOmission
├─ DQ_ThematicAccuracy
│  ├─ DQ_ThematicClassificationCorrectness
│  ├─ DQ_NonQuantitativeAttributeAccuracy
│  └─ DQ_QuantitativeAttributeAccuracy
├─ DQ_LogicalConsistency
│  ├─ DQ_ConceptualConsistency
│  ├─ DQ_DomainConsistency
│  ├─ DQ_FormatConsistency
│  └─ DQ_TopologicalConsistency
├─ DQ_TemporalAccuracy
│  ├─ DQ_AccuracyOfATimeMeasurement
│  ├─ DQ_TemporalConsistency
│  └─ DQ_TemporalValidity
└─ DQ_PositionalAccuracy
   ├─ DQ_AbsoluteExternalPositionalAccuracy
   ├─ DQ_GriddedDataPositionalAccuracy
   └─ DQ_RelativeInternalPositionalAccuracy
But beware the limited nature of the ISO Model
[Figure: Land Surface Temperature anomaly from the Advanced Very High Resolution Radiometer (AVHRR), showing a trend artifact from orbital drift and a discontinuity artifact from a change in satellites]
Q: Examples of “Temporal Consistency” quality issues?
ISO: Temporal Consistency = “correctness of the order of events”
Going Beyond ISO for Data Quality
• Drilling down on Completeness
• Expanding Consistency
• Examining Representativeness
• Generalizing Accuracy

Drill Down on Completeness
• Spatial completeness: coverage of the daily product
Due to its wider swath, MODIS AOD covers more area than MISR; the seasonal and zonal patterns are rather similar.
MODIS Aqua AOD Average Daily Spatial Coverage by Region and Season

Region        DJF   MAM   JJA   SON
Global        38%   42%   45%   41%
Arctic         0%    5%   19%    4%
Subarctic      3%   26%   49%   25%
N Temperate   43%   43%   51%   52%
Tropics       46%   48%   49%   44%
S Temperate   45%   59%   60%   49%
Subantarctic  32%   17%   10%   24%
Antarctic      5%    0%    0%    1%

This table is Quality Evidence for the Spatial Completeness (a Quality Property) of the MODIS Aqua dataset.
Expanding Consistency
• Temporal “consistency”:
Terra: +0.005 before 2004, −0.005 after 2004, relative to Aeronet
Aqua: no change over time relative to Aeronet
From Levy, R., L. Remer, R. Kleidman, S. Mattoo, C. Ichoku, R. Kahn, and T. Eck, 2010: Global evaluation of the Collection 5 MODIS dark-target aerosol products over land, Atmos. Chem. Phys., 10, 10399-10420, doi:10.5194/acp-10-10399-2010.
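One hedged way to check this kind of temporal consistency is to compare the mean satellite-minus-Aeronet difference before and after a candidate breakpoint. A minimal sketch with synthetic matched data (the column names and the breakpoint are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical matched MODIS/Aeronet daily time series (synthetic data)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "time": pd.date_range("2000-03-01", periods=3000, freq="D"),
    "modis_aod": rng.normal(0.20, 0.05, 3000),
    "aeronet_aod": rng.normal(0.20, 0.05, 3000),
})

# Mean bias relative to Aeronet on each side of the breakpoint
diff = df["modis_aod"] - df["aeronet_aod"]
before = diff[df["time"] < "2004-01-01"].mean()
after = diff[df["time"] >= "2004-01-01"].mean()
print(f"mean bias vs. Aeronet: {before:+.4f} before 2004, {after:+.4f} after")
```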
Examining Temporal Representativeness
How well does a daily Level 3 file represent the AOD for that day?
[Figure: chance of an overpass during a given hour of the day (local time, 0–24 h) for Terra and Aqua, by latitude zone from Arctic, Subarctic, N. Temperate and S. Temperate through Subantarctic and Antarctic]
How well does a monthly product represent all the days of the month?
[Maps: MODIS Aqua AOD, July 2009; MISR Terra AOD, July 2009]
• Completeness: the MODIS dark-target algorithm does not work over deserts
• Representativeness: monthly aggregation is not enough for MISR, and not even for MODIS
• Spatial sampling patterns differ between MODIS Aqua and MISR Terra: “pulsating” areas over ocean are oriented differently due to different orbital directions during daytime measurement

Examining Spatial Representativeness
Neither pixel count nor standard deviation alone expresses how representative the grid-cell value is.
[Figure: MODIS Aerosol Optical Thickness at 0.55 microns – the Level 2 swath, the Level 3 grid AOT mean, the Level 3 grid AOT standard deviation, and the Level 3 grid AOT input pixel count (1–122)]
Generalizing Accuracy
• Aerosol Optical Depth
– Different sources of uncertainty:
• Low AOD: surface reflectance
• High AOD: assumed aerosol models
– Also: the distribution is closer to lognormal than normal
• Thus, “normal” accuracy expressions are problematic:
– Slope relative to ground truth (Aeronet)
– Correlation coefficient
– Root-mean-square error
• Instead, common practice with MODIS data is:
– Percent falling within expected error bounds of, e.g., ±(0.05 + 0.2·AOD_Aeronet) – as sketched below
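A minimal sketch of that metric, assuming the envelope form ±(a + b·AOD_Aeronet) with the over-land values a = 0.05, b = 0.2 (the function name and sample values are illustrative):

```python
import numpy as np

def fraction_within_ee(retrieved, truth, a=0.05, b=0.2):
    """Fraction of retrievals inside the expected-error envelope
    |retrieved - truth| <= a + b * truth."""
    retrieved = np.asarray(retrieved)
    truth = np.asarray(truth)
    return np.mean(np.abs(retrieved - truth) <= a + b * truth)

# Illustrative values only, not real matchups
aeronet = np.array([0.10, 0.25, 0.50, 1.00])
modis = np.array([0.12, 0.20, 0.65, 1.30])
print(f"{fraction_within_ee(modis, aeronet):.0%} within ±(0.05 + 0.2·AOD)")
```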
Intercomparison of data from multiple sensors
Data from multiple sources are to be used together:
• Current sensors/missions: MODIS, MISR, GOES, OMI.
Harmonization needs:
• It is not sufficient just to have the data from different sensors, and their provenances, in one place.
• Before comparing and fusing data, things need to be harmonized:
– Metadata: terminology, standard fields, units, scale
– Data: format, grid, spatial and temporal resolution, wavelength, etc.
– Provenance: source, assumptions, algorithm, processing steps
– Quality: bias, uncertainty, fitness-for-purpose, validation
Danger of easy data access without proper assessment of joint data usage: it is easy to use data incorrectly.
Example: South Pacific Anomaly
The MODIS Level 3 data-day definition leads to an artifact in correlation, caused by an overpass-time difference.
Investigation of artifacts in AOD correlation between MODIS and MISR near the Dateline
[Maps: standard MODIS Terra and MISR; using the calendar data-day definition for each pixel; using the local-time-based data-day definition for each pixel]
Artifacts are progressively removed by applying an appropriate data-day definition in Level 3 daily data generation for both MODIS Terra and MISR.
Different kinds of reported data quality
• Pixel-level quality: an algorithmic guess at the usability of a data point
– Granule-level quality: a statistical roll-up of pixel-level quality
• Product-level quality: how closely the data represent the actual geophysical state
• Record-level quality: how consistent and reliable the data record is across generations of measurements
Different quality types are often erroneously assumed to have the same meaning. Ensuring data quality at these different levels requires different focus and action. (A granule-level roll-up sketch follows.)
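As an illustration of the granule-level roll-up, a minimal sketch that tabulates the fraction of pixels at each quality level in one granule (the flag convention 0=Best, 1=Good, 2=Do Not Use and the granule dimensions are assumptions):

```python
import numpy as np

# Hypothetical per-pixel quality flags for one granule
# (assumed convention: 0 = Best, 1 = Good, 2 = Do Not Use)
rng = np.random.default_rng(3)
pixel_quality = rng.integers(0, 3, size=(2030, 1354))

# Granule-level quality as a statistical roll-up of pixel-level quality
levels, counts = np.unique(pixel_quality, return_counts=True)
for level, count in zip(levels, counts):
    print(f"quality level {level}: {count / pixel_quality.size:.1%} of pixels")
```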
Sensitivity of Aerosol and Chlorophyll Relationship to Data-Day Definition
• The standard Level 3 daily MODIS Aerosol Optical Depth (AOD) at 550 nm generated by the atmosphere group uses a granule UTC-time-based data-day, while the standard Level 3 daily SeaWiFS Chlorophyll (Chl) generated by the ocean group uses a pixel-based Local Solar Time (LST) data-day (see the sketch below).
• The correlation coefficients between Chl and AOD differ significantly near the dateline due to the data-day definition.
• This study suggests that the same or similar statistical aggregation methods, using the same or similar data-day definitions, should be used when creating climate time series for different parameters, to reduce the potential emergence of artifacts.
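The two data-day definitions differ only in whether the date is taken in UTC or in local solar time; a minimal sketch (function names are illustrative, and LST is approximated as UTC + longitude/15 hours):

```python
from datetime import datetime, timedelta

def dataday_utc(t_utc: datetime) -> str:
    """Calendar (UTC) data-day: the UTC date of the observation."""
    return t_utc.strftime("%Y-%m-%d")

def dataday_lst(t_utc: datetime, lon_deg: float) -> str:
    """Local-solar-time data-day: shift by ~lon/15 hours before taking
    the date, so pixels near the dateline fall on the day they were
    actually observed locally."""
    t_local = t_utc + timedelta(hours=lon_deg / 15.0)
    return t_local.strftime("%Y-%m-%d")

# A pixel just east of the dateline, observed shortly after 00Z:
t = datetime(2007, 4, 1, 0, 30)
print(dataday_utc(t))                  # 2007-04-01
print(dataday_lst(t, lon_deg=-179.0))  # 2007-03-31 – a different data-day
```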
Sensitivity study: effect of the data-day definition on ocean-color data correlation with aerosol data
Starting with aerosols: only half of the data-day artifact is present, because the ocean group uses the better data-day definition.
[Maps: correlation between MODIS Aqua AOD (ocean-group product) and MODIS Aqua AOD (atmosphere-group product); pixel-count distribution]
Continuing with chlorophyll and aerosols: the data-day effect is quite visible.
[Maps: correlation between MODIS Aqua chlorophyll and MODIS Aqua AOD at 550 nm (atmosphere-group product) for Apr 1 – Jun 4, 2007; pixel-count distribution]
GEO-CAPE impact: observation-time differences with ACE and other sensors may lead to artifacts in comparison statistics.
Sensitivity of Aerosol and Chl Relationship to Data-Day Definition
[Maps: correlation coefficients between MODIS AOT at 550 nm and SeaWiFS Chl. The artifact is the difference between using the LST-based and the calendar UTC-based data-day, i.e., the difference between the correlations of (A) MODIS AOT on the LST data-day vs. SeaWiFS Chl and (B) MODIS AOT on the UTC data-day vs. SeaWiFS Chl]
Presenting data quality to users
Split quality (viewed here broadly) into two categories:
• Global or product-level quality information, e.g., consistency, completeness, etc., which can be presented in tabular form.
• Regional/seasonal, with various approaches:
– maps with outlined regions, one map per sensor/parameter/season
– scatter plots with error estimates, one per combination of Aeronet station, parameter, and season, with different colors representing different wavelengths, etc.
But really what works is…

Quality Labels
Generated for a request for 20-90 deg N, 0-180 deg E
Advisory Report (Dimension Comparison Detail)
Comparable input parameters and their semantic equivalence

Advisory Report (Expert Advisories Detail)
Expert advisories
Quality Comparison Table for Level 3 AOD (global example)

Quality aspect: Completeness

Total time range
• MODIS: Terra 2/2/2000–present; Aqua 7/2/2002–present
• MISR: Terra 2/2/2000–present

Local revisit time
• MODIS: Terra 10:30 AM; Aqua 1:30 PM
• MISR: Terra 10:30 AM

Revisit time
• MODIS: global coverage of the entire Earth in 1 day; coverage overlap near the poles
• MISR: global coverage of the entire Earth in 9 days; coverage in 2 days in the polar regions

Swath width
• MODIS: 2330 km
• MISR: 380 km

Spectral AOD
• MODIS: AOD over ocean at 7 wavelengths (466, 553, 660, 860, 1240, 1640, 2120 nm); AOD over land at 4 wavelengths (466, 553, 660, 2120 nm)
• MISR: AOD over land and ocean at 4 wavelengths (446, 558, 672, 866 nm)

AOD uncertainty or expected error (EE)
• MODIS: ±0.03 ± 5% (over ocean; QAC >= 1); ±0.05 ± 20% (over land; QAC = 3)
• MISR: 63% fall within 0.05 or 20% of Aeronet AOD; 40% within 0.03 or 10%

Successful retrievals
• MODIS: 15% of the time
• MISR: 15% of the time (slightly more, because of retrieval over the glint region as well)
Going down to the individual level

The quality of data can vary considerably
Version 5 Level 2 standard retrieval statistics:

AIRS Parameter             Best (%)   Good (%)   Do Not Use (%)
Total Precipitable Water      38         38            24
Carbon Monoxide               64         29             7
Surface Temperature           44          5            51
Data Quality
• Validation of aerosol data shows that not all data pixels labeled as “bad” are actually bad when looked at from a bias perspective.
• But many pixels are biased, due to various reasons. (From Levy et al., 2009)

Percent of biased data in MODIS aerosols over land increases as the confidence flag decreases
[Bar chart: fractions of Compliant*, Biased Low, and Biased High retrievals (0–100%) at each confidence flag: Very Good, Good, Marginal, Bad]
*Compliant data are within ±(0.05 + 0.2·AOD_Aeronet)
Statistics from Hyer, E., J. Reid, and J. Zhang, 2010: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech. Discuss., 3, 4091–4167.
The effect of bad-quality data is often not negligible
[Maps: Hurricane Ike, 9/10/2008 – total column precipitable water (kg/m²) at quality levels Best, Good, and Do Not Use]

…or the flags can be more complicated
[Map: Hurricane Ike viewed by the Atmospheric Infrared Sounder (AIRS) – air temperature at 300 mbar. PBest is the maximum pressure for which the quality value is “Best” in temperature.]
Quality flags are also sometimes packed together into bytes
Bitfield arrangement for the Cloud_Mask_SDS variable in atmospheric products from the Moderate Resolution Imaging Spectroradiometer (MODIS) – a decoding sketch follows the list:
• Cloud Mask Status Flag: 0=Undetermined, 1=Determined
• Cloud Mask Cloudiness Flag: 0=Confident cloudy, 1=Probably cloudy, 2=Probably clear, 3=Confident clear
• Day/Night Flag: 0=Night, 1=Day
• Sunglint Flag: 0=Yes, 1=No
• Snow/Ice Flag: 0=Yes, 1=No
• Surface Type Flag: 0=Ocean, deep lake/river; 1=Coast, shallow lake/river; 2=Desert; 3=Land
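A minimal sketch of decoding such a packed byte with bit masks. The bit positions below are assumptions for illustration; the authoritative layout must be taken from the MODIS product documentation:

```python
def unpack_cloud_mask(byte_val: int) -> dict:
    """Unpack an illustrative bit layout (positions are assumed,
    not the authoritative Cloud_Mask_SDS specification)."""
    return {
        "determined":   byte_val       & 0b1,    # bit 0: status flag
        "cloudiness":  (byte_val >> 1) & 0b11,   # bits 1-2: 0..3
        "day":         (byte_val >> 3) & 0b1,    # bit 3: day/night
        "no_sunglint": (byte_val >> 4) & 0b1,    # bit 4: sunglint
        "no_snow_ice": (byte_val >> 5) & 0b1,    # bit 5: snow/ice
        "surface":     (byte_val >> 6) & 0b11,   # bits 6-7: surface type
    }

# Example: determined, confident clear, daytime, no glint, no snow/ice, land
print(unpack_cloud_mask(0b11111111))
```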
So, replace bad-quality pixels with fill values!
[Illustration: the original data array (total column precipitable water) → a mask based on user criteria (quality level < 2) → only good-quality data pixels retained]
The output file has the same format and structure as the input file (except for extra mask and original_data fields). A minimal sketch of the masking step follows.
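A numpy sketch of the screening step (the fill value, flag convention, and function name are assumptions for illustration):

```python
import numpy as np

FILL_VALUE = -9999.0  # assumed fill value; real products define their own

def screen_by_quality(data, quality, max_level=1):
    """Replace pixels whose quality level exceeds max_level with fill,
    e.g. keep Best=0 and Good=1, drop Do Not Use=2."""
    out = data.copy()
    out[quality > max_level] = FILL_VALUE
    return out

data = np.array([[30.1, 28.4], [25.0, 31.7]])  # e.g. precipitable water, kg/m^2
qual = np.array([[0, 2], [1, 2]])              # per-pixel quality levels
print(screen_by_quality(data, qual))
# [[   30.1 -9999. ]
#  [   25.  -9999. ]]
```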
Visualizations help users see the effect of different quality filters
[Maps: all data within the product file; best quality only; best + good quality]
Or, let users select their own criteria…
Initial settings are based on Science Team recommendations. (Note: “Good” retains retrievals that are Good or better.) You can choose settings for all parameters at once, or parameter by parameter.
Types of Bias Correction

Relative (cross-sensor), linear, climatological
• Spatial basis: region; temporal basis: season
• Pros: not influenced by data in other regions; good sampling
• Cons: difficult to validate

Relative (cross-sensor), non-linear, climatological
• Spatial basis: global; temporal basis: full data record
• Pros: complete sampling
• Cons: difficult to validate

Anchored, parameterized, linear
• Spatial basis: near Aeronet stations; temporal basis: full data record
• Pros: can be validated
• Cons: limited areal sampling

Anchored, parameterized, non-linear
• Spatial basis: near Aeronet stations; temporal basis: full data record
• Pros: can be validated
• Cons: limited insight into the correction

A sketch of the anchored linear form follows.
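As a hedged illustration of the anchored, parameterized, linear case: if validation against Aeronet yields a fit retrieved ≈ slope·truth + offset, the correction simply inverts that fit (the function name is illustrative; the slope/offset echo the Yanting validation numbers cited later):

```python
import numpy as np

def anchored_linear_correction(aod, slope, offset):
    """Invert a linear fit retrieved = slope * truth + offset,
    with slope and offset anchored at Aeronet stations, to
    estimate a bias-corrected AOD."""
    return (np.asarray(aod) - offset) / slope

# Illustrative parameters of the kind reported in validation studies
print(anchored_linear_correction([0.10, 0.50, 1.00], slope=1.04, offset=-0.063))
```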
Quality & Bias assessment using FreeMind
FreeMind allows capturing the various relations between aspects of aerosol measurements, algorithms, conditions, validation, etc. The “traditional” worksheets do not support the complex, multi-dimensional nature of the task.
(From the Aerosol Parameter Ontology)
Title: MODIS Terra C5 AOD vs. Aeronet during Aug–Oct biomass burning in Central Brazil, South America

(General) Statement: Collection 5 MODIS AOD at 550 nm during Aug–Oct over central South America greatly over-estimates at large AOD and, in the non-burning season, under-estimates at small AOD, as compared to Aeronet; good comparisons are found at moderate AOD.

Region & season characteristics: the central region of Brazil is a mix of forest, cerrado, and pasture, and is known to have low AOD most of the year except during the biomass burning season. Aeronet stations: Alta Floresta, Mato Grosso, Santa Cruz (central South America).

Dominating factors leading to aerosol-estimate bias:
1. The large positive bias in the AOD estimate during the biomass burning season may be due to wrong assignment of aerosol absorbing characteristics. (Specific explanation) A constant single-scattering albedo of ~0.91 is assigned for all seasons, while the true value is closer to ~0.92–0.93.
[Notes or exceptions: biomass burning regions in southern Africa do not show as large a positive bias as in this case; this may be due to different optical characteristics or single-scattering albedo of the smoke particles. Aeronet observations of SSA confirm this.]
2. Low AOD is common in the non-burning season. In low-AOD cases, biases are highly dependent on lower boundary conditions. In general a negative bias is found, due to uncertainty in the surface-reflectance characterization, which dominates when the signal from atmospheric aerosol is low.

(Example) A scatter plot of MODIS AOD at 550 nm vs. Aeronet from Hyer et al. (2011) shows severe over-estimation of MODIS Collection 5 AOD (dark-target algorithm) at large AOD during Aug–Oct 2005–2008 over Brazil. (Constraints) Only the best quality of MODIS data (Quality = 3) was used; data with scattering angle > 170 deg were excluded. (Symbols) Red lines define the region of expected error (EE); green is the fitted slope.
Results: tolerance = 62% within EE; RMSE = 0.212; r² = 0.81; slope = 1.00. For low AOD (< 0.2), slope = 0.3; for high AOD (> 1.4), slope = 1.54.

Reference: Hyer, E. J., Reid, J. S., and Zhang, J., 2011: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 4, 379–408, doi:10.5194/amt-4-379-2011.
Completeness: Observing Conditions for MODIS AOD at 550 nm Over Ocean

US Atlantic Ocean – dominated by fine-mode aerosols (smoke & sulfate)
• 72% of retrievals within expected error; average Aeronet AOD 0.15
• Over-estimated relative to Aeronet (by 7%)*

Indian Ocean – dominated by fine-mode aerosols (smoke & sulfate)
• 64% within expected error; average Aeronet AOD 0.16
• Over-estimated (by 7%)*

Asian Pacific oceans – dominated by fine aerosol, not dust
• 56% within expected error; average Aeronet AOD 0.21
• Over-estimated (by 13%)

“Saharan” ocean – outflow regions in the Atlantic dominated by dust in spring
• 56% within expected error; average Aeronet AOD 0.31
• Random bias (1%)*

Mediterranean – dominated by fine aerosol
• 57% within expected error; average Aeronet AOD 0.23
• Under-estimated (by 6%)*

*Remer, L. A., et al., 2005: The MODIS aerosol algorithm, products and validation. Journal of the Atmospheric Sciences, Special Section, 62, 947–973.
Completeness: Observing Conditions for MODIS AOD at 550 nm Over Land

Yanting, China – agriculture site (central China)
• 45% of retrievals within expected error
• Versus Chinese ground-based Sun hazemeter: slope = 1.04, offset = −0.063, r² = 0.83
• Slightly over-estimated relative to the ground-based sensor

Fukung, China – semi-desert site (northwest China)
• 7% within expected error
• slope = 1.65, offset = 0.074, r² = 0.58
• Over-estimated (by more than 100% at large AOD values)

Beijing – urban site, industrial pollution
• 35% within expected error
• slope = 0.38, offset = 0.086, r² = 0.46
• Severely under-estimated (by more than 100% at large AOD values)

*Li, Z., et al., 2007: Validation and understanding of Moderate Resolution Imaging Spectroradiometer aerosol products (C5) using ground-based measurements from the handheld Sun photometer network in China, JGR, 112, D22S07, doi:10.1029/2007JD008479.
Summary
• Quality is very hard to characterize; different groups will focus on different, and inconsistent, measures of quality
– HOW WOULD YOU ADDRESS THIS?
• Products with known quality (whether good or bad) are more valuable than products with unknown quality.
– Known quality helps you correctly assess fitness-for-use
• Harmonization of data quality is even more difficult than characterizing the quality of a single data product
What is next
• Project discussions…
• A3 – coming back to you this week
• Next week (12), Nov. 20: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Integration
– Project write-ups due.
• Reading for this week – see web site
• Last class is week 13, Nov. 27 – project presentations (and final assignment due)