Identifying Rare Class in Absence of True Labels:
Application to Monitoring Forest Fires from Satellite data
Vipin Kumar
University of Minnesota
[email protected]
www.cs.umn.edu/~kumar
ACM SIGKDD Workshop on Outlier Definition, Detection and Description
(August 10, 2015)
Work supported by NASA and NSF Expeditions
in Computing project on Understanding
Climate Change using Data-driven Approaches
Global Mapping of Forest Fires
Mapping fires is important for…
• Climate change studies
e.g., linking the impact of a changing climate on the frequency of fires
• Carbon cycle studies
e.g., quantifying how much C02 is emitted by fires (critical for UN-REDD)
• Land cover management
e.g., identifying active deforestation fronts
2
Global Mapping of Forest Fires
Approaches to mapping fires:
• Aerial/ground surveys: accurate, but expensive and infeasible at global scale
• Manual inspection: labor-intensive, difficult because burned pixels are a rare class, and infeasible at global scale
• Computational techniques: automated, cost-effective, globally scalable, and applicable to historical as well as near-real-time data
3
Predictive Modeling Approach: Forest Fire Mapping
• Instance: multispectral reflectance data
– 7 spectral bands
– 500 m spatial resolution
– 8-day composites
• Label: burned (1) or unburned (0)
The model predicts whether a given pixel is burned or not.
4
Challenges: Heterogeneity
Variations in the relationship between the explanatory variables and the target variable
• Geographical heterogeneity
• Seasonal heterogeneity
• Land class heterogeneity
Train        Test        Precision  Recall  F-value
California   California  94         65      72
Georgia      California  53         53      53
Georgia      Georgia     87         53      66
California   Georgia     10         30      16
Temporal heterogeneity: it is impossible to obtain training samples going back in time.
[Map: global availability of labeled samples for burned area classification]
5
Challenges: Ultra skewed class distribution
Burned areas in California in the year 2008:
• Positives: ~10^3 sq. km
• Negatives: ~10^6 sq. km
• Prediction at every time step, i.e., 46 × 10^6 predictions per year
→ Requires an extremely low FPR: a classifier with TPR = 0.57 and FPR = 0.0003 yields Precision = 0.58, Recall = 0.57
→ Overall accuracy is not very useful
→ Need to jointly maximize precision and recall (see the sketch below)
• Harmonic mean (F-measure)
• Geometric mean
6
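To make the skew problem concrete, here is a small illustrative sketch (synthetic counts, not the paper's data; the TPR/FPR values are made up) computing the metrics named above:

```python
# Illustrative only: synthetic confusion counts at a 1:1000 class skew.
pos, neg = 1_000, 1_000_000          # burned vs. unburned pixels (made up)
tpr, fpr = 0.60, 0.001               # hypothetical classifier rates

tp = tpr * pos                       # true positives
fp = fpr * neg                       # false positives
tn = neg - fp                        # true negatives

accuracy = (tp + tn) / (pos + neg)
precision = tp / (tp + fp)
recall = tpr
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
g_mean = (precision * recall) ** 0.5                       # geometric mean

print(f"accuracy={accuracy:.4f}  precision={precision:.3f}  "
      f"recall={recall:.2f}  F={f_measure:.3f}  G={g_mean:.3f}")
# accuracy ~0.9986 despite missing 40% of burned pixels, which is why
# precision and recall must be maximized jointly.
```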
RAPT: RAre class Prediction in absence of ground Truth
• Step 1: Learn classification models using
imperfect (noisy) labels
• Step 2: Combine predictions from classification
model and the imperfect label
• Step 3: Exploit guilt-by-association using spatial
context
7
Learning with imperfect labels
• Supervised learning with expert-annotated labels
– Sufficient training samples: SVM, decision tree, logistic regression
– Inadequate training samples (partial supervision): semi-supervised, active learning, multi-view, multi-task
• Learning with imperfect labels (imperfect supervision)
– Multiple annotators: learning with crowds (Raykar et al.)
– Single annotator: positive-unlabeled learning (Bing Liu et al.; Elkan et al.), balanced classes (Natarajan et al.), and rare class (this work)
8
Rare Class Prediction in Absence of Ground Truth
Step 1: Train a classifier using imperfect labels
[Diagram: true labels (y) are unavailable; a set of features/heuristics is used to derive imperfect labels (a) from the features (x)]
A sketch of this step appears below.
9
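To make Step 1 concrete, a minimal sketch, assuming a single thresholded "vegetation change" heuristic as the imperfect labeler and logistic regression as the classifier; all names and data here are hypothetical, since the slides do not specify them:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def heuristic_labels(veg_change, thresh=0.5):
    """Imperfect labels a: tentatively mark pixels with a large drop in a
    vegetation index as burned (1). This is a heuristic, not ground truth."""
    return (veg_change > thresh).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 7))               # 7 spectral bands (synthetic)
veg_change = 0.8 * X[:, 0] + rng.normal(scale=0.5, size=10_000)

a = heuristic_labels(veg_change)               # imperfect labels
clf = LogisticRegression().fit(X, a)           # trained on noisy targets
scores = clf.predict_proba(X)[:, 1]            # estimates Pr(a=1|x)
```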
Rare Class Prediction in Absence of Ground Truth
Step 1: Train a classifier using imperfect labels
Assumptions:
(1) The noise rates sum to less than one: Pr(a=1 | y=0) + Pr(a=0 | y=1) < 1
(2) The imperfect label is conditionally independent of the feature space given the true label
10
Learning with imperfect labels
Under assumptions (1) and (2), ranking test instances according to Pr(a=1|x) is identical to ranking them according to Pr(y=1|x).
[Plot: Pr(y=1|x) and Pr(a=1|x) over test instances ordered by Pr(y=1|x)]
12
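The slides state the ranking equivalence without a derivation; here is a short one. Write α = Pr(a=1|y=0) and β = Pr(a=0|y=1). Marginalizing over y and using assumption (2):

```latex
\begin{aligned}
\Pr(a{=}1 \mid x)
  &= \Pr(a{=}1 \mid y{=}1)\,\Pr(y{=}1 \mid x)
   + \Pr(a{=}1 \mid y{=}0)\,\Pr(y{=}0 \mid x) \\
  &= (1-\beta)\,\Pr(y{=}1 \mid x) + \alpha\,\bigl(1 - \Pr(y{=}1 \mid x)\bigr) \\
  &= \alpha + (1 - \alpha - \beta)\,\Pr(y{=}1 \mid x).
\end{aligned}
```

By assumption (1), 1 − α − β > 0, so Pr(a=1|x) is a strictly increasing function of Pr(y=1|x), and the two rankings coincide.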
Learning with imperfect labels
With true labels, thresholding Pr(y=1|x) at 0.5 maximizes classification accuracy.
[Plot: Pr(y=1|x) with the 0.5 threshold marked]
13
Learning with imperfect labels
Applying the same 0.5 threshold to Pr(a=1|x), however, is not optimal.
[Plot: Pr(a=1|x) with the 0.5 threshold marked]
14
Learning with imperfect labels
Approach: use a labeled validation data set to select the threshold on Pr(a=1|x). However, labeled data are not available in our setting.
15
Learning with imperfect labels
One possible approach (Natarajan et al. 2013): select the threshold that maximizes classification accuracy by treating the imperfect labels as the target.*
Our contribution: we prove that for balanced datasets this approach is optimal.
* For every threshold on Pr(y=1|x), an identical prediction is possible using an appropriate threshold on Pr(a=1|x) (Natarajan et al. 2013).
16
Rare class
[Plot: Pr(y=1|x) over test instances ordered by Pr(y=1|x); the threshold that maximizes classification accuracy yields Recall = 0.20, Precision = 1]
17
Rare class
[Plot: as above, plus the threshold that maximizes precision*recall, which yields Recall = 0.8, Precision = 0.5]
18
Rare class
Challenge: How to accurately estimate precision and recall with imperfect labels?
19
Rare class
Our contributions:
(1) A new method to estimate precision*recall using imperfect labels.
(2) We prove that the selected threshold maximizes the true precision*recall.
20
Estimating Precision*Recall
• Estimate precision and recall using imperfect labels
– Incorrect estimate of true precision and recall
21
Estimating Precision*Recall
• Estimate FPR using imperfect labels
– Correct approximation of the true FPR (the negative class overwhelmingly dominates, so imperfect negatives closely track true negatives)
* g(x) = 1 if P(a=1|x) > th, 0 otherwise
22
Estimating Precision*Recall
• Estimate precision from the FPR (a derivation sketch follows below)
23
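The slides do not spell out how precision follows from the FPR; a plausible reconstruction via Bayes' rule, with g(x) as defined above:

```latex
\text{precision}
 = \Pr(y{=}1 \mid g(x){=}1)
 = 1 - \frac{\Pr(g(x){=}1 \mid y{=}0)\,\Pr(y{=}0)}{\Pr(g(x){=}1)}
 = 1 - \frac{\mathrm{FPR}\cdot\Pr(y{=}0)}{\Pr(g(x){=}1)}.
```

Each term on the right is available without true labels: FPR is approximated from the imperfect labels, P(g(x)=1) is measured directly from the predictions, and P(y=0) ≈ 1 for ultra-skewed data.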
Estimating Precision*Recall
• We lack a method to correctly estimate recall, so precision*recall is computed directly:
– Write precision*recall in terms of precision, P(g(x)=1), and P(y=1)
– Estimate precision (as above)
– Compute P(g(x)=1)
– P(y=1) is a constant
• Select the threshold to maximize precision*recall (see the sketch below)
24
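A minimal sketch of the Step 1 threshold search. It relies on my reconstruction of the identity precision*recall = precision² · P(g(x)=1) / P(y=1) (which follows from recall = precision · P(g(x)=1) / P(y=1)) and treats the FPR measured against imperfect labels as a stand-in for the true FPR; `p_y1` is an assumed constant class prior:

```python
import numpy as np

def select_threshold(scores, a, p_y1=1e-3):
    """Choose the threshold on Pr(a=1|x) that maximizes the estimated
    precision*recall, using only imperfect labels a. Since P(y=1) is a
    constant, its exact value does not change the argmax."""
    scores, a = np.asarray(scores), np.asarray(a)
    best_th, best_pr = None, -np.inf
    for th in np.unique(scores):
        g = scores > th                                # g(x) = 1 iff score > th
        p_g1 = g.mean()
        if p_g1 == 0:
            continue
        # FPR against imperfect labels ~ true FPR when negatives dominate.
        fpr = (g & (a == 0)).sum() / max((a == 0).sum(), 1)
        prec = max(1.0 - fpr * (1.0 - p_y1) / p_g1, 0.0)  # precision via FPR
        pr_est = prec ** 2 * p_g1 / p_y1               # ~ precision * recall
        if pr_est > best_pr:
            best_pr, best_th = pr_est, th
    return best_th
```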
Illustration of Step 1
[Plot: distribution of the vegetation-change feature for the burned (positive) and unburned (negative) classes]
25
Illustration of Step 1
[Plot: the same distributions, with the selected threshold marked]
Step 1 allows us to select the threshold that maximizes precision*recall.
26
Illustration of Step 1
[Plot: GT model vs. RAPT model performance at class skews of 1:5, 1:200, and 1:1000]
The performance of RAPT is comparable to that of the GT (ground truth) model.
27
Rare Class Prediction in Absence of Ground Truth
Step 1: Train a classifier using weak labels
Step 2: Combine the predictions of the classifier with the imperfect labels
• An instance is labeled positive only if it is flagged positive by both.
• This considerably reduces the number of false positives (a drastic increase in precision), but it also incorrectly prunes away some positives (a loss in recall).
Combining strategy:
f(x)  a  Step 2
1     1  1
1     0  0
0     1  0
0     0  0
For rare-class scenarios, the combination step drastically increases precision for a relatively small loss of recall. Precision*recall is maximized at the end of the combination step (a sketch follows below).
28
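Step 2 is a logical AND of the two sources; a one-line NumPy sketch matching the truth table above:

```python
import numpy as np

def combine(f_x, a):
    """Step 2: label an instance positive only if both the classifier
    prediction f(x) and the imperfect label a flag it positive."""
    return ((np.asarray(f_x) == 1) & (np.asarray(a) == 1)).astype(int)

print(combine([1, 1, 0, 0], [1, 0, 1, 0]))  # [1 0 0 0], as in the table
```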
Rare Class Prediction in Absence of Ground Truth
Step 1: Train a classifier using weak labels
Step 2: Combine predictions
Step 3: Guilt-by-association
Observations:
• The combination step prunes away some positives
• Missed positives tend to lie in the neighborhood of confident positives
Approach:
• A collective classification method that uses the labels of neighbors during the final classification of each node (sketched below)
29
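A minimal sketch of the guilt-by-association step, assuming pixels on a 2D grid and a simple one-pass rule that recovers a classifier-positive pixel whenever one of its 8 neighbors survived Step 2; the actual collective classification method may iterate or weight neighbors differently:

```python
import numpy as np

def guilt_by_association(step2, f_x):
    """Step 3: recover positives pruned in Step 2 that lie next to a
    confident (Step 2) positive in the 8-neighborhood."""
    step2 = np.asarray(step2, dtype=bool)   # output of the combination step
    f_x = np.asarray(f_x, dtype=bool)       # raw classifier positives
    h, w = step2.shape
    padded = np.pad(step2, 1)
    neigh = np.zeros_like(step2)
    for dy in (-1, 0, 1):                   # OR over the 8 neighbors
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neigh |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return step2 | (f_x & neigh)
```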
Results for Burned Area Mapping
[Plot: weak label vs. RAPT Step 1 vs. GT-based classifier, California State]
30
Results for Burned Area Mapping
[Plot: adding RAPT Step 2, California State]
31
Results for Burned Area Mapping
[Plot: adding RAPT Step 3, California State]
32
Results for Burned Area Mapping
[Plot: weak label, RAPT Steps 1-3, and GT-based classifier, Georgia State]
33
Results for Burned Area Mapping
[Plot: weak label, RAPT Steps 1-3, and GT-based classifier, Montana State]
34
Global Monitoring of Fires in Tropical Forests
Fires in tropical forests during 2001-2014: RAPT finds 1 million sq. km of burned area in tropical forests, more than three times the total area reported by the state-of-the-art NASA product (MCD45).
[Venn diagram: RAPT only, 780K sq. km; common to RAPT and MCD45, 220K sq. km; MCD45 only, 60K sq. km]
35
Validation: Multiple Sources
[Figures: RAPT and MCD45 detections; burn scar in a Landsat composite; change in the vegetation time series before and after the fire event]
Validation confirmed that the additional burned areas detected using RAPT are actual burns missed by state-of-the-art products.
36
Validation: Burn Index
A burn index captures the degree of burn at a location and is computed as a function of the spectral values before and after the event. A commonly used index is dNBR, which has been used for validation in previous studies, including MCD45.
37
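The slide does not give the formula; for reference, dNBR is conventionally computed from the Normalized Burn Ratio (NBR) of pre- and post-fire imagery (standard definition, not taken from the slide):

```python
import numpy as np

def nbr(nir, swir):
    """Normalized Burn Ratio from near-infrared and shortwave-infrared
    reflectance; burned surfaces push NBR down."""
    nir, swir = np.asarray(nir, float), np.asarray(swir, float)
    return (nir - swir) / (nir + swir)

def dnbr(nir_pre, swir_pre, nir_post, swir_post):
    """Differenced NBR (pre-fire minus post-fire); larger values indicate a
    stronger burn signal."""
    return nbr(nir_pre, swir_pre) - nbr(nir_post, swir_post)
```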
Validation: Burn Index
[Histogram: dNBR distribution of unburned pixels]
38
Validation: Burn Index
[Histogram: dNBR distributions of unburned pixels and of pixels detected by MCD45 and by both products (common)]
39
Validation: Burn Index
[Histogram: adding the dNBR distribution of pixels detected only by RAPT]
40
Validation: Burn Index
[Histogram: dNBR distributions of Only MCD45, Only RAPT, and common pixels]
41
Dynamics of Fire Event
[Maps: probability of burn and time of burn for a region in north Brazil, comparing RAPT with MCD45 (Only RAPT, Common, Only MCD45)]
42
Impact on REDD+
"The [Peru] government needs to spend more than $100m a year on high-resolution satellite pictures of its billions of trees. But … a computing facility developed by the Planetary Skin Institute (PSI) … might help cut that budget."
"ALERTS, which was launched at Cancún, uses … data-mining algorithms developed at the University of Minnesota and a lot of computing power … to spot places where land use changed."
(The Economist, 12/16/2010)
43
Concluding Remarks
• Future research
– Study the impact of the conditional independence assumption between features and imperfect labels
– Extend to incorporate labels from multiple annotators
• Other applications
– Urban extent mapping
– Cyber security
– Epidemiology
44
Thank You! Questions?
UMN team members
Varun Mithal
Guruprasad Nayak
Ankush Khandelwal
(PhD thesis)
NASA Ames Collaborators
Rama Nemani
Nikunj C. Oza
Work supported by NASA and NSF Expeditions in Computing project on Understanding Climate Change using Data-driven Approaches
45