Image Classification: Accuracy Assessment
Lecture 10
Prepared by R. Lathrop 11/99
Updated 3/06
Readings:
ERDAS Field Guide 5th Ed. Ch 6:234-260
Learning objectives
• Remote sensing science concepts
  – Rationale and technique for post-classification smoothing
  – Errors of omission vs. commission
  – Accuracy assessment
    • Sampling methods
    • Measures
    • Fuzzy accuracy assessment
• Math Concepts
  – Calculating accuracy measures: overall accuracy, producer's accuracy, user's accuracy, and kappa coefficient
• Skills
  – Interpreting the contingency matrix and accuracy assessment measures
Post-classification "smoothing"
• Most classifications have a problem with "salt and pepper", i.e., single or small groups of mis-classified pixels, because classifiers are "point" operations that treat each pixel independently of its neighbors
• "Salt and pepper" may be real. The decision on whether to filter/eliminate depends on the choice of the minimum mapping unit: does it equal a single pixel or an aggregation?
• Majority filtering: replaces the central pixel with the majority class in a specified neighborhood (e.g., a 3x3 window); con: alters edges (see the sketch after this list)
• Eliminate: clumps "like" pixels and replaces clumps under a size threshold with the majority class in the local neighborhood; pro: doesn't alter edges
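A minimal sketch of 3x3 majority filtering, assuming an integer-coded class raster and using scipy.ndimage; the helper name majority_filter and the toy array are illustrative, not part of the ERDAS workflow:

```python
import numpy as np
from scipy import ndimage

def majority_filter(class_map, size=3):
    """Replace each pixel with the modal (majority) class in a
    size x size neighborhood; class codes must be non-negative ints."""
    def modal(window):
        return np.argmax(np.bincount(window.astype(int)))
    return ndimage.generic_filter(class_map, modal, size=size)

# Toy classified map with single-pixel "salt and pepper" (the 2 and the 8):
cmap = np.array([[6, 6, 6, 6],
                 [6, 2, 6, 6],
                 [6, 6, 8, 6],
                 [6, 6, 6, 6]])
print(majority_filter(cmap))  # both isolated pixels are replaced by class 6
```

Because the window is re-evaluated at every pixel, a majority filter can also flip pixels that sit along a legitimate class boundary, which is the edge-alteration drawback noted above.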
Example: Majority filtering
[Figure: grid of class values (classes 2, 6, 8) with a 3x3 window highlighted; Class 6 = majority in the window. Example from the ERDAS IMAGINE Field Guide, 5th ed.]
Example: reduce single pixel "salt and pepper"
[Figure: input and output class grids showing removal of single-pixel "salt and pepper" (the lone class 8 pixel); the class edge is marked.]
Example: altered edge
[Figure: input and output class grids showing that majority filtering shifts the marked class edge.]
Example: ERDAS "Eliminate": no altered edge
[Figure: input and output class grids; the small clump is "eliminated" while the marked class edge is unchanged.]
Accuracy Assessment
• Always want to assess the accuracy of the final thematic map! How good is it?
• Various techniques assess the "accuracy" of the classified output by comparing the "true" identity of land cover derived from reference data (observed) vs. the classified (predicted) for a random sample of pixels
• The accuracy assessment is the means to communicate map quality to the user and should be included in the metadata documentation
Accuracy Assessment
• R.S. classification accuracy is usually assessed and communicated through a contingency table, sometimes referred to as a confusion matrix
• Contingency table: m x m matrix where m = # of land cover classes
  – Columns: usually represent the reference data
  – Rows: usually represent the remotely sensed classification results (i.e., thematic or information classes); a sketch of building such a table follows
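A minimal sketch of cross-tabulating paired samples into a contingency table; the function name and class codes are illustrative:

```python
import numpy as np

def contingency_matrix(classified, reference, classes):
    """m x m cross-tabulation: rows = classified labels,
    columns = reference labels, for a sample of pixels."""
    idx = {c: i for i, c in enumerate(classes)}
    m = np.zeros((len(classes), len(classes)), dtype=int)
    for pred, ref in zip(classified, reference):
        m[idx[pred], idx[ref]] += 1
    return m

# e.g. contingency_matrix(['1.10', '1.20'], ['1.10', '1.40'],
#                         classes=['1.10', '1.20', '1.40'])
```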
Accuracy Assessment Contingency Matrix

                               Reference Data
Class. Data   1.10  1.20  1.40  1.60  2.00  2.10  2.40  2.50  Row Total
1.10           109    11    14     0     0     0     1     1        126
1.20             2    82     2    13     0     0     0     0        111
1.40             3     4   123     0     0     0     0     0        130
1.60             2     1     0    22     0     0     1     1         25
2.00             0     0     0     3     5     0     0     0          8
2.10             0     0     0     0     0     9     0     0          9
2.40             0     2     1     0     0     0    74     0         77
2.50             0     1     0     0     0     1     4    41         47
Col Total      116   101   140    38     5    10    80    43        533
Accuracy Assessment
• Sampling Approaches: to reduce analyst bias
  – simple random sampling: every pixel has an equal chance
  – stratified random sampling: # of points is stratified to the distribution of thematic layer classes (larger classes get more points)
  – equalized random sampling: each class gets an equal number of random points
• Sample size: at least 30 samples per land cover class (a sketch of the three schemes follows)
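A minimal sketch of the three sampling schemes with numpy; the function name and details are illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_points(class_map, n_total, scheme="stratified"):
    """Draw accuracy-assessment sample locations (row, col) from a
    classified raster under one of the three schemes above."""
    coords = np.argwhere(np.ones_like(class_map, dtype=bool))  # all (row, col)
    labels = class_map.ravel()
    if scheme == "simple":            # every pixel has an equal chance
        return coords[rng.choice(len(labels), n_total, replace=False)]
    classes = np.unique(labels)
    picks = []
    for c in classes:
        members = coords[labels == c]
        if scheme == "stratified":    # points proportional to class area
            n_c = max(1, round(n_total * len(members) / len(labels)))
        else:                          # "equalized": same count per class
            n_c = n_total // len(classes)
        n_c = min(n_c, len(members))
        picks.append(members[rng.choice(len(members), n_c, replace=False)])
    return np.vstack(picks)
```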
How good is good?
• How accurate should the classified map be?
• A general rule of thumb is 85% accuracy
• It really depends on how much "risk" you are willing to accept if the map is wrong
• Are you more interested in the overall accuracy of the final map or in quantifying the ability to accurately identify and map individual classes?
• Which is more acceptable: overestimation or underestimation?
How good is good? Example
• USGS-NPS National Vegetation Classification standard
• Horizontal positional locations meet National Map Accuracy Standards
• Thematic accuracy >80% per class
• Minimum Mapping Unit of 0.5 ha
• http://biology.usgs.gov/npsveg/aa/indexdoc.html
A whole set of field reference points can be developed using some sort of random allocation, but due to travel/access constraints only a subset of points is actually visited, resulting in a distribution that is not truly random.
Accuracy Assessment Issues
• What constitutes reference data?
  - higher spatial resolution imagery (with visual interpretation)
  - "ground truth": GPSed field plots
  - existing GIS maps
• Reference data can be polygons or points
Accuracy Assessment Issues
• Problem with "mixed" pixels: one possibility is to sample only homogeneous regions (e.g., 3x3 windows), but this introduces a subtle bias
• If smoothing was undertaken, then accuracy should be assessed on that basis, i.e., at the scale of the mmu
• If a filter is used, it should be stated in the metadata
• Ideally, the % of the overall map that so qualifies should be quantified; e.g., if 75% of the map is composed of homogeneous regions greater than 3x3 in size, then 75% of the map is assessed and 25% is not assessed
Errors of Omission vs. Commission
• Error of omission: pixels in class 1 erroneously assigned to class 2; from the class 1 perspective these pixels should have been classified as class 1 but were omitted
• Error of commission: pixels in class 2 erroneously assigned to class 1; from the class 1 perspective these pixels should not have been classified as class 1 but were included
Errors of Omission vs. Commission: from a Class 2 perspective
[Figure: overlapping class histograms (# of pixels vs. digital number, 0-255). Omission error: pixels in Class 2 erroneously assigned to Class 1. Commission error: pixels in Class 1 erroneously assigned to Class 2.]
Accuracy Assessment Measures
• Overall accuracy: divide the total correct (sum of the major diagonal) by the total number of sampled pixels; can be misleading, so individual categories should be judged as well
• Producer's accuracy: measure of omission error; the total number correct in a category divided by the total # in that category as derived from the reference data; a measure of underestimation
• User's accuracy: measure of commission error; the total number correct in a category divided by the total # that were classified in that category; a measure of overestimation (a sketch computing all three measures follows)
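A minimal sketch, assuming a numpy contingency matrix m with rows = classified and columns = reference (the function name is illustrative):

```python
import numpy as np

def accuracy_measures(m):
    """Overall, producer's, and user's accuracy from an m x m
    contingency matrix (rows = classified, columns = reference)."""
    correct = np.diag(m)
    overall = correct.sum() / m.sum()
    producers = correct / m.sum(axis=0)  # omission: divide by column totals
    users = correct / m.sum(axis=1)      # commission: divide by row totals
    return overall, producers, users
```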
Accuracy Assessment Contingency Matrix

                               Reference Data
Class. Data   1.10  1.20  1.40  1.60  2.00  2.10  2.40  2.50  Row Total
1.10           308    23    12     1     0     1     3     0        348
1.20             2   279     9     2     0     0     2     1        295
1.40             0     1   372     0     1     1     4     0        379
1.60             0     1     0    26     0     0     0     0         27
2.00             0     0     1     0    10     0     2     5         18
2.10             1     0     2     0     2    93     1     0         99
2.40             3     1    12     0     0     1   176     1        194
2.50             1     0     0     0     0     1     1    48         51
Col Total      315   305   408    29    13    97   189    55       1411
Accuracy Assessment Measures

Code  Land Cover Description                 Number   Producer's  User's    Kappa
                                             Correct  Accuracy    Accuracy
1.10  Developed                                 308      ---         ---      ---
1.20  Cultivated/Grassland                      279      ---         ---      ---
1.40  Forest/Scrub/Shrub                        372      ---         ---      ---
1.60  Barren                                     26      ---         ---      ---
2.00  Unconsolidated Shore                       10      ---         ---      ---
2.10  Estuarine Emergent Wetland                 93      ---         ---      ---
2.40  Palustrine Wetland: Emergent/Forested     176      ---         ---      ---
2.50  Water                                      48      ---         ---      ---
Totals                                         1312

Overall Classification Accuracy = ---
Accuracy Assessment Measures

Code  Land Cover Description                 Number   Producer's  User's     Kappa
                                             Correct  Accuracy    Accuracy
1.10  Developed                                 308    308/315     308/348     ---
1.20  Cultivated/Grassland                      279    279/305     279/295     ---
1.40  Forest/Scrub/Shrub                        372    372/408     372/379     ---
1.60  Barren                                     26     26/29       26/27      ---
2.00  Unconsolidated Shore                       10     10/13       10/18      ---
2.10  Estuarine Emergent Wetland                 93     93/97       93/99      ---
2.40  Palustrine Wetland: Emergent/Forested     176    176/189     176/194     ---
2.50  Water                                      48     48/55       48/51      ---
Totals                                         1312

Overall Classification Accuracy = 1312/1411
Accuracy Assessment Measures

Code  Land Cover Description                 Number   Producer's  User's     Kappa
                                             Correct  Accuracy    Accuracy
1.10  Developed                                 308      97.8        88.5      ---
1.20  Cultivated/Grassland                      279      91.5        94.6      ---
1.40  Forest/Scrub/Shrub                        372      91.2        98.2      ---
1.60  Barren                                     26      89.7        96.3      ---
2.00  Unconsolidated Shore                       10      76.9        55.6      ---
2.10  Estuarine Emergent Wetland                 93      95.9        93.9      ---
2.40  Palustrine Wetland: Emergent/Forested     176      93.1        90.7      ---
2.50  Water                                      48      87.3        94.1      ---
Totals                                         1312

Overall Classification Accuracy = 93.0%
Accuracy Assessment Measures
• Kappa coefficient: provides a difference measurement
between the observed agreement of two maps and
agreement that is contributed by chance alone
• A Kappa coefficient of 90% may be interpreted as 90%
better classification than would be expected by random
assignment of classes
• What's a good Kappa? General range: K < 0.4: poor; 0.4 to 0.75: good; K > 0.75: excellent
• Allows for statistical comparisons between matrices (Z
statistic); useful in comparing different classification
approaches to objectively decide which gives best results
• Alternative statistic: Tau coefficient
Kappa coefficient

$$\hat{K} = \frac{n\sum_{i=1}^{k} X_{ii} \;-\; \sum_{i=1}^{k}\left(X_{i+}\cdot X_{+i}\right)}{n^{2} \;-\; \sum_{i=1}^{k}\left(X_{i+}\cdot X_{+i}\right)}$$

where k = # of classes
  X_ii = diagonal elements (correct) of the contingency matrix
  X_i+ = marginal row total (row i)
  X_+i = marginal column total (column i)
  n = # of observations

Takes into account the off-diagonal elements of the contingency matrix (errors of omission and commission)
Kappa coefficient: Example

$$\hat{K} = \frac{N\sum_{i=1}^{k} x_{ii} - \sum_{i=1}^{k}\left(x_{i+}\cdot x_{+i}\right)}{N^{2} - \sum_{i=1}^{k}\left(x_{i+}\cdot x_{+i}\right)}$$

SUM X_ii = 308 + 279 + 372 + 26 + 10 + 93 + 176 + 48 = 1312

SUM (X_i+ * X_+i) = (348*315) + (295*305) + (379*408) + (27*29) + (18*13) + (99*97) + (194*189) + (51*55) = 404,318

Khat = (1411 * 1312 - 404,318) / (1411^2 - 404,318)
     = (1,851,232 - 404,318) / (1,990,921 - 404,318)
     = 1,446,914 / 1,586,603 = .912
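The same computation in numpy, using the contingency matrix tabulated above (a sketch; the function name is illustrative):

```python
import numpy as np

def kappa(m):
    """Khat from an m x m contingency matrix (rows = classified,
    columns = reference), following the formula above."""
    n = m.sum()
    chance = (m.sum(axis=1) * m.sum(axis=0)).sum()  # SUM(Xi+ * X+i)
    return (n * np.trace(m) - chance) / (n**2 - chance)

# Contingency matrix from the slides above (8 classes, n = 1411):
m = np.array([[308,  23,  12,  1,  0,  1,   3,  0],
              [  2, 279,   9,  2,  0,  0,   2,  1],
              [  0,   1, 372,  0,  1,  1,   4,  0],
              [  0,   1,   0, 26,  0,  0,   0,  0],
              [  0,   0,   1,  0, 10,  0,   2,  5],
              [  1,   0,   2,  0,  2, 93,   1,  0],
              [  3,   1,  12,  0,  0,  1, 176,  1],
              [  1,   0,   0,  0,  0,  1,   1, 48]])
print(round(kappa(m), 3))  # 0.912, matching the worked example
```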
Accuracy Assessment Measures

Code  Land Cover Description                 Number   Producer's  User's     Kappa
                                             Correct  Accuracy    Accuracy
1.10  Developed                                 308      97.8        88.5    .8520
1.20  Cultivated/Grassland                      279      91.5        94.6    .9308
1.40  Forest/Scrub/Shrub                        372      91.2        98.2    .9740
1.60  Barren                                     26      89.7        96.3    .9622
2.00  Unconsolidated Shore                       10      76.9        55.6     ***
2.10  Estuarine Emergent Wetland                 93      95.9        93.9    .9349
2.40  Palustrine Wetland: Emergent/Forested     176      93.1        90.7    .8929
2.50  Water                                      48      87.3        94.1    .9388
Totals                                         1312

Overall Classification Accuracy = 93.0%    Kappa = .9120

*** Sample size for this land cover class too small (< 25) for a valid Kappa measure
Case Study
Multi-scale segmentation approach to mapping seagrass habitats using
airborne digital camera imaging
Richard G. Lathrop¹, Scott Haag¹·² , and Paul Montesano¹.
¹Center for Remote Sensing & Spatial Analysis
Rutgers University
New Brunswick, NJ 08901-8551
²Jacques Cousteau National Estuarine Research Reserve
130 Great Bay Blvd
Tuckerton NJ 08087
Method> Field Surveys
• All transect endpoints and individual check points were first mapped onscreen in the GIS.
• Endpoints were then loaded into a GPS (± 3 meters) for navigation on the water.
• A total of 245 points were collected.
Method> Field Surveys
For each field reference point, the following data were collected:
• GPS location (UTM)
• Time
• Date
• SAV species presence/dominance: Zostera marina or Ruppia maritima or macroalgae
• Depth (meters)
• % cover (10% intervals) determined by visual estimation
• Blade height of 5 tallest seagrass blades
• Shoot density (# of shoots per 1/9 m2 quadrat that was extracted and counted on the boat)
• Distribution (patchy/uniform)
• Substrate (mud/sand)
• Additional comments
Results> Accuracy Assessment
• The resulting maps were compared with the 245 field reference points.
• All 245 reference points were used to support the interpretation in some fashion and so cannot truly be considered completely independent validation.
• The overall accuracy was 83% and the Kappa statistic was 56.5%, which can be considered a moderate degree of agreement between the two data sets.

                      Reference           Reference
GIS Map               Seagrass Absent     Seagrass Present    User's Accuracy
Seagrass Absent             67                  32                 68%
Seagrass Present            10                 136                 93%
Producer's Accuracy         87%                 81%                83% (overall)
Results> Accuracy Assessment
• The resulting maps were also compared with an independent set of 41 bottom sampling points collected as part of a seagrass-sediment study conducted during the summer of 2003 (Smith and Friedman, 2004).
• The overall accuracy was 70.7% and the Kappa statistic was 43%, which can be considered a moderate degree of agreement between the two data sets.

                      Reference           Reference
GIS Map               Seagrass Absent     Seagrass Present    User's Accuracy
Seagrass Absent             14                   3                 82%
Seagrass Present             9                  15                 62%
Producer's Accuracy         61%                 83%                71% (overall)
SAV Accuracy Assessment Issues
• Matching spatial scale of field reference data
with scale of mapping
• Ensuring comparison of “apples to apples”
• Spatial accuracy of “ground truth” point
locations
• Temporal coincidence of “ground truth” and
image acquisition
"Fuzzy" Accuracy Assessment
• The "real world" is messy; natural vegetation communities are a continuum of states, often with one grading into the next
• R.S. classified maps generally break up land cover/vegetation into discrete either/or classes
• How to quantify this messy world? R.S. classified maps still have some error while still having great utility
• Fuzzy accuracy assessment: doesn't quantify errors as binary correct or incorrect but attempts to evaluate the severity of the error
"Fuzzy" Accuracy Assessment
• Fuzzy rating: the severity of error, or conversely the similarity between map classes, is defined from a user standpoint
• A fuzzy rating can be developed quantitatively based on the deviation from a defined class as a % difference (i.e., within +/- so many %)
• Fuzzy set matrix: the fuzzy rating between each map class and every other class is developed into a fuzzy set matrix
For more info, see: Gopal & Woodcock, 1994. PERS:181-188
"Fuzzy" Accuracy Assessment: Level Description

5  Absolutely right: exact match
4  Good: minor differences; species dominance or composition is very similar
3  Acceptable error: mapped class does not match; types have structural or ecological similarity or similar species
2  Understandable but wrong: general similarity in structure but species/ecological conditions are not similar
1  Absolutely wrong: no similarity in conditions or structure

http://biology.usgs.gov/npsveg/fiis/aa_results.pdf
http://www.fs.fed.us/emc/rig/includes/appendix3j.pdf
"Fuzzy" Accuracy Assessment:
• Each user could redefine the fuzzy set matrix on an application-by-application basis to determine what percentage of each map class is acceptable and the magnitude of the errors within each map class
• Traditional map accuracy measures can be calculated at different levels of error:
  Exact: only level 5 (MAX)
  Acceptable: levels 5, 4, 3 (RIGHT)
• Example from USFS (a sketch of the MAX/RIGHT tally follows):

Label  #Sites  MAX (5 only)  RIGHT (3,4,5)
CON      88      71 = 81%      82 = 93%
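A minimal sketch of the MAX/RIGHT tally. Only the totals (71 exact and 82 acceptable out of 88) come from the slide; the split among levels 4/3 and 2/1 below is invented for illustration:

```python
import numpy as np

# Hypothetical fuzzy ratings (1-5) for the map label at each of the
# 88 CON reference sites; only the 71 and 82 totals are from the slide.
ratings = np.array([5]*71 + [4]*6 + [3]*5 + [2]*2 + [1]*4)

max_acc = np.mean(ratings == 5)    # MAX: exact matches only
right_acc = np.mean(ratings >= 3)  # RIGHT: levels 3, 4, 5 count as correct
print(f"MAX = {max_acc:.0%}, RIGHT = {right_acc:.0%}")  # MAX = 81%, RIGHT = 93%
```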
"Fuzzy" Accuracy Assessment: example from USFS
Confusion matrix based on Levels 3, 4, 5 as correct

Label  Sites  CON  MIX  HDW  SHB  HEB  NFO  Total
CON      88    X    0    1    5    0    0      6
MIX      14    2    X    1    1    0    0      4
HDW       6    1    1    X    0    0    0      2
SHB       8    1    0    0    X    0    0      1
HEB       1    0    0    0    1    X    0      1
NFO       4    3    0    0    3    0    X      6
Total   121    7    1    2   10    0    0     20

http://www.fs.fed.us/emc/rig/includes/appendix3j.pdf
"Fuzzy" Accuracy Assessment:
• Ability to evaluate the magnitude or seriousness of errors
• Difference Table: error within each map class based on its magnitude, with error magnitude calculated by measuring the difference between the fuzzy rating of each ground reference point and the highest rating assigned to all other possible map classes (a sketch of this calculation follows)
• All points that are exact matches have difference values >= 0; all mismatches are negative. Values of -1 to 4 generally correspond to correct map labels. Values of -2 to -4 correspond to map errors, with -4 representing a more serious error than -1
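A minimal sketch of the difference-value calculation described above; the array layout (sites x classes) is an assumption:

```python
import numpy as np

def difference_values(ratings, map_label):
    """Difference value at each reference site: fuzzy rating of the
    class the map chose minus the highest rating given to any other
    class at that site. ratings[s, c] is the fuzzy rating (1-5) of
    class c at site s; map_label[s] is the mapped class's column index."""
    n_sites = ratings.shape[0]
    diffs = np.empty(n_sites, dtype=int)
    for s in range(n_sites):
        others = np.delete(ratings[s], map_label[s])  # all non-mapped classes
        diffs[s] = ratings[s, map_label[s]] - others.max()
    return diffs
```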
"Fuzzy" Accuracy Assessment: Difference Table example from USFS

               # Mismatches            # Matches
Label  Sites   -4  -3  -2  -1       0   1   2   3   4
CON      88     4   2   0  11       3   0  12  23  33

Higher positive values indicate that pure conditions are well mapped, while lower negative values show pure conditions to be poorly mapped. Mixed or transitional conditions, where a greater number of class types are likely to be considered acceptable, will fall more in the middle.

http://www.fs.fed.us/emc/rig/includes/appendix3j.pdf
"Fuzzy" Accuracy Assessment:
• Ambiguity Table: tallies map classes that characterize a reference site as well as the actual map label
• Useful in identifying subtle confusion between map classes and may be useful in identifying additional map classes to be considered
• Example from USFS:

Label  Sites  CON  MIX  HDW  SHB  HEB  NFO  Total
CON      88    X    11    6   15    0    0     32

15 out of 88 reference sites mapped as conifer could have been equally well labeled as shrub.

http://www.fs.fed.us/emc/rig/includes/appendix3j.pdf
Alternative Ways of Quantifying Accuracy: Ratio Estimators
• Method of statistically adjusting for over- or underestimation
• Randomly allocate "test areas", determine area from map and reference data
• Ratio estimation uses the ratio of Reference/Map area to adjust the mapped area estimate (a sketch follows)
• Uses the estimate of the variance to develop confidence levels for land cover type area

Shiver & Borders, 1996. Sampling Techniques for Forest Resource Inventory. Wiley, NY, NY. pp. 166-169
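A minimal sketch of the ratio adjustment; the variance/confidence-interval step from Shiver & Borders is omitted, and the names are illustrative:

```python
import numpy as np

def ratio_adjusted_area(map_areas, ref_areas, total_mapped_area):
    """Adjust a map-wide area estimate by the ratio of reference area
    to mapped area observed over randomly allocated test areas."""
    r = np.sum(ref_areas) / np.sum(map_areas)  # Reference/Map ratio
    return r * total_mapped_area
```

If the map systematically overestimates a class within the test areas, r falls below 1 and the adjusted map-wide total shrinks accordingly (and vice versa for underestimation).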
Example: NJ 2000 Land Use Update
Comparison of urban/transitional land use as determined by photo-interpretation of 1m B&W photography vs. 10m SPOT PAN

[Figure: per-tile scatterplot of Reference Imagery (acres) vs. SPOT (acres) for urban & transitional land use, 0-400 acres, with a 1-to-1 line. Tiles above the 1-to-1 line are underestimates; tiles below it are overestimates.]
Example: NJ Land Use Change

Land Use Change       Mapped Estimate  Statistically Adjusted Estimate
Category              (Acres)          with 95% CI (acres)
Urban                 73,191           77,941 +/- 17,922
Transitional/Barren   20,861           16,082 +/- 7,053
Total Urban & Barren  94,052           89,876 +/- 16,528
Case Study: Sub-pixel Un-mixing
Urban/suburban mixed pixels: varying proportions of developed surface, lawn and trees
[Figure: 30m TM pixel grid overlaid on an IKONOS image]
Objective: Sub-pixel Unmixing
[Figure: false color composite image (R: Forest, G: Lawn, B: IS) alongside grass estimation, impervious surface estimation, and woody estimation maps.]
Validation Data
• For homogeneous 90m x 90m test areas: interpreted DOQ
  - DOQ pixels scaled to match TM
• For selected sub-areas: IKONOS multi-spectral image
  - 3 key indicator land use classes mapped: impervious surface, lawn, and forest
  - IKONOS pixels scaled to match TM
Egg Harbor City
[Figure: IKONOS, Landsat SOM-LVQ, and Landsat LMM maps, with interior LMM vs. reference area plots (0-8000) by test plot for impervious surface, lawn, and urban tree.]
Hammonton
[Figure: IKONOS, Landsat SOM-LVQ, and Landsat LMM maps, with interior LMM vs. reference area plots (0-8000) by test plot for impervious surface, lawn, and urban tree.]
Root Mean Square Error: 90m x 90m test plots

Hammonton       Impervious  Grass     Tree
IKONOS           ± 7.4%     ± 10.8%   ± 12.0%
LMM              ± 8.2%     ± 13.6%   ± 10.3%
SOM_LVQ          ± 7.1%     ± 20.7%   ± 11.0%

Egg Harbor City  Impervious  Lawn      Urban Tree
IKONOS           ± 5.6%      ± 7.7%    ± 6.8%
LMM              ± 5.8%      ± 12.5%   ± 6.0%
SOM_LVQ          ± 6.1%      ± 19.6%   ± 5.0%

(a sketch of the RMSE calculation follows)
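A minimal sketch of the RMSE calculation behind the per-plot errors above, assuming estimated and reference cover values (in %) for each test plot:

```python
import numpy as np

def rmse(estimated, reference):
    """Root mean square error between estimated and reference
    per-plot cover values (same units, e.g. percent cover)."""
    diff = np.asarray(estimated, float) - np.asarray(reference, float)
    return np.sqrt(np.mean(diff ** 2))
```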
SOM-LVQ vs. IKONOS: study sub-area comparison (3x3 TM pixel zonal %)
[Figure: scatterplots of SOM-LVQ vs. IKONOS zonal percentages for impervious, grass, and trees in Egg Harbor City and Hammonton; per-panel RMSE values range from ± 13.5% to ± 21.6%.]
Comparison of Landsat TM vs. NJDEP IS estimates
[Figure: scatterplots comparing NJDEP vs. Landsat_SOM impervious surface (IS) area estimates for High Density Residential, Medium Density Residential, Low Density Residential, and Commercial land use categories.]
Summary of Results
• Impervious surface estimation compares favorably to DOQ and IKONOS
  – ±10 to 15% for impervious surface
  – ±12 to 22% for grass and tree cover
• Shows a strong linear relationship with IKONOS in impervious surface and grass estimation
• Greater variability in the forest fraction due to variability in canopy shadowing and understory background
Summary of the lecture
1 Majority filter: removes "salt & pepper"; Eliminate: clumps "like" pixels and removes small clumps
2 Sampling methods for reference points
3 Contingency matrix and accuracy assessment measures: overall accuracy, producer's accuracy, user's accuracy, and kappa coefficient
4 Fuzzy accuracy assessment: fuzzy rating, fuzzy set matrix; ratio estimators
Homework
1 Homework: Accuracy Assessment
2 Reading: Textbook Ch 13, Field Guide Ch 6