Information Extraction Principles for Hyperspectral Data
David Landgrebe, Professor of Electrical & Computer Engineering, Purdue University, [email protected]
Outline
• A Historical Perspective
• Data and Analysis Factors
• Hyperspectral Data Characteristics
• Examples
• Summary of Key Factors
Brief History
REMOTE SENSING OF THE EARTH Atmosphere - Oceans - Land
1957 - Sputnik
1958 - National Space Act - NASA formed
1960 - TIROS I
1960-1980 - Some 40 Earth Observational Satellites Flown
Image Pixels
Thematic Mapper Image Enlarged 10 Times
Three Generations of Sensors
[Figure: representative spectral responses for three sensor generations]
• MSS (1968) - Green Veg. and Bare Soil response vs. band number (bands 1-4), 6-bit data
• TM (1975) - Green Veg. and Bare Soil response vs. band number (bands 1-7), 8-bit data
• Hyperspectral (1986) - Water, Emerging Crop, Trees, and Soil response vs. wavelength (0.4-2.4 µm), 10-bit data
Systems View
[Diagram: Sensor → On-Board Processing → Preprocessing → Data Analysis → Information Utilization, with Ephemeris, Calibration, etc. feeding in and Human Participation with Ancillary Data throughout]
Scene Effects on Pixel
Data Representations
[Figure: a sample pixel shown in Image Space, Spectral Space, and Feature Space]
• Image Space - Geographic Orientation
• Spectral Space - Relates to Physical Basis for Response
• Feature Space - For Use in Pattern Analysis
Data Classes
SCATTER PLOT FOR TYPICAL DATA
[Scatter plot: biplot of Channel 4 vs. Channel 3 responses for typical data]
BHATTACHARYYA DISTANCE
B = \frac{1}{8}(M_2 - M_1)^T \left[\frac{\Sigma_1 + \Sigma_2}{2}\right]^{-1}(M_2 - M_1) + \frac{1}{2}\ln\frac{\left|\frac{1}{2}(\Sigma_1 + \Sigma_2)\right|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}}

The first term is the mean difference term; the second is the covariance term.
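A minimal NumPy sketch of this two-term distance, assuming mean vectors and covariance matrices have been estimated for each class (function and variable names are illustrative):

```python
import numpy as np

def bhattacharyya_distance(m1, cov1, m2, cov2):
    """Bhattacharyya distance between two Gaussian class models."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov_avg = (cov1 + cov2) / 2.0
    diff = m2 - m1
    # Mean difference term: grows with class-mean separation
    mean_term = diff @ np.linalg.solve(cov_avg, diff) / 8.0
    # Covariance term: grows as the class covariances differ
    cov_term = 0.5 * np.log(np.linalg.det(cov_avg)
                            / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return mean_term + cov_term
```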
Vegetation in Spectral Space
Laboratory Data: Two classes of vegetation
Scatter Plots of Reflectance
[Figure: scatter of two-class reflectance data, % reflectance (10-24) vs. wavelength (0.65-0.72 µm), with Class 1 and Class 2 samples marked at 0.67 µm and at 0.69 µm]
Vegetation in Feature Space
[Figure: samples from two classes (Class 1, Class 2) plotted in feature space; % reflectance at 0.67 µm on the horizontal axis]
Hughes Effect
[Figure: mean recognition accuracy (0.50-0.75) vs. measurement complexity n (total discrete values, 1-1000), one curve per training-set size m = 2, 5, 10, 20, 50, 100, 200, 500, 1000, and m = ∞]

G.F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Trans. Inform. Theory, Vol. IT-14, pp. 55-63, 1968.
A Simple Measurement Complexity Example
Classifiers of Varying Complexity
• Quadratic Form:
  g_i(X) = -\frac{1}{2}(X - M_i)^T \Sigma_i^{-1}(X - M_i) - \frac{1}{2}\ln|\Sigma_i|

• Fisher Linear Discriminant - Common class covariance:
  g_i(X) = -\frac{1}{2}(X - M_i)^T \Sigma^{-1}(X - M_i)

• Minimum Distance to Means - Ignores second moment:
  g_i(X) = -\frac{1}{2}(X - M_i)^T (X - M_i)
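The three discriminants differ only in how much second-moment information they use. A sketch, assuming NumPy arrays for the pixel X, the class means M_i, and the covariances (names are illustrative):

```python
import numpy as np

def g_quadratic(x, mean, cov):
    """Quadratic form (Gaussian ML): each class keeps its own covariance."""
    d = x - mean
    return -0.5 * d @ np.linalg.solve(cov, d) - 0.5 * np.log(np.linalg.det(cov))

def g_fisher(x, mean, common_cov):
    """Fisher linear discriminant: one covariance shared by all classes."""
    d = x - mean
    return -0.5 * d @ np.linalg.solve(common_cov, d)

def g_min_distance(x, mean):
    """Minimum distance to means: second moments ignored entirely."""
    d = x - mean
    return -0.5 * d @ d

# A pixel is assigned to the class i whose g_i(X) is largest.
```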
Classifier Complexity (cont’d)
• Correlation Classifier:
  g_i(X) = \frac{X^T M_i}{\sqrt{X^T X}\,\sqrt{M_i^T M_i}}

• Spectral Angle Mapper:
  g_i(X) = \cos^{-1}\left[\frac{X^T M_i}{\sqrt{X^T X}\,\sqrt{M_i^T M_i}}\right]

• Matched Filter - Constrained Energy Minimization:
  g_i(X) = \frac{X^T C_b^{-1} M_i}{M_i^T C_b^{-1} M_i}

• Other types - "Nonparametric":
  Parzen Window Estimators
  Fuzzy Set-based
  Neural Network implementations
  K Nearest Neighbor (K-NN)
  etc.
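These, too, are a few lines each. A sketch under the same assumptions as above, with c_b standing in for the background covariance C_b of the matched filter:

```python
import numpy as np

def g_correlation(x, mean):
    """Correlation classifier: normalized inner product of spectra."""
    return (x @ mean) / (np.linalg.norm(x) * np.linalg.norm(mean))

def g_sam(x, mean):
    """Spectral Angle Mapper: angle between pixel and class mean spectra."""
    return np.arccos(np.clip(g_correlation(x, mean), -1.0, 1.0))

def g_matched_filter(x, mean, c_b):
    """Matched filter / constrained energy minimization score."""
    w = np.linalg.solve(c_b, mean)
    return (x @ w) / (mean @ w)
```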
Covariance Coefficients to be Estimated

• Assume a 5 class problem in 6 dimensions. Each of the five class covariance matrices has variances b on the diagonal and covariances a off the diagonal (lower triangle shown):

b
a b
a a b
a a a b
a a a a b
a a a a a b

The common covariance matrix has the same pattern with variances d and covariances c.

• Normal maximum likelihood - estimate coefficients a and b
• Ignore correlation between bands - estimate coefficients b only
• Assume common covariance - estimate coefficients c and d
• Ignore correlation between bands - estimate coefficients d only
EXAMPLE SOURCES OF CLASSIFICATION ERROR
[Figure: two overlapping class distributions (class 1, class 2), showing the decision boundary defined by the diagonal covariance classifier vs. the boundary defined by the Gaussian ML classifier]
Number of Coefficients to be Estimated
• Assume 5 classes and p features

                        No. of Features p
                      5      10     20      50      200
5{(p+1)p/2}           75     275    1050    6375    100,500
5p                    25     50     100     250     1000
Common {(p+1)p/2}     15     55     210     1275    20,100
p                     5      10     20      50      200
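The table entries follow directly from the formulas in the row labels; a quick numerical check (parameter names are mine):

```python
def coefficient_counts(p, n_classes=5):
    """Number of covariance coefficients to estimate for each model choice."""
    full_per_class = n_classes * (p + 1) * p // 2  # a and b for every class
    diag_per_class = n_classes * p                 # b only: ignore band correlation
    full_common = (p + 1) * p // 2                 # c and d: one shared covariance
    diag_common = p                                # d only
    return full_per_class, diag_per_class, full_common, diag_common

for p in (5, 10, 20, 50, 200):
    print(p, coefficient_counts(p))  # p=200 -> (100500, 1000, 20100, 200)
```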
Intuition and Higher Dimensional Space
Borsuk’s Conjecture: If you break a stick in two, both pieces are shorter than the original.

Keller’s Conjecture: It is possible to use cubes (hypercubes) of equal size to fill an n-dimensional space, leaving no overlaps or gaps.
Counter-examples to both have been found for higher dimensional spaces.
Science, Vol. 259, 1 Jan 1993, pp. 26-27
The Geometry of High Dimensional Space
The Volume of a Hypercube concentrates in the corners
\frac{V_{hypersphere}}{V_{hypercube}} = \frac{\pi^{d/2}}{d\,2^{d-1}\,\Gamma(d/2)} \to 0 \text{ as } d \to \infty

[Plot: the volume ratio vs. dimension d = 1-7, falling from 1 toward 0]
The Volume of a Hypersphere concentrates in the outer shell
\frac{V_d(r) - V_d(r - \varepsilon)}{V_d(r)} = 1 - \left(1 - \frac{\varepsilon}{r}\right)^d \to 1 \text{ as } d \to \infty

[Plot: the outer-shell volume fraction vs. dimension d = 1-11, rising toward 1]
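Both limits are easy to verify numerically; a short sketch of the two ratios above:

```python
import math

def sphere_to_cube_ratio(d):
    """Fraction of a hypercube's volume inside its inscribed hypersphere."""
    return math.pi ** (d / 2) / (d * 2 ** (d - 1) * math.gamma(d / 2))

def outer_shell_fraction(d, eps_over_r=0.1):
    """Fraction of a hypersphere's volume within eps of its surface."""
    return 1.0 - (1.0 - eps_over_r) ** d

for d in (1, 2, 5, 10, 50):
    # e.g., d=2 gives pi/4 ~ 0.785; by d=50 the cube ratio is ~1e-28
    print(d, sphere_to_cube_ratio(d), outer_shell_fraction(d))
```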
Some Implications
High dimensional space is mostly empty. Data in high dimensional space is mostly in a lower dimensional structure.
Normally distributed data will have a tendency to concentrate in the tails; uniformly distributed data will concentrate in the corners.
How can that be?
Volume of a hypersphere of radius r in d dimensions:

V_d(r) = \frac{2\pi^{d/2}}{d\,\Gamma(d/2)} r^d

Differential volume at r:

\frac{dV}{dr} = \frac{2\pi^{d/2}}{\Gamma(d/2)} r^{d-1}
How can that be? (continued)
The probability mass at r (for the unit-variance d-dimensional normal):

p(r) = \frac{r^{d-1} e^{-r^2/2}}{2^{d/2-1}\,\Gamma(d/2)}

This density peaks at r = \sqrt{d-1}, so as d grows the mass concentrates in a thin shell far from the mean.
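The density can be evaluated directly; a sketch showing where the mass sits as d grows (names are mine):

```python
import math

def radial_density(r, d):
    """p(r): density of the distance from the mean, unit normal in d dimensions."""
    return r ** (d - 1) * math.exp(-r * r / 2) / (2 ** (d / 2 - 1) * math.gamma(d / 2))

# The density peaks at r = sqrt(d - 1): for d = 100, virtually all probability
# mass lies near r = 10, far from the mean.
for d in (2, 10, 100):
    mode = math.sqrt(d - 1)
    print(d, mode, radial_density(mode, d))
```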
MORE ON GEOMETRY
• The diagonals in high dimensional spaces become nearly orthogonal to all coordinate axes:

\cos\theta_d = \pm\frac{1}{\sqrt{d}} \to 0 \text{ as } d \to \infty

Implication: The projection of any cluster onto any diagonal, e.g., by averaging features, could destroy information.
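A three-line check of that cosine (the choice of d is arbitrary):

```python
import numpy as np

d = 100
diagonal = np.ones(d) / np.sqrt(d)  # unit vector along a main diagonal
axis = np.eye(d)[0]                 # unit vector along the first coordinate axis
print(diagonal @ axis)              # cos(theta) = 1/sqrt(d) = 0.1, nearly orthogonal
```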
STILL MORE GEOMETRY
• The number of labeled samples needed for supervised classification increases rapidly with dimensionality. In a specific instance, it has been shown that the number of samples required increases linearly with dimensionality for a linear classifier and as the square of dimensionality for a quadratic classifier; it has been estimated that the number increases exponentially for a non-parametric classifier.

• For most high dimensional data sets, lower dimensional linear projections tend to be normal or a combination of normals.
A HYPERSPECTRAL DATA ANALYSIS SCHEME
[Diagram: 200-Dimensional Data → Class-Conditional Feature Extraction → Feature Selection → Classifier/Analyzer → Class-Specific Information]
Finding Optimal Feature Subspaces
• Feature Selection (FS)
• Discriminant Analysis Feature Extraction (DAFE) - sketched below
• Decision Boundary Feature Extraction (DBFE)
• Projection Pursuit (PP)

Available in MultiSpec via WWW at:
http://dynamo.ecn.purdue.edu/~biehl/MultiSpec/
Additional documentation via WWW at:
http://dynamo.ecn.purdue.edu/~landgreb/publications.html
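As one example of these techniques, DAFE is classical discriminant analysis: it diagonalizes the between-class scatter against the within-class scatter. A minimal sketch of that idea (not the MultiSpec implementation; the names and the SciPy dependency are my assumptions):

```python
import numpy as np
from scipy.linalg import eigh

def dafe_features(class_means, class_covs, priors):
    """Discriminant-analysis feature extraction: top generalized eigenvectors
    of the between-class scatter Sb against the within-class scatter Sw."""
    means = np.asarray(class_means, float)
    priors = np.asarray(priors, float)
    grand_mean = priors @ means
    s_w = sum(p * np.asarray(c, float) for p, c in zip(priors, class_covs))
    s_b = sum(p * np.outer(m - grand_mean, m - grand_mean)
              for p, m in zip(priors, means))
    vals, vecs = eigh(s_b, s_w)          # solves Sb v = lambda Sw v
    order = np.argsort(vals)[::-1]       # most discriminating directions first
    w = vecs[:, order[:len(means) - 1]]  # at most (classes - 1) useful features
    return w                             # project data with X @ w
```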
Hyperspectral Image of DC Mall
HYDICE Airborne System
1208 Scan Lines, 307 Pixels/Scan Line
210 Spectral Bands in the 0.4-2.4 µm Region
155 Megabytes of Data (Not yet Geometrically Corrected)
Define Desired Classes
Training areas designated by polygons outlined in white
Thematic Map of DC Mall
Legend: Roofs, Streets, Grass, Trees, Paths, Water, Shadows

Operation                   CPU Time (sec.)       Analyst Time
Display Image               18
Define Classes                                    < 20 min.
Feature Extraction          12
Reformat                    67
Initial Classification      34
Inspect and Mod. Training                         ≈ 5 min.
Final Classification        33
Total                       164 sec = 2.7 min.    ≈ 25 min.

(No preprocessing involved)
Hyperspectral Potential - Simply Stated
• Assume 10-bit data in a 100 dimensional space.
• That is (1024)^100 ≈ 10^300 discrete locations. Even for a data set of 10^6 pixels, the probability of any two pixels lying in the same discrete location is vanishingly small.
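The arithmetic checks out with exact integers (a back-of-the-envelope sketch):

```python
# 10-bit data in 100 bands: number of distinct discrete locations
cells = 1024 ** 100
print(len(str(cells)) - 1)       # 301, i.e., about 10**301 locations

# Birthday-problem bound on any two of 10**6 pixels colliding
n = 10 ** 6
print(n * (n - 1) // 2 / cells)  # ~5e-290: vanishingly small
```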
Summary - Limiting Factors
[Diagram: the systems view again - Sensor → On-Board Processing → Preprocessing → Data Analysis → Information Utilization, with Ephemeris, Calibration, etc. and Human Participation with Ancillary Data]

• Scene - The most complex and dynamic part
• Sensor - Also not under the analyst’s control
• Processing System - The analyst’s choices
Limiting Factors
Scene - Varies from hour to hour and sq. km to sq. km
Sensor - Spatial Resolution, Spectral Bands, S/N
Processing System:
• Classes to be labeled - of Informational Value, Separable, Exhaustive
• Number of samples to define the classes
• Features to be used
• Complexity of the Classifier
Source of Ancillary Input
Possibilities
• Ground Observations - From the Ground, Of the Ground
• "Imaging Spectroscopy"
• Previously Gathered Spectra - "End Members"

(related to the Image Space, Spectral Space, and Feature Space representations)
Use of Ancillary Input
A Key Point:
• Ancillary input is used to label training samples.
• Training samples are then used to compute quantitative class descriptions.
Result:
• This reduces or eliminates the need for many types of preprocessing by normalizing out differences between the class descriptions and the data.