
Information Extraction Principles for Hyperspectral Data

David Landgrebe, Professor of Electrical & Computer Engineering, Purdue University, [email protected]

Outline

• A Historical Perspective
• Data and Analysis Factors
• Hyperspectral Data Characteristics
• Examples
• Summary of Key Factors

Brief History

REMOTE SENSING OF THE EARTH Atmosphere - Oceans - Land

1957 - Sputnik
1958 - National Space Act - NASA formed
1960 - TIROS I
1960 - 1980 - Some 40 Earth Observational Satellites Flown

Image Pixels

Thematic Mapper Image Enlarged 10 Times


Three Generations of Sensors

[Figure: Spectral response plots for three generations of sensors: MSS (1968, 4 bands, 6-bit data), TM (1975, 7 bands, 8-bit data), and a hyperspectral sensor (1986, 10-bit data, covering roughly 0.4 to 2.4 µm), showing responses for classes such as green vegetation, bare soil, water, emerging crop, and trees.]

Systems View

[Figure: Systems view of the processing chain: Sensor, On-Board Processing, Preprocessing, Data Analysis, and Information Utilization, with Ephemeris, Calibration, etc. as inputs and Human Participation with Ancillary Data supporting the analysis steps.]

Scene Effects on Pixel


Data Representations

Sample data representations: Image Space, Spectral Space, Feature Space

• Image Space - Geographic Orientation
• Spectral Space - Relates to the Physical Basis for Response
• Feature Space - For Use in Pattern Analysis

Data Classes


SCATTER PLOT FOR TYPICAL DATA

[Figure: Scatter plot (BiPlot) of Channel 4 vs. Channel 3 for typical data.]

BHATTACHARYYA DISTANCE

$$B = \frac{1}{8}(\mu_1 - \mu_2)^T \left[\frac{\Sigma_1 + \Sigma_2}{2}\right]^{-1}(\mu_1 - \mu_2) + \frac{1}{2}\ln\left[\frac{\left|\frac{1}{2}(\Sigma_1 + \Sigma_2)\right|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}}\right]$$

The first term is the mean difference term; the second is the covariance term.
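The two terms can be evaluated directly from class statistics. The following is a minimal sketch (not part of the original slides); the class means and covariances at the bottom are hypothetical placeholders:

```python
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian class models.

    Returns the mean-difference term, the covariance term, and their sum.
    """
    mu_diff = mu1 - mu2
    cov_avg = (cov1 + cov2) / 2.0

    # Mean difference term: (1/8) (mu1-mu2)^T [(S1+S2)/2]^{-1} (mu1-mu2)
    mean_term = 0.125 * mu_diff @ np.linalg.solve(cov_avg, mu_diff)

    # Covariance term: (1/2) ln( |(S1+S2)/2| / sqrt(|S1| |S2|) )
    _, logdet_avg = np.linalg.slogdet(cov_avg)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet_avg - 0.5 * (logdet1 + logdet2))

    return mean_term, cov_term, mean_term + cov_term

# Hypothetical two-band example
mu1, mu2 = np.array([20.0, 12.0]), np.array([17.0, 14.0])
cov1 = np.array([[2.0, 0.5], [0.5, 1.0]])
cov2 = np.array([[1.5, 0.3], [0.3, 1.2]])
print(bhattacharyya_distance(mu1, cov1, mu2, cov2))
```

Larger values of B indicate classes that are easier to separate, which is why this distance is often used to rank candidate feature subsets.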

Vegetation in Spectral Space

Laboratory Data: Two classes of vegetation

Scatter Plots of Reflectance

[Figure: Laboratory reflectance spectra of the two vegetation classes over 0.65 to 0.72 µm, with the values of Class 1 and Class 2 at 0.67 µm and 0.69 µm marked.]

Vegetation in Feature Space

[Figure: Samples from the two classes plotted in feature space, with % reflectance at 0.67 µm on one axis and the 0.69 µm band on the other.]

Hughes Effect

[Figure: Mean recognition accuracy vs. measurement complexity n (total discrete values) for training sample sizes m = 2 through 1000 and m = ∞: with finite training data, accuracy first rises and then falls as n increases.]

G.F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Trans. Inform. Theory, Vol. IT-14, pp. 55-63, 1968.

A Simple Measurement Complexity Example

Classifiers of Varying Complexity

• Quadratic Form

$$g_i(X) = -\frac{1}{2}(X - \mu_i)^T \Sigma_i^{-1}(X - \mu_i) - \frac{1}{2}\ln|\Sigma_i|$$

• Fisher Linear Discriminant - Common class covariance

$$g_i(X) = -\frac{1}{2}(X - \mu_i)^T \Sigma^{-1}(X - \mu_i)$$

• Minimum Distance to Means - Ignores second moment

$$g_i(X) = -\frac{1}{2}(X - \mu_i)^T (X - \mu_i)$$
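These three discriminants differ only in how much of the class covariance structure they use. A minimal sketch (not from the original slides) of the three functions; a pixel X is assigned to the class whose g_i(X) is largest:

```python
import numpy as np

def quadratic_discriminant(x, mu_i, cov_i):
    """Full Gaussian quadratic discriminant: class-specific mean and covariance."""
    d = x - mu_i
    _, logdet = np.linalg.slogdet(cov_i)
    return -0.5 * d @ np.linalg.solve(cov_i, d) - 0.5 * logdet

def fisher_linear_discriminant(x, mu_i, common_cov):
    """Linear discriminant: one covariance matrix shared by all classes."""
    d = x - mu_i
    return -0.5 * d @ np.linalg.solve(common_cov, d)

def minimum_distance_discriminant(x, mu_i):
    """Minimum distance to means: second moments ignored entirely."""
    d = x - mu_i
    return -0.5 * d @ d

# e.g. label = max(range(n_classes),
#                  key=lambda i: quadratic_discriminant(x, means[i], covs[i]))
```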

Classifier Complexity con’t

• Correlation Classifier

$$g_i(X) = \frac{X^T \mu_i}{\sqrt{(X^T X)\,(\mu_i^T \mu_i)}}$$

• Spectral Angle Mapper

$$g_i(X) = \cos^{-1}\left[\frac{X^T \mu_i}{\sqrt{(X^T X)\,(\mu_i^T \mu_i)}}\right]$$

• Matched Filter - Constrained Energy Minimization

$$g_i(X) = \frac{X^T C_b^{-1} \mu_i}{\mu_i^T C_b^{-1} \mu_i}$$

• Other types - "Nonparametric"
  - Parzen Window Estimators
  - Fuzzy Set - based
  - Neural Network implementations
  - K Nearest Neighbor - K-NN
  - etc.
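A minimal sketch (not from the original slides) of two of these; here mu_i is a class mean spectrum and background_cov stands in for the background covariance C_b:

```python
import numpy as np

def spectral_angle(x, mu_i):
    """Spectral Angle Mapper: angle (radians) between a pixel spectrum and a class mean."""
    cos_theta = (x @ mu_i) / np.sqrt((x @ x) * (mu_i @ mu_i))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

def matched_filter(x, mu_i, background_cov):
    """Constrained energy minimization style matched filter score."""
    w = np.linalg.solve(background_cov, mu_i)   # C_b^{-1} mu_i
    return (x @ w) / (mu_i @ w)
```

Note that with the Spectral Angle Mapper a pixel is assigned to the class with the smallest angle, whereas the quadratic, Fisher, and minimum-distance discriminants above are maximized.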

Covariance Coefficients to be Estimated

• Assume a 5 class problem in 6 dimensions. Each class covariance matrix is a symmetric 6 x 6 matrix with diagonal elements b (band variances) and off-diagonal elements a (band-to-band covariances); the common covariance matrix has diagonal elements d and off-diagonal elements c.

• Normal maximum likelihood - estimate coefficients a and b
• Ignore correlation between bands - estimate coefficients b only
• Assume common covariance - estimate coefficients c and d
• Ignore correlation between bands - estimate coefficients d only

EXAMPLE SOURCES OF CLASSIFICATION ERROR

[Figure: Two classes, ω1 and ω2, with the decision boundary defined by the diagonal covariance classifier compared to the decision boundary defined by the Gaussian ML classifier.]

Number of Coefficients to be Estimated

• Assume 5 classes and p features

No. of Features p        5      10      20       50        200
5{(p+1)p/2}             75     275    1050     6375    100,500
5p                      25      50     100      250       1000
Common {(p+1)p/2}       15      55     210     1275     20,100
p                        5      10      20       50        200
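The table entries follow directly from counting the free parameters of a symmetric covariance matrix. A quick sketch of that arithmetic (not from the original slides):

```python
def covariance_params(n_classes, p):
    """Number of covariance coefficients to estimate under four model assumptions."""
    per_class_full = (p + 1) * p // 2            # symmetric p x p matrix
    return {
        "class-specific full":     n_classes * per_class_full,   # 5{(p+1)p/2}
        "class-specific diagonal": n_classes * p,                 # 5p
        "common full":             per_class_full,                # {(p+1)p/2}
        "common diagonal":         p,                              # p
    }

for p in (5, 10, 20, 50, 200):
    print(p, covariance_params(5, p))
```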

Intuition and Higher Dimensional Space

Borsuk's Conjecture: If you break a stick in two, both pieces are shorter than the original.

Keller's Conjecture: It is possible to use cubes (hypercubes) of equal size to fill an n-dimensional space, leaving no overlaps or gaps.

Counter-examples to both have been found for higher dimensional spaces.

Science, Vol. 259, 1 Jan 1993, pp 26-27


The Geometry of High Dimensional Space

The volume of a hypercube concentrates in the corners. The fraction of the hypercube's volume that lies in the inscribed hypersphere goes to zero as the dimension d grows:

$$\frac{V_{\text{hypersphere}}}{V_{\text{hypercube}}} = \frac{\pi^{d/2}}{d\,2^{d-1}\,\Gamma(d/2)} \longrightarrow 0 \quad \text{as } d \to \infty$$

The volume of a hypersphere concentrates in the outer shell. The fraction of the volume lying within a shell of thickness ε at the surface approaches one:

$$\frac{V_d(r) - V_d(r-\varepsilon)}{V_d(r)} = \frac{r^d - (r-\varepsilon)^d}{r^d} = 1 - \left(1 - \frac{\varepsilon}{r}\right)^d \longrightarrow 1 \quad \text{as } d \to \infty$$
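A minimal sketch (not from the original slides) that evaluates both ratios numerically for increasing dimension:

```python
import math

def sphere_to_cube_ratio(d):
    """Fraction of a hypercube's volume occupied by the inscribed hypersphere."""
    return math.pi ** (d / 2) / (d * 2 ** (d - 1) * math.gamma(d / 2))

def outer_shell_fraction(d, eps_over_r=0.1):
    """Fraction of a hypersphere's volume in an outer shell of relative thickness eps/r."""
    return 1.0 - (1.0 - eps_over_r) ** d

for d in (1, 2, 5, 10, 20, 50):
    print(f"d={d:3d}  sphere/cube={sphere_to_cube_ratio(d):.6f}  "
          f"outer 10% shell={outer_shell_fraction(d):.4f}")
```

Already at d = 20 the inscribed hypersphere holds far less than one percent of the hypercube's volume, while most of the hypersphere's own volume sits in the outer shell.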

Some Implications

• High dimensional space is mostly empty. Data in high dimensional space lie mostly in a lower dimensional structure.

• Normally distributed data will tend to concentrate in the tails; uniformly distributed data will concentrate in the corners.

How can that be?

The volume of a hypersphere of radius r in d dimensions is

$$V_d(r) = \frac{2\,\pi^{d/2}\,r^d}{d\,\Gamma(d/2)}$$

so the differential volume at radius r is

$$\frac{dV}{dr} = \frac{2\,\pi^{d/2}}{\Gamma(d/2)}\,r^{\,d-1}$$

How can that be? (continued)

For normally distributed data, the probability mass at radius r is

$$p(r) = \frac{r^{\,d-1}\,e^{-r^2/2}}{2^{\,d/2-1}\,\Gamma(d/2)}$$

which peaks farther from the mean as the dimension d increases, so the mass concentrates in the tails.

MORE ON GEOMETRY

• The diagonals in high dimensional spaces become nearly orthogonal to all coordinate axes:

$$\cos\theta_d = \frac{1}{\sqrt{d}}$$

Implication: The projection of any cluster onto any diagonal, e.g., by averaging features, could destroy information.
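A tiny numerical confirmation (not from the original slides) of the cosine between the main diagonal and a coordinate axis:

```python
import numpy as np

for d in (2, 3, 10, 100, 200):
    diagonal = np.ones(d) / np.sqrt(d)       # unit vector along the main diagonal
    axis = np.zeros(d); axis[0] = 1.0        # unit vector along the first coordinate axis
    print(d, float(diagonal @ axis), 1 / np.sqrt(d))   # the cosine equals 1/sqrt(d)
```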

STILL MORE GEOMETRY

The number of labeled samples needed for supervised classification increases rapidly with dimensionality

In a specific instance, it has been shown that the number of samples required increases linearly with dimensionality for a linear classifier and as the square of the dimensionality for a quadratic classifier. It has been estimated that the number increases exponentially for a non-parametric classifier.

For most high dimensional data sets, lower dimensional linear projections tend to be normal or a combination of normals.


A HYPERSPECTRAL DATA ANALYSIS SCHEME

200 Dimensional Data → Class Conditional Feature Extraction → Feature Selection → Classifier/Analyzer → Class-Specific Information

Finding Optimal Feature Subspaces

Feature Selection (FS)

Discriminant Analysis Feature Extraction (DAFE)

Decision Boundary Feature Extraction (DBFE)

Projection Pursuit (PP)

Available in MultiSpec via WWW at:
http://dynamo.ecn.purdue.edu/~biehl/MultiSpec/

Additional documentation via WWW at:
http://dynamo.ecn.purdue.edu/~landgreb/publications.html

Hyperspectral Image of DC Mall

HYDICE Airborne System
1208 Scan Lines, 307 Pixels/Scan Line
210 Spectral Bands in the 0.4 to 2.4 µm Region
155 Megabytes of Data
(Not yet Geometrically Corrected)

Define Desired Classes

Training areas designated by polygons outlined in white

Thematic Map of DC Mall

Legend: Roofs, Streets, Grass, Trees, Paths, Water, Shadows

Operation                     CPU Time (sec.)       Analyst Time
Display Image                        18
Define Classes                                        < 20 min.
Feature Extraction                   12
Reformat                             67
Initial Classification               34
Inspect and Mod. Training                             ≈ 5 min.
Final Classification                 33
Total                       164 sec = 2.7 min.        ≈ 25 min.

(No preprocessing involved)

Hyperspectral Potential - Simply Stated

• Assume 10-bit data in a 100 dimensional space.

• That is (1024)^100 ≈ 10^300 discrete locations. Even for a data set of 10^6 pixels, the probability of any two pixels lying in the same discrete location is vanishingly small.
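A back-of-the-envelope check of that claim (not from the original slides), using the birthday-problem approximation for the chance of any two pixels sharing a cell:

```python
import math

log10_cells = 100 * math.log10(1024)          # (1024)^100 = 10^(100 * log10 1024)
print(f"(1024)^100 ≈ 10^{log10_cells:.0f}")   # about 10^301, i.e. on the order of 10^300

n_pixels = 1e6
# P(any collision) ≈ n^2 / (2 N) for n pixels scattered over N cells
collision_prob = n_pixels ** 2 / 2 * 10 ** (-log10_cells)
print(collision_prob)                          # on the order of 1e-290: vanishingly small
```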

Summary - Limiting Factors

• Scene - The most complex and dynamic part
• Sensor - Also not under the analyst's control
• Processing System - The analyst's choices

[Figure: The systems-view processing chain repeated: Sensor, On-Board Processing, Preprocessing, Data Analysis, Information Utilization, with Ephemeris, Calibration, etc. and Human Participation with Ancillary Data.]

Limiting Factors

Scene - Varies from hour to hour and from sq. km to sq. km
Sensor - Spatial Resolution, Spectral Bands, S/N
Processing System
• Classes to be labeled - of Informational Value, Separable, Exhaustive
• Number of samples to define the classes
• Features to be used
• Complexity of the Classifier

Source of Ancillary Input

Possibilities

• Ground Observations - From the Ground, Of the Ground
• "Imaging Spectroscopy"
• Previously Gathered Spectra
• "End Members"

(These relate to the Image Space, Spectral Space, and Feature Space views of the data.)

Use of Ancillary Input

A Key Point:

• Ancillary input is used to label training samples.

• Training samples are then used to compute quantitative class descriptions.

Result:

• This reduces or eliminates the need for many types of preprocessing, since differences between the class descriptions and the data are normalized out.