Bell Laboratories
Data Complexity Analysis:
Linkage between Context and Solution in Classification
Tin Kam Ho
With contributions from
Mitra Basu, Ester Bernado-Mansilla, Richard Baumgartner, Martin Law,
Erinija Pranckeviciene, Albert Orriols-Puig, Nuria Macia
Pattern Recognition: Research vs. Practice

Steps to solve a practical pattern recognition problem:

Data Collection → Sensory Data → Feature Extraction → Feature Vectors → Classifier Training → Classifier → Classification → Decision

The practical focus is on the study of the problem context (the data collection and feature extraction steps); the research focus is on the study of the mathematical solution (the classifier training and classification steps). Between the two lies a danger of disconnection.
Reconnecting Context and Solution
To understand how changes in the
problem set-up and data collection
procedures may affect such properties
Study of the
Problem Context
Data Complexity Analysis:
Analysis of the properties of
feature vectors
Feature
Vectors
To understand how such
properties may impact the
classification solution
Improvements
Limitations
Study of the
Mathematical Solution
Expectations
3
All Rights Reserved © Alcatel-Lucent 2008
Focus is on Boundary Complexity
• Kolmogorov complexity of the class boundary: how long is its shortest description?
• Boundary length can be exponential in dimensionality
• A trivial description is to list all points & class labels
• Is there a shorter description?
Early Discoveries
• Problems distribute in a continuum in complexity space
• Several key measures provide independent characterization
• There exist identifiable domains of classifiers' dominant competency
• Feature selection and transformation induce variability in complexity estimates
Parameterization of Data Complexity
Complexity Classes vs. Complexity Scales
• Study is driven by observed limits in classifier accuracy, even with new, sophisticated methods (e.g., ensembles, SVM, …)
• Analysis is needed for each instance of a classification problem, not just the worst case of a family of problems
• Linear separability: the earliest attempt to address classification complexity
• Observed in real-world problems: different degrees of linear non-separability
• A continuous scale is needed
Some Useful Measures of Geometric Complexity

Degree of Linear Separability: find a separating hyperplane by linear programming; error counts and distances to the plane measure separability.

Length of Class Boundary: compute the minimum spanning tree of the data; count the class-crossing edges.

Fisher's Discriminant Ratio: a classical measure of class separability,
f = (μ1 − μ2)² / (σ1² + σ2²),
maximized over all features to find the most discriminating one.

Shapes of Class Manifolds: cover same-class points with maximal balls; the ball counts describe the shape of the class manifold.
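A minimal sketch of how the first three measures might be computed is below; this is an illustration under assumed conventions (points in rows of X, labels in y), not the code behind the reported experiments.

```python
# Hedged sketch of three geometric complexity measures: linear
# separability via linear programming, MST boundary length, and
# Fisher's discriminant ratio. Illustrative only.
import numpy as np
from scipy.optimize import linprog
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def linear_separability(X, y):
    """Minimize total slack t_i s.t. s_i * (w . x_i + b) >= 1 - t_i.
    An optimum of 0 means the two classes are linearly separable."""
    n, d = X.shape
    s = np.where(y == y[0], 1.0, -1.0)            # classes as +1 / -1
    c = np.r_[np.zeros(d + 1), np.ones(n)]        # cost on slacks only
    A_ub = np.hstack([-s[:, None] * X,            # w coefficients
                      -s[:, None],                # b coefficient
                      -np.eye(n)])                # slack coefficients
    b_ub = -np.ones(n)
    bounds = [(None, None)] * (d + 1) + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.fun                                # total slack; 0 => separable

def boundary_length(X, y):
    """Fraction of minimum-spanning-tree edges joining opposite classes."""
    mst = minimum_spanning_tree(distance_matrix(X, X))
    i, j = mst.nonzero()
    return float(np.mean(y[i] != y[j]))

def fisher_ratio(X, y):
    """max over features of (mu1 - mu2)^2 / (var1 + var2)."""
    A, B = X[y == y[0]], X[y != y[0]]
    f = (A.mean(0) - B.mean(0)) ** 2 / (A.var(0) + B.var(0))
    return float(f.max())
```

Each function maps a labeled dataset to a single number, so a problem can be placed as a point in a complexity space by evaluating several such measures.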
Continuous Distributions in Complexity Space

Real-World Data Sets: benchmarking data from the UC-Irvine archive; 844 two-class problems, of which 452 are linearly separable and 392 are non-separable.

Synthetic Data Sets: random labeling of randomly located points; 100 problems in 1-100 dimensions.

[Scatter plot over Complexity Metric 1 vs. Metric 2, showing randomly labeled data, linearly separable real-world data, and linearly non-separable real-world data.]
Measures of Geometrical Complexity
The First 6 Principal Components
Interpretation of the First 4 PCs
PC 1 (50% of variance): linearity of boundary and proximity of opposite-class neighbor
PC 2 (12% of variance): balance between within-class scatter and between-class distance
PC 3 (11% of variance): concentration & orientation of intrusion into opposite class
PC 4 (9% of variance): within-class scatter
Problem Distribution in 1st & 2nd Principal Components
• Continuous distribution
• Known easy (linearly separable) & difficult (randomly labeled) problems occupy opposite ends
• Few outliers
• Empty regions
Apparent vs. True Complexity:
Uncertainty in Measures due to Sampling Density
A problem may appear deceptively simple or complex with small samples.
[Panels showing the same problem sampled at 2, 10, 100, 500, and 1000 points.]
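To see the sampling effect numerically, one can re-estimate a measure on nested subsamples of a single problem; a toy sketch, reusing the boundary_length function from the earlier sketch:

```python
# Hedged illustration of sampling-density effects: the boundary measure
# of one synthetic problem, re-estimated on nested subsamples.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)  # noisy linear split
for n in (10, 100, 500, 1000):
    # the apparent complexity drifts as the sample grows
    print(n, boundary_length(X[:n], y[:n]))
```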
Observations
• Problems distribute in a continuum in complexity space
• Several key measures/dimensions provide independent characterization
• Need further analysis on uncertainty in complexity estimates due to small-sample-size effects
Relating Classifier Behavior to Data Complexity
Class Boundaries Inferred by Different Classifiers
[Decision regions learned on the same data by XCS (a genetic-algorithm based classifier), the nearest neighbor classifier, and a linear classifier.]
Accuracy Depends on the Goodness of Match between Classifiers and Problems
Problem A: XCS error = 1.9%, NN error = 0.06% (NN is better)
Problem B: XCS error = 0.6%, NN error = 0.7% (XCS is better)
Domains of Competence of Classifiers
Given a classification problem, we want to determine which classifier is the best for it. Can data complexity give us a hint?
[Schematic over two complexity metrics: regions where XCS, a linear classifier (LC), a decision forest, and NN each dominate, with a new problem ("?") to be placed among them.]
Domain of Competence Experiment
Use a set of 9 complexity measures:
Boundary, Pretop, IntraInter, NonLinNN, NonLinLP, Fisher, MaxEff, VolumeOverlap, Npts/Ndim
Characterize 392 two-class problems from UCI data, all shown to be linearly non-separable.
Evaluate 6 classifiers:
• NN (1-nearest neighbor)
• LP (linear classifier by linear programming)
• Odt (oblique decision tree)
• Pdfc (random subspace decision forest; ensemble method)
• Bdfc (bagging based decision forest; ensemble method)
• XCS (a genetic-algorithm based classifier)
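One plausible way to operationalize the search for domains of competence is to treat it as a meta-classification problem. The sketch below is an assumed framing, not the analysis performed in the talk, and uses random stand-in data so that it runs.

```python
# Hedged sketch: predict the winning classifier from a problem's
# complexity measures. Rows = problems (9 measures each), target =
# best classifier. Random stand-in data keep the sketch runnable.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
meta_X = rng.random((392, 9))          # stand-in: 9 measures per problem
meta_y = rng.choice(["nn", "lp", "odt", "pdfc", "bdfc", "xcs"], size=392)
meta = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(meta, meta_X, meta_y, cv=10).mean())
```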
Identifiable Domains of Competence by NN and LP
Best Classifier for Benchmarking Data
Less Identifiable Domains of Competence
Regions in complexity space where the best classifier is a single method (nn, lp, or odt) vs. an ensemble technique.
[Three scatter plots: Boundary vs. NonLinNN, IntraInter vs. Pretop, MaxEff vs. VolumeOverlap; • = ensemble, + = nn/lp/odt.]
Uncertainty of Estimates at Two Levels
• Sparse training data in each problem & complex geometry cause ill-posedness of class boundaries (uncertainty in feature space)
• A sparse sample of problems causes difficulty in identifying regions of dominant competence (uncertainty in complexity space)
Complexity and Data Dimensionality:
Class Separability after Dimensionality Reduction
Feature selection/transformation may change the difficulty of a classification problem by:
• widening the gap between classes
• compressing the discriminatory information
• removing irrelevant dimensions
It is often unclear to what extent these happen. We seek a quantitative description of such changes.
Discrimination:
Spread of classification accuracy and geometrical complexity due to forward feature selection
[Scatter plot over FFS feature subsets for all datasets (spectra1, spectra2, spectra3, colon, ovarian, eogat): Boundary measure (x-axis, 10-90) vs. 1NN classification error (y-axis, 0-0.7).]
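A toy version of such an experiment, tracing a complexity measure and 1NN error along a greedy forward-selection path (synthetic data, and the boundary_length function from the earlier sketch; not the datasets plotted above):

```python
# Hedged sketch: greedy forward feature selection, printing the MST
# boundary measure and cross-validated 1NN error after each addition.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
selected, remaining = [], list(range(X.shape[1]))
for _ in range(10):                           # grow the subset greedily
    best = max(remaining, key=lambda j: cross_val_score(
        knn, X[:, selected + [j]], y, cv=5).mean())
    selected.append(best)
    remaining.remove(best)
    err = 1 - cross_val_score(knn, X[:, selected], y, cv=5).mean()
    print(len(selected), round(boundary_length(X[:, selected], y), 3),
          round(err, 3))                      # complexity vs. error
```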
Designing a Strategy for Classifier Evaluation
A Complete Platform for Evaluating Learning Algorithms
To facilitate progress on learning algorithms, we need a way to systematically create learning problems that:
• provide complete coverage of the complexity space
• are representative of all the known problems, i.e., every classification problem arising in the real world should have a close neighbor representing it in the complexity space
Is this possible?
Ways to Synthesize Classification Problems
• Synthesizing data with targeted levels of complexity
• e.g., compute an MST over a uniform point distribution, then assign class-crossing edges randomly [Macia et al. 2008] (see the sketch below)
• or, create partitions with increasing resolution
• can create a continuous cover of the complexity space
• but, are the data similar to those arising from reality?
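A minimal sketch of one reading of the MST-based idea follows: cut a chosen number of MST edges and label the resulting components at random, so class changes can occur only at the cut edges. The details and parameter names are illustrative, not the procedure of [Macia et al. 2008].

```python
# Hedged sketch: synthesize a two-class problem with a roughly targeted
# MST boundary measure by cutting MST edges and labeling components.
import numpy as np
from scipy.sparse import coo_matrix
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def synthesize(n=500, dim=2, target_boundary=0.2, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, dim))
    mst = minimum_spanning_tree(distance_matrix(X, X)).tocoo()
    k = int(target_boundary * (n - 1))        # MST edges allowed to cross
    keep = rng.permutation(mst.nnz) >= k      # cut k random MST edges
    pruned = coo_matrix((mst.data[keep], (mst.row[keep], mst.col[keep])),
                        shape=mst.shape)
    _, comp = connected_components(pruned, directed=False)
    y = rng.integers(0, 2, size=comp.max() + 1)[comp]   # label per component
    # Note: a cut edge crosses classes only when its two components draw
    # different labels (probability 1/2), so the achieved boundary is
    # roughly half the target; adjust k accordingly if needed.
    return X, y
```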
Ways to Synthesize Classification Problems
• Synthesizing data to simulate natural processes
• e.g., the Neyman-Scott process (a sketch follows below)
• how many such processes have explicit models?
• how many are needed to cover all real-world problems?
• Systematically degrade real-world datasets
• increase noise, reduce image resolution, …
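For reference, a Neyman-Scott cluster process is simple to simulate: Poisson-distributed parent points, each spawning a Poisson number of offspring scattered around it. A minimal sketch with illustrative parameters:

```python
# Hedged sketch of a Neyman-Scott cluster process as a point generator:
# Poisson parents on the unit square, Gaussian offspring around each.
import numpy as np

def neyman_scott(n_parents=10, mean_offspring=30, spread=0.03, seed=0):
    rng = np.random.default_rng(seed)
    parents = rng.uniform(size=(rng.poisson(n_parents), 2))
    clusters = [p + spread * rng.normal(size=(rng.poisson(mean_offspring), 2))
                for p in parents]
    return np.vstack(clusters)
```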
Simplification of Class Geometry
Manifold Learning and Dimensionality Reduction
• Manifold learning techniques highlight intrinsic dimensions
• But the class boundary may not follow the intrinsic dimensions
Manifold Learning and Dimensionality Reduction
• Supervised manifold learning: seek mappings that exaggerate class separation [de Ridder et al., 2003]
• Better still, the mapping could be sought to directly minimize some measure of data complexity (see the toy sketch below)
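As a toy illustration of that last point (my construction, not a method from the talk): search random linear projections and keep the one with the lowest boundary measure, reusing boundary_length from the earlier sketch.

```python
# Hedged toy sketch: among random candidates, pick the 2-D linear
# projection that minimizes the MST boundary measure of the data.
import numpy as np

def best_projection(X, y, n_trials=200, out_dim=2, seed=0):
    rng = np.random.default_rng(seed)
    best_W, best_score = None, np.inf
    for _ in range(n_trials):
        # random orthonormal projection via reduced QR
        W, _ = np.linalg.qr(rng.normal(size=(X.shape[1], out_dim)))
        score = boundary_length(X @ W, y)    # complexity after projection
        if score < best_score:
            best_W, best_score = W, score
    return best_W, best_score
```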
Seeking Optimizations Upstream
Back to the application context:
• Use data complexity measures for guidance
• Change the setup and definition of the classification problem
• Collect more samples, in finer resolution; extract more features, …
• Consider alternative representations: dissimilarity-based? [Pekalska & Duin 2005]
Data complexity gives an operational definition of learnability.
Optimization upstream: formalize the intuition of seeking invariance, and systematically optimize the problem setup and data acquisition scenario to reduce data complexity.
Recent Examples from the Internet
CAPTCHA:
Completely Automated Public Turing test to tell Computers and Humans Apart
Also known as:
• Reverse Turing Test
• Human Interactive Proofs
[von Ahn et al., CMU 2000]
CAPTCHAs exploit limitations in the accuracy of machine pattern recognition.
The Netflix Challenge
• $1 Million Prize for the first team to improve 10% over the company's own recommender system
• But is the goal achievable? Do the training data support such a possibility?
Amazon’s Mechanical Turk
• “Crowd-sourcing” tedious human intelligence (pattern recognition) tasks
• Which ones are doable by machines?
Conclusions
Summary
Automatic classification is useful, but can be very difficult.
We know the key steps and many promising methods, but we have not fully understood how they work or what else is needed.
We found measures of geometric complexity that are useful for characterizing the difficulty of classification problems and classifiers' domains of competence.
A better understanding of how data and classifiers interact can guide practice, and re-establish the linkage between context and solution.
For the Future
Further progress in statistical and machine learning will need systematic, scientific evaluation of the algorithms with problems that are difficult for different reasons.
A "problem synthesizer" will be useful to provide a complete evaluation platform, and reveal the "blind spots" of current learning algorithms.
Rigorous statistical characterization of complexity estimates from limited training data will help gauge the uncertainty, and determine the applicability of data complexity methods.