Introduction to Pattern Recognition CS479/679 Pattern Recognition Dr. George Bebis


Introduction to Pattern
Recognition
Chapter 1 (Duda et al.)
CS479/679 Pattern Recognition
Dr. George Bebis
1
What is Pattern Recognition?
• Assign an unknown pattern to one of several
known categories (or classes).
2
What is a Pattern?
• A pattern could be an object or event.
biometric patterns
hand gesture patterns
3
What is a Pattern? (cont’d)
• Loan/Credit card applications
– Income, # of dependents, mortgage amount → credit-worthiness classification.
• Dating services
– Age, hobbies, income → “desirability” classification.
• Web documents
– Keyword-based descriptions (e.g., documents containing
“football”, “NFL”) → document classification.
4
Pattern Class
• A collection of “similar” objects.
Female
Male
5
How do we model a Pattern Class?
• Typically, using a statistical model.
– probability density function (e.g., Gaussian)
Gender Classification
male
female
6
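As an illustration (not from the slides), the Gaussian class model above can be sketched in a few lines of Python. The feature, class parameters, and the equal-priors assumption are all hypothetical; a real system would estimate them from training data.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Evaluate the 1-D Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical class models for some scalar feature (assumed parameters)
male_model   = {"mu": 178.0, "sigma": 7.0}
female_model = {"mu": 165.0, "sigma": 6.5}

def classify(x):
    """Assign x to the class whose density is larger (equal priors assumed)."""
    p_male = gaussian_pdf(x, **male_model)
    p_female = gaussian_pdf(x, **female_model)
    return "male" if p_male > p_female else "female"
```

The decision boundary falls where the two densities cross; with unequal priors the densities would be weighted before comparing.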
How do we model a Pattern Class?
(cont’d)
• Key challenges:
– Intra-class variability
The letter “T” in different typefaces
– Inter-class variability
Letters/Numbers that look similar
7
Pattern Recognition:
Main Objectives
• Hypothesize the models that describe each
pattern class (e.g., recover the process that
generated the patterns).
• Given a novel pattern, choose the best-fitting
model for it and then assign it to the pattern class
associated with the model.
8
Classification vs Clustering
– Classification (known categories): also called recognition or
supervised classification; e.g., assign each sample to
Category “A” or Category “B”.
– Clustering (unknown categories): also called unsupervised
classification.
9
Pattern Recognition
Applications
10
Handwriting Recognition
11
Handwriting Recognition (cont’d)
12
License Plate Recognition
13
Biometric Recognition
14
Fingerprint Classification
15
Face Detection
16
Gender Classification
Balanced classes (i.e., male vs female)
17
Autonomous Systems
18
Medical Applications
Skin Cancer Detection
Breast Cancer Detection
19
Land Classification
(from aerial or satellite images)
20
“Hot” Applications
• Recommendation systems
– Amazon, Netflix
• Targeted advertising
21
The Netflix Prize
• Predict how much someone is going to enjoy a
movie based on their movie preferences.
– $1M awarded in Sept. 2009
• Can software recommend movies to customers?
– Not Rambo to Woody Allen fans
– Not Saw VI if you’ve seen all previous Saw movies
22
Main Classification Approaches
x: input vector (pattern)
y: class label (class)
• Generative
– Model the joint probability, p(x, y)
– Make predictions by using Bayes’ rule to calculate p(y|x)
– Pick the most likely label y
• Discriminative
– Estimate p(y|x) directly (e.g., learn a direct map from inputs x to
the class labels y)
– Pick the most likely label y
23
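A minimal sketch of the two approaches, with made-up numbers: the generative side models p(x|y) and p(y) and applies Bayes’ rule to get p(y|x); the discriminative side estimates p(y|x) directly (here a logistic model with assumed, not learned, weights).

```python
import math

# --- Generative: model p(x, y) = p(x | y) p(y), then apply Bayes' rule ---
def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def generative_posterior(x, params, priors):
    """p(y | x) = p(x | y) p(y) / sum over y' of p(x | y') p(y')."""
    joint = {y: gauss(x, *params[y]) * priors[y] for y in params}
    z = sum(joint.values())
    return {y: joint[y] / z for y in joint}

params = {"A": (0.0, 1.0), "B": (4.0, 1.0)}   # assumed class-conditional Gaussians
priors = {"A": 0.5, "B": 0.5}

# --- Discriminative: estimate p(y | x) directly, e.g., a logistic model ---
def discriminative_posterior(x, w=2.0, b=-4.0):
    """p(y = "B" | x) = sigmoid(w*x + b); w and b are assumed, normally learned."""
    p_b = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return {"A": 1.0 - p_b, "B": p_b}
```

In both cases the prediction is the most likely label, e.g. `max(post, key=post.get)`.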
“Syntactic” Pattern Recognition
Approach
• Represent patterns
in terms of simple
primitives.
• Describe patterns
using deterministic
grammars or formal
languages.
24
Complexity of PR – An Example
Problem: Sorting
incoming fish on a
conveyor belt.
Assumption: Two
kinds of fish:
(1) sea bass
(2) salmon
25
Pre-processing Step
Example
(1) Image enhancement
(2) Separate touching
or occluding fish
(3) Find the boundary of
each fish
26
Feature Extraction
• Assume a fisherman told us that a sea bass is
generally longer than a salmon.
• We can use length as a feature and decide
between sea bass and salmon according to a
threshold on length.
• How should we choose the threshold?
27
“Length” Histograms
threshold l*
• Even though sea bass are longer than salmon on
average, there are many examples of fish for which
this observation does not hold.
28
“Average Lightness” Histograms
• Consider a different feature such as “average
lightness”
threshold x*
• It seems easier to choose the threshold x* but we
still cannot make a perfect decision.
29
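One simple way to choose the threshold, sketched below with hypothetical measurements: scan the observed feature values and keep the threshold that misclassifies the fewest training samples. (A Bayesian treatment, covered later in the course, would minimize expected error instead.)

```python
def best_threshold(salmon, sea_bass):
    """Scan candidate thresholds and keep the one with fewest training errors.
    Assumes sea bass tend to have larger feature values than salmon."""
    candidates = sorted(salmon + sea_bass)
    best_t, best_errors = None, float("inf")
    for t in candidates:
        # decide "sea bass" when the feature exceeds t
        errors = (sum(1 for x in salmon if x > t)
                  + sum(1 for x in sea_bass if x <= t))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors

# Hypothetical "average lightness" measurements
salmon   = [1.0, 1.2, 1.5, 1.7, 2.1]
sea_bass = [1.9, 2.4, 2.6, 3.0, 3.3]
t_star, n_err = best_threshold(salmon, sea_bass)
```

Note that even the best threshold leaves one training sample misclassified here, mirroring the slide’s point that a perfect decision is not possible.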
Multiple Features
• To improve recognition accuracy, we might have
to use more than one feature at a time.
– Single features might not yield the best performance.
– Using combinations of features might yield better
performance.
x = [x1, x2]^T, where x1: lightness, x2: width
• How many features should we choose?
30
Classification
• Partition the feature space into two regions by
finding the decision boundary that minimizes the
error.
• How should we find the optimal decision
boundary?
31
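With two features, the simplest decision boundary is a line in the feature space. A sketch, with assumed weights (in practice the weights are learned by minimizing the error, which is the hard part the slide asks about):

```python
def linear_decision(x, w=(1.0, -0.8), b=-0.5):
    """Classify a 2-D feature vector x = (lightness, width) by the sign of
    w . x + b. The weights here are assumed, normally learned from data."""
    score = w[0] * x[0] + w[1] * x[1] + b
    return "sea bass" if score > 0 else "salmon"
```

The set of points where `score == 0` is the decision boundary that partitions the feature space into the two regions.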
PR System – Two Phases
Test Phase
Training Phase
32
Sensors & Preprocessing
• Sensing:
– Use a sensor (camera or microphone) for data
capture.
– PR depends on bandwidth, resolution, sensitivity,
distortion of the sensor.
• Pre-processing:
– Removal of noise in data.
– Segmentation (i.e., isolation of patterns of interest
from background).
33
Training/Test data
• How do we know that we have collected an
adequately large and representative set of
examples for training/testing the system?
Training Set
Test Set ?
34
Feature Extraction
• How to choose a good set of features?
– Discriminative features
– Invariant features (e.g., translation, rotation and
scale)
• Are there ways to automatically learn which
features are best ?
35
How Many Features?
• Does adding more features always improve
performance?
– It might be difficult and computationally
expensive to extract certain features.
– Correlated features might not improve
performance.
– “Curse” of dimensionality.
36
Curse of Dimensionality
• Adding too many features can, paradoxically, lead to a
worsening of performance.
– Divide each of the input features into a number of intervals, so
that the value of a feature can be specified approximately by
saying in which interval it lies.
– If each input feature is divided into M divisions, then the total
number of cells is M^d (d: # of features).
– Since each cell must contain at least one point, the number of
training samples grows exponentially with d.
37
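The M^d growth above is easy to make concrete; with M = 10 intervals per feature, the cell count (and hence the minimum number of training samples at one per cell) explodes as features are added:

```python
def cells_needed(M, d):
    """Number of cells when each of d features is split into M intervals."""
    return M ** d

# With M = 10, going from 1 to 6 features multiplies the required
# training data by a factor of 100,000 — exponential growth in d.
growth = [cells_needed(10, d) for d in (1, 2, 3, 6)]
```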
Missing Features
• Certain features might be missing (e.g., due
to occlusion).
• How should we train the classifier with
missing features ?
• How should the classifier make the best
decision with missing features ?
38
Complexity
• We can get perfect classification performance on the
training data by choosing complex models.
• Complex models are tuned to the particular training
samples rather than to the characteristics of the true
model.
overfitting
How well can the model generalize to unknown samples?
39
Generalization
• Generalization is defined as the ability of a classifier to
produce correct results on novel patterns.
• How can we improve generalization performance ?
– More training examples (i.e., better model estimates).
– Simpler models usually yield better performance.
complex model
simpler model
40
More on model complexity
• Consider the following 10 sample points (blue circles),
generated from a function with some added noise.
• Green curve is the true function that generated the
data.
• Approximate the true function from the sample points.
41
More on model complexity (cont’d)
Polynomial curve fitting: polynomials having various
orders, shown as red curves, fitted to the set of 10 sample
points.
42
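The overfitting effect can be reproduced without plotting. Below, a degree-9 polynomial is forced through 10 noisy samples of a hypothetical true function, sin(2πx); the noise values are made up for illustration. The fit has zero error on the training points, yet it fits the noise rather than the function, so it can stray from the truth between them.

```python
import math

def lagrange_interp(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through the n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# True function sin(2*pi*x) plus fixed, made-up "noise" at 10 sample points
xs = [i / 9 for i in range(10)]
noise = [0.1, -0.05, 0.08, -0.1, 0.05, -0.08, 0.1, -0.05, 0.08, -0.1]
ys = [math.sin(2 * math.pi * x) + n for x, n in zip(xs, noise)]

# The 9th-order interpolant has zero error on the training points...
train_err = max(abs(lagrange_interp(xs, ys, x) - y) for x, y in zip(xs, ys))

# ...but is not exact on the true function between them (overfitting)
mid = 0.5 * (xs[0] + xs[1])
off_sample_err = abs(lagrange_interp(xs, ys, mid) - math.sin(2 * math.pi * mid))
```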
More on complexity (cont’d)
Polynomial curve fitting: 9th-order polynomials fitted to
15 and 100 sample points.
43
Ensembles of Classifiers
• Performance can be
improved using a
"pool" of classifiers.
• How should we build
and combine different
classifiers ?
44
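The simplest way to combine a pool of classifiers is majority voting, sketched here with three hypothetical threshold classifiers (more refined schemes weight the votes or train the combiner itself):

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine a pool of classifiers by letting each one vote on the label."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical weak classifiers over a scalar feature
pool = [
    lambda x: "sea bass" if x > 2.0 else "salmon",
    lambda x: "sea bass" if x > 2.5 else "salmon",
    lambda x: "sea bass" if x > 1.5 else "salmon",
]
```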
PR System (cont’d)
• Post-processing:
– Exploit context to improve performance.
How m ch
info mation are y u
mi sing?
45
Cost of misclassifications
• Consider the fish classification example; there
are two possible classification errors:
(1) Deciding the fish was a sea bass when it was a
salmon.
(2) Deciding the fish was a salmon when it was a sea
bass.
• Are both errors equally important ?
46
Cost of misclassifications (cont’d)
• Suppose the fish packing company knows that:
– Customers who buy salmon will object vigorously if
they see sea bass in their cans.
– Customers who buy sea bass will not be unhappy if
they occasionally see some expensive salmon in
their cans.
• How does this knowledge affect our decision?
47
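Under asymmetric costs, the decision rule shifts: instead of picking the more probable class, pick the class with the smaller expected cost. A sketch with assumed cost values (5 vs. 1, made up to reflect the slide’s scenario, where selling sea bass as salmon is the worse error):

```python
def decide(p_salmon, c_bass_as_salmon=5.0, c_salmon_as_bass=1.0):
    """Pick the label with the smaller expected cost.
    c_bass_as_salmon penalizes canning sea bass as salmon (upset customers);
    both cost values are assumed for illustration."""
    p_bass = 1.0 - p_salmon
    cost_say_salmon = p_bass * c_bass_as_salmon    # wrong only if it was sea bass
    cost_say_bass = p_salmon * c_salmon_as_bass    # wrong only if it was salmon
    return "salmon" if cost_say_salmon < cost_say_bass else "sea bass"
```

With these costs the rule labels a fish “salmon” only when p(salmon|x) exceeds 5/6, rather than the symmetric-cost threshold of 1/2.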
Computational Complexity
• How does an algorithm scale with the
number of:
– features
– patterns
– categories
• Consider tradeoffs between computational
complexity and performance.
48
Would it be possible to build a
“general purpose” PR system?
• Humans have the ability to switch rapidly and
seamlessly between different pattern recognition
tasks.
• It is very difficult to design a system that is
capable of performing a variety of classification
tasks.
– Different decision tasks may require different features.
– Different features might yield different solutions.
– Different tradeoffs exist for different tasks.
49