CP467 IP & PR - Wilfrid Laurier University


Lecture 20 Object recognition I
1. Patterns and pattern classes
2. Classifiers based on Bayes decision theory
3. Recognition based on decision-theoretic methods
4. Optimum statistical classifiers
5. Pattern recognition with MATLAB
Patterns and Pattern classes
• A pattern is an arrangement of descriptors (features)
– Three commonly used pattern arrangements
• Vectors
• Strings
• Trees
$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad x = (x_1, x_2, \ldots, x_n)^T$
• A pattern class is a family of patterns that share some
common properties.
1 , 2 ,..., W
• Pattern recognition assigns a given pattern to its respective class.
Example 1
• Represent flower petals by the features width and length, so that $x = (x_1, x_2)^T$. The three types of iris flowers then fall into different pattern classes.
Example 2
• Use the signature as the pattern vector: $x = (r(\theta_1), r(\theta_2), \ldots, r(\theta_n))^T$
Example 3
• Represent pattern by string
Example 4
• Represent pattern by trees
2. Classifiers based on Bayesian decision theory
• Fundamental statistical approach
• Assumes the relevant probabilities are known; compute the probability of the observed event, then make optimal decisions
• Bayes' Theorem: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
• Example:
Suppose that at Laurier 50% of the students are girls and 30% are science students, and among science students 20% are girls. If one meets a girl student at Laurier, what is the probability that she is a science student?
Let B denote girl students and A denote science students. Then
$P(A) = 30\%$, $P(B) = 50\%$, $P(B \mid A) = 20\%$
$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)} = \dfrac{0.2 \times 0.3}{0.5} = \dfrac{0.06}{0.5} = 0.12$
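A quick MATLAB check of this arithmetic (a minimal sketch; the variable names are ours):
% Bayes' theorem: P(A|B) = P(B|A)*P(A) / P(B)
P_A   = 0.30;    % P(science student)
P_B   = 0.50;    % P(girl student)
P_BgA = 0.20;    % P(girl | science)
P_AgB = P_BgA * P_A / P_B   % = 0.12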
Bayes theory
• Given x ∈ R^l and a set of classes ω_i, i = 1, 2, ..., c, Bayes theory states that
$P(\omega_i \mid x)\,p(x) = p(x \mid \omega_i)\,P(\omega_i)$
$p(x) = \sum_{i=1}^{c} p(x \mid \omega_i)\,P(\omega_i)$
where P(ω_i) is the a priori probability of class ω_i, i = 1, 2, ..., c; P(ω_i | x) is the a posteriori probability of class ω_i given the value of x; p(x) is the probability density function (pdf) of x; and p(x | ω_i), i = 1, 2, ..., c, is the class-conditional pdf of x given ω_i (sometimes called the likelihood of ω_i with respect to x).
Bayes classifier
Let x ≡ [x(1), x(2), ..., x(l)]^T ∈ R^l be the corresponding feature vector, which results from some measurements. Also, let the number of possible classes be equal to c, that is, ω_1, ..., ω_c.
Bayes decision theory: x is assigned to the class ω_i if
$P(\omega_i \mid x) > P(\omega_j \mid x), \quad \forall j \neq i$
Multidimensional Gaussian PDF
A random vector $x = (x_1, \ldots, x_n)^T$ follows the Gaussian distribution N(m, S) if
$p(x) = \dfrac{1}{(2\pi)^{n/2}\,|S|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(x - m)^T S^{-1}(x - m)\right)$
where m = E[x], S = E[(x − m)(x − m)^T], and |S| is the determinant of S.
Special case n = 1, N(m, σ²):
$p(x) = \dfrac{1}{(2\pi)^{1/2}\,\sigma} \exp\!\left(-\dfrac{(x - m)^2}{2\sigma^2}\right)$
Example
Consider a 2-class classification task in the 2-dimensional space, where the data in both classes, ω_1 and ω_2, are distributed according to the Gaussian distributions N(m_1, S_1) and N(m_2, S_2), respectively. Let
$m_1 = [1, 1]^T, \quad m_2 = [3, 3]^T, \quad S_1 = S_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
Assuming that $P(\omega_1) = P(\omega_2) = 0.5$, classify $x = [1.8, 1.8]^T$ into ω_1 or ω_2.
Solution
P1=0.5; P2=0.5;                                  % a priori probabilities
m1=[1 1]'; m2=[3 3]'; S=eye(2); x=[1.8 1.8]';
p1=P1*comp_gauss_dens_val(m1,S,x);               % P(w1)*p(x|w1)
p2=P2*comp_gauss_dens_val(m2,S,x);               % P(w2)*p(x|w2)
The resulting values are p1 = 0.042 and p2 = 0.0189.
According to the Bayesian classifier, x is assigned to ω1.
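comp_gauss_dens_val is presumably from the course's MATLAB toolkit; a minimal core-MATLAB sketch that reproduces the same values (the handle name gauss is ours):
% n-dimensional Gaussian pdf evaluated at x, with mean m and covariance S
gauss = @(x,m,S) exp(-0.5*(x-m)'*(S\(x-m))) / ((2*pi)^(numel(x)/2)*sqrt(det(S)));
m1 = [1 1]'; m2 = [3 3]'; S = eye(2); x = [1.8 1.8]';
p1 = 0.5*gauss(x,m1,S)   % approx. 0.042
p2 = 0.5*gauss(x,m2,S)   % approx. 0.0189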
Decision-theoretic methods
• Decision (discriminant) functions
Let $x = (x_1, x_2, \ldots, x_n)^T$ be a pattern vector, and let W pattern classes $\omega_1, \omega_2, \ldots, \omega_W$ be given.
Functions $d_1(x), d_2(x), \ldots, d_W(x)$ are decision functions if x belongs to $\omega_i$ when
$d_i(x) > d_j(x), \quad j = 1, 2, \ldots, W;\; j \neq i$
• Decision boundary
The boundary separating $\omega_i$ and $\omega_j$ is $\{x : d_{ij}(x) = d_i(x) - d_j(x) = 0\}$
Minimum distance classifier
• The prototype of class $\omega_j$ is defined to be
$m_j = \dfrac{1}{N_j} \sum_{x \in \omega_j} x, \quad j = 1, \ldots, W$, where $N_j = |\omega_j|$
• The distance of a pattern x to pattern class $\omega_j$ is defined to be
$D_j(x) = \|x - m_j\|, \quad j = 1, \ldots, W$, where $\|a\| = (a^T a)^{1/2}$
• Assign x to pattern class $\omega_i$ if $D_i(x) = \min\{D_j(x) : j = 1, \ldots, W\}$,
or equivalently, using the decision functions
$d_j(x) = x^T m_j - \tfrac{1}{2}\, m_j^T m_j$
assign x to pattern class $\omega_i$ if $d_i(x) = \max\{d_j(x) : j = 1, \ldots, W\}$.
The boundary between $\omega_i$ and $\omega_j$ is
$d_{ij}(x) = d_i(x) - d_j(x) = x^T(m_i - m_j) - \tfrac{1}{2}(m_i + m_j)^T(m_i - m_j) = 0$
Example
$m_1 = (4.3, 1.3)^T, \quad m_2 = (1.5, 0.3)^T$
$d_1(x) = 4.3x_1 + 1.3x_2 - 10.1$
$d_2(x) = 1.5x_1 + 0.3x_2 - 1.17$
$d_{12}(x) = d_1(x) - d_2(x) = 2.8x_1 + 1.0x_2 - 8.9 = 0$
$x \in \omega_1$ if $d_{12}(x) > 0$; $x \in \omega_2$ if $d_{12}(x) < 0$
Minimum Mahalanobis distance classifiers
x  ( x1 ,..., xn ) is classified into i if
1
|| x  mi || ( x  mi ) S ( x  mi ) 
T
1
( x  m j ) S ( x  m j ) || x  m j ||
T
j  i
Example
x=[0.1 0.5 0.1]';
m1=[0 0 0]'; m2=[0.5 0.5 0.5]';
m=[m1 m2];
z1=euclidean_classifier(m,x)                    % minimum Euclidean distance
S=[0.8 0.01 0.01;0.01 0.2 0.01; 0.01 0.01 0.2];
z2=mahalanobis_classifier(m,S,x)                % minimum Mahalanobis distance
The results are z1 = 1 and z2 = 2: the Euclidean classifier assigns x to ω1, while the Mahalanobis classifier assigns x to ω2.
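euclidean_classifier and mahalanobis_classifier are presumably from the course's MATLAB toolkit; a core-MATLAB sketch of the same comparison (the names dE and dM are ours):
x  = [0.1 0.5 0.1]';
m1 = [0 0 0]'; m2 = [0.5 0.5 0.5]';
S  = [0.8 0.01 0.01; 0.01 0.2 0.01; 0.01 0.01 0.2];
dE = [norm(x-m1), norm(x-m2)];                              % Euclidean distances
dM = [sqrt((x-m1)'*(S\(x-m1))), sqrt((x-m2)'*(S\(x-m2)))];  % Mahalanobis distances
[~, z1] = min(dE)                                           % z1 = 1
[~, z2] = min(dM)                                           % z2 = 2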
4. Matching by correlation
• Given an m × n template (mask) w(s,t), find the m × n subimage of f(x,y) that best matches w, i.e., the one with the largest correlation.
Correlation: $c(x, y) = \sum_s \sum_t w(s, t)\, f(x + s, y + t)$
Normalized correlation:
$\gamma(x, y) = \dfrac{\sum_s \sum_t [w(s, t) - \bar{w}]\,[f(x + s, y + t) - \bar{f}_{xy}]}{\left\{\sum_s \sum_t [w(s, t) - \bar{w}]^2 \,\sum_s \sum_t [f(x + s, y + t) - \bar{f}_{xy}]^2\right\}^{1/2}}$
where $\bar{w}$ is the average of the template and $\bar{f}_{xy}$ is the average of the subimage of f under the template.
Correlation theorem
$f(x, y) \circ w(x, y) \Leftrightarrow F(u, v)\, H^*(u, v)$
$f(x, y)\, w^*(x, y) \Leftrightarrow F(u, v) \circ H(u, v)$
where ∘ denotes correlation, and F(u, v) and H(u, v) are the Fourier transforms of f(x, y) and w(x, y).
[M, N] = size(f);
f = fft2(f);                                   % transform of the image
w = conj(fft2(w, M, N));                       % conjugate of the zero-padded transform of the template
g = real(ifft2(w.*f));                         % spatial correlation image
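A possible usage sketch for locating the best match from g (this peak-location convention is our assumption, not stated in the lecture):
[gmax, idx] = max(g(:));                       % peak correlation value
[ymax, xmax] = ind2sub(size(g), idx);          % location where the template aligns best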
Example
Case study
• Optical character recognition (OCR)
– Preprocessing
Digitization, binarization
Noise elimination, thinning, normalization
– Feature extraction (by character, word part, or word)
Segmentation (explicit or implicit)
Detection of major features (top-down approach)
– Matching
Recognition of characters
Context verification from a knowledge base
– Understanding and action
• See the reference
Example
3. Optimum statistical classifiers
Let $p(\omega_i \mid x)$ denote the probability that pattern x comes from class $\omega_i$, and let $L_{ij}$ denote the loss incurred if the classifier assigns x to class $\omega_j$ when it actually belongs to $\omega_i$. The average loss of assigning x to $\omega_j$ is
$r_j(x) = \sum_{k=1}^{W} L_{kj}\, p(\omega_k \mid x) = \dfrac{1}{p(x)} \sum_{k=1}^{W} L_{kj}\, p(x \mid \omega_k)\, P(\omega_k)$
Simplified loss function (dropping the common factor 1/p(x)):
$r_j(x) = \sum_{k=1}^{W} L_{kj}\, p(x \mid \omega_k)\, P(\omega_k)$
Optimum statistical classifier (Bayes classifier):
Assign x to $\omega_i$ if $r_i(x) = \min\{r_j(x) : j = 1, \ldots, W\}$,
i.e., assign x to the class with minimum average loss.
Example:
$L_{ij} = 1 - \delta_{ij}$, where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise. Then
$r_j(x) = p(x) - p(x \mid \omega_j)\, P(\omega_j)$
and the equivalent decision functions are
$d_j(x) = p(x \mid \omega_j)\, P(\omega_j), \quad j = 1, \ldots, W$
Bayes classifier for Gaussian pattern classes
• Consider two pattern classes with Gaussian distributions with means $m_1, m_2$ and standard deviations $\sigma_1, \sigma_2$:
$d_j(x) = p(x \mid \omega_j)\, P(\omega_j) = \dfrac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left(-\dfrac{(x - m_j)^2}{2\sigma_j^2}\right) P(\omega_j), \quad j = 1, 2$
N-dimensional case
$p(x \mid \omega_j) = \dfrac{1}{(2\pi)^{n/2}\,|C_j|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(x - m_j)^T C_j^{-1}(x - m_j)\right)$
$m_j = E_j\{x\} \approx \dfrac{1}{N_j} \sum_{x \in \omega_j} x$
$C_j = E_j\{(x - m_j)(x - m_j)^T\} \approx \dfrac{1}{N_j} \sum_{x \in \omega_j} x\, x^T - m_j\, m_j^T$
Taking logarithms, the decision function becomes
$d_j(x) = \ln P(\omega_j) - \tfrac{1}{2} \ln |C_j| - \tfrac{1}{2}\,(x - m_j)^T C_j^{-1}(x - m_j)$
Example
3
1
1
1
m1  1 , m 2  3
4
4
1
3
3 1 1 
 8 4 4 
1
C  C1  C2  1 3 1 , C1   4 8 4 
6
1 1 3 
 4 4 8 
1 T 1
T 1
d j ( x)  x C m j  m j C m j
2
d1 (x)  4 x1  1.5, d 2 (x)  4 x1  8 x2  8 x3  5.5
d1 (x)  d 2 (x)  8 x1  8 x2  8 x3  4
A real example
Linear classifier
• Two classes
f ( x)  w1 x1  ...  wn xn  w0  w x  w0
T
x  1 if f ( x)  0;otherwise x  2
• f(x) is a separation hyperplane
• How to obtain the coefficients, or weights wi
• By perceptron algorithm
w(t  1)  w(t )  t   x x
xY
 1, x  1
t is a constant.  x  
1, x  2
The Online Form of the Perceptron Algorithm
The Multiclass LS Classifier
• The classification rule is now as follows: Given x, classify it to class ω_i if
$w_i^T x > w_j^T x, \quad \forall j \neq i$
where the weight vectors $w_j$ are obtained by least-squares estimation.