CP467 IP & PR - Wilfrid Laurier University
Lecture 20 Object recognition I
1. Patterns and pattern classes
2. Classifiers based on Bayes decision theory
3. Recognition based on decision-theoretic methods
4. Optimum statistical classifiers
5. Pattern recognition with MATLAB
Patterns and Pattern classes
• A pattern is an arrangement of descriptors (features)
– Three commonly used pattern arrangements
• Vectors
• Strings
• Trees
x = (x1, x2, ..., xn)^T
• A pattern class is a family of patterns that share some
common properties.
ω1, ω2, ..., ωW
• Pattern recognition assigns a given pattern to its respective class.
Example 1
• Represent flower petals by the features width and length:
x = (x1, x2)^T
Then the three types of iris flowers fall into different pattern classes.
Example 2
• Use the signature as the pattern vector:
x = (r(θ1), r(θ2), ..., r(θn))^T
Example 3
• Represent the pattern by a string
Example 4
• Represent the pattern by a tree
2. Classifiers based on Bayesian Decision Theory
• Fundamental statistical approach
• Assume the relevant probabilities are known, compute the probability of the observed event, then make the optimal decision.
• Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
• Example:
Suppose that at Laurier 50% of students are girls and 30% are science
students, and that among science students 20% are girls. If one
meets a girl student at Laurier, what is the probability that she is a
science student?
Let B denote girl students and A denote science students. Then
P(A) = 30%, P(B) = 50%, P(B|A) = 20%
P(A|B) = P(B|A) P(A) / P(B) = (0.2 × 0.3) / 0.5 = 0.06 / 0.5 = 0.12
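The arithmetic above can be checked in a few lines. This is an illustrative Python sketch (the course's own examples use MATLAB):

```python
# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

# A = science student, B = girl student
p = bayes(0.20, 0.30, 0.50)
print(round(p, 2))  # 0.12
```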
Bayes theory
• Given x ∈ Rl and a set of classes ωi, i = 1, 2, . . . , c, the Bayes theory
states that
P(ωi|x) p(x) = p(x|ωi) P(ωi), where p(x) = Σ_{i=1}^{c} p(x|ωi) P(ωi)
where P(ωi) is the a priori probability of class ωi ; i = 1, 2, . . . , c,
P(ωi |x) is the a posteriori probability of class ωi given the value of x;
p(x) is the probability density function (pdf) of x; and p(x|ωi), i = 1,
2, . . . , c, is the class-conditional pdf of x given ωi (sometimes called
the likelihood of ωi with respect to x).
Bayes classifier
Let x ≡ [x(1), x(2), . . . , x(l)]T ∈ Rl be the feature vector of the
pattern, which results from some measurements. Also, let the
number of possible classes be equal to c, that is, ω1, . . . , ωc.
Bayes decision theory: x is assigned to the class ωi if
P(ωi|x) > P(ωj|x) for all j ≠ i
Multidimensional Gaussian PDF
A random vector x = (x1, ..., xn)^T follows the Gaussian distribution N(m, S) if

p(x) = (1 / ((2π)^{n/2} |S|^{1/2})) exp(−(1/2) (x − m)^T S^{−1} (x − m))

where m = E[x], S = E[(x − m)(x − m)^T], and |S| is the determinant of S.

Special case n = 1, N(m, σ²):

p(x) = (1 / ((2π)^{1/2} σ)) exp(−(x − m)² / (2σ²))
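The density above can be sketched in NumPy. The function name and structure here are illustrative, not part of the course toolbox:

```python
import numpy as np

def gauss_pdf(x, m, S):
    """Evaluate the N(m, S) density at x, per the formula above."""
    n = len(m)
    d = x - m
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(S))
    return float(np.exp(-0.5 * d @ np.linalg.inv(S) @ d) / norm)

# Sanity check: at x = m with S = I (2-D), the density is 1/(2*pi)
print(gauss_pdf(np.zeros(2), np.zeros(2), np.eye(2)))
```

For n = 1 this reduces to the familiar scalar formula with S = σ².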
Example
Consider a 2-class classification task in the 2-dimensional space,
where the data in both classes, ω1, ω2, are distributed according to
the Gaussian distributions N(m1,S1) and N(m2,S2), respectively.
Let
m1 = [1, 1]^T, m2 = [3, 3]^T, S1 = S2 = I (the 2×2 identity matrix)
and assume that
P(ω1) = P(ω2) = 0.5.
Classify x = [1.8, 1.8]^T into ω1 or ω2.
Solution
P1=0.5;
P2=0.5;
m1=[1 1]'; m2=[3 3]'; S=eye(2); x=[1.8 1.8]';
p1=P1*comp_gauss_dens_val(m1,S,x);
p2=P2*comp_gauss_dens_val(m2,S,x);
The resulting values are p1 = 0.042 and p2 = 0.0189.
According to the Bayesian classifier, x is assigned to ω1
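The same computation can be reproduced in NumPy. Here comp_gauss_dens_val is a stand-in written for this sketch, mirroring the MATLAB function of the same name used above:

```python
import numpy as np

def comp_gauss_dens_val(m, S, x):
    # Stand-in for the MATLAB toolbox function: Gaussian N(m, S) density at x
    n = len(m)
    d = x - m
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(S))
    return float(np.exp(-0.5 * d @ np.linalg.inv(S) @ d) / norm)

m1, m2 = np.array([1.0, 1.0]), np.array([3.0, 3.0])
S, x = np.eye(2), np.array([1.8, 1.8])
p1 = 0.5 * comp_gauss_dens_val(m1, S, x)
p2 = 0.5 * comp_gauss_dens_val(m2, S, x)
print(round(p1, 4), round(p2, 4))  # 0.042 0.0189
```

Since p1 > p2, x is assigned to ω1, matching the MATLAB result.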
Decision-theoretic methods
• Decision (discriminant) functions
Let x = (x1, x2, ..., xn)^T be a pattern vector, and let there be
W pattern classes ω1, ω2, ..., ωW.
Functions d1(x), d2(x), ..., dW(x) are decision functions
if x belongs to ωi when di(x) > dj(x) for all j = 1, 2, ..., W, j ≠ i.
• Decision boundary
The boundary separating ωi and ωj is {x : d_{ij}(x) = di(x) − dj(x) = 0}.
Minimum distance classifier
• The prototype of class ωj is defined to be
m_j = (1/N_j) Σ_{x∈ωj} x, j = 1, ..., W, where N_j = |ωj|
• The distance of a pattern x to pattern class ωj is defined to be
D_j(x) = ||x − m_j||, j = 1, ..., W, where ||a|| = (a^T a)^{1/2}
• Assign x to pattern class ωi if D_i(x) = min{D_j(x) : j = 1, ..., W},
or equivalently, using the decision function
d_j(x) = x^T m_j − (1/2) m_j^T m_j
assign x to pattern class ωi if d_i(x) = max{d_j(x) : j = 1, ..., W}.
The boundary between ωi and ωj is
d_{ij}(x) = d_i(x) − d_j(x) = x^T (m_i − m_j) − (1/2)(m_i − m_j)^T (m_i + m_j) = 0
Example
m1 = (4.3, 1.3)^T, m2 = (1.5, 0.3)^T
d1(x) = 4.3 x1 + 1.3 x2 − 10.1
d2(x) = 1.5 x1 + 0.3 x2 − 1.17
d_{1,2}(x) = d1(x) − d2(x) = 2.8 x1 + 1.0 x2 − 8.9 = 0
x ∈ ω1 if d_{1,2}(x) > 0
x ∈ ω2 if d_{1,2}(x) < 0
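A minimal NumPy sketch of the minimum distance classifier, using the two prototypes above (the two test points are made up for illustration):

```python
import numpy as np

def min_distance_classify(x, means):
    """Assign x to the class (1-based) with the nearest prototype."""
    d = [np.linalg.norm(x - m) for m in means]
    return int(np.argmin(d)) + 1

m1 = np.array([4.3, 1.3])
m2 = np.array([1.5, 0.3])
# Points on either side of the boundary 2.8 x1 + 1.0 x2 - 8.9 = 0
print(min_distance_classify(np.array([4.0, 1.0]), [m1, m2]))  # 1
print(min_distance_classify(np.array([1.0, 0.5]), [m1, m2]))  # 2
```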
Minimum Mahalanobis distance classifiers
x = (x1, ..., xn)^T is classified into ωi if
(x − m_i)^T S^{−1} (x − m_i) < (x − m_j)^T S^{−1} (x − m_j) for all j ≠ i
Example
x=[0.1 0.5 0.1]';
m1=[0 0 0]'; m2=[0.5 0.5 0.5]';
m=[m1 m2];
z1=euclidean_classifier(m,x)
x=[0.1 0.5 0.1]';
m1=[0 0 0]'; m2=[0.5 0.5 0.5]';
m=[m1 m2];
S=[0.8 0.01 0.01;0.01 0.2 0.01; 0.01 0.01 0.2];
z2=mahalanobis_classifier(m,S,x);
The results are z1 = 1 and z2 = 2: the Euclidean classifier assigns x
to ω1, while the Mahalanobis classifier assigns x to ω2.
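A NumPy sketch reproducing both classifiers on this example. euclidean_classify and mahalanobis_classify are illustrative stand-ins for the MATLAB functions used above:

```python
import numpy as np

def euclidean_classify(x, means):
    return int(np.argmin([np.linalg.norm(x - m) for m in means])) + 1

def mahalanobis_classify(x, means, S):
    Sinv = np.linalg.inv(S)
    return int(np.argmin([(x - m) @ Sinv @ (x - m) for m in means])) + 1

x = np.array([0.1, 0.5, 0.1])
m1, m2 = np.zeros(3), np.full(3, 0.5)
S = np.array([[0.8, 0.01, 0.01],
              [0.01, 0.2, 0.01],
              [0.01, 0.01, 0.2]])
z1 = euclidean_classify(x, [m1, m2])
z2 = mahalanobis_classify(x, [m1, m2], S)
print(z1, z2)  # 1 2
```

The two rules disagree here because S stretches distances differently along each axis.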
4. Matching by correlation
• Given an m × n template w(s,t) (or mask), find the m × n
subimage of f(x,y) that best matches w, i.e., the one with the
largest correlation.
Correlation: c(x, y) = Σ_s Σ_t w(s, t) f(x + s, y + t)
Normalized correlation:
γ(x, y) = Σ_s Σ_t [w(s,t) − w̄][f(x+s, y+t) − f̄(x,y)] / { Σ_s Σ_t [w(s,t) − w̄]² Σ_s Σ_t [f(x+s, y+t) − f̄(x,y)]² }^{1/2}
where w̄ is the average of the template
and f̄(x,y) is the average of the subimage under the template.
Correlation theorem
f(x, y) ☆ w(x, y) ⇔ F(u, v) W*(u, v)
f(x, y) w*(x, y) ⇔ F(u, v) ☆ W(u, v)
where ☆ denotes correlation and * denotes the complex conjugate.
[M, N] = size(f);
F = fft2(f);
W = conj(fft2(w, M, N)); % pad the template to the image size
g = real(ifft2(W.*F)); % correlation surface
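The same FFT-based correlation can be sketched in NumPy (the impulse image and 2×2 template here are made-up test data):

```python
import numpy as np

f = np.zeros((8, 8))
f[5, 6] = 1.0                           # image: a single impulse
w = np.ones((2, 2))                     # template
M, N = f.shape
F = np.fft.fft2(f)
W = np.conj(np.fft.fft2(w, s=(M, N)))   # pad template to image size
g = np.real(np.fft.ifft2(W * F))        # correlation surface
print(round(g.max(), 6))                # peak where template overlaps the impulse
```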
Example
Case study
• Optical character recognition (OCR)
– Preprocessing
Digitization, binarization
Noise elimination, thinning, normalization
– Feature extraction (by character, word part, or word)
Segmentation (explicit or implicit)
Detection of major features (top-down approach)
– Matching
Recognition of characters
Context verification from a knowledge base
– Understanding and action
• See the reference
Example
3. Optimum statistical classifiers
Let p(ωk|x) denote the probability that pattern x comes from class ωk,
and let L_{kj} denote the loss incurred when the classifier assigns x to
class ωj but x actually belongs to ωk. The average loss is

r_j(x) = Σ_{k=1}^{W} L_{kj} p(ωk|x) = (1/p(x)) Σ_{k=1}^{W} L_{kj} p(x|ωk) P(ωk)

Since p(x) is the same for every j, the simplified loss function is

r_j(x) = Σ_{k=1}^{W} L_{kj} p(x|ωk) P(ωk)

Optimum statistical classifier (Bayes classifier):
Assign x to ωi if r_i(x) = min{r_j(x) : j = 1, ..., W},
i.e., assign x to the class with the minimum average loss.

Example:
L_{ij} = 1 − δ_{ij}, where δ_{ij} = 1 if i = j and δ_{ij} = 0 otherwise. Then
r_j(x) = p(x) − p(x|ωj) P(ωj)
so the equivalent decision functions are
d_j(x) = p(x|ωj) P(ωj), j = 1, ..., W
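A small numerical sketch of this reduction: with the 0-1 loss, minimizing the average loss r_j(x) picks the same class as maximizing d_j(x) = p(x|ωj)P(ωj). The likelihood values below are made up for illustration:

```python
import numpy as np

likelihoods = np.array([0.08, 0.03])   # p(x|w1), p(x|w2): illustrative values
priors = np.array([0.5, 0.5])
L = 1 - np.eye(2)                      # 0-1 loss: L_kj = 1 - delta_kj

joint = likelihoods * priors           # d_j(x) = p(x|w_j) P(w_j)
risk = L.T @ joint                     # r_j(x) = sum_k L_kj p(x|w_k) P(w_k)
print(np.argmin(risk) + 1, np.argmax(joint) + 1)  # same class either way
```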
Bayes classifer for Gaussian pattern class
• Consider two pattern classes with 1-D Gaussian distributions
with means m1, m2 and standard deviations σ1, σ2:

d_j(x) = p(x|ωj) P(ωj) = (1 / ((2π)^{1/2} σj)) exp(−(x − mj)² / (2σj²)) P(ωj), j = 1, 2

• n-dimensional case:

p(x|ωj) = (1 / ((2π)^{n/2} |Cj|^{1/2})) exp(−(1/2)(x − mj)^T Cj^{−1} (x − mj))

where the mean vector and covariance matrix are estimated from the
Nj samples of class ωj:

m_j = E_j{x} ≈ (1/N_j) Σ_{x∈ωj} x
C_j = E_j{(x − m_j)(x − m_j)^T} ≈ (1/N_j) Σ_{x∈ωj} x x^T − m_j m_j^T

• Taking logarithms gives the equivalent decision functions

d_j(x) = ln P(ωj) − (1/2) ln|Cj| − (1/2)(x − mj)^T Cj^{−1} (x − mj)
Example
m1 = (1/4)(3, 1, 1)^T, m2 = (1/4)(1, 3, 3)^T

C = C1 = C2 = (1/16) [3 1 1; 1 3 −1; 1 −1 3]
C^{−1} = [8 −4 −4; −4 8 4; −4 4 8]

With C1 = C2 = C and equal priors, the decision functions reduce to
d_j(x) = x^T C^{−1} m_j − (1/2) m_j^T C^{−1} m_j
d1(x) = 4 x1 − 1.5
d2(x) = −4 x1 + 8 x2 + 8 x3 − 5.5
d1(x) − d2(x) = 8 x1 − 8 x2 − 8 x3 + 4 = 0
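The worked numbers above can be verified in NumPy (a check of this example, assuming the means and covariance as given):

```python
import numpy as np

m1 = np.array([3.0, 1.0, 1.0]) / 4
m2 = np.array([1.0, 3.0, 3.0]) / 4
C = np.array([[3.0, 1.0, 1.0],
              [1.0, 3.0, -1.0],
              [1.0, -1.0, 3.0]]) / 16
Ci = np.linalg.inv(C)

def d(x, m):
    # d_j(x) = x^T C^{-1} m_j - (1/2) m_j^T C^{-1} m_j
    return x @ Ci @ m - 0.5 * m @ Ci @ m

x = np.array([1.0, 0.0, 0.0])
print(round(d(x, m1), 4), round(d(x, m2), 4))  # 2.5 -9.5
```

At x = (1, 0, 0)^T, d1 − d2 = 12, which matches 8·1 − 0 − 0 + 4.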
A real example
Linear classifier
• Two classes:
f(x) = w1 x1 + ... + wn xn + w0 = w^T x + w0
x ∈ ω1 if f(x) > 0; otherwise x ∈ ω2
• f(x) = 0 is a separating hyperplane.
• How to obtain the coefficients, or weights, wi?
• By the perceptron algorithm:
w(t+1) = w(t) − ρ_t Σ_{x∈Y} δ_x x
where Y is the set of training samples misclassified by w(t),
ρ_t is a positive constant, and
δ_x = −1 if x ∈ ω1, δ_x = +1 if x ∈ ω2
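A batch-perceptron sketch of the update rule above, using augmented vectors so that w0 rides along as the last weight. The toy data are made up and linearly separable:

```python
import numpy as np

def perceptron(X, y, rho=0.1, max_iter=1000):
    """X: rows are augmented patterns [x1..xn, 1]; y: +1 for w1, -1 for w2."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        wrong = np.sign(X @ w) != y          # misclassified set Y
        if not wrong.any():
            break
        # delta_x = -1 for w1 samples, +1 for w2, so -y plays the role of delta
        w = w - rho * np.sum(-y[wrong, None] * X[wrong], axis=0)
    return w

X = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 1.0],   # class w1
              [2.0, 2.0, 1.0], [3.0, 2.0, 1.0]])  # class w2
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(all(np.sign(X @ w) == y))  # True
```

For separable data the loop terminates once Y is empty; otherwise it stops at max_iter.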
The Online Form of the Perceptron Algorithm
The Multiclass LS Classifier
• The classification rule is now as follows: Given x, classify
it to class ωi if