Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
Pattern Classification, Chapter 2 (Part 2)
Chapter 2 (Part 2):
Bayesian Decision Theory
(Sections 2.3-2.5)
• Minimum-Error-Rate Classification
• Classifiers, Discriminant Functions and Decision Surfaces
• The Normal Density
Minimum-Error-Rate Classification
• Actions are decisions on classes.
  If action α_i is taken and the true state of nature is ω_j, then the decision is correct if i = j and in error if i ≠ j.
• Seek a decision rule that minimizes the probability of error, which is the error rate.
• Introduction of the zero-one loss function:

$$\lambda(\alpha_i, \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \ldots, c$$

Therefore, the conditional risk is:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$

“The risk corresponding to this loss function is the average probability of error.”
• Minimizing the risk requires maximizing P(ω_i | x)
  (since R(α_i | x) = 1 − P(ω_i | x))
• For minimum error rate:
  Decide ω_i if P(ω_i | x) > P(ω_j | x) ∀ j ≠ i
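The minimum-error-rate rule above can be sketched in a few lines of Python; this is a minimal sketch assuming the posteriors P(ω_i | x) have already been computed (the values below are hypothetical):

```python
import numpy as np

def decide(posteriors):
    """Minimum-error-rate rule: choose the class whose posterior
    P(w_i | x) is largest."""
    return int(np.argmax(posteriors))

# Hypothetical posteriors for a 3-class problem at some observed x.
print(decide([0.2, 0.5, 0.3]))  # class index 1 has the largest posterior
```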
• Decision regions under a general loss function. Let

$$\theta_\lambda = \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$$

then decide ω_1 if:

$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_\lambda$$

• If λ is the zero-one loss function, which means

$$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

then

$$\theta_\lambda = \theta_a = \frac{P(\omega_2)}{P(\omega_1)}$$

• If

$$\lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$$

then decide ω_1 if

$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_b = \frac{2\, P(\omega_2)}{P(\omega_1)}$$
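The likelihood-ratio threshold rule can be sketched as follows; the loss entries and probabilities below are hypothetical placeholders, with defaults set to the zero-one loss:

```python
def decide_w1(p_x_w1, p_x_w2, p_w1, p_w2,
              l11=0.0, l12=1.0, l21=1.0, l22=0.0):
    """Decide w1 iff the likelihood ratio p(x|w1)/p(x|w2) exceeds
    theta = (l12 - l22)/(l21 - l11) * P(w2)/P(w1).
    Defaults are the zero-one loss, so theta reduces to P(w2)/P(w1)."""
    theta = (l12 - l22) / (l21 - l11) * (p_w2 / p_w1)
    return p_x_w1 / p_x_w2 > theta

# Zero-one loss, equal priors: likelihood ratio 3 > 1, so decide w1.
print(decide_w1(0.6, 0.2, 0.5, 0.5))  # True
```

Raising λ_12 (the θ_b case) raises the threshold, so the same observation can flip to ω_2 when misclassifying ω_2 samples is costlier.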
Classifiers, Discriminant Functions
and Decision Surfaces
• The multi-category case
• Set of discriminant functions g_i(x), i = 1, …, c
• The classifier assigns a feature vector x to class ω_i if:
  g_i(x) > g_j(x) ∀ j ≠ i
• Let g_i(x) = −R(α_i | x)
  (max. discriminant corresponds to min. risk!)
• For the minimum error rate, we take
  g_i(x) = P(ω_i | x)
  (max. discriminant corresponds to max. posterior!)
  Equivalently:
  g_i(x) = P(x | ω_i) P(ω_i)
  g_i(x) = ln P(x | ω_i) + ln P(ω_i)
  (ln: natural logarithm)
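The log form g_i(x) = ln P(x | ω_i) + ln P(ω_i) can be evaluated directly once class-conditional densities are chosen; this sketch assumes univariate Gaussian class-conditionals with hypothetical means, variances, and priors:

```python
import math

def g(x, mean, var, prior):
    """Discriminant g_i(x) = ln p(x|w_i) + ln P(w_i) for a
    univariate Gaussian class-conditional density."""
    log_lik = -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return log_lik + math.log(prior)

# Two hypothetical classes; x = 0.5 is closer to the class with mean 0.
x = 0.5
print(g(x, 0.0, 1.0, 0.5) > g(x, 2.0, 1.0, 0.5))  # True
```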
• Feature space divided into c decision regions:
  if g_i(x) > g_j(x) ∀ j ≠ i, then x is in R_i
  (R_i means assign x to ω_i)
• The two-category case
• A classifier is a “dichotomizer” that has two discriminant functions g_1 and g_2
  Let g(x) ≡ g_1(x) − g_2(x)
  Decide ω_1 if g(x) > 0; otherwise decide ω_2
• The computation of g(x):

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$

$$= \ln \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$
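In the log-ratio form the dichotomizer needs only the two likelihoods and the two priors at a given x; the numbers below are hypothetical:

```python
import math

def g(lik1, lik2, prior1, prior2):
    """Dichotomizer g(x) = ln[p(x|w1)/p(x|w2)] + ln[P(w1)/P(w2)];
    decide w1 when g(x) > 0."""
    return math.log(lik1 / lik2) + math.log(prior1 / prior2)

# Hypothetical likelihoods at some x, equal priors: g(x) = ln 3 > 0.
print("w1" if g(0.3, 0.1, 0.5, 0.5) > 0 else "w2")  # prints "w1"
```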
The Normal Density
• Univariate density
• Density which is analytically tractable
• Continuous density
• Many processes are asymptotically Gaussian
• Handwritten characters and speech sounds can be viewed as an ideal or prototype corrupted by a random process (central limit theorem)
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]$$

where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance
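The univariate density can be evaluated directly from the formula; as a check, at x = μ with σ = 1 the value is the familiar peak 1/√(2π) ≈ 0.3989:

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density:
    p(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5*((x - mu)/sigma)**2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

# Peak value at x = mu for sigma = 1.
print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```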
• Multivariate density
• Multivariate normal density in d dimensions is:
$$P(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^t \Sigma^{-1} (x - \mu) \right]$$

where:
x = (x_1, x_2, …, x_d)^t (t stands for the transpose vector form)
μ = (μ_1, μ_2, …, μ_d)^t mean vector
Σ = d×d covariance matrix
|Σ| and Σ^{-1} are its determinant and inverse, respectively
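The multivariate density is a direct translation of the formula into NumPy; as a sanity check, with μ = 0 and Σ = I in d = 2 the density at the mean is 1/(2π):

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Multivariate normal density in d dimensions:
    p(x) = exp(-0.5 (x-mu)^t Sigma^-1 (x-mu)) / ((2pi)^(d/2) |Sigma|^(1/2))."""
    d = x.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm)

# Identity covariance in d = 2: value at the mean is 1/(2*pi).
print(round(mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2)), 4))  # 0.1592
```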
Appendix
• Variance = S²

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

• Standard deviation = S
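The unbiased (n − 1 denominator) sample variance above can be computed as follows, on a small hypothetical sample:

```python
def sample_variance(xs):
    """Unbiased sample variance S^2 = (1/(n-1)) * sum_i (x_i - xbar)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Hypothetical sample with mean 5; squared deviations sum to 32, so S^2 = 32/7.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(data))
```

The standard deviation S is simply the square root of this value.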
Bayes' theorem

            A               ¬A
B           A and B         ¬A and B
¬B          A and ¬B        ¬A and ¬B

$$P(A \mid B) = \frac{P(A)\, P(B \mid A)}{P(A)\, P(B \mid A) + P(\neg A)\, P(B \mid \neg A)}$$

$$P(A \mid B) = \frac{P(A)\, P(B \mid A)}{P(B)}$$
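Bayes' theorem with the denominator expanded by total probability can be checked numerically; the probabilities below are hypothetical:

```python
def posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Bayes' theorem with P(B) expanded by total probability:
    P(A|B) = P(A)P(B|A) / (P(A)P(B|A) + P(notA)P(B|notA))."""
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
    return p_a * p_b_given_a / p_b

# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|notA) = 0.05.
print(round(posterior(0.01, 0.9, 0.05), 4))  # 0.1538
```

Even with P(B | A) = 0.9, the small prior P(A) keeps the posterior modest, which is the standard caution about rare events.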