Chapter 2: Bayesian Decision Theory (Part 2)
Minimum-Error-Rate Classification
Classifiers, Discriminant Functions and Decision Surfaces
The Normal Density
All materials used in this course were taken from the textbook “Pattern Classification” by Duda et al., John Wiley & Sons, 2001
with the permission of the authors and the publisher
Minimum-Error-Rate Classification
Actions are decisions on classes.
If action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j.
Seek a decision rule that minimizes the probability of error, which is the error rate.
Dr. Djamel Bouchaffra
CSE 616 Applied Pattern Recognition, Chapter 2, Section 2.3
Introduction of the zero-one loss function:

\[
\lambda(\alpha_i \mid \omega_j) =
\begin{cases}
0 & i = j \\
1 & i \neq j
\end{cases}
\qquad i, j = 1, \dots, c
\]

Therefore, the conditional risk is:

\[
R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)
= \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)
\]

"The risk corresponding to this loss function is the average probability of error."
Minimizing the risk therefore requires maximizing P(ωi | x), since R(αi | x) = 1 − P(ωi | x). For minimum error rate:

Decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i.
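This rule is just an argmax over the posteriors. A minimal sketch; the posterior values below are hypothetical numbers, not taken from the text:

```python
# Minimum-error-rate rule: choose the class whose posterior P(omega_i | x)
# is largest. The posterior values below are hypothetical.

def decide_min_error(posteriors):
    """Return the index i that maximizes P(omega_i | x)."""
    return max(range(len(posteriors)), key=lambda i: posteriors[i])

# Hypothetical posteriors for a single observation x:
posteriors = [0.2, 0.5, 0.3]
chosen = decide_min_error(posteriors)  # index 1, i.e. omega_2
```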
Decision regions and the zero-one loss function, therefore:

Let

\[
\theta_\lambda = \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)};
\]

then decide ω1 if:

\[
\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_\lambda
\]

If λ is the zero-one loss function, which means

\[
\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\]

then \( \theta_a = \dfrac{P(\omega_2)}{P(\omega_1)} \).

If instead

\[
\lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix},
\]

then \( \theta_b = \dfrac{2\,P(\omega_2)}{P(\omega_1)} \).
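As a quick numeric check of these thresholds, assuming hypothetical priors P(ω1) = 2/3 and P(ω2) = 1/3 purely for illustration:

```python
# theta_lambda = (l12 - l22) / (l21 - l11) * P(omega_2) / P(omega_1),
# where loss[i][j] = lambda(alpha_i | omega_j). Priors below are assumed.

def theta(loss, p1, p2):
    return (loss[0][1] - loss[1][1]) / (loss[1][0] - loss[0][0]) * (p2 / p1)

p1, p2 = 2 / 3, 1 / 3                       # hypothetical priors
theta_a = theta([[0, 1], [1, 0]], p1, p2)   # zero-one loss -> P(w2)/P(w1) = 0.5
theta_b = theta([[0, 2], [1, 0]], p1, p2)   # doubled lambda_12 -> 2*P(w2)/P(w1) = 1.0

def decide(px1, px2, t):
    """Decide omega_1 (return 1) when the likelihood ratio exceeds threshold t."""
    return 1 if px1 / px2 > t else 2
```

Raising λ12 (the cost of choosing ω1 when the truth is ω2) raises the threshold, shrinking the region where we decide ω1.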
Classifiers, Discriminant Functions
and Decision Surfaces
The multi-category case
Set of discriminant functions gi(x), i = 1, …, c
The classifier assigns a feature vector x to class ωi if gi(x) > gj(x) for all j ≠ i.
Let gi(x) = −R(αi | x)
(maximum discriminant corresponds to minimum risk!)
For minimum error rate, we take gi(x) = P(ωi | x)
(maximum discriminant corresponds to maximum posterior!)
gi(x) ∝ P(x | ωi) P(ωi)
gi(x) = ln P(x | ωi) + ln P(ωi)
(ln: natural logarithm!)
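The log form is convenient to evaluate directly. A sketch with hypothetical likelihoods and priors (not from the text):

```python
import math

# g_i(x) = ln P(x | omega_i) + ln P(omega_i); decide the class with the
# largest g_i. Likelihoods and priors below are hypothetical.

def g(likelihood, prior):
    return math.log(likelihood) + math.log(prior)

likelihoods = [0.05, 0.20, 0.10]   # hypothetical P(x | omega_i)
priors = [0.5, 0.3, 0.2]           # hypothetical P(omega_i)

scores = [g(l, p) for l, p in zip(likelihoods, priors)]
best = max(range(len(scores)), key=lambda i: scores[i])  # omega_2 wins here
```

Because ln is monotonic, ranking classes by ln P(x | ωi) + ln P(ωi) gives the same decision as ranking by P(x | ωi) P(ωi).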
The feature space is divided into c decision regions:
if gi(x) > gj(x) for all j ≠ i, then x is in Ri
(Ri means: assign x to ωi)
The two-category case
A classifier is a "dichotomizer" that has two discriminant functions g1(x) and g2(x).
Let g(x) ≡ g1(x) − g2(x).
Decide ω1 if g(x) > 0; otherwise decide ω2.
The computation of g(x):

\[
g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)
\]

or, as an equivalent alternative (same sign, hence the same decision):

\[
g(x) = \ln \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}
\]
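That the two forms agree in sign can be checked numerically. All values below are hypothetical, chosen only to illustrate the computation:

```python
import math

# Compare the two forms of g(x): the posterior difference and the
# log-likelihood-ratio form. All numbers are hypothetical.

px1, px2 = 0.3, 0.1   # hypothetical P(x | omega_1), P(x | omega_2)
p1, p2 = 0.4, 0.6     # hypothetical priors

evidence = px1 * p1 + px2 * p2
g_posterior = (px1 * p1 - px2 * p2) / evidence          # P(w1|x) - P(w2|x)
g_log = math.log(px1 / px2) + math.log(p1 / p2)

same_sign = (g_posterior > 0) == (g_log > 0)            # both decide omega_1
```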
The Normal Density
Univariate density
A density that is analytically tractable
A continuous density
Many processes are asymptotically Gaussian
Handwritten characters and speech sounds can be viewed as an ideal prototype corrupted by random processes (central limit theorem)
\[
P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right],
\]

where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance
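Transcribing the density formula above directly into code:

```python
import math

# Univariate normal density P(x), evaluated term by term from the formula.

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

peak = normal_pdf(0.0, 0.0, 1.0)   # value at the mean: 1 / sqrt(2*pi)
```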
Multivariate density
The multivariate normal density in d dimensions is:

\[
P(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^t \Sigma^{-1} (x - \mu) \right]
\]

where:
x = (x1, x2, …, xd)^t (t stands for the transpose vector form)
μ = (μ1, μ2, …, μd)^t is the mean vector
Σ is the d × d covariance matrix
|Σ| and Σ⁻¹ are its determinant and inverse, respectively
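A sketch of this density for d = 2, with the determinant, inverse, and quadratic form of the 2×2 covariance written out by hand so each term of the formula is visible:

```python
import math

# Multivariate normal density for d = 2. The normalizer (2*pi)^(d/2) * |Sigma|^(1/2)
# becomes 2*pi*sqrt(det) when d = 2.

def mvn_pdf_2d(x, mu, sigma):
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]   # Sigma^{-1} for a 2x2 matrix
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # quadratic form (x - mu)^t Sigma^{-1} (x - mu)
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

# With identity covariance the density at the mean is 1 / (2*pi):
p0 = mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

With a diagonal Σ the density factorizes into a product of univariate normals, which is a convenient sanity check.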