Transcript Chapter 2

Chapter 2:
Bayesian Decision Theory (Part 2)

Minimum-Error-Rate Classification

Classifiers, Discriminant Functions and Decision Surfaces

The Normal Density
All materials used in this course were taken from the textbook “Pattern Classification” by Duda et al., John Wiley & Sons, 2001
with the permission of the authors and the publisher
Minimum-Error-Rate Classification

Actions are decisions on classes.
If action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j.

Seek a decision rule that minimizes the probability of error, which is the error rate.
Introduction of the zero-one loss function:

$$\lambda(\alpha_i, \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \ldots, c$$

Therefore, the conditional risk is:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\,P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$
“The risk corresponding to this loss function is the average probability of error.”
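As a concrete illustration, here is a minimal Python sketch of this computation; the posterior values and the helper name `conditional_risk` are assumptions made up for the example:

```python
import numpy as np

def conditional_risk(loss, posteriors):
    """R(alpha_i | x) = sum_j loss[i, j] * P(omega_j | x), for every action i."""
    return loss @ posteriors

c = 3
zero_one = np.ones((c, c)) - np.eye(c)      # 0 on the diagonal, 1 elsewhere

posteriors = np.array([0.6, 0.3, 0.1])      # hypothetical P(omega_j | x)
risks = conditional_risk(zero_one, posteriors)

print(risks)              # [0.4 0.7 0.9]
print(1.0 - posteriors)   # the same: R(alpha_i | x) = 1 - P(omega_i | x)
```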
Minimizing the risk requires maximizing P(ωi | x)
(since R(αi | x) = 1 − P(ωi | x))

For minimum error rate:

Decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i
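In code, this rule reduces to an argmax over the posteriors; a minimal sketch with hypothetical values:

```python
import numpy as np

def decide(posteriors):
    """Minimum-error-rate rule: choose the class with the largest posterior."""
    return int(np.argmax(posteriors))

print(decide(np.array([0.6, 0.3, 0.1])))   # 0, i.e. decide omega_1
```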
Decision regions and the zero-one loss function, therefore:

Let

$$\theta_\lambda = \frac{\lambda_{12}-\lambda_{22}}{\lambda_{21}-\lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)},$$

then decide ω1 if:

$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_\lambda$$

If λ is the zero-one loss function, which means

$$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

then

$$\theta_\lambda = \frac{P(\omega_2)}{P(\omega_1)} = \theta_a$$

If instead

$$\lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix},$$

then

$$\theta_\lambda = \frac{2\,P(\omega_2)}{P(\omega_1)} = \theta_b$$
Classifiers, Discriminant Functions and Decision Surfaces

The multi-category case

Set of discriminant functions gi(x), i = 1, …, c
The classifier assigns a feature vector x to class ωi if:

gi(x) > gj(x) for all j ≠ i
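A minimal sketch of such a classifier; the toy linear discriminants below are assumptions made up for the example:

```python
import numpy as np

def classify(x, discriminants):
    """Assign x to the class omega_i whose discriminant g_i(x) is largest."""
    return int(np.argmax([g(x) for g in discriminants]))

# Three toy linear discriminants g_i(x) = w_i . x
g1 = lambda x: np.dot([ 1.0,  0.0], x)
g2 = lambda x: np.dot([ 0.0,  1.0], x)
g3 = lambda x: np.dot([-1.0, -1.0], x)

print(classify(np.array([2.0, 1.0]), [g1, g2, g3]))   # 0 -> omega_1
```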
Let gi(x) = −R(αi | x)
(max. discriminant corresponds to min. risk!)

For the minimum error rate, we take
gi(x) = P(ωi | x)
(max. discriminant corresponds to max. posterior!)

Any monotonically increasing transformation of gi gives the same decisions, e.g.:
gi(x) = P(x | ωi) P(ωi)
gi(x) = ln P(x | ωi) + ln P(ωi)
(ln: natural logarithm!)
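For example, assuming univariate Gaussian class-conditional densities (anticipating Section 2.5) and made-up parameters, the logarithmic form can be sketched as:

```python
import math

def log_gaussian(x, mu, sigma):
    """ln P(x | omega_i) for a univariate normal class-conditional density."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(math.sqrt(2 * math.pi) * sigma)

def g(x, mu, sigma, prior):
    """g_i(x) = ln P(x | omega_i) + ln P(omega_i)."""
    return log_gaussian(x, mu, sigma) + math.log(prior)

# Two hypothetical classes; decide the one with the larger discriminant.
g1 = g(0.5, mu=0.0, sigma=1.0, prior=0.7)
g2 = g(0.5, mu=2.0, sigma=1.0, prior=0.3)
print("omega_1" if g1 > g2 else "omega_2")   # omega_1
```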
Feature space is divided into c decision regions:
if gi(x) > gj(x) for all j ≠ i, then x is in Ri
(x in Ri means: assign x to ωi)

The two-category case

A two-category classifier is a “dichotomizer” that has two discriminant functions g1 and g2.
Let g(x) ≡ g1(x) − g2(x)
Decide ω1 if g(x) > 0; otherwise decide ω2
The computation of g(x):

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$

or, equivalently for the decision,

$$g(x) = \ln\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$
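A direct transcription of this dichotomizer in Python; the likelihood and prior values are hypothetical:

```python
import math

def g(px_w1, px_w2, p1, p2):
    """g(x) = ln[P(x|omega_1)/P(x|omega_2)] + ln[P(omega_1)/P(omega_2)]."""
    return math.log(px_w1 / px_w2) + math.log(p1 / p2)

value = g(px_w1=0.2, px_w2=0.1, p1=0.6, p2=0.4)              # assumed numbers
print("decide omega_1" if value > 0 else "decide omega_2")   # decide omega_1
```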
The Normal Density

Univariate density

A density that is analytically tractable
Continuous density
Many processes are asymptotically Gaussian
Handwritten characters and speech sounds can be viewed as ideal prototypes corrupted by a random process (central limit theorem)
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right],$$

where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance
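A direct transcription of this density in Python (the evaluation point and parameters are arbitrary):

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

print(normal_pdf(0.0, mu=0.0, sigma=1.0))   # ~0.3989, the standard normal peak
```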
Multivariate density
The multivariate normal density in d dimensions is:

$$P(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\,\exp\!\left[-\frac{1}{2}(x-\mu)^{t}\,\Sigma^{-1}(x-\mu)\right]$$

where:
x = (x1, x2, …, xd)^t (t stands for the transpose vector form)
μ = (μ1, μ2, …, μd)^t is the mean vector
Σ is the d×d covariance matrix
|Σ| and Σ⁻¹ are its determinant and inverse, respectively
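And a sketch of the multivariate form with NumPy; the 2-D example values are assumptions:

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Multivariate normal density in d dimensions.
    x, mu: length-d vectors; sigma: d-by-d covariance matrix."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm)

# Hypothetical 2-D example with identity covariance.
print(mvn_pdf(np.array([0.0, 0.0]), np.array([0.0, 0.0]), np.eye(2)))
# ~0.1592 = 1 / (2*pi)
```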