An Introduction to Pattern Recognition (Part Two)


Slide 1

Pattern Recognition
Ku-Yaw Chang
[email protected]
Assistant Professor, Department of Computer Science and Information Engineering
Da-Yeh University


Slide 2

Outline
Introduction
Features and Classes
Supervised vs. Unsupervised
Statistical vs. Structural (Syntactic)
Statistical Decision Theory



Slide 3

Supervised vs. Unsupervised

Supervised learning: using a training set of patterns of known class to classify additional similar samples.

Unsupervised learning: dividing samples into groups or clusters based on measures of similarity, without any prior knowledge of class membership.
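To make the distinction concrete, here is a minimal Python sketch; the height data and class names are made up for illustration. A supervised nearest-class-mean classifier uses the labels, while an unsupervised 2-means clustering ignores them.

    # Made-up 1-D height measurements with known class labels.
    heights = [150.0, 152.0, 155.0, 168.0, 172.0, 175.0]
    labels = ["short", "short", "short", "tall", "tall", "tall"]

    def mean(xs):
        return sum(xs) / len(xs)

    # Supervised: class means are estimated from the labeled training set.
    class_means = {c: mean([x for x, y in zip(heights, labels) if y == c])
                   for c in set(labels)}

    def classify(x):
        # Assign a new sample to the class whose mean is closest.
        return min(class_means, key=lambda c: abs(x - class_means[c]))

    print(classify(160.0))  # uses the known classes of the training set

    # Unsupervised: 2-means clustering, labels never consulted.
    centers = [heights[0], heights[-1]]  # crude initialization
    for _ in range(10):                  # a few refinement passes
        groups = [[], []]
        for x in heights:
            groups[0 if abs(x - centers[0]) < abs(x - centers[1]) else 1].append(x)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]

    print(centers)  # clusters discovered purely from similarity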



Slide 4

Supervised vs. Unsupervised

Example: dividing a class of students into two groups.

Supervised learning: the two groups are specified in advance, e.g. male features vs. female features.

Unsupervised learning: the samples may cluster along any measure of similarity, e.g. male vs. female, tall vs. short, with vs. without glasses.



Slide 5

Statistical vs. Structural

Statistical PR: features are obtained by manipulating the measurements as purely numerical (or Boolean) variables.

Structural (Syntactic) PR: features are designed in some intuitive way corresponding to human perception of the objects.



Slide 6

Statistical vs. Structural

Example: Optical Character Recognition (OCR) can be approached with either statistical PR or structural PR.



Slide 7

Statistical Decision Theory
An automated classification system is built from classified data sets and selected features.



Slide 8

Statistical Decision Theory
Hypothetical Basketball Association (HBA)

apg: average number of points per game.

Goal: predict the winner of a game, based on the difference between the home team's apg and the visiting team's apg for previous games.

Training set: scores of previously played games, with the home team classified as a winner or a loser.



Slide 9

Statistical Decision Theory
Given a game to be played, predict whether the home team will be a winner or a loser using the feature:

dapg = Home Team apg – Visiting Team apg



Slide 10

Statistical Decision Theory
Game  dapg   Home Team      Game  dapg   Home Team
 1     1.3   Won            16    -3.1   Won
 2    -2.7   Lost           17     1.7   Won
 3    -0.5   Won            18     2.8   Won
 4    -3.2   Lost           19     4.6   Won
 5     2.3   Won            20     3.0   Won
 6     5.1   Won            21     0.7   Lost
 7    -5.4   Lost           22    10.1   Won
 8     8.2   Won            23     2.5   Won
 9   -10.8   Lost           24     0.8   Lost
10    -0.4   Won            25    -5.0   Lost
11    10.5   Won            26     8.1   Won
12    -1.1   Lost           27    -7.1   Lost
13     2.5   Won            28     2.7   Won
14    -4.2   Won            29   -10.0   Lost
15    -3.4   Lost           30    -6.5   Won



Slide 11

Statistical Decision Theory
A histogram of dapg
[Figure: histogram of dapg with the number of games (0 to 10) on the vertical axis and dapg bins from about -11 to 11 on the horizontal axis; wins and losses are shown as separate bars]


Slide 12

Statistical Decision Theory
The classification cannot be performed perfectly using the single feature dapg, so each game is assigned to the class with the higher probability of membership, or with the smallest expected penalty.

Decision boundary or threshold: a value T for the home team such that

Won is predicted when dapg is greater than T
Lost is predicted when dapg is less than or equal to T



Slide 13

Statistical Decision Theory
Example with T = -1:

Home team's apg = 103.4; visiting team's apg = 102.1.
dapg = 103.4 – 102.1 = 1.3, and 1.3 > T, so the home team is predicted to win the game.

Among candidate thresholds such as T = 0.8 or T = -6.5, T = 0.8 achieves the minimum error rate on the training set; a brute-force check follows.
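Here is a minimal Python sketch of that check: it scans every observed dapg value as a candidate threshold over the 30 games from the table above, using the rule of predicting a home-team win when dapg > T. Note that several thresholds can tie for the minimum error count.

    # dapg values and home-team outcomes for the 30 training games.
    dapg = [1.3, -2.7, -0.5, -3.2, 2.3, 5.1, -5.4, 8.2, -10.8, -0.4,
            10.5, -1.1, 2.5, -4.2, -3.4, -3.1, 1.7, 2.8, 4.6, 3.0,
            0.7, 10.1, 2.5, 0.8, -5.0, 8.1, -7.1, 2.7, -10.0, -6.5]
    won = [True, False, True, False, True, True, False, True, False, True,
           True, False, True, True, False, True, True, True, True, True,
           False, True, True, False, False, True, False, True, False, True]

    def errors(t):
        # Decision rule: predict a home-team win when dapg > t.
        return sum((x > t) != w for x, w in zip(dapg, won))

    best = min(errors(t) for t in set(dapg))
    ties = sorted(t for t in set(dapg) if errors(t) == best)
    print(f"minimum misclassifications: {best}, at T in {ties}")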



Slide 14

Statistical Decision Theory
An additional feature can be added to increase the accuracy of classification:

dwp = Home Team wp – Visiting Team wp

where wp denotes the winning percentage.



Slide 15

Statistical Decision Theory
Game  dapg   dwp    Home Team      Game  dapg   dwp    Home Team
 1     1.3   25.0   Won            16    -3.1    9.4   Won
 2    -2.7  -16.9   Lost           17     1.7    6.8   Won
 3    -0.5    5.3   Won            18     2.8   17.0   Won
 4    -3.2  -27.5   Lost           19     4.6   13.3   Won
 5     2.3  -18.0   Won            20     3.0  -24.0   Won
 6     5.1   31.2   Won            21     0.7  -17.8   Lost
 7    -5.4    5.8   Lost           22    10.1   44.6   Won
 8     8.2   34.3   Won            23     2.5  -22.4   Won
 9   -10.8  -56.3   Lost           24     0.8   12.3   Lost
10    -0.4   13.3   Won            25    -5.0   -3.8   Lost
11    10.5   16.3   Won            26     8.1   36.0   Won
12    -1.1  -17.6   Lost           27    -7.1  -20.6   Lost
13     2.5    5.7   Won            28     2.7   23.2   Won
14    -4.2   16.0   Won            29   -10.0  -46.9   Lost
15    -3.4   -3.4   Lost           30    -6.5   19.7   Won


Slide 16

Statistical Decision Theory
The feature vectors (dapg, dwp) can be presented as a scatterplot.
[Figure: scatterplot of dwp (vertical axis, -60 to 40) against dapg (horizontal axis, -10 to 10); each of the 30 games is marked W for a home-team win or L for a loss]
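Such a plot is easy to regenerate. Here is a minimal matplotlib sketch using the (dapg, dwp, outcome) triples from the table above; the axis limits are chosen to roughly match the original figure.

    import matplotlib.pyplot as plt

    # (dapg, dwp) pairs and outcomes for the 30 games in the table above.
    dapg = [1.3, -2.7, -0.5, -3.2, 2.3, 5.1, -5.4, 8.2, -10.8, -0.4,
            10.5, -1.1, 2.5, -4.2, -3.4, -3.1, 1.7, 2.8, 4.6, 3.0,
            0.7, 10.1, 2.5, 0.8, -5.0, 8.1, -7.1, 2.7, -10.0, -6.5]
    dwp = [25.0, -16.9, 5.3, -27.5, -18.0, 31.2, 5.8, 34.3, -56.3, 13.3,
           16.3, -17.6, 5.7, 16.0, -3.4, 9.4, 6.8, 17.0, 13.3, -24.0,
           -17.8, 44.6, -22.4, 12.3, -3.8, 36.0, -20.6, 23.2, -46.9, 19.7]
    won = [True, False, True, False, True, True, False, True, False, True,
           True, False, True, True, False, True, True, True, True, True,
           False, True, True, False, False, True, False, True, False, True]

    # Mark each game with W or L at its (dapg, dwp) position.
    for x, y, w in zip(dapg, dwp, won):
        plt.text(x, y, "W" if w else "L", ha="center", va="center")
    plt.xlim(-12, 12)
    plt.ylim(-60, 45)
    plt.xlabel("dapg")
    plt.ylabel("dwp")
    plt.title("Home-team wins (W) and losses (L)")
    plt.show()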


Slide 17

Statistical Decision Theory
The feature space can be divided into two decision regions by a straight line, called a linear decision boundary. If a feature space cannot be perfectly separated by a straight line, a more complex boundary might be used. A sketch of fitting a linear boundary follows.
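One classical way to fit a linear decision boundary is the perceptron rule, sketched below on the HBA data. The learning rate and epoch cap are arbitrary illustrative choices, and the loop only terminates early if the classes turn out to be linearly separable.

    # The same 30 games as in the earlier sketches.
    dapg = [1.3, -2.7, -0.5, -3.2, 2.3, 5.1, -5.4, 8.2, -10.8, -0.4,
            10.5, -1.1, 2.5, -4.2, -3.4, -3.1, 1.7, 2.8, 4.6, 3.0,
            0.7, 10.1, 2.5, 0.8, -5.0, 8.1, -7.1, 2.7, -10.0, -6.5]
    dwp = [25.0, -16.9, 5.3, -27.5, -18.0, 31.2, 5.8, 34.3, -56.3, 13.3,
           16.3, -17.6, 5.7, 16.0, -3.4, 9.4, 6.8, 17.0, 13.3, -24.0,
           -17.8, 44.6, -22.4, 12.3, -3.8, 36.0, -20.6, 23.2, -46.9, 19.7]
    won = [True, False, True, False, True, True, False, True, False, True,
           True, False, True, True, False, True, True, True, True, True,
           False, True, True, False, False, True, False, True, False, True]

    # Perceptron rule: nudge the weights toward each misclassified sample.
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(200):                       # illustrative epoch cap
        mistakes = 0
        for x1, x2, y in zip(dapg, dwp, won):
            t = 1 if y else -1                 # encode Won as +1, Lost as -1
            if (w1 * x1 + w2 * x2 + b) * t <= 0:
                w1 += 0.01 * t * x1            # 0.01 is an arbitrary rate
                w2 += 0.01 * t * x2
                b += 0.01 * t
                mistakes += 1
        if mistakes == 0:                      # perfect separation reached
            break

    print(f"boundary: {w1:.3f}*dapg + {w2:.3f}*dwp + {b:.3f} = 0")
    wrong = sum((w1 * x + w2 * y + b > 0) != z for x, y, z in zip(dapg, dwp, won))
    print(f"misclassified training games: {wrong} of {len(won)}")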



Slide 18

Exercise One
The values of a feature x for nine samples from class A are 1, 2, 3, 3, 4, 4, 6, 6, 8. Nine samples from class B have x values of 4, 6, 7, 7, 8, 9, 9, 10, 12. Make a histogram (with an interval width of 1) for each class and find a decision boundary (threshold) that minimizes the total number of misclassifications for this training data set.
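After working the exercise by hand, a short Python sketch can double-check the counts; the data are copied from the exercise, and the rule assumed here is to classify a sample as class A when x falls below the threshold.

    from collections import Counter

    a = [1, 2, 3, 3, 4, 4, 6, 6, 8]      # class A samples
    b = [4, 6, 7, 7, 8, 9, 9, 10, 12]    # class B samples

    # Histograms with an interval width of 1.
    print("A:", sorted(Counter(a).items()))
    print("B:", sorted(Counter(b).items()))

    # Misclassifications if a sample is called class A whenever x < t.
    def errors(t):
        return sum(x >= t for x in a) + sum(x < t for x in b)

    # Scan half-integer thresholds between the smallest and largest values.
    candidates = [v + 0.5 for v in range(min(a + b), max(a + b) + 1)]
    best = min(candidates, key=errors)
    print("best threshold:", best, "misclassifications:", errors(best))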


Slide 19

Exercise Two
Can the feature vectors (x,y) = (2,3), (3,5),
(4,2), (2,7) from class A be separated from
four samples from class B located at (6,2),
(5,4), (5,6), (3,7) by a linear decision
boundary? If so, give the equation of one
such boundary and plot it. If not, find a
boundary that separates them as well as
possible.
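As with Exercise One, a small script can confirm a hand-drawn answer. This sketch reuses the perceptron idea from Slide 17, with class A encoded as +1 and class B as -1; the perceptron finds a separating line whenever one exists, given enough epochs.

    # Feature vectors from the exercise; +1 for class A, -1 for class B.
    pts = [((2, 3), 1), ((3, 5), 1), ((4, 2), 1), ((2, 7), 1),
           ((6, 2), -1), ((5, 4), -1), ((5, 6), -1), ((3, 7), -1)]

    wx, wy, b = 0.0, 0.0, 0.0
    for _ in range(1000):                       # generous epoch cap
        mistakes = 0
        for (x, y), t in pts:
            if (wx * x + wy * y + b) * t <= 0:  # sample on the wrong side
                wx += t * x
                wy += t * y
                b += t
                mistakes += 1
        if mistakes == 0:
            print(f"separating line: {wx}*x + {wy}*y + {b} = 0")
            break
    else:
        print("no separating line found; the classes may not be separable")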