Data Mining Classification: Naïve Bayes Classifier Introduction to Data Mining

Download Report

Transcript Data Mining Classification: Naïve Bayes Classifier Introduction to Data Mining

Data Mining
Classification: Naïve Bayes Classifier
Lecture Notes for Chapter 4 &5
Introduction to Data Mining
by
Tan, Steinbach, Kumar
© Tan,Steinbach, Kumar
Introduction to Data Mining
4/18/2004
1
Classification: Definition
• Given a collection of records (training
set )
– Each record contains a set of attributes, one of the
attributes is the class.
• Find a model for class attribute as a
function of the values of other attributes.
• Goal: previously unseen records should
be assigned a class as accurately as
possible.
– A test set is used to determine the accuracy of the model. Usually,
the given data set is divided into training and test sets, with training
set used to build the model and test set used to validate it.
Illustrating Classification Task
Tid
Attrib1
Attrib2
Attrib3
Class
1
Yes
Large
125K
No
2
No
Medium
100K
No
3
No
Small
70K
No
4
Yes
Medium
120K
No
5
No
Large
95K
Yes
6
No
Medium
60K
No
7
Yes
Large
220K
No
8
No
Small
85K
Yes
9
No
Medium
75K
No
10
No
Small
90K
Yes
Learning
algorithm
Induction
Learn
Model
Model
10
Training Set
Tid
Attrib1
Attrib2
11
No
Small
55K
?
12
Yes
Medium
80K
?
13
Yes
Large
110K
?
14
No
Small
95K
?
15
No
Large
67K
?
10
Test Set
Attrib3
Apply
Model
Class
Deduction
Examples of Classification Task
• Predicting tumor cells as benign or malignant
• Classifying credit card transactions
as legitimate or fraudulent
• Classifying secondary structures of protein
as alpha-helix, beta-sheet, or random
coil
• Categorizing news stories as finance,
weather, entertainment, sports, etc
Classification Techniques
•
•
•
•
•
Decision Tree based Methods
Rule-based Methods
Memory based reasoning
Neural Networks
Naïve Bayes and Bayesian Belief
Networks
• Support Vector Machines
Example of a Decision Tree
Tid Refund Marital
Status
Taxable
Income Cheat
1
Yes
Single
125K
No
2
No
Married
100K
No
3
No
Single
70K
No
4
Yes
Married
120K
No
5
No
Divorced 95K
Yes
6
No
Married
No
7
Yes
Divorced 220K
No
8
No
Single
85K
Yes
9
No
Married
75K
No
10
No
Single
90K
Yes
60K
Splitting Attributes
Refund
Yes
No
NO
MarSt
Single, Divorced
TaxInc
< 80K
NO
NO
> 80K
YES
10
Training Data
Married
Model: Decision Tree
Another Example of Decision
Tree
MarSt
10
Tid Refund Marital
Status
Taxable
Income Cheat
1
Yes
Single
125K
No
2
No
Married
100K
No
3
No
Single
70K
No
4
Yes
Married
120K
No
5
No
Divorced 95K
Yes
6
No
Married
No
7
Yes
Divorced 220K
No
8
No
Single
85K
Yes
9
No
Married
75K
No
10
No
Single
90K
Yes
60K
Married
NO
Single,
Divorced
Refund
No
Yes
NO
TaxInc
< 80K
NO
> 80K
YES
There could be more than one tree that
fits the same data!
Decision Tree Classification
Task
Tid
Attrib1
Attrib2
Attrib3
Class
1
Yes
Large
125K
No
2
No
Medium
100K
No
3
No
Small
70K
No
4
Yes
Medium
120K
No
5
No
Large
95K
Yes
6
No
Medium
60K
No
7
Yes
Large
220K
No
8
No
Small
85K
Yes
9
No
Medium
75K
No
10
No
Small
90K
Yes
Tree
Induction
algorithm
Induction
Learn
Model
Model
10
Training Set
Tid
Attrib1
Attrib2
11
No
Small
55K
?
12
Yes
Medium
80K
?
13
Yes
Large
110K
?
14
No
Small
95K
?
15
No
Large
67K
?
10
Test Set
Attrib3
Apply
Model
Class
Deduction
Decision
Tree
Apply Model to Test Data
Test Data
Start from the root of tree.
Refund
Yes
10
No
NO
MarSt
Single, Divorced
TaxInc
< 80K
NO
Married
NO
> 80K
YES
Refund Marital
Status
Taxable
Income Cheat
No
80K
Married
?
Apply Model to Test Data
Test Data
Refund
Yes
10
No
NO
MarSt
Single, Divorced
TaxInc
< 80K
NO
Married
NO
> 80K
YES
Refund Marital
Status
Taxable
Income Cheat
No
80K
Married
?
Apply Model to Test Data
Test Data
Refund
Yes
10
No
NO
MarSt
Single, Divorced
TaxInc
< 80K
NO
Married
NO
> 80K
YES
Refund Marital
Status
Taxable
Income Cheat
No
80K
Married
?
Apply Model to Test Data
Test Data
Refund
Yes
10
No
NO
MarSt
Single, Divorced
TaxInc
< 80K
NO
Married
NO
> 80K
YES
Refund Marital
Status
Taxable
Income Cheat
No
80K
Married
?
Apply Model to Test Data
Test Data
Refund
Yes
10
No
NO
MarSt
Single, Divorced
TaxInc
< 80K
NO
Married
NO
> 80K
YES
Refund Marital
Status
Taxable
Income Cheat
No
80K
Married
?
Apply Model to Test Data
Test Data
Refund
Yes
Refund Marital
Status
Taxable
Income Cheat
No
80K
Married
?
10
No
NO
MarSt
Single, Divorced
TaxInc
< 80K
NO
Married
NO
> 80K
YES
Assign Cheat to “No”
Bayes Classifier
• A probabilistic framework for solving
classification problems
P ( A, C )
P (C | A) 
P ( A)
• Conditional Probability:
P ( A, C )
P( A | C ) 
P (C )
• Bayes theorem:
P( A | C ) P(C )
P(C | A) 
P( A)
Example of Bayes Theorem
• Given:
– A doctor knows that meningitis causes stiff neck 50% of the
time
– Prior probability of any patient having meningitis is 1/50,000
– Prior probability of any patient having stiff neck is 1/20
• If a patient has stiff neck, what’s the
probability he/she has meningitis?
P( S | M ) P( M ) 0.5 1 / 50000
P( M | S ) 

 0.0002
P( S )
1 / 20
Bayesian Classifiers
• Consider each attribute and class label as
random variables
• Given a record with attributes (A1, A2,…,An)
– Goal is to predict class C
– Specifically, we want to find the value of C that
maximizes P(C| A1, A2,…,An )
• Can we estimate P(C| A1, A2,…,An ) directly
from data?
Bayesian Classifiers
• Approach:
– compute the posterior probability P(C | A1, A2, …, An) for
all values of C using the Bayes theorem
P (C | A A  A ) 
1
2
n
P ( A A  A | C ) P (C )
P( A A  A )
1
2
n
1
2
n
– Choose value of C that maximizes
P(C | A1, A2, …, An)
– Equivalent to choosing value of C that maximizes
P(A1, A2, …, An|C) P(C)
• How to estimate P(A1, A2, …, An | C )?
Naïve Bayes Classifier
• Assume independence among attributes Ai
when class is given:
– P(A1, A2, …, An |C) = P(A1| Cj) P(A2| Cj)… P(An| Cj)
– Can estimate P(Ai| Cj) for all Ai and Cj.
– New point is classified to Cj if P(Cj)  P(Ai| Cj) is
maximal.
How to Estimate Probabilities
l
l
s
a
a
u
c
c
from
Data?
i
i
o
r
r
u
s
in
go
go
t
a
c
Tid
10
Refund
e
t
a
c
e
n
o
c
t
as
l
c
Marital
Status
Taxable
Income
Evade
1
Yes
Single
125K
No
2
No
Married
100K
No
3
No
Single
70K
No
4
Yes
Married
120K
No
5
No
Divorced
95K
Yes
6
No
Married
60K
No
7
Yes
Divorced
220K
No
8
No
Single
85K
Yes
9
No
Married
75K
No
10
No
Single
90K
Yes
• Class: P(C) = Nc/N
– e.g., P(No) = 7/10,
P(Yes) = 3/10
• For discrete attributes:
P(Ai | Ck) = |Aik|/ kNc
– where |Aik| is number of
instances having attribute
Ai and belongs to class Ck
– Examples:
P(Status=Married|No) = 4/7
P(Refund=Yes|Yes)=0
How to Estimate Probabilities
from Data?
• For continuous attributes:
– Discretize the range into bins
• one ordinal attribute per bin
• violates independence assumption
k
– Two-way split: (A < v) or (A > v)
• choose only one of the two splits as new attribute
– Probability density estimation:
• Assume attribute follows a normal distribution
• Use data to estimate parameters of distribution
(e.g., mean and standard deviation)
• Once probability distribution is known, can use it
to estimate the conditional probability P(Ai|c)
l
l
How togo Estimate
Probabilities
from
Data
?
o
n
s
i
g
a
c
i
r
c
Tid
e
at
Refund
c
a
c
i
r
e
at
Marital
Status
c
t
n
o
Taxable
Income
u
o
u
s
as
l
c
Evade
1
Yes
Single
125K
No
2
No
Married
100K
No
3
No
Single
70K
No
4
Yes
Married
120K
No
5
No
Divorced
95K
Yes
6
No
Married
60K
No
7
Yes
Divorced
220K
No
8
No
Single
85K
Yes
9
No
Married
75K
No
10
No
Single
90K
Yes
• Normal distribution:
1
P( A | c ) 
e
2
i
j

( Ai   ij ) 2
2  ij2
2
ij
– One for each (Ai,ci) pair
• For (Income, Class=No):
– If Class=No
• sample mean = 110
• sample variance = 2975
10
1
P( Income  120 | No) 
e
2 (54.54)

( 120110) 2
2 ( 2975)
 0.0072
Example of Naïve Bayes
Given a Test Record:
Classifier
X  (Refund  No, Married, Income  120K)
naive Bayes Classifier:
P(Refund=Yes|No) = 3/7
P(Refund=No|No) = 4/7
P(Refund=Yes|Yes) = 0
P(Refund=No|Yes) = 1
P(Marital Status=Single|No) = 2/7
P(Marital Status=Divorced|No)=1/7
P(Marital Status=Married|No) = 4/7
P(Marital Status=Single|Yes) = 2/7
P(Marital Status=Divorced|Yes)=1/7
P(Marital Status=Married|Yes) = 0
For taxable income:
If class=No:
sample mean=110
sample variance=2975
If class=Yes: sample mean=90
sample variance=25

P(X|Class=No) = P(Refund=No|Class=No)
 P(Married| Class=No)
 P(Income=120K| Class=No)
= 4/7  4/7  0.0072 = 0.0024

P(X|Class=Yes) = P(Refund=No| Class=Yes)
 P(Married| Class=Yes)
 P(Income=120K| Class=Yes)
= 1  0  1.2  10-9 = 0
Since P(X|No)P(No) > P(X|Yes)P(Yes)
Therefore P(No|X) > P(Yes|X)
=> Class = No
Naïve Bayes Classifier
• If one of the conditional probability is zero,
then the entire expression becomes zero
• Probability estimation:
N ic
Original : P( Ai | C ) 
Nc
N ic  1
Laplace : P( Ai | C ) 
Nc  c
N ic  mp
m - estimate : P( Ai | C ) 
Nc  m
c: number of classes
p: prior probability
m: parameter
Name
human
python
salmon
whale
frog
komodo
bat
pigeon
cat
leopard shark
turtle
penguin
porcupine
eel
salamander
gila monster
platypus
owl
dolphin
eagle
Give Birth
yes
Example of Naïve Bayes
Classifier
Give Birth
yes
no
no
yes
no
no
yes
no
yes
yes
no
no
yes
no
no
no
no
no
yes
no
Can Fly
no
no
no
no
no
no
yes
yes
no
no
no
no
no
no
no
no
no
yes
no
yes
Can Fly
no
Live in Water Have Legs
no
no
yes
yes
sometimes
no
no
no
no
yes
sometimes
sometimes
no
yes
sometimes
no
no
no
yes
no
Class
yes
no
no
no
yes
yes
yes
yes
yes
no
yes
yes
yes
no
yes
yes
yes
yes
no
yes
mammals
non-mammals
non-mammals
mammals
non-mammals
non-mammals
mammals
non-mammals
mammals
non-mammals
non-mammals
non-mammals
mammals
non-mammals
non-mammals
non-mammals
mammals
non-mammals
mammals
non-mammals
Live in Water Have Legs
yes
no
Class
?
A: attributes
M: mammals
N: non-mammals
6 6 2 2
P( A | M )      0.06
7 7 7 7
1 10 3 4
P( A | N )      0.0042
13 13 13 13
7
P( A | M ) P ( M )  0.06   0.021
20
13
P( A | N ) P( N )  0.004   0.0027
20
P(A|M)P(M) > P(A|N)P(N)
=> Mammals
Naïve Bayes (Summary)
• Robust to isolated noise points
• Handle missing values by ignoring the instance during
probability estimate calculations
• Robust to irrelevant attributes
• Independence assumption may not hold for some
attributes
– Use other techniques such as Bayesian Belief
Networks (BBN)