Transcript Slide 1

Statistical Reasoning:
Bayesian Probability
Yeni Herdiyeni
Departemen Ilmu Komputer
Institut Pertanian Bogor
"The World Is a Very Uncertain Place"
Uncertainty
• The President of Indonesia in 2014 will be a woman
• Next year I will graduate
• Indonesia's population is about 200 million
• Next year I will be promoted
• If the symptom is a high fever, then the disease is typhoid
• Etc.
• Uncertainty makes problems harder to solve
Uncertainty
• Randomness
• Fuzziness
Assume that we have two classes:
c1 = male, and c2 = female.
We have a person whose sex we do not know,
say "drew" or d.
Classifying drew as male or female is
equivalent to asking which is more probable:
p(male | drew) or p(female | drew)?
(Note: "Drew" can be a male or a female name,
e.g. Drew Barrymore or Drew Carey.)
By the Bayesian rule:

$$p(\text{male} \mid \text{drew}) = \frac{p(\text{drew} \mid \text{male})\; p(\text{male})}{p(\text{drew})}$$

where:
p(drew | male) is the probability of being called "drew" given that you are a male;
p(male) is the probability of being a male;
p(drew) is the probability of being named "drew" (actually irrelevant, since it is
the same for all classes).
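To make the comparison concrete, here is a minimal Python sketch; the (name, sex) rows below are invented for illustration, since the slides give no actual counts.

```python
# Hypothetical (name, sex) training data -- invented for illustration only.
people = [("drew", "male"), ("claudia", "female"), ("drew", "female"),
          ("drew", "female"), ("alberto", "male"), ("karin", "female"),
          ("nina", "female"), ("sergio", "male")]

def score(name, sex):
    """Unnormalized p(sex | name) = p(name | sex) * p(sex);
    p(name) is dropped since it is the same for both classes."""
    in_class = [n for n, s in people if s == sex]
    p_name_given_sex = in_class.count(name) / len(in_class)
    p_sex = len(in_class) / len(people)
    return p_name_given_sex * p_sex

print(score("drew", "male"))    # 1/3 * 3/8 = 0.125
print(score("drew", "female"))  # 2/5 * 5/8 = 0.250 -> classify as female
```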
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can't get any bigger than 1,
and an area of 1 would mean all worlds have A true.
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
[Venn diagram: two overlapping ovals, events A and B]
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
[Venn diagram: P(A or B) is the total area covered by A and B; by simple
addition and subtraction, the overlap P(A and B) is counted once, not twice]
Basic Probability Theory
• Probability can be expressed mathematically as a numerical index with a
range between zero (an absolute impossibility) and unity (an absolute
certainty).
• Most events have a probability index strictly between 0 and 1, which
means that each event has at least two possible outcomes: a favourable
outcome or success, and an unfavourable outcome or failure.

$$P(\text{success}) = \frac{\text{the number of successes}}{\text{the number of possible outcomes}}$$

$$P(\text{failure}) = \frac{\text{the number of failures}}{\text{the number of possible outcomes}}$$
Basic Probability Theory
• If s is the number of times success can occur, and f is the number of times
failure can occur, then

$$P(\text{success}) = p = \frac{s}{s+f} \qquad P(\text{failure}) = q = \frac{f}{s+f} \qquad p + q = 1$$

• If we throw a coin, the probability of getting a head will be equal to the
probability of getting a tail. In a single throw, s = f = 1, and therefore the
probability of getting a head (or a tail) is 0.5.
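A minimal sketch checking this empirically by simulation (the use of `random.random` and the trial count are my choices, not from the slides):

```python
import random

trials = 100_000
heads = sum(1 for _ in range(trials) if random.random() < 0.5)
print(heads / trials)  # empirical p(head), close to s / (s + f) = 0.5
```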
Conditional Probability
• Let A be an event in the world and B be another event. Suppose that
events A and B are not mutually exclusive, but occur conditionally on the
occurrence of the other. The probability that event A will occur if event B
occurs is called the conditional probability. Conditional probability is
denoted mathematically as p(A|B), in which the vertical bar represents
"given" and the complete probability expression is interpreted as
– "Conditional probability of event A occurring given that event B has occurred".

$$p(A \mid B) = \frac{\text{the number of times } A \text{ and } B \text{ can occur}}{\text{the number of times } B \text{ can occur}}$$
Conditional Probability
• The number of times A and B can occur, or the probability that both A and
B will occur, is called the joint probability of A and B. It is represented
mathematically as p(A ∩ B). The number of ways B can occur is the
probability of B, p(B), and thus

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$

• Similarly, the conditional probability of event B occurring given that event
A has occurred equals

$$p(B \mid A) = \frac{p(B \cap A)}{p(A)}$$
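As a quick check of these ratio definitions, a minimal sketch with a single fair die (the two events are my own example):

```python
outcomes = set(range(1, 7))                 # one fair die
A = {n for n in outcomes if n % 2 == 0}     # event A: the roll is even
B = {n for n in outcomes if n > 3}          # event B: the roll is > 3
# p(A|B) = (# outcomes where A and B occur) / (# outcomes where B occurs)
print(len(A & B) / len(B))                  # 2/3
```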
Conditional Probability
Hence

$$p(B \cap A) = p(B \mid A)\; p(A)$$

and, since p(A ∩ B) = p(B ∩ A),

$$p(A \cap B) = p(B \mid A)\; p(A)$$

Substituting the last equation into the equation

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$

yields the Bayesian rule.
Naïve Bayes Classifier
Thomas Bayes
1702 - 1761
We will start off with a visual intuition, before looking at the math…
Bayesian Rule

$$p(A \mid B) = \frac{p(B \mid A)\; p(A)}{p(B)}$$

where:
p(A|B) is the conditional probability that event A occurs given that event B
has occurred;
p(B|A) is the conditional probability of event B occurring given that event A
has occurred;
p(A) is the probability of event A occurring;
p(B) is the probability of event B occurring.
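A minimal sketch of this rule as a function (the names and the example numbers are mine, not from the slides):

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayesian rule: p(A|B) = p(B|A) * p(A) / p(B)."""
    return p_b_given_a * p_a / p_b

# made-up numbers: p(B|A) = 0.9, p(A) = 0.2, p(B) = 0.3
print(bayes(0.9, 0.2, 0.3))  # p(A|B) = 0.6
```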
Probability
• P(A): the probability that event A occurs
Visualizing A
[Venn diagram: the event space of all possible worlds has area 1;
the worlds in which A is true form an oval, and P(A) is the area of
that oval; worlds outside it are those in which A is false]
Axioms of Probability
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can't get any smaller than 0,
and a zero area would mean no world could ever have A true.
The Joint Probability

$$p(A) = \sum_{i=1}^{n} p(A \cap B_i) = \sum_{i=1}^{n} p(A \mid B_i)\; p(B_i)$$

[Venn diagram: event A overlapping mutually exclusive events B1, B2, B3, B4]
The Joint Probability
• If the occurrence of event A depends on only two mutually exclusive
events, B and NOT B, we obtain:

$$p(A) = p(A \mid B)\; p(B) + p(A \mid \neg B)\; p(\neg B)$$

where ¬ is the logical function NOT.
• Similarly,

$$p(B) = p(B \mid A)\; p(A) + p(B \mid \neg A)\; p(\neg A)$$

• Substituting this equation into the Bayesian rule yields:

$$p(A \mid B) = \frac{p(B \mid A)\; p(A)}{p(B \mid A)\; p(A) + p(B \mid \neg A)\; p(\neg A)}$$
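A minimal sketch of this two-hypothesis form (function and argument names are mine; the numbers are made up):

```python
def bayes_two_hypotheses(p_b_given_a, p_a, p_b_given_not_a):
    """p(A|B) with p(B) expanded over A and NOT A (total probability)."""
    p_not_a = 1.0 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b

# made-up numbers: p(B|A) = 0.8, p(A) = 0.1, p(B|not A) = 0.2
print(bayes_two_hypotheses(0.8, 0.1, 0.2))  # ≈ 0.308
```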
Bayesian Reasoning
• Suppose all rules in the knowledge base are represented in the following
form:
IF E is true
THEN H is true {with probability p}
• This rule implies that if event E occurs, then the probability that event H
will occur is p.
• In expert systems, H usually represents a hypothesis and E denotes
evidence to support this hypothesis.
Bayesian Reasoning
The Bayesian rule expressed in terms of hypotheses and evidence looks like
this:

$$p(H \mid E) = \frac{p(E \mid H)\; p(H)}{p(E \mid H)\; p(H) + p(E \mid \neg H)\; p(\neg H)}$$

where:
p(H) is the prior probability of hypothesis H being true;
p(E|H) is the probability that hypothesis H being true will result in evidence E;
p(¬H) is the prior probability of hypothesis H being false;
p(E|¬H) is the probability of finding evidence E even when hypothesis H is
false.
Bayesian Reasoning
• In expert systems, the probabilities required to solve a problem are
provided by experts.
• An expert determines the prior probabilities for possible hypotheses p(H)
and p(¬H), and also the conditional probabilities for observing evidence E
if hypothesis H is true, p(E|H), and if hypothesis H is false, p(E|¬H).
• Users provide information about the evidence observed and the expert
system computes p(H|E) for hypothesis H in light of the user-supplied
evidence E. Probability p(H|E) is called the posterior probability of
hypothesis H upon observing evidence E.
Bayesian Reasoning
• We can take into account both multiple hypotheses H1, H2, ..., Hm and
multiple evidences E1, E2, ..., En. The hypotheses as well as the evidences
must be mutually exclusive and exhaustive.
• Single evidence E and multiple hypotheses follow:

$$p(H_i \mid E) = \frac{p(E \mid H_i)\; p(H_i)}{\sum_{k=1}^{m} p(E \mid H_k)\; p(H_k)}$$

• Multiple evidences and multiple hypotheses follow:

$$p(H_i \mid E_1 E_2 \ldots E_n) = \frac{p(E_1 E_2 \ldots E_n \mid H_i)\; p(H_i)}{\sum_{k=1}^{m} p(E_1 E_2 \ldots E_n \mid H_k)\; p(H_k)}$$
Bayesian Reasoning
• This requires obtaining the conditional probabilities of all possible
combinations of evidences for all hypotheses, which places an
enormous burden on the expert.
• Therefore, in expert systems, conditional independence among different
evidences is assumed. Thus, instead of the unworkable equation, we obtain:

$$p(H_i \mid E_1 E_2 \ldots E_n) = \frac{p(E_1 \mid H_i)\; p(E_2 \mid H_i) \cdots p(E_n \mid H_i)\; p(H_i)}{\sum_{k=1}^{m} p(E_1 \mid H_k)\; p(E_2 \mid H_k) \cdots p(E_n \mid H_k)\; p(H_k)}$$
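A minimal sketch of this posterior computation under the conditional-independence assumption (the function and argument names are mine, not from the slides):

```python
from math import prod

def posteriors(priors, likelihoods):
    """p(Hi | E1..En) for every hypothesis, assuming the evidences are
    conditionally independent given each hypothesis.

    priors      -- [p(H1), ..., p(Hm)]
    likelihoods -- likelihoods[i] = [p(E1|Hi), ..., p(En|Hi)] for the
                   evidences actually observed
    """
    scores = [prod(ls) * p for ls, p in zip(likelihoods, priors)]
    total = sum(scores)          # the denominator, summed over hypotheses
    return [s / total for s in scores]

# two hypotheses, one observed evidence (made-up numbers)
print(posteriors([0.6, 0.4], [[0.2], [0.7]]))  # [0.3, 0.7]
```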
Ranking Potentially True Hypotheses
• Let us consider a simple example:
– Suppose an expert, given three conditionally
independent evidences E1, E2 and E3, creates three
mutually exclusive and exhaustive hypotheses H1,
H2 and H3, and provides prior probabilities for these
hypotheses – p(H1), p(H2) and p(H3), respectively.
The expert also determines the conditional
probabilities of observing each evidence for all
possible hypotheses.
The Prior and Conditional Probabilities

             Hypothesis
Probability  i = 1   i = 2   i = 3
p(Hi)        0.40    0.35    0.25
p(E1|Hi)     0.3     0.8     0.5
p(E2|Hi)     0.9     0.0     0.7
p(E3|Hi)     0.6     0.7     0.9

Assume that we first observe evidence E3. The expert system computes the
posterior probabilities for all hypotheses as:
The Prior and Conditional Probabilities

$$p(H_i \mid E_3) = \frac{p(E_3 \mid H_i)\; p(H_i)}{\sum_{k=1}^{3} p(E_3 \mid H_k)\; p(H_k)}, \quad i = 1, 2, 3$$

thus

$$p(H_1 \mid E_3) = \frac{0.6 \times 0.40}{0.6 \times 0.40 + 0.7 \times 0.35 + 0.9 \times 0.25} = 0.34$$

$$p(H_2 \mid E_3) = \frac{0.7 \times 0.35}{0.6 \times 0.40 + 0.7 \times 0.35 + 0.9 \times 0.25} = 0.34$$

$$p(H_3 \mid E_3) = \frac{0.9 \times 0.25}{0.6 \times 0.40 + 0.7 \times 0.35 + 0.9 \times 0.25} = 0.32$$
After evidence E3 is observed, belief in hypothesis H2 increases and becomes
equal to belief in hypothesis H1. Belief in hypothesis H3 also increases and
even nearly reaches beliefs in hypotheses H1 and H2.
The Prior and Conditional Probabilities
Suppose now that we observe evidence E1. The posterior probabilities are
calculated as

$$p(H_i \mid E_1 E_3) = \frac{p(E_1 \mid H_i)\; p(E_3 \mid H_i)\; p(H_i)}{\sum_{k=1}^{3} p(E_1 \mid H_k)\; p(E_3 \mid H_k)\; p(H_k)}, \quad i = 1, 2, 3$$

hence

$$p(H_1 \mid E_1 E_3) = \frac{0.3 \times 0.6 \times 0.40}{0.3 \times 0.6 \times 0.40 + 0.8 \times 0.7 \times 0.35 + 0.5 \times 0.9 \times 0.25} = 0.19$$

$$p(H_2 \mid E_1 E_3) = \frac{0.8 \times 0.7 \times 0.35}{0.3 \times 0.6 \times 0.40 + 0.8 \times 0.7 \times 0.35 + 0.5 \times 0.9 \times 0.25} = 0.52$$

$$p(H_3 \mid E_1 E_3) = \frac{0.5 \times 0.9 \times 0.25}{0.3 \times 0.6 \times 0.40 + 0.8 \times 0.7 \times 0.35 + 0.5 \times 0.9 \times 0.25} = 0.29$$
Hypothesis H2 has now become the most likely one.
The Prior and Conditional Probabilities
After observing evidence E2, the final posterior probabilities for all hypotheses are
calculated:

$$p(H_i \mid E_1 E_2 E_3) = \frac{p(E_1 \mid H_i)\; p(E_2 \mid H_i)\; p(E_3 \mid H_i)\; p(H_i)}{\sum_{k=1}^{3} p(E_1 \mid H_k)\; p(E_2 \mid H_k)\; p(E_3 \mid H_k)\; p(H_k)}, \quad i = 1, 2, 3$$

hence

$$p(H_1 \mid E_1 E_2 E_3) = \frac{0.3 \times 0.9 \times 0.6 \times 0.40}{0.3 \times 0.9 \times 0.6 \times 0.40 + 0.8 \times 0.0 \times 0.7 \times 0.35 + 0.5 \times 0.7 \times 0.9 \times 0.25} = 0.45$$

$$p(H_2 \mid E_1 E_2 E_3) = \frac{0.8 \times 0.0 \times 0.7 \times 0.35}{0.3 \times 0.9 \times 0.6 \times 0.40 + 0.8 \times 0.0 \times 0.7 \times 0.35 + 0.5 \times 0.7 \times 0.9 \times 0.25} = 0$$

$$p(H_3 \mid E_1 E_2 E_3) = \frac{0.5 \times 0.7 \times 0.9 \times 0.25}{0.3 \times 0.9 \times 0.6 \times 0.40 + 0.8 \times 0.0 \times 0.7 \times 0.35 + 0.5 \times 0.7 \times 0.9 \times 0.25} = 0.55$$
Although the initial ranking was H1, H2 and H3, only hypotheses H1 and H3 remain
under consideration after all evidences (E1, E2 and E3) were observed.
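To double-check these numbers, a minimal sketch that reruns the three stages from the table (helper names are mine; note the slide rounds a couple of values down, e.g. p(H2|E3) = 0.345 appears as 0.34):

```python
from math import prod

priors = [0.40, 0.35, 0.25]   # p(H1), p(H2), p(H3)
likelihood = {                # p(Ej | Hi) from the table above
    "E1": [0.3, 0.8, 0.5],
    "E2": [0.9, 0.0, 0.7],
    "E3": [0.6, 0.7, 0.9],
}

def posteriors(observed):
    """p(Hi | observed evidences), assuming conditional independence."""
    scores = [prod(likelihood[e][i] for e in observed) * priors[i]
              for i in range(3)]
    total = sum(scores)
    return [s / total for s in scores]

print(posteriors(["E3"]))              # ≈ 0.34, 0.35, 0.32
print(posteriors(["E1", "E3"]))        # ≈ 0.19, 0.52, 0.30
print(posteriors(["E1", "E2", "E3"]))  # ≈ 0.45, 0.00, 0.55
```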
Naive Bayesian Classifier (II)
• Given a training set, we can compute the probabilities

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

Play-tennis example: estimating P(xi | C)

Outlook      P    N
sunny        2/9  3/5
overcast     4/9  0
rain         3/9  2/5

Temperature  P    N
hot          2/9  2/5
mild         4/9  2/5
cool         3/9  1/5

Humidity     P    N
high         3/9  4/5
normal       6/9  1/5

Windy        P    N
true         3/9  3/5
false        6/9  2/5
p A  B 
pA B  
Outlook
sunny
sunny
overcast
rain
rain
rain
overcast
sunny
sunny
rain
sunny
overcast
overcast
rain
p B 
Temperature Humidity Windy Class
hot
high
false
N
hot
high
true
N
hot
high
false
P
mild
high
false
P
cool
normal false
P
cool
normal true
N
cool
normal true
P
mild
high
false
N
cool
normal false
P
mild
normal false
P
mild
normal true
P
mild
high
true
P
hot
normal false
P
mild
high
true
N
P(p) = 9/14
P(n) = 5/14

outlook
P(sunny|p) = 2/9        P(sunny|n) = 3/5
P(overcast|p) = 4/9     P(overcast|n) = 0
P(rain|p) = 3/9         P(rain|n) = 2/5

temperature
P(hot|p) = 2/9          P(hot|n) = 2/5
P(mild|p) = 4/9         P(mild|n) = 2/5
P(cool|p) = 3/9         P(cool|n) = 1/5

humidity
P(high|p) = 3/9         P(high|n) = 4/5
P(normal|p) = 6/9       P(normal|n) = 1/5

windy
P(true|p) = 3/9         P(true|n) = 3/5
P(false|p) = 6/9        P(false|n) = 2/5
Example: Naïve Bayes
Predict playing tennis on a day with the conditions <sunny, cool, high, strong>,
i.e. compute P(v | o=sunny, t=cool, h=high, w=strong), using the following
training data:

Day  Outlook   Temperature  Humidity  Wind    Play Tennis
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No

Each conditional probability is estimated by counting, e.g.
p(strong | y) = (# days of playing tennis with strong wind) / (# days of playing tennis).
We have:

$$p(y)\; p(\text{sun} \mid y)\; p(\text{cool} \mid y)\; p(\text{high} \mid y)\; p(\text{strong} \mid y) \approx 0.005$$

$$p(n)\; p(\text{sun} \mid n)\; p(\text{cool} \mid n)\; p(\text{high} \mid n)\; p(\text{strong} \mid n) \approx 0.021$$

Since 0.021 > 0.005, the classifier predicts No (tennis is not played).
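A minimal sketch that reproduces both scores from this table (the tuple encoding and function names are my own):

```python
# (outlook, temperature, humidity, wind, play) rows from the table above
data = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]

def score(label, query):
    """Unnormalized naive Bayes score: p(label) * prod_i p(query_i | label)."""
    rows = [r for r in data if r[-1] == label]
    s = len(rows) / len(data)        # the prior p(label)
    for i, value in enumerate(query):
        s *= sum(1 for r in rows if r[i] == value) / len(rows)
    return s

query = ("sunny", "cool", "high", "strong")
print(score("yes", query))  # ≈ 0.005
print(score("no", query))   # ≈ 0.021 -> predict "no"
```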
Determine: if age = young, weight = overweight, and sex = female, does the
person have hypertension or not? Compute the probability!
=====================
AGE      Y    T
-----------------
young    1/3  3/5
old      2/3  2/5

WEIGHT   Y    T
-----------------
under    0/3  2/5
average  0/3  2/5
over     3/3  1/5

SEX      Y    T
-----------------
male     2/3  4/5
female   1/3  1/5

Here Y = hypertension and T = no hypertension, with priors p(Y) = 3/8 and
p(T) = 5/8; each entry is a conditional probability, e.g.
1/3 = p(young and Y) / p(Y) = p(young | Y).

For Y: (3/8) × (1/3) × (3/3) × (1/3) = 3/72 = 1/24
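A minimal sketch of the same computation; the slide only evaluates the hypertension (Y) class, so the score for T below is my extension of the same rule:

```python
# conditional probabilities read off the tables above (Y = hypertension, T = none)
prior = {"Y": 3/8, "T": 5/8}
p_age_young = {"Y": 1/3, "T": 3/5}
p_weight_over = {"Y": 3/3, "T": 1/5}
p_sex_female = {"Y": 1/3, "T": 1/5}

for c in ("Y", "T"):
    score = prior[c] * p_age_young[c] * p_weight_over[c] * p_sex_female[c]
    print(c, score)
# Y: 1/24 ≈ 0.042, T: ≈ 0.015 -> predict Y (hypertension)
```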