Bayesian Statistics and Belief Networks
Overview
• Book: Ch 8.3
• Refresher on Bayesian statistics
• Bayesian classifiers
• Belief Networks / Bayesian Networks
Why Should We Care?
• Theoretical framework for machine learning, classification, knowledge representation, and analysis
• Bayesian methods are capable of handling noisy, incomplete data sets
• Bayesian methods are commonly in use today
Bayesian Approach To Probability and Statistics
• Classical Probability: A physical property of the world (e.g., the 50% chance of heads on a flip of a fair coin). The true probability.
• Bayesian Probability: A person's degree of belief in event X. Personal probability.
• Unlike classical probability, Bayesian probabilities benefit from, but do not require, repeated trials; they can focus on just the next event, e.g., the probability that the Seawolves win their next game.
Bayes Rule
Product Rule: P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
Equating the two right-hand sides and dividing:
P(Class|evidence) = P(evidence|Class) P(Class) / P(evidence)
All classification methods can be seen as estimates of Bayes' Rule, with different techniques to estimate P(evidence|Class).
Simple Bayes Rule Example
Probability your computer has a virus, V, = 1/1000.
If infected with a virus, the probability of a crash that day, C, is 4/5.
Probability your computer crashes in one day, C, = 1/10.
P(C|V) = 0.8, P(V) = 1/1000, P(C) = 1/10
P(V|C) = P(C|V) P(V) / P(C) = (0.8)(0.001) / (0.1) = 0.008
Even though a crash is a strong indicator of a virus, we expect only 8/1000 crashes to be caused by viruses. Why not compute P(V|C) from direct evidence? This is the distinction between causal and diagnostic knowledge: the causal quantity P(C|V) is stable, while a direct diagnostic estimate would break (consider what happens if P(C) suddenly drops).
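The arithmetic is easy to check in code; a minimal sketch (variable names are mine):

```python
# Bayes' rule for the virus example: P(V|C) = P(C|V) * P(V) / P(C)
p_v = 1 / 1000       # prior probability of a virus
p_c = 1 / 10         # probability of a crash on any given day
p_c_given_v = 4 / 5  # probability of a crash given a virus

p_v_given_c = p_c_given_v * p_v / p_c
print(p_v_given_c)   # 0.008
```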
Bayesian Classifiers
P(Class|e) = P(e|Class) P(Class) / P(e)
If we're selecting the single most likely class, the denominator is the same for every class, so we only need to find the class that maximizes P(e|Class)P(Class).
The hard part is estimating P(e|Class).
Evidence e typically consists of a set of observations:
E = (e_1, e_2, ..., e_n)
The usual simplifying assumption is conditional independence:
P(E|C) = ∏_{i=1}^{n} P(e_i|C)
Bayesian Classifier Example
              P(C)   P(crashes|C)   P(diskfull|C)
C=Virus       0.4    0.1            0.6
C=Bad Disk    0.6    0.2            0.1
Given a case where the disk is full and computer crashes, the classifier chooses Virus as most likely since (0.4)(0.1)(0.6) > (0.6)(0.2)(0.1).
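The same computation as a tiny naive Bayes classifier; a minimal sketch, where the dict layout and function name `classify` are illustrative choices rather than anything from the slides:

```python
# Naive Bayes: score each class by P(C) * product of P(e_i|C) over the evidence.
priors = {"Virus": 0.4, "Bad Disk": 0.6}
likelihoods = {
    "Virus":    {"crashes": 0.1, "diskfull": 0.6},
    "Bad Disk": {"crashes": 0.2, "diskfull": 0.1},
}

def classify(evidence):
    """Return the class maximizing P(C) * prod_i P(e_i|C), plus all scores."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for e in evidence:
            score *= likelihoods[c][e]
        scores[c] = score
    return max(scores, key=scores.get), scores

print(classify(["crashes", "diskfull"]))
# ('Virus', {'Virus': 0.024, 'Bad Disk': 0.012})
```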
Beyond Conditional Independence
[Figure: a linear classifier separating classes C1 and C2.]
• Include second-order dependencies, i.e., pairwise combinations of variables via joint probabilities: replace the product P(e_i|C)P(e_j|C) with the joint P(e_i, e_j|C), applied as a correction factor to the first-order estimate.
• The correction factor is difficult to compute, and there are C(n,2) = n(n-1)/2 pairwise joint probabilities to consider.
Belief Networks
• A DAG that represents the dependencies between variables and specifies the joint probability distribution
• Random variables make up the nodes
• Directed links represent direct causal influences
• Each node has a conditional probability table quantifying the effects from its parents
• No directed cycles
Burglary Alarm Example
P(B) = 0.001    P(E) = 0.002

Alarm CPT, P(A|B,E):
  B E    P(A)
  T T    0.95
  T F    0.94
  F T    0.29
  F F    0.001

John Calls CPT, P(J|A):    Mary Calls CPT, P(M|A):
  A    P(J)                  A    P(M)
  T    0.90                  T    0.70
  F    0.05                  F    0.01
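These tables translate directly into a small data structure; a minimal sketch, assuming a dict-of-CPTs layout (the representation and the helper name `p` are my own choices):

```python
# Burglary network CPTs, from the slides. Each entry maps a tuple of
# parent values to P(variable = True | parents).
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}

def p(var, value, assignment):
    """P(var = value | parent values taken from assignment)."""
    prob_true = CPT[var][tuple(assignment[u] for u in PARENTS[var])]
    return prob_true if value else 1.0 - prob_true
```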
Sample Bayesian Network
[Figure: an example Bayesian network.]
Using The Belief Network
[Figure: the burglary network again, with the CPTs shown above.]
The network specifies the full joint distribution as a product of local conditional probabilities:
P(x_1, x_2, ..., x_n) = ∏_{i=1}^{n} P(x_i | Parents(x_i))
Probability of alarm, no burglary or earthquake, both John and Mary call:
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B ∧ ¬E) P(¬B) P(¬E)
= (0.90)(0.70)(0.001)(0.999)(0.998) ≈ 0.00062
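Continuing the sketch above (and reusing its `PARENTS`, `CPT`, and `p`), the chain-rule product and the worked example might look like this; `joint` is a name I introduce:

```python
def joint(assignment):
    """Full joint probability: product over i of P(x_i | Parents(x_i))."""
    prob = 1.0
    for var in PARENTS:  # insertion order B, E, A, J, M is topological
        prob *= p(var, assignment[var], assignment)
    return prob

# Alarm sounds, no burglary or earthquake, both John and Mary call:
event = {"B": False, "E": False, "A": True, "J": True, "M": True}
print(joint(event))  # (0.999)(0.998)(0.001)(0.9)(0.7) ≈ 0.00062
```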
Belief Computations
• Two types; both are NP-hard
• Belief Revision
  – Models explanatory/diagnostic tasks
  – Given evidence, what is the most likely hypothesis to explain the evidence?
  – Also called abductive reasoning
• Belief Updating
  – Queries: given evidence, what is the probability of some other random variable occurring?
Belief Revision
• Given some evidence variables, find the state of all other variables that maximizes the probability.
• E.g.: We know John calls, but not Mary. What is the most likely state? Only consider assignments where J=T and M=F, and maximize the joint probability. The best assignment sets B, E, and A all false:
P(¬B) P(¬E) P(¬A|¬B ∧ ¬E) P(J|¬A) P(¬M|¬A) = (0.999)(0.998)(0.999)(0.05)(0.99) ≈ 0.049
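Brute-force belief revision is then a loop over the full joint; a sketch reusing `joint` and `PARENTS` from above. Its exponential enumeration is exactly why the general problem is hard:

```python
from itertools import product

def most_probable_state(evidence):
    """Belief revision by brute force: maximize the joint over all complete
    assignments consistent with the evidence."""
    names = list(PARENTS)  # ["B", "E", "A", "J", "M"]
    best, best_prob = None, -1.0
    for values in product([True, False], repeat=len(names)):
        a = dict(zip(names, values))
        if all(a[v] == val for v, val in evidence.items()):
            pr = joint(a)
            if pr > best_prob:
                best, best_prob = a, pr
    return best, best_prob

state, prob = most_probable_state({"J": True, "M": False})
print(state, prob)  # everything False except J; probability ~0.049
```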
Belief Updating
• Causal Inferences: evidence at a cause, query an effect (E → Q)
• Diagnostic Inferences: evidence at an effect, query a cause (Q ← E)
• Intercausal Inferences: evidence at a common effect, query one of its causes
• Mixed Inferences: combinations of the above
Causal Inferences
Inference from cause to effect. E.g., given a burglary, what is P(J|B)?
P(A|B) = P(A|B ∧ E) P(E) + P(A|B ∧ ¬E) P(¬E) = (0.95)(0.002) + (0.94)(0.998) ≈ 0.94
P(J|B) = P(J|A) P(A|B) + P(J|¬A) P(¬A|B) = (0.9)(0.94) + (0.05)(0.06) ≈ 0.85
Similarly, P(M|B) = (0.70)(0.94) + (0.01)(0.06) ≈ 0.66.
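Each of these updating queries can be answered, inefficiently, by summing out the full joint; a sketch reusing `joint` and `PARENTS` from above (the helper name `query` is mine):

```python
from itertools import product

def query(target, evidence):
    """Belief updating by enumeration: P(target = True | evidence)."""
    names = list(PARENTS)
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(names)):
        a = dict(zip(names, values))
        if all(a[v] == val for v, val in evidence.items()):
            totals[a[target]] += joint(a)
    return totals[True] / (totals[True] + totals[False])

print(query("J", {"B": True}))  # causal: ~0.85
print(query("M", {"B": True}))  # causal: ~0.66
```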
Diagnostic Inferences
From effect to cause. E.g., given that John calls, what is P(B|J)?
P(B|J) = P(J|B) P(B) / P(J)
What is P(J)? Need P(A) first:
P(A) = P(A|B ∧ E) P(B) P(E) + P(A|¬B ∧ E) P(¬B) P(E) + P(A|B ∧ ¬E) P(B) P(¬E) + P(A|¬B ∧ ¬E) P(¬B) P(¬E)
     = (0.001)(0.002)(0.95) + (0.999)(0.002)(0.29) + (0.001)(0.998)(0.94) + (0.999)(0.998)(0.001)
P(A) ≈ 0.002517
P(J) = P(J|A) P(A) + P(J|¬A) P(¬A) = (0.002517)(0.9) + (0.9975)(0.05) ≈ 0.052
P(B|J) = (0.85)(0.001) / (0.052) ≈ 0.016
Many false positives.
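The `query` sketch above gives the same diagnostic answer directly:

```python
print(query("B", {"J": True}))  # diagnostic: ~0.016
```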
Intercausal Inferences
Also called explaining-away inferences.
Given an alarm, P(B|A) = 0.37. But if we add the evidence that Earthquake is true, then P(B|A ∧ E) = 0.003.
Even though B and E are independent a priori, observing their common effect A makes them dependent: the presence of one can make the other more or less likely (here the earthquake explains away the alarm).
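The explaining-away numbers fall out of the same `query` sketch:

```python
print(query("B", {"A": True}))             # ~0.37
print(query("B", {"A": True, "E": True}))  # ~0.003: E explains away the alarm
```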
Mixed Inferences
Simultaneous intercausal and diagnostic inference.
E.g., if John calls and Earthquake is false:
P(A|J ∧ ¬E) ≈ 0.03
P(B|J ∧ ¬E) ≈ 0.017
Computing these values exactly is somewhat complicated.
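The mixed query is no harder for brute-force enumeration (`query` gives ≈0.016 for the second value, which the slide rounds to 0.017):

```python
print(query("A", {"J": True, "E": False}))  # ~0.03
print(query("B", {"J": True, "E": False}))  # ~0.016
```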
Exact Computation Polytree Algorithm
• Judea Pearl, 1982
• Only works on singly-connected networks: at most one undirected path between any two nodes
• A backward-chaining, message-passing algorithm for computing posterior probabilities for a query node X
  – Compute causal support for X: evidence variables "above" X
  – Compute evidential support for X: evidence variables "below" X
Polytree Computation
[Figure: query node X with parents U(1), ..., U(m), children Y(1), ..., Y(n), and each child Y(i)'s other parents Z(i,j). Evidence connected through X's parents is E_X^+; evidence connected through X's children is E_X^-.]
P(X|E) = α P(E_X^- | X) P(X | E_X^+)
Causal support (evidence "above" X):
P(X | E_X^+) = Σ_u P(X|u) ∏_i P(u_i | E_{U_i \ X})
Evidential support (evidence "below" X):
P(E_X^- | X) = β ∏_i Σ_{y_i} P(E_{Y_i} | y_i) Σ_{z_ij} P(y_i | X, z_ij) ∏_j P(z_ij | E_{Z_ij \ Y_i})
The algorithm is recursive, evaluated as a message-passing chain over the network.
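To make the causal/evidential (π/λ) idea concrete without the full polytree bookkeeping, here is a minimal sketch of message passing on a chain X1 → X2 → X3, the simplest possible polytree; all CPT values here are invented for illustration:

```python
import numpy as np

# Chain X1 -> X2 -> X3, each variable binary. prior is P(X1); trans[i] is
# the CPT P(X_{i+1} | X_i) as a 2x2 matrix (row = parent value).
prior = np.array([0.2, 0.8])
trans = [np.array([[0.9, 0.1], [0.3, 0.7]]),
         np.array([[0.6, 0.4], [0.5, 0.5]])]

def chain_posteriors(evidence):
    """P(X_i | evidence) for every node; evidence maps node index -> value."""
    n = len(trans) + 1
    # pi[i]: causal support, P(X_i and evidence above i); passed forward.
    pi = [prior.copy()]
    for i in range(n - 1):
        msg = pi[i].copy()
        if i in evidence:                      # clamp an observed node
            mask = np.zeros(2); mask[evidence[i]] = 1.0; msg *= mask
        pi.append(msg @ trans[i])
    # lam[i]: evidential support, P(evidence below i | X_i); passed backward.
    lam = [np.ones(2) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        msg = lam[i + 1].copy()
        if i + 1 in evidence:
            mask = np.zeros(2); mask[evidence[i + 1]] = 1.0; msg *= mask
        lam[i] = trans[i] @ msg
    beliefs = []
    for i in range(n):
        b = pi[i] * lam[i]
        if i in evidence:
            mask = np.zeros(2); mask[evidence[i]] = 1.0; b *= mask
        beliefs.append(b / b.sum())            # normalize (the alpha factor)
    return beliefs

print(chain_posteriors({2: 1}))  # posteriors of X1, X2, X3 given X3 = 1
```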
Other Query Methods
• Exact Algorithms
  – Clustering: merge nodes into clusters until the network is singly connected, then message-pass over the cluster network
  – Symbolic Probabilistic Inference: uses d-separation to find expressions to combine
• Approximate Algorithms: select a sampling distribution, conduct trials sampling from the root nodes down to the evidence nodes, accumulating a weight for each node. Still tractable for dense networks (a minimal sampling sketch follows below).
  – Forward Simulation
  – Stochastic Simulation
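As one concrete instance of the weighted-sampling scheme sketched above, here is a minimal likelihood-weighting estimate for the burglary network, reusing `PARENTS` and `CPT` from earlier (function names are mine):

```python
import random

def weighted_sample(evidence):
    """Sample non-evidence variables top-down; weight by P(evidence | parents)."""
    a, weight = {}, 1.0
    for var in PARENTS:  # topological order: B, E, A, J, M
        prob_true = CPT[var][tuple(a[u] for u in PARENTS[var])]
        if var in evidence:
            a[var] = evidence[var]
            weight *= prob_true if evidence[var] else 1.0 - prob_true
        else:
            a[var] = random.random() < prob_true
    return a, weight

def approx_query(target, evidence, trials=100_000):
    """Estimate P(target = True | evidence) from weighted samples."""
    num = den = 0.0
    for _ in range(trials):
        a, w = weighted_sample(evidence)
        den += w
        if a[target]:
            num += w
    return num / den

print(approx_query("B", {"J": True, "M": True}))  # ~0.28
```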
Summary
• Bayesian methods provide a sound theory and framework for implementing classifiers
• Bayesian networks are a natural way to represent conditional independence information: qualitative information in the links, quantitative information in the tables
• Computing exact values is NP-hard; it is typical to make simplifying assumptions or to use approximate methods
• Many Bayesian tools and systems exist
References
• Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall.
• Weiss, S. and Kulikowski, C. (1991). Computer Systems That Learn. Morgan Kaufmann.
• Heckerman, D. (1996). A Tutorial on Learning with Bayesian Networks. Microsoft Technical Report MSR-TR-95-06.
• Internet resources on Bayesian networks and machine learning: http://www.cs.orst.edu/~wangxi/resource.html