
A Practical Course in Graphical Bayesian Modeling; Class 1
Eric-Jan Wagenmakers

Outline

- A bit of probability theory
- Bayesian foundations
- Parameter estimation: A simple example
- WinBUGS and R2WinBUGS

Probability Theory (Wasserman, 2004)

- The sample space Ω is the set of possible outcomes of an experiment.
- If we toss a coin twice, then Ω = {HH, HT, TH, TT}.
- The event that the first toss is heads is A = {HH, HT}.

Probability Theory (Wasserman, 2004)

- A ∩ B denotes intersection: “A and B”.
- A ∪ B denotes union: “A or B”.

Probability Theory (Wasserman, 2004)

P is a probability measure when the following axioms are satisfied:
1. Probabilities are never negative: P(A) ≥ 0.
2. Probabilities add to 1: P(Ω) = 1.
3. The probability of a union of non-overlapping (disjoint) events is the sum of their probabilities:

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$
Probability Theory (Wasserman, 2004)

For any events A and B:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Conditional Probability

The conditional probability of A given B is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Conditional Probability

You will often encounter this as

$$P(A \cap B) = P(A \mid B)\, P(B)$$

Conditional Probability

From P(A ∩ B) = P(A | B) P(B) and P(A ∩ B) = P(B | A) P(A), Bayes’ rule follows:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Bayes’ Rule

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

The Law of Total Probability

Let A_1, …, A_k be a partition of Ω. Then, for any event B:

$$P(B) = \sum_{i=1}^{k} P(B \mid A_i)\, P(A_i)$$
The Law of Total Probability

This is just a weighted average of the conditional probabilities P(B | A_i) over the disjoint sets A_1, …, A_k. For instance, when all P(A_i) are equal, the equation becomes:

$$P(B) = \frac{1}{k} \sum_{i=1}^{k} P(B \mid A_i)$$
Bayes’ Rule Revisited

$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{k} P(B \mid A_j)\, P(A_j)}$$

Example (Wasserman, 2004)

- I divide my email into three categories: “spam”, “low priority”, and “high priority”.
- Previous experience suggests that the a priori probabilities of a random email belonging to these categories are .7, .2, and .1, respectively.

Example (Wasserman, 2004)

- The probabilities of the word “free” occurring in the three categories are .9, .01, and .01, respectively.
- I receive an email with the word “free”. What is the probability that it is spam?
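The answer follows directly from Bayes’ rule, with the law of total probability in the denominator. A minimal R sketch of the calculation (R is used here because the course relies on R and R2WinBUGS later on):

```r
# Bayes' rule for the "free" email example
prior <- c(spam = 0.7, low = 0.2, high = 0.1)    # P(category)
lik   <- c(spam = 0.9, low = 0.01, high = 0.01)  # P("free" | category)

posterior <- prior * lik / sum(prior * lik)      # numerator: prior x likelihood;
round(posterior, 3)                              # denominator: law of total probability
#  spam   low  high
# 0.995 0.003 0.002
```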

Outline

- A bit of probability theory
- Bayesian foundations
- Parameter estimation: A simple example
- WinBUGS and R2WinBUGS

The Bayesian Agenda

- Bayesians use probability to quantify uncertainty or “degree of belief” about parameters and hypotheses.
- Prior knowledge for a parameter θ is updated through the data to yield posterior knowledge.

The Bayesian Agenda

$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$

Also note that this equation allows one to learn, from the probability of what is observed, something about what is not observed.

The Bayesian Agenda

- But why would one measure “degree of belief” by means of probability? Couldn’t we choose something else that makes sense?
- Yes, perhaps we can, but the choice of probability is anything but ad hoc.

The Bayesian Agenda

- Assume “degree of belief” can be measured by a single number.
- Assume you are rational, that is, not self-contradictory or “obviously silly”.
- Then degree of belief can be shown to follow the same rules as the probability calculus.

The Bayesian Agenda

- For instance, a rational agent would not hold intransitive beliefs, such as believing that A is more likely than B, that B is more likely than C, and yet that C is more likely than A.

The Bayesian Agenda

- When you use a single number to measure uncertainty or quantify evidence, and these numbers do not follow the rules of probability calculus, you can (almost certainly?) be shown to be silly or incoherent.
- One of the theoretical attractions of the Bayesian paradigm is that it ensures coherence right from the start.

Coherence Example à la De Finetti

- There exists a ticket that says “If the French national soccer team wins the 2010 World Cup, this ticket pays $1.”
- You must determine the fair price for this ticket. After you set the price, I can choose either to sell the ticket to you or to buy the ticket from you. This is similar to how you would divide a pie according to the rule “you cut, I choose”.
- Please write this number down; you are not allowed to change it later!

Coherence Example à la De Finetti

- There exists another ticket that says “If the Spanish national soccer team wins the 2010 World Cup, this ticket pays $1.”
- You must again determine the fair price for this ticket.

Coherence Example à la De Finetti

- There exists a third ticket that says “If either the French or the Spanish national soccer team wins the 2010 World Cup, this ticket pays $1.”
- What is the fair price for this ticket?
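To see why the three prices must cohere, here is a small R sketch with hypothetical, deliberately incoherent prices; the specific numbers are illustrative assumptions, not part of the original example:

```r
# Hypothetical, incoherent prices (illustrative assumption)
p_france <- 0.20   # your price for the France ticket
p_spain  <- 0.30   # your price for the Spain ticket
p_either <- 0.60   # your price for the "France or Spain" ticket;
                   # coherence requires p_either == p_france + p_spain

# I buy the France and Spain tickets from you and sell you the "either" ticket.
cash <- p_france + p_spain - p_either   # what you receive up front
net  <- c(France  = cash - 1 + 1,       # France wins: you pay $1, collect $1
          Spain   = cash - 1 + 1,       # Spain wins:  you pay $1, collect $1
          Neither = cash)               # neither wins: no ticket pays
net
#  France   Spain Neither
#    -0.1    -0.1    -0.1   <- a sure loss for you, whatever happens
```

Whenever the price of the “either” ticket differs from the sum of the two single-team prices, the other party can choose which side of each trade to take and guarantee you a loss; that is exactly the incoherence the Bayesian paradigm rules out.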

Bayesian Foundations

- Bayesians use probability to quantify uncertainty or “degree of belief” about parameters and hypotheses.
- Prior knowledge for a parameter θ is updated through the data to yield posterior knowledge.
- This happens through the use of probability calculus.

Bayes’ Rule

$$\underbrace{P(\theta \mid D)}_{\text{Posterior distribution}} = \frac{\overbrace{P(D \mid \theta)}^{\text{Likelihood}} \;\times\; \overbrace{P(\theta)}^{\text{Prior distribution}}}{\underbrace{P(D)}_{\text{Marginal probability of the data}}}$$

Bayesian Foundations

$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$

This equation allows one to learn, from the probability of what is observed, something about what is not observed. Bayesian statistics was long known as “inverse probability”.

Nuisance Variables

- Suppose θ is the mean of a normal distribution, and α is the standard deviation.
- You are interested in θ, but not in α.
- Using the Bayesian paradigm, how can you go from P(θ, α | x) to P(θ | x)? That is, how can you get rid of the nuisance parameter α? Show how this involves P(α).

Nuisance Variables

$$P(\theta \mid x) = \int P(\theta, \alpha \mid x)\, d\alpha = \int P(\theta \mid \alpha, x)\, P(\alpha \mid x)\, d\alpha$$

Predictions

- Suppose you observe data x, and you use a model with parameter θ.
- What is your prediction for new data y, given that you’ve observed x? In other words, show how you can obtain P(y | x).

Predictions

$$P(y \mid x) = \int P(y \mid \theta)\, P(\theta \mid x)\, d\theta$$
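A simple way to evaluate this integral is by Monte Carlo: draw θ from the posterior, then draw y given θ. The sketch below assumes, purely for illustration, the Beta(10, 2) posterior from the binomial example discussed later in this class, and predicts a new set of 10 questions.

```r
# Monte Carlo sketch of the posterior predictive P(y | x)
set.seed(1)
theta_post <- rbeta(1e5, 10, 2)                      # draws from P(theta | x)
y_new <- rbinom(1e5, size = 10, prob = theta_post)   # draws from P(y | theta)

# The relative frequencies approximate P(y | x), with theta integrated out
round(table(y_new) / length(y_new), 3)
```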

Want to Know More?

Outline

- A bit of probability theory
- Bayesian foundations
- Parameter estimation: A simple example
- WinBUGS and R2WinBUGS

Bayesian Parameter Estimation: Example

- We prepare for you a series of 10 factual true/false questions of equal difficulty.
- You answer 9 out of 10 questions correctly.
- What is your latent probability θ of answering any one question correctly?

Bayesian Parameter Estimation: Example

- We start with a prior distribution for θ. This reflects all we know about θ prior to the experiment. Here we make a standard choice and assume that all values of θ are equally likely a priori.

Bayesian Parameter Estimation: Example

- We then update the prior distribution by means of the data (technically, the likelihood) to arrive at a posterior distribution.

The Likelihood

We use the binomial model, in which P(D | θ) is given by

$$P(D \mid \theta) = \binom{n}{s}\, \theta^{s} (1-\theta)^{n-s},$$

where n = 10 is the number of trials and s = 9 is the number of successes.

Bayesian Parameter Estimation: Example

- The posterior distribution is a compromise between what we knew before the experiment (i.e., the prior) and what we have learned from the experiment (i.e., the likelihood). The posterior distribution reflects all that we know about θ.

Mode = 0.9; 95% confidence interval: (0.59, 0.98).
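These values follow from the fact that a uniform prior combined with the binomial likelihood yields a Beta(s + 1, n − s + 1) = Beta(10, 2) posterior; a quick R check:

```r
# Posterior for theta: uniform prior + binomial likelihood (s = 9, n = 10) => Beta(10, 2)
a <- 9 + 1; b <- 10 - 9 + 1

(a - 1) / (a + b - 2)             # posterior mode: 0.9
qbeta(c(0.025, 0.975), a, b)      # central 95% interval: roughly (0.59, 0.98)
```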

Bayesian Parameter Estimation: Example

- Sometimes it is difficult or impossible to obtain the posterior distribution analytically.
- In this case, we can use Markov chain Monte Carlo (MCMC) algorithms to sample from the posterior. As the number of samples increases, the approximation error relative to the analytical posterior becomes arbitrarily small.

Mode = 0.89; 95% confidence interval: (0.59, 0.98). With 9000 samples, the result is almost identical to the analytical result.
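As an illustration of the basic idea (WinBUGS uses more sophisticated samplers), here is a minimal random-walk Metropolis sampler in R for the same posterior; the step size and starting value are arbitrary choices made for this sketch.

```r
# Minimal Metropolis sampler for the Beta(10, 2) posterior of the example
set.seed(1)
s <- 9; n <- 10
log_post <- function(theta) {                  # log of the unnormalized posterior
  if (theta <= 0 || theta >= 1) return(-Inf)
  s * log(theta) + (n - s) * log(1 - theta)
}

n_iter <- 9000
theta  <- numeric(n_iter)
theta[1] <- 0.5                                # arbitrary starting value
for (i in 2:n_iter) {
  proposal <- theta[i - 1] + rnorm(1, 0, 0.1)  # random-walk proposal
  accept   <- log(runif(1)) < log_post(proposal) - log_post(theta[i - 1])
  theta[i] <- if (accept) proposal else theta[i - 1]
}

quantile(theta, c(0.025, 0.975))               # close to the analytical (0.59, 0.98)
```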

Outline

- A bit of probability theory
- Bayesian foundations
- Parameter estimation: A simple example
- WinBUGS and R2WinBUGS

WinBUGS

WinBUGS stands for Bayesian inference Using Gibbs Sampling. You want to have this installed (plus the registration key).

WinBUGS

- Knows many probability distributions (likelihoods);
- Allows you to specify a model;
- Allows you to specify priors;
- Will then automatically run the MCMC sampling routines and produce output.

Want to Know More About MCMC?

Models in WinBUGS

The models you can specify in WinBUGS are directed acyclic graphs (DAGs).

Models in WinBUGS (Spiegelhalter, 1998)

[DAG with nodes A, B, C, D, E: arrows from A and B into C and into D, and from C into E]

- In this graph, E depends only on C.
- If the nodes are stochastic, the joint distribution factorizes:
  P(A, B, C, D, E) = P(A) P(B) P(C | A, B) P(D | A, B) P(E | C)
- This means we can sometimes perform “local” computations to get what we want.
- What is P(C | A, B, D, E)?
- P(C | A, B, D, E) is proportional to P(C | A, B) P(E | C); D is irrelevant (a short derivation follows below).
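The reason D drops out follows directly from the factorization above: every term that does not involve C appears in both numerator and denominator and cancels. In LaTeX notation:

$$
P(C \mid A,B,D,E)
= \frac{P(A)\,P(B)\,P(C \mid A,B)\,P(D \mid A,B)\,P(E \mid C)}
       {\sum_{C'} P(A)\,P(B)\,P(C' \mid A,B)\,P(D \mid A,B)\,P(E \mid C')}
\propto P(C \mid A,B)\,P(E \mid C).
$$

The factors P(A), P(B), and P(D | A, B) are common to every term and cancel (the sum runs over the possible values of C; it becomes an integral for continuous nodes), so only the factors containing C remain. This is the “local” computation referred to above.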

WinBUGS & R

- WinBUGS produces MCMC samples.
- We want to analyze the output in a nice program, such as R.
- This can be accomplished using the R package “R2WinBUGS”.
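For the binomial example above, a run through R2WinBUGS might look roughly like the following sketch; the model file name, initial values, iteration counts, and the WinBUGS installation details are illustrative assumptions rather than part of the course material.

```r
library(R2WinBUGS)

# BUGS model for the 9-out-of-10 example, written to a file for WinBUGS
model_string <- "
model {
  theta ~ dbeta(1, 1)     # uniform prior on the rate
  s ~ dbin(theta, n)      # binomial likelihood
}"
writeLines(model_string, "rate_model.txt")

data  <- list(s = 9, n = 10)
inits <- function() list(theta = runif(1))

samples <- bugs(data, inits, parameters.to.save = "theta",
                model.file = "rate_model.txt",
                n.chains = 3, n.iter = 3000)   # illustrative settings

print(samples)                       # WinBUGS output summarized in R
hist(samples$sims.list$theta)        # posterior samples for theta, analyzed in R
```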

End of Class 1