Transcript Document

Outline

• Historical note about Bayes’ rule
• Bayesian updating for probability density functions
– Salary offer estimate
• Coin trials example
• Reading material:
– Gelman, Andrew, et al. Bayesian Data Analysis. CRC Press, 2003, Chapter 1.

• Slides based in part on lecture by Prof. Joo-Ho Choi of Korea Aerospace University

Historical Note

• Birth of Bayesian statistics
– Rev. Thomas Bayes proposed Bayes’ theorem (1763): the parameter θ of a Binomial distribution is estimated using observed data.
– Laplace independently rediscovered it (1812), attached his name to it, and generalized it to many problems.
– For more than 100 years, the Bayesian “degree of belief” was rejected as vague and subjective; the objective “frequency” interpretation was the one accepted in statistics.
– Jeffreys rediscovered it (1939) and developed the modern theory (1961).
– Until the 1980s, use was still limited by the computation it requires.

• Flourishing of Bayesian statistics
– From 1990, rapid advances in hardware and software made it practical.
– Bayesian techniques are now applied across the sciences (economics, medicine) and engineering.

Bayesian Probability

• What is Bayesian probability?
– Classical: the relative frequency of an event over many repeated trials (e.g., the probability of throwing 10 with a pair of dice).
– Bayesian: the degree of belief that a proposition is true, based on the evidence at hand.

• Saturn mass estimation
– Classical: the mass is fixed but unknown.
– Bayesian: the mass is described probabilistically based on observations (e.g., uniformly in an interval (a, b)).

Bayes’ rule for pdf’s

• θ is a parameter whose probability density is to be estimated based on data y.

• Conditional probability density functions:

p(θ, y) = p(θ) p(y|θ) = p(y) p(θ|y)

• Leading to Bayes’ rule:

p(θ|y) = p(θ) p(y|θ) / p(y)

• Often written as

p(θ|y) ∝ p(θ) L(y|θ)

• L is used because p(y|θ) is called the likelihood function.

• Instead of dividing by p(y) we can divide by the area under the curve:

p(θ|y) = p(θ) L(y|θ) / ∫ p(θ) L(y|θ) dθ
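A minimal numerical sketch of this normalization on a grid; the standard-normal prior and the single observation y = 1.2 are made-up values for illustration:

```python
import numpy as np

# Grid version of Bayes' rule: multiply the prior by the likelihood,
# then divide by the area under the curve so the posterior integrates to 1.
# (Hypothetical example: standard-normal prior, one observation y = 1.2
# with unit measurement noise.)
theta = np.linspace(-5, 5, 2001)
dtheta = theta[1] - theta[0]

prior = np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)               # p(theta)
likelihood = np.exp(-0.5 * (1.2 - theta)**2) / np.sqrt(2 * np.pi)  # L(y|theta)

unnorm = prior * likelihood
posterior = unnorm / (unnorm.sum() * dtheta)   # divide by the area

print(posterior.sum() * dtheta)   # area is 1 by construction
```

For this conjugate normal case the posterior peaks halfway between the prior mean 0 and the observation 1.2, which is a quick sanity check on the grid computation.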

Bayesian updating

– The process schematically:

p_post(θ) = k L(y|θ) p_prior(θ)

where the prior distribution p_prior(θ) is updated by the likelihood function L(y|θ) once the observed data y are added, and k is the constant that normalizes the posterior.

[Figure: prior distribution, likelihood function of the observed data, and the resulting posterior distribution; the posterior is narrower than the prior.]

Salary estimate example

• You are considering an engineering position for which salary offers θ (in thousand dollars) have recently followed the triangular distribution on (90, 110):

p(θ) = (θ − 90)/100 for 90 ≤ θ ≤ 100, p(θ) = (110 − θ)/100 for 100 ≤ θ ≤ 110

• Your friend received a $93K offer for a similar position, and you know that their range of offers for such positions is no more than $5K. Given θ, the observed offer is therefore uniform within $5K of θ, so

L(θ) = p(y = 93 | θ) = 1/10 = 0.1 for 88 ≤ θ ≤ 98, and 0 otherwise.

• Before your friend’s data, what was your chance of an offer < $93K?

• Estimate the distribution of the expected offer and the likeliest value.

• The product p(θ) L(θ) = 0.1 × (θ − 90)/100 = 0.001 (θ − 90) on 90 ≤ θ ≤ 98, the overlap of the prior and likelihood supports. The right-hand side is 0.008 at θ = 98, and the area under it is 0.032. To make the area equal to 1:

p(θ | 93) = 0.001 (θ − 90) / 0.032 = (θ − 90)/32, 90 ≤ θ ≤ 98

[Figure: posterior p(θ|93) rising linearly from 0 at θ = 90 to a peak of 0.25 at θ = 98.]
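A short grid check of the numbers in this example, assuming the triangular prior on (90, 110) peaked at 100 and the flat 0.1 likelihood on [88, 98]:

```python
import numpy as np

# Check the salary-example numbers: the area of prior x likelihood
# (about 0.032) and the posterior peak (about 0.25 at theta = 98).
theta = np.linspace(90, 110, 200001)
dtheta = theta[1] - theta[0]

prior = np.where(theta <= 100, (theta - 90) / 100, (110 - theta) / 100)
like = np.where(theta <= 98, 0.1, 0.0)   # offer observed within $5K of theta

unnorm = prior * like
area = unnorm.sum() * dtheta             # ~ 0.032 (value quoted on the slide)
posterior = unnorm / area                # peaks near 0.25 at theta = 98

print(area, posterior.max())
```

The printed values match the 0.032 normalizing area and the 0.25 peak of the posterior shown in the figure.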

Self evaluation question • What value of salary offer to your friend would leave you with the least uncertainty about your own expected offer?

Coin Trials Example

• Problem – For a weighted (uneven) coin, probability of heads is to be determined based on the experiments.

– Assume the true

θ

is 0.78, obtained after ∞ trials.

This is the parameter q to be estimated.

But we don’t know this. Only infer based on experiments.

• Bayesian parameter estimation Prior knowledge on p 0 ( q 1.

2.

) No prior information Normal dist centered at 0.5 with s =0.05

3.

Uniform distribution [0.5, 0.7] Posterior distribution of q

p

 q |

x

  | q

p

0 Experiment data: x times out of n trials. • 4 out of 5 trials • 78 out of 100 trials Likelihood by Binomial dist.

Count of successes Outcome  | q  

n

Given parameter Count of failures

C x

q

x

 1  q  Failure probability Success probability
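A grid version of this update for the two data sets, sketched here under the uniform (no-information) prior:

```python
import numpy as np
from math import comb

# Grid posterior p(theta|x) ~ L(x|theta) p0(theta) with a uniform prior
# (case 1), for the two data sets on the slide.
def posterior(x, n, m=1001):
    theta = np.linspace(0.0, 1.0, m)
    dtheta = theta[1] - theta[0]
    like = np.array([comb(n, x) * t**x * (1 - t)**(n - x) for t in theta])
    post = like / (like.sum() * dtheta)   # uniform prior drops out
    return theta, dtheta, post

for x, n in [(4, 5), (78, 100)]:
    theta, dtheta, post = posterior(x, n)
    mean = (theta * post).sum() * dtheta
    sd = np.sqrt(((theta - mean)**2 * post).sum() * dtheta)
    print(f"{x}/{n}: mean {mean:.3f}, sd {sd:.3f}")
```

More data narrows the posterior: the 78-out-of-100 case has roughly a quarter of the spread of the 4-out-of-5 case, matching the wide and narrow curves on the slides.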

Posterior of the probability of heads under the three priors:

1. No prior (uniform): the posterior centers on the observed frequency.
2. N(0.5, 0.05): the poor prior slows convergence.
3. U(0.5, 0.7): the posterior cannot exceed the 0.7 barrier imposed by the incorrect prior.

[Figures: posterior PDFs of θ for the three priors. Red: prior; wide curve: 4 out of 5; narrow curve: 78 out of 100.]
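The effect of the three priors can be compared numerically. This sketch assumes the 78-out-of-100 data and the N(0.5, 0.05) and U(0.5, 0.7) priors from the slide:

```python
import numpy as np
from math import comb

# Posterior mode under each of the three priors after 78 heads in 100
# trials, computed on a grid.
theta = np.linspace(0.0, 1.0, 1001)
like = np.array([comb(100, 78) * t**78 * (1 - t)**22 for t in theta])

priors = {
    "1. uniform": np.ones_like(theta),
    "2. N(0.5, 0.05)": np.exp(-0.5 * ((theta - 0.5) / 0.05)**2),
    "3. U(0.5, 0.7)": ((theta >= 0.5) & (theta <= 0.7)).astype(float),
}
for name, prior in priors.items():
    post = prior * like
    post = post / (post.sum() * (theta[1] - theta[0]))
    print(name, "posterior mode:", theta[np.argmax(post)])
```

The uniform prior recovers the data’s 0.78; the poor normal prior pulls the mode well below it; and the U(0.5, 0.7) posterior stays pinned at the 0.7 barrier, illustrating all three cases on the slide.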

Probability of 5 consecutive heads

• Prediction using the posterior (no-prior case).
• Exact value is binom(5,5,0.78) = 0.78^5 = 0.289.

Estimation process:
– Draw 10,000 random samples of θ from the posterior PDF p(θ|x).
– For each sample, compute p = binom(5,5,θ) = θ^5.

From the 10,000 samples of the predicted p: median 0.282, with 5% and 95% bounds at 0.172 and 0.416.

[Figures: posterior PDF of θ; histogram of the 10,000 θ samples; histogram of the 10,000 predicted p values.]
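The estimation process above can be reproduced by direct sampling. With a uniform prior, the posterior after 78 heads in 100 trials is the Beta(79, 23) distribution, so we can sample θ from it directly (an assumption standing in for the slide's grid posterior):

```python
import numpy as np

# Monte Carlo posterior prediction of 5 consecutive heads:
# sample theta from the posterior, then compute theta**5 per sample.
rng = np.random.default_rng(42)

theta_samples = rng.beta(79, 23, size=10_000)   # posterior after 78/100
p5 = theta_samples**5                           # binom(5,5,theta) = theta^5

print("median:", np.median(p5))
print("5%/95%:", np.percentile(p5, [5, 95]))
```

The results should land close to the slide’s median 0.282 and bounds (0.172, 0.416); the exact answer 0.289 differs slightly because the posterior is centered a little below the true θ = 0.78.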

Practice problems

1. For the salary estimate problem, what is the probability of getting a better offer than your friend?
2. For the salary problem, calculate the 95% confidence bounds on your salary around the mean and the median of your expected salary distribution.
3. Slide 9 shows the risks associated with using a prior. When is it important to use a prior?

Source: Smithsonian Institution Number: 2004-57325