Introduction to Bayesian statistics


Three approaches to Probability

Axiomatic: probability by definition and properties
Relative frequency: repeated trials
Degree of belief (subjective): a personal measure of uncertainty

Problems — statements like these have no repeated-trials interpretation:
The chance that a meteor strikes earth is 1%
The probability of rain today is 30%
The chance of getting an A on the exam is 50%
Problems of statistical inference


Ho: θ = 1 versus Ha: θ > 1

Classical approach:
P-value = P(Data | θ = 1)
P-value is NOT P(null hypothesis is true)
Confidence interval [a, b]: what does it mean?

But the scientist wants to know:
P(θ = 1 | Data)
P(Ho is true) = ?

Problem: in the classical framework, θ is "not random," so these probabilities are not even defined.
Bayesian statistics




Fundamental change in philosophy:
θ is assumed to be a random variable
This allows us to assign a probability distribution to θ based on prior information

A 95% "confidence" interval [1.34 < θ < 2.97] means what we "want" it to mean:
P(1.34 < θ < 2.97) = 95%

"P-values" mean what we want them to mean:
P(null hypothesis is false | Data)
Estimating P(Heads) for a biased coin


Parameter: p
Data: 0, 0, 0, 1, 0, 1, 0, 0, 1, 0
Sample proportion: p = 3/10 = 0.3
But what if we believe the coin is biased in favor of low probabilities?
How do we incorporate prior beliefs into the model? (A quick check of the sample proportion follows the figure below.)
We'll see that p-hat = .22
[Figure: density curve over p, concentrated on low values]
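The sample proportion above is a one-liner in R (the language of the course's scripts):

    x <- c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0)   # the coin-toss data from this slide
    mean(x)                                 # sample proportion p-hat = 0.3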
Bayes Theorem
P(A | B) = P(A and B) / P(B)
         = P(B | A) P(A) / P(B)
         = P(B | A) P(A) / [ P(B | A) P(A) + P(B | A^c) P(A^c) ]
Example





Population has 10% liars
Lie detector gets it "right" 90% of the time
Let A = {actual liar}
Let L = {lie detector reports you are a liar}
The lie detector reports that the suspect is a liar. What is the probability that the suspect actually is a liar?

P(A | L) = P(L | A) P(A) / [ P(L | A) P(A) + P(L | A^c) P(A^c) ]
         = (.90)(.10) / [ (.90)(.10) + (.10)(.90) ]
         = 1/2  !!!!!
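A quick numerical check of this calculation in R (a minimal sketch; the function name bayes_rule is our own):

    bayes_rule <- function(p_A, p_B_given_A, p_B_given_Ac) {
      # P(A | B) via Bayes theorem with the two-event partition {A, A^c}
      p_B_given_A * p_A / (p_B_given_A * p_A + p_B_given_Ac * (1 - p_A))
    }
    bayes_rule(p_A = 0.10, p_B_given_A = 0.90, p_B_given_Ac = 0.10)   # returns 0.5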
More general form of Bayes Theorem
If S = A_1 ∪ A_2 ∪ ... ∪ A_n, then

P(A_i | B) = P(A_i and B) / P(B)
           = P(B | A_i) P(A_i) / P(B)
           = P(B | A_i) P(A_i) / Σ_j P(B | A_j) P(A_j)
Example



Three urns:
Urn A: 1 red, 1 blue. Urn B: 2 red, 1 blue. Urn C: 2 red, 3 blue.
Roll a fair die. If it's 1, pick Urn A. If 2 or 3, pick Urn B. If 4, 5, or 6, pick Urn C. Then choose one ball.
A ball was chosen and it's red. What's the probability it came from Urn C?

P(C | red) = P(red | C) P(C) / [ P(red | A) P(A) + P(red | B) P(B) + P(red | C) P(C) ]
           = (2/5)(3/6) / [ (1/2)(1/6) + (2/3)(2/6) + (2/5)(3/6) ]
           ≈ 0.3956
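The same computation with the general form, written as a short R sketch over the three-urn partition:

    prior <- c(A = 1/6, B = 2/6, C = 3/6)      # P(urn) from the die roll
    like  <- c(A = 1/2, B = 2/3, C = 2/5)      # P(red | urn)
    post  <- like * prior / sum(like * prior)  # Bayes theorem over the partition
    post["C"]                                  # ≈ 0.3956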
Bayes Theorem for Statistics


Let θ represent the parameter(s)
Let X represent the data

f(θ | X) = f(X | θ) f(θ) / f(X)

The left-hand side is a function of θ
The denominator on the right-hand side does not depend on θ

f(θ | X) ∝ f(X | θ) f(θ)

Posterior distribution ∝ Likelihood × Prior distribution
Posterior dist'n = Constant × Likelihood × Prior dist'n
The equation can be understood at the level of densities
Goal: Explore the posterior distribution of θ
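One way to see "posterior ∝ likelihood × prior" concretely is a grid approximation. This R sketch is our own illustration (it anticipates the coin data of the next slides, 3 heads in 10 tosses):

    theta <- seq(0.01, 0.99, by = 0.01)   # grid of parameter values
    prior <- rep(1, length(theta))        # uniform prior, f(theta) = 1
    like  <- theta^3 * (1 - theta)^7      # likelihood for 3 heads in 10 tosses
    post  <- like * prior
    post  <- post / sum(post)             # normalizing constant handled numerically
    sum(theta * post)                     # posterior mean, close to 4/12 ≈ 0.33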
A simple estimation example





Biased coin estimation: P(Heads) = p = ?
X_1, ..., X_n are 0-1 i.i.d. Bernoulli(p) trials
Let X = Σ X_i be the number of heads in n trials
Likelihood: f(X | p) = p^X (1-p)^(n-X)
For the prior distribution, use an uninformative prior:
the uniform distribution on (0,1): f(p) = 1
So the posterior distribution is proportional to
f(X | p) f(p) = p^X (1-p)^(n-X)
f(p | X) ∝ p^X (1-p)^(n-X)
Coin estimation (cont’d)






Posterior density of the form f(p) = C p^x (1-p)^(n-x)
This is a beta distribution with parameters x+1 and n-x+1
http://mathworld.wolfram.com/BetaDistribution.html
Data: 0, 0, 1, 0, 0, 0, 0, 1, 0, 1
n = 10 and x = 3
Posterior dist'n is Beta(3+1, 7+1) = Beta(4,8)
Coin estimation (cont’d)






Posterior dist'n: Beta(4,8)
Mean: 0.33
Mode: 0.30
Median: 0.3238
qbeta(.025,4,8), qbeta(.975,4,8) = [.11, .61]
gives a 95% credible interval for p:
P(.11 < p < .61 | X) = .95
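These summaries can be reproduced in a few lines of R (the formulas for the mean and mode appear two slides below):

    a <- 4; b <- 8                     # posterior Beta(4, 8)
    a / (a + b)                        # mean = 0.333
    (a - 1) / (a + b - 2)              # mode = 0.3
    qbeta(0.5, a, b)                   # median ≈ 0.3238
    qbeta(c(0.025, 0.975), a, b)       # 95% credible interval ≈ [.11, .61]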
Prior distribution

Choice of a beta distribution for the prior:
f(p) ∝ p^(a-1) (1-p)^(b-1), i.e., p ~ Beta(a, b)

Posterior ∝ Likelihood × Prior
          = [ p^x (1-p)^(n-x) ] [ p^(a-1) (1-p)^(b-1) ]
          = p^(x+a-1) (1-p)^(n-x+b-1)

Posterior distribution is Beta(x+a, n-x+b)
Prior distributions

Posterior summaries:
Mean = (x+a) / (n+a+b)
Mode = (x+a-1) / (n+a+b-2)
Quantiles can be computed by integrating the beta density
For this example, the prior and posterior distributions have the same general form
Priors that have the same form as the posterior are called conjugate priors
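The conjugate update is one line of arithmetic; here is a minimal R sketch (the function name beta_posterior is our own):

    beta_posterior <- function(x, n, a = 1, b = 1) {
      # Beta(a, b) prior + x heads in n trials -> Beta(x + a, n - x + b) posterior
      post_a <- x + a
      post_b <- n - x + b
      c(mean = post_a / (post_a + post_b),
        mode = (post_a - 1) / (post_a + post_b - 2),
        lo95 = qbeta(0.025, post_a, post_b),
        hi95 = qbeta(0.975, post_a, post_b))
    }
    beta_posterior(x = 3, n = 10)   # uniform prior: recovers the Beta(4, 8) summaries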
Data example






Maternal condition: placenta previa
An unusual condition of pregnancy in which the placenta is implanted very low in the uterus, preventing normal delivery
Is this related to the sex of the baby?
The proportion of female births in the general population is 0.485
An early study in Germany found that in 980 placenta previa births, 437 were female (0.4459)
Ho: p = 0.485 versus Ha: p < 0.485
Placenta previa births



Assume the uniform prior Beta(1,1)
Posterior is Beta(438, 544)
Posterior summaries:
Mean = 0.446, Standard Deviation = 0.016
95% credible interval:
[ qbeta(.025,438,544), qbeta(.975,438,544) ] = [ .415, .477 ]
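Reproducing these numbers in R; the last line (our addition) reads off the posterior probability of the alternative hypothesis directly:

    a <- 438; b <- 544                        # posterior Beta(438, 544)
    a / (a + b)                               # mean ≈ 0.446
    sqrt(a * b / ((a + b)^2 * (a + b + 1)))   # standard deviation ≈ 0.016
    qbeta(c(0.025, 0.975), a, b)              # 95% credible interval ≈ [.415, .477]
    pbeta(0.485, a, b)                        # P(p < 0.485 | X) ≈ 0.99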
Sensitivity of Prior



Suppose we took a prior more concentrated about the null hypothesis value
E.g., Prior ~ Normal(.485, .01)

Posterior proportional to
p^437 (1-p)^543 e^( -(p - .485)^2 / (2 · .01) )

The constant of integration is about 10^-294
Mean, summary statistics, credible intervals, etc., require numerical methods
See S-script:
http://www.people.carleton.edu/~rdobrow/courses/275w05/Scripts/Bayes.ssc
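The linked S-script is not reproduced here, but a minimal R sketch of the numerical approach (our own, working on the log scale to dodge the 10^-294 scaling) looks like this:

    # unnormalized log posterior: beta-type likelihood x Normal(.485, .01) prior
    log_post <- function(p) 437 * log(p) + 543 * log(1 - p) - (p - 0.485)^2 / (2 * 0.01)
    m <- log_post(0.45)                       # rescale near the mode to avoid underflow
    f <- function(p) exp(log_post(p) - m)
    Z <- integrate(f, 0, 1)$value             # rescaled normalizing constant
    integrate(function(p) p * f(p), 0, 1)$value / Z   # posterior mean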