Bayesian Essentials
Slides by Peter Rossi and David Madigan
1
Distribution Theory 101
Marginal and Conditional Distributions:

pY(y) = ∫ pX,Y(x,y) dx = ∫ pY|X(y|x) pX(x) dx

Example: take the joint density pX,Y(x,y) = 2 on 0 < y < x < 1. Then:

pX(x) = ∫ pX,Y(x,y) dy = ∫₀ˣ 2 dy = 2y |₀ˣ = 2x,  x ∈ (0,1)

pY|X(y|x) = pX,Y(x,y) / pX(x) = 2 / (2x) = 1/x,  y ∈ (0,x)  — uniform on (0,x)
2
Simulating from Joint
pX,Y (x,y ) = pY X (y x )pX (x )
To draw from the joint:
i. Draw from marginal on X
ii. Condition on this draw, and draw from
conditional of Y|X
library(triangle)                   # for rtriangle()
NumDraws <- 10000
x <- rtriangle(NumDraws, 0, 1, 1)   # marginal of X: density 2x on (0,1)
y <- runif(NumDraws, 0, x)          # conditional of Y|X=x: uniform(0, x)
plot(x, y)
3
Triangular Distribution
If U ~ unif(0,1), then:
sqrt(U) has the standard triangle distribution (density 2x on (0,1))
If U1, U2 ~ unif(0,1), then:
Y = max{U1, U2} has the standard triangle distribution
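Both facts are quick to verify by simulation. A minimal sketch (in Python, though the deck's code is in R; seed and sample size are arbitrary):

```python
import random

random.seed(0)
R = 100_000

# sqrt of one uniform
draws_sqrt = [random.random() ** 0.5 for _ in range(R)]
# max of two independent uniforms
draws_max = [max(random.random(), random.random()) for _ in range(R)]

# Under the standard triangle density 2x on (0,1): E[X] = 2/3, P(X <= 1/2) = 1/4
for draws in (draws_sqrt, draws_max):
    print(sum(draws) / R, sum(d <= 0.5 for d in draws) / R)
```

Both methods should reproduce the same mean and tail probability up to Monte Carlo error.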
4
Sampling Importance Resampling
[Figure: target density f(y) = 2y on (0,1) plotted against the uniform proposal g]
draw a big sample from g
sub-sample from that sample with probability f/g
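A minimal Python sketch of the two steps, for the toy target f(y) = 2y on (0,1) in the figure with a uniform(0,1) proposal g (sample sizes are illustrative):

```python
import random

random.seed(1)

def f(y):
    # target density: f(y) = 2y on (0, 1)
    return 2 * y

# step 1: draw a big sample from g = uniform(0, 1)
big = [random.random() for _ in range(200_000)]

# step 2: sub-sample with probability proportional to f/g (g = 1 here)
weights = [f(yi) for yi in big]
resampled = random.choices(big, weights=weights, k=20_000)

print(sum(resampled) / len(resampled))   # compare with E[Y] = 2/3 under f
```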
5
Metropolis
[Figure: target density f(y) = 2y on (0,1) plotted against the uniform proposal g]
start with current = 0.5
to get the next value:
draw a “proposal” from g
keep with probability f(proposal)/f(current)
else keep current
6
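The recipe can be sketched in Python for the same toy target f(y) = 2y, capping the acceptance ratio at 1 as in the standard algorithm:

```python
import random

random.seed(2)

def f(y):
    # target density: f(y) = 2y on (0, 1)
    return 2 * y

current = 0.5                    # start with current = 0.5
draws = []
for _ in range(100_000):
    proposal = random.random()   # draw a proposal from g = uniform(0, 1)
    ratio = f(proposal) / f(current)
    if random.random() < ratio:  # keep with probability min(1, f(prop)/f(curr))
        current = proposal
    draws.append(current)        # else keep current

print(sum(draws) / len(draws))   # compare with E[Y] = 2/3 under f
```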
The Goal of Inference
Make inferences about unknown quantities using
available information.
Inference – make probability statements
Unknowns – parameters, functions of parameters, states or latent variables, “future” outcomes, outcomes conditional on an action
Information –
  data-based
  non-data-based:
    theories of behavior; subjective views; mechanism
    parameters are finite or in some range
7
Bayes theorem
p(θ|D) = p(D,θ) / p(D) = p(D|θ) p(θ) / p(D)

p(θ|D) ∝ p(D|θ) p(θ)

Posterior ∝ “Likelihood” × Prior
Modern Bayesian computing – simulation methods
for generating draws from the posterior
distribution p(θ|D).
8
Summarizing the posterior
Output from Bayesian Inference: p(θ|D)
A possibly high-dimensional distribution
Summarize this object via simulation:
  marginal distributions of θ, h(θ)
  don’t just compute E[θ|D], Var[θ|D]
Contrast with Sampling Theory:
  point est/standard error
  summary of irrelevant dist
  bad summary (normal)
  limitations of asymptotics
9
Metropolis
Start somewhere with θcurrent
To get the next value, generate a proposal θproposal
Accept with “probability”:
p(θproposal|D) / p(θcurrent|D)
  = [p(D|θproposal) p(θproposal) / p(D)] / [p(D|θcurrent) p(θcurrent) / p(D)]
  = p(D|θproposal) p(θproposal) / [p(D|θcurrent) p(θcurrent)]

(the normalizing constant p(D) cancels)

else keep current
10
Example
Believe these measurements (D) come from N(μ,1):

0.9072867 -0.4490744 -0.1463117 0.2525023 0.9723840 -0.8946437 0.2529104 0.5101836 1.2289795 0.5685497

Prior for μ?
p(μ) = 2μ, μ ∈ (0,1)
[Figure: prior density p(μ) = 2μ on (0,1)]
11
Example continued
p(D|μ)?

0.9072867 -0.4490744 -0.1463117 0.2525023 0.9723840 -0.8946437 0.2529104 0.5101836 1.2289795 0.5685497

For y₁,…,y₁₀:

p(D|μ) = ∏_{i=1}^{10} (2π)^{-1/2} exp( -(yᵢ - μ)² / 2 )

switch to R…
other priors? unif(0,1), norm(0,1), norm(0,100)
generating good candidates?
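The “switch to R…” step might look like the following (sketched in Python rather than R; it reuses the uniform(0,1) candidate scheme from the earlier Metropolis slide, and drops constants since only posterior ratios matter):

```python
import math
import random

random.seed(3)

# the ten measurements from the slide
y = [0.9072867, -0.4490744, -0.1463117, 0.2525023, 0.9723840,
     -0.8946437, 0.2529104, 0.5101836, 1.2289795, 0.5685497]

def log_post(mu):
    # log p(D|mu) + log p(mu) up to constants, for y_i ~ N(mu, 1)
    # and the triangle prior p(mu) = 2*mu on (0,1)
    if not 0 < mu < 1:
        return -math.inf
    loglik = sum(-0.5 * (yi - mu) ** 2 for yi in y)
    return loglik + math.log(2 * mu)

current = 0.5
draws = []
for _ in range(50_000):
    proposal = random.random()   # uniform(0,1) candidate
    accept_prob = math.exp(min(0.0, log_post(proposal) - log_post(current)))
    if random.random() < accept_prob:
        current = proposal
    draws.append(current)

print(sum(draws) / len(draws))   # posterior mean of mu
```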
12
Prediction
See D, compute p(D̃|D) for a future observable D̃ — the “Predictive Distribution”:

p(D̃|D) = ∫ p(D̃|θ) p(θ|D) dθ

( ≠ p(D̃|θ̂) !!! )

assumes p(D̃, D|θ) = p(D̃|θ) p(D|θ)
13
Bayes/Classical Estimators
[Figure: likelihood ℓ(θ) concentrates as the sample size N grows, relative to the prior p(θ)]

Prior washes out – locally uniform!!! Bayes is consistent unless you have a dogmatic prior.

Asymptotically,

p(θ|D) ≈ N( θ̂_MLE, [-H|_{θ=θ̂_MLE}]⁻¹ )

where H is the Hessian of the log-likelihood at the MLE.
14
Bayesian Computations
Before simulation methods, Bayesians used posterior
expectations of various functions as summary of
posterior.
E_{θ|D}[h(θ)] = ∫ h(θ) [ p(D|θ) p(θ) / p(D) ] dθ

If p(θ|D) is in a convenient form (e.g. normal), then I might be able to compute this for some h.

note: p(D) = ∫ p(D|θ) p(θ) dθ
15
Conjugate Families
Models with convenient analytic properties almost
invariably come from conjugate families.
Why do I care now?
- conjugate models are used as building blocks
- build intuition about how Bayesian inference functions
Definition:
A prior is conjugate to a likelihood if the
posterior is in the same class of distributions as prior.
Basically, conjugate priors are like the posterior from
some imaginary dataset with a diffuse prior.
16
Beta-Binomial model
yᵢ ~ Bern(θ)

ℓ(θ) = ∏_{i=1}^{n} θ^{yᵢ} (1-θ)^{1-yᵢ} = θ^y (1-θ)^{n-y},  where y = Σ_{i=1}^{n} yᵢ

p(θ|y) = ?
Need a prior!
17
Beta distribution
Beta(a,b): p(θ) ∝ θ^{a-1} (1-θ)^{b-1}

E[θ] = a/(a+b)

[Figure: Beta densities for (a,b) = (2,4), (3,3), (4,2)]
18
Posterior
p(θ|D) ∝ p(D|θ) p(θ)
= [θ^y (1-θ)^{n-y}] × [θ^{a-1} (1-θ)^{b-1}]
= θ^{a+y-1} (1-θ)^{n-y+b-1}
~ Beta(a+y, b+n-y)
19
Prediction
Pr(ỹ = 1|y) = ∫ Pr(ỹ = 1|θ, y) p(θ|y) dθ
= ∫₀¹ θ p(θ|y) dθ
= E[θ|y]
= (a + y) / (a + b + n)
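A concrete instance of the conjugate update and the predictive probability, with made-up counts (n = 10 trials, y = 7 successes, and a Beta(2,2) prior):

```python
# made-up data: n = 10 Bernoulli trials with y = 7 successes
a, b = 2, 2          # Beta(a, b) prior
n, y = 10, 7

# conjugate update: posterior is Beta(a + y, b + n - y)
a_post, b_post = a + y, b + n - y

# predictive probability of the next success: Pr(y_new = 1 | y) = E[theta | y]
pred = a_post / (a_post + b_post)
print(a_post, b_post, pred)   # Beta(9, 5); predictive prob 9/14
```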
20
Regression model
yᵢ = xᵢ'β + εᵢ,  εᵢ ~ Normal(0, σ²)

p(yᵢ|β, σ²) = (2πσ²)^{-1/2} exp[ -(yᵢ - xᵢ'β)² / (2σ²) ]

y | X, β, σ² ~ N(Xβ, σ²I)
21
Bayesian Regression
Prior:

p(β, σ²) = p(β|σ²) p(σ²)

p(β|σ²) ∝ (σ²)^{-k/2} exp[ -(β - β̄)'A(β - β̄) / (2σ²) ]

p(σ²) ∝ (σ²)^{-(ν₀/2 + 1)} exp[ -ν₀s₀² / (2σ²) ]

Inverted Chi-Square: σ² ~ ν₀s₀² / χ²_{ν₀}

Interpretation: as from another dataset.

Draw from prior?
22
Posterior
p(β, σ²|D) ∝ ℓ(β, σ²) p(β|σ²) p(σ²)

∝ (σ²)^{-n/2} exp[ -(y - Xβ)'(y - Xβ) / (2σ²) ]
× (σ²)^{-k/2} exp[ -(β - β̄)'A(β - β̄) / (2σ²) ]
× (σ²)^{-(ν₀/2 + 1)} exp[ -ν₀s₀² / (2σ²) ]
23
Combining quadratic forms
(y - Xβ)'(y - Xβ) + (β - β̄)'A(β - β̄)
= (y - Xβ)'(y - Xβ) + (β - β̄)'U'U(β - β̄)    [writing A = U'U]
= (v - Wβ)'(v - Wβ)

where v = [ y ; Uβ̄ ] and W = [ X ; U ] (stacked)

(v - Wβ)'(v - Wβ) = ns² + (β - β̃)'W'W(β - β̃)

β̃ = (W'W)⁻¹W'v = (X'X + A)⁻¹(X'Xβ̂ + Aβ̄)

ns² = (v - Wβ̃)'(v - Wβ̃) = (y - Xβ̃)'(y - Xβ̃) + (β̃ - β̄)'A(β̃ - β̄)
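The stacking identity is easy to sanity-check numerically; a scalar (k = 1) Python sketch with arbitrary made-up numbers:

```python
import math

# tiny numeric check with scalar beta (k = 1); all numbers are made up
x = [1.0, 2.0, 3.0]   # regressors
y = [1.1, 1.9, 3.2]   # responses
a = 4.0               # prior precision A (so U = sqrt(A))
bbar = 0.5            # prior mean beta-bar
beta = 0.8            # any trial value of beta

# left side: residual sum of squares plus prior quadratic form
lhs = sum((yi - xi * beta) ** 2 for xi, yi in zip(x, y)) + a * (beta - bbar) ** 2

# right side: one quadratic form in the stacked system v = [y; U*bbar], W = [x; U]
u = math.sqrt(a)
v = y + [u * bbar]
w = x + [u]
rhs = sum((vi - wi * beta) ** 2 for vi, wi in zip(v, w))

print(lhs, rhs)   # the two quadratic forms agree
```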
24
Posterior
∝ (σ²)^{-k/2} exp[ -(β - β̃)'(X'X + A)(β - β̃) / (2σ²) ]
× (σ²)^{-(n + ν₀)/2 - 1} exp[ -(ν₀s₀² + ns²) / (2σ²) ]

[β | σ²] = N( β̃, σ²(X'X + A)⁻¹ )

[σ²] = ν₁s₁² / χ²_{ν₁}  with  ν₁ = ν₀ + n

s₁² = (ν₀s₀² + ns²) / (ν₀ + n)
25
IID Simulations
Scheme: [y|X, β, σ²] × [β|σ²] × [σ²]  →  [β, σ² | y, X]

[β, σ² | y, X] = [σ² | y, X] [β | σ², y, X]

1) Draw [σ² | y, X]
2) Draw [β | σ², y, X]
3) Repeat
26
IID Simulator, cont.
1) [σ² | y, X] = ν₁s₁² / χ²_{ν₁}

2) [β | y, X, σ²] = N( β̃, σ²(X'X + A)⁻¹ )

   β̃ = (X'X + A)⁻¹(X'Xβ̂ + Aβ̄)

note: draw z ~ N(0, I); then β = U'z + β̃ ~ N(β̃, U'U) with U'U = σ²(X'X + A)⁻¹

27
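Putting the last few slides together, a self-contained Python sketch of the IID simulator with made-up scalar data (k = 1 for brevity; the χ²_{ν₁} draw is built from ν₁ squared standard normals, which works because ν₁ is an integer here; all names are illustrative):

```python
import math
import random

random.seed(4)

# made-up scalar-regression data and prior settings
x = [0.5, 1.0, 1.5, 2.0, 2.5]
y = [1.2, 2.1, 2.9, 4.2, 4.8]
n = len(y)
A = 1.0                 # prior precision on beta
bbar = 0.0              # prior mean of beta
nu0, s0sq = 3, 1.0      # prior on sigma^2: nu0*s0sq / chisq_nu0

xx = sum(xi * xi for xi in x)
xy = sum(xi * yi for xi, yi in zip(x, y))
# posterior mean: (X'X + A)^{-1}(X'X*bhat + A*bbar), using X'X*bhat = X'y
btilde = (xy + A * bbar) / (xx + A)

# ns^2 from the stacked residuals (v - W*btilde)'(v - W*btilde)
nssq = sum((yi - xi * btilde) ** 2 for xi, yi in zip(x, y)) + A * (btilde - bbar) ** 2
nu1 = nu0 + n
s1sq = (nu0 * s0sq + nssq) / nu1

def draw():
    # 1) sigma^2 | y, X  ~  nu1 * s1^2 / chisq_nu1
    chisq = sum(random.gauss(0, 1) ** 2 for _ in range(nu1))
    sigsq = nu1 * s1sq / chisq
    # 2) beta | sigma^2, y, X  ~  N(btilde, sigsq / (X'X + A))
    beta = random.gauss(btilde, math.sqrt(sigsq / (xx + A)))
    return beta, sigsq

draws = [draw() for _ in range(20_000)]
print(sum(b for b, _ in draws) / len(draws))   # posterior mean of beta, near btilde
```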