Part 25: Bayesian [1/57]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
Part 25: Bayesian [2/57]
Econometric Analysis of Panel Data
25. Bayesian Econometric Models for Panel Data
Part 25: Bayesian [3/57]
Sources

- Lancaster, T.: An Introduction to Modern Bayesian Econometrics, Blackwell, 2004
- Koop, G.: Bayesian Econometrics, Wiley, 2003
- … "Bayesian Methods," "Bayesian Data Analysis," … (many books in statistics)
- Papers in Marketing: Allenby, Ginter, Lenk, Kamakura, …
- Papers in Statistics: Sid Chib, …
- Books and Papers in Econometrics: Arnold Zellner, Gary Koop, Mark Steel, Dale Poirier, …
Part 25: Bayesian [4/57]
Software

- Stata, Limdep, SAS, etc.
- S, R, Matlab, Gauss
- WinBUGS: Bayesian inference Using Gibbs Sampling (on random number generation)
Part 25: Bayesian [5/57]
http://www.mrcbsu.cam.ac.uk/bugs/welcome.shtml
Part 25: Bayesian [6/57]
A Philosophical Underpinning

- A method of using new information to update existing beliefs about probabilities of events
- Bayes Theorem for events (conceived for updating beliefs about games of chance):

    Pr(A | B) = Pr(A,B) / Pr(B) = Pr(B | A) Pr(A) / Pr(B)
Part 25: Bayesian [7/57]
On Objectivity and Subjectivity

- Objectivity and "Frequentist" methods in Econometrics – the data speak
- Subjectivity and Beliefs
  - Priors
  - Evidence
  - Posteriors
- Science and the Scientific Method
Part 25: Bayesian [8/57]
Paradigms

- Classical
  - Formulate the theory
  - Gather evidence
    - Evidence consistent with theory? Theory stands and waits for more evidence to be gathered
    - Evidence conflicts with theory? Theory falls
- Bayesian
  - Formulate the theory
  - Assemble existing evidence on the theory
  - Form beliefs based on existing evidence
  - Gather evidence
  - Combine beliefs with new evidence
  - Revise beliefs regarding the theory
Part 25: Bayesian [9/57]
Applications of the Paradigm

- Classical econometricians doggedly cling to their theories even when the evidence conflicts with them – that is what specification searches are all about.
- Bayesian econometricians NEVER incorporate prior evidence in their estimators – priors are always studiously noninformative. (Informative priors taint the analysis.) As practiced, Bayesian analysis is not Bayesian.
Part 25: Bayesian [10/57]
Likelihoods

- (Frequentist) The likelihood is the density of the observed data conditioned on the parameters.
  - Inference based on the likelihood is usually "maximum likelihood."
- (Bayesian) A function of the parameters and the data that forms the basis for inference – not a probability distribution.
  - The likelihood embodies the current information about the parameters and the data.
Part 25: Bayesian [11/57]
The Likelihood Principle

- The likelihood embodies ALL the current information about the parameters and the data.
- Proportional likelihoods should lead to the same inferences.
Part 25: Bayesian [12/57]
Application:

(1) 20 Bernoulli trials, 7 successes (Binomial):

    L(θ; N = 20, s = 7) = (20 choose 7) θ^7 (1−θ)^13

(2) N Bernoulli trials until the 7th success (Negative Binomial):

    L(θ; N = 20, s = 7) = (19 choose 6) θ^7 (1−θ)^13
Part 25: Bayesian [13/57]
Inference

Classical:
(1) The MLE is θ̂ = 7/20.
(2) There is no estimator. We have a sample of 1 from the distribution of N. What can be said about θ? Apparently nothing.

Bayesian: The posterior for both scenarios is

    L(θ; N=20, s=7) P(θ) / ∫₀¹ L(θ; N=20, s=7) P(θ) dθ  =  θ^7 (1−θ)^13 P(θ) / ∫₀¹ θ^7 (1−θ)^13 P(θ) dθ

Inference about θ, whatever it is, is the same.
A. Bayesian analysis adheres to the likelihood principle.
B. Data and parameters are treated the same.
Part 25: Bayesian [14/57]
The Bayesian Estimator

- The posterior distribution embodies all that is "believed" about the model.
  - Posterior = f(model|data) = Likelihood(θ,data) × prior(θ) / P(data)
- "Estimation" amounts to examining the characteristics of the posterior distribution(s):
  - Mean, variance
  - Distribution
  - Intervals containing specified probabilities
Part 25: Bayesian [15/57]
Priors and Posteriors

- The Achilles heel of Bayesian Econometrics
- Noninformative and Informative priors for estimation of parameters
  - Noninformative (diffuse) priors: how to incorporate the total lack of prior belief in the Bayesian estimator. The estimator becomes solely a function of the likelihood.
  - Informative prior: some prior information enters the estimator. The estimator mixes the information in the likelihood with the prior information.
- Improper and Proper priors
  - P(θ) is uniform over the allowable range of θ.
  - It cannot integrate to 1.0 if the range is infinite.
  - Salvation – improper, but noninformative, priors will fall out of the posterior.
Part 25: Bayesian [16/57]
Diffuse (Flat) Priors

E.g., the binomial example:

    L(θ; N, s) = (N choose s) θ^s (1−θ)^(N−s)

Uninformative prior (?): uniform (flat), P(θ) = 1, 0 ≤ θ ≤ 1.

    P(θ | N, s) = (N choose s) θ^s (1−θ)^(N−s) × 1 / ∫₀¹ (N choose s) θ^s (1−θ)^(N−s) × 1 dθ

                = [Γ(N+2) / (Γ(s+1) Γ(N−s+1))] θ^s (1−θ)^(N−s)   … a Beta distribution

    Posterior mean = (s+1) / [(N−s+1) + (s+1)] = (s+1) / (N+2)

For the example, N = 20, s = 7. MLE = 7/20 = .35.
Posterior mean = 8/22 = .3636 > MLE. Why? The prior was informative. (Prior mean = .5)
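A quick check on the algebra, as a minimal Python sketch (assuming scipy is available): under the flat prior the posterior is Beta(s+1, N−s+1).

from scipy import stats

N, s = 20, 7
posterior = stats.beta(s + 1, N - s + 1)   # Beta(8, 14) under the flat prior

print(posterior.mean())                    # 8/22 = 0.3636..., vs. MLE 0.35
print(posterior.interval(0.95))            # a 95% equal-tailed credible interval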

Part 25: Bayesian [17/57]
Conjugate Prior

A mathematical device to produce a tractable posterior. This is a typical application.

    L(θ; N, s) = (N choose s) θ^s (1−θ)^(N−s) = [Γ(N+1) / (Γ(s+1) Γ(N−s+1))] θ^s (1−θ)^(N−s)

Use a conjugate beta prior,

    p(θ) = [Γ(a+b) / (Γ(a) Γ(b))] θ^(a−1) (1−θ)^(b−1)

Then

    Posterior = L(θ; N, s) p(θ) / ∫₀¹ L(θ; N, s) p(θ) dθ
              = θ^(s+a−1) (1−θ)^(N−s+b−1) / ∫₀¹ θ^(s+a−1) (1−θ)^(N−s+b−1) dθ   … a Beta distribution.

    Posterior mean = (s+a) / (N+a+b)    (we used a = b = 1 before)
Part 25: Bayesian [18/57]
THE Question
Where does the prior come from?
Part 25: Bayesian [19/57]
Large Sample Properties of Posteriors

- Under a uniform prior, the posterior is proportional to the likelihood function.
  - The Bayesian 'estimator' is the mean of the posterior.
  - The MLE equals the mode of the likelihood.
  - In large samples, the likelihood becomes approximately normal – the mean equals the mode.
  - Thus, in large samples, the posterior mean will be approximately equal to the MLE.
Part 25: Bayesian [20/57]
Reconciliation

A Theorem (Bernstein–von Mises)

- The posterior distribution converges to normal with covariance matrix equal to 1/N times the information matrix (same as classical MLE). (The distribution that is converging is the posterior, not the sampling distribution of the estimator of the posterior mean.)
- The posterior mean (empirical) converges to the mode of the likelihood function – the same as the MLE. A proper prior disappears asymptotically.
- The asymptotic sampling distribution of the posterior mean is the same as that of the MLE.
Part 25: Bayesian [21/57]
Mixed Model Estimation

- MLwiN: Multilevel modeling for Windows
  - http://multilevel.ioe.ac.uk/index.html
  - Uses mostly Bayesian MCMC methods
  - "Markov Chain Monte Carlo (MCMC) methods allow Bayesian models to be fitted, where prior distributions for the model parameters are specified. By default MLwiN sets diffuse priors which can be used to approximate maximum likelihood estimation." (From their website.)
Part 25: Bayesian [22/57]
Bayesian Estimators

- First generation: do the integration (math)

    E(β | data) = ∫_β β [f(data | β) p(β) / f(data)] dβ

- Contemporary – simulation:
  - (1) Deduce the posterior.
  - (2) Draw random samples from the posterior and compute the sample means and variances of the samples. (Relies on the law of large numbers.)
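A minimal numerical illustration (assuming numpy), reusing the Beta(8, 14) posterior from the earlier binomial example: the posterior mean is estimated by averaging simulated draws, relying on the law of large numbers.

import numpy as np

rng = np.random.default_rng(12345)
R = 100_000
draws = rng.beta(8, 14, size=R)    # R random draws from the posterior of theta

print(draws.mean())                # close to the exact posterior mean 8/22 = 0.3636
print(draws.std())                 # simulation estimate of the posterior standard deviation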
Part 25: Bayesian [23/57]
The Linear Regression Model

Likelihood:

    L(β, σ² | y, X) = [2πσ²]^(−n/2) exp[ −(1/(2σ²)) (y−Xβ)'(y−Xβ) ]

Transformation using d = (N−K) and s² = (1/d)(y−Xb)'(y−Xb):

    −(1/(2σ²)) (y−Xβ)'(y−Xβ) = −(1/2) [ ds² (1/σ²) + (1/σ²)(β−b)'X'X(β−b) ]

Diffuse uniform prior for β, conjugate gamma prior for σ².

Joint Posterior:

    f(β, σ² | y, X) = { [ds²]^(d/2) / Γ(d/2) } (1/σ²)^(d+1) e^(−ds²(1/(2σ²)))
                      × exp{ −(1/2)(β−b)'[σ²(X'X)⁻¹]⁻¹(β−b) } / { [2π]^(K/2) |σ²(X'X)⁻¹|^(1/2) }
Part 25: Bayesian [24/57]
Marginal Posterior for β

After integrating σ² out of the joint posterior:

    f(β | y, X) = { [ds²]^(d/2) Γ((d+K)/2) / ( Γ(d/2) [2π]^(K/2) |X'X|^(−1/2) ) }
                  × 1 / [ ds² + (β−b)'X'X(β−b) ]^((d+K)/2)

Multivariate t with mean b and variance matrix [(n−K)/(n−K−2)] s²(X'X)⁻¹.

The Bayesian 'estimator' equals the MLE. Of course; the prior was noninformative. The only information available is in the likelihood.
Part 25: Bayesian [25/57]
Nonlinear Models and Simulation

- Bayesian inference over parameters in a nonlinear model:
  1. Parameterize the model.
  2. Form the likelihood conditioned on the parameters.
  3. Develop the priors – a joint prior for all model parameters.
  4. The posterior is proportional to likelihood times prior. (Usually requires conjugate priors to be tractable.)
  5. Draw observations from the posterior to study its characteristics.
Part 25: Bayesian [26/57]
Simulation Based Inference

Form the likelihood L(θ, data).
Form the prior p(θ).
Form the posterior K × p(θ) L(θ, data), where K is a constant that makes the whole thing integrate to 1.

    Posterior mean = ∫_θ θ K p(θ) L(θ, data) dθ

Estimate the posterior mean by Ê(θ | data) = (1/R) Σ_{r=1}^{R} θ_r by simulating draws θ_r from the posterior.
Part 25: Bayesian [27/57]
A Practical Problem

Sampling from the joint posterior may be impossible. E.g., linear regression:

    f(β, σ² | y, X) = { [vs²]^(v/2) / Γ(v/2) } (1/σ²)^(v+1) e^(−vs²(1/(2σ²)))
                      × exp( −(1/2)(β−b)'[σ²(X'X)⁻¹]⁻¹(β−b) ) / { [2π]^(K/2) |σ²(X'X)⁻¹|^(1/2) }

What is this??? To do 'simulation based estimation' here, we need joint observations on (β, σ²).
Part 25: Bayesian [28/57]
A Solution to the Sampling Problem

The joint posterior, p(β, σ²|data), is intractable. But:

For inference about β, a sample from the marginal posterior, p(β|data), would suffice.
For inference about σ², a sample from the marginal posterior, p(σ²|data), would suffice.

Can we deduce these? For this problem, we do have the conditionals:

    p(β | σ², data) = N[ b, σ²(X'X)⁻¹ ]
    p(σ² | β, data): a gamma distribution, based on Σᵢ (yᵢ − xᵢ'β)² / σ²

Can we use this information to sample from p(β|data) and p(σ²|data)?
Part 25: Bayesian [29/57]
The Gibbs Sampler

- Target: sample from the marginals of f(x1, x2) = the joint distribution.
- The joint distribution is unknown, or it is not possible to sample from the joint distribution.
- Assumed: f(x1|x2) and f(x2|x1) are both known and samples can be drawn from both.
- Gibbs sampling: obtain draws from (x1, x2) by many cycles between x1|x2 and x2|x1.
  - Start x1,0 anywhere in the right range.
  - Draw x2,0 from x2|x1,0.
  - Return to x1,1 from x1|x2,0, and so on.
  - Several thousand cycles produce the draws.
  - Discard the first several thousand to avoid initial conditions. (Burn in)
  - Average the draws to estimate the marginal means.
Part 25: Bayesian [30/57]
Bivariate Normal Sampling

Draw a random sample from the bivariate normal with mean (0, 0) and covariance matrix [1 ρ; ρ 1].

(1) Direct approach: (v1, v2)_r' = Γ (u1, u2)_r', where (u1, u2)_r are two independent standard
    normal draws (easy) and Γ = [γ11 0; γ21 γ22] is such that ΓΓ' = [1 ρ; ρ 1]:
    γ11 = 1, γ21 = ρ, γ22 = √(1−ρ²).

(2) Gibbs sampler:

    v1 | v2 ~ N[ ρ v2, 1−ρ² ]
    v2 | v1 ~ N[ ρ v1, 1−ρ² ]
Part 25: Bayesian [31/57]
Gibbs Sampling for the Linear Regression Model

    p(β | σ², data) = N[ b, σ²(X'X)⁻¹ ]
    p(σ² | β, data): a gamma distribution, based on Σᵢ (yᵢ − xᵢ'β)² / σ²

Sample back and forth between these two distributions.
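A minimal sketch of this back-and-forth in Python (numpy assumed); the σ² step uses the usual inverse chi-squared form of the gamma conditional, drawing σ² as the sum of squared residuals divided by a chi-squared(n) variate.

import numpy as np

rng = np.random.default_rng(1)
n, K = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)      # simulated data

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                      # OLS = conditional posterior mean of beta
sigma2 = 1.0
beta_draws, s2_draws = [], []
for r in range(6_000):
    beta = rng.multivariate_normal(b, sigma2 * XtX_inv)      # beta | sigma^2, data
    ssr = np.sum((y - X @ beta) ** 2)
    sigma2 = ssr / rng.chisquare(n)                          # sigma^2 | beta, data
    beta_draws.append(beta)
    s2_draws.append(sigma2)

beta_draws = np.array(beta_draws[1_000:])  # drop the burn-in
print(beta_draws.mean(axis=0))             # posterior means, close to OLS b
print(np.mean(s2_draws[1_000:]))           # posterior mean of sigma^2, close to 1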
Part 25: Bayesian [32/57]
Application – the Probit Model

    (a) yᵢ* = xᵢ'β + εᵢ,  εᵢ ~ N[0, 1]
    (b) yᵢ = 1 if yᵢ* > 0, 0 otherwise

Consider estimation of β and yᵢ* (data augmentation):

(1) If y* were observed, this would be a linear regression (yᵢ would not be useful since it is just sgn(yᵢ*)). We saw in the linear model before, p(β | yᵢ*, yᵢ).
(2) If (only) β were observed, yᵢ* would be a draw from the normal distribution with mean xᵢ'β and variance 1. But yᵢ gives the sign of yᵢ*. yᵢ* | β, yᵢ is a draw from the truncated normal (truncated from above if y = 0, from below if y = 1).
Part 25: Bayesian [33/57]
Gibbs Sampling for the Probit Model

(1) Choose an initial value for β (maybe the MLE).
(2) Generate yᵢ* by sampling N observations from the truncated normal with mean xᵢ'β and variance 1, truncated from above at 0 if yᵢ = 0, from below if yᵢ = 1.
(3) Generate β by drawing a random normal vector with mean vector (X'X)⁻¹X'y* and variance matrix (X'X)⁻¹.
(4) Return to (2) 10,000 times, retaining the last 5,000 draws – the first 5,000 are the 'burn in.'
(5) Estimate the posterior mean of β by averaging the last 5,000 draws.

(This corresponds to a uniform prior over β.)
Part 25: Bayesian [34/57]
Generating Random Draws from f(X)

The inverse probability method of sampling random draws: if F(x) is the CDF of random variable x, then a random draw on x may be obtained as F⁻¹(u), where u is a draw from the standard uniform (0,1).

Examples:

    Exponential: f(x) = θ exp(−θx); F(x) = 1 − exp(−θx); x = −(1/θ) log(1−u)
    Normal: F(x) = Φ(x); x = Φ⁻¹(u)
    Truncated Normal: x = μᵢ + Φ⁻¹[1 − (1−u)Φ(μᵢ)] for y = 1;
                      x = μᵢ + Φ⁻¹[u Φ(−μᵢ)] for y = 0.
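A minimal sketch of these inverse-CDF draws in Python (numpy and scipy assumed), including the truncated normal draws used in the probit Gibbs sampler:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
u = rng.uniform(size=100_000)

# Exponential with rate theta: x = -(1/theta) log(1 - u)
theta = 2.0
x_exp = -(1.0 / theta) * np.log(1.0 - u)
print(x_exp.mean())                                   # approximately 1/theta = 0.5

# Truncated normal draws for the probit step, mean mu, variance 1
mu = 0.3
x_y1 = mu + norm.ppf(1.0 - (1.0 - u) * norm.cdf(mu))  # y = 1: draws above 0
x_y0 = mu + norm.ppf(u * norm.cdf(-mu))               # y = 0: draws below 0
print(x_y1.min() > 0, x_y0.max() < 0)                 # True True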
Part 25: Bayesian [35/57]
Example: Simulated Probit

? Generate raw data
Sample  ; 1 - 1000 $
Create  ; x1=rnn(0,1) ; x2 = rnn(0,1) $
Create  ; ys = .2 + .5*x1 - .5*x2 + rnn(0,1) ; y = ys > 0 $
Namelist; x=one,x1,x2 $
Matrix  ; xx=x'x ; xxi = <xx> $
Calc    ; Rep = 200 ; Ri = 1/Rep $
Probit  ; lhs=y ; rhs=x $
? Gibbs sampler
Matrix  ; beta=[0/0/0] ; bbar=init(3,1,0) ; bv=init(3,3,0) $
Proc    = gibbs $
Do for  ; simulate ; r = 1,Rep $
Create  ; mui = x'beta ; f = rnu(0,1)
        ; if(y=1) ysg = mui + inp(1-(1-f)*phi( mui)) ;
          (else)  ysg = mui + inp(f*phi(-mui)) $
Matrix  ; mb = xxi*x'ysg ; beta = rndm(mb,xxi)
        ; bbar = bbar+beta ; bv = bv+beta*beta' $
Enddo   ; simulate $
Endproc $
Execute ; Proc = Gibbs $   (Note: did not discard a burn-in)
Matrix  ; bbar = ri*bbar ; bv = ri*bv - bbar*bbar' $
Matrix  ; Stat(bbar,bv) ; Stat(b,varb) $
Part 25: Bayesian [36/57]
Example: Probit MLE vs. Gibbs

--> Matrix ; Stat(bbar,bv); Stat(b,varb) $
+---------------------------------------------------+
|Number of observations in current sample =    1000 |
|Number of parameters computed here       =       3 |
|Number of degrees of freedom             =     997 |
+---------------------------------------------------+
+---------+--------------+----------------+---------+---------+
|Variable | Coefficient  | Standard Error | b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+---------+---------+
 BBAR_1       .21483281       .05076663      4.232    .0000
 BBAR_2       .40815611       .04779292      8.540    .0000
 BBAR_3      -.49692480       .04508507    -11.022    .0000
+---------+--------------+----------------+---------+---------+
|Variable | Coefficient  | Standard Error | b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+---------+---------+
 B_1          .22696546       .04276520      5.307    .0000
 B_2          .40038880       .04671773      8.570    .0000
 B_3         -.50012787       .04705345    -10.629    .0000
Part 25: Bayesian [37/57]
Part 25: Bayesian [38/57]
A Random Parameters Approach to Modeling Heterogeneity

- Allenby and Rossi, "Marketing Models of Consumer Heterogeneity," Journal of Econometrics, 89, 1999.
  - Discrete Choice Model – Brand Choice
  - "Hierarchical Bayes"
  - Multinomial Probit
  - Panel Data: Purchases of 4 brands of Ketchup
Part 25: Bayesian [39/57]
Structure

Conditional data generation mechanism:

    y*_it,j = β_i' x_it,j + ε_it,j    Utility for consumer i, choice t, brand j
    Y_it,j = 1[ y*_it,j = maximum utility among the J choices ]
    x_it,j = (constant, log price, "availability," "featured")
    ε_it,j ~ N[0, σ_j],  σ_1 = 1

Implies a J outcome multinomial probit model.
Part 25: Bayesian [40/57]
Bayesian Priors

Prior densities:

    β_i ~ N[β̄, V_β],  which implies β_i = β̄ + w_i, w_i ~ N[0, V_β]
    σ_j ~ Inverse Gamma[v, s_j]  (looks like chi-squared), v = 3, s_j = 1

Priors over structural model parameters:

    β̄ ~ N[μ, aV_β],  μ = 0
    V_β⁻¹ ~ Wishart[v₀, V₀],  v₀ = 8, V₀ = 8I
Part 25: Bayesian [41/57]
Bayesian Estimator

- Joint posterior mean = E[β_1,…,β_N, β̄, V_β, σ_1,…,σ_J | data]
- The integral does not exist in closed form.
- Estimate by random samples from the joint posterior.
- The full joint posterior is not known, so it is not possible to sample from the joint posterior directly.
Part 25: Bayesian [42/57]
Gibbs Cycles for the MNP Model

Samples from the marginal posteriors.

Marginal posterior for the individual parameters (known and can be sampled):

    β_i | β̄, V_β, σ, data   has a known normal distribution

Marginal posteriors for the common parameters (each known and each can be sampled):

    β̄ | V_β, σ, data
    V_β | β̄, σ, data
    σ | β̄, V_β, data
Part 25: Bayesian [43/57]
Bayesian Fixed Effects

- Application: Koop et al., "Hospital Cost Efficiency," Journal of Econometrics, 1997, 76, pp. 77-106
- Treat individual constants as first level parameters
- Model = f(α1,…,αN, β, σ, data)
- Formal Bayesian treatment of K+N+1 parameters in the model
  - Stochastic Frontier – as in the latent variable application
  - Bayesian counterparts to fixed effects and random effects models
- ??? Incidental parameters? (Almost surely, or something like it.) How do you deal with it?
  - Irrelevant – there are no asymptotic properties
  - Must be relevant – estimates are numerically unstable
Part 25: Bayesian [44/57]
Comparison of Maximum Simulated Likelihood and Hierarchical Bayes

- Ken Train: "A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for Mixed Logit"
- Mixed Logit

    U(i,t,j) = β_i' x(i,t,j) + ε(i,t,j),
    i = 1,…,N individuals,
    t = 1,…,T_i choice situations,
    j = 1,…,J alternatives (may also vary)
Part 25: Bayesian [45/57]
Stochastic Structure – Conditional Likelihood

    Prob(i, j, t) = exp(β_i' x_i,j,t) / Σ_{j=1}^{J} exp(β_i' x_i,j,t)

    Likelihood = Π_{t=1}^{T} [ exp(β_i' x_i,j*,t) / Σ_{j=1}^{J} exp(β_i' x_i,j,t) ]

    j* = indicator for the specific choice made by i at time t

Note the individual specific parameter vector, β_i.
Part 25: Bayesian [46/57]
Classical Approach

    β_i ~ N[b, Ω]
    β_i = b + w_i
        = b + Γ v_i,  where Γ = diag(γ_j^(1/2))  (uncorrelated)

    Log-likelihood = Σ_{i=1}^{N} log ∫_{w_i} Π_{t=1}^{T} { exp[(b + w_i)' x_i,j*,t] / Σ_{j=1}^{J} exp[(b + w_i)' x_i,j,t] } f(w_i) dw_i

Maximize over (b, Ω) using maximum simulated likelihood (random parameters model).
Part 25: Bayesian [47/57]
Bayesian Approach – Gibbs Sampling and Metropolis-Hastings

    Posterior = Π_{i=1}^{N} L(data | β_i) × priors

    Prior = N(β_1,…,β_N | b, Ω)  (normal),  Ω = diagonal(γ_1,…,γ_K)
          × IG(γ_1,…,γ_K | parameters)  (Inverse gamma with 1 d.f., parameter λ)
          × g(b | assumed parameters)  (Uniform (flat) with a very large range)
Part 25: Bayesian [48/57]
Gibbs Sampling from Posteriors: b

    p(b | β_1,…,β_N, Ω) = Normal[ β̄, Ω/N ],   β̄ = (1/N) Σ_{i=1}^{N} β_i

It is easy to sample from a normal with known mean and variance by transforming a set of draws from the standard normal.
Part 25: Bayesian [49/57]
Gibbs Sampling from Posteriors: Γ

    p(γ_k | b, β_1,…,β_N) ~ Inverse Gamma[ 1+N, 1+N·V_k ]
    V_k = (1/N) Σ_{i=1}^{N} (β_k,i − b_k)²   for each k = 1,…,K

Draw from the inverse gamma for each k: draw 1+N draws h_r,k from N[0,1]; then the draw is

    (1 + N·V_k) / Σ_{r=1}^{1+N} h_r,k²
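A small numerical sketch of this draw in Python (numpy assumed); N and V_k here are illustrative values only.

import numpy as np

rng = np.random.default_rng(3)
N, V_k = 361, 0.8          # illustrative: V_k = mean squared deviation of beta_k,i around b_k
R = 10_000

gamma_draws = np.empty(R)
for r in range(R):
    h = rng.normal(size=1 + N)                       # 1+N standard normal draws
    gamma_draws[r] = (1.0 + N * V_k) / np.sum(h ** 2)

print(gamma_draws.mean())   # posterior draws of gamma_k; mean near (1 + N*V_k)/(N - 1)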
Part 25: Bayesian [50/57]
Gibbs Sampling from Posteriors: β_i

    p(β_i | b, Ω) = M × L(data | β_i) × g(β_i | b, Ω)

M = a constant, L = the likelihood, g = the prior.
(This is the definition of the posterior.)
It is not clear how to sample from this; use the Metropolis-Hastings algorithm.
Part 25: Bayesian [51/57]
Metropolis – Hastings Method

Define:

    β_i,0 = an 'old' draw (vector)
    β_i,1 = the 'new' draw (vector)
    d_r = σ Γ v_r, where
        σ = a constant (see below)
        Γ = the diagonal matrix of standard deviations
        v_r = a vector of K draws from the standard normal
Part 25: Bayesian [52/57]
Metropolis Hastings: A Draw of β_i

Trial value:  β_i,1 = β_i,0 + d_r

    R = Posterior(β_i,1) / Posterior(β_i,0)    (the Ms cancel)

    U = a random draw from U(0,1)
    If U < R, use β_i,1; else keep β_i,0.

During the Gibbs iterations, draw β_i,1 this way.
σ controls the acceptance rate; try for about .4.
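A minimal Python sketch (numpy assumed) of this accept/reject step; log_posterior, sigma, and gamma_sd stand in for the log posterior kernel, the tuning constant σ, and the diagonal standard deviations Γ.

import numpy as np

rng = np.random.default_rng(4)

def mh_step(beta_old, log_posterior, sigma, gamma_sd):
    """One random-walk Metropolis-Hastings update for an individual's beta_i."""
    v = rng.normal(size=beta_old.shape)            # K standard normal draws
    beta_new = beta_old + sigma * gamma_sd * v     # trial value beta_i,1
    log_R = log_posterior(beta_new) - log_posterior(beta_old)
    if np.log(rng.uniform()) < log_R:              # U < R, compared on the log scale
        return beta_new                            # accept the 'new' draw
    return beta_old                                # keep the 'old' draw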
Part 25: Bayesian [53/57]
Application: Energy Suppliers

- N = 361 individuals, 2 to 12 hypothetical suppliers. (A stated choice experiment)
- X = [(1) fixed rates, (2) contract length, (3) local (0,1), (4) well known company (0,1), (5) offer TOD rates (0,1), (6) offer seasonal rates]
Part 25: Bayesian [54/57]
Estimates: Mean of Individual β_i

              MSL Estimate       Bayes Posterior Mean
              (Std. Dev.)        (Std. Dev.)
Price         -1.04  (0.396)     -1.04  (0.0374)
Contract      -0.208 (0.0240)    -0.194 (0.0224)
Local          2.40  (0.127)      2.41  (0.140)
Well Known     1.74  (0.0927)     1.71  (0.100)
TOD           -9.94  (0.337)    -10.0   (0.315)
Seasonal     -10.2   (0.333)    -10.2   (0.310)
Part 25: Bayesian [55/57]
Conclusions

- Bayesian vs. Classical Estimation
  - In principle, some differences in interpretation
  - As practiced, just two different algorithms
  - The religious debate is a red herring
- The Gibbs Sampler: a major technological advance
  - A useful tool for both classical and Bayesian estimation
  - New Bayesian applications appear daily
Part 25: Bayesian [56/57]
Standard Criticisms

- Of the Classical Approach
  - Computationally difficult (ML vs. MCMC)
  - No attention is paid to household level parameters.
  - There is no natural estimator of individual or household level parameters.
  - Responses: none are true. See, e.g., Train (2009, ch. 10).
- Of Classical Inference in this Setting
  - Asymptotics are "only approximate" and rely on "imaginary samples." Bayesian procedures are "exact."
  - Response: the inexactness results from acknowledging that we try to extend these results outside the sample. The Bayesian results are "exact" but have no generality and are useless except for this sample, these data and this prior. (Or are they? Trying to extend them outside the sample is a distinctly classical exercise.)
Part 25: Bayesian [57/57]
Standard Criticisms

- Of the Bayesian Approach
  - Computationally difficult.
    - Response: not really, with MCMC and Metropolis-Hastings.
  - The prior (conjugate or not) is a canard. It has nothing to do with "prior knowledge" or the uncertainty of the investigator.
    - Response: in fact, the prior usually has little influence on the results. (Bernstein–von Mises Theorem)
- Of Bayesian 'Inference'
  - It is not statistical inference.
  - How do we discern any uncertainty in the results? This is precisely the underpinning of the Bayesian method. There is no uncertainty. It is 'exact.'