The Generalised Method of Moments

Ibrahim Stevens
Joint HKIMR/CCBS Workshop
Advanced Modelling for Monetary Policy in the Asia-Pacific Region
May 2004
GMM
• Why use GMM?
– Nonlinear estimation
– Structural estimation
– ‘Robust’ estimation
• Models estimated using GMM
– Many….
• Rational expectations models
– Euler Equations
• Non-Gaussian distributed models
The Method of Moments
• Simple moment conditions
Population: $E[\varepsilon] = 0$, $\operatorname{cov}[X, \varepsilon] = 0$
Sample: $\frac{1}{T}\sum_t \hat{\varepsilon}_t = 0$, $\frac{1}{T}\sum_t X_t'\hat{\varepsilon}_t = 0$
The Method of Moments
• OLS as a MM estimator
y  X   , so thatˆ  y  Xˆ
• Moment conditions:
T
E[ ]  0  T1  ( yt  X t ˆ )  0
t 1
E[ X '  ]  0  X ' ( y  Xˆ )  0
• MM estimator
1
ˆ
ˆ
X ' y  X ' X  0     X ' X  X ' y
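As an illustration (added here, not in the original slides), a minimal numpy sketch showing that solving the sample moment condition $X'(y - X\hat{\beta}) = 0$ reproduces OLS; all data are simulated placeholder values.

```python
import numpy as np

# Simulated data for the linear model y = X*beta + eps (illustrative values)
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + rng.normal(size=T)

# MM estimator: solve the sample moment condition X'(y - X*beta) = 0,
# i.e. X'X beta = X'y, which is exactly the OLS normal equations
beta_mm = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_mm)  # approximately [1.0, 0.5]
```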
Slightly more Generalised MM
• IV is a MM estimator
$\operatorname{cov}[X, \varepsilon] \neq 0$, but $\operatorname{cov}[Z, \varepsilon] = 0$
• Moment condition:
$E[Z'\varepsilon] = 0 \;\Rightarrow\; Z'(y - X\hat{\beta}_{IV}) = 0$
• MM estimator:
$Z'y - Z'X\hat{\beta}_{IV} = 0 \;\Rightarrow\; \hat{\beta}_{IV} = (Z'X)^{-1}Z'y$
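A companion sketch (again with simulated placeholder data, not from the slides) of the just-identified IV estimator, where the regressor is correlated with the error but the instrument is not:

```python
import numpy as np

# Simulated data: x is correlated with the error e, so OLS is biased,
# but the instrument z is uncorrelated with e (illustrative values)
rng = np.random.default_rng(1)
T = 500
z = rng.normal(size=T)
e = rng.normal(size=T)
x = 0.8 * z + 0.5 * e + rng.normal(size=T)
y = 1.0 + 0.5 * x + e

X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z])

# Just-identified IV: solve the sample moment condition Z'(y - X*beta) = 0
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print(beta_iv)  # close to [1.0, 0.5]; OLS would overstate the slope
```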
Slightly more Generalised MM
• In the previous IV estimator we have considered
the case where the number of instruments is equal
to the number of coefficients we want to estimate
• The number of columns of Z is the same as the number of elements of $\hat{\beta}_{IV}$
• What happens if the number of instruments is
greater than the number of coefficients?
• Essentially, the number of equations is greater
than the number of coefficients you want to
estimate: model is over-identified
IV with more constraints than equations
• Maintain the moment condition as before
• Variance of moment condition is:
$\operatorname{var}[Z'\varepsilon] = E[Z'\varepsilon\varepsilon'Z] = \sigma^2 Z'Z = W$
• Minimise 'weighted' distance:
$V = \varepsilon'ZW^{-1}Z'\varepsilon = \frac{1}{\sigma^2}(y' - \hat{\beta}'X')Z(Z'Z)^{-1}Z'(y - X\hat{\beta})$
IV with more constraints than equations
• Why do we do a minimisation exercise?
• Because we have more equations than 'unknowns'.
• How do we determine the true values of the
coefficients?
• Solution is to minimise the previous expression so that the coefficients satisfy the moment conditions as closely as possible, that is, pick the coefficients such that the orthogonality conditions approximately hold
IV with more constraints than equations
• First order conditions:
$\frac{\partial V}{\partial \hat{\beta}} = -X'Z(Z'Z)^{-1}Z'(y - X\hat{\beta}) = 0$
• MM estimator (looks like an IV estimator with more instruments than parameters to estimate):
$\hat{\beta} = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} X'Z(Z'Z)^{-1}Z'y$
Moment conditions in estimation
• Model may be nonlinear
– Euler equations often imply models in levels not
logs (consumption, output, other first order
conditions)
– Both ad hoc and structural models may be
nonlinear in parameters of interest (systems)
• Models may have unknown disturbance structure
– Rational expectations
– May not be interested in related parameters
A generalised problem
• Let any (nonlinear) moment condition be:
$E_t\left[m(x_{t+i}, \theta)'z_t\right] = 0$
• Sample counterpart:
$\frac{1}{T}\sum_{t=1}^{T} m(x_{t+i}, \theta)'z_t = 0$
• Minimise:
$\min_{\theta}\;\left[\frac{1}{T}\sum_{t=1}^{T} m(x_{t+i}, \theta)'z_t\right]' W^{-1}\left[\frac{1}{T}\sum_{t=1}^{T} m(x_{t+i}, \theta)'z_t\right]$
A generalised problem
• If we have more instruments (n) than coefficients
(p) we choose to minimise:
$\min_{\theta}\;\left[\frac{1}{T}\sum_{t=1}^{T} m(x_{t+i}, \theta)'z_t\right]' W^{-1}\left[\frac{1}{T}\sum_{t=1}^{T} m(x_{t+i}, \theta)'z_t\right]$
• What should the matrix W look like?
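A sketch of this criterion as a function (not from the original slides): the moment-residual function m, its data argument x, and the weighting matrix are generic placeholders supplied by the user.

```python
import numpy as np

def gmm_criterion(theta, m, x, Z, W_inv):
    """GMM objective gbar' W^-1 gbar, where gbar is the n-vector of
    sample moments (1/T) sum_t m(x_t, theta) * z_t.

    m(x, theta) must return the T-vector of moment residuals;
    Z is the T x n matrix of instruments."""
    u = m(x, theta)
    gbar = Z.T @ u / len(u)
    return gbar @ W_inv @ gbar
```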
A generalised problem
• It turns out that any symmetric positive definite weighting matrix W yields consistent estimates for the parameters
• However, it does not yield efficient ones
• Hansen (1982) derives the necessary (not
sufficient) condition to obtain asymptotically
efficient estimates for the coefficients
Choice of W (efficiency)
• Appropriate weight matrix (Hansen, 1982):
$W = \operatorname{var}\left[\sum_{t=1}^{T} m(x_{t+i}, \theta)'z_t\right]$
• Intuition: W is the covariance matrix of the sample moments, so weighting by $W^{-1}$ means that less weight is placed on the more imprecisely estimated moments
Implementation
• Implementation is generally undertaken in a 'two-step' procedure:
1. Any symmetric positive definite matrix yields consistent estimates of the parameters, so exploit this: using 'any' symmetric positive definite matrix, back out consistent estimates of the parameters in the model
• An arbitrary matrix such as the identity matrix is normally used to obtain the first consistent estimator
2. Using these parameters, construct the weighting matrix W and from that undertake the minimisation problem
• This process can be iterated
– Some computational cost
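A minimal sketch of the two-step procedure, assuming a generic moment function m and serially uncorrelated moments when forming the second-step weighting matrix; the names and arguments are placeholders, not an EViews interface:

```python
import numpy as np
from scipy.optimize import minimize

def two_step_gmm(m, theta0, x, Z):
    """Two-step GMM for moment residuals m(x, theta) and instruments Z.
    Step 1 uses the identity weighting matrix (any symmetric positive
    definite matrix gives consistent estimates); step 2 re-minimises
    using the inverse covariance of the first-step sample moments."""
    T = Z.shape[0]

    def gbar(theta):
        return Z.T @ m(x, theta) / T

    # Step 1: W = I gives consistent first-step estimates
    step1 = minimize(lambda th: gbar(th) @ gbar(th), theta0)

    # Step 2: build W from first-step moments (no-autocorrelation case)
    g_t = Z * m(x, step1.x)[:, None]      # T x n matrix of moments
    W_inv = np.linalg.inv(g_t.T @ g_t / T)
    step2 = minimize(lambda th: gbar(th) @ W_inv @ gbar(th), step1.x)
    return step2.x, W_inv
```

Iterating simply means repeating step 2 with W rebuilt from the latest parameter estimates until the estimates settle down.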
Instrument validity and W
• Estimation of the minimised criterion can be used
to test the validity of the instruments
• EViews gives you the 'wrong' Hansen J-statistic test of over-identification: it reports J/T rather than J
$J = T\left[\frac{1}{T}\sum_{t=1}^{T} m(x_{t+i}, \hat{\theta})'z_t\right]' \hat{W}^{-1}\left[\frac{1}{T}\sum_{t=1}^{T} m(x_{t+i}, \hat{\theta})'z_t\right]$
– Multiply the reported value by the number of observations to get the correct J
– This is a Chi-squared with n − p degrees of freedom
• If a sub-optimal weighting matrix is used, Hansen's J-test does not apply; see Cochrane (1996)
– We can also test a sub-set of the orthogonality conditions
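A sketch of the corrected J-statistic, under the same placeholder conventions as the two-step sketch above and assuming the optimal weighting matrix was used:

```python
import numpy as np
from scipy.stats import chi2

def hansen_j_test(m, theta_hat, x, Z, W_inv, n_params):
    """Hansen J-test of the over-identifying restrictions:
    J = T * gbar' W^-1 gbar ~ chi2(n - p) under the null that all
    moment conditions are valid."""
    T, n = Z.shape
    gbar = Z.T @ m(x, theta_hat) / T
    J = T * gbar @ W_inv @ gbar
    return J, chi2.sf(J, df=n - n_params)  # statistic and p-value
```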
Covariance estimators
• Choosing the right weighting matrix is important
for GMM estimation
• There have been many econometric papers written
on this subject
• Estimation results can be sensitive to the choice of
weighting matrix
Covariance estimators
• So far we have not considered the possibility that heteroskedasticity and autocorrelation may be a part of your model
• How can we account for this?
• We need to modify the covariance matrix
Covariance estimators
• Write our covariance matrix of empirical moments
as:
$W = \lim_{T\to\infty}\frac{1}{T}\sum_{q=1}^{T}\sum_{p=1}^{T} E\left[M_p(x_{t+i}, z_t, \theta)'M_q(x_{t+i}, z_t, \theta)\right]$
• Where $M_q$ is the qth row of the $T \times n$ matrix of sample moments
Covariance estimators
• Define the autocovariances:
$\Gamma(j) = \frac{1}{T}\sum_{p=j+1}^{T} E\left[M_p(x_{t+i}, z_t, \theta)'M_{p-j}(x_{t+i}, z_t, \theta)\right], \quad \text{for } j \geq 0$
$\Gamma(j) = \frac{1}{T}\sum_{p=-j+1}^{T} E\left[M_p(x_{t+i}, z_t, \theta)'M_{p+j}(x_{t+i}, z_t, \theta)\right], \quad \text{for } j < 0$
• Express W in terms of the above expressions:
$W = \sum_{j=-(T-1)}^{T-1} \Gamma(j)$
Covariance estimators
• If there is no serial correlation, the expressions for $j \neq 0$ are all equal to zero (since the autocovariances will be zero):
$W = \Gamma(0) = \frac{1}{T}\sum_{p=1}^{T} E\left[M_p(x_{t+i}, z_t, \theta)'M_p(x_{t+i}, z_t, \theta)\right]$
• Note that this 'looks like' a White (1980) heteroskedasticity-consistent estimator…
Covariance estimators
• If this looks like a White (1980) heteroskedasticity-consistent estimator…
…implementation should be straightforward!
• Example (remembering White): take the standard heteroskedastic version of the linear model
$y = X\beta + u, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_T^2 \end{pmatrix}$
Covariance estimators
• The appropriate problem and weighting matrix are
min u ' ZW

1
Z 'u

1
W  (0)  Z ' Z 
T
T
 
2
E
u
 p Z p 'Z p
p 1
• The weighting matrix can be consistently estimated
by using any consistent estimator of the model’s
parameters and substituting the expected value of the
squared residuals by the actual residual
(NB. The only difference here is that we are
generalising the problem by allowing instruments, ie
Zs)
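A sketch of this White-style weighting matrix, assuming first-stage residuals u_hat from any consistent estimator; valid only when the moments are serially uncorrelated:

```python
import numpy as np

def white_weight_matrix(u_hat, Z):
    """Estimate W = (1/T) sum_p u_p^2 z_p z_p', replacing E[u_p^2]
    with the squared residuals from a first-stage consistent
    estimator (White-style, no-autocorrelation case)."""
    T = len(u_hat)
    return (Z * (u_hat ** 2)[:, None]).T @ Z / T
```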
Covariance estimators
• The problem is that with autocorrelation it is not possible simply to replace the expected values of the residual products by the actual values from the first estimation
• It would lead to an inconsistent estimate of the autocovariance matrix of order j
• The problem with this approach is that, asymptotically, the number of estimated autocovariances grows at the same rate as the sample size
• Thus, whilst unbiased, W is not consistent in the mean-squared-error sense
Covariance estimators
• Thus we require a class of estimators that circumvents
these problems
• A class of estimators that prevents the autocovariances from growing with the sample size is
$W = \sum_{j=-(T-1)}^{T-1} \omega_j\, \Gamma(j)$
• Parzen termed the $\omega_j$'s the lag window
• These estimators correspond to a class of kernel (spectral density) estimators (evaluated at frequency zero)
Covariance estimators
• The key is to choose the sequence of $\omega_j$'s such that the weights approach unity rapidly enough to obtain asymptotic unbiasedness, but slowly enough to ensure that the variance converges to zero
• The type of weights you will find in EViews corresponds to a particular class of lag windows termed scale parameter windows
• The lag window is expressed as
$\omega_j = k\left(j/b_T\right)$
Covariance estimators
• HAC matrix estimation:
$\hat{W} = \hat{\Gamma}(0) + \sum_{j=1}^{p} k\left(j/b_T\right)\left[\hat{\Gamma}(j) + \hat{\Gamma}(-j)\right]$
• $k(j/b_T)$ is a kernel, $b_T$ is the bandwidth
• Intuition: $b_T$ stretches or contracts the distribution; it acts as a scaling parameter
• $k(z)$ is referred to as the 'lag window generator'
Covariance estimators
• HAC matrix estimation:
$\hat{W} = \hat{\Gamma}(0) + \sum_{j=1}^{p} k\left(j/b_T\right)\left[\hat{\Gamma}(j) + \hat{\Gamma}(-j)\right]$
• When the value of the kernel is zero for $z > 1$, $b_T$ is called a 'lag truncation parameter' (autocovariances corresponding to lags greater than $b_T$ are given zero weight)
• The scalar $b_T$ is often referred to as the 'bandwidth parameter'
Covariance estimators
• EViews provides two kernels:
1. Quadratic spectral
2. Bartlett
• It provides 3 options for the bandwidth parameter $b_T$
(See manual for specific functional forms and a good discussion!)
Covariance estimators
• For instance, Newey and West (1987) suggest using a Bartlett kernel:
$\hat{W} = \hat{\Gamma}(0) + \sum_{j=1}^{p}\left(1 - \frac{j}{p+1}\right)\left[\hat{\Gamma}(j) + \hat{\Gamma}(-j)\right]$
• Guarantees positive definiteness (which is something we desire, since we would like a positive variance)
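A minimal sketch of this Newey-West estimator, assuming g is the T × n matrix of sample moments (each row one observation's moment vector) and lags plays the role of p:

```python
import numpy as np

def newey_west(g, lags):
    """Newey-West (1987) HAC estimate of W from a T x n matrix g of
    sample moments, using Bartlett weights 1 - j/(lags + 1), which
    make the result positive semi-definite by construction."""
    T = g.shape[0]
    W = g.T @ g / T                      # Gamma(0)
    for j in range(1, lags + 1):
        gamma_j = g[j:].T @ g[:-j] / T   # Gamma(j)
        w = 1.0 - j / (lags + 1)
        W += w * (gamma_j + gamma_j.T)
    return W
```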
Alternative covariance estimators
• Andrews (1991)
• Quadratic spectral estimator:
$\hat{W} = \hat{\Gamma}(0) + \sum_{j=1}^{p} k\left(j/b_T\right)\left[\hat{\Gamma}(j) + \hat{\Gamma}(-j)\right]$
where:
$k(x) = \frac{25}{12\pi^2 x^2}\left(\frac{\sin(6\pi x/5)}{6\pi x/5} - \cos\left(\frac{6\pi x}{5}\right)\right)$
$x = \frac{j}{b_T}, \quad \text{and } b_T = 1.3221\,(\hat{\alpha}T)^{1/5}$
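A sketch of the quadratic spectral kernel itself (the limit k(0) = 1 is taken by continuity); the function name is a placeholder:

```python
import numpy as np

def qs_kernel(x):
    """Andrews (1991) quadratic spectral kernel, evaluated elementwise,
    with k(0) = 1 by continuity."""
    x = np.asarray(x, dtype=float)
    z = 6.0 * np.pi * x / 5.0
    with np.errstate(divide="ignore", invalid="ignore"):
        k = 25.0 / (12.0 * np.pi ** 2 * x ** 2) * (np.sin(z) / z - np.cos(z))
    return np.where(x == 0.0, 1.0, k)
```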
Pre-whitening
• Andrews and Monahan (1992)
• Fit a VAR to the moment residuals:
$\hat{\varepsilon}_t = \hat{A}\hat{\varepsilon}_{t-1} + \hat{\eta}_t$
and compute:
$\hat{W}_{pw} = \hat{D}\hat{W}\hat{D}', \quad \text{with } \hat{D} = (I - \hat{A})^{-1}$
where $\hat{W}$ is the kernel estimate computed from the VAR residuals $\hat{\eta}_t$
• This is known as a pre-whitened estimate
• Can be applied to any kernel
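A sketch of the pre-whitened estimate, reusing the newey_west sketch above as the kernel step (any HAC estimator could be substituted); g is again the T × n matrix of sample moments:

```python
import numpy as np

def prewhitened_hac(g, lags):
    """Andrews-Monahan (1992) pre-whitened HAC estimate: fit a VAR(1)
    to the moment series, apply a HAC estimator to the VAR residuals,
    then 're-colour' the result with D = (I - A)^-1."""
    g_lag, g_now = g[:-1], g[1:]
    # Least-squares VAR(1): g_t = A g_{t-1} + eta_t
    A = np.linalg.lstsq(g_lag, g_now, rcond=None)[0].T
    eta = g_now - g_lag @ A.T
    W_star = newey_west(eta, lags)            # HAC on whitened residuals
    D = np.linalg.inv(np.eye(g.shape[1]) - A)
    return D @ W_star @ D.T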
Linear models
• Estimate by IV (consistent but inefficient):
$\hat{\beta} = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} X'Z(Z'Z)^{-1}Z'y$
• Use estimates to construct an estimate of W:
$\hat{\hat{\beta}} = \left[X'Z\hat{W}^{-1}Z'X\right]^{-1} X'Z\hat{W}^{-1}Z'y$
• Can iterate on estimates of W
Nonlinear models
• Estimate by nonlinear IV
– May solve by standard ‘iterative’ nonlinear IV
• Estimate covariance matrix
• Minimise J using non-linear optimisation
• Iterate on covariance matrix (optional)
• EViews uses the Berndt-Hall-Hall-Hausman or Marquardt algorithms (see manual for pros and cons)
Useful facts
• Covariance matrix estimators must be positive definite; asymptotically it has been shown that the quadratic spectral window is best
• But in small samples Newey and West (1994) show little difference between the quadratic spectral estimator and their own (based on the Bartlett kernel)
Useful facts
• Choice of bandwidth parameter is more important than the choice of the kernel
• Variable-bandwidth versions of Newey and West and of Andrews are state of the art
• HAC estimators suffer from poor small sample performance, thus test statistics (eg t-tests) may not be reliable: t-stats appear to reject a true null far more often than their nominal size
• Adjustments to the matrix W may be made, but these depend on whether there is autocorrelation and/or heteroskedasticity
Useful facts
• Numerical optimisation: common problem of not having a global maximum/minimum
• Eg problems of local maxima/minima or flat functions
• Without a global minimum, GMM estimation does not yield consistent and efficient estimates
• Convexity of the criterion function is important: it guarantees a global minimum
Useful facts
• For non-convex problems you must use ‘different
methods’
• A multi-start algorithm is popular: run a local optimisation algorithm from initial values of the parameters to converge to a local minimum, then repeat the process a number of times with different starting values. The estimator is taken to be the parameter values corresponding to the smallest value of the criterion function
• However, it is not guaranteed to find the global minimum
• Andrews (1997) proposes a stopping-rule procedure to overcome this problem
Useful facts
• Weak instrument literature
• Nelson and Startz (1990): instrumental variables estimators have poor sample properties when the instruments are weakly correlated with the explanatory variables
• Chi-squared tests tend to reject the null too frequently compared with their asymptotic distribution
• t-ratios are too large
• Hansen (1985) characterises an efficiency bound
for the asymptotic covariance matrices of the
alternative GMM estimators and optimal
instruments that attain the bound
Useful facts
• Weak instrument literature – Stock, Wright and Yogo (2002) provide an excellent summary of some of the issues related to weak instruments
• Recently, some authors advocate the use of limited-information maximum likelihood techniques to compare results with GMM estimation, since the two are asymptotically equivalent
• Neely, Roy and Whiteman (2001) show that results can be very different for CAPM models
• Fuhrer and Rudebusch (2003) show this to be the case in Euler equations for output
• Mavroeidis (2003) finds similar results for the New Keynesian Phillips curve
Useful facts
• Finite sample properties of GMM estimators –
similar to weak instrument literature
• Tauchen (1986) and Kocherlakota (1990) examine
artificial data generated from a non-linear CAPM
• Using the two-step GMM estimator, Tauchen concluded that GMM estimators and test stats had reasonable properties in small samples
• He also investigated optimal instruments finding
that optimal estimators based on optimal selection
of instruments often do not perform as well in
small samples as GMM estimators using an
arbitrary selection of instruments
Useful facts
• Kocherlakota (1990) allows for multiple assets
and different sets of instruments
• Using iterated GMM estimators, Kocherlakota
finds that GMM performs worse with larger
instrument sets leading to downward biases in
coefficient estimates and narrow confidence
intervals. Also the J test tends to reject too often
• Hansen, Heaton and Yaron (1996) consider the
same methods as Tauchen together with alternative
choices for W. Both the 2 stage and the iterative
methods have small sample distributions that can
be greatly distorted
Useful facts
• Fuhrer, Moore and Schuh (1995) compare GMM and maximum likelihood estimators in a class of nonlinear models using Monte Carlo simulations
• They find that GMM estimates tend to reject their model whilst ML estimates support it
• Why? They find GMM estimates are often biased,
statistically insignificant, economically
implausible and dynamically unstable
• They attribute the result to weak instruments
Useful facts
• Nonstationarity – the data must be stationary to use GMM
• Thus nonstationary data are differenced, or must be cointegrated
• In the case of cointegration, Cooley and Ogaki (1996) suggest estimating the cointegrating relationship using OLS and using these parameters for the covariance matrix W
Practical GMM
• Moment conditions
– Theoretical moment conditions best
– Empirical moment conditions: try different informational assumptions
• Try ‘straight’ IV
• If you know the form of autocorrelation try IVMA
• Eviews reports J/T
Practical GMM
• Use Newey-West first
– Try setting the lag truncation to something that
is close to the autocorrelation expected
– Then try T/3
• Pre-whitening
– Don’t do it unless nothing else works for NW
• QS-PW
– ‘State of the art’ - if it works use it
Euler equations and consumption
• Problem of intertemporal utility max:
$\max_{C_t}\; E_0 \sum_{t=0}^{\infty} (1+\delta)^{-t}\,\frac{C_t^{1-\gamma}}{1-\gamma}$
subject to:
$A_t = (1+r)A_{t-1} + Y_t - C_t$
• Constrained problem:
$\max_{C_t}\; E_0 \sum_{t=0}^{\infty} (1+\delta)^{-t}\left[\frac{C_t^{1-\gamma}}{1-\gamma} + \lambda_t\left((1+r)A_{t-1} + Y_t - C_t - A_t\right)\right]$
Euler equations and consumption
• First order conditions:
$E_0\left[C_t^{-\gamma} - \lambda_t\right] = 0$
$E_0\left[\lambda_t - \frac{1+r}{1+\delta}\,\lambda_{t+1}\right] = 0$
• Euler equation of:
$E_0\left[\frac{1+r}{1+\delta}\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} - 1\right] = 0$
Consumption reduced form
- dummy’s guide to Lucas’ critique
• Income process:
$Y_t = \rho Y_{t-1} + \nu_t$
• Consumption function:
$C_t = (\phi^{-1} - 1)(1+r)A_{t-1} + (\phi^{-1} - 1)\frac{1+r}{1+r-\rho}\,Y_t + \mu_t$
where
$\mu_t = (\phi^{-1} - 1)\sum_{i=0}^{\infty}\frac{Y_{t+i} - E_t Y_{t+i}}{(1+r)^i}$
and $\phi$ is a constant depending on $\delta$, $r$ and $\gamma$
Consumption moment conditions
• The orthogonality (zero mean) conditions:
$E_t\left[\frac{R_{t+1}}{1+\delta}\left(\frac{C_t}{C_{t+1}}\right)^{\gamma} - 1\right] = 0$
$E_t\left[\left(\frac{R_{t+1}}{1+\delta}\left(\frac{C_t}{C_{t+1}}\right)^{\gamma} - 1\right)R_t\right] = 0$
$E_t\left[\left(\frac{R_{t+1}}{1+\delta}\left(\frac{C_t}{C_{t+1}}\right)^{\gamma} - 1\right)\frac{C_t}{C_{t-1}}\right] = 0$
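A sketch of these moment conditions in code (not from the slides): C and R are placeholder consumption and gross-return series, the instruments are a constant, $R_t$ and lagged consumption growth $C_t/C_{t-1}$, and theta = (delta, gamma):

```python
import numpy as np

def euler_moments(theta, C, R):
    """Moment-residual-times-instrument matrix for the CRRA Euler
    equation; C and R are aligned 1-D arrays of length T."""
    delta, gamma = theta
    # Euler residual for period t: R_{t+1}/(1+delta)*(C_t/C_{t+1})^gamma - 1
    u = R[1:] / (1.0 + delta) * (C[:-1] / C[1:]) ** gamma - 1.0
    # Instruments dated t: constant, R_t, and lagged growth C_t/C_{t-1};
    # drop the first observation, where the lag is unavailable
    u = u[1:]
    Z = np.column_stack([np.ones(len(u)), R[1:-1], C[1:-1] / C[:-2]])
    return u[:, None] * Z    # (T-2) x 3 matrix of sample moments
```

The resulting moment matrix can be fed to the two-step GMM and J-test sketches above.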
Conclusion
• GMM is a natural way to estimate many models
• Helpful since
– It does not impose restrictions on the distribution of
errors
– Allows for heteroskedasticity of unknown form
– Estimates the parameters even when models cannot be
solved analytically
Useful References
• Davidson, R. and MacKinnon, J. G., 2004, Econometric Theory and Methods, OUP
• Hayashi, F., 2000, Econometrics, Princeton
• Mátyás, L. (ed.), 1999, Generalized Method of Moments Estimation, CUP
• Cochrane, J. H., 2001, Asset Pricing, Princeton
• Further references, especially to the classic paper by Hansen (1982) and to papers by Hansen and Singleton, Newey and McFadden, and others, can be found in the above textbooks
• The Handbook of Statistics, Vol. 11: Econometrics, edited by G.S. Maddala, C.R. Rao and H.D. Vinod, North-Holland, Amsterdam (1993), has two good papers on GMM by Ogaki and Hall