Structural identification of vector
autoregressions
Tony Yates
Lectures to MSc Time Series
students, Bristol, Spring 2014
Overview
• The algebra of the identification problem in
VARs.
• Cholesky-factoring; timing restrictions.
• Long run impact restrictions.
• (Max share restrictions).
• Sign restrictions.
• Identification through heteroskedasticity.
Some useful sources
• Lütkepohl, Hamilton.
• Kilian survey.
• Wouter den Haan lecture notes.
• Karl Whelan lecture notes.
• Many more!
Why bother with structural
identification
• Empirical form of business cycle accounting,
which is important for informing policy. Eg if the RBC
claim that tech shocks are dominant is true, maybe no
need for stabilisation policy?
• Not needed for forecasting.
• Needed for estimation of our economic models,
eg impulse response function matching.
• Hence needed to understand appropriate policy
design. Eg identify a policy shock.
Lucas(1980) on why identifying shocks
is important
Methods and problems in business cycle theory, JMCB
Reduced form vs structural model for Y
We are estimating this VAR(p), in the vector Y:
$$Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + A_3 Y_{t-3} + \dots + A_p Y_{t-p} + e_t$$
...in order to learn about this structural
model, with different coefficients, and
driven by structural shocks:
$$B_0 Y_t = B_1 Y_{t-1} + \dots + B_p Y_{t-p} + u_t$$
$$B(L) Y_t = u_t, \qquad B(L) = B_0 - B_1 L - \dots - B_p L^p$$
$$E(u_t u_t') = \Sigma_u = I_K$$
Structural shocks mutually
uncorrelated, and normalised so that
vcov matrix is the identity, of dimension K = dimension of Y.
Structural vs reduced form VARs
$$B_0^{-1} B_0 Y_t = B_0^{-1} B_1 Y_{t-1} + \dots + B_0^{-1} B_p Y_{t-p} + B_0^{-1} u_t$$
$$Y_t = B_0^{-1} B_1 Y_{t-1} + \dots + B_0^{-1} B_p Y_{t-p} + B_0^{-1} u_t$$
$$A_i = B_0^{-1} B_i, \qquad e_t = B_0^{-1} u_t$$
Once we have estimated the reduced form VAR for Y, if only we knew the B_0, we
could recover the structural shocks, and ALL the coefficients of the structural
model.
If only!!
Structural identification is about trying to find B_0.
Long and controversial story.
We will tell it chronologically.
Idealised factorisation of the reduced
form vcov matrix
$$\Sigma_e = B_0^{-1} E(u_t u_t') (B_0^{-1})'$$
$$\Sigma_e = B_0^{-1} \Sigma_u (B_0^{-1})' = B_0^{-1} (B_0^{-1})'$$
LHS known; RHS elements unknown. System of nonlinear equations.
Need to restrict B_0 so that we have the same number of unknowns as
equations.
Sigma_e is symmetric, as it’s a vcov matrix, hence has only K(K+1)/2
independent elements.
B_0 is not necessarily symmetric, so it has K^2 unknown elements; we need
K^2 - K(K+1)/2 = K(K-1)/2 restrictions.
Referred to as ‘order condition’ for identification.
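Simple illustration of the identification
problem in 2 dimensions
We estimate this 2 variable reduced
form VAR(1)...
$$Y_t = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}_t = A Y_{t-1} + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}_t$$
...to learn about this 2
variable structural VAR:
$$\begin{pmatrix} b_{0,11} & b_{0,12} \\ b_{0,21} & b_{0,22} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}_t = \begin{pmatrix} b_{1,11} & b_{1,12} \\ b_{1,21} & b_{1,22} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}_{t-1} + \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}_t$$
Here we invert the coefficient matrix B0 so that the structural VAR has
the same form as the reduced form VAR:
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}_t = \begin{pmatrix} b_{0,11} & b_{0,12} \\ b_{0,21} & b_{0,22} \end{pmatrix}^{-1} \begin{pmatrix} b_{1,11} & b_{1,12} \\ b_{1,21} & b_{1,22} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}_{t-1} + \begin{pmatrix} b_{0,11} & b_{0,12} \\ b_{0,21} & b_{0,22} \end{pmatrix}^{-1} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}_t$$
Which allows us to see the relation between rf and structural errors….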
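Identification problem in 2
dimensions, ctd…
$$\Sigma_e = E\left[ \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \begin{pmatrix} e_1 & e_2 \end{pmatrix} \right] = \begin{pmatrix} \sigma_{e,11} & \sigma_{e,21} \\ \sigma_{e,21} & \sigma_{e,22} \end{pmatrix}$$
$$\Sigma_e = B_0^{-1} (B_0^{-1})' = \begin{pmatrix} b_{0,11} & b_{0,12} \\ b_{0,21} & b_{0,22} \end{pmatrix}^{-1} \left[ \begin{pmatrix} b_{0,11} & b_{0,12} \\ b_{0,21} & b_{0,22} \end{pmatrix}^{-1} \right]'$$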
First line: we compute the vcov matrix of reduced form errors which
we see has only 3 separate elements.
Second line: we note that this is equal to inv(B0)*inv(B0)’.
Problem: this is a 2*2 with 4 independent unknown elements.
We have only 3 knowns to find these 4 unknowns.
This is the identification problem in VARs/SVARs.
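A minimal numerical sketch of the same point, assuming an illustrative (made-up) 2x2 reduced form vcov matrix: the Cholesky factor and any rotation of it are both valid candidates for inv(B0), because both reproduce Sigma_e exactly. The numbers and the helper name `rotation` are hypothetical.

```python
import numpy as np

# Hypothetical reduced-form covariance matrix (illustrative numbers only).
sigma_e = np.array([[1.0, 0.3],
                    [0.3, 0.5]])

# One candidate for inv(B0): the lower-triangular Cholesky factor.
P = np.linalg.cholesky(sigma_e)


def rotation(theta):
    """2x2 orthonormal rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# Any rotation of P is another candidate that fits sigma_e equally well.
candidate = P @ rotation(0.7)

fits_cholesky = np.allclose(P @ P.T, sigma_e)
fits_rotated = np.allclose(candidate @ candidate.T, sigma_e)
same_matrix = np.allclose(candidate, P)
print(fits_cholesky, fits_rotated, same_matrix)
```

The data cannot tell the two candidates apart, which is why an extra restriction is needed.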
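Cholesky identification
$$\Sigma_e = PP', \qquad P = \mathrm{chol}(\Sigma_e), \qquad P = B_0^{-1}$$
$$Y_t = \begin{pmatrix} x_t \\ \pi_t \\ i_t \end{pmatrix} = \begin{pmatrix} p_{11} & 0 & 0 \\ p_{21} & p_{22} & 0 \\ p_{31} & p_{32} & p_{33} \end{pmatrix} B_1 Y_{t-1} + \dots + \begin{pmatrix} p_{11} & 0 & 0 \\ p_{21} & p_{22} & 0 \\ p_{31} & p_{32} & p_{33} \end{pmatrix} \begin{pmatrix} u_x \\ u_\pi \\ u_i \end{pmatrix}_t$$
P is lower triangular
Used eg to identify a monetary policy shock
Assumes strict causal chain in the VAR
GDP and inflation don’t react within period to
a monetary policy shock.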
For those studying DSGE models.
Third equation looks like a central bank reaction function.
But it isn’t! Coefficients of the central bank reaction function will show up in all
of the VAR’s reduced form equations.
See, eg, Canova comment on Benati/Surico paper.
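Recovering the monetary policy shock
and structural coefficients with
Cholesky identification
$$\begin{pmatrix} e_x \\ e_\pi \\ e_i \end{pmatrix}_t = \begin{pmatrix} p_{11} & 0 & 0 \\ p_{21} & p_{22} & 0 \\ p_{31} & p_{32} & p_{33} \end{pmatrix} \begin{pmatrix} u_x \\ u_\pi \\ u_i \end{pmatrix}_t$$
$$e_{it} = p_{31} u_{xt} + p_{32} u_{\pi t} + p_{33} u_{it}$$
$$u_{it} = \frac{e_{it} - p_{31} u_{xt} - p_{32} u_{\pi t}}{p_{33}}$$
$$A_i = B_0^{-1} B_i \;\Rightarrow\; B_1 = \begin{pmatrix} p_{11} & 0 & 0 \\ p_{21} & p_{22} & 0 \\ p_{31} & p_{32} & p_{33} \end{pmatrix}^{-1} A_1$$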
Finishing the 3d monetary policy shock
Our expression for the mon
pol shock u_it was in terms
of some unknowns:
$$u_{it} = \frac{e_{it} - p_{31} u_{xt} - p_{32} u_{\pi t}}{p_{33}}$$
Here we can get rid of one of
them, u_xt:
$$e_{xt} = p_{11} u_{xt} \;\Rightarrow\; u_{xt} = e_{xt} / p_{11}$$
Here we get rid of the
second, u_pit, also using
expression for u_xt:
$$e_{\pi t} = p_{21} u_{xt} + p_{22} u_{\pi t} = p_{21} e_{xt} / p_{11} + p_{22} u_{\pi t}$$
$$u_{\pi t} = \frac{e_{\pi t} - p_{21} e_{xt} / p_{11}}{p_{22}}$$
And we are done. u_it now
entirely in terms of knowns.
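The substitution steps can be checked numerically. This is only a sketch: the entries of P and the simulated shocks are made-up numbers, with the ordering (x, pi, i) assumed as on the slide.

```python
import numpy as np

# Hypothetical lower-triangular P = inv(B0), ordering (x, pi, i).
P = np.array([[0.8, 0.0, 0.0],
              [0.2, 0.5, 0.0],
              [0.3, 0.4, 0.25]])

rng = np.random.default_rng(0)
u_true = rng.standard_normal(3)   # structural shocks (u_x, u_pi, u_i)
e = P @ u_true                    # implied reduced-form residuals

# Back out the shocks one at a time, top row first.
u_x = e[0] / P[0, 0]
u_pi = (e[1] - P[1, 0] * u_x) / P[1, 1]
u_i = (e[2] - P[2, 0] * u_x - P[2, 1] * u_pi) / P[2, 2]

# The same answer in one step: solve the triangular system P u = e.
u_solved = np.linalg.solve(P, e)
```

With real data, `e` would be the estimated residual vector for one period and P the Cholesky factor of the estimated residual vcov matrix.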
Impulse responses to the structural
shocks
$$e_t = B_0^{-1} u_t \;\Rightarrow\; u_t = B_0 e_t$$
$$Y^{irf}_h = A^h e = A^h B_0^{-1} u$$
We can use the reduced form VAR estimates to compute the response to
structural shocks.
Our structural shocks are functions of the reduced form shocks, a function
of our assumed value for B_0.
So we then feed these through the reduced form VAR in the normal way,
taking successively higher and higher powers of the A matrix.
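A short sketch of that recursion for a VAR(1), assuming a Cholesky factor is used as the candidate inv(B0). The coefficient matrix, the vcov matrix and the function name `structural_irf` are illustrative assumptions, not estimates from any dataset.

```python
import numpy as np


def structural_irf(A, B0_inv, horizons):
    """Responses A^h @ inv(B0) for h = 0..horizons, shape (horizons+1, K, K).

    Entry [h, i, j] is the response of variable i at horizon h
    to a unit structural shock j.
    """
    K = A.shape[0]
    responses = np.empty((horizons + 1, K, K))
    power = np.eye(K)              # A^0
    for h in range(horizons + 1):
        responses[h] = power @ B0_inv
        power = A @ power          # move on to the next power of A
    return responses


# Hypothetical reduced-form estimates.
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
sigma_e = np.array([[1.0, 0.3],
                    [0.3, 0.5]])

P = np.linalg.cholesky(sigma_e)    # assumed inv(B0) under a recursive ordering
irf = structural_irf(A, P, horizons=8)
```

For a VAR(p) the same idea applies after stacking the system into companion form.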
Problems with recursive structure
implied by Cholesky identification
• Is it economically plausible? Quarterly data.
Does output or inflation really not respond to
interest rates within the quarter?
• Assumed causal ordering means we can’t use
the VAR to find out about causality.
• Common practice to check for robustness to
alternative causal orderings, but this is rarely
done comprehensively; and [so Kilian says] is
nonsensical!
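Recap on causal assumptions in
Cholesky identification of mon pol
shock
$$\begin{pmatrix} e_x \\ e_\pi \\ e_i \end{pmatrix}_t = \begin{pmatrix} p_{11} & 0 & 0 \\ p_{21} & p_{22} & 0 \\ p_{31} & p_{32} & p_{33} \end{pmatrix} \begin{pmatrix} u_x \\ u_\pi \\ u_i \end{pmatrix}_t$$
Neither e_x nor e_pi is a function
of the structural shock u_it.
Implies output gap and inflation
do not respond [within the
period] to a monetary policy
shock.
Is this really plausible?
Remember the NK DSGE model.
$$\pi_t = \beta E_t \pi_{t+1} + \kappa x_t + u_{\pi t}$$
$$x_t = E_t x_{t+1} - \sigma (i_t - E_t \pi_{t+1}) + u_{xt}$$
$$i_t = \phi_\pi \pi_t + \phi_x x_t + u_{it}$$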
If we trace the timing of the effects of the monetary policy shock, we can
see that it affects inflation and the output gap straight away.
Of course, the NK model could be a load of nonsense!
More problems with recursive
identification.
• Sims and the ‘price puzzle’.
• Would find that a recursively identified, contractionary
mon pol shock leads to an increase in prices,
contrary to theory.
• Concluded that had omitted a variable the cb
was responding to, eg commodity prices.
• CP up means prices up, despite central bank
rate increase.
Recursive ID problems: omitted
variables
• Omitted variable bias leads us to enlarge the
VAR.
• But there is a cost: imprecision in the
coefficients. And the search for more
restrictions.
• Solutions: Bayesian shrinkage and factor
modelling to reduce dimensionality.
• In time we will cover both of these.
What are these mp shocks anyway?
• Why would policymakers induce shocks?
• Are they really conducting experiments for their own
edification [a control literature involving Sargent,
Cogley shows there is a benefit]. Alan Blinder says not.
• Shocks are just misspecifications by econometrician
• Shocks are policymakers’ real time measurement
error?
• Shocks are policymakers’ [or our] model misspecification
• Same goes, of course, for fiscal policy shocks.
Alternative, less intensive measures of
identifying (eg) mp shocks
• Romer and Romer (1989) narrative measures
of monetary policy (and fiscal policy) shocks.
• Rudebusch (1998): gap between actual Fed
Funds Rate and expectations implied by Fed
Funds futures.
Romer and Romer’s narrative shock
definition
Problem: they are including movements in rates prompted by concerns about inflation.
So if we think i=a*pi+b*y+shock, they are not identifying m p shocks
Romer and Romer (1989), quoting
Friedman and Schwartz (1963)
Rudebusch, and Sims on Rudebusch
• Rudebusch: FFFs produce better forecasts
than VARs. His surprise a better measure of
policy ‘shocks’.
• Sims (1996): 1. False premise. FFR shocks
confound surprises due to non policy shocks
with those due to policy shocks. 2. Not true that if
shock measures are badly correlated, estimated
effects are very different.
Famous applications
• Rotemberg and Woodford (1998).
– Early application. Estimated DSGE model by fitting
IRFs of monetary policy shock using MDE.
– Later pointed out that in DSGE model
EVERYTHING responds within the period to
monetary policy.
• Christiano, Eichenbaum and Evans (2005)
– Same exercise. But DSGE model is consistent with
timing assumption.
Minimum distance estimation
$$\hat{\Phi}^{mde} = \arg\min_{\Phi} \left\| Y^{irf}(\Phi) - \hat{Y}^{irf}(\hat{A}, \hat{\Sigma}_e) \right\|$$
DSGE model is defined by a vector of parameters, PHI.
We take as our PHI_hat the value of these parameters that makes
the impulse response in the DSGE model as close as possible to
the identified, estimated IRF in the VAR.
Some costs and benefits wrt eg MLE estimation of DSGE model.
Cost: partial information, means bad identification.
Benefit: MLE only consistent if model well-specified.
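A toy sketch of the impulse response matching idea. Everything here is a stand-in: the "model" is a one-parameter AR(1) response rather than a DSGE model, the target IRF is generated from that same toy model rather than estimated from a VAR, and the search is a plain grid with an unweighted sum of squares. The function names are hypothetical.

```python
import numpy as np


def model_irf(phi, horizons):
    """Toy model IRF: the response phi**h for h = 0..horizons."""
    return phi ** np.arange(horizons + 1)


def mde(target_irf, grid):
    """Return the grid value whose model IRF is closest to target_irf."""
    horizons = len(target_irf) - 1
    distances = [np.sum((model_irf(phi, horizons) - target_irf) ** 2)
                 for phi in grid]
    return grid[int(np.argmin(distances))]


# Pretend this is the identified IRF estimated from the VAR.
target = model_irf(0.7, 10)

grid = np.linspace(0.0, 0.99, 100)   # candidate parameter values
phi_hat = mde(target, grid)
print(phi_hat)
```

In applied work the distance is usually weighted, for example by the inverse of the estimated variances of the VAR impulse responses, and minimised with a numerical optimiser.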
Identification using long run
restrictions
• Attractive because much agreement on
certain long run restrictions. So these are
‘credible’ in Sims language. At least among
members of the RBC/DSGE cult [like me]!
• Famous early applications are Blanchard and
Quah (1989), and Gali (1999). More later.
• Technique sparked famous arguments
between Chari et al and VAR proponents,
notably Christiano and coauthors.
Egs of restrictions uncontroversial, at
least in DSGE/RBC-land
• Nominal shocks should have no long run
impact on real variables.
• Corollary: only real shocks should have a
long run impact on real variables.
• Real shocks like technology should be neutral
on inflation in the long run.
• Only inflation regime changes should affect
inflation in the long run.
Deriving long run restrictions for an
SVAR
Reduced form VAR(1) with an expression in terms of
structural shocks and the unknown structural impact
matrix substituted in place of the RF shock:
$$Y_t = A Y_{t-1} + B_0^{-1} u_t$$
This is how we would compute the IRF to the
structural shock, if only we knew the structural
impact matrix....
impact period: $B_0^{-1}$
1 period after: $A B_0^{-1}$
2 periods after: $A^2 B_0^{-1}$
n periods after: $A^n B_0^{-1}$
You can see this defines an infinite sequence. Effect
on level of something=0, implies sum of effects on
difference=0.
Deriving long run restrictions
$$D = B_0^{-1} + A B_0^{-1} + A^2 B_0^{-1} + \dots + A^n B_0^{-1} + \dots$$
$$D = (I + A + A^2 + \dots + A^n + \dots) B_0^{-1}$$
$$D = (I - A)^{-1} B_0^{-1}$$
Factor the long run IRF in terms of
something you might recognise from high
school as the expression for an infinite
geometric series, or its matrix equivalent.
$$\sum_{i=0}^{\infty} a^i = \frac{1}{1-a}, \qquad |a| < 1$$
$$DD' = (I - A)^{-1} B_0^{-1} (B_0^{-1})' [(I - A)^{-1}]' = (I - A)^{-1} \Sigma_e [(I - A)^{-1}]'$$
Some algebra and we spot that DD’
involves something we can estimate
from the data, the vcov of the RF
residuals!
Long run restrictions and the Cholesky
factor, again
DD’ which we know to be related
to the magic B0 which we are
trying to find…
$$DD' = (I - A)^{-1} B_0^{-1} (B_0^{-1})' [(I - A)^{-1}]'$$
…we also know to be related
entirely to things we do know.
$$DD' = (I - A)^{-1} \Sigma_e [(I - A)^{-1}]'$$
$$D = \mathrm{chol}\left( (I - A)^{-1} \Sigma_e [(I - A)^{-1}]' \right)$$
$$D = (I - A)^{-1} B_0^{-1}$$
$$B_0^{-1} = (I - A) D$$
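A sketch of these steps for a VAR(1), assuming the long run matrix D is lower triangular. The coefficient and vcov matrices are made-up numbers and `long_run_identification` is a hypothetical helper name.

```python
import numpy as np


def long_run_identification(A, sigma_e):
    """Return (B0_inv, D) for a VAR(1) with lower-triangular long-run matrix D."""
    K = A.shape[0]
    lr = np.linalg.inv(np.eye(K) - A)             # (I - A)^{-1}
    D = np.linalg.cholesky(lr @ sigma_e @ lr.T)   # long-run impact matrix
    B0_inv = (np.eye(K) - A) @ D                  # implied impact matrix
    return B0_inv, D


# Hypothetical reduced-form estimates (A has eigenvalues inside the unit circle).
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
sigma_e = np.array([[1.0, 0.3],
                    [0.3, 0.5]])

B0_inv, D = long_run_identification(A, sigma_e)
print(D)
```

The zero in the top right of D says the second shock has no long run effect on the first variable, while the recovered impact matrix still reproduces the reduced form vcov matrix.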
Gali’s (AER, 1999) search for
technology shocks
$$Y_t = \begin{pmatrix} \Delta \log(y_t / h_t) \\ \Delta \log h_t \end{pmatrix}, \qquad D = \begin{pmatrix} d_{11} & 0 \\ d_{21} & d_{22} \end{pmatrix}$$
Only the tech shock (which we say comes first)
has a long run effect on the first variable, (the
change in) output per hour.
Celebrated applications of LR
restrictions
• Blanchard-Quah. Tech shock is the only thing to affect
output in the long run. Q: how important are tech
shocks in driving the business cycle?
• Gali (1999). Tech shock is only thing driving labour
productivity. Appears to cause hours to fall, not rise.
Consistent with sticky price model, not RBC models.
Hours tend to comove positively with business cycle.
Suggests tech shocks not dominant driver of business
cycle.
• Christiano-Eichenbaum-Vigfusson (). Reexamination of
Gali. Conclusion depends on definition of hours
worked used in VAR.
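Identification of VARs using sign
restrictions
$$\Sigma_e = PP', \qquad P = \mathrm{chol}(\Sigma_e)$$
$$\Sigma_e = P C C' P', \qquad C = \mathrm{Givens}(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & c & -s & 0 \\ 0 & s & c & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad c = \cos\theta, \; s = \sin\theta$$
Cholesky factorisation of rf varcov can be expanded with the product of any
orthonormal matrix and its transpose.
So, parameterising with the angle theta, we choose all those that satisfy certain
sign restrictions:
$$\mathrm{sign}\left( S \circ (A^h P C) \right) = R$$
Sign is the ‘signum’ function; S is a selector matrix with ones for restricted
elements, 0s otherwise, applied element by element. C is our Givens matrix. A is our
estimated rf coefficient matrix, so A^h P C is the response at horizon h.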
Signum function and the selector
matrix
$$\mathrm{sign} \begin{pmatrix} 0 & 0.05 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$$
An example of what the signum
function does to a matrix.
Turns things into 0s, 1s or -1s.
Just a way to record in an
algorithm whether things are
positive, negative or zero.
Could do it differently, with
more if, then else statements.
$$\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} \circ \begin{pmatrix} 2 & 6 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$$
An example of doing element by element multiplication with a
selector matrix.
Verbal description of sign restrictions
• Take Cholesky factor P of the vcov matrix.
• Multiply by some Givens, C(theta).
• Check signs of elements of interest.
• If they agree with your restrictions, store and
keep.
• If not, move on.
• At the end, plot ALL the IRFs, and/or
summarise them somehow.
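The steps above can be sketched for a 2 variable VAR(1), where a single angle is enough. The matrices are made-up numbers, the restriction (both variables rise on impact after the first shock) is purely illustrative, and the function names are hypothetical. The response at horizon h is computed as A^h P C(theta).

```python
import numpy as np


def givens_2d(theta):
    """2x2 Givens rotation for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def accepted_rotations(A, sigma_e, S, R, horizon, thetas):
    """Keep the angles whose horizon-h responses match the restricted signs."""
    P = np.linalg.cholesky(sigma_e)
    A_h = np.linalg.matrix_power(A, horizon)
    keep = []
    for theta in thetas:
        irf_h = A_h @ P @ givens_2d(theta)
        # S zeroes out unrestricted entries; R must hold 0 in those places.
        if np.array_equal(np.sign(S * irf_h), R):
            keep.append(theta)
    return keep


# Hypothetical reduced-form estimates.
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
sigma_e = np.array([[1.0, 0.3],
                    [0.3, 0.5]])

# Illustrative restriction: shock 1 raises both variables on impact.
S = np.array([[1.0, 0.0],
              [1.0, 0.0]])
R = np.array([[1.0, 0.0],
              [1.0, 0.0]])

thetas = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
kept = accepted_rotations(A, sigma_e, S, R, horizon=0, thetas=thetas)
print(len(kept), "of", len(thetas), "angles accepted")
```

Each accepted angle gives one candidate SVAR; the full set of accepted IRFs is what gets plotted or summarised.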
i=cb rate, pi=inflation, x=output,
ur=unemployment rate
$$Y_t = \begin{pmatrix} i_t \\ \pi_t \\ x_t \\ ur_t \end{pmatrix} = A Y_{t-1} + e_t$$
$$R = \begin{pmatrix} 1 & 0 & 1 & 0 \\ -1 & 1 & 1 & 0 \\ -1 & -1 & 1 & 0 \\ 1 & 0 & -1 & 0 \end{pmatrix}, \qquad S = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & c & -s & 0 \\ 0 & s & c & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
Mp shock in first column: contraction
raises rate, lowers pi, lowers output,
raises unemployment.
Tech shock in second column: reduction
has ambiguous effect on i, increases pi,
lowers output, ambiguous effect on ue
Demand shock in third column: cb raises
rates to fight it, inflation and output
increase anyway, ue falls.
Fourth shock unidentified. A dustbin
containing many things we don’t need to
worry about.
Alternative way to do sign restrictions
using the QR decomposition
1. Draw L, a K x K matrix with $L_{ij} \sim NID(0, 1)$
2. $L = QR$, with $QQ' = I_K$
3. $C = Q$
4. Proceed as before
Any square matrix can be decomposed into a product of an orthonormal
matrix and an upper triangular matrix.
Matlab will calculate this for you in a flash.
Note equivalence which seems not to be widely understood. Derives from
fact that all orthonormal matrices can be shown to be product of Givens
matrices.
Personal preference: use Givens method. Systematic way to explore the
space.
Random number generators for step 1 in computers are not random, they are
pseudo random. Don’t know if this matters much.
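Steps 1 to 3 in NumPy, as a sketch. The sign normalisation on the diagonal of R is a common convention added here, not something stated on the slide, and `draw_orthonormal` is a hypothetical helper name.

```python
import numpy as np


def draw_orthonormal(K, rng):
    """Draw a K x K orthonormal matrix via QR of a standard normal matrix."""
    L = rng.standard_normal((K, K))      # step 1: L_ij ~ NID(0, 1)
    Q, R = np.linalg.qr(L)               # step 2: L = QR
    # Flip column signs so that diag(R) > 0 (an assumed normalisation).
    Q = Q * np.sign(np.diag(R))
    return Q                             # step 3: use C = Q


rng = np.random.default_rng(1)           # seeded pseudo random generator
Q = draw_orthonormal(3, rng)
print(np.round(Q @ Q.T, 10))
```

Each call gives one candidate rotation C, which is then checked against the sign restrictions exactly as with a Givens matrix.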
Sign restrictions at different or
multiple horizons
$$\mathrm{sign}\left( S \circ (A^h P C(\theta)) \right) = R$$
Choose h the horizon, then
keep IRF if it satisfies this
condition
$$\mathrm{sign}\left( S \circ (A^{h_1} P C(\theta)) \right) = R, \qquad \mathrm{sign}\left( S \circ (A^{h_2} P C(\theta)) \right) = R$$
For multiple horizons, keep if the condition above holds at
each of the chosen horizons!
Warning: spanning the entire space of
possible impulse responses
$$C = \prod_{i=1}^{n(n-1)/2} C_i(\theta_i)$$
A givens indexed by one angle only not
enough to guarantee to find all possible IRFs
that satisfy the sign restriction, except in the
2 variable case [eg n=2, 2*(1)/2=1]
More generally, we need to search across
orthonormal matrices formed by products of
Givens matrices.
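A sketch of building such a product, with one Givens rotation per pair of axes and so n(n-1)/2 angles in total. The function names and the ordering of the pairs are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations


def givens(n, i, j, theta):
    """n x n Givens rotation acting in the (i, j) plane."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G


def rotation_from_angles(n, angles):
    """Product of n(n-1)/2 Givens rotations, one per pair of axes."""
    pairs = list(combinations(range(n), 2))
    if len(angles) != len(pairs):
        raise ValueError("need n(n-1)/2 angles")
    C = np.eye(n)
    for (i, j), theta in zip(pairs, angles):
        C = C @ givens(n, i, j, theta)
    return C


# Hypothetical example: n = 4 needs 4*3/2 = 6 angles.
angles = np.array([0.1, 0.5, 1.0, 1.5, 2.0, 2.5])
C = rotation_from_angles(4, angles)
print(np.round(C @ C.T, 10))
```

Searching over all the angles jointly, on a grid or by random draws, then covers rotations that a single angle would miss.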
Reporting distributions from sign
restrictions
1. For each h compute: $\overline{IRF}_h = \frac{1}{n} \sum_{i=1}^{n} Y^{irf}_{h,i}$
2. Find $i^* = \arg\min_i \sum_h \left( Y^{irf}_{h,i} - \overline{IRF}_h \right)^2$
3. Plot $Y^{irf}_{h,i^*}$
Sign restrictions generate arbitrary numbers of impulse response functions.
How to report them? Can just plot the whole damned lot.
People used to report moments at each h eg the median.
Fry and Pagan suggested above. Find the single SVAR corresponding to single
angle that is closest to this median.
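A sketch of picking the single accepted draw closest to a pointwise summary. The target can be the pointwise median or the pointwise average, selected by a flag. The array of IRFs is made up, the function name is hypothetical, and no standardisation of the distances is applied here.

```python
import numpy as np


def closest_to_target(irfs, use_median=True):
    """Index of the draw nearest the pointwise target.

    irfs has shape (n_draws, n_horizons): one accepted IRF per row.
    """
    target = np.median(irfs, axis=0) if use_median else irfs.mean(axis=0)
    distance = np.sum((irfs - target) ** 2, axis=1)   # summed over horizons
    return int(np.argmin(distance))


# Hypothetical accepted IRFs for one variable and one shock.
irfs = np.array([[0.0, 0.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [5.0, 5.0, 5.0]])

best = closest_to_target(irfs)
print(best, irfs[best])
```

The point of reporting one draw is that it comes from a single SVAR, whereas a pointwise median can mix responses from different models across horizons.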
Sign restrictions: examples
• Giraitis, Kapetanios, Yates.
• Actually a TVP study, but don’t worry about
that until later on in the lecture series.
• Sign restrictions to identify various shocks.
• Studies time variation in the IRFs.
Sign restrictions in Giraitis, Kapetanios,
Yates
[Table of sign restrictions: rows are the monetary policy, technology, labour supply and demand shocks; columns include c, i, y, h, w/p and r. The individual +/- entries did not survive extraction.]
Identification using Mountford and
Uhlig’s penalty method
1. Draw rotation matrix $C(\theta)$
2. Compute $K(\theta) = \sum_{h=1}^{H} w_h \left\| S \circ (A^h P C(\theta)) - V_h \right\|^2$
3. Rank according to $K(\theta)$
w_h are weights; V records not just signs, but magnitudes.
Why would you do this? Isn’t it just assuming the answer?
Well, one motivation is to use one implication of the model to test
another, that you don’t impose.
Another is that some rotations may satisfy the sign restriction literally,
but you want the IRF not just to clear the zero line by a tiny amount.
Using the ‘max share’ criterion in a
VAR to identify news shocks
• Barsky-Sims (2010): identified news shocks for
technology.
• Long history of identifying technology shocks,
measuring impact, quantifying contribution to
business cycles
• RE assumption suggests agents may also react to
news about future events.
• Failure to account for this may mislead us in
properly measuring and quantifying effects of
technology shocks.
Other work on news shocks
• Beaudry-Portier: news shocks in VARs: news
causes hours to increase.
• Jaimovich and Rebelo: news shocks in RBC
model, neutralising the wealth effect so that
hours don’t fall.
• Schmitt-Grohe and Uribe (2012): multiple news
shocks in large RBC style model
• Christiano, Motto, Rostagno and Pinter,
Theodoridis and Yates: risk news shocks in DSGE
model and VAR respectively
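News shocks: algebra
$$y_t = B(L) u_t$$
$$u_t = A \varepsilon_t, \qquad AA' = \Sigma$$
$$y_{t+h} - E_{t-1} y_{t+h} = \sum_{\tau=0}^{h} B_\tau \tilde{A} Q \varepsilon_{t+h-\tau}$$
$$QQ' = I_N, \qquad \omega : \; K(K-1)/2 \times 1$$
$$Q = Q_1 Q_2 \dots Q_{K(K-1)/2}$$
$$\tilde{A} = \mathrm{chol}(\Sigma)$$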
Note annoying change of notation; matches Barsky-Sims. Sorry. Good for the
soul though.
This slide is about writing down the expression for the forecast error up to
horizon h.
News shocks identification, ctd...
$$\Omega_{i,j}(h) = \frac{e_i' \left( \sum_{\tau=0}^{h} B_\tau \tilde{A} Q e_j e_j' Q' \tilde{A}' B_\tau' \right) e_i}{e_i' \left( \sum_{\tau=0}^{h} B_\tau \Sigma B_\tau' \right) e_i}$$
The share of the FEV up to
horizon h, of variable i,
accounted for by shock j...
Example news process for tfp.
Could be more general than this.
$$tfp_t = \rho \, tfp_{t-1} + u_{tfp,t} + u^{news}_{tfp,t-1}$$
$$\Omega_{1,1}(h) + \Omega_{1,2}(h) = 1$$
In an ideal setting, share of current and news shock to tfp
that accounts for tfp should be 1. tfp is exogenous after
all!
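A sketch of the forecast error variance share for variable i and shock j up to horizon h, using the slide's notation: B_tau are the reduced form moving average matrices, A_tilde is the Cholesky factor of the residual vcov matrix and Q is an orthonormal rotation. The VAR(1) numbers, the rotation angle and the function name `fev_share` are illustrative assumptions.

```python
import numpy as np


def fev_share(B, A_tilde, Q, i, j, h):
    """Share of variable i's forecast error variance up to horizon h due to shock j.

    B is a sequence of reduced-form MA matrices B_0, ..., B_h with B_0 = I.
    """
    Sigma = A_tilde @ A_tilde.T
    gamma = A_tilde @ Q[:, j]            # impact vector of shock j
    num = sum((B[tau][i, :] @ gamma) ** 2 for tau in range(h + 1))
    den = sum(B[tau][i, :] @ Sigma @ B[tau][i, :] for tau in range(h + 1))
    return num / den


# Hypothetical VAR(1): its MA matrices are powers of the coefficient matrix.
var_coef = np.array([[0.5, 0.1],
                     [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
h = 8
B = [np.linalg.matrix_power(var_coef, tau) for tau in range(h + 1)]
A_tilde = np.linalg.cholesky(Sigma)

theta = 0.4                               # one rotation angle (K = 2)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])

shares = [fev_share(B, A_tilde, Q, i=0, j=j, h=h) for j in range(2)]
print(shares, sum(shares))
```

Because Q is orthonormal, the shares across all shocks add up to one for each variable, whatever the rotation.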
News shocks: the max share criterion
$$\max_{\omega} \; \Omega_{i,j}(h)$$
subject to
$$\tilde{A}(1, j) = 0, \quad \forall j > 1$$
$$\mathrm{sign}(S \circ A_{22}) = F$$
By choice of the K.(K-1)/2 vector of angles
w, maximise the share of tfp forecast
error variance up to horizon h, accounted
for by the news shock to tfp, subject to
the restriction that the news shock is
orthogonal to tfp today. [if it weren’t, it
wouldn’t be news, it would be a contemp.
Shock to tfp]
In Pinter, Theodoridis and Yates, in our
search for ‘risk news’ shocks, we impose
a sign restrictions condition too.
Comments on news shock ID
• Method proposed originally as an alternative
to LR restrictions that does not depend on
uncovering zero frequency events in finite
samples: see work by Faust and Diebold et al.
• Note practical contradiction. Proxy for shock
is supposed to be exogenous, but is included
as an endogenous variable.
• Monte carlo tests in economic laboratory
show that it works in ‘theory’.
Application : ‘risk news shock’ in
Pinter, Theodoridis and Yates
• Looks for the same shock as in CMR(2013), AER.
• Risk shocks are fluctuations in the variance of
idiosyncratic returns to entrepreneurs.
• Literally, in CMR, these guys build capital goods
from capital inputs, selling them to sticky-price
producers.
• They borrow from banks at a spread related to
their cross-sectional risk.
• Higher risk means more defaults, means bigger
spread to compensate.
Pinter, Theodoridis, Yates
• VAR study allows us to drop lots of
contestable assumptions in the full
information estimation of CMR’s DSGE model.
• Cost: we have to assume we can observe risk.
• Do this using the VIX, and using cross section
of stock returns computed from a US panel.
Risk proxies
‘F’: Sign and zero restrictions in the
VAR
[Table: sign (+/-) and zero restrictions imposed at t and t+1 for the news, tech, net w and mpol shocks, on risk, spread, GDP growth, C growth, I growth, hours, r wage growth, net worth growth, inflation and policy rate. The individual entries did not survive extraction.]
IRF to a risk news shock: VIX vs CSR
FEVD contributions (VIX)
Headlines from application
• PTY get a much smaller contribution of the
risk and risk news shocks to the business cycle
than CMR. 20% contribution in total,
compared to 60% in CMR.
Rigobon: identification through
heteroskedasticity
• If you can persuasively argue there was a
change in the structural shock variances
• And at the same time the VAR’s coefficients
were constant
• Then you can use the change to generate
more equations to find your unknowns and
get identification.
• In the limit, this is like event study analysis.
Similar variance of demand and supply shocks generates cloud of dots, unable
to see shape of either curve of course.
Increase in variance of supply shocks starts to trace out shape of demand
curve helping us to ‘identify’ demand coefficients.
In the limit : event study; no demand shocks at all.
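The identification problem: Rigobon’s
simple demand-supply example
$$p_t = \beta q_t + \varepsilon_t$$
$$q_t = \alpha p_t + \eta_t$$
$$P = (p_1, p_2, \dots, p_T), \qquad Q = (q_1, q_2, \dots, q_T)$$
A simultaneous demand and supply system.
All we observe is time series on prices P and
quantities Q
We try to estimate four unknowns, the two slopes,
and the two shock variances
$$\alpha, \; \beta, \; \sigma^2_\varepsilon, \; \sigma^2_\eta$$
$$\Omega = \frac{1}{(1 - \alpha\beta)^2} \begin{pmatrix} \beta^2 \sigma^2_\eta + \sigma^2_\varepsilon & \beta \sigma^2_\eta + \alpha \sigma^2_\varepsilon \\ \beta \sigma^2_\eta + \alpha \sigma^2_\varepsilon & \sigma^2_\eta + \alpha^2 \sigma^2_\varepsilon \end{pmatrix}$$
However all we can estimate consistently is the reduced form vcov matrix.
Its symmetry means we only have 3 different elements, so this is 3 equations
in 4 unknowns.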
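Rigobon’s heteroskedasticity solution
to the identification problem
$$\Omega_s = \frac{1}{(1 - \alpha\beta)^2} \begin{pmatrix} \beta^2 \sigma^2_{\eta,s} + \sigma^2_{\varepsilon,s} & \beta \sigma^2_{\eta,s} + \alpha \sigma^2_{\varepsilon,s} \\ \beta \sigma^2_{\eta,s} + \alpha \sigma^2_{\varepsilon,s} & \sigma^2_{\eta,s} + \alpha^2 \sigma^2_{\varepsilon,s} \end{pmatrix}, \qquad s = 1, 2$$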
There are TWO values for s, so we have TWO reduced form vcov
matrices.
This means two lots of 3 equations=6.
Slope coefficients are unchanged, so they provide 2 unknowns as
before.
Structural variances change, so there are now double 2=4
unknowns.
Making 6 equations in 6 unknowns.
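A sketch of how two regimes pin down the slopes, written for the system p = beta*q + eps, q = alpha*p + eta (this labelling of the symbols is an assumption). It uses population vcov matrices built from made-up parameter values, and one particular solution route: the eigenvectors of Omega_2 Omega_1^{-1} are proportional to (1, alpha) and (beta, 1), provided the relative variances of the two shocks change between regimes. Which eigenvector belongs to which equation still has to be settled by an outside assumption, here the signs of the slopes.

```python
import numpy as np


def reduced_form_cov(alpha, beta, var_eps, var_eta):
    """Population vcov of (p, q) for p = beta*q + eps, q = alpha*p + eta."""
    M = np.array([[1.0, beta],
                  [alpha, 1.0]]) / (1.0 - alpha * beta)
    return M @ np.diag([var_eps, var_eta]) @ M.T


def slope_ratios(omega1, omega2):
    """Sorted ratios (second/first element) of the eigenvectors of omega2 @ inv(omega1).

    In population one ratio equals alpha and the other equals 1/beta.
    """
    _, eigvecs = np.linalg.eig(omega2 @ np.linalg.inv(omega1))
    ratios = np.real(eigvecs[1, :] / eigvecs[0, :])
    return np.sort(ratios)


# Hypothetical true parameters: alpha > 0 and beta < 0.
alpha_true, beta_true = 0.5, -0.8
omega1 = reduced_form_cov(alpha_true, beta_true, var_eps=1.0, var_eta=1.0)
omega2 = reduced_form_cov(alpha_true, beta_true, var_eps=1.0, var_eta=4.0)

ratios = slope_ratios(omega1, omega2)
alpha_hat = ratios[1]          # the positive ratio, labelled by the sign assumption
beta_hat = 1.0 / ratios[0]     # the negative ratio is 1/beta
print(alpha_hat, beta_hat)
```

With estimated rather than population vcov matrices the same calculation applies, but sampling error means the recovered slopes are only estimates, and the method breaks down if the two shock variances move in proportion.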
Lütkepohl has applied this to VARs.