Transcript Week6.1 VAR

VAR Models
Gloria González-Rivera
University of California, Riverside
and
Jesús Gonzalo, Universidad Carlos III de Madrid
Some References
• Hamilton, chapter 11
• Enders, chapter 5
• Palgrave Handbook of Econometrics, chapter 12 by Lütkepohl
• Any of Lütkepohl's books on multiple time series
Multivariate Models
• VARMAX models as a multivariate generalization of the univariate ARMA models:
$$\sum_{s=0}^{p} \Phi_s L^s \, Y_t \;=\; \sum_{i=0}^{r} G_i L^i \, X_t \;+\; \sum_{j=0}^{q} \Theta_j L^j \, \varepsilon_t$$
where the $\Phi_s$ and $\Theta_j$ are $n \times n$, $Y_t$ is $n \times 1$, the $G_i$ are $n \times k$, and $X_t$ is $k \times 1$.
• Structural VAR models:
$$B Y_t = \Gamma_1 Y_{t-1} + \dots + \Gamma_p Y_{t-p} + \varepsilon_t$$
• VAR models (reduced form):
$$Y_t = \Phi_1 Y_{t-1} + \dots + \Phi_p Y_{t-p} + a_t$$
Multivariate Models (cont.)
where the error term is a vector white noise:
$$E(a_t a_s') = \begin{cases} \Sigma & \text{if } s = t \\ 0 & \text{otherwise} \end{cases}$$
To avoid parameter redundancy among the parameters, we need to assume a certain structure on $\Phi_0$, $\Theta_0$ and $\Sigma$ (typically $\Phi_0 = \Theta_0 = I_n$). This is similar to univariate models.
A Structural VAR(1)
Consider a bivariate $Y_t = (y_t, x_t)'$ first-order VAR model:
$$y_t = b_{10} - b_{12}\, x_t + \gamma_{11}\, y_{t-1} + \gamma_{12}\, x_{t-1} + \varepsilon_{yt}$$
$$x_t = b_{20} - b_{21}\, y_t + \gamma_{21}\, y_{t-1} + \gamma_{22}\, x_{t-1} + \varepsilon_{xt}$$
• The error terms (structural shocks) $\varepsilon_{yt}$ and $\varepsilon_{xt}$ are white noise innovations with standard deviations $\sigma_y$ and $\sigma_x$ and a zero covariance.
• The two variables y and x are endogenous. (Why?)
• Note that shock $\varepsilon_{yt}$ affects y directly and x indirectly.
• There are 10 parameters to estimate.
From a Structural VAR to a Standard VAR
• The structural VAR is not a reduced form.
• In a reduced form representation, y and x are just functions of lagged y and x.
• To solve for the reduced form, write the structural VAR in matrix form as:
$$\begin{pmatrix} 1 & b_{12} \\ b_{21} & 1 \end{pmatrix}\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} b_{10} \\ b_{20} \end{pmatrix} + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix}\begin{pmatrix} y_{t-1} \\ x_{t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{xt} \end{pmatrix}$$
$$B Y_t = \Gamma_0 + \Gamma_1 Y_{t-1} + \varepsilon_t$$
From a Structural VAR to a Standard VAR (cont.)
• Premultiplication by $B^{-1}$ allows us to obtain a standard VAR(1):
$$B Y_t = \Gamma_0 + \Gamma_1 Y_{t-1} + \varepsilon_t$$
$$Y_t = B^{-1}\Gamma_0 + B^{-1}\Gamma_1 Y_{t-1} + B^{-1}\varepsilon_t$$
$$Y_t = \Phi_0 + \Phi_1 Y_{t-1} + a_t$$
• This is the reduced form we are going to estimate (by OLS, equation by equation).
• Before estimating it, we will present the stability conditions (the roots of a characteristic polynomial have to be outside the unit circle) for a VAR(p).
• After estimating the reduced form, we will discuss what information we get from the estimates (Granger causality, impulse response function) and also how we can recover the structural parameters (notice that we now have only 9 parameters).
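As a quick numerical illustration (mine, not from the slides), here is a minimal numpy sketch of the premultiplication step, with made-up structural parameter values:

```python
import numpy as np

# Hypothetical structural parameters for the bivariate SVAR(1)
B = np.array([[1.0, 0.4],          # [1, b12]
              [0.3, 1.0]])         # [b21, 1]
Gamma0 = np.array([0.5, 0.2])      # (b10, b20)'
Gamma1 = np.array([[0.7, 0.1],
                   [0.2, 0.5]])    # structural lag coefficients
D = np.diag([1.0, 0.8])            # diag(sigma_y^2, sigma_x^2): uncorrelated shocks

Binv = np.linalg.inv(B)
Phi0 = Binv @ Gamma0               # reduced-form intercept
Phi1 = Binv @ Gamma1               # reduced-form lag matrix
Sigma_a = Binv @ D @ Binv.T        # cov of a_t = B^{-1} eps_t: generally not diagonal
print(Phi0, Phi1, Sigma_a, sep="\n")
```

Note that even though the structural shocks are uncorrelated, the reduced-form errors $a_t$ are correlated through $B^{-1}$; this is the identification issue discussed later.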
A bit of history ... Once upon a time
Sims (1980), "Macroeconomics and Reality", Econometrica, 48.
Generalization of univariate analysis to an array of random variables, i.e. $Z_t$ = money supply, $X_t$ = interest rate, $V_t$ = income.
VAR(p):
$$Y_t = \begin{pmatrix} Z_t \\ X_t \\ V_t \end{pmatrix}, \qquad Y_t = c + \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \dots + \Phi_p Y_{t-p} + a_t \qquad (1)$$
$$E(a_t) = 0, \qquad E(a_t a_\tau') = \begin{cases} \Sigma & t = \tau \\ 0 & t \neq \tau \end{cases}$$
The $\Phi_i$ are $n \times n$ matrices, e.g.
$$\Phi_1 = \begin{pmatrix} \phi_{11}^{(1)} & \phi_{12}^{(1)} & \phi_{13}^{(1)} \\ \phi_{21}^{(1)} & \phi_{22}^{(1)} & \phi_{23}^{(1)} \\ \phi_{31}^{(1)} & \phi_{32}^{(1)} & \phi_{33}^{(1)} \end{pmatrix}$$
A typical equation of the system is
$$Z_t = c_1 + \phi_{11}^{(1)} Z_{t-1} + \phi_{12}^{(1)} X_{t-1} + \phi_{13}^{(1)} V_{t-1} + \dots + \phi_{11}^{(p)} Z_{t-p} + \phi_{12}^{(p)} X_{t-p} + \phi_{13}^{(p)} V_{t-p} + a_{1t}$$
Each equation has the same regressors.
Stability Conditions
$$Y_t = \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \dots + \Phi_p Y_{t-p} + c + a_t$$
$$(I - \Phi_1 L - \Phi_2 L^2 - \dots - \Phi_p L^p) Y_t = c + a_t$$
$$\Phi(L) Y_t = c + a_t$$
$\Phi(L)$ is an $n \times n$ matrix polynomial in the lag operator L; the $ij$-th element of $\Phi(L)$ is
$$\left[\delta_{ij} - \phi_{ij}^{(1)} L - \phi_{ij}^{(2)} L^2 - \dots - \phi_{ij}^{(p)} L^p\right], \qquad \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$
A VAR(p) for $Y_t$ is STABLE if all $p \times n$ roots of the characteristic polynomial
$$\left| I_n - \Phi_1 x - \Phi_2 x^2 - \dots - \Phi_p x^p \right| = 0$$
are outside the unit circle. In that case the mean is
$$\mu = (I_n - \Phi_1 - \Phi_2 - \dots - \Phi_p)^{-1} c$$
If the VAR is stable, then an MA($\infty$) representation exists:
$$Y_t = \mu + a_t + \Psi_1 a_{t-1} + \Psi_2 a_{t-2} + \dots = \mu + \Psi(L) a_t$$
$$\Psi(L) = [I_n + \Psi_1 L + \Psi_2 L^2 + \dots]$$
This representation will be the "key" to study the impulse response function of a given shock.
From a VAR(p) to a VAR(1)
Re-writing the system in deviations from its mean:
$$Y_t - \mu = \Phi_1 (Y_{t-1} - \mu) + \Phi_2 (Y_{t-2} - \mu) + \dots + \Phi_p (Y_{t-p} - \mu) + a_t$$
Stack the vectors as
$$\xi_t = \begin{pmatrix} Y_t - \mu \\ Y_{t-1} - \mu \\ \vdots \\ Y_{t-p+1} - \mu \end{pmatrix} \quad
F = \begin{pmatrix} \Phi_1 & \Phi_2 & \dots & \Phi_{p-1} & \Phi_p \\ I_n & 0 & \dots & 0 & 0 \\ 0 & I_n & \dots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \dots & I_n & 0 \end{pmatrix} \quad
v_t = \begin{pmatrix} a_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
where $\xi_t$ and $v_t$ are $(np) \times 1$ and F is $(np) \times (np)$. Then
$$\xi_t = F \xi_{t-1} + v_t, \qquad E(v_t v_\tau') = \begin{cases} H & t = \tau \\ 0 & t \neq \tau \end{cases}, \qquad
H = \begin{pmatrix} \Sigma & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \dots & 0 \end{pmatrix}_{(np) \times (np)}$$
STABLE: the eigenvalues of F lie inside the unit circle (WHY?).
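A minimal numpy sketch (my own illustration, not part of the slides) that builds the companion matrix F for given coefficient matrices and checks the eigenvalue condition:

```python
import numpy as np

def companion(phis):
    """Build the (np x np) companion matrix F from a list of n x n Phi_i."""
    p = len(phis)
    n = phis[0].shape[0]
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(phis)            # top block row: Phi_1 ... Phi_p
    F[n:, :-n] = np.eye(n * (p - 1))      # identity blocks below the diagonal
    return F

def is_stable(phis):
    """VAR(p) is stable iff all eigenvalues of F are inside the unit circle."""
    eig = np.linalg.eigvals(companion(phis))
    return np.all(np.abs(eig) < 1)

# Example with made-up coefficients for a bivariate VAR(2)
Phi1 = np.array([[0.5, 0.1], [0.0, 0.4]])
Phi2 = np.array([[0.2, 0.0], [0.1, 0.1]])
print(is_stable([Phi1, Phi2]))  # True for these values
```

The eigenvalues of F are the reciprocals of the roots of $|I_n - \Phi_1 x - \dots - \Phi_p x^p| = 0$, which is why the two stability conditions are equivalent.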
Estimation of VAR models
Estimation: conditional MLE
$$f(Y_T, Y_{T-1}, \dots, Y_1 \mid Y_0, Y_{-1}, \dots, Y_{-p+1}; \theta) = \prod_{t=1}^{T} f(Y_t \mid Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}; \theta)$$
$$Y_t \mid Y_{t-1}, Y_{t-2}, \dots \sim N(c + \Phi_1 Y_{t-1} + \dots + \Phi_p Y_{t-p}, \; \Sigma)$$
Let
$$\Pi' = [\,c \ \ \Phi_1 \ \ \Phi_2 \ \dots \ \Phi_p\,] \quad (n \times (np+1))$$
$$X_t = [\,1 \ \ Y_{t-1}' \ \ Y_{t-2}' \ \dots \ Y_{t-p}'\,]' \quad ((np+1) \times 1)$$
$$Y_t = \Pi' X_t + a_t$$
The log-likelihood is
$$\ell(\theta) = \sum_{t=1}^{T} \log f(Y_t \mid \text{past}; \theta) = -\frac{Tn}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| - \frac{1}{2}\sum_{t=1}^{T}(Y_t - \Pi' X_t)'\,\Sigma^{-1}\,(Y_t - \Pi' X_t)$$
Claim: OLS estimates equation by equation are good!!!
$$\hat\Pi'_{OLS} = \left[\sum_{t=1}^{T} Y_t X_t'\right]\left[\sum_{t=1}^{T} X_t X_t'\right]^{-1}, \qquad \hat\Pi_{MLE} = \hat\Pi_{OLS}$$
Proof:
$$\sum_{t=1}^{T}(Y_t - \Pi'X_t)'\,\Sigma^{-1}\,(Y_t - \Pi'X_t)
= \sum_{t=1}^{T}\left[ Y_t - \hat\Pi'_{OLS}X_t + (\hat\Pi_{OLS} - \Pi)'X_t \right]' \Sigma^{-1} \left[ Y_t - \hat\Pi'_{OLS}X_t + (\hat\Pi_{OLS} - \Pi)'X_t \right]$$
$$= \sum_{t=1}^{T}\left[ \hat a_t + (\hat\Pi_{OLS} - \Pi)'X_t \right]' \Sigma^{-1} \left[ \hat a_t + (\hat\Pi_{OLS} - \Pi)'X_t \right]$$
$$= \sum_t \hat a_t' \Sigma^{-1} \hat a_t + \sum_t X_t'(\hat\Pi_{OLS} - \Pi)\,\Sigma^{-1}\,(\hat\Pi_{OLS} - \Pi)'X_t + 2\sum_t \hat a_t' \Sigma^{-1}(\hat\Pi_{OLS} - \Pi)'X_t$$
The cross term vanishes:
$$(*) \quad \sum_t \hat a_t' \Sigma^{-1}(\hat\Pi_{OLS} - \Pi)'X_t = \operatorname{tr}\left[\sum_t \hat a_t' \Sigma^{-1}(\hat\Pi_{OLS} - \Pi)'X_t\right] = \operatorname{tr}\left[\Sigma^{-1}(\hat\Pi_{OLS} - \Pi)'\sum_t X_t \hat a_t'\right] = 0$$
because the OLS residuals are orthogonal to the regressors, $\sum_t X_t \hat a_t' = 0$. Therefore
$$\min_{\Pi}\,\sum_{t=1}^{T}(Y_t - \Pi'X_t)'\,\Sigma^{-1}\,(Y_t - \Pi'X_t) = \min_{\Pi}\left[ \sum_t \hat a_t' \Sigma^{-1} \hat a_t + \sum_t X_t'(\hat\Pi_{OLS} - \Pi)\,\Sigma^{-1}\,(\hat\Pi_{OLS} - \Pi)'X_t \right]$$
Because $\Sigma$ is a positive definite matrix, $\Sigma^{-1}$ is also positive definite, so the second term is nonnegative and the smallest value is achieved when $\Pi = \hat\Pi_{OLS}$.
Maximum likelihood of $\Sigma$: evaluate the log-likelihood at $\hat\Pi$, then
$$\ell(\Sigma, \hat\Pi) = -\frac{Tn}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| - \frac{1}{2}\sum_{t=1}^{T}\hat a_t' \Sigma^{-1} \hat a_t$$
$$\frac{\partial \ell(\Sigma, \hat\Pi)}{\partial \Sigma^{-1}} = \frac{T}{2}\Sigma' - \frac{1}{2}\sum_{t=1}^{T}\hat a_t \hat a_t' = 0 \;\Rightarrow\; \hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}\hat a_t \hat a_t'$$
Diagonal elements: $\hat\sigma_{ii}^2 = \frac{1}{T}\sum_{t=1}^{T}\hat a_{it}^2$. Off-diagonal elements: $\hat\sigma_{ij} = \frac{1}{T}\sum_{t=1}^{T}\hat a_{it}\hat a_{jt}$.
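A compact numpy sketch (my own; `ols_var` is a hypothetical helper name) of the equation-by-equation OLS estimator $\hat\Pi$ and the MLE $\hat\Sigma$:

```python
import numpy as np

def ols_var(Y, p):
    """Estimate a VAR(p) by OLS. Y is (T x n); returns Pi ((np+1) x n) and Sigma."""
    T, n = Y.shape
    # Regressor matrix with rows X_t' = [1, Y_{t-1}', ..., Y_{t-p}']
    X = np.hstack([np.ones((T - p, 1))] +
                  [Y[p - i:T - i] for i in range(1, p + 1)])
    Yt = Y[p:]
    Pi, *_ = np.linalg.lstsq(X, Yt, rcond=None)   # OLS, equation by equation
    resid = Yt - X @ Pi
    Sigma = resid.T @ resid / (T - p)             # MLE of Sigma (no df correction)
    return Pi, Sigma
```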
Testing Hypotheses in a VAR model
Likelihood ratio test in a VAR:
$$\ell(\hat\Sigma, \hat\Pi) = -\frac{Tn}{2}\log(2\pi) - \frac{T}{2}\log|\hat\Sigma| - \frac{1}{2}\sum_{t=1}^{T}\hat a_t' \hat\Sigma^{-1} \hat a_t$$
The last term is a constant:
$$\frac{1}{2}\sum_{t=1}^{T}\hat a_t' \hat\Sigma^{-1} \hat a_t = \frac{1}{2}\operatorname{trace}\left[\sum_{t=1}^{T}\hat a_t' \hat\Sigma^{-1} \hat a_t\right] = \frac{1}{2}\operatorname{trace}\left[\hat\Sigma^{-1}\sum_{t=1}^{T}\hat a_t \hat a_t'\right] = \frac{1}{2}\operatorname{trace}\left[\hat\Sigma^{-1}\, T \hat\Sigma\right] = \frac{1}{2}\operatorname{trace}(T I_n) = \frac{Tn}{2}$$
so
$$\ell(\hat\Sigma, \hat\Pi) = -\frac{Tn}{2}\log(2\pi) - \frac{T}{2}\log|\hat\Sigma| - \frac{Tn}{2}$$
Testing the number of lags, $p_1 > p_0$:
$$H_0: \text{VAR}(p_0) \qquad H_1: \text{VAR}(p_1)$$
Under $H_0$, perform n OLS regressions of each variable on a constant and $p_0$ lags, obtaining $\hat\Pi_0, \hat\Sigma_0$:
$$\ell_0^* = -\frac{Tn}{2}\log(2\pi) - \frac{T}{2}\log|\hat\Sigma_0| - \frac{Tn}{2}$$
Under $H_1$:
$$\ell_1^* = -\frac{Tn}{2}\log(2\pi) - \frac{T}{2}\log|\hat\Sigma_1| - \frac{Tn}{2}$$
$$LR = 2(\ell_1^* - \ell_0^*) = T\left(\log|\hat\Sigma_0| - \log|\hat\Sigma_1|\right) \xrightarrow{d} \chi^2_m$$
$$m = \text{number of restrictions} = n^2(p_1 - p_0)$$
(each equation has $p_1 - p_0$ restrictions on each variable, hence $n(p_1 - p_0)$ restrictions in each equation).
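A small sketch (my own, reusing the hypothetical `ols_var` helper from the estimation section) of the LR test of VAR(p0) against VAR(p1); both models are fit on the same effective sample:

```python
import numpy as np
from scipy import stats

def lr_lag_test(Y, p0, p1):
    """LR test of H0: VAR(p0) vs H1: VAR(p1), with p1 > p0."""
    n = Y.shape[1]
    _, Sig0 = ols_var(Y[p1 - p0:], p0)   # drop initial obs so samples match
    _, Sig1 = ols_var(Y, p1)
    T = Y.shape[0] - p1                  # effective sample size
    LR = T * (np.log(np.linalg.det(Sig0)) - np.log(np.linalg.det(Sig1)))
    m = n**2 * (p1 - p0)                 # number of restrictions
    return LR, 1 - stats.chi2.cdf(LR, m)
```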
In general, linear hypotheses can be tested directly as usual, and their asymptotic distribution follows from the next asymptotic result.
Let $\hat\pi_T = \operatorname{vec}(\hat\Pi_T)$ denote the $(nk \times 1)$ vector (with $k = 1 + np$ the number of parameters estimated per equation) of coefficients resulting from OLS regressions of each of the elements of $y_t$ on $x_t$ for a sample of size T:
$$\hat\pi_T = \begin{pmatrix} \hat\pi_{1T} \\ \vdots \\ \hat\pi_{nT} \end{pmatrix}, \qquad \hat\pi_{iT} = \left[\sum_{t=1}^{T} x_t x_t'\right]^{-1}\left[\sum_{t=1}^{T} x_t y_{it}\right]$$
The asymptotic distribution of $\hat\pi$ is
$$\sqrt{T}\,(\hat\pi_T - \pi) \xrightarrow{d} N\!\left(0, \ \Sigma \otimes M^{-1}\right)$$
and for the coefficients of regression i,
$$\sqrt{T}\,(\hat\pi_{iT} - \pi_i) \xrightarrow{d} N\!\left(0, \ \sigma_i^2\, M^{-1}\right)$$
with $M = \operatorname{plim}\,(1/T)\sum_t x_t x_t'$.
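For a single equation i, this result justifies the usual OLS standard errors, since $\operatorname{Var}(\hat\pi_{iT}) \approx \sigma_i^2 \left(\sum_t x_t x_t'\right)^{-1}$. A minimal sketch (my own) for one equation, given its regressor matrix X and regressand y:

```python
import numpy as np

def ols_se(X, y):
    """OLS coefficients and asymptotic standard errors for one VAR equation."""
    pi_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ pi_hat
    sig2 = resid @ resid / len(y)           # MLE of sigma_i^2
    cov = sig2 * np.linalg.inv(X.T @ X)     # sigma_i^2 (X'X)^{-1} approximates Var(pi_hat)
    return pi_hat, np.sqrt(np.diag(cov))
```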
Information Criterion in a Standard VAR(p)
• In the same way as in the univariate AR(p) models, information criteria (IC) can be used to choose the "right" number of lags in a VAR(p): $\hat p$ minimizes IC(p) for p = 1, ..., P (see the sketch after this list).
$$AIC = \ln|\hat\Sigma| + \frac{2(n^2 p + n)}{T}$$
$$SBC = \ln|\hat\Sigma| + \frac{(n^2 p + n)\ln(T)}{T}$$
• Consistency results similar to those obtained in the univariate world hold in the multivariate world. The only difference is that as the number of variables gets bigger, it becomes more unlikely that the AIC ends up overparametrizing (see Gonzalo and Pitarakis (2002), Journal of Time Series Analysis).
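A hand-rolled lag search under these definitions, reusing the hypothetical `ols_var` helper from above (statsmodels' `VAR(data).select_order(maxlags=P)` offers a ready-made alternative, though its IC definitions may differ from the slides by constants):

```python
import numpy as np

def ic_select(Y, P):
    """AIC and SBC, as defined on the slide, for p = 1..P on a common sample."""
    T, n = Y.shape
    out = {}
    for p in range(1, P + 1):
        _, Sigma = ols_var(Y[P - p:], p)       # same effective sample for all p
        Teff = T - P
        k = n**2 * p + n                       # number of estimated parameters
        logdet = np.log(np.linalg.det(Sigma))
        out[p] = {"AIC": logdet + 2 * k / Teff,
                  "SBC": logdet + k * np.log(Teff) / Teff}
    return out                                 # pick the p minimizing the criterion
```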
Granger Causality
Granger (1969): "Investigating Causal Relations by Econometric Models and Cross-Spectral Methods", Econometrica, 37.
Consider two random variables $X_t, Y_t$ and two forecasts of $X_{t+s}$, s periods ahead:
$$\hat X_t^{(1)}(s) = E(X_{t+s} \mid X_t, X_{t-1}, \dots)$$
$$\hat X_t^{(2)}(s) = E(X_{t+s} \mid X_t, X_{t-1}, \dots, Y_t, Y_{t-1}, \dots)$$
$$MSE(\hat X_t(s)) = E\left(X_{t+s} - \hat X_t(s)\right)^2$$
If $MSE(\hat X_t^{(1)}(s)) = MSE(\hat X_t^{(2)}(s))$ for all $s > 0$, then $Y_t$ does not Granger-cause $X_t$:
$Y_t$ is not linearly informative for forecasting $X_t$.
Test for Granger-causality
Assume a lag length of p:
$$X_t = c_1 + \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \dots + \alpha_p X_{t-p} + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + a_t$$
Estimate by OLS and test the following hypothesis:
$$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0 \quad (Y_t \text{ does not Granger-cause } X_t)$$
$$H_1: \text{some } \beta_i \neq 0$$
Unrestricted sum of squared residuals: $RSS_1 = \sum_t \hat a_t^2$.
Restricted sum of squared residuals: $RSS_2 = \sum_t \hat{\hat a}_t^2$.
$$F = \frac{(RSS_2 - RSS_1)/p}{RSS_1/(T - 2p - 1)}$$
• Under general conditions, F is approximately distributed as $F(p, T - 2p - 1)$; equivalently, $pF \xrightarrow{d} \chi^2(p)$.
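As a hedged pointer, statsmodels ships a version of this F test; in the sketch below, `data` is a hypothetical (T x 2) array whose first column plays the role of $X_t$ and whose second column is $Y_t$ (the function tests whether the second column Granger-causes the first):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2))     # placeholder data; use your own series
res = grangercausalitytests(data, maxlag=4)        # prints tests for lags 1..4
F, pval, df_den, df_num = res[4][0]['ssr_ftest']   # F test at lag p = 4
```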
Impulse Response Function (IRF)
Objective: the reaction of the system to a shock.
$$Y_t = c + \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \dots + \Phi_p Y_{t-p} + a_t$$
If the system is stable,
$$Y_t = \mu + \Psi(L) a_t = \mu + a_t + \Psi_1 a_{t-1} + \Psi_2 a_{t-2} + \dots, \qquad \Psi(L) = [\Phi(L)]^{-1}$$
Redating at time $t + s$:
$$Y_{t+s} = \mu + a_{t+s} + \Psi_1 a_{t+s-1} + \Psi_2 a_{t+s-2} + \dots + \Psi_s a_t + \Psi_{s+1} a_{t-1} + \dots$$
$$\frac{\partial Y_{t+s}}{\partial a_t'} = \Psi_s = \left[\psi_{ij}^{(s)}\right]_{n \times n} \quad \text{(multipliers)}$$
$$\frac{\partial y_{i,t+s}}{\partial a_{jt}} = \psi_{ij}^{(s)}$$
Reaction of the i-th variable to a unit change in innovation j.
Impulse Response Function (cont.)
Impulse-response function: the response of $y_{i,t+s}$ to a one-time impulse in $y_{jt}$ with all other variables dated t or earlier held constant:
$$\frac{\partial y_{i,t+s}}{\partial a_{jt}} = \psi_{ij}^{(s)}$$
[Figure: plot of $\psi_{ij}^{(s)}$ against the horizon s = 1, 2, 3, ....]
Example: IRF for a VAR(1)
$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix}\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} a_{1t} \\ a_{2t} \end{pmatrix}; \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}$$
Suppose $y_{1t} = y_{2t} = 0$ for $t < 0$; at $t = 0$: $a_{10} = 0$, $a_{20} = 1$ ($y_2$ increases by 1 unit), and no more shocks occur.
Reaction of the system (impulse):
$$\begin{pmatrix} y_{10} \\ y_{20} \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
$$\begin{pmatrix} y_{11} \\ y_{21} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \phi_{12} \\ \phi_{22} \end{pmatrix}$$
$$\begin{pmatrix} y_{12} \\ y_{22} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix}\begin{pmatrix} y_{11} \\ y_{21} \end{pmatrix} = \Phi_1^2 \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
$$\vdots$$
$$\begin{pmatrix} y_{1s} \\ y_{2s} \end{pmatrix} = \Phi_1^s \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
If you work with the MA representation, $\Psi(L) = \Phi(L)^{-1}$ gives
$$\Psi_1 = \Phi_1, \quad \Psi_2 = \Phi_1^2, \quad \dots, \quad \Psi_s = \Phi_1^s$$
In this example, the variance-covariance matrix of the innovations is not diagonal, i.e. $\sigma_{12} \neq 0$. There is contemporaneous correlation between the shocks, so an impulse
$$\begin{pmatrix} y_{10} \\ y_{20} \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
is not very realistic: a shock to $a_2$ would typically come together with a nonzero $a_1$.
To avoid this problem, the variance-covariance matrix has to be diagonalized (the shocks have to be orthogonal), and here is where a serious problem appears.
Reminder: $\Sigma$ is a positive definite (symmetric) matrix, so $\exists\, Q$ (non-singular) such that $Q \Sigma Q' = I$.
Then the MA representation
$$Y_t = \mu + \sum_{i=0}^{\infty} \Psi_i a_{t-i}, \qquad \Psi_0 = I_n$$
can be written as
$$Y_t = \mu + \sum_{i=0}^{\infty} \Psi_i Q^{-1} Q a_{t-i}$$
Let us call $M_i = \Psi_i Q^{-1}$ and $w_t = Q a_t$, so that
$$Y_t = \mu + \sum_{i=0}^{\infty} M_i w_{t-i}$$
$$E[w_t w_t'] = E[Q a_t a_t' Q'] = Q E[a_t a_t'] Q' = Q \Sigma Q' = I_n$$
$w_t$ has components that are all uncorrelated and have unit variance.
$$\frac{\partial Y_{t+s}}{\partial w_t'} = M_s = \Psi_s Q^{-1}$$
Orthogonalized impulse-response function.
Problem: Q is not unique.
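One common choice of Q is based on the Cholesky factorization $\Sigma = P P'$, taking $Q = P^{-1}$ so that $Q \Sigma Q' = I$; this is exactly why the non-uniqueness matters, since any other factorization is an equally valid Q. A minimal numpy sketch for the VAR(1) example, with made-up values:

```python
import numpy as np

Phi1 = np.array([[0.5, 0.2], [0.1, 0.6]])    # hypothetical stable VAR(1) coefficients
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])   # innovation covariance, sigma_12 != 0

P = np.linalg.cholesky(Sigma)                # Sigma = P P'
horizon = 10
# M_s = Psi_s Q^{-1} = Phi1^s P  (for a VAR(1), Psi_s = Phi1^s)
M = [np.linalg.matrix_power(Phi1, s) @ P for s in range(horizon)]
# M[s][i, j]: response of variable i at horizon s to a one-s.d. orthogonalized shock j
```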
Variance decomposition
Contribution of the j-th orthogonalized innovation to the MSE of the s-period-ahead forecast:
$$MSE(\hat Y_t(s)) = E\left[(Y_{t+s} - \hat Y_t(s))(Y_{t+s} - \hat Y_t(s))'\right]$$
$$e_t(s) = Y_{t+s} - \hat Y_t(s) = a_{t+s} + \Psi_1 a_{t+s-1} + \dots + \Psi_{s-1} a_{t+1}$$
$$E[e_t(s) e_t(s)'] = \Sigma_a + \Psi_1 \Sigma_a \Psi_1' + \dots + \Psi_{s-1} \Sigma_a \Psi_{s-1}'$$
Inserting $\Sigma_a = Q^{-1} Q \Sigma_a Q' Q^{-1\prime} = Q^{-1} Q^{-1\prime}$:
$$MSE(s) = Q^{-1} Q^{-1\prime} + \Psi_1 Q^{-1} Q^{-1\prime} \Psi_1' + \dots + \Psi_{s-1} Q^{-1} Q^{-1\prime} \Psi_{s-1}'$$
$$= M_0 M_0' + M_1 M_1' + \dots + M_{s-1} M_{s-1}'$$
(recall that $M_i = \Psi_i Q^{-1}$ and $M_0 = Q^{-1}$, $\Psi_0 = I$). From this expression one can read off the contribution of the first orthogonalized innovation to the MSE (do it for a two-variable VAR model).
Example: Variance decomposition in a two-variable (y, x) VAR
• The s-step-ahead forecast error for variable y is:
$$y_{t+s} - E_t y_{t+s} = M_0(1,1)\varepsilon_{y,t+s} + M_1(1,1)\varepsilon_{y,t+s-1} + \dots + M_{s-1}(1,1)\varepsilon_{y,t+1} + M_0(1,2)\varepsilon_{x,t+s} + M_1(1,2)\varepsilon_{x,t+s-1} + \dots + M_{s-1}(1,2)\varepsilon_{x,t+1}$$
• Denote the variance of the s-step-ahead forecast error of $y_{t+s}$ by $\sigma_y(s)^2$:
$$\sigma_y(s)^2 = \sigma_y^2\left[M_0(1,1)^2 + M_1(1,1)^2 + \dots + M_{s-1}(1,1)^2\right] + \sigma_x^2\left[M_0(1,2)^2 + M_1(1,2)^2 + \dots + M_{s-1}(1,2)^2\right]$$
• The forecast error variance decompositions are proportions of $\sigma_y(s)^2$:
$$\text{due to shocks to } y: \quad \sigma_y^2\left[M_0(1,1)^2 + M_1(1,1)^2 + \dots + M_{s-1}(1,1)^2\right] / \sigma_y(s)^2$$
$$\text{due to shocks to } x: \quad \sigma_x^2\left[M_0(1,2)^2 + M_1(1,2)^2 + \dots + M_{s-1}(1,2)^2\right] / \sigma_y(s)^2$$
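Continuing the hypothetical numpy example above (the orthogonalized shocks $w_t$ have unit variance, so the $\sigma^2$ factors are absorbed into the $M_i$), the decomposition for the first variable at horizon s is:

```python
import numpy as np

def fevd_first_var(M, s):
    """Proportions of the s-step forecast error variance of variable 0
    due to each orthogonalized shock; M is the list of M_i from above."""
    contrib = np.array([sum(M[i][0, j] ** 2 for i in range(s))
                        for j in range(M[0].shape[1])])
    return contrib / contrib.sum()

# e.g. fevd_first_var(M, 5) -> array([share due to w_1, share due to w_2]), sums to 1
```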
Identification in a Standard VAR(1)
• Remember that we started with a structural VAR model and jumped into the reduced form or standard VAR for estimation purposes.
• Is it possible to recover the parameters of the structural VAR from the estimated parameters of the standard VAR? No!!
• There are 10 parameters in the bivariate structural VAR(1) and only 9 estimated parameters in the standard VAR(1).
• The VAR is underidentified.
• If one parameter in the structural VAR is restricted, the standard VAR is exactly identified.
• Sims (1980) suggests a recursive system to identify the model, letting $b_{21} = 0$:
$$\begin{pmatrix} 1 & b_{12} \\ 0 & 1 \end{pmatrix}\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} b_{10} \\ b_{20} \end{pmatrix} + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix}\begin{pmatrix} y_{t-1} \\ x_{t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{xt} \end{pmatrix}$$
Identification in a Standard VAR(1) (cont.)
• $b_{21} = 0$ implies
$$\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} 1 & -b_{12} \\ 0 & 1 \end{pmatrix}\begin{pmatrix} b_{10} \\ b_{20} \end{pmatrix} + \begin{pmatrix} 1 & -b_{12} \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix}\begin{pmatrix} y_{t-1} \\ x_{t-1} \end{pmatrix} + \begin{pmatrix} 1 & -b_{12} \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{xt} \end{pmatrix}$$
$$\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} \phi_{10} \\ \phi_{20} \end{pmatrix} + \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix}\begin{pmatrix} y_{t-1} \\ x_{t-1} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}$$
• The parameters of the structural VAR can now be identified from the following 9 equations:
$$\phi_{10} = b_{10} - b_{12} b_{20} \qquad \phi_{20} = b_{20} \qquad \operatorname{var}(e_1) = \sigma_y^2 + b_{12}^2 \sigma_x^2$$
$$\phi_{11} = \gamma_{11} - b_{12}\gamma_{21} \qquad \phi_{21} = \gamma_{21} \qquad \operatorname{var}(e_2) = \sigma_x^2$$
$$\phi_{12} = \gamma_{12} - b_{12}\gamma_{22} \qquad \phi_{22} = \gamma_{22} \qquad \operatorname{cov}(e_1, e_2) = -b_{12}\sigma_x^2$$
Identification in a Standard VAR(1) (cont.)
• Note that both structural shocks can now be identified from the residuals of the standard VAR.
• $b_{21} = 0$ implies that y does not have a contemporaneous effect on x.
• This restriction manifests itself in that both $\varepsilon_{yt}$ and $\varepsilon_{xt}$ affect y contemporaneously, but only $\varepsilon_{xt}$ affects x contemporaneously.
• The residuals $e_{2t}$ are due to pure shocks to x.
• Decomposing the residuals of the standard VAR in this triangular fashion is called the Choleski decomposition.
• There are other methods used to identify models, like the Blanchard and Quah (1989) decomposition (it will be covered on the blackboard).
Criticisms of VAR
• A VAR model can be a good forecasting model, but in a sense it is an atheoretical model (as all reduced-form models are).
• To calculate the IRF, the ordering of the variables matters: remember that Q is not unique.
• Sensitive to the lag selection.
• Dimensionality problem.
• THINK of TWO MORE weak points of VAR modelling.