Univariate Time Series
Methods of Economic
Investigation
Lecture 18
Last Time

Maximum Likelihood Estimators
 Likelihood function: useful for estimating nonlinear functions
 Various test statistics
  LR has nice properties, and W/LM converge to LR in large samples
  Provides a clear way to test between models
Today’s Class

Why might you need to test between models?
 Error term or dependent variable doing something unconventional
 Correlation across observations
 Usually this occurs in “time series” data
Data Structures

Remember back to the beginning of the term: two types of data:
 Uni-dimensional
  Across individuals
  Across time
 Multi-dimensional
  Panel data (repeated observations on the same individuals over time)
  Repeated cross-section (observations in various time periods on different individuals)
Time Series Process

A time series is a set of repeated observations of the same variable.
 Good examples are GNP or stock returns.
 Define {x1, x2, . . . , xT} = {xt} for the time periods t = 1, 2, . . . , T.
 Each xt is a random variable.
 Under the usual conditions we could estimate yt = xtβ + εt, with E(εt | xt) = 0, and OLS provides a consistent estimate of β.
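As a minimal sketch of that last point (the simulated data and parameter values below are my own, not from the lecture), the no-intercept OLS estimate of β can be computed directly:

```python
import numpy as np

# Hypothetical illustration: simulate x_t and y_t = x_t*beta + eps_t,
# then recover beta by no-intercept OLS. All values here are made up.
rng = np.random.default_rng(0)
T, beta = 500, 0.8
x = rng.normal(size=T)
eps = rng.normal(size=T)                    # E(eps_t | x_t) = 0 by construction
y = beta * x + eps

beta_hat = np.sum(x * y) / np.sum(x ** 2)   # OLS slope for y_t = x_t*beta + eps_t
print(beta_hat)                             # close to 0.8 in large samples
```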
Data Generating Processes

We saw this already with maximum likelihood
 Think of the true data generating process (DGP) as generating different streams of potential data sets
 We only observe one of these data sets
 Key insight: we want to understand how our data relate to the true data generating process, so that we can ensure our estimation doesn’t omit key variables in the DGP
Strict (or Strong) Stationarity

A process {xt} is strictly (or strongly) stationary if the joint probability distribution function of {xt−s,…,xt,…, xt+s} is independent of t for all s.

 Notice that strict stationarity requires that the joint distribution is the same regardless of the starting/ending point of your series. This says that the data generating process is constant over time.
Weak Stationarity
A process xt is weakly stationary or covariance stationary if E(xt) and E(xt²) are finite and E(xt xt−j) depends only on j and not on t.
 Weak stationarity is a different concept. It requires finite and well-defined first and second moments, with the covariances determined only by the distance between two observations, not the absolute time at which those observations occur.

Comparing types of stationarity

Strict stationarity does not imply weak stationarity
 Example: the Cauchy distribution has no moments. A process drawn iid from a Cauchy distribution is strongly stationary but not weakly stationary.
 If, however, the data generating process is constant over time with finite first and second moments, then strong stationarity will imply weak stationarity.
Weak stationarity does not imply strong stationarity
 Most distributions are characterized by more parameters than the mean and variance, so weak stationarity allows those other moments to depend on t.
 The special case is the normal distribution, which is fully characterized by its mean and variance, so weak stationarity plus normality will imply strong stationarity.
Why we like stationarity

Stationarity is a useful concept when thinking about estimation because it suggests that something is fixed across the sequence of random variables. This means that if we observe a process for long enough, we can learn something about the underlying data generating process.
Ergodicity

This is actually not sufficient though for
our purposes. To ensure that this “long
enough” is finite, we add the additional
condition of ergodicity. Ergodicity is a
condition that restricts the memory of the
process. It can be defined in a variety of
ways. A loose definition of ergodicity is
that the process is asymptotically
independent. That is, for sufficiently large
T, xt and xt+n are nearly independent.
Lag operator

The lag operator, L, takes one whole time series {xt} and produces another; the second time series is the same as the first, but moved backwards in date: Lxt = xt−1.
 L2xt = LLxt = Lxt−1 = xt−2
 Ljxt = xt−j ; L−jxt = xt+j
 It will also be useful to note that we can define lag polynomials, so that
 a(L)xt = (a0L0 + a1L1 + a2L2)xt = a0xt + a1xt−1 + a2xt−2
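A quick sketch of the lag operator and a lag polynomial in pandas; the series values and the coefficients a0, a1, a2 are arbitrary choices for illustration:

```python
import pandas as pd

# Arbitrary illustrative series.
x = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])

# x.shift(1) puts x_{t-1} at date t, i.e. it applies L (NaN where undefined).
Lx = x.shift(1)     # L x_t   = x_{t-1}
L2x = x.shift(2)    # L^2 x_t = x_{t-2}

# Lag polynomial a(L)x_t = a0*x_t + a1*x_{t-1} + a2*x_{t-2}, with made-up coefficients.
a0, a1, a2 = 1.0, 0.5, 0.25
aLx = a0 * x + a1 * Lx + a2 * L2x
print(aLx)
```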
Convergence to True Moments

For a stationary, ergodic process, that is, one with finite persistence, we can compute our standard sample moments, and as the sample size goes to infinity these sample moments will converge in the usual way to the true moments.

 This is typically called the “ergodic theorem”
White Noise
A white noise process is just like our random error terms:
 a random variable εt ~ i.i.d. N(0, σ²)
 E(εt) = E(εt | all information at t − 1) = 0
 This simply means that seeing all the past ε’s will not help predict what the current ε will be.
 If all processes were like ε we wouldn’t need to worry about persistence or serial correlation, but in practice, of course, processes like ε are rare.
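A minimal simulation of white noise (the sample size and σ are arbitrary), checking that the mean and the lag-1 sample autocorrelation are both near zero:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
eps = rng.normal(0.0, sigma, size=10_000)    # eps_t ~ i.i.d. N(0, sigma^2)

# For white noise, past draws carry no information about the current one,
# so the lag-1 sample autocorrelation should be close to zero.
rho1 = np.corrcoef(eps[1:], eps[:-1])[0, 1]
print(eps.mean(), rho1)
```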
Moving Average or MA(q) Processes
Moving average processes are ones in which the dependent variable is a function of current and past realizations of the error
 We can write this in either polynomial or lag notation as:
 xt = εt + θ1εt-1 + … + θqεt-q  OR
 xt = (1 + θ1L + . . . + θqLq)εt  OR  xt = b(L)εt
 Ex MA(1): xt = εt + θεt-1  OR  xt = (1 + θL)εt
 Sometimes we define infinite moving averages, which we denote MA(∞): xt = Σ_{j=0}^∞ θj εt−j
Absolutely Summable

For an MA(∞) process to be well defined we require that its parameters θj are absolutely summable, or: Σ_{j=0}^∞ |θj| < ∞
 A necessary condition for the θj’s to be absolutely summable is that lim_{j→∞} θj = 0, which is like the ergodicity condition
 For technical reasons we will actually require square summability, which is that Σ_{j=0}^∞ θj² < ∞
AR(p)
Autoregressive processes are ones in which the dependent variable is a function of previous periods’ realizations.
 For example, GDP today is a function of past GDP levels.
 We can write this in either polynomial or lag-operator notation as:
 xt = φ1xt-1 + φ2xt-2 + … + φpxt-p + εt
 (1 − φ1L − φ2L2 − . . . − φpLp)xt = εt
 a(L)xt = εt
 Ex AR(1): xt = φxt-1 + εt  OR  (1 − φL)xt = εt
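A sketch simulating an AR(1) process xt = φxt−1 + εt by recursion; φ, the zero starting value, and the sample size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
T, phi, sigma = 1000, 0.7, 1.0

eps = rng.normal(0.0, sigma, size=T)
x = np.zeros(T)                        # start x_0 at 0 for simplicity
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]     # AR(1) recursion: x_t = phi * x_{t-1} + eps_t

# For a stationary AR(1), the lag-1 sample autocorrelation should be near phi.
print(np.corrcoef(x[1:], x[:-1])[0, 1])
```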
Invertibility
We typically require that AR(p) processes be invertible. To understand this, let’s return to our lag polynomials.
 Consider a specific example:
 xt = φ1xt−1 + φ2xt−2 + εt or (1 − φ1L − φ2L2)xt = εt
 Factor the quadratic lag polynomial with λ1 and λ2 such that
 (1 − φ1L − φ2L2) = (1 − λ1L)(1 − λ2L)
 This implies that λ1λ2 = −φ2 and λ1 + λ2 = φ1.
Invertibility - 2

Now we need to invert:
 (1 − λ1L)(1 − λ2L)xt = εt
 we get: xt = (1 − λ1L)−1(1 − λ2L)−1εt
 or: xt = (Σ_{j=0}^∞ λ1^j L^j)(Σ_{j=0}^∞ λ2^j L^j)εt
 For our lag polynomial to be invertible we require |λ1| < 1 and |λ2| < 1
 To see why, notice that if some |λi| ≥ 1, the geometric expansion of (1 − λiL)−1 does not converge, so xt = (1 − λ1L)−1(1 − λ2L)−1εt would not be well defined.
 We call a process invertible if the λ’s are less than 1 in absolute value (equivalently, the roots of the lag polynomial lie outside the unit circle). This is convenient because it will also imply stationarity.
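One way to check this numerically, under hypothetical values of φ1 and φ2: since λ1 + λ2 = φ1 and λ1λ2 = −φ2, the λ’s are the roots of z² − φ1z − φ2 = 0, which numpy can solve directly.

```python
import numpy as np

# Hypothetical AR(2) coefficients.
phi1, phi2 = 1.1, -0.3

# lambda_1 and lambda_2 solve z**2 - phi1*z - phi2 = 0
# (they satisfy lambda1 + lambda2 = phi1 and lambda1*lambda2 = -phi2).
lam = np.roots([1.0, -phi1, -phi2])
print(lam, np.all(np.abs(lam) < 1))   # True here, so this AR(2) satisfies the condition
```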
Autocovariance Generating Functions
The autocovariances of a (zero-mean) series xt are defined as γj = cov(xt, xt−j) = E(xt xt−j); the autocovariance generating function (AGF) collects them as g(z) = Σ_j γj z^j.
 This mapping from lags to covariances relies on the fact that the covariance depends on the time in between two x’s and not on the absolute date t.
 This should sound familiar: it is our stationarity property, and invertibility is a sufficient condition for a stationary series.

Example: MA(q)

MA(q) process: xt = (1 + θ1L + . . . + θqLq)εt, with Var(εt) = σ² for all t and θ0 = 1.
Then:
 E(xt) = E[θ0εt + θ1εt-1 + … + θqεt-q]
 = θ0E(εt) + θ1E(εt-1) + … + θqE(εt-q) = 0
 In general of course, this could equal some mean μ, which we have for ease set to zero.
 Var(xt) = E[(xt − μ)²] = E[(θ0εt + θ1εt-1 + … + θqεt-q)²]
 = (1 + θ1² + … + θq²)σ²
 γ1 = Cov(xt, xt-1) = E(xt xt-1)
 = E[(θ0εt + … + θqεt-q)(θ0εt-1 + θ1εt-2 + … + θqεt-q-1)]
 = (θ1θ0 + θ2θ1 + … + θqθq-1)σ²

Next Time

Why ARMA processes are useful
 Wold Decomposition
Estimating Stationary Univariate Time Series
 GLS
 Model Selection