Methods of Economic Investigation: Lent Term: First Half


Methods of Economic Investigation: Lent Term
Radha Iyengar
Office Hour: Monday 15.30-16.30
Office: R425
Administrative Details

3 lectures per week for the first 6 weeks, all at 10am:
  Monday, 10-11
  Tuesday, 10-11
  Thursday, 10-11
First two lectures each week: Theory
Thursday lectures: Empirical Application
Recommended text: Johnston and DiNardo (not very technical, and explains things well)
Course Outline

How we do causal inference (2 weeks)
Various Non-Experimental Methods (3 weeks)
  Difference-in-Differences
  Matching
  Instrumental variables
Various Data Issues (1 week)
  Data Structure
  Experimental vs. Non-experimental Methods
  Measurement Error
  Selection Bias
  Censoring
Time series (4 weeks)
Why Suffer through Econometrics?

To predict the future (well, sort of)

To answer hard questions on the effect of
X on Y

To understand what all those wacky
economists are talking about
Econometrics is a tool for useful thinking

We’re going to use econometrics for 2 things:
  Causal Effects
  Forecasting
Causal effects are answers to ‘what if’ questions:
  What would happen to driving if gas taxes were raised?
Forecasting – want the best currently available predictors: don’t worry about what causes what
Real-life Uses

Class exercises will contain practical work
with real data

Number of purposes:
  Makes concepts less abstract, easier to understand
  Gives real-world skills
  Gives insight into the difficulty of empirical work
Regression Re-cap

In our standard OLS model we estimate something like
$Y_i = X_i'\beta + \varepsilon_i$

To estimate we need a condition like: E(X_i ε_i) = 0 (X uncorrelated with the error)

So generally, we’re interested in the relationship between our X of interest and y, holding other stuff constant:
$\frac{\partial E(y \mid X)}{\partial X_k} = \beta_k$
OLS Estimation
If E(y|X)=Xβ, the OLS estimate is an unbiased estimate of β
Proof: Can write OLS estimator as:
$\hat{\beta}_{OLS} = (X'X)^{-1}X'y$
If X is fixed we have that:
$E(\hat{\beta}_{OLS} \mid X) = (X'X)^{-1}X'E(y \mid X) = (X'X)^{-1}X'X\beta = \beta$
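The unbiasedness argument can also be checked numerically. The following is a minimal sketch, not part of the original slides: it holds one regressor fixed, redraws the errors many times, and averages the OLS estimates. The function names, the one-regressor no-intercept setup, and all parameter values (beta = 2, n = 50, 2000 replications) are illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """Scalar case of (X'X)^{-1} X'y for the no-intercept model y = x*beta + e."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_unbiasedness(beta=2.0, n=50, reps=2000, seed=0):
    """Average the OLS estimate over many error draws with X held fixed."""
    rng = random.Random(seed)
    # X is drawn once and then treated as fixed, as in the proof.
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    estimates = []
    for _ in range(reps):
        # E(y|X) = x*beta holds by construction; only the errors change.
        y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
        estimates.append(ols_slope(x, y))
    return sum(estimates) / reps


mean_estimate = simulate_unbiasedness()
print(f"average OLS estimate: {mean_estimate:.3f} (true beta = 2.0)")
```

Any single estimate can be far from β in a sample of 50; it is the average across replications that should sit close to the true value.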
What do Regression Estimates tell us?
Regressions tell us about correlations, but ‘correlation is not causation’
Example: Regression of police on crime
As crime increases, police levels increase
Do police cause crime?

Police Levels and Crime rates
Levitt (1997) American Economic Review
Problems in Estimating Causal Effects

Reverse Causality

Omitted Variables

Measurement Error

Sample selection
Omitted Variables (should be familiar)



Suppose we want to estimate E(y│X,W), assumed to be linear in (X,W), so that E(y│X,W)=Xβ+Wγ, or:
y=Xβ+Wγ+ε
But you estimate
y=Xβ+u
i.e. E(y│X). Will have:
$\hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'W\gamma + (X'X)^{-1}X'\varepsilon$
$\operatorname{plim} \hat{\beta} = \beta + \Sigma_{XX}^{-1}\Sigma_{XW}\gamma$
Form of Omitted Variables Bias

Where there is only one variable:
$\operatorname{plim} \hat{\beta} = \beta + \gamma \frac{\operatorname{Cov}(W, X)}{\operatorname{Var}(X)}$

Extent of omitted variables bias related to:
  size of correlation between X and W
  strength of relationship between y and W
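The one-variable bias formula can be illustrated with a small simulation. This is a sketch under assumed values, not something from the slides: W is generated as delta*X plus noise so that Cov(W,X)/Var(X) is about delta, and the names `slope`, `simulate_ovb`, and the parameters beta = 1, gamma = 2, delta = 0.5 are all hypothetical.

```python
import random


def slope(x, y):
    """Simple-regression slope with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def simulate_ovb(beta=1.0, gamma=2.0, delta=0.5, n=20000, seed=1):
    """Regress y on X alone when the true model also contains W."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # W is correlated with X: Cov(W, X) / Var(X) is approximately delta.
    w = [delta * xi + rng.gauss(0.0, 1.0) for xi in x]
    y = [beta * xi + gamma * wi + rng.gauss(0.0, 1.0) for xi, wi in zip(x, w)]
    short_regression = slope(x, y)              # W omitted
    formula = beta + gamma * slope(x, w)        # beta + gamma * Cov(W,X)/Var(X)
    return short_regression, formula


short_regression, formula = simulate_ovb()
print(f"estimate omitting W: {short_regression:.3f}")
print(f"bias formula:        {formula:.3f}")
```

With these assumed values the estimate settles near 2 rather than the true β of 1. Setting either gamma or delta to zero removes the bias, which matches the two factors listed above.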
Reverse Causality/ Endogeneity
Idea is that correlation between y and X may be because it is y that causes X, not the other way round
Interested in causal model:
y=Xβ+ε
But also causal relationship in other direction:
X=αy+u

Endogeneity (II)
Reduced form is:
X=(u+αε)/(1-αβ)
X correlated with ε – know this leads to bias in OLS estimates
In hospital example being sick causes you to go to hospital – not clear what good solution is.
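To see how large the bias from reverse causality can be, one can simulate the two equations directly. This is a minimal sketch, assuming ε and u are independent standard normal draws and using hypothetical values β = -0.5 and α = 0.8; the function names are invented for illustration.

```python
import random


def slope(x, y):
    """Simple-regression slope with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def simulate_simultaneity(beta=-0.5, alpha=0.8, n=20000, seed=2):
    """Generate (X, y) from y = X*beta + eps and X = alpha*y + u, then run OLS."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        x = (u + alpha * eps) / (1.0 - alpha * beta)  # reduced form for X
        xs.append(x)
        ys.append(beta * x + eps)
    return slope(xs, ys)


def plim_ols(beta, alpha, var_eps=1.0, var_u=1.0):
    """Large-sample OLS slope: beta + Cov(X, eps) / Var(X), eps and u independent."""
    cov_x_eps = alpha * var_eps / (1.0 - alpha * beta)
    var_x = (var_u + alpha ** 2 * var_eps) / (1.0 - alpha * beta) ** 2
    return beta + cov_x_eps / var_x


ols_estimate = simulate_simultaneity()
print(f"OLS estimate: {ols_estimate:.3f}, large-sample value: {plim_ols(-0.5, 0.8):.3f}")
```

Under these assumed numbers the OLS slope is positive even though the true β is negative, so reverse causality can flip the sign of an estimate, not only distort its size.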

Measurement Error
Most (all?) of our data are measured with error.
Suppose causal model is:
y=X*β+ε
But only observe X which is X* plus some error:
X=X*+u
Classical measurement error:
E(u│X*)=0

Implications of Measurement Error
Can write causal relationship as:
y=Xβ-uβ+ε
Note that X is correlated with the composite error (-uβ+ε)
Should know this leads to bias/inconsistency in OLS estimator
Can make some useful predictions about nature of bias – later on in course
Want E(y│X*) but can only estimate E(y│X)
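As a preview of the kind of prediction that can be made, here is a small simulation of classical measurement error. It is a sketch under assumptions that are not in the slides: X*, u and ε are independent standard normal draws and β = 1. In that special case the standard result is that the OLS slope on the mismeasured X shrinks towards zero by the factor Var(X*)/(Var(X*)+Var(u)), which is 0.5 here.

```python
import random


def slope(x, y):
    """Simple-regression slope with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def simulate_measurement_error(beta=1.0, sd_u=1.0, n=20000, seed=3):
    """Compare OLS using the true X* with OLS using the noisy X = X* + u."""
    rng = random.Random(seed)
    x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xs + rng.gauss(0.0, 1.0) for xs in x_star]
    # Classical error: u is drawn independently of X* and of eps.
    x_obs = [xs + rng.gauss(0.0, sd_u) for xs in x_star]
    return slope(x_star, y), slope(x_obs, y)


slope_true_x, slope_noisy_x = simulate_measurement_error()
print(f"using X* (unobservable in practice): {slope_true_x:.3f}")
print(f"using X = X* + u:                    {slope_noisy_x:.3f}")
```

Raising `sd_u` makes the noisy-X estimate shrink further, while setting it to zero recovers β.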
Sample Selection

One explanation is sample selection
  Only have earnings data on women who work
  Women with small children who work tend to have high earnings (e.g. to pay for childcare)
  Employment rate of mothers with babies is 28%, of those with 5-year-olds 50%
Causal model for everyone:
y=Xβ+ε
But only observe y if work, W=1, so estimate E(y|X,W=1) not E(y|X)
Sample selection bias if W correlated with ε – this is likely
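The selection mechanism can be made concrete with a stylised simulation. Everything in it is a hypothetical sketch rather than the course's own model: X is coded 1 for mothers of babies and 0 otherwise, the true β is set to 0, and a woman works only if her earnings shock ε exceeds a threshold. The thresholds (0.583 and 0) are chosen here so that employment rates come out near the 28% and 50% quoted above.

```python
import random


def slope(x, y):
    """Simple-regression slope with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def simulate_selection(n=40000, seed=4):
    """True effect of X on y is zero; compare everyone with workers only."""
    rng = random.Random(seed)
    x_all, y_all, x_work, y_work = [], [], [], []
    for _ in range(n):
        x = 1.0 if rng.random() < 0.5 else 0.0  # 1 = baby (hypothetical coding)
        eps = rng.gauss(0.0, 1.0)
        y = 0.0 * x + eps                        # causal model with beta = 0
        threshold = 0.583 if x == 1.0 else 0.0   # assumed work decision rule
        x_all.append(x)
        y_all.append(y)
        if eps > threshold:                      # W = 1: earnings are observed
            x_work.append(x)
            y_work.append(y)
    rate_baby = sum(x_work) / sum(x_all)
    rate_other = (len(x_work) - sum(x_work)) / (len(x_all) - sum(x_all))
    return slope(x_all, y_all), slope(x_work, y_work), rate_baby, rate_other


full_slope, worker_slope, rate_baby, rate_other = simulate_selection()
print(f"employment rates: baby {rate_baby:.2f}, other {rate_other:.2f}")
print(f"slope, everyone:     {full_slope:.3f}")
print(f"slope, workers only: {worker_slope:.3f}")
```

Among workers, mothers of babies appear to earn more even though X has no causal effect on y in this setup, because W depends on ε and does so differently for different values of X.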
Common Features of Problems
All problems have an expression in everyday language – omitted variables, reverse causality etc.
All have an econometric form – the same one
A correlation of X with the ‘error’

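The shared econometric form can be written in one line. As a sketch for the single-regressor case (a simplification assumed here), let $e$ stand for whatever ends up in the error term of the regression actually run:

```latex
\operatorname{plim}\hat{\beta}_{OLS}
  = \frac{\operatorname{Cov}(X, y)}{\operatorname{Var}(X)}
  = \beta + \frac{\operatorname{Cov}(X, e)}{\operatorname{Var}(X)}
```

With omitted variables $e = W\gamma + \varepsilon$, with measurement error $e = -u\beta + \varepsilon$, and with reverse causality $e = \varepsilon$ but X itself depends on ε. In each case $\operatorname{Cov}(X, e) \neq 0$, so OLS does not recover β. Sample selection works analogously within the selected sample.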
What can we do?

More sophisticated econometric methods
than OLS e.g. IV

Better data – Griliches:

“since it is the ‘badness’ of the data that
provides us with our living, perhaps it is not at
all surprising that we have shown little interest
in improving it”
But Recent Trends

Much more emphasis on good quality data and research design than ‘statistical fixes’ – the ‘credibility revolution’
  Field Experiments
  Natural Experiments
  Instrumental Variables
Will illustrate this in course through wide-ranging examples
Issues to keep in Mind -1
Internal and External Validity

Estimates have internal validity if
conclusions valid for population being
studied

Estimates have external validity if conclusions valid for other populations
e.g. can we generalise the impact of class size reduction in Tennessee in the late 1980s to class size reduction in the UK in 2005? Nothing in the data will help with this
Issues to Keep in Mind –2
Where’s the Bias

No identification strategy is going to be perfect. We want to do the best we can and then build credibility:
  What is the worst case scenario for this estimation?
  If our instrument/natural experiment is biased, what is generating that bias?
  What direction will our estimates be biased in?
Think of this as a bounding exercise: if we’re wrong, can we use what we know and our estimates to get a sense of where the truth lies?
Next Steps…

Start thinking about what we can do with data
Next class: Data structures
  How does our data affect what techniques we can use?
  What are the most common types of data for different types of questions?