01_Identification

Transcript 01_Identification

Econometrics
Session 1 – Introduction
Amine Ouazad,
Asst. Prof. of Economics
PRELIMINARIES
Introduction
• Who I am
• Arbitrage
• Textbook
• Grading
• Homework
• Implementation
Session 1
• The two econometric problems
• Randomization as the Golden Benchmark
Outline of the Course
Who I am
• Applied empirical economist.
• Work on urban economics, economics of
education, applied econometrics in accounting.
• Emphasis on the identification of causal effects.
• Careful empirical work: clean data work, correct
identification of causal effects.
• Large datasets:
– +100 million observations, administrative datasets,
geographic information software.
• Implementation of econometric procedures in
Stata/Mata.
Trade-offs
• Classroom is heterogeneous.
– In tastes, mathematics level, needs, prior
knowledge.
• Different fields have different habits.
– E.g. “endogeneity” is not an issue/the same
issue in OB, Finance, Strategy, or TOM.
• Conclusion:
– Course provides a particular spin on
econometrics, with mathematics when needed
and applications.
• This is a difficult course, even for students
with a prior course in econometrics.
Textbooks
• *William H. Greene, Econometric Analysis, 6th
edition.
• Jeffrey Wooldridge, Econometric Analysis of Cross
Section and Panel Data.
• Joshua Angrist and Jörn-Steffen Pischke,
Mostly Harmless Econometrics.
• Microeconometrics Using Stata, Cameron
and Trivedi.
Prerequisites
• I assume you know:
– Statistics
• Random variables.
• Moments of random variables (mean, variance,
kurtosis, skewness).
• Probabilities.
– Real analysis
• Integral of functions, derivatives.
• Convergence of a function at x or at infinity.
– Matrix algebra
• Inverse, multiplication, projections.
Grading
• Exam:
60%
• Participation:
10%
• Homework:
30%
– One problem set in-between Econometrics A
and B.
Implementation
• STATA version 12.
– License for PhD students. Ask IT. 5555 or Alina
Jacquet.
– Interactive mode, Do files, Mata programming.
– Compulsory for this course.
• MATLAB, not for everybody.
– Coding econometric procedures yourself, e.g.
GMM.
Outline for Session 1
Introduction
1. Correlation and Causation
2. The Two Econometric Problems
3. Treatment Effects
1. CORRELATION AND CAUSATION
1. The perils of confounding
correlation and causation
• How can we boost children’s reading scores?
– Shoe size is correlated with IQ.
• Women earn less than men.
– Sign of discrimination?
• Health is negatively correlated with the
number of days spent in hospital.
– Do hospitals kill patients?
Potential outcomes framework
• A.k.a. the “Rubin causal model”.
• Outcome with the treatment Y(1), outcome
without the treatment Y(0).
• Treatment status D=0,1.
• FUNDAMENTAL PROBLEM OF
ECONOMETRICS: Either Y(1) or Y(0) is
observed, or, equivalently, Y = Y(1) D + Y(0) (1 – D) is observed.
• What would have happened if a given
subject had received a different treatment?
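A minimal sketch of the framework above, using invented numbers: every subject carries both potential outcomes, but the data only ever reveal one of them. The outcome distribution and the constant effect of 3 are assumptions made purely for illustration.

```python
import random

random.seed(0)

# Hypothetical data: each subject has two potential outcomes, Y(0) and Y(1).
subjects = []
for _ in range(5):
    y0 = random.gauss(10, 2)      # outcome without the treatment
    y1 = y0 + 3                   # outcome with the treatment (assumed effect of 3)
    d = random.randint(0, 1)      # treatment status D = 0 or 1
    y = y1 * d + y0 * (1 - d)     # observed outcome: Y = Y(1) D + Y(0) (1 - D)
    subjects.append((y0, y1, d, y))

for y0, y1, d, y in subjects:
    # The counterfactual is the potential outcome we never get to see.
    counterfactual = y0 if d == 1 else y1
    print(f"D={d}  observed Y={y:.2f}  unobserved counterfactual={counterfactual:.2f}")
```

In real data the last column does not exist, which is exactly why the treatment effect Y(1) – Y(0) cannot be computed subject by subject.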
Naïve estimator of
the treatment effect
• Δ = E(Y|D=1) – E(Y|D=0).
• Does that identify any relevant parameter?
• Notice that:
– Δ = E(Y|D=1) – E(Y|D=0)
= E(Y(1)|D=1) – E(Y(0)|D=0)
• What are we looking for?
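One way to approach that question, sketched here as a standard decomposition and not necessarily the derivation used in class, is to add and subtract the unobserved counterfactual E(Y(0)|D=1):

```latex
\begin{aligned}
E(Y \mid D=1) - E(Y \mid D=0)
  &= E(Y(1) \mid D=1) - E(Y(0) \mid D=0) \\
  &= \underbrace{E(Y(1) \mid D=1) - E(Y(0) \mid D=1)}_{\text{average effect on the treated}}
   + \underbrace{E(Y(0) \mid D=1) - E(Y(0) \mid D=0)}_{\text{selection bias}}
\end{aligned}
```

The first term is a causal parameter; the second is nonzero whenever treated and untreated subjects would have differed even without the treatment.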
Ignorable Treatment (Rubin 1983)
• Assume (Y(1), Y(0)) ⊥ D (potential outcomes
independent of the treatment status).
• Then E(Y(0)|D=1)=E(Y(0)|D=0)=E(Y(0)).
• Similarly for Y(1).
• Then: E(Y|D=1) – E(Y|D=0)
= E(Y(1)) – E(Y(0)) = E(Y(1) – Y(0)),
the average treatment effect.
Another Interpretation
• Assume Y(D)=a+bD+e.
• e is the “unobservables”.
• The naïve estimator Δ = E(Y|D=1) – E(Y|D=0)
= b+E(e|D=1)-E(e|D=0).
• Selection bias: S=E(e|D=1)-E(e|D=0).
– Overestimates the effect if S>0
– Underestimates the effect if S<0.
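The decomposition above can be checked on simulated data. This is a sketch under assumed values: a = 1, b = 2, standard normal unobservables, and a hypothetical self-selection rule in which subjects with high e are more likely to take the treatment.

```python
import random
from statistics import fmean

random.seed(1)
a, b = 1.0, 2.0   # assumed intercept and true treatment effect
n = 100_000

rows = []
for _ in range(n):
    e = random.gauss(0, 1)                       # unobservables
    d = 1 if e + random.gauss(0, 1) > 0 else 0   # self-selection: high-e subjects opt in
    y = a + b * d + e                            # Y(D) = a + bD + e
    rows.append((d, e, y))

# Naive estimator: difference in mean outcomes between treated and untreated.
naive = fmean(y for d, e, y in rows if d == 1) - fmean(y for d, e, y in rows if d == 0)
# Selection bias: difference in mean unobservables between the two groups.
selection = fmean(e for d, e, y in rows if d == 1) - fmean(e for d, e, y in rows if d == 0)

print(f"true b = {b}, naive estimate = {naive:.3f}, selection bias S = {selection:.3f}")
print(f"b + S = {b + selection:.3f}")
```

Because S > 0 under this selection rule, the naive estimator overestimates b; reversing the inequality in the selection rule would flip the sign of S.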
Definitions
• Treatment Effect.
Y(1)-Y(0)
• Average Treatment Effect (ATE).
E(Y(1)-Y(0))
• Average Treatment Effect on the Treated (ATT).
E(Y(1)-Y(0)|D=1)
• Average Treatment Effect on the Untreated (ATU).
E(Y(1)-Y(0)|D=0)
Randomization
as the Golden Benchmark
• Effect of a medical treatment.
– Treatment and control group.
– Randomization of the assignment to the
treatment and to the control.
• Why randomize?
• … effect of jumping without a parachute on
the probability of death.
With ignorability…
• If the treatment is ignorable (e.g. if the
treatment has been randomly assigned to
subjects) then
– ATE = ATT = ATU
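A simulation sketch of this result, with assumed distributions: treatment effects are heterogeneous (mean 1), and two hypothetical assignment rules are compared. Under a coin flip the three parameters coincide up to sampling noise; when subjects with large gains select into treatment they do not.

```python
import random
from statistics import fmean

random.seed(2)
n = 100_000

def effects(assign):
    """Return (ATE, ATT, ATU, naive difference) for a given assignment rule."""
    data = []
    for _ in range(n):
        y0 = random.gauss(0, 1)
        gain = random.gauss(1, 1)       # heterogeneous effect Y(1) - Y(0), mean 1
        d = assign(gain)
        data.append((y0, y0 + gain, d))
    ate = fmean(y1 - y0 for y0, y1, d in data)
    att = fmean(y1 - y0 for y0, y1, d in data if d == 1)
    atu = fmean(y1 - y0 for y0, y1, d in data if d == 0)
    naive = (fmean(y1 for y0, y1, d in data if d == 1)
             - fmean(y0 for y0, y1, d in data if d == 0))
    return ate, att, atu, naive

# Random assignment: a coin flip that ignores the subject's gain.
randomized = effects(lambda gain: random.randint(0, 1))
# Self-selection: only subjects with above-average gains take the treatment.
self_selected = effects(lambda gain: 1 if gain > 1 else 0)

for label, result in (("randomized", randomized), ("self-selected", self_selected)):
    ate, att, atu, naive = result
    print(f"{label:14s} ATE={ate:.2f} ATT={att:.2f} ATU={atu:.2f} naive={naive:.2f}")
```

The potential outcomes are visible here only because the data are simulated; with real data, randomization is what licenses reading the naive difference as the ATE.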
Selection bias
• Why is there a selection bias?
– In medicine, in economics, in management?
1. Self-selection of subjects into the
treatment.
2. Correlation between unobservables and
observables, e.g. industry, gender, income.
2. THE TWO ECONOMETRIC
PROBLEMS
2. The Two Econometric Problems
• Identification and Inference
– “Studies of identification seek to characterize
the conclusions that could be drawn if one could
use the sampling process to obtain an unlimited
number of observations.”
– “Studies of statistical inference seek to
characterize the generally weaker conclusions
that can be drawn from a finite number of
observations.”
Identification vs inference
• Consider a survey of a random subset of
1,302 French individuals.
• Identification:
– Can you identify the average income in France?
• Inference:
– How close to the true average income is the
mean income in the sample?
– i.e. what is the confidence interval around the
estimate of the average income in France?
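The inference question can be illustrated with a sketch like the following. The incomes are invented (log-normal draws standing in for survey responses), and the interval relies on the usual normal approximation for a sample mean.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(3)
n = 1302
# Hypothetical incomes: log-normal draws standing in for survey responses.
sample = [random.lognormvariate(10, 0.6) for _ in range(n)]

mean_income = fmean(sample)
std_error = stdev(sample) / n ** 0.5      # standard error of the sample mean
z = NormalDist().inv_cdf(0.975)           # about 1.96 for a 95% interval
ci = (mean_income - z * std_error, mean_income + z * std_error)

print(f"sample mean = {mean_income:.0f}")
print(f"95% confidence interval = ({ci[0]:.0f}, {ci[1]:.0f})")
```

Identification says the sample mean converges to the population mean as n grows; inference quantifies how far off it may be at n = 1,302.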
Identification vs inference
• Consider a lab experiment with 9 rats,
randomly assigned to a treatment group and
a control group.
• Identification:
– Can you identify the effect of the medication on
the rats using the random assignment?
• Inference:
– With 9 rats, can you say anything about the
effectiveness of the medication?
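One possible answer, sketched with invented outcomes, is a permutation (randomization) test: under the null hypothesis of no effect, every way of choosing which rats were treated is equally likely. The split of 4 treated and 5 control rats is an assumption; with that split there are only 126 possible assignments, so the one-sided p-value can never fall below 1/126.

```python
from itertools import combinations
from statistics import fmean

# Hypothetical outcomes for 9 rats (invented for illustration only).
outcomes = [7.1, 6.4, 8.0, 6.9, 5.2, 6.0, 5.5, 6.1, 4.9]
treated = (0, 1, 2, 3)   # indices of the 4 rats that received the medication

def mean_difference(treated_idx):
    """Treated mean minus control mean for a given assignment."""
    t = [outcomes[i] for i in treated_idx]
    c = [outcomes[i] for i in range(len(outcomes)) if i not in treated_idx]
    return fmean(t) - fmean(c)

observed = mean_difference(treated)

# Under the null of no effect, every choice of 4 treated rats was equally likely.
assignments = list(combinations(range(len(outcomes)), 4))
as_extreme = sum(mean_difference(a) >= observed for a in assignments)
p_value = as_extreme / len(assignments)

print(f"{len(assignments)} possible assignments, one-sided p-value = {p_value:.4f}")
```

Random assignment settles identification even with 9 rats; the coarse set of attainable p-values shows how little the small sample leaves for inference.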
This session
• This session has focused on identification.
– i.e. I assume we have a potentially infinite
dataset.
– I focus on the conditions for the identification of
the causal effect of a variable.
• Next session: what problems appear
because we have a limited number of
observations?
LOOKING FORWARD:
OUTLINE OF THE COURSE
Outline of the course
1. Introduction: Identification
2. Introduction: Inference
3. Linear Regression
4. Identification Issues in Linear Regressions
5. Inference Issues in Linear Regressions
6. Identification in Simultaneous Equation
Models
7. Instrumental variable (IV) estimation
8. Finding IVs: Identification strategies
9. Panel data analysis
10. Bootstrap
11. Generalized Method of Moments (GMM)
12. GMM: Dynamic Panel Data estimation
13. Maximum Likelihood (ML): Introduction
14. ML: Probit and Logit
15. ML: Heckman selection models
16. ML: Truncation and censoring
+ Exercise/Review session
+ Exam
