Methods of Economic Investigation: Lent Term: First Half


Application 3: Estimating the Effect of Education on Earnings
Methods of Economic Investigation
Lecture 9
Quick Asymptotics reminder…

- In class: not really about “proving” consistency or asymptotic bias in estimates
- When appropriate, will mention bias terms that are asymptotically zero but not zero in finite samples
What should you know?

What happens to something in its probability limit:

$$
\operatorname{plim}\,(X'X)^{-1}X'y
= \operatorname{plim}\left(\frac{1}{N}X'X\right)^{-1}\operatorname{plim}\left(\frac{1}{N}X'y\right)
= \beta + \operatorname{plim}\left(\frac{1}{N}X'X\right)^{-1}\operatorname{plim}\left(\frac{1}{N}X'\varepsilon\right)
$$

That our estimates will, in the limit as N goes to infinity and under regularity conditions, converge in probability to the true parameter values (consistency)
What you do not need to know
Behind these results are various theorems:
- Laws of Large Numbers for plims
- Central Limit Theorems for asymptotic normality
- Various mathematical conditions, e.g. the continuous mapping theorem
You do not have to know which theorems you are using
You do not have to be able to prove these results with the theorems
Bottom Line…

- Understand the role of N→∞
  - The mean of the sample mean is μ
  - The variance of the sample mean goes to zero
  - A quantity scaled by the right power of N, e.g. $\sqrt{N}(\bar{X} - \mu)$, can converge in distribution
- So far we have typically relied on the concept of “bias”, but in large samples consistency is the more useful concept
  - If the bias is decreasing as the sample is increasing, then worry less about it
  - If even in large samples our estimate is not close to the true value, worry more about it
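
The role of N→∞ can be seen in a small simulation. This is only a sketch under assumed values: the normal distribution, μ = 2, σ = 3, and the function name `sample_means` are all illustrative choices, not part of the lecture. It draws many samples of size N, and reports the variance of the sample mean (which should shrink toward zero) and the variance of the √N-scaled mean (which should stay near σ² = 9).

```python
import random
import statistics


def sample_means(n, reps=500, mu=2.0, sigma=3.0, seed=0):
    """Return the means of `reps` independent samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]


for n in (10, 100, 1000):
    means = sample_means(n)
    var_of_mean = statistics.pvariance(means)
    # Var(sqrt(N) * (mean - mu)) equals N * Var(mean)
    var_scaled = n * var_of_mean
    print(
        f"N={n:5d}  mean of means={statistics.fmean(means):.3f}  "
        f"var of mean={var_of_mean:.4f}  var of sqrt(N)-scaled mean={var_scaled:.2f}"
    )
```

The unscaled mean collapses onto μ as N grows, while the scaled version keeps a stable spread, which is why the scaled quantity is the one with a useful limiting distribution.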
Today’s Lecture

- Review error component models
  - Fixed effects
  - Random effects
- Application: estimating the effects of education on earnings
  - Difficulty in causal estimation
  - Within-family estimator
  - Some limitations of fixed effects
Error has different components

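Suppose we had to estimate

$$
Y_{ij} = X_{ij}\beta + \varepsilon_{ij}, \qquad \varepsilon_{ij} = \mu_j + \eta_{ij}
$$

where $\mu_j$ is an unobserved component common to group j and $\eta_{ij}$ is an individual-level error.

- If the unobserved factors are uncorrelated with the X’s: can do OLS with robust standard errors, or FGLS
- If the unobserved factors are correlated with the X’s: can include a group-specific fixed effect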
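
The second case can be illustrated with simulated data. The sketch below is an assumption-laden toy, not the lecture's data: one regressor, a true slope of 1, two observations per group, unit variances, and helper names (`slope`, `demean_by_group`, `simulate`) invented for the example. The regressor is built to load on the group component, so pooled OLS should be biased upward (toward roughly 1.5 here), while removing group means should recover a slope near 1.

```python
import random


def slope(x, y):
    """Bivariate OLS slope: sample Cov(x, y) / Var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(groups, values):
    """Subtract each group's mean from its members (the within transformation)."""
    totals = {}
    for g, v in zip(groups, values):
        s, c = totals.get(g, (0.0, 0))
        totals[g] = (s + v, c + 1)
    return [v - totals[g][0] / totals[g][1] for g, v in zip(groups, values)]


def simulate(n_groups=2000, per_group=2, beta=1.0, seed=1):
    """Y = beta*X + group effect + noise, where X is correlated with the group effect."""
    rng = random.Random(seed)
    groups, x, y = [], [], []
    for j in range(n_groups):
        group_effect = rng.gauss(0, 1)
        for _ in range(per_group):
            x_ij = group_effect + rng.gauss(0, 1)  # X loads on the group effect
            y_ij = beta * x_ij + group_effect + rng.gauss(0, 1)
            groups.append(j)
            x.append(x_ij)
            y.append(y_ij)
    return groups, x, y


groups, x, y = simulate()
pooled = slope(x, y)
within = slope(demean_by_group(groups, x), demean_by_group(groups, y))
print(f"true beta = 1.0, pooled OLS = {pooled:.3f}, fixed effects = {within:.3f}")
```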
Fixed effects versus “Dummy Variables”

- These are not mutually exclusive categories
- A dummy variable is just a categorical variable that takes the value zero for some observations and one for others
  - “Control” variables, which have a direct meaning, may sometimes be dummy variables
  - Fixed effects, which tell us something about the structure of our error term, are also dummy variables
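
That fixed effects are just dummy variables can be checked numerically: regressing y on x plus one dummy per group gives the same slope on x as regressing group-demeaned y on group-demeaned x. The sketch below uses made-up data (three groups, 20 observations each, a true slope of 0.5) and hypothetical helper names; the small linear solver is included only to keep the example free of external libraries.

```python
import random


def solve(a, b):
    """Solve a @ coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) coef = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def make_data(seed=2):
    """Three groups whose effects shift both x and y (illustrative values)."""
    rng = random.Random(seed)
    groups, x, y = [], [], []
    for j, f_j in enumerate([0.0, 2.0, 4.0]):
        for _ in range(20):
            x_ij = f_j + rng.gauss(0, 1)
            groups.append(j)
            x.append(x_ij)
            y.append(0.5 * x_ij + f_j + rng.gauss(0, 1))
    return groups, x, y


def lsdv_slope(groups, x, y):
    """Slope on x when one dummy per group is included (no separate intercept)."""
    n_groups = max(groups) + 1
    rows = [[xi] + [1.0 if g == j else 0.0 for j in range(n_groups)]
            for g, xi in zip(groups, x)]
    return ols(rows, y)[0]


def within_slope(groups, x, y):
    """Slope from regressing group-demeaned y on group-demeaned x."""
    def demean(values):
        totals = {}
        for g, v in zip(groups, values):
            s, c = totals.get(g, (0.0, 0))
            totals[g] = (s + v, c + 1)
        return [v - totals[g][0] / totals[g][1] for g, v in zip(groups, values)]
    xd, yd = demean(x), demean(y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


groups, x, y = make_data()
print(f"dummy-variable slope = {lsdv_slope(groups, x, y):.6f}")
print(f"within-group slope   = {within_slope(groups, x, y):.6f}")
```

The two numbers agree up to floating-point error, so the choice between them is one of convenience: dummies are easy with a few groups, demeaning scales better with many.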
Motivation for today’s example…

- We want to know why people earn different amounts
- Specifically, what are the returns, in terms of increased wages, to the various investments people make?
- Most common labor-improving investment: education
Motivation-2

Simple linear regression first introduced by Mincer:

$$
\log(y_{ij}) = a + b\,S_{ij} + c\,X_{ij} + d\,X_{ij}^2 + e_{ij}
$$

- Indexed by individual i in group j
- $S_{ij}$, the measure of schooling: we’re going to use years of education
- $X_{ij}$, experience: we’re going to include a quadratic specification, which is the one most commonly used
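
As a reading aid, the coefficients of the earnings equation log(y) = a + bS + cX + dX² + e can be interpreted by differentiating it. The worked step below is a sketch, with X* introduced here purely as notation for the experience level at which earnings peak; the percentage reading of b is an approximation that is only accurate for small b.

```latex
\begin{align*}
\log(y_{ij}) &= a + b\,S_{ij} + c\,X_{ij} + d\,X_{ij}^2 + e_{ij} \\
\frac{\partial \log(y_{ij})}{\partial S_{ij}} &= b
  && \text{(one more year of schooling changes earnings by roughly } 100\,b \text{ percent)} \\
\frac{\partial \log(y_{ij})}{\partial X_{ij}} &= c + 2d\,X_{ij} \\
c + 2d\,X^{*} = 0 \;&\Longrightarrow\; X^{*} = -\frac{c}{2d}
  && \text{(earnings peak at } X^{*} \text{ when } c > 0,\ d < 0\text{)}
\end{align*}
```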
Basic Problem with estimating this

- Lots of reasons why different people may invest in different levels of education
- Some of those reasons are probably correlated with how much money a person would earn as well as how much they will invest in education
  - Unobserved “ability”
  - Family factors, such as income, parental involvement, genetic factors, etc.
How might these bias our estimates?

- Let’s say what we want to estimate is:

$$
\log(y_{ij}) = a + b\,S_{ij} + c\,X_{ij} + d\,X_{ij}^2 + f_j + e_{ij}
$$

- Interpret a higher $f_j$ as something like higher family income or family investment
- Recall the OVB formula; we care about two things:
  - Correlation between f and y: probably positive
  - Correlation between f and S: positive
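
The direction of the bias follows from the omitted variable bias formula. The derivation below is a simplified sketch: it drops the experience terms, keeps the family effect f with a coefficient of one as in the equation to be estimated, and assumes the remaining error e is uncorrelated with schooling.

```latex
\begin{align*}
\text{True model:}\quad \log(y_{ij}) &= a + b\,S_{ij} + f_j + e_{ij},
  \qquad \operatorname{Cov}(S_{ij}, e_{ij}) = 0 \\
\text{Omitting } f_j:\quad \operatorname{plim}\,\hat{b}_{OLS}
  &= \frac{\operatorname{Cov}\bigl(S_{ij}, \log(y_{ij})\bigr)}{\operatorname{Var}(S_{ij})} \\
  &= b + \frac{\operatorname{Cov}(S_{ij}, f_j)}{\operatorname{Var}(S_{ij})}
\end{align*}
```

Since f raises earnings and is positively correlated with S, the second term is positive, so under these assumptions OLS overstates the return to schooling.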
Why is OLS biased?
[Figure: Y plotted against S, with a line labeled OLS at E[S] and family-specific lines at E[S_ij | j = 1], E[S_ij | j = 2], and E[S_ij | j = 3].]
How could we fix this?

- Some of the unobserved differences that bias a cross-sectional comparison of education and earnings are based on family characteristics
- Key assumption: within families, these differences are fixed
- If we observe multiple individuals with exactly the same family effect, we can difference out the group effect
Estimating Family Averages

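- Can look at differences within a family, which removes the family effect
- Think of this as a different CEF for each family. Subtracting the family mean from each observation removes both the intercept a and the family effect $f_j$:

$$
E[Y_{ij} - \bar{Y}_j \mid S, X, f] = b\,(S_{ij} - \bar{S}_j) + c\,(X_{ij} - \bar{X}_j) + d\,(X_{ij}^2 - \overline{X^2}_j)
$$

- The way we estimate this:

$$
\log(y_{ij}) = \hat{a} + \hat{b}\,S_{ij} + \hat{c}\,X_{ij} + \hat{d}\,X_{ij}^2 + \hat{f}_j + \hat{e}_{ij}
$$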
What makes this believable

- No within-family differences in the unobserved factors
- Might be a problem with siblings generally
  - Parents invest differently
  - Cohort-related differences influence siblings differently
  - Different “inherited” endowment
- More believable with identical twins
A twins sample

- Collect data at the Twins festival in Twinsburg, Ohio
- Survey twins:
  - Are you identical? If both say yes, then included
  - Ever worked in the past two years
  - Earnings, education, and other characteristics
- Useful because we also get two measures of shared characteristics, so can control for measurement error
Twins sample issues…

- Sample at Twinsburg is NOT a random sample of twins
  - Benefit: more likely to be similar because attendees are into their “twinness”
  - Cost: not necessarily generalizable, even to other twins
- Attendees are a select segment of the population
  - Generally richer, whiter, more educated, etc.
  - Worry about heterogeneity of effects across some of these categories
External validity

- Twins may not be very comparable to other families: they face different costs and benefits to schooling
- Twinsburg sample not representative of twins
  - Maybe not even externally valid for twins
  - Worry that selection into the sample will give us an estimate that is not consistent with the population average
[Table of regression estimates not reproduced. Column annotations:]
- Fixed effects (same as first difference with only two observations per family)
- Control for average family schooling as an “ability” measure
- No family effect: cross-section regression
Where’s the variation

- Recall our estimating equation:

$$
\log(y_{ij}) = \hat{a} + \hat{b}\,S_{ij} + \hat{c}\,X_{ij} + \hat{d}\,X_{ij}^2 + \hat{f}_j + \hat{e}_{ij}
$$

- If $S_{ij}$ is the same for both twins, the pair makes no contribution to the estimate of b
- b is only estimated off twins who differ from each other in schooling investments
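
This can be verified on simulated twin pairs. The sketch below rests on invented numbers (a return of 0.1, schooling between 10 and 18 years, a quarter of pairs differing by one or two years) and leaves out the experience terms; the function names are hypothetical. It compares three slopes: fixed effects on all pairs, first differences on all pairs, and first differences using only the pairs whose schooling differs.

```python
import random


def simulate_twins(n_pairs=4000, b=0.1, share_discordant=0.25, seed=3):
    """Return ((s1, y1), (s2, y2)) per twin pair; both twins share a family effect."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        f_j = rng.gauss(0, 0.5)
        s1 = float(rng.randint(10, 18))
        # Most pairs have identical schooling; the rest differ by 1 or 2 years
        gap = rng.choice([-2.0, -1.0, 1.0, 2.0]) if rng.random() < share_discordant else 0.0
        s2 = s1 + gap
        y1 = b * s1 + f_j + rng.gauss(0, 0.5)
        y2 = b * s2 + f_j + rng.gauss(0, 0.5)
        pairs.append(((s1, y1), (s2, y2)))
    return pairs


def fixed_effects_slope(pairs):
    """Regress family-demeaned log wage on family-demeaned schooling."""
    num = den = 0.0
    for (s1, y1), (s2, y2) in pairs:
        s_bar, y_bar = (s1 + s2) / 2, (y1 + y2) / 2
        for s, y in ((s1, y1), (s2, y2)):
            num += (s - s_bar) * (y - y_bar)
            den += (s - s_bar) ** 2
    return num / den


def first_difference_slope(pairs):
    """Regress the within-pair wage gap on the schooling gap (no intercept)."""
    num = sum((s1 - s2) * (y1 - y2) for (s1, y1), (s2, y2) in pairs)
    den = sum((s1 - s2) ** 2 for (s1, _), (s2, _) in pairs)
    return num / den


pairs = simulate_twins()
discordant = [p for p in pairs if p[0][0] != p[1][0]]
print(f"share of pairs with different schooling: {len(discordant) / len(pairs):.2f}")
print(f"fixed effects, all pairs:        {fixed_effects_slope(pairs):.6f}")
print(f"first difference, all pairs:     {first_difference_slope(pairs):.6f}")
print(f"first difference, discordant:    {first_difference_slope(discordant):.6f}")
```

All three slopes coincide, because a pair with identical schooling adds zero to both the numerator and the denominator. The effective sample is therefore only the pairs that differ.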
Correlation Matrix for Twins
[Table: correlation matrix for twins, including education of twin 1 as reported by twin 1 and education of twin 1 as reported by twin 2.]
ALL of the identification for b comes from the 25% of twins who don’t have the same schooling
Measurement Error

- Twins are not perfect at reporting each other’s schooling: 5-10% measurement error
- May be generating a different bias
  - Can use instrumental variables to try to address this (more on this after we cover instrumental variables methods)
  - Need to worry about data quality too; can’t just worry about OVB
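
As a preview of why a second report helps, the sketch below simulates within-pair schooling gaps measured with error. Everything in it is an assumption made for illustration: the noise is classical and deliberately much larger than the 5-10% quoted above so the effect is visible, the true return is set to 0.1, and the names are hypothetical. OLS on one noisy report should be pulled toward zero, while using the other twin's report as an instrument should land near the true value.

```python
import random


def simulate_reports(n_pairs=20000, b=0.1, noise_sd=1.0, seed=4):
    """Return (report_1, report_2, wage_gap) per pair; both reports are noisy."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_pairs):
        true_gap = rng.gauss(0, 1)                     # true schooling difference
        report_1 = true_gap + rng.gauss(0, noise_sd)   # gap according to twin 1
        report_2 = true_gap + rng.gauss(0, noise_sd)   # gap according to twin 2
        wage_gap = b * true_gap + rng.gauss(0, 0.5)
        rows.append((report_1, report_2, wage_gap))
    return rows


def cov(u, v):
    """Sample covariance (dividing by n)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (c - mv) for a, c in zip(u, v)) / len(u)


def ols_slope(rows):
    """Regress the wage gap on twin 1's noisy report of the schooling gap."""
    r1 = [r[0] for r in rows]
    dy = [r[2] for r in rows]
    return cov(r1, dy) / cov(r1, r1)


def iv_slope(rows):
    """Instrument twin 1's report with twin 2's independent report."""
    r1 = [r[0] for r in rows]
    r2 = [r[1] for r in rows]
    dy = [r[2] for r in rows]
    return cov(r2, dy) / cov(r2, r1)


rows = simulate_reports()
print(f"true b = 0.1, OLS on noisy report = {ols_slope(rows):.3f}, IV = {iv_slope(rows):.3f}")
```

With equal signal and noise variances, as assumed here, OLS recovers only about half of the true coefficient. The instrument works in this toy setup because the two reporting errors are independent of each other.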
Limitations of Fixed Effects

- Relies on within-group variation
  - Not transparent what is generating that variation
  - The variation that’s left may be ‘random’ but may be limited in its external validity
- Must be the case that there is NO within-group variation in the unobserved effect AND homogeneous effects between groups (i.e. b is the same across groups)
  - May be less believable if family inputs have nonlinear effects on income or education
What did we learn today

- With unobserved group effects there can be two issues:
  - Uncorrelated with X’s: OLS is not efficient; can fix this with GLS
  - Correlated with X’s: OVB; can include “fixed effects”
- Fixed effects, within-group differences, and deviations from group means can all remove the bias from an unobserved group effect
Next Class

- Application: the effect of schooling on wages
  - Ability bias
  - Fixing this with “twins” and “siblings” models