Starting school at four: the effect of Universal Pre-K on


Starting School at Four: the Effect of
Universal Pre-Kindergarten on Children’s
Academic Achievement
Maria D. Fitzpatrick
The B. E. Journal of Economic Analysis and Policy
2008
Outline
1. What effects of child care? (literature)
2. The difference in difference framework
3. The article’s estimation strategy & results
3.1. Simple D-i-D
3.2. DDD
3.3. With synthetic control group
4. Discussion
What input? Preschool in the US
Preschool in the US:
Kindergarten
Pre-K
Head Start
Experiments: Abecedarian, Perry
What outcomes?
Child development
Language abilities
Socialization, behavior (tests)
School preparedness: later test scores
Even later: wages, crime, number of arrests…
Labor supply of mothers
Literature
Experiments: Abecedarian, Perry
Cost: between 15 000 and 40 000 USD/year, vs. 4 000 USD/year for universal Pre-K
Head Start: heterogeneous in practice → effects not that clear
= a trade-off between intensive targeted care and smaller-scale universal care?
Articles to be reviewed
Cascio (JHR 2009): DiD, timing of free kindergarten / maternal employment
Fitzpatrick (JoLE 2010): same theme, regression discontinuity design
Cascio et al. (QJE 2010): effect of financial incentives / desegregation: 2SLS + DD
Heckman et al. (JPE 2010): the rate of return to the HighScope Perry Preschool Program, revisited (experimental design)
2. The D-in-D method
The “difference-in-difference” method
Intuition:
Comparing the “before” and “after” situations of the treated → bias (many confounding factors)
Comparing the “after” situations of treated and control → bias (composition effect)
→ rather, look at the change in the difference in outcome between treated and control
The “difference-in-difference” method
Example (Wooldridge): impact of the building of an incinerator in 2005 on house prices in the neighborhood
Define T=1 as “the house lies within 5 miles of the incinerator”
Estimating $price_{2006} = \gamma_0 + \gamma_1 (T=1) + u$ yields, for example, $\hat{\gamma}_1 = -30\,000$ € $= \bar{y}_{T,2006} - \bar{y}_{C,2006}$ (mean price of treated minus mean price of control houses in 2006)
Can you conclude that the building of the incinerator decreased neighborhood house prices by 30 000 € on average?
The “difference-in-difference” method
You need price data from before construction was even announced
Estimating $price_{2002} = \beta_0 + \beta_1 (T=1) + u$ yields $\hat{\beta}_1 = -15\,000$ € $= \bar{y}_{T,2002} - \bar{y}_{C,2002}$
This shows that prices were already lower in the neighborhood where the incinerator would be built
The D-i-D estimator simply combines these two pieces of information to estimate the causal impact of T
The “difference-in-difference” method
Average change in the price difference between houses located near the incinerator (Near=1: treated group) and those located far from it (control group):
$\hat{\delta}_1 = \hat{\gamma}_1 - \hat{\beta}_1 = (\bar{y}_{T,2006} - \bar{y}_{C,2006}) - (\bar{y}_{T,2002} - \bar{y}_{C,2002}) = -30\,000 - (-15\,000) = -15\,000$ €
This means we can easily derive $\hat{\delta}_1$ and test whether it is significantly different from 0 by estimating which equation?
The “difference-in-difference” method
$price = \beta_0 + \beta_1\, Near + \delta_0\, date2 + \delta_1\, Near \times date2 + u$
[where date2 is a dummy variable equal to 1 if t = 2006]
What do $\beta_0$, $\delta_0$ and $\beta_1$ represent in this model?
The “difference-in-difference” method
$price = \beta_0 + \beta_1\, Near + \delta_0\, date2 + \delta_1\, Near \times date2 + u$
$\beta_0 = E[Y_{2002,far}]$
$\beta_1 = E[Y_{2002,near}] - E[Y_{2002,far}]$
$\delta_0 = E[Y_{2006,far}] - E[Y_{2002,far}]$
$\delta_1 = (E[Y_{2006,near}] - E[Y_{2006,far}]) - (E[Y_{2002,near}] - E[Y_{2002,far}])$
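The mapping from cell means to coefficients can be checked mechanically. Here is a minimal Python sketch. It assumes hypothetical cell means: the price levels are invented, and only the near/far gaps follow the slide's example (−15 000 € in 2002, −30 000 € in 2006). The dictionary `means` and the function `did_coefficients` are illustrative names and do not come from the paper.

```python
# Hypothetical mean prices (euros) for each (location, year) cell.
# Levels are invented; only the near/far gaps follow the slide's example.
means = {
    ("far", 2002): 100_000,
    ("near", 2002): 85_000,
    ("far", 2006): 120_000,
    ("near", 2006): 90_000,
}


def did_coefficients(m):
    """Return (beta0, beta1, delta0, delta1) of
    price = beta0 + beta1*Near + delta0*date2 + delta1*Near*date2 + u,
    computed from the four cell means."""
    beta0 = m[("far", 2002)]
    beta1 = m[("near", 2002)] - m[("far", 2002)]
    delta0 = m[("far", 2006)] - m[("far", 2002)]
    delta1 = (m[("near", 2006)] - m[("far", 2006)]) - (
        m[("near", 2002)] - m[("far", 2002)]
    )
    return beta0, beta1, delta0, delta1


print(did_coefficients(means))  # (100000, -15000, 20000, -15000)
```

The model has one parameter per cell, so OLS reproduces the four cell means exactly. These differences therefore coincide with the regression coefficients.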
The “difference-in-difference” method
$price = \beta_0 + \beta_1\, Near + \delta_0\, date2 + \delta_1\, Near \times date2 + u$
$\delta_1$ measures the difference in the average price change:
“difference”: between the neighborhood where the incinerator was built and the rest of the town
“change”: between date 1 (2002) and date 2 (2006)
The “difference-in-difference” method
$price = \beta_0 + \beta_1\, Near + \delta_0\, date2 + \delta_1\, Near \times date2 + u$

Mean price   Far                       Near
2002         $\beta_0$                 $\beta_0 + \beta_1$
2006         $\beta_0 + \delta_0$      $\beta_0 + \beta_1 + \delta_0 + \delta_1$
The “difference-in-difference” method
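[Figure: mean price (vertical axis) in 2002 and 2006 for far and near houses, with $\beta_0$, $\beta_1$, $\delta_0$ and $\delta_1$ marked]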
The “difference-in-difference” method
In general: estimate
$y = \beta_0 + \delta_0\, date2 + \beta_1 (T=1) + \delta_1 (T=1) \times date2 + controls + u$
The treatment effect is given by the coefficient $\delta_1$
It is an average treatment effect (strictly, the average effect on the treated group: ATT)
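As a sanity check on the general specification, here is a minimal sketch. It simulates data that satisfy the common-trend assumption and recovers $\delta_1$ by OLS. The parameter values and the single continuous control are made up. `ols` is a hand-rolled helper written only for this illustration; in practice one would use a statistics package. The regressor order is: constant, date2, (T=1), (T=1)×date2, control.

```python
import random


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Simulated data with hypothetical "true" parameters.
random.seed(0)
b0, d0, b1, d1, theta = 10.0, 2.0, -3.0, 1.5, 0.8
X, y = [], []
for _ in range(4000):
    T = 1.0 if random.random() < 0.5 else 0.0
    date2 = 1.0 if random.random() < 0.5 else 0.0
    x = random.gauss(0, 1) + 0.5 * T  # a control correlated with T
    X.append([1.0, date2, T, T * date2, x])
    y.append(b0 + d0 * date2 + b1 * T + d1 * T * date2 + theta * x + random.gauss(0, 1))

est = ols(X, y)
print("estimated delta1:", round(est[3], 2))  # close to the true 1.5
```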
The “difference-in-difference” method


Crucial assumption:
in the absence of treatment, the difference between T & C (once observables have been controlled for) would have remained constant
= the counterfactual (the “parallel trends” assumption)
The “difference-in-difference” method



Equivalently: nothing but the treatment made T’s outcome trend differ from C’s
Under this assumption, the entire shift away from the common trend can be attributed to the treatment
If something else happens at the same time that could affect T & C differently, DD won’t identify the causal effect
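The assumption can be made concrete with four group means. In this hypothetical sketch (all numbers are invented):

- both groups share a common trend of +5;
- the true effect is −10;
- an extra shock of +3 hits only the treated group at the same time.

DD then returns −7, because it absorbs the shock into the estimate.

```python
def did(t_before, t_after, c_before, c_after):
    """Difference-in-differences from four group means."""
    return (t_after - t_before) - (c_after - c_before)


# Hypothetical group means
common_trend, true_effect, extra_shock = 5.0, -10.0, 3.0
c_before, t_before = 50.0, 40.0
c_after = c_before + common_trend
t_after = t_before + common_trend + true_effect + extra_shock

print(did(t_before, t_after, c_before, c_after))  # -7.0: biased by the shock
print(did(t_before, t_after - extra_shock, c_before, c_after))  # -10.0 without it
```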
3. The paper’s strategy
The evaluated policy
Fall 1993: Georgia institutes a lottery to fund targeted preschool
Fall 1995: lottery revenue turns out larger than needed → the preschool program becomes universal
Share of Georgian 4-year-olds enrolled: 8% (1993) → 50% (1996)
The evaluated policy
Treatment on the treated? No
ATE of the availability of Pre-K
“Intention to treat”
What outcomes to consider?
Test scores in 4th grade
Grade retention
The evaluated policy
Important checks
1) Take-up and crowd-out
If the entire increase comes from children new to preschool, total enrolment rises by the same amount
If the increase is only “crowd-out” of children previously attending private preschool, total enrolment does not increase
2) Trends in Head Start
A fairly similar program → if it changed at the same time, it would be a confounding factor
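Check 1) amounts to simple accounting on enrolment shares. Here is a minimal sketch. It assumes that the program is the only reason the total preschool share moved, which is exactly what check 2) on Head Start is meant to probe. All shares are hypothetical. `decompose_increase` is an illustrative name.

```python
def decompose_increase(public_before, public_after, total_before, total_after):
    """Split the rise in the public pre-K share (percentage points) into
    children new to preschool and children crowded out of other preschools.
    Assumes the program is the only reason the total share moved."""
    public_rise = public_after - public_before
    new_participants = total_after - total_before
    crowd_out = public_rise - new_participants
    return new_participants, crowd_out


# Hypothetical shares of 4-year-olds, in percentage points.
new, crowd = decompose_increase(public_before=10, public_after=40,
                                total_before=40, total_after=60)
print(new, crowd)  # 20 10
```

Full crowd-out shows up as `new_participants == 0`. No crowd-out shows up as `crowd_out == 0`.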
Important checks (1/2)
Important checks (2/2)
Estimation
The basic D-i-D model:
$Y_{ijt} = \beta_0 + \beta_1\, Georgia \times after + State_i + year_t + \varepsilon_{ijt}$
Where:
$Y_{ijt}$ is the test score of pupil i in school j in year t
$State_i$ and $year_t$ are state and year fixed effects
$Georgia \times after$ takes the value 1 if student i belongs to a cohort that had access to universal pre-K
Who are we comparing to whom?
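One way to answer the question is with a balanced panel of state averages (one observation per state and year) and a single treatment date. In that setting, the coefficient $\beta_1$ from the state and year fixed-effects regression equals a simple difference-in-differences of averages: GA versus the pooled other states. Here is a minimal sketch with hypothetical state averages; the states, years and scores are invented and are not NAEP figures.

```python
from statistics import mean

# Hypothetical state-by-year average 4th-grade scores (not real data).
scores = {
    "GA": {1992: 210.0, 1994: 211.0, 2000: 216.0, 2002: 218.0},
    "AL": {1992: 208.0, 1994: 209.0, 2000: 212.0, 2002: 213.0},
    "SC": {1992: 211.0, 1994: 211.0, 2000: 215.0, 2002: 216.0},
}
AFTER = {2000, 2002}  # hypothetical test years of cohorts with universal pre-K


def did_one_treated(scores, treated, after_years):
    """(treated after - treated before) - (controls after - controls before),
    with simple averages over states and years."""
    def avg(states, post):
        return mean(
            scores[s][t] for s in states for t in scores[s] if (t in after_years) == post
        )

    controls = [s for s in scores if s != treated]
    return (avg([treated], True) - avg([treated], False)) - (
        avg(controls, True) - avg(controls, False)
    )


print(did_one_treated(scores, "GA", AFTER))  # 2.25
```

With pupil-level data and unequal numbers of test takers per cell, the regression weights cells differently. The equality is then only approximate.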
Estimation
Potential bias if the composition of the test-taking population changed in GA relative to other states over time → added controls:
$Y_{ijt} = \beta_0 + \beta_1\, Georgia \times after + State_i + year_t + \theta X_{ijt} + \lambda Z_{jt} + \varepsilon_{ijt}$
Where:
$X_{ijt}$ is a vector of characteristics of pupil i in school j in year t
$Z_{jt}$ is a vector of characteristics of school j in year t
4. Results
Estimation strategy
Results
A DDD specification
Check for further confounding factors: add 8th graders to the regression
The model becomes
$Y_{ijt} = \beta_0 + fourth_i + State_i + year_t + fourth_i \times State_i + fourth_i \times year_t + State_i \times year_t + \beta_1\, Georgia_i \times after \times fourth_i + State_i \times year_t \times fourth_i + \varepsilon_{ijt}$
A DDD specification
The estimated coefficient on the $Georgia \times after \times fourth$ dummy is
[(GA 4th after − other states 4th after) − (GA 4th before − other states 4th before)]
− [(GA 8th after − other states 8th after) − (GA 8th before − other states 8th before)]
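The formula can be evaluated directly from eight cell means. In this hypothetical sketch (numbers invented), a Georgia-specific shock of +2 affects both grades, while a pre-K effect of +3 affects only 4th graders. The 4th-grade D-i-D alone returns 5; the triple difference returns 3. `did` and `ddd` are illustrative helper names.

```python
def did(t_after, t_before, c_after, c_before):
    """(treated after - control after) - (treated before - control before)."""
    return (t_after - c_after) - (t_before - c_before)


def ddd(cells):
    """Triple difference from mean scores keyed by (state group, grade, period)."""
    d4 = did(cells["GA", 4, "after"], cells["GA", 4, "before"],
             cells["other", 4, "after"], cells["other", 4, "before"])
    d8 = did(cells["GA", 8, "after"], cells["GA", 8, "before"],
             cells["other", 8, "after"], cells["other", 8, "before"])
    return d4 - d8, d4, d8


# Hypothetical means: a GA-specific shock of +2 hits both grades,
# and the pre-K effect of +3 hits 4th graders only.
cells = {
    ("other", 4, "before"): 210, ("other", 4, "after"): 214,
    ("GA", 4, "before"): 208, ("GA", 4, "after"): 217,
    ("other", 8, "before"): 260, ("other", 8, "after"): 263,
    ("GA", 8, "before"): 255, ("GA", 8, "after"): 260,
}
effect, d4, d8 = ddd(cells)
print(d4, d8, effect)  # 5 2 3
```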
A DDD specification
If something other than the policy at hand (universal Pre-K) caused the school results of Georgian kids to shift away from those of other states, the D-i-D on 8th graders will catch that shift
The difference between that shift and the one observed among 4th graders is then the policy effect
A DDD specification
Assumption here = ?
Shift between GA & other states not caused
by the program = same for 4th & 8th graders
Put differently: other factors (not the
program) affected 4th & 8th graders in the
same way
The “synthetic control” method
Abadie, Diamond, Hainmueller (2007): aggregated data (e.g. at the region level)
One unit is treated
There are several control units
The “synthetic control” method
Instead of choosing one control group, they construct a counterfactual outcome as a weighted average of the outcomes of all untreated units
The weights are chosen so that the weighted combination of control units is as close as possible to the treated unit on several predictors of the outcome (weights are nonnegative and sum to 1)
The ADH method
Minimize the distance between GA “before” and a “before” synthetic control made of all other states (k = 2 to N) with weights $w_k$
= minimize the distance between $(y_{GA,before}, X_{GA})$ and $\sum_{k=2}^{N} w_k\, (y_{k,before}, X_k)$
Uses both y and X
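Here is a minimal sketch of the weight search. It makes several simplifying assumptions:

- every predictor counts equally in the distance (ADH also choose predictor weights, which are omitted here);
- predictors are already on comparable scales;
- the problem is solved by projected gradient descent, one of several ways to solve this constrained least-squares problem and not necessarily the solver ADH or the paper used.

The data are hypothetical, built so that the treated unit is exactly 0.6 × state 1 + 0.4 × state 2. `project_to_simplex` and `synthetic_weights` are illustrative names.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w_k >= 0, sum_k w_k = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def synthetic_weights(x_treated, x_controls, iters=5000):
    """Weights w_k >= 0 summing to 1 that minimize
    ||x_treated - sum_k w_k * x_controls[k]||^2 (projected gradient descent)."""
    n, m = len(x_controls), len(x_treated)
    w = [1.0 / n] * n
    # 2 * squared Frobenius norm bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(v * v for xk in x_controls for v in xk))
    for _ in range(iters):
        fit = [sum(w[k] * x_controls[k][j] for k in range(n)) for j in range(m)]
        resid = [f - x for f, x in zip(fit, x_treated)]
        grad = [2.0 * sum(r * v for r, v in zip(resid, xk)) for xk in x_controls]
        w = project_to_simplex([wk - step * g for wk, g in zip(w, grad)])
    return w


# Hypothetical pre-treatment vectors (standardized outcomes and predictors).
controls = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [4.0, 4.0, 1.0, 1.0],
]
treated = [1.4, 1.6, 3.4, 3.6]  # = 0.6 * controls[0] + 0.4 * controls[1]
w = synthetic_weights(treated, controls)
print([round(x, 3) for x in w])  # approximately [0.6, 0.4, 0.0]
```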
The ADH method
Treatment effect can then be estimated the usual D-i-D way
If $y_{GA,before} - \sum_{k=2}^{N} w_k\, y_{k,before}$ is small enough, the treatment effect is estimated by
$y_{GA,after} - \sum_{k=2}^{N} w_k\, y_{k,after}$
The ADH method
Here: a “placebo DD test” with treatment between 1996 and 2000 yields an “effect”: math scores decreased by 1.7% of an SD in GA relative to other states
Idea: compare GA to states that show the same trend before treatment
Even better: compare GA to the linear combination of states that best fits the pre-treatment trend
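A placebo test reruns the D-i-D on years before the real treatment, pretending treatment started at some earlier date. A clearly nonzero "effect" signals that GA and the comparison group were already drifting apart. Here is a minimal sketch with invented scores; the years and values are hypothetical, not the paper's.

```python
from statistics import mean


def placebo_did(treated, comparison, fake_start):
    """D-i-D 'effect' of a fictitious treatment starting at fake_start,
    using only years before the real treatment."""
    gap = {t: treated[t] - comparison[t] for t in treated}
    before = [gap[t] for t in gap if t < fake_start]
    after = [gap[t] for t in gap if t >= fake_start]
    return mean(after) - mean(before)


# Hypothetical pre-treatment scores: GA vs. a comparison group.
ga = {1990: 200.0, 1992: 201.0, 1996: 200.5}
comparison = {1990: 198.0, 1992: 199.0, 1996: 200.0}
print(placebo_did(ga, comparison, fake_start=1996))  # -1.5
```

Replacing `comparison` with a synthetic control fitted on the pre-treatment years aims to push this placebo gap toward zero.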
Results
Positive but statistically insignificant state-wide effects
→ heterogeneous effects?
Results
Discussion
Effects not very large, not always statistically significant
What could cause this even if pre-K had an effect on school preparedness?
12% in Head Start, 15 to 35% already in some non-subsidized preschool → many children already “treated” in the control group
Only an “intent to treat” effect → assuming the whole increase consists of new participants, the ATT is much stronger
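The last point can be quantified with a standard rescaling. If the policy changed outcomes only for the children it actually brought into preschool, the effect on those children is the ITT divided by the increase in the participation rate. Here is a minimal sketch with invented numbers, not the paper's estimates. `att_from_itt` is a hypothetical helper.

```python
def att_from_itt(itt, take_up_increase):
    """Rescale an intention-to-treat effect by the share of children the policy
    moved into preschool (a Wald/Bloom-type adjustment).
    Assumes the policy affects only the children it brings into preschool."""
    if not 0 < take_up_increase <= 1:
        raise ValueError("take_up_increase must be in (0, 1]")
    return itt / take_up_increase


# Hypothetical: ITT of 0.02 SD, policy raised participation by 25 points.
print(att_from_itt(0.02, 0.25))  # 0.08
```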
Discussion
Benefits seem more prominent for some groups: rural, non-white
Cost-benefit analysis: better grades → better wages → more taxes
but the large costs ($300M) outweigh the benefits (<$50M)
Public policy recommendation: target publicly funded pre-K programs