Transcript Slide 1

Comments on Ludwig and Morris

Amy Ellen Schwartz, Director, Institute for Education and Social Policy. Please do not cite or quote without permission.

Why are so many American public schools not performing better?

• Four major hypotheses:
– Inadequate resources
– Inefficient production technologies (curriculum, etc.)
– Unmotivated teachers
– Unmotivated students
• AES:
– Outside-of-school factors?

– Do we mean public schools or public school STUDENTS?

– Are we looking under the light?

• Education researchers usually can only randomize what they can pay for
– Severely limits which of these we can study with randomized experimental designs of any sort
• But other actors can randomize too. (HPD?)

The rhetoric of randomization

• Me: “Can we randomly assign the intervention?”
• City official in very large Midwestern city: “No way.”
• Me: “Well, could we do a pilot program?”
• City official: “Sure. We do pilot programs all the time.”
• Me: “How do you decide who gets the pilot program if there is excess demand?”
• City official: “Good question.”
• Me: “Could we flip a coin, which would be the fair thing to do?”
• City official: “Ah, now I get it.”
• City official: “That’s even better! I can easily explain that to parents, students and school leaders. That’s what I’m supposed to do!”
• AES: How about distributing it first to those that can benefit the most or need it the most?
• AES: Can you rank by need and create a cutoff?

The rhetoric of randomization

• Never use the term “randomized experiment”
• Acceptable talking points:
– “Pilot program”
– “Excess demand”
– “Fair, random lotteries”
• AES: Should this give us pause?
• If there would be a natural unit for doing a “regular” pilot program, randomize that
– I.e., we’d just be implementing an unusually informative pilot program
– For many education interventions this would seem to argue for group randomization (ex: pay for grades)

Second consideration: Power

• Statistical arguments for spreading out across more units.
• Economics may suggest fewer:
– Economies of scale in data collection
– Economies of scale in service delivery
• For a given budget, cluster randomization might generate more power
• Good points!
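The budget trade-off above can be made concrete with the standard design-effect formula, 1 + (m − 1)ρ, where m is cluster size and ρ the intraclass correlation. A minimal sketch, with entirely hypothetical cost and correlation numbers, shows how a fixed budget can favor an intermediate cluster size:

```python
# Sketch: clusters vs. cluster size under a fixed budget.
# The cost figures and intraclass correlation (rho) below are hypothetical.

def design_effect(m, rho):
    """Variance inflation from randomizing clusters of size m
    instead of individuals: the standard 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho

def effective_n(k, m, rho):
    """Effective sample size of k clusters of m subjects each."""
    return k * m / design_effect(m, rho)

# Fixed budget of 100 units; each cluster costs 5 to recruit, each subject 1.
budget, c_cluster, c_subj, rho = 100, 5, 1, 0.1
for m in (2, 5, 10, 20):
    k = int(budget // (c_cluster + m * c_subj))  # clusters affordable at size m
    print(f"cluster size {m}: {k} clusters, effective n = {effective_n(k, m, rho):.1f}")
```

Under these made-up numbers the effective sample size rises and then falls as clusters grow, which is exactly the tension the slide flags: economies of scale push toward fewer, larger clusters, while the design effect pushes the other way.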

But

• Limitations on the number of observational units may be significant.

• Substantive concerns real: – Do you want to include or exclude spillovers or peer effects?

• AES: Also ethics concerns about the randomization. Some things can be implemented as “school policy” that individuals may not want to opt into. Is this a good thing or a bad thing?

Bottom lines

• Clustered experiments might help realize randomization
• Power considerations are complex
• Substantive considerations about spillovers and “public good” interventions
• ‘Tis better to have randomized at the wrong level than never to have randomized at all?
• AES: Worth thinking more about. Are there alternatives? What about allocating using a clearly defined formula, allowing for an RDD? Other strategies?
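The formula-based alternative AES raises can be sketched simply: rank units by a transparent need score and fund everyone above a cutoff. The sharp cutoff is what later permits a regression discontinuity design. All names and scores below are hypothetical:

```python
# Sketch of formula-based allocation: rank by need, fund the top slots.
# Units just above and just below the resulting cutoff form the natural
# RDD comparison sample. Data here is invented for illustration.

def allocate_by_need(units, capacity):
    """Sort units by need score (descending) and fund the top `capacity`.
    Returns (funded, unfunded, cutoff_score)."""
    ranked = sorted(units, key=lambda u: u["need"], reverse=True)
    funded, unfunded = ranked[:capacity], ranked[capacity:]
    cutoff = funded[-1]["need"]
    return funded, unfunded, cutoff

schools = [{"id": i, "need": s} for i, s in enumerate([82, 91, 45, 77, 60, 88])]
funded, unfunded, cutoff = allocate_by_need(schools, capacity=3)
```

Unlike a lottery, this rule is easy to defend to parents and officials ("the neediest go first"), and it still yields credible causal estimates near the cutoff.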

Morris’ Study Poses Four Research Questions

1. What are the impacts of differing program approaches? What are the pathways?
2. How do effects vary?
3. What characteristics are necessary for effective implementation?
4. What features of Head Start settings are associated with successful training, technical assistance, and implementation?

AES: These are WITHIN-Head Start questions, not relative to “no treatment”.

Background

• Unique opportunity provided by testing multiple program models in the same evaluation in the context of a national trial
• What role does funding play in generating the “unique opportunity”?
• Can centers or grantees opt out? How should we think about this?

Selecting Program Models

Goal: to select programs “ready” for national trial
• First, review “readiness”
– Three programs emerged with the strongest efficacy evidence
• Review program content to develop theories of change
• AES: What sort of methods did the efficacy evidence use?


Randomization Strategy

• Blocking by grantee
• Cluster random assignment strategy
– Centers will be randomly assigned to one of two or three treatment groups or to a control group
– All classrooms and children within a center have the same treatment group assignment
• Centers within grantees must be similar on key characteristics to ensure randomization “works”
• Does this mean that we effectively exclude grantees serving segregated areas? That is, if a grantee serves a diverse set of centers, characteristics would not be the same. Is that a problem in practice?

Comments on Morris

• This is a well designed study that promises to yield solid answers to the questions posed by the research team and, presumably, by the funder.

• Esp. nice that it attempts to look inside the black box to gain insight into why something might (not) work • May contribute to understanding that goes beyond Head Start.

• Key features:
– Evaluates the impact of a set of well designed programs
– Randomization at the center level is natural
– Little discussion of the recruitment/sign-up process – suggests centers can be compelled to adopt? (implications?)
– Generalizability to non-Head Start settings not discussed. Are any of the results generalizable? Why or why not? Does it matter?

Questions

• What’s the ultimate goal of this line of study?

– To improve the performance of HS kids in kindergarten, relative to other kids? Broader?

– Once we know the answer, what will we do?

• Key questions for policymaking would include:
– How much does this cost? The interventions? The research?

– Will it be feasible to implement elsewhere in the future?
– What will be required, and are those things likely to be available?

– Who will pay for it?

• How will the results be used? Can HHS require Head Start classrooms to use the new techniques or induce them through their grantmaking?
• How easy/hard will it be to implement in a real world setting? At scale?

Taking the two together

• A nice combination.

– Jens discusses the big picture
– Pamela presents a specific, well defined and designed study.

• What do we make of these?

RCT Stock is High

• Heralded by the evaluation world as the “gold standard” for evaluation research.

• Financial support for Randomized Controlled Trials is very high.

– IES
– Foundations (e.g., WT Grant)
• Increasingly embraced by the policymaking world, driven by a need for “evidence” for “evidence-based decision-making” that is broadly accessible.

Lots to like

• RCTs have a number of attractive features, as described nicely by Jens and Pamela.

• The best give clear, reliable, internally and externally valid estimates of program impacts that are easily explained to the layman.

But Significant Challenges in Design and Implementation

• Some can be overcome
– Group rather than individual randomization helps.
– How can you get informed consent at a group level? What does it mean?
• Some cannot
– Limits to the ability to “go to scale”:
• Infrastructure issues, labor scarcity, etc. (think of the CA CSR initiative)
– General equilibrium effects
• If govt. gives pre-school to all poor children in an effort to close the poor/non-poor achievement gap, the response of the middle class may negate the effect.
– Ethical issues

CA Class Size Reduction Program

• Inspired, at least in part, by the positive findings of the Tennessee STAR class size experiment.
• In 1996, California implemented a state-wide policy to significantly reduce class size in the early grades.
• Unfortunately, CSR was associated with declines in teacher qualifications and an inequitable distribution of credentialed teachers.
• The increased demand for teachers drew teachers without full credentials, disproportionately into schools serving the most disadvantaged students.
• Evidence on the impact of the reduction on test scores is mixed.

General Equilibrium Effects

• The impact of an intervention implemented on a small scale is a partial equilibrium effect
– it typically does not account for behavioral responses of others, which may be important.
• EX: RCT shows that providing pre-school can improve the academic performance of poor, African American students.
• Will public pre-school for all serve to close the black-white test score gap?
• Maybe, maybe not. What is the response of the group who would have used private pre-school to preserve their advantage? Do they increase their investment in Kindergarten, or tutors, or…?
• In the end, providing public pre-school may not serve to close the gap, even if it affects performance.

Why is there resistance to RCT?

• We should take seriously resistance from individuals or groups.

• Easy to assume it is ignorance or irrationality.

• Possible they know something we don’t?
• Research is needed to understand this.

• At least two possible reasons:
– Unintended consequences (that researchers don’t see)
– Ethical/moral issues

Unintended and Unobserved Potential Consequences

• What might they be?
• Does participation mean a school or individual will not receive something else?

• Can we assure them that the intervention will ‘do no harm’?

• What is the effect of participation in the control group?

Ethical/Equity Issues

• Our notion of fair may not match that of potential “subjects.”
• Who is being studied? Who will benefit from the results?

• What is the likelihood that the findings will lead to changes that will benefit those being studied?

• Scarcity does not necessarily make randomization fair. Other mechanisms are often used to allocate scarce resources.
• Are we comfortable with the paucity of RCTs in high-income school districts? Are we taking advantage of the poor?

• IRB/Human Subjects regulations (following Belmont) cite:
– Principle of Justice
– Beneficence
– Respect for Persons
• Are we following these? See Blustein in JPAM (2006) on this.

Reconsidering the Gold Standard

“Having behind us the producing masses of this nation and the world, supported by the commercial interests, the laboring interests and the toilers everywhere, we will answer their demand for a gold standard by saying to them: You shall not press down upon the brow of labor this crown of thorns, you shall not crucify mankind upon a cross of gold.”

Thank you, William Jennings Bryan

Alternatives?

• Remember there are other methods!

– Qualitative work – ethnography
– Case studies
– Econometric analyses
• RDD
• Structural models
• Diff-in-diff
• Natural experiments
• IV

My Bottom Line

• Viewing RCT as the “gold standard” suggests other methods are less valuable.

• Need to consider what we lose sight of by looking through the lens of the RCT.

• In general, the method should match the question.

– If RCT is always the answer, what limits does that place on the question?

• RCT is only one of many excellent research methods and may be best used alongside others.

Quote from John Tukey...

• "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."