Transcript Slide 1

Doing Experiments
An introduction
1
• Empirical social science, including economics, is
largely nonexperimental, relying on data from
naturally occurring situations.
• Lab experiments criticized on several grounds
– Participant pools are often unrepresentative
– Samples are too small
– Unrealistic settings lack relevance for the real world
– Field experiments, like the Rand Health
Insurance Experiment of the 1970s, are generally
considered superior
• Falk and Heckman disagree, arguing that lab
experiments are often superior
2
Why do Falk and Heckman argue for lab
experiments?
What are the major advantages of lab
experiments over field experiments and
surveys?
3
Why might lab experiments be better?
• Labs provide controlled variation
• Can specify complete contracts, something rarely
done in real-world environments
• Even though the situation is unrealistic, the results
reveal things about human nature
– The gift exchange game shows that higher payoffs
elicit more worker effort, contradicting the
self-interested worker theory
– We have experimental evidence that the paradigm
of rational selfishness does not hold
– Similar evidence for loss aversion, present
bias, and social approval bias
4
• Lab experiments allow testing precise predictions
of game-theoretic models
• Measured behavior is reliable and real
• Lab experiments are relatively inexpensive to
implement
• The realism of field experiments does not necessarily
make them superior to lab experiments. The real issue
is the best way to isolate causal factors
5
• The problem: We have
Y = f(X1, …, Xn)
where we want to know the causal effect of X1 on Y,
which means we need to vary X1 while holding
X2, …, Xn constant.
• If f(.) is separable in X1, so Y = g(X1) + h(X2, …, Xn),
then varying X1 provides the causal impact of X1 on Y.
– Even if f is separable, the impact depends on both
the level of X1 and the magnitude of the change in X1
• If f(.) is not separable in X1, then the causal impact of
X1 on Y also depends on the level of (X2,…,Xn)
• Lab experiments allow better control over the level of
and change in X1, and over the levels of the other Xs.
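The role of separability can be illustrated with a small sketch (hypothetical functions, not from the slides): with a separable f, the effect of changing X1 is the same at any level of X2; with a non-separable f, the measured effect depends on where X2 sits.

```python
# Illustrative sketch: compare a separable and a non-separable f
# to show when the causal effect of X1 depends on the level of X2.

def f_separable(x1, x2):
    # Y = g(X1) + h(X2): the effect of X1 does not depend on X2
    return 2.0 * x1 + x2 ** 2

def f_nonseparable(x1, x2):
    # Y = X1 * X2: the effect of X1 depends on the level of X2
    return x1 * x2

def effect_of_x1(f, x2, dx1=1.0, x1=0.0):
    """Change in Y from raising X1 by dx1, holding X2 fixed."""
    return f(x1 + dx1, x2) - f(x1, x2)

# Separable: the same causal effect of X1 at any level of X2
print(effect_of_x1(f_separable, x2=1.0))      # 2.0
print(effect_of_x1(f_separable, x2=10.0))     # 2.0

# Non-separable: the effect of X1 changes with the level of X2
print(effect_of_x1(f_nonseparable, x2=1.0))   # 1.0
print(effect_of_x1(f_nonseparable, x2=10.0))  # 10.0
```

This is why an experiment must fix the levels of the other Xs before a measured change in Y can be read as the causal impact of X1.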
6
• Field experiments suffer from population specificity just as
do lab experiments
– “Natural situation” of sports card traders is as specific a
population as volunteer college students
– Field experiments usually have restrictive populations,
like the Rand health insurance experiment which used
people on public assistance
– Hence, field experiments give accurate causal impacts
only if the relationship is separable in X1 and the
causal relationship is linear
– But in that case, lab experiments give equally valid
causal inferences that transfer across populations
– Field experiments do offer greater variability in other
Xs, which offers complementary information
7
Other objections
• Experiments with students don’t produce
representative results about economic theories
– But most theories are independent of
assumptions about participant pools
• Stakes are trivial
– Never clear what “right” stakes should be
– Stakes can be varied
– Economic theory is based on epsilon changes.
• Samples are too small
– Statistical measures exist for analyzing small
samples
8
Other objections (continued)
• Experiments do not distinguish between
experienced and inexperienced participants
– Can be better controlled in the lab than in the field
• Participants in lab experiments behave differently
because they are being scrutinized
– Not exclusive to labs
– Repeated experiments can “average this out”
• Self-selection into experiments
– Provides information about preferences
– A problem for field experiments too, which also suffer
from adherence and attrition, problems that lab
experiments do not have
9
Go for complementarity: combine what we learn from
lab experiments with field experiments and large
surveys
– Lab experiments offer carefully controlled
environments, and provide important insights into
preference heterogeneity
– Field experiments offer broader participation and
wider variety
– Surveys can generate large and representative
data sets that provide statistical power
10
The big issue in the analysis of a treatment is the
missing counterfactual
– The idea of a treatment is to see how it changes an
individual's behavior, but we usually observe a
person only once, either with or without the treatment
– Hence, the treatment needs to be applied with
random assignment
– Then average differences, once other covariates
are controlled for, can be attributed to the treatment
– Goal is to maximize the variance of the treatment
while controlling for other heterogeneity
11
So what are we looking for in a treatment
experiment?
We are looking for the effect of the treatment.
Suppose
Y = XB + cT + a + e
where Y is the outcome, X are observed
characteristics (with coefficients B), a are unobserved
characteristics, T is the treatment, and e is a random
term. We assume person-specific treatment effects
are nonexistent, so we are looking for the
magnitude and sign of c.
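A minimal simulation sketch of this model (all parameter values hypothetical): generate Y = XB + cT + a + e with randomly assigned T, then estimate c as the difference in group means, which works because X, a, and e are independent of the assignment.

```python
import random

random.seed(0)

# Hypothetical parameters: coefficient on X and true treatment effect c
b, c = 1.5, 2.0
n = 20000  # observations per group

def draw(t):
    x = random.gauss(0, 1)  # observed characteristic X
    a = random.gauss(0, 1)  # unobserved characteristic
    e = random.gauss(0, 1)  # random term
    return b * x + c * t + a + e  # Y = XB + cT + a + e

# Randomly assign treatment, then compare group means of Y.
treated = [draw(1) for _ in range(n)]
control = [draw(0) for _ in range(n)]
c_hat = sum(treated) / n - sum(control) / n
print(round(c_hat, 1))  # close to the true c = 2.0
```

Because assignment is independent of X, a, and e, their contributions average out across the two groups and the mean difference isolates c.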
12
• Ideally, we would observe the same decision
making unit (observation) when T=1 (it has
the treatment) and T=0 (it doesn’t have the
treatment).
– This is the counterfactual
– Easy to achieve in the lab sciences
– Rarely, if ever, achieved in social science
experiments
• The average treatment effect (ATE) is the
difference for the same person in Y with and
without the treatment, hence = c.
13
• But if the division into treatment or not is
correlated with any of the unobserved
variables, the estimate of c is biased
– This is the idea behind selectivity
• Randomization provides the appropriate
counterfactual, as I indicated earlier
– So comparing the average of those in the
treatment group to those not in the treatment
group gives us a statistically valid measure of c.
14
Two special issues in (mostly field) experiments
are adherence (fidelity to the treatment) and
attrition (dropping out of the experiment).
These can bias the results.
• Lack of fidelity often means that while it appears
someone had the treatment, they really
didn't. This biases results downward.
• Attrition may be tied to the lack of success of
the treatment. This biases the results
upward.
15
• Hence, the ATE may be biased.
• The solution is an intent-to-treat (ITT) analysis:
compare outcomes based on the initial
treatment assignment, not on the treatment
eventually administered.
• ITT is pragmatic, focusing on the outcome that can
be expected when the treatment is applied in
practice, as opposed to only on the pure treatment effect.
• The hypothesis that an ITT analysis addresses is
pragmatic: the effectiveness of a therapy when
used on autonomous individuals.
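A sketch of the adherence problem (all parameters hypothetical): with 70% adherence, comparing by initial assignment recovers the effect of the treatment as really applied, roughly adherence times c, rather than the pure treatment effect c.

```python
import random

random.seed(2)

# Hypothetical parameters: true effect among those actually treated,
# and the share of assigned units who adhere to the treatment.
c, n = 2.0, 20000
adherence = 0.7  # 30% of assigned units never take the treatment

assigned, not_assigned = [], []
for _ in range(n):
    if random.random() < 0.5:
        took = random.random() < adherence  # imperfect adherence
        y = c * took + random.gauss(0, 1)
        assigned.append(y)
    else:
        not_assigned.append(random.gauss(0, 1))

# ITT: compare by initial assignment, regardless of adherence.
itt = sum(assigned) / len(assigned) - sum(not_assigned) / len(not_assigned)
print(round(itt, 1))  # roughly adherence * c = 1.4
```

The ITT estimate answers the pragmatic question of how well the treatment works when offered to autonomous individuals, some of whom will not take it.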
16
The List et al. paper offers specifics on achieving these
goals so the data are statistically reliable. It comes up
with several rules of thumb:
1. With a continuous outcome, treatment and control
groups should be the same size only if the sample
variances of the outcomes are expected to be
equal.
2. If sample variances are not equal, the ratio of sample
sizes should be equal to the ratio of the standard
deviations.
3. If the cost of sampling varies across treatment cells,
ratio of sample sizes should be inversely related to
the square root of the relative costs.
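Rules 1-3 can be combined into a single allocation formula, n_treatment / n_control = (sigma_t / sigma_c) x sqrt(cost_c / cost_t). A small sketch with hypothetical values:

```python
import math

# Sketch of the List et al. allocation rules of thumb:
# Rule 2: sample sizes proportional to outcome standard deviations.
# Rule 3: sample sizes inversely proportional to sqrt(relative cost).

def allocation_ratio(sigma_t, sigma_c, cost_t=1.0, cost_c=1.0):
    """Ratio n_treatment / n_control combining rules 2 and 3."""
    return (sigma_t / sigma_c) * math.sqrt(cost_c / cost_t)

# Equal variances, equal costs: equal sample sizes (rule 1).
print(allocation_ratio(1.0, 1.0))              # 1.0
# Treatment outcomes twice as noisy: sample twice as many treated.
print(allocation_ratio(2.0, 1.0))              # 2.0
# Treatment observations four times as costly: halve the treated share.
print(allocation_ratio(1.0, 1.0, cost_t=4.0))  # 0.5
```

The noisier and cheaper a cell is to sample, the more observations it should get for a given budget.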
17
List, et al (continued)
4. When the unit of randomization differs from the
unit of analysis (for example, randomizing treatment by
school but measuring outcomes by student), you need to
worry about correlations within the cluster that determines
the randomization.
5. When the treatment variable is not discrete (a limited
number of treatments) but instead is continuous, the
number of cells should equal the order of the
treatment plus one.
– If the expected impact is linear, the sample is divided
into no treatment and full treatment.
– If the expected impact is quadratic, the sample is divided
into three cells: no treatment, an intermediate level near the
middle of treatment intensity, and full treatment.
18