Experiments and Causal Inference
We had a brief discussion of the role of randomized
experiments in estimating causal effects earlier on.
Today we take a deeper look.
Key concepts:
Explanatory, response, and confounding variables;
Randomized comparison
Statistical significance; Effect size
Placebo effect, double-blind experiments
Internal and External validity
Completely randomized design vs. block design
Isolating Causal Effects: The Logic of
Experimental Design
We want to study the causal effect of some explanatory
(“independent”) variable on some response (“dependent”)
variable. We need to eliminate the effect of confounding variables.
e.g. Effects of sitting in the front rows on course grades.
Confounding factors? e.g. How might we conduct a classroom
experiment on group label effects?
Logic of experimental design:
Randomization produces comparable groups before different
treatments (no-treatment being a special case) are applied to them.
Because the groups are comparable (up to chance variation) in every
respect except the treatment assignments, differences in the response
variable can be attributed to the effects of the treatments.
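The claim that randomization produces comparable groups can be checked with a small simulation. The sketch below is illustrative only: the confounder (prior GPA), its distribution, and the sample size are made-up assumptions, not data from the course.

```python
import random
import statistics


def randomize(subjects, rng):
    """Shuffle the subjects and split them into two equal-sized groups."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical confounder: each student's prior GPA (simulated values).
prior_gpa = [rng.gauss(3.0, 0.5) for _ in range(1000)]

# Assign students to treatment and control purely by chance.
treatment, control = randomize(prior_gpa, rng)

# With random assignment the groups should have similar mean prior GPA.
gap = statistics.mean(treatment) - statistics.mean(control)
print(f"difference in mean prior GPA: {gap:.3f}")
```

The gap is not exactly zero, only small relative to the spread of the confounder, and it shrinks as the number of subjects grows.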
Confounding (Lurking) Variables
The solution:
Experiment: randomize, so that possible confounding
variables “even out” across groups
Observational Study: measure potential
confounding variables and determine if they have an
impact on the response
(may then adjust for these variables in the statistical
analysis. e.g. using them as “control” variables in
regression models)
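A minimal sketch of adjusting for a measured confounder, using the front-row example. Everything here is simulated under assumptions chosen for illustration: a hypothetical "motivation" variable drives both seating and grades, and the true effect of sitting in front is set to zero. The adjustment regresses both variables on the confounder and then relates the residuals, which gives the same coefficient as including the confounder as a control variable in a linear regression.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """The part of y that is not linearly explained by x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


rng = random.Random(1)
n = 5000

# Hypothetical confounder: motivated students tend to sit in front
# AND tend to earn higher grades.
motivation = [rng.gauss(0, 1) for _ in range(n)]
front = [1.0 if m + rng.gauss(0, 1) > 0 else 0.0 for m in motivation]
# Assumption: sitting in front has NO causal effect on the grade.
grade = [70 + 5 * m + rng.gauss(0, 5) for m in motivation]

# Naive comparison: front-row students appear to score higher.
naive = slope(front, grade)

# Adjusted comparison: control for motivation before comparing.
adjusted = slope(residuals(motivation, front), residuals(motivation, grade))

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

The adjustment only works for confounders that were actually measured; an unmeasured confounder would still bias the estimate, which is why randomization is preferred when it is possible.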
Randomized Experiment vs.
Observational Study
Both typically have the goal of detecting a relationship
between the explanatory and response variables.
Randomized Experiment
Create differences in the explanatory variable and examine
any resulting changes in the response variable
Observational Study
Observe differences in the explanatory variable and notice
any related differences in the response variable
Why Not Always Use a Randomized Experiment?
Sometimes it is unethical or impossible to assign
people to receive a specific treatment. e.g.
Does talking on the cell phone while driving
increase the risk of having an accident?
Do religious people live longer?
Certain explanatory variables, such as handedness
or gender, are inherent traits and cannot be
randomly assigned.
Vocabulary of Experiments
Multiple Treatment Values
e.g. Effects of TV advertising: length of time;
frequency of repetition
e.g. Do Get Out the Vote efforts work? Which
methods work better? (Gerber and Green field
Personal canvassing
Direct mail
Phone calls
Example Design: Energy Conservation
Multiple Treatment Values (different methods for
monitoring energy use)
Statistical Significance
If an experiment (or observational study) finds a
difference in two (or more) groups, is this difference
“real”? Could it be due to chance?
If the “true” difference in the “population” is 0, what is
the probability that we observe a “sample” difference
of this size?
If this probability is very small, then we call the
observed effect statistically significant. (We'll learn
later on how to determine this probability)
Significance is partly determined by sample size, i.e.,
number of subjects in an experiment
“Significant” typically means “non-zero effect”. We
should also look at the actual effect size to determine
if it is practically important.
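One way to approximate the probability described above is a permutation test: if the true difference is 0, the group labels are arbitrary, so we can reshuffle them many times and see how often a difference at least as large as the observed one arises by chance. This is a sketch of one possible method, not necessarily the one covered later in the course; the scores and the function name are made up for illustration.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Share of random relabelings whose absolute mean difference is at
    least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the labels were assigned by chance
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical response values for two groups of 8 subjects.
treated = [12, 15, 14, 16, 13, 17, 15, 16]
control = [10, 11, 9, 12, 10, 11, 13, 10]

p = permutation_p_value(treated, control)
effect_size = statistics.mean(treated) - statistics.mean(control)
print(f"effect size: {effect_size}, approximate p-value: {p}")
```

The p-value and the effect size answer different questions: the first says whether the difference could plausibly be chance, the second says how big the difference is.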
Experiments in the Real world:
Issues and Techniques
Hawthorne, placebo and experimenter effects
(psychological effects); Double blind designs.
Issues of internal validity: refusals, nonadherence, dropouts
External validity: generalizing the results
Toward more powerful inference: block designs
and matched pairs (special case of block design)
Hawthorne, Placebo and Experimenter Effects
The problem:
People may respond differently when they know
they are part of an experiment.
For example, the observed effect of a new
drug could be the placebo effect (A) plus any real effect of
the medicine (B).
The solution:
Use placebos, control groups, and double-blind
studies when possible to isolate (B)
The Hawthorne Effect
1920s experiment at the Hawthorne Works of the
Western Electric Company
What changes in working conditions improve
productivity of workers?
More lighting?
Less lighting?
Other changes?
All changes improved productivity!
Double-blind experiment?
Double-Blinded Experiment: an Example
Quitting Smoking with Nicotine Patches (JAMA, Feb.
23, 1994, pp. 595-600)
Explanatory: Treatment assignment
Response: Cessation of smoking (yes/no)
Participants don’t know which patch they received
Nor do those measuring smoking behavior
Internal Validity:
Are we getting at the causal effect right within the study?
Non-adherers (not following procedure)
Problem: individuals in these categories are probably not random samples of the
Comparing Pre- and Post-Treatment Results
When randomization fails due to issues such as noncompliance, the pre/post comparison can be helpful.
Differences between the treatment and control groups
that do not change over time (or that change over time in a
similar fashion) get differenced out
The so-called “difference-in-differences” estimation
method compares the shift in the response variable,
not the post-treatment response itself.
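A minimal sketch of the difference-in-differences calculation. The numbers are made up so that the arithmetic is easy to follow: the treated group starts 10 points higher (a fixed group difference) and both groups drift up by 2 points (a shared time trend). The function name is hypothetical.

```python
import statistics


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """(Change in the treated group's mean) minus (change in the control
    group's mean)."""
    treat_shift = statistics.mean(treat_post) - statistics.mean(treat_pre)
    ctrl_shift = statistics.mean(ctrl_post) - statistics.mean(ctrl_pre)
    return treat_shift - ctrl_shift


# Hypothetical response values.
ctrl_pre = [50, 52, 48, 50]    # mean 50
ctrl_post = [52, 54, 50, 52]   # mean 52: shared trend of +2
treat_pre = [60, 62, 58, 60]   # mean 60: starts 10 higher
treat_post = [67, 69, 65, 67]  # mean 67: trend of +2 plus the treatment

# Comparing post-treatment responses mixes in the pre-existing gap.
naive_post = statistics.mean(treat_post) - statistics.mean(ctrl_post)

# Comparing the shifts removes both the fixed gap and the shared trend.
estimate = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)

print(f"post-only comparison: {naive_post}")       # 67 - 52 = 15
print(f"difference in differences: {estimate}")    # (67 - 60) - (52 - 50) = 5
```

The estimate is only as good as the assumption that, without treatment, both groups would have changed by the same amount.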
External Validity: Generalizing the Results
Potential problems:
lack of generalizability due to:
unrealistic treatments
unnatural settings
sample not representative of population (e.g.
Undergraduate students not representative of the larger
“Natural experiments” or “quasi-experiments” can have an
advantage in some of these as they take place in the real world.
e.g. Smoking ban in Helena, Montana for 6 months in 2002.
Helena geographically isolated, served by only one hospital.
Observed that during the ban heart attack rate dropped by
Experimental Design: Blocking (Stratification)
Randomization cannot eliminate chance differences between groups.
The smaller the N, the larger these differences tend to be.
(think of the extreme case of randomizing 4 people!)
To improve: Stratify the subjects into similar groups
Within each stratum, randomly assign subjects to treatments
Reduces variance in estimated effects by reducing
variance in the (stratified) population.
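The variance reduction from blocking can be seen in a small simulation. This is a sketch under made-up assumptions: 20 subjects in two equal blocks, a block variable that shifts the response by 10 points, and a constant true treatment effect of 2. It compares the spread of the estimated effect under a completely randomized design and a block design.

```python
import random
import statistics


def estimate_effect(outcomes, treated):
    """Difference in mean response between treated and control subjects."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return statistics.mean(t) - statistics.mean(c)


def complete_assignment(n, rng):
    """Completely randomized design: half of all subjects are treated."""
    labels = [True] * (n // 2) + [False] * (n - n // 2)
    rng.shuffle(labels)
    return labels


def blocked_assignment(blocks, rng):
    """Block design: half of the subjects WITHIN each block are treated."""
    treated = [False] * len(blocks)
    for b in sorted(set(blocks)):
        idx = [i for i, x in enumerate(blocks) if x == b]
        rng.shuffle(idx)
        for i in idx[: len(idx) // 2]:
            treated[i] = True
    return treated


rng = random.Random(42)
n = 20
blocks = [0] * 10 + [1] * 10  # hypothetical strata
baseline = [10 * b + rng.gauss(0, 1) for b in blocks]  # block shifts response
true_effect = 2.0  # assumed constant treatment effect


def simulate(assign, reps=2000):
    """Re-randomize many times and collect the estimated effects."""
    estimates = []
    for _ in range(reps):
        treated = assign()
        outcomes = [y + true_effect * d for y, d in zip(baseline, treated)]
        estimates.append(estimate_effect(outcomes, treated))
    return estimates


complete = simulate(lambda: complete_assignment(n, rng))
blocked = simulate(lambda: blocked_assignment(blocks, rng))
sd_complete = statistics.stdev(complete)
sd_blocked = statistics.stdev(blocked)

print(f"spread of estimates, completely randomized: {sd_complete:.2f}")
print(f"spread of estimates, block design:          {sd_blocked:.2f}")
```

Both designs are centered on the true effect; the block design is tighter because the two groups can never differ in their mix of strata.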
Block Design Example:
Effects of TV ads.