
Experiments and Causal Inference
● We had a brief discussion earlier of the role of randomized experiments in estimating causal effects. Today we take a deeper look.
● Key concepts:
– Explanatory, response, and confounding variables; treatments
– Randomized comparison
– Statistical significance; effect size
– Placebo effect; double-blind experiments
– Internal and external validity
– Completely randomized design vs. block design
Isolating Causal Effects: The Logic of Experimental Design
● We want to study the causal effect of some explanatory (“independent”) variable on some response (“dependent”) variable. To do so, we need to eliminate the effect of confounding variables.
– e.g. The effect of sitting in the front rows on course grades. What are the possible confounding factors?
– e.g. How might we conduct a classroom experiment on group label effects?
● Logic of experimental design:
– Randomization produces comparable groups before different treatments (no treatment being a special case) are applied to the groups.
– Because the groups are comparable in every respect except their treatment assignments, differences in the response variable must be due to the effects of the treatments (apart from chance variation).
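The logic above can be illustrated with a small simulation. This is a sketch built on made-up assumptions: the function name `simulate_experiment`, the hypothetical "motivation" confounder, and every numeric parameter are illustrative and do not come from the course. Random assignment should leave the confounder roughly balanced across the two groups. The difference in mean responses should therefore land close to the treatment effect built into the simulation.

```python
import random
import statistics


def simulate_experiment(n=2000, true_effect=5.0, seed=1):
    """Randomly assign n hypothetical subjects to treatment (1) or control (0).

    Returns (estimated effect, confounder imbalance): the difference in mean
    responses and the difference in mean confounder values between groups.
    """
    rng = random.Random(seed)
    # Hypothetical confounder (e.g. motivation) that also raises the response.
    motivation = [rng.gauss(0, 10) for _ in range(n)]
    # Randomization: shuffle a list with equal numbers of 1s and 0s.
    labels = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(labels)
    # Response = baseline + confounder + treatment effect (if treated) + noise.
    response = [50 + m + true_effect * t + rng.gauss(0, 5)
                for m, t in zip(motivation, labels)]

    def group_mean(values, group):
        return statistics.mean(v for v, t in zip(values, labels) if t == group)

    estimate = group_mean(response, 1) - group_mean(response, 0)
    imbalance = group_mean(motivation, 1) - group_mean(motivation, 0)
    return estimate, imbalance


est, gap = simulate_experiment()
print(f"estimated effect: {est:.2f}, confounder imbalance: {gap:.2f}")
```

Re-running with a very small `n` and different seeds shows how much larger the chance imbalances become in small groups.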
Confounding (Lurking) Variables
● The solution:
– Experiment: randomize, so that possible confounding variables should “even out” across groups
– Observational study: measure potential confounding variables and determine whether they have an impact on the response (we may then adjust for these variables in the statistical analysis, e.g. by using them as “control” variables in regression models)
Randomized Experiment vs. Observational Study
● Both typically have the goal of detecting a relationship between the explanatory and response variables.
● Experiment
– Create differences in the explanatory variable and examine any resulting changes in the response variable
● Observational study
– Observe differences in the explanatory variable and notice any related differences in the response variable
Why Not Always Use a Randomized Experiment?
● Sometimes it is unethical or impossible to assign people to receive a specific treatment, e.g.
– Does talking on a cell phone while driving increase the risk of having an accident?
– Do religious people live longer?
● Certain explanatory variables, such as handedness or gender, are inherent traits and cannot be randomly assigned.
Vocabulary of Experiments
Multiple Treatment Values
● e.g. Effects of TV advertising: length of time; frequency of repetition
● e.g. Do Get Out the Vote efforts work? Which methods work better? (Gerber and Green field experiments)
– Personal canvassing
– Direct mail
– Phone calls
Example Design: Energy Conservation
Multiple Treatment Values (different methods for
monitoring energy use)
Statistical Significance
● If an experiment (or observational study) finds a difference between two (or more) groups, is this difference “real”? Could it be due to chance?
● If the “true” difference in the “population” is 0, what is the probability that we observe a “sample” difference at least as large as this one?
● If this probability is very small, then we call the observed effect statistically significant. (We'll learn later how to determine this probability.)
● Significance is partly determined by sample size, i.e., the number of subjects in an experiment.
● “Significant” typically means “non-zero effect”. We should also look at the actual effect size to determine whether the effect is practically important.
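A permutation test is one common way to compute the probability described above. The course may introduce a different method, such as a t-test, later. The sketch below is illustrative only: the function names are hypothetical. Cohen's d is shown as one standard way to report effect size, but it is not the only one.

```python
import random
import statistics


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    If the true difference is 0, the group labels are interchangeable, so we
    reshuffle them and count how often a shuffled difference is at least as
    large (in absolute value) as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction: counts the observed labeling as one of the permutations.
    return (extreme + 1) / (n_perm + 1)


def cohens_d(group_a, group_b):
    """Effect size: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    return (_mean(group_a) - _mean(group_b)) / pooled_var ** 0.5


treated = [5, 6, 7, 8, 9]
control = [0, 1, 2, 3, 4]
print(permutation_p_value(treated, control), cohens_d(treated, control))
```

With the same effect size, larger groups typically produce smaller p-values. This is why a significant result should be read alongside its effect size.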
Experiments in the Real World: Issues and Techniques
● Hawthorne, placebo and experimenter effects (psychological effects); double-blind designs
● Issues of internal validity: refusals, nonadherence, dropouts
● External validity: generalizing the results
● Toward more powerful inference: block designs and matched pairs (a special case of block design)
Hawthorne, Placebo and Experimenter Effects
● The problem:
– People may respond differently when they know they are part of an experiment.
– So, for example, the measured effect of a new drug could be the placebo effect (A) + any real effect of the medicine (B).
● The solution:
– Use placebos, control groups, and double-blind studies when possible to isolate (B).
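A hypothetical three-arm simulation makes the decomposition into (A) and (B) concrete. The three arms are no treatment, placebo, and active drug. The arm names and effect sizes are illustrative, as is the assumption that the placebo effect simply adds to the drug effect. Comparing the drug arm with the no-treatment arm estimates A + B. Comparing it with the placebo arm isolates B.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def three_arm_trial(n_per_arm=3000, placebo_effect=3.0, drug_effect=2.0, seed=3):
    """Simulate outcomes for three hypothetical, already-randomized arms.

    Taking any pill adds `placebo_effect` (A); the active drug adds
    `drug_effect` (B) on top of that.
    """
    rng = random.Random(seed)

    def arm(shift):
        return [10 + shift + rng.gauss(0, 4) for _ in range(n_per_arm)]

    no_treatment = arm(0.0)
    placebo = arm(placebo_effect)
    drug = arm(placebo_effect + drug_effect)
    return {
        "drug_vs_no_treatment": mean(drug) - mean(no_treatment),  # estimates A + B
        "drug_vs_placebo": mean(drug) - mean(placebo),            # estimates B
    }


print(three_arm_trial())
```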
The Hawthorne Effects
● 1920s experiment at the Hawthorne Works of the Western Electric Company
● What changes in working conditions improve the productivity of workers?
– More lighting?
– Less lighting?
– Other changes?
● All changes improved productivity!
Double-blind experiment?
Double-Blinded Experiment: an Example
● Quitting Smoking with Nicotine Patches (JAMA, Feb. 23, 1994, pp. 595-600)
● Variables:
– Explanatory: treatment assignment
– Response: cessation of smoking (yes/no)
● Double-blinded
– Participants don’t know which patch they received
– Nor do those measuring smoking behavior
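A double-blind allocation can be sketched as follows. Each participant receives an opaque kit code, and a separately held key links codes to arms. As a result, neither participants nor the staff measuring smoking behavior can tell the arms apart until the analysis. This is a hypothetical illustration, not the procedure used in the JAMA study. The kit-code format, function names, and arm labels are assumptions.

```python
import random


def blinded_allocation(participant_ids, seed=2024):
    """Randomly allocate participants to 'active' or 'placebo' behind kit codes.

    Returns (kit_of, key): kit_of maps participant -> kit code and is all that
    participants and outcome assessors see; key maps kit code -> arm and is
    held by a third party until the analysis.
    """
    rng = random.Random(seed)
    n = len(participant_ids)
    arms = ["active", "placebo"] * (n // 2) + ["active"] * (n % 2)
    rng.shuffle(arms)
    codes = rng.sample(range(1000, 10000), n)  # unique, uninformative numbers
    kit_of = {pid: f"KIT-{code}" for pid, code in zip(participant_ids, codes)}
    key = {f"KIT-{code}": arm for code, arm in zip(codes, arms)}
    return kit_of, key


def cessation_rates(quit_status, kit_of, key):
    """Unblind at analysis time: share of participants who quit, per arm."""
    counts = {}
    for pid, has_quit in quit_status.items():
        arm = key[kit_of[pid]]
        quits, total = counts.get(arm, (0, 0))
        counts[arm] = (quits + int(has_quit), total + 1)
    return {arm: q / t for arm, (q, t) in counts.items()}
```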
Internal Validity: Are we estimating the causal effect correctly within the study?
● Issues:
– Refusals
– Non-adherers (not following procedure)
– Dropouts
● Problem: individuals in these categories are probably not random samples of the subjects
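To see why non-random dropout threatens internal validity, consider a hypothetical randomized study. The treatment truly has no effect, but treated subjects with poor outcomes sometimes leave. The dropout rule and numbers below are assumptions chosen for illustration. The comparison using all randomized subjects stays near zero; a real study could not compute it, because dropouts' outcomes are unobserved. The comparison restricted to subjects who remain shows a spurious benefit.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def dropout_bias(n=10000, true_effect=0.0, seed=11):
    """Compare an all-subjects estimate with a completers-only estimate."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        is_treated = rng.random() < 0.5  # randomized assignment
        y = rng.gauss(0, 1) + (true_effect if is_treated else 0.0)
        (treated if is_treated else control).append(y)
    # Hypothetical dropout rule: treated subjects with poor outcomes (y < 0)
    # leave half the time; control subjects never drop out.
    completers = [y for y in treated if y >= 0 or rng.random() < 0.5]
    all_subjects = mean(treated) - mean(control)  # unobservable in practice
    completers_only = mean(completers) - mean(control)
    return all_subjects, completers_only


print(dropout_bias())
```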
Comparing Pre- and Post-Treatment Results
● When randomization fails due to issues such as noncompliance, a pre/post comparison can be helpful.
● Differences between the treatment and control groups that do not change over time (or that change over time in a similar fashion) get differenced out.
● The so-called “difference in differences” estimation method compares the shift in the response variable, not the post-treatment response itself.
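Here is a minimal difference-in-differences sketch with hypothetical group gaps, time trend, and effect size. The treated group starts higher than the control group. A post-treatment-only comparison therefore mixes that pre-existing gap with the treatment effect, whereas differencing each group's pre/post change removes it. The sketch assumes the two groups would have followed the same trend without treatment (the usual parallel-trends assumption).

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def did_estimate(pre_t, post_t, pre_c, post_c):
    """Change in the treated group minus change in the control group."""
    return (mean(post_t) - mean(pre_t)) - (mean(post_c) - mean(pre_c))


def simulate_did(n=4000, group_gap=8.0, time_trend=3.0, true_effect=2.0, seed=5):
    """Hypothetical data: the treated group starts `group_gap` higher, and both
    groups share the same `time_trend` between the pre and post periods."""
    rng = random.Random(seed)

    def sample(level):
        return [level + rng.gauss(0, 2) for _ in range(n)]

    pre_c, post_c = sample(50.0), sample(50.0 + time_trend)
    pre_t = sample(50.0 + group_gap)
    post_t = sample(50.0 + group_gap + time_trend + true_effect)
    post_only = mean(post_t) - mean(post_c)  # biased by the pre-existing gap
    return post_only, did_estimate(pre_t, post_t, pre_c, post_c)


print(did_estimate([1, 1], [4, 4], [0, 0], [1, 1]))  # prints 2.0: (4 - 1) - (1 - 0)
```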
External Validity: Generalizing the Results
● Potential problems:
– lack of generalizability due to:
● unrealistic treatments
● unnatural settings
● sample not representative of the population (e.g. undergraduate students are not representative of the larger population)
● “Natural experiments” or “quasi-experiments” can have an advantage on some of these issues because they take place in the real world.
– e.g. A smoking ban in Helena, Montana, for 6 months in 2002. Helena is geographically isolated and served by only one hospital. During the ban, the observed heart attack rate dropped by 60%.
Experimental Design: Blocking (Stratification)
● Randomization cannot eliminate chance differences between groups. The smaller the N, the larger these differences between the two groups tend to be. (Think of the extreme case of randomizing 4 people!)
● To improve: stratify the subjects into groups of similar subjects
● Within each stratum, run a randomized experiment
● This reduces the variance of the estimated effects by reducing the variance within each (stratified) subpopulation.
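A small simulation can check the variance reduction from blocking. Everything here is hypothetical. There are 20 subjects, split into a "high" block and a "low" block whose baselines differ by 10. The true effect is 1, and the random assignment is repeated many times. Complete randomization is unbiased, but its estimates vary widely because chance decides how many high-baseline subjects land in the treatment group. Randomizing within blocks keeps each block split evenly, so that source of variation disappears.

```python
import random
import statistics


def compare_designs(n_reps=2000, effect=1.0, seed=42):
    """Variance of the estimated effect under complete vs. blocked randomization."""
    rng = random.Random(seed)
    # 20 hypothetical subjects: 10 in a 'high' block, 10 in a 'low' block.
    blocks = ["high"] * 10 + ["low"] * 10
    baseline = {"high": 10.0, "low": 0.0}

    def diff_in_means(assign):
        # assign[i] == 1 means subject i is treated; noise is redrawn each time.
        y = [baseline[b] + effect * a + rng.gauss(0, 1)
             for b, a in zip(blocks, assign)]
        treated = [v for v, a in zip(y, assign) if a == 1]
        control = [v for v, a in zip(y, assign) if a == 0]
        return sum(treated) / len(treated) - sum(control) / len(control)

    complete, blocked = [], []
    for _ in range(n_reps):
        # Completely randomized design: 10 treatment labels shuffled over all 20.
        assign = [1] * 10 + [0] * 10
        rng.shuffle(assign)
        complete.append(diff_in_means(assign))
        # Block design: exactly 5 of the 10 subjects in each block are treated.
        high, low = [1] * 5 + [0] * 5, [1] * 5 + [0] * 5
        rng.shuffle(high)
        rng.shuffle(low)
        blocked.append(diff_in_means(high + low))  # order matches `blocks`
    return statistics.variance(complete), statistics.variance(blocked)


print(compare_designs())
```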
Block Design Example: Effects of TV Ads