Experiments and Observational Studies - Peacock


Experiments and
Observational
Studies
Chapter 13
Objectives:
• Observational study
• Retrospective study
• Prospective study
• Experiment
• Experimental units
• Treatment
• Response
• Factor
• Level
• Principles of experimental design
• Statistically significant
• Control group
• Blinding
• Placebo
• Blocking
• Matching
• Confounding
Observational Study
• Observes individuals and records variables of interest but
does not attempt to influence the response (does not impose
a treatment).
– Allows the researcher to directly observe the behavior of
interest rather than rely on the subject’s self-descriptions
(survey).
– Allows the researcher to study the subject in its natural
environment, thus removing the potentially biased effect
of the unnatural laboratory setting on the subject’s
performance (animal behavior).
Observational Study
• Two Types
1. Field Observation – Observations are made in a
particular natural setting over an extended period of
time.
2. Systematic Observation – Observations of one or more
particular behaviors in a specific setting.
• Since Observational Studies do not impose a treatment it is
not possible to prove a cause-and-effect relationship with
an observational study.
Observational Study
• Example:
– Researchers compared the scholastic performance of
music students with that of non-music students. The music
students had a much higher overall grade point average
than the non-music students, 3.59 to 2.91. Also, 16% of
the music students had all A’s compared with only 5% of
the non-music students.
Observational Study
• In an observational study, researchers don’t assign choices;
they simply observe them.
– The example looked at the relationship between music
education and grades.
– Since the researchers did not assign students to get music
education and simply observed students “in the wild,” it
was an observational study.
– Because researchers in the example first identified
subjects who studied music and then collected data on
their past grades, this was a
retrospective study.
Retrospective Study
• Observational studies that try to discover variables related to
rare outcomes, such as specific diseases, are often
retrospective. They first identify people with the disease and
then look into their history and heritage in search of things
that may be related to their condition.
• Retrospective studies have a restricted view of the world
because they are usually restricted to a small part of the
entire population.
• Because retrospective studies are based on historical data,
they can have errors.
– Do you recall exactly what you ate yesterday? How
about last Monday?
Prospective Study
• A somewhat better approach to an observational study than
using historical data, as a retrospective study does, is to
identify subjects in advance and collect data as events unfold.
This is called a prospective study.
• In our example studying the relationship between music
education and grades, had the researchers identified subjects
in advance and collected data over an entire school year or
years, the study would have been a prospective study.
Observational Study
• Observational studies are valuable for discovering
trends and possible relationships.
• However, it is not possible for observational studies,
whether prospective or retrospective, to
demonstrate a cause and effect relationship. There
are too many lurking variables that may affect the
relationship.
Experiment
• Definition: Experiment – deliberately imposes some
treatment on individuals in order to observe their
responses.
• Basic Experimental Design
– Subject → Treatment → Observation
• Because the purpose of an experiment is to reveal the
response of one variable to changes in other
variables, the distinction between explanatory and
response variables is essential.
Experiment
• An experiment is a study design that allows us to prove a
cause-and-effect relationship.
• In an experiment, the experimenter must identify at least one
explanatory variable, called a factor, to manipulate and at least
one response variable to measure.
• An experiment:
– Manipulates factor levels to create treatments.
– Randomly assigns subjects to these treatment levels.
– Compares the responses of the subject
groups across treatment levels.
Experiment
• In an experiment, the experimenter actively and
deliberately manipulates the factors to control the
details of the possible treatments, and assigns the
subjects to those treatments at random.
• The experimenter then observes the response
variable and compares responses for different groups
of subjects who have been treated differently.
Experiment
• In general, the individuals on whom or which we
experiment are called experimental units.
– When humans are involved, they are commonly called
subjects or participants.
• The specific values that the experimenter chooses for
a factor are called the levels of the factor.
• A treatment is a combination of specific levels from
all the factors that an experimental unit receives.
Review - Experimental Terminology
• Experimental Units
– The individuals or items on which the experiment is
performed.
– When the experimental units are human beings, the term
subject is often used in place of experimental unit.
• Response variable
– The characteristic of the experimental outcome that is
being measured or observed.
Review - Experimental Terminology
• Factor
– The explanatory variables in an experiment.
– A variable whose effect on the response variable is of
interest in the experiment.
• Levels
– The different possible values of a factor.
Review - Experimental Terminology
• Treatment
– A specific experimental condition applied to the units of an
experiment.
– For one-factor experiments, the treatments are the levels
of the single factor.
– For multifactor experiments, each treatment is a
combination of the levels of the factors.
Example:
• Researchers studying the absorption of a drug into
the bloodstream inject the drug into 25 people. 30
minutes after the injection they measure the
concentration of the drug in each person’s blood.
• Identify the:
a) Experimental units.
b) Response variable.
c) Factors.
d) Levels of each factor.
e) Treatments.
Answer:
Researchers studying the absorption of a drug into the bloodstream inject the drug
into 25 people. 30 minutes after the injection they measure the concentration of
the drug in each person’s blood.
a) Experimental units
– Subjects, the 25 people injected
b) Response variable
– Concentration of the drug in the blood
c) Factors
– Single factor – the drug
d) Levels
– One level – the dose
e) Treatment
– Injecting the drug
Your Turn:
• Weight gain of Golden Torch Cacti. Researchers examined the
effects of a hydrophilic polymer and irrigation regime on
weight gain. For this study the researchers chose the
hydrophilic polymer P4. P4 was either used or not used, and
five irrigation regimes were employed: none, light, medium,
heavy, and very heavy.
• Identify the:
a) Experimental units.
b) Response variable.
c) Factors.
d) Levels of each factor.
e) Treatments.
Answer:
Weight gain of Golden Torch Cacti. Researchers examined the effects of a hydrophilic polymer and irrigation
regime on weight gain. For this study the researchers chose the hydrophilic polymer P4. P4 was either
used or not used, and five irrigation regimes were employed: none, light, medium, heavy, and very heavy.
a) Experimental units
– The cacti used in the study
b) Response variable
– The weight gain of the cacti
c) Factors
– Two factors – the hydrophilic polymer P4 and the irrigation regime
d) Levels
– P4 has two levels: with and without.
– Irrigation regime has five levels: none, light, medium, heavy, and very heavy.
e) Treatment
– There are 10 different treatments, each a combination of a level
of P4 and a level of irrigation regime. See next slide for treatments.
Schematic for the 10 Treatments in the Cactus Study
(Diagram rows: Factors, Levels, Treatments)
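The 10 treatments can also be listed by pairing every level of P4 with every level of irrigation. The sketch below is only an illustration. The level names follow the slide, while the variable names and the wording "with P4" / "without P4" are assumptions.

```python
from itertools import product

# Factor levels as listed in the cactus study.
polymer_levels = ["with P4", "without P4"]
irrigation_levels = ["none", "light", "medium", "heavy", "very heavy"]

# Each treatment is one level of P4 combined with one level of irrigation.
treatments = list(product(polymer_levels, irrigation_levels))

for number, (polymer, irrigation) in enumerate(treatments, start=1):
    print(f"Treatment {number}: {polymer}, irrigation = {irrigation}")
```

Two levels of one factor times five levels of the other gives 2 × 5 = 10 treatments.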
Randomized, Comparative Experiment
1. Manipulates the factor levels to create
treatments.
2. Randomly assigns subjects to these
treatments.
3. Compares the responses of the subject
groups across treatment levels.
The Four Principles of Experimental
Design
1. Control
2. Randomize
3. Replicate
4. Block
The Four Principles of Experimental
Design
1. Control:
– Good experimental design reduces variability by
controlling the sources of variation.
– We control sources of variation other than the factors we
are testing by making conditions as similar as possible for
all treatment groups.
– Comparison is an important form of control. Every
experiment must have at least two groups so the effect
of a treatment can be compared with either the effect of
a traditional treatment or the effect of no
treatment at all.
The Four Principles of Experimental
Design
2. Randomize:
– Subjects should be randomly divided into groups to
avoid unintentional selection bias in constituting the
groups, that is, to make the groups as similar as possible.
– Randomization allows us to equalize the effects of
unknown or uncontrollable sources of variation.
• It does not eliminate the effects of these sources, but it spreads
them out across the treatment levels so that we can see past
them.
– Without randomization, you do not have a valid
experiment and will not be able to use the powerful
methods of Statistics to draw conclusions
from your study.
The Four Principles of Experimental
Design
2. Randomize:
– One source of variation is confounding variables (will
discuss later), variables that we did not think to measure
but which can affect the response variable.
– Randomization to treatment groups reduces bias by
equalizing the effects of confounding variables.
The Four Principles of Experimental
Design
3. Replicate:
– Repeat the experiment, applying the treatments to a
number of subjects.
• One or two subjects do not constitute an experiment.
• The outcome of an experiment on a single subject is an
anecdote, not data.
• A sufficient number of subjects should be used to ensure that
randomization creates groups that resemble each other closely
and to increase the chances of detecting differences among the
treatments when such differences actually exist.
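A small simulation can illustrate why more subjects help randomization create groups that resemble each other. This is a sketch under stated assumptions: each subject gets a made-up hidden trait drawn from a standard normal distribution, and the function name `average_imbalance` is hypothetical.

```python
import random
from statistics import fmean

def average_imbalance(n_subjects, n_trials=500, seed=0):
    """Average gap between the two groups' mean hidden trait
    over many random splits of n_subjects into two equal halves."""
    rng = random.Random(seed)
    half = n_subjects // 2
    gaps = []
    for _ in range(n_trials):
        # Hypothetical hidden trait, one value per subject.
        trait = [rng.gauss(0, 1) for _ in range(n_subjects)]
        rng.shuffle(trait)  # random assignment: first half vs. second half
        gaps.append(abs(fmean(trait[:half]) - fmean(trait[half:])))
    return fmean(gaps)

for n in (4, 20, 200):
    print(f"{n:3d} subjects: average gap between groups = {average_imbalance(n):.3f}")
```

In this setup the gap between the groups shrinks as the number of subjects grows, which is the sense in which larger groups "resemble each other closely."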
Example: Replication
The outcome of an experiment on a single subject is
an anecdote, not data.
The Four Principles of Experimental
Design
3. Replicate:
– When the experimental group is not a representative
sample of the population of interest, we might want to
replicate an entire experiment for different groups, in
different situations, etc.
• Replication of an entire experiment with the
controlled sources of variation at different
levels is an essential step in science.
– The experiment should be designed in such a way that
other researchers can replicate the results.
The Four Principles of Experimental
Design
4. Block:
– Sometimes, attributes of the experimental units that we
are not studying and that we can’t control may
nevertheless affect the outcomes of an experiment.
– If we group similar individuals together and then
randomize within each of these blocks, we can remove
much of the variability due to the difference among the
blocks.
– Note: Blocking is an important compromise between
randomization and control, but, unlike the
first three principles, is not required in
an experimental design.
Diagrams of Experiments
• It’s often helpful to diagram the procedure of an experiment.
• The following diagram emphasizes the random allocation of
subjects to treatment groups, the separate treatments
applied to these groups, and the ultimate comparison of
results:
Flow Chart
Logic of Experimental Design
1. Randomization produces groups of experimental units that
should be similar in all respects before the treatments are
applied.
2. Comparative design ensures that influences other than the
experimental treatments operate equally on all groups.
3. Therefore, differences in the response variable must be due
to the effects of the treatments. That is, the treatments not
only are associated with the observed differences in the
response but must actually cause them (cause and effect).
Does the Difference Make a Difference?
• How large do the differences need to be to say that there is a
difference in the treatments?
• Differences that are larger than we’d get just from the
randomization alone are called statistically significant.
• We’ll talk more about statistical significance later on. For now,
the important point is that a difference is statistically
significant if we don’t believe that it’s likely to have occurred
only by chance.
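One way to make "larger than we'd get just from the randomization alone" concrete is to re-shuffle the observed responses many times and see how often a difference at least as large shows up by chance. The sketch below assumes two treatment groups and uses invented response values. The function name `randomization_p_value` is hypothetical, and this is not necessarily the method the later chapters use.

```python
import random
from statistics import mean

def randomization_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Share of random re-assignments whose difference in means is at
    least as large (in absolute value) as the observed difference."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the treatment labels were dealt differently
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles

# Hypothetical responses, invented purely for illustration.
treated = [12.1, 11.8, 13.0, 12.6, 12.9, 13.4]
control = [10.2, 10.9, 11.1, 10.4, 11.5, 10.8]
p = randomization_p_value(treated, control)
print(f"Share of shuffles with a difference this large: {p:.4f}")
```

A small share suggests the observed difference is unlikely to have come from the randomization alone.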
Experiments and Samples
• Both experiments and sample surveys use randomization to
get unbiased data.
• But they do so in different ways and for different purposes:
– Sample surveys try to estimate population parameters, so
the sample needs to be as representative of the
population as possible.
– Experiments try to assess the effects of treatments, and
experimental units are not always drawn randomly from a
population.
Control Treatments
• Often, we want to compare a situation involving a
specific treatment to the status quo situation.
• A baseline (“business as usual”) measurement is
called a control treatment, and the experimental
units to which it is applied are called the control
group.
Blinding
• When we know what treatment was assigned, it’s difficult not
to let that knowledge influence our assessment of the
response, even when we try to be careful.
• In order to avoid the bias that might result from knowing what
treatment was assigned, we use blinding.
• There are two main classes of individuals who can affect the
outcome of the experiment:
– those who could influence the results (subjects, treatment
administrators, technicians)
– those who evaluate the results (judges, treating physicians, etc.)
Blinding
• When all individuals in either one of these classes are blinded,
an experiment is said to be single-blind.
– Single-Blind: An experiment is said to be single blind if the subjects of
the experiment do not know which treatment group they have been
assigned to or those who evaluate the results of the experiment do not
know how subjects have been allocated to treatment groups.
• When everyone in both classes is blinded, the experiment is
called double-blind.
– Double-Blind: An experiment is said to be double-blind if neither the
subject nor the evaluators know how the subjects have been allocated
to treatment groups.
Placebos
• Often simply applying any treatment can induce an
improvement.
• To separate out the effects of the treatment of
interest, we can use a control treatment that mimics
the treatment itself.
• A “fake” treatment that looks just like the treatment
being tested is called a placebo.
– Placebos are the best way to blind subjects from
knowing whether they are receiving
the treatment or not.
Placebos
• The placebo effect occurs when taking the sham
treatment results in a change in the response
variable.
– This highlights both the importance of effective
blinding and the importance of comparing
treatments with a control.
• Placebo controls are so effective that you should use
them as an essential tool for blinding
whenever possible.
Designing an Experiment
Step-By-Step
Completely Randomized Experiment
(the ideal simple design)
• Goal
– State what you want to know.
• Response
– Specify the response variable.
• Treatments
– Specify the factor levels and the treatments.
Designing an Experiment
Step-By-Step
• Experimental units
– Specify the experimental units.
• Experimental Design
– Observe the 4 principles of experimental design:
• Control – any sources of variability you know of and can control.
• Randomize – assign experimental units to treatments at random, to equalize the
effects of unknown or uncontrollable sources of variation. Specify how
the random numbers needed for randomization will be obtained.
• Replicate – place sufficient experimental units in each
treatment group.
• Block – if required, group similar individuals together.
Designing an Experiment
Step-By-Step
• Specify any other experiment details
– Give enough details so that another experimenter could
exactly replicate your experiment.
– How to measure the response.
Randomized Comparative Experiment
Example:
• Researchers believe that diuretics may be as effective
in reducing a person’s blood pressure as the
conventional drug (drug A), which is much more
expensive and has more unwanted side effects.
Design a randomized comparative experiment to test
this hypothesis.
Randomized Comparative Experiment
Example:
• Explanatory Variable
– Type of Medication
– Treatments: Diuretic, Drug A
• Response Variable
– Change in Blood Pressure
Randomized Comparative Experiment
Example:
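The random allocation step for this design might be sketched as follows. The slide does not give a number of subjects, so the 40 volunteer IDs are an assumption, as is the helper name `completely_randomized`.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn,
    so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for position, unit in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(unit)
    return groups

# Hypothetical pool of 40 volunteers with high blood pressure.
volunteers = [f"P{i:02d}" for i in range(1, 41)]
groups = completely_randomized(volunteers, ["Diuretic", "Drug A"], seed=13)

for treatment, members in groups.items():
    print(f"{treatment}: {len(members)} subjects")
```

After the treatment period, the change in blood pressure would be compared between the two groups.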
Randomized Comparative Experiment
Your Turn:
• Can chest pain be relieved by drilling holes in the
heart? Since 1980, surgeons have been using a laser
procedure to drill holes in the heart. Many patients
report a lasting and dramatic decrease in chest pain.
Is the relief due to the procedure or is it a placebo
effect?
• Design a randomized comparative experiment, using
a group of 298 volunteers with severe chest pain, to
test this procedure’s effectiveness.
Randomized Comparative Experiment
Example
The Best Experiments…
• are usually:
– randomized.
– comparative.
– double-blind.
– placebo-controlled.
Other Experimental Designs
1. Block Design
2. Matched Pairs Design
Blocking
• When groups of experimental units are similar, it’s often a
good idea to gather them together into blocks.
• Blocking isolates the variability due to the differences
between the blocks so that we can see the differences due to
the treatments more clearly.
• In effect, we are conducting two parallel experiments. We use
blocks to reduce variability so that we can see the effect of
the treatments. The blocks themselves are not treatments.
• When randomization occurs only within the blocks, we call
the design a randomized block design.
Blocking
• Blocks are another form of control.
• Blocking is the same idea for experiments as stratifying is for
sampling.
– Both methods group together subjects that are similar and
randomize within those groups as a way to remove
unwanted variation.
– We use blocks to reduce variability so we can see the
effects of the factors; we’re not usually interested in
studying the effects of the blocks themselves.
Block Design – Example:
• Suppose the researchers in our Diuretics vs. Drug A example
have reason to believe that men and women respond
differently to blood pressure medication. Then gender would
be the blocking variable.
• Our goal is to be able to assess a cause-and-effect relationship
between the treatment imposed and the response variable.
Blocking reduces variability so that the differences we see can
be attributed to the treatment that we imposed. Blocking is to
experimental design as stratifying is to sampling design.
Block Design – Example:
Block Design – Your Turn:
• The progress of a type of cancer differs in women
and men. Design a clinical experiment to compare 3
different therapies for this cancer using a subject
pool made up of 80 men and 60 women (140 total
subjects).
Solution:
Total Subjects: 140, Block By Gender
• Males (80), Random Allocation
– Group 1 (20): Treatment #1, Therapy 1
– Group 2 (20): Treatment #2, Therapy 2
– Group 3 (20): Treatment #3, Therapy 3
– Group 4, control (20): Treatment #4, Placebo
– Compare Cancer Progress
• Females (60), Random Allocation
– Group 1 (15): Treatment #1, Therapy 1
– Group 2 (15): Treatment #2, Therapy 2
– Group 3 (15): Treatment #3, Therapy 3
– Group 4, control (15): Treatment #4, Placebo
– Compare Cancer Progress
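The random allocation in this block design could be sketched as below. Randomization happens separately inside each gender block. The subject IDs and the function name `randomized_block_design` are assumptions, while the block sizes and the four treatments follow the solution.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomize separately inside each block.

    blocks: dict mapping block name -> list of subject IDs.
    Returns {block name: {treatment: [subject IDs]}}.
    """
    rng = random.Random(seed)
    design = {}
    for block_name, subjects in blocks.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)  # random allocation within this block only
        groups = {treatment: [] for treatment in treatments}
        for position, subject in enumerate(shuffled):
            groups[treatments[position % len(treatments)]].append(subject)
        design[block_name] = groups
    return design

# Hypothetical subject IDs: 80 men and 60 women.
blocks = {
    "Males": [f"M{i:02d}" for i in range(1, 81)],
    "Females": [f"F{i:02d}" for i in range(1, 61)],
}
treatments = ["Therapy 1", "Therapy 2", "Therapy 3", "Placebo"]
design = randomized_block_design(blocks, treatments, seed=13)

for block_name, groups in design.items():
    for treatment, members in groups.items():
        print(f"{block_name:7s} {treatment:9s} n = {len(members)}")
```

Cancer progress is then compared among the four groups within each block, so differences between men and women do not get mixed into the comparison of therapies.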
Experimental Design – Example:
• An ad for OptiGro plant fertilizer claims that with this
product you will grow “juicier, tastier” tomatoes.
You’d like to test this claim, and wonder whether you
might be able to get by with half the specified dose.
• How can you set up an experiment, using 24 tomato
plants from a garden store, to check out the claim?
Experimental Design – Example:
• Completely randomized experiment in one factor (three
levels)
Experimental Design – Example:
• Suppose we wanted to use 18 tomato plants of the same
variety for our experiment, but the garden store had only 12
plants left. So we drove down to the nursery and bought 6
more plants of that variety. We worry that the tomato plants
from the two stores are different somehow, and, in fact, they
don’t really look the same.
• How can we design the experiment so that the differences
between the stores don’t mess up our attempts to see
differences among fertilizer levels?
Experimental Design – Example:
• Randomized block design (block by store) in 1 factor
(3 levels)
Adding More Factors
• It is often important to include multiple factors in the same
experiment in order to examine what happens when the
factor levels are applied in different combinations.
Experimental Design – Example:
• There are two kinds of gardeners. Some water
frequently, making sure that the plants are never dry.
Others let Mother Nature take her course and leave
the watering to her. The makers of OptiGro want to
ensure that their product will work under a wide
variety of watering conditions. Maybe we should
include the amount of watering as part of our
experiment.
Experimental Design – Example:
• Completely randomized two-factor experiment, 3 fertilizer levels
by 2 watering levels (6 treatments)
Matching
• In a retrospective or prospective study, subjects are
sometimes paired because they are similar in ways not under
study.
– Matching subjects in this way can reduce variability in
much the same way as blocking.
Matched Pairs Design
• A simple and common special type of block design.
• Two types – One Subject or Two Subjects
• Conditions
– Compare only 2 treatments.
– Each block consists of just 2 units, as closely matched as possible (two
subjects).
– Units are assigned at random to the treatments.
– Each block may consist of one subject who gets both treatments one
after the other.
– Each subject serves as their own control.
– The order of the treatments can influence the subject’s
response, so the order is randomized for each subject.
Matched Pairs Design
• One Subject: A common form of matched pairs
design uses just one subject who receives both
treatments. The order in which the subject receives
the treatments is randomized.
• Example: 1) Cola taste test – Matched Pairs
– Each subject compares two colas (Pepsi/Coke) and
picks the one they prefer.
– The order in which they taste the colas is
randomized.
Matched Pairs Design
• Example: 2) A researcher believes that students are able to concentrate
better while listening to classical music. To test this theory she plans to
record the time it takes a student to complete a puzzle maze while
listening to classical music and the time it takes him/her to complete
another puzzle of the same difficulty level in a quiet room. Because there
is so much variability in problem-solving abilities among students, a
matched pairs design will be used to reduce this variability so that any
difference recorded can be attributed to the conditions under which the
student completed the puzzle.
Design - Each student will complete a puzzle under each of the
conditions. A coin will be flipped to determine whether the task will be
done in a quiet room first or while listening to classical music. The
difference in the time it takes to complete each puzzle
(Quiet-Music) is recorded for each student.
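The bookkeeping for this one-subject matched pairs design could be sketched as follows. The completion times (in seconds) and student labels are invented for illustration, and the helper names `flip_for_order` and `paired_differences` are hypothetical.

```python
import random
from statistics import mean

def flip_for_order(students, seed=None):
    """Simulated coin flip per student to decide which condition comes first."""
    rng = random.Random(seed)
    orders = {}
    for student in students:
        heads = rng.random() < 0.5
        orders[student] = ("quiet", "music") if heads else ("music", "quiet")
    return orders

def paired_differences(quiet_times, music_times):
    """Quiet minus Music for each student, who serves as their own control."""
    return {s: quiet_times[s] - music_times[s] for s in quiet_times}

# Hypothetical puzzle completion times in seconds.
quiet_times = {"S1": 95, "S2": 120, "S3": 88, "S4": 104}
music_times = {"S1": 90, "S2": 112, "S3": 91, "S4": 97}

orders = flip_for_order(quiet_times, seed=13)
differences = paired_differences(quiet_times, music_times)

print(orders)
print(differences)
print("Mean difference (Quiet-Music):", mean(differences.values()))
```

Working with the per-student differences removes the student-to-student variability in problem-solving ability from the comparison.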
Matched Pairs Design
• Two Subjects: The two subjects are paired based on
common characteristics that might affect the
response variable. One subject from each pair is
randomly assigned to each of the treatment groups.
The response variable is then the difference in the
response to the two treatments for each pair.
Matched Pairs Design
• Example: Marathon runners are matched by weight,
physical build, and running times. They are asked to
test the design of a new running shoe compared to
the manufacturer’s old design for durability through
a race. A coin is tossed to determine which runner in
each pair will wear the new design. After the
marathon the difference in wear pattern for each
pair of runners is then measured and recorded.
Confounding
• An experiment is said to be confounded if we cannot separate the
effect of a factor or treatment (explanatory variable) from the effects
of other influences (confounding variables) on the response variable.
• Example: When the levels of one factor are associated with the levels
of another factor, we say that these two factors are confounded.
• When we have confounded factors, we cannot separate out the
effects of one factor from the effects of the other factor.
• In the lab, we try to avoid confounding by rigorously controlling the
environment of the experiment so that nothing except the
experimental treatment influences the response.
Lurking or Confounding
• A lurking variable creates an association between two other
variables that tempts us to think that one may cause the
other.
– This can happen in a regression analysis or an
observational study.
– A lurking variable is usually thought of as a prior cause of
both y and x that makes it appear that x may be causing y.
Lurking or Confounding
• Confounding can arise in experiments when some other
variable associated with a factor has an effect on the
response variable.
– Since the experimenter assigns treatments (at random) to
subjects rather than just observing them, a confounding
variable can’t be thought of as causing that assignment.
• A confounding variable, then, is associated in a noncausal way
with a factor and affects the response.
– Because of the confounding, we find that we can’t tell
whether any effect we see was caused by our factor or by
the confounding factor (or by both working together).
What Can Go Wrong?
• Don’t give up just because you can’t run an experiment.
– If we can’t perform an experiment, often an observational
study is a good choice.
• Beware of confounding.
– Use randomization whenever possible to ensure that the
factors not in your experiment are not confounded with
your treatment levels.
– Be alert to confounding that cannot be avoided, and report
it along with your results.
What Can Go Wrong?
• Bad things can happen even to good experiments.
– Protect yourself by recording additional information.
• Don’t spend your entire budget on the first run.
– Try a small pilot experiment before running the full-scale
experiment.
– You may learn some things that will help you make the
full-scale experiment better.
What have we learned?
• We can recognize sample surveys, observational studies, and
randomized comparative experiments.
– These methods collect data in different ways and lead us
to different conclusions.
• We can identify retrospective and prospective observational
studies and understand the advantages and disadvantages of
each.
• Only well-designed experiments can allow us to reach
cause-and-effect conclusions.
– We manipulate levels of treatments to see if the factor we
have identified produces changes in our response variable.
What have we learned?
• We know the principles of experimental design:
– Identify and control as many other sources of variability as
possible so we can be sure that the variation in the
response variable can be attributed to our factor.
– Try to equalize the many possible sources of variability that
cannot be identified by randomly assigning experimental
units to treatments.
– Replicate the experiment on as many subjects as possible.
– Control the sources of variability we can, and consider
blocking to reduce variability from sources we recognize
but cannot control.
What have we learned?
• We’ve learned the value of having a control group and of
using blinding and placebo controls.
• We can recognize problems posed by confounding variables in
experiments and lurking variables in observational studies.
Assignment
• Ch-13, pg.312 – 316: #5 – 15 odd, 21 – 25 odd, 29,
33, 41
• Read Ch-14, pg. 324 - 337