Designing Experiments


Producing Data: Samples and
Experiments
Chapter 5
Simple Random Sample
1. number the population
2. use a method to randomly select the desired
sample size from entire population
Advantages: every member of population always
has equal chance of being selected
Disadvantages: sample may not be representative
of population; difficult with large populations
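The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the population of 500 numbered students, the sample size of 25, and the function name `simple_random_sample` are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member is equally likely."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    return rng.sample(population, n)

# Step 1: number the population (hypothetical: students 1 to 500).
population = list(range(1, 501))

# Step 2: randomly select the desired sample size from the entire population.
srs = simple_random_sample(population, 25, seed=1)
print(sorted(srs))
```

A random number table or a calculator's random integer function plays the same role as `rng.sample` here.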
Cluster Random Sample
1. divide population into clusters
2. use a method to randomly select one or more
clusters
3. use a method to randomly select from the
chosen clusters
Advantages: can work well if population is easy
to divide or there are established clusters
Disadvantages: not everyone has equal chance of
being chosen; selected clusters may not be
representative of population
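A rough sketch of the three cluster-sampling steps, assuming a hypothetical school with 10 homerooms of 20 students each as the clusters. The names `cluster_sample`, `clusters`, and the sizes chosen are illustrative only.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick whole clusters, then randomly pick members within them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)      # step 2: pick clusters
    picks = []
    for name in chosen:                                    # step 3: pick within clusters
        members = clusters[name]
        picks.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return chosen, picks

# Step 1: divide the population into clusters (hypothetical homerooms).
clusters = {
    f"homeroom_{h}": [f"h{h}_s{s}" for s in range(1, 21)]
    for h in range(1, 11)
}

chosen_clusters, cluster_picks = cluster_sample(clusters, 3, 5, seed=2)
print(chosen_clusters)
print(cluster_picks)
```

Students in homerooms that are not selected have no chance of being chosen, which is the disadvantage noted above.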
Stratified Random Sample
1. divide population into strata
2. use a method to randomly select a sample
from each stratum
Advantages: guarantees representation from each
stratum
Disadvantages: not everyone has equal chance of
being chosen; strata (of interest) may be
difficult to determine; population may be
difficult/laborious to sort
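As a sketch of stratified sampling, assume the strata are four grade levels of 100 students each and that 5 students are drawn from every stratum. These numbers and the helper name `stratified_sample` are assumptions for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a separate simple random sample from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Step 1: divide the population into strata (hypothetical grade levels).
strata = {
    grade: [f"{grade}_{i}" for i in range(1, 101)]
    for grade in ("freshman", "sophomore", "junior", "senior")
}

# Step 2: randomly select a sample from each stratum.
stratified = stratified_sample(strata, 5, seed=3)
for grade, picks in stratified.items():
    print(grade, picks)
```

Unlike the cluster sketch, every group contributes members, which is why representation from each stratum is guaranteed.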
Systematic Random Sample
1. use sample size and population size to
determine (estimate) the “magic number”
(population size divided by sample size)
2. use a method to randomly select a starting number
between 1 and the “magic number”; repeatedly add the
“magic number” to determine the remaining selections
Advantages: allows rapid method to select from
large population; helps provide representation
throughout population
Disadvantages: not everyone has equal chance of
being chosen; sample may not be representative
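One way to read the "magic number" procedure is as a sampling interval: population size divided by sample size, with a random start inside the first interval. The sketch below makes that assumption, and the population of 500 and sample of 25 are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th member after it."""
    rng = random.Random(seed)
    k = len(population) // n        # the "magic number" (sampling interval)
    start = rng.randrange(k)        # random position within the first interval
    return [population[start + i * k] for i in range(n)]

population = list(range(1, 501))    # hypothetical numbered population
systematic = systematic_sample(population, 25, seed=4)
print(systematic)
```

With 500 members and a sample of 25, the interval is 20, so the selections are spaced 20 apart throughout the list.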
Multi-Stage Random Sample
1. use a method (SRS, cluster, stratified)
to randomly select (large) groups
2. use a method (SRS, cluster, stratified) to
randomly select (smaller) groups
3. repeat until participants are chosen
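A possible three-stage version, assuming a made-up hierarchy of 6 districts, 4 schools per district, and 50 students per school. Each stage here uses a simple random sample; other methods could be substituted at any stage.

```python
import random

rng = random.Random(5)

# Hypothetical nested population: districts -> schools -> students.
districts = {
    f"district_{d}": {
        f"d{d}_school_{s}": [f"d{d}_s{s}_student_{i}" for i in range(1, 51)]
        for s in range(1, 5)
    }
    for d in range(1, 7)
}

participants = []
for district in rng.sample(sorted(districts), 2):             # stage 1: large groups
    schools = districts[district]
    for school in rng.sample(sorted(schools), 2):             # stage 2: smaller groups
        participants.extend(rng.sample(schools[school], 10))  # stage 3: participants

print(len(participants), "participants chosen")
```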
Role of Sampling Design
 Statistical inference provides ways to
answer specific questions from data with
some guarantee that the answers are good
ones.
 Statistical inference will be inaccurate if the
method of collecting data is flawed.
Other Sampling Designs
 Suppose the principal is interested in
finding out if Dripping Springs students
think more trees should be planted. He
makes an announcement and instructs
students to come by his office to let him
know if tree planting is an issue they
support. Will this sample of students give
him an accurate picture of all students'
feelings at Dripping Springs?
Other Sampling Designs
 A voluntary response sample consists of
people who choose themselves by
responding to a general appeal.
 Voluntary response samples overrepresent
people with strong opinions.
Other Sampling Designs
 The principal is surprised to find most of
the students coming in his office are in
favor of the tree planting. Feeling that
maybe his design may not have worked, he
ventures into the hallways and starts asking
students randomly. Will this sample of
students give him an accurate picture of all
students' feelings at Dripping Springs?
 convenience sample (e.g., mall sampling)
Defining Important Terms
 population
 sample
 sample design
good: simple random sample, cluster, stratified, systematic
poor: voluntary response, convenience sampling
 bias
A poor design systematically favors certain outcomes
or results.
Random-random sample practice
1. simple random sample
2. convenience sample
3. cluster sample
4. voluntary response
5. systematic sample
6. stratified sample

1. Dripping Springs seniors
2. UT alumni
3. Hot Rod magazine subscribers
4. Texans
5. national pet stores
6. Dripping Springs middle school students
Cautions about sample surveys
 lurking variable
 confounding variables
 undercoverage
 nonresponse
 response bias
 wording of questions
 Sample results do not necessarily match
population results.
Identify potential problems
 To obtain a sample of households, a
television rating service dials numbers taken
at random from telephone directories.
 To determine the percentage of teenage girls
with long hair, Teen magazine published a
mail-in questionnaire. Of the 500
respondents, 85% had hair shoulder length
or longer.
Identify potential problems
 To evaluate the reliability of cars owned by
its subscribers, Consumer Reports magazine
publishes a yearly list of automobiles and
their frequency-of-repair records. The
magazine collects the information by
mailing a questionnaire to subscribers and
tabulating the results from those who return
it.
Identify potential problems
 For a survey of student opinions about high
school athletic programs, a member of the
school board obtains a sample of students by
listing all high school students and using a
random number table to select 30 of them.
After making phone calls last weekend, she
notes six of the students said that they didn’t
have time to participate in the survey.
Role of mathematics in sampling
 Results may change from sample to
sample, but since we deliberately use
chance, the results obey the laws of
probability allowing fairly consistent
results (within a margin of error).
 The degree of accuracy can be improved
by increasing the size of the sample.
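The claim about sample size can be checked with a small simulation. The sketch below assumes a hypothetical population in which 60% answer "yes", draws 1,000 random samples at each of three sample sizes, and measures how much the sample proportions vary from sample to sample.

```python
import random
import statistics

def spread_of_sample_proportions(p, n, repeats, rng):
    """Standard deviation of the sample proportion across repeated samples."""
    proportions = []
    for _ in range(repeats):
        yes = sum(rng.random() < p for _ in range(n))   # count "yes" answers
        proportions.append(yes / n)
    return statistics.stdev(proportions)

rng = random.Random(6)
spreads = {
    n: spread_of_sample_proportions(0.6, n, 1000, rng)  # 0.6 is an assumed value
    for n in (25, 100, 400)
}
for n, s in spreads.items():
    print(f"n = {n:3d}: spread of sample proportions = {s:.3f}")
```

The spread shrinks as the sample size grows, roughly halving each time the sample size is multiplied by four.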
Discussion example 1
 Julie obtains lists of all seniors in her high
school who did and did not study a foreign
language. Then she compares their scores on
a standard test of English reading and
grammar given to all seniors. The average
score of the students who studied a foreign
language is much higher than the average
score of those who did not.
 Does this study give evidence that studying
another language builds skill in English?
Discussion example 2
 The ability to grow in shade may help pines
found in the dry forests of Arizona resist
drought. Investigators planted pine
seedlings in a greenhouse in either full light
or light reduced to 75% of normal by shade
cloth. They tried soil types from three
different areas of Arizona. At the end of the
study, they dried the young trees and
weighed them.
Design example 3
 Sickle-cell disease is an inherited disorder of
the red blood cells that in the United States
affects mostly blacks. It can cause severe
pain and many complications. A study by
the National Institutes of Health gave
hydroxyurea (pain medication) to 150 sickle-cell
sufferers and a placebo (a dummy
medication) to another 150. The researchers
then counted the episodes of pain reported by
each subject.
Designing Experiments Vocabulary
 Observational study
 Experiment
 Experimental units/subjects
 Treatment
 Factors
 Level
 Explanatory vs. response variable
Randomized comparative
experiments
 Goal of an experiment: collect good
evidence for a cause-and-effect relationship.
 The success of an experiment depends on our
ability to treat all the experimental units
identically except for the actual treatment.
 Any observed effect so large that it would
rarely occur by chance is called statistically
significant.
Example
 A baby-food producer claims that her product is
superior to that of her leading competitor, in that
babies gain weight faster with her product. As an
experiment, 30 healthy babies are randomly
selected. For two months, 15 are fed her product
and 15 are fed the competitor’s product. Each
baby’s weight gain (in ounces) was recorded.
 Produce a diagram of the experiment.
Completely Randomized Design
Random assignment (30 babies)
  Group 1 (15 babies) -> Treatment 1: her product
  Group 2 (15 babies) -> Treatment 2: competitor’s product
Compare weight gain
Babies will be numbered 1 to 30. Using a random number table, the
first 15 selected will be in Group 1, with the remaining 15 placed in
Group 2. Each baby’s weight gain (in ounces) will be recorded and compared.
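The random assignment described above can be sketched as follows, with a shuffle standing in for the random number table. The variable names are illustrative.

```python
import random

rng = random.Random(7)             # seed only makes the example repeatable

babies = list(range(1, 31))        # babies numbered 1 to 30
rng.shuffle(babies)                # impersonal chance decides the order

group_1 = sorted(babies[:15])      # first 15 selected: her product
group_2 = sorted(babies[15:])      # remaining 15: competitor's product

print("Group 1:", group_1)
print("Group 2:", group_2)
```

The same approach applies to any completely randomized design with two equal groups, such as 20 numbered plots split into two groups of 10.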
Example
 We wish to determine whether or not a new type
of fertilizer is more effective than the type
currently in use. Researchers have subdivided a
20-acre farm into twenty 1-acre plots. Wheat will
be planted on the farm, and at the end of the
growing season the number of bushels harvested
will be measured.
 Produce a diagram of the experiment.
Completely Randomized Design
Random allocation (20 plots)
  Group 1 (10 acres) -> Treatment 1: new type
  Group 2 (10 acres) -> Treatment 2: current type
Compare bushels
Land plots will be numbered 1 to 20. Using a random number table, the
first 10 selected will be in Group 1 with the remaining placed in group
2. The bushels of wheat from each plot will be counted and compared.
An example of a good design?
 In order to test the effectiveness of nicotine
patches, Dr. Hurt recruited 240 smokers at
various locations. Volunteers were to
receive a 22-mg nicotine patch for eight
weeks. Almost half (46%) of the nicotine
group had quit smoking at the end of the
study.
 Placebo effect
Principles of Experimental Design
 Control

using comparison helps to reduce the effects of
lurking variables
 Randomization

the use of impersonal chance typically produces
groups of experimental units that should be
similar in all respects
 Replication

larger numbers of experimental units are better
than smaller due to the reduction of chance
variation in the results
Caution of Experimental Design
 The most serious potential weakness of experiments
is lack of realism. The setting of an experiment may
not realistically duplicate the conditions we really
want to study.
 single blind versus double blind
Improving the Design Example
 You are participating in the design of a
medical experiment to investigate whether a
calcium supplement in the diet will reduce
the blood pressure of middle-aged men.
Preliminary work suggests that calcium may
be effective and that the effect may be
greater for African-American men than for
white men. Outline the design of an
appropriate experiment.
Improving the Design
 A block is a group of experimental units or
subjects that are known before the experiment to
be similar in some way that is expected to affect
the response to the treatments.
 Block design has the same rationale as a stratified
random sample.
 Blocks allow us to reduce the amount of variation
to improve the accuracy of our conclusions.
Block Design
Subjects
  African American men -> Random assignment
    Group 1 -> Treatment 1: calcium
    Group 2 -> Treatment 2: placebo
  White men -> Random assignment
    Group 3 -> Treatment 1: calcium
    Group 4 -> Treatment 2: placebo
Compare blood pressure
All African American men will be assigned a random number. The half
with the smallest numbers will be assigned to Group 1 (calcium), and the half
with the largest numbers will be assigned to Group 2 (placebo). The process will repeat
for the white men (Groups 3 and 4). The reduction in blood pressure will be compared.
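A sketch of randomizing separately within each block. The subject IDs and the block sizes (20 men per block) are hypothetical, since the example does not say how many subjects take part.

```python
import random

def randomize_within_block(subjects, rng):
    """Split one block at random into a calcium half and a placebo half."""
    shuffled = subjects[:]            # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"calcium": shuffled[:half], "placebo": shuffled[half:]}

rng = random.Random(8)

# Hypothetical blocks: subjects are grouped by race before any randomization.
blocks = {
    "African American men": [f"AA{i}" for i in range(1, 21)],
    "White men": [f"W{i}" for i in range(1, 21)],
}

assignment = {
    block: randomize_within_block(members, rng)
    for block, members in blocks.items()
}
for block, groups in assignment.items():
    print(block, groups)
```

Because the shuffle happens inside each block, both treatments appear equally often among African American men and among white men.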
Design Example
 Is the right hand of right-handed people
generally stronger than the left? Paul Murky
of Murky Research designs an experiment to
test this question. He fastens an ordinary
bathroom scale to a shelf five feet from the
floor, with the end of the scale projecting out
from the shelf. Subjects squeeze the scale
between their thumb and their fingers on the
top. The scale reading in pounds measures
hand strength.
Improving the Design
 In a matched pair design, each subject in the
experiment will receive two (and only two)
treatments.
 The order that each subject receives both
treatments is randomly selected to preserve
the important aspect of randomization.
Matched pair Design
Random allocation
  Group 1 -> Treatment 1: left hand, then Treatment 2: right hand
  Group 2 -> Treatment 2: right hand, then Treatment 1: left hand
Compare difference
A coin will be flipped to decide which hand each participant squeezes
with first: heads, the left hand goes first; tails, the right hand goes
first. The difference in pounds between the two scale readings
will be compared.
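The coin flip for each participant can be simulated as below. The ten subject labels are hypothetical, and no strength measurements are generated, only the order of the two treatments.

```python
import random

rng = random.Random(9)

subjects = [f"subject_{i}" for i in range(1, 11)]   # hypothetical participants

order = {}
for subject in subjects:
    heads = rng.random() < 0.5                      # simulated fair coin flip
    # Heads: left hand first. Tails: right hand first.
    order[subject] = ("left", "right") if heads else ("right", "left")

for subject, hands in order.items():
    print(subject, "squeezes with", hands[0], "hand first, then", hands[1])
```

Each subject still receives both treatments, so the quantity compared is the within-subject difference between the two readings.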
Why a simulation?
 A simulation is using a model to imitate a
chance behavior based on a specific
problem situation.
 A simulation allows a model to be analyzed
when a theoretical probability is unknown
or indeterminate.
Elements of a simulation
 Number assignment
 Description of a trial
 Stopping rule
 Execution of simulation (marking of the
number line)
 Documentation of results
Simulation Example
 Traffic Lights: Coming to school each day, Anne
rides through three traffic lights, A, B, and C. The
probability that any one light is green is 0.3, and the
probability that it is not green is 0.7. Use a simulation
to answer questions below.
 We must assume that the lights operate independently.
 Estimate the probability that Anne will find all traffic
lights to be green.
 Estimate the probability that Anne will find at least
one light to be not green.
Simulation Example
 Number assignment
0 – 2 green light; 3 – 9 not green
(alternatively, 1 – 3 green light; 4 – 9 and 0 not green)
 Description of a trial/Stopping rule
A trial consists of choosing three digits, one at a time,
with each digit representing one traffic light. For each digit
we record whether the light is green or not green; the trial
ends after three lights.
 Execution of simulation
 Documentation of results
Simulation Example
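47039 27923 09105 89221 07043 90862
97329 90169 63091 31283 56000 67831
Reading the digits three at a time gives 20 trials. With 0 – 2 as green,
only one trial (210) shows three green lights; the other 19 show two or fewer.
P(all lights green) ≈ 1/20 = 0.05
P(at least one light not green) ≈ 19/20 = 0.95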