Chapter 4
Designing Studies
4.1
Samples and Surveys
You can hardly go a day without hearing the
results of a statistical study. Here are some
examples:
Social Media and Teens

According to a Common Sense
Media study, nine out of ten 13- to 17-year-olds have used
some form of social media.
Three out of four teenagers
currently have a profile on a
social networking site, and one
in five has a current Twitter
account. 68% of all teens say
Facebook is their main social
networking site, compared to
6% for Twitter, 1% for Google
Plus, and 1% for MySpace.
Underage Drinking

During the past month (30
days), 26.4% of underage
persons (ages 12-20) used
alcohol, and binge drinking
among the same age group
was 17.4%.
SAMHSA
Can we trust the results? That depends on how the data were produced.
Population and Sample
Definition:
The population in a statistical study is the entire group of individuals
about which we want information.
A sample is the part of the population from which we actually collect
information. We use information from a sample to draw conclusions
about the entire population.
Population: collect data from a representative sample.
Sample: make an inference about the population.
Sampling and Surveys
The distinction between population and sample is basic to
statistics. To make sense of any sample result, you must know
what population the sample represents.
Sampling Students and Soda
Problem: Identify the population and sample in each of the following settings.
(a) The student government at a high school surveys 100 of the students at the school
to get their opinions about a change to the bell schedule.
(a) Population is all students at the school, sample is the 100 students surveyed.
(b) The quality control manager at a bottling company selects a sample of 10 cans
from the production line every hour to see if the volume of the soda is within
acceptable limits.
(b) Population is all cans produced that hour, sample is 10 cans inspected.
The Idea of a Sample Survey
Choosing a sample from a large, varied population is
not that easy.
Step 1: Define the population we want to describe.
Step 2: Say exactly what we want to measure.
Step 3: Decide how to choose a sample from the
population.
A “sample survey” is a study that uses an
organized plan to choose a sample that represents
some specific population.
We often draw conclusions (inferences) about a
whole population on the basis of a sample.
How to Sample Badly
Definition:
Choosing individuals who are easiest to reach results
in a convenience sample.
Convenience samples often produce unrepresentative
data…why?
Definition:
The design of a statistical study shows bias if it
systematically favors certain outcomes.
How can we choose a sample that we can trust to
represent the population? There are a number of
different methods to select samples.
Convenience samples are almost guaranteed to
show bias. So are voluntary response samples, in
which people decide whether to join the sample in
response to an open invitation.
Definition:
A voluntary response sample consists of people who
choose themselves by responding to a general appeal.
Voluntary response samples show bias because
people with strong opinions (often in the same
direction) are most likely to respond.
For each of the following situations, identify the sampling method used. Then
explain how the sampling method could lead to bias.
a) A farmer brings a juice company several crates of oranges each week. A
company inspector looks at 10 oranges from the top of each crate before
deciding whether to buy all the oranges.
b) The ABC program Nightline once asked whether the United Nations should
continue to have its headquarters in the US. Viewers were invited to call one
telephone number to respond “Yes” and another for “No”. There was a charge
for calling either number. More than 186,000 callers responded and 67% said
“No”.
How to Sample Well: Random Sampling
The statistician’s remedy is to allow impersonal chance to
choose the sample. A sample chosen by chance rules out both
favoritism by the sampler and self-selection by respondents.

Random sampling, the use of chance to select a sample, is
the central principle of statistical sampling.
Definition:
A simple random sample (SRS) of size n consists
of n individuals from the population chosen in such a
way that every set of n individuals has an equal
chance to be the sample actually selected.
In practice, people use random numbers generated by a
computer or calculator to choose samples. If you don’t have
technology handy, you can use a table of random digits.
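As a rough illustration of choosing an SRS with technology, here is a minimal Python sketch. The population of labels 1 through 28, the function name choose_srs, and the seed are assumptions made for the example. The standard library call random.sample draws without replacement, so every set of n labels has the same chance of being the sample.

```python
import random

def choose_srs(population, n, seed=None):
    """Return a simple random sample of n individuals, drawn without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical population: individuals labeled 1 through 28.
labels = range(1, 29)
sample = choose_srs(labels, 4, seed=2024)
print(sample)
```

Leaving seed as None gives a different sample on each run, which is what you would want in practice.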
Select a sample of size 4 from our class to illustrate the definition of an SRS.
Need two recorders.
Let’s force the sample to have 2 girls and 2 boys.
Why is this not an SRS?
Because samples of 4 that do not contain 2 girls and 2 boys have no chance of
occurring. In an SRS, each possible sample of 4 has the same chance of
occurring.
How to Choose an SRS
How to Choose an SRS Using Table D
Step 1: Label. Give each member of the population a
numerical label of the same length.
Step 2: Table. Read consecutive groups of digits of the
appropriate length from Table D.
Your sample contains the individuals whose labels you
find.
Definition:
A table of random digits is a long string of the digits 0, 1, 2, 3,
4, 5, 6, 7, 8, 9 with these properties:
• Each entry in the table is equally likely to be any of the 10
digits 0 - 9.
• The entries are independent of each other. That is,
knowledge of one part of the table gives no information about
any other part.
Example: How to Choose an SRS
Problem: Use Table D at line 130 to choose an SRS of 4 hotels.
01 Aloha Kai      08 Captiva        15 Palm Tree      22 Sea Shell
02 Anchor Down    09 Casa del Mar   16 Radisson       23 Silver Beach
03 Banana Bay     10 Coconuts       17 Ramada         24 Sunset Beach
04 Banyan Tree    11 Diplomat       18 Sandpiper      25 Tradewinds
05 Beach Castle   12 Holiday Inn    19 Sea Castle     26 Tropical Breeze
06 Best Western   13 Lime Tree      20 Sea Club       27 Tropical Shores
07 Cabana         14 Outrigger      21 Sea Grape      28 Veranda
Line 130 of Table D: 69051 64817 87174 09517 84534 06489 87201 97245
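Reading two digits at a time from line 130 gives:
69 05 16 48 17 87 17 40 95 17 84 53 40 64 89 87 20
Skip any group that is not a label from 01 to 28, and skip repeats (17 appears three times).
Our SRS of 4 hotels for the editors to contact is: 05 Beach Castle,
16 Radisson, 17 Ramada, and 20 Sea Club.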
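The label-and-table procedure can be sketched in a few lines of Python. This is only an illustration: the function name srs_from_digits and its parameters are made up for the example, and the digits are the line 130 entries quoted in the hotel problem.

```python
def srs_from_digits(digits, label_width, max_label, n):
    """Read consecutive groups of digits and keep the first n distinct valid labels."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces between groups
    chosen = []
    for start in range(0, len(digits) - label_width + 1, label_width):
        label = int(digits[start:start + label_width])
        # Skip groups that are not labels in use, and skip repeated labels.
        if 1 <= label <= max_label and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen  # may hold fewer than n labels if the digits run out

line_130 = "69051 64817 87174 09517 84534 06489 87201 97245"
print(srs_from_digits(line_130, label_width=2, max_label=28, n=4))
# prints [5, 16, 17, 20]
```

Labels 05, 16, 17, and 20 match the hotels chosen by hand. For a population with more than 99 individuals, label_width would need to grow so that every label has the same length.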
ACU#2
Alternate Example: Mall Hours
The management company of a local mall plans to survey a random sample of 3 stores to
determine the hours they would like to stay open during the holiday season.
Problem: Use Table D at line 101 to select an SRS of size 3 stores.
01 Aeropostale           08 Forever 21           15 Old Navy
02 All American Burger   09 GameStop             16 Pac Sun
03 Arby’s                10 Gymboree             17 Panda Express
04 Barnes & Noble        11 Haggar               18 Payless Shoes
05 Carter’s for Kids     12 Just Sports          19 Star Jewelers
06 Destination Tan       13 Mrs. Fields          20 Vitamin World
07 Famous Footwear       14 Nike Factory Store   21 Zales Diamond Store
Using line 101 and reading two digits at a time (groups that are not labels from 01 to 21 are skipped), here are the selected stores:
19 (Star Jewelers)
22 (skip)
39 (skip)
50 (skip)
34 (skip)
05 (Carter’s for Kids)
75 (skip)
62 (skip)
87 (skip)
13 (Mrs. Fields)
SRS using our Graphing Calculators (Tech Corner pg 214)
To seed the random number generator on a TI-83/84, enter a unique number on your
screen (cell phone number, student number).
Then press the STO button (just above the ON button).
Finally, press the MATH button, scroll to the PRB menu, choose the first
option, rand, and press ENTER.
ACU#3 How large is a typical U.S. State?
Pick 5 states in 15 seconds.
Refer to the table of land areas on page N/DS-5.
Find the mean land area for your sample.
Dot plot of the results.
Use the line of Table D that I assign you to choose an SRS of 5 states.
Find the mean land area for this sample.
Dot plot of the results.
How do the class’s estimates using the two methods
compare? What advantage(s) does random sampling
provide?
A university’s financial aid office wants to know how much it can expect
students to earn from summer employment. This information will be used to
set the level of financial aid. The population contains 3478 students who
have completed at least one year of study but have not yet graduated. A
questionnaire will be sent to an SRS of 100 of these students drawn from an
alphabetized list.
(a) Describe how you will select the sample.
(b) Starting at line 135, use the portion of the random digits table below to
select the first three students in the sample.
135 66925 55658 39100 78458 11206 19876 87151 31260
136 08421 44753 77377 28744 75592 08563 79140 92454
137 53645 66812 61421 47836 12609 15373 98481 14592
The basic idea of good sampling is
straightforward: take an SRS from the population
and use your sample results to gain information
about the population.
SRS:
• every individual has an equal chance of getting picked,
• every sample of the size you are drawing has
an equal chance of getting picked
Unfortunately, it’s usually very difficult to actually
get an SRS from the population of interest.
Sometimes, there are also statistical advantages
to using more complex sampling methods.
If the individuals in each stratum are less varied
than the population as a whole, a stratified
random sample can often produce better
information about the population than an SRS
of the same size.
Strata: Same Within and Different Between
EX. The population of students in a large
high school might be divided into freshman,
sophomore, junior, and senior strata.
Definition:
To select a stratified random sample, first classify the
population into groups of similar individuals, called
strata. Then choose a separate SRS in each stratum
and combine these SRSs to form the full sample.
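A stratified random sample can be sketched in Python as one SRS per stratum, combined at the end. Everything concrete here is an assumption for illustration: the four grade-level strata of 100 students each, the allocation of 5 students per stratum, and the function name stratified_sample.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS within each stratum and combine the results."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))  # SRS inside this stratum only
    return sample

# Hypothetical school: 4 grade-level strata of 100 students each.
grades = ["freshman", "sophomore", "junior", "senior"]
strata = {g: [f"{g}-{i:03d}" for i in range(1, 101)] for g in grades}
sizes = {g: 5 for g in grades}  # assumed allocation: 5 students per stratum
sample = stratified_sample(strata, sizes, seed=7)
print(len(sample))  # prints 20
```

Unlike an SRS of 20 students, this design guarantees that every grade is represented, which is why it is not an SRS: samples with no freshmen, for example, have no chance of occurring.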

Activity: Sampling Sunflowers pg. 216
Use Table D or technology to take an SRS of 10 grid squares.
Then take a stratified random sample using the rows as strata, and
repeat using the columns as strata.
Cell Phones Only
A growing percentage of Americans are dropping
their traditional land line phones and using their cell
phones exclusively.
This has caused a problem for polling organizations
that in the past only used land lines when randomly
selecting people for their polls.
Since the group of people who use cell phones
exclusively is different in many ways from people
who do not rely exclusively on cell phones, polling
organizations now use a stratified sampling design
that selects a random sample of cell phone users and
a random sample of land line users.

Although a stratified random sample can sometimes give more
precise information about a population than an SRS, both
sampling methods are hard to use when populations are large
and spread out over a wide area.

In that situation, we’d prefer a method that selects groups of
individuals that are “near” one another.
Definition:
To take a cluster sample, first divide the population
into smaller groups. Ideally, these clusters should
mirror the characteristics of the population. Then
choose an SRS of the clusters. All individuals in the
chosen clusters are included in the sample.
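A cluster sample can be sketched the same way: the chance step picks whole clusters, and then everyone in a chosen cluster is included. The 12 homerooms of 25 students and the function name cluster_sample are hypothetical, chosen only to make the example concrete.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose an SRS of clusters, then include every individual in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical population: 12 homerooms (clusters) of 25 students each.
homerooms = {
    f"room-{r:02d}": [f"room-{r:02d}/student-{s:02d}" for s in range(1, 26)]
    for r in range(1, 13)
}
chosen, sample = cluster_sample(homerooms, 3, seed=11)
print(chosen, len(sample))
```

The sample size is fixed by the clusters drawn (3 homerooms of 25 gives 75 students), and only 3 locations have to be visited, which is the practical appeal of the method.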
Other Sampling Methods

Cluster: Different Within and Same Between

Sampling Sunflowers
Which, rows or columns, could be used for clusters? Why?
Strata (Same Within group/Different Between groups): Rows or Section?
Cluster (Different Within group/Same Between groups): Rows or Section?
A Hotel on the Beach
The manager of a beach-front hotel wants to survey guests in the hotel to estimate
overall customer satisfaction. The hotel has two towers, an older one to the south and
a newer one to the north. Each tower has 10 floors of standard rooms (40 rooms per
floor) and 2 floors of suites (20 suites per floor). Half of the rooms in each tower face
the beach, while the other half of the rooms face the street. This means there are (2
towers)(10 floors)(40 rooms) + (2 towers)(2 floors)(20 suites) = 880 total rooms.
Problem:
(a) Explain how to select a simple random sample of 88 rooms.
(b) Explain how to select a stratified random sample of 88 rooms.
(c) Explain why selecting 2 of the 24 different floors would not be a good way to obtain
a cluster sample.
(a) Number the rooms from 001 to 880. Using a random number generator, select 88
unique random integers from 001 to 880. Use the rooms that are selected for the sample.
(b) Since customer satisfaction will vary based on the location of the room (e.g. a suite in the
north tower facing the ocean is much nicer than a room in the south tower facing the
street), we should take a sample from each type of room. We will take an
SRS of 20 from the 200 south tower rooms facing the beach
SRS of 20 from the 200 south tower rooms facing the street
SRS of 20 from the 200 north tower rooms facing the beach
SRS of 20 from the 200 north tower rooms facing the street
SRS of 2 from the 20 south tower suites facing the beach
SRS of 2 from the 20 south tower suites facing the street
SRS of 2 from the 20 north tower suites facing the beach
SRS of 2 from the 20 north tower suites facing the street
(c) Although it would be easy to collect the data once the floors were selected (since you
only need to visit two floors), each floor is more homogeneous than heterogeneous,
which is a bad thing for clusters. Ideally, each cluster should include all the different
types of rooms. If you only selected 2 floors, it is fairly likely that you would get no
suites in the sample or only get one of the towers in the sample.
1. Your school will send a delegation of 35 seniors to a student life convention.
200 girls and 150 boys are eligible to be chosen. If a random sample of 20 girls and
a separate random sample of 15 boys are selected, each senior has the
same chance to be chosen to attend the
convention.
Is it an SRS? Explain.
2.
18. Dead trees. On the west side of Rocky Mountain National Park, many
mature pine trees are dying due to infestation by pine beetles. Scientists
would like to use sampling to estimate the proportion of all pine trees in the
area that have been infected.
(a) Explain why it wouldn’t be practical for scientists to obtain an SRS in this
setting.
To obtain an SRS, every tree would need to have an equal chance of being
included in the sample. It is not practical to even identify every tree in the park.
(b) A possible alternative would be to use every pine tree along the park’s
main road as a sample. Why is this sampling method biased?
This sampling method is biased because these trees are unlikely to be
representative of the population. Trees along the main road are more
likely to be damaged by cars and people and may be more susceptible to
infestation.
(c) Suppose that a more complicated random sampling plan is carried out,
and that 35% of the pine trees in the sample are infested by the pine
beetle. Can scientists conclude that 35% of all the pine trees on the west
side of the park are infested? Why or why not?
The scientists can be confident that the actual percent of pine trees in
the area that are infected by the pine beetle is near 35%, although there
is always some error associated with using sampling to estimate
population parameters.
A parameter is a number that describes some characteristic of the
population.

The purpose of a sample is to give us information about a
larger population.

The process of drawing conclusions about a population on the
basis of sample data is called inference.
Why should we rely on random sampling?
1) To eliminate bias in selecting samples from the list of
available individuals.
2) The laws of probability allow trustworthy inference about the
population.
“Margin of error” does not mean that a mistake has been made; rather, it
accounts for the variability that results from taking a random
sample from a population.
• Results from random samples come with a margin of
error that sets bounds on the size of the likely error.
• Larger random samples give better information about the
population than smaller samples.
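A small simulation can make the point about sample size concrete. This is a sketch under stated assumptions: the true proportion of 0.35, the sample sizes 100 and 1600, and the 500 repetitions are all made up, and each individual is simulated independently, which approximates an SRS from a very large population.

```python
import random
import statistics

def spread_of_estimates(true_p, n, repetitions=500, seed=0):
    """Standard deviation of the sample proportion across many simulated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        # Each simulated individual has the characteristic with probability true_p.
        hits = sum(rng.random() < true_p for _ in range(n))
        estimates.append(hits / n)
    return statistics.stdev(estimates)

small = spread_of_estimates(true_p=0.35, n=100)
large = spread_of_estimates(true_p=0.35, n=1600)
print(round(small, 3), round(large, 3))
```

The estimates from samples of 1600 cluster much more tightly around 0.35 than the estimates from samples of 100, which is the sense in which larger random samples come with a smaller margin of error.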
Inference for Sampling
• Sampling Variability is described by the margin of error
that comes with most poll results.
• Most sample surveys are affected by errors in addition to
sampling variability.
These errors can introduce bias that makes a survey result
meaningless.
Two main sources of errors in sample surveys:
• Sampling errors
• Non-sampling errors
Sampling errors
Sampling errors are mistakes made in the process of taking a sample
that could lead to inaccurate information about the population.
• Random sampling error, which the margin of error describes
• Bad sampling methods, such as voluntary response and
convenience samples. We can control these.
• Undercoverage, which occurs when some members of the
population are left out of the sampling frame
Non-sampling errors
• Nonresponse occurs when an individual chosen for the
sample can’t be contacted or refuses to participate.
• Response bias occurs when people give inaccurate answers, sometimes intentionally.
• Poor wording of questions can also influence the answers.
Don’t misuse the term “voluntary response” to
explain why certain individuals don’t respond in a
sample survey.
Nonresponse can occur only after a sample has
been selected.
In a voluntary response sample, every individual
has opted to take part, so there won’t be any
nonresponse.
Section 4.2
Experiments
Learning Objectives
After this section, you should be able to…

DISTINGUISH observational studies from experiments

DESCRIBE the language of experiments

APPLY the three principles of experimental design
http://apcentral.collegeboard.com/apc/public/repository/ap10_statistics_form_b_q2.pdf
Observational Study versus Experiment
An observational study observes individuals and measures
variables of interest but does not attempt to influence the
responses.
In contrast to observational studies, experiments don’t just
observe individuals or ask them questions. They actively
impose some treatment in order to measure the response.
Definition:
An experiment deliberately imposes some treatment on
individuals to measure their responses.
When our goal is to understand cause and effect, experiments are
the only source of fully convincing data.
Let me repeat…
When our goal is to understand cause and effect, experiments are
the only source of fully convincing data.
The distinction between observational study and experiment is one of
the most important in statistics.
We think that car weight helps explain accident deaths and
that smoking influences life expectancy.
Car weight -> accident deaths
Smoking -> life expectancy
In these relationships, the two variables play different roles.
Car weight and number of cigarettes smoked are the explanatory variables.
Accident death rate and life expectancy are the response variables.
An explanatory variable may help explain or influence changes in a response
variable.
A response variable measures an outcome of a study.
Soy Good For You?
The November 2009 issue of Nutrition Action discusses what the current
research tells us about the supposed benefits of soy. For a long time,
scientists have believed that the soy foods in Asian diets explain the
lower rates of breast cancer, prostate cancer, osteoporosis, and heart
disease in places like China and Japan. However, when experiments
were conducted, soy either had no effect or a very small effect on the
health of the participants. For example, several different studies
randomly assigned elderly women either to soy or placebo, and none of
the studies showed that soy was more beneficial for preventing
osteoporosis. So what explains the lower rates of osteoporosis in Asian
cultures? We still don’t know. It could be due to genetics, other dietary
factors, or any other difference between Asian cultures and non-Asian
cultures.
An explanatory variable may help explain or influence changes in a
response variable.
Here: whether the woman consumed soy.
A response variable measures an outcome of a study.
Here: the overall health of the woman.
Weight and Height
Julie wants to know if she can predict a student’s weight from his
or her height.
Information about height is easier to obtain than information about weight!
Jim wants to know if there is a relationship between height and
weight.
Problem: For each student, identify the explanatory variable and
response variable, if possible.
Solution: Julie is treating a student’s height as the explanatory
variable and the student’s weight as the response variable.
Jim is just interested in exploring the relationship between the
two variables, so there is no clear explanatory or response
variable.
Observational Study versus Experiment
Observational studies of the effect of one variable on another
often fail because of confounding between the explanatory
variable and one or more lurking variables.
Definition:
A lurking variable is a variable that is not among the
explanatory or response variables in a study but that may
influence the response variable.
Confounding occurs when two variables are associated in
such a way that their effects on a response variable cannot be
distinguished from each other.
Well-designed experiments take steps to avoid confounding.
For example, ice cream consumption and murder rates are
highly correlated.
Now, does ice cream incite murder or does murder increase
the demand for ice cream?
Neither: they are joint effects of a common cause or lurking
variable, namely, hot weather. Another look at the sample
shows that it failed to account for the time of year, including
the fact that both rates rise in the summertime.
What could be a lurking variable in these
examples?
• There is a strong positive correlation between
the foot length of K-12 students and reading
scores.
• Students who use tutors have lower test scores
than students who don’t.
• A survey shows a strong positive correlation
between the percentage of a country's
inhabitants that use cell phones and the life
expectancy in that country
The Language of Experiments
An experiment is a statistical study in which we actually do
something (a treatment) to people, animals, or objects (the
experimental units) to observe the response. Here is the
basic vocabulary of experiments.
Definition:
A specific condition applied to the individuals in an experiment is
called a treatment. If an experiment has several explanatory
variables, a treatment is a combination of specific values of these
variables.
The experimental units are the smallest collection of individuals
to which treatments are applied. When the units are human
beings, they often are called subjects.
Sometimes, the explanatory variables in an experiment are called factors.
Many experiments study the joint effects of several factors. In such an
experiment, each treatment is formed by combining a specific value (often
called a level) of each of the factors.
Experiments are the preferred method for examining the effect
of one variable on another. By imposing the specific treatment
of interest and controlling other influences, we can pin down
cause and effect. Good designs are essential for effective
experiments, just as they are for sampling.
How to Experiment Badly
Example, page 236
A high school regularly offers a review course to
prepare students for the SAT. This year, budget cuts
will allow the school to offer only an online version of
the course. Over the past 10 years, the average SAT
score of students in the classroom course was 1620.
The online group gets an average score of 1780.
That’s roughly 10% higher than the long-time
average for those who took the classroom review
course. Is the online course more effective?
Students -> Online Course -> SAT Scores
Many laboratory experiments use a design like the one in the
online SAT course example:
Experimental Units -> Treatment -> Measure Response
Students -> Online Course -> SAT Scores
In the lab environment, simple designs often work well.
Field experiments and experiments with animals or people deal
with more variable conditions.
Outside the lab, badly designed experiments often yield
worthless results because of confounding.
How to Experiment Well: The Randomized Comparative Experiment
The remedy for confounding is to perform a comparative
experiment in which some units receive one treatment and
similar units receive another. Most well designed experiments
compare two or more treatments.

Comparison alone isn’t enough. If the treatments are given to
groups that differ greatly, bias will result. The solution to the
problem of bias is random assignment.
Definition:
In an experiment, random assignment means that
experimental units are assigned to treatments at
random, that is, using some sort of chance process.
ACU #
Does Caffeine Affect Pulse Rate?
Many students regularly consume caffeine to help them stay alert. Thus, it
seems plausible that taking caffeine might increase an individual’s pulse rate.
Is this true? One way to investigate this is to have volunteers measure their
pulse rates, drink some cola with caffeine, measure their pulses again after
10 minutes and calculate the increase in pulse rate. Unfortunately, even if
every student’s pulse rate went up, we couldn’t attribute the increase to caffeine.
Suppose you have a class of 30 students who volunteer to be subjects in
the caffeine experiment described earlier.
Problem: Explain how you would randomly assign 15 students to each of
the two treatments.
Solution: Using 30 identical slips of paper, write A on 15 and B on the other
15. Mix them thoroughly in a hat and have each student select one paper.
Have each student who received an A drink the cola with caffeine and each
student who received a B drink the cola without caffeine.
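The hat-and-slips method has a direct software analogue: shuffle the list of subjects and deal them into groups. In this sketch the student labels, the treatment names, and the function name randomly_assign are assumptions for illustration, and the code assumes the number of subjects divides evenly among the treatments.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them into equal-sized treatment groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # the chance process, like mixing slips in a hat
    group_size = len(shuffled) // len(treatments)  # assumes an even split
    return {
        treatment: shuffled[i * group_size:(i + 1) * group_size]
        for i, treatment in enumerate(treatments)
    }

students = [f"student-{i:02d}" for i in range(1, 31)]  # hypothetical labels
groups = randomly_assign(students, ["A: caffeine", "B: no caffeine"], seed=3)
print({treatment: len(members) for treatment, members in groups.items()})
# prints {'A: caffeine': 15, 'B: no caffeine': 15}
```

Passing three treatment names and 90 subjects would give three groups of 30 in the same way.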
The Randomized Comparative Experiment
Definition:
In a completely randomized design, the treatments are
assigned to all the experimental units completely by chance.
Some experiments may include a control group that receives
an inactive treatment or an existing baseline treatment.
Experimental Units -> Random Assignment -> Group 1 -> Treatment 1 -> Compare Results
Experimental Units -> Random Assignment -> Group 2 -> Treatment 2 -> Compare Results
ACU # Dueling Diets
A health organization wants to know if a low-carb or low-fat diet is more
effective for long-term weight loss. The organization decides to conduct an
experiment to compare these two diet plans with a control group that is only
provided with a brochure about healthy eating. Ninety volunteers agree to
participate in the study for one year.
Problem: Outline a completely randomized design for this experiment. Write a
few sentences describing how you would implement your design.
Solution: Here is a basic outline:
Random Assignment -> Group 1 (30 subjects) -> Trt 1 (low-carb) -> compare weight loss
Random Assignment -> Group 2 (30 subjects) -> Trt 2 (low-fat) -> compare weight loss
Random Assignment -> Group 3 (30 subjects) -> Trt 3 (control) -> compare weight loss
To implement the design, use 90 equally sized slips of paper. Label 30 of the slips “1”, 30 of the
slips “2” and 30 of the slips “3”. Then, mix them up in a hat and have each subject draw a
number without looking. The number that each subject chooses will be the group he or she is
assigned to. At the end of the year, the amount of weight loss will be recorded for each subject
and the mean weight loss will be compared for the three treatments.
Randomized comparative experiments are designed to give
good evidence that differences in the treatments actually
cause the differences we see in the response.
Three Principles of Experimental Design
1. Control for lurking variables that might affect the response: Use a
comparative design and ensure that the only systematic difference
between the groups is the treatment administered.
2. Random assignment: Use impersonal chance to assign experimental
units to treatments. This helps create roughly equivalent groups of
experimental units by balancing the effects of uncontrolled lurking
variables across the treatment groups.
3. Replication: Use enough experimental units in each group so that any
differences in the effects of the treatments can be distinguished from
chance differences between the groups.
More Caffeine
Problem: Explain how to use all three principles of experimental design in the
caffeine experiment.
Solution:
Control:
There should be a control group that receives non-caffeinated cola. Also, the
subjects in each group should receive exactly the same amount of cola served
at the same temperature. Also, each type of cola should look and taste exactly
the same and have the same amount of sugar. Subjects should drink the cola
at the same rate and wait the same amount of time before measuring their
pulse rates. If all of these lurking variables are controlled, they will not be
confounded with caffeine or be an additional source of variability in pulse rates.
Randomization:
Subjects should be randomly assigned to one of the two treatments. This
should roughly balance out the effects of the lurking variables we cannot
control, such as body size, caffeine tolerance, and the amount of food recently
eaten.
Replication:
We want to use as many subjects as possible to help make the treatment
groups as equivalent as possible. This will give us a better chance to see the
effects of caffeine, if there are any.

Example: The Physicians’ Health Study
Read the description of the Physicians’ Health Study on page
241. Explain how each of the three principles of experimental
design was used in the study.
A placebo is a “dummy pill” or inactive
treatment that is indistinguishable from the real
treatment.
Experiments: What Can Go Wrong?
The logic of a randomized comparative experiment depends
on our ability to treat all the subjects the same in every way
except for the actual treatments being compared.

Good experiments, therefore, require careful attention to
details to ensure that all subjects really are treated identically.
A response to a dummy treatment is called a placebo effect. The
strength of the placebo effect is a strong argument for randomized
comparative experiments.
Whenever possible, experiments with human subjects should be
double-blind.
Definition:
In a double-blind experiment, neither the subjects nor those
who interact with them and measure the response variable
know which treatment a subject received.
The caffeine experiment can be conducted in a double-blind
manner. It is very important that the subjects be
unaware of which treatment they are receiving. To make
sure they are blind, the two colas must look and taste
exactly the same. To be double-blind, the people
measuring the pulse rates and interacting with the
subjects should also be unaware of which subjects
are getting which treatments. To make this happen,
have the subjects measure their own pulse since they are
already blind. Also have another teacher prepare the
colas with labels A and B and then leave the room before
anyone else gets there. Only after the experiment is
complete will this teacher come back to reveal which
treatment is which.
Inference for Experiments
In an experiment, researchers usually hope to see a difference
in the responses so large that it is unlikely to happen just
because of chance variation.

We can use the laws of probability, which describe chance
behavior, to learn whether the treatment effects are larger than
we would expect to see if only chance were operating.

If they are, we call them statistically significant.
Definition:
An observed effect so large that it would rarely occur by chance is
called statistically significant.
A statistically significant association in data from a well-designed
experiment does imply causation.
Growing Tomatoes
Does adding fertilizer affect the productivity of tomato plants?
How about the amount of water given to the plants? To answer
these questions, a gardener plants 24 similar tomato plants in
identical pots in his greenhouse. He will add fertilizer to the soil
in half of the pots. Also, he will water 8 of the plants with 0.5
gallons of water per day, 8 of the plants with 1 gallon of water per
day and the remaining 8 plants with 1.5 gallons of water per day.
At the end of three months he will record the total weight of
tomatoes produced on each plant.
Problem: Identify the explanatory and response variables,
experimental units, and list all the treatments.
Solution: The two explanatory variables are fertilizer and water.
The response variable is the weight of tomatoes produced.
The experimental units are the tomato plants.
There are 6 treatments:
(1) fertilizer, 0.5 gallon
(2) fertilizer, 1 gallon
(3) fertilizer, 1.5 gallons
(4) no fertilizer, 0.5 gallons
(5) no fertilizer, 1 gallon
(6) no fertilizer, 1.5 gallons
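The six treatments are every combination of the two fertilizer levels with the three water levels. The sketch below lists them and then deals the 24 plants out at random. Placing exactly 4 plants on each treatment is an assumption; it is consistent with "half of the pots" getting fertilizer and 8 plants per water level, but the slide does not state it. The plant numbering is also hypothetical.

```python
import random
from itertools import product

fertilizer_levels = ["fertilizer", "no fertilizer"]
water_levels = [0.5, 1.0, 1.5]  # gallons of water per day

# Each treatment is one (fertilizer, water) combination: 2 x 3 = 6 treatments.
treatments = list(product(fertilizer_levels, water_levels))
print(len(treatments))

# Assumption: 24 plants numbered 1-24, 4 plants per treatment.
plants = list(range(1, 25))
rng = random.Random(0)  # fixed seed so the sketch is repeatable
rng.shuffle(plants)
assignment = {
    treatment: sorted(plants[i * 4:(i + 1) * 4])
    for i, treatment in enumerate(treatments)
}
for treatment, plant_ids in assignment.items():
    print(treatment, plant_ids)
```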

Completely randomized designs are the simplest statistical
designs for experiments. But just as with sampling, there are times
when the simplest method doesn’t yield the most precise results.
Definition
A block is a group of experimental units that are known before
the experiment to be similar in some way that is expected to
affect the response to the treatments.
In a randomized block design, the random assignment of
experimental units to treatments is carried out separately within
each block.
Form blocks based on the most important unavoidable sources
of variability (lurking variables) among the experimental units.
Randomization will average out the effects of the remaining
lurking variables and allow an unbiased comparison of the
treatments.
Control what you can, block on what you can’t
control, and randomize to create comparable groups.
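A randomized block design repeats the random assignment separately inside each block. A minimal sketch, assuming two hypothetical blocks with made-up unit labels and two generic treatments (none of these names come from the slides):

```python
import random
from collections import Counter

def randomized_block_assignment(units_by_block, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in units_by_block.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomize only within this block
        for position, unit in enumerate(shuffled):
            # Deal treatments out in turn so each block is split evenly.
            assignment[unit] = treatments[position % len(treatments)]
    return assignment

# Hypothetical blocks of similar experimental units.
units_by_block = {
    "block_1": ["u1", "u2", "u3", "u4"],
    "block_2": ["u5", "u6", "u7", "u8", "u9", "u10"],
}
assignment = randomized_block_assignment(
    units_by_block, ["treatment_A", "treatment_B"], seed=2
)
for block, units in units_by_block.items():
    print(block, Counter(assignment[u] for u in units))
```

Every block contributes equally to both treatment groups, so the blocking variable cannot be confounded with the treatments.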
+
Blocking
Experiments


Matched-Pairs Design
Definition
A matched-pairs design is a randomized blocked experiment
in which each block consists of a matching pair of similar
experimental units.
Chance is used to determine which unit in each pair gets each
treatment.
Sometimes, a “pair” in a matched-pairs design consists of a
single unit that receives both treatments. Since the order of the
treatments can influence the response, chance is used to
determine which treatment is applied first for each unit.
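Both versions of a matched-pairs design come down to a coin flip per pair. The sketch below shows the two cases; the function names, unit labels, and treatment labels are all assumptions made for illustration.

```python
import random

def assign_matched_pairs(pairs, treatments=("treatment_1", "treatment_2"), seed=None):
    """For each pair of similar units, let chance decide who gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for unit_a, unit_b in pairs:
        order = list(treatments)
        rng.shuffle(order)  # the "coin flip" for this pair
        assignment[unit_a], assignment[unit_b] = order
    return assignment

def assign_treatment_order(units, treatments=("treatment_1", "treatment_2"), seed=None):
    """When one unit receives both treatments, let chance decide which comes first."""
    rng = random.Random(seed)
    orders = {}
    for unit in units:
        order = list(treatments)
        rng.shuffle(order)
        orders[unit] = tuple(order)
    return orders

# Hypothetical pairs of similar units, and hypothetical single units.
pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
print(assign_matched_pairs(pairs, seed=3))
print(assign_treatment_order(["s1", "s2", "s3"], seed=3))
```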
Experiments
A common type of randomized block design for comparing two
treatments is a matched pairs design. The idea is to create
blocks by matching pairs of similar experimental units.
+
 Matched-Pairs
Example: Standing and Sitting Pulse Rate
Consider the Fathom dotplots from a completely randomized
design and a matched-pairs design. What do the dotplots
suggest about standing vs. sitting pulse rates?
DISTRACTED DRIVING
Are these results statistically significant?
How many subjects in the experiment?
How many drivers stopped at the rest area?
How many drivers did not stop?
We need 48 cards from the deck to represent the drivers.
Since we’re assuming that the treatment received won’t change whether each
driver stops at the rest area, we use 33 cards to represent drivers who stop and
15 cards to represent those who don’t.
Remove the ace of spades and any three of the 2s from the deck.
Stop: All cards with denominations 2 through 10 (36 - 3 missing 2s = 33)
Don’t stop: All jacks, queens, kings, and aces (16 - 1 missing ace = 15)
Activity:
DISTRACTED DRIVING
Are these results statistically significant?
To find out, let’s see what would happen just by chance if we randomly
reassign the 48 people in this experiment to the two groups many times,
assuming the treatment received doesn’t affect whether a driver stops at
the rest area.
POD roles for the activity (switch roles with each trial by going clockwise):
Shuffler/Dealer
Flipper
Recorder/Dot plotter
If there are only two in the pod:
Shuffler/Dealer/Flipper
Recorder/Dot plotter
DISTRACTED DRIVING
Are these results statistically significant?
Remove the ace of spades and any three of the
2s from the deck.
Stop: All cards with denominations 2
through 10 (36 - 3 missing 2s = 33)
Don’t stop: All jacks, queens, kings, and
aces (16 - 1 missing ace = 15)
• Shuffle and deal two piles of 24 cards each – the first pile represents the
cell phone group and the second pile represents the passenger group. The
shuffling reflects our assumption that the outcome for each subject is not
affected by the treatment. Record the number of drivers who fail to stop at
the rest area in each group.
• Repeat this process 9 more times so that you have a total of 10 trials.
• Make a class dotplot of the number of drivers in the cell phone group who failed to
stop at the rest area in each trial.
DISTRACTED DRIVING
Are these results statistically significant?
In what percent of the class’s trials did 12 or more
people in the cell phone group fail to stop at the rest
area?
In the original experiment, 12 of the 24 drivers using cell
phones didn’t stop at the rest area. Based on the class’s
simulation results, how surprising would it be to get a
result this large or larger simply due to the chance
involved in the random assignment?
Is the result statistically significant?
What conclusion would you draw about whether talking on a cell phone is more
distracting than talking to a passenger?
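The card activity can also be run on a computer. The sketch below reshuffles the 48 outcomes (15 drivers who did not stop, 33 who did) into two groups of 24 many times and records how often the cell phone group ends up with 12 or more drivers who did not stop. The number of trials, the seed, and the function names are assumptions. The exact calculation at the end is an addition that is not in the slides: it counts the possible deals directly instead of simulating them.

```python
import random
from math import comb

N_SUBJECTS = 48       # drivers in the experiment
N_DID_NOT_STOP = 15   # drivers who failed to stop at the rest area
GROUP_SIZE = 24       # drivers in the cell phone group
OBSERVED = 12         # cell phone drivers who failed to stop

def simulate_trial(rng):
    """One shuffle-and-deal: count 'did not stop' cards in the cell phone pile."""
    cards = [1] * N_DID_NOT_STOP + [0] * (N_SUBJECTS - N_DID_NOT_STOP)
    rng.shuffle(cards)
    return sum(cards[:GROUP_SIZE])

def estimate_p_value(n_trials=10_000, seed=0):
    """Proportion of trials at least as extreme as the observed result."""
    rng = random.Random(seed)
    hits = sum(simulate_trial(rng) >= OBSERVED for _ in range(n_trials))
    return hits / n_trials

def exact_p_value():
    """Same probability found by counting all equally likely deals."""
    n_stopped = N_SUBJECTS - N_DID_NOT_STOP
    favorable = sum(
        comb(N_DID_NOT_STOP, k) * comb(n_stopped, GROUP_SIZE - k)
        for k in range(OBSERVED, min(N_DID_NOT_STOP, GROUP_SIZE) + 1)
    )
    return favorable / comb(N_SUBJECTS, GROUP_SIZE)

print("simulated:", estimate_p_value())
print("exact:", exact_p_value())
```

A small proportion means a result of 12 or more would rarely happen through random assignment alone, which is what "statistically significant" means here.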
+
Chapter 4
Designing Studies
 4.1
Samples and Surveys
 4.2
Experiments
 4.3
Using Studies Wisely
Scope of Inference
What type of inference can be made from a particular study?
The answer depends on the design of the study.
Well-designed experiments randomly assign individuals to
treatment groups. However, most experiments don’t select
experimental units at random from the larger population. That
limits such experiments to inference about cause and effect.
Observational studies don’t randomly assign individuals to
groups, which rules out inference about cause and effect.
Observational studies that use random sampling can make
inferences about the population.
Silence is Golden?
Many students insist that they study better when
listening to music. A teacher doubts this claim and
suspects that listening to music actually hurts academic
performance. Here are four possible study designs to
address this question at your school. In each case, the
response variable will be the students’ GPA at the end
of the semester.
1. Get all of the students in your AP Statistics class to participate in the study. Ask them
whether or not they study with music on and divide them into two groups based on their
answer to this question.
For each design, suppose that the mean GPA for students who listen to music was
significantly lower than the mean GPA of students who didn’t listen to music.
Problem: What can we conclude for each design?
1. With no random selection, the results of the study should only be applied to the AP
Statistics students in the study. With no random assignment, we should not conclude
anything about cause-and-effect. All we can conclude is that AP Stats students who
listen to music while studying have lower GPAs than those who do not listen to music.
We don’t know why and we can’t apply these results to any larger group of students.
2. Select a random sample of students from your school to participate in a study. Then,
divide them into two groups as in Design 1.
For each design, suppose that the mean GPA for students who listen to music was
significantly lower than the mean GPA of students who didn’t listen to music.
Problem: What can we conclude for each design?
2. With random selection, the results of the study can be applied to the entire population (in this
case, all the students at this school). With no random assignment, however, we should not
conclude anything about cause-and-effect. All we can conclude is that students at this school
who listen to music while studying have lower GPAs than those who do not listen to music. We
don’t know why their GPAs are lower, however.
3. Get all of the students in your AP Statistics class to participate in a study. Randomly
assign half of the students to listen to music while studying for the entire semester and
have the remaining half abstain from listening to music while studying.
For each design, suppose that the mean GPA for students who listen to music was
significantly lower than the mean GPA of students who didn’t listen to music.
Problem: What can we conclude for each design?
3. With no random selection, the results of the study should only be applied to the AP
Statistics students in the study. With random assignment, however, we can conclude that
there is a cause-and-effect relationship between listening to music while studying and
GPA, but only for the AP Statistics students who took part in the study.
4. Select a random sample of students from your school to participate in a study. Randomly
assign half of the students to listen to music while studying for the entire semester and
have the remaining half abstain from listening to music while studying.
For each design, suppose that the mean GPA for students who listen to music was
significantly lower than the mean GPA of students who didn’t listen to music.
Problem: What can we conclude for each design?
4. With random selection, the results of the study can be applied to the entire population
(in this case, all the students at this school). With random assignment, we can conclude
that there is a cause-and-effect relationship between listening to music while studying and
GPA for all the students at the school.
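The four answers follow one rule: random selection decides whether results generalize to the population, and random assignment decides whether cause and effect can be concluded. A small sketch of that rule, with a hypothetical function name and dictionary keys chosen for illustration:

```python
def scope_of_inference(random_selection, random_assignment):
    """Return which kinds of inference a study design supports."""
    return {
        "generalize_to_population": random_selection,
        "cause_and_effect": random_assignment,
    }

# The four "Silence is Golden?" designs as (random selection, random assignment).
designs = {
    1: (False, False),  # AP Statistics class, students choose their own group
    2: (True, False),   # random sample, students choose their own group
    3: (False, True),   # AP Statistics class, groups randomly assigned
    4: (True, True),    # random sample, groups randomly assigned
}
for number, (selection, assignment) in designs.items():
    print(number, scope_of_inference(selection, assignment))
```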
The Challenges of Establishing Causation
A well-designed experiment tells us that changes in the
explanatory variable cause changes in the response variable.
Lack of realism can limit our ability to apply the conclusions of an
experiment to the settings of greatest interest.
Animal Testing and Lack of Realism
When new products are being developed for use by humans, the products are often tested on
animals first. While animals share some physiological features with humans, they are not the
same and we should always be cautious applying the results of tests on animals to humans.
In some cases it isn’t practical or ethical to do an experiment.
Consider these questions:



Does texting while driving increase the risk of having an accident?
Does going to church regularly help people live longer?
Does smoking cause lung cancer?
It is sometimes possible to build a strong case for causation in
the absence of experiments by considering data from
observational studies.
The Challenges of Establishing Causation
When we can’t do an experiment, we can use the following
criteria for establishing causation.
• The association is strong.
• The association is consistent.
• Larger values of the explanatory variable are associated with
stronger responses.
• The alleged cause precedes the effect in time.
• The alleged cause is plausible.
Discuss how each of these criteria applies
to the observational studies of the
relationship between smoking and lung
cancer.
Data Ethics
Complex issues of data ethics arise when we collect data from
people. Here are some basic standards of data ethics that
must be obeyed by all studies that gather data from human
subjects, both observational studies and experiments.
Basic Data Ethics
• All planned studies must be reviewed in advance by an
institutional review board charged with protecting the safety
and well-being of the subjects.
• All individuals who are subjects in a study must give their
informed consent before data are collected.
• All individual data must be kept confidential. Only statistical
summaries for groups of subjects may be made public.