Chapter 4
Gathering Data
1
Looking Back





In Chapters 2 & 3 we learned how to describe data
both graphically and numerically.
For these statistical analyses to be useful, we must
have good data.
In fact, the way a study is designed (how we gather
data) can have a major impact on the results of the
study.
The purpose of this course is for you to learn what
you can conclude about an entire population given a
sample from that population.
If a study is poorly designed and implemented, the
results may be meaningless or misleading.
2
Two Scenarios

Study 1


A U.S. study (2000) compared 469 patients with brain
cancer to 422 patients who did not have brain cancer. The
patients’ cell phone use was measured using a
questionnaire. The two groups’ use of cell phones was
similar.
Study 2

An Australian study (1997) used 200 transgenic mice. One
hundred were exposed for two 30-minute periods a day to
microwaves of the same kind and roughly the same power
as those transmitted from a cell phone. The other 100 mice
were not exposed. After 18
months, the brain tumor rate for the exposed mice was
twice as high as that for the unexposed mice.
Example taken from Statistics: The Art and
Science of Learning from Data
3
Questions to Consider

How do the two studies differ?

Study 1



No treatments assigned
Patients merely questioned
Study 2

Uses mice in hopes of generalizing to humans
6
Questions to Consider

Why do the results of different medical
studies sometimes disagree?


Differing types of studies, data collection or
sample frames
Could the second study be performed on
human beings?

No, because it would be unethical to knowingly
expose humans to possibly harmful waves.
9
Questions to Consider

Suppose a friend recently diagnosed with
brain cancer was a frequent cell phone user.
Is this strong evidence that frequent cell
phone use increases the likelihood of getting
brain cancer?


Informal observations of this type are called
anecdotal evidence.
You should rely on reputable research studies, not
anecdotes.
11
Two Main Ways to Gather Data

Observational Study



The researcher observes values of the response and
explanatory variables for the sampled subjects without
imposing any treatments
Example: Study 1
Experiment



The researcher assigns experimental conditions (also
called treatments) to subjects (also called experimental
units) and then observes outcomes on the response
variable.
Treatments correspond to values of the explanatory
variable
Example: Study 2
14
Advantages of Experiments
over Observational Studies




In an observational study, there can always be
lurking variables affecting the results.
This means that observational studies can never
show causation.
It is easier to adjust for lurking variables in an
experiment.
In general, we can study the effect of an explanatory
variable on a response variable more accurately
with an experiment than with an observational study.
16
Disadvantages of Experiments





They can be unethical to perform on the subjects in
which you are interested.
It can be difficult to monitor subjects to ensure that
they are doing what they are told.
They can take many years, even decades, to
complete.
Results of experiments that use animals may not
generalize to humans.
They are unnecessary when the question of interest
does not involve trying to assess causality.
20
Example 4.1

A large study of student drug use and how it
depends on drug testing enrolled 76,000 middle and
high school students. Each student in the study
filled out a questionnaire. One question asked
whether the student used drugs. The study found
that drug use was not affected by student drug
testing.

This is an example of an observational study.

Could there be any lurking variables?

Frequency of drug testing, whether testing is random, etc.
Example taken from Statistics: The Art and
Science of Learning from Data
23
Example 4.2

A researcher buys seeds of two different varieties of
corn. He randomly selects 30 seeds of each variety
and plants them in his backyard, making sure to
label the location of each seed and its type. He then
measures how long it takes each seed to sprout. At
the end of the study he compares the average
germination time of the different varieties.

This is an example of an experiment.

Could there be any lurking variables?

Soil quality, temperature
Used with permission from Dr. Ellen Toby
26
Example 4.3

A researcher has seeds of only one variety of tomato. She has
60 nearly identical pots of soil and plants one tomato seed in
each. She randomly selects 30 pots and keeps them at 75° F.
The other 30 pots she keeps at 65° F. Aside from temperature,
she provides the same growing conditions to all pots. She then
measures how long it takes for the seeds to sprout. At the end of
the study she compares the average germination time of the
different temperature groups.

This is an example of an experiment.

Are there any lurking variables?

No, everything has been controlled here.
Used with permission from Dr. Ellen Toby
29
Types of Observational
Studies

Retrospective

Observational studies that look back in time
This is sometimes done to find risk factors for certain
diseases
Cross-Sectional

Observational studies that take a cross section of
the population at the current time
Prospective

Observational studies in which subjects are
followed into the future
30
Sampling Designs for
Observational Studies

Simple Random Sampling (SRS)

A simple random sample of n subjects from a
population is one in which each possible sample
of that size has the same chance of being
selected.
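
The definition above can be sketched in Python. This is only an illustration: the sampling frame of 100 subject IDs, the sample size of 5, and the helper name draw_srs are assumptions, not part of the notes. The standard library's random.sample draws without replacement, so every possible subset of size n has the same chance of being selected.

```python
import random

def draw_srs(population, n, seed=None):
    """Return a simple random sample of n subjects, drawn without replacement."""
    rng = random.Random(seed)
    # Every possible subset of size n is equally likely to be returned
    return rng.sample(population, n)

# Hypothetical sampling frame: subject IDs 1 through 100
frame = list(range(1, 101))
sample = draw_srs(frame, 5, seed=1)
print(sample)
```

Passing a seed only makes the draw repeatable for demonstration; a real study would not reuse a fixed seed.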
32
Sampling Designs for
Observational Studies

Stratified Sampling

A stratified random sample divides the population
into separate groups, called strata, and then
selects an SRS of subjects from each stratum.
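
A minimal sketch of stratified sampling, assuming a made-up population of 40 students split by class year. The function name stratified_sample and the choice of 3 subjects per stratum are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Divide subjects into strata, then draw an SRS from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[stratum_of(s)].append(s)
    # A separate simple random sample is taken inside every stratum
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: (id, class year) pairs
students = [(i, "freshman" if i % 2 else "senior") for i in range(1, 41)]
picked = stratified_sample(students, lambda s: s[1], 3, seed=2)
print(picked)
```

Because each stratum is sampled separately, every group is guaranteed to appear in the sample.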
34
Sampling Designs for
Observational Studies

Cluster Sampling

A cluster random sample can be used if the target
population naturally divides into groups, each of which is
representative of the entire target population. In this
method, an SRS of groups (called clusters) is taken. Every
member of the selected groups is put into the sample.
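
A minimal sketch of cluster sampling. The dormitory clusters and the function name cluster_sample are hypothetical. Note the contrast with stratified sampling: here the groups themselves are sampled, and everyone in a chosen group is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of clusters and keep every member of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster labels
    # Every member of each selected cluster is put into the sample
    sample = [member for c in chosen for member in clusters[c]]
    return chosen, sample

# Hypothetical clusters: residents grouped by dormitory
dorms = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1"],
}
chosen, sample = cluster_sample(dorms, 2, seed=3)
print(chosen, sample)
```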
36
Sampling Designs for
Observational Studies

Systematic Sampling

A systematic sample selects every kth person from
the sample frame. The researcher randomly
selects a number between 1 and k in order to
know which person to select first, then selects
every kth person after this.
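
The procedure above can be sketched as follows, assuming a frame of 100 subject IDs and k = 10 (both invented for illustration; systematic_sample is a hypothetical helper name).

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start between 1 and k, then take every kth subject."""
    rng = random.Random(seed)
    start = rng.randint(1, k)  # randint includes both endpoints
    # Positions start, start + k, start + 2k, ... counted from 1,
    # which is index start - 1 in Python's 0-based lists
    return frame[start - 1::k]

# Hypothetical sampling frame: subject IDs 1 through 100
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=4)
print(sample)
```

With 100 subjects and k = 10, any starting point yields exactly 10 subjects spaced 10 apart.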
37
Advantages of the Various
Sampling Designs

Simple Random Sampling (SRS)



It is the easiest, most widespread form of
sampling.
Each subject has an equal chance to be in the
sample.
The sample enables us to determine how likely it
is that descriptive statistics (like the sample mean)
fall close to corresponding values for which we
would like to make inference (like the population
mean).
39
Advantages of the Various
Sampling Designs

Stratified Sampling


It ensures that there are enough subjects in each
group that you want to compare.
Cluster Sampling


It does not require a sampling frame of subjects.
It is less expensive to implement.
42
Bias in Sampling

A sampling method is biased if



The sample tends to favor some parts of the
population over others.
In other words, the results from the sample are
not representative of the population.
Obviously, unbiased samples are our goal.
45
Types of Bias

Undercoverage

Occurs when a sampling frame leaves out some groups in
the population
Nonresponse bias

Occurs when some sampled subjects cannot be reached,
refuse to participate or fail to answer some questions
Response bias

Occurs when the subject gives an incorrect response or
when the question wording or the way the interviewer asks
the questions is confusing or misleading
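
A small simulation can show how undercoverage distorts an estimate. Everything here is invented for illustration: a population of 10,000 voters, a sampling frame that covers only 40% of them, and support rates of 60% inside the frame and 30% outside it.

```python
import random

# Hypothetical population of 10,000 voters; all numbers are invented.
# The first 4,000 are in the sampling frame, the other 6,000 are left out.
population = []
for i in range(10_000):
    in_frame = i < 4_000
    # 60% of covered voters and 30% of uncovered voters support candidate X
    supports_x = (i % 10 < 6) if in_frame else (i % 10 < 3)
    population.append((in_frame, supports_x))

def share_for_x(voters):
    """Proportion of the given voters who support candidate X."""
    return sum(v[1] for v in voters) / len(voters)

rng = random.Random(5)
frame = [v for v in population if v[0]]   # frame that leaves out a group
biased = rng.sample(frame, 1_000)         # SRS drawn from the flawed frame
fair = rng.sample(population, 1_000)      # SRS drawn from the whole population

print(share_for_x(population))  # true support: 0.42
print(round(share_for_x(biased), 2))
print(round(share_for_x(fair), 2))
```

The sample from the incomplete frame tends to land near 0.60 instead of the true 0.42, no matter how large it is, because the omitted group never has a chance to be selected.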
46
Examples of Poor Samples
that Result in Bias

Convenience Samples



Sampling friends
Sampling at the mall
Voluntary Response Samples


Internet surveys
Call-in surveys
49
Example 4.4

In her 1987 book Women and Love, Shere Hite
presented results of a survey mailed to 100,000
women in the United States. One of her
conclusions was that 70% of women who had been
married at least five years have extramarital affairs.
She based this conclusion on the replies of only
4500 women.

This is an example of nonresponse bias.
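
The size of the nonresponse problem follows directly from the two counts given in the example. A short calculation (variable names are illustrative):

```python
mailed = 100_000    # surveys mailed, from the example
returned = 4_500    # replies received, from the example

response_rate = returned / mailed
nonresponse_rate = 1 - response_rate

print(f"{response_rate:.1%}")     # 4.5%
print(f"{nonresponse_rate:.1%}")  # 95.5%
```

Fewer than one in twenty of the women contacted replied, so the 70% figure describes only those who chose to respond.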
Example taken from Statistics: The Art and
Science of Learning from Data
51
Example 4.5

Ann Landers asked readers, “If you had it to do over
again, would you have children?” A few weeks later,
her column was headlined, “70% OF PARENTS SAY
KIDS NOT WORTH IT.” Of the nearly 10,000
parents who wrote in, 70% said they would not have
children if they could go back in time.

This is an example of voluntary response sampling.
Used with permission from Dr. Ellen Toby
53
Example 4.6

In 1936, the Literary Digest conducted a poll to
predict the winner of the presidential election. Alf
Landon and Franklin Roosevelt were both running
for president. The sample frame for the poll was
constructed from telephone directories, country club
memberships and automobile registrations. The
Digest predicted that Landon would win, but in
reality FDR won by a landslide.

This is an example of convenience sampling that
resulted in undercoverage.
Example taken from Statistics: The Art and
Science of Learning from Data
55
Example 4.7


An experiment involving adolescent males (ages 15 to 19) appeared in Science, 1995. The purpose of the
study was to determine whether there was an
association between survey techniques and the
desire to give socially acceptable answers.
The participants were randomly assigned to one of
two different survey forms, each of which had
identical questions concerning sexual practices and
drug habits.
Used with permission from Dr. Ellen Toby
56
Example 4.7

The two versions of the survey were


Paper: participants put their answers in an envelope
marked with an ID number and returned it in person.
Computer: participants listened to questions in
headphones and then answered on laptops.
57
Types of Experimental Studies

Completely Randomized Design


The subjects are randomly assigned to one of the
treatments.
Matched Pairs Design



Each subject is matched up with another subject who is
similar in terms of age, health, etc.
 This creates a matched pair.
The treatments are then randomly assigned to the subjects
in each pair.
This ensures that the treatment groups are essentially
identical.
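
A minimal sketch of the two assignment schemes above. The subject labels, the treatment names, and both function names are assumptions made for illustration; the pairs are taken as already matched.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

def matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Within each matched pair, randomly decide who gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # the equivalent of a coin flip inside the pair
        assignment[first], assignment[second] = order
    return assignment

# Hypothetical subjects and pre-matched pairs
subjects = [f"s{i}" for i in range(1, 9)]
crd = completely_randomized(subjects, ["drug", "placebo"], seed=6)
pairs = [("s1", "s2"), ("s3", "s4"), ("s5", "s6"), ("s7", "s8")]
mp = matched_pairs(pairs, seed=6)
print(crd)
print(mp)
```

In the matched pairs version each pair always contributes one subject to each treatment, which is what keeps the two treatment groups so similar.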
60
Types of Experimental Studies

Crossover Design


The subjects cross over during the experiment from one
treatment to another.
Randomized Block Design


Similar subjects are matched up to create a large set of
experimental units.
 This is called a block.
The treatments are then randomly assigned to units within
the blocks.
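
A minimal sketch of a randomized block design, assuming two hypothetical age blocks of four units each and two treatments. The helper name randomized_block is illustrative.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments to units separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = units[:]
        rng.shuffle(shuffled)  # the randomization never crosses block lines
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks of similar subjects
blocks = {
    "under 40": ["u1", "u2", "u3", "u4"],
    "40 and over": ["o1", "o2", "o3", "o4"],
}
plan = randomized_block(blocks, ["drug", "placebo"], seed=7)
print(plan)
```

Each block ends up with the same number of units on every treatment, so the blocking variable cannot be confused with the treatment effect.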
62
Elements of a Good
Experiment

Control group


Allows us to compare against an existing treatment
Enables us to control the placebo effect


The placebo effect occurs when patients seem to improve
regardless of the treatment they receive.
Randomization



Eliminates bias that can result when researchers assign
treatments to the subjects
Balances the group on variables that you know affect the
response
Balances the group on lurking variables that may be
unknown to you
66
Elements of a Good
Experiment

Blinding

Increases reliability of the results



Single-blind: subjects do not know the treatment
assignment
Double-blind: neither the subjects nor those in
contact with the subjects know the treatment
assignment
Replication

Assigns several experimental units to each
treatment
70
Example 4.9

A pharmaceutical company has developed a new drug for treating
high blood pressure. To determine the effectiveness of the drug, the
company conducted an experiment in which subjects with a history
of high blood pressure were treated with the new drug.

A later experiment randomly divided subjects with a history of high
blood pressure into two groups. Group A was treated with the new
drug as before. Group B received the most popular drug on the
market at that time. The subjects were unaware of which treatment
they received. 60% of the patients in Group A improved, while 63%
of the patients in Group B improved.

The second experiment is better because it employs a control group
and blinding.
72
Example 4.10

To investigate whether antidepressants help smokers to quit
smoking, one study used 429 men and women who were 18 or older
and had smoked 15 cigarettes or more per day in the previous year.
They were all highly motivated to quit and in good health. They
were assigned to one of two groups: one group took an
antidepressant called Zyban, while the other group did not take
anything. At the end of a year, the study observed whether each
subject had successfully abstained from smoking.
Example taken from Statistics: The Art and
Science of Learning from Data
73
Logic Behind Randomized
Comparative Experiments




Randomization ensures that the groups of subjects
are similar in all respects before the treatments are
applied.
Using a control group for comparison ensures that
external influences operate equally on both groups.
If the groups are large enough, natural differences in
subjects will average out.
This means that there will be little difference in the
results for the groups unless the treatments
themselves actually cause the difference.
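
The claim that randomization makes large groups similar can be checked with a small simulation. The lurking variable (age) and its values for 2,000 subjects are invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(8)

# Hypothetical lurking variable: ages 20 to 79 for 2,000 subjects (invented)
ages = [20 + (i % 60) for i in range(2_000)]

# Random assignment: shuffle, then split into two groups of 1,000
rng.shuffle(ages)
treatment, control = ages[:1_000], ages[1_000:]

gap = abs(mean(treatment) - mean(control))
print(round(mean(treatment), 1), round(mean(control), 1), round(gap, 2))
```

Although ages span sixty years, the two group averages typically differ by only about a year or less, which is the sense in which natural differences average out in large randomized groups.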
74
Did You Know?

Observational studies can also have control
groups.




These are called case-control studies.
The cases are people who have a certain disease
or condition, and the controls are people who do
not have the disease.
Their purpose is to see if one of the explanatory
variables is related to the disease.
Study 1 from the beginning of these notes is an
example of a case-control study.
77
Important Points

Observational studies

Types

Retrospective, Cross-Sectional, Prospective
Sampling Designs

Simple random sample (SRS), Stratified random sample,
Cluster sample, Systematic sample
Bias Types

Undercoverage, Response bias, Nonresponse bias
Sources of Bias

Convenience sampling, Voluntary response sampling
78
Important Points

Experiments

Types

Completely randomized design, matched pairs designs,
crossover designs, randomized block designs
Elements of Good Experiments

Control group, randomization, blinding and replication
Advantages

Can show causation
Disadvantages

Can be unethical
Can take decades to complete
79
Important Points




If a group is underrepresented in the sample, we
cannot make inference about it.
We must be careful when interpreting the results of
observational studies.
For comparison of several treatments to be valid, you
must apply all treatments to similar groups of
experimental units.
Interesting questions are usually pretty tough to
answer. This is due in part to the fact that no single
experiment or observational study can determine
causation.
80
