Transcript Chapter 5

Sampling Design
How do we gather data?
• Surveys
• Opinion polls
• Interviews
• Studies
– Observational
– Retrospective (past)
– Prospective (future)
• Experiments
Population
• the entire group of
individuals that we
want information about
Census
• a complete count of the
population
How good is a census?
Do frog fairy tale . . .
The answer is 83!
Why would we not use
a census all the time?
1) Not accurate
Look at the U.S. census – it has a huge amount of error in it; plus it takes a long time to compile the data, making the data obsolete by the time we get it!
2) Very expensive
Since taking a census of any population takes time, censuses are VERY costly to do!
3) Perhaps impossible
Suppose you wanted to know the average weight of the white-tail deer population in Texas – would it be feasible to do a census?
4) If using destructive sampling, you would destroy population
• Breaking strength of soda bottles
• Lifetime of flashlight batteries
• Safety ratings for cars
Sample
• A part of the population that
we actually examine in
order to gather information
• Use sample to generalize to
population
Sampling design
• refers to the method
used to choose the
sample from the
population
Sampling frame
• a list of every
individual in the
population
Simple Random Sample (SRS)
• consists of n individuals from the population chosen in such a way that
–every individual has an equal chance of being selected
–every set of n individuals has an equal chance of being selected
Suppose we were to take an SRS of 100 SWH students – put each student's name in a hat. Then randomly select 100 names from the hat. Each student has the same chance to be selected!
Not only does each student have the same chance to be selected – but every possible group of 100 students has the same chance to be selected! Therefore, it has to be possible for all 100 students to be seniors in order for it to be an SRS!
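The name-in-a-hat idea can be sketched in Python. This is only a minimal sketch under assumptions: the roster of 2000 numbered students and the function name `simple_random_sample` are hypothetical, and the standard library's `random.sample` stands in for drawing names from the hat without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement; every set of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical roster: students numbered 1..2000 (not real data).
roster = list(range(1, 2001))
srs = simple_random_sample(roster, 100, seed=1)
print(len(srs), "students drawn, all distinct:", len(set(srs)) == 100)
```

Because the draw is made from the whole roster at once, nothing prevents the 100 students from all being seniors, which is the property the slide describes.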
Stratified random sample
• population is divided into homogeneous groups called strata
• SRS's are pulled from each stratum
Homogeneous groups are groups that are alike based upon some characteristic of the group members.
Suppose we were to take a stratified random sample of 100 SWH students. Since students are already divided by grade level, grade level can be our strata. Then randomly select 50 seniors and randomly select 50 juniors.
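A stratified draw is just one SRS per stratum. The sketch below assumes two strata of 500 made-up student IDs each; the stratum sizes, the ID format, and the function name `stratified_sample` are illustrative assumptions, not part of the slide.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of the requested size from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        sample[name] = rng.sample(members, n_per_stratum[name])
    return sample

# Hypothetical strata: 500 seniors and 500 juniors, labeled by ID strings.
strata = {
    "seniors": [f"S{i}" for i in range(500)],
    "juniors": [f"J{i}" for i in range(500)],
}
picked = stratified_sample(strata, {"seniors": 50, "juniors": 50}, seed=2)
print({grade: len(ids) for grade, ids in picked.items()})
```

Unlike an SRS of 100, this design can never return 100 seniors: the split between grades is fixed in advance.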
Systematic random sample
• select sample by following a systematic approach
• randomly select where to begin
Suppose we want to do a systematic random sample of SWH students - number a list of students
(There are approximately 2000 students – if we want a sample of 100, 2000/100 = 20)
Select a number between 1 and 20 at random. That student will be the first student chosen, then choose every 20th student from there.
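The "random start, then every 20th" rule can be sketched as follows. The numbered roster of exactly 2000 students and the function name `systematic_sample` are assumptions for illustration; the start is chosen as a 0-based index, so index 0 to 19 corresponds to student 1 to 20 on the list.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start in the first k items, then take every k-th item."""
    k = len(population) // n      # sampling interval, e.g. 2000 // 100 = 20
    rng = random.Random(seed)
    start = rng.randrange(k)      # 0-based index of the first student chosen
    return population[start::k][:n]

# Hypothetical roster: students numbered 1..2000.
roster = list(range(1, 2001))
chosen = systematic_sample(roster, 100, seed=3)
print("first student:", chosen[0], "sample size:", len(chosen))
```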
Cluster Sample
Suppose we want to do a cluster sample of
SWH students. One way to do this would
be to randomly select 10 classrooms during
2nd period. Sample all students in those
rooms!
• based upon location
• randomly pick a
location & sample all
there
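As a rough sketch of the classroom example: randomly pick whole classrooms, then keep everyone in them. The 80 rooms of 25 students and the function name `cluster_sample` are made-up assumptions; only the "pick 10 rooms, sample all students there" logic comes from the slide.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and include every member of each one."""
    rng = random.Random(seed)
    chosen_rooms = rng.sample(sorted(clusters), n_clusters)
    students = [s for room in chosen_rooms for s in clusters[room]]
    return chosen_rooms, students

# Hypothetical school: 80 classrooms with 25 students each.
classrooms = {f"Room {r}": [f"R{r}-S{i}" for i in range(25)] for r in range(1, 81)}
rooms, students = cluster_sample(classrooms, 10, seed=4)
print(len(rooms), "rooms,", len(students), "students")
```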
Multistage
sample
To use a multistage approach to sampling
SWH students, we could first divide 2nd
period classes by level (AP, Honors,
Regular, etc.) and randomly select 4 second
period classes from each group. Then we
could randomly select 5 students from each
of those classes. The selection process is
done in stages!
• select successively
smaller groups within
the population in stages
• SRS used at each stage
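The two stages described for SWH students can be sketched like this. The three levels, ten classes per level, and 25 students per class are invented for illustration, as is the function name `multistage_sample`; the slide only specifies 4 classes per level and 5 students per class.

```python
import random

def multistage_sample(classes_by_level, classes_per_level, students_per_class, seed=None):
    """Stage 1: SRS of classes within each level. Stage 2: SRS of students per class."""
    rng = random.Random(seed)
    sample = []
    for level, classes in classes_by_level.items():
        for class_name in rng.sample(sorted(classes), classes_per_level):
            sample.extend(rng.sample(classes[class_name], students_per_class))
    return sample

# Hypothetical data: 3 levels, 10 classes per level, 25 students per class.
levels = ["AP", "Honors", "Regular"]
classes_by_level = {
    level: {f"{level}-{c}": [f"{level}-{c}-S{i}" for i in range(25)] for c in range(10)}
    for level in levels
}
sample = multistage_sample(classes_by_level, 4, 5, seed=5)
print(len(sample), "students selected")
```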
SRS
• Advantages
– Unbiased
– Easy
• Disadvantages
– Large variance
– May not be representative
– Must have sampling frame (list of population)
Stratified
• Advantages
– More precise unbiased estimator than SRS
– Less variability
– Cost reduced if strata already exist
• Disadvantages
– Difficult to do if you must divide the population into strata
– Formulas for SD & confidence intervals are more complicated
– Need sampling frame
Systematic Random Sample
• Advantages
– Unbiased
– Ensure that the sample is distributed across population
– More efficient, cheaper, etc.
• Disadvantages
– Large variance
– Can be confounded by trend or cycle
– Formulas are complicated
Cluster Samples
• Advantages
– Unbiased
– Cost is reduced
– Sampling frame may not be available (not needed)
• Disadvantages
– Clusters may not be representative of population
– Formulas are complicated
Identify the sampling design
1)The Educational Testing Service
(ETS) needed a sample of colleges.
ETS first divided all colleges into
groups of similar types (small
public, small private, etc.) Then
they randomly selected 3 colleges
from each group.
Stratified random sample
Identify the sampling design
2) A county commissioner wants to
survey people in her district to
determine their opinions on a
particular law up for adoption. She
decides to randomly select blocks in
her district and then survey all who
live on those blocks.
Cluster sampling
Identify the sampling design
3) A local restaurant manager wants
to survey customers about the
service they receive. Each night
the manager randomly chooses a
number between 1 & 10. He then
gives a survey to that customer,
and to every 10th customer after
them, to fill it out before they
leave.
Systematic random sampling
Random digit table
• each entry is equally likely to be any of the 10 digits
• digits are independent of each other
The following is part of the random digit table found on page 847 of your textbook:
Row
1   4 5 1 8 5 0 3 3 7 1
2   4 2 5 5 8 0 4 5 7 0
3   8 9 9 3 4 3 5 0 6 3
Numbers can be read across.
Numbers can be read vertically.
Numbers can be read diagonally.
Suppose your population consisted of these 20 people:
1) Aidan     6) Fred      11) Kathy     16) Paul
2) Bob       7) Gloria    12) Lori      17) Shawnie
3) Chico     8) Hannah    13) Matthew   18) Tracy
4) Doug      9) Israel    14) Nan       19) Uncle Sam
5) Edward    10) Jung     15) Opus      20) Vernon
Use the following random digits to select a sample of five from these people.
Row
1   4 5 1 8 0 5 1 3 7 1
2   0 1 5 5 8 0 1 5 7 0
3   8 9 9 3 4 3 5 0 6 3
We will need to use double digit random numbers, ignoring any number greater than 20. Start with Row 1 and read across.
45 Ignore. 18 Tracy. 05 Edward. 13 Matthew. 71 Ignore. 01 Aidan. 55 Ignore. 80 Ignore. 15 Opus.
Stop when five people are selected. So my sample would consist of: Aidan, Edward, Matthew, Opus, and Tracy
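The same selection can be traced in a short Python sketch. The digit rows are transcribed from the slide's table as best they can be read, and the function name `select_from_digits` is hypothetical. The sketch also skips a label that repeats, which is an added assumption: the slide only says to ignore numbers greater than 20.

```python
def select_from_digits(digits, names, sample_size):
    """Read two-digit labels across the digits, skipping invalid or repeated labels."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        # Labels run 1..len(names); 00 and anything above the list size are ignored.
        if 1 <= label <= len(names) and names[label - 1] not in chosen:
            chosen.append(names[label - 1])
        if len(chosen) == sample_size:
            break
    return chosen

names = ["Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria", "Hannah",
         "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nan", "Opus", "Paul",
         "Shawnie", "Tracy", "Uncle Sam", "Vernon"]
rows = ["4518051371", "0155801570", "8993435063"]  # rows 1-3 as read from the slide
sample = select_from_digits("".join(rows), names, 5)
print(sample)  # ['Tracy', 'Edward', 'Matthew', 'Aidan', 'Opus']
```

The printed order is the order of selection; sorted alphabetically it is the same five people listed on the slide.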
Bias
• ERROR
• favors certain outcomes
Anything that causes the data to be wrong! It might be attributed to the researchers, the respondent, or to the sampling method!
Sources of
Bias
• things that can cause
bias in your sample
• cannot do anything
with bad data
Voluntary response
• People choose to respond
• Usually only people with very strong opinions respond
An example would be the surveys in magazines that ask readers to mail in the survey. Other examples are call-in shows, American Idol, etc.
Remember, the respondent selects themselves to participate in the survey!
Remember – the way to determine voluntary response is: Self-selection!!
Convenience sampling
•Ask people who are easy to ask
•Produces biased results
An example would be stopping friendly-looking people in the mall to survey. Another example is the surveys left on tables at restaurants - a convenient method!
The data obtained by a convenience sample will be biased – however this method is often used for surveys & results reported in newspapers and magazines!
Undercoverage
•some groups of population are left out of the sampling process
Suppose you take a sample by randomly selecting names from the phone book – some groups will not have the opportunity of being selected!
People with unlisted phone numbers – usually high-income families
People without phone numbers – usually low-income families
People with ONLY cell phones – usually young adults
Nonresponse
• occurs when an individual chosen for the sample can’t be contacted or refuses to cooperate
• telephone surveys 70% nonresponse
Because of huge telemarketing efforts in the past few years, telephone surveys have a MAJOR problem with nonresponse!
One way to help with the problem of nonresponse is to make follow-up contact with the people who are not home when you first contact them.
People are chosen by the researchers, BUT refuse to participate. NOT self-selected!
This is often confused with voluntary response!
Response bias
• occurs when the behavior of respondent or interviewer causes bias in the sample
• wrong answers
Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample – would we get honest answers?
Response bias occurs when for some reason (interviewer’s or respondent’s fault) you get incorrect answers.
Wording of the Questions
• wording can influence the answers that are given
• connotation of words
• use of “big” words or technical words
Questions must be worded as neutrally as possible to avoid influencing the response.
The level of vocabulary should be appropriate for the population you are surveying
– if surveying Podunk, TX, then you should avoid complex vocabulary.
– if surveying doctors, then use more complex, technical wording.
Source of Bias?
1) Before the presidential election of 1936 (FDR against Republican Alf Landon), the magazine Literary Digest predicted that Landon would win the election in a 3-to-2 victory, based on a survey of 10 million people. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest’s survey came from magazine subscribers, car owners, telephone directories, etc.
Undercoverage – since the Digest’s survey comes from car owners, etc., the people selected were mostly from high-income families and thus mostly Republican! (other answers are possible)
2) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at SMU. You collect register receipts for students as they leave the bookstore during lunch one day.
Convenience sampling – easy way to collect data
or
Undercoverage – students who buy books from on-line bookstores are excluded.
3) To find the average
value of a home in
Memorial, one averages
the price of homes that
are listed for sale with a realtor.
Undercoverage – leaves out homes that are not for sale or homes that are listed with different realtors.
(other answers are possible)