Transcript Chapter 5

Chapter 2
Sampling Design
How do we gather data?
• Surveys
• Opinion polls
• Interviews
• Studies
– Observational
– Retrospective (past)
– Prospective (future)
• Experiments
Population
• the entire group of
individuals that we
want information about
Census
• a complete count of the
population
How good is a census?
Do frog fairy tale . . .
The answer is 83!
Why would we not use
a census all the time?
1) Not accurate
Look at the U.S. census – it has a huge amount of error in it; plus it takes a long time to compile the data, making the data obsolete by the time we get it!
2) Very expensive
Since taking a census of any population takes time, censuses are VERY costly to do!
3) Perhaps impossible
Suppose you wanted to know the average weight of the white-tail deer population in Texas – would it be feasible to do a census?
4) If using destructive sampling, you would destroy population
• Breaking strength of soda bottles
• Lifetime of flashlight batteries
• Safety ratings for cars
Sample
• A part of the population that
we actually examine in
order to gather information
• Use sample to generalize to
population
Sampling design
• refers to the method
used to choose the
sample from the
population
Sampling frame
• a list of every
individual in the
population
Jelly Blubber Activity
• Select 10 Jelly blubbers that you
think are representative of the
population of blubbers in regards to
length.
• Find the mean length of your
sample
Simple Random Sample (SRS)
• consists of n individuals from the population chosen in such a way that
– every individual has an equal chance of being selected
– every set of n individuals has an equal chance of being selected
Suppose we were to take an SRS of 100 SHS students – put each student's name in a hat. Then randomly select 100 names from the hat. Each student has the same chance to be selected!
Not only does each student have the same chance to be selected – but every possible group of 100 students has the same chance to be selected! Therefore, it has to be possible for all 100 students to be seniors in order for it to be an SRS!
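The names-in-a-hat idea can be sketched in a few lines of Python. This is only an illustration: the roster of 2000 made-up student names and the function name `simple_random_sample` are assumptions, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every set of n is equally likely."""
    rng = random.Random(seed)          # seed only makes the demo repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical roster standing in for the names in the hat.
students = [f"student_{i}" for i in range(1, 2001)]

srs = simple_random_sample(students, 100, seed=1)
print(len(srs), "students drawn,", len(set(srs)), "distinct")
```

Because the draw is made from the whole roster at once, a sample made up entirely of seniors is possible (just unlikely), which is exactly what the SRS definition requires.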
Stratified random sample
• population is divided into homogeneous groups called strata
• SRS’s are pulled from each stratum
Homogeneous groups are groups that are alike based upon some characteristic of the group members.
Suppose we were to take a stratified random sample of 100 SHS students. Since students are already divided by grade level, grade level can be our strata. Then randomly select 50 seniors and randomly select 50 juniors.
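A minimal sketch of the seniors/juniors example, assuming each grade level has 500 students (the slides do not give class sizes, and the names below are invented for illustration):

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }

# Hypothetical strata: grade level is the stratifying characteristic.
strata = {
    "seniors": [f"senior_{i}" for i in range(1, 501)],
    "juniors": [f"junior_{i}" for i in range(1, 501)],
}

chosen = stratified_sample(strata, {"seniors": 50, "juniors": 50}, seed=1)
print({grade: len(names) for grade, names in chosen.items()})
```

Unlike an SRS, this design can never produce a sample of 100 seniors, because the number taken from each stratum is fixed in advance.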
Systematic random sample
• select sample by following a systematic approach
• randomly select where to begin
Suppose we want to do a systematic random sample of SHS students - number a list of students
(There are approximately 2000 students – if we want a sample of 100, 2000/100 = 20)
Select a number between 1 and 20 at random. That student will be the first student chosen, then choose every 20th student from there.
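The every-20th-student rule can be written out as a short sketch. It assumes the numbered list is simply the integers 1 to 2000; the function name `systematic_sample` is made up for this example.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start in the first interval, then every k-th item."""
    rng = random.Random(seed)
    k = len(population) // n       # sampling interval, e.g. 2000 // 100 = 20
    start = rng.randint(1, k)      # random 1-based starting position in 1..k
    positions = range(start, len(population) + 1, k)
    return [population[p - 1] for p in positions][:n]

# Hypothetical numbered list of students: student numbers 1..2000.
numbered_students = list(range(1, 2001))

sample = systematic_sample(numbered_students, 100, seed=1)
print(sample[:5])
```

Only the starting point is random; once it is chosen, the rest of the sample is determined.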
Cluster Sample
• based upon location
• randomly pick a location & sample all there
Suppose we want to do a cluster sample of SHS students. One way to do this would be to randomly select 10 classrooms during 2nd period. Sample all students in those rooms!
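Here is a rough sketch of the classroom example. The 80 rooms of 25 students each are invented numbers, used only so the code has something to run on.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep everyone inside them."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in picked}

# Hypothetical 2nd-period classrooms: 80 rooms with 25 students each.
classrooms = {
    f"room_{r}": [f"room_{r}_student_{s}" for s in range(1, 26)]
    for r in range(1, 81)
}

sampled_rooms = cluster_sample(classrooms, 10, seed=1)
print(len(sampled_rooms), "rooms,",
      sum(len(v) for v in sampled_rooms.values()), "students")
```

The randomness is in which rooms are chosen; within a chosen room nobody is left out.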
For the Jelly Blubber
colony:
μ = 19.41
Multistage sample
• select successively smaller groups within the population in stages
• SRS used at each stage
To use a multistage approach to sampling SHS students, we could first divide 2nd period classes by level (AP, Honors, Regular, etc.) and randomly select 4 second period classes from each group. Then we could randomly select 5 students from each of those classes. The selection process is done in stages!
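The two stages can be sketched as nested random draws. This assumes just three levels with 10 classes of 25 students each; those counts and all names are hypothetical.

```python
import random

def multistage_sample(levels, classes_per_level, students_per_class, seed=None):
    """Stage 1: SRS of classes within each level. Stage 2: SRS of students."""
    rng = random.Random(seed)
    chosen = []
    for level, classes in levels.items():
        for class_name in rng.sample(sorted(classes), classes_per_level):
            chosen.extend(rng.sample(classes[class_name], students_per_class))
    return chosen

# Hypothetical 2nd-period classes grouped by level.
levels = {
    level: {
        f"{level}_class_{c}": [f"{level}_class_{c}_student_{s}" for s in range(1, 26)]
        for c in range(1, 11)
    }
    for level in ("AP", "Honors", "Regular")
}

staged = multistage_sample(levels, classes_per_level=4, students_per_class=5, seed=1)
print(len(staged))  # 3 levels x 4 classes x 5 students = 60
```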
SRS
• Advantages
– Unbiased
– Easy
• Disadvantages
– Large variance
– May not be representative
– Must have sampling frame (list of population)
Stratified
• Advantages
– More precise unbiased estimator than SRS
– Less variability
– Cost reduced if strata already exist
• Disadvantages
– Difficult to do if you must divide stratum
– Formulas for SD & confidence intervals are more complicated
– Need sampling frame
Systematic Random Sample
• Advantages
– Unbiased
– Ensure that the sample is spread across population
– More efficient, cheaper, etc.
• Disadvantages
– Large variance
– Can be confounded by trend or cycle
– Formulas are complicated
Cluster Samples
• Advantages
– Unbiased
– Cost is reduced
– Sampling frame may not be available (not needed)
• Disadvantages
– Clusters may not be representative of population
– Formulas are complicated
Identify the sampling design
1) The Educational Testing Service
(ETS) needed a sample of colleges.
ETS first divided all colleges into
groups of similar types (small
public, small private, etc.) Then
they randomly selected 3 colleges
from each group.
Stratified random sample
Identify the sampling design
2) A county commissioner wants to
survey people in her district to
determine their opinions on a
particular law up for adoption. She
decides to randomly select blocks in
her district and then survey all who
live on those blocks.
Cluster sampling
Identify the sampling design
3) A local restaurant manager wants
to survey customers about the
service they receive. Each night
the manager randomly chooses a
number between 1 & 10. He then
gives a survey to that customer,
and to every 10th customer after
them, to fill it out before they
leave.
Systematic random sampling
Random digit table
• each entry is equally likely to be any of the 10 digits
• digits are independent of each other
The following is part of the random digit table found on page 847 of your textbook:
Row
1    4 5 1 8 5 0 3 3 7 1
2    4 2 5 5 8 0 4 5 7 0
3    8 9 9 3 4 3 5 0 6 3
Numbers can be read across.
Numbers can be read vertically.
Numbers can be read diagonally.
Suppose your population consisted of these 20 people:
1) Aidan     6) Fred      11) Kathy     16) Paul
2) Bob       7) Gloria    12) Lori      17) Shawnie
3) Chico     8) Hannah    13) Matthew   18) Tracy
4) Doug      9) Israel    14) Nan       19) Uncle Sam
5) Edward    10) Jung     15) Opus      20) Vernon
Use the following random digits to select a sample of five from these people.
Row
1    4 5 1 8 0 5 1 3 7 1
2    0 1 5 5 8 0 1 5 7 0
3    8 9 9 3 4 3 5 0 6 3
We will need to use double digit random numbers, ignoring any number greater than 20. Start with Row 1 and read across.
45 – Ignore. 18 – Tracy. 05 – Edward. 13 – Matthew. 71 – Ignore. 01 – Aidan. 55 – Ignore. 80 – Ignore. 15 – Opus.
Stop when five people are selected. So my sample would consist of: Aidan, Edward, Matthew, Opus, and Tracy
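The same table lookup can be checked with a short script. It is a sketch of the procedure described here, with two added assumptions that are standard practice but not stated on the slide: the pair 00 is skipped, and a repeated label would be skipped rather than chosen twice.

```python
def select_from_digits(digits, population, n, width=2):
    """Read the digits in groups of `width`; keep labels that name someone."""
    selected = []
    for i in range(0, len(digits) - width + 1, width):
        label = int("".join(str(d) for d in digits[i:i + width]))
        in_range = 1 <= label <= len(population)   # ignore 00 and anything > 20
        if in_range and population[label - 1] not in selected:
            selected.append(population[label - 1])
        if len(selected) == n:
            break                                  # stop once n people are chosen
    return selected

people = ["Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria",
          "Hannah", "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nan",
          "Opus", "Paul", "Shawnie", "Tracy", "Uncle Sam", "Vernon"]

# The three rows of random digits from the example, read across.
rows = ["4518051371", "0155801570", "8993435063"]
digits = [int(ch) for row in rows for ch in row]

picked = select_from_digits(digits, people, 5)
print(picked)  # ['Tracy', 'Edward', 'Matthew', 'Aidan', 'Opus']
```

The names come out in the order they were read from the table; they are the same five people listed above.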
Bias
• A systematic error in measuring the estimate
• favors certain outcomes
Anything that causes the data to be wrong! It might be attributed to the researchers, the respondent, or to the sampling method!
Sources of
Bias
• things that can cause
bias in your sample
• cannot do anything
with bad data
Voluntary response
• People choose to respond
• Usually only people with very strong opinions respond
An example would be the surveys in magazines that ask readers to mail in the survey. Other examples are call-in shows, American Idol, etc.
Remember – the way to determine voluntary response is: Self-selection!!
Remember, the respondent selects themselves to participate in the survey!
Convenience sampling
• Ask people who are easy to ask
• Produces biased results
An example would be stopping friendly-looking people in the mall to survey. Another example is the surveys left on tables at restaurants - a convenient method!
The data obtained by a convenience sample will be biased – however this method is often used for surveys & results reported in newspapers and magazines!
Undercoverage
• some groups of population are left out of the selection process
Suppose you take a sample by randomly selecting names from the phone book – some groups will not have the opportunity of being selected!
People with unlisted phone numbers – usually high-income families
People without phone numbers – usually low-income families
People with ONLY cell phones – usually young adults
Nonresponse
• occurs when an individual chosen for the sample can’t be contacted or refuses to cooperate
• telephone surveys 70% nonresponse
People are chosen by the researchers, BUT refuse to participate. NOT self-selected! This is often confused with voluntary response!
Because of huge telemarketing efforts in the past few years, telephone surveys have a MAJOR problem with nonresponse!
One way to help with the problem of nonresponse is to make follow-up contact with the people who are not home when you first contact them.
Response bias
• occurs when the behavior of respondent or interviewer causes bias in the sample
• wrong answers
Response bias occurs when for some reason (interviewer’s or respondent’s fault) you get incorrect answers.
Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample – would we get honest answers?
Wording of the Questions
• wording can influence the answers that are given
• connotation of words
• use of “big” words or technical words
Questions must be worded as neutral as possible to avoid influencing the response.
The level of vocabulary should be appropriate for the population you are surveying
– if surveying Podunk, TX, then you should avoid complex vocabulary.
– if surveying doctors, then use more complex, technical wording.
Source of Bias?
1) Before the presidential election of 1936 (FDR against Republican Alf Landon), the magazine Literary Digest predicted that Landon would win the election in a 3-to-2 victory, based on a survey of 2.8 million people. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest’s survey came from magazine subscribers, car owners, telephone directories, etc.
Undercoverage – since the Digest’s survey comes from car owners, etc., the people selected were mostly from high-income families and thus mostly Republican! (other answers are possible)
2) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at SMU. You collect register receipts for students as they leave the bookstore during lunch one day.
Convenience sampling – easy way to collect data
or
Undercoverage – students who buy books from on-line bookstores are not included.
3) To find the average value of a home in Fort Smith, one averages the price of homes that are listed for sale with a realtor.
Undercoverage – leaves out homes that are not for sale or homes that are listed with different realtors. (other answers are possible)
4) A new and somewhat
controversial polling procedure that
replaces the phone with the
Internet is being used to conduct
surveys. One criticism is that
Internet users as a whole are still
too highly educated and urban to
produce results that accurately
reflect all Americans.
5) “More than half of California’s
doctors say they are so frustrated
with managed care they will quit,
retire early, or leave the state.”
This statement comes from a survey
conducted by the California Medical
Association in which surveys were
sent to 19,000 doctors and 2000
completed surveys were returned.
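For example 5, it may help to work out what fraction of the mailed surveys actually came back. A quick sketch of that arithmetic, using only the two counts quoted in the example (the variable names are made up):

```python
surveys_sent = 19_000      # surveys mailed to doctors
surveys_returned = 2_000   # completed surveys that came back

response_rate = surveys_returned / surveys_sent
nonresponse_rate = 1 - response_rate

print(f"response rate: {response_rate:.1%}")        # about 10.5%
print(f"nonresponse rate: {nonresponse_rate:.1%}")  # about 89.5%
```

Roughly one doctor in ten replied, so "more than half of California’s doctors" is really a statement about the doctors who chose to send the survey back.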