Chapter 12 – Sample Surveys
Vocabulary
• Population - the entire group of individuals or instances about which we hope to learn
• Sample - a representative subset of a population
• Sample Survey - a descriptive study that asks questions of a sample in the hope of learning something about the whole population
• Bias - any systematic failure of a sample to represent its population
• Randomization - the process by which each individual is given a fair and equal chance of selection for the sample (Best Defense Against Bias)
• Sample size - the number of individuals in a sample
More Vocabulary
• Census - a sample that consists of the entire population
• Population parameter - a numerically valued attribute of a model of a population (What? - you’ll see later)
• Sample statistic - a statistic that parallels a parameter (better definition later)
• Representative - a sample is said to be representative of a population if it accurately reflects the population
• Sampling frame - a list of individuals from which the sample is drawn
• Sampling variability - the natural tendency of randomly drawn samples to differ from one another
A Quick Note
Not all the vocabulary for this chapter was
listed on the previous slides
Some of the vocabulary is better understood
when it is taken in the context of the chapter
And now the chapter…
Understanding Samples
Samples are used to “stretch” beyond the
data at hand to the entire world or group
at large
Three ideas are needed to make this “stretch” and draw a conclusion about the population (ERS):
Examine a part of the whole
Randomize
Sample size
STEP 1. Examining a part of the whole
Researchers often want to know about an entire population, but surveying an entire population is often impractical or impossible; therefore …
Researchers often settle for creating a
representative sub-group or “sample”
from the population
Sample Surveys
Sample Surveys - Designed to ask
questions of a small group of people in
order to learn something about the entire
population
Sample surveys are everywhere
National polls
Newspaper polls
Internet electronic polls
Sample Surveys
How can sample surveys truly represent a
population?
In order to understand this, let’s look first at a
failed sample survey
In 1936, the Literary Digest magazine ran a large mail-in poll to predict the presidential election.
The magazine drew its sample largely from telephone directories (along with lists of car owners and its own subscribers).
According to the survey, Alf Landon would receive 57% of the vote, beating F.D. Roosevelt (43%) in a landslide.
When the real election was held, FDR won 62% to 38%.
What Went Wrong?
The magazine used a “biased” sample.
In 1936, a telephone was a luxury that mostly the affluent could afford, so the sample was inadvertently made up mostly of wealthier individuals.
Roosevelt was extremely popular among the less affluent, so the sample underrepresented FDR’s support.
How can researchers eliminate biased
representation in samples?
The best strategy is…
STEP 2. Randomize
Randomization is the best statistical weapon against sampling bias. Why?
• It protects researchers by making sure that, on average, the sample looks like the rest of the population.
Populations have many features that may influence the validity of the findings, sometimes even features that researchers haven’t thought about. Randomization accounts for this by giving everyone an equal chance of selection, so no feature is systematically over- or underrepresented in the sample.
• It also allows researchers to make inferences from their sample to the population from which it was drawn, because a randomized sample represents the population on average and its chance errors can be measured.
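The “on average” claim can be checked with a short simulation. Below is a minimal Python sketch, assuming a made-up population of 10,000 ages (the helper `average_of_sample_means` and all numbers are hypothetical). It draws many simple random samples, averages their sample means, and compares the result with the population mean; the average of the sample means should land very close to it, even though individual samples miss.

```python
import random
import statistics

def average_of_sample_means(population, n, trials, seed=0):
    """Draw `trials` simple random samples of size n and average their sample means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.fmean(means)

# Hypothetical population: 10,000 ages between 18 and 90.
rng = random.Random(42)
population = [rng.randint(18, 90) for _ in range(10_000)]

print("population mean:", round(statistics.fmean(population), 2))
print("average of 2,000 sample means (n = 50):",
      round(average_of_sample_means(population, 50, 2_000), 2))
```

Any single sample mean can still be off by a few years; randomization removes systematic bias, not chance error.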
STEP 3. Sample Size
The fraction of the population that you’ve sampled does NOT matter, as long as the population is much larger than the sample! The only thing that is important is the sample size itself!
Samples need to be large enough to be representative
To estimate the proportion of a population that falls into a category, you need to see several respondents in each category in order to say anything precise enough to be useful.
(usually several hundred respondents)
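A quick simulation illustrates the point. This Python sketch (hypothetical helper `spread_of_sample_proportions`, made-up yes/no population with a true proportion of 0.5) takes repeated SRSs of 400 people from populations of 10,000, 100,000 and 1,000,000 and reports how much p̂ varies from sample to sample. The spread should come out close to √(0.25/400) = 0.025 each time, even though the fraction sampled ranges from 4% down to 0.04%; only increasing n itself shrinks it.

```python
import random
import statistics

def spread_of_sample_proportions(pop_size, true_p, n, trials, seed=1):
    """Standard deviation of p-hat across repeated simple random samples of size n."""
    rng = random.Random(seed)
    n_yes = round(pop_size * true_p)
    population = [1] * n_yes + [0] * (pop_size - n_yes)  # 1 = answered "yes"
    p_hats = [sum(rng.sample(population, n)) / n for _ in range(trials)]
    return statistics.stdev(p_hats)

# Same sample size (n = 400) drawn from populations of very different sizes.
for pop_size in (10_000, 100_000, 1_000_000):
    sd = spread_of_sample_proportions(pop_size, 0.5, 400, 500)
    print(f"N = {pop_size:>9,}  fraction sampled = {400 / pop_size:.2%}  SD of p-hat = {sd:.3f}")

# Quadrupling the sample size roughly halves the spread.
sd_big_n = spread_of_sample_proportions(100_000, 0.5, 1_600, 500)
print(f"n = 1,600 from N = 100,000: SD of p-hat = {sd_big_n:.3f}")
```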
Census
Hey! Wouldn’t it be easier to just survey the whole population? Then there would be no need to worry about any of the sampling stuff. Right??
NO! A census may appear to represent the population perfectly, but it may not, for three main reasons.
A Census Doesn’t Make Sense
1) It can be difficult to complete a census
• There are always some individuals who are hard to locate (e.g., the homeless) or hard to survey (e.g., people with limited ability to communicate).
2) Populations are always changing
• Births and deaths are constantly happening and constantly changing the population.
• By the time the census is completed, an event could have changed everyone’s opinion about the questions in the census.
3) A census is more complicated than a survey
• Censuses often require a large team effort and the cooperation of the population being surveyed.
Populations and Parameters
Population parameter - a numerically valued attribute of a model of the population (usually unknown)
Statistic - any computation from the data that describes the sample
Sample statistic (or summary statistic) - a statistic computed from the sample that estimates, or refers to, the corresponding population parameter
Let’s meet some new and old parameters.
The Parameters
Mean - ( μ )
Standard Deviation - ( σ )
Correlation - ( ρ )
Regression coefficient - ( β )
Proportion - ( p )
Simple Random Samples
A Simple Random Sample (SRS) gives every possible combination of n people from the population (every possible sample of size n) an equal chance of being selected for the survey.
How is this done?
Simple Random Sampling
• To select a sample at random, we must first have a sampling frame.
• A sampling frame is a list of individuals from which the sample is drawn (it must be precise).
Sampling frames allow us to draw random samples from large groups.
Within the sampling frame, we are able to select random members who will, on average, represent the entire sampling frame accurately.
• However, when we draw samples at random, each sample will be different. We call these sample-to-sample differences sampling variability.
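Drawing an SRS from a sampling frame takes only a few lines of code. Here is a minimal Python sketch, assuming a hypothetical frame of 1,000 student IDs. `random.Random.sample` selects without replacement so that every set of n IDs is equally likely, and drawing twice from the same frame gives two different samples, which is sampling variability in action. The helper name `simple_random_sample` is illustrative only.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS without replacement: every subset of n members of the frame is equally likely."""
    return rng.sample(frame, n)

# Hypothetical sampling frame: an ID list of 1,000 registered students.
frame = [f"student-{i:04d}" for i in range(1, 1001)]
rng = random.Random(2024)

sample_a = simple_random_sample(frame, 5, rng)
sample_b = simple_random_sample(frame, 5, rng)
print(sample_a)
print(sample_b)  # a second SRS from the same frame is (almost surely) a different set of people
```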
Other Sampling Designs
All statistical sampling designs have the common idea that chance, not human choice, is used to select the sample.
Besides SRS, there are three other main sampling designs:
Stratified Random Sampling
Cluster Sampling
Multistage Sampling
Other Sampling Designs
Stratified Random Sampling
used when the population can be divided into strata, groups of similar (homogeneous) individuals.
then an SRS is taken within each stratum, and the results are combined.
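A minimal Python sketch of stratified random sampling, assuming hypothetical grade-level strata and proportional allocation (10% of each stratum), which is one common choice rather than a requirement. A separate SRS is drawn inside every stratum, so each grade is guaranteed its share of the sample. The helper `stratified_sample` is illustrative.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=0):
    """Take a separate SRS of n_per_stratum[name] members from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum[name]) for name, members in strata.items()}

# Hypothetical strata: students grouped by grade level.
strata = {
    "freshmen": [f"fr-{i}" for i in range(400)],
    "sophomores": [f"so-{i}" for i in range(350)],
    "juniors": [f"jr-{i}" for i in range(300)],
    "seniors": [f"sr-{i}" for i in range(250)],
}
# Proportional allocation (an assumption): sample 10% of each stratum.
sizes = {name: len(members) // 10 for name, members in strata.items()}
result = stratified_sample(strata, sizes)
print({name: len(chosen) for name, chosen in result.items()})
```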
Cluster Sampling
used when the population is naturally divided into groups, called clusters, that each resemble the population as a whole.
a few clusters are chosen at random (an SRS of clusters), and everyone in each chosen cluster is surveyed.
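In the usual textbook form of cluster sampling, whole clusters are chosen at random and everyone inside them is surveyed. The following minimal Python sketch assumes a hypothetical school of 40 homerooms with 25 students each. The homerooms play the role of clusters, and `cluster_sample` is an illustrative helper, not a library function.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters, then include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: homerooms, each assumed to be a rough mix of the whole school.
clusters = {f"room-{r}": [f"room-{r}-student-{s}" for s in range(25)] for r in range(1, 41)}
chosen = cluster_sample(clusters, 3)
print(sorted(chosen), sum(len(members) for members in chosen.values()))  # 3 rooms, 75 students in total
```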
Other Sampling Designs
The final and, in practice, most common design is…
Multistage Sampling
multistage sampling uses more than one method of sampling
refers to complex sampling schemes that combine several sampling methods in successive stages.
E.g. - randomly choose several school districts, then randomly choose schools within each chosen district, then randomly choose students within each chosen school.
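Here is a minimal Python sketch of a three-stage design, assuming a hypothetical frame of 10 school districts, each with 8 schools of 200 students. It takes an SRS of districts, then an SRS of schools within each chosen district, then an SRS of students within each chosen school. The structure and the helper `multistage_sample` are assumptions made for illustration.

```python
import random

def multistage_sample(districts, n_districts, n_schools, n_students, seed=0):
    """Stage 1: SRS of districts; stage 2: SRS of schools in each; stage 3: SRS of students in each school."""
    rng = random.Random(seed)
    sample = []
    for district in rng.sample(sorted(districts), n_districts):
        schools = districts[district]
        for school in rng.sample(sorted(schools), n_schools):
            sample.extend(rng.sample(schools[school], n_students))
    return sample

# Hypothetical frame: 10 districts x 8 schools x 200 students.
districts = {
    f"D{d}": {f"D{d}-S{s}": [f"D{d}-S{s}-{k}" for k in range(200)] for s in range(8)}
    for d in range(10)
}
sample = multistage_sample(districts, n_districts=3, n_schools=2, n_students=10)
print(len(sample))  # 3 districts * 2 schools * 10 students = 60
```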
Systematic Samples
Systematic Sampling
A list of population members is prepared, and every kth name is selected, beginning from a randomly selected point among the first k names, until the sample size is reached
Can be used when there is no reason to believe that the order of the list in the sampling frame is related to the answers sought
• E.g., if the list is alphabetical and you’re asking a question about a political subject, a systematic sampling method could choose every tenth name until your sample size is reached.
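A minimal Python sketch of systematic sampling, assuming a hypothetical alphabetical list of 1,000 voters and a target sample of 100, so the interval is k = 1000 / 100 = 10 (every tenth name). It picks a random start among the first k names and then steps through the list k names at a time. The helper `systematic_sample` is illustrative.

```python
import random

def systematic_sample(frame, n, seed=0):
    """Pick a random start among the first k names, then take every kth name."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the length of the frame")
    k = len(frame) // n                        # sampling interval
    start = random.Random(seed).randrange(k)   # random starting point, 0..k-1
    return frame[start::k][:n]

# Hypothetical alphabetical voter list of 1,000 names.
frame = [f"voter-{i:04d}" for i in range(1000)]
sample = systematic_sample(frame, 100)
print(len(sample), sample[:3])
```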
Sampling Badly
Many of the most convenient forms of
sampling can be extremely biased.
There are four main problems or sample
types that can cause bad samples
Voluntary response sample
Convenience sample
Bad sampling frame
Undercoverage
Voluntary response sample
In this approach, a large group of individuals is invited to participate, but only those who choose to respond are counted.
Why is this bad?
It leads to bias because typically only those who care strongly about the topic will respond; therefore, the results from the sample are not representative of the entire population
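This bias can be large. Below is a minimal Python simulation with entirely made-up numbers: 30% of a population opposes some policy, people who oppose it respond 40% of the time, and everyone else responds 5% of the time. The share opposed among the volunteers who respond is far higher than the true 30%, because who ends up in the sample depends on the very opinion being measured. The function name and all rates are hypothetical.

```python
import random

def voluntary_response_demo(pop_size=100_000, share_opposed=0.30,
                            respond_if_opposed=0.40, respond_otherwise=0.05, seed=7):
    """Compare the true share opposed with the share among volunteers who chose to respond."""
    rng = random.Random(seed)
    n_opposed = round(pop_size * share_opposed)
    population = [True] * n_opposed + [False] * (pop_size - n_opposed)  # True = opposed
    # Each person decides whether to respond; people with the strong opinion respond more often.
    respondents = [opposed for opposed in population
                   if rng.random() < (respond_if_opposed if opposed else respond_otherwise)]
    true_share = sum(population) / pop_size
    volunteer_share = sum(respondents) / len(respondents)
    return true_share, volunteer_share

true_share, volunteer_share = voluntary_response_demo()
print(f"true share opposed: {true_share:.2f}")
print(f"share among respondents: {volunteer_share:.2f}")
```

With these assumed rates, roughly 0.12 / (0.12 + 0.035) ≈ 77% of respondents oppose the policy, more than double the true share.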
Convenience Sample
Only those individuals who are at hand are included
Why is this Bad?
Leads to bias in response because the people at hand
often have a common tie and are not representative of the
whole population
Bad Sampling Frame
Even in an SRS, some people can be left out of the sampling frame entirely
Why is this Bad?
It gives us an incomplete picture or representation of the population
• Remember the 1936 Literary Digest poll on an earlier slide?
– The results were biased because many poorer Americans were not included in the sampling frame (e.g., they didn’t have a telephone, so they couldn’t be selected, yet they could and did vote).
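A Literary-Digest-style problem can be simulated. The following minimal Python sketch uses hypothetical numbers, not the actual 1936 figures: 30% of the population owns a telephone, 60% of owners favor candidate A, and 30% of non-owners do, so the true share favoring A is 0.3 × 0.6 + 0.7 × 0.3 = 0.39. Sampling only from the list of telephone owners gives estimates near 0.60 no matter how large the sample. A bigger sample from a bad frame just estimates the wrong group more precisely. The helper `frame_bias_demo` is illustrative.

```python
import random

def frame_bias_demo(n, seed=3):
    """SRS drawn only from a frame of telephone owners vs. the whole population (hypothetical numbers)."""
    rng = random.Random(seed)
    # Each person is (has_phone, favors_candidate_a), built deterministically from assumed shares.
    owners = [(True, i < 18_000) for i in range(30_000)]       # 60% of owners favor A
    non_owners = [(False, i < 21_000) for i in range(70_000)]  # 30% of non-owners favor A
    population = owners + non_owners
    frame = [person for person in population if person[0]]    # only phone owners are listed
    sample = rng.sample(frame, n)
    true_share = sum(favors for _, favors in population) / len(population)
    estimate = sum(favors for _, favors in sample) / n
    return true_share, estimate

for n in (500, 5_000, 20_000):
    true_share, estimate = frame_bias_demo(n)
    print(n, round(true_share, 2), round(estimate, 2))  # estimate stays near 0.60, not 0.39
```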
Undercoverage
Refers to the scenario in which a portion of the population is not represented in the sample, or has smaller representation in the sample than it has in the population
Why is this Bad?
It doesn’t allow for an accurate representation of the population, so predictions or inferences made from the data can be badly biased.
What can go wrong (BIAS)
Nonresponse bias
Nonresponse to surveys can be a source of
bias because those who do not respond to a
survey could differ from those who do.
To reduce this bias:
• Don’t bore people with long surveys
• Rather than sending out a lot of surveys, send out fewer randomly selected surveys and follow up with nonrespondents so that you get a high response rate
What can go wrong (BIAS)
Response Bias
Refers to a bias brought about by anything in the survey design, most often the wording of the questions, that influences responses
A common source of this influence is the “leading question”
In leading questions the surveyor uses influential words to “lead” a person to a certain answer.
E.g.
• Do you think that the evil companies who destroy animals’ habitats should be allowed to continue destroying the rain forest when they harvest trees? - biased
• Do you think companies should be allowed to harvest trees from the rain forest? - not biased
Rules for Eliminating Bias in Your Surveys
Look for bias in any survey you encounter
There is no way to recover from a biased sample or a survey that asks biased questions. Data collected with a biased question can’t be fixed after the fact!
Spend your time and resources reducing biases
If possible, test your survey before you use it
Always report your sampling methods in detail
Practice Problems
Let’s try a problem! (#3 pg. 243)
Identify the following items from each
passage if possible:
a) The population
b) The population parameter of interest
c) The sampling frame
d) The sample
e) The sampling method; was randomization
used?
f) Potential sources of bias or generalization
problems
Practice Problems
• #3 - Consumers Union asked all subscribers whether they had used alternative medical treatments and, if so, whether they had benefited from them. For almost all of the treatments, approximately 20% of those responding reported cures or substantial improvement.
• a) Population - all U.S. adults
• b) Parameter - the proportion who have used and benefited from alternative medical treatments
• c) Sampling Frame - all Consumers Union subscribers
• d) Sample - those who responded
• e) Method - a nonrandom questionnaire
• f) Bias - The voluntary response sample causes the bias. Only those who cared strongly enough about the question responded. This sample cannot represent the whole population, because those who did not respond could have different opinions or answers than those who did respond.
Practice Problems
Let’s try one more! (pg. 243 #13)
Question 1: Should elementary school-aged children have to pass high stakes tests in order to remain with their classmates?
Question 2: Should schools and students be held accountable for meeting yearly learning goals by testing students before they advance to the next grade?
• a) Do you think responses to these questions might differ? What kind of bias is this?
• b) Propose a question with more neutral wording that might better assess parental opinion.
Practice Problems
Solution
a) Answers to the questions will very likely differ. Question 1 is worded to scare the respondents into “no” answers by using extreme descriptions (high stakes tests). Question 2 is worded to receive more “yes” answers by changing the subject of the question from actually passing the test to accountability for learning. This is a type of wording, or response, bias.
b) Do you think that students should have to pass a
standardized test in order to be promoted to the next
grade level? - This is better because it doesn’t use any
extreme words or subject changes.