Statistics 2/6/2013

Quiz 1
1. Explain the difference between a census and a
sample.
2. Explain the difference between numerical and
categorical data.
3. Give an example of data that is at the nominal
level of measurement.
Quiz 1 Solutions
1. Explain the difference between a census and a
sample. A census is a collection of data from the
entire population, while a sample is a collection of
data from a subset of the population.
2. Explain the difference between numerical and
categorical data. Numerical data consist of
numbers. Categorical data consist of names or
labels of categories.
3. Give an example of data that is at the nominal
level of measurement. Political affiliation.
Critical Thinking
We must think carefully about the context of the data, the source
of the data, the method used in data collection, the
conclusions reached, and the practical implications.
Fun Quotes
"There are three kinds of lies: lies, damned lies,
and statistics"-Benjamin Disraeli
"Figures don't lie; liars figure."-Mark Twain
"There are two kinds of statistics, the kind you
look up, and the kind you make up."-Rex Stout
Bad Statistics
Bad statistics happen either by evil intent or by unintentional
errors. How to Lie with Statistics, a book written by Darrell
Huff in 1954, is the classic text on this topic and has many
examples of intentional or unintentional misuses of statistics.
Misuse of graphs is a common way to misrepresent data or
results.
Bad samples may result in incorrect findings as well. Bad
samples occur when the method used to collect the data
results in a biased sample, so that the sample does not
represent the population from which it was obtained.
A voluntary response sample (or self-selected sample) is
one in which the respondents themselves decide whether to
be included.
Ex: Any poll or survey where the readers or listeners decide
whether to participate.
Correlation and Causality
Another way to misinterpret statistical data is to find a statistical
association between two variables and to conclude that one
of the variables caused the other. Such an association is called a
correlation. Only when one of the variables does cause a
change in the other do we have causality. Correlation does not
imply causality.
Reported results
When collecting data from people, it is better to take the
measurements yourself rather than rely on subjects to report
results.
Ex. 3 Voting Behavior: When surveyed about whether they
had voted, about 70% of 1000 eligible voters reported that
they had voted. Voting records showed that only 61% had
actually voted.
Small Samples
Conclusions should not be drawn from samples that are far too
small.
Ex. 4 The Children's Defense Fund published an article, Children Out
of School in America, in which it was reported that in a certain
school district 67% of the students were suspended at least
three times. The figure is based on a sample of only 3 students.
Percentages
Percentages can be cited in a manner that is either unclear or
misleading. 100% of something is all of it, so if you see
percentages above 100% being cited, they are probably not
justified.
Other Sampling considerations
Wording of a question
97% yes: "Should the President have the line item veto to
eliminate waste?"
57% yes: "Should the President have the line item veto, or not?"
Order of Questions
• Would you say that traffic contributes more or less to air
pollution than industry?
• Would you say that industry contributes more or less to
air pollution than traffic?
Traffic first: 45% blamed traffic and 27% blamed industry
Industry first: 24% blamed traffic and 57% blamed industry
Nonresponse
Other Sampling Considerations
Missing Data: Phone surveys miss people without phones
Self-Interest Study: Kiwi Shoe Polish, getting a job
Precise Numbers: You cannot assume precise numbers are
accurate
Deliberate Distortions: Avis vs. Hertz
Collecting Sample Data
If sample data are not collected in an appropriate way, the data
may be so completely useless that no amount of statistical
torturing can salvage them.
In an observational study, we observe and measure specific
characteristics, but we don't attempt to modify the subjects being
studied.
In an experiment, we apply some treatment and then proceed
to observe its effects on the subjects. (Subjects in
experiments are called experimental units.)
Types of Samples
A simple random sample of n subjects is selected in such a
way that every possible sample of the same size n has the
same chance of being chosen.
In a random sample, members from the population are
selected in such a way that each individual member in the
population has an equal chance of being selected.
A probability sample involves selecting members from a
population in such a way that each member of the population
has a known (but not necessarily the same) chance of being
selected.
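The definition of a simple random sample can be tried out in a few lines. This is only a minimal sketch: the population of 100 numbered members and the helper name simple_random_sample are assumptions made for illustration, not part of the notes. It relies on Python's random.sample, which draws without replacement so that every subset of the requested size is equally likely.

```python
import random

# Hypothetical population: 100 members labelled 1..100 (an assumption for illustration).
population = list(range(1, 101))


def simple_random_sample(members, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(members, n)


sample = simple_random_sample(population, 10, seed=1)
print(sample)
```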
Types of Samples Example
Each of the 50 states sends two senators to Congress, so
there are exactly 100 senators. Suppose that we write the
name of each state on a separate index card, then mix the 50
cards in a bowl, and then select one card. If we consider the
two senators from the selected state to be sampled, is this
result a random sample? Yes, since each individual senator
has an equal chance of being picked.
Simple random sample? No, not all samples of size two have
the same chance of being picked (a sample of senators
from different states cannot be picked at all).
Probability sample? Yes, since each senator has a known
chance of being selected.
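The three answers can be checked by brute force. The sketch below is illustrative only: the senator labels (state number plus seat "A" or "B") are made up, and the code simply enumerates the 50 equally likely outcomes of drawing one state card.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical labels: state s has senators (s, "A") and (s, "B").
states = range(50)
senators = [(s, seat) for s in states for seat in ("A", "B")]

# The 50 equally likely outcomes: one state card is drawn, both its senators are sampled.
possible_samples = [frozenset({(s, "A"), (s, "B")}) for s in states]

# Chance that a given senator ends up in the sample.
inclusion = {
    sen: Fraction(sum(sen in smp for smp in possible_samples), len(possible_samples))
    for sen in senators
}

# Every pair of senators that a simple random sample of size 2 would have to allow.
all_pairs = {frozenset(pair) for pair in combinations(senators, 2)}
reachable = set(possible_samples)

print(set(inclusion.values()))          # the same known chance for every senator
print(len(all_pairs), len(reachable))   # pairs that exist vs. pairs that can be drawn
```

Every senator has the same known chance (1 in 50), which is why the result is both a random sample and a probability sample. Only 50 of the 4950 possible pairs can ever be drawn, which is why it is not a simple random sample.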
Other Sampling Methods
In Systematic sampling, we select some starting point and then
select every kth element in the population.
With convenience sampling, we simply use the results that are very
easy to get.
With Stratified sampling, we subdivide the population into at least
two different subgroups (or strata) so that subjects within the same
subgroup share the same characteristics (such as gender or age
bracket), then we draw a sample from each subgroup (or stratum).
In Cluster sampling, we first divide the population area into sections
(or clusters), then randomly select some of those clusters, and
then choose all the members from those selected clusters.
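The systematic, stratified, and cluster methods above can be sketched as follows. Everything concrete here is an assumption for illustration: the 60-person population, the two age brackets used as strata, the city blocks used as clusters, and the function names.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 people, each with an id, an age bracket (stratum),
# and a city block (cluster).
population = [
    {"id": i, "age": ("under 40" if i % 2 == 0 else "40 and over"), "block": i // 10}
    for i in range(60)
]


def systematic_sample(members, k, start):
    """Pick a starting point, then take every k-th element after it."""
    return members[start::k]


def stratified_sample(members, key, per_stratum):
    """Split into strata by `key`, then draw a random sample from each stratum."""
    strata = {}
    for m in members:
        strata.setdefault(m[key], []).append(m)
    return [m for group in strata.values() for m in rng.sample(group, per_stratum)]


def cluster_sample(members, key, n_clusters):
    """Randomly choose whole clusters, then keep every member of the chosen clusters."""
    clusters = sorted({m[key] for m in members})
    chosen = set(rng.sample(clusters, n_clusters))
    return [m for m in members if m[key] in chosen]


systematic = systematic_sample(population, k=6, start=rng.randrange(6))
stratified = stratified_sample(population, "age", per_stratum=5)
clustered = cluster_sample(population, "block", n_clusters=2)
print(len(systematic), len(stratified), len(clustered))
```

Note the contrast between the last two: stratified sampling takes some members from every subgroup, while cluster sampling takes all members from some subgroups.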
Other Sampling Methods
Multistage sampling occurs when pollsters collect data using
a combination of the basic sampling methods. In a
multistage sample design, pollsters select a sample in
different stages, and each stage might use different methods
of sampling.
Group Quiz 2
1. The Statistical Abstract of the United States includes the
average per capita income for each of the 50 states. When
those 50 values are added, then divided by 50, the result is
$29,672.52. Is $29,672.52 the average per capita income
for all individuals in the United States? Why or why not?
Frequency Distributions
We recorded the pulse rates of 40 women. Here they are:
76 64 72 80 88 76 60 76 72 76
68 80 80 104 64 88 68 60 68 76
80 72 76 72 68 88 72 80 96 60
72 72 68 88 72 88 64 124 80 64
These data are hard to make sense of, so we (you) are going to
organize them using a frequency distribution (table).
Frequency Distributions
A frequency distribution shows how a data set is partitioned
among several categories (or classes) by listing all of
the categories along with the number of data values in each
of the categories.
Lower class limits are the smallest numbers that can belong
to the different classes.
Upper class limits are the largest numbers that can belong to
the different classes.
Class boundaries are the numbers used to separate the
classes, but without the gaps created by class limits.
Frequency Distributions
Class midpoints are the values in the middle of the classes.
Class width is the difference between two consecutive lower
class limits.
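As a small illustration of these definitions, the sketch below computes the class width, class midpoints, and class boundaries for the pulse-rate classes used in these notes (60-69, 70-79, ..., 120-129). The variable names are assumptions made for the example.

```python
# Class limits for the pulse-rate example (60-69, 70-79, ..., 120-129).
lower_limits = [60, 70, 80, 90, 100, 110, 120]
upper_limits = [69, 79, 89, 99, 109, 119, 129]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: halfway between each lower and upper class limit.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: split the gap between one upper limit and the next lower
# limit, then extend by the same half-gap at both ends.
half_gap = (lower_limits[1] - upper_limits[0]) / 2
boundaries = [lo - half_gap for lo in lower_limits] + [upper_limits[-1] + half_gap]

print(class_width)
print(midpoints)
print(boundaries)
```

For these classes the width is 10, the first midpoint is 64.5, and the boundaries run 59.5, 69.5, ..., 129.5.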
Procedure for constructing a frequency
distribution.
1. Determine the number of classes.
2. Calculate the class width.
class width = (max data value - min data value)/number of
classes, rounded up to a convenient number.
3. Choose either the min data value or a convenient value below
the min data value as the first lower class limit.
4. Using the first lower class limit and class width, list the other
lower class limits. Do this vertically and add in the upper
class limits.
5. Tally up the data values in each class.
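The five steps above can be sketched in code. This is a rough illustration rather than a standard routine: the function name frequency_distribution and its parameters are made up, it assumes whole-number data, and the class width is passed in as 10, the convenient value used for the pulse-rate classes in these notes.

```python
import math

# Pulse rates of the 40 women from these notes.
pulses = [
    76, 64, 72, 80, 88, 76, 60, 76, 72, 76,
    68, 80, 80, 104, 64, 88, 68, 60, 68, 76,
    80, 72, 76, 72, 68, 88, 72, 80, 96, 60,
    72, 72, 68, 88, 72, 88, 64, 124, 80, 64,
]


def frequency_distribution(data, num_classes, width=None, first_lower=None):
    """Return a list of ((lower limit, upper limit), frequency) pairs for integer data."""
    # Step 2: class width, rounded up to a whole number unless one is supplied.
    if width is None:
        width = math.ceil((max(data) - min(data)) / num_classes)
    # Step 3: first lower class limit (the minimum unless a convenient value is given).
    if first_lower is None:
        first_lower = min(data)
    # Step 4: lower limits, with each upper limit one unit below the next lower limit.
    classes = [
        (first_lower + i * width, first_lower + (i + 1) * width - 1)
        for i in range(num_classes)
    ]
    # Step 5: tally the data values in each class.
    return [((lo, hi), sum(lo <= x <= hi for x in data)) for lo, hi in classes]


table = frequency_distribution(pulses, num_classes=7, width=10, first_lower=60)
for (lo, hi), freq in table:
    print(f"{lo}-{hi}: {freq}")
```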
Example 1 Frequency table by hand.
76 64 72 80 88 76 60 76 72 76 68 80 80 104 64 88 68 60 68 76
80 72 76 72 68 88 72 80 96 60 72 72 68 88 72 88 64 124 80 64
1. Let's have 7 classes.
2. Find the width. 124-60=64 and 64/7=9.14, which we round up to a
convenient class width of 10.
3. List the min data value or a convenient value as the first lower
class limit: 60
4. List the lower class limits (60, 70, 80, 90, 100, 110, 120), then
add in the upper class limits:
60-69
70-79
80-89
90-99
100-109
110-119
120-129
5. Tally ho! Count the data values in each class.
Pulse Rate    Freq
60-69         12
70-79         14
80-89         11
90-99         1
100-109       1
110-119       0
120-129       1
Relative Frequency
In a relative frequency distribution, each class frequency is
replaced with a relative frequency (proportion) or a percentage
frequency (percent).
Relative frequency = class frequency/sum of all frequencies
Percentage freq = (class freq/sum of all freq)*100%
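A minimal sketch of the two formulas, applied to the pulse-rate frequencies from these notes. The variable names are assumptions chosen for the example.

```python
# Pulse-rate classes and their frequencies from these notes.
classes = ["60-69", "70-79", "80-89", "90-99", "100-109", "110-119", "120-129"]
frequencies = [12, 14, 11, 1, 1, 0, 1]

total = sum(frequencies)  # sum of all frequencies (40 women)

# Relative frequency = class frequency / sum of all frequencies.
relative = [f / total for f in frequencies]

# Percentage frequency = (class frequency / sum of all frequencies) * 100%.
percent = [100 * f / total for f in frequencies]

for label, rel, pct in zip(classes, relative, percent):
    print(f"{label:>7}  {rel:.3f}  {pct:.1f}%")
```

A useful check: the relative frequencies should add up to 1 and the percentages to 100%, apart from small rounding errors.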
Change into a relative frequency
Pulse Rate    Relative Frequency
60-69         12/40 = 0.3
70-79         14/40 = 0.35
80-89         11/40 = 0.275
90-99         1/40 = 0.025
100-109       1/40 = 0.025
110-119       0/40 = 0
120-129       1/40 = 0.025
Change into cumulative frequency
Pulse Rate     Cumulative Freq
69 or less     12
79 or less     12+14 = 26
89 or less     12+14+11 = 37
99 or less     12+14+11+1 = 38
109 or less    12+14+11+1+1 = 39
119 or less    12+14+11+1+1+0 = 39
129 or less    12+14+11+1+1+0+1 = 40
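The running sums in the cumulative table can be reproduced with itertools.accumulate from the Python standard library. This is only a sketch, and the variable names are assumptions.

```python
from itertools import accumulate

# Upper class limits and frequencies of the pulse-rate classes from these notes.
upper_limits = [69, 79, 89, 99, 109, 119, 129]
frequencies = [12, 14, 11, 1, 1, 0, 1]

# Each cumulative frequency is the sum of that class's frequency and all earlier ones.
cumulative = list(accumulate(frequencies))

for limit, cum in zip(upper_limits, cumulative):
    print(f"{limit} or less: {cum}")
```

The last cumulative frequency always equals the total number of data values, here 40.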
Frequency Distributions
Last Digit of female pulses    Frequency
0                              9
1                              0
2                              8
3                              0
4                              6
5                              0
6                              7
7                              0
8                              10
9                              0
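The last-digit table can be rebuilt from the raw pulse rates with collections.Counter. This is a minimal sketch with assumed variable names. It makes the pattern in the table easy to see: no pulse rate in this data set ends in an odd digit.

```python
from collections import Counter

# Pulse rates of the 40 women from these notes.
pulses = [
    76, 64, 72, 80, 88, 76, 60, 76, 72, 76,
    68, 80, 80, 104, 64, 88, 68, 60, 68, 76,
    80, 72, 76, 72, 68, 88, 72, 80, 96, 60,
    72, 72, 68, 88, 72, 88, 64, 124, 80, 64,
]

# The last digit of a whole number is its remainder after division by 10.
last_digit_counts = Counter(x % 10 for x in pulses)

# List every digit 0..9, including digits that never occur (frequency 0).
last_digit_table = [(d, last_digit_counts.get(d, 0)) for d in range(10)]

for digit, freq in last_digit_table:
    print(digit, freq)
```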
Frequency Distributions
IQ scores from 1000 randomly selected adults are summarized below.
Notice that the frequencies start low, increase, then decrease.
IQ         Frequency
50-69      24
70-89      228
90-109     490
110-129    232
130-149    26
Histograms
A histogram is a graph consisting of bars of equal width drawn
adjacent to each other (without gaps). The horizontal scale
represents classes of quantitative data values and the vertical
scale represents frequencies. The heights of the bars
correspond to the frequency values.
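A rough text-only sketch of the idea, using the pulse-rate frequency table from these notes. The function name text_histogram is made up for the example, and the bars are drawn sideways, so here the length of each bar (not its height) corresponds to the class frequency.

```python
# Pulse-rate classes and frequencies from these notes.
classes = ["60-69", "70-79", "80-89", "90-99", "100-109", "110-119", "120-129"]
frequencies = [12, 14, 11, 1, 1, 0, 1]


def text_histogram(labels, counts, symbol="#"):
    """Return one line per class: the label, a bar of `symbol`s, and the count."""
    width = max(len(label) for label in labels)  # right-align the class labels
    return [
        f"{label:>{width}} | {symbol * count} {count}"
        for label, count in zip(labels, counts)
    ]


lines = text_histogram(classes, frequencies)
print("\n".join(lines))
```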
[Figure: histogram "Female Pulse Rates"; horizontal axis: Pulse Rate; vertical axis: Frequency]
Relative Frequency Histogram
A relative frequency histogram is the same as a histogram with
relative frequencies instead of frequencies.
[Figure: relative frequency histogram "Female Pulse Rates"; horizontal axis: Pulse Rate; vertical axis: Relative Freq]
Cumulative Histogram
[Figure: "Cumulative Frequency Distribution of the Pulse Rates of Females"; horizontal axis classes: 69 or less, 79 or less, 89 or less, 99 or less, 109 or less, 119 or less, 129 or less]
[Figure: histogram "IQ Scores"; horizontal axis: IQ Score (50-69, 70-89, 90-109, 110-129, 130-149); vertical axis: Frequency]
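Because of its shape (the frequencies start low, rise to a peak,
then fall off in a roughly symmetric way), this data set is said
to have an approximately normal distribution.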
Histograms
[Figure: histogram "Weights of Pennies"; horizontal axis: Weight of Penny; vertical axis: Frequency]
Statistical Graphs