Probability & Statistics for P

PROBABILITY &
STATISTICS FOR P-8
TEACHERS
Chapter 2
Frequency Distributions and
Graphs
WHAT IS STATISTICS?
Most people become familiar with probability and statistics
through radio, television, newspapers, and magazines. For
example, the following statements were found in
newspapers.
o The U.S. is home to a quarter of the world’s cars
o The mean NFL salary is $1.8 million in 2009
  • The league minimum is $295,000
  • The median salary is $770,000
o The average cost of a wedding is $19,581
o There are only thirteen blimps in the world.
o Women who eat fish once a week are 29% less likely to develop heart disease
Is Statistics important in our life?
VARIOUS USES OF STATISTICS
Statistics is used in almost all fields of human endeavor.
• In sports: the number of yards a running back gains during a football game, or the number of hits a baseball player gets in a season.
• In public health: the number of residents who contract a new strain of flu virus during a certain year.
• In education: are new methods of teaching better than old ones?
Furthermore, statistics is used to analyze the results of surveys and as a
tool in scientific research to make decisions based on controlled
experiments.
Other uses of statistics include operations research, quality control,
estimation, and prediction.
Statistics is the science of conducting studies to collect, organize, summarize, analyze, present, interpret, and draw conclusions from data.
Data: any values (observations or measurements) that have been collected.
WHAT IS STATISTICS?



Statistics is a collection of tools and
methods, designed to help us understand
the world.
Statistics are calculations and
interpretations made from data.
Statistics helps us to understand the real,
imperfect world in which we live.
WHAT IS STATISTICS?
THINK, SHOW, TELL

There are three simple steps to doing Statistics right:
• Think first. Know where you’re headed and why.
• Show is about the mechanics of calculating statistics and graphical displays, which are important (but are not the most important part of Statistics).
• Tell what you’ve learned. You must explain your results so that someone else can understand your conclusions.
TERMINOLOGY
• A variable is a characteristic or attribute that can assume different values.
• The values that a variable can assume are called data.
• A population consists of all subjects (human or otherwise) that are studied.
• A sample is a subset of the population.
POPULATION VS SAMPLE
DESCRIPTIVE AND INFERENTIAL
STATISTICS
• Descriptive statistics consists of the collection, organization, summarization, and presentation of data.
• Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
DEFINITIONS
Variable: A characteristic that varies from one person or
thing to another.
Qualitative variable: A non-numerically valued variable.
Quantitative variable: A numerically valued variable.
Discrete variable: A quantitative variable whose
possible values can be listed.
Continuous variable: A quantitative variable whose
possible values form some interval of numbers.
A discrete variable is a quantitative variable that either
has a finite number of possible values or a countable
number of possible values. The term “countable”
means the values result from counting such as 0, 1, 2, 3,
and so on.
A continuous variable is a quantitative variable that
has an infinite number of possible values it can take on
and can be measured to any desired level of accuracy.
ORGANIZING DATA
• Data collected in original form is called raw data.
• A frequency distribution is the organization of raw data in table form, using classes and frequencies.
• Qualitative data can be placed in categories and organized in categorical frequency distributions.
CATEGORICAL FREQUENCY
DISTRIBUTION
A random sample of twenty-five patients in a
hospital were given a blood test to determine
their blood type.
Raw Data:
A    B    B    AB   O
O    O    B    B    AB
B    B    O    A    O
AB   O    O    O    A
A    AB   O    B    A
Construct a frequency distribution for the
data.
CATEGORICAL FREQUENCY DISTRIBUTION
Class   Tally        Frequency   Percent
A       IIIII        5           20
B       IIIII II     7           28
O       IIIII IIII   9           36
AB      IIII         4           16
Total                25          100
RELATIVE FREQUENCY DISTRIBUTION
The proportion (or percentage) of observations in a class is called the
relative frequency of the class.
A table that provides all classes and their relative
frequencies is called a relative-frequency
distribution. Note that the relative frequencies sum
to 1 (100%).
Class   Tally        Frequency   Percent   Relative Frequency
A       IIIII        5           20        5/25
B       IIIII II     7           28        7/25
O       IIIII IIII   9           36        9/25
AB      IIII         4           16        4/25
Total                25          100
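The tally and the relative frequencies can also be produced programmatically. The following is a minimal Python sketch, assuming the 25 blood types are held in a list; the names blood_types, frequency, and relative_frequency are illustrative and do not come from the slides.

```python
from collections import Counter

# Raw data from the blood-type example (25 patients).
blood_types = [
    "A", "B", "B", "AB", "O",
    "O", "O", "B", "B", "AB",
    "B", "B", "O", "A", "O",
    "AB", "O", "O", "O", "A",
    "A", "AB", "O", "B", "A",
]

# Frequency: how many observations fall in each class.
frequency = Counter(blood_types)
total = sum(frequency.values())

# Relative frequency: class frequency divided by the total count.
relative_frequency = {cls: count / total for cls, count in frequency.items()}

for cls in ["A", "B", "O", "AB"]:
    print(f"{cls:<3} {frequency[cls]:>2}  {relative_frequency[cls]:.2f}")
```

Multiplying each relative frequency by 100 gives the Percent column of the table.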
GROUPED FREQUENCY DISTRIBUTION
• Grouped frequency distributions are used when the range of the data is large.
• The smallest and largest possible data values in a class are the lower and upper class limits. Class boundaries separate the classes.
• To find a class boundary, average the upper class limit of one class and the lower class limit of the next class.
GROUPED FREQUENCY DISTRIBUTION
• The class width can be calculated by subtracting
  o successive lower class limits (or boundaries)
  o successive upper class limits (or boundaries)
  o upper and lower class boundaries
• The class midpoint Xm can be calculated by averaging
  o upper and lower class limits (or boundaries)
GROUPED FREQUENCY
DISTRIBUTION
Rules for Classes in Grouped Frequency
Distributions
1. There should be 5-20 classes.
2. The class width should be an odd number.
3. The classes must be mutually exclusive.
4. The classes must be continuous.
5. The classes must be exhaustive.
6. The classes must be equal in width (except in open-ended distributions).
GROUPED FREQUENCY
DISTRIBUTION
The following data represent
the record high temperatures
for each of the 50 states.
Construct a grouped frequency
distribution for the data using 7
classes.
112  100  127  120  134  118  105  110  109  112
110  118  117  116  118  122  114  114  105  109
107  112  114  115  118  117  118  122  106  110
116  108  110  121  113  120  119  111  104  111
120  113  120  117  105  110  118  112  114  114
CONSTRUCTING A GROUPED
FREQUENCY DISTRIBUTION
STEP 1 Determine the classes.
Find the class width by dividing the range by the
number of classes, 7.
Range = High – Low = 134 – 100 = 34
Width = Range/7 = 34/7 ≈ 4.86, which rounds up to 5
Rounding Rule: Always round up if there is a remainder.
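The arithmetic of Step 1 can be checked with a short Python sketch. The variable names are illustrative, and math.ceil is used here as one way to express the "always round up" rule.

```python
import math

low, high = 100, 134   # lowest and highest record temperatures
num_classes = 7        # number of classes requested in the example

data_range = high - low                       # 134 - 100
width = math.ceil(data_range / num_classes)   # 34/7 is about 4.86, rounded up

print("Range:", data_range)
print("Width:", width)
```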
CONSTRUCTING A GROUPED FREQUENCY
DISTRIBUTION
• For convenience, we will choose the lowest data value, 100, for the first lower class limit.
• The subsequent lower class limits are found by adding the width to the previous lower class limits.
• The first upper class limit is one less than the next lower class limit.
• The subsequent upper class limits are found by adding the width to the previous upper class limits.
Class Limits
100 - 104
105 - 109
110 - 114
115 - 119
120 - 124
125 - 129
130 - 134
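The procedure for generating the class limits can be sketched in Python as follows. This assumes whole-number data (so each upper limit is one less than the next lower limit); the names low, width, and class_limits are illustrative.

```python
low = 100          # first lower class limit (the lowest data value)
width = 5          # class width found in Step 1
num_classes = 7

# Lower limits step up by the width; each upper limit is one less
# than the next lower limit because the data are whole numbers.
class_limits = [
    (low + i * width, low + (i + 1) * width - 1)
    for i in range(num_classes)
]

for lower, upper in class_limits:
    print(f"{lower} - {upper}")
```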
CONSTRUCTING A GROUPED FREQUENCY
DISTRIBUTION
• The class boundary is midway between an upper class limit and a subsequent lower class limit. For example, the boundary between the limits 104 and 105 is 104.5.
Class Limits   Class Boundaries   Frequency
100 - 104      99.5 - 104.5
105 - 109      104.5 - 109.5
110 - 114      109.5 - 114.5
115 - 119      114.5 - 119.5
120 - 124      119.5 - 124.5
125 - 129      124.5 - 129.5
130 - 134      129.5 - 134.5
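As a rough check of the definitions of class boundary, class width, and class midpoint, the sketch below derives all three from the class limits. It assumes equally spaced classes with whole-number limits; the variable names are illustrative.

```python
class_limits = [(100, 104), (105, 109), (110, 114), (115, 119),
                (120, 124), (125, 129), (130, 134)]

# Gap between one class's upper limit and the next class's lower limit
# (1 here, because the temperatures are whole numbers).
gap = class_limits[1][0] - class_limits[0][1]

# Each boundary sits half a gap outside the limits, which is the same
# as averaging an upper limit with the next lower limit.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in class_limits]

# Width: upper class boundary minus lower class boundary.
widths = [upper - lower for lower, upper in boundaries]

# Midpoint: average of the lower and upper class limits.
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

for limits, bounds, mid in zip(class_limits, boundaries, midpoints):
    print(limits, bounds, mid)
```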
CONSTRUCTING A GROUPED FREQUENCY
DISTRIBUTION
STEP 2 Tally the data.
STEP 3 Find the frequencies.
Class Limits   Class Boundaries   Frequency
100 - 104      99.5 - 104.5       2
105 - 109      104.5 - 109.5      8
110 - 114      109.5 - 114.5      18
115 - 119      114.5 - 119.5      13
120 - 124      119.5 - 124.5      7
125 - 129      124.5 - 129.5      1
130 - 134      129.5 - 134.5      1
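Steps 2 and 3 amount to counting how many data values fall within each pair of class limits. A minimal Python sketch, using the 50 record high temperatures from the example (variable names are illustrative):

```python
# Record high temperatures for the 50 states, from the example.
temperatures = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

class_limits = [(100, 104), (105, 109), (110, 114), (115, 119),
                (120, 124), (125, 129), (130, 134)]

# Count the values that fall between each lower and upper limit, inclusive.
frequencies = [
    sum(1 for t in temperatures if lower <= t <= upper)
    for lower, upper in class_limits
]

for (lower, upper), freq in zip(class_limits, frequencies):
    print(f"{lower} - {upper}: {freq}")
```

Because the classes are mutually exclusive and exhaustive, the frequencies add up to the number of data values.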
CUMULATIVE FREQUENCY DISTRIBUTION
Sometimes it is helpful to keep a running total of the
frequencies. This is called a cumulative frequency
distribution.
Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100 - 104      99.5 - 104.5       2           2
105 - 109      104.5 - 109.5      8           10
110 - 114      109.5 - 114.5      18          28
115 - 119      114.5 - 119.5      13          41
120 - 124      119.5 - 124.5      7           48
125 - 129      124.5 - 129.5      1           49
130 - 134      129.5 - 134.5      1           50
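The running total can be computed in one step. This sketch uses itertools.accumulate from the Python standard library; the list name frequencies is illustrative.

```python
from itertools import accumulate

# Class frequencies for the record-temperature example.
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Running total: each entry adds the current class to all earlier ones.
cumulative = list(accumulate(frequencies))

print(cumulative)
```

The last cumulative frequency always equals the total number of data values.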
DATA ANALYSIS

There are three simple rules of data
analysis:
1. Make a picture: things may be revealed that are not obvious in the raw data. These will be things to think about.
2. Make a picture: important features of and patterns in the data will show up. You may also see things that you did not expect.
3. Make a picture: the best way to tell others about your data is with a well-chosen picture.
LET’S DRAW PICTURES
Most Common Graphs in Research
1. Histogram
2. Frequency Polygon
3. Cumulative Frequency Polygon (Ogive)
4. Bar Graph
5. Pareto Chart
6. Time-Series Graph
7. Pie Graph
8. Dot Plot
9. Stem and Leaf Plot
10. Scatter Plot
HISTOGRAMS
The histogram is a graph that displays
the data by using vertical bars of various
heights to represent the frequencies of
the classes.
The class boundaries are represented on
the horizontal axis.
HISTOGRAMS
Construct a histogram to represent the data
for the record high temperatures for each of
the 50 states
Histograms use class boundaries and frequencies of the classes.
Class Limits   Class Boundaries   Frequency
100 - 104      99.5 - 104.5       2
105 - 109      104.5 - 109.5      8
110 - 114      109.5 - 114.5      18
115 - 119      114.5 - 119.5      13
120 - 124      119.5 - 124.5      7
125 - 129      124.5 - 129.5      1
130 - 134      129.5 - 134.5      1
HISTOGRAMS
The horizontal (x) axis shows the class boundaries.
The vertical (y) axis shows the frequencies.
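For a quick look at the shape without any plotting software, the histogram can be approximated in plain text. The sketch below is only an illustration: it draws the bars sideways (one row per class) rather than vertically, and the names boundaries, frequencies, and rows are assumptions of this example.

```python
boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
              (119.5, 124.5), (124.5, 129.5), (129.5, 134.5)]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# One row per class: the class boundaries, then one '#' per data value.
rows = [
    f"{lower:>5} - {upper:<5} | {'#' * freq}"
    for (lower, upper), freq in zip(boundaries, frequencies)
]

print("\n".join(rows))
```

The longest bar belongs to the class 109.5 - 114.5, which has the highest frequency.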
FREQUENCY POLYGON
• The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the class midpoints. The frequencies are represented by the heights of the points.
• The class midpoints are represented on the horizontal axis.
FREQUENCY POLYGONS
Construct a frequency polygon to represent the
data for the record high temperatures for each
of the 50 states
Frequency polygons use class midpoints and frequencies of the classes.
Class Limits   Class Midpoints   Frequency
100 - 104      102               2
105 - 109      107               8
110 - 114      112               18
115 - 119      117               13
120 - 124      122               7
125 - 129      127               1
130 - 134      132               1
FREQUENCY POLYGONS
A frequency polygon
is anchored on the
x-axis before the first
class and after the
last class.
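The points of the frequency polygon, including the two anchor points on the x-axis, can be listed with a short Python sketch. It assumes the anchors sit one class width before the first midpoint and one class width after the last; the variable names are illustrative.

```python
class_limits = [(100, 104), (105, 109), (110, 114), (115, 119),
                (120, 124), (125, 129), (130, 134)]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Midpoint of each class: average of its lower and upper limits.
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]
width = midpoints[1] - midpoints[0]

# Anchor the polygon on the x-axis (frequency 0) one class before the
# first midpoint and one class after the last midpoint.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```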
OGIVES
(CUMULATIVE FREQUENCY
POLYGON)
• The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
• The upper class boundaries are represented on the horizontal axis.
OGIVES
Construct an Ogive to represent the data for
the record high temperatures for each of the
50 states
Ogives use upper class boundaries and cumulative frequencies of the classes.
Class Limits   Class Boundaries   Cumulative Frequency
100 - 104      99.5 - 104.5       2
105 - 109      104.5 - 109.5      10
110 - 114      109.5 - 114.5      28
115 - 119      114.5 - 119.5      41
120 - 124      119.5 - 124.5      48
125 - 129      124.5 - 129.5      49
130 - 134      129.5 - 134.5      50
OGIVES
RELATIVE FREQUENCIES
If proportions are used instead of
frequencies, the graphs are called relative
frequency graphs.
Relative frequency graphs are used when
the proportion of data values that fall into a
given class is more important than the
actual number of data values that fall into
that class.
Construct a histogram, frequency polygon,
and ogive using relative frequencies for the
distribution (shown here) of the miles that 20
randomly selected runners ran during a
given week.
Miles per Week   Frequency   Relative Frequency
6 - 10           1           0.05
11 - 15          2           0.10
16 - 20          3           0.15
21 - 25          5           0.25
26 - 30          4           0.20
31 - 35          3           0.15
36 - 40          2           0.10
Total            20          1.00
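The relative frequencies in the table, and the cumulative relative frequencies that a relative-frequency ogive would plot, can be computed as in the following sketch. The variable names are illustrative; the cumulative values are derived here and are not shown on the slide.

```python
from itertools import accumulate

classes = ["6 - 10", "11 - 15", "16 - 20", "21 - 25",
           "26 - 30", "31 - 35", "36 - 40"]
frequencies = [1, 2, 3, 5, 4, 3, 2]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total (20 runners).
relative = [f / total for f in frequencies]

# Cumulative relative frequency: running total divided by the total.
cumulative_relative = [c / total for c in accumulate(frequencies)]

for cls, rel, cum in zip(classes, relative, cumulative_relative):
    print(f"{cls:<8} {rel:.2f} {cum:.2f}")
```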
GRAPHS
SHAPES OF DISTRIBUTIONS
OTHER TYPES OF GRAPHS
BAR GRAPHS
OTHER TYPES OF GRAPHS
PARETO CHARTS
OTHER TYPES OF GRAPHS
TIME SERIES GRAPHS
OTHER TYPES OF GRAPHS
PIE GRAPHS
OTHER TYPES OF GRAPHS
DOT PLOTS
OTHER TYPES OF GRAPHS
STEM AND LEAF PLOTS
A stem and leaf plot is a data plot that
uses part of a data value as the stem and
part of the data value as the leaf to form
groups or classes.
It has the advantage over grouped
frequency distribution of retaining the
actual data while showing them in graphic
form.
STEM AND LEAF PLOTS
At an outpatient testing center, the number
of cardiograms performed each day for 20
days is shown. Construct a stem and leaf
plot for the data.
25   31   20   32   13
14   43    2   57   23
36   32   33   32   44
32   52   44   51   45
STEM AND LEAF PLOTS
First, we list the leading digits of the numbers in
the table (0, 1, . . . , 5) in a column, as shown to
the left of the vertical rule. Next, we write the
final digit of each number from the table to the
right of the vertical rule in the row containing
the appropriate leading digit.
Unordered Stem Plot
0 | 2
1 | 3 4
2 | 5 0 3
3 | 1 2 6 2 3 2 2
4 | 3 4 4 5
5 | 7 2 1
Ordered Stem Plot
0 | 2
1 | 3 4
2 | 0 3 5
3 | 1 2 2 2 2 3 6
4 | 3 4 4 5
5 | 1 2 7
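The same construction can be sketched in Python: split each value into a leading digit (stem) and a final digit (leaf), group the leaves by stem, then sort. The names cardiograms, stems, and ordered are illustrative.

```python
from collections import defaultdict

# Number of cardiograms performed each day for 20 days.
cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

# Group leaves (final digits) under their stems (leading digits).
stems = defaultdict(list)
for value in cardiograms:
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# Ordered stem plot: stems ascending, leaves sorted within each stem.
ordered = {stem: sorted(stems[stem]) for stem in sorted(stems)}

for stem, leaves in ordered.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Skipping the sorted() call on the leaves gives the unordered stem plot instead.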
SCATTER PLOTS AND
CORRELATION
• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
• A scatter plot is used to determine if a relationship exists between the two variables.
EXAMPLE: MOTORCYCLE
ACCIDENTS
A researcher is interested in determining if there is a relationship between the number of motorcycle accidents and the number of motorcycle fatalities. The data are for a 10-year period. Draw a scatter plot for the data.
No. of accidents, x    376  650  884  1162  1513  1650  2236  3002  4028  4010
No. of fatalities, y     5   20   20    28    26    34    35    56    68    55
Step 1: Draw and label the x and y axes.
Step 2: Plot each point on the graph.
EXAMPLE: MOTORCYCLE ACCIDENTS
Scatter plot: number of fatalities (vertical axis, 0 to 80) against number of accidents (horizontal axis, 0 to 5000) for the ten data pairs.
ANALYZING THE SCATTER PLOT
1. A positive linear relationship exists
when the points fall approximately in an
ascending straight line from left to right
and both the x and y values increase at the
same time.
2. A negative linear relationship exists
when the points fall approximately in a
descending straight line from left to right.
3. A nonlinear relationship exists when the
points fall in a curved line.
4. It is said that no relationship exists when
there is no discernable pattern of the points.
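The slides judge the relationship by eye. As a numeric companion, which the slides themselves do not cover, one common measure of linear relationship is Pearson's correlation coefficient r: values near +1 suggest a positive linear relationship, values near -1 a negative one, and values near 0 little or no linear relationship. A minimal sketch for the motorcycle data, with illustrative variable names:

```python
import math

# Motorcycle example: accidents (x) and fatalities (y) over 10 years.
accidents = [376, 650, 884, 1162, 1513, 1650, 2236, 3002, 4028, 4010]
fatalities = [5, 20, 20, 28, 26, 34, 35, 56, 68, 55]

n = len(accidents)
mean_x = sum(accidents) / n
mean_y = sum(fatalities) / n

# Sums of products and squares of the deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(accidents, fatalities))
sxx = sum((x - mean_x) ** 2 for x in accidents)
syy = sum((y - mean_y) ** 2 for y in fatalities)

# Pearson's r: covariance scaled by the spread of each variable.
r = sxy / math.sqrt(sxx * syy)

print(round(r, 2))
```

A value of r close to +1 is consistent with the ascending pattern in the scatter plot.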
ANALYZING THE SCATTER PLOT
(a) Positive linear
relationship
(b) Negative linear
relationship
(c) Nonlinear relationship
(d) No relationship
WHAT CAN GO WRONG?

Don’t make a histogram of a categorical
variable—bar charts or pie charts should
be used for categorical data.
WHAT CAN GO WRONG?

Choose a bin width appropriate to the
data.

Changing the bin width changes the
appearance of the histogram:
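As a rough illustration using the record-temperature classes from earlier, the sketch below merges adjacent classes of width 5 into classes of width 10 and prints both versions as text bars. The merging approach and the variable names are assumptions of this example, not part of the slides.

```python
# Frequencies for classes of width 5: 100-104, 105-109, ..., 130-134.
frequencies_width_5 = [2, 8, 18, 13, 7, 1, 1]

# Merge adjacent pairs to get classes of width 10:
# 100-109, 110-119, 120-129, 130-139.
frequencies_width_10 = [
    sum(frequencies_width_5[i:i + 2])
    for i in range(0, len(frequencies_width_5), 2)
]

print("Width 5:")
for freq in frequencies_width_5:
    print("#" * freq)

print("Width 10:")
for freq in frequencies_width_10:
    print("#" * freq)
```

With the wider bins the display has only four bars, so the detail visible at width 5 (for example, the difference between the 110-114 and 115-119 classes) is hidden.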
WHAT CAN GO WRONG?


• Avoid inconsistent scales, either within the display or when comparing two displays.
• Label clearly so a reader knows what the plot displays.
Good intentions, bad plot: