Chapter 2
Presenting Data in Tables and
Charts
Note:
• Sections 2.1 & 2.2 - examining data
from 1 numerical variable.
• Section 2.3 - examining data from 2
numerical variables.
• Section 2.4 - examining data from 1
categorical variable (read).
• Section 2.5 - examining data from 2
categorical variables.
Section 2.1
Organizing Numerical Data
Examining One Numerical Variable.
Ordered Array
• Array of data ordered from smallest to
largest value
– Makes it easier to see the extreme values
and where the majority of values are
located.
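As a minimal sketch of the same idea outside Excel, the values below (borrowed from the stem-and-leaf example later in this chapter, and shuffled here on purpose) can be put into an ordered array with Python's built-in sorted function. The variable names are illustrative only.

```python
# Values from the chapter's running example, shuffled so the sort is visible.
data = [81.4, 74.6, 89.2, 74, 79.8, 86.0, 74.3, 82.0, 78.4, 84.7, 80.2]

# An ordered array: the same values from smallest to largest.
ordered = sorted(data)

# The extreme values are now simply the first and last entries.
smallest, largest = ordered[0], ordered[-1]

print(ordered)
print("smallest:", smallest, "largest:", largest)
```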
Using Excel
• Data | Sort
• Select the heading of the column you want to
sort by first. Choose ascending or
descending.
• Select the heading of the column you want
to sort by second. Choose ascending or
descending. Etc.
• Choose appropriate button “Header row” or
“No header row”.
Stem & Leaf Display
• Shows how the data varies over a range
of observations
• Separates data according to leading
digits (stems) and trailing digits (leaves).
Stem & Leaf Display
Data
74
74.3
74.6
78.4
79.8
80.2
81.4
82.0
84.7
86.0
89.2
Stem Unit of 1
74 0 3 6
75
76
77
78 4
79 8
80 2
81 4
82 0
83
84 7
85
86 0
87
88
89 2
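The display with a stem unit of 1 can be sketched in a few lines of Python. This is only an illustration, not how PHStat builds its display: it assumes non-negative values recorded to one decimal place, and the function name stem_and_leaf is made up for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} using a stem unit of 1 and leaves in tenths.

    Assumes non-negative values recorded to one decimal place.
    """
    leaves = defaultdict(list)
    for value in values:
        tenths = round(value * 10)        # 74.3 -> 743
        stem, leaf = divmod(tenths, 10)   # 743 -> (74, 3)
        leaves[stem].append(leaf)
    # Keep empty stems so gaps in the data stay visible.
    return {s: sorted(leaves.get(s, []))
            for s in range(min(leaves), max(leaves) + 1)}

data = [74, 74.3, 74.6, 78.4, 79.8, 80.2, 81.4, 82.0, 84.7, 86.0, 89.2]
display = stem_and_leaf(data)

for stem, stem_leaves in display.items():
    print(stem, " ".join(str(leaf) for leaf in stem_leaves))
```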
Stem & Leaf Display
Data
74
74.3
74.6
78.4
79.8
80.2
81.4
82.0
84.7
86.0
89.2
x
7 458
8 00159
Stem unit: 10
Using PHStat
7 4 4 5 8 10
8 0 1 2 5 6 9
The 10 in the top right cell
shows that the number
rounds to 80 but is in the
70’s
Using PHStat to create a
Stem & Leaf Display
• PHStat | Descriptive Statistics | Stem-and-Leaf
Display
• Enter range of values
• If selection contains a heading, leave selected
“First cell contains a label”.
• Select Stem Unit
• Enter Title
Section 2.2
Tables And Charts For Numerical Data
Examining One Numerical Variable
The Frequency Distribution
• Data is arranged into class groupings.
• Creating class groupings
– Number of classes
• Depends on number of observations
• Typically 5 <= class groupings < 15
– Intervals should be the same width. Use the following:
• Width of interval = Range / Number of class groupings
– Avoid overlapping classes
Frequency Distribution (continued)
• Consists of the number of occurrences
of a value fitting within the range of
each interval.
• Advantage - Data characteristics can be
approximated.
• Disadvantage - Individual values are
lost due to the grouping.
Ex. Given the following data:
74
74.3
74.6
78.4
79.8
80.2
81.4
82.0
84.7
86.0
89.2
Number of classes: Let's choose 5.
Width of interval = (89.2 - 74) / 5 = 3.04
Approx. 3
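The width calculation above can be checked with a short Python sketch. The choice of 5 classes comes from the slide; rounding the width to a whole number is one convenient choice, not a rule.

```python
data = [74, 74.3, 74.6, 78.4, 79.8, 80.2, 81.4, 82.0, 84.7, 86.0, 89.2]

number_of_classes = 5                     # chosen, as on the slide
data_range = max(data) - min(data)        # 89.2 - 74
width = data_range / number_of_classes    # about 3.04

# Round to a convenient width for the class groupings.
chosen_width = round(width)

print(f"range = {data_range:.1f}, width = {width:.2f}, use {chosen_width}")
```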
Frequency Distribution
Interval   Frequency
74 - 77    3
77 - 80    2
80 - 83    3
83 - 86    1
86 - 89    1
89 - 92    1
Right boundary is not included.
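A minimal sketch of the counting step, assuming the class boundaries 74, 77, ..., 92 from the table above. Each class includes its left boundary and excludes its right boundary, which is why 86.0 is counted in 86 - 89 and not in 83 - 86.

```python
data = [74, 74.3, 74.6, 78.4, 79.8, 80.2, 81.4, 82.0, 84.7, 86.0, 89.2]

# Class boundaries with a width of 3, starting at the smallest value.
boundaries = [74, 77, 80, 83, 86, 89, 92]
intervals = list(zip(boundaries[:-1], boundaries[1:]))

# Count values with lower <= value < upper (right boundary not included).
frequencies = [sum(1 for value in data if lower <= value < upper)
               for lower, upper in intervals]

for (lower, upper), count in zip(intervals, frequencies):
    print(f"{lower} - {upper}: {count}")
```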
Using PHStat to create a
Frequency Distribution
• PHStat | Descriptive Statistics | Frequency
Distribution
• Enter the variable cell range
• Enter the bin cell range
• If you selected the heading when selecting
the data, leave selected “First cell in each
range contains label”.
• Leave selected “Single Group Variable”
• Enter title of your choice.
Bin (Used for PHStat only)
• Contains the values that approximate the
maximum value of each class.
• For example:
– If your intervals are,
• -20.0 to -10.0
• -10.0 to 0.0
• 0.0 to 10.0
• 10.0 to 20.0
– Your bin values could be
• -10.1
• -0.1
• 9.9
• 19.9
If your data were recorded
with 2 places after the
decimal, your bin values
would be:
• -10.01
• -0.01
• 9.99
• 19.99
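The pattern in the bin values is "upper boundary minus one unit in the last recorded decimal place". The sketch below shows that arithmetic only; bin_values is a hypothetical helper name, not a PHStat function.

```python
def bin_values(upper_boundaries, decimals):
    """Largest recordable value below each upper class boundary.

    decimals is the number of decimal places in the recorded data.
    """
    step = 10 ** -decimals   # 0.1 for one decimal place, 0.01 for two
    return [round(upper - step, decimals) for upper in upper_boundaries]

# Upper boundaries of the intervals -20.0 to -10.0, ..., 10.0 to 20.0.
uppers = [-10.0, 0.0, 10.0, 20.0]

one_decimal = bin_values(uppers, 1)
two_decimals = bin_values(uppers, 2)

print(one_decimal)
print(two_decimals)
```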
Example
See the file Sec2.2.xls
Relative Frequency
Distribution
• First create a Frequency Distribution.
• The values in the Relative Frequency
Distribution are formed by dividing the
frequency of each class by the total
number of values.
• The Relative Frequency Distribution
contains the proportion of values that
fall within each class.
Relative Frequency Distribution
Interval   Frequency   Relative Frequency
74 - 77    3           3/11 = .2727
77 - 80    2           2/11 = .1818
80 - 83    3           3/11 = .2727
83 - 86    1           1/11 = .0909
86 - 89    1           1/11 = .0909
89 - 92    1           1/11 = .0909
Total      11
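The division in the table above can be reproduced with a short sketch. The frequencies are those of the running example; the four-decimal formatting is only for display.

```python
labels = ["74 - 77", "77 - 80", "80 - 83", "83 - 86", "86 - 89", "89 - 92"]
frequencies = [3, 2, 3, 1, 1, 1]

total = sum(frequencies)                          # 11 values in all
relative = [count / total for count in frequencies]

for label, count, proportion in zip(labels, frequencies, relative):
    print(f"{label}: {count}/{total} = {proportion:.4f}")
```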
Percentage Distribution
• First create a Relative Frequency
Distribution
• The values in the Percentage
Distribution are formed by multiplying
each proportion in the Rel. Freq. Dist.
by 100.
Percentage Distribution
Interval   Freq.   Rel. Freq.   Percentage Freq.
0 - 74     0       0.00         0%
74 - 77    3       .2727        27.27%
77 - 80    2       .1818        18.18%
80 - 83    3       .2727        27.27%
83 - 86    1       .0909        9.09%
86 - 89    1       .0909        9.09%
89 - 92    1       .0909        9.09%
Total      11
Benefit of a Relative
Frequency Distribution or
Percentage Distribution
• Essential when comparing two sets of
data consisting of a different number of
values.
For example:
Study 1: 2, 8, 9, 2, 5
5 occurs 1/5 times.
1/5 = 0.2 Or 20% of the time
Study 2: 2, 5, 2, 5, 8, 5, 5, 8, 5, 2, 5, 5
5 occurs 7/12 times.
7/12 = 0.583 Or
58.3% of the time
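A small sketch of why proportions make the two studies comparable. The two lists are one reading of the slide's columns that agrees with the stated counts (one 5 out of 5 values, seven 5s out of 12 values); treat the exact ordering as an assumption.

```python
# Assumed reading of the two columns on the slide.
study_1 = [2, 8, 9, 2, 5]
study_2 = [2, 5, 2, 5, 8, 5, 5, 8, 5, 2, 5, 5]

def proportion(values, target):
    """Proportion of entries in values that equal target."""
    return values.count(target) / len(values)

p1 = proportion(study_1, 5)   # 1 out of 5
p2 = proportion(study_2, 5)   # 7 out of 12

# Raw counts (1 versus 7) are not comparable because the studies differ
# in size; proportions or percentages are.
print(f"Study 1: {p1:.3f} ({100 * p1:.1f}%)")
print(f"Study 2: {p2:.3f} ({100 * p2:.1f}%)")
```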
Cumulative Percentage
Distribution
• Shows the cumulative percentage of
values that fall below the lower
boundary of each class.
Cumulative Percentage
Distribution
Interval   Rel.Fq.   Cumulative Dist.
0 - 74     0.00      0% = 0.0%
74 - 77    0.2727    0% = 0.0%
77 - 80    0.1818    27.27% = 27.27%
80 - 83    0.2727    27.27% + 18.18% = 45.45%
83 - 86    0.0909    27.27% + 18.18% + 27.27% = 72.72%
86 - 89    0.0909    27.27% + 18.18% + 27.27% + 9.09% = 81.81%
89 - 92    0.0909    27.27% + 18.18% + 27.27% + 9.09% + 9.09% = 90.90%
92 - 95    0.00      27.27% + 18.18% + 27.27% + 9.09% + 9.09% + 9.09% = 99.99%
Total      .9999
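The running sums above can be sketched with itertools.accumulate. This version works from the exact frequencies instead of the rounded percentages, so the last class comes out at 100% where the slide's hand-added figures give 99.99%; a few other entries differ in the last digit for the same reason.

```python
from itertools import accumulate

labels = ["0 - 74", "74 - 77", "77 - 80", "80 - 83",
          "83 - 86", "86 - 89", "89 - 92", "92 - 95"]
frequencies = [0, 3, 2, 3, 1, 1, 1, 0]
total = sum(frequencies)

# Number of values below the lower boundary of each class:
# a running total shifted by one class.
below = [0] + list(accumulate(frequencies))[:-1]
cumulative_pct = [100 * count / total for count in below]

for label, pct in zip(labels, cumulative_pct):
    print(f"{label}: {pct:.2f}%")
```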
Cumulative Percentage
Distribution
• Top of Pg. 56. SOLUTION From Table
2.5 ...
• Error
Using PHStat to create a
Percentage or Cumulative
Percentage Distribution
• These are automatically generated
when you create a Frequency
distribution.
Class Midpoint
• Point halfway between the boundaries
of each class.
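For the classes of the running example, the midpoints can be computed as the average of each pair of neighboring boundaries. A minimal sketch:

```python
# Class boundaries from the frequency distribution example.
boundaries = [74, 77, 80, 83, 86, 89, 92]

# Midpoint = (lower boundary + upper boundary) / 2 for each class.
midpoints = [(lower + upper) / 2
             for lower, upper in zip(boundaries, boundaries[1:])]

print(midpoints)
```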
Histogram
• Using a picture to demonstrate data.
• Describes the numerical data that has been
grouped into a frequency, relative frequency,
or percentage distribution.
• The random variable of interest is displayed
along the horizontal axis (x-axis).
• The number, proportion or percentage of
values per class are plotted along the vertical
axis (y-axis)
Histogram
(Figure: histogram of Frequency, vertical axis 0 to 3, for the classes 0 - 74, 74 - 77, 77 - 80, 80 - 83, 83 - 86, 86 - 89, 89 - 92, 92 - 95 on the horizontal axis.)
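As a rough stand-in for the chart, the sketch below prints a text histogram of the frequency distribution. It is turned on its side compared with the slide: classes run down the page and each bar extends to the right. The layout is an illustrative choice, not PHStat output.

```python
labels = ["74 - 77", "77 - 80", "80 - 83", "83 - 86", "86 - 89", "89 - 92"]
frequencies = [3, 2, 3, 1, 1, 1]

# One line per class; the number of asterisks equals the class frequency.
lines = [f"{label:<8} {'*' * count} ({count})"
         for label, count in zip(labels, frequencies)]

print("\n".join(lines))
```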
Polygon (same info as
Histogram)
• Using a picture to demonstrate data.
• Describes the numerical data that has been
grouped into a frequency, relative frequency,
or percentage distribution.
• The random variable of interest is displayed
along the horizontal axis (x-axis).
• The number, proportion or percentage of
values per class are plotted along the vertical
axis (y-axis)
Polygon
(Figure: polygon of Frequency, vertical axis 0 to 3.5, plotted over the classes 0 - 74, 74 - 77, 77 - 80, 80 - 83, 83 - 86, 86 - 89, 89 - 92, 92 - 95 on the horizontal axis.)
Using PHStat to create a
Histogram & Polygon
• PHStat | Descriptive Statistics | Histogram &
Polygons
• Enter the Variable Cell Range
• Enter the Bin Cell Range
• Enter the Midpoints Cell Range
• If the first row contains headings, leave
selected “First cell in each range contains
label”.
• Select “Multiple Groups - Unstacked”.
• Enter title of your choice
• Leave check boxes on default selection.
Section 2.3
• Graphing Bivariate Numerical Data
• Examining 2 numerical variables.
Scatter Diagram
• Used to demonstrate the relationship
between two numerical variables.
• One numerical variable is plotted on the
x-axis.
• The other numerical variable is plotted
on the y-axis.
• The result is a point on the x-y plane.
Example
• Cholesterol Level
200 176 115 100 120 199 151 100 150
• Meat Consumption in Ounces / Day
24 21 8 3 3 30 26 6 15
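Each observation becomes one point on the x-y plane. The sketch below pairs the two lists position by position, which assumes the values are listed in the same order for both variables; cholesterol is used as x and meat consumption as y.

```python
cholesterol = [200, 176, 115, 100, 120, 199, 151, 100, 150]
meat_oz_per_day = [24, 21, 8, 3, 3, 30, 26, 6, 15]

# Pair the i-th cholesterol value with the i-th meat value:
# each pair is one (x, y) point of the scatter diagram.
points = list(zip(cholesterol, meat_oz_per_day))

for x, y in points:
    print(f"({x}, {y})")
```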
Scatter Diagram of previous
data:
(Figure: scatter diagram with Meat Consumption in Ounces / Day on the vertical axis, 0 to 35, and Cholesterol Level on the horizontal axis, 0 to 250.)
Section 2.4
• Tables and charts for categorical data
• Covered in CSC 199
– Read
Section 2.5
• Tabulating and Graphing Bivariate
Categorical Data
• Use a Contingency Table or a
Side-By-Side Chart.
Contingency Table
• Also called, “Cross-Classification Table”
• Used to study the values from two
categorical variables.
Example:
A sample of 14 graduates
was taken and each
individual was asked:
1. What was your major?
2. What is your salary
level?
<= $30,000
$30,000 - $50,000
>= $50,000
Degree       Salary Level
English      >= $50,000
Math         $30,000 - $50,000
Math         <= $30,000
English      $30,000 - $50,000
English      <= $30,000
Philosophy   $30,000 - $50,000
Philosophy   <= $30,000
English      >= $50,000
Philosophy   <= $30,000
Math         >= $50,000
Math         $30,000 - $50,000
Math         >= $50,000
Math         >= $50,000
English      $30,000 - $50,000
A count of the number of degrees within each salary range.
Degree        <= $30,000   $30,000 - $50,000   >= $50,000   Total
English       1            2                   2            5
Math          1            2                   3            6
Philosophy    2            1                   0            3
Grand Total   4            5                   5            14
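The counts in the contingency table can be tallied from the individual responses. The sketch below uses collections.Counter on the 14 (degree, salary level) pairs listed in the example; the variable names are illustrative.

```python
from collections import Counter

LOW, MID, HIGH = "<= $30,000", "$30,000 - $50,000", ">= $50,000"

# The 14 responses from the example, in the order listed.
responses = [
    ("English", HIGH), ("Math", MID), ("Math", LOW), ("English", MID),
    ("English", LOW), ("Philosophy", MID), ("Philosophy", LOW),
    ("English", HIGH), ("Philosophy", LOW), ("Math", HIGH),
    ("Math", MID), ("Math", HIGH), ("Math", HIGH), ("English", MID),
]

counts = Counter(responses)
degrees = ["English", "Math", "Philosophy"]
levels = [LOW, MID, HIGH]

# One row per degree, one column per salary level.
table = {d: [counts[(d, level)] for level in levels] for d in degrees}
row_totals = {d: sum(row) for d, row in table.items()}
column_totals = [sum(table[d][i] for d in degrees) for i in range(len(levels))]
grand_total = sum(row_totals.values())

for d in degrees:
    print(d, table[d], row_totals[d])
print("Grand Total", column_totals, grand_total)
```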
Each value is divided by the total (14)
Percentages based on overall total
Degree       <= $30,000   $30,000 - $50,000   >= $50,000   Total
English      7.14%        14.29%              14.29%       35.71%
Math         7.14%        14.29%              21.43%       42.86%
Philosophy   14.29%       7.14%               0.0%         21.43%
Total        28.57%       35.71%              35.71%       100.00%
28.57 % of all polled make $30,000 or under.
42.86 % of all polled majored in math.
21.43 % of all polled majored in math and make $50,000 or more.
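The overall percentages can be reproduced by dividing every count by the grand total. A minimal sketch, with the counts typed in from the contingency table and columns ordered <= $30,000, $30,000 - $50,000, >= $50,000:

```python
# Counts per degree; columns are the three salary levels in order.
counts = {
    "English":    [1, 2, 2],
    "Math":       [1, 2, 3],
    "Philosophy": [2, 1, 0],
}

grand_total = sum(sum(row) for row in counts.values())   # 14

# Each cell as a percentage of everyone polled.
overall_pct = {degree: [round(100 * c / grand_total, 2) for c in row]
               for degree, row in counts.items()}

# Share of all polled in each degree (the Total column).
degree_pct = {degree: round(100 * sum(row) / grand_total, 2)
              for degree, row in counts.items()}

print(overall_pct)
print(degree_pct)
```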
Each value is divided by the total of its row.
Percentages based on row total
Degree       <= $30,000   $30,000 - $50,000   >= $50,000   Total
English      20.00 %      40.00 %             40.00 %      100.00 %
Math         16.67 %      33.33 %             50.00 %      100.00 %
Philosophy   66.67 %      33.33 %             0.0 %        100.00 %
Total        28.57 %      35.71 %             35.71 %      100.00 %
Of those who majored in math, 50.00 % make $50,000 or more.
Of those who majored in philosophy, 66.67 % make $30,000 or less.
Each value is divided by the total of its column
Percentages based on column total
Degree       <= $30,000   $30,000 - $50,000   >= $50,000   Total
English      25.00 %      40.00 %             40.00 %      35.71 %
Math         25.00 %      40.00 %             60.00 %      42.86 %
Philosophy   50.00 %      20.00 %             0.0 %        21.43 %
Total        100.00 %     100.00 %            100.00 %     100.00 %
Of those who make $30,000 or less, 50.00 % majored in philosophy.
Of those who make between $30,000 and $50,000, 20.00 % majored
in philosophy.
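Row and column percentages differ only in which total each count is divided by. A minimal sketch, with the counts typed in from the contingency table and columns ordered <= $30,000, $30,000 - $50,000, >= $50,000:

```python
counts = {
    "English":    [1, 2, 2],
    "Math":       [1, 2, 3],
    "Philosophy": [2, 1, 0],
}

# Row percentages: divide each count by the total of its row (degree).
row_pct = {degree: [round(100 * c / sum(row), 2) for c in row]
           for degree, row in counts.items()}

# Column percentages: divide each count by the total of its column
# (salary level).
column_totals = [sum(column) for column in zip(*counts.values())]
column_pct = {degree: [round(100 * c / t, 2)
                       for c, t in zip(row, column_totals)]
              for degree, row in counts.items()}

print(row_pct)
print(column_pct)
```

Reading row_pct["Math"][2] gives the share of math majors who make $50,000 or more; reading column_pct["Philosophy"][0] gives the share of those making $30,000 or less who majored in philosophy.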
Side-By-Side Chart
• Visual display of bivariate categorical data.
• Used to detect relationships in the data.
Consider the following data:
                                      NC   SC   NE   IL
Percentage of Pop. that is literate   93   89   99   98
Percent of crime-related deaths       10   15   4    5
Side-By-Side Chart of the previous data
(Figure: horizontal side-by-side bars of Crime Rate and Literacy Rate for IL, NE, SC, and NC; horizontal axis from 0 to 150.)
See the following:
• Excel Handbook for Chapter 2
• Pg. 93 - 104