Graphical Descriptive Techniques

Frequency Distribution
Guidelines for Selecting Number of Classes

Use between 5 and 20 classes.
Data sets with a larger number of elements usually require a larger number of classes.
Smaller data sets usually require fewer classes.
Frequency Distribution
Guidelines for Selecting Width of Classes

Use classes of equal width.
Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
Example: Hudson Auto Repair
The manager of Hudson Auto would like to get a better picture of the distribution of costs for engine tune-up parts. A sample of 50 customer invoices has been taken and the costs of parts, rounded to the nearest dollar, are listed below.

91   71  104   85   62   78   69   74   97   82
93   72   62   88   98   57   89   68   68  101
75   66   97   83   79   52   75  105   68  105
99   79   77   71   79   80   75   65   69   69
97   72   80   67   62   62   76  109   74   73
Example: Hudson Auto Repair
Frequency Distribution
If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10

Cost ($)    Frequency
50-59            2
60-69           13
70-79           16
80-89            7
90-99            7
100-109          5
Total           50
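The class-width calculation and the tally for the Hudson Auto data can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the starting value of 50 for the first class and the variable names are assumptions made for the example.

```python
import math

# Parts costs ($) for the 50 Hudson Auto invoices listed in the example.
costs = [
    91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
    93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
    75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
    99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
    97, 72, 80, 67, 62, 62, 76, 109, 74, 73,
]

num_classes = 6
approx_width = (max(costs) - min(costs)) / num_classes  # (109 - 52) / 6 = 9.5
width = math.ceil(approx_width)                          # rounded up to 10

# Assumption: the first class starts at 50, a convenient value at or below the minimum.
first_lower = 50
frequencies = [0] * num_classes
for cost in costs:
    frequencies[(cost - first_lower) // width] += 1

for i, freq in enumerate(frequencies):
    lower = first_lower + i * width
    print(f"{lower}-{lower + width - 1}: {freq}")
```

Rounding the width up rather than to the nearest value guarantees that six classes cover the whole range of the data.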
Example: Hudson Auto Repair
Relative Frequency and Percent Frequency Distributions

Cost ($)    Relative Frequency    Percent Frequency
50-59             .04                     4
60-69             .26                    26
70-79             .32                    32
80-89             .14                    14
90-99             .14                    14
100-109           .10                    10
Total            1.00                   100
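As a rough sketch of how the relative and percent frequencies follow from the class frequencies, the Python below divides each frequency by the total and scales by 100. The dictionary layout and names are illustrative assumptions, not something the slides prescribe.

```python
# Class frequencies from the Hudson Auto frequency distribution.
frequencies = {
    "50-59": 2, "60-69": 13, "70-79": 16,
    "80-89": 7, "90-99": 7, "100-109": 5,
}

total = sum(frequencies.values())  # 50 invoices

# Relative frequency = class frequency / total number of observations.
relative = {cls: freq / total for cls, freq in frequencies.items()}

# Percent frequency = relative frequency * 100 (whole numbers for these data).
percent = {cls: round(100 * freq / total) for cls, freq in frequencies.items()}

# Share of parts costs under $70: the first two classes combined.
under_70 = percent["50-59"] + percent["60-69"]

for cls in frequencies:
    print(f"{cls:>8}  {relative[cls]:.2f}  {percent[cls]:>3}")
print(f"Under $70: {under_70}%")
```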
Example: Hudson Auto Repair
Insights Gained from the Percent Frequency Distribution

Only 4% of the parts costs are in the $50-59 class.
30% of the parts costs are under $70.
The greatest percentage (32%, or almost one-third) of the parts costs are in the $70-79 class.
10% of the parts costs are $100 or more.
Graphical Techniques for Interval Data
Example 1: Providing information concerning the monthly bills of new subscribers in the first month after signing on with a telephone company.

Collect data
Prepare a frequency distribution
Draw a histogram
Example 1: Providing information
Collect data
Bills
42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05, ...
(There are 200 data points.)

Prepare a frequency distribution
How many classes to use?

Number of observations    Number of classes
Less than 50              5-7
50 - 200                  7-9
200 - 500                 9-10
500 - 1,000               10-11
1,000 - 5,000             11-13
5,000 - 50,000            13-17
More than 50,000          17-20

Class width = [Range] / [# of classes]
            = [Largest observation - Smallest observation] / [# of classes]
            = [119.63 - 0] / [8] = 14.95, rounded up to 15
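The table lookup and the class-width formula can be written as two small helper functions. This is only a sketch: the function names are hypothetical, and because the ranges in the table share endpoints (200 appears in two rows), the code assumes each upper bound belongs to the lower row.

```python
import math

# (upper bound on the number of observations, suggested range of classes).
# Assumption: a shared endpoint such as 200 is assigned to the lower row.
CLASS_GUIDE = [
    (200, (7, 9)),
    (500, (9, 10)),
    (1_000, (10, 11)),
    (5_000, (11, 13)),
    (50_000, (13, 17)),
]

def suggested_classes(n_observations):
    """Return the (min, max) number of classes suggested by the table."""
    if n_observations < 50:
        return (5, 7)
    for upper, class_range in CLASS_GUIDE:
        if n_observations <= upper:
            return class_range
    return (17, 20)

def class_width(largest, smallest, n_classes):
    """Class width = range / number of classes."""
    return (largest - smallest) / n_classes

low, high = suggested_classes(200)          # telephone bills: 200 observations
raw_width = class_width(119.63, 0.0, 8)     # 8 classes, as chosen in the example
print(low, high, round(raw_width, 2), math.ceil(raw_width))
```

With 200 observations the table suggests 7 to 9 classes, so the 8 classes used in the example sit inside the suggested range.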
Example 1: Providing information
Draw a Histogram
[Histogram: Frequency (0 to 80) versus Bills, with class limits at 15, 30, 45, 60, 75, 90, 105 and 120]

Bin    Frequency
15        71
30        37
45        13
60         9
75        10
90        18
105       28
120       14
Example 1: Providing information
What information can we extract from this histogram?
[Histogram: Frequency versus Bills, annotated with three groups of classes]
About half of all the bills are small: 71+37=108
A few bills are in the middle range: 13+9+10=32
Relatively large number of large bills: 18+28+14=60
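The three group totals read off the histogram can be checked directly from the bin frequencies. The sketch below assumes the grouping used on the slide (first two classes, next three, last three); the variable names are illustrative.

```python
# Bin upper limits and frequencies for the 200 telephone bills.
bin_upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]

total = sum(frequencies)         # 200 bills

small = sum(frequencies[:2])     # bills up to 30
middle = sum(frequencies[2:5])   # bills from 30 to 75
large = sum(frequencies[5:])     # bills above 75

for label, count in [("small", small), ("middle", middle), ("large", large)]:
    print(f"{label:>6}: {count:>3} bills ({count / total:.0%})")
```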
Relative frequency
It is often preferable to show the relative frequency (proportion) of observations falling into each class, rather than the frequency itself.
Class relative frequency = Class frequency / Total number of observations
Relative frequencies should be used when

the population relative frequencies are studied
two or more histograms are compared
the numbers of observations in the samples studied are different
Class width
It is generally best to use equal class widths, but sometimes unequal class widths are called for.
Unequal class widths are used when the frequency associated with some classes is too low. Then,

several classes are combined together to form a wider and “more populated” class.
It is possible to form an open-ended class at the higher end or lower end of the histogram.
Shapes of histograms
There are four typical shape characteristics
Shapes of histograms
Negatively skewed
Positively skewed
Modal classes
A modal class is the one with the largest
number of observations.
A unimodal histogram
The modal class
Modal classes
A bimodal histogram
A modal class
A modal class
Bell shaped histograms
• Many statistical techniques require that the population be bell shaped.
• Drawing the histogram helps us to verify the shape of the population in question.
Interpreting histograms
Example 2: Selecting an investment

An investor is considering investing in one of two investments.
The returns on these investments were recorded.
From the two histograms, how can the investor interpret
• the expected returns
• the spread of the returns (the risk involved with each investment)?
Example 2 - Histograms
[Two histograms, “Return on investment A” and “Return on investment B”; horizontal axis from -15 to 75 in steps of 15, vertical axis frequency from 0 to 18; the center of each histogram is marked]
Interpretation: The center of the returns of Investment A is slightly lower than that for Investment B.
Example 2 - Histograms
[The two histograms, “Return on investment A” and “Return on investment B”, each with sample size = 50; annotated values 17, 34 and 46 on histogram A and 16, 26 and 43 on histogram B]
Interpretation: The spread of returns for Investment A is less than that for Investment B.
Example 2 - Histograms
[The two histograms, “Return on investment A” and “Return on investment B”]
Interpretation: Both histograms are slightly positively skewed. There is a possibility of large returns.
Providing information
Example 2: Conclusion

It seems that investment A is better, because:
• Its expected return is only slightly below that of investment B.
• The risk from investing in A is smaller.
• The possibility of having a high rate of return exists for both investments.
Interpreting histograms
Example 3: Comparing students’ performance

Students’ performance in two statistics classes was compared.
The two classes differed in their teaching emphasis:
• Class A – mathematical analysis and development of theory.
• Class B – applications and computer-based analysis.

The final mark for each student in each course was recorded.
Draw histograms and interpret the results.
Interpreting histograms
[Two histograms of final marks, “Marks(Manual)” and “Marks(Computer)”; horizontal axis from 50 to 100, vertical axis frequency from 0 to 40]
The mathematical emphasis creates two groups, and a larger spread.
Stem and Leaf Display
This is a graphical technique most often used in a preliminary analysis.
Stem and leaf diagrams use the actual value of the original observations (whereas the histogram does not).
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
8 | 57
9 | 3678
Stem-and-Leaf Display
Leaf Units

A single digit is used to define each leaf.
In the preceding example, the leaf unit was 1.
Leaf units may be 100, 10, 1, 0.1, and so on.
Where the leaf unit is not shown, it is assumed to equal 1.
Example: Leaf Unit = 0.1
If we have data with values such as
8.6  11.7  9.4  9.1  10.2  11.0  8.8
a stem-and-leaf display of these data will be
Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7
Example: Leaf Unit = 10
If we have data with values such as
1806 1717 1974 1791 1682 1910 1838
a stem-and-leaf display of these data will be
Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7
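The two leaf-unit examples can be reproduced with a short Python sketch. The function name `stem_and_leaf` is hypothetical, the code assumes non-negative values, and it drops digits below the leaf unit by truncation (so 1806 with leaf unit 10 becomes stem 18, leaf 0), which matches the displays above. `Decimal` is used so that a value like 8.6 is not distorted by floating-point division.

```python
from collections import defaultdict
from decimal import Decimal

def stem_and_leaf(values, leaf_unit=1):
    """Group values into {stem: sorted leaves} for the given leaf unit.

    Assumes non-negative values; digits smaller than the leaf unit are dropped.
    """
    rows = defaultdict(list)
    for value in values:
        # Express the value in leaf units, truncating anything smaller.
        scaled = int(Decimal(str(value)) / Decimal(str(leaf_unit)))
        stem, leaf = divmod(scaled, 10)   # last digit is the leaf
        rows[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}

examples = [
    ([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], 0.1),
    ([1806, 1717, 1974, 1791, 1682, 1910, 1838], 10),
]
for values, unit in examples:
    print(f"Leaf Unit = {unit}")
    for stem, leaves in stem_and_leaf(values, unit).items():
        print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```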
Stem and Leaf Display
Split each observation into two parts.
There are several ways of doing that:
Observation: 42.19    Stem: 42    Leaf: 19
Observation: 42.19    Stem: 4     Leaf: 2
A stem and leaf display for Example 1 will use the second method (stem 4, leaf 2) next.
Stem and Leaf Display
A stem and leaf display for Example 1

Stem | Leaf
   0 | 0000000000111112222223333345555556666666778888999999
   1 | 000001111233333334455555667889999
   2 | 0000111112344666778999
   3 | 001335589
   4 | 124445589
   5 | 33566
   6 | 3458
   7 | 022224556789
   8 | 334457889999
   9 | 00112222233344555999
  10 | 001344446699
  11 | 124557889

The length of each line represents the frequency of the class defined by the stem.
Ogives
Ogives are cumulative relative frequency distributions.
Example 1 - continued
Cumulative relative frequency for telephone bills

Class      Frequency    Cumulative frequency    Cumulative relative frequency
0-15          71               71                71/200 = .355
15-30         37              108               108/200 = .540
30-45         13              121               121/200 = .605
45-60          9              130               130/200 = .650
60-75         10              140               140/200 = .700
75-90         18              158               158/200 = .790
90-105        28              186               186/200 = .930
105-120       14              200               200/200 = 1.000

[Ogive: cumulative relative frequency versus Bills, with points .355, .540, .605, .650, .700, .790, .930 and 1.000 plotted at 15, 30, 45, 60, 75, 90, 105 and 120]
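The cumulative columns of the ogive table are running totals of the class frequencies. A minimal Python sketch, assuming the eight classes end at 15, 30, ..., 120 as in the histogram bins (the variable names are illustrative):

```python
from itertools import accumulate

# Upper class limits and frequencies for the 200 telephone bills.
class_upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]

# Running total of the frequencies gives the cumulative frequency.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Dividing by the total gives the cumulative relative frequency (the ogive points).
cumulative_relative = [c / total for c in cumulative]

for upper, cum, cum_rel in zip(class_upper_limits, cumulative, cumulative_relative):
    print(f"{upper:>3}  {cum:>3}  {cum_rel:.3f}")
```

Each printed pair (upper limit, cumulative relative frequency) is one point on the ogive.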
Summarizing Qualitative Data
Frequency Distribution
Relative Frequency
Percent Frequency Distribution
Bar Graph
Pie Chart
Frequency Distribution
A frequency distribution is a tabular
summary of data showing the frequency
(or number) of items in each of several
nonoverlapping classes.
The objective is to provide insights
about the data that cannot be quickly
obtained by looking only at the original
data.
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are shown below.

Below Average, Above Average, Above Average, Average,
Above Average, Average, Above Average, Average,
Above Average, Below Average, Poor, Excellent,
Above Average, Average, Above Average, Above Average,
Below Average, Poor, Above Average, Average
Example: Marada Inn
Frequency Distribution

Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
Relative Frequency
Distribution
The relative frequency of a class is the
fraction or proportion of the total number
of data items belonging to the class.
A relative frequency distribution is a
tabular summary of a set of data
showing the relative frequency for each
class.
Percent Frequency
Distribution
The percent frequency of a class is the
relative frequency multiplied by 100.
A percent frequency distribution is a
tabular summary of a set of data
showing the percent frequency for each
class.
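Example: Marada Inn
Relative Frequency and Percent Frequency Distributions

Rating           Relative Frequency    Percent Frequency
Poor                   .10                    10
Below Average          .15                    15
Average                .25                    25
Above Average          .45                    45
Excellent              .05                     5
Total                 1.00                   100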
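For qualitative data, the frequency, relative frequency and percent frequency distributions all come from a simple count of each category. The sketch below tallies the 20 Marada Inn ratings with Python's `collections.Counter`; the ordering list and variable names are assumptions made for display purposes.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn example.
ratings = [
    "Below Average", "Above Average", "Above Average", "Average",
    "Above Average", "Average", "Above Average", "Average",
    "Above Average", "Below Average", "Poor", "Excellent",
    "Above Average", "Average", "Above Average", "Above Average",
    "Below Average", "Poor", "Above Average", "Average",
]

# Display order from worst to best (an assumption for presentation only).
order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)   # frequency of each rating
total = len(ratings)        # 20 guests

for rating in order:
    freq = counts[rating]
    relative = freq / total       # relative frequency
    percent = 100 * relative      # percent frequency
    print(f"{rating:<14} {freq:>2}  {relative:.2f}  {percent:>3.0f}")
```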
Graphical Techniques for Nominal Data
The only allowable calculation on nominal data is to count the frequency of each value of a variable.
When the raw data can be naturally categorized in a meaningful manner, we can display frequencies by

Bar charts – emphasize the frequency of occurrences of the different categories.
Pie charts – emphasize the proportion of occurrences of each category.
The Pie Chart
The pie chart is a circle, subdivided into
a number of slices that represent the
various categories.
The size of each slice is proportional to
the percentage corresponding to the
category it represents.
Pie Charts
The pie chart is a commonly used graphical device
for presenting relative frequency distributions for
qualitative data.
First draw a circle; then use the relative frequencies
to subdivide the circle into sectors that correspond to
the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a
relative frequency of .25 would consume .25(360) =
90 degrees of the circle.
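The sector calculation can be sketched as follows. The function name `slice_angles` is hypothetical, and the input is the count of each rating in the Marada Inn sample of 20 guests; a class with relative frequency .25 (here "Average", 5 of 20) comes out at 90 degrees, as in the rule above.

```python
def slice_angles(frequencies):
    """Return the pie-chart sector angle (in degrees) for each class.

    Each angle is the class's relative frequency times 360.
    """
    total = sum(frequencies.values())
    return {label: 360 * freq / total for label, freq in frequencies.items()}

# Counts of each rating in the Marada Inn sample of 20 guests.
marada = {
    "Poor": 2, "Below Average": 3, "Average": 5,
    "Above Average": 9, "Excellent": 1,
}

angles = slice_angles(marada)
for label, angle in angles.items():
    print(f"{label:<14} {angle:6.1f} degrees")
```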
Example: Marada Inn
Pie Chart
Example: Marada Inn
Insights Gained from the Preceding Pie Chart
• One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
• For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.
The Pie Chart
Example 4

The student placement office at a university wanted to determine the general areas of employment of last year’s graduates.
Data were collected, and the count of the occurrences was recorded for each area.
These counts were converted to proportions and the results were presented as a pie chart and a bar chart.
The Pie Chart
[Pie chart of areas of employment]
Other 11.1%
Accounting 28.9%
General management 14.2%
Finance 20.6%
Marketing 25.3%
Slice angle for Accounting: (28.9/100)(360°) = 104°
Pie Charts
Advantages
• display relative proportions of multiple classes of data
• size of the circle can be made proportional to the total quantity it represents
• summarize a large data set in visual form
• are visually simpler than other types of graphs
• permit a visual check of the reasonableness or accuracy of calculations
• require minimal additional explanation
• are easily understood due to widespread use in business and the media
Disadvantages
• do not easily reveal exact values
• many pie charts may be needed to show changes over time
• fail to reveal key assumptions, causes, effects, or patterns
• can be easily manipulated to yield false impressions
Bar Graph
A bar graph is a graphical device for depicting
qualitative data.
On the horizontal axis we specify the labels that are
used for each of the classes.
A frequency, relative frequency, or percent frequency
scale can be used for the vertical axis.
Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
The bars are separated to emphasize the fact that
each class is a separate category.
The Bar Chart
Rectangles represent each category.
The height of the rectangle represents the frequency.
The base of the rectangle is arbitrary.
[Bar chart: Frequency (0 to 80) by Area, with categories 1 to 5 and “More”; data labels 73, 64, 52, 36 and 28]
Example: Marada Inn
The Bar Chart
Use bar charts also when the order in which nominal data are presented is meaningful.
[Bar chart: Total number of new products introduced in North America in the years 1989,…,1994; vertical axis from 0 to 20,000, horizontal axis from ‘89 to ‘94]
Describing the Relationship Between Two Variables
We are interested in the relationship between two interval variables.
Example 7

A real estate agent wants to study the relationship between house price and house size.
Twelve houses recently sold are sampled and their size and price recorded.
Use a graphical technique to describe the relationship between size and price.

Size    Price
23      315
18      229
26      335
20      261
……      ……
Describing the Relationship Between Two Variables
Solution

The size (independent variable, X) affects the price (dependent variable, Y).
We use Excel to create a scatter diagram.
[Scatter diagram: Y (0 to 400) plotted against X (0 to 40)]
Typical Patterns of Scatter Diagrams
Positive linear relationship
No relationship
Negative nonlinear relationship
Negative linear relationship
Nonlinear (concave) relationship
This is a weak linear relationship.
A nonlinear relationship seems to fit the data better.
Graphing the Relationship Between Two Nominal Variables
We create a contingency table.
This table lists the frequency for each combination of values of the two variables.
We can create a bar chart that represents the frequency of occurrence of each combination of values.
Crosstabulation
Crosstabulation is a tabular method for summarizing the data for two variables simultaneously.
Crosstabulation can be used when:
• One variable is qualitative and the other is quantitative
• Both variables are qualitative
• Both variables are quantitative
The left and top margin labels define the classes for the two variables.
Contingency table
Example 8

To conduct an efficient advertisement campaign, the relationship between occupation and newspaper readership is studied. The following table was created.

         Blue collar    White collar    Professional
G&M          27             29              33
Post         18             43              51
Star         38             15              24
Sun          37             21              18
Contingency table
Solution
If there is no relationship between
occupation and newspaper read, the bar
charts describing the frequency of
readership of newspapers should look
similar across occupations.
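One way to make that comparison concrete is to convert each occupation's counts to proportions and see which newspapers rank highest. The sketch below assumes the table's numbers are read row by row (G&M: 27, 29, 33; Post: 18, 43, 51; Star: 38, 15, 24; Sun: 37, 21, 18), which is the reading consistent with the interpretation given for the bar charts; the function and variable names are illustrative.

```python
newspapers = ["G&M", "Post", "Star", "Sun"]

# Readership counts per occupation, in the order of `newspapers`.
# Assumption: the extracted table values are read row by row.
counts = {
    "Blue collar":  [27, 18, 38, 37],
    "White collar": [29, 43, 15, 21],
    "Professional": [33, 51, 24, 18],
}

# Proportion of each occupation reading each newspaper.
shares = {
    occupation: [count / sum(row) for count in row]
    for occupation, row in counts.items()
}

def top_two(occupation):
    """Return the two most-read newspapers for an occupation."""
    ranked = sorted(zip(counts[occupation], newspapers), reverse=True)
    return [paper for _, paper in ranked[:2]]

for occupation, row in shares.items():
    profile = ", ".join(f"{p}: {s:.0%}" for p, s in zip(newspapers, row))
    print(f"{occupation:<13} {profile}  (top two: {top_two(occupation)})")
```

If occupation and newspaper were unrelated, the three rows of proportions would look roughly alike; here they clearly differ.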
Bar charts for a contingency table
[Three bar charts of newspaper readership, labeled “Blue”, “White” and “Prof”, with the newspapers coded 1 to 4 on the horizontal axis]
Blue-collar workers prefer the “Star” and the “Sun”.
White-collar workers and professionals mostly read the “Post” and the “Globe and Mail”.
Describing Time-Series Data
Data can be classified according to the time it is collected.

Cross-sectional data are all collected at the same time.
Time-series data are collected at successive points in time.
Time-series data are often depicted on a line chart (a plot of the variable over time).
Line Chart
Example 9

The total amounts of income tax paid by individuals in 1987 through 1999 are listed below.
Draw a graph of these data and describe the information produced.
Line Chart
[Line chart: vertical axis from 0 to 1,200,000, horizontal axis years 87 to 99]
For the first five years, total tax was relatively flat.
From 1993 there was a rapid increase in tax revenues.
Line charts can be used to describe nominal data time series.
Tabular and Graphical
Procedures