
EF 507
QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE
FALL 2008
Chapter 2
Describing Data: Graphical
Chap 2-1
Chapter Goals
After completing this chapter, you should be able to:
• Identify types of data and levels of measurement
• Create and interpret graphs to describe categorical variables: bar chart, pie chart
• Create a line chart to describe time-series data
• Create and interpret graphs to describe numerical variables: histogram
• Construct and interpret graphs to describe relationships between variables
• Describe appropriate and inappropriate ways to display data graphically
Chap 2-2
Types of Data
Data
• Categorical (defined categories or groups)
  Examples: marital status; "Are you registered to vote?"; eye color
• Numerical
  - Discrete (counted items). Examples: number of children; defects per hour
  - Continuous (measured characteristics). Examples: weight; voltage
Chap 2-3
Measurement Levels
Quantitative data
• Ratio data: differences between measurements are meaningful, and a true zero exists
• Interval data: differences between measurements are meaningful, but there is no true zero
Qualitative data
• Ordinal data: ordered categories (rankings, order, or scaling)
• Nominal data: categories with no ordering or direction
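The practical difference between interval and ratio data is whether ratios make sense. The slides give no example, so the sketch below uses temperature as an illustration of our own: Celsius is an interval scale (0 °C is not "no temperature"), while Kelvin has a true zero and is a ratio scale. The two readings are illustrative values, not data from the course.

```python
# Illustrative assumption: two temperature readings in degrees Celsius.
# Celsius is an interval scale; Kelvin (true zero) is a ratio scale.

def celsius_to_kelvin(c: float) -> float:
    return c + 273.15

warm, cool = 20.0, 10.0

# Differences are meaningful on both scales and agree.
diff_c = warm - cool
diff_k = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)

# Ratios are only meaningful on the ratio (Kelvin) scale.
ratio_c = warm / cool                                        # 2.0, yet 20 C is not "twice as hot"
ratio_k = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)  # about 1.035

print(f"difference: {diff_c:.2f} C = {diff_k:.2f} K")
print(f"ratio using Celsius: {ratio_c:.3f}")
print(f"ratio using Kelvin:  {ratio_k:.3f}")
```

The same 10-degree difference appears on both scales, but only the Kelvin ratio describes how much more thermal energy the warmer reading represents.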
Chap 2-4
Graphical Presentation of Data
• Data in raw form are usually not easy to use for decision making
• Some type of organization is needed:
  - Table
  - Graph
• The type of graph to use depends on the variable being summarized
Chap 2-5
Graphical Presentation of Data
(continued)
Techniques reviewed in this chapter:
Categorical variables
• Bar chart
• Pie chart
• Pareto diagram
Numerical variables
• Line chart
• Histogram
• Scatter plot
Chap 2-6
Tables and Graphs for Categorical Variables
Categorical Data
  Graphing Data
    - Bar Chart
    - Pie Chart
Chap 2-7
The Frequency Distribution Table
Summarize data by category
Example: Hospital Patients by Unit (variables are categorical)

Hospital Unit       Number of Patients
Cardiac Care        1,052
Emergency           2,245
Intensive Care      340
Maternity           552
Surgery             4,630
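A frequency distribution table like this one is produced by counting how many observations fall in each category. The sketch below does that with Python's collections.Counter; the short list of admissions is hypothetical and is not the hospital data in the table above.

```python
from collections import Counter

# Hypothetical raw observations: the unit each admitted patient was assigned to.
admissions = [
    "Surgery", "Emergency", "Surgery", "Maternity", "Cardiac Care",
    "Surgery", "Emergency", "Intensive Care", "Surgery", "Emergency",
]

def frequency_table(observations):
    """Return (category, count) pairs, sorted by category name."""
    counts = Counter(observations)
    return sorted(counts.items())

table = frequency_table(admissions)
for unit, count in table:
    print(f"{unit:<15}{count:>5}")
```

Every observation lands in exactly one category, so the counts always add up to the number of observations.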
Chap 2-8
Bar and Pie Charts
• Bar charts and pie charts are often used for qualitative (category) data
• The height of a bar or the size of a pie slice shows the frequency or percentage for each category
Chap 2-9
Bar Chart Example
[Figure: bar chart "Hospital Patients by Unit", one bar per hospital unit: Cardiac Care 1,052; Emergency 2,245; Intensive Care 340; Maternity 552; Surgery 4,630. Vertical axis: number of patients per year (0 to 5,000).]
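In a bar chart each category's frequency sets the length of its bar. To avoid depending on a plotting library, the sketch below draws the hospital data as a text bar chart; the 40-character maximum bar length is an arbitrary choice of ours.

```python
# Patients per hospital unit, from the frequency distribution table.
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}

def text_bar_chart(data, width=40):
    """Return one text line per category; bar length is proportional to the value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<15}{bar} {value:,}")
    return lines

chart = text_bar_chart(patients)
print("\n".join(chart))
```

Because every bar is scaled by the same factor, the relative bar lengths preserve the relative frequencies; Surgery gets the full 40 characters and Intensive Care only 3.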
Chap 2-10
Pie Chart Example
Hospital Unit       Number of Patients   % of Total
Cardiac Care        1,052                11.93
Emergency           2,245                25.46
Intensive Care      340                  3.86
Maternity           552                  6.26
Surgery             4,630                52.50

[Figure: pie chart "Hospital Patients by Unit": Cardiac Care 12%, Emergency 25%, Intensive Care 4%, Maternity 6%, Surgery 53% (percentages are rounded to the nearest percent).]
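The % of Total column is each unit's count divided by the total number of patients (8,819), times 100. A pie slice then gets the same share of the 360-degree circle. A minimal sketch of that arithmetic:

```python
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}

total = sum(patients.values())  # 8,819 patients in all

# Percentage of the total for each unit, rounded to two decimals as in the table.
percent = {unit: round(100 * n / total, 2) for unit, n in patients.items()}

for unit, n in patients.items():
    angle = 360 * n / total  # slice angle in degrees
    print(f"{unit:<15}{percent[unit]:6.2f}%   slice {angle:6.1f} degrees")
```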
Chap 2-11
Pareto Diagram
• Used to portray categorical data
• A bar chart in which categories are shown in descending order of frequency
• A cumulative polygon is often shown in the same graph
• Used to separate the “vital few” from the “trivial many”
Chap 2-12
Pareto Diagram Example
Example: 400 defective items are examined for cause of defect:

Source of Manufacturing Error   Number of Defects
Bad Weld                        34
Poor Alignment                  223
Missing Part                    25
Paint Flaw                      78
Electrical Short                19
Cracked Case                    21
Total                           400
Chap 2-13
Pareto Diagram Example
(continued)
Step 1: Sort the causes of defects by number of defects, in descending order
Step 2: Determine the % in each category

Source of Manufacturing Error   Number of Defects   % of Total Defects
Poor Alignment                  223                 55.75
Paint Flaw                      78                  19.50
Bad Weld                        34                  8.50
Missing Part                    25                  6.25
Cracked Case                    21                  5.25
Electrical Short                19                  4.75
Total                           400                 100.00
Chap 2-14
Pareto Diagram Example
(continued)
Step 3: Show results graphically
[Figure: "Pareto Diagram: Cause of Manufacturing Defect". Bars show the % of defects in each category (left axis, 0% to 60%) in the order Poor Alignment, Paint Flaw, Bad Weld, Missing Part, Cracked Case, Electrical Short; a line graph shows the cumulative % (right axis, 0% to 100%).]
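The three steps of the Pareto example (sort, compute each percentage, accumulate) can be sketched in a few lines of Python using the defect counts from the example; the cumulative column is what the line graph plots.

```python
defects = {
    "Bad Weld": 34, "Poor Alignment": 223, "Missing Part": 25,
    "Paint Flaw": 78, "Electrical Short": 19, "Cracked Case": 21,
}

def pareto_table(counts):
    """Rows of (cause, count, % of total, cumulative %), largest count first."""
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    # Step 1: sort causes by number of defects, descending.
    for cause, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        pct = 100 * n / total  # Step 2: % of total defects
        cumulative += pct      # running total for the cumulative line graph
        rows.append((cause, n, round(pct, 2), round(cumulative, 2)))
    return rows

rows = pareto_table(defects)
for cause, n, pct, cum in rows:
    print(f"{cause:<18}{n:>5}{pct:>8.2f}{cum:>8.2f}")
```

The cumulative column shows the "vital few" directly: the first two causes, Poor Alignment and Paint Flaw, account for 75.25% of all defects.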
Chap 2-15
Graphs for Time-Series Data

A line chart (time-series plot) is used to show
the values of a variable over time

Time is measured on the horizontal axis

The variable of interest is measured on the
vertical axis
Chap 2-16
Line Chart Example
[Figure: line chart "Magazine Subscriptions by Year". Vertical axis: thousands of subscribers (0 to 350); horizontal axis: year (1990 to 2006).]
Chap 2-17
Graphs to Describe Numerical Variables
Numerical Data
  Frequency Distributions and Cumulative Distributions
    Histogram
Chap 2-18
Histogram

A graph of the data in a frequency distribution
is called a histogram

The interval endpoints are shown on the
horizontal axis

The vertical axis shows frequency, relative
frequency, or percentage

Bars of the appropriate heights are used to
represent the number of observations within
each class
Chap 2-19
Histogram Example
Interval                 Frequency
10 but less than 20      3
20 but less than 30      6
30 but less than 40      5
40 but less than 50      4
50 but less than 60      2

[Figure: histogram "Daily High Temperature". Vertical axis: frequency (0 to 7); horizontal axis: temperature in degrees (0 to 70). There are no gaps between bars.]
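Building the frequency distribution behind a histogram means assigning each observation to a class that includes its lower endpoint but not its upper endpoint, as in "10 but less than 20". The sketch below does this, then adds relative frequency and cumulative percentage. The 20 temperatures are hypothetical readings chosen so that the counts match the frequency column above; they are not the original data.

```python
# Hypothetical daily high temperatures (not the original data set).
temps = [25, 41, 12, 33, 57, 21, 35, 46, 19, 28,
         39, 22, 31, 44, 15, 26, 37, 52, 29, 48]

def frequency_distribution(data, lower, width, n_classes):
    """Count observations in classes [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - lower) // width)  # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[k] += 1
    return counts

freq = frequency_distribution(temps, lower=10, width=10, n_classes=5)

n = len(temps)
running = 0
cumulative_pct = []
for k, f in enumerate(freq):
    running += f
    cumulative_pct.append(100 * running / n)
    low = 10 + 10 * k
    print(f"{low} but less than {low + 10}: frequency {f}, "
          f"relative frequency {f / n:.2f}, cumulative {cumulative_pct[-1]:.0f}%")
```

A value exactly on an endpoint, such as 20, goes into the higher class ("20 but less than 30"), so no observation is counted twice.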
Chap 2-20
Histograms in Excel
Step 1: Select Tools/Data Analysis
Chap 2-21
Histograms in Excel
(continued)
Step 2: Choose Histogram
Step 3: Input the data range and bin range (the bin range is a cell range containing the upper interval endpoints for each class grouping)
Select Chart Output and click “OK”
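Because the bin range lists only upper endpoints, the Excel Histogram tool uses a different boundary rule from "10 but less than 20". As we understand Microsoft's documentation, a value is counted in the first bin whose upper endpoint is greater than or equal to it, and values above the last endpoint go into a final "More" bin. The sketch below mimics that convention with Python's bisect module so the two rules can be compared; it illustrates the rule and is not Excel's own code.

```python
from bisect import bisect_left

def excel_style_counts(data, upper_endpoints):
    """Count values per bin given upper endpoints (assumed Excel ToolPak rule).

    A value goes in the first bin whose upper endpoint is >= the value;
    values above the last endpoint go in a final "More" bin.
    """
    edges = sorted(upper_endpoints)
    counts = [0] * (len(edges) + 1)  # the last slot is "More"
    for x in data:
        counts[bisect_left(edges, x)] += 1  # index of first edge >= x
    return counts

demo = excel_style_counts([10, 15, 20, 25, 30, 61], [20, 30, 40, 50, 60])
print(demo)
```

Note that 20 and 30 fall in the lower bin here, whereas under the "a but less than b" convention each would start the next class. When comparing Excel output with a hand-built table, check which rule was used.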
Chap 2-22
Questions for Grouping Data into Intervals
1. How wide should each interval be? (How many classes should be used?)
2. How should the endpoints of the intervals be determined?
• Often answered by trial and error, subject to user judgment
• The goal is to create a distribution that is neither too "jagged" nor too "blocky"
• The goal is to appropriately show the pattern of variation in the data
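One common starting point for question 1 (a rule of thumb we are assuming here, not one these slides prescribe) is to choose a number of classes and set the width to (largest value - smallest value) / number of classes, rounded up to a convenient unit. Trying several class counts is then the "trial and error" step. The readings below are hypothetical.

```python
import math

def class_width(data, n_classes, unit=1):
    """Rough interval width: range / number of classes, rounded up to a multiple of `unit`."""
    raw = (max(data) - min(data)) / n_classes
    return math.ceil(raw / unit) * unit

# Hypothetical daily high temperatures (range 57 - 12 = 45).
temps = [25, 41, 12, 33, 57, 21, 35, 46, 19, 28,
         39, 22, 31, 44, 15, 26, 37, 52, 29, 48]

# Try a few class counts and compare the resulting widths.
for k in (2, 5, 12):
    print(k, "classes -> width", class_width(temps, k, unit=5))
```

Rounding up ensures that the classes together span at least the full range; where to place the first lower endpoint (question 2) still needs judgment, for example starting at a round number just below the smallest value.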
Chap 2-23
How Many Class Intervals?
Many (narrow class intervals)
• May yield a very jagged distribution with gaps from empty classes
• Can give a poor indication of how frequency varies across classes
Few (wide class intervals)
• May compress variation too much and yield a blocky distribution
• Can obscure important patterns of variation
[Figure: two histograms of the same temperature data. With narrow classes (upper class endpoints 0, 4, 8, ..., 56, More) the histogram is jagged; with wide classes (upper class endpoints 0, 30, 60, More) it is blocky. Vertical axes: frequency.]
(X axis labels are upper class endpoints)
Chap 2-24
Distribution Shape
The shape of the distribution is said to be
symmetric if the observations are balanced,
or evenly distributed, about the center.
[Figure: histogram "Symmetric Distribution". Vertical axis: frequency (0 to 10); horizontal axis: values 1 to 9.]
Chap 2-25
Distribution Shape
(continued)

The shape of the distribution is said to be
skewed if the observations are not
symmetrically distributed around the center.
Positively Skewed Distribution
A positively skewed distribution (skewed to the right) has a tail that extends to the right, in the direction of positive values.
[Figure: histogram "Positively Skewed Distribution". Vertical axis: frequency (0 to 12).]
Negatively Skewed Distribution
A negatively skewed distribution (skewed to the left) has a tail that extends to the left, in the direction of negative values.
[Figure: histogram "Negatively Skewed Distribution". Vertical axis: frequency (0 to 12).]
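The slides judge skewness by eye. A common numerical companion, which is not part of these slides, is the moment coefficient of skewness: the average cubed deviation from the mean divided by the cubed standard deviation. It is positive for a right tail and negative for a left tail, and the mean is typically pulled toward the tail relative to the median. The two small data sets below are hypothetical.

```python
from statistics import mean, median, pstdev

def moment_skewness(data):
    """Average cubed deviation divided by the cubed (population) standard deviation."""
    m, s = mean(data), pstdev(data)
    return sum((x - m) ** 3 for x in data) / (len(data) * s ** 3)

right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # hypothetical: long tail to the right
left_tailed = [10 - x for x in right_tailed]  # mirror image: long tail to the left

for name, data in (("right-tailed", right_tailed), ("left-tailed", left_tailed)):
    print(f"{name}: skewness {moment_skewness(data):+.3f}, "
          f"mean {mean(data):.1f}, median {median(data):.1f}")
```

For the right-tailed data the single large value 9 dominates the cubed deviations, giving a positive coefficient and a mean (3.6) above the median (3.0).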
Chap 2-26
Relationships Between Variables
• Graphs illustrated so far have involved only a single variable
• When two variables exist, other techniques are used:
  - Categorical (qualitative) variables: cross tables
  - Numerical (quantitative) variables: scatter plots
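A cross table counts how often each combination of two categorical values occurs. The sketch below builds one from paired observations with collections.Counter; the investor and investment-type pairs are hypothetical.

```python
from collections import Counter

# Hypothetical paired categorical observations: (investor, investment type).
observations = [
    ("A", "Stocks"), ("A", "Bonds"), ("B", "Stocks"), ("B", "Savings"),
    ("A", "Stocks"), ("C", "CD"), ("C", "Savings"), ("B", "Stocks"),
]

def cross_table(pairs):
    """Return row labels, column labels, and a dict of dicts of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}  # missing pairs count 0
    return rows, cols, table

rows, cols, table = cross_table(observations)
print(" " * 4 + "".join(f"{c:>9}" for c in cols))
for r in rows:
    print(f"{r:<4}" + "".join(f"{table[r][c]:>9}" for c in cols))
```

Each cell holds the count for one row category and one column category, so the cells add up to the number of paired observations.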
Chap 2-27
Scatter Diagrams

• Scatter diagrams are used for paired observations taken from two numerical variables
• In a scatter diagram, one variable is measured on the vertical axis and the other variable is measured on the horizontal axis
Chap 2-28
Scatter Diagram Example
Volume per Day   Cost per Day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Figure: scatter diagram "Cost per Day vs. Production Volume". Vertical axis: cost per day (0 to 250); horizontal axis: volume per day (0 to 70).]
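A scatter diagram places one point per (x, y) pair. As a plotting-library-free sketch, the code below marks each (volume, cost) pair from the table with "*" on a character grid; the 35 x 10 grid size is an arbitrary choice of ours.

```python
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]         # volume per day (x)
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]  # cost per day (y)

def text_scatter(xs, ys, x_max, y_max, cols=35, rows=10):
    """Mark each (x, y) pair with '*' on a character grid; larger y is higher up."""
    grid = [[" "] * (cols + 1) for _ in range(rows + 1)]
    for x, y in zip(xs, ys):
        col = round(x * cols / x_max)
        row = rows - round(y * rows / y_max)  # row 0 is the top line
        grid[row][col] = "*"
    body = ["|" + "".join(cells) for cells in grid]  # "|" is the vertical axis
    return body + ["+" + "-" * (cols + 1)]           # horizontal axis

plot = text_scatter(volume, cost, x_max=70, y_max=250)
print("\n".join(plot))
```

The points rise from lower left to upper right, the same upward pattern the scatter diagram shows: higher daily volume goes with higher daily cost.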
Chap 2-29
Scatter Diagrams in Excel
Step 1: Select the Chart Wizard
Step 2: Select the XY (Scatter) option, then click “Next”
Step 3: When prompted, enter the data range, desired legend, and desired destination to complete the scatter diagram
Chap 2-30
Graphing Multivariate Categorical Data
• Side-by-side bar charts
[Figure: side-by-side bar chart "Comparing Investors", with bars for Investor A, Investor B, and Investor C in each of the categories Savings, CD, Bonds, and Stocks; value axis 0 to 60.]
Chap 2-31
Side-by-Side Chart Example

Sales by quarter for three sales territories:

         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      59        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9

[Figure: side-by-side bar chart of these sales, grouped by quarter (1st Qtr to 4th Qtr) with one bar each for East, West, and North; vertical axis 0 to 60.]
Chap 2-32
Data Presentation Errors
Goals for effective data presentation:

Present data to display essential information

Communicate complex ideas clearly and
accurately

Avoid distortion that might convey the wrong
message
Chap 2-33
Data Presentation Errors
(continued)

Unequal histogram interval widths

Compressing or distorting the
vertical axis

Providing no zero point on the
vertical axis

Failing to provide a relative basis
in comparing data between
groups
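Failing to provide a relative basis can reverse a comparison. In the hypothetical example below, one plant reports twice as many defects as another, yet its defect rate is half as large once each count is divided by the number of units produced.

```python
# Hypothetical counts: defects at two plants with very different output.
plants = {
    "Plant A": {"defects": 120, "units": 40_000},
    "Plant B": {"defects": 60, "units": 10_000},
}

# Relative basis: defects as a percentage of units produced.
rates = {name: 100 * d["defects"] / d["units"] for name, d in plants.items()}

for name, d in plants.items():
    print(f"{name}: {d['defects']} defects, {rates[name]:.2f}% of units")
```

Plotting the raw counts would suggest Plant A has the bigger problem; plotting the rates (0.30% versus 0.60%) shows the opposite.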
Chap 2-34
Chapter Summary


• Reviewed types of data and measurement levels
• Data in raw form are usually not easy to use for decision making; some type of organization is needed:
  - Table
  - Graph
• Techniques reviewed in this chapter:
  - Frequency distribution, bar chart, pie chart, Pareto diagram
  - Line chart, frequency distribution, histogram, scatter plot, side-by-side bar charts
Chap 2-35
Which of the following variables is an example of a
categorical variable?
A. The amount of money you spend on eating out each
month.
B. The time it takes you to write a test.
C. The geographic region of the country in which you live.

D. The weight of a cereal box.
Chap 2-36

The data in the time-series plot below represent monthly
sales for two years of beanbag animals at a local retail
store (Month 1 represents January and Month 12
represents December). Do you see any obvious patterns
in the data? Explain.

This is a representation of seasonal data. There seems
to be a small increase in months 3, 4, and 5 and a large
increase at the end of the year. The sales of this item
seem to peak in December and have a significant drop off
in January.
Chap 2-37
At a large company, the majority of the employees earn
from $20,000 to $30,000 per year. Middle management
employees earn between $30,000 and $50,000 per year
while top management earn between $50,000 and
$100,000 per year. A histogram of all salaries would
have which of the following shapes?
a. Symmetrical
b. Uniform
c. Skewed to right
d. Skewed to left

Chap 2-38