GRAPHICAL METHODS FOR QUANTITATIVE DATA


Chapter 3
Graphical and Numerical
Summaries of Data





UNIT OBJECTIVES
At the conclusion of this unit you should be able to:
1) Construct graphs that appropriately describe
data.
2) Calculate and interpret numerical summaries of a
data set.
3) Combine numerical methods with graphical
methods to analyze a data set.
4) Apply graphical methods of summarizing data to
choose appropriate numerical summaries.
5) Apply software and/or calculators to automate
graphical and numerical summary procedures.
Displaying Qualitative Data
“Sometimes you can see a lot just
by looking.”
Yogi Berra
Hall of Fame Catcher, NY Yankees
The three rules of data analysis
won’t be difficult to remember



1. Make a picture —reveals aspects not obvious in
the raw data; enables you to think clearly about the
patterns and relationships that may be hiding in your
data.
2. Make a picture —to show important features of
and patterns in the data. You may also see things
that you did not expect: the extraordinary (possibly
wrong) data values or unexpected patterns.
3. Make a picture —the best way to tell others
about your data is with a well-chosen picture.
Bar Charts: show counts
or relative frequency for
each category

Example: Titanic passenger/crew distribution
[Figure: bar chart "Titanic Passengers by Class"; vertical axis 0 to 1000. Bar heights: Crew 885, First 325, Second 285, Third 706.]
Pie Charts: show
proportions of the
whole in each category

Example: Titanic passenger/crew
distribution
[Figure: pie chart "Titanic Passengers by Class". Slices: Crew 40%, First 15%, Second 13%, Third 32%.]
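The pie-chart percentages are the bar-chart counts expressed as shares of the whole. A minimal Python sketch of that conversion follows; the counts come from the Titanic example, while the function name `relative_frequencies` is only an illustrative choice.

```python
# Counts per category, taken from the Titanic example in this chapter.
counts = {"Crew": 885, "First": 325, "Second": 285, "Third": 706}

def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

percents = relative_frequencies(counts)
for category, pct in percents.items():
    # One decimal place; rounding to whole percents gives the pie-chart labels.
    print(f"{category:<7}{pct:5.1f}%")
```

A bar chart can display either `counts` or `percents`; a pie chart only makes sense for `percents`, because its slices must make up one whole.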
Example: Top 10 causes of death in the United States, 2001

Rank  Cause of death        Count     % of top 10  % of total deaths
1     Heart disease         700,142   37%          28%
2     Cancer                553,768   29%          22%
3     Cerebrovascular       163,538   9%           6%
4     Chronic respiratory   123,013   6%           5%
5     Accidents             101,537   5%           4%
6     Diabetes mellitus     71,372    4%           3%
7     Flu and pneumonia     62,034    3%           2%
8     Alzheimer’s disease   53,852    3%           2%
9     Kidney disorders      39,480    2%           2%
10    Septicemia            32,238    2%           1%
      All other causes      629,967                25%
For each individual who died in the United States in 2001, the cause of death was
recorded. The table above summarizes that information.
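The two percentage columns use different denominators: the top-10 total and the total over all causes. The sketch below recomputes both from the counts in the table; the variable names are illustrative, not part of the original material.

```python
# Counts from the 2001 cause-of-death table in this chapter.
top10 = {
    "Heart disease": 700_142,
    "Cancer": 553_768,
    "Cerebrovascular": 163_538,
    "Chronic respiratory": 123_013,
    "Accidents": 101_537,
    "Diabetes mellitus": 71_372,
    "Flu and pneumonia": 62_034,
    "Alzheimer's disease": 53_852,
    "Kidney disorders": 39_480,
    "Septicemia": 32_238,
}
all_other = 629_967

top10_total = sum(top10.values())        # denominator for "% of top 10"
grand_total = top10_total + all_other    # denominator for "% of total deaths"

for cause, n in top10.items():
    pct_top10 = 100 * n / top10_total
    pct_all = 100 * n / grand_total
    print(f"{cause:<22}{n:>9,}{pct_top10:6.0f}%{pct_all:6.0f}%")
```

Heart disease, for example, is about 37% of the top-10 deaths but about 28% of all deaths, because the second denominator also includes the 629,967 deaths from other causes.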
Top 10 causes of death: bar graph
Top 10 causes of deaths in the United States 2001
The number of individuals
who died of an accident in
2001 is approximately
100,000.
[Figure: bar graph with one bar per cause (Heart diseases, Cancers, Cerebrovascular, Chronic respiratory, Accidents, Diabetes mellitus, Flu & pneumonia, Alzheimer's disease, Kidney disorders, Septicemia); vertical axis: Counts (x1000), 0 to 800.]
Each category is represented by one bar. The bar’s height shows the count (or
sometimes the percentage) for that particular category.
[Figure: the same bar graph drawn twice, once with the causes sorted alphabetically and once sorted by rank; vertical axis: Counts (x1000), 0 to 800.]
Top 10 causes of deaths in the United
States 2001
Bar graph sorted by rank
-> Easy to analyze
Sorted alphabetically
-> Much less useful
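The effect of bar order can be tried out with a rough text-mode bar chart. This is only a sketch: the counts are those of the 2001 table, while `text_bars` and its scale of one `#` per 20,000 deaths are illustrative choices.

```python
# Counts from the 2001 cause-of-death table in this chapter.
causes = {
    "Heart disease": 700_142,
    "Cancer": 553_768,
    "Cerebrovascular": 163_538,
    "Chronic respiratory": 123_013,
    "Accidents": 101_537,
    "Diabetes mellitus": 71_372,
    "Flu and pneumonia": 62_034,
    "Alzheimer's disease": 53_852,
    "Kidney disorders": 39_480,
    "Septicemia": 32_238,
}

# Two orderings of the same data.
by_rank = sorted(causes.items(), key=lambda item: item[1], reverse=True)
by_name = sorted(causes.items(), key=lambda item: item[0])

def text_bars(rows, scale=20_000):
    """Render one line per category; each '#' stands for `scale` deaths."""
    return [f"{name:<22}{'#' * round(count / scale)}" for name, count in rows]

print("Sorted by rank:")
print("\n".join(text_bars(by_rank)))
print("\nSorted alphabetically:")
print("\n".join(text_bars(by_name)))
```

In the ranked version the bars shrink steadily from top to bottom, so the relative importance of each cause can be read at a glance; in the alphabetical version the reader has to compare bar lengths one by one.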
Computer Hardware Sales 2009 ($billions)
1. United States $158
2. China $64.4
3. Japan $54
4. Germany $24.4
5. Britain $23.5
6. France $19.3
7. Brazil $14.2
8. Italy $13.1
9. Australia $12.8
10. India $11.9
NY Times
Software Sales 2009 ($billions)
1. United States $137.9
2. Japan $23.4
3. Germany $20
4. Britain $16.8
5. France $12.6
6. Canada $7.3
7. Italy $6.3
8. China $5.4
9. Netherlands $5.4
10. Australia $4.8
Top 10 causes of death: pie chart
Each slice represents a piece of one whole. The size of a slice depends on what
percent of the whole this category represents.
[Figure: two pie charts of the percent of people dying from the top 10 causes of death in the United States in 2001. One shows the percent of deaths from the top 10 causes only; the other shows the percent of deaths from all causes.]
Make sure your labels match the data.
Make sure all percents add up to 100.
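The "add up to 100" rule is easy to check before drawing a pie chart. The sketch below uses the rounded percentages from the cause-of-death table; the helper `adds_to_100` and its one-point tolerance for rounding are assumptions made for illustration.

```python
# Rounded percentages from the 2001 cause-of-death table in this chapter.
pct_of_top10 = [37, 29, 9, 6, 5, 4, 3, 3, 2, 2]
# Same ten causes as a share of all deaths, plus "All other causes" (25%).
pct_of_total = [28, 22, 6, 5, 4, 3, 2, 2, 2, 1, 25]

def adds_to_100(percents, tolerance=1.0):
    """True if the slices make up one whole, allowing for rounding error."""
    return abs(sum(percents) - 100) <= tolerance

print(adds_to_100(pct_of_top10))        # slices of the top-10 pie
print(adds_to_100(pct_of_total))        # slices of the all-causes pie
print(adds_to_100(pct_of_total[:-1]))   # all-causes pie without "other"
```

The last call fails the check: the ten named causes cover only 75% of all deaths, so a pie labeled "percent of deaths from all causes" needs the "All other causes" slice to be complete.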
Average Student Debt by State, 2010 Class
[Figure: horizontal bar chart with one bar per state; horizontal axis from $0 to $35,000 in steps of $5,000. States in the order listed on the chart:]
New Hampshire, Maine, Iowa, Minnesota, Pennsylvania, Vermont, Ohio, Indiana,
Rhode Island, New York, Michigan, Massachusetts, Connecticut, Alabama, Wisconsin,
Louisiana, DC, Idaho, Oregon, Illinois, New Jersey, West Virginia, South Carolina,
Virginia, South Dakota, Montana, Alaska, Missouri, Kansas, Mississippi, Washington,
Colorado, Maryland, Delaware, Arkansas, Nebraska, Florida, North Carolina, Texas,
Oklahoma, Wyoming, Tennessee, Kentucky, Georgia, Arizona, California, Nevada,
New Mexico, Hawaii, Utah
Student Debt North Carolina Schools

North Carolina Private Schools, 2010 Class
[Figure: horizontal bar chart with two bars per school, "Average debt of graduates" and "Tuition and fees (in-state)"; dollar axis with ticks at 0, 20000 and 40000. Schools in the order listed on the chart:]
Campbell University Inc, New Life Theological Seminary, Meredith College,
Mid-Atlantic Christian University, Wake Forest University, Methodist University,
Johnson C Smith University, Chowan University, Catawba College, Mars Hill College,
Elon University, Wingate University, Lenoir-Rhyne University, Davidson College,
St Andrews Presbyterian…, Duke University, Belmont Abbey College,
Mean North Carolina - 4-year…, Brevard College, Warren Wilson College,
Mount Olive College, Salem College, Saint Augustines College, High Point University

North Carolina Public Schools, 2010 Class
[Figure: horizontal bar chart with two bars per school, "Average debt of graduates" and "Tuition and fees (in-state)"; dollar axis from 0 to 25000 in steps of 5000. Schools in the order listed on the chart:]
UNC Greensboro, UNC School of the Arts, NC A & T,
Mean North Carolina - 4-year or above, NCSU, UNC-Wilmington, UNC Charlotte, ECU,
Appalachian, UNC Asheville, Elizabeth City
Child poverty before and after
government intervention—UNICEF,
1996
What does this chart tell you?
• The United States has the highest rate of child
poverty among developed nations (22% of those
under 18).
• Its government does almost the least, through
taxes and subsidies, to remedy the problem
(size of orange bars and percent difference
between orange/blue bars).
Could you transform this bar graph to fit in 1 pie
chart? In two pie charts? Why?
The poverty line is defined as 50% of national median income.
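The definition of the poverty line can be made concrete with a small calculation. The sketch below is illustrative only: the income list is made up, and the function names and the choice to count incomes strictly below the line are assumptions, not taken from the UNICEF report.

```python
from statistics import median

def poverty_line(incomes):
    """Poverty line as defined on the slide: 50% of the median income."""
    return 0.5 * median(incomes)

def poverty_rate(incomes):
    """Percent of incomes strictly below the poverty line (an assumption)."""
    line = poverty_line(incomes)
    below = sum(1 for income in incomes if income < line)
    return 100 * below / len(incomes)

# Hypothetical incomes, invented for this example.
incomes = [12_000, 18_000, 25_000, 40_000, 44_000,
           52_000, 61_000, 75_000, 90_000, 150_000]

print(poverty_line(incomes))   # half of the median (48,000) is 24,000
print(poverty_rate(incomes))   # two of the ten incomes fall below the line
```

Because the line is tied to the median rather than to a fixed dollar amount, it measures relative poverty: it moves whenever the middle of the income distribution moves.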
Unnecessary dimension in a
pie chart
Contingency Tables:
Categories for Two
Variables

Example: Survival and class on the Titanic
Marginal distributions

                       Crew      First     Second    Third     Total   Marg. dist. of survival
Alive                  212       202       118       178       710     710/2201 = 32.3%
Dead                   673       123       167       528       1491    1491/2201 = 67.7%
Total                  885       325       285       706       2201
Marg. dist. of class   885/2201  325/2201  285/2201  706/2201
                       40.2%     14.8%     12.9%     32.1%
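Each marginal distribution comes from the row or column totals divided by the grand total. A minimal sketch of that calculation, using the Titanic counts from the table and illustrative variable names:

```python
# Titanic counts: survival status (rows) by class (columns).
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of survival: row totals over the grand total.
survival_marginal = {
    status: 100 * sum(row.values()) / grand_total
    for status, row in table.items()
}

# Marginal distribution of class: column totals over the grand total.
classes = list(table["Alive"])
class_marginal = {
    c: 100 * sum(row[c] for row in table.values()) / grand_total
    for c in classes
}

print({k: round(v, 1) for k, v in survival_marginal.items()})
print({k: round(v, 1) for k, v in class_marginal.items()})
```

Each marginal distribution ignores the other variable entirely, which is why it can be drawn as an ordinary bar chart or pie chart of a single variable.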
Marginal distribution of class.
Bar chart.
Marginal distribution of class:
Pie chart
Contingency Tables: Categories
for Two Variables (cont.)

Conditional distributions.
Given the class of a passenger, what is the
chance the passenger survived?
                       Class
Survival               Crew    First   Second  Third   Total
Alive   Count          212     202     118     178     710
        % of col.      24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count          673     123     167     528     1491
        % of col.      76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count          885     325     285     706     2201
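A conditional distribution restricts attention to one column and divides by that column's total rather than the grand total. The following sketch reproduces the "% of col." rows; the function name `column_percents` is an illustrative choice.

```python
# Titanic counts: survival status (rows) by class (columns).
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

def column_percents(table, column):
    """Distribution of survival given one class, as percentages."""
    column_total = sum(row[column] for row in table.values())
    return {status: 100 * row[column] / column_total
            for status, row in table.items()}

for c in ["Crew", "First", "Second", "Third"]:
    dist = column_percents(table, c)
    print(f"{c:<7} alive {dist['Alive']:5.1f}%   dead {dist['Dead']:5.1f}%")
```

Each class's pair of percentages adds to 100, which is what allows the four conditional distributions to be stacked side by side in a segmented bar chart.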
Conditional distributions:
segmented bar chart
Contingency Tables:
Categories for Two
Variables (cont.)
Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and
survivors? 202/2201

What fraction of the first class passengers
survived? 202/325

                       Class
Survival               Crew    First   Second  Third   Total
Alive   Count          212     202     118     178     710
        % of col.      24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count          673     123     167     528     1491
        % of col.      76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count          885     325     285     706     2201
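All three questions use the same numerator, the 202 first-class survivors; only the denominator changes. A short sketch with illustrative variable names makes the distinction explicit:

```python
# Counts read from the Titanic contingency table.
alive_and_first = 202   # first-class passengers who survived
alive_total = 710       # all survivors
first_total = 325       # all first-class passengers
grand_total = 2201      # everyone on board

# Conditional on survival: among survivors, the share in first class.
survivors_in_first = alive_and_first / alive_total

# Joint: among everyone, the share both in first class and alive.
first_and_survived = alive_and_first / grand_total

# Conditional on class: among first-class passengers, the share alive.
first_who_survived = alive_and_first / first_total

print(f"{survivors_in_first:.1%}  {first_and_survived:.1%}  {first_who_survived:.1%}")
```

The wording of the question decides the denominator: "of survivors" points to the Alive row total, "of passengers" to the grand total, and "of the first class passengers" to the First column total.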
TV viewers during the Super Bowl in 2007.
What is the marginal distribution of those
who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2007.
What percentage watched the Game and
were Female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2007.
Given that a viewer did not watch the Super
Bowl Game or Commercials, what
percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
3-Way Tables

Example: Georgia death-sentence data

Race of Defendant      Black            White
Race of Victim         Black   White    Black   White    Totals
Death Sentence: Yes    18      50       2       58       128
Death Sentence: No     1420    178      62      687      2347
Totals                 1438    228      64      745      2475
% Death Sentence       1.2     21.9     3.1     7.8
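A three-way table can be summarized cell by cell, or collapsed into a two-way table by summing over one variable. The sketch below does both with the counts above; the data layout and helper names are illustrative assumptions.

```python
# (race of defendant, race of victim) -> (death sentence yes, no)
counts = {
    ("Black", "Black"): (18, 1420),
    ("Black", "White"): (50, 178),
    ("White", "Black"): (2, 62),
    ("White", "White"): (58, 687),
}

def pct_yes(yes, no):
    """Percent of cases in a group that ended in a death sentence."""
    return 100 * yes / (yes + no)

# Rate within each cell of the three-way table.
cell_rates = {key: pct_yes(*value) for key, value in counts.items()}

# Collapse over the victim's race to get a two-way table.
by_defendant = {}
for (defendant, _victim), (yes, no) in counts.items():
    total_yes, total_no = by_defendant.get(defendant, (0, 0))
    by_defendant[defendant] = (total_yes + yes, total_no + no)
defendant_rates = {d: pct_yes(*v) for d, v in by_defendant.items()}

for key, rate in cell_rates.items():
    print(key, f"{rate:.2f}%")
for defendant, rate in defendant_rates.items():
    print(defendant, "(all victims)", f"{rate:.2f}%")
```

The first cell works out to 18/1438, about 1.25%, which the slide reports as 1.2. Comparing the cell rates with the collapsed rates shows how much of the variation is tied to the victim's race, information that disappears once that variable is summed out.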
UC Berkeley Lawsuit

                    MEN     WOMEN
No. of applicants   2691    1835
Admitted            1199    557
% admitted          44.6    30.4
LAWSUIT (cont.)

         MEN                          WOMEN
MAJOR    No. of       No.            No. of       No.
         Applicants   Admitted       Applicants   Admitted
A        825          512 (62%)      108          *89 (82%)
B        560          353 (63%)      25           *17 (68%)
C        325          120 (37%)      593          202 (34%)
D        417          138 (33%)      375          *131 (35%)
E        191          53 (28%)       393          94 (24%)
F        373          23 (6%)        341          *24 (7%)
TOTAL    2691         1199           1835         557

* marks the majors in which the women's admission rate is higher than the men's.
Simpson’s Paradox

The reversal of the direction of a
comparison or association when
data from several groups are
combined to form a single group.
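The Berkeley admissions tables show this reversal directly. The sketch below recomputes the rates by major and overall from those counts; the helper names are illustrative.

```python
# major -> (applicants, admitted), from the Berkeley tables in this chapter.
men = {"A": (825, 512), "B": (560, 353), "C": (325, 120),
       "D": (417, 138), "E": (191, 53), "F": (373, 23)}
women = {"A": (108, 89), "B": (25, 17), "C": (593, 202),
         "D": (375, 131), "E": (393, 94), "F": (341, 24)}

def rate(applicants, admitted):
    """Admission rate as a percentage."""
    return 100 * admitted / applicants

def overall(group):
    """Admission rate after combining all majors into one group."""
    applicants = sum(a for a, _ in group.values())
    admitted = sum(m for _, m in group.values())
    return rate(applicants, admitted)

# Majors in which women are admitted at a higher rate than men.
majors_favoring_women = [m for m in men if rate(*women[m]) > rate(*men[m])]

print(f"Overall: men {overall(men):.1f}%, women {overall(women):.1f}%")
print("Majors where women's rate is higher:", majors_favoring_women)
```

Women have the higher rate in four of the six majors, yet the lower rate overall. The table shows why: most women applied to majors C through F, where admission rates are low for everyone, while most men applied to A and B.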
Fly Alaska Airlines, the on-time airline!

               Alaska Airlines          American West
Destination    % Arrivals   No. of      % Arrivals   No. of
               On Time      Arrivals    On Time      Arrivals
L. A.          88.9%        559         85.6%        811
Phoenix        94.8%        233         92.1%        5,255
San Diego      91.4%        232         85.5%        448
San Fran.      83.1%        605         71.3%        449
Seattle        85.8%        2,146       76.7%        262
Total                       3,775                    7,225
American West Wins!
You’re a Hero!

               Alaska Airlines          American West
Destination    % Arrivals   No. of      % Arrivals   No. of
               On Time      Arrivals    On Time      Arrivals
L. A.          88.9%        559         85.6%        811
Phoenix        94.8%        233         92.1%        5,255
San Diego      91.4%        232         85.5%        448
San Fran.      83.1%        605         71.3%        449
Seattle        85.8%        2,146       76.7%        262
Total          86.7%        3,775       89.1%        7,225
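The overall on-time percentages are weighted averages of the city percentages, with the number of arrivals as weights. A sketch that recomputes them from the table (the function and variable names are illustrative):

```python
# destination -> (percent on time, number of arrivals), from the table.
alaska = {"L. A.": (88.9, 559), "Phoenix": (94.8, 233),
          "San Diego": (91.4, 232), "San Fran.": (83.1, 605),
          "Seattle": (85.8, 2146)}
american_west = {"L. A.": (85.6, 811), "Phoenix": (92.1, 5255),
                 "San Diego": (85.5, 448), "San Fran.": (71.3, 449),
                 "Seattle": (76.7, 262)}

def overall_on_time(airline):
    """Weighted average of city percentages, weighted by arrivals."""
    total_arrivals = sum(n for _, n in airline.values())
    weighted_sum = sum(pct * n for pct, n in airline.values())
    return weighted_sum / total_arrivals

alaska_overall = overall_on_time(alaska)
american_west_overall = overall_on_time(american_west)

# Cities in which Alaska Airlines has the higher on-time percentage.
alaska_wins = [city for city in alaska
               if alaska[city][0] > american_west[city][0]]

print(f"Alaska {alaska_overall:.1f}%, American West {american_west_overall:.1f}%")
print("Alaska is better in:", alaska_wins)
```

Alaska Airlines has the higher percentage in all five cities but the lower percentage overall. Most American West arrivals are in Phoenix, where both airlines do well, while most Alaska arrivals are in Seattle, where both do worse; the weights, not the city-level performance, drive the combined figures.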
End of Chapter 3