section 1.2 powerpoint

Chapter 1: Exploring Data
Section 1.2
Displaying Quantitative Data with Graphs
The Practice of Statistics, 4th edition - For AP*
STARNES, YATES, MOORE

Chapter 1: Exploring Data
Introduction: Data Analysis: Making Sense of Data
1.1 Analyzing Categorical Data
1.2 Displaying Quantitative Data with Graphs
1.3 Describing Quantitative Data with Numbers

Section 1.2
Displaying Quantitative Data with Graphs
Learning Objectives
After this section, you should be able to…

CONSTRUCT and INTERPRET dotplots, stemplots, and histograms

DESCRIBE the shape of a distribution

COMPARE distributions

USE histograms wisely

Displaying Quantitative Data
Dotplots

One of the simplest graphs to construct and interpret is a dotplot. Each data value is shown as a dot above its location on a number line.

How to Make a Dotplot
1) Draw a horizontal axis (a number line) and label it with the variable name.
2) Scale the axis from the minimum to the maximum value.
3) Mark a dot above the location on the horizontal axis corresponding to each data value.

Number of Goals Scored Per Game by the 2004 US Women’s Soccer Team
3, 0, 2, 7, 8, 2, 4, 3, 5, 1, 1, 4, 5, 3, 1, 1, 3,
3, 3, 2, 1, 2, 2, 2, 4, 3, 5, 6, 1, 5, 5, 1, 1, 5
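The three dotplot steps can be sketched in Python as a text-only dotplot of the goals data above. This is a minimal sketch, not part of the original slides: the helper name `text_dotplot` is hypothetical, and it assumes small whole-number values so that each axis position takes exactly one character.

```python
from collections import Counter

# Goals scored per game by the 2004 US Women's Soccer Team (34 games, as listed above)
goals = [3, 0, 2, 7, 8, 2, 4, 3, 5, 1, 1, 4, 5, 3, 1, 1, 3,
         3, 3, 2, 1, 2, 2, 2, 4, 3, 5, 6, 1, 5, 5, 1, 1, 5]


def text_dotplot(values, label):
    """Build a text dotplot: stacked 'o' marks above a number line (hypothetical helper)."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))  # step 2: scale from min to max
    rows = []
    # Step 3: one dot per observation, printed from the top row of the stacks down.
    for level in range(max(counts.values()), 0, -1):
        marks = ("o" if counts[x] >= level else " " for x in axis)
        rows.append(" ".join(marks).rstrip())
    rows.append(" ".join(str(x) for x in axis))  # step 1: the number line...
    rows.append(label)                            # ...labeled with the variable name
    return "\n".join(rows)


print(text_dotplot(goals, "Goals scored"))
```

The tallest stack sits above 1 goal (8 games), and the single games with 6, 7, and 8 goals stand apart on the right, which is the kind of thing to look for when you ask "What do I see?"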

Examining the Distribution of a Quantitative Variable

The purpose of a graph is to help us understand the data. After you make a graph, always ask, “What do I see?”

How to Examine the Distribution of a Quantitative Variable
In any graph, look for the overall pattern and for striking departures from that pattern.
Describe the overall pattern of a distribution by its:
• Shape
• Center
• Spread
Note individual values that fall outside the overall pattern. These departures are called outliers.

Don’t forget your SOCS! (Shape, Outliers, Center, Spread)

Example, page 28: Examine this data

The table and dotplot below display the Environmental Protection Agency’s estimates of highway gas mileage in miles per gallon (MPG) for a sample of 24 model year 2009 midsize cars.

2009 Fuel Economy Guide
 #  Model                  MPG
 1  Acura RL                22
 2  Audi A6 Quattro         23
 3  Bentley Arnage          14
 4  BMW 528i                28
 5  Buick Lacrosse          28
 6  Cadillac CTS            25
 7  Chevrolet Malibu        33
 8  Chrysler Sebring        30
 9  Dodge Avenger           30
10  Hyundai Elantra         33
11  Jaguar XF               25
12  Kia Optima              32
13  Lexus GS 350            26
14  Lincoln MKZ             28
15  Mazda 6                 29
16  Mercedes-Benz E350      24
17  Mercury Milan           29
18  Mitsubishi Galant       27
19  Nissan Maxima           26
20  Rolls Royce Phantom     18
21  Saturn Aura             33
22  Toyota Camry            31
23  Volkswagen Passat       29
24  Volvo S80               25

Describe the shape, center, and spread of the distribution. Are there any outliers?
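The slide leaves the description to you. As a starting point, this sketch sorts the 24 MPG values (as read from the table) and reports the middle value and the range; judging shape and deciding on outliers still means looking at the dotplot. It relies on Python's standard `statistics.median`, which averages the two middle values when the number of observations is even.

```python
from statistics import median

# Highway MPG for the 24 model year 2009 midsize cars, as read from the table
mpg = {
    "Acura RL": 22, "Audi A6 Quattro": 23, "Bentley Arnage": 14, "BMW 528i": 28,
    "Buick Lacrosse": 28, "Cadillac CTS": 25, "Chevrolet Malibu": 33,
    "Chrysler Sebring": 30, "Dodge Avenger": 30, "Hyundai Elantra": 33,
    "Jaguar XF": 25, "Kia Optima": 32, "Lexus GS 350": 26, "Lincoln MKZ": 28,
    "Mazda 6": 29, "Mercedes-Benz E350": 24, "Mercury Milan": 29,
    "Mitsubishi Galant": 27, "Nissan Maxima": 26, "Rolls Royce Phantom": 18,
    "Saturn Aura": 33, "Toyota Camry": 31, "Volkswagen Passat": 29, "Volvo S80": 25,
}

values = sorted(mpg.values())
center = median(values)             # average of the 12th and 13th values (n = 24 is even)
spread = max(values) - min(values)  # range = maximum - minimum

print("Sorted MPG:", values)
print("Median:", center, "MPG")
print("Range:", spread, "MPG")
# The lowest values are worth a second look on the dotplot when checking for outliers.
for car in sorted(mpg, key=mpg.get)[:2]:
    print("Low value:", car, mpg[car])
```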

Alternate Example: Examine this data

Here is the estimated battery life (in minutes) for each of 9 different smart phones, according to http://cellphones.toptenreviews.com/smartphones/.

Smart Phone          Battery Life (minutes)
Apple iPhone         300
Motorola Droid       385
Palm Pre             300
Blackberry Bold      360
Blackberry Storm     330
Motorola Cliq        360
Samsung Moment       330
Blackberry Tour      300
HTC Droid            460

[Dotplot: Battery Life (minutes), scale 300 to 460]

Describe the shape, center, and spread of the distribution. Are there any outliers?

Solution: Shape: There is a peak at 300 and the distribution has a long tail to the right (skewed to the right). Center: The middle value is 330 minutes. Spread: The range is 460 – 300 = 160 minutes. Outliers: There is one phone with an unusually long battery life, the HTC Droid at 460 minutes.

Describing Shape

When you describe a distribution’s shape, concentrate on the main features. Look for rough symmetry or clear skewness.

Definitions:
A distribution is roughly symmetric if the right and left sides of the graph are approximately mirror images of each other.
A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.

[Figure: sketches of symmetric, skewed-left, and skewed-right distributions]
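One rough numeric check of these definitions, using the battery-life data from the alternate example, is to compare how far the minimum and maximum sit from the middle value. This heuristic (and the helper name `tail_lengths`) is our own assumption for illustration, not a formal definition from the slides; the slides judge shape by eye, and a single outlier can stretch one tail on its own.

```python
from statistics import median

# Battery life (minutes) for the 9 smart phones in the alternate example
battery = [300, 385, 300, 360, 330, 360, 330, 300, 460]


def tail_lengths(values):
    """Return (median - minimum, maximum - median), a rough gauge of each tail's length."""
    m = median(values)
    return m - min(values), max(values) - m


left, right = tail_lengths(battery)
print(f"Left tail: {left} minutes, right tail: {right} minutes")
# A right tail much longer than the left is consistent with right skew.
```

Here the right tail (130 minutes) is more than four times the left tail (30 minutes), which agrees with the "skewed to the right" description in the solution.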

Alternate Example

Here are dotplots for two different sets of quantitative data.

[Figure: dotplot of DieRoll (scale 0 to 10) and dotplot of calories (scale 140 to 220)]

The first dotplot shows the outcomes of 100 rolls of a 10-sided die. This distribution is roughly symmetric with no obvious modes. Don’t worry about the small differences in the number of dots for each die roll; this is bound to happen just by chance even if the frequencies should be the same. A distribution with this shape can be called “approximately uniform.” The second dotplot shows the number of calories in one serving of whole wheat or multigrain bread. This distribution is skewed to the left with a peak at 220 calories.

Comparing Distributions

• Some of the most interesting statistics questions involve comparing two or more groups.
• Always discuss shape, center, spread, and possible outliers whenever you compare distributions of a quantitative variable.

Example, page 32
Compare the distributions of household size for these two countries. Don’t forget your SOCS!

[Figure: dotplots of household size for the U.K. and South Africa]

Alternate Example

How do the annual energy costs (in dollars) compare for refrigerators with top freezers and refrigerators with bottom freezers? The data below is from the May 2010 issue of Consumer Reports.

[Figure: Dotplot of EnergyCost vs Type, with rows for bottom and top freezers and an EnergyCost axis from 56 to 140]

Compare the distributions of annual energy costs for these two types of refrigerators. Don’t forget your SOCS!

Solution: Shape: The distribution for bottom freezers looks skewed to the right and possibly bimodal, with modes near $58 and $70 per year. The distribution for top freezers looks roughly symmetric with its main peak centered around $55. Center: The typical energy cost for the bottom freezers is higher than the typical cost for the top freezers (median of $69 vs. median of $55). Spread: There is much more variability in the energy costs for bottom freezers, since the range is $101 compared to $17 for the top freezers. Outliers: There are a couple of bottom freezers with unusually high energy costs (over $140 per year). There are no outliers for the top freezers.

Stemplots (Stem-and-Leaf Plots)

Another simple graphical display for small data sets is a stemplot. Stemplots give us a quick picture of the distribution while including the actual numerical values.

How to Make a Stemplot
1) Separate each observation into a stem (all but the final digit) and a leaf (the final digit).
2) Write all possible stems from the smallest to the largest in a vertical column and draw a vertical line to the right of the column.
3) Write each leaf in the row to the right of its stem.
4) Arrange the leaves in increasing order out from the stem.
5) Provide a key that explains in context what the stems and leaves represent.

Stemplots (Stem-and-Leaf Plots)

These data represent the responses of 20 female AP Statistics students to the question, “How many pairs of shoes do you have?” Construct a stemplot.

50  26  26  31  57  19  24  22  23  38
13  50  13  34  23  30  49  13  15  51

Stems     Add leaves     Order leaves
1 |       1 | 93335      1 | 33359
2 |       2 | 664233     2 | 233466
3 |       3 | 1840       3 | 0148
4 |       4 | 9          4 | 9
5 |       5 | 0701       5 | 0017

Add a key
Key: 4|9 represents a female student who reported having 49 pairs of shoes.
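The five stemplot steps, applied to the shoe data, can be sketched in Python as follows. It assumes non-negative whole numbers with the last digit as the leaf; `stemplot` is a hypothetical helper written for this example, not a library function.

```python
from collections import defaultdict

# Pairs of shoes reported by 20 female AP Statistics students (from the slide)
shoes = [50, 26, 26, 31, 57, 19, 24, 22, 23, 38,
         13, 50, 13, 34, 23, 30, 49, 13, 15, 51]


def stemplot(values):
    """Return stemplot rows for non-negative whole numbers (hypothetical helper)."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # step 1: stem = all but the last digit, leaf = last digit
        leaves[stem].append(leaf)
    rows = []
    # Step 2: list every stem from smallest to largest, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        # Steps 3-4: write the leaves beside the stem in increasing order.
        ordered = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        rows.append(f"{stem} | {ordered}".rstrip())
    return rows


for row in stemplot(shoes):
    print(row)
# Step 5: the key, in context.
print("Key: 4|9 represents a female student who reported having 49 pairs of shoes.")
```

The rows it prints match the "Order leaves" column on the slide.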

Splitting Stems and Back-to-Back Stemplots

When data values are “bunched up”, we can get a better picture of the distribution by splitting stems.

Two distributions of the same quantitative variable can be compared using a back-to-back stemplot with common stems.

Females: 50, 26, 26, 31, 57, 19, 24, 22, 23, 38, 13, 50, 13, 34, 23, 30, 49, 13, 15, 51
Males: 14, 7, 6, 5, 12, 38, 8, 7, 10, 10, 10, 11, 4, 5, 22, 7, 5, 10, 35, 7

Back-to-back stemplot with “split stems”:

Females      Males
     | 0 | 4
     | 0 | 555677778
 333 | 1 | 0000124
  95 | 1 |
4332 | 2 | 2
  66 | 2 |
 410 | 3 |
   8 | 3 | 58
     | 4 |
   9 | 4 |
 100 | 5 |
   7 | 5 |

Key: 4|9 represents a student who reported having 49 pairs of shoes.

Alternate Example: Who’s Taller?

Which gender is taller, males or females? A sample of 14-year-olds from the United Kingdom was randomly selected using the CensusAtSchool website. Here are the heights of the students (in cm):

Male: 154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175, 174, 165, 165, 183, 180
Female: 160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158, 153, 161, 165, 165, 159, 168, 153, 166, 158, 158, 166

Here are two stemplots for male heights, one with split stems:

Male               Male (split stems)
15 | 1479          15 | 14
16 | 235579        15 | 79
17 | 4567          16 | 23
18 | 037           16 | 5579
                   17 | 4
                   17 | 567
                   18 | 03
                   18 | 7

Key: 15|1 represents a male who is 151 cm tall.

Here is a back-to-back stemplot comparing male and female heights:

Female            Male
     332 | 15 | 14
   98887 | 15 | 79
  433100 | 16 | 23
99876655 | 16 | 5579
         | 17 | 4
         | 17 | 567
         | 18 | 03
         | 18 | 7
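Split stems and a back-to-back stemplot can be sketched by tagging each leaf with its stem and its half (leaves 0 to 4 versus leaves 5 to 9). This is a minimal sketch with hypothetical helper names (`split_leaves`, `back_to_back`); it assumes both lists are non-empty whole numbers, and it prints the left-hand leaves so they increase reading outward from the stem, the same convention as the slide's back-to-back stemplot.

```python
from collections import defaultdict

# Heights (cm) of the UK 14-year-olds in the Who's Taller? example
male = [154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175,
        174, 165, 165, 183, 180]
female = [160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158,
          153, 161, 165, 165, 159, 168, 153, 166, 158, 158, 166]


def split_leaves(values):
    """Group leaves by (stem, half): half 0 holds leaves 0-4, half 1 holds leaves 5-9."""
    groups = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        groups[(stem, leaf // 5)].append(leaf)
    return groups


def back_to_back(left, right):
    """Rows of a back-to-back stemplot with split stems (hypothetical helper)."""
    lg, rg = split_leaves(left), split_leaves(right)
    stems = [stem for stem, _ in list(lg) + list(rg)]
    width = max(len(leaves) for leaves in lg.values())  # widest left-hand leaf string
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        for half in (0, 1):
            # Left leaves increase outward from the stem, so they print in reverse order.
            left_leaves = "".join(str(d) for d in sorted(lg.get((stem, half), []), reverse=True))
            right_leaves = "".join(str(d) for d in sorted(rg.get((stem, half), [])))
            rows.append(f"{left_leaves:>{width}} | {stem} | {right_leaves}".rstrip())
    return rows


print("Female            Male")
for row in back_to_back(female, male):
    print(row)
print("Key: 15|1 represents a male who is 151 cm tall.")
```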

Histograms

Quantitative variables often take many values. A graph of the distribution may be clearer if nearby values are grouped together.
The most common graph of the distribution of one quantitative variable is a histogram.

How to Make a Histogram
1) Divide the range of data into classes of equal width.
2) Find the count (frequency) or percent (relative frequency) of individuals in each class.
3) Label and scale your axes and draw the histogram. The height of each bar equals the count (or percent) for its class. Adjacent bars should touch, unless a class contains no individuals.

Making a Histogram
Example, page 35

The table on page 35 presents data on the percent of residents from each state who were born outside of the U.S.

Frequency Table
Class (percent foreign-born)    Count (number of states)
0 to <5                         20
5 to <10                        13
10 to <15                        9
15 to <20                        5
20 to <25                        2
25 to <30                        1
Total                           50

[Histogram: Number of States (vertical axis) vs. Percent of foreign-born residents (horizontal axis)]
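Step 2 of the histogram recipe allows counts or percents. Turning the frequency table above into a relative frequency table takes one division per class; the class labels and counts below are copied from the table, and the layout of the printout is our own choice.

```python
# Frequency table from the slide: percent of foreign-born residents -> number of states
classes = ["0 to <5", "5 to <10", "10 to <15", "15 to <20", "20 to <25", "25 to <30"]
counts = [20, 13, 9, 5, 2, 1]

total = sum(counts)                           # 50 states
percents = [100 * c / total for c in counts]  # relative frequency, as a percent

for cls, c, p in zip(classes, counts, percents):
    print(f"{cls:<10} {c:>3} {p:6.1f}%")
print(f"{'Total':<10} {total:>3} {sum(percents):6.1f}%")
```

A relative frequency histogram drawn from these percents has exactly the same shape as the count histogram; only the vertical scale changes.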

Making a Histogram: Calculator Example

The table presents data on the average points scored per game (PTSG) for the 30 NBA teams in the 2009-2010 regular season.

101.7, 99.2, 95.3, 97.5, 102.1, 102, 106.5, 94, 108.8, 102.4
100.8, 95.7, 101.7, 102.5, 96.5, 97.7, 98.2, 92.4, 100.2, 102.1
101.5, 102.8, 97.7, 110.2, 98.1, 100, 101.4, 104.1, 104.2, 96.2
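The slide calls this a calculator example. As a stand-in for the calculator, here is a minimal Python sketch that builds a frequency table and a sideways text histogram for the PTSG data. The classes (width 2, starting at 92) are an assumption chosen for illustration, not necessarily the classes used in class, and `frequency_table` is a hypothetical helper.

```python
import math

# Average points per game (PTSG) for the 30 NBA teams, 2009-2010 regular season
ptsg = [101.7, 99.2, 95.3, 97.5, 102.1, 102, 106.5, 94, 108.8, 102.4,
        100.8, 95.7, 101.7, 102.5, 96.5, 97.7, 98.2, 92.4, 100.2, 102.1,
        101.5, 102.8, 97.7, 110.2, 98.1, 100, 101.4, 104.1, 104.2, 96.2]


def frequency_table(values, start, width):
    """Count values in equal-width classes: start to <start+width, and so on.

    Assumes start <= min(values); the last class is the one containing max(values).
    A value on a boundary (e.g. 94) goes into the higher class, matching the "to <" labels.
    """
    n_classes = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[math.floor((v - start) / width)] += 1
    return counts


START, WIDTH = 92, 2  # assumed classes: 92 to <94, 94 to <96, ..., 110 to <112
counts = frequency_table(ptsg, START, WIDTH)
for i, c in enumerate(counts):
    lo = START + i * WIDTH
    # One '*' per team, so this is a histogram turned on its side.
    print(f"{lo:>3} to <{lo + WIDTH:<3} {'*' * c:<7} {c}")
```

With these assumed classes the counts come out to 1, 3, 5, 3, 7, 6, 2, 1, 1, 1 (30 teams in all). A different starting point or class width gives a different-looking histogram, so it is worth trying more than one.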

Using Histograms Wisely

Here are several cautions based on common mistakes students make when using histograms.

Cautions
1) Don’t confuse histograms and bar graphs.
2) Don’t use counts (in a frequency table) or percents (in a relative frequency table) as data.
3) Use percents instead of counts on the vertical axis when comparing distributions with different numbers of observations.
4) Just because a graph looks nice, it’s not necessarily a meaningful display of data.
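Caution 3 in practice: the Who’s Taller? example has 17 males and 22 females, so raw counts would tilt the comparison toward the larger group. This sketch converts each group’s heights to percents per class before comparing. The classes (width 5, from 150 to <190) are a hypothetical choice, and the helper `percent_table` assumes every value falls inside them.

```python
# Heights (cm) from the Who's Taller? example: 17 males, 22 females
male = [154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175,
        174, 165, 165, 183, 180]
female = [160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158,
          153, 161, 165, 165, 159, 168, 153, 166, 158, 158, 166]


def percent_table(values, start, width, n_classes):
    """Percent of values in each class start + i*width to <start + (i+1)*width, rounded to 0.1."""
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1  # assumes every value falls inside the classes
    return [round(100 * c / len(values), 1) for c in counts]


START, WIDTH, N_CLASSES = 150, 5, 8  # hypothetical classes: 150 to <155, ..., 185 to <190
male_pct = percent_table(male, START, WIDTH, N_CLASSES)
female_pct = percent_table(female, START, WIDTH, N_CLASSES)

for i in range(N_CLASSES):
    lo = START + i * WIDTH
    print(f"{lo} to <{lo + WIDTH}: male {male_pct[i]:5.1f}%   female {female_pct[i]:5.1f}%")
```

Because both columns are on the same 0 to 100 scale, the bars of two relative frequency histograms can be compared directly even though the groups differ in size.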

Section 1.2
Displaying Quantitative Data with Graphs
Summary
In this section, we learned that…

• You can use a dotplot, stemplot, or histogram to show the distribution of a quantitative variable.

• When examining any graph, look for an overall pattern and for notable departures from that pattern. Describe the shape, center, spread, and any outliers. Don’t forget your SOCS!

• Some distributions have simple shapes, such as symmetric or skewed. The number of modes (major peaks) is another aspect of overall shape.

• When comparing distributions, be sure to discuss shape, center, spread, and possible outliers.

• Histograms are for quantitative data; bar graphs are for categorical data. Use relative frequency histograms when comparing data sets of different sizes.

Looking Ahead…
In the next Section…
We’ll learn how to describe quantitative data with
numbers.
Mean and Standard Deviation
Median and Interquartile Range
Five-number Summary and Boxplots
Identifying Outliers
We’ll also learn how to calculate numerical summaries
with technology and how to choose appropriate
measures of center and spread.