No Slide Title

Dealing With a Lot of Numbers…
• Summarizing the data will help us when
we look at large sets of quantitative data.
• Without summaries of the data, it is
difficult to grasp what the data tell us.
• In this chapter, we concentrate on
graphical displays of quantitative data.
Copyright © 2004 Pearson Education, Inc.
Slide 4-1
Distributions and Histograms
• First, slice up the entire span of values
covered by the quantitative variable into
equal-width piles called bins.
• The bins and the counts in each bin give
the distribution of the quantitative variable.
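
The binning step can be sketched in a few lines of Python. This is only an illustration: the function name `bin_counts`, the sample data, and the convention that each bin includes its left edge but not its right edge are assumptions, not something the slides specify.

```python
def bin_counts(values, low, width, n_bins):
    """Count how many values fall in each equal-width bin.

    Bin i covers [low + i*width, low + (i+1)*width). The left-closed
    convention is an assumption made for this sketch.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)  # index of the bin that holds v
        if 0 <= i < n_bins:          # ignore values outside the span
            counts[i] += 1
    return counts


# Hypothetical data, made up for illustration
data = [1, 2, 2, 3, 5, 6, 6, 7, 9]
counts = bin_counts(data, low=0, width=2, n_bins=5)
print(counts)
```

The list of counts, read together with the bin edges, is the distribution that a histogram then draws.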
Slide 4-2
Distributions and Histograms (cont.)
• A histogram plots the bin counts as the
heights of bars (like a bar chart).
• A relative frequency histogram displays
the percentage of cases in each bin
instead of the count.
– In this way, relative frequency histograms are
faithful to the area principle.
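
Converting counts to percentages is a one-line calculation. A minimal sketch, assuming the bin counts are already available (the counts below are made up):

```python
def relative_frequencies(counts):
    """Convert bin counts to the percentage of cases in each bin."""
    total = sum(counts)
    return [100 * c / total for c in counts]


# Hypothetical bin counts
counts = [2, 5, 3]
percents = relative_frequencies(counts)
print(percents)
```

The bars keep the same relative heights as in the count histogram; only the vertical scale changes.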
Slide 4-3
Histogram Example
• The figure shows the first 36 months of Enron
monthly stock price changes. (Later, we will
examine these same data in something called a
stem-and-leaf display.)
Slide 4-4
Stem-and-Leaf Displays
• Stem-and-leaf displays show the
distribution of a quantitative variable, like
histograms do, while preserving the
individual values.
• Stem-and-leaf displays contain all the
information found in a histogram and,
when carefully drawn, satisfy the area
principle and show the distribution.
Slide 4-5
Constructing a Stem-and-Leaf Display
• First, cut each data value into leading
digits (“stems”) and trailing digits
(“leaves”).
• Use the stems to label the bins.
• Use only one digit for each leaf—either
round or truncate the data values to one
decimal place after the stem.
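
The construction steps above can be sketched as follows. This is an illustration under stated assumptions: the function name `stem_and_leaf` is hypothetical, the data are made up, values are assumed non-negative, and rounding (rather than truncating) to one decimal place is chosen here.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one display line per stem for non-negative values.

    Each value is rounded to one decimal place; the whole-number part
    is the stem and the tenths digit is the leaf.
    """
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)           # e.g. 2.4 -> 24
        stem, leaf = divmod(tenths, 10)  # 24 -> stem 2, leaf 4
        leaves[stem].append(leaf)

    lines = []
    # Walk every stem in the range so that empty bins still get a row.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in sorted(leaves[stem]))
        lines.append(f"{stem}|{digits}")
    return lines


# Hypothetical data
data = [2.4, 0.7, 2.1, 1.3, 2.2, 0.5]
for line in stem_and_leaf(data):
    print(line)
```

Keeping a row for every stem, even an empty one, is what lets the display show the shape of the distribution the way a histogram does.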
Slide 4-6
Stem-and-Leaf Example
• In the figure, 2|124
stands for the
numbers $2.1, $2.2,
and $2.4.
– The stem tells us we
are in the $2 range.
– Each leaf gives us the
Enron stock price
change to the nearest
dime.
Slide 4-7
Dotplots
• A dotplot is a simple
display. It just places
a dot for each case in
the data.
• The dotplot to the
right shows Kentucky
Derby winning times,
plotting each race as
its own dot.
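
A text-only version of a dotplot can be sketched as below. The function name `text_dotplot` and the values are hypothetical (they are not the Kentucky Derby times), and the sketch assumes whole-number data so that each possible value gets its own row.

```python
from collections import Counter


def text_dotplot(values):
    """Return one line per integer value, with one dot per case."""
    counts = Counter(values)
    lines = []
    # Include values with no cases so gaps stay visible on the axis.
    for value in range(min(values), max(values) + 1):
        lines.append(f"{value:>4} | {'.' * counts[value]}")
    return lines


# Hypothetical whole-number measurements
data = [121, 122, 122, 124, 122, 121]
for line in text_dotplot(data):
    print(line)
```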
Slide 4-8
Shape, Center, and Spread
• When describing a distribution, always
address three things: shape, center,
and spread.
Slide 4-9
The Shape of the Distribution
• When talking about the shape of the
distribution, make sure to address the
following three questions:
1. Does the histogram have a single, central
hump or several separated bumps?
2. Is the histogram symmetric?
3. Do any unusual features stick out?
Slide 4-10
Humps and Bumps
1. Does the histogram have a single,
central hump or several separated
bumps?
– Humps in a histogram are called modes.
– A histogram with one main peak is dubbed
unimodal; histograms with two peaks are
bimodal; histograms with three or more
peaks are called multimodal.
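
One rough way to count peaks from bin counts is to look for bins that are taller than both neighbors. This is a crude sketch, not a standard procedure: the function names are hypothetical, flat-topped peaks are not detected, and small random bumps will be counted as modes, so judging the histogram by eye remains the safer approach.

```python
def peak_bins(counts):
    """Return indices of bins strictly taller than both neighbors."""
    peaks = []
    for i, c in enumerate(counts):
        # Treat a missing neighbor at either end as lower than any bin.
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < len(counts) - 1 else -1
        if c > left and c > right:
            peaks.append(i)
    return peaks


def describe_modes(counts):
    """Label a histogram by its number of strict peaks."""
    n = len(peak_bins(counts))
    if n == 1:
        return "unimodal"
    if n == 2:
        return "bimodal"
    if n >= 3:
        return "multimodal"
    return "no clear mode"


# Hypothetical bin counts
print(describe_modes([1, 3, 5, 3, 1]))
print(describe_modes([1, 4, 2, 1, 3, 6, 2]))
print(describe_modes([4, 4, 4, 4]))
```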
Slide 4-11
Humps and Bumps (cont.)
• A bimodal histogram has two apparent peaks:
Slide 4-12
Humps and Bumps (cont.)
• A histogram that doesn’t appear to have any
mode and in which all the bars are
approximately the same height is called uniform:
Slide 4-13
Symmetry
2. Is the histogram symmetric?
– If you can fold the histogram along a vertical
line through the middle and have the edges
match pretty closely, the histogram is
symmetric.
Slide 4-14
Symmetry (cont.)
– The (usually) thinner ends of a distribution are called
the tails. If one tail stretches out farther than the other,
the histogram is said to be skewed to the side of the
longer tail.
– In the figure below, the histogram on the left is said to
be skewed left, while the histogram on the right is
said to be skewed right.
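
The "longer tail" idea can be checked numerically by comparing how far the smallest and largest values lie from the median. This is a crude sketch under assumptions: the function name `skew_direction` is hypothetical, the data are made up, and a single outlier can decide the answer, so it does not replace looking at the histogram.

```python
from statistics import median


def skew_direction(values):
    """Name the side of the longer tail, measured from the median."""
    m = median(values)
    left_tail = m - min(values)
    right_tail = max(values) - m
    if right_tail > left_tail:
        return "skewed right"
    if left_tail > right_tail:
        return "skewed left"
    return "symmetric"


# Hypothetical data sets
print(skew_direction([1, 2, 2, 3, 3, 3, 4, 10]))
print(skew_direction([1, 7, 8, 8, 9, 9, 9, 10]))
print(skew_direction([1, 2, 3, 4, 5]))
```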
Slide 4-15
Anything Odd?
3. Do any unusual features stick out?
– Believe it or not, sometimes it’s the unusual
features that tell us something interesting or
exciting about the data.
– You should always mention any stragglers,
or outliers, that stand apart from the body
of the distribution.
– Are there any gaps in the distribution? If so,
we might have data from more than one
group.
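
Gaps can be located by sorting the data and looking at the distance between neighboring values. A minimal sketch: the function name `find_gaps` and the threshold `min_gap` are assumptions, and what counts as a gap is a judgment call that depends on the data.

```python
def find_gaps(values, min_gap):
    """Return (lower, upper) pairs of neighbors at least min_gap apart."""
    ordered = sorted(values)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]


# Hypothetical data that may come from two groups
data = [1, 2, 2, 3, 4, 11, 12, 13]
print(find_gaps(data, min_gap=5))
```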
Slide 4-16
Center and Spread
• Center: If you had to pick a single number
to describe all the data, what would you
pick?
• Spread: Since statistics is about variation,
how spread out is the distribution?
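
These slides do not name specific measures, so the sketch below uses the median as one possible answer for center and the range (largest minus smallest) as one possible answer for spread. Both choices, and the data, are assumptions made for illustration.

```python
from statistics import median

# Hypothetical data
data = [2, 4, 4, 5, 7, 9, 12]

center = median(data)          # middle value of the sorted data
spread = max(data) - min(data) # range: distance between the extremes

print(center, spread)
```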
Slide 4-17
Comparing Distributions
• Often we would like to compare two or
more distributions instead of looking at
one distribution by itself.
• When looking at two or more distributions,
it is very important that the histograms
have been put on the same scale.
Otherwise, we cannot really compare the
two distributions.
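
One way to put two histograms on the same scale is to compute a single set of bin edges from the combined data and count both groups against it. A sketch under assumptions: the function name `shared_bin_counts` is hypothetical, the data are made up, and the two groups are assumed not to consist of one identical value (which would give a bin width of zero).

```python
def shared_bin_counts(group_a, group_b, n_bins):
    """Count two groups using the same equal-width bins."""
    low = min(min(group_a), min(group_b))
    high = max(max(group_a), max(group_b))
    width = (high - low) / n_bins

    def count(values):
        counts = [0] * n_bins
        for v in values:
            # The overall maximum is placed in the last bin.
            i = min(int((v - low) / width), n_bins - 1)
            counts[i] += 1
        return counts

    return count(group_a), count(group_b)


# Hypothetical groups
counts_a, counts_b = shared_bin_counts([0, 1, 2, 3], [6, 8, 9, 10], n_bins=5)
print(counts_a)
print(counts_b)
```

Because both lists refer to the same bins, bar i in one histogram can be compared directly with bar i in the other.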
Slide 4-18
Order, Please!
• For some data sets, we are interested in
how the data behave over time. In these
cases, we construct timeplots of the data.
Slide 4-19
*Re-expressing Skewed Data
Figure 4.12
Slide 4-20
*Re-expressing Skewed Data (cont.)
• One way to make a skewed
distribution more symmetric
is to re-express or transform
the data by applying a
simple function
(e.g., logarithmic function).
• Note the change in
skewness from the raw data
(Figure 4.12) to the
transformed data
(Figure 4.13):
Figure 4.13
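
The effect of a logarithmic re-expression can be seen on a small made-up data set. In this sketch the data are hypothetical (they are not the data behind Figures 4.12 and 4.13), and `tail_ratio` is an ad hoc measure chosen for illustration: the length of the right tail divided by the length of the left tail, both measured from the median. A ratio near 1 suggests rough symmetry. Logarithms require positive values.

```python
import math
from statistics import median


def tail_ratio(values):
    """Right-tail length divided by left-tail length, from the median."""
    m = median(values)
    return (max(values) - m) / (m - min(values))


# Hypothetical, strongly right-skewed data (all positive)
raw = [1, 2, 4, 8, 16, 32, 64, 128, 1024]
logged = [math.log10(v) for v in raw]

print(tail_ratio(raw))     # much larger than 1: long right tail
print(tail_ratio(logged))  # much closer to 1 after re-expression
```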
Slide 4-21
What Can Go Wrong?
• Don’t make a histogram of a categorical
variable—bar charts or pie charts should
be used for categorical data.
• Choose a scale appropriate to the data.
• Avoid inconsistent scales, either within the
display or when comparing two displays.
• Label clearly so a reader knows what the
plot displays.
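
For a categorical variable there are no numeric bins to slice; the display is built from a count per category instead. A minimal sketch with made-up data, printing a text bar chart:

```python
from collections import Counter

# Hypothetical categorical data
colors = ["red", "blue", "red", "green", "blue", "red"]

category_counts = Counter(colors)

# One labeled bar per category, most frequent first
for category, count in category_counts.most_common():
    print(f"{category:<6} {'#' * count}")
```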
Slide 4-22
Key Concepts
• Quantitative variables can be displayed using
histograms, dotplots, and/or stem-and-leaf
displays. These displays help us to see the
distributions of the variables. Timeplots help us
to see patterns in the data over time.
• Consider three things when looking at these
displays: shape, center, and spread.
• Distributions can be classified as symmetric or
skewed (look at how the tails behave with
respect to the rest of the distribution).
Slide 4-23
Key Concepts (cont.)
• A mode is a hump or local high point in the
shape of the distribution:
– unimodal (one mode)
– bimodal (two modes)
– multimodal (more than two modes)
– uniform (relatively flat, no mode)
• Be on the lookout for outliers (extreme
values that stand apart from the bulk of
the data).
Slide 4-24