Chapter 5
Understanding and Comparing Distributions
The Big Picture

Below is a histogram of the average wind speed at Hopkins Forest in western Massachusetts for every day in 1989.
Slide 5- 2
The Big Picture (cont)

The distribution is skewed to the right.
• The high value may be an outlier.
• Median daily wind speed = 1.90 mph
• IQR = 1.78 mph
Slide 5- 3
The Five-Number Summary


The five-number summary of a distribution reports its median, quartiles, and minimum and maximum.
Example: The five-number summary for the daily wind speed (mph) is:
Max     8.67
Q3      2.93
Median  1.90
Q1      1.15
Min     0.20
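The five-number summary is simple arithmetic on sorted data. The Python sketch below computes it with the median-of-halves rule that this chapter's practice exercise applies by hand. How it handles an odd number of values is an assumption: it leaves the middle value out of both halves. Textbooks and software use several slightly different quartile conventions. The helper names `median` and `five_number_summary` are illustrative and do not come from the source.

```python
def median(sorted_vals):
    """Median of a list that is already sorted."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) by the median-of-halves rule.

    Assumption: for an odd number of values, the middle value is left out
    of both halves. Other conventions give slightly different quartiles.
    """
    vals = sorted(data)
    n = len(vals)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = vals[:half]            # values below the middle
    upper = vals[half + n % 2:]    # values above the middle (skip it if n is odd)
    return vals[0], median(lower), median(vals), median(upper), vals[-1]


# Credit hours for the 16 students in this chapter's practice exercise
credits = [10, 17, 10, 17, 12, 19, 14, 20, 15, 20, 15, 20, 15, 20, 15, 22]
print(five_number_summary(credits))  # → (10, 14.5, 16.0, 20.0, 22)
```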
Slide 5- 4
Daily Wind Speed: Making Boxplots

A boxplot is a graphical display of the five-number summary.

Boxplots are particularly useful when comparing groups.
Slide 5- 5
Constructing Boxplots
Five-number summary: 0.20, 1.15, 1.90, 2.93, 8.67
• Draw a single vertical axis spanning the range of the data.
• Draw short horizontal lines at the lower and upper quartiles and at the median.
• Then connect them with vertical lines to form a box.
Slide 5- 6
Constructing Boxplots (cont.)
Five-number summary: 0.20, 1.15, 1.90, 2.93, 8.67
• Sketch “fences” around the main part of the data.
• The upper fence is 1.5 IQRs above the upper quartile.
• The lower fence is 1.5 IQRs below the lower quartile.
• Note: the fences only help with constructing the boxplot and should not appear in the final display.
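The fence rule translates directly into code. This sketch applies it to the wind-speed quartiles Q1 = 1.15 mph and Q3 = 2.93 mph. The `fences` function and its `k` parameter are hypothetical names.

```python
def fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence), each k IQRs beyond its quartile."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of the daily wind speeds (mph) from the five-number summary
lower, upper = fences(1.15, 2.93)
print(f"lower fence = {lower:.2f} mph, upper fence = {upper:.2f} mph")
```

With an IQR of 1.78 mph, the fences fall at about -1.52 mph and 5.60 mph. The lower fence is below zero, so no day can be a low outlier. The maximum lies well beyond the upper fence, which fits the earlier remark that the high value may be an outlier.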
Slide 5- 7
Constructing Boxplots (cont.)
• Use the fences to grow “whiskers.”
• Draw lines from the ends of the box up and down to the most extreme data values found within the fences.
• If a data value falls outside one of the fences, we do not connect it with a whisker.
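The whisker rule can be sketched as a filter: keep the values inside the fences, then take their extremes. The function name `whisker_ends` and the sample values are hypothetical. The fences of -1.52 mph and 5.60 mph are the ones implied by the wind-speed quartiles. The sketch also assumes that a value landing exactly on a fence counts as inside.

```python
def whisker_ends(data, lower_fence, upper_fence):
    """Return the most extreme data values that still lie inside the fences.

    Whiskers are drawn from the box out to these values; anything beyond
    a fence is left unconnected and plotted separately as an outlier.
    Assumption: a value exactly on a fence counts as inside.
    """
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return min(inside), max(inside)


# Hypothetical sample, for illustration only
sample = [0.2, 0.9, 1.2, 1.6, 1.9, 2.4, 2.9, 3.5, 8.7]
print(whisker_ends(sample, lower_fence=-1.52, upper_fence=5.60))  # → (0.2, 3.5)
```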
Slide 5- 8
Constructing Boxplots (cont.)
• Add the outliers by displaying any data values beyond the fences with special symbols.
• We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.
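Here is a minimal sketch of the outlier labels. It assumes each value is measured against the nearer quartile. A value more than 1.5 IQRs beyond it is an outlier, and one more than 3 IQRs beyond it is a far outlier. The function `classify` is a hypothetical name. The quartiles, minimum, median, and maximum come from the wind-speed summary. The value 6.0 mph is made up to show the middle category.

```python
def classify(data, q1, q3):
    """Pair each value with 'inside', 'outlier', or 'far outlier'.

    outlier:     more than 1.5 IQRs beyond the nearer quartile (outside a fence)
    far outlier: more than 3 IQRs beyond the nearer quartile
    """
    iqr = q3 - q1
    result = []
    for x in data:
        # Distance past the box; 0 for values between Q1 and Q3
        distance = max(q1 - x, x - q3, 0)
        if distance > 3 * iqr:
            label = "far outlier"
        elif distance > 1.5 * iqr:
            label = "outlier"
        else:
            label = "inside"
        result.append((x, label))
    return result


# Wind-speed quartiles (mph); 6.0 is a made-up value for illustration
for value, label in classify([0.20, 1.90, 6.0, 8.67], q1=1.15, q3=2.93):
    print(value, label)
```

By these rules, the 8.67 mph maximum would be labeled a far outlier. It lies 5.74 mph above Q3, while 3 IQRs is only 5.34 mph.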
Slide 5- 9
Wind Speed: Making Boxplots (cont.)

Let us compare the histogram and boxplot for daily wind
speeds:
Slide 5- 10
Comparing Groups

It is usually more interesting to compare groups.
With histograms, note the shapes, centers, and spreads
of the two distributions.

What does this graphical display tell you?

Slide 5- 11
Comparing Groups (cont)


Boxplots hide the details while displaying the overall summary
information.
We often plot them side by side for groups or categories we wish
to compare.
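Side-by-side boxplots each display one group's five-number summary. As a hedged sketch, the Python below prints those numbers for two hypothetical groups using the standard `statistics` module. `box_numbers` is an illustrative name. The default quartile method of `statistics.quantiles` can differ slightly from the by-hand rule.

```python
import statistics


def box_numbers(data):
    """The five numbers a boxplot draws: min, Q1, median, Q3, max.

    Quartiles use statistics.quantiles with its default 'exclusive' method,
    which may differ slightly from the by-hand median-of-halves rule.
    """
    q1, med, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, med, q3, max(data)


# Hypothetical values for two groups, for illustration only
groups = {
    "Group A": [2, 3, 3, 4, 5, 6, 9],
    "Group B": [4, 5, 6, 6, 7, 8, 8],
}

# One row per group: the same numbers side-by-side boxplots would show
for name, values in groups.items():
    print(name, box_numbers(values))
```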
Slide 5- 12
What About Outliers?

• If there are any clear outliers and you are reporting the mean and standard deviation, report them both with the outliers present and with the outliers removed.

Note: The median and IQR are not likely to be affected much by the outliers.
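The sketch below shows why the mean and standard deviation should be reported both ways. It compares them with the median and IQR on a small hypothetical data set, with and without its one extreme value. It uses Python's standard `statistics` module. Because `statistics.quantiles` has its own default quartile method, its IQR can differ slightly from a by-hand calculation. The function name `summaries` is illustrative.

```python
import statistics


def summaries(data):
    """Mean and SD next to median and IQR for one data set.

    Quartiles come from statistics.quantiles (default 'exclusive' method),
    which can differ slightly from the by-hand median-of-halves rule.
    """
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "median": statistics.median(data),
        "iqr": q3 - q1,
    }


# Hypothetical data with one clear outlier (100), for illustration only
with_outlier = [12, 14, 15, 15, 16, 17, 18, 100]
without_outlier = [x for x in with_outlier if x != 100]

for name, data in [("with outlier", with_outlier), ("without", without_outlier)]:
    s = summaries(data)
    print(f"{name:13s} mean={s['mean']:.2f} sd={s['sd']:.2f} "
          f"median={s['median']:.2f} iqr={s['iqr']:.2f}")
```

Dropping the outlier moves the mean by more than 10 and shrinks the SD roughly fifteenfold. The median and the IQR each change by only 0.5.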
Slide 5- 13
Timeplots: Order, Please!

For some data sets, we are interested in how the data behave over time. In these cases, we construct timeplots of the data.
Slide 5- 14
Re-expressing Skewed Data to
Improve Symmetry

One way to make a skewed distribution more symmetric is to re-express, or transform, the data:
• Apply a simple function (e.g., a logarithmic function).
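One rough way to check whether a re-expression helped is to compare two gaps: Q1 to the median, and the median to Q3. The sketch below does this on a hypothetical, strongly right-skewed sample, before and after taking base-10 logarithms with `math.log10`. The function name `quartile_gaps` is illustrative. The quartiles follow the same median-of-halves rule used in the practice exercise.

```python
import math


def quartile_gaps(data):
    """Return (median - Q1, Q3 - median) using the median-of-halves rule.

    Similar gaps suggest the middle half is roughly symmetric; a much
    larger upper gap suggests skew to the right.
    """
    vals = sorted(data)
    n = len(vals)
    half = n // 2

    def med(v):
        mid = len(v) // 2
        return v[mid] if len(v) % 2 else (v[mid - 1] + v[mid]) / 2

    q1 = med(vals[:half])
    q3 = med(vals[half + n % 2:])
    center = med(vals)
    return center - q1, q3 - center


# Hypothetical right-skewed sample (each value doubles the last), for illustration only
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log10(x) for x in raw]  # the logarithm needs positive values

print(quartile_gaps(raw))     # → (6, 24): the upper gap is four times the lower one
print(quartile_gaps(logged))  # both gaps are about 0.602 after re-expression
```

Each value doubles the one before, so the logarithms are evenly spaced and the two gaps become equal. Real data will rarely become perfectly symmetric, and logarithms apply only to positive values.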
Slide 5- 15
Re-expressing Skewed Data to Improve Symmetry
(cont.)

A logarithmic function was applied to each of the observations of the data displayed in the previous slide.

Note the change in skewness from the raw data (previous slide) to the re-expressed data (left).
Slide 5- 16
What Can Go Wrong?

• Avoid inconsistent scales: put groups you compare on the same scale.
• Beware of outliers.
• Be careful when comparing groups with very different spreads.
Slide 5- 17
What have we learned?



• We’ve learned the value of comparing data groups and looking for patterns among groups and over time.
• We’ve seen that boxplots are very effective for comparing groups graphically.
• We’ve experienced the value of identifying and investigating outliers.
Slide 5- 18
Practice Exercise - Chapter 5

A survey conducted in a college intro stats class during Autumn 2003 asked students about the number of credit hours they were taking that quarter. The number of credit hours for a random sample of 16 students is:
10, 10, 12, 14, 15, 15, 15, 15, 17, 17, 19, 20, 20, 20, 20, 22
Slide 5- 19
Practice Exercise - Chapter 5 (cont)
a. Find the five-number summary for the data above.
b. Find the IQR for the data.
c. From parts (a) and (b), are there any outliers in
the data?
d. Create a boxplot of these data.
Slide 5- 20
Practice Exercise - Chapter 5 (cont)
10, 10, 12, 14, 15, 15, 15, 15, 17, 17, 19, 20, 20, 20, 20, 22
a. Find the five-number summary:
Slide 5- 21
Practice Exercise - Chapter 5 (cont)
To find the quartiles, divide the sorted data into two equal halves:
Lower half: 10 10 12 14 15 15 15 15
Upper half: 17 17 19 20 20 20 20 22

To find Q1, find the median of the lower half:
→ Q1 = (14 + 15)/2 = 14.5
To find Q3, find the median of the upper half:
→ Q3 = (20 + 20)/2 = 20
Slide 5- 22
Practice Exercise - Chapter 5 (cont)
a. Five-number summary: Min = 10, Q1 = 14.5, Median = (15 + 17)/2 = 16, Q3 = 20, Max = 22
Slide 5- 23
Practice Exercise - Chapter 5 (cont)
b. Find the IQR of the data.
IQR = Q3 - Q1
= 20 - 14.5
= 5.5
Slide 5- 24
Practice Exercise - Chapter 5 (cont)
c. From parts (a) and (b), are there any outliers in the data?
To determine whether there are outliers, we need to calculate the values of the fences.
Lower fence
= Q1 - 1.5 × IQR
= 14.5 - 1.5 × 5.5
= 6.25
Slide 5- 25
Practice Exercise - Chapter 5 (cont)
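Upper fence
= Q3 + 1.5 × IQR
= 20 + 1.5 × 5.5
= 28.25
Are there any observations outside the fences?
• None of the observations lie outside the fences (the smallest value, 10, is above the lower fence, and the largest, 22, is below 28.25), hence there are no outliers in the data.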
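The arithmetic for parts (b) and (c) can be double-checked with a few lines of Python. The sketch takes the quartiles from part (a) as given.

```python
# Credit hours for the 16 sampled students
credits = [10, 17, 10, 17, 12, 19, 14, 20, 15, 20, 15, 20, 15, 20, 15, 22]

q1, q3 = 14.5, 20                # quartiles from part (a)
iqr = q3 - q1                    # part (b)
lower_fence = q1 - 1.5 * iqr     # part (c): 1.5 IQRs below Q1
upper_fence = q3 + 1.5 * iqr     # part (c): 1.5 IQRs above Q3
outliers = [x for x in credits if x < lower_fence or x > upper_fence]

print(iqr, lower_fence, upper_fence, outliers)  # → 5.5 6.25 28.25 []
```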
Slide 5- 26
Practice Exercise - Chapter 5 (cont)
d. Create a boxplot of these data.
Min = 10
Q1 = 14.5
Median = 16
Q3 = 20
Max = 22
Lower fence = 6.25
Upper fence = 28.25
[Figure: boxplot of the credit-hour data on a vertical axis running from 0 to 35]
Slide 5- 27