Interpreting Scatterplots

Chapter 3: Describing Relationships
Section 3.1
Scatterplots and Correlation

Explanatory and Response Variables
Most statistical studies examine data on more than one
variable. In many of these settings, the two variables
play different roles.
Definition:
A response variable (y) measures an outcome of a study.
An explanatory variable (x) may help explain or influence
changes in a response variable.

Displaying Relationships: Scatterplots
The most useful graph for displaying the relationship
between two quantitative variables is a scatterplot.
Definition:
A scatterplot shows the relationship between two
quantitative variables measured on the same individuals.
The values of one variable appear on the horizontal axis,
and the values of the other variable appear on the vertical
axis. Each individual in the data appears as a point on the
graph.

How to Make a Scatterplot
1. Decide which variable should go on each axis.
• Remember, the eXplanatory variable goes on the
X-axis!
2. Label and scale your axes.
3. Plot individual data values.
NOTE:
The axes need not intersect at (0,0). For each axis,
choose the scale so that its minimum and maximum values
are convenient and every value to be plotted lies
between them.
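
The scaling advice in the note can be sketched in a few lines of Python. This is only an illustration: the helper name axis_limits and the step sizes (20 lb and 2 lb) are assumptions chosen for this sketch, and the data are the body and backpack weights from the example that follows.

```python
import math

def axis_limits(values, step):
    """Return (low, high): multiples of `step` that bracket every value.

    `step` is an assumed "convenient" tick spacing picked by the reader.
    """
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    return low, high

body_weight = [120, 187, 109, 103, 131, 165, 158, 116]  # lb (x-axis)
pack_weight = [26, 30, 26, 24, 29, 35, 31, 28]          # lb (y-axis)

x_limits = axis_limits(body_weight, step=20)
y_limits = axis_limits(pack_weight, step=2)
print("x-axis from", x_limits[0], "to", x_limits[1])
print("y-axis from", y_limits[0], "to", y_limits[1])
```

Neither axis starts at 0 here, which is consistent with the note: the scale only needs to contain the data.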

Displaying Relationships: Scatterplots
Make a scatterplot of the relationship between body
weight and pack weight.
Since body weight is our eXplanatory variable, be sure
to place it on the X-axis!

Body weight (lb)       120  187  109  103  131  165  158  116
Backpack weight (lb)    26   30   26   24   29   35   31   28


Interpreting Scatterplots
For the distribution of a single quantitative variable,
“shape, center, spread, outliers” (SOCS) has been a useful
summary.
How to Examine a Scatterplot
As in any graph of data, look for the overall pattern and for
striking departures from that pattern.
• You can describe the overall pattern of a scatterplot by
the direction, shape, and strength of the relationship.
• An important kind of departure is an outlier, an
individual value that falls outside the overall pattern of
the relationship. (Ask yourself, “Is there a striking
exception to the overall pattern?”)
DESCRIBING STRENGTH

Describe the strength of the relationship. If the
points cluster closely around an imaginary line,
the association is strong. If the points are
scattered farther from the line, the association is
weak.
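
One common way to put a number on "how closely the points cluster around a line" is the correlation coefficient r. It is not defined in the text above, so treat the sketch below as an assumption about how strength might be quantified, not as the method used on these slides. The function name correlation is made up for this example; the data are the body and backpack weights from the earlier table.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

body_weight = [120, 187, 109, 103, 131, 165, 158, 116]  # lb
pack_weight = [26, 30, 26, 24, 29, 35, 31, 28]          # lb

r = correlation(body_weight, pack_weight)
print(f"r = {r:.2f}")
```

Values of r near 1 or -1 indicate points that cluster tightly around a line; values near 0 indicate a weak linear association. For the backpack data r comes out a little below 0.8.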
DESCRIBING DIRECTION

Definition:
Two variables have a positive association when
above-average values of one tend to accompany
above-average values of the other, and when
below-average values also tend to occur together.
(That is, the y values tend to increase as the
x values increase.)
Two variables have a negative association when
above-average values of one tend to accompany
below-average values of the other. (That is, the
y values tend to decrease as the x values increase.)
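
The definition can be checked directly by counting how many individuals fall on the same side of both averages and how many fall on opposite sides. The sketch below is one informal way to do that; the function name side_counts is hypothetical, and the data are the body and backpack weights from the earlier table.

```python
def side_counts(xs, ys):
    """Count points on the same side of both means vs. on opposite sides."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - mean_x) * (y - mean_y)
        if product > 0:      # both above average or both below average
            same += 1
        elif product < 0:    # one above average, the other below
            opposite += 1
        # a point exactly at either mean counts toward neither group
    return same, opposite

body_weight = [120, 187, 109, 103, 131, 165, 158, 116]  # lb
pack_weight = [26, 30, 26, 24, 29, 35, 31, 28]          # lb

same, opposite = side_counts(body_weight, pack_weight)
direction = "positive" if same > opposite else "negative" if opposite > same else "unclear"
print(same, "same-side points,", opposite, "opposite-side points:", direction)
```

When most points are "same side", the association is positive; when most are "opposite side", it is negative.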

Interpreting Scatterplots
Outlier
• There is one possible outlier: the hiker with a
body weight of 187 pounds seems to be carrying
relatively less weight than the other group
members.
Strength, Direction, Form
• There is a moderately strong, positive, linear
relationship between body weight and pack weight.
• It appears that lighter students are carrying lighter
backpacks.
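
One informal way to see why the 187-pound hiker stands out is to express each pack weight as a fraction of body weight. This ratio is an illustration added here, not a formal outlier test, and the variable names are made up for the sketch.

```python
body_weight = [120, 187, 109, 103, 131, 165, 158, 116]  # lb
pack_weight = [26, 30, 26, 24, 29, 35, 31, 28]          # lb

# Pair each hiker's body weight with pack weight as a fraction of body weight.
ratios = [(body, pack / body) for body, pack in zip(body_weight, pack_weight)]

# List hikers from the relatively lightest load to the relatively heaviest.
for body, ratio in sorted(ratios, key=lambda pair: pair[1]):
    print(f"{body} lb hiker carries {ratio:.1%} of body weight")

lightest_load = min(ratios, key=lambda pair: pair[1])
```

The 187-pound hiker carries the smallest share of body weight in the group, which matches the remark that this hiker is carrying relatively less than the others.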

Interpreting Scatterplots
Consider the SAT example from page 144. Interpret the
scatterplot.
Strength, Direction, Form
There is a moderately strong, negative, curved relationship
between the percent of students in a state who take the SAT
and the mean SAT math score.
Further, there are two distinct clusters of states and two
possible outliers that fall outside the overall pattern.
EXAMPLE
A sample of one-way Greyhound bus fares from Rochester, NY to
cities less than 750 miles away was taken by going to
Greyhound’s website. The following table gives the
destination city, the distance, and the one-way fare.

Destination City      Distance (miles)   Standard One-Way Fare ($)
Albany, NY            240                39
Baltimore, MD         430                81
Buffalo, NY           69                 17
Chicago, IL           607                96
Cleveland, OH         257                61
Montreal, QU          480                70.5
New York City, NY     340                65
Ottawa, ON            467                82
Philadelphia, PA      335                67
Potsdam, NY           239                47
Syracuse, NY          95                 20
Toronto, ON           178                35
Washington, DC        496                87

Which variable is the explanatory variable? Which is the response?
EXAMPLE SCATTERPLOT
[Scatterplot: Greyhound Bus Fares vs. Distance. Horizontal axis:
Distance from Rochester, NY (miles), scaled from 50 to 650.
Vertical axis: Standard One-Way Fare, scaled from $10 to $100.]
Verify the plot on your graphing calculator.
Copyright © 2005 Brooks/Cole, a division of
Thomson Learning, Inc.
FURTHER COMMENTS
It is possible that two points might have the same x value
with different y values. Notice that Potsdam (239) and
Albany (240) come very close to having the same x value
but the y values are $8 apart. Clearly, the value of y is not
determined solely by the x value (there are factors other
than distance that affect the fare).
In this example, the y value tends to increase as x
increases. We say that there is a positive relationship
between the variables distance and fare.
It appears that the y value (fare) could be predicted
reasonably well from the x value (distance) by finding a
line that is close to the points in the plot.
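
The slide does not say how to find "a line that is close to the points". A minimal sketch, assuming the least-squares line is used (a standard choice, but an assumption here), fits the fare data from the table above. The function name least_squares_line and the 300-mile trip are hypothetical, chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Distance from Rochester, NY (miles) and standard one-way fare ($), from the table.
distance = [240, 430, 69, 607, 257, 480, 340, 467, 335, 239, 95, 178, 496]
fare = [39, 81, 17, 96, 61, 70.5, 65, 82, 67, 47, 20, 35, 87]

slope, intercept = least_squares_line(distance, fare)
print(f"fare = {intercept:.2f} + {slope:.3f} * distance")

# Predicted fare for a hypothetical 300-mile trip (inside the observed range).
predicted = intercept + slope * 300
print(f"predicted fare for 300 miles: ${predicted:.2f}")
```

The positive slope agrees with the positive relationship described above. Because fare is not determined by distance alone, predictions from the line are approximate.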