Transcript Chapter 7

Scatterplots, Association, and Correlation
Scatterplots
A scatterplot shows the relationship between two quantitative variables
measured on the same cases.
It reveals patterns, trends, and relationships in the data.
Association
Direction: A positive direction (association) means that as one variable
increases, the other tends to increase as well. A negative direction means that
as one variable increases, the other tends to decrease.
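Form: The overall shape of the pattern. Ideally the points fall roughly along a
straight line (linear form). If the pattern curves sharply, linear summaries
such as correlation are not useful for describing it.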
Strength: The association is strong if the points cluster tightly around the
form, with little scatter.
Variables
The response variable (the variable we want to predict or explain) goes on the
y-axis.
The explanatory variable (the variable that explains or predicts) goes on the
x-axis.
A lurking variable is a variable other than x and y that affects both of them
and may account for the correlation between the two.
Variables can have a strong association but still have a small correlation if the
association isn’t linear.
Correlation
Correlation numerically measures the direction and strength of the linear
relationship between the explanatory and response variables.
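Equation: $r = \dfrac{\sum z_x z_y}{n-1}$, where $z_x$ and $z_y$ are the
z-scores of each case's x and y values and n is the number of cases.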
Correlation is always between -1 and +1.
A correlation near zero corresponds to a weak linear association.
[Figure: example scatterplots showing strong correlation, weak correlation, and no correlation]
Outliers
Look for unusual features such as clusters or subgroups and outliers (points
that don’t fit the overall pattern).
When you see an outlier, report the correlations with and without the point.
Correlation is sensitive to outliers. A single outlying value can make a small
correlation large or make a large one small.
Problem #7
A study examined brain size (measured as
pixels counted in a digitized resonance
image of a cross-section of the brain) and
IQ (4 Performance scales of the Wechsler
IQ test) for college students. The
scatterplot shows the performance IQ
scores vs. the brain size. Comment on the
association between brain size and IQ as
seen in this scatterplot.
Answer: The data points are very spread out. There is, at most, an extremely
weak positive association between brain size and performance IQ. There is no
clear form: the points do not follow a line or a curve. Because the points are
so scattered, there may be no correlation at all.
Problem #9
A ceramics factory can fire eight large
batches of pottery a day. Sometimes in
the process a few of the pieces break.
In order to understand the problem
better, the factory records the number
of broken pieces in each batch for 3
days and then creates the scatterplot
shown.
a. Make a histogram showing the distribution of the number of broken pieces
in the 24 batches of pottery examined.
Answer: [Figure: histogram of the number of broken pieces in the 24 batches]
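For readers who want to draw such a histogram themselves, a minimal text-only sketch in Python follows. The function name `text_histogram` and the input list are hypothetical: the list is made up for illustration and is not the factory's recorded data, which appear only in the slide's scatterplot.

```python
from collections import Counter

def text_histogram(values):
    """Print a sideways text histogram: one row per whole-number value from
    min to max, with one '*' per case that has that value."""
    counts = Counter(values)
    for value in range(min(values), max(values) + 1):
        # counts[value] is 0 for values that never occur, giving an empty row.
        print(f"{value:>2} | {'*' * counts[value]}")
    return counts

# Made-up counts of broken pieces for a few batches (NOT the factory's data).
hypothetical_broken = [0, 1, 1, 2, 0, 3, 1, 5]
text_histogram(hypothetical_broken)
```

Printing a row for every value between the minimum and maximum, including values with no cases, keeps gaps visible, as in a real histogram.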
b. Describe the distribution as shown in the histogram. What feature of the
problem is more apparent in the histogram than in the scatterplot?
Answer: The histogram is unimodal. Apart from the first bar, the bar heights are
roughly uniform. The distribution is skewed to the right: the longer tail
extends toward batches with more broken pieces. The histogram makes the shape
of the distribution of broken pieces per batch (its center and skewness) more
apparent than the scatterplot does.
c. What aspect of the company’s problem is more apparent in the
scatterplot?
Answer: There is a weak positive association between batch number and the
number of broken pieces. You can tell because as batch number increases, the
number of broken pieces tends to increase, but the data points are widely
scattered and do not form a clear line, so the association is weak. This trend
across batches cannot be seen in the histogram.