Transcript SoTL Brown Bag - Juniata College
Gerald Kruse, Ph.D. & Cathy Stenson, Ph.D.
Juniata College Mathematics Department
CityMPG = EPA's estimated miles per gallon for city driving
Weight = Weight of the car (in pounds)
FuelCapacity = Size of the gas tank (in gallons)
QtrMile = Time (in seconds) to go 1/4 mile from a standing start
Acc060 = Time (in seconds) to accelerate from zero to 60 mph
PageNum = Page number on which the car appears in the buying guide
Place the letter for each pair on the chart below to indicate your guess as to the direction (negative, neutral, or positive) and strength of the association between the two variables.
(a) Weight vs. CityMPG
(b) Weight vs. FuelCapacity
(c) PageNum vs. FuelCapacity
(d) Weight vs. QtrMile
(e) Acc060 vs. QtrMile
(f) CityMPG vs. QtrMile
Strong Negative | Moderate Negative | Weak Negative | No Association | Weak Positive | Moderate Positive | Strong Positive
[Figure: Matrix Plot - Car Data. Scatterplot matrix of CityMPG, Weight, FuelCap, QtrMile, Acc060, and PageNum.]
Place the letter for each pair on the chart below to indicate your guess as to the direction (negative, neutral, or positive) and strength of the association between the two variables.
(a) Weight vs. CityMPG
(b) Weight vs. FuelCapacity
(c) PageNum vs. FuelCapacity
(d) Weight vs. QtrMile
(e) Acc060 vs. QtrMile
(f) CityMPG vs. QtrMile
Strong Negative: (a)
Moderate Negative: (d)
Weak Negative:
No Association: (c)
Weak Positive:
Moderate Positive: (f)
Strong Positive: (b), (e)
Measure of Correlation
Definition: The correlation, r, measures the strength of linear association between two quantitative variables.
\[ r = \frac{1}{n-1} \sum \left( \frac{X - \bar{X}}{S_X} \right) \left( \frac{Y - \bar{Y}}{S_Y} \right) \]
where
\bar{X} = mean of X values, S_X = Std Dev of X values
\bar{Y} = mean of Y values, S_Y = Std Dev of Y values
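As a rough illustration of the definition above, the following Python sketch computes r as the sum of products of standardized X and Y values, divided by n - 1. The function name `correlation` and the five data points are hypothetical, made up for illustration; they are not the 1999 car data.

```python
import math

def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical illustrative values, NOT taken from the car data set.
weights = [2400, 2800, 3100, 3300, 3600]   # pounds
city_mpg = [31, 26, 24, 20, 19]            # miles per gallon

r = correlation(weights, city_mpg)
print(round(r, 3))  # strongly negative: heavier cars, lower mileage
```

With real data, the two lists would be replaced by the Weight and CityMPG columns.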
Sample Correlations in 1999 Car Data
          CityMPG   Weight  FuelCap  QtrMile   Acc060
Weight     -0.907
FuelCap    -0.793    0.894
QtrMile     0.510   -0.450   -0.469
Acc060      0.506   -0.454   -0.465    0.994
PageNum     0.283   -0.237   -0.081    0.196    0.205
Place the letter for each pair on the chart below to indicate your guess as to the direction (negative, neutral, or positive) and strength of the association between the two variables.
(a) Weight vs. CityMPG
(b) Weight vs. FuelCapacity
(c) PageNum vs. FuelCapacity
(d) Weight vs. QtrMile
(e) Acc060 vs. QtrMile
(f) CityMPG vs. QtrMile
Strong Negative, r “between” -1.0 and -0.8: (a) = -0.907
Moderate Negative, r “between” -0.8 and -0.5: (d) = -0.450
Weak Negative, r “between” -0.5 and 0:
No Association, r “around” 0: (c) = -0.081
Weak Positive, r “between” 0 and 0.5:
Moderate Positive, r “between” 0.5 and 0.8: (f) = 0.510
Strong Positive, r “between” 0.8 and 1.0: (b) = 0.894, (e) = 0.994
1) -1 ≤ r ≤ 1
2) The sign indicates the direction of association:
   positive association: r > 0
   negative association: r < 0
   no linear association: r approx 0
3) The closer r is to ±1, the stronger the linear association
4) r has no units and does not depend on the units of measurement
5) The correlation between X and Y is the same as the correlation between Y and X
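The properties listed above can be checked numerically. The sketch below uses a hypothetical helper `pearson_r` (an algebraically equivalent form of the correlation formula) and made-up data, and compares r before and after converting weight from pounds to kilograms and after swapping X and Y.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via sums of squares (equivalent to the z-score form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative values, NOT taken from the car data set.
weight_lb = [2400, 2800, 3100, 3300, 3600]
city_mpg = [31, 26, 24, 20, 19]

r = pearson_r(weight_lb, city_mpg)

# Property 4: changing units (pounds -> kilograms) leaves r unchanged.
weight_kg = [w * 0.45359237 for w in weight_lb]
r_kg = pearson_r(weight_kg, city_mpg)

# Property 5: swapping X and Y leaves r unchanged.
r_swapped = pearson_r(city_mpg, weight_lb)

print(round(r, 4), round(r_kg, 4), round(r_swapped, 4))
```

All three printed values should agree, and the shared sign is negative because mileage falls as weight rises in this made-up sample.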
(0) faculty.juniata.edu/kruse
(1) Open the Excel file: ConsumerReportsCarData1999.xlsx
(2) Highlight column C, City MPG
(3) CTRL-click and highlight column F, Weight
(4) Insert -> Scatter -> Scatterplot
(5) Remove legend
(6) “Zoom” on axes
(7) Add axes titles
(8) Modify plot title, “City MPG vs. Weight”
(9) Add trendline
We were given that the r-value for this data is -0.907.
Excel calculated R² as 0.8225?
Let’s take the square root: 0.906918. If we round and attach the negative sign of the slope, we get -0.907, which is what we would expect.
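The same arithmetic can be sketched in Python. The R² value is the one Excel reported for the trendline; the variable names are hypothetical, and the sign has to be supplied separately because R² alone does not carry the direction of the association.

```python
import math

r_squared = 0.8225        # R^2 reported by Excel's trendline
slope_is_negative = True  # the City MPG vs. Weight trendline slopes downward

# The square root gives the magnitude of r; the slope's sign gives its direction.
r = math.sqrt(r_squared)
if slope_is_negative:
    r = -r

print(round(r, 6))  # -0.906918
print(round(r, 3))  # -0.907
```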
We could also calculate the r-value: (1) using the Data Analysis Add-In in Excel (2) by “hand,” in Excel
A correlation near zero does not (necessarily) mean that the two variables are unrelated.
EXAMPLE: A circus performer (the Human Cannonball) is interested in how the distance downrange (Y) that a projectile shot from a cannon will travel depends on the angle of elevation (X) of the cannon. Suppose that we designed an experiment to examine this relationship by test firing (dummies) at various angles ranging from X = 0° to X = 90°. Sketch a typical scatterplot that you might expect to see from such an experiment.
Would you say that there is likely to be a strong relationship between angle X and distance downrange Y? Estimate the correlation between the X and Y variables from your scatterplot.
Remember: Correlation measures the strength of linear association between two variables.
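[Figure: blank axes for the sketch, with distance downrange Y on the vertical axis and angle of elevation X on the horizontal axis, from 0 deg to 90 deg.]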
http://stat.duke.edu/courses/Fall12/sta101.002/Sec2-145.pdf
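To see why the correlation can be near zero here, one can simulate the experiment. The sketch below assumes an idealized projectile with no air resistance, so that distance = v² sin(2X) / g; this model, the muzzle speed, and the helper `pearson_r` are assumptions for illustration and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

G = 9.81       # gravitational acceleration, m/s^2
SPEED = 30.0   # hypothetical muzzle speed, m/s

# Test firings every 5 degrees from 0 to 90.
angles_deg = list(range(0, 91, 5))
distances = [SPEED ** 2 * math.sin(2 * math.radians(a)) / G for a in angles_deg]

r = pearson_r(angles_deg, distances)
best_angle = angles_deg[distances.index(max(distances))]

print(f"r = {r:.6f}, longest shot at {best_angle} degrees")
```

Under this model the distance depends entirely on the angle, yet r comes out at essentially zero because the curve rises to 45 degrees and then falls symmetrically. The relationship is strong but not linear.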
A strong correlation does not (necessarily) imply a cause/effect relationship.
[Figure: Life Expectancy vs. People/TV. Scatterplot of people per TV (vertical axis) against life expectancy in years (horizontal axis), with a fitted trendline.]
y = -5.5887x + 413.83, R² = 0.6461, R = -0.8038
Would you agree that there is a fairly strong negative association between these two variables?
Given this association, would it be reasonable to set a foreign policy goal to send lots of TVs to the countries with the lowest life expectancies, thus decreasing the number of people per TV and thereby helping the inhabitants to live longer lives?
http://www.public.iastate.edu/~pcaragea/S226S09/Notes/student.notes.section2.4.pdf
A strong correlation does not (necessarily) imply a cause/effect relationship.
http://www.nbcnews.com/id/41479869/ns/health-diet_and_nutrition/t/daily-diet-soda-tied-higher-risk-stroke-heart-attack/
The following web-page has a Java applet which can be used to construct scatterplots and calculate Pearson’s Correlation Coefficient. http://illuminations.nctm.org/LessonDetail.aspx?ID=L456
1) The coefficient of correlation lies between -1 and +1
2) The coefficient of correlation is independent of change of origin and scale
3) The coefficient of correlation is symmetric in X and Y
4) The coefficient of correlation measures only linear correlation between X and Y
5) If two variables X and Y are independent, the coefficient of correlation between them is zero (in sample data, approximately zero)