Lecture 4: Correlation and Regression
Laura McAvinue, School of Psychology, Trinity College Dublin
Correlation
• Relationship between two variables
  – Do two variables co-vary / co-relate?
    • Is mathematical ability related to IQ?
    • Are depression and anxiety related?
  – Does variable Y vary as a function of variable X?
    • Does error awareness vary as a function of ability to sustain attention?
    • Does accuracy of memory decline with age?
Correlation
• Direction
  – Do both variables move in the same direction?
  – Do they move in opposite directions?
• Degree
  – What is the degree or strength of the relationship?
• Analysis
  – Scatterplot
  – Correlation Coefficient
    • Statistical significance
Scatterplot
• Describe the relationship between the two variables using a scatterplot
  – Visual representation of the relationship between the variables
  – Plot each observation in the study, displaying its value on variable X and variable Y
• Place the predictor variable on the X axis
  – The independent variable, which is making the prediction
• Place the criterion variable on the Y axis
  – The dependent variable, which is being predicted
Participant   1   2   3
Anxiety       1   3   6
Depression    1   3   6
[Scatterplot: Depression (Dep) on the Y axis against Anxiety (Anx) on the X axis]
No Relationship
Random Scatter
Positive Relationship
Direction in scatter
Negative Relationship
Direction in Scatter
Sometimes, the direction of the relationship might not be as obvious…
What is the relationship between verbal coherence and the number of pints of beer consumed?
[Scatterplot; X axis: No. of Pints]
Regression Line
• Useful to add a regression line
  – Model of the relationship
  – Straight line that best represents the relationship between the two variables
    • ‘The line of best fit’
  – Helps us to understand the direction of the relationship
Adding the regression line helps us see the direction of the relationship
[Scatterplot with regression line; X axis: No. of Pints]
Direction of Relationship
• Positive
  – Two variables tend to move in the same direction
    • As X increases, Y also increases
    • As X decreases, Y also decreases
• Negative
  – Two variables tend to move in opposite directions
    • As X increases, Y decreases
    • As X decreases, Y increases
A Positive Relationship
A Negative Relationship
Degree of Relationship
• Degree or strength of relationship
  – Calculate a correlation coefficient
    • Pearson Product-Moment Correlation Coefficient (r)
  – Statistic that varies between -1 and 1
    • r = 0, no relationship between the variables
      – Change in X is not associated with systematic change in Y
    • r = 1, perfect positive correlation
      – Increase in X associated with systematic increase in Y
    • r = -1, perfect negative correlation
      – Increase in X associated with systematic decrease in Y
Interpretation of r
• -1: Perfect negative relationship
• 0: Absolutely no relationship
• +1: Perfect positive relationship
The closer Pearson r is to one of the extremes, the stronger the relationship between the variables
Calculation of Pearson r
• Based on the covariance
  – A statistic representing the degree to which two variables vary together
  – Based on how an observation deviates from the mean on each variable
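The slide describes the covariance in words only. As a sketch, the usual sample formula is written out here; the N - 1 denominator and the symbols X̄, Ȳ (the means) and N (the number of observations) are assumptions, since the slides do not show the formula itself. Each observation contributes the product of its deviation from the mean on X and its deviation from the mean on Y:

```latex
\mathrm{cov}_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}
```

When high values of X tend to go with high values of Y, the products are mostly positive and the covariance is positive; when high values of X tend to go with low values of Y, it is negative.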
Calculation of Pearson r
• Covariance is not suitable as a measure of the degree of relationship
  – Its absolute value is a function of the standard deviations
  – Scale the covariance by the standard deviations
    • Pearson r
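To make the scaling step concrete, here is a minimal Python sketch. It assumes the sample (N - 1) forms of the covariance and standard deviation, which the slides do not spell out, and the function names are illustrative. The data are the three anxiety and depression scores from the scatterplot slide; the rescaled variable is added only to show that the covariance changes with the units while r does not.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance: summed cross-products of deviations, divided by N - 1."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def std_dev(values):
    """Sample standard deviation (the covariance of a variable with itself, square-rooted)."""
    return math.sqrt(covariance(values, values))

def pearson_r(x, y):
    """Pearson r: the covariance scaled by the two standard deviations."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))

# Scores for the three participants on the scatterplot slide
anxiety = [1, 3, 6]
depression = [1, 3, 6]
r_same = pearson_r(anxiety, depression)

# Multiply anxiety by 10 (a change of units, for illustration only):
# the covariance grows tenfold, but r is unchanged.
anxiety_x10 = [10 * a for a in anxiety]
cov_original = covariance(anxiety, depression)
cov_rescaled = covariance(anxiety_x10, depression)
r_rescaled = pearson_r(anxiety_x10, depression)

print(cov_original, cov_rescaled)
print(r_same, r_rescaled)
```

Because the two score lists are identical, r comes out at 1 (up to floating-point rounding), a perfect positive correlation.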
Assessing Magnitude of r
• Cohen’s (1988) standards
  – Small: .1 - .29
  – Medium: .3 - .49
  – Large: .5 - 1
• Statistical Significance
  – Test the null hypothesis that the true correlation in the population (rho, ρ) is zero
    • H0: ρ = 0
  – Calculate the probability of obtaining a correlation at least this size if the true correlation is zero
  – If p < .05, reject H0 and conclude that the result is unlikely to be due to chance: the correlation obtained reflects a true, nonzero correlation in the population
Summary
• Interested in the relationship between two variables
• Direction and degree of relationship
  – Scatterplot & regression line
    • Direction
  – Correlation Coefficient
    • Magnitude
    • Statistical significance
As temperature increases, ice-cream consumption increases
r = .73 (large), n = 12, p = .007
As temperature increases, hot whiskey consumption decreases
r = -.908 (large), n = 12, p < .001
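The p values in these two examples can be approximately reproduced from r and n alone. The sketch below assumes the standard t test for H0: ρ = 0, with t = r√(n - 2) / √(1 - r²) on n - 2 degrees of freedom, and a two-tailed p; the slides do not show this calculation, so treat it as one plausible route to the reported values. The tail area is found by simple numerical integration so that only the standard library is needed; the function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def correlation_p_value(r, n, steps=10_000):
    """Return (t, two-tailed p) for H0: rho = 0. Requires -1 < r < 1 and n > 2."""
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    # Simpson's rule (steps must be even) for the area under the density from 0 to t
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -t and +t, by symmetry
    return t, 1 - central_area

# r and n as reported on the two example slides
t_ice, p_ice = correlation_p_value(0.73, 12)
t_whiskey, p_whiskey = correlation_p_value(-0.908, 12)

print(f"ice-cream:   t = {t_ice:.2f}, p = {p_ice:.4f}")
print(f"hot whiskey: t = {t_whiskey:.2f}, p = {p_whiskey:.6f}")
```

The sign of r does not affect the two-tailed p, which is why the function works with the absolute value of r.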
Issues to consider
• Assumption of linearity
  – Pearson correlation assumes there is a linear relationship between the two variables
  – Assumes the relationship can be represented by a straight line
  – It is possible that the relationship might be better represented by a curved line
• Examine scatterplot
  – Curve-fitting procedures
[Scatterplot: Linear? (X axis: VAR00003)]
[Scatterplot: Non-linear? (X axis: VAR00003)]
[Scatterplot: Non-linear (X axis: STRESS)]
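A short sketch of why the linearity assumption matters: Pearson r only measures how well a straight line describes the data, so a strong curved relationship can produce an r of zero. The values below are constructed for illustration (a perfect U-shape and a perfect straight line); they are not data from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson r from deviation sums (the N - 1 terms cancel, so they are omitted)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = list(range(11))                    # 0, 1, ..., 10
y_curve = [(xi - 5) ** 2 for xi in x]  # Y is fully determined by X, but along a U-shaped curve
y_line = [2 * xi + 1 for xi in x]      # Y is fully determined by X along a straight line

r_curve = pearson_r(x, y_curve)
r_line = pearson_r(x, y_line)

print(r_curve)  # zero: the falling and rising halves of the U cancel out
print(r_line)   # 1 (up to rounding): perfect positive linear relationship
```

In both cases Y can be predicted perfectly from X, yet only the straight-line case yields a large r, which is why the scatterplot should be examined before the coefficient is interpreted.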
Issues to consider
• Correlation can be affected by
  – Range restrictions
  – Heterogeneous subsamples
  – Extreme observations
• Correlation does not mean causation
Regression
• The regression line
  – A straight line that represents the relationship between two variables
  – Useful to add to the scatterplot to help us see the direction of the relationship
  – But it’s much more than this…
• Prediction
  – Regression line enables us to predict Variable Y on the basis of Variable X
Regression
• If you have an equation of the line that represents the relationship between Variables X & Y, you can use it to predict a value of Y given a certain value of X.
[Figure: prediction read off the regression line; X = 63 gives Y’ = 45]
Regression Equation
$\hat{Y} = bX + a$
• $\hat{Y}$: Predicted value of Y
• X: Predicting value of X
• b, a: Regression Coefficients
The basic equation of a line
Regression Equation
$\hat{Y} = bX + a$
• b: The slope of the regression line
  – The amount of change in Y associated with a one-unit change in X
• a: The intercept
  – The point where the regression line crosses the Y axis
  – The predicted value of Y when X = 0
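Regression Equation
$\hat{Y} = bX + a$
[Figure: regression line with Y’ on the Y axis and X on the X axis, intercept a and slope b marked]
[Figure panels: Same intercept, different slopes; Same slope, different intercepts]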
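The roles of the slope and the intercept can be checked with a small Python sketch. The least-squares formulas used to obtain b and a (b as the summed cross-products of deviations divided by the summed squared deviations of X, and a = Ȳ - bX̄) are the standard ones but are not given on these slides, so they are an assumption here, as are the function names. The data are hypothetical values chosen to lie exactly on a line.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line Y-hat = b*X + a."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx  # the fitted line passes through the point of means
    return a, b

def predict(x_value, a, b):
    """Predicted value of Y for a given value of X."""
    return b * x_value + a

# Hypothetical data lying exactly on Y = 2X + 1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

a, b = fit_line(x, y)
at_zero = predict(0, a, b)                       # equals the intercept a
one_unit = predict(4, a, b) - predict(3, a, b)   # equals the slope b

print(a, b)
print(at_zero, one_unit)
```

The prediction at X = 0 returns the intercept, and moving X up by one unit changes the prediction by exactly the slope, matching the definitions of a and b above.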
Summary
• The relationship between two variables, X & Y
• Correlation
  – Degree and direction of relationship
• Regression
  – Predict Y, given X
  – More on regression next lecture…