No Slide Title


Correlation tests:

Correlation Coefficient:

A succinct measure of the strength of the relationship between two variables (e.g. height and weight, age and reaction time, IQ and exam score).

There are various types of correlation coefficient, for different purposes:

1. Pearson's "r":

Used when both X and Y variables are (a) continuous; (b) (ideally) measurements on interval or ratio scales; (c) normally distributed - e.g. height, weight, IQ.

2. Spearman's rho:

Used in the same circumstances as (1), except that the data need only be on an ordinal scale - e.g. attitudes, personality scores.

r is a parametric test: the data have to have certain characteristics (parameters) before it can be used.

rho is a non-parametric test - less fussy about the nature of the data on which it is performed.

Correlations vary between:

+1 (perfect positive correlation: as X increases, so does Y):

[Figure: plot of Y against X]

... and -1 (perfect negative correlation: as X increases, Y decreases, or vice versa):

[Figure: plot of Y against X]

r = 0 means no correlation between X and Y: changes in X are not associated with systematic changes in Y, or vice versa.

Calculating Pearson's r: a worked example:

Is there a relationship between the number of parties a person gives each month, and the amount of flour they purchase from Vinny Millar?

N = 10

Month   Flour production (X)   No. of parties (Y)   X²       Y²       XY
A       37                     75                   1369     5625     2775
B       41                     78                   1681     6084     3198
C       48                     88                   2304     7744     4224
D       32                     80                   1024     6400     2560
E       36                     78                   1296     6084     2808
F       30                     71                   900      5041     2130
G       40                     75                   1600     5625     3000
H       45                     83                   2025     6889     3735
I       39                     74                   1521     5476     2886
J       34                     74                   1156     5476     2516
Sum     ΣX = 382               ΣY = 776             ΣX² = 14876   ΣY² = 60444   ΣXY = 29832

\[
r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)}}
\]

Using our values (from the bottom row of the table): N = 10, ΣX = 382, ΣY = 776, ΣX² = 14876, ΣY² = 60444, ΣXY = 29832

\[
r = \frac{29832 - \dfrac{382 \times 776}{10}}{\sqrt{\left(14876 - \dfrac{382^2}{10}\right)\left(60444 - \dfrac{776^2}{10}\right)}}
\]

\[
r = \frac{29832 - 29643.20}{\sqrt{(14876 - 14592.40)(60444 - 60217.60)}}
\]

\[
r = \frac{188.80}{\sqrt{283.60 \times 226.40}} = \frac{188.80}{253.391} = .7451
\]

r is .75. This is a positive correlation: people who buy a lot of flour from Vinny Millar also hold a lot of parties (and vice versa).
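The arithmetic of the worked example can be checked with a short script. This is a minimal sketch in Python (the slides themselves use hand calculation and SPSS); the function name `pearson_r` and the variable names are illustrative choices, not part of the original material.

```python
from math import sqrt

# Data from the worked example (months A to J).
flour = [37, 41, 48, 32, 36, 30, 40, 45, 39, 34]    # X
parties = [75, 78, 88, 80, 78, 71, 75, 83, 74, 74]  # Y


def pearson_r(xs, ys):
    """Pearson's r computed from the raw-score sums used in the worked example."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


r = pearson_r(flour, parties)
print(round(r, 2))  # 0.75
```

The intermediate sums inside the function correspond to the bottom row of the table, so printing them is a quick way to find a slip in a hand calculation.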

How to interpret the size of a correlation:

r² is the "coefficient of determination". It tells us what proportion of the variation in the Y scores is associated with changes in X.

e.g., if r is .2, r² is 4% (.2 * .2 = .04 = 4%).

Only 4% of the variation in Y scores is attributable to Y's relationship with X. Thus, knowing a person's Y score tells you essentially nothing about what their X score might be.

Our correlation of .75 gives an r² of 56%.

An r of .9 gives an r² of 81% (.9 * .9 = .81).

Note that correlations become much stronger the closer they are to 1 (or -1).

Correlations of .6 or -.6 (r² = 36%) are much better than correlations of .3 or -.3 (r² = 9%): they account for four times as much of the variation, not merely twice as much!
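To see how quickly r² grows with r, the values quoted above can be tabulated. This is a small illustrative sketch in Python; the helper name `shared_variance` is made up for this example.

```python
def shared_variance(r):
    """Coefficient of determination: the square of the correlation coefficient."""
    return r * r


# Correlations mentioned in the text, printed with r squared as a percentage.
for r in (0.2, 0.3, 0.6, 0.75, 0.9):
    print(f"r = {r:.2f} -> r squared = {shared_variance(r):.0%}")
```

Doubling r from .3 to .6 multiplies r² by four, which is the point of the comparison above.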

Spearman's rho:

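Measures the degree of monotonicity rather than linearity in the relationship between two variables - i.e., the extent to which Y consistently increases (or consistently decreases) as X increases, whether or not the points fall on a straight line. Hence, it copes better than Pearson's r when the relationship is monotonic but non-linear.

[Figures: examples of monotonic but non-linear relationships ("e.g."), and an example of a relationship that is not monotonic ("But not")]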
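The difference between monotonicity and linearity can be illustrated with made-up data. In the sketch below, Y doubles each time X goes up by one, so the relationship is perfectly monotonic but clearly curved. The data, the function names, and the choice of Python are all assumptions for illustration; `spearman_rho` uses the shortcut formula rho = 1 - 6ΣD²/(N³ - N), which is only exact when there are no tied scores.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw-score sums."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sxy / sqrt(sxx * syy)


def simple_ranks(values):
    """Ranks (1 = lowest) for data that contain no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Shortcut formula 1 - 6*sum(D^2) / (N^3 - N); assumes no ties."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(simple_ranks(xs), simple_ranks(ys)))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2 ** v for v in x]  # hypothetical data: monotonic but non-linear

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

Because the ranks of X and Y are identical here, rho is exactly 1, whereas Pearson's r falls short of 1 because the points do not lie on a straight line.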

Spearman's rho - worked example:

Is there a correlation between the number of vitamin treatments a person has, and their score on a memory test?

N = 8

Subj:   No. vitamin treatments (X)   Memory test score (Y)   Vitamin treatment ranks (X)   Memory ranks (Y)   D (= X-Y)   D²
A       2                            22                      2                             1                  +1          1
B       1                            34                      1                             2                  -1          1
C       3                            36                      3.5                           3                  +0.5        0.25
D       4                            49                      5                             5                  0           0
E       3                            42                      3.5                           4                  -0.5        0.25
F       6                            57                      7                             6                  +1          1
G       5                            82                      6                             7.5                -1.5        2.25
H       8                            82                      8                             7.5                +0.5        0.25

ΣD² = 6.0

\[
\text{rho} = 1 - \frac{6 \sum D^2}{N^3 - N}
\]

OR

\[
\text{rho} = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}
\]

Step 1: assign ranks to the raw data, for each variable separately.

Rules for ranking:

(a) Give the lowest score a rank of 1; next lowest a rank of 2; etc.

(b) If two or more scores are identical, this is a "tie": give them the average of the ranks they would have obtained had they been different. The next score that is different, gets the rank it would have had if the tied scores had not occurred.

e.g.:

raw score   "original" rank   actual rank
12          1                 1
15          2                 2.5
15          3                 2.5
16          4                 4
17          5                 5

Rank for the tied scores is (2+3)/2 = 2.5

raw score   "original" rank   actual rank
3           1                 1
18          2                 3
18          3                 3
18          4                 3
100         5                 5

Rank for the tied scores is (2+3+4)/3 = 3
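The ranking rules, including the treatment of ties, can be written as a short function. This is one possible sketch in Python; the name `average_ranks` is an illustrative choice. It is applied to the two tie examples above.

```python
def average_ranks(scores):
    """Rank scores from lowest (rank 1) upward; tied scores share their mean rank."""
    ordered = sorted(scores)
    rank_of = {}
    for score in set(scores):
        first = ordered.index(score) + 1         # rank the first tied score would get
        last = first + ordered.count(score) - 1  # rank the last tied score would get
        rank_of[score] = (first + last) / 2      # average of the ranks they span
    return [rank_of[s] for s in scores]


print(average_ranks([12, 15, 15, 16, 17]))  # [1.0, 2.5, 2.5, 4.0, 5.0]
print(average_ranks([3, 18, 18, 18, 100]))  # [1.0, 3.0, 3.0, 3.0, 5.0]
```

Averaging the first and last rank of a tied group gives the same result as averaging all the ranks in the group, since those ranks are consecutive.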

Step 2:

Subtract one set of ranks from the other, to get a set of differences, D.

Step 3:

Square each of these differences, to get D².

Step 4:

Add up the values of D², to get ΣD². Here, ΣD² = 6.0 and N = 8.

Step 5:

\[
\text{rho} = 1 - \frac{6 \sum D^2}{N^3 - N} = 1 - \frac{6 \times 6.0}{512 - 8} = 1 - \frac{36}{504} = 0.929
\]

rho = .93.

There is a strong positive correlation between the number of vitamin treatments a person has, and their memory test score.

Pearson's r on the same data = .86.
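Steps 1 to 5 can be followed in code for the vitamin data. The sketch below is in Python with illustrative names (`average_ranks`, `treatments`, `memory`); it applies the shortcut formula from Step 5 and is not meant as a general-purpose implementation.

```python
def average_ranks(scores):
    """Step 1: rank from lowest (1) upward; tied scores share their mean rank."""
    ordered = sorted(scores)
    rank_of = {}
    for score in set(scores):
        first = ordered.index(score) + 1
        last = first + ordered.count(score) - 1
        rank_of[score] = (first + last) / 2
    return [rank_of[s] for s in scores]


# Raw data for subjects A to H.
treatments = [2, 1, 3, 4, 3, 6, 5, 8]      # X
memory = [22, 34, 36, 49, 42, 57, 82, 82]  # Y

x_ranks = average_ranks(treatments)
y_ranks = average_ranks(memory)

differences = [rx - ry for rx, ry in zip(x_ranks, y_ranks)]  # Step 2: D
squared = [d * d for d in differences]                       # Step 3: D squared
sum_d2 = sum(squared)                                        # Step 4: sum of D squared
n = len(treatments)
rho = 1 - 6 * sum_d2 / (n ** 3 - n)                          # Step 5

print(sum_d2)         # 6.0
print(round(rho, 3))  # 0.929
```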

Using SPSS/PASW to obtain scatterplots: (a) simple scatterplot: Graphs > Legacy Dialogs > Scatter/Dot...

Using SPSS/PASW to obtain scatterplots: (a) simple scatterplot: Graphs > Chartbuilder

1. Pick Scatter/Dot.
2. Drag "Simple scatter" icon into chart preview window.
3. Drag X and Y variables into x-axis and y-axis boxes in chart preview window.

Using SPSS/PASW to obtain scatterplots: (b) scatterplot with regression line: Analyze > Regression > Curve Estimation...
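For readers who want to see where a fitted straight line comes from, here is a minimal sketch in Python of an ordinary least-squares fit to the vitamin data. It assumes that the "Linear" model in Curve Estimation is a least-squares line; the variable names are illustrative, and only the data values come from the slides.

```python
# Raw data for subjects A to H.
treatments = [2, 1, 3, 4, 3, 6, 5, 8]      # X: number of vitamin treatments
memory = [22, 34, 36, 49, 42, 57, 82, 82]  # Y: memory test score

n = len(treatments)
mean_x = sum(treatments) / n
mean_y = sum(memory) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(treatments, memory))
sxx = sum((x - mean_x) ** 2 for x in treatments)
syy = sum((y - mean_y) ** 2 for y in memory)

slope = sxy / sxx                    # change in Y per extra treatment
intercept = mean_y - slope * mean_x  # predicted Y when X is 0
r_squared = sxy ** 2 / (sxx * syy)   # proportion of variation in Y accounted for

print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
# 8.333 17.167 0.736
```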

Model Summary and Parameter Estimates

Dependent Variable: memory score

            Model Summary                           Parameter Estimates
Equation    R Square   F        df1   df2   Sig.    Constant   b1
Linear      .736       16.741   1     6     .006    17.167     8.333

The independent variable is number of vitamin treatments.

"Constant" is the intercept, "b1" is the slope.

Using SPSS/PASW to obtain correlations: Analyze > Correlate > Bivariate...

Correlations

                                                     number of vitamin treatments   memory score
number of vitamin treatments   Pearson Correlation   1                              .858**
                               Sig. (2-tailed)                                      .006
                               N                     8                              8
memory score                   Pearson Correlation   .858**                         1
                               Sig. (2-tailed)       .006
                               N                     8                              8

**. Correlation is significant at the 0.01 level (2-tailed).

Correlations

Spearman's rho                                           number of vitamin treatments   memory score
number of vitamin treatments   Correlation Coefficient   1.000                          .928**
                               Sig. (2-tailed)           .                              .001
                               N                         8                              8
memory score                   Correlation Coefficient   .928**                         1.000
                               Sig. (2-tailed)           .001                           .
                               N                         8                              8

**. Correlation is significant at the 0.01 level (2-tailed).
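The SPSS output gives rho = .928, while the shortcut formula in Step 5 gave .929. One likely explanation, offered here as an assumption rather than something stated in the slides, is that the shortcut formula is only approximate when there are tied ranks, and that the exact value is obtained by computing Pearson's r on the two sets of ranks. The Python sketch below does this for the ranks from the worked example; the function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw-score sums."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sxy / sqrt(sxx * syy)


# Ranks from the worked example (subjects A to H), ties already averaged.
x_ranks = [2, 1, 3.5, 5, 3.5, 7, 6, 8]
y_ranks = [1, 2, 3, 5, 4, 6, 7.5, 7.5]

rho_from_ranks = pearson_r(x_ranks, y_ranks)
print(round(rho_from_ranks, 3))  # 0.928
```

With only two pairs of tied scores the two methods differ in the third decimal place, so both round to .93.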