Where have you been? - University of Toronto


Correlations

Renan Levine POL 242 July 12, 2006

Association:

Crosstabulation | Category Specifics | Symmetry Specification | Measure of Association | Indication of Direction
Nominal X Nominal | Only (2 X 2) | Symmetrical | Phi | Yes
Nominal X Nominal | Greater than (2 X 2) | Symmetrical | Cramer's V | No
Nominal X Ordinal | At least (2 X 3) | Asymmetrical | Cramer's V | No
Ordinal X Ordinal | Square (e.g., 3 X 3) | | Kendall's Tau b | Yes
Ordinal X Ordinal | Rectangle (e.g., 3 X 4) | | Kendall's Tau c | Yes
Interval X Interval (not taught yet) | - | - | Pearson's R (not yet taught) | Yes

Today: Correlations

  

Correlation is a measure of a relationship between variables. Measured with a coefficient [Pearson’s r] that ranges from -1 to 1.

Measure strength of relationship of interval or ratio variables

$$r = \frac{\sum (Z_x Z_y)}{n - 1}$$

Z_x = Z scores for the X variable; Z_y = Z scores for the Y variable.

Multiply each pair of Z scores, sum the products, and divide by the number of paired cases minus one. How to calculate Z scores can be found on-line.
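As a minimal sketch of the formula above, the following Python converts each variable to Z scores, multiplies the pairs, sums the products, and divides by n - 1. It assumes the Z scores use the sample standard deviation (the n - 1 version), which is what makes the n - 1 denominator give a coefficient between -1 and 1. The function names and the numbers are made up for illustration and do not come from the lecture.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return Z scores using the sample standard deviation (n - 1)."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation; needs at least 2 values
    return [(v - m) / s for v in values]


def pearson_r(xs, ys):
    """Pearson's r as the sum of paired Z-score products over n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same number of cases")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical paired cases, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r(x, y), 2))  # 0.8
```

A variable with no variation has a standard deviation of zero, so this sketch would fail with a division error in that case; a correlation is not defined there anyway.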

Correlation r

• Absolute values closer to 0 indicate that there is little or no linear relationship.
• Generally, 0.2-0.4 is weak, 0.4-0.6 is okay, 0.6 or higher is strong.
• If the correlation is very high, the two variables are probably closely related; consider combining them into an index or choosing just one variable.
• The closer the absolute value of the coefficient is to 1, the stronger the relationship between the variables being correlated.

Positive Relationship

If two variables are related positively or directly:

• r > 0
• Variables “track together”: high values on Variable X are associated with high values on Variable Y.
• Low values on X are associated with low values on Y.

Example

Robert D. Putnam; Robert Leonardi; Raffaella Y. Nanetti; Franco Pavoncello. “Explaining Institutional Success: The Case of Italian Regional Government.” The American Political Science Review 77:1 (Mar. 1983), pp. 55-74

More fun examples: http://www.nationmaster.com/correlations/eco_gdp-economy-gdp-nominal

Example II

[Scatterplot: Opinion towards Palin & McCain. X-axis: Feeling Thermometer McCain, 0 to 100. r = 0.84]

Negative or Inverse Relationship

Variables can be inversely or negatively related

• High values of X are associated with low values of Y.

Example – Negative / Inverse

[Scatterplot: Opinion towards Obama & McCain. X-axis: Feeling Thermometer McCain, 0 to 100. r = -0.68. Red = Republicans, blue = Democrats, grey diamonds = Independents. Source: Time/SRBI, Oct 3-6, ‘08]
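To make the sign of r concrete, here is a small sketch with invented thermometer ratings. It uses the deviation-score form of Pearson's r, which is algebraically the same as the Z-score formula. The six respondents and their ratings are hypothetical and are not the survey data shown in the plots.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r from deviations around each variable's mean."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical 0-100 feeling thermometer ratings from six respondents.
candidate = [10, 25, 40, 60, 75, 90]
running_mate = [15, 20, 50, 55, 80, 85]  # rises as candidate rises
opponent = [85, 80, 55, 50, 20, 15]      # falls as candidate rises

r_pos = pearson_r(candidate, running_mate)
r_neg = pearson_r(candidate, opponent)
print(r_pos > 0, r_neg < 0)  # True True
```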

Data

• You need interval-level data.
• You will find many interval-level variables in:
  • Countries / World
  • Provinces
  • Election studies (feeling thermometers, odds of party entering government, etc)
• You can often use the index you created as an interval-level variable.

Compare

[Scatterplots compared, including Opinion towards Palin & McCain. X-axis: Feeling Thermometer McCain, 0 to 100.]

Most points close to a line.

Lots more noise here. Typical of public opinion data.

Differences between Public Opinion and Aggregate Data

• Although it is not uncommon to have one or more outliers in aggregate data, public opinion data tends to be “noisy”.
• Feeling thermometer example:
  • Many respondents gave both candidates a 50;
  • Quite a few respondents liked both candidates,
  • Even though most who liked McCain disliked Obama.
• A Pearson’s r that counts as high for public opinion data may count as low for an association in aggregate data.
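The effect of noise on r can be illustrated with a small simulation. This is only a sketch under simple assumptions: Y is X plus random noise, the noise is normally distributed, and the values are not clipped to a 0-100 scale. The noise levels, the sample size, and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r from deviations around each variable's mean."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_r(noise_sd, n=1000, seed=242):
    """Correlate X with X plus normal noise of the given standard deviation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs = [rng.uniform(0, 100) for _ in range(n)]
    ys = [x + rng.gauss(0, noise_sd) for x in xs]
    return pearson_r(xs, ys)


r_low_noise = simulate_r(noise_sd=10)   # points close to a line
r_high_noise = simulate_r(noise_sd=60)  # much noisier cloud of points
print(r_low_noise > r_high_noise)  # True
```

The underlying relationship is the same in both runs; only the amount of noise changes, which is why the same r cannot be judged by one standard across noisy survey data and cleaner aggregate data.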

Guidelines for Public Opinion Data

MAGNITUDE OF ASSOCIATION | QUALIFICATION | COMMENTS
0.00 | No Relationship | Knowing the independent variable does not reduce the number of errors in predicting the dependent variable at all.
.00 to .15 | Not Useful | Not Acceptable
.15 to .20 | Very Weak | Minimally acceptable
.20 to .25 | Moderately Strong | Acceptable
.25 to .30 | Fairly Strong | Good Work
.30 to .40 | Strong | Great Work
.40 to .70 | Very Strong/Worrisomely Strong | EITHER an excellent relationship OR the two variables are measuring the same thing
.70 to .99 | Redundant (?) | Proceed with caution: are the two variables testing the same thing?
1.00 | Perfect Relationship | If we know the independent variable, we can predict the dependent variable with absolute success.

Rough Guidelines for Aggregate Data

MAGNITUDE OF ASSOCIATION | QUALIFICATION | COMMENTS
0.00 | No Relationship | Knowing the independent variable does not reduce the number of errors in predicting the dependent variable at all.
.00 to .30 | Not useful, very weak | Not Acceptable
.30 to .50 | Weak | Minimally acceptable
.50 to .70 | Fairly Strong | Acceptable
.70 to .85 | Strong | Good Work
.80 to .90 | Very Strong/Worrisomely Strong | EITHER an excellent relationship OR the two variables are measuring the same thing
.90 to .99 | Redundant (?) | Proceed with caution: are the two variables testing the same thing?
1.00 | Perfect Relationship | If we know the independent variable, we can predict the dependent variable with absolute success.
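The two sets of guidelines above can be written as a simple lookup, sketched below. Two assumptions are made that the slides do not state: a value exactly on a boundary is placed in the higher band, and the overlapping aggregate bands (.70 to .85 and .80 to .90) are split at .80. The names `PUBLIC_OPINION`, `AGGREGATE`, and `qualify` are invented for this sketch.

```python
# (lower bound, qualification) pairs, strongest band first.
# Assumption: boundary values fall into the higher band.
PUBLIC_OPINION = [
    (1.00, "Perfect Relationship"),
    (0.70, "Redundant (?)"),
    (0.40, "Very Strong/Worrisomely Strong"),
    (0.30, "Strong"),
    (0.25, "Fairly Strong"),
    (0.20, "Moderately Strong"),
    (0.15, "Very Weak"),
    (0.00, "Not Useful"),
]

# Assumption: the overlapping .70-.85 and .80-.90 bands are split at .80.
AGGREGATE = [
    (1.00, "Perfect Relationship"),
    (0.90, "Redundant (?)"),
    (0.80, "Very Strong/Worrisomely Strong"),
    (0.70, "Strong"),
    (0.50, "Fairly Strong"),
    (0.30, "Weak"),
    (0.00, "Not useful, very weak"),
]


def qualify(r, bands):
    """Return the guideline label for the absolute value of r."""
    size = abs(r)
    if size == 0:
        return "No Relationship"
    for lower, label in bands:
        if size >= lower:
            return label
    raise ValueError("bands must include a 0.00 lower bound")


print(qualify(-0.68, PUBLIC_OPINION))  # Very Strong/Worrisomely Strong
print(qualify(-0.68, AGGREGATE))       # Fairly Strong
```

The same coefficient lands in different bands depending on the kind of data, which is the point of keeping two tables.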

Very Strong or Worrisome??

• Public Opinion: above |0.40|
• Aggregate: above |0.80|
• But these are just guidelines. It depends on how good the data is:
  • Lots of variation in data
  • Large scale (10, 20, 100 pts), like prediction odds, physicians per 100,000 people, feeling thermometer scales
  • Number of observations (N)
    • Provinces dataset is small

Outstanding or the same?

• You either have an outstanding relationship OR the variables may be measuring the same idea.
  • Ex. unemployment and GDP both measure economic health
  • Ex. Feeling thermometer for Barack Obama and feeling thermometer for Joe Biden both measure attitudes towards the Democratic ticket
• Also inverse relationship
  • Example above: Obama and McCain feeling thermometers are different sides of the same coin, as both seem to measure partisanship.

Use Yo’ Brain

• Computer cannot tell you if it’s a good, strong relationship or two measures looking at the same thing.
• Need to understand what each variable is measuring
  • Same thought process as for the index creation.
• Use your knowledge of the world and theory to decide whether two variables measure the same thing or two different things.
• Example (above): Putnam’s relationship between civic culture and government performance.
• Failed states survey: it appears that the higher an indicator value, the worse off the country is in that particular field. http://www.fundforpeace.org/web/index.php?option=com_content&task=view&id=99&Itemid=140

Flip side

• A relationship you expect to be strong is, surprisingly, not?!
• Make certain both variables are interval
• Double check that you cleaned up the data
  • Missing values are coded as missing
• Next week: you may need to qualify the relationship, because some sub-group of the data may not be like the others and needs to be identified.
• Think about the relationship: maybe it’s not linear, so that the relationship is only present for part of the range.

Usefulness

• Quick, easy way to look at several variables to see if they are related.
• With strong association, you can begin to think about predicting values of Y based on a value of X.
  • Ex. Positive correlation: you know a high value of X is associated with a high value of Y!

Webstats Output

- - Correlation Coefficients - -

Cell contents: Coefficient (Pearson’s r) / (N) / P

            Q375A1      Q305        Q375A3      Q1005

Q375A1      1.0000      .2916       .5320       -.3163
            (  686)     (  666)     (  667)     (  672)
            P= .        P= .000     P= .000     P= .000

Q305        .2916       1.0000      .2679       -.1272
            (  666)     ( 2776)     (  660)     ( 2721)
            P= .000     P= .        P= .000     P= .000

Q375A3      .5320       .2679       1.0000      -.2020
            (  667)     (  660)     (  682)     (  666)
            P= .000     P= .000     P= .        P= .000

Q1005       -.3163      -.1272      -.2020      1.0000
            (  672)     ( 2721)     (  666)     ( 3181)
            P= .000     P= .000     P= .000     P= .
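The N in parentheses differs from cell to cell, which suggests that each coefficient is computed only on the respondents who answered both questions. The sketch below shows that idea with invented data. It is not Webstats code, and the item names and answers are hypothetical.

```python
from math import sqrt
from statistics import mean


def pairwise_r(xs, ys):
    """Return (r, N) using only cases where both answers are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy), n


# Hypothetical survey answers; None marks a missing value.
survey = {
    "item_a": [50, 70, None, 20, 90, 40],
    "item_b": [60, 80, 30, None, 85, 35],
    "item_c": [40, None, 70, 80, 10, 55],
}

# Print one line per cell of the correlation matrix.
for row in survey:
    for col in survey:
        r, n = pairwise_r(survey[row], survey[col])
        print(f"{row} x {col}: r = {r:.4f} (N = {n})")
```

Each variable correlates perfectly with itself, which is why the diagonal of the output is always 1.0000, and the diagonal N is simply the number of valid answers to that one question.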

Significance?

• Webstats will tell you whether or not the correlation coefficient is significant.
• Remember that this is just telling you whether the relationship may be due to chance.
  • Not the strength of the relationship
• Almost unheard of to have a strong relationship that is insignificant when using survey data.

So, don’t spend any time discussing significance.
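A quick calculation shows why significance says little about strength in large surveys. The sketch below uses the standard t statistic for a correlation, t = r * sqrt((n - 2) / (1 - r^2)), and approximates the two-sided p-value with the normal distribution, which is an assumption that is reasonable only for large N. The function names are invented for this sketch.

```python
from math import erfc, sqrt


def t_statistic(r, n):
    """t statistic for testing whether a Pearson's r differs from zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))


def approx_two_sided_p(t):
    """Two-sided p-value from the normal approximation (assumes large n)."""
    return erfc(abs(t) / sqrt(2))


# Weakest off-diagonal coefficient in the Webstats output: Q305 x Q1005.
r, n = -0.1272, 2721
t = t_statistic(r, n)
print(round(t, 1))                    # -6.7
print(approx_two_sided_p(t) < 0.001)  # True
```

With 2,721 cases, even a coefficient that the public opinion guidelines would call not useful is far from what chance alone would produce.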

What if non-interval/non-ratio?

• Usually more appropriate to use the other measures of association.
• Webstats will perform a correlation.
• Be ready for results to be less strong
• Program may report (instead of Pearson’s r):
  • Spearman: ordinal x ordinal
  • Point-biserial: one interval/ratio, one dichotomous
  • Phi: two dichotomous variables
• All interpreted the same way
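One reason these coefficients are read the same way is that Spearman's coefficient is Pearson's r computed on ranks instead of raw values. The sketch below shows this with tied values sharing the average of their ranks. The data are invented, and this is an illustration of the general technique, not of how Webstats computes it.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r from deviations around each variable's mean."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: Y always rises with X, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(spearman_rho(x, y), 4))  # 1.0
print(pearson_r(x, y) < 1)           # True
```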