Statistics - Healey Chapter 13-14
Week 12
Chapter 13 – Association between variables
measured at the ordinal level
&
Chapter 14: Association Between Variables
Measured at the Interval-Ratio Level
Chapter 13
Association Between Variables
Measured at the Ordinal Level
This Presentation
Two Types of Ordinal Variables
Gamma
Spearman’s Rho
Hypothesis Tests for Gamma and Rho
Two Types of Ordinal Variables
1. Continuous ordinal variables:
Have many possible scores
Resemble interval-ratio level variables
Use Spearman’s Rho: rs
Example: a scale measuring attitudes toward handgun
control with scores ranging from 0 to 20
2. Collapsed ordinal variables:
Have just a few values or scores
Use Gamma: G
Can also use Somers’ d and Kendall’s tau-b (see text website)
Example: social class measured as lower, middle, upper
Gamma
Gamma is used to measure the strength and direction of
the relationship between two ordinal level variables that
have been arrayed in a bivariate table
Before computing and interpreting Gamma, it will always
be useful to find and interpret the column percentages
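As a rough illustration of the column-percentage step, the sketch below converts a bivariate table of counts into column percentages. The counts are hypothetical (not taken from the slides), and the layout is an assumption: rows are categories of Y, columns are categories of X.

```python
# Hypothetical 3x3 table of counts (illustration only).
# Rows: categories of Y (low, moderate, high).
# Columns: categories of X (low, moderate, high).
table = [
    [20, 10, 5],
    [10, 15, 10],
    [5, 10, 20],
]


def column_percentages(counts):
    """Return each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*counts)]
    return [
        [100.0 * cell / total for cell, total in zip(row, col_totals)]
        for row in counts
    ]


percentages = column_percentages(table)
for row in percentages:
    print("  ".join(f"{p:5.1f}%" for p in row))
```

Reading across a row and comparing the percentages from column to column shows whether the distribution of Y changes as X changes.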
Gamma
Interpretation:
Use the table below as a guide to interpret the strength of
gamma in overall terms
Gamma
In addition to strength, gamma also identifies the
direction of the relationship
In a negative relationship, the variables change in
different directions
Example: As age increases, income decreases
(or, as age decreases, income increases)
In a positive relationship, the variables change in the
same direction
Example: As education increases, income
increases (or, as education decreases, income
decreases)
Gamma
In addition to strength and direction, a hypothesis test of
Gamma can also indicate if the two variables share a
relationship in the population, or if the two variables are
significantly related
Hypothesis Test of Gamma:
Step 1: Make Assumptions and Meet Test
Requirements
Random sampling
Ordinal level of measurement
Normal sampling distribution
Gamma
Hypothesis Test of Gamma:
Step 2: State the Null Hypothesis
Ho: γ = 0
No relationship exists between the variables
in the population
H1: γ ≠ 0
A relationship exists between the variables in
the population
Gamma
Hypothesis Test of Gamma:
Step 3: Select the Sampling Distribution and
Establish the Critical Region
Sampling distribution = Z distribution
Set alpha (two-tailed)
Look up Z(critical) in Appendix A
Gamma
Hypothesis Test of Gamma:
Step 4: Compute the Test Statistic
$$Z(\text{obtained}) = G\sqrt{\frac{N_s + N_d}{N(1 - G^2)}}$$
where
$$G = \frac{N_s - N_d}{N_s + N_d}$$
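The test statistic can be sketched in Python as follows. This is a minimal illustration, not the textbook's own procedure: it assumes Ns counts pairs of cases ranked in the same order on both variables and Nd counts pairs ranked in opposite order, and that the rows and columns of the table are both ordered from low to high. The 2x2 table of counts is hypothetical.

```python
import math


def count_pairs(table):
    """Count same-order (Ns) and opposite-order (Nd) pairs in a table
    whose rows and columns are both ordered from low to high."""
    ns = nd = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):      # only rows below the current cell
                for m in range(cols):
                    if m > j:                 # below and to the right
                        ns += table[i][j] * table[k][m]
                    elif m < j:               # below and to the left
                        nd += table[i][j] * table[k][m]
    return ns, nd


def gamma(ns, nd):
    """G = (Ns - Nd) / (Ns + Nd)."""
    return (ns - nd) / (ns + nd)


def z_obtained(g, ns, nd, n):
    """Z(obtained) = G * sqrt((Ns + Nd) / (N * (1 - G^2)))."""
    return g * math.sqrt((ns + nd) / (n * (1 - g ** 2)))


# Hypothetical counts (illustration only).
table = [
    [10, 5],
    [5, 10],
]
n = sum(sum(row) for row in table)
ns, nd = count_pairs(table)
g = gamma(ns, nd)
z = z_obtained(g, ns, nd, n)
print(ns, nd, g)  # 100 25 0.6
print(z)
```

The sign of G gives the direction of the relationship, and Z(obtained) is what gets compared with Z(critical) in Step 5.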
Gamma
Hypothesis Test of Gamma:
Step 5: Make a Decision and Interpret the
Results
Compare Z(obtained) to Z(critical)
If Z(obtained) falls in the critical region,
reject Ho
If Z(obtained) does not fall in the critical
region, fail to reject Ho
Interpret results
Spearman’s Rho (rs)
Measure of association for ordinal-level variables
with a broad range of different scores and few ties
between cases on either variable
Computing Spearman’s Rho
1.
Rank cases from high to low on each variable
2.
Use ranks, not the scores, to calculate Rho
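The two steps above can be sketched in Python. This assumes the commonly used computing formula rs = 1 - 6ΣD² / (N(N² - 1)), where D is the difference between a case's two ranks, and it assumes tied scores share the average of the ranks they occupy. The scores are hypothetical, not the jogging and self-image data from the slides.

```python
def rank_high_to_low(scores):
    """Rank 1 = highest score; tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """rs = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), computed on ranks."""
    rx, ry = rank_high_to_low(x), rank_high_to_low(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical scores for five cases (illustration only).
x = [18, 17, 15, 12, 10]
y = [15, 18, 12, 16, 6]
rho = spearman_rho(x, y)
print(rho)  # 0.5
```

Because only the ranks enter the calculation, rescaling the original scores without changing their order leaves Rho unchanged.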
Spearman’s Rho (rs)
Rho is positive, therefore jogging and self-image
share a positive relationship: as jogging rank
increases, self-image rank also increases
On its own, Rho does not have a good strength
interpretation
But Rho² is a PRE (proportional reduction in error) measure
For this example, Rho² = (0.86)² = 0.74
Therefore, we would make 74% fewer errors if we
used the rank on jogging to predict the rank on self-image
than if we ignored the rank on jogging
Spearman’s Rho (rs)
In addition to strength and direction, a hypothesis test of Rho can
also indicate if the two variables share a relationship in the
population, or if the two variables are significantly related
Hypothesis Test of Spearman’s Rho:
Step 1: Make Assumptions and Meet Test
Requirements
Random sampling
Ordinal level of measurement
Normal sampling distribution
Spearman’s Rho (rs)
Hypothesis Test of Spearman’s Rho:
Step 2: State the Null Hypothesis
Ho: ρs = 0
No relationship exists between the variables
in the population
H1: ρs ≠ 0
A relationship exists between the variables in
the population
Spearman’s Rho (rs)
Hypothesis Test of Spearman’s Rho:
Step 3: Select the Sampling Distribution and
Establish the Critical Region
Sampling distribution = Student’s t
Alpha = 0.05 (two-tailed)
Degrees of freedom = N-2 = 8
t(critical) = ±2.306
Spearman’s Rho (rs)
Hypothesis Test of Spearman’s Rho:
Step 4: Compute the Test Statistic
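The formula for this step did not survive in the transcript. A minimal sketch, assuming the usual t statistic for a rank correlation, t = rs · sqrt((N - 2) / (1 - rs²)), reproduces the value reported in Step 5 from rs = 0.86 and N = 10 (N is implied by the degrees of freedom, N - 2 = 8).

```python
import math


def t_obtained(coefficient, n):
    """t = coefficient * sqrt((N - 2) / (1 - coefficient^2))."""
    return coefficient * math.sqrt((n - 2) / (1 - coefficient ** 2))


rs = 0.86          # Rho from the jogging and self-image example
n = 10             # implied by degrees of freedom N - 2 = 8
t_critical = 2.306

t = t_obtained(rs, n)
print(round(t, 2))             # 4.77
print(abs(t) > t_critical)     # True: t(obtained) falls in the critical region
```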
Spearman’s Rho (rs)
Hypothesis Test of Spearman’s Rho:
Step 5: Make a Decision and Interpret the
Results
t(obtained) = 4.77
t(critical) = ±2.306
t(obtained) falls in the critical region, so
reject Ho
Jogging and self-image are related in the
population from which the sample was
drawn
Chapter 14
Association Between Variables
Measured at the Interval-Ratio Level
This Presentation
Scattergrams
Graphs that display relationships between two interval-ratio
variables
Regression Coefficients and the Regression Line
Regression line summarizes the linear relationship between
X and Y
Regression coefficients predict scores on Y from scores on
X
Pearson’s r
Preferred measure of association for two interval-ratio
variables
Coefficient of determination: r²
Correlation matrix
Scattergrams
Scattergrams have two dimensions:
The X (independent) variable is arrayed along the
horizontal axis
The Y (dependent) variable is arrayed along the
vertical axis
Each dot on a scattergram is a case
The dot is placed at the intersection of the case’s
scores on X and Y
Scattergrams
A regression line, which summarizes the linear
relationship between X and Y, is added to the graph
“Eyeball” a straight line that connects all of the
dots or comes as close as possible to connecting
all of the dots
To be more precise: calculate the conditional
mean of Y for each value of X, plot those values,
and connect the dots
Inspection of a scattergram should always be the
first step in assessing the relationship between two
interval-ratio level variables
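The conditional-means approach described above can be sketched as follows. The data are hypothetical, and the function name is an illustrative choice: for each distinct value of X, the sketch averages the Y scores of the cases that have that X score.

```python
from collections import defaultdict


def conditional_means(xs, ys):
    """Return {x value: mean of Y among cases with that x value}."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: sum(vals) / len(vals) for x, vals in sorted(groups.items())}


# Hypothetical scores for six cases (illustration only).
xs = [1, 1, 2, 2, 3, 3]
ys = [2, 4, 3, 5, 6, 6]

means = conditional_means(xs, ys)
print(means)  # {1: 3.0, 2: 4.0, 3: 6.0}
```

Plotting these means against X and connecting them traces the path that the regression line summarizes.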
Scattergrams
Linearity
A key assumption of scattergrams and regression
analysis is that X and Y share a linear relationship
In a linear relationship the dots of a scattergram form a
straight line pattern
Linear Relationship: Example
Scattergrams
Linearity
In a nonlinear relationship the dots do not form a straight
line pattern
Scattergrams
Three Questions
Does a relationship exist?
A relationship exists if the conditional
means of Y change across values of X
As long as the regression line lies at an
angle to the X axis (and is not parallel to
the X axis), we can conclude that a
relationship exists between the two
variables
Scattergrams
Three Questions
How strong is the relationship?
Strength of the relationship is determined by the
spread of the dots around the regression line
In a perfect association, all dots fall on the
regression line
In a stronger association, the dots fall close to
(are clustered tightly around) the regression
line
In a weaker association, the dots are spread
out relatively far from the regression line
Scattergrams
Three Questions
What is the direction of the relationship? (Direction of
association is determined by the angle of the
regression line)
What is the Direction of the Relationship?
Scattergrams
Based on this scattergram for percent college educated (X) and voter
turnout (Y) on election day for 50 states:
Does a relationship exist? How strong is the relationship?
What is the direction of the relationship? Is the relationship linear?
Scattergrams
Does a relationship exist?
The regression line falls at an angle to the X axis (it is not
parallel), therefore we can conclude that an association
exists between voter turnout and college education
Scattergrams
How strong is the relationship?
The greater the extent to which dots are clustered around
the regression line, the stronger the relationship
This relationship is weak to moderate in strength
Scattergrams
What is the direction of the relationship?
Positive: Regression line rises from lower-left to upper-right
Negative: Regression line falls from upper-left to lower-right
This is a positive relationship: As percent college educated
increases, voter turnout increases
Scattergrams
Is the relationship linear?
The conditional means on Y form a straight line, as
demonstrated by the regression line
Therefore, the relationship is linear
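For a regression line that is computed rather than eyeballed, one common choice is the ordinary least-squares line Y = a + bX. The slides do not show the formulas, so the sketch below is an assumption about the method: b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - bX̄. The data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + bX that minimizes squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in Y per unit change in X
    a = mean_y - b * mean_x    # intercept: predicted Y when X is 0
    return a, b


# Hypothetical scores for five cases (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

A positive slope b corresponds to a line rising from lower-left to upper-right; a slope of zero would be a line parallel to the X axis, meaning no linear relationship.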
Pearson’s r
Pearson’s r is a measure of association for interval-
ratio level variables
Pearson’s r can indicate the direction of association,
but it does not have an acceptable strength
interpretation
But, by squaring r, we obtain a PRE measure called
the coefficient of determination
The coefficient of determination indicates the
percentage of the variation in Y that is explained by X
Pearson’s r
Calculate r
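The computation itself is not preserved in the transcript. A minimal sketch, assuming the definitional formula r = Σ(X - X̄)(Y - Ȳ) / sqrt(Σ(X - X̄)² · Σ(Y - Ȳ)²), is shown below with hypothetical data (not the housework example from the slides).

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations around the means of X and Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five cases (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 2))       # 0.77
print(round(r ** 2, 2))  # 0.6, the coefficient of determination
```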
Pearson’s r
r = 0.50
r is positive, therefore the relationship between X and Y is
positive
As the number of children in dual-career families increases,
husbands’ hours of housework per week also increases
r² = (0.50)² = 0.25
r² is 0.25, therefore the number of children in dual-career
families explains 25% of the variation in husbands’ hours of
housework per week
Pearson’s r
Hypothesis Test of Pearson’s r
Step 1: Make Assumptions and Meet Test
Requirements
Random sample
Interval-ratio level measurement
Bivariate normal distributions
Linear relationship
Homoscedasticity
Normal sampling distribution
Pearson’s r
Hypothesis Test of Pearson’s r
Step 2: State the Null Hypothesis
Ho: ρ = 0
H1: ρ ≠ 0
Step 3: Select the Sampling Distribution and Establish the
Critical Region
Sampling distribution = Student’s t
Alpha = 0.05 (two-tailed)
Degrees of freedom = N-2 = 10
t(critical) = ±2.228
Pearson’s r
Hypothesis Test of Pearson’s r
Step 4: Compute Test Statistic
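The formula for this step is missing from the transcript. A minimal sketch, assuming the usual t statistic for a correlation, t = r · sqrt((N - 2) / (1 - r²)), reproduces the value reported in Step 5 from r = 0.50 and N = 12 (N is implied by the degrees of freedom, N - 2 = 10).

```python
import math


def t_obtained(r, n):
    """t = r * sqrt((N - 2) / (1 - r^2))."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.50           # Pearson's r from the housework example
n = 12             # implied by degrees of freedom N - 2 = 10
t_critical = 2.228

t = t_obtained(r, n)
print(round(t, 2))            # 1.83
print(abs(t) > t_critical)    # False: t(obtained) is outside the critical region
```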
Pearson’s r
Hypothesis Test of Pearson’s r
Step 5: Make a Decision and Interpret the Results
t(critical) = ±2.228
t(obtained) = 1.83
t(obtained) does not fall in the critical region, so we
fail to reject Ho
We cannot conclude that the two variables are related in the population
Correlation Matrix
A correlation matrix is a table that shows the relationships
between all possible pairs of variables
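A correlation matrix can be built by computing Pearson's r for every pair of variables. The sketch below is an illustration under stated assumptions: the variable names and scores are hypothetical, and the matrix is stored as a dictionary keyed by pairs of names.

```python
import math
from itertools import product


def pearson_r(xs, ys):
    """Pearson's r from deviations around the means of X and Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """Return {(name1, name2): r} for all possible pairs of variables."""
    names = list(data)
    return {(p, q): pearson_r(data[p], data[q]) for p, q in product(names, repeat=2)}


# Hypothetical variables measured on five cases (illustration only).
data = {
    "var_a": [1, 2, 3, 4, 5],
    "var_b": [2, 4, 5, 4, 5],
    "var_c": [5, 4, 3, 2, 1],
}

matrix = correlation_matrix(data)
names = list(data)
print("       " + "  ".join(f"{name:>6}" for name in names))
for p in names:
    print(f"{p:>6} " + "  ".join(f"{matrix[(p, q)]:6.2f}" for q in names))
```

Each variable correlates perfectly with itself, so the diagonal is 1.00, and the matrix is symmetric, so each correlation appears twice.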
Correlation Matrix
Using the matrix below:
What is the correlation between GDP and inequality?
Of all the variables correlated with Inequality, which has the
strongest relationship? The weakest?