Variance Partitions - University of South Florida


Collinearity

The Problem of Large Correlations Among the Independent Variables

Skill Set

• What is collinearity?

• Why is it a problem?

• How do I know if I’ve got it?

• What can I do about it?

Collinearity Defined

• Within the set of IVs, one or more IVs are (nearly) totally predicted by the other IVs.

• In such a case, the b or beta weights are poorly estimated.

• Problem of the “Bouncing Betas.”
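The "bouncing" can be seen in a small simulation. The sketch below is illustrative only: the sample size, the number of replications, the true b weights of 1, and the function name `simulate_b_sd` are all assumptions, not part of the original slides. It draws repeated samples with two IVs correlated at a chosen value and records how much the b weight for X1 varies from sample to sample.

```python
import numpy as np

def simulate_b_sd(r12, n=50, reps=2000, seed=0):
    """Return the SD, across repeated samples, of the b weight for X1."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, r12], [r12, 1.0]])  # population correlation between the two IVs
    b1 = np.empty(reps)
    for i in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = X[:, 0] + X[:, 1] + rng.normal(size=n)  # assumed true b weights: both 1
        design = np.column_stack([np.ones(n), X])   # add the constant
        coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
        b1[i] = coefs[1]
    return b1.std(ddof=1)

sd_low = simulate_b_sd(0.10)
sd_high = simulate_b_sd(0.95)
print(f"SD of b1 across samples, r12 = .10: {sd_low:.3f}")
print(f"SD of b1 across samples, r12 = .95: {sd_high:.3f}")
```

With the IVs nearly redundant, the estimates should scatter much more widely around the same true value, which is what "poorly estimated" means here.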

Diagnostics

1. Variance Inflation Factor (VIF).

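Standard error of the b weight with 2 IVs:

$$s_{b_{y1.2}} = \sqrt{\frac{s^2_{y.12}}{\sum x_1^2 \,(1 - r_{12}^2)}}$$

Sampling variance of the b weight:

$$s^2_{b_{y1.2}} = \frac{s^2_{y.12}}{\sum x_1^2 \,(1 - r_{12}^2)} = \frac{s^2_{y.12}}{\sum x_1^2} \times \frac{1}{1 - r_{12}^2}$$

The last factor, $1/(1 - r_{12}^2)$, is the VIF.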
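As a quick worked example of the two-IV case (the value $r_{12} = .90$ is an assumption chosen only for illustration):

$$\mathrm{VIF} = \frac{1}{1 - r_{12}^2} = \frac{1}{1 - .81} = \frac{1}{.19} \approx 5.26, \qquad \sqrt{5.26} \approx 2.29$$

So with the IVs correlated at .90, the sampling variance of the b weight is roughly 5.3 times what it would be with uncorrelated IVs, and its standard error is roughly 2.3 times as large.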

VIF (2)

Standard error with k predictors:

$$s_{b_{y1.2 \ldots k}} = \sqrt{\frac{s^2_{y.12 \ldots k}}{\sum x_1^2 \,(1 - R^2_{1.2 \ldots k})}}$$

$$\mathrm{VIF} = \frac{1}{1 - R^2_{1.2 \ldots k}}$$

Large values of VIF are trouble. Some say values > 10 are high.

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
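The VIF for each IV can be computed directly from its definition: regress that IV on all the other IVs, take the resulting R-squared, and form 1 / (1 - R-squared). The following is a minimal sketch with NumPy; the function name `vif` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (n cases by k IVs), from the R^2 of each IV on the rest."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        target = X[:, i]
        # Predict IV i from a constant plus all the remaining IVs.
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coefs, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coefs
        r2 = 1.0 - resid.var() / target.var()
        out[i] = 1.0 / (1.0 - r2)
    return out

# Hypothetical data: x1 and x2 correlate about .9; x3 is unrelated to both.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

In this setup the first two VIFs should be well above 1 and the third close to 1, its minimum possible value.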

Tolerance

Tolerance is

$$\mathrm{Tol}_i = 1 - R_i^2 = \frac{1}{\mathrm{VIF}_i}$$

Small values are trouble. Maybe .10?
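Tolerance can also be obtained without running a separate regression for each IV, because the diagonal of the inverse of the IV correlation matrix holds the VIFs. The sketch below uses that shortcut; the function name `tolerance`, the simulated data, and the .10 cutoff applied in the last line are illustrative assumptions.

```python
import numpy as np

def tolerance(X):
    """Tolerance of each IV: 1 / VIF, with VIFs read off the inverse correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    vifs = np.diag(np.linalg.inv(R))
    return 1.0 / vifs

# Hypothetical data: x2 is almost a copy of x1; x3 is unrelated to both.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + 0.2 * rng.normal(size=200)
x3 = rng.normal(size=200)
tol = tolerance(np.column_stack([x1, x2, x3]))
flagged = tol < 0.10  # rule-of-thumb cutoff from the slide
print(np.round(tol, 3), flagged)
```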

Condition Index

$$CI_i = \sqrt{\frac{\lambda_{\max}}{\lambda_i}}$$

Lambda is an eigenvalue.

                                   Variance Proportions
Number  Eigenval  Condition Index  Constant  X1    X2    X3
1       3.771     1.00             .004      .006  .006  .008
2       .106      5.969            .003      .029  .268  .774
3       .079      6.90             .000      .749  .397  .066
4       .039      9.946            .993      .215  .329  .152

Number refers to a linear combination of the predictors.

Eigenvalue refers to the variance of that combination.

Collinearity is spotted by finding 2 or more variables that have large proportions of variance (.50 or more) that correspond to large condition indices. A rule of thumb is to label as large those condition indices in the range of 30 or larger. No apparent problem here.
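One common way to produce a table like this is sketched below. It is an assumption that the software behind the slide works the same way: the design matrix (constant plus IVs) is scaled so each column has unit length, the eigenvalues of its cross-product matrix give the condition indices, and the squared eigenvector elements divided by the eigenvalues give the variance proportions. The function name and the simulated data are hypothetical.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Eigenvalues, condition indices, and variance proportions for a constant plus the IVs in X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])        # add the constant
    Z = Z / np.sqrt((Z ** 2).sum(axis=0))       # scale each column to unit length
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort: largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cond_index = np.sqrt(eigvals[0] / eigvals)
    phi = eigvecs ** 2 / eigvals                # rows: coefficients, columns: dimensions
    props = (phi / phi.sum(axis=1, keepdims=True)).T  # rows: dimensions, columns: coefficients
    return eigvals, cond_index, props

# Hypothetical data: x3 is nearly a copy of x2; x1 is unrelated to both.
rng = np.random.default_rng(3)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
x3 = x2 + 0.2 * rng.normal(size=300)
eigvals, cond_index, props = collinearity_diagnostics(np.column_stack([x1, x2, x3]))
for number, (ev, ci, row) in enumerate(zip(eigvals, cond_index, props), start=1):
    print(number, round(ev, 3), round(ci, 3), np.round(row, 3))
```

Each column of the variance proportions sums to 1, as in the slide's table, and in this simulated case the dimension with the largest condition index should carry most of the variance for both X2 and X3.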

Condition Index (2)

                                   Variance Proportions
Number  Eigenval  Condition Index  Constant  X1    X2    X3
1       3.819     1.00             .004      .006  .002  .002
2       .117      5.707            .043      .384  .041  .087
3       .047      9.025            .876      .608  .001  .042
4       .017      15.128           .077      .002  .967  .868

The last condition index (15.128) is highly associated with X2 and X3. The b weights for X2 and X3 are probably not well estimated.

Dealing with Collinearity

• Lump it. Admit the ambiguity; report the SEs of the b weights. Refer also to the correlations.

• Select or combine variables.

• Factor analyze the set of IVs.

• Use another type of analysis (e.g., path analysis).

• Use another type of regression (ridge regression).

• Unit weights (no longer regression).
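To illustrate the ridge regression option from the list above, here is a minimal sketch in standardized form: a small constant is added to the diagonal of the IV correlation matrix before solving for the beta weights. The function name `ridge_weights`, the parameter name `penalty`, the value .1, and the simulated data are all assumptions for illustration; choosing the penalty in practice is a separate question not covered here.

```python
import numpy as np

def ridge_weights(X, y, penalty):
    """Standardized ridge weights: solve (R + penalty * I) beta = r.

    R is the IV correlation matrix and r holds the IV-criterion correlations.
    penalty = 0 reproduces the ordinary least squares beta weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Zx = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score the IVs
    zy = (y - y.mean()) / y.std()              # z-score the criterion
    R = Zx.T @ Zx / n
    r = Zx.T @ zy / n
    return np.linalg.solve(R + penalty * np.eye(X.shape[1]), r)

# Hypothetical data: two nearly redundant IVs.
rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = x1 + 0.2 * rng.normal(size=200)
y = x1 + x2 + rng.normal(size=200)
X = np.column_stack([x1, x2])
beta_ols = ridge_weights(X, y, 0.0)
beta_ridge = ridge_weights(X, y, 0.1)
print("OLS betas:  ", np.round(beta_ols, 3))
print("Ridge betas:", np.round(beta_ridge, 3))
```

The ridge weights are biased toward zero but vary less from sample to sample, which is the trade being made when collinearity is severe.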

Review

• What is collinearity?

• Why is collinearity a problem?

• What is the VIF?

• What is Tolerance?

• What is a condition index?

• What are some things you can do to deal with collinearity?