Stepwise Regression

SAS

Download the Data

• http://core.ecu.edu/psyc/wuenschk/StatData/StatData.htm

GPA   GRE_Q   GRE_V   MAT   AR
3.2   625     540     65    2.7
4.1   575     680     75    4.5
3.0   520     480     65    2.5
2.6   545     520     55    3.1
3.7   520     490     75    3.6
4.0   655     535     65    4.3
4.3   630     720     75    4.6
2.7   500     500     75    3.0
(and so on)

Download the SAS Code

• http://core.ecu.edu/psyc/wuenschk/SAS/SAS-Programs.htm

data grades;
  infile 'C:\Users\Vati\Documents\StatData\MultReg.dat';
  input GPA GRE_Q GRE_V MAT AR;

PROC REG;
  a: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2 selection=forward slentry=.05 details;
run;

Forward Selection, Step 1

Statistics for Entry, DF = 1,28

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      1.000000    0.3735           16.69     0.0003
GRE_V      1.000000    0.3381           14.30     0.0008
MAT        1.000000    0.3651           16.10     0.0004
AR         1.000000    0.3853           17.55     0.0003

All predictors have p < the slentry value of .05. AR has the lowest p (its F is the largest), so AR enters first.

Step 2

Statistics for Entry, DF = 1,27

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.742099    0.5033           6.41      0.0174
GRE_V      0.835714    0.5155           7.26      0.0120
MAT        0.724599    0.4923           5.69      0.0243

All predictors have p < the slentry value of .05. GRE_V has the lowest p, so GRE_V enters second.
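The F values in the two entry tables can be recovered from the model R-square values alone. What follows is a minimal sketch in Python (not SAS, and used here only to check the arithmetic), assuming the usual F-to-enter formula, F = (R²new - R²old) / ((1 - R²new) / df_error). The function name `f_to_enter` is made up for this illustration, and small differences from the SAS output are expected because the R-square values are rounded to four places.

```python
def f_to_enter(r2_new, r2_old, df_error):
    """F (1, df_error) for the gain in R-square when one predictor is added."""
    return (r2_new - r2_old) / ((1.0 - r2_new) / df_error)

# Step 1: nothing is in the model yet, so r2_old = 0 and df_error = 28.
step1 = {"GRE_Q": 0.3735, "GRE_V": 0.3381, "MAT": 0.3651, "AR": 0.3853}
f_step1 = {var: f_to_enter(r2, 0.0, 28) for var, r2 in step1.items()}

# Step 2: AR is already in the model (R-square = .3853), so df_error = 27.
step2 = {"GRE_Q": 0.5033, "GRE_V": 0.5155, "MAT": 0.4923}
f_step2 = {var: f_to_enter(r2, 0.3853, 27) for var, r2 in step2.items()}

for label, table in (("Step 1", f_step1), ("Step 2", f_step2)):
    for var, f in table.items():
        print(f"{label}  {var:6s} F = {f:5.2f}")
```

The predictor with the largest F at each step is the one with the smallest p, which is why AR and then GRE_V are picked.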

Step 3

Statistics for Entry, DF = 1,26

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.659821    0.5716           3.41      0.0764
MAT        0.670304    0.5719           3.42      0.0756

No predictor has p < .05, so forward selection terminates.

The Final Model

Parameter Estimates

Variable    DF   Parameter   Standard   t Value   Pr > |t|   Standardized   Squared Semi-partial
                 Estimate    Error                           Estimate       Corr Type II
Intercept   1    0.49718     0.57652    0.86      0.3961     0              .
GRE_V       1    0.00285     0.00106    2.69      0.0120     0.39470        0.13020
AR          1    0.32963     0.10483    3.14      0.0040     0.46074        0.17740

R² = .516, F(2, 27) = 14.36, p < .001
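The squared semipartial correlations in this table can be checked two ways from numbers already shown. The Python sketch below (variable names are illustrative only) uses the facts that a squared semipartial is the R-square lost when that predictor is dropped, and that it also equals the squared standardized estimate times the predictor's tolerance. The tolerance .835714 is the value SAS reported for GRE_V at Step 2, when only AR was in the model; with just two predictors the same tolerance applies to AR.

```python
r2_both = 0.5155        # GRE_V and AR together
r2_ar_only = 0.3853     # AR alone
r2_grev_only = 0.3381   # GRE_V alone
tolerance = 0.835714    # GRE_V given AR (Step 2); same for AR given GRE_V

# Way 1: squared semipartial = R-square lost when the predictor is dropped.
sr2_grev_drop = r2_both - r2_ar_only
sr2_ar_drop = r2_both - r2_grev_only

# Way 2: squared semipartial = (standardized estimate)^2 * tolerance.
sr2_grev_beta = 0.39470 ** 2 * tolerance
sr2_ar_beta = 0.46074 ** 2 * tolerance

print(f"GRE_V: {sr2_grev_drop:.4f} vs {sr2_grev_beta:.4f}")
print(f"AR:    {sr2_ar_drop:.4f} vs {sr2_ar_beta:.4f}")
```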

Backward Selection

b: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2 selection=backward slstay=.05 details;
run;

• We start out with a simultaneous multiple regression, including all predictors.

• Then we trim that model.

Step 1

Variable    Parameter   Standard   Type II SS   F Value   Pr > F
            Estimate    Error
Intercept   -1.73811    0.95074    0.50153      3.34      0.0795
GRE_Q       0.00400     0.00183    0.71582      4.77      0.0385
GRE_V       0.00152     0.00105    0.31588      2.11      0.1593
MAT         0.02090     0.00955    0.71861      4.79      0.0382
AR          0.14423     0.11300    0.24448      1.63      0.2135

GRE_V and AR have p values that exceed the slstay value of .05. AR has the larger p, so it is dropped from the model.
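Two relationships tie the columns of this table together: each F is the square of estimate / standard error (the t statistic), and Type II SS divided by F gives the error mean square of the full model. A short Python check, using only the printed values (so agreement is approximate because of rounding):

```python
rows = {
    # variable: (estimate, standard error, Type II SS, F printed by SAS)
    "Intercept": (-1.73811, 0.95074, 0.50153, 3.34),
    "GRE_Q": (0.00400, 0.00183, 0.71582, 4.77),
    "GRE_V": (0.00152, 0.00105, 0.31588, 2.11),
    "MAT": (0.02090, 0.00955, 0.71861, 4.79),
    "AR": (0.14423, 0.11300, 0.24448, 1.63),
}

f_from_t = {}
mse_from_ss = {}
for var, (b, se, ss, f_sas) in rows.items():
    f_from_t[var] = (b / se) ** 2    # t squared
    mse_from_ss[var] = ss / f_sas    # error mean square implied by this row
    print(f"{var:9s} t^2 = {f_from_t[var]:5.2f}  SS/F = {mse_from_ss[var]:.3f}")

# Backward elimination looks at the predictor with the smallest F (largest p).
predictors = [v for v in rows if v != "Intercept"]
weakest = min(predictors, key=lambda v: rows[v][3])
print("First candidate for removal:", weakest)
```

Every row implies an error mean square of about .150 for the four-predictor model.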

Step 2

Statistics for Removal, DF = 1,26

Variable   Partial R-Square   Model R-Square   F Value   Pr > F
GRE_Q      0.1236             0.4935           8.39      0.0076
GRE_V      0.0340             0.5830           2.31      0.1405
MAT        0.1318             0.4852           8.95      0.0060

Only GRE_V has p > .05, so it is dropped from the model.

Step 3

Statistics for Removal, DF = 1,27

Variable   Partial R-Square   Model R-Square   F Value   Pr > F
GRE_Q      0.2179             0.3651           14.11     0.0008
MAT        0.2095             0.3735           13.56     0.0010

No predictor has p > .05 (the slstay value), so backward elimination halts.

The Final Model

Parameter Estimates

Variable    DF   Parameter   Standard   t Value   Pr > |t|   Standardized   Squared Semi-partial
                 Estimate    Error                           Estimate       Corr Type II
Intercept   1    -2.12938    0.92704    -2.30     0.0296     0              .
GRE_Q       1    0.00598     0.00159    3.76      0.0008     0.48438        0.21791
MAT         1    0.03081     0.00836    3.68      0.0010     0.47494        0.20950

R² = .583, F(2, 27) = 18.87, p < .001
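The overall F for each final model follows from its R-square. A small Python sketch, assuming the standard formula F = (R²/k) / ((1 - R²)/(n - k - 1)); `overall_f` is a name invented for this example. The sample size n = 30 is inferred from the Step 1 entry tests (DF = 1,28 with one predictor in the model), and .5830 is the model R-square SAS listed for GRE_Q and MAT in the Step 2 removal table.

```python
def overall_f(r2, k, n):
    """F (k, n - k - 1) for the test that all k slopes are zero."""
    df_error = n - k - 1
    return (r2 / k) / ((1.0 - r2) / df_error), df_error

n = 30  # one-predictor tests had DF = 1,28, so n - 2 = 28

f_forward, df_forward = overall_f(0.5155, 2, n)     # GRE_V and AR
f_backward, df_backward = overall_f(0.5830, 2, n)   # GRE_Q and MAT

print(f"forward:  F(2, {df_forward}) = {f_forward:.2f}")
print(f"backward: F(2, {df_backward}) = {f_backward:.2f}")
```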

What the F Test?

• Forward selection led to a model with AR and GRE_V.

• Backward selection led to a model with MAT and GRE_Q.

• I am getting suspicious about the utility of procedures like this.
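The disagreement between the two procedures can be reproduced without the raw data, because every model R-square appears somewhere in the SAS output in these slides. The sketch below is Python, not SAS, and is only an illustration of the two search rules: it is not how PROC REG is implemented. It assumes n = 30 (from the DF = 1,28 entry tests) and, as a stand-in for comparing p with .05, uses critical values of F(1, df) at alpha = .05 taken from a standard F table. All function names are hypothetical.

```python
# Model R-square for every subset of predictors, copied from the SAS output.
R2 = {frozenset(names.split()): r2 for names, r2 in [
    ("", 0.0),
    ("AR", 0.3853), ("GRE_Q", 0.3735), ("MAT", 0.3651), ("GRE_V", 0.3381),
    ("GRE_Q MAT", 0.5830), ("GRE_V AR", 0.5155), ("GRE_Q AR", 0.5033),
    ("GRE_V MAT", 0.4935), ("MAT AR", 0.4923), ("GRE_Q GRE_V", 0.4852),
    ("GRE_Q GRE_V MAT", 0.6170), ("GRE_Q MAT AR", 0.6102),
    ("GRE_V MAT AR", 0.5719), ("GRE_Q GRE_V AR", 0.5716),
    ("GRE_Q GRE_V MAT AR", 0.6405),
]}
ALL = frozenset("GRE_Q GRE_V MAT AR".split())
N = 30
# Assumption: critical F(1, df) at alpha = .05, from a standard F table.
F_CRIT = {25: 4.24, 26: 4.23, 27: 4.21, 28: 4.20}

def partial_f(bigger, smaller):
    """F (1, df) for the one predictor that separates the two models."""
    df_error = N - len(bigger) - 1
    f = (R2[bigger] - R2[smaller]) / ((1 - R2[bigger]) / df_error)
    return f, df_error

def forward():
    model = frozenset()
    while model != ALL:
        # Candidate with the largest F-to-enter (smallest p).
        best = max(ALL - model, key=lambda v: partial_f(model | {v}, model)[0])
        f, df = partial_f(model | {best}, model)
        if f < F_CRIT[df]:
            break            # nobody is significant at .05: stop
        model = model | {best}
    return model

def backward():
    model = ALL
    while model:
        # Predictor with the smallest F-to-remove (largest p).
        worst = min(model, key=lambda v: partial_f(model, model - {v})[0])
        f, df = partial_f(model, model - {worst})
        if f >= F_CRIT[df]:
            break            # everybody left is significant at .05: stop
        model = model - {worst}
    return model

print("forward: ", sorted(forward()))
print("backward:", sorted(backward()))
```

Both searches are greedy, so each one sees only the models along its own path; neither path here passes through the other's final model.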

Fully Stepwise Selection

c: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2 selection=stepwise slentry=.08 slstay=.08 details;
run;

• Like forward selection, but, once added to the model, a predictor is considered for elimination in subsequent steps.

Step 3

• Steps 1 and 2 are identical to those of forward selection, but with slentry set to .08, MAT enters the model.

Statistics for Entry, DF = 1,26

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.659821    0.5716           3.41      0.0764
MAT        0.670304    0.5719           3.42      0.0756

Step 4

• GRE_Q enters. Now we have every predictor in the model.

Statistics for Entry, DF = 1,25

Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.653236    0.6405           4.77      0.0385

Step 5

• Once GRE_Q is in the model, AR and GRE_V become eligible for removal.

Statistics for Removal, DF = 1,25

Variable   Partial R-Square   Model R-Square   F Value   Pr > F
GRE_Q      0.0686             0.5719           4.77      0.0385
GRE_V      0.0303             0.6102           2.11      0.1593
MAT        0.0689             0.5716           4.79      0.0382
AR         0.0234             0.6170           1.63      0.2135

Step 6

• AR out, GRE_V still eligible for removal.

Statistics for Removal, DF = 1,26

Variable   Partial R-Square   Model R-Square   F Value   Pr > F
GRE_Q      0.1236             0.4935           8.39      0.0076
GRE_V      0.0340             0.5830           2.31      0.1405
MAT        0.1318             0.4852           8.95      0.0060

Step 7

• At this point, no variables in the model are eligible for removal.

• And no variables not in the model are eligible for entry.

• The final model includes MAT and GRE_Q.

• Same as the final model with backwards selection.
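The seven steps above can be retraced from the table of model R-square values. This is a rough Python sketch under stated assumptions, not a reimplementation of PROC REG: n = 30 is inferred from the degrees of freedom, the helper names are invented, and the p value for F(1, df) is obtained by numerically integrating the Student t density (an F with 1 numerator df is a squared t). At each step the sketch first checks whether anything in the model has p above slstay, and only then looks for a predictor to enter, which matches the order of events shown in the slides.

```python
import math

# Model R-square for every subset of predictors, copied from the SAS output.
R2 = {frozenset(names.split()): r2 for names, r2 in [
    ("", 0.0),
    ("AR", 0.3853), ("GRE_Q", 0.3735), ("MAT", 0.3651), ("GRE_V", 0.3381),
    ("GRE_Q MAT", 0.5830), ("GRE_V AR", 0.5155), ("GRE_Q AR", 0.5033),
    ("GRE_V MAT", 0.4935), ("MAT AR", 0.4923), ("GRE_Q GRE_V", 0.4852),
    ("GRE_Q GRE_V MAT", 0.6170), ("GRE_Q MAT AR", 0.6102),
    ("GRE_V MAT AR", 0.5719), ("GRE_Q GRE_V AR", 0.5716),
    ("GRE_Q GRE_V MAT AR", 0.6405),
]}
ALL = frozenset("GRE_Q GRE_V MAT AR".split())
N = 30

def p_value(f, df_error):
    """Upper-tail p for F(1, df_error), via the equivalent two-tailed t test.
    The t density is integrated from 0 to sqrt(F) with Simpson's rule."""
    nu, t, steps = df_error, math.sqrt(f), 2000
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    h = t / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * (1 + (i * h) ** 2 / nu) ** (-(nu + 1) / 2)
    return 1 - 2 * const * total * h / 3

def partial_p(bigger, smaller):
    """p for the one predictor that separates the two models."""
    df_error = N - len(bigger) - 1
    f = (R2[bigger] - R2[smaller]) / ((1 - R2[bigger]) / df_error)
    return p_value(f, df_error)

def stepwise(slentry=0.08, slstay=0.08, max_steps=20):
    model, history = frozenset(), []
    for _ in range(max_steps):
        # Removal check first: drop the weakest predictor if its p > slstay.
        if model:
            worst = max(model, key=lambda v: partial_p(model, model - {v}))
            if partial_p(model, model - {worst}) > slstay:
                model = model - {worst}
                history.append(("remove", worst))
                continue
        # Otherwise enter the strongest candidate if its p < slentry.
        candidates = ALL - model
        if candidates:
            best = min(candidates, key=lambda v: partial_p(model | {v}, model))
            if partial_p(model | {best}, model) < slentry:
                model = model | {best}
                history.append(("enter", best))
                continue
        break
    return model, history

final, history = stepwise()
for action, var in history:
    print(action, var)
print("final model:", sorted(final))
```

Calling `stepwise(0.05, 0.05)` instead stops after AR and GRE_V, so the final model depends on the thresholds as much as on the data.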

R-Square Selection

d: MODEL GPA = GRE_Q GRE_V MAT AR / selection=rsquare cp mse;
run;

• Test all one predictor models, all two predictor models, and so on.

• Goal is to get the highest R² with fewer than all predictors.

One Predictor Models

Number in Model   R-Square   C(p)      MSE       Variables in Model
1                 0.3853     16.7442   0.22908   AR
1                 0.3735     17.5642   0.23348   GRE_Q
1                 0.3651     18.1490   0.23661   MAT
1                 0.3381     20.0268   0.24667   GRE_V

One Predictor Models

• AR yields the highest R².

• C(p) = 16.74, MSE = .229

• Mallows says the best model will be that with a small C(p) and a value of C(p) near that of p (the number of parameters in the model). p here is 2: one predictor and the intercept.

• Howell suggests one keep adding predictors until MSE starts increasing.
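The C(p) column can be reproduced from the R-square values. A minimal Python sketch, assuming the usual definition C(p) = SSE_p / MSE_full - (n - 2p) rewritten in terms of R-square; `mallows_cp` is a name chosen for this example, the full-model R-square of .6405 comes from Step 4 of the stepwise run, and n = 30 is inferred from the degrees of freedom.

```python
def mallows_cp(r2_subset, n_params, r2_full=0.6405, k_full=4, n=30):
    """C(p) = SSE_p / MSE_full - (n - 2p), expressed with R-square values.

    n_params counts the intercept, so a one-predictor model has p = 2.
    """
    df_error_full = n - k_full - 1
    return (1 - r2_subset) / (1 - r2_full) * df_error_full - (n - 2 * n_params)

one_predictor = {"AR": 0.3853, "GRE_Q": 0.3735, "MAT": 0.3651, "GRE_V": 0.3381}
cp = {var: mallows_cp(r2, n_params=2) for var, r2 in one_predictor.items()}
for var, value in cp.items():
    print(f"{var:6s} C(p) = {value:6.2f}   C(p) - p = {value - 2:6.2f}")

# In the full model (4 predictors + intercept) C(p) equals p = 5.
cp_full = mallows_cp(0.6405, n_params=5)
print(f"full   C(p) = {cp_full:6.2f}")
```

Every one-predictor model has C(p) far above p = 2, which by Mallows' criterion says none of them is adequate.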

Two Predictor Models

Number in Model   R-Square   C(p)      MSE       Variables in Model
2                 0.5830     4.9963    0.16116   GRE_Q MAT
2                 0.5155     9.6908    0.18725   GRE_V AR
2                 0.5033     10.5388   0.19196   GRE_Q AR
2                 0.4935     11.2215   0.19575   GRE_V MAT
2                 0.4923     11.3019   0.19620   MAT AR
2                 0.4852     11.7943   0.19894   GRE_Q GRE_V

Two Predictor Models

• Compared to the best one predictor model, that with MAT and GRE_Q has
  – Considerably higher R²
  – Considerably lower C(p)
  – Value of C(p), 5, close to value of p, 3.
  – Considerably lower MSE

Three Predictor Models

Number in Model   R-Square   C(p)     MSE       Variables in Model
3                 0.6170     4.6292   0.15369   GRE_Q GRE_V MAT
3                 0.6102     5.1050   0.15644   GRE_Q MAT AR
3                 0.5719     7.7702   0.17182   GRE_V MAT AR
3                 0.5716     7.7888   0.17193   GRE_Q GRE_V AR

Three Predictor Models

• Adding GRE_V to the best two predictor model (GRE_Q and MAT)
  – Slightly increases R² (from .58 to .62)
  – Reduces [C(p) - p] from 2 to .6
  – Reduces MSE from .16 to .15

• None of these stats impress me much. I am inclined to take the GRE_Q, MAT model as being best.
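The comparisons in these bullets are simple differences between rows of the selection=rsquare output. A short Python sketch that lines up the best model of each size (the tuple layout is just a convenience for this example):

```python
# Best model of each size, values copied from the selection=rsquare output.
best = [
    # (predictors, R-square, C(p), MSE)
    (("AR",), 0.3853, 16.7442, 0.22908),
    (("GRE_Q", "MAT"), 0.5830, 4.9963, 0.16116),
    (("GRE_Q", "GRE_V", "MAT"), 0.6170, 4.6292, 0.15369),
]

summary = []
for predictors, r2, cp, mse in best:
    p = len(predictors) + 1          # parameters = predictors + intercept
    summary.append((len(predictors), r2, cp - p, mse))

for (k, r2, gap, mse), prev in zip(summary, [None] + summary[:-1]):
    gain = "" if prev is None else f"  R2 gain = {r2 - prev[1]:.3f}"
    print(f"{k} predictor(s): C(p) - p = {gap:5.2f}  MSE = {mse:.3f}{gain}")
```

Going from one to two predictors adds about .20 to R-square; going from two to three adds about .03.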

Closer Look at MAT, GRE_Q, GRE_V

e: MODEL GPA = GRE_Q GRE_V MAT / STB SCORR2;
run;

Variable    DF   Parameter   Standard   t Value   Pr > |t|   Standardized   Squared Semi-partial
                 Estimate    Error                           Estimate       Corr Type II
Intercept   1    -2.14877    0.90541    -2.37     0.0253     0              .
GRE_Q       1    0.00493     0.00170    2.90      0.0076     0.39922        0.12357
GRE_V       1    0.00161     0.00106    1.52      0.1405     0.22317        0.03404
MAT         1    0.02612     0.00873    2.99      0.0060     0.40267        0.13180

Keep GRE_V or Not?

• It does not have a significant partial effect in the model, so why keep it?

• Because it is free info. You get GRE_V and GRE_Q for the same price as GRE_Q alone.

• Equi donati dentes non inspiciuntur (a gift horse's teeth are not inspected).

– As (gift) horses age, their gums recede, making them look long in the tooth.

Add AR?

• R² increases from .617 to .640

• C(p) = p (always true in full model)

• MSE drops from .154 to .150

• Getting AR data is expensive

• Stop gathering the AR data, unless it has some other value.
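The remark that C(p) = p always holds in the full model follows from the definition of the statistic. A short derivation, assuming the usual form of Mallows' C(p) with p counting the intercept and MSE taken from the full model:

```latex
C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p),
\qquad
MSE_{\text{full}} = \frac{SSE_{\text{full}}}{n - p_{\text{full}}}
\\[6pt]
\text{For the full model, } SSE_p = SSE_{\text{full}} \text{ and } p = p_{\text{full}}, \text{ so}
\\[6pt]
C_p = (n - p) - (n - 2p) = p .
```

With n = 30 and p = 5 (four predictors plus the intercept), this gives C(p) = 25 - 20 = 5, so C(p) says nothing about whether the full model itself is worth having.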

Conclusions

• Read http://core.ecu.edu/psyc/wuenschk/StatHelp/Stepwise-Voodoo.htm

• Treat all claims based on stepwise algorithms as if they were made by Saddam Hussein on a bad day with a headache having a friendly chat with George Bush.