Discussion of “Least Angle Regression” by Weisberg


Mike Salwan November 2, 2006 Stat 882

Introduction

• The "notorious" problem of automatic model-building algorithms for linear regression
• Implicit assumption
• Replacing Y by something without loss of information
• Selecting variables
• Summary

Implicit Assumption

• We have an n × m matrix X of predictors and an n-vector Y
• P is the projection onto the column space of (1, X)
• LARS assumes we can replace Y with Ŷ = PY; in large samples, F(y|x) = F(y|x'β)
• We estimate the residual variance by σ̂² = ‖(I − P)Y‖²/(n − m − 1) (see the sketch below)
• If this assumption does not hold, LARS is unlikely to produce useful results

Implicit Assumption (cont)

• Alternative: let F(y|x) = F(y|x'B), where B is an m × d matrix of rank d; the smallest such d is called the structural dimension of the regression problem
• The R package dr can be used to estimate d with methods such as sliced inverse regression (a sketch follows below)
• One can then fit a smooth function that operates on this small set of projections of the predictors
• In the paper, the predictors were expanded from 10 to 65 so that F(y|x) = F(y|x'β) holds
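A hedged sketch of estimating d with the dr package on simulated data; the call follows my reading of dr's documentation, so treat the argument names as assumptions rather than as part of the original discussion.

# Estimate the structural dimension d by sliced inverse regression.
# Assumes the dr package is installed; here y depends on one linear combination, so d = 1.
library(dr)
set.seed(2)
n <- 200
x <- matrix(rnorm(n * 5), n, 5)
y <- exp(0.5 * (x[, 1] + x[, 2])) + 0.5 * rnorm(n)   # a single-index, nonlinear mean

sir.fit <- dr(y ~ x, method = "sir", nslices = 8)
summary(sir.fit)   # the dimension tests indicate how many directions are needed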

Implicit Assumption (cont)

• LARS relies too heavily on correlations
• Correlation measures the degree of linear association (obviously)
• Using correlations sensibly requires linearity in the conditional distributions of y and of a'x given b'x, for all a and b; otherwise bizarre results can occur (a toy example follows below)
• Any method that replaces Y by PY cannot be sensitive to nonlinearity
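A toy simulation (mine, not from the discussion) makes the point concrete: a strong but symmetric nonlinear dependence yields a correlation near zero, so a correlation-driven step cannot tell the relevant predictor from noise.

# Correlation is blind to a symmetric nonlinear relationship.
set.seed(3)
n  <- 500
x1 <- rnorm(n)                  # truly drives y, but through x1^2
x2 <- rnorm(n)                  # pure noise
y  <- x1^2 + 0.2 * rnorm(n)

cor(x1, y)   # near 0, despite the strong dependence
cor(x2, y)   # also near 0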

Implicit Assumption (cont)

• Methods based on PY alone can be strongly influenced by outliers and high-leverage cases
• Consider

  Cp(Ŷ) = ‖Y − Ŷ‖²/σ̂² − n + 2 Σ_{i=1}^{n} cov(ŷ_i, y_i)/σ̂²

• Estimate σ² by σ̂² = ‖(I − P)Y‖²/(n − m − 1)
• Thus the ith term is given by

  Cp_i(Ŷ) = (y_i − ŷ_i)²/σ̂² − 1 + 2 cov(ŷ_i, y_i)/σ̂²

• Here ŷ_i is the ith element of PY and h_i is the ith leverage, a diagonal element of P; for this fit, cov(ŷ_i, y_i) = σ² h_i (a sketch of the per-case computation follows below)
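A short sketch of the per-case computation for the ordinary full-model projection fit, where cov(ŷ_i, y_i)/σ² is just the leverage h_i (simulated data, base R only; not from the slides).

# Per-case Cp contributions for the full OLS fit: (y_i - yhat_i)^2/sigma2 - 1 + 2*h_i.
set.seed(4)
n <- 100; m <- 5
X <- matrix(rnorm(n * m), n, m)
Y <- 1 + X %*% rep(0.5, m) + rnorm(n)

fit        <- lm(Y ~ X)
sigma2.hat <- summary(fit)$sigma^2      # ||(I - P)Y||^2 / (n - m - 1)
h          <- hatvalues(fit)            # diagonal elements of P
Cpi        <- residuals(fit)^2 / sigma2.hat - 1 + 2 * h
sum(Cpi)   # = m + 1 here, since the residual sum of squares contributes exactly n - m - 1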

Implicit Assumption (cont)

• From the simulation in the article, we can approximate cov(ŷ_i, y_i) by σ̂² u_i, where u_i is the ith diagonal element of the projection matrix onto the active columns of (1, X) at the current step of the algorithm
• Thus

  Cp_i(Ŷ) ≈ (ŷ_i − ŷ_{A,i})²/σ̂² + u_i − (h_i − u_i),

  where ŷ_{A,i} is the ith fitted value at the current step
• This is the same formula as in an earlier paper by Weisberg, except that here the fitted values and u_i are computed from LARS instead of from a projection (a sketch follows below)
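A sketch comparing the exact per-case term for a subset fit with the decomposition above. To keep it simple, the "current step" is replaced by an ordinary projection onto two active predictors (so u_i comes from the hat values of the subset fit); that substitution is an assumption of this sketch, not the LARS computation itself.

# Exact per-case term vs. the decomposition (yhat_i - yhatA_i)^2/sigma2 + u_i - (h_i - u_i),
# with the subset fit taken as a plain projection rather than a LARS step.
set.seed(5)
n <- 100; m <- 5
X <- matrix(rnorm(n * m), n, m)
Y <- 1 + X %*% c(1, 1, 0, 0, 0) + rnorm(n)

full <- lm(Y ~ X)            # leverages h_i from the full model
sub  <- lm(Y ~ X[, 1:2])     # leverages u_i from the active predictors only
sigma2 <- summary(full)$sigma^2
h <- hatvalues(full)
u <- hatvalues(sub)

Cpi.exact  <- residuals(sub)^2 / sigma2 - 1 + 2 * u
Cpi.approx <- (fitted(full) - fitted(sub))^2 / sigma2 + u - (h - u)
c(sum(Cpi.exact), sum(Cpi.approx))   # the totals agree; per-case values differ by terms that average out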

Implicit Assumption (cont)

• The value of Cp_i thus depends on the agreement between ŷ_i and ŷ_{A,i}, on the leverage in the subset model, and on the difference in leverage between the full and subset models
• Neither of the latter two terms has much to do with the problem of interest, the study of the conditional distribution of y given x: they are determined by the predictors alone

Selecting Variables

• We want to decompose x into two parts, x_u and x_a, where x_a represents the active predictors
• We want the smallest x_a such that F(y|x) = F(y|x_a), usually chosen by optimizing some criterion
• Standard methods are too greedy
• LARS permits highly correlated predictors to be used (a sketch of basic usage follows below)
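A hedged sketch of basic lars usage on simulated data: the LAR path yields a nested sequence of candidate active sets, and a criterion such as Cp is then used to pick one; correlated predictors are not automatically excluded.

# Basic use of the lars package: inspect the order in which variables enter
# and the Cp values along the path.  Assumes the lars package is installed.
library(lars)
set.seed(6)
n <- 100
x <- matrix(rnorm(n * 6), n, 6)
x[, 2] <- x[, 1] + 0.1 * rnorm(n)      # a highly correlated pair
y <- x[, 1] + 0.5 * x[, 3] + rnorm(n)

fit <- lars(x, y, type = "lar")
fit$actions    # entry order of the predictors; both members of the correlated pair can enter
summary(fit)   # Df, RSS and Cp for each step of the path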

Selecting Variables (cont)

• An example constructed to challenge LARS
• Nine new variables were added by multiplying the original variables by 2.2 and rounding to the nearest integer
• The LARS method was applied with both the original and the rounded variables available
• LARS selects two of the rounded variables, in one case (BP) choosing both a variable and its rounded copy (a sketch of the construction follows below)
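A sketch of the mechanics of that construction on simulated data; the actual example in the discussion used the diabetes data, so everything below, including the coefficients and scales, is my own illustration.

# Append coarse copies of the predictors (multiply by 2.2, round to the nearest
# integer) and let LARS choose among originals and copies.
library(lars)
set.seed(7)
n <- 100
x <- matrix(rnorm(n * 9, mean = 10, sd = 3), n, 9)
colnames(x) <- paste0("x", 1:9)
y <- x[, 1] - x[, 2] + rnorm(n)

x.round <- round(2.2 * x)              # nine nearly collinear copies
colnames(x.round) <- paste0(colnames(x), ".r")
xx <- cbind(x, x.round)

fit <- lars(xx, y, type = "lar")
fit$actions   # the path can mix original variables with their rounded copies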

Selecting Variables (cont)

• Whether a variable is included or excluded depends on the marginal distribution of x as much as on the conditional distribution of y|x
• Example: two variables have a high correlation
• LARS selects one of them for its active set
• Modify the other so that it is now uncorrelated with the first
• This does not change y|x; it changes only the marginal distribution of x
• Yet it could change the set of active predictors selected by LARS, or by any method that uses correlation (a sketch follows below)
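A sketch of that thought experiment on simulated data; "making the other variable uncorrelated" is done here by replacing it with its residual from a regression on the first, an invertible linear change of the predictors.

# Replacing x2 by its residual on x1 leaves the information in (x1, x2) intact,
# but changes the marginal correlations, and possibly the selected set.
library(lars)
set.seed(8)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + 0.2 * rnorm(n)              # highly correlated with x1
x3 <- rnorm(n)
y  <- x1 + x2 + 0.3 * x3 + rnorm(n)

x2.new <- residuals(lm(x2 ~ x1))       # now uncorrelated with x1

lars(cbind(x1, x2, x3),     y, type = "lar")$actions
lars(cbind(x1, x2.new, x3), y, type = "lar")$actions   # entry order / active set can differ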

Selecting Variables (cont)

• LARS results are invariant under rescaling of the predictors, but not under reparameterization of related predictors
• Scaling the predictors first and then adding all cross-products and quadratics gives a different model than forming the cross-products and quadratics first and then scaling (a sketch follows below)
• This could be addressed by considering related terms simultaneously, but that is self-defeating for the purpose of subset selection
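A sketch of the order-of-operations point on simulated data; expand() is a helper I define here, not part of any package. Standardizing before or after forming cross-products and quadratics gives different columns, so LARS can pick different terms even though lars() rescales each column internally.

# Scale-then-expand versus expand-then-scale: the quadratic terms span different
# column spaces, so the LAR paths generally differ.
library(lars)
set.seed(9)
n <- 150
x <- matrix(runif(n * 3, 1, 10), n, 3)
y <- x[, 1] + x[, 1] * x[, 2] + rnorm(n)

expand <- function(z)                  # main effects, cross-products, quadratics
  cbind(z, z[, 1] * z[, 2], z[, 1] * z[, 3], z[, 2] * z[, 3], z^2)

lars(expand(scale(x)), y, type = "lar")$actions   # scale first, then expand
lars(expand(x),        y, type = "lar")$actions   # expand first (lars() rescales internally)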

Summary

• Problems gain notoriety because their solution is elusive yet of wide interest
• Neither LARS nor any other automatic model-selection method considers the context of the problem
• There seems to be no foreseeable solution to this problem