Visualizing and Exploring Data


Basis Expansions and Regularization
Based on Chapter 5 of Hastie, Tibshirani and Friedman
Basis Expansions for Linear Models
f(X) = \sum_{m=1}^{M} \beta_m h_m(X)
Here the h_m's might be (a small sketch of such an expansion follows the list):
• h_m(X) = X_m, m = 1, …, p, which recovers the original linear model
• h_m(X) = X_j^2 or h_m(X) = X_j X_k, adding polynomial and interaction terms
• h_m(X) = I(L_m \le X_k < U_m), an indicator of X_k lying between the “knots” L_m and U_m
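As an illustration, here is a minimal sketch of such an expansion (the synthetic data and the particular choice of h_m's are illustrative, not from the slides): squared, interaction, and indicator columns are stacked into a design matrix, and because the model remains linear in the derived features, ordinary least squares still applies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-2, 2, size=(n, 2))            # two original predictors X_1, X_2
y = 1.0 + X[:, 0]**2 - 0.5 * X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=n)

# Basis expansion: original inputs plus a square, an interaction, and an indicator
H = np.column_stack([
    np.ones(n),                                # constant term
    X[:, 0], X[:, 1],                          # h_m(X) = X_m : the original linear model
    X[:, 0]**2,                                # h_m(X) = X_j^2
    X[:, 0] * X[:, 1],                         # h_m(X) = X_j X_k
    (-1.0 <= X[:, 1]) & (X[:, 1] < 1.0),       # h_m(X) = I(L_m <= X_k < U_m)
])

# The model is linear in the h_m's, so ordinary least squares still applies
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
print(beta)
```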
Regression Splines
Bottom left panel uses:
h_1(X) = 1
h_2(X) = X
h_3(X) = (X - \xi_1)_+
h_4(X) = (X - \xi_2)_+
Number of parameters = (3 regions) × (2 params per region) − (2 knots × 1 constraint per knot) = 4
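A minimal sketch of this piecewise-linear basis (the knot locations and synthetic data are illustrative): each truncated term (X − ξ)_+ contributes one slope change, and the fit has exactly the four parameters counted above.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

xi1, xi2 = 0.33, 0.66                  # the two knots

H = np.column_stack([
    np.ones_like(x),                   # h_1(X) = 1
    x,                                 # h_2(X) = X
    np.maximum(x - xi1, 0.0),          # h_3(X) = (X - xi_1)_+
    np.maximum(x - xi2, 0.0),          # h_4(X) = (X - xi_2)_+
])

beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # 4 fitted parameters
fit = H @ beta                                 # continuous, piecewise-linear over the 3 regions
```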
Cubic Spline
A cubic spline has continuous first and second derivatives at the knots:
h_1(X) = 1
h_2(X) = X
h_3(X) = X^2
h_4(X) = X^3
h_5(X) = (X - \xi_1)_+^3
h_6(X) = (X - \xi_2)_+^3
Number of parameters = (3 regions) × (4 params per region) − (2 knots × 3 constraints per knot) = 6
The remaining discontinuity at the knots (in the third derivative) is essentially invisible to the human eye.
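The same sketch with the cubic truncated-power basis (again with illustrative knots and data): six columns, hence six parameters, and the fitted curve has continuous first and second derivatives at the knots.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

xi1, xi2 = 0.33, 0.66                  # knots

H = np.column_stack([
    np.ones_like(x),                   # h_1(X) = 1
    x,                                 # h_2(X) = X
    x**2,                              # h_3(X) = X^2
    x**3,                              # h_4(X) = X^3
    np.maximum(x - xi1, 0.0)**3,       # h_5(X) = (X - xi_1)_+^3
    np.maximum(x - xi2, 0.0)**3,       # h_6(X) = (X - xi_2)_+^3
])

beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # 6 fitted parameters
fit = H @ beta                                 # smooth through the knots (only f''' jumps)
```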
Natural Cubic Spline
Adds a further constraint that the fitted function is linear beyond
the boundary knots
A natural cubic spline model with K knots is represented by K
basis functions:
H_1(X) = 1
H_2(X) = X
H_{k+2}(X) = d_k(X) - d_{K-1}(X), k = 1, …, K-2, where
d_k(X) = \frac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}
Each of these basis functions has zero second and third derivatives outside the boundary knots.
Natural Cubic Spline Models
These basis functions can be used in, for example, regression models. With 4 knots, and hence 4 basis functions per predictor variable, simply fit a logistic regression model with four times the number of predictor variables… (a sketch follows).
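A sketch of this recipe under the formulas above (the helper name ncs_basis, the knot placement at quantiles, and the synthetic data are illustrative): build the natural cubic spline basis from the d_k's, expand each predictor, and fit an ordinary logistic regression on the expanded matrix (here with scikit-learn).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ncs_basis(x, knots):
    """Natural cubic spline basis with K knots, in the truncated-power form above."""
    knots = np.asarray(knots, dtype=float)
    K = len(knots)
    def d(k):
        # d_k(X) = [(X - xi_k)_+^3 - (X - xi_K)_+^3] / (xi_K - xi_k)
        return (np.maximum(x - knots[k], 0)**3
                - np.maximum(x - knots[-1], 0)**3) / (knots[-1] - knots[k])
    cols = [np.ones_like(x), x]                       # H_1 = 1, H_2 = X
    cols += [d(k) - d(K - 2) for k in range(K - 2)]   # H_{k+2} = d_k - d_{K-1}
    return np.column_stack(cols)

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(300, 2))                  # two predictors
p = 1 / (1 + np.exp(-(np.sin(6 * X[:, 0]) + X[:, 1] - 0.5)))
y = rng.binomial(1, p)

# 4 knots per predictor; the constant column is dropped since sklearn fits one intercept
knots = np.quantile(X, [0.05, 0.35, 0.65, 0.95], axis=0)
H = np.hstack([ncs_basis(X[:, j], knots[:, j])[:, 1:] for j in range(X.shape[1])])

clf = LogisticRegression(max_iter=1000).fit(H, y)
print(clf.coef_.shape)   # several expanded columns per original predictor
```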
Smoothing Splines
Consider this problem: among all functions f(x) with two
continuous derivatives, find the one that minimizes the penalized
residual sum of squares:
RSS(f, \lambda) = \sum_{i=1}^{N} [y_i - f(x_i)]^2 + \lambda \int [f''(t)]^2 \, dt

where \lambda is the smoothing parameter.
\lambda = 0: f can be any function that interpolates the data.
\lambda = \infty: the least-squares line fit.
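The two limits can be checked numerically; here is a sketch assuming SciPy ≥ 1.10, whose make_smoothing_spline fits a cubic smoothing spline with a penalty weight lam playing the role of \lambda (the data and the particular lam values are illustrative):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

rough  = make_smoothing_spline(x, y, lam=1e-10)   # lambda -> 0
smooth = make_smoothing_spline(x, y, lam=1e3)     # large lambda

print(np.abs(rough(x) - y).max())                 # ~0: the fit essentially interpolates the data
line = np.polyval(np.polyfit(x, y, 1), x)
print(np.abs(smooth(x) - line).max())             # small: the fit approaches the least-squares line
```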
Smoothing Splines
Theorem: the unique minimizer of this penalized RSS is a natural cubic spline with knots at the unique values of x_i, i = 1, …, N.
It seems as if there will be N features and presumably overfitting of the data. But the smoothing term shrinks the model towards the linear fit. Writing

f(x) = \sum_{j=1}^{N} H_j(x) \, \theta_j

the criterion becomes

RSS(\theta, \lambda) = (y - H\theta)^T (y - H\theta) + \lambda \, \theta^T \Omega_H \theta, where H_{ij} = H_j(x_i) and \{\Omega_H\}_{jk} = \int H_j''(t) H_k''(t) \, dt

\hat{\theta} = (H^T H + \lambda \Omega_H)^{-1} H^T y, so the fitted values are \hat{f} = H\hat{\theta} = S_\lambda y

This is a generalized ridge regression.
Can show that S_\lambda = (I + \lambda K)^{-1}, where K does not depend on \lambda.
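A direct numpy sketch of this generalized ridge (the data are illustrative; to keep the truncated-power basis well conditioned, the sketch thins the knots to a few quantiles rather than using every unique x_i, and the penalty matrix \Omega is approximated by numerical integration on a fine grid):

```python
import numpy as np

def ncs_basis(x, knots, deriv=0):
    """Natural cubic spline basis (deriv=0) or its second derivatives (deriv=2)."""
    K = len(knots)
    def d(k):
        if deriv == 0:
            num = np.maximum(x - knots[k], 0)**3 - np.maximum(x - knots[-1], 0)**3
        else:                                     # d''/dX^2 of the truncated cubics
            num = 6 * (np.maximum(x - knots[k], 0) - np.maximum(x - knots[-1], 0))
        return num / (knots[-1] - knots[k])
    lead = [np.ones_like(x), x] if deriv == 0 else [np.zeros_like(x), np.zeros_like(x)]
    return np.column_stack(lead + [d(k) - d(K - 2) for k in range(K - 2)])

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

knots = np.quantile(x, np.linspace(0.01, 0.99, 12))   # thinned knot set (for conditioning)
H = ncs_basis(x, knots)                               # H_ij = H_j(x_i)

# Omega_jk = \int H_j''(t) H_k''(t) dt, approximated on a fine grid
grid = np.linspace(x.min(), x.max(), 4000)
dx = grid[1] - grid[0]
D2 = ncs_basis(grid, knots, deriv=2)
Omega = D2.T @ D2 * dx

lam = 1e-4
A = H.T @ H + lam * Omega
theta = np.linalg.solve(A, H.T @ y)      # generalized ridge solution theta_hat
S = H @ np.linalg.solve(A, H.T)          # smoother matrix S_lambda, so fitted = S @ y
print(np.trace(S))                       # trace of S_lambda: effective degrees of freedom
```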
Nonparametric Logistic Regression
Consider logistic regression with a single x:

\log \frac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} = f(x)

and a penalized log-likelihood criterion:

l(f; \lambda) = \sum_{i=1}^{N} \left[ y_i \log p(x_i) + (1 - y_i) \log(1 - p(x_i)) \right] - \frac{\lambda}{2} \int [f''(t)]^2 \, dt
             = \sum_{i=1}^{N} \left[ y_i f(x_i) - \log(1 + e^{f(x_i)}) \right] - \frac{\lambda}{2} \int [f''(t)]^2 \, dt
Again, one can show that the optimal f is a natural spline with knots at the data points.
Newton-Raphson can be used to do the fitting (a sketch follows).
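A sketch of that Newton-Raphson loop (penalized iteratively reweighted least squares), assuming a basis matrix H and penalty matrix Omega built as in the smoothing-spline sketch above; the function name and stopping rule are illustrative:

```python
import numpy as np

def fit_penalized_logistic(H, Omega, y, lam, n_iter=25, tol=1e-8):
    """Newton-Raphson (penalized IRLS) for the penalized log-likelihood l(f; lambda).

    H is the natural-spline basis matrix, Omega the curvature penalty matrix
    (both can be built as in the smoothing-spline sketch above); y contains 0/1 labels.
    """
    theta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        eta = H @ theta                          # f(x_i)
        p = 1.0 / (1.0 + np.exp(-eta))           # p(x_i)
        W = np.maximum(p * (1 - p), 1e-10)       # IRLS weights
        z = eta + (y - p) / W                    # working response
        # Newton step: solve (H^T W H + lam * Omega) theta_new = H^T W z
        A = H.T @ (W[:, None] * H) + lam * Omega
        theta_new = np.linalg.solve(A, H.T @ (W * z))
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta

# usage (hypothetical): theta = fit_penalized_logistic(H, Omega, y, lam=1e-3),
# after which the fitted natural spline is f(x) = ncs_basis(x, knots) @ theta
```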
Thin-Plate Splines
The discussion up to this point has been one-dimensional. The higher-dimensional analogue of smoothing splines is the “thin-plate spline.” In 2-D, instead of minimizing:
RSS(f, \lambda) = \sum_{i=1}^{N} [y_i - f(x_i)]^2 + \lambda \int [f''(t)]^2 \, dt

we minimize:

RSS(f, \lambda) = \sum_{i=1}^{N} [y_i - f(x_i)]^2 + \lambda J(f), where

J(f) = \int\!\!\int \left[ \left( \frac{\partial^2 f(x)}{\partial x_1^2} \right)^2 + 2 \left( \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} \right)^2 + \left( \frac{\partial^2 f(x)}{\partial x_2^2} \right)^2 \right] dx_1 \, dx_2
Thin-Plate Splines
The solution has the form:
f(x) = \beta_0 + \beta^T x + \sum_{j=1}^{N} \alpha_j h_j(x), where

h_j(x) = \eta(\lVert x - x_j \rVert) and \eta(z) = z^2 \log z^2,

a type of “radial basis function.”
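A two-dimensional sketch assuming SciPy ≥ 1.7, whose RBFInterpolator offers a thin-plate-spline kernel (r^2 \log r, which spans the same functions as z^2 \log z^2) together with the low-order polynomial term and a smoothing parameter; the data and smoothing value are illustrative:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(200, 2))             # 2-D inputs
y = np.sin(3 * X[:, 0]) * np.cos(3 * X[:, 1]) + rng.normal(scale=0.1, size=200)

# Thin-plate spline fit: radial kernel plus the beta_0 + beta^T x term;
# 'smoothing' regularizes the fit in the spirit of lambda above
tps = RBFInterpolator(X, y, kernel='thin_plate_spline', smoothing=1e-3)

g = np.linspace(-1, 1, 50)
grid = np.column_stack([m.ravel() for m in np.meshgrid(g, g)])
fhat = tps(grid)                                  # evaluate the fitted surface
print(fhat.shape)
```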