Cluster-Based Modeling:
Exploring the Linear Regression Model Space
Student: XiaYi (Sandy) Shen    Advisor: Rebecca Nugent
Carnegie Mellon University, Pittsburgh, Pennsylvania
Yi* = 3 + 2Xi1
Real Y data: Yi = 3 + 2Xi1 + rnorm(3,0,1)
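A minimal R sketch of this toy setup (the seed and the X values are assumptions; the poster gives only the generating model):

    set.seed(1)                   # reproducibility; the poster does not give a seed
    x1 <- c(-2, 0, 2)             # assumed values for the relevant predictor Xi1
    x2 <- c(1, -1, 0)             # assumed values for the second predictor Xi2
    y_star <- 3 + 2 * x1          # truth: Yi* = 3 + 2*Xi1
    y <- y_star + rnorm(3, 0, 1)  # real Y data: Yi = 3 + 2*Xi1 + N(0,1) noise
    fit <- lm(y ~ x1)             # fitted model (the red line)
    coef(fit)                     # estimates near (3, 2); the poster's draw gave 2.83 and 2.19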
The fitted values from each model and the original Yi* are plotted below:
In practice, we have:
We have 2^(p-1) possible models
Stepwise chose the model with variables X1, X2 and X3
Two clusters of models: one group predicts similarly to the truth, the other group does not
The perfect model, the stepwise chosen model and the model with
the right variables predict very similarly
• Both: alternates between forward and backward steps
[Dendrogram from hierarchical clustering of the models (complete linkage); each leaf is labeled by the model's number of variables]
There are two large clusters of models; each could be split
into two smaller clusters
The stepwise chosen model predicts similarly to models with
more variables; there is one 3-variable model that could be a
possible replacement
Models with fewer variables are in the same cluster with a
few exceptions
The model with no variables is similar to a 1-variable model
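A sketch of the clustering step behind this dendrogram, assuming Yhat is the matrix of fitted values (one row per model) and nvars counts each model's variables, as built in the simulation sketch elsewhere in this transcript:

    d  <- dist(Yhat)                       # Euclidean distances between models' fitted-value vectors
    hc <- hclust(d, method = "complete")   # complete linkage, as on the poster
    plot(hc, labels = nvars,               # label each leaf by its number of variables
         main = "Clustering of the models", sub = "", xlab = "")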
• Stepwise regression models fall in high-frequency areas of the model space. In our simulations, the stepwise-chosen model predicts similarly to the perfect model and the model with the correct variables
Conclusion / Discussion
Stepwise regression: a greedy search in the “model space” for the “best subsets”
• Backward: removing variables one at a time
Model criteria: R^2, adjusted R^2, AIC, BIC, and stepwise regression
• Forward: adding in variables one at a time
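A minimal sketch of these three search directions with R's step(), which scores candidate models by AIC by default; the data frame dat, with response Y and candidate predictors, is hypothetical:

    null_model <- lm(Y ~ 1, data = dat)    # intercept-only starting point
    full_model <- lm(Y ~ ., data = dat)    # all candidate predictors
    fwd  <- step(null_model, scope = formula(full_model), direction = "forward")
    bwd  <- step(full_model, direction = "backward")
    both <- step(null_model, scope = formula(full_model), direction = "both")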
4. Y = β0 + β1X1 + β2X2
3. Y = β0 + β2X2
• Principal Components (PC) projection: a lower-dimensional representation that retains information/structure from the higher dimensions
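A sketch of the projection, again assuming the Yhat matrix (models by observations) and nvars labels from the simulation sketch:

    pc <- prcomp(Yhat)                     # principal components of the model space
    plot(pc$x[, 1], pc$x[, 2],             # one point per model
         xlab = "Principal Component 1", ylab = "Principal Component 2")
    text(pc$x[, 1], pc$x[, 2], labels = nvars, pos = 3)   # number of variables per model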
[Plot legend: Perfect fit: Y ~ 3 + 2*X1; fitted models: Y ~ 1, Y ~ X1, Y ~ X2, Y ~ X1 + X2]
To predict Y from p-1 possible Xj variables
2. Y = β0 + β1X1
• Each model is labeled by its number of variables
Note: it is hard to look at higher dimensions; we can only visualize two dimensions at a time.
• Many possible predictor variables: X1, X2, X3, …
1. Y = β0
• The stepwise-chosen model is labeled in blue
• One variable that we are interested in predicting: Y
Example: 2 variables X1, X2 => 4 possible models:
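The four candidate models, fitted in R (using the y, x1, x2 vectors assumed in the toy sketch above):

    m1 <- lm(y ~ 1)          # 1. intercept only
    m2 <- lm(y ~ x1)         # 2. X1 only
    m3 <- lm(y ~ x2)         # 3. X2 only
    m4 <- lm(y ~ x1 + x2)    # 4. X1 and X2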
• Hierarchical clustering is done on the PC projections
How do we normally build/choose a model?
Perfect model:
(recall the 4 possible models from the previous panel)
Ŷi = 2.83 + 2.19 Xi1
Fitted model (red line):
Illustration of Idea
We have two predictor variables Xi1, Xi2, i = 1, 2, 3:
• Hierarchical Clustering:
Yi = 3 + 2 Xi1 + εi,   εi ~ N(0, 1)
• Pairs plot: (60 choose 2) = 1770 pairwise plots, impossible to show all in one graph; instead we show two selected pairs of dimensions representing two cross sections of the model space
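For reference, a sketch of two such cross sections, assuming the Yhat matrix of fitted values (one row per model); which two pairs of dimensions the poster shows is not stated, so the pairs (1, 2) and (5, 6) here are only examples:

    # Each point is one model, placed by its fitted values for two chosen observations
    plot(Yhat[, 1], Yhat[, 2], xlab = expression(hat(Y)[1]), ylab = expression(hat(Y)[2]))
    plot(Yhat[, 5], Yhat[, 6], xlab = expression(hat(Y)[5]), ylab = expression(hat(Y)[6]))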
Truth model:
 60
 
2
50
• What does it look like graphically?
Our questions:
• Do models cluster?
Are there distinct “groups” of models with similar predictability?
• Are there complicated models that could be replaced
by simpler models?
• How is stepwise doing?
where β̂ = (X'X)^(-1) X'Y, found by the method of least squares
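A sketch checking this least-squares formula against lm(); the vectors y, x1, x2 are illustrative stand-ins for any response and predictors:

    X <- cbind(1, x1, x2)                          # design matrix with an intercept column
    beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^(-1) X'Y
    beta_hat
    coef(lm(y ~ x1 + x2))                          # lm() returns the same estimates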
Perfect model in green, stepwise-chosen model in blue, model with the right variables in red
Ŷi = β̂0 + β̂1 Xi,1 + β̂2 Xi,2 + ... + β̂p-1 Xi,p-1
2^(p-1) possible models, each with n fitted values
2^(p-1) observations in n-dimensional space
• Estimated Regression Function
We use a heat map of the kernel density estimate of the model space (red = low density, white/yellow = high density)
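One way such a heat map could be produced, assuming the PC scores pc$x from the projection sketch; MASS::kde2d gives the kernel density estimate, and heat.colors() runs from red at low values to white/yellow at high values, matching the poster's scale:

    library(MASS)
    dens <- kde2d(pc$x[, 1], pc$x[, 2], n = 100)   # 2-D kernel density estimate over the first two PCs
    image(dens, col = heat.colors(50),             # low density = red, high density = white/yellow
          xlab = "Principal Component 1", ylab = "Principal Component 2")
    points(pc$x[, 1], pc$x[, 2], pch = 20)         # overlay the models themselves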
We look at the Linear Regression Model Space:
Visualization of Model Space:
βj : Change in E[Yi] for one unit increase in Xi,j (all other variables fixed)
• Principal Component (PC) projection: we randomly sampled 60 suburbs, since we need more models than observations to run the PC projection
β0 : E[Yi] when all Xi,j = 0
• Represent each model by its n×1 vector of fitted values Ŷ
• Models that predict similar values are close (in space)
We have 2^6 = 64 possible models; the model space is 64 × 60
j = 0,1,2,…,p-1; p = number of parameters; p-1 variables
Yi = 2Xi1 + 3Xi2 + rnorm(60,0,1)
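A sketch of this simulation and of building the 64 × 60 model space; the distribution of the six candidate predictors and the seed are assumptions, since the poster gives only the generating model for Y:

    set.seed(2)
    n <- 60
    X <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("X", 1:6)))
    y <- 2 * X[, 1] + 3 * X[, 2] + rnorm(n, 0, 1)   # only X1 and X2 matter
    dat <- data.frame(y, X)

    subsets <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), 6)))   # all 2^6 = 64 variable subsets
    Yhat <- t(apply(subsets, 1, function(keep) {
      rhs <- if (any(keep)) paste(colnames(X)[keep], collapse = " + ") else "1"
      fitted(lm(as.formula(paste("y ~", rhs)), data = dat))
    }))
    nvars <- rowSums(subsets)    # number of variables in each model
    dim(Yhat)                    # 64 x 60: one row of fitted values per model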
Characterizing the models:
Selected predictor variables: crime rate, average # of rooms, distance
to employment centers, proportion of blacks, accessibility to
highways, and nitrogen oxides concentration
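A sketch of this setup with the Boston data in the MASS package; the mapping from the poster's descriptions to the column names crim, rm, dis, black, rad, nox (response medv) is an assumption, as is the seed for the 60-suburb sample used for the PC projection:

    library(MASS)                                  # Boston housing data, 506 suburbs
    vars <- c("crim", "rm", "dis", "black", "rad", "nox")   # assumed columns for the six predictors
    set.seed(3)
    sub  <- Boston[sample(nrow(Boston), 60), c("medv", vars)]   # random sample of 60 suburbs
    full <- lm(medv ~ ., data = sub)               # medv: median home value in $1000s
    chosen <- step(full, direction = "both")       # stepwise search starting from the full model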
i = 1,2,…,n observations
Yi* = 2Xi1 + 3Xi2
Perfect model:
Predicting the median value of owner-occupied homes (in $1000s) for 506 suburbs of Boston
Yi = β0 + β1 Xi,1 + β2 Xi,2 + ... + βp-1 Xi,p-1 + εi,   εi ~ N(0, σ^2)
We have six predictor variables Xi1, Xi2, Xi3, Xi4, Xi5, Xi6 , i = 1,2,…,60
• Regression Model
• Stepwise regression is greedy and does not necessarily search the entire model space
• We could have very complicated models that do not predict much better than simpler models
What is Linear Regression?
Boston Housing Data
Simulation with 60 Data Points
Issues with current model search criteria
Introduction
• The blue and red models predict more similar values and are
closer to the perfect fit (brown) in model space
• The blue and red models contain the correct predictor variable X1
• The black model does not contain any predictor variable and thus
is the furthest from the perfect fit
• PC projection is more useful for visualizing higher dimensions
Three clusters of models: one group predicts closely to the truth, the other two groups do not.
• Increasing the number of observations increases the
dimensions;
Stepwise behaves similarly in the PC projection as in the pairs plot
• Increasing the number of variables drastically increases the
number of models
Note: these views rely on projections, and hence do not necessarily capture all of the structure/information
Future work: we want to better characterize the clusters and model spaces