Cluster-Based Modeling:
Exploring the Linear Regression Model Space
Student: XiaYi (Sandy) Shen    Advisor: Rebecca Nugent
Carnegie Mellon University, Pittsburgh, Pennsylvania
Perfect fit: Yi* = 3 + 2Xi1
Real Y data: Yi = 3 + 2Xi1 + rnorm(3,0,1)
The fitted value from each model and the original Yi* are plotted
below:
In practice, we have 2^(p-1) possible models
Stepwise chose the model with variables X1, X2 and X3
Two clusters of models, one group of models predicts similarly to
the truth, the other group does not
The perfect model, the stepwise chosen model and the model with
the right variables predict very similarly
• Both: alternates forward and backward steps
[Figure: dendrogram of models; each leaf is labeled by its number of variables]
There are two large clusters of models; each could be split
into two smaller clusters
The stepwise chosen model predicts similarly to models with
more variables; there is one 3-variable model that could be a
possible replacement
Models with fewer variables are in the same cluster with a
few exceptions
The model with no variables is similar to a 1-variable model
[Dendrogram computed with hclust (*, "complete")]
• Stepwise regression models are in high-frequency areas of
the model space. In our simulations, the stepwise model predicts
similarly to the perfect model and the model with the correct variables
Conclusion / Discussion
Stepwise regression: search in the “model space” for the “best subsets”
• Forward: adding in variables one at a time
• Backward: removing variables one at a time
Model Criterion: R², adjusted R², AIC, BIC, and Stepwise regression

• One variable that we are interested in predicting: Y
• Many possible predictor variables: X1, X2, X3, ...
To predict Y from p-1 possible Xj variables
Example: 2 variables: X1, X2 => 4 possible models:
1. Y = β0
2. Y = β0 + β1X1
3. Y = β0 + β2X2
4. Y = β0 + β1X1 + β2X2

• Principal Components (PC) projection: a lower-dimensional
representation that retains information/structure from the higher
dimensions
Each model is labeled by its number of variables
The stepwise chosen model is labeled in blue
Note: Hard to look at higher dimensions, can only visualize
2 dimensions at a time.
[Figure legend: Perfect Fit: Y ~ 3 + 2*X1; models: Y ~ 1, Y ~ X1, Y ~ X2, Y ~ X1 + X2]
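A minimal Python sketch of the greedy forward step of stepwise selection (an illustration, not the poster's actual R code; the simulated data, the seed, and the AIC-based scoring are assumptions):

```python
import numpy as np

def aic(y, yhat, k):
    """AIC for a Gaussian linear model; k = coefficients + 1 (for sigma)."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

def fit(X, y):
    """Least-squares fitted values for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def forward_stepwise(X, y):
    """Greedy forward selection: add the variable that lowers AIC most."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected, remaining = [], list(range(p))
    best = aic(y, fit(intercept, y), 2)          # intercept-only model
    improved = True
    while improved and remaining:
        improved = False
        scores = [(aic(y, fit(np.hstack([intercept, X[:, selected + [j]]]), y),
                       len(selected) + 3), j) for j in remaining]
        score, j = min(scores)
        if score < best:
            best, improved = score, True
            selected.append(j)
            remaining.remove(j)
    return sorted(selected)

rng = np.random.default_rng(0)                    # assumed seed
X = rng.normal(size=(60, 6))                      # six candidate predictors
y = 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(size=60)
print(forward_stepwise(X, y))                     # variables 0 and 1 should appear
```

Backward elimination mirrors this loop by dropping the variable whose removal lowers AIC most; the "Both" variant alternates the two kinds of step.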
Hierarchical Clustering is done on the PC projections
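This step can be sketched with SciPy's complete-linkage agglomerative clustering (the analogue of R's hclust with the "complete" method); for brevity the sketch clusters the raw fitted-value vectors rather than their PC projection, and the toy two-variable data and seed are assumptions:

```python
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)                   # assumed seed
n = 60
X = rng.normal(size=(n, 2))                      # two hypothetical predictors
y = 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(size=n)

# One fitted-value vector per subset of the variables (4 models)
models, fitted = [], []
for r in range(3):
    for subset in combinations(range(2), r):
        cols = np.hstack([np.ones((n, 1)), X[:, list(subset)]])
        beta, *_ = np.linalg.lstsq(cols, y, rcond=None)
        models.append(subset)
        fitted.append(cols @ beta)

# Complete-linkage clustering of the models, cut into 2 clusters
Z = linkage(np.array(fitted), method="complete")
labels = fcluster(Z, t=2, criterion="maxclust")
for m, lab in zip(models, labels):
    print(m, lab)
```

Models whose fitted values are close in n-dimensional space merge early in the dendrogram, so the "good" models (those containing the strong predictor) end up in one cluster.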
How do we normally build/choose a model?
Illustration of Idea
We have two predictor variables Xi1, Xi2, i = 1,2,3 :
Truth model: Yi = 3 + 2Xi1 + εi,  εi ~ N(0,1)
Perfect model: Yi* = 3 + 2Xi1
Fitted model (red line): Ŷi = 2.83 + 2.19Xi1
(recall 4 possible models from previous panel)
• Pairs plot: impossible to show all plots in one graph,
instead we show two selected pairs of dimensions representing two
cross sections of the model space
• Hierarchical Clustering
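The toy example can be reproduced in a few lines; the specific Xi1 design points and the random seed below are assumptions, since the poster only gives the generating model Yi = 3 + 2Xi1 + rnorm(3,0,1):

```python
import numpy as np

rng = np.random.default_rng(2)          # assumed seed
X1 = np.array([-2.0, -1.0, 0.0])        # hypothetical design points
X2 = rng.normal(size=3)
y_star = 3 + 2 * X1                     # perfect (noiseless) values Yi*
y = y_star + rng.normal(size=3)         # Yi = 3 + 2*Xi1 + rnorm(3,0,1)

def fitted(cols):
    """Least-squares fitted values for the given design matrix."""
    beta, *_ = np.linalg.lstsq(cols, y, rcond=None)
    return cols @ beta

one = np.ones(3)
models = {
    "Y~1":     fitted(np.column_stack([one])),
    "Y~X1":    fitted(np.column_stack([one, X1])),
    "Y~X2":    fitted(np.column_stack([one, X2])),
    "Y~X1+X2": fitted(np.column_stack([one, X1, X2])),
}
for name, f in models.items():
    print(name, np.round(f, 2),
          "distance to Y*:", round(float(np.linalg.norm(f - y_star)), 2))
```

Each model's 3-vector of fitted values is one point in the model space; its distance to Yi* shows how close the model predicts to the truth.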
• What does it look like graphically?
Our questions:
• Do models cluster?
Are there distinct “groups” of models with similar predictability?
• Are there complicated models that could be replaced
by simpler models?
• How is stepwise doing?
Perfect model in green, stepwise chosen models in blue, model
with the right variables in red

• Estimated Regression Function
Ŷi = β̂0 + β̂1Xi,1 + β̂2Xi,2 + ... + β̂p-1Xi,p-1
where β̂ = (XᵀX)⁻¹XᵀY, found by the method of least squares

2^(p-1) possible models, each with n fitted values
2^(p-1) observations in n-dimensional space
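A minimal numerical check of the least-squares formula β̂ = (XᵀX)⁻¹XᵀY (the simulated design, coefficients, and seed are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)           # assumed seed
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 predictors
beta_true = np.array([3.0, 2.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# Solve the normal equations (X'X) beta_hat = X'y instead of forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat                     # the model's fitted values Y-hat
print(np.round(beta_hat, 2))
```

The least-squares residuals are orthogonal to every column of X, which is a quick way to verify the solution.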
We use a heat map of the kernel density estimate of the model
space (red - low density, white/yellow - high density)

β0 : E[Yi] when all Xi,j = 0
βj : Change in E[Yi] for one unit increase in Xi,j (all other variables fixed)
j = 0,1,2,…,p-1; p = number of parameters; p-1 variables

We look at the Linear Regression Model Space :
Visualization of Model Space:
• Represent each model by its nx1 vector of fitted values Ŷ
• Models that predict similar values are close (in space)
• Principal Component (PC) Projection: We randomly sampled 60
suburbs, since more models than observations are needed to run PC

Yi = 2Xi1 + 3Xi2 + rnorm(60,0,1)
We have 2^6 = 64 possible models, model space is 64x60 dimensions
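Building the 64x60 model space and its PC projection can be sketched as follows, mirroring the simulation Yi = 2Xi1 + 3Xi2 + rnorm(60,0,1); the seed and the plain-SVD implementation of the PC projection are assumptions:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)                        # assumed seed
n, p = 60, 6
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(size=n)    # rnorm(60,0,1)

# One fitted-value vector per subset of the 6 variables: 2^6 = 64 models
fitted = []
for r in range(p + 1):
    for subset in combinations(range(p), r):
        cols = np.hstack([np.ones((n, 1)), X[:, list(subset)]])
        beta, *_ = np.linalg.lstsq(cols, y, rcond=None)
        fitted.append(cols @ beta)
F = np.array(fitted)                                  # 64 x 60 model space

# PC projection: SVD of the centered model space
Fc = F - F.mean(axis=0)
U, S, Vt = np.linalg.svd(Fc, full_matrices=False)
pcs = Fc @ Vt[:2].T                                   # first two PCs
print(F.shape, pcs.shape)                             # (64, 60) (64, 2)
```

The 64 rows of `pcs` are the 2-D coordinates that the pairs plots, heat maps, and hierarchical clustering operate on.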
Characterizing the models:
• Regression Model
Yi = β0 + β1Xi,1 + β2Xi,2 + ... + βp-1Xi,p-1 + εi,  εi ~ N(0, σ²)
i = 1,2,…,n observations

Perfect model: Yi* = 2Xi1 + 3Xi2
We have six predictor variables Xi1, Xi2, Xi3, Xi4, Xi5, Xi6 , i = 1,2,…,60

Predicting the median value of owner-occupied homes in $1000 for
506 suburbs of Boston
Selected predictor variables: crime rate, average # of rooms, distance
to employment centers, proportion of blacks, accessibility to
highways, and nitrogen oxides concentration
• Stepwise regression is greedy, does not necessarily search the
entire model space
• Could have very complicated models that do not predict much
better than simpler models
What is Linear Regression?
Boston Housing Data
Simulation with 60 Data Points
Issues with current model search criteria
Introduction
• The blue and red models predict more similar values and are
closer to the perfect fit (brown) in model space
• The blue and red models contain the correct predictor variable X1
• The black model does not contain any predictor variable and thus
is the furthest from the perfect fit
• PC projection is more useful to visualize higher dimensions
Three clusters of models, one group of models predicts closely
to the truth, the other two groups do not.
Stepwise behaves similarly in PC projection as in pairs plot
• Increasing the number of observations increases the
dimensions; increasing the number of variables drastically
increases the number of models
Note: relying on projection, hence does not necessarily capture all
the structure/information
Future: Want to better characterize the clusters/model spaces