Gibbs Variable Selection
Xiaomei Pan, Kellie Poulin, Jigang Yang, Jianjun Zhu
Topics
1. Overview of Variable Selection Procedures
2. Gibbs Variable Selection (GVS)
3. How to implement in WinBUGS
4. Recommendations
5. Appendices
Overview of Variable Selection Procedures
• Selecting the best model
– The best likelihood
– The link function
– Priors
– Variable Selection
Overview of Variable Selection Procedures
• The type of response variable typically narrows our choices for:
– The best likelihood
– The link function
• Also, we often have only a relatively small number of priors to choose from
– If no prior information is known, non-informative priors are chosen for the model parameters
– If prior information is known, this also narrows our choices for priors
Overview of Variable Selection Procedures
• However, there may be many choices for which variables should be included in the model
– In many real-world problems the number of candidate variables runs into the tens or hundreds
– For example, 10 candidate variables already give 2^10 = 1,024 possible linear models (see the quick check below)
– Even more models are possible if interactions and non-linear terms are considered!
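As a quick check of that count in R (the language of the helper code in Appendix 5), the number of possible subset models grows as 2^p:

# Number of possible subset models for p candidate variables
p <- c(3, 10, 20)
2^p   # 8, 1024, and over a million models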
Overview of Variable Selection Procedures
• Things to keep in mind when doing variable selection
– P-values for the selected variables are no longer valid
• Variable selection is a form of “data snooping” (recall “multiple comparison procedures”)
– Correlation between the predictors can lead to less than optimal results
– It is best to use cross-validation techniques as a safeguard
– Variable selection is best used for “prediction models” rather than “effect models”
Overview of Variable Selection Procedures
• Variable Selection Methods
• Frequentists use methods such as
– Stepwise Regression
– Mallows’ Cp
– Maximum R-squared
• What methods do Bayesians use?
Overview of Bayesian Variable Selection Procedures

Method | Used For | Ease of Use | References*
Gibbs Variable Selection (GVS) | Variable selection | Moderately easy | 2, 3, 7
SSVS (Stochastic Search Variable Selection) | Variable selection | Somewhat difficult | 2, 5
Kuo and Mallick (Unconditional Priors for Variable Selection) | Variable selection | Very easy | 2, 6
Carlin and Chib | General model selection | Moderately easy | 1, 2
Reversible Jump | Mostly variable selection | Moderately difficult | 4; also see Matt Bognar’s thesis in 241 SH

* Refer to the references slide.
Overview of Bayesian Variable Selection Procedures

Gibbs Variable Selection (GVS)
– Advantages: Pseudo-priors do not affect the posterior distribution. Easy to implement using WinBUGS.
– Disadvantages: Requires pseudo-priors on all model coefficients, whose sole function is to increase the efficiency of the sampler.

SSVS (Stochastic Search Variable Selection)
– Advantages: Can fit a wide variety of models. Allows the user to indicate which models they think are more likely.
– Disadvantages: Requires pseudo-priors on all model coefficients and candidate models.

Kuo and Mallick (Unconditional Priors for Variable Selection)
– Advantages: Extremely straightforward; only the usual prior on the full parameter vector (for the full model) must be specified, and the conditional prior distributions replace the pseudo-priors.
– Disadvantages: No flexibility to alter the method to improve efficiency. If, for any parameter, the prior is diffuse compared with the posterior, the method becomes inefficient.

Carlin and Chib
– Advantages: Flexible Gibbs sampling strategy for any situation involving model uncertainty.
– Disadvantages: Computationally demanding. Specifying efficient pseudo-priors becomes too time consuming if there are a large number of models under consideration.

Reversible Jump*
– Advantages: No need for pseudo-priors. May be faster than GVS.
– Disadvantages: Diffuse priors will often lead to the model with the fewest parameters being chosen. Cannot be implemented in WinBUGS.

* Thanks to Dr. Matt Bognar for insights into reversible jump. His thesis in 241 SH contains examples of using this method.
Gibbs Variable Selection
• GVS Sampling Procedure
Likelihood:
Y[i] ~ dnorm(mu[i], tau)
mu[i] = β0 + γ1·β1·X1[i] + γ2·β2·X2[i] + … + γp·βp·Xp[i]

Priors:
γj ~ dbern(0.5)   # γj = 1 means variable j is selected
βj ~ γj·(real prior) + (1 − γj)·(pseudo-prior)
tau ~ dgamma(1.0E-3, 1.0E-3)
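To make this structure concrete, here is a small R sketch (our illustration, with made-up pseudo-prior values, not part of the original code): the indicator γj switches variable j in or out of the linear predictor, and switches βj between its diffuse real prior and a pseudo-prior centered at a full-model estimate:

# Toy illustration of one GVS draw (not a full sampler)
set.seed(42)
p <- 3
x <- matrix(rnorm(50 * p), ncol = p)
g <- rbinom(p, 1, 0.5)                   # gamma[j] ~ Bernoulli(0.5)
pseudo_mean <- c(2.0, 0.0, 0.7)          # would come from a full-model fit
pseudo_se   <- c(0.1, 0.1, 0.1)
beta <- ifelse(g == 1,
               rnorm(p, 0, sqrt(1/0.001)),        # real (diffuse) prior
               rnorm(p, pseudo_mean, pseudo_se))  # pseudo-prior
mu <- 1 + as.vector(x %*% (g * beta))    # only selected variables enter mu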
How to Implement in WinBUGS
• We adapted code from Ntzoufras, I. (2003)
– The code and the paper are available on the WinBUGS web site
– The example showed variable selection for a model with 3 candidate predictor variables
– That code required extensive modification before it could be used with other data
How to Implement in WinBUGS
• Our WinBUGS code
– Requires no changes to the model specification
– The user must only insert their data and modify the initial values:
• p = number of x variables
• N = number of observations
• models = number of models (2^p)
• initial values for the betas
How to Implement in WinBUGS
• Provided at the end of this presentation:
– Full WinBUGS code for variable selection
– Code for fitting the full model in WinBUGS (for development of the pseudo-priors)
• This code also only requires the user to change the data and initial values
– R code to assist in interpreting output from WinBUGS
– SAS code used to develop the sample data
How to Implement in WinBUGS
EXAMPLE
• Data used for example
– Validated our code using published results for a
model with 3 variables
– Created a simulated data set with 500 observations
and 10 predictors
• 9 predictors were continuous
• 1 predictor was binary
• Created a version of the file with correlation between
predictors to test robustness of GVS to non-orthogonal data
– Code used to create simulated data is in Appendix 1
– Full correlation matrix (for correlated data) is in
Appendix 2
How to Implement in WinBUGS
Target model used to create the simulated data:
Y = 2*X1 + .7*X6 + .22*X10 - .7*X5 + random error (Normal(0, 1))
How to implement in WinBUGS
Prior to Running Our Code
• Step 1: Standardize the X matrix and (optionally) orthogonalize it** (** see recommendations)
• Step 2: Fit the full model to develop information for the pseudo-priors
• Step 3: Determine the likelihoods and priors
How to Implement GVS in WinBUGS
Step 1
• Standardize the X matrix
– Centering the covariates removes correlation between the model coefficients
– Dividing by the standard deviation allows coefficients to be compared on the same scale
– It may also make it easier to assign proper non-informative priors
– If the user wants to use informative priors, this may be a nuisance
How to Implement GVS in WinBUGS
Step 1
• Standardize the X matrix
– In SAS:
proc standard data=input_file_name mean=0 std=1
              out=output_file_name;
  var variable_names;
run;
– In WinBUGS:
# This is at the top of the model code and is done automatically
for (j in 1:p) {
  b[j] <- beta[j]/sd(x[,j])
  for (i in 1:N) {
    z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j]) }
  temp_mean[j] <- b[j]*mean(x[,j]) }
b0 <- beta0 - sum(temp_mean[])
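As a cross-check of this back-transformation, the following R sketch (ours, on a small simulated data set) confirms that b[j] = beta[j]/sd(x[,j]) and b0 = beta0 - sum(b[j]*mean(x[,j])) recover the raw-scale coefficients:

# Verify the back-transformation used above on a toy regression
set.seed(1)
x <- matrix(rnorm(100 * 2, mean = 5), ncol = 2)
y <- 1 + 2*x[,1] - 0.5*x[,2] + rnorm(100)
fit_z <- lm(y ~ scale(x))               # fit on standardized covariates
beta0 <- coef(fit_z)[1]
beta  <- coef(fit_z)[-1]
b  <- beta / apply(x, 2, sd)            # b[j] <- beta[j]/sd(x[,j])
b0 <- beta0 - sum(b * colMeans(x))      # b0 <- beta0 - sum(temp_mean[])
rbind(raw_fit = coef(lm(y ~ x)), back_transformed = c(b0, b))  # rows agree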
How to Implement GVS in WinBUGS
Step 1
• Orthogonalize the X matrix**
– This is recommended by those who designed GVS to make variable selection more accurate
– However, it makes interpretation of the coefficients impossible
– We do not include this step in the WinBUGS code, but suggest some alternative methods for handling correlation in our recommendations
How to Implement GVS in WinBUGS
Step 1
• Orthogonalize the X matrix**
– In our example we simulated data with correlated predictors, and GVS seems able to handle some correlation
– More analysis is needed to determine the sensitivity of the procedure to correlations in the X matrix
– We recommend variable clustering to remove highly correlated variables
– If you do want to orthogonalize your X matrix, this is the appropriate SAS code:
proc princomp data=input_file_name out=output_file_name;
  var variable_names;
run;
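For reference, a rough R analogue of that SAS step (our sketch; x is assumed to be your n-by-p covariate matrix) uses the principal-component scores as an orthogonal design:

# Replace correlated covariates with orthogonal principal-component scores.
# The scores are uncorrelated by construction, but each one mixes all of the
# original x's, which is why the coefficients lose their interpretation.
pc <- prcomp(x, center = TRUE, scale. = TRUE)
x_orth <- pc$x                  # orthogonal design matrix
round(cor(x_orth), 10)          # identity matrix, up to rounding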
How to Implement GVS in WinBUGS
Step 2
• Fit the full model to get the pseudo-priors
– In SAS:
proc reg data=input_file_name;
  model y=variable_list;
run;
– In WinBUGS (full code is at the end of this document):
model {
  # standardization code goes here
  # likelihood
  for (i in 1:N) {
    Y[i] ~ dnorm(mu[i], tau)
    for (j in 1:p) {
      temp[i,j] <- beta[j]*z[i,j] }
    mu[i] <- beta0 + sum(temp[i,]) }
  # priors
  beta0 ~ dnorm(0, 0.00001)
  for (j in 1:p) {
    beta[j] ~ dnorm(0, 1.0E-6) }
  tau ~ dgamma(1.0E-3, 1.0E-3)
  sigma <- sqrt(1/tau)
}
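The same full-model fit can also be done in R if that is more convenient; a sketch (ours), assuming y and x hold the response and the n-by-p covariate matrix:

# Fit the full model on standardized covariates; the estimates and standard
# errors become the pseudo-prior means and se's in the GVS DATA block
fit <- lm(y ~ scale(x))
pseudo <- summary(fit)$coefficients[-1, 1:2]   # drop intercept; keep mean, se
colnames(pseudo) <- c("mean", "se")
pseudo                                         # paste into the DATA block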
How to Implement GVS in WinBUGS
Step 2
• Fit full model to get Pseudo-priors
– In SAS
Parameter Estimates

Variable   DF  Estimate   Std. Error  t Value  Pr > |t|
Intercept  1    7.99442   0.04431     180.42   <.0001
x1         1    2.03194   0.05115      39.72   <.0001
x2         1   -0.01728   0.05096      -0.34   0.7347
x3         1   -0.09008   0.04458      -2.02   0.0439
x4         1    0.02241   0.04495       0.50   0.6183
x5         1   -0.38520   0.04590      -8.39   <.0001
x6         1    0.71948   0.04489      16.03   <.0001
x7         1    0.03432   0.04496       0.76   0.4456
x8         1    0.04153   0.04471       0.93   0.3534
x9         1   -0.01484   0.04682      -0.32   0.7515
x10        1    0.20915   0.04488       4.66   <.0001
How to Implement GVS in WinBUGS
Step 2
• Fit full model to get Pseudo-priors
– In WinBUGS (note the estimates are very similar, and SAS runs in a fraction of the time):

node      mean      sd       MC error  2.5%      median    97.5%     start  sample
beta[1]    2.031    0.0511   4.74E-04   1.931     2.032     2.132    500    9501
beta[2]   -0.0171   0.05108  5.09E-04  -0.1142   -0.01757   0.08328  500    9501
beta[3]   -0.09102  0.04418  4.50E-04  -0.1777   -0.09149  -0.00371  500    9501
beta[4]    0.02227  0.04458  4.06E-04  -0.06568   0.02315   0.1091   500    9501
beta[5]   -0.3856   0.04629  4.51E-04  -0.4772   -0.386    -0.2954   500    9501
beta[6]    0.7189   0.04544  4.43E-04   0.6307    0.7186    0.8083   500    9501
beta[7]    0.03404  0.04576  5.22E-04  -0.05454   0.0342    0.1245   500    9501
beta[8]    0.04132  0.04451  4.03E-04  -0.04581   0.04108   0.1306   500    9501
beta[9]   -0.01492  0.04676  4.58E-04  -0.1063   -0.01505   0.07692  500    9501
beta[10]   0.2097   0.04496  4.44E-04   0.1207    0.2097    0.2975   500    9501
How to Implement GVS in WinBUGS
Step 3
• Determine the Likelihood and Priors
– Our GVS code is set up with the standard likelihood and non-informative priors
– Edit the GVS code before running it to insert the data for the pseudo-priors
How to implement in WinBUGS
Running Our Adapted Code
model {
  # Standardize x's and coefficients
  for (j in 1:p) {
    b[j] <- beta[j]/sd(x[,j]) ;
    for (i in 1:N) {
      z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j]) ;
    }
    temp_mean[j] <- b[j]*mean(x[,j])
  }
  b0 <- beta0 - sum(temp_mean[])
How to implement in WinBUGS
  # likelihood
  for (i in 1:N) {
    Y[i] ~ dnorm(mu[i], tau)
    for (j in 1:p) {
      temp[i,j] <- g[j]*beta[j]*z[i,j]
    }
    mu[i] <- beta0 + sum(temp[i,])
    # residuals
    stres[i] <- (Y[i] - mu[i])/sigma
    # if |standardized residual| > 2.5, flag as outlier
    outlier[i] <- step(stres[i] - 2.5) + step(-(stres[i] + 2.5))
  }
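As an aside, the outlier rule above is easy to reproduce in R on a frequentist fit (our sketch, assuming fit is an lm object for the full model):

# Flag observations with |standardized residual| > 2.5, mirroring the
# WinBUGS line: outlier[i] <- step(stres[i] - 2.5) + step(-(stres[i] + 2.5))
outliers <- which(abs(rstandard(fit)) > 2.5)
outliers   # indices of flagged observations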
How to implement in WinBUGS
  for (j in 1:p) {
    # Create indicators for the possible variables in the model;
    # variable j contributes pow(2, j-1) to the model number, so each
    # subset of variables gets a unique number
    TempIndicator[j] <- g[j]*pow(2, j-1)
  }
  # Create a model number for each possible model
  mdl <- 1 + sum(TempIndicator[])
  # calculate the percentage of time each model is selected
  for (j in 1:models) {
    pmdl[j] <- equals(mdl, j)
  }
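The binary encoding is easy to invert outside WinBUGS; this R sketch (ours) converts an inclusion vector g into the model number used by pmdl and back, matching the print.ind() output in Appendix 5:

# Model number from inclusion indicators: mdl = 1 + sum(g[j]*2^(j-1))
model_number <- function(g) 1 + sum(g * 2^(seq_along(g) - 1))
# Inverse: which variables are in model number mdl (of p candidates)?
model_vars <- function(mdl, p) which(bitwAnd(mdl - 1, 2^(0:(p - 1))) > 0)

g <- c(1,0,0,0,1,1,0,0,0,1)   # x1, x5, x6, x10 selected
model_number(g)               # 562: the pmdl[562] seen in our output
model_vars(562, 10)           # 1 5 6 10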
How to implement in WinBUGS
  # diffuse normal prior on the intercept
  beta0 ~ dnorm(0, 0.00001)
  # if the parameter is not in the model, an informative pseudo-prior is used;
  # it is calculated from a prior run of the full model
  for (j in 1:p) {
    bprior[j] <- (1 - g[j])*mean[j]
    tprior[j] <- g[j]*0.001 + (1 - g[j])/(se[j]*se[j])
    beta[j] ~ dnorm(bprior[j], tprior[j])
    g[j] ~ dbern(0.5)
  }
  tau ~ dgamma(1.0E-3, 1.0E-3)
  sigma <- sqrt(1/tau)
}
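To see what the two prior lines do, here is the same arithmetic in R (our illustration, using the first row of the pseudo-prior data below): with g[j] = 1 the coefficient gets the diffuse N(0, 1/0.001) prior; with g[j] = 0 it gets the pseudo-prior N(mean[j], se[j]^2):

# Prior mean and precision for beta[j] as a function of g[j]
gvs_prior <- function(g, m, se) {
  c(mean = (1 - g)*m,                      # bprior[j]
    precision = g*0.001 + (1 - g)/se^2)    # tprior[j]
}
gvs_prior(g = 1, m = 2.013, se = 0.112)    # in the model:  N(0, 1/0.001)
gvs_prior(g = 0, m = 2.013, se = 0.112)    # out of model:  N(2.013, 0.112^2)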
How to implement in WinBUGS
# Fit the full model and put the information for the mean and se of each
# standardized beta here for use in the pseudo-priors
DATA
mean[]  se[]
 2.013  0.112
 0.149  0.112
-0.259  0.094
-0.161  0.096
-0.293  0.134
 0.564  0.097
 0.094  0.098
-0.023  0.097
-0.018  0.133
 0.214  0.088
END
How to implement in WinBUGS
# Set the initial values for the beta's, the overall
# precision and the variable selection indicators
Initial Values
list(beta0 = 0, beta = c(0,0,0,0,0,0,0,0,0,0), tau = .1, g = c(1,1,1,1,1,1,1,1,1,1))

# p = number of parameters
# N = the number of obs
# models = the number of possible models (2^p)
DATA
list(p = 10, N = 500, models = 1024,
     Y = c(6.339, …
How to implement in WinBUGS
• Output
– You should monitor at least all of the betas and pmdl, which is the percentage of the time each model was picked
– Models will be numbered 1 – 1024 in our example (the total number of models)
– To see which model number corresponds to which variables selected, we created an R function that outputs the model names in order (Appendix 5)
• To use this function, type print.ind(number of x variables); for example: print.ind(10)
How to implement in WinBUGS
Correlated Data
• Our example
– Ran 50,000 iterations
– Some models were not visited
– beta are the standardized coefficients; b are the raw coefficients
– Simulated values: b1 = 2, b5 = -.7, b6 = .7, b10 = .22
node      mean      sd       MC error  2.5%      median    97.5%
b[1]       1.99     0.04395  2.14E-04   1.904     1.99      2.076
b[2]      -0.01494  0.04601  2.16E-04  -0.1054   -0.0151    0.07503
b[3]      -0.09533  0.04703  2.03E-04  -0.1878   -0.09507  -0.00378
b[4]       0.02211  0.04402  2.08E-04  -0.06457   0.02212   0.1083
b[5]      -0.7717   0.08926  3.88E-04  -0.9483   -0.7714   -0.5961
b[6]       0.7387   0.04553  2.04E-04   0.65      0.7388    0.8278
b[7]       0.035    0.04602  2.23E-04  -0.05527   0.03524   0.1248
b[8]       0.04097  0.04408  1.92E-04  -0.04531   0.04119   0.128
b[9]      -0.0145   0.04446  1.99E-04  -0.1007   -0.01441   0.07348
b[10]      0.2126   0.04531  1.94E-04   0.1234    0.2125    0.3015
beta[1]    2.016    0.04451  2.17E-04   1.929     2.016     2.102
beta[2]   -0.01651  0.05087  2.39E-04  -0.1165   -0.01669   0.08294
beta[3]   -0.09032  0.04456  1.92E-04  -0.1779   -0.09007  -0.00358
beta[4]    0.0225   0.0448   2.11E-04  -0.06572   0.02251   0.1102
beta[5]   -0.3862   0.04467  1.94E-04  -0.4745   -0.386    -0.2983
beta[6]    0.7196   0.04435  1.99E-04   0.6332    0.7197    0.8064
beta[7]    0.03415  0.0449   2.17E-04  -0.05393   0.03439   0.1217
beta[8]    0.04161  0.04477  1.95E-04  -0.04602   0.04184   0.13
beta[9]   -0.01521  0.04663  2.09E-04  -0.1056   -0.01511   0.07707
beta[10]   0.2096   0.04467  1.91E-04   0.1216    0.2094    0.2972
How to implement in WinBUGS
• Our example
– The model visited 97% of the time was the target model
– This is due to the strong correlations between the response and the predictors
– We also modeled data with weaker correlations
• In those cases the target model was usually in the top 5 models, and several models were visited with higher frequency
Correlated Data

> print.ind(10)   # model names, printed in model-number order

model                node       mean      sd       MC error
x1+ x5+ x6+ x10      pmdl[562]  0.9693    0.1726   7.61E-04
x1+ x5+ x6           pmdl[50]   0.01232   0.1103   4.93E-04
x1+ x3+ x5+ x6+ x10  pmdl[566]  0.009555  0.09728  4.04E-04
x1+ x5+ x6+ x7+ x10  pmdl[626]  0.00204   0.04512  1.89E-04
x1+ x5+ x6+ x8+ x10  pmdl[690]  0.00202   0.0449   2.05E-04
x1+ x4+ x5+ x6+ x10  pmdl[570]  0.001778  0.04213  1.82E-04
x1+ x5+ x6+ x9+ x10  pmdl[818]  0.001333  0.03649  1.70E-04
x1+ x2+ x5+ x6+ x10  pmdl[564]  0.001252  0.03537  1.52E-04
x1+ x3+ x5+ x6       pmdl[54]   1.01E-04  0.01005  4.47E-05
How to implement in WinBUGS
• Our example
– The frequentist stepwise method picks the appropriate model but also includes x3 at the .05 level
– Note that the p-value of .0473 is not accurate (since we were data snooping)
– Parameter estimates are very similar
Correlated Data
Var.
Entered
Step
Partial
R-Square
Model
R-Square
C(p)
F Value
Pr > F
1
x1
0.7173
0.7173
354.643
1263.5
<.0001
2
x6
0.0874
0.8047
93.6354
222.44
<.0001
3
x5
0.0234
0.8281
25.2357
67.51
<.0001
4
x10
0.0074
0.8355
5.0224
22.21
<.0001
5
x3
0.0013
0.8368
3.091
3.95
0.0473
Variable
Estimate
Intercept
0.19748
x1
SE
Type II SS
F Value
Pr > F
0.55106
0.12533
0.13
1.99288
0.04382
2018.143
2067.97
x3
-0.09303
0.04678
3.85938
3.95
x5
-0.77756
0.08863
75.11717
76.97
<.0001
x6
0.7403
0.04546
258.8251
265.22
<.0001
x10
0.20933
0.04509
21.03744
21.56
<.0001
0.7202
<.0001
0.0473
How to implement in WinBUGS
Non-Correlated Data
• Our example
– Ran 50,000 iterations
– Some models were not visited
– beta are the standardized coefficients; b are the raw coefficients
– Simulated values: b1 = 2, b5 = -.7, b6 = .7, b10 = .22
node      mean      sd       MC error  2.5%      median    97.5%    start  sample
b[1]       2.039    0.04615  2.20E-04   1.948     2.039     2.129   500    49501
b[2]       0.187    0.1436   6.75E-04  -0.09499   0.1865    0.4679  500    49501
b[3]      -0.1483   0.1363   5.91E-04  -0.4163   -0.1475    0.1169  500    49501
b[4]      -0.1588   0.1387   6.55E-04  -0.432    -0.1587    0.1124  500    49501
b[5]      -0.5456   0.09244  4.02E-04  -0.729    -0.5455   -0.3641  500    49501
b[6]       0.7378   0.0473   2.11E-04   0.6455    0.7377    0.8308  500    49501
b[7]       0.00314  0.1303   6.29E-04  -0.2525    0.003829  0.2571  500    49501
b[8]       0.01064  0.1402   6.12E-04  -0.2638    0.01136   0.2876  500    49501
b[9]       0.1248   0.1325   5.93E-04  -0.1318    0.1249    0.3871  500    49501
b[10]      0.3615   0.048    2.02E-04   0.2668    0.3613    0.4557  500    49501
beta[1]    2.037    0.0461   2.20E-04   1.947     2.037     2.127   500    49501
beta[2]    0.1781   0.1368   6.43E-04  -0.09047   0.1776    0.4457  500    49501
beta[3]   -0.1472   0.1354   5.87E-04  -0.4134   -0.1465    0.1161  500    49501
beta[4]   -0.1543   0.1348   6.37E-04  -0.4199   -0.1542    0.1093  500    49501
beta[5]   -0.2731   0.04627  2.01E-04  -0.3649   -0.273    -0.1822  500    49501
beta[6]    0.7182   0.04604  2.05E-04   0.6283    0.7181    0.8087  500    49501
beta[7]    0.00325  0.135    6.52E-04  -0.2617    0.00397   0.2665  500    49501
beta[8]    0.01025  0.1351   5.89E-04  -0.2542    0.01094   0.2771  500    49501
beta[9]    0.126    0.1339   5.99E-04  -0.1331    0.1262    0.391   500    49501
beta[10]   0.3483   0.04625  1.94E-04   0.2571    0.3481    0.439   500    49501
How to implement in WinBUGS
• Our example
– Oddly, the coefficients for x10 and x5 are farther off than in the correlated data; this is by chance (found on review of the simulated data)
– The model visited most often is the target model
– This model is visited a slightly higher percentage of the time than in the correlated data
Non-Correlated Data

> print.ind(10)   # model names, printed in model-number order

model                    node       mean      sd
x1+ x5+ x6+ x10          pmdl[562]  0.9904    0.09728
x1+ x4+ x5+ x6+ x10      pmdl[570]  0.002424  0.04918
x1+ x5+ x6+ x9+ x10      pmdl[818]  0.001616  0.04017
x1+ x5+ x6+ x7+ x10      pmdl[626]  0.001535  0.03915
x1+ x5+ x6+ x8+ x10      pmdl[690]  0.001455  0.03811
x1+ x2+ x5+ x6+ x10      pmdl[564]  0.001293  0.03593
x1+ x3+ x5+ x6+ x10      pmdl[566]  0.001111  0.03331
x1+ x6+ x10              pmdl[546]  6.06E-05  0.007785
x1+ x4+ x5+ x6+ x7+ x10  pmdl[634]  2.02E-05  0.004495
x1+ x4+ x5+ x6+ x8+ x10  pmdl[698]  2.02E-05  0.004495
x1+ x5+ x6+ x7+ x8+ x10  pmdl[754]  2.02E-05  0.004495
How to implement in WinBUGS
• Our example
– The frequentist stepwise method picks the appropriate model
– Parameter estimates are very similar
Non-Correlated Data

Step  Var. Entered  Partial R-Square  Model R-Square  C(p)     F Value  Pr > F
1     x1            0.695             0.695           335.019  1134.74  <.0001
2     x6            0.0914            0.7863          88.1258  212.5    <.0001
3     x10           0.0209            0.8072          33.2824  53.67    <.0001
4     x5            0.0128            0.82            0.5196   35.08    <.0001

Variable   Estimate  SE       Type II SS  F Value  Pr > F
Intercept  -1.68739  0.55768     9.68545     9.16  0.0026
x1          2.03861  0.04613  2065.806    1952.68  <.0001
x5         -0.54495  0.09201    37.11277     35.08  <.0001
x6          0.73788  0.04734   256.9933     242.92  <.0001
x10         0.36191  0.04785    60.5075      57.19  <.0001
Recommendations
• Orthogonalizing the X matrix makes interpretation of the coefficients impossible
– If the correlations between the response and the explanatory variables are stronger than the correlations among the explanatory variables, you may not need to orthogonalize the X matrix
– If correlations within the X matrix are high, we recommend using a variable clustering procedure and then picking one explanatory variable from each cluster
• In SAS, use PROC VARCLUS (a rough R analogue is sketched below)
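A rough R analogue of that clustering step (our sketch, not a VARCLUS port; x is assumed to be a covariate matrix with column names) groups variables by absolute correlation and keeps one representative per cluster:

# Cluster variables on 1 - |r|; cutting at height 0.5 roughly groups
# variables whose pairwise |correlation| exceeds 0.5
d  <- as.dist(1 - abs(cor(x)))
cl <- cutree(hclust(d, method = "average"), h = 0.5)
split(colnames(x), cl)   # inspect clusters; keep one variable from each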
Recommendations
• Remember that the number of candidate models is 2 raised to the number of variables
– In our example, 2^10 = 1,024
– The sampler may need to run for many iterations to visit all of the candidate models
Conclusions
• Using the adapted WinBUGS code, it is now easy to implement GVS in the standard regression setting
• We are also working on adapting other code from the Ntzoufras paper (such as ridge regression) to make it easy to implement
• In the case of standard linear regression with non-informative priors, the GVS method appears to give almost identical results to frequentist methods implemented in SAS
– SAS took seconds to fit these models, while WinBUGS took hours
• The additional time to use GVS may only be warranted when informative priors are wanted
Appendix 1 Simulated Data Creation Code
*Correlated Data;
data hw.simulate (drop= i j);
  array x{10};
  do i= 1 to 500;
    * Make 10 normal(j,1) variables;
    do j= 1 to 10;
      x{j}=rannor(12458)+j;
    end;
    * Turn one into a binary variable;
    if x5 le 5 then x5=1;
    else x5=0;
    * Create correlation between two predictors;
    x2=.5*x1+rannor(5);
    if x5=1 then x9=rannor(5)+.5;
    else x9=rannor(5);
    * Create y from x1, x5, x6, and x10;
    y=2*x1+.7*x6+.22*x10-.7*x5+rannor(1);
    output;
  end;
run;

*Non-correlated Data;
data hw.simulate_nocorr (drop= i j);
  array x{10};
  do i= 1 to 500;
    * Make 10 normal(j,1) variables;
    do j= 1 to 10;
      x{j}=rannor(12458)+j;
    end;
    * Turn one into a binary variable;
    if x5 le 5 then x5=1;
    else x5=0;
    * Create y from x1, x5, x6, and x10;
    y=2*x1+.7*x6+.22*x10-.7*x5+rannor(1);
    output;
  end;
run;
Appendix 2 Simulated Data Correlation Matrix
      x1      x2      x3      x4      x5      x6      x7      x8      x9      x10
x2    0.479
x3    0.032  -0.027
x4    0.025  -0.012  -0.031
x5    0.004  -0.028  -0.038   0.054
x6    0.047   0.109   0.019   0.015  -0.007
x7   -0.055  -0.002   0.023  -0.003  -0.044   0.079
x8   -0.039  -0.021   0.034   0.079  -0.022  -0.021   0.057
x9    0.153   0.093   0.013   0.113   0.235  -0.056  -0.096   0.009
x10   0.063   0.046  -0.041   0.056   0.066  -0.009  -0.077  -0.050  -0.019
y     0.847   0.433  -0.001   0.032  -0.151   0.335  -0.007  -0.022   0.065   0.125

(Lower triangle shown; the matrix is symmetric.)
Appendix 3 Code to Fit full model in WinBUGS
# This code can be run with modification
# only to the data and initial values
model {
  # Standardize x's and coefficients
  for (j in 1:p) {
    b[j] <- beta[j]/sd(x[,j])
    for (i in 1:N) {
      z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j]) }
    temp_mean[j] <- b[j]*mean(x[,j]) }
  b0 <- beta0 - sum(temp_mean[])
  # likelihood
  for (i in 1:N) {
    Y[i] ~ dnorm(mu[i], tau)
    for (j in 1:p) {
      temp[i,j] <- beta[j]*z[i,j] }
    mu[i] <- beta0 + sum(temp[i,]) }
  # priors
  beta0 ~ dnorm(0, 0.00001)
  for (j in 1:p) {
    beta[j] ~ dnorm(0, 1.0E-6)
  }
  tau ~ dgamma(1.0E-3, 1.0E-3)
  sigma <- sqrt(1/tau)
}

DATA
# p = the number of explanatory variables
# N = the number of observations
# Put your response variable and explanatory variables here
list(p = 10, N = 500,
     Y = c(6.339, …, 8.4532),
     x = structure(.Data = c(0.3534, …, 10.6885), .Dim = c(500,10)))

Initial Values
# enter initial values for the betas
list(beta0 = 0, beta = c(0,0,0,0,0,0,0,0,0,0), tau = .1)
Appendix 4 GVS variable selection code
# **************************************************************************
# Code from "Gibbs Variable Selection Using BUGS" by Ioannis Ntzoufras;
# his code for variable selection with 3 predictor variables was modified
# to allow variable selection for any number of variables.
# The user only needs to modify the inits and data.
model {
  # Standardize x's and coefficients
  for (j in 1:p) {
    b[j] <- beta[j]/sd(x[,j]) ;
    for (i in 1:N) {
      z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j]) ;
    }
    temp_mean[j] <- b[j]*mean(x[,j])
  }
  b0 <- beta0 - sum(temp_mean[])
  # likelihood
  for (i in 1:N) {
    Y[i] ~ dnorm(mu[i], tau)
    for (j in 1:p) {
      temp[i,j] <- g[j]*beta[j]*z[i,j]
    }
    mu[i] <- beta0 + sum(temp[i,])
    # residuals
    stres[i] <- (Y[i] - mu[i])/sigma
    # if |standardized residual| > 2.5, flag as outlier
    outlier[i] <- step(stres[i] - 2.5) + step(-(stres[i] + 2.5))
  }
  for (j in 1:p) {
    # Create indicators for the possible variables in the model
    TempIndicator[j] <- g[j]*pow(2, j-1)
  }
  # Create a model number for each possible model
  mdl <- 1 + sum(TempIndicator[])
  # calculate the percentage of time each model is selected
  for (j in 1:models) {
    pmdl[j] <- equals(mdl, j)
  }
  # Priors
  # diffuse normal prior on the intercept
  beta0 ~ dnorm(0, 0.00001)
  # if the parameter is not in the model, an informative pseudo-prior is used;
  # it is calculated from a prior run of the full model
  for (j in 1:p) {
    bprior[j] <- (1 - g[j])*mean[j]
    tprior[j] <- g[j]*0.001 + (1 - g[j])/(se[j]*se[j])
    beta[j] ~ dnorm(bprior[j], tprior[j])
    g[j] ~ dbern(0.5)
  }
  tau ~ dgamma(1.0E-3, 1.0E-3)
  sigma <- sqrt(1/tau)
}
Appendix 4 GVS variable selection code
# Fit the full model and put the information for
# the mean and se of each beta here for use in
# the pseudo-priors
DATA
mean[]  se[]
 2.013  0.112
 0.149  0.112
-0.259  0.094
-0.161  0.096
-0.293  0.134
 0.564  0.097
 0.094  0.098
-0.023  0.097
-0.018  0.133
 0.214  0.088
END

# Set the initial values for the betas, the overall
# precision and the variable selection indicators
Initial Values
list(beta0 = 0, beta = c(0,0,0,0,0,0,0,0,0,0), tau = .1, g = c(1,1,1,1,1,1,1,1,1,1))

# p = number of parameters
# N = the number of obs
# models = the number of possible models (2^p)
DATA
list(p = 10, N = 500, models = 1024,
     Y = c(6.339, 4.4347, …),
     x = structure(.Data = c(0.3534, …, 10.6885), .Dim = c(500,10)))
Appendix 5 R code to name models
# ind(p) returns a (2^p x p) 0/1 matrix whose rows enumerate all subsets of
# p variables, in the order matching the model numbers
# mdl = 1 + sum(g[j]*2^(j-1)) used in the WinBUGS code
ind <- function(p) {
  if (p == 0) {
    return(0)
  } else if (p == 1) {
    return(rbind(0, 1))
  } else {
    # recurse: all subsets without variable p, then all subsets with it
    return(rbind(cbind(ind(p - 1), rep(0, 2^(p - 1))),
                 cbind(ind(p - 1), rep(1, 2^(p - 1)))))
  }
}

# print.ind(p) prints the variables in each model, in model-number order
print.ind <- function(p) {
  t <- ind(p)
  print("intercept")  # model 1 is the intercept-only model
  for (i in 2:nrow(t)) {
    e <- NULL
    first <- TRUE
    for (j in 1:ncol(t)) {
      if (t[i, j] == 1 && first) {
        e <- paste(e, "x", j, sep = "")
        first <- FALSE
      } else if (t[i, j] == 1 && !first) {
        e <- paste(e, "+ x", j, sep = "")
      }
    }
    print(e)
  }
}
References
1. Carlin, B.P. and Chib, S. (1995), “Bayesian Model Choice via Markov Chain Monte Carlo Methods”, Journal of the Royal Statistical Society, Series B, 57, 473-484.
2. Dellaportas, P., Forster, J., and Ntzoufras, I. (2000), “Bayesian Variable Selection Using the Gibbs Sampler”, in Dey, D., Ghosh, S., and Mallick, B. (Eds.), Generalized Linear Models: A Bayesian Perspective, New York, NY: Marcel Dekker.
3. Dellaportas, P., Forster, J., and Ntzoufras, I. (2002), “On Bayesian Model and Variable Selection Using MCMC”, Statistics and Computing, 12, 27-36.
4. Green, P. (1995), “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination”, Biometrika, 82(4), 711-732.
5. George, E. and McCulloch, R. (1993), “Variable Selection via Gibbs Sampling”, Journal of the American Statistical Association, 88(423), Theory and Methods.
6. Kuo, L. and Mallick, B. (1998), “Variable Selection for Regression Models”, Sankhya, Series B, 60(1), 65-81.
7. Ntzoufras, I. (2003), “Gibbs Variable Selection Using BUGS”, Journal of Statistical Software, 7(7).