Statistical Guidelines for Psychosomatic Medicine: A Brief Commentary
Reporting Results
• Lay out the analytic plan
• Explicitly tie each analysis to a hypothesis
• Include the exact model (see the example below)
• Discuss assumptions
• Discuss power
• Correct for multiplicity; if not, say why not
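For instance, "include the exact model" means writing the fitted equation out in full. A hypothetical linear model statement (variable names invented for illustration) might read:

\[
\mathrm{BDI}_i = \beta_0 + \beta_1\,\mathrm{Age}_i + \beta_2\,\mathrm{Sex}_i + \beta_3\,\mathrm{Cortisol}_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
\]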
Tables
• Report exact p-values
• Round, round, and round some more
• Mention the scale of each variable in regression tables
• Report model fit, if relevant
Graphics
• Avoid "ducks" (decoration that obscures the data)
• No 3-D unless the data are 3-D
• Box plots or dot plots are preferred to bar charts
One-sided (Directional) Hypothesis Tests
• Controversial
• Two-sided tests are typically preferred because they cover unexpected results
• The argument is that a one-sided test can be used when an unexpected result, or no difference, would not lead to a different action or suggest risk
• Any deviation from two-sided testing needs justification
• What's wrong with a higher p-value for new ideas?
Artificial Categorization of Variables
• A long literature outlines the problems with this approach
• In the population, it by definition reduces power
• In samples, one can get a "lucky cut"
• Does NOT improve reliability
• Doesn't make measurement sense
• Hides non-linear relations
• Can yield spurious results in multivariable applications
Type I error rates for the relation between x2 and y after dichotomizing two continuous predictors. Maxwell and Delaney (21) calculated the effect of dichotomizing two continuous predictors as a function of the correlation between them. The true model is y = .5x1 + 0x2, where all variables are continuous. If x1 and x2 are dichotomized, the Type I error rate for the relation between x2 and y increases as the correlation between x1 and x2 increases.

            Correlation between x1 and x2
  N         0       .3      .5      .7
  50        .05     .06     .08     .10
  100       .05     .08     .12     .18
  200       .05     .10     .19     .31
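A minimal simulation sketch of this phenomenon (my own construction, not the authors' code; the sample sizes and correlations follow the table):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def type1_rate(n, rho, n_sim=2000, alpha=0.05):
    """Fraction of simulations in which median-split x2 tests 'significant'
    for y, although the true model is y = .5*x1 + 0*x2 + e."""
    cov = [[1.0, rho], [rho, 1.0]]
    hits = 0
    for _ in range(n_sim):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = 0.5 * x[:, 0] + rng.normal(size=n)        # x2 has no real effect
        d = (x > np.median(x, axis=0)).astype(float)  # dichotomize both x's
        X = np.column_stack([np.ones(n), d])          # intercept, d1, d2
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        var2 = resid @ resid / (n - 3) * np.linalg.inv(X.T @ X)[2, 2]
        p = 2 * stats.t.sf(abs(beta[2]) / np.sqrt(var2), df=n - 3)
        hits += p < alpha
    return hits / n_sim

for n in (50, 100, 200):
    print(n, [round(type1_rate(n, r), 2) for r in (0, .3, .5, .7)])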
Artificial Categorization of Variables
• If there is a true category, use something like clustering, not median splits
• If you expect nonlinearity, use polynomials or splines, as sketched below (splitting into quartiles, etc., is acceptable, but increases standard errors considerably)
• Clinical cutpoints should not figure into statistical modeling until the model has already been developed with ALL the data
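A sketch of the "polynomials or splines" alternative (data and settings invented for illustration). Both fits below spend four parameters, so any advantage of the continuous fit is not a matter of model size:

import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(-2, 2, n)
y = np.sin(1.5 * x) + rng.normal(scale=0.5, size=n)   # truly nonlinear signal

# Continuous approach: cubic polynomial, keeps all information in x
coefs = np.polyfit(x, y, deg=3)
y_poly = np.polyval(coefs, x)

# Categorized approach: quartile split, then one mean per quartile
cuts = np.quantile(x, [0.25, 0.5, 0.75])
group = np.digitize(x, cuts)                          # quartile label 0..3
y_quart = np.array([y[group == g].mean() for g in group])

for name, fit in (("cubic polynomial", y_poly), ("quartile means", y_quart)):
    ss_res = np.sum((y - fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    print(f"{name}: in-sample R^2 = {1 - ss_res / ss_tot:.3f}")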
Chatfield C. Model uncertainty, data mining and statistical inference (with discussion). J R Stat Soc A 1995; 158: 419–66.

Annotations: bias from selecting a model because it fits the data well; bias in standard errors. P. 420: "... need for a better balance in the literature and in statistical teaching between techniques and problem solving strategies." P. 421: It is "well known" to be "logically unsound and practically misleading" (Zhang, 1992) to make inferences as if a model is known to be true when it has, in fact, been selected from the same data to be used for estimation purposes. However, although statisticians may admit this privately (Breiman (1992) calls it a "quiet scandal"), they (we) continue to ignore the difficulties because it is not clear what else could or should be done.

P. 421: Estimation errors for regression coefficients are usually smaller than errors from failing to take into account model specification. P. 422: Statisticians must stop pretending that model uncertainty does not exist and begin to find ways of coping with it. P. 426: It is indeed strange that we often admit model uncertainty by searching for a best model but then ignore this uncertainty by making inferences and predictions as if certain that the best fitting model is actually true.

P. 427: The analyst needs to assess the model selection process and not just the best fitting model. P. 432: The use of subset selection methods is well known to introduce alarming biases. P. 433: ... the AIC can be highly biased in data-driven model selection situations. P. 434: Prediction intervals will generally be too narrow. In the discussion, Jamal R. M. Ameen states that a model should be (a) satisfactory in performance relative to the stated objective, (b) logically sound, (c) representative, (d) questionable and subject to on-line interrogation, (e) able to accommodate external or expert information and (f) able to convey information.
Automated Stepwise Selection Procedures
• Can lead to wildly optimistic models
• Doesn't deal well with correlated predictors
• Extremely poor replication unless sample sizes are huge
• "Best subset" selection has similar problems
SOME of the problems with stepwise variable selection:
1. It yields R-squared values that are badly biased high.
2. The F and chi-squared tests quoted next to each variable on the printout do not have the claimed distribution.
3. The method yields confidence intervals for effects and predicted values that are falsely narrow.
4. It yields p-values that do not have the proper meaning, and the proper correction for them is a very difficult problem.
5. It gives biased regression coefficients that need shrinkage (the coefficients for remaining variables are too large).
6. It has severe problems in the presence of collinearity.
7. It is based on methods (e.g., F tests for nested models) that were intended to be used to test pre-specified hypotheses.
8. Increasing the sample size doesn't help very much.
9. It allows us to not think about the problem.
Simulation results: Number of Noise Variables Included
[Figure: percent of samples (y-axis, 0 to 35) versus number of noise variables in the final model (x-axis, 0 to 7), one curve per sample size (30, 100, 200, 500, 1000, 10000); 20 candidate predictors, 100 samples.]
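A minimal sketch of this kind of simulation (assumptions mine: all 20 candidates are pure noise, and forward selection enters whichever variable has the smallest partial p-value):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def forward_stepwise(X, y, p_enter=0.05):
    """Greedy forward selection: repeatedly add the predictor with the
    smallest partial p-value, stopping when none falls below p_enter."""
    n, k = X.shape
    full = np.column_stack([np.ones(n), X])
    selected = []
    while True:
        best_p, best_j = 1.0, None
        for j in range(k):
            if j in selected:
                continue
            cols = [0] + [c + 1 for c in selected + [j]]  # intercept + chosen
            Xd = full[:, cols]
            beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            resid = y - Xd @ beta
            dof = n - Xd.shape[1]
            se = np.sqrt(resid @ resid / dof * np.linalg.inv(Xd.T @ Xd)[-1, -1])
            p = 2 * stats.t.sf(abs(beta[-1] / se), df=dof)
            if p < best_p:
                best_p, best_j = p, j
        if best_j is None or best_p >= p_enter:
            return selected
        selected.append(best_j)

counts = []
for _ in range(100):                    # 100 simulated samples
    X = rng.normal(size=(100, 20))      # 20 candidate noise predictors
    y = rng.normal(size=100)            # outcome unrelated to every predictor
    counts.append(len(forward_stepwise(X, y)))
print("models containing 0, 1, 2, ... noise variables:", np.bincount(counts))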
Automated Stepwise Selection Procedures
• If confronted with too many predictors:
  – Use theory to delete some
  – Combine predictors using clustering or tree methods before modeling, without looking at Y (see the sketch below)
  – Use approaches that exploit correlated variables: MANOVA, SEM, PLS, principal components regression
• If you MUST use stepwise:
  – Backward elimination is preferable
  – Set the p-to-remove high
  – You MUST cross-validate
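As a sketch of the "combine predictors without looking at Y" idea, here is a hypothetical principal components regression in scikit-learn; the data are simulated, and the number of components is assumed known for the illustration:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
n, k = 150, 30
latent = rng.normal(size=(n, 3))                 # three underlying factors
X = latent @ rng.normal(size=(3, k)) + 0.3 * rng.normal(size=(n, k))
y = latent[:, 0] + rng.normal(scale=0.5, size=n)

# PCA is fit on X alone (it never sees y); y is then regressed on the scores
pcr = make_pipeline(PCA(n_components=3), LinearRegression())
print("cross-validated R^2:", cross_val_score(pcr, X, y, cv=5).round(2))

Because PCA sees only X, the compression step cannot capitalize on chance associations with Y, which is exactly where stepwise selection gets its optimism.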
Variable Selection in Multivariable Models
• Fit statistics and p-values for regressions assume a pre-specified model
• Univariate prescreening requires correction for the selection process
  – P-values should not be the sole guide; prescreening is not a hypothesis test!
  – Raise the model df to reflect all variables searched
  – Use cross-validation to show the level of optimism
  – Use "pre-shrinkage" (one concrete option is sketched below)
• Pay attention to effective sample size
  – Too many predictors leads to poor power and unstable estimates
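"Pre-shrinkage" can take several forms; one concrete, widely used option (my choice of example; the lasso appears in the reference list below) is penalized regression with the penalty weight tuned by cross-validation:

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, k = 200, 25
X = rng.normal(size=(n, k))
y = 0.5 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(size=n)   # only 2 real effects

fit = LassoCV(cv=5).fit(X, y)                 # penalty chosen by 5-fold CV
print("chosen penalty:", round(fit.alpha_, 4))
print("variables kept:", np.flatnonzero(fit.coef_))      # ideally just 0 and 1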
Simulation results: events-per-variable ratio
[Figure: % relative bias of logistic regression coefficients (y-axis, -20 to 50) versus events per variable (x-axis, 0 to 25), one curve per predictor (NYHA, CHF, VES, DM, STD, HTN, LVC).]
From Peduzzi et al. J Clin Epidemiol 1996; 49: 1373–9.
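A minimal sketch in the spirit of the Peduzzi et al. simulations (design assumptions mine: one true effect among seven predictors and roughly a 20% event rate):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
TRUE_BETA, K = 0.5, 7                  # one real effect among 7 predictors

def relative_bias(epv, n_sim=200):
    """Mean % relative bias of the first coefficient at a given
    events-per-variable (EPV) ratio, assuming about a 20% event rate."""
    est = []
    for _ in range(n_sim):
        n = epv * K * 5                # expected events = 0.2*n = epv*K
        X = rng.normal(size=(n, K))
        prob = 1 / (1 + np.exp(1.5 - TRUE_BETA * X[:, 0]))  # ~20% events
        y = (rng.uniform(size=n) < prob).astype(float)
        try:
            fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
            est.append(fit.params[1])
        except Exception:              # perfect separation at low EPV
            continue
    return 100 * (np.mean(est) - TRUE_BETA) / TRUE_BETA

for epv in (2, 5, 10, 25):
    print(f"EPV {epv:>2}: {relative_bias(epv):+.0f}% relative bias")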
Harrell FE Jr. Regression modeling strategies: with applications to linear models,
logistic regression and survival analysis. New York: Springer; 2001.
Green SB. How many subjects does it take to do a regression analysis? Multivar
Behav Res 1991; 26: 499–510.
Peduzzi PN, Concato J, Holford TR, Feinstein AR. The importance of events per
independent variable in multivariable analysis, II: accuracy and precision of
regression estimates. J Clin Epidemiol 1995; 48: 1503–10.
Peduzzi PN, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study
of the number of events per variable in logistic regression analysis. J Clin
Epidemiol 1996; 49: 1373–9.
Thompson B. Stepwise regression and stepwise discriminant analysis need not
apply here: a guidelines editorial. Ed Psychol Meas 1995; 55: 525–34.
Cohen J. Things I have learned (so far). Am Psychol 1990; 45: 1304–12.
Roecker EB. Prediction error and its estimation for subset-selected models.
Technometrics 1991; 33: 459–68.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc B
1996; 58: 267–88.
Grambsch PM, O’Brien PC. The effects of preliminary tests for nonlinearity in
regression. Stat Med 1991; 10: 697–709.
Faraway JJ. The cost of data analysis. J Comput Graph Stat 1992; 1: 213–29.
Altman DG, Andersen PK. Bootstrap investigation of the stability of a Cox
regression model. Stat Med 1989; 8: 771–83.
Derksen S, Keselman HJ. Backward, forward and stepwise automated subset
selection algorithms: frequency of obtaining authentic and noise variables. Br J
Math Stat Psychol 1992; 45: 265–82.
Steyerberg EW, Harrell FE, Habbema JD. Prognostic modeling with logistic
regression analysis: in search of a sensible strategy in small data sets. Med Decis
Making 2001; 21: 45–56.
Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y,
Habbema JD. Internal validation of predictive models: efficiency of some
procedures for logistic regression analysis. J Clin Epidemiol 2001; 54: 774–81.
Maxwell SE, Delaney HD. Bivariate median splits and spurious statistical
significance. Psychol Bull 1993; 113: 181–90.
MacCallum RC, Zhang S, Preacher KJ, Rucker DD. On the practice of
dichotomization of quantitative variables. Psychol Methods 2002; 7: 19–40.
McClelland G. Negative consequences of dichotomizing continuous predictor
variables. Available at: http://psych.colorado.edu/~mcclella/MedianSplit/.
Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors
in multiple regression: a bad idea. Stat Med 2006; 25: 127–41.
Freedman D. Statistical models and shoe leather (with discussion). Soc
Methodol 1991; 21: 291–313.