Statistics of the Use-Dilution Test

INTERNATIONAL SYMPOSIUM OF UROLOGY
FUT-UROLOGY 2008
ROBUST CLINICAL PREDICTION
Luigi Salmaso
Associate Professor of Statistics
University of Padova
Research Group for the Bladder Cancer
multicentric study: PF. Bassi, C. Brombin,
L. Corain, M. Racioppi, L. Salmaso
Topics
• Some considerations on DATA COLLECTION and
STATISTICAL METHODS most frequently used in
UROLOGY
• Case study: INVASIVE BLADDER CANCER
• Application and results of several statistical
methods to the case study
• Robust clinical prediction using the NonParametric
Combination of Dependent Permutation Tests
(NPC Test)
• Conclusions and practical suggestions
Necessary steps for ‘optimal’ statistical predictions
• Study design
• Collecting data using a Web-based database
• Study protocol
• Robust statistical analysis by suitable statistical methods (e.g. nonparametric permutation methods)
• Individual predictions based, e.g., on nomograms or other techniques
Some considerations on DATA COLLECTION and
STATISTICAL METHODS most frequently used in UROLOGY
• The availability of an electronic database can
improve the quality and completeness of the
collected data, reducing, in particular, the
amount of missing data and the risk of
imputation errors.
• Accuracy in defining the nature (observational/
randomized/…) and the endpoints of the study
can lead to a better choice of the sample size
and of the subsequent statistical analysis to
perform.
ELECTRONIC DATABASE: An example
[Screenshots: Web-based database; variables’ coding]
STATISTICAL ANALYSIS: standard methods and recent advances
[Figure: overview of the methods, each illustrated with an example plot]
• Survival Analysis (survival function: cumulative survival vs. months, with censored observations)
• Univariate Test (Student t test, Wilcoxon): % of patients by Tumour (Phase III) class, NED vs. DOD+AWD
• Multivariate Methods (Logistic regression, …)
• Complex classification methods (Neural Networks, Artificial Intelligence, …)
• NonParametric Combination of Dependent Permutation Tests (NPC Test)
Case study: INVASIVE BLADDER CANCER
Italian multicentric observational study (from Jan 2001 to Dec 2006)
Reference: prof. PF. Bassi (Univ. Cattolica, Rome)
Total sample size: 1,003 subjects
(lost patients and DOC, Dead of Other Causes, patients were excluded)
• 469 subjects, including DOD (Dead of Disease) and AWD (Alive with Disease, i.e. “statistically” dead) patients
• 534 subjects, including NED (No Evidence of Disease) patients
Aim of the study: detecting the variables (factors)
that best predict the outcome (DEAD or ALIVE)
after a BLADDER CANCER DIAGNOSIS
Case study: INVASIVE BLADDER CANCER
• The TNM Classification of bladder cancer was used, according to
Wittekind & Sobin (2002); the original variables were thus transformed
into ordinal variables. 30 endpoints were considered relevant for the
statistical analysis.
• In particular, the interest is in evaluating the importance of the
endpoints, collected at three phases of the study, in
predicting the outcome.
I Phase: patient state of health at the first medical visit
II Phase: patient condition after bladder cancer diagnosis
III Phase: patient state after surgery (histopathological variables were examined)
[Timeline: First symptom, Diagnosis, Surgery]
Results of Kaplan-Meier (survival analysis)
[Figure: Kaplan-Meier survival function (artificial example); cumulative survival vs. months (0 to 120), with censored observations marked]
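As a reminder of what a curve of this kind shows, the product-limit (Kaplan-Meier) estimate can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `kaplan_meier` and the toy follow-up data are assumptions made for the example and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time of each subject (e.g. months)
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            # The curve only steps down at times with at least one event.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set afterwards.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data (months); 0 marks a censored observation.
toy_curve = kaplan_meier([5, 10, 10, 20, 30], [1, 1, 0, 1, 0])
```

Censored subjects contribute to the risk set up to their censoring time but never cause a step, which is why they appear as marks on the curve rather than drops.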
Results of univariate tests
[Figures: % of patients by class, NED vs. DOD+AWD, for Tumour (Phase II), Grading (Phase III) and Disease restarting (Phase III). In each panel, Student's t: p < 0.001 and Wilcoxon: p < 0.001 (displayed as p = 0.000).]
Results of Logistic Regression
• The logistic regression model was applied to the
same dataset, but very poor results were obtained (only two
significant predictors: Stage TNM at I and II Phase)
• The main problems in its application:
– the inability of logistic regression to handle missing
values (missing data are present in 522 subjects out of
1,003 individuals);
– the high number of coefficients to be estimated,
so that the iterative algorithm does not converge (after
1000 iterations). Note that when convergence is not
achieved for the parameter estimates, results may be
unreliable.
Results of Logistic Regression
Phase      Predictor                                                 Estimated coefficient   p-value
-          Constant                                                  -2.743                  0.006
I Phase    Previous superficial TCC (Transitional Cell Carcinoma)     1.186                  0.288
I Phase    Focality                                                   0.911                  0.058
I Phase    Stage TNM                                                 -0.126                  0.521
I Phase    Grading                                                   -0.345                  0.186
I Phase    Carcinoma In Situ (CIS)                                   -0.565                  0.447
II Phase   Focality                                                  -0.098                  0.805
II Phase   Stage TNM                                                  0.129                  0.026
II Phase   Carcinoma In Situ (CIS)                                    0.381                  0.473
II Phase   Grading                                                   -0.132                  0.576
II Phase   Regional lymph nodes                                      -0.754                  0.314
II Phase   Metastases                                                 0.000                  1.000
II Phase   Upper urinary tract obstruction                            0.445                  0.050
III Phase  Stage TNM                                                  0.109                  0.035
III Phase  Carcinoma In Situ (CIS)                                    0.280                  0.376
III Phase  Grading                                                    0.257                  0.352
III Phase  Regional lymph nodes                                       1.009                  0.083
III Phase  Metastases                                                21.000                  0.999
III Phase  Histology                                                  0.209                  0.133
III Phase  Trigone infiltration                                      -0.361                  0.158
III Phase  Corpus invasion                                           -0.459                  0.136
III Phase  Urethral involvement                                      -0.972                  0.099
III Phase  Vascular invasion                                          0.583                  0.158
III Phase  Lymph node invasion                                        0.466                  0.075
III Phase  Prostatic invasion                                         0.510                  0.181
III Phase  Adenocarcinoma of the Prostate                             0.115                  0.694
III Phase  Upper urinary tract TCC (Transitional Cell Carcinoma)      0.414                  0.441
III Phase  Disease restarting                                        41.000                  0.993
III Phase  Chemotherapy before surgery                               -1.587                  0.161
III Phase  Chemotherapy after surgery                                -0.952                  0.180
III Phase  Therapy restarting                                       -20.000                  0.996
Results of Logistic Regression:
Number and % of missing values by variable
Phase      Variable                                                  No. of missing   % of missing
I Phase    Previous superficial TCC (Transitional Cell Carcinoma)     41               4%
I Phase    Focality                                                   18               2%
I Phase    Stage TNM                                                  44               4%
I Phase    Grading                                                    37               4%
I Phase    Carcinoma In Situ (CIS)                                    12               1%
II Phase   Focality                                                  147              15%
II Phase   Stage TNM                                                 124              12%
II Phase   Carcinoma In Situ (CIS)                                    96              10%
II Phase   Grading                                                   128              13%
II Phase   Regional lymph nodes                                       82               8%
II Phase   Metastases                                                137              14%
II Phase   Upper urinary tract obstruction                            41               4%
III Phase  Stage TNM                                                  70               7%
III Phase  Carcinoma In Situ (CIS)                                    44               4%
III Phase  Grading                                                   140              14%
III Phase  Regional lymph nodes                                        7               1%
III Phase  Metastases                                                 65               6%
III Phase  Histology                                                  82               8%
III Phase  Trigone infiltration                                      100              10%
III Phase  Corpus invasion                                           145              14%
III Phase  Urethral involvement                                      110              11%
III Phase  Vascular invasion                                         144              14%
III Phase  Lymph node invasion                                       117              12%
III Phase  Prostatic invasion                                        187              19%
III Phase  Adenocarcinoma of the Prostate                            131              13%
III Phase  Upper urinary tract TCC (Transitional Cell Carcinoma)      87               9%
III Phase  Disease restarting                                        102              10%
III Phase  Chemotherapy before surgery                                50               5%
III Phase  Chemotherapy after surgery                                  1               0%
III Phase  Therapy restarting                                         87               9%
Mean (missing values): 85.9
% mean (missing values): 9%
Subjects with at least one missing value: 522 (52%)
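The summary figures of this slide can be re-derived from the 30 per-variable counts. The short Python sketch below does that arithmetic; the variable names are chosen for the example only, and the count of complete cases assumes that a listwise-deletion analysis would keep only subjects with no missing value at all.

```python
# Missing-value counts for the 30 variables, in the order listed on the slide.
missing_counts = [
    41, 18, 44, 37, 12,                      # I Phase
    147, 124, 96, 128, 82, 137, 41,          # II Phase
    70, 44, 140, 7, 65, 82, 100, 145, 110,   # III Phase
    144, 117, 187, 131, 87, 102, 50, 1, 87,
]
n_subjects = 1003
subjects_with_missing = 522

# Average number of missing values per variable and its share of the sample.
mean_missing = sum(missing_counts) / len(missing_counts)
mean_missing_pct = 100 * mean_missing / n_subjects

# Per-variable percentages, rounded as on the slide.
pct_by_variable = [round(100 * c / n_subjects) for c in missing_counts]

# Subjects left if every record with at least one missing value is dropped.
complete_cases = n_subjects - subjects_with_missing

print(f"mean missing per variable: {mean_missing:.1f} ({mean_missing_pct:.0f}%)")
print(f"complete cases: {complete_cases} of {n_subjects}")
```

Under that assumption, an analysis restricted to complete cases would use fewer than half of the 1,003 subjects.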
Robust statistical prediction using NPC Test
PERMUTATION APPROACH FOR HYPOTHESIS TESTING
The multivariate permutation approach for hypothesis testing by
NonParametric Combination (NPC) offers the following
advantages:
• Powerful tests
• Exact solutions
• No need to specify the dependence structure among variables
• Treatment of missing values (missing completely at random, MCAR, or not completely at random, not-MAR)
It also deals with:
- Stratification
- Multivariate categorical variables
It handles:
- Mixed variables
- Multivariate restricted alternatives
• NPC Test implements methods and algorithms presented in several international
papers by prof. L. Salmaso and prof. F. Pesarin. L. Salmaso leads an
internationally recognised research group in theoretical and applied
nonparametric statistics.
• NPC TEST is an innovative statistical method (and software) that
provides researchers with powerful solutions in the field
of hypothesis testing.
Robust statistical prediction using NPC Test
FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0
• NPC TEST allows us to perform hypothesis testing in the case of:
– Two and C samples with dependent or independent variables
– Two and C samples with repeated measures
– Stratified analysis
• NPC TEST also provides:
– Powerful test statistics for the treatment of missing values
– One or two tailed tests
• Data (including mixed variables):
– categorical
– ordered categorical
– numeric or continuous
– binary
Robust statistical prediction using NPC Test
FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0
An innovation of NPC TEST with respect to existing methods is that it can
perform any combination of tests: starting from an appropriate set of
elementary tests, the NPC methodology leads to a multivariate or
multistrata overall global test.
Elementary partial test statistics include:
– t statistic
– ANOVA
– difference of means
– test statistics for missing values
– Anderson-Darling
– Cramér-von Mises
– Chi-square
– Modified Chi-square
– Likelihood Ratio
Combining functions for intermediate tests include:
– Fisher
– Liptak
– Tippett
– Direct
NPC TEST supports the standard functions of statistical software (data import,
data manipulation) and produces a report that can be easily
integrated and customized by means of a text editor.
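To make the idea of combining partial tests concrete, here is a minimal Python sketch of a two-sample nonparametric combination. It is an illustration under stated assumptions, not the NPC TEST software or its interface: the function name `npc_two_sample` is hypothetical, the data are assumed complete and numeric, the partial statistic is the absolute difference of means, and only the Fisher and Tippett combining functions are shown.

```python
import bisect
import math
import random


def npc_two_sample(x, y, n_perm=999, combine="fisher", seed=0):
    """Return (partial p-values, global p-value) for two multivariate samples.

    x, y -- lists of subjects, each subject a list with one value per variable.
    """
    rng = random.Random(seed)
    rows = list(x) + list(y)
    n1, n_vars = len(x), len(x[0])

    def partial_stats(data):
        # One partial statistic per variable: |mean(group 1) - mean(group 2)|.
        stats = []
        for k in range(n_vars):
            m1 = sum(r[k] for r in data[:n1]) / n1
            m2 = sum(r[k] for r in data[n1:]) / (len(data) - n1)
            stats.append(abs(m1 - m2))
        return stats

    # Row 0 holds the observed statistics, the other rows the permuted ones.
    # Whole subjects are shuffled, so the dependence among variables is kept.
    t = [partial_stats(rows)]
    for _ in range(n_perm):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        t.append(partial_stats(shuffled))

    # Partial p-value of every row: share of rows with a statistic >= its own.
    total = len(t)
    p = [[0.0] * n_vars for _ in range(total)]
    for k in range(n_vars):
        column = sorted(row[k] for row in t)
        for b in range(total):
            p[b][k] = (total - bisect.bisect_left(column, t[b][k])) / total

    # Combine the partial p-values of each row (large values are significant).
    if combine == "fisher":
        c = [-2.0 * sum(math.log(v) for v in row) for row in p]
    elif combine == "tippett":
        c = [max(1.0 - v for v in row) for row in p]
    else:
        raise ValueError("combine must be 'fisher' or 'tippett'")

    # Global p-value: share of rows whose combined value is >= the observed one.
    global_p = sum(1 for v in c if v >= c[0]) / total
    return p[0], global_p
```

Because the same permutation is applied to all variables at once, no model of their dependence structure has to be specified; the combined statistic is simply referred to its own permutation distribution.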
Robust statistical prediction using NPC Test
• After processing the variables and obtaining p-values with
NPC methods, we also controlled the
familywise error rate (FWE)
• The need for multiplicity control arises whenever a
problem is structured into two or more experimental
hypotheses (Finos and Salmaso, 2006)
• In order to draw inference on all the hypotheses
defining the multivariate problem, it is necessary to
control the probability of erroneously rejecting at least
one univariate (elementary) hypothesis; this is called
the multivariate type I error or familywise error rate (FWE)
(Marcus et al., 1976)
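As a simple illustration of FWE control, the sketch below computes Holm step-down adjusted p-values. Holm's procedure is the shortcut of the closed testing principle obtained when every intersection hypothesis is tested with Bonferroni; it is used here as a stand-in for illustration and is not the NPC-based closed testing applied in the study. The function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit the hypotheses from the smallest to the largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by (m - rank).
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted value never falls below earlier ones.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for three elementary hypotheses.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
# Reject each elementary hypothesis whose adjusted p-value is <= alpha.
rejected = [p <= 0.05 for p in adjusted]
```

Rejecting the hypotheses whose adjusted p-value does not exceed alpha keeps the probability of at least one false rejection at or below alpha, whatever the dependence among the tests.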
Robust statistical prediction using NPC Test
[Figure: graphical representation of the closed testing procedure]
Results of NPC Test
I Phase
Variable                                                  p-value, univariate (partial test)
Previous superficial TCC (Transitional Cell Carcinoma)    n.s.
Focality                                                  n.s.
Stage TNM                                                 n.s.
Grading                                                   n.s.
Carcinoma In Situ (CIS)                                   n.s.
1st combination (I Phase)                                 n.s.

II Phase
Variable                                                  p-value, univariate (partial test)
Focality                                                  n.s.
Stage TNM                                                 0.0045
Carcinoma In Situ (CIS)                                   n.s.
Grading                                                   0.0007
Regional lymph nodes                                      n.s.
Metastases                                                0.0014
Upper urinary tract obstruction                           n.s.
1st combination (II Phase)                                0.0007
Results of NPC Test
III Phase
Variables:
Stage TNM
Carcinoma In Situ (CIS)
Grading
Regional lymph nodes
Metastases
Histology
Vesical trigone infiltration
Corpus invasion
Urethral involvement
Vascular invasion
Lymph node invasion
Prostatic invasion
Adenocarcinoma of the Prostate
Upper urinary tract TCC (Transitional Cell Carcinoma)
Disease restarting
Chemotherapy before surgery
Chemotherapy after surgery
Therapy restarting
p-values: univariate (partial test) and 1st combination
0,0011
0,0088
0,0011
0,0006
0,0006
n.s.
n.s
n.s
0,0027
n.s.
0,0005
0,0005
0,0005
0,0085
n.s
n.s
n.s.
0,0004
n.s
0,0002
0,0004
0,0004
Results of NPC Test
p-value of the 1st combination, by phase
I Phase
n.s.
II Phase
0,0007
2nd combination (global test)
0,0006
0,0013
0,0005
III Phase
n.s
0,0002
Conclusions and practical suggestions
• The NPC method can offer a significant contribution to
successful research in biomedical studies with
several endpoints
• The advantages of NPC Test are connected with its
flexibility in handling any type of variable
• We recommend the use of this methodology
whenever the normality assumption is hard to
justify, in the presence of missing values and when
the number of variables is higher than the
number of subjects
REFERENCES
Bassi P.F., Pagano F. (2007). Invasive Bladder Cancer. Springer.
Corain L., Salmaso L. (2007). A critical review and a comparative study on conditional permutation tests for two-way ANOVA. Communications in Statistics – Simulation and Computation, 36, 791-805.
Finos L., Salmaso L. (2006). Weighted methods controlling the multiplicity when the number of variables is much higher than the number of observations. Journal of Nonparametric Statistics, 18, 245-261.
Finos L., Salmaso L. (2006). FDR- and FWE-controlling methods using data-driven weights. Journal of Statistical Planning and Inference, 137, 3859-3870.
Finos L., Salmaso L., Solari A. (2007). Conditional inference under simultaneous stochastic ordering constraints. Journal of Statistical Planning and Inference, 137, 2633-2641.
Marcus R., Peritz E., Gabriel K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.
Marozzi M., Salmaso L. (2006). Multivariate Bi-Aspect Testing for Two-Sample Location Problem. Communications in Statistics – Theory and Methods, 35, 477-488.
Salmaso L., Solari A. (2005). Multiple aspect testing for case-control designs. Metrika, 62, 331-340.
Wittekind C., Sobin L. H. (2002). TNM Classification of Malignant Tumours. UICC, International Union Against Cancer (6th ed.). Wiley-Liss, New York.
http://www.gest.unipd.it/~salmaso/NPC_TEST.htm
Results of Neural Networks
• We applied a neural network model (Multilayer Perceptron)
to the same dataset
• By applying k-fold cross-validation, we obtained a correct
classification rate of 75.3% for DOD+AWD and of 60.5% for
NED. Using the subset of variables identified by the univariate analysis,
we obtained a very similar performance (74.5% and 62.4%)
• The main problems of neural networks are:
– Neural networks work as black boxes, hence it is not possible to
convert the network structure into a known model structure
– All input fields ‘must’ be numeric (in this study we do not have
numerical but ordinal categorical variables)
– Neural networks can suffer from a problem called
interference (i.e. forgetting some of what was learned from older
data)
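The per-class classification rates quoted above come from k-fold cross-validation. The Python sketch below shows how such rates can be computed; it is an illustration only. All function names are hypothetical, and a nearest-centroid rule is used as a simple stand-in classifier in place of the Multilayer Perceptron, whose configuration is not given in the slides.

```python
import random


def kfold_indices(n, k, seed=0):
    """Split the indices 0..n-1 into k shuffled folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier: assign each test point to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        pts = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda lab: dist2(x, centroids[lab]))
            for x in test_x]


def per_class_rates(x, y, k=5, seed=0):
    """Cross-validated correct classification rate for each class."""
    correct, total = {}, {}
    for fold in kfold_indices(len(x), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        # Fit on the training folds only, then predict the held-out fold.
        preds = nearest_centroid_predict([x[i] for i in train],
                                         [y[i] for i in train],
                                         [x[i] for i in fold])
        for i, pred in zip(fold, preds):
            total[y[i]] = total.get(y[i], 0) + 1
            correct[y[i]] = correct.get(y[i], 0) + (pred == y[i])
    return {label: correct[label] / total[label] for label in total}
```

Reporting the rate separately for each outcome class, as the slide does, shows whether the classifier does markedly better on one group than on the other, which a single overall accuracy would hide.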