Epidemiological measures

EPIDEMIOLOGICAL MEASURES
And discriminatory accuracy
Philippe Wagner
Statistician
Unit for Social Epidemiology, Lund University
Centre for Clinical Research, Uppsala University, Västerås
MEASURES
• Odds ratio (OR)
• Population attributable fraction (PAF)
• Variance explained / VPC / ICC
• Risk difference (RD) / Number needed to treat (NNT)
• What are they used for? In what situations?
• Is there any benefit in adding information about the discriminatory accuracy (DA)?
• Note: We will only be studying dichotomous risk factors and outcomes.
• Note II: We will be taking some liberties with respect to notation and mathematical rigor in order to focus on the bigger picture.
Warning! This presentation may contain some algebra!
DISCRIMINATORY ACCURACY
A reminder
DISCRIMINATORY ACCURACY
• Measured by sensitivity and specificity
• Sensitivity
$$P(R=1 \mid O=1) = \frac{\text{Number of cases with risk factor}}{\text{Number of cases}}$$
• Proportion of exposed cases.
• Loosely put: how well your prediction is doing on the cases.
• Specificity
$$P(R=0 \mid O=0) = \frac{\text{Number of controls without risk factor}}{\text{Number of controls}}$$
• Proportion of non-exposed controls.
• Loosely put: how well your prediction is doing on the controls.
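As a small illustration of the two definitions above, the sketch below computes sensitivity and specificity from the four cell counts of a 2x2 table. The function name and the counts are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the cell counts of a 2x2 table.

    tp: exposed cases, fn: non-exposed cases,
    fp: exposed controls, tn: non-exposed controls.
    """
    sensitivity = tp / (tp + fn)  # exposed cases / all cases
    specificity = tn / (tn + fp)  # non-exposed controls / all controls
    return sensitivity, specificity


# Hypothetical counts, not taken from any study.
sens, spec = sensitivity_specificity(tp=66, fp=50, fn=34, tn=50)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```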
THE CLASSICAL 2X2 TABLE
              Outcome           No outcome
Exposed       Sensitivity       1 - Specificity
Not exposed   1 - Sensitivity   Specificity
If we condition on the outcome we get the table above.
THE CLASSICAL 2X2 TABLE
              Outcome   No outcome
Exposed       TPF       FPF
Not exposed   1 - TPF   1 - FPF
Often expressed in terms of true and false positive fractions, where
$$TPF = \text{Sensitivity}, \qquad FPF = 1 - \text{Specificity}$$
We know that the OR is also calculated from the 2x2 table.
This will help us link the OR to the discriminatory accuracy
of the factor.
THE ODDS RATIO
And risk factor DA
THE ODDS RATIO
• Needs no introduction; we all use it every day.
• For its connection to DA, let's go back to the 2x2 table.
THE CLASSICAL 2X2 TABLE
              Outcome   No outcome
Exposed       TPF       FPF
Not exposed   1 - TPF   1 - FPF
The odds ratio can, from the above table, be expressed as
$$OR = \frac{TPF\,(1 - FPF)}{(1 - TPF)\,FPF} = \frac{TPF}{1 - TPF} \Big/ \frac{FPF}{1 - FPF}$$
Through algebra, we can re-express this as
$$TPF = \frac{OR \cdot FPF}{1 + FPF\,(OR - 1)}$$
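The relation between TPF, FPF and the OR above can be checked numerically. The following is a minimal sketch; the function names and the value OR = 3 are illustrative choices, not part of the original slides.

```python
def tpf_from_or(odds_ratio, fpf):
    """TPF implied by a given odds ratio and false positive fraction."""
    return odds_ratio * fpf / (1 + fpf * (odds_ratio - 1))


def odds_ratio_from(tpf, fpf):
    """Odds ratio computed from TPF and FPF (the 2x2 table conditioned on outcome)."""
    return (tpf * (1 - fpf)) / ((1 - tpf) * fpf)


# For one fixed OR, many (FPF, TPF) pairs are possible.
for fpf in (0.05, 0.20, 0.50, 0.80):
    tpf = tpf_from_or(3.0, fpf)
    print(f"OR = 3.0, FPF = {fpf:.2f} -> TPF = {tpf:.3f}")
```

Each printed pair lies on the same curve for OR = 3, which is the point of the slide: a single OR does not pin down the discriminatory accuracy.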
And we can draw..
TPF VS FPF GIVEN THE OR
[Figure: TPF against FPF given the OR. Source: Pepe AJE 2004, #1 in the course reading material.]
SO WHAT IS HAPPENING?!
• Differences in risk factor prevalence can cause different TPF/FPF scenarios.
• Intuitively, this is not difficult to understand.
• When the number of outcomes and the number of exposed differ, we get some false negative and some false positive predictions.
INTUITIVELY
[Figure: two sets, "People with outcome" and "People with exposure".]
Perfect! Exposure covers all cases, AND nothing else.
TPF very high! FPF very low!
(Compare with the figure TPF VS FPF GIVEN THE OR, Pepe AJE 2004.)
INTUITIVELY
However…
False positives! FPF high! TPF high!
(Compare with the figure TPF VS FPF GIVEN THE OR, Pepe AJE 2004.)
INTUITIVELY
Or the other way around..
False negatives! TPF low! FPF low!
(Compare with the figure TPF VS FPF GIVEN THE OR, Pepe AJE 2004.)
IN THE 2X2 TABLE
              Outcome   No outcome
Exposed       TP        FP
Not exposed   FN        TN
We want TP and TN to be high, FN and FP low.
When exposure prevalence increases, TP and FP increase in relation to FN and TN.
IN THE 2X2 TABLE
[Figure: regions FP, TN, TP and FN; axes labelled Risk and Risk factor prevalence.]
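The effect of exposure prevalence on the four cells can be sketched numerically. The code below is only an illustration: the function names and the risks (0.25 among exposed, 0.05 among non-exposed) are assumptions, not values from the slides.

```python
def cell_proportions(prevalence, risk_exposed, risk_unexposed):
    """Population proportions (TP, FP, FN, TN) for a dichotomous exposure."""
    tp = prevalence * risk_exposed                # exposed, with outcome
    fp = prevalence * (1 - risk_exposed)          # exposed, no outcome
    fn = (1 - prevalence) * risk_unexposed        # not exposed, with outcome
    tn = (1 - prevalence) * (1 - risk_unexposed)  # not exposed, no outcome
    return tp, fp, fn, tn


def tpf_fpf(prevalence, risk_exposed, risk_unexposed):
    """True and false positive fractions implied by the cell proportions."""
    tp, fp, fn, tn = cell_proportions(prevalence, risk_exposed, risk_unexposed)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical risks; only the exposure prevalence varies.
for prevalence in (0.1, 0.5, 0.9):
    tpf, fpf = tpf_fpf(prevalence, risk_exposed=0.25, risk_unexposed=0.05)
    print(f"prevalence = {prevalence:.1f}: TPF = {tpf:.3f}, FPF = {fpf:.3f}")
```

With the risks held fixed, both TPF and FPF rise as the exposure becomes more common.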
POPULATION ATTRIBUTABLE FRACTION
PAF
PAF
• Definition: The PAF (also called the PAR) is used to estimate the fraction of the total disease burden in the population that would not have occurred if a causal risk factor were absent.
• Often used to gauge the effect of a potential intervention on the population.
• Used in etiological studies to indicate how much of a disease is "explained" by the existence of an exposure in the population. Controversial use. (See, for instance, Rockhill AJE 1998.)
• Used in etiological studies to indicate how much of a disease is not "explained" by the existence of known exposures. Controversial use. (Rockhill AJE 1998)
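PAF
• Definition: The PAF (also called the PAR) is often used to estimate the fraction of the total disease burden in the population that would not have occurred if a causal risk factor were absent.
$$PAF = \frac{P - P_{Not}}{P}$$
• Here P is the risk of disease in the whole population and P_Not is the risk among the non-exposed.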
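A minimal numerical sketch of this definition follows, reading P as the overall risk of disease and P_Not as the risk among the non-exposed (this reading, the function name and all numbers are assumptions made for illustration).

```python
def paf_from_risks(overall_risk, risk_unexposed):
    """PAF = (P - P_Not) / P, with P the overall risk and P_Not the risk if unexposed."""
    return (overall_risk - risk_unexposed) / overall_risk


# Hypothetical population: 30% exposed, risk 0.25 if exposed, 0.05 if not.
prevalence, risk_exposed, risk_unexposed = 0.30, 0.25, 0.05
overall_risk = prevalence * risk_exposed + (1 - prevalence) * risk_unexposed
paf = paf_from_risks(overall_risk, risk_unexposed)
print(f"overall risk = {overall_risk:.3f}, PAF = {paf:.3f}")
```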
PAF
[Figure: regions FP, TN, TP and FN with the PAF region marked; axes labelled Risk and Risk factor prevalence.]
We are subtracting the cases in the exposed group not due to exposure.
PAF
• One possible explanation.
• Three component causes: E, A and B. Two pathways to disease.
[Figure: the two pathways, E with A and B with A.]
PAF
Studying exposure E.
[Figure: regions FP, TN, TP and FN with the PAF region marked and cells labelled by the component causes E, A and B; axes labelled Risk and Risk factor prevalence.]
Removing E removes the E|A cases. It does not remove the exposed E|A|B cases. PAF=30%
PAF
Studying exposure E.
Removing A removes all cases. PAF=100%
Removing E removes the E|A cases. PAF=30%
PAF IN RELATION TO DA
• Rockhill (AJE 1998) commented on another author, stating that:
• "...after computing an attributable fraction of 41% for the three risk factors "no first birth by age 20 years", "family history of breast cancer in a first-degree relative", and "family income level in the upper two tertiles of the United States", Madigan et al. state that their estimates "suggest that a substantial proportion of breast cancer cases in the United States are explained by well-established risk factors".
• This use of the word "explain" is misleading.
• According to the data of Madigan et al., nearly the entire population of women in the United States has at least one of the considered risk factors. Since the vast majority of such exposed women will not develop breast cancer, stating that such factors explain a large proportion of breast cancer risk is misleading and even alarmist."
False positives!
PAF IN RELATION TO DA
• Rockhill also notes, in her own study of breast cancer risk factors in the US, that the PAF increases with a more liberal definition of risk factor cut-offs.
PAF IN RELATION TO DA
• "...estimates of population attributable fraction for established breast cancer risk factors can be made "high" only by defining risk factors in such a way that virtually the entire population must be labeled "exposed," and therefore, "at risk."
• ... To demonstrate this, we estimated population attributable fraction for the four established breast cancer risk factors "early age at menarche", "late age at first full-term pregnancy/nulliparity", "history of breast cancer in mother/sister", and "history of benign breast biopsy" and examined the sensitivity of the population attributable fraction and its precision to changes in exposure cutpoints. Using the broad exposure definitions for early age at menarche (<14 years) and for late age at first full-term pregnancy (>20/nulliparous), we found a high proportion exposed among white cases and controls (98%).
• The population attributable fraction estimate was reduced (from 0.25 to 0.15) when the most "restrictive" exposure definitions of early age at menarche (<12 years) and late age at first full-term pregnancy (≥30 years/nulliparous) were used."
• That is: "the PAF is larger with a more liberal definition of risk factor cut-offs."
PAF IN RELATION TO DA
• We will see that this is a mathematical fact.
• It is known that the PAF can be expressed as
$$PAF = \frac{p_R\,(RR - 1)}{p_R\,(RR - 1) + 1}$$
• indicating that the PAF increases with the risk factor prevalence p_R for a given relative risk RR.
• From before, we know that TPF and FPF also increase with risk factor prevalence, for a given OR.
• We have that
PAF IN RELATION TO DA
[Figure: RR = 5. Panels with regions FP, TN, TP and FN and the PAF region marked; axes labelled Risk and Risk factor prevalence.]
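The dependence of the PAF on risk factor prevalence for a fixed relative risk can be checked with a few lines of code. This is a sketch only; the function name is hypothetical, and RR = 5 is used because it is the value shown in the figures.

```python
def paf_from_rr(prevalence, relative_risk):
    """PAF = p_R (RR - 1) / (p_R (RR - 1) + 1)."""
    excess = prevalence * (relative_risk - 1)
    return excess / (excess + 1)


# Same relative risk, increasingly common risk factor.
for prevalence in (0.05, 0.10, 0.30, 0.50, 0.90):
    paf = paf_from_rr(prevalence, relative_risk=5)
    print(f"p_R = {prevalence:.2f} -> PAF = {paf:.3f}")
```

The PAF rises steadily with p_R even though the strength of the association (RR) never changes.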
PAF IN RELATION TO DA
High risk factor prevalence!
• "... To demonstrate this, we estimated population attributable fraction for the four established breast cancer risk factors "early age at menarche", "late age at first full-term pregnancy/nulliparity", "history of breast cancer in mother/sister", and "history of benign breast biopsy" and examined the sensitivity of the population attributable fraction and its precision to changes in exposure cutpoints. Using the broad exposure definitions for early age at menarche (<14 years) and for late age at first full-term pregnancy (>20/nulliparous), we found a high proportion exposed among white cases and controls (98%)."
High FPF!
• "According to the data of Madigan et al., nearly the entire population of women in the United States has at least one of the considered risk factors. Since the vast majority of such exposed women will not develop breast cancer, ..."
PAF IN RELATION TO DA
• Empirical example
• Studying myocardial infarction (MI) in the MDC cohort, creating a risk index from traditional risk factors and biomarkers in order to predict disease within 15 years from baseline.
• Study population.
• The Malmö Diet and Cancer (MDC) study is a population-based, prospective epidemiologic cohort of 28 449 persons enrolled between 1991 and 1996. From this cohort, 6103 individuals were randomly selected to participate in the MDC cardiovascular cohort.
• 5054 had complete information on traditional risk factors, 4764 on biomarkers and 4489 on both traditional risk factors and biomarkers.
PAF IN RELATION TO DA
• The fact that the PAF increases with risk factor prevalence is important to keep in mind when choosing cut-offs for risk factors in your study and when interpreting the results.
PAF IN RELATION TO DA
• Another fact that is important to keep in mind about the PAF-DA relationship, when reading PAFs reported from other studies, is that a given PAF can be associated with very different combinations of TPF and FPF.
PAF IN RELATION TO DA
• In fact, the PAF does not care about the FPF at all. This is a mathematical fact.
• With some algebra we can show that
$$PAF = \frac{TPF - p_R}{1 - p_R}$$
• The PAF depends only on the risk factor prevalence and the TPF.
• The TPF can then be rewritten as
$$TPF = PAF + (1 - PAF)\,p_R$$
• And since we know that the FPF increases with p_R as well, if we fix the PAF, we can draw
PAF IN RELATION TO DA
[Figure: scenario with FPF=20%.]
We can remove 60% disease burden with moderate precision. FPF=20%
PAF IN RELATION TO DA
[Figure: panels with regions FP, TN, TP and FN and the PAF region marked; axes labelled Risk and Risk factor prevalence.]
We can remove 60% disease burden with high precision.
TPF=PAF. All exposed cases can be removed. With almost no FPF. Almost no unnecessary costs and side effects.
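The relation between PAF, TPF and the risk factor prevalence p_R stated earlier can be sketched in code. The function names are hypothetical, and the value PAF = 0.6 is taken from the 60% scenario on the slides; the prevalences are illustrative.

```python
def paf_from_tpf(tpf, prevalence):
    """PAF = (TPF - p_R) / (1 - p_R); note that the FPF does not appear."""
    return (tpf - prevalence) / (1 - prevalence)


def tpf_from_paf(paf, prevalence):
    """TPF = PAF + (1 - PAF) p_R."""
    return paf + (1 - paf) * prevalence


# For a fixed PAF of 0.6, the TPF ranges from PAF (rare factor) towards 1.
for prevalence in (0.01, 0.20, 0.50, 0.90):
    tpf = tpf_from_paf(0.6, prevalence)
    print(f"p_R = {prevalence:.2f} -> TPF = {tpf:.3f}")
```

The same PAF is therefore compatible with very different TPF values, and with whatever FPF the prevalence happens to imply.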
PAF IN RELATION TO DA
[Figure: panels with regions FP, TN, TP and FN and the PAF region marked; axes labelled Risk and Risk factor prevalence.]
We can still remove 60%. But with a large amount of unnecessary costs and possible side effects.
PAF IN RELATION TO DA
• Empirical example:
• Genotype and bladder cancer. A genetic association study (ref) showed strong evidence that the copy number of gene GSTM1 is significantly associated with risk of bladder cancer, with an OR = 1.9 corresponding to the GSTM1 null genotype (51% prevalence). If this marker were used as a binary marker for bladder cancer detection in the general population, it would result in 66% sensitivity and 50% specificity, a poor marker for diagnostic purposes. However, if a drug were to be developed that targeted the pathway(s) by which GSTM1 null increases risk, and if the drug were 100% effective in preventing bladder cancer without toxic side effects (and ignoring costs), then treatment of all marker carriers would reduce bladder cancer by 31% (PAR%).
• Ref: Li, Quantification of population benefit in evaluation of biomarkers: practical implications for disease detection and prevention, Med. Inform. & Decis. Mak. 2014.
PAF IN RELATION TO DA
If they had only presented PAF 31%: the TPF could be between 31 and 100%. The FPF could be between 0 and 100%.
PAF IN RELATION TO DA
31% of disease could be removed, but with low precision. FPF was 50%. This means that 50% of those who did not develop disease would still get the drug.
PAF IN RELATION TO DA
• Empirical example:
• CNV and neuroblastoma. A copy number variation associated with neuroblastoma was reported recently (ref). The prevalence of the marker (1q21.1) in the general population is about 9%, and the OR of the marker (copy loss) for neuroblastoma risk is estimated to be around 3. If this marker were dichotomized as a binary marker for predicting the absence or presence of the disease, it would result in a 23% sensitivity and 91% specificity, with a PAR% of approximately 15%, which indicates the marker could account for about 15% of neuroblastoma risk if the disease is truly caused by the CNV (copy-number variation). Assume a drug is developed that targeted this marker (1q21.1) for prevention. If the drug is 100% effective in disease prevention and had no side effects and all persons who were carriers for the marker were treated with the drug, it would reduce the total disease cases by 15% (PAR%).
• However, in the more likely scenario, drugs have significant side effects and are not 100% effective, such that more extensive risk-benefit analyses are needed.
• Ref: Li, Quantification of population benefit in evaluation of biomarkers: practical implications for disease detection and prevention, Med. Inform. & Decis. Mak. 2014.
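The figures quoted in the two examples from Li (bladder cancer and neuroblastoma) can be approximately reproduced from the OR and the marker prevalence alone. The sketch below assumes the disease is rare, so that the FPF is approximately the marker prevalence in the general population; that assumption and the function name are mine, not the cited paper's.

```python
def summarise_marker(odds_ratio, prevalence):
    """Return (TPF, FPF, PAF) for a binary marker under a rare-disease assumption."""
    fpf = prevalence  # assumption: controls resemble the general population
    tpf = odds_ratio * fpf / (1 + fpf * (odds_ratio - 1))
    paf = (tpf - prevalence) / (1 - prevalence)
    return tpf, fpf, paf


examples = {
    "GSTM1 null and bladder cancer": (1.9, 0.51),
    "1q21.1 copy loss and neuroblastoma": (3.0, 0.09),
}
for name, (odds_ratio, prevalence) in examples.items():
    tpf, fpf, paf = summarise_marker(odds_ratio, prevalence)
    print(f"{name}: TPF = {tpf:.2f}, specificity = {1 - fpf:.2f}, PAF = {paf:.2f}")
```

Under this approximation the bladder cancer specificity comes out at 0.49 rather than the reported 50%; the sensitivities and PAFs agree with the quoted values after rounding.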
PAF IN RELATION TO DA
If they had only presented PAF 15%: the TPF could be between 15 and 100%. The FPF could be between 0 and 100%.
PAF IN RELATION TO DA
In this example only 15% of disease could be removed. But it could be removed with good precision. Only 9% FPF.
But is it high enough if there are side effects?
PAF IN RELATION TO DA
• Summary I
• The PAF has a natural tendency to grow with increasing risk factor prevalence.
• When it increases, so does the FPF, and many with the given risk factor never get the disease.
• This may appear odd when interpreting the PAF as the proportion of disease explained by the risk factor.
• This fact may be highlighted by presenting TPF and FPF together with the PAF.
PAF IN RELATION TO DA
• Summary II
• The PAF is the proportion of cases that are exposed and can be removed. The TPF is the proportion of cases that are exposed, and not all exposed cases can be removed.
• Therefore, the PAF is the lowest possible value of the TPF (when the PAF is, say, 0.8, the lowest possible TPF is also 0.8).
• When they are the same, the FPF is low and disease burden can be removed with great precision.
PAF IN RELATION TO DA
• Summary III
• But a given PAF may appear with many different combinations of TPF and FPF.
• The PAF is the proportion that can be removed in an idealized situation where intervention has no costs or side effects.
• In reality, TPF and FPF need to be considered together with the PAF in order to make a real risk/cost/benefit analysis.
VARIANCE EXPLAINED
ICC and VPC
VARIANCE EXPLAINED
• In this context, a measure of how much the presence/absence of exposure explains of the presence/absence of the disease.
• As opposed to the PAF, which only cares about cases and presence of exposure.
• In the continuous case it corresponds to the (squared) correlation between predicted and observed outcome.
VARIANCE EXPLAINED
• For dichotomous outcomes there are several different alternatives to choose from.
• For the purpose of simplicity in this presentation, we chose the McKelvey and Zavoina R².
• Similar to the continuous case, it is defined as
$$R^2_{MZ} = \frac{V[y^*]}{V[y^*] + \frac{\pi^2}{3}}$$
• with the exception that the continuous predictor y* is measured on a latent scale and manifests as a 0/1 variable below/over a certain cutpoint.
• In this case, y* is assumed to follow the logistic distribution.
• Intuitively, it can be viewed as the correlation between the predicted and observed outcomes on the latent scale.
VARIANCE EXPLAINED
• When the predictor is one dichotomous exposure variable, this becomes
$$R^2_{MZ} = \frac{P_R\,(1 - P_R)\,\log(OR)^2}{P_R\,(1 - P_R)\,\log(OR)^2 + \frac{\pi^2}{3}}$$
• where P_R is the risk factor prevalence and π²/3 is the variance of the logistic distribution.
• A lot of algebra, but we observe that the variance explained depends on the prevalence of the risk factor and the odds ratio.
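The formula above for a single dichotomous exposure is easy to evaluate. The following sketch uses a hypothetical function name and an illustrative OR of 3; it only shows how the variance explained moves with the risk factor prevalence.

```python
import math


def r2_mz(prevalence, odds_ratio):
    """McKelvey and Zavoina R^2 for one dichotomous exposure (latent logistic scale)."""
    explained = prevalence * (1 - prevalence) * math.log(odds_ratio) ** 2
    return explained / (explained + math.pi ** 2 / 3)


# Fixed OR, varying prevalence: the value is largest near prevalence 0.5.
for prevalence in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"P_R = {prevalence:.2f} -> R2_MZ = {r2_mz(prevalence, 3.0):.4f}")
```

Because the prevalence enters only through P_R (1 - P_R), the measure is symmetric around 0.5 and falls towards zero for very rare or very common exposures.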
VARIANCE EXPLAINED
Just like TPF and FPF, VE is dependent on the risk factor prevalence.
VARIANCE EXPLAINED
• As TPF and FPF depend on risk factor prevalence, we can plot them together to see that..
VARIANCE EXPLAINED
[Figure: RR = 5.]
IN THE 2X2 TABLE
To see why this is..
IN THE 2X2 TABLE
In order for VE to be large, we want TP and TN to be as large as possible.
[Figure: regions FP, TN, TP and FN; axes labelled Risk and Risk factor prevalence.]
TP and TN are where exposure and disease are concordant. For FP and FN they are discordant.
IN THE 2X2 TABLE
As risk factor prevalence increases, TPF = TP/(TP+FN) grows. FPF = FP/(FP+TN) grows as well.
IN THE 2X2 TABLE
As risk factor prevalence decreases, TPF = TP/(TP+FN) decreases. FPF = FP/(FP+TN) decreases as well.
IN THE 2X2 TABLE
We want TP and TN to be as large as possible: a lot of concordant pairs. The best balance happens near risk factor prevalence 0.5.
VARIANCE EXPLAINED
• TPF: how well your prediction is doing with the cases.
• FPF: how well your prediction is doing with the controls.
• Variance explained: how well you are doing with both.
• The TPF has to be high and the FPF low in order for explained variance to be high.
• The concepts of VE and DA in terms of sensitivity/specificity are closely connected.
VARIANCE EXPLAINED
• However, the same explained variance can be associated with different sets of TPF and FPF.
VARIANCE EXPLAINED
• The different sets of TPF and FPF improve with increasing VE.
VARIANCE EXPLAINED
• VE is tightly linked to DA, in that TPF and FPF improve with increasing VE.
• As we have seen, the PAF is not.
COMPARE VE AND PAF
• VE and PAF convey very different information.
PAF tells you that your risk factor covers cases that could be removed.
VE tells you whether the risk factor explains presence/absence of disease.
VE AND PAF
• VE and PAF convey very different information.
For high risk factor prevalence, the PAF is high while VE is low. This is because the PAF ignores the fact that we are not explaining absence of disease in those exposed.
VE AND PAF – GOING BACK
May be explained using the previous example.
[Figure: regions FP, TN, TP and FN with the PAF region marked and cells labelled by the component causes E, A and B; axes labelled Risk and Risk factor prevalence.]
1. Studying exposure E
2. And we increase risk factor prevalence and disease prevalence in the non-exposed..
VE AND PAF – GOING BACK
Studying exposure E.
PAF is happy with knowing about E.
VE is not happy until we identify the interaction with the factor A.
VARIANCE EXPLAINED - SUMMARY
• When variance explained is reported in a study, we are informed of the combined effect of TPF and FPF.
• If it is high, we know we are doing well predicting both presence AND absence of disease.
• As opposed to the PAF, which is only concerned with presence of exposure in the diseased.
VARIANCE EXPLAINED – SUMMARY II
• But the same value of explained variance can still correspond to different combinations of TPF and FPF.
• Depending on the study problem, a higher TPF or a lower FPF may be preferred.
• TPF may be important if disease is serious and can be prevented.
• FPF may be important if treatment is expensive and/or has side effects.
• Therefore, it may still be sensible to present TPF and FPF in addition to VE.
• If VE is low, we do not know if we are doing poorly with cases or controls, or both.
THE ICC
• The ICC, used for instance in multilevel analysis, can be viewed in terms of explained variance.
• Take, for instance, a multilevel logistic regression analysis of 5-year survival after breast cancer diagnosis in different hospitals in Sweden, with hospital as a random effect.
• The ICC corresponding to the empty model, containing the random term only, is given by
$$ICC = \frac{V[u]}{V[u] + \frac{\pi^2}{3}}$$
• where u is the hospital effect.
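As a quick numerical illustration of the ICC formula for the empty multilevel logistic model, the sketch below evaluates it for a few hospital-level variances. The function name and the variance values are hypothetical.

```python
import math


def icc_logistic(var_u):
    """ICC = V[u] / (V[u] + pi^2 / 3) for a random-intercept logistic model."""
    return var_u / (var_u + math.pi ** 2 / 3)


# Hypothetical hospital-level variances on the logit scale.
for var_u in (0.1, 0.5, 1.0, 3.0):
    print(f"V[u] = {var_u:.1f} -> ICC = {icc_logistic(var_u):.3f}")
```

The structure is the same as for the variance explained: the between-hospital variance is set against the fixed logistic variance π²/3.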
THE ICC
• A high ICC tells you that, when using treating hospital to predict patient outcomes, the TPF is high and the FPF is low.
• Patients from high-risk hospitals die within 5 years, patients from low-risk hospitals do not.
• Therefore, treating hospital can be used to select high-risk patients for screening. Simply select the ones from high-risk hospitals.
• If causal, that is, if hospital-related factors such as treatment differences are actually causing the observed differences, hospital is an important level to consider when improving patient survival.
THE ICC
• If the ICC is low, even if differences are caused by hospitals, treating hospital is not important when improving patient survival, as other factors are more influential.
• Therefore it should not be used for screening or intervention.
• This is because the TPF may be low, causing us to miss high-risk patients treated in low-risk hospitals.
• The high FPF may be serious if the intervention is medical treatment and has side effects, because we are treating patients unnecessarily.
• What a high ICC does not tell us is whether a given definition of high-risk hospital is good in terms of TPF or FPF, or both.
• For this you need to calculate the TPF and FPF.
THE NUMBER NEEDED TO TREAT
Very briefly
NNT
• The number needed to treat (NNT) is used to evaluate the effectiveness of health-care interventions.
• The NNT is the number of patients who need to be treated in order to prevent one additional adverse outcome.
• It is defined as NNT = 1/RD, where RD is the risk difference between treated and non-treated patients.
• NNT = 1 is ideal, where everyone improves in the treatment group and no one improves in the control group.
• The greater the NNT, the less effective the treatment is.
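The definition NNT = 1/RD can be sketched as follows. The function name and the risks are hypothetical; the risks are risks of the adverse outcome in the control and treated groups.

```python
def number_needed_to_treat(risk_control, risk_treated):
    """NNT = 1 / RD, with RD the absolute reduction in risk of the adverse outcome."""
    risk_difference = risk_control - risk_treated
    if risk_difference <= 0:
        raise ValueError("NNT is only defined here for a beneficial treatment (RD > 0)")
    return 1 / risk_difference


# Two hypothetical trials with different control-group risks but the same RD.
print(number_needed_to_treat(risk_control=0.60, risk_treated=0.10))
print(number_needed_to_treat(risk_control=0.90, risk_treated=0.40))
```

Both calls give an NNT of about 2, although the underlying risks, and therefore the TPF and FPF, differ between the two settings.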
NNT
[Figure: regions FP, TN, TP and FN with the RD marked; axes labelled Risk and Risk factor prevalence.]
Studies with different prevalence in the control group may still have the same RD and hence NNT. This yields different TPF and FPF for the same NNT.
NNT
[Figure: numbers in graph: above, risk difference; below, NNT.]
DA associated with NNT is usually bad.
For interpretation: consider "Not being treated" as the exposure.
Even with NNT=2 we can still have up to 75% of improved patients not treated.
CONCLUSIONS - IN SHORT
• All epidemiological measures mentioned (OR, VE, ICC, PAF and NNT) may, in many studies, benefit from additional information on the DA, in order to be able to fully evaluate the risk, cost and benefit of a given intervention.
THANK YOU FOR YOUR ATTENTION