Transcript Cases - SACEMA
Advanced Epi, August 15-19
SACEMA 2011
Matthew Fox, Boston University Center for Global Health and Development; Department of Epidemiology; Health Economics and Epidemiology Research Office
Introductions
Who are you?
Where do you work/study?
What do you study?
Welcome
About me
Week-long short course on epi methods
2 sessions/day, each about 3 hours (depending)
Assumes intro/intermediate epi and practical experience with epi and stats
Mix of lecture and discussion
Too much material, take good notes, go back to them
Finish mid-day on Friday
Course works if you read and participate
Course Overview
Review basic epidemiologic principles
Reinterpret them in a new light
Think through problems/implications of what we learned in intro/intermed epi
Develop a causal framework(s) to hang our epidemiologic thinking
Learn/apply advanced epi methods
Modern Epidemiology III
Questions for Today
What is epidemiology, what is its goal?
What are measures of association and measures of effect?
What do these measures really mean?
Which ones have causal meanings?
What is the odds ratio really about? Why does everyone use it?
The goal of epidemiologic research
Epidemiology is study of:
The distribution and determinants of disease in human populations and the application of that knowledge to the control of disease
But the goal is:
To obtain a valid and precise (and generalizable) estimate of the effect of an exposure on a disease
Validity is the opposite of bias; precision is the opposite of random error
Fundamentally concerned with measurement
Anyone remember Type I and Type II error?
What are they?
Basic Statistics
                         Truth about null
                         Effect                 No effect
Our study   Effect       Correct                Type I error (alpha)
            No effect    Type II error (beta)   Correct
Type I: If we reject the null, what are the chances there is no effect?
Type II: If we fail to reject the null, what are the chances there is an effect?
How do we know a particular epidemiologic finding is true?
Suppose we find that the relative risk of cancer @ from exposure to vitamin # is 2.5, p = 0.049
Assume we did the perfect study:
No bias (confounding, selection, information)
80% power, alpha = 0.05
What is the chance there is really no effect of vitamins on cancer?
i.e., the true relative risk is 1
Syphilis testing in the US
In the US before 2005, Massachusetts required a syphilis test before marriage
Assume the test was: 95% sensitive and 95% specific
If I test positive, how likely is it that I truly have syphilis?
Answer is that it depends
Syphilis
Se = 95%, Sp = 95%, prevalence = 1%
               Truth +   Truth -    Total
Test +             95       495       590
Test -              5      9405      9410
Total             100      9900    10,000
PPV = 95/590 = 16%
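The table's arithmetic can be reproduced with a short Python sketch. The function name `ppv` is illustrative, and it assumes the simple setup above: a fixed sensitivity, specificity, and prevalence.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(truly diseased | test positive)."""
    true_pos = sensitivity * prevalence               # diseased and test +
    false_pos = (1 - specificity) * (1 - prevalence)  # not diseased but test +
    return true_pos / (true_pos + false_pos)

# Syphilis example: Se = Sp = 95%, prevalence = 1%
print(f"PPV = {ppv(0.95, 0.95, 0.01):.1%}")

# Same test, different (illustrative) prevalences
for prev in (0.01, 0.10, 0.50):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.95, 0.95, prev):.1%}")
```

The PPV rises with prevalence even though Se and Sp never change, which is the sense in which the answer "depends".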
Back to our study
Alpha and beta use the TRUTH as the denominator, so they behave like Se and Sp (power, 1 - beta, plays the role of sensitivity; 1 - alpha plays the role of specificity)
Judging the “correctness” of a single study is like judging the PPV: it depends on the prevalence of true hypotheses
Back to our study
alpha = 5% (Sp 95%), beta = 5% (Se 95%); prevalence of true hypotheses = 10%
               Truth +   Truth -    Total
Our study +       950       450      1400
Our study -        50      8550      8600
Total            1000      9000    10,000
950/1400 = 68% chance our study is right
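The same PPV arithmetic applies to hypotheses. In this Python sketch (function name illustrative), power plays the role of sensitivity, alpha the role of 1 - specificity, and `prior` is the assumed prevalence of true hypotheses. Applying the 10% prior to the 80%-power vitamin study is an assumption carried over from this slide, not something the vitamin example states.

```python
def prob_true_given_significant(alpha: float, power: float, prior: float) -> float:
    """P(a real effect exists | the study rejects the null).

    power plays the role of sensitivity (1 - beta), alpha the role of
    1 - specificity, and prior is the assumed prevalence of true hypotheses.
    """
    true_pos = power * prior          # real effects that reach significance
    false_pos = alpha * (1 - prior)   # null effects that reach significance anyway
    return true_pos / (true_pos + false_pos)

# Slide numbers: alpha = 5%, beta = 5% (power 95%), 10% of hypotheses true
print(f"{prob_true_given_significant(0.05, 0.95, 0.10):.0%}")
# Same assumed 10% prior, but the 80% power of the 'perfect' vitamin study
print(f"{prob_true_given_significant(0.05, 0.80, 0.10):.0%}")
```

Under that assumed prior, a significant result from the "perfect" study reflects a real effect only about 64% of the time, leaving roughly a 36% chance of no effect. The answer moves with the prior, not just with alpha and power.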
Take home message: We need to critically examine the way we have been taught to design and interpret epidemiologic research
Review of basic concepts
Study design, measures of disease frequency, measures of effect/association
The Source Population
The population that gives rise to cases
It is defined:
In time and place
With respect to population characteristics
With respect to external influences (modifiers)
Not as a sample of the general population
Cohorts
Membership in a cohort requires a person meet admissibility criteria
Have common admissibility-defining events
Membership begins once the temporally last criterion is met
Once a member, a person never leaves (membership is static or closed)
A closed cohort adds no new members and loses members only to death; an open cohort adds new members
Dynamic population
Membership requires a person satisfy the membership status criteria
They have common admissibility-defining characteristics
Membership exists so long as all of the status criteria are satisfied
A person can enter a dynamic population, leave it, and then re-enter
Cohorts vs. Dynamic Populations
Framingham heart study
Cohort
– the admissibility criterion is enrolling in the study in 1948. You never leave the cohort once you enroll.
Dynamic population
– could have instead studied all residents of Framingham from 1948 onwards, the catchment population for a case registry there. Some will leave, new people will join.
STUDY DESIGN: How to harvest information from the base
Census (cohort) or Sample (case-control)
Cases are valuable (information rich)
In SE calculations, these drive your standard error
Ex. $\mathrm{SE}(\ln RR) = \sqrt{1/A - 1/N_1 + 1/B - 1/N_0}$
Include all the cases in the population
Information density of the population that gave rise to cases is not great
Can include all or a sample
Nearly all of the base's information is harvested when the sample of the base is a small multiple of the number of cases
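To see why a small multiple of controls harvests most of the information, here is a rough Python sketch. It assumes the usual large-sample (Woolf) variance for a log odds ratio, 1/A + 1/B + 1/C + 1/D, which is not on the slide, and uses hypothetical counts; controls are split between exposure groups in proportion to the base.

```python
import math

def se_ln_or(a: float, b: float, c: float, d: float) -> float:
    """Large-sample (Woolf) standard error of ln(OR) from a 2x2 table."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

A, B = 400, 100        # hypothetical exposed and unexposed cases (all cases included)
N1, N0 = 1000, 1000    # hypothetical exposed and unexposed population sizes

ses = {}
for k in (1, 2, 4, 8, 16):                  # controls sampled per case
    n_controls = k * (A + B)
    c = n_controls * N1 / (N1 + N0)         # controls split in proportion to the base
    d = n_controls * N0 / (N1 + N0)
    ses[k] = se_ln_or(A, B, c, d)
    print(f"{k:>2} controls per case: SE(ln OR) = {ses[k]:.3f}")

# Limit as controls -> infinity: only the case terms 1/A + 1/B remain
floor = math.sqrt(1 / A + 1 / B)
print(f"floor set by the cases:  SE(ln OR) = {floor:.3f}")
```

In this hypothetical example, going from 1 to 2 controls per case helps noticeably, but beyond about 4 per case the gains are small, because the case terms dominate the standard error.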
Which is the best measure to assess causal effects?
1) Risk Difference
2) Risk Ratio
3) Odds Ratio
In a case-control study, from what population do we sample controls?
1) Those with disease
2) Those without disease
3) Everyone, regardless of whether they have the disease
[Figure: schematic diagrams of a cohort study and a case-control study]
Kramer and Boivin 1987
“We define a cohort study as a study in which subjects are followed forward from exposure to outcome… Inferential reasoning is from cause to effect. In case control studies, the directionality is the reverse. Study subjects are investigated backwards from outcome to exposure, and the reasoning is from effect to cause.”
Cohort Study: Relative Risks
                  Cases   Non-cases   Total
Index (E+)          A         C         N1
Reference (E-)      B         D         N0
Relative risk: $(A/N_1)/(B/N_0)$
Risk in exposed / risk in unexposed
Risk is number of cases / total at risk
Numerator is number of cases
Denominator includes both cases and non-cases!
Cohort Concept
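[Figure: cohort concept. At t0, N_E+ exposed and N_E- unexposed people are followed; the exposed yield A cases and C = N_E+ - A non-cases, and the unexposed yield B cases and D = N_E- - B non-cases]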
Cohort Study: Relative Risks
Relative risk:
$(A/N_1)/(B/N_0)$ can be rearranged as $(A/B)/(N_1/N_0)$
$A/B$ is the ratio of exposed to unexposed cases
$N_1/N_0$ is the ratio of exposed to unexposed in the population
Relative risk has meaning: average increase in risk produced by exposure
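A quick numerical check of the rearrangement in Python, using hypothetical cohort counts that are not from the slides: both forms give the same relative risk.

```python
def risk_ratio(a: int, b: int, n1: int, n0: int) -> float:
    """RR as risk in exposed over risk in unexposed: (A/N1) / (B/N0)."""
    return (a / n1) / (b / n0)

def risk_ratio_rearranged(a: int, b: int, n1: int, n0: int) -> float:
    """Same RR as (case ratio) / (population ratio): (A/B) / (N1/N0)."""
    return (a / b) / (n1 / n0)

# Hypothetical cohort: 30 cases among 200 exposed, 20 cases among 400 unexposed
print(f"{risk_ratio(30, 20, 200, 400):.2f}")
print(f"{risk_ratio_rearranged(30, 20, 200, 400):.2f}")
```

The second form is the one that matters for case-control designs: it shows that the only thing needed besides the cases is the exposure ratio in the population.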
Case-control: Cases
Members of population who develop disease over the follow-up period
Same cases as the analogous cohort study
Case ascertainment is influenced by design
Primary base: population defined first
Secondary base: cases defined first
Case-control: Controls
A sample of the population experience that gave rise to the cases
3 options (paradigms)
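1) Un-diseased experience (Option 1: cumulative sampling)
2) Population at risk at the beginning of the study (Option 2: case-cohort)
3) Population experience over follow-up (Option 3: density sampling)
[Figure: cases vs. non-cases over follow-up at 0, 6, 12, 18, and 24 months: cases 0, 5, 10, 15, 20; non-cases 100, 95, 90, 85, 80]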
Case-control Concept
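[Figure: case-control concept. Exposed cases A and unexposed cases B arise from the exposed (N_E+) and unexposed (N_E-) populations followed from t0; controls can be sampled by Option 1: cumulative sampling (non-cases at the end, C = N_E+ - A and D = N_E- - B), Option 2: case-cohort sampling (the population at t0), or Option 3: density sampling (the population experience over follow-up)]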
Case-control study
               Cases   Controls
Index            A        C
Reference        B        D
Now we can’t estimate the risks $A/N_1$ and $B/N_0$ because we don’t know the denominators
Left with an odds ratio
But how to interpret?
2 ways to calculate an OR
Cross-product ratio:
$(A \times D)/(B \times C)$
Not particularly meaningful, but it works
2 ways to calculate an OR
Case ratio/base ratio:
$(A/B)/(C/D)$
$A/B$ is the ratio of exposed to unexposed cases
$C/D$ is the ratio of exposed to unexposed controls
Remember back to the relative risk: here $C/D$ fills in for $N_1/N_0$
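The two calculations are algebraically identical, which a small Python sketch with hypothetical case-control counts confirms (the function names are illustrative).

```python
def or_cross_product(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio as the cross-product ratio (A*D)/(B*C)."""
    return (a * d) / (b * c)

def or_case_base(a: float, b: float, c: float, d: float) -> float:
    """Same odds ratio as (case exposure ratio)/(control exposure ratio): (A/B)/(C/D)."""
    return (a / b) / (c / d)

# Hypothetical counts: A, B = exposed/unexposed cases; C, D = exposed/unexposed controls
A, B, C, D = 60, 40, 90, 210
print(f"cross-product: {or_cross_product(A, B, C, D):.2f}")
print(f"case ratio / base ratio: {or_case_base(A, B, C, D):.2f}")
```

The numbers agree; the case ratio/base ratio form is preferred because it says what the controls are for: estimating the exposure ratio in the base.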
The trohoc fallacy
Full cohort:
               Cases   Non-cases   Total
Index            400        600     1000
Reference        100        900     1000
10% sample of non-cases (the rest not sampled):
               Cases   Non-cases
Index            400         60
Reference        100         90
RR = (400/1000) / (100/1000) = 4.0
OR = (400/60) / (100/90) = 6.0
The trohoc fallacy is the idea that a case-control study is a cohort study done backwards (“trohoc” is “cohort” spelled backwards, a heteropalindrome)
Requires a rare disease assumption for the odds ratio to approximate the relative risk
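The slide's arithmetic can be reproduced in a few lines of Python; the dictionary layout is just one convenient choice, and the 10% sample is treated as hitting exactly 10% (no sampling error).

```python
# Full cohort from the trohoc-fallacy slide
cases = {"index": 400, "reference": 100}
totals = {"index": 1000, "reference": 1000}
non_cases = {g: totals[g] - cases[g] for g in totals}     # 600 and 900

# The cohort's risk ratio
rr = (cases["index"] / totals["index"]) / (cases["reference"] / totals["reference"])

# Cumulative sampling: controls are a 10% sample of those still NON-diseased at the end
controls = {g: 0.10 * non_cases[g] for g in non_cases}    # 60 and 90
odds_ratio = (cases["index"] / controls["index"]) / (cases["reference"] / controls["reference"])

print(f"RR = {rr:.1f}")
print(f"OR with controls drawn from non-cases = {odds_ratio:.1f}")
```

Because 40% of the index group become cases, the disease is not rare, and the odds ratio (6.0) overstates the risk ratio (4.0).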
10% sample of population that gave rise to cases
The trohoc fallacy revealed
Full cohort:
               Cases   Non-cases   Total
Index            400        600     1000
Reference        100        900     1000
Controls are a 10% sample of the whole population that gave rise to cases (the rest not sampled):
               Cases   Controls
Index            400        100
Reference        100        100
RR = (400/1000) / (100/1000) = 4.0
OR = (400/100) / (100/100) = 4.0
Sample the total population that gave rise to the cases (which includes cases), not the undiseased at the end
Cases can be their own controls if randomly sampled
Requires no rare disease assumption
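Changing only where the controls come from removes the discrepancy. This Python sketch reuses the slide's cohort and, as an illustrative assumption, varies the sampling fraction to show that it cancels, so the OR equals the RR with no rare-disease assumption. As before, each sample is assumed to hit its expected counts exactly.

```python
def case_cohort_or(cases: dict, totals: dict, fraction: float) -> float:
    """OR when controls are a random `fraction` of the WHOLE population at t0
    (cases included), assuming the sample hits the expected counts exactly."""
    controls = {g: fraction * totals[g] for g in totals}
    return (cases["index"] / controls["index"]) / (cases["reference"] / controls["reference"])

cases = {"index": 400, "reference": 100}
totals = {"index": 1000, "reference": 1000}
rr = (cases["index"] / totals["index"]) / (cases["reference"] / totals["reference"])

for fraction in (0.05, 0.10, 0.20):
    est = case_cohort_or(cases, totals, fraction)
    print(f"sampling {fraction:.0%} of the base: OR = {est:.1f}, RR = {rr:.1f}")
```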
Miettinen on the trohoc fallacy
“Consider the clinical trial: the concern is, as always, to contrast categories of treatment as to subsequent occurrence of some outcome phenomenon, whereas comparing different categories of the outcome as to the antecedent distribution of treatment is uninteresting if not downright perverse.”
Preferred terms like “case-referent” and “case-base” studies, since “the base sample is no more a control series than a census of the base is”
Why it works
$OR = \frac{A \times D}{B \times C} = \frac{A/B}{C/D}$
If we sample 10% of the base, the controls are $C = 0.1 N_1$ and $D = 0.1 N_0$, so the odds ratio is:
$OR = \frac{A/B}{(0.1 N_1)/(0.1 N_0)} = \frac{A/B}{N_1/N_0} = RR$
               Cases   Non-cases   Total
Index            A         C         N1
Reference        B         D         N0
Cohort studies exclude those who are not at risk for disease (though they don’t need to). In a case-control study, should we exclude those not at risk for exposure? For example, in a study of hormonal contraception and heart disease, should we exclude nuns?
With appropriate sampling, the odds ratio is interpreted as an estimate of the relative risk, which has meaning. Case-control studies are cohort studies done efficiently, not cohort studies done backwards.
Measures of Disease Frequency
Provide an estimate of the occurrence of disease in a population
Typically we study first occurrence, as later occurrences are often affected by the first
Incorporates:
Disease state
Time
Population definition
Measures of Disease Frequency
Prevalence:
Proportion of population with disease at a particular time
Cross-sectional
Reflects rate of disease occurrence and survival with disease
Measures of Disease Frequency
Cumulative Incidence (Simple)
Proportion of a population that develops disease over a follow-up period
Also called incidence proportion or risk
Bounded by 0 and 1
Time is not part of the measure but must be reported
Difficult to measure in dynamic populations
$CI_{(t_0,t)} = I_{(t_0,t)} / N_0$
Measures of Disease Frequency
Incidence rate (density)
Number of newly developed cases divided by accumulated person-time
Time is part of the denominator
Can be used in dynamic populations/cohorts
Ignores the distinction between individuals (2/100 py could be 2 people followed 50 years each who both get the event, or 100 people followed 1 year each of whom 2 get the event)
$IR_{(t_0,t)} = I_{(t_0,t)} / \sum PT$, where $\sum PT = \sum_{i=1}^{N} t_i$ or $\sum PT = N \bar{t}$
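The 2/100 person-years example can be checked with a small Python sketch. The helper names are illustrative, follow-up times are in years, and both scenarios assume everyone starts disease-free.

```python
def incidence_rate(new_cases: int, follow_up_years: list[float]) -> float:
    """New cases divided by accumulated person-time (sum of individual t_i)."""
    return new_cases / sum(follow_up_years)

def cumulative_incidence(new_cases: int, n_at_start: int) -> float:
    """New cases divided by the number at risk at the start of follow-up (N0)."""
    return new_cases / n_at_start

# Scenario 1: 2 people followed 50 years each; both get the event
ir_1 = incidence_rate(2, [50.0, 50.0])
ci_1 = cumulative_incidence(2, 2)

# Scenario 2: 100 people followed 1 year each; 2 get the event
ir_2 = incidence_rate(2, [1.0] * 100)
ci_2 = cumulative_incidence(2, 100)

print(f"Scenario 1: IR = {ir_1 * 100:.0f}/100 py, CI = {ci_1:.2f}")
print(f"Scenario 2: IR = {ir_2 * 100:.0f}/100 py, CI = {ci_2:.2f}")
```

Both scenarios give the same rate while the cumulative incidences differ sharply, which is what "ignores the distinction between individuals" means in practice.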
Measures of Disease Frequency
Rules for counting person time
Start disease-free, with no history of disease at entry
At risk for outcome? Not necessary, but wasteful
Start after exposure is complete (not during) and after the minimum induction period
Stop when disease occurs (date or midpoint)
Stop if withdrawn (lost to follow-up, death from another cause, study ends, no longer at risk)
Only those eligible to be counted in numerator are in denominator
Ask: if they became a case, would I have counted them?
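The counting rules can be sketched in Python. The `Participant` record, its fields, and the single fixed minimum induction period are hypothetical simplifications for illustration; real studies often have time-varying exposure and more complex entry rules.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    """Hypothetical follow-up record; all times in years from a common origin."""
    entry: float         # enters disease-free, with no history of disease
    exposure_end: float  # time the exposure is complete
    exit: float          # earliest of disease, loss to follow-up, competing death, or study end
    had_disease: bool    # True only if follow-up stopped because the outcome occurred

def at_risk_start(p: Participant, min_induction: float) -> float:
    # Person-time starts after exposure is complete AND the minimum induction period has passed
    return max(p.entry, p.exposure_end + min_induction)

def person_time(p: Participant, min_induction: float) -> float:
    # Time before the at-risk start is not counted; never negative
    return max(0.0, p.exit - at_risk_start(p, min_induction))

def counts_as_case(p: Participant, min_induction: float) -> bool:
    # Numerator rule: an event counts only if it happened during counted person-time
    return p.had_disease and p.exit > at_risk_start(p, min_induction)

cohort = [
    Participant(entry=0.0, exposure_end=1.0, exit=6.0, had_disease=True),
    Participant(entry=0.0, exposure_end=1.0, exit=1.5, had_disease=True),   # event inside induction window
    Participant(entry=0.5, exposure_end=1.0, exit=8.0, had_disease=False),  # censored at study end
]
induction = 1.0
cases = sum(counts_as_case(p, induction) for p in cohort)
pt = sum(person_time(p, induction) for p in cohort)
print(f"{cases} case(s) / {pt:.1f} person-years")
```

The second participant contributes neither a case nor person-time: the event falls before the induction period is over, so it would not have been counted in the numerator, and by the same rule that time stays out of the denominator.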
Person Time Issues I
We conduct a cohort study of continuous smoking vs. no smoking and prostate cancer
Enroll 1000 smokers and 1000 non-smokers
At end, find 100 non-smokers became smokers. Should we exclude them?
We can’t, because if they had become cases while not smoking, we would have included them
Person Time Issues II
Study HAART regimens and death
But there is much death and LTFU (loss to follow-up) in the first 6 months, and we care about long-term mortality
Exclude any deaths in the first 6 months
OK if all we care about is long-term effects
When should person time start?
Immortal person-time biases towards null
[Figure: follow-up lines for 9 people with individual follow-up times 2, 5, 5, 5, 5, 5, 5, 5, 5 (total 42); outcome marked by black triangles]
Prevalence = 2/8 = 0.25
Cumulative incidence = 2/9
Incidence rate = 2/42
Measure of Effect
Comparison of occurrence of outcome in the same population at same time under two different conditions
Only one can be observed; the second is “counterfactual” (we will come back to this)
Theoretical; as such, we substitute a measure of association
But only as an approximation to a measure of effect
Measures of Association
Comparison of incidence in 2+ populations
Relative:
Comparison by division
Null (no effect) is 1
Log scale (distance from 0 to 1 is the same as from 1 to infinity)
Difference:
Comparison by subtraction
Null (no effect) is 0
Distance above and below the null is equivalent
Calculations
$RD = CI_{E} - CI_{\bar{E}}$
$RR = CI_{E} / CI_{\bar{E}}$
$IRD = IR_{E} - IR_{\bar{E}}$
$IRR = IR_{E} / IR_{\bar{E}}$
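The four measures can be computed directly, as in this Python sketch. The cumulative incidences reuse the cohort counts from the trohoc-fallacy slides; the person-time denominators for the rates are hypothetical values chosen only for illustration.

```python
import math

# Cumulative incidence: index/reference cohort counts from the trohoc-fallacy slides
ci_e = 400 / 1000    # exposed (index)
ci_u = 100 / 1000    # unexposed (reference)

# Incidence rates: HYPOTHETICAL person-time, for illustration only
ir_e = 400 / 8000    # cases per person-year, exposed
ir_u = 100 / 10000   # cases per person-year, unexposed

rd = ci_e - ci_u     # risk difference (null = 0)
rr = ci_e / ci_u     # risk ratio (null = 1)
ird = ir_e - ir_u    # incidence rate difference (null = 0)
irr = ir_e / ir_u    # incidence rate ratio (null = 1)

print(f"RD = {rd:.2f}, RR = {rr:.1f}, IRD = {ird * 1000:.0f}/1000 py, IRR = {irr:.1f}")

# Relative measures are symmetric on the log scale: RR = 4 and RR = 1/4 sit equally far from the null
print(f"ln(4) = {math.log(rr):.3f}, ln(1/4) = {math.log(1 / rr):.3f}")
```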
Conclusion
Objective is a VALID and PRECISE estimate of the effect of an exposure on an outcome
Need to think critically about the logic of the methods we have been taught
Make sure we understand how to validly design studies and how to correctly interpret study findings
Odds ratios are odd
Correct sampling means we can reduce our reliance on them