LECTURE 3 – June 9 2006
Cohort Studies,
Selection Bias
Survival analysis
Dr. Dick Menzies
Cohort Studies – General
• Prospective study: Incidence of new disease in
persons who start without disease.
– Follow-up period – weeks, months, years
– One or more diseases can be measured
• Measure exposures – at start or ongoing.
– Can measure multiple exposures
• Compare incidence in exposed vs unexposed
groups within population – per unit of time
Advantages of cohort over case-control or
cross-sectional designs
• KEY – exposure measurement is made before
disease occurs
– Exposure more accurate – prospective, and often
repeated
– Eliminates bias in measurement of exposures:
• Recall bias of patients, or observer bias in exposure
assessment - with knowledge of disease status.
Experimental vs cohort studies
• Expt studies are a form of cohort study
– Same - Persons are free of disease at outset
– But - Exposure is RANDOMLY ASSIGNED to
some/not others
– Same - Measure outcomes after exposure
• Cohort study – exposures NOT assigned, but
occur naturally, or are chosen purposely by
subjects, or by their MD’s, etc
Advantages of cohort studies over
experimental
• Ideal to study natural history, course of disease,
prognostic factors.
• Etiologic research for exposures that can not be
given experimentally, for ethical reasons
– Smoking, asbestos, air pollution
• Interventions not feasible for randomization
– Diagnostic tests, complex care management
• Some outcomes not well measured in trials:
– Compliance by patients and MDs
Advantages of cohort studies over
experimental
• Total population studied.
– Children, elderly, pregnant women, mentally incompetent
• Full spectrum of illness
– From patients in ICU to minimal forms of disease
– Often excluded in RCT – esp Pharma trials
• Findings more likely to be applicable in real world
– Adverse events often more accurately measured
• Population based estimates of exposure effects
• BUT you MUST include as full a spectrum of patients
as possible (No exclusions in observational studies)
Disadvantages
• Selection bias – Persons who get exposed not
same as unexposed
– Surgery – who is ‘operable’ vs ‘inoperable’
• Exposures that seem same, are not
– Potential bias in measuring
• Drop-outs – reduce power, may bias (a lot)
• Outcome assessment can be biased
Cohort Designs
• Prospective: Subjects without disease followed to
determine incidence of diseases
– Exposures measured at baseline, and/or
concurrently.
– Disease – measured during follow-up
• Retrospective: Subjects first identified based on
past Exposures (Hiroshima survivors, work-force)
– Outcomes may then be ascertained directly, or may
have already occurred
– Key – exposure well defined, AND occurred well
before disease (useful for diseases like cancer)
Cohort Populations
• General populations – no special exposures
– Framingham study – a true general population
• All persons in the community invited
• Proxy general Pop’n - Nurses, Military, Company
– Exposures studied are those of general pop’n.
– Diet, exercise, smoking, alcohol
• Exposure defined cohort
– Work-force to study occupational exposures
– Group of patients who received certain therapy
Cohorts of patients
• Clinical cohorts – patients with a given condition
– Case series can be form of cohort study
– But – must have differences in ‘exposure’
• Different types, severity, causes
• Potential problems in cohort studies with patients:
– Referral bias – only the sickest or rarest cases are referred
– Lead-time bias – better facilities = earlier Dx
– Multi-serial cohorts –
• Cohort starts with all diabetics in 2004
• New, and old = very different patients
Open versus Closed Cohorts
• An open cohort – or dynamic cohort - is one where
people can enter or leave
– Examples: A workforce study that is ongoing
– A city or other geographic location
• A closed cohort is where all persons in the cohort
are defined at entry. No one enters, members can
only exit.
– Eg. McGill medical school class of 2004
Selection Bias
• Definition – selection bias occurs when there is a
distortion in the estimate of effect (association)
because the study or sample population is not truly
representative of the underlying population in terms
of the distribution of exposures and/or outcomes.
• Other terms: referral bias, volunteer bias, healthy
worker effect, susceptibility bias, drop-out bias
• How/where in a study can this occur?
Figure 15-2. Diagram showing successive transfers from the intended population to the group admitted to a study of therapy
Groups, with the losses and the reasons for losses at each step:
• INTENDED POPULATION
– Losses: NOT AVAILABLE (treated at other hospitals or by other doctors)
• AVAILABLE GROUP
– Losses: NOT CANDIDATES (not identified or accessible)
• CANDIDATE GROUP
– Losses: NOT ELIGIBLE (did not fulfill diagnostic criteria)
• ELIGIBLE GROUP
– Losses: EXCLUDED (superimposed condition of severity, co-morbidity, co-medication, or non-compliance)
• QUALIFIED GROUP
– Losses: NONRECEPTIVE (refused participation or acceptance of assigned maneuver)
• ADMITTED GROUP
Obtaining a representative sample
• In a representative sample we hope for a sample that shows
us the true underlying distribution of exposure and disease:
Truth – distribution of exposure and disease in source population
               Exposed   Not Exposed
Diseased          A           B
Not Diseased      C           D
• Odds Ratio = (A/B) / (C/D) = (A x D) / (B x C)
Un-biased Sampling
               Exposed   Not Exposed
Diseased         P1A         P2B
Not Diseased     P3C         P4D
• Odds Ratio = [(P1 x P4) / (P2 x P3)] x [(A x D) / (B x C)]
IF (P1 x P4) / (P2 x P3) = 1 THEN
OR = (A x D) / (B x C) = Truth!
Biased Sampling
               Exposed   Not Exposed
Diseased         P1A         P2B
Not Diseased     P3C         P4D
• If sample all of A (P1=1) but only half of B (P2=0.5)
• And 1/3 of C and D (P3=0.33, P4=0.33)
• Odds Ratio = [(P1 x P4) / (P2 x P3)] x [(A x D) / (B x C)]
= [(1 x .33) / (.5 x .33)] x [(A x D) / (B x C)]
= 2 x (A x D) / (B x C)
IF (P1 x P4) / (P2 x P3) = 2 THEN
ORestimated = 2 x ORtrue
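The sampling-fraction argument can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the function names and the source-population counts (A=40, B=60, C=100, D=300) are hypothetical values chosen only for illustration. It applies sampling fractions P1 to P4 to the four cells and recomputes the odds ratio.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as on the slides:
    a, b = diseased (exposed, not exposed); c, d = not diseased (exposed, not exposed)."""
    return (a * d) / (b * c)


def sampled_odds_ratio(a, b, c, d, p1, p2, p3, p4):
    """Odds ratio observed when fractions p1..p4 of cells A..D are sampled."""
    return odds_ratio(p1 * a, p2 * b, p3 * c, p4 * d)


# Hypothetical source population (illustrative counts, not from the slides)
A, B, C, D = 40, 60, 100, 300

true_or = odds_ratio(A, B, C, D)

# Equal sampling fractions in every cell: no distortion
unbiased_or = sampled_odds_ratio(A, B, C, D, 0.1, 0.1, 0.1, 0.1)

# Fractions from the biased-sampling slide: P1=1, P2=0.5, P3=P4=1/3
biased_or = sampled_odds_ratio(A, B, C, D, 1.0, 0.5, 1 / 3, 1 / 3)

# Unequal fractions whose bias factor (P1 x P4)/(P2 x P3) still equals 1
proportional_or = sampled_odds_ratio(A, B, C, D, 1.0, 0.5, 0.5, 0.25)

print(f"true OR         = {true_or:.2f}")
print(f"unbiased sample = {unbiased_or:.2f}")
print(f"biased sample   = {biased_or:.2f}")
print(f"proportional    = {proportional_or:.2f}")
```

The estimate is distorted only through the factor (P1 x P4)/(P2 x P3), whatever the underlying cell counts are.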
Example – Biased sampling
• We are planning a case control study of spicy foods
and peptic ulcer disease
– Cases = endoscopy proven peptic ulcer disease
– Controls = elective inguinal hernia repair at the same
hospital
• The truth: no relationship i.e. the odds ratio = 1
• The problem – physicians at this hospital strongly
believe spicy foods are an important risk factor for
peptic ulcer disease.
– Therefore they tend to refer patients for endoscopy
more often if they had a diet of spicy foods
Biased sampling (cont’d)
• TRUTH:
            Spicy Foods   No Spicy Foods   Odds
Cases           25             75          25/75
Controls        25             75          25/75
Total           50            150          Odds ratio = 1.0
• BIASED SAMPLE:
            Spicy Foods   No Spicy Foods   Odds
Cases           25             37.5        25/37.5
Controls        25             75          25/75
Total           50            112.5        Odds ratio = 2.0
Example: biased sampling
• So, 100% of patients with peptic ulcer disease
AND history of spicy foods have endoscopy
• But only half of those with peptic ulcer, but
WITHOUT history of spicy food are in fact
diagnosed – (they do not have endoscopy, so
they are missed)
• Estimated association will be twice what is
correct.
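The arithmetic of the spicy-food example can be reproduced in a few lines. This is an illustrative Python sketch using the counts from the slides; the variable and function names are my own, and the 0.5 diagnosis fraction encodes the stated assumption that only half of the ulcer patients without a spicy-food history have endoscopy.

```python
# Counts as (spicy foods, no spicy foods), taken from the example
truth_cases = (25, 75)
truth_controls = (25, 75)

# Assumption stated in the example: only half of the ulcer patients
# WITHOUT a spicy-food history are referred for endoscopy and diagnosed
diagnosed_fraction_no_spicy = 0.5
biased_cases = (truth_cases[0], truth_cases[1] * diagnosed_fraction_no_spicy)


def exposure_odds(group):
    """Odds of spicy-food exposure within a group."""
    spicy, no_spicy = group
    return spicy / no_spicy


true_or = exposure_odds(truth_cases) / exposure_odds(truth_controls)
biased_or = exposure_odds(biased_cases) / exposure_odds(truth_controls)

print(f"true odds ratio   = {true_or:.1f}")
print(f"biased odds ratio = {biased_or:.1f}")
```

Controls are unaffected by the referral pattern, so the whole distortion comes from the missed unexposed cases.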
To achieve Un-biased Sampling
• To achieve un-biased sampling the easiest is:
P1 = P2 = P3 = P4
• This means the proportion sampled from each
group is the same, e.g., 10% are sampled from each
of the groups
• However, if P1 is higher than P2 this can be okay as
long as P3 is higher than P4 by the same ratio, so that
(P1 x P4) / (P2 x P3) = 1
Volunteer Bias
• Participants in a study are different from refuseniks
– Mortality of non-participants in the Framingham
study
• Subjects with exposure and the outcome are more
(or less) likely to participate
– Eg HIV infection and homosexuality – in Africa
– Disease and occupational exposures, particularly for
self-reported exposures, and compensable illnesses.
Susceptibility bias
• Persons allocated to one form of treatment, or who
self-select to certain exposures, are more or
less susceptible to developing the health outcomes of
interest.
– Eg Cancer patients who have surgery vs medical or
radiotherapy only. Surgical patients often appear to
do better.
Healthy worker effect
• An important bias – found in work-force studies
– Reflects medical screening (military, mining)
– Or, physical requirements of job
• Results in better health status initially than general
population, or certain control pop’n
– Strongly affects results in cross-sectional studies
– Reduces risk or delays occurrence of health
outcomes of interest.
• Also occurs in smokers “healthy smoker effect”
– Lung function in adolescent smokers > non-smokers
Example of healthy smoker effect
Selection Bias in Cohort Studies – Dropouts
• Losses to follow up occur in all cohort studies
• Reduce power, and dilute results
• Problematic if more drop-outs in one exposure group
• REALLY important if drop-out is due to development
of disease
Selection Bias in Cohort Studies – Dropouts
• Example:
– study of incidence of diabetes in obese persons.
– Truth: IRR = 3.0
– Losses – 33% in diabetes/obesity group (death/other)
• 5% losses in all other groups
– (P1 x P4) / (P2 x P3) does not = 1
Selection Bias from Dropouts - Example
             At onset   Dropped out   Dropped out       Detected at end
                        (no DM)       (with diabetes)   with diabetes
Obese          227         10              9                 18
Not Obese      773         35              3                 30
• Incidence (biased):
– In obese: 18/208 = 8.7%
– In non-obese: 30/735 = 4.1%
• Biased incidence rate ratio – 8.7%/4.1% = 2.1
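The biased estimate can be recomputed directly from the table counts. The Python sketch below is illustrative only: the dictionary keys and function name are my own labels for the table columns, and the incidence is computed the way the slide does it, among those who remained under follow-up.

```python
# Counts from the dropout example table
groups = {
    "obese":     {"at_onset": 227, "dropped_no_dm": 10, "dropped_dm": 9, "detected_dm": 18},
    "not_obese": {"at_onset": 773, "dropped_no_dm": 35, "dropped_dm": 3, "detected_dm": 30},
}


def observed_incidence(g):
    """Incidence among those still followed at the end (drop-outs excluded)."""
    remaining = g["at_onset"] - g["dropped_no_dm"] - g["dropped_dm"]
    return g["detected_dm"] / remaining


inc_obese = observed_incidence(groups["obese"])          # 18 / 208
inc_not_obese = observed_incidence(groups["not_obese"])  # 30 / 735
biased_ratio = inc_obese / inc_not_obese

print(f"obese:     {inc_obese:.1%}")
print(f"not obese: {inc_not_obese:.1%}")
print(f"biased incidence rate ratio = {biased_ratio:.1f}")
```

Because a third of the obese participants who developed diabetes were lost, the observed ratio falls well below the stated true value of 3.0.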
Drop-outs from a work-force - impact
• An occupational exposure causes health effects
quickly in a susceptible sub-group.
– They leave the work-force (quit) quickly.
– Examples:
• Allergy to lab animals in researchers
• Asthma in Grain workers
• Cross-sectional studies – no susceptibles left
• Cohorts – Can miss when setting up cohort.
– Outcomes occur in small number of new workers
(power problem)
Controlling Selection Bias
• Control in design - Most important is prevention
– Recruitment – high % in all groups
– Same %recruitment in exposed/not exposed
– Close follow-up to prevent dropouts
• Assess in analysis
– Compare participants to non-participants
• Sub-groups of non-participant
– Compare dropouts with those who remained
– Sensitivity analysis – best case/ worst case to
assess impact of selection biases
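One way to run a best case / worst case sensitivity analysis is to assign the drop-outs the most extreme possible outcomes and see how far the risk ratio moves. The Python sketch below is a simplified illustration, not a method given in the lecture. It reuses the counts from the obesity and diabetes dropout example (208 obese and 735 non-obese completers, with 19 and 38 drop-outs), and the function names are hypothetical.

```python
def risk(cases, n):
    return cases / n


def risk_ratio_bounds(exp_cases, exp_completers, exp_lost,
                      unexp_cases, unexp_completers, unexp_lost):
    """Extreme-case bounds on the risk ratio when drop-out outcomes are unknown."""
    exp_total = exp_completers + exp_lost
    unexp_total = unexp_completers + unexp_lost
    # Scenario 1: every lost exposed person had the outcome, no lost unexposed person did
    high = risk(exp_cases + exp_lost, exp_total) / risk(unexp_cases, unexp_total)
    # Scenario 2: no lost exposed person had the outcome, every lost unexposed person did
    low = risk(exp_cases, exp_total) / risk(unexp_cases + unexp_lost, unexp_total)
    return low, high


# Completers-only estimate, as observed in the dropout example
observed = risk(18, 208) / risk(30, 735)

low, high = risk_ratio_bounds(18, 208, 19, 30, 735, 38)

print(f"observed risk ratio = {observed:.2f}")
print(f"extreme-case range  = {low:.2f} to {high:.2f}")
```

If the conclusion holds at both extremes, selection bias from drop-outs cannot explain it. Here the range crosses 1, so the drop-outs could matter a great deal.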
Cohort Studies – Exposure Assessments
• Prospective - Measure one or more exposures at start
– Specific: cholesterol, obesity, smoking, blood pressure.
– Proxies: occupation, housing
– Measure once, or repeatedly to account for changes in
exposure over time (obesity, smoking, BP).
• Retrospective
– Exposure based upon past events
– These are rarely quantified
• Proxies used (job description, distance from blast)
• Sometimes records (transfusions, dust levels)
Pitfalls in exposure assessments
• Observer bias – disease ascertained at same time
– Blind observers to study hypothesis
– Standardized protocols
• Are all exposures the same?
– Complications of pleural tap at MGH/RVH >> MCI
• Did you forget something?
– Hard to go back to the start of cohort
– Measure everything, freeze the rest
– Add measures as new things reported
Cohort Studies – Outcome Assessments
• Baseline – ensure cohort members free of disease.
– Easy if prospective, harder if retrospective
• Outcomes measured periodically
– Through questionnaire, exam, labs (direct)
– Through health service utilization (databases)
– Through vital statistics (databases)
• Case definition key for outcome assessments
– Diagnosis of milder disease common problem
Pitfalls in outcome assessments
• Ascertainment bias – if patients with Factor X are
more likely to have testing to detect outcome.
– Standardized protocols, blinding to exposures
• Observer bias – patients with Factor X more likely
to be diagnosed with outcome of interest
– Common with more subjective tests – eg CXR
– Solution – independent reviewers, blinded to
exposure status (Factor X)
• Lead time bias – earlier diagnosis makes survival
look better
Lead-time bias - example
Cohort Studies – Measures of Incidence
• Incidence rate (simplest) =
(number developing disease) / (Total number who entered cohort), per unit of time
• Cumulative incidence =
(number developing disease) / (Total number who entered cohort), over total follow-up period
Measuring Incidence in Cohort Studies
How to handle drop outs etc..?
• Drop-outs from loss to follow-up, death from other
causes, or withdrawal of consent are common
– Up to 50% in long term cohorts
• Include or exclude from analysis?
• Simple incidence measures exclude them
• Need to allow variable length of follow up
– And count people who enter after the first year
Incidence Density (ID)
• Counts person-time (person-years/months)
• Starts count when person enters cohort
• Each year of follow-up added up
Patient   Exposed   Enter in year   Stop in year   Years of FU   Disease occurrence
1         YES       1               3              2             NO
2         YES       3               12             10            YES
3         NO        1               8              8             NO
4         NO        2               11             10            YES
ID in Exposed = 1 event in 12 person years
ID in Unexposed = 1 event in 18 person years
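The person-time bookkeeping can be sketched in Python as follows. This is an illustration only: the record layout and function name are my own, and the follow-up years are taken from the "Years of FU" column as given on the slide.

```python
# One record per patient, using the "Years of FU" column from the slide
patients = [
    {"id": 1, "exposed": True,  "years_fu": 2,  "disease": False},
    {"id": 2, "exposed": True,  "years_fu": 10, "disease": True},
    {"id": 3, "exposed": False, "years_fu": 8,  "disease": False},
    {"id": 4, "exposed": False, "years_fu": 10, "disease": True},
]


def events_and_person_years(records, exposed):
    """Sum events and follow-up time for one exposure group."""
    group = [r for r in records if r["exposed"] == exposed]
    events = sum(1 for r in group if r["disease"])
    person_years = sum(r["years_fu"] for r in group)
    return events, person_years


exposed_events, exposed_py = events_and_person_years(patients, True)
unexposed_events, unexposed_py = events_and_person_years(patients, False)

print(f"ID exposed   = {exposed_events} event(s) in {exposed_py} person-years")
print(f"ID unexposed = {unexposed_events} event(s) in {unexposed_py} person-years")
```

Each person contributes only the time actually observed, so late entrants and early drop-outs are handled without excluding anyone.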
Cohort studies – Measure of Association:
Risk Ratios, or Incidence rate ratios
• Summary measure of association in Cohort Studies
• Formula for Incidence rate ratio (IRR) =
(Incidence of disease in persons with exposure) / (Incidence of disease in persons without exposure)
or
(Ndisease/Nexposed per unit time) / (Ndisease/Nunexposed per unit time)
* Note – in IRR there is no unit of time. This assumes the
amount of time was similar for those with and without disease
and those exposed and unexposed
Calculation of Risk Ratio - example
• Cohort at inception: 1,000 people without diabetes
– Prevalence of obesity at inception = 22.7%
• Outcome: Incidence of diabetes in a population
• Exposure - obesity at inception of cohort
• Follow-up - six years
• Overall incidence of diabetes = 1% per year
– Cumulative Incidence = 6%
– Risk = cumulative incidence
Risk Ratio Calculation - Example
             Number with exposure   Developed Diabetes   Cumulative Incidence
Obese               227                    27                  27/227
Non Obese           773                    33                  33/773
Total             1,000                    60
Ratio of Incidence = risk ratio = (27/227) / (33/773)
= 11.9% / 4.3%
= 2.8 (about 3.0 when the percentages are rounded to 12% and 4%)
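The risk ratio calculation can be verified with a short Python sketch. The counts are from the example table; the function name is my own. Computed without rounding the percentages first, the ratio comes out near 2.8, which the slide's rounded figures (12% and 4%) turn into 3.0.

```python
def cumulative_incidence(cases, n_at_start):
    """Risk = cumulative incidence over the whole follow-up period."""
    return cases / n_at_start


risk_obese = cumulative_incidence(27, 227)
risk_non_obese = cumulative_incidence(33, 773)
risk_overall = cumulative_incidence(27 + 33, 227 + 773)

risk_ratio = risk_obese / risk_non_obese

print(f"risk in obese     = {risk_obese:.1%}")
print(f"risk in non-obese = {risk_non_obese:.1%}")
print(f"overall risk      = {risk_overall:.1%}")
print(f"risk ratio        = {risk_ratio:.2f}")
```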
Incidence Density Ratio
Patient   Exposed   Follow up Years   Disease
1         YES       2                 NO
2         YES       10                YES
3         NO        8                 NO
4         NO        10                YES
• Incidence rate ratio = (1/2) / (1/2) = 1
• Density method = [(0 + 1) events / (2 + 10) years] / [(0 + 1) events / (8 + 10) years]
• Incidence density ratio = (1/12) / (1/18) = 1.5
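The contrast between the two ratios can be made explicit in code. The Python sketch below is illustrative: the tuple layout (events, persons, person-years) and the variable names are my own. It shows why the simple ratio hides the difference in follow-up time.

```python
# Summaries per exposure group: (events, persons, person_years), from the table
exposed = (1, 2, 12)
unexposed = (1, 2, 18)


def simple_incidence(group):
    """Events per person who entered the cohort (ignores follow-up time)."""
    events, persons, _ = group
    return events / persons


def incidence_density(group):
    """Events per person-year of follow-up."""
    events, _, person_years = group
    return events / person_years


simple_ratio = simple_incidence(exposed) / simple_incidence(unexposed)
density_ratio = incidence_density(exposed) / incidence_density(unexposed)

print(f"simple incidence ratio  = {simple_ratio:.1f}")
print(f"incidence density ratio = {density_ratio:.1f}")
```

Both groups had one event among two people, but the exposed group accumulated less follow-up time, so its rate per person-year is higher.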
Incidence Rate Difference
• A patient asks “How much will my risk of heart attack go
down if I take this new drug (B), instead of old one (A)?”
• Answer using incidence rate difference
Incidence with Drug A - Incidence with Drug B
= 0.5%/year – 0.3%/year = 0.2%/year, or, a 40% reduction
• Same answer using Incidence rate ratio:
(Incidence with Drug B) / (Incidence with Drug A) = 0.3% / 0.5% = 0.6, or, a 40% reduction
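The two ways of answering the patient's question can be laid side by side. This is a small Python sketch with the rates from the example; the variable names are my own. It separates the absolute difference (in % per year) from the relative reduction, which is the 40% figure.

```python
# Incidence of heart attack, in % per year, from the example
rate_drug_a = 0.5  # old drug
rate_drug_b = 0.3  # new drug

# Absolute effect: incidence rate difference
rate_difference = rate_drug_a - rate_drug_b

# Relative effect: incidence rate ratio and the reduction it implies
rate_ratio = rate_drug_b / rate_drug_a
relative_reduction = 1 - rate_ratio

print(f"rate difference    = {rate_difference:.1f}% per year")
print(f"rate ratio         = {rate_ratio:.1f}")
print(f"relative reduction = {relative_reduction:.0%}")
```

The relative reduction is the same number whether it is derived from the ratio (1 - 0.6) or from the difference divided by the old rate (0.2 / 0.5).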
Attributable risk
• “How many lung cancers are due to air pollution in
Montreal?” Same as “What is attributable risk?”
• Attributable risk = IRR x Prevalence of exposure
– Increases with higher IRR
– Or if exposure more common
• Diabetes vs Silicosis and TB
– Diabetes: IRR = 3.5, Prevalence = 3%
– Silicosis: IRR = 12, Prevalence = 0.1%
– Attrib risk for Diabetes >> Attrib risk for Silicosis
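The diabetes versus silicosis comparison can be worked through numerically. The Python sketch below computes the simple product given on the slide (IRR x prevalence) and, for comparison, Levin's population attributable fraction, p(IRR - 1) / [1 + p(IRR - 1)]. Levin's formula is not in the lecture and is added here only as a commonly used alternative. The function names are my own.

```python
def slide_attributable_index(irr, prevalence):
    """The slide's simple index: IRR x prevalence of exposure."""
    return irr * prevalence


def levin_paf(irr, prevalence):
    """Levin's population attributable fraction (an addition, not from the slides)."""
    excess = prevalence * (irr - 1.0)
    return excess / (1.0 + excess)


# IRR and prevalence of exposure, from the slide
exposures = {
    "diabetes":  {"irr": 3.5,  "prevalence": 0.03},
    "silicosis": {"irr": 12.0, "prevalence": 0.001},
}

results = {}
for name, e in exposures.items():
    results[name] = {
        "index": slide_attributable_index(e["irr"], e["prevalence"]),
        "paf": levin_paf(e["irr"], e["prevalence"]),
    }
    print(f"{name}: IRR x prevalence = {results[name]['index']:.3f}, "
          f"Levin PAF = {results[name]['paf']:.1%}")
```

Both measures give the same ordering: the weaker but far more common exposure accounts for more TB cases in the population.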
Cohort Studies – Survival Analysis
• Analysis of time to event
• Accounts for variable length of follow up.
• Advantage if time to event affected by exposure.
• Can find important differences in treatments even
if overall survival is the same:
– Cancer treatment A increases survival at two years
– But five year mortality is same as treatment B.
– Treatment A - preferred by most patients!
Important differences found using Survival
analysis
Types of Survival Analysis
• Simplest – Direct
• Kaplan-Meier – still pretty simple. Calculates
cumulative proportion free of outcome (survived) at
each point in time when that outcome occurs.
People who drop out or die of other causes are
‘censored’. At each point the numerator is all who
develop the outcome at that time, while the denominator
is all still under follow-up without the outcome just
before that time; the proportion surviving the interval is
one minus this fraction
• Cox regression analysis – multivariate analysis with
same basic principles
Kaplan Meier survival analysis - example
Time        Number     During interval        Surviving   Proportion surviving
            at start   Drop-outs   Deaths     at end      Interval   Cumulative
0           100        0           0          100         1.0        1.0
3 months    100        10          0          90          1.0        1.0
6 months    90         10          10         70          0.88       0.88
10 months   70         0           10         60          0.86       0.75
12 months   60         10          10         40          0.8        0.6
18 months   40         10          0          30          1.0        0.6
Notes: Intervals are variable – defined by when subjects die
Proportion surviving interval – excludes drop-outs during the interval (censored)
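The table can be reproduced with a short product-limit calculation. The Python sketch below follows the convention stated in the notes, removing drop-outs in an interval from the denominator before computing that interval's survival. Statistical packages may order censoring and deaths differently at tied times, so treat this as an illustration of the slide's arithmetic rather than a reference implementation. The function name and tuple layout are my own.

```python
# (time label, number at start, drop-outs during interval, deaths during interval)
intervals = [
    ("0",         100,  0,  0),
    ("3 months",  100, 10,  0),
    ("6 months",   90, 10, 10),
    ("10 months",  70,  0, 10),
    ("12 months",  60, 10, 10),
    ("18 months",  40, 10,  0),
]


def kaplan_meier(rows):
    """Return (label, interval survival, cumulative survival) for each row."""
    cumulative = 1.0
    results = []
    for label, at_start, dropouts, deaths in rows:
        at_risk = at_start - dropouts              # censored people leave the denominator
        interval_survival = (at_risk - deaths) / at_risk
        cumulative *= interval_survival            # product-limit step
        results.append((label, interval_survival, cumulative))
    return results


km = kaplan_meier(intervals)
for label, interval_survival, cumulative in km:
    print(f"{label:>10}: interval = {interval_survival:.2f}, cumulative = {cumulative:.2f}")
```

Cumulative survival only changes at intervals where deaths occur; intervals with drop-outs alone shrink the number at risk without moving the curve.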
Kaplan Meier survival analysis - example
[Figure: survival curve; y-axis: % Surviving, 50% to 100%; x-axis: time, 0 to 18]
Example of Kaplan-Meier analysis: General
Hospital Ventilation and time to TST conversion