LECTURE 3 – June 9 2006
Cohort Studies, Selection Bias
Survival analysis
Dr. Dick Menzies

Cohort Studies – General
• Prospective study: Incidence of new disease in persons who start without disease.
  – Follow-up period – weeks, months, years
  – One or more diseases can be measured
• Measure exposures – at start or ongoing.
  – Can measure multiple exposures
• Compare incidence in exposed vs unexposed groups within population – per unit of time

Advantages of cohort over case-control or cross-sectional designs
• KEY – exposure measurement is made before disease occurs
  – Exposure more accurate – prospective, and often repeated
  – Eliminates bias in measurement of exposures:
    • Recall bias of patients, or observer bias in exposure assessment, with knowledge of disease status.

Experimental vs cohort studies
• Expt studies are a form of cohort study
  – Same – Persons are free of disease at outset
  – But – Exposure is RANDOMLY ASSIGNED to some/not others
  – Same – Measure outcomes after exposure
• Cohort study – exposures NOT assigned, but occur naturally, or are chosen purposely by subjects, or by their MD’s, etc

Advantages of cohort studies over experimental
• Ideal to study natural history, course of disease, prognostic factors.
• Etiologic research for exposures that can not be given experimentally, for ethical reasons
  – Smoking, asbestos, air pollution
• Interventions not feasible for randomization
  – Diagnostic tests, complex care management
• Some outcomes not well measured in trials:
  – Compliance by patients and MD’s

Advantages of cohort studies over experimental (cont’d)
• Total population studied.
  – Children, elderly, pregnancy, mentally incompetent
• Full spectrum of illness
  – From patients in ICU to minimal forms of disease
  – Often excluded in RCT – esp. Pharma trials
• Findings more likely to be applicable in real world
  – Adverse events often more accurately measured
• Population based estimates of exposure effects
• BUT you MUST include as full a spectrum of patients as possible (no exclusions in observational studies)

Disadvantages
• Selection bias
  – Persons who get exposed not same as unexposed
  – Surgery – who is ‘operable’ vs ‘inoperable’
• Exposures that seem same, are not
  – Potential bias in measuring
• Drop-outs – reduce power, may bias (a lot)
• Outcome assessment can be biased

Cohort Designs
• Prospective: Subjects without disease followed to determine incidence of diseases
  – Exposures measured at baseline, and/or concurrently.
  – Disease – measured during follow-up
• Retrospective: Subjects first identified based on past exposures (Hiroshima survivors, work-force)
  – Outcomes may then be ascertained directly, or may have already occurred
  – Key – exposure well defined, AND occurred well before disease (useful for diseases like cancer)

Cohort Populations
• General populations – no special exposures
  – Framingham study – a true general population
    • All persons in the community invited
• Proxy general pop’n – Nurses, Military, Company
  – Exposures studied are those of general pop’n.
  – Diet, exercise, smoking, alcohol
• Exposure defined cohort
  – Work-force to study occupational exposures
  – Group of patients who received certain therapy

Cohorts of patients
• Clinical cohorts – patients with a given condition
  – Case series can be form of cohort study
  – But – must have differences in ‘exposure’
    • Different types, severity, causes
• Potential problems in cohort studies with patients:
  – Referral bias – only sickest, rarest
  – Lead-time bias – better facilities = earlier Dx
  – Multi-serial cohorts
    • Cohort starts with all diabetics in 2004
    • New, and old = very different patients

Open versus Closed Cohorts
• An open cohort – or dynamic cohort – is one where people can enter or leave
  – Examples: A workforce study that is ongoing
  – A city or other geographic location
• A closed cohort is where all persons in the cohort are defined at entry. No one enters, members can only exit.
  – Eg. McGill medical school class of 2004

Selection Bias
• Definition – selection bias occurs when there is a distortion in the estimate of effect (association) because the study or sample population is not truly representative of the underlying population in terms of the distribution of exposures and/or outcomes.
• Other terms: referral bias, volunteer bias, healthy worker effect, susceptibility bias, drop-out bias
• How/where in a study can this occur?

[Flow diagram: GROUPS, the LOSSES between them, and the REASONS FOR LOSSES]
INTENDED POPULATION
  – Loss: NOT AVAILABLE – Treated at other hospitals or by other doctors
AVAILABLE GROUP
  – Loss: NOT CANDIDATES – Not identified or accessible
CANDIDATE GROUP
  – Loss: NOT ELIGIBLE – Did not fulfill diagnostic criteria
ELIGIBLE GROUP
  – Loss: EXCLUDED – Superimposed condition of severity, co-morbidity, comedication, or non-compliance
QUALIFIED GROUP
  – Loss: NONRECEPTIVE – Refused participation or acceptance of assigned maneuver
ADMITTED GROUP
Figure 15-2.
Diagram showing successive transfers from the intended population to the group admitted to a study of therapy

Obtaining a representative sample
• In a representative sample we hope for a sample that shows us the true underlying distribution of exposure and disease:

Truth – distribution of exposure and disease in source population
                Exposed   Not Exposed
Diseased        A         B
Not Diseased    C         D
• Odds Ratio = (A/B) / (C/D) = (A x D) / (B x C)

Un-biased Sampling
                Exposed   Not Exposed
Diseased        P1A       P2B
Not Diseased    P3C       P4D
• Odds Ratio = [(P1 x P4) / (P2 x P3)] x [(A x D) / (B x C)]
• IF (P1 x P4) / (P2 x P3) = 1 THEN OR = (A x D) / (B x C) = Truth!

Biased Sampling
                Exposed   Not Exposed
Diseased        P1A       P2B
Not Diseased    P3C       P4D
• If sample all of A (P1 = 1) but only half of B (P2 = 0.5)
• And 1/3 of C and D (P3 = 0.33, P4 = 0.33)
• Odds Ratio = [(P1 x P4) / (P2 x P3)] x [(A x D) / (B x C)] = [(1 x 0.33) / (0.5 x 0.33)] x [(A x D) / (B x C)] = 2 x (A x D) / (B x C)
• IF (P1 x P4) / (P2 x P3) = 2 THEN OR estimated = 2 x OR true

Example – Biased sampling
• We are planning a case-control study of spicy foods and peptic ulcer disease
  – Cases = endoscopy-proven peptic ulcer disease
  – Controls = elective inguinal hernia repair at the same hospital
• The truth: no relationship, i.e. the odds ratio = 1
• The problem – physicians at this hospital strongly believe spicy foods are an important risk factor for peptic ulcer disease.
  – Therefore they tend to refer patients for endoscopy more often if they had a diet of spicy foods

Biased sampling (cont’d)
                  Spicy Foods   No Spicy Foods   Odds
TRUTH
  Cases           25            75               25/75
  Controls        25            75               25/75
  Total           50            150              Odds ratio = 1.0
Biased sample
  Cases           25            37.5             25/37.5
  Controls        25            75               25/75
  Total           50            112.5            Odds ratio = 2.0

Example: biased sampling
• So, 100% of patients with peptic ulcer disease AND history of spicy foods have endoscopy
• But only half of those with peptic ulcer, but WITHOUT history of spicy food, are in fact diagnosed
  – (they do not have endoscopy, so they are missed)
• Estimated association will be twice what is correct.
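
The sampling-fraction argument above can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the function names (odds_ratio, bias_factor, sampled_odds_ratio) are illustrative assumptions, and the counts are those of the spicy-food example, where only the diseased/not-exposed cell is under-sampled (P2 = 0.5).

```python
from fractions import Fraction


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as in the slides:
    a = diseased/exposed,     b = diseased/not exposed,
    c = not diseased/exposed, d = not diseased/not exposed."""
    return Fraction(a * d) / Fraction(b * c)


def bias_factor(p1, p2, p3, p4):
    """Multiplier applied to the true odds ratio: (P1 x P4) / (P2 x P3)."""
    return Fraction(p1 * p4) / Fraction(p2 * p3)


def sampled_odds_ratio(a, b, c, d, p1, p2, p3, p4):
    """Odds ratio observed when each cell is sampled with its own fraction."""
    return odds_ratio(p1 * a, p2 * b, p3 * c, p4 * d)


half = Fraction(1, 2)
third = Fraction(1, 3)

# Spicy-food example: truth is 25/75 among cases and 25/75 among controls.
true_or = odds_ratio(25, 75, 25, 75)
# Only half of the ulcer patients without spicy food are diagnosed (P2 = 0.5).
biased_or = sampled_odds_ratio(25, 75, 25, 75, 1, half, 1, 1)
# Sampling fractions from the "Biased Sampling" slide: 1, 0.5, 1/3, 1/3.
slide_factor = bias_factor(1, half, third, third)

print("true OR:", float(true_or))
print("biased OR:", float(biased_or))
print("bias factor on slide:", float(slide_factor))
```

Exact fractions are used so that the 1/3 sampling fractions cancel cleanly; with floats the same calculation would be subject to rounding.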
To achieve Un-biased Sampling
• To achieve un-biased sampling the easiest is:
• P1 = P2 = P3 = P4
• This means the proportion sampled from each group is the same, e.g., 10% are sampled from each of the groups
• However if P1 is higher than P2 this can be okay as long as P3 is higher than P4 by the same factor, so that (P1 x P4) / (P2 x P3) still equals 1

Volunteer Bias
• Participants in a study are different from refuseniks
  – Mortality of non-participants in the Framingham study
• Subjects with exposure and the outcome are more (or less) likely to participate
  – Eg HIV infection and homosexuality – in Africa
  – Disease and occupational exposures, particularly for self-reported exposures, and compensable illnesses.

Susceptibility bias
• Persons allocated to one form of treatment, or who self-select to certain exposures, are more, or less, susceptible to develop health outcomes of interest.
  – Eg Cancer patients who have surgery vs medical or radiotherapy only. Surgical patients often appear to do better.

Healthy worker effect
• An important bias – found in work-force studies
  – Reflects medical screening (military, mining)
  – Or, physical requirements of job
• Results in better health status initially than general population, or certain control pop’n
  – Strongly affects results in cross-sectional studies
  – Reduces risk or delays occurrence of health outcomes of interest.
• Also occurs in smokers: “healthy smoker effect”
  – Lung function in adolescent smokers > non-smokers

[Figure: Example of healthy smoker effect]

Selection Bias in Cohort Studies – Dropouts
• Losses to follow up occur in all cohort studies
• Reduce power, and dilute results
• Problematic if more drop-outs in one exposure group
• REALLY important if drop-out is due to development of disease

Selection Bias in Cohort Studies – Dropouts (cont’d)
• Example:
  – Study of incidence of diabetes in obese persons.
  – Truth: IRR = 3.0
  – Losses – 33% in diabetes/obesity group (death/other)
    • 5% losses in all other groups
  – (P1 x P4) / (P2 x P3) does not = 1

Selection Bias from Dropouts – Example
             At onset   Dropped out:   Dropped out:   Detected at end
                        No DM          Diabetes       with diabetes
Obese        227        10             9              18
Not Obese    773        35             3              30
• Incidence (biased):
  • In obese – 18/208 = 8.7%
  • In non-obese – 30/735 = 4.1%
• Biased incidence rate ratio – 8.7%/4.1% = 2.1

Drop-outs from a work-force – impact
• An occupational exposure causes health effects quickly in a susceptible sub-group.
  – They leave the work-force (quit) quickly.
  – Examples:
    • Allergy to lab animals in researchers
    • Asthma in grain workers
• Cross-sectional studies – no susceptibles left
• Cohorts
  – Can miss when setting up cohort.
  – Outcomes occur in small number of new workers (power problem)

Controlling Selection Bias
• Control in design – Most important is prevention
  – Recruitment – high % in all groups
  – Same % recruitment in exposed/not exposed
  – Close follow-up to prevent dropouts
• Assess in analysis
  – Compare participants to non-participants
    • Sub-groups of non-participants
  – Compare dropouts with those who remained
  – Sensitivity analysis – best case/worst case to assess impact of selection biases

Cohort Studies – Exposure Assessments
• Prospective – Measure one or more exposures at start
  – Specific: cholesterol, obesity, smoking, blood pressure.
  – Proxies: occupation, housing
  – Measure once, or repeatedly to account for changes in exposure over time (obesity, smoking, BP).
• Retrospective – Exposure based upon past events
  – These are rarely quantified
    • Proxies used (job description, distance from blast)
    • Sometimes records (transfusions, dust levels)

Pitfalls in exposure assessments
• Observer bias – disease ascertained at same time
  – Blind observers to study hypothesis
  – Standardized protocols
• Are all exposures the same?
  – Complications of pleural tap at MGH/RVH >> MCI
• Did you forget something?
  – Hard to go back to the start of cohort
  – Measure everything, freeze the rest
  – Add measures as new things reported

Cohort Studies – Outcome Assessments
• Baseline – ensure cohort members free of disease.
  – Easy if prospective, harder if retrospective
• Outcomes measured periodically
  – Through questionnaire, exam, labs (direct)
  – Through health service utilization (databases)
  – Through vital statistics (databases)
• Case definition key for outcome assessments
  – Diagnosis of milder disease common problem

Pitfalls in outcome assessments
• Ascertainment bias – if patients with Factor X are more likely to have testing to detect outcome.
  – Standardized protocols, blinding to exposures
• Observer bias – patients with Factor X more likely to be diagnosed with outcome of interest
  – Common with more subjective tests – eg CXR
  – Solution – independent reviewers, blinded to exposure status (Factor X)
• Lead time bias – earlier diagnosis makes survival look better

[Figure: Lead-time bias – example]

Cohort Studies – Measures of Incidence
• Incidence rate (simplest) = (number developing disease) / (total number who entered cohort), per unit of time
• Cumulative incidence = (number developing disease) / (total number who entered cohort), over total follow-up period

Measuring Incidence in Cohort Studies
How to handle drop-outs etc.?
• Drop-outs from loss to follow-up, death from other causes, or withdrawal of consent are common
  – Up to 50% in long term cohorts
• Include or exclude from analysis?
• Simple incidence measures – excludes
• Need to allow variable length of follow up
  – And count people who enter after the first year

Incidence Density (ID)
• Counts person-time (person-years/months)
• Starts count when person enters cohort
• Each year of follow-up added up

Patient   Exposed   Enter in year   Stop in year   Years of FU   Disease occurrence
1         YES       1               3              2             NO
2         YES       3               12             10            YES
3         NO        1               8              8             NO
4         NO        2               11             10            YES

ID in Exposed = 1 event in 12 person years
ID in Unexposed = 1 event in 18 person years

Cohort studies – Measure of Association: Risk Ratios, or Incidence rate ratios
• Summary measure of association in Cohort Studies
• Formula for Incidence rate ratio (IRR) = (Incidence of disease in persons with exposure) / (Incidence of disease in persons without exposure)
  or = (Ndisease/Nexposed per unit time) / (Ndisease/Nunexposed per unit time)
* Note – in IRR there is no unit of time. This assumes the amount of time was similar for those with and without disease and those exposed and unexposed

Calculation of Risk Ratio – example
• Cohort at inception: 1,000 people without diabetes
  – Prevalence of obesity at inception = 22.7%
• Outcome: Incidence of diabetes in a population
• Exposure – obesity at inception of cohort
• Follow-up – six years
• Overall incidence of diabetes = 1% per year
  – Cumulative Incidence = 6%
  – Risk = cumulative incidence

Risk Ratio Calculation – Example
            Number with exposure   Developed Diabetes   Cumulative Incidence rate
Obese       227                    27                   27/227
Non Obese   773                    33                   33/773
Total       1,000                  60
Ratio of Incidence = risk ratio = (27/227) / (33/773) = 11.9% / 4.3% = 2.8, or 12% / 4% = 3.0 when the two incidences are rounded first

Incidence Density Ratio
Patient   Exposed   Follow up Years   Disease
1         YES       2                 NO
2         YES       10                YES
3         NO        8                 NO
4         NO        10                YES
• Incidence rate ratio = (1/2) / (1/2) = 1
• Density method = [(0 + 1) events / (2 + 10) years] / [(0 + 1) events / (8 + 10) years]
• Incidence density ratio = (1/12) / (1/18) = 1.5

Incidence Rate Difference
• A patient asks “How much will my risk of heart attack go down if I take this new drug (B), instead of old one (A)?”
• Answer using incidence rate difference: Incidence with Drug A - Incidence with Drug B = 0.5%/year - 0.3%/year = 0.2%/year, or, a 40% reduction
• Same answer using Incidence rate ratio: (Incidence with Drug B) / (Incidence with Drug A) = 0.3% / 0.5% = 0.6, or, a 40% reduction

Attributable risk
• “How many lung cancers are due to air pollution in Montreal?” Same as “What is attributable risk?”
• Attributable risk = IRR x Prevalence of exposure
  – Increases with higher IRR
  – Or if exposure more common
  – This is a rule of thumb; the usual population attributable fraction is P x (IRR - 1) / [1 + P x (IRR - 1)], where P = prevalence of exposure
• Diabetes vs Silicosis and TB
  – Diabetes: IRR = 3.5, Prevalence = 3%
  – Silicosis: IRR = 12, Prevalence = 0.1%
  – Attrib risk for Diabetes >> Silicosis

Cohort Studies – Survival Analysis
• Analysis of time to event
• Accounts for variable length of follow up.
• Advantage if time to event affected by exposure.
• Can find important differences in treatments even if overall survival is the same:
  – Cancer treatment A increases survival at two years
  – But five year mortality is same as treatment B.
  – Treatment A – preferred by most patients!

[Figure: Important differences found using Survival analysis]

Types of Survival Analysis
• Simplest – Direct
• Kaplan-Meier – still pretty simple. Calculates cumulative proportion free of outcome (survived) at each point in time when that outcome occurs. People who drop out or die of other causes are ‘censored’.
At each point numerator is all who have developed disease, while denominator is all without outcome in the interval just before
• Cox regression analysis – multivariate analysis with same basic principles

Kaplan Meier survival analysis – example
Time        Number     During interval       Surviving   Proportion surviving
            at start   Drop-outs   Deaths    at end      Interval   Cumulative
0           100        0           0         100         1.0        1.0
3 months    100        10          0         90          1.0        1.0
6 months    90         10          10        70          0.88       0.88
10 months   70         0           10        60          0.86       0.75
12 months   60         10          10        40          0.8        0.6
18 months   40         10          0         30          1.0        0.6
Notes:
• Intervals are variable – defined by when subjects die
• Proportion surviving interval – excludes drop-outs during the interval (censored)

Kaplan Meier survival analysis – example
[Figure: survival curve for the table above; y-axis % Surviving (50% to 100%), x-axis time (0 to 18)]

Example of Kaplan-Meier analysis: General Hospital Ventilation and time to TST conversion
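
The interval and cumulative proportions in the Kaplan-Meier example table can be recomputed with a few lines of Python. This is a minimal sketch, not the lecturer's code: the function name kaplan_meier_table and the tuple layout are assumptions made for illustration. It follows the convention stated in the table notes, where drop-outs during an interval are removed from the denominator before the deaths are counted; statistical packages may order censoring and events within an interval differently.

```python
def kaplan_meier_table(rows):
    """rows: list of (time_label, number_at_start, drop_outs, deaths).

    Returns a list of (time_label, interval_survival, cumulative_survival),
    removing drop-outs (censored subjects) from the denominator of each
    interval, as in the notes under the example table.
    """
    cumulative = 1.0
    results = []
    for label, n_start, drop_outs, deaths in rows:
        at_risk = n_start - drop_outs            # censored subjects excluded
        interval = (at_risk - deaths) / at_risk  # proportion surviving the interval
        cumulative *= interval                   # product of interval proportions
        results.append((label, interval, cumulative))
    return results


# Data copied from the example table: time, number at start, drop-outs, deaths.
example = [
    ("0", 100, 0, 0),
    ("3 months", 100, 10, 0),
    ("6 months", 90, 10, 10),
    ("10 months", 70, 0, 10),
    ("12 months", 60, 10, 10),
    ("18 months", 40, 10, 0),
]

km = kaplan_meier_table(example)
for label, interval, cumulative in km:
    print(f"{label:>10}  interval={interval:.2f}  cumulative={cumulative:.2f}")
```

The cumulative column is simply the running product of the interval proportions, which is why survival stays at 0.6 in the last interval even though ten more subjects drop out.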