Observational Studies Developed through the APTR Initiative to Enhance Prevention and Population Health Education in collaboration with the Brody School of Medicine.
Developed through the APTR Initiative to Enhance Prevention and Population Health Education in collaboration with the Brody School of Medicine at East Carolina University, with funding from the Centers for Disease Control and Prevention.

APTR wishes to acknowledge the following individual who developed this module:
Jeffrey Bethel, PhD
Department of Public Health
Brody School of Medicine at East Carolina University

This education module is made possible through the Centers for Disease Control and Prevention (CDC) and the Association for Prevention Teaching and Research (APTR) Cooperative Agreement, No. 5U50CD300860. The module represents the opinions of the author(s) and does not necessarily represent the views of the Centers for Disease Control and Prevention or the Association for Prevention Teaching and Research.

1. Recognize criteria for initiating various observational studies
2. Identify design components of various observational studies
3. Calculate and interpret outcome measures in various observational studies
4. Recognize advantages and disadvantages to various observational studies

Images obtained from: http://tobacco.stanford.edu

Experimental studies (experimental)
Cohort studies (observational)
Case-control studies (observational)
Cross-sectional studies (observational)

Observational studies:
• Used to study a wider range of exposures than experimental studies
• “Natural” experiments
• Address many questions that are not feasible to study experimentally

Etiology
• What are risk factors for developing disease?
• Includes study of behaviors, occupational or environmental factors

Prognosis
• What factors predict mortality or disability?
• What elements of care predict other health-related outcomes (e.g. quality of life)?

Smith, AH. The Epidemiologic Research Sequence.
1984

Cohort: derived from Latin cohors = warriors; a group of persons proceeding together in time
“Exposed” and “unexposed” are selected by the investigators to be followed longitudinally over time to observe the difference in the incidence of the outcome
Also called incidence or follow-up studies

[Diagram: cohort study design. STUDY POPULATION, NON-RANDOM ASSIGNMENT to EXPOSED and UNEXPOSED; each group either develops disease or does not develop disease]
[Diagram: the same design with time labels. Study population: Present; exposure groups: Future; outcomes: Future]
[Diagram: the same design with time labels. Study population: Past; exposure groups: Past; outcomes: Past]

Because of the large sample size, long follow-up time, and high cost usually required, cohort studies are usually initiated when:
• Sufficient evidence has been obtained from less expensive studies to indicate an association of disease with certain exposure(s)
• A new agent requires monitoring for possible association with several diseases (e.g. oral contraceptives or hormone replacement therapy)

• Select groups based on exposure status (exposed and unexposed), follow through time, and assess outcome
• Select defined population (e.g.
occupation, geographic area) before exposure, follow through time to separate by exposure status, and assess outcome

Framingham: http://www.framinghamheartstudy.org/
Nurses' Health Study: http://www.channing.harvard.edu/nhs/
Women’s Health Initiative: http://www.nhlbi.nih.gov/whi/
Health Professionals Follow-up Study: http://www.hsph.harvard.edu/hpfs/

Women’s Health Study
• ≥ 45 years
• No history of coronary heart disease, cerebrovascular disease, cancer, or other major chronic illness
• No history of side effects to any of the study medications
• Were not taking any of the following medications more than once per week: aspirin, NSAIDs, supplements of vitamin A, E, or beta-carotene
• Were not taking anticoagulants or corticosteroids

Potential sources
• Occupational cohorts: ease of identification and adequate number exposed (e.g. Nurses' Health Study)
• Prepaid health plan members: ease of identification and health records
• Schools, military: ease of identification and follow-up

• Questionnaires
• Laboratory tests
• Physical measurements
• Special procedures
• Existing records

Compare the outcome for the “exposed” group to the outcome in a “substitute” population
• Substitute population represents the “exposed group without the exposure”
• Validity of inference depends on finding a valid substitute population

From the same sample as the exposed but without the exposure
• Strengths: most comparable to exposed group
• Weaknesses: may be difficult to identify; a similar population probably has similar exposures

General population, other occupation
• Strengths: accessible, stable data
• Weaknesses: lack of comparability with exposed group; results may suffer from healthy worker effect; data on key variables may be missing

Sources of information include:
• Death certificates (if fatal)
• Hospital records (if hospitalization required)
• Disease registries (e.g. cancer, birth defects)
• Physicians' records
• Physical exam (e.g. Framingham)
• Laboratory tests (e.g.
infectious diseases)
• Questionnaires (if physical not required)

Large prospective cohort study providing longitudinal data on cardiovascular disease
Recruits residents of Framingham, Massachusetts in whom potential cardiovascular risk factors were first measured nearly 50 years ago
• Incidence of coronary heart disease (CHD) increases with age and occurs earlier and more frequently in males
• Persons with hypertension develop CHD at a greater rate than those who are normotensive
• Elevated blood cholesterol level is associated with an increased risk of CHD
• Tobacco smoking and habitual use of alcohol are associated with an increased incidence of CHD

Relative Risk = (Incidence in Exposed) / (Incidence in Unexposed)
Measure of association used for deriving a causal inference

              Develops Disease   Does Not Develop Disease   Totals   Incidence of Disease
Exposed       a                  b                          a+b      a/(a+b)
Not Exposed   c                  d                          c+d      c/(c+d)

Relative Risk = (Incidence in Exposed) / (Incidence in Unexposed) = [a/(a+b)] / [c/(c+d)]

If Relative Risk = 1, exposure is NOT associated with disease
If Relative Risk > 1, exposure is associated with an increased risk of disease
If Relative Risk < 1, exposure is associated with a decreased risk of disease (i.e. is protective)

            CHD   No CHD   Totals   CHD Incidence (per 1,000)
Smoke       84    2,916    3,000    (84/3,000) x 1,000 = 28.0
No Smoke    87    4,913    5,000    (87/5,000) x 1,000 = 17.4

Relative Risk = (Incidence in Exposed) / (Incidence in Unexposed) = 28.0 / 17.4 = 1.61

• Life table
• Kaplan-Meier plot
• Incidence proportion
• Hazard ratio
• Multiple logistic regression

A cohort study of smoking and bladder cancer was conducted in a small island population. There were a total of 1,000 people on the island. Four hundred were smokers and 600 were not. Fifty of the smokers developed bladder cancer. Fifteen of the non-smokers developed bladder cancer.
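As a rough illustration of the relative risk arithmetic in the smoking and CHD table above, the following Python sketch computes incidence and relative risk from the four cells of a 2 x 2 cohort table. The function names `incidence` and `relative_risk` are illustrative choices and not part of the module; the cell labels a, b, c, d follow the module's table.

```python
def incidence(cases, total):
    """Cumulative incidence: new cases divided by persons at risk."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total


def relative_risk(a, b, c, d):
    """Relative risk from a 2 x 2 cohort table.

    a = exposed, develops disease      b = exposed, does not develop disease
    c = unexposed, develops disease    d = unexposed, does not develop disease
    """
    unexposed_incidence = incidence(c, c + d)
    if unexposed_incidence == 0:
        # With no cases among the unexposed, the ratio is undefined.
        raise ValueError("incidence in unexposed is zero")
    return incidence(a, a + b) / unexposed_incidence


# Smoking and CHD counts from the table above.
rr_chd = relative_risk(a=84, b=2916, c=87, d=4913)

print(f"{1000 * incidence(84, 3000):.1f}")  # 28.0 per 1,000 smokers
print(f"{1000 * incidence(87, 5000):.1f}")  # 17.4 per 1,000 non-smokers
print(f"{rr_chd:.2f}")                      # 1.61
```

The same function can be applied to the island exercise by passing its four cell counts.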
Calculate and interpret relative risk (RR)

            Bladder Cancer   No Bladder Cancer   Totals   Incidence of Bladder Cancer
Smoke       50               350                 400      0.125
No Smoke    15               585                 600      0.025

Relative Risk = (Incidence in Exposed) / (Incidence in Unexposed) = 0.125 / 0.025 = 5.0
Relative Risk = [a/(a+b)] / [c/(c+d)] = (50/400) / (15/600) = 5.0
Interpretation: Incidence of bladder cancer is 5 times as great in smokers as in nonsmokers

Disadvantages of cohort studies:
• Inefficient for evaluation of rare diseases
• If outcome has long latent period, study can take a long time
• Generally more expensive
• If retrospective, requires availability of records
• Validity of results can be seriously affected by losses to follow-up

Advantages of cohort studies:
• Useful design when exposure is rare
• Examine multiple effects of a single exposure (multiple outcomes)
• If prospective, minimize bias in ascertainment of exposure
• Examine temporal relationship between exposure and disease
• Allows direct measurement of incidence of disease in exposed and unexposed
• Direct calculation of relative risk

Smith, AH. The Epidemiologic Research Sequence. 1984

[Diagram: case-control design. CASES (disease) and CONTROLS (no disease), each classified as EXPOSED or NOT EXPOSED]

Disparaging term given to case-control studies because their logic seemed backwards and they seemed more prone to bias than other designs
Case-control studies are a logical extension of cohort studies and an efficient way to learn about associations

• Little is known about the disease
• Exposure data are difficult or expensive to obtain
• Rare disease
• Disease with long induction and latent period
• Dynamic underlying population

Definition of a case
• Should lead to accurate classification of diseased and non-diseased individuals
• Homogeneous disease entity by strict diagnostic criteria, e.g.
distinguishing cancer of the corpus uteri (body of uterus) from cancer of the cervix (neck of uterus)
• Applied uniformly

Black or White women (including Hispanic women self-identifying as Black or White) aged 50–79 years, who were residents in the contiguous nine-county Philadelphia, Pennsylvania, region at the time of diagnosis and newly diagnosed with endometrial cancer between July 1, 1999, and June 30, 2002.

• All cases in a population
• Representative sample of all cases
• Disease registries: e.g. cancer, birth defects
• All hospitals in a community (for diseases requiring hospitalization)
• Particular hospital or health system
• Physician records

• Purpose is to provide information on the exposure distribution in the source population
• Controls must be identified independently of exposure status
• Controls are a sample of the population that gave rise to the cases
• Member of control group who gets the disease “would” end up as a case in the study

General population
• Used when cases are identified from well-defined population (e.g.
residents of a geographic area)
• Sources: random-digit dialing (RDD), voter registration lists, tax lists, neighborhood
• Advantage: generally more representative of non-diseased with respect to exposure
• Disadvantage: not as motivated, potentially lower data quality

Cases: active surveillance at 61 of 68 hospitals in 9 counties around Philadelphia
Controls: RDD controls were selected from the same geographic region as the cases

Hospital/Clinic
• Used when cases are identified from hospital/clinic rosters
• Advantage: easily identified, readily available, more aware of prior exposure, same selection factors as hospitalized cases
• Disadvantage: difficulty determining appropriate illness (unrelated to exposure and same referral pattern as cases)

Relatives, friends, classmates, coworkers
• Used in rare circumstances
• Advantage: motivated, readily available, less expensive, more similar neighborhood or social class, and more representative of healthy with regard to exposures
• Disadvantage: may share exposures (e.g. alcohol, occupation) with cases; cases may be unable or may not wish to nominate friends

Without randomization, cases and controls may differ in characteristics
• Individual matching (pairwise or multiple): for each case, select one (or more) controls matched on variables (e.g. age within 5 years and gender)
• Group matching (frequency matching): distribution of matching characteristic is similar in cases and controls (e.g.
if 30% of cases are women, then 30% of controls should be women)

“Random-digit-dialing controls were selected from the same geographic region as the cases, frequency matched to the cases on age (in 5-year age groups) and race (Black or White).”

• Questionnaires
• Laboratory tests
• Physical measurements
• Special procedures
• Existing records

“Telephone interviews, which averaged 60 minutes, were administered by trained lay interviewers with no knowledge of the study hypotheses.”

              Case   Control
Exposed       a      b
Not Exposed   c      d

Odds that a case was exposed = [a/(a+c)] / [c/(a+c)] = a/c
Odds that a control was exposed = [b/(b+d)] / [d/(b+d)] = b/d
Odds ratio = (a/c) / (b/d) = ad/bc

                       CHD (cases)   No CHD (controls)
Smoke (exposed)        84 (a)        2,916 (b)
No Smoke (unexposed)   87 (c)        4,913 (d)

Odds Ratio = (a/c) / (b/d) = ad/bc = (84 x 4,913) / (2,916 x 87) = 412,692 / 253,692 = 1.63
Odds that a person with CHD smoked is 1.63 times the odds that a person without CHD smoked

The odds ratio is a good estimate of the relative risk:
• When the cases studied are representative, with regard to history of exposure, of all people WITH the disease in the population from which the cases were drawn
• When the controls studied are representative, with regard to history of exposure, of all people WITHOUT the disease in the population from which the cases were drawn
• When the disease studied does not occur frequently (rare disease assumption)

Suppose that a case-control study was conducted to evaluate the relationship between artificial sweeteners (AS) and bladder cancer. 3,000 cases and 3,000 controls were enrolled in the study. Among the cases, 1,293 had used artificial sweeteners in the past, while 1,707 had never used artificial sweeteners. Of the controls, 855 had used sweeteners and 2,145 had not.
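The cross-product calculation from the smoking and CHD table above can be sketched in Python as follows. The function name `odds_ratio` is an illustrative choice and not part of the module; the cell labels follow the module's case-control table. The sketch also computes a relative risk from the same four counts, treating them as cohort data purely for comparison (an assumption made here for illustration), to show how close the two measures are when the disease is uncommon.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio from a 2 x 2 case-control table.

    a = exposed cases      b = exposed controls
    c = unexposed cases    d = unexposed controls
    """
    if b == 0 or c == 0:
        # ad/bc is undefined when either off-diagonal cell is empty.
        raise ValueError("cells b and c must be non-zero")
    return (a * d) / (b * c)


# Smoking and CHD counts from the table above.
a, b, c, d = 84, 2916, 87, 4913
or_chd = odds_ratio(a, b, c, d)

# Illustration only: relative risk from the same counts read as cohort data.
rr_chd = (a / (a + b)) / (c / (c + d))

print(f"{or_chd:.2f}")  # 1.63
print(f"{rr_chd:.2f}")  # 1.61
```

Because CHD is uncommon in both groups here (under 3%), the odds ratio of 1.63 lies close to the relative risk of 1.61, which is what the rare disease assumption describes.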
Calculate and interpret odds ratio (OR)

                    Cases   Controls
Exposed to AS       1,293   855
Not Exposed to AS   1,707   2,145
TOTAL               3,000   3,000

Odds Ratio = ad/bc = (1,293 x 2,145) / (855 x 1,707) = 1.90
Interpretation: Odds that a person with bladder cancer used artificial sweeteners are 1.90 times the odds that a person without bladder cancer used artificial sweeteners

Disadvantages of case-control studies:
• Can investigate only one disease outcome
• Inefficient for rare exposures
• Cannot directly compute incidence rates of disease in exposed and unexposed
• Temporal relationship between exposure and disease may be difficult to establish
• Vulnerable to bias because retrospective (recall bias)

Advantages of case-control studies:
• Efficient for rare diseases
• Efficient for diseases with long induction and latent periods
• Can evaluate multiple exposures in relation to a disease
• Relatively quick and inexpensive

Smith, AH. The Epidemiologic Research Sequence. 1984

Exposure status and disease status of an individual are measured at one point in time
Disease prevalence in those with and without exposures, or at different exposure levels, is compared
Useful for health planning

[Diagram: cross-sectional design. STUDY POPULATION, gather data on exposure and disease, giving four groups: Exposed, Diseased; Exposed, No Disease; Unexposed, Diseased; Unexposed, No Disease]

• Sometimes based on exposure of interest, if readily identifiable, e.g. prevalence of disease in a particular ethnic group, geographic area, or occupational group
• For relatively small numbers, the entire population may be included, or a representative sample, e.g. a community or a random sample of households

• Generally questionnaires, records, lab tests, physical measures, special procedures (e.g. air samples)
• Duration and timing of exposure important to document, if possible, to relate to onset of disease

• Determined by questionnaire (e.g. symptoms), physical exam (e.g. joints for arthritis), special procedures (e.g. x-rays, lung function)
• For diseases with exacerbations and remission (e.g.
asthma), need to ask asymptomatic persons if they had symptoms in the past
• Diagnostic criteria determined in advance and applied systematically

2 x 2 tables developed and measures calculated
• Prevalence ratio: prevalence of disease in exposed divided by prevalence of disease in unexposed
• Prevalence odds: odds that a diseased person was exposed or unexposed
• Prevalence odds ratio: ratio of prevalence odds in exposed to prevalence odds in unexposed

HIV infection and intravenous drug use (IVDU) among women in New York State Prison System

           HIV +   HIV -   Totals
IVDU +     61      75      136
IVDU -     27      312     338

Prevalence ratio = (61/136) / (27/338) = 5.61
Interpretation: IV drug users are 5.61 times as likely to be infected with HIV as non-IV drug users

Prevalence odds ratio = (61 x 312) / (75 x 27) = 9.40
Interpretation: Odds that an HIV+ person uses IV drugs are 9.4 times the odds that an HIV- person uses IV drugs

Disadvantages of cross-sectional studies:
• Lack of temporal sequence of exposure preceding disease
• Tends to include cases with long duration, which may have different characteristics and risk factors than a series of incident cases
• Potential misclassification of disease status if disease has exacerbations and remissions (e.g. asthma, multiple sclerosis, lupus) or if disease is being treated (e.g.
hypertension)

Advantages of cross-sectional studies:
• Often have reasonably good generalizability
• Data on individuals, not groups as in ecologic studies
• Often conducted in a relatively short period of time
• Less costly than cohort and case-control studies

• Observational studies are “natural experiments”
• Cohort studies explicitly incorporate passage of time
• Case-control studies are retrospective
• Uniformity in data collection is key to increased validity
• Relative risk (cohort) and odds ratio (case-control and cross-sectional) are the key measures of association

Center for Public Health Continuing Education, University at Albany School of Public Health
Department of Community & Family Medicine, Duke University School of Medicine

Mike Barry, CAE
Lorrie Basnight, MD
Nancy Bennett, MD, MS
Ruth Gaare Bernheim, JD, MPH
Amber Berrian, MPH
James Cawley, MPH, PA-C
Jack Dillenberg, DDS, MPH
Kristine Gebbie, RN, DrPH
Asim Jani, MD, MPH, FACP
Denise Koo, MD, MPH
Suzanne Lazorick, MD, MPH
Rika Maeshiro, MD, MPH
Dan Mareck, MD
Steve McCurdy, MD, MPH
Susan M. Meyer, PhD
Sallie Rixey, MD, MEd
Nawraz Shawir, MBBS

Sharon Hull, MD, MPH, President
Allison L. Lewis, Executive Director
O. Kent Nordvig, MEd, Project Representative