I is for Investigation
Outbreak Investigation Methods from Mystery to Mastery
Session V: Analyzing Data
Session Overview
• Analysis planning and data cleaning
• Attack rates
• Hypothesis testing in analytic epidemiology
  – Measures of association
  – Tests of significance

Learning Objectives
• Understand the purpose of an analytic study in an epidemiologic outbreak investigation
• Generate measures of association for cohort and case-control studies
• Interpret measures of association (risk ratios, odds ratios) and corresponding confidence intervals
• Interpret a statistical test of significance

Basic Steps of an Outbreak Investigation
1. Verify the diagnosis and confirm the outbreak
2. Define a case and conduct case finding
3. Tabulate and orient data: time, place, person
4. Take immediate control measures
5. Formulate and test hypotheses
6. Plan and execute studies
7. Implement and evaluate control measures
8. Communicate findings

Analysis Planning and Data Cleaning

Purpose of Analysis Planning
• Tailor questionnaire to collect necessary data in the correct format
• Select the most appropriate epidemiologic methods
• Use data efficiently in analytic software

Factors Influencing Data Analysis
• Research question
• Exposure and outcome variables
• Study design
• Sampling

Steps for Analysis Planning
1. Work backwards from the research question(s) to design the most efficient data collection instrument
2. Study design will determine which statistical tests and measures of association you evaluate in the analysis output
3. Consider the need to present, graph, or map data

Analysis Planning: Step 1
1. Work backwards from the research question(s) to design the most efficient data collection instrument
  – Develop a sound data collection instrument
  – Collect pieces of information that can be counted, sorted, and recoded or stratified
The analysis phase is not the time to realize that you should have asked questions differently!

Analysis Planning: Step 2
2.
Study design will determine which statistical tools you will use
  – Use risk ratio (RR) with cohort studies and odds ratio (OR) with case-control studies
  – Match sampling methods to the required special types of analysis

Analysis Planning: Step 3
3. Consider the need to present, graph, or map data
  – Consider continuous versus categorical data
  – Collect additional data needed to map data

Data Cleaning
• Check for accuracy
  – Outliers
• Check for completeness
  – Missing values
• Get to know the descriptive findings
• Determine whether or not to create or collapse data categories

Data Cleaning: Distribution of Variables
[Figure: Illness Onset for Outbreak of Gastrointestinal Illness at a Nursing Home. Number of cases (y-axis) by day of onset (x-axis), with one case labeled "Outlier".]

Data Cleaning: Outliers
• Value falling far outside the values of the sample
  – First or last cases in an outbreak
  – Very high or very low values for a variable
• May be due to a collection, coding, or data entry error
• If not due to error, the outlier may represent:
  – Baseline level of illness / unrelated to outbreak
  – Outbreak source
  – Case exposed earlier than the others
  – Case exposed later than the others
  – Case with a long incubation period

Data Cleaning: Missing Values
• Determine missing values by performing a frequency distribution
• Select missing values for follow-up
• Reasons for missing values
  – Expected missing values
    • Skip patterns
    • Non-applicable questions
  – Problems in data collection or data entry

Data Cleaning: Data Categories
• Determine which variables are continuous versus categorical
• Decide whether to collapse existing categories
  – Never, former, sometimes, frequently
  – Ever, never
• Assess need to categorize continuous variables

Attack Rates

Understanding Attack Rates
Attack Rate (AR) = # of people who became ill / # of people at risk (for a limited period of time)
Exposure-specific AR = # of
people who were exposed and became ill / # of people who were exposed

Food-Specific Attack Rates
                 Consumed Item           Did Not Consume Item
Item             Ill   Total   AR(%)     Ill   Total   AR(%)
Chicken          12    46      26        17    29      59
Cake             26    43      61        20    32      63
Water            10    24      42        33    51      65
Green Salad      42    54      78        3     21      14
Asparagus        4     6       67        42    69      61
This food is probably not the source of infection
CDC. Outbreak of foodborne streptococcal disease. MMWR 23:365, 1974.

Hypothesis Testing in Analytic Epidemiology

Hypothesis Generation vs. Hypothesis Testing
• Generate hypotheses
  – Interview officials, case-patients, and others
  – Review the literature
  – Assess descriptive epidemiology
• Test hypotheses
  – After hypotheses have been generated
  – Use analytic epidemiology

Descriptive Versus Analytic Epidemiology
Descriptive Epidemiology
• Search for clues
• Formulate hypotheses
• No comparison group
• Answers:
  – How much
  – Who
  – What
  – When
  – Where
Analytic Epidemiology
• Clues available
• Test hypotheses
• Comparison group
• Answers:
  – How
  – Why

Measures of Association
• Assess the strength of an association between an exposure and the outcome of interest
• Two widely used measures
  – Risk ratio (relative risk, RR)
    • Used with cohort studies
  – Odds ratio (OR)
    • Used with case-control studies

2 x 2 Tables
Used to summarize counts of disease and exposure in order to do calculations of association
            Outcome
Exposure    Yes    No     Total
Yes         a      b      a+b
No          c      d      c+d
Total       a+c    b+d    a+b+c+d

2 x 2 Tables
a = number who are exposed and have the outcome
b = number who are exposed and do not have the outcome
c = number who are not exposed and have the outcome
d = number who are not exposed and do not have the outcome
            Outcome
Exposure    Yes    No     Total
Yes         a      b      a+b
No          c      d      c+d
Total       a+c    b+d    a+b+c+d

2 x 2 Tables
a + b = total number who are exposed
c + d = total number who are not exposed
a + c = total number who have the outcome
b + d = total number who do not have the outcome
a + b + c + d = total study population
            Outcome
Exposure    Yes    No     Total
Yes         a      b      a+b
No          c      d      c+d
Total       a+c    b+d    a+b+c+d

Risk Ratio
             Ill    Not Ill    Total
Exposed      A      B          A+B
Unexposed    C      D          C+D
Risk Ratio = [A/(A+B)] / [C/(C+D)]

Interpreting a Risk Ratio
• RR = 1.0
  – No association between exposure and disease
• RR > 1.0
  – Positive association (more exposure = more disease)
• RR < 1.0
  – Negative association / protective effect (more exposure = less disease)

Risk Ratio Example
                               Ill    Well    Total
Ate alfalfa sprouts            43     11      54
Did not eat alfalfa sprouts    3      18      21
Total                          46     29      75
RR = (43 / 54) / (3 / 21) = 5.6

Odds Ratio
             Cases    Controls
Exposed      A        B
Unexposed    C        D
Odds Ratio = (A/C) / (B/D) = (A*D) / (B*C)

Interpreting an Odds Ratio
• OR = 1.0
  – No association between exposure and disease
• OR > 1.0
  – Positive association
• OR < 1.0
  – Negative association

Odds Ratio Example
                               Case    Control    Total
Ate at restaurant X            60      25         85
Did not eat at restaurant X    18      55         73
Total                          78      80         158
OR = (60 / 18) / (25 / 55) = 7.3

What to Do with a Zero Cell
• Try to recruit more study participants
• Add 1 to each cell
                               Case    Control    Total
Ate at restaurant X            60      0          60
Did not eat at restaurant X    18      55         73
Total                          78      55         133
Remember to document / report this!
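The attack rate, risk ratio, and odds ratio calculations from the preceding slides can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original session: the names `TwoByTwo`, `attack_rate`, `risk_ratio`, `odds_ratio`, and `add_one_to_each_cell` are hypothetical, and the cell labels a, b, c, d follow the 2 x 2 table layout above. The counts are taken from the alfalfa sprouts, restaurant X, and zero-cell examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2 x 2 table, using the slides' cell labels."""
    a: int  # exposed, has the outcome
    b: int  # exposed, does not have the outcome
    c: int  # not exposed, has the outcome
    d: int  # not exposed, does not have the outcome


def attack_rate(ill: int, total: int) -> float:
    """Attack rate = number who became ill / number at risk."""
    return ill / total


def risk_ratio(t: TwoByTwo) -> float:
    """RR = [a/(a+b)] / [c/(c+d)]; used with cohort studies."""
    return attack_rate(t.a, t.a + t.b) / attack_rate(t.c, t.c + t.d)


def odds_ratio(t: TwoByTwo) -> float:
    """OR = (a*d) / (b*c); used with case-control studies.

    Raises ZeroDivisionError when b or c is zero (the zero-cell problem).
    """
    return (t.a * t.d) / (t.b * t.c)


def add_one_to_each_cell(t: TwoByTwo) -> TwoByTwo:
    """Zero-cell workaround named on the slide: add 1 to every cell."""
    return TwoByTwo(t.a + 1, t.b + 1, t.c + 1, t.d + 1)


# Counts from the slide examples.
sprouts = TwoByTwo(a=43, b=11, c=3, d=18)      # cohort example
restaurant = TwoByTwo(a=60, b=25, c=18, d=55)  # case-control example
zero_cell = TwoByTwo(a=60, b=0, c=18, d=55)    # no exposed controls

print(f"AR, ate chicken: {attack_rate(12, 46):.0%}")
print(f"RR, alfalfa sprouts: {risk_ratio(sprouts):.1f}")
print(f"OR, restaurant X: {odds_ratio(restaurant):.1f}")
print(f"OR, zero cell after adding 1: {odds_ratio(add_one_to_each_cell(zero_cell)):.1f}")
```

With the slide counts, the first three results round to 26%, 5.6, and 7.3, matching the values shown above. An estimate produced after adding 1 to each cell depends on that adjustment, which is why the slide asks that it be documented and reported.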
Tests of Significance
• Indication of reliability of the association
• Answers the question: “How likely is it that the observed association may be due to chance?”
• Two main methods:
  – 95% Confidence Intervals (CIs)
  – p-values

Confidence Intervals
• Consist of a lower bound and an upper bound
  – Example: RR=1.9, 95% CI: 1.1, 3.1
• Allow the investigator to:
  – Evaluate statistical significance (significant if the interval does not include 1.0)
  – Assess the precision of the estimate (the odds ratio or risk ratio)

Confidence Interval Precision
• Narrow confidence intervals
  – More precise
  – Larger sample size, less random error
  – Example: OR=10, 95% CI: 9.0, 11.0
• Wide confidence intervals
  – Less precise
  – Smaller sample size, more random error
  – Example: OR=10, 95% CI: 0.9, 44.0

p-values
• How likely the observed association would be to occur by chance alone, in the absence of a true association
• A small p-value
  – Means that the observed RR or OR would be unlikely if there were no true association
  – Usually interpreted as evidence of a true association
• A p-value of 0.05
  – Indicates a 5% chance of observing an RR or OR at least this far from 1.0 by chance alone if there were no true association
  – Often used as the cut-point for statistical significance

Plan and Execute Additional Studies
• Re-examine data or collect new data
  – Test new hypotheses
  – Gather more specific info
  – Example: Salmonella muenchen
• Investigate new hypotheses about transmission and prevention
  – Intervention studies
  – Example: Intensive hand-washing to prevent Norwalk on cruise ships

Session 5 Summary
• Analysis planning ensures useful data that can be used in the analytic phase to address study hypotheses
• Attack rates are descriptive statistics useful for comparing the risk of disease in groups with different exposures
• Analytic epidemiology allows you to test the hypotheses generated via review of descriptive statistics and the medical literature

Session 5 Summary
• The measure of association for case-control analytic studies is the odds ratio
• The measure of association for
cohort analytic studies is the risk ratio
• Confidence intervals and p-values can be used to evaluate the statistical significance of measures of association

References and Resources
• Centers for Disease Control and Prevention (1992). Principles of Epidemiology, 2nd ed. Atlanta, GA: Public Health Practice Program Office.
• Gordis L (1996). Epidemiology. Philadelphia: WB Saunders.
• Rothman KJ (2002). Epidemiology: An Introduction. New York: Oxford University Press.
• Stehr-Green J, Stehr-Green P (2004). Hypothesis Generating Interviews. Module 3 of a Field Epidemiology Methods course being developed in the UNC Center for Public Health Preparedness, UNC Chapel Hill.