I is for Investigation Outbreak Investigation Methods from Mystery to Mastery Session V Analyzing Data.


I is for Investigation
Outbreak Investigation Methods from
Mystery to Mastery
Session V
Analyzing Data
Session Overview
• Analysis planning and data cleaning
• Attack rates
• Hypothesis testing in analytic
epidemiology
– Measures of association
– Tests of significance
Learning Objectives
• Understand the purpose of an analytic study in an
epidemiologic outbreak investigation
• Generate measures of association for cohort and
case-control studies
• Interpret measures of association (risk ratios, odds
ratios) and corresponding confidence intervals
• Interpret a statistical test of significance
Basic Steps of an
Outbreak Investigation
1. Verify the diagnosis and confirm the outbreak
2. Define a case and conduct case finding
3. Tabulate and orient data: time, place, person
4. Take immediate control measures
5. Formulate and test hypotheses
6. Plan and execute studies
7. Implement and evaluate control measures
8. Communicate findings
Analysis Planning and
Data Cleaning
Purpose of Analysis Planning
• Tailor questionnaire to collect necessary
data in the correct format
• Select the most appropriate epidemiologic
methods
• Use data efficiently in analytic software
Factors Influencing Data
Analysis
• Research question
• Exposure and outcome variables
• Study design
• Sampling
Steps for Analysis Planning
1. Work backwards from the research question(s)
to design the most efficient data collection
instrument
2. Study design will determine which statistical
tests and measures of association you evaluate
in the analysis output
3. Consider the need to present, graph, or map
data
Analysis Planning: Step 1
1. Work backwards from the research
question(s) to design the most efficient
data collection instrument
– Develop a sound data collection instrument
– Collect pieces of information that can be
counted, sorted, and recoded or stratified
Analysis phase is not the time to realize that
you should have asked questions differently!
Analysis Planning: Step 2
2. Study design will determine which
statistical tools you will use
– Use risk ratio (RR) with cohort studies and
odds ratio (OR) with case-control studies
Match sampling methods to the required special
types of analysis
Analysis Planning: Step 3
3. Consider the need to present, graph, or
map data
– Consider continuous versus categorical data
– Collect additional data needed to map data
Data Cleaning
• Check for accuracy
– Outliers
• Check for completeness
– Missing values
• Get to know the descriptive findings
• Determine whether or not to create or
collapse data categories
Data Cleaning:
Distribution of Variables
[Figure: Illness Onset for Outbreak of Gastrointestinal Illness at a Nursing Home. Number of Cases (y-axis) by Day of Onset (x-axis, dates from 2/5), with one case labeled “Outlier”.]
Data Cleaning: Outliers
• Value falling far outside the values of the sample
– First or last cases in an outbreak
– Very high or very low values for a variable
• May be due to a collection, coding or data entry
error
• If not due to error, the outlier may represent:
– Baseline level of illness / unrelated to outbreak
– Outbreak source
– Case exposed earlier than the others
– Case exposed later than the others
– Case with a long incubation period
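A minimal Python sketch of screening onset data for candidate outliers follows. The onset days are hypothetical, and the 1.5 x IQR rule is a common convention assumed here, not a method named in this session. Flagged values still need review to decide whether they reflect an error or a true case.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical days of illness onset (day of the month), for illustration only.
onset_days = [5, 14, 15, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20]
print(flag_outliers(onset_days))
```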
Data Cleaning: Missing Values
• Determine missing values by performing
frequency distribution
• Select missing values for follow up
• Reasons for missing values
– Expected missing values
• Skip patterns
• Non applicable questions
– Problems in data collection or data entry
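The frequency-distribution check described above might look like the following Python sketch. The records and field names are hypothetical, and `None` is assumed to mark an unanswered item.

```python
from collections import Counter

# Hypothetical questionnaire records; None marks an unanswered item.
records = [
    {"id": 1, "ate_chicken": "yes", "onset_date": "2/14"},
    {"id": 2, "ate_chicken": None, "onset_date": "2/15"},
    {"id": 3, "ate_chicken": "no", "onset_date": None},
    {"id": 4, "ate_chicken": None, "onset_date": "2/15"},
]

def frequency(records, field):
    """Frequency distribution of one field, counting missing values."""
    return Counter("missing" if r[field] is None else r[field] for r in records)

def missing_ids(records, field):
    """IDs of records to follow up on because the field is missing."""
    return [r["id"] for r in records if r[field] is None]

print(frequency(records, "ate_chicken"))
print(missing_ids(records, "ate_chicken"))
```

Whether a missing value is expected (a skip pattern or non-applicable question) or a data collection problem has to be judged against the questionnaire itself.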
Data Cleaning: Data Categories
• Determine which variables are continuous
versus categorical
• Decide whether to collapse existing
categories
– Never, former, sometimes, frequently
– Ever, never
• Assess need to categorize continuous
variables
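Collapsing categories and categorizing a continuous variable can be sketched in Python as below. The function names and the age cut-points are hypothetical choices for illustration, not values from this session.

```python
def collapse_category(status):
    """Collapse never / former / sometimes / frequently into ever / never."""
    return "never" if status == "never" else "ever"

def age_group(age, cut_points=(18, 65)):
    """Categorize a continuous age using hypothetical cut-points."""
    lower, upper = cut_points
    if age < lower:
        return f"<{lower}"
    if age < upper:
        return f"{lower}-{upper - 1}"
    return f"{upper}+"

print([collapse_category(s) for s in ["never", "former", "sometimes", "frequently"]])
print([age_group(a) for a in [5, 18, 64, 70]])
```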
Attack Rates
Understanding Attack Rates
Attack Rate (AR) = (# of people who became ill) / (# of people at risk, for a limited period of time)
Exposure-specific AR = (# of people who were exposed and became ill) / (# of people who were exposed)
Food-Specific Attack Rates
                  Consumed Item           Did Not Consume Item
Item            Ill   Total   AR(%)      Ill   Total   AR(%)
Chicken          12      46      26       17      29      59
Cake             26      43      61       20      32      63
Water            10      24      42       33      51      65
Green Salad      42      54      78        3      21      14
Asparagus         4       6      67       42      69      61
This food is probably not the source of infection
CDC. Outbreak of foodborne streptococcal disease. MMWR 23:365, 1974.
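The attack rates in the table can be reproduced with a short Python sketch. The counts come from the table; ranking items by the difference between the two attack rates is an illustrative assumption about how to read it, not a step prescribed by the session.

```python
def attack_rate(ill, total):
    """Attack rate as a percentage of people at risk who became ill."""
    return 100 * ill / total

# (ill, total) among those who consumed / did not consume each item,
# taken from the food-specific attack rate table.
foods = {
    "Chicken": ((12, 46), (17, 29)),
    "Cake": ((26, 43), (20, 32)),
    "Water": ((10, 24), (33, 51)),
    "Green salad": ((42, 54), (3, 21)),
    "Asparagus": ((4, 6), (42, 69)),
}

def ar_difference(item):
    """AR among consumers minus AR among non-consumers (percentage points)."""
    ate, did_not = foods[item]
    return attack_rate(*ate) - attack_rate(*did_not)

for item, (ate, did_not) in foods.items():
    print(f"{item:12s} {attack_rate(*ate):5.0f}% {attack_rate(*did_not):5.0f}% "
          f"{ar_difference(item):+6.0f}")

suspect = max(foods, key=ar_difference)
print("Largest AR difference:", suspect)
```

A high attack rate among those who ate an item combined with a low attack rate among those who did not is the pattern that points toward a likely vehicle.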
Hypothesis Testing in Analytic
Epidemiology
Hypothesis Generation vs.
Hypothesis Testing
• Generate hypotheses
– Interview officials, case-patients, and others
– Review the literature
– Assess descriptive epidemiology
• Test hypotheses
– After hypotheses have been generated
– Use analytic epidemiology
Descriptive Versus Analytic
Epidemiology
Descriptive Epidemiology
• Search for clues
• Formulate hypotheses
• No comparison group
• Answers:
– How much
– Who
– What
– When
– Where
Analytic Epidemiology
• Clues available
• Test hypotheses
• Comparison group
• Answers:
– How
– Why
Measures of Association
• Assess the strength of an association
between an exposure and the outcome of
interest
• Two widely used measures
– Risk ratio (relative risk, RR)
• Used with cohort studies
– Odds ratio (OR)
• Used with case-control studies
2 x 2 Tables
Used to summarize counts of disease and
exposure in order to do calculations of
association
                     Outcome
Exposure       Yes       No        Total
Yes            a         b         a+b
No             c         d         c+d
Total          a+c       b+d       a+b+c+d
2 x 2 Tables
a = number who are exposed and have the outcome
b = number who are exposed and do not have the
outcome
c = number who are not exposed and have the
outcome
d = number who are not exposed and do not have the
outcome
2 x 2 Tables
a + b = total number who are exposed
c + d = total number who are not exposed
a + c = total number who have the outcome
b + d = total number who do not have the outcome
a + b + c + d = total study population
Risk Ratio
               Ill       Not Ill   Total
Exposed        A         B         A+B
Unexposed      C         D         C+D
Risk Ratio = [A/(A+B)] / [C/(C+D)]
Interpreting a Risk Ratio
• RR = 1.0
– No association between exposure and disease
• RR > 1.0
– Positive association
(more exposure = more disease)
• RR < 1.0
– Negative association / protective effect
(more exposure = less disease)
Risk Ratio Example
                               Ill    Well   Total
Ate alfalfa sprouts             43      11      54
Did not eat alfalfa sprouts      3      18      21
Total                           46      29      75
RR = (43 / 54) / (3 / 21) = 5.6
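The alfalfa sprouts calculation can be checked with a small Python sketch. The function name is hypothetical; the cell labels follow the 2 x 2 table convention used in this session.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2 x 2 table.

    a = exposed and ill, b = exposed and not ill,
    c = unexposed and ill, d = unexposed and not ill.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

rr = risk_ratio(43, 11, 3, 18)
print(round(rr, 1))  # 5.6
```

Read this way, people who ate alfalfa sprouts were about 5.6 times as likely to become ill as those who did not.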
Odds Ratio
               Cases     Controls
Exposed        A         B
Unexposed      C         D
Odds Ratio = (A/C) / (B/D) = (A*D) / (B*C)
Interpreting an Odds Ratio
• OR = 1.0
– No association between exposure and
disease
• OR > 1.0
– Positive association
• OR < 1.0
– Negative association
Odds Ratio Example
                               Case   Control   Total
Ate at restaurant X              60        25      85
Did not eat at restaurant X      18        55      73
Total                            78        80     158
OR = (60 / 18) / (25 / 55) = 7.3
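The restaurant X calculation can be checked with a small Python sketch. The function name is hypothetical; it uses the cross-product form (A*D)/(B*C), which is algebraically the same as (A/C)/(B/D).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)

or_ = odds_ratio(60, 25, 18, 55)
print(round(or_, 1))  # 7.3
```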
What to Do with a Zero Cell
• Try to recruit more study participants
• Add 1 to each cell
                               Case   Control   Total
Ate at restaurant X              60         0      60
Did not eat at restaurant X      18        55      73
Total                            78        55     133
Remember to document / report this!
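A minimal Python sketch of the add-1 approach follows. The function name and the returned flag are hypothetical; the `correction` parameter is exposed because some references add 0.5 rather than 1, which is an alternative not covered in this session.

```python
def odds_ratio_with_correction(a, b, c, d, correction=1):
    """Odds ratio that adds `correction` to every cell when any cell is zero.

    Returns (odds_ratio, corrected) so the adjustment can be reported.
    """
    corrected = 0 in (a, b, c, d)
    if corrected:
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c), corrected

# Zero-cell table from the slide: no controls ate at restaurant X.
or_, corrected = odds_ratio_with_correction(60, 0, 18, 55)
print(round(or_, 1), "(1 added to each cell)" if corrected else "")
```

Returning the flag alongside the estimate makes it harder to forget to document that the table was adjusted.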
Tests of Significance
• Indication of reliability of the association
• Answers the question: “How likely is it that
the observed association may be due to
chance?”
• Two main methods:
– 95% Confidence Intervals (CIs)
– p-values
Confidence Intervals
• Consist of a lower bound and an upper
bound
– Example: RR=1.9, 95% CI: 1.1, 3.1
• Allow the investigator to:
– Evaluate statistical significance (significant if the
interval does not include 1.0)
– Assess the precision of the estimate (the odds
ratio or risk ratio)
Confidence Interval Precision
• Narrow confidence intervals
– More precise
– Larger sample size, less random error
– Example: OR=10, 95% CI: 9.0, 11.0
• Wide confidence intervals
– Less precise
– Smaller sample size, more random error
– Example: OR=10, 95% CI: 0.9, 44.0
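The session does not say how the intervals are computed. One widely used approach is sketched below in Python as an assumption: a 95% interval built on the log scale, using the usual large-sample standard errors for ln(RR) and ln(OR). It is applied to the alfalfa sprouts and restaurant X tables from the earlier examples.

```python
import math

def rr_confidence_interval(a, b, c, d, z=1.96):
    """Risk ratio with an approximate 95% CI (log method)."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

def or_confidence_interval(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (log method)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, or_ * math.exp(-z * se), or_ * math.exp(z * se)

for label, result in [
    ("RR, alfalfa sprouts", rr_confidence_interval(43, 11, 3, 18)),
    ("OR, restaurant X", or_confidence_interval(60, 25, 18, 55)),
]:
    estimate, lower, upper = result
    significant = lower > 1.0 or upper < 1.0  # interval excludes 1.0
    print(f"{label}: {estimate:.1f} (95% CI: {lower:.1f}, {upper:.1f}), "
          f"excludes 1.0: {significant}")
```

Because the standard errors shrink as cell counts grow, the same estimate from a larger study yields a narrower interval. These formulas fail when any cell is zero.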
p-values
• The probability of seeing an association at least as
strong as the one observed by chance alone, in the
absence of a true association
• A small p-value
– Means that the observed RR or OR would be unlikely
if there were no true association
– Usually taken as evidence of a true association
• A p-value of 0.05
– Means that, if there were no true association, an RR or
OR at least this extreme would be expected about 5% of
the time
– Often used as the cut-point for statistical significance
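The session does not name a specific test. As one hedged illustration, the Python sketch below computes a Pearson chi-square statistic (no continuity correction) for a 2 x 2 table and its p-value with 1 degree of freedom, using only the standard library. With small expected cell counts, an exact test is usually preferred over this approximation.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for a 2 x 2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Restaurant X case-control table from the odds ratio example.
chi2, p = chi_square_2x2(60, 25, 18, 55)
print(f"chi-square = {chi2:.1f}, p < 0.05: {p < 0.05}")
```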
Plan and Execute Additional
Studies
• Re-examine data or collect new data
– Test new hypotheses
– Gather more specific info
– Example: Salmonella muenchen
• Investigate new hypotheses about
transmission and prevention
– Intervention studies
– Example: Intensive hand-washing to prevent
Norwalk on cruise ships
Session 5 Summary
• Analysis planning ensures useful data that can be
used in the analytic phase to address study
hypotheses
• Attack rates are descriptive statistics useful for
comparing the risk of disease in groups with
different exposures
• Analytic epidemiology allows you to test the
hypotheses generated via review of descriptive
statistics and the medical literature
Session 5 Summary
• The measure of association for case-control
analytic studies is the odds ratio
• The measure of association for cohort
analytic studies is the risk ratio
• Confidence intervals and p-values can be
used to evaluate the statistical significance
of measures of association