Gastroenteritis at a University in Texas


Session V
Analyzing Data
Session Overview
• Analysis planning
• Descriptive epidemiology
– Attack rates
• Analytic epidemiology
– Measures of association
– Tests of significance
Learning Objectives
• Understand what an analytic study contributes to
an epidemiological outbreak investigation
• Know why and how to generate measures of
association for cohort and case-control studies
• Understand how to interpret measures of
association (risk ratios, odds ratios) and
corresponding confidence intervals
• Understand how to interpret tests of significance
Basic Steps of an Outbreak Investigation
1. Verify the diagnosis and confirm the outbreak
2. Define a case and conduct case finding
3. Tabulate and orient data: time, place, person
4. Take immediate control measures
5. Formulate and test hypotheses
6. Plan and execute additional studies
7. Implement and evaluate control measures
8. Communicate findings
Analysis Planning
– An invaluable investment of time
– Helps you select the most appropriate
epidemiologic methods
– Helps assure that the work leading up to
analysis yields a database structure and
content that your preferred analysis software
needs to successfully run analysis programs
Analysis Planning
Several factors influence—and sometimes
limit—your approach to data analysis:
– Research question
– Exposure and outcome variables
– Study design
– Sample population selection
Analysis Planning
Three key considerations as you plan your
analysis:
1. Work backwards from the research question(s) to
design the most efficient data collection instrument
2. Study design will determine which statistical tests
and measures of association you evaluate in the
analysis output
3. Consider the need to present, graph, or map data
Analysis Planning
1. Work backwards from the research
question(s) to design the most efficient
data collection instrument
• Develop a sound data collection instrument
• Collect pieces of information that can be
counted, sorted, and recoded or stratified
• The analysis phase is not the time to realize that
you should have asked questions differently!
Analysis Planning
2. Study design will determine which
statistical tools you will use
• Use risk ratio (RR) with cohort studies and
odds ratio (OR) with case-control studies
• Some sampling methods (e.g., matching in
case-control studies) require special types
of analysis
Analysis Planning
3. Consider the need to present, graph, or
map data
• Even if you collect continuous data, you may
later categorize it so you can generate a bar
graph and assess frequency distributions
• If you plan to map data, you may need X
and Y coordinates or denominator data
Data Cleaning
• Check for accuracy
– Outliers
• Check for completeness
– Missing values
• Determine whether or not to create or collapse
data categories
• Get to know the basic descriptive findings
Data Cleaning:
Outliers
• Outliers can be cases at the very beginning and
end that may not appear to be related
– First check to make certain they are not due to a
collection, coding or data entry error
• If they are not an error, they may represent:
– Baseline level of illness
– Outbreak source
– A case exposed earlier than the others
– An unrelated case
– A case exposed later than the others
– A case with a long incubation period
Data Cleaning:
Distribution of Variables
[Figure: Illness Onset for Outbreak of Gastrointestinal Illness at a Nursing Home. Bar chart of Number of Cases (y-axis, 0 to 8) by Day of Onset (x-axis, daily dates in February), with one bar labeled "Outlier".]
Data Cleaning:
Missing Values
• The investigator can check into missing
values that are expected versus those that
are due to problems in data collection or
entry
• The number of missing values for each
variable can also be learned from
frequency distributions
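As a minimal sketch of the point above, a frequency distribution that treats a missing answer as its own category shows how many values are missing for a variable. The records, the field name `ate_cake`, and the convention that a missing answer is stored as `None` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical line-list records; None marks an unanswered question.
records = [
    {"id": 1, "ate_cake": "yes"},
    {"id": 2, "ate_cake": "no"},
    {"id": 3, "ate_cake": None},
    {"id": 4, "ate_cake": "yes"},
    {"id": 5, "ate_cake": None},
]

def frequency_distribution(rows, field):
    """Count each value of `field`, reporting missing values as 'missing'."""
    return Counter(
        "missing" if row.get(field) is None else row[field] for row in rows
    )

cake_counts = frequency_distribution(records, "ate_cake")
for value, count in cake_counts.items():
    print(f"{value}: {count}")
```

Whether a missing value is expected (for example, a skipped follow-up question) or a data entry problem still has to be checked against the questionnaire.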
Data Cleaning:
Data Categories
• Which variables are continuous versus
categorical?
• Collapse existing categories into fewer?
• Create categories from continuous? (e.g.,
age)
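Creating categories from a continuous variable such as age could look like the sketch below. The cut points and group labels are hypothetical; an investigation would choose groups that fit its own population and hypotheses.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points: each value is the lower bound of the next group.
CUTPOINTS = [18, 40, 65]
LABELS = ["0-17", "18-39", "40-64", "65+"]

def age_group(age):
    """Map a continuous age in years to a categorical age group."""
    return LABELS[bisect_right(CUTPOINTS, age)]

ages = [4, 17, 18, 22, 39, 40, 64, 65, 81]  # hypothetical ages
group_counts = Counter(age_group(age) for age in ages)
for label in LABELS:
    print(f"{label}: {group_counts[label]}")
```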
Attack Rates
Attack Rates (AR)
AR = (# of cases of a disease) / (# of people at risk)
(for a limited period of time)
Food-specific AR = (# of people who ate a food and became ill) / (# of people who ate that food)
Food-Specific Attack Rates
                 Consumed Item           Did Not Consume Item
Item             Ill   Total   AR(%)     Ill   Total   AR(%)
Chicken           12      46      26      17      29      59
Cake              26      43      61      20      32      63
Water             10      24      42      33      51      65
Green Salad       42      54      78       3      21      14
Asparagus          4       6      67      42      69      61
This food is probably not the source of infection
CDC. Outbreak of foodborne streptococcal disease. MMWR 23:365, 1974.
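The food-specific attack rates can be recomputed from the ill and total counts, which also makes it easy to rank items by the gap between those who ate a food and those who did not. This is only a sketch: the function and variable names are illustrative, and the Water and Green Salad counts are assigned to rows in the order that their listed attack rates imply.

```python
def attack_rate(ill, total):
    """Attack rate as a percentage: ill / people at risk * 100."""
    return 100.0 * ill / total

# item: ((ill, total) among those who ate it, (ill, total) among those who did not)
foods = {
    "Chicken": ((12, 46), (17, 29)),
    "Cake": ((26, 43), (20, 32)),
    "Water": ((10, 24), (33, 51)),
    "Green Salad": ((42, 54), (3, 21)),
    "Asparagus": ((4, 6), (42, 69)),
}

def rate_difference(item):
    """Attack rate among those who ate the item minus the rate among those who did not."""
    ate, did_not = foods[item]
    return attack_rate(*ate) - attack_rate(*did_not)

for item in foods:
    ate, did_not = foods[item]
    print(f"{item:12s} ate: {attack_rate(*ate):5.1f}%  "
          f"did not eat: {attack_rate(*did_not):5.1f}%  "
          f"difference: {rate_difference(item):+6.1f}")

most_suspect = max(foods, key=rate_difference)
print("Largest difference:", most_suspect)
```

A large positive difference flags an item for hypothesis testing; it does not by itself establish the source.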
Hypothesis Generation vs.
Hypothesis Testing
Formulate hypotheses
– Occurs after having spoken with some case-patients
and public health officials
– Based on information from literature review
– Based on descriptive epidemiology (step #3)
Test hypotheses
– Occurs after hypotheses have been
generated
– Based on analytic epidemiology
Descriptive Epidemiology        Analytic Epidemiology
Search for clues                Clues available
Formulate hypotheses            Test hypotheses
No comparison group             Comparison group
Answers: How much, who,         Answers: How, why
what, when, where
Measures of Association
• Assess the strength of an association
between an exposure and the outcome of
interest
• Two widely used measures:
– Risk ratio (a.k.a. relative risk, RR)
• Used with cohort studies
– Odds ratio (a.k.a. OR)
• Used with case-control studies
2 x 2 Tables
Used to summarize counts of disease and exposure in
order to do calculations of association
                    Outcome
Exposure    Yes     No      Total
Yes         a       b       a+b
No          c       d       c+d
Total       a+c     b+d     a+b+c+d
2 x 2 Tables
a = number who are exposed and have the outcome
b = number who are exposed and do not have the
outcome
c = number who are not exposed and have the outcome
d = number who are not exposed and do not have the
outcome
2 x 2 Tables
a + b = total number who are exposed
c + d = total number who are not exposed
a + c = total number who have the outcome
b + d = total number who do not have the outcome
a + b + c + d = total study population
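As a rough illustration of how the cells are filled, the sketch below tallies individual records into a, b, c, and d. The records are hypothetical (exposed, has outcome) pairs, not data from any investigation.

```python
def two_by_two(records):
    """Tally (exposed, has_outcome) boolean pairs into the cells a, b, c, d."""
    a = b = c = d = 0
    for exposed, has_outcome in records:
        if exposed and has_outcome:
            a += 1  # exposed, outcome
        elif exposed:
            b += 1  # exposed, no outcome
        elif has_outcome:
            c += 1  # not exposed, outcome
        else:
            d += 1  # not exposed, no outcome
    return a, b, c, d

# Hypothetical records: (exposed, has_outcome)
records = [(True, True), (True, False), (False, True), (False, False), (True, True)]
a, b, c, d = two_by_two(records)
print("exposed:", a + b, "not exposed:", c + d, "total:", a + b + c + d)
```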
Risk Ratio
             Ill    Not Ill   Total
Exposed      A      B         A+B
Unexposed    C      D         C+D
Risk Ratio = [A/(A+B)] / [C/(C+D)]
Interpreting a Risk Ratio
• RR=1.0 = no association between
exposure and disease
• RR>1.0 = positive association
• RR<1.0 = negative association / protective
effect
Risk Ratio Example
                               Ill   Well   Total
Ate alfalfa sprouts             43     11      54
Did not eat alfalfa sprouts      3     18      21
Total                           46     29      75
RR = (43 / 54) / (3 / 21) = 5.6
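The same calculation as a short Python sketch, using the cell layout from the Risk Ratio slide (the function name is illustrative):

```python
def risk_ratio(a, b, c, d):
    """Risk ratio for a cohort 2 x 2 table: [a/(a+b)] / [c/(c+d)]."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Alfalfa sprouts example: 43 ill and 11 well among those who ate sprouts,
# 3 ill and 18 well among those who did not.
rr = risk_ratio(43, 11, 3, 18)
print(f"RR = {rr:.1f}")  # RR = 5.6
```

People who ate alfalfa sprouts were about 5.6 times as likely to become ill as those who did not.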
Odds Ratio
             Cases   Controls
Exposed      A       B
Unexposed    C       D
Odds Ratio = (A/C) / (B/D) = (A*D) / (B*C)
Interpreting an Odds Ratio
The odds ratio is interpreted in the same
way as a risk ratio:
• OR=1.0 = no association between
exposure and disease
• OR>1.0 = positive association
• OR<1.0 = negative association
Odds Ratio Example
                               Case   Control   Total
Ate at restaurant X              60        25      85
Did not eat at restaurant X      18        55      73
Total                            78        80     158
OR = (60 / 18) / (25 / 55) = 7.3
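A matching Python sketch for the odds ratio, using the cross-product form (A*D)/(B*C); the function name is illustrative:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a case-control 2 x 2 table: (a*d) / (b*c)."""
    return (a * d) / (b * c)

# Restaurant X example: 60 exposed cases, 25 exposed controls,
# 18 unexposed cases, 55 unexposed controls.
odds = odds_ratio(60, 25, 18, 55)
print(f"OR = {odds:.1f}")  # OR = 7.3
```

The odds of having eaten at restaurant X were about 7.3 times higher among cases than among controls.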
What to do with a Zero Cell
                               Case   Control   Total
Ate at restaurant X              60         0      60
Did not eat at restaurant X      18        55      73
Total                            78        55     133
• Try to recruit more study participants
• Add 1 to each cell*
*Remember to document / report this!
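With a zero in cell B, the cross-product (A*D)/(B*C) would divide by zero. The sketch below applies the "add 1 to each cell" option and returns a flag so the correction can be reported. The function name and the `correction` parameter are illustrative; some texts add 0.5 rather than 1, so the amount is left adjustable.

```python
def corrected_odds_ratio(a, b, c, d, correction=1):
    """Odds ratio that adds `correction` to every cell when any cell is zero.

    Returns (odds_ratio, was_corrected) so the adjustment can be documented.
    """
    was_corrected = 0 in (a, b, c, d)
    if was_corrected:
        a, b, c, d = (cell + correction for cell in (a, b, c, d))
    return (a * d) / (b * c), was_corrected

# Zero-cell example: no controls ate at restaurant X.
value, was_corrected = corrected_odds_ratio(60, 0, 18, 55)
note = " (1 added to each cell)" if was_corrected else ""
print(f"OR = {value:.1f}{note}")
```

The corrected estimate is very large and imprecise, which is one reason recruiting more participants is listed first.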
Tests of Significance
• Indication of reliability of the association that
was observed
• Answers the question “How likely is it that the
observed association may be due to chance?”
• Two main tests:
1. 95% Confidence Intervals (CI)
2. p-values
Confidence Intervals
• Allow the investigator to:
– Evaluate statistical significance
– Assess the precision of the estimate
(the odds ratio or risk ratio)
• Consist of a lower bound and an upper
bound
– Example: RR=1.9, 95% CI: 1.1-3.1
Confidence Intervals
• Provide information on precision of
estimate
– Narrow confidence intervals = more
precise
• Example: OR=10, 95% CI: 9.0 - 11.0
– Wide confidence intervals = less precise
• Example: OR=10, 95% CI: 0.9 - 44.0
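The slides do not say how the bounds are calculated. One common approach, assumed here, is a log-based interval: take the natural log of the estimate, add and subtract 1.96 standard errors, and exponentiate. The sketch below applies it to the restaurant X odds ratio and the alfalfa sprouts risk ratio; the function names are illustrative, and analysis software may use a different method.

```python
import math

Z_95 = 1.96  # standard normal critical value for a 95% interval

def odds_ratio_ci(a, b, c, d, z=Z_95):
    """Odds ratio with a log-based (Woolf) confidence interval."""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

def risk_ratio_ci(a, b, c, d, z=Z_95):
    """Risk ratio with a log-based confidence interval."""
    estimate = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

or_est, or_low, or_high = odds_ratio_ci(60, 25, 18, 55)  # restaurant X
rr_est, rr_low, rr_high = risk_ratio_ci(43, 11, 3, 18)   # alfalfa sprouts

print(f"OR = {or_est:.1f}, 95% CI: {or_low:.1f} - {or_high:.1f}")
print(f"RR = {rr_est:.1f}, 95% CI: {rr_low:.1f} - {rr_high:.1f}")
```

An interval that excludes 1.0 indicates an association that is statistically significant at the 5% level; an interval that includes 1.0, like the wide example above (0.9 - 44.0), does not.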
p-values
• The p-value is a measure of how likely the
observed association would be to occur by
chance alone, in the absence of a true
association
• A very small p-value means that you would be very
unlikely to observe such an RR or OR if there were
no true association
• A p-value of 0.05 means that, if there were no true
association, an RR or OR at least this far from 1.0
would be observed by chance alone only 5% of the time
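The slides do not name a specific test. As one possibility, the sketch below computes a Pearson chi-square statistic (without continuity correction) for a 2 x 2 table and converts it to a p-value with one degree of freedom, using the restaurant X counts. The choice of test and the function name are assumptions; tables with small expected cell counts are usually analyzed with Fisher's exact test instead.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2 x 2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_square_2x2(60, 25, 18, 55)  # restaurant X example
print(f"chi-square = {chi2:.1f}, p = {p_value:.2g}")
```

A p-value this far below 0.05 means an odds ratio as large as 7.3 would be very unlikely if eating at restaurant X were unrelated to illness.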
Plan and Execute Additional
Studies
• To gather more specific information
– Example: Salmonella muenchen
• Intervention study
– Example: Implement intensive hand-washing
Session V Summary
Analysis planning will ensure that you get
the most valuable / useful data out of your
investigation.
Attack rates are descriptive statistics used
in cohort studies that are useful for
comparing the risk of disease in groups
with different exposures (such as
consumption of individual food items).
Session V Summary
Analytic epidemiology allows you to test the
hypotheses generated via review of descriptive
statistics and the medical literature.
The measures of association for case-control
and cohort analytic studies, respectively, are
odds ratios and risk ratios.
Confidence intervals and p-values that
accompany measures of association evaluate
the statistical significance of the measures.