Find the Joy in Stats

Find the Joy in Stats?!?
Walt Senterfitt, Ph.D., PWA
Los Angeles County Department of
Public Health and
CHAMP
Introduction
• Stats are tools to help describe, understand, and assess research results
• Like other tools, they can be used properly
and appropriately, or they can be misused
• They have no meaning by themselves and
are not a gold standard for determining truth
or falsehood.
Some key terms and concepts
• Observational vs. experimental study
• Effect measures or estimates: Relative Risk,
RR and Odds Ratio, OR (there are others!)
• Confidence Intervals (or CI)
• Crude vs. Adjusted RR and OR
• P-values and tests of statistical significance
Observational (Epidemiologic)
Study Designs
• Individuals are described and observed and
certain outcomes are measured
• No attempt is made to affect the outcome
• Example: Studies that observed that circumcised men were less likely to be HIV-infected than uncircumcised men
• Results are associations; causation can only
be inferred
Experimental Studies
• Individuals divided into 2 or more groups,
usually randomly, that receive different
interventions or treatments, e.g. drug vs.
placebo
• Specified outcomes are measured and
compared
• Example: Randomized controlled trial of
circumcision for HIV prevention.
Effect Measures
• Most research seeks to offer evidence (note
I don’t say “show” or “prove”) for or
against a hypothesized effect of an
intervention or exposure on an outcome
• Usually expressed by comparing the
outcome in the exposed or treatment group
vs. the unexposed, placebo or control group
Relative Risk or Risk Ratio (RR)
• The probability of an event or outcome
occurring in the exposed (treatment or
intervention) group vs. the unexposed
(control) group. Expressed as a ratio or
fraction.
• Example: HIV incidence in circumcision group was 2.1% vs. 4.2% in control group
• The risk ratio was 0.47 (2.1/4.2 rounds to 0.50; the published 0.47 reflects the unrounded data)
• Corresponds to a 53% reduction in risk (a sketch of the arithmetic follows below)
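A minimal sketch of this arithmetic in Python, using the rounded incidences quoted above (variable names are illustrative, not the trial's):

```python
# Sketch: relative risk and risk reduction from two incidence figures.
# These are the rounded incidences quoted above; the published 0.47
# comes from the unrounded trial data.
risk_treated = 0.021   # 2-year HIV incidence, circumcision group (2.1%)
risk_control = 0.042   # 2-year HIV incidence, control group (4.2%)

rr = risk_treated / risk_control
risk_reduction = 1 - rr

print(f"Relative risk: {rr:.2f}")               # 0.50 with rounded inputs
print(f"Risk reduction: {risk_reduction:.0%}")  # 50%
```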
[Figure 2 omitted. Source: The Lancet 2007; 369:643-656 (DOI:10.1016/S0140-6736(07)60312-2)]
Odds Ratio (OR)
• The ratio of the odds of an event or outcome
occurring in the exposed group compared to
the odds in the unexposed or control group
• “Odds” is used a little differently from the way the word appears in most ordinary language, but is similar to betting odds, say at a horse race
• 3:2 odds means a 60% chance of winning and a 40% chance of not winning; the odds are .6/.4 = 1.5 (see the sketch below)
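A minimal sketch of the odds/probability relationship (the helper names are illustrative):

```python
# Sketch: converting between betting-style odds and probability.
def odds_from_prob(p):
    """Odds of an event that has probability p."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Probability implied by a given odds value."""
    return odds / (1 + odds)

print(prob_from_odds(3 / 2))   # 3:2 odds -> 0.6, i.e. a 60% chance
print(odds_from_prob(0.6))     # 60% chance -> odds of 1.5, i.e. 3:2
```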
Odds Ratios vs. Risk Ratio
• For both, value of 1.0 means “no
effect”
• RR is more intuitive, like the way
people think, and thus easier to explain
• OR has some properties that make it
easier to work with statistically,
especially for adjustments to account
for the influence of other variables
More on RR vs OR
• Both can be expressed exposed/unexposed or
unexposed/exposed, so be sure you understand the
framing. If an undesirable outcome has less risk
or lower odds in the exposed (treatment) group,
we can say that the treatment was “protective”
• OR tends to be close to the RR if the outcome of
interest is fairly rare (<10%, say)
• BUT …..
Risk/Odds of Death on Titanic
Males vs. Females: Risk Ratio vs. Odds Ratio

         Female   Male    Total
Alive       308    142      450
Dead        154    709      863
Total       462    851    1,313
…if the outcome is not rare
• Risk of death for males was 709/851 = 0.83
• Risk of death for females was 154/462 = 0.33
• RR = 0.83/0.33 = 2.5
• Odds of death for males = 709:142 ≈ 5:1
• Odds of death for females = 154:308 = 1:2
• OR = 4.99/0.5 = 9.99
• Lesson: Make sure odds ratios are being applied correctly, i.e. to rare outcomes (see the sketch below)
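A short sketch reproducing both calculations from the 2x2 table above:

```python
# Sketch: RR vs. OR from the Titanic 2x2 table above.
dead_m, alive_m = 709, 142   # males
dead_f, alive_f = 154, 308   # females

risk_m = dead_m / (dead_m + alive_m)   # 709/851 = 0.83
risk_f = dead_f / (dead_f + alive_f)   # 154/462 = 0.33
rr = risk_m / risk_f                   # 2.5

odds_m = dead_m / alive_m              # 4.99, roughly 5:1
odds_f = dead_f / alive_f              # 0.5, i.e. 1:2
odds_ratio = odds_m / odds_f           # 9.99

print(f"RR = {rr:.1f}, OR = {odds_ratio:.1f}")  # here the OR is four times the RR
```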
Crude vs. adjusted RR and OR
• In observational studies especially, there are
other variables -- besides the main one of
interest -- that may well affect the outcome;
the effect of an exposure may be different
in, say, certain age, gender, race/ethnic,
social class groups … and the proportionate mix of these categories may differ incidentally between the groups being compared
Crude vs. Adjusted (more)
• To isolate or highlight the effect of the main
exposure variable of interest, we can statistically
“adjust” the OR or RR to “control for the effect
of” certain other variables
• For instance, we can statistically adjust the
observed results to see what the effect would have
been if the overall sample were all of the same
age; then what we have is an RR or OR “adjusted
for age differences” or “controlling for variations
in age” (one common adjustment method is sketched below)
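One standard way to do this by hand is the Mantel-Haenszel pooled odds ratio. The sketch below uses made-up counts purely to show the mechanics; it is not data from any study:

```python
# Sketch: an age-adjusted odds ratio via the Mantel-Haenszel formula.
# Each stratum is one age group, as a 2x2 of hypothetical counts:
# (exposed cases a, exposed non-cases b, unexposed cases c, unexposed non-cases d)
strata = [
    (10, 90, 5, 95),    # younger age group (hypothetical)
    (30, 70, 18, 82),   # older age group (hypothetical)
]

# OR_MH = sum(a*d/n) / sum(b*c/n), summed over strata of total size n
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)

print(f"Age-adjusted (Mantel-Haenszel) OR: {num / den:.2f}")
```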
Confidence Intervals (CI)
• Effect measures are first expressed, as in examples
above, as “point estimates” as in “the risk of HIV
in the circumcision group was only 0.47 times as
great as in the control group” or “there was a 53%
reduction in risk of HIV in the circumcision as
compared with the control group”
• But there is always random error in measurement,
no matter how carefully a study is done
• Unlikely we would get the exact same results
again
Confidence Intervals (continued)
• Thus, effect estimates are also presented with “interval estimates,” a range around the point estimate. The true value of the effect “most likely” lies within this range.
• This range is called the confidence interval and the
upper and lower ends of the range are called the
confidence limits. The range is determined in part
by the confidence level, most typically set
arbitrarily at 95% (a sketch of the calculation follows below)
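A sketch of the usual calculation for a risk ratio's 95% CI, done on the log scale. The case counts echo the trial example on the next slide, but the group sizes of 1,000 are assumed here for illustration, so the output is approximate:

```python
# Sketch: 95% confidence interval for a risk ratio (log-scale approximation).
import math

a, n1 = 22, 1000   # cases / total in exposed group (n1 is hypothetical)
c, n0 = 47, 1000   # cases / total in control group (n0 is hypothetical)

rr = (a / n1) / (c / n0)
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)  # standard error of log(RR)
z = 1.96                                        # multiplier for ~95% confidence

low = math.exp(math.log(rr) - z * se_log_rr)
high = math.exp(math.log(rr) + z * se_log_rr)
print(f"RR = {rr:.2f}, 95% CI ({low:.2f}, {high:.2f})")  # ~0.47 (0.28, 0.77)
```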
CI: example
• “During the study, seroconversion occurred in 22
participants in the circumcision group and 47 of those in
the control group. The 2-year HIV incidence was 2·1%
(95% CI 1·2—3·0) in the circumcision group and 4·2%
(3·0—5·4) in the control group (p=0·0065); combined, it
was 3·1% (2·4—3·9). ...The risk ratio of HIV acquisition
in the circumcision group compared with the control group
was 0·47 (95% CI 0·28—0·78), which corresponds to a
reduction in the risk of acquiring an HIV infection in the
circumcision group of 53% (22%—72%).”
CI Interpretation
• Tricky to be technically accurate and still be
understood by anyone but statisticians!
• But it's not *too* bad to say “statistically, there is a 95% chance that the true effect is not outside this range”
• Most important point is to remember that the point
estimate is probably not exact
• CI depends on the particular statistical test used,
the amount of random variation in the data
collection process and the sample size
“Statistical Significance” and p-values
• “Statistically significant” does not mean a
result or inference is necessarily true or
accurate, and “non-significant” does not
mean an effect is necessarily not real
• Significance tests are associated with p-values; a particular p-value is calculated
from a particular statistical test on the
observed data
P-values
• The p-value is the probability of obtaining
the observed result, or one more extreme, if
no real effect exists
• Back to the circumcision trial example, the
calculated p-value (on the Z-test of the
difference in 2-year HIV incidence in
circumcision group, 2.1% vs control, 4.2%)
was .0065 (a sketch of such a test follows below).
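A sketch of a two-proportion z-test of this kind. The seroconversion counts match the example above, but the group sizes of 1,000 are assumed for illustration, so the resulting p-value will not match the published one exactly:

```python
# Sketch: two-sided z-test for a difference between two proportions.
import math

x1, n1 = 22, 1000   # events / total, circumcision group (n1 hypothetical)
x0, n0 = 47, 1000   # events / total, control group (n0 hypothetical)

p1, p0 = x1 / n1, x0 / n0
pooled = (x1 + x0) / (n1 + n0)   # pooled proportion under the null hypothesis
se = math.sqrt(pooled * (1 - pooled) * (1/n1 + 1/n0))
z = (p1 - p0) / se

p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p from the z statistic
print(f"z = {z:.2f}, p = {p_value:.4f}")
```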
Statistical significance
• “Statistically significant” is usually applied to results where the calculated p-value is less than .05 (sometimes set at a different level), the flip side of a 95% confidence interval
• Sometimes researchers say “significant at the .05
level” or “highly significant” at “.01” or “less than
.001,” etc.
• Sometimes researchers say for p-values greater
than .05 but less than .10 that there was “a trend
toward significance”
Dangers of Significance Testing
• Significance tests provide an arbitrary but
accepted way to compare the relative
strength of associations between exposure
and observed effect across studies, and
helps make objective “decision rules”
• But they are often misinterpreted and
misused in ways that prevent us from applying
critical judgment to make maximum use of
information from studies
Dangers, continued
• A common misinterpretation of a p-value less than .05: “we can be 95% certain that the observed difference between groups, or the observed effect, is real and could not have happened by chance”
• A p-value less than .05 allows us to say that there is strong evidence against there being no true effect (aka “for rejecting the null hypothesis”), but it does NOT prove that the alternative is true, i.e. that the observed effect is real
More dangers
• Statistical significance does not necessarily equal clinical or “real world” significance. Very large sample sizes can render almost any difference statistically significant (see the sketch below)
• Focusing only on statistical significance can lead us to ignore some real effects and exaggerate others
• Arguably, a significance test is only truly needed when there’s a close call!
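The sample-size point can be seen directly: the sketch below tests the same one-percentage-point difference at two made-up sample sizes, reusing the z-test from earlier:

```python
# Sketch: how sample size alone can make a tiny difference "significant".
import math

def two_prop_p(x1, n1, x0, n0):
    """Two-sided z-test p-value for a difference between two proportions."""
    p1, p0 = x1 / n1, x0 / n0
    pooled = (x1 + x0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1/n1 + 1/n0))
    return math.erfc(abs((p1 - p0) / se) / math.sqrt(2))

# 51% vs. 50%: the same small difference, two sample sizes (hypothetical)
print(two_prop_p(51, 100, 50, 100))                        # ~0.89: "non-significant"
print(two_prop_p(510_000, 1_000_000, 500_000, 1_000_000))  # ~0: "highly significant"
```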
Conclusion
• Question, question, question
• Does the basic study design make sense? Is it asking an important question in the right way? Do the
comparison groups seem equivalent? Are the
observed differences likely to be significant in the
real world? If the observed results are surprising,
can you possibly explain the discrepancy or see
what further studies are necessary?
• Replication and confirmation are usually key.