Subset Working Group

Subgroup Report 7/28/06
Our Aims
• Purpose of future work: write (at least) one
paper describing the landscape of
appropriate analytic options.
• Purpose of paper is to educate writers and
readers of research papers with respect to
assessing the ‘quality’ and validity of
results.
• Purpose of this talk: elicit group feedback
on our paper structure and concepts.
Not our aims:
• We are not discussing particular methodology,
but rather general principles that provide a
framework in which different methodologies can
be incorporated.
• We don’t overlap with CONSORT (Consolidated
standards of reporting trials), which gives
detailed advice for reporting randomized
controlled trials.
• We deal with reporting more generally on
subgroup research, not necessarily on
specifically randomized trials.
Propose 3 levels of analysis
People will look at their data in great detail
no matter what we say. We are trying to
tell them how to report what they find or
interpret what other people report.
So we suggest studies should be formulated
at 3 levels of analysis: Primary,
secondary, and tertiary. Investigators
should clearly specify where they are in
this system – to facilitate credibility.
Primary analyses
• There should be a very limited set of major
outcomes (often only one) of primary
concern. This level usually doesn’t include
separate subgroup analyses – so is not
really covered in our paper!
Secondary analyses
• Here we are specifically focusing on
planned subgroup analyses. These may
be carried out whether the primary
analyses are significant or not.
• multiplicity correction methods
Tertiary analyses
• Unplanned analyses. We ask authors to
explicitly identify when they move into this
level of analysis!
• Examine data in many ways, a.k.a. data
mining, data dredging, exploratory
analyses.
Alert readers that results are derived within
an unplanned analysis format.
Reporting/interpreting: 1°
• Primary analysis(-es): sound methodology
(for analyses). If the studies aren’t
randomized, there may be different
explanations for the results (e.g.
confounding factors), but the results
themselves have a defined statistical
justification.
• Usual alpha allocation.
Reporting/interpreting: 2°
• Secondary analyses: we propose a
second allotment of alpha ≤ primary’s
alpha for this entire set of analyses (no
matter how large a set!)
• Results will have some statistical
justification, but should be considered
promising, with independent replication
strongly advised before the results are
acted upon.
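As a minimal sketch of how a single secondary allotment of alpha could be spread over a whole set of planned subgroup analyses, the example below applies Holm's step-down procedure. Holm is only one of several multiplicity corrections that would fit here; the slides do not name a specific method. The subgroup names, p-values, and the choice of a secondary alpha equal to the primary alpha are all hypothetical.

```python
def holm_reject(p_values, alpha_secondary):
    """Holm's step-down procedure: return True for each hypothesis rejected
    while controlling the familywise error rate at alpha_secondary."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with
        # alpha / (m - k + 1), which equals alpha / (m - rank).
        if p_values[idx] <= alpha_secondary / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical planned subgroup analyses (illustration only, not real data).
subgroup_p = {
    "age < 60": 0.004,
    "age >= 60": 0.030,
    "prior event": 0.012,
    "no prior event": 0.400,
}

alpha_primary = 0.05
alpha_secondary = 0.05  # assumed; the proposal only requires <= alpha_primary

names = list(subgroup_p)
decisions = holm_reject([subgroup_p[n] for n in names], alpha_secondary)
for name, rejected in zip(names, decisions):
    label = "promising, replicate" if rejected else "not significant"
    print(f"{name}: p = {subgroup_p[name]:.3f} -> {label}")
```

The whole set shares the one allotment regardless of its size, so adding more planned subgroups makes each individual threshold stricter.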
Reporting/interpreting: 3°
• Results must be considered speculative. Any
reported p-values or effect sizes are purely
descriptive, since they do not take the multiplicity
of possible inferences into account. It may be
possible to speculate on multiplicity adjustment,
but this is usually problematic.
• Any findings here should include other related
evidence to facilitate decisions on whether to
pursue them (reporter’s [or reader’s]
perspective) or believe them (reader’s
perspective).
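A short worked calculation shows why unadjusted p-values are only descriptive at this level. It assumes m independent tests, each run at level α with every null hypothesis true; the values α = .05 and m = 20 are chosen for illustration.

```latex
\[
  \Pr(\text{at least one false positive})
    = 1 - (1 - \alpha)^{m}
    = 1 - (0.95)^{20}
    \approx 0.64
\]
```

In an unplanned analysis m is usually not even known, which is why a retrospective multiplicity adjustment is hard to defend.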
Summary of proposed continuum:
• Primary level: Allow 5% for Type I error (or
other specified value), as usual
• Secondary level: Allow some alpha ≤ the
specified primary alpha
• Tertiary level: Presumably it is impossible
to define ‘statistical significance’ here. It
may be possible retrospectively, but this is
unlikely. Authors can report individual
p-values and/or effect sizes, although these
are generally only descriptive.
Summary of proposed paper(s)
• One paper is drafted: a fuller version to go
to SMMR, and possibly a briefer version (see
previous slide) to JAMA.
• Main reference study for SMMR paper is
from WHI, to provide unifying context for
explanations.
• Need other/additional exemplars too (see
below).
What do we need?
• Input from you!
• References from you!
• A time line… to be announced.
• URL to be maintained at SAMSI!
• Alias to be maintained at SAMSI!
Input/ references?
• Input from SAMSI workshoppers <worshippers?>
– please check our draft
• http://www.samsi.info/200506/multiplicity/workinggroup/sa/index.html
• Looking for examples of well-done, well-reported
studies, even for null findings
– Specifically, exemplars of the types/situations described
in the paper at the 3 levels
• Researchers you know/respect/work with whose
names will lead to excellent hits in Medline.
• Please consider these options; email us with
anything you think of!!
SAMSI resources?
• URL to be maintained at SAMSI!
– Do we need to put publications (references
we’ve been sharing) that are currently posted
on the URL behind a password ??
• Alias to be maintained at SAMSI!
• Return tickets?
THANK YOU SAMSI!!
What if you find something at the
exploratory level?
When and how should follow-up studies be
performed?
• Existence of prior published relevant
results
• “Plausible” explanations
• Possible confounding factors
• Strength of evidence: p-values, effect
sizes, posterior probabilities
Other issues
1. A priori vs post hoc comparisons:
The wrong thing to emphasize.
The important issue is whether there is
some multiple error control over the set of
comparisons. For example, post hoc
comparisons are fine when using Scheffé's
method, whereas planning 20 comparisons
without multiplicity adjustment is bad.
2. Hierarchical analyses
• Can gain power by using analyses where
you don’t continue unless something is
significant. E.g. degree of a polynomial.
• Don’t test for linear unless constant is
significant. Don’t test for quadratic unless
linear is significant. Etc. Every test can
be at .05 and familywise error rate is .05.
• Interaction: Qualitative interaction is more
important than quantitative interaction.
• Either test hypothesis of no qualitative
interaction (Gail and Simon, Biometrics
1985) or test hypothesis of qualitative
interaction (Shaffer, Psychometrika, 1991).
Which?
• Consider testing for differences in
distributions of subgroups rather than
differences in means.
• Sometimes leads to different tests, and
often leads to different interpretations of
tests.
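The hierarchical scheme described under item 2 (do not test a term unless the previous one is significant) can be sketched as a fixed-sequence procedure. This is a minimal illustration, assuming the order of the tests is fixed before looking at the data; the p-values for the polynomial terms are hypothetical.

```python
def fixed_sequence(p_values, alpha=0.05):
    """Test hypotheses in a pre-specified order, each at level alpha.
    Testing stops at the first non-significant result; every later
    hypothesis is left unrejected regardless of its p-value."""
    decisions = []
    stopped = False
    for p in p_values:
        if not stopped and p <= alpha:
            decisions.append(True)
        else:
            stopped = True
            decisions.append(False)
    return decisions


# Hypothetical p-values for polynomial terms, in the pre-specified order.
polynomial_p = [
    ("constant", 0.001),
    ("linear", 0.020),
    ("quadratic", 0.300),
    ("cubic", 0.010),
]

results = fixed_sequence([p for _, p in polynomial_p], alpha=0.05)
for (term, p), significant in zip(polynomial_p, results):
    print(f"{term}: p = {p:.3f} -> {'significant' if significant else 'not tested / not significant'}")
```

Here the cubic term is not declared significant even though its p-value is below .05, because the sequence stopped at the quadratic term. That stopping rule is what lets every test run at the full .05 while keeping the familywise error rate at .05.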
Type III errors
• In some contexts directional errors are
worse than false non-zero decisions.
• Ex: Comparing medications
• In some contexts directional errors are
less important than false non-zero
decisions.
• Ex: Perhaps microarray analysis?
• (Shaffer, Psychological Methods 2002)
We can’t expect perfection
• Some results of good studies will be Type I
errors (probability = .05)
• Some results of good studies will be Type
II errors (probability = .20 with power of .80).