Chapter 11: Tests of Comparison

Chapter 11
Tests of Comparison
Copyright © 2011 Wolters Kluwer Health | Lippincott Williams & Wilkins
Chapter Overview
• Statistical procedures used to test hypotheses and investigate differences between
groups or within groups across time.
• Types of data and the differences between parametric and nonparametric statistics.
• Concepts of Type I and II error, statistical power, and interaction between
independent variables.
• Introduction to planned and post-hoc comparisons and analysis of covariance.
• The use of t-tests and the link between t-tests and ANOVA.
• Introduction to issues of statistical significance, clinical meaningfulness, confidence
interval analysis, and effect size provides a context for the critical appraisal of clinical
research.
• Overview of nonparametric test of comparison and a working example of a
Mann–Whitney U test.
• All practitioners claiming to practice from an evidence base must also understand the
principles of statistics.
Selecting Statistics and Types of Data
• Data are categorized as:
– Nominal
– Ordinal
– Interval
– Ratio
• Nominal simply means “to name.” The assignment of
numeric values for analysis of nominal data is arbitrary.
• Ordinal data are ordered in a particular and meaningful
manner (e.g., numeric pain scales)
• Nonparametric statistical methods of comparison are
used to analyze nominal and ordinal data.
• Parametric statistics are appropriate for analyzing
interval and ratio data under most circumstances.
Interval and Ratio Data
• Interval and ratio data permit the calculation and useful
understanding of a mean or average value and a standard deviation.
• The mode is the most useful measure of central tendency when
analyzing nominal data.
• For ordinal data, the appropriate measure of central tendency is the median,
and the range can be reported as a measure of dispersion for some ordinal
scales.
• Interval data: “Interval” implies that the differences between points
of measure are consistent and meaningful.
• Ratio data: Similar to interval data but with an absolute 0, so the data
can yield meaningful ratio values.
• For interval data, the absence of an absolute 0 precludes the calculation
of meaningful ratios.
• In all other respects, interval and ratio data are similar. Both types of
data are analyzed with the same statistical procedures.
Differences Between Nonparametric
and Parametric Procedures
• Variance and standard deviation are measures of
dispersion for interval and ratio data.
• Median and range value are reported for ordinal data
and the mode for nominal data.
• Parametric statistics analyze the distribution of
variance, hence the term “analysis of variance
(ANOVA).”
• Variance is based on the squared differences between each score or
value and a mean: the sum of those squared differences divided by
the number of scores minus 1.
• Standard deviation is the square root of variance.
• Variance from the mean can be used to compare sets
of data.
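A minimal sketch of how a sample variance and standard deviation are computed. The scores are hypothetical values chosen for illustration, and the variable names are assumptions rather than anything from the chapter.

```python
import math
import statistics

# Hypothetical interval-level scores (illustration only, not from the chapter).
scores = [4, 6, 8, 10, 12]

mean = sum(scores) / len(scores)

# Variance: sum of squared differences from the mean, divided by n - 1.
sum_of_squares = sum((x - mean) ** 2 for x in scores)
sample_variance = sum_of_squares / (len(scores) - 1)

# Standard deviation: square root of the variance.
sample_sd = math.sqrt(sample_variance)

# Cross-check the hand calculation against the standard library.
assert math.isclose(sample_variance, statistics.variance(scores))

print(mean, sample_variance, round(sample_sd, 3))
```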
Steps to Complete ANOVA
Steps in Preparing and Completing Analysis of Variance
1. Formulate an answerable question that includes identifying
independent and dependent variables from a research idea.
2. Write the research question in a null form. Abbreviate the null
as “NES = no NES”
3. Collect data.
4. Organize data, and perform analysis of variance.
5. Interpret the results and report findings.
Analysis of Variance
• The purpose of most research involving comparisons is to
infer the results to the population.
• The analysis estimates the probability that the differences
found reflect real population differences.
• Statistical analysis asks if the data collected provide sufficient
evidence to determine a difference in an entire population.
• Statistical tests of comparison are used to test a null
hypothesis, for example that two sets of data are drawn from the
same population.
• Rejecting the null: the groups are different.
• It is not possible to accept a null since two groups will not
truly be equal. If we fail to reject a null, then we must
suspend judgment as to whether the groups differ.
F-Value and Unexplained Variance
• The result of ANOVA is an F value, which is a point on an
F distribution that permits estimates of probability.
• The formula for an F is a ratio of variance estimates—thus
the term “analysis of variance.”
• F = mean square explained / mean square
unexplained (also sometimes referred to as “ms error”).
• A mean square is essentially the sum of the squared
differences between each score and a mean, divided by the
number of scores minus 1.
• Unexplained variance is variation from the mean that is
attributed to factors beyond the scope of the research
design.
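The F ratio can be sketched by hand for a one-way, between-subjects design. This is an illustrative sketch under that assumption only; the function name `one_way_anova_f` and the three small groups are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_explained, df_unexplained) for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Explained (between-groups) sum of squares: group means vs. the grand mean.
    ss_explained = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Unexplained (within-groups) sum of squares: scores vs. their own group mean.
    ss_unexplained = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_explained = len(groups) - 1
    df_unexplained = len(all_scores) - len(groups)

    ms_explained = ss_explained / df_explained
    ms_unexplained = ss_unexplained / df_unexplained  # also called "ms error"
    return ms_explained / ms_unexplained, df_explained, df_unexplained


# Hypothetical data: three groups of three scores each.
f_value, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_value, 3), df1, df2)
```

The probability associated with the resulting F would then be read from the F distribution with the two degrees-of-freedom values returned.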
Interpreting F
• F is a point on a distribution.
• There are an infinite number of F distributions that are reflective
of the number of degrees of freedom in the numerator and
denominator.
• The larger the F (ratio of explained / unexplained variance), the
less likely that the differences observed were chance occurrences.
• By convention, researchers are generally willing to accept less
than a 5% risk that an F value obtained is a chance occurrence.
• When the F value exceeds the critical value for the chosen alpha level, we
reject the null hypothesis and conclude that the differences observed are
unlikely to be chance occurrences.
• The alpha value specifies the level of accepted risk of incorrectly
concluding that observed differences reflect true differences in a
population (e.g., an alpha of 0.05 is a risk of 5 in 100).
Alpha Values and Types of Error
• Type I error occurs when a null is rejected when in fact
population differences do not exist. The alpha value is really
the level of risk of Type I error.
• Type II error occurs when a null is not rejected yet a study
of the population would reveal differences between groups.
• Researchers guard against Type I error by selecting the
alpha level.
• Statistical power is required to decrease the risk of Type II
error.
• Power is influenced by 3 factors:
– The mean difference between groups
– The variance within groups
– Sample size
• In reality, the only factor investigators can control is sample
size.
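The influence of the three factors on power can be illustrated with a rough sketch. It assumes two equal-sized groups and uses a large-sample normal approximation rather than the exact t-based calculation; the function name `approximate_power` and the input values are hypothetical.

```python
from statistics import NormalDist


def approximate_power(mean_difference, sd_within, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group comparison (normal approximation)."""
    z = NormalDist()
    # Standard error of the difference between two group means.
    standard_error = sd_within * (2 / n_per_group) ** 0.5
    z_critical = z.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value when the true difference exists.
    return z.cdf(abs(mean_difference) / standard_error - z_critical)


# Hypothetical scenario: true difference of 5 units, within-group SD of 10.
for n in (16, 32, 64, 128):
    print(n, round(approximate_power(5, 10, n), 2))
```

Under these assumptions, power rises as sample size increases, as the mean difference grows, or as within-group variance shrinks.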
Complexity and Interaction
• When a between-subjects variable and a within-subjects
variable exist in a research design, the design may be
referred to as a “mixed model” (very common in health
care research).
• It is possible to have multiple between-subjects and within-subjects variables within a research design.
• Greater complexity in research designs and data analysis is
not necessarily an indicator of better research.
• Significant interaction: “significant” suggests that the
finding is a reflection of a population phenomenon.
• To better understand how variables interact, you can turn to
tables that include “cell” means and standard deviations.
• To interpret the meaning of interactions between variables,
use graphic representation.
Levels of Variables, Planned
Comparison, and Post-Hoc Analysis
• There may be multiple levels within a variable.
– Example: Time as a within-subject variable.
• Addition of levels of independent variables can maximize
efficiencies and yield greater insights into the interactions
between the variables of interest.
• Comparisons between pairs of means: pre-planned
pairwise comparison.
• Post-hoc tests: Tukey, Scheffé, and Bonferroni
procedures.
• When one encounters reference to procedures of post-hoc
testing, the investigator is conveying that additional
analyses were performed to isolate the sources of
significant differences between sets of scores.
• Risk of Type I error exists with each analysis performed.
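A short sketch of how the risk of Type I error accumulates across analyses, and how a Bonferroni adjustment responds. It assumes the comparisons are independent, which is a simplification; the function names are hypothetical.

```python
def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one Type I error across independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha that keeps the overall risk near the original alpha."""
    return alpha / n_comparisons


# Three pairwise comparisons (e.g., among three levels of a variable).
print(round(familywise_error_rate(0.05, 3), 4))
print(round(bonferroni_alpha(0.05, 3), 4))
```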
Analysis of Covariance
• ANCOVA signifies analysis of covariance.
• ANCOVA is a special case of ANOVA in which a variable
is introduced for the purpose of accounting for
unexplained variance.
• ANCOVA can increase statistical power.
• MANOVA refers to multivariate analysis of variance, or
cases where more than one dependent measure is
analyzed simultaneously.
• MANOVA is best applied when the investigator is
interested in the effect of the independent variable(s)
on the collection of dependent variables.
T-Tests
• t-tests are a special case of ANOVA in which there are only two sets of
data in the comparison.
• $t^2 = F$
• t values are points on a curve, and there are an infinite number of t
distributions.
• Each t distribution corresponds to the DF associated with unexplained variance.
The DF associated with explained variance is always 1.
• $t = \dfrac{\text{mean}_A - \text{mean}_B}{s_{pool}\sqrt{1/n_a + 1/n_b}}$
• Standard deviation (SD) is the square root of variance. Thus, it's the
link between the formula for t and ANOVA.
• t values may be positive or negative. (F values are always positive.)
• With t there is a choice of null: A = B (two-tailed), or a directional null of
A ≥ B or A ≤ B (one-tailed).
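The link between t and F can be checked numerically. The sketch below assumes two independent groups and a pooled standard deviation; the function names and the data are hypothetical.

```python
import math


def pooled_t(a, b):
    """Independent-samples t with a pooled standard deviation. Returns (t, df)."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = n_a + n_b - 2  # DF associated with unexplained variance
    s_pool = math.sqrt((ss_a + ss_b) / df)
    t = (mean_a - mean_b) / (s_pool * math.sqrt(1 / n_a + 1 / n_b))
    return t, df


def two_group_f(a, b):
    """F ratio for the same two groups (explained DF is 1)."""
    scores = a + b
    grand_mean = sum(scores) / len(scores)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ms_explained = len(a) * (mean_a - grand_mean) ** 2 + len(b) * (mean_b - grand_mean) ** 2
    ss_unexplained = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    return ms_explained / (ss_unexplained / (len(scores) - 2))


group_a, group_b = [1, 2, 3], [5, 6, 7]  # hypothetical scores
t_value, df = pooled_t(group_a, group_b)
f_value = two_group_f(group_a, group_b)
print(round(t_value, 3), round(t_value ** 2, 3), round(f_value, 3))
```

Here t is negative because group A has the lower mean, while the squared value matches F.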
Significance and Confidence
Intervals
• ANOVA, t-tests, and the nonparametric procedures address
the probability that differences observed reflect population
differences but do not address the magnitude of
differences.
• It is possible to reject a null hypothesis when the magnitude
of the difference is of little clinical consequence or, conversely,
to fail to reject a null when the possibility of clinically
meaningful differences exists.
• The solution to this problem is the reporting of confidence
intervals.
• Focus on the interpretation of confidence intervals, and
provide only one example of the calculation process.
• A statistically significant difference may not reflect
clinically meaningful differences; thus, additional
information may be needed before deciding on a plan of
care.
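A sketch of a confidence interval for the difference between two group means, assuming independent groups and a pooled standard deviation. The critical t value is passed in by the caller (here 2.776, the two-tailed 95% value for 4 degrees of freedom from a standard t table); the function name and the data are hypothetical.

```python
import math


def mean_difference_ci(a, b, t_critical):
    """Confidence interval for mean(a) - mean(b) using a pooled SD."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    s_pool = math.sqrt((ss_a + ss_b) / (n_a + n_b - 2))
    standard_error = s_pool * math.sqrt(1 / n_a + 1 / n_b)
    difference = mean_a - mean_b
    margin = t_critical * standard_error
    return difference - margin, difference + margin


# Hypothetical scores; df = 3 + 3 - 2 = 4, so t_critical = 2.776 for 95%.
lower, upper = mean_difference_ci([5, 6, 7], [1, 2, 3], t_critical=2.776)
print(round(lower, 2), round(upper, 2))
```

Reading the interval against a clinically meaningful threshold shows both whether a difference is likely and how large or small it might plausibly be.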
Effect Size
• Effect size is a calculation that shows the typical response to
intervention. It is a useful approach to understanding what the
observed differences between groups mean in terms of
magnitude of effect.
• Effect size calculations place the magnitude of differences
between groups in the context of group variance.
• Jacob Cohen (1988): Cohen’s d is one of the most commonly
referenced methods of calculating effect size:
• $d = \dfrac{\text{mean}_a - \text{mean}_b}{s}$
• s is the pooled standard deviation estimate:
• $s = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2}}$
Effect Size
• Hedges (1981): Similar equation to Cohen's.
Yields slightly lower effect size estimates because
the denominator is based on the degrees of freedom:
$s = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
• Effect size of
– 0.2 represents a small effect.
– 0.5 represents a moderate effect.
– > 0.8 represents a large effect.
• These values are based in social science rather
than in biomedical research.
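The two pooled estimates can be compared side by side. This sketch follows the two denominators given on these slides (n1 + n2 for the Cohen form, n1 + n2 - 2 for the Hedges form); the function names and summary statistics are hypothetical.

```python
import math


def pooled_sd(n1, s1, n2, s2, use_degrees_of_freedom=False):
    """Pooled SD; the denominator is n1 + n2, or n1 + n2 - 2 when requested."""
    denominator = n1 + n2 - 2 if use_degrees_of_freedom else n1 + n2
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / denominator)


def effect_size(mean_a, mean_b, n1, s1, n2, s2, use_degrees_of_freedom=False):
    """Mean difference expressed in pooled standard deviation units."""
    return (mean_a - mean_b) / pooled_sd(n1, s1, n2, s2, use_degrees_of_freedom)


# Hypothetical summary data: means 25 and 20, SD 5 in each group, n = 10 each.
cohen_form = effect_size(25, 20, 10, 5, 10, 5)
hedges_form = effect_size(25, 20, 10, 5, 10, 5, use_degrees_of_freedom=True)
print(round(cohen_form, 3), round(hedges_form, 3))
```

Dividing by the degrees of freedom gives a slightly larger pooled s, so the resulting effect size is slightly smaller; the gap narrows as sample size grows.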
Nonparametric Statistics
• The terms nonparametric and distribution-free can be used
interchangeably.
• When parametric analyses are completed, it is assumed that data
are based on observations of a normally distributed population with
similar variance, and that samples are drawn at random.
• If these assumptions are not met, nonparametric procedures may
be the appropriate analytical methods. Nonparametric statistics
test hypotheses about medians or nominal data distribution.
• Modest violations of the assumptions are unlikely to have a substantial
impact on the statistical outcome, as procedures such as ANOVA
are robust.
• Three of the most common nonparametric procedures:
– Mann–Whitney U
– Kruskal–Wallis One-Way Analysis of Variance by Ranks
– Friedman Two-Way Analysis of Variance by Ranks
Mann–Whitney U Test
• The Mann–Whitney U test is analogous to the independent-samples
t-test (two unrelated groups).
• The analysis tests the null hypothesis that the median
score in one group (A) is < or = to the median score
of a second group (B) (A < or = B).
• If the analysis reveals the median of B > A, then one
might reject the null hypothesis.
• The Mann–Whitney U result is designated as a T.
• As with parametric tests, the null hypothesis (A < or =
B) is rejected only if the probability of obtaining the observed T value is sufficiently small (e.g., less than 5%).
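The ranking step behind the test can be sketched as follows. This is an illustration only: it computes the rank sum T for group A and the corresponding U values, but not the probability, which would come from a table or statistical software. The function names and scores are hypothetical.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging ranks across ties."""
    ordered = sorted(values)
    return {v: ordered.index(v) + (ordered.count(v) + 1) / 2 for v in set(ordered)}


def mann_whitney(a, b):
    """Return (T, U_a, U_b): rank sum of group a and the two U statistics."""
    ranks = average_ranks(a + b)  # rank all scores together, ignoring group
    t_a = sum(ranks[x] for x in a)
    u_a = t_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return t_a, u_a, u_b


# Hypothetical ordinal scores (e.g., pain ratings) for two unrelated groups.
group_a = [3, 5, 7]
group_b = [4, 6, 8, 9]
print(mann_whitney(group_a, group_b))
```

A small rank sum for group A relative to its size suggests that its scores tend to sit below those of group B.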
Kruskal–Wallis One-Way and
Friedman Two-Way
• A Kruskal–Wallis One-Way Analysis of Variance by Ranks is
appropriate when there are more than two groups.
– The result is an H-value.
– The probability of H can be found by consulting a table
specific to this analysis.
• A Friedman Two-Way Analysis of Variance by Ranks is
appropriate for analyses in which there are repeated measures
within one group.
• None of these nonparametric tests allow for the analysis of
repeated measures from multiple groups, known as a mixed
model design.
• This represents one of the major limitations of these statistical
tests in clinical research.
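A sketch of how the H-value for a Kruskal–Wallis analysis can be computed from ranks. It uses the basic formula without a correction for tied scores, which is an assumption of this example; the function names and data are hypothetical.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging ranks across ties."""
    ordered = sorted(values)
    return {v: ordered.index(v) + (ordered.count(v) + 1) / 2 for v in set(ordered)}


def kruskal_wallis_h(groups):
    """H statistic from pooled ranks (no tie correction applied)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    # Sum over groups of (rank sum squared / group size).
    rank_sum_term = sum(sum(ranks[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)


# Hypothetical scores for three unrelated groups.
h_value = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_value, 2))
```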
Chapter Summary and Key Points
• Statistics do not prove anything.
• Do not accept the conclusions of a research
report as an absolute or final answer.
• Numbers can lie, and the misinterpretation of data
and statistical analyses can mislead.
• Most students preparing for careers in health care
are not fond of statistics.
• Careful consideration and critical appraisal inform
quality clinical practice; thus, it is necessary to
understand the principles of statistics.