Statistical Methods A Brief Review

Statistical Methods in Clinical
Research
James B. Spies M.D., MPH
Professor of Radiology
Georgetown University School of Medicine
Washington, DC
Overview

Data types
Summarizing data using descriptive statistics
Standard error
Confidence intervals
P values
One vs two tailed tests
Alpha and Beta errors
Sample size considerations and power analysis
Statistics for comparing 2 or more groups with continuous data
Non-parametric tests
Regression and Correlation
Risk Ratios and Odds Ratios
Survival Analysis
Cox Regression
Further Study

Medical Statistics Made Easy

M. Harris and G. Taylor
Informa Healthcare UK

Distributed in US by:

Taylor and Francis
6000 Broken Sound Parkway, NW Suite 300
Boca Raton, FL 33487
1-800-272-7737
Types of Data

Discrete data: limited number of choices

Binary: two choices (yes/no)
Examples: dead or alive, disease-free or not

Categorical: more than two choices, not ordered
Examples: race, age group

Ordinal: more than two choices, ordered
Examples: stages of a cancer, Likert scale for response (e.g. strongly agree, agree, neither agree nor disagree, etc.)
Types of Data

Continuous data

Theoretically infinite possible values (within physiologic limits), including fractional values
Examples: height, age, weight

Can be interval
Interval between measures has meaning.
Ratio of two interval data points has no meaning.
Examples: temperature in Celsius, day of the year

Can be ratio
Ratio of the measures has meaning.
Examples: weight, height
Types of Data

Why important?
The type of data defines:

The summary measures used
Mean, standard deviation for continuous data
Proportions for discrete data

The statistics used for analysis
Examples:
T-test for normally distributed continuous data
Wilcoxon Rank Sum for non-normally distributed continuous data
Descriptive Statistics

Characterize data set

Graphical presentation
Histograms
Frequency distribution
Box and whiskers plot

Numeric description
Mean, median, SD, interquartile range
Histogram
Continuous Data
No segmentation of data into groups
Frequency Distribution
Segmentation of data into groups
Discrete or continuous data
Box and Whisker Plots
Popular in Epidemiologic Studies
Useful for presenting comparative data graphically
Numeric Descriptive Statistics

Measures of central tendency of data




Mean
Median
Mode
Measures of variability of data


Standard Deviation
Interquartile range
Sample Mean

Most commonly used measure of central tendency

Best applied in normally distributed continuous data.

Not applicable in categorical data

Definition:

Sum of all the values in a sample, divided by the number of
values.
Sample Median


Used to indicate the “average” in a skewed population
Often reported with the mean
If the mean and the median are close, the distribution is roughly symmetric, which is consistent with (but does not prove) a normal distribution.

It is the middle value from an ordered listing of the values
If an odd number of values, it is the middle value
If an even number of values, it is the average of the two middle values.
Mid-value in interquartile range
Sample Mode

Infrequently reported as a value in studies.

Is the most common value

More frequently used to describe the
distribution of data

Uni-modal, bi-modal, etc.
Interquartile range

Is the range of data from the 25th percentile
to the 75th percentile

Common component of a box and whiskers
plot


It is the box, and the line across the box is the
median or middle value
Rarely, mean will also be displayed.
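
The summary measures described above can be tried on a small data set. The following is a minimal sketch using Python's standard library; the sample values (lengths of stay in days) are made up for illustration, and the exact quartile values depend on the quantile method the library uses by default.

```python
import statistics

# Hypothetical, made-up sample: length of hospital stay in days.
# The single large value (21) skews the data to the right.
stay_days = [2, 3, 3, 4, 5, 5, 5, 7, 9, 21]

mean = statistics.mean(stay_days)      # sum of the values / number of values
median = statistics.median(stay_days)  # middle value of the ordered list
mode = statistics.mode(stay_days)      # most common value
sd = statistics.stdev(stay_days)       # sample standard deviation

# Quartile cut points: the interquartile range runs from q1 (25th percentile)
# to q3 (75th percentile); q2 is the median.
q1, q2, q3 = statistics.quantiles(stay_days, n=4)

print("mean:", mean, "median:", median, "mode:", mode)
print("SD:", round(sd, 2), "IQR:", (q1, q3))
```

In this skewed sample the mean is pulled above the median by the one long stay, which is why the median is the preferred "average" for skewed data.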
Standard Error

A fundamental goal of statistical analysis is to
estimate a parameter of a population based on a
sample

The values of a specific variable from a sample are
an estimate of the entire population of individuals
who might have been eligible for the study.

A measure of the precision of a sample in estimating
the population parameter.
Standard Error

Standard error of the mean
Standard deviation / square root of (sample size)

Standard error of the proportion (if sample greater than 60)
Square root of (proportion X (1 - proportion) / n)

Important: dependent on sample size
The larger the sample, the smaller the standard error.
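
The two standard error formulas can be written out directly. This is an illustrative sketch only; the function names and the example numbers are assumptions, not part of the lecture.

```python
import math
import statistics

def se_mean(values):
    """Standard error of the mean: sample SD / sqrt(sample size)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example: a 30% event rate observed in 100 vs 400 patients.
small_study = se_proportion(0.30, 100)
large_study = se_proportion(0.30, 400)
print(round(small_study, 4), round(large_study, 4))
```

Quadrupling the sample size halves the standard error, because the sample size enters under a square root.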
Clarification

Standard Deviation measures the
variability or spread of the data in an
individual sample.

Standard error measures the precision
of the estimate of a population
parameter provided by the sample
mean or proportion.
Standard Error

Significance:

Is the basis of confidence intervals

A 95% confidence interval is defined by


Sample mean (or proportion) ± 1.96 X standard error
Since standard error is inversely related to the
sample size:

The larger the study (sample size), the smaller the
confidence intervals and the greater the precision of the
estimate.
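
As a rough sketch of the rule above (estimate ± 1.96 X standard error), the interval for a proportion can be computed as follows. The helper names and the 30% example are hypothetical.

```python
import math

def ci95(estimate, standard_error):
    """95% confidence interval: estimate +/- 1.96 x standard error."""
    half_width = 1.96 * standard_error
    return estimate - half_width, estimate + half_width

def ci95_proportion(p, n):
    """95% CI for a proportion, using sqrt(p * (1 - p) / n) as the SE."""
    return ci95(p, math.sqrt(p * (1 - p) / n))

# Hypothetical example: the same 30% proportion in a small and a large study.
low_100, high_100 = ci95_proportion(0.30, 100)
low_1000, high_1000 = ci95_proportion(0.30, 1000)
print(round(low_100, 3), round(high_100, 3))
print(round(low_1000, 3), round(high_1000, 3))
```

The larger study gives the narrower interval around the same point estimate.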
Confidence Intervals

May be used to assess a single point
estimate such as mean or proportion.

Most commonly used in assessing the
estimate of the difference between two
groups.
Confidence Intervals
Commonly reported in studies to provide an estimate of the precision
of the mean.
P Values

The probability of obtaining a result at least as extreme as the one observed by chance alone, assuming that the null hypothesis is true

Typically, an estimate that has a P value of 0.05 or less is considered to be “statistically significant” or unlikely to occur due to chance alone.

The P value threshold used is an arbitrary value

P value of 0.05 equals 1 in 20 chance
P value of 0.01 equals 1 in 100 chance
P value of 0.001 equals 1 in 1000 chance
P Values and Confidence
Intervals

P values provide less information than confidence intervals.

A P value provides only the probability that a result at least this extreme would arise by chance if the null hypothesis were true.

A P value could be statistically significant but of limited clinical significance.
A very large study might find that a difference of 0.1 on a VAS scale of 0 to 10 is statistically significant, but it may be of no clinical significance.

A large study might find many “significant” findings during multivariable analyses.
“A large study dooms you to statistical significance” (Anonymous Statistician)
P Values and Confidence
Intervals

Confidence intervals provide a range of plausible values of the population mean

With repeated sampling, a 95% confidence interval contains the true population value 95% of the time.
For most tests of a difference, if the confidence interval includes 0, then the result is not significant.
Ratios: if the CI includes 1, then the result is not significant.

If a confidence interval range is very wide, then plausible values might range from very low to very high.

Example: A relative risk of 4 might have a confidence interval of 1.05 to 9, suggesting that although the estimate is for a 4-fold risk (a 300% increase), anything from a 5% increase to a 9-fold risk (an 800% increase) is plausible.
Errors

Type I error

Claiming a difference between two samples when in fact there is none.

Remember there is variability among samples: they might seem to come from different populations but they may not.
Also called the α error.
Typically 0.05 is used
Errors

Type II error




Claiming there is no difference between two samples when in fact there is.
Also called a β error.
The probability of not making a Type II error is 1 - β, which is called the power of the test.
Hidden error because it can’t be detected without a proper power analysis
Errors

                              Test Result
Truth                         Null Hypothesis (H0)    Alternative Hypothesis (H1)
Null Hypothesis (H0)          No Error                Type I (α)
Alternative Hypothesis (H1)   Type II (β)             No Error
Sample Size Calculation




Also called “power analysis”.
When designing a study, one needs to determine
how large a study is needed.
Power is the ability of a study to avoid a Type II error.
Sample size calculation yields the number of study
subjects needed, given a certain desired power to
detect a difference and a certain level of P value that
will be considered significant.


Many studies are completed without proper estimate of
appropriate study size.
This may lead to a “negative” study outcome in error.
Sample Size Calculation

Depends on:




Level of Type I error: 0.05 typical
Level of Type II error: 0.20 typical
One sided vs two sided: nearly always two
Inherent variability of population


Usually estimated from preliminary data
The difference that would be meaningful
between the two assessment arms.
One-sided vs. Two-sided

Most tests should be framed as a two-sided test.

When comparing two samples, we usually cannot be sure which is going to be better.
You never know which direction study results will go.

For routine medical research, use only two-sided tests.
Sample size for proportions
Stata input: Proportion 1 = .2, proportion 2 = .3, α = .05, power (1 - β) = .8.
Sample Size for Continuous Data
Stata input: Mean 1 = 20, mean 2 = 30, α = .05, power (1 - β) = .8, std. dev. 10.
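
To show how these inputs turn into a sample size, here is a minimal sketch using the standard normal-approximation formulas for a two-sided test with equal group sizes. It is not Stata's implementation: the function names are hypothetical, and statistical packages may report somewhat larger numbers (for example when a continuity correction is applied).

```python
import math
from statistics import NormalDist

def z(p):
    """Standard normal quantile (e.g. z(0.975) is about 1.96)."""
    return NormalDist().inv_cdf(p)

def n_per_group_means(mean1, mean2, sd, alpha=0.05, power=0.80):
    """Subjects per group to detect mean1 vs mean2 (two-sided, common SD)."""
    z_total = z(1 - alpha / 2) + z(power)
    return math.ceil(2 * (z_total * sd / (mean1 - mean2)) ** 2)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Subjects per group to detect p1 vs p2 (two-sided, no continuity correction)."""
    p_bar = (p1 + p2) / 2
    a = z(1 - alpha / 2) * math.sqrt(2 * p_bar * (1 - p_bar))
    b = z(power) * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(((a + b) / (p1 - p2)) ** 2)

# Same inputs as the two slides above.
n_props = n_per_group_proportions(0.2, 0.3)
n_means = n_per_group_means(20, 30, sd=10)
print(n_props, n_means)
```

Detecting a 10-point difference in proportions needs far more subjects than detecting a difference of one full standard deviation in means, which shows how strongly the required size depends on the meaningful difference and the variability.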
Statistical Tests

Parametric tests


Continuous data normally distributed
Non-parametric tests


Continuous data not normally distributed
Categorical or Ordinal data
Comparison of 2 Sample Means

Student’s T test

Assumes normally distributed continuous
data.
T value = (difference between means) / (standard error of the difference)

T value then looked up in Table to
determine significance
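
The T value can be computed by hand as a sketch. This version assumes the classic pooled-variance form of Student's test (equal variances in both groups); the function name and data are hypothetical. It returns the degrees of freedom as well, since they are needed to look the T value up in a table.

```python
import math
import statistics

def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se_diff
    return t, df

# Made-up measurements from two independent groups.
t_value, dof = student_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(t_value, dof)
```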
Paired T Tests



Uses the change before and after intervention in a single individual
Reduces the degree of variability between the groups
Given the same number of patients, has greater power to detect a difference between groups
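
A paired test works on the within-patient differences rather than on the two groups separately. The sketch below assumes before and after values are listed in the same patient order; the function name and the scores are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic on within-patient changes.

    Returns (t, degrees_of_freedom).
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean change.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Made-up scores for four patients before and after an intervention.
t_paired, dof_paired = paired_t([10, 12, 9, 11], [12, 15, 10, 14])
print(round(t_paired, 2), dof_paired)
```

Because each patient serves as their own control, between-patient variability drops out of the standard error.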
Analysis of Variance

Used to determine if two or more samples are
from the same population- the null
hypothesis.



If two samples, is the same as the T test.
Usually used for 3 or more samples.
If it appears they are not from same
population, can’t tell which sample is different.

Would need to do pair-wise tests.
Non-parametric Tests

Testing proportions
(Pearson’s) Chi-Squared (χ²) Test
Fisher’s Exact Test

Testing ordinal variables
Mann-Whitney “U” Test
Kruskal-Wallis One-way ANOVA

Testing ordinal paired variables
Sign Test
Wilcoxon Signed-Rank Test
Use of non-parametric tests




Use for categorical, ordinal or non-normally distributed continuous data
May run both parametric and non-parametric tests to check for congruity
Most non-parametric tests are based on ranks or other methods that do not depend on the actual values
Interpretation: Is the P value significant?
(Pearson’s) Chi-Squared (χ²) Test

Used to compare the observed proportions of an event with those expected.
Used with nominal data (better/worse; dead/alive)
If there is a substantial difference between observed and expected, then it is likely that the null hypothesis will be rejected.
Often presented as a 2 X 2 table
Chi-Squared (χ²) Test

Chi-Squared (χ²) formula: χ² = sum over all cells of (observed - expected)² / expected

Not applicable in small samples

If fewer than 5 expected observations per cell, use Fisher’s exact test
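
For a 2 X 2 table, the statistic is the sum over the four cells of (observed - expected)² / expected, where each expected count is row total X column total / grand total. The sketch below is illustrative only: the function name and counts are made up, no continuity correction is applied, and the P value uses the closed form that holds for 1 degree of freedom.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared for a 2 x 2 table of counts.

    Returns (chi2, p_value, smallest_expected_count).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min_expected

# Hypothetical trial: rows are treatment arms, columns are better / worse.
chi2, p_value, min_expected = chi_squared_2x2([[20, 30], [30, 20]])
print(chi2, round(p_value, 4), min_expected)
```

The smallest expected count is returned so the caller can check the small-sample rule and switch to Fisher's exact test when it falls below 5.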
Correlation

Assesses the linear relationship between two variables


Example: height and weight
Strength of the association is described by a correlation coefficient, r

r = 0 - .2: low, probably meaningless
r = .2 - .4: low, possible importance
r = .4 - .6: moderate correlation
r = .6 - .8: high correlation
r = .8 - 1: very high correlation

Can be positive or negative
Pearson’s, Spearman correlation coefficient
Tells nothing about causation
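
As an illustration, Pearson's r can be computed from its definition: the co-variation of the two variables divided by the square root of the product of their individual variations. The function name and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up values: a perfect positive and a perfect negative linear relationship.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_positive, r_negative)
```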
Correlation
Figure panels: perfect correlation; correlation coefficient 0, .3, -.5 and .7
Sources: Harris and Taylor. Medical Statistics Made Easy; Altman. Practical Statistics for Medical Research
Regression

Based on fitting a line to data

Y = ax + b
Provides a regression coefficient, which is the slope of the line (a)

Used to predict a dependent variable’s value based on the value of an independent variable.
Very helpful: in analysis of height and weight, for a known height, one can predict weight.

Much more useful than correlation
Allows prediction of values of Y rather than just whether there is a relationship between two variables.
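
The height and weight example can be sketched with an ordinary least-squares fit of Y = ax + b. The function name and the five height/weight pairs are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b. Returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx            # regression coefficient (slope)
    b = mean_y - a * mean_x  # intercept
    return a, b

# Hypothetical data: height in cm and weight in kg for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 84]

slope, intercept = fit_line(heights_cm, weights_kg)
predicted_weight = slope * 175 + intercept  # predicted weight at 175 cm
print(round(slope, 2), round(intercept, 1), round(predicted_weight, 2))
```

Here the slope reads as kilograms gained per additional centimeter of height, which is the extra information regression gives over a correlation coefficient.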
Regression

Types of regression




Linear: uses continuous data to predict a continuous outcome
Logistic: uses continuous data to predict the probability of a dichotomous outcome
Poisson regression: counts or rates of (often rare) events
Cox proportional hazards regression: survival analysis
Multiple Regression Models


Determining the association between two
variables while controlling for the values of
others.
Example: Uterine Fibroids


Both age and race impact the incidence of
fibroids.
Multiple regression allows one to test the impact of
age on the incidence while controlling for race
(and all other factors)
Multiple Regression Models

In published papers, the multivariable models are
more powerful than univariable models and take
precedence.



Therefore we discount the univariable model as it does not
control for confounding variables.
Eg: Coronary disease is potentially affected by age, HTN,
smoking status, gender and many other factors.
If assessing whether height is a factor:

If it is significant on univariable analysis, but not on
multivariable analysis, these other factors confounded the
analysis.
Risk Ratios

Risk is the probability that an event will happen.


Number of events divided by the number of people at risk.
Risks are compared by creating a ratio

Example: risk of colon cancer in those exposed to a factor vs
those unexposed

Risk of colon cancer in exposed divided by the risk in those
unexposed.
Risk Ratios

Typically used in cohort studies
Prospective observational studies comparing groups with various exposures.

Allows exploration of the probability that certain factors are associated with outcomes of interest
For example: association of smoking with lung cancer

Usually require large and long-term studies to determine risks and risk ratios.
Interpreting Risk Ratios

A risk ratio of 1 equals no increased risk

A risk ratio of greater than 1 indicates increased risk

A risk ratio of less than 1 indicates decreased risk

95% confidence intervals are usually presented

Must not include 1 for the estimate to be statistically
significant.

Example: Risk ratio of 3.1 (95% CI 0.97- 9.41) includes 1, thus
would not be statistically significant.
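
A risk ratio and its 95% confidence interval can be sketched from the four counts of a cohort study. The interval below uses the common log-scale approximation, which is an assumption of this example rather than a method stated in the lecture; the function name and the counts are hypothetical.

```python
import math

def risk_ratio_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale.

    Returns (risk_ratio, lower, upper).
    """
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    se_log = math.sqrt(1 / events_exposed - 1 / n_exposed
                       + 1 / events_unexposed - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Same risk ratio of 3 from a larger and a much smaller hypothetical cohort.
rr_big, lo_big, hi_big = risk_ratio_ci(30, 100, 10, 100)
rr_small, lo_small, hi_small = risk_ratio_ci(3, 100, 1, 100)
print(round(rr_big, 2), round(lo_big, 2), round(hi_big, 2))
print(round(rr_small, 2), round(lo_small, 2), round(hi_small, 2))
```

With few events the interval becomes wide enough to include 1, so the same point estimate is no longer statistically significant.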
Odds Ratios

Odds of an event occurring in one group divided by the odds of the event occurring in another group.

Odds are calculated by dividing the number of times an event happens by the number of times it does not happen.

Odds of heads vs the odds of tails is 1:1 or 1.
Odds Ratios

Are calculated from case control studies

Case control: patients with a condition (often rare) are compared
to a group of selected controls for exposure to one or more
potential etiologic factors.

Cannot calculate risk from these studies as that requires the
observation of the natural occurrence of an event over time in
exposed and unexposed patients (prospective cohort study).

Instead we can calculate the odds for each group.
Comparing Risk and Odds Ratios

For rare events, ratios very similar
If 5 of 100 people have a complication:
The odds are 5/95 or .0526.
The risk is 5/100 or .05.

If more common events, ratios begin to differ
If 30 of 100 people have a complication:
The odds are 30/70 or .43
The risk is 30/100 or .30

Very common events, ratios very different
Male versus female births
The odds are .5/.5 or 1
The risk is .5/1 or .5
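
The three comparisons above can be reproduced with two one-line functions. This is only an illustrative sketch; the function names are not from the lecture.

```python
def risk(events, total):
    """Risk: events divided by everyone at risk."""
    return events / total

def odds(events, total):
    """Odds: events divided by non-events."""
    return events / (total - events)

# 5, 30 and 50 events per 100 people, as in the examples above.
for events in (5, 30, 50):
    print(events, "per 100:",
          "risk =", round(risk(events, 100), 4),
          "odds =", round(odds(events, 100), 4))
```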
Risk reduction



Absolute risk reduction: amount that the risk is
reduced.
Relative risk reduction: proportion or percentage
reduction.
Example:




Death rate without treatment: 10 per 1000
Death rate with treatment: 5 per 1000
ARR = 5 per 1000
RRR = 50%
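
The same arithmetic as a short sketch, with a hypothetical function name: the absolute reduction is a difference of risks, and the relative reduction is that difference as a fraction of the untreated risk.

```python
def risk_reduction(risk_untreated, risk_treated):
    """Return (absolute risk reduction, relative risk reduction)."""
    arr = risk_untreated - risk_treated
    rrr = arr / risk_untreated
    return arr, rrr

# Death rates from the example above: 10 per 1000 vs 5 per 1000.
arr, rrr = risk_reduction(10 / 1000, 5 / 1000)
print("ARR per 1000:", round(arr * 1000, 1), "RRR:", round(rrr * 100), "%")
```

A 50% relative reduction can sound large even when the absolute reduction is only 5 per 1000, which is why both are worth reporting.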
Survival Analysis

Evaluation of time to an event (death, recurrence, recovery).
Provides means of handling censored data


Patients who do not reach the event by the end of
the study or who are lost to follow-up
Most common type is Kaplan-Meier analysis


Curves presented as stepwise change from
baseline
There are no fixed intervals of follow-up- survival
proportion recalculated after each event.
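
The stepwise recalculation can be sketched as a product-limit estimate: at each time where an event occurs, the running survival proportion is multiplied by the fraction of at-risk patients who did not have the event. Censored patients leave the at-risk group without causing a step. The function name and the five follow-up records are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: True if the event was observed, False if censored
    Returns a list of (time, survival_proportion) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_here = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events_here += data[i][1]
            leaving += 1
            i += 1
        if events_here:
            # Survival is recalculated only when an event occurs.
            survival *= 1 - events_here / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Made-up follow-up in months; False marks a censored patient.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
for t, s in curve:
    print(t, round(s, 2))
```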
Survival Analysis
Source: Altman. Practical Statistics for Medical Research
Kaplan-Meier Curve
Source: Wikipedia
Kaplan-Meier Analysis

Provides a graphical means of comparing the
outcomes of two groups that vary by intervention or
other factor.

Survival rates can be measured directly from curve.

Difference between curves can be tested for
statistical significance.
Cox Regression Model



AKA: Proportional Hazards Survival Model.
Used to investigate relationship between an event
(death, recurrence) occurring over time and possible
explanatory factors.
Reported result: Hazard ratio (HR).


Ratio of the hazard in one group divided by the hazard in another.
Interpreted same as risk ratios and odds ratios



HR 1 = no effect
HR > 1 increased risk
HR < 1 decreased risk
Cox Regression Model

Common use in long-term studies
where various factors might predispose
to an event.

Example: after uterine embolization, which
factors (age, race, uterine size, etc) might
make recurrence more likely.
Summary

Understanding basic statistical concepts is central to
understanding the medical literature.

Not important to understand the basis of the tests or
the underlying math.

Need to know when a test should be used and how to
interpret its results
