
Introduction to Biostatistics
for Clinical and Translational
Researchers
KUMC Departments of Biostatistics & Internal Medicine
University of Kansas Cancer Center
FRONTIERS: The Heartland Institute of Clinical and Translational Research
Course Information
 Jo A. Wick, PhD
 Office Location: 5028 Robinson
 Email: [email protected]
 Lectures are recorded and posted at
http://biostatistics.kumc.edu under ‘Educational
Opportunities’
Course Objectives
 Understand the role of statistics in the scientific
process
 Understand features, strengths and limitations of
descriptive, observational and experimental
studies
 Distinguish between association and causation
 Understand roles of chance, bias and confounding
in the evaluation of research
Course Calendar
 June 29: Descriptive Statistics and Core Concepts
 July 6: Hypothesis Testing
 July 13: Linear Regression & Survival Analysis
 July 20: Clinical Trial & Experimental Design
Hypothesis Testing
Which test do I use?
 Research questions usually concern one of the
following population parameters:
 Mean: μ—interval or ratio response
 Proportion: π—nominal or ordinal response
 Time-to-event: t—combination response
 These questions can involve exploring the
characteristics of one group of interest, or they can
explore differences in characteristics of 2 or more
groups of interest.
Inferences on a Single Mean
 Example: BMI of single population—is it greater
than 26.3?
 One sample t-test
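A quick sketch of this test in Python using scipy; the BMI values below are illustrative stand-ins, not data from the course:

```python
import numpy as np
from scipy import stats

# Hypothetical BMI measurements from a single sample (illustrative values)
bmi = np.array([27.1, 25.4, 28.9, 26.0, 30.2, 24.8, 27.7, 29.3, 26.5, 28.1])

# One-sample t-test of H0: mu = 26.3 vs. H1: mu > 26.3 (one-sided)
t_stat, p_value = stats.ttest_1samp(bmi, popmean=26.3, alternative='greater')
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```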
Inferences on Two Means
 Example: Smoking cessation
 Two types of therapy: x = {behavioral therapy, literature}
 Dependent variable: y = number of cigarettes smoked per day after six months of therapy¹

Behavioral Therapy   Literature Only
1                    6
0                    2
6                    0
0                    12
3                    4

¹ Some other response that takes into account the difference from baseline would be more appropriate: change from baseline or % change from baseline
Smoking Cessation
 Research question: Is behavioral therapy in
addition to education better than education alone
in getting smokers to quit?
 H0: μ1 = μ2 versus H1: μ1 ≠ μ2
 Two independent samples t-test IF:
 the number of cigarettes smoked is approximately normal
OR can be transformed to an approximate normal
distribution (e.g., natural log)
 the variability within each group is approximately the
same (ROT: no more than 2x difference)
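The two-sample comparison can be sketched with scipy; the counts here are illustrative stand-ins, and `equal_var=False` requests the Welch version, a safer default when the variance rule of thumb is in doubt:

```python
import numpy as np
from scipy import stats

# Hypothetical cigarettes/day at six months (illustrative, not the course data)
behavioral = np.array([1, 0, 6, 0, 3, 2, 0, 4, 1, 0])
literature = np.array([6, 2, 0, 12, 4, 8, 5, 3, 9, 7])

# Rule of thumb: within-group variances should differ by no more than ~2x
print(behavioral.var(ddof=1), literature.var(ddof=1))

# Welch's t-test (equal_var=False) does not assume equal variances
t_stat, p_value = stats.ttest_ind(behavioral, literature, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```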
Smoking Cessation
Reject H0: μ1 = μ2
 Conclusion: Adding behavioral therapy to cessation education results in, on average, significantly fewer cigarettes smoked per day at six months post-therapy when compared to education alone (t(30.9) = -2.87, p < 0.01).
Smoking Cessation
 The 95% confidence interval is:
-8.39 ≤ μ1 - μ2 ≤ -1.42
 Interpretation: On average, the difference in
number of cigarettes smoked per day between the
two groups is 4.9 cigarettes (95%CI: 1.42, 8.39).
Confidence Intervals
 What exactly do confidence intervals represent?
 Remember that theoretical sampling distribution
concept?
 It doesn't actually exist; it's purely a mathematical construct.
 What would we see if we took sample after sample after
sample and did the same test on each . . .
Confidence Intervals
 Suppose we actually took sample after sample . . .
 100 of them, to be exact
 Every time we take a different sample and compute the
confidence interval, we will likely get a slightly different
result simply due to sampling variability.
Confidence Intervals
 Suppose we actually took sample after sample . . .
 100 of them, to be exact
 95% confident means: “In 95 of the 100 samples, our
interval will contain the true unknown value of the
parameter. However, in 5 of the 100 it will not.”
Confidence Intervals
 Suppose we actually took sample after sample . . .
 100 of them, to be exact
 Our “confidence” is in the procedure that produces the
interval—i.e., it performs well most of the time.
 Our “confidence” is not directly related to our particular
interval—we cannot say “The probability that the mean
number of cigarettes is between (1.4,8.4) is 0.95.”
Inferences on More Than Two
Means
 Example: Smoking cessation
 Three types of therapy: x = {pharmaceutical therapy,
behavioral therapy, literature}
 Dependent variable: y = number of cigarettes smoked
per day after six months of therapy
Pharmaceutical Therapy   Behavioral Therapy   Literature Only
0                        1                    6
3                        0                    2
0                        6                    0
3                        0                    12
6                        3                    4
Smoking Cessation
 Research question: Is therapy in addition to
education better than education alone in getting
smokers to quit? If so, is one therapy more
effective?
 H0: μ1 = μ2 = μ3 versus H1: At least one μ is different
 More than 2 independent samples requires an
ANOVA:
 the number of cigarettes smoked is approximately normal
OR can be transformed to an approximate normal
distribution (e.g., natural log)
 the variability within each group is approximately the
same (ROT: no more than 2x difference)
Smoking Cessation
 Test of the ‘homogeneity’ assumption using Levene
or Brown-Forsythe test:
 Conclusion: Reject H0: σ1 = σ2 = σ3
Smoking Cessation
 Counts are notorious for this—try a natural log
transformation
 Note: Make sure you add 1 to each count because the
log of 0 does not exist.
 Modification: new y = log(y + 1)
 Retest!
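A sketch of the transform-and-retest step in Python, using the small example counts from the slides (note scipy's `levene` with its default `center='median'` is the Brown-Forsythe variant; pass `center='mean'` for the classic Levene test):

```python
import numpy as np
from scipy import stats

# Small example counts for the three therapy groups (illustrative values)
pharm = np.array([0, 3, 0, 3, 6])
behav = np.array([1, 0, 6, 0, 3])
lit   = np.array([6, 2, 0, 12, 4])

# Counts include zeros, so shift by 1 before taking the natural log
groups = [np.log(g + 1) for g in (pharm, behav, lit)]

# Re-test homogeneity of variance on the transformed scale
stat, p = stats.levene(*groups)  # default center='median' = Brown-Forsythe
print(f"Levene W = {stat:.3f}, p = {p:.3f}")  # fail to reject if p > 0.05
```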
Smoking Cessation
 Test of the ‘homogeneity’ assumption using Levene
or Brown-Forsythe test on the transformed count:
 Conclusion: Fail to reject H0: σ1 = σ2 = σ3
Smoking Cessation
 ANOVA produces a table:
 One-way ANOVA indicates you have a single
categorical factor x (e.g., treatment) and a single
continuous response y and your interest is in
comparing the mean response μ across the levels
of the categorical factor.
Wait . . .
 Why is ANOVA using variances when we’re
hypothesizing about means?
 Between-groups mean square: a variance
 Within-groups mean square: also a variance
 F: a ratio of variances—F = MSBG/MSWG
What’s the Rationale?
 In the simplest case of the one-way ANOVA, the
variation in the response y is broken down into
parts: variation in response attributed to the
treatment (group/sample) and variation in
response attributed to error (subject
characteristics + everything else not controlled for)
 The variation in the treatment (group/sample) means is
compared to the variation within a treatment
(group/sample) using a ratio—this is the F test statistic!
 If the between treatment variation is a lot bigger than the
within treatment variation, that suggests there are some
different effects among the treatments.
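This decomposition can be verified by hand in a few lines of Python; the groups are hypothetical, and the manually computed F ratio is checked against `scipy.stats.f_oneway`:

```python
import numpy as np
from scipy import stats

# Three illustrative groups (hypothetical data)
groups = [np.array([2.0, 3.0, 4.0, 3.5]),
          np.array([5.0, 6.0, 5.5, 6.5]),
          np.array([3.0, 4.0, 3.5, 4.5])]

n_total = sum(len(g) for g in groups)
k = len(groups)
grand_mean = np.concatenate(groups).mean()

# Between-groups mean square: variation of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-groups mean square: pooled variation inside each group
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (n_total - k)

F = ms_between / ms_within
print(F, stats.f_oneway(*groups).statistic)  # the two F values agree
```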
Rationale
[Boxplots of three scenarios, labeled 1, 2, and 3]
Rationale
 There is an obvious difference between scenarios
1 and 2. What is it?
 Just looking at the boxplots, which of the two
scenarios (1 or 2) do you think would provide more
evidence that at least one of the populations is
different from the others? Why?
Rationale
[Boxplots of three scenarios, labeled 1, 2, and 3]
F Distribution
Properties, F(dfnum, dfden)
 The values are non-
negative, start at zero and
extend to the right,
approaching but never
touching the horizontal
axis.
 The distribution of F
changes as the degrees of
freedom change.
F = (variation between the sample means) / (natural variation within the samples)
F Statistic
F = (variation between the sample means) / (natural variation within the samples)
 Case A: If all the sample means were exactly the
same, what would be the value of the numerator of
the F statistic?
 Case B: If all the sample means were spread out
and very different, how would the variation
between sample means compare to the value in
A?
F Statistic
F = (variation between the sample means) / (natural variation within the samples)
 So what values could the F statistic take on?
 Could you get an F that is negative?
 What type of values of F would lead you to believe
the null hypothesis—that there is no difference in
group means—is not accurate?
Smoking Cessation
 ANOVA produces a table:
 Conclusion: Reject H0: μ1 = μ2 = μ3. Some
difference in the number of cigarettes smoked per
day exists between subjects receiving the three
types of therapy.
Smoking Cessation
 ANOVA produces a table:
 But where is the difference? Are the two
experimental therapies different? Or is it that each
are different from the control?
Smoking Cessation
 Reject H0: μ1 = μ3 and H0: μ2 = μ3. Both pharmaceutical and behavioral therapy are significantly different from the literature-only control group, but the two therapies are not different from each other.
Smoking Cessation
 Conclusion: Adding either behavioral (p = 0.015) or
pharmaceutical therapy (p < 0.01) to cessation
education results in—on average—significantly fewer
cigarettes smoked per day at six months post-therapy
when compared to education alone.
Smoking Cessation
 On average, the number of cigarettes smoked per
day by subjects receiving behavioral and
pharmaceutical therapy is 1.1 fewer cigarettes
(95%CI: 0.16, 2.79) and 1.5 fewer cigarettes
(95%CI: 0.36, 3.45), respectively, than control
subjects.
Inferences on Means
 Concerns a continuous response y
 One or two groups: t
 More than two groups: ANOVA
 Remember, this (and the two-sample case) is essentially
looking at the association between an x and a y, where x
is categorical (nominal or ordinal) and y is continuous
(interval or ratio).
 Check assumptions!
 Normality of y
 Equal group variances
ANOVA Models
 There are many . . .
Randomized designs with one treatment
A. Subjects not subdivided on any basis other than randomization prior to assignment to treatment
levels; no restriction on random assignment other than the option of assigning the same number of
subjects to each treatment level
1. Completely randomized or one factor design
B. Subjects subdivided on some nonrandom basis or one or more restrictions on random assignment
other than assigning the same number of subjects to each treatment level
1. Balanced incomplete block design
2. Crossover design
3. Generalized randomized block design
4. Graeco-Latin square design
5. Hyper-Graeco-Latin square design
6. Latin square design
7. Partially balanced incomplete block design
8. Randomized block design
9. Youden square design
Randomized designs with two or more treatments
A. Factorial experiments: designs in which all treatment levels are crossed
1. Designs without confounding
a. Completely randomized factorial design
b. Generalized randomized factorial design
c. Randomized block factorial design
2. Design with group-treatment confounding
a. Split-plot factorial design
3. Designs with group-interaction confounding
a. Latin square confounded factorial design
b. Randomized block completely confounded factorial design
c. Randomized block partially confounded factorial design
4. Designs with treatment-interaction confounding
a. Completely randomized fractional factorial design
Inferences on Proportions (k = 2)
 Example: plant genetics
 Two phenotypes: x = {yellow-flowered plants, green-flowered plants}
 Dependent variable: y = proportion of plants out of 100
progeny that express each phenotype
Sample data (phenotype of each plant): Yellow, Yellow, Green, Yellow, Green, . . .

y = x / n, where x is the number of progeny expressing the phenotype and n is the total number of progeny
Plant Genetics
 The plant geneticist hypothesizes that his crossed
progeny will result in a 3:1 phenotypic ratio of
yellow-flowered to green-flowered plants.
 H0: The population contains 75% yellow-flowered
plants versus H1: The population does not contain
75% yellow-flowered plants.
 H0: πy = 0.75 versus H1: πy ≠ 0.75
 This particular type of test is referred to as the chi-square goodness of fit test for k = 2.
Plant Genetics
 Chi-square statistics compute deviations between what is expected (under H0) and what is actually observed in the data:

χ² = Σ (O - E)² / E

 DF = k - 1, where k is the number of categories of x
Plant Genetics
 Suppose the researcher actually observed in his
sample of 100 plants this breakdown of phenotype:
Phenotype         f (%)
Yellow-flowered   84 (84%)
Green-flowered    16 (16%)
 Does it appear that this type of sample could have
come from a population where the true proportion
of yellow-flowered plants is 75%?
Plant Genetics
Phenotype         f (%)
Yellow-flowered   84 (84%)
Green-flowered    16 (16%)

χ²(1) = (84 - 75)²/75 + (16 - 25)²/25 = 4.32
 Conclusion: Reject H0: πy = 0.75—it does not
appear that the geneticist’s hypothesis about the
population phenotypic ratio is correct (p = 0.038).
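This computation is reproduced directly by scipy's goodness-of-fit test:

```python
from scipy import stats

observed = [84, 16]   # yellow, green
expected = [75, 25]   # 3:1 ratio under H0 for n = 100

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")  # chi-square = 4.32, p = 0.038
```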
Inferences on Proportions (k > 2)
 Example: plant genetics
 Four phenotypes: x = {yellow-smooth flowered, yellow-wrinkled flowered, green-smooth flowered, green-wrinkled flowered}
 Dependent variable: y = proportion of plants out of 250
progeny that express each phenotype
Sample data (phenotype of each plant): Yellow smooth, Yellow smooth, Green wrinkled, Yellow wrinkled, . . .

y = x / n
Plant Genetics
 The plant geneticist hypothesizes that his crossed
progeny will result in a 9:3:3:1 phenotypic ratio of
YS:YW:GS:GW plants.
 Actual numeric hypothesis is H0: π1 = 0.5625, π2 =
0.1875, π3 = 0.1875, π4 = 0.0625
 This particular type of test is referred to as the chi-square goodness of fit test for k = 4.
Plant Genetics
 Chi-square statistics compute deviations between what is expected (under H0) and what is actually observed in the data:

χ² = Σ (O - E)² / E

 DF = k - 1, where k is the number of categories of x
Plant Genetics
 Suppose the researcher actually observed in his
sample of 250 plants this breakdown of phenotype:
Phenotype   f (%)
YS          152 (60.8%)
YW          39 (15.6%)
GS          53 (21.2%)
GW          6 (2.4%)
 Does it appear that this type of sample could have
come from a population where the true phenotypic
ratio is as the geneticist hypothesized?
Plant Genetics
Phenotype   f (%)
YS          152 (60.8%)
YW          39 (15.6%)
GS          53 (21.2%)
GW          6 (2.4%)

χ²(3) = 8.972
 Conclusion: Reject H0—it does not appear that the
geneticist’s hypothesis about the population
phenotypic ratio is correct (p = 0.03).
Inferences on Proportions
 Concerns a categorical response y
 Regardless of the number of groups, a chi-square
test may be used
 Remember, this is essentially looking at the association
between an x and a y, where x is categorical (nominal or
ordinal) and y is categorical (nominal or ordinal).
 Assumptions?
 ROT: No expected frequency should be less than 5 (i.e., nπ ≥ 5 for every category)
 If not met, use the binomial (k = 2) or multinomial (k > 2) test
Inferences on Proportions
 What do we do when we have nominal data on
more than one factor x?
 Gender and hair color
 Menopausal status and disease stage at diagnosis
 ‘Handedness’ and gender
 We still use chi-square!
 These types of tests are looking at whether two
categorical variables are independent of one
another—thus, tests of this type are often referred
to as chi-square tests of independence.
Inferences on Proportions
 Example: Hair color and Gender
 Gender: x1 = {M, F}
 Hair Color: x2 = {Black, Brown, Blonde, Red}
         Male       Female       Total
Black    32 (32%)   55 (27.5%)   87
Brown    43 (43%)   65 (32.5%)   108
Blonde   16 (16%)   64 (32%)     80
Red      9 (9%)     16 (8%)      25
Total    100        200          N = 300
What the data should look like in the actual dataset:

Gender   Hair Color
Male     Black
Female   Red
Female   Blonde
Hair Color and Gender
 The researcher hypothesizes that hair color is not
independent of sex.
 H0: Hair color is independent of gender (i.e., the
phenotypic ratio is the same within each gender).
 H1: Hair color is not independent of gender (i.e.,
the phenotypic ratio is different between genders).
Hair Color and Gender
 Chi-square statistics compute deviations between what is expected (under H0) and what is actually observed in the data:

χ² = Σ (O - E)² / E

 DF = (r - 1)(c - 1), where r is the number of rows and c is the number of columns
Hair Color and Gender
 Does it appear that this type of sample could have
come from a population where the different hair
colors occur with the same frequency within each
gender?
 OR does it appear that the distribution of hair color
is different between men and women?
Hair Color and Gender
         Male       Female       Total
Black    32 (32%)   55 (27.5%)   87
Brown    43 (43%)   65 (32.5%)   108
Blonde   16 (16%)   64 (32%)     80
Red      9 (9%)     16 (8%)      25
Total    100        200          N = 300

χ²(3) = 8.987 > 7.815 (the critical value at α = 0.05, df = 3)
 Conclusion: Reject H0: Gender and Hair Color are
independent. It appears that the researcher’s
hypothesis that the population phenotypic ratio is
different between genders is correct (p = 0.029).
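The same test in Python, using the hair color by gender counts from this example (note `chi2_contingency` applies a continuity correction only to 2x2 tables, so this 4x2 result is the uncorrected statistic):

```python
import numpy as np
from scipy import stats

# Hair color (rows) by gender (columns)
table = np.array([[32, 55],    # black:  male, female
                  [43, 65],    # brown
                  [16, 64],    # blonde
                  [ 9, 16]])   # red

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```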
Inferences on Proportions
 Special case: when you have a 2X2 contingency
table, you are actually testing a hypothesis
concerning two population proportions: H0: π1 = π2
(i.e., the proportion of males who are blonde is the
same as the proportion of females who are
blonde).
         Blonde       Non-blonde    Total
Male     16 (16%)     84 (84%)      100
Female   64 (32%)     136 (68%)     200
Total    80 (26.7%)   220 (73.3%)   N = 300
Inferences on Proportions
 When you have a single proportion and a small sample, substitute the Binomial test, which provides exact results.
 The nonparametric Fisher Exact test can always be used in place of the chi-square test when you have contingency table-like data (i.e., two categorical factors whose association is of interest); it should be substituted for the chi-square test of independence when 'cell' sizes are small.
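A sketch of the Fisher Exact test on the 2x2 blonde/non-blonde table shown earlier:

```python
from scipy import stats

# 2x2 table: blonde vs. non-blonde counts by gender
table = [[16, 84],     # male: blonde, non-blonde
         [64, 136]]    # female: blonde, non-blonde

odds_ratio, p = stats.fisher_exact(table)
print(f"odds ratio = {odds_ratio:.3f}, p = {p:.4f}")
```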
Inferences on Time-to-Event
 Survival Analysis is the class of statistical
methods for studying the occurrence (categorical)
and timing (continuous) of events.
 The event could be
 development of a disease
 response to treatment
 relapse
 death
 Survival analysis methods are most often applied
to the study of deaths.
Inferences on Time-to-Event
 Survival Time: the time from a well-defined point in
time (time origin) to the occurrence of a given
event.
 Survival data includes:
 a time
 an event ‘status’
 any other relevant subject characteristics
Inferences on Time-to-Event
 Survival data has two features that are difficult to
handle with “conventional” statistical methods:
 censoring—did the event happen or not?
 time-dependent covariates
 Many consider survival data analysis as the
application of conventional statistical methods
when the data is censored.
 Censoring comes in many forms and occurs for
many different reasons.
 The most common and basic types of censored data are
Types I-III right-censored data.
Censoring
 Type I Censoring:
 At the end of a fixed period of time, any subjects not
experiencing the event are censored (e.g., at the end of
the 3-year experiment, all subjects who have not died are
censored at time 3 years).
 If there are no accidental losses, all censored
observations equal the length of the study period.
[Timeline diagram: subjects A-D followed from time 0 to 3 years; x marks an event, and subjects without an event are censored at year 3]
Censoring
 Type II Censoring:
 Subjects are observed until a fixed pre-specified number
of events have occurred within the sample—after that the
remaining subjects are censored (e.g., experiment with
100 lab rats will stop when 50 of them have died and the
remaining 50 will be censored at that time).
 If there are no accidental losses, the censored
observations equal the largest uncensored observation.
Censoring
 Type III Censoring:
 There is a single termination time, but entry times vary
randomly across subjects.
 For example, patients receive heart surgery at various
time points, typically uncontrolled by the investigator, but
the study has to be terminated on a single date—all
patients still alive on that date will be censored, but their
survival times from surgery will vary.
Censoring
 Type III Censoring:
[Timeline diagram: subjects A-D entering at different times and followed to a common termination date; x marks an event, o marks a censored observation]
Inferences on Time-to-Event
 In most clinical studies the length of study period is
fixed and the patients enter the study at different
times.
 Lost-to-follow-up patients’ survival times are measured
from the study entry until last contact (censored
observations).
 Patients still alive at the termination date will have
survival times equal to the time from the study entry until
study termination (censored observations).
 When there are no censored survival times, the set
is said to be complete.
Functions of Survival Time
 Let T = the length of time until a subject
experiences the event.
 The distribution of T can be described by several
functions:
 Survival Function: the probability that an individual
survives longer than some time, t:
S(t) = P(an individual survives longer than t)
= P(T > t)
Functions of Survival Time
 If there are no censored observations, the survival
function is estimated as the proportion of patients
surviving longer than time t:
Ŝ(t) = (# of patients surviving longer than t) / (total # of patients)
Functions of Survival Time
 Density Function: The survival time T has a probability density function defined as the limit of the probability that an individual experiences the event in the short interval (t, t + Δt), per unit width Δt:

f(t) = lim(Δt→0) P(an individual dying in the interval (t, t + Δt)) / Δt
Functions of Survival Time
 Survival density function:

[Figure: example probability density curve of survival time]
Functions of Survival Time
 f(t) is a non-negative function:
f(t) ≥ 0 for all t ≥ 0
f(t) = 0 for all t < 0
 Like any probability density, the area under the
curve equals one.
Functions of Survival Time
 If there are no censored observations, f(t) is
estimated as the proportion of patients
experiencing the event in an interval per unit width:
f̂(t) = (# of patients dying in the interval beginning at time t) / (total # of patients × interval width)
 The density function is also known as the
unconditional failure rate.
Functions of Survival Time
 Hazard Function: The hazard function h(t) of
survival time T gives the conditional failure rate.
It is defined as the probability of failure during a
very small time interval, assuming the individual
has survived to the beginning of the interval:
P an individual of age t fails in the time interval (t , t + t )
h(t )  lim
t 0
t
Functions of Survival Time
 The hazard is also known as the instantaneous
failure rate, force of mortality, conditional mortality
rate, or age-specific failure rate.
 The hazard at any time t corresponds to the risk of
event occurrence at time t:
 For example, a patient’s hazard for contracting influenza
is 0.015 with time measured in months.
 What does this mean? This patient would expect to
contract influenza 0.015 times over the course of a month
assuming the hazard stays constant.
Functions of Survival Time
 If there are no censored observations, the hazard
function is estimated as the proportion of patients
dying in an interval per unit time, given that they
have survived to the beginning of the interval:
ĥ(t) = (# of patients dying in the interval beginning at time t) / ((# of patients surviving at t) × (interval width))
     = (# of patients dying per unit time in the interval) / (# of patients surviving at t)
Nonparametric Estimation
 Product-Limit Estimates (Kaplan-Meier): most
widely used in biological and medical applications
 Life Table Analysis (actuarial method): appropriate
for large number of observations or if there are
many unique event times
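A minimal hand-rolled product-limit (Kaplan-Meier) estimator, written out to show the "multiply the conditional survival probabilities at each event time" logic; the times and censoring flags below are hypothetical:

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate. times: follow-up times; events: 1=event, 0=censored."""
    times, events = np.asarray(times), np.asarray(events)
    order = np.argsort(times)
    times, events = times[order], events[order]

    survival = 1.0
    estimates = []
    for t in np.unique(times[events == 1]):         # distinct event times
        at_risk = np.sum(times >= t)                # still under observation at t
        died = np.sum((times == t) & (events == 1))
        survival *= 1 - died / at_risk              # multiply conditional survivals
        estimates.append((t, survival))
    return estimates

# Hypothetical survival times in months; 0 marks a censored observation
times  = [5, 8, 12, 12, 20, 24, 30]
events = [1, 1,  1,  0,  1,  0,  1]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```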
Nonparametric Methods for
Comparing Survival Distributions
 If your question looks like: “Is the time-to-event
different in group A than in group B (or C . . . )?”
then you have several options, including:
 Gehan’s Generalized Wilcoxon Test: better for detecting
short-term differences in survival
 Cox-Mantel Test
 Log-rank Test: better for detecting long-term differences
in survival
 Peto and Peto’s Generalized Wilcoxon Test
 Cox’s F-Test
 Mantel and Haenszel Test: allows stratification based on
prognostic factors
Inferences for Time-to-Event
 Example: survival in squamous cell carcinoma
 A pilot study was conducted to compare
Accelerated Fractionation Radiation Therapy
versus Standard Fractionation Radiation Therapy
for patients with advanced unresectable squamous
cell carcinoma of the head and neck.
 The researchers are interested in exploring any
differences in survival between the patients treated
with Accelerated FRT and the patients treated with
Standard FRT.
Inferences for Time-to-Event
 H0: S1(t) = S2(t) for all t
 H1: S1(t) ≠ S2(t) for at least one t

[Figure: Overall Survival by Treatment: Kaplan-Meier curves for AFRT vs. SFRT, survival probability (0.00-1.00) against survival time (0-120 months)]
Squamous Cell Carcinoma

                    AFRT        SFRT
Gender
  Male              28 (97%)    16 (100%)
  Female            1 (3%)      0
Age
  Median            61          65
  Range             30-71       43-78
Primary Site
  Larynx            3 (10%)     4 (25%)
  Oral Cavity       6 (21%)     1 (6%)
  Pharynx           20 (69%)    10 (63%)
  Salivary Glands   0           1 (6%)
Stage
  III               4 (14%)     8 (50%)
  IV                25 (86%)    8 (50%)
Tumor Stage
  T2                3 (10%)     2 (12%)
  T3                8 (28%)     7 (44%)
  T4                18 (62%)    7 (44%)
Squamous Cell Carcinoma

Median Survival Time:
AFRT: 18.38 months (2 censored)
SFRT: 13.19 months (5 censored)

[Figure: Overall Survival by Treatment: Kaplan-Meier curves for AFRT vs. SFRT, survival probability against survival time (0-120 months)]
Squamous Cell Carcinoma

Log-Rank test p-value = 0.5421

[Figure: Overall Survival by Treatment: Kaplan-Meier curves for AFRT vs. SFRT, survival probability against survival time (0-120 months)]
Squamous Cell Carcinoma
 Staging of disease is also prognostic for survival.
 Shouldn’t we consider the analysis of the survival
of these patients by stage as well as by treatment?
Squamous Cell Carcinoma

Median Survival Time:
AFRT Stage 3: 77.98 mo.
AFRT Stage 4: 16.21 mo.
SFRT Stage 3: 19.34 mo.
SFRT Stage 4: 8.82 mo.

Log-Rank test p-value = 0.0792

[Figure: Overall Survival by Treatment and Stage: Kaplan-Meier curves for the four treatment-by-stage groups, survival probability against survival time (0-120 months)]
Inferences on Time-to-Event
 Concerns a response that is both categorical
(censor) and continuous (time)
 There are several nonparametric methods that can
be used—choice should be based on whether you
anticipate a short-term or long-term benefit.
What about adjustments?
 There may be other predictors or explanatory
variables that you believe are related to the
response other than the actual factor (treatment) of
interest.
 Regression methods will allow you to incorporate
these factors into the test of a treatment effect:
 Logistic regression: when y is categorical and binary (nominal with two categories)
 Multinomial logistic regression: when y is categorical with
more than 2 nominal categories
 Ordinal logistic regression: when y is categorical and
ordinal
What about adjustments?
 There may be other predictors or explanatory
variables that you believe are related to the
response other than the actual factor (treatment) of
interest.
 Regression methods will allow you to incorporate
these factors into the test of a treatment effect:
 Linear regression: when y is continuous and the factors
are a combination of categorical and continuous (or just
continuous)
 Two- and three-way ANOVA: when y is continuous and
the factors are all categorical
What about adjustments?
 There may be other predictors or explanatory
variables that you believe are related to the
response other than the actual factor (treatment) of
interest.
 Regression methods will allow you to incorporate
these factors into the test of a treatment effect:
 Cox regression: when y is a time-to-event outcome
Summary
 Hypothesis Testing
 Choice of test depends on several factors
• What is your question?
• What is your outcome? Is it continuous or categorical? NOIR?
• What is your independent variable (if there is one . . . )? Is it
continuous or categorical? Is there more than one?
 Inferences on
 Means
 Proportions
 Time-to-Event
 Regression methods for incorporating prognostic factors
Summary
 We still have to cover:
 Correlation and Linear Regression
 Logistic Regression
• Sensitivity and specificity (or false-positive and false-negative rates)
 Cox Regression
 Design of Experiments
Additional Material
 Time allowing!
Incidence and Prevalence
 An incidence rate of a disease is a rate that is
measured over a period of time; e.g., 1/100
person-years.
 For a given time period, incidence is defined as:

incidence = (# of newly-diagnosed cases of disease) / (# of individuals at risk)

 Only those free of the disease at time t = 0 can be included in the numerator or denominator.
Incidence and Prevalence
 A prevalence ratio is a rate that is taken at a
snapshot in time (cross-sectional).
 At any given point, the prevalence is defined as:

prevalence = (# with the illness) / (# of individuals in the population)
 The prevalence of a disease includes both new
incident cases and survivors with the illness.
Incidence and Prevalence
 Prevalence is equivalent to incidence multiplied by
the average duration of the disease.
 Hence, prevalence is greater than incidence if the
disease is long-lasting.
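As a quick worked example of that relationship, under steady-state assumptions and with hypothetical rates:

```python
# Under steady-state assumptions, prevalence ~= incidence rate x mean duration
incidence_rate = 0.01   # 1 new case per 100 person-years (hypothetical)
mean_duration = 8       # mean disease duration in years (hypothetical)

prevalence = incidence_rate * mean_duration
print(f"Expected prevalence: {prevalence:.0%}")  # 8%
```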
Measurement Error
 To this point, we have assumed that the outcome
of interest, x, can be measured perfectly.
 However, mismeasurement of outcomes is
common in the medical field due to fallible tests
and imprecise measurement tools.
Diagnostic Testing
Sensitivity and Specificity
 Sensitivity of a diagnostic test is the probability that the test will be positive among people who have the disease.
P(T+|D+) = TP/(TP + FN)
 Sensitivity provides no information about people who do not have the disease.
 Specificity is the probability that the test will be negative among people who are free of the disease.
P(T-|D-) = TN/(TN + FP)
 Specificity provides no information about people who have the disease.
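The four standard measures defined here and on the following slides can be wrapped in one small helper. The counts below (24 true positives, 14 false positives, 6 false negatives, 56 true negatives) are an assumed reading of the worked example's figures of 30 diseased and 70 non-diseased subjects, chosen because it is internally consistent:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the four standard diagnostic-test measures from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),   # P(T+ | D+)
        "specificity": tn / (tn + fp),   # P(T- | D-)
        "ppv": tp / (tp + fp),           # P(D+ | T+)
        "npv": tn / (tn + fn),           # P(D- | T-)
    }

# Counts inferred from the example: 30 diseased, 70 non-diseased (assumption)
m = diagnostic_metrics(tp=24, fp=14, fn=6, tn=56)
print(m)  # sensitivity 0.80, specificity 0.80, ppv ~0.63, npv ~0.90
```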
Prevalence = 30/100 = 0.30
SN = 24/30 = 0.80
SP = 56/70 = 0.80

[Diagram: overlapping distributions of diseased and non-diseased subjects with a diagnostic cutoff; those above the cutoff are diagnosed positive]

A perfect diagnostic test has SN = SP = 1.

[Diagram: completely separated diseased and non-diseased distributions]

A 100% inaccurate diagnostic test has SN = SP = 0.

[Diagram: diseased and non-diseased distributions with diagnoses completely reversed]
Sensitivity and Specificity
 Example: 100 HIV+ patients are given a new diagnostic test for rapid diagnosis of HIV, and 80 of these patients are correctly identified as HIV+.
What is the sensitivity of this new diagnostic test?
 Example: 500 HIV- patients are given a new diagnostic test for rapid diagnosis of HIV, and 50 of these patients are incorrectly specified as HIV+.
What is the specificity of this new diagnostic test? (Hint: How many of these 500 patients are correctly specified as HIV-?)
Positive and Negative Predictive
Value
 Positive predictive value is the probability that a person with a positive diagnosis actually has the disease.
P(D+|T+) = TP/(TP + FP)
 This is often what physicians want: a patient tests positive for the disease; does this patient actually have the disease?
 Negative predictive value is the probability that a person with a negative test does not have the disease.
P(D-|T-) = TN/(TN + FN)
 This is often what physicians want: a patient tests negative for the disease; is this patient truly disease free?
PPV = 24/38 = 0.63
NPV = 56/62 = 0.90

[Diagram: overlapping distributions of diseased and non-diseased subjects with a diagnostic cutoff; those above the cutoff are diagnosed positive]

A perfect diagnostic test has PPV = NPV = 1.

[Diagram: completely separated diseased and non-diseased distributions]

A 100% inaccurate diagnostic test has PPV = NPV = 0.

[Diagram: diseased and non-diseased distributions with diagnoses completely reversed]
PPV and NPV
 Example: 50 patients given a new diagnostic test
for rapid diagnosis of HIV test positive, and 25 of
these patients are actually HIV+.
What is the PPV of this new diagnostic test?
 Example: 200 patients given a new diagnostic test
for rapid diagnosis of HIV test negative, but 2 of
these patients are actually HIV+.
What is the NPV of this new diagnostic test? (Hint:
How many of these 200 patients testing negative for
HIV are truly HIV-?)