Slide 1 - Stony Brook University

Group 4
AMS 572
Table of Contents
1. Introduction and History
1.1 Part 1: Ahram Woo
1.2 Part 2: Jingwen Zhu
2. Theoretical Background
2.1 Part 1: Xin Yu
2.2 Part 2: Unjung Lee
3. Application of ANCOVA and Summary
3.1 Part 1: Xiaojuan Shang
3.2 Part 2: Younga Choi
3.3 Part 3: Qiao Zhang
1. Introduction and History
Group 4
by Ahram Woo
1. Introduction and History
Individual
by Ahram Woo
Xin Yu
Ahram Woo
Unjung Lee
Jingwen Zhu
Qiao Zhang
Xiaojuan Shang
Younga Choi
1. Introduction and History
1.1 Introduction to ANCOVA
by Ahram Woo
• Analysis of covariance (ANCOVA): an extension of
ANOVA in which main effects and
interactions are assessed on dependent
variable (DV) scores after the DV has been
adjusted for its relationship with one
or more covariates (CVs)
• ANCOVA = ANOVA + Linear Regression
1. Introduction and History
1.1 Introduction to ANCOVA
• R. A. Fisher is credited with
the introduction of ANCOVA in
"Studies in crop variation. IV. The
experimental determination of
the value of top dressings with
cereals", published in 1927 in the Journal of
Agricultural Science, vol. 17, pp. 548-562.
by Ahram Woo
1. Introduction and History
1.1 Introduction to ANCOVA
by Ahram Woo
• ANOVA was developed by R. A. Fisher to assist in
the analysis of data from agricultural
experiments.
• ANOVA compares the means of any number of
experimental conditions without inflating the
Type I error rate.
• ANOVA is a way of determining whether the
average scores of groups differ significantly.
1. Introduction and History
1.2 Introduction to Linear Regression
by Jingwen Zhu
Linear regression models the relationship between
explanatory and dependent variables by
fitting a linear equation to observed data
(i.e., Y = a + bX).
Is there a relationship or not?
Does one variable cause the other?
Tools: scatter plot and correlation coefficient
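The correlation coefficient named on this slide can be computed directly from its definition. The following is a minimal sketch in plain Python; the function name `pearson_r` and the data are hypothetical and chosen only for illustration (the points lie exactly on Y = 1 + 2X, so the coefficient should be 1).

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying exactly on Y = 1 + 2X
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
r = pearson_r(x, y)
print(r)
```

A value of r near +1 or -1 indicates a strong linear relationship; it does not by itself show that one variable causes the other.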
1. Introduction and History
1.2 Introduction to Linear Regression
by Jingwen Zhu
Regression was first studied in depth by the
19th-century scientist Sir Francis Galton:
• Geographer
• Psychologist
• Statistician
• Meteorologist
• Eugenicist
1. Introduction and History
1.2 Introduction to Linear Regression
by Jingwen Zhu
Galton studied data on the relative heights of fathers
and their sons.
Conclusions:
• A taller-than-average father tends to produce
a taller-than-average son.
• The son is likely to be less tall than the father in
terms of his relative position within his own
population.
1. Introduction and History
1.2 Introduction to Linear Regression
by Jingwen Zhu
• ANCOVA is a merger of ANOVA and
regression.
• ANCOVA allows one to compare a variable across 2
or more groups while taking into account (or
correcting for) the variability of other variables, called
covariates.
• The inclusion of covariates can increase
statistical power because it accounts for some
of the variability.
1. Introduction and History
1.2 Introduction to Linear Regression
by Jingwen Zhu
Example: Are MCAT scores significantly
different among medical students who had
different types of undergraduate majors, when
adjusted for year of matriculation?
• Dependent variable (continuous)
  - MCAT total (most recent)
• Fixed factor (categorical variable)
  - Undergraduate major
    1 = Biology/Chemistry
    2 = Other science/health
    3 = Other
• Covariate
  - Year of matriculation
1. Introduction and History
1.2 Introduction to One-way Analysis of Variance
by Jingwen Zhu
• One factor with k levels or groups, e.g., 3
treatment groups in a drug study.
• The main objective is to examine the equality of
the means of the different groups.
• The total variation of the observations (SST) can be split
into two components: variation between groups
(SSA) and variation within groups (SSE).
1. Introduction and History
1.2 Introduction to One-way Analysis of Variance
by Jingwen Zhu
Consider the layout of a study with 16 subjects
that is intended to compare 4 treatment
groups (G1-G4). Each group contains four
subjects.
Subject | G1 | G2 | G3 | G4
S1 | Y11 | Y21 | Y31 | Y41
S2 | Y12 | Y22 | Y32 | Y42
S3 | Y13 | Y23 | Y33 | Y43
S4 | Y14 | Y24 | Y34 | Y44
1. Introduction and History
1.2 Introduction to One-way Analysis of Variance
by Jingwen Zhu
• Model:
$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n$,
$y_{ij}$ is the jth observation of the ith group,
$\alpha_i$ is the effect of the ith group,
$\mu$ is the general mean and $\varepsilon_{ij}$ is the error.
• Assumptions:
– Observations $y_{ij}$ are independent.
– $\varepsilon_{ij}$ are normally distributed with mean
zero and constant standard deviation.
1. Introduction and History
1.2 Introduction to One-way Analysis of Variance
by Jingwen Zhu
• Hypotheses
H0: The means of all groups are equal.
Ha: At least one group mean differs from the
others.
• ANOVA Table
Source of Variance | Sum of Squares | Degrees of Freedom | Mean Square | F
Treatment | SSA | a-1 | MSA = SSA/(a-1) | MSA/MSE
Error | SSE | N-a | MSE = SSE/(N-a) |
Total | SST | N-1 | |
1. Introduction and History
1.2 Introduction to One-way Analysis of Variance
by Jingwen Zhu
SSA (variation between groups) is due to
the differences between groups, e.g.,
different treatment groups or different
doses of the same treatment.
$SSA = n \sum_{i=1}^{a} (\bar{y}_i - \bar{y}_{\cdot})^2$
$\bar{y}_{\cdot} = \frac{1}{N} \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}$
1. Introduction and History
1.2 Introduction to One-way Analysis of Variance
by Jingwen Zhu
Treatment | 1 | 2 | … | a
Observations | y11 | y21 | … | ya1
 | y12 | y22 | … | ya2
 | … | … | … | …
 | y1n1 | y2n2 | … | yana
Sample mean | $\bar{y}_1$ | $\bar{y}_2$ | … | $\bar{y}_a$
1. Introduction and History
1.2 Introduction to One-way Analysis of Variance
by Jingwen Zhu
SSE (variation within groups) is the inherent
variation among the observations within
each group.
$SSE = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$
$s^2 = MSE = \frac{SSE}{N - a}$
1. Introduction and History
1.2 Introduction to One-way Analysis of Variance
by Jingwen Zhu
Treatment | 1 | 2 | … | a
Observations | y11 | y21 | … | ya1
 | y12 | y22 | … | ya2
 | … | … | … | …
 | y1n | y2n | … | yan
Sample mean | $\bar{y}_1$ | $\bar{y}_2$ | … | $\bar{y}_a$
1. Introduction and History
1.2 Introduction to One-way Analysis of Variance
• SST (total sum of squares) is the
sum of SSE and SSA:
$SST = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot})^2$
SST = SSE + SSA
by Jingwen Zhu
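As a numerical check of the decomposition SST = SSE + SSA and of the F ratio MSA/MSE, here is a minimal sketch in plain Python. The function name `one_way_anova` and the three small groups are hypothetical; the sketch stops at the F statistic and does not look up a critical value.

```python
def one_way_anova(groups):
    """Return (SSA, SSE, SST, F) for a list of groups of observations."""
    a = len(groups)                         # number of groups
    N = sum(len(g) for g in groups)         # total number of observations
    grand_mean = sum(sum(g) for g in groups) / N
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: group size times squared distance to the grand mean
    ssa = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-group variation: squared distance of each value to its group mean
    sse = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)
    # Total variation: squared distance of each value to the grand mean
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)

    msa = ssa / (a - 1)
    mse = sse / (N - a)
    return ssa, sse, sst, msa / mse

# Hypothetical data: 3 groups of 3 observations each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
ssa, sse, sst, f_stat = one_way_anova(groups)
print(ssa, sse, sst, f_stat)
```

For these data the group means are 2, 3 and 7 and the grand mean is 4, so SSA = 42, SSE = 6, SST = 48 and F = 21 on (2, 6) degrees of freedom.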
2. Theoretical Background
2.1 Model of ANOVA
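by Xin Yu
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
• $Y_{ij}$: data, the jth observation of the ith group
• $\mu$: grand mean of Y
• $\alpha_i$: effect of the ith group (we mainly focus on whether $\alpha_i = 0$, $i = 1, \ldots, a$)
• $\varepsilon_{ij}$: error, $N(0, \sigma^2)$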
2. Theoretical Background
2.1 Model of Linear Regression
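by Xin Yu
$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \varepsilon_{ij}$
• $Y_{ij}$: data, the (ij)th observation
• $X_{ij}$: predictor
• $\varepsilon_{ij}$: error
• $\beta_0$, $\beta_1$: intercept and slope (we mainly focus on the estimates)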
2. Theoretical Background
2.1 ANCOVA: ANOVA Merged With Linear Regression
by Xin Yu
$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_{\cdot\cdot}) + \varepsilon_{ij}$
• $\alpha_i$: effect of the ith group (we still focus on whether $\alpha_i = 0$, $i = 1, \ldots, a$)
• $X_{ij}$: known covariate
2. Theoretical Background
2.1 How to Perform ANCOVA
Y u a   (X
ij
i
~
Y
ij
by Xin Yu
ij
 X ..)   ij
(adjust)  Y ij  ˆ ( X ij  X ..)
ANOVA Model!
2. Theoretical Background
2.1 How do we get $\hat{\beta}$?
$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_{\cdot\cdot}) + \varepsilon_{ij}$
Within each group, treat $\alpha_i$ as a
constant, and notice that we only
need the estimate of the slope $\beta$,
not the intercept.
by Xin Yu
2. Theoretical Background
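2.1 How do we get $\hat{\beta}$?
(continued)
by Xin Yu
(*) Within each group, do least squares to obtain a slope estimate $\hat{\beta}_i$.
(*) Assume that $\beta_1 = \cdots = \beta_i = \cdots = \beta_a$.
(*) This means that $\alpha_i$ and $\beta$ are independent; in other words,
the covariate has nothing to do with the group effect.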
2. Theoretical Background
2.1 How do we get $\hat{\beta}$?
(continued)
We use the POOLED ESTIMATE of $\beta$.
by Xin Yu
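The formula for the pooled estimate did not survive in this transcript. A common form, assumed here rather than taken from the slides, pools the within-group cross-products and squares: $\hat{\beta} = \sum_i \sum_j (X_{ij} - \bar{X}_{i\cdot})(Y_{ij} - \bar{Y}_{i\cdot}) / \sum_i \sum_j (X_{ij} - \bar{X}_{i\cdot})^2$. The sketch below computes it in plain Python and then forms the adjusted responses $Y_{ij} - \hat{\beta}(X_{ij} - \bar{X}_{\cdot\cdot})$. The function names and data are hypothetical.

```python
def pooled_slope(xs, ys):
    """Pooled within-group slope; xs and ys are lists of per-group lists."""
    sxy = 0.0
    sxx = 0.0
    for x, y in zip(xs, ys):
        mx = sum(x) / len(x)   # group mean of the covariate
        my = sum(y) / len(y)   # group mean of the response
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx += sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def adjust(xs, ys, beta_hat):
    """Adjusted responses: Y_ij - beta_hat * (X_ij - grand mean of X)."""
    n_total = sum(len(x) for x in xs)
    x_grand = sum(sum(x) for x in xs) / n_total
    return [[yi - beta_hat * (xi - x_grand) for xi, yi in zip(x, y)]
            for x, y in zip(xs, ys)]

# Hypothetical data: group 1 follows y = 2x, group 2 follows y = 5 + 2x
xs = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
ys = [[2.0, 4.0, 6.0], [9.0, 11.0, 13.0]]
beta_hat = pooled_slope(xs, ys)
adjusted = adjust(xs, ys, beta_hat)
print(beta_hat, adjusted)
```

With these noise-free data the pooled slope is 2 and the adjusted responses are constant within each group (5 and 10), so the remaining difference between groups is exactly the group effect that an ANOVA on the adjusted values would test.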
2. Theoretical Background
2.1 Model of ANOVA
by Xin Yu
2. Theoretical Background
2.2.A The Simple Linear Regression Model
by Unjung Lee
Y = β0 + β1 X+ ε
Y : dependent (response) variable
X : independent (predictor) variable
β0 : the intercept
β1 : the slope
ε : error term ~ N(0, σ²)
E(Y) = β0 + β1X
2. Theoretical Background
2.2.A The Simple Linear Regression Model
by Unjung Lee
[Figure: the regression line E(Y) = β0 + β1 x plotted against X, marking the intercept β0, the slope β1, and the error ε between an observed y and the line]
2. Theoretical Background
2.2.A The Simple Linear Regression Model
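[Figure: the regression line E(Y) = β0 + β1 x with the conditional distribution $N(\mu_{y|x}, \sigma_{y|x}^2)$ drawn at several values of x]
Identical normal distributions of errors, all centered on the regression
line.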
by Unjung Lee
2. Theoretical Background
2.2.A Assumptions of simple linear regression model
by Unjung Lee
• The relationship between X and Y is a
straight line.
• The errors have a common variance σ² at every value of X.
• The errors are normally distributed.
• The errors are independent.
2. Theoretical Background
2.2.A The least squares(LS) method
by Unjung Lee
2. Theoretical Background
2.2.A The least squares(LS) method
by Unjung Lee
The fitted values and residuals
We can get these from the
normal equations.
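The formulas on these two slides were images and are missing from the transcript. As an illustration only, the sketch below assumes the usual normal equations for the simple linear regression model, $n b_0 + b_1 \sum x_i = \sum y_i$ and $b_0 \sum x_i + b_1 \sum x_i^2 = \sum x_i y_i$, solves them in plain Python, and returns the fitted values and residuals. The function name and data are hypothetical.

```python
def least_squares(x, y):
    """Solve the two normal equations of simple linear regression."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    # 2x2 linear system solved with Cramer's rule
    det = n * sxx - sx * sx
    b1 = (n * sxy - sx * sy) / det      # slope
    b0 = (sy * sxx - sx * sxy) / det    # intercept

    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return b0, b1, fitted, residuals

# Hypothetical data
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]
b0, b1, fitted, residuals = least_squares(x, y)
print(b0, b1, residuals)
```

For these data the fit is approximately ŷ = 0.7 + 2.2x. The residuals sum to zero and are uncorrelated with x, which is exactly what the two normal equations require.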
2. Theoretical Background
2.2.A Fitting a Regression Line
by Unjung Lee
[Figure: three scatter plots of Y against X: the data; three errors from an arbitrary fitted line; three errors from the least squares regression line]
The least squares regression line is the line that
minimizes the sum of squared errors.
2. Theoretical Background
2.2.A Errors in Regression
by Unjung Lee
[Figure: the fitted regression line $\hat{y} = a + bx$ plotted against X, with an observed point $y_i$ at $x_i$]
$\hat{y}_i$: the predicted value of Y for $x_i$
Error: $e_i = y_i - \hat{y}_i$
2. Theoretical Background
2.2.A Multiple linear regression
by Unjung Lee
A statistical model that uses two or more
quantitative and qualitative explanatory variables
(x1, ..., xp) to predict a quantitative
dependent variable Y.
Caution: have at least two quantitative
explanatory variables (rule of thumb).
2. Theoretical Background
2.2.A Dummy-Variable Regression Model
by Unjung Lee
• Involves a categorical X variable with
two levels
– e.g., female-male, employed-not employed, etc.
• Variable levels coded 0 and 1
• Assumes only the intercept is different
– Slopes are constant across categories
2. Theoretical Background
2.2.A Dummy-Variable Model Relationships
by Unjung Lee
[Figure: Y against X1 with two parallel lines sharing the same slope b1: females with intercept b0 + b2, males with intercept b0]
2. Theoretical Background
2.2.A Dummy Variables
by Unjung Lee
• Permits use of qualitative data
(e.g., seasonal, class standing, location, gender).
• 0, 1 coding (nominal data)
• As part of diagnostic checking:
incorporate outliers
(i.e., large residuals)
and influence measures.
2. Theoretical Background
2.2.A Interaction Regression Model
by Unjung Lee
• Hypothesizes interaction between pairs of X variables
– Response to one X variable varies at different
levels of another X variable
• Contains two-way cross-product terms
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
• Can be combined with other models,
e.g., dummy-variable models
2. Theoretical Background
2.2.A Effect of Interaction
by Unjung Lee
• Given:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Without the interaction term, the effect of X1 on Y is
measured by β1
• With the interaction term, the effect of X1 on
Y is measured by β1 + β3X2
– The effect increases as X2i increases (when β3 > 0)
2. Theoretical Background
2.2.A Interaction Example
by Unjung Lee
Y = 1 + 2X1 + 3X2 + 4X1X2
X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
[Figure: the two lines plotted against X1]
The effect (slope) of X1 on Y depends on the value of X2.
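The arithmetic of this example can be reproduced in a few lines. The sketch below uses the coefficients from the slide (1, 2, 3, 4) and evaluates the intercept β0 + β2·X2 and the slope β1 + β3·X2 at X2 = 0 and X2 = 1; the helper names are hypothetical.

```python
def intercept_at(b0, b2, x2):
    """Intercept of the Y-versus-X1 line at a fixed value of X2."""
    return b0 + b2 * x2

def effect_of_x1(b1, b3, x2):
    """Slope of Y with respect to X1 at a fixed value of X2."""
    return b1 + b3 * x2

# Coefficients of Y = 1 + 2*X1 + 3*X2 + 4*X1*X2 from the slide
b0, b1, b2, b3 = 1, 2, 3, 4

# Map each X2 value to the (intercept, slope) of the resulting line in X1
lines = {x2: (intercept_at(b0, b2, x2), effect_of_x1(b1, b3, x2))
         for x2 in (0, 1)}
print(lines)
```

This gives the line 1 + 2X1 when X2 = 0 and 4 + 6X1 when X2 = 1, matching the two equations above.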
2. Theoretical Background
2.2.A The two-way ANOVA
by Unjung Lee
2. Theoretical Background
2.2.A The two-way ANOVA table
by Unjung Lee
Source | df | SS | MS
Factor A | a-1 | SS(A) | MS(A) = SS(A)/(a-1)
Factor B | b-1 | SS(B) | MS(B) = SS(B)/(b-1)
Interaction AB | (a-1)(b-1) | SS(AB) | MS(AB) = SS(AB)/[(a-1)(b-1)]
Error | ab(r-1) | SSE | MSE = SSE/[ab(r-1)]
Total | abr-1 | SS(Total) |
2. Theoretical Background
2.2.A Test homogeneity of variance
by Unjung Lee
2. Theoretical Background
2.2.B Test Whether H0: $\beta_1 = \cdots = \beta_a$
by Xin Yu
2. Theoretical Background
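2.2.B Test Whether H0: $\beta_1 = \cdots = \beta_a$
by Xin Yu
(1) Fit each group with its own slope estimate $\hat{\beta}_i$ and add up the group error sums of squares:
$SSE_G = \sum_{i=1}^{a} SSE_i$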
2. Theoretical Background
2.2.B Test Whether H0: $\beta_1 = \cdots = \beta_a$
(2) SSE is generated by
(*) Random error ε
(*) Differences between distinct $\hat{\beta}_i$
We can calculate SSE based on a common $\hat{\beta}$.
(3) Let $SSA = SSE - SSE_G$
(sum of squares between groups).
SSA is constituted by the differences between the
different $\beta_i$.
by Xin Yu
2. Theoretical Background
2.2.B Test Whether H0: $\beta_1 = \cdots = \beta_a$
by Xin Yu
$df_A = df_e - df_G = [a(n-1) - 1] - a(n-2) = a - 1$
$MSA = \frac{SSA}{df_A} = \frac{SSA}{a-1}$ (mean square between groups)
$MSE_G = \frac{SSE_G}{df_G} = \frac{SSE_G}{a(n-2)}$ (mean square within groups)
Do an F test on MSA and $MSE_G$
to see whether we can
reject our H0:
$F = MSA / MSE_G$
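The steps of this test can be sketched in plain Python. The sketch assumes the standard least-squares identities for the two error sums of squares: with a separate line per group, $SSE_i = S_{yy,i} - S_{xy,i}^2 / S_{xx,i}$, and with a common (pooled) slope, $SSE = \sum_i S_{yy,i} - (\sum_i S_{xy,i})^2 / \sum_i S_{xx,i}$. The function name and data are hypothetical, and the sketch stops at the F statistic without looking up a critical value.

```python
def slope_homogeneity_f(xs, ys):
    """Return (SSE, SSE_G, SSA, F) for the equal-slopes test."""
    a = len(xs)                              # number of groups
    n_total = sum(len(x) for x in xs)
    sxx_sum = sxy_sum = syy_sum = 0.0
    sse_g = 0.0
    for x, y in zip(xs, ys):
        mx = sum(x) / len(x)
        my = sum(y) / len(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        syy = sum((yi - my) ** 2 for yi in y)
        # Error sum of squares when this group gets its own slope
        sse_g += syy - sxy ** 2 / sxx
        sxx_sum += sxx
        sxy_sum += sxy
        syy_sum += syy
    # Error sum of squares when all groups share one pooled slope
    sse = syy_sum - sxy_sum ** 2 / sxx_sum
    ssa = sse - sse_g                        # variation due to unequal slopes
    df_a = a - 1
    df_g = n_total - 2 * a                   # equals a(n-2) for equal group sizes
    f_stat = (ssa / df_a) / (sse_g / df_g)
    return sse, sse_g, ssa, f_stat

# Hypothetical data: two groups with clearly different slopes
xs = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
ys = [[1.0, 3.0, 2.0, 4.0], [4.0, 3.0, 1.0, 0.0]]
sse, sse_g, ssa, f_stat = slope_homogeneity_f(xs, ys)
print(sse, sse_g, ssa, f_stat)
```

For these data SSE is about 14.1, SSE_G about 2.0 and SSA about 12.1, giving F of about 24.2 on (1, 4) degrees of freedom. A large F suggests the slopes differ, in which case the pooled-slope ANCOVA model is not appropriate.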
2. Theoretical Background
2.2.C Test Linear Relationship
Assumption 3:
Test for a linear relationship between the dependent
variable and the covariate.
H0: β = 0
How do we test it?
Use an F test on SSR (sum of squares of regression) and SSE.
by Xin Yu
2. Theoretical Background
2.2.C Test Linear Relationship
by Xin Yu
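How do we calculate SSR and MSR?
SST is obtained by summing the squares of the
differences between each $y_{ij}$ and $\bar{y}_{\cdot}$:
$SST = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot})^2$
SSR is obtained in the same way from each fitted value $\hat{y}_{ij}$:
$SSR = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (\hat{y}_{ij} - \bar{y}_{\cdot})^2$
$MSR = SSR / 1$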
2. Theoretical Background
2.2.C Test Linear Relationship
How do we calculate SSE and MSE?
by Xin Yu
SSE is the error obtained by summing the squares of the
differences between each $y_{ij}$ and its fitted value $\hat{y}_{ij}$:
$SSE = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \hat{y}_{ij})^2$
$MSE = \frac{SSE}{n - a}$
2. Theoretical Background
2.2.C Test Linear Relationship
$F = \frac{MSR}{MSE}$
Based on this test statistic, we decide whether or not to
reject H0 (β = 0).
Assume Assumptions 1 and 2 have already been checked.
(*) If H0 is not rejected (β = 0), we do ANOVA.
(*) Otherwise, we do ANCOVA.
So, any time we want to use ANCOVA, we need to
test the three assumptions first!
by Xin Yu
3. Application of ANCOVA
3.1 Case Introduction
by Xiaojuan Shang
Analysis of covariance (ANCOVA) is a statistical
procedure that allows you to include both
categorical and continuous variables in a single
model. ANCOVA assumes that the regression
coefficients are homogeneous (the same) across the
categorical variable. Violation of this assumption
can lead to incorrect conclusions.
3. Application of ANCOVA
3.1 Case Introduction
by Xiaojuan Shang
Here is an example data file we will use. It
contains 30 subjects who used one of three diets: diet
1 (diet=1), diet 2 (diet=2) and a control group
(diet=3). Before the start of the study, the height of
the subject was measured, and after the study
the weight of the subject was measured.
3. Application of ANCOVA
3.1 Data Structure
by Xiaojuan Shang
3. Application of ANCOVA
3.1 Case Concerns
by Xiaojuan Shang
• Difference between the three diet groups
• Correlation between height and weight
• Difference between the control group and the
other two groups
3. Application of ANCOVA
3.1 Case Data: Compare with ANOVA
by Xiaojuan Shang
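PROC GLM DATA=htwt;
  CLASS diet;                 /* diet is the categorical factor */
  MODEL weight = diet;        /* one-way ANOVA: no covariate yet */
  MEANS diet / DEPONLY;       /* group means of the dependent variable only */
  CONTRAST 'compare 1&2 with control' diet 1 1 -2;
  CONTRAST 'compare diet 1 with 2   ' diet 1 -1 0;
RUN;
QUIT;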
3. Application of ANCOVA
3.1 Case Data: Compare with ANOVA
by Xiaojuan Shang
3. Application of ANCOVA
3.1 Case Data: Compare with ANOVA
by Xiaojuan Shang
3. Application of ANCOVA
3.2 SAS Codes for ANCOVA model: Outline
by Younga Choi
1. Description of data
2. Investigation of equality of slopes for the
groups through the traditional ANOVA model
(homogeneity of regression assumption)
3. When the homogeneity of regression assumption is violated
→ examination of the effect of the group
variable (diet group) at different levels of the
covariate (height levels).
3. Application of ANCOVA
3.2 Data Description
by Younga Choi
•N= 30
•IV:
(1)Diet (three levels)
- diet 1 (diet=1, n=10)
- diet 2 (diet=2, n=10)
- diet 3, control group, (diet=3, n=10)
(2) Height
•DV: weight of the subject, measured after
the study
3. Application of ANCOVA
3.2 Reading the Data & Traditional ANCOVA model
Comparing
means of
diet groups
by Younga Choi
3. Application of ANCOVA
3.2 Homogeneity of Regression Assumption
by Younga Choi
Checking on the Homogeneity of Regression Assumption:
3. Application of ANCOVA
3.2 Homogeneity of Regression Assumption
by Younga Choi
Checking on the Homogeneity of Regression Assumption:
Pairwise Comparisons
3. Application of ANCOVA
3.2 Homogeneity of Regression Assumption
by Younga Choi
When the Homogeneity of Regression Assumption is Violated
3. Application of ANCOVA
3.2 Homogeneity of Regression Assumption
by Younga Choi
Comparing Slope of Diet1 and Diet2 and Diet3 Combined
3. Application of ANCOVA
3.2 Homogeneity of Regression Assumption
by Younga Choi
3. Application of ANCOVA
3.2 Homogeneity of Regression Assumption
by Younga Choi
Overall mean
value of height
3. Application of ANCOVA
3.3 SAS Output- One Way ANOVA Model
by Qiao Zhang
3. Application of ANCOVA
3.3 Standard ANCOVA Model
by Qiao Zhang
The results are consistent with those of the ANOVA
3. Application of ANCOVA
3.3 Assumptions (Homogeneity of Regression)
by Qiao Zhang
3. Application of ANCOVA
3.3 Assumptions (Homogeneity of Regression)
by Qiao Zhang
Diet=1
Dependent Variable: weight
Diet=2
Dependent Variable: weight
Diet=3
Dependent Variable: weight
There is a significant linear relationship between
weight and height in both the diet 2 and diet 3 groups, but
not in the diet 1 group.
3. Application of ANCOVA
3.3 Assumptions (Homogeneity of Regression)
by Qiao Zhang
The diet*height effect is indeed significant,
indicating that the slopes do differ across the three diet
groups.
3. Application of ANCOVA
3.3 Tests : Comparing diet 1 with diet 2
by Qiao Zhang
These results indicate a significant difference between
diet 1 and diet 2 for those 59 inches tall, and a
significant difference for those 64 inches tall. For those
who are tall (i.e., 68 inches), diet 1 and diet 2 are
about equally effective.
3. Application of ANCOVA
3.3 Comparing diets 1 and 2 to the control group
by Qiao Zhang
The difference in weight between diet groups 1 and
2 combined and the control group is significant
at different heights.
3. Application of ANCOVA
3.3 Testing to pool slopes
by Qiao Zhang
The test comparing the slopes of diet group 1 versus
2 and 3 was significant, and the test comparing
the slopes for diet groups 2 versus 3 was not
significant.
We can combine the slopes for diet groups 2 and 3.
3. Application of ANCOVA
3.3 Overall analysis: diet groups 2 and 3
Pooled slopes model
Unpooled slopes model
by Qiao Zhang
3. Application of ANCOVA
3.3 Overall analysis
by Qiao Zhang
3. Application of ANCOVA
3.3 Summary of Outputs
by Qiao Zhang
• The homogeneity of regression assumption is
violated in this data set.
• We then estimated models that have separate
slopes across groups.
• When comparing the control group to diets 1
and 2, we found the control group weighed more
at 3 different levels of height (59 inches, 64 inches
and 68 inches).
• When comparing diets 1 and 2, we found
diet 2 to be more effective at 59 and 64 inches,
but there was no difference at 68 inches.