Statistics 846.3(02) Statistics 349.3(02) Lecture Notes

MANOVA
Multivariate Analysis of Variance
One way Analysis of Variance
(ANOVA)
Comparing k Populations
The F test – for comparing k means
Situation
• We have k normal populations
• Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of population i.
• i = 1, 2, 3, … k.
• Note: we assume that the standard deviation for each population is the same.
$\sigma_1 = \sigma_2 = \dots = \sigma_k = \sigma$
We want to test
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
against
$H_A: \mu_i \neq \mu_j$ for at least one pair i, j
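The F statistic
$$F = \frac{\dfrac{1}{k-1}\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2}{\dfrac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2}$$
where $x_{ij}$ = the jth observation in the ith sample ($i = 1, 2, \dots, k$ and $j = 1, 2, \dots, n_i$),
$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$ = mean for the ith sample ($i = 1, 2, \dots, k$),
$N = \sum_{i=1}^{k} n_i$ = Total sample size,
$\bar{x} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = Overall mean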
The ANOVA table
Source    d.f.    S.S.     M.S.                    F
Between   k - 1   $SS_B$   $MS_B = SS_B/(k-1)$     $F = MS_B/MS_W$
Within    N - k   $SS_W$   $MS_W = SS_W/(N-k)$
where
$$SS_B = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2, \qquad SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$$
The ANOVA table is a tool for displaying the
computations for the F test. It is very important when
the Between Sample variability is due to two or more
factors
Computing Formulae:
Compute
1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ = Total for sample i
2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = Grand Total
3) $N = \sum_{i=1}^{k} n_i$ = Total sample size
4) $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$
5) $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$
The data
• Assume we have collected data from each of
k populations
• Let xi1, xi2 , xi3 , … denote the ni observations
from population i.
• i = 1, 2, 3, … k.
Then
1) $SS_{Between} = \sum_{i=1}^{k}\dfrac{T_i^2}{n_i} - \dfrac{G^2}{N}$
2) $SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k}\dfrac{T_i^2}{n_i}$
3) $F = \dfrac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)}$
Anova Table
Source    d.f.    Sum of Squares   Mean Square   F-ratio
Between   k - 1   SSBetween        MSBetween     MSB/MSW
Within    N - k   SSWithin         MSWithin
Total     N - 1   SSTotal
where MS = SS/df
Example
In the following example we are comparing weight
gains resulting from the following six diets
1. Diet 1 - High Protein , Beef
2. Diet 2 - High Protein , Cereal
3. Diet 3 - High Protein , Pork
4. Diet 4 - Low protein , Beef
5. Diet 5 - Low protein , Cereal
6. Diet 6 - Low protein , Pork
Gains in weight (grams) for rats under six diets
differing in level of protein (High or Low)
and source of protein (Beef, Cereal, or Pork)
Diet   Observations                                      Mean    Std. Dev.   Σx     Σx²
1      73  102  118  104   81  107  100   87  117  111   100.0   15.14       1000   102062
2      98   74   56  111   95   88   82   77   86   92    85.9   15.02        859    75819
3      94   79   96   98  102  102  108   91  120  105    99.5   10.92        995   100075
4      90   76   90   64   86   51   72   90   95   78    79.2   13.89        792    64462
5     107   95   97   80   98   74   74   67   89   58    83.9   15.71        839    72613
6      49   82   73   86   81   97  106   70   61   82    78.7   16.55        787    64401
Thus
$$SS_{Between} = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{G^2}{N} = 467846 - \frac{5272^2}{60} = 4612.933$$
$$SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k}\frac{T_i^2}{n_i} = 479432 - 467846 = 11586$$
$$F = \frac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)} = \frac{4612.933/5}{11586/54} = \frac{922.6}{214.56} = 4.3$$
$F_{0.05} = 2.386$ with $\nu_1 = 5$ and $\nu_2 = 54$
Thus since F > 2.386 we reject H0
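The arithmetic above can be re-traced with a short script. This is only a sketch: the function name `one_way_anova` and the variable names are illustrative choices, not part of the lecture, and the data are the six diet samples tabulated earlier.

```python
# Weight gains (grams) for the six diets, copied from the data table.
diets = [
    [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
]


def one_way_anova(samples):
    """Return (SS_between, SS_within, F) using the computing formulae."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)            # N
    totals = [sum(s) for s in samples]                # T_i
    grand_total = sum(totals)                         # G
    sum_sq = sum(x * x for s in samples for x in s)   # sum of x_ij^2
    sum_t2_over_n = sum(t * t / len(s) for t, s in zip(totals, samples))
    ss_between = sum_t2_over_n - grand_total ** 2 / n_total
    ss_within = sum_sq - sum_t2_over_n
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return ss_between, ss_within, f_ratio


ss_between, ss_within, f_ratio = one_way_anova(diets)
print(round(ss_between, 3), round(ss_within, 3), round(f_ratio, 2))
```

The script only reproduces the sums of squares and the F-ratio; the critical value 2.386 is taken from tables, as in the notes.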
Anova Table
Source    d.f.   Sum of Squares   Mean Square   F-ratio
Between   5      4612.933         922.587       4.3** (p = 0.0023)
Within    54     11586.000        214.556
Total     59     16198.933
* - Significant at 0.05 (not 0.01)
** - Significant at 0.01
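Equivalence of the F-test and the t-test when k = 2
the t-test
$$t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}}$$
the F-test
$$F = \frac{s^2_{Between}}{s^2_{Pooled}} = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \big/ (k-1)}{\sum_{i=1}^{k}(n_i-1)s_i^2 \big/ \left(\sum_{i=1}^{k} n_i - k\right)} = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2}{\left[(n_1-1)s_1^2 + (n_2-1)s_2^2\right] \big/ (n_1+n_2-2)}$$
denominator $= s^2_{Pooled}$
numerator $= n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2$
$$n_1(\bar{x}_1 - \bar{x})^2 = n_1\left(\bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1+n_2}\right)^2 = \frac{n_1 n_2^2}{(n_1+n_2)^2}(\bar{x}_1 - \bar{x}_2)^2$$
$$n_2(\bar{x}_2 - \bar{x})^2 = n_2\left(\bar{x}_2 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1+n_2}\right)^2 = \frac{n_1^2 n_2}{(n_1+n_2)^2}(\bar{x}_1 - \bar{x}_2)^2$$
$$n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 = \frac{n_1 n_2^2 + n_1^2 n_2}{(n_1+n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 = \frac{n_1 n_2}{n_1+n_2}(\bar{x}_1 - \bar{x}_2)^2 = \frac{(\bar{x}_1 - \bar{x}_2)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$$
Hence
$$F = \frac{(\bar{x}_1 - \bar{x}_2)^2}{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)s^2_{Pooled}} = t^2$$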
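The identity F = t² can also be checked numerically. The sketch below uses diets 1 and 2 from the earlier example as the two samples (an arbitrary choice made here for illustration); the helper names are hypothetical.

```python
from math import sqrt

x = [73, 102, 118, 104, 81, 107, 100, 87, 117, 111]  # diet 1
y = [98, 74, 56, 111, 95, 88, 82, 77, 86, 92]        # diet 2


def mean(v):
    return sum(v) / len(v)


def sum_sq_dev(v):
    """Sum of squared deviations about the sample mean, (n - 1) * s^2."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v)


n, m = len(x), len(y)

# Two-sample t statistic with the pooled variance.
s2_pooled = (sum_sq_dev(x) + sum_sq_dev(y)) / (n + m - 2)
t = (mean(x) - mean(y)) / sqrt(s2_pooled * (1 / n + 1 / m))

# One-way ANOVA F statistic with k = 2 groups.
grand_mean = (sum(x) + sum(y)) / (n + m)
ss_between = n * (mean(x) - grand_mean) ** 2 + m * (mean(y) - grand_mean) ** 2
f_ratio = (ss_between / (2 - 1)) / s2_pooled

print(t ** 2, f_ratio)  # the two values agree up to floating-point rounding
```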
Factorial Experiments
Analysis of Variance
• Dependent variable Y
• k Categorical independent variables A, B, C,
… (the Factors)
• Let
– a = the number of categories of A
– b = the number of categories of B
– c = the number of categories of C
– etc.
The Completely Randomized Design
• We form the set of all treatment combinations
– the set of all combinations of the k factors
• Total number of treatment combinations
– t = abc….
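• In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination.
– Total number of experimental units N = nt = nabc…
The treatment combinations can be thought of as arranged in a k-dimensional rectangular block
[Figure: a two-dimensional grid with the levels 1, 2, …, a of A along one side and the levels 1, 2, …, b of B along the other, and a three-dimensional block with axes A, B and C.]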
• The Completely Randomized Design described above, with the same number n of observations for every treatment combination, is called balanced.
• If the number of observations per treatment combination is unequal the design is called unbalanced (resulting in mathematically more complex analysis and computations).
• If for some of the treatment combinations there are no observations the design is called incomplete. (In this case it may happen that some of the parameters - main effects and interactions - cannot be estimated.)
Example
In this example we are examining the effect of the level of protein A (High or Low) and the source of protein B (Beef, Cereal, or Pork) on weight gains (grams) in rats.
We have n = 10 test animals randomly assigned to each of the k = 6 diets.
The k = 6 diets are the 6 = 3×2 Level-Source combinations
1. High - Beef
2. High - Cereal
3. High - Pork
4. Low - Beef
5. Low - Cereal
6. Low - Pork
Table
Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)
Level of Protein    High Protein               Low protein
Source of Protein   Beef    Cereal   Pork      Beef    Cereal   Pork
Diet                1       2        3         4       5        6
                    73      98       94        90      107      49
                    102     74       79        76      95       82
                    118     56       96        90      97       73
                    104     111      98        64      80       86
                    81      95       102       86      98       81
                    107     88       102       51      74       97
                    100     82       108       72      74       106
                    87      77       91        90      67       70
                    117     86       120       95      89       61
                    111     92       105       78      58       82
Mean                100.0   85.9     99.5      79.2    83.9     78.7
Std. Dev.           15.14   15.02    10.92     13.89   15.71    16.55
Treatment combinations
                   Source of Protein
Level of Protein   Beef     Cereal   Pork
High               Diet 1   Diet 2   Diet 3
Low                Diet 4   Diet 5   Diet 6
Summary Table of Means
                   Source of Protein
Level of Protein   Beef     Cereal   Pork    Overall
High               100.00   85.90    99.50   95.13
Low                79.20    83.90    78.70   80.60
Overall            89.60    84.90    89.10   87.87
Profiles of the response relative to a factor
A graphical representation of the effect of a factor on a response variable (dependent variable)
[Figure: Profile Y for A. Vertical axis: Y; horizontal axis: Levels of A (1, 2, 3, …, a). Annotations: each plotted value could be for an individual case or averaged over a group of cases; the profile could be for a specific level of another factor or averaged over the levels of another factor.]
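[Figure: Profiles of Weight Gain for Source and Level of Protein. Vertical axis: Weight Gain (70 to 110); horizontal axis: source of protein (Beef, Cereal, Pork); one profile each for High Protein, Low Protein and Overall.]
[Figure: Profiles of Weight Gain for Source and Level of Protein. Vertical axis: Weight Gain (70 to 110); horizontal axis: level of protein (High Protein, Low Protein); one profile each for Beef, Cereal, Pork and Overall.]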
Example – Four factor experiment
Four factors are studied for their effect on Y (luster
of paint film). The four factors are:
1)
Film Thickness - (1 or 2 mils)
2)
Drying conditions (Regular or Special)
3)
Length of wash (10,30,40 or 60 Minutes), and
4)
Temperature of wash (92 ˚C or 100 ˚C)
Two observations of film luster (Y) are taken
for each treatment combination
The data are tabulated below:
          Regular Dry                    Special Dry
Minutes   92 ˚C         100 ˚C           92 ˚C         100 ˚C
1-mil Thickness
20        3.4   3.4     19.6   14.5      2.1   3.8     17.2   13.4
30        4.1   4.1     17.5   17.0      4.0   4.6     13.5   14.3
40        4.9   4.2     17.6   15.2      5.1   3.3     16.0   17.8
60        5.0   4.9     20.9   17.1      8.3   4.3     17.5   13.9
2-mil Thickness
20        5.5   3.7     26.6   29.5      4.5   4.5     25.6   22.5
30        5.7   6.1     31.6   30.2      5.9   5.9     29.2   29.8
40        5.5   5.6     30.5   30.2      5.5   5.8     32.6   27.4
60        7.2   6.0     31.4   29.6      8.0   9.9     33.5   29.5
Definition:
A factor is said to not affect the response if
the profile of the factor is horizontal for all
combinations of levels of the other factors:
No change in the response when you change
the levels of the factor (true for all
combinations of levels of the other factors)
Otherwise the factor is said to affect the
response:
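[Figure: Profile Y for A, where A affects the response. Vertical axis: Y; horizontal axis: Levels of A (1, 2, 3, …, a); one profile for each level of B.]
[Figure: Profile Y for A, where A has no effect on the response. Vertical axis: Y; horizontal axis: Levels of A (1, 2, 3, …, a); one profile for each level of B.]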
Definition:
• Two (or more) factors are said to interact if
changes in the response when you change
the level of one factor depend on the
level(s) of the other factor(s).
• Profiles of the factor for different levels of
the other factor(s) are not parallel
• Otherwise the factors are said to be
additive .
• Profiles of the factor for different levels of
the other factor(s) are parallel.
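A rough numerical version of the "parallel profiles" idea can be sketched with the cell means of the rat diet example. If Level and Source of protein were exactly additive, the High minus Low gap would be the same for every source. The names `cell_means`, `gaps` and `spread` are illustrative only, and whether the observed departure from parallelism is statistically significant still has to be settled by the F-test for interaction.

```python
# Cell means (grams) from the Summary Table of Means.
cell_means = {
    "High": {"Beef": 100.0, "Cereal": 85.9, "Pork": 99.5},
    "Low": {"Beef": 79.2, "Cereal": 83.9, "Pork": 78.7},
}
sources = ["Beef", "Cereal", "Pork"]

# Gap between the High and Low profiles at each source of protein.
gaps = {s: cell_means["High"][s] - cell_means["Low"][s] for s in sources}

# For exactly parallel (additive) profiles the gaps are all equal,
# so the spread between the largest and smallest gap would be 0.
spread = max(gaps.values()) - min(gaps.values())

for s in sources:
    print(s, round(gaps[s], 1))
print("spread of gaps:", round(spread, 1))
```

Here the gap is about 20.8 grams for Beef and Pork but only about 2.0 grams for Cereal, so the sample profiles are visibly not parallel.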
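[Figure: Interacting factors A and B. Vertical axis: Y; horizontal axis: Levels of A (1, 2, 3, …, a); one profile for each level of B.]
[Figure: Additive factors A and B. Vertical axis: Y; horizontal axis: Levels of A (1, 2, 3, …, a); one profile for each level of B.]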
• If two (or more) factors interact each factor affects the response.
• If two (or more) factors are additive it still remains to be determined if the factors affect the response
• In factorial experiments we are interested in determining
– which factors affect the response and
– which groups of factors interact.
The testing in factorial experiments
1. Test first the higher order interactions.
2. If an interaction is present there is no need
to test lower order interactions or main
effects involving those factors. All factors
in the interaction affect the response and
they interact
3. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.
Models for factorial
Experiments
The Single Factor Experiment
Situation
• We have t = a treatment combinations
• Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of observations from treatment i.
• i = 1, 2, 3, … a.
• Note: we assume that the standard deviation for each population is the same.
$\sigma_1 = \sigma_2 = \dots = \sigma_a = \sigma$
The data
• Assume we have collected data for each of
the a treatments
• Let yi1, yi2 , yi3 , … , yin denote the n
observations for treatment i.
• i = 1, 2, 3, … a.
The model
Note:
$$y_{ij} = \mu_i + (y_{ij} - \mu_i) = \mu_i + \varepsilon_{ij} = \mu + (\mu_i - \mu) + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where
$\varepsilon_{ij} = y_{ij} - \mu_i$ has N(0, $\sigma^2$) distribution
$\mu = \frac{1}{a}\sum_{i=1}^{a}\mu_i$ (overall mean effect)
$\alpha_i = \mu_i - \mu$ (Effect of Factor A)
Note: $\sum_{i=1}^{a}\alpha_i = 0$ by their definition.
Model 1:
$y_{ij}$ (i = 1, … , a; j = 1, …, n) are independent Normal with mean $\mu_i$ and variance $\sigma^2$.
Model 2:
$$y_{ij} = \mu_i + \varepsilon_{ij}$$
where $\varepsilon_{ij}$ (i = 1, … , a; j = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$.
Model 3:
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where $\varepsilon_{ij}$ (i = 1, … , a; j = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$ and
$$\sum_{i=1}^{a}\alpha_i = 0$$
The Two Factor Experiment
Situation
• We have t = ab treatment combinations
• Let $\mu_{ij}$ and $\sigma$ denote the mean and standard deviation of observations from the treatment combination when A = i and B = j.
• i = 1, 2, 3, … a, j = 1, 2, 3, … b.
The data
• Assume we have collected data (n observations)
for each of the t = ab treatment combinations.
• Let yij1, yij2 , yij3 , … , yijn denote the n observations
for treatment combination - A = i, B = j.
• i = 1, 2, 3, … a, j = 1, 2, 3, … b.
The model
Note:
$$y_{ijk} = \mu_{ij} + (y_{ijk} - \mu_{ij}) = \mu_{ij} + \varepsilon_{ijk}$$
$$= \mu + (\mu_{i\cdot} - \mu) + (\mu_{\cdot j} - \mu) + (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu) + \varepsilon_{ijk}$$
$$= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where
$\varepsilon_{ijk} = y_{ijk} - \mu_{ij}$ has N(0, $\sigma^2$) distribution,
$$\mu = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\mu_{ij}, \qquad \mu_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b}\mu_{ij} \quad\text{and}\quad \mu_{\cdot j} = \frac{1}{a}\sum_{i=1}^{a}\mu_{ij}$$
$$\alpha_i = \mu_{i\cdot} - \mu, \qquad \beta_j = \mu_{\cdot j} - \mu,$$
and
$$(\alpha\beta)_{ij} = \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu$$
Note:
$$\sum_{i=1}^{a}\alpha_i = 0, \qquad \sum_{j=1}^{b}\beta_j = 0, \qquad \sum_{i=1}^{a}(\alpha\beta)_{ij} = 0 \quad\text{and}\quad \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0$$
by their definition.
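Model:
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
(Mean effect: $\mu$; Main effects: $\alpha_i$ and $\beta_j$; Interaction: $(\alpha\beta)_{ij}$; Error: $\varepsilon_{ijk}$)
where $\varepsilon_{ijk}$ (i = 1, … , a; j = 1, …, b; k = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$ and
$$\sum_{i=1}^{a}\alpha_i = 0, \qquad \sum_{j=1}^{b}\beta_j = 0 \qquad\text{and}\qquad \sum_{i=1}^{a}(\alpha\beta)_{ij} = \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0$$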
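Maximum Likelihood Estimates
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where $\varepsilon_{ijk}$ (i = 1, … , a; j = 1, …, b; k = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$. The maximum likelihood estimates are
$$\hat{\mu} = \bar{y}_{\cdot\cdot\cdot} = \frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}$$
$$\hat{\alpha}_i = \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot} = \frac{1}{bn}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk} - \bar{y}_{\cdot\cdot\cdot}$$
$$\hat{\beta}_j = \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot} = \frac{1}{an}\sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk} - \bar{y}_{\cdot\cdot\cdot}$$
$$\widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot} = \frac{1}{n}\sum_{k=1}^{n} y_{ijk} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}$$
$$\hat{\sigma}^2 = \frac{1}{nab}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2 = \frac{1}{nab}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(y_{ijk} - \left[\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \widehat{(\alpha\beta)}_{ij}\right]\right)^2$$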
This is not an unbiased estimator of $\sigma^2$ (usually the case when estimating variance.)
The unbiased estimator results when we divide by ab(n - 1) instead of abn
The unbiased estimator of $\sigma^2$ is
$$s^2 = \frac{1}{ab(n-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2 = \frac{1}{ab(n-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(y_{ijk} - \left[\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \widehat{(\alpha\beta)}_{ij}\right]\right)^2$$
$$= \frac{1}{ab(n-1)}SS_{Error} = MS_{Error}$$
where
$$SS_{Error} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2$$
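Testing for Interaction:
We want to test:
H0: $(\alpha\beta)_{ij}$ = 0 for all i and j, against
HA: $(\alpha\beta)_{ij}$ ≠ 0 for at least one i and j.
The test statistic
$$F = \frac{MS_{AB}}{MS_{Error}} = \frac{\dfrac{1}{(a-1)(b-1)}SS_{AB}}{MS_{Error}}$$
where
$$SS_{AB} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\widehat{(\alpha\beta)}_{ij}^{\,2} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}\right)^2$$
We reject
H0: $(\alpha\beta)_{ij}$ = 0 for all i and j,
if
$$F = \frac{MS_{AB}}{MS_{Error}} > F_{\alpha}\big((a-1)(b-1),\ ab(n-1)\big)$$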
Testing for the Main Effect of A:
We want to test:
H0: $\alpha_i$ = 0 for all i, against
HA: $\alpha_i$ ≠ 0 for at least one i.
The test statistic
$$F = \frac{MS_A}{MS_{Error}} = \frac{\dfrac{1}{a-1}SS_A}{MS_{Error}}$$
where
$$SS_A = nb\sum_{i=1}^{a}\hat{\alpha}_i^2 = nb\sum_{i=1}^{a}\left(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2$$
We reject
H0: $\alpha_i$ = 0 for all i,
if
$$F = \frac{MS_A}{MS_{Error}} > F_{\alpha}\big((a-1),\ ab(n-1)\big)$$
Testing for the Main Effect of B:
We want to test:
H0: $\beta_j$ = 0 for all j, against
HA: $\beta_j$ ≠ 0 for at least one j.
The test statistic
$$F = \frac{MS_B}{MS_{Error}} = \frac{\dfrac{1}{b-1}SS_B}{MS_{Error}}$$
where
$$SS_B = na\sum_{j=1}^{b}\hat{\beta}_j^2 = na\sum_{j=1}^{b}\left(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2$$
We reject
H0: $\beta_j$ = 0 for all j,
if
$$F = \frac{MS_B}{MS_{Error}} > F_{\alpha}\big((b-1),\ ab(n-1)\big)$$
The ANOVA Table
Source   S.S.      d.f.             MS = SS/df   F
A        SSA       a - 1            MSA          MSA / MSError
B        SSB       b - 1            MSB          MSB / MSError
AB       SSAB      (a - 1)(b - 1)   MSAB         MSAB / MSError
Error    SSError   ab(n - 1)        MSError
Total    SSTotal   abn - 1
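Computing Formulae
Let
$$T_{\cdot\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}$$
$$T_{i\cdot\cdot} = \sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}, \qquad T_{\cdot j\cdot} = \sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk}, \qquad T_{ij\cdot} = \sum_{k=1}^{n} y_{ijk}$$
Then
$$SS_{Total} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \frac{T_{\cdot\cdot\cdot}^2}{nab}$$
$$SS_A = \sum_{i=1}^{a}\frac{T_{i\cdot\cdot}^2}{nb} - \frac{T_{\cdot\cdot\cdot}^2}{nab}, \qquad SS_B = \sum_{j=1}^{b}\frac{T_{\cdot j\cdot}^2}{na} - \frac{T_{\cdot\cdot\cdot}^2}{nab}$$
$$SS_{AB} = \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{T_{ij\cdot}^2}{n} - \sum_{i=1}^{a}\frac{T_{i\cdot\cdot}^2}{nb} - \sum_{j=1}^{b}\frac{T_{\cdot j\cdot}^2}{na} + \frac{T_{\cdot\cdot\cdot}^2}{nab},$$
and $SS_{Error} = SS_{Total} - SS_A - SS_B - SS_{AB}$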
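As a sketch of how these computing formulae work in practice, the script below applies them to the rat diet data from the earlier example, with A = level of protein (a = 2), B = source of protein (b = 3) and n = 10. The variable names are illustrative, and the critical values needed to complete the tests would still have to be looked up in F tables.

```python
# data[i][j] holds the n = 10 weight gains for protein level i and source j.
data = [
    [  # High protein
        [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],  # Beef
        [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],        # Cereal
        [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],    # Pork
    ],
    [  # Low protein
        [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],         # Beef
        [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],        # Cereal
        [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],        # Pork
    ],
]
a, b, n = len(data), len(data[0]), len(data[0][0])

# Totals T_ij., T_i.., T_.j. and T_...
T_ij = [[sum(cell) for cell in row] for row in data]
T_i = [sum(row) for row in T_ij]
T_j = [sum(T_ij[i][j] for i in range(a)) for j in range(b)]
T = sum(T_i)
cf = T ** 2 / (n * a * b)  # correction term T...^2 / (nab)

# Sums of squares from the computing formulae.
ss_total = sum(y * y for row in data for cell in row for y in cell) - cf
ss_a = sum(t * t for t in T_i) / (n * b) - cf
ss_b = sum(t * t for t in T_j) / (n * a) - cf
ss_ab = (sum(t * t for row in T_ij for t in row) / n
         - sum(t * t for t in T_i) / (n * b)
         - sum(t * t for t in T_j) / (n * a)
         + cf)
ss_error = ss_total - ss_a - ss_b - ss_ab

# Mean squares and F-ratios.
ms_error = ss_error / (a * b * (n - 1))
f_a = (ss_a / (a - 1)) / ms_error
f_b = (ss_b / (b - 1)) / ms_error
f_ab = (ss_ab / ((a - 1) * (b - 1))) / ms_error

print("SS_A", round(ss_a, 3), "SS_B", round(ss_b, 3), "SS_AB", round(ss_ab, 3))
print("SS_Error", round(ss_error, 3), "SS_Total", round(ss_total, 3))
print("F_A", round(f_a, 3), "F_B", round(f_b, 3), "F_AB", round(f_ab, 3))
```

Note that SS_A + SS_B + SS_AB adds up to the Between sum of squares 4612.933 of the one-way analysis of the same six diets, and SS_Error equals the Within sum of squares 11586.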
MANOVA
Multivariate Analysis of Variance
One way Multivariate Analysis
of Variance (MANOVA)
Comparing k p-variate Normal
Populations
The F test – for comparing k means
Situation
• We have k normal populations
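• Let $\boldsymbol{\mu}_i$ and $\Sigma_i$ denote the mean vector and covariance matrix of population i.
• i = 1, 2, 3, … k.
• Note: we assume that the covariance matrix for each population is the same.
$\Sigma_1 = \Sigma_2 = \dots = \Sigma_k = \Sigma$
We want to test
$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \boldsymbol{\mu}_3 = \dots = \boldsymbol{\mu}_k$
against
$H_A: \boldsymbol{\mu}_i \neq \boldsymbol{\mu}_j$ for at least one pair i, j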
The data
• Assume we have collected data from each of
k populations
• Let $\mathbf{x}_{i1}, \mathbf{x}_{i2}, \dots, \mathbf{x}_{in}$ denote the n observation vectors from population i.
• i = 1, 2, 3, … k.
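Computing Formulae:
Compute
1) $\mathbf{T}_i = \sum_{j=1}^{n}\mathbf{x}_{ij} = \begin{pmatrix}\sum_{j=1}^{n} x_{1ij} \\ \vdots \\ \sum_{j=1}^{n} x_{pij}\end{pmatrix} = \begin{pmatrix} T_{1i} \\ \vdots \\ T_{pi}\end{pmatrix}$ = Total vector for sample i
2) $\mathbf{G} = \sum_{i=1}^{k}\mathbf{T}_i = \sum_{i=1}^{k}\sum_{j=1}^{n}\mathbf{x}_{ij} = \begin{pmatrix} G_1 \\ \vdots \\ G_p\end{pmatrix}$ = Grand Total vector
3) $N = kn$ = Total sample size
4) $\sum_{i=1}^{k}\sum_{j=1}^{n}\mathbf{x}_{ij}\mathbf{x}_{ij}' = \begin{pmatrix}\sum_{i}\sum_{j} x_{1ij}^2 & \cdots & \sum_{i}\sum_{j} x_{1ij}x_{pij} \\ \vdots & \ddots & \vdots \\ \sum_{i}\sum_{j} x_{1ij}x_{pij} & \cdots & \sum_{i}\sum_{j} x_{pij}^2\end{pmatrix}$
5) $\frac{1}{n}\sum_{i=1}^{k}\mathbf{T}_i\mathbf{T}_i' = \begin{pmatrix}\frac{1}{n}\sum_{i} T_{1i}^2 & \cdots & \frac{1}{n}\sum_{i} T_{1i}T_{pi} \\ \vdots & \ddots & \vdots \\ \frac{1}{n}\sum_{i} T_{1i}T_{pi} & \cdots & \frac{1}{n}\sum_{i} T_{pi}^2\end{pmatrix}$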
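Let
$$H = \frac{1}{n}\sum_{i=1}^{k}\mathbf{T}_i\mathbf{T}_i' - \frac{1}{N}\mathbf{G}\mathbf{G}' = \begin{pmatrix}\frac{1}{n}\sum_{i} T_{1i}^2 - \frac{G_1^2}{N} & \cdots & \frac{1}{n}\sum_{i} T_{1i}T_{pi} - \frac{G_1G_p}{N} \\ \vdots & \ddots & \vdots \\ \frac{1}{n}\sum_{i} T_{1i}T_{pi} - \frac{G_1G_p}{N} & \cdots & \frac{1}{n}\sum_{i} T_{pi}^2 - \frac{G_p^2}{N}\end{pmatrix}$$
$$= \begin{pmatrix} n\sum_{i=1}^{k}(\bar{x}_{1i} - \bar{x}_1)^2 & \cdots & n\sum_{i=1}^{k}(\bar{x}_{1i} - \bar{x}_1)(\bar{x}_{pi} - \bar{x}_p) \\ \vdots & \ddots & \vdots \\ n\sum_{i=1}^{k}(\bar{x}_{1i} - \bar{x}_1)(\bar{x}_{pi} - \bar{x}_p) & \cdots & n\sum_{i=1}^{k}(\bar{x}_{pi} - \bar{x}_p)^2\end{pmatrix}$$
= the Between SS and SP matrix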
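Let
$$E = \sum_{i=1}^{k}\sum_{j=1}^{n}\mathbf{x}_{ij}\mathbf{x}_{ij}' - \frac{1}{n}\sum_{i=1}^{k}\mathbf{T}_i\mathbf{T}_i' = \begin{pmatrix}\sum_{i}\sum_{j} x_{1ij}^2 - \frac{1}{n}\sum_{i} T_{1i}^2 & \cdots & \sum_{i}\sum_{j} x_{1ij}x_{pij} - \frac{1}{n}\sum_{i} T_{1i}T_{pi} \\ \vdots & \ddots & \vdots \\ \sum_{i}\sum_{j} x_{1ij}x_{pij} - \frac{1}{n}\sum_{i} T_{1i}T_{pi} & \cdots & \sum_{i}\sum_{j} x_{pij}^2 - \frac{1}{n}\sum_{i} T_{pi}^2\end{pmatrix}$$
$$= \begin{pmatrix}\sum_{i}\sum_{j}(x_{1ij} - \bar{x}_{1i})^2 & \cdots & \sum_{i}\sum_{j}(x_{1ij} - \bar{x}_{1i})(x_{pij} - \bar{x}_{pi}) \\ \vdots & \ddots & \vdots \\ \sum_{i}\sum_{j}(x_{1ij} - \bar{x}_{1i})(x_{pij} - \bar{x}_{pi}) & \cdots & \sum_{i}\sum_{j}(x_{pij} - \bar{x}_{pi})^2\end{pmatrix}$$
= the Within SS and SP matrix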
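The Manova Table
Source    SS and SP matrix
Between   $H = \begin{pmatrix} h_{11} & \cdots & h_{1p} \\ \vdots & \ddots & \vdots \\ h_{1p} & \cdots & h_{pp}\end{pmatrix}$
Within    $E = \begin{pmatrix} e_{11} & \cdots & e_{1p} \\ \vdots & \ddots & \vdots \\ e_{1p} & \cdots & e_{pp}\end{pmatrix}$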
There are several test statistics for testing
$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \boldsymbol{\mu}_3 = \dots = \boldsymbol{\mu}_k$
against
$H_A: \boldsymbol{\mu}_i \neq \boldsymbol{\mu}_j$ for at least one pair i, j
1. Roy's largest root
$\lambda_1$ = largest eigenvalue of $HE^{-1}$
This test statistic is derived using Roy's union intersection principle
2. Wilks' lambda (Λ)
$$\Lambda = \frac{|E|}{|H + E|} = \frac{1}{|HE^{-1} + I|}$$
This test statistic is derived using the generalized Likelihood ratio principle
3. Lawley-Hotelling trace statistic
$T_0^2 = \mathrm{tr}(HE^{-1})$ = sum of the eigenvalues of $HE^{-1}$
4. Pillai trace statistic (V)
$$V = \mathrm{tr}\left(H(H + E)^{-1}\right)$$
Example
In the following study, n = 15 first year university students from each of three different School regions (A, B and C), who were each taking the following four courses (Math, Biology, English and Sociology), were observed. The marks on these courses are tabulated on the following slide:
The data
                                         Educational Region
          A                                    B                                    C
Student   Math  Biology  English  Sociology    Math  Biology  English  Sociology    Math  Biology  English  Sociology
1         62    65       67       76           65    55       35       43           47    47       98       78
2         54    61       75       70           87    81       59       64           57    69       68       45
3         53    53       53       59           75    67       56       68           65    71       77       62
4         48    56       73       81           74    70       55       66           41    64       68       58
5         60    55       49       60           83    71       40       52           56    54       86       64
6         55    52       34       41           59    48       48       57           63    73       88       76
7         76    71       35       40           61    47       46       54           43    62       84       78
8         58    52       58       46           81    77       51       45           28    47       65       58
9         75    71       60       59           77    68       42       49           47    54       90       78
10        55    51       69       75           82    84       63       70           42    44       79       73
11        72    74       64       59           68    64       35       44           50    53       89       89
12        72    75       51       47           60    53       60       65           46    61       91       82
13        76    69       69       57           94    88       51       63           74    78       99       86
14        44    48       65       65           96    88       67       81           63    66       94       86
15        89    71       59       67           84    75       46       67           69    82       78       73
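Summary Statistics
(variables in the order Math, Biology, English, Sociology)
$$\bar{\mathbf{x}}_A = \begin{pmatrix} 63.267 \\ 61.600 \\ 58.733 \\ 60.133\end{pmatrix}, \qquad S_A = \begin{pmatrix} 160.638 & 104.829 & -32.638 & -47.110 \\ 104.829 & 92.543 & -4.900 & -22.229 \\ -32.638 & -4.900 & 155.638 & 128.967 \\ -47.110 & -22.229 & 128.967 & 159.552\end{pmatrix}$$
$$\bar{\mathbf{x}}_B = \begin{pmatrix} 76.400 \\ 69.067 \\ 50.267 \\ 59.200\end{pmatrix}, \qquad S_B = \begin{pmatrix} 141.257 & 155.829 & 45.100 & 60.914 \\ 155.829 & 185.924 & 61.767 & 71.057 \\ 45.100 & 61.767 & 96.495 & 93.371 \\ 60.914 & 71.057 & 93.371 & 123.600\end{pmatrix}$$
$$\bar{\mathbf{x}}_C = \begin{pmatrix} 52.733 \\ 61.667 \\ 83.600 \\ 72.400\end{pmatrix}, \qquad S_C = \begin{pmatrix} 156.067 & 116.976 & 53.814 & 35.257 \\ 116.976 & 136.381 & 3.143 & -0.429 \\ 53.814 & 3.143 & 116.543 & 114.886 \\ 35.257 & -0.429 & 114.886 & 156.400\end{pmatrix}$$
$$\bar{\mathbf{x}} = \frac{15}{45}\bar{\mathbf{x}}_A + \frac{15}{45}\bar{\mathbf{x}}_B + \frac{15}{45}\bar{\mathbf{x}}_C = \begin{pmatrix} 64.133 \\ 64.111 \\ 64.200 \\ 63.911\end{pmatrix}$$
$$S_{Pooled} = \frac{14}{42}S_A + \frac{14}{42}S_B + \frac{14}{42}S_C = \begin{pmatrix} 152.654 & 125.878 & 22.092 & 16.354 \\ 125.878 & 138.283 & 20.003 & 16.133 \\ 22.092 & 20.003 & 122.892 & 112.408 \\ 16.354 & 16.133 & 112.408 & 146.517\end{pmatrix}$$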
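The pooled covariance matrix can be re-derived from the three regional covariance matrices. Since each region has n = 15 students, every weight is 14/42 = 1/3. The sketch below is illustrative only (the names `S_A`, `S_B`, `S_C` and `s_pooled` are chosen here) and uses the rounded entries reported in the summary statistics.

```python
# Sample covariance matrices (Math, Biology, English, Sociology) per region.
S_A = [[160.638, 104.829, -32.638, -47.110],
       [104.829, 92.543, -4.900, -22.229],
       [-32.638, -4.900, 155.638, 128.967],
       [-47.110, -22.229, 128.967, 159.552]]
S_B = [[141.257, 155.829, 45.100, 60.914],
       [155.829, 185.924, 61.767, 71.057],
       [45.100, 61.767, 96.495, 93.371],
       [60.914, 71.057, 93.371, 123.600]]
S_C = [[156.067, 116.976, 53.814, 35.257],
       [116.976, 136.381, 3.143, -0.429],
       [53.814, 3.143, 116.543, 114.886],
       [35.257, -0.429, 114.886, 156.400]]

n, k = 15, 3
weight = (n - 1) / (k * (n - 1))  # 14/42 for every region

# Entry-by-entry weighted sum of the three matrices.
s_pooled = [[weight * (S_A[r][c] + S_B[r][c] + S_C[r][c]) for c in range(4)]
            for r in range(4)]

for row in s_pooled:
    print([round(v, 3) for v in row])
```

Multiplying the pooled matrix by N - k = 42 gives the Within SS and SP matrix E, for example 42 × 152.654 ≈ 6411.47 for Math.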
Computations:
1) $\mathbf{T}_i = \sum_{j=1}^{n}\mathbf{x}_{ij}$ = Total vector for sample i
2) $\mathbf{G} = \sum_{i=1}^{k}\mathbf{T}_i = \sum_{i=1}^{k}\sum_{j=1}^{n}\mathbf{x}_{ij} = \begin{pmatrix} G_1 \\ \vdots \\ G_p\end{pmatrix}$ = Grand Total vector
Totals and Grand Totals
Region   Math   Biology   English   Sociology
A        949    924       881       902
B        1146   1036      754       888
C        791    925       1254      1086
G        2886   2885      2889      2876
3) $N = kn$ = Total sample size = 45
4) $$\sum_{i=1}^{k}\sum_{j=1}^{n}\mathbf{x}_{ij}\mathbf{x}_{ij}' = \begin{pmatrix} 195718 & 191674 & 180399 & 182865 \\ 191674 & 191321 & 184516 & 184542 \\ 180399 & 184516 & 199641 & 193125 \\ 182865 & 184542 & 193125 & 191590\end{pmatrix}$$
5) $$\frac{1}{n}\sum_{i=1}^{k}\mathbf{T}_i\mathbf{T}_i' = \begin{pmatrix} 189306.53 & 186387.13 & 179471.13 & 182178.13 \\ 186387.13 & 185513.13 & 183675.87 & 183864.40 \\ 179471.13 & 183675.87 & 194479.53 & 188403.87 \\ 182178.13 & 183864.40 & 188403.87 & 185436.27\end{pmatrix}$$
Now
$$H = \frac{1}{n}\sum_{i=1}^{k}\mathbf{T}_i\mathbf{T}_i' - \frac{1}{N}\mathbf{G}\mathbf{G}' = \begin{pmatrix} 4217.733 & 1362.467 & -5810.067 & -2269.333 \\ 1362.467 & 552.578 & -1541.133 & -519.156 \\ -5810.067 & -1541.133 & 9005.733 & 3764.667 \\ -2269.333 & -519.156 & 3764.667 & 1627.911\end{pmatrix}$$
= the Between SS and SP matrix
Let
$$E = \sum_{i=1}^{k}\sum_{j=1}^{n}\mathbf{x}_{ij}\mathbf{x}_{ij}' - \frac{1}{n}\sum_{i=1}^{k}\mathbf{T}_i\mathbf{T}_i' = \begin{pmatrix} 6411.467 & 5286.867 & 927.867 & 686.867 \\ 5286.867 & 5807.867 & 840.133 & 677.600 \\ 927.867 & 840.133 & 5161.467 & 4721.133 \\ 686.867 & 677.600 & 4721.133 & 6153.733\end{pmatrix}$$
= the Within SS and SP matrix
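Given H and E, the four MANOVA test statistics can be evaluated directly. The following is a minimal sketch in plain Python, using the H and E matrices of this example; the helper functions (`det`, `inverse`, `matmul`, `trace`) and the use of power iteration for the largest eigenvalue are choices made here for illustration, not the method of any particular package.

```python
H = [[4217.733333, 1362.466667, -5810.066667, -2269.333333],
     [1362.466667, 552.5777778, -1541.133333, -519.1555556],
     [-5810.066667, -1541.133333, 9005.733333, 3764.666667],
     [-2269.333333, -519.1555556, 3764.666667, 1627.911111]]
E = [[6411.467, 5286.867, 927.867, 686.867],
     [5286.867, 5807.867, 840.133, 677.600],
     [927.867, 840.133, 5161.467, 4721.133],
     [686.867, 677.600, 4721.133, 6153.733]]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    size, d = len(m), 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, size):
            f = m[r][c] / m[c][c]
            for j in range(c, size):
                m[r][j] -= f * m[c][j]
    return d


def inverse(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    size = len(m)
    aug = [row[:] + [float(i == j) for j in range(size)]
           for i, row in enumerate(m)]
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(size):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[size:] for row in aug]


def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


H_plus_E = [[h + e for h, e in zip(hr, er)] for hr, er in zip(H, E)]
HEinv = matmul(H, inverse(E))

wilks = det(E) / det(H_plus_E)                # Wilks' lambda
hotelling = trace(HEinv)                      # Lawley-Hotelling trace
pillai = trace(matmul(H, inverse(H_plus_E)))  # Pillai trace

# Roy's largest root: dominant eigenvalue of H E^{-1} by power iteration.
v = [1.0] * 4
for _ in range(200):
    w = [sum(p * q for p, q in zip(row, v)) for row in HEinv]
    scale = max(abs(c) for c in w)
    v = [c / scale for c in w]
w = [sum(p * q for p, q in zip(row, v)) for row in HEinv]
roy = sum(p * q for p, q in zip(w, v)) / sum(q * q for q in v)

print(round(pillai, 3), round(wilks, 3), round(hotelling, 3), round(roy, 3))
```

The printed values can be compared with the "Value" column for the High_School effect in the SPSS output that follows.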
Using SPSS to perform MANOVA
Selecting the variables and the Factors
The output
Multivariate Tests (c)
Effect        Statistic            Value    F           Hypothesis df   Error df   Sig.
Intercept     Pillai's Trace       .984     586.890a    4.000           39.000     .000
              Wilks' Lambda        .016     586.890a    4.000           39.000     .000
              Hotelling's Trace    60.194   586.890a    4.000           39.000     .000
              Roy's Largest Root   60.194   586.890a    4.000           39.000     .000
High_School   Pillai's Trace       .883     7.913       8.000           80.000     .000
              Wilks' Lambda        .161     14.571a     8.000           78.000     .000
              Hotelling's Trace    4.947    23.501      8.000           76.000     .000
              Roy's Largest Root   4.891    48.913b     4.000           40.000     .000
a. Exact statistic
b. The statistic is an upper bound on F that yields a lower bound on the significance level.
c. Design: Intercept+High_School
Univariate Tests
Tests of Between-Subjects Effects
Source            Dependent Variable   Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   Math                 4217.733a                 2    2108.867      13.815     .000
                  Biology              552.578b                  2    276.289       1.998      .148
                  English              9005.733c                 2    4502.867      36.641     .000
                  Sociology            1627.911d                 2    813.956       5.555      .007
Intercept         Math                 185088.800                1    185088.800    1212.473   .000
                  Biology              184960.556                1    184960.556    1337.555   .000
                  English              185473.800                1    185473.800    1509.241   .000
                  Sociology            183808.356                1    183808.356    1254.515   .000
High_School       Math                 4217.733                  2    2108.867      13.815     .000
                  Biology              552.578                   2    276.289       1.998      .148
                  English              9005.733                  2    4502.867      36.641     .000
                  Sociology            1627.911                  2    813.956       5.555      .007
Error             Math                 6411.467                  42   152.654
                  Biology              5807.867                  42   138.283
                  English              5161.467                  42   122.892
                  Sociology            6153.733                  42   146.517
Total             Math                 195718.000                45
                  Biology              191321.000                45
                  English              199641.000                45
                  Sociology            191590.000                45
Corrected Total   Math                 10629.200                 44
                  Biology              6360.444                  44
                  English              14167.200                 44
                  Sociology            7781.644                  44
a. R Squared = .397 (Adjusted R Squared = .368)
b. R Squared = .087 (Adjusted R Squared = .043)
c. R Squared = .636 (Adjusted R Squared = .618)
d. R Squared = .209 (Adjusted R Squared = .172)
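The univariate F-ratios in this table are the one-way ANOVA statistics for each course, and they can be recovered from the diagonal entries of H (Between) and E (Within). A brief sketch, with illustrative variable names:

```python
courses = ["Math", "Biology", "English", "Sociology"]

# Diagonal entries of the Between (H) and Within (E) SS and SP matrices.
h_diag = [4217.733, 552.578, 9005.733, 1627.911]
e_diag = [6411.467, 5807.867, 5161.467, 6153.733]

k, N = 3, 45  # number of regions and total sample size

# F = [SS_between / (k - 1)] / [SS_within / (N - k)] for each course.
f_ratios = {c: (h / (k - 1)) / (e / (N - k))
            for c, h, e in zip(courses, h_diag, e_diag)}

for course, f in f_ratios.items():
    print(course, round(f, 3))
```

These univariate tests look at one course at a time and ignore the correlations between the courses, which the multivariate statistics take into account.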