Relationship between variables


Chapter 11
Goodness of Fit Tests
Categorical
Observations fall into one of a number of
mutually exclusive categories
Binomial distribution (two categories: A, B)
Multinomial distribution (several categories: A, B, C, …)
Chi-square distribution
Goodness of Fit Tests
• To determine whether a sample could have been drawn from a population with a specified distribution
• Based on a comparison of the observed frequencies with the frequencies expected under the specified condition.
Concepts
• The Binomial Test
• The Chi-Square Test for Goodness of Fit
• Kolmogorov-Smirnov Test
• The Chi-Square Test for r x k Contingency Tables
Binomial Test
 For data that can be grouped into exactly two
categories
e.g. male versus female
diseased versus healthy
 To determine whether the sample proportions of the
two categories are what would be expected with a
given binomial distribution
Binomial Test
Assumptions
• Independent random sample of size n
• Two mutually exclusive categories
• Actual proportion: p; corresponding hypothesized value: p0
Hypotheses
• $H_0: p = p_0$ and $H_a: p \neq p_0$
• $H_0: p \le p_0$ and $H_a: p > p_0$
• $H_0: p \ge p_0$ and $H_a: p < p_0$
Binomial Test
• Test statistic X: the number of observations falling into the first category (successes). Under H0, X is a binomial random variable, B(n, p0).
• x: observed value of X
• If $P(X \le x) \le \alpha/2$ or $P(X \ge x) \le \alpha/2$, reject $H_0: p = p_0$
• If $P(X \ge x) \le \alpha$, reject $H_0: p \le p_0$
• If $P(X \le x) \le \alpha$, reject $H_0: p \ge p_0$
Binomial Test
• Cumulative binomial distribution: Table C1 (p374)
$$F(d) = P(X \le d) = \sum_{x=0}^{d} \binom{n}{x} p^x (1-p)^{n-x}$$
where each term of the sum is the binomial probability p(x).
• Note that $P(X \ge d) = 1 - P(X \le d-1)$
Binomial Test
• 15 of 20 trees have a 1987 growth ring that is less than half the size of their other growth rings. Do these data support the claim that the severe U.S. drought of 1987 affected the growth rate of the majority of the established trees?
• Hypotheses: $H_0: p \le 0.5$ and $H_a: p > 0.5$
• Test statistic: X = 15, given α = 0.05, p0 = 0.5
$$P(X \ge 15) = 1 - P(X \le 14) = 1 - 0.9793 = 0.0207 < \alpha$$
• Inference: reject H0; the majority of trees have growth rings for 1987 less than half their usual size.
One sample p test
• In southeastern Queensland, 70% of pardalotes are race A.
• Sample of 18 pardalotes: race A 10 vs race B 8
• Hypotheses: $H_0: p = 0.7$ and $H_a: p \neq 0.7$
• Test statistic: X = 10, given α = 0.05, p0 = 0.7
$$P(X \ge 10) = 1 - P(X \le 9) = 1 - 0.0596 = 0.9404 > \alpha/2$$
$$P(X \le 10) = 0.1407 > \alpha/2$$
• Inference: H0 is not rejected; no change in the population proportions of the two races.
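The tail probabilities in the two binomial examples above come from Table C1. A minimal sketch of the same calculation, using only the Python standard library, is given below. The function names `binom_cdf` and `upper_tail` are illustrative choices, not part of the course material.

```python
from math import comb

def binom_cdf(d, n, p):
    """Cumulative binomial probability P(X <= d) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(d + 1))

def upper_tail(x, n, p0):
    """P(X >= x), computed as 1 - P(X <= x - 1)."""
    return 1.0 - binom_cdf(x - 1, n, p0)

# Tree-ring example: 15 of 20 trees, H0: p <= 0.5, alpha = 0.05
p_trees = upper_tail(15, 20, 0.5)
reject_trees = p_trees <= 0.05

# Pardalote example: 10 of 18 birds, H0: p = 0.7, two-sided, alpha = 0.05
p_upper = upper_tail(10, 18, 0.7)
p_lower = binom_cdf(10, 18, 0.7)
reject_birds = p_upper <= 0.025 or p_lower <= 0.025

print(round(p_trees, 4), round(p_upper, 4), round(p_lower, 4))
```

The one-sided test compares a single tail with α, while the two-sided test compares each tail with α/2, as in the decision rules on the earlier slide.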
Normal approximation to Binomial distribution
A r.v. X ~ B(n, p) has mean $\mu = np$ and variance $\sigma^2 = np(1-p)$. If np(1-p) > 3, then X is approximately $N(\mu, \sigma^2)$.
But it should be noted that a continuity correction of 0.5 is applied:
$$F_B(x) = P(X \le x) = \sum_{i=0}^{x} \binom{n}{i} p^i (1-p)^{n-i} \approx F_N\!\left(\frac{x + 0.5 - np}{\sqrt{np(1-p)}}\right)$$
One sample p test
• Pardalotes: race A 70%
• 180 pardalotes: race A 100 vs race B 80
• Hypotheses: $H_0: p = 0.7$ and $H_a: p \neq 0.7$
• Test statistic: X = 100, given α = 0.05, p0 = 0.7
$$P(X \ge 100) = 1 - P\!\left(z \le \frac{99.5 - 180 \times 0.7}{\sqrt{180 \times 0.7 \times 0.3}}\right) = 1 - P(z \le -4.31) \approx 1$$
$$P(X \le 100) = P\!\left(z \le \frac{100.5 - 180 \times 0.7}{\sqrt{180 \times 0.7 \times 0.3}}\right) = P(z \le -4.15) \approx 0 < \alpha/2$$
• Inference: reject H0; a significant change in the population proportions of the two races.
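The normal approximation with continuity correction for the 180-bird example can be sketched as follows. The helper names `norm_cdf` and `binom_cdf_normal` are assumptions made for this illustration; the standard normal CDF is built from `math.erf`.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binom_cdf_normal(x, n, p):
    """Approximate P(X <= x) for X ~ B(n, p) with a 0.5 continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return norm_cdf((x + 0.5 - mu) / sigma)

n, p0, x = 180, 0.7, 100
sigma = sqrt(n * p0 * (1 - p0))
z_for_upper = (x - 0.5 - n * p0) / sigma   # used in P(X >= 100)
z_for_lower = (x + 0.5 - n * p0) / sigma   # used in P(X <= 100)

p_upper = 1.0 - binom_cdf_normal(x - 1, n, p0)
p_lower = binom_cdf_normal(x, n, p0)
reject = p_upper <= 0.025 or p_lower <= 0.025

print(round(z_for_upper, 2), round(z_for_lower, 2), reject)
```

The approximation is reasonable here because np(1-p) = 37.8 is well above 3.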
Chi-Square test
• It is used when there are several categories.
• It compares the observed frequencies of a discrete, ordinal, or categorical data set with those of some theoretically expected distribution (e.g. binomial, multinomial).
• It tests whether an observed set of data agrees with the expected values based on some hypothesis, H0.
• The test gives the probability of getting a test statistic at least as large as the observed one if H0 applies to our data.

χ² Test for goodness of fit
• Assumptions
  – Independent random sample of size n
  – A set of k mutually exclusive categories
  – The expected frequency is specified for each category, with $E_i \ge 5$
• Hypotheses
  – H0: the observed frequency distribution is the same as the hypothesized frequency distribution
  – Ha: the observed and hypothesized frequency distributions are different.
Test Statistic and Theory
• Test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency and $E_i$ the expected frequency of category i; the numerator is the difference between the observed and expected frequencies.
• Observed and expected frequencies equal ⇒ χ² small
• Right-tailed test; the statistic has an approximate chi-square distribution when H0 is true, where ν = df = (number of categories) - 1. Table C.5 (p385)
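Why chi-square?
• Y ~ B(n, p): Y successes (probability p) and n - Y failures (probability q)
• For n large enough, Y is approximately N(np, npq), so
$$z = \frac{Y - np}{\sqrt{npq}} \sim N(0,1)$$
$$z^2 = \frac{(Y - np)^2}{npq} = \frac{(Y - np)^2}{np} + \frac{(Y - np)^2}{nq} = \frac{(Y - np)^2}{np} + \frac{\big((n - Y) - nq\big)^2}{nq} \sim \chi^2(1)$$
Here Y and n - Y are the observed frequencies, and np and nq the expected frequencies, of the two categories.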
Why chi-square?
• Multinomial

Outcome       A1   A2   ……   Ak
Probability   p1   p2   ……   pk
Observed      Y1   Y2   ……   Yk

• $\sum_i p_i = 1$ and $\sum_i Y_i = n$
$$Q = \sum_{i=1}^{k} \frac{(Y_i - np_i)^2}{np_i} \sim \chi^2(k-1)$$
Example: an F2 population
• Mirabilis jalapa, a self-pollinating plant
• Consider an F2 population in which a single incompletely dominant gene is segregating.
• The numbers of the 3 genotypes AA, Aa, aa are counted, and we want to know whether they are segregating according to Mendel's law,
• i.e. we are testing the null hypothesis (H0) that we have a 1:2:1 ratio.
Analysis

Genotype            AA (red)   Aa (pink)   aa (white)   Total
Expected freqs.     ¼          ½           ¼            1.0
Observed nos. (O)   55         132         53           240
Expected nos. (E)   60         120         60           240
(O-E)               -5         12          -7           0
The extrinsic model
• Example: are the data on 240 progeny of self-pollinated four-o'clocks reasonably consistent with the Mendelian model?
① Hypotheses:
   H0: the data are consistent with a Mendelian model.
   Ha: the data are inconsistent with a Mendelian model.
② Calculate expected frequencies:

Category   p_i   O_i   E_i   (O_i - E_i)^2 / E_i
red        ¼     55    60    0.42
pink       ½     132   120   1.20
white      ¼     53    60    0.82
total            240   240   2.44

③ Test statistic:
$$\chi^2 = \sum_{i=1}^{3} \frac{(O_i - E_i)^2}{E_i} = 0.42 + 1.20 + 0.82 = 2.44 < \chi^2_{0.05,[2]} = 5.99$$
④ Conclusion: H0 is not rejected; the data support the Mendelian genetic model.
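The four-o'clock calculation can be reproduced with a few lines of Python. This is only a sketch: `chi_square_gof` is a name chosen for this illustration, and the critical value 5.99 is copied from the slide (Table C.5) rather than computed.

```python
def chi_square_gof(observed, probs):
    """Chi-square goodness-of-fit statistic against fully specified probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

# Observed red, pink, white counts against the 1:2:1 Mendelian ratio
stat, expected = chi_square_gof([55, 132, 53], [0.25, 0.5, 0.25])

CRITICAL_05_DF2 = 5.99  # chi-square critical value, alpha = 0.05, df = 2 (from the slide)
reject_h0 = stat > CRITICAL_05_DF2

print(expected, round(stat, 3), reject_h0)
```

Because the terms are not rounded before summing, the statistic comes out at about 2.43 rather than the 2.44 shown on the slide; the conclusion is unchanged.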
The intrinsic model
• Example: does the number of landfalling hurricanes/year in 1900-1997 in the U.S. follow a Poisson distribution?

Hurricanes/year (x_i)   0    1    2    3    4   5   6
Frequency (f_i)         18   34   24   16   3   1   2

① Hypotheses:
   H0: the annual number of U.S. landfalling hurricanes follows a Poisson distribution.
   Ha: the annual number of U.S. landfalling hurricanes is inconsistent with a Poisson distribution.
② Estimate parameters:
$$\hat{\mu} = \frac{1}{n} \sum_{i=0}^{6} x_i f_i = \frac{159}{98} = 1.622 \text{ hurricanes/year}$$
③ Calculate expected frequencies:
$$f(x) = \frac{e^{-\mu} \mu^x}{x!} = \frac{e^{-1.622} (1.622)^x}{x!}$$
$$f(0) = P(0 \text{ hurricanes/year}) = \frac{e^{-1.622} (1.622)^0}{0!} = 0.198, \qquad E_0 = 98 \times f(0) = 98 \times 0.198 = 19.40$$

Count   Probability   O_i   E_i
0       0.198         18    19.40
1       0.320         34    31.36
2       0.260         24    25.48
3       0.140         16    13.72
4       0.057         3     5.59
5       0.018         1     1.76
>=6     0.007         2     0.69

The last categories have expected frequencies that are too small (one class would otherwise have E < 5), so counts of 4 or more are pooled:

Count   O_i   E_i     (O_i - E_i)^2 / E_i
0       18    19.40   0.101
1       34    31.36   0.222
2       24    25.48   0.086
3       16    13.72   0.379
>=4     6     8.04    0.518

④ Test statistic:
$$\chi^2 = \sum_{i=0}^{4} \frac{(O_i - E_i)^2}{E_i} = 0.101 + 0.222 + \cdots + 0.518 = 1.306 < \chi^2_{0.05,[3]} = 7.81$$
so H0 is not rejected. The degrees of freedom are 3 because one parameter was estimated from the data.
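The hurricane example can be sketched in Python as below. The variable names are illustrative, and the critical value 7.81 is taken from the slide. The code keeps full precision in the Poisson probabilities, so its statistic (about 1.27) differs slightly from the 1.306 obtained on the slide from probabilities rounded to three decimals.

```python
from math import exp, factorial

counts = [0, 1, 2, 3, 4, 5, 6]        # hurricanes per year
freqs = [18, 34, 24, 16, 3, 1, 2]     # number of years with that count

n = sum(freqs)
mu_hat = sum(x * f for x, f in zip(counts, freqs)) / n  # estimated Poisson mean

def poisson_pmf(x, mu):
    """Poisson probability of exactly x events."""
    return exp(-mu) * mu**x / factorial(x)

# Classes 0, 1, 2, 3 and a pooled class ">= 4" so that no expected count is too small
probs = [poisson_pmf(x, mu_hat) for x in range(4)]
probs.append(1.0 - sum(probs))
observed = freqs[:4] + [sum(freqs[4:])]
expected = [n * p for p in probs]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1            # one extra df lost for the estimated mean
reject_h0 = stat > 7.81               # critical value for alpha = 0.05, df = 3 (from the slide)

print(round(mu_hat, 3), [round(e, 2) for e in expected], round(stat, 3), df, reject_h0)
```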
Kolmogorov-Smirnov Test
• To determine whether a sample could come from a population with a particular specified distribution
• χ² test: for discrete or categorical data
• Kolmogorov-Smirnov test: for random samples from a continuous (Normal) or discontinuous (Binomial) population.
• The arm lengths (radii, in cm) of 67 Edmonds sea stars at Polka Point (sorted; read down the columns):

2.5   5.0   6.0   6.5   7.0   8.0   9.5
4.0   5.0   6.0   6.5   7.0   8.0   10.0
4.0   5.0   6.0   7.0   7.5   8.0   10.0
4.5   5.5   6.5   7.0   7.5   8.5   10.5
4.5   5.5   6.5   7.0   7.5   8.5   11.0
4.5   5.5   6.5   7.0   7.5   8.5   13.0
4.5   5.5   6.5   7.0   7.5   8.5   13.5
4.5   6.0   6.5   7.0   7.5   8.5
5.0   6.0   6.5   7.0   8.0   9.0
5.0   6.0   6.5   7.0   8.0   9.0

$$\sum_{i=1}^{67} X_i = 2.5 + \cdots + 13.5 = 467.5, \qquad \sum_{i=1}^{67} X_i^2 = 2.5^2 + \cdots + 13.5^2 = 3525.25$$
$$\bar{X} = \frac{467.5}{67} = 6.98 \text{ cm}$$
$$s^2 = \frac{3525.25 - \dfrac{467.5^2}{67}}{67 - 1} = 3.988 \text{ cm}^2, \qquad s = 2.00 \text{ cm}$$

[Figure: Histogram of Radius; x-axis: Radius (cm), y-axis: Frequency. The histogram is sufficiently close to a normal distribution.]
Kolmogorov-Smirnov Test
• Assumptions
  – Random sample of size n with some unknown distribution function G(x)
  – The hypothesized distribution is specified as F(x)
• Hypotheses
  – H0: G(x) = F(x) for all x
  – Ha: G(x) ≠ F(x) for at least one value of x
Kolmogorov-Smirnov Test
• Test statistic: the largest absolute value of the differences between the cumulative distribution of the sample (observed CDF, S(x)) and the expected distribution (expected CDF, F(z_x)), evaluated at the upper bounds X of the intervals of the data range.

X       Cum. freq.   S(x)    z_x     F(z_x)   |S(x) - F(z_x)|
2.75    1            0.015   -2.12   0.017    0.002
3.75    1            0.015   -1.62   0.053    0.038
4.75    8            0.119   -1.12   0.131    0.012
5.75    17           0.254   -0.62   0.268    0.014
6.75    32           0.478   -0.12   0.452    0.026
7.75    48           0.716   0.39    0.652    0.064
8.75    58           0.866   0.89    0.813    0.053
9.75    61           0.910   1.39    0.918    0.008
10.75   64           0.955   1.89    0.971    0.016
11.75   65           0.970   2.39    0.992    0.022
+∞      67           1.000   +∞      1.000    0.000

$$K = \max |S(x) - F(x)| = 0.064$$
c.v. = 0.1632 for α = 0.05 (Table C14)
• K < c.v., accept H0: the sea star radii follow N(6.98, 3.988).
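The table above can be reproduced approximately with the sketch below. It follows the slide's grouped approach (comparing the two CDFs only at the interval bounds) rather than evaluating at every data point, and the names `norm_cdf`, `bounds`, and `cum_freq` are illustrative. The critical value 0.1632 is taken from the slide.

```python
from math import erf, sqrt

def norm_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

n = 67
mean = 467.5 / n          # sample mean, about 6.98 cm
sd = sqrt(3.988)          # sample standard deviation, about 2.00 cm

# Upper interval bounds and cumulative frequencies from the slide
bounds = [2.75, 3.75, 4.75, 5.75, 6.75, 7.75, 8.75, 9.75, 10.75, 11.75]
cum_freq = [1, 1, 8, 17, 32, 48, 58, 61, 64, 65]

diffs = [abs(c / n - norm_cdf(b, mean, sd)) for b, c in zip(bounds, cum_freq)]
K = max(diffs)
reject_h0 = K > 0.1632    # critical value for alpha = 0.05 (from the slide)

print(round(K, 3), reject_h0)
```

Small differences from the slide's 0.064 arise because the slide rounds z to two decimals before looking up F(z).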
• Do the following data support that the number of males in a litter is a binomial random variable with p = 0.5?

No. of males   0   1    2    3    4    5    6
Frequency      3   16   53   78   53   18   0

• H0: the number of males in each litter is a binomial random variable with p = 0.5 and n = 6.

X   Cum. freq.   S(x)    F(x)    |S(x) - F(x)|
0   3            0.014   0.016   0.002
1   19           0.086   0.109   0.023
2   72           0.326   0.344   0.018
3   150          0.679   0.656   0.022
4   203          0.919   0.891   0.028
5   221          1.000   0.984   0.016
6   221          1.000   1.000   0.000

$$K = \max |S(x) - F(x)| = 0.028$$
$$\text{c.v.} = \frac{1.36}{\sqrt{221}} = 0.0914 \text{ for } \alpha = 0.05$$
• K < c.v.: the number of males and females is described by a binomial distribution with p = 0.5.
• Advantage of the KS test over the chi-square test: there is no need to calculate the density function for the binomial.
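A short sketch of the litter example follows. The variable names are illustrative, and the large-sample critical value 1.36/√n is the formula used on the slide.

```python
from math import comb, sqrt

freqs = [3, 16, 53, 78, 53, 18, 0]   # litters with 0..6 males
n_litters = sum(freqs)
size, p = 6, 0.5                      # hypothesized B(6, 0.5)

K = 0.0
cum_obs = 0.0
cum_exp = 0.0
for x, f in enumerate(freqs):
    cum_obs += f / n_litters                                  # S(x)
    cum_exp += comb(size, x) * p**x * (1 - p) ** (size - x)   # F(x)
    K = max(K, abs(cum_obs - cum_exp))

critical = 1.36 / sqrt(n_litters)     # approximate critical value, alpha = 0.05
reject_h0 = K > critical

print(round(K, 3), round(critical, 4), reject_h0)
```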
The same data tested with the chi-square goodness-of-fit test:
$$f(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x} = \frac{6!}{x!(6-x)!} 0.5^x (1-0.5)^{6-x} = \frac{6! \times 0.5^6}{x!(6-x)!} = \frac{11.25}{x!(6-x)!}$$
$$f(0) = P(0 \text{ males/litter}) = \frac{11.25}{0!(6-0)!} = 0.016, \qquad E_0 = 221 \times f(0) = 221 \times 0.016 = 3.45$$

Count   Probability   O_i   E_i
0       0.016         3     3.45
1       0.094         16    20.72
2       0.234         53    51.80
3       0.313         78    69.06
4       0.234         53    51.80
5       0.094         18    20.72
6       0.016         0     3.45

Categories with expected frequencies below 5 are pooled with their neighbours:

Count   O_i   E_i     (O_i - E_i)^2 / E_i
<=1     19    24.17   1.107
2       53    51.80   0.028
3       78    69.06   1.157
4       53    51.80   0.028
>=5     18    24.17   1.576

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 1.107 + 0.028 + \cdots + 1.576 = 3.895 < \chi^2_{0.05,[4]} = 9.49$$
r x k Contingency tables
• r: the number of categories
• k: the number of populations or treatments
• O_ij: the number of observations of category i in population j
• To test whether the distribution of a categorical variable is the same in two or more populations
• To test whether there is a relationship or dependency between the row and column variables
• Is the shell species independent of whether the shell is occupied?

Observed:
Species         Occupied   Empty   Total (r_i)
Austrocochlea   47         42      89
Bembicium       10         41      51
Cirithid        125        49      174
Total (c_j)     182        132     314

• The expected number of observations for each category, based on the assumption that the row and column variables are independent:
$$E_{ij} = N \cdot \frac{r_i}{N} \cdot \frac{c_j}{N} = \frac{r_i c_j}{N}$$
where $r_i/N$ is the fraction of row i in the entire sample and $c_j/N$ is the fraction of column j in the entire sample.

Expected:
Species         Occupied   Empty   Total
Austrocochlea   51.59      37.41   89
Bembicium       29.56      21.44   51
Cirithid        100.85     73.15   174
Total           182        132     314
χ² Test for contingency table
• Assumptions (two different sampling methods)
  – A random sample, categorized in two ways
  – k independent random samples, classified by a categorical variable
• Hypotheses
  – One sample categorized in two ways:
    H0: the row and column variables are independent
    Ha: the row and column variables are not independent
  – k independent samples:
    H0: the distribution of the row categories is the same in all k populations
    Ha: the distribution of the row categories is not the same in all k populations
Test statistic and Theory
• Test statistic
  – For a large sample, n ≥ 40, E_ij ≥ 5:
$$\chi^2 = \sum_{ij} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
  – Observed and expected frequencies equal ⇒ χ² small
  – ν = df = (r - 1)(k - 1)
  – For a small sample:
$$\chi^2 = \sum_{ij} \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}$$
Solution to the example
H0: The status (occupied or not) is independent of the shell species
Ha: The status is not independent of the shell species

Observed frequencies, with expected frequencies in parentheses:
Species         Occupied       Empty         Total
Austrocochlea   47 (51.59)     42 (37.41)    89
Bembicium       10 (29.56)     41 (21.44)    51
Cirithid        125 (100.85)   49 (73.15)    174
Total           182            132           314

$$\chi^2 = \sum_{ij} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(47 - 51.59)^2}{51.59} + \cdots + \frac{(49 - 73.15)^2}{73.15} = 45.52$$
df = (3 - 1) × (2 - 1) = 2, and 45.52 > $\chi^2_{0.05,[2]}$ = 5.99
Reject H0: there is an association between species of shell and those that hermit crabs occupy.
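The expected frequencies and the test statistic for the shell example can be sketched as below. `chi_square_independence` is a name chosen for this illustration; the critical value 5.99 comes from the slide.

```python
def chi_square_independence(table):
    """Chi-square statistic, df, and expected counts for an r x k table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # E_ij = r_i * c_j / N
            exp_row.append(e)
            stat += (o - e) ** 2 / e
        expected.append(exp_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

# Rows: Austrocochlea, Bembicium, Cirithid; columns: occupied, empty
shells = [[47, 42], [10, 41], [125, 49]]
stat, df, expected = chi_square_independence(shells)
reject_h0 = stat > 5.99   # critical value for alpha = 0.05, df = 2 (from the slide)

print(round(stat, 2), df, reject_h0)
```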
2x2 Contingency Tables
• A special case of the contingency table
• Employ a correction factor for continuity

                   Category B
Category A    1      2      Total
1             n11    n12    n1.
2             n21    n22    n2.
Total         n.1    n.2    n..

Correction for continuity:
$$\chi^2 = \sum_{ij} \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}} = \frac{n_{..} \left( |n_{11} n_{22} - n_{12} n_{21}| - \dfrac{n_{..}}{2} \right)^2}{n_{1.} n_{2.} n_{.1} n_{.2}}$$
with ν = df = (2 - 1) × (2 - 1) = 1
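Two sample proportion
$H_0: p_1 = p_2$

                  Group
              1      2      Total
Success (1)   n11    n12    n1.
Failure (0)   n21    n22    n2.
Total         n.1    n.2    n..

Here the group sizes are the column totals, $n_1 = n_{.1}$ and $n_2 = n_{.2}$, and $n = n_{..} = n_1 + n_2$.
$$\hat{p}_1 = \frac{n_{11}}{n_1}, \qquad \hat{p}_2 = \frac{n_{12}}{n_2}, \qquad \hat{p} = \frac{n_{11} + n_{12}}{n}, \qquad \hat{q} = 1 - \hat{p}$$
$$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\hat{p}\hat{q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
$$z = \frac{|\hat{p}_1 - \hat{p}_2| - \left( \dfrac{1}{2n_1} + \dfrac{1}{2n_2} \right)}{\sqrt{\hat{p}\hat{q} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \sim N(0,1)$$

Relation to the 2x2 chi-square test: χ²(df = 1) = z²
$$z^2 = \frac{\left( |\hat{p}_1 - \hat{p}_2| - \left( \dfrac{1}{2n_1} + \dfrac{1}{2n_2} \right) \right)^2}{\hat{p}\hat{q} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$$
$$= \frac{\left( \left| \dfrac{n_{11}}{n_1} - \dfrac{n_{12}}{n_2} \right| - \left( \dfrac{1}{2n_1} + \dfrac{1}{2n_2} \right) \right)^2}{\dfrac{n_{11} + n_{12}}{n} \cdot \dfrac{n_{21} + n_{22}}{n} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$$
$$= \frac{\left( |n_{11} n_2 - n_{12} n_1| - \dfrac{n_1 + n_2}{2} \right)^2}{\dfrac{n_{11} + n_{12}}{n} \cdot \dfrac{n_{21} + n_{22}}{n} \cdot (n_1 + n_2)\, n_1 n_2}$$
$$= \frac{\left( |n_{11} n_{22} - n_{12} n_{21}| - \dfrac{n}{2} \right)^2}{\dfrac{n_{1.} n_{2.} n_{.1} n_{.2}}{n}} = \chi^2$$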
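Fisher exact test
• Used when the expected value of one cell is < 5

                   Category B
Category A    1      2      Total
1             a      b      a+b
2             c      d      c+d
Total         a+c    b+d    n

$$\Pr(a, b, c, d) = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$$
$$p = \sum_{i=0}^{a} \Pr(i)$$
where Pr(i) is the probability of the table with i in the first cell and the same margins.

Example: the observed table and the more extreme tables with the same margins:

2   4        1   5        0   6
4   8        5   7        6   6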
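The table probability and the one-sided sum over the tables listed above can be sketched as follows. The function names are illustrative assumptions; the code sums from the first cell equal to 0 up to the observed value, as in the formula on the slide.

```python
from math import factorial as fact

def table_prob(a, b, c, d):
    """Probability of one 2x2 table with fixed margins."""
    n = a + b + c + d
    numerator = fact(a + b) * fact(c + d) * fact(a + c) * fact(b + d)
    denominator = fact(n) * fact(a) * fact(b) * fact(c) * fact(d)
    return numerator / denominator

def fisher_lower_tail(a, b, c, d):
    """Sum of Pr(i) for i = 0..a, keeping all row and column totals fixed."""
    total = 0.0
    for i in range(a + 1):
        shift = a - i
        if d - shift < 0:        # table not possible with these margins
            continue
        total += table_prob(i, b + shift, c + shift, d - shift)
    return total

# Observed table from the slide: [[2, 4], [4, 8]]
p_observed = table_prob(2, 4, 4, 8)
p_value = fisher_lower_tail(2, 4, 4, 8)

print(round(p_observed, 4), round(p_value, 4))
```

The loop visits exactly the three tables shown on the slide: (2, 4, 4, 8), (1, 5, 5, 7), and (0, 6, 6, 6).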
Enrichment analysis
• Gene list
• Batch annotation
• Gene-GO term enrichment analysis
• Highlight the most relevant GO terms associated with a given gene list.

EASE Score, a modified Fisher Exact P-Value
• In the human genome background (30,000 genes in total), 40 genes are involved in the p53 signalling pathway.
• Fisher Exact P-Value = 0.008
• The EASE Score is more conservative in examining the situation. EASE Score = 0.06 (using 3-1 instead of 3).

                 User genes   Genome
In pathway       3-1          40
Not in pathway   297          29960
Total            300          30000

More extreme tables with the same totals:

                 User genes   Genome
In pathway       2            41
Not in pathway   298          29959

In pathway       1            42
Not in pathway   299          29958

In pathway       0            43
Not in pathway   300          29957
Extreme Hypothetical Example of Population Stratification
• Interested in the LD between allele A and disease.

Combined sample:
     case   control
A    101    20
a    20     101
OR = 25.5

Sub-population 1:
     case   control
A    100    10
a    10     1
OR = 1

Sub-population 2:
     case   control
A    1      10
a    10     100
OR = 1
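The odds ratios in this example are easy to check. The sketch below uses an illustrative `odds_ratio` helper and shows that pooling two sub-populations, each with OR = 1, produces a large spurious association.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]] (rows: allele A, a; columns: case, control)."""
    return (a * d) / (b * c)

sub_pop_1 = (100, 10, 10, 1)
sub_pop_2 = (1, 10, 10, 100)

# Pool the two sub-populations cell by cell
pooled = tuple(x + y for x, y in zip(sub_pop_1, sub_pop_2))

print(odds_ratio(*sub_pop_1), odds_ratio(*sub_pop_2), round(odds_ratio(*pooled), 1))
```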
Population Stratification
• Confounding bias that may occur if one's sample is comprised of sub-populations with different:
  – allele frequencies; and
  – disease rates (RpR)
[Diagram: Sub-population linked to Gene (allele frequency) and to Disease (RpR)]
• Cases are more likely than controls to arise from the sub-population with the higher baseline disease rate.
• Further, cases and controls will have different allele frequencies regardless of whether the locus is causal.
[Figure: Example of Population Stratification (Cardon & Palmer, 2003)]
Partitioning the Chi-Square Test
• Repeat Mendel's classic experiments with garden peas, Pisum sativum.
• Cross: tall, green pods (AaBb) × tall, green pods (AaBb)
• Offspring: 180 tall, green pods (A_B_); 30 tall, yellow pods (A_bb); 60 short, green pods (aaB_); 10 short, yellow pods (aabb)
• H0: The results are in a 9:3:3:1 phenotypic ratio.
• Ha: The results deviate significantly from a 9:3:3:1 ratio.

Category   O_i   E_i     (O_i - E_i)^2 / E_i
A_B_       180   157.5   3.214
A_bb       30    52.5    9.643
aaB_       60    52.5    1.071
aabb       10    17.5    3.214
Total      280   280     17.143

$$\chi^2 = 17.143 > \chi^2_{0.05,[3]} = 7.81$$
Reject H0
• Test whether the gene locus for plant height produced offspring in a 3 A_ : 1 aa ratio (H0: 3:1)

Category                 O_i   E_i   (O_i - E_i)^2 / E_i
Tall (A_B_ and A_bb)     210   210   0
Short (aaB_ and aabb)    70    70    0
Total                    280   280   0

$\chi^2 = 0 < \chi^2_{0.05,[1]} = 3.84$: accept H0

• Test whether the gene locus for seed pod colour produced offspring in a 3 B_ : 1 bb ratio (H0: 3:1)

Category                  O_i   E_i   (O_i - E_i)^2 / E_i
Green (A_B_ and aaB_)     240   210   4.286
Yellow (A_bb and aabb)    40    70    12.857
Total                     280   280   17.143

$\chi^2 = 17.143 > \chi^2_{0.05,[1]} = 3.84$: reject H0

• Test whether the two gene loci, A and B, are independent of each other (H0: independent)

        B_    bb    Total
A_      180   30    210
aa      60    10    70
Total   240   40    280

$$\chi^2 = \frac{280 \times \left( |180 \times 10 - 30 \times 60| - 280/2 \right)^2}{210 \times 70 \times 240 \times 40} = 0.04 < \chi^2_{0.05,[1]} = 3.84$$
Accept H0 (slide annotation next to this formula: "unnecessary")
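The overall test and its three single-degree-of-freedom components can be recomputed with the sketch below. The helper names `gof_ratio` and `corrected_2x2` are assumptions for this illustration, and `corrected_2x2` applies the corrected 2x2 formula exactly as written on the slide.

```python
def gof_ratio(observed, ratio):
    """Chi-square goodness-of-fit statistic against an expected ratio such as 9:3:3:1."""
    n = sum(observed)
    total = sum(ratio)
    expected = [n * r / total for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def corrected_2x2(n11, n12, n21, n22):
    """Continuity-corrected chi-square for a 2x2 table, following the slide's formula."""
    n = n11 + n12 + n21 + n22
    numerator = n * (abs(n11 * n22 - n12 * n21) - n / 2) ** 2
    denominator = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    return numerator / denominator

overall = gof_ratio([180, 30, 60, 10], [9, 3, 3, 1])   # compare with 7.81 (df = 3)
height = gof_ratio([210, 70], [3, 1])                  # compare with 3.84 (df = 1)
pod_colour = gof_ratio([240, 40], [3, 1])              # compare with 3.84 (df = 1)
independence = corrected_2x2(180, 30, 60, 10)          # compare with 3.84 (df = 1)

print(round(overall, 3), round(height, 3), round(pod_colour, 3), round(independence, 3))
```

Only the pod colour component exceeds its critical value, which locates the source of the overall deviation from 9:3:3:1.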
Conclusion
• From the overall chi-square test:
  – The data deviate significantly from a 9:3:3:1 ratio.
• From the 3 single-degree-of-freedom chi-square tests:
  – Plant heights in the offspring follow Mendel's law.
  – The seed pod colour does not follow Mendel's law.
  – The two loci are behaving independently.
• The discrepancy in the overall chi-square test is due to a distortion in the green to yellow seed pod ratio.
• Possible reason: differential survival of the two phenotypes.
Summary
• The Binomial Test
• The Chi-Square Test for Goodness of Fit
  – The Extrinsic Model
  – The Intrinsic Model
• Kolmogorov-Smirnov Test
  – Normal
  – Binomial
Nonparametric statistics
• Sign test
• Rank test
Sign test
• One sample
  $H_0: M_X = M_0$, $H_a: M_X \neq M_0$
$$S_- = -\sum_{X_i < M_0} \frac{X_i - M_0}{|X_i - M_0|} \quad \text{and} \quad S_+ = \sum_{X_i > M_0} \frac{X_i - M_0}{|X_i - M_0|}$$
  i.e. the numbers of observations below and above $M_0$.
• Two paired samples
  $H_0: M_{X-Y} = 0$, $H_a: M_{X-Y} \neq 0$
$$S_- = -\sum_{X_i < Y_i} \frac{X_i - Y_i}{|X_i - Y_i|} \quad \text{and} \quad S_+ = \sum_{X_i > Y_i} \frac{X_i - Y_i}{|X_i - Y_i|}$$
• Let n' = n - {the number of X_i = M_0}; then S- (or S+) ~ B(n', 1/2)
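McNemar's test
• Two paired samples with binary (0, 1) outcomes

              control
case      0      1
0         a      b
1         c      d

$$S_- = -\sum_{X_i < Y_i} \frac{X_i - Y_i}{|X_i - Y_i|} = b \quad \text{and} \quad S_+ = \sum_{X_i > Y_i} \frac{X_i - Y_i}{|X_i - Y_i|} = c$$
• Let n' = b + c; then S- (or S+) ~ B(n', 1/2)
• If b + c > 20, the binomial is approximately $N(\mu = n'p = n'/2,\ \sigma^2 = n'pq = n'/4)$, so
$$z = \frac{b - (b + c)/2}{\sqrt{(b + c)/4}} = \frac{b - c}{\sqrt{b + c}} = \sqrt{T}, \qquad T = \frac{(b - c)^2}{b + c}$$
• For b + c > 20, T ~ χ²(df = 1)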
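The equivalence between the normal-approximation z and the statistic T can be checked numerically. In the sketch below the discordant counts b = 18 and c = 7 are hypothetical values chosen only for illustration (they are not from the slides), and the function names are likewise assumptions.

```python
from math import sqrt

def mcnemar_T(b, c):
    """McNemar statistic T = (b - c)^2 / (b + c), approximately chi-square with df = 1."""
    return (b - c) ** 2 / (b + c)

def mcnemar_z(b, c):
    """Normal approximation to B(b + c, 1/2): z = (b - n'/2) / sqrt(n'/4)."""
    n_prime = b + c
    return (b - n_prime / 2) / sqrt(n_prime / 4)

b, c = 18, 7            # hypothetical discordant pair counts, b + c > 20
T = mcnemar_T(b, c)
z = mcnemar_z(b, c)

print(T, z, z * z)
```

Since z² equals T, comparing T with the df = 1 chi-square critical value 3.84 gives the same decision as comparing |z| with 1.96.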
Kendall Correlation Coefficient τ
• Directly compare the n observations with each other
• Concordant (C): $(X_i - X_j)(Y_i - Y_j) > 0$, positive correlation
• Discordant (D): $(X_i - X_j)(Y_i - Y_j) < 0$, negative correlation
• Tie (E): $(X_i - X_j)(Y_i - Y_j) = 0$
$$\tau = \frac{C - D}{n(n-1)/2} = \frac{2(C - D)}{n(n-1)}, \qquad -1 \le \tau \le 1$$
where C - D is the difference between the number of concordant and discordant pairs, and n(n-1)/2 = C + D + E is the total number of comparisons.
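A direct implementation of the definition above is sketched here. The function name and the small data sets are hypothetical, chosen only to show the extreme values and one intermediate case.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau = 2(C - D) / (n(n - 1)), counting all pairs of observations."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 is a tie and contributes to neither count
    return 2 * (concordant - discordant) / (n * (n - 1))

# Hypothetical data: perfect agreement, perfect disagreement, one swapped pair
print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```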
Public bias
Rank test
• One sample or two paired samples
Wilcoxon signed-rank test
• Two independent samples
Wilcoxon rank-sum test (Mann-Whitney U test)
• Multiple samples (one-way ANOVA)
Kruskal-Wallis test
• Multiple paired samples (two-way ANOVA)
Friedman k-sample test
• Correlation and regression
Spearman correlation
Data
One Variable
Two Variables
Three Variables
Data
One Variable
Binomial
Bernoulli
Two Variables
Poisson
Ranked
Three Variables
Normal
Chi-square Test
Infer σ2
Infer p
Infer μ
Sign Test
Wilcoxon
Signed-Rank
t Test
Infer μ
Large sample Normal approximation
$\dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim N(0,1)$
Data
One Variable
Three Variables
Two Variables
X Variable
01
01
Categories
2×2 Table
Two Sample
Chi-Square
p Test
Ranked
2×K Table
R×C Table
Chi-Square
Chi-Square
Ranked
Wilcoxon
Rank Sum
Kruskal-Wallis Test
Spearman
Correlation
Normal
Two Sample
t Test
One Way
ANOVA
Spearman
Regression
Categories
Normal
Y Variable
Pearson
Regression
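2×2 Table Chi-Square and Two Sample p Test

           Group 1        Group 2        Total
           a              c              a + c
           b              d              b + d
Total      a + b = n1     c + d = n2     n = n1 + n2

$$\hat{p}_1 = \frac{a}{n_1}; \quad \hat{p}_2 = \frac{c}{n_2}; \quad \hat{p} = \frac{a + c}{n_1 + n_2}; \quad \hat{q} = 1 - \hat{p}$$
For a single sample, $\hat{p} \sim N(p_0, p_0 q_0 / n)$.
$$Z = \frac{|\hat{p}_1 - \hat{p}_2| - \left( \dfrac{1}{2n_1} + \dfrac{1}{2n_2} \right)}{\sqrt{\hat{p}\hat{q} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \sim N(0,1)$$
$$Z^2 = \frac{\left( \left| \dfrac{a}{n_1} - \dfrac{c}{n_2} \right| - \dfrac{n_1 + n_2}{2 n_1 n_2} \right)^2}{\dfrac{a + c}{n_1 + n_2} \cdot \dfrac{b + d}{n_1 + n_2} \cdot \dfrac{n_1 + n_2}{n_1 n_2}}$$
$$= \frac{(n_1 + n_2) \left( |a n_2 - c n_1| - \dfrac{n_1 + n_2}{2} \right)^2}{n_1 n_2 (a + c)(b + d)}$$
$$= \frac{n \left( |ad - bc| - \dfrac{n}{2} \right)^2}{(a + b)(c + d)(a + c)(b + d)} \sim \chi^2_{df=1}$$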
Two sample
T test
F test
Infer σ2
Unpaired
t test
Welch’s
approximate
t test
Data
One variable
Three variables
(block, paired)
Two variables
X variable
01
01
Categories
Ranked
Normal
McNemar
Test
Categories
Y variable
Ranked
Wilcoxon
Signed-Rank
Friedman k-Sample Test
Normal
Two Sample
Paired t Test
Two Way
ANOVA
Data
One variable
Three variables
(quantity)
Two variables
X variable
01
Categories
Ranked
Normal
01
Categories
Y variable
Partial
Spearman
Ranked
Normal
Covariance
Analysis
Partial Pearson
Partial Regression
• Test for interaction between factors
  $H_0: (\alpha\beta)_{ij} = 0$ for all i, j;  $H_a: (\alpha\beta)_{ij} \neq 0$ for some i, j
  $F_{AB} = \dfrac{MS_{AB}}{MS_E}$, $\nu_1 = (a-1)(b-1)$, $\nu_2 = ab(n-1)$
  – Ha: look for the best combinations of A factor and B factor. DMRT: $SSR_p = r_p \sqrt{\dfrac{MS_E}{n}}$
  – H0: test the A and B factors separately
• Test for differences among A factor treatments
  $H_0: \mu_{i..}$'s all equal
  $F_A = \dfrac{MS_A}{MS_E}$, $\nu_1 = a-1$, $\nu_2 = ab(n-1)$
  – H0: DONE
  – Ha: mean separation techniques. DMRT: $SSR_p = r_p \sqrt{\dfrac{MS_E}{bn}}$
• Test for differences among B factor treatments
  $H_0: \mu_{.j.}$'s all equal
  $F_B = \dfrac{MS_B}{MS_E}$, $\nu_1 = b-1$, $\nu_2 = ab(n-1)$
  – H0: DONE
  – Ha: mean separation techniques. DMRT: $SSR_p = r_p \sqrt{\dfrac{MS_E}{an}}$
The End