Frequency distributions:
Testing of goodness of fit and
contingency tables
Chi-square statistics
• Widely used for the analysis of nominal data
• Introduced by Karl Pearson in 1900
• Its theory and application were expanded by him and R. A. Fisher
• This lecture covers the Chi-square test, the G test, and the Kolmogorov-Smirnov goodness of fit test for continuous data
The χ² test:
χ² = Σ (observed freq. - expected freq.)² / expected freq.
• Obtain a sample of nominal scale data and infer whether the population conforms to a certain theoretical distribution, e.g. a genetic study
• Test Ho that the observations (not the variables) are independent of each other in the population
• Based on the difference between the actual observed frequencies (not %) and the expected frequencies
The 2 test:
2 =  (observed freq. - expected freq.)2/ expected freq.
• As a measure of how far a sample distribution
deviates from a theoretical distribution
• Ho: no difference between the observed and
expected frequency (HA: they are different)
• If Ho is true: the difference and Chi-square
 SMALL
• If Ho is false: both measurements  Large
For Questionnaire
Example (1)
• In a questionnaire, 259 adults were asked
what they thought about cutting air pollution
by increasing tax on vehicle fuel.
• 113 people agreed with this idea but the rest
disagreed.
• Perform a Chi-square test to determine the probability of the results being obtained by chance.
For Questionnaire

            Agree             Disagree
Observed    113               259 - 113 = 146
Expected    259/2 = 129.5     259/2 = 129.5

Ho: Observed = Expected
χ² = (113 - 129.5)²/129.5 + (146 - 129.5)²/129.5
   = 2.102 + 2.102 = 4.204
df = k - 1 = 2 - 1 = 1
From the Chi-square table (Table B1 in Zar's book):
χ²(α = 0.05, df = 1) = 3.841 → for χ² = 4.204, 0.025 < p < 0.05
Therefore, reject Ho. The probability of the results being obtained by chance is between 0.025 and 0.05.
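The calculation above can be checked in a few lines; this is a minimal sketch assuming SciPy is available, using scipy.stats.chisquare, which defaults to equal expected frequencies:

```python
from scipy.stats import chisquare

# Observed frequencies: 113 agree, 146 disagree (n = 259)
observed = [113, 146]

# With no f_exp given, chisquare assumes equal expected frequencies
# (259/2 = 129.5 in each category), matching the slide's Ho
stat, p = chisquare(observed)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

The statistic agrees with the hand calculation (4.204), and the exact p-value falls in the tabulated 0.025-0.05 interval.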
For Genetics
Practical (1)
• Calculate the Chi-square for data consisting of 100 flowers against a hypothesized color ratio of 3:1 (red:green) and test the Ho
• Ho: the sample data come from a population having a 3:1 ratio of red to green flowers
• Observation: 84 red and 16 green
• Expected frequencies for 100 flowers:
– 75 red and 25 green
Please Do it Now
For Genetics
Practical (2)
• Calculate the Chi-square for data consisting of 100 flowers against a hypothesized color ratio of 3:1 (red:green) and test the Ho
• Ho: the sample data come from a population having a 3:1 ratio of red to green flowers
• Observation: 67 red and 33 green
• Expected frequencies for 100 flowers:
– 75 red and 25 green
Please Do it Now
For Genetics
For > 2 categories
• Ho: The sample of Drosophila comes from a population having a 9:3:3:1 ratio of pale body-normal wing (PNW) to pale-vestigial wing (PVW) to dark-normal wing (DNW) to dark-vestigial wing (DVW)
• Student's observations in the lab:

        PNW    PVW    DNW    DVW    Total
        300    77     89     36     502

Calculate the chi-square and test Ho
For Genetics
• Ho: The sample of Drosophila (F2) comes from a population having a 9:3:3:1 ratio of pale body-normal wing (PNW) to pale-vestigial wing (PVW) to dark-normal wing (DNW) to dark-vestigial wing (DVW)

                  PNW     PVW     DNW     DVW     Total
Observed          300     77      89      36      502
Exp. proportion   9/16    3/16    3/16    1/16    1
Expected          282.4   94.1    94.1    31.4    502
O - E             17.6    -17.1   -5.1    4.6     0
(O - E)²          309.8   292.4   26.0    21.2
(O - E)²/E        1.1     3.1     0.3     0.7

χ² = 1.1 + 3.1 + 0.3 + 0.7 = 5.2
df = 4 - 1 = 3
χ²(α = 0.05, df = 3) = 7.815 → for χ² = 5.20, 0.10 < p < 0.25
Therefore, accept Ho.
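The same test with unequal expected proportions can be sketched with scipy.stats.chisquare by passing f_exp (computed from the 9:3:3:1 ratio; the slides round the expected values to one decimal place):

```python
from scipy.stats import chisquare

observed = [300, 77, 89, 36]            # PNW, PVW, DNW, DVW
n = sum(observed)                       # 502
ratio = [9, 3, 3, 1]                    # hypothesized 9:3:3:1 ratio

# Expected frequencies: 282.375, 94.125, 94.125, 31.375
expected = [n * r / 16 for r in ratio]

stat, p = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

With full precision the statistic is about 5.18 (the slides' 5.2 comes from rounding each term), and p lies in the tabulated 0.10-0.25 interval, so Ho is not rejected.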
For Questionnaire
Cross Tabulation or Contingency Tables
– Further examination of the data on the opinion on increasing fuel tax to cut down air pollution (example 1):
– Ho: the decision is independent of sex

            Males      Females
Agree       13 (a)     100 (b)
Disagree    116 (c)    30 (d)

Expected frequency for cell b = (a + b)[(b + d)/N]

            Males                    Females                  Row total
Agree       13                       100                      113
(expected)  113(129/259) = 56.28     113(130/259) = 56.72
Disagree    116                      30                       146
(expected)  146(129/259) = 72.72     146(130/259) = 73.28
n           129                      130                      259
Cross tabulation or contingency tables:
– Ho: the decision is independent of sex

            Males     Females    n
Agree       13        100        113
(expected)  56.28     56.72
Disagree    116       30         146
(expected)  72.72     73.28
n           129       130        259

χ² = (13 - 56.28)²/56.28 + (100 - 56.72)²/56.72 + (116 - 72.72)²/72.72 + (30 - 73.28)²/73.28
   = 117.63
df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1
χ²(α = 0.05, df = 1) = 3.841 → p < 0.001
Therefore, reject Ho and accept HA that the decision is dependent on sex.
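A contingency-table sketch with scipy.stats.chi2_contingency, which also returns the expected frequencies so you can compare them with the hand-computed 56.28, 56.72, 72.72, 73.28:

```python
from scipy.stats import chi2_contingency

#              Males  Females
table = [[13, 100],    # Agree
         [116, 30]]    # Disagree

# correction=False reproduces the uncorrected chi-square from the slides;
# the default (True) would apply Yates' continuity correction
stat, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {stat:.2f}, df = {dof}, p = {p:.2e}")
print(expected)   # expected frequencies under independence
```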
Quicker method for 2 × 2 cross tabulation:

           Class A    Class B    n
State 1    a          b          a + b
State 2    c          d          c + d
n          a + c      b + d      n = a + b + c + d

χ² = n(ad - bc)² / [(a + b)(c + d)(a + c)(b + d)]

            Males    Females    n
Agree       13       100        113
Disagree    116      30         146
n           129      130        259

χ² = 259(13 × 30 - 116 × 100)² / [(113)(146)(129)(130)] = 117.64
χ²(α = 0.05, df = 1) = 3.841 → p < 0.001; therefore, reject Ho.
Yates' continuity correction:
• Chi-square is a continuous distribution, while the frequencies being analyzed are discontinuous (whole numbers).
• To improve the analysis, Yates' correction is often applied (Yates, 1934):
• χ² = Σ (|observed freq. - expected freq.| - 0.5)² / expected freq.
• For a 2 × 2 contingency table:
χ² = n(|ad - bc| - 0.5n)² / [(a + b)(c + d)(a + c)(b + d)]
Yates' Correction (example 1):
• χ² = n(|ad - bc| - 0.5n)² / [(a + b)(c + d)(a + c)(b + d)]

            Males    Females    n
Agree       13       100        113
Disagree    116      30         146
n           129      130        259

χ² = 259(|13 × 30 - 116 × 100| - 0.5 × 259)² / [(113)(146)(129)(130)]
   = 114.935 (smaller than 117.64, less bias)
χ²(α = 0.05, df = 1) = 3.841 → p < 0.001; therefore, reject Ho.
Practical 3:
• χ² = n(|ad - bc| - 0.5n)² / [(a + b)(c + d)(a + c)(b + d)]
• For a drug test, Ho: The survival of the animals is independent of whether the drug is administered

         Treated    Not treated    n
Dead     12         27             39
Alive    30         31             61
n        42         58             100

Using Yates' correction, calculate χ² and test the hypothesis
Please do it at home
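If you want to check your hand calculation afterwards, a sketch assuming SciPy: chi2_contingency applies Yates' correction to 2 × 2 tables by default (correction=True):

```python
from scipy.stats import chi2_contingency

#            Treated  Not treated
table = [[12, 27],    # Dead
         [30, 31]]    # Alive

# correction=True (the default) applies Yates' continuity correction
stat, p, dof, expected = chi2_contingency(table, correction=True)
print(f"Yates-corrected chi-square = {stat:.3f}, p = {p:.3f}")
```

The corrected statistic is about 2.60 with p above 0.05, so Ho (survival independent of treatment) is not rejected.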
Bias in Chi-square calculations
• If values of the expected frequency (fi) are very small, the calculated χ² is biased in that it is larger than the theoretical χ² value, and we shall tend to reject Ho.
• Rules: all fi > 1, and no more than 20% of fi < 5.0.
• The test may be conservative at significance levels < 5%, especially when the expected frequencies are all equal.
• If fi are small: (1) increase the sample size if possible or use the G test, or (2) combine the categories if possible.
The G test (log-likelihood ratio)
G = 2 Σ O ln(O/E)
• Similar to the χ² test
• Many statisticians believe that the G test is superior to the χ² test (although at present it is not as popular)
• For a 2 × 2 cross tabulation:

           Class A    Class B
State 1    a          b
State 2    c          d

The expected frequency for cell a = (a + b)[(a + c)/n]
Practical 3

         Treated       Not treated    n
Dead     12 (16.38)    27 (22.62)     39
Alive    30 (25.62)    31 (35.38)     61
n        42            58             100

G = 2 Σ O ln(O/E)

(1) Calculate G:
G = 2[12 ln(12/16.38) + 30 ln(30/25.62) + 27 ln(27/22.62) + 31 ln(31/35.38)]
G = 2(1.681) = 3.362
(2) Calculate Williams' correction: q = 1 + (w² - 1)/(6nd), where w is the number of frequency cells, n is the total number of measurements, and d is the degrees of freedom (r - 1)(c - 1)
q = 1 + (4² - 1)/[(6)(100)(1)] = 1.025
→ G (adjusted) = 3.362/1.025 = 3.28 (< 3.31 from the χ² test)
→ χ²(α = 0.05, df = 1) = 3.841 → p > 0.05; therefore, accept Ho.
• Ho: The sample of Drosophila (F2) comes from a population having a 9:3:3:1 ratio of pale body-normal wing (PNW) to pale-vestigial wing (PVW) to dark-normal wing (DNW) to dark-vestigial wing (DVW)

             PNW     PVW      DNW     DVW     Total
Observed     300     77       89      36      502
Expected     282.4   94.1     94.1    31.4
O ln(O/E)    18.14   -15.44   -4.96   4.92

G value:
G = 2(18.14 - 15.44 - 4.96 + 4.92) = 5.32
Williams' correction:
1 + (4² - 1)/[6(502)(3)] = 1.00166
G (adjusted):
5.32/1.00166 = 5.311
χ²(α = 0.05, df = 3) = 7.815 → for G = 5.31, 0.10 < p < 0.25
Therefore, accept Ho.
The Kolmogorov-Smirnov goodness of fit test = Kolmogorov-Smirnov one-sample test
• A goodness of fit test for data in ordered categories (ordinal scale), in contrast to the χ² test for nominal scale data
• Example: 35 cats were tested one at a time and allowed to choose among 5 different diets with different moisture content (1 = very moist to 5 = very dry):
• Ho: Cats prefer all five diets equally

            1    2     3     4    5    n
Observed    2    18    10    4    1    35
Expected    7    7     7     7    7    35
Kolmogorov-Smirnov one-sample test
• Ho: Cats prefer all five diets equally

                1    2     3     4     5     n
O               2    18    10    4     1     35
E               7    7     7     7     7     35
Cumulative O    2    20    30    34    35
Cumulative E    7    14    21    28    35
|di|            5    6     9     6     0

dmax = maximum |di| = 9
(dmax) α, k, n = (dmax) 0.05, 5, 35 = 7 (Table B8: k = no. of categories)
Since dmax = 9 > 7, reject Ho.
0.002 < p < 0.005
• When applicable (i.e. the categories are ordered), the K-S test is more powerful than the χ² test when n is small or when the observed frequencies are small.
• Note: if the order of the same data is changed to 2, 1, 4, 18 and 10, the χ² test will give the same result (it is independent of the order), but the calculated dmax from the K-S test will be different.
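The cumulative-frequency mechanics of the discrete K-S test are simple enough to script directly; a sketch with only the standard library:

```python
from itertools import accumulate

# Observed and expected frequencies for the five diets (ordered categories)
observed = [2, 18, 10, 4, 1]
expected = [7, 7, 7, 7, 7]

cum_o = list(accumulate(observed))    # cumulative observed frequencies
cum_e = list(accumulate(expected))    # cumulative expected frequencies
d = [abs(o - e) for o, e in zip(cum_o, cum_e)]
d_max = max(d)
print(d, d_max)    # dmax = 9 exceeds the critical value 7, so reject Ho
```

Reordering `observed` leaves a χ² statistic unchanged but alters `cum_o`, and hence `d_max`, which is exactly the point made in the note above.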
Kolmogorov-Smirnov one-sample test for continuous ratio scale data
• Example 22.11 (page 479 in Zar)
• Ho: Moths are distributed uniformly from ground level to a height of 25 m
• HA: Moths are not distributed uniformly from ground level to a height of 25 m
• Use Table B9
Ho: Moths are distributed uniformly from ground level to a height of 25 m

no.   Xi (height, m)   fi   Fi     Fi/15    Xi/25    Di       D'i
1     1.4              1    1      0.0667   0.056    0.0107   0.0560
2     2.6              1    2      0.1333   0.104    0.0293   0.0373
3     3.3              1    3      0.2000   0.132    0.0680   0.0013
4     4.2              1    4      0.2667   0.168    0.0987   0.0320
5     4.7              1    5      0.3333   0.188    0.1453   0.0787
6     5.6              2    7      0.4667   0.224    0.2427   0.1093
7     6.4              1    8      0.5333   0.256    0.2773   0.2107
8     7.7              1    9      0.6000   0.308    0.2920   0.2253
9     9.3              1    10     0.6667   0.372    0.2947   0.2280
10    10.6             1    11     0.7333   0.424    0.3093   0.2427
11    11.5             1    12     0.8000   0.460    0.3400   0.2733
12    12.4             1    13     0.8667   0.496    0.3707   0.3040
13    18.6             1    14     0.9333   0.744    0.1893   0.1227
14    22.3             1    15     1.0000   0.892    0.1080   0.0413

(fi = frequency at height Xi; Fi = cumulative frequency; Fi/15 = cumulative relative frequency; Xi/25 = relative expected frequency; Di = |Fi/n - Xi/25|; D'i = |F(i-1)/n - Xi/25|)

Dmax = max(Di, D'i) = 0.3707
Table B9: D 0.05, 15 = 0.3376 < Dmax
Therefore, reject Ho.
0.02 < p < 0.05
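The same continuous test can be run with scipy.stats.kstest against the uniform CDF on [0, 25]; SciPy's two-sided statistic is the larger of the Di and D'i columns:

```python
from scipy.stats import kstest, uniform

# Moth heights (m); 5.6 appears twice because fi = 2 at that height
heights = [1.4, 2.6, 3.3, 4.2, 4.7, 5.6, 5.6, 6.4, 7.7,
           9.3, 10.6, 11.5, 12.4, 18.6, 22.3]

# Ho: heights are uniform on [0, 25] -> uniform(loc=0, scale=25)
res = kstest(heights, uniform(loc=0, scale=25).cdf)
print(f"D = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```

The statistic reproduces Dmax = 0.3707 from the table, and the p-value falls in the tabulated 0.02-0.05 range, so Ho is rejected.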
Kolmogorov-Smirnov one-sample test for grouped data (example 22.11)

Xi                 0-5 m   5-10 m   10-15 m   15-20 m   20-25 m   n
observed fi        5       5        3         1         1         15
expected fi        3       3        3         3         3         15
Cumulative O fi    5       10       13        14        15
Cumulative E fi    3       6        9         12        15
|di|               2       4        4         2         0

dmax = 4
(dmax) 0.05, 5, 15 = 5 (use Table B8)
Thus, accept Ho. 0.05 < p < 0.10

• Note: power is lost by grouping the data (Ho is no longer rejected), and therefore grouping should be avoided whenever possible.
• The K-S test can be used to test normality of data.
• Recognizing the distribution of your data is important
– It provides a firm base on which to establish and test hypotheses
– If data are normally distributed, you can use parametric tests;
– Otherwise, transform the data to a normal distribution
– Or perform non-parametric tests
• For a reliable test of normality of interval data, n must be large enough (e.g. > 15)
– It is difficult to tell whether a small data set (e.g. n = 5) is normally distributed
• Methods for assessing normality:
– Inspection of the frequency histogram
– Probability plot
– Chi-square goodness of fit
– Kolmogorov-Smirnov one-sample test
– Symmetry and kurtosis: D'Agostino-Pearson K² test (Chapters 6 & 7, Zar 99)
Inspection of the frequency histogram
• Construct the frequency histogram
• Calculate the mean and median (mode as well, if possible)
• Check the shape of the distribution and the location of
these measurements
Probability plot
e.g. 1

Class    frequency
0-2      1
2-4      2
4-6      3
6-8      5
8-10     8
10-12    11
12-14    8
14-16    9
16-18    6
18-20    4
20-22    3
22-24    1

Percentile = cumulative frequency / 61; z = NORMSINV(percentile) in Excel

Upper         Cumulative                        Probit
class limit   frequency    Percentile   z       (5 + z)
2             1            0.0164       -2.1347  2.8653
4             3            0.0492       -1.6529  3.3471
6             6            0.0984       -1.2910  3.7090
8             11           0.1803       -0.9141  4.0859
10            19           0.3115       -0.4917  4.5083
12            30           0.4918       -0.0205  4.9795
14            38           0.6230       0.3132   5.3132
16            47           0.7705       0.7405   5.7405
18            53           0.8689       1.1210   6.1210
20            57           0.9344       1.5096   6.5096
22            60           0.9836       2.1347   7.1347
24            61           1.0000
Probability plot
e.g. 1
[Figure: left, frequency histogram (bin number 1-12, bin size = 2); right, expected vs. observed cumulative probability with fitted line y = 0.8502x + 0.0736, R² = 0.9711 - the points lie close to the straight line]
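The probit column above can be reproduced programmatically; a sketch assuming SciPy, where norm.ppf plays the role of Excel's =NORMSINV:

```python
from scipy.stats import norm

# Cumulative frequencies for e.g. 1 (n = 61); the final class is dropped
# because a cumulative percentile of 1.0 has no finite z value
cum_freq = [1, 3, 6, 11, 19, 30, 38, 47, 53, 57, 60]
n = 61

for cf in cum_freq:
    pct = cf / n          # cumulative percentile
    z = norm.ppf(pct)     # inverse of the standard normal CDF (=NORMSINV)
    print(f"{pct:.4f}  z = {z:.4f}  probit = {5 + z:.4f}")
```

Plotting the probits (or z values) against the upper class limits gives the probability plot; an approximately straight line suggests a normal distribution.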
Probability plot
e.g. 2

Class    frequency
0-2      10
2-4      10
4-6      24
6-8      11
8-10     9
10-12    4
12-14    5
14-16    2
16-18    2
18-20    4
20-22    1
22-24    8

(frequencies reconstructed from the cumulative column; n = 90)

Upper         Cumulative                        Probit
class limit   frequency    Percentile   z       (5 + z)
2             10           0.1111       -1.2206  3.7794
4             20           0.2222       -0.7647  4.2353
6             44           0.4889       -0.0279  4.9721
8             55           0.6111       0.2822   5.2822
10            64           0.7111       0.5566   5.5566
12            68           0.7556       0.6921   5.6921
14            73           0.8111       0.8820   5.8820
16            75           0.8333       0.9674   5.9674
18            77           0.8556       1.0606   6.0606
20            81           0.9000       1.2816   6.2816
22            82           0.9111       1.3476   6.3476
24            90           1.0000
Probability plot
e.g. 2
[Figure: left, frequency histogram (bin number 1-12, bin size = 2); right, expected vs. observed cumulative probability with fitted line y = 1.0876x - 0.2443, R² = 0.8576]
• Obviously, the data are not distributed on the line.
• Based on the frequency distribution of the data, the distribution is positively skewed (higher frequencies at lower classes)
• A concave curve indicates positive skew, which suggests a log-normal distribution (i.e. log-transformation of the upper class limit is required)
→ very common, e.g. mortality rates
• A convex curve indicates negative skew
→ less common (e.g. some binomial distributions)
• An S-shaped curve suggests 'bad' kurtosis: departure from normality, although the mean, median and mode remain equal
• Leptokurtic distribution: data bunched around the mean, giving a sharp peak
• Platykurtic distribution: a broad summit which falls rapidly in the tails
• Bimodal distributions (e.g. toxicity data) produce a sigmoid probability plot
• Multi-modal distributions (e.g. data from animals with several age-classes) produce an undulating wave-like curve
Chi-Square Goodness of Fit
The heights of 70 students: Chi-square goodness of fit to a normal distribution (Examples 6.1 and 7.4 in Zar).

no.  Height class  O    Upper limit  z =          P(z)    P(Xi)   E =        (O-E)²/E
                        Xi           (Xi-mean)/s                  70·P(Xi)
1    <62.5         0    62.5         -2.32        0.0102  0.0102  0.7172
2    62.5-63.5     2    63.5         -2.02        0.0219  0.0117  0.8191     0.1400 *
3    63.5-64.5     2    64.5         -1.71        0.0434  0.0214  1.4987     0.1677
4    64.5-65.5     3    65.5         -1.41        0.0791  0.0358  2.5048     0.0979
5    65.5-66.5     5    66.5         -1.11        0.1338  0.0546  3.8238     0.3618
6    66.5-67.5     4    67.5         -0.81        0.2099  0.0762  5.3318     0.3327
7    67.5-68.5     6    68.5         -0.50        0.3069  0.0970  6.7906     0.0921
8    68.5-69.5     5    69.5         -0.20        0.4198  0.1129  7.8996     1.0643
9    69.5-70.5     8    70.5         0.10         0.5397  0.1199  8.3939     0.0185
10   70.5-71.5     7    71.5         0.40         0.6561  0.1164  8.1467     0.1614
11   71.5-72.5     7    72.5         0.70         0.7593  0.1032  7.2220     0.0068
12   72.5-73.5     10   73.5         1.01         0.8428  0.0835  5.8479     2.9481
13   73.5-74.5     6    74.5         1.31         0.9046  0.0618  4.3251     0.6486
14   74.5-75.5     3    75.5         1.61         0.9463  0.0417  2.9219     0.0021
15   75.5-76.5     2    76.5         1.91         0.9721  0.0258  1.8029     0.0215
16   76.5-77.5     0    77.5         2.21         0.9866  0.0145  1.0161
17   77.5-78.5     0    78.5         2.52         0.9941  0.0075  0.5231     1.5392 *

* Classes 1-2 and 16-17 are pooled (E = 1.5363 and 1.5392) so that the expected frequencies are not too small.

Chi-square = 7.6026
χ² 0.05, 12 = 21.026
Accept Ho: the data are normally distributed
Calculation of the mean and sd from the grouped data:

Mid height Xi   fi    fiXi    fi(Xi)²
63              2     126     7938
64              2     128     8192
65              3     195     12675
66              5     330     21780
67              4     268     17956
68              6     408     27744
69              5     345     23805
70              8     560     39200
71              7     497     35287
72              7     504     36288
73              10    730     53290
74              6     444     32856
75              3     225     16875
76              2     152     11552

sum             70    4912    345438
mean = 4912/70 = 70.17
s² = (345438 - 4912²/70)/(70 - 1), so sd = 3.310

[Figure: frequency histogram of observed vs. expected frequencies against height (in), Xi, from 60 to 80]
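The grouped-data mean and standard deviation above can be checked with the standard library only; this sketch uses the same "machine formula" for s²:

```python
import math

# Mid-class heights and observed frequencies for the 70 students
mids = [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76]
freq = [2, 2, 3, 5, 4, 6, 5, 8, 7, 7, 10, 6, 3, 2]

n = sum(freq)                                          # 70
sum_fx = sum(f * x for f, x in zip(freq, mids))        # Σ fiXi = 4912
sum_fx2 = sum(f * x * x for f, x in zip(freq, mids))   # Σ fiXi² = 345438

mean = sum_fx / n                                # 70.17
var = (sum_fx2 - sum_fx**2 / n) / (n - 1)        # machine formula for s²
sd = math.sqrt(var)                              # 3.310
print(f"mean = {mean:.2f}, sd = {sd:.3f}")
```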
Kolmogorov-Smirnov one-sample test
The heights of 70 students: Kolmogorov-Smirnov goodness of fit to a normal distribution.

no.  Height class  Xi     O    cum. O  cum. rel.  z =          cum. rel. E  Di       D'i
                                       O freq.    (Xi-mean)/s  frequency
1    <62.5         62.5   0    0.0     0.0000     -2.32        0.0102       0.0102   0.0102
2    62.5-63.5     63.5   2    2.0     0.0286     -2.02        0.0219       0.0066   0.0219
3    63.5-64.5     64.5   2    4.0     0.0571     -1.71        0.0434       0.0138   0.0148
4    64.5-65.5     65.5   3    7.0     0.1000     -1.41        0.0791       0.0209   0.0220
5    65.5-66.5     66.5   5    12.0    0.1714     -1.11        0.1338       0.0377   0.0338
6    66.5-67.5     67.5   4    16.0    0.2286     -0.81        0.2099       0.0186   0.0385
7    67.5-68.5     68.5   6    22.0    0.3143     -0.50        0.3069       0.0073   0.0784
8    68.5-69.5     69.5   5    27.0    0.3857     -0.20        0.4198       0.0341   0.1055
9    69.5-70.5     70.5   8    35.0    0.5000     0.10         0.5397       0.0397   0.1540
10   70.5-71.5     71.5   7    42.0    0.6000     0.40         0.6561       0.0561   0.1561
11   71.5-72.5     72.5   7    49.0    0.7000     0.70         0.7593       0.0593   0.1593
12   72.5-73.5     73.5   10   59.0    0.8429     1.01         0.8428       0.0001   0.1428
13   73.5-74.5     74.5   6    65.0    0.9286     1.31         0.9046       0.0240   0.0617
14   74.5-75.5     75.5   3    68.0    0.9714     1.61         0.9463       0.0251   0.0178
15   75.5-76.5     76.5   2    70.0    1.0000     1.91         0.9721       0.0279   0.0007
16   76.5-77.5     77.5   0    70.0    1.0000     2.21         0.9866       0.0134   0.0134
17   77.5-78.5     78.5   0    70.0    1.0000     2.52         0.9941       0.0059   0.0059

Dmax = max(Di, D'i) = 0.1593
D 0.05, 70 = 0.1598 > Dmax
Accept Ho
(Another method can be found in example 7.14, Zar 99)
Symmetry (Skewness) and Kurtosis
Skewness
• A measure of the asymmetry of a distribution.
• The normal distribution is symmetric and has a skewness value of zero.
• A distribution with significant positive skewness has a long right tail.
• A distribution with significant negative skewness has a long left tail.
• As a rough guide, a skewness value more than twice its standard error is taken to indicate a departure from symmetry.
Symmetry (Skewness) and Kurtosis
Kurtosis
• A measure of the extent to which observations cluster around a central point.
• For a normal distribution, the value of the kurtosis statistic is 0.
• Positive kurtosis indicates that the observations cluster more and have longer tails than those in the normal distribution (→ leptokurtic).
• Negative kurtosis indicates that the observations cluster less and have shorter tails (→ platykurtic).
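Both measures are available in SciPy; a sketch on a made-up positively skewed sample (the data values are illustrative only):

```python
from scipy.stats import skew, kurtosis

# A small, deliberately right-skewed sample (one large value in the tail)
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

g1 = skew(sample)        # > 0 for a long right tail
g2 = kurtosis(sample)    # excess kurtosis: 0 for a normal distribution
print(f"skewness = {g1:.2f}, kurtosis = {g2:.2f}")
```

SciPy's kurtosis reports excess kurtosis (normal = 0, as on the slide); positive values indicate a leptokurtic shape, negative values a platykurtic one.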
• You should read Chapters 1-7 of Zar 1999, which have been covered by the five lectures so far.
• The frequency distribution of a sample can often be identified with a theoretical distribution, such as the normal distribution.
• Five methods for comparing a sample distribution with a theoretical one: inspection of the frequency histogram; probability plot; Chi-square goodness of fit; Kolmogorov-Smirnov one-sample test; and D'Agostino-Pearson K² test.
• Probability plots can be used for testing normal and log-normal distributions.
• Graphical methods often provide evidence of non-normal distributions, such as skewness and kurtosis (Excel or SPSS can determine the degree of these two measurements).
• The Chi-square goodness of fit or Kolmogorov-Smirnov one-sample test can also be used to test an unknown distribution against a theoretical distribution (apart from the normal distribution).
Binomial & Poisson Distributions
and their Application
(Chapters 24 & 25, Zar 1999)
Binomial
• Consider nominal scale data that come from
a population with only two categories
– members of a mammal litter may be classified
as male or female
– victims of an epidemic as dead or alive
– progeny of a Drosophila cross as white-eyed or
red-eyed
Binomial Distributions
The proportion of the population belonging to one of
the two categories is denoted as:
– p, then the other q = 1- p
– e.g. if 48% male and 52% female so
p = 0.48 and q = 0.52
(Source of photos: BBC)
http://zygote.swarthmore.edu/chap20.html
http://www.mun.ca/biology/scarr/Bird_sexing.htm
Binomial Distributions
• e.g. if p = 0.4 and q = 0.6: in a random sample of 10 you expect 4 males and 6 females; however, you might get 1 male and 9 females.
• The probability of two independent events both occurring is the product of the probabilities of the two separate events:
– (p)(q) = (0.4)(0.6) = 0.24;
– (p)(p) = 0.16; and
– (q)(q) = 0.36
Binomial Distributions
• e.g. if p = 0.4 and q = 0.6: in a random sample of 10 you expect 4 males and 6 females
• The probability of either of two mutually exclusive events is the sum of the probabilities of each event; e.g. for having one male and one female in a sample of two:
pq + qp = 2pq = 2(0.4)(0.6) = 0.48
• The possibilities for a sample of two (all male, all female, both sexes) sum to one:
0.16 + 0.36 + 0.48 = 1
Binomial Distributions
If a random sample of size n is taken from a binomial population, the probability of X individuals being in one category (other category = n - X) is
P(X) = [n!/(X!(n - X)!)] p^X q^(n-X)
For n = 5, X = 3, p = q = 0.5:
P(X) = (5!/(3!2!))(0.5³)(0.5²)
P(X) = (10)(0.125)(0.25) = 0.3125
For X = 0, 1, 2, 4, 5:
P(X) = 0.03125, 0.15625, 0.31250, 0.15625, 0.03125, respectively
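The formula maps directly onto the standard library; a sketch using math.comb for the binomial coefficient:

```python
from math import comb

def binom_pmf(n, x, p):
    """P(X) = [n!/(X!(n-X)!)] p^X q^(n-X), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(5, 3, 0.5))                      # 0.3125
print([binom_pmf(5, x, 0.5) for x in range(6)])  # all six probabilities
```

The six probabilities match the slide's values and sum to 1, as they must for a complete distribution.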
Binomial distributions
• For example: The data consist of observed frequencies of females in 54 litters of 5 offspring per litter. X = 0 denotes a litter having no females, X = 1 denotes a litter having one female, etc.; f is the observed number of litters, and ef is the number of litters expected if the null hypothesis is true. Computation of the values of ef requires the values of P(X).
• Ho: The sexes of the offspring are from a binomial distribution with p = q = 0.5

X   n   n-X   n!/(X!(n-X)!)   p     q     p^X       q^(n-X)   P(X)
0   5   5     1               0.5   0.5   1         0.03125   0.03125
1   5   4     5               0.5   0.5   0.5       0.0625    0.15625
2   5   3     10              0.5   0.5   0.25      0.125     0.31250
3   5   2     10              0.5   0.5   0.125     0.25      0.31250
4   5   1     5               0.5   0.5   0.0625    0.5       0.15625
5   5   0     1               0.5   0.5   0.03125   1         0.03125

Xi   Observed fi   efi = (P(X))(54)
0    3             1.688
1    10            8.438
2    14            16.875
3    17            16.875
4    9             8.438
5    1             1.688

χ² = Σ (Observed freq - Expected freq)²/Expected freq
χ² = (3 - 1.688)²/1.688 + 0.2948 + 0.4898 + 0.0009 + 0.0375 + 0.2801
   = 2.117
df = k - 1 = 6 - 1 = 5; χ² 0.05, 5 = 11.07, so accept Ho. p > 0.05
P(X) = [n!/(X!(n - X)!)] p^X q^(n-X)
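The whole fit, from binomial probabilities to the χ² statistic, can be sketched with the standard library:

```python
from math import comb

def binom_pmf(n, x, p):
    # P(X) = [n!/(X!(n-X)!)] p^X q^(n-X)
    return comb(n, x) * p**x * (1 - p)**(n - x)

observed = [3, 10, 14, 17, 9, 1]    # litters with X = 0..5 females
litters = sum(observed)             # 54

# Expected number of litters under Ho (p = q = 0.5)
expected = [litters * binom_pmf(5, x, 0.5) for x in range(6)]

chi2 = sum((o - e)**2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi2:.3f}")   # about 2.12, well below 11.07
```

Using unrounded expected frequencies gives about 2.118 (the slide's 2.117 reflects rounding of the intermediate terms); either way χ² is far below the critical value, so Ho is accepted.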
Poisson Distributions
Important in describing random occurrences, either objects in space or events in time.
P(X) = e^(-μ) μ^X / X!
• When n is large and p is very small, the binomial distribution approaches the Poisson distribution.
• Interesting property: σ² = μ
Poisson Distributions
P(X) = e^(-μ) μ^X / X!
• e.g. The data are the numbers of sparrow nests in areas of a given size (8,000 m²). A total of 40 areas of the same size were surveyed. Xi is the number of nests in an area; fi is the frequency of areas with Xi nests; and P(Xi) is the probability of Xi nests per area if the nests are distributed randomly.
• Ho: the population of sparrow nests is distributed randomly
Example 25.3 (Zar 1999)
• Ho: the population of sparrow nests is distributed randomly

Xi     O fi   fiXi   P(Xi)     E fi = [P(Xi)](40)   (O-E)²/E
0      9      0      0.33287   13.3148              1.398280
1      22     22     0.36616   14.6463              3.692154
2      6      12     0.20139   8.0555               0.524488
3      2      6      0.07384   2.9537               0.307921
4      1      4      0.02031   0.8123               0.043392
>=5    0
sum    40     44

mean = 44/40 = 1.1
Chi-square = 5.966234
df = k - 2 = 3
Chi-square (0.05, 3) = 7.815
Accept Ho
P(0) = e^(-1.1) = 0.33287
P(1) = (0.33287)(1.1)/1 = 0.36616
P(X) = e^(-μ) μ^X / X!
For further reading on Binomial and Poisson
distributions: Zar’s chapters 24 and 25