Transcript Document

Section 8.3
Suppose X1 , X2 , ..., Xn are a random sample from a distribution defined
by the p.d.f. f(x) for a < x < b, with corresponding distribution function F(x).
The random variables Y1 < Y2 < ... < Yn that order the sample from smallest
to largest are called the order statistics.
Suppose n = 2.
The space of (X1 , X2) is {(x1 , x2) | a < x1 < b , a < x2 < b}
The space of (Y1 , Y2) is {(y1 , y2) | a < y1 < y2 < b }
For a subset A of the space of (Y1 , Y2), we have that
$$P[(Y_1, Y_2) \in A] = P[\{(X_1, X_2) \in A\} \cup \{(X_2, X_1) \in A\}]$$
$$= P[(X_1, X_2) \in A] + P[(X_2, X_1) \in A] = 2 P[(X_1, X_2) \in A]$$
$$= 2 \iint_A f(x_1) f(x_2) \, dx_1 \, dx_2 = \iint_A 2 f(y_1) f(y_2) \, dy_1 \, dy_2$$
Therefore, the joint p.d.f. of (Y1 , Y2) must be
g(y1 , y2) = 2 f(y1) f(y2)
if a < y1 < y2 < b
To find the p.d.f. for Y1 , we first find the distribution function
$$G_1(y) = P(Y_1 \le y) = P[\min(X_1, X_2) \le y] = 1 - P[\min(X_1, X_2) > y]$$
$$= 1 - P(X_1 > y \cap X_2 > y) = 1 - P(X_1 > y) P(X_2 > y)$$
$$= 1 - [1 - P(X_1 \le y)] [1 - P(X_2 \le y)] = 1 - [1 - F(y)] [1 - F(y)] = 1 - [1 - F(y)]^2$$
The p.d.f. for Y1 is
$$g_1(y) = \frac{d}{dy} G_1(y) = -2[1 - F(y)][-f(y)] = 2[1 - F(y)] f(y) \quad \text{if } a < y < b$$
To find the p.d.f. for Y2 , we first find the distribution function
$$G_2(y) = P(Y_2 \le y) = P[\max(X_1, X_2) \le y] = P(X_1 \le y \cap X_2 \le y)$$
$$= P(X_1 \le y) P(X_2 \le y) = F(y) F(y) = [F(y)]^2$$
The p.d.f. for Y2 is
$$g_2(y) = \frac{d}{dy} G_2(y) = 2 F(y) f(y) \quad \text{if } a < y < b$$
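The two distribution functions for n = 2 can be sanity-checked by simulation. The sketch below is only an illustration: it assumes a U(0, 1) parent distribution (so F(y) = y), and the function name, the evaluation point y = 0.3, the seed, and the number of trials are all arbitrary choices, not part of the notes.

```python
import random

def simulate_min_max_cdf(y, trials=200_000, seed=0):
    """Estimate P(Y1 <= y) and P(Y2 <= y) for n = 2 draws from U(0, 1)."""
    rng = random.Random(seed)
    min_hits = 0
    max_hits = 0
    for _ in range(trials):
        x1, x2 = rng.random(), rng.random()
        min_hits += min(x1, x2) <= y
        max_hits += max(x1, x2) <= y
    return min_hits / trials, max_hits / trials

y = 0.3
F = y  # distribution function of U(0, 1) evaluated at y (assumed parent)
est_G1, est_G2 = simulate_min_max_cdf(y)
theory_G1 = 1 - (1 - F) ** 2  # G1(y) = 1 - [1 - F(y)]^2
theory_G2 = F ** 2            # G2(y) = [F(y)]^2
print(est_G1, theory_G1)
print(est_G2, theory_G2)
```

The estimates should land close to 0.51 and 0.09; the agreement is only approximate because it comes from random sampling.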
1. Suppose X1 , X2 is a random sample from a distribution defined by
the p.d.f.
f(x) = 2x if 0 < x < 1 .
Let Y1 , Y2 be the order statistics of the sample.
(a) Is f(x) a beta p.d.f., and if yes, for what values of α and β?
f(x) is a beta p.d.f. with α = 2 and β = 1 .
(b) Find the distribution function corresponding to the p.d.f.
$$F(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x^2 & \text{if } 0 < x \le 1 \\ 1 & \text{if } 1 < x \end{cases}$$
(c) Find the joint p.d.f. of the order statistics (Y1 , Y2).
The joint p.d.f. of Y1 , Y2 is
g(y1 , y2) = 2(2y1)(2y2) = 8y1y2
if 0 < y1 < y2 < 1
(d) Find the p.d.f. of Y1 .
The p.d.f. of Y1 is
$$g_1(y) = 2[1 - y^2](2y) = 4y(1 - y^2) \quad \text{if } 0 < y < 1$$
(e) Find the p.d.f. of Y2 .
The p.d.f. of Y2 is
$$g_2(y) = 2[y^2](2y) = 4y^3 \quad \text{if } 0 < y < 1$$
(f) Is either the p.d.f. of Y1 or the p.d.f. of Y2 a beta p.d.f., and if yes,
for what values of α and β?
The p.d.f. of Y1 is not a beta p.d.f.
The p.d.f. of Y2 is a beta p.d.f. with α = 4 and β = 1 .
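As a rough numerical check of Exercise 1, the sketch below integrates the two p.d.f.s over (0, 1) and compares the p.d.f. of Y2 with a beta p.d.f. with α = 4 and β = 1. The helper names and the midpoint rule are illustrative choices and not part of the exercise.

```python
import math

def midpoint_integral(func, lo, hi, steps=10_000):
    """Approximate the integral of func over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width

def g1(y):
    return 4 * y * (1 - y ** 2)  # p.d.f. of Y1 from part (d)

def g2(y):
    return 4 * y ** 3            # p.d.f. of Y2 from part (e)

def beta_pdf(y, alpha, beta):
    """Beta p.d.f. written with the gamma function."""
    const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return const * y ** (alpha - 1) * (1 - y) ** (beta - 1)

area_g1 = midpoint_integral(g1, 0.0, 1.0)
area_g2 = midpoint_integral(g2, 0.0, 1.0)
max_gap = max(abs(g2(y) - beta_pdf(y, 4, 1)) for y in (0.1, 0.5, 0.9))
print(area_g1, area_g2, max_gap)
```

Both areas should come out very close to 1, and the gap between g2 and the beta(4, 1) p.d.f. should be at the level of floating-point rounding.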
Suppose n = 3.
The space of (X1 , X2 , X3) is
{(x1 , x2 , x3) | a < x1 < b , a < x2 < b , a < x3 < b}
The space of (Y1 , Y2 , Y3 ) is {(y1 , y2 , y3) | a < y1 < y2 < y3 < b}
For a subset A of the space of (Y1 , Y2 , Y3), we have that
$$P[(Y_1, Y_2, Y_3) \in A] = P[\{(X_1, X_2, X_3) \in A\} \cup \{(X_1, X_3, X_2) \in A\} \cup \{(X_2, X_1, X_3) \in A\}$$
$$\cup \{(X_2, X_3, X_1) \in A\} \cup \{(X_3, X_1, X_2) \in A\} \cup \{(X_3, X_2, X_1) \in A\}]$$
$$= P[(X_1, X_2, X_3) \in A] + P[(X_1, X_3, X_2) \in A] + P[(X_2, X_1, X_3) \in A]$$
$$+ P[(X_2, X_3, X_1) \in A] + P[(X_3, X_1, X_2) \in A] + P[(X_3, X_2, X_1) \in A]$$
$$= 6 P[(X_1, X_2, X_3) \in A] = 6 \iiint_A f(x_1) f(x_2) f(x_3) \, dx_1 \, dx_2 \, dx_3$$
$$= \iiint_A 6 f(y_1) f(y_2) f(y_3) \, dy_1 \, dy_2 \, dy_3$$
Therefore, the joint p.d.f. of (Y1 , Y2 , Y3) must be
g(y1 , y2 , y3) = 6 f(y1) f(y2) f(y3)
if a < y1 < y2 < y3 < b
To find the p.d.f. for Y1 , we first find the distribution function
$$G_1(y) = P(Y_1 \le y) = P[\min(X_1, X_2, X_3) \le y] = 1 - P[\min(X_1, X_2, X_3) > y]$$
$$= 1 - P(X_1 > y \cap X_2 > y \cap X_3 > y) = 1 - P(X_1 > y) P(X_2 > y) P(X_3 > y)$$
$$= 1 - [1 - P(X_1 \le y)] [1 - P(X_2 \le y)] [1 - P(X_3 \le y)] = 1 - [1 - F(y)]^3$$
The p.d.f. for Y1 is
$$g_1(y) = \frac{d}{dy} G_1(y) = -3[1 - F(y)]^2 [-f(y)] = 3[1 - F(y)]^2 f(y) \quad \text{if } a < y < b$$
To find the p.d.f. for Y2 , we first find the distribution function
$$G_2(y) = P(Y_2 \le y) = P[\text{at least two of } X_1, X_2, X_3 \text{ are} \le y]$$
$$= \binom{3}{2} [F(y)]^2 [1 - F(y)]^1 + \binom{3}{3} [F(y)]^3 [1 - F(y)]^0 = 3[F(y)]^2 [1 - F(y)] + [F(y)]^3$$
The p.d.f. for Y2 is
$$g_2(y) = \frac{d}{dy} G_2(y) = 6[F(y)] f(y) [1 - F(y)] + 3[F(y)]^2 [-f(y)] + 3[F(y)]^2 f(y)$$
$$= 6[F(y)] [1 - F(y)] f(y) \quad \text{if } a < y < b$$
To find the p.d.f. for Y3 , we first find the distribution function
$$G_3(y) = P(Y_3 \le y) = P[\max(X_1, X_2, X_3) \le y] = P(X_1 \le y \cap X_2 \le y \cap X_3 \le y)$$
$$= P(X_1 \le y) P(X_2 \le y) P(X_3 \le y) = [F(y)]^3$$
The p.d.f. for Y3 is
$$g_3(y) = \frac{d}{dy} G_3(y) = 3[F(y)]^2 f(y) \quad \text{if } a < y < b$$
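The "at least two of three" step can be checked by brute force. The sketch below enumerates all eight outcomes of the indicators "X_i ≤ y", each true with probability p = F(y), and sums the probabilities of the outcomes that count. The value p = 1/3 and the function name are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_at_least(m, n, p):
    """P(at least m of n independent events occur), each with probability p,
    found by enumerating all 2^n outcomes."""
    total = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        hits = sum(outcome)
        if hits >= m:
            total += p ** hits * (1 - p) ** (n - hits)
    return total

p = Fraction(1, 3)  # hypothetical value of F(y)
G1 = prob_at_least(1, 3, p)  # minimum:  1 - (1 - p)^3
G2 = prob_at_least(2, 3, p)  # middle:   3 p^2 (1 - p) + p^3
G3 = prob_at_least(3, 3, p)  # maximum:  p^3
print(G1, G2, G3)
```

With exact fractions the enumerated values can be compared directly with the closed forms for G1, G2 and G3 derived above.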
Suppose n is any integer greater than 1.
The space of (X1 , X2 , …, Xn) is
{(x1 , x2 , …, xn) | a < x1 < b , a < x2 < b , …, a < xn < b}
The space of (Y1 , Y2 , …, Yn ) is
{(y1 , y2 , …, yn) | a < y1 < y2 < … < yn < b}
For a subset A of the space of (Y1 , Y2 , …, Yn), we have that
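$$P[(Y_1, Y_2, \ldots, Y_n) \in A] = P[\{(X_1, X_2, \ldots, X_n) \in A\} \cup \{(X_2, X_1, \ldots, X_n) \in A\} \cup \cdots]$$
$$= n! \, P[(X_1, X_2, \ldots, X_n) \in A] = n! \int \cdots \int_A f(x_1) f(x_2) \cdots f(x_n) \, dx_1 \, dx_2 \cdots dx_n$$
$$= \int \cdots \int_A n! \, f(y_1) f(y_2) \cdots f(y_n) \, dy_1 \, dy_2 \cdots dy_n$$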
Therefore, the joint p.d.f. of (Y1 , Y2 , …, Yn) must be
g(y1 , y2 , …, yn) = n! f(y1) f(y2) … f(yn)
if a < y1 < y2 < … < yn < b
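The factor n! counts the arrangements of the sample that lead to the same ordered values. A small illustration, using a made-up sample of four distinct numbers (the values are hypothetical):

```python
from itertools import permutations
from math import factorial

sample = (0.42, 0.07, 0.81, 0.33)  # hypothetical observed values, all distinct

# Every arrangement of the sample is a different point (x1, ..., xn) ...
arrangements = list(permutations(sample))

# ... but all of them map to the same ordered point (y1, ..., yn).
ordered_points = {tuple(sorted(arr)) for arr in arrangements}

print(len(arrangements), factorial(len(sample)))
print(ordered_points)
```

Here 4! = 24 distinct points in the space of (X1 , …, X4) collapse onto a single point in the space of the order statistics, which is the counting argument behind the factor n!.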
Suppose r is any integer from 1 to n.
To find the p.d.f. for Yr , we first find the distribution function
$$G_r(y) = P(Y_r \le y) = P[\text{at least } r \text{ of } X_1, X_2, \ldots, X_n \text{ are} \le y] = \sum_{k=r}^{n} \binom{n}{k} [F(y)]^k [1 - F(y)]^{n-k}$$
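The binomial sum for the distribution function of Yr translates directly into code. The sketch below is one possible rendering; the function name and the test value F(y) = 0.36 are illustrative assumptions. It is compared against the special cases already derived for n = 2 and n = 3.

```python
from math import comb

def order_stat_cdf(r, n, F_y):
    """G_r(y) = sum_{k=r}^{n} C(n, k) F^k (1 - F)^(n - k), where F_y = F(y)."""
    return sum(comb(n, k) * F_y ** k * (1 - F_y) ** (n - k)
               for k in range(r, n + 1))

F_y = 0.36  # hypothetical value of F(y)

min_n2 = order_stat_cdf(1, 2, F_y)     # should equal 1 - [1 - F(y)]^2
max_n2 = order_stat_cdf(2, 2, F_y)     # should equal [F(y)]^2
median_n3 = order_stat_cdf(2, 3, F_y)  # should equal 3F^2(1 - F) + F^3
print(min_n2, max_n2, median_n3)
```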
The p.d.f. for Yr is
$$g_r(y) = \frac{d}{dy} G_r(y) = \frac{d}{dy} \sum_{k=r}^{n} \binom{n}{k} [F(y)]^k [1 - F(y)]^{n-k}$$
$$= \frac{d}{dy} \sum_{k=r}^{n-1} \binom{n}{k} [F(y)]^k [1 - F(y)]^{n-k} + \frac{d}{dy} [F(y)]^n$$
$$= \sum_{k=r}^{n-1} \left\{ \frac{n!}{k! \, (n-k)!} \, k [F(y)]^{k-1} f(y) [1 - F(y)]^{n-k} + \frac{n!}{k! \, (n-k)!} [F(y)]^k (n-k) [1 - F(y)]^{n-k-1} [-f(y)] \right\} + n [F(y)]^{n-1} f(y)$$
Observe that when k = r the second term in braces is the
negative of the first term in braces when k = r + 1.
This pattern continues until k = n – 1, when the
second term is the negative of the isolated term.
Consequently, the p.d.f. for Yr is
$$g_r(y) = \frac{n!}{(r-1)! \, (n-r)!} [F(y)]^{r-1} [1 - F(y)]^{n-r} f(y) \quad \text{if } a < y < b$$
Now, go to Exercise #2:
2. Suppose the random sample X1 , X2 , X3 , X4 , X5 is from a
distribution defined by the p.d.f.
f(x) = 2x if 0 < x < 1 .
Let Y1 , Y2 , Y3 , Y4 , Y5 be the order statistics of the sample.
(a) Find the joint p.d.f. of the order statistics (Y1 , Y2 , Y3 , Y4 , Y5).
The joint p.d.f. of Y1 , Y2 , Y3 , Y4 , Y5 is
g(y1 , y2 , y3 , y4 , y5) = 3840 y1 y2 y3 y4 y5
if 0 < y1 < y2 < y3 < y4 < y5 < 1
(b) Find the p.d.f. of Y1 .
The p.d.f. of Y1 is
$$g_1(y) = \frac{5!}{(1-1)! \, (5-1)!} [y^2]^{1-1} [1 - y^2]^{5-1} (2y) = 10y(1 - y^2)^4 \quad \text{if } 0 < y < 1$$
(c) Find the p.d.f. of Y5 .
The p.d.f. of Y5 is
$$g_5(y) = \frac{5!}{(5-1)! \, (5-5)!} [y^2]^{5-1} [1 - y^2]^{5-5} (2y) = 10y^9 \quad \text{if } 0 < y < 1$$
(d) Find the p.d.f. of Y3 .
The p.d.f. of Y3 is
$$g_3(y) = \frac{5!}{(3-1)! \, (5-3)!} [y^2]^{3-1} [1 - y^2]^{5-3} (2y) = 60y^5(1 - y^2)^2 \quad \text{if } 0 < y < 1$$
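The constants in parts (a) through (d) can be reproduced with a short script. This is a sketch that codes the general formula for the p.d.f. of Yr and evaluates it for F(y) = y², f(y) = 2y; the function name and the evaluation point y = 0.7 are arbitrary illustrative choices.

```python
from math import factorial

def order_stat_pdf(r, n, y, F, f):
    """g_r(y) = n! / ((r-1)! (n-r)!) * F(y)^(r-1) * (1 - F(y))^(n-r) * f(y)."""
    coeff = factorial(n) // (factorial(r - 1) * factorial(n - r))
    return coeff * F(y) ** (r - 1) * (1 - F(y)) ** (n - r) * f(y)

def F(y):
    return y ** 2   # distribution function for f(x) = 2x on (0, 1)

def f(y):
    return 2 * y    # p.d.f. from Exercise 2

joint_constant = factorial(5) * 2 ** 5  # 5! times the factor 2 from each f(y_i)

y = 0.7  # arbitrary point in (0, 1)
g1 = order_stat_pdf(1, 5, y, F, f)  # compare with 10 y (1 - y^2)^4
g3 = order_stat_pdf(3, 5, y, F, f)  # compare with 60 y^5 (1 - y^2)^2
g5 = order_stat_pdf(5, 5, y, F, f)  # compare with 10 y^9
print(joint_constant, g1, g3, g5)
```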
(e) Find P(Y1 ≤ 1/2).
$$P(Y_1 \le 1/2) = \int_0^{1/2} 10y(1 - y^2)^4 \, dy = -5 \int_0^{1/2} -2y(1 - y^2)^4 \, dy = -5 \left[ \frac{(1 - y^2)^5}{5} \right]_{y=0}^{1/2} = 1 - \left( \frac{3}{4} \right)^5 = \frac{781}{1024}$$
Note that an alternative approach is
$$P(Y_1 \le 1/2) = P[\min(X_1, X_2, X_3, X_4, X_5) \le 1/2] = 1 - P[\min(X_1, X_2, X_3, X_4, X_5) > 1/2]$$
$$= 1 - P[X_1 > 1/2 \cap \cdots \cap X_5 > 1/2] = 1 - P[X_1 > 1/2] \cdots P[X_5 > 1/2]$$
$$= 1 - [1 - 1/4]^5 = 1 - \left( \frac{3}{4} \right)^5 = \frac{781}{1024}$$
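(f) Find P(Y5 ≤ 1/2).
$$P(Y_5 \le 1/2) = \int_0^{1/2} 10y^9 \, dy = \left[ y^{10} \right]_{y=0}^{1/2} = \left( \frac{1}{2} \right)^{10} = \frac{1}{1024}$$
Note that an alternative approach is
$$P(Y_5 \le 1/2) = P[\max(X_1, X_2, X_3, X_4, X_5) \le 1/2] = P[X_1 \le 1/2 \cap \cdots \cap X_5 \le 1/2]$$
$$= P[X_1 \le 1/2] \cdots P[X_5 \le 1/2] = ([1/2]^2)^5 = \left( \frac{1}{2} \right)^{10} = \frac{1}{1024}$$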
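(g) Find P(Y3 ≤ 1/2).
$$P(Y_3 \le 1/2) = \int_0^{1/2} 60y^5(1 - y^2)^2 \, dy$$
Since this looks hard to integrate, we shall use an alternative approach:
$$P(Y_3 \le 1/2) = P[\text{at least three of } X_1, X_2, X_3, X_4, X_5 \text{ are} \le 1/2]$$
$$= \binom{5}{3} [1/4]^3 [1 - 1/4]^2 + \binom{5}{4} [1/4]^4 [1 - 1/4]^1 + \binom{5}{5} [1/4]^5$$
$$= 10 \left( \frac{1}{4} \right)^3 \left( \frac{3}{4} \right)^2 + 5 \left( \frac{1}{4} \right)^4 \left( \frac{3}{4} \right) + \left( \frac{1}{4} \right)^5 = \frac{106}{1024} = \frac{53}{512}$$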
Note that this probability can be read as 0.1035 from Table II in the
appendix of the textbook.
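The integral in part (g) can also be evaluated directly once the integrand is expanded as a polynomial, 60y⁵ – 120y⁷ + 60y⁹. The sketch below does this with exact fractions and compares the result with the binomial sum; the helper name and the dictionary of coefficients are illustrative choices.

```python
from fractions import Fraction
from math import comb

# 60 y^5 (1 - y^2)^2 expanded: 60 y^5 - 120 y^7 + 60 y^9  (power -> coefficient)
coeffs = {5: 60, 7: -120, 9: 60}

def integrate_poly(coeffs, upper):
    """Exact integral from 0 to upper of sum(c * y**power)."""
    return sum(Fraction(c, power + 1) * upper ** (power + 1)
               for power, c in coeffs.items())

by_integration = integrate_poly(coeffs, Fraction(1, 2))

p = Fraction(1, 4)  # P(X <= 1/2) = F(1/2) = (1/2)^2
by_binomial = sum(comb(5, k) * p ** k * (1 - p) ** (5 - k) for k in range(3, 6))

print(by_integration, by_binomial, float(by_binomial))
```

Both routes give 53/512, which is about 0.1035.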
3. Suppose the random sample X1 , X2 , … , Xn is from a U(0,1)
distribution. Let Y1 , Y2 , … , Yn be the order statistics of the sample.
(Note: Parts of this Exercise are the same as Text Exercise 8.3-6.)
(a) Find the distribution function corresponding to the U(0, 1)
distribution.
$$F(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } 0 < x \le 1 \\ 1 & \text{if } 1 < x \end{cases}$$
(b) Find the joint p.d.f. of the order statistics (Y1 , Y2 , … , Yn).
The joint p.d.f. of Y1 , Y2 , …, Yn is
g(y1 , y2 , …, yn) = n! if 0 < y1 < y2 < … < yn < 1
(c) Find the p.d.f. of Yr where r is any integer from 1 to n.
The p.d.f. of Yr is
$$g_r(y) = \frac{n!}{(r-1)! \, (n-r)!} \, y^{r-1} (1 - y)^{n-r} \quad \text{if } 0 < y < 1$$
Realizing that Γ(n + 1) = n! , Γ(r) = (r – 1)! , and Γ(n – r + 1) = (n – r)! ,
we find that Yr has a beta distribution with α = r and β = n – r + 1 .
This is essentially what Text Exercise 8.3-6(c) says to show.
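The match between the p.d.f. of Yr and the beta p.d.f. can be confirmed numerically. The sketch below assumes n = 7 and a few evaluation points, both arbitrary, and writes the beta p.d.f. with Python's math.gamma; the function names are illustrative.

```python
import math

def uniform_order_stat_pdf(r, n, y):
    """p.d.f. of the r-th order statistic of n draws from U(0, 1)."""
    coeff = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return coeff * y ** (r - 1) * (1 - y) ** (n - r)

def beta_pdf(y, alpha, beta):
    """Beta p.d.f. written with the gamma function."""
    const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return const * y ** (alpha - 1) * (1 - y) ** (beta - 1)

n = 7  # arbitrary sample size for the check
max_gap = max(
    abs(uniform_order_stat_pdf(r, n, y) - beta_pdf(y, r, n - r + 1))
    for r in range(1, n + 1)
    for y in (0.2, 0.5, 0.8)
)
print(max_gap)
```

The largest gap should be at the level of floating-point rounding.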
(d) Find the mean and variance of Yr where r is any integer from
1 to n.
$$E(Y_r) = \frac{\alpha}{\alpha + \beta} = \frac{r}{n+1}$$
$$\operatorname{Var}(Y_r) = \frac{\alpha \beta}{(\alpha + \beta + 1)(\alpha + \beta)^2} = \frac{r(n - r + 1)}{(n + 2)(n + 1)^2}$$
(e) Find E(Yr+1 – Yr) where r is any integer from 1 to n – 1.
$$E(Y_{r+1} - Y_r) = \frac{r+1}{n+1} - \frac{r}{n+1} = \frac{1}{n+1}$$
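A short sketch of these formulas follows, using exact fractions for the mean and variance and a midpoint-rule integral as an independent check of one mean. The sample size n = 4 and all function names are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def uniform_order_stat_mean(r, n):
    return Fraction(r, n + 1)                       # E(Y_r) = r / (n + 1)

def uniform_order_stat_var(r, n):
    return Fraction(r * (n - r + 1), (n + 2) * (n + 1) ** 2)

def numeric_mean(r, n, steps=10_000):
    """Midpoint-rule approximation of the integral of y * g_r(y) over (0, 1)."""
    coeff = factorial(n) / (factorial(r - 1) * factorial(n - r))
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * width
        total += y * coeff * y ** (r - 1) * (1 - y) ** (n - r)
    return total * width

n = 4  # arbitrary sample size
means = [uniform_order_stat_mean(r, n) for r in range(1, n + 1)]
gaps = [means[i + 1] - means[i] for i in range(n - 1)]  # each is 1 / (n + 1)
check = numeric_mean(2, n)
print(means, gaps, uniform_order_stat_var(2, n), check)
```

On average the order statistics of a U(0, 1) sample split the interval into n + 1 equal pieces, which is what the constant gap 1/(n + 1) expresses.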
4. Let Q have a U(0, 1) distribution. For constants b > a, define the
random variable X = (b – a)Q + a .
(a) Find the distribution function for X, find the p.d.f. for X, and state
what type of distribution X has.
The distribution function for Q is
$$F(q) = P(Q \le q) = \begin{cases} 0 & \text{if } q \le 0 \\ q & \text{if } 0 < q \le 1 \\ 1 & \text{if } 1 < q \end{cases}$$
The space for X is {x : a < x < b}. The distribution function for X is
$$G(x) = P(X \le x) = P([b - a]Q + a \le x) = P(Q \le [x - a] / [b - a]) = \frac{x - a}{b - a} \quad \text{for } a < x < b$$
The p.d.f. for X is
$$g(x) = \frac{1}{b - a} \quad \text{for } a < x < b$$
We see then that X has a U(a, b) distribution.
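The transformation X = (b – a)Q + a is also how one would typically generate U(a, b) values from U(0, 1) values. The sketch below is an illustration only: the endpoints a = 2 and b = 6, the point x0 = 3, the seed, and the function name are all hypothetical choices.

```python
import random

def sample_uniform_ab(a, b, size, seed=0):
    """Draw Q from U(0, 1) and return X = (b - a) * Q + a for each draw."""
    rng = random.Random(seed)
    return [(b - a) * rng.random() + a for _ in range(size)]

a, b = 2.0, 6.0  # hypothetical endpoints with b > a
xs = sample_uniform_ab(a, b, 100_000)

x0 = 3.0
empirical_cdf = sum(x <= x0 for x in xs) / len(xs)
theory_cdf = (x0 - a) / (b - a)  # G(x) = (x - a) / (b - a)
print(min(xs), max(xs), empirical_cdf, theory_cdf)
```

The empirical proportion should be close to 0.25, with some sampling error.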
(b) Let Q1 , Q2 , Q3 be a random sample selected from the U(0, 1)
distribution, and let V1 , V2 , V3 be the order statistics. Also, let
X1 = (b – a)Q1 + a , X2 = (b – a)Q2 + a , X3 = (b – a)Q3 + a ,
and let Y1 , Y2 , Y3 be the order statistics, which implies
Y1 = (b – a)V1 + a , Y2 = (b – a)V2 + a , Y3 = (b – a)V3 + a .
State why X1 , X2 , X3 is a random sample, use part (a) to find the
type of distribution this random sample is from, and use Class
Exercise #3 to find E(Y1) , Var(Y1) , E(Y2) , Var(Y2) , E(Y3) ,
Var(Y3) , and E(Y1Y3) .
Since Q1 , Q2 , Q3 are independent, then X1 , X2 , X3 are independent
and this together with part (a) implies X1 , X2 , X3 is a random sample
from a U(a, b) distribution.
$$E(Y_1) = E([b - a]V_1 + a) = [b - a]E(V_1) + a = [b - a] \frac{r}{n+1} + a = [b - a] \frac{1}{3+1} + a = \frac{b + 3a}{4}$$
$$\operatorname{Var}(Y_1) = \operatorname{Var}([b - a]V_1 + a) = (b - a)^2 \operatorname{Var}(V_1) = (b - a)^2 \frac{r(n - r + 1)}{(n + 2)(n + 1)^2} = (b - a)^2 \frac{1(3 - 1 + 1)}{(3 + 2)(3 + 1)^2} = \frac{3(b - a)^2}{80}$$
$$E(Y_2) = E([b - a]V_2 + a) = [b - a]E(V_2) + a = [b - a] \frac{r}{n+1} + a = [b - a] \frac{2}{3+1} + a = \frac{b + a}{2}$$
$$\operatorname{Var}(Y_2) = \operatorname{Var}([b - a]V_2 + a) = (b - a)^2 \operatorname{Var}(V_2) = (b - a)^2 \frac{r(n - r + 1)}{(n + 2)(n + 1)^2} = (b - a)^2 \frac{2(3 - 2 + 1)}{(3 + 2)(3 + 1)^2} = \frac{(b - a)^2}{20}$$
$$E(Y_3) = E([b - a]V_3 + a) = [b - a]E(V_3) + a = [b - a] \frac{r}{n+1} + a = [b - a] \frac{3}{3+1} + a = \frac{3b + a}{4}$$
$$\operatorname{Var}(Y_3) = \operatorname{Var}([b - a]V_3 + a) = (b - a)^2 \operatorname{Var}(V_3) = (b - a)^2 \frac{r(n - r + 1)}{(n + 2)(n + 1)^2} = (b - a)^2 \frac{3(3 - 3 + 1)}{(3 + 2)(3 + 1)^2} = \frac{3(b - a)^2}{80}$$
$$E(Y_1 Y_3) = E\{([b - a]V_1 + a)([b - a]V_3 + a)\} = E\{[b - a]^2 V_1 V_3 + a[b - a]V_1 + a[b - a]V_3 + a^2\}$$
$$= [b - a]^2 E(V_1 V_3) + a[b - a]E(V_1) + a[b - a]E(V_3) + a^2$$
To find E(V1V3), we first recall from part (b) of Class Exercise #3
that the joint p.d.f. of (V1 , V2 , V3) is
$$g(v_1, v_2, v_3) = 6 \quad \text{if } 0 < v_1 < v_2 < v_3 < 1$$
$$E(V_1 V_3) = \int_0^1 \int_0^{v_3} \int_0^{v_2} 6 v_1 v_3 \, dv_1 \, dv_2 \, dv_3 = \int_0^1 \int_0^{v_3} \left[ 3 v_1^2 v_3 \right]_{v_1 = 0}^{v_2} dv_2 \, dv_3 = \int_0^1 \int_0^{v_3} 3 v_2^2 v_3 \, dv_2 \, dv_3$$
$$= \int_0^1 \left[ v_2^3 v_3 \right]_{v_2 = 0}^{v_3} dv_3 = \int_0^1 v_3^4 \, dv_3 = \left[ \frac{v_3^5}{5} \right]_{v_3 = 0}^{1} = \frac{1}{5}$$
$$E(Y_1 Y_3) = [b - a]^2 E(V_1 V_3) + a[b - a]E(V_1) + a[b - a]E(V_3) + a^2$$
$$= [b - a]^2 \frac{1}{5} + a[b - a] \frac{1}{4} + a[b - a] \frac{3}{4} + a^2 = \frac{[b - a]^2}{5} + ab$$
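The results of Exercise 4(b) can be spot-checked with exact arithmetic for particular endpoints. The sketch below assumes a = 2 and b = 6, which are hypothetical values chosen only to make the numbers concrete; the function name is likewise illustrative.

```python
from fractions import Fraction

def uniform_ab_order_stat_moments(r, n, a, b):
    """Mean and variance of Y_r = (b - a) V_r + a, with V_r from U(0, 1)."""
    mean_v = Fraction(r, n + 1)
    var_v = Fraction(r * (n - r + 1), (n + 2) * (n + 1) ** 2)
    return (b - a) * mean_v + a, (b - a) ** 2 * var_v

a, b = Fraction(2), Fraction(6)  # hypothetical endpoints
moments = {r: uniform_ab_order_stat_moments(r, 3, a, b) for r in (1, 2, 3)}

# E(Y1 Y3) term by term, using E(V1 V3) = 1/5, E(V1) = 1/4, E(V3) = 3/4
e_y1y3 = ((b - a) ** 2 * Fraction(1, 5)
          + a * (b - a) * Fraction(1, 4)
          + a * (b - a) * Fraction(3, 4)
          + a ** 2)
closed_form = (b - a) ** 2 / 5 + a * b
print(moments, e_y1y3, closed_form)
```

For these endpoints the means are 3, 4 and 5, the variances are 3/5, 4/5 and 3/5, and both expressions for E(Y1Y3) give 76/5.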
Recall that the (100p)th percentile of the distribution defined by p.d.f.
f(x) is a number π_p such that
$$\int_{-\infty}^{\pi_p} f(x) \, dx = F(\pi_p) = p$$
which motivates the following definition:
The (100p)th percentile of the sample X1 , X2 , …, Xn is defined to be
Yr where r = (n+1)p, if (n+1)p is an integer;
a weighted average of Yr and Yr+1 where r = ⌊(n+1)p⌋ (the integer part of (n+1)p), if (n+1)p is not an integer.
Note: This definition is extended to an observed sample of values
x1 , x2 , …, xn where the ordered values in the sample are represented
by y1 , y2 , …, yn .
5. Find the 40th percentile and the 80th percentile for data of Text
Example 8.3-5.
1013 1019 1021 1024 1026 1028 1033 1035 1039 1040 1043 1047
The detailed definition of sample order statistics was given in
Section 3.2, and an Excel spreadsheet was constructed to find
sample order statistics. Recall that the Excel formulas were
slightly different.
The location of the 40th percentile is (n + 1)p = (13)(0.40) = 5.2 .
40th percentile = y5 + (0.2)(y6 – y5) = 1026 + (0.2)(1028 – 1026) = 1026.4
The location of the 80th percentile is (n + 1)p = (13)(0.80) = 10.4 .
80th percentile = y10 + (0.4)(y11 – y10) = 1040 + (0.4)(1043 – 1040) = 1041.2
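The percentile calculation can be written as a small function. This is a sketch of the (n + 1)p rule as used in this exercise, with linear weighting between the two neighboring order statistics; the function name and the rounding guard are illustrative choices, and the Excel formulas mentioned above may give slightly different values.

```python
import math

def sample_percentile(data, p):
    """(100p)th sample percentile using location (n + 1)p on the ordered values."""
    ys = sorted(data)
    n = len(ys)
    location = round((n + 1) * p, 9)  # rounding guards against float noise
    r = math.floor(location)          # integer part of (n + 1)p
    frac = location - r               # weight placed on y_{r+1}
    if r < 1 or r > n or (r == n and frac > 0):
        raise ValueError("(n + 1)p falls outside the range of the order statistics")
    if frac == 0:
        return ys[r - 1]              # (n + 1)p is an integer: use y_r
    return ys[r - 1] + frac * (ys[r] - ys[r - 1])

data = [1013, 1019, 1021, 1024, 1026, 1028,
        1033, 1035, 1039, 1040, 1043, 1047]
p40 = sample_percentile(data, 0.40)
p80 = sample_percentile(data, 0.80)
print(round(p40, 1), round(p80, 1))  # 1026.4 1041.2
```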