#### Transcript Chap. 5 - Sun Yat

```Chapter 5. Joint Probability
Distributions and Random Sample
Weiqi Luo (骆伟祺)
School of Software
Sun Yat-Sen University
Email：[email protected] Office：# A313
Chapter 5: Joint Probability Distributions and
Random Sample





5.1. Jointly Distributed Random Variables
5.2. Expected Values, Covariance, and Correlation
5.3. Statistics and Their Distributions
5.4. The Distribution of the Sample Mean
5.5. The Distribution of a Linear Combination
5.1. Jointly Distributed Random Variables
 The Joint Probability Mass Function for Two
Discrete Random Variables
Let X and Y be two discrete random variables defined
on the sample space S of an experiment. The joint
probability mass function p(x,y) is defined for each pair
of numbers (x,y) by
p(x, y) = P(X = x and Y = y)
 Let A be any set consisting of pairs of (x,y) values. Then
the probability P[(X,Y)∈A] is obtained by summing the
joint pmf over pairs in A:
P[(X, Y) ∈ A] = ∑_{(x,y)∈A} p(x, y)
 Two requirements for a pmf
p(x, y) ≥ 0 and ∑_x ∑_y p(x, y) = 1
 Example 5.1
A large insurance agency services a number of customers who have
purchased both a homeowner’s policy and an automobile policy from the
agency. For each type of policy, a deductible amount must be specified. For an
automobile policy, the choices are \$100 and \$250, whereas for a homeowner’s
policy the choices are 0, \$100, and \$200.
Suppose an individual with both types of policy is selected at random from
the agency’s files. Let X = the deductible amount on the auto policy,
Y = the deductible amount on the homeowner’s policy
Joint Probability Table
p(x,y)  | y = 0 | y = 100 | y = 200
x = 100 | 0.20  | 0.10    | 0.20
x = 250 | 0.05  | 0.15    | 0.30
 Example 5.1 (Cont’)
p(x,y)  | y = 0 | y = 100 | y = 200
x = 100 | 0.20  | 0.10    | 0.20
x = 250 | 0.05  | 0.15    | 0.30
p(100,100) =P(X=100 and Y=100) = 0.10
P(Y ≥ 100) = p(100,100) + p(250,100) + p(100,200) + p(250,200) = 0.75
 The marginal probability mass function
The marginal probability mass functions of X and Y,
denoted by pX(x) and pY(y), respectively, are given by
pX(x) = ∑_y p(x, y);  pY(y) = ∑_x p(x, y)
[Table: joint pmf entries p_{i,j} for X = X1, …, Xn (rows) and Y = Y1, …, Ym (columns); the row totals give pX and the column totals give pY]
 Example 5.2 (Ex. 5.1 Cont’)
The possible X values are x=100 and x=250, so
computing row totals in the joint probability table yields
p(x,y)  | y = 0 | y = 100 | y = 200
x = 100 | 0.20  | 0.10    | 0.20
x = 250 | 0.05  | 0.15    | 0.30
pX(100) = p(100,0) + p(100,100) + p(100,200) = 0.5
pX(250) = p(250,0) + p(250,100) + p(250,200) = 0.5
pX(x) = 0.5 for x = 100, 250; 0 otherwise
 Example 5.2 (Cont’)
p(x,y)  | y = 0 | y = 100 | y = 200
x = 100 | 0.20  | 0.10    | 0.20
x = 250 | 0.05  | 0.15    | 0.30
pY(0) = p(100,0) + p(250,0) = 0.20 + 0.05 = 0.25
pY(100) = p(100,100) + p(250,100) = 0.10 + 0.15 = 0.25
pY(200) = p(100,200) + p(250,200) = 0.20 + 0.30 = 0.50
pY(y) = 0.25 for y = 0, 100; 0.5 for y = 200; 0 otherwise
P(Y ≥ 100) = p(100,100) + p(250,100) + p(100,200) + p(250,200)
= pY(100) + pY(200) = 0.75
 The Joint Probability Density Function for Two
Continuous Random Variables
Let X and Y be two continuous random variables. Then
f(x,y) is the joint probability density function for X and Y if
for any two-dimensional set A
P[(X, Y) ∈ A] = ∬_A f(x, y) dx dy
Two requirements for a joint pdf
1. f(x, y) ≥ 0 for all pairs (x, y) in R²
2. ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dx dy = 1
 In particular, if A is the two-dimensional rectangle
{(x,y):a ≤ x ≤ b, c ≤ y ≤ d},then
P[(X, Y) ∈ A] = P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx
[Figure: the surface f(x,y) above the rectangle A in the x-y plane]
 Example 5.3
A bank operates both a drive-up facility and a walk-up window.
On a randomly selected day, let X = the proportion of time that
the drive-up facility is in use, Y = the proportion of time that the
walk-up window is in use. Let the joint pdf of (X,Y) be
6
2
(
x

y
)

f ( x, y )   5
0
0  x  1,0  y  1
otherwise
1. Verify that f(x,y) is a joint probability density function;
2. Determine the probability P(0  X  1 ,0  Y  1 )
4
12
4
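Worked check (a sketch of the intermediate steps, which the slides leave to the reader):
1. f(x,y) ≥ 0 on the unit square, and ∫_0^1 ∫_0^1 (6/5)(x + y²) dx dy = (6/5)(1/2 + 1/3) = 1, so f is a legitimate joint pdf.
2. P(0 ≤ X ≤ 1/4, 0 ≤ Y ≤ 1/4) = ∫_0^{1/4} ∫_0^{1/4} (6/5)(x + y²) dx dy = (6/5)(1/128 + 1/768) = 7/640 ≈ 0.0109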
 Marginal Probability density function
The marginal probability density functions of X and Y,
denoted by fX(x) and fY(y), respectively, are given by
fX(x) = ∫_{-∞}^{∞} f(x, y) dy  for -∞ < x < ∞
fY(y) = ∫_{-∞}^{∞} f(x, y) dx  for -∞ < y < ∞
[Figure: fX(x) integrates f(x,y) along the line of fixed x; fY(y) integrates along the line of fixed y]
 Example 5.4 (Ex. 5.3 Cont’)
The marginal pdf of X, which gives the probability distribution of
busy time for the drive-up facility without reference to the walk-up
window, is

fX(x) = ∫_{-∞}^{∞} f(x, y) dy = ∫_0^1 (6/5)(x + y²) dy = (6/5)x + 2/5
for 0 ≤ x ≤ 1, and 0 otherwise.
fY(y) = (6/5)y² + 3/5 for 0 ≤ y ≤ 1; 0 otherwise
Then
P(1/4 ≤ Y ≤ 3/4) = ∫_{1/4}^{3/4} fY(y) dy = 0.4625
 Example 5.5
A nut company markets cans of deluxe mixed nuts containing
almonds, cashews, and peanuts. Suppose the net weight of each can
is exactly 1 lb, but the weight contribution of each type of nut is
random. Because the three weights sum to 1, a joint probability
model for any two gives all necessary information about the weight
of the third type. Let X = the weight of almonds in a selected can
and Y = the weight of cashews. The joint pdf for (X,Y) is
f(x, y) = 24xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1; 0 otherwise
 Example 5.5 (Cont’)
f(x, y) = 24xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1; 0 otherwise
1: f(x,y) ≥ 0
2: ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dy dx = ∬_D f(x, y) dy dx = ∫_0^1 {∫_0^{1-x} 24xy dy} dx = ∫_0^1 12x(1 - x)² dx = 1
[Figure: D is the triangle with vertices (0,0), (1,0), (0,1); the vertical line through x meets the hypotenuse at (x, 1-x)]
 Example 5.5 (Cont’)
Let the two types of nuts together make up at most 50%
of the can; then A = {(x,y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 0.5}
P((X, Y) ∈ A) = ∬_A f(x, y) dy dx = ∫_0^{0.5} {∫_0^{0.5-x} 24xy dy} dx
= ∫_0^{0.5} 12x(0.5 - x)² dx = 12(1/32 - 1/24 + 1/64) = 1/16 = 0.0625
[Figure: A is the region below the line x + y = 0.5 inside the triangle with vertices (0,0), (1,0), (0,1)]
 Example 5.5 (Cont’)
The marginal pdf for almonds is obtained by
holding X fixed at x and integrating f(x,y) along
the vertical line through x:
fX(x) = ∫_{-∞}^{∞} f(x, y) dy = ∫_0^{1-x} 24xy dy = 12x(1 - x)² for 0 ≤ x ≤ 1; 0 otherwise
 Independent Random Variables
Two random variables X and Y are said to be
independent if for every pair of x and y values,
p(x, y) = pX(x) · pY(y)  when X and Y are discrete
f(x, y) = fX(x) · fY(y)  when X and Y are continuous
Otherwise, X and Y are said to be dependent.
Namely, two variables are independent if their joint pmf or pdf is the product of
the two marginal pmf’s or pdf’s.
 Example 5.6
In the insurance situation of Examples 5.1 and 5.2,
p(100,100) = 0.10 ≠ (0.5)(0.25) = pX(100) · pY(100)
p(x,y)  | y = 0 | y = 100 | y = 200
x = 100 | 0.20  | 0.10    | 0.20
x = 250 | 0.05  | 0.15    | 0.30
So, X and Y are not independent.
 Example 5.7 (Ex. 5.5 Cont’)
Because f(x,y) has the form of a product, X and Y
would appear to be independent. However, although
fX(x) = ∫_0^{1-x} 24xy dy = 12x(1 - x)²  and, by symmetry,  fY(y) = ∫_0^{1-y} 24xy dx = 12y(1 - y)²,
we have fX(3/4) = fY(3/4) = 9/16, yet f(3/4, 3/4) = 0 ≠ (9/16)(9/16),
so X and Y are not independent: the region of positive density is not a rectangle.
 Example 5.8
Suppose that the lifetimes of two components are independent of
one another and that the first lifetime, X1, has an exponential
distribution with parameter λ1 whereas the second, X2, has an
exponential distribution with parameter λ2. Then the joint pdf is
f(x1, x2) = fX1(x1) · fX2(x2) = λ1 λ2 e^{-λ1 x1 - λ2 x2} for x1 > 0, x2 > 0; 0 otherwise
Let λ1 =1/1000 and λ2=1/1200. So that the expected lifetimes are
1000 and 1200 hours, respectively. The probability that both
component lifetimes are at least 1500 hours is
P(1500 ≤ X1, 1500 ≤ X2) = P(1500 ≤ X1) · P(1500 ≤ X2)
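Evaluating the two factors with the exponential survival function P(X > t) = e^{-λt} (a worked step added here for illustration):
P(1500 ≤ X1) · P(1500 ≤ X2) = e^{-1500/1000} · e^{-1500/1200} = e^{-1.5} · e^{-1.25} = e^{-2.75} ≈ 0.0639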
 More than Two Random Variables
If X1, X2, …, Xn are all discrete rv’s, the joint pmf of the
variables is the function
p(x1, x2, …, xn) = P(X1 = x1, X2 = x2, …, Xn = xn)
If the variables are continuous, the joint pdf of X1, X2,
…, Xn is the function f(x1, x2, …, xn) such that for any n
intervals [a1, b1], …, [an, bn],
P(a1 ≤ X1 ≤ b1, …, an ≤ Xn ≤ bn) = ∫_{a1}^{b1} … ∫_{an}^{bn} f(x1, …, xn) dxn … dx1
 Independent
The random variables X1, X2, …, Xn are said to be
independent if for every subset Xi1, Xi2, …, Xik of the
variables, the joint pmf or pdf of the subset is equal to
the product of the marginal pmf’s or pdf’s.
 Multinomial Experiment
An experiment consisting of n independent and identical trials, in
which each trial can result in any one of r possible outcomes. Let
pi=P(Outcome i on any particular trial), and define random
variables by Xi=the number of trials resulting in outcome i
(i=1,…,r). The joint pmf of X1,…,Xr is called the multinomial
distribution.
p(x1, …, xr) = n! / ((x1!)(x2!)…(xr!)) · p1^{x1} … pr^{xr} for xi = 0, 1, 2, … with x1 + x2 + … + xr = n; 0 otherwise
Note: the case r=2 gives the binomial distribution.
 Example 5.9
If the allele of each of ten independently obtained pea
sections is determined and p1=P(AA), p2=P(Aa),
p3=P(aa), X1= number of AA’s, X2=number of Aa’s and
X3=number of aa’s, then
p(x1, x2, x3) = 10! / ((x1!)(x2!)(x3!)) · p1^{x1} p2^{x2} p3^{x3} for xi = 0, 1, … and x1 + x2 + x3 = 10
If p1=p3=0.25, p2=0.5, then
P(X1 = 2, X2 = 5, X3 = 3) = p(2,5,3) = 0.0769
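The value can be checked by direct arithmetic (a worked step added for illustration, not shown in the slides):
p(2,5,3) = 10!/(2!·5!·3!) · (0.25)²(0.5)⁵(0.25)³ = 2520 · (1/32768) ≈ 0.0769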
 Example 5.10
When a certain method is used to collect a fixed volume of
rock samples in a region, there are four resulting rock types.
Let X1, X2, and X3 denote the proportion by volume of rock
types 1, 2 and 3 in a randomly selected sample. If the joint pdf
of X1,X2 and X3 is
f(x1, x2, x3) = k·x1·x2·(1 - x3) for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, 0 ≤ x3 ≤ 1, x1 + x2 + x3 ≤ 1; 0 otherwise
then ∭_{D1} f(x1, x2, x3) dx3 dx2 dx1 = 1, with D1: -∞ < xi < ∞, i = 1, 2, 3, gives k = 144, and
∭_{D2} f(x1, x2, x3) dx3 dx2 dx1 = 0.6066, with D2: x1 + x2 ≤ 0.5
 Example 5.11
If X1, …, Xn represent the lifetimes of n components, the
components operate independently of one another, and
each lifetime is exponentially distributed with
parameter λ, then
f(x1, x2, …, xn) = (λe^{-λx1})(λe^{-λx2})…(λe^{-λxn}) = λ^n e^{-λ∑xi} for x1 ≥ 0, x2 ≥ 0, …, xn ≥ 0; 0 otherwise
 Example 5.11 (Cont’)
If these n components constitute a system that will fail
as soon as a single component fails, then the probability
that the system lasts past time t is
P(X1 > t, X2 > t, …, Xn > t) = ∫_t^∞ … ∫_t^∞ f(x1, x2, …, xn) dx1 … dxn
= (∫_t^∞ λe^{-λx1} dx1) … (∫_t^∞ λe^{-λxn} dxn) = (e^{-λt})^n = e^{-nλt}
therefore,
P(system lifetime ≤ t) = 1 - e^{-nλt}, for t ≥ 0
 Conditional Distribution
Let X and Y be two continuous rv’s with joint pdf f(x,y) and
marginal X pdf fX(x). Then for any X values x for which
fX(x)>0, the conditional probability density function of Y
given that X=x is
fY|X(y | x) = f(x, y) / fX(x),  -∞ < y < ∞
If X and Y are discrete, then
pY|X(y | x) = p(x, y) / pX(x)
is the conditional probability mass function of Y when X=x.
 Example 5.12 (Ex.5.3 Cont’)
X= the proportion of time that a bank’s drive-up facility is busy
and Y=the analogous proportion for the walk-up window. The
conditional pdf of Y given that X=0.8 is
fY|X(y | 0.8) = f(0.8, y) / fX(0.8) = 1.2(0.8 + y²) / (1.2(0.8) + 0.4) = (1/34)(24 + 30y²),  0 < y < 1
The probability that the walk-up facility is busy at most half the
time given that X=0.8 is then
P(Y ≤ 0.5 | X = 0.8) = ∫_{-∞}^{0.5} fY|X(y | 0.8) dy = ∫_0^{0.5} (1/34)(24 + 30y²) dy = 0.39
 Homework
Ex. 9, Ex.12, Ex.18, Ex.19
5.2 Expected Values, Covariance, and Correlation
 The Expected Value of a function h(x,y)
Let X and Y be jointly distributed rv’s with pmf p(x,y) or pdf
f(x,y) according to whether the variables are discrete or
continuous. Then the expected value of a function h(X,Y),
denoted by E[h(X,Y)] or μ_{h(X,Y)}, is given by
E[h(X, Y)] = ∑_x ∑_y h(x, y) · p(x, y)  if X and Y are discrete
E[h(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} h(x, y) · f(x, y) dx dy  if X and Y are continuous
 Example 5.13
Five friends have purchased tickets to a certain concert. If the
tickets are for seats 1-5 in a particular row and the tickets are
randomly distributed among the five, what is the expected
number of seats separating any particular two of the five?
1

p ( x, y )   20
 0
x  1,...,5; y  1,...,5; x  y
otherwise
The number of seats separating the two individuals is
h(X,Y)=|X-Y|-1
 Example 5.13 (Cont’)
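h(x,y) | y = 1 | y = 2 | y = 3 | y = 4 | y = 5
x = 1  |  --   |   0   |   1   |   2   |   3
x = 2  |   0   |  --   |   0   |   1   |   2
x = 3  |   1   |   0   |  --   |   0   |   1
x = 4  |   2   |   1   |   0   |  --   |   0
x = 5  |   3   |   2   |   1   |   0   |  --
E[h(X, Y)] = ∑∑_{(x,y)} h(x, y) · p(x, y) = ∑_{x=1}^{5} ∑_{y=1, y≠x}^{5} (|x - y| - 1) · (1/20) = 1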
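One way to confirm the value, sketched as a count over the 20 ordered pairs (an added check, not from the slides):
8 pairs have |x - y| = 1 (h = 0), 6 have |x - y| = 2 (h = 1), 4 have |x - y| = 3 (h = 2), and 2 have |x - y| = 4 (h = 3), so
E[h(X, Y)] = (8·0 + 6·1 + 4·2 + 2·3)/20 = 20/20 = 1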
 Example 5.14
In Example 5.5, the joint pdf of the amount X of almonds and
amount Y of cashews in a 1-lb can of nuts was
f(x, y) = 24xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1; 0 otherwise
If 1 lb of almonds costs the company \$1.00, 1 lb of cashews
costs \$1.50, and 1 lb of peanuts costs \$0.50, then the total cost
of the contents of a can is
h(X,Y)=(1)X+(1.5)Y+(0.5)(1-X-Y)=0.5+0.5X+Y
 Example 5.14 (Cont’)
The expected total cost is
E[h(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} h(x, y) · f(x, y) dy dx = ∫_0^1 ∫_0^{1-x} (0.5 + 0.5x + y) · 24xy dy dx = \$1.10
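A quick cross-check by linearity of expectation (a sketch; it uses μX = μY = 2/5 for this pdf, the value quoted in Example 5.16):
E[h(X, Y)] = 0.5 + 0.5·E(X) + E(Y) = 0.5 + 0.5(2/5) + 2/5 = 1.10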
Note: The method of computing E[h(X1,…, Xn)], the expected value of a function
h(X1, …, Xn) of n random variables is similar to that for two random variables.
 Covariance
The Covariance between two rv’s X and Y is
Cov(X, Y) = E[(X - μX)(Y - μY)]
= ∑_x ∑_y (x - μX)(y - μY) p(x, y)  if X, Y are discrete
= ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x - μX)(y - μY) f(x, y) dx dy  if X, Y are continuous
 Illustrates the different possibilities.
[Figure: three scatter plots of ten equally likely points, p(x, y) = 1/10, with the quadrants about (μX, μY) marked + or - : (a) positive covariance; (b) negative covariance; (c) covariance near zero]
 Example 5.15
The joint and marginal pmf’s for X = automobile policy
deductible amount and Y = homeowner policy deductible amount
in Example 5.1 were
p(x,y)  | y = 0 | y = 100 | y = 200
x = 100 | .20   | .10     | .20
x = 250 | .05   | .15     | .30

x     | 100 | 250
pX(x) | .5  | .5

y     | 0   | 100 | 200
pY(y) | .25 | .25 | .5
from which μX = ∑ x·pX(x) = 175 and μY = 125. Therefore
Cov(X, Y) = ∑∑_{(x,y)} (x - 175)(y - 125) p(x, y)
= (100 - 175)(0 - 125)(0.2) + … + (250 - 175)(200 - 125)(0.3) = 1875
 Proposition
Cov(X, Y) = E(XY) - μX · μY
Note:
Cov(X, X) = E(X²) - μX² = V(X)
 Example 5.16 (Ex. 5.5 Cont’)
The joint and marginal pdf’s of X = amount of almonds
and Y = amount of cashews were
f(x, y) = 24xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1; 0 otherwise
 Example 5.16 (Cont’)
fX(x) = 12x(1 - x)² for 0 ≤ x ≤ 1; 0 otherwise
fY(y) can be obtained through replacing x by y in fX(x). It is
easily verified that μX = μY = 2/5, and
E(XY) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy f(x, y) dy dx = ∫_0^1 ∫_0^{1-x} xy · 24xy dy dx = 8 ∫_0^1 x²(1 - x)³ dx = 2/15
Thus Cov(X,Y) = 2/15 - (2/5)² = 2/15 - 4/25 = -2/75. A negative
covariance is reasonable here because more almonds in the can
implies fewer cashews.
 Correlation
The correlation coefficient of X and Y, denoted by
Corr(X,Y), ρX,Y or just ρ, is defined by
ρX,Y = Cov(X, Y) / (σX · σY)
It is the normalized version of Cov(X,Y).
 Example 5.17
It is easily verified that in the insurance problem of
Example 5.15, σX = 75 and σY = 82.92. This gives
ρ = 1875 / ((75)(82.92)) = 0.301
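The two standard deviations can be verified from the marginal pmf’s of Example 5.15 (a worked check added for illustration):
V(X) = (100 - 175)²(0.5) + (250 - 175)²(0.5) = 5625, so σX = 75
V(Y) = E(Y²) - μY² = [0²(0.25) + 100²(0.25) + 200²(0.5)] - 125² = 22500 - 15625 = 6875, so σY ≈ 82.92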
 Proposition
1. If a and c are either both positive or both negative
Corr(aX+b, cY+d) = Corr(X,Y)
2. For any two rv’s X and Y, -1 ≤ Corr(X,Y) ≤ 1.
3. If X and Y are independent, then ρ = 0, but ρ = 0 does
not imply independence.
4. ρ = 1 or –1 iff Y = aX+b for some numbers a and b
with a ≠ 0.
 Example 5.18
Let X and Y be discrete rv’s with joint pmf
1
( x, y)  (4,1), (4, 1), (2, 2)(2, 2)

p( x, y)   4

otherwise
0
It is evident from the figure that the value of X is completely determined by
the value of Y and vice versa, so the two variables are completely dependent.
However, by symmetry μX = μY = 0 and E(XY) = (-4)1/4 + (-4)1/4 + (4)1/4 +
(4)1/4 = 0, so Cov(X,Y) = E(XY) - μX μY = 0 and thus ρXY = 0.
Although there is perfect dependence, there is also complete absence of any
linear relationship!
 Another Example
X and Y are uniformly distributed on the unit disk:
f(x, y) = 1/π for x² + y² ≤ 1; 0 otherwise
Obviously, X and Y are dependent.
However, we have
Cov(X, Y) = 0
 Homework
Ex. 24, Ex. 26, Ex. 33, Ex. 35
5.3 Statistics and Their Distributions
 Example 5.19
Given a Weibull population with α = 2, β = 5:
mean μ = 4.4311, median μ̃ = 4.1628, standard deviation σ = 2.316
[Figure: the Weibull density curve f(x) for 0 ≤ x ≤ 15]
 Example 5.19 (Cont’)
Obs | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6
1   | 6.1171   | 5.07611  | 3.46710  | 1.55601  | 3.12372  | 8.93795
2   | 4.1600   | 6.79279  | 2.71938  | 4.56941  | 6.09685  | 3.92487
3   | 3.1950   | 4.43259  | 5.88129  | 4.79870  | 3.41181  | 8.76202
4   | 0.6694   | 8.55752  | 5.14915  | 2.49795  | 1.65409  | 7.05569
5   | 1.8552   | 6.82487  | 4.99635  | 2.33267  | 2.29512  | 2.30932
6   | 5.2316   | 7.39958  | 5.86887  | 4.01295  | 2.12583  | 5.94195
7   | 2.7609   | 2.14755  | 6.05918  | 9.08845  | 3.20938  | 6.74166
8   | 10.2185  | 8.50628  | 1.80119  | 3.25728  | 3.23209  | 1.75486
9   | 5.2438   | 5.49510  | 4.21994  | 3.70132  | 6.84426  | 4.91827
10  | 4.5590   | 4.04525  | 2.12934  | 5.50134  | 4.20694  | 7.26081
 Example 5.19 (Cont’)
Sample             | 1     | 2     | 3     | 4     | 5     | 6
Mean               | 4.401 | 5.928 | 4.229 | 4.132 | 3.620 | 5.761
Median             | 4.360 | 6.144 | 4.608 | 3.857 | 3.221 | 6.342
Standard Deviation | 2.642 | 2.062 | 1.611 | 2.124 | 1.678 | 2.496
[Diagram: Population → Sample 1, Sample 2, …, Sample k → a function of the sample observations → quantity #1, quantity #2, …, quantity #k (each a value of the statistic)]
 Statistic
A statistic is any quantity whose value can be
calculated from sample data (with a function).
 Prior to obtaining data, there is uncertainty as to what value of
any particular statistic will result. Therefore, a statistic is a
random variable. A statistic will be denoted by an uppercase
letter; a lowercase letter is used to represent the calculated or
observed value of the statistic.
 The probability distribution of a statistic is sometimes referred to
as its sampling distribution. It describes how the statistic varies
in value across all samples that might be selected.
 The probability distribution of any particular
statistic depends on
1. The population distribution, e.g. the normal, uniform,
etc. , and the corresponding parameters
2. The sample size n (refer to Ex. 5.20 & 5.30)
3. The method of sampling, e.g. sampling with
replacement or without replacement
 Example
Consider selecting a sample of size n = 2 from a
population consisting of just the three values 1, 5, and
10, and suppose that the statistic of interest is the
sample variance.
 If sampling is done “with replacement”, then S2 = 0
will result if X1 = X2.
 If sampling is done “without replacement”, then
S2 can not equal 0.
 Random Sample
The rv’s X1, X2,…, Xn are said to form a (simple) random
sample of size n if
1. The Xi’s are independent rv’s.
2. Every Xi has the same probability distribution.
When conditions 1 and 2 are satisfied, we say that the Xi’s
are independent and identically distributed (i.i.d)
Note: Random sampling is one of the most commonly used sampling methods in practice.
 Random Sample
 Sampling with replacement or from an infinite population is
random sampling.
 Sampling without replacement from a finite population is
generally considered not random sampling. However, if the
sample size n is much smaller than the population size N (n/N ≤
0.05), it is approximately random sampling.
Note: The virtue of random sampling method is that the probability
distribution of any statistic can be more easily obtained than for any other
sampling method.
 Deriving the Sampling Distribution of a Statistic
 Method #1: Calculations based on probability rules
e.g. Example 5.20 & 5.21
 Method #2:
Carrying out a simulation experiment
e.g. Example 5.22 & 5.23
 Example 5.20
A large automobile service center charges \$40, \$45, and \$50 for a tune-up of four-, six-, and eight-cylinder cars, respectively. If 20% of its
tune-ups are done on four-cylinder cars, 30% on six-cylinder cars, and
50% on eight-cylinder cars, then the probability distribution of revenue
from a single randomly selected tune-up is given by
x    | 40  | 45  | 50
p(x) | 0.2 | 0.3 | 0.5
μ = 46.5, σ² = 15.25
Suppose on a particular day only two servicing jobs involve tune-ups.
Let X1 = the revenue from the first tune-up &
X2 = the revenue from the second,
which constitutes a random sample with the above probability distribution.
 Example 5.20 (Cont’)
x1 | x2 | p(x1,x2) | x̄    | s²
40 | 40 | 0.04     | 40   | 0
40 | 45 | 0.06     | 42.5 | 12.5
40 | 50 | 0.10     | 45   | 50
45 | 40 | 0.06     | 42.5 | 12.5
45 | 45 | 0.09     | 45   | 0
45 | 50 | 0.15     | 47.5 | 12.5
50 | 40 | 0.10     | 45   | 50
50 | 45 | 0.15     | 47.5 | 12.5
50 | 50 | 0.25     | 50   | 0

x̄      | 40   | 42.5 | 45   | 47.5 | 50
pX̄(x̄) | 0.04 | 0.12 | 0.29 | 0.30 | 0.25
μX̄ = E(X̄) = 46.5 = μ,  σX̄² = V(X̄) = 7.625 = σ²/2

s²      | 0    | 12.5 | 50
pS²(s²) | 0.38 | 0.42 | 0.20
μS² = E(S²) = 15.25 = σ²
(The population distribution is known.)
 Example 5.20 (Cont’)
n = 2:
x̄      | 40   | 42.5 | 45   | 47.5 | 50
pX̄(x̄) | 0.04 | 0.12 | 0.29 | 0.30 | 0.25
n = 4:
x̄      | 40     | 41.25  | 42.5   | 43.75  | 45     | 46.25  | 47.5   | 48.75  | 50
pX̄(x̄) | 0.0016 | 0.0096 | 0.0376 | 0.0936 | 0.1761 | 0.2340 | 0.2350 | 0.1500 | 0.0625
…
 Example 5.21
The time that it takes to serve a customer at the cash register in a
minimarket is a random variable having an exponential distribution
with parameter λ. Suppose X1 and X2 are service times for two different
customers, assumed independent of each other. Consider the total
service time To = X1 + X2 for the two customers, also a statistic. What is
the pdf of To? The cdf of To is, for t ≥ 0,
F_To(t) = P(X1 + X2 ≤ t) = ∬_{{(x1,x2): x1+x2 ≤ t}} f(x1, x2) dx1 dx2 = ∫_0^t ∫_0^{t-x1} λe^{-λx1} · λe^{-λx2} dx2 dx1
= ∫_0^t [λe^{-λx1} - λe^{-λt}] dx1 = 1 - e^{-λt} - λt·e^{-λt}
[Figure: the region of integration lies below the line x1 + x2 = t in the first quadrant; the vertical line through x1 meets that line at (x1, t - x1)]
 Example 5.21 (Cont’)
The pdf of To is obtained by differentiating F_To(t):
f_To(t) = λ² t e^{-λt} for t ≥ 0; 0 for t < 0
This is a gamma pdf (α = 2 and β = 1/λ).
The pdf of X̄ = To/2 is obtained from the relation
{X̄ ≤ x̄} iff {To ≤ 2x̄} as
fX̄(x̄) = 4λ² x̄ e^{-2λx̄} for x̄ ≥ 0; 0 for x̄ < 0
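The two differentiation steps, written out as a sketch (added for illustration; the slides state only the results):
d/dt [1 - e^{-λt} - λt·e^{-λt}] = λe^{-λt} - λe^{-λt} + λ² t e^{-λt} = λ² t e^{-λt}
F_X̄(x̄) = F_To(2x̄), so fX̄(x̄) = 2 · f_To(2x̄) = 2 · λ²(2x̄) e^{-2λx̄} = 4λ² x̄ e^{-2λx̄}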
 Simulation Experiments




This method is usually used when a derivation via probability
rules is too difficult or complicated to be carried out. Such an
experiment is virtually always done with the aid of a computer.
The following characteristics of an experiment must be
specified:
1. The statistic of interest (e.g. sample mean, S, etc.)
2. The population distribution (normal with μ = 100 and σ = 15,
   uniform with lower limit A = 5 and upper limit B = 10, etc.)
3. The sample size n (e.g., n = 10 or n = 50)
4. The number of replications k (e.g., k = 500 or 1000); the actual
   sampling distribution emerges as k → ∞
 Example 5.23
Consider a simulation experiment in which the population
distribution is quite skewed. The figure shows the density curve for
a certain type of electronic control (actually a lognormal
distribution with E(ln(X)) = 3 and V(ln(X)) = .16, i.e. standard deviation .4).
E(X) = μ = 21.7584, V(X) = σ² = 82.1449
[Figure: the lognormal density curve f(x) for 0 ≤ x ≤ 75]
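The stated mean and variance can be reproduced from the standard lognormal moment formulas (a check added for illustration; it assumes V(ln(X)) = 0.16, the value consistent with the stated E(X)):
E(X) = e^{3 + 0.16/2} = e^{3.08} ≈ 21.76
V(X) = e^{2(3) + 0.16} · (e^{0.16} - 1) = e^{6.16} · (e^{0.16} - 1) ≈ 82.14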
5.3 Statistics and Their Distributions
 Example 5.23 (Cont’)
1. The center of the sampling distribution remains at the population mean.
2. As n increases, the sampling distribution becomes:
 Less skewed (“more normal”)
 More concentrated (“smaller variance”)
 Homework
Ex.38, Ex.41
5.4 The Distribution of the Sample Mean
 Proposition
Let X1, X2, …, Xn be a random sample (i.i.d. rv’s) from a
distribution with mean value μ and standard deviation σ.

Then
E(X̄) = μX̄ = μ
V(X̄) = σX̄² = σ²/n  and  σX̄ = σ/√n
In addition, with To = X1 + … + Xn (the sample total),
E(To) = nμ, V(To) = nσ², and σTo = √n·σ
Refer to 5.5 for the proof!
5.4 The Distribution of the Sample Mean
 Example 5.24
In a notched tensile fatigue test on a titanium specimen, the expected
number of cycles to first acoustic emission (used to indicate crack
initiation) is μ = 28,000, and the standard deviation of the number of
cycles is σ = 5000.
Let X1, X2, …, X25 be a random sample of size 25, where each Xi is the
number of cycles on a different randomly selected specimen. Then
E(X̄) = μ = 28,000,  E(To) = nμ = 25(28,000) = 700,000
The standard deviations of X̄ and To are
σX̄ = σ/√n = 5000/√25 = 1000
σTo = √n·σ = √25 (5000) = 25,000
 Proposition
Let X1, X2, …, Xn be a random sample from a normal
distribution with mean μ and standard deviation σ. Then
for any n, X̄ is normally distributed (with mean μ and
standard deviation σ/√n), as is To (with mean nμ and
standard deviation √n·σ).
 Example 5.25
The time that it takes a randomly selected rat of a certain
subspecies to find its way through a maze is a normally
distributed rv with μ = 1.5 min and σ = .35 min. Suppose five rats
are selected. Let X1, X2, …, X5 denote their times in the maze.
Assume the Xi’s to be a random sample from this normal
distribution.
 Q #1: What is the probability that the total time To =
X1+X2+…+X5 for the five is between 6 and 8 min?
 Q #2: Determine the probability that the sample average time X̄
is at most 2.0 min.
 Example 5.25 (Cont’)
A #1: To has a normal distribution with μTo= nμ = 5(1.5) = 7.5
min and variance σTo2 =nσ2 = 5(0.1225) =0.6125, so σTo = 0.783
min. To standardize To, subtract μTo and divide by σTo:
P(6 ≤ To ≤ 8) = P((6 - 7.5)/0.783 ≤ Z ≤ (8 - 7.5)/0.783)
= P(-1.92 ≤ Z ≤ 0.64) = Φ(0.64) - Φ(-1.92) = 0.7115
A #2:
E(X̄) = μ = 1.5
σX̄ = σ/√n = 0.35/√5 = 0.1565
P(X̄ ≤ 2.0) = P(Z ≤ (2.0 - 1.5)/0.1565) = P(Z ≤ 3.19) = Φ(3.19) = 0.9993
Let X1, X2, …, Xn be a random sample from a distribution
(may or may not be normal) with mean μ and variance σ2.
Then if n is sufficiently large, X̄ has approximately a
normal distribution with
μX̄ = μ,  σX̄² = σ²/n
To also has approximately a normal distribution with
μTo = nμ,  σTo² = nσ²
The larger the value of n, the better the approximation.
Usually, if n > 30, the Central Limit Theorem can be used.
 An Example for Uniform Distribution
 An Example for Triangular Distribution
 Example 5.26
When a batch of a certain chemical product is prepared, the amount
of a particular impurity in the batch is a random variable with mean
value 4.0g and standard deviation 1.5g. If 50 batches are
independently prepared, what is the (approximate) probability that
the sample average amount of impurity X̄ is between 3.5 and 3.8g?
Here n = 50 is large enough for the CLT to be applicable. X̄ then has
approximately a normal distribution with mean value μX̄ = 4.0 and
σX̄ = 1.5/√50 = 0.2121, so
P(3.5 ≤ X̄ ≤ 3.8) ≈ P((3.5 - 4.0)/0.2121 ≤ Z ≤ (3.8 - 4.0)/0.2121) = Φ(-0.94) - Φ(-2.36) = 0.1645
 Example 5.27
A certain consumer organization customarily reports the number
of major defects for each new automobile that it tests. Suppose
the number of such defects for a certain model is a random
variable with mean value 3.2 and standard deviation 2.4. Among
100 randomly selected cars of this model, how likely is it that the
sample average number of major defects exceeds 4?
Let Xi denote the number of major defects for the ith car in the
random sample. Notice that Xi is a discrete rv, but the CLT is
applicable whether the variable of interest is discrete or
continuous.
 Example 5.27 (Cont’)
Using μX̄ = 3.2 and σX̄ = 2.4/√100 = 0.24,
P(X̄ > 4) ≈ P(Z > (4 - 3.2)/0.24) = 1 - Φ(3.33) = 0.0004
 Other Applications of the CLT
The CLT can be used to justify the normal approximation to the
binomial distribution discussed in Chapter 4. Recall that a
binomial variable X is the number of successes in a binomial
experiment consisting of n independent success/failure trials with
p = P(S) for any particular trial. Define new rv’s X1, X2, …, Xn by
1 if the ith trial results in a success
Xi =
(i = 1, …, n)
0 if the ith trial results in a failure
77
School of Software
5.4 The Distribution of the Sample Mean
 Because the trials are independent and P(S) is constant from trial to
trial, the Xi’s are i.i.d. (a random sample from a Bernoulli distribution).
 The CLT then implies that if n is sufficiently large, both the sum and the
average of the Xi’s have approximately normal distributions. The binomial rv
is X = X1 + … + Xn, and X/n is the sample mean of the Xi’s. That is, both X
and X/n are approximately normal when n is large.
 The necessary sample size for this approximation depends on the value of p:
when p is close to .5, the distribution of Xi is reasonably symmetric, whereas
it is quite skewed when p is near 0 or 1.
[Figure: distribution of Xi for (a) p = 0.4 (reasonably symmetric) and (b) p = 0.1 (very skewed)]
Rule: use np ≥ 10 and n(1 − p) ≥ 10, rather than n > 30.
 Proposition
Let X1, X2, …, Xn be a random sample from a distribution for
which only positive values are possible [P(Xi > 0) = 1]. Then if n
is sufficiently large, the product Y = X1X2 · … · Xn has
approximately a lognormal distribution.
ln(Y)=ln(X1)+ ln(X2)+…+ ln(Xn)
Supplement: Law of large numbers
 Chebyshev's Inequality
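Let X be a random variable (continuous or discrete). Then
$P(|X - E(X)| \ge \varepsilon) \le \frac{D(X)}{\varepsilon^2}, \quad \varepsilon > 0$
Proof: Let $B = \{X - E(X) \ge \varepsilon\} \cup \{X - E(X) \le -\varepsilon\}$. Then
$P(|X - E(X)| \ge \varepsilon) = P\left(\frac{|X - E(X)|}{\varepsilon} \ge 1\right) = P\left(\frac{(X - E(X))^2}{\varepsilon^2} \ge 1\right)$
$= \int_{B} p(x)\,dx \le \int_{B} \frac{(x - E(X))^2}{\varepsilon^2}\, p(x)\,dx \le \int_{-\infty}^{\infty} \frac{(x - E(X))^2}{\varepsilon^2}\, p(x)\,dx = \frac{D(X)}{\varepsilon^2}$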
 Khintchine law of large numbers
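Let X1, X2, ... be an infinite sequence of i.i.d. random variables with
finite expected value E(Xk) = µ < ∞ and variance D(Xk) = δ² < ∞. Then
$\lim_{n \to \infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| < \varepsilon\right) = 1, \quad \varepsilon > 0$
Proof: Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, so that $E(\bar{X}_n) = \mu$ and $D(\bar{X}_n) = \frac{\delta^2}{n}$.
According to Chebyshev's inequality,
$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| \ge \varepsilon\right) = P(|\bar{X}_n - E(\bar{X}_n)| \ge \varepsilon) \le \frac{D(\bar{X}_n)}{\varepsilon^2} = \frac{1}{n} \cdot \frac{\delta^2}{\varepsilon^2} \to 0 \quad (n \to \infty)$
$\lim_{n \to \infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| < \varepsilon\right) = 1 - \lim_{n \to \infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| \ge \varepsilon\right) = 1$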
 Bernoulli law of large numbers
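The empirical probability of success in a series of Bernoulli trials Ai
converges to the theoretical probability p.
$A_i = \begin{cases} 1, & A \text{ occurs on the } i\text{th trial} \\ 0, & \text{otherwise} \end{cases}$
with P(Ai = 1) = p and P(Ai = 0) = 1 − p.
Let n(A) be the number of replications on which A occurs. Then
$\frac{n(A)}{n} = \frac{1}{n}\sum_{i=1}^{n} A_i, \quad E\left(\frac{n(A)}{n}\right) = p, \quad D\left(\frac{n(A)}{n}\right) = \frac{1}{n^2} \cdot n\,p(1 - p) = \frac{p(1 - p)}{n}$
According to Chebyshev's inequality,
$P\left(\left|\frac{n(A)}{n} - p\right| \ge \varepsilon\right) \le \frac{1}{\varepsilon^2} \cdot \frac{p(1 - p)}{n} \to 0 \quad (n \to \infty)$
$\lim_{n \to \infty} P\left(\left|\frac{n(A)}{n} - p\right| < \varepsilon\right) = 1 - \lim_{n \to \infty} P\left(\left|\frac{n(A)}{n} - p\right| \ge \varepsilon\right) = 1$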
[Figure: relative frequency n(A)/n (vertical axis, from 0 to 1, with the level p marked) plotted against the number of experiments performed (horizontal axis: 1, 2, 3, …, 100, 101, …)]
5.4 The Distribution of the Sample Mean
 Homework
Ex. 48, Ex. 51, Ex. 55, Ex. 56
5.5 The Distribution of a Linear Combination
 Linear Combination
Given a collection of n random variables X1, …, Xn and
n numerical constants a1, …, an, the rv
$Y = a_1 X_1 + \cdots + a_n X_n = \sum_{i=1}^{n} a_i X_i$
is called a linear combination of the Xi’s.
Let X1, X2, …, Xn have mean values μ1, …, μn, respectively,
and variances σ1², …, σn², respectively.
 Whether or not the Xi’s are independent,
$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i) = \sum_{i=1}^{n} a_i \mu_i$
 If X1, X2, …, Xn are independent,
$V\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2 V(X_i) = \sum_{i=1}^{n} a_i^2 \sigma_i^2$
and
$\sigma_{a_1 X_1 + \cdots + a_n X_n} = \sqrt{a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2}$
 For any X1, X2, …, Xn,
$V\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \mathrm{Cov}(X_i, X_j)$
 Proof:
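$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i) = \sum_{i=1}^{n} a_i \mu_i$
For the result concerning expected values, suppose that the
Xi’s are continuous with joint pdf f(x1,…,xn). Then
$E\left(\sum_{i=1}^{n} a_i X_i\right) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left(\sum_{i=1}^{n} a_i x_i\right) f(x_1, \dots, x_n)\, dx_1 \cdots dx_n$
$= \sum_{i=1}^{n} a_i \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i\, f(x_1, \dots, x_n)\, dx_1 \cdots dx_n$
$= \sum_{i=1}^{n} a_i \int_{-\infty}^{\infty} x_i\, f_{X_i}(x_i)\, dx_i$
$= \sum_{i=1}^{n} a_i E(X_i)$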
 Proof:
$V\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \mathrm{Cov}(X_i, X_j)$
$V\left(\sum_{i=1}^{n} a_i X_i\right) = E\left\{\left[\sum_{i=1}^{n} a_i X_i - \sum_{i=1}^{n} a_i \mu_i\right]^2\right\}$
$= E\left\{\left[\sum_{i=1}^{n} a_i (X_i - \mu_i)\right]^2\right\}$
$= E\left\{\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j (X_i - \mu_i)(X_j - \mu_j)\right\}$
$= \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j E[(X_i - \mu_i)(X_j - \mu_j)]$
$= \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \mathrm{Cov}(X_i, X_j)$
When the Xi’s are independent, Cov(Xi, Xj) = 0 for i ≠ j, and
$V\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \mathrm{Cov}(X_i, X_j) = \sum_{i=1}^{n} a_i^2 V(X_i)$
 Example 5.28
Three grades of gasoline are priced at \$1.20, \$1.35, and \$1.50 per
gallon, respectively. Let X1, X2 and X3 denote the amounts of these
grades purchased (in gallons) on a particular day. Suppose the Xi’s are
independent with μ1 = 1000, μ2 = 500, μ3 = 300, σ1 = 100, σ2 = 80, and
σ3 = 50. The revenue from sales is Y = 1.2X1+1.35X2+1.5X3. Compute E(Y),
V(Y), and σY.
$E(Y) = 1.2\mu_1 + 1.35\mu_2 + 1.5\mu_3 = \$2325$
$V(Y) = (1.2)^2 \sigma_1^2 + (1.35)^2 \sigma_2^2 + (1.5)^2 \sigma_3^2 = 31{,}689$
$\sigma_Y = \sqrt{31{,}689} = \$178.01$
 Corollary (the difference between two rv’s)
E(X1-X2) = E(X1) - E(X2) and, if X1 and X2 are independent,
V(X1-X2) = V(X1)+V(X2).
 Example 5.29
A certain automobile manufacturer equips a particular model with
either a six-cylinder engine or a four-cylinder engine. Let X1 and X2 be
fuel efficiencies for independently and randomly selected six-cylinder
and four-cylinder cars, respectively. With μ1 = 22, μ2 = 26, σ1 = 1.2,
and σ2 = 1.5,
$E(X_1 - X_2) = \mu_1 - \mu_2 = 22 - 26 = -4$
$V(X_1 - X_2) = \sigma_1^2 + \sigma_2^2 = (1.2)^2 + (1.5)^2 = 3.69$
$\sigma_{X_1 - X_2} = \sqrt{3.69} = 1.92$
 Proposition
If X1, X2, …, Xn are independent, normally distributed rv’s
(with possibly different means and/or variances), then any linear
combination of the Xi’s also has a normal distribution.
 Example 5.30 (Ex. 5.28 Cont’)
The total revenue from the sale of the three grades of gasoline on
a particular day was Y = 1.2X1+1.35X2+1.5X3, and we calculated
μY = 2325 and σY = 178.01. If the Xi’s are normally distributed,
the probability that the revenue exceeds 2500 is
$P(Y > 2500) = P\left(Z > \frac{2500 - 2325}{178.01}\right) = P(Z > 0.98) = 1 - \Phi(0.98) = 0.1635$
 Homework
Ex. 58, Ex. 70, Ex. 73
```
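
The arithmetic in Examples 5.27 to 5.30 can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper names `linear_combination_moments` and `upper_tail` are hypothetical, and the normal probabilities come from Python's standard-library `statistics.NormalDist`. It assumes, as the examples do, that the Xi's are independent and that the normal approximation (Example 5.27) or exact normality (Example 5.30) applies.

```python
import math
from statistics import NormalDist


def linear_combination_moments(coeffs, means, sds):
    """Return (mean, sd) of Y = sum(a_i * X_i), assuming the X_i are independent."""
    mean = sum(a * m for a, m in zip(coeffs, means))
    # With independence the covariance terms vanish: V(Y) = sum(a_i^2 * sigma_i^2)
    variance = sum((a * s) ** 2 for a, s in zip(coeffs, sds))
    return mean, math.sqrt(variance)


def upper_tail(x, mean, sd):
    """P(N > x) for a normal variable N with the given mean and sd."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(x)


# Example 5.27: sample mean of n = 100 cars, mu = 3.2, sigma = 2.4 (CLT approximation)
p_defects = upper_tail(4, 3.2, 2.4 / math.sqrt(100))

# Example 5.28: revenue Y = 1.2*X1 + 1.35*X2 + 1.5*X3
mu_y, sd_y = linear_combination_moments(
    [1.2, 1.35, 1.5], [1000, 500, 300], [100, 80, 50]
)

# Example 5.29: difference X1 - X2 is a linear combination with coefficients (1, -1)
mu_d, sd_d = linear_combination_moments([1, -1], [22, 26], [1.2, 1.5])

# Example 5.30: P(Y > 2500) when the Xi's are normally distributed
p_revenue = upper_tail(2500, mu_y, sd_y)

print(f"Example 5.27: P(sample mean > 4) = {p_defects:.4f}")
print(f"Example 5.28: E(Y) = {mu_y:.2f}, sd(Y) = {sd_y:.2f}")
print(f"Example 5.29: E(X1 - X2) = {mu_d}, sd = {sd_d:.2f}")
print(f"Example 5.30: P(Y > 2500) = {p_revenue:.4f}")
```

The slides round z to two decimals before looking up Φ, so the probabilities computed here can differ from the slide values in the third or fourth decimal place.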