Transcript Lecture 02

2. Independence and Bernoulli Trials
Independence: Events A and B are independent if
$$P(AB) = P(A)P(B).$$  (2-1)
• It is easy to show that A, B independent implies that $A, \bar{B}$; $\bar{A}, B$; and $\bar{A}, \bar{B}$ are all independent pairs. For example, $B = (A \cup \bar{A})B = AB \cup \bar{A}B$ and $AB \cap \bar{A}B = \varnothing$, so that
$$P(B) = P(AB \cup \bar{A}B) = P(AB) + P(\bar{A}B) = P(A)P(B) + P(\bar{A}B)$$
or
$$P(\bar{A}B) = P(B) - P(A)P(B) = (1 - P(A))P(B) = P(\bar{A})P(B),$$
i.e., $\bar{A}$ and $B$ are independent events.
• If P(A) = 0, then since $AB \subset A$ always, we have
$$P(AB) \le P(A) = 0 \implies P(AB) = 0,$$
and (2-1) is always satisfied. Thus an event of zero probability is independent of every other event!
• Independent events obviously cannot be mutually exclusive, since $P(A) > 0$, $P(B) > 0$ and A, B independent implies $P(AB) > 0$. Thus if A and B are independent, the event AB cannot be the null set.
• More generally, a family of events $\{A_i\}$ is said to be independent if, for every finite subcollection $A_{i_1}, A_{i_2}, \ldots, A_{i_n}$, we have
$$P\left(\bigcap_{k=1}^{n} A_{i_k}\right) = \prod_{k=1}^{n} P(A_{i_k}).$$  (2-2)
• Let
$$A = A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n,$$  (2-3)
a union of n independent events. Then by De Morgan's law
$$\bar{A} = \bar{A}_1 \bar{A}_2 \cdots \bar{A}_n$$  (2-4)
and using their independence
$$P(\bar{A}) = P(\bar{A}_1 \bar{A}_2 \cdots \bar{A}_n) = \prod_{i=1}^{n} P(\bar{A}_i) = \prod_{i=1}^{n} (1 - P(A_i)).$$  (2-5)
Thus for any A as in (2-3)
$$P(A) = 1 - P(\bar{A}) = 1 - \prod_{i=1}^{n} (1 - P(A_i)),$$  (2-6)
a useful result.
Example 2.1: Three switches connected in parallel operate
independently. Each switch remains closed with probability
p. (a) Find the probability of receiving an input signal at the
output. (b) Find the probability that switch S1 is open given
that an input signal is received at the output.
Fig. 2.1: Switches $S_1$, $S_2$, $S_3$ connected in parallel between the input and the output.
Solution: a. Let $A_i$ = "Switch $S_i$ is closed." Then $P(A_i) = p$, $i = 1, 2, 3$. Since the switches operate independently, we have
$$P(A_i A_j) = P(A_i)P(A_j); \qquad P(A_1 A_2 A_3) = P(A_1)P(A_2)P(A_3).$$
Let R = "input signal is received at the output." For the event R to occur, either switch 1 or switch 2 or switch 3 must remain closed, i.e.,
$$R = A_1 \cup A_2 \cup A_3.$$  (2-7)
Using (2-3)-(2-6),
$$P(R) = P(A_1 \cup A_2 \cup A_3) = 1 - (1 - p)^3 = 3p - 3p^2 + p^3.$$  (2-8)
We can also derive (2-8) in a different manner. Since any event and its complement form a trivial partition, we can always write
$$P(R) = P(R \mid A_1)P(A_1) + P(R \mid \bar{A}_1)P(\bar{A}_1).$$  (2-9)
But $P(R \mid A_1) = 1$ and $P(R \mid \bar{A}_1) = P(A_2 \cup A_3) = 2p - p^2$, and using these in (2-9) we obtain
$$P(R) = p + (2p - p^2)(1 - p) = 3p - 3p^2 + p^3,$$  (2-10)
which agrees with (2-8).
Note that the events A1, A2, A3 do not form a partition, since
they are not mutually exclusive. Obviously any two or all
three switches can be closed (or open) simultaneously.
Moreover, $P(A_1) + P(A_2) + P(A_3) \neq 1$.
b. We need $P(\bar{A}_1 \mid R)$. From Bayes' theorem,
$$P(\bar{A}_1 \mid R) = \frac{P(R \mid \bar{A}_1)P(\bar{A}_1)}{P(R)} = \frac{(2p - p^2)(1 - p)}{3p - 3p^2 + p^3} = \frac{2 - 3p + p^2}{3 - 3p + p^2}.$$  (2-11)
Because of the symmetry of the switches, we also have
$$P(\bar{A}_1 \mid R) = P(\bar{A}_2 \mid R) = P(\bar{A}_3 \mid R).$$
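The answers of Example 2.1 can also be checked by brute force, since there are only $2^3$ switch configurations. The sketch below (an added illustration; the test value p = 0.3 and all names are assumptions, not from the lecture) enumerates the configurations and compares the results with (2-8) and (2-11).

    # Brute-force check of Example 2.1 for a sample value of p (illustrative addition).
    from itertools import product

    p = 0.3                 # arbitrary test value for the closing probability of a switch

    p_R = 0.0               # P(R): input signal received (at least one switch closed)
    p_A1bar_and_R = 0.0     # P(S1 open and signal received)

    for s1, s2, s3 in product([0, 1], repeat=3):      # 1 = closed, 0 = open
        prob = 1.0
        for s in (s1, s2, s3):
            prob *= p if s else (1.0 - p)             # independence of the switches
        if s1 or s2 or s3:                            # event R
            p_R += prob
            if not s1:                                # S1 open, within R
                p_A1bar_and_R += prob

    print(p_R, 3*p - 3*p**2 + p**3)                   # matches (2-8)
    print(p_A1bar_and_R / p_R,                        # P(S1 open | R), matches (2-11)
          (2*p - p**2) * (1 - p) / (3*p - 3*p**2 + p**3))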
Repeated Trials
Consider two independent experiments with associated probability models $(\Omega_1, F_1, P_1)$ and $(\Omega_2, F_2, P_2)$. Let $\xi_1 \in \Omega_1$, $\xi_2 \in \Omega_2$ represent elementary events. A joint performance of the two experiments produces an elementary event $\xi = (\xi_1, \xi_2)$. How do we assign an appropriate probability to this "combined event"? Towards this, consider the Cartesian product space $\Omega = \Omega_1 \times \Omega_2$ generated from $\Omega_1$ and $\Omega_2$ such that if $\xi_1 \in \Omega_1$ and $\xi_2 \in \Omega_2$, then every $\xi$ in $\Omega$ is an ordered pair of the form $\xi = (\xi_1, \xi_2)$. To arrive at a probability model we need to define the combined trio $(\Omega, F, P)$.
Suppose AF1 and B  F2. Then A  B is the set of all pairs
(, ), where   A and   B. Any such subset of 
appears to be a legitimate event for the combined
experiment. Let F denote the field composed of all such
subsets A  B together with their unions and compliments.
In this combined experiment, the probabilities of the events
A  2 and 1  B are such that
P( A  2 )  P1 ( A), P(1  B)  P2 ( B).
(2-12)
Moreover, the events A  2 and 1  B are independent for
any A  F1 and B  F2 . Since
( A  2 )  (1  B)  A  B,
we conclude using (2-12) that
(2-13)
8
P( A  B)  P( A  2 )  P(1  B)  P1 ( A) P2 ( B)
(2-14)
for all A  F1 and B  F2 . The assignment in (2-14) extends
to a unique probability measure P( P1  P2 ) on the sets in F
and defines the combined trio (, F, P).
Generalization: Given n experiments $\Omega_1, \Omega_2, \ldots, \Omega_n$ and their associated $F_i$ and $P_i$, $i = 1, \ldots, n$, let
$$\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$$  (2-15)
represent their Cartesian product, whose elementary events are the ordered n-tuples $\xi_1, \xi_2, \ldots, \xi_n$, where $\xi_i \in \Omega_i$. Events in this combined space are of the form
$$A_1 \times A_2 \times \cdots \times A_n,$$  (2-16)
where $A_i \in F_i$, and their unions and intersections.
If all these n experiments are independent, and $P_i(A_i)$ is the probability of the event $A_i$ in $F_i$, then as before
$$P(A_1 \times A_2 \times \cdots \times A_n) = P_1(A_1)\,P_2(A_2) \cdots P_n(A_n).$$  (2-17)
Example 2.2: An event A has probability p of occurring in a single trial. Find the probability that A occurs exactly k times, $k \le n$, in n trials.
Solution: Let $(\Omega, F, P)$ be the probability model for a single trial. The outcome of n experiments is an n-tuple
$$\xi = \{\xi_1, \xi_2, \ldots, \xi_n\} \in \Omega_0,$$  (2-18)
where every $\xi_i \in \Omega$ and $\Omega_0 = \Omega \times \Omega \times \cdots \times \Omega$ as in (2-15). The event A occurs at trial $\#i$ if $\xi_i \in A$. Suppose A occurs exactly k times in $\xi$.
Then k of the $\xi_i$ belong to A, say $\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_k}$, and the remaining $n - k$ are contained in its complement $\bar{A}$. Using (2-17), the probability of occurrence of such a $\xi$ is given by
$$P_0(\xi) = P(\{\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_k}, \ldots, \xi_{i_n}\}) = P(\{\xi_{i_1}\})P(\{\xi_{i_2}\}) \cdots P(\{\xi_{i_k}\}) \cdots P(\{\xi_{i_n}\})$$
$$= \underbrace{P(A)P(A) \cdots P(A)}_{k}\;\underbrace{P(\bar{A})P(\bar{A}) \cdots P(\bar{A})}_{n-k} = p^k q^{n-k}.$$  (2-19)
However, the k occurrences of A can occur in any particular locations inside $\xi$. Let $\xi_1, \xi_2, \ldots, \xi_N$ represent all such events in which A occurs exactly k times. Then
$$\text{"A occurs exactly k times in n trials"} = \xi_1 \cup \xi_2 \cup \cdots \cup \xi_N.$$  (2-20)
But all these $\xi_i$ are mutually exclusive and equiprobable.
Thus
$$P(\text{"A occurs exactly k times in n trials"}) = \sum_{i=1}^{N} P_0(\xi_i) = N\,P_0(\xi) = N p^k q^{n-k},$$  (2-21)
where we have used (2-19). Recall that, starting with n possible choices, the first object can be chosen in n different ways, and for every such choice the second one in $(n-1)$ ways, ..., and the k-th one in $(n-k+1)$ ways; this gives the total number of ordered choices of k objects out of n as $n(n-1)\cdots(n-k+1)$. But this count includes the $k!$ orderings among the k chosen objects, which are indistinguishable here. As a result
$$N = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!} = \binom{n}{k}$$  (2-22)
represents the number of combinations, or choices, of n objects taken k at a time. Using (2-22) in (2-21), we get
$$P_n(k) = P(\text{"A occurs exactly k times in n trials"}) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, 2, \ldots, n,$$  (2-23)
a formula due to Bernoulli.
Independent repeated experiments of this nature, where the outcome is either a "success" ($\xi \in A$) or a "failure" ($\xi \in \bar{A}$), are characterized as Bernoulli trials, and the probability of k successes in n trials is given by (2-23), where p represents the probability of "success" in any one trial.
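As an aside (not part of the original notes), formula (2-23) translates directly into a few lines of code; math.comb supplies the binomial coefficient, and the function name bernoulli_pmf is an illustrative choice.

    # Sketch of the Bernoulli formula (2-23); the function name is illustrative.
    from math import comb

    def bernoulli_pmf(n, k, p):
        """P_n(k) = C(n, k) p^k q^(n-k): probability of exactly k successes in n trials."""
        q = 1.0 - p
        return comb(n, k) * p**k * q**(n - k)

    # Sanity check: the probabilities over k = 0, ..., n sum to 1, as in (2-26)-(2-27).
    assert abs(sum(bernoulli_pmf(10, k, 0.3) for k in range(11)) - 1.0) < 1e-12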
Example 2.3: Toss a coin n times. Obtain the probability of getting k heads in n trials.
Solution: We may identify "head" with "success" (A) and let $p = P(H)$. In that case (2-23) gives the desired probability.
Example 2.4: Consider rolling a fair die eight times. Find the probability that either 3 or 4 shows up five times.
Solution: In this case we can identify
$$\text{"success"} = A = \{\text{either 3 or 4}\} = \{f_3\} \cup \{f_4\}.$$
Thus
$$P(A) = P(f_3) + P(f_4) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3},$$
and the desired probability is given by (2-23) with $n = 8$, $k = 5$ and $p = 1/3$. Notice that this is similar to a "biased coin" problem.
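A one-line check of Example 2.4 (an added illustration) using (2-23) directly:

    # Example 2.4 via (2-23): n = 8 trials, k = 5 successes, p = 1/3.
    from math import comb

    print(comb(8, 5) * (1/3)**5 * (2/3)**3)   # 448/6561, approximately 0.068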
Bernoulli trials: repeated independent and identical experiments, each of which has only two outcomes, A or $\bar{A}$, with $P(A) = p$ and $P(\bar{A}) = q$. The probability of exactly k occurrences of A in n such trials is given by (2-23).
Let
$$X_k = \text{"exactly k occurrences in n trials"}.$$  (2-24)
Since the number of occurrences of A in n trials must be an integer $k = 0, 1, 2, \ldots, n$, either $X_0$ or $X_1$ or $X_2$ or $\cdots$ or $X_n$ must occur in such an experiment. Thus
$$P(X_0 \cup X_1 \cup \cdots \cup X_n) = 1.$$  (2-25)
But $X_i$, $X_j$ are mutually exclusive. Thus
$$P(X_0 \cup X_1 \cup \cdots \cup X_n) = \sum_{k=0}^{n} P(X_k) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}.$$  (2-26)
From the relation
$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k},$$  (2-27)
(2-26) equals $(p + q)^n = 1$, and it agrees with (2-25).
For a given n and p, what is the most likely value of k? From Fig. 2.2, the most probable value of k is the number which maximizes $P_n(k)$ in (2-23).
Fig. 2.2: Plot of $P_n(k)$ versus k for n = 12, p = 1/2.
To obtain this value, consider the ratio
$$\frac{P_n(k)}{P_n(k-1)} = \frac{\dfrac{n!}{(n-k)!\,k!}\,p^k q^{n-k}}{\dfrac{n!}{(n-k+1)!\,(k-1)!}\,p^{k-1} q^{n-k+1}} = \frac{n-k+1}{k}\,\frac{p}{q}.$$
Thus
$$P_n(k) \ge P_n(k-1), \quad \text{if} \quad k(1 - p) \le (n - k + 1)p, \quad \text{or} \quad k \le (n+1)p.$$  (2-28)
Thus $P_n(k)$, as a function of k, increases until
$$k = (n+1)p$$  (2-29)
if it is an integer, or until the largest integer $k_{\max}$ less than $(n+1)p$, and (2-29) represents the most likely number of successes (or heads) in n trials.
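A quick check of (2-29), added here for illustration with the values n = 12, p = 1/2 used in Fig. 2.2: the k that maximizes $P_n(k)$ coincides with $\lfloor (n+1)p \rfloor$ whenever $(n+1)p$ is not an integer.

    # Check of (2-29): argmax_k P_n(k) versus floor((n+1)p); sample values only.
    from math import comb, floor

    def bernoulli_pmf(n, k, p):
        q = 1.0 - p
        return comb(n, k) * p**k * q**(n - k)

    n, p = 12, 0.5                                   # the case plotted in Fig. 2.2
    k_best = max(range(n + 1), key=lambda k: bernoulli_pmf(n, k, p))
    # Note: when (n+1)p is an integer, both (n+1)p and (n+1)p - 1 maximize P_n(k).
    print(k_best, floor((n + 1) * p))                # both give 6 here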
Example 2.5: In a Bernoulli experiment with n trials, find the probability that the number of occurrences of A is between $k_1$ and $k_2$.
Solution: With $X_i$, $i = 0, 1, 2, \ldots, n$, as defined in (2-24), clearly they are mutually exclusive events. Thus
$$P(\text{"Occurrences of A are between } k_1 \text{ and } k_2\text{"}) = P(X_{k_1} \cup X_{k_1+1} \cup \cdots \cup X_{k_2}) = \sum_{k=k_1}^{k_2} P(X_k) = \sum_{k=k_1}^{k_2} \binom{n}{k} p^k q^{n-k}.$$  (2-30)
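Since (2-30) is just a partial sum of (2-23), it is easy to evaluate numerically; the sketch below is an added illustration, and the name prob_between and the sample arguments are assumptions, not from the lecture.

    # Sketch of (2-30): probability that the number of occurrences of A lies in [k1, k2].
    from math import comb

    def prob_between(n, p, k1, k2):
        """Sum of C(n, k) p^k q^(n-k) over k = k1, ..., k2 (inclusive)."""
        q = 1.0 - p
        return sum(comb(n, k) * p**k * q**(n - k) for k in range(k1, k2 + 1))

    print(prob_between(10, 0.5, 3, 7))   # sample values, chosen only for illustration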
Example 2.6: Suppose 5,000 components are ordered. The probability that a part is defective equals 0.1. What is the probability that the total number of defective parts does not exceed 400?
Solution: Let
$$Y_k = \text{"k parts are defective among 5,000 components"}.$$
Using (2-30), the desired probability is given by
$$P(Y_0 \cup Y_1 \cup \cdots \cup Y_{400}) = \sum_{k=0}^{400} P(Y_k) = \sum_{k=0}^{400} \binom{5000}{k} (0.1)^k (0.9)^{5000-k}.$$  (2-31)
Equation (2-31) has too many terms to compute directly. Clearly, we need a technique to evaluate the above sum in a more efficient manner.
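As a modern aside (not part of the original lecture, which develops a more efficient approximation), a sum like (2-31) can also be evaluated numerically if each term is computed in log space to avoid overflow and underflow; the helper below uses math.lgamma for the factorials, and all names are illustrative.

    # Numerical evaluation of (2-31) in log space (an added aside; names illustrative).
    from math import lgamma, log, exp

    def log_binom_pmf(n, k, p):
        """log of C(n, k) p^k (1-p)^(n-k), via log-gamma to avoid huge factorials."""
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1.0 - p))

    def binom_cdf(n, p, k_max):
        """P(number of successes <= k_max) for n Bernoulli trials."""
        return sum(exp(log_binom_pmf(n, k, p)) for k in range(k_max + 1))

    print(binom_cdf(5000, 0.1, 400))   # probability that at most 400 parts are defective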
From (2-29), $k_{\max}$, the most likely number of successes in n trials, satisfies
$$(n+1)p - 1 \le k_{\max} \le (n+1)p$$  (2-32)
or
$$p - \frac{q}{n} \le \frac{k_{\max}}{n} \le p + \frac{p}{n},$$  (2-33)
so that
$$\lim_{n \to \infty} \frac{k_{\max}}{n} = p.$$  (2-34)
From (2-34), as $n \to \infty$, the ratio of the most probable number of successes (A) to the total number of trials in a Bernoulli experiment tends to p, the probability of occurrence of A in a single trial. Notice that (2-34) connects the results of an actual experiment ($k_{\max}/n$) to the axiomatic definition of p. In this context, it is possible to obtain a more general result as follows:
Bernoulli's theorem: Let A denote an event whose probability of occurrence in a single trial is p. If k denotes the number of occurrences of A in n independent trials, then, for any $\epsilon > 0$,
$$P\left(\left|\frac{k}{n} - p\right| \ge \epsilon\right) \le \frac{pq}{n\epsilon^2}.$$  (2-35)
Equation (2-35) states that the frequency definition of the probability of an event, $k/n$, and its axiomatic definition, p, can be made compatible to any degree of accuracy.
Proof: To prove Bernoulli's theorem, we need two identities. Note that with $P_n(k)$ as in (2-23), direct computation gives
$$\sum_{k=0}^{n} k\,P_n(k) = \sum_{k=1}^{n} k\,\frac{n!}{(n-k)!\,k!}\,p^k q^{n-k} = \sum_{k=1}^{n} \frac{n!}{(n-k)!\,(k-1)!}\,p^k q^{n-k}$$
$$= \sum_{i=0}^{n-1} \frac{n!}{(n-i-1)!\,i!}\,p^{i+1} q^{n-1-i} = np \sum_{i=0}^{n-1} \frac{(n-1)!}{(n-1-i)!\,i!}\,p^i q^{n-1-i} = np\,(p+q)^{n-1} = np.$$  (2-36)
Proceeding in a similar manner, it can be shown that
$$\sum_{k=0}^{n} k^2 P_n(k) = \sum_{k=1}^{n} k\,\frac{n!}{(n-k)!\,(k-1)!}\,p^k q^{n-k} = \sum_{k=2}^{n} \frac{n!}{(n-k)!\,(k-2)!}\,p^k q^{n-k} + \sum_{k=1}^{n} \frac{n!}{(n-k)!\,(k-1)!}\,p^k q^{n-k}$$
$$= n(n-1)p^2 (p+q)^{n-2} + np\,(p+q)^{n-1} = n^2 p^2 + npq.$$  (2-37)
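The identities (2-36) and (2-37) are easy to confirm numerically for moderate n; the check below is an added illustration with arbitrary sample values.

    # Numerical check of the identities (2-36) and (2-37) for sample n and p.
    from math import comb

    n, p = 20, 0.3            # illustrative values
    q = 1.0 - p
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

    mean   = sum(k * pmf[k] for k in range(n + 1))       # should equal n p          (2-36)
    second = sum(k * k * pmf[k] for k in range(n + 1))   # should equal n^2 p^2 + n p q  (2-37)

    print(mean, n * p)
    print(second, n**2 * p**2 + n * p * q)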
Returning to (2-35), note that
$$\left|\frac{k}{n} - p\right| \ge \epsilon \quad \text{is equivalent to} \quad (k - np)^2 \ge n^2\epsilon^2,$$  (2-38)
which in turn gives
$$\sum_{k=0}^{n} (k - np)^2 P_n(k) \;\ge\; n^2\epsilon^2 \!\!\sum_{(k-np)^2 \ge n^2\epsilon^2}\!\! P_n(k) \;=\; n^2\epsilon^2\, P\left(\left|\frac{k}{n} - p\right| \ge \epsilon\right).$$  (2-39)
Using (2-36)-(2-37), the left side of (2-39) can be expanded to give
$$\sum_{k=0}^{n} (k - np)^2 P_n(k) = \sum_{k=0}^{n} k^2 P_n(k) - 2np \sum_{k=0}^{n} k\,P_n(k) + n^2 p^2 = n^2 p^2 + npq - 2np \cdot np + n^2 p^2 = npq.$$  (2-40)
Alternatively, the left side of (2-39) can be expressed as
$$\sum_{k=0}^{n} (k - np)^2 P_n(k) = \sum_{|k-np| < n\epsilon} (k - np)^2 P_n(k) + \sum_{|k-np| \ge n\epsilon} (k - np)^2 P_n(k)$$
$$\ge \sum_{|k-np| \ge n\epsilon} (k - np)^2 P_n(k) \;\ge\; n^2\epsilon^2 \!\!\sum_{|k-np| \ge n\epsilon}\!\! P_n(k) = n^2\epsilon^2\, P\left(|k - np| \ge n\epsilon\right).$$  (2-41)
Using (2-40) in (2-41), we get the desired result
$$P\left(\left|\frac{k}{n} - p\right| \ge \epsilon\right) \le \frac{pq}{n\epsilon^2}.$$  (2-42)
Note that for a given $\epsilon > 0$, $pq/(n\epsilon^2)$ can be made arbitrarily small by letting n become large. Thus for very large n, we can make the fractional occurrence (relative frequency) $k/n$ of the event A as close as desired to the actual probability p of the event A in a single trial. Thus the theorem states that the probability of the event A from the axiomatic framework can be computed from the relative frequency definition quite accurately, provided the number of experiments is large enough. Since $k_{\max}$ is the most likely value of k in n trials, from the above discussion, as $n \to \infty$, the plots of $P_n(k)$ tend to concentrate more and more around $k_{\max}$ in (2-32).
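In the spirit of (2-34)-(2-35), a small simulation (an added illustration; the seed, the value of p, and the sample sizes are arbitrary choices) shows the relative frequency k/n settling near p as n grows.

    # Simulation illustrating Bernoulli's theorem: k/n approaches p as n grows.
    import random

    random.seed(0)            # illustrative seed, for reproducibility only
    p = 0.3                   # probability of "success" in a single trial

    for n in (100, 10_000, 1_000_000):
        k = sum(1 for _ in range(n) if random.random() < p)
        print(n, k / n, abs(k / n - p))   # the deviation |k/n - p| shrinks with n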