Transcript Example

ความน่ าจะเป็ นแบบดีสครีต
(Discrete
Probability)
รศ.ดร. สาธิต อินทจักร์
ภาควิชาวิศวกรรมการวัดคุม คณะวิศวกรรมศาสตร์
สถาบันเทคโนโลยีพระจอมเกล้าเจ้าคุณทหาร ลาดกระบัง
Why Probability?
• Originally devised for gambling by Pascal
and Laplace over 200 years ago.
• Current applications of probability include
genetics study (e.g., to understand
inheritance of traits) and computer science
(e.g., to determine average-case complexity
of algorithms).
Definitions
• Experiments: a procedure that yields one of
a given set of possible outcomes.
• Sample space of an experiment: the set of
possible outcomes.
• Event: a subset of the sample space whose
outcome is of our interest.
• Probability of an event, p(E) = |E|/|S|
Examples
• Ex 1: What’s the probability of drawing a blue
ball from an urn containing four blue balls and
five red balls? [4/(4+5) = 4/9]
• Ex 2: What’s the probability that when two dice
are rolled, the sum of the numbers on the two dice
is 7?
– Number of possible outcomes, |S|, = 6*6 = 36
– Events where the sum of the numbers is 7 include (1,6),
(2,5), (3, 4), (4, 3), (5, 2), (6, 1)  |E| = 6
– The probability of such event, p(E), = |E|/|S| = 6/36 =
1/6
More examples
• Ex 3: Find the probability that a hand of five cards
in poker contains four cards of one kind?
– Sample space, S, is number of possible ways to choose
5 cards out of 52 cards  |S| = C(52, 5)
– Event, E, to select four cards of one kind, includes first
select 1 kind out of 13 kinds (C(13, 1)), then select 4
cards of this kind from the four in the deck of this kind
(C(4, 4)), and finally select 1 last card out of the 48
cards left (C(48, 1))
C(13,1)C(4,4)C(48,1)
– Probability, p(E) = |E|/|S| =
C(52,5)
Complementary probability
• Theorem 1: Let E be an event in a sample space S.
The probability of the event E, the complementary
event of E, is given by
p( E )  1  p( E )
• Ex: A sequence of ten bits is randomly generated.
What’s the probability that at least one of these
bits is 0?
– E = event that at least one of ten bits is 0  E =
event that all bits are 1s.
– p(E) = 1 – p( E ) = 1 - | E |
|S|
= 1 – 1/210 = 1 – 1/1024
= 1023/1024
Probability of union of two events
• Theorem 2: Let E1 and E2 be events in the
sample space S. Then (by the inclusion-exclusion
principle), p(E1E2)
= p(E1) + p(E2) – p(E1E2)
Properties of probability values
1.
2.
The probability of each outcome is a nonnegative real
number no greater than 1. That is, 0  p(xi)  1 for i = 1,
2, …, n
The sum of the probabilities
of all possible outcomes
n
should be 1. That is,  p( xi )  1
i 1
Such a p is called a probability distribution
Definition 2: The probability of the event E is the sum of the
probabilities of the outcomes in E. That is,
p( E )   p( s )
sE
Probability
• The probability p = p(E)  [0,1] of an event E is a
real number representing our degree of certainty
that E will occur.
– If p(E) = 1, then E is absolutely certain to occur,
– If p(E) = 0, then E is absolutely certain not to occur,
– If p(E) = ½, then we are completely uncertain about
whether E will occur.
– What about other cases?
An Example
• Ex: What’s the probability that an odd
number appears when we roll a die with
equally likely outcomes?
– probability of the event an odd number appears,
E, = {1, 3, 5}. Each event has probability p(1) =
p(3) = p(5) = 1/6
– Therefore, p(E) = p(1) + p(3) = p(5) = 3/6 = 1/2
Random Variables
• A random variable V is a variable whose
value is unknown, or that depends on the
situation.
– E.g., the number of students in class today
– the grades students receive in this class
– Whether it will rain tonight (Boolean variable)
• The proposition V=vi may be uncertain, and
be assigned a probability.
Mutually Exclusive Events
• Two events E1, E2 are called mutually
exclusive if they are disjoint: E1E2 = 
• Note that two mutually exclusive events
cannot both occur in the same instance of a
given experiment.
• For mutually exclusive events,
p(E1  E2) = p(E1) + p(E2).
Exhaustive Sets of Events
• A set E = {E1, E2, …} of events in the
sample space S is exhaustive if  Ei  S .
• An exhaustive set of events that are all
mutually exclusive with each other has the
property that  p( Ei )  1
Conditional Probability
• Let E, F be events such that p(F)>0.
• Then, the conditional probability of E given
F, written p(E|F), is defined as
p(E|F) = p(EF)/p(F).
• This is the probability that E would turn out
to be true, given just the information that F
is true.
• If E and F are independent, p(E|F) = p(E).
An Example
• Ex 1: A bit string of length four is generated at random so
that each of the 16 bit strings of length four is equally
likely. What’s the probability that it contains at least two
consecutive 0s given that its first bit is 0?
• Let E = event that a bit string of length four contains at
least two consecutive 0s, and
• let F = event that the first bit of a bit string of length four is
a 0. Then, p(E|F) = p(EF)/p(F)
• E  F = {0000, 0001, 0010, 0011, 0100}  p(EF) = 5/24
= 5/16. Since half of the bit string of length four must
begin with 0 (the other half begins with 1), p(F) = 8/16 =
½. Therefore, p(E|F) = (5/16)/(1/2) = 5/8
Independent Events
• Two events E, F are independent if and only
if p(EF) = p(E)·p(F).
• Relates to product rule for number of ways
of doing two independent tasks
• Example: Flip a coin, and roll a die.
p( quarter is heads  die is 1 ) =
p(quarter is heads) × p(die is 1)
Bernoulli Trials
• Theorem 2: The probability of exactly k
successes in n independent Bernoulli trials,
with probability of success p and
probability of failure q = 1 – p, is
C(n, k) pk qn-k
An Example
• Ex: What’s the probability that exactly four heads
come up when a fair coin is flipped seven times,
assuming that the flips are independent.
• n = 7, k = 4, n – k = 7 – 4 = 3
• p = probability of success (getting head) = ½
• q=1–p=½
• Therefore, p(gets 4 heads out of 7 flips)
= C(n, k) pk qn-k = C(7, 4) (1/2)4(1/2)3
Bayes’s Theorem
• Allows one to compute the probability that a
hypothesis H is correct, given data D:
p( D | H )  p( H )
p ( H | D) 
p ( D)
• Easy to prove from def’n of conditional prob.
• Extremely useful in artificial intelligence apps:
– Data mining, automated diagnosis, pattern recognition,
statistical modeling, evaluating scientific hypotheses.
Expectation Values
• For a random variable X(s) on the sample space S
is equal to E(X) = ∑sS p(s)X(s)
• The term “expected value” is widely used, but
misleading since the expected value might be
totally unexpected or impossible!
• Ex 1: Let X be the number that comes up when a
die is rolled. What’s the expected value of X?
Random variable X takes the values 1, 2, 3, 4, 5, or 6,
each with probability 1/6. So E(X)
= (1/6) 1 + (1/6) 2 + (1/6) 3 + (1/6) 4 + (1/6) 5 + (1/6) 6
= 21/6 = 7/2
Linearity of Expectation
• Let X1, X2 be any two random variables
derived from the same sample space, and if
a and b are real numbers. Then:
• E(X1+X2) = E(X1) + E(X2)
• E(aX1 + b) = aE(X1) + b
Independent Random Variables
• Definition 3: The random variables X and Y
are independent if
p(X = r1 and Y = r2) = p(X = r1)*p(Y = r2)
• Theorem 5: If X and Y are independent
random variables, then E(XY) = E(X)E(Y).
Variance
• The variance V(X) = σ2(X) of a random variable X
is the expected value of the square of the
difference between the value of X and its
expectation value E(X):
V( X )   ( X (s)  E( X ))2 p(s)
sS
• The standard deviation or root-mean-square
(RMS) difference of X, σ(X) :≡ V(X)1/2.
• Theorem 6: If X is a random variable on a sample
space S, then V(X) = E(X2) – E(X)2
Example
=>
Daily sales records for a shop selling electric
appliances show that it will sell zero, one,
two or three air-conditioners with the
probabilities:
Number of Sales
0
1
2
3
Probability
0.5 0.3 0.15 0.05
Calculate the expected value, variance and
standard variation for daily sales.
Example
=>
Expected value
= (0)(0.5) + (1)(0.3) + (2)(0.15) + (3)(0.05)
= 0.75
Variance
= (0 - 0.75)2(0.5) + (1 – 0.75)2(0.3) +
(2 – 0.75)2(0.15) + (3 – 0.75)2(0.05)
= 0.7875
Standard deviation =
0.7875
= 0.8874
Introducing binomial distribution
Consider the following random variables:
a) X : no.of “6” obtained in 10 rolls of a fair
die.
b) X : no. of tails obtained in 100 tosses of a
fair coin.
c) X : no. of defective light bulb in a batch of
1000.
d) X : no. of boys in a family of 5 children.
In each case, a basic experiment is repeated a
number of times. For example, the basic
experiment in case (a) is rolling the die once.
The following are common characteristics of
the random variables in cases (a) to (d):
1) The number of trials n of the basic
experiment is fixed in advance.
2) Each trial has two possible outcomes
which may be called “success” and “failure”.
3) The trials are independent.
4) The probability of success is fixed.
Binomial random variable
• A random variable X defined to be the
number of successes among n trials called
a binomial random variable if the
properties (1) to (4) are satisfied.
• Mathematically, we write
X ~ Bin(n, ),
where n = no. of trials,
and  = prob. of success.
The p.d. of a binomial r.v.
If X ~ Bin(n, ), then
p(x) = P(X = x) = nCxx(1-)n-x
where x = 0, 1, 2, …, n .
Example (Binomial 1)
A fair coin is tossed 8 times.
Find the probability of obtaining 5 heads.
Let X be the number of heads obtained in
8 tosses.
Then X ~ Bin(8, 1/2).
1 5
1 85
P(5 heads) = 8 C5 ( ) (1  )
2
2
= 7/32
Example (Binomial 2)
There are 10 multiple-choice questions in
a test and each question has 5 options.
Suppose a student answers all 10
questions by randomly picking an option
in each question. Find the probability that
(a) he will answer six questions correctly,
(b) he will get at least 3 correct answers.
Example (Binomial 2)
Let X be the number of correct answers he will get.
Then X ~ Bin(10, 0.2).
(a) P(X = 6) = 10C6(0.2)6(1-0.2)10-6 = 0.00551
(b)P(at least 3 correct answers)
= 1 – P(X = 0) – P(X = 1) – P(X = 2)
= 1 – 10C0(0.8)10 – 10C1(0.2)(0.8)9 –
10C2(0.2)
2(0.8)8
= 0.322
Example (Binomial 3)
Binary digits 0 and 1 are transmitted along a data
channel in which the presence of noise results in the
fact that each digit may be wrongly received with a
probabilty of 0.00002. Each message is transmitted
in blocks of 2000 digits.
• What is the probabilty that at least one digit in
a block is wrongly received?
• If a certain message has a length of 20 blocks,
find the probability that 2 or more blocks are
wrongly received.
Example (Binomial 3)
(a) Let X be the number of digits wrongly
received in a block of 2000 digits.
Then X ~ Bin(2000, 0.00002) .
P(X  1)= 1 – P(X = 0)
= 1 – 0.000022000
= 0.0392
Example (Binomial 3)
(b) Let Y be the number of block that are
wrongly received among the 20 blocks.
Then Y~ Bin(20, 0.0392).
P(Y  2) = 1 – P(Y = 0) – P(Y = 1)
= 1 – (1 – 0.0392)20 –
19
C
(0.0392)(1
–
0.0392)
20 1
= 0.184
Event
สมมุติมีคณะกรรมการอยู่ 5 คน เป็ นผู้ชาย 3 และ ผู้หญิง 2 คน จะเลือกตัวแแนน 3
คน จึงนาการจับฉลาก โดยให้ มีผ้ ชู าย 2 และผู้หญิงหนึง่ คน
• Sampling
with Replacement
N3 = 53 = 125 ชุด
M1 M1 M1,
:
F2 F2 F2,
M1 M1 M2,
:
F2 F2 F1,
M1 M1 M3,
:
F2 F2 M3,
M1 M1 F1,
:
F2 F2 M2,
M1 M1 F2
:
F2 F2 M1
• Sampling without Replacement
M1 M2 M3,
M1 M2 F1,
M1 M2 F2,
M1 M3 F1,
M1 F1 F2,
M2 M3 F1,
M2 M2 F2,
M2 F1 F2,
M1 M3 F2
M3 F1 F2
Pr(ตัวแแนนนี่ประกอบด้ วแยกรรมการชาย 2 และ หญิง 1 คน) = 6/10
Hypergeometric Distribution
ในกลุ่มตัวอย่างขนาด n จะมีการแจกแจงแบบ Hypergeometric ถ้า
X มีฟังก์ชนั ความน่าจะเป็ นดังนี้
 k  N  k

  
x
n

x

   
; x  0,1,2,... n
f X ( x )  f X ( x; N , k , n )    N 
  n
  

0
Otherwise
โดยที่
k
N
n
เป็ นส่ วนที่ของข้อมูลที่เราให้ความสนใจ
เป็ นจานวนประชากรทั้งหมด
เป็ นจานวนประชากรของตัวอย่าง
ตัวอย่ างที่ 1.4 กล่องใบหนึ่งบรรจุตวั ต้านทาน 100 ตัว โดยมีตวั ต้านทานที่
เสื่ อมคุณภาพอยู่ 5 ตัวปะปนอยู่ เพื่อตรวจสอบคุณภาพของตัวต้านทานทั้ง
กล่อง ผูซ้ ้ือสุ่ มตัวอย่างตัวต้านทานมา 10 ตัว เพื่อนาไปตรวจสอบคุณภาพ
ถ้าใน 10 ตัว ถ้าพบว่ามีตวั เสื่ อมคุณภาพปะปนอยูส่ องตัวผูซ้ ้ือจะไม่ยอมซื้อ
• จงหาความน่าจะเป็ นที่จะมีตวั ต้านทานที่เสื่ อมคุณภาพปนอยู่ 2 ตัว
ในตัวอย่างสุ่ ม
• จงคานวณค่าความน่าจะเป็ นในข้อแรก ด้วยวิธีการแจกแจงแบบทวิ
นาม
 5  95 
  
10  95!10! 90!
2  8 

Pr(x =2) =

 0.0702
100 
87! 8! 100!


 10 
ประมาณค่า Pr(x=2) ด้วย Binomial Distribution
ในการทดลอง n ครั้ง การแจกแจงทวินามจะมีลกั ษณะเหมือนกับ
การกระจายทวินาม n เทอมคือ
 p  q
n
 n  n x x
  x 0   p q
 x
n
โดยทัว่ ไป Hypergeometric Distribution สามารถประมาณด้วย
Binomial Distribution ถ้าหาก n/N  0.1 นัน่ คือ
k/N = p เป็ นโอกาสเลือกได้หน่วยที่สนใจมาเป็ นตัวอย่าง
(N-k)/N = q เป็ นโอกาสที่เลือกได้หน่วยที่ไม่สนใจมาเป็ นตัวอย่าง
10 
2
8
Pr( x  2)     0.05   0.95 
2
 0.075
Normal distribution
=>
A random variable X is said to have a Normal
distribution, with parameters  and 2 if it
can take any real value and has p.d.f.
2

1
1 x  
f ( x) 
exp 
,   x  


2 2
 2    
In this case, we write X ~ N(, 2). It can be
shown that  is the expected value of X and
2 is the variance.
P(a < X < b) =
=

b
a

b
a
f ( x)dx
2

1
1 x  
exp 
dx


2 2
 2    
This integral cannot be done algebraically
and its value has to be found by numerical
methods of integration. The value of
P(a < X < b) can also be viewed as the area
under the curve of f(x) from x = a to x = b.
Characteristics of normal distribution
1.
2.
3.
4.
Bell shape
Symmetric
Mean = Mode = Median
The 2 tails extend indefinitely
Standard normal distribution
A standard normal distribution is the
normal distribution with  = 0 and 2 = 1,
i.e. the N(0,1) distribution.
A standard normal random variable is
often denoted by Z.
If Z ~ N(0,1), its c.d.f. is usually written as
(z) = P(Z  z)
=
1
2
2
x
 exp( 2 )dx
z
Note:
(1) (z) may be interpreted as the area to the left of z
under the standard normal curve.
(2) P(Z < -z) = P(Z > z), since the standard normal
curve is symmetrical about the line Z = 0.
(3) The area between
Z= -1 and +1 is 68%
Z= -2 and +2 is 95%
Z= -3 and +3 is 99%
Example
=>
Given Z ~ N(0,1), find the following
probabilities using the standard normal
table.
(a)
p( Z  1.25)
(b)
(c)
(d)
(e)
P( Z > 2.33)
P(0.5 < Z < 1.5)
P( Z < -1.25)
p (-1.5< Z <-0.5)
Example
(a) P(Z  1.25) = 0.8944
(b) P(Z > 2.33) = 1 - (2.33)
= 1 – 0.9901 = 0.0099
(c) P(0.5 < Z < 1.5) = (1.5) - (0.5)
= 0.9332 – 0.6915=0.2417
(d) P( Z < -1.25) = P(Z > 1.25)
= 1 - (1.25)
= 1 – 0.8944 = 0.1056
(e) P(-1.5 < Z < -0.5) = 0.2417
Example
(a) P(Z  1.25) = 0.8944
Example
(b) P(Z > 2.33) = 1 - (2.33)
= 1 – 0.9901 = 0.0099
Example
(c) P(0.5 < Z < 1.5) = (1.5) - (0.5)
= 0.9332 – 0.6915=0.2417
Example
Given Z ~ N(0,1), find the value of c if
(a) P(Z  c) = 0.8888
(b) P( Z > c) = 0.37
(c) P(Z < c) = 0.025
(d) P(0  Z  c) = 0.4924
Example
(a) (c) = 0.8888
c = 1.22
(b) 1 - (c) = 0.37
(c) = 0.63
c = 0.332
Example
(c) Obviously c is negative and the standard
normal table cannot be used directly.
Recall that P(Z < -z) = P(Z > z)
0.025 = P(Z < c) = P(Z > -c) = 1 –(-c)
(-c) = 0.975
-c = 1.96
c = -1.96
Example
(d) P(0  Z  c) = (c) – 0.5
= 0.4924
(c) = 0.9924
c = 2.43
Standardization of normal
random variables
If X ~ N(, 2), then it can be shown that
Z=
X 

In other words

X 

~ N(0,1) .

is the standard
normal random variable.
The process of transforming X to Z is
called standardization of the random
variable X.
Example
Suppose X ~ N(10, 6.25). Find the
following probabilities
(a) P(X < 13)
(b) P(X > 5)
(c) P(8 < X < 15)
Example
X

10
X

10
Let Z =
.

2.5
6.25
13  10
(a) P(X < 13) = P(Z < 2.5
= P(Z < 1.2)
= 0.8849
)
Example
5  10
(b) P(X > 5) = P(Z >
2 .5
= P(Z > -2)
= P(Z <2)
= 0.9773
)
Example
8  10
15  10
Z
(c) P(8 < X < 15) = P(
)
2.5
2.5
= P(-0.8 < Z < 2)
= (2) - (-0.8)
= (2) – (1 - (0.8))
= 0.9772 – (1 – 0.7881)
= 0.7653