Transcript Chapter 6

Discrete probability distributions
Chapter 6 - Sullivan
Prof. Felix Apfaltrer
[email protected]
Office:N518
Phone: 212-220 8000 x 74 21
Office hours:
Tue, Thu 1:30-3 pm
Random variables and distributions
• A random variable is a variable (typically represented by x) that has a
single numerical value, determined by chance, for each outcome of a
procedure.
• A probability distribution is a graph, table, or a formula that gives the
probability for each value of the random variable.
Probabilities of girls
x (girls)   P(x)
 0          0.000
 1          0.001
 2          0.006
 3          0.022
 4          0.061
 5          0.122
 6          0.183
 7          0.209
 8          0.183
 9          0.122
10          0.061
11          0.022
12          0.006
13          0.001
14          0.000
Gender of children: A study consists of randomly
selecting 14 newborn babies and counting the
number of girls in the sample. If we assume that
having a boy or a girl is equally likely, and let
x = number of girls among the 14 babies
then x is a random variable because its value
depends on chance.
The possible values are x =0,1,2,3,…,11,12,13,14.
The table above (Probabilities of girls) shows a probability distribution for x.
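The values in the table can be reproduced with a short calculation. The sketch below is not part of the original slides. It assumes each birth is independently a girl with probability 0.5 and uses the binomial formula that the chapter reaches later; the function name `girls_distribution` is made up for illustration.

```python
from math import comb

def girls_distribution(n=14, p=0.5):
    """Return {x: P(x)} for x girls among n births (assumes independent births)."""
    return {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

dist = girls_distribution()
for x, prob in dist.items():
    # Three decimals, the same rounding as the table
    print(f"{x:2d}  {prob:.3f}")
```

Rounded to three decimals, the printed values should match the table row by row.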
Discrete and Continuous Random Variables (r.v.’s)
• A discrete random variable has either a finite or countable number of
values. Countable means it might be infinite, but you can still “count” them
(there are gaps between them).
• A continuous random variable has infinitely many values without gaps
between them (like interval subsets of the real numbers).
Examples:
Discrete random variables:
• Number of eggs a hen lays per day.
– A hen cannot lay 2.3 eggs in one day!
– Random variable: the number is not known for sure in
advance!
• Number of people attending the
Columbus Day Parade.
– Discrete r.v.: counting the number of
people. Random: we do not know in
advance exactly how many are going.
(but we might have an estimate)
• The sum of the faces when we roll two
dice.
• The points in a hand of blackjack.
• The average number of eggs per hen per
day in a farm with 10 hens.
Continuous random variables:
• Amount of milk a cow produces per day.
– Continuous r.v.: She might yield 1.345
gallons, or 1.34512 (no gaps in
measurement).
• The humidity on a given day.
– Continuous r.v.: Percentage of humidity
can be 75.34%.
• The daily closing value of the Dow Jones
Industrial Average index.
• The daily ocean temperature at a marine
laboratory investigating whales.
Probability histogram
• Very similar to a relative frequency histogram.
• Instead of percent (relative frequency), probability is shown.
• The values 0, 1, 2, …, 13, 14 are at the centers of the
rectangles -> base = 1
• area = height*base = height
[Figure: probability histogram. Horizontal axis: Number of Girls among 14 newborns (0 to 14). Vertical axis: Probability (0.000 to 0.250).]
Requirements of Probability Distributions
• ∑P(x) = 1, where x assumes all possible values.
• 0 ≤ P(x) ≤ 1 for every individual value of x.
Discussion:
• x takes all possible values, so it represents all options in the sample space
– For table ‘girls’, sum is 0.999, almost 1 except for rounding errors.
• All P(x) between 0 and 1 because they are probabilities!
Probabilities ???
x    P(x)
0    0.2
1    0.3
2    0.4
3    0.5

P(x) = x/9 for x = 2, 3, and 4
Example: Does the table represent a probability distribution?
• All values are between 0 and 1. Good!
• ∑P(x) = 0.2 + 0.3 + 0.4 + 0.5 = 1.4. Oops!
• The sum is not 1.
• Therefore, it is not a probability distribution.
Does the function P(x) = x/9 represent a probability distribution?
• P(2) = 2/9, P(3) = 3/9, P(4) = 4/9, all between 0 and 1.
• ∑P(x) = 2/9 + 3/9 + 4/9 = (2+3+4)/9 = 9/9 = 1
• The sum is 1.
• Therefore, the function does represent a probability distribution.
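Both requirements can be checked mechanically. The following is a minimal sketch, not from the slides; the helper name `is_probability_distribution` and the tolerance `tol` are assumptions chosen for illustration. Exact fractions are used for P(x) = x/9 so that no rounding enters the sum.

```python
from fractions import Fraction

def is_probability_distribution(probs, tol=1e-9):
    """Check both requirements: every P(x) is in [0, 1] and the P(x) sum to 1."""
    values = list(probs.values())
    in_range = all(0 <= p <= 1 for p in values)
    sums_to_one = abs(sum(values) - 1) <= tol
    return in_range and sums_to_one

table = {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.5}          # the table: sums to 1.4
formula = {x: Fraction(x, 9) for x in (2, 3, 4)}  # the function P(x) = x/9

print(is_probability_distribution(table))    # False
print(is_probability_distribution(formula))  # True
```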
Mean, Variance and Standard Deviation for Distributions
• μ = ∑ x•P(x)                        mean
• σ² = ∑ (x – μ)²•P(x)                variance
• σ² = ∑ [ x²•P(x) ] – μ²             variance (alternative formula)
• σ = √( ∑ [ x²•P(x) ] – μ² )         standard deviation
  x   P(x)    x P(x)   x^2 P(x)
  0   0.000   0.000     0.000
  1   0.001   0.001     0.001
  2   0.006   0.011     0.022
  3   0.022   0.067     0.200
  4   0.061   0.244     0.978
  5   0.122   0.611     3.055
  6   0.183   1.100     6.598
  7   0.209   1.466    10.264
  8   0.183   1.466    11.730
  9   0.122   1.100     9.898
 10   0.061   0.611     6.110
 11   0.022   0.244     2.688
 12   0.006   0.067     0.800
 13   0.001   0.011     0.144
 14   0.000   0.001     0.012
sum   1.000   7.000    52.500

Rationale:
mean                  7.000
-mean^2             -49.000
variance              3.500
standard deviation    1.871
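The column sums can be recomputed in a few lines. This sketch is not part of the slides. It assumes the exact probabilities C(14, x)/2^14 (equally likely boy or girl) rather than the rounded P(x) column, which appears to be why products such as 2•P(2) show as 0.011 instead of 0.012.

```python
from math import comb, sqrt

# Exact P(x) for x girls among 14 births, assuming P(girl) = 0.5
P = {x: comb(14, x) / 2**14 for x in range(15)}

mean = sum(x * p for x, p in P.items())                    # sum of x*P(x)
variance = sum(x**2 * p for x, p in P.items()) - mean**2   # alternative formula
std_dev = sqrt(variance)

print(round(mean, 3), round(variance, 3), round(std_dev, 3))
```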
Mean, Variance for Distributions (round-off and unusual values)
• Round off to 1 more decimal place than the data!
• Minimum usual value: μ – 2σ
• Maximum usual value: μ + 2σ
Example:
In the previous calculation, μ = 7, σ = 1.9.
• Minimum usual value: μ – 2σ = 7 – 2(1.9) = 3.2
• Maximum usual value: μ + 2σ = 7 + 2(1.9) = 10.8
For the group of 14 babies, the usual values for the number of girls fall between
3.2 and 10.8.
Rare event rule: If, under a given assumption, the probability of an event is
extremely low, we conclude that the assumption is most likely incorrect.
With probabilities:
• x successes among n trials are unusually high if P(x or more) < 0.05
• x successes among n trials are unusually low if P(x or fewer) < 0.05
Example (Gender Selection):
Getting 13 or more girls.
P(13 or more girls) = P(13) + P(14) = 0.001 + 0.000 = 0.001
Since 0.001 < 0.05, getting 13 or more girls is unusually high.
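Both checks, the range of usual values and the 0.05 probability rule, can be sketched as follows. This is an illustration only; the helper `prob_at_least` is a hypothetical name, and the probabilities assume independent births with P(girl) = 0.5.

```python
from math import comb

n, p = 14, 0.5
mu, sigma = 7, 1.9                           # rounded values used in the example

# Range of usual values: mean plus or minus two standard deviations
low, high = mu - 2 * sigma, mu + 2 * sigma

def prob_at_least(k):
    """P(k or more girls among n births), assuming independent births."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

p13 = prob_at_least(13)
print(round(low, 1), round(high, 1))         # 3.2 10.8
print(round(p13, 3), p13 < 0.05)             # 0.001 True
```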
Expected Value
The mean of a discrete random variable (its expected value) is denoted by E or
μX, and it represents the average value of the outcomes.
μX = E = E[X] = ∑ { x•P(x) }
Example (NJ pick 3 game):
Bet $ 0.50 and select a 3 digit number between 000 and 999. If you get the
number, you collect $275. Your net gain is then $274.50. Suppose that
you bet $0.50 on the number 007. What is your expected value of gain
or loss?
A: Each outcome is equally likely.
P(win) = 1/1000 = 0.001
P(loss) = 999/1000 = 0.999

Event   x          P(x)    xP(x)
Win     $274.50    0.001    $0.2745
Lose    -$0.50     0.999   -$0.4995
Total                      -$0.2250

E[X] = ∑ x•P(x) = 274.50 • 0.001 (win) + (-0.50) • 0.999 (loss)
     = 0.2745 - 0.4995 = -0.225
On average you will be losing 22.5 cents every time you play.
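The same expected value can be computed from a list of (outcome, probability) pairs. This is a minimal sketch; the function name `expected_value` and the list layout are assumptions made for illustration.

```python
def expected_value(outcomes):
    """Sum of x * P(x) over (x, P(x)) pairs."""
    return sum(x * p for x, p in outcomes)

pick3 = [
    (274.50, 1 / 1000),    # win: $275 payout minus the $0.50 bet
    (-0.50, 999 / 1000),   # lose: the $0.50 bet is gone
]
ev = expected_value(pick3)
print(round(ev, 4))        # -0.225
```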
Bernoulli Distribution
The Bernoulli probability distribution results from a procedure such that:
• there is one trial, like one flip of a coin
• there are only two outcomes (heads/tails, 0/1, red/white, success/failure)
Examples and probabilities:
• Tossing one coin (or bean)
– 1 trial
– outcomes: heads or tails
– P(X=heads) = 0.5, P(X=tails) = 0.5
• Birth of one child
– 1 trial
– outcomes: boy or girl
– P(girl) = 0.513 = p (success probability)
– P(boy) = 0.487 = q = (1 – p) (failure probability)
– X = “number of girls” in one birth: 0 or 1
– μ = 0•P(0) + 1•P(1) = 0•q + 1•p = p
– σ² = 0²•P(0) + 1²•P(1) – p² = p – p² = p(1 – p) = pq
• Tossing one die, win if it’s 6, lose if it’s 1-5
– 1 trial
– outcomes: win or lose
– P(win) = 1/6 = p, P(lose) = 5/6 = q
– X = “number of wins” in one toss: 0 or 1
– μ = 0•P(0) + 1•P(1) = p = 1/6, σ² = pq = 5/36
– Suppose you pay $1 to play and get $3 back if ‘6’ comes out.
– Expected net gain: E = 3•1/6 + (-1)•6/6 = –3/6
(you always pay $1 and get $3 back with probability 1/6)
– On average you will be losing 50 cents per play.
• Weather tomorrow
– 1 trial (day)
– outcomes: rain or shine
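The die example can be checked with exact fractions. The sketch below is an illustration, not part of the slides; `bernoulli_mean_var` is a made-up helper that applies the mean and variance formulas to a variable taking only the values 0 and 1.

```python
from fractions import Fraction

def bernoulli_mean_var(p):
    """Mean and variance of X in {0, 1} with P(X = 1) = p."""
    q = 1 - p
    mean = 0 * q + 1 * p
    var = 0**2 * q + 1**2 * p - mean**2
    return mean, var

p = Fraction(1, 6)                    # win when the die shows 6
mean, var = bernoulli_mean_var(p)

# Net gain: the $1 stake is always paid, $3 comes back with probability p
expected_gain = 3 * p + (-1) * 1
print(mean, var, expected_gain)       # 1/6 5/36 -1/2
```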
Binomial Distributions
A procedure has a binomial probability distribution if:
• all outcomes of each trial must fall into 2 categories
• the procedure has a fixed number of trials
• the trials are independent
• the probabilities must remain constant for each trial
Notation for binomial probability distributions:
2 categories:
S success (p prob. of success)
F failure (q prob. of failure)
Probabilities:
P(S) = p
P(F) = q = 1 – p

n                   fixed number of trials
x :: X = x          X denotes the random variable, x denotes number of successes in n trials
p                   probability of success (success is arbitrary, can be good or not)
q                   probability of failure
P(x) = P( X = x )   probability of getting exactly x successes among n trials
P( X ≤ x )          probability of getting x or fewer successes among n trials
B( n , p )          binomial distribution with n trials and probability of success p

Note: B(n,p) = sum of n independent Bernoulli distributions with probability of success p
X = Y1 + Y2 + … + Yn
μX = μY1 + μY2 + … + μYn = p + p + … + p = np
σ²X = σ²Y1 + σ²Y2 + … + σ²Yn = pq + pq + … + pq = npq
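The note that a binomial variable is a sum of independent Bernoulli variables can be illustrated numerically. The sketch below is not from the slides. It builds the distribution of the sum one trial at a time and compares it with the standard binomial formula C(n, x)•p^x•q^(n–x); the values n = 5 and p = 0.3 and the helper name `add_bernoulli` are arbitrary choices for illustration.

```python
from math import comb

def add_bernoulli(pmf, p):
    """Distribution of (current sum + one more independent Bernoulli(p) trial)."""
    new = [0.0] * (len(pmf) + 1)
    for k, prob in enumerate(pmf):
        new[k] += prob * (1 - p)       # extra trial is a failure
        new[k + 1] += prob * p         # extra trial is a success
    return new

n, p = 5, 0.3                          # illustrative values, not from the slides
pmf = [1.0]                            # with zero trials the sum is 0 for sure
for _ in range(n):
    pmf = add_bernoulli(pmf, p)

formula = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(pmf))
var = sum(x**2 * px for x, px in enumerate(pmf)) - mean**2

print(max(abs(a - b) for a, b in zip(pmf, formula)))  # close to 0
print(mean, n * p)                     # both close to np
print(var, n * p * (1 - p))            # both close to npq
```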
Binomial Distributions: Examples
Remember:
• Polls and test samples are usually drawn without replacement -> trials are dependent
• If the sample is small enough (< 5% of the population), then it is safe to
assume independence (even though the trials are not strictly independent)
Multiple choice answers: (4 questions answered at random, options: a, b, c, d, e)
• P(3 answers correct)
• Binomially distributed?
– Number of trials fixed: n = 4.
– Trials independent (answers do not depend on previous ones).
– 2 outcomes: right or wrong.
– One answer correct: p = 1/5 = 0.2; q = 0.8.
YES!
Use binomial formula
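The slide does not write the formula out. The standard binomial formula is P(x) = C(n, x)•p^x•q^(n–x), where C(n, x) counts the ways to place x successes among n trials. A minimal sketch for this example follows; the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# 4 questions, each guessed correctly with probability 1/5
p3 = binomial_pmf(3, n=4, p=0.2)
print(round(p3, 4))                                             # 0.0256
print([round(binomial_pmf(x, 4, 0.2), 4) for x in range(5)])
```

The second line prints the full distribution for x = 0, …, 4.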
Binomial Distributions: Examples Continued
Use table A-1:
n   x   p = 0.2
4   0   0.41
4   1   0.41
4   2   0.154
4   3   0.026
4   4   0.002

x   P(x)
0   0.4096
1   0.4096
2   0.1536
3   0.0256
4   0.0016

Hence, P(3) = 0.0256
Question: What is the probability that at least 3 answers are correct?
• ‘at least 3 answers correct’ = {X ≥ 3} = {X = 3 or X = 4}
• P(X ≥ 3) = P(X = 3) + P(X = 4) = 0.0256 + 0.0016 = 0.0272
Mean, variance and expectation:
μX = np = 4(0.2) = 0.8
σ²X = npq = 4(0.2)(0.8) = 0.64  ->  σX = 0.8
Suppose that you pay someone $1000 if a person who answers at random gets 3 or
more answers correct, and that you receive $100 otherwise. What is
your expected loss/gain?
E[X] = -1000 (0.0272) + 100 (1 - 0.0272)
= -27.2 + 97.28 = 70.08
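The mean, variance, and expected gain can be recomputed directly. The sketch below assumes the payoffs as they appear in the calculation: you lose $1000 with probability P(X ≥ 3) and win $100 otherwise. The variable names are illustrative.

```python
from math import comb, sqrt

n, p = 4, 0.2
q = 1 - p

mean = n * p                    # np
variance = n * p * q            # npq
std_dev = sqrt(variance)

# P(X >= 3) = P(3) + P(4) from the binomial formula
p_at_least_3 = sum(comb(n, x) * p**x * q**(n - x) for x in range(3, n + 1))

# Lose $1000 with probability P(X >= 3), win $100 otherwise
expected_gain = -1000 * p_at_least_3 + 100 * (1 - p_at_least_3)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
print(round(p_at_least_3, 4), round(expected_gain, 2))
```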
Homework
• Sullivan Review exercises chapter 6
– P. 315 (softcover)
• 1-5, 7, 8, 13, 15