3  Discrete Random Variables and Probability Distributions
Copyright © Cengage Learning. All rights reserved.
Population versus sample

Population: The entire group
of individuals in which we are
interested but can’t usually
assess directly.

Sample: The part of the
population we actually examine
and for which we do have data.
How well the sample represents
the population depends on the
sample design.
Example: All humans, all
working-age people in
California, all crickets

A parameter is a number
describing a characteristic of
the population.

A statistic is a number
describing a characteristic of a
sample.
Parameters and Statistics
As we begin to use sample data to draw conclusions about a wider
population, we must be clear about whether a number describes a
sample or a population.
A parameter is a number that describes some characteristic of the
population. In statistical practice, the value of a parameter is not
known because we cannot examine the entire population.
A statistic is a number that describes some characteristic of a
sample. The value of a statistic can be computed directly from the
sample data. We often use a statistic to estimate an unknown
parameter.
We write µ (the Greek letter mu) for the population mean and σ for the
population standard deviation. We write x̄ (x-bar) for the sample mean
and s for the sample standard deviation.
Discrete random variables
A random variable is a variable whose value is a numerical outcome of
a random phenomenon.
A bottle cap is tossed three times. We define the random variable X as the
number of times the cap drops with the open side up.
A discrete random variable X has a finite or countably infinite number
of possible values.
A bottle cap is tossed three times. The number of times the cap drops with
the open side up is a discrete random variable (X). X can only take the
values 0, 1, 2, or 3.
Probability distribution for discrete random variables
The probability distribution of a random variable X lists the values
and their probabilities:

Value of X      x1    x2    …    xk
Probability     p1    p2    …    pk

The probabilities pi must add up to 1.
Definition of the probability distribution or probability mass function
(pmf) of a discrete rv:
p(x) = P(X = x) = P(all s ∈ S : X(s) = x),
for every number x.
Example:
A bottle cap is tossed three times. We define the random variable X as the
number of times the cap drops with the open side up. The open side is
lighter than the closed side, so it will probably land open side up well over
50% of the tosses. Suppose on each toss the probability it drops with the
open side up is 0.7.
For example, P(UUU) = P(U) · P(U) · P(U) = (.7)(.7)(.7) = .343.
Tabulating all eight outcomes (U = open side up, D = down) by the value of X:

Value of X      0      1                2                3
Outcomes        DDD    UDD, DUD, DDU    UUD, UDU, DUU    UUU
Probability     .027   .189             .441             .343
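The pmf in the table can be checked by brute-force enumeration; a minimal Python sketch, using the example's assumption that P(open side up) = 0.7 on each independent toss:

```python
from itertools import product

# Enumerate all 2^3 = 8 outcomes of three independent tosses.
# Assumption from the example: P(open side up) = 0.7 on each toss.
p_up = 0.7
pmf = {x: 0.0 for x in range(4)}
for outcome in product("UD", repeat=3):
    prob = 1.0
    for side in outcome:
        prob *= p_up if side == "U" else 1 - p_up
    pmf[outcome.count("U")] += prob   # X = number of U's

# pmf ≈ {0: .027, 1: .189, 2: .441, 3: .343}
```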
The probability of any event is the sum of the probabilities pi of the
values of X that make up the event.
A bottle cap is tossed three times. We define the random variable X as the
number of times the cap drops with the open side up.

Value of X      0      1      2      3
Probability     .027   .189   .441   .343

What is the probability that the cap lands with the open side up at least two
times (“at least two” means “two or more”)?
P(X ≥ 2) = P(X = 2) + P(X = 3) = .441 + .343 = 0.784
What is the probability that the cap lands with the open side up fewer than
three times?
P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = .027 + .189 + .441 = 0.657, or
P(X < 3) = 1 − P(X = 3) = 1 − 0.343 = 0.657
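Both event probabilities come from summing pmf entries over the values that make up the event; a short sketch using the table's values:

```python
# pmf of X = number of open-side-up results in three cap tosses
pmf = {0: 0.027, 1: 0.189, 2: 0.441, 3: 0.343}

# "At least two": add the probabilities of the values making up the event.
p_at_least_2 = sum(p for x, p in pmf.items() if x >= 2)    # ≈ 0.784

# "Fewer than three": directly, or via the complement rule.
p_less_3 = sum(p for x, p in pmf.items() if x < 3)         # ≈ 0.657
p_less_3_alt = 1 - pmf[3]                                  # ≈ 0.657
```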
A Parameter of a Probability Distribution
The pmf of the Bernoulli rv X was p(0) = .8 and p(1) = .2 because 20%
of all purchasers selected a desktop computer.
At another store, it may be the case that p(0) = .9 and
p(1) = .1.
More generally, the pmf of any Bernoulli rv can be expressed in the
form p(1) = α and p(0) = 1 − α, where 0 < α < 1. Because the pmf
depends on the particular value of α, we often write p(x; α) rather than
just p(x):

p(x; α) = 1 − α if x = 0; α if x = 1; 0 otherwise.    (3.1)
Example:
Starting at a fixed time, we observe the gender of each newborn child
at a certain hospital until a boy (B) is born.
Let p = P(B), assume that successive births are independent, and
define the rv X by x = number of births observed.
Then
p(1) = P(X = 1) = P(B) = p
p(2) = P(X = 2) = P(GB) = P(G) · P(B) = (1 − p)p
and
p(3) = P(X = 3) = P(GGB) = P(G) · P(G) · P(B) = (1 − p)²p
Continuing in this way, a general formula emerges:
p(x) = (1 − p)^(x−1) p for x = 1, 2, 3, …; p(x) = 0 otherwise.
The parameter p can assume any value between 0 and 1.
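The general formula p(x) = (1 − p)^(x−1) p can be evaluated directly; a sketch with p = 0.5 chosen purely for illustration:

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x): x - 1 girls (prob 1 - p each) followed by one boy (prob p)."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

# With p = 0.5 (an illustrative value), the first few probabilities:
probs = [geometric_pmf(x, 0.5) for x in range(1, 5)]   # [0.5, 0.25, 0.125, 0.0625]
```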
The Cumulative Distribution Function
For some fixed value x, we often wish to compute the probability that
the observed value of X will be at most x. Suppose, for example, that X
has pmf p(0) = .500, p(1) = .167, and p(2) = .333.
The probability that X is at most 1 is then
P(X ≤ 1) = p(0) + p(1) = .500 + .167 = .667
In this example, X ≤ 1.5 if and only if X ≤ 1, so
P(X ≤ 1.5) = P(X ≤ 1) = .667
Similarly,
P(X ≤ 0) = P(X = 0) = .5 and P(X ≤ .75) = .5
In fact, for any x satisfying 0 ≤ x < 1, P(X ≤ x) = .5.
The largest possible X value is 2, so
P(X ≤ 2) = 1, P(X ≤ 3.7) = 1, P(X ≤ 20.5) = 1,
and so on.
Notice that P(X < 1) < P(X ≤ 1), since the latter includes the probability
of the X value 1, whereas the former does not. More generally, when X is
discrete and x is a possible value of the variable, P(X < x) < P(X ≤ x).
Definition: the cumulative distribution function (cdf) F(x) of a discrete
rv X with pmf p(x) is
F(x) = P(X ≤ x) = Σ_(y : y ≤ x) p(y),
for every number x.
For any number x, F(x) is the probability that the observed value of X
will be at most x.
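The definition translates directly into code: F(x) sums p(y) over all possible values y ≤ x. A sketch using the cap-toss pmf from earlier:

```python
# pmf of X = number of open-side-up results in three cap tosses
pmf = {0: 0.027, 1: 0.189, 2: 0.441, 3: 0.343}

def cdf(x: float, pmf: dict) -> float:
    """F(x) = P(X <= x): sum p(y) over all possible values y <= x."""
    return sum(p for y, p in pmf.items() if y <= x)

# F jumps only at possible values and is flat in between:
# cdf(1, pmf) == cdf(1.5, pmf) ≈ .216, and cdf(3, pmf) ≈ 1
```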
Example:
A store carries flash drives with either 1 GB, 2 GB, 4 GB, 8 GB, or 16
GB of memory. The accompanying table gives the distribution of Y = the
amount of memory in a purchased drive:

y       1     2     4     8     16
p(y)    .05   .10   .35   .40   .10
Let’s first determine F(y) for each of the five possible values of Y:
F(1) = P(Y ≤ 1) = P(Y = 1) = p(1) = .05
F(2) = P(Y ≤ 2) = P(Y = 1 or 2) = p(1) + p(2) = .15
F(4) = P(Y ≤ 4) = P(Y = 1, 2, or 4) = p(1) + p(2) + p(4) = .50
F(8) = P(Y ≤ 8) = p(1) + p(2) + p(4) + p(8) = .90
F(16) = P(Y ≤ 16) = 1
Now for any other number y, F(y) will equal the value of F at the closest
possible value of Y to the left of y. For example,
F(2.7) = P(Y ≤ 2.7) = P(Y ≤ 2) = F(2) = .15
F(7.999) = P(Y ≤ 7.999) = P(Y ≤ 4) = F(4) = .50
If y is less than 1, F(y) = 0 [e.g., F(.58) = 0], and if y is at least 16,
F(y) = 1 [e.g., F(25) = 1]. The cdf is thus
F(y) = 0 for y < 1; .05 for 1 ≤ y < 2; .15 for 2 ≤ y < 4; .50 for 4 ≤ y < 8;
.90 for 8 ≤ y < 16; and 1 for 16 ≤ y.
A graph of this cdf is shown in Figure 3.5.
For X a discrete rv, the graph of F(x) will have a jump at every possible
value of X and will be flat between possible values. Such a graph is
called a step function.
Proposition: For any two numbers a and b with a ≤ b,
P(a ≤ X ≤ b) = F(b) − F(a−),
where “a−” represents the largest possible X value that is strictly less
than a.
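The proposition can be sanity-checked numerically: summing the pmf strictly below a gives F(a−). A sketch, again with the cap-toss pmf:

```python
# pmf of X = number of open-side-up results in three cap tosses
pmf = {0: 0.027, 1: 0.189, 2: 0.441, 3: 0.343}

def interval_prob(a: float, b: float, pmf: dict) -> float:
    """P(a <= X <= b) = F(b) - F(a-): F(a-) sums p(y) for y strictly below a."""
    f_b = sum(p for y, p in pmf.items() if y <= b)
    f_a_minus = sum(p for y, p in pmf.items() if y < a)
    return f_b - f_a_minus

# P(1 <= X <= 2) = F(2) - F(0) = .657 - .027 = .63
```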
Expected value of a random variable
The expected value of a random variable X is also called the mean of X.
The mean x̄ of a set of observations is their arithmetic average. The
mean µ of a random variable X is a weighted average of the possible
values of X, reflecting the fact that all outcomes might not be equally
likely.
A bottle cap is tossed three times. We define the random variable X as the
number of times the cap drops with the open side up (“U”).

Value of X      0      1                2                3
Outcomes        DDD    UDD, DUD, DDU    UUD, UDU, DUU    UUU
Probability     .027   .189             .441             .343
Expected value of a random variable
Definition:
Let X be a discrete random variable with set of possible values D and
pmf p(x). The expected value or mean value of X, denoted by E(X) or
µ_X or just µ, is
E(X) = µ_X = Σ_(x ∈ D) x · p(x)
Mean of a discrete random variable
For a discrete random variable X with probability distribution

Value of X      x1    x2    …    xk
Probability     p1    p2    …    pk

the mean µ of X is found by multiplying each possible value of X by its
probability, and then adding the products:
µ_X = x1·p1 + x2·p2 + ··· + xk·pk = Σ xi·pi
A bottle cap is tossed three times. We define the random variable X as the
number of times the cap drops with the open side up.

Value of X      0      1      2      3
Probability     .027   .189   .441   .343

The mean µ of X is
µ = (0)(.027) + (1)(.189) + (2)(.441) + (3)(.343) = 2.1
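The same weighted average in code:

```python
# pmf of X = number of open-side-up results in three cap tosses
pmf = {0: 0.027, 1: 0.189, 2: 0.441, 3: 0.343}

# E(X) = sum over all possible values of x * p(x)
mu = sum(x * p for x, p in pmf.items())   # ≈ 2.1
```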
Variance of a random variable
The variance and the standard deviation are the measures of spread
that accompany the choice of the mean to measure center.
The variance σ2X of a random variable is a weighted average of the
squared deviations (X − µX)2 of the variable X from its mean µX. Each
outcome is weighted by its probability in order to take into account
outcomes that are not equally likely.
The larger the variance of X, the more scattered the values of X on
average. The positive square root of the variance gives the standard
deviation σ of X.
Variance of a random variable
Definition:
Let X have pmf p(x) and expected value µ. Then the variance of X,
denoted by V(X) or σ²_X, or just σ², is
V(X) = σ²_X = Σ_(x ∈ D) (x − µ)² · p(x) = E[(X − µ)²]
The standard deviation (SD) of X is
σ_X = √(σ²_X)
Variance of a discrete random variable
For a discrete random variable X with probability distribution

Value of X      x1    x2    …    xk
Probability     p1    p2    …    pk

and mean µ_X, the variance σ² of X is found by multiplying each squared
deviation of X by its probability and then adding all the products:
σ²_X = (x1 − µ_X)²·p1 + (x2 − µ_X)²·p2 + ··· + (xk − µ_X)²·pk = Σ (xi − µ_X)²·pi
A bottle cap is tossed three times. We define the random variable X as the
number of times the cap drops with the open side up; µ_X = 2.1.

Value of X      0      1      2      3
Probability     .027   .189   .441   .343

The variance σ² of X is
σ² = .027(0 − 2.1)² + .189(1 − 2.1)² + .441(2 − 2.1)² + .343(3 − 2.1)²
   = .11907 + .22869 + .00441 + .27783 = .63
A Shortcut Formula for 2
The number of arithmetic operations necessary to compute 2 can be
reduced by using an alternative formula.
Proposition:
V(X) = σ² = [Σ_(x ∈ D) x² · p(x)] − µ² = E(X²) − [E(X)]²
In using this formula, E(X²) is computed first without any subtraction;
then E(X) is computed, squared, and subtracted (once) from E(X²).
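Both the definitional formula and the shortcut give the same variance; a sketch with the cap-toss pmf:

```python
# pmf of X = number of open-side-up results in three cap tosses
pmf = {0: 0.027, 1: 0.189, 2: 0.441, 3: 0.343}
mu = sum(x * p for x, p in pmf.items())                   # ≈ 2.1

# Definitional form: weighted average of squared deviations from the mean.
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())  # ≈ .63

# Shortcut: compute E(X^2) first, then subtract [E(X)]^2 once.
e_x2 = sum(x ** 2 * p for x, p in pmf.items())            # ≈ 5.04
var_short = e_x2 - mu ** 2                                # ≈ .63
```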
Rules for means and variances
If X is a random variable and a and b are fixed numbers, then
µ_(a+bX) = a + b·µ_X
σ²_(a+bX) = b²·σ²_X
If X and Y are two independent random variables, then
µ_(X+Y) = µ_X + µ_Y
σ²_(X+Y) = σ²_X + σ²_Y
Investment
You invest 20% of your funds in Treasury bills and 80% in an “index fund” that
represents all U.S. common stocks. Your rate of return over time is proportional
to that of the T-bills (X) and of the index fund (Y), such that R = 0.2X + 0.8Y.
Based on annual returns between 1950 and 2003:
• Annual return on T-bills: µ_X = 5.0%, σ_X = 2.9%
• Annual return on stocks: µ_Y = 13.2%, σ_Y = 17.6%
• Assume that X and Y are independent.
µ_R = 0.2µ_X + 0.8µ_Y = (0.2)(5) + (0.8)(13.2) = 11.56%
σ²_R = σ²_(0.2X) + σ²_(0.8Y) = (0.2)²σ²_X + (0.8)²σ²_Y
    = (0.2)²(2.9)² + (0.8)²(17.6)² = 198.58
σ_R = √198.58 = 14.09%
The portfolio has a smaller mean return than an all-stock portfolio, but it is also
less risky.
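The portfolio calculation follows the rules above for a linear combination of independent variables; a sketch reproducing the numbers:

```python
import math

# R = 0.2X + 0.8Y with X, Y independent:
# mu_R = 0.2*mu_X + 0.8*mu_Y ; var_R = 0.2^2 * var_X + 0.8^2 * var_Y
mu_x, sd_x = 5.0, 2.9      # T-bills, percent
mu_y, sd_y = 13.2, 17.6    # stocks, percent
a, b = 0.2, 0.8

mu_r = a * mu_x + b * mu_y                # ≈ 11.56
var_r = a**2 * sd_x**2 + b**2 * sd_y**2   # ≈ 198.58
sd_r = math.sqrt(var_r)                   # ≈ 14.09
```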
Binomial distributions
Binomial distributions are models for some categorical variables,
typically representing the number of successes in a series of n trials.
The observations must meet these requirements:
• The total number of observations n is fixed in advance.
• Each observation falls into just 1 of 2 categories: success and failure.
• The outcomes of all n observations are statistically independent.
• All n observations have the same probability of “success,” p.
We record the next 50 births at a local hospital. Each newborn is either a
boy or a girl; each baby is either born on a Sunday or not.
We express a binomial distribution for the count X of successes among n
observations as a function of the parameters n and p: B(n, p).
• The parameter n is the total number of observations.
• The parameter p is the probability of success on each observation.
• The count of successes X can be any whole number between 0 and n.
A coin is flipped 10 times. Each outcome is either a head or a tail.
The variable X is the number of heads among those 10 flips, our count
of “successes.”
On each flip, the probability of success, “head,” is 0.5. The number X of
heads among 10 flips has the binomial distribution B(n = 10, p = 0.5).
Applications for binomial distributions
Binomial distributions describe the possible number of times that a
particular event will occur in a sequence of observations.
They are used when we want to know about the occurrence of an event,
not its magnitude.
• In a clinical trial, a patient’s condition may improve or not. We study the
number of patients who improved, not how much better they feel.
• Is a person ambitious or not? The binomial distribution describes the
number of ambitious persons, not how ambitious they are.
• In quality control we assess the number of defective items in a lot of
goods, irrespective of the type of defect.
Binomial distribution in statistical sampling
A population contains a proportion p of successes. If the population is
much larger than the sample, the count X of successes in a sample of
size n has approximately the binomial distribution B(n, p).
The n observations will be nearly independent when the size of the
population is much larger than the size of the sample. As a rule of
thumb, the binomial distribution can be used when the population is
at least 20 times as large as the sample.
Binomial mean and standard deviation
The center and spread of the binomial distribution for a count X are
defined by the mean µ and standard deviation σ:
µ = np
σ = √(npq) = √(np(1 − p))
We often write q for 1 − p.
[Figure: pmfs P(X = x) versus number of successes for a) n = 10, p = 0.25;
b) n = 10, p = 0.5; c) n = 10, p = 0.75 — the effect of changing p when n
is fixed.]
For small samples, binomial distributions are skewed when p is different
from 0.5.
Color blindness
The frequency of color blindness (dyschromatopsia) in the
Caucasian American male population is estimated to be
about 8%. We take a random sample of size 25 from this population.
The population is definitely larger than 20 times the sample size, thus we can
approximate the sampling distribution by B(n = 25, p = 0.08).
What are the mean and standard deviation of the count of color blind
individuals in the SRS of 25 Caucasian American males?
µ = np = 25 × 0.08 = 2
σ = √(np(1 − p)) = √(25 × 0.08 × 0.92) ≈ 1.36
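A quick check of the two formulas:

```python
import math

# Binomial count: mu = n*p, sigma = sqrt(n*p*(1-p))
n, p = 25, 0.08
mu = n * p                          # 2.0
sigma = math.sqrt(n * p * (1 - p))  # ≈ 1.36
```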
Calculations for binomial probabilities
The binomial coefficient counts the number of ways in which k
successes can be arranged among n observations.
The binomial probability P(X = k) is this count multiplied by the
probability of any specific arrangement of the k successes:

P(X = k) = (n choose k) p^k (1 − p)^(n − k)

X       P(X)
0       nC0 p^0 q^n = q^n
1       nC1 p^1 q^(n−1)
2       nC2 p^2 q^(n−2)
…       …
k       nCk p^k q^(n−k)
…       …
n       nCn p^n q^0 = p^n
Total   1

The probability that a binomial random variable takes any range of
values is the sum of the probabilities for getting exactly that many
successes in n observations. For example,
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
Binomial formulas
The number of ways of arranging k successes in a series of n
observations (with constant probability p of success) is the number of
possible combinations (unordered sequences).
This can be calculated with the binomial coefficient:
(n choose k) = n! / (k!(n − k)!)
where k = 0, 1, 2, ..., or n.
• The binomial coefficient “n choose k” uses the factorial notation “!”.
The factorial n! for any strictly positive whole number n is:
n! = n × (n − 1) × (n − 2) × ··· × 3 × 2 × 1
• For example: 5! = 5 × 4 × 3 × 2 × 1 = 120
• Note that 0! = 1.
Color blindness, cont’d
What is the probability that exactly five individuals in the sample of 25
are color blind (p = 0.08)?
P(X = 5) = [n!/(k!(n − k)!)] p^k (1 − p)^(n − k) = [25!/(5! 20!)] (0.08)^5 (0.92)^20
         = [(21 × 22 × 23 × 24 × 25)/(1 × 2 × 3 × 4 × 5)] (0.08)^5 (0.92)^20
         = 53,130 × 0.0000033 × 0.1887 = 0.03285
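The same computation with Python's built-in binomial coefficient:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly five color blind individuals among 25, with p = 0.08:
prob = binom_pmf(5, 25, 0.08)   # ≈ 0.033
```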