Random Variable - The University of Vermont


Random Variable
• A random variable X is a function that assigns a real number, X(ζ), to
each outcome ζ in the sample space of a random experiment.
• Domain of the random variable -- S
• Range of the random variable -- Sx
• Example 1: Suppose that a coin is tossed 3 times and the sequence
of heads and tails is noted.
Sample space S={HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}
X: number of heads in three coin tosses.

ζ:     HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(ζ):   3    2    2    2    1    1    1    0

Sx={0,1,2,3}
Probability of a random variable
• Example 2: The event {X=k} = {k heads in three coin tosses} occurs
when the outcome of the coin-tossing experiment contains k heads.
P[X=0]=P[{TTT}]=1/8
P[X=1]=P[{HTT}]+P[{THT}]+P[{TTH}]=3/8
P[X=2]=P[{HHT}]+P[{HTH}]+P[{THH}]=3/8
P[X=3]=P[{HHH}]=1/8
• Conclusion: for B ⊂ SX, let A = {ζ: X(ζ) ∈ B}. Then
P[B]=P[A]=P[{ζ: X(ζ) ∈ B}].
Events A and B are referred to as equivalent events.
All numerical events of practical interest involve {X=x} or {X ∈ I}
for some interval I.
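As a sanity check, here is a minimal Python sketch (not from the slides) that enumerates the sample space and reproduces these probabilities by counting heads:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Enumerate S = {HHH, ..., TTT} and count heads in each outcome,
# reproducing the pmf of Example 2 by brute force.
sample_space = ["".join(seq) for seq in product("HT", repeat=3)]
counts = Counter(seq.count("H") for seq in sample_space)

for k in sorted(counts):
    print(f"P[X={k}] = {Fraction(counts[k], len(sample_space))}")
# P[X=0] = 1/8, P[X=1] = 3/8, P[X=2] = 3/8, P[X=3] = 1/8
```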
Events Defined by Random Variable
• If X is a r.v. and x is a fixed real number, we can define events such as
(X = x) = {ζ: X(ζ) = x}
(X ≤ x) = {ζ: X(ζ) ≤ x}
(X > x) = {ζ: X(ζ) > x}
(x1 < X ≤ x2) = {ζ: x1 < X(ζ) ≤ x2}
These events have probabilities that are denoted by
P[X = x] = P{ζ: X(ζ) = x}
P[X ≤ x] = P{ζ: X(ζ) ≤ x}
P[X > x] = P{ζ: X(ζ) > x}
P[x1 < X ≤ x2] = P{ζ: x1 < X(ζ) ≤ x2}
Distribution Function
The cumulative distribution function (cdf) of a random variable X is
defined as the probability of the event {X ≤ x}:
FX(x) = P[X ≤ x] for −∞ < x < +∞
In terms of the underlying sample space, the cdf is the probability of the
event {ζ: X(ζ) ≤ x}.
• Properties:
0 ≤ FX(x) ≤ 1
lim(x→+∞) FX(x) = 1
lim(x→−∞) FX(x) = 0
a < b ⟹ FX(a) ≤ FX(b)   (nondecreasing)
FX(b) = lim(h→0+) FX(b+h) = FX(b+)   (right-continuous)
P[a < X ≤ b] = FX(b) − FX(a)
P[X = b] = FX(b) − FX(b−)
P[X > b] = 1 − FX(b)
A typical example of cdf
• Tossing a coin 3 times and counting the number of heads
x     {X ≤ x}                         FX(x)
-1    Ø                               0
0     {TTT}                           1/8
1     {TTT,TTH,THT,HTT}               4/8
2     {TTT,TTH,THT,HTT,HHT,HTH,THH}   7/8
3     S                               1
4     S                               1
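The same table can be produced from the pmf; a minimal sketch, assuming the pmf of Example 2:

```python
from fractions import Fraction

# pmf of X = number of heads in three coin tosses (Example 2)
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(x):
    """F_X(x) = P[X <= x]: sum the pmf over all values not exceeding x."""
    return sum(p for k, p in pmf.items() if k <= x)

for x in [-1, 0, 1, 2, 3, 4]:
    print(f"F_X({x}) = {cdf(x)}")   # 0, 1/8, 4/8, 7/8, 1, 1
```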
Two types of random variables
• A discrete random variable has a countable number of possible
values.
X: the number of heads in 5 tosses of a coin.
The values are countable.
• A continuous random variable takes all values in an interval of
numbers.
X: the time it takes for a bulb to burn out.
The values are not countable.
Example of cdf for discrete random variables
• Consider the r.v. X defined in Example 2.
Discrete Random Variable And
Probability Mass Function
• Let X be a r.v. with cdf FX(x). If FX(x) changes value only in jumps
and is constant between jumps, i.e. FX(x) is a staircase function,
then X is called a discrete random variable.
• Suppose xi < xj if i < j. Then
P(X=xi) = P(X≤xi) − P(X≤xi−1) = FX(xi) − FX(xi−1)
Let pX(x)=P(X=x).
The function pX(x) is called the probability mass function (pmf) of the
discrete r.v. X.
• Properties of pX(x):
0 ≤ pX(xk) ≤ 1,  k = 1, 2, ...
pX(x) = 0 for x ≠ xk  (k = 1, 2, ...)
Σk pX(xk) = 1
Example of pmf for discrete r.v.
• Consider the r.v. X defined in Example 2.
Continuous Random Variable and
Probability Density Function
• Let X be a r.v. with cdf FX(x). If FX(x) is continuous and also has a
derivative dFX(x)/dx which exists everywhere except at possibly a
finite number of points and is piecewise continuous, then X is called
a continuous random variable.
• Let
fX(x) = dFX(x)/dx
• The function fX(x) is called the probability density function (pdf) of
the continuous r.v. X. fX(x) is piecewise continuous.
• Properties:
fX(x) ≥ 0
∫_{-∞}^{∞} fX(x)dx = 1
P(a ≤ X ≤ b) = ∫_a^b fX(x)dx
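As an illustration of these properties, the sketch below checks them numerically for an exponential pdf; λ = 1 and the midpoint integration rule are assumptions for the example, not part of the slide:

```python
from math import exp

lam = 1.0                          # illustrative rate parameter
f = lambda x: lam * exp(-lam * x)  # exponential pdf for x >= 0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(f, 0.0, 50.0))   # ~ 1.0: total probability
print(integrate(f, 1.0, 2.0))    # P(1 <= X <= 2) = e^-1 - e^-2 ~ 0.2325
```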
Conditional distribution
• Conditional probability of an event A given event B is defined as
P(A|B) = P(A ∩ B) / P(B)
• The conditional cdf FX(x|B) of a r.v. X given event B is defined as
FX(x|B) = P(X ≤ x | B) = P{(X ≤ x) ∩ B} / P(B)
• If X is discrete, then the conditional pmf pX(x|B) is defined by
pX(xk|B) = P(X = xk | B) = P{(X = xk) ∩ B} / P(B)
• If X is a continuous r.v., then the conditional pdf fX(x|B) is defined by
fX(x|B) = dFX(x|B)/dx
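A minimal sketch of the discrete case, using the coin-toss pmf of Example 2 and an assumed conditioning event B = {X ≥ 1}:

```python
from fractions import Fraction

# pmf of X = number of heads in three tosses; B = {X >= 1} is illustrative.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
p_B = sum(p for k, p in pmf.items() if k >= 1)   # P(B) = 7/8

# p_X(k|B) = P({X=k} ∩ B) / P(B)
cond_pmf = {k: (p / p_B if k >= 1 else Fraction(0)) for k, p in pmf.items()}
print(cond_pmf)   # {0: 0, 1: 3/7, 2: 3/7, 3: 1/7}, and it sums to 1
```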
Mean and variance
• Mean:
The mean (or expected value) of a r.v. X, denoted by μX or E(X), is
defined by
μX = E(X) = Σk xk pX(xk)              (X discrete)
μX = E(X) = ∫_{-∞}^{∞} x fX(x)dx      (X continuous)
• Moment:
The nth moment of a r.v. X is defined by
E(X^n) = Σk xk^n pX(xk)               (X discrete)
E(X^n) = ∫_{-∞}^{∞} x^n fX(x)dx       (X continuous)
• Variance:
The variance of a r.v. X, denoted by σX² or Var(X), is defined by
σX² = Var(X) = E{[X − E(X)]²} = E(X²) − [E(X)]²
σX² = Σk (xk − μX)² pX(xk)            (X discrete)
σX² = ∫_{-∞}^{∞} (x − μX)² fX(x)dx    (X continuous)
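A minimal sketch of these definitions in the discrete case, again using the pmf of Example 2:

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())               # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items())   # E(X^2), the 2nd moment
variance = second_moment - mean**2                      # Var(X) = E(X^2) - [E(X)]^2

print(mean, variance)   # 3/2 and 3/4, matching np and np(1-p) for n=3, p=1/2
```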
Expectation of a Function of a Random variable
• Given a r.v. X and its probability distribution (pmf in the discrete
case and pdf in the continuous case), how do we calculate the expected
value of some function of X, E[g(X)]?
• Proposition:
(a) If X is a discrete r.v. with pmf pX(x), then for any real-valued
function g,
E[g(X)] = Σk g(xk) pX(xk)
(b) If X is a continuous r.v. with pdf fX(x), then for any real-valued
function g,
E[g(X)] = ∫_{-∞}^{∞} g(x) fX(x)dx
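A short illustration of part (a); the function g(x) = (x − 1)² is an arbitrary choice for the example:

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
g = lambda x: (x - 1) ** 2   # illustrative real-valued function

# E[g(X)] = sum_k g(x_k) p_X(x_k); no need to derive the pmf of g(X) itself
print(sum(g(x) * p for x, p in pmf.items()))   # (1 + 0 + 3 + 4)/8 = 1
```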
Limit Theorem
• Markov's Inequality: If X is a r.v. that takes only nonnegative values,
then for any value a > 0,
P{X ≥ a} ≤ E[X] / a
• Chebyshev's Inequality: If X is a random variable with mean μ and
variance σ², then for any value k > 0,
P{|X − μ| ≥ k} ≤ σ² / k²
Application of Limit theorem
• Suppose we know that the number of items produced in a factory
during a week is a random variable with mean 500.
• (a) What can be said about the probability that this week's
production will be at least 1000?
• (b) If the variance of a week's production is known to equal 100,
then what can be said about the probability that this week's
production will be between 400 and 600?
• Solution: Let X be the number of items that will be produced in a week.
(a) By Markov's inequality, P{X≥1000} ≤ E[X]/1000 = 500/1000 = 0.5
(b) By Chebyshev's inequality,
P{|X−500|≥100} ≤ σ²/(100)² = 100/10000 = 0.01
so P{|X−500|<100} ≥ 1−0.01 = 0.99.
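The bounds themselves are one-line computations; the simulation below is purely illustrative (it assumes a normal distribution with the stated mean and variance, which the problem does not actually specify) and shows how loose the distribution-free bounds can be:

```python
import random

mean, var = 500, 100
print(mean / 1000)        # Markov bound on P{X >= 1000}: 0.5
print(1 - var / 100**2)   # Chebyshev bound on P{|X-500| < 100}: 0.99

# Assumed distribution for illustration only: N(500, 100).
samples = [random.gauss(mean, var ** 0.5) for _ in range(100_000)]
print(sum(x >= 1000 for x in samples) / len(samples))            # ~ 0
print(sum(abs(x - mean) < 100 for x in samples) / len(samples))  # ~ 1
```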
Some Special Distributions
• Bernoulli Distribution
• Binomial Distribution
• Poisson Distribution
• Uniform Distribution
• Exponential Distribution
• Normal (or Gaussian) Distribution
• Conditional Distribution
• ……
Bernoulli Random Variable
An experiment with outcome either a "success" or a "failure" is
performed. Let X=1 if the outcome is a "success" and X=0 if it is a
"failure". If the pmf is as given below, such experiments are called
Bernoulli trials and X is said to be a Bernoulli random variable.
SX = {0, 1}
pX(1) = p
pX(0) = 1 − p
μX = E[X] = p,  σX² = Var[X] = p(1 − p)
Note: 0 ≤ p ≤ 1
Example: Toss a coin once. If heads and tails are equally likely
to occur, then p = 0.5: pX(1) = P(H) = 0.5, pX(0) = P(T) = 0.5.
Binomial Random Variable
• Suppose n independent Bernoulli trials, each of which results in a
"success" with probability p and in a "failure" with probability 1 − p, are
to be performed. Let X represent the number of successes that occur
in the n trials; then X is said to be a binomial random variable with
parameters (n, p).
SX = {0, 1, 2, ..., n}
pX(k) = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
μX = E[X] = np,  σX² = np(1 − p)
Example: Toss a coin 3 times, X = number of heads, p = 0.5:
pX(0) = 0.125, pX(1) = 0.375, pX(2) = 0.375, pX(3) = 0.125
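A minimal sketch of the binomial pmf that reproduces the example's numbers:

```python
from math import comb

def binomial_pmf(k, n, p):
    """p_X(k) = C(n, k) p^k (1-p)^(n-k)"""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for k in range(4):
    print(k, binomial_pmf(k, 3, 0.5))   # 0.125, 0.375, 0.375, 0.125
```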
Geometric Random Variable
• Suppose independent trials, each having probability p of being
a success, are performed until a success occurs. Let X be the
number of trials required until the first success occurs; then X is said
to be a geometric random variable with parameter p.
SX = {1, 2, ...}
pX(k) = p(1 − p)^(k−1),  k = 1, 2, ...
μX = E[X] = 1/p,  σX² = (1 − p)/p²
Example: Consider an experiment of rolling a fair die. The average
number of rolls required in order to obtain a 6 is
μX = E(X) = 1/p = 1/(1/6) = 6
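A quick simulation check of E[X] = 6 (the number of repetitions is an arbitrary choice):

```python
import random

def rolls_until_six():
    """Number of rolls of a fair die until the first 6 appears."""
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 6:
            return n

trials = [rolls_until_six() for _ in range(100_000)]
print(sum(trials) / len(trials))   # ~ 6.0 = 1/p
```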
Poisson Random Variable
• A r.v. X is called a Poisson random variable with parameter λ (> 0) if
its pmf is given by
pX(k) = P(X = k) = e^(−λ) λ^k / k!,  k = 0, 1, 2, ...
SX = {0, 1, 2, ...}
μX = E[X] = λ,  σX² = λ
An important property of the Poisson r.v. is that it may be used to
approximate a binomial r.v. when the binomial parameter n is
large and p is small. Let λ = np; then
P(X = k) ≈ e^(−λ) λ^k / k!
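A numerical comparison of the two pmfs; n = 100 and p = 0.03 are illustrative values (n large, p small), not from the slide:

```python
from math import comb, exp, factorial

n, p = 100, 0.03   # illustrative: n large, p small
lam = n * p        # Poisson parameter lambda = np

for k in range(6):
    binom = comb(n, k) * p**k * (1 - p) ** (n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(f"k={k}: binomial {binom:.4f}  vs  Poisson {poisson:.4f}")
```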
Uniform Random Variable
SX = [a, b]
fX(x) = 1/(b − a),  a ≤ x ≤ b
μX = E[X] = (a + b)/2,  σX² = (b − a)²/12
A uniform r.v. X is often used when we have no prior
knowledge of the actual pdf and all continuous values in
some range seem equally likely.
Exponential Random Variable
SX = [0, ∞)
fX(x) = λe^(−λx),  x ≥ 0, λ > 0
FX(x) = 1 − e^(−λx)
μX = E[X] = 1/λ,  σX² = 1/λ²
The most interesting property of the exponential r.v. is that it is
"memoryless":
P(X > x + t | X > t) = P(X > x),  x, t ≥ 0
X can be the lifetime of a component.
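A simulation check of the memoryless property; λ, x, and t are illustrative choices:

```python
import random

lam, x, t = 2.0, 0.5, 1.0   # illustrative parameters
samples = [random.expovariate(lam) for _ in range(1_000_000)]

survivors = [s for s in samples if s > t]                  # condition on X > t
lhs = sum(s > x + t for s in survivors) / len(survivors)   # P(X > x+t | X > t)
rhs = sum(s > x for s in samples) / len(samples)           # P(X > x)
print(lhs, rhs)   # both ~ e^(-lam*x) = e^-1 ~ 0.368
```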
Gaussian (Normal) Random Variable
SX = (−∞, ∞)
fX(x) = (1/(√(2π) σ)) e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞, σ > 0
μX = E[X] = μ,  σX² = σ²
An important fact about a normal r.v. is that if X is normally
distributed with parameters μ and σ², then Y = aX + b is normally
distributed with parameters aμ + b and a²σ².
Application: central limit theorem -- the sum of a large number of
independent r.v.'s, under certain conditions, can be approximated by
a normal r.v., denoted by N(μ; σ²).
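A simulation check of the linear-transformation fact; μ, σ, a, b are illustrative values:

```python
import random

mu, sigma, a, b = 1.0, 2.0, 3.0, -4.0   # illustrative parameters
ys = [a * random.gauss(mu, sigma) + b for _ in range(1_000_000)]

m = sum(ys) / len(ys)
v = sum((y - m) ** 2 for y in ys) / len(ys)
print(m, v)   # ~ a*mu + b = -1.0 and a^2*sigma^2 = 36.0
```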
The Moment Generating Function
• The moment generating function of a r.v. X is defined by
φ(t) = E[e^(tX)] = Σk e^(t·xk) pX(xk)           (X discrete)
φ(t) = E[e^(tX)] = ∫_{-∞}^{∞} e^(tx) fX(x)dx    (X continuous)
The important property: all of the moments of X can be obtained
by successive differentiation:
φ^(n)(0) = E[X^n]
Application of Moment Generating Function
• The Binomial Distribution (n, p):
φ(t) = E[e^(tX)] = Σ_{k=0}^{n} e^(tk) C(n, k) p^k (1 − p)^(n−k) = (pe^t + 1 − p)^n
φ′(t) = n(pe^t + 1 − p)^(n−1) pe^t
E[X] = φ′(0) = np
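The differentiation can be checked symbolically; this sketch assumes the third-party sympy package is available:

```python
import sympy as sp

t, p, n = sp.symbols("t p n", positive=True)
phi = (p * sp.exp(t) + 1 - p) ** n   # binomial mgf

# E[X] = phi'(0); sympy reduces this to n*p
print(sp.simplify(sp.diff(phi, t).subs(t, 0)))   # n*p
```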
Entropy
• Entropy is a measure of the uncertainty in a random experiment.
• Let X be a discrete r.v. with SX = {x1, x2, ..., xK} and pmf pk = P[X = xk].
Let Ak denote the event {X = xk}.
Intuitive facts: the uncertainty of Ak is low if pk is close to one, and it
is high if pk is close to zero.
Measure of uncertainty:
I(X = xk) = ln(1 / P[X = xk]) = −ln P[X = xk]
Entropy of a random variable
• The entropy of a r.v. X is defined as the expected value of the
uncertainty of its outcomes:
HX = E[I(X)] = Σk p(xk) ln(1/p(xk)) = −Σk p(xk) ln p(xk)
The entropy is in units of "bits" when the logarithm is base 2.
Independent fair coin flips have an entropy of 1 bit per flip.
A source that always generates a long string of A's has an
entropy of 0, since the next character will always be an 'A'.
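A minimal sketch of the definition, reproducing the two statements above (base-2 logarithm, so the result is in bits):

```python
from math import log2

def entropy_bits(pmf):
    """H_X = -sum_k p(x_k) log2 p(x_k); terms with p = 0 contribute 0."""
    return -sum(p * log2(p) for p in pmf if p > 0)

print(entropy_bits([0.5, 0.5]))   # fair coin flip: 1.0 bit
print(entropy_bits([1.0]))        # a source that always emits 'A': 0.0 bits
```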
Entropy of Binary Random Variable
• Suppose a r.v. X with Sx = {0, 1} and p = P[X=0] = 1 − P[X=1] (flipping a coin).
• HX = h(p) is symmetric about p = 0.5 and achieves its maximum at p = 0.5.
• The uncertainties of the events (X=0) and (X=1) vary together in a
complementary manner.
• The highest average uncertainty occurs when p(0) = p(1) = 0.5.
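A short check of the symmetry and the maximum at p = 0.5:

```python
from math import log2

def h(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(h(p), 4))   # symmetric about 0.5; maximum h(0.5) = 1
```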
Reduction of Entropy Through Partial
Information
• Entropy quantifies uncertainty by the amount of information required
to specify the outcome of a random experiment.
• Example:
If r.v. X equally likely takes on the values from the set
{000, 001, 010, ..., 111} (flipping a coin 3 times), and we are given the event
A = {X begins with a 1} = {100, 101, 110, 111}, what is the change in the
entropy of r.v. X?
HX = −(1/8)log2(1/8) − (1/8)log2(1/8) − ... − (1/8)log2(1/8) = 3 bits
HX|A = −(1/4)log2(1/4) − ... − (1/4)log2(1/4) = 2 bits
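The same numbers, computed directly from the two uniform pmfs:

```python
from math import log2

def entropy_bits(pmf):
    return -sum(p * log2(p) for p in pmf if p > 0)

print(entropy_bits([1/8] * 8))   # X uniform over 8 strings: 3.0 bits
print(entropy_bits([1/4] * 4))   # given A, uniform over 4 strings: 2.0 bits
```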
Thanks!
Questions?
Extending discrete entropy to the continuous
case: differential entropy
• Quantization method: Let X be a continuous r.v. that takes on
values in the interval [a, b]. Divide [a, b] into a large number K of
subintervals of length Δ. Let Q(X) be the midpoint of the subinterval
that contains X. Find the entropy of Q.
• Let xk be the midpoint of the kth subinterval; then P[Q = xk] = P[X is in the
kth subinterval] = P[xk − Δ/2 < X < xk + Δ/2] ≈ fX(xk)Δ
HQ = −Σ_{k=1}^{K} P[Q = xk] ln P[Q = xk]
HQ ≈ −Σ_{k=1}^{K} fX(xk)Δ ln(fX(xk)Δ) = −ln Δ − Σ_{k=1}^{K} fX(xk) ln(fX(xk)) Δ
Trade-off: as Δ → 0, HQ → ∞.
Differential entropy is defined as
HX = −∫_{-∞}^{∞} fX(x) ln fX(x)dx = −E[ln fX(X)]
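A numerical illustration of the trade-off, using a uniform r.v. on [0, 1] (differential entropy 0) as an assumed example; HQ = ln K = −ln Δ grows without bound as Δ → 0:

```python
from math import log

# X uniform on [0, 1]: each of the K bins has P[Q = x_k] = delta = 1/K.
for K in (10, 100, 1000):
    delta = 1.0 / K
    h_q = -sum(delta * log(delta) for _ in range(K))   # H_Q in nats
    print(K, h_q, -log(delta))                         # H_Q equals ln K
```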
The Method of Maximum Entropy
The maximum entropy method is a procedure for estimating the pmf
or pdf of a random variable when only partial information about X, in
the form of expected values of functions of X, is available.
Discrete case:
Let X be a r.v. with Sx = {x1, x2, ..., xK} and unknown pmf pX(xk).
Given the expected value of some function g(X) of X:
Σ_{k=1}^{K} g(xk) pX(xk) = c