Transcript Document

RANDOM VARIABLES,
EXPECTATIONS,
VARIANCES ETC.
- THEORY
1
Variable
• Recall:
• Variable: A characteristic of a population or
sample that is of interest to us.
• Random variable: A function defined on the
sample space S that associates a real number
with each outcome in S.
2
DISCRETE RANDOM VARIABLES
• If the set of all possible values of a r.v. X is a
countable set, then X is called a discrete r.v.
• The function f(x)=P(X=x) for x=x1,x2, … that
assigns the probability to each value x is called
probability density function (p.d.f.) or
probability mass function (p.m.f.)
3
Example
• Discrete Uniform distribution:
P(X = x) = 1/N;  x = 1, 2, ..., N;  N = 1, 2, ...
• Example: throw a fair die.
P(X=1)=…=P(X=6)=1/6
4
CONTINUOUS RANDOM VARIABLES
• When the sample space is uncountable
(continuous)
• Example: Continuous Uniform(a,b)
f(x) = 1/(b − a),  a ≤ x ≤ b.
5
CUMULATIVE DISTRIBUTION FUNCTION
(C.D.F.)
• CDF of a r.v. X is defined as F(x)=P(X≤x).
• Note that, P(a<X ≤b)=F(b)-F(a).
• A function F(x) is a CDF for some r.v. X iff it
satisfies
lim_{x→−∞} F(x) = 0
lim_{x→∞} F(x) = 1
lim_{h→0+} F(x + h) = F(x), i.e., F(x) is continuous from the right
a < b implies F(a) ≤ F(b), i.e., F(x) is non-decreasing.
6
Example
• Consider tossing three fair coins.
• Let X = number of heads observed.
• S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}
• P(X=0) = P(X=3) = 1/8;  P(X=1) = P(X=2) = 3/8

x       (-∞,0)   [0,1)   [1,2)   [2,3)   [3,∞)
F(x)      0       1/8     1/2     7/8      1
7
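The pmf and CDF on this slide can be reproduced by enumerating the sample space; a minimal Python sketch, using exact fractions:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))
pmf = {}
for o in outcomes:
    x = o.count("H")                      # X = number of heads
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 8)

def F(x):
    """CDF: F(x) = P(X <= x)."""
    return sum(p for k, p in pmf.items() if k <= x)

print(pmf[0], pmf[1])    # 1/8 3/8
print(F(1), F(2))        # 1/2 7/8
```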
Example
• Let f(x) = 2(1 + x)^(−3) for x > 0 (and 0 otherwise). Then

F(x) = P(X ≤ x) = ∫_0^x 2(1 + t)^(−3) dt = 1 − (1 + x)^(−2) for x > 0,
F(x) = 0 for x ≤ 0.

P(0.4 < X ≤ 0.45) = ∫_{0.4}^{0.45} f(x) dx = F(0.45) − F(0.4) ≈ 0.035
8
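Assuming the density in this example is f(x) = 2(1 + x)^(−3) for x > 0 with CDF F(x) = 1 − (1 + x)^(−2) (this choice reproduces the stated 0.035), a quick numeric check:

```python
# Numeric check of the slide's probability, assuming the density is
# f(x) = 2(1+x)^(-3) for x > 0, with CDF F(x) = 1 - (1+x)^(-2).
def F(x):
    return 1 - (1 + x) ** -2 if x > 0 else 0.0

# P(0.4 < X <= 0.45) = F(0.45) - F(0.4)
p = F(0.45) - F(0.4)
print(round(p, 4))   # 0.0346, i.e. about 0.035

# Cross-check with a simple midpoint-rule integral of the density.
n = 10_000
h = 0.05 / n
approx = sum(2 * (1 + 0.4 + (i + 0.5) * h) ** -3 * h for i in range(n))
print(abs(approx - p) < 1e-6)   # True
```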
JOINT DISTRIBUTIONS
• In many applications there is more than one
random variable of interest, say X1, X2,…,Xk.
JOINT DISCRETE DISTRIBUTIONS
• The joint probability mass function (joint pmf)
of the k-dimensional discrete rv
X=(X1, X2,…,Xk) is
f  x 1 , x 2 ,..., x k   P X 1  x 1 , X 2  x 2 ,..., X k  x k 
  x 1 , x 2 ,..., x k  of X .
9
JOINT DISCRETE DISTRIBUTIONS
• A function f(x1, x2,…, xk) is the joint pmf for
some vector valued rv X=(X1, X2,…,Xk) iff the
following properties are satisfied:
f(x1, x2, ..., xk) ≥ 0 for all (x1, x2, ..., xk)
and
Σ_{x1} ... Σ_{xk} f(x1, x2, ..., xk) = 1.
10
Example
• Tossing two fair dice → 36 possible sample
points
• Let X: sum of the two dice;
Y: |difference of the two dice|
• For example:
– For (3,3), X=6 and Y=0.
– For both (4,1) and (1,4), X=5, Y=3.
11
Example
• Joint pmf of (X, Y)

y \ x    2     3     4     5     6     7     8     9    10    11    12
0      1/36        1/36        1/36        1/36        1/36        1/36
1            1/18        1/18        1/18        1/18        1/18
2                  1/18        1/18        1/18        1/18
3                        1/18        1/18        1/18
4                              1/18        1/18
5                                    1/18

Empty cells are equal to 0.
e.g. P(X=7,Y≤4)=f(7,0)+f(7,1)+f(7,2)+f(7,3)+f(7,4)=0+1/18+0+1/18+0=1/9
12
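The joint pmf table can be built by enumerating all 36 rolls; a short Python sketch with exact fractions:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely rolls of two fair dice and build the
# joint pmf of X = sum and Y = |difference|.
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1 + d2, abs(d1 - d2))
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

print(joint[(6, 0)])    # 1/36, only (3,3)
print(joint[(5, 3)])    # 1/18, from (4,1) and (1,4)

# P(X = 7, Y <= 4), as in the example under the table
p = sum(p for (x, y), p in joint.items() if x == 7 and y <= 4)
print(p)                # 1/9
```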
MARGINAL DISCRETE
DISTRIBUTIONS
• If the pair (X1,X2) of discrete random variables
has the joint pmf f(x1,x2), then the marginal
pmfs of X1 and X2 are
f1(x1) = Σ_{x2} f(x1, x2)  and  f2(x2) = Σ_{x1} f(x1, x2)
13
Example
• In the previous example,
– P(X = 2) = Σ_{y=0}^{5} P(X = 2, Y = y) = P(X = 2, Y = 0) + ... + P(X = 2, Y = 5) = 1/36
– P(Y = 2) = Σ_{x=2}^{12} P(X = x, Y = 2) = 4/18
14
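These marginals follow by summing the joint pmf over the other variable; a self-contained Python sketch:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of X = sum, Y = |difference| for two fair dice.
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1 + d2, abs(d1 - d2))
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

# Marginal pmfs: sum the joint pmf over the other coordinate.
pX, pY = {}, {}
for (x, y), p in joint.items():
    pX[x] = pX.get(x, Fraction(0)) + p
    pY[y] = pY.get(y, Fraction(0)) + p

print(pX[2])    # 1/36
print(pY[2])    # 2/9, i.e. 4/18
```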
JOINT DISCRETE DISTRIBUTIONS
• JOINT CDF:
F(x1, x2, ..., xk) = P(X1 ≤ x1, ..., Xk ≤ xk).
• F(x1, x2) is a cdf iff
lim_{x1→−∞} F(x1, x2) = F(−∞, x2) = 0 for all x2;
lim_{x2→−∞} F(x1, x2) = F(x1, −∞) = 0 for all x1;
lim_{x1→∞, x2→∞} F(x1, x2) = F(∞, ∞) = 1;
P(a < X1 ≤ b, c < X2 ≤ d) = F(b, d) − F(b, c) − F(a, d) + F(a, c) ≥ 0 for all a < b and c < d;
lim_{h→0+} F(x1 + h, x2) = lim_{h→0+} F(x1, x2 + h) = F(x1, x2) for all x1 and x2 (continuity from the right in each argument).
15
JOINT CONTINUOUS DISTRIBUTIONS
• A k-dimensional vector valued rv X=(X1,
X2,…,Xk) is said to be continuous if there is a
function f(x1, x2,…, xk), called the joint
probability density function (joint pdf), of X,
such that the joint cdf can be given as
F  x 1 , x 2 ,..., x k  
x1 x 2
 
 
xk
...
 f  t 1 , t 2 ,..., t k dt 1dt 2 ... dt k

16
JOINT CONTINUOUS DISTRIBUTIONS
• A function f(x1, x2,…, xk) is the joint pdf for
some vector valued rv X=(X1, X2,…,Xk) iff the
following properties are satisfied:
f(x1, x2, ..., xk) ≥ 0 for all (x1, x2, ..., xk)
and
∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f(x1, x2, ..., xk) dx1 dx2 ... dxk = 1.
17
JOINT CONTINUOUS DISTRIBUTIONS
• If the pair (X1, X2) of continuous random variables
has the joint pdf f(x1, x2), then the marginal
pdfs of X1 and X2 are
f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2  and  f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1.
18
JOINT DISTRIBUTIONS
• If X1, X2, ..., Xk are mutually independent,
then the joint pdf factors into the product of the marginals:
f(x1, x2, ..., xk) = f1(x1) f2(x2) ... fk(xk),
and the joint cdf can be written as
F(x1, x2, ..., xk) = F1(x1) F2(x2) ... Fk(xk).
19
CONDITIONAL DISTRIBUTIONS
• If X1 and X2 are discrete or continuous random
variables with joint pdf f(x1,x2), then the
conditional pdf of X2 given X1=x1 is defined by
f x 2 x 1  
f x 1 , x 2 
f x 1 
,  x 1 such that f  x 1   0 , 0 elsewhere.
• For independent rvs,
f  x 2 x1   f  x 2  .
f  x1 x 2   f  x1  .
20
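As an illustration of the definition, the conditional pmf of Y given X = 7 in the earlier two-dice example can be computed directly; a minimal Python sketch:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of X = sum, Y = |difference| for two fair dice.
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1 + d2, abs(d1 - d2))
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

# f(y | X = 7) = f(7, y) / f_X(7)
pX7 = sum(p for (x, y), p in joint.items() if x == 7)          # 1/6
cond = {y: p / pX7 for (x, y), p in joint.items() if x == 7}
print(cond)   # each of y = 1, 3, 5 gets probability 1/3
```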
Example
Statistical Analysis of Employment Discrimination Data (Example
from Dudewicz & Mishra, 1988; data from Dawson, Hankey
and Myers, 1982)
% promoted (number of employees)

Pay grade    Affected class    Others
5            100 (6)           84 (80)
7            88 (8)            87 (195)
9            93 (29)           88 (335)
10           7 (102)           8 (695)
11           7 (15)            11 (185)
12           10 (10)           7 (165)
13           0 (2)             9 (81)
14           0 (1)             7 (41)
The affected class might be, e.g., a minority group or women.
21
Example, cont.
• Does this data indicate discrimination against the
affected class in promotions in this company?
• Let X=(X1,X2,X3) where X1 is pay grade of an
employee; X2 is an indicator of whether the
employee is in the affected class or not; X3 is an
indicator of whether the employee was promoted or
not
• x1={5,7,9,10,11,12,13,14}; x2={0,1}; x3={0,1}
22
Example, cont.
Pay grade    Affected class    Others
10           7 (102)           8 (695)
• E.g., in pay grade 10 of this occupation (X1=10) there
were 102 members of the affected class and 695
members of the other classes. Seven percent of the
affected class in pay grade 10 had been promoted,
that is (102)(0.07)=7 individuals out of 102 had been
promoted.
• Out of 1950 employees, only 173 are in the affected
class; this is not atypical in such studies.
23
Example, cont.
Pay grade    Affected class    Others
10           7 (102)           8 (695)
• E.g. probability of a randomly selected employee
being in pay grade 10, being in the affected class, and
promoted: P(X1=10,X2=1,X3=1)=7/1950=0.0036
(Probability function of a discrete 3 dimensional r.v.)
• E.g. probability of a randomly selected employee
being in pay grade 10 and promoted:
P(X1=10, X3=1)= (7+56)/1950=0.0323 (note: 8% of 695 ≈ 56) (marginal probability function of X1 and X3)
24
Example, cont.
• E.g. probability that an employee is in the other class
(X2=0) given that the employee is in pay grade 10
(X1=10) and was promoted (X3=1):
P(X2=0| X1=10, X3=1)= P(X1=10,X2=0,X3=1)/P(X1=10, X3=1)
=(56/1950)/(63/1950)=0.89 (conditional probability)
• probability that an employee is in the affected class
(X2=1) given that the employee is in pay grade 10
(X1=10) and was promoted (X3=1):
P(X2=1| X1=10, X3=1)=(7/1950)/(63/1950)=0.11
25
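The conditional probabilities on this slide follow directly from the counts; a short Python check using the pay-grade-10 figures (7 promoted in the affected class, about 56 promoted among the others, 1950 employees in total):

```python
# Conditional probabilities from the promotion counts in pay grade 10.
total = 1950
p_g10_aff_prom = 7 / total            # P(X1=10, X2=1, X3=1)
p_g10_oth_prom = 56 / total           # P(X1=10, X2=0, X3=1), 8% of 695
p_g10_prom = p_g10_aff_prom + p_g10_oth_prom   # P(X1=10, X3=1) = 63/1950

print(round(p_g10_prom, 4))                    # 0.0323
print(round(p_g10_oth_prom / p_g10_prom, 2))   # 0.89
print(round(p_g10_aff_prom / p_g10_prom, 2))   # 0.11
```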
Production problem
• Two companies manufacture a certain type of sophisticated
electronic equipment for the government; to avoid lawsuits,
let's call them company C and company D. In the past, company C has had
5% good output, whereas D has had 50% good output (i.e., 95% of C's
output and 50% of D's output is not of acceptable quality). The
government has just ordered 10,100 of these devices from
company D and 11,000 from C (maybe for political reasons, maybe
company D does not have a large enough capacity for more
orders). Before production of these devices starts, government
scientists develop a new manufacturing method that they believe
will almost double the % of good devices received. Companies C
and D are given this information, but its use is optional: they must each
use this new method for at least 100 of their devices, but its use
beyond that point is left to their discretion.
26
Production problem, cont.
• When the devices are received and tested, the
following table is observed:
            Production method
Results     Standard       New
Bad         5950           9005
Good        5050 (46%)     1095 (11%)
• Officials blame the scientists and the companies for
producing with the "lousy" new method, which looks
clearly inferior.
• Scientists still claim that the new method has almost
doubled the % of good items.
• Which one is right?
27
Production problem, cont.
• Answer: the scientists rule!
                Company C                  Company D
Results     Standard    New           Standard     New
Bad         950         9000          5000         5
Good        50 (5%)     1000 (10%)    5000 (50%)   95 (95%)
• The new method nearly doubled the % of good
items for both companies.
• Company D knew its production under the
standard method was already good, so it used
the new method for only the minimum allowed 100 devices.
• This is called Simpson's paradox. Do not combine
the results for the two companies in such cases.
28
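The reversal can be verified directly from the counts; a small Python sketch that computes the per-company and pooled good-output rates:

```python
# Simpson's paradox with the production counts: per company, the new
# method roughly doubles the good-output rate, yet the pooled table
# reverses the comparison because company D used the new method for
# only 100 devices.
counts = {
    # (company, method): (good, bad)
    ("C", "standard"): (50, 950),
    ("C", "new"): (1000, 9000),
    ("D", "standard"): (5000, 5000),
    ("D", "new"): (95, 5),
}

def rate(good, bad):
    return good / (good + bad)

for (co, m), (g, b) in counts.items():
    print(co, m, round(rate(g, b), 2))
# C standard 0.05, C new 0.1, D standard 0.5, D new 0.95

# Pooled over companies, the ordering flips:
for m in ("standard", "new"):
    g = sum(v[0] for k, v in counts.items() if k[1] == m)
    b = sum(v[1] for k, v in counts.items() if k[1] == m)
    print(m, round(rate(g, b), 2))
# standard 0.46, new 0.11
```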
Describing the Population
• We’re interested in describing the population by
computing various parameters.
• For instance, we calculate the population mean
and population variance.
29
EXPECTED VALUES
Let X be a rv with pdf fX(x) and g(X) be a
function of X. Then the expected value (or
the mean or the mathematical expectation) of
g(X) is
E[g(X)] = Σ_x g(x) fX(x), if X is discrete
E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx, if X is continuous
provided the sum or the integral exists, i.e.,
−∞ < E[g(X)] < ∞.
30
EXPECTED VALUES
• E[g(X)] is finite if E[|g(X)|] is finite, i.e.,
Σ_x |g(x)| fX(x) < ∞, if X is discrete
∫_{−∞}^{∞} |g(x)| fX(x) dx < ∞, if X is continuous
31
Population Mean (Expected Value)
• Given a discrete random variable X with
values xi, that occur with probabilities p(xi),
the population mean of X is
E(X) = μ = Σ_{all xi} xi p(xi)
32
Population Variance
– Let X be a discrete random variable with
possible values xi that occur with
probabilities p(xi), and let E(X) = μ. The
variance of X is defined by
V(X) = σ² = E[(X − μ)²] = Σ_{all xi} (xi − μ)² p(xi)   [units: unit²]
The standard deviation is
σ = √σ²   [units: unit]
33
EXPECTED VALUE
• The expected value or mean value of a
continuous random variable X with pdf f(x) is
μ = E(X) = ∫_{all x} x f(x) dx
• The variance of a continuous random
variable X with pdf f(x) is
σ² = Var(X) = E[(X − μ)²] = ∫_{all x} (x − μ)² f(x) dx
   = E(X²) − μ² = ∫_{all x} x² f(x) dx − μ²
34
EXAMPLE
• The pmf for the number of defective items in
a lot is as follows
 0.35, x  0
 0.39, x  1

p ( x )   0.19, x  2
 0.06, x  3

 0.01, x  4
Find the expected number and the variance of
defective items.
35
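One way to work this exercise is to apply the definitions directly; a minimal Python sketch (the numeric answers below are our own computation, not from the slides):

```python
# Mean and variance of the defective-count pmf on the slide.
pmf = {0: 0.35, 1: 0.39, 2: 0.19, 3: 0.06, 4: 0.01}

mean = sum(x * p for x, p in pmf.items())          # E(X)
second = sum(x ** 2 * p for x, p in pmf.items())   # E(X^2)
var = second - mean ** 2                           # Var(X) = E(X^2) - mu^2

print(round(mean, 2))    # 0.99
print(round(var, 4))     # 0.8699
```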
EXAMPLE
• Let X be a random variable. Its pdf is
f(x)=2(1-x), 0< x < 1
Find E(X) and Var(X).
36
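A numeric sketch for this exercise, approximating the integrals for f(x) = 2(1 − x) on (0, 1) with a midpoint rule; the exact answers E(X) = 1/3 and Var(X) = 1/18 are our own computation, used here only as a cross-check:

```python
# E(X) and Var(X) for f(x) = 2(1-x) on (0,1), via midpoint integration.
n = 100_000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]

mean = sum(x * 2 * (1 - x) * h for x in xs)          # E(X)
second = sum(x ** 2 * 2 * (1 - x) * h for x in xs)   # E(X^2)
var = second - mean ** 2

print(round(mean, 6))    # 0.333333, exact value 1/3
print(round(var, 6))     # 0.055556, exact value 1/18
```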
Laws of Expected Value
• Let X be a rv and a, b, and c be constants.
Then, for any two functions g1(x) and g2(x)
whose expectations exist,
a) E[a g1(X) + b g2(X) + c] = a E[g1(X)] + b E[g2(X)] + c
b) If g1(x) ≥ 0 for all x, then E[g1(X)] ≥ 0.
c) If g1(x) ≤ g2(x) for all x, then E[g1(X)] ≤ E[g2(X)].
d) If a ≤ g1(x) ≤ b for all x, then a ≤ E[g1(X)] ≤ b.
37
Laws of Expected Value and Variance
Let X be a rv and c be a constant.
Laws of Expected Value
 E(c) = c
 E(X + c) = E(X) + c
 E(cX) = cE(X)
Laws of
Variance
 V(c) = 0
 V(X + c) = V(X)
 V(cX) = c2V(X)
38
EXPECTED VALUE
E(Σ_{i=1}^{k} ai Xi) = Σ_{i=1}^{k} ai E(Xi).
If X and Y are independent,
E[g(X) h(Y)] = E[g(X)] E[h(Y)].
The covariance of X and Y is defined as
Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X) E(Y)
39
EXPECTED VALUE
If X and Y are independent,
Cov(X, Y) = 0.
The converse is not true in general! Zero covariance implies
independence only in special cases, e.g. under joint normality:
If (X, Y) ~ Bivariate Normal, then X and Y are independent
iff
Cov(X, Y) = 0.
40
EXPECTED VALUE
Var  X 1  X 2   Var  X 1   Var  X 2   2 Cov  X 1 , X 2 
If X1 and X2 are independent,
Var  X 1  X 2   Var  X 1   Var  X 2 
41
CONDITIONAL EXPECTATION AND
VARIANCE
E(Y | x) = Σ_y y f(y | x), if X and Y are discrete
E(Y | x) = ∫ y f(y | x) dy, if X and Y are continuous.
Var(Y | x) = E(Y² | x) − [E(Y | x)]²
42
CONDITIONAL EXPECTATION AND
VARIANCE
E  E Y X
 
E Y

Var ( Y )  E X ( Var ( Y | X ))  Var X ( E ( Y | X ))
(EVVE rule)
Proofs available in Casella & Berger (1990), pgs. 154 & 158
43
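Both identities can be checked exactly on the earlier two-dice example (X = sum, Y = |difference|); a Python sketch using exact rational arithmetic so the equalities hold with no rounding:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of X = sum, Y = |difference| for two fair dice.
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1 + d2, abs(d1 - d2))
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

pX = {}
for (x, y), p in joint.items():
    pX[x] = pX.get(x, Fraction(0)) + p

def cond_moment(x, k):
    """E(Y^k | X = x)."""
    return sum(y ** k * p for (xx, y), p in joint.items() if xx == x) / pX[x]

EY = sum(y * p for (x, y), p in joint.items())
E_condE = sum(cond_moment(x, 1) * px for x, px in pX.items())

VarY = sum(y ** 2 * p for (x, y), p in joint.items()) - EY ** 2
E_condVar = sum((cond_moment(x, 2) - cond_moment(x, 1) ** 2) * px
                for x, px in pX.items())
Var_condE = (sum(cond_moment(x, 1) ** 2 * px for x, px in pX.items())
             - E_condE ** 2)

print(EY == E_condE)                    # True: E[E(Y|X)] = E(Y)
print(VarY == E_condVar + Var_condE)    # True: EVVE rule
```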
Example - Advanced
• An insect lays a large number of eggs, each
surviving with probability p. Consider a large
number of mothers. X: number of survivors in
a litter; Y: number of eggs laid
• Assume:
X | Y ~ Binomial(Y, p)
Y | Λ ~ Poisson(Λ)
Λ ~ Exponential(β)
• Find: expected number of survivors, i.e. E(X)
44
Example - solution
E(X) = E(E(X|Y))
     = E(Yp)
     = p E(Y)
     = p E(E(Y|Λ))
     = p E(Λ)
     = p β
45
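Assuming Exponential(β) is parameterized by its mean β (as the step E(Λ) = β suggests), a seeded Monte Carlo sketch can sanity-check E(X) = pβ. The Poisson sampler below (Knuth's method) is our own helper, not part of the slides:

```python
import math
import random

# Monte Carlo sanity check of E(X) = p * beta for the hierarchy
# X|Y ~ Binomial(Y, p), Y|lam ~ Poisson(lam), lam ~ Exponential(mean beta).
random.seed(0)
p, beta, n = 0.5, 2.0, 100_000

def poisson(lam):
    # Knuth's method; adequate for the moderate rates arising here.
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

total = 0
for _ in range(n):
    lam = random.expovariate(1 / beta)               # mean beta
    y = poisson(lam)
    x = sum(random.random() < p for _ in range(y))   # Binomial(y, p)
    total += x

# The sample mean should be close to p * beta = 1.0.
print(abs(total / n - p * beta) < 0.05)   # True
```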
SOME MATHEMATICAL EXPECTATIONS
• Population Mean: μ = E(X)
• Population Variance (a measure of deviation from the population mean):
σ² = Var(X) = E[(X − μ)²] = E(X²) − μ² ≥ 0
• Population Standard Deviation: σ = √σ² ≥ 0
• Moments:
μ'_k = E(X^k), the k-th moment
μ_k = E[(X − μ)^k], the k-th central moment
46
SKEWNESS
• A measure of lack of symmetry in the pdf.
Skewness = E[(X − μ)³] / σ³ = μ₃ / μ₂^(3/2)
If the distribution of X is symmetric around its
mean μ, then
μ₃ = 0 ⇒ Skewness = 0.
47
KURTOSIS
• A measure of the peakedness of the pdf; describes the
shape of the distribution.
Kurtosis = E[(X − μ)⁴] / σ⁴ = μ₄ / μ₂²
Kurtosis = 3 → Normal
Kurtosis > 3 → Leptokurtic (peaked, with fat tails)
Kurtosis < 3 → Platykurtic (less peaked, with thinner tails)
48
KURTOSIS
• What is the range of kurtosis?
• Claim: Kurtosis ≥ 1. Why?
• Proof: Let Y = (X − μ)². Since
Var(Y) = E(Y²) − (E(Y))² ≥ 0,
we get
E[(X − μ)⁴] ≥ [E((X − μ)²)]² = σ⁴,
so
Kurtosis = E[(X − μ)⁴] / σ⁴ ≥ 1.
49
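The bound is attained when Var((X − μ)²) = 0, e.g. by a symmetric two-point distribution; a small Python check with exact fractions (the fair-die comparison is our own added example):

```python
from fractions import Fraction

def central_moment(pmf, k):
    """k-th central moment of a discrete pmf given as {value: prob}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** k * p for x, p in pmf.items())

# X = +/-1 with probability 1/2: (X - mu)^2 is constant, so kurtosis = 1.
two_point = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
kurt = central_moment(two_point, 4) / central_moment(two_point, 2) ** 2
print(kurt)        # 1

# A fair die, by contrast, has kurtosis strictly above 1.
die = {x: Fraction(1, 6) for x in range(1, 7)}
kurt_die = central_moment(die, 4) / central_moment(die, 2) ** 2
print(kurt_die)    # 303/175
```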
Problems
1. True or false: The mean, median and mode of
a normal distribution with mean µ and std
deviation σ coincide.
50
Problems
2. True or false: In a symmetrical population,
mean, median, and mode coincide. (Kendall &
Stuart, 1969, p. 85)
51
Problems
3. True or False: “The mean, median and mode occur in
the same order (or reverse order) as in the
dictionary; and that the median is nearer to the
mean than that to the mode, just as the
corresponding words are nearer together in the
dictionary. “ (Kendall & Stuart, 1969, p. 39)
52
Problems
4. If X, Y, Z and W are random variables, then
find (show the derivations):
a) Cov(X+Y,Z+W)
b) Cov(X-Y,Z)
53
Problems
5. Calculate
a) the skewness for f(x) = e^(−x), x > 0. Comment.
b) the kurtosis for the following pdf and
comment:
f(x) = (1/2) e^(−|x|)
54
Problems
5. c) Consider the discrete random variable X
with pdf given below:

x      -3     -1     0              2      2√2
f(x)   1/4    1/4    (6 − 3√2)/16   1/8    3√2/16

i) Is the distribution of X symmetric around its
mean?
ii) Show that the 3rd central moment, and hence
the skewness, are 0. What does this imply?
55
Problem
6. Let X1, X2, X3 be three independent r.v.s each
with variance  2 . Define new r.v.s W1, W2,
W3 by W1=X1; W2=X1+X2; W3=X2+X3.
Find Corr(W1,W2), Corr(W2,W3), Corr(W1,W3)
56