All of Statistics Chapter 5: Convergence of Random Variables Nick Schafer


All of Statistics
Chapter 5: Convergence of Random Variables
Nick Schafer
Overview



What are we studying? Probability.
What is probability? The mathematical language for quantifying uncertainty.
Why are we studying probability? Dan has his reasons…
Motivation

These are the kinds of questions that we hope to be able to answer.
Review



Definition: Random Variable – a random variable X is a mapping from the sample space to the real line.
Example: Flip a fair coin twice. The sample space is every combination of heads and tails. Choose the random variable X to be the number of heads. Let our outcome be one head and one tail. X maps this outcome to 1.
Note: The notation can be confusing. In the book, X usually denotes the map and X with a subscript denotes a real number. However, this is not always the case, so you must examine the context to be sure of the meaning.
$X : \Omega \to \mathbb{R}$, where $\Omega = \{HH, HT, TH, TT\}$
$X(HT) = 1$
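As an aside (my own illustration, not from the book), the mapping view is easy to make concrete in code. Here is a minimal Python sketch that enumerates the sample space of two coin flips and applies X = number of heads:

```python
# Minimal sketch: a random variable as a mapping from the sample
# space {HH, HT, TH, TT} to the real line.
from itertools import product

sample_space = ["".join(flips) for flips in product("HT", repeat=2)]
X = {outcome: outcome.count("H") for outcome in sample_space}  # X = number of heads

print(sample_space)  # ['HH', 'HT', 'TH', 'TT']
print(X["HT"])       # 1 -- X maps the outcome HT to the real number 1
```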
Sequences of Random Variables




Much of probability theory is concerned with large sequences of random variables. This study is sometimes known as large sample theory, limit theory, or asymptotic theory.
What is a sequence of random variables? Simply an indexed set of random variables.
We will be interested in sequences that have some interesting limiting behavior (i.e., we can say something about them as n gets large):
$X_1, \ldots, X_n$
A Special Kind of Sequence of Random Variables


As it turns out, a very common and particularly useful class of sequences of random variables is IID.
Definition: IID – independent and identically distributed.
Independent – essentially, the value of one random variable doesn't affect the value of any other; for instance, a coin doesn't remember which side last landed up, so consecutive flips are said to be independent.
Identically distributed – each random variable X has associated with it a cumulative distribution function (CDF), which is derived from the probability measure $P : \mathcal{A} \to [0,1]$. The CDF gives the probability of the value of the random variable being less than or equal to a certain value: $F_X(x) = P(X \le x)$. When two or more random variables have the same CDF, we say they are identically distributed: $X_1, \ldots, X_n \sim F$.
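As a quick illustration (mine, not from the slides): identically distributed variables share one CDF F, so two halves of an IID sample should give nearly the same empirical CDF. A sketch in Python, assuming a standard Normal purely for concreteness:

```python
# Sketch: two halves of an IID sample estimate the same CDF F.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
xs = rng.normal(loc=0.0, scale=1.0, size=n)  # X_1, ..., X_n IID ~ N(0, 1)

def ecdf(sample, t):
    """Empirical CDF: fraction of the sample at or below t."""
    return np.mean(sample <= t)

# Both halves approximate the same F(t) = P(X <= t).
for t in (-1.0, 0.0, 1.0):
    print(t, ecdf(xs[: n // 2], t), ecdf(xs[n // 2 :], t))
```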
Statements about sequences of random variables

Given a sequence of random variables, most likely IID, it would be useful to be able to make statements of the form: the average of all Xi will be between two values with a certain probability. Or: what is the probability that the average of the Xi is less/greater than a certain value?
These kinds of statements can be made with the help of the Weak Law of Large Numbers (WLLN) and the Central Limit Theorem (CLT), respectively. So why not state them now? "Hold your horses" there, Makarand. The statements of the WLLN and the CLT make use of a few different types of convergence, which must be discussed first.


Example: Flip a fair coin n times. The average number of heads per toss will be between .4 and .6 with probability greater than or equal to 70% if we flip 84 times (n = 84).
Types of Convergence

There are two main types of convergence:
Convergence in Probability (CIP) – a sequence of random variables is said to converge in probability to X if the probability of it differing from X goes to zero as n gets large.
Convergence in Distribution (CID) – a sequence of random variables is said to converge in distribution to X if the limit of the corresponding CDFs is the CDF of X.
There is also another type, called convergence in quadratic mean (qm), which is used primarily because it is stronger than CIP or CID (it implies both) and it can be computed relatively easily. A simulation sketch follows the formulas below.
$P(|X_n - X| > \varepsilon) \to 0 \iff X_n \xrightarrow{P} X$
$\lim_{n \to \infty} F_n(t) = F(t) \iff X_n \rightsquigarrow X$
$X_n \xrightarrow{qm} X$
$\text{qm} \implies \text{CIP} \implies \text{CID}$
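To see convergence in probability in action, here is a small Monte Carlo sketch (my own illustration; the choices of p, epsilon, and the trial count are arbitrary): for coin flips with p = 0.5, the estimated probability that the sample average differs from p by more than epsilon shrinks as n grows.

```python
# Sketch: estimate P(|Xbar_n - p| > eps) by simulation; under
# convergence in probability this should go to 0 as n grows.
import numpy as np

rng = np.random.default_rng(1)
p, eps, trials = 0.5, 0.05, 2000

for n in (10, 100, 1000, 10_000):
    xbars = rng.binomial(n, p, size=trials) / n  # sample averages over many runs
    print(n, np.mean(np.abs(xbars - p) > eps))   # shrinks toward 0
```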
Weak Law of Large Numbers


If a sequence of random variables is IID, then the sample average converges in probability to the expectation value.
On the left we have information about many trials, and on the right we have information about the relative likelihood of the different values a random variable can take on. In words, the WLLN says that the distribution of the sample average becomes more concentrated around the expectation as n gets large.
$\bar{X}_n = \frac{1}{n} \sum_i X_i$
$\bar{X}_n \xrightarrow{P} \mu = E(X_1) = \int x f(x)\,dx$
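A quick way to see the WLLN at work (a sketch of mine, not from the book): track the running sample average of one long IID sequence and watch it settle near E(X_1). The exponential distribution here is an arbitrary choice:

```python
# Sketch: the running sample average drifts toward E(X_1) = 0.5.
import numpy as np

rng = np.random.default_rng(2)
xs = rng.exponential(scale=0.5, size=100_000)         # E(X_1) = 0.5
running_avg = np.cumsum(xs) / np.arange(1, xs.size + 1)

for n in (10, 100, 1000, 100_000):
    print(n, running_avg[n - 1])                      # approaches 0.5
```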
Example of using the WLLN


Consider flipping a coin for which the probability of heads is p. Let Xi denote the outcome of a single toss (either 0 or 1). Hence p = P(Xi = 1) = E(Xi). The first equality is a definition; the second is obtained by averaging over the distribution. The fraction of heads after n tosses is equal to the sample average. Note that the Xi are IID, so the WLLN can be applied. The WLLN says that the sample average converges to p = E(Xi) in probability.
You may find yourself wondering: how many times must I flip this coin such that the sample average is between .4 and .6 with probability greater than or equal to 70%? The WLLN tells you that it is possible to find such an n. The inequalities that Justin presented from Chapter 4 can be used to show that n = 84 does the trick in this case; I'll spare you the details here, but a sketch of the calculation follows the formulas below.
$E(X) = \sum_i x_i f(x_i) = 0 \cdot (1/2) + 1 \cdot (1/2) = 1/2$
$\bar{X}_n = \frac{1}{n} \sum_i X_i$
$\bar{X}_n \xrightarrow{P} \mu = E(X_1)$
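For the curious, here is a sketch of the spared details, assuming the Chapter 4 inequality in question is Chebyshev's: $P(|\bar{X}_n - 1/2| \ge \varepsilon) \le \mathrm{Var}(\bar{X}_n)/\varepsilon^2$, and the resulting guarantee first reaches 70% at n = 84.

```python
# Sketch of the Chebyshev calculation behind n = 84:
# P(|Xbar_n - 1/2| >= 0.1) <= Var(Xbar_n) / 0.1**2 = p*(1-p) / (n * 0.01).
eps = 0.1
p = 0.5                                  # fair coin: Var(X_i) = p * (1 - p) = 0.25

for n in (83, 84):
    tail_bound = p * (1 - p) / (n * eps**2)
    print(n, 1 - tail_bound)             # guaranteed P(0.4 <= Xbar_n <= 0.6)
# n = 83 -> ~0.6988 (just misses 70%); n = 84 -> ~0.7024 (>= 70%)
```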
The Central Limit Theorem

Given a sequence of random variables with a mean and a variance, the CLT says that the sample average has a distribution which is approximately Normal, and gives the new mean and variance:
$X_1, \ldots, X_n;\ \mu;\ \sigma^2 \implies \bar{X}_n \approx N(\mu, \sigma^2 / n)$

Notice that nothing at all need be assumed about the P, CDF, or PDF associated with X, which could have any distribution from which a mean and variance can be derived.
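A simulation sketch of this point (my own illustration, with an arbitrarily chosen skewed distribution): standardize the sample averages of exponential draws and check that their tail frequencies match the standard Normal CDF.

```python
# Sketch: standardized sample averages look N(0, 1) even though the
# underlying X_i are skewed exponential(1) draws (mu = 1, sigma = 1).
import numpy as np

rng = np.random.default_rng(3)
n, trials = 50, 100_000
mu, sigma = 1.0, 1.0

xbars = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
z = (xbars - mu) / (sigma / np.sqrt(n))   # standardized sample averages

print(np.mean(z <= 1.0))  # ~0.84, close to Phi(1.0) = 0.8413
print(np.mean(z <= 2.0))  # ~0.98, close to Phi(2.0) = 0.9772
```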
Example of using the CLT


Suppose that the number of errors per computer program has a Poisson distribution with mean 5. We get 125 programs. Approximately what is the probability that the average number of errors per computer program is less than 5.5?
In this case 125 is the sample size, which we hope is large enough to make a good approximation. The approximation we are making here is that the sample average has a Normal distribution. Taking the sample size, mean, and variance into account, it is possible to show that the question asked is equivalent to the probability of the standard Normal distribution being less than 2.5, which turns out to be approximately 0.9938.
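The arithmetic behind that answer, as a short sketch: for a Poisson distribution the variance equals the mean, so sigma^2 = 5, and the CLT gives $\bar{X}_n \approx N(5, 5/125)$.

```python
# Sketch: P(Xbar_n < 5.5) with X_i ~ Poisson(5) and n = 125.
import math

mu = 5.0
var = 5.0                        # Poisson: variance = mean
n = 125

z = (5.5 - mu) / math.sqrt(var / n)               # = 0.5 / 0.2 = 2.5
phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard Normal CDF at z
print(z, phi)                    # 2.5, ~0.9938
```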
Topics in Chapter 5 not covered in this presentation


All proofs
Slutsky's theorem and related theorems – the effect of adding sequences of random variables on their convergence behavior
The multivariate central limit theorem – the CLT with IID random vectors instead of random variables
The delta method – the effect of applying a smooth function to a sequence of random variables on its limiting behavior
Interesting problems
Problems 6 & 8
Bibliography

Chapters 1–5 of All of Statistics by Larry Wasserman