Week 1: Descriptive Statistics


Topic 5: Continuous Random Variables and Probability Distributions
CEE 11 Spring 2002
Dr. Amelia Regan
These notes draw liberally from the class text, Probability and Statistics for
Engineering and the Sciences by Jay L. Devore, Duxbury 1995 (4th edition)
Definition


A random variable X is said to be continuous if its set of possible
values is an entire interval of numbers -- that is, for some A<B,
any number x between A and B is possible.
Let X be a continuous rv. Then a probability distribution or
probability density function (pdf) of X is a function f(x) such that
for any two numbers a and b with a <= b,
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

To be a legitimate pdf, f(x) must satisfy the following two
conditions:
1. f(x) >= 0 for all x

2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
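The two conditions can be checked numerically for a candidate function. The sketch below is illustrative only: the helper names `integrate` and `is_legitimate_pdf`, the midpoint rule, and the example functions 2x and 3x on [0, 1] are assumptions made for this example and do not come from the class notes.

```python
def integrate(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width


def is_legitimate_pdf(f, lo, hi, tol=1e-6, samples=1000):
    """Check both pdf conditions for f, assumed to be zero outside [lo, hi]."""
    # Condition 1: f(x) >= 0, checked on a grid (a spot check, not a proof).
    step = (hi - lo) / samples
    nonnegative = all(f(lo + i * step) >= 0 for i in range(samples + 1))
    # Condition 2: the total area under f must equal 1.
    area = integrate(f, lo, hi)
    return nonnegative and abs(area - 1.0) < tol


# Hypothetical examples on [0, 1]: 2x integrates to 1, 3x integrates to 1.5.
print(is_legitimate_pdf(lambda x: 2 * x, 0.0, 1.0))  # True
print(is_legitimate_pdf(lambda x: 3 * x, 0.0, 1.0))  # False
```

The grid test for non-negativity can miss a narrow negative dip, so it should be read as a sanity check rather than a proof.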
Class exercise


To be a legitimate pdf, f(x) must satisfy the following two
conditions:

1. f(x)>=0 for all x, 2.  f ( x)dx  1
Which of the following functions is a legitimate pdf?
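a) $f(x) = \begin{cases} 0.5x & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$
b) $f(x) = \begin{cases} \frac{1}{25}x & 0 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$
c) $f(x) = \begin{cases} 0.75(1 - x^2) & -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$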
Definition

Let X be a continuous rv with probability density
function (pdf) f(x). The expected value or mean value
of X, denoted E(X) or $\mu_X$, is given by:
$$E(X) = \mu_X = \int_{-\infty}^{\infty} x f(x)\,dx$$

If X is a continuous rv with pdf f(x) and h(X) is any
function of X, then
$$E(h(X)) = \int_{-\infty}^{\infty} h(x) f(x)\,dx$$
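Both integrals can be approximated numerically. A minimal sketch, assuming a hypothetical pdf f(x) = 2x on [0, 1] and h(x) = x^2 + 2; the function name `expectation` and the midpoint rule are choices made for this illustration only.

```python
def expectation(h, f, lo, hi, n=100_000):
    """Approximate E(h(X)), the integral of h(x) f(x) over [lo, hi].

    Uses the midpoint rule; f is assumed to be a pdf that is zero
    outside [lo, hi].
    """
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * width
        total += h(x) * f(x)
    return total * width


def pdf(x):
    """Hypothetical pdf used only for this example: f(x) = 2x on [0, 1]."""
    return 2 * x


mean = expectation(lambda x: x, pdf, 0.0, 1.0)           # exact value: 2/3
mean_h = expectation(lambda x: x**2 + 2, pdf, 0.0, 1.0)  # exact value: 5/2
print(mean, mean_h)
```

Note that E(h(X)) is computed directly from the pdf of X; there is no need to find the distribution of h(X) first.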
Class exercise


Let X be a continuous rv with the following pdf.
Calculate the cdf F(x).
 3x
03

f (x)=  2

0 otherwise


F (x)= 

Calculate E(X):
$$E(X) = \mu_X = \int_{-\infty}^{\infty} x f(x)\,dx$$

Now let $h(x) = x^2 + 2$. Calculate E(h(X)):
E(h(X)) =
Variance of a random variable

If X is a continuous random variable with mean $\mu$,
then the variance of X is
$$\mathrm{Var}(X) = \sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E\left[(X - \mu)^2\right]$$
As shown previously for discrete distributions,
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$$
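The shortcut formula follows from expanding the square inside the integral. A short derivation, using only the definition of the variance, the definition of E(X), and the fact that a pdf integrates to 1 (all integrals run from $-\infty$ to $\infty$):

```latex
\begin{align*}
\mathrm{Var}(X) &= \int (x - \mu)^2 f(x)\,dx \\
                &= \int x^2 f(x)\,dx \;-\; 2\mu \int x f(x)\,dx \;+\; \mu^2 \int f(x)\,dx \\
                &= E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
                &= E(X^2) - \mu^2 \;=\; E(X^2) - [E(X)]^2
\end{align*}
```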
The Uniform distribution

The uniform distribution is one for which all values in the region
for which the distribution is defined are equally likely. A common
range is [0,1], though any range is possible. The pdf and cdf of
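the uniform distribution are the following:
$$f(x; A, B) = \begin{cases} \dfrac{1}{B-A} & \text{for } A \le x \le B \\ 0 & \text{otherwise} \end{cases}$$
$$F(x; A, B) = \begin{cases} 0 & \text{for } x < A \\ \dfrac{x-A}{B-A} & \text{for } A \le x \le B \\ 1 & \text{for } x > B \end{cases}$$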
The Uniform distribution

Class Exercise (Banks et al., 1995, p.203):
A bus arrives every 20 minutes at a specified stop beginning at 6:40
AM and continuing until 8:40 AM. A certain passenger does not
know the schedule but arrives randomly (uniformly distributed)
between 7:00 AM and 7:30 AM every morning.
What is the probability that the passenger waits more than 5
minutes for a bus?
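One way to set this exercise up is to measure time in minutes after 7:00 AM and apply the uniform cdf to the arrival windows that lead to a long wait. The sketch below is an assumed approach, not the textbook's worked solution; the function name `uniform_cdf` and the variable names are introduced here for illustration.

```python
def uniform_cdf(x, a, b):
    """cdf of the uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


# Minutes after 7:00 AM. The passenger arrives uniformly on [0, 30].
A, B = 0.0, 30.0

# Relevant bus departures: 7:00 (t = 0), 7:20 (t = 20), 7:40 (t = 40).
# The wait exceeds 5 minutes when the arrival time falls in
# (0, 15), waiting for the 7:20 bus, or in (20, 30), waiting for the 7:40 bus.
p_first = uniform_cdf(15, A, B) - uniform_cdf(0, A, B)
p_second = uniform_cdf(30, A, B) - uniform_cdf(20, A, B)
p_wait_over_5 = p_first + p_second

print(round(p_wait_over_5, 4))  # 0.8333, i.e. 5/6
```

The two windows have lengths 15 and 10 minutes out of a 30 minute arrival range, which is where the 25/30 comes from.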
The Uniform distribution

E(X) and Var(X) can be obtained by integration:
$$f(x; A, B) = \begin{cases} \dfrac{1}{B-A} & \text{for } A \le x \le B \\ 0 & \text{otherwise} \end{cases}$$

Class exercise:

Integrate the pdf above to find the expressions for E(X) and Var(X).
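For checking your work, here is one way the integration can go. It uses the uniform density 1/(B-A) on [A, B] and the shortcut Var(X) = E(X^2) - [E(X)]^2.

```latex
\begin{align*}
E(X)   &= \int_A^B \frac{x}{B-A}\,dx = \frac{B^2 - A^2}{2(B-A)} = \frac{A+B}{2} \\[4pt]
E(X^2) &= \int_A^B \frac{x^2}{B-A}\,dx = \frac{B^3 - A^3}{3(B-A)} = \frac{A^2 + AB + B^2}{3} \\[4pt]
\mathrm{Var}(X) &= \frac{A^2 + AB + B^2}{3} - \frac{(A+B)^2}{4}
                 = \frac{A^2 - 2AB + B^2}{12} = \frac{(B-A)^2}{12}
\end{align*}
```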
The Normal distribution

The normal distribution is probably the most used of the known
probability distributions. The pdf is messy:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$$
In fact, the pdf of the normal distribution cannot be integrated in
closed form, so the cdf has no simple expression. However, the
normal distribution has some properties that make it easy to work
with.
The Normal distribution

When we discuss the normal distribution we must specify its
mean $\mu$ and standard deviation $\sigma$. The normal distribution with
$\mu = 0$ and $\sigma = 1.0$ is known as the standard normal distribution. Its
graph is shown below:
[Figure: Standard Normal Curve; density from 0 to 0.45 plotted for z from -4 to 4]
The Normal distribution



The normal distribution is a family of distributions.
Any normally distributed variable can be transformed to the
standard normal distribution, which has mean 0 and standard
deviation 1.
A normally distributed random variable X with mean $\mu$ and
standard deviation $\sigma$ is related to a standard normal
variable Z as follows:
$$Z = \frac{X - \mu}{\sigma}$$
We “standardize” by subtracting a constant and dividing the
difference by another constant.
The Normal distribution
What happens to the mean and the standard deviation when we
standardize?

Remember that E(aX+b) = aE(X) + b
$$E\left(\frac{X-\mu}{\sigma}\right) = E\left(\frac{X}{\sigma} - \frac{\mu}{\sigma}\right) = \frac{1}{\sigma}E(X) - \frac{\mu}{\sigma} = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0$$

Remember also that Var(aX+b) = a^2 Var(X)
$$\mathrm{Var}\left(\frac{X-\mu}{\sigma}\right) = \mathrm{Var}\left(\frac{X}{\sigma} - \frac{\mu}{\sigma}\right) = \left(\frac{1}{\sigma}\right)^2 \mathrm{Var}(X) = \frac{1}{\sigma^2}\,\sigma^2 = 1$$

Therefore the standard deviation is also 1.
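The same result can be seen empirically by simulating normal values and standardizing them. This is a rough sketch under assumed values: the parameters 34 and 3.5, the seed, and the sample size are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(5)   # fixed seed so the run is repeatable
mu, sigma = 34.0, 3.5    # illustrative parameters (assumed for this sketch)

# Draw normally distributed values, then standardize each one.
xs = [rng.gauss(mu, sigma) for _ in range(20_000)]
zs = [(x - mu) / sigma for x in xs]

z_mean = statistics.fmean(zs)   # should be close to 0
z_sd = statistics.pstdev(zs)    # should be close to 1
print(round(z_mean, 2), round(z_sd, 2))
```

The sample mean and standard deviation of the standardized values will not be exactly 0 and 1, only close, because they are computed from a finite random sample.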
The Normal distribution

If the population distribution of a variable is (approximately)
normal then



Roughly 68% of the values are within one standard deviation
of the mean
Roughly 95% of the values are within two standard
deviations of the mean
Roughly 99.7% of the values are within three standard
deviations of the mean
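These three percentages can be reproduced from the standard normal cdf. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `within` is introduced here for illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution


def within(k):
    """P(-k < Z < k): probability of falling within k standard deviations."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd: {within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

Because any normal variable can be standardized, the same percentages hold for every mean and standard deviation.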
The Normal distribution
In order to calculate probabilities associated with normal random
variables we use tables (or a built-in function which uses an
approximation method to estimate the values).



To use the tables, we must transform our data to the standard
normal distribution
For example, if X is approximately normally distributed with
mean 34 and standard deviation 3.5, what is the probability that
an observed value will be less than 32?
$$z = \frac{32 - 34}{3.5} = -0.571, \qquad P(Z < -0.571) = 0.2838$$
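The table lookup can be replaced by a built-in function. The sketch below uses `statistics.NormalDist` from the Python standard library (an assumption about the tool; the notes do not name one) and computes the probability both on the original scale and after standardizing.

```python
from statistics import NormalDist

X = NormalDist(mu=34, sigma=3.5)   # distribution from the example

z = (32 - 34) / 3.5                # standardized value, about -0.571
p_direct = X.cdf(32)               # P(X < 32) on the original scale
p_standard = NormalDist().cdf(z)   # P(Z < z) on the standard normal scale

print(round(z, 3), p_direct, p_standard)
```

Both routes give the same probability, roughly 0.284, which agrees with the table value quoted above to about three decimal places. Small differences in the last digit come from rounding z before the lookup.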
Class exercise




Again if x has mean 34 and standard deviation 3.5, calculate the
following probabilities:
X > 39
X < 36
X < 30
The Exponential distribution

The exponential distribution is among the most useful in science
and engineering. It is typically used to predict the time between
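events. Its pdf and cdf are the following:
$$f(x; \alpha) = \begin{cases} \alpha e^{-\alpha x} & \text{for } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
$$F(x; \alpha) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - e^{-\alpha x} & \text{for } x \ge 0 \end{cases}$$

Please note that this notation is slightly different from that
of the text, which uses $\lambda$ instead of $\alpha$ as the parameter.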
The Exponential distribution

The mean and variance of the exponential distribution are given
below:
$$E(X) = \mu = \frac{1}{\alpha}, \qquad \mathrm{Var}(X) = \sigma^2 = \frac{1}{\alpha^2}$$
The reason for the different notation is important -- it's
because of the relationship between the exponential and
Poisson distributions. If the number of events in a time
period of length t (often called arrivals) is Poisson distributed with
parameter $\lambda = \alpha t$, then the time between successive arrivals is
exponentially distributed with parameter $\alpha$.
The exponential and the Poisson

The time between calls to a suicide help line in a busy city is
approximately exponentially distributed with mean $\mu = 0.50$
hours.
Calculate the probability that no calls will be received in a
1.0 hour period.

Now assume that the number of calls received per hour is
approximately Poisson distributed with parameter $\lambda = 2$
per hour.
Calculate the probability that in a 1.0 hour period zero
calls are received.
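The two questions can be worked side by side to see that they give the same number. A minimal sketch using only the `math` module; the variable names are chosen for this example.

```python
import math

mean_gap_hours = 0.50          # mean time between calls, from the exercise
alpha = 1.0 / mean_gap_hours   # exponential parameter: 2 calls per hour
t = 1.0                        # length of the period in hours

# Exponential view: no calls in t hours means the next gap exceeds t.
# P(X > t) = 1 - F(t) = exp(-alpha * t)
p_exponential = math.exp(-alpha * t)

# Poisson view: the number of calls in t hours has parameter lam = alpha * t.
# P(N = 0) = exp(-lam) * lam**0 / 0!
lam = alpha * t
p_poisson = math.exp(-lam) * lam**0 / math.factorial(0)

print(round(p_exponential, 4), round(p_poisson, 4))  # 0.1353 0.1353
```

The agreement is not a coincidence: a mean gap of 0.50 hours corresponds to a rate of 2 calls per hour, so both calculations reduce to exp(-2).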
The exponential and the Poisson
You absolutely must understand the relationship between
these distributions.
If the time between events is approximately exponentially
distributed, then the number of events in a time period is
approximately Poisson distributed.
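A small simulation can make this relationship concrete: generate exponential gaps between events, count how many events land in each period, and compare the counts with what a Poisson distribution predicts. This is an illustrative sketch; the function name `count_arrivals`, the seed, the rate of 2 per hour, and the number of trials are all assumptions made for the example.

```python
import random


def count_arrivals(alpha, t, rng):
    """Count events in [0, t] when gaps between events are exponential(alpha)."""
    clock, count = 0.0, 0
    while True:
        clock += rng.expovariate(alpha)  # next exponential interarrival time
        if clock > t:
            return count
        count += 1


rng = random.Random(11)  # fixed seed so the run is repeatable
alpha, t, trials = 2.0, 1.0, 50_000
counts = [count_arrivals(alpha, t, rng) for _ in range(trials)]

sample_mean = sum(counts) / trials
sample_var = sum((c - sample_mean) ** 2 for c in counts) / trials
share_zero = counts.count(0) / trials

# For a Poisson count with parameter alpha * t = 2, the mean and variance
# are both 2 and P(0 events) = exp(-2), about 0.135.
print(sample_mean, sample_var, share_zero)
```

The simulated mean and variance should both land near 2, and the share of periods with no events near 0.135, matching the Poisson values up to sampling noise.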