LECTURE 2 - :: EUIS -- Enterprise University Information

TOPIC 5: Normal Distributions
Start Thinking
• As a web designer, you face a task that involves a continuous measurement: download time, which can be any value and not just a whole number. How can you answer the following questions?
  - What proportion of the homepage downloads take more than 10 seconds?
  - How many seconds elapse before 10% of the downloads are complete?
  - etc.
Continuous Probability Distributions
• Uniform
• Exponential
• Normal
• Gamma
• Weibull
• Beta
Normal Distribution
1. Also called the Gaussian distribution
2. 'Bell-shaped' and symmetrical
3. Mean, median, and mode are equal
4. The random variable has infinite range
5. The area under the curve is 1 (the total probability equals 1)
6. Can be used to approximate discrete probability distributions, for example the Binomial and Poisson
7. Basis for classical statistical inference
Probability Density Function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
• σ = standard deviation
• π = 3.14159
• x = value of the random variable (−∞ < x < ∞)
• μ = mean
• e = 2.71828
The mean and the variance are
$$E(X) = \mu \quad \text{and} \quad \mathrm{Var}(X) = \sigma^2$$
The notation that denotes that the random variable X has a normal distribution is
$$X \sim N(\mu, \sigma^2)$$
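As a quick check of the density formula, here is a minimal Python sketch that evaluates f(x) term by term and compares it with the standard library's `statistics.NormalDist`. The function name `normal_pdf` and the values μ = 5, σ = 2 are illustrative choices, not part of the lecture.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x, written out from the formula."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)


# Illustrative parameters (assumed for this sketch only).
mu, sigma = 5.0, 2.0
for x in (1.0, 5.0, 7.5):
    # The two columns should agree to floating-point precision.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The density is largest at x = μ and is symmetric about it, which matches properties 2 and 3 listed earlier.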

Effect of Varying Parameters (μ & σ)
• Normal distributions differ by mean & standard deviation
• Each distribution would require its own table.
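[Figure: three normal density curves f(x) plotted against x, labelled A, B and C, for X ~ N(5, 0.5), X ~ N(5, 2) and X ~ N(10, 2); the values 5 and 10 are marked on the x-axis.]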
That's an infinite number of tables! So we need a "standardized" normal distribution.
The Standard Normal Distribution
One table!
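[Figure: a normal distribution X ~ N(μ, σ²) is mapped to the standard normal distribution Z ~ N(0, 1), which has mean μ = 0 and standard deviation σ = 1; values of Z are negative to the left of 0 and positive to the right.]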
Normal Distribution Probability
• If X ~ N(μ, σ²) then the transformed random variable is
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
• The random variable Z is known as the "standardized" version of the random variable X.
• The probability values of a general normal distribution can be related to the cumulative distribution function of the standard normal distribution, Φ(z):
$$P(X \le a) = \Phi(z) = P(Z \le z)$$
where
$$z = \frac{a - \mu}{\sigma}$$
Example
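Normal distribution: X ~ N(3, 0.8²), so μ = 3 and σ = 0.8.
Standard normal distribution: Z ~ N(0, 1), so μ = 0 and σ = 1.
$$P(X \le 4.2) = ?$$
$$z = \frac{a - \mu}{\sigma} = \frac{4.2 - 3}{0.8} = 1.5$$
$$\Phi(z) = P(Z \le z) \;\Rightarrow\; \Phi(1.5) = P(Z \le 1.5)$$
Therefore P(X ≤ 4.2) = Φ(1.5) = P(Z ≤ 1.5) = ?
[Figure: the area to the left of x = 4.2 under the normal curve equals the area to the left of z = 1.5 under the standard normal curve.]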
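The standardization step can also be sketched in code. The snippet below is a minimal illustration, assuming Python's `math.erf` and the identity Φ(z) = ½[1 + erf(z/√2)]; the helper names `phi` and `normal_cdf` are hypothetical. It evaluates the example with μ = 3, σ = 0.8 and a = 4.2.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF, Phi(z) = P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(a: float, mu: float, sigma: float) -> float:
    """P(X <= a) for X ~ N(mu, sigma^2), by standardizing first."""
    z = (a - mu) / sigma  # z = (a - mu) / sigma
    return phi(z)


p = normal_cdf(4.2, mu=3.0, sigma=0.8)  # same as Phi(1.5)
print(round(p, 4))  # about 0.9332, the table entry for z = 1.50
```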
Standard Normal Tables
• The values of the cumulative distribution function of the standard normal distribution, Φ(z), that is, the probabilities P(Z ≤ z), are already tabulated.
[Figure: standard normal curve (μ = 0, σ = 1) with a point z marked on the Z axis.]
Normal Distribution Probability
• The table only gives the cumulative distribution function of the standard normal distribution, Φ(z) or P(Z ≤ z), which is what we use to find P(X ≤ a). Using the same table, we can find the other probabilities:
$$P(a \le X \le b) = P(X \le b) - P(X \le a)$$
$$P(X \ge a) = 1 - P(X \le a)$$
[Figure: the area between a and b, and the area to the right of a, under the normal curve.]
Normal Distribution μ = 5, σ = 10 :
P(5 < X< 24.6) = ?
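$$z_{\text{lower}} = \frac{a - \mu}{\sigma} = \frac{5 - 5}{10} = 0$$
$$z_{\text{upper}} = \frac{b - \mu}{\sigma} = \frac{24.6 - 5}{10} = 1.96$$
$$P(5 < X < 24.6) = P(0 < Z < 1.96) = P(Z \le 1.96) - P(Z \le 0)$$
[Figure: the area between 5 and 24.6 under the normal curve (μ = 5, σ = 10) equals the area between 0 and 1.96 under the standard normal curve (μ = 0, σ = 1).]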
Look up P(Z ≤ 1.96) in the standard normal probability table:
Z     0.04    0.05    0.06
1.8   0.9671  0.9678  0.9686
1.9   0.9738  0.9744  0.9750
2.0   0.9793  0.9798  0.9803
2.1   0.9838  0.9842  0.9846
$$P(5 < X < 24.6) = P(0 < Z < 1.96) = P(Z \le 1.96) - P(Z \le 0) = 0.9750 - 0.5000 = 0.4750$$
[Figure: standard normal curve with the area 0.4750 between 0 and 1.96.]
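The interval calculation can be checked without a table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function name `interval_probability` is an illustrative assumption.

```python
from statistics import NormalDist


def interval_probability(lower: float, upper: float,
                         mu: float, sigma: float) -> float:
    """P(lower <= X <= upper) = P(X <= upper) - P(X <= lower)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


p = interval_probability(5.0, 24.6, mu=5.0, sigma=10.0)
print(round(p, 4))  # about 0.475, matching 0.9750 - 0.5000
```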
Normal Distribution μ = 5, σ = 10 :
P(X ≥ 8) = ?
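$$z = \frac{a - \mu}{\sigma} = \frac{8 - 5}{10} = 0.3$$
$$P(X \ge 8) = P(Z \ge 0.3) = 1 - P(Z \le 0.30) = 1 - 0.6179 = 0.3821$$
Look up P(Z ≤ 0.30) in the table.
[Figure: the area to the right of 8 under the normal curve (μ = 5, σ = 10) equals the area to the right of 0.3 under the standard normal curve (μ = 0, σ = 1).]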
Normal Distribution Example
You work in Quality Control for GE. Light bulb life has a normal distribution with μ = 2000 hours and σ = 200 hours. What's the probability that a bulb will last
a) between 1800 and 2200 hours?
b) less than 1470 hours?
c) more than 2500 hours?
Example Solution a)
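$$z_{\text{low}} = \frac{a - \mu}{\sigma} = \frac{1800 - 2000}{200} = -1.0$$
$$z_{\text{up}} = \frac{b - \mu}{\sigma} = \frac{2200 - 2000}{200} = 1.0$$
$$P(1800 \le X \le 2200) = P(-1 \le Z \le 1) = P(Z \le 1) - P(Z \le -1) = 0.8413 - 0.1587 = 0.6826$$
[Figure: the area between 1800 and 2200 under the normal curve (μ = 2000, σ = 200) equals the area between -1.0 and 1.0 under the standardized normal curve (μ = 0, σ = 1).]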
Example Solution b)
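$$z = \frac{a - \mu}{\sigma} = \frac{1470 - 2000}{200} = -2.65$$
$$P(X \le 1470) = P(Z \le -2.65) = 0.0040$$
[Figure: the area 0.0040 to the left of 1470 under the normal curve (μ = 2000, σ = 200) equals the area to the left of -2.65 under the standardized normal curve (μ = 0, σ = 1).]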
Example Solution c)
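$$z = \frac{a - \mu}{\sigma} = \frac{2500 - 2000}{200} = 2.50$$
$$P(X \ge 2500) = P(Z \ge 2.5) = 1 - P(Z \le 2.5) = 1 - 0.9938 = 0.0062$$
[Figure: the area to the right of 2500 under the normal curve (μ = 2000, σ = 200) equals the area to the right of 2.5 under the standardized normal curve (μ = 0, σ = 1).]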
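All three parts of the light bulb example can be reproduced in a few lines. The sketch below assumes `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Small differences from the slide values come from the table rounding z and Φ(z) to a few decimals.

```python
from statistics import NormalDist

# Bulb life in hours: mu = 2000, sigma = 200.
bulb_life = NormalDist(mu=2000, sigma=200)

# a) between 1800 and 2200 hours
p_between = bulb_life.cdf(2200) - bulb_life.cdf(1800)
# b) less than 1470 hours
p_less = bulb_life.cdf(1470)
# c) more than 2500 hours (complement of the lower tail)
p_more = 1.0 - bulb_life.cdf(2500)

print(round(p_between, 4), round(p_less, 4), round(p_more, 4))
```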
The Empirical Rule
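The empirical rule is usually stated as follows: for a normal distribution, about 68% of the values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. A minimal sketch that checks these figures, assuming `statistics.NormalDist` and a hypothetical helper `within_k_sigma`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mu = 0, sigma = 1


def within_k_sigma(k: float) -> float:
    """P(-k <= Z <= k): probability of lying within k standard deviations."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))
```

Because any X ~ N(μ, σ²) standardizes to Z, the same three probabilities hold for every normal distribution.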
Finding Random Variable X
for Known Probabilities
Given that P(X ≤ a) = 0.6216, what is a?
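Firstly, find the value of z! Look up the standard normal probability table:
Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255
The closest value to 0.6216 in the table is .6217, so z = 0.31.
[Figure: standard normal curve (μ = 0, σ = 1) with the area .6217 to the left of z = 0.31.]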
Finding Random Variable X
for Known Probabilities
Secondly, find the value of X = a !
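For the normal distribution with μ = 5 and σ = 10:
$$Z = \frac{X - \mu}{\sigma}$$
$$X = \mu + Z\sigma = 5 + (0.31)(10) = 8.1$$
[Figure: the area .6217 to the left of z = 0.31 under the standard normal curve corresponds to the area to the left of x = 8.1 under the normal curve.]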
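The two-step lookup (probability to z, then z to x) corresponds to the inverse cumulative distribution function. Here is a minimal sketch, assuming `NormalDist.inv_cdf` from the Python standard library; the helper name `value_for_probability` is hypothetical.

```python
from statistics import NormalDist


def value_for_probability(p: float, mu: float, sigma: float):
    """Return (z, x) such that P(Z <= z) = p and x = mu + z * sigma."""
    z = NormalDist().inv_cdf(p)  # step 1: find z from the probability
    x = mu + z * sigma           # step 2: undo the standardization
    return z, x


z, a = value_for_probability(0.6216, mu=5.0, sigma=10.0)
print(round(z, 2), round(a, 1))  # z is about 0.31 and a is about 8.1
```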
Exercise
1. The thicknesses of metal plates made by a particular machine are normally distributed with a mean of 4.3 mm and a standard deviation of 0.12 mm.
a) What proportion of the metal plates have a thickness outside the range 4.1 to 4.5 mm?
b) What are the upper and lower quartiles of the metal plate thickness?
c) What is the value of c for which there is an 80% probability that a metal plate has a thickness within the interval [4.3 - c, 4.3 + c]?
Answer to the Exercise
μ = 4.3 mm and σ = 0.12 mm
a)
$$P(X < 4.1 \text{ or } X > 4.5) = 1 - P(4.1 \le X \le 4.5) = 1 - \left[P(X \le 4.5) - P(X \le 4.1)\right]$$
$$z_{\text{low}} = \frac{4.1 - 4.3}{0.12} = -1.67, \qquad z_{\text{up}} = \frac{4.5 - 4.3}{0.12} = 1.67$$
$$P(X < 4.1 \text{ or } X > 4.5) = 1 - \left[P(Z \le 1.67) - P(Z \le -1.67)\right] = 1 - (0.9525 - 0.0475) = 0.095 = 9.5\%$$
b) Lower quartile: P(X ≤ a) = 0.25, and upper quartile: P(X ≤ a) = 0.75.
$$P(X \le a) = 0.25 \;\Rightarrow\; z = -0.67, \qquad X = \mu + Z\sigma = 4.3 + (-0.67)(0.12) = 4.2196$$
$$P(X \le a) = 0.75 \;\Rightarrow\; z = 0.67, \qquad X = \mu + Z\sigma = 4.3 + (0.67)(0.12) = 4.3804$$
Answer to the Exercise
c) It means P(a ≤ X ≤ b) = 80% with a = 4.3 - c and b = 4.3 + c, so P(X ≤ a) = 10% = 0.1 and P(X ≤ b) = 90% = 0.9. Pick either one.
$$P(X \le a) = 0.10 \;\Rightarrow\; z = -1.28$$
$$X = \mu + Z\sigma = \mu - c \;\Rightarrow\; c = -Z\sigma$$
$$c = 1.28 \times 0.12 = 0.1536$$
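The three answers can be cross-checked numerically. The sketch below assumes `statistics.NormalDist`; the variable names are illustrative. For part c) the 10% point of Z is about -1.28, which gives c of roughly 0.154. Values differ slightly from hand calculations because the table rounds z to two decimals.

```python
from statistics import NormalDist

thickness = NormalDist(mu=4.3, sigma=0.12)  # plate thickness in mm

# a) proportion outside the range 4.1 to 4.5 mm
p_outside = 1.0 - (thickness.cdf(4.5) - thickness.cdf(4.1))

# b) lower and upper quartiles
q_lower = thickness.inv_cdf(0.25)
q_upper = thickness.inv_cdf(0.75)

# c) half-width c with P(4.3 - c <= X <= 4.3 + c) = 0.80,
#    found from the 10% point: 4.3 - c = inv_cdf(0.10)
c = thickness.mean - thickness.inv_cdf(0.10)

print(round(p_outside, 4), round(q_lower, 4), round(q_upper, 4), round(c, 4))
```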
Linear Combination of Normal
Random Variables
• Linear Functions of a Normal Random Variable
  - If X ~ N(μ, σ²) and a and b are constants then
$$Y = aX + b \sim N(a\mu + b, \; a^2\sigma^2)$$
• The Sum of Two Independent Normal Random Variables
  - If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent random variables then
$$Y = a_1 X_1 + a_2 X_2 \sim N(a_1\mu_1 + a_2\mu_2, \; a_1^2\sigma_1^2 + a_2^2\sigma_2^2)$$
• Averaging Independent Normal Random Variables
  - If Xᵢ ~ N(μ, σ²), 1 ≤ i ≤ n, are independent random variables then their average X̄ is distributed
$$\bar{X} \sim N\left(\mu, \; \frac{\sigma^2}{n}\right)$$
Example
The annual return of the stock of company A, X_A say (in percent), is distributed
$$X_A \sim N(8.0, \; 1.5^2) = N(8.0, \; 2.25)$$
In addition, suppose that the annual return from the stock of company B, X_B say, is distributed
$$X_B \sim N(9.5, \; 4.0)$$
independent of the stock of company A.
a) What is the probability that company B’s stock performs better
than company A’s stock?
b) What is the probability that company B’s stock performs at least
2% points better than company A’s stock?
Example Solution
a) Let Y = X_B - X_A, that is, a₁ = 1 and a₂ = -1 in
$$Y = a_1 X_1 + a_2 X_2 \sim N(a_1\mu_1 + a_2\mu_2, \; a_1^2\sigma_1^2 + a_2^2\sigma_2^2)$$
$$Y \sim N\left(9.5 + (-1)(8.0), \; 4 + (-1)^2 (2.25)\right)$$
$$Y \sim N(1.5, \; 6.25), \quad \text{or} \quad \mu = 1.5 \text{ and } \sigma = \sqrt{6.25} = 2.5$$
Performs better means Y ≥ 0.
$$z = \frac{0 - 1.5}{\sqrt{6.25}} = -0.60$$
$$P(Y \ge 0) = 1 - P(Y \le 0) = 1 - P(Z \le -0.60) = 1 - 0.2743 = 0.7257$$
Example Solution
b) It means Y ≥ 2.0.
$$z = \frac{2.0 - 1.5}{\sqrt{6.25}} = 0.20$$
$$P(Y \ge 2.0) = 1 - P(Y \le 2.0) = 1 - P(Z \le 0.20) = 1 - 0.5793 = 0.4207$$
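The rule for a linear combination of two independent normal variables is easy to encode. This is a minimal sketch, assuming `statistics.NormalDist`; the function name `linear_combination` and its argument order are illustrative. It rebuilds Y = X_B - X_A from the example and evaluates both tail probabilities.

```python
import math
from statistics import NormalDist


def linear_combination(a1, mu1, var1, a2, mu2, var2) -> NormalDist:
    """Distribution of a1*X1 + a2*X2 for independent normal X1 and X2."""
    mean = a1 * mu1 + a2 * mu2
    variance = a1 ** 2 * var1 + a2 ** 2 * var2  # variances add, with squared weights
    return NormalDist(mean, math.sqrt(variance))


# Y = X_B - X_A with X_B ~ N(9.5, 4.0) and X_A ~ N(8.0, 2.25)
Y = linear_combination(1, 9.5, 4.0, -1, 8.0, 2.25)

p_better = 1.0 - Y.cdf(0.0)      # a) P(Y >= 0)
p_two_points = 1.0 - Y.cdf(2.0)  # b) P(Y >= 2.0)

print(Y.mean, Y.stdev, round(p_better, 4), round(p_two_points, 4))
```

Note that the variances add even though the returns are subtracted, because the coefficient -1 is squared.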
Normal Approximations to the
Binomial Distribution
1. Not all binomial tables exist
2. Requires large sample size
3. Gives approximate probability only
4. Need correction for continuity
5. The distribution B(n, p) can be approximated by a normal distribution with the mean and variance
$$\mu = np, \qquad \sigma^2 = np(1 - p)$$
that is, by N(np, np(1 - p)).
[Figure: bar chart of the binomial probabilities f(x) for n = 10, p = 0.50, with x from 0 to 10.]
Normal Approximations to the
Binomial Distribution
[Figure: binomial bar chart of f(x) with a normal curve overlaid, showing the probability added by the normal curve when finding P(X ≤ a).]
Binomial distribution: the area of all the 'orange' bars.
Normal approximation: the area to the left of the 'blue' vertical line. So it needs a correction of one half in order to cover the same area as the binomial.
Correction for Continuity
1. A 1/2 unit adjustment to the discrete variable
2. Improves accuracy
3. Correction for each of four cases:
  - For P(X ≥ a), use the area above â = (a - 0.5).
  - For P(X > a), use the area above â = (a + 0.5).
  - For P(X ≤ a), use the area below â = (a + 0.5).
  - For P(X < a), use the area below â = (a - 0.5).
[Figure: number line marking a - 0.5, a and a + 0.5.]
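The four cases above can be collected in one small function. This is a minimal sketch, assuming `statistics.NormalDist` and the mean np and variance np(1 - p) given earlier; the names `SHIFT` and `approx_binomial_tail` and the example B(40, 0.3) are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

# Half-unit shift for each of the four cases.
SHIFT = {">=": -0.5, ">": +0.5, "<=": +0.5, "<": -0.5}


def approx_binomial_tail(n: int, p: float, a: int, case: str) -> float:
    """Normal approximation to P(X case a) for X ~ B(n, p), with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    a_hat = a + SHIFT[case]                    # corrected cut-off
    below = NormalDist(mu, sigma).cdf(a_hat)   # area below a_hat
    # ">=" and ">" use the area above a_hat, "<=" and "<" the area below.
    return 1.0 - below if case in (">=", ">") else below


# Hypothetical example: X ~ B(40, 0.3)
n, p = 40, 0.3
print(approx_binomial_tail(n, p, 15, "<="))
print(approx_binomial_tail(n, p, 15, ">"))
```

With these shifts, P(X ≤ a) and P(X > a) use the same cut-off a + 0.5, so the two approximations sum to 1, as the exact binomial probabilities do.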
Normal Approximation Procedure
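• Normal approximations to the binomial distribution work well as long as
$$np \ge 5 \quad \text{and} \quad n(1 - p) \ge 5$$
• For each of the four cases above, use
$$z = \frac{\hat{a} - \mu}{\sigma}$$
where
$$\hat{a} = a \pm 0.5, \qquad \mu = np \quad \text{and} \quad \sigma = \sqrt{np(1 - p)}$$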
Example
Suppose that a fair coin is tossed n times. The
distribution of the number of heads obtained, X, is B(n,
0.5). If n = 100, what is the probability of obtaining
between 45 and 55 heads?
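The conditions np ≥ 5 and n(1 - p) ≥ 5 are satisfied since
$$np = n(1 - p) = 100 \times 0.5 = 50$$
$$P(45 \le X \le 55) \approx P(\hat{a} \le X \le \hat{b}) = P(X \le \hat{b}) - P(X \le \hat{a})$$
$$\hat{a} = 45 - 0.5 = 44.5 \quad \text{and} \quad \hat{b} = 55 + 0.5 = 55.5$$
$$z_{\text{low}} = \frac{\hat{a} - np}{\sqrt{np(1 - p)}} = \frac{44.5 - 50}{5} = -1.10$$
$$z_{\text{up}} = \frac{\hat{b} - np}{\sqrt{np(1 - p)}} = \frac{55.5 - 50}{5} = 1.10$$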
Example Solution
$$P(45 \le X \le 55) \approx P(Z \le 1.1) - P(Z \le -1.1) = 0.8643 - 0.1357 = 0.7286$$
Using statistical software or Excel, the exact binomial probability is 0.7287. The difference is only about 0.0001.
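The comparison with the exact binomial probability can be reproduced without Excel. The sketch below assumes Python's `math.comb` for the binomial coefficients and `statistics.NormalDist` for the approximation; the helper name `binomial_pmf` is illustrative.

```python
import math
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 100, 0.5

# Exact: sum the probabilities of 45, 46, ..., 55 heads.
exact = sum(binomial_pmf(k, n, p) for k in range(45, 56))

# Normal approximation with continuity correction (44.5 to 55.5).
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
Z = NormalDist()
approx = Z.cdf((55.5 - mu) / sigma) - Z.cdf((44.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4), abs(exact - approx))
```

Without table rounding, the approximation agrees with the exact value to roughly four decimal places.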
Central Limit Theorem
• If X₁, …, Xₙ is a sequence of independent, identically distributed random variables with mean μ and variance σ² (not necessarily normally distributed), then the distribution of their average X̄ can be approximated by a
$$N\left(\mu, \; \frac{\sigma^2}{n}\right)$$
distribution. Similarly, the distribution of the sum X₁ + … + Xₙ can be approximated by a
$$N(n\mu, \; n\sigma^2)$$
distribution. The general rule is that the approximation is adequate as long as n ≥ 30.
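A short simulation illustrates the theorem. This is a minimal sketch under stated assumptions: the underlying variables are exponential with rate 1 (so μ = 1 and σ² = 1, clearly non-normal), n = 30, and the seed and number of trials are arbitrary choices; the function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics


def simulate_sample_means(n: int, trials: int, seed: int = 0) -> list:
    """Draw `trials` samples of size n from Exp(1) and return their averages."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


n = 30
means = simulate_sample_means(n, trials=5000)

mean_of_means = statistics.fmean(means)    # should be close to mu = 1
var_of_means = statistics.variance(means)  # should be close to sigma^2 / n = 1/30

print(mean_of_means, 1.0)
print(var_of_means, 1.0 / n)
```

A histogram of `means` would look approximately bell-shaped even though the individual exponential values are strongly skewed.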