LECTURE 2 - :: EUIS -- Enterprise University Information
TOPIC 5: Normal Distributions
Start Thinking
• As a web designer, you face a task that involves a continuous measurement: download time, which can take any value and not just a whole number. How can you answer questions such as the following?
  • What proportion of the homepage downloads take more than 10 seconds?
  • How many seconds elapse before 10% of the downloads are complete?
Continuous Probability Distributions
• Uniform
• Exponential
• Normal
• Gamma
• Weibull
• Beta
Normal Distribution
1. Also called the Gaussian distribution
2. ‘Bell-shaped’ and symmetrical
3. Mean, median and mode are equal
4. Random variable has infinite range
5. Area under the curve is 1 (the probability equals 1)
6. Can be used to approximate discrete probability distributions, for example the Binomial and Poisson
7. Basis for classical statistical inference
Probability Density Function
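$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
• $\sigma$ = standard deviation
• $\pi$ = 3.14159
• $x$ = value of the random variable ($-\infty < x < \infty$)
• $\mu$ = mean
• $e$ = 2.71828
The mean and the variance are $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
The notation that denotes that the random variable X has a normal distribution is
$X \sim N(\mu, \sigma^2)$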
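As a quick check of the density formula, the following is a minimal Python sketch. The function name `normal_pdf` is an illustrative choice, not something from the slides, and it takes the standard deviation (not the variance) as its third argument.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# For the standard normal, the density peaks at the mean
# and is symmetric about it.
peak = normal_pdf(0.0, 0.0, 1.0)
left = normal_pdf(-1.0, 0.0, 1.0)
right = normal_pdf(1.0, 0.0, 1.0)
print(peak, left, right)
```

The peak of the standard normal density is about 0.3989, and the values one unit either side of the mean are equal, which matches the 'bell-shaped and symmetrical' description.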
Effect of Varying Parameters (μ & σ)
• Normal distributions differ by mean & standard deviation
• Each distribution would require its own table.
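[Figure: three normal density curves, labeled A, B and C, plotted as f(x) against x with ticks at 5 and 10: X ~ N(5, 0.5), X ~ N(5, 2) and X ~ N(10, 2).]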
That would mean an infinite number of tables! So we need a “standardized” normal distribution.
The Standard Normal Distribution
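One table!
[Figure: a normal distribution, $X \sim N(\mu, \sigma^2)$, mapped to the standard normal distribution, $Z \sim N(0, 1)$, with $\mu = 0$ and $\sigma = 1$; Z is negative to the left of 0 and positive to the right.]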
Normal Distribution Probability
• If $X \sim N(\mu, \sigma^2)$, then the transformed random variable is
$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
• The random variable Z is known as the “standardized” version of the random variable X.
• The probability values of a general normal distribution can be related to the cumulative distribution function of the standard normal distribution, Φ(z):
$P(X \le a) = \Phi(z) = P(Z \le z)$, where $z = \frac{a - \mu}{\sigma}$
Example
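Normal distribution: $X \sim N(3, 0.8^2)$, so μ = 3 and σ = 0.8
Standard normal distribution: $Z \sim N(0, 1)$, so μ = 0 and σ = 1
$P(X \le 4.2) = ?$
$z = \frac{a - \mu}{\sigma} = \frac{4.2 - 3}{0.8} = 1.5$
$\Phi(z) = P(Z \le z)$, so $\Phi(1.5) = P(Z \le 1.5)$
Therefore $P(X \le 4.2) = \Phi(1.5) = P(Z \le 1.5) = ?$
[Figure: the area to the left of x = 4.2 under the N(3, 0.8²) curve corresponds to the area to the left of z = 1.5 under the standard normal curve.]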
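The standardization step can also be sketched in Python. This is only an illustration: `phi` and `standardize` are hypothetical helper names, and Φ is computed from the error function in the standard library rather than read from a table.

```python
import math

def phi(z):
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardize(a, mu, sigma):
    """z = (a - mu) / sigma."""
    return (a - mu) / sigma

# Example from the slide: X ~ N(3, 0.8^2), P(X <= 4.2)
z = standardize(4.2, 3.0, 0.8)
p = phi(z)
print(round(z, 2), round(p, 4))
```

For z = 1.5 this gives Φ(1.5) ≈ 0.9332, the same value a standard normal table lists for that z.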
Standard Normal Tables
• The values of the cumulative distribution function of the standard normal distribution, Φ(z), that is, the probabilities P(Z ≤ z), are already tabulated.
[Figure: standard normal curve (μ = 0, σ = 1) showing the area to the left of an unknown z.]
Normal Distribution Probability
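• The table of the cumulative distribution function of the standard normal distribution, Φ(z) or P(Z ≤ z), gives P(X ≤ a). The same table also yields the other probabilities:
$P(a \le X \le b) = P(X \le b) - P(X \le a)$
$P(X \ge a) = 1 - P(X \le a)$
[Figure: two normal curves, one showing the area between a and b, the other showing the area to the right of a.]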
Normal Distribution μ = 5, σ = 10 :
P(5 < X< 24.6) = ?
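$z_{lower} = \frac{a - \mu}{\sigma} = \frac{5 - 5}{10} = 0$
$z_{upper} = \frac{b - \mu}{\sigma} = \frac{24.6 - 5}{10} = 1.96$
$P(5 \le X \le 24.6) = P(0 \le Z \le 1.96) = P(Z \le 1.96) - P(Z \le 0)$
[Figure: normal curve (μ = 5, σ = 10) showing the interval from x = 5 to x = 24.6, next to the standard normal curve (μ = 0, σ = 1) showing the interval from z = 0 to z = 1.96.]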
Normal Distribution μ = 5, σ = 10 :
P(5 < X< 24.6) = ?
$P(5 \le X \le 24.6) = P(0 \le Z \le 1.96) = P(Z \le 1.96) - P(Z \le 0)$
Look up the table: $0.9750 - 0.5000 = 0.4750$
Standard Normal Probability Table (excerpt)
Z     0.04    0.05    0.06
1.8   0.9671  0.9678  0.9686
1.9   0.9738  0.9744  0.9750
2.0   0.9793  0.9798  0.9803
2.1   0.9838  0.9842  0.9846
[Figure: standard normal curve (σ = 1) with area 0.4750 between z = 0 and z = 1.96.]
Normal Distribution μ = 5, σ = 10 :
P(X ≥ 8) = ?
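$z = \frac{a - \mu}{\sigma} = \frac{8 - 5}{10} = 0.3$
$P(X \ge 8) = P(Z \ge 0.3) = 1 - P(Z \le 0.30) = 1 - 0.6179 = 0.3821$ (look up the table)
[Figure: normal curve (μ = 5, σ = 10) showing the area to the right of x = 8, next to the standard normal curve (μ = 0, σ = 1) showing the area to the right of z = 0.3.]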
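The two table look-ups for μ = 5, σ = 10 can be cross-checked with Python's standard library. This is a sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=10)

# P(5 < X < 24.6) = P(X <= 24.6) - P(X <= 5)
p_between = X.cdf(24.6) - X.cdf(5)

# P(X >= 8) = 1 - P(X <= 8)
p_above = 1 - X.cdf(8)

print(f"{p_between:.4f} {p_above:.4f}")
```

The results agree with the table answers 0.4750 and 0.3821 to four decimal places; `cdf` performs the standardization internally, so no separate z computation is needed.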
Normal Distribution Example
You work in Quality Control for GE. Light bulb life has a normal distribution with μ = 2000 hours and σ = 200 hours. What’s the probability that a bulb will last
a) between 1800 and 2200 hours?
b) less than 1470 hours?
c) more than 2500 hours?
Example Solution a)
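$z_{low} = \frac{a - \mu}{\sigma} = \frac{1800 - 2000}{200} = -1.0$
$z_{up} = \frac{b - \mu}{\sigma} = \frac{2200 - 2000}{200} = 1.0$
$P(1800 \le X \le 2200) = P(-1 \le Z \le 1) = P(Z \le 1) - P(Z \le -1) = 0.8413 - 0.1587 = 0.6826$
[Figure: normal curve (μ = 2000, σ = 200) showing the interval from 1800 to 2200, next to the standardized normal curve (μ = 0, σ = 1) showing the interval from z = -1.0 to z = 1.0.]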
Example Solution b)
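$z = \frac{a - \mu}{\sigma} = \frac{1470 - 2000}{200} = -2.65$
$P(X \le 1470) = P(Z \le -2.65) = 0.0040$
[Figure: normal curve (μ = 2000, σ = 200) with area 0.0040 to the left of x = 1470, next to the standardized normal curve (μ = 0, σ = 1) with the same area to the left of z = -2.65.]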
Example Solution c)
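$z = \frac{a - \mu}{\sigma} = \frac{2500 - 2000}{200} = 2.50$
$P(X \ge 2500) = P(Z \ge 2.5) = 1 - P(Z \le 2.5) = 1 - 0.9938 = 0.0062$
[Figure: normal curve (μ = 2000, σ = 200) showing the area to the right of x = 2500, next to the standardized normal curve (μ = 0, σ = 1) showing the area to the right of z = 2.5.]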
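The three light bulb answers can be reproduced in a few lines of Python. This is a sketch rather than part of the original solution; `prob_between` is a hypothetical helper, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

life = NormalDist(mu=2000, sigma=200)  # bulb life in hours

def prob_between(dist, low, high):
    """P(low <= X <= high) = P(X <= high) - P(X <= low)."""
    return dist.cdf(high) - dist.cdf(low)

p_a = prob_between(life, 1800, 2200)   # table answer: 0.6826
p_b = life.cdf(1470)                   # table answer: 0.0040
p_c = 1 - life.cdf(2500)               # table answer: 0.0062
print(f"{p_a:.4f} {p_b:.4f} {p_c:.4f}")
```

Small differences in the last digit compared with the table answers are expected, because the table values are themselves rounded to four decimals before being subtracted.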
The Empirical Rule
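The empirical rule is commonly stated as: about 68%, 95% and 99.7% of a normal population lies within 1, 2 and 3 standard deviations of the mean. A minimal Python sketch (assuming Python 3.8 or later; the names are illustrative) confirms these figures from the standard normal CDF.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mu = 0, sigma = 1

# P(-k <= Z <= k) for k = 1, 2, 3 standard deviations
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

Because any normal variable standardizes to Z, the same proportions hold for every μ and σ.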
Finding Random Variable X
for Known Probabilities
Given that P(X ≤ a) = 0.6216, what is a?
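Firstly, find the value of z!
Standard Normal Probability Table (excerpt)
Z     .00    .01    .02
0.0   .5000  .5040  .5080
0.1   .5398  .5438  .5478
0.2   .5793  .5832  .5871
0.3   .6179  .6217  .6255
The closest value in the table is .6217, which gives z = 0.31.
[Figure: standard normal curve (μ = 0, σ = 1) with area .6217 to the left of z = 0.31.]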
Finding Random Variable X
for Known Probabilities
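Secondly, find the value of X = a!
$X = \mu + Z\sigma = 5 + (0.31)(10) = 8.1$
[Figure: standard normal curve (μ = 0, σ = 1) with area .6217 to the left of z = 0.31, next to the normal curve (μ = 5, σ = 10) with the same area to the left of x = 8.1.]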
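The reverse look-up can be sketched in Python with the inverse CDF, which plays the role of reading the table backwards. This assumes Python 3.8 or later; μ = 5 and σ = 10 are the values used in this example.

```python
from statistics import NormalDist

mu, sigma = 5, 10
target = 0.6216                      # P(X <= a)

# Step 1: find z such that P(Z <= z) = target
z = NormalDist().inv_cdf(target)

# Step 2: convert back to the original scale, X = mu + Z * sigma
a = mu + z * sigma

print(round(z, 2), round(a, 1))
```

The computed z is not rounded to the nearest table entry, so it differs from 0.31 only in the third decimal place, and a still rounds to 8.1.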
Exercise
1. The thicknesses of metal plates made by a particular machine are normally distributed with a mean of 4.3 mm and a standard deviation of 0.12 mm.
a) What proportion of the metal plates have a thickness outside the range of 4.1 to 4.5 mm?
b) What are the upper and lower quartiles of the metal plate thickness?
c) What is the value of c for which there is an 80% probability that a metal plate has a thickness within the interval [4.3 – c, 4.3 + c]?
Answer to the Exercise
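μ = 4.3 mm and σ = 0.12 mm
a) $P(X \le 4.1 \text{ or } X \ge 4.5) = 1 - P(4.1 \le X \le 4.5) = 1 - [P(X \le 4.5) - P(X \le 4.1)]$
$z_{low} = \frac{4.1 - 4.3}{0.12} = -1.67$ and $z_{up} = \frac{4.5 - 4.3}{0.12} = 1.67$
$P(X \le 4.1 \text{ or } X \ge 4.5) = 1 - [P(Z \le 1.67) - P(Z \le -1.67)] = 1 - [0.9525 - 0.0475] = 0.095 = 9.5\%$
b) Lower quartile: P(X ≤ a) = 0.25, and upper quartile: P(X ≤ a) = 0.75
$P(X \le a) = 0.25 \Rightarrow z = -0.67$
$X = \mu + Z\sigma = 4.3 + (-0.67)(0.12) = 4.2196$
$P(X \le a) = 0.75 \Rightarrow z = 0.67$
$X = \mu + Z\sigma = 4.3 + (0.67)(0.12) = 4.3804$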
Answer to the Exercise
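c) It means P(a ≤ X ≤ b) = 80% with a = 4.3 – c and b = 4.3 + c, so that P(X ≤ a) = 10% = 0.1 and P(X ≤ b) = 90% = 0.9. Pick either one.
$P(X \le a) = 0.10 \Rightarrow z = -1.28$
$X = \mu + Z\sigma = \mu - c \Rightarrow c = -Z\sigma$
$c = 1.28 \times 0.12 = 0.1536$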
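The exercise can be cross-checked in software. The sketch below assumes Python 3.8 or later and uses illustrative variable names. Because it does not round z to two decimals, its results differ slightly from table-based answers. For part c) it uses the 90th percentile, for which z is about 1.28.

```python
from statistics import NormalDist

T = NormalDist(mu=4.3, sigma=0.12)   # plate thickness in mm

# a) proportion outside [4.1, 4.5]
outside = T.cdf(4.1) + (1 - T.cdf(4.5))

# b) lower and upper quartiles
q1, q3 = T.inv_cdf(0.25), T.inv_cdf(0.75)

# c) half-width c with P(4.3 - c <= X <= 4.3 + c) = 0.80,
#    so 4.3 + c is the 90th percentile
c = T.inv_cdf(0.90) - T.mean

print(f"{outside:.4f} {q1:.4f} {q3:.4f} {c:.4f}")
```

A useful sanity check on part c) is that the interval [4.3 – c, 4.3 + c] should contain exactly 80% of the probability when fed back into the CDF.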
Linear Combination of Normal
Random Variables
• Linear Functions of a Normal Random Variable
If $X \sim N(\mu, \sigma^2)$ and a and b are constants, then
$Y = aX + b \sim N(a\mu + b, \; a^2\sigma^2)$
• The Sum of Two Independent Normal Random Variables
If $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent random variables, then
$Y = a_1 X_1 + a_2 X_2 \sim N(a_1\mu_1 + a_2\mu_2, \; a_1^2\sigma_1^2 + a_2^2\sigma_2^2)$
• Averaging Independent Normal Random Variables
If $X_i \sim N(\mu, \sigma^2)$, $1 \le i \le n$, are independent random variables, then their average $\bar{X}$ is distributed
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
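The rules above reduce to simple arithmetic on means and variances, which the following Python sketch illustrates. The function names and the numbers in the usage lines (μ = 10, σ² = 4, n = 16) are hypothetical and chosen only for illustration.

```python
def linear_combination(coeffs, means, variances):
    """Mean and variance of sum(a_i * X_i) for independent normal X_i."""
    mean = sum(a * m for a, m in zip(coeffs, means))
    var = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean, var

def average_params(mu, var, n):
    """The average is the linear combination with every a_i = 1/n."""
    return linear_combination([1.0 / n] * n, [mu] * n, [var] * n)

# Hypothetical values: 16 independent N(10, 4) variables
m, v = average_params(10.0, 4.0, 16)
print(m, v)   # mean stays at mu, variance shrinks to sigma^2 / n
```

Treating the average as a linear combination with coefficients 1/n shows where the σ²/n term comes from: n terms, each contributing (1/n)²σ².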
Example
The annual return of the stock of company A, $X_A$ say (in percent), is distributed
$X_A \sim N(8.0, 1.5^2) = N(8.0, 2.25)$
In addition, suppose that the annual return from the stock of company B, $X_B$ say, is distributed
$X_B \sim N(9.5, 4.0)$
independent of the stock of company A.
a) What is the probability that company B’s stock performs better than company A’s stock?
b) What is the probability that company B’s stock performs at least 2 percentage points better than company A’s stock?
Example Solution
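a) Let $Y = X_B - X_A$, that is, $a_1 = 1$ and $a_2 = -1$; then
$Y = a_1 X_1 + a_2 X_2 \sim N(a_1\mu_1 + a_2\mu_2, \; a_1^2\sigma_1^2 + a_2^2\sigma_2^2)$
$Y \sim N(9.5 - 8.0, \; 4 + (-1)^2 (2.25))$
$Y \sim N(1.5, 6.25)$, or $\mu = 1.5$ and $\sigma^2 = 6.25$
“Performs better” means Y ≥ 0.
$z = \frac{0 - 1.5}{\sqrt{6.25}} = -0.60$
$P(Y \ge 0) = 1 - P(Y \le 0) = 1 - P(Z \le -0.60) = 1 - 0.2743 = 0.7257$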
Example Solution
b) It means Y ≥ 2.0.
$z = \frac{2.0 - 1.5}{\sqrt{6.25}} = 0.20$
$P(Y \ge 2.0) = 1 - P(Y \le 2.0) = 1 - P(Z \le 0.20) = 1 - 0.5793 = 0.4207$
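The stock example can be reproduced with `statistics.NormalDist` (Python 3.8 or later), which supports subtracting two independent normal variables directly. This is a sketch with illustrative variable names. Note that `NormalDist` takes a standard deviation, so the variance 4.0 of company B is converted with a square root.

```python
import math
from statistics import NormalDist

XA = NormalDist(mu=8.0, sigma=1.5)              # variance 2.25
XB = NormalDist(mu=9.5, sigma=math.sqrt(4.0))   # variance 4.0

# Difference of independent normals: means subtract, variances add
Y = XB - XA

p_better = 1 - Y.cdf(0)        # a) P(Y >= 0)
p_two_points = 1 - Y.cdf(2.0)  # b) P(Y >= 2.0)

print(Y.mean, Y.variance)
print(f"{p_better:.4f} {p_two_points:.4f}")
```

The difference has mean 1.5 and variance 6.25, and the two probabilities agree with the table answers 0.7257 and 0.4207.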
Normal Approximations to the
Binomial Distribution
1. Not all binomial tables exist
2. Requires large sample size
3. Gives approximate probability only
4. Need correction for continuity
5. The distribution B(n, p) can be approximated by a normal distribution with the mean and variance
$\mu = np$ and $\sigma^2 = np(1-p)$, that is, $N(np, \; np(1-p))$
[Figure: binomial probability histogram for n = 10, p = 0.50, with f(x) from .0 to .3 plotted against x from 0 to 10.]
Normal Approximations to the
Binomial Distribution
[Figure: binomial histogram of f(x) against x with a normal curve overlaid, illustrating P(X ≤ a) = ?; one region is labeled “Probability Added by Normal Curve”.]
Binomial distribution: the area of all the ‘orange’ bars.
Normal approximation: the area to the left of the ‘blue’ vertical line. It therefore needs a correction of one ‘half’ in order to have the same area as the binomial.
Correction for Continuity
1. A 1/2 unit adjustment to discrete variable
2. Improves accuracy
3. Correction for each of four cases:
• For P(X ≥ a), use the area above â = (a – 0.5).
• For P(X > a), use the area above â = (a + 0.5).
• For P(X ≤ a), use the area below â = (a + 0.5).
• For P(X < a), use the area below â = (a – 0.5).
[Figure: number line marking a – 0.5, a and a + 0.5.]
Normal Approximation Procedure
• Normal approximations to the binomial distribution work well as long as
$np \ge 5$ and $n(1-p) \ge 5$
• For each of the four cases above, use
$z = \frac{\hat{a} - \mu}{\sigma}$
where $\hat{a} = a \pm 0.5$, $\mu = np$ and $\sigma = \sqrt{np(1-p)}$
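The procedure, including the four continuity-correction cases, can be sketched as a single Python function. The function name, the `relation` argument and the B(20, 0.5) usage values are hypothetical; `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist

def binomial_normal_approx(n, p, a, relation):
    """Approximate P(X relation a) for X ~ B(n, p) with a
    continuity-corrected normal. relation is one of ">=", ">", "<=", "<"."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("np and n(1-p) should both be at least 5")
    dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    if relation == ">=":
        return 1 - dist.cdf(a - 0.5)   # area above a - 0.5
    if relation == ">":
        return 1 - dist.cdf(a + 0.5)   # area above a + 0.5
    if relation == "<=":
        return dist.cdf(a + 0.5)       # area below a + 0.5
    if relation == "<":
        return dist.cdf(a - 0.5)       # area below a - 0.5
    raise ValueError("relation must be one of >=, >, <=, <")

# Hypothetical illustration: X ~ B(20, 0.5)
p_le_12 = binomial_normal_approx(20, 0.5, 12, "<=")
p_gt_12 = binomial_normal_approx(20, 0.5, 12, ">")
print(round(p_le_12, 4), round(p_gt_12, 4))
```

Complementary cases such as P(X ≤ 12) and P(X > 12) use the same corrected cut point (12.5), so their approximations sum to 1, just as the exact binomial probabilities do.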
Example
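Suppose that a fair coin is tossed n times. The distribution of the number of heads obtained, X, is B(n, 0.5). If n = 100, what is the probability of obtaining between 45 and 55 heads?
$np \ge 5$ and $n(1-p) \ge 5$ are satisfied, since $np = n(1-p) = 100 \times 0.5 = 50$
$P(45 \le X \le 55) \approx P(\hat{a} \le X \le \hat{b}) = P(X \le \hat{b}) - P(X \le \hat{a})$
$\hat{a} = 45 - 0.5 = 44.5$ and $\hat{b} = 55 + 0.5 = 55.5$
$z_{low} = \frac{\hat{a} - np}{\sqrt{np(1-p)}} = \frac{44.5 - 50}{5} = -1.10$
$z_{up} = \frac{\hat{b} - np}{\sqrt{np(1-p)}} = \frac{55.5 - 50}{5} = 1.10$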
Example Solution
$P(45 \le X \le 55) \approx P(Z \le 1.1) - P(Z \le -1.1) = 0.8643 - 0.1357 = 0.7286$
Using statistical software or Excel, the exact binomial probability is 0.7287. The difference is only about 0.0001.
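The comparison between the exact binomial probability and the normal approximation can be reproduced with the Python standard library. This is a sketch, assuming Python 3.8 or later for `math.comb` and `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5

# Exact binomial probability P(45 <= X <= 55); with p = 0.5 every
# outcome has probability C(n, k) / 2^n
exact = sum(math.comb(n, k) for k in range(45, 56)) / 2**n

# Normal approximation with continuity correction (44.5 to 55.5)
approx_dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(55.5) - approx_dist.cdf(44.5)

print(f"{exact:.4f} {approx:.4f}")
```

Both values are close to 0.7287; the table-based answer 0.7286 differs in the last digit only because the table values are rounded before subtraction.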
Central Limit Theorem
• If X1, …, Xn is a sequence of independent, identically distributed random variables with mean μ and variance σ² (not necessarily normally distributed), then the distribution of their average $\bar{X}$ can be approximated by a
$N\left(\mu, \frac{\sigma^2}{n}\right)$
distribution. Similarly, the distribution of the sum X1 + … + Xn can be approximated by a
$N(n\mu, \; n\sigma^2)$
distribution. The general rule is that the approximation is adequate as long as n ≥ 30.