Chapter 1: Data Collection - Department of Mathematics


Chapter 7: The Normal Probability
Distribution
7.1 Properties of the Normal Distribution
7.2 The Standard Normal Distribution
7.3 Applications of the Normal Distribution
7.4 Assessing Normality
7.5 The Normal Approximation to the Binomial Probability Distribution
December 8, 2008
Properties of the Normal Distribution
In this chapter we study a probability distribution for a continuous random variable,
called the Normal Distribution. This distribution is studied for several reasons:
(1) It is a good model for the distribution of many different populations.
(2) Several probability distributions (including some discrete probability distributions)
can be approximated by a Normal Distribution.
(3) It is bell-shaped and hence, the Empirical Rule applies.
(4) Many inferential methods in statistics are based on the assumption that the
population is distributed according to a Normal Distribution.
Hence, it is ubiquitous. If you want detailed knowledge of only one
probability distribution, then the Normal Distribution is the one to study.
Section 7.1
Continuous Random Variables
• A continuous random variable has a continuum of possible values.
• Examples: time, age, height and weight.
• A continuous random variable has a continuous probability
distribution that is a curve that is defined on the interval from which
X takes its values.
Probability Distribution of a Continuous
Random Variable
Definition: Let X be a continuous random variable. Suppose that the
values of X, i.e., x, lie in an interval [a,b]. The probability distribution of
X is a nonnegative function, f(x), that is defined on [a,b], such that the area under the
graph of f is equal to 1. The function, f(x), is also called the probability
density function (PDF) of the distribution.
Note: It is possible that a, b, or both are infinite.
Probabilities and Continuous Probability
Distributions
Discrete Probability Distribution: $\{(x_1, P(x_1)), (x_2, P(x_2)), \ldots, (x_N, P(x_N))\}$ such that $\sum_{j=1}^{N} P(x_j) = 1$
Continuous Probability Distribution: $f(x)$, $x \in [a,b]$, such that $\int_a^b f(x)\,dx = 1$
In the discrete case, we can spread the probability
of a value x (say x = 2) over the interval [1.5,2.5].
The probability P(2) is then equal to the area of the rectangle
whose base is the interval [1.5,2.5] and whose height is
P(2). In this manner we can extend a discrete
probability distribution to a continuous probability
distribution that is defined on an interval.
Area and Discrete Probability Distribution
Recall: If x1 < x2 < … < xN, then P(x ≤ xk) = P(x1) + P(x2) + … + P(xk).
From the histogram of the discrete probability distribution, the quantity
P(x1) + P(x2) + … + P(xk) is related to the area of the bars in the histogram. In
fact, if the width of each bar is 1, then it is exactly the sum of the areas of the
bars from x1 to xk. Hence, P(x ≤ xk) is an area “under the bars.”
Note:
• P(x ≤ xN) = 1
• If m < n, then P(xm ≤ x ≤ xn) is the
sum of the areas of the bars from xm to
xn.
Probabilities and Continuous Probability
Distributions
For a continuous probability distribution, we
generalize the ideas presented for the discrete
probability distribution. Let us consider some interval
$[\alpha, \beta]$ contained in the interval [a,b]. We want to associate a
probability with x lying in the interval $[\alpha, \beta]$. We define this
probability as the area under
the curve of f(x) and above the interval $[\alpha, \beta]$:
$P(\alpha \le x \le \beta) = \int_\alpha^\beta f(x)\,dx = \text{area}$
$P(a \le x \le b) = \int_a^b f(x)\,dx = 1$, possibly with $a = -\infty$, $b = \infty$
Cumulative Probability Distribution
Definition: The function $G(z) = P(x \le z) = \int_a^z f(x)\,dx$, the area under the curve on (a, z], is
called the cumulative probability distribution (CPD). We sometimes call G(z) the cumulative
probability function (CPF).
Continuous-Discrete Probability Distribution of a
Random Variable
Example: The random variable is
the height of females in a certain
population.
As the number of possible outcomes
for a random variable X becomes
large, the discrete probability
distribution can approach a
continuous probability distribution.
We can often approximate discrete
probability distributions by continuous
probability distributions.
Remark
For a continuous random variable, X, with a continuous probability density function, f(x),
the probability that $x = \alpha$ is zero, i.e., $P(x = \alpha) = \int_\alpha^\alpha f(x)\,dx = 0$.
One can think of this as
the area under the single point $(\alpha, f(\alpha))$, which is zero. Furthermore, since the probability that
x equals a particular point $\alpha$ is zero, we note that $P(x \le \alpha) = P(x < \alpha)$.
Mean and Standard Deviation of a
Continuous Probability Distribution
It is possible to generalize the mean and standard deviation of a discrete probability
distribution, $\mu = \sum_{j=1}^{n} x_j P(x_j)$ and $\sigma = \sqrt{\sum_{j=1}^{n} (x_j - \mu)^2 P(x_j)}$, to a continuous probability
distribution with the probability density function, f(x). Namely,
$\mu = \int_a^b x f(x)\,dx$ and $\sigma = \sqrt{\int_a^b (x - \mu)^2 f(x)\,dx}$.
Summary of a Probability Distribution for a
Continuous Random Variable
Probability Density Function (PDF): $f(x)$, $a \le x \le b$, such that $0 \le f(x)$ for $x \in [a,b]$
Cumulative Probability Distribution (CPD): $G(z) = \int_a^z f(x)\,dx$, $a \le z \le b$, with $G(b) = \int_a^b f(z)\,dz = 1$ and $G(a) = 0$
Mean of Probability Distribution: $\mu_X = \int_a^b x f(x)\,dx$
Standard Deviation of Probability Distribution: $\sigma_X = \sqrt{\int_a^b (x - \mu_X)^2 f(x)\,dx}$
The Uniform Probability Distribution
Probability Distribution Function for the Uniform Distribution: $f(x) = \frac{1}{b-a}$, $a \le x \le b$
$P(x \le z) = G(z) = \int_a^z f(x)\,dx = \int_a^z \frac{1}{b-a}\,dx = \frac{z-a}{b-a}$
$G(a) = \frac{a-a}{b-a} = 0$, $G(b) = \frac{b-a}{b-a} = 1$
$\mu_X = \frac{a+b}{2}$, $\sigma_X = \sqrt{\frac{1}{12}(b-a)^2}$
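The integral formulas for the mean and standard deviation can be checked numerically on the Uniform Distribution. The following is only a sketch: the endpoints a = 2 and b = 5 are arbitrary illustrative values, and midpoint_integral is a hypothetical helper (a simple midpoint rule), not something taken from the slides.

```python
import math

def midpoint_integral(g, lo, hi, steps=10_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * width) for k in range(steps)) * width

# Illustrative endpoints (an assumption, chosen only for this sketch).
a, b = 2.0, 5.0

def uniform_pdf(x):
    """Density of the Uniform Distribution on [a, b]."""
    return 1.0 / (b - a)

# Area under the density, mean, and standard deviation as integrals.
total_area = midpoint_integral(uniform_pdf, a, b)
mean = midpoint_integral(lambda x: x * uniform_pdf(x), a, b)
variance = midpoint_integral(lambda x: (x - mean) ** 2 * uniform_pdf(x), a, b)
std_dev = math.sqrt(variance)

print("area:", total_area)
print("mean:", mean, "formula:", (a + b) / 2)
print("std dev:", std_dev, "formula:", (b - a) / math.sqrt(12))
```

The numerical values should agree with the closed-form expressions (a + b)/2 and (b - a)/√12 up to the error of the integration rule.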
Normal Probability Distribution
We now examine a particular probability distribution for a continuous random
variable that takes all values on the real line.
Normal Probability Distribution Function: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $-\infty < x < \infty$
Remark: The function f(x) is called a probability density function and is
abbreviated as PDF. We shall call the probability distribution, given by the above
probability distribution function, the Normal Distribution.
Remark
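If $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, then
$\int_{-\infty}^{\infty} x f(x)\,dx = \mu$ and $\int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx = \sigma^2$; that is, $\mu_X = \mu$ and $\sigma_X = \sigma$.
Hence, the mean and standard deviation of a Normal Distribution are parameters
in the probability density function.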
Dependence on Mean and Standard
Deviation
[Figure: normal curves for four parameter pairs: $\mu = 0$ and $\sigma = 1$; $\mu = 2$ and $\sigma = 1$; $\mu = 0$ and $\sigma = 3$; $\mu = -2$ and $\sigma = 1$]
We will call the graph of f(x) the normal density curve or simply, the normal curve.
Computing the Probability Distribution Function
for the Normal Curve
How can you calculate the function f(x) for different values of x? Once
you have defined $\mu$ and $\sigma$, you use:
• calculator
• computer
• tables
Facts about the Normal Distribution
Here are some properties of the graph of the normal density function
f(x):
• It is symmetric with respect to the line $x = \mu$.
• The highest value of the curve occurs when $x = \mu$.
• It has two points of inflection: $x = \mu \pm \sigma$. A point of inflection is
where a curve changes from being concave upward to concave
downward or vice-versa.
• The area under the curve is 1.
• The highest value of f(x) (at $x = \mu$) changes with $\sigma$, but is always
positive.
• For some standard deviations, $\sigma$, the values of f(x) may be larger
than 1.0 and hence, the probability density function at a point, x, is not
necessarily the probability, P(x).
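As one way of using a computer to evaluate f(x), the sketch below codes the normal density directly from its formula and prints the peak value f(μ) for a few standard deviations. The function name normal_pdf and the sample values of σ are illustrative assumptions, not part of the slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability density function with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

mu = 0.0
# Illustrative standard deviations: the peak value f(mu) = 1 / (sigma * sqrt(2*pi))
# grows as sigma shrinks and exceeds 1.0 once sigma is small enough.
for sigma in (3.0, 1.0, 0.1):
    print("sigma =", sigma, "peak f(mu) =", normal_pdf(mu, mu, sigma))

# Symmetry about x = mu: points equally far from mu have the same density.
print(normal_pdf(mu - 1.3, mu, 2.0), normal_pdf(mu + 1.3, mu, 2.0))
```

With σ = 0.1 the peak is larger than 1, which illustrates that a density value is not itself a probability.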
Some Useful Facts about the Normal
Distribution Function
Empirical Rule for the Normal Distribution
For the normal distribution and its curve, we have the following empirical rules
for bell-shaped distributions:
• Approximately 68% of the area under the curve lies in the interval $[\mu-\sigma, \mu+\sigma]$.
• Approximately 95% of the area under the curve lies in the interval $[\mu-2\sigma, \mu+2\sigma]$.
• Approximately 99.7% of the area under the curve lies in the interval $[\mu-3\sigma, \mu+3\sigma]$.
Recall: The empirical rule for bell-shaped distributions.
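The three percentages can be checked with a short computation. This sketch relies on the standard identity that the area of a normal curve within k standard deviations of the mean equals erf(k/√2); the helper name area_within is an assumption made for illustration.

```python
import math

def area_within(k):
    """Area under any normal curve on [mu - k*sigma, mu + k*sigma]."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973
```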
The Normal Cumulative Probability
Distribution
Definition: The Cumulative Probability Distribution, $P(x \le \alpha)$, is defined to be the
area under the Normal Probability Density Function for $x \le \alpha$. The value of $P(x \le \alpha)$
is always between 0 and 1.
Remark: $P(x \le \alpha) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\alpha} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$
Remark: $P(x \ge \alpha) = 1 - P(x \le \alpha) = \frac{1}{\sigma\sqrt{2\pi}} \int_{\alpha}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$
Fact about $P(x \le \alpha)$
Fact: The Normal Cumulative Probability Distribution (Normal CPD) of x gives
the probability that $x \le \alpha$.
For example, if X denotes the continuous random variable which is the weight of
an individual randomly chosen from a population that obeys a normal distribution
and x is the numerical value for this random variable, then P(x ≤ 180) is the
probability that this individual weighs at most 180 pounds.
Cumulative Probability Distribution of an
Interval
Another Fact: The normal cumulative probability distribution for an interval
$[\alpha, \beta]$ is the area under the curve and above the interval: $P(\alpha \le x \le \beta)$.
Remark: $P(\alpha \le x \le \beta) = \frac{1}{\sigma\sqrt{2\pi}} \int_{\alpha}^{\beta} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$
Example
Suppose the replacement time of a particular brand of refrigerator is normally distributed with
mean $\mu = 14$ years and standard deviation $\sigma = 2.5$ years.
(a) Sketch a graph of the probability density function and the cumulative probability
distribution.
(b) Shade the region in the graph of the probability density function that represents the probability
that a randomly selected refrigerator will last at least 17 years.
(c) What is the probability that it will last more than 17 years?
(d) What is the probability that it will be replaced between 14 years and 16.5 years?
$P(x \le 17) = \int_{-\infty}^{17} \frac{e^{-(x-14)^2/(2 \cdot 2.5^2)}}{\sqrt{2\pi}\,(2.5)}\,dx = 0.88493$, so $P(x > 17) = 1 - 0.88493 = 0.11507$
$P(14 \le x \le 16.5) = \int_{14}^{16.5} \frac{e^{-(x-14)^2/(2 \cdot 2.5^2)}}{\sqrt{2\pi}\,(2.5)}\,dx = 0.841345 - 0.5 = 0.341345$
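The areas in the refrigerator example can also be computed on a computer. The sketch below uses NormalDist from Python's standard statistics module as one possible tool; the variable names are illustrative, and the slides themselves do not prescribe any software for this step.

```python
from statistics import NormalDist

# Replacement time: mean 14 years, standard deviation 2.5 years.
replacement_time = NormalDist(mu=14, sigma=2.5)

# Area to the left of 17 years and its complement (lasting more than 17 years).
p_at_most_17 = replacement_time.cdf(17)
p_more_than_17 = 1 - p_at_most_17

# Area above the interval [14, 16.5].
p_between = replacement_time.cdf(16.5) - replacement_time.cdf(14)

print(round(p_at_most_17, 5), round(p_more_than_17, 5), round(p_between, 6))
```

Since 17 years lies above the mean of 14 years, the area to the right of 17 must be less than one half, which is a quick sanity check on the result.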
Calculation of the Cumulative
Probability Distribution on the TI-83
• 2nd VARS (DISTR) key
• Select normalcdf( [ENTER]
• Complete the entry in the order normalcdf(lower bound, upper bound, mean, standard deviation), e.g., normalcdf(-1.9,2.3,0.5,1.7) [ENTER]
• Answer: 0.7761502183
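For readers without a TI-83, the same area can be sketched in Python. The helper normalcdf below is hypothetical: it only mirrors the calculator's argument order (lower bound, upper bound, mean, standard deviation) and is built on the standard library's NormalDist.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve with mean mu and standard deviation sigma
    between lower and upper (hypothetical helper mirroring the calculator entry)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Same entry as the calculator example: P(-1.9 <= x <= 2.3) with mean 0.5, std dev 1.7.
print(normalcdf(-1.9, 2.3, 0.5, 1.7))
```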
z - score
Recall: We introduced the concept of the z-score for an observation in a sample:
z = (observation - mean)/(standard deviation)
or, letting observation = x, mean = $\mu$ and standard deviation = $\sigma$, we have
$z = (x - \mu)/\sigma$.
For example, when z = ±1, then $x = \mu \pm \sigma$. When z = ±2, then $x = \mu \pm 2\sigma$. In
general, the z-score measures how far the observation x is from the mean, in units of the standard deviation.
z-score and the Normal Distribution
• Between z = -1 and z = 1, the values of x lie in the interval $[\mu-\sigma, \mu+\sigma]$. We know
from the empirical rule that this is approximately 68% of the total area under the normal
curve.
• Between z = -2 and z = 2, the values of x lie in the interval $[\mu-2\sigma, \mu+2\sigma]$. We know
from the empirical rule that this is approximately 95% of the total area under the normal
curve.
• Between z = -3 and z = 3, the values of x lie in the interval $[\mu-3\sigma, \mu+3\sigma]$. We know
from the empirical rule that this is approximately 99.7% of the total area under the
normal curve.
Hence, $P(\mu-\sigma \le x \le \mu+\sigma)$ is approximately 0.68, $P(\mu-2\sigma \le x \le \mu+2\sigma)$ is
approximately 0.95, and $P(\mu-3\sigma \le x \le \mu+3\sigma)$ is approximately 0.997.
Standard Normal Distribution
Definition: The normal distribution with $\mu = 0$ and $\sigma = 1$ is called the
Standard Normal Distribution.
The Standard Random Variable
Theorem: Suppose x is a continuous random variable that is distributed by a
Normal Distribution with mean $\mu$ and standard deviation $\sigma$. If we introduce a
new continuous random variable $z = \frac{x-\mu}{\sigma}$, then z is distributed by the Standard
Normal Distribution.
Application: Every random variable x distributed by a Normal Distribution can be converted
to a random variable z distributed by the Standard Normal Distribution, and
$P(\alpha \le x \le \beta) = P(\alpha' \le z \le \beta')$
where $\alpha' = \frac{\alpha-\mu}{\sigma}$ and $\beta' = \frac{\beta-\mu}{\sigma}$.
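Example
$\mu = 1$ and $\sigma = \frac{3}{4}$
$P\left(\frac{1}{2} \le x \le \frac{3}{2}\right)$: $\alpha = \frac{1}{2}$, $\beta = \frac{3}{2}$
$z = \frac{x-1}{3/4}$
$P\left(-\frac{2}{3} \le z \le \frac{2}{3}\right)$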
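The conversion in this example can be traced step by step. The sketch below uses exact fractions for the endpoints and then checks numerically that the area for x equals the area for z; using Python's Fraction and NormalDist here is an illustrative choice, not something the slides specify.

```python
from fractions import Fraction
from statistics import NormalDist

mu, sigma = Fraction(1), Fraction(3, 4)
alpha, beta = Fraction(1, 2), Fraction(3, 2)

# Convert the endpoints to z-scores: z = (x - mu) / sigma.
z_low = (alpha - mu) / sigma
z_high = (beta - mu) / sigma
print(z_low, z_high)  # -2/3 2/3

# Area for x under the Normal Distribution with mean 1 and standard deviation 3/4 ...
x_dist = NormalDist(float(mu), float(sigma))
p_x = x_dist.cdf(float(beta)) - x_dist.cdf(float(alpha))

# ... equals the area for z under the Standard Normal Distribution.
z_dist = NormalDist()
p_z = z_dist.cdf(float(z_high)) - z_dist.cdf(float(z_low))
print(p_x, p_z)
```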
The Standard Normal Distribution
We observed in the previous section that every Normal Distribution with mean $\mu$ and
standard deviation $\sigma$ can be converted to a Standard Normal Distribution by the
change of random variable: $z = (x - \mu)/\sigma$.
[Figure: Normal Distribution and Standard Normal Distribution curves]
Section 7.2
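Computing Probabilities with the Standard
Normal Distribution
$P(\alpha \le x \le \beta) = P(\alpha' \le z \le \beta')$, $z = \frac{x-\mu}{\sigma}$, with $\alpha' = \frac{\alpha-\mu}{\sigma}$ and $\beta' = \frac{\beta-\mu}{\sigma}$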
Example
Example: The time between release from prison and conviction for another crime for individuals
under the age of 40 is normally distributed (i.e., the probability of these events is governed
by a Normal Distribution) with a mean of 30 months and a standard deviation of 6 months. Find the
probability that an individual who has been released from prison will be convicted of another crime
within 24 months.
Solution: We want to calculate P(x ≤ 24) with $\mu = 30$ and $\sigma = 6$.
We can use the standard normal distribution by introducing the z-score: z = (x - 30)/6, so when x =
24, z = (24 - 30)/6 = -1. Now P(z ≤ -1) = 0.1587. Hence, about 15.87% of released individuals are
expected to be convicted of another crime within 2 years. Below are the probability density function (PDF) and the cumulative probability
distribution (CPD). Notice that P(x < 0) is approximately zero.
Calculating P(a ≤ z ≤ b) from Tables
$P(z \le -2.6) = 0.0047$ (calculator: 0.0046612218)
$P(z \ge -2.6) = 1 - P(z \le -2.6) = 1 - 0.0047 = 0.9953$
$P(z \le -2.62) = 0.0044$ (calculator: 0.0043965255)
$P(-2.0 \le z \le -1.5) = P(z \le -1.5) - P(z \le -2.0) = 0.0668 - 0.0228 = 0.044$
Inverse Problem: Given the value
of P(z ≤ a), find a
Suppose that we are given the value of P(z ≤ a) i.e., the area under a Standard
Normal curve and we want to determine the value of a.
Methods:
1. Tables
2. Calculator - invNorm
Example: $P(z \le a) = 0.45 \Rightarrow a = -0.1256613$
Inverse Problem: Given the value
of P(-a ≤ z ≤ a), find a
Suppose that we are given the value of P(-a ≤ z ≤ a), i.e., the area under a Standard
Normal curve, and we want to determine the value of a. By symmetry, $P(z \ge a) = P(z \le -a)$, so
$1 = P(z \le -a) + P(-a \le z \le a) + P(z \ge a) = 2P(z \le -a) + P(-a \le z \le a)$
$\Rightarrow P(z \le -a) = \frac{1}{2}\left[1 - P(-a \le z \le a)\right]$ = known number
Example: $P(-a \le z \le a) = 0.8 \Rightarrow P(z \le -a) = \frac{1}{2}(1 - 0.8) = 0.10$
$\Rightarrow -a = -1.281551 \Rightarrow a = 1.281551$
Inverse Problem: Given the value
of P(z > a), find a
Suppose that we are given the value of P(z > a), i.e., the area under a Standard
Normal curve, and we want to determine the value of a.
$P(z > a) = 1 - P(z \le a)$
Example: $0.65 = P(z > a) = 1 - P(z \le a)$
$\Rightarrow P(z \le a) = 1 - 0.65 = 0.35$
$\Rightarrow a = -0.3853204$
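The three inverse problems can be sketched with the inv_cdf method of Python's NormalDist, used here as a stand-in for the calculator's invNorm. The helper a_for_right_tail and the right-tail area 0.65 are illustrative assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Left-tail area given: P(z <= a) = 0.45.
a_left = std_normal.inv_cdf(0.45)

# Central area given: P(-a <= z <= a) = 0.8, so P(z <= -a) = (1 - 0.8) / 2.
a_central = -std_normal.inv_cdf((1 - 0.8) / 2)

def a_for_right_tail(p):
    """Value a with P(z > a) = p, using P(z <= a) = 1 - p (hypothetical helper)."""
    return std_normal.inv_cdf(1 - p)

# Illustrative right-tail area of 0.65, i.e. left-tail area 0.35.
a_right = a_for_right_tail(0.65)

print(round(a_left, 7), round(a_central, 6), round(a_right, 7))
```

A right-tail area larger than one half forces a to be negative, and a left-tail area smaller than one half does the same, which is a useful check on the sign of the answer.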
Applications of the Normal
Distribution
One important application of the Normal Distribution is the following. Suppose a
variable x in a population (e.g., the height of individuals in Math 127A) is
distributed according to a Normal Distribution with mean $\mu$ and standard
deviation $\sigma$. If we consider X to be a continuous random variable, then what is
the probability that a randomly selected individual from the population will
satisfy a ≤ x ≤ b? That is, what is P(a ≤ x ≤ b)?
Remark: We sometimes substitute the word “proportion” for probability. That is,
for what proportion of the population does the variable x lie in the interval
[a,b]?
Section 7.3
Example
The Accreditation Council for Graduate Medical Education found that the average number of hours worked by
medical residents was 81.7 hours per week with a standard deviation of 6.9 hours. Suppose that
we assume that the number of hours per week worked by medical residents is distributed by a
Normal Distribution with $\mu = 81.7$ and $\sigma = 6.9$.
(a) What is the probability that a medical resident will work more than 80 hours per week?
(b) What is the probability that a randomly selected resident will work between 60 and 80 hours
per week?
x = number of hours per week
$\mu = 81.7$ and $\sigma = 6.9 \Rightarrow z = \frac{x-\mu}{\sigma} = \frac{x - 81.7}{6.9}$
(a) $x = 80 \Rightarrow z = \frac{80 - 81.7}{6.9} = \frac{-1.7}{6.9} = -0.246377$
$P(x > 80) = P(z > -0.246377) = 1 - P(z \le -0.246377) = 1 - 0.402695 = 0.597305$
(b) $P(60 \le x \le 80) = P(-3.14493 \le z \le -0.246377)$
$= P(z \le -0.246377) - P(z \le -3.14493) = 0.402695 - 0.00083064 \approx 0.401865$
Example
The Timken Company manufactures ball bearings with a mean diameter of 5 mm. Due to the
manufacturing process there is some variation in the diameters of the ball bearings. It has
been calculated that the diameters are normally distributed with a mean of 5 mm and
a standard deviation of 0.02 mm.
(a) What proportion of the ball bearings have diameters which are greater than 5.03 mm?
(b) Any ball bearing that is smaller than 4.95 mm in diameter or greater than 5.05 mm is discarded.
What proportion of ball bearings is discarded?
(c) In one day, 30,000 ball bearings are manufactured. How many would you expect to be
discarded in a day?
Let X be the continuous random variable that is the diameter of the ball bearings.
(a) $z = \frac{x-\mu}{\sigma} = \frac{x-5}{0.02}$. $P(x > 5.03) = P\left(z > \frac{5.03-5}{0.02}\right) = P(z > 1.5) = 1 - P(z \le 1.5) = 1 - 0.933193 \approx 0.0668072$
(b) $P(x < 4.95 \text{ or } x > 5.05) = P(x < 4.95) + P(x > 5.05) = P(x < 4.95) + 1 - P(x \le 5.05)$
$= P\left(z < \frac{4.95-5}{0.02}\right) + 1 - P\left(z \le \frac{5.05-5}{0.02}\right) = P(z < -2.5) + 1 - P(z \le 2.5) = 0.0124193$
(c) number $= 30000 \times P(x < 4.95 \text{ or } x > 5.05) = 372.58 \approx 373$
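The three parts of the ball bearing example translate into a few lines of code. This is a sketch using Python's NormalDist as one possible tool; the variable names are illustrative.

```python
from statistics import NormalDist

# Diameters: mean 5 mm, standard deviation 0.02 mm.
diameter = NormalDist(mu=5.0, sigma=0.02)

# (a) Proportion with diameter greater than 5.03 mm.
p_over_503 = 1 - diameter.cdf(5.03)

# (b) Proportion discarded: smaller than 4.95 mm or greater than 5.05 mm.
p_discard = diameter.cdf(4.95) + (1 - diameter.cdf(5.05))

# (c) Expected number discarded out of 30,000 manufactured in a day.
expected_discards = 30_000 * p_discard

print(round(p_over_503, 7), round(p_discard, 7), round(expected_discards, 2))
```

Part (c) treats the proportion from part (b) as the chance that any single bearing is discarded, so the expected count is simply the daily output times that proportion.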
Assessing Normality
Suppose that a variable of a population X is distributed according to an unknown
distribution. Is there a way that we can test if this unknown distribution is actually a
Normal Distribution?
One Approach: Take a large finite sample from the population and create a
histogram to see if the histogram has the characteristics of a Normal Distribution i.e.,
it is bell-shaped. However, being bell-shaped does not mean that it is a Normal
Distribution.
Section 7.4
Another Approach
Sample: data $\{x_1, x_2, \ldots, x_n\}$ such that $x_1 \le x_2 \le \ldots \le x_n$.
Index Distribution: $f_i = \frac{i - 0.375}{n + 0.25}$, $i = 1, 2, \ldots, n$. Note that $0 < f_i < 1$.
Normal Score: Find the value $z_i$ such that $f_i = P(z \le z_i)$, $i = 1, 2, \ldots, n$. This is the inverse problem since we are given
$f_i$ and we are asked to find $z_i$. Hence, $f_i$ is a proportion of the total area under the Standard Normal Distribution and we
must determine the value of z (i.e., $z_i$) that produces this proportion (area).
Normal Probability Plot: Plot the bivariate data set
$(x_1, z_1), (x_2, z_2), \ldots, (x_n, z_n)$. If this is approximately a straight line,
then the data is likely to come from a Normal Distribution.
TI-83: NormProbPlot
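The steps above can be sketched in code. The functions normal_scores and normal_plot_points are hypothetical helpers written for illustration; they use inv_cdf from Python's NormalDist to solve the inverse problem for each $f_i$, and the small data list at the end is made up purely to show the output shape.

```python
from statistics import NormalDist

def normal_scores(n):
    """Normal scores z_1, ..., z_n with f_i = (i - 0.375) / (n + 0.25) = P(z <= z_i)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def normal_plot_points(data):
    """Pairs (x_i, z_i) for a normal probability plot: sorted data against normal scores."""
    ordered = sorted(data)
    return list(zip(ordered, normal_scores(len(ordered))))

# Normal scores depend only on the sample size n, not on the data values.
scores = normal_scores(20)
print([round(z, 5) for z in scores[:3]], "...", round(scores[-1], 5))

# Made-up sample, only to show the (x_i, z_i) pairs that would be plotted.
for x, z in normal_plot_points([2.1, 0.4, 1.7, 3.0, 1.2]):
    print(x, round(z, 4))
```

Because the scores depend only on n, two samples of the same size share the same list of normal scores; only the sorted data values differ.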
Example
Data: {0.533226, 2.73637, 2.76095, 2.83428, 2.62008, 1.82784, 1.31128, 1.87577, 0.70117, 3.09077, 2.47481, 2.09632,
2.22858, 2.23172, 1.76795, 0.153967, 1.19405, 2.70018, 1.66897, 0.583992}
Sorted Data: {0.153967, 0.533226, 0.583992, 0.70117, 1.19405, 1.31128, 1.66897, 1.76795, 1.82784, 1.87577, 2.09632,
2.22858, 2.23172, 2.47481, 2.62008, 2.70018, 2.73637, 2.76095, 2.83428, 3.09077}
Normal Scores: {-1.86824, -1.40341, -1.12814, -0.919136, -0.744143, -0.589456, -0.447768, -0.314572, -0.186756, -0.0619316, 0.0619316, 0.186756, 0.314572, 0.447768, 0.589456, 0.744143, 0.919136, 1.12814, 1.40341, 1.86824}
n = 20
Note: Data was generated by a Normal Distribution with $\mu = 2$ and $\sigma = 0.75$.
Example
Data: {-8.21923, -2.74515, -0.386428, -0.677152, 4.02123, -0.826667, 9.17761, 6.45027, -2.31864, 6.53159, 7.68041, -1.54977, -0.988243, 3.35719, 5.98133, 4.44442, 4.03768, 9.3086, 6.4066, -9.51397, -6.42983, 1.88659, -1.5584, 6.85724, -8.2106, -5.36826, 8.82803, -2.46561, -2.23184, 5.45841}
Sorted Data: {-9.51397, -8.21923, -8.2106, -6.42983, -5.36826, -2.74515, -2.46561, -2.31864, -2.23184, -1.5584, -1.54977, -0.988243, -0.826667, -0.677152, -0.386428, 1.88659, 3.35719, 4.02123, 4.03768, 4.44442, 5.45841, 5.98133, 6.4066, 6.45027,
6.53159, 6.85724, 7.68041, 8.82803, 9.17761, 9.3086}
Normal Scores: {-2.04028, -1.60982, -1.36087, -1.17581, -1.02411, -0.892918, -0.775547, -0.668002, -0.567686, -0.472789, -0.381976, -0.294213, -0.208664, -0.124617, -0.0414437, 0.0414437, 0.124617, 0.208664, 0.294213, 0.381976, 0.472789,
0.567686, 0.668002, 0.775547, 0.892918, 1.02411, 1.17581, 1.36087, 1.60982, 2.04028}
n = 30
Note: Data was generated by a Uniform Distribution on the interval [-9,9].
Example
Data: {0.00881683, 0.295109, 2.71993, 0.0275762, 1.15885, 1.01363, 0.295519, 0.639201, 0.602931, 0.446441, 0.0801617,
0.580694, 0.367919, 0.477032, 0.197738, 0.16514, 1.43215, 0.305959, 0.269021, 0.359607}
Sorted Data: {0.00881683, 0.0275762, 0.0801617, 0.16514, 0.197738, 0.269021, 0.295109, 0.295519, 0.305959, 0.359607,
0.367919, 0.446441, 0.477032, 0.580694, 0.602931, 0.639201, 1.01363, 1.15885, 1.43215, 2.71993}
Normal Scores: {-1.86824, -1.40341, -1.12814, -0.919136, -0.744143, -0.589456, -0.447768, -0.314572, -0.186756, -0.0619316, 0.0619316, 0.186756, 0.314572, 0.447768, 0.589456, 0.744143, 0.919136, 1.12814, 1.40341, 1.86824}
n = 20
Note: Data was generated by a non-Normal Distribution.
The Normal Approximation to the Binomial
Probability Distribution
Recall the discrete Binomial Distribution Probability Function: $P(x) = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}$, $x = 0, 1, 2, \ldots, n$
$P(x \le k) = P(0) + P(1) + \cdots + P(k)$
$P(x > k) = P(k+1) + P(k+2) + \cdots + P(n)$
Observation 1: If $np(1-p) \ge 10$, then the Binomial Distribution is "bell-shaped."
Observation 2: If $np(1-p) \ge 10$, then the Binomial Distribution can be approximated by a Normal Distribution
with $\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$.
Section 7.5
Example
According to the Commerce Department in 2004, 20% of U.S. households had some type of
high-speed internet connection (cable, DSL, satellite). Suppose 80 U.S. households are
selected at random. What is the probability that exactly 15 households of the 80 will have a
high-speed internet connection?
x = number of high-speed connections
$P(x) = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}$, $x = 0, 1, 2, \ldots, n$
$n = 80$, $p = 0.20$
$P(x = 15) = \frac{80!}{15!(80-15)!}\, (0.2)^{15} (0.8)^{65}$
$80! \approx 7.15695 \times 10^{118}$
Approximating Normal Distribution
Since $np(1-p) = 12.8 \ge 10$, the normal approximation applies.
$\mu = np = (80)(0.2) = 16$
$\sigma = \sqrt{np(1-p)} = \sqrt{16(1-0.2)} = \sqrt{12.8} = 3.57771$
$P_{\text{binomial}}(x = 15) \approx P_{\text{normal}}(14.5 \le x \le 15.5) = P\left(\frac{14.5-16}{\sqrt{12.8}} \le z \le \frac{15.5-16}{\sqrt{12.8}}\right) = P(-0.419263 \le z \le -0.139754)$
$\Rightarrow P_{\text{binomial}}(x = 15) \approx 0.444427 - 0.337512 = 0.106915$
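The exact binomial probability and its normal approximation can be compared directly. The sketch below uses math.comb for the binomial coefficient and NormalDist for the approximating curve; both are choices made here for illustration, since the slides do not specify software for this step.

```python
import math
from statistics import NormalDist

n, p, k = 80, 0.20, 15

# Exact binomial probability P(x = 15); Python integers handle 80! without overflow.
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Approximating Normal Distribution with mean np and standard deviation sqrt(np(1-p)).
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx_dist = NormalDist(mu, sigma)

# Area over [14.5, 15.5], the interval of width 1 centered at x = 15.
approx = approx_dist.cdf(k + 0.5) - approx_dist.cdf(k - 0.5)

print("exact binomial:", exact)
print("normal approximation:", approx)
```

Using the interval [14.5, 15.5] mirrors the earlier idea of replacing a discrete probability P(15) by the area of a bar of width 1 centered at 15.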