Chapter 1: Statistics

Transcript Chapter 1: Statistics

Probability
The definition – probability of an Event
\[ P[E] = \frac{n(E)}{n(S)} = \frac{n(E)}{N} = \frac{\text{no. of outcomes in } E}{\text{total no. of outcomes}} \]
This definition applies only to the special case when
1. The sample space has a finite number of outcomes, and
2. Each outcome is equally probable.
If either condition fails, a more general
definition of probability is required.
Summary of the Rules of
Probability
The additive rule
\[ P[A \cup B] = P[A] + P[B] - P[A \cap B] \]
and
\[ P[A \cup B] = P[A] + P[B] \quad \text{if } A \cap B = \emptyset \text{ (A and B are mutually exclusive)} \]
The Rule for complements
For any event E,
\[ P[\bar{E}] = 1 - P[E] \]
Conditional probability
\[ P[A \mid B] = \frac{P[A \cap B]}{P[B]} \quad \text{if } P[B] \neq 0 \]
The multiplicative rule of probability
\[ P[A \cap B] = \begin{cases} P[A]\,P[B \mid A] & \text{if } P[A] \neq 0 \\ P[B]\,P[A \mid B] & \text{if } P[B] \neq 0 \end{cases} \]
and
\[ P[A \cap B] = P[A]\,P[B] \]
if A and B are independent.
This is the definition of independence.
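The rules above can be checked by brute-force counting on a small equiprobable sample space. The sketch below is only an illustration: the two-dice sample space and the events A and B are assumptions chosen for the example, not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two dice.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability n(E) / n(S) for an event given as a set of outcomes."""
    return Fraction(len(event & S), len(S))

# Hypothetical events chosen for illustration.
A = {o for o in S if o[0] == 6}      # first die shows 6
B = {o for o in S if sum(o) >= 10}   # total is at least 10

# Additive rule: P[A or B] = P[A] + P[B] - P[A and B]
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Rule for complements: P[not A] = 1 - P[A]
assert prob(S - A) == 1 - prob(A)

# Conditional probability and the multiplicative rule
p_A_given_B = prob(A & B) / prob(B)
assert prob(A & B) == prob(B) * p_A_given_B

# A and B are independent only if P[A and B] = P[A] P[B]
independent = prob(A & B) == prob(A) * prob(B)
print(prob(A), prob(B), prob(A & B), p_A_given_B, independent)
```

Exact fractions are used so that the equalities hold without rounding error.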
Counting techniques
Summary of counting results
Rule 1
\[ n(A_1 \cup A_2 \cup A_3 \cup \dots) = n(A_1) + n(A_2) + n(A_3) + \dots \]
if the sets A1, A2, A3, … are pairwise mutually exclusive
(i.e. \( A_i \cap A_j = \emptyset \) for i ≠ j)
Rule 2
N = n1 n2 = the number of ways that two operations can be
performed in sequence if
n1 = the number of ways the first operation can be
performed
n2 = the number of ways the second operation can be
performed once the first operation has been
completed.
Rule 3
N = n1n2 … nk
= the number of ways the k operations can be
performed in sequence if
n1 = the number of ways the first operation can be
performed
ni = the number of ways the ith operation can be
performed once the first (i - 1) operations have
been completed. i = 2, 3, … , k
Basic counting formulae
1. Orderings
\[ n! = \text{the number of ways you can order } n \text{ objects} \]
2. Permutations
\[ {}_nP_k = \frac{n!}{(n-k)!} = \text{the number of ways that you can choose } k \text{ objects from } n \text{ in a specific order} \]
3. Combinations
\[ \binom{n}{k} = {}_nC_k = \frac{n!}{k!\,(n-k)!} = \text{the number of ways that you can choose } k \text{ objects from } n \]
(order of selection irrelevant)
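As a quick check of the three counting formulae, the sketch below evaluates them with Python's standard `math` module (Python 3.8 or later). The values n = 5 and k = 2 are arbitrary choices for illustration.

```python
import math

n, k = 5, 2  # illustrative values, not from the slides

orderings = math.factorial(n)    # n!
permutations = math.perm(n, k)   # n! / (n - k)!
combinations = math.comb(n, k)   # n! / (k! (n - k)!)

# The library results agree with the formulae written out in factorials.
assert permutations == math.factorial(n) // math.factorial(n - k)
assert combinations == math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Each unordered choice of k objects can be ordered in k! ways.
assert permutations == combinations * math.factorial(k)

print(orderings, permutations, combinations)
```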
Applications to some counting
problems
• The trick is to use the basic counting formulae
together with the Rules
• We will illustrate this with examples
• Counting problems are not easy. The more you practice,
the better your technique becomes.
Random Variables
Numerical quantities whose values are
determined by the outcome of a random
experiment
Random variables are either
• Discrete
– Integer valued
– The set of possible values for X are integers
• Continuous
– The set of possible values for X are all real
numbers
– Range over a continuum.
Examples
• Discrete
– A die is rolled and X = number of spots
showing on the upper face.
– Two dice are rolled and X = Total
number of spots showing on the two
upper faces.
– A coin is tossed n = 100 times and X =
number of times the coin toss resulted in
a head.
– We observe X, the number of hurricanes in the
Caribbean from April 1 to September 30 for a
given year.
Examples
• Continuous
– A person is selected at random from a
population and X = weight of that individual.
– A patient who has received
a kidney transplant is measured for his serum
creatinine level, X, 7 days after transplant.
– A sample of n = 100 individuals is selected
at random from a population (i.e. all samples
of n = 100 have the same probability of being
selected). X = the average weight of the 100
individuals.
The Probability distribution of a
random variable
A mathematical description of the possible
values of the random variable together with
the probabilities of those values
The probability distribution of a
discrete random variable is described by
its probability function p(x).
p(x) = the probability that X takes on
the value x.
This can be given in either a tabular
form or in the form of an equation.
It can also be displayed in a graph.
Example 1
• Discrete
– A die is rolled and X = number of spots
showing on the upper face.
x      1     2     3     4     5     6
p(x)   1/6   1/6   1/6   1/6   1/6   1/6
Formula:
– p(x) = 1/6 if x = 1, 2, 3, 4, 5, 6
Graphs
To plot a graph of p(x), draw bars of height p(x)
above each value of x.
[Figure: bar graph of p(x) for rolling a die, x = 1, …, 6]
Example 2
– Two dice are rolled and X = Total
number of spots showing on the two
upper faces.
x      2     3     4     5     6     7     8     9     10    11    12
p(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Formula:
\[ p(x) = \begin{cases} \dfrac{x-1}{36} & x = 2, 3, 4, 5, 6 \\[1ex] \dfrac{13-x}{36} & x = 7, 8, 9, 10, 11, 12 \end{cases} \]
[Figure: bar graph of p(x) for rolling two dice; there are 36 possible outcomes for rolling two dice]
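The two-dice table can be reproduced by listing all 36 equally likely outcomes and counting how many give each total. This is a minimal sketch; the names `pmf` and `formula` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 outcomes (a, b) give each total a + b.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

def formula(x):
    """Piecewise formula for p(x): (x - 1)/36 for x = 2..6 and (13 - x)/36 for x = 7..12."""
    return Fraction(x - 1, 36) if x <= 6 else Fraction(13 - x, 36)

# The enumeration agrees with the formula for every possible total.
assert all(pmf[x] == formula(x) for x in pmf)

# The probabilities are non-negative and add up to 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1

for x, p in pmf.items():
    print(x, p)
```

Note that `Fraction` prints values in lowest terms, so 2/36 appears as 1/18.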
Comments:
Every probability function must satisfy:
1. The probability assigned to each value of the random
variable must be between 0 and 1, inclusive:
\[ 0 \le p(x) \le 1 \]
2. The sum of the probabilities assigned to all the values
of the random variable must equal 1:
\[ \sum_x p(x) = 1 \]
3. For integer values a ≤ b:
\[ P[a \le X \le b] = \sum_{x=a}^{b} p(x) = p(a) + p(a+1) + \dots + p(b) \]
Example
In baseball the number of individuals, X, on base when a home run
is hit ranges in value from 0 to 3. The probability distribution is
known and is given below:
x      0      1      2      3
p(x)   6/14   4/14   3/14   1/14
Note:
• This chart implies the only values x takes on are 0, 1, 2, and 3.
• If the random variable X is observed repeatedly, the probability p(x)
represents the proportion of times the value x appears in that sequence.
\[ P(\text{the random variable } X \text{ equals } 2) = p(2) = \frac{3}{14} \]
\[ P(\text{the random variable } X \text{ is at least } 2) = p(2) + p(3) = \frac{3}{14} + \frac{1}{14} = \frac{4}{14} \]
A Bar Graph
[Figure: bar graph of p(x) against the number of persons on base when a home run is hit: p(0) = 0.429, p(1) = 0.286, p(2) = 0.214, p(3) = 0.071]
Discrete Random Variables
Discrete Random Variable: A random variable that usually
assumes integer values.
• A discrete random variable assumes values that are isolated points
along the real line. That is, the points immediately neighbouring a
possible value are not themselves possible values.
Note: Usually associated with counting
• The number of times a head occurs in 10 tosses of a coin
• The number of auto accidents occurring on a weekend
• The size of a family
Continuous Random Variables
Continuous Random Variable: A quantitative random variable that
can vary over a continuum
• A continuous random variable can assume any value along a line
interval, including every possible value between any two points on
the line
Note: Usually associated with a measurement
• Blood Pressure
• Weight gain
• Height
Probability Distributions
of Continuous Random Variables
Probability Density Function
The probability distribution of a continuous random
variable is described by a probability density curve f(x).
Notes:
• The total area under the probability density curve is 1.
• The area under the probability density curve from a to
b is P[a < X < b].
Normal Probability Distributions
(Bell shaped curve)
[Figure: bell-shaped density curve centred at μ; the shaded area between a and b is P(a ≤ X ≤ b)]
Mean and Variance (standard deviation) of a Discrete Probability Distribution
• Describe the center and spread of a probability
distribution
• The mean (denoted by the Greek letter μ (mu))
measures the centre of the distribution.
• The variance (σ²) and the standard deviation (σ)
measure the spread of the distribution.
σ (sigma) is the Greek letter for s.
Mean of a Discrete Random Variable
• The mean, μ, of a discrete random variable X is found by
multiplying each possible value of x by its own
probability and then adding all the products together:
\[ \mu = \sum_x x\,p(x) = x_1 p(x_1) + x_2 p(x_2) + \dots + x_k p(x_k) \]
Notes:
• The mean is a weighted average of the values of X.
• The mean is the long-run average value of the random
variable.
• The mean is the centre of gravity of the probability distribution
of the random variable.
[Figure: bar graph of a probability distribution with the mean μ marked as the balance point on the x-axis]
Variance and Standard Deviation
Variance of a Discrete Random Variable: The variance, σ², of a
discrete random variable X is found by multiplying each
possible value of the squared deviation from the mean, (x − μ)²,
by its own probability and then adding all the products
together:
\[ \sigma^2 = \sum_x (x-\mu)^2 p(x) = \sum_x x^2 p(x) - \Big[\sum_x x\,p(x)\Big]^2 = \sum_x x^2 p(x) - \mu^2 \]
Standard Deviation of a Discrete Random Variable: The
positive square root of the variance:
\[ \sigma = \sqrt{\sigma^2} \]
Example
The number of individuals, X, on base when a home run
is hit ranges in value from 0 to 3.
x       p(x)    x p(x)   x²   x² p(x)
0       0.429   0.000    0    0.000
1       0.286   0.286    1    0.286
2       0.214   0.429    4    0.857
3       0.071   0.214    9    0.643
Total   1.000   0.929         1.786
• Computing the mean:
\[ \mu = \sum_x x\,p(x) = 0.929 \]
Note:
• 0.929 is the long-run average value of the random variable
• 0.929 is the centre of gravity of the probability distribution
of the random variable
• Computing the variance:
\[ \sigma^2 = \sum_x (x-\mu)^2 p(x) = \sum_x x^2 p(x) - \Big[\sum_x x\,p(x)\Big]^2 = 1.786 - (0.929)^2 = 0.923 \]
• Computing the standard deviation:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{0.923} = 0.961 \]
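The table calculation for the home-run example can be reproduced in a few lines. This is a minimal sketch using exact fractions for p(x); the helper names `mean` and `variance` are illustrative.

```python
import math
from fractions import Fraction

# Probability function for X = number of individuals on base when a home run is hit.
pmf = {0: Fraction(6, 14), 1: Fraction(4, 14), 2: Fraction(3, 14), 3: Fraction(1, 14)}

def mean(pmf):
    """mu = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """sigma^2 = sum of x^2 * p(x), minus mu^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2

mu = mean(pmf)
var = variance(pmf)
sd = math.sqrt(var)

# The shortcut formula agrees with the definition sum of (x - mu)^2 * p(x).
assert var == sum((x - mu) ** 2 * p for x, p in pmf.items())

print(round(float(mu), 3), round(float(var), 3), round(sd, 3))
```

The rounded results agree with the table: 0.929, 0.923 and 0.961.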
The Binomial distribution
An important discrete distribution
Situation - in which the binomial distribution arises
• We have a random experiment that has two outcomes
– Success (S) and failure (F)
– p = P[S], q = 1 - p = P[F],
• The random experiment is repeated n times
independently
• X = the number of times S occurs in the n repetitions
• Then X has a binomial distribution
Example
• A coin is tossed n = 20 times
– X = the number of heads
– Success (S) = {head}, failure (F) = {tail}
– p = P[S] = 0.50, q = 1 - p = P[F] = 0.50
• An eye operation has an 85% chance of success. It is
performed n = 100 times
– X = the number of successes (S)
– p = P[S] = 0.85, q = 1 - p = P[F] = 0.15
• In a large population 30% support the death penalty.
A sample of n = 50 individuals is selected at random
– X = the number who support the death penalty (S)
– p = P[S] = 0.30, q = 1 - p = P[F] = 0.70
The Binomial distribution
1. We have an experiment with two outcomes –
Success(S) and Failure(F).
2. Let p denote the probability of S (Success).
3. In this case q=1-p denotes the probability of
Failure(F).
4. This experiment is repeated n times
independently.
5. X denotes the number of successes occurring in the
n repetitions.
The possible values of X are
0, 1, 2, 3, 4, … , (n – 2), (n – 1), n
and p(x) for any of the above values of x is
given by:
\[ p(x) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x} p^x q^{n-x} \]
X is said to have the Binomial distribution
with parameters n and p.
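The binomial probability function translates directly into code. The sketch below is one possible implementation using `math.comb` (Python 3.8 or later); the function name `binomial_pmf` and the fair-coin values n = 5, p = 0.5 are choices made for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Illustration: number of heads in n = 5 tosses of a fair coin.
n, p = 5, 0.5
coin = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, px in enumerate(coin):
    print(x, px)

# The probabilities over x = 0, ..., n add up to 1.
assert abs(sum(coin) - 1.0) < 1e-12
```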
Summary:
X is said to have the Binomial distribution with
parameters n and p.
1. X is the number of successes occurring in the n
repetitions of a Success-Failure Experiment.
2. The probability of success is p.
3. The probability function is
\[ p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n \]
Example:
1. A coin is tossed n = 5 times. X is the number of
heads occurring in the 5 tosses of the coin. In
this case p = ½ and
\[ p(x) = \binom{5}{x} \left(\tfrac12\right)^x \left(\tfrac12\right)^{5-x} = \binom{5}{x} \left(\tfrac12\right)^5 = \binom{5}{x} \frac{1}{32} \]
x      0      1      2       3       4      5
p(x)   1/32   5/32   10/32   10/32   5/32   1/32
Note:
\[ \binom{5}{x} = \frac{5!}{x!\,(5-x)!} \]
\[ \binom{5}{0} = \frac{5!}{0!\,5!} = 1, \qquad \binom{5}{1} = \frac{5!}{1!\,4!} = 5, \qquad \binom{5}{2} = \frac{5!}{2!\,3!} = \frac{5 \cdot 4}{2 \cdot 1} = 10 \]
\[ \binom{5}{3} = \frac{5!}{3!\,2!} = \frac{5 \cdot 4}{2 \cdot 1} = 10, \qquad \binom{5}{4} = \frac{5!}{4!\,1!} = 5, \qquad \binom{5}{5} = \frac{5!}{5!\,0!} = 1 \]
[Figure: bar graph of p(x) against the number of heads]
Computing the summary parameters for the
distribution – μ, σ², σ
x       p(x)      x p(x)   x²   x² p(x)
0       0.03125   0.000    0    0.000
1       0.15625   0.156    1    0.156
2       0.31250   0.625    4    1.250
3       0.31250   0.938    9    2.813
4       0.15625   0.625    16   2.500
5       0.03125   0.156    25   0.781
Total   1.000     2.500         7.500
• Computing the mean:
\[ \mu = \sum_x x\,p(x) = 2.5 \]
• Computing the variance:
\[ \sigma^2 = \sum_x (x-\mu)^2 p(x) = \sum_x x^2 p(x) - \Big[\sum_x x\,p(x)\Big]^2 = 7.5 - (2.5)^2 = 1.25 \]
• Computing the standard deviation:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{1.25} = 1.118 \]
Example:
• A surgeon performs a difficult operation n = 10 times.
• X is the number of times that the operation is a success.
• The success rate for the operation is 80%. In this case p = 0.80 and
• X has a Binomial distribution with n = 10 and p = 0.80.
\[ p(x) = \binom{10}{x} (0.80)^x (0.20)^{10-x} \]
Computing p(x) for x = 0, 1, 2, 3, … , 10
x    p(x)      x    p(x)
0    0.0000    6    0.0881
1    0.0000    7    0.2013
2    0.0001    8    0.3020
3    0.0008    9    0.2684
4    0.0055    10   0.1074
5    0.0264
The Graph
[Figure: bar graph of p(x) against the number of successes, x = 0, …, 10]
Computing the summary parameters for the distribution –
μ, σ², σ
x       p(x)     x p(x)   x²    x² p(x)
0       0.0000   0.000    0     0.000
1       0.0000   0.000    1     0.000
2       0.0001   0.000    4     0.000
3       0.0008   0.002    9     0.007
4       0.0055   0.022    16    0.088
5       0.0264   0.132    25    0.661
6       0.0881   0.528    36    3.171
7       0.2013   1.409    49    9.865
8       0.3020   2.416    64    19.327
9       0.2684   2.416    81    21.743
10      0.1074   1.074    100   10.737
Total   1.000    8.000          65.600
• Computing the mean:
\[ \mu = \sum_x x\,p(x) = 8.0 \]
• Computing the variance:
\[ \sigma^2 = \sum_x (x-\mu)^2 p(x) = \sum_x x^2 p(x) - \Big[\sum_x x\,p(x)\Big]^2 = 65.6 - (8.0)^2 = 1.60 \]
• Computing the standard deviation:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{1.60} = 1.265 \]
Notes
• The values of many binomial probabilities are found in the tables
posted on the Stats 245 site.
• The value that is tabulated for n = 1, 2, 3, …, 20; 25 and various values
of p is:
\[ P[X \le c] = \sum_{x=0}^{c} \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{c} p(x) = p(0) + p(1) + p(2) + \dots + p(c) \]
• Hence
\[ p(c) = \text{tabled value for } c - \text{tabled value for } (c-1) \]
• The other table tabulates p(x). Thus when using this
table you will have to sum up the values.
Example
Suppose n = 8 and p = 0.70 and we want to
compute P[X = 5] = p(5)
Table of P[X ≤ c] for n = 8 (columns are values of p):
c    0.05    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    0.95
0    0.663   0.430   0.168   0.058   0.017   0.004   0.001   0.000   0.000   0.000   0.000
1    0.943   0.813   0.503   0.255   0.106   0.035   0.009   0.001   0.000   0.000   0.000
2    0.994   0.962   0.797   0.552   0.315   0.145   0.050   0.011   0.001   0.000   0.000
3    1.000   0.995   0.944   0.806   0.594   0.363   0.174   0.058   0.010   0.000   0.000
4    1.000   1.000   0.990   0.942   0.826   0.637   0.406   0.194   0.056   0.005   0.000
5    1.000   1.000   0.999   0.989   0.950   0.855   0.685   0.448   0.203   0.038   0.006
6    1.000   1.000   1.000   0.999   0.991   0.965   0.894   0.745   0.497   0.187   0.057
7    1.000   1.000   1.000   1.000   0.999   0.996   0.983   0.942   0.832   0.570   0.337
8    1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
Table value for n = 8, p = 0.70 and c = 5 is 0.448 = P[X ≤ 5]
P[X = 5] = p(5) = P[X ≤ 5] - P[X ≤ 4] = 0.448 – 0.194 = 0.254
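The table lookup can be mirrored in code by summing the probability function up to c. This is a minimal sketch; `binomial_pmf` and `binomial_cdf` are illustrative helper names, not functions from any statistics package.

```python
import math

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(c, n, p):
    """Cumulative probability P[X <= c] = p(0) + p(1) + ... + p(c)."""
    return sum(binomial_pmf(x, n, p) for x in range(c + 1))

n, p = 8, 0.70
cdf5 = binomial_cdf(5, n, p)   # tabled value for c = 5
cdf4 = binomial_cdf(4, n, p)   # tabled value for c = 4
p5 = cdf5 - cdf4               # P[X = 5] as a difference of tabled values

# The difference of cumulative values equals the probability function at 5.
assert abs(p5 - binomial_pmf(5, n, p)) < 1e-12

print(round(cdf5, 3), round(cdf4, 3), round(p5, 3))
```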
We can also compute Binomial probabilities using Excel.
The function
=BINOMDIST(x, n, p, FALSE)
will compute p(x).
The function
=BINOMDIST(c, n, p, TRUE)
will compute
\[ P[X \le c] = \sum_{x=0}^{c} \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{c} p(x) = p(0) + p(1) + p(2) + \dots + p(c) \]
Mean, Variance and standard
deviation of
Binomial Random Variables
Mean, Variance & Standard
Deviation of the Binomial
Distribution
• The mean, variance and standard deviation of the
binomial distribution can be found by using the
following three formulas:
1. \( \mu = np \)
2. \( \sigma^2 = npq = np(1-p) \)
3. \( \sigma = \sqrt{npq} = \sqrt{np(1-p)} \)
Example:
Find the mean and standard deviation of the
binomial distribution when n = 20 and p = 0.75
Solutions:
1) n = 20, p = 0.75,
q = 1 - 0.75 = 0.25
\[ \mu = np = (20)(0.75) = 15 \]
\[ \sigma = \sqrt{npq} = \sqrt{(20)(0.75)(0.25)} = \sqrt{3.75} \approx 1.936 \]
2) These values can also be calculated using the probability function:
\[ p(x) = \binom{20}{x} (0.75)^x (0.25)^{20-x} \quad \text{for } x = 0, 1, 2, \dots, 20 \]
Table of probabilities
x       p(x)     x p(x)    x²    x² p(x)
0       0.0000   0.000     0     0.000
1       0.0000   0.000     1     0.000
2       0.0000   0.000     4     0.000
3       0.0000   0.000     9     0.000
4       0.0000   0.000     16    0.000
5       0.0000   0.000     25    0.000
6       0.0000   0.000     36    0.001
7       0.0002   0.001     49    0.008
8       0.0008   0.006     64    0.048
9       0.0030   0.027     81    0.244
10      0.0099   0.099     100   0.992
11      0.0271   0.298     121   3.274
12      0.0609   0.731     144   8.768
13      0.1124   1.461     169   18.997
14      0.1686   2.361     196   33.047
15      0.2023   3.035     225   45.525
16      0.1897   3.035     256   48.559
17      0.1339   2.276     289   38.696
18      0.0669   1.205     324   21.691
19      0.0211   0.402     361   7.632
20      0.0032   0.063     400   1.268
Total   1.000    15.000          228.750
• Computing the mean:
\[ \mu = \sum_x x\,p(x) = 15.0 \]
• Computing the variance:
\[ \sigma^2 = \sum_x (x-\mu)^2 p(x) = \sum_x x^2 p(x) - \Big[\sum_x x\,p(x)\Big]^2 = 228.75 - (15.0)^2 = 3.75 \]
• Computing the standard deviation:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{3.75} = 1.936 \]
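The agreement between the shortcut formulas (np and npq) and the long table calculation can be confirmed numerically. The sketch below assumes the n = 20, p = 0.75 example; variable names are illustrative.

```python
import math

n, p = 20, 0.75
q = 1 - p

# Shortcut formulas for the binomial distribution.
mean_formula = n * p
var_formula = n * p * q
sd_formula = math.sqrt(var_formula)

# Long way: build the probability function and sum over x = 0, ..., n.
pmf = [math.comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
mean_sum = sum(x * px for x, px in enumerate(pmf))
var_sum = sum(x * x * px for x, px in enumerate(pmf)) - mean_sum ** 2

# Both routes agree up to floating-point rounding.
assert abs(mean_sum - mean_formula) < 1e-9
assert abs(var_sum - var_formula) < 1e-9

print(mean_formula, var_formula, round(sd_formula, 3))
```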
Histogram
[Figure: histogram of p(x) against the number of successes, x = 0, …, 20, with the mean μ and standard deviation σ marked]
Normal Probability Distributions
• The normal probability distribution is the most
important distribution in all of statistics
• Many continuous random variables have normal or
approximately normal distributions
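The Normal Probability Distribution
[Figure: normal density curve with points of inflection one standard deviation σ either side of the mean; horizontal axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]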
Main characteristics of the Normal Distribution
• Bell shaped, symmetric
• Points of inflection on the bell shaped curve are
at μ – σ and μ + σ, that is, one standard deviation
from the mean
• Area under the bell shaped curve between μ – σ
and μ + σ is approximately 2/3.
• Area under the bell shaped curve between μ – 2σ
and μ + 2σ is approximately 95%.
There are many Normal distributions,
depending on μ and σ
[Figure: density curves f(x) for three Normal distributions: μ = 100, σ = 20; μ = 100, σ = 40; μ = 140, σ = 20; x from 0 to 200]
The Standard Normal Distribution
μ = 0, σ = 1
[Figure: standard normal density curve, z from −3 to 3]
• There are infinitely many normal probability
distributions (differing in μ and σ)
• Area under the Normal distribution with mean μ and
standard deviation σ can be converted to area under the
standard normal distribution
• If X has a Normal distribution with mean μ and standard
deviation σ, then
\[ z = \frac{X - \mu}{\sigma} \]
has a standard normal distribution.
• z is called the standard score (z-score) of X.
Converting Area
under the Normal distribution with mean μ and
standard deviation σ
to
Area under the standard normal distribution
Perform the z-transformation
\[ z = \frac{X - \mu}{\sigma} \]
then
\[ P[a \le X \le b] = P\left[\frac{a-\mu}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{b-\mu}{\sigma}\right] = P\left[\frac{a-\mu}{\sigma} \le z \le \frac{b-\mu}{\sigma}\right] \]
The left-hand side is an area under the Normal distribution with mean μ
and standard deviation σ; the right-hand side is an area under the
standard normal distribution.
[Figure: the area P[a ≤ X ≤ b] between a and b under the Normal distribution with mean μ and standard deviation σ, and the equal area between (a − μ)/σ and (b − μ)/σ under the standard normal distribution]
Using the tables for the Standard
Normal distribution
Table, posted on the Stats 245 web site
• The table contains the area under the standard
normal curve between -∞ and a specific value of z
Example
Find the area under the standard normal curve between z = -∞
and z = 1.45
• A portion of Table 3:
z      0.00   0.01   0.02   0.03   0.04   0.05     0.06
...
1.4                                       0.9265
...
\[ P(z \le 1.45) = 0.9265 \]
Example
Find the area to the left of -0.98; P(z < -0.98)
\[ P(z < -0.98) = 0.1635 \]
Example
Find the area under the normal curve to the right of z =
1.45; P(z > 1.45)
\[ P(z > 1.45) = 1.0000 - 0.9265 = 0.0735 \]
Example
Find the area between z = 0 and z = 1.45; P(0 < z
< 1.45)
\[ P(0 < z < 1.45) = 0.9265 - 0.5000 = 0.4265 \]
• Area between two points = difference of the two
tabled areas
Notes
• Use the fact that the area above zero and the area
below zero are each 0.5000.
• When finding normal distribution probabilities, a sketch
is always helpful.
Example:
Find the area between the mean (z = 0) and z = -1.26
\[ P(-1.26 < z < 0) = 0.5000 - 0.1038 = 0.3962 \]
Example: Find the area between z = -2.30 and z = 1.80
\[ P(-2.30 < z < 1.80) = 0.9641 - 0.0107 = 0.9534 \]
Example: Find the area between z = -1.40 and z = -0.50
\[ P(-1.40 < z < -0.50) = 0.3085 - 0.0808 = 0.2277 \]
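The table lookups in these examples can be checked with the standard library's `statistics.NormalDist` (Python 3.8 or later), whose `cdf` method returns the area to the left of a value. This is a sketch only; the helper `area_between` is an illustrative name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(lo, hi):
    """Area under the standard normal curve between lo and hi."""
    return Z.cdf(hi) - Z.cdf(lo)

left_of_145 = Z.cdf(1.45)               # area between -infinity and 1.45
left_of_neg098 = Z.cdf(-0.98)           # area to the left of -0.98
right_of_145 = 1 - Z.cdf(1.45)          # area to the right of 1.45
mean_to_145 = area_between(0, 1.45)     # area between the mean and 1.45
mid_area = area_between(-2.30, 1.80)    # area between -2.30 and 1.80
left_tail = area_between(-1.40, -0.50)  # area between -1.40 and -0.50

for value in (left_of_145, left_of_neg098, right_of_145, mean_to_145, mid_area, left_tail):
    print(round(value, 4))
```

Because the table entries are rounded to four decimals before they are subtracted, a difference computed this way can differ from the table-based answer by about 0.0001.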
Computing Areas under the general
Normal Distributions
(mean μ, standard deviation σ)
Approach:
1. Convert the random variable, X, to its z-score.
\[ z = \frac{X - \mu}{\sigma} \]
2. Convert the limits on the random variable, X, to
their z-scores.
3. Convert area under the distribution of X to area
under the standard normal distribution.
\[ P[a \le X \le b] = P\left[\frac{a-\mu}{\sigma} \le z \le \frac{b-\mu}{\sigma}\right] \]
Example 1: Suppose a man aged 40-45 is selected at
random from a population.
• X is the blood pressure of the man.
• X is a random variable.
• Assume that X has a Normal distribution with mean
μ = 180 and a standard deviation σ = 15.
The probability density of X is plotted in the graph
below.
• Suppose that we are interested in the probability
that X is between 170 and 210.
Let
\[ z = \frac{X - \mu}{\sigma} = \frac{X - 180}{15} \]
\[ a = \frac{170 - \mu}{\sigma} = \frac{170 - 180}{15} = -0.667 \]
\[ b = \frac{210 - \mu}{\sigma} = \frac{210 - 180}{15} = 2.000 \]
Hence
\[ P[170 \le X \le 210] = P[-0.667 \le z \le 2.000] \]
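The three-step approach can be sketched in code for this blood pressure example. The final numerical value is not stated on the slide, so the figure printed below is computed here rather than quoted; `statistics.NormalDist` is assumed to be available (Python 3.8 or later).

```python
from statistics import NormalDist

mu, sigma = 180, 15
a, b = 170, 210

# Steps 1 and 2: convert the limits on X to z-scores.
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma

# Step 3: area under the standard normal curve between the z-scores.
Z = NormalDist()
p_via_z = Z.cdf(z_b) - Z.cdf(z_a)

# Cross-check: the same area computed directly on the distribution of X.
X = NormalDist(mu=mu, sigma=sigma)
p_direct = X.cdf(b) - X.cdf(a)

assert abs(p_via_z - p_direct) < 1e-12
print(round(z_a, 3), round(z_b, 3), round(p_via_z, 4))
```

Under these assumptions the probability comes out at roughly 0.72.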
Example 2
A bottling machine is adjusted to fill bottles with a
mean of 32.0 oz of soda and standard deviation of
0.02. Assume the amount of fill is normally
distributed and a bottle is selected at random:
1) Find the probability the bottle contains
between 32.00 oz and 32.025 oz
2) Find the probability the bottle contains more
than 31.97 oz
Solution part 1)
When x = 32.00
\[ z = \frac{32.00 - \mu}{\sigma} = \frac{32.00 - 32.0}{0.02} = 0.00 \]
When x = 32.025
\[ z = \frac{32.025 - \mu}{\sigma} = \frac{32.025 - 32.0}{0.02} = 1.25 \]
Graphical Illustration:
[Figure: area asked for, between x = 32.0 (z = 0) and x = 32.025 (z = 1.25)]
\[ P(32.0 < X < 32.025) = P\left(\frac{32.0 - 32.0}{0.02} < \frac{X - 32.0}{0.02} < \frac{32.025 - 32.0}{0.02}\right) = P(0 < z < 1.25) = 0.3944 \]
Example 2, Part 2)
[Figure: area asked for, to the right of x = 31.97 (z = −1.50)]
\[ P(X > 31.97) = P\left(\frac{X - 32.0}{0.02} > \frac{31.97 - 32.0}{0.02}\right) = P(z > -1.50) = 1.0000 - 0.0668 = 0.9332 \]
Summary
Random Variables
Numerical quantities whose values are
determined by the outcome of a random
experiment
Types of Random Variables
• Discrete: possible values are integers
• Continuous: possible values vary over a continuum
The Probability distribution of a
random variable
A mathematical description of the possible
values of the random variable together with
the probabilities of those values
The probability distribution of a
discrete random variable is described
by its probability function p(x).
p(x) = the probability that X takes on
the value x.
[Figure: bar graph of p(x) against the number of successes, x = 0, …, 10]
The Binomial distribution
X is said to have the Binomial distribution with
parameters n and p.
1. X is the number of successes occurring in the n
repetitions of a Success-Failure Experiment.
2. The probability of success is p.
3. The probability function is
\[ p(x) = \binom{n}{x} p^x (1-p)^{n-x} \]
Probability Distributions
of Continuous Random Variables
Probability Density Function
The probability distribution of a continuous random
variable is described by a probability density curve f(x).
Notes:
• The total area under the probability density curve is 1.
• The area under the probability density curve from a to
b is P[a < X < b].
The Normal Probability Distribution
[Figure: normal density curve with points of inflection at μ − σ and μ + σ; horizontal axis marked from μ − 3σ to μ + 3σ]
Normal approximation to the
Binomial distribution
Using the Normal distribution to calculate
Binomial probabilities
[Figure: Binomial distribution with n = 20, p = 0.70, overlaid with the approximating Normal distribution, μ = np = 14, σ = √(npq) = 2.049]
Normal Approximation to the
Binomial distribution
\[ P[X = a] \approx P\left[a - \tfrac12 \le Y \le a + \tfrac12\right] \]
• X has a Binomial distribution with
parameters n and p
• Y has a Normal distribution with
\[ \mu = np, \qquad \sigma = \sqrt{npq} \]
• The ½ is the continuity correction
[Figure: P[X = a], the bar of the binomial distribution at a, compared with P[a − ½ ≤ Y ≤ a + ½], the area under the approximating Normal curve between a − ½ and a + ½]
Example
• X has a Binomial distribution with
parameters n = 20 and p = 0.70
We want P[X = 13]
The exact value:
\[ P[X = 13] = \binom{20}{13} (0.70)^{13} (0.30)^{7} = 0.1643 \]
Using the Normal approximation to the
Binomial distribution
\[ P[X = 13] \approx P\left[12\tfrac12 \le Y \le 13\tfrac12\right] \]
where Y has a Normal distribution with:
\[ \mu = np = 20(0.70) = 14 \]
\[ \sigma = \sqrt{npq} = \sqrt{20(0.70)(0.30)} = 2.049 \]
Hence
\[ P[12.5 \le Y \le 13.5] = P\left[\frac{12.5 - 14}{2.049} \le \frac{Y - 14}{2.049} \le \frac{13.5 - 14}{2.049}\right] = P[-0.73 \le Z \le -0.24] = 0.4052 - 0.2327 = 0.1725 \]
Compare with 0.1643
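The comparison above can be reproduced with a short script. This is a sketch under the stated setup (n = 20, p = 0.70, a = 13); because the code does not round the z-scores to two decimals, the approximation it prints is slightly different from the table-based 0.1725.

```python
import math
from statistics import NormalDist

n, p, a = 20, 0.70, 13
q = 1 - p

# Exact binomial probability P[X = a].
exact = math.comb(n, a) * p ** a * q ** (n - a)

# Approximating Normal distribution with mean np and standard deviation sqrt(npq).
Y = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))

# Continuity correction: P[X = a] is approximated by P[a - 1/2 <= Y <= a + 1/2].
approx = Y.cdf(a + 0.5) - Y.cdf(a - 0.5)

print(round(exact, 4), round(approx, 4))
```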
Normal Approximation to the
Binomial distribution
\[ P[a \le X \le b] = p(a) + p(a+1) + \dots + p(b) \approx P\left[a - \tfrac12 \le Y \le b + \tfrac12\right] \]
• X has a Binomial distribution with
parameters n and p
• Y has a Normal distribution with
\[ \mu = np, \qquad \sigma = \sqrt{npq} \]
• The ½ is the continuity correction
[Figure: P[a ≤ X ≤ b], the bars of the binomial distribution from a to b, compared with P[a − ½ ≤ Y ≤ b + ½], the area under the approximating Normal curve between a − ½ and b + ½]
Example
• X has a Binomial distribution with
parameters n = 20 and p = 0.70
We want P[11 ≤ X ≤ 14]
The exact value:
\[ P[11 \le X \le 14] = p(11) + p(12) + p(13) + p(14) = \binom{20}{11} (0.70)^{11} (0.30)^{9} + \dots + \binom{20}{14} (0.70)^{14} (0.30)^{6} \]
\[ = 0.0654 + 0.1144 + 0.1643 + 0.1916 = 0.5357 \]
Using the Normal approximation to the
Binomial distribution
\[ P[11 \le X \le 14] \approx P\left[10\tfrac12 \le Y \le 14\tfrac12\right] \]
where Y has a Normal distribution with:
\[ \mu = np = 20(0.70) = 14 \]
\[ \sigma = \sqrt{npq} = \sqrt{20(0.70)(0.30)} = 2.049 \]
Hence
\[ P[10.5 \le Y \le 14.5] = P\left[\frac{10.5 - 14}{2.049} \le \frac{Y - 14}{2.049} \le \frac{14.5 - 14}{2.049}\right] = P[-1.71 \le Z \le 0.24] = 0.5948 - 0.0436 = 0.5512 \]
Compare with 0.5357
Comment:
• The accuracy of the normal
approximation to the binomial
increases with increasing values of n
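One way to see this comment in numbers is to measure the largest gap between p(x) and its continuity-corrected normal approximation for several values of n. The sketch below is illustrative only: the choice p = 0.70, the sample sizes 20, 80 and 320, and the "largest pointwise error" measure are assumptions made for this example.

```python
import math
from statistics import NormalDist

def max_point_error(n, p):
    """Largest |p(x) - P[x - 1/2 <= Y <= x + 1/2]| over x = 0, ..., n."""
    q = 1 - p
    Y = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
    worst = 0.0
    for x in range(n + 1):
        exact = math.comb(n, x) * p ** x * q ** (n - x)
        approx = Y.cdf(x + 0.5) - Y.cdf(x - 0.5)
        worst = max(worst, abs(exact - approx))
    return worst

errors = {n: max_point_error(n, 0.70) for n in (20, 80, 320)}
for n, err in errors.items():
    print(n, err)
```

The printed errors shrink as n grows, which is the behaviour the comment describes.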
Example
• The success rate for an eye operation is 85%
• The operation is performed n = 2000 times
Find the probability that
1. The number of successful operations is
between 1680 and 1720.
2. The number of successful operations is at
most 1800.
Solution
• X has a Binomial distribution with
parameters n = 2000 and p = 0.85
We want
\[ P[1680 \le X \le 1720] \approx P[1679.5 \le Y \le 1720.5] \]
where Y has a Normal distribution with:
\[ \mu = np = 2000(0.85) = 1700 \]
\[ \sigma = \sqrt{npq} = \sqrt{2000(0.85)(0.15)} = 15.969 \]
Hence
\[ P[1679.5 \le Y \le 1720.5] = P\left[\frac{1679.5 - 1700}{15.969} \le \frac{Y - 1700}{15.969} \le \frac{1720.5 - 1700}{15.969}\right] = P[-1.28 \le Z \le 1.28] = 0.9004 - 0.0996 = 0.8008 \]
Solution – part 2.
We want
\[ P[X \le 1800] \approx P[Y \le 1800.5] = P\left[\frac{Y - 1700}{15.969} \le \frac{1800.5 - 1700}{15.969}\right] = P[Z \le 6.29] = 1.000 \]
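For a sample this large the exact binomial sum is awkward to evaluate directly, because the binomial coefficients are enormous while the powers of p and q are tiny. The sketch below works around this by computing each term on the log scale with `math.lgamma`; that choice, and the helper name `log_binomial_pmf`, are assumptions made for illustration, and the interval 1680 to 1720 is the one used in the solution.

```python
import math
from statistics import NormalDist

n, p = 2000, 0.85
q = 1 - p

def log_binomial_pmf(x, n, p):
    """log p(x), using log-gamma to avoid overflow in the binomial coefficient."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_coeff + x * math.log(p) + (n - x) * math.log(1 - p)

# Exact value of P[1680 <= X <= 1720] as a sum of 41 binomial terms.
exact = sum(math.exp(log_binomial_pmf(x, n, p)) for x in range(1680, 1721))

# Normal approximation with continuity correction.
Y = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
approx_part1 = Y.cdf(1720.5) - Y.cdf(1679.5)
approx_part2 = Y.cdf(1800.5)   # P[X <= 1800]

print(round(exact, 4), round(approx_part1, 4), round(approx_part2, 4))
```

With n this large the approximation and the exact sum agree closely, and the probability of at most 1800 successes is 1 to the displayed precision.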
Next topic: Sampling Theory
