Statistics for Managers Using Microsoft Excel, 3/e


Statistics for Managers
Using Microsoft Excel
3rd Edition
Chapter 5
The Normal Distribution and
Sampling Distributions
Chapter Topics

The normal distribution

The standardized normal distribution

Evaluating the normality assumption

The exponential distribution
Chapter Topics (continued)

Introduction to sampling distribution

Sampling distribution of the mean

Sampling distribution of the proportion

Sampling from finite population
Continuous Probability
Distributions

Continuous random variable
    Values from interval of numbers
    Absence of gaps

Continuous probability distribution
    Distribution of continuous random variable

Most important continuous probability distribution
    The normal distribution
The Normal Distribution

“Bell shaped”
Symmetrical
Mean, median and mode are equal
Interquartile range equals 1.33 $\sigma$
Random variable has infinite range

[Figure: bell-shaped density f(X) against X, with the mean, median and mode marked at the center]
The Mathematical Model

$$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(X-\mu)^2}$$

$f(X)$ : density of random variable $X$
$\pi \approx 3.14159$; $e \approx 2.71828$
$\mu$ : population mean
$\sigma$ : population standard deviation
$X$ : value of random variable $(-\infty < X < \infty)$
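The density formula can be evaluated directly. The sketch below is only an illustration: the function name `normal_density` and the values mu = 5 and sigma = 10 are assumptions chosen for the example, and the result is compared with Python's standard-library `statistics.NormalDist` as a sanity check.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Density f(x) of a normal distribution with mean mu and std dev sigma."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Illustrative values only (assumed for this sketch): mu = 5, sigma = 10
peak = normal_density(5, mu=5, sigma=10)
library_peak = NormalDist(mu=5, sigma=10).pdf(5)
print(peak, library_peak)
```

The density is highest at the mean and takes the same value at points equally far above and below it, which matches the "bell shaped" and "symmetrical" properties.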
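Expectation

$$E(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x\, e^{-(x-\mu)^2/2\sigma^2}\, dx$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x-\mu)\, e^{-(x-\mu)^2/2\sigma^2}\, d(x-\mu) + \mu\, \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\, dx$$

$$= 0 + \mu = \mu$$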
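Variance

$$E\left[(X-\mu)^2\right] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x-\mu)^2\, e^{-(x-\mu)^2/2\sigma^2}\, dx$$

$$= \sigma^2\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left(\frac{x-\mu}{\sigma}\right)^2 e^{-\left(\frac{x-\mu}{\sigma}\right)^2/2}\, d\!\left(\frac{x-\mu}{\sigma}\right)$$

$$= \sigma^2\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2\, e^{-y^2/2}\, dy = \sigma^2$$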
Many Normal Distributions
There are an infinite number of normal distributions
By varying the parameters $\sigma$ and $\mu$, we
obtain different normal distributions
Finding Probabilities
Probability is the area under the curve!
$P(c \le X \le d) = ?$
[Figure: density f(X) against X, with the area between c and d shaded]
Which Table to Use?
An infinite number of normal distributions
means an infinite number of tables to look up!
Solution: The Cumulative
Standardized Normal Distribution
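Cumulative Standardized Normal
Distribution Table (Portion)

$\mu_Z = 0$, $\sigma_Z = 1$

  Z      .00     .01     .02
 0.0    .5000   .5040   .5080
 0.1    .5398   .5438   .5478
 0.2    .5793   .5832   .5871
 0.3    .6179   .6217   .6255

Table entries are probabilities: the area to the left of Z = 0.12 is .5478
[Figure: standardized normal curve centered at 0 with the area to the left of Z = 0.12 shaded; shaded area exaggerated]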
Only One Table is Needed
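Standardizing Example

$$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$$

Normal Distribution: $\mu = 5$, $\sigma = 10$, $X = 6.2$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, $Z = 0.12$
[Figure: the two curves side by side; shaded area exaggerated]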
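Example:
$P(2.9 \le X \le 7.1) = .1664$

$$Z = \frac{X - \mu}{\sigma} = \frac{2.9 - 5}{10} = -.21 \qquad Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21$$

Normal Distribution: $\mu = 5$, $\sigma = 10$; area .0832 on each side of the mean, between 2.9 and 7.1
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$; area .0832 on each side of 0, between $-0.21$ and $0.21$
[Figure: the two curves side by side; shaded area exaggerated]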
Example:
$P(2.9 \le X \le 7.1) = .1664$ (continued)
Cumulative Standardized Normal
Distribution Table (Portion)

$\mu_Z = 0$, $\sigma_Z = 1$

  Z      .00     .01     .02
 0.0    .5000   .5040   .5080
 0.1    .5398   .5438   .5478
 0.2    .5793   .5832   .5871
 0.3    .6179   .6217   .6255

The area to the left of Z = 0.21 is .5832
[Figure: standardized normal curve with the area to the left of Z = 0.21 shaded; shaded area exaggerated]
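Example:
$P(2.9 \le X \le 7.1) = .1664$ (continued)
Cumulative Standardized Normal
Distribution Table (Portion)

$\mu_Z = 0$, $\sigma_Z = 1$

  Z      .00     .01     .02
-0.3    .3821   .3783   .3745
-0.2    .4207   .4168   .4129
-0.1    .4602   .4562   .4522
 0.0    .5000   .4960   .4920

The area to the left of Z = -0.21 is .4168, so $P(2.9 \le X \le 7.1) = .5832 - .4168 = .1664$
[Figure: standardized normal curve with the area to the left of Z = -0.21 shaded; shaded area exaggerated]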
Normal Distribution in PHStat


PHStat | Probability & Prob. Distributions |
Normal …
Example in Excel spreadsheet
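Outside Excel, the same calculation can be sketched in a few lines. This is only an illustration, not the PHStat procedure: it assumes Python's standard-library `statistics.NormalDist`, and the helper name `normal_interval_probability` is made up for the example. It reproduces the probability that X lies between 2.9 and 7.1 when the mean is 5 and the standard deviation is 10.

```python
from statistics import NormalDist

def normal_interval_probability(lower, upper, mu, sigma):
    """P(lower <= X <= upper) for a normal X, as a difference of cumulative areas."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Standardize both limits: Z = (X - mu) / sigma
z_low = (2.9 - 5) / 10
z_high = (7.1 - 5) / 10

p_interval = normal_interval_probability(2.9, 7.1, mu=5, sigma=10)
print(round(z_low, 2), round(z_high, 2), round(p_interval, 4))
```

Because the printed table rounds each entry to four decimals, the software result can differ from the table answer of .1664 in the last digit.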
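Example:
$P(X \ge 8) = .3821$

$$Z = \frac{X - \mu}{\sigma} = \frac{8 - 5}{10} = .30$$

Normal Distribution: $\mu = 5$, $\sigma = 10$; area .3821 to the right of $X = 8$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$; area .3821 to the right of $Z = 0.30$
[Figure: the two curves side by side; shaded area exaggerated]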
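Example:
$P(X \ge 8) = .3821$ (continued)
Cumulative Standardized Normal
Distribution Table (Portion)

$\mu_Z = 0$, $\sigma_Z = 1$

  Z      .00     .01     .02
 0.0    .5000   .5040   .5080
 0.1    .5398   .5438   .5478
 0.2    .5793   .5832   .5871
 0.3    .6179   .6217   .6255

The area to the left of Z = 0.30 is .6179, so $P(X \ge 8) = 1 - .6179 = .3821$
[Figure: standardized normal curve with the area to the left of Z = 0.30 shaded; shaded area exaggerated]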
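An upper-tail probability is one minus the cumulative area. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name `upper_tail_probability` is hypothetical and used only for this illustration of the example with mean 5 and standard deviation 10.

```python
from statistics import NormalDist

def upper_tail_probability(x, mu, sigma):
    """P(X >= x) for a normal X: the complement of the cumulative area at x."""
    return 1.0 - NormalDist(mu=mu, sigma=sigma).cdf(x)

p_upper = upper_tail_probability(8, mu=5, sigma=10)
print(round(p_upper, 4))
```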
Finding Z Values
for Known Probabilities
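What is Z Given
Probability = 0.1217 ?

$\mu_Z = 0$, $\sigma_Z = 1$

Cumulative Standardized
Normal Distribution Table
(Portion)

  Z      .00     .01     .02
 0.0    .5000   .5040   .5080
 0.1    .5398   .5438   .5478
 0.2    .5793   .5832   .5871
 0.3    .6179   .6217   .6255

The area between 0 and Z is .1217, so the cumulative area is .5000 + .1217 = .6217, which the table gives for $Z = .31$
[Figure: standardized normal curve with the area to the left of Z = .31 shaded; shaded area exaggerated]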
Recovering X Values
for Known Probabilities
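Normal Distribution: $\mu = 5$, $\sigma = 10$, $X = ?$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, $Z = 0.30$
Area between the mean and the unknown value: .1179; area above it: .3821

$$X = \mu + Z\sigma = 5 + (.30)(10) = 8$$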
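Both lookups can be reversed in software. The sketch below assumes Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method returns the Z value for a given cumulative area; `recover_x` is a hypothetical helper name for the back-transformation from Z to X.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Z value whose cumulative area is .6217 (that is, .5000 + .1217)
z_for_area = standard_normal.inv_cdf(0.6217)

def recover_x(z_value, mu, sigma):
    """Undo the standardization: X = mu + Z * sigma."""
    return mu + z_value * sigma

x_value = recover_x(0.30, mu=5, sigma=10)
print(round(z_for_area, 2), x_value)
```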
Assessing Normality


Not all continuous random variables are
normally distributed
It is important to evaluate how well the data
set is approximated by a normal distribution
Assessing Normality (continued)

Construct charts
    For small- or moderate-sized data sets, do the stem-and-leaf display and box-and-whisker plot look symmetric?
    For large data sets, does the histogram or polygon appear bell-shaped?

Compute descriptive summary measures
    Do the mean, median and mode have similar values?
    Is the interquartile range approximately 1.33 $\sigma$?
    Is the range approximately 6 $\sigma$?
Assessing Normality (continued)

Observe the distribution of the data set
    Do approximately 2/3 of the observations lie between mean $\pm$ 1 standard deviation?
    Do approximately 4/5 of the observations lie between mean $\pm$ 1.28 standard deviations?
    Do approximately 19/20 of the observations lie between mean $\pm$ 2 standard deviations?

Evaluate normal probability plot
    Do the points lie on or close to a straight line with positive slope?
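The fractions in these checks come from areas under the standardized normal curve. As a rough illustration, the sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name `share_within` is hypothetical. It computes the share of a normal population within 1, 1.28 and 2 standard deviations of the mean, and the interquartile range in units of the standard deviation.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Interquartile range of the standardized normal, in standard deviations
iqr_in_sigmas = standard_normal.inv_cdf(0.75) - standard_normal.inv_cdf(0.25)

for k in (1, 1.28, 2):
    print(k, round(share_within(k), 4))
print(round(iqr_in_sigmas, 2))
```

The computed shares are close to 2/3, 4/5 and 19/20. The computed interquartile range is about 1.35 standard deviations, slightly above the 1.33 quoted on the slides.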
Assessing Normality (continued)

Normal probability plot
    Arrange data into ordered array
    Find corresponding standardized normal quantile values
    Plot the pairs of points with observed data values on the vertical axis and the standardized normal quantile values on the horizontal axis
    Evaluate the plot for evidence of linearity
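The steps for building the plot can be sketched as follows. The slides do not say which cumulative area to assign to each ordered value, so this sketch assumes rank / (n + 1), one common convention. The data values are made up for illustration, and `normal_probability_pairs` is a hypothetical helper name.

```python
from statistics import NormalDist

def normal_probability_pairs(data):
    """Return (standardized normal quantile, observed value) pairs for plotting."""
    ordered = sorted(data)  # step 1: ordered array
    n = len(ordered)
    standard_normal = NormalDist()
    pairs = []
    for rank, value in enumerate(ordered, start=1):
        # Assumption: the rank-th smallest value gets cumulative area rank / (n + 1)
        quantile = standard_normal.inv_cdf(rank / (n + 1))
        pairs.append((quantile, value))
    return pairs

sample = [62, 48, 71, 55, 66, 59, 52]  # made-up illustrative data
pairs = normal_probability_pairs(sample)
for quantile, value in pairs:
    # horizontal-axis coordinate (Z), then vertical-axis coordinate (X)
    print(f"{quantile:6.2f}  {value}")
```

Plotting the second element of each pair against the first gives the normal probability plot. Points that fall close to a rising straight line are consistent with normality.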
Assessing Normality (continued)

Normal Probability Plot for Normal Distribution
[Figure: X (30 to 90) on the vertical axis against Z (-2 to 2) on the horizontal axis. Look for Straight Line!]

Normal Probability Plot
[Figure: four panels titled Left-Skewed, Right-Skewed, Rectangular and U-Shaped, each plotting X (30 to 90) against Z (-2 to 2)]
Exponential Distributions

$$P(\text{arrival time} < X) = 1 - e^{-\lambda X}$$

$X$ : any value of continuous random variable
$\lambda$ : the population average number of arrivals per unit of time
$1/\lambda$ : average time between arrivals
$e \approx 2.71828$
e.g.: Drivers Arriving at a Toll Bridge;
Customers Arriving at an ATM
Exponential Distributions (continued)

Describes time or distance between events
    Used for queues

Density function
    $f(x) = \lambda e^{-\lambda x}$

Parameters
    $\mu = 1/\lambda$, $\sigma = 1/\lambda$

[Figure: density f(X) against X for $\lambda = 0.5$ and $\lambda = 2.0$]
Example
e.g.: Customers arrive at the checkout line of a supermarket at the rate of 30 per hour. What is the probability that the time between consecutive customers is greater than five minutes?

$\lambda = 30$, $X = 5/60$ hours

$$P(\text{arrival time} > X) = 1 - P(\text{arrival time} \le X) = 1 - \left(1 - e^{-30(5/60)}\right) = .0821$$
Exponential Distribution
in PHStat


PHStat | Probability & Prob. Distributions |
Exponential
Example in Excel spreadsheet
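The supermarket example (30 arrivals per hour, a gap longer than five minutes) can also be checked with a short script. This is a sketch using only Python's `math` module, not the PHStat procedure, and the function and variable names are made up for the illustration.

```python
import math

def exponential_cdf(x, rate):
    """P(arrival time <= x) when arrivals occur at the given average rate."""
    return 1.0 - math.exp(-rate * x)

rate_per_hour = 30              # average number of arrivals per hour
five_minutes_in_hours = 5 / 60  # the rate is per hour, so express X in hours

# P(arrival time > X) = 1 - P(arrival time <= X)
p_longer = 1.0 - exponential_cdf(five_minutes_in_hours, rate_per_hour)
print(round(p_longer, 4))
```

The time X has to be in the same unit as the rate, which is why five minutes is entered as 5/60 hours.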
Why Study
Sampling Distributions


Sample statistics are used to estimate population parameters
    e.g.: $\bar{X} = 50$ estimates the population mean $\mu$

Problems: different samples provide different estimates
    Large samples give better estimates; large samples cost more
    How good is the estimate?

Approach to solution: theoretical basis is sampling distribution
Sampling Distribution


Theoretical probability distribution of a
sample statistic
Sample statistic is a random variable


Sample mean, sample proportion
Results from taking all possible samples
of the same size
Developing Sampling
Distributions

Assume there is a population …
    Population size N = 4
    Random variable, X, is age of individuals
    Values of X: 18, 20, 22, 24 measured in years

[Figure: four individuals labeled A, B, C and D]
Developing Sampling
Distributions (continued)
Summary Measures for the Population Distribution

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = 2.236$$

[Figure: Uniform Distribution; P(X) against X for A (18), B (20), C (22) and D (24)]
Developing Sampling
Distributions (continued)
All Possible Samples of Size n=2

16 Samples Taken with Replacement

 1st            2nd Observation
 Obs      18      20      22      24
 18     18,18   18,20   18,22   18,24
 20     20,18   20,20   20,22   20,24
 22     22,18   22,20   22,22   22,24
 24     24,18   24,20   24,22   24,24

16 Sample Means

 1st      2nd Observation
 Obs     18   20   22   24
 18      18   19   20   21
 20      19   20   21   22
 22      20   21   22   23
 24      21   22   23   24
Developing Sampling
Distributions (continued)
Sampling Distribution of All Sample Means

16 Sample Means

 1st      2nd Observation
 Obs     18   20   22   24
 18      18   19   20   21
 20      19   20   21   22
 22      20   21   22   23
 24      21   22   23   24

Sample Means Distribution
[Figure: $P(\bar{X})$ against $\bar{X}$ for the values 18, 19, 20, 21, 22, 23 and 24]
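Developing Sampling
Distributions (continued)
Summary Measures of Sampling Distribution

$$\mu_{\bar{X}} = \frac{\sum_{i=1}^{N} \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum_{i=1}^{N} (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$

Here N = 16 is the number of sample means, not the population size.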
Comparing the Population with
its Sampling Distribution

Population: N = 4, $\mu = 21$, $\sigma = 2.236$
[Figure: P(X) against X for A (18), B (20), C (22) and D (24)]

Sample Means Distribution: n = 2, $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: $P(\bar{X})$ against $\bar{X}$ for the values 18, 19, 20, 21, 22, 23 and 24]
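The comparison can be reproduced by enumerating every sample. The sketch below assumes Python's standard-library `itertools.product` to list all 16 ordered samples of size 2 drawn with replacement from the ages 18, 20, 22 and 24, and `statistics.pstdev` for the population-style standard deviation (dividing by N, as in the formulas on the slides). The variable names are chosen only for this illustration.

```python
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]  # ages of individuals A, B, C, D
n = 2                          # sample size

# All ordered samples of size n drawn with replacement
samples = list(product(population, repeat=n))
sample_means = [mean(sample) for sample in samples]

population_mean = mean(population)
population_sd = pstdev(population)       # divides by N, not N - 1
mean_of_means = mean(sample_means)
standard_error = pstdev(sample_means)    # std dev of the 16 sample means

print(len(samples), population_mean, round(population_sd, 3))
print(mean_of_means, round(standard_error, 2))
```

The mean of the sample means equals the population mean, and their standard deviation equals the population standard deviation divided by the square root of n.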
Properties of Summary Measures

$\mu_{\bar{X}} = \mu$
    I.e., $\bar{X}$ is unbiased

Standard error (standard deviation) of the sampling distribution, $\sigma_{\bar{X}}$, is less than the standard error of other unbiased estimators

For sampling with replacement:
    $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
    As n increases, $\sigma_{\bar{X}}$ decreases
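Unbiasedness
[Figure: two sampling distributions, one labeled Unbiased and one labeled Biased, with $\mu$ marked on the horizontal axis]

Less Variability
[Figure: Sampling Distribution of Median and Sampling Distribution of Mean, with $\mu$ marked on the horizontal axis]

Effect of Large Sample
[Figure: sampling distributions for a larger sample size and a smaller sample size, with $\mu$ marked on the horizontal axis]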
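When the Population is Normal

Population Distribution: $\mu = 50$, $\sigma = 10$

Central Tendency
    $\mu_{\bar{X}} = \mu$

Variation (Sampling with Replacement)
    $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

Sampling Distributions: $\mu_{\bar{X}} = 50$
    $n = 4$: $\sigma_{\bar{X}} = 5$
    $n = 16$: $\sigma_{\bar{X}} = 2.5$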
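When the Population
is Not Normal

Population Distribution: $\mu = 50$, $\sigma = 10$

Central Tendency
    $\mu_{\bar{X}} = \mu$

Variation (Sampling with Replacement)
    $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

Sampling Distributions: $\mu_{\bar{X}} = 50$
    $n = 4$: $\sigma_{\bar{X}} = 5$
    $n = 30$: $\sigma_{\bar{X}} = 1.8$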
Central Limit Theorem
As sample size gets large enough… the sampling distribution becomes almost normal regardless of shape of population
[Figure: sampling distribution of $\bar{X}$]
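A small simulation can make the theorem concrete. The sketch below is an assumption-laden illustration rather than part of the slides: it draws samples of size 30 from a right-skewed exponential population with mean 1 and standard deviation 1, using Python's standard-library `random` module with a fixed seed, and summarizes the resulting sample means. The function name and all parameter values are hypothetical.

```python
import random
from statistics import mean, pstdev

def simulate_sample_means(sample_size, repetitions, seed=0):
    """Sample means from a right-skewed exponential population (mean 1, sd 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(repetitions)
    ]

means = simulate_sample_means(sample_size=30, repetitions=5000)
center = mean(means)
spread = pstdev(means)

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the center.
within_one_se = sum(abs(m - center) <= spread for m in means) / len(means)
print(round(center, 3), round(spread, 3), round(within_one_se, 3))
```

With these settings the center should be near the population mean of 1, the spread near 1 divided by the square root of 30, and the share within one standard error near the normal value, even though the population itself is strongly skewed.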
How Large is Large Enough?

For most distributions, n>30

For fairly symmetric distributions, n>15

For normal distribution, the sampling
distribution of the mean is always normally
distributed
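Example: $\mu = 8$, $\sigma = 2$, $n = 25$
$P(7.8 < \bar{X} < 8.2) = ?$

$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{2/\sqrt{25}} < \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{8.2 - 8}{2/\sqrt{25}}\right) = P(-.5 < Z < .5) = .3830$$

Sampling Distribution: $\mu_{\bar{X}} = 8$, $\sigma_{\bar{X}} = \dfrac{2}{\sqrt{25}} = .4$; area .1915 on each side of the mean, between 7.8 and 8.2
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$; area .1915 on each side of 0, between $-0.5$ and $0.5$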
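The same example can be checked numerically. The sketch assumes Python's standard-library `statistics.NormalDist` and treats the sampling distribution of the mean as normal with standard error sigma divided by the square root of n. The helper name `sample_mean_interval_probability` is made up for the illustration.

```python
import math
from statistics import NormalDist

def sample_mean_interval_probability(lower, upper, mu, sigma, n):
    """P(lower < sample mean < upper) using the normal sampling distribution."""
    standard_error = sigma / math.sqrt(n)
    sampling_dist = NormalDist(mu=mu, sigma=standard_error)
    return sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

p_mean = sample_mean_interval_probability(7.8, 8.2, mu=8, sigma=2, n=25)
print(round(p_mean, 4))
```

The result can differ from the table-based .3830 in the last digit because the table entries are rounded to four decimals.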
Population Proportions

Categorical variable
    e.g.: Gender, voted for Bush, college degree

Proportion of population having a characteristic $(p)$

Sample proportion provides an estimate of $p$
    $p_S = \dfrac{X}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$

If two outcomes, X has a binomial distribution
    Possess or do not possess characteristic
Sampling Distribution
of Sample Proportion

Approximated by normal distribution
    $np \ge 5$
    $n(1 - p) \ge 5$

Mean:
    $\mu_{p_S} = p$

Standard error:
    $\sigma_{p_S} = \sqrt{\dfrac{p(1 - p)}{n}}$
    p = population proportion

[Figure: Sampling Distribution; $P(p_S)$ against $p_S$ from 0 to 1]
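Standardizing Sampling
Distribution of Proportion

$$Z = \frac{p_S - \mu_{p_S}}{\sigma_{p_S}} = \frac{p_S - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$

Sampling Distribution: mean $\mu_{p_S}$, standard error $\sigma_{p_S}$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$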
Example: $n = 200$, $p = .4$
$P(p_S \le .43) = ?$

$$P(p_S \le .43) = P\left(\frac{p_S - \mu_{p_S}}{\sigma_{p_S}} \le \frac{.43 - .4}{\sqrt{\dfrac{.4(1 - .4)}{200}}}\right) = P(Z \le .87) = .8078$$

Sampling Distribution: $\mu_{p_S} = .4$, $p_S = .43$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, $Z = .87$
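A short numerical check of this example follows. It is a sketch assuming Python's standard-library `statistics.NormalDist` and the normal approximation to the sampling distribution of the proportion; the helper name `proportion_standard_error` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.4, 200

# Rule-of-thumb check for the normal approximation
approximation_ok = n * p >= 5 and n * (1 - p) >= 5

se = proportion_standard_error(p, n)
z = (0.43 - p) / se
probability = NormalDist().cdf(z)
print(approximation_ok, round(se, 4), round(z, 2), round(probability, 4))
```

The slides round Z to .87 before using the table, so the unrounded software result is slightly lower than .8078.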
Sampling from Finite Population

Modify standard error if sample size (n) is large relative to population size (N)
    $n > .05N$ or $n/N > .05$
    Use finite population correction factor (fpc)

Standard error with FPC
    $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$
    $\sigma_{p_S} = \sqrt{\dfrac{p(1 - p)}{n}} \sqrt{\dfrac{N - n}{N - 1}}$
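The correction can be sketched as a pair of small functions. The numbers used below (sigma = 10, n = 100, N = 1000, p = 0.4) are made-up illustrative values, the function names are hypothetical, and the 5% threshold is applied as a simple rule of thumb.

```python
import math

def finite_population_correction(n, N):
    """fpc = sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error_mean(sigma, n, N=None):
    """Standard error of the mean, corrected when n is more than 5% of N."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= finite_population_correction(n, N)
    return se

def standard_error_proportion(p, n, N=None):
    """Standard error of the proportion, corrected when n is more than 5% of N."""
    se = math.sqrt(p * (1 - p) / n)
    if N is not None and n / N > 0.05:
        se *= finite_population_correction(n, N)
    return se

# Illustrative (assumed) values: the sample is 10% of the population
uncorrected = standard_error_mean(sigma=10, n=100)
corrected = standard_error_mean(sigma=10, n=100, N=1000)
corrected_prop = standard_error_proportion(p=0.4, n=100, N=1000)
print(uncorrected, round(corrected, 4), round(corrected_prop, 4))
```

The correction factor is always below 1 when n is greater than 1, so the corrected standard error is smaller than the uncorrected one.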