6.11 – The Normal Distribution

IB Math SL/HL Y1&Y2 - Santowski
(A) Random Variables
Now we wish to combine some basic statistics with some
basic probability: we are interested in the numbers that
are associated with situations resulting from elements of
chance, i.e. in the values of random variables.
We also wish to know the probabilities with which these
random variables take on each value in their possible range,
i.e. their probability distributions.
So 2 definitions need to be clarified:
(i) a discrete random variable is a variable quantity which occurs
randomly in a given experiment and which can assume only certain,
well-defined values, usually integers. Examples: number of bicycles sold
in a week, number of defective light bulbs in a shipment.
Discrete random variables involve a count.
(ii) a continuous random variable is a variable quantity which occurs
randomly in a given experiment and which can assume all possible
values within a specified range. Examples: the heights of men in a
basketball league, the volume of rainwater in a water tank in a month.
Continuous random variables involve a measure.
CLASSWORK (to review the distinction
between the 2 types of random variables):
Math SL text, pg 710, Chap 29A, Q1,2,3
Math HL text, p 728, Chap 30A, Q1,2,3
(C) The Normal Distribution
- data obtained by direct measurement (e.g. population
heights) is usually continuous rather than discrete (all
heights are possible, not just whole numbers)
- continuous data also has statistical distributions, and many
physical quantities are distributed symmetrically
and unimodally about the mean. Statisticians observe
this bell-shaped curve so often that its model is known as
the normal distribution
the graph of the normal distribution in its standardized form
is referred to as the standard normal curve, and one defining
equation for the curve for our purposes is
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}, \quad -\infty < z < \infty$$
where z refers to a concept called the z-score, which takes
into account the mean and standard deviation of a set of data
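The density of the standard normal curve, f(z) = (1/√(2π))·e^(−z²/2), can be evaluated directly. The following is a minimal Python sketch using only the standard library; the function name `standard_normal_pdf` is chosen here for illustration and does not come from the slides.

```python
import math

def standard_normal_pdf(z: float) -> float:
    """Height of the standard normal curve at a given z-score."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# The curve peaks at z = 0 and is symmetric about it.
peak = standard_normal_pdf(0.0)
left = standard_normal_pdf(-1.0)
right = standard_normal_pdf(1.0)

print(round(peak, 4))   # 0.3989
print(left == right)    # True
```

Note that these are heights of the curve, not probabilities; probabilities come from areas under the curve.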
we can graph the normal distribution as
follows, where the x-axis is the number
of standard deviations, σ, from the
mean/median, μ (the idea behind our z-score)
the total area under the curve is 1 unit
(arising from the fact that the total
probability of all outcomes of an event
is 1, or 100%)
With our z-score, we “set” the mean, μ,
to be 0, and each 1 unit of the x-axis is
1 standard deviation, σ.
to find the area under the curve between
any two given z-scores, we can rely on
graphing software
the area under the curve between our
two given z-scores means the
proportion of values between our two
z-scores
so if we write p(-2 < z < 1) = 0.81859,
we mean that the proportion of data
values that are between 2 standard
deviation units below the mean and 1
unit above is 0.81859, or as a
percentage: 81.859% of our data.
Equivalently, the probability that a data
value lies between 2 SDs below and 1 SD
above the mean is 0.81859. We can
illustrate this by shading the corresponding
region on a normal distribution graph.
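The same proportion can also be checked numerically. The sketch below is one possible approach, not the software used in the slides: it builds the cumulative area from Python's `math.erf` via the identity Φ(z) = ½(1 + erf(z/√2)). The names `phi` and `area_between` are chosen here for illustration.

```python
import math

def phi(z: float) -> float:
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z_low: float, z_high: float) -> float:
    """Proportion of values whose z-score lies between z_low and z_high."""
    return phi(z_high) - phi(z_low)

print(round(area_between(-2.0, 1.0), 5))  # 0.81859
```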
(C) The Normal Distribution – Tables of z scores
We can work out the previous example without a graph and shading areas
under a graph, by simply using prepared tables:
SL Math text, p735 and HL Math text, p772
So to determine p(-2 < z < 1), we check the table and see that a z value of
–2.00 corresponds to a value of 0.0228. This means that the area shaded under
the curve, starting from –2.00 all the way left to -∞, is 0.0228 (or 2.28% of the
data is more than 2 SD units below the mean)
Likewise, we check the table for our z value of 1.00 and see the value of
0.8413. This means that the area shaded under the curve, starting from 1.00
all the way left to -∞, is 0.8413 (or 84.13% of the data is less than 1 SD unit
above the mean)
So what do we do with the 2 numbers? Well, the 0.8413 also counts data we
do not want: the data more than 2 SD units below the mean. So this gets
subtracted from the larger value: 0.8413 – 0.0228 = 0.8185, as we saw before
with the graph and graphing software
(D) Examples
Use the table to evaluate
p(z < 1.5). Interpret the value.
The table gives us the value
0.9332, which means that
93.32% of our data lies at or below
1.5 SD units above the mean.
Equivalently, the probability of getting
a random data point that is at
most 1.5 SD units above the
mean is 0.9332
We can see this illustrated on
the graph
(D) Example Using Standard Normal Tables
For the standard normal variable, find:
(i) p(z < 1)
(ii) p(z < 0.96)
(iii) p(z < 0.03)
Some slightly more challenging examples:
(i) p(z > 1.7)
(ii) p(z < -0.88)
(iii) p(z > -1.53)
And now some in-between values:
(i) p(1.7 < z < 2.5)
(ii) p(-1.12 < z < 0.67)
(iii) p(-2.45 < z < -0.08)
We can also do some “inverse” problems (find a):
(i) p(z < a) = 0.5478
(ii) p(z > a) = 0.6
(iii) p(z < a) = 0.05
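Table answers to questions of these types can be checked with software. The sketch below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; it works one question of each harder type (a "greater than", a "between" and an "inverse" question) and is offered as a checking aid, not as the method the slides intend.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# "Greater than" uses the complement of the cumulative (table) value.
p_greater = 1.0 - Z.cdf(1.7)          # p(z > 1.7)

# "Between" subtracts the smaller cumulative area from the larger one.
p_between = Z.cdf(2.5) - Z.cdf(1.7)   # p(1.7 < z < 2.5)

# "Inverse" problems read the table backwards: area in, z-score out.
a = Z.inv_cdf(0.05)                   # the a with p(z < a) = 0.05

print(round(p_greater, 4), round(p_between, 4), round(a, 2))
```

For p(z > a) = 0.6, the same idea applies after converting to a "less than" statement: p(z < a) = 0.4, so `Z.inv_cdf(0.4)` gives a.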
(E) Homework
SL Math text, Chap 29H.1, p736, Q1-4
 HL Math text, Chap30K.1, p757, Q2-5
(F) Standardizing Normal Distributions
When we have applications wherein we apply a normal
distribution (e.g. with any continuous random variable like the
height or weight of people), each unique application has its own
unique mean and standard deviation, along with its unique
distribution graph
What we wish to accomplish now: can we somehow
standardize a normal distribution so that one single
standardized normal distribution applies for every
possible normal distribution?
We can accomplish this by a combination of
transformations of our unique data with its unique normal
distribution
So from every data point in our distribution, we will
subtract the population’s mean and then divide this
difference by the population’s standard deviation. We
will call this result a “z”-score
So our “formula” for this data transformation is z = (x – μ)/σ
So we then graph the newly transformed data points and
we get a standardized normal distribution curve
The two key features of the standardized normal
distribution curve are (i) the mean is 0 and (ii) the standard
deviation is 1
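The two key features can be seen on a small data set. The sketch below applies z = (x − μ)/σ to a handful of heights that are made up purely for illustration (they are not from the slides), using the population mean and population standard deviation from Python's `statistics` module.

```python
from statistics import mean, pstdev

# Hypothetical heights in cm, invented for this illustration only.
heights = [172.0, 176.5, 179.0, 181.0, 186.5]

mu = mean(heights)        # population mean
sigma = pstdev(heights)   # population standard deviation

# Subtract the mean from every data point, then divide by the SD.
z_scores = [(x - mu) / sigma for x in heights]

# After the transformation the mean is (up to rounding error) 0
# and the standard deviation is (up to rounding error) 1.
print(round(abs(mean(z_scores)), 6), round(pstdev(z_scores), 6))
```

Standardizing rescales the data but does not change the shape of its distribution, which is why one standard table can serve every normal distribution.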
(G) Graph of Standardized Normal Distribution
(H) Working with a Standardized Normal Distribution
Ex 1: The heights of all rugby players
from India are normally distributed with a
mean of 179 cm and a standard deviation
of 5 cm. Find the probability that a
randomly selected player
(i) was less than 181 cm tall
(ii) was at least 177.5 cm tall
(iii) was between 175 and 190 cm tall
Solution #1(i) is to use the z-score tables
z = (181 – 179)/5 = 0.40
So find 0.40 on the tables,
which gives 0.6554
Given that the table gives us
the cumulative area under the
curve up to the specified z-score
(0.40), we can conclude
that 65.5% of the players would
be less than 181 cm tall
Alternatively, we can use a GDC
We simply select the
normalcdf( command and enter
the lower bound, upper bound, mean
and standard deviation,
which tells the GDC that you
want the heights less than 181
(basically from 181 down to -∞) and that the population mean
is 179 and the SD is 5
Our result is 0.6554, the same as
the result from the table
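For readers without a GDC, the calculation can be mimicked in software. The sketch below assumes Python 3.8 or later and defines a helper named `normalcdf` only to mirror the argument order (lower, upper, mean, SD) used with the calculator command elsewhere in these notes; it is a hypothetical stand-in, not the calculator's actual implementation.

```python
import math
from statistics import NormalDist

def normalcdf(lower: float, upper: float, mean: float, sd: float) -> float:
    """Area under the normal curve between lower and upper (illustrative helper)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Part (i): heights below 181 cm, so the lower bound is negative infinity.
p_less_than_181 = normalcdf(-math.inf, 181, 179, 5)

print(round(p_less_than_181, 4))  # 0.6554
```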
Solution #1(ii): use the z-score tables. However, we must realize that the
table gives us a cumulative area under the curve up to the given z-score,
whereas now we are looking for the area ABOVE the given z-score
So, using the table, simply find the area under the curve BELOW the given z-score
Then, using the “complement” idea, simply subtract that area from 1
z-score = (177.5 – 179)/5 = –0.30
Table value is 0.3821 (so 38.21% of the area under the curve is to the left of
–0.30 on the z-axis)
Therefore, the area representing the probability of our players being
AT LEAST 177.5 cm tall would be 1 – 0.3821 = 0.6179 (so this would be
the area under the curve to the right of z = –0.30)
In using the GDC, we again simply enter the command normalcdf(177.5,
EE99, 179, 5) and get 0.6179 as our answer
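The complement step can be traced numerically. In the sketch below (Python 3.8 or later assumed; variable names are illustrative), the z-score of 177.5 cm is –0.30, the cumulative area to its left is about 0.3821, and the complement is about 0.6179.

```python
from statistics import NormalDist

mean_cm, sd_cm = 179, 5

z = (177.5 - mean_cm) / sd_cm      # z-score of 177.5 cm
area_below = NormalDist().cdf(z)   # what the table reports for this z
area_above = 1.0 - area_below      # complement: at least 177.5 cm tall

print(round(z, 2), round(area_below, 4), round(area_above, 4))
# -0.3 0.3821 0.6179
```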
Solution #1(iii): use the z-score tables. However, we must realize
that the table gives us a cumulative area under the curve up to the
given z-score, whereas now we are looking for the area BETWEEN
2 given values
So our two z-scores for 175 and 190 are z = (175 – 179)/5 = –0.80 and
z = (190 – 179)/5 = 2.20
So, again our tables require several steps in the calculation
(i) find the area under the curve that is LESS THAN –0.80: 0.2119
(ii) now find the area under the curve that is less than 2.20: 0.9861
So clearly, the 0.9861 total cumulative area includes the 0.2119 that we
DO NOT have within our specified range of z-scores (player heights
less than 175 cm)
Which suggests that we need to subtract the 0.2119 from 0.9861:
0.9861 – 0.2119 = 0.7742
Alternatively, using the GDC, we enter normalcdf(175,190,179,5) and
get the same 0.7742
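Both routes to a "between" probability can be compared side by side. The sketch below (Python 3.8 or later assumed; names are illustrative) first standardizes the two heights, giving z = –0.80 and z = 2.20, and subtracts the cumulative areas, then repeats the calculation directly on the height distribution.

```python
from statistics import NormalDist

heights = NormalDist(mu=179, sigma=5)   # distribution from Ex 1
standard = NormalDist()                 # mean 0, standard deviation 1

# Route 1: standardize, then subtract cumulative areas (the table method).
z_low = (175 - 179) / 5
z_high = (190 - 179) / 5
via_table = standard.cdf(z_high) - standard.cdf(z_low)

# Route 2: work with the original heights directly (the GDC-style method).
direct = heights.cdf(190) - heights.cdf(175)

print(round(z_low, 2), round(z_high, 2))
print(round(via_table, 4), round(direct, 4))
```

The two routes agree because standardizing only relabels the horizontal axis; the shaded area is unchanged.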
(I) Homework
HL Math text
Chap30K.2, p759, Q1-3
 Chap 30K.3, p760, Q1-4
 Chap 30L, p761, Q1-7
SL Math text
Chap 29H.2, p738, Q1-3
 Chap 29H.3, p739, Q1-3
 Chap 29I, p740, Q1-8