STAT 205 slides - University of South Carolina


Elementary Statistics for the
Biological and Life Sciences
STAT 205
University of South Carolina
Columbia, SC
© 2005, University of South Carolina. All rights reserved, except where previous rights
exist. No part of this material may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means — electronic, mechanical, photoreproduction,
recording, or scanning — without the prior written consent of the University of South
Carolina.
DoStat Sign-up

Go to http://www.dostat.com

“SIGN UP” as a “student” using your VIP login
name as your DOSTAT login name
• Use course reference DS- _____
• Submit an e-mail address you read often
(this is how you will receive course announcements)
STAT205 – Elementary Statistics for the Biological and Life Sciences
2
DoStat and StatCrunch

We will use the StatCrunch online
statistical system for online statistical
computations and graphics.

http://www.statcrunch.com

We will also use the DoStat course
management system for homework and
example online calculations.

Motivation: why analyze data?




Clinical trials/drug development:
compare existing treatments with new
methods to cure disease.
Agriculture: enhance crop yields,
improve pest resistance
Ecology: study how ecosystems
develop/respond to environmental
impacts
Lab studies: learn more about
biological tissue/cellular activity
Chapter 2: Description of
Populations and Samples
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
Statistics is:

Statistics is the science of
• collecting,
• summarizing,
• analyzing, and
• interpreting
data.

Goal: to understand the underlying
biological phenomena that generate
the data.
Random Variables

Data are generated by some random
process or phenomenon.

Any observed datum represents the
outcome of a Random Variable.

NOTATION: upper case letter, W, X, Y, etc.
Types of Random
Variables
 Qualitative
• Nominal (e.g., blood type – A, B, AB, O)
• Ordinal (e.g., therapy response – none,
some, cured)

Quantitative
• Discrete (e.g., number of nests – 0,1,2,…)
• Continuous (e.g., cholesterol conc. – 220.2,
210.4, 180.9, etc.)
Random Samples

We take data as samples from a larger
population.

DEF’N: A SAMPLE is a collection of
‘subjects’ upon which we measure one
or more variables.

DEF’N: The SAMPLE SIZE is the
number of subjects in a sample.
NOTATION: n.
Observations

DEF’N: The OBSERVATIONAL UNIT is
the type of subject being sampled.
Example: observational units could be
(i) baby, (ii) moth, (iii) Petri dish, etc.

DEF’N: An OBSERVATION is a recorded
outcome of a variable from a random
sample.
NOTATION: lower case letter, x, y, etc.
Frequency Distributions

DEF’N: A FREQUENCY DISTRIBUTION is a
summary display of the frequencies of
occurrence of each value in a sample.

DEF’N: A RELATIVE FREQUENCY (a
percent or proportion) is a raw frequency
divided by sample size, n:
Rel. Freq. = Freq / n
Frequency Distn’s

Frequency distributions come in varied
shapes:
• Symmetric & bell-shaped
• Symmetric, not bell-shaped
• Asymmetric & skewed right
• Asymmetric & skewed left
• Bimodal
We use histograms, etc., to visualize these
shapes in the data.
Example 2.4
Ex. 2.4: Y = no. of piglets surviving 21
days (litter size).
A sample of n=36 pigs (sows) generated
the data in Table 2.4.
Dot Plot

A DOT PLOT is a simple graphic where
dots indicate observed data in a sample.

Ex. 2.4: Fig. 2.4 gives the dot plot for the
litter size data:
Histogram

A HISTOGRAM is a simple bar chart where
the bars replace the dots in a dot plot.

Ex. 2.4 (cont’d): Fig. 2.5 gives the
histogram for the litter size data.
Stemplot

A STEMPLOT (a.k.a. STEM-LEAF
DIAGRAM) is a dot plot (often drawn on its
side) with data information replacing the
dots.

The ‘stems’ are the core values of the data,
set in common groups.

The ‘leaves’ are the last digits of each
datum.
Example 2.8

Ex. 2.8: Y = radish growth. Data in Table 2.8:

Radish Growth after 3 days in Total Darkness
Descriptive Statistics

DEF’N: The SAMPLE MEAN is the
arithmetic average of a set of n data values.

NOTATION:
$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{y_1 + y_2 + \cdots + y_n}{n}$
The sample mean is often viewed as a kind
of ‘balance point’ in the data.
Example 2.15

Ex. 2.15: Y = weight gain (lb) of lambs on
special diet. Data: {11, 13, 19, 2, 10, 1}

n = 6:
$\bar{y} = \frac{11 + 13 + 19 + 2 + 10 + 1}{6} = \frac{56}{6} = 9.33$ lb

Fig. 2.27:
Sample Median

DEF’N: The SAMPLE MEDIAN is the
value of the data nearest to their
middle.

Find the median by ordering the
data, and calculating their middle
point (n odd) or the average of their
two middle points (n even).

NOTATION: Q2
Example 2.17

Ex. 2.17: (2.15 cont’d) Lamb weight gain.
n = 6 is even, so find Q2 as avg. of two
middle points

ordered data: y(1) = 1, y(2) = 2,
y(3) = 10, y(4) = 11, y(5) = 13, y(6) = 19.
Q2 = (10 + 11)/2 = 10.5 lb
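A minimal Python sketch (not part of the original slides) that reproduces the mean from Ex. 2.15 and the median from Ex. 2.17. It assumes only the standard-library statistics module; the variable names are illustrative.

```python
import statistics

# Lamb weight gains (lb) from Ex. 2.15
weight_gain = [11, 13, 19, 2, 10, 1]

mean = statistics.mean(weight_gain)      # 56 / 6
median = statistics.median(weight_gain)  # n is even: average of the two middle values

print(f"mean = {mean:.2f} lb, median = {median} lb")
```

Note how the one large value (19) pulls the mean below the median here only slightly; with more extreme values the gap would widen.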
Example 2.19
Ex. 2.19: Y = cricket singing times.
Data in Table 2.10:
Example 2.19 (cont’d)
Skewness

Mean & median indicate skewness:
• If data are skewed right, mean > median.
• If data are skewed left, mean < median.
• If data are symmetric, mean ≈ median.

Both the mean and the median are useful
summary measures of location. The
median is slightly more ROBUST to
extreme values of yi, but of course, the
mean is easier to calculate.
Quartiles
DEF’N: The QUARTILES of a distribution are
points that separate the data into quarters or
fourths:
• The first quartile separates the lower 25% of
the data from the upper 75%.
NOTATION: Q1
• The second quartile separates the lower 50%
of the data from the upper 50%. NOTATION: Q2
• The third quartile separates the lower 75% of
the data from the upper 25%.
NOTATION: Q3
Example 2.20

Ex. 2.20: Y = Systolic blood pressure
(mm Hg) in men; n= 7.

Ordered data:
y(1) = 113, y(2) = 124, y(3) = 124,
y(4) = 132,
y(5) = 146, y(6) = 151, y(7) = 170.
 Q1 = 124
 Q2 = 132
 Q3 = 151
IQR

DEF’N: The INTER-QUARTILE RANGE is
IQR = Q3 – Q1

DEF’N: The MINIMUM is the smallest value
of a data set or distribution.
NOTATION: y(1)

DEF’N: The MAXIMUM is the largest value
of a data set or distribution.
NOTATION: y(n)
Five Number Summary

DEF’N: The FIVE NUMBER SUMMARY is
{y(1), Q1, Q2, Q3, y(n)}

DEF’N: A BOXPLOT is a graphic plot of the
5-no. summary, with a box spanning the
IQR and bridging the quartiles:
[Figure: boxplot with marks at y(1), Q1, Q2, Q3, and y(n)]
Example 2.22
Ex. 2.22: Y = radish growth data from Ex.
2.8. Five-no. summary is {8, 15, 21, 30, 37}.
Boxplot is given in Fig. 2.30:
Example 2.23
Ex. 2.23: Y = radish
growth data over three
different growth
regimes (see Ex. 2.9).
In Fig. 2.32, we use
boxplots for comparative purposes.

Outliers

DEF’N: An OUTLIER is an obsv’n that differs
dramatically from the rest of the data.
Formally: Yi is an outlier if
Yi < Q1 – (1.5 × IQR) (the “lower fence”) or
Yi > Q3 + (1.5 × IQR) (the “upper fence”)
Example 2.25

Ex. 2.25: Y = radish growth data in full light
(from Ex. 2.23). The ordered data are:
3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21

IQR = Q3 – Q1 = 10 – 7 = 3
Upper fence = Q3 + (1.5 × IQR) = 10 + (1.5)(3) = 14.5
Lower fence = Q1 – (1.5 × IQR) = 7 – (1.5)(3) = 2.5
So y = 20 and y = 21 are outliers.
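The quartile and fence calculations of Ex. 2.20 and Ex. 2.25 can be sketched in Python as follows. This is an illustration, not the textbook's code: it assumes the quartiles are the medians of the lower and upper halves of the ordered data, with the middle value left out of both halves when n is odd (this convention reproduces the slide answers; statistical software may use other conventions). The function names are hypothetical.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) as medians of the lower half, all data, and upper half.

    Assumption: for odd n the middle observation is excluded from both halves.
    """
    ys = sorted(data)
    half = len(ys) // 2
    lower, upper = ys[:half], ys[len(ys) - half:]
    return statistics.median(lower), statistics.median(ys), statistics.median(upper)

def fences(data):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Radish growth in full light, Ex. 2.25
light = [3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21]
low, high = fences(light)
outliers = [y for y in light if y < low or y > high]

print(f"fences: ({low}, {high}); outliers: {outliers}")
```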
Dispersion

DEF’N: The SAMPLE RANGE is
Range = Y(n) – Y(1) = Max. – Min.

DEF’N: The SAMPLE VARIANCE is
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

DEF’N: The SAMPLE STANDARD
DEVIATION (SD) is $S = \sqrt{S^2}$
The Empirical Rule
The sample mean and the sample SD
are useful in describing data sets (that
are unimodal and not too skewed). The
EMPIRICAL RULE states that
• ~68% of the data lie between
$\bar{Y} - S$ and $\bar{Y} + S$
• ~95% of the data lie between
$\bar{Y} - 2S$ and $\bar{Y} + 2S$
• >99% of the data lie between
$\bar{Y} - 3S$ and $\bar{Y} + 3S$
Example 2.36
Ex. 2.36: Suppose Y = pulse rate after 5 mins.
of exercise. For n = 28 subjects, we find $\bar{Y}$ =
98 (beats/min) and S = 13.4 (beats/min).
Thus, e.g., from the empirical rule we expect
~95% of the data to lie between
98 – (2)(13.4) = 98 – 26.8 = 71.2 beats/min
and
98 + (2)(13.4) = 98 + 26.8 = 124.8 beats/min.
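As a rough illustration (not from the slides), the sample variance formula and the empirical-rule interval can be computed in Python. The lamb data from Ex. 2.15 are reused here only to exercise the variance function; the helper names are hypothetical.

```python
import math

def sample_variance(ys):
    """S^2 = sum((y_i - ybar)^2) / (n - 1)."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

def empirical_interval(ybar, s, k):
    """Interval ybar - k*s to ybar + k*s used by the empirical rule."""
    return ybar - k * s, ybar + k * s

lambs = [11, 13, 19, 2, 10, 1]       # Ex. 2.15 data, reused for illustration
s2 = sample_variance(lambs)
s = math.sqrt(s2)

low, high = empirical_interval(98, 13.4, 2)   # summary values from Ex. 2.36

print(f"S^2 = {s2:.2f}, S = {s:.2f}")
print(f"~95% interval: {low:.1f} to {high:.1f} beats/min")
```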
Inference

DEF’N: The POPULATION is the larger group of
subjects (organisms, plots, regions,
ecosystems, etc.) on which we wish to draw
inferences.

DEF’N: A PARAMETER is a quantified
population characteristic. E.g., the popl’n mean
is µ and popl’n standard deviation is σ.

DEF’N: A STATISTIC is a sample quantity used
to estimate a popl’n parameter.
Proportions

DEF’N: The POPULATION PROPORTION
is the proportion of subjects exhibiting a
particular trait or outcome in the popl’n.
(It generalizes to the probability that any
popl’n element will exhibit the trait.)
NOTATION: p

DEF’N: The SAMPLE PROPORTION is the
number of sample elements exhibiting the
trait, divided by the sample size, n.
NOTATION: $\hat{p}$
Chapter 3: Random
Sampling, Probability, and the
Binomial Distribution
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
Random Samples

DEF’N: A SIMPLE RANDOM SAMPLE of n
items is a data set where
(a) every popl’n element has an equal chance
of selection, and
(b) every popl’n element is chosen
independently of every other element.

This draws upon the larger concept of
RANDOMIZATION: selection of data that
avoids sources of possible bias.
Random Sampling
To choose a random sample:
1. assign each popl’n element a unique
code (or set of codes);
2. from a random number table (Table 1,
p. 670) or via computer, in a systematic
manner select n random digits whose
range corresponds to the codes assigned
above; and
3. select every element if its code appears in
step (2), ignoring repeated codes or those
with no assignment.
Example 3.1
Ex. 3.1: Simple random sample of size n =
6 from population of 75 elements.
1. label each element 01, 02, …, 75
2. select random digits from a source such
as Table 1 or DoStat
3. choose elements for the sample if they
correspond to the selected random digits
(ignore repeats and drop-outs)
See Table 3.1
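The three steps above can be sketched in Python, using a pseudo-random generator in place of the random number table. This is only an illustration: the seed value is an arbitrary assumption made for reproducibility, and the resulting sample will not match Table 3.1.

```python
import random

# Step 1: label each population element 01, 02, ..., 75
population = [f"{i:02d}" for i in range(1, 76)]

# Steps 2-3: draw n = 6 distinct codes, each equally likely.
# random.sample draws without replacement, so repeated codes cannot occur.
rng = random.Random(205)            # hypothetical seed, for reproducibility only
sample = rng.sample(population, 6)

print(sample)
```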

Example 3.1 (cont’d)
 The sample uses elements 23, 38, 59, 21, 08, 09
Probability

DEF’N: A PROBABILITY is the chance of
some event, E, occurring in a specified
manner. NOTATION: P{E}

We often view probabilities from a
Relative Frequency Interpretation:
P{E} = (# ways E occurs) / (# total events)
Example 3.12
Ex. 3.12: Toss a fair coin twice. We know
P{H} = 1/2 (see Ex. 3.8). What is P{HH}?
 Consider all possible outcomes:
HH, HT, TH, TT
 If each outcome is equally likely, then
P{HH} = (# HH) / (# all outcomes) = 1/4
Probability Rules

Rule 1: 0 ≤ P{E} ≤ 1.

Rule 2: The entirety of events has
probability = 1. That is, if E1, ..., Ek are
all the possible events, ∑P{Ei} = 1.

Rule 3 (The Complement Rule):
If $E^c$ = {not E}, then $P\{E^c\} = 1 - P\{E\}$.
Example 3.19

Ex. 3.19: U.S. Blood types:
P{O} = 0.44
P{A} = 0.42
P{B} = 0.10
P{AB} = 0.04

Note: (1) all are between 0 and 1

and (2) P{O} + P{A} + P{B} + P{AB}
= 0.44 + 0.42 + 0.10 + 0.04
= 1.00


So, e.g., $P\{O^c\}$ = 1 – P{O} = 1 – 0.44 = 0.56
Probability (cont’d)

DEF’N: Two events, E1 and E2, are
DISJOINT (a.k.a MUTUALLY EXCLUSIVE) if
they cannot occur simultaneously.

DEF’N: The UNION of two events, E1 and
E2, is the event that E1 or E2 (or both)
occurs.

DEF’N: The INTERSECTION of two
events, E1 and E2, is the event that both E1 and
E2 occur.
Venn Diagrams
A useful graphic to conceptualize how
events interrelate is the Venn Diagram.
 For example, Fig. 3.8 shows a Venn Diagram
with 2 intersecting events, E1 and E2:

Probability Rules (cont’d)

We often denote the entirety of events as
the Sample Space, S. Conversely, the
Null Space is $\emptyset = S^c$.

Rule 4: If E1 and E2 are disjoint, then
P{E1 or E2} = P{E1} + P{E2}.

Rule 5: If E1 and E2 are any two events,
then
P{E1 or E2} = P{E1} + P{E2} – P{E1 and E2}.
Example 3.20
Ex. 3.20: Hair/Eye color of 1770 men. We
have the following distribution of traits:
So, e.g., P{Black Hair} = 500/1770, etc.
Example 3.20 (cont’d)
Find P{Black Hair OR Red Hair}.
Clearly, E1 = {Black Hair} and
E2 = {Red Hair} are disjoint,
so from Rule 4,
P{Black Hair OR Red Hair}
= P{Black Hair} + P{Red Hair}
= 500/1770 + 70/1770 = 570/1770
= 0.32.
Example 3.20 (cont’d)
Now, find P{Black Hair OR Blue Eyes}.
Here, E1 = {Black Hair} and
E2 = {Blue Eyes} are NOT disjoint,
so apply Rule 5:
P{Black Hair OR Blue Eyes}
= P{Black Hair} + P{Blue Eyes}
– P{Black Hair AND Blue Eyes}
= 500/1770 + 1050/1770 – 200/1770
= 1350/1770 = 0.76.
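The two addition rules in Ex. 3.20 can be checked with exact fractions. The sketch below is an illustration only; it uses just the counts quoted on these slides (the full Table 3.3 is not reproduced here), and the variable names are hypothetical.

```python
from fractions import Fraction

N = 1770                                  # men in the hair/eye color table
black_hair, red_hair, blue_eyes = 500, 70, 1050
black_and_blue = 200                      # black hair AND blue eyes

def p(count):
    """Relative-frequency probability: count / N."""
    return Fraction(count, N)

# Rule 4: disjoint events, so probabilities add
p_black_or_red = p(black_hair) + p(red_hair)

# Rule 5: events overlap, so subtract the intersection
p_black_or_blue = p(black_hair) + p(blue_eyes) - p(black_and_blue)

print(f"{float(p_black_or_red):.2f}  {float(p_black_or_blue):.2f}")
```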
Probability (cont’d)

DEF’N: Two events, E1 and E2, are
INDEPENDENT if knowledge that E1 occurs
does not affect P{E2} and vice versa.
If two events are not independent, they are
DEPENDENT.

DEF’N: A CONDITIONAL PROBABILITY is
the probability that 1 event occurs, given
that the other has already occurred.
NOTATION: P{E1 | E2}.
Probability Rules (cont’d)

Rule 6: If E1 and E2 are independent, then
P{E1 and E2} = P{E1} × P{E2}.

Rule 7: If E1 and E2 are any two events, then
P{E1 and E2} = P{E1} × P{E2 | E1}
= P{E2} × P{E1 | E2}.
Consequences:

• if E1 and E2 are independent, then
P{E1} = P{E1 | E2} and P{E2} = P{E2 | E1}
• also, P{E2 | E1} = P{E1 and E2}/P{E1} if P{E1}≠0.
Examples 3.21–3.22
Exs. 3.21–3.22 (3.20, cont’d): Hair/Eye color
of 1770 men.
Refer back to Table 3.3. There, we saw
P{Blue Eyes AND Black Hair} = 200/1770,
while P{Black Hair} = 500/1770. So,
P{Blue Eyes | Black Hair}
= P{Blue Eyes AND Black Hair} / P{Black Hair}
= (200/1770) / (500/1770) = 200/500 = 0.40
Example 3.25
Ex. 3.25 (3.20, cont’d): Hair/Eye color of 1770
men.
In Table 3.3, there is no evidence of independence between Hair & Eye color. So, e.g.,
P{Red Hair AND Brown Eyes}
= P{Red Hair} × P{Brown Eyes | Red Hair}
= (70/1770) × (20/70) = 20/1770
which agrees with the display in Table 3.3.
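A short Python check of the conditional-probability and multiplication-rule calculations in Exs. 3.21–3.25, using exact fractions. This is a sketch under the counts quoted on the slides; the variable names are illustrative.

```python
from fractions import Fraction

N = 1770
p_black = Fraction(500, N)
p_blue_and_black = Fraction(200, N)

# Consequence of Rule 7: P{E2 | E1} = P{E1 and E2} / P{E1}
p_blue_given_black = p_blue_and_black / p_black

# Rule 7 itself: P{E1 and E2} = P{E1} x P{E2 | E1}
p_red = Fraction(70, N)
p_brown_given_red = Fraction(20, 70)
p_red_and_brown = p_red * p_brown_given_red

print(float(p_blue_given_black), p_red_and_brown == Fraction(20, N))
```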
Density Curves
DEF’N: A RANDOM VARIABLE is a
measured outcome of some random
process.
 When a random variable is discrete, it is
usually straightforward to interpret
probabilities associated with it.
 For instance, if Y = {# leaves on tree}:

 P{Y = 122} = 0.42 is interpretable
 P{Y = 18} = 0.02 is interpretable
 but P{Y=120.472} is not interpretable.
Probability Histogram
A probability histogram is used to
visualize discrete probability masses:
[Figure: probability histogram of P{Y = k} (vertical axis, 0 to 0.5) against k = 1, …, 9]
Notice: each “mass” has area=probability,
and all masses sum to 1.
Continuous Random Variables


By contrast, a continuous random variable
has a different probability interpretation.
Extending the probability histogram to the
continuous case, we say Y has a
PROBABILITY DENSITY CURVE, where area
still represents probability.
Continuous Random Variables
Consequences of the continuous probability
model:
• P{Y = a} = 0 = P{Y = b} (area of a line is zero)
• So, P{Y ≤ a} = P{Y < a} + P{Y = a} = P{Y < a}
• And for that matter:
P{a ≤ Y ≤ b} = P{a < Y ≤ b}
= P{a ≤ Y < b} = P{a < Y < b}
(all if Y is continuous).
Example 3.30
Ex. 3.30: Y = diameter (in.) of tree trunk.
• Suppose the density has the form given in
Fig. 3.13:
• Then, for example, P{Y > 8} =
P{8 < Y ≤ 10} + P{Y > 10} = 0.12 + 0.07 = 0.19
Mean and Expected Value

DEF’N: If Y is a discrete random variable,
its POPULATION MEAN is given by
µY = ∑yiP{Y = yi}
(where the sum is taken over all possible
yi’s)

More generally, the EXPECTED VALUE of Y
is E(Y) = ∑yiP{Y = yi}.
Example 3.35
Ex. 3.35: Y = # tail vertebrae in fish.
From Table 3.4 we find
yi           20    21    22    23
P{Y = yi}   .03   .51   .40   .06
So, E(Y) = ∑yiP{Y = yi}
= (20)(.03) + (21)(.51) + (22)(.40) + (23)(.06)
= … = 21.49.
Variance

DEF’N: If Y is a discrete random variable,
its POPULATION VARIANCE is given by
$\sigma_Y^2 = \sum (y_i - \mu_Y)^2 P\{Y = y_i\}$
One can show this is also
$\sigma_Y^2 = E(Y^2) - \{E(Y)\}^2 = E(Y^2) - \mu_Y^2$

From this, the POPULATION STANDARD
DEVIATION of Y is $\sigma_Y = \sqrt{\sigma_Y^2}$.
Example 3.37
Ex. 3.37: (3.35, cont’d). From Table 3.4 we
were given the values of P{Y = yi}.
Recall µY = 21.49.
So, $\sigma_Y^2 = \sum (y_i - \mu_Y)^2 P\{Y = y_i\}$
$= (20 - 21.49)^2(.03) + (21 - 21.49)^2(.51) + (22 - 21.49)^2(.40) + (23 - 21.49)^2(.06)$
= … = 0.4299.
Example 3.37 (cont’d)
So $\sigma_Y^2 = 0.4299$.
But, it’s a lot easier to use
$\sigma_Y^2 = E(Y^2) - \mu_Y^2 = \{(20)^2(.03) + (21)^2(.51) + (22)^2(.40) + (23)^2(.06)\} - (21.49)^2$
= 462.25 – 461.8201
= 0.4299.
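The mean and both variance formulas from Exs. 3.35–3.37 can be verified with a few lines of Python. This is an illustrative sketch; the dictionary simply restates the distribution quoted from Table 3.4.

```python
import math

# P{Y = y} for Y = number of tail vertebrae (Table 3.4, Ex. 3.35)
dist = {20: 0.03, 21: 0.51, 22: 0.40, 23: 0.06}

mean = sum(y * p for y, p in dist.items())                   # E(Y)
var_def = sum((y - mean) ** 2 * p for y, p in dist.items())  # definition
var_short = sum(y ** 2 * p for y, p in dist.items()) - mean ** 2  # E(Y^2) - mu^2
sd = math.sqrt(var_def)

print(f"mean = {mean:.2f}, variance = {var_def:.4f}, SD = {sd:.4f}")
```

Both variance formulas agree up to floating-point rounding; the shortcut form subtracts two large, nearly equal numbers, so it is the more rounding-sensitive of the two in general.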
Rules of Expected Value

E(·) is a mathematical operator.

It has certain general properties:
• Rule E1: E(aX + bY) = aE(X) + bE(Y)
= aµX + bµY
• Rule E2: E(a + bY) = a + bE(Y) = a +
bµY
(a “linear operator”)
Rules of Variance
The special variance operator also has
certain general properties:
• Rule E3: If X and Y are independent, then
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$.
• Rule E4: If X and Y are independent, then
$\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$.
• General rule: If X and Y are independent,
then
$\sigma_{aX+bY}^2 = a^2\sigma_X^2 + b^2\sigma_Y^2$.
Example 3.41
Ex. 3.41: X = mass of cylinder from balance.
Y = mass of cylinder from 2nd balance.
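Suppose $\sigma_X = 0.03$ and $\sigma_Y = 0.04$. Then, if we
calculate the difference between the two
weighings, X – Y, and the weighings are independent
(as Rule E4 requires), we know
$\sigma_{X-Y} = \sqrt{\sigma_X^2 + \sigma_Y^2} = \sqrt{0.03^2 + 0.04^2} = \sqrt{0.0009 + 0.0016} = \sqrt{0.0025} = 0.05$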
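The general variance rule for independent X and Y can be written as a small helper. This is a sketch with a hypothetical function name; it assumes independence, as the rules above do.

```python
import math

def sd_linear_combo(a, sd_x, b, sd_y):
    """SD of aX + bY for independent X and Y: sqrt(a^2 sd_x^2 + b^2 sd_y^2)."""
    return math.sqrt(a ** 2 * sd_x ** 2 + b ** 2 * sd_y ** 2)

sd_diff = sd_linear_combo(1, 0.03, -1, 0.04)   # X - Y, Ex. 3.41
sd_sum = sd_linear_combo(1, 0.03, 1, 0.04)     # X + Y has the same SD

print(f"{sd_diff:.2f} {sd_sum:.2f}")
```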
Independent Trials

DEF’N: The INDEPENDENT TRIALS
MODEL occurs when
(i) n independent trials are studied
(ii) each trial results in a single binary obsv’n
(iii) each trial’s success has (constant)
probability: P{success} = p
Notice that if P{success} = p, P{failure} = 1–p.

We call this a BInS (Binary / Indep. / n is
const. / Same p) setting.
Example 3.43
Ex 3.43: Suppose 39% of organisms in a
popl’n exhibit a mutant trait. Sample n=5
organisms randomly and check for
mutation:
• Binary? Yes (mutant vs. non-mutant)
• Indep.? Yes (if no bias in sampling)
• n const.? Yes (n=5)
• Same p? Yes (p = 0.39)
Binomial Distribution

DEF’N: In a BInS setting, if we let
Y = {# successes} then Y has a
BINOMIAL DISTRIBUTION.

NOTATION: Y ~ Bin(n,p).

The binomial probability function is
$P\{Y = j\} = {}_nC_j \, p^j (1 - p)^{n-j}$
(j = 0,1,…,n).
Binomial Coefficient

In the binomial probability function
$P\{Y = j\} = {}_nC_j \, p^j (1 - p)^{n-j}$
the BINOMIAL COEFFICIENT is
${}_nC_j = \frac{n!}{j!\,(n-j)!}$

Also, j! is the FACTORIAL OPERATOR:
j! = j(j–1)(j–2)…(2)(1)

We define 0! = 1.
Factorial Operator
Example of factorial operator: at n = 5,
5! = (5)(4)(3)(2)(1) = 120
4! = (4)(3)(2)(1) = 24
3! = (3)(2)(1) = 6
2! = (2)(1) = 2
So:
j      0    1    2    3    4    5
nCj    1    5    10   10   5    1
(Also see Table 3.6 on page 105 of text.)
Values of nCj are given in Table 2 (p. 674)
Table 3.6
Example 3.45
Ex 3.45 (Ex. 3.43 cont’d): Y ~ Bin(5 , 0.39);
So $P\{Y = 3\} = {}_5C_3 (.39)^3 (.61)^2$
= (10)(.0593)(.3721) = 0.22.
Can also find this via
DoStat. Table 3.7 gives
the full distribution.
Figure 3.15 gives a
probability histogram.
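For readers without access to DoStat, the binomial probability function can be sketched in Python with the standard-library math.comb. The function name binom_pmf is an illustrative choice, not something from the text.

```python
from math import comb

def binom_pmf(j, n, p):
    """P{Y = j} for Y ~ Bin(n, p): nCj * p^j * (1 - p)^(n - j)."""
    return comb(n, j) * p ** j * (1 - p) ** (n - j)

n, p = 5, 0.39                       # Ex. 3.45
pmf = [binom_pmf(j, n, p) for j in range(n + 1)]

for j, prob in enumerate(pmf):
    print(f"P{{Y = {j}}} = {prob:.4f}")
```

The list pmf plays the role of the full distribution that Table 3.7 tabulates.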
Binomial Mean & Variance

If Y ~ Bin(n,p), the population mean and
variance are:
$\mu_Y = np$
and $\sigma_Y^2 = np(1 - p)$

Ex. 3.49: Y = {# Rh+ in BInS sample}. We’re
given p = P{Rh+} = 0.85. So, if n = 6, we
expect $\mu_Y$ = (6)(0.85) = 5.1 Rh+ in the
sample, with $\sigma_Y^2$ = (6)(.85)(.15) = 0.765, so
that $\sigma_Y = \sqrt{.765}$ = 0.87 Rh+.
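As a check on the shortcut formulas, the mean and variance in Ex. 3.49 can also be computed directly from the definitions of expected value and variance. A minimal sketch, with illustrative variable names:

```python
from math import comb, sqrt

n, p = 6, 0.85                       # Ex. 3.49
pmf = {j: comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)}

mean = sum(j * pj for j, pj in pmf.items())               # should equal n*p
var = sum((j - mean) ** 2 * pj for j, pj in pmf.items())  # should equal n*p*(1-p)

print(f"mean = {mean:.2f}, variance = {var:.3f}, SD = {sqrt(var):.2f}")
```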
Chapter 4:
The Normal Distribution
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
Normal Distribution

DEF’N: A continuous random variable
Y has a NORMAL DISTRIBUTION if its
probability density can be written as
$f(y) = \frac{1}{\sigma_Y \sqrt{2\pi}} \, e^{-(y - \mu_Y)^2 / (2\sigma_Y^2)}$
over –∞ < y < ∞.

NOTATION: $Y \sim N(\mu_Y, \sigma_Y^2)$
The mean and variance of a normal dist’n
are $E(Y) = \mu_Y$ and $E[(Y - \mu_Y)^2] = \sigma_Y^2$.
Normal Dist’n Examples

The Normal distribution appears in many
biological contexts:

Ex. 4.1: Y = serum cholesterol (mg/dL)

Ex. 4.2: Y = eggshell thickness (mm)

Ex. 4.3: Y = nerve cell interspike times (ms)
Normal Curve
The Normal density curve is
(i) continuous over –∞ < y < ∞
(ii) symmetric about y = µ
(iii) unimodal, and hence “bell-shaped”
Figure 4.7
Since each $(\mu, \sigma^2)$ pair indexes a different
Normal dist’n, this represents a rich family
of curves:
Standard Normal

DEF’N: The STANDARDIZATION FORMULA
for $Y \sim N(\mu, \sigma^2)$ is
$Z = (Y - \mu)/\sigma$
This is often called a ‘Z-score’.

If $Y \sim N(\mu, \sigma^2)$, then Z ~ N(0,1) and we say Z
has a STANDARD NORMAL dist’n.

Std. Normal probab’s are tabulated in Table
3 (p. 675) and on text’s inside front cover.
(Portion of) Table 3, p.675
P(Z ≤ z)
Example: (p. 124) Suppose Z ~ N(0,1).
Find P{Z ≤ 1.53}.
In Table 3: row 1.5, column 0.03 gives P{Z ≤ 1.53} = 0.9370
Hint: “always draw the picture”
P(a < Z ≤ b)

If Z ~ N(0,1), and we find P{Z ≤ 1.53} = 0.937,
notice then that P{Z > 1.53} = 1 – 0.937
= 0.063.

Example: (p. 125) Suppose Z ~ N(0,1); then
P{–1.20 < Z ≤ 0.80} = P{Z ≤ 0.80} – P{Z ≤ –1.20}
= 0.7881 – 0.1151 = 0.6730.
(See Fig. 4.11)

Can also find Std. Normal probabilities using
DoStat’s Normal dist’n calculator!
Empirical Rule, revisited

If Z ~ N(0,1), it mimics the empirical rule
very closely:

The same effect holds for any $Y \sim N(\mu, \sigma^2)$.
Example 4.5
Ex. 4.5: Y = length of herrings (mm).
Suppose Y ~ N(54, 20.25). Then we know
$Z = \frac{Y - 54}{\sqrt{20.25}} = \frac{Y - 54}{4.5} \sim N(0,1)$
(a) What % of fish are less than 60 mm long?
$P[Y < 60] = P\left[\frac{Y - 54}{4.5} < \frac{60 - 54}{4.5}\right] = P\left[Z < \frac{6}{4.5}\right] = P[Z < 1.33]$
= 0.9082
Example 4.5 (cont’d)
Y = length of herrings ~ N(54, 20.25).
(c) What % of fish are between 51 and 60 mm
long?
$P[51 < Y < 60] = P\left[\frac{51 - 54}{4.5} < \frac{Y - 54}{4.5} < \frac{60 - 54}{4.5}\right]$
$= P\left[\frac{-3}{4.5} < Z < \frac{6}{4.5}\right]$
$= P[-.67 < Z < 1.33]$
$= P[Z \le 1.33] - P[Z < -.67]$
= 0.9082 - 0.2514 = 0.6568
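The table lookups in Ex. 4.5 can be cross-checked in Python with statistics.NormalDist (standard library, Python 3.8+). This sketch is not part of the slides; because it does not round z to two decimals as Table 3 does, its answers differ from the slide values in the third or fourth decimal place.

```python
from statistics import NormalDist

herring = NormalDist(mu=54, sigma=4.5)   # variance 20.25 means SD = 4.5

p_less_60 = herring.cdf(60)                       # part (a)
p_51_to_60 = herring.cdf(60) - herring.cdf(51)    # part (c)

# Standard normal lookup from the earlier P(Z <= z) slide
p_z_153 = NormalDist().cdf(1.53)

print(f"{p_less_60:.4f} {p_51_to_60:.4f} {p_z_153:.4f}")
```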
Std. Normal Tail Areas

We can also INVERT the std. Normal table
(Table 3):

Z ~ N(0,1), so find P{Z < 1.96} = 0.975. Then we
know P{Z > 1.96} = 1 – 0.975 = 0.025.

So, 2.5% of std. normal popl’n exceeds 1.96.
$z_\alpha$
More generally, if we find some number
$z_\alpha$ such that $P\{Z \le z_\alpha\} = 1 - \alpha$, we know
$P\{Z > z_\alpha\} = \alpha$ and vice versa:
Std. Normal Critical Point

DEF’N: The UPPER-α CRITICAL POINT
from Z ~ N(0,1) is the value $z_\alpha$ such that
$P\{Z > z_\alpha\} = \alpha$.

Find $z_\alpha$ by:
• carefully inverting Table 3
• reading off the bottom row (df = ∞) of
Table 4 (p. 677)
• using DoStat’s Normal dist’n calculator
Percentiles

DEF’N: The point of a distribution below
which p% lies is the p th PERCENTILE of
the dist’n.

If Z ~ N(0,1), $z_\alpha$ is the 100(1 – α)th percentile
of Z.

We often ask what value is the p th
percentile of a biological population (see
Ex. 4.6).
Example 4.6
Example 4.6 (cont’d)

We want to find y* such that P{Y < y*} = 0.70.
This is
$P\left\{\frac{Y - 54}{4.5} < \frac{y^* - 54}{4.5}\right\} = P\left\{Z < \frac{y^* - 54}{4.5}\right\}$

Now, from Table 3 we find P{Z < 0.52} =
0.6985 is close to 0.70. This tells us to
equate (approximately) 0.52 and (y*–54)/4.5
⇒ y* – 54 ≈ (0.52)(4.5)
⇒ y* ≈ (0.52)(4.5) + 54 = 56.34
Example 4.6 (conclusion)
So, we find that approximately 70% (69.85%,
exactly) of herring are less than 56.34 mm
long.
Notice also that we derived the critical point
$z_{0.30} \approx 0.52$. (More precisely, we found $z_{0.3015}$
= 0.52.)
Using DoStat, we can find $z_{0.30}$ = 0.5244: this
yields the more precise value y* = (0.5244)(4.5) + 54
= 56.36 for Example 4.6.
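Inverting the normal table can likewise be sketched with NormalDist.inv_cdf from the Python standard library, here standing in for DoStat's calculator. The variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()                 # Z ~ N(0, 1)

z_030 = std_normal.inv_cdf(0.70)          # upper-0.30 critical point = 70th percentile
z_0025 = std_normal.inv_cdf(0.975)        # upper-0.025 critical point

# 70th percentile of herring length, Y ~ N(54, 20.25)
y_star = NormalDist(mu=54, sigma=4.5).inv_cdf(0.70)

print(f"{z_030:.4f} {z_0025:.2f} {y_star:.2f}")
```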
Assessing Normality

Since many statistical procedures are
based on having data from a normal
population, we need ways to assess
whether it is reasonable to use a normal
model.

We have shown that a histogram can be
distorted by the selection of group size
(binwidth) so we will consider a statistical
graph called a normal probability plot or
QQ plot.
QQ Plots

A QQ Plot can be used to assess normality of the
data.

A QQ Plot is a scatter plot of the ordered pairs for
the
normal score (x) vs. data value (y)
for all values in a data set.

If the plotted points show a roughly linear pattern, it
is reasonable to treat the data values as following a
normal distribution.
Example

The heights in inches of 11 women are
listed below. Check the assumption that
the data are normally distributed.

61 62.5 63 64 64.5
65 66.5 67 68 68.5 70.5
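The (normal score, data value) pairs behind a QQ plot of these heights can be sketched as follows. The slides do not say how normal scores are computed, so this example assumes the plotting positions (i – 0.5)/n; statistical packages may use slightly different conventions, and the function name is hypothetical.

```python
from statistics import NormalDist

heights = [61, 62.5, 63, 64, 64.5, 65, 66.5, 67, 68, 68.5, 70.5]

def normal_scores(n):
    """Standard normal quantiles at the assumed positions (i - 0.5)/n, i = 1..n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Pair the i-th smallest normal score with the i-th smallest observation
pairs = list(zip(normal_scores(len(heights)), sorted(heights)))

for score, height in pairs:
    print(f"{score:6.2f}  {height}")
```

Plotting the first column (x) against the second (y) gives the QQ plot; points falling near a straight line support a normal model.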
Normal Probability Plot of the
Height Data
Example

Measurements made for 62 mammals.
Reference: Sleep in Mammals: Ecological
and Constitutional Correlates, by Allison, T.
and Cicchetti, D. (1976), Science,
November 12, vol. 194, pp. 732-734.

Variable: Brain Weight (g)
Normal Probability Plot of
Brain Weight (g)
Normal Probability Plot of
log(brainweight(g))
Chapter 5:
Sampling Distributions
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
Sampling Variability

Question: If Y is random, say $Y \sim N(\mu, \sigma^2)$,
and we take a random sample, Y1,Y2,…,Yn,
aren’t the Yi’s also random?

And, if the Yi’s are random, aren’t any
statistics based on them, such as $\bar{Y}$
or $S^2$, random as well?

This is known as SAMPLING VARIABILITY.
Sampling Distributions

A sample statistic has its own probab. dist’n,
called the SAMPLING DISTRIBUTION of the
statistic.

Think of it as repeatedly taking a new
sample from the same popl’n and finding
each sample mean, ad infinitum.
• What will the probab. histogram/density
function of the sample mean look like?

The textbook calls this a Meta-Experiment.
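
The meta-experiment can be imitated on a computer. The sketch below is only an illustration: the population N(100, 15²), the sample size n = 9, and the number of repetitions are all hypothetical choices, not values from the course.

```python
import random
from statistics import mean, stdev

def meta_experiment(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2); return each sample mean."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_means.append(mean(sample))
    return sample_means

# Hypothetical population and sample size, chosen only for illustration.
means = meta_experiment(mu=100, sigma=15, n=9, reps=5000)

print(f"mean of the sample means: {mean(means):.2f}")
print(f"SD of the sample means:   {stdev(means):.2f}")
```

A histogram of `means` approximates the sampling dist'n of the sample mean; it centers near the popl'n mean and is narrower than the popl'n itself.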
Binary Data

Recall that for $Y \sim \mathrm{Bin}(n, p)$ we can
estimate p if it is unknown using the
SAMPLE PROPORTION:
$\hat{p} = Y/n$

Since Y is random, so is this statistic.
What is the sampling dist'n of $\hat{p}$?
Example 5.4
Ex. 5.4: Y = # of people with 20/15 vision
("superior").

Say n = 2. We are given P{superior} = 0.3.

Let $\hat{p} = Y/n$. What are its possible values?
Clearly, Y = 0, 1, or 2. Thus, e.g.,
$P[\hat{p} = 1/2] = P[Y = 1]$
$= {}_2C_1 (.3)^1 (.7)^1$
$= (2)(.3)(.7) = .42$
Example 5.4 (cont’d)
Sampling dist'n of $\hat{p}$:

j    $\hat{p} = j/2$    P(Y = j)    $P(\hat{p} = j/2)$
0    0                  .49         .49
1    1/2                .42         .42
2    1                  .09         .09
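
The same table can be generated for any n with a few lines of Python. This is a sketch using the binomial formula from the example; the function name `sampling_dist_phat` is made up for illustration.

```python
from math import comb

def sampling_dist_phat(n, p):
    """Return {p_hat: probability} for p_hat = Y/n with Y ~ Bin(n, p)."""
    return {j / n: comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)}

# n = 2 reproduces the table for Example 5.4.
dist2 = sampling_dist_phat(2, 0.3)
for p_hat, prob in dist2.items():
    print(f"p_hat = {p_hat:.1f}   prob = {prob:.2f}")

# n = 10 is tedious by hand but no harder by computer.
dist10 = sampling_dist_phat(10, 0.3)
for p_hat, prob in dist10.items():
    print(f"p_hat = {p_hat:.1f}   prob = {prob:.4f}")
```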
Large-Sample Dist’n

Example 5.4 gives the sampling dist’n at
n = 2. The effort gets harder as n
increases. (Try it at n = 10….)

Fig. 5.5 shows the effect at larger n:
Continuous Data

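DEF'N: Given a random sample, $Y_1, Y_2, \ldots, Y_n$,
where $E(Y_i) = \mu$ and $E[(Y_i - \mu)^2] = \sigma^2$, then
(i) the POPL'N MEAN of $\bar{Y}$ is $E(\bar{Y}) = \mu$
(ii) the POPL'N VARIANCE of $\bar{Y}$ is $\sigma_{\bar{Y}}^2 = \sigma^2 / n$
(iii) the POPL'N SD of $\bar{Y}$ is $\sigma_{\bar{Y}} = \sigma / \sqrt{n}$

Notice: same popl'n mean,
while SD decreases as n increases.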
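
A short derivation of (i)-(iii), added here as a sketch. It assumes the $Y_i$ are independent (as in a random sample), which is what lets the variance of the sum split into a sum of variances.

```latex
\begin{align*}
E(\bar{Y}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(Y_i)
            = \frac{1}{n}\, n\mu = \mu \\
\sigma_{\bar{Y}}^2 &= \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(Y_i)
            = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \\
\sigma_{\bar{Y}} &= \sqrt{\sigma^2 / n} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```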
Distribution of the
Sample Mean

If $Y_i \sim$ i.i.d. $N(\mu, \sigma^2)$ for i = 1,…,n, then
$\bar{Y} \sim N(\mu, \sigma^2 / n)$

Once again:
• Same mean
• SD decreases as n increases
• So, more precision as n increases
Example 5.9

Ex. 5.9: Y = weight of seeds $\sim N(500, 14400)$.

Suppose n = 4. Since Y is normal, so is the
sample mean:
$\bar{Y} \sim N(500, 14400/4) = N(500, 3600)$

And so, $Z = \frac{\bar{Y} - 500}{\sqrt{3600}} = \frac{\bar{Y} - 500}{60} \sim N(0, 1)$
Example 5.9 (cont’d)
So, e.g.,
$P[\bar{Y} > 550] = P\left[\frac{\bar{Y} - 500}{\sqrt{3600}} > \frac{550 - 500}{\sqrt{3600}}\right]$
$= P[Z > 50/60] = P[Z > 0.83]$
$= 1 - P[Z < 0.83]$
$= 1 - .7967$
$= 0.2033$
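
The table lookup can be checked in Python. This is a sketch using the standard library's `NormalDist`; it computes the probability twice, once with z rounded to 0.83 as in the table-based calculation and once without rounding, so the two differ slightly in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

mu, var, n = 500, 14400, 4
sd_mean = sqrt(var / n)          # SD of the sample mean: sqrt(3600) = 60

# Table-style calculation: round z to two decimals first.
z_rounded = round((550 - mu) / sd_mean, 2)
p_table = 1 - NormalDist().cdf(z_rounded)

# Same probability without rounding z.
p_exact = 1 - NormalDist(mu, sd_mean).cdf(550)

print(f"z (rounded) = {z_rounded}")
print(f"P[Ybar > 550], z rounded:   {p_table:.4f}")
print(f"P[Ybar > 550], no rounding: {p_exact:.4f}")
```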
CLT

Theorem: The CENTRAL LIMIT THEOREM
states that for any i.i.d. random sample,
$Y_1, Y_2, \ldots, Y_n$, where $E(Y_i) = \mu$ and $E[(Y_i - \mu)^2] = \sigma^2$,
$\bar{Y}$ is approximately distributed as $N(\mu, \sigma^2 / n)$
as $n \to \infty$.

For finite n this is only an approximation,
and the approximation improves as $n \to \infty$.
(A powerful tool!)
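
A small simulation can make the theorem concrete. The sketch below assumes a skewed exponential popl'n, which is a hypothetical choice made only for illustration, and tracks the skewness of the simulated sample means; a normal dist'n has skewness 0.

```python
import random

def skewness(values):
    """Moment-based sample skewness (0 for a perfectly symmetric sample)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def simulate_sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential(1) popl'n (assumed)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(205)
skew_by_n = {n: skewness(simulate_sample_means(n, 5000, rng)) for n in (1, 4, 32)}

for n, sk in skew_by_n.items():
    print(f"n = {n:2d}   skewness of sample means = {sk:.2f}")
```

The skewness shrinks toward 0 as n grows, which is the CLT at work; how fast it shrinks depends on how non-normal the popl'n is.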
CLT and Sample Size
Sometimes, the CLT kicks in after only a few
observations (small n).

But, sometimes we need a very large n:

Example 5.13
Ex. 5.13: Y = # eye facets in fruit fly.
• Clearly Y is a count and can’t be exactly
normal (see the idealized plot in Fig. 5.13).
• But, by about n = 32 we’re close to normal:
Unbiased Estimation

Parameters such as µ or p are usually
unknown, and we use the sample data to
estimate them.

DEF'N: If an estimator $\hat{\theta}$ of an unknown
parameter $\theta$ has the property that $E(\hat{\theta}) = \theta$
we say it is an UNBIASED ESTIMATOR.
(An estimator with $E(\hat{\theta}) \neq \theta$ is BIASED.)

For instance, we know $E(\bar{Y}) = \mu$, so $\bar{Y}$ is
unbiased for µ.
Standard Error

DEF'N: The STANDARD ERROR of a point
estimator is the estimated SD (the square
root of the variance) of the estimator:
$SE(\hat{\theta}) = \sqrt{\mathrm{Variance}(\hat{\theta})}$

DEF'N: The STANDARD ERROR OF THE
MEAN (SEM) is the estimated SD of the
sample mean:
$SE(\bar{Y}) = \sqrt{S_Y^2 / n} = S_Y / \sqrt{n}$
where $S_Y^2$ is the sample variance of the observations.
Examples 6.1-6.2


Ex. 6.1-6.2: Y = stem length of soybean
plants (cm). n = 13:
We find $\bar{Y} = 21.34$ cm and $S^2 = 1.486$
so $SE(\bar{Y}) = \sqrt{1.486 / 13} = 1.22 / \sqrt{13} = 0.338$ cm
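
The arithmetic can be verified with a short Python sketch. Only the summary values n = 13 and S² = 1.486 from the example are used; the raw stem lengths are not given here.

```python
from math import sqrt

n = 13
sample_var = 1.486               # S^2 from the example

sd = sqrt(sample_var)            # sample SD, S
sem = sd / sqrt(n)               # standard error of the mean, S / sqrt(n)

print(f"S   = {sd:.2f} cm")
print(f"SEM = {sem:.3f} cm")
```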
SE vs. SD

DO NOT confuse the SE with the SD !

In Ex. 6.2, the SD of the sample was
S = √1.486 = 1.22,
but the SEM was 1.22/√13 = 0.34.
(Usually, we round SEM to 2 signif. digits.)

Notice here again that as n increases, SEM decreases:
more precision in larger samples.