Stats 244.3 - The Department of Mathematics & Statistics


Stats 244.3(02)
Review
Summarizing Data
Graphical Methods
Histogram
Grouped Freq Table
[Histogram: frequencies (0 to 8) plotted for the class intervals 70–80, 80–90, 90–100, 100–110, 110–120, 120–130]
Stem-Leaf Diagram

 8 | 024669
 9 | 04455699
10 | 224559
11 | 189
12 |

Grouped frequency table:

Class interval | Verbal IQ | Math IQ
70 to 80       | 1         | 1
80 to 90       | 6         | 2
90 to 100      | 7         | 11
100 to 110     | 6         | 4
110 to 120     | 3         | 4
120 to 130     | 0         | 1
Box-whisker Plot
Summary
Numerical Measures
Measure of Central Location
1. Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• Center of gravity
2. Median
• “middle” observation
Measure of Non-Central Location
1. Percentiles
2. Quartiles
1. Lower quartile (Q1) (25th percentile)
(lower mid-hinge)
2. median (Q2) (50th percentile) (hinge)
3. Upper quartile (Q3) (75th percentile)
(upper mid-hinge)
Measure of Variability
(Dispersion, Spread)
1. Range
2. Inter-Quartile Range
3. Variance, standard deviation
4. Pseudo-standard deviation
1. Range
R = Range = max - min
2. Inter-Quartile Range (IQR)
Inter-Quartile Range = IQR = Q3 - Q1
The Sample Variance
is defined as the quantity:
$$s^2 = \frac{\sum_{i=1}^{n} d_i^2}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}, \qquad d_i = x_i - \bar{x},$$
and is denoted by the symbol $s^2$.
The Sample Standard Deviation s
Definition: The Sample Standard Deviation is
defined by:
$$s = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
Hence the Sample Standard Deviation, s, is the
square root of the sample variance.
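The definitions above translate directly into code. Below is a minimal Python sketch (the data values are hypothetical) computing the mean, the sample variance, and the sample standard deviation from the deviations di = xi − x̄:

```python
# Sample variance and standard deviation from the definition
# (hypothetical data for illustration)
x = [84, 90, 92, 96, 96, 99, 104, 105, 105, 109]
n = len(x)
xbar = sum(x) / n                       # sample mean
d = [xi - xbar for xi in x]             # deviations from the mean
s2 = sum(di**2 for di in d) / (n - 1)   # sample variance
s = s2 ** 0.5                           # sample standard deviation
print(xbar, s2, s)
```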
Interpretations of s
• In Normal distributions
– Approximately 2/3 of the observations will lie
within one standard deviation of the mean
– Approximately 95% of the observations lie
within two standard deviations of the mean
– In a histogram of the Normal distribution, the
standard deviation is approximately the
distance from the mode to the inflection point
[Figure: Normal density curve showing the mode and an inflection point; the standard deviation s is approximately the distance from the mode to the inflection point, and about 2/3 of the area lies within s of the mean]
Computing formulae for s and s²
The sum of squares of deviations from the mean can also be computed using the following identity:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
Then:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$$
and
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
A quick (rough) calculation of s:
$$s \approx \frac{\text{Range}}{4}$$
The reason for this is that approximately all (95%) of the observations lie between $\bar{x} - 2s$ and $\bar{x} + 2s$.
Thus $\max \approx \bar{x} + 2s$ and $\min \approx \bar{x} - 2s$,
and Range $= \max - \min \approx (\bar{x} + 2s) - (\bar{x} - 2s) = 4s$.
Hence $s \approx \text{Range}/4$.
The Pseudo Standard Deviation (PSD)
Definition: The Pseudo Standard Deviation
(PSD) is defined by:
$$\text{PSD} = \frac{\text{IQR}}{1.35} = \frac{\text{Inter-Quartile Range}}{1.35}$$
Properties
• For Normal distributions the magnitude of the
pseudo standard deviation (PSD) and the standard
deviation (s) will be approximately the same value
• For leptokurtic distributions the standard deviation
(s) will be larger than the pseudo standard
deviation (PSD)
• For platykurtic distributions the standard deviation
(s) will be smaller than the pseudo standard
deviation (PSD)
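A quick numeric check of these properties, as a sketch using numpy (assumed available; the normal sample is simulated, so the exact numbers will vary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=100, scale=15, size=50)   # simulated normal sample

s = x.std(ddof=1)                            # sample standard deviation
q1, q3 = np.percentile(x, [25, 75])
psd = (q3 - q1) / 1.35                       # pseudo standard deviation
rough = (x.max() - x.min()) / 4              # quick Range/4 estimate
print(s, psd, rough)   # for moderate normal samples all three are close
```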
Measures of Shape
• Skewness
[Figure: three density curves illustrating left-skewed, symmetric, and right-skewed distributions]
• Kurtosis
[Figure: density curves with differing kurtosis (flatter and more peaked than the normal) plotted on the same axes]
• Skewness – based on the sum of cubes:
$$\sum_{i=1}^{n} (x_i - \bar{x})^3$$
• Kurtosis – based on the sum of 4th powers:
$$\sum_{i=1}^{n} (x_i - \bar{x})^4$$
The Measure of Skewness
$$g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3}$$
The Measure of Kurtosis
$$g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4} - 3$$
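Both measures can be computed directly from their definitions. A minimal sketch (hypothetical data; note it uses the 1/n averaging shown above, so it may differ slightly from library versions that apply small-sample corrections):

```python
import numpy as np

def g1_g2(x):
    x = np.asarray(x, dtype=float)
    s = x.std(ddof=1)                  # sample standard deviation
    m3 = ((x - x.mean()) ** 3).mean()  # (1/n) * sum of cubes
    m4 = ((x - x.mean()) ** 4).mean()  # (1/n) * sum of 4th powers
    return m3 / s**3, m4 / s**4 - 3    # (skewness g1, kurtosis g2)

print(g1_g2([2, 3, 5, 8, 8, 9, 20]))   # right-skewed data: g1 > 0
```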
Interpretations of Measures of Shape
• Skewness
[Figure: three density curves: g1 > 0 (right-skewed), g1 = 0 (symmetric), g1 < 0 (left-skewed)]
• Kurtosis
[Figure: density curves: g2 < 0 (flatter than normal), g2 = 0 (normal), g2 > 0 (more peaked, heavier-tailed than normal)]
Inferential Statistics
Making decisions regarding the population based on a sample
Estimation by Confidence Intervals
• Definition
– A (100·P)% confidence interval for an unknown parameter is a pair of sample statistics (t1 and t2) having the following properties:
1. P[t1 < t2] = 1. That is, t1 is always smaller than t2.
2. P[the unknown parameter lies between t1 and t2] = P.
• The statistics t1 and t2 are random variables.
• Property 2 states that the probability that the unknown parameter is bounded by the two statistics t1 and t2 is P.
Confidence Interval for a Proportion
$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{\alpha/2}$ = upper $\alpha/2$ critical point of the standard normal distribution, and
$$B = z_{\alpha/2}\,\sigma_{\hat{p}} \approx z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \text{Error Bound}$$
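A sketch of this interval in Python (scipy is assumed to be available; x and n are hypothetical counts):

```python
from scipy.stats import norm

x, n = 56, 100                 # hypothetical: 56 successes in 100 trials
alpha = 0.05                   # 95% confidence
p_hat = x / n
z = norm.ppf(1 - alpha / 2)    # upper alpha/2 critical point
B = z * (p_hat * (1 - p_hat) / n) ** 0.5   # error bound
print(p_hat - B, p_hat + B)
```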
Determination of Sample Size
The sample size that will estimate p with an Error Bound B and level of confidence P = 1 – α is:
$$n = \frac{z_{\alpha/2}^2\, p^*(1 - p^*)}{B^2}$$
where:
• B is the desired Error Bound
• $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal distribution
• p* is some preliminary estimate of p.
Confidence Intervals for the mean
of a Normal Population, μ
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} \quad\text{or}\quad \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $\bar{x}$ = sample mean, $z_{\alpha/2}$ = upper $\alpha/2$ critical point of the standard normal distribution, and s = sample standard deviation (estimating σ).
Determination of Sample Size
The sample size that will estimate μ with an Error Bound B and level of confidence P = 1 – α is:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} = \frac{z_{\alpha/2}^2\,(s^*)^2}{B^2}$$
where:
• B is the desired Error Bound
• $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal distribution
• s* is some preliminary estimate of σ.
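Both sample-size formulas (for p and for μ) are one-liners once zα/2 is available. A sketch with hypothetical inputs, rounding up to the next whole subject:

```python
import math
from scipy.stats import norm

alpha = 0.05
z = norm.ppf(1 - alpha / 2)

# Estimating p within B = 0.03, preliminary estimate p* = 0.5 (worst case)
B_p, p_star = 0.03, 0.5
n_p = math.ceil(z**2 * p_star * (1 - p_star) / B_p**2)

# Estimating mu within B = 2.0, preliminary estimate s* = 15
B_mu, s_star = 2.0, 15.0
n_mu = math.ceil(z**2 * s_star**2 / B_mu**2)
print(n_p, n_mu)
```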
Hypothesis Testing
An important area of statistical
inference
Definition
Hypothesis (H)
– Statement about the parameters of the
population
• In hypothesis testing there are two
hypotheses of interest.
– The null hypothesis (H0)
– The alternative hypothesis (HA)
Type I, Type II Errors
1. Rejecting the null hypothesis when it is true (type I error).
2. Accepting the null hypothesis when it is false (type II error).
Decision Table showing types of Error

          | H0 is True       | H0 is False
Accept H0 | Correct Decision | Type II Error
Reject H0 | Type I Error     | Correct Decision
To define a statistical Test we
1. Choose a statistic (called the test statistic)
2. Divide the range of possible values for the
test statistic into two parts
• The Acceptance Region
• The Critical Region
To perform a statistical Test we
1. Collect the data.
2. Compute the value of the test statistic.
3. Make the Decision:
• If the value of the test statistic is in
the Acceptance Region we decide to
accept H0 .
• If the value of the test statistic is in
the Critical Region we decide to
reject H0 .
Probability of the two types of error
Definitions:
For any statistical testing procedure define
1. α = P[Rejecting the null hypothesis when it is true] = P[type I error]
2. β = P[Accepting the null hypothesis when it is false] = P[type II error]
Determining the Critical Region
1. The Critical Region should consist of values of
the test statistic that indicate that HA is true.
(hence H0 should be rejected).
2. The size of the Critical Region is determined so that the probability of making a type I error, α, is at some pre-determined level (usually 0.05 or 0.01). This value is called the significance level of the test.
Significance level = P[test makes type I error]
To find the Critical Region
1. Find the sampling distribution of the test statistic when H0 is true.
2. Locate the Critical Region in the tails (either left or right or both) of the sampling distribution of the test statistic when H0 is true. Whether you locate the critical region in the left tail or right tail or both tails depends on which values indicate HA is true. The tails chosen = values indicating HA.
3. The size of the Critical Region is chosen so that the area over the critical region and under the sampling distribution of the test statistic when H0 is true is the desired level of α = P[type I error].
[Figure: sampling distribution of the test statistic when H0 is true, with the Critical Region of area α shaded in the tail]
The z-tests
Testing the probability of success
$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
Testing the mean of a Normal Population
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma} \approx \frac{\sqrt{n}(\bar{x} - \mu_0)}{s}$$
Critical Regions for testing the probability
of success, p.
The Alternative Hypothesis HA | The Critical Region
HA: p ≠ p0 | z < −zα/2 or z > zα/2
HA: p > p0 | z > zα
HA: p < p0 | z < −zα
Critical Regions for testing the mean, μ, of a normal population

The Alternative Hypothesis HA | The Critical Region
HA: μ ≠ μ0 | z < −zα/2 or z > zα/2
HA: μ > μ0 | z > zα
HA: μ < μ0 | z < −zα
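Both z-tests and their critical regions can be wrapped up as below (a sketch; the data and hypothesized values are hypothetical):

```python
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)          # two-tailed critical value

# Testing p = p0: 61 successes in 100 trials, p0 = 0.5
x, n, p0 = 61, 100, 0.5
z_p = (x / n - p0) / (p0 * (1 - p0) / n) ** 0.5

# Testing mu = mu0: xbar = 103.2, s = 14.1, n = 36, mu0 = 100
xbar, s, m, mu0 = 103.2, 14.1, 36, 100.0
z_mu = (xbar - mu0) / (s / m ** 0.5)

for z in (z_p, z_mu):
    print(z, "reject H0" if abs(z) > z_crit else "accept H0")
```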
• You can compare a statistical test to a meter.
[Figure: a meter with the Acceptance Region in the centre and the Critical Region (the red zone) at the ends; the value of the test statistic is the needle position]
If the value of the test statistic lands in the Acceptance Region: Accept H0.
If the value of the test statistic lands in the Critical Region: Reject H0.
Sometimes the critical region is
located on one side. These tests are
called one tailed tests.
Whether you use a one tailed test or a two tailed
test depends on:
1. The hypotheses being tested (H0 and HA).
2. The test statistic.
If only large positive values of the test statistic
indicate HA then the critical region should be
located in the positive tail. (1 tailed test)
If only large negative values of the test statistic
indicate HA then the critical region should be
located in the negative tail. (1 tailed test)
If both large positive and large negative values of the test statistic indicate HA then the critical region should be located in both the positive and negative tails. (2 tailed test)
Usually 1 tailed tests are appropriate if HA is
one-sided.
Two tailed tests are appropriate if HA is two sided.
But not always
The p-value approach to
Hypothesis Testing
Definition – Once the test statistic has been computed from the data, the p-value is defined to be:
p-value = P[the test statistic is as or more extreme than the observed value of the test statistic when H0 is true]
"More extreme" means giving stronger evidence for rejecting H0.
Properties of the p -value
1. If the p-value is small (<0.05 or 0.01) H0 should be
rejected.
2. The p-value measures the plausibility of H0.
3. If the test is two tailed the p-value should be two
tailed.
4. If the test is one tailed the p-value should be one
tailed.
5. It is customary to report p-values when reporting
the results. This gives the reader some idea of the
strength of the evidence for rejecting H0
Summary
• A common way to report statistical tests is to
compute the p-value.
• If the p-value is small ( < 0.05 or < 0.01) then
H0 is rejected.
• If the p-value is extremely small this gives a
strong indication that HA is true.
• If the p-value is marginally above the
threshold 0.05 then we cannot reject H0 but
there would be a suspicion that H0 is false.
Student's t-test
The Situation
• Let x1, x2, x3, … , xn denote a sample from a normal population with mean μ and standard deviation σ. Both μ and σ are unknown.
• Let
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \text{the sample mean}$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \text{the sample standard deviation}$$
• We want to test if the mean, μ, is equal to some given value μ0.
The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
The sampling distribution of the test statistic is the t distribution with n − 1 degrees of freedom.

The Alternative Hypothesis HA | The Critical Region
HA: μ ≠ μ0 | t < −tα/2 or t > tα/2
HA: μ > μ0 | t > tα
HA: μ < μ0 | t < −tα

tα and tα/2 are critical values under the t distribution with n − 1 degrees of freedom.
[Figure: t density with upper-tail area α (or α/2) to the right of tα (or tα/2)]
Confidence Intervals
using the t distribution
Confidence Intervals for the mean of a Normal Population, μ, using the Standard Normal distribution:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Confidence Intervals for the mean of a Normal Population, μ, using the t distribution:
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
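A sketch of the one-sample t-test together with the matching t-based confidence interval (hypothetical data; scipy's ttest_1samp gives the same t statistic plus a two-tailed p-value):

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_1samp

x = np.array([9.8, 10.4, 10.1, 9.6, 10.9, 10.3, 9.9, 10.5])
mu0, alpha = 10.0, 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

t_stat = (xbar - mu0) / (s / np.sqrt(n))        # test statistic, df = n - 1
t_crit = t_dist.ppf(1 - alpha / 2, df=n - 1)
print(t_stat, "reject H0" if abs(t_stat) > t_crit else "accept H0")
print(ttest_1samp(x, mu0))                      # cross-check

half = t_crit * s / np.sqrt(n)                  # t-based CI for mu
print(xbar - half, xbar + half)
```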
Testing and Estimation of
Variances
Sampling Theory
The statistic
$$U = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2}$$
has a χ² distribution with n − 1 degrees of freedom.
Critical Points of the χ² distribution:
[Figure: χ² density with upper-tail area α to the right of the critical point χ²α]
Confidence intervals for σ² and σ
Hence (1 – α)100% confidence limits for σ² are:
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \quad\text{to}\quad \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
and (1 – α)100% confidence limits for σ are:
$$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} \quad\text{to}\quad \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$$
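These limits come straight from the χ² quantiles. A sketch (hypothetical n and s²; note scipy's chi2.ppf takes the lower-tail probability, so the upper critical point χ²α/2 is ppf(1 − α/2)):

```python
from scipy.stats import chi2

n, s2, alpha = 25, 18.3, 0.05          # hypothetical sample size and s^2
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)  # divide by upper point
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)      # divide by lower point
print(lo, hi)                          # confidence limits for sigma^2
print(lo ** 0.5, hi ** 0.5)            # confidence limits for sigma
```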
Testing Hypotheses for σ² and σ
Suppose we want to test:
$$H_0: \sigma^2 = \sigma_0^2 \quad\text{against}\quad H_A: \sigma^2 \ne \sigma_0^2$$
The test statistic:
$$U = \frac{(n-1)s^2}{\sigma_0^2}$$
If H0 is true the test statistic, U, has a χ² distribution with n − 1 degrees of freedom.
Thus we reject H0 if
$$\frac{(n-1)s^2}{\sigma_0^2} \le \chi^2_{1-\alpha/2} \quad\text{or}\quad \frac{(n-1)s^2}{\sigma_0^2} \ge \chi^2_{\alpha/2}$$
[Figure: χ² density; reject in each tail of area α/2 (below χ²₁₋α/₂ and above χ²α/₂), accept in between]
One-tailed Tests for σ² and σ
Suppose we want to test:
$$H_0: \sigma^2 = \sigma_0^2 \quad\text{against}\quad H_A: \sigma^2 > \sigma_0^2$$
The test statistic:
$$U = \frac{(n-1)s^2}{\sigma_0^2}$$
We reject H0 if
$$\frac{(n-1)s^2}{\sigma_0^2} \ge \chi^2_{\alpha}$$
[Figure: χ² density; reject in the upper tail of area α, above χ²α]
Or suppose we want to test:
$$H_0: \sigma^2 = \sigma_0^2 \quad\text{against}\quad H_A: \sigma^2 < \sigma_0^2$$
The test statistic:
$$U = \frac{(n-1)s^2}{\sigma_0^2}$$
We reject H0 if
$$\frac{(n-1)s^2}{\sigma_0^2} \le \chi^2_{1-\alpha}$$
[Figure: χ² density; reject in the lower tail of area α, below χ²₁₋α]
Comparing Populations
Proportions and means
Comparing proportions
Comparing two binomial probabilities p1 and p2
The test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
The Critical Region

The Alternative Hypothesis HA | The Critical Region
HA: p1 ≠ p2 | z < −zα/2 or z > zα/2
HA: p1 > p2 | z > zα
HA: p1 < p2 | z < −zα
100(1 – α)% Confidence Interval for δ = p1 – p2:
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
or $\hat{p}_1 - \hat{p}_2 \pm B$, where
$$B = z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} = z_{\alpha/2}\,\sigma_{\hat{p}_1 - \hat{p}_2}$$
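A sketch combining the pooled z-test with the (unpooled) confidence interval (hypothetical counts):

```python
from scipy.stats import norm

x1, n1, x2, n2 = 45, 120, 30, 110      # hypothetical counts
alpha = 0.05
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)

# Test H0: p1 = p2 (pooled standard error)
z = (p1 - p2) / (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5
print(z, 2 * norm.sf(abs(z)))          # statistic and two-tailed p-value

# CI for p1 - p2 (unpooled standard error)
B = norm.ppf(1 - alpha / 2) * (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
print(p1 - p2 - B, p1 - p2 + B)
```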
Sample size determination
Confidence Interval for δ = p1 – p2:
$$\hat{p}_1 - \hat{p}_2 \pm B, \qquad B = z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Again we want to choose n1 and n2 to set B at some predetermined level with a fixed level of confidence 1 – α.
Special solutions – case 1: n1 = n2 = n. Then
$$n_1 = n_2 = n = \frac{z_{\alpha/2}^2}{B^2}\left[p_1(1-p_1) + p_2(1-p_2)\right]$$
Special solutions – case 2: Choose n1 and n2 to minimize N = n1 + n2 = total sample size. Then
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\left[p_1(1-p_1) + \sqrt{p_1(1-p_1)\,p_2(1-p_2)}\right]$$
$$n_2 = \frac{z_{\alpha/2}^2}{B^2}\left[p_2(1-p_2) + \sqrt{p_1(1-p_1)\,p_2(1-p_2)}\right]$$
Special solutions – case 3: Choose n1 and n2 to minimize C = C0 + c1n1 + c2n2 = total cost of the study.
Note: C0 = fixed (set-up) costs, c1 = cost per unit in population 1, c2 = cost per unit in population 2. Then
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\left[p_1(1-p_1) + \sqrt{\frac{c_2}{c_1}\,p_1(1-p_1)\,p_2(1-p_2)}\right]$$
$$n_2 = \frac{z_{\alpha/2}^2}{B^2}\left[p_2(1-p_2) + \sqrt{\frac{c_1}{c_2}\,p_1(1-p_1)\,p_2(1-p_2)}\right]$$
Comparing Means
The z-test
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}} \qquad (n \text{ and } m \text{ large})$$
Confidence Interval for δ = μ1 – μ2:
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
or $\bar{x}_1 - \bar{x}_2 \pm B$, where
$$B = z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Sample size determination
The sample sizes required, n1 and n2, to estimate μ1 – μ2 within an error bound B with level of confidence 1 – α are:
Equal sample sizes:
$$n = n_1 = n_2 = \frac{z_{\alpha/2}^2\left(\sigma_1^2 + \sigma_2^2\right)}{B^2}$$
Minimizing the total sample size N = n1 + n2:
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_1^2 + \sigma_1\sigma_2\right), \qquad n_2 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_2^2 + \sigma_1\sigma_2\right)$$
Minimizing the total cost C = C0 + c1n1 + c2n2:
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_1^2 + \sqrt{\frac{c_2}{c_1}}\,\sigma_1\sigma_2\right), \qquad n_2 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_2^2 + \sqrt{\frac{c_1}{c_2}}\,\sigma_1\sigma_2\right)$$
The t test – for comparing means – small samples (equal variances)
Situation
• We have two normal populations (1 and 2)
• Let μ1 and σ denote the mean and standard deviation of population 1.
• Let μ2 and σ denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same: σ1 = σ2 = σ.
The t test for comparing means – small samples (equal variances):
$$t = \frac{\bar{x} - \bar{y}}{s_{\text{Pooled}}\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
where
$$s_{\text{Pooled}} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}}$$
The Alternative Hypothesis HA | The Critical Region
HA: μ1 ≠ μ2 | t < −tα/2 or t > tα/2
HA: μ1 > μ2 | t > tα
HA: μ1 < μ2 | t < −tα

tα/2 and tα are critical points under the t distribution with degrees of freedom n + m − 2.
Confidence intervals for the difference in two means of normal populations (small sample sizes, equal variances)
(1 – α)100% confidence limits for μ1 – μ2:
$$(\bar{x} - \bar{y}) \pm t_{\alpha/2}\, s_{\text{Pooled}}\sqrt{\frac{1}{n} + \frac{1}{m}}$$
where
$$s_{\text{Pooled}} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}} \quad\text{and}\quad df = n + m - 2$$
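A sketch of the pooled t procedure (hypothetical samples; scipy's ttest_ind with equal_var=True reproduces the same test):

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind

x = np.array([12.1, 11.8, 13.0, 12.4, 11.6, 12.9])
y = np.array([11.2, 11.9, 10.8, 11.5, 11.1])
n, m, alpha = len(x), len(y), 0.05

sp = np.sqrt(((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2))
t_stat = (x.mean() - y.mean()) / (sp * np.sqrt(1 / n + 1 / m))
print(t_stat, ttest_ind(x, y, equal_var=True))   # manual vs library

half = t_dist.ppf(1 - alpha / 2, df=n + m - 2) * sp * np.sqrt(1 / n + 1 / m)
d = x.mean() - y.mean()
print(d - half, d + half)                        # CI for mu1 - mu2
```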
Tests, Confidence intervals
for the difference in
two means of normal populations
(small sample sizes, unequal variances)
The approximate test for comparing two means of Normal Populations (unequal variances)
Test statistic:
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$
with approximate degrees of freedom
$$df = \frac{\left(\frac{s_x^2}{n} + \frac{s_y^2}{m}\right)^2}{\frac{1}{n-1}\left(\frac{s_x^2}{n}\right)^2 + \frac{1}{m-1}\left(\frac{s_y^2}{m}\right)^2}$$
Null Hypothesis H0: μ1 = μ2

Alt. Hypothesis | Critical Region
HA: μ1 ≠ μ2 | t < −tα/2 or t > tα/2
HA: μ1 > μ2 | t > tα
HA: μ1 < μ2 | t < −tα
Confidence intervals for the difference in two means of normal populations (small samples, unequal variances)
(1 – α)100% confidence limits for μ1 – μ2:
$$(\bar{x} - \bar{y}) \pm t_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$
with
$$df = \frac{\left(\frac{s_x^2}{n} + \frac{s_y^2}{m}\right)^2}{\frac{1}{n-1}\left(\frac{s_x^2}{n}\right)^2 + \frac{1}{m-1}\left(\frac{s_y^2}{m}\right)^2}$$
The paired t-test
An example of improved
experimental design
The matched pair experimental design (The paired sample experiment)
Prior to assigning the treatments the subjects are grouped into pairs of similar subjects.
Suppose that there are n such pairs (a total of 2n = n + n subjects or cases). The two treatments are then randomly assigned to each pair. One member of a pair will receive treatment 1, while the other receives treatment 2. The data collected is as follows:
– (x1, y1), (x2, y2), (x3, y3), …, (xn, yn)
where xi = the measurement of the response for the subject in pair i that received treatment 1, and yi = the measurement of the response for the subject in pair i that received treatment 2.
The data

Pair:        1    2    3    …   n
Treatment 1: x1   x2   x3   …   xn
Treatment 2: y1   y2   y3   …   yn
To test H0: μ1 = μ2 is equivalent to testing H0: μd = 0 (we have converted the two sample problem into a single sample problem).
The test statistic is the single sample t-test on the differences d1, d2, d3, … , dn (where di = xi − yi), namely
$$t_d = \frac{\bar{d} - 0}{s_d/\sqrt{n}}, \qquad df = n - 1$$
where $\bar{d}$ = the mean of the di's and sd = the standard deviation of the di's.
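A sketch of the paired t-test on the differences (hypothetical paired data; scipy's ttest_rel performs the same computation):

```python
import numpy as np
from scipy.stats import ttest_rel

x = np.array([140, 132, 128, 151, 139, 145])   # treatment 1, pair i
y = np.array([135, 130, 127, 144, 140, 138])   # treatment 2, pair i
d = x - y                                      # within-pair differences

n = len(d)
t_d = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # df = n - 1
print(t_d)
print(ttest_rel(x, y))                         # cross-check
```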
Testing for the equality of
variances
The F test
The test statistic (F):
$$F = \frac{s_x^2}{s_y^2} \quad\text{or}\quad \frac{1}{F} = \frac{s_y^2}{s_x^2}$$
The sampling distribution of the test statistic: if the Null Hypothesis (H0) is true, then the sampling distribution of F is called the F-distribution with ν1 = n − 1 degrees of freedom in the numerator and ν2 = m − 1 degrees of freedom in the denominator.
[Figure: F(ν1, ν2) density with upper-tail area α shaded]
Critical region for the test:
$$H_0: \sigma_x^2 = \sigma_y^2 \quad\text{against}\quad H_A: \sigma_x^2 \ne \sigma_y^2 \quad\text{(two sided alternative)}$$
Reject H0 if
$$F = \frac{s_x^2}{s_y^2} \ge F_{\alpha/2}(n-1, m-1) \quad\text{or}\quad \frac{1}{F} = \frac{s_y^2}{s_x^2} \ge F_{\alpha/2}(m-1, n-1)$$
Critical region for the test (one tailed):
$$H_0: \sigma_x^2 = \sigma_y^2 \quad\text{against}\quad H_A: \sigma_x^2 > \sigma_y^2 \quad\text{(one sided alternative)}$$
Reject H0 if
$$F = \frac{s_x^2}{s_y^2} \ge F_{\alpha}(n-1, m-1)$$
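A sketch of the two-sided F test for equal variances (hypothetical samples; scipy's f.ppf supplies the critical points):

```python
import numpy as np
from scipy.stats import f as f_dist

x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.9, 5.2])
y = np.array([5.0, 5.1, 4.9, 5.2, 5.0, 5.1])
n, m, alpha = len(x), len(y), 0.05

F = x.var(ddof=1) / y.var(ddof=1)
reject = (F >= f_dist.ppf(1 - alpha / 2, n - 1, m - 1) or
          1 / F >= f_dist.ppf(1 - alpha / 2, m - 1, n - 1))
print(F, "reject H0" if reject else "accept H0")
```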
Summary of Tests
One Sample Tests

Situation 1: Sample from the Normal distribution with unknown mean and known variance (testing μ).
Test statistic: $z = \dfrac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma}$, H0: μ = μ0.
Critical regions: HA: μ ≠ μ0 → z < −zα/2 or z > zα/2; HA: μ > μ0 → z > zα; HA: μ < μ0 → z < −zα.

Situation 2: Sample from the Normal distribution with unknown mean and unknown variance (testing μ).
Test statistic: $t = \dfrac{\sqrt{n}(\bar{x} - \mu_0)}{s}$, H0: μ = μ0.
Critical regions: HA: μ ≠ μ0 → t < −tα/2 or t > tα/2; HA: μ > μ0 → t > tα; HA: μ < μ0 → t < −tα.

Situation 3: Testing of a binomial probability p.
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$, H0: p = p0.
Critical regions: HA: p ≠ p0 → z < −zα/2 or z > zα/2; HA: p > p0 → z > zα; HA: p < p0 → z < −zα.

Situation 4: Sample from the Normal distribution with unknown mean and unknown variance (testing σ).
Test statistic: $U = \dfrac{(n-1)s^2}{\sigma_0^2}$, H0: σ = σ0.
Critical regions: HA: σ ≠ σ0 → U < χ²₁₋α/₂(n−1) or U > χ²α/₂(n−1); HA: σ > σ0 → U > χ²α(n−1); HA: σ < σ0 → U < χ²₁₋α(n−1).
Two Sample Tests

Situation 1: Two independent samples from the Normal distribution with unknown means and known variances (testing μ1 − μ2).
Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$, H0: μ1 = μ2.
Critical regions: HA: μ1 ≠ μ2 → z < −zα/2 or z > zα/2; HA: μ1 > μ2 → z > zα; HA: μ1 < μ2 → z < −zα.

Situation 2: Two independent samples from the Normal distribution with unknown means and unknown but equal variances (testing μ1 − μ2).
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ with $s_p = \sqrt{\dfrac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2}}$ and df = n + m − 2, H0: μ1 = μ2.
Critical regions: HA: μ1 ≠ μ2 → t < −tα/2 or t > tα/2; HA: μ1 > μ2 → t > tα; HA: μ1 < μ2 → t < −tα.

Situation 3: Estimation and testing of the difference between two binomial probabilities, p1 − p2.
Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, H0: p1 = p2.
Critical regions: HA: p1 ≠ p2 → z < −zα/2 or z > zα/2; HA: p1 > p2 → z > zα; HA: p1 < p2 → z < −zα.
Two Sample Tests – continued

Situation 1: Two independent Normal samples with unknown means and unequal variances (testing μ1 − μ2).
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, H0: μ1 = μ2, with approximate degrees of freedom
$$df^* = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2}$$
Critical regions (df = df*): HA: μ1 ≠ μ2 → t < −tα/2 or t > tα/2; HA: μ1 > μ2 → t > tα; HA: μ1 < μ2 → t < −tα.

Situation 2: Two independent Normal samples with unknown means and variances (testing σ1 against σ2).
Test statistic: $F = \dfrac{s_1^2}{s_2^2}$ or $\dfrac{1}{F} = \dfrac{s_2^2}{s_1^2}$, H0: σ1 = σ2.
Critical regions: HA: σ1 ≠ σ2 → F > Fα/2(n−1, m−1) or 1/F > Fα/2(m−1, n−1); HA: σ1 > σ2 → F > Fα(n−1, m−1); HA: σ1 < σ2 → 1/F > Fα(m−1, n−1).
The paired t test

Situation: n matched pairs of subjects are treated with two treatments; di = xi − yi has mean μd = μ1 − μ2.
Test statistic: $t = \dfrac{\bar{d}}{s_d/\sqrt{n}}$, df = n − 1, H0: μ1 = μ2.
Critical regions: HA: μ1 ≠ μ2 → t < −tα/2 or t > tα/2; HA: μ1 > μ2 → t > tα; HA: μ1 < μ2 → t < −tα.
[Diagram: independent samples assign separate (possibly equal-sized) groups to Treatment 1 and Treatment 2; matched pairs assign one member of each of Pair 1, Pair 2, …, Pair n to each treatment]
Comparing k Populations
Means – One way Analysis of Variance (ANOVA)
The F test
The F test – for comparing k means
Situation
• We have k normal populations
• Let μi and σ denote the mean and standard deviation of population i, for i = 1, 2, 3, …, k.
• Note: we assume that the standard deviation for each population is the same: σ1 = σ2 = … = σk = σ.
We want to test
$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$
against
$$H_A: \mu_i \ne \mu_j \text{ for at least one pair } i, j$$
Use the test statistic
$$F = \frac{s^2_{\text{Between}}}{s^2_{\text{Pooled}}} = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \big/ (k-1)}{\sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)}$$
where $\bar{x}_i$ = mean for the ith sample, si = standard deviation for the ith sample, and
$$\bar{x} = \frac{n_1\bar{x}_1 + \cdots + n_k\bar{x}_k}{n_1 + \cdots + n_k} = \text{overall mean}$$
The statistic $\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$ is called the Between Sum of Squares and is denoted by SSBetween. It measures the variability between samples. k − 1 is known as the Between degrees of freedom, and
$$\text{MS}_{\text{Between}} = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \big/ (k-1)$$
is called the Between Mean Square.
The statistic $\sum_{i=1}^{k} (n_i - 1)s_i^2$ is called the Within Sum of Squares and is denoted by SSWithin. $\sum_{i=1}^{k} n_i - k = N - k$ is known as the Within degrees of freedom, and
$$\text{MS}_{\text{Within}} = \sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)$$
is called the Within Mean Square. Then
$$F = \frac{\text{MS}_{\text{Between}}}{\text{MS}_{\text{Within}}}$$
The Computing formula for F:
Compute
1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ = Total for sample i
2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = Grand Total
3) $N = \sum_{i=1}^{k} n_i$ = Total sample size
4) $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$
5) $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$
Then
1) $\text{SS}_{\text{Between}} = \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - \dfrac{G^2}{N}$
2) $\text{SS}_{\text{Within}} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$
3) $F = \dfrac{\text{SS}_{\text{Between}}/(k-1)}{\text{SS}_{\text{Within}}/(N-k)}$
The critical region for the F test
We reject
$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$
if $F \ge F_{\alpha}$, where Fα is the critical point under the F distribution with ν1 = k − 1 degrees of freedom in the numerator and ν2 = N − k degrees of freedom in the denominator.
The ANOVA Table
A convenient method for displaying the calculations for the F-test

Source  | d.f.  | Sum of Squares | Mean Square | F-ratio
Between | k − 1 | SSBetween      | MSBetween   | MSB/MSW
Within  | N − k | SSWithin       | MSWithin    |
Total   | N − 1 | SSTotal        |             |
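A sketch of the one-way ANOVA F computation using the totals formula, cross-checked against scipy's f_oneway (three hypothetical samples):

```python
import numpy as np
from scipy.stats import f_oneway

samples = [np.array([24., 26, 23, 25]),
           np.array([28., 30, 27, 29, 31]),
           np.array([22., 21, 24, 23])]

k = len(samples)
N = sum(len(s) for s in samples)
G = sum(s.sum() for s in samples)                       # grand total
t_term = sum(s.sum()**2 / len(s) for s in samples)      # sum of Ti^2/ni
ss_between = t_term - G**2 / N
ss_within = sum((s**2).sum() for s in samples) - t_term
F = (ss_between / (k - 1)) / (ss_within / (N - k))
print(F)
print(f_oneway(*samples))                               # same F, with p-value
```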
Fisher's LSD (least significant difference) procedure:
1. Test H0: μ1 = μ2 = μ3 = … = μk against HA: at least one pair of means are different, using the ANOVA F-test.
2. If H0 is accepted we conclude that all means are equal (not significantly different), and we stop.
3. If H0 is rejected we conclude that at least one pair of means is significantly different; we then follow this by using two sample t tests to determine which pairs of means are significantly different.
Comparing k Populations
Proportions
The χ² test for independence
1. The number of populations (columns): k (or c)
2. The number of categories (rows): from 2 to r.

      | 1   | 2   | … | c   | Total
1     | x11 | x12 | … | x1c | R1
2     | x21 | x22 | … | x2c | R2
⋮     |     |     |   |     | ⋮
r     | xr1 | xr2 | … | xrc | Rr
Total | C1  | C2  | … | Cc  | N
The χ² test for independence
Situation
• We have two categorical variables R and C.
• The number of categories of R is r.
• The number of categories of C is c.
• We observe n subjects from the population and count xij = the number of subjects for which R = i and C = j.
• R = rows, C = columns
The c2 test for independence
Define
c
Ri   xij  i th row T otal
j 1
c
Ci   xij  j
th
columnT otal
i 1
Eij 
Ri C j
n
= Expected frequency in the (i,j) th cell in
the case of independence.
Then to test
H0: R and C are independent
against
HA: R and C are not independent
Use test statistic
r
c
c  
2
i 1 j 1
x
ij
 Eij 
2
Eij
Eij= Expected frequency in the (i,j) th cell 
in the case of independence.
xij= observed frequency in the (i,j) th cell
Ri C j
n
Sampling distribution of the test statistic when H0 is true:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}$$
has a χ² distribution with degrees of freedom ν = (r − 1)(c − 1).
Critical and Acceptance Region:
Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$; accept H0 if $\chi^2 < \chi^2_{\alpha}$.
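A sketch of the χ² test of independence on a small contingency table (hypothetical counts; scipy's chi2_contingency builds the Eij and the statistic for you):

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 20, 10],    # rows: categories of R
                  [15, 25, 20]])   # columns: categories of C

chi2_stat, p_value, dof, expected = chi2_contingency(table)
print(chi2_stat, p_value, dof)     # dof = (r - 1)(c - 1)
print(expected)                    # E_ij = R_i * C_j / n
```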
Linear Regression
Hypothesis testing and Estimation
Assume that we have collected data on two variables X and Y. Let
(x1, y1), (x2, y2), (x3, y3), …, (xn, yn)
denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).
The Statistical Model
Each yi is assumed to be randomly generated from a normal distribution with
mean μi = α + βxi and standard deviation σ
(α, β and σ are unknown).
[Figure: the line Y = α + βX with slope β; yi scatters about the line value α + βxi]
The Data
The Linear Regression Model
• The data falls roughly about a straight line, Y = α + βX (unseen).
[Figure: scatterplot of the data about the unseen line Y = α + βX]
The Least Squares Line
Fitting the best straight line
to “linear” data
Let Y = a + bX denote an arbitrary equation of a straight line, where a and b are known values. This equation can be used to predict, for each value of X, the value of Y.
For example, if X = xi (as for the ith case) then the predicted value of Y is:
$$\hat{y}_i = a + bx_i$$
The residual
$$r_i = y_i - \hat{y}_i = y_i - (a + bx_i)$$
can be computed for each case in the sample:
$$r_1 = y_1 - \hat{y}_1, \quad r_2 = y_2 - \hat{y}_2, \quad \ldots, \quad r_n = y_n - \hat{y}_n$$
The residual sum of squares (RSS) is
$$\text{RSS} = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - (a + bx_i)\right)^2$$
a measure of the "goodness of fit" of the line Y = a + bX to the data.
The optimal choice of a and b will result in the residual sum of squares
$$\text{RSS} = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - (a + bx_i)\right)^2$$
attaining a minimum. If this is the case then the line Y = a + bX is called the Least Squares Line.
Comments
• β and α are the slope and intercept of the regression line (unseen).
• b and a are the slope and intercept of the least squares line (calculated from the data).
• $\hat{\beta}$ and $\hat{\alpha}$ are sometimes used in place of b and a. They represent the same quantities.
The equation for the least squares line
Let
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
Computing Formulae:
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
Then the slope of the least squares line can be shown to be:
$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
and the intercept of the least squares line can be shown to be:
$$a = \bar{y} - b\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$
The residual sum of squares:
$$\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - (a + bx_i)\right)^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}} \quad\text{(computing formula)}$$
Estimating σ, the standard deviation in the regression model:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - a - bx_i)^2}{n-2}} = \sqrt{\frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)} \quad\text{(computing formula)}$$
This estimate of σ is said to be based on n – 2 degrees of freedom.
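A sketch computing b, a and s directly from Sxx, Syy and Sxy (hypothetical data):

```python
import numpy as np

x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.8])
n = len(x)

Sxx = ((x - x.mean())**2).sum()
Syy = ((y - y.mean())**2).sum()
Sxy = ((x - x.mean()) * (y - y.mean())).sum()

b = Sxy / Sxx                       # slope
a = y.mean() - b * x.mean()         # intercept
rss = Syy - Sxy**2 / Sxx            # residual sum of squares
s = np.sqrt(rss / (n - 2))          # estimate of sigma, df = n - 2
print(b, a, s)
```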
Sampling distributions of the estimators
The sampling distribution of the slope of the least squares line:
$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
It can be shown that b has a normal distribution with mean and standard deviation
$$\mu_b = \beta \quad\text{and}\quad \sigma_b = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
Thus
$$z = \frac{b - \mu_b}{\sigma_b} = \frac{b - \beta}{\sigma/\sqrt{S_{xx}}}$$
has a standard normal distribution, and
$$t = \frac{b - \mu_b}{s_b} = \frac{b - \beta}{s/\sqrt{S_{xx}}}$$
has a t distribution with df = n − 2.
(1 – α)100% Confidence Limits for slope β:
$$b \pm t_{\alpha/2}\frac{s}{\sqrt{S_{xx}}}$$
tα/2 is the critical value for the t-distribution with n – 2 degrees of freedom.
Testing the slope
$$H_0: \beta = \beta_0 \quad\text{vs}\quad H_A: \beta \ne \beta_0$$
The test statistic is:
$$t = \frac{b - \beta_0}{s/\sqrt{S_{xx}}}$$
which has a t distribution with df = n – 2 if H0 is true.
The Critical Region: reject H0 if t < −tα/2 or t > tα/2 (df = n – 2).
This is a two tailed test. One tailed tests are also possible.
The sampling distribution of the intercept of the least squares line:
$$a = \hat{\alpha} = \bar{y} - b\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$
It can be shown that a has a normal distribution with mean and standard deviation
$$\mu_a = \alpha \quad\text{and}\quad \sigma_a = \sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
Thus
$$z = \frac{a - \mu_a}{\sigma_a} = \frac{a - \alpha}{\sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}}$$
has a standard normal distribution, and
$$t = \frac{a - \mu_a}{s_a} = \frac{a - \alpha}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}}$$
has a t distribution with df = n − 2.
(1 – α)100% Confidence Limits for intercept α:
$$a \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$
tα/2 is the critical value for the t-distribution with n – 2 degrees of freedom.
Testing the intercept
$$H_0: \alpha = \alpha_0 \quad\text{vs}\quad H_A: \alpha \ne \alpha_0$$
The test statistic is:
$$t = \frac{a - \alpha_0}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}} = \frac{a - \alpha_0}{s_a}$$
which has a t distribution with df = n – 2 if H0 is true.
The Critical Region: reject H0 if t < −tα/2 or t > tα/2 (df = n – 2).
Confidence Limits for Points on the
Regression Line
• The intercept α is a specific point on the regression line.
• It is the y-coordinate of the point on the regression line when x = 0.
• It is the predicted value of y when x = 0.
• We may also be interested in other points on the regression line, e.g. when x = x0.
• In this case the y-coordinate of the point on the regression line when x = x0 is α + βx0.
[Figure: the line y = α + βx with the point (x0, α + βx0) marked]
(1- )100% Confidence Limits for  + b x0 :
1 x0  x 
a  bx0  t / 2 s

n
S xx
2
t/2 is the /2 critical value for the t-distribution with
n - 2 degrees of freedom
Prediction Limits for new values of the Dependent variable y
• An important application of the regression line is prediction.
• Knowing the value of x (x0), what is the value of y?
• The predicted value of y when x = x0 is:
$$\hat{y} = \alpha + \beta x_0$$
• This in turn can be estimated by:
$$\hat{y} = \hat{\alpha} + \hat{\beta} x_0 = a + bx_0$$
The predictor $\hat{y} = a + bx_0$:
• Gives only a single value for y.
• A more appropriate piece of information would be a range of values.
• A range of values that has a fixed probability of capturing the value for y.
• A (1 − α)100% prediction interval for y.
(1 − α)100% Prediction Limits for y when x = x0:
$$a + bx_0 \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
tα/2 is the α/2 critical value for the t-distribution with n − 2 degrees of freedom.
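A sketch of the two interval formulas, recomputing the least squares quantities so it stands alone (x0 is a hypothetical new x value):

```python
import numpy as np
from scipy.stats import t as t_dist

x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.8])
n = len(x)
Sxx = ((x - x.mean())**2).sum()
Sxy = ((x - x.mean()) * (y - y.mean())).sum()
b = Sxy / Sxx
a = y.mean() - b * x.mean()
s = np.sqrt((((y - y.mean())**2).sum() - Sxy**2 / Sxx) / (n - 2))

x0, alpha = 4.5, 0.05
t_crit = t_dist.ppf(1 - alpha / 2, df=n - 2)
y0 = a + b * x0
half_line = t_crit * s * np.sqrt(1 / n + (x0 - x.mean())**2 / Sxx)
half_pred = t_crit * s * np.sqrt(1 + 1 / n + (x0 - x.mean())**2 / Sxx)
print(y0 - half_line, y0 + half_line)   # confidence limits for alpha + beta*x0
print(y0 - half_pred, y0 + half_pred)   # prediction limits for a new y at x0
```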
Correlation
Definition
The statistic:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
is called Pearson's correlation coefficient.
Properties
1. -1 ≤ r ≤ 1, |r| ≤ 1, r2 ≤ 1
2. |r| = 1 (r = +1 or -1) if the points
(x1, y1), (x2, y2), …, (xn, yn) lie along a
straight line. (positive slope for +1,
negative slope for -1)
The test for independence (zero correlation)
H0: X and Y are independent
HA: X and Y are correlated
The test statistic:
$$t = \sqrt{n-2}\,\frac{r}{\sqrt{1 - r^2}}$$
The Critical region: reject H0 if |t| > tα/2 (df = n – 2).
This is a two-tailed critical region; the critical region could also be one-tailed.
Spearman's rank correlation coefficient ρ (rho)
Spearman's rank correlation coefficient is computed as follows:
• Arrange the observations on X in increasing order and assign them the ranks 1, 2, 3, …, n.
• Arrange the observations on Y in increasing order and assign them the ranks 1, 2, 3, …, n.
• For any case (i) let (xi, yi) denote the observations on X and Y and let (ri, si) denote the ranks on X and Y.
For each case let di = ri – si = difference in the two ranks. Then Spearman's rank correlation coefficient (ρ) is defined as follows:
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
Properties of Spearman's rank correlation coefficient ρ
1. The value of ρ is always between –1 and +1.
2. If the relationship between X and Y is positive, then ρ will be positive.
3. If the relationship between X and Y is negative, then ρ will be negative.
4. If there is no relationship between X and Y, then ρ will be zero.
5. The value of ρ will be +1 if the ranks of X completely agree with the ranks of Y.
6. The value of ρ will be -1 if the ranks of X are in reverse order to the ranks of Y.
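Both correlation coefficients, with their tests, are available in scipy. A sketch on hypothetical data:

```python
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 1.8, 3.5, 3.9, 5.2, 5.0, 7.1, 9.3]

r, p_r = pearsonr(x, y)        # Pearson r and two-tailed p-value
rho, p_rho = spearmanr(x, y)   # Spearman rho (rank-based) and p-value
print(r, p_r)
print(rho, p_rho)
```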
Relationship between Regression
and Correlation
Recall
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$$
Also
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{s_y}{s_x}\,r \qquad\text{since}\qquad s_x = \sqrt{\frac{S_{xx}}{n-1}} \text{ and } s_y = \sqrt{\frac{S_{yy}}{n-1}}$$
Thus the slope of the least squares line is simply the ratio of the standard deviations × the correlation coefficient.
The coefficient of Determination
Sums of Squares associated with Linear Regression:
$$\text{RSS} = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - (a+bx_i)\right)^2 = \text{SS}_{\text{unexplained}}$$
$$\text{SS}_{\text{Total}} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad \text{SS}_{\text{Explained}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
It can be shown:
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$\text{SS}_{\text{Total}} = \text{SS}_{\text{Explained}} + \text{SS}_{\text{Unexplained}}$$
(Total variability in Y) = (variability in Y explained by X) + (variability in Y unexplained by X)
It can also be shown:
$$r^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
= proportion of variability in Y explained by X
= the coefficient of determination.
Further:
$$1 - r^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
= proportion of variability in Y that is unexplained by X.
Regression (in general)
In many experiments we would have collected data on a
single variable Y (the dependent variable ) and on p (say)
other variables X1, X2, X3, ... , Xp (the independent variables).
One is interested in determining a model that describes the relationship between Y (the response (dependent) variable) and X1, X2, …, Xp (the predictor (independent) variables). This model can be used for
– Prediction
– Controlling Y by manipulating X1, X2, …, Xp
The Model:
is an equation of the form
$$Y = f(X_1, X_2, \ldots, X_p \mid \theta_1, \theta_2, \ldots, \theta_q) + \varepsilon$$
where θ1, θ2, … , θq are unknown parameters of the function f and ε is a random disturbance (usually assumed to have a normal distribution with mean 0 and standard deviation σ).
The Multiple Linear Regression Model
In Multiple Linear Regression we assume the following model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$
This model is called the Multiple Linear Regression Model. Here β0, β1, β2, ... , βp are unknown parameters of the model, and ε is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation σ.
Summary of the Statistics
used in
Multiple Regression
The Least Squares Estimates: b0, b1, b2, … , bp are the values that minimize
$$\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}\left(y_i - (b_0 + b_1x_{1i} + b_2x_{2i} + \cdots + b_px_{pi})\right)^2$$
The Analysis of Variance Table Entries
a) Adjusted Total Sum of Squares (SSTotal):
$$\text{SS}_{\text{Total}} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad d.f. = n - 1$$
b) Residual Sum of Squares (SSError):
$$\text{RSS} = \text{SS}_{\text{Error}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad d.f. = n - p - 1$$
c) Regression Sum of Squares (SSReg):
$$\text{SS}_{\text{Reg}} = \text{SS}(\beta_1, \beta_2, \ldots, \beta_p) = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad d.f. = p$$
Note:
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
i.e. SSTotal = SSReg + SSError.
The Analysis of Variance Table

Source     | Sum of Squares | d.f.      | Mean Square                        | F
Regression | SSReg          | p         | SSReg/p = MSReg                    | MSReg/s²
Error      | SSError        | n − p − 1 | SSError/(n − p − 1) = MSError = s² |
Total      | SSTotal        | n − 1     |                                    |
Uses:
1. To estimate σ² (the error variance): use s² = MSError to estimate σ².
2. To test the Hypothesis H0: β1 = β2 = ... = βp = 0, use the test statistic
$$F = \frac{\text{MS}_{\text{Reg}}}{\text{MS}_{\text{Error}}} = \frac{\text{MS}_{\text{Reg}}}{s^2} = \frac{\text{SS}_{\text{Reg}}/p}{\text{SS}_{\text{Error}}/(n-p-1)}$$
Reject H0 if F > Fα(p, n − p − 1).
3. To compute other statistics that are useful in describing the relationship between Y (the dependent variable) and X1, X2, ... , Xp (the independent variables).
a) R² = the coefficient of determination = SSReg/SSTotal
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
= the proportion of variance in Y explained by X1, X2, ... , Xp.
1 − R² = the proportion of variance in Y that is left unexplained by X1, X2, ... , Xp = SSError/SSTotal.
b) Ra² = "R² adjusted" for degrees of freedom
= 1 − [the proportion of variance in Y that is left unexplained by X1, X2, ... , Xp, adjusted for d.f.]
$$R_a^2 = 1 - \frac{\text{MS}_{\text{Error}}}{\text{MS}_{\text{Total}}} = 1 - \frac{\text{SS}_{\text{Error}}/(n-p-1)}{\text{SS}_{\text{Total}}/(n-1)} = 1 - \frac{(n-1)}{(n-p-1)}\frac{\text{SS}_{\text{Error}}}{\text{SS}_{\text{Total}}} = 1 - \frac{(n-1)\left(1 - R^2\right)}{n-p-1}$$
c) R = √R² = the Multiple correlation coefficient of Y with X1, X2, ... , Xp
$$R = \sqrt{\frac{\text{SS}_{\text{Reg}}}{\text{SS}_{\text{Total}}}}$$
= the maximum correlation between Y and a linear combination of X1, X2, ... , Xp.
Comment: The statistics F, R², Ra² and R are equivalent statistics.
Logistic regression
The dependent variable y is binary. It takes on two values: "Success" (1) or "Failure" (0).
We are interested in predicting y from a continuous independent variable x.
This is the situation in which Logistic Regression is used.
The logistic Regression Model
Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x.
The ratio $\frac{p}{1-p}$ is called the odds ratio. This quantity will also increase with the value of x, ranging from zero to infinity.
The quantity $\ln\left(\frac{p}{1-p}\right)$ is called the log odds ratio.
The logistic Regression Model assumes the log odds ratio is linearly related to x, i.e.:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$
In terms of the odds ratio:
$$\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$$
Solving for p in terms of x:
$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
Interpretation of the parameter β0 (determines the intercept):
$$p = \frac{e^{\beta_0}}{1 + e^{\beta_0}} \quad\text{when } x = 0$$
[Figure: logistic curve of p against x, with the intercept set by β0]
Interpretation of the parameter β1 (determines when p is 0.50, along with β0):
$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} = \frac{1}{2} \quad\text{when}\quad \beta_0 + \beta_1 x = 0, \text{ i.e. } x = -\frac{\beta_0}{\beta_1}$$
β1 also determines the slope of the curve when p is 0.50:
$$\text{slope} = \frac{\beta_1}{4}$$
[Figure: logistic curves illustrating the roles of β0 and β1]
The Multiple Logistic Regression model
Here we attempt to predict the outcome of a binary response variable Y from several independent variables X1, X2, … etc.:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
or
$$p = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}$$
Nonparametric
Statistical Methods
Definition
When the data is generated from a process (model) that is known except for a finite number of unknown parameters, the model is called a parametric model.
Otherwise, the model is called a non-parametric model.
Statistical techniques that assume a non-parametric model are called non-parametric.
Nonparametric
Statistical Methods
The sign test
A nonparametric test for the central
location of a distribution
To carry out the Sign test:
1. Compute the test statistic: S = the number of observations that exceed μ0 = sobserved.
2. Compute the p-value of the test statistic, sobserved:
p-value = P[S ≥ sobserved] (= 2 P[S ≥ sobserved] for a 2-tailed test)
where S is binomial, n = sample size, p = 0.50.
3. Reject H0 if the p-value is low (< 0.05).
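A sketch of the sign test using binomial tail probabilities from scipy (hypothetical data and μ0):

```python
from scipy.stats import binom

x = [12.1, 9.8, 13.4, 11.2, 10.7, 12.9, 11.8, 10.1, 12.3, 13.0]
mu0 = 10.0
n = len(x)
s_obs = sum(xi > mu0 for xi in x)        # observations exceeding mu0

p_one = binom.sf(s_obs - 1, n, 0.5)      # P[S >= s_obs] under H0
p_two = min(1.0, 2 * p_one)              # two-tailed version
print(s_obs, p_one, p_two)               # reject H0 if p-value < 0.05
```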
Sign Test for Large Samples
If n is large we can use the Normal approximation to the Binomial. Namely, S has a Binomial distribution with p = ½ and n = sample size. Hence for large n, S has approximately a Normal distribution with
$$\mu_S = np = \frac{n}{2} \quad\text{and}\quad \sigma_S = \sqrt{npq} = \sqrt{n\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)} = \frac{\sqrt{n}}{2}$$
Hence for large n, use as the test statistic (in place of S):
$$z = \frac{S - \mu_S}{\sigma_S} = \frac{S - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$$
Choose the critical region for z from the Standard Normal distribution, i.e. reject H0 if z < −zα/2 or z > zα/2 (two tailed; a one tailed test can also be set up).
Nonparametric
Confidence Intervals
Assume that the data, x1, x2, x3, … xn is a sample
from an unknown distribution.
Now arrange the data x1, x2, x3, … xn in increasing
order
x(1) < x(2) < x(3) < … < x(n)
Hence
x(1) = the smallest observation
x(2) = the 2nd smallest observation
x(n) = the largest observation
Consider the kth smallest observation and the kth
largest observation in the data x1, x2, x3, … xn
x(k) and x(n – k + 1)
Hence
P[x(k) < median < x(n – k + 1) ]
= P[k ≤ the no. of obs greater than the median ≤ n-k]
= p(k) + p(k + 1) + … + p(n-k) = P
where p(i)’s are binomial probabilities with
n = the sample size and p =1/2.
This means that x(k) to x(n – k + 1) is a P(100)%
confidence interval for the median
Choose k so that P = p(k) + p(k + 1) + … + p(n-k) is
close to .95 (or 0.99)
Summarizing
x(k) to x(n – k + 1)
is a P(100)% confidence interval for the
median
where P = p(k) + p(k + 1) + … + p(n-k)
and p(i)’s are binomial probabilities with
n = the sample size and p =1/2.
For large values of n one can use the normal approximation to the Binomial to find the value of k so that x(k) to x(n – k + 1) is a 95% confidence interval for the median:
$$k = \frac{n - 1.96\sqrt{n}}{2}$$
Using this formula:

n | 20  | 40   | 100  | 200
k | 5.6 | 13.8 | 40.2 | 86.1
The Wilcoxon Signed Rank Test
An Alternative to the sign test
• For Wilcoxon's signed rank test we would assign ranks to the absolute values of (x1 – μ0, x2 – μ0, … , xn – μ0):
• A rank of 1 to the value of xi – μ0 which is smallest in absolute value.
• A rank of n to the value of xi – μ0 which is largest in absolute value.
W+ = the sum of the ranks associated with positive values of xi – μ0.
W− = the sum of the ranks associated with negative values of xi – μ0.
To carry out Wilcoxon's signed rank test we:
1. Compute T = W+ or W− (usually it would be the smaller of the two).
2. Let tobserved = the observed value of T.
3. Compute the p-value = P[T ≤ tobserved] (2 P[T ≤ tobserved] for a two-tailed test):
   i. For n ≤ 12 use the table.
   ii. For n > 12 use the Normal approximation.
4. Conclude HA (Reject H0) if the p-value is less than 0.05 (or 0.01).
For sample sizes n > 12 we can use the fact that T (W+ or W−) has approximately a normal distribution with
$$\mu_T = \frac{n(n+1)}{4} \quad\text{and}\quad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
and
$$P[T \le t] = P\left[\frac{T - \mu_T}{\sigma_T} \le \frac{t - \mu_T}{\sigma_T}\right] \approx P\left[Z \le \frac{t - \mu_T}{\sigma_T}\right]$$
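scipy implements the signed rank test directly; a sketch (hypothetical data, testing μ0 = 10):

```python
from scipy.stats import wilcoxon

x = [12.1, 9.8, 13.4, 11.2, 10.7, 12.9, 11.8, 10.1, 12.3, 13.0]
mu0 = 10.0
d = [xi - mu0 for xi in x]      # test on the differences from mu0

stat, p_value = wilcoxon(d)     # stat = smaller of W+ and W-
print(stat, p_value)
```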
Comments
1. The t-test:
   i. This test requires the assumption of normality.
   ii. If the data is not normally distributed the test is invalid: the probability of a type I error may not be equal to its desired value (0.05 or 0.01).
   iii. If the data is normally distributed, the t-test commits type II errors with a smaller probability than any other test (in particular Wilcoxon's signed rank test or the sign test).
2. The sign test:
   i. This test does not require the assumption of normality (true also for Wilcoxon's signed rank test).
   ii. This test ignores the magnitude of the observations completely. Wilcoxon's test takes the magnitude into account by ranking them.
Two-sample Non-parametric tests
Mann-Whitney Test
A non-parametric two sample test for comparison of central location
The Mann-Whitney Test
• This is a non parametric alternative to the two
sample t test (or z test) for independent samples.
• These tests (t and z) assume the data is normal
• The Mann- Whitney test does not make this
assumption.
• Sample of n from population 1
x1, x2, x3, … , xn
• Sample of m from population 2
y1, y2, y3, … , ym
The Mann-Whitney test statistics U1 and U2:
Arrange the observations from the two samples combined in increasing order (retaining sample membership) and assign ranks to the observations.
Let W1 = the sum of the ranks for sample 1.
Let W2 = the sum of the ranks for sample 2.
Then
$$U_1 = nm + \frac{n(n+1)}{2} - W_1 \quad\text{and}\quad U_2 = nm + \frac{m(m+1)}{2} - W_2$$
• The distribution function of U (U1 or U2) has
been tabled for various values of n and m (<n)
when the two observations are coming from
the same distribution.
• These tables can be used to set up critical
regions for the Mann-Whitney U test.
The Mann-Whitney test for large samples
For large samples (n > 10 and m > 10) the statistics U1 and U2 have approximately a Normal distribution with mean and standard deviation
$$\mu_{U_i} = \frac{nm}{2} \quad\text{and}\quad \sigma_{U_i} = \sqrt{\frac{nm(n+m+1)}{12}}$$
Thus we can convert Ui to a standard normal statistic:
$$z = \frac{U_i - \mu_{U_i}}{\sigma_{U_i}} = \frac{U_i - \frac{nm}{2}}{\sqrt{\frac{nm(n+m+1)}{12}}}$$
And reject H0 if z < −zα/2 or z > zα/2 (for a two tailed test).
The Kruskal Wallis Test
• Comparing the central location for k populations
• A nonparametric alternative to the one-way ANOVA F-test
Situation:
Data is collected from k populations. The sample size from population i is ni. The data from population i is:
$$x_{i1}, x_{i2}, \ldots, x_{in_i} \qquad i = 1, 2, \ldots, k$$
The computation of the Kruskal-Wallis statistic:
We group the N = n1 + n2 + … + nk observations from the k populations together and rank these observations from 1 to N. Let rij be the rank associated with the observation xij.
Handling of "tied" observations: if a group of observations are equal, the ranks that would have been assigned to those observations are averaged.
The Kruskal-Wallis statistic:
$$K = \frac{12}{N(N+1)}\sum_{i=1}^{k} \frac{U_i^2}{n_i} - 3(N+1)$$
where
$$U_i = \sum_{j=1}^{n_i} r_{ij} = r_{i1} + \cdots + r_{in_i} = \text{the sum of the ranks for the } i\text{th sample}$$
The Kruskal-Wallis test: reject
H0: the k populations have the same central location
if $K > \chi^2_{\alpha}$ with d.f. = k − 1.
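Both rank tests of this section are in scipy; a sketch on hypothetical samples:

```python
from scipy.stats import mannwhitneyu, kruskal

x = [14, 17, 21, 9, 12, 16, 20, 11, 15, 18, 22]
y = [10, 8, 13, 7, 12, 9, 11, 6, 14, 10, 8]
z = [19, 25, 17, 23, 21, 18, 24, 20, 22, 26, 19]

print(mannwhitneyu(x, y))      # two-sample Mann-Whitney U test
print(kruskal(x, y, z))        # k-sample Kruskal-Wallis test (chi-square approx.)
```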
Probability Theory
Probability – Models for random
phenomena
Definitions
The sample Space, S
The sample space, S, for a random phenomenon is the set of all possible outcomes.
An Event, E
The event, E, is any subset of the sample space, S, i.e. any set of outcomes (not necessarily all outcomes) of the random phenomenon.
[Venn diagram: the event E drawn inside the sample space S]
The event, E, is said to have occurred if, after the outcome has been observed, the outcome lies in E.
Set operations on Events
Union
Let A and B be two events; then the union of A and B is the event (denoted by A ∪ B) defined by:
A ∪ B = {e | e belongs to A or e belongs to B}
The event A ∪ B occurs if the event A occurs or the event B occurs.
Intersection
Let A and B be two events; then the intersection of A and B is the event (denoted by A ∩ B) defined by:
A ∩ B = {e | e belongs to A and e belongs to B}
The event A ∩ B occurs if the event A occurs and the event B occurs.
Complement
Let A be any event; then the complement of A (denoted by Ā) is defined by:
Ā = {e | e does not belong to A}
The event Ā occurs if the event A does not occur.
In problems you will recognize that you are
working with:
1. Union if you see the word or,
2. Intersection if you see the word and,
3. Complement if you see the word not.
Definition: mutually exclusive
Two events A and B are called mutually exclusive if:
$$A \cap B = \emptyset$$
If two events A and B are mutually exclusive then:
1. They have no outcomes in common. They can't occur at the same time: the outcome of the random experiment cannot belong to both A and B.
Rules of Probability
The additive rule:
$$P[A \cup B] = P[A] + P[B] - P[A \cap B]$$
and
$$P[A \cup B] = P[A] + P[B] \quad\text{if } A \cap B = \emptyset$$
The Rule for complements: for any event E,
$$P[\bar{E}] = 1 - P[E]$$
Conditional probability:
$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$$
The multiplicative rule of probability:
$$P[A \cap B] = \begin{cases} P[A]\,P[B \mid A] & \text{if } P[A] \ne 0 \\ P[B]\,P[A \mid B] & \text{if } P[B] \ne 0 \end{cases}$$
and
$$P[A \cap B] = P[A]\,P[B]$$
if A and B are independent. This is the definition of independence.
Counting techniques
Summary of counting rules
Rule 1
n(A1 ∪ A2 ∪ A3 ∪ …) = n(A1) + n(A2) + n(A3) + …
if the sets A1, A2, A3, … are pairwise mutually exclusive
(i.e. Ai ∩ Aj = ∅)
Rule 2
N = n1 n2 = the number of ways that two operations can be
performed in sequence if
n1 = the number of ways the first operation can be
performed
n2 = the number of ways the second operation can be
performed once the first operation has been
completed.
Rule 3
N = n1n2 … nk
= the number of ways the k operations can be
performed in sequence if
n1 = the number of ways the first operation can be
performed
ni = the number of ways the ith operation can be
performed once the first (i - 1) operations have
been completed. i = 2, 3, … , k
Basic counting formulae
1. Orderings
n! = the number of ways you can order n objects
2. Permutations
$${}_{n}P_k = \frac{n!}{(n-k)!} = \text{the number of ways that you can choose } k \text{ objects from } n \text{ in a specific order}$$
3. Combinations
$$\binom{n}{k} = {}_{n}C_k = \frac{n!}{k!\,(n-k)!} = \text{the number of ways that you can choose } k \text{ objects from } n \text{ (order of selection irrelevant)}$$
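Python's math module covers all three formulas directly (math.perm and math.comb require Python 3.8+):

```python
import math

n, k = 10, 3
print(math.factorial(n))   # orderings: n! = 3628800
print(math.perm(n, k))     # permutations: n!/(n-k)! = 720
print(math.comb(n, k))     # combinations: n!/(k!(n-k)!) = 120
```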
Random Variables
Numerical Quantities whose values
are determine by the outcome of a
random experiment
Random variables are either
• Discrete
– Integer valued
– The set of possible values for X are integers
• Continuous
– The set of possible values for X are all real
numbers
– Range over a continuum.
The Probability distribution of a random variable
A mathematical description of the possible values of the random variable together with the probabilities of those values.
The probability distribution of a discrete random variable is described by its probability function p(x):
p(x) = the probability that X takes on the value x.
This can be given in either a tabular form or in the form of an equation. It can also be displayed in a graph.
Comments:
Every probability function must satisfy:
1. The probability assigned to each value of the random variable must be between 0 and 1, inclusive:
$$0 \le p(x) \le 1$$
2. The sum of the probabilities assigned to all the values of the random variable must equal 1:
$$\sum_{x} p(x) = 1$$
3. $$P[a \le X \le b] = \sum_{x=a}^{b} p(x) = p(a) + p(a+1) + \cdots + p(b)$$
Probability Distributions of Continuous Random Variables
Probability Density Function
The probability distribution of a continuous random variable is described by its probability density curve f(x).
Notes:
• The total area under the probability density curve is 1.
• The area under the probability density curve from a to b is P[a < X < b].
[Figure: Normal (bell shaped) density curve centred at μ, with the area P(a ≤ x ≤ b) shaded between a and b]
Mean, Variance and standard
deviation of Random Variables
Numerical descriptors of the
distribution of a Random Variable
Mean of a Discrete Random Variable
• The mean, μ, of a discrete random variable x is found by multiplying each possible value of x by its own probability and then adding all the products together:
$$\mu = \sum_{x} x\,p(x) = x_1 p(x_1) + x_2 p(x_2) + \cdots + x_k p(x_k)$$
Notes:
• The mean is a weighted average of the values of X.
• The mean is the long-run average value of the random variable.
• The mean is the centre of gravity of the probability distribution of the random variable.
Variance of a Discrete Random Variable: The variance, σ², of a discrete random variable x is found by multiplying each possible value of the squared deviation from the mean, (x − μ)², by its own probability and then adding all the products together:
$$\sigma^2 = \sum_{x} (x - \mu)^2 p(x) = \sum_{x} x^2 p(x) - \left[\sum_{x} x\,p(x)\right]^2 = \sum_{x} x^2 p(x) - \mu^2$$
Standard Deviation of a Discrete Random Variable: The positive square root of the variance:
$$\sigma = \sqrt{\sigma^2}$$
The Binomial distribution
An important discrete distribution.
X is said to have the Binomial distribution with parameters n and p if:
1. X is the number of successes occurring in the n repetitions of a Success-Failure Experiment.
2. The probability of success is p.
3. The probability function is
$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}$$
Mean, Variance & Standard Deviation of the Binomial Distribution
• The mean, variance and standard deviation of the binomial distribution can be found by using the following three formulas:
1. $\mu = np$
2. $\sigma^2 = npq = np(1-p)$
3. $\sigma = \sqrt{npq} = \sqrt{np(1-p)}$
Mean of a Continuous Random Variable (uses calculus)
• The mean, μ, of a continuous random variable x is
$$\mu = \int_{-\infty}^{\infty} x f(x)\, dx$$
Notes:
• The mean is a weighted average of the values of X.
• The mean is the long-run average value of the random variable.
• The mean is the centre of gravity of the probability distribution of the random variable.
Variance of a Continuous Random Variable:
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$$
Standard Deviation of a Continuous Random Variable: the positive square root of the variance:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx}$$
The Normal Probability Distribution
[Figure: Normal density with points of inflection at μ − σ and μ + σ; ticks at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]
Main characteristics of the Normal Distribution
• Bell shaped, symmetric.
• Points of inflection on the bell shaped curve are at μ – σ and μ + σ. That is, one standard deviation from the mean.
• Area under the bell shaped curve between μ – σ and μ + σ is approximately 2/3.
• Area under the bell shaped curve between μ – 2σ and μ + 2σ is approximately 95%.
Normal approximation to the
Binomial distribution
Using the Normal distribution to
calculate Binomial probabilities
Normal Approximation to the Binomial distribution:
$$P[a \le X \le b] = p(a) + p(a+1) + \cdots + p(b) \approx P\left[a - \tfrac{1}{2} \le Y \le b + \tfrac{1}{2}\right]$$
• X has a Binomial distribution with parameters n and p.
• Y has a Normal distribution with
$$\mu = np, \qquad \sigma = \sqrt{npq}$$
• The ±½ is the continuity correction.
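A sketch comparing the exact binomial probability with the continuity-corrected normal approximation (hypothetical n, p, a, b):

```python
from scipy.stats import binom, norm

n, p = 50, 0.4
a, b = 15, 25
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

exact = binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)          # P[a <= X <= b]
approx = norm.cdf(b + 0.5, mu, sigma) - norm.cdf(a - 0.5, mu, sigma)
print(exact, approx)   # the continuity correction makes these close
```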
Sampling Theory
Determining the distribution of
Sample statistics
The distribution of the sample
mean
Thus if x1, x2, … , xn denote n independent random variables each coming from the same Normal distribution with mean μ and standard deviation σ, then
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
has a Normal distribution with
mean $\mu_{\bar{x}} = \mu$,
variance $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$, and
standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
The Central Limit Theorem
The Central Limit Theorem (C.L.T.) states that if n is sufficiently large, the sample means of random samples from any population with mean μ and finite standard deviation σ are approximately normally distributed with mean μ and standard deviation σ/√n.
Technical Note: the mean and standard deviation given in the CLT hold for any sample size; it is only the "approximately normal" shape that requires n to be sufficiently large.
Graphical Illustration of the Central Limit Theorem
[Figure: four panels: the original population and the distribution of x̄ for n = 2, n = 10, and n = 30; the sampling distribution becomes more nearly normal and more concentrated as n grows]
Implications of the Central Limit Theorem
• The conclusion that the sampling distribution of the sample mean is Normal will be true if the sample size is large (>30), even though the population may be non-normal.
• When the population can be assumed to be normal, the sampling distribution of the sample mean is Normal for any sample size.
• Knowing the sampling distribution of the sample mean allows us to answer probability questions related to the sample mean.
Sampling Distribution of a Sample Proportion
Let p = population proportion of interest, or binomial probability of success. Let
$$\hat{p} = \frac{X}{n} = \frac{\text{no. of successes}}{\text{no. of binomial trials}}$$
= sample proportion or proportion of successes.
Then the sampling distribution of $\hat{p}$ is approximately a normal distribution with
$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
Sampling distribution of differences
Note: if X and Y are independent normal random variables, then X – Y is normal with
mean $\mu_X - \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$.
Sampling distribution of a difference in two sample means
Situation
• We have two normal populations (1 and 2).
• Let μ1 and σ1 denote the mean and standard deviation of population 1.
• Let μ2 and σ2 denote the mean and standard deviation of population 2.
• Let x1, x2, x3, … , xn denote a sample from normal population 1.
• Let y1, y2, y3, … , ym denote a sample from normal population 2.
• Objective: to compare the two population means.
Then $D = \bar{x} - \bar{y}$ is Normal with mean
$$\mu_{\bar{x}-\bar{y}} = \mu_{\bar{x}} - \mu_{\bar{y}} = \mu_1 - \mu_2$$
and
$$\sigma_{\bar{x}-\bar{y}} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}$$
Sampling distribution of a difference in two sample proportions
Situation
• Suppose we have two Success-Failure experiments.
• Let p1 = the probability of success for experiment 1.
• Let p2 = the probability of success for experiment 2.
• Suppose that experiment 1 is repeated n1 times and experiment 2 is repeated n2 times.
• Let x1 = the no. of successes in the n1 repetitions of experiment 1, and x2 = the no. of successes in the n2 repetitions of experiment 2.
$$\hat{p}_1 = \frac{x_1}{n_1} \quad\text{and}\quad \hat{p}_2 = \frac{x_2}{n_2}$$
Then $D = \hat{p}_1 - \hat{p}_2$ is Normal with mean
$$\mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$$
and
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
The Chi-square (χ²) distribution
The Chi-squared distribution with ν degrees of freedom
Comment: if z1, z2, ..., zν are independent random variables each having a standard normal distribution, then
$$U = z_1^2 + z_2^2 + \cdots + z_\nu^2$$
has a chi-squared distribution with ν degrees of freedom.
[Figure: χ² densities for 2, 3 and 4 degrees of freedom]
Statistics that have the Chi-squared distribution:
1. $$\chi^2 = \sum_{j=1}^{c}\sum_{i=1}^{r} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}$$
This statistic is used to detect independence between two categorical variables; d.f. = (r – 1)(c – 1).
2. Let x1, x2, … , xn denote a sample from the normal distribution with mean μ and standard deviation σ; then
$$U = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2}$$
has a chi-square distribution with d.f. = n – 1.