Stat200: pre 10- More about CI


Presentation 10
More About Confidence Intervals
Types of CI’s in Chapter 12
A. 1 mean
B. Difference Between 2 Independent Means
C. Difference Between 2 Paired Means
D. Difference Between 2 Proportions
In cases B, C and D we are interested in comparing 2 populations with
regard to a parameter. There are two possible ways to get the samples
from the two populations:
1. Independent samples – The data from one sample do not tell us
   anything about the data in the other sample (cases B and D)
2. Paired Data – A natural pairing exists between the two samples,
   e.g. “before and after” studies, studies on twins, etc. (case C)
Basic formula for CI remains the same!
Estimate ± Multiplier x Standard Error of the Estimate
Recognize the Situation





The biggest challenge that most of you face at this point is reading a
problem and deciding which kind of confidence interval is required.
So, I will make it very clear how to do so, and then we will get some
practice.
First, you need to identify the response variable and then determine
what type of variable (categorical or quantitative) it is.
If it is categorical, we are dealing with proportions. From there, you
should be able to determine whether we are looking at just one
proportion or the difference between two proportions.
If the variable of interest is quantitative, we are dealing with means.
If it is just one mean, you are all set. If we are looking at the
difference between two means, you need to determine if they are
paired or independent.
Recognizing what you need to do is half the battle. Once you have
accomplished that, it is just a matter of putting the right pieces
together. Every confidence interval requires a sample estimate, a
multiplier, and a standard error, and you should have the right
formulas written down for each type of CI. Once you have made the
correct diagnosis, just plug and have fun with the calculations!
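
The decision steps above can be sketched as a small Python function. This is only an illustration of the reasoning; the function name `ci_type` and its three yes/no arguments are hypothetical and not part of the course material.

```python
def ci_type(response_is_quantitative: bool, two_samples: bool, paired: bool = False) -> str:
    """Name the confidence interval that fits the situation (illustrative helper)."""
    if not response_is_quantitative:
        # Categorical response variable: we are dealing with proportions.
        return "difference between 2 proportions" if two_samples else "1 proportion"
    # Quantitative response variable: we are dealing with means.
    if not two_samples:
        return "1 mean"
    if paired:
        # Paired data reduce to one sample of differences.
        return "1 mean of the paired differences"
    return "difference between 2 independent means"


# Blue eyes (categorical) compared between men and women.
print(ci_type(response_is_quantitative=False, two_samples=True))
# Cholesterol (quantitative) before and after a drug on the same patients.
print(ci_type(response_is_quantitative=True, two_samples=True, paired=True))
```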

Example 1: John records the number of blue eyed individuals from a
sample of 60 men and 60 women. Construct an appropriate confidence
interval for the difference between men and women with respect to blue
eyes.
Data Table based on Each Observation
Subject   Gender   Blue Eyes?
1         Male     N
2         Female   Y
3         Female   N
4         Male     N
Etc.      ….       ….
Think about what variables are recorded for each subject. In this case we
have gender and eye color for each subject, both of which are categorical
variables. When we want to compare the categorical response variable
(Blue Eyes) over 2-levels of the categorical predictor variable (Gender), we
want a confidence interval for 2 Proportions. Note: Means would make NO
SENSE here. You can’t have the mean of a categorical variable!
Construct a 95% CI for the difference in the proportion of men and
women who have blue eyes.
Example 2: John recorded the heights of 50 randomly
chosen redwood trees in State College. He is interested in estimating
the average height of redwood trees in State College. It is easy to
see that the data would consist of a single quantitative variable
(height) measured for each tree. An appropriate CI might be a 95%
CI for the mean height of redwood trees. That is a CI for 1 Mean.
Tree   Height (ft)
1      190
2      230
3      175
4      245
Etc.   …
Note: If height had been replaced by a
categorical variable (e.g. Tree greater
than 200 ft - Yes/No) then a confidence
interval for 1 Proportion would have
been appropriate.
Examples: Independent vs. Paired Data
Independent Data: Occurs when the observations are not related in
any way. For example, take a random sample of 50 males and 50
females and record their SAT scores. The scores from the first
female and the first male are NOT related. The observations are
independent.
Paired Data: Occurs when the observations are paired. For example,
suppose we select 50 random subjects to participate in a diet study and
record their weights before and after. The weight before is paired
with the weight after for each individual. Paired data occur when
either there are repeated measurements on the same unit (e.g.
before and after some treatment) or the units themselves are
naturally paired (e.g. twins, husband and wife, etc.).
Structure of Paired and
Independent Data

Independent Data: A random sample of 400 apples is taken off the
shelf at a grocery store. The apples are classified as yellow or red, and
the amount of vitamin C in each apple is recorded.
Apple   Color    Vitamin C (mg)
1       Red      125
2       Red      110
3       Yellow   235
4       Red      104
Etc.    ….       ….
What type of CI makes sense here?
A CI for the difference in the mean amount of vitamin C between
yellow and red apples. That is a CI for 2 Means.
Structure of Paired and
Independent Data

Paired Data: A random sample of 200 patients is administered a new
cholesterol drug. Each patient's cholesterol is recorded before and after taking
the drug.
Patient   Cholesterol Before   Cholesterol After   Decrease in Cholesterol
1         235                  215                 20
2         310                  254                 56
3         198                  178                 20
4         245                  231                 14
Etc.      ….                   ….                  ….
What type of CI makes sense here?
A CI for the mean decrease in cholesterol. That is a CI for 1 Mean
based on the pair-wise differences (decrease in cholesterol).
Practice…

Twenty-five people have their blood pressure measured in the
morning and again in the afternoon. The data will be used to
determine whether blood pressure increases during the day.
Independent / Paired

What is the difference in average ages at which teachers and
plumbers retire?
Independent / Paired

What is the difference in average salaries for high school
graduates and college graduates?
Independent / Paired

A sample of 100 students at a university was asked how many
hours a week they spent studying and how many they spent
socializing. The difference was computed for each student.
Independent / Paired

Students are asked their actual weight and their ideal weight in
order to determine how far they are from their "goal".
Independent / Paired
General Format of a CI

In Chapter 10 we saw how to create a confidence interval for a
proportion. Recall that a β% C.I. for some population proportion p is
$\hat{p} \pm z^* \times \mathrm{s.e.}(\hat{p})$
where $\hat{p}$ is the sample proportion (the statistic), and the z* multiplier
depends on the desired confidence level, β%, and is obtained from
the standard normal tables. More specifically, z* is such that
P(-z* < Z < z*) = β%.
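
As a cross-check on the table lookup, z* can also be computed directly. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_multiplier` is hypothetical, and the confidence level is passed as a fraction (0.95 rather than 95%).

```python
from statistics import NormalDist


def z_multiplier(confidence: float) -> float:
    """Return z* such that P(-z* < Z < z*) = confidence for a standard normal Z."""
    # The leftover probability (1 - confidence) is split equally between
    # the two tails, so z* is the quantile at (1 + confidence) / 2.
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_multiplier(level):.3f}")
```

Rounded to two decimals, the 95% and 99% values agree with the familiar table values 1.96 and 2.58.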


In general, the format of a CI for a parameter is
Sample Estimate ± Multiplier x Standard Error of the Sample Estimate
In the following, we will see, for each of the situations, what the
appropriate sample statistic is, what its standard error is, and how to
obtain the multiplier.
CI for One Mean





Here is the case where we want to make inference about the population
mean of a quantitative random variable.
The sample statistic used in this case is the sample mean $\bar{x}$.
The standard error of the sample mean is $\mathrm{s.e.}(\bar{x}) = s/\sqrt{n}$,
where s is the sample standard deviation, and n is the sample size.
It remains to specify the “Multiplier” in the general form of a CI. To do so
we need to introduce some further distribution theory.
In Chapter 9 we saw that if we have a sample from a population
with some mean µ and some standard deviation σ, then under some
conditions the sample mean $\bar{X}$ is normal with mean µ and std deviation σ/√n.
Equivalently,
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is normal with mean 0 and std dev 1.

If σ were known, based on this result we would be able to create a CI for
µ. However, usually this is not the case.
CI for One Mean
Replacing σ with s, we have that
$\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ has a t-distribution with (n − 1) degrees of freedom
if one of the following conditions is true:
1. the random variable of interest is bell-shaped (in practice, for small
   samples the data should show no extreme skewness or outliers).
2. the random variable is not bell-shaped, but a large random sample is
   measured, n ≥ 30.
Some Properties of the t-distribution:
1. There are infinitely many t-distributions, each characterized by one
   parameter, the degrees of freedom (df).
2. The degrees of freedom are positive integers, e.g. 1,2,…
3. Random variables with t-distribution are continuous.
4. The density curve of a t-distribution is symmetric, bell-shaped and
   centered at zero (similar to the standard normal curve).
5. As the degrees of freedom increase, the variance of the t- random
   variable decreases, i.e. the density curve is less spread, and actually it
   approaches the standard normal density. (That implies that the density
   curve of a t-distribution is more spread out than the standard normal
   curve.)
CI for One Mean



Based on these results we have that the multiplier for the confidence
interval of µ is the value in the t-distribution with df=n-1, such that
the area between the (multiplier) and the -(multiplier) is equal to the
desired confidence level.
The multiplier in this case is denoted with t*.
We can easily obtain the values of the multiplier from Table A2. Here
are some examples for the values of the multiplier:
1. n= 41 (i.e. df=40), confidence level 95%, t*=2.02.
2. n= 10 (i.e. df=9), confidence level 99%, t*=3.25.
Summary – Steps to obtain CI for µ:
1. Check if the condition is satisfied, i.e. bell-shaped population or n≥30.
2. Calculate $\bar{x}$ and $\mathrm{s.e.}(\bar{x}) = s/\sqrt{n}$.
3. Based on the required confidence level, β%, and the degrees of
   freedom (n-1), use Table A2 to get the multiplier t*.
4. The β% CI for µ is
   $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$.
Special case of CI for 1 Mean: CI for Paired Data

Consider the example where we are interested in the difference in the
mean blood pressure before exercise and after exercise.

We are interested in estimating µ1 - µ2 for
µ1: mean blood pressure before exercise
µ2: mean blood pressure after exercise.

For each person we have two measurements resulting in two samples,
the "before" sample (the values of blood pressure before exercise) and
the "after" sample (the values of blood pressure after exercise).
However, we are just interested in the difference between the "before"
measurement and the "after" measurement. So, for each pair of values
we compute their difference, resulting in one sample of the differences.
Then, using the sample of the differences we can create a C.I. for the
population mean of the differences using the same procedure as the CI
for one mean!
Let µd = the population mean of the differences, and $\bar{d}$ = the sample
mean of the differences; then $\mu_d = \mu_1 - \mu_2$ and $\bar{d} = \bar{x}_1 - \bar{x}_2$.
The CI for µd is

$\bar{d} \pm t^* \dfrac{s_d}{\sqrt{n}}$

where sd is the sample standard deviation of the differences.
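
To illustrate the mechanics, here is a minimal Python sketch that applies this formula to the four rows shown in the cholesterol table earlier. The function name `paired_mean_ci` is hypothetical. With only four of the 200 patients, the interval is purely illustrative and would only be valid if the differences were bell-shaped. The multiplier t* = 3.18 is the 95% table value for df = 3.

```python
from math import sqrt
from statistics import mean, stdev


def paired_mean_ci(before, after, t_star):
    """CI for the population mean of the pairwise differences (before - after)."""
    diffs = [b - a for b, a in zip(before, after)]
    d_bar = mean(diffs)                    # sample mean of the differences
    se = stdev(diffs) / sqrt(len(diffs))   # s_d / sqrt(n)
    return d_bar - t_star * se, d_bar + t_star * se


# The four patients listed in the cholesterol table (illustration only).
before = [235, 310, 198, 245]
after = [215, 254, 178, 231]

# t* looked up from a t table: df = n - 1 = 3, confidence level 95%.
low, high = paired_mean_ci(before, after, t_star=3.18)
print(f"95% CI for the mean decrease: ({low:.1f}, {high:.1f})")
```

Note that the two original columns are never analyzed separately; once the differences are computed, the problem is an ordinary CI for one mean.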
Difference between two means
(Independent Samples).
Steps to obtain CI of µ1 - µ2 (difference between 2 pop. means):
1. Check if the following conditions are valid:
   1. The two samples are independent.
   2. Each sample either comes from a bell-shaped population or has a
      sample size ≥30.
2. Calculate the sample statistic $\bar{x}_1 - \bar{x}_2$ and the standard error
   $\mathrm{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$,
   where n1, n2 are the sizes of the two samples and $s_1^2$, $s_2^2$ are the
   variances of the two samples.
3. The multiplier for the confidence interval is a t-multiplier (t*) and the df are
   approximately equal to the lesser of n1-1 and n2-1.
4. The β% CI for µ1 - µ2 is
   $(\bar{x}_1 - \bar{x}_2) \pm t^* \times \mathrm{s.e.}(\bar{x}_1 - \bar{x}_2)$
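
The steps above can be sketched in Python as follows. The function names and the summary statistics in the usage lines are hypothetical, made up only to show the arithmetic. The multiplier is passed in after being looked up in the t table.

```python
from math import sqrt


def conservative_df(n1, n2):
    """Approximate df: the lesser of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def two_means_ci(x1_bar, s1, n1, x2_bar, s2, n2, t_star):
    """CI for mu1 - mu2 from two independent samples."""
    estimate = x1_bar - x2_bar
    se = sqrt(s1**2 / n1 + s2**2 / n2)   # sqrt(s1^2/n1 + s2^2/n2)
    return estimate - t_star * se, estimate + t_star * se


# Hypothetical summary statistics for two independent samples of 50 each.
df = conservative_df(50, 50)
print("df =", df)  # 49; use the next lowest df listed in the table

# With df = 40 as the next lowest table entry, t* = 2.02 at 95% confidence.
low, high = two_means_ci(520, 100, 50, 500, 90, 50, t_star=2.02)
print(f"95% CI for mu1 - mu2: ({low:.1f}, {high:.1f})")
```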
Difference between two proportions
(Independent Samples).
Steps to obtain CI of p1 – p2 (difference between 2 pop. prop.):
1. Check if the following conditions are valid:
   1. The two samples are independent.
   2. All the quantities $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$ and $n_2(1-\hat{p}_2)$
      are at least 5 and preferably at least 10.
2. Calculate the sample statistic $\hat{p}_1 - \hat{p}_2$ and the standard error
   $\mathrm{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
   where n1, n2 are the sizes of the two samples and $\hat{p}_1$, $\hat{p}_2$ are the
   sample proportions in the two samples.
3. The multiplier for the confidence interval is a z-multiplier (z*) like in
   the one sample case, i.e. P(-z* < Z < z*) = β%.
4. The β% CI for p1 – p2 is
   $(\hat{p}_1 - \hat{p}_2) \pm z^* \times \mathrm{s.e.}(\hat{p}_1 - \hat{p}_2)$
Table of CI Types
Type                             Parameter   Statistic                      Standard Error                                                                 Multiplier
One Mean (or Paired Mean)        µ or µd     $\bar{x}$ or $\bar{d}$         $s/\sqrt{n}$ or $s_d/\sqrt{n}$                                                 t*, df=n-1
Difference Between Means         µ1-µ2       $\bar{x}_1 - \bar{x}_2$        $\sqrt{s_1^2/n_1 + s_2^2/n_2}$                                                 t*, df=min(n1-1,n2-1)
One Proportion                   p           $\hat{p}$                      $\sqrt{\hat{p}(1-\hat{p})/n}$                                                  z*
Difference Between Proportions   p1-p2       $\hat{p}_1 - \hat{p}_2$        $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$               z*
Conditions Necessary for
Confidence Intervals



1 mean or Difference Between Paired Means
Population is normal (bell-shaped) or n≥30.
Difference Between 2 Independent Means
At least one of the above conditions must hold
for BOTH samples. The two samples are
independent.
Difference Between 2 Proportions
Both $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$ AND $n_2\hat{p}_2$, $n_2(1-\hat{p}_2)$
must be greater than or equal to 10.
Example 1
Veronica records the weights of 64 adult black bears trapped in New York
in the fall of 2002. The sample mean weight was 210 lbs with a
standard deviation of 25 lbs. Construct a 95% confidence interval for the
mean weight of adult black bears.

The parameter of interest is μ, the population mean weight of adult
black bears.
Conditions: The sample size is greater than 30, n=64 > 30.
$\bar{x} = 210$ lbs, and $\mathrm{s.e.}(\bar{x}) = s/\sqrt{n} = 25/8 = 3.125$.
The multiplier is a t*. Use Table A.2 in your text. The df = n-1 = 63 and
the CI level=95%.
Note: If the table does not list the specific df, use the next LOWEST
df in the table. So for df=60, we get t*=2.
95% CI for μ: 210 ± 2(3.125) = (203.8, 216.3)
Interpretation: We are 95% confident that the mean weight of
adult black bears is between 203.8 and 216.3 lbs.
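
The arithmetic of this example can be checked with a short Python sketch. The function name `one_mean_ci` is hypothetical. The t* value is passed in from the table rather than computed.

```python
from math import sqrt


def one_mean_ci(x_bar, s, n, t_star):
    """CI for a population mean: x_bar +/- t* * s / sqrt(n)."""
    se = s / sqrt(n)   # standard error of the sample mean
    return x_bar - t_star * se, x_bar + t_star * se


# Black bear example: n = 64, sample mean 210 lbs, s = 25 lbs, t* = 2 (df = 60 row).
low, high = one_mean_ci(x_bar=210, s=25, n=64, t_star=2.0)
print(low, high)  # 203.75 216.25
```

Rounded to one decimal place, these endpoints are the (203.8, 216.3) reported above.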
Example 2
Margaret conducts a study to determine the difference in opinion between
men and women on abortion. She randomly asks 200 men and 300
women whether they are pro-life or pro-choice. 80 men and 180 women
say they are pro-choice. Construct a 99% confidence interval for the
difference in the proportion of men and women who are pro-choice.

The parameter of interest is pf – pm.
Conditions: All quantities $n_f\hat{p}_f$, $n_f(1-\hat{p}_f)$, $n_m\hat{p}_m$ and $n_m(1-\hat{p}_m)$
(here 180, 120, 80 and 120) are greater than 10.

$\hat{p}_f - \hat{p}_m = 180/300 - 80/200 = .6 - .4 = .2$

$\mathrm{s.e.}(\hat{p}_f - \hat{p}_m) = \sqrt{(.4)(.6)/200 + (.6)(.4)/300} = .0447$

For 99% confidence level z*= 2.58.

The 99% CI for pf - pm is: .20 ± 2.58(.0447) = (.085, .315).

Interpretation: We are 99% confident that the proportion of females
who are pro-choice is between 8.5% and 31.5% greater than the
proportion of males who are pro-choice.
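
The same calculation can be reproduced in Python. This is a sketch only: the function name `two_proportions_ci` is hypothetical, and z* is computed with the standard-library `statistics.NormalDist` instead of being read from the normal table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportions_ci(x1, n1, x2, n2, confidence):
    """CI for p1 - p2 from two independent samples (x = count of 'successes')."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # z* such that P(-z* < Z < z*) = confidence.
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    estimate = p1 - p2
    return estimate - z_star * se, estimate + z_star * se


# 180 of 300 women and 80 of 200 men are pro-choice; 99% confidence.
low, high = two_proportions_ci(180, 300, 80, 200, confidence=0.99)
print(round(low, 3), round(high, 3))  # 0.085 0.315
```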
Identifying the C.I
For each example below, decide which type of
confidence interval should be calculated.




1. We want to estimate the difference between the
   heights of smokers and non-smokers at PSU.
2. We want to calculate an interval that contains the
   fraction of all PSU students who are right-handed.
3. We want to capture the difference between the
   proportions of smokers and non-smokers at PSU who
   have two or more tattoos.
4. We want to estimate the daily sugar intake (in grams)
   of adult Americans.