Transcript estimation

ESTIMATION
STATISTICAL INFERENCE

Statistical inference is the procedure by which conclusions about a
population are drawn on the basis of the results obtained from a
sample drawn from that population.

This can be achieved by:
  Hypothesis testing
  Estimation:
    Point estimation
    Interval estimation

Estimation


If the mean and the variance of a normal distribution are known, the
probabilities of various events can be determined.
In practice these values are almost never known, and we have to
estimate them from the information in a simple random sample.

The process of estimation involves calculating, from the data of a
sample, some "statistic" that approximates the corresponding
"parameter" of the population from which the sample was drawn.
POINT ESTIMATION


A point estimate is a single numerical value, obtained from a random
sample, that is used to estimate the corresponding population parameter.

The sample mean ($\bar{X}$) is the best point estimate of the
population mean ($\mu$).
The sample standard deviation ($s$) is the best point estimate of the
population standard deviation ($\sigma$).
The sample proportion ($\tilde{P}$) is the best point estimate of the
population proportion ($P$).

However, a point estimate is always subject to sampling error. For the
mean, this error is measured by the standard error of the mean, which
reflects the precision of the estimated mean.

Because of sampling variation we cannot say that the exact parameter
value is some specific number, but we can determine a range of values
within which we are confident the unknown parameter lies.
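As a rough illustration of point estimation, the sketch below computes the sample mean, the sample standard deviation, and an estimated standard error of the mean. The data values are made up for illustration and do not come from the slides, and using s in place of an unknown σ is an assumption of this sketch.

```python
import math
import statistics

# Hypothetical sample of eight measurements (made-up values, not from the slides).
sample = [4.1, 5.3, 6.0, 5.8, 7.2, 6.4, 5.1, 6.9]

n = len(sample)
x_bar = statistics.mean(sample)   # point estimate of the population mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)

# Assumption: sigma is unknown here, so s stands in for it in sigma / sqrt(n).
standard_error = s / math.sqrt(n)

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}, SE = {standard_error:.2f}")
```

The standard error shrinks as n grows, which is why larger samples give more precise estimates of the mean.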
INTERVAL ESTIMATION

An interval estimate consists of two numerical values defining an
interval within which the unknown parameter we want to estimate is
expected to lie, with a specified degree of confidence.

The two values depend on the confidence level, which is equal to
$1 - \alpha$ ($\alpha$ is the probability of error).
The interval estimate may be expressed as:
Estimator ± Reliability coefficient × Standard error
Parameter | Estimator | Standard error
Population mean ($\mu$) | Sample mean ($\bar{X}$) | $\sigma/\sqrt{n}$
Difference between two population means ($\mu_1 - \mu_2$) | Difference between two sample means ($\bar{X}_1 - \bar{X}_2$) | $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Population proportion ($P$) | Sample proportion ($\tilde{P}$) | $\sqrt{\tilde{P}(1-\tilde{P})/n}$ (since $P$ is unknown, and we want to estimate it)
Difference between two population proportions ($P_1 - P_2$) | Difference between two sample proportions ($\tilde{P}_1 - \tilde{P}_2$) | $\sqrt{\tilde{P}_1(1-\tilde{P}_1)/n_1 + \tilde{P}_2(1-\tilde{P}_2)/n_2}$
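The four standard errors and the general form "estimator ± reliability coefficient × standard error" can be sketched as small Python functions. The function names are hypothetical, chosen here for illustration, and the numbers in the usage lines are illustrative only.

```python
import math

def se_mean(sigma, n):
    """Standard error of a sample mean with known population SD."""
    return sigma / math.sqrt(n)

def se_diff_means(sigma1, n1, sigma2, n2):
    """Standard error of the difference between two sample means."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def se_proportion(p, n):
    """Standard error of a sample proportion p."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def interval(estimate, reliability, standard_error):
    """Estimator ± reliability coefficient × standard error."""
    margin = reliability * standard_error
    return estimate - margin, estimate + margin

# Illustrative values: sigma = 3.5 and n = 16 give 3.5 / 4 = 0.875.
print(se_mean(3.5, 16))
print(interval(10.0, 2.0, 0.5))
```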
Reliability Coefficient

The reliability coefficient is the value of $Z_{1-\alpha/2}$
corresponding to the confidence level.

Confidence level | α value | Z value
90%              | 0.10    | 1.645
95%              | 0.05    | 1.96
99%              | 0.01    | 2.58
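The Z values in the table can be recovered from the standard normal distribution. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module; the helper name `reliability_coefficient` is an assumption made for this example.

```python
from statistics import NormalDist

def reliability_coefficient(confidence_level):
    """Return Z_{1 - alpha/2} for a two-sided interval."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {reliability_coefficient(level):.3f}")
```

To three decimals the 99% value is 2.576; the table rounds it to 2.58.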
Confidence Interval

The confidence interval is central and symmetric around the sample
mean, so that there is an $\alpha/2$ chance that the parameter is above
the upper limit and an $\alpha/2$ chance that it is below the lower
limit (for a 95% CI, 2.5% on each side).
CI FOR POPULATION MEAN


The sample mean is an unbiased estimate of the population mean.
If the population variance is known, the CI for $\mu$ is:

$\bar{X} - Z_{1-\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{1-\alpha/2}\,\sigma/\sqrt{n}$
EXERCISE

The mean serum indirect bilirubin level of 16 four-day-old infants was
found to be 5.98 mg/dl. The population SD ($\sigma$) is 3.5 mg/dl.
Assuming normality, find the 90%, 95%, and 99% CI for $\mu$:

$\bar{X} - Z_{1-\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{1-\alpha/2}\,\sigma/\sqrt{n}$
Solution (each interval has confidence level $1 - \alpha$):

90% CI ($\alpha = 0.1$):
$5.98 - 1.645 \times 3.5/\sqrt{16} < \mu < 5.98 + 1.645 \times 3.5/\sqrt{16}$
$5.98 - 1.44 < \mu < 5.98 + 1.44$
$4.54 < \mu < 7.42$

95% CI ($\alpha = 0.05$):
$5.98 - 1.96 \times 3.5/\sqrt{16} < \mu < 5.98 + 1.96 \times 3.5/\sqrt{16}$
$5.98 - 1.715 < \mu < 5.98 + 1.715$
$4.265 < \mu < 7.695$

99% CI ($\alpha = 0.01$):
$5.98 - 2.58 \times 3.5/\sqrt{16} < \mu < 5.98 + 2.58 \times 3.5/\sqrt{16}$
$5.98 - 2.258 < \mu < 5.98 + 2.258$
$3.72 < \mu < 8.24$
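The bilirubin calculation can be checked with a few lines of Python. This is a minimal sketch that plugs in the rounded Z values from the reliability coefficient table (1.645, 1.96, 2.58); the variable names are chosen here for illustration.

```python
import math

# Values from the bilirubin exercise.
x_bar, sigma, n = 5.98, 3.5, 16
standard_error = sigma / math.sqrt(n)  # 3.5 / 4 = 0.875

# Rounded reliability coefficients, as tabulated in the slides.
z_table = {"90%": 1.645, "95%": 1.96, "99%": 2.58}

intervals = {}
for level, z in z_table.items():
    margin = z * standard_error
    intervals[level] = (x_bar - margin, x_bar + margin)
    lower, upper = intervals[level]
    print(f"{level} CI: {lower:.3f} < mu < {upper:.3f}")
```

The interval widens as the confidence level rises, because only the reliability coefficient changes while the standard error stays at 0.875.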
CI for difference between two
population means

A sample of 10 twelve-year-old boys and a sample of 10 twelve-year-old
girls yielded mean heights of 59.8 inches (boys) and 58.5 inches
(girls). Assuming normality, with $\sigma_1 = 2$ inches and
$\sigma_2 = 3$ inches, find the 90% CI for the difference in mean
height between girls and boys at this age.

Taking boys as group 1 and girls as group 2:

$(\bar{X}_1 - \bar{X}_2) - Z_{1-\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + Z_{1-\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$

90% CI:
$(59.8 - 58.5) - 1.645\sqrt{2^2/10 + 3^2/10} < \mu_1 - \mu_2 < (59.8 - 58.5) + 1.645\sqrt{2^2/10 + 3^2/10}$
$1.3 - 1.88 < \mu_1 - \mu_2 < 1.3 + 1.88$
$-0.58 < \mu_1 - \mu_2 < 3.18$
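The same interval can be reproduced numerically. The sketch below treats boys as group 1 and girls as group 2 and uses the tabulated Z value of 1.645; the variable names are illustrative.

```python
import math

# Values from the height example (boys = group 1, girls = group 2).
mean_boys, sigma_boys, n_boys = 59.8, 2.0, 10
mean_girls, sigma_girls, n_girls = 58.5, 3.0, 10
z_90 = 1.645  # reliability coefficient for a 90% CI

difference = mean_boys - mean_girls
standard_error = math.sqrt(sigma_boys**2 / n_boys + sigma_girls**2 / n_girls)
margin = z_90 * standard_error

lower, upper = difference - margin, difference + margin
print(f"90% CI: {lower:.2f} < mu1 - mu2 < {upper:.2f}")
```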
CI for population proportion
In a survey, 300 adults were interviewed; 123 said they had a yearly
medical checkup. Find the 95% CI for the true proportion of adults
having a yearly medical checkup.

$\tilde{P} = 123/300 = 0.41$

$\tilde{P} - Z_{1-\alpha/2}\sqrt{\tilde{P}(1-\tilde{P})/n} < P < \tilde{P} + Z_{1-\alpha/2}\sqrt{\tilde{P}(1-\tilde{P})/n}$, with confidence level $1 - \alpha$

95% CI:
$0.41 - 1.96\sqrt{0.41(1-0.41)/300} < P < 0.41 + 1.96\sqrt{0.41(1-0.41)/300}$
$0.41 - 0.06 < P < 0.41 + 0.06$
$0.35 < P < 0.47$
95% CI = 35% to 47%
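A short Python check of the survey example, assuming the tabulated Z value of 1.96 for 95% confidence (variable names are illustrative):

```python
import math

# Values from the medical checkup survey.
successes, n = 123, 300
z_95 = 1.96  # reliability coefficient for a 95% CI

p_hat = successes / n                              # sample proportion
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z_95 * standard_error

lower, upper = p_hat - margin, p_hat + margin
print(f"p = {p_hat:.2f}, 95% CI: {lower:.2f} < P < {upper:.2f}")
```

The unrounded margin is slightly below 0.06, so the limits agree with the hand calculation once rounded to two decimals.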
CI for difference between two
population proportions

Two hundred patients suffering from a certain disease were randomly
divided into two equal groups. The first group received the NEW
treatment, and 90 recovered within three days. Of the other 100, who
received the STANDARD treatment, 78 recovered within three days.
Find the 95% CI for the difference between the proportions of recovery
in the populations receiving the two treatments.
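Answer

$\tilde{P}_1 - \tilde{P}_2 = 90/100 - 78/100 = 0.12$

$(\tilde{P}_1 - \tilde{P}_2) - Z_{1-\alpha/2}\sqrt{\tilde{P}_1(1-\tilde{P}_1)/n_1 + \tilde{P}_2(1-\tilde{P}_2)/n_2} < P_1 - P_2 < (\tilde{P}_1 - \tilde{P}_2) + Z_{1-\alpha/2}\sqrt{\tilde{P}_1(1-\tilde{P}_1)/n_1 + \tilde{P}_2(1-\tilde{P}_2)/n_2}$

95% CI = $0.12 \pm 1.96\sqrt{0.9(1-0.9)/100 + 0.78(1-0.78)/100}$
95% CI = $0.12 \pm 0.10$
95% CI = 0.02 to 0.22 (2% to 22%)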
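The treatment comparison can be verified the same way. This is a minimal sketch using the tabulated Z value of 1.96; the variable names are chosen for illustration.

```python
import math

# Values from the treatment example.
recovered_new, n_new = 90, 100
recovered_std, n_std = 78, 100
z_95 = 1.96  # reliability coefficient for a 95% CI

p_new = recovered_new / n_new
p_std = recovered_std / n_std
difference = p_new - p_std

standard_error = math.sqrt(
    p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std
)
margin = z_95 * standard_error

lower, upper = difference - margin, difference + margin
print(f"95% CI: {lower:.2f} < P1 - P2 < {upper:.2f}")
```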



The width of the interval estimate is increased by:
  Increasing the confidence level (i.e., decreasing the α value)
  Decreasing the sample size
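Both effects on the width can be seen numerically. The sketch below computes the width of a CI for a mean with known σ, which is 2 × Z × σ/√n; the helper name `ci_width` and the choice of σ = 3.5 with sample sizes 16, 64, and 256 are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, confidence_level):
    """Width of a two-sided CI for a mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z * sigma / math.sqrt(n)

sigma = 3.5  # illustrative population SD

# Fixed n, rising confidence level: the interval widens.
for level in (0.90, 0.95, 0.99):
    print(f"n = 16, level = {level:.0%}: width = {ci_width(sigma, 16, level):.2f}")

# Fixed confidence level, rising n: the interval narrows.
for n in (16, 64, 256):
    print(f"n = {n}, level = 95%: width = {ci_width(sigma, n, 0.95):.2f}")
```

Because the width depends on 1/√n, quadrupling the sample size only halves the interval.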
A confidence interval can shed light on the following:

1. The range within which the true value of the estimated parameter
lies.

2. The statistical significance of a difference (in population means
or proportions).
If ZERO is included in the interval for such a difference (i.e., the
range runs from a negative value to a positive value), then we can
state that there is no statistically significant difference between the
two population values (parameters), even though the sample values
(statistics) showed a difference.

3. The sample size.
At a fixed confidence level, a narrow interval indicates a "large"
sample size, while a wide interval indicates a "small" sample size.
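The zero-in-interval rule from point 2 can be applied to the two difference intervals worked out earlier: the 90% CI for the height difference (-0.58 to 3.18) and the 95% CI for the recovery difference (0.02 to 0.22). The helper name `includes_zero` is hypothetical.

```python
def includes_zero(lower, upper):
    """True when the interval for a difference contains zero."""
    return lower <= 0 <= upper

# Intervals taken from the worked examples in these slides.
height_ci = (-0.58, 3.18)    # 90% CI, difference in mean height
recovery_ci = (0.02, 0.22)   # 95% CI, difference in recovery proportions

for name, (lower, upper) in [("height", height_ci), ("recovery", recovery_ci)]:
    if includes_zero(lower, upper):
        print(f"{name}: zero is inside the interval, no significant difference")
    else:
        print(f"{name}: zero is outside the interval, significant difference")
```

By this rule the height difference is not statistically significant at the 90% level, while the difference in recovery proportions is significant at the 95% level.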
EXERCISES

In a study to assess the side effects of two drugs, 50 animals were
given Drug A (11 showed undesirable side effects) and 50 were given
Drug B (8 showed similar side effects).
Find the 95% CI for $P_A - P_B$.
EXERCISES

In a random sample of 100 workers, the mean blood lead level was
90 ppm. The distribution of blood lead level in the worker population
is normal with a standard deviation of 10 ppm.
Find the 90%, 95%, and 99% CI for the population mean.
EXERCISE

In assessing the relationship between a certain drug and a certain
anomaly in chick embryos, 50 fertilized eggs were injected with the
drug on the 4th day of incubation. On the 20th day the embryos were
examined, and the abnormality was observed in 12 of them. Find the
90%, 95%, and 99% CI for the population proportion.
EXERCISE

The Hb level of males aged >10 years is normally distributed with a
variance of 1.462 (gm/dl)², and that of males below 10 years is also
normally distributed with a variance of 0.867 (gm/dl)². Random samples
of 10 older and 20 younger males showed sample means of 14.47 gm/dl
and 12.64 gm/dl, respectively. Find the 90%, 95%, and 99% CI for the
difference in population means.