Chapter 5 Stratified Random Sampling

Chapter 6
Ratio, Regression, and Difference Estimation

Ratio Estimation

How many?
In 1802, Laplace wanted to estimate the population of France.
He sampled 30 communes throughout the country.
There were 2,037,615 people living in those communes.
Between 1799 and 1802 (3 years), 215,599 births were registered in those communes, or about 71,866 per year.
Laplace calculated that there was one birth per year for every 28.35 people.
If you were given the annual number of births in France between 1799 and 1802, could you estimate the population?
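Laplace's arithmetic can be checked with a few lines of Python. This is only a sketch built from the figures quoted above; the helper name `estimate_population` is hypothetical, and no national birth count is supplied because the text does not give one.

```python
# Figures quoted in the text for Laplace's 30 sampled communes.
sample_population = 2_037_615   # people living in the sampled communes
sample_births_3yr = 215_599     # births registered over the 3 years

# Average number of births per year in the sample.
sample_births_per_year = sample_births_3yr / 3

# People per annual birth: the ratio Laplace reported as 28.35.
people_per_birth = sample_population / sample_births_per_year


def estimate_population(annual_births: float) -> float:
    """Scale a known annual birth count by the sample ratio (hypothetical helper)."""
    return people_per_birth * annual_births


print(round(people_per_birth, 2))
```

Calling `estimate_population` with the annual number of births for the whole country would give the ratio estimate of the population, under the assumption that the sampled communes are representative.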
Ratio Estimation in an SRS

Calculating the age of a tree (or the volume of wood in a tree) is hard,
time-consuming work. However,
– tree age correlates with tree diameter, which is easy to measure.
– volume correlates with diameter and height.
– $y_i$ = the age of the tree; $x_i$ = the diameter of the tree
Ratio Estimation in an SRS
Using auxiliary information
• Auxiliary information about the population may include a known variable to which the variable of interest is related (correlated).
• The auxiliary information typically is easy to measure, whereas the variable of interest may be expensive to measure.
Measure 2 quantities on each sample unit
• y: variable of interest
• x: auxiliary variable
Example: Ratio Estimation in SRS
• The wholesale price of oranges in large shipments is based on the sugar content of the entire load.
• Sugar content cannot be determined before the juice is extracted from the entire load.
How much orange juice can be made from these oranges?
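Example, p. 2
Method 1
Obtain the sugar content $y_1, y_2, \ldots, y_n$ from the sample; use $\bar{y}$ as an estimate of $\mu_y$;
$\hat{\tau}_y = N\bar{y}$ estimates the total sugar content $\tau_y$. What is N?
Method 2
Fact: the sugar content y of an individual orange is closely related to its weight x.
$$\frac{\mu_y}{\mu_x} = \frac{N\mu_y}{N\mu_x} = \frac{\tau_y}{\tau_x} \quad\Rightarrow\quad \tau_y = \frac{\mu_y}{\mu_x}\,\tau_x$$
No need for N.
So
$$\hat{\tau}_y = \frac{\bar{y}}{\bar{x}}\,\tau_x = \frac{n\bar{y}}{n\bar{x}}\,\tau_x = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}\,\tau_x$$
What is $\tau_x$?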
Ratio Estimation in an SRS
Parameters of interest that we want to estimate:
• $\tau_y$: population total
• $\mu_y$: population mean
• $R = \tau_y / \tau_x = \mu_y / \mu_x$: population ratio
How many jelly beans in
this jar?
Ratio Estimation in an SRS
Correlation is important!
• Ratio and regression estimators take advantage of the correlation of x and y in the population.
• The higher the correlation, the better ratio and regression estimators work.
$$\rho = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{(N-1)\,\sigma_x \sigma_y}$$
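Before relying on a ratio estimator, it can help to check how strongly x and y are correlated in the sample. The sketch below is an illustration rather than part of the original slides: the helper name `sample_correlation` and the four data pairs are made up, and the sample version of the correlation is used because population values are not observed.

```python
import math


def sample_correlation(x, y):
    """Pearson correlation of paired samples (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # The (n - 1) divisors in the covariance and the two standard
    # deviations cancel, so only sums of products are needed.
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data in which y is exactly proportional to x.
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [2.0, 4.0, 6.0, 8.0]
rho_demo = sample_correlation(x_demo, y_demo)
print(rho_demo)
```

A value near 1 suggests that the auxiliary variable carries most of the information about y, which is the situation in which ratio estimation pays off.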
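Estimation of the Ratio $R = \tau_y / \tau_x = \mu_y / \mu_x$
$$r = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}}$$
Estimated variance of r:
$$\hat{V}(r) = \hat{V}\!\left(\frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}\right) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{\mu_x^2}\right)\frac{s_r^2}{n}$$
where
$$s_r^2 = \frac{\sum_{i=1}^{n} (y_i - r x_i)^2}{n - 1}$$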
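The estimator r and its estimated variance can be computed directly from paired sample values. The following is a minimal Python sketch, not code from the slides: the function name `ratio_estimate` is hypothetical, the three data pairs are invented for illustration, and the population mean of x (`mu_x`) is assumed to be known.

```python
def ratio_estimate(y, x, N, mu_x):
    """Return (r, s_r2, var_r) for an SRS of paired values.

    y, x  : sample values of the variable of interest and the auxiliary variable
    N     : population size
    mu_x  : known population mean of x (assumed available)
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("y and x must be paired and have at least 2 values")
    r = sum(y) / sum(x)
    # Variance of the residuals about the line through the origin with slope r.
    s_r2 = sum((yi - r * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    # Finite population correction times s_r^2 / (mu_x^2 * n).
    var_r = (1 - n / N) * s_r2 / (mu_x ** 2 * n)
    return r, s_r2, var_r


# Invented toy data: r = 12 / 6 = 2, residuals are 1, -1, 0.
r_demo, s_r2_demo, var_r_demo = ratio_estimate(
    y=[3.0, 3.0, 6.0], x=[1.0, 2.0, 3.0], N=30, mu_x=2.0
)
print(r_demo, s_r2_demo, var_r_demo)
```

Returning `s_r2` alongside `r` and `var_r` is a design choice made here so that the same residual variance can be reused for the total and mean estimators.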
Example: (Food additive)


A researcher was investigating a new food additive
for cattle. Midway through the two-month study, she
was interested in estimating the ratio of present
weight to pre-study weight for the entire herd of N =
500 steers. A simple random sample of n = 12 steers
was selected from the herd and weighed. These data
and pre-study weights are presented in the
accompanying table for all cattle sampled. Assume the
pre-study total $\tau_x$ = 440,000 pounds.
Estimate the ratio of present weight to pre-study
weight of the herd, and provide an estimate of the
standard error for your answer.
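Solution:
The estimate of the ratio R of the present weight to pre-study weight for the herd is:
$$r = \frac{\bar{y}}{\bar{x}} = \frac{954.8333}{841.9167} = 1.134119$$
With $\mu_x = \tau_x / N = 440{,}000 / 500 = 880$, the estimated variance is
$$\hat{V}(r) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{\mu_x^2}\right)\frac{s_r^2}{n} = \left(1 - \frac{12}{500}\right)\left(\frac{1}{880^2}\right)\frac{8{,}848.646}{12} = 0.000929$$
$$\widehat{SD}(r) = \sqrt{0.000929} = 0.0305$$
95% CI for R:
$$r \pm 1.96\,\widehat{SD}(r) = 1.134 \pm 1.96(0.0305) = 1.134 \pm 0.0598 = (1.0742,\ 1.1938)$$
On average, a steer is about 7.4% to 19.4% heavier than it was before the study.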
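The solution can be reproduced from the summary statistics quoted in the example, since the raw table is not included here. This is a hedged sketch: the variable names are illustrative, and the value `s_r2 = 8848.646` is taken from the worked solution rather than recomputed from data.

```python
import math

# Summary statistics quoted in the food additive example.
N, n = 500, 12
y_bar = 954.8333      # sample mean of present weight
x_bar = 841.9167      # sample mean of pre-study weight
tau_x = 440_000       # known pre-study total weight of the herd
s_r2 = 8848.646       # residual variance about the ratio line (from the text)

mu_x = tau_x / N                      # population mean of x
r = y_bar / x_bar                     # estimated ratio
var_r = (1 - n / N) * s_r2 / (mu_x ** 2 * n)
se_r = math.sqrt(var_r)

# Normal-approximation 95% interval, as used in the text.
ci_r = (r - 1.96 * se_r, r + 1.96 * se_r)

print(f"r = {r:.6f}, V(r) = {var_r:.6f}, SD(r) = {se_r:.4f}")
print(f"95% CI: ({ci_r[0]:.4f}, {ci_r[1]:.4f})")
```

The interval endpoints differ slightly in the fourth decimal place from the hand calculation, because the slides round r and the standard deviation before forming the interval.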
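Ratio Estimator of the Population Total $\tau_y$
$$\hat{\tau}_y = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}\,\tau_x = \frac{\bar{y}}{\bar{x}}\,\tau_x = r\,\tau_x$$
Estimated variance of $\hat{\tau}_y$:
$$\hat{V}(\hat{\tau}_y) = \tau_x^2\,\hat{V}(r) = (N\mu_x)^2\left(1 - \frac{n}{N}\right)\left(\frac{1}{\mu_x^2}\right)\frac{s_r^2}{n} = N^2\left(1 - \frac{n}{N}\right)\frac{s_r^2}{n}$$
where
$$s_r^2 = \frac{\sum_{i=1}^{n} (y_i - r x_i)^2}{n - 1}$$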
Example: (Food additive) (cont.)
Estimate the current total weight $\tau_y$ of the herd.
$$\hat{\tau}_y = r\,\tau_x = 1.1341(440{,}000) \approx 499{,}000$$
$$\widehat{SD}(\hat{\tau}_y) = \sqrt{\hat{V}(\hat{\tau}_y)} = \sqrt{N^2\left(1 - \frac{n}{N}\right)\frac{s_r^2}{n}} = \sqrt{500^2\left(1 - \frac{12}{500}\right)\frac{8848.646}{12}} = 13{,}413 \text{ lbs}$$
95% CI for $\tau_y$:
$$499{,}000 \pm 1.96(13{,}413) = 499{,}000 \pm 26{,}289 = (472{,}711,\ 525{,}289)$$
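Example: (Food additive) (cont.)
Since in this particular example we know N = 500, we could also use
$$\hat{\tau}_y = N\bar{y} = 500(954.833) = 477{,}417$$
$$\widehat{SD}(\hat{\tau}_y) = \sqrt{\hat{V}(\hat{\tau}_y)} = \sqrt{N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}} = \sqrt{500^2\left(1 - \frac{12}{500}\right)\frac{46{,}636}{12}} = 30{,}794$$
where $s^2$ = 46,636 is the sample variance of the $y_i$.
95% CI:
$$477{,}417 \pm 1.96(30{,}794) = 477{,}417 \pm 60{,}356 = (417{,}061,\ 537{,}773)$$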
Example: (Food additive) (cont.)
Estimator of the total    $\hat{\tau}_y = r\,\tau_x$    $\hat{\tau}_y = N\bar{y}$
Estimate                  499,000                       477,417
Estimated SD              13,413                        30,794
95% CI                    (472,711, 525,289)            (417,061, 537,773)
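The two estimates of the total can be placed side by side in code using only the summary statistics quoted in the example. This is a minimal sketch with illustrative variable names; `s_r2` and `s2` are copied from the worked solutions, not recomputed from the raw table.

```python
import math

# Summary statistics quoted in the food additive example.
N, n = 500, 12
tau_x = 440_000
y_bar, x_bar = 954.8333, 841.9167
s_r2 = 8848.646   # residual variance about the ratio line
s2 = 46_636       # ordinary sample variance of y

fpc = 1 - n / N   # finite population correction

# Ratio estimator of the total: r * tau_x.
tau_ratio = (y_bar / x_bar) * tau_x
se_ratio = N * math.sqrt(fpc * s_r2 / n)

# Expansion estimator of the total: N * y_bar.
tau_exp = N * y_bar
se_exp = N * math.sqrt(fpc * s2 / n)

for label, est, se in [("ratio", tau_ratio, se_ratio), ("N*ybar", tau_exp, se_exp)]:
    lo, hi = est - 1.96 * se, est + 1.96 * se
    print(f"{label:>7}: {est:,.0f}  SD {se:,.0f}  95% CI ({lo:,.0f}, {hi:,.0f})")
```

The ratio total printed here is a few pounds above 499,000 because the code keeps all digits of r, whereas the slides round r to 1.1341 and then round the product.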
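Ratio Estimator of the Population Mean $\mu_y$
$$\hat{\mu}_y = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}\,\mu_x = \frac{\bar{y}}{\bar{x}}\,\mu_x = r\,\mu_x$$
Estimated variance of $\hat{\mu}_y$:
$$\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r) = \mu_x^2\left(1 - \frac{n}{N}\right)\left(\frac{1}{\mu_x^2}\right)\frac{s_r^2}{n} = \left(1 - \frac{n}{N}\right)\frac{s_r^2}{n}$$
where
$$s_r^2 = \frac{\sum_{i=1}^{n} (y_i - r x_i)^2}{n - 1}$$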
Example: (Food additive) (cont.)
Estimate the current mean weight $\mu_y$ of the herd.
$$\hat{\mu}_y = r\,\mu_x = 1.1341(880) \approx 998$$
$$\widehat{SD}(\hat{\mu}_y) = \sqrt{\hat{V}(\hat{\mu}_y)} = \sqrt{\left(1 - \frac{n}{N}\right)\frac{s_r^2}{n}} = \sqrt{\left(1 - \frac{12}{500}\right)\frac{8848.646}{12}} = 26.83 \text{ lbs}$$
95% CI for $\mu_y$:
$$998 \pm 1.96(26.83) = 998 \pm 52.59 = (945.41,\ 1050.59)$$
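Example: (Food additive) (cont.)
Estimator of the mean     $\hat{\mu}_y = r\,\mu_x$      $\hat{\mu}_y = \bar{y}$
Estimate                  998                           954.833
Estimated SD              26.83                         61.59
95% CI                    (945.41, 1050.59)             (834.12, 1075.55)
For $\hat{\mu}_y = \bar{y}$:
$$\widehat{SD}(\hat{\mu}_y) = \sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}} = \sqrt{\left(1 - \frac{12}{500}\right)\frac{46636}{12}} = 61.59$$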
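Both standard deviations share the factor $(1 - n/N)/n$, so comparing the two estimators of the mean reduces to comparing $s_r^2$ with $s^2$. A short worked check, using only the figures quoted in this example:

$$\frac{\hat{V}(r\,\mu_x)}{\hat{V}(\bar{y})} = \frac{s_r^2}{s^2} = \frac{8{,}848.646}{46{,}636} \approx 0.19, \qquad \frac{\widehat{SD}(r\,\mu_x)}{\widehat{SD}(\bar{y})} = \frac{26.83}{61.59} \approx 0.44$$

In this sample the ratio estimator's estimated standard deviation is less than half that of the sample mean, which matches its much narrower confidence interval.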