Chapters 8 - 9
Estimation
(Icelandic: Mat og metlar)
Estimator and Estimate
(Icelandic: Metill og mat)
An estimator of a population parameter is a random
variable that depends on the sample information and
whose value provides an approximation to this
unknown parameter. A specific value of that random
variable is called an estimate.
Point Estimator and Point Estimate
(Icelandic: Punktmetill og punktmat)
Let $\theta$ represent a population parameter (Icelandic: þýðisstiki), such as
the population mean $\mu$ or the population
proportion $\pi$. A point estimator, $\hat{\theta}$, of a
population parameter, $\theta$, is a function of the
sample information that yields a single number
called a point estimate. For example, the sample
mean, $\bar{X}$, is a point estimator of the population
mean $\mu$, and the value that $\bar{X}$ assumes for a given
set of data is called the point estimate.
Unbiasedness
(Icelandic: Óhneigður, óbjagaður)
The point estimator $\hat{\theta}$ is said to be an unbiased
estimator of the parameter $\theta$ if the expected
value, or mean, of the sampling distribution of $\hat{\theta}$
is $\theta$; that is,
$$E(\hat{\theta}) = \theta$$
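Probability Density Functions for Unbiased and
Biased Estimators
(Icelandic: Þéttifall fyrir hneigðan og óhneigðan metil)
(Figure 8.1: densities of the estimators $\hat{\theta}_1$ and $\hat{\theta}_2$, plotted against $\hat{\theta}$ with $\theta$ marked on the axis)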
Bias
(Icelandic: Bjögun, skekkja)
Let $\hat{\theta}$ be an estimator of $\theta$. The bias in $\hat{\theta}$ is defined as
the difference between its mean and $\theta$; that is,
$$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
It follows that the bias of an unbiased estimator is 0.
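As a small illustration of the definition Bias = E(estimator) - parameter, the sketch below enumerates every sample of size 2 drawn with replacement from a tiny population and computes the exact expected value of two variance estimators. The population {1, 2, 3}, the sample size, and the function names are assumptions chosen only for this example; they do not come from the slides.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population, chosen for illustration
n = 2                    # sample size (sampling with replacement)

def variance_estimate(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor

# True population variance (the parameter being estimated).
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# All equally likely samples of size n.
samples = list(product(population, repeat=n))

def expected_value(divisor):
    """Exact mean of the estimator over its sampling distribution."""
    return sum(variance_estimate(s, divisor) for s in samples) / len(samples)

bias_divisor_n_minus_1 = expected_value(n - 1) - sigma2   # the usual s^2
bias_divisor_n = expected_value(n) - sigma2               # divides by n instead

print(bias_divisor_n_minus_1, bias_divisor_n)
```

In this toy setting the estimator that divides by n - 1 has bias 0, while dividing by n underestimates the variance on average.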
Most Efficient Estimator and Relative Efficiency
(Icelandic: Skilvirkasti metillinn og hlutfallsleg skilvirkni)
Suppose there are several unbiased estimators of $\theta$. Then the
unbiased estimator with the smallest variance is said to
be the most efficient estimator or to be the minimum
variance unbiased estimator of $\theta$. Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two
unbiased estimators of $\theta$, based on the same number of
sample observations. Then,
a) $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$
b) The relative efficiency (Icelandic: hlutfallsleg skilvirkni) of $\hat{\theta}_1$ with respect to $\hat{\theta}_2$ is the ratio of
their variances; that is,
$$\text{Relative Efficiency} = \frac{\mathrm{Var}(\hat{\theta}_2)}{\mathrm{Var}(\hat{\theta}_1)}$$
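The ratio Var(estimator 2) / Var(estimator 1) can be approximated by simulation. The sketch below takes the sample mean as estimator 1 and the sample median as estimator 2 for standard normal data; that pairing, the sample size, the number of replications, and the seed are all illustrative assumptions rather than anything specified in the slides.

```python
import random
import statistics

random.seed(0)             # fixed seed so the run is repeatable
n = 25                     # observations per sample (assumed)
replications = 2000        # number of simulated samples (assumed)

means, medians = [], []
for _ in range(replications):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Estimated variances of the two sampling distributions.
var_mean = statistics.variance(means)
var_median = statistics.variance(medians)

# Relative efficiency of the mean with respect to the median.
relative_efficiency = var_median / var_mean
print(round(relative_efficiency, 2))
```

A value above 1 indicates that, for normal data, the sample mean has the smaller variance and is therefore the more efficient of the two.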
Point Estimators of Selected
Population Parameters
(Table 8.1)
Population Parameter    Point Estimator    Properties
Mean, $\mu$             $\bar{X}$          Unbiased, most efficient (assuming normality)
Mean, $\mu$             $X_m$              Unbiased (assuming normality), but not most efficient
Proportion, $\pi$       $p$                Unbiased, most efficient
Variance, $\sigma^2$    $s^2$              Unbiased, most efficient (assuming normality)
Confidence Interval Estimator
(Icelandic: Metill fyrir öryggismörk)
A confidence interval estimator for a population
parameter $\theta$ is a rule for determining (based on
sample information) a range, or interval, that is
likely to include the parameter. The corresponding
estimate is called a confidence interval estimate.
Confidence Interval and Confidence Level
Let $\theta$ be an unknown parameter. Suppose that, on the basis of sample
information, random variables A and B are found such that $P(A < \theta < B) = 1 - \alpha$,
where $\alpha$ is any number between 0 and 1. If the specific sample values of A and B
are a and b, then the interval from a to b is called a $100(1 - \alpha)\%$ confidence
interval for $\theta$. The quantity $(1 - \alpha)$ is called the confidence level of the
interval.
If the population were repeatedly sampled a very large number of
times, the true value of the parameter $\theta$ would be contained in $100(1 - \alpha)\%$ of the
intervals calculated this way. The confidence interval calculated in this manner
is written as $a < \theta < b$ with $100(1 - \alpha)\%$ confidence.
P(-1.96 < Z < 1.96) = 0.95, where Z is
a Standard Normal Variable
(Figure 8.3: standard normal density; the area between -1.96 and 1.96 is 0.95 and each tail has area 0.025)
Notation
(Icelandic: Táknmálsnotkun)
Let $Z_{\alpha/2}$ be the number for which
$$P(Z > Z_{\alpha/2}) = \frac{\alpha}{2}$$
where the random variable Z follows a standard
normal distribution.
Selected Values of $Z_{\alpha/2}$ from the Standard
Normal Distribution Table
(Table 8.2)
$\alpha$            0.01    0.02    0.05    0.10
$Z_{\alpha/2}$      2.58    2.33    1.96    1.645
Confidence Level    99%     98%     95%     90%
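The tabulated values can be reproduced from the defining condition P(Z > Z_{α/2}) = α/2, which is the same as saying Z_{α/2} is the (1 - α/2) quantile of the standard normal distribution. A minimal sketch using Python's standard library follows; the helper name `z_value` is an assumption made for this example.

```python
from statistics import NormalDist

def z_value(alpha):
    """Return Z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.01, 0.02, 0.05, 0.10):
    level = 100 * (1 - alpha)
    print(f"alpha={alpha:.2f}  Z={z_value(alpha):.3f}  confidence level={level:.0f}%")
```

The computed quantiles agree with Table 8.2 once rounded to the precision shown there.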
Confidence Intervals for the Mean of a Population that
is Normally Distributed: Population Variance Known
(Icelandic: Öryggismörk fyrir meðaltal þýðis sem er normaldreift
og með þekkta dreifni)
Consider a random sample of n observations from a normal
distribution with mean $\mu$ and variance $\sigma^2$. If the sample
mean is $\bar{X}$, then a $100(1 - \alpha)\%$ confidence interval for the
population mean with known variance is given by
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
or equivalently,
$$\bar{X} \pm B$$
where the margin of error (also called the sampling error,
the bound, or the interval half width) is given by
$$B = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
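A minimal sketch of this interval, computing the margin of error B = Z_{α/2} σ/√n and the two limits. The function name and the numbers in the example (sample mean 50, σ = 10, n = 25) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_variance(x_bar, sigma, n, alpha=0.05):
    """Return (LCL, UCL, B) for the mean of a normal population with known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # reliability factor Z_{alpha/2}
    bound = z * sigma / sqrt(n)               # margin of error B
    return x_bar - bound, x_bar + bound, bound

# Hypothetical inputs: sample mean 50, known sigma 10, n = 25, 95% confidence.
lcl, ucl, bound = mean_ci_known_variance(x_bar=50.0, sigma=10.0, n=25)
print(f"B = {bound:.2f}, interval = ({lcl:.2f}, {ucl:.2f})")
```

Here the standard error is 10/5 = 2, so B is about 1.96 × 2 = 3.92 and the width of the interval is 2B.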
Basic Terminology for Confidence Interval for a
Population Mean with Known Population Variance
(Icelandic: Orðnotkun fyrir öryggismörk þýðismeðaltals með
þekktri dreifni)
(Table 8.3)
Terms                                                      Symbol              To Obtain
Standard Error of the Mean                                 $\sigma_{\bar{X}}$  $\sigma/\sqrt{n}$
Z Value (also called the Reliability Factor)               $Z_{\alpha/2}$      Use Standard Normal Distribution Table
Margin of Error (Icelandic: skekkjumörk)                   B                   $B = Z_{\alpha/2}\,\sigma/\sqrt{n}$
Lower Confidence Limit (Icelandic: neðri mörk)             LCL                 $LCL = \bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n}$
Upper Confidence Limit (Icelandic: efri mörk)              UCL                 $UCL = \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}$
Width (width is twice the bound) (Icelandic: breidd)       w                   $w = 2B = 2 Z_{\alpha/2}\,\sigma/\sqrt{n}$
Student’s t Distribution
Given a random sample of n observations, with
mean $\bar{X}$ and standard deviation s, from a normally
distributed population with mean $\mu$, the variable t
follows the Student’s t distribution with (n - 1)
degrees of freedom and is given by
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
Notation
(Icelandic: Táknmálsnotkun)
A random variable having the Student’s t
distribution with v degrees of freedom will be
denoted $t_v$. The value $t_{v,\alpha/2}$ is defined as the number
for which
$$P(t_v > t_{v,\alpha/2}) = \alpha/2$$
Confidence Intervals for the Mean of a Normal
Population: Population Variance Unknown
(Icelandic: Öryggismörk fyrir vongildi í normaldreifðu þýði með óþekktri dreifni)
Suppose there is a random sample of n observations from a normal
distribution with mean $\mu$ and unknown variance. If the sample
mean and standard deviation are, respectively, $\bar{X}$ and s, then a
$100(1 - \alpha)\%$ confidence interval for the population mean, variance
unknown, is given by
$$\bar{X} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
or equivalently,
$$\bar{X} \pm B$$
where the margin of error, the sampling error, or bound, B, is given by
$$B = t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
and $t_{n-1,\alpha/2}$ is the number for which
$$P(t_{n-1} > t_{n-1,\alpha/2}) = \alpha/2$$
The random variable $t_{n-1}$ has a Student’s t distribution with v = (n - 1) degrees of freedom.
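A minimal sketch of the interval with unknown variance, using B = t_{n-1,α/2} s/√n. Python's standard library has no Student's t quantile function, so the critical value is passed in as read from a t table; the data set and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_unknown_variance(data, t_crit):
    """Return (LCL, UCL, B); t_crit is t_{n-1, alpha/2} taken from a t table."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                  # sample standard deviation (divisor n - 1)
    bound = t_crit * s / sqrt(n)     # margin of error B
    return x_bar - bound, x_bar + bound, bound

# Hypothetical sample of n = 10 observations.
data = [10, 12, 9, 11, 13, 10, 12, 11, 9, 13]
# Table value t_{9, 0.025} = 2.262 for a 95% interval.
lcl, ucl, bound = mean_ci_unknown_variance(data, t_crit=2.262)
print(f"B = {bound:.3f}, interval = ({lcl:.3f}, {ucl:.3f})")
```

Because 2.262 is larger than the corresponding normal value 1.96, the interval is wider than it would be if the variance were known.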
Confidence Intervals for Population Proportion (Large
Samples)
(Icelandic: Öryggismörk fyrir þýðishlutfall, stór úrtök)
Let p denote the observed proportion of “successes” in a random
sample of n observations from a population with a proportion $\pi$ of
successes. Then, if n is large enough that $n\pi(1 - \pi) > 9$, a
$100(1 - \alpha)\%$ confidence interval for the population proportion is given
by
$$p - Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} < \pi < p + Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$$
or equivalently,
$$p \pm B$$
where the margin of error, the sampling error, or bound, B, is given
by
$$B = Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$$
and $Z_{\alpha/2}$ is the number for which a standard normal variable Z
satisfies
$$P(Z > Z_{\alpha/2}) = \alpha/2$$
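A minimal sketch of the large-sample interval for a proportion, with B = Z_{α/2} √(p(1 - p)/n). Since the population proportion is unknown, the sketch checks the large-sample condition with the observed p in its place, which is an assumption of this example; the function name and inputs are also hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p, n, alpha=0.05):
    """Return (LCL, UCL, B) for a population proportion (large samples)."""
    # Assumption: the observed p stands in for the unknown population
    # proportion when checking the condition n * pi * (1 - pi) > 9.
    if n * p * (1 - p) <= 9:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = z * sqrt(p * (1 - p) / n)
    return p - bound, p + bound, bound

# Hypothetical sample: 240 successes out of 400, so p = 0.6.
lcl, ucl, bound = proportion_ci(p=0.6, n=400)
print(f"B = {bound:.4f}, interval = ({lcl:.4f}, {ucl:.4f})")
```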
Notation
(Icelandic: Táknmálsnotkun)
A random variable having the chi-square
distribution with v = n - 1 degrees of freedom
will be denoted by $\chi^2_v$ or simply $\chi^2_{n-1}$. Define
$\chi^2_{n-1,\alpha}$ as the number for which
$$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha}) = \alpha$$
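The Chi-Square Distribution
(Figure 8.17: chi-square density starting at 0; the area to the left of $\chi^2_{n-1,\alpha}$ is $1 - \alpha$ and the area to the right is $\alpha$)
The Chi-Square Distribution for n - 1 Degrees of Freedom
and $100(1 - \alpha)\%$ Confidence Level
(Figure 8.18: chi-square density with area $\alpha/2$ in each tail and $1 - \alpha$ between $\chi^2_{n-1,1-\alpha/2}$ and $\chi^2_{n-1,\alpha/2}$)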
Confidence Intervals for the Variance of a Normal
Population
(Icelandic: Öryggismörk fyrir dreifni í normaldreifðu þýði)
Suppose there is a random sample of n observations from a
normally distributed population with variance $\sigma^2$. If the observed
variance is $s^2$, then a $100(1 - \alpha)\%$ confidence interval for the
population variance is given by
$$\frac{(n - 1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$
where $\chi^2_{n-1,\alpha/2}$ is the number for which
$$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}) = \frac{\alpha}{2}$$
and $\chi^2_{n-1,1-\alpha/2}$ is the number for which
$$P(\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}) = \frac{\alpha}{2}$$
and the random variable $\chi^2_{n-1}$ follows a chi-square distribution
with (n - 1) degrees of freedom.
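A minimal sketch of the variance interval, dividing (n - 1)s² by the upper and lower chi-square critical values. The standard library has no chi-square quantile function, so both critical values are supplied from a chi-square table; the data and the function name are hypothetical.

```python
from statistics import variance

def variance_ci(data, chi2_upper, chi2_lower):
    """Return (LCL, UCL) for the population variance.

    chi2_upper is chi^2_{n-1, alpha/2} and chi2_lower is chi^2_{n-1, 1-alpha/2},
    both read from a chi-square table.
    """
    n = len(data)
    s2 = variance(data)              # sample variance (divisor n - 1)
    numerator = (n - 1) * s2
    return numerator / chi2_upper, numerator / chi2_lower

# Hypothetical sample of n = 10 observations.
data = [10, 12, 9, 11, 13, 10, 12, 11, 9, 13]
# Table values for 9 degrees of freedom and a 95% interval.
lcl, ucl = variance_ci(data, chi2_upper=19.023, chi2_lower=2.700)
print(f"{lcl:.3f} < sigma^2 < {ucl:.3f}")
```

The larger critical value goes with the lower limit, so the interval is not symmetric about s².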
Confidence Intervals for Two Means: Matched Pairs
(Icelandic: Öryggismörk fyrir tvö vongildi: pör)
Suppose that there is a random sample of n matched pairs of
observations from normal distributions with means $\mu_X$ and $\mu_Y$.
That is, $x_1, x_2, \ldots, x_n$ denote the values of the observations from the
population with mean $\mu_X$, and $y_1, y_2, \ldots, y_n$ the matched sampled
values from the population with mean $\mu_Y$. Let $\bar{d}$ and $s_d$ denote the
observed sample mean and standard deviation for the n differences
$d_i = x_i - y_i$. If the population distribution of the differences is
assumed to be normal, then a $100(1 - \alpha)\%$ confidence interval for
the difference between means ($\mu_d = \mu_X - \mu_Y$) is given by
$$\bar{d} - t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}}$$
or equivalently,
$$\bar{d} \pm B$$
where the margin of error, the sampling error, or the bound,
B, is given by
$$B = t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}}$$
and $t_{n-1,\alpha/2}$ is the number for which
$$P(t_{n-1} > t_{n-1,\alpha/2}) = \frac{\alpha}{2}$$
The random variable $t_{n-1}$ has a Student’s t distribution
with (n - 1) degrees of freedom.
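A minimal sketch of the matched-pairs interval: form the differences d_i = x_i - y_i, then apply the one-sample t interval to them with B = t_{n-1,α/2} s_d/√n. The paired data and the function name are hypothetical, and the critical value is taken from a t table.

```python
from math import sqrt
from statistics import mean, stdev

def matched_pairs_ci(x, y, t_crit):
    """Return (LCL, UCL, B) for mu_X - mu_Y; t_crit is t_{n-1, alpha/2}."""
    if len(x) != len(y):
        raise ValueError("matched pairs need samples of equal length")
    d = [xi - yi for xi, yi in zip(x, y)]   # pairwise differences
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                          # standard deviation of differences
    bound = t_crit * s_d / sqrt(n)
    return d_bar - bound, d_bar + bound, bound

# Hypothetical paired measurements (n = 5).
x = [12, 15, 11, 14, 10]
y = [10, 11, 8, 9, 9]
# Table value t_{4, 0.025} = 2.776 for a 95% interval.
lcl, ucl, bound = matched_pairs_ci(x, y, t_crit=2.776)
print(f"B = {bound:.3f}, interval = ({lcl:.3f}, {ucl:.3f})")
```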
Confidence Intervals for Difference Between Means:
Independent Samples (Normal Distributions and Known
Population Variances)
(Icelandic: Öryggismörk fyrir mismun vongilda: óháð úrtök)
Suppose that there are two independent random samples of $n_x$ and
$n_y$ observations from normally distributed populations with means
$\mu_X$ and $\mu_Y$ and variances $\sigma^2_X$ and $\sigma^2_Y$. If the observed sample means
are $\bar{X}$ and $\bar{Y}$, then a $100(1 - \alpha)\%$ confidence interval for ($\mu_X - \mu_Y$) is
given by
$$(\bar{X} - \bar{Y}) - Z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_x} + \frac{\sigma^2_Y}{n_y}} < \mu_X - \mu_Y < (\bar{X} - \bar{Y}) + Z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_x} + \frac{\sigma^2_Y}{n_y}}$$
or equivalently,
$$(\bar{X} - \bar{Y}) \pm B$$
where the margin of error is given by
$$B = Z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_x} + \frac{\sigma^2_Y}{n_y}}$$
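A minimal sketch of the interval for two independent samples with known variances, where B = Z_{α/2} √(σ²_X/n_x + σ²_Y/n_y). All input values and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_means_ci_known_variances(x_bar, y_bar, var_x, var_y, n_x, n_y, alpha=0.05):
    """Return (LCL, UCL, B) for mu_X - mu_Y with known population variances."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = z * sqrt(var_x / n_x + var_y / n_y)
    diff = x_bar - y_bar
    return diff - bound, diff + bound, bound

# Hypothetical inputs: means 20 and 18, variances 9 and 16, sizes 36 and 64.
lcl, ucl, bound = two_means_ci_known_variances(20.0, 18.0, 9.0, 16.0, 36, 64)
print(f"B = {bound:.4f}, interval = ({lcl:.4f}, {ucl:.4f})")
```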
Confidence Intervals for Two Means: Unknown
Population Variances that are Assumed to be Equal
(Icelandic: Öryggismörk fyrir mismun vongilda: óþekkt dreifni en
dreifnin er eins skv. forsendu)
Suppose that there are two independent random samples with $n_x$ and
$n_y$ observations from normally distributed populations with means $\mu_X$
and $\mu_Y$ and a common, but unknown, population variance. If the
observed sample means are $\bar{X}$ and $\bar{Y}$, and the observed sample
variances are $s^2_X$ and $s^2_Y$, then a $100(1 - \alpha)\%$ confidence interval for
($\mu_X - \mu_Y$) is given by
$$(\bar{X} - \bar{Y}) - t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s^2_p}{n_x} + \frac{s^2_p}{n_y}} < \mu_X - \mu_Y < (\bar{X} - \bar{Y}) + t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s^2_p}{n_x} + \frac{s^2_p}{n_y}}$$
or equivalently,
$$(\bar{X} - \bar{Y}) \pm B$$
where the margin of error is given by
$$B = t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s^2_p}{n_x} + \frac{s^2_p}{n_y}}$$
The pooled sample variance, $s^2_p$, is given by
$$s^2_p = \frac{(n_x - 1)s^2_X + (n_y - 1)s^2_Y}{n_x + n_y - 2}$$
and $t_{n_x+n_y-2,\alpha/2}$ is the number for which
$$P(t_{n_x+n_y-2} > t_{n_x+n_y-2,\alpha/2}) = \frac{\alpha}{2}$$
The random variable T approximately follows a Student’s t distribution
with $n_x + n_y - 2$ degrees of freedom, and T is given by
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}$$
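A minimal sketch of the pooled-variance calculation: s²_p is a weighted average of the two sample variances with weights (n_x - 1) and (n_y - 1), and the margin of error is B = t_{n_x+n_y-2,α/2} √(s²_p/n_x + s²_p/n_y). The summary statistics and function names are hypothetical, and the critical value comes from a t table.

```python
from math import sqrt

def pooled_variance(var_x, var_y, n_x, n_y):
    """Pooled sample variance s_p^2 under the equal-variance assumption."""
    return ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)

def pooled_margin_of_error(var_x, var_y, n_x, n_y, t_crit):
    """Margin of error B; t_crit is t_{n_x + n_y - 2, alpha/2} from a t table."""
    sp2 = pooled_variance(var_x, var_y, n_x, n_y)
    return t_crit * sqrt(sp2 / n_x + sp2 / n_y)

# Hypothetical summary statistics: s_X^2 = 4 (n = 10), s_Y^2 = 6 (n = 12).
sp2 = pooled_variance(4.0, 6.0, 10, 12)
# Table value t_{20, 0.025} = 2.086 for a 95% interval.
bound = pooled_margin_of_error(4.0, 6.0, 10, 12, t_crit=2.086)
print(f"s_p^2 = {sp2:.2f}, B = {bound:.4f}")
```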
Confidence Intervals for Two Means:
Unknown Population Variances, Assumed
Not Equal
Suppose that there are two independent random samples of $n_x$ and $n_y$
observations from normally distributed populations with means $\mu_X$
and $\mu_Y$, and it is assumed that the population variances are not equal.
If the observed sample means and variances are $\bar{X}$, $\bar{Y}$ and $s^2_X$, $s^2_Y$, then
a $100(1 - \alpha)\%$ confidence interval for ($\mu_X - \mu_Y$) is given by
$$(\bar{X} - \bar{Y}) - t_{v,\alpha/2}\sqrt{\frac{s^2_X}{n_x} + \frac{s^2_Y}{n_y}} < \mu_X - \mu_Y < (\bar{X} - \bar{Y}) + t_{v,\alpha/2}\sqrt{\frac{s^2_X}{n_x} + \frac{s^2_Y}{n_y}}$$
where the margin of error is given by
$$B = t_{v,\alpha/2}\sqrt{\frac{s^2_X}{n_x} + \frac{s^2_Y}{n_y}}$$
The degrees of freedom, v, is given by
$$v = \frac{\left[\left(\dfrac{s^2_X}{n_x}\right) + \left(\dfrac{s^2_Y}{n_y}\right)\right]^2}{\left(\dfrac{s^2_X}{n_x}\right)^2 / (n_x - 1) + \left(\dfrac{s^2_Y}{n_y}\right)^2 / (n_y - 1)}$$
If the sample sizes are equal ($n_x = n_y = n$), then the degrees of freedom reduces to
$$v = \left(1 + \frac{2}{\dfrac{s^2_X}{s^2_Y} + \dfrac{s^2_Y}{s^2_X}}\right)(n - 1)$$
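A minimal sketch of the degrees-of-freedom formula for unequal variances, together with a numerical check that it agrees with the reduced expression (1 + 2/(s²_X/s²_Y + s²_Y/s²_X))(n - 1) when the two sample sizes are equal. The function names and the example variances are hypothetical.

```python
def unequal_variance_df(var_x, var_y, n_x, n_y):
    """Degrees of freedom v for the unequal-variance interval."""
    a = var_x / n_x
    b = var_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))

def equal_n_df(var_x, var_y, n):
    """Reduced form of v when both samples have the same size n."""
    return (1 + 2 / (var_x / var_y + var_y / var_x)) * (n - 1)

# Hypothetical sample variances 4 and 9 with n = 10 in each sample.
v_general = unequal_variance_df(4.0, 9.0, 10, 10)
v_reduced = equal_n_df(4.0, 9.0, 10)
print(f"v (general) = {v_general:.4f}, v (equal n) = {v_reduced:.4f}")
```

The value of v is generally not an integer; how it is rounded before looking up the t table is a convention the slides do not state.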
Confidence Intervals for the Difference Between Two
Population Proportions (Large Samples)
(Icelandic: Öryggismörk fyrir mismun þýðishlutfalla, stór úrtök)
Let $p_X$ denote the observed proportion of successes in a random
sample of $n_X$ observations from a population with proportion $\pi_X$ of
successes, and let $p_Y$ denote the proportion of successes observed in
an independent random sample of $n_Y$ observations from a population with proportion
$\pi_Y$ of successes. Then, if the sample sizes are large (generally at least
forty observations in each sample), a $100(1 - \alpha)\%$ confidence interval
for the difference between population proportions ($\pi_X - \pi_Y$) is given
by
$$(p_X - p_Y) \pm B$$
where the margin of error is
$$B = Z_{\alpha/2}\sqrt{\frac{p_X(1 - p_X)}{n_X} + \frac{p_Y(1 - p_Y)}{n_Y}}$$
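A minimal sketch of the interval for a difference between two proportions, with B = Z_{α/2} √(p_X(1 - p_X)/n_X + p_Y(1 - p_Y)/n_Y). The function name and the sample values are hypothetical; the check on sample size follows the "at least forty observations" guideline stated above.

```python
from math import sqrt
from statistics import NormalDist

def two_proportions_ci(p_x, n_x, p_y, n_y, alpha=0.05):
    """Return (LCL, UCL, B) for pi_X - pi_Y (large samples)."""
    if min(n_x, n_y) < 40:
        raise ValueError("guideline: at least forty observations per sample")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = z * sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    diff = p_x - p_y
    return diff - bound, diff + bound, bound

# Hypothetical samples: p_X = 0.5 and p_Y = 0.4, each from 100 observations.
lcl, ucl, bound = two_proportions_ci(0.5, 100, 0.4, 100)
print(f"B = {bound:.4f}, interval = ({lcl:.4f}, {ucl:.4f})")
```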
Sample Size for the Mean of a Normally
Distributed Population with Known Population
Variance
(Icelandic: Gagnasafn fyrir vongildi normaldreifðs
þýðis með þekktri þýðisdreifni)
Suppose that a random sample from a normally
distributed population with known variance $\sigma^2$ is
selected. Then a $100(1 - \alpha)\%$ confidence interval for
the population mean extends a distance B
(sometimes called the bound, sampling error, or the
margin of error) on each side of the sample mean, if
the sample size, n, is
$$n = \frac{Z^2_{\alpha/2}\,\sigma^2}{B^2}$$
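A minimal sketch of the sample-size formula n = Z²_{α/2} σ²/B². Rounding the result up to the next whole observation is an assumption of this sketch (the formula itself usually gives a non-integer), and the function name and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, alpha=0.05):
    """Smallest whole n for which the margin of error is at most `bound`."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / bound) ** 2)   # rounding up is an assumption

# Hypothetical target: sigma = 10, margin of error B = 2, 95% confidence.
n_required = sample_size_for_mean(sigma=10.0, bound=2.0)
print(n_required)
```

Halving B roughly quadruples the required sample size, because B enters the formula squared.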
Sample Size for Population Proportion
(Icelandic: Stærð gagnasafns fyrir þýðishlutfall)
Suppose that a random sample is selected from a
population. Then a $100(1 - \alpha)\%$ confidence interval
for the population proportion, extending a distance
of at most B on each side of the sample proportion,
can be guaranteed if the sample size, n, is
$$n = \frac{0.25\,(Z_{\alpha/2})^2}{B^2}$$
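A minimal sketch of the sample-size formula n = 0.25 (Z_{α/2})²/B². The constant 0.25 is the largest possible value of p(1 - p), reached at p = 0.5, which is why this n guarantees the bound whatever the sample proportion turns out to be. Rounding up to a whole number is an assumption of the sketch, and the function name and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(bound, alpha=0.05):
    """Whole n that keeps the margin of error at most `bound` for any p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(0.25 * z ** 2 / bound ** 2)   # rounding up is an assumption

# Hypothetical target: margin of error of 3 percentage points at 95% confidence.
n_required = sample_size_for_proportion(bound=0.03)
print(n_required)
```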
Key Words
• Bias
• Bound
• Confidence interval:
    • For mean, known variance
    • For mean, unknown variance
    • For proportion
    • For two means, matched
    • For two means, variances equal
    • For two means, variances not equal
    • For variance
• Confidence Level
• Estimate
• Estimator
• Interval Half Width
• Lower Confidence Limit (LCL)
• Margin of Error
• Minimum Variance Unbiased Estimator
• Most Efficient Estimator
• Point Estimate
• Point Estimator
• Relative Efficiency
• Reliability Factor
• Sample Size for Mean, Known Variance
• Sample Size for Proportion
• Sampling Error
• Student’s t
• Unbiased Estimator
• Upper Confidence Limit (UCL)
• Width