Mean Squared Error and Maximum Likelihood

Lecture XVIII
Mean Squared Error
• As stated in our discussion on closeness, one potential measure for the goodness of an estimator is

$E\left[ \left( \hat{\theta} - \theta \right)^2 \right]$
• In the preceding example, the mean squared error of the estimate can be written as:

$E\left[ \left( T - \theta \right)^2 \right]$

where $\theta$ is the true parameter value between zero and one.

• This expected value is conditioned on the probability of T at each value of $\theta$:
PX ,    1  
1 X
X
PX1, X 2 ,   
1 
1 X1  X 2
X1  X 2
$MSE(T) = P(0,0,\theta)(0-\theta)^2 + 2\,P(0,1,\theta)(0.5-\theta)^2 + P(1,1,\theta)(1-\theta)^2$

$MSE(S) = P(0,\theta)(0-\theta)^2 + P(1,\theta)(1-\theta)^2$

$MSE(W) = (0.5-\theta)^2$
MSEs of Each Estimator

[Figure: MSE of each estimator plotted against $\theta$ on [0, 1]; the vertical axis runs from 0 to 0.25.]
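The curves in the figure can be reproduced numerically. Below is a minimal sketch, assuming (as in the discussion above) the two-draw Bernoulli example with $T = (X_1 + X_2)/2$, $S = X_1$, and the constant estimator $W = 0.5$; the names mse_T, mse_S, and mse_W are illustrative:

import numpy as np

theta = np.linspace(0, 1, 101)

# T = (X1 + X2)/2: average of the two Bernoulli draws
mse_T = ((0 - theta)**2 * (1 - theta)**2
         + 2 * (0.5 - theta)**2 * theta * (1 - theta)
         + (1 - theta)**2 * theta**2)
# S = X1: a single Bernoulli draw
mse_S = (0 - theta)**2 * (1 - theta) + (1 - theta)**2 * theta
# W = 0.5: a constant guess that ignores the data
mse_W = (0.5 - theta)**2

# The sums above simplify to the familiar closed forms
assert np.allclose(mse_T, theta * (1 - theta) / 2)
assert np.allclose(mse_S, theta * (1 - theta))

Plotting mse_T, mse_S, and mse_W against theta recovers the figure.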
• Definition 7.2.1. Let X and Y be two estimators of $\theta$. We say that X is better (or more efficient) than Y if $E(X - \theta)^2 \le E(Y - \theta)^2$ for all $\theta$ in $\Theta$ and strictly less for at least one $\theta$ in $\Theta$.
• When an estimator is dominated by another estimator, the dominated estimator is inadmissible.
• Definition 7.2.2. Let $\hat{\theta}$ be an estimator of $\theta$. We say that $\hat{\theta}$ is inadmissible if there is another estimator that is better in the sense that it produces a lower mean squared error of the estimate. An estimator that is not inadmissible is admissible.
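These definitions can be checked directly in the running example, as in the sketch below (using the same closed-form MSEs as above): T weakly beats S everywhere and strictly beats it on the interior of [0, 1], so S is inadmissible, while neither T nor W dominates the other because their MSE curves cross.

import numpy as np

theta = np.linspace(0, 1, 101)
mse_T = theta * (1 - theta) / 2
mse_S = theta * (1 - theta)
mse_W = (0.5 - theta)**2

# T dominates S: weakly better for all theta, strictly better for some
print(np.all(mse_T <= mse_S), np.any(mse_T < mse_S))   # True True
# Neither T nor W dominates the other: each wins for some theta
print(np.any(mse_T < mse_W), np.any(mse_W < mse_T))    # True True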
Strategies for Choosing an Estimator:

• Subjective strategy: This strategy considers the likely outcome of $\theta$ and selects the estimator that is best in that likely neighborhood.
• Minimax strategy: According to the minimax strategy, we choose the estimator for which the largest possible value of the mean squared error is the smallest.

• Definition 7.2.3: Let $\hat{\theta}$ be an estimator of $\theta$. It is a minimax estimator if for any other estimator $\tilde{\theta}$, we have:

$\max_{\theta} E\left( \hat{\theta} - \theta \right)^2 \le \max_{\theta} E\left( \tilde{\theta} - \theta \right)^2$
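The minimax comparison is a one-liner in the running example. A minimal sketch: the worst-case MSEs are 1/8 for T (at $\theta = 0.5$), 1/4 for S (at $\theta = 0.5$), and 1/4 for W (at $\theta = 0$ or 1), so T is the minimax choice among these three estimators.

import numpy as np

theta = np.linspace(0, 1, 1001)
worst_case = {
    "T": np.max(theta * (1 - theta) / 2),   # 1/8
    "S": np.max(theta * (1 - theta)),       # 1/4
    "W": np.max((0.5 - theta)**2),          # 1/4
}
print(min(worst_case, key=worst_case.get))  # 'T': smallest worst-case MSE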
Best Linear Unbiased Estimator:
• Definition 7.2.4: $\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if

$E\hat{\theta} = \theta$

for all $\theta$ in $\Theta$. We call $E\hat{\theta} - \theta$ the bias.

• In our previous discussion, T and S are unbiased estimators while W is biased.
• Theorem 7.2.10: The mean squared error is the sum of the variance and the bias squared. That is, for any estimator $\hat{\theta}$ of $\theta$:

$E\left( \hat{\theta} - \theta \right)^2 = V\left( \hat{\theta} \right) + \left( E\hat{\theta} - \theta \right)^2$
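The decomposition is easy to confirm by simulation. A minimal sketch, assuming an artificially biased estimator (the mean of ten normal draws plus a constant shift of 0.3; the setup is illustrative, not from the lecture):

import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                                             # true parameter
# Biased estimator: sample mean of 10 draws, shifted by +0.3
est = rng.normal(theta, 1.0, size=(100_000, 10)).mean(axis=1) + 0.3

mse = np.mean((est - theta)**2)
var = np.var(est)
bias = np.mean(est) - theta
print(mse, var + bias**2)   # the two agree up to simulation noise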
• Theorem 7.2.11: Let $\{X_i\}$, $i = 1, 2, \ldots, n$, be independent and have a common mean $\mu$ and variance $\sigma^2$. Consider the class of linear estimators of $\mu$ which can be written in the form

$\sum_{i=1}^{n} a_i X_i$

and impose the unbiasedness condition

$E\left( \sum_{i=1}^{n} a_i X_i \right) = \mu$

Then

$V\left( \bar{X} \right) \le V\left( \sum_{i=1}^{n} a_i X_i \right)$

for all $a_i$ satisfying the unbiasedness condition. Further, this condition holds with equality only for $a_i = 1/n$.
• To prove these points, note that the $a_i$'s must sum to one for unbiasedness:

$E\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i E\left( X_i \right) = \sum_{i=1}^{n} a_i \mu = \mu \sum_{i=1}^{n} a_i$

which equals $\mu$ only if $\sum_{i=1}^{n} a_i = 1$.

• The final condition can be demonstrated through the identity

$\sum_{i=1}^{n} \left( a_i - \frac{1}{n} \right)^2 = \sum_{i=1}^{n} a_i^2 - \frac{2}{n} \sum_{i=1}^{n} a_i + \frac{1}{n}$

Given $\sum_{i=1}^{n} a_i = 1$, the left-hand side is zero, and hence $\sum_{i=1}^{n} a_i^2$ reaches its minimum of $1/n$, only when $a_i = 1/n$ for every $i$.
• Theorem 7.2.12: Consider the problem of minimizing

$\sum_{i=1}^{n} a_i^2$

with respect to $\{a_i\}$ subject to the condition

$\sum_{i=1}^{n} a_i b_i = 1$

The solution to this problem is given by

$a_i = \frac{b_i}{\sum_{i=1}^{n} b_i^2}$
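This is the standard Lagrange-multiplier solution to a small quadratic program, and it can be sanity-checked against a numerical optimizer. A minimal sketch using scipy; the vector b is an arbitrary illustration:

import numpy as np
from scipy.optimize import minimize

b = np.array([1.0, 2.0, 3.0])
closed_form = b / np.sum(b**2)        # a_i = b_i / sum_j b_j^2

# Minimize sum a_i^2 subject to sum a_i b_i = 1
res = minimize(lambda a: np.sum(a**2), x0=np.ones_like(b),
               constraints={"type": "eq", "fun": lambda a: a @ b - 1.0})
print(np.allclose(res.x, closed_form, atol=1e-6))   # True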
Asymptotic Properties
• Definition 7.2.5. We say that $\hat{\theta}$ is a consistent estimator of $\theta$ if

$\operatorname{plim}_{n \to \infty} \hat{\theta} = \theta$
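Consistency can be illustrated by watching an estimate settle on the truth as the sample grows. A minimal sketch, assuming the sample mean as an estimator of a Bernoulli parameter $\theta = 0.3$ (an illustrative value):

import numpy as np

rng = np.random.default_rng(0)
theta = 0.3
for n in (10, 1_000, 100_000):
    draws = rng.random(n) < theta     # n Bernoulli(theta) draws
    print(n, draws.mean())            # the estimate approaches 0.3 as n grows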
Maximum Likelihood
• The basic concept behind maximum likelihood estimation is to choose the set of parameters that maximizes the likelihood of drawing a particular sample.
– Let the sample be X = {5, 6, 7, 8, 10}. The probability of each of these points based on the unknown mean, m, can be written as

$f\left( 5 \mid m \right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(5-m)^2}{2\sigma^2} \right)$

$f\left( 6 \mid m \right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(6-m)^2}{2\sigma^2} \right)$

$\vdots$

$f\left( 10 \mid m \right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(10-m)^2}{2\sigma^2} \right)$
• Assuming that the sample is independent, so that the joint distribution function can be written as the product of the marginal distribution functions, the probability of drawing the entire sample based on a given mean can then be written as:

$L\left( X \mid m \right) = \frac{1}{\left( 2\pi\sigma^2 \right)^{5/2}} \exp\left( -\frac{(5-m)^2 + (6-m)^2 + \cdots + (10-m)^2}{2\sigma^2} \right)$
• The value of m that maximizes the likelihood function of the sample can then be defined by

$\max_{m} L\left( X \mid m \right)$

Under the current scenario, we find it easier, however, to maximize the natural logarithm of the likelihood function:

$\max_{m} \ln L\left( X \mid m \right) = K - \frac{(5-m)^2 + (6-m)^2 + \cdots + (10-m)^2}{2\sigma^2}$

where K collects the terms that do not involve m. Setting the derivative with respect to m equal to zero,

$(5-m) + (6-m) + \cdots + (10-m) = 0$

$\hat{m}_{MLE} = \frac{5 + 6 + 7 + 8 + 10}{5} = 7.2$

the maximum likelihood estimate of the mean is simply the sample average.
6