Econometrics I
Professor William Greene
Stern School of Business
Department of Economics
11-1/72
Part 11: Asymptotic Distribution Theory
Econometrics I
Part 11 – Asymptotic Distribution Theory
11-2/72
Part 11: Asymptotic Distribution Theory
Received October 6, 2012
Dear Prof. Greene,
I am AAAAAA, an assistant professor of Finance at the xxxxx university
of xxxxx, xxxxx. I would be grateful if you could answer my question
regarding the parameter estimates and the marginal effects in
Multinomial Logit (MNL).
After running my estimations, the parameter estimate of my variable of
interest is statistically significant, but its marginal effect, evaluated at the
mean of the explanatory variables, is not. Can I just rely on the
parameter estimates’ results to say that the variable of interest is
statistically significant? How can I reconcile the parameter estimates
and the marginal effects’ results?
Thank you very much in advance!
Best,
AAAAAA
11-3/72
Part 11: Asymptotic Distribution Theory
Asymptotics: Setting
Most modeling situations involve stochastic
regressors, nonlinear models or nonlinear estimation
techniques. The number of exact statistical results,
such as expected value or true distribution, that can
be obtained in these cases is very low. We rely,
instead, on approximate results that are based on
what we know about the behavior of certain statistics
in large samples. Example from basic statistics:
What can we say about 1/x̄? We know a lot about
x̄. What do we know about its reciprocal?
11-4/72
Part 11: Asymptotic Distribution Theory
Convergence
Definitions, kinds of convergence as n grows large:
1. To a constant; example, the sample mean x̄
converges to the population mean.
2. To a random variable; example, a t statistic
with n − 1 degrees of freedom converges to a
standard normal random variable.
11-5/72
Part 11: Asymptotic Distribution Theory
Convergence to a Constant
Sequences and limits.
Sequence of constants, indexed by n
Ordinary limit:  [n(n+1)/2 + 3n + 5] / (n² + 2n + 1)  →  1/2
(The use of the “leading term”)
Convergence of a random variable. What does it mean
for a random variable to converge to a constant?
Convergence of the variance to zero. The random
variable converges to something that is not random.
11-6/72
Part 11: Asymptotic Distribution Theory
Convergence Results
Convergence of a sequence of random variables to a constant: convergence in mean square. Mean converges to a constant,
variance converges to zero. (Far from the most general, but
definitely sufficient for our purposes.)
x̄_n = (1/n) Σ_{i=1}^n x_i,   E[x̄_n] = μ,   Var[x̄_n] = σ²/n → 0
A convergence theorem for sample moments. Sample moments
converge in probability to their population counterparts.
Generally the form of The Law of Large Numbers. (Many forms; see
Appendix D in your text. This is the “weak” law of large numbers.)
Note the great generality of the preceding result. (1/n)Σig(zi) converges
to E[g(zi)].
11-7/72
Part 11: Asymptotic Distribution Theory
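To make the generality of (1/n)Σᵢ g(zᵢ) → E[g(z)] concrete, here is a minimal Python simulation (not from the slides; the chi-squared population, the choice g(z) = z², and the sample sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(12345)
# g(z) = z**2 applied to chi-squared(3) draws; E[g(z)] = Var[z] + (E[z])**2 = 6 + 9 = 15
for n in (10, 100, 10_000, 1_000_000):
    z = rng.chisquare(df=3, size=n)
    print(n, np.mean(z**2))          # the sample mean of g(z) settles down near 15 as n grows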
Extending the Law of Large Numbers
Suppose x has mean μ and finite variance σ², and x_1, x_2, ..., x_n
are a random sample. Then the LLN applies to x̄.
Let z_i = x_i^P. Then z_1, z_2, ..., z_n are a random sample from a population
with mean E[z] = E[x^P] and Var[z] = E[x^(2P)] − {E[x^P]}². The LLN
applies to z̄ as long as the moments are finite.
There is no mention of normality in any of this.
Example: If x ~ N[0, σ²], then
E[x^P] = 0 if P is odd
E[x^P] = σ^P (P − 1)!! if P is even,
where (P − 1)!! = product of the odd numbers up to P − 1.
No power of x is normally distributed. Normality is irrelevant to the LLN.
11-8/72
Part 11: Asymptotic Distribution Theory
Probability Limit
Let θ be a constant, ε be any positive value,
and n index the sequence.
If lim(n→∞) Prob[|x_n − θ| > ε] = 0, then plim x_n = θ.
x_n converges in probability to θ.
In words, the probability that the difference
between x_n and θ is larger than ε (for any ε)
goes to zero. x_n becomes arbitrarily close to θ.
Mean square convergence is sufficient (not necessary)
for convergence in probability. (We will not require
other, broader definitions of convergence, such as
"almost sure convergence.")
11-9/72
Part 11: Asymptotic Distribution Theory
Mean Square Convergence
11-10/72
Part 11: Asymptotic Distribution Theory
Probability Limits and Expectations
What is the difference between
E[x_n] and plim x_n?
A notation:
plim x_n = θ  ⟺  x_n →ᵖ θ
11-11/72
Part 11: Asymptotic Distribution Theory
Consistency of an Estimator
If the random variable in question, xn is an estimator (such
as the mean), and if
plim xn = θ
Then xn is a consistent estimator of θ.
Estimators can be inconsistent for two reasons:
(1) They are consistent for something other than the
thing that interests us.
(2) They do not converge to constants. They are not
consistent estimators of anything.
We will study examples of both.
11-12/72
Part 11: Asymptotic Distribution Theory
The Slutsky Theorem
Assumptions: If
xn is a random variable such that plim xn = θ.
For now, we assume θ is a constant.
g(.) is a continuous function with continuous derivatives.
g(.) is not a function of n.
Conclusion: Then plim[g(x_n)] = g[plim(x_n)], assuming
g[plim(x_n)] exists. (A very, very important result!)
Works for probability limits. Does not work for expectations.
E[x̄_n] = μ and plim(x̄_n) = μ, but E[1/x̄_n] = ? while plim(1/x̄_n) = 1/μ.
11-13/72
Part 11: Asymptotic Distribution Theory
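A small Python sketch of that last distinction (illustrative values only, not from the slides): averaging 1/x̄_n over many replications does not give 1/μ, but a single realization of 1/x̄_n collapses onto 1/μ as n grows.

import numpy as np

rng = np.random.default_rng(7)
mu = 2.0                                    # population mean, so 1/mu = 0.5
for n in (5, 50, 2000):
    xbar = rng.exponential(scale=mu, size=(2_000, n)).mean(axis=1)   # 2,000 replications of x-bar_n
    print(n, np.mean(1 / xbar), 1 / xbar[0])
# The first column (an estimate of E[1/x-bar_n]) stays above 0.5 for small n,
# while the second column (one realization of 1/x-bar_n) drifts toward plim 1/x-bar_n = 0.5.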
Slutsky Corollaries
x_n and y_n are two sequences of random variables with
probability limits c and d.
Plim (x_n + y_n) = c + d   (sum)
Plim (x_n · y_n) = c·d   (product)
Plim (x_n / y_n) = c/d   (ratio, if d ≠ 0)
Plim[g(x_n, y_n)] = g(c, d), assuming it exists and g(.) is
continuous with continuous partials, etc.
11-14/72
Part 11: Asymptotic Distribution Theory
Slutsky Results for Matrices
Functions of matrices are continuous functions of the
elements of the matrices. Therefore,
If plim A_n = A and plim B_n = B (element by element), then
plim(A_n⁻¹) = [plim A_n]⁻¹ = A⁻¹
and
plim(A_n B_n) = (plim A_n)(plim B_n) = AB
11-15/72
Part 11: Asymptotic Distribution Theory
Limiting Distributions
Convergence to a kind of random variable instead of to a
constant
x_n is a random sequence with cdf F_n(x_n). If plim x_n = θ (a
constant), then F_n(x_n) becomes a point. But x_n may
converge to a specific random variable. The
distribution of that random variable is the limiting
distribution of x_n. Denoted
x_n →ᵈ x  ⟺  F_n(x_n) → F(x) as n → ∞
11-16/72
Part 11: Asymptotic Distribution Theory
Limiting Distribution
x_1, x_2, ..., x_n = a random sample from N[μ, σ²].
For purposes of testing H_0: μ = 0, the usual test statistic is
t_{n-1} = x̄_n / (s_n/√n),   where   s_n² = Σ_{i=1}^n (x_i − x̄_n)² / (n − 1).
The exact density of the random variable t_{n-1} is t with n − 1 degrees of
freedom. The density varies with n:
f(t_{n-1}) = Γ(n/2) / { Γ[(n−1)/2] √[(n−1)π] } × [1 + t_{n-1}²/(n−1)]^(−n/2)
The cdf is F_{n-1}(t) = ∫_{−∞}^{t} f_{n-1}(x) dx. The distribution has mean zero and
variance (n−1)/(n−3). As n → ∞, the distribution and the random variable
converge to standard normal, which is written t_{n-1} →ᵈ N[0,1].
11-17/72
Part 11: Asymptotic Distribution Theory
A Slutsky Theorem for Random Variables
(Continuous Mapping)
If x_n →ᵈ x, and if g(x_n) is a continuous function with
continuous derivatives and does not involve n, then
g(x_n) →ᵈ g(x).
Example: t_n = random variable with t distribution with
n degrees of freedom.
t_n² = exactly, an F random variable with [1, n]
degrees of freedom.
t_n →ᵈ N(0,1),
t_n² →ᵈ [N(0,1)]² = chi-squared[1].
11-18/72
Part 11: Asymptotic Distribution Theory
An Extension of the Slutsky Theorem
If x_n →ᵈ x (x_n has a limiting distribution) and
θ is some relevant constant, and
g(x_n, θ) →ᵈ g_θ (g_θ has a limiting distribution that is
some function of θ) and
plim y_n = θ, then
g(x_n, y_n) →ᵈ g(x_n, θ)
(replacing θ with a consistent estimator
leads to the same limiting distribution).
11-19/72
Part 11: Asymptotic Distribution Theory
Application of the Slutsky Theorem
Large sample behavior of the F statistic for testing restrictions:
F = [ (e*'e* − e'e)/J ] / [ e'e/(n−K) ]
Numerator: (e*'e* − e'e)/σ² →ᵈ χ²[J], so (e*'e* − e'e)/(Jσ²) →ᵈ χ²[J]/J.
Denominator: e'e/(n−K) →ᵖ σ², so [e'e/(n−K)]/σ² →ᵖ 1.
Therefore,
JF →ᵈ χ²[J].
Establishing the numerator requires a central limit theorem.
11-20/72
Part 11: Asymptotic Distribution Theory
Central Limit Theorems
Central Limit Theorems describe the large sample
behavior of random variables that involve sums
of variables. “Tendency toward normality.”
Generality: When you find sums of random
variables, the CLT shows up eventually.
The CLT does not state that means of samples
have normal distributions.
11-21/72
Part 11: Asymptotic Distribution Theory
A Central Limit Theorem
Lindeberg-Levy CLT (the simplest version of the CLT)
If x_1, ..., x_n are a random sample from a population
with finite mean μ and finite variance σ², then
√n (x̄ − μ)/σ →ᵈ N(0,1).
Note, this is not the limiting distribution of the mean, since
the mean, itself, converges to a constant.
A useful corollary: if plim s_n = σ, and the other conditions
are met, then
√n (x̄ − μ)/s_n →ᵈ N(0,1).
11-22/72
Part 11: Asymptotic Distribution Theory
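A quick Python sketch of Lindeberg-Levy at work (illustrative only; the exponential population, the sample sizes, and the number of replications are my choices, not the slides'): the standardized mean of a very skewed variable behaves like a standard normal once n is moderately large.

import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 1.0, 1.0                         # exponential(1) population: mean 1, std dev 1
for n in (5, 30, 500):
    x = rng.exponential(scale=1.0, size=(20_000, n))
    z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma
    print(n, np.mean(z > 1.96))              # should approach the N(0,1) tail value, about 0.025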
Lindeberg-Levy vs. Lindeberg-Feller
Lindeberg-Levy assumes random sampling –
observations have the same mean and same
variance.
Lindeberg-Feller allows variances to differ across
observations, with some necessary assumptions
about how they vary.
Most econometric estimators require Lindeberg-Feller (and extensions such as Lyapunov).
11-23/72
Part 11: Asymptotic Distribution Theory
Order of a Sequence
Order of a sequence
‘Little oh’ o(.). Sequence h_n is o(n^δ) (order less than n^δ) iff n^(−δ) h_n → 0.
Example: h_n = n^1.4 is o(n^1.5) since n^(−1.5) h_n = 1/n^0.1 → 0.
‘Big oh’ O(.). Sequence h_n is O(n^δ) iff n^(−δ) h_n → a finite nonzero
constant.
Example 1: h_n = (n² + 2n + 1) is O(n²).
Example 2: Σ_i x_i² is usually O(n¹) since this is n times the mean of x_i²,
and the mean of x_i² generally converges to E[x_i²], a finite
constant.
What if the sequence is a random variable? The order is in terms of
the variance.
Example: What is the order of the sequence x̄_n in random sampling?
Var[x̄_n] = σ²/n, which is O(1/n).
11-24/72
Part 11: Asymptotic Distribution Theory
Cornwell and Rupert Panel Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP    = work experience
WKS    = weeks worked
OCC    = occupation, 1 if blue collar
IND    = 1 if manufacturing industry
SOUTH  = 1 if resides in south
SMSA   = 1 if resides in a city (SMSA)
MS     = 1 if married
FEM    = 1 if female
UNION  = 1 if wage set by union contract
ED     = years of education
LWAGE  = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel
Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied
Econometrics, 3, 1988, pp. 149-155.
11-25/72
Part 11: Asymptotic Distribution Theory
11-26/72
Part 11: Asymptotic Distribution Theory
11-27/72
Part 11: Asymptotic Distribution Theory
Histogram for LWAGE
11-28/72
Part 11: Asymptotic Distribution Theory
The kernel density
estimator is a
histogram (of sorts).
f̂(x*_m) = (1/n) Σ_{i=1}^n (1/B) K[(x_i − x*_m)/B], for a set of points x*_m
B = "bandwidth" chosen by the analyst
K = the kernel function, such as the normal
or logistic pdf (or one of several others)
x* = the point at which the density is approximated.
This is essentially a histogram with small bins.
11-29/72
Part 11: Asymptotic Distribution Theory
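A minimal Python sketch of exactly this estimator, hand-rolled rather than taken from a library (the simulated data standing in for a variable like LWAGE and the bandwidth rule of thumb are illustrative assumptions):

import numpy as np

def kernel_density(x, grid, bandwidth):
    """f_hat(x*) = (1/n) sum_i (1/B) K[(x_i - x*)/B] with a standard normal kernel K."""
    z = (x[None, :] - grid[:, None]) / bandwidth          # (n_grid, n) matrix of scaled distances
    k = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)          # normal kernel evaluated at each distance
    return k.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
x = rng.normal(loc=6.7, scale=0.5, size=4000)             # stand-in data, roughly log-wage-like
bw = 1.06 * x.std() * len(x) ** (-1 / 5)                  # Silverman-type rule-of-thumb bandwidth
grid = np.linspace(x.min(), x.max(), 200)
f_hat = kernel_density(x, grid, bw)                       # estimated density at each grid point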
Kernel Density Estimator
The curse of dimensionality
f̂(x*_m) = (1/n) Σ_{i=1}^n (1/B) K[(x_i − x*_m)/B], for a set of points x*_m
B = "bandwidth"
K = the kernel function
x* = the point at which the density is approximated.
f̂(x*) is an estimator of f(x*):
(1/n) Σ_{i=1}^n Q(x_i | x*) → Q(x*).
But, Var[Q(x*)] ≠ (1/N) × something. Rather, Var[Q(x*)] ≈ (1/N^(3/5)) × something.
I.e., f̂(x*) does not converge to f(x*) at the same rate as a mean
converges to a population mean.
11-30/72
Part 11: Asymptotic Distribution Theory
Kernel Estimator for LWAGE
(Figure: estimated kernel density f̂(x*) plotted against x*.)
11-31/72
Part 11: Asymptotic Distribution Theory
Asymptotic Distribution
An asymptotic distribution is a finite sample approximation to the true
distribution of a random variable that is good for large samples, but not
necessarily for small samples.
Stabilizing transformation to obtain a limiting distribution. Multiply the random
variable x_n by some power, a, of n such that the limiting distribution of
n^a x_n has a finite, nonzero variance.
Example: x̄_n has a limiting variance of zero, since the variance is σ²/n. But
the variance of √n x̄_n is σ². However, this alone does not stabilize the
distribution because E[√n x̄_n] = √n μ.
The stabilizing transformation would be √n (x̄_n − μ).
11-32/72
Part 11: Asymptotic Distribution Theory
Asymptotic Distribution
Obtaining an asymptotic distribution from a limiting distribution
Obtain the limiting distribution via a stabilizing transformation
Assume the limiting distribution applies reasonably well in
finite samples
Invert the stabilizing transformation to obtain the asymptotic
distribution:
√n (x̄ − μ)/σ →ᵈ N[0, 1]
Assume this holds in finite samples. Then,
√n (x̄ − μ) →ᵃ N[0, σ²]
(x̄ − μ) →ᵃ N[0, σ²/n]
x̄ →ᵃ N[μ, σ²/n]   (the asymptotic distribution)
σ²/n = the asymptotic variance.
Asymptotic normality of a distribution.
11-33/72
Part 11: Asymptotic Distribution Theory
Asymptotic Efficiency
Comparison of asymptotic variances:
How to compare consistent estimators? If both
converge to constants, both variances go to zero.
Example: Random sampling from the normal
distribution,
the sample mean is asymptotically normal[μ, σ²/n],
the median is asymptotically normal[μ, (π/2)σ²/n],
so the mean is asymptotically more efficient.
11-34/72
Part 11: Asymptotic Distribution Theory
Limiting Distribution: Testing for Normality
Normality Test for Random Variable e
s = √[ (1/n) Σ_{i=1}^n (e_i − ē)² ],   m_j = (1/n) Σ_{i=1}^n (e_i − ē)^j,   ē = 0 for regression residuals
Chi-squared[2] = n [ (m_3/s³)² / 6 + ((m_4/s⁴) − 3)² / 24 ]
If the assumptions of the Lindeberg-Levy CLT apply to √n (x̄ − μ)/σ_x, then
n [ (m_3/s³)² / 6 + ((m_4/s⁴) − 3)² / 24 ]  →ᵈ  Chi-squared[2].
11-35/72
Part 11: Asymptotic Distribution Theory
Extending the Law of Large Numbers
Suppose x has mean μ and finite variance σ², and x_1, x_2, ..., x_n
are a random sample. Then the LLN applies to x̄.
Let z_i = x_i^P. Then z_1, z_2, ..., z_n are a random sample from a population
with mean E[z] = E[x^P] and Var[z] = E[x^(2P)] − {E[x^P]}². The LLN
applies to z̄ as long as the moments are finite.
There is no mention of normality in any of this.
Example: If x ~ N[0, σ²], then
E[x^P] = 0 if P is odd
E[x^P] = σ^P (P − 1)!! if P is even,
where (P − 1)!! = product of the odd numbers up to P − 1.
No power of x is normally distributed. Normality is irrelevant to the LLN.
11-36/72
Part 11: Asymptotic Distribution Theory
Elements of a Proof for Bowman Shenton
(1) The CLT applies to m_j = (1/n) Σ_{i=1}^n (x_i − μ)^j = (1/n) Σ_{i=1}^n z_i.
(2) Slutsky Theorem: use x̄ instead of μ in m_j.
(3) Slutsky Theorem: use s² instead of σ² in computing Var[m_j].
(4) Continuous mapping: the square of a N[0,1] is chi-squared[1].
(5) Slutsky Theorem: operate on h(m_3) + h(m_4), where h(.) is the square.
What if the variable is not normally distributed?
(a) The CLT still applies.
(b) E[x_i − μ] is still zero, but (m_3/s³, m_4/s⁴) no longer converge to (0, 3).
(c) The sum of squares now has a noncentral chi-squared distribution.
11-37/72
Part 11: Asymptotic Distribution Theory
Bera and Jarque
Directly Apply Bowman and Shenton to Regression Residuals?
No. Regression residuals are not a random sample.
e = My = MXβ + Mε = Mε
E[e] = 0, but E[ee'] = σ²M
M = I − X(X'X)⁻¹X'
M_ij = 1 − x_i'(X'X)⁻¹x_i on the diagonal, −x_i'(X'X)⁻¹x_j off the diagonal
As n → ∞, M → I. BUT!! M is n × n, so it does not → anything!
As n → ∞, the vector of residuals becomes a random sample.
11-38/72
Part 11: Asymptotic Distribution Theory
Expectation vs. Probability Limit
1
xN , PX N ( x)  
N
(1- N1 )
1
N
E[ xN ]  1 (1- N1 )  N N1  2  N1 
2
Plim x N  1 by the definition, lim N  Prob[|x N  1| >  ]=0  
(lim N  Prob( xN  1)  1
3
1
Var[ xN ]  N  3   2 

N N
11-39/72
Part 11: Asymptotic Distribution Theory
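A tiny Python check of this example (simulation only, to make the gap between the mean and the probability limit visible; the values of N and the simulation size are my choices):

import numpy as np

rng = np.random.default_rng(1)
for N in (10, 100, 10_000):
    draws = np.where(rng.random(1_000_000) < 1 / N, N, 1.0)       # value N w.p. 1/N, else 1
    print(N, draws.mean(), np.mean(np.abs(draws - 1) > 0.01))
# The sample mean stays near 2 - 1/N, while the fraction of draws away from 1 shrinks to 0.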
The Delta Method
The delta method (combines most of these concepts)
Nonlinear transformation of a random variable: f(x_n) such that
plim x_n = θ but √n (x_n − θ) is asymptotically normally
distributed. What is the asymptotic behavior of f(x_n)?
Taylor series approximation: f(x_n) ≈ f(θ) + f'(θ)(x_n − θ)
By the Slutsky theorem, plim f(x_n) = f(θ)
√n [f(x_n) − f(θ)] ≈ f'(θ) [√n (x_n − θ)]
Large sample behaviors of the LHS and RHS are the same
(generally; this requires f(.) to be nicely behaved). The RHS is a constant
times something familiar.
The large sample variance is [f'(θ)]² times the large sample Var[√n (x_n − θ)].
Return to the asymptotic variance of x_n.
This gives us the very important result for the asymptotic distribution of a function.
11-40/72
Part 11: Asymptotic Distribution Theory
Delta Method
If x_n →ᵃ N[θ, σ²/n] and
f(x_n) is a continuous and continuously differentiable
function that does not involve n, then
f(x_n) →ᵃ N{f(θ), [f'(θ)]² σ²/n}
11-41/72
Part 11: Asymptotic Distribution Theory
Delta Method - Applications
x_n →ᵃ N[θ, σ²/n]
What is the asymptotic distribution of
f(x_n) = exp(x_n) or f(x_n) = 1/x_n?
(1) Normal, since x_n is asymptotically normally distributed.
(2) The asymptotic mean is f(θ) = exp(θ) or 1/θ.
(3) For the variance, we need f'(θ) = exp(θ) or −1/θ².
Asy.Var[f(x_n)] = [exp(θ)]² σ²/n or [1/θ⁴] σ²/n
11-42/72
Part 11: Asymptotic Distribution Theory
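A short Python sketch comparing the delta-method variance of f(x_n) = exp(x_n) with a brute-force simulation (the values of θ, σ, and n are purely illustrative):

import numpy as np

rng = np.random.default_rng(3)
theta, sigma, n = 0.5, 2.0, 400
delta_var = (np.exp(theta) ** 2) * sigma**2 / n             # [f'(theta)]^2 * sigma^2 / n
xbar = rng.normal(theta, sigma / np.sqrt(n), size=200_000)  # draws of the (asymptotically normal) mean
print(delta_var, np.var(np.exp(xbar)))                      # the two variances should be close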
Delta Method Application
2
Target of estimation is  =
1  2
Estimation strategy:
(1) Estimate  = log 2
(2) Estimate  = exp( /2)
(3) Estimate  =  2 / (1   2 )
11-43/72
Part 11: Asymptotic Distribution Theory
Delta Method
θ̂ = −1.123706,   V̂_θ = (.0715548)² = .00512009
σ̂ = exp(θ̂/2) = exp(−1.123706/2) = exp(−.561853) = .5701517
ĝ = dσ̂/dθ̂ = ½ exp(θ̂/2) = ½ σ̂ = .2850758
ĝ² = (dσ̂/dθ̂)² = .08126821
V̂_σ = ĝ² V̂_θ = .08126821 (.00512009) = .0004161
Estimated standard error = √.0004161 = .02039854
11-44/72
Part 11: Asymptotic Distribution Theory
Midterm Question
Continuing the previous example, there are two approaches implied
for estimating γ:
(1) γ = f(σ) = σ² / (1 + σ²)
(2) γ = h(θ) = exp(θ) / (1 + exp(θ))
Use the delta method to estimate a standard error for each of the two
estimators of . Do you obtain the same answer?
Bring your prepared solution to the exam and submit it with the
in class part of the exam.
11-45/72
Part 11: Asymptotic Distribution Theory
Confidence Intervals?
σ̂ ± 1.96 s.e.(σ̂):  .5701517 ± 1.96(.0203985) = .5301707 to .6101328??
The center of the confidence interval given in the table is .571554!
What is going on here?
The confidence limits given are exp(-1.23695/2) to exp(-.984361/2)!
11-46/72
Part 11: Asymptotic Distribution Theory
Krinsky and Robb vs. the Delta Method
Krinsky and Robb, ReStat, 1986. The delta method doesn't work
very well when f(.) is highly nonlinear. Alternative approach based
on the law of large numbers:
(1) Compute x_n = the estimator of θ, and estimate its asymptotic
variance v_n (standard deviation s_n).
(2) Compute f(x_n) to estimate f(θ).
(3) Estimate the asymptotic variance of f(x_n) by using a random
number generator to draw a normal random sample from the
asymptotic population N[x_n, s_n²]. Compute a sample of function
values f_1, ..., f_R = f(x_{n,1}), ..., f(x_{n,R}). Use the sample variance
of these draws to estimate the asymptotic variance of f(x_n).
(Krinsky and Robb, ReStat, 1991. Programming error. Retraction.)
11-47/72
Part 11: Asymptotic Distribution Theory
Krinsky and Robb
Generate R (e.g., 10,000) draws lnsig2u_r ~ N[−1.123706, (.0715548)²].
(To center the draws around the point estimate, use lnsig2u_r = −1.123706 + (lnsig2u_r − mean(lnsig2u)).)
Calculate sigma_u_r = exp(.5*lnsig2u_r).
Compute the standard error and confidence interval for sigma_u as descriptive statistics of the draws.
11-48/72
Part 11: Asymptotic Distribution Theory
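A Python sketch of the Krinsky and Robb calculation for this example (the point estimate and standard error are the slides' numbers; the number of draws and the percentile-style interval are my choices):

import numpy as np

rng = np.random.default_rng(2025)
theta_hat, se_theta = -1.123706, 0.0715548         # estimate of ln(sigma_u^2) and its std. error
R = 10_000
lnsig2u = rng.normal(theta_hat, se_theta, size=R)  # draws from the estimated asymptotic distribution
sigma_u = np.exp(0.5 * lnsig2u)                    # transform each draw
print(sigma_u.mean(), sigma_u.std(ddof=1))         # simulation-based point estimate and std. error
print(np.percentile(sigma_u, [2.5, 97.5]))         # simulation-based 95% confidence interval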
Krinsky and Robb
11-49/72
Part 11: Asymptotic Distribution Theory
Delta Method – More than One Parameter
If θ̂_1, θ̂_2, ..., θ̂_K are K consistent estimators of K parameters
θ_1, θ_2, ..., θ_K with asymptotic covariance matrix
      | v_11  v_12  ...  v_1K |
  V = | v_21  v_22  ...  v_2K |
      | ...   ...   ...  ...  |
      | v_K1  v_K2  ...  v_KK |
and if f(θ̂_1, θ̂_2, ..., θ̂_K) is a continuous function with continuous derivatives, then
the asymptotic variance of f(θ̂_1, θ̂_2, ..., θ̂_K) is
  g'Vg = [∂f(.)/∂θ_1  ∂f(.)/∂θ_2  ...  ∂f(.)/∂θ_K] V [∂f(.)/∂θ_1  ∂f(.)/∂θ_2  ...  ∂f(.)/∂θ_K]'
       = Σ_{k=1}^K Σ_{l=1}^K [∂f(.)/∂θ_k][∂f(.)/∂θ_l] v_kl
11-50/72
Part 11: Asymptotic Distribution Theory
Log Income Equation
----------------------------------------------------------------------
Ordinary least squares regression
LHS=LOGY      Mean                 =    -1.15746
              Standard deviation   =      .49149
              Number of observs.   =       27322
Model size    Parameters           =           7
              Degrees of freedom   =       27315
Residuals     Sum of squares       =  5462.03686
              Standard error of e  =      .44717
Fit           R-squared            =      .17237
Estimated Cov[b1,b2]
--------+-------------------------------------------------------------
Variable| Coefficient   Standard Error  b/St.Er.  P[|Z|>z]   Mean of X
--------+-------------------------------------------------------------
     AGE|   .06225***      .00213         29.189    .0000     43.5272
   AGESQ|  -.00074***      .242482D-04   -30.576    .0000     2022.99
Constant| -3.19130***      .04567        -69.884    .0000
 MARRIED|   .32153***      .00703         45.767    .0000      .75869
  HHKIDS|  -.11134***      .00655        -17.002    .0000      .40272
  FEMALE|  -.00491         .00552          -.889    .3739      .47881
    EDUC|   .05542***      .00120         46.050    .0000     11.3202
--------+-------------------------------------------------------------
11-51/72
Part 11: Asymptotic Distribution Theory
Age-Income Profile:
Married=1, Kids=1, Educ=12, Female=1
11-52/72
Part 11: Asymptotic Distribution Theory
Application: Maximum of a Function
    AGE|   .06225***      .00213         29.189    .0000     43.5272
  AGESQ|  -.00074***      .242482D-04   -30.576    .0000     2022.99
log Y = β₁ Age + β₂ Age² + ...
At what age does log income reach its maximum?
∂ log Y / ∂Age = β₁ + 2β₂ Age = 0  =>  Age* = −β₁/(2β₂) = .06225 / [2(.00074)] ≈ 42.1
∂Age*/∂β₁ = −1/(2β₂) = g₁ = 1/[2(.00074)] = 675.68
∂Age*/∂β₂ = β₁/(2β₂²) = g₂ = .06225/[2(.00074)²] = 56838.9
11-53/72
Part 11: Asymptotic Distribution Theory
Delta Method Using Visible Digits
Est.Var[Age*] = g₁² v₁₁ + g₂² v₂₂ + 2 g₁ g₂ v₁₂
  = 675.68² (4.54799×10⁻⁶) + 56838.9² (5.8797×10⁻¹⁰) − 2(675.68)(56838.9)(5.1285×10⁻⁸)
  = .0366952
standard error = square root = .1915599
11-54/72
Part 11: Asymptotic Distribution Theory
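A two-line Python check of that arithmetic (using only the visible digits shown on the slide for the gradient and the covariance of the AGE and AGESQ coefficients):

import numpy as np

g = np.array([675.68, 56838.9])
V = np.array([[4.54799e-6, -5.1285e-8],
              [-5.1285e-8, 5.8797e-10]])      # visible-digit Asy.Cov of (b_AGE, b_AGESQ)
var_age_star = g @ V @ g                      # delta-method variance: g'Vg
print(var_age_star, np.sqrt(var_age_star))    # approximately .0367 and .1916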
Delta Method Results
------------------------------------------------------------
WALD procedure.
--------+---------------------------------------------------
Variable| Coefficient   Standard Error  b/St.Er.  P[|Z|>z]
--------+---------------------------------------------------
      G1|   674.399***     22.05686       30.575    .0000
      G2|   56623.8***     1797.294       31.505    .0000
 AGESTAR|   41.9809***       .19193      218.727    .0000
--------+---------------------------------------------------
(Computed using all internal digits of the regression results)
11-55/72
Part 11: Asymptotic Distribution Theory
11-56/72
Part 11: Asymptotic Distribution Theory
Krinsky and Robb
Descriptive Statistics
============================================================
Variable|    Mean       Std.Dev.     Minimum     Maximum
--------+---------------------------------------------------
   ASTAR|   42.0607      .191563     41.2226     42.7720
11-57/72
Part 11: Asymptotic Distribution Theory
More than One Function and
More than One Coefficient
If θ̂_1, θ̂_2, ..., θ̂_K are K consistent estimators of K parameters
θ_1, θ_2, ..., θ_K with asymptotic covariance matrix
      | v_11  v_12  ...  v_1K |
  V = | v_21  v_22  ...  v_2K |
      | ...   ...   ...  ...  |
      | v_K1  v_K2  ...  v_KK |
and if f1(θ̂_1, ..., θ̂_K), f2(θ̂_1, ..., θ̂_K), ..., fJ(θ̂_1, ..., θ̂_K) are J continuous
functions with continuous derivatives, then the asymptotic covariance matrix of f̂1, ..., f̂J is
  GVG',  where G is the J×K matrix of derivatives whose j-th row is [∂fj/∂θ_1  ∂fj/∂θ_2  ...  ∂fj/∂θ_K].
11-58/72
Part 11: Asymptotic Distribution Theory
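A generic Python sketch of this GVG' calculation (a sketch only: the function f, the estimates, and the covariance you feed in come from your own application, and the Jacobian here is built numerically rather than analytically):

import numpy as np

def delta_method_cov(f, theta_hat, V, eps=1e-6):
    """Asymptotic covariance GVG' of f(theta_hat), with G formed by finite differences."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    f0 = np.atleast_1d(f(theta_hat))
    G = np.empty((f0.size, theta_hat.size))
    for k in range(theta_hat.size):
        step = np.zeros_like(theta_hat)
        step[k] = eps
        G[:, k] = (np.atleast_1d(f(theta_hat + step)) - f0) / eps   # numerical derivative df/dtheta_k
    return G @ V @ G.T

Called with, say, f = lambda b: -b[0] / (2 * b[1]), theta_hat = [.06225, -.00074], and the 2×2 covariance of the AGE and AGESQ coefficients, it reproduces the Age* variance computed above.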
Application: Partial Effects
Received October 6, 2012
Dear Prof. Greene,
I am AAAAAA, an assistant professor of Finance at the xxxxx university
of xxxxx, xxxxx. I would be grateful if you could answer my question
regarding the parameter estimates and the marginal effects in
Multinomial Logit (MNL).
After running my estimations, the parameter estimate of my variable of
interest is statistically significant, but its marginal effect, evaluated at the
mean of the explanatory variables, is not. Can I just rely on the
parameter estimates’ results to say that the variable of interest is
statistically significant? How can I reconcile the parameter estimates
and the marginal effects’ results?
Thank you very much in advance!
Best,
AAAAAA
11-59/72
Part 11: Asymptotic Distribution Theory
Application: Doctor Visits
German Individual Health Care data: N = 27,236
Model for number of visits to the doctor:
True E[V|Income] = exp(1.412 − .0745*income)
Linear regression: g*(Income) = 3.917 − .208*income
11-60/72
Part 11: Asymptotic Distribution Theory
A Nonlinear Model
E[docvis | x] = exp(β₁ + β₂ AGE + β₃ EDUC + ...)
11-61/72
Part 11: Asymptotic Distribution Theory
Interesting Partial Effects
Estimate Effects at the Means of the Data
Ê[docvis | x] = exp(b₁ + b₂ AGE + b₃ EDUC + ...)
∂Ê[docvis | x]/∂AGE    = exp(b₁ + b₂ AGE + b₃ EDUC + ...) b₂ = f_AGE(b₁, b₂, ..., b₆ | AGE, EDUC, ...)
∂Ê[docvis | x]/∂EDUC   = exp(b₁ + b₂ AGE + b₃ EDUC + ...) b₃ = f_EDUC(b₁, b₂, ..., b₆ | AGE, EDUC, ...)
∂Ê[docvis | x]/∂INCOME = exp(b₁ + b₂ AGE + b₃ EDUC + ...) b₆ = f_HHNINC(b₁, b₂, ..., b₆ | AGE, EDUC, ...)
11-62/72
Part 11: Asymptotic Distribution Theory
Necessary Derivatives (Jacobian)
∂Ê[docvis | x]/∂AGE = exp(b₁ + b₂ AGE + b₃ EDUC + ...) b₂ = f_AGE(b₁, b₂, ..., b₆ | AGE, EDUC, ...)
∂f_AGE/∂b₁ = ∂[b₂ exp(b₁ + b₂ AGE + b₃ EDUC + ...)]/∂b₁ = b₂ exp(...) × 1
∂f_AGE/∂b₂ = ∂[b₂ exp(...)]/∂b₂ = exp(...) + b₂ exp(...) × AGE
∂f_AGE/∂b₃ = ∂[b₂ exp(b₁ + b₂ AGE + b₃ EDUC + ...)]/∂b₃ = b₂ exp(...) × EDUC
∂f_AGE/∂b₄ = ∂[b₂ exp(...)]/∂b₄ = b₂ exp(...) × MARRIED
∂f_AGE/∂b₅ = ∂[b₂ exp(...)]/∂b₅ = b₂ exp(...) × FEMALE
∂f_AGE/∂b₆ = ∂[b₂ exp(...)]/∂b₆ = b₂ exp(...) × HHNINC
11-63/72
Part 11: Asymptotic Distribution Theory
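A Python sketch of this calculation for the AGE effect (the exponential mean function and the Jacobian follow the slide; the coefficient values, regressor means, and covariance matrix below are purely hypothetical placeholders):

import numpy as np

b = np.array([0.5, 0.02, 0.03, 0.1, -0.05, -0.2])          # hypothetical (b1, ..., b6)
xbar = np.array([1.0, 43.5, 11.3, 0.76, 0.48, 0.35])        # constant and means of the regressors
Vb = np.diag([1e-3, 1e-6, 2e-6, 1e-4, 1e-4, 5e-4])          # hypothetical Asy.Cov[b]

mu = np.exp(xbar @ b)                                       # E-hat[docvis | xbar]
pe_age = mu * b[1]                                          # partial effect of AGE at the means
grad = mu * b[1] * xbar                                     # chain rule: b2*exp(.)*x_k for each b_k ...
grad[1] += mu                                               # ... plus the extra exp(.) term for b2 itself
se_age = np.sqrt(grad @ Vb @ grad)                          # delta-method standard error
print(pe_age, se_age)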
11-64/72
Part 11: Asymptotic Distribution Theory
11-65/72
Part 11: Asymptotic Distribution Theory
Partial Effects at Means vs.
Mean of Partial Effects
Partial Effects at the Means:
δ(β, x̄) = ∂f(β | x̄)/∂x̄,  where x̄ = (1/n) Σ_{i=1}^n x_i
Mean of the Partial Effects:
δ̄(β, X) = (1/n) Σ_{i=1}^n ∂f(β | x_i)/∂x_i
Makes more sense for dummy variables, d:
δ_i(β, x_i, d) = f(β | x_i, d=1) − f(β | x_i, d=0)
δ̄(β, X, d) makes more sense than δ(β, x̄, d)
11-66/72
Part 11: Asymptotic Distribution Theory
Partial Effect for a Dummy Variable?
11-67/72
Part 11: Asymptotic Distribution Theory
Application: CES Function
11-68/72
Part 11: Asymptotic Distribution Theory
Application: CES Function
The structural CES parameters are recovered from the regression coefficients as
  γ = exp(β₁),  δ = β₂/(β₂ + β₃),  ν = β₂ + β₃,  ρ = β₄(β₂ + β₃)/(β₂ β₃)
G = ∂(γ, δ, ν, ρ)' / ∂(β₁, β₂, β₃, β₄)' =
  | exp(β₁)        0                   0                   0               |
  |   0       β₃/(β₂+β₃)²        −β₂/(β₂+β₃)²              0               |
  |   0            1                   1                   0               |
  |   0      −β₃β₄/(β₂²β₃)       −β₂β₄/(β₂β₃²)       (β₂+β₃)/(β₂β₃)        |
11-69/72
Part 11: Asymptotic Distribution Theory
Application: CES Function
Using Spanish Dairy Farm Data
Create   ; x1=1 ; x2=logcows ; x3=logfeed ; x4=-.5*(logcows-logfeed)^2$
Regress  ; lhs=logmilk ; rh1=x1,x2,x3,x4$
Calc     ; b1=b(1) ; b2=b(2) ; b3=b(3) ; b4=b(4) $
Calc     ; gamma=exp(b1) ; delta=b2/(b2+b3) ; nu=b2+b3
         ; rho=b4*(b2+b3)/(b2*b3)$
Calc     ; g11=exp(b1)           ; g12=0                ; g13=0                ; g14=0
         ; g21=0                 ; g22=b3/(b2+b3)^2     ; g23=-b2/(b2+b3)^2    ; g24=0
         ; g31=0                 ; g32=1                ; g33=1                ; g34=0
         ; g41=0                 ; g42=-b3*b4/(b2^2*b3) ; g43=-b2*b4/(b2*b3^2) ; g44=(b2+b3)/(b2*b3)$
Matrix   ; g=[g11,g12,g13,g14/g21,g22,g23,g24/g31,g32,g33,g34/g41,g42,g43,g44]$
Matrix   ; VDelta=G*VARB*G' $
Matrix   ; theta=[gamma/delta/nu/rho] ; Stat(theta,vdelta)$
11-70/72
Part 11: Asymptotic Distribution Theory
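For readers without the software used above, here is a Python sketch of the same delta-method step (a sketch only, under the assumption that `b` holds the four estimated coefficients and `VARB` their estimated covariance matrix from the regression of log milk on x1–x4):

import numpy as np

def ces_theta_and_cov(b, VARB):
    """Recover (gamma, delta, nu, rho) and their asymptotic covariance GVG'."""
    b1, b2, b3, b4 = b
    theta = np.array([np.exp(b1), b2 / (b2 + b3), b2 + b3, b4 * (b2 + b3) / (b2 * b3)])
    G = np.array([
        [np.exp(b1), 0.0,                  0.0,                  0.0],
        [0.0,        b3 / (b2 + b3) ** 2, -b2 / (b2 + b3) ** 2,  0.0],
        [0.0,        1.0,                  1.0,                  0.0],
        # row 4: note -b3*b4/(b2^2*b3) simplifies to -b4/b2^2, and similarly for the b3 term
        [0.0,       -b4 / b2 ** 2,        -b4 / b3 ** 2,        (b2 + b3) / (b2 * b3)],
    ])
    return theta, G @ VARB @ G.T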
Estimated CES Function
--------+---------------------------------------------------------------------
        |               Standard                 Prob.       95% Confidence
  Matrix| Coefficient     Error          z       |z|>Z*         Interval
--------+---------------------------------------------------------------------
 THETA_1|   105981***    475.2458     223.00     .0000      105049     106912
 THETA_2|   .56819***      .01286      44.19     .0000      .54299     .59340
 THETA_3|  1.06781***      .00864     123.54     .0000     1.05087    1.08475
 THETA_4|  -.31956**       .12857      -2.49     .0129     -.57155    -.06758
--------+---------------------------------------------------------------------
11-71/72
Part 11: Asymptotic Distribution Theory
Asymptotics for Least Squares
Looking Ahead…
Assumptions: Convergence of X'X/n (whether X is
stochastic or nonstochastic) and
convergence of X'ε/n to 0. Sufficient for
consistency.
Assumptions: Convergence of (1/√n)X'ε to a
normal vector gives asymptotic normality.
What about asymptotic efficiency?
11-72/72
Part 11: Asymptotic Distribution Theory