Descriptive Research Methodologies


ADVANCED ECONOMETRICS
LECTURE SLIDES, 2012 SPRING
The distinction between qualitative and
quantitative data
 The microeconomist’s data on sales will have a number
corresponding to each firm surveyed (e.g. last month’s sales
in the first company surveyed were £20,000). This is referred
to as quantitative data.
 The labor economist, when asking whether or not each
surveyed employee belongs to a union, receives either a Yes
or a No answer. These answers are referred to as qualitative
data. Such data arise often in economics when
choices are involved (e.g. the choice to buy or not buy a
product, to take public transport or a private car, to join or
not to join a club).
 Economists will usually convert these qualitative answers
into numeric data. For instance, the labor economist might
set Yes = 1 and No = 0. Hence, Y1 = 1 means that the first
individual surveyed does belong to a union, Y2 = 0 means that
the second individual does not. When variables can take on
only the values 0 or 1, they are referred to as dummy (or
binary) variables.
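As a small illustration (a sketch; the survey answers below are made up, not from the text), the Yes/No responses can be mapped to a 0/1 dummy variable in Python:

```python
import numpy as np

# Hypothetical survey answers on union membership (Yes/No), as discussed above.
answers = ["Yes", "No", "No", "Yes", "No"]

# Convert the qualitative answers into a 0/1 dummy variable: Yes = 1, No = 0.
union = np.array([1 if a == "Yes" else 0 for a in answers])
print(union)  # [1 0 0 1 0]
```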
What is
Descriptive Research?
 Involves gathering data that describe events, and then organizing, tabulating, depicting, and describing those data.
 Uses description as a tool to organize data into patterns that emerge during analysis.
 Often uses visual aids such as graphs and charts to aid the
reader
Descriptive Research
takes a “what is” approach
 What is the best way to provide access to computer equipment
in schools?
 Do teachers hold favorable attitudes toward using computers in
schools?
 What have been the reactions of school administrators to
technological innovations in teaching?
Descriptive Research
 We will want to see if a value or sample comes from a known
population. That is, if I were to give a new cancer treatment
to a group of patients, I would want to know if their survival
rate, for example, was different than the survival rate of
those who do not receive the new treatment. What we are
testing then is whether the sample patients who receive the
new treatment come from the population we already know
about (cancer patients without the treatment).
Economic and Econometric Models
 The model of economic behavior that has been derived from a theory is the economic model.
 After the unknown parameters have been estimated, using economic data for the variables and an appropriate econometric estimation method, one has obtained an econometric model.
 It is common to use Greek characters to denote the parameters.
 C = f(Y): economic model
 C = β1 + β2Y, e.g. C = 15.4 + 0.81Y: econometric model
Economic data
 Time series data, historical data
 Cross sectional data
 Panel data
Variables of an economic model
 Dependent variable
 Independent variable (also called explanatory or control variable)
The nature of economic variables can be endogenous, exogenous or lagged dependent.
 A variable is an endogenous variable if it is determined within the model. The dependent variable is therefore always an endogenous variable.
 Exogenous variables are determined outside the model.
 Lagged dependent (lagged endogenous) variables are predetermined in the model.
 The model need not consist of only one equation. If more equations have been specified to determine other endogenous variables of the system, the system is called a simultaneous equation model (SEM).
 If the number of equations is identical to the number of endogenous variables, the system of equations is called complete. A complete model can be solved for the endogenous variables.
 Static model
 A change of an explanatory variable in period t is fully reflected in the dependent variable in the same period t.
 Dynamic model
 The equation is called a dynamic model when lagged variables have been specified.
 Structural equations
 The equations of the economic model as specified by economic theory are called the structural equations.
 Reduced-form model
 A complete SEM can be solved for the endogenous variables. The solution is called the reduced-form model. The reduced form will be used to simulate a policy or to compute forecasts for the endogenous variables.
 Parameters and elasticities
 The parameters are estimated by an estimator, and the result is called an estimate.
 The log transformation is a popular transformation in econometric research because it removes non-linearities to some extent.
 Stochastic term
 A disturbance term is included at the right-hand side of the equation and is not observed.
 The right-hand side of the equation thus has two parts: the systematic part, which concerns the specification of variables based on economic theory, and the non-systematic part, which is the remaining random variation.
Applied quantitative economic research
 The deterministic assumptions
 These concern the specification of the economic model, i.e. the formulation of the null hypothesis about the relationship between the economic variables of interest. The basic specification of the model originates from economic theory. An important decision is made about the size of the model: whether one or more equations have to be specified.
 The choice of which variables have to be included in the model stems from economic theory.
 The availability and frequency of the data can influence the assumptions that have to be made.
 The mathematical form of the model has to be determined: linear or nonlinear. Linear is more convenient to analyze.
 Evaluation of the estimation results
 The evaluation concerns the verification of the validity and reliability of all the assumptions that have been made.
 A first evaluation is obtained by using common sense and economic knowledge. This is followed by testing the stochastic assumptions, using a normality test, autocorrelation tests, heteroscedasticity tests, etc. Looking at a plot of the residuals can be very informative about cycles or outliers.
 If the stochastic assumptions have not been rejected, the deterministic assumptions can be tested by using statistical tests of restrictions on parameters. The t-test and F-test can be performed, and the coefficient of determination R2 can be interpreted.
Hypothesis Testing
Purpose: make inferences about a population parameter by analyzing differences between observed sample statistics and the results one expects to obtain if some underlying assumption is true.

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

• Null hypothesis: H0: μ ≥ 12.5 Kbytes
• Alternative hypothesis: H1: μ < 12.5 Kbytes
If the null hypothesis is rejected, then the alternative hypothesis is accepted.
Hypothesis Testing
A sample of 50 files from a file system is selected. The sample mean is 12.3 Kbytes. The standard deviation is known to be 0.5 Kbytes.
H0: μ ≥ 12.5 Kbytes
H1: μ < 12.5 Kbytes
Confidence: 0.95
Critical value = NORMINV(0.05, 0, 1) = −1.645.
Region of non-rejection: Z ≥ −1.645.
Z = (12.3 − 12.5)/(0.5/√50) ≈ −2.83, which lies below the critical value, so H0 is rejected.
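A minimal Python sketch of this test (assuming scipy is available; the numbers are those quoted above):

```python
import math
from scipy.stats import norm

# File-size example from the text: n = 50, sample mean 12.3, known sigma 0.5, H0: mu >= 12.5.
n, xbar, mu0, sigma = 50, 12.3, 12.5, 0.5

z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic
crit = norm.ppf(0.05)                       # lower-tail critical value, about -1.645

print(f"Z = {z:.2f}, critical value = {crit:.3f}")
print("reject H0" if z < crit else "do not reject H0")
```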
Steps in Hypothesis Testing
1. State the null and alternative hypotheses.
2. Choose the level of significance α.
3. Choose the sample size n. Larger samples allow us to detect even small differences between sample statistics and true population parameters. For a given α, increasing n decreases β.
4. Choose the appropriate statistical technique and test statistic to use (Z or t).
5. Determine the critical values that divide the regions of rejection and non-rejection.
6. Collect the data and compute the sample mean and the appropriate test statistic (e.g., Z).
7. If the test statistic falls in the non-rejection region, H0 cannot be rejected. Else H0 is rejected.
Z test versus t test
 1. A Z-test is a statistical hypothesis test based on the normal distribution, while a t-test follows a Student's t-distribution.
2. A t-test is appropriate when you are handling small samples (n < 30), while a Z-test is appropriate when you are handling moderate to large samples (n > 30).
3. The t-test is more adaptable than the Z-test, since the Z-test often requires certain conditions to be reliable. Additionally, the t-test has many variants that will suit most needs.
4. t-tests are more commonly used than Z-tests.
5. Z-tests are preferred to t-tests when the standard deviation is known.
One tail versus two tail
 we were only looking at one “tail” of the distribution at a time (either




on the positive side above the mean or the negative side below the
mean). With two-tail tests we will look for unlikely events on both
sides of the mean (above and below) at the same time.
So, we have learned four critical values.
1-tail
2-tail
α = .05 1.64
1.96, -1.96
α = .01 2.33
2.58/-2.58
Notice that you have two critical values for a 2-tail test, both positive
and negative.You will have only one critical value for a one-tail test
(which could be negative).
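These critical values can be recomputed from the standard normal distribution; a small sketch (assuming scipy):

```python
from scipy.stats import norm

# Recomputing the critical values quoted above.
for alpha in (0.05, 0.01):
    one_tail = norm.ppf(1 - alpha)        # upper-tail critical value
    two_tail = norm.ppf(1 - alpha / 2)    # |critical value| for a two-tail test
    print(f"alpha={alpha}: one-tail {one_tail:.2f}, two-tail ±{two_tail:.2f}")
# alpha=0.05: one-tail 1.64, two-tail ±1.96
# alpha=0.01: one-tail 2.33, two-tail ±2.58
```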
Which factors affect the accuracy of the estimate β̂?
 1. Having more data points improves accuracy of estimation.
 2. Having smaller errors improves accuracy of estimation. Equivalently, if the SSR is small, or the variance of the errors is small, the accuracy of the estimation will be improved.
 3. Having a larger spread of values (i.e. a larger variance) of the explanatory variable (X) improves accuracy of estimation.
Calculating a confidence interval for β
 If the confidence interval is small, it indicates accuracy. Conversely, a large confidence interval indicates great uncertainty over β's true value. The confidence interval for β is

$$[\hat{\beta} - t_b s_b,\ \hat{\beta} + t_b s_b], \qquad \text{or equivalently} \qquad \hat{\beta} - t_b s_b \le \beta \le \hat{\beta} + t_b s_b$$

 sb is the standard deviation (standard error) of β̂.
 Large values of sb imply large uncertainty.
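A short sketch of such an interval computed for a simple regression slope, on simulated data (all names and numbers below are illustrative, not from the text):

```python
import numpy as np
from scipy.stats import t

# Simulated data for a simple regression y = alpha + beta*x + error.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)

n = len(x)
beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # OLS slope
alpha_hat = y.mean() - beta_hat * x.mean()                  # OLS intercept
resid = y - alpha_hat - beta_hat * x
s2 = resid @ resid / (n - 2)                                # error variance estimate
s_b = np.sqrt(s2 / ((n - 1) * np.var(x, ddof=1)))           # standard error of the slope

t_b = t.ppf(0.975, df=n - 2)                                # 95% two-sided critical value
print(f"95% CI for beta: [{beta_hat - t_b * s_b:.3f}, {beta_hat + t_b * s_b:.3f}]")
```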
Choosing a test by data type and design:

 One group: chi-square goodness of fit (nominal data); Wilcoxon signed rank test (ordinal data); one-group t-test (ordinal, interval, ratio data).
 Two unrelated groups: chi-square test (nominal); Wilcoxon rank sum test / Mann-Whitney test (ordinal); Student's t-test (interval, ratio).
 Two related groups: McNemar's test (nominal); Wilcoxon signed rank test (ordinal); paired Student's t-test (interval, ratio).
 K unrelated groups: chi-square test (nominal); Kruskal-Wallis one-way analysis of variance (ordinal); ANOVA (interval, ratio).
 K related groups: Friedman matched-samples test (ordinal); ANOVA with repeated measurements (interval, ratio).

The chi-square, Wilcoxon, Mann-Whitney, Kruskal-Wallis and Friedman tests are nonparametric; the t-tests and ANOVA are parametric.
Hypothesis testing involving R2: the F-statistic
 R2 is a measure of how well the regression line fits the data or, equivalently, of the proportion of the variability in Y that can be explained by X. If R2 = 0 then X does not have any explanatory power for Y. The test of the hypothesis R2 = 0 can therefore be interpreted as a test of whether the regression explains anything at all:

$$F = \frac{(N-2)\,R^2}{1 - R^2}$$

 We use the P-value to decide what is "large" and what is "small" (i.e. whether R2 is significantly different from zero or not).
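A small sketch of this test (illustrative R2 and N, assuming scipy for the P-value):

```python
from scipy.stats import f as f_dist

# Illustrative numbers (assumed, not from the text): R^2 = 0.25 from N = 30 observations.
N, R2 = 30, 0.25
F = (N - 2) * R2 / (1 - R2)
p_value = f_dist.sf(F, 1, N - 2)   # upper-tail probability of an F(1, N-2) distribution
print(f"F = {F:.2f}, p-value = {p_value:.4f}")
```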
F-Test
 Usage of the F-test
 We use the F-test to evaluate hypotheses that involve multiple parameters. Let's use a simple setup:
 Yi = β0 + β1X1i + β2X2i + β3X3i + εi
F-Test
 For example, if we wanted to know how economic policy affects economic growth, we may include several policy instruments (balanced budgets, inflation, trade openness, etc.) and see if all of those policies are jointly significant. After all, our theories rarely tell us which single variable is important, but rather point to a broad category of variables.
F-statistic
 The test is performed according to the following strategy:
 1. If Significance F is less than 5% (i.e. 0.05), we
conclude R2 ≠ 0.
 2. If Significance F is greater than 5% (i.e. 0.05), we
conclude R2 = 0.
Multiple regression model
 The model is interpreted as describing the conditional expected wage of an individual given his or her gender, years of schooling and experience:

$$wage_i = \beta_1 + \beta_2\, male_i + \beta_3\, school_i + \beta_4\, exper_i + \varepsilon_i$$

 The coefficient β2 for malei measures the difference in expected wage between a male and a female with the same schooling and experience.
 The coefficient β3 for schooli gives the expected wage difference between two individuals with the same experience and gender where one has one additional year of schooling.
 The coefficients in a multiple regression model can only be interpreted under a ceteris paribus condition.
 Estimation by OLS gives the following results:
 Dependent variable: wage

Variable    Estimate    Standard error    t-ratio
Constant    -3.38       0.4650            -7.2692
Male        1.3444      0.1077            12.4853
School      0.6388      0.0328            19.4780
Exper       0.1248      0.0238            5.2530

s = 3.0462, R2 = 0.1326, F = 167.63
The coefficient for malei suggests that if we compare an arbitrary male and female with the same years of schooling and experience, the expected wage differential is $1.34 with standard error 0.1077, which is statistically highly significant.
The null hypothesis that schooling has no effect on a person's wage, given gender and experience, can be tested using the t-test with a test statistic of 19.48.
The estimated wage increase from one additional year of schooling, keeping years of experience fixed, is $0.64.
The joint hypothesis that all three partial slope coefficients are zero, that is, that wages are not affected by gender, schooling or experience, has to be rejected as well.
 R2 is 0.1326, which means that the model is able to explain 13.3% of the within-sample variation in wages.
 A joint test of the hypothesis that the two variables, schooling and experience, both have zero coefficients can be performed with the F-test (reproduced in the sketch below):

$$F = \frac{(0.1326 - 0.0317)/2}{(1 - 0.1326)/(3294 - 4)} = 191.35$$

 With a 5% critical value of 3.0, the null hypothesis is obviously rejected. We can thus conclude that the model including gender, schooling and experience performs significantly better than a model which only includes gender.
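A minimal sketch of this computation from the reported R2 values:

```python
# Joint F-test from the text: full model (gender, schooling, experience) vs. gender only.
R2_1, R2_0 = 0.1326, 0.0317
J, N, K = 2, 3294, 4          # J restrictions, N observations, K parameters in the full model

F = ((R2_1 - R2_0) / J) / ((1 - R2_1) / (N - K))
print(f"F = {F:.2f}")          # about 191.35
```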
Hypothesis testing
 When a hypothesis is statistically tested, two types of errors can be made. The first is that we reject the null hypothesis while it is actually true (type I error). The second is that the null hypothesis is not rejected while the alternative is true (type II error).
 The probability of a type I error is directly controlled by the researcher
through his choice of the significance level α. When a test is performed
at the 5% level, the probability of rejecting the null hypothesis while it is
true is 5%.
 The probability of a type II error depends upon the true parameter
values. If the truth deviates much from the stated null hypothesis, the
probability of such an error will be relatively small, while it will be quite
large if the null hypothesis is close to the truth.
 The probability of rejecting the null hypothesis when it is false, is known
as the power of the test. It indicates how powerful a test is in finding
deviations from the null hypothesis
Ordinary Least Squares (OLS)

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \varepsilon_i$$

 Objective of OLS: minimize the sum of squared residuals,

$$\min_{\hat{\beta}} \sum_{i=1}^{n} e_i^2, \qquad \text{where } e_i = Y_i - \hat{Y}_i$$
 Remember that OLS is not the only possible estimator of the
βs.
 But OLS is the best estimator under certain assumptions…
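A minimal sketch of OLS as this minimization problem, on simulated data (names are illustrative):

```python
import numpy as np

# Simulated data: a constant plus two regressors.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(size=n)

# OLS estimate: the coefficient vector that minimizes the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
print("beta_hat:", beta_hat.round(3))
print("SSR:", round(residuals @ residuals, 2))
```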
CAPM
 Expected returns on individual assets are linearly related to the expected return on the market portfolio.

$$r_{jt} - r_f = \beta_j (r_{mt} - r_f) + \varepsilon_{jt} \qquad \text{(regression without an intercept)}$$
CAPM regression without intercept
Dependent variable: excess industry portfolio returns

Industry        Excess market return    Adj. R2    s
food            0.790 (0.028)           0.601      2.902
durables        1.113 (0.029)           0.741      2.959
construction    1.156 (0.025)           0.804      2.570

(standard errors in parentheses)
CAPM regression with intercept
Dependent variable: excess industry portfolio returns

Industry        Constant          Excess market return    Adj. R2    s
food            0.339 (1.128)     0.783 (0.028)           0.598      2.885
durables        0.064 (0.131)     1.111 (0.029)           0.739      2.961
construction    -0.053 (0.114)    1.157 (0.025)           0.803      2.572

(standard errors in parentheses)
CAPM regression with intercept and January dummy
Dependent variable: excess industry portfolio returns

Industry        Constant          January dummy     Excess market return    Adj. R2    s
food            0.417 (0.133)     -0.956 (0.456)    0.788 (0.028)           0.601      2.876
durables        0.069 (0.137)     -0.063 (0.473)    1.112 (0.029)           0.739      2.964
construction    -0.094 (0.118)    0.498 (0.411)     1.155 (0.025)           0.804      2.571

(standard errors in parentheses)
OLS Assumptions
 But OLS is the best estimator under certain assumptions…
 1. Regression is linear in parameters
 2. Error term has zero population mean
 3. Error term is not correlated with X's (exogeneity)
 4. No serial correlation
 5. No heteroskedasticity
 6. No perfect multicollinearity
and we usually add:
 7. Error term is normally distributed
Exogeneity
 All explanatory variables are uncorrelated with the error term:
 E(εi | X1i, X2i, …, XKi) = 0
 Explanatory variables are determined outside of the model (they are exogenous).
 What happens if assumption 3 is violated?
 Suppose we have the model Yi = β0 + β1Xi + εi, and suppose Xi and εi are positively correlated: when Xi is large, εi tends to be large as well.
Exogeneity
 We estimate the relationship using the following model:
 salesi= β0+β1pricei+εi
 What’s the problem?
Assumption 3: Exogeneity
 What’s the problem?
 What else determines sales of hamburgers?
 How would you decide between buying a burger at McDonald’s
($0.89) or a burger at TGI Fridays ($9.99)?
 Quality differs
 salesi = β0 + β1 pricei + εi → quality isn't an X variable even though it should be.
 It becomes part of εi
Assumption 3: Exogeneity
 What’s the problem?
 But price and quality are highly positively correlated
 Therefore x and ε are also positively correlated.
 This means that the estimate of β1will be too high
 This is called “Omitted Variables Bias” (More in Chapter 6)
Serial Correlation
 Serial correlation: the error terms across observations are correlated with each other,
 i.e. ε1 is correlated with ε2, etc.
 This is most important in time series.
 If errors are serially correlated, an increase in the error term in one time period affects the error term in the next.
 Homoskedasticity: the error has a constant variance.
 This is what we want… as opposed to
 Heteroskedasticity: the variance of the error depends on the values of the Xs.
Perfect Multicollinearity
 Two variables are perfectly collinear if one can be
determined perfectly from the other (i.e. if you know the
value of x, you can always find the value of z).
 Example: If we regress income on age, and include both age
in months and age in years.
Adjusted/Corrected R2
 R2 = SSR/SST. As before, R2 measures the proportion of the sum of squares of deviations of Y that can be explained by the relationship we have fitted using the explanatory variables.
 Note that adding regressors can never cause R2 to decrease, even if the regressors do not seem to have a significant effect on the response of Y.
 Adjusted (sometimes called "corrected") R2 takes into account the number of regressors included in the model; in effect, it penalizes us for adding regressors that don't "contribute their part" to explaining the response variable.
 Adjusted R2 is given by the following, where k is the number of regressors:

$$\bar{R}^2 = \frac{(n-1)R^2 - k}{n - k - 1}$$
Interpreting Regression Results
 β measures the expected change in Yi if Xi changes by one unit but all other variables in Xi do not change:

$$Y_i = x_i'\beta + \varepsilon_i$$

 In a multiple regression model single coefficients can only be interpreted under ceteris paribus conditions.
 It is not possible to interpret a single coefficient in a regression model without knowing what the other variables in the model are.
 The other variables in the model are called control variables.
 Economists are often interested in elasticities rather than marginal effects. An elasticity measures the relative change in the dependent variable due to a relative change in one of the Xi variables.
 The linear model implies that elasticities are nonconstant and vary with Xi, while the loglinear model imposes constant elasticities.
 Explaining log Yi rather than Yi may help to reduce heteroscedasticity problems.
 If Xi is a dummy variable (or another variable that may take negative values) we cannot take its logarithm, and we include the original variable in the model. Thus we estimate

$$\log Y_i = x_i'\beta + \varepsilon_i$$

 It is possible to include some explanatory variables in logs and some in levels. The interpretation of a coefficient β is then the relative change in Yi due to an absolute change of one unit in Xi. If Xi is a dummy variable for males, β is the (ceteris paribus) relative wage differential between men and women.
 Selecting regressors
 What happens when a relevant variable is excluded from the model, and what happens when an irrelevant variable is included in the model?
 Omitted variable bias
 Including irrelevant variables in your model, even though they have a zero coefficient, will typically increase the variance of the estimators for the other model parameters, while including too few variables has the danger of biased estimates.
 To find potentially relevant variables we can use economic theory.
 General-to-specific modelling approach (the LSE methodology): this approach starts by estimating a general unrestricted model (GUM), which is subsequently reduced in size and complexity by testing restrictions.
 In presenting your estimation results, it is not a sin to have insignificant variables included in your specification.
 Be careful: including many variables in your model can cause multicollinearity, so that in the end almost none of the variables appears individually significant.
 Another way to select a set of regressors: R2 measures the proportion of the sample variation in Yi that is explained by variation in Xi.
 If we were to extend the model by including Zi in the set of regressors, the explained variation would never decrease, so R2 will never decrease when we include additional variables in the model.
 This is not an optimal criterion, because with too many variables we will not be able to say very much about the model's coefficients; R2 does not punish the inclusion of many variables.
 It is better to trade off goodness of fit against the number of regressors employed in the model: use the adjusted R2.
 Adjusted R2 provides a trade-off between goodness of fit, as measured by the residual sum of squares, and the simplicity of the model, as measured by the number of parameters:

$$\bar{R}^2 = 1 - \frac{\frac{1}{N-K}\sum_{i=1}^{N} e_i^2}{\frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y})^2}$$
 Akaike's Information Criterion (AIC)
 Schwarz's Bayesian Information Criterion (BIC)
 Models with lower AIC and BIC are preferred.
 It is possible to test whether the increase in R2 is statistically significant, using the appropriate F-statistic:

$$F = \frac{(R_1^2 - R_0^2)/J}{(1 - R_1^2)/(N - K)}$$

where J is the number of added variables and N − K the degrees of freedom.
 If we want to drop two variables from the model at the same time, we should be looking at a joint test (F or Wald test) rather than at two separate t-tests. Once the first variable is omitted from the model, the second one may appear significant. This is of importance if collinearity exists between the two variables.
Comparing non-nested models
 Both models are interpreted as describing the conditional expectation of yi given xi and zi:

$$y_i = x_i'\beta + \varepsilon_i, \qquad y_i = z_i'\gamma + v_i$$

 The two models are non-nested if zi includes a variable that is not in xi and vice versa. Because both models explain the same endogenous variable, it is possible to use the adjusted R2, AIC and BIC criteria. An alternative and more formal way to compare the two models is that of encompassing.
 If model A is believed to be the correct model it must be able to encompass model B, that is, it must be able to explain model B's results. If model A is unable to do so, it has to be rejected.
 There are two alternatives: the non-nested F-test and the J-test.
 Compared to the non-nested F-test, the J-test involves only one restriction. This means that the J-test may be more powerful if the number of additional regressors in the non-nested F-test is large. If the non-nested F-test involves only one additional regressor, it is equivalent to the J-test.
 Two alternative models that are not nested also arise in the choice between a linear and a loglinear functional form. Because the dependent variables are different, comparison of R2, AIC and BIC is inappropriate.
 One way to test between them is the Box-Cox transformation.
Misspecifying the functional form
 Nonlinear models
 Nonlinearity can arise in two ways. 1. The model is still linear in the parameters but nonlinear in its explanatory variables. We include nonlinear functions of xi as additional explanatory variables; for example, the variables agei2 and agei·malei could be included in an individual wage equation. The resulting model is still linear in the parameters and can still be estimated by OLS.
 2. The model is nonlinear in the parameters; such models can be estimated by a nonlinear version of the least squares method.
Testing the Functional Form
 To test whether the additional nonlinear terms in xi are significant, one can use a t-test, F-test or Wald test,
 or the RESET test (regression equation specification error test).
Testing for a structural break
 The coefficients in the model may be different before and after a major change in macroeconomic policy. The change in regression coefficients is referred to as a structural break.
 A convenient way to express the general specification is given by

$$y_i = x_i'\beta + g_i x_i'\gamma + \varepsilon_i$$

 The specification consists of two groups, indicated by gi = 0 and gi = 1.
 The null hypothesis is γ = 0, in which case the model reduces to the restricted model.
 A first way to test γ = 0 is obtained by using the F-test

$$F = \frac{(S_R - S_{UR})/K}{S_{UR}/(N - 2K)}$$

 where K is the number of regressors in the restricted model (including the intercept), and SUR and SR are the residual sums of squares of the unrestricted and restricted model.
 The above F-test is referred to as the Chow test for structural change.

 OLS results, hedonic price function
 Dependent variable: log(price)

Variable            Estimate    Standard error    t-ratio
Constant            7.094       0.232             30.636
Log(lot size)       0.400       0.028             14.397
Bedrooms            0.078       0.015             5.017
Bathrooms           0.216       0.023             9.386
Air conditioning    0.212       0.024             8.923

s = 0.2456, R2 = 0.5674, Adj. R2 = 0.5642, F = 177.41
 OLS results, hedonic price function (extended specification)
 Dependent variable: log(price)

Variable              Estimate    Standard error    t-ratio
Constant              7.745       0.216             35.801
Log(lot size)         0.303       0.027             11.356
Bedrooms              0.034       0.014             2.410
Bathrooms             0.166       0.020             8.154
Air conditioning      0.166       0.021             7.799
Driveway              0.110       0.028             3.904
Recreational room     0.058       0.026             2.225
Full basement         0.104       0.022             4.817
Gas for hot water     0.179       0.044             4.079
Garage places         0.048       0.011             4.178
Preferred area        0.132       0.023             5.816
Stories               0.092       0.013             7.268

s = 0.2104, R2 = 0.6865, Adj. R2 = 0.6801, F = 106.33

F-test comparing this specification with the previous one (7 added regressors):

$$F = \frac{(0.6865 - 0.5674)/7}{(1 - 0.6865)/(546 - 12)} = 28.99$$
 OLS results, hedonic price function (linear model)
 Dependent variable: price

Variable              Estimate    Standard error    t-ratio
Constant              -4038.35    3409              -1.184
Lot size              3.546       0.350             10.124
Bedrooms              1832        1047              1.75
Bathrooms             14335       1489              9.622
Air conditioning      12632       1555              8.124
Driveway              6687        2045              3.27
Recreational room     4511        1899              2.374
Full basement         5452        1588              3.433
Gas for hot water     12831       3217              3.988
Garage places         4244        840               5.050
Preferred area        9369        1669              5.614
Stories               6556        925               7.086

s = 15423, R2 = 0.6731, Adj. R2 = 0.6664, F = 99.97
Heteroscedasticity
 Remember the Gauss-Markov assumptions – BLUE
 Three ways of handling the problems of heteroscedasticity and autocorrelation:
 Derivation of an alternative estimator that is best linear unbiased
 Sticking to the OLS estimator but adjusting the standard errors to allow for heteroscedasticity and/or autocorrelation
 Modifying your model specification
Deriving an alternative estimator
 We transform the model such that it satisfies the Gauss-Markov conditions again
 We obtain error terms that are homoscedastic and exhibit no autocorrelation
 This estimator is referred to as the generalized least squares (GLS) estimator.
 Hypothesis testing (pp. 84-85)
 Testing for heteroscedasticity
 Testing equality of two unknown variances:
 H0: σA2 = σB2   against   H1: σA2 ≠ σB2
 For a one-sided test: H0: σA2 ≤ σB2 against H1: σA2 > σB2
 Goldfeld-Quandt test
 The Breusch-Pagan test – a Lagrange multiplier test for heteroscedasticity
 The White test – the most common one, but it has limited power against a large number of alternatives
 Multiplicative heteroscedasticity – has more power, but only against a limited number of alternatives.
Illustration
 Labour: total employment
 Capital: total fixed assets
 Wage: total wage costs divided by the number of workers
 Output: value added
OLS results, linear model
Dependent variable: labour

Variable    Estimate    Standard error    t-ratio
Constant    287.72      19.64             14.648
Wage        -6.742      0.501             -13.446
Output      15.40       0.356             43.304
Capital     -4.590      0.269             -17.067

s = 156.26, R2 = 0.9352, Adj. R2 = 0.9348, F = 2716.02
 Auxiliary regression, Breusch-Pagan test
 Dependent variable: ei2

Variable    Estimate      Standard error    t-ratio
Constant    -22719.51     11838.88          -1.919
Wage        228.86        302.22            0.757
Output      5362.21       214.35            25.015
Capital     -3543.51      162.12            -21.858

s = 94182, R2 = 0.5818, Adj. R2 = 0.5796, F = 262.05

 High t-ratios and a relatively high R2 indicate that the error variance is unlikely to be constant.
 Breusch-Pagan test statistic = N × R2 = 569 × 0.5818 = 331.0.
 The asymptotic distribution under the null hypothesis is a chi-squared with 3 degrees of freedom, so this implies rejection of homoscedasticity (see the sketch below).
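A small sketch reproducing this Breusch-Pagan computation and its chi-squared comparison (assuming scipy):

```python
from scipy.stats import chi2

# Breusch-Pagan statistic from the auxiliary regression reported above:
# N observations times the R-squared of the regression of squared residuals on the regressors.
N, R2_aux, k = 569, 0.5818, 3
bp_stat = N * R2_aux
p_value = chi2.sf(bp_stat, df=k)
print(f"BP = {bp_stat:.1f}, 5% critical value = {chi2.ppf(0.95, k):.2f}, p-value = {p_value:.2e}")
```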
 Using logarithms may alleviate this problem.
 A first step in handling the heteroscedasticity problem is to consider a loglinear model; if the production function is of the Cobb-Douglas type, Q = AKαLβ.

OLS results, loglinear model
Dependent variable: log(labour)

Variable        Estimate    Standard error    t-ratio
Constant        6.117       0.246             25.089
log(Wage)       -0.928      0.071             -12.993
log(Output)     0.990       0.026             37.487
log(Capital)    -0.004      0.019             -0.197

s = 0.465, R2 = 0.8430, Adj. R2 = 0.8421, F = 1011.02
 A more general test is the White test: run an auxiliary regression of the squared OLS residuals upon all original regressors, their squares and their cross-products.

Auxiliary regression, White test
Dependent variable: ei2

Variable                     Estimate    Standard error    t-ratio
Constant                     2.545       3.003             0.847
log(Wage)                    -1.299      1.753             -0.741
log(Output)                  -0.904      0.560             -1.614
log(Capital)                 1.142       0.376             3.039
log2(Wage)                   0.193       0.259             0.744
log2(Output)                 0.138       0.036             3.877
log2(Capital)                0.090       0.014             6.401
log(Wage)·log(Output)        0.138       0.163             0.849
log(Wage)·log(Capital)       -0.252      0.105             -2.399
log(Output)·log(Capital)     -0.192      0.037             -5.197

s = 0.851, R2 = 0.1029, Adj. R2 = 0.0884, F = 7.12
Auxiliary regression, multiplicative heteroscedasticity
Dependent variable: log(ei2)

Variable        Estimate    Standard error    t-ratio
Constant        -3.254      1.185             -2.745
log(Wage)       -0.061      0.344             -0.178
log(Output)     0.267       0.127             2.099
log(Capital)    -0.331      0.090             -3.659

s = 2.241, R2 = 0.0245, Adj. R2 = 0.0193, F = 4.73

The F-value leads to rejection of the null hypothesis of homoscedasticity.
Autocorrelation
 When two or more consecutive error terms are correlated
 OLS remains unbiased, but it becomes inefficient and its standard errors are estimated in the wrong way
 Occurs mostly when using time series data
 First order autocorrelation: the error term is assumed to depend upon its predecessor as follows

$$\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t$$

 where ut is an error term with mean zero and constant variance that exhibits no serial correlation.
 If |ρ| < 1, the first order autoregressive process is stationary (the means, variances and covariances of εt do not change over time).
Testing for first order autocorrelation
 When ρ = 0 no autocorrelation is present
 Breusch-Godfrey Lagrange multiplier test
 Durbin-Watson test

$$dw = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2 - 2\rho$$

 A dw close to 2 indicates that the first order autocorrelation coefficient ρ is close to 0; if dw is much smaller than 2, this is an indication of positive autocorrelation.
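A minimal sketch of the Durbin-Watson statistic on simulated, positively autocorrelated residuals (illustrative data, not from the text):

```python
import numpy as np

def durbin_watson(e: np.ndarray) -> float:
    """Durbin-Watson statistic; values near 2 suggest no first order autocorrelation,
    values well below 2 suggest positive autocorrelation."""
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Simulated residuals with positive first order autocorrelation (rho = 0.7).
rng = np.random.default_rng(2)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.7 * e[t - 1] + rng.normal()
print(round(durbin_watson(e), 2))   # noticeably below 2, consistent with dw ≈ 2 - 2*rho
```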
What to do when you find autocorrelation
 In many cases a finding of autocorrelation is an indication that the model is misspecified. The most natural response is not to change your estimator from OLS to EGLS but to change your model.
 Three types of misspecification may lead to a finding of autocorrelation in your OLS residuals:
 Dynamic misspecification
 Omitted variables
 Functional form misspecification
Endogeneity
 Some of the explanatory variables can be correlated with the error term, such that the OLS estimator is biased and inconsistent.
 Autocorrelation with a lagged dependent variable:

$$y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \rho\,\varepsilon_{t-1} + v_t$$

 so that yt = β1 + β2xt + β3yt−1 + ρεt−1 + vt, and yt−1 is correlated with εt−1.
 If ρ ≠ 0, OLS no longer yields consistent estimators.
 A possible solution is the use of maximum likelihood or instrumental variables techniques.
 The Durbin-Watson test is not valid in this case.
 Use the Breusch-Godfrey Lagrange multiplier test for autocorrelation.
 Remedies: maximum likelihood, instrumental variables.
Simultaneity
 When the model of interest contains behavioural parameter, usually
measuring the causal effects of changes in the explanatory variables,
and one or more of these explanatory variables are jointly
determined with the left hand side variable.
Illustration
The instrumental variables estimator
 It is clear that people with more education have higher wages. It is less clear whether this positive correlation reflects a causal effect of schooling, or whether individuals with a greater earnings capacity have chosen more years of schooling. If the latter possibility is true, the OLS estimates of the return to schooling simply reflect differences in unobserved characteristics of working individuals, and an increase in a person's schooling due to an exogenous shock will have no effect on this person's wage.
 The problem is estimating the causal effect of schooling upon earnings.
 Most studies are based upon the human capital earnings function

$$w_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + \varepsilon_i$$

 wi denotes the log of individual earnings, Si denotes years of schooling and Ei denotes years of experience. In the absence of information on actual experience, Ei is sometimes replaced by potential experience, measured as age − Si − 6.
 Which variable can serve as an instrument? An instrument is thought of as a variable that affects the cost of schooling, and thus the choice of schooling, but not earnings. There is a long tradition of using family background variables, e.g. parents' education, as instruments.
 The table below reports the results of an OLS regression of an individual's log hourly wage upon years of schooling, experience and experience squared, and three dummy variables indicating whether the individual was black, lived in a metropolitan area (smsa) and lived in the south.
 Wage equation estimated by OLS
 Dependent variable: log(wage)

Variable     Estimate    Standard error    t-ratio
Constant     4.7337      0.0676            70.022
Schooling    0.0740      0.0035            21.113
Exper        0.0836      0.0066            12.575
Exper2       -0.0022     0.0003            -7.050
Black        -0.1896     0.0176            -10.758
Smsa         0.1614      0.0156            10.365
South        -0.1249     0.0151            -8.259

s = 0.374, R2 = 0.2905, Adj. R2 = 0.2891, F = 204.93
 Reduced form for schooling, estimated by OLS
 Dependent variable: schooling

Variable            Estimate    Standard error    t-ratio
Constant            -1.8695     4.2984            -0.435
Age                 1.0614      0.3014            3.522
Age2                -0.0188     0.0052            -3.386
Black               -1.4684     0.1154            -12.719
Smsa                0.8354      0.1093            7.647
South               -0.4597     0.1024            -4.488
Lived near coll.    0.3471      0.1070            3.244

s = 2.5158, R2 = 0.1185, Adj. R2 = 0.1168, F = 67.29
 Wage equation estimated by IV
 Dependent variable: log(wage)

Variable     Estimate    Standard error    t-ratio
Constant     4.0656      0.6085            6.682
Schooling    0.1329      0.0514            2.588
Exper        0.0560      0.0260            2.153
Exper2       -0.0008     0.0013            -0.594
Black        -0.1031     0.0774            -1.133
Smsa         0.1080      0.0050            2.171
South        -0.0982     0.0288            -3.413

Instruments: age, age2, lived near college (used for exper, exper2 and schooling).
 OLS underestimates the true causal effect of schooling.
2SLS
GMM
 This approach estimates the model parameters directly from the moment conditions that are imposed by the model.
 These conditions can be linear in the parameters but quite often are nonlinear.
 The number of moment conditions should be at least as large as the number of unknown parameters.
 The great advantages of GMM are that
 1. it doesn't require distributional assumptions, like normality,
 2. it can allow for heteroscedasticity of unknown form, and
 3. it can estimate parameters even if the model cannot be solved analytically from the first order conditions.
 Consider a linear model

$$y_i = x_i'\beta + \varepsilon_i$$

 With instrument vector zi the moment conditions are

$$E\{\varepsilon_i z_i\} = E\{(y_i - x_i'\beta) z_i\} = 0$$

 If εi is i.i.d., the optimal GMM estimator is the instrumental variables estimator (a sketch follows below).
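A minimal sketch of the just-identified case, where solving the sample moment conditions gives the familiar IV estimator (simulated data; all names are illustrative):

```python
import numpy as np

# Just-identified IV: beta_IV = (Z'X)^{-1} Z'y, the value of beta that sets the sample
# moment conditions (1/N) * Z'(y - X b) to zero.
rng = np.random.default_rng(3)
N = 5000
z = rng.normal(size=N)                       # instrument
u = rng.normal(size=N)                       # error term
x = 0.8 * z + 0.5 * u + rng.normal(size=N)   # regressor correlated with the error
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # inconsistent here (x is endogenous)
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # consistent under the moment conditions
print("OLS:", beta_ols.round(2), "IV:", beta_iv.round(2))
```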
 Consider the consumption-based asset pricing model. Assume that there are risky assets as well as a riskless asset with a certain return.
 One riskless asset and ten risky portfolios provide eleven moment conditions with only two parameters to estimate.
 These parameters can be estimated using the identity matrix as a (suboptimal) weighting matrix, using the efficient two-step GMM estimator, or using the iterated GMM estimator.
 The following table presents the estimation results on the basis of the monthly returns, using one-step GMM and iterated GMM.
 GMM estimation results, consumption-based asset pricing model

              One-step GMM               Iterated GMM
              Estimate     s.e.          Estimate     s.e.
δ             0.6996       (0.1436)      0.8273       (0.1162)
γ             91.4097      (38.1178)     57.3992      (34.2203)
ξ (df = 9)    4.401        (p = 0.88)    5.685        (p = 0.77)
 The γ estimates are huge and rather imprecise. For the iterated GMM procedure, for example, a 95% confidence interval for γ based on the approximate normal distribution is as large as (−9.67, 124.47). The estimated risk aversion coefficients of 57.4 and 91.4 are much higher than what is considered economically plausible.
 To investigate the economic value of the model, it is possible to compute pricing errors.
 One can directly compute the average expected excess return according to the model, simply by replacing the population moments by the corresponding sample moments and using the estimated values for δ and γ. On the other hand, the average excess return on asset i can be computed from the data. (Verbeek p. 157)
Maximum Likelihood
 Makes stronger assumptions than the GMM approach.
 It assumes knowledge of the entire distribution, not just of a number of moments.
 If the distribution of a variable yi, conditional upon a number of variables xi, is known up to a small number of unknown coefficients, we can use this to estimate these unknown parameters by choosing them in such a way that the resulting distribution corresponds as well as possible to the observed data.
 The conditional distribution of an observed phenomenon (the endogenous variable) is known, except for a finite number of unknown parameters.
 These parameters are estimated by taking those values for them that give the observed values the highest probability, the highest likelihood.
Specification Tests
 The Wald, likelihood ratio (LR) and Lagrange multiplier (LM) principles.
 The Wald test is generally applicable to any estimator that is consistent and asymptotically normal.
 The likelihood ratio (LR) principle provides an easy way to compare two alternative nested models.
 The Lagrange multiplier (LM) test allows one to test restrictions that are imposed in estimation. This makes the LM approach particularly suited for misspecification tests, where a chosen specification of the model is tested for misspecification in several directions (like heteroscedasticity, non-normality or omitted variables).
 Suppose that we have a sample of N = 100 balls, of which 44% are red. If we test the hypothesis that p = 0.5, we obtain Wald, LR and LM test statistics of 1.46, 1.44 and 1.44 respectively. The 5% critical value taken from the asymptotic chi-squared distribution with one degree of freedom is 3.84, so the null hypothesis is not rejected at the 5% level by any of the three tests.
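A small sketch reproducing the three statistics for this example:

```python
import numpy as np

# N = 100 balls, 44 red, testing H0: p = 0.5 (the example above).
N, k, p0 = 100, 44, 0.5
p_hat = k / N

wald = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / N)   # uses the unrestricted variance
lm = (p_hat - p0) ** 2 / (p0 * (1 - p0) / N)           # uses the restricted variance
lr = 2 * (k * np.log(p_hat / p0) + (N - k) * np.log((1 - p_hat) / (1 - p0)))

print(round(wald, 2), round(lr, 2), round(lm, 2))      # 1.46 1.44 1.44
```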
Testing for Omitted Variables in Non-linear Models

$$y_i = x_i'\beta + z_i'\gamma + \varepsilon_i$$

 zi is a J-dimensional vector of explanatory variables, independent of εi. The null hypothesis is H0: γ = 0. Note that under the assumptions above the F-test provides an exact test for γ = 0, and there is no real need to look at asymptotic tests.
 We can compute the LM test for nonlinear models.
 Testing for heteroscedasticity
 The squared OLS or maximum likelihood residuals
 Testing for autocorrelation
 Auxiliary regression of the OLS or ML.
 Quasi-maximum likelihood and moment conditions tests
 Conditional moment test
 Information matrix test
 Testing for normality
Models with limited dependent variables
 Binary choice models
 To overcome the problems with the linear model, there exists a class of binary choice models:
 Probit model
 Logit model
 Linear probability model
An underlying latent model
 It is possible to derive a binary choice model from underlying behavioural assumptions. This leads to a latent variable representation of the model:

$$y_i^* = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim NID(0,1)$$

 Because yi* is unobserved, it is referred to as a latent variable.
 Our assumption is that an individual chooses to work if the utility difference exceeds a certain threshold level, which can be set to zero without loss of generality.
 Consequently, we observe yi = 1 (job) if and only if yi* > 0, and yi = 0 (no job) otherwise. Thus we have

$$P\{y_i = 1\} = P\{y_i^* > 0\} = P\{x_i'\beta + \varepsilon_i > 0\} = P\{-\varepsilon_i \le x_i'\beta\} = F(x_i'\beta)$$

 where εi is independent of xi. With the standard normal distribution this gives the probit model; for the logit model the normal distribution is replaced by the standard logistic one. Most commonly, the parameters in binary choice models (or limited dependent variable models in general) are estimated by the method of maximum likelihood.
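A minimal sketch of probit estimation by maximum likelihood on simulated data (assuming scipy; names and parameter values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated latent-variable data: y = 1 if x'beta + eps > 0, with standard normal eps.
rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)   # observed 0/1 outcome

def neg_loglik(beta):
    p = norm.cdf(X @ beta)                                   # P{y_i = 1} = F(x_i' beta)
    p = np.clip(p, 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x.round(2))   # should be close to beta_true
```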
 Goodness of fit
 There is no single, generally accepted measure of goodness of fit in binary choice models; such measures compare the model with a model that contains only a constant as explanatory variable.
 There are two goodness-of-fit measures defined in the literature.
 An alternative way to evaluate goodness of fit is comparing correct and incorrect predictions:

$$R_p^2 = 1 - \frac{wr_1}{wr_0}$$

 Because it is possible that the model predicts worse than the simple model, one can have wr1 > wr0, in which case Rp2 becomes negative. This is not a good sign for the predictive quality of the model.
 Illustration: Verbeek p. 197
Specification tests in binary choice models
 The likelihood function has to be correctly specified. This means that
we must be sure about the entire distribution that we impose on our
data. Deviations will cause inconsistent estimators and in binary
choice models this typically arises when the probability that yi=1 is
misspecified as a function of xi. Usually such misspecifications are
motivated from the latent variable model and reflect
heteroscedasticity or non-normality of εi. In addition we may want
to test for omitted variables without having to re-estimate the
model. The most convenient framework for such tests is the Lagrange multiplier framework.
Multi-response models
 1. Ordered response models:

$$y_i^* = x_i'\beta + \varepsilon_i, \qquad y_i = j \ \text{ if } \ \gamma_{j-1} < y_i^* \le \gamma_j$$

 The probability that alternative j is chosen is the probability that the latent variable yi* is between the two boundaries γj−1 and γj. Assuming that εi is i.i.d. standard normal results in the ordered probit model; the logistic distribution gives the ordered logit model. For M = 2 we are back at the binary choice model.
 Illustration: married females, how much would you like to work? Verbeek p. 203
 Illustration: willingness to pay for natural areas
 2. Multinomial models: Verbeek p. 208
Models for count data
 In certain applications, we would like to explain the number of times a given event occurs, for example how often a consumer visits a supermarket in a given week, or the number of patents a firm has obtained in a given year. The outcome might be zero for a substantial part of the population. While the outcomes are discrete and ordered, there are two important differences with ordered response outcomes. First, the values of the outcome have a cardinal rather than just an ordinal meaning (4 is twice as much as 2, and 2 is twice as much as 1). Second, there is no natural upper bound to the outcomes.
 Poisson regression model: see p. 211
 Illustration: patents and R&D expenditures, see p. 215
Tobit Models
 The dependent variable is continuous, but its range may be constrained. Most commonly this occurs when the dependent variable is zero for a substantial part of the population but positive for the rest of the population.
 Examples: expenditures on durable goods, hours of work and the amount of foreign direct investment of a firm.
 Estimation is usually done by maximum likelihood.
 Illustration: expenditures on alcohol and tobacco
Extensions of Tobit Models
 It is conceivable that households with many children are less likely to have positive expenditures, while if a holiday is taken up, the expected level of expenditures for such households is higher.
 The tobit II model
 The traditional model to describe sample selection problems is the tobit II model, also referred to as the sample selection model. In this context, it consists of a linear wage equation

$$w_i^* = x_{1i}'\beta_1 + \varepsilon_{1i}$$

 where x1i denotes a vector of exogenous characteristics (age, education, gender, …) and wi* denotes person i's wage. The wage wi* is not observed for people who are not working. To describe whether a person is working or not, a second equation is specified, which is of the binary choice type:

$$h_i^* = x_{2i}'\beta_2 + \varepsilon_{2i}$$

 with the following observation rule:
 wi = wi*, hi = 1 if hi* > 0
 wi not observed, hi = 0 if hi* ≤ 0
 where wi denotes person i's actual wage. The binary variable hi simply indicates working or not working. The model is completed by a distributional assumption on the unobserved errors (ε1i, ε2i), usually a bivariate normal distribution with expectations zero, variances σ12 and σ22, and covariance σ12. The second equation is a standard probit model, describing the choice between working and not working.
 Illustration: expenditures on alcohol and tobacco, Verbeek p. 233
Sample selection bias
 When the sample used in a statistical analysis is not randomly drawn from a larger population, selection bias may occur.
 If you interview people in a restaurant and ask how often they visit it, those who go there every day are much more likely to end up in the sample than those who visit every two weeks.
 Nonresponse may also result in selection bias: people who refuse to report their income are typically those with relatively high or low income levels.
 Self-selection of economic agents: individuals select themselves into a certain state (working, union member, public sector employment) in a nonrandom way, on the basis of economic arguments. In general, those who benefit most from being in a certain state will be more likely to be in this state.
Estimating treatment effects
 Another area where sample selection plays an important role is the estimation of treatment effects. A treatment effect refers to the impact of receiving a certain treatment upon a particular outcome variable, for example the effect of participating in a job training programme on future earnings. This effect may be different across individuals, and selection into the training programme may be nonrandom.
 The treatment effect is simply the coefficient for a treatment dummy variable in a regression model. Because interest is in the causal effect of the treatment, we need to worry about the potential endogeneity of the treatment dummy, that is, about selection into treatment.
Duration models
 Used to explain, for example, the time it takes for an unemployed person to find a job.
 Often used in labour market studies.
 Illustration: duration of the firm-bank relationship, Verbeek p. 250
Univariate time series models
 Yt depends linearly on its previous value Yt−1, that is,

$$Y_t = \delta + \theta Y_{t-1} + \varepsilon_t$$

 where εt denotes a serially uncorrelated innovation with a mean of zero and a constant variance. This process is called a first order autoregressive process, or AR(1).
 It says that the current value Yt equals a constant plus θ times its previous value plus an unpredictable component εt.
 Assume that |θ| < 1.
 The process for εt is an important building block of time series models and is referred to as a white noise process. In this chapter it will always denote such a process, which is homoscedastic and exhibits no autocorrelation.
 The expected value of Yt can be solved from

$$E\{Y_t\} = \delta + \theta E\{Y_{t-1}\}$$

 Assuming that E{Yt} does not depend on t allows us to write

$$\mu \equiv E\{Y_t\} = \frac{\delta}{1-\theta}, \qquad y_t \equiv Y_t - \mu, \qquad y_t = \theta y_{t-1} + \varepsilon_t, \qquad Y_t = \mu + \sum_{j=0}^{\infty} \theta^j \varepsilon_{t-j}$$

 Writing time series models in terms of yt rather than Yt is often notationally more convenient.
 The joint distribution of all values of Yt is characterized by the so-called autocovariances, the covariances between Yt and one of its lags, Yt−k.
 If we impose that variances and autocovariances do not depend on the index t, this is the so-called stationarity assumption.
 Another simple time series model is the first order moving average process, or MA(1) process, given by

$$Y_t = \mu + \varepsilon_t + \alpha \varepsilon_{t-1}$$

 This says that Y1 is a weighted average of ε1 and ε0, Y2 is a weighted average of ε2 and ε1, and so on.
 The values of Yt are defined in terms of drawings from the white noise process εt.
 The simple moving average structure implies that observations that are two or more periods apart are uncorrelated.
 The expression

$$Y_t = \mu + \sum_{j=0}^{\infty} \theta^j \varepsilon_{t-j}$$

 is referred to as the moving average representation of the autoregressive process: the AR process is written as an infinite order moving average process. We can do so provided that |θ| < 1.
 For some purposes a moving average representation is more convenient than an autoregressive one.
 We assume that the process for Yt is stationary.
 Stationarity
 A stochastic process is said to be strictly stationary if its properties are unaffected by a change of time origin.
 We will only be concerned with the means, variances and covariances of the series, and it is sufficient to impose that these moments are independent of time, rather than the entire distribution. This is referred to as weak stationarity or covariance stationarity.
 Strict stationarity is stronger, as it requires that the whole distribution is unaffected by a change of time origin, not just the first and second order moments.
 Autocovariance and autocorrelation
 Defining the autocorrelation ρk as

$$\rho_k = \frac{\text{cov}\{Y_t, Y_{t-k}\}}{V\{Y_t\}} = \frac{\gamma_k}{\gamma_0}$$

 The autocorrelations considered as a function of k are referred to as the autocorrelation function (ACF), or sometimes the correlogram, of the series Yt. The ACF plays a major role in modelling the dependencies among observations, because it characterizes the process describing the evolution of Yt over time.
 From the ACF we can infer the extent to which one value of the process is correlated with previous values, and thus the length and strength of the memory of the process. It indicates how long and how strongly a shock in the process εt affects the values of Yt.
 For the AR(1) process Yt = δ + θYt−1 + εt we have autocorrelation coefficients

$$\rho_k = \theta^k,$$

 while for the MA(1) process Yt = μ + εt + αεt−1 we have

$$\rho_1 = \frac{\alpha}{1 + \alpha^2}, \qquad \rho_k = 0 \ \text{ for } k = 2, 3, 4, \dots$$

 Consequently, a shock in an MA(1) process affects Yt in two periods only, while a shock in the AR(1) process affects all future observations with a decreasing effect (see the simulation sketch below).
 Illustration: Verbeek pp. 260-261.
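A small simulation sketch contrasting the two autocorrelation patterns (illustrative parameter values):

```python
import numpy as np

def sample_acf(y: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    y = y - y.mean()
    denom = y @ y
    return np.array([(y[k:] @ y[:-k]) / denom for k in range(1, max_lag + 1)])

# Simulated AR(1) and MA(1) series with theta = alpha = 0.7.
rng = np.random.default_rng(5)
T, theta, alpha = 10_000, 0.7, 0.7
eps = rng.normal(size=T)

ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = theta * ar1[t - 1] + eps[t]
ma1 = eps.copy()
ma1[1:] += alpha * eps[:-1]

print("AR(1) ACF:", sample_acf(ar1, 4).round(2))   # ~ theta^k: 0.70, 0.49, 0.34, 0.24
print("MA(1) ACF:", sample_acf(ma1, 4).round(2))   # ~ alpha/(1+alpha^2) = 0.47, then ~ 0
```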
General ARMA Processes
 We define more general autoregressive and moving average processes:
 MA(q):

$$y_t = \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \dots + \alpha_q \varepsilon_{t-q}$$

 AR(p):

$$y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \dots + \theta_p y_{t-p} + \varepsilon_t$$

 ARMA(p,q):

$$y_t = \theta_1 y_{t-1} + \dots + \theta_p y_{t-p} + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \dots + \alpha_q \varepsilon_{t-q}$$

 Often it is convenient to use the lag operator, denoted by L. It is defined by Lyt = yt−1.
 Most of the time the lag operator can be manipulated just as if it were a constant: L2yt = L(Lyt) = Lyt−1 = yt−2, and more generally Lpyt = yt−p.
 Using the lag operator allows us to write ARMA models in a concise way. For the AR(1) model,

$$y_t = \sum_{j=0}^{\infty} \theta^j L^j \varepsilon_t = \sum_{j=0}^{\infty} \theta^j \varepsilon_{t-j},$$

 and for the MA(1) model,

$$y_t = \sum_{j=0}^{\infty} \alpha(-\alpha)^j y_{t-j-1} + \varepsilon_t.$$

 For a more parsimonious representation, we may want to work with an ARMA model that contains both an AR and an MA part. The general ARMA model can be written as

$$\theta(L) y_t = \alpha(L) \varepsilon_t$$

 Lag polynomials
 Common roots: Verbeek pp. 264-265
 Stationarity and Unit Roots
 Stationarity of a stochastic process requires that the variances and autocovariances are finite and independent of time. See Verbeek pp. 266-267.
 A series which becomes stationary after first differencing is said to be integrated of order one, denoted I(1). If Δyt is described by a stationary ARMA(p,q) model, we say that yt is described by an autoregressive integrated moving average (ARIMA) model of order (p, 1, q), in short an ARIMA(p,1,q) model.
 First differencing can quite often transform a nonstationary series into a stationary one.
 If a series must be differenced twice before it becomes stationary, it is said to be integrated of order two, denoted I(2), and it must have two unit roots.
 Testing for Unit Roots
 Dickey-Fuller (DF) test: to test the null hypothesis H0: θ = 1 (a unit root), it is possible to use the standard t-statistic

$$DF = \frac{\hat{\theta} - 1}{se(\hat{\theta})}$$

 ADF test: testing for unit roots in higher order autoregressive models.
 KPSS test: the null hypothesis of trend stationarity specifies that the variance of the random walk component is zero. The test is actually a Lagrange multiplier test; the test statistic is given by

$$KPSS = \frac{1}{T^2}\sum_{t=1}^{T} S_t^2 \,/\, \hat{\sigma}^2$$

 ILLUSTRATIONS: Verbeek p. 276
 ESTIMATION OF ARMA MODELS
 Least squares
 Maximum likelihood
 CHOOSING A MODEL
 Before estimating any model it is common to estimate autocorrelation and partial autocorrelation coefficients directly from the data. Often this gives some idea of which model might be appropriate. After one or more models are estimated, their quality can be judged by checking whether the residuals are more or less white noise, and by comparing them with alternative specifications. These comparisons can be based on statistical significance tests or on particular model selection criteria.
 The autocorrelation function (ACF)
 Describes the correlation between Yt and its lag Yt−k as a function of k.
 The partial autocorrelation function (PACF)
 DIAGNOSTIC CHECKING
 As a last step in the model building cycle some checks on model adequacy are required. Possibilities are a residual analysis and overfitting the specified model. For example, if an ARMA(p,q) model is chosen on the basis of the sample ACF and PACF, we could also estimate an ARMA(p+1,q) and an ARMA(p,q+1) model and test the significance of the additional parameters.
 A residual analysis is usually based on the fact that the residuals of an adequate model should be approximately white noise. A plot of the residuals can be a useful tool in checking for outliers.
 Moreover, the estimated residual autocorrelations are usually examined. For a white noise series the autocorrelations are zero. Therefore the significance of the residual autocorrelations is often checked by comparing them with approximate two-standard-error bounds ±2/√T. To check the overall acceptability of the residual autocorrelations, the Ljung-Box portmanteau test statistic is used (computed in the sketch below):

$$Q_K = T(T+2)\sum_{k=1}^{K} \frac{r_k^2}{T-k}$$

 If a model is rejected at this stage, the model building cycle has to be repeated.
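A minimal sketch of the Ljung-Box statistic (assuming scipy; for ARMA residuals the degrees of freedom would be reduced by the number of estimated parameters):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(residuals: np.ndarray, K: int) -> tuple[float, float]:
    """Ljung-Box Q statistic over the first K residual autocorrelations and its p-value."""
    e = residuals - residuals.mean()
    T = len(e)
    denom = e @ e
    r = np.array([(e[k:] @ e[:-k]) / denom for k in range(1, K + 1)])
    Q = T * (T + 2) * np.sum(r**2 / (T - np.arange(1, K + 1)))
    return Q, chi2.sf(Q, df=K)   # df would be K - p - q when applied to ARMA residuals

# Example on simulated white noise: Q should be small and the p-value large.
rng = np.random.default_rng(6)
print(ljung_box(rng.normal(size=500), K=10))
```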
 Criteria for model selection
 Akaike's Information Criterion (AIC):

$$AIC = \log \hat{\sigma}^2 + 2\,\frac{p+q+1}{T}$$

 Bayesian Information Criterion (BIC):

$$BIC = \log \hat{\sigma}^2 + \frac{p+q+1}{T}\log T$$

 Both criteria are likelihood based and represent a different trade-off between fit, as measured by the log-likelihood value, and parsimony, as measured by the number of free parameters, p + q + 1.
 Usually the model with the smallest AIC and BIC values is preferred.
 ILLUSTRATION: Verbeek p. 286.
 PREDICTING WITH ARMA MODELS
 A main goal of building a time series model is predicting the future path of economic variables.
 The optimal predictor
 Our criterion for choosing a predictor from the many possible ones is to minimize the expected quadratic prediction error.
 Prediction accuracy
 It is important to know how accurate a prediction is.
 The accuracy of the prediction decreases as we predict further into the future.
 The MA(1) model gives more efficient predictors only if one predicts one period ahead. More general ARMA models, however, will yield efficiency gains also in further-ahead predictions.
 For the computation of the predictor, the autoregressive representation is most convenient.
 The informational value contained in an AR(1) process slowly decays over time.
 For a random walk, with θ = 1, the forecast error variance increases linearly with the forecast horizon.
 In practical cases, the parameters in ARMA models are unknown and we replace them by their estimated values. This introduces additional uncertainty in the predictors; however, this uncertainty is usually ignored.
 ILLUSTRATION: the expectations theory of the term structure
AUTOREGRESSIVE CONDITIONAL HETEROSCEDASTICITY
 In financial times series one often observes what is referred to as
volatility clustering
 In this case big shocks (residuals) tend to be followed by big shocks
in either direction, and small shocks tend to follow small shocks.
 ARCH (Engle, 1982): the variance of the error term at time t depends on the squared error terms from previous periods. The simplest form is
σ²t = E(ε²t | It-1) = ϖ + α ε²t-1
 This specification does not imply that the process for εt is
nonstationary. It just says that the squared values ε²t and ε²t-1 are correlated.
The unconditional variance of εt is given by σ² = E(ε²t) = ϖ + α E(ε²t-1), which has the stationary solution
σ² = ϖ / (1 − α)
provided that 0 ≤ α < 1. Note that the unconditional variance does not depend on t.
 The ARCH(1) model is easily extended to an ARCH(p) process:
σ²t = ϖ + α1 ε²t-1 + α2 ε²t-2 + ... + αp ε²t-p = ϖ + α(L) ε²t-1
 The effect of a shock j periods ago on current volatility is
determined by the coefficient αj
 In an ARCH(p) model, old shocks of more than p periods ago have no effect on current volatility.
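A minimal simulation sketch, not part of the slides (parameter values are arbitrary), illustrating that an ARCH(1) series is unconditionally homoskedastic with variance ϖ/(1 − α):

```python
import numpy as np

def simulate_arch1(omega, alpha, T, seed=0):
    """Simulate eps_t with conditional variance sigma2_t = omega + alpha*eps_{t-1}^2
    (omega stands for the constant written as a curly pi on the slide)."""
    rng = np.random.default_rng(seed)
    eps = np.zeros(T)
    sigma2 = np.zeros(T)
    sigma2[0] = omega / (1 - alpha)                        # start at the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, T):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return eps, sigma2

eps, _ = simulate_arch1(omega=0.5, alpha=0.4, T=50_000)
print(eps.var(), 0.5 / (1 - 0.4))   # sample variance should be close to omega/(1 - alpha)
```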
 GARCH MODELS
 ARCH models have been generalized in many different ways
 The GARCH(p,q) model can be written as
σ²t = ϖ + Σ(j=1..p) αj ε²t-j + Σ(j=1..q) βj σ²t-j
or
σ²t = ϖ + α(L) ε²t-1 + β(L) σ²t-1
 Where α(L) and β(L) are lag polynomials. In practice a GARCH(1,1) specification often performs very well. It can be written as
σ²t = ϖ + α ε²t-1 + β σ²t-1
 which has only three parameters to estimate.
Non-negativity of σ²t requires that ϖ, α and β are non-negative.
 If we define the surprise in the squared innovations as νt ≡ ε²t − σ²t, the GARCH(1,1) process can be written as
ε²t = ϖ + (α + β) ε²t-1 + νt − β νt-1
 It implies that the effect of a shock on current volatility decreases over time.
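A small sketch of the GARCH(1,1) variance recursion given above; the parameter values and the shock series are placeholders, and in practice the parameters would be estimated (for example by quasi maximum likelihood):

```python
import numpy as np

def garch11_filter(eps, omega, alpha, beta):
    """Conditional variances sigma2_t = omega + alpha*eps_{t-1}^2 + beta*sigma2_{t-1}."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty_like(eps)
    sigma2[0] = omega / (1 - alpha - beta)   # unconditional variance as a starting value
    for t in range(1, len(eps)):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# After a large shock, sigma2 jumps up and then decays back at rate (alpha + beta).
```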
 An important restriction of the ARCH and GARCH specifications is
their symmetry; only absolute values of the innovations matter, not
their sign. That is a big negative shock has the same impact on future
volatility as a big positive shock of the same magnitude.
 An interesting extension is toward asymmetric volatility models, in
which good news and bad news have a different impact on future
volatility. Distinction between good and bad news is more sensible
for stock markets than for exchange rates.
 An asymmetric model should allow for the possibility that an
unexpected drop in price has a larger impact on future volatility
than an unexpected increase in price of similar magnitude. A fruitful approach to capture such asymmetry is provided by Nelson's exponential GARCH (EGARCH) model:
log σ²t = ϖ + β log σ²t-1 + γ εt-1/σt-1 + α |εt-1|/σt-1
 Where α, β and γ are constant parameters. The EGARCH model is asymmetric as long as γ ≠ 0. When γ < 0, positive shocks generate less volatility than negative shocks.
 It is possible to extend the EGARCH model by including
additional lags.
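A minimal sketch of one EGARCH update as written above, illustrating the asymmetry: with γ < 0 a negative shock raises next-period volatility more than a positive shock of the same size (all parameter values are illustrative):

```python
import numpy as np

def egarch_log_var(eps, sigma_prev, log_sigma2_prev, omega, beta, gamma, alpha):
    """One EGARCH update: log sigma2_t = omega + beta*log sigma2_{t-1}
       + gamma*(eps_{t-1}/sigma_{t-1}) + alpha*|eps_{t-1}|/sigma_{t-1}."""
    z = eps / sigma_prev
    return omega + beta * log_sigma2_prev + gamma * z + alpha * abs(z)

# With gamma < 0, the same-sized negative shock produces the larger next-period variance.
params = dict(sigma_prev=1.0, log_sigma2_prev=0.0, omega=0.0, beta=0.9, gamma=-0.1, alpha=0.2)
print(np.exp(egarch_log_var(-2.0, **params)), np.exp(egarch_log_var(+2.0, **params)))
```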
 Estimation
 Quasi maximum likelihood estimation
 Feasible GLS
 In financial markets, GARCH models are frequently used to forecast
volatility of returns, which is an important input for investment,
option pricing, risk management and financial market regulation
 ILLUSTRATION: Volatility in daily exchange rates, Verbeek page 303.
Multivariate Time Series Models
 Dynamic models with stationary variables
 Let us consider two stationary variables Yt and Xt, and assume that it holds that
Yt = δ + θYt-1 + φ0Xt + φ1Xt-1 + εt
 We can think of Yt as company sales and Xt as advertising both in
month t. We assume that εt is a white noise process, independent of current and lagged Xt and of lagged Yt. The above relation is referred to as an autoregressive distributed lag (ARDL) model, and we can estimate it by OLS.
 The interesting element in the model is that it describes the
dynamic effects of a change in Xt upon current and future values of
Yt. Taking partial derivatives, we can derive that the immediate
response is given by
Y / X  
t
t
0
 This is referred to as the impact multiplier. An increase in X with
one unit has an immediate impact on Y of 0 units. The effect after
one period is
Yt 1 / X t  Yt / X t  1  0  1
 And after two periods
Yt 2 / X t  Yt 1 / X t  1   (0  1 )
 This shows that after the first period the effect is decreasing if |θ| < 1. Imposing this so-called stability condition allows us to determine the long run effect of a unit change in Xt. It is given by the long-run multiplier (or equilibrium multiplier):
φ0 + (θφ0 + φ1) + θ(θφ0 + φ1) + ... = φ0 + (1 + θ + θ² + ...)(θφ0 + φ1) = (φ0 + φ1)/(1 − θ)
 This says that if advertising Xt increases by one unit, the expected cumulative increase in sales is given by (φ0 + φ1)/(1 − θ).
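A small sketch (with illustrative parameter values, not from the slides) that computes the impact multiplier, the subsequent dynamic multipliers and the long-run multiplier of the ARDL model above, and confirms that the cumulative effect approaches (φ0 + φ1)/(1 − θ):

```python
def ardl_multipliers(theta, phi0, phi1, horizon):
    """Dynamic multipliers dY_{t+s}/dX_t for Y_t = delta + theta*Y_{t-1} + phi0*X_t + phi1*X_{t-1} + eps_t."""
    mult = [phi0]                                  # impact multiplier
    mult.append(theta * phi0 + phi1)               # effect after one period
    for _ in range(2, horizon + 1):
        mult.append(theta * mult[-1])              # decays at rate theta afterwards
    long_run = (phi0 + phi1) / (1 - theta)         # long-run (equilibrium) multiplier, |theta| < 1
    return mult, long_run

mult, long_run = ardl_multipliers(theta=0.6, phi0=0.5, phi1=0.2, horizon=10)
print(sum(mult), long_run)   # the cumulative effect approaches the long-run multiplier
```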
 If the increase in Xt is permanent, the long run multiplier also has
the interpretation of the expected long run permanent increase in
Yt. An alternative way to formulate the ARDL model is
ΔYt = φ0ΔXt − (1 − θ)[Yt-1 − α − βXt-1] + εt
where β = (φ0 + φ1)/(1 − θ) is the long-run multiplier and α = δ/(1 − θ).
 This formulation is an example of an error correction model (ECM). It says that the change in Yt is due to the current change in Xt plus an error correction term. If Yt-1 is above the equilibrium value that corresponds to Xt-1, that is, if the equilibrium error in square brackets is positive, an additional negative adjustment in Yt is generated. The speed of adjustment is determined by 1 − θ, the adjustment parameter (1 − θ > 0).
 It is also possible to estimate ECM by OLS.
 Both the ARDL model and the ECM assume that the values of Xt can
be treated as given, that is, as being uncorrelated with the equation's error terms. The ARDL model describes the expected value of Yt given its own history and conditional upon current and lagged values of Xt. If Xt is simultaneously determined with Yt, OLS in the ARDL model and the ECM would be inconsistent.
 Partial adjustment model: see Verbeek, page 312
 Generalized ARDL: see Verbeek, page 312
 MODELS WITH NON-STATIONARY VARIABLES
 Spurious regression
 Consider two variables, Yt and Xt, generated by two independent random walks
Yt = Yt-1 + ε1t
Xt = Xt-1 + ε2t
A researcher may want to estimate a regression model such as
Yt = α + βXt + εt
 The results from this regression are likely to be characterized by a
fairly high R2 statistic, highly autocorrelated residuals and
a significant value of β. This is the well-known problem of nonsense
or spurious regression. Two independent nonstationary series are
spuriously related due to the fact that they are both trending.
 Including lagged variables in the regression is sufficient to solve
many of the problems associated with the spurious regression.
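A minimal Monte Carlo sketch, not from the slides, of the spurious regression problem: two independent random walks are generated and Y is regressed on X by OLS (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 200
y = np.cumsum(rng.standard_normal(T))     # random walk Y_t
x = np.cumsum(rng.standard_normal(T))     # independent random walk X_t

# OLS of Y on a constant and X
X = np.column_stack([np.ones(T), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
print(beta[1], r2)   # beta often looks 'significant' and R2 fairly high, despite independence
```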
 COINTEGRATION
 Consider two series integrated of order one, Yt and Xt, and suppose that a linear relationship exists between them. This is reflected in the proposition that there exists some value β such that Yt − βXt is I(0), although Yt and Xt are both I(1). In such a case it is said that Yt and Xt are cointegrated and that they share a common trend.
 The presence of a cointegrating vector can be interpreted as the
presence of a long run equilibrium relationship.
 If Yt and Xt are cointegrated the error term is I(0); if not, the error term will be I(1). Hence we can test for the presence of a cointegrating relationship by testing for a unit root in the OLS residuals. This can be done with a Dickey-Fuller type test, although the appropriate critical values differ from the standard ones because the residuals are estimated.
 An alternative test for cointegration is based on the usual Durbin-Watson statistic, referred to as the cointegrating regression Durbin-Watson (CRDW) test (page 316).
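As a hedged sketch of the residual-based (Engle-Granger type) approach just described, assuming the statsmodels adfuller function is available; note that the p-value it reports uses the standard Dickey-Fuller distribution, whereas residual-based cointegration tests require their own critical values:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def engle_granger(y, x):
    """Two-step residual-based cointegration check: OLS of y on a constant and x,
    then a unit-root test on the residuals.  The Dickey-Fuller critical values are
    not strictly valid for estimated residuals; dedicated tables should be used."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    adf_stat, pvalue, *_ = adfuller(resid)   # default specification with a constant
    return beta, adf_stat, pvalue
```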
 The presence of a cointegrating relationship implies that there exists an error correction model that describes the short-run dynamics consistently with the long-run relationship.
 COINTEGRATION AND ECM
 The Granger representation theorem states that if a set of variables are cointegrated, then there exists a valid error-correction representation of the data:
ΔYt = δ + φ1ΔXt-1 + γ(Yt-1 − βXt-1) + εt
 If Yt and Xt are both I(1) but have a long run relationship, there must be some force which pulls the equilibrium error back towards zero.
 The ECM describes how Yt and Xt behave in the short run
consistent with a long run cointegrating relationship.
 The representation theorem also holds conversely: if Yt and Xt are both I(1) and have an error correction representation, then they are necessarily cointegrated.
 The concept of cointegration can be applied to nonstationary time
series only.
ILLUSTRATION: Long run purchasing power parity
 VECTOR AUTOREGRESSIVE MODELS
 A VAR describes the dynamic evolution of a number of variables
from their common history.
 If we consider two variables, Yt and Xt, say, the VAR consists of two equations. A first-order VAR would be given by
Yt = δ1 + θ11Yt-1 + θ12Xt-1 + ε1t
Xt = δ2 + θ21Yt-1 + θ22Xt-1 + ε2t
 Where ε1t and ε2t are two white noise processes (independent
of the history of Y and X) that may be correlated. If, for example,
θ12 ≠ 0, it means that the history of X helps to explain Y.
 The VAR model implies univariate ARMA models for each of its components. The advantage of considering the components simultaneously is that the model may be more parsimonious and include fewer lags, and that more accurate forecasting is
possible, because the information set is extended to also include
the history of the other variables. Sims (1980) has advocated the
use of VAR models instead of structural simultaneous equations
models because the distinction between endogenous and
exogenous variables does not have to be made a priori, and
arbitrary constraints to ensure identification are not required.
 We can use the VAR model for forecasting in a straightforward way.
 To estimate a VAR we can simply use OLS equation by equation, which is consistent because the white noise terms are assumed to be independent of the history of Yt.
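A minimal numpy sketch, not from the slides, of equation-by-equation OLS estimation of a first-order VAR (the data matrix is a placeholder with one column per variable):

```python
import numpy as np

def var1_ols(data):
    """Estimate a first-order VAR equation by equation with OLS.
    data: (T x k) array; returns intercepts (k,) and coefficient matrix Theta (k x k)."""
    Y = data[1:]                                         # left-hand side: t = 2..T
    Z = np.column_stack([np.ones(len(Y)), data[:-1]])    # regressors: constant and lagged values
    coefs = np.linalg.lstsq(Z, Y, rcond=None)[0]         # shape (k+1, k), one column per equation
    intercepts = coefs[0]
    Theta = coefs[1:].T                                  # row i: coefficients of equation i
    return intercepts, Theta

# One-step-ahead forecast: y_hat(T+1) = intercepts + Theta @ data[-1]
```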
 Impulse Response Function
 It measures the response of Yj,t+s to an impulse in Y1t, keeping constant all other variables dated t and before.
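For a first-order VAR the (non-orthogonalized) impulse responses are simply powers of the coefficient matrix; a small sketch, assuming the Θ matrix estimated above:

```python
import numpy as np

def var1_irf(Theta, horizons):
    """Impulse responses of a VAR(1): d y_{t+s} / d eps_t = Theta^s.
    This ignores contemporaneous correlation of the innovations (no orthogonalization)."""
    k = Theta.shape[0]
    Psi = np.eye(k)
    out = [Psi.copy()]
    for _ in range(horizons):
        Psi = Theta @ Psi
        out.append(Psi.copy())
    return out   # out[s][i, j] = response of variable i at horizon s to a unit shock in equation j
```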
COINTEGRATION: The Multivariate Case
If we have a set of k I(1) variables, there may exist up to k−1 independent linear relationships that are I(0), while any linear combination of these relationships is also I(0).
Vector error correction model
Johansen developed a maximum likelihood estimation procedure, which allows one to test for the number of cointegrating relations: the trace test and the maximum eigenvalue test. They are actually likelihood ratio tests.
ILLUSTRATION: long run purchasing power parity, page 331
Illustration
mt: log of real M1 money balances
inflt: quarterly inflation rate (in % per year)
cprt: commercial paper rate
yt: log real GDP (in billions of 1987 dollars)
tbrt: treasury bill rate
We can think of three possible cointegrating relationships governing the long run behavior of these variables.
 First, we can specify an equation for money demand as
mt = δ1 + β14yt + β15tbrt + ε1t
where β14 denotes the income elasticity. It can be expected that β14 is close to unity, corresponding to a unitary income elasticity, and that β15 < 0.
 Second, if real interest rates are stationary we can expect that
inflt = δ2 + β25tbrt + ε2t
 This corresponds to a cointegrating relationship with β25 = 1. It is referred to as the Fisher relation, where we are using actual inflation as a proxy for expected inflation.
 Third, it can be expected that the risk premium, as measured by the difference between the commercial paper rate and the treasury bill rate, is stationary, such that
cprt = δ3 + β35tbrt + ε3t
with β35 = 1.
 Before proceeding to the vector process of these five variables, let us
consider the OLS estimates of the above three regressions.
 Univariate cointegrating regressions by OLS (intercept estimates not reported, standard errors in parentheses)

            Money Demand      Fisher Equation    Risk Premium
mt:         -1                0                  0
inflt:      0                 -1                 0
cprt:       0                 0                  -1
yt:         0.423 (0.016)     0                  0
tbrt:       -0.031 (0.002)    0.558 (0.053)      1.038 (0.010)
R2:         0.815             0.409              0.984
dw:         0.199             0.784              0.705
ADF(6):     -3.164            -1.188             -3.975
Note that the OLS standard errors are inappropriate if the variables in the
regression are integrated. Except for the risk premium equation, the R2 are
not close to unity, which is an informal requirement for a cointegrating
regression.
 The empirical evidence for the existence of the suggested
cointegration relationships between the five variables is somewhat
mixed. Only for the risk premium equation do we find an R2 close to
unity, a sufficiently high dw statistic and a significant rejection of the
ADF test for a unit root in the residuals. For the two other
regressions there is little reason to reject the null hypotheses of no
cointegration. Potentially this is caused by the lack of power of the
tests that we employ, and it is possible that a multivariate vector
analysis provides stronger evidence for the existence of cointegrating
relationships between these five variables.
 Plot of the residuals: Verbeek, page 335.
 The first step in the Johansen approach involves testing for the cointegrating rank r. To compute these tests we need to choose the maximum lag length p in the vector autoregressive model. Choosing p too small will invalidate the tests, and choosing p too large may result in a loss of power.
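Before turning to the test outcomes below, a hedged sketch of how trace and maximum eigenvalue statistics could be computed with statsmodels' Johansen procedure (the function and its attributes are assumed from that library; the data matrix with columns m, infl, cpr, y and tbr is a placeholder):

```python
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def johansen_tests(data, p):
    """Trace and maximum eigenvalue tests for the cointegrating rank.
    p is the VAR lag length, so k_ar_diff = p - 1 lagged differences enter the VECM;
    det_order=0 includes a constant term."""
    res = coint_johansen(data, det_order=0, k_ar_diff=p - 1)
    # res.lr1: trace statistics, res.cvt: their critical values (90/95/99%),
    # res.lr2: max eigenvalue statistics, res.cvm: their critical values.
    return res.lr1, res.cvt, res.lr2, res.cvm
```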
 Trace and maximum eigenvalue tests for cointegration (see pages 337-338)

Null hypothesis   Alternative   Statistic (p=5)   Statistic (p=6)   5% critical value
λtrace statistic
H0: r=0           H1: r≥1       108.723           127.801           75.98
H0: r≤1           H1: r≥2       59.189            72.302            53.48
H0: r≤2           H1: r≥3       29.201            35.169            34.87
H0: r≤3           H1: r≥4       13.785            16.110            20.18
λmax statistic
H0: r=0           H1: r=1       49.534            55.499            34.40
H0: r≤1           H1: r=2       29.988            37.133            28.27
H0: r≤2           H1: r=3       15.416            19.059            22.04
H0: r≤3           H1: r=4       9.637             11.860            15.87
 We impose two restrictions, assuming that the money demand and risk premium relationships are the most likely candidates. We shall impose that mt and cprt have coefficients of -1, 0 and 0, -1, respectively. Economically we expect that inflt does not enter in any of the cointegrating vectors. With these two restrictions, the cointegrating vectors are estimated by maximum likelihood, jointly with the coefficients in the vector error-correction model. The results for the cointegrating vectors are presented in the table.
 ML estimates of cointegrating vectors (after normalization) based on a VAR with p=6 (standard errors in parentheses)

            Money Demand      Risk Premium
mt:         -1                0
inflt:      -0.023 (0.006)    0.041 (0.031)
cprt:       0                 -1
yt:         0.425 (0.033)     -0.037 (0.173)
tbrt:       -0.028 (0.005)    1.017 (0.026)
Likelihood value: 808.2770
 It is possible to test our a priori cointegrating vectors by using likelihood ratio tests. These tests require that the model is re-estimated imposing some additional restrictions on the cointegrating vectors. This way we can test the following hypotheses:
H0a: β12 = 0, β14 = 1
H0b: β22 = β24 = 0, β25 = 1
H0c: β12 = β22 = β24 = 0, β14 = β25 = 1
 The loglikelihood values for the complete model, estimated imposing H0a, H0b and H0c respectively, are given by 782.3459, 783.7761 and 782.3196. The likelihood ratio test statistics, defined as twice the difference in loglikelihood values, for the three null hypotheses are thus given by 51.86, 49.00 and 51.91. Compared with the chi-squared critical values with 3, 2 or 5 degrees of freedom, each of the hypotheses is clearly rejected.
 As a last step we consider the VECM for this system. This corresponds to a VAR of order p-1=5 for the first-differenced series, with the inclusion of two error correction terms in each equation, one for each cointegrating vector. The number of parameters estimated in this VECM is well above 100, so we shall concentrate on a limited part of the results only. The two error correction terms are given by
ecm1t = mt + 0.023inflt − 0.425yt + 0.028tbrt − 3.362
ecm2t = cprt − 0.04inflt + 0.037yt − 1.017tbrt − 0.687
 The adjustment coefficients, a 5x2 matrix, are reported with their associated standard errors in the table below. The long run money demand equation contributes significantly to the short run movements of both money demand and income. The short run behaviour of money demand, inflation and the commercial paper rate appears to be significantly affected by the long run risk premium relationship.
 Estimated matrix of adjustment coefficients (standard errors in parentheses); * indicates significance at the 5% level

Equation    ecm1,t-1              ecm2,t-1
Δmt         0.0276* (0.0104)      0.0090* (0.0024)
Δinflt      1.4629 (2.3210)       -1.1618* (0.5287)
Δcprt       -2.1364 (1.1494)      0.6626* (0.2618)
Δyt         0.0687* (0.0121)      0.0013 (0.0028)
Δtbrt       -1.2876 (1.0380)      0.3195 (0.2365)
There is no statistical evidence that the treasury bill rate adjusts to any deviation from the long run equilibria, so it could be treated as weakly exogenous.
Model based on panel data
 An important advantage of panel data compared to time series and cross-sectional data sets is that they allow identification of certain parameters or questions without the need to make restrictive assumptions. For example, panel data make it possible to analyse changes on an individual level.
 Consider a situation in which the average consumption level rises by 2% from one year to another. Panel data can identify whether this rise is the result of an increase of 2% for all individuals, or an increase of 4% for approximately one half of the individuals and no change for the other half (or any other combination).
 Panel data are not only suitable to model or explain why individual
units behave differently but also to model why a given unit behaves
differently at different time periods.
 We shall in the sequel index all variables by an i for the individual (i=1,2,…,N) and a t for the time period (t=1,2,…,T).
 We can specify a linear model as
yit = x'it βit + εit
where βit measures the partial effects of xit in period t for unit i.
 Here xit is a K-dimensional vector of explanatory variables, not including a constant. A standard restriction is that βit = β for all i and t, while the intercept may differ across individuals:
yit = αi + x'it β + εit
This means that the effects of a change in x are the same for all units and all periods, but that the average level for unit i may be different from that for unit j. The αi thus capture the effects of those variables that are peculiar to the i-th individual and that are constant over time. In the standard case, εit is assumed to be independent and identically distributed over individuals and time, with mean 0 and variance σ²ε. If we treat the αi as N fixed unknown parameters, the model is referred to as the standard fixed effects model.
 An alternative approach assumes that the intercepts of the individuals are different but that they can be treated as drawings from a distribution with mean μ and variance σ²α. The essential assumption here is that these drawings are independent of the explanatory variables in xit. This leads to the random effects model, where the individual effects αi are treated as random.
 Most panel data models are estimated under either the
fixed effects or the random effects assumption
 Efficiency of Parameter Estimation
 Because panel data sets are typically larger than cross-sectional or time series data sets, and explanatory variables vary over two dimensions (individuals and time) rather than one, estimators based on panel data are quite often more accurate than those from other sources. Even with identical sample sizes, the use of a panel data set will often yield more efficient estimators than a series of independent cross sections (where different units are sampled in each period).
 Identification of parameters
 A second advantage of the availability of panel data is that it reduces identification problems.
 Estimating the model under the fixed effects assumption eliminates αi from the error term and, consequently, eliminates all endogeneity problems relating to it.
 THE STATIC LINEAR MODEL
 1. Fixed effects model
 This is simply a linear regression model in which the intercept
terms vary over the individual units i,
yit = αi + x'it β + εit
 Where it is usually assumed that all xit are independent of all εit. We can write this in the usual regression framework by including a dummy variable for each unit i in the model:
yit = Σ(j=1..N) αj dij + x'it β + εit
where dij = 1 if i = j and 0 otherwise. We thus have a set of N dummy variables in the model. The parameters can be estimated by OLS; the implied estimator for β is referred to as the least squares dummy variable (LSDV) estimator.
 We can eliminate the individual effects αi first by transforming the data:
yit − ȳi = (xit − x̄i)'β + (εit − ε̄i)
 This is a regression model in deviations from individual means and does not include the individual effects αi. The transformation that produces observations in deviation from individual means is called the within transformation. The OLS estimator for β obtained from this transformed model is often called the within estimator or fixed effects estimator, and it is exactly identical to the LSDV estimator.
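A minimal numpy sketch, not from the slides, of the within (fixed effects) estimator obtained from the transformed model above; variable names are placeholders:

```python
import numpy as np

def within_estimator(y, X, ids):
    """Fixed effects (within) estimator: OLS on data in deviations from individual means.
    y: (NT,) outcomes, X: (NT, K) regressors, ids: (NT,) individual identifiers."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    y_dm = np.empty_like(y)
    X_dm = np.empty_like(X)
    for i in np.unique(ids):
        m = ids == i
        y_dm[m] = y[m] - y[m].mean()          # y_it - ybar_i
        X_dm[m] = X[m] - X[m].mean(axis=0)    # x_it - xbar_i
    beta = np.linalg.lstsq(X_dm, y_dm, rcond=None)[0]
    return beta   # numerically identical to the LSDV estimate of beta
```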
 We call xit strictly exogenous. A strictly exogenous variable is not allowed to depend on current, future and past values of the error term.
 The fixed effects model concentrates on differences within individuals. It explains to what extent yit differs from ȳi, but does not explain why ȳi is different from ȳj.
 The random effects model
 It is commonly assumed in regression analysis that all factors that affect the dependent variable, but that have not been included as regressors, can be appropriately summarized by a random error term. In our case this leads to the assumption that the αi are random factors, independently and identically distributed over individuals. Thus we write the random effects model as
yit = μ + x'it β + αi + εit
where αi + εit is treated as an error term consisting of two components: an individual-specific component, which does not vary over time, and a remainder component, which is assumed to be uncorrelated over time.
 That is, all correlation of the error terms over time is attributed to the individual effects. It is assumed that xit is independent of αi and of εjs (for all j and s). This implies that the OLS estimator is unbiased and consistent.
 Estimators for the parameter vector β;
 1. The between estimator, exploiting the between dimension of the data (differences between individuals), determined as the OLS estimator in a regression of individual averages of y on individual averages of x. It requires that the explanatory variables are strictly exogenous and uncorrelated with the individual specific effect αi.
 2. The fixed effects (within) estimator, exploiting the within dimension of the data (differences within individuals), determined as the OLS estimator in a regression in deviations from individual means.
 3. The OLS estimator, exploiting both dimensions (within and between) but not efficiently. It requires the explanatory variables to be uncorrelated with αi but does not impose that they are strictly exogenous.
 4. The random effects (EGLS) estimator, combining the information from the between and within dimensions in an efficient way.
 Fixed effects or random effects
 The fixed effects approach is conditional upon the values for αi.
 The fixed effects approach is appropriate if the individuals in the sample are one of a kind and cannot be viewed as a random draw from some underlying population, for example when i denotes countries, large companies or industries, and the predictions we want to make are for a particular country, company or industry.
 In contrast the random effects approach is not conditional upon
the individual αis, but integrates them out. In this case, we are
usually not interested in the particular value of some person's αi; we just focus on arbitrary individuals that have certain
characteristics. The random effects approach allows one to make
inference with respect to the population characteristics.
 Hausman test: tests whether the fixed effects and random
effects estimator are significantly different.
 Goodness of fit
 The computation of goodness of fit measures in panel data
applications is somewhat uncommon. One reason is the fact that one
may attach different importance to explaining the within and
between variation in the data. Another reason is that the usual R2 or
adjusted R2 criteria are only appropriate if the model is estimated by
OLS.
 The goodness of fit measures are not adequate to choose between
alternative estimators
 Robust inference, page 355
 Testing for heteroscedasticity and autocorrelation, page 357
 ILLUSTRATION: Explaining individual wages, page 358
Dynamic linear models
An autoregressive panel data model
Dynamic models with exogenous variables
Illustration: wage elasticities of labour demand
Non-stationarity, unit roots and cointegration
Panel data unit root tests
Panel data cointegration tests
Models with limited dependent variables
Binary choice models
The fixed effects logit model
The random effects probit model
Tobit models
Incomplete panels and selection bias