STAT 497
LECTURE NOTES 8
ESTIMATION
ESTIMATION
• After specifying the order of a stationary ARMA
process, we need to estimate the parameters.
• We will assume (for now) that:
1. The model order (p and q) is known, and
2. The data has zero mean.
• If (2) is not a reasonable assumption, we can subtract the sample mean $\bar{Y}$, fit a zero-mean ARMA model $\phi(B)X_t = \theta(B)a_t$ where $X_t = Y_t - \bar{Y}$, and then use $X_t + \bar{Y}$ as the model for $Y_t$.
• The main estimation methods:
– Method of Moment Estimation (MME)
– Ordinary Least Squares (OLS) Estimation
– Maximum Likelihood Estimation (MLE)
– Least Squares Estimation
• Conditional
• Unconditional
THE METHOD OF MOMENT ESTIMATION
• It is also known as Yule-Walker estimation. It is an easy but inefficient estimation method, and it works well only for AR models when n is large.
• BASIC IDEA: Equate sample moment(s) to population moment(s) and solve the resulting equation(s) to obtain estimator(s) of the unknown parameter(s).
$E(Y_t) = \mu \;\leftrightarrow\; \bar{Y} = \frac{1}{n}\sum_{t=1}^{n} Y_t$

or

$\gamma_k = E(Y_t Y_{t+k}) \;\leftrightarrow\; \hat{\gamma}_k = \frac{1}{n}\sum_{t=1}^{n-k} Y_t Y_{t+k},$

i.e., we set $\gamma_k = \hat{\gamma}_k$ or $\rho_k = \hat{\rho}_k$.
• Let $\Gamma_n$ be the variance-covariance matrix of X under the given parameter values.
• Yule-Walker for AR(p): regress $X_t$ onto $X_{t-1}, \dots, X_{t-p}$.
• Equivalently, use the Durbin-Levinson algorithm with $\gamma$ replaced by $\hat{\gamma}$.
• Yule-Walker for ARMA(p,q): a method-of-moments approach; not efficient.
THE YULE-WALKER ESTIMATION
• For a stationary (causal) AR(p),

$E\!\left[X_{t-k}\left(X_t - \sum_{j=1}^{p}\phi_j X_{t-j}\right)\right] = E[X_{t-k}\,a_t], \quad k = 0, 1, \dots, p,$

which gives $\gamma(0) - \phi'\gamma_p = \sigma_a^2$ and $\Gamma_p \phi = \gamma_p$, where $\phi = (\phi_1, \phi_2, \dots, \phi_p)'$.

To calculate the values $E[X_{t-k}\,a_t]$, we have used the random shock form (RSF) of the process: $X_t = \psi(B)a_t$.
• To find the Yule-Walker estimators, we set $\gamma_k = \hat{\gamma}_k$ or, equivalently, $\rho_k = \hat{\rho}_k$. Thus, the Yule-Walker equations for $\hat{\phi}$ are

$\hat{\Gamma}_p \hat{\phi} = \hat{\gamma}_p \;\Rightarrow\; \hat{\phi} = \hat{\Gamma}_p^{-1}\hat{\gamma}_p, \qquad \hat{\sigma}_a^2 = \hat{\gamma}(0) - \hat{\phi}'\hat{\gamma}_p.$

• These are forecasting equations.
• We can use the Durbin-Levinson algorithm.
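The Durbin-Levinson recursion solves the Yule-Walker equations order by order without inverting $\hat{\Gamma}_p$. Below is a minimal Python sketch (the function name and layout are mine, not from the notes), checked against the theoretical autocovariances of an AR(2) with $\phi = (0.5, 0.3)$:

```python
import numpy as np

def durbin_levinson(gamma):
    """Solve the Yule-Walker equations Gamma_p phi = gamma_p recursively,
    one order at a time.  gamma[k] is the autocovariance at lag k (k = 0..p).
    Returns the order-p AR coefficients and the innovation variance."""
    p = len(gamma) - 1
    phi = np.zeros((p + 1, p + 1))   # phi[m, j]: j-th coefficient at order m
    v = np.zeros(p + 1)              # one-step prediction error variance
    v[0] = gamma[0]
    for m in range(1, p + 1):
        # phi[m, m] is the partial autocorrelation at lag m
        k = (gamma[m] - phi[m-1, 1:m] @ gamma[m-1:0:-1]) / v[m-1]
        phi[m, m] = k
        phi[m, 1:m] = phi[m-1, 1:m] - k * phi[m-1, m-1:0:-1]
        v[m] = v[m-1] * (1.0 - k**2)
    return phi[p, 1:], v[p]

# check against the theoretical ACVF of an AR(2) with phi = (0.5, 0.3),
# normalized so that gamma_0 = 1: rho1 = phi1/(1-phi2) = 5/7, rho2 = phi1*rho1 + phi2
gamma = np.array([1.0, 5/7, 0.5*(5/7) + 0.3])
phi_hat, v_hat = durbin_levinson(gamma)   # -> approximately (0.5, 0.3)
```

Because the recursion produces $\phi_{mm}$ at every order, it yields the sample PACF as a by-product when run on $\hat{\gamma}$.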
• If $\hat{\gamma}(0) > 0$, then $\hat{\Gamma}_m$ is nonsingular.
• If $\{X_t\}$ is an AR(p) process, then asymptotically

$\hat{\phi} \sim N\!\left(\phi,\; \frac{\sigma_a^2}{n}\Gamma_p^{-1}\right) \quad\text{and}\quad \hat{\phi}_{kk} \sim N\!\left(0, \frac{1}{n}\right) \text{ for } k > p.$

Hence, we can use the sample PACF to test for the AR order, and we can calculate approximate confidence intervals for the parameters.
• If $X_t$ is an AR(p) process and n is large,

$\sqrt{n}\left(\hat{\phi} - \phi\right) \sim N\!\left(0,\; \hat{\sigma}_a^2\hat{\Gamma}_p^{-1}\right)$ approximately.

• A $100(1-\alpha)\%$ approximate confidence interval for $\phi_j$ is

$\hat{\phi}_j \pm z_{\alpha/2}\,\frac{\hat{\sigma}_a}{\sqrt{n}}\left[\hat{\Gamma}_p^{-1}\right]_{jj}^{1/2}.$
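These steps can be sketched numerically: simulate an AR(2) (the coefficients 0.5 and 0.3, the seed, and all variable names are my choices), solve $\hat{\Gamma}_p\hat{\phi} = \hat{\gamma}_p$, and form the approximate 95% intervals:

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate an AR(2): Y_t = 0.5 Y_{t-1} + 0.3 Y_{t-2} + a_t  (coefficients are my choice)
n, phi_true = 5000, np.array([0.5, 0.3])
a = rng.normal(size=n + 200)
y = np.zeros(n + 200)
for t in range(2, n + 200):
    y[t] = phi_true[0] * y[t-1] + phi_true[1] * y[t-2] + a[t]
y = y[200:]                                   # drop burn-in

def acvf(y, k):
    """Sample autocovariance at lag k (divisor n, as in the notes)."""
    yb = y - y.mean()
    return (yb[:len(y) - k] * yb[k:]).sum() / len(y)

g = np.array([acvf(y, k) for k in range(3)])
Gamma = np.array([[g[0], g[1]],
                  [g[1], g[0]]])              # Gamma_2-hat
phi_hat = np.linalg.solve(Gamma, g[1:])       # Yule-Walker estimate
sigma2_hat = g[0] - phi_hat @ g[1:]           # sigma_a^2-hat

# 95% approximate CI:  phi_j-hat +/- 1.96 * sqrt(sigma2-hat * [Gamma-hat^-1]_jj / n)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(Gamma)) / n)
ci = np.column_stack([phi_hat - 1.96 * se, phi_hat + 1.96 * se])
```

With n = 5000 the estimates land close to (0.5, 0.3) and the intervals are narrow, illustrating the $1/\sqrt{n}$ rate above.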
• AR(1): $Y_t = \phi Y_{t-1} + a_t$. Find the MME of $\phi$.

It is known that $\rho_1 = \phi$, so we set $\rho_1 = \hat{\rho}_1$:

$\hat{\rho}_1 = \frac{\sum_{t=2}^{n}(Y_t - \bar{Y})(Y_{t-1} - \bar{Y})}{\sum_{t=1}^{n}(Y_t - \bar{Y})^2}.$
• So, the MME of $\phi$ is

$\tilde{\phi} = \hat{\rho}_1 = \frac{\sum_{t=2}^{n}(Y_t - \bar{Y})(Y_{t-1} - \bar{Y})}{\sum_{t=1}^{n}(Y_t - \bar{Y})^2}.$

• Also, $\sigma_a^2$ is unknown. Since

$\gamma_0 = \frac{\sigma_a^2}{1 - \phi^2},$

we can obtain the MME of $\sigma_a^2$ using the variance of the process.
Setting $\gamma_0 = \hat{\gamma}_0$,

$\frac{\tilde{\sigma}_a^2}{1 - \tilde{\phi}^2} = \hat{\gamma}_0 = \frac{1}{n}\sum_{t=1}^{n}(Y_t - \bar{Y})^2$

$\Rightarrow\; \tilde{\sigma}_a^2 = \left(1 - \tilde{\phi}^2\right)\frac{1}{n}\sum_{t=1}^{n}(Y_t - \bar{Y})^2 = \left(1 - \hat{\rho}_1^2\right)\frac{1}{n}\sum_{t=1}^{n}(Y_t - \bar{Y})^2.$
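The AR(1) moment estimators can be sketched on simulated data (the true values $\phi = 0.6$, $\sigma_a^2 = 1$, the seed, and all names are my choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# simulate an AR(1) with phi = 0.6 and sigma_a^2 = 1 (values are my choice)
n, phi = 2000, 0.6
a = rng.normal(size=n + 100)
y = np.zeros(n + 100)
for t in range(1, n + 100):
    y[t] = phi * y[t-1] + a[t]
y = y[100:]                                           # drop burn-in

yb = y - y.mean()
rho1_hat = (yb[1:] * yb[:-1]).sum() / (yb**2).sum()   # sample lag-1 autocorrelation
phi_mme = rho1_hat                                    # phi-tilde = rho1-hat
gamma0_hat = (yb**2).mean()                           # sample variance gamma0-hat
sigma2_mme = (1 - phi_mme**2) * gamma0_hat            # (1 - rho1-hat^2) * gamma0-hat
```

Both estimates should fall near the true values, with sampling error of order $1/\sqrt{n}$.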
• AR(2): $Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + a_t$. Find the MME of all unknown parameters.
• Using the Yule-Walker equations:

$\rho_1 = \phi_1 + \phi_2\rho_1 \;\Rightarrow\; \rho_1 = \frac{\phi_1}{1 - \phi_2}$

$\rho_2 = \phi_1\rho_1 + \phi_2 \;\Rightarrow\; \rho_2 = \frac{\phi_1^2}{1 - \phi_2} + \phi_2$
• So, equate the population autocorrelations to the sample autocorrelations and solve for $\phi_1$ and $\phi_2$:

$\hat{\rho}_1 = \frac{\phi_1}{1 - \phi_2}, \qquad \hat{\rho}_2 = \frac{\phi_1^2}{1 - \phi_2} + \phi_2.$
Solving these two equations gives

$\tilde{\phi}_1 = \frac{\hat{\rho}_1(1 - \hat{\rho}_2)}{1 - \hat{\rho}_1^2} \qquad\text{and}\qquad \tilde{\phi}_2 = \frac{\hat{\rho}_2 - \hat{\rho}_1^2}{1 - \hat{\rho}_1^2}.$

• To obtain the MME of $\sigma_a^2$, use the process variance formula:

$\tilde{\sigma}_a^2 = \hat{\gamma}_0\left(1 - \tilde{\phi}_1\hat{\rho}_1 - \tilde{\phi}_2\hat{\rho}_2\right).$
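The closed-form AR(2) solution above is easy to encode. A small sketch (the function name is mine), sanity-checked on the exact ACF of an AR(2) with $\phi = (0.5, 0.3)$:

```python
def ar2_mme(rho1, rho2, gamma0):
    """Yule-Walker / method-of-moments estimates for an AR(2), from the
    sample autocorrelations rho1, rho2 and the sample variance gamma0."""
    denom = 1.0 - rho1**2
    phi1 = rho1 * (1.0 - rho2) / denom        # phi1-tilde = rho1(1 - rho2)/(1 - rho1^2)
    phi2 = (rho2 - rho1**2) / denom           # phi2-tilde = (rho2 - rho1^2)/(1 - rho1^2)
    sigma2 = gamma0 * (1.0 - phi1 * rho1 - phi2 * rho2)
    return phi1, phi2, sigma2

# exact ACF of an AR(2) with phi = (0.5, 0.3):
# rho1 = phi1/(1 - phi2) = 5/7, rho2 = phi1*rho1 + phi2
rho1 = 5 / 7
rho2 = 0.5 * rho1 + 0.3
phi1, phi2, sigma2 = ar2_mme(rho1, rho2, gamma0=1.0)   # -> 0.5, 0.3
```

Plugging in the theoretical autocorrelations recovers the true coefficients exactly, confirming the algebra.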
In terms of the sample autocovariances:

• AR(1):

$\hat{\phi} = \frac{\hat{\gamma}_1}{\hat{\gamma}_0}, \qquad \hat{\sigma}_a^2 = \hat{\gamma}_0 - \hat{\phi}\hat{\gamma}_1.$

• AR(2):

$\begin{pmatrix} \hat{\gamma}_0 & \hat{\gamma}_1 \\ \hat{\gamma}_1 & \hat{\gamma}_0 \end{pmatrix}\begin{pmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \end{pmatrix} = \begin{pmatrix} \hat{\gamma}_1 \\ \hat{\gamma}_2 \end{pmatrix}, \qquad \hat{\sigma}_a^2 = \hat{\gamma}_0 - \hat{\phi}_1\hat{\gamma}_1 - \hat{\phi}_2\hat{\gamma}_2.$
• MA(1): $Y_t = a_t - \theta a_{t-1}$.
• Again using the autocorrelation of the series at lag 1, set $\rho_1 = \hat{\rho}_1$:

$\hat{\rho}_1 = \frac{-\theta}{1 + \theta^2} \;\Rightarrow\; \hat{\rho}_1\theta^2 + \theta + \hat{\rho}_1 = 0$

$\Rightarrow\; \tilde{\theta}_{1,2} = \frac{-1 \pm \sqrt{1 - 4\hat{\rho}_1^2}}{2\hat{\rho}_1}.$

Choose the root satisfying the invertibility condition $|\tilde{\theta}| < 1$.
• For real roots, we need

$1 - 4\hat{\rho}_1^2 \ge 0 \;\Leftrightarrow\; \hat{\rho}_1^2 \le 0.25 \;\Leftrightarrow\; -0.5 \le \hat{\rho}_1 \le 0.5.$

If $|\hat{\rho}_1| = 0.5$, there is a unique real root, but it is non-invertible.
If $|\hat{\rho}_1| > 0.5$, no real root exists and the MME fails.
If $|\hat{\rho}_1| < 0.5$, there are two real roots, exactly one of which is invertible.
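The invertible-root selection can be sketched as follows, assuming the convention $Y_t = a_t - \theta a_{t-1}$ so that $\rho_1 = -\theta/(1+\theta^2)$ and the quadratic is $\hat{\rho}_1\theta^2 + \theta + \hat{\rho}_1 = 0$ (the function name is mine):

```python
import numpy as np

def ma1_mme(rho1):
    """MME for the MA(1) Y_t = a_t - theta*a_{t-1}: solve
    rho1*theta^2 + theta + rho1 = 0 and keep the invertible root (|theta| < 1).
    Returns None when |rho1| > 0.5 (no real root: the MME fails)."""
    if abs(rho1) > 0.5:
        return None
    if rho1 == 0.0:
        return 0.0
    if abs(rho1) == 0.5:
        return -1.0 / (2.0 * rho1)           # double root, |theta| = 1: non-invertible
    roots = np.roots([rho1, 1.0, rho1])      # the two real roots are reciprocals
    return float(np.real(roots[np.abs(roots) < 1][0]))

# theta = 0.5 gives rho1 = -theta/(1 + theta^2) = -0.4; the MME recovers it:
theta_hat = ma1_mme(-0.4)    # -> 0.5
```

Because the product of the two roots equals 1, exactly one root lies inside the unit circle whenever $|\hat{\rho}_1| < 0.5$, which is why selecting by $|\theta| < 1$ is well defined.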
• This example shows that the MMEs for MA and
ARMA models are complicated.
• More generally, regardless of AR, MA or ARMA
models, the MMEs are sensitive to rounding
errors. They are usually used to provide initial
estimates needed for a more efficient nonlinear
estimation method.
• The moment estimators are not recommended
for final estimation results and should not be
used if the process is close to being nonstationary
or noninvertible.
THE MAXIMUM LIKELIHOOD ESTIMATION
• Assume that $a_t \overset{i.i.d.}{\sim} N(0, \sigma_a^2)$.
• By this assumption we can use the joint pdf

$f(a_1, \dots, a_n) = f(a_1)\cdots f(a_n)$

instead of $f(y_1, \dots, y_n)$, which cannot be written as a product of marginal pdfs because of the dependency between time series observations.
MLE METHOD
• For the general stationary ARMA(p,q) model,

$\dot{Y}_t = \phi_1\dot{Y}_{t-1} + \dots + \phi_p\dot{Y}_{t-p} + a_t - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q}$

or

$a_t = \dot{Y}_t - \phi_1\dot{Y}_{t-1} - \dots - \phi_p\dot{Y}_{t-p} + \theta_1 a_{t-1} + \dots + \theta_q a_{t-q},$

where $\dot{Y}_t = Y_t - \mu$.
• The joint pdf of $(a_1, a_2, \dots, a_n)$ is given by

$f(a_1, \dots, a_n \mid \phi, \mu, \theta, \sigma_a^2) = (2\pi\sigma_a^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma_a^2}\sum_{t=1}^{n} a_t^2\right),$

where $\phi = (\phi_1, \dots, \phi_p)$ and $\theta = (\theta_1, \dots, \theta_q)$.
• Let $Y = (Y_1, \dots, Y_n)'$ and assume that the initial conditions $Y_* = (Y_{1-p}, \dots, Y_0)'$ and $a_* = (a_{1-q}, \dots, a_0)'$ are known.
• The conditional log-likelihood function is given by

$\ln L_*(\phi, \mu, \theta, \sigma_a^2) = -\frac{n}{2}\ln(2\pi\sigma_a^2) - \frac{S_*(\phi, \mu, \theta)}{2\sigma_a^2},$

where

$S_*(\phi, \mu, \theta) = \sum_{t=1}^{n} a_t^2\,(\phi, \mu, \theta \mid Y_*, a_*, Y)$

is the conditional sum of squares.

Initial conditions: $Y_* = \bar{Y}$ and $a_* = E(a_t) = 0$.
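Given the zero initial conditions stated above, $S_*$ can be computed by a simple recursion. A minimal Python sketch for an ARMA(1,1) (the helper function is mine; only the recursion and the initial conditions come from the slide):

```python
import numpy as np

def conditional_ss(y, phi, theta, mu=None):
    """Conditional sum of squares S*(phi, mu, theta) for an ARMA(1,1)
    written as Ydot_t = phi*Ydot_{t-1} + a_t - theta*a_{t-1}, so that
    a_t = Ydot_t - phi*Ydot_{t-1} + theta*a_{t-1}.
    Initial conditions as on the slide: a_0 = 0 and Ydot_0 = 0 (Y_0 = Y-bar)."""
    if mu is None:
        mu = y.mean()
    yd = y - mu
    prev_y, prev_a, ss = 0.0, 0.0, 0.0
    for t in range(len(y)):
        a_t = yd[t] - phi * prev_y + theta * prev_a
        ss += a_t**2
        prev_y, prev_a = yd[t], a_t
    return ss
```

In practice one would minimize `conditional_ss` over $(\phi, \theta, \mu)$ with a numerical optimizer, exactly the nonlinear step described next.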
• Then, we can find the estimators of $\phi = (\phi_1, \dots, \phi_p)$, $\theta = (\theta_1, \dots, \theta_q)$ and $\mu$ such that the conditional likelihood function is maximized. Usually, numerical nonlinear optimization techniques are required. After obtaining all the estimators,

$\hat{\sigma}_a^2 = \frac{S_*(\hat{\phi}, \hat{\mu}, \hat{\theta})}{d.f.},$

where d.f. = (number of terms used in the sum of squares) $-$ (number of parameters) $= (n - p) - (p + q + 1) = n - (2p + q + 1)$.
• AR(1): $Y_t = \phi Y_{t-1} + a_t$ where $a_t \overset{i.i.d.}{\sim} N(0, \sigma_a^2)$.

$f(a_1, \dots, a_n) = (2\pi\sigma_a^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma_a^2}\sum_{t=1}^{n} a_t^2\right)$

Changing variables from the $a_t$'s to the $Y_t$'s:

$Y_1 = \phi Y_0 + a_1$ (let's take $Y_0 = 0$)
$Y_2 = \phi Y_1 + a_2 \;\Rightarrow\; a_2 = Y_2 - \phi Y_1$
$Y_3 = \phi Y_2 + a_3 \;\Rightarrow\; a_3 = Y_3 - \phi Y_2$
$\vdots$
$Y_n = \phi Y_{n-1} + a_n \;\Rightarrow\; a_n = Y_n - \phi Y_{n-1}$
• The Jacobian of the transformation $(a_2, \dots, a_n) \to (Y_2, \dots, Y_n)$ is

$J = \begin{vmatrix} \dfrac{\partial a_2}{\partial Y_2} & \cdots & \dfrac{\partial a_2}{\partial Y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial a_n}{\partial Y_2} & \cdots & \dfrac{\partial a_n}{\partial Y_n} \end{vmatrix} = \begin{vmatrix} 1 & 0 & \cdots & 0 \\ -\phi & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & -\phi & 1 \end{vmatrix} = 1,$

so that

$f(Y_2, \dots, Y_n \mid Y_1) = f(a_2, \dots, a_n)\,|J| = f(a_2, \dots, a_n).$
• Then, the likelihood function can be written as

$L(\phi, \sigma_a^2) = f(Y_1, \dots, Y_n) = f(Y_1)\,f(Y_2, \dots, Y_n \mid Y_1) = f(Y_1)\,f(a_2, \dots, a_n)$

$= (2\pi\gamma_0)^{-1/2}\exp\!\left(-\frac{(Y_1 - 0)^2}{2\gamma_0}\right)(2\pi\sigma_a^2)^{-(n-1)/2}\exp\!\left(-\frac{1}{2\sigma_a^2}\sum_{t=2}^{n}(Y_t - \phi Y_{t-1})^2\right),$

where $Y_1 \sim N\!\left(0,\; \gamma_0 = \dfrac{\sigma_a^2}{1 - \phi^2}\right)$.
• Hence,

$L(\phi, \sigma_a^2) = (2\pi\sigma_a^2)^{-n/2}(1 - \phi^2)^{1/2}\exp\!\left\{-\frac{1}{2\sigma_a^2}\left[\sum_{t=2}^{n}(Y_t - \phi Y_{t-1})^2 + (1 - \phi^2)Y_1^2\right]\right\}.$

• The log-likelihood function:

$\ln L(\phi, \sigma_a^2) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma_a^2 + \frac{1}{2}\ln(1 - \phi^2) - \frac{1}{2\sigma_a^2}\Bigg[\underbrace{\sum_{t=2}^{n}(Y_t - \phi Y_{t-1})^2}_{S_*(\phi)} + (1 - \phi^2)Y_1^2\Bigg],$

where the bracketed quantity is $S(\phi)$.
• Here, $S_*(\phi)$ is the conditional sum of squares and $S(\phi)$ is the unconditional sum of squares.
• To find the value of $\phi$ where the likelihood function is maximized, solve

$\frac{\partial \ln L(\phi, \sigma_a^2)}{\partial \phi} = 0 \;\Rightarrow\; \hat{\phi}.$

• Then,

$\hat{\sigma}_a^2 = \frac{S(\hat{\phi})}{n}.$
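The maximization can be illustrated numerically: profile out $\sigma_a^2$ (for fixed $\phi$ the maximizer is $\hat{\sigma}^2 = S(\phi)/n$) and search the resulting profile log-likelihood. A crude grid search on simulated data (the true values, seed, grid, and names are all my choices; a real fit would use an optimizer):

```python
import numpy as np

rng = np.random.default_rng(7)

# simulated AR(1) data with phi = 0.7, sigma_a^2 = 1 (my choice of values)
n, phi_true = 500, 0.7
a = rng.normal(size=n + 100)
y = np.zeros(n + 100)
for t in range(1, n + 100):
    y[t] = phi_true * y[t-1] + a[t]
y = y[100:]

def profile_loglik(phi, y):
    """Exact AR(1) log-likelihood with sigma_a^2 profiled out: for fixed phi
    the maximizer is sigma2 = S(phi)/n, leaving (up to an additive constant)
    -n/2 * log(S(phi)/n) + 1/2 * log(1 - phi^2)."""
    n = len(y)
    S = ((y[1:] - phi * y[:-1])**2).sum() + (1 - phi**2) * y[0]**2  # unconditional SS
    return -0.5 * n * np.log(S / n) + 0.5 * np.log(1 - phi**2)

grid = np.linspace(-0.99, 0.99, 1981)
phi_mle = grid[np.argmax([profile_loglik(p, y) for p in grid])]
S_hat = ((y[1:] - phi_mle * y[:-1])**2).sum() + (1 - phi_mle**2) * y[0]**2
sigma2_mle = S_hat / len(y)                 # sigma2-hat = S(phi-hat)/n
```

The extra $\frac{1}{2}\ln(1-\phi^2)$ term is what separates the exact MLE from the least-squares variants discussed next.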
• If we neglect $\ln(1 - \phi^2)$, then maximizing $L(\phi, \sigma_a^2)$ is equivalent to minimizing the unconditional sum of squares $S(\phi)$ (unconditional LSE).
• If we neglect both $\ln(1 - \phi^2)$ and the term $(1 - \phi^2)Y_1^2$, then maximizing $L(\phi, \sigma_a^2)$ is equivalent to minimizing the conditional sum of squares $S_*(\phi)$ (conditional LSE).
• The MLE is asymptotically unbiased, efficient, consistent, and sufficient for large sample sizes, but the joint pdf can be hard to deal with.
CONDITIONAL LEAST SQUARES ESTIMATION
• AR(1): $Y_t = \phi Y_{t-1} + a_t$ where $a_t \overset{i.i.d.}{\sim} N(0, \sigma_a^2)$, so $a_t = Y_t - \phi Y_{t-1}$.

$SSE = \sum_{t=2}^{n} a_t^2 = \sum_{t=2}^{n}(Y_t - \phi Y_{t-1})^2 = S_*(\phi)$ for observed $Y_1, \dots, Y_n$.

$\frac{dS_*(\phi)}{d\phi} = -2\sum_{t=2}^{n} Y_{t-1}(Y_t - \phi Y_{t-1}) = 0$

$\Rightarrow\; \hat{\phi} = \frac{\sum_{t=2}^{n} Y_t Y_{t-1}}{\sum_{t=2}^{n} Y_{t-1}^2}.$
CONDITIONAL LSE
• If the process mean is different from zero, replace $Y_t$ by $Y_t - \bar{Y}$:

$\hat{\phi} = \frac{\sum_{t=2}^{n}(Y_t - \bar{Y})(Y_{t-1} - \bar{Y})}{\sum_{t=2}^{n}(Y_{t-1} - \bar{Y})^2} \approx \text{MME of } \phi.$
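A quick numerical check of the closed-form conditional LSE on simulated AR(1) data ($\phi = 0.5$, the seed, and all names are my choices); the mean-corrected variant is essentially the MME:

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated AR(1) with phi = 0.5 (values are my choice)
n, phi_true = 1000, 0.5
a = rng.normal(size=n + 100)
y = np.zeros(n + 100)
for t in range(1, n + 100):
    y[t] = phi_true * y[t-1] + a[t]
y = y[100:]

# zero-mean conditional LSE:  phi-hat = sum Y_t Y_{t-1} / sum Y_{t-1}^2
phi_cls = (y[1:] * y[:-1]).sum() / (y[:-1]**2).sum()

# mean-corrected version, essentially the MME of phi
yb = y - y.mean()
phi_cls_mean = (yb[1:] * yb[:-1]).sum() / (yb[:-1]**2).sum()
```

Since the simulated process has mean zero, the two estimates agree closely; with a nonzero mean only the corrected version would be appropriate.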
• MA(1): $Y_t = a_t - \theta a_{t-1}$, $|\theta| < 1$, $a_t \sim$ normal white noise $(0, \sigma_a^2)$. Inverting to the AR($\infty$) form,

$a_t = Y_t + \theta Y_{t-1} + \theta^2 Y_{t-2} + \cdots$

– Non-linear in terms of the parameters
– A nonlinear LS problem: $S_*(\theta)$ cannot be minimized analytically
– Numerical nonlinear optimization methods are needed, e.g. Newton-Raphson or Gauss-Newton
• There are similar problems in the ARMA case.
UNCONDITIONAL LSE
• Minimize $S(\phi)$ with respect to $\phi$, where

$S(\phi) = \sum_{t=2}^{n}(Y_t - \phi Y_{t-1})^2 + (1 - \phi^2)Y_1^2, \qquad \frac{dS(\phi)}{d\phi} = 0 \;\Rightarrow\; \hat{\phi}.$

• This equation is nonlinear in $\phi$.
• We need nonlinear optimization techniques.
BACKCASTING METHOD
• Obtain the backward form of the ARMA(p,q) process,

$(1 - \phi_1 F - \dots - \phi_p F^p)Y_t = (1 - \theta_1 F - \dots - \theta_q F^q)a_t,$

where $F^j Y_t = Y_{t+j}$.
• Instead of forecasting, backcast the past values of $Y_t$ and $a_t$ for $t \le 0$. Obtain the unconditional log-likelihood function, then obtain the estimators.
EXAMPLE
• If there are only 2 observations in the time series (not realistic):

$Y_1 = a_1$
$Y_2 = a_2 - \theta a_1,$

where $a_1, a_2 \overset{i.i.d.}{\sim} N(0, \sigma_a^2)$. Find the MLE of $\theta$ and $\sigma_a^2$.
EXAMPLE
• US Quarterly Beer Production from 1975 to 1997
> par(mfrow=c(1,3))
> plot(beer)
> acf(as.vector(beer),lag.max=36)
> pacf(as.vector(beer),lag.max=36)
EXAMPLE (contd.)
> library(uroot)
Warning message: package 'uroot' was built under R version 2.13.0
> HEGY.test(wts =beer, itsd = c(1, 1, c(1:3)), regvar = 0,selectlags = list(mode = "bic", Pmax = 12))
Null hypothesis: Unit root.
Alternative hypothesis: Stationarity.
---HEGY statistics:
Stat. p-value
tpi_1 -3.339 0.085
tpi_2 -5.944 0.010
Fpi_3:4 13.238 0.010
> CH.test(beer)
------ - ------ ---Canova & Hansen test
------ - ------ ---Null hypothesis: Stationarity.
Alternative hypothesis: Unit root.
L-statistic: 0.817
Critical values:
0.10 0.05 0.025 0.01
0.846 1.01 1.16 1.35
EXAMPLE (contd.)
> plot(diff(beer),ylab='First Difference of Beer Production',xlab='Time')
> acf(as.vector(diff(beer)),lag.max=36)
> pacf(as.vector(diff(beer)),lag.max=36)
EXAMPLE (contd.)
> HEGY.test(wts =diff(beer), itsd = c(1, 1, c(1:3)), regvar = 0,selectlags = list(mode = "bic", Pmax = 12))
---- ---HEGY test
---- ---Null hypothesis: Unit root.
Alternative hypothesis: Stationarity.
---HEGY statistics:
Stat. p-value
tpi_1 -6.067 0.01
tpi_2 -1.503 0.10
Fpi_3:4 9.091 0.01
Fpi_2:4 7.136 NA
Fpi_1:4 26.145 NA
EXAMPLE (contd.)
> fit1=arima(beer,order=c(3,1,0),seasonal=list(order=c(2,0,0), period=4))
> fit1
Call:
arima(x = beer, order = c(3, 1, 0), seasonal = list(order = c(2, 0, 0), period = 4))
Coefficients:
         ar1      ar2      ar3    sar1    sar2
     -0.7380  -0.6939  -0.2299  0.2903  0.6694
s.e.  0.1056   0.1206   0.1206  0.0882  0.0841
sigma^2 estimated as 1.79: log likelihood = -161.55, aic = 335.1
> fit2=arima(beer,order=c(3,1,0),seasonal=list(order=c(3,0,0), period=4))
> fit2
Call:
arima(x = beer, order = c(3, 1, 0), seasonal = list(order = c(3, 0, 0), period = 4))
Coefficients:
         ar1      ar2      ar3    sar1    sar2    sar3
     -0.8161  -0.8035  -0.3529  0.0444  0.5798  0.3387
s.e.  0.1065   0.1188   0.1219  0.1205  0.0872  0.1210
sigma^2 estimated as 1.646: log likelihood = -158.01, aic = 330.01