
STAT 497 LECTURE NOTES 4: MODEL IDENTIFICATION AND NON-STATIONARY TIME SERIES MODELS


MODEL IDENTIFICATION

• We have learned a large class of linear parametric models for stationary time series processes.

• Now, the question is how to find the most suitable model for a given observed series, i.e., how to choose the appropriate orders p and q.



• The ACF and PACF show specific patterns for specific models, so we can use them as criteria to identify a suitable model.

• Using the patterns of the sample ACF and sample PACF, we can identify the model.


MODEL SELECTION THROUGH CRITERIA

• Besides the sACF and sPACF plots, we have other tools for model identification.

• With messy real data, sACF and sPACF plots become complicated and harder to interpret.

• Remember to choose the best model with as few parameters as possible.

• Many different models can fit the same data, so we should choose the most appropriate (most parsimonious) one; the information criteria help us decide.



• The three well-known information criteria are:
– Akaike's Information Criterion (AIC) (Akaike, 1974)
– Schwarz's Bayesian Criterion (SBC) (Schwarz, 1978), also known as the Bayesian Information Criterion (BIC)
– Hannan-Quinn Criterion (HQIC) (Hannan & Quinn, 1979)

AIC

• Assume that a statistical model with M parameters is fitted to data. Then

AIC = −2 ln(maximum likelihood) + 2M.

• For an ARMA model fitted to n observations, the log-likelihood function is

ln L = −(n/2) ln(2πσ_a²) − S(φ, μ, θ)/(2σ_a²),

where S(φ, μ, θ) is the residual sum of squares, assuming a_t ~ i.i.d. N(0, σ_a²).


• Then, the maximized log-likelihood is

ln L = −(n/2) ln σ̂_a² − (n/2)(1 + ln 2π) = −(n/2) ln σ̂_a² + constant,

so that

AIC = n ln σ̂_a² + 2M.

• Choose the model (i.e., the value of M) with minimum AIC.

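As an illustration of this rule, the following Python sketch computes AIC = n ln σ̂_a² + 2M for several candidate ARMA fits and picks the minimum. The residual variance values here are hypothetical, not from the lecture's data:

```python
import math

def aic(n, sigma2_hat, m):
    # AIC up to an additive constant: n * ln(residual variance) + 2 * (number of parameters)
    return n * math.log(sigma2_hat) + 2 * m

# hypothetical (sigma2_hat, M) pairs for three candidate models
candidates = {"ARMA(1,0)": (1.02, 1), "ARMA(1,1)": (0.97, 2), "ARMA(2,1)": (0.96, 3)}
n = 100
scores = {name: aic(n, s2, m) for name, (s2, m) in candidates.items()}
best = min(scores, key=scores.get)  # model with minimum AIC
```

Note how the extra parameter of ARMA(2,1) is not repaid by its small drop in residual variance.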

SBC

• The Bayesian Information Criterion (BIC), or Schwarz Criterion (also SBC, SBIC), is a criterion for model selection among a class of parametric models with different numbers of parameters.

• When estimating model parameters using maximum likelihood estimation, it is possible to increase the likelihood by adding parameters, which may result in overfitting. The BIC resolves this problem by introducing a penalty term for the number of parameters in the model.



• In SBC, the penalty for additional parameters is stronger than that of the AIC:

SBC = n ln σ̂_a² + M ln n.

• It has the most superior large-sample properties: it is consistent, unbiased, and sufficient.


HQIC

• The Hannan-Quinn information criterion (HQIC) is an alternative to AIC and SBC.

HQIC = n ln σ̂_a² + 2M ln(ln n).

• It can be shown [see Hannan (1980)] that, even in the case of common roots in the AR and MA polynomials, the Hannan-Quinn and Schwarz criteria still select the correct orders p and q consistently.
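Since all three criteria share the same goodness-of-fit term and differ only in their penalties, they can be sketched together. This is illustrative Python with a hypothetical residual variance estimate:

```python
import math

def criteria(n, sigma2_hat, m):
    # common fit term n * ln(sigma2_hat); penalties are 2M, M ln(n), and 2M ln(ln(n))
    fit = n * math.log(sigma2_hat)
    return {"AIC": fit + 2 * m,
            "SBC": fit + m * math.log(n),
            "HQIC": fit + 2 * m * math.log(math.log(n))}

c = criteria(n=100, sigma2_hat=0.95, m=2)
```

For n = 100 the penalties order as SBC > HQIC > AIC, reflecting SBC's stronger pressure toward parsimonious models.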


THE INVERSE AUTOCORRELATION FUNCTION

• The sample inverse autocorrelation function (SIACF) plays much the same role in ARIMA modeling as the sample partial autocorrelation function (SPACF), but it generally indicates subset and seasonal autoregressive models better than the SPACF does.


• Additionally, the SIACF can be useful for detecting over-differencing. If the data come from a nonstationary or nearly nonstationary model, the SIACF has the characteristics of a noninvertible moving average. Likewise, if the data come from a model with a noninvertible moving average, then the SIACF has nonstationary characteristics and therefore decays slowly. In particular, if the data have been over-differenced, the SIACF looks like the SACF of a nonstationary process.


• Let Y_t be generated by the ARMA(p, q) process

φ_p(B) Y_t = θ_q(B) a_t,  where a_t ~ i.i.d. N(0, σ_a²).

• If θ_q(B) is invertible, then the model

θ_q(B) Z_t = φ_p(B) a_t

is also a valid ARMA(q, p) model. This model is sometimes referred to as the dual model. The autocorrelation function (ACF) of this dual model is called the inverse autocorrelation function (IACF) of the original model.



• Notice that if the original model is a pure autoregressive model, then the IACF is the ACF of a pure moving-average model. Thus it cuts off sharply when the lag is greater than p; this behavior is similar to that of the partial autocorrelation function (PACF).

• Under certain conditions, the sampling distribution of the SIACF can be approximated by the sampling distribution of the SACF of the dual model (Bhansali, 1980). In the plots generated by ARIMA, the confidence limit marks (.) are located at ±2n^(−1/2). These limits bound an approximate 95% confidence interval for the hypothesis that the data come from a white noise process.


EXAMPLE USING SIMULATED SERIES 1

• Simulated 100 observations from an AR(1) process with φ = 0.5.

• SAS output:

Autocorrelations
Lag Covariance Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1 Std Error
0 1.498817 1.00000 | |********************| 0
1 0.846806 0.56498 | . |*********** | 0.100000

2 0.333838 0.22273 | . |****. | 0.128000

3 0.123482 0.08239 | . |** . | 0.131819

4 0.039922 0.02664 | . |* . | 0.132333

5 -0.110372 -.07364 | . *| . | 0.132387

6 -0.162723 -.10857 | . **| . | 0.132796

7 -0.301279 -.20101 | .****| . | 0.133680

8 -0.405986 -.27087 | *****| . | 0.136670

9 -0.318727 -.21265 | . ****| . | 0.141937

10 -0.178869 -.11934 | . **| . | 0.145088

11 -0.162342 -.10831 | . **| . | 0.146066

12 -0.180087 -.12015 | . **| . | 0.146867

13 -0.132600 -.08847 | . **| . | 0.147847

14 0.026849 0.01791 | . | . | 0.148375

15 0.175556 0.11713 | . |** . | 0.148397


Inverse Autocorrelations

Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1 -0.50606 | **********| . |
2 0.09196 | . |** . |
3 0.06683 | . |* . |
4 -0.14221 | .***| . |
5 0.16250 | . |***. |
6 -0.07833 | . **| . |
7 -0.02154 | . | . |
8 0.10714 | . |** . |
9 -0.03611 | . *| . |
10 0.03881 | . |* . |
11 -0.04858 | . *| . |
12 0.00989 | . | . |
13 0.09922 | . |** . |
14 -0.09950 | . **| . |
15 0.11284 | . |** . |

Partial Autocorrelations

Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1 0.56498 | . |*********** |
2 -0.14170 | .***| . |
3 0.02814 | . |* . |
4 -0.01070 | . | . |
5 -0.11912 | . **| . |
6 -0.00838 | . | . |
7 -0.17970 | ****| . |
8 -0.11159 | . **| . |
9 0.02214 | . | . |
10 -0.01280 | . | . |
11 -0.07174 | . *| . |
12 -0.06860 | . *| . |
13 -0.02706 | . *| . |
14 0.07718 | . |** . |
15 0.04869 | . |* . |


EXAMPLE USING SIMULATED SERIES 2

• Simulated 100 observations from an AR(1) process with φ = 0.5, then took a first-order difference.

SAS output

Autocorrelations

Lag Covariance Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1 Std Error
0 1.301676 1.00000 | |********************| 0
1 -0.133104 -.10226 | . **| . | 0.100504

2 -0.296746 -.22797 | *****| . | 0.101549

3 -0.131524 -.10104 | . **| . | 0.106593

4 0.080946 0.06219 | . |* . | 0.107557

5 -0.116677 -.08964 | . **| . | 0.107919

6 0.080503 0.06185 | . |* . | 0.108669

7 -0.016109 -.01238 | . | . | 0.109024

8 -0.176930 -.13592 | .***| . | 0.109038

9 -0.055488 -.04263 | . *| . | 0.110736

10 0.136477 0.10485 | . |** . | 0.110902

11 0.022838 0.01754 | . | . | 0.111898

12 -0.067697 -.05201 | . *| . | 0.111926

13 -0.117708 -.09043 | . **| . | 0.112170

14 0.013985 0.01074 | . | . | 0.112904

15 0.0086790 0.00667 | . | . | 0.112914


Inverse Autocorrelations

Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1 0.58314 | . |************ |
2 0.60399 | . |************ |
3 0.56860 | . |*********** |
4 0.46544 | . |********* |
5 0.51176 | . |********** |
6 0.43134 | . |********* |
7 0.40776 | . |******** |
8 0.42360 | . |******** |
9 0.36581 | . |******* |
10 0.33397 | . |******* |
11 0.28672 | . |****** |
12 0.27159 | . |***** |
13 0.26072 | . |***** |
14 0.16769 | . |***. |
15 0.17107 | . |***. |

Partial Autocorrelations

Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1 -0.10226 | . **| . |
2 -0.24095 | *****| . |
3 -0.16587 | .***| . |
4 -0.03460 | . *| . |
5 -0.16453 | .***| . |
6 0.01299 | . | . |
7 -0.06425 | . *| . |
8 -0.18066 | ****| . |
9 -0.11338 | . **| . |
10 -0.03592 | . *| . |
11 -0.05754 | . *| . |
12 -0.08183 | . **| . |
13 -0.17169 | .***| . |
14 -0.11056 | . **| . |
15 -0.13018 | .***| . |

THE EXTENDED SAMPLE AUTOCORRELATION FUNCTION (ESACF)

• The extended sample autocorrelation function (ESACF) method can tentatively identify the orders of a stationary or nonstationary ARMA process based on iterated least squares estimates of the autoregressive parameters. Tsay and Tiao (1984) proposed the technique.


ESACF

• Consider the ARMA(p, q) model

(1 − φ_1 B − … − φ_p B^p) Y_t = θ_0 + (1 − θ_1 B − … − θ_q B^q) a_t

or

Y_t = θ_0 + φ_1 Y_{t−1} + … + φ_p Y_{t−p} + a_t − θ_1 a_{t−1} − … − θ_q a_{t−q}.

Then

Z_t = (1 − φ_1 B − … − φ_p B^p) Y_t

follows an MA(q) model:

Z_t = θ_0 + a_t − θ_1 a_{t−1} − … − θ_q a_{t−q}.



• Given a stationary or nonstationary time series Y_t with a true autoregressive order of p + d and a true moving-average order of q, we can use the ESACF method to estimate the unknown orders by analyzing the sample autocorrelation functions associated with filtered series of the form

Z_t(m, j) = Y_t − φ̂_1(m, j) Y_{t−1} − … − φ̂_m(m, j) Y_{t−m},

where the φ̂_i(m, j) are the autoregressive parameter estimates under the assumption that the series is an ARMA(m, j) process.



• It is known that OLS estimators for an ARMA process are not consistent, so an iterative procedure is proposed to overcome this:

φ̂_i(m, j) = φ̂_i(m+1, j−1) − φ̂_{i−1}(m, j−1) · φ̂_{m+1}(m+1, j−1) / φ̂_m(m, j−1).

• The j-th lag of the sample autocorrelation function of the filtered series Z_t(m, j) is the extended sample autocorrelation function, denoted r_j(m).


ESACF TABLE

          MA
AR    0        1        2        3       …
0     r_1(0)   r_2(0)   r_3(0)   r_4(0)  …
1     r_1(1)   r_2(1)   r_3(1)   r_4(1)  …
2     r_1(2)   r_2(2)   r_3(2)   r_4(2)  …
3     r_1(3)   r_2(3)   r_3(3)   r_4(3)  …
…     …        …        …        …


• For an ARMA(p, q) process, we have the following convergence in probability: for m = 1, 2, … and j = 1, 2, …, the entry in row m, column j of the ESACF table satisfies

r_j(m) → 0,        if 0 ≤ m − p ≤ j − q,
r_j(m) → X ≠ 0,    otherwise.



• Thus, the asymptotic ESACF table for an ARMA(1,1) model becomes

          MA
AR    0   1   2   3   4   …
0     X   X   X   X   X   …
1     X   0   0   0   0   …
2     X   X   0   0   0   …
3     X   X   X   0   0   …
4     X   X   X   X   0   …
…     …   …   …   …   …


• In practice we have finite samples, and the r_j(m) with 0 ≤ m − p ≤ j − q may not be exactly zero; however, we can use Bartlett's approximate formula for their variance.

• The orders are tentatively identified by finding a right (maximal) triangular pattern with vertices located at (p + d, q) and (p + d, q_max), in which all elements are insignificant (based on the asymptotic normality of the autocorrelation function). The vertex (p + d, q) identifies the order.


EXAMPLE (R CODE)

> x = arima.sim(list(order = c(2,0,0), ar = c(-0.2, 0.6)), n = 200)
> par(mfrow = c(1, 2))
> acf(x)
> pacf(x)

EXAMPLE (CONTD.)

• After loading package TSA in R:

> eacf(x)
AR/MA 0 1 2 3 4 5 6 7 8 9 10 11 12 13
0     x x x x x x x x x x x  x  x  x
1     x x x x x o o o o o o  o  o  o
2     o o o o o o o o o o o  o  o  o
3     x o o o o o o o o o o  o  o  o
4     x x o o o o o o o o o  o  o  o
5     x o x o o o o o o o o  o  o  o
6     x x o x o o o o o o o  o  o  o
7     x x o x o o o o o o o  o  o  o

MINIMUM INFORMATION CRITERION

MINIC TABLE

          MA
AR    0         1         2         3        …
0     SBC(0,0)  SBC(0,1)  SBC(0,2)  SBC(0,3)  …
1     SBC(1,0)  SBC(1,1)  SBC(1,2)  SBC(1,3)  …
2     SBC(2,0)  SBC(2,1)  SBC(2,2)  SBC(2,3)  …
3     SBC(3,0)  SBC(3,1)  SBC(3,2)  SBC(3,3)  …
…     …         …         …         …

MINIC EXAMPLE

• Simulated 100 observations from an AR(1) process with φ = 0.5.

SAS Output

Minimum Information Criterion

Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5
AR 0 0.366884 0.074617 0.06748 0.083827 0.11816 0.161974

AR 1 -0.03571 -0.00042 0.038633 0.027826 0.064904 0.097701

AR 2 -0.0163 0.021657 0.064698 0.072834 0.107481 0.140204

AR 3 0.001216 0.034056 0.080065 0.118677 0.152146 0.183487

AR 4 0.037894 0.069766 0.115222 0.14586 0.189454 0.229528

AR 5 0.065179 0.099543 0.143406 0.185604 0.230186 0.272322

Error series model: AR(8) Minimum Table Value: BIC(1,0) = -0.03571


NON-STATIONARY TIME SERIES MODELS

• Non-constant in mean
• Non-constant in variance
• Both


• Inspection of the ACF serves as a rough indicator of whether a trend is present in a series. A slow decay in the ACF indicates a large characteristic root: a true unit-root process or a trend-stationary process.

• Formal tests can help determine whether a system contains a trend and whether the trend is deterministic or stochastic.


NON-STATIONARITY IN MEAN

• Deterministic trend → detrending
• Stochastic trend → differencing

DETERMINISTIC TREND

• A deterministic trend is present when the series is an explicit function of time.

• Using a simple linear trend model, the deterministic (global) trend can be estimated. This approach is very simple and assumes that the pattern represented by the linear trend remains fixed over the observed time span of the series. A simple linear trend model:

Y_t = α + βt + a_t



• The parameter β in Y_t = α + βt + a_t measures the average change in Y_t from one period to the next:

ΔY_t = Y_t − Y_{t−1} = β + (a_t − a_{t−1}),  so  E(ΔY_t) = β.

• The sequence {Y_t} will exhibit only temporary departures from the trend line α + βt. This type of model is called a trend-stationary (TS) model.



TREND STATIONARY

• If a series has a deterministic time trend, we simply regress Y_t on an intercept and a time trend (t = 1, 2, …, n) and save the residuals. The residuals form the detrended series.

• If the trend is stochastic, this procedure does not necessarily yield a stationary series.



• Many economic series exhibit "exponential trend/growth": they grow over time like an exponential function of time instead of a linear function.

• For such series, we want to work with the log of the series:

ln Y_t = α + βt + a_t,

so the average growth rate is β = E(Δ ln Y_t).


• A standard regression model can be used to describe the phenomenon. If the deterministic trend can be described by a k-th order polynomial of time, the model of the process is

Y_t = α_0 + α_1 t + α_2 t² + … + α_k t^k + a_t,  where a_t ~ WN(0, σ_a²).

• Estimate the parameters and obtain the residuals. The residuals give you the detrended series.



• This model has a short memory: if a shock hits the series, it returns to the trend level within a short time, so the best forecasts are not affected.

• A model like this is rarely useful in practice. A more realistic model involves a stochastic (local) trend.


STOCHASTIC TREND

• A more modern approach is to treat the trend in a time series as variable. A variable trend exists when a trend changes in an unpredictable way; therefore, it is considered stochastic.



• Recall the AR(1) model: Y_t = c + φY_{t−1} + a_t.

• As long as |φ| < 1, everything is fine (OLS is consistent, t-statistics are asymptotically normal, …).

• Now consider the extreme case where φ = 1, i.e. Y_t = c + Y_{t−1} + a_t.

• Where is the trend? There is no t term.



• Let us recursively substitute for the lagged Y_t on the right-hand side:

Y_t = c + Y_{t−1} + a_t
    = c + (c + Y_{t−2} + a_{t−1}) + a_t
    = …
    = tc + Y_0 + Σ_{i=1}^{t} a_i,

where tc is the deterministic trend component.

• This is what we call a "random walk with drift". If c = 0, it is a "random walk".
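The recursion can be checked numerically. This Python sketch (with an arbitrary seed and drift value) verifies that building the walk step by step matches the closed form tc + Y_0 + Σ a_i:

```python
import random

random.seed(1)
c, y0, t = 0.5, 0.0, 50
shocks = [random.gauss(0, 1) for _ in range(t)]

# recursive construction: Y_t = c + Y_{t-1} + a_t
y = y0
for a in shocks:
    y = c + y + a

# closed form: deterministic trend t*c plus accumulated shocks
closed = t * c + y0 + sum(shocks)
```

The two agree up to floating-point error, illustrating how every shock stays in the level forever.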



• Each a_i shock represents a shift in the intercept. Since all values of {a_i} have a coefficient of unity, the effect of each shock on the intercept term is permanent.

• In the time series literature, such a sequence is said to have a stochastic trend, since each a_i shock imparts a permanent and random change in the conditional mean of the series. To describe this situation, we use Autoregressive Integrated Moving Average (ARIMA) models.

DETERMINISTIC VS STOCHASTIC TREND

• They might appear similar since they both lead to growth over time, but they are quite different.

• To see why, suppose that through some policy you got a bigger Y_t because the noise a_t is big. What will happen next period?

– With a deterministic trend, Y_{t+1} = c + β(t+1) + a_{t+1}. The noise a_t does not affect Y_{t+1}; your policy had a one-period impact.

– With a stochastic trend, Y_{t+1} = c + Y_t + a_{t+1} = c + (c + Y_{t−1} + a_t) + a_{t+1}. The noise a_t does affect Y_{t+1}; in fact, the policy will have a permanent impact.


• Conclusions: when dealing with trending series, we are always interested in knowing whether the growth is a deterministic or a stochastic trend.

– There are also economic time series that do not grow over time (e.g., interest rates), but we need to check whether they behave "similarly" to stochastic trends (φ = 1 instead of |φ| < 1, while c = 0).

– A deterministic trend refers to the long-term trend that is not affected by short-term fluctuations in the series. However, some of the random shocks may have a permanent effect on the trend; in that case, the trend contains both a deterministic and a stochastic component.


DETERMINISTIC TREND EXAMPLE

Simulate data from, say, an AR(1):

> x = arima.sim(list(order = c(1,0,0), ar = 0.6), n = 100)

Simulate data with a deterministic trend:

> y = 2 + time(x)*2 + x
> plot(y)


> reg = lm(y ~ time(y))
> summary(reg)

Call: lm(formula = y ~ time(y))
Residuals:
     Min       1Q   Median       3Q      Max
-2.74091 -0.77746 -0.09465  0.83162  3.27567
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.179968   0.250772   8.693 8.25e-14 ***
time(y)      1.995380   0.004311 462.839  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.244 on 98 degrees of freedom
Multiple R-squared: 0.9995, Adjusted R-squared: 0.9995
F-statistic: 2.142e+05 on 1 and 98 DF, p-value: < 2.2e-16


> plot(y=rstudent(reg),x=as.vector(time(y)), ylab='Standardized Residuals',xlab='Time',type='o')



> z = rstudent(reg)    # de-trended series
> par(mfrow = c(1,2))
> acf(z)
> pacf(z)              # patterns suggest AR(1)

STOCHASTIC TREND EXAMPLE

Simulate data from an ARIMA(0,1,1):

> x = arima.sim(list(order = c(0,1,1), ma = -0.7), n = 200)
> plot(x)
> acf(x)
> pacf(x)

AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA) PROCESSES

• Consider an ARIMA(p, d, q) process

φ_p(B)(1 − B)^d Y_t = θ_0 + θ_q(B) a_t,

where φ_p(B) = 1 − φ_1 B − … − φ_p B^p and θ_q(B) = 1 − θ_1 B − … − θ_q B^q share no common roots, and a_t ~ WN(0, σ_a²).

ARIMA MODELS

• When d = 0, θ_0 is related to the mean of the process:

θ_0 = μ(1 − φ_1 − … − φ_p).

• When d > 0, θ_0 is a deterministic trend term.

– Non-stationary in mean:

φ_p(B)(1 − B) Y_t = θ_0 + θ_q(B) a_t

– Non-stationary in level and slope:

φ_p(B)(1 − B)² Y_t = θ_0 + θ_q(B) a_t

RANDOM WALK PROCESS

• A random walk is defined as a process where the current value of a variable is composed of its past value plus a white-noise error term.

• It is the ARIMA(0,1,0) process

Y_t = Y_{t−1} + a_t,  i.e. (1 − B) Y_t = a_t,  where a_t ~ WN(0, σ_a²).


• Behavior of the stock market.
• Brownian motion.
• Movement of a drunken man.
• It is a limiting process of AR(1) as φ → 1.


• The implication of a process of this type is that the best prediction of Y_t for the next period is the current value Y_{t−1}; in other words, the process does not allow us to predict the change Y_t − Y_{t−1}. That is, the change of Y is absolutely random.

• It can be shown that the mean of a random walk process is constant but its variance is not. Therefore a random walk process is nonstationary, and its variance increases with t.

• In practice, the presence of a random walk process makes forecasting very simple, since the forecast of every future value Y_{t+s}, s > 0, is simply Y_t.


RANDOM WALK WITH DRIFT

• The change in Y_t is partly deterministic and partly stochastic:

Y_t = Y_{t−1} + θ_0 + a_t.

• It can also be written as

Y_t = Y_0 + tθ_0 + Σ_{i=1}^{t} a_i,

with deterministic trend tθ_0 and stochastic trend Σ_{i=1}^{t} a_i: a pure model of a trend (no stationary component).


E(Y_t) = Y_0 + tθ_0.

• After t periods, the cumulative change in Y_t is tθ_0.

E(Y_{t+s} | Y_t, Y_{t−1}, …) = Y_t + sθ_0,

so the forecast function is not flat.

• Each a_i shock has a permanent effect on the mean of Y_t.


ARIMA(0,1,1) OR IMA(1,1) PROCESS

• Consider the process

(1 − B) Y_t = (1 − θB) a_t,  where a_t ~ WN(0, σ_a²).

• Letting W_t = (1 − B) Y_t, we have W_t = (1 − θB) a_t, which is stationary.


• It is characterized by the sample ACF of the original series failing to die out, while the sample ACF of the first-differenced series shows the pattern of an MA(1).

• If |θ| < 1, the process can be written as

Y_t = (1 − θ) Σ_{j=1}^{∞} θ^{j−1} Y_{t−j} + a_t,

where E(Y_t | Y_{t−1}, Y_{t−2}, …) = (1 − θ) Σ_{j=1}^{∞} θ^{j−1} Y_{t−j}.


E(Y_{t+1} | Y_t, Y_{t−1}, …) = (1 − θ) Y_t + θ E(Y_t | Y_{t−1}, Y_{t−2}, …),

where α = 1 − θ is the smoothing constant in the method of exponential smoothing.

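The link to exponential smoothing can be sketched as follows. This is illustrative Python; α = 1 − θ plays the role of the smoothing constant:

```python
def ses_forecast(y, alpha):
    # simple exponential smoothing: f_{t+1} = alpha * y_t + (1 - alpha) * f_t,
    # matching the IMA(1,1) one-step forecast with theta = 1 - alpha
    f = y[0]  # initialize the forecast with the first observation
    for yt in y[1:]:
        f = alpha * yt + (1 - alpha) * f
    return f
```

The forecast is an exponentially weighted average of past observations, with weights decaying at rate 1 − α = θ.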

REMOVING THE TREND

• Shocks to a stationary time series are temporary; over time, the series reverts to its long-run mean.

• A series containing a trend will not revert to a long-run mean. The usual methods for eliminating the trend are detrending and differencing.


DETRENDING

• Detrending is used to remove a deterministic trend.

• Regress Y_t on time and save the residuals.

• Then, check whether the residuals are stationary.
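These steps can be sketched in pure Python (ordinary least squares of Y_t on an intercept and t = 1, …, n):

```python
def detrend(y):
    # regress y_t on an intercept and t = 1..n by least squares, return residuals
    n = len(y)
    t = list(range(1, n + 1))
    tbar, ybar = sum(t) / n, sum(y) / n
    beta = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
            / sum((ti - tbar) ** 2 for ti in t))
    alpha = ybar - beta * tbar
    return [yi - (alpha + beta * ti) for ti, yi in zip(t, y)]
```

A perfectly linear series leaves residuals of zero; for real data, any stationary component survives the detrending.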

DIFFERENCING

• Differencing is used to remove a stochastic trend.

• The d-th difference of an ARIMA(p, d, q) series is stationary; a series containing unit roots can be made stationary by differencing.

• An ARIMA(p, d, q) series has d unit roots and is said to be integrated of order d: Y_t ~ I(d).


• Random walk: Y_t = Y_{t−1} + a_t is non-stationary, but its first difference ΔY_t = a_t is stationary.


• Differencing always loses observations.

• 1st regular difference (d = 1):

(1 − B) Y_t = Y_t − Y_{t−1}

• 2nd regular difference (d = 2):

(1 − B)² Y_t = Y_t − 2Y_{t−1} + Y_{t−2}

Note that Y_t − Y_{t−2} = (1 − B²) Y_t is not the 2nd difference.

Y_t    ΔY_t       Δ²Y_t        Y_t − Y_{t−2}
3      *          *            *
8      8−3=5      *            *
5      5−8=−3     −3−5=−8      5−3=2
9      9−5=4      4−(−3)=7     9−8=1
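The numeric example can be reproduced with list comprehensions (a sketch):

```python
y = [3, 8, 5, 9]

d1 = [y[t] - y[t - 1] for t in range(1, len(y))]      # first difference
d2 = [d1[t] - d1[t - 1] for t in range(1, len(d1))]   # second difference
lag2 = [y[t] - y[t - 2] for t in range(2, len(y))]    # y_t - y_{t-2}: NOT the 2nd difference
```

Each differencing pass shortens the series by one observation, and the lag-2 difference clearly disagrees with the true second difference.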

KPSS TEST

• To test whether we have a deterministic trend vs. a stochastic trend, we use the KPSS (Kwiatkowski, Phillips, Schmidt and Shin, 1992) test:

H_0: Y_t ~ I(0)  (level or trend stationary)
H_1: Y_t ~ I(1)  (difference stationary)


STEP 1: Regress Y_t on a constant (and trend) and construct the OLS residuals e = (e_1, e_2, …, e_n)'.

STEP 2: Obtain the partial sums of the residuals:

S_t = Σ_{i=1}^{t} e_i.

STEP 3: Obtain the test statistic

KPSS = n^{−2} Σ_{t=1}^{n} S_t² / σ̂²,

where σ̂² is a consistent estimate of the long-run variance of the residuals.

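The statistic in STEP 3 can be sketched in Python, taking the residuals and a long-run variance estimate as given (how σ̂² is estimated, e.g. with a Newey-West window, is outside this sketch):

```python
def kpss_stat(e, sigma2_hat):
    # KPSS = n^{-2} * sum_t S_t^2 / sigma2_hat, with S_t the partial sums of residuals
    n = len(e)
    s, total = 0.0, 0.0
    for et in e:
        s += et          # running partial sum S_t
        total += s * s   # accumulate S_t^2
    return total / (n * n * sigma2_hat)
```

Residuals that wander from zero produce large partial sums and hence a large statistic, which is why H_0 is rejected for large KPSS.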


• STEP 4: Reject H_0 when KPSS is large, because that is evidence that the series wanders from its mean.

• The asymptotic distribution of the test statistic is based on the standard Brownian bridge.

• It is among the most powerful tests of this kind, but if there is a volatility shift it cannot catch that type of non-stationarity.

DETERMINISTIC TREND EXAMPLE

> kpss.test(x, null = c("Level"))

KPSS Test for Level Stationarity
data: x
KPSS Level = 3.4175, Truncation lag parameter = 2, p-value = 0.01
Warning message: In kpss.test(x, null = c("Level")) : p-value smaller than printed p-value

> kpss.test(x, null = c("Trend"))

KPSS Test for Trend Stationarity
data: x
KPSS Trend = 0.0435, Truncation lag parameter = 2, p-value = 0.1
Warning message: In kpss.test(x, null = c("Trend")) : p-value greater than printed p-value

Here we have a deterministic trend (a trend-stationary process), so we need de-trending to work with a stationary series.


STOCHASTIC TREND EXAMPLE

> kpss.test(x, null = "Level")

KPSS Test for Level Stationarity
data: x
KPSS Level = 3.993, Truncation lag parameter = 3, p-value = 0.01
Warning message: In kpss.test(x, null = "Level") : p-value smaller than printed p-value

> kpss.test(x, null = "Trend")

KPSS Test for Trend Stationarity
data: x
KPSS Trend = 0.6846, Truncation lag parameter = 3, p-value = 0.01
Warning message: In kpss.test(x, null = "Trend") : p-value smaller than printed p-value

Here we have a stochastic trend (a difference-stationary process), so we need differencing to work with a stationary series.


PROBLEM

• When an inappropriate method is used to eliminate the trend, we may create other problems, such as non-invertibility.

• E.g., suppose the true process is trend stationary:

Y_t = α_0 + α_1 t + X_t,  X_t = [θ(B)/φ(B)] a_t,

where the roots of φ(B) = 0 are outside the unit circle.


• But if we misjudge the series as difference stationary, we take a difference when detrending should actually be applied. The first difference is then

ΔY_t = α_1 + [(1 − B) θ(B)/φ(B)] a_t.

Now we have created a non-invertible unit-root process in the MA component.


• To detect this, look at the inverse sample autocorrelation function. If it has the ACF pattern of a non-stationary process (slow decay), we have over-differenced the series; go back and de-trend the series instead of differencing.

• There are also smoothing filters that eliminate the trend (decomposition methods).

NON-STATIONARITY IN VARIANCE

• A series may be stationary or non-stationary in mean, and stationary or non-stationary in variance.

• If the mean function is time dependent:
1. The variance Var(Y_t) is time dependent.
2. Var(Y_t) is unbounded as t → ∞.
3. The autocovariance and autocorrelation functions are also time dependent.
4. If t is large relative to Y_0, then ρ_k ≈ 1.

VARIANCE STABILIZING TRANSFORMATION

• The variance of a non-stationary process changes as its level changes:

Var(Y_t) = c · f(μ_t)

for some positive constant c and function f.

• Find a function T so that the transformed series T(Y_t) has a constant variance (the Delta Method).


• Generally, we use the power transformation (Box and Cox, 1964)

T(Y_t) = (Y_t^λ − 1) / λ.

λ        Transformation
−1       1/Y_t
−0.5     1/√Y_t
0        ln Y_t
0.5      √Y_t
1        Y_t (no transformation)

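The table rows are all special cases of one function. A minimal Python sketch, taking the log limit at λ = 0:

```python
import math

def box_cox(y, lam):
    # power transformation T(Y) = (Y^lam - 1) / lam; as lam -> 0 this tends to ln(Y)
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam
```

Subtracting 1 and dividing by λ does not change the variance-stabilizing effect; it only makes the family continuous in λ.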

• A variance stabilizing transformation applies only to positive series. If your series has negative values, add a positive constant to every value so that the whole series becomes positive; then you can search for any needed transformation.

• It should be performed before any other analysis, such as differencing.

• It not only stabilizes the variance but also improves the approximation of the distribution by the Normal distribution.

TRANSFORMATION

install.packages('TSA')
library(TSA)
oil = ts(read.table('c:/oil.txt', header = T), start = 1996, frequency = 12)
BoxCox.ar(y = oil)