Stationarity Issues in Time Series Modeling
David A. Dickey
North Carolina State University

“Stationarity” - what is it?
Example: Stocks of Silver in the NY Commodities Exchange
Two forecasts:
- Nonstationary in yellow: no mean reversion, unbounded error bands
- Stationary in green: reverts to mean, bounded error bands
[Figure: Silver Series]

“Stationarity” - what is it?
- Constant mean $\mu$
- Covariance between $Y_t$ and $Y_{t+h}$ is a function of $h$ only: $\gamma(h)$
  [Autocorrelation $\rho(h) = \gamma(h)/\gamma(0)$]

One Lag Model
$Y_t - \mu = \rho\,(Y_{t-1} - \mu) + e_t$, with “shocks” $e_t \sim N(0, \sigma^2)$
Stationary: $|\rho| < 1$
- $Y_t = \mu(1-\rho) + \rho Y_{t-1} + e_t$
- Regress $Y_t$ on 1, $Y_{t-1}$
  » Estimators approximately normally distributed in large samples
  » Use t test for $H_0: \rho = 0$

One Lag Model with $\rho = 1$
$Y_t - \mu = 1\,(Y_{t-1} - \mu) + e_t$, with “shocks” $e_t \sim N(0, \sigma^2)$
$Y_t = Y_{t-1} + e_t$
Best forecast of $Y_t$ is $Y_{t-1}$
Nonstationary: $\rho = 1$
- Regress $Y_t$ on 1, $Y_{t-1}$
- Estimators NOT normally distributed even in large samples
- CANNOT use t tables to test for $H_0: \rho = 0$
- t test statistic does NOT have t distribution!!!

Hypothesis Test
Model: $Y_t - \mu = \rho\,(Y_{t-1} - \mu) + e_t$
Test:
- $H_0: \rho = 1$ “Nonstationary, Unit Root”
- $H_1: |\rho| < 1$ “Stationary (mean reverting)”
Compare t calculated to new distribution

Two Tests
Model: $Y_t - \mu = \rho\,(Y_{t-1} - \mu) + e_t$
- $Y_t - \mu - (Y_{t-1} - \mu) = (\rho - 1)(Y_{t-1} - \mu) + e_t$
- $Y_t - Y_{t-1} = \mu(1-\rho) + (\rho - 1)Y_{t-1} + e_t$
- Regress $Y_t - Y_{t-1}$ on 1, $Y_{t-1}$
- Tests:
  - n(coefficient of $Y_{t-1}$): “Rho”
  - calculated t test: “Tau”

Some math
$Y_t = e_t + e_{t-1} + \cdots + e_1$
For $n = 4$, the products $e_i e_j$ form an array whose entries sum to $Y_4^2$:
$$\begin{array}{cccc}
e_1^2 & e_1 e_2 & e_1 e_3 & e_1 e_4 \\
e_2 e_1 & e_2^2 & e_2 e_3 & e_2 e_4 \\
e_3 e_1 & e_3 e_2 & e_3^2 & e_3 e_4 \\
e_4 e_1 & e_4 e_2 & e_4 e_3 & e_4^2
\end{array}$$
Above diagonal -> column sums $Y_1 e_2$, $Y_2 e_3$, $Y_3 e_4$
$$\sum_{t=2}^{n} Y_{t-1} e_t = \frac{1}{2}\left(Y_n^2 - \sum_{t=1}^{n} e_t^2\right)$$

More math
$$n(\hat\rho - 1) = \frac{n^{-1}\sum_{t=2}^{n} Y_{t-1} e_t}{n^{-2}\sum_{t=2}^{n} Y_{t-1}^2}
= \frac{\frac{1}{2}\left(Y_n^2 - \sum_{t=1}^{n} e_t^2\right)/(n\sigma^2)}{\sum_{t=2}^{n} Y_{t-1}^2/(n^2\sigma^2)}
\;\to\; \frac{\frac{1}{2}\left(W^2(1) - 1\right)}{\int_0^1 W^2(t)\,dt}$$
W(t) is Wiener Process on [0,1]
$$(t\ \text{test}) \;\to\; \frac{\frac{1}{2}\left(W^2(1) - 1\right)}{\sqrt{\int_0^1 W^2(t)\,dt}}$$

Two Series
SAS software: PROC ARIMA
    proc gplot;
      plot (Y Z)*t / overlay;
    proc arima;
      i var=Y nlag=10 stationarity=(adf);
      i var=Z nlag=10 stationarity=(adf);

Symptoms of Nonstationarity
- ACF dies down slowly (ACF is Corr($Y_t$, $Y_{t-j}$), plotted vs. j)
- Nonconstant level when plotted
Saw plot, ACFs coming up

Y series ACF
The ARIMA Procedure
Name of Variable = Y
Mean of Working Series    110.9728
Standard Deviation        5.286108
Number of Observations    250

Autocorrelations
Lag   Correlation   Std Error
  0     1.00000     0
  1     0.97219     0.063246
  2     0.94506     0.107523
  3     0.91741     0.136771
  4     0.89025     0.159498
  5     0.86479     0.178269
  6     0.84145     0.194326
  7     0.81771     0.208391
  8     0.79836     0.220853
  9     0.77912     0.232110
 10     0.75671     0.242346

Z series ACF
The ARIMA Procedure
Name of Variable = Z
Mean of Working Series    100.5022
Standard Deviation        2.402392
Number of Observations    250

Autocorrelations
Lag   Correlation
  0     1.00000
  1     0.90796
  2     0.81755
  3     0.72228
  4     0.63703
  5     0.56707
  6     0.51964
  7     0.47865
  8     0.46026
  9     0.44466
 10     0.42313

Tests on Y
The ARIMA Procedure
Augmented Dickey-Fuller Unit Root Tests
Type          Lags      Rho    Pr < Rho     Tau    Pr < Tau      F     Pr > F
Zero Mean       0     0.1014    0.7059      0.71    0.8675
                1     0.0880    0.7027      0.59    0.8422
                2     0.0719    0.6989      0.45    0.8101
Single Mean     0    -6.8507    0.2817     -2.30    0.1724     2.99    0.3095
                1    -6.8539    0.2815     -2.16    0.2211     2.57    0.4147
                2    -7.1478    0.2624     -2.07    0.2564     2.29    0.4861
Trend           0    -7.3468    0.6313     -2.46    0.3502     3.64    0.4500
                1    -7.3273    0.6328     -2.30    0.4295     3.07    0.5636
                2    -7.5909    0.6114     -2.19    0.4905     2.65    0.6489

Tests on Z
The ARIMA Procedure
Augmented Dickey-Fuller Unit Root Tests
Type          Lags      Rho     Pr < Rho     Tau    Pr < Tau      F     Pr > F
Zero Mean       0     -0.0087    0.6803     -0.05    0.6647
                1     -0.0237    0.6769     -0.15    0.632
                2     -0.0393    0.6733     -0.24    0.5997
Single Mean     0    -22.8511    0.0051     -3.45    0.0104     5.96    0.0136
                1    -24.5443    0.0034     -3.48    0.0095     6.06    0.0114
                2    -28.8542    0.0015     -3.69    0.0050     6.80    0.0010
Trend           0    -24.6119    0.0236     -3.61    0.0312     6.53    0.0449
                1    -26.2971    0.0161     -3.60    0.0319     6.48    0.0461
                2    -30.7682    0.0057     -3.77    0.0196     7.13    0.0283

Higher Order Processes
$Y_t - \mu = \alpha_1 (Y_{t-1} - \mu) + \alpha_2 (Y_{t-2} - \mu) + \alpha_3 (Y_{t-3} - \mu) + e_t$
$\Delta Y_t = Y_t - Y_{t-1} = -(1 - \alpha_1 - \alpha_2 - \alpha_3)(Y_{t-1} - \mu) - (\alpha_2 + \alpha_3)\,\Delta Y_{t-1} - \alpha_3\,\Delta Y_{t-2} + e_t$
- The multiplier of $(Y_{t-1} - \mu)$ is labeled “coefficient”; the $\Delta Y_{t-1}$ and $\Delta Y_{t-2}$ terms are labeled “Augmenting lags”
- ADF stands for Augmented Dickey-Fuller
Testing for no mean reversion: $H_0: (1 - \alpha_1 - \alpha_2 - \alpha_3) = 0$
Regress $Y_t - Y_{t-1}$ on 1, $Y_{t-1}$, $Y_{t-1} - Y_{t-2}$, $Y_{t-2} - Y_{t-3}$
- Coefficient on $Y_{t-1}$: Nonstandard
- Coefficients on the augmenting lags: N(__, __)

Higher Order Processes
Q1: How many lags???
Regress $\Delta Y_t$ on 1, $Y_{t-1}$, $\Delta Y_{t-1}$, $\Delta Y_{t-2}$, . . .
- Coefficients on the lagged differences: N(__, __), so . . . just use usual t tests and p-values!!!
Q2: Why “Unit Root” Tests??
$B(Y_t) = Y_{t-1}$
$(1 - \alpha_1 B - \alpha_2 B^2 - \alpha_3 B^3)(Y_t - \mu) = e_t$
A root of $1 - \alpha_1 B - \alpha_2 B^2 - \alpha_3 B^3$ at $B = 1$ means $1 - \alpha_1 (1) - \alpha_2 (1)^2 - \alpha_3 (1)^3 = 0$

Check Silver Series for Augmenting Lags
    PROC REG;
      MODEL DEL = LSILVER DEL1 DEL2 DEL3 DEL4;
      TEST DEL2=0, DEL3=0, DEL4=0;

Source         DF    Mean Square    F Value    Pr > F
Numerator       3    4589.63459      1.31      0.2753
Denominator    41    3515.48242

Unit Root test in PROC REG
    PROC REG;
      MODEL DEL = LSILVER DEL1;

Variable     DF    Parameter Estimate    t Value    Pr > |t|
Intercept     1        75.58073            2.76      0.0082
LSILVER       1        -0.11703           -2.78      0.0079
DEL1          1         0.67115            6.21      <.0001

Unit Root test in PROC ARIMA
    PROC ARIMA DATA=SILVER;
      I VAR=SILVER STATIONARITY=(ADF=(1));

Augmented Dickey-Fuller Unit Root Tests
Type          Lags     Tau    Pr < Tau
Zero Mean       1     -0.28    0.5800
Single Mean     1     -2.78    0.0689
Trend           1     -2.63    0.2697

And now . . . the rest of the story
Type          Lags     Tau    Pr < Tau
Zero Mean     ?????                      (A)
Single Mean     1     -2.78    0.0689
Trend         ?????                      (B)
(A) Assumes mean is 0 (or known and subtracted off). Has different (pair of) distributions !!
(B) Allows for TREND under H1. Has third (pair of) distributions !!!!

Silver - Need 2nd Difference?
$D_t = \Delta Y_t = Y_t - Y_{t-1}$
Q: Does D (also) have a unit root?
- Regress $\Delta D_t$ on $D_{t-1}$ using /NOINT (why?)
- No augmenting lags (why?)
    I VAR=Y(1) STATIONARITY = . . .

Type          Lags     Tau    Pr < Tau
Zero Mean       0     -3.42    0.0010
Single Mean     0     -3.39    0.0158
Trend           0     -3.62    0.0383

Autocorrelations
Lag   Covariance   Correlation
  0    7612550       1.00000
  1    7604217       0.99891
  2    7595529       0.99776
  3    7586855       0.99662
  4    7578152       0.99548
  5    7569481       0.99434
  6    7560553       0.99317
  7    7551925       0.99204
  8    7543869       0.99098
  9    7535957       0.98994
 10    7528240       0.98892
 11    7519890       0.98783
 12    7511672       0.98675

Output from SAS PROC ARIMA
Augmented Dickey-Fuller Unit Root Tests
Type          Lags      Rho     Pr < Rho
Zero Mean       0      1.3567    0.9565
                1      1.3481    0.9557
Single Mean     0      0.4065    0.9744
                1      0.3500    0.9725
Trend           0     -6.3073    0.7203
                1     -6.5833    0.6981

Differences
Autocorrelations
Lag   Covariance    Correlation
  0    4003.285       1.00000
  1     102.471       0.02560
  2    -117.368       -.02932
  3    -235.578       -.05885
  4     -26.946567    -.00673
  5     -46.750761    -.01168
  6     -77.100469    -.01926
  7    -224.055       -.05597
  8     -27.874814    -.00696
  9     132.415       0.03308
 10     316.534       0.07907
 11    -254.117       -.06348
 12     200.979       0.05020

Inverse Autocorrelation
Ming Chang thesis
Dual model: $(1 - \alpha B)\,Y_t = e_t$ is AR(1); its dual is $Y_t = (1 - \alpha B)\,e_t$, an MA(1)
Chang shows IACF dies off slowly if you overdifference.

Differenced DJIA IACF
Inverse Autocorrelations
Lag   Correlation
  1    -0.51119
  2     0.01380
  3    -0.00533
  4     0.01061
  5    -0.02324
  6     0.00722
  7     0.02122
  8    -0.01617
  9     0.02831
 10    -0.04860
 11     0.02759
 12    -0.00422

2nd Differenced DJIA IACF
Just for illustration, here is the inverse autocorrelation you would get if you differenced these differences once more, that is, if you took the second difference of the original series. Note the roughly triangular appearance, suggesting that you should have stopped after the first difference.
Inverse Autocorrelations
Lag   Correlation
  1     0.89720
  2     0.80302
  3     0.70785
  4     0.60466
  5     0.50498
  6     0.41173
  7     0.32523
  8     0.23836
  9     0.15871
 10     0.09447
 11     0.05758
 12     0.01735
Rho and F
$Y_t - \mu = \alpha_1 (Y_{t-1} - \mu) + \alpha_2 (Y_{t-2} - \mu) + e_t$
Factor: $(1 - \alpha_1 B - \alpha_2 B^2) = (1 - \rho B)(1 - \gamma B)$
$\Delta Y_t = -(1-\rho)(1-\gamma)(Y_{t-1} - \mu) + \rho\gamma\,(\Delta Y_{t-1}) + e_t$
Rho:
(1) Estimate $\rho\gamma = -\alpha_2$ ($\to \gamma$ under $H_0$) by regression
(2) Divide n[$(1-\rho)(\gamma-1)$ estimate] by ($\gamma$ estimate - 1)
F:
Regress $\Delta Y_t$ on 1, t, $Y_{t-1}$, $\Delta Y_{t-1}$
Test underlined items with F (3 numerator df)

Trend is not Unit Root
$Y_t = \alpha + \beta t + Z_t$ with $Z_t$ stationary
$Y_{t-1} = \alpha + \beta (t-1) + Z_{t-1}$
$\Delta Y_t = \beta + \Delta Z_t$ with $\Delta Z_t$ an overdifferenced series !!

Example: Amazon.com Example (volume)
    PROC REG;
      MODEL DV = DATE LAGV DV1-DV4;
      TEST DV3=0, DV4=0;

Variable     DF    Parameter Estimate    t Value    Pr > |t|    Type I SS
Intercept     1       -17.49220           -5.26      <.0001       0.00848
date          1         0.00147            5.41      <.0001       0.01395
LAGV          1        -0.21914           -5.80      <.0001      26.67803
DV1           1        -0.15446           -3.08      0.0022       0.94211
DV2           1        -0.18447           -3.72      0.0002       3.52898
DV3           1        -0.04433           -0.94      0.3477       0.07997
DV4           1        -0.05774           -1.31      0.1923       0.48763

Test 1 Results for Dependent Variable DV
Source         DF    Mean Square    F Value    Pr > F
Numerator       2      0.28380       0.99      0.3715
Denominator   497      0.28602

ACF Levels:
Lag   Covariance   Correlation
  0    2.503910      1.00000
  1    2.327538      0.92956
  2    2.225324      0.88874
  3    2.193509      0.87603
  4    2.155492      0.86085
  5    2.127643      0.84973
  6    2.099292      0.83841
  7    2.069929      0.82668
  8    2.062194      0.82359
  9    2.051450      0.81930
 10    2.011864      0.80349
 11    2.006564      0.80137
 12    1.996735      0.79745
 13    1.960231      0.78287
 14    1.951272      0.77929
 15    1.940939      0.77516
 16    1.919167      0.76647
 17    1.906896      0.76157
 18    1.905406      0.76097
 19    1.892168      0.75569
 20    1.857199      0.74172
 21    1.846038      0.73726
 22    1.826167      0.72933
 23    1.816151      0.72533
 24    1.821228      0.72735

IACF - Differences
Lag   Correlation
  1     0.48216
  2     0.44816
  3     0.34266
  4     0.30682
  5     0.25213
  6     0.24854
  7     0.23624
  8     0.18675
  9     0.14088
 10     0.20330
 11     0.13295
 12     0.11437
 13     0.15524
 14     0.11829
 15     0.09978
 16     0.10919
 17     0.09049
 18     0.06653
 19     0.02886
 20     0.09515
 21     0.05504
 22     0.07104
 23     0.06065
 24     0.02284

Do the test:
The ARIMA Procedure
Augmented Dickey-Fuller Unit Root Tests
Type          Lags      Rho     Pr < Rho     Tau    Pr < Tau      F      Pr > F
Zero Mean       2      0.0144    0.6861      0.02    0.6909
Single Mean     2    -14.2100    0.0474     -2.60    0.0944      3.42    0.1920
Trend           2    -85.7758    0.0007     -6.35    <.0001     20.18    0.0010

Fit AR(3) plus trend. Diagnostics:
Autocorrelation Check of Residuals
To Lag   ChiSquare   DF   Pr > ChiSq
   6        1.59      3     0.6615
  12       10.89      9     0.2835
  18       12.43     15     0.6460
  24       18.97     21     0.5872
  30       23.75     27     0.6439
  36       30.32     33     0.6014
  42       37.56     39     0.5358
  48       39.37     45     0.7087
Autocorrelations: 0.015 -0.025 -0.036 . . . -0.000 . . . 0.072 . . . 0.031

Extensions
- S. E. Said shows that models with lagged $e_t$ terms can still be tested by ADF tests.
- Nobel Prize “cointegration” idea: two or more unit root processes have a stationary linear combination. Compute, e.g., $Y_t = \ln(S_t / L_t)$ and test for stationarity.

http://www4.stat.ncsu.edu/~dickey
Click: SAS Code from Presentations

Thanks! Questions?
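
Supplementary sketch: the talk's claim that the unit root t statistic ("Tau") does not follow a t distribution can be checked by simulation. The following is a minimal illustration, not the author's code, and it is written in Python rather than SAS so it runs with only the standard library. The function names `df_tau` and `simulate_tau`, the series length of 100, the 2000 replications, and the seed are all assumptions chosen for illustration. It generates random walks (the $\rho = 1$ model), regresses $Y_t - Y_{t-1}$ on 1 and $Y_{t-1}$ as in the "Two Tests" slide, and collects the t statistic on $Y_{t-1}$.

```python
import math
import random


def df_tau(y):
    """Single-mean Dickey-Fuller 'Tau': the t statistic on Y[t-1] when
    Y[t] - Y[t-1] is regressed on an intercept and Y[t-1] (no augmenting lags)."""
    x = y[:-1]                                   # Y[t-1]
    d = [b - a for a, b in zip(y[:-1], y[1:])]   # Y[t] - Y[t-1]
    n = len(d)
    xbar = sum(x) / n
    dbar = sum(d) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxd = sum((xi - xbar) * (di - dbar) for xi, di in zip(x, d))
    slope = sxd / sxx                            # estimate of (rho - 1)
    intercept = dbar - slope * xbar
    sse = sum((di - intercept - slope * xi) ** 2 for xi, di in zip(x, d))
    s2 = sse / (n - 2)                           # residual variance
    return slope / math.sqrt(s2 / sxx)


def simulate_tau(n=100, reps=2000, seed=1):
    """Sorted Tau values from `reps` simulated random walks of length n + 1."""
    rng = random.Random(seed)
    taus = []
    for _ in range(reps):
        y = [0.0]
        for _ in range(n):
            y.append(y[-1] + rng.gauss(0.0, 1.0))  # Y[t] = Y[t-1] + e[t]
        taus.append(df_tau(y))
    taus.sort()
    return taus


taus = simulate_tau()
q05 = taus[int(0.05 * len(taus))]   # empirical 5th percentile
print(f"simulated 5th percentile of Tau under a unit root: {q05:.2f}")
```

Under these assumptions the printed percentile should fall near the tabulated Dickey-Fuller 5% value of roughly -2.9 for the single mean case, well below the -1.65 that a normal approximation would suggest. That gap is why the "Pr < Tau" columns in the PROC ARIMA output come from special tables rather than t tables.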