Transcript Slide 1
MKFM6: multivariate stationary state-space time-series modeling using ML estimation in the Kalman Filter
[Path diagram of the model: latent states a[t-1], a[t], a[t+1] linked by H; innovations z (covariance Q) entering via G; observed y[t] loading on a[t] via S, with errors e (covariance R); fixed regressors x acting via B and Z.]

Slide 2
y[t]: an ny-dimensional random vector, repeatedly observed at occasions t-1, t, t+1 (t = 1, ..., T) in a sample of N = 1 or N > 1.
[Path diagram]

Slide 3
y[t] = S a[t], with S = I
[Path diagram]

Slide 4
y[t] = S a[t]
a[t+1] = H a[t] + G z[t+1], with G = I
First-order autoregressive in structure (a Markov model). VARMA(p,q) models can be written in this form.
[Path diagram]

Slide 5
y[t] = S a[t] + e[t], with S ≠ I
a[t+1] = H a[t] + G z[t+1]
a[t] is latent (as in factor analysis); y[t] is observed.
[Path diagram]

Slide 6
y[t] = S a[t] + e[t] + Z x[t]
a[t+1] = H a[t] + G z[t+1]
Covariance matrices: Q, R. Regression parameters: S, H, G, Z.
x is a fixed regressor (e.g., if x = 1, Z contains the means).
[Path diagram]

Slide 7
y[t] = S a[t] + e[t] + Z x[t]
a[t+1] = H a[t] + G z[t+1] + B x[t+1]
Covariance matrices: Q, R. Regression parameters: S, H, G, Z, B.
[Path diagram]

Slide 8
y[t] = S a[t] + d + e[t] + Z x[t]
a[t+1] = H a[t] + c + G z[t+1] + B x[t+1]
The intercepts d and c are superfluous (with x = 1, Z and B already supply intercepts), but convenient.
[Path diagram]

Slide 9
Equations as on slide 8; d and c are superfluous, but convenient.
[Path diagram: c and d drawn as paths from the constant 1 to a[t] and y[t].]

Slide 10
[Path diagram: latent states a over time, each occasion measured by several indicators y with errors e.]
Data y[t,i]: subjects i = 1, ..., N (rows) by occasions t = 1, ..., T (columns):
    subject 1:  y[t-1,1]  y[t,1]  y[t+1,1]  ...  y[T,1]
    ...
    subject N:  y[t-1,N]  y[t,N]  y[t+1,N]  ...  y[T,N]

    Type                          N       Groups   T              Software
    SEM                           large   ≥1       small          LISREL, M+, Mx
    hybrid                        small   ≥1       intermediate   MKFM
    time series (single subject)  1       ≥1       large          many, MKFM

Slide 11
Actually, it is all structural equation modeling; the details dictate the computational strategies.
[Path diagram: one latent state a per occasion with autoregression H and innovations z (variance Q) entering via G; intercepts d from the constant 1; four indicators y with loadings S and errors e (variances R).]
    nm=1 se=yes
    mo=1 ny=4 ne=1 nx=0 df=ts1 rf=no ns=1 mi=-9
    S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 B=0 G=1

Slide 12
    S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 B=0 G=1
    R fi di 0 0 0 0
    R fr di 1 2 3 4
    G fi di 1
    G fr di 0
    d fr 11 12 13 14
    d fi 0 0 0 0
    H fi 0
    H fr 21
    Q fi fu 1
    Q fr fu 31
    S fi 1 0 0 0
    S fr 0 41 42 43

Slide 13
Model 1 of 1
    S parameters              1.000 0.934 0.812 0.683
    R parameters - diagonal   0.395 0.453 0.563 0.500
    H parameters              0.799
    Q parameters              0.348
    d parameters              4.936 5.081 5.030 4.947
    G parameters - diagonal   1.000
[Path diagram]

Slide 14
[Figure: observed series ts1[ist:iend, i] (colored lines) and the latent series estimated with the Kalman Filter (black line), occasions 225 to 250.]

Slide 15
[Path diagram]
Q: What if H is zero?

Slide 16
[Path diagram without the H paths]
Q: What if H is zero?
A: Then the data at each occasion are independent, and I can fit the model in LISREL (or Mx, or M+), or in MKFM6.

Slide 17
H = 0: the MKFM6 and LISREL estimates coincide.
    MKFM6                     LISREL      estimates
    S parameters              LAMBDA-Y    1.000 0.867 0.841 0.665
    Q parameters              PSI         0.949
    R parameters - diagonal   THETA-EPS   0.557 0.481 0.472 0.634
[Path diagram: a single occasion; a[t+1] with variance Q, indicators y with errors e (variances R).]

Slide 18
Similarities between the LISREL model and the MKF state-space model:
    Measurement (linear factor) model
        MKF:     y[t] = S a[t] + d + e[t] + Z x[t]
        LISREL:  y[i] = Λ η[i] + τ + ε[i]
    Structural regression model
        MKF:     a[t+1] = H a[t] + c + G z[t+1] + B x[t+1]
        LISREL:  η[i] = B η[i] + α + I ζ[i]
    Covariance matrices
        MKF:     cov(e) = R, cov(G z) = G Q G' (= Q when G = I)
        LISREL:  cov(ε) = Θ, cov(ζ) = Ψ
Sy-Miin Chow et al., SEM 2010.
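The answer on slides 15 to 17 can be checked from the model-implied moments. Below is a minimal numpy sketch (the function `implied_moments` is hypothetical, not part of MKFM6), assuming a stationary model with d, c, Z and B dropped, and e[t] and z[t+1] independent of each other and of the past. The stationary state covariance P solves P = H P H' + G Q G', so cov(y[t]) = S P S' + R and cov(y[t+1], y[t]) = S H P S'. With H = 0, P = G Q G' (= Q when G = I), the lag-1 term vanishes, and cov(y[t]) = S Q S' + R is the one-factor structure Λ Ψ Λ' + Θ that LISREL fits.

```python
import numpy as np


def implied_moments(S, R, H, Q, G=None):
    """Stationary moments of y[t] = S a[t] + e[t], a[t+1] = H a[t] + G z[t+1].

    cov(e) = R, cov(z) = Q; intercepts and fixed regressors are left out.
    Returns P = cov(a[t]), sigma0 = cov(y[t]) and sigma1 = cov(y[t+1], y[t]).
    """
    S = np.atleast_2d(np.asarray(S, dtype=float))
    R = np.atleast_2d(np.asarray(R, dtype=float))
    H = np.atleast_2d(np.asarray(H, dtype=float))
    Q = np.atleast_2d(np.asarray(Q, dtype=float))
    m = H.shape[0]
    G = np.eye(m) if G is None else np.atleast_2d(np.asarray(G, dtype=float))
    if np.max(np.abs(np.linalg.eigvals(H))) >= 1.0:
        raise ValueError("H is not stationary (an eigenvalue has modulus >= 1)")
    # P solves the discrete Lyapunov equation P = H P H' + G Q G'.
    # With row-major vec: vec(H P H') = kron(H, H) vec(P).
    GQG = G @ Q @ G.T
    P = np.linalg.solve(np.eye(m * m) - np.kron(H, H), GQG.ravel()).reshape(m, m)
    sigma0 = S @ P @ S.T + R   # lag-0 covariance of y
    sigma1 = S @ H @ P @ S.T   # lag-1 autocovariance cov(y[t+1], y[t])
    return P, sigma0, sigma1


# H = 0 with the slide 17 estimates: a one-factor model, no serial dependence.
S = np.array([[1.000], [0.867], [0.841], [0.665]])
R = np.diag([0.557, 0.481, 0.472, 0.634])
P, sigma0, sigma1 = implied_moments(S, R, H=[[0.0]], Q=[[0.949]])
print(np.round(sigma0, 3))
print("largest lag-1 autocovariance:", np.abs(sigma1).max())
```

The Kronecker-product solve is adequate for the small state vectors in these examples; for a large ne, a dedicated solver such as `scipy.linalg.solve_discrete_lyapunov` would be cheaper.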
Slide 19
Next example: VAR
[Path diagram: a1, a2, a3 at t and t+1 linked by H, innovations with covariance Q, effects B of x, intercepts d from the constant 1.]
Restricted vector autoregressive model with S = I: the y[t] and a[t] variables are identical.

Slide 20
Regression on fixed x
[Path diagram as on slide 19, with intercepts d.]
Effect of x on a2 and a3 via a1 (a causal model).

Slide 21
[Figure: time series a1, a2, a3 (vertical axis: fsdat[101:150, 1]) and the fixed variable x, plotted against index 0 to 50.]
Gaps: 25% missing in each series.

Slide 22
    S=1 R=0 H=1 Q=1 d=1 c=0 Z=0 P=1 B=1 G=1
    G fi di 1 1 1
    G fr di 0 0 0
    d fr 1 2 3
    d fi 0 0 0
    H fi 0 0 0
         0 0 0
         0 0 0
    H fr 4 5 6
         7 8 9
         10 11 12
    Q fi di 1 1 1
    Q fr di 21 22 23
    S fi di 1 1 1
    S fr di 0 0 0
    B fr 31 32 33
    B fi 0 0 0
[Path diagram]

Slide 23
Next example: latent variables D (depression) and A (anxiety)
[Path diagram: D and A at successive occasions.]
S ≠ I: autoregressive / cross-lagged regression model with indicators.

Slide 24
D: depression, A: anxiety
[Path diagram: D and A over time, for the wife and for the husband.]

Slide 25
N = 1: measurement invariance of the indicators with respect to an external variable x
[Path diagram: x affects the latent states a via B; z, a, y and e as before, with paths S, H, G; d (not shown), i.e., the intercepts.]

Slide 26
N = 1: measurement invariance of the indicators with respect to an external variable x
f(y[i] | a*) = f(y[i] | a*, x[i])
[Path diagram]

Slide 27
N = 1: measurement invariance of the indicators with respect to an external variable x; here two indicators are biased with respect to x.
[Path diagram: direct paths Z from x to two indicators (bias with respect to x), plus B, S, H, G.]
f(y[i] | a*) ≠ f(y[i] | a*, x[i])

Slide 28
[Path diagram: the model for two subjects, each with its own x and B.]
f(y[i] | a*) = f(y[i] | a*, subject[i])
Are the indicators measurement invariant with respect to subject (e.g., N = 2)?
d is invariant (equal intercepts); B is zero in subject 1 and free in subject 2. x could equal 1.

Slide 29
- f(y[i] | a*) = f(y[i] | a*, x[i]) and f(y[i] | a*) = f(y[i] | a*, subject[i]): definition of measurement invariance in N = 1 or N = 2
- Interpretation as an intra-individual causal model
- Relationship with the inter-individual causal model
- Issue of power: simulation? Exact simulation?
- Is N = 100, T = 1 relevant to N = 1, T = 100?
- Application (real data)

Slide 30
T = 50, ny = 4, N = 250 (in parentheses: the population values)
    parameter   ML parameter estimates   ML parameter estimates
    R11         0.51000                   0.50998 (.51)
    R22         0.36000                   0.35997 (.36)
    R33         0.51000                   0.50994 (.51)
    R44         0.36000                   0.36001 (.36)
    D1          0.00000                  -0.00153 (.0)
    D2          0.00000                  -0.00175 (.0)
    D3          0.00000                  -0.00153 (.0)
    D4          0.00000                  -0.00175 (.0)
    H           0.00000                   0.79994 (.80)
    Q           1.00000                   0.37097 (.36)
    S21         0.80000                   0.79994 (.8)
    S31         0.70000                   0.69997 (.7)
    S41         0.80000                   0.79992 (.8)

Slide 31
Good points:
- N = 1, N = few, N = many
- Multigroup, where a group is N = 1 or N > 1
- No limitation on the length of the time series T
- Can handle N = few, T = intermediate (a niche!)
- Missing data no problem (under assumptions)
- Model quite flexible
- Freely available (FORTRAN 77 code)
- Easy-ish to use
Bad points:
- Stationarity (covariance structure)
- ML, fixed effects only (no random effects)
- Continuous indicators, conditional normality
- Easy-ish to use

Slide 32
Estimation: maximum likelihood in the Kalman Filter (prediction error decomposition)
Documentation:
- mkfm.doc (manual with examples: DFA, ARMA; includes the FORTRAN source code)
- j_adolf.doc (more examples, including measurement invariance)
- some technical documentation (online; Ellen Hamaker)
My main reference: A.C. Harvey (1996). Forecasting, structural time series models and the Kalman Filter. Cambridge: Cambridge University Press.
Other good references: Hamilton; Kim & Nelson.
One or two articles using MKFM: Ellen Hamaker (UU).

Slide 33
To use:
1) Organize the data input (see the manual)
2) Write the input script (see the manual)
3) Run the analysis in a DOS window:
    mkfm6-1 < inputfile > outputfile

Slide 34
Input #1: model specification
    title example simulated
    nm=1 se=yes
    mo=1 ny=4 ne=1 nx=1 df=ts1mi rf=no ns=1 mi=-999
    B=1 S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 G=1
    R fi di 0 0 0 0
    R fr di 1 2 3 4
    G fi di 1
    G fr di 0
    d fr 11 12 13 14
    d fi 0 0 0 0
    H fi 0
    H fr 21
    Q fi 1
    Q fr 31
    S fi 1 0 0 0
    S fr 0 41 42 43
    B fi 0
    B fr 50
    P fi 100
    P fr 0
    st ...
    lb ...
    ub ...
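Slide 32 names the estimation method: maximum likelihood via the prediction error decomposition in the Kalman Filter. The following is a minimal numpy sketch of that log-likelihood for the slide equations, not MKFM6's FORTRAN code. Assumptions: missing values are coded as `np.nan` (MKFM6 uses a numeric code such as mi=-999 instead), a missing entry is handled by dropping it from that occasion's measurement equation, and the filter starts from a state mean a0 with covariance P0 one step before the first occasion (loosely mirroring State_0 and P in the output; MKFM6's exact initialization may differ). The function name `kalman_loglik` is hypothetical.

```python
import numpy as np


def kalman_loglik(y, S, R, H, Q, a0, P0, d=None, c=None, G=None,
                  Z=None, B=None, x=None):
    """Gaussian log-likelihood via the prediction error decomposition.

    Model, in the slide notation:
        y[t]   = S a[t] + d + e[t] + Z x[t],        cov(e) = R
        a[t+1] = H a[t] + c + G z[t+1] + B x[t+1],  cov(z) = Q
    y: (T, ny) array with np.nan marking missing values.
    x: (T, nx) array of fixed regressors (needed only if Z or B is given).
    a0, P0: state mean and covariance one step before the first occasion.
    """
    y = np.asarray(y, dtype=float)
    T, ny = y.shape
    S = np.asarray(S, dtype=float).reshape(ny, -1)
    m = S.shape[1]
    R = np.asarray(R, dtype=float).reshape(ny, ny)
    H = np.asarray(H, dtype=float).reshape(m, m)
    Q = np.atleast_2d(np.asarray(Q, dtype=float))
    G = np.eye(m) if G is None else np.asarray(G, dtype=float).reshape(m, -1)
    d = np.zeros(ny) if d is None else np.asarray(d, dtype=float)
    c = np.zeros(m) if c is None else np.asarray(c, dtype=float)
    B = None if B is None else np.asarray(B, dtype=float).reshape(m, -1)
    Z = None if Z is None else np.asarray(Z, dtype=float).reshape(ny, -1)
    x = None if x is None else np.asarray(x, dtype=float).reshape(T, -1)
    GQG = G @ Q @ G.T
    a = np.asarray(a0, dtype=float).reshape(m)
    P = np.asarray(P0, dtype=float).reshape(m, m)

    loglik = 0.0
    for t in range(T):
        # Prediction: mean and covariance of a[t] given y[1..t-1].
        a = H @ a + c
        if B is not None:
            a = a + B @ x[t]
        P = H @ P @ H.T + GQG

        obs = ~np.isnan(y[t])
        if not obs.any():
            continue  # fully missing occasion: no contribution, no update
        S_t = S[obs]
        y_hat = S_t @ a + d[obs]
        if Z is not None:
            y_hat = y_hat + Z[obs] @ x[t]
        v = y[t, obs] - y_hat                      # prediction error
        F = S_t @ P @ S_t.T + R[np.ix_(obs, obs)]  # its covariance
        _, logdet = np.linalg.slogdet(F)
        loglik -= 0.5 * (obs.sum() * np.log(2.0 * np.pi) + logdet
                         + v @ np.linalg.solve(F, v))

        # Update: mean and covariance of a[t] given y[1..t].
        K = np.linalg.solve(F, S_t @ P).T          # Kalman gain P S_t' F^-1
        a = a + K @ v
        P = P - K @ S_t @ P
    return loglik


# Usage with made-up numbers: one factor, two indicators, one missing value,
# and a large P0 as in the example script (P fi 100).
y_demo = np.array([[5.3, 4.8], [np.nan, 5.1], [4.9, 4.7]])
ll = kalman_loglik(y_demo, S=[[1.0], [0.9]], R=np.diag([0.5, 0.4]),
                   H=[[0.7]], Q=[[0.5]], a0=[0.0], P0=[[100.0]],
                   d=[5.0, 5.0])
print("logL =", ll, " -2logL =", -2 * ll)
```

To estimate the model, one would maximize this function over the free parameters with a numerical optimizer; the Inform(NPSOL) line in the example output suggests that MKFM6 uses NPSOL for that step.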
Slide 35
Input part #2: the data file
Data file: TS1MI
    250 0
    4.621209 5.754381 6.855362 7.026104 0
    5.826427 6.732414 6.818705 5.448525 0
    4.840544 4.112223 6.653377 7.070116 0
    6.197258 7.596196 7.257844 3.101740 1
    ..........

Slide 36
Output part #1: read from the input file
    max nm= 5 nt=5000 ns= 10 ny=30 nx= 5 ne=30 npar=400
    title example simulated
    nm=1 se=yes
    mo=1 ny=4 ne=1 nx=1 df=ts1mi rf=no ns=1 mi=-999     (model specification)
    B=1 S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 G=1
    ===================
    MKFv1 April 2010
    ===================
    title example simulated
    Model 1 of 1
    S fr parameters (nonzero)              0 11 12 13
    R fr parameters (nonzero) - diagonal   1 2 3 4
    H fr parameters (nonzero)              9
    Q fr parameters (nonzero)              10
    d fr parameters (nonzero)              5 6 7 8
    B fr parameters (nonzero)              14

Slide 37
Output part #2: data summary, then parameter estimates, standard errors and t-values
    DATA SUMMARY   MODEL 1 of 1   CASE 1   datafile ts1mi
    NY= 4 NX= 1 NE= 1 Ncases= 1 T= 250 N of T missing= 0 State_0 0.00
    START= 1 END= 1   Number of fixed regressors 1

    var   %miss    mean   var    std    min    max
    1     0.2720   5.52   1.83   1.35   1.32   8.82
    2     0.2640   5.41   1.67   1.29   2.01   9.75
    3     0.2520   5.49   1.39   1.18   2.23   9.33
    4     0.2880   5.40   0.94   0.97   3.22   8.01

    ML parameter estimates
    nr   estimate   g           se        t
    1    0.55179     0.000026   0.0851    6.48
    2    0.51076     0.000012   0.0748    6.83
    3    0.68355     0.000048   0.0830    8.23
    4    0.44029     0.000038   0.0562    7.83
    5    4.47651     0.000020   0.2983   15.01
    6    4.47448     0.000051   0.2748   16.28
    7    4.73732    -0.000049   0.2244   21.11
    8    4.71898    -0.000096   0.1928   24.47
    9    0.72616    -0.000106   0.0506   14.36
    10   0.48247    -0.000055   0.0932    5.18
    11   0.92335     0.000051   0.0738   12.52
    12   0.71907    -0.000143   0.0710   10.13
    13   0.62061    -0.000031   0.0609   10.19
    14   0.95749    -0.000021   0.1301    7.36

    Logl -513.914   -2xLogL 1027.828   Inform(NPSOL) 0

Slide 38
Output part #3: parameter estimates in the parameter matrices
    title example simulated
    Model 1 of 1
    S parameters              1.000 0.923 0.719 0.621
    R parameters - diagonal   0.552 0.511 0.684 0.440
    H parameters              0.726
    Q parameters              0.482
    P parameters              100.000
    d parameters              4.477 4.474 4.737 4.719
    G parameters - diagonal   1.000
    B parameters              0.957
    P(t|t) error cov          0.233
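Output parts #1 to #3 are tied together by the parameter numbers. In the input script, `fr` lines assign each matrix element a free-parameter number (0 = not free) and `fi` lines give values; the program renumbers the free parameters consecutively (input numbers 11 to 14 for d become 5 to 8 in Output part #1), and the nr column of Output part #2 uses the new numbers. The plain-Python check below, with numbers copied from the example output and illustrative variable names, rebuilds the Output part #3 matrices from the nr list and confirms that t = estimate / se and -2xLogL = -2 times Logl, up to the rounding of the printed values.

```python
# Numbers copied from Output part #2 of the example (estimate, se, printed t).
est = [0.55179, 0.51076, 0.68355, 0.44029, 4.47651, 4.47448, 4.73732,
       4.71898, 0.72616, 0.48247, 0.92335, 0.71907, 0.62061, 0.95749]
se = [0.0851, 0.0748, 0.0830, 0.0562, 0.2983, 0.2748, 0.2244,
      0.1928, 0.0506, 0.0932, 0.0738, 0.0710, 0.0609, 0.1301]
t_printed = [6.48, 6.83, 8.23, 7.83, 15.01, 16.28, 21.11,
             24.47, 14.36, 5.18, 12.52, 10.13, 10.19, 7.36]
logl = -513.914

# Parameter numbers per matrix, from "fr parameters (nonzero)" in Output
# part #1; 0 marks an element that is not free (here S11, fixed at 1).
layout = {"S": [0, 11, 12, 13], "R": [1, 2, 3, 4], "H": [9],
          "Q": [10], "d": [5, 6, 7, 8], "B": [14]}
fixed_value = {"S": 1.0}

# Output part #3 (printed to 3 decimals), for comparison.
part3 = {"S": [1.000, 0.923, 0.719, 0.621], "R": [0.552, 0.511, 0.684, 0.440],
         "H": [0.726], "Q": [0.482], "d": [4.477, 4.474, 4.737, 4.719],
         "B": [0.957]}


def fill(name):
    """Replace each parameter number by its estimate (0 -> fixed value)."""
    return [fixed_value.get(name, 0.0) if n == 0 else est[n - 1]
            for n in layout[name]]


matrices = {name: fill(name) for name in layout}
t_values = [e / s for e, s in zip(est, se)]

worst_matrix_gap = max(abs(a - b) for name in layout
                       for a, b in zip(matrices[name], part3[name]))
worst_t_gap = max(abs(a - b) for a, b in zip(t_values, t_printed))
print("largest |matrix - Output part #3|:", round(worst_matrix_gap, 4))
print("largest |est/se - printed t|:", round(worst_t_gap, 3))
print("-2 x Logl =", round(-2 * logl, 3))
```

The small t-value gaps come from the standard errors being printed to four decimals; the program itself presumably divides by unrounded values.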