Slide 1

Transcript Slide 1

MKFM6: multivariate stationary state-space time-series
modeling using ML estimation in the Kalman Filter
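[Path diagram of the full state-space model: states a[t-2], ..., a[t+1], innovations z, fixed regressors x, observations y and residuals e, with paths H, G, B, S, Z and residual covariance R.]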
1
y[t]: an ny-dimensional random vector,
repeatedly observed at occasions ..., t-1, t, t+1, ...,
in a sample of N=1 or N>1
[Path diagram of the state-space model.]
t = 1, ..., T
2
y[t] = S a[t]
S=I
[Path diagram of the state-space model.]
3
y[t] = S a[t]
a[t+1] = H a[t] + G z[t+1]
G=I
[Path diagram of the state-space model.]
First-order autoregressive in structure (Markov model)
VARMA(p,q) models
4
y[t] = S a[t] + e[t]
S≠I
a[t+1] = H a[t] + G z[t+1]
[Path diagram of the state-space model.]
a[t] latent (as in factor analysis)
y[t] observed
5
covariance matrices
regression parameters
y[t] = S a[t] + e[t] + Z x[t]
a[t+1] = H a[t] + G z[t+1]
[Path diagram of the state-space model.]
x is a fixed regressor (e.g., if x=1, Z contains the means)
6
covariance matrices
regression parameters
y[t] = S a[t] + e[t] + Z x[t]
a[t+1] = H a[t] + G z[t+1] + B x[t+1]
[Path diagram of the state-space model.]
7
y[t] = S a[t] + d + e[t] + Z x[t]
a[t+1] = H a[t] + c + G z[t+1] + B x[t+1]
d and c superfluous, but convenient
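The two equations can be made concrete with a small simulation. The following is only a sketch under simplifying assumptions that are not on the slides: a single latent state, G = 1, no fixed regressors (the Z x[t] and B x[t+1] terms are dropped), normally distributed z and e, and a latent series started at zero rather than drawn from its stationary distribution. The function name `simulate` and the parameter values in the usage lines are purely illustrative.

```python
import random

def simulate(T, S, d, R, H, Q, c=0.0, seed=1):
    """Simulate T occasions of
         a[t+1] = H a[t] + c + z[t+1],   var(z) = Q   (scalar state, G = 1)
         y[t]   = S a[t] + d + e[t],     var(e_k) = R[k]
       S, d and R are lists with one entry per indicator."""
    rng = random.Random(seed)
    a = 0.0                      # assumed starting value of the latent state
    states, ys = [], []
    for _ in range(T):
        a = H * a + c + rng.gauss(0.0, Q ** 0.5)          # state equation
        y = [s * a + dk + rng.gauss(0.0, r ** 0.5)        # measurement equation
             for s, dk, r in zip(S, d, R)]
        states.append(a)
        ys.append(y)
    return states, ys

# Illustrative values only (not estimates from the slides)
states, ys = simulate(T=250, S=[1.0, 0.9, 0.8, 0.7], d=[5.0] * 4,
                      R=[0.4, 0.45, 0.55, 0.5], H=0.8, Q=0.35)
print(len(states), len(ys[0]))
```

Setting H to zero in this sketch makes the simulated occasions independent of each other, which is the special case discussed later in the slides.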
[Path diagram of the state-space model.]
8
y[t] = S a[t] + d + e[t] + Z x[t]
a[t+1] = H a[t] + c + G z[t+1] + B x[t+1]
d and c superfluous, but convenient
[Path diagram: the intercepts d and c drawn as paths from the constant 1.]
9
[Path diagram: latent states a[t-1], a[t], a[t+1] with innovations z, each measured by indicators y with residuals e.]
[Data layout: subjects 1, ..., N by time points 1, ..., T.]

N          large            small          1
Groups     ≥1               ≥1             ≥1
T          small            intermediate   large
Type       SEM              hybrid         time series
Software   LISREL, M+, Mx   MKFM           many, MKFM
10
Actually, it is all structural equation modeling; the details dictate the computational strategy.
[Path diagram: a single latent autoregressive series a[t] measured by indicators y at each occasion; paths H, G, S, intercepts d, covariances Q and R.]
nm=1 se=yes
mo=1 ny=4 ne=1 nx=0
df=ts1 rf=no ns=1 mi=-9
S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 B=0 G=1
11
S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 B=0 G=1
R fi di
0 0 0 0
R fr di
1 2 3 4
G fi di
1
G fr di
0
d fr
11 12 13 14
d fi
0 0 0 0
H fi
0
H fr
21
Q fi fu
1
Q fr fu
31
S fi
1
0
0
0
S fr
0
41
42
43
12
Model 1 of 1
S parameters
1.000
0.934
0.812
0.683
R parameters - diagonal
0.395 0.453 0.563 0.500
H parameters
0.799
Q parameters
0.348
d parameters
4.936 5.081 5.030 4.947
G parameters - diagonal
1.000
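As a rough check on these estimates, the variance of the latent series implied by H and Q can be worked out by hand. This calculation is an addition, not part of the MKFM output; it assumes a stationary process (|H| < 1) and uses the standard first-order autoregressive result, with G = 1 as estimated.

```latex
\operatorname{var}(a_t) = \frac{G\,Q\,G'}{1 - H^2}
  = \frac{0.348}{1 - 0.799^2}
  = \frac{0.348}{0.361599}
  \approx 0.962
```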
[Path diagram of the fitted model.]
13
[Plot: ts1[ist:iend, i] against ist:iend (time points 225 to 250). Black line: estimated latent series (Kalman filter). Colored lines: observed series.]
14
[Path diagram of the model at occasions t-1, t, t+1.]
Q: what if H is zero?
15
[Path diagram of the model at occasions t-1, t, t+1, with the H paths removed.]
Q: what if H is zero?
A: the data at different occasions are independent.
If H is zero, the model can be fitted in LISREL (or Mx, or M+),
or in MKFM6.
16
H=0

[Path diagram: a single occasion, a[t+1] with indicators y; G = I.]

MKFM                                          LISREL
S parameters             1.000 0.867 0.841 0.665   LAMBDA-Y    1.000 0.867 0.841 0.665
Q parameters             0.949                     PSI         0.949
R parameters - diagonal  0.557 0.481 0.472 0.634   THETA-EPS   0.557 0.481 0.472 0.634
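With H = 0 the model at each occasion is an ordinary one-factor model, so the covariance matrix it implies for the four indicators is S Q S' + R. The sketch below (an illustration added here, with a hypothetical helper name `implied_cov`) computes that matrix from the estimates reported above, treating Q as a scalar and R as diagonal.

```python
def implied_cov(S, Q, R):
    """Implied covariance S Q S' + diag(R) for one factor with variance Q,
    loadings S and diagonal residual variances R."""
    n = len(S)
    return [[S[i] * Q * S[j] + (R[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]

# Estimates reported above for the H = 0 model
S = [1.000, 0.867, 0.841, 0.665]
Q = 0.949
R = [0.557, 0.481, 0.472, 0.634]

sigma = implied_cov(S, Q, R)
for row in sigma:
    print(" ".join(f"{v:6.3f}" for v in row))
```

Because MKFM and LISREL report the same S, Q and R here, both programs imply the same covariance matrix for this special case.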
17
Similarities between the LISREL model
and the MKF state-space model.
(Legend: covariance matrices, regression parameters.)

Measurement (linear factor) model:
  MKF:     y[t] = S a[t] + d + e[t] + Z x[t]
  LISREL:  y[i] = Λ η[i] + τ + ε[i]
Structural regression model:
  MKF:     a[t+1] = H a[t] + c + G z[t+1] + B x[t+1]
  LISREL:  η[i] = B η[i] + α + I ζ[i]
Covariance matrices:
  MKF:     cov(e) = R,  cov(G z) = G Q G' (= Q when G = I)
  LISREL:  cov(ε) = Θ,  cov(ζ) = Ψ
Sy-Miin Chow et al. SEM 2010.
18
Next example: VAR
[Path diagram: a1, a2, a3 at t and t+1, linked by H; fixed regressor x (via B); intercepts d from the constant 1; innovation covariance Q.]
Restricted vector autoregressive model, S=I
y[t] and a[t] variables identical
19
[Path diagram: a1, a2, a3 at t and t+1, linked by H, annotated with the regression on fixed x (B) and the intercepts (d).]
Effect of x on a2 and a3 via a1 (a causal model)
20
[Plot: time series a1, a2, a3 (fsdat[101:150, 1]) and the fixed variable x, against Index 0 to 50.]
Gaps: 25% missing in each series
21
G fi di
1 1 1
G fr di
0 0 0
d fr
1 2 3
d fi
0 0 0
H fi
0 0 0
0 0 0
0 0 0
H fr
4 5 6
7 8 9
10 11 12
[Path diagram: a1, a2, a3 at t and t+1 with paths H and B, intercepts d, covariance Q.]
Q fi di
1 1 1
Q fr di
21 22 23
S fi di
1 1 1
S fr di
0 0 0
B fr
31
32
33
B fi
0
0
0
S=1 R=0 H=1 Q=1 d=1 c=0 Z=0 P=1 B=1 G=1
22
Next example: latent VAR
D: depression, A: anxiety
[Path diagram: latent D and A at two occasions, each with indicators.]
S≠I:
autoregressive / cross-lagged regression model with indicators
23
D: depression, A: anxiety
[Path diagram: D and A at two occasions, for wife and for husband.]
24
N=1 Meas. Inv. of indicators w.r.t. external variable x
[Path diagram: latent a with indicators y and residuals e; external variable x enters via B; paths H, S, G. The intercepts d are not shown.]
25
N=1 Meas. Inv. of indicators w.r.t. external variable x
[Path diagram: x, z, a and the indicators y with residuals e.]
f(yi|a*) = f(yi|a*,xi)
26
N=1 Meas. Inv. of indicators w.r.t. external variable x
Two indicators are biased w.r.t. x.
[Path diagram: as before (paths B, S, H, G), with additional paths Z from x to the indicators (bias with respect to x).]
f(yi|a*) ≠ f(yi|a*,xi)
27
[Path diagram: x, z, a and the indicators y with residuals e; paths B.]
f(yi|a*) = f(yi|a*,subjecti)
Are the indicators measurement invariant w.r.t. subject (e.g., N=2)?
d is invariant (intercepts equal); B is zero in subject 1 and free in subject 2.
x could equal 1.
28
f(yi|a*) = f(yi|a*,xi)
f(yi|a*) = f(yi|a*,subjecti)
Definition of measurement invariance in N=1 or N=2.
+ Interpretation as an intra-individual causal model
Relationship with inter-individual causal model
Issue of power: simulation? exact simulation?
Is N=100, T=1 relevant to N=1, T=100?
Application (real data)
29
T=50 Ny=4 N=250

Parameter   ML parameter estimates   ML parameter estimates
R11         0.51000                   0.50998 (.51)
R22         0.36000                   0.35997 (.36)
R33         0.51000                   0.50994 (.51)
R44         0.36000                   0.36001 (.36)
D1          0.00000                  -0.00153 (.0)
D2          0.00000                  -0.00175 (.0)
D3          0.00000                  -0.00153 (.0)
D4          0.00000                  -0.00175 (.0)
H           0.00000                   0.79994 (.80)
Q           1.00000                   0.37097 (.36)
S21         0.80000                   0.79994 (.8)
S31         0.70000                   0.69997 (.7)
S41         0.80000                   0.79992 (.8)
30
Good points:
N=1, N=few, N=many
Multigroup, where group is N=1 or N>1
No limitation on length of timeseries T
Can handle N=few T=intermediate (a niche!)
Missing data no problem (under assumptions)
Model quite flexible
Freely available (FORTRAN 77 code)
Easy-ish to use
Bad points:
Stationarity (cov structure)
ML fixed effect only (no random effects)
Continuous indicators, conditional normality
Easy-ish to use
31
Estimation: Maximum Likelihood in the Kalman Filter
(prediction error decomp.)
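To illustrate what the prediction error decomposition amounts to, here is a minimal sketch for the simplest case: one latent state and one observed series, with G = 1 and no fixed regressors. It is not the MKFM6 code (which is FORTRAN 77 and multivariate). The function name `kalman_loglik`, the use of `None` for missing values, and the defaults a0 = 0 and P0 = 100 are assumptions made for this example; P0 = 100 mirrors the "P fi 100" line in the example input only on the assumption that P is the initial state variance.

```python
import math

def kalman_loglik(y, S, d, R, H, Q, a0=0.0, P0=100.0):
    """Gaussian log-likelihood of
         y[t]   = S a[t] + d + e[t],   var(e) = R
         a[t+1] = H a[t] + z[t+1],     var(z) = Q
       accumulated from the one-step prediction errors.
       Entries of y equal to None are treated as missing:
       the update step is skipped at those occasions."""
    a_pred, P_pred = a0, P0               # predicted state mean and variance
    loglik = 0.0
    for obs in y:
        if obs is not None:
            v = obs - (S * a_pred + d)    # one-step prediction error
            F = S * P_pred * S + R        # variance of the prediction error
            loglik += -0.5 * (math.log(2 * math.pi * F) + v * v / F)
            K = P_pred * S / F            # Kalman gain
            a_filt = a_pred + K * v       # filtered state mean
            P_filt = (1 - K * S) * P_pred # filtered state variance
        else:
            a_filt, P_filt = a_pred, P_pred
        a_pred = H * a_filt               # predict the next state
        P_pred = H * P_filt * H + Q
    return loglik

# Illustrative call with made-up data containing one missing value
example = kalman_loglik([5.1, None, 4.8, 5.6], S=1.0, d=5.0, R=0.5, H=0.7, Q=0.5)
print(example)
```

Maximum likelihood estimation then consists of handing a function like this to a numerical optimizer that searches over the free parameters. Skipping the update at missing occasions is how a filter of this kind copes with gaps in the series, under the usual assumptions about the missingness.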
Documentation:
mkfm.doc (manual with examples: DFA, ARMA,
includes FORTRAN source code)
j_adolf.doc (more examples incl meas. inv.)
some technical doc (online; Ellen Hamaker)
My main reference:
A.C. Harvey (1996). Forecasting, structural time series models
and the Kalman Filter. Cambridge: Cambridge Univ. Press.
Other good references: Hamilton, Kim & Nelson.
One or two articles using MKFM: Ellen Hamaker (UU).
32
To use:
1) Organize data input (manual)
2) Write input script (manual)
3) Run analysis in DOS window
mkfm6-1 < inputfile > outputfile
33
Input #1: Model specification
title example simulated
nm=1 se=yes
mo=1 ny=4 ne=1 nx=1
df=ts1mi rf=no ns=1 mi=-999
B=1 S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 G=1
R fi di
0 0 0 0
R fr di
1 2 3 4
G fi di
1
G fr di
0
d fr
11 12 13 14
d fi
0 0 0 0
H fi
0
H fr
21
Q fi
1
Q fr
31
S fi
1
0
0
0
S fr
0
41
42
43
B fi
0
B fr
50
P fi
100
P fr
0
st
...
lb
...
ub
...
34
Input part #2:
the data file
data file: TS1MI
250
0
4.621209 5.754381 6.855362 7.026104 0
5.826427 6.732414 6.818705 5.448525 0
4.840544 4.112223 6.653377 7.070116 0
6.197258 7.596196 7.257844 3.101740 1
..........
35
Output part #1: read from input file; model specification
max nm= 5 nt=5000 ns= 10 ny=30 nx= 5 ne=30 npar=400
title example simulated
nm=1 se=yes
mo=1 ny=4 ne=1 nx=1
df=ts1mi rf=no ns=1 mi=-999
B=1 S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 G=1
===================
MKFv1
April 2010
===================
title example simulated
Model 1 of 1
S fr parameters (nonzero)
0
11
12
13
R fr parameters (nonzero) - diagonal
1
2
3
4
H fr parameters (nonzero)
9
Q fr parameters (nonzero)
10
d fr parameters (nonzero)
5
6
7
8
B fr parameters (nonzero)
14
36
Output part #2: summary stats + parameter estimates, st errs, tvals
DATA SUMMARY
MODEL 1 of 1
CASE 1 NY= 4 NX= 1 NE= 1 Ncases= 1 START= 1 END= 1
T= 250 N of T missing= 0 datafile ts1mi
State_0 0.00
Number of fixed regressors 1

var   %miss    mean   var    std    min    max
1     0.2720   5.52   1.83   1.35   1.32   8.82
2     0.2640   5.41   1.67   1.29   2.01   9.75
3     0.2520   5.49   1.39   1.18   2.23   9.33
4     0.2880   5.40   0.94   0.97   3.22   8.01

ML parameter estimates
nr   estimate   g           se       t
 1   0.55179     0.000026   0.0851    6.48
 2   0.51076     0.000012   0.0748    6.83
 3   0.68355     0.000048   0.0830    8.23
 4   0.44029     0.000038   0.0562    7.83
 5   4.47651     0.000020   0.2983   15.01
 6   4.47448     0.000051   0.2748   16.28
 7   4.73732    -0.000049   0.2244   21.11
 8   4.71898    -0.000096   0.1928   24.47
 9   0.72616    -0.000106   0.0506   14.36
10   0.48247    -0.000055   0.0932    5.18
11   0.92335     0.000051   0.0738   12.52
12   0.71907    -0.000143   0.0710   10.13
13   0.62061    -0.000031   0.0609   10.19
14   0.95749    -0.000021   0.1301    7.36
Logl -513.914 -2xLogL 1027.828 Inform(NPSOL) 0
37
Output part #3: parameter estimates in parameter matrices
title example simulated
Model 1 of 1
S parameters
1.000
0.923
0.719
0.621
R parameters - diagonal
0.552
0.511
0.684
0.440
H parameters
0.726
Q parameters
0.482
P parameters
100.000
d parameters
4.477
4.474
4.737
4.719
G parameters - diagonal
1.000
B parameters
0.957
P(t|t) error cov
0.233
38
