Innovation Approach to the Identification of Causal Models in Time Series Analysis
T. Ozaki
Institute of Statistical Mathematics
2020/5/2
Innovation Approach
(Wold, Kolmogorov, Wiener, Kalman, Kailath; Box-Jenkins, Akaike etc.)
Data: $x_1, x_2, x_3, \ldots, x_N$
Causal Model (Ozaki 1985, 1992, 1995, 2000):
$z_t = f(z_{t-1}) + \varepsilon_t$
$dz(t)/dt = f(z(t))$
$dz(t) = f(z(t))\,dt + dw(t)$
Innovations: $\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots, \varepsilon_N$
What causes the time-dependency in geophysical time series?
The geophysical dynamical system.
Three topics in time series
1. Nonlinear time series: Dynamical Systems
2. Non-Gaussian time series: Dynamical Systems & Shot Noise
3. Spatial time series: Spatial Dynamics
All three are treated by the Innovation Approach.
Dynamical System & Time Series Model
Ozaki & Oda (1976)
Restoring force ∝ W·GM sin x:
W·GM sin x ≈ x : for small x
W·GM sin x ≈ x − (1/6)x³ : for large x
Linear model: $\ddot x(t) + c\dot x(t) + \alpha x(t) = n(t)$
Nonlinear model: $\ddot x(t) + c\dot x(t) + \alpha x(t) + \beta x^3(t) = n(t)$
The dynamical system (D.S.), driven by the noise $n_t$, generates the time series $x_t$.
ExpAR Model
$x_t = \sum_{i=1}^{p} \{\phi_{i,0} + \phi_{i,1} \exp(-x_{t-1}^2)\}\, x_{t-i} + \varepsilon_t$
1. Smaller prediction errors than AR models:
$\varepsilon_t = x_t - \sum_{i=1}^{p} \{\phi_{i,0} + \phi_{i,1} \exp(-x_{t-1}^2)\}\, x_{t-i}$
versus
$\varepsilon_t = x_t - \sum_{i=1}^{p} \phi_i x_{t-i}$
2. Mechanical interpretation
Ozaki & Oda (1976)
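To make point 1 concrete, here is a minimal sketch in Python (all coefficient values are illustrative assumptions, not taken from the talk): it simulates an ExpAR(1) process and compares the one-step prediction errors of a least-squares AR(1) fit with those of the true ExpAR predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_expar(n, phi0, phi1, sigma=0.1):
    """Simulate x_t = sum_i {phi0[i] + phi1[i]*exp(-x_{t-1}^2)} x_{t-i} + eps_t."""
    p = len(phi0)
    x = np.zeros(n + p)
    for t in range(p, n + p):
        lags = np.array([x[t - i] for i in range(1, p + 1)])
        coeff = phi0 + phi1 * np.exp(-x[t - 1] ** 2)
        x[t] = coeff @ lags + sigma * rng.standard_normal()
    return x[p:]

# illustrative ExpAR(1): x_t = {0.8 + 0.2*exp(-x_{t-1}^2)} x_{t-1} + eps_t
x = simulate_expar(2000, phi0=np.array([0.8]), phi1=np.array([0.2]))

# one-step prediction errors: least-squares AR(1) fit vs the ExpAR predictor
X, y = x[:-1], x[1:]
phi_ar = (X @ y) / (X @ X)                        # linear AR(1) coefficient
err_ar = y - phi_ar * X
err_expar = y - (0.8 + 0.2 * np.exp(-X ** 2)) * X
print(np.var(err_ar), np.var(err_expar))          # ExpAR errors should be smaller
```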
Natural frequency
Continuous time: $\ddot x(t) + c\dot x(t) + \alpha x(t) = n(t)$, with $n(t)$ in and $x(t)$ out.
Characteristic equation: $\lambda^2 + c\lambda + \alpha = 0$
$f_0 = \frac{1}{2\pi}\sqrt{\alpha - c^2/4}$
Discrete time: $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + n_t$, with $n_t$ in and $x_t$ out.
Characteristic equation: $\lambda^2 - \phi_1 \lambda - \phi_2 = 0$
$f_0 = \frac{1}{2\pi}\tan^{-1}\!\left(\sqrt{-4\phi_2 - \phi_1^2}\,/\,\phi_1\right)$
Power spectrum: $p(f) = \frac{\sigma^2}{\left|1 - \phi_1 e^{-i2\pi f} - \phi_2 e^{-i2\pi f \cdot 2}\right|^2}$, peaking near $f_0$.
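A small numerical check of these formulas, as a sketch (the coefficient values below are illustrative): the natural frequency computed from $\phi_1, \phi_2$ should match the peak location of the AR(2) power spectrum.

```python
import numpy as np

def ar2_natural_frequency(phi1, phi2):
    """f0 = (1/2pi) * arctan(sqrt(-4*phi2 - phi1^2) / phi1), defined when the
    characteristic roots of lambda^2 - phi1*lambda - phi2 = 0 are complex."""
    disc = -4.0 * phi2 - phi1 ** 2
    if disc <= 0:
        raise ValueError("real characteristic roots: no oscillation")
    return np.arctan2(np.sqrt(disc), phi1) / (2.0 * np.pi)

def ar2_spectrum(f, phi1, phi2, sigma2=1.0):
    """p(f) = sigma^2 / |1 - phi1*e^{-i2pi f} - phi2*e^{-i4pi f}|^2."""
    z = np.exp(-2j * np.pi * f)
    return sigma2 / np.abs(1.0 - phi1 * z - phi2 * z ** 2) ** 2

f = np.linspace(0.0, 0.5, 2001)
phi1, phi2 = 1.0, -0.9                               # illustrative values
print(ar2_natural_frequency(phi1, phi2))             # about 0.16
print(f[np.argmax(ar2_spectrum(f, phi1, phi2))])     # spectral peak nearby
```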
Idea: Dynamic eigenvalues
1. Duffing equation: $\ddot x(t) + c\dot x(t) + \alpha x(t) + \beta x^3(t) = n(t)$
2. van der Pol equation: $\ddot x(t) + c\{x^2(t) - 1\}\dot x(t) + \alpha x(t) = n(t)$
ExpAR(2) in companion form:
$\begin{pmatrix} x_t \\ x_{t-1} \end{pmatrix} = \begin{pmatrix} \phi_1(x_{t-1}) & \phi_2(x_{t-1}) \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x_{t-1} \\ x_{t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_t \\ 0 \end{pmatrix}$
$\phi_1(x_{t-1}) = \phi_{1,0} + \phi_{1,1}\exp(-x_{t-1}^2)$
$\phi_2(x_{t-1}) = \phi_{2,0} + \phi_{2,1}\exp(-x_{t-1}^2)$
The eigenvalues of the coefficient matrix move with $x_{t-1}$.
Make it non-explosive!
$x_t = (\phi_{1,0} + \phi_{1,1} x_{t-1}^2)\, x_{t-1} + \phi_2 x_{t-2} + \varepsilon_t$ : explosive!
$x_t = \{\phi_{1,0} + \phi_{1,1} \exp(-x_{t-1}^2)\}\, x_{t-1} + \phi_2 x_{t-2} + \varepsilon_t$ : non-explosive!
Make the process stay inside for large $x_{t-1}$!
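A quick sketch of the contrast (all parameter values are illustrative assumptions): with a polynomial coefficient the multiplier grows with $x_{t-1}^2$ and the iteration blows up; with the exponential coefficient it stays bounded.

```python
import numpy as np

rng = np.random.default_rng(1)

def iterate(coeff_fn, n=300, sigma=0.3):
    """Iterate x_t = coeff_fn(x_{t-1}) * x_{t-1} + 0.1 * x_{t-2} + eps_t."""
    x = [0.1, 0.0]
    for _ in range(n):
        x.append(coeff_fn(x[-1]) * x[-1] + 0.1 * x[-2]
                 + sigma * rng.standard_normal())
    return np.array(x)

poly = iterate(lambda u: 0.8 + 1.5 * u ** 2)            # coefficient unbounded
expar = iterate(lambda u: 0.8 + 1.5 * np.exp(-u ** 2))  # coefficient -> 0.8 for large u
print(np.isfinite(poly).all())   # False: overflows (NumPy warns) -- explosive
print(np.abs(expar).max())       # stays bounded -- non-explosive
```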
ExpAR Singular Points
Ozaki (1985)
ExpAR-Chaos
Ozaki (1985)
ExpAR Models & non-Gaussian distributions
Ozaki (1985)
$x_t = \phi(x_{t-1})\, x_{t-1} + \varepsilon_t$
$0, \xi_2, -\xi_2$ : stable singular points
$\xi_1, -\xi_1$ : unstable singular points
Example:
$x_t = (0.8 + 1.3 x_{t-1}^2 - 1.3 x_{t-1}^4)\, x_{t-1} + \varepsilon_t$ for $|x_{t-1}| \le 1$
$x_t = 0.8\, x_{t-1} + \varepsilon_t$ for $|x_{t-1}| > 1$
Distribution of ExpAR process
Gaussian white noise $n_{t+1}$ in, non-Gaussian $x_{t+1}$ out:
$x_{t+1} = \{0.8 - 0.4\exp(-x_t^2)\}\, x_t + n_{t+1}$
$x_{t+1} = \{1 - 0.2\exp(-x_t^2)\}\, x_t + n_{t+1}$
$x_{t+1} = \{0.8 + 0.2\exp(-x_t^2)\}\, x_t + n_{t+1}$
Causal Models in discrete time and continuous time
$dx(t)/dt = f(x(t))$ ? $x_t = \phi(x_{t-1})\, x_{t-1}$
$dx(t) = f(x(t))\,dt + dw(t)$ ? $x_t = \phi(x_{t-1})\, x_{t-1} + \varepsilon_t$
Time discretizations of $dx = f(x)dt + dw(t)$
Euler scheme, Heun scheme, Runge-Kutta scheme:
each yields an explosive nonlinear AR model,
$x_{t+\Delta t} = p(x_t)\, x_t + \sqrt{\Delta t}\, w_{t+\Delta t}$, with a polynomial coefficient $p(\cdot)$.
Any non-explosive scheme?
L.L. scheme
i) Ozaki (1984):
$x_{t+\Delta t} = \exp(K_t \Delta t)\, x_t + \sqrt{\Delta t}\, w_{t+\Delta t}$
$K_t = \frac{1}{\Delta t}\log\!\left[1 + J_t^{-1}\{\exp(J_t \Delta t) - 1\}\, f(x_t)/x_t\right]$
$J_t = \left.\frac{\partial f(x)}{\partial x}\right|_{x = x_t}$
ii) Ozaki (1985)
iii) Biscay et al. (1996)
iv) Shoji & Ozaki (1997)
Characteristics:
1. Simple
2. A-stable
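A minimal scalar implementation of this L.L. step, as a sketch (it assumes $x_t \neq 0$ and $J_t \neq 0$, as the formula above requires; the double-well drift in the example is an illustrative assumption, not a model from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)

def ll_step(x, f, df, dt, sigma):
    """One L.L. step: x_{t+dt} = exp(K_t dt) x_t + sigma*sqrt(dt)*w, with
    K_t dt = log[1 + J_t^{-1} (exp(J_t dt) - 1) f(x_t)/x_t], J_t = f'(x_t).
    Assumes x != 0 and J_t != 0."""
    J = df(x)
    Kdt = np.log(1.0 + (np.expm1(J * dt) / J) * f(x) / x)
    return np.exp(Kdt) * x + sigma * np.sqrt(dt) * rng.standard_normal()

# illustrative SDE: dx = (x - x^3) dt + sigma dw
f = lambda u: u - u ** 3
df = lambda u: 1.0 - 3.0 * u ** 2
x, path = 0.5, []
for _ in range(5000):
    x = ll_step(x, f, df, dt=0.01, sigma=0.3)
    path.append(x)
print(min(path), max(path))   # the path stays bounded around the wells at +/-1
```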
Examples of $\exp(K_t \Delta t)$
$\dot x(t) = f(x) + n(t)$, discretized as $x_{t+\Delta t} = \exp(K_t \Delta t)\, x_t + \sqrt{\Delta t}\, w_{t+\Delta t}$, e.g. for
$x_{t+1} = \{0.8 + 0.2\exp(-x_t^2)\}\, x_t + n_{t+1}$
$x_{t+1} = \{1 - 0.2\exp(-x_t^2)\}\, x_t + n_{t+1}$
Innovation Approach
(Wold, Kolmogorov, Wiener, Kalman, Kailath; Box-Jenkins, Akaike etc.)
Data: $x_1, x_2, x_3, \ldots, x_N$
Causal Model (Ozaki 1985, 1992, 1995, 2000):
$z_t = f(z_{t-1}) + \varepsilon_t$
$dz(t)/dt = f(z(t))$
$dz(t) = f(z(t))\,dt + dw(t)$
Innovations: $\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots, \varepsilon_N$
Three types of models
Data: $x_1, x_2, x_3, \ldots, x_N$
Time Series Models (ExpAR, neural net etc.): $x_t = f(x_{t-1}, x_{t-2}, \ldots, x_{t-k}) + \varepsilon_t$
Dynamical Systems (dynamic phenomena): $dz(t)/dt = f(z(t))$
Stochastic Dynamical Systems: $dz(t) = f(z(t))\,dt + dw(t)$
Applications
1. Non-Gaussian time series and nonlinear dynamics
2. Estimation of $dx = f(x)dt + dw(t)$ and $dx/dt = f(x)$
3. RBF Neural Net vs ExpAR, RBF-AR modeling
4. Spatial time series modeling
Application (1)
Non-Gaussian time series and nonlinear dynamics
Does a non-Gaussian-distributed time series imply non-Gaussian prediction errors?
Not necessarily!
Distribution of ExpAR process
Gaussian white noise $n_{t+1}$ in, non-Gaussian $x_{t+1}$ out:
$x_{t+1} = \{0.8 - 0.4\exp(-x_t^2)\}\, x_t + n_{t+1}$
$x_{t+1} = \{1 - 0.2\exp(-x_t^2)\}\, x_t + n_{t+1}$
$x_{t+1} = \{0.8 + 0.2\exp(-x_t^2)\}\, x_t + n_{t+1}$
Same Distribution, Different Dynamics
Ozaki (1985, 1990)
Target marginal: the Gamma density $W(x) = \frac{x^{\kappa-1}\exp(-x/\lambda)}{\Gamma(\kappa)\,\lambda^{\kappa}}$, here with $(\kappa, \lambda) = (3, 1)$.
Three different dynamics with this same marginal distribution:
i) $x(t) = y^2(t)$, where $\dot y(t) = \frac{(2\kappa - 1)\sigma^2}{2 y(t)} - \frac{\sigma^2 y(t)}{\lambda} + n(t)$
ii) $x(t) = e^{2\beta y(t)}$, where $\dot y(t) = \sigma^2 \beta\{\kappa - e^{2\beta y(t)}/\lambda\} + n(t)$
iii) $x(t) = e^{2\beta y(t)}$, where $\ddot y + a\dot y + \frac{\sigma^2 \beta}{a}\{e^{2\beta y}/\lambda - \kappa\} = n(t)$
Mechanism
Gamma-distributed Process (Type II) (Ozaki, 1985, 1992)
Fokker-Planck equation:
$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}\left[2\beta^2\sigma^2\{(\kappa + 1)x - x^2/\lambda\}\, p\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[4\beta^2\sigma^2 x^2\, p\right]$
Stationary density: $p(x) = \frac{1}{\Gamma(\kappa)\,\lambda^{\kappa}}\, x^{\kappa-1} e^{-x/\lambda}$
Generating system: $x(t) = e^{2\beta y(t)}$, $dy = \sigma^2\beta\{\kappa - e^{2\beta y}/\lambda\}\,dt + \sigma\,dw(t)$
x(t) is generated from
i) Gaussian white noise
ii) Nonlinear Dynamics
iii) Variable Transformation
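As a numerical sketch of this mechanism (the drift above and below is a reconstruction chosen so that the stationary density of $x = e^{2\beta y}$ is exactly the Gamma density; constants may differ from the original slide): simulate many independent copies of $y$ by the Euler scheme, transform, and compare the sample moments with the Gamma($\kappa,\lambda$) values, mean $\kappa\lambda$ and variance $\kappa\lambda^2$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Reconstructed Type II construction (an assumption, not verbatim from the talk):
# x(t) = exp(2*beta*y(t)),  dy = sigma^2*beta*(kappa - exp(2*beta*y)/lam) dt + sigma dw
kappa, lam, beta, sigma = 3.0, 1.0, 0.5, 1.0
m, dt, steps = 5000, 0.002, 5000          # 5000 independent paths run to T = 10

y = np.zeros(m)
for _ in range(steps):
    drift = sigma ** 2 * beta * (kappa - np.exp(2.0 * beta * y) / lam)
    y += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(m)

x = np.exp(2.0 * beta * y)
print(x.mean(), x.var())   # close to kappa*lam = 3 and kappa*lam^2 = 3
```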
This implies the validity of the innovation approach:
Non-Gaussian time series $x_1, x_2, x_3, \ldots, x_N$
$y_t = h^{-1}(x_t)$
$\varepsilon_t = y_t - f(y_{t-1})$
$\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots, \varepsilon_N$ : Gaussian white noise
When the residuals of your model look non-Gaussian, what would you do?
1. Introduce a non-Gaussian noise model: Bayesian Approach
2. Improve the causal model so that it produces Gaussian residuals: Innovation Approach
Application (3)
RBF-AR & RBF Neural Net
The ExpAR model is not sufficient:
$x(t) = \sum_{k=1}^{p} \phi_k(x(t-1))\, x(t-k) + \varepsilon_t$
More complicated dynamics, i.e. $\phi_k(x(t-1))$: RBF-AR
y-dependent system characteristics, i.e. $\phi_k(x(t-1), y(t-1))$: RBF-ARX
RBF-AR & RBF Neural Net
More complicated $\phi_k(x(t-1))$:
RBF-AR(p,d,m):
$x(t) = \phi_0(X(t-1)) + \sum_{i=1}^{p} \phi_i(X(t-1))\, x(t-i) + \varepsilon(t)$
$X(t-1) = [x(t-1), \ldots, x(t-d)]'$
$\phi_i(X(t-1)) = c_{i,0} + \sum_{k=1}^{m} c_{i,k}\, \exp\{-\lambda_k \|X(t-1) - Z_k\|^2\}$
RBF Neural Net (p,d,m):
$x_t = \phi_0(X(t-1)) + \varepsilon_t$
$\phi_0(X(t-1)) = c_{0,0} + \sum_{k=1}^{m} c_{0,k}\, \exp\{-\lambda_k \|X(t-1) - Z_k\|^2\}$
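A minimal sketch of the RBF-AR(p,d,m) one-step predictor defined above (the centres $Z_k$, scales $\lambda_k$ and coefficients $c_{i,k}$ below are illustrative assumptions):

```python
import numpy as np

def rbf_coeff(X, c, lam, Z):
    """State-dependent coefficient
    phi(X) = c[0] + sum_k c[k] * exp(-lam[k] * ||X - Z[k]||^2)."""
    d2 = ((X - Z) ** 2).sum(axis=1)          # squared distances to the m centres
    return c[0] + c[1:] @ np.exp(-lam * d2)

def rbf_ar_predict(x_hist, C, lam, Z, p, d):
    """One-step RBF-AR(p,d,m) prediction:
    x_t_hat = phi_0(X(t-1)) + sum_i phi_i(X(t-1)) * x_{t-i},
    with X(t-1) = [x_{t-1}, ..., x_{t-d}]'."""
    X = np.array(x_hist[-1:-d - 1:-1])       # [x_{t-1}, ..., x_{t-d}]
    pred = rbf_coeff(X, C[0], lam, Z)        # intercept term phi_0
    for i in range(1, p + 1):
        pred += rbf_coeff(X, C[i], lam, Z) * x_hist[-i]
    return pred

# illustrative setup: p=2 lags, d=2 state dimension, m=3 centres
p, d, m = 2, 2, 3
Z = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])   # centres Z_k
lam = np.full(m, 0.5)                                   # scale parameters
C = 0.1 * np.ones((p + 1, m + 1))                       # coefficients c_{i,0..m}
print(rbf_ar_predict([0.3, -0.2, 0.5], C, lam, Z, p, d))
```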
Application (2)
Estimation of $\dot z(t) = f(z(t))$ and $dz(t) = f(z(t))\,dt + dw(t)$
Numerical examples:
1. Rikitake chaos (Geophysics)
2. Zetterberg Model (Brain Science)
3. Dynamic Market model (Finance)
How to identify
$dz(t) = f(z(t))\,dt + dw(t)$
from $x_1, x_2, x_3, \ldots, x_N$ ?
State Space Formulation
Data: $x_1, x_2, x_3, \ldots, x_N$
$dz = f(z)\,dt + g(z)\,dw(t)$
$x_t = H z_t + e_t$
Frost & Kailath (1971)'s theorem
$dz = f(z \mid \theta)\,dt + g(z \mid \theta)\,dw(t)$, $x_t = C z_t + \varepsilon_t$
$\nu_k = x_k - E[x_k \mid x_{k-1}, \ldots, x_1]$
$\nu_k \to \varepsilon(t)$ as $\Delta t \to 0$ : Gaussian white noise
A nonlinear filter turns the non-Gaussian time series $x_1, x_2, x_3, \ldots, x_N$ into the Gaussian white noise $\nu_1, \nu_2, \nu_3, \ldots, \nu_N$.
Likelihood Calculation
Innovation Approach (Frost & Kailath, 1971):
$\nu_k = x_k - E[x_k \mid x_{k-1}, \ldots, x_1]$, with $\nu_k \to \varepsilon(t)$ as $\Delta t \to 0$ : Gaussian white noise
$\log p(x_1, \ldots, x_N \mid \theta) = \sum_k \log p(x_k \mid x_{k-1}, \ldots, x_1, \theta)$
$= \sum_k \log p(\nu_k \mid x_{k-1}, \ldots, x_1, \theta)$
$= -\frac{1}{2}\sum_{t=1}^{N}\left\{\log 2\pi\sigma_t^2 + \frac{\nu_t^2}{\sigma_t^2}\right\}$
How to obtain $\nu_t$ and $\sigma_t^2$?
$\nu_k = x_k - E[x_k \mid x_{k-1}, \ldots, x_1] = x_k - \int \xi_k\, p(\xi_k \mid x_{k-1}, \ldots, x_1)\, d\xi_k$ : Nonlinear Filter
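A sketch of this innovation-form log-likelihood for the linear-Gaussian special case, where a scalar Kalman filter supplies $\nu_t$ and $\sigma_t^2$; in the nonlinear case the prediction step would come from the L.L. discretization instead (all model values illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def innovation_loglik(x, F, G, H, q, r, z0, P0):
    """log p(x_1..x_N) = -1/2 sum_t [log(2 pi s2_t) + nu_t^2 / s2_t], with the
    innovations nu_t and variances s2_t from a scalar Kalman filter."""
    z, P, ll = z0, P0, 0.0
    for xt in x:
        z, P = F * z, F * P * F + G * q * G          # predict
        nu, s2 = xt - H * z, H * P * H + r           # innovation and its variance
        ll -= 0.5 * (np.log(2.0 * np.pi * s2) + nu ** 2 / s2)
        K = P * H / s2                               # update
        z, P = z + K * nu, (1.0 - K * H) * P
    return ll

# illustrative model: z_t = 0.9 z_{t-1} + n_t, x_t = z_t + e_t
zs, xs = 0.0, []
for _ in range(200):
    zs = 0.9 * zs + 0.3 * rng.standard_normal()
    xs.append(zs + 0.1 * rng.standard_normal())
print(innovation_loglik(xs, F=0.9, G=1.0, H=1.0, q=0.09, r=0.01, z0=0.0, P0=1.0))
```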
Relations to Jazwinski (1970)'s scheme
$\log p(x_1, \ldots, x_N \mid \theta) = \sum_k \log p(x_k \mid x_{k-1}, \ldots, x_1, \theta)$
$p(x_k \mid x_{k-1}, \ldots, x_1, \theta) = \int p(x_k \mid z_k)\, p(z_k \mid x_{k-1}, \ldots, x_1, \theta)\, dz_k$
$p(z_k \mid x_{k-1}, \ldots, x_1, \theta) = \int p(z_k \mid z_{k-1})\, p(z_{k-1} \mid x_{k-1}, \ldots, x_1, \theta)\, dz_{k-1}$
$p(z_{k-1} \mid x_{k-1}, \ldots, x_1, \theta) = \dfrac{p(x_{k-1} \mid z_{k-1})\, p(z_{k-1} \mid x_{k-2}, \ldots, x_1, \theta)}{\int p(x_{k-1} \mid \zeta_{k-1})\, p(\zeta_{k-1} \mid x_{k-2}, \ldots, x_1, \theta)\, d\zeta_{k-1}}$
Innovation Approach: calculate $p(x_k \mid z_k)$ & $p(z_k \mid z_{k-1})$ etc. by a local Gauss model, obtained from $dz/dt = f(z \mid \theta) + n(t)$ via the L.L. scheme:
$z_t = F_{t-1} z_{t-1} + G_{t-1} n_t$, $x_t = H z_t + e_t$
Two Choices for Approximation
1) Local Gauss:
$z_t = F_{t-1} z_{t-1} + G_{t-1} n_t$, so $p(z_k \mid z_{k-1}, \theta) = N(F_{t-1} z_{t-1},\, G_{t-1} \Sigma_n G'_{t-1})$
$x_t = H z_t + e_t$, so $p(x_k \mid z_k, \theta) = N(H z_t,\, \sigma_e^2)$
2) Local non-Gauss: use of the Fokker-Planck equation
$\frac{\partial p}{\partial t} = -\sum_i \frac{\partial}{\partial z_i}[f_i(z)\, p] + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial z_i \partial z_j}[a_{i,j}(z)\, p]$
to obtain $p(z_k \mid z_{k-1})$.
Computationally, 1) is far more efficient than 2).
Advantages of the L.L. Scheme
See:
B.L.S. Prakasa Rao (1999), Statistical Inference for Diffusion Type Processes
H. Schurz (1999), A Brief Introduction to Numerical Analysis of (Ordinary) Stochastic Differential Equations Without Tears
The essence is stability & efficiency.
Numerical examples
1. Rikitake chaos (Geophysics)
2. Zetterberg Model (Brain Science)
3. Dynamic Market model (Finance)
Identification of the chaotic Rikitake model
(Ozaki et al., 2000)
Simulation obtained by the LL scheme
Rikitake (1957), Ito (1982)
Identification Results
Simulated and estimated states
Innovation of the estimated model
P > 0.95
Three types of parameters
Structured-parameter optimization for the M.L.E. method (Peng & Ozaki, 2001):
1. the system parameters $\mu_1, \mu_2$
2. the noise variances $\sigma_w^2, \sigma_e^2$
3. the initial state values $x(0), y(0), z(0)$
Initial values & Estimated States
Left: $\mu_{1,2}, \sigma_w^2, \sigma_e^2$ optimized; $x(0), y(0), z(0)$ not optimized.
Right: $\mu_{1,2}, \sigma_w^2, \sigma_e^2$ optimized; $x(0), y(0), z(0)$ optimized.
[Figure: the three estimated state trajectories over 3500 time steps for each case.]
Initial values & Innovations
Left: $\mu_{1,2}, \sigma_w^2, \sigma_e^2$ optimized; $x(0), y(0), z(0)$ not optimized.
Right: $\mu_{1,2}, \sigma_w^2, \sigma_e^2$ optimized; $x(0), y(0), z(0)$ optimized.
[Figure: the innovation sequences over 3500 time steps for each case.]
Reality in Data Analysis
1. Zero prediction errors are not attainable, even with nonlinear causal (chaos) models.
2. Residuals of ARIMA models are usually almost Gaussian, but not always:
- time-inhomogeneous residuals
- Gaussian white residuals + a few outliers
Generally distributed residuals are rare!
(We don't see bi-modally distributed residuals.)
Obvious Choice
Whitening filter 1: $x_1, x_2, x_3, \ldots, x_N$ → Causal Model 1 → $\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots, \varepsilon_N$ (Stochastic Model)
Whitening filter 2: $x_1, x_2, x_3, \ldots, x_N$ → Causal Model 2 (deterministic) → $0, 0, 0, \ldots, 0$ (Deterministic Model)
A question is whether it is possible to find a perfect model for the data $x_1, x_2, \ldots, x_N$.
Application (4)
Spatial time series modeling
Example: Data assimilation in meteorology
States on the lattice points $(i,j)$: $x_t^{(i,j)}$ $(t = 1, 2, \ldots, n)$
Observations: $y_t = H\{x_t^{(1,1)}, \ldots, x_t^{(N,N)}\} + \varepsilon_t$ $(t = 1, 2, \ldots, n)$
Estimation with $p = \dim(y_t) \le \dim(x_t) = N \times N$
Mutual understanding: on the way
Variational method (4D-Var), Rabier et al. (1993), Ide & Ghil (1997):
$I(x_0) = \frac{1}{2}\sum_{i=0}^{n}\{y_i - H(x_{i|0})\}' R_i^{-1}\{y_i - H(x_{i|0})\} + \frac{1}{2}\{x_0 - x_{0|0}\}' B_0^{-1}\{x_0 - x_{0|0}\}$
- Closed system
- Perfect model with observation errors
Penalized Least Squares method:
$I(x_0) = \frac{1}{2}\sum_{i=1}^{n}\{y_i - H(x_{i|i})\}' R_i^{-1}\{y_i - H(x_{i|i})\} + \frac{1}{2}\{x_0 - x_{0|0}\}' V_0^{-1}\{x_0 - x_{0|0}\} + \sum_{i=1}^{n}\{x_i - x_{i|i-1}\}' Q_i^{-1}\{x_i - x_{i|i-1}\}$
- Open system
- Model errors as well as observation errors
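A schematic rendering of the two objectives (array shapes, the weighting matrices and the `step` function are assumptions for illustration; `step` is one deterministic model step):

```python
import numpy as np

def cost_4dvar(x0, ys, step, H, Rinv, B0inv, x0_bg):
    """4D-Var: the control is the initial state only (perfect model).
    I(x0) = 1/2 sum_{i=0}^n {y_i - H x_i}' R^-1 {y_i - H x_i}
          + 1/2 {x0 - x0_bg}' B0^-1 {x0 - x0_bg},  with x_i = step(x_{i-1})."""
    cost = 0.5 * (x0 - x0_bg) @ B0inv @ (x0 - x0_bg)
    x = x0
    for y in ys:                         # i = 0..n
        res = y - H @ x
        cost += 0.5 * res @ Rinv @ res
        x = step(x)                      # perfect-model forecast
    return cost

def cost_pls(xs, ys, step, H, Rinv, Qinv, V0inv, x0_bg):
    """Penalized L.S.: the whole trajectory x_0..x_n is free (open system);
    model errors x_i - step(x_{i-1}) are charged through Q^-1."""
    cost = 0.5 * (xs[0] - x0_bg) @ V0inv @ (xs[0] - x0_bg)
    for i in range(1, len(xs)):
        res = ys[i - 1] - H @ xs[i]
        mdl = xs[i] - step(xs[i - 1])    # model error term
        cost += 0.5 * res @ Rinv @ res + mdl @ Qinv @ mdl
    return cost
```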
Similar principles
Variational method (4D-Var):
- Closed system
- Perfect model with observation errors
Penalized Least Squares method:
- Open system
- Model errors as well as observation errors
Sequential method (Extended Kalman Filter + MLE):
- Open system
- Model errors as well as observation errors (similar to P.L.S. but not the same!)
Hidden approximations behind perfect-model assumptions
Infinite dimensional state → finite dimension (model error)
Open universe → closed system (model error)
Nonlinear dynamics → numerical approximation (model error)
Experience in Chaos (smoothing the trajectory)
The penalized L.S. method didn't work!
$I(x_0) = \frac{1}{2}\sum_{i=1}^{n}\{y_i - H(x_{i|i})\}' R_i^{-1}\{y_i - H(x_{i|i})\} + \frac{1}{2}\{x_0 - x_{0|0}\}' V_0^{-1}\{x_0 - x_{0|0}\} + \sum_{i=1}^{n}\{x_i - x_{i|i-1}\}' Q_i^{-1}\{x_i - x_{i|i-1}\}$
Farmer & Sidorowich (1991), Kostelich & Yorke (1988)
Many local minima!
The non-penalized L.S. method (4D-Var) is even worse!
Prediction errors $\nu_k = x_k - E[x_k \mid x_{k-1}, \ldots, x_1]$ with $\sigma_n^2 = 0$.
Prediction errors with assumptions
[Figure: prediction-error sequences over 1000 time steps under $\sigma_n^2 = 0.1 \times 10^{-7}$, $\sigma_n^2 = 0.1 \times 10^{-6}$, and $\sigma_n^2 = 0.1 \times 10^{-4}$.]
M.L.E. with L.L. Filtering
Ozaki (1994), Ozaki et al. (2000)
The Maximum Likelihood Method works for Lorenz chaos & Rikitake chaos:
$(-2)\log p(y_1, y_2, \ldots, y_N) = \sum_{t=1}^{N}\left[\log \sigma_{t|t-1}^2 + \frac{\{y_t - H(x_{t|t-1})\}^2}{\sigma_{t|t-1}^2}\right]$
The Innovation Approach combines:
1. Optimization
2. Objective function
3. Filtering
Remaining problem: the computational burden of huge-dimensional states.
Numerical Example
Innovation Approach to Spatial Time Series: fMRI data
A 147,456-dimensional time series (147,456 channels).
fMRI Machine
fMRI data (3T)
Sampling interval: 3 sec
Resolution: 64 × 64 × 36 (slices)
A 147,456-dimensional time series.
Is a 147,456-dim AR(1) impossible? (a 147,456 × 147,456 transition matrix?)
Innovations in spatial dynamics
$\nu_t^{(i,j)}(x) = x_t^{(i,j)} - E[x_t^{(i,j)} \mid x_{t-1}^{(*,*)}, x_{t-2}^{(*,*)}, \ldots]$
$\nu_t^{(i,j)}(x \mid s) = x_t^{(i,j)} - E[x_t^{(i,j)} \mid x_{t-1}^{(*,*)}, x_{t-2}^{(*,*)}, \ldots, s_{t-1}, s_{t-2}, \ldots]$
Why not start from the simplest linear model?
Space-Temporal Model with stimulus
Simplest example:
$x_t^{(i,j)} = \alpha^{(i,j)} + \phi^{(i,j)} x_{t-1}^{(i,j)} + \psi^{(i,j)} \xi_{t-1}^{(i,j)} + \beta^{(i,j)} s_{t-1} + \varepsilon_t^{(i,j)}$
$s_t$ : stimulus
$\xi_{t-1}^{(i,j)} = (x_{t-1}^{(i,j-1)}, x_{t-1}^{(i,j+1)}, x_{t-1}^{(i-1,j)}, x_{t-1}^{(i+1,j)})'$ : neighbour vector
$\psi^{(i,j)} = (b_N^{(i,j)}, b_S^{(i,j)}, b_W^{(i,j)}, b_E^{(i,j)})$ : neighbour coefficients
Solve a linear equation for each space point $(i,j)$ for $\alpha^{(i,j)}, \phi^{(i,j)}, \psi^{(i,j)}, \beta^{(i,j)}, \sigma^{(i,j)}$, as in the sketch below.
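A minimal per-pixel least-squares sketch for this model (all lag orders fixed at one and interior lattice points only; the neighbour ordering is an assumption):

```python
import numpy as np

def fit_spatial_arx(x, s):
    """Per-pixel least squares for
    x_t[i,j] = a + phi*x_{t-1}[i,j] + psi.(4 neighbours at t-1)
               + beta*s_{t-1} + eps_t[i,j].
    x: (T, N, N) lattice series, s: (T,) stimulus."""
    T, N, _ = x.shape
    coefs = np.zeros((N, N, 7))          # [a, phi, bN, bS, bW, bE, beta]
    for i in range(1, N - 1):
        for j in range(1, N - 1):
            D = np.column_stack([
                np.ones(T - 1),
                x[:-1, i, j],
                x[:-1, i - 1, j], x[:-1, i + 1, j],   # neighbours in i
                x[:-1, i, j - 1], x[:-1, i, j + 1],   # neighbours in j
                s[:-1],
            ])
            coefs[i, j], *_ = np.linalg.lstsq(D, x[1:, i, j], rcond=None)
    return coefs
```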
The estimated model tells you something
$x_t^{(i,j)} = \hat\alpha^{(i,j)} + \sum_{k=1}^{r_1} \hat\phi_k^{(i,j)} x_{t-k}^{(i,j)} + \sum_{k=1}^{r_2} \hat\psi_k^{(i,j)} \xi_{t-k}^{(i,j)} + \sum_{k=1}^{r_3} \hat\beta_k^{(i,j)} s_{t-k} + \varepsilon_t^{(i,j)}$
This is a special type of $N_p$-dimensional ARX model ($N_p = 64 \times 64 \times 36 = 147{,}456$):
$X_t = \hat M + \sum_{k=1}^{r_1} \hat A_k X_{t-k} + \sum_{k=1}^{r_3} \hat B_k S_{t-k} + E_t$
Looking through
1. Mean field map: $\hat\mu^{(i,j)}$ from $X_t = \hat M + \sum_{k=1}^{r_1} \hat A_k X_{t-k} + \sum_{k=1}^{r_3} \hat B_k S_{t-k} + \hat E_t$
2. Cross-spectrum field map: $\hat p(f) = \hat A(f)\, \hat\Sigma\, \hat A(f)^*$
3. Causality field map: $\hat p^{(n_{ij}, n_{ij})}(f) = \sum_{k=1}^{N_p} |\hat a_{n_{ij},k}(f)|^2\, \hat\sigma_k^2$
4. Impulse response function: $h(i,j,\tau)$ & $x_t^{(i,j)} = \sum_{\tau} h(i,j,\tau)\, \nu_{t-\tau}^{(i,j)}$
5. Innovation field map: $\hat\varepsilon_t^{(i,j)}$
6. Response field map: $\sum_{k=1}^{2} \hat\beta_k^{(i,j)} s_{t-k}$
7. Innovation + Response field map: $\hat\varepsilon_t^{(i,j)} + \sum_{k=1}^{2} \hat\beta_k^{(i,j)} s_{t-k}$
Example A
fMRI data, 3T
Task: visual stimulus by a black-and-white shuffled checkerboard
Sampling interval: 3 s
Resolution: 64 × 64 × 36 (slices) (= 147,456 voxels)
Slice k=12
Time series plots along the line j=25 of slice k=12, for i = 36~50.
i = 36 ~ 40 (k=12, j=25)
[Figure: $x_t^{(i,j)}$, the response $\sum_{k=1}^{2} \beta_k^{(i,j)} s_{t-k}$, and $\varepsilon_t^{(i,j)} + \sum_{k=1}^{2} \beta_k^{(i,j)} s_{t-k}$; the label "high" (高) marks strongly responding voxels.]
i = 41 ~ 45 (k=12, j=25)
[Figure: $x_t^{(i,j)}$, the response $\sum_{k=1}^{2} \beta_k^{(i,j)} s_{t-k}$, and $\varepsilon_t^{(i,j)} + \sum_{k=1}^{2} \beta_k^{(i,j)} s_{t-k}$.]
i = 46 ~ 50 (k=12, j=25)
[Figure: $x_t^{(i,j)}$, the response $\sum_{k=1}^{2} \beta_k^{(i,j)} s_{t-k}$, and $\varepsilon_t^{(i,j)} + \sum_{k=1}^{2} \beta_k^{(i,j)} s_{t-k}$; the labels "low" (低) and "large" (大) mark weakly responding and large-innovation voxels.]
Innovation Maps
[Figure: maps of $\varepsilon_t^{(i,j)} + \sum_{k=1}^{2} \beta_k^{(i,j)} s_{t-k}$, of $\sum_{k=1}^{2} \beta_k^{(i,j)} s_{t-k}$, and of $x_t^{(i,j)}$.]
Spatial Impulse Response
Spatial ARX Simulation 1
$x_t^{(i,j)} = 0.7\, x_{t-1}^{(i,j)} + 0.1\, x_{t-1}^{(i-1,j)} + 0.1\, x_{t-1}^{(i+1,j)} + 0.1\, x_{t-1}^{(i,j-1)} + 0.1\, x_{t-1}^{(i,j+1)} + \varepsilon_t^{(i,j)}$
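A small sketch of this simulation (the zero boundary condition is an assumption, since the slide does not specify one; note that $0.7 + 4 \times 0.1 = 1.1$, so the smoothest lattice mode sits close to instability and very long runs drift upward):

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_lattice(T=30, N=32, sigma=0.1):
    """x_t[i,j] = 0.7 x_{t-1}[i,j] + 0.1*(4 nearest neighbours at t-1) + eps,
    with zero values outside the lattice (an assumed boundary condition)."""
    x = np.zeros((T, N, N))
    for t in range(1, T):
        prev = x[t - 1]
        nbr = np.zeros_like(prev)
        nbr[1:, :] += prev[:-1, :]    # neighbour (i-1, j)
        nbr[:-1, :] += prev[1:, :]    # neighbour (i+1, j)
        nbr[:, 1:] += prev[:, :-1]    # neighbour (i, j-1)
        nbr[:, :-1] += prev[:, 1:]    # neighbour (i, j+1)
        x[t] = 0.7 * prev + 0.1 * nbr + sigma * rng.standard_normal((N, N))
    return x

x = simulate_lattice()
print(np.abs(x[-1]).max())   # the field is smooth and spatially correlated
```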
Innovation Approach could be useful in Space-Time
$x_1^{(i,j)}, x_2^{(i,j)}, x_3^{(i,j)}, \ldots, x_N^{(i,j)}$
Causal Model:
- Time Series Models
- Dynamical Systems
- Stochastic Dynamical Systems
$\nu_1^{(i,j)}, \nu_2^{(i,j)}, \nu_3^{(i,j)}, \ldots, \nu_N^{(i,j)}$

Thank you
([email protected])