Analysis of time series Riccardo Bellazzi Dipartimento di Informatica e Sistemistica Università di Pavia Italy [email protected].



Analysis of time series
Riccardo Bellazzi
Dipartimento di Informatica e Sistemistica
Università di Pavia
Italy
[email protected]
[Figure: time course of blood flux (Blood flux, 150–220) over dialysis sessions (0–70)]
Time series
• Time series: a collection of observations
made sequentially in time
• Many application fields:
– Economic time series
– Physical time series
– Marketing time series
– Process control
• Characteristics:
– successive observations are NOT independent
– The order of observation is crucial
Why time series analysis
• Description
• Explanation
• Prediction
• Control
Understand, then act
Outline
• Dynamic systems basics
– Basic concepts
– Linear and non linear dynamic systems
• Structural and black box models of
dynamic systems
– Time series analysis
• AI approaches for the analysis of time series
– Knowledge-based Temporal Abstractions
– Knowledge-discovery through clustering of time
series
Outline
• Dynamic systems basics
– Basic concepts
– Linear and non linear dynamic systems
• Structural and black box models of
dynamic systems
– Time series analysis
• AI approaches for the analysis of time series
– Knowledge-discovery through clustering of time
series
– Knowledge-based Temporal Abstractions
Dynamical systems
• System: a (physical) entity which can be
manipulated with actions, called inputs (u)
and that, as a consequence of the actions,
gives a measurable reaction, called output
(y)
• Dynamic: the system changes over time; in
general, the output does not only depend on
the input, but also on the current “state” of
the system (x), i.e. on the system history
[Block diagram: input u → system with state x → output y]
A dynamical system (example)
• A simple circuit with two lamps and one
switch with values 0 (u1) or 1 (u2). The output
can be y={y1 (lamp1 on), y2 (lamp2 on), y3
(off)}. The system is configured to have four
states, x1, x2, x3, x4
[State-transition diagram: the inputs u1 and u2 move the system among the states x1, x2, x3, x4; output map: x1 → y1, x2 → y2, x3 → y3]
Dynamical system definition
• A dynamical system is a process in which a
function's value changes over time
according to a rule that is defined in terms
of the function's current value and the
current time.
Modeling a dynamical
system
• Two ingredients:
– A state transition function
x(t) = f(t, t0, x0, u(·))
[State-transition diagram of the lamp example: u1 and u2 move the system among x1, x2, x3, x4]
– An output transformation
y(t) = h(t, x(t))
[Output map of the lamp example: x1 → y1, x2 → y2, x3 → y3]
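The two ingredients above can be sketched as a small finite-state machine in Python. The transition table and output map below are hypothetical illustrations, not the exact two-lamp circuit of the slides (whose full diagram is not recoverable here):

```python
# Sketch of a discrete dynamical system given by its two ingredients:
# a state transition function f and an output transformation h.
# NOTE: this transition table and output map are hypothetical examples.

# f: (state, input) -> next state
f = {
    ("x1", "u1"): "x2", ("x1", "u2"): "x4",
    ("x2", "u1"): "x3", ("x2", "u2"): "x1",
    ("x3", "u1"): "x4", ("x3", "u2"): "x2",
    ("x4", "u1"): "x1", ("x4", "u2"): "x3",
}

# h: state -> output (which lamp is on)
h = {"x1": "y1", "x2": "y2", "x3": "y3", "x4": "y3"}

def simulate(x0, inputs):
    """Apply an input sequence and return the output trajectory."""
    x, ys = x0, []
    for u in inputs:
        x = f[(x, u)]    # state transition: x(k+1) = f(x(k), u(k))
        ys.append(h[x])  # output transformation: y(k) = h(x(k))
    return ys

outputs = simulate("x1", ["u1", "u2"])  # -> ["y2", "y1"]
```

The output depends on the state, not only on the last input: the same input applied from a different state produces a different lamp configuration, which is exactly what makes the system dynamic.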
Main classes of dynamical
systems
• Continuous / discrete
• Linear / nonlinear
• Time invariant / variant systems
• Single / Multiple Input / Outputs
• Deterministic / stochastic
Discrete and continuous
systems
• Discrete: the time set is the set of integer
numbers (t=1,2,…,k,…). The system is
typically modeled with difference
equations
x(k+1) = f(x(k), u(k)),   y(k) = h(x(k))
• Continuous: the time set is the set of non-negative real numbers. The system is
typically modeled with differential
equations
dx/dt = f(x(t), u),   x(t0) = x0,   y(t) = h(x(t))
Equilibrium
The pair (ū, x̄) defines an equilibrium if and only if

0 = f(x̄, ū)

The output at the equilibrium is given by

ȳ = g(x̄, ū)
Compartmental models
x1 = drug concentration in the gastrointestinal compartment (mg/cc)
x2 = drug concentration in the hematic compartment (mg/cc)
k1 = transfer coefficient for the gastrointestinal compartment (h⁻¹)
k2 = transfer coefficient for metabolic and excretory systems (h⁻¹)
States and inputs
x1, x2, u1, u2

dx1/dt = -k1 x1 + b1 u1
dx2/dt = k1 x1 - k2 x2 + b2 u2

[Compartment diagram: ingestion u1 enters the gastrointestinal compartment, which transfers to the hematic compartment at rate k1; injection u2 enters the hematic compartment directly; elimination from the hematic compartment occurs at rate k2]
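The compartmental equations above can be simulated numerically. A minimal forward-Euler sketch; the parameter and input values (k1, k2, b1, b2, u1, u2) are illustrative assumptions, not values from the lecture:

```python
# Forward-Euler simulation of the two-compartment drug model.
# Parameter and input values are illustrative assumptions.

def simulate(k1=0.5, k2=0.3, b1=1.0, b2=1.0, u1=2.0, u2=0.0,
             dt=0.01, t_end=50.0):
    x1, x2 = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        dx1 = -k1 * x1 + b1 * u1           # gastrointestinal compartment
        dx2 = k1 * x1 - k2 * x2 + b2 * u2  # hematic compartment
        x1 += dt * dx1
        x2 += dt * dx2
    return x1, x2

x1_eq, x2_eq = simulate()
```

With constant inputs the simulated state settles at the analytical equilibrium x̄1 = b1·u1/k1 and x̄2 = (k1·x̄1 + b2·u2)/k2, i.e. 4 and about 6.67 for these values.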
Equilibrium
Given constant inputs u1 and u2, setting the derivatives to zero:

0 = -k1 x̄1 + b1 u1
0 = k1 x̄1 - k2 x̄2 + b2 u2

so that

x̄1 = b1 u1 / k1
x̄2 = (k1 x̄1 + b2 u2) / k2
Stability of equilibria
x( t )  f ( x( t )),x( 0 )  x0
An equilibrium x = a is asymptotically
stable if all the solutions starting in the
neighbourhood of a moves towards it.
f(x)
x
Stability of trajectories
[Figure: trajectories illustrating stable, unstable, and asymptotically stable behaviour]
Phase portrait
x1  f1( x1 , x2 )
x2  f 2 ( x1 , x2 )
The locus in the x1-x2 plane of the solution x(t) for all t > 0 is
a curve that passes through the point x0. The x1-x2 plane is
usually called the state plane or phase plane.
For easy visualization, we represent f(x)=(f1(x),f2(x)),
x= (x1,x2 ), as a vector, that is, we assign to x the directed
line segment from x to x + f(x).
The family of all trajectories or solution curves is called the
phase portrait.
A Phase portrait of a
pendulum
x′ = y
y′ = -sin(x) - y
M = g = l = 1

[Figure: phase portrait in the x–y plane (x from -4 to 4, y from -3 to 2), marking the unstable equilibria and the asymptotically stable equilibria]
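The damped-pendulum trajectories can be reproduced with a few lines of numerical integration; a sketch using forward Euler with a small step (the initial condition is an arbitrary choice):

```python
import math

# Numerical integration (forward Euler) of the damped pendulum
# x' = y, y' = -sin(x) - y, illustrating the asymptotically stable
# equilibrium at the origin of the phase plane.

def trajectory(x0, y0, dt=0.001, t_end=60.0):
    x, y = x0, y0
    points = [(x, y)]
    for _ in range(int(t_end / dt)):
        # update both coordinates from the old state
        x, y = x + dt * y, y + dt * (-math.sin(x) - y)
        points.append((x, y))
    return points

pts = trajectory(1.0, 0.0)  # start near the stable equilibrium (0, 0)
xf, yf = pts[-1]            # final point of the trajectory
```

Starting from (1, 0), the damping term -y drains energy and the trajectory spirals into the equilibrium (0, 0), as in the phase portrait above.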
The phase portraits
• Fixed or equilibrium points
• Periodic orbits or limit cycles
• Quasi-periodic attractors
• Chaotic or strange attractors
Non linear dynamic systems theory studies the properties of the system in the phase plane
Linear systems
dx/dt = f(x(t), u),   x(t0) = x0,   y(t) = h(x(t))

• Linear systems: f and h are linear in x and u
• Linear Time Invariant (LTI) Systems

ẋ(t) = A x(t) + B u(t)
y(t) = C x(t) + D u(t)
Theorem: An equilibrium point of a LTI system is stable,
asymptotically stable or unstable if and only if every equilibrium point
of the system is stable, asymptotically stable or unstable respectively
Linear systems
• The dynamics is characterized by the
eigenvalues of the matrix A
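As a quick sketch, the eigenvalue test can be carried out with NumPy. The matrix below is the A matrix of the two-compartment drug model with illustrative rate values (an assumption, not from the slides):

```python
import numpy as np

# Stability of an LTI system x' = Ax follows from the eigenvalues of A:
# asymptotically stable iff all eigenvalues have negative real part.
# Rates k1, k2 are illustrative assumptions.
k1, k2 = 0.5, 0.3
A = np.array([[-k1, 0.0],
              [k1, -k2]])

eigvals = np.linalg.eigvals(A)
stable = bool(np.all(eigvals.real < 0))  # True for this A
```

Here the eigenvalues are -k1 and -k2 (the matrix is triangular), both negative, so every trajectory decays to the equilibrium.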
Linear systems: input/output
representation
• A linear system can be represented in
the frequency domain
y(t) = ∫_0^t g(t - τ) u(τ) dτ

[Block diagram: u(t) / U(s) → g(t) / G(s) → y(t) / Y(s)]

L[y(t)] = Y(s) = G(s) U(s)
Reachability
Definition: A state x̃ is reachable if there exists a finite time instant t̃ ≥ 0 and an input ũ, defined from 0 to t̃, such that x_f(t̃) = x̃.

A system such that all its states are reachable is called completely reachable.
Observability
Definition: A state x̃ ≠ 0 is called unobservable if, for any finite t̃, y_l(t) = 0 for 0 ≤ t ≤ t̃.

A system without unobservable states is called completely observable.
Decomposition
[Kalman decomposition diagram: the input u drives the reachable parts x̂a (reachable and unobservable) and x̂b (reachable and observable); x̂c (unreachable and non-observable) and x̂d (unreachable and observable) are not driven by u; the output transformation produces y from the observable parts]
Outline
• Dynamic systems basics
– Basic concepts
– Linear and non linear dynamic systems
• Structural and black box models of dynamic
systems
– Time series analysis
• Some AI approaches for the analysis of time
series
– Knowledge-discovery through clustering of time
series
– Knowledge-based Temporal Abstractions
Data Models
• Input/output or black box
• Description of the system only by
knowing measurable data
• Typically based on minimal
assumptions on the system
• No infos on the internal structure of the
system
Modeling with black-box
[Decomposition diagram as before: a black-box model describes only the input-output behaviour, i.e. the reachable and observable part x̂b of the system]
Data Models
[Diagram: SYSTEM → DATA → Modeling (guided by the PURPOSE) → INPUT-OUTPUT RELATIONSHIP → PARAMETER ESTIMATE → MODEL]
Data Models
• Time series
• Impulse response
• Transfer functions (linear models)
• Convolution / deconvolution (linear
models)
Data models (Input-output): Example

u → SYSTEM → y

[Figure: measured concentration versus time (0–160), decaying from about 15 to 0]

y(t, p) = Σ_{i=1}^{2} A_i e^{λ_i t}

Unknown parameters p = [A1, A2, λ1, λ2]ᵀ
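Estimating p amounts to minimizing the residual sum of squares between the data and the model output. A minimal sketch with synthetic, noise-free data; the "true" parameter values are assumptions chosen for illustration:

```python
import numpy as np

# Parameter estimation for the bi-exponential input-output model
# y(t, p) = A1*exp(l1*t) + A2*exp(l2*t): the estimate is the parameter
# vector minimizing the residual sum of squares (RSS).

def model(t, A1, A2, l1, l2):
    return A1 * np.exp(l1 * t) + A2 * np.exp(l2 * t)

def rss(p, t, y):
    """Residual sum of squares: the criterion minimized when fitting p."""
    return float(np.sum((y - model(t, *p)) ** 2))

t = np.linspace(0.0, 160.0, 50)
true_p = (10.0, 5.0, -0.10, -0.02)  # assumed "true" parameters
y = model(t, *true_p)               # noise-free synthetic data

rss_true = rss(true_p, t, y)                      # zero residual
rss_off = rss((9.0, 5.0, -0.10, -0.02), t, y)     # worse fit
```

Any optimizer (e.g. nonlinear least squares) would search the parameter space for the p that drives this criterion to its minimum.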
System Models
• White or grey box
• Description of the internal structure of the system based on physical principles and on explicit hypotheses on causal relationships
• After comparison with experimental data, they are aimed at understanding the principles of the system
System Models
[Diagram: SYSTEM + a priori knowledge and assumptions → Modeling → STRUCTURE; STRUCTURE + DATA + purpose → PARAMETER ESTIMATE → MODEL]
SYSTEM MODELS (STRUCTURAL)
COMPARTMENTAL MODELS

One-compartment model (input u, volume V1, elimination rate k01):

ẋ1(t) = -k01 x1(t) + u(t),   x1(0) = 0
y(t) = x1(t) / V1

Unknown parameters p = [k01, V1]ᵀ

Two-compartment model (exchange rates k12, k21):

ẋ1(t) = -(k01 + k21) x1(t) + k12 x2(t) + u(t),   x1(0) = 0
ẋ2(t) = k21 x1(t) - k12 x2(t),   x2(0) = 0
y(t) = x1(t) / V1

Unknown parameters p = [k01, k12, k21, V1]ᵀ
Structural models
[Decomposition diagram as before: guesses and prior knowledge are used to model the unreachable and/or unobservable parts, while the data constrain the reachable and observable part x̂b]
Modeling time series
• Time series: data are correlated; data are
realizations of stochastic processes
• Stochastic linear discrete input-output
models
• Two approaches:
– Model the data as a function of time (a
regression over time)
– Model the data as a function of its past values:
ARMA models
• Often, assumption of stationarity (the mean
and variance of the process generating the
data do not change over time)
Autoregressive (AR) models
yk 1  a1 yk  ek
• AR(h) is a regression model that regresses
each point on the previous h time points.
Example is AR(1)
• Each value is affected by random noise
with zero mean and variance s2
• Can be learned with linear estimation
algorithm
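Because the AR(1) model is linear in a1, the coefficient can be estimated by ordinary least squares. A sketch on simulated data (the true coefficient and noise variance are arbitrary choices):

```python
import numpy as np

# Least-squares estimation of a1 in the AR(1) model y_{k+1} = a1*y_k + e_k.
# The model is linear in a1, so the estimate is an ordinary regression
# of each value on the previous one.

rng = np.random.default_rng(0)
a1_true, n = 0.8, 5000
y = np.zeros(n)
for k in range(n - 1):
    y[k + 1] = a1_true * y[k] + rng.normal(0.0, 1.0)  # zero-mean noise

# OLS estimate: a1_hat = sum(y_k * y_{k+1}) / sum(y_k^2)
a1_hat = float(np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1]))
```

With 5000 samples the estimate lands close to the true value 0.8; MA and ARMA coefficients, by contrast, require the iterative estimation mentioned on the next slide.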
Moving Average (MA)
y_k = b1 e_{k-1} + e_k
• A different kind of model is the Moving
Average model (MA(h))
• It propagates over time the effect of the
random fluctuations
• The autocorrelation function may help in
choosing proper models
• An iterative estimation process is needed
ARMA
y_k = a1 y_{k-1} + b1 e_{k-1} + e_k
• It can be used to obtain a more
parsimonious model, with “difficult”
autocorrelation functions
Exogenous inputs
• The system can be driven not only by
noise but also by eXogenous inputs
y_k = a1 y_{k-1} + b1 e_{k-1} + c1 u_k + e_k
This is the general ARMAX model
Non linear models
• Non-linear stochastic models have also been proposed in the literature
• Examples are NARX models

y_k = f(y_{k-1}) + e_k,   with f a non-linear function

• NARX models can be easily learned from data with neural nets
Non linear AR models
• Dynamic Bayesian Nets
[Diagram: dynamic Bayesian network with variables Y1 and Y2 at time k-1 influencing their values at time k]
From black-box to structural
stochastic models
[Diagram: hidden states X1, X2 evolving over time and generating the observations Y1, Y2]
Examples:
- Kalman filters
- Dynamic BNs
- Hidden Markov Models
Observable and partially
observable models
[Diagram: two time slices k and k+1. Fully observable: all variables X1, X2, Y1, Y2 are measured. Partially observable: the states X1, X2 are hidden and only Y1, Y2 are observed]

Fully observable / Partially observable
Delay coordinate
embedding
• How to reconstruct a state-space representation from a one-dimensional time series y
• Sampled data
• Idea: add n state variables using the values of y with a delay of τ
Example
• Data generated by a linear system with two state variables
Example
Time    Y1
0.00    0
0.01    0.0092
0.02    0.0171
0.03    0.0238
0.04    0.0295
0.05    0.0343
0.06    0.0383
0.07    0.0415
0.08    0.0441
0.09    0.0462
From 1 dimension to 2 dimensions: embedding with delay = 0.05

Time    X1 = y(t)    X2 = y(t + 0.05)
0.00    0            0.0343
0.01    0.0092       0.0383
0.02    0.0171       0.0415
0.03    0.0238       0.0441
0.04    0.0295       0.0462
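The table construction above can be sketched as a small embedding function. The delay is expressed in samples: 0.05 at a 0.01 sampling step is 5 samples:

```python
# Delay-coordinate embedding: rebuild an m-dimensional state from a scalar
# series by pairing y(t) with delayed values y(t + tau). Applied to the
# sampled data of the table above with delay = 5 samples (0.05 s).

def embed(y, m, delay):
    """Return the list of m-dimensional delay vectors (delay in samples)."""
    return [tuple(y[i + j * delay] for j in range(m))
            for i in range(len(y) - (m - 1) * delay)]

y1 = [0, 0.0092, 0.0171, 0.0238, 0.0295,
      0.0343, 0.0383, 0.0415, 0.0441, 0.0462]
vectors = embed(y1, m=2, delay=5)
# vectors[0] == (0, 0.0343), vectors[1] == (0.0092, 0.0383), ...
```

The five vectors reproduce the (X1, X2) rows of the table, turning the one-dimensional record into points of a two-dimensional reconstructed state space.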
Plots
[Figure: reconstructed trajectories for τ = 0.265 and τ = 0.0442, compared with the true phase portrait]
Challenges
• Finding the embedding parameters
– Estimate the number of state variables
– Estimate the delay
• Algorithms proposed in the literature
– Autocorrelation
– Pineda-Somerer
– False nearest neighbour
Outline
• Dynamic systems basics
– Basic concepts
– Linear and non linear dynamic systems
• Structural and black box models of dynamic
systems
– Time series analysis
• Some AI approaches for the analysis of time
series
– Knowledge-discovery through clustering of time
series
– Knowledge-based Temporal Abstractions
Clustering of time series
Several methodologies available
• Similarity-based clustering
• Model-based clustering
• Template-based clustering
Zhong, S., Ghosh, J., Journal of Machine Learning Research, 2003
Similarity-Based Clustering
Key point: to define a distance measure (similarity
function) between time series.
Strategy: temporal profiles which verify the same
similarity condition are grouped together.
Different classes of clustering algorithms: hierarchical methods, self-organizing maps, partitioning.
Eisen et al., 1998; Tamayo et al., 1999
Similarity-Based Clustering:
how to choose a distance
Minkowski metric. Given the time series

S = s1, …, sn
T = t1, …, tn

D(S, T) = ( Σ_{i=1}^{n} |s_i - t_i|^p )^{1/p}

p = 1: Manhattan
p = 2: Euclidean
p = ∞: Sup
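A direct sketch of the metric:

```python
# Minkowski distance between two equal-length series.
# p = 1 gives Manhattan, p = 2 Euclidean; the sup norm is the limit p -> inf.

def minkowski(s, t, p):
    return sum(abs(si - ti) ** p for si, ti in zip(s, t)) ** (1.0 / p)

def sup_dist(s, t):
    """The p = infinity case: the largest pointwise difference."""
    return max(abs(si - ti) for si, ti in zip(s, t))

d1 = minkowski([0, 0], [3, 4], 1)  # -> 7.0 (Manhattan)
d2 = minkowski([0, 0], [3, 4], 2)  # -> 5.0 (Euclidean)
ds = sup_dist([0, 0], [3, 4])      # -> 4 (Sup)
```

The same two series can be near or far depending on p, which is why the choice of distance is the key design decision in similarity-based clustering.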
Euclidean distance: limits
Problem → Solution
• Offset translation → S = S - mean(S);  T = T - mean(T)
• Amplitude scaling → S = (S - mean(S)) / std(S);  T = (T - mean(T)) / std(T)
• Noise → Smoothing

[Figures: example pairs of time series before and after each transformation]
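The three fixes above can be sketched as small preprocessing functions; the moving-average window size is an arbitrary choice:

```python
import numpy as np

# Preprocessing steps addressing the limits of the Euclidean distance:
# z-normalization handles both offset translation (mean removal) and
# amplitude scaling; a moving average smooths out noise.

def znormalize(s):
    s = np.asarray(s, dtype=float)
    return (s - s.mean()) / s.std()

def smooth(s, w=3):
    """Moving average with window w (shortens the series by w - 1)."""
    s = np.asarray(s, dtype=float)
    return np.convolve(s, np.ones(w) / w, mode="valid")

z = znormalize([1.0, 2.0, 3.0, 4.0])  # zero mean, unit std
sm = smooth([1.0, 2.0, 3.0, 4.0], 3)  # -> [2.0, 3.0]
```

After z-normalization two series differing only in offset or amplitude become identical, so the Euclidean distance then compares shapes rather than levels.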
Other distances (1)
Correlation coefficient:

r(S, T) = Σ_{i=1}^{n} (s_i - s̄)(t_i - t̄) / √( Σ_{i=1}^{n} (s_i - s̄)² · Σ_{i=1}^{n} (t_i - t̄)² )

- Useful for temporal models.
- Looks for similarities in the shapes of the profiles.
- Disadvantage: not robust to temporal dislocations.
Other distances (2)
Dynamic Time Warping:

[Figure: alignment of two series with a fixed time axis vs. a warped time axis]

Idea: 'extend' each sequence by repeating some elements; the Euclidean distance can then be computed between the extended sequences.
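A minimal DTW sketch via the classic dynamic-programming recursion (no warping-window constraint, which practical implementations usually add):

```python
# Dynamic time warping distance: O(n*m) dynamic programming over all
# monotone alignments of the two sequences.

def dtw(s, t):
    n, m = len(s), len(t)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # repeat an element of t, of s, or advance both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

d_same = dtw([1, 2, 3], [1, 2, 3])     # -> 0.0
d_warp = dtw([1, 2, 3], [1, 1, 2, 3])  # -> 0.0 (repetition is free)
```

Repeating the first element costs nothing under DTW, whereas the Euclidean distance (which compares points index by index) would penalize the same temporal dislocation.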
Functional genomics: hierarchical clustering with correlation coefficients

Time series of 13 samples of 517 genes of human fibroblasts stimulated with serum.

Dendrograms are related to the heatmaps of gene expression over time.

Eisen et al., PNAS 1998
Iyer et al., Science, 1999
Clustering of time series
• Similarity-based clustering
• Model-based clustering
• Template-based clustering
Zhong, S., Ghosh, J., Journal of Machine Learning Research, 2003
Model-based Clustering (1)
Key point: assume that the data are sampled from a population composed of sub-populations characterized by different stochastic processes; clusters + processes = model.

Strategy: the temporal profiles generated by the same stochastic process are grouped in the same cluster. The clustering problem becomes a problem of model selection.

Cheeseman and Stutz, 1996; Fraley and Raftery, 2002; Yeung et al., 2001
Model-based Clustering (2)
Given:
Y: the data
M: a set of stochastic dynamic models and a cluster division
Θ: the model parameters

A suitable approach:
- Bayesian approach: select the model which maximizes the posterior probability of the model M given the data Y, P(M|Y)

Ramoni and Sebastiani, 1999; Baldi and Brunak, 1998; Kay, 1993
The Bayesian Solution
Ramoni et al., PNAS 2002
Analysis of gene expression time series: CAGED system (Cluster Analysis of Gene Expression Dynamics)

Assumption: time series generated by an unknown number of autoregressive stochastic processes (AR)

From Bayes' theorem: P(M|Y) is proportional to f(Y|M) (the marginal likelihood)

Assumption + hypothesis on the distribution of the model parameters → calculation of f(Y|M) for each possible model in closed form

Model selection: agglomerative process + heuristic strategy

Cluster number: automatically selected by maximizing the marginal likelihood
Clustering of time series
• Similarity-based clustering
• Model-based clustering
• Template-based clustering
Zhong, S., Ghosh, J., Journal of Machine Learning Research, 2003
Template-Based Clustering (1)
Idea: group the time series on the basis of the similarity with a
set of qualitative prototypes (templates)
Template-Based Clustering (2)
Data representation: from quantitative to qualitative.

Templates may capture the relevant characteristics of an expression profile while eliminating the spurious effects caused by noise.

They may simplify the process of capturing the variety of behaviors which characterize the gene expression profiles.

Current limit: templates and clusters have to be identified a priori.
Template-Based Clustering:
an example
Hvidsten et al., 2003
Template-based clustering is used to predict the gene function on the basis of the knowledge of known genes.

[Pipeline: time series of gene expression → define all possible intervals on the time series → templates: increasing, decreasing, steady → cluster together the genes that match the same template on the same subinterval]
Template-Based Clustering: an example

Example: all time series with 4 points
Possible time intervals: 3 + 2 + 1 = 6
Templates: Increasing, Decreasing, Steady
Possible clusters: 3 × 6 = 18
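The counting above can be checked in a few lines: subintervals of a 4-point series are the (start, end) pairs with start < end, and each template-interval pair is a candidate cluster:

```python
from itertools import combinations

# Candidate clusters in the template-based example: 6 subintervals
# (spanning at least two of the 4 time points) times 3 templates.

points = [0, 1, 2, 3]                      # 4 time points
intervals = list(combinations(points, 2))  # (start, end) with start < end
templates = ["increasing", "decreasing", "steady"]
clusters = [(tpl, iv) for tpl in templates for iv in intervals]
# len(intervals) == 6, len(clusters) == 18
```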
Template-Based Clustering:
an example
Matching
Template-Based Clustering:
real gene expression data
Cluster example:
2h-12h Decreasing
Template-based clustering
with temporal abstractions
QUALITATIVE representation of expression
profiles
TEMPORAL ABSTRACTIONS
Shahar, 1997
Temporal Primitives
• Time point
• Interval
Temporal Entities
• Events (<time-point, value>)
• Episodes (<interval, pattern>)
Pattern: specific data course
(decreasing, normal, stationary, …)
Time Series: sequence of
events
[Figure: BGL (U/ml), from 0 to 300, measured over 14 days — the time series as a sequence of events]
Data Abstraction Methods
• Qualitative Abstraction: quantitative data are abstracted into qualitative values (e.g. a BGL of 110 U/ml is abstracted into a normal value)
• Temporal Abstraction (TA): time-stamped data are aggregated into intervals associated with specific patterns
Temporal Abstractions
• Methods used to generate an abstract description of temporal data, represented by a sequence of episodes.

[Figure: BGL time series (0–350 U/ml) over 14 days]
Temporal Abstractions
Basic Abstractions
– State
– Trend
– Stationary
Complex Abstractions
State Temporal Abstractions
Low-Normal BGL values

[Figure: BGL time series (0–350 U/ml) over 14 days with an interval of low-normal values highlighted]
Trend Temporal Abstractions
BGL decreasing trend

[Figure: BGL time series (0–350 U/ml) over 14 days with a decreasing-trend interval highlighted]
Stationary Temporal
Abstractions
BGL Stationary

[Figure: BGL time series (0–300 U/ml) over 20 days with a stationary interval highlighted]
Complex Abstractions
[Figure: two series over 16 time units illustrating the relation Series1 OVERLAPS Series2]
Complex Abstractions
example
Somogyi Effect: response to hypoglycemia
while asleep with counter-regulatory
hormones causing morning hyperglycemia
hyperglycemia at Breakfast
OVERLAPS
absence of glycosuria
Relationships between
intervals: Allen algebra
[Diagram: the 13 relations between two intervals A and C]
Before, Meets, Overlaps, Finished-by, Contains, Starts, Equals, Started-by, During, Finishes, Overlapped-by, Is-met-by, After
Allen, J.F.: Towards a general theory of
action and time. Artificial Intelligence
(1984)
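As a sketch, several of the 13 relations can be computed from interval endpoints; intervals are (start, end) pairs, and the remaining cases are collapsed into "other" for brevity:

```python
# A few of Allen's interval relations, computed from (start, end) pairs.
# Only a subset of the 13 relations is distinguished here.

def allen(a, c):
    a1, a2 = a
    c1, c2 = c
    if a2 < c1:
        return "before"
    if a2 == c1:
        return "meets"
    if a1 == c1 and a2 == c2:
        return "equals"
    if a1 < c1 < a2 < c2:
        return "overlaps"
    if c1 < a1 and a2 < c2:
        return "during"
    if a1 < c1 and c2 < a2:
        return "contains"
    return "other"
```

For example, the Somogyi-effect pattern above asks whether the hyperglycemia interval overlaps the absence-of-glycosuria interval.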
Clustering with dynamic
template generation
• Idea: apply Temporal Abstractions
• Generate TAs for each temporal profile
• Cluster together “similar” TAs
TA generation
[Pipeline: original time series → dominant points detected by piecewise linear approximation (J.A. Horst, I. Beichl, 1997) → linear regression on each segment → trend TAs (Increasing/Decreasing) extracted from the local slopes; a threshold on the slope is needed. Example labeling along the series: D D D I I I I I]
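A simplified sketch of the trend-labeling step: fit a regression slope on each segment and threshold it into Increasing (I), Decreasing (D), or Steady (S). Fixed-window segmentation is used here instead of the dominant-point detection, and the window and threshold values are arbitrary assumptions:

```python
import numpy as np

# Trend TA extraction: label each segment I/D/S from its local regression
# slope. Fixed windows stand in for the piecewise-linear segmentation.

def trend_labels(y, window=5, threshold=0.1):
    y = np.asarray(y, dtype=float)
    labels = []
    for i in range(0, len(y) - window + 1, window):
        seg = y[i:i + window]
        slope = np.polyfit(np.arange(window), seg, 1)[0]  # linear regression
        labels.append("I" if slope > threshold
                      else "D" if slope < -threshold else "S")
    return "".join(labels)

up = trend_labels(list(range(10)))      # -> "II"
flat = trend_labels([3.0] * 10)         # -> "SS"
```

The resulting label string is exactly the kind of sequence (e.g. IISII) that the multi-level labeling on the next slides abstracts further.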
Labeling at different
abstraction level (1)
S → [Steady], I → [Increasing]

Example label sequences: IISII, IIISI, ISSSI, ISIII
Labeling at different
abstraction level (2)
S → [Steady], I → [Increasing]

[Taxonomy of labels at three abstraction levels: detailed level-1 sequences (IIIS, IISS, ISSS, SIII, SSII, SSIS, SISS, SIIS, IISI, ISSI, ISII, …) are collapsed into level-2 labels (ISIS, ISI, SIS, SISI, …) and further into level-3 labels (IS, SI)]
Building clusters
Time series to be clustered → labels L1, L2, L3

[Diagram: label sequences are compared level by level (L1, then L2, then L3) to decide whether two series belong to the same cluster]
Results: Taxonomy
Saccharomyces cerevisiae gene expression

[Taxonomy of clusters at levels L2 and L3; example template at L2: [Increasing Decreasing]]

(S. Chu et al. The Transcriptional Program of Sporulation in Budding Yeast. Science, 1998.)
Results (1)
GO Process
(B.J. Breitkreutz et al. Osprey: a network visualization system. Genome Biology, 2003)
Results (2)
GO Process
Results (3)
Outline
• Dynamic systems basics
– Basic concepts
– Linear and non linear dynamic systems
• Structural and black box models of
dynamic systems
– Time series analysis
• AI approaches for the analysis of time series
– Knowledge-discovery through clustering of time
series
– Knowledge-based Temporal Abstractions
Conclusions
• Time is a (the?) crucial aspect of our lives
• It is therefore crucial for intelligent data analysis
• Understanding the dynamics of processes through modeling
• IDA is an interdisciplinary field that manages time by combining systems theory, probability theory, AI, …