Analysis of time series
Riccardo Bellazzi
Dipartimento di Informatica e Sistemistica, Università di Pavia, Italy
[email protected]
[Figure: time course of blood flux (y-axis: blood flux; x-axis: dialysis sessions)]

Time series
• Time series: a collection of observations made sequentially in time
• Many application fields:
– economic time series
– physical time series
– marketing time series
– process control
• Characteristics:
– successive observations are NOT independent
– the order of observation is crucial

Why time series analysis
• Description
• Explanation
• Prediction
• Control
Understand, then act

Outline
• Dynamic systems basics
– Basic concepts
– Linear and non-linear dynamic systems
• Structural and black box models of dynamic systems
– Time series analysis
• AI approaches for the analysis of time series
– Knowledge-based Temporal Abstractions
– Knowledge-discovery through clustering of time series

Dynamical systems
• System: a (physical) entity which can be manipulated with actions, called inputs (u), and that, as a consequence of the actions, gives a measurable reaction, called output (y)
• Dynamic: the system changes over time; in general, the output does not only depend on the input, but also on the current "state" of the system (x), i.e. on the system history
[Diagram: input u → system (state x) → output y]

A dynamical system (example)
• A simple circuit with two lamps and one switch with values 0 (u1) or 1 (u2). The output can be y = {y1 (lamp 1 on), y2 (lamp 2 on), y3 (off)}. The system is configured to have four states: x1, x2, x3, x4.
[Diagram: state-transition graph among x1-x4 driven by u1 and u2, with output map x1 → y1, x2 → y2, x3 → y3]

Dynamical system definition
• A dynamical system is a process in which a function's value changes over time according to a rule that is defined in terms of the function's current value and the current time.

Modeling a dynamical system
• Two ingredients:
– a state transition function: x(t) = f(t, t0, x0, u(·))
– an output transformation: y(t) = h(t, x(t))

Main classes of dynamical systems
• Continuous / discrete
• Linear / nonlinear
• Time invariant / time variant
• Single / multiple inputs and outputs
• Deterministic / stochastic

Discrete and continuous systems
• Discrete: the time set is the set of integer numbers (t = 1, 2, …, k, …). The system is typically modeled with difference equations:
x(k+1) = f(x(k), u(k)),  y(k) = h(x(k))
• Continuous: the time set is the set of non-negative real numbers. The system is typically modeled with differential equations:
dx/dt = f(x(t), u(t)),  x(t0) = x0,  y(t) = h(x(t))
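The difference-equation form above can be simulated directly. Below is a minimal Python sketch, not part of the original slides; the particular f and h are illustrative placeholders rather than the lamp-circuit example.

```python
# Minimal sketch (illustrative, not from the slides): simulating a
# discrete-time dynamical system x(k+1) = f(x(k), u(k)), y(k) = h(x(k)).

def f(x, u):
    # example state-transition function: a stable first-order recursion
    return 0.8 * x + u

def h(x):
    # example output transformation
    return 2.0 * x

def simulate(x0, inputs):
    """Step the system from x0 under the given input sequence, collecting outputs."""
    x, outputs = x0, []
    for u in inputs:
        outputs.append(h(x))
        x = f(x, u)          # x(k+1) = f(x(k), u(k))
    return outputs

print(simulate(x0=0.0, inputs=[1.0] * 20))  # output approaches 2 * 1 / (1 - 0.8) = 10
```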
Equilibrium
The pair (u*, x*) defines an equilibrium if and only if 0 = f(x*, u*). The output at the equilibrium is given by y* = g(x*, u*).

Compartmental models
x1 = drug concentration in the gastrointestinal compartment (mg/cc)
x2 = drug concentration in the hematic compartment (mg/cc)
k1 = transfer coefficient for the gastrointestinal compartment (h^-1)
k2 = transfer coefficient for metabolic and excretory systems (h^-1)

States and inputs: x1, x2, u1, u2
[Diagram: ingestion u1 → gastrointestinal compartment → (k1) → hematic compartment → (k2) → elimination; injection u2 enters the hematic compartment]
dx1/dt = -k1 x1 + b1 u1
dx2/dt = k1 x1 - k2 x2 + b2 u2

Equilibrium
Given constant inputs u1 and u2:
0 = -k1 x1* + b1 u1  →  x1* = b1 u1 / k1
0 = k1 x1* - k2 x2* + b2 u2  →  x2* = (k1 x1* + b2 u2) / k2
(a simulation sketch is given below, after the observability definition)

Stability of equilibria
x'(t) = f(x(t)),  x(0) = x0
An equilibrium x = a is asymptotically stable if all the solutions starting in the neighbourhood of a move towards it.
[Figure: plot of f(x) versus x]

Stability of trajectories
[Figure: stable, unstable and asymptotically stable trajectories]

Phase portrait
x1' = f1(x1, x2)
x2' = f2(x1, x2)
The locus in the x1-x2 plane of the solution x(t) for all t > 0 is a curve that passes through the point x0. The x1-x2 plane is usually called the state plane or phase plane. For easy visualization, we represent f(x) = (f1(x), f2(x)), x = (x1, x2), as a vector, that is, we assign to x the directed line segment from x to x + f(x). The family of all trajectories or solution curves is called the phase portrait.

A phase portrait of a pendulum
x' = y
y' = -sin(x) - y,  with M = g = l = 1
[Phase portrait in the x-y plane, showing unstable equilibria and asymptotically stable equilibria]

The phase portraits
• Fixed or equilibrium points
• Periodic orbits or limit cycles
• Quasi-periodic attractors
• Chaotic or strange attractors
Non-linear dynamic systems theory studies the properties of the system in the phase plane.

Linear systems
dx/dt = f(x(t), u(t)),  x(t0) = x0,  y(t) = h(x(t))
• Linear systems: f and g are linear in x and u
• Linear Time Invariant (LTI) systems:
x'(t) = A x(t) + B u(t)
y(t) = C x(t) + D u(t)
Theorem: an equilibrium point of an LTI system is stable, asymptotically stable or unstable if and only if every equilibrium point of the system is stable, asymptotically stable or unstable, respectively.

Linear systems
• The dynamics is characterized by the eigenvalues of the matrix A

Linear systems: input/output representation
• A linear system can be represented in the frequency domain:
y(t) = ∫ g(t - τ) u(τ) dτ, with the integral taken from 0 to t
[Block diagram: u(t), U(s) → g(t), G(s) → y(t), Y(s)]
L[y(t)] = Y(s) = G(s) U(s)

Reachability
Definition: a state x̃ is reachable if there exists a finite time instant t̃ > 0 and an input ũ, defined from 0 to t̃, such that x(t̃) = x̃.
A system such that all its states are reachable is called completely reachable.

Observability
Definition: a state x̃ ≠ 0 is called unobservable if, for any finite t̃, the free output response y(t) = 0 for 0 ≤ t ≤ t̃.
A system without unobservable states is called completely observable.
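To tie the structural ideas together, here is a minimal Python sketch (not from the slides) that integrates the two-compartment drug model introduced above and checks the equilibrium formulas derived from it; the numerical values of k1, k2, b1, b2, u1 and u2 are arbitrary.

```python
# Minimal sketch (not from the slides): the two-compartment drug model
#   dx1/dt = -k1*x1 + b1*u1            (gastrointestinal compartment)
#   dx2/dt =  k1*x1 - k2*x2 + b2*u2    (hematic compartment)
# Parameter values below are arbitrary, for illustration only.
from scipy.integrate import solve_ivp

k1, k2 = 0.5, 0.3      # transfer coefficients (h^-1)
b1, b2 = 1.0, 1.0      # input gains
u1, u2 = 2.0, 0.0      # constant ingestion / injection inputs

def rhs(t, x):
    x1, x2 = x
    return [-k1 * x1 + b1 * u1,
             k1 * x1 - k2 * x2 + b2 * u2]

sol = solve_ivp(rhs, (0.0, 50.0), [0.0, 0.0])   # integrate from rest over 50 h

# Equilibrium obtained by setting both derivatives to zero
x1_eq = b1 * u1 / k1
x2_eq = (k1 * x1_eq + b2 * u2) / k2
print("simulated final state:", sol.y[:, -1])
print("analytic equilibrium :", (x1_eq, x2_eq))
```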
Decomposition
[Diagram: decomposition of a linear system into four subsystems (x̂a, x̂b, x̂c, x̂d): reachable and unobservable, reachable and observable, unreachable and unobservable, unreachable and observable; input u, output y via the output transformation]

Outline
• Dynamic systems basics
– Basic concepts
– Linear and non-linear dynamic systems
• Structural and black box models of dynamic systems
– Time series analysis
• Some AI approaches for the analysis of time series
– Knowledge-discovery through clustering of time series
– Knowledge-based Temporal Abstractions

Data models
• Input/output or black box
• Description of the system only by knowing measurable data
• Typically based on minimal assumptions on the system
• No information on the internal structure of the system

Modeling with black-box models
[Diagram: the four-subsystem decomposition again, used to illustrate that a black-box model captures only the input-output (reachable and observable) behaviour]

Data models
SYSTEM → DATA → modeling → INPUT-OUTPUT RELATIONSHIP → PARAMETER ESTIMATE → MODEL (guided by the modeling PURPOSE)

Data models
• Time series
• Impulse response
• Transfer functions (linear models)
• Convolution / deconvolution (linear models)

Data models (input-output): example
[Figure: input u → SYSTEM → output y; measured concentration ("concentrazione") versus time ("tempo")]
y(t, p) = A1 e^(λ1 t) + A2 e^(λ2 t)
Unknown parameters: p = [A1, A2, λ1, λ2]^T

System models
• White or grey box
• Description of the internal structure of the system, based on physical principles and on explicit hypotheses on causal relationships
• After comparison with experimental data, they are aimed at understanding the principles of the system

System models
SYSTEM → (a priori knowledge, assumptions) → modeling → STRUCTURE → (data, purpose) → PARAMETER ESTIMATE → MODEL

System models (structural): compartmental models
One-compartment model:
x1'(t) = -k01 x1(t) + u(t),  x1(0) = 0
y(t) = x1(t) / V1
Unknown parameters: p = [k01, V1]^T

Two-compartment model:
x1'(t) = -(k01 + k21) x1(t) + k12 x2(t) + u(t),  x1(0) = 0
x2'(t) = k21 x1(t) - k12 x2(t),  x2(0) = 0
y(t) = x1(t) / V1
Unknown parameters: p = [k01, k12, k21, V1]^T

Structural models
[Diagram: the four-subsystem decomposition again, annotated with guesses / prior knowledge]

Modeling time series
• Time series: data are correlated; data are realizations of stochastic processes
• Stochastic linear discrete input-output models
• Two approaches:
– model the data as a function of time (a regression over time)
– model the data as a function of its past values: ARMA models
• Often, an assumption of stationarity (the mean and variance of the process generating the data do not change over time)

Autoregressive (AR) models
y(k+1) = a1 y(k) + e(k)
• AR(h) is a regression model that regresses each point on the previous h time points; the example above is AR(1)
• Each value is affected by random noise e(k) with zero mean and variance σ²
• Can be learned with linear estimation algorithms
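As an illustration of the "linear estimation" step, here is a minimal Python sketch (not from the slides) that simulates an AR(1) process y(k+1) = a1 y(k) + e(k) and recovers a1 by least squares; the values of a1 and σ are arbitrary.

```python
# Minimal sketch (not from the slides): simulate an AR(1) process
#   y(k+1) = a1*y(k) + e(k),  e(k) ~ N(0, sigma^2)
# and estimate a1 by ordinary least squares. a1 and sigma are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
a1_true, sigma, n = 0.7, 1.0, 500

y = np.zeros(n)
for k in range(n - 1):
    y[k + 1] = a1_true * y[k] + rng.normal(0.0, sigma)

# Regress y(k+1) on y(k): a1_hat = <y(k), y(k+1)> / <y(k), y(k)>
a1_hat = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])
print(f"true a1 = {a1_true}, estimated a1 = {a1_hat:.3f}")
```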
Moving Average (MA) models
y(k) = b1 e(k-1) + e(k)
• A different kind of model is the Moving Average model (MA(h))
• It propagates over time the effect of the random fluctuations
• The autocorrelation function may help in choosing proper models
• An iterative estimation process is needed

ARMA models
y(k) = a1 y(k-1) + b1 e(k-1) + e(k)
• Can be used to obtain a more parsimonious model with "difficult" autocorrelation functions

Exogenous inputs
• The system can be driven not only by noise but also by eXogenous inputs:
y(k) = a1 y(k-1) + b1 e(k-1) + c1 u(k) + e(k)
• This is the general ARMAX model

Non-linear models
• Non-linear stochastic models have also been proposed in the literature
• Examples are NARX models, e.g. y(k) = f(y(k-1), u(k)) + e(k), with f a non-linear function
• NARX models can be easily learned from data with neural nets

Non-linear AR models
• Dynamic Bayesian Nets
[Diagram: a dynamic Bayesian network over two time slices, with variables Y1 and Y2 at time k-1 and at time k]

From black-box to structural stochastic models
[Diagram: two time slices with hidden states X1, X2 and observations Y1, Y2]
Examples:
– Kalman filters
– Dynamic BNs
– Hidden Markov Models

Observable and partially observable models
[Diagram: two-slice models (times k and k+1); in the fully observable case the states are observed directly, in the partially observable case the states X1, X2 are hidden and only Y1, Y2 are observed]

Delay coordinate embedding
• How to reconstruct a state-space representation from a one-dimensional time series y
• Sampled data
• Idea: add n state variables using the values of y with a delay of tau (see the sketch below)

Example
• Data generated by a linear system with two state variables

From 1 dimension:
Time: 0, 0.0100, 0.0200, 0.0300, 0.0400, 0.0500, 0.0600, 0.0700, 0.0800, 0.0900
Y1:   0, 0.0092, 0.0171, 0.0238, 0.0295, 0.0343, 0.0383, 0.0415, 0.0441, 0.0462

Embedding with delay = 0.05, to 2 dimensions:
Time: 0,      0.0100, 0.0200, 0.0300, 0.0400
X1:   0,      0.0092, 0.0171, 0.0238, 0.0295
X2:   0.0343, 0.0383, 0.0415, 0.0441, 0.0462

Plots
[Figure: reconstructed trajectories for tau = 0.265 and tau = 0.0442, compared with the true trajectory]

Challenges
• Finding the embedding parameters
– estimate the number of state variables
– estimate the delay
• Algorithms proposed in the literature
– autocorrelation
– Pineda-Somerer
– false nearest neighbour
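The embedding step itself is simple once the parameters are fixed. Below is a minimal Python sketch (not from the slides): it builds the delayed coordinates from a scalar series, assuming the embedding dimension and the delay have already been chosen (e.g. by autocorrelation or false-nearest-neighbour analysis).

```python
# Minimal sketch (not from the slides): delay-coordinate embedding of a
# scalar time series. dim and delay (in samples) are assumed to be given.
import numpy as np

def delay_embed(y, dim, delay):
    """Rows are [y(t), y(t+delay), ..., y(t+(dim-1)*delay)]."""
    y = np.asarray(y)
    n = len(y) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("series too short for this (dim, delay)")
    return np.column_stack([y[i * delay : i * delay + n] for i in range(dim)])

# Example: reconstruct a 2-D state from a sampled sine wave
t = np.arange(0.0, 10.0, 0.01)
states = delay_embed(np.sin(2 * np.pi * t), dim=2, delay=25)  # 25 samples = quarter period
print(states.shape)   # (975, 2)
```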
Outline
• Dynamic systems basics
– Basic concepts
– Linear and non-linear dynamic systems
• Structural and black box models of dynamic systems
– Time series analysis
• Some AI approaches for the analysis of time series
– Knowledge-discovery through clustering of time series
– Knowledge-based Temporal Abstractions

Clustering of time series
Several methodologies available:
• Similarity-based clustering
• Model-based clustering
• Template-based clustering
(Zhong, S., Ghosh, J., Journal of Machine Learning Research, 2003)

Similarity-based clustering
Key point: define a distance measure (similarity function) between time series.
Strategy: temporal profiles which satisfy the same similarity condition are grouped together.
Different classes of clustering algorithms: hierarchical methods, self-organizing maps, partitioning.
(Eisen et al., 1998; Tamayo et al., 1999)

Similarity-based clustering: how to choose a distance
Minkowski metric. Given the time series S = s1, …, sn and T = t1, …, tn:
D(S, T) = ( Σ_i |s_i - t_i|^p )^(1/p), with the sum over i = 1, …, n
p = 1: Manhattan
p = 2: Euclidean
p = ∞: Sup

Euclidean distance: limits
• Problem: offset translation → solution: S = S - mean(S), T = T - mean(T)
• Problem: amplitude scaling → solution: S = (S - mean(S)) / std(S), T = (T - mean(T)) / std(T)
• Problem: noise → solution: smoothing
[Figures: example pairs of series before and after each correction]

Other distances (1)
Correlation coefficient:
r(S, T) = Σ_i (s_i - mean(S)) (t_i - mean(T)) / sqrt( Σ_i (s_i - mean(S))² · Σ_i (t_i - mean(T))² )
– Useful for temporal models
– Looks for similarities in the shapes of the profiles
– Disadvantage: not robust to temporal dislocations

Other distances (2)
Dynamic Time Warping
[Figure: fixed time axis vs. warped time axis]
Idea: 'extend' each sequence by repeating some elements; it is then possible to calculate the Euclidean distance between the extended sequences.
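The warping idea can be written as a short dynamic program. Below is a minimal Python sketch of the classic DTW recursion (not from the slides); it returns only the alignment cost, not the warping path.

```python
# Minimal sketch (not from the slides): classic dynamic-programming DTW.
# Returns the cost of the cheapest warping path between sequences s and t.
import numpy as np

def dtw_distance(s, t):
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # extend the best of: diagonal match, repeat an element of t, repeat of s
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# Two profiles with the same shape but different timing align with zero cost
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 2, 1, 0]))  # 0.0
```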
Functional genomics: hierarchical clustering with correlation coefficients
Time series of 13 samples of 517 genes of human fibroblasts stimulated with serum. Dendrograms are related to the heatmaps of gene expression over time.
(Eisen et al., PNAS 1998; Iyer et al., Science, 1999)

Clustering of time series
• Similarity-based clustering
• Model-based clustering
• Template-based clustering
(Zhong, S., Ghosh, J., Journal of Machine Learning Research, 2003)

Model-based clustering (1)
Key point: assume that the data are sampled from a population composed of sub-populations characterized by different stochastic processes; clusters + processes = model.
Strategy: the temporal profiles generated by the same stochastic process are grouped in the same cluster. The clustering problem becomes a problem of model selection.
(Cheesman and Stutz, 1996; Fraley and Raftery, 2002; Yeung et al., 2001)

Model-based clustering (2)
Given:
• Y: the data
• M: a set of stochastic dynamic models and a cluster division
• Θ: the model parameters
A suitable approach is the Bayesian one: select the model which maximizes the posterior probability of the model M given the data Y, P(M|Y).
(Ramoni and Sebastiani, 1999; Baldi and Brunak, 1998; Kay, 1993)

The Bayesian solution (Ramoni et al., PNAS 2002)
Analysis of gene expression time series: the CAGED system (Cluster Analysis of Gene Expression Dynamics).
• Assumption: the time series are generated by an unknown number of autoregressive (AR) stochastic processes
• From Bayes' theorem: P(M|Y) is proportional to f(Y|M) (the marginal likelihood)
• The assumption, plus hypotheses on the distribution of the model parameters, allows f(Y|M) to be computed in closed form for each possible model
• Model selection: agglomerative process + heuristic strategy
• Cluster number: automatically selected by maximizing the marginal likelihood

Clustering of time series
• Similarity-based clustering
• Model-based clustering
• Template-based clustering
(Zhong, S., Ghosh, J., Journal of Machine Learning Research, 2003)

Template-based clustering (1)
Idea: group the time series on the basis of their similarity with a set of qualitative prototypes (templates).

Template-based clustering (2)
Data representation: from quantitative to qualitative.
Templates may capture the relevant characteristics of an expression profile while eliminating the spurious effects caused by noise. They may simplify the process of capturing the variety of behaviours which characterize the gene expression profiles.
Current limit: templates and clusters have to be identified a priori.

Template-based clustering: an example (Hvidsten et al., 2003)
Template-based clustering is used to predict gene function on the basis of the knowledge of known genes.
• Define all possible intervals on the time series of gene expression
• Templates: increasing, decreasing, steady
• Cluster together the genes that match a template on the same subinterval

Template-based clustering: an example
• For time series with 4 points, the possible time intervals are 3 + 2 + 1 = 6
• With the three templates (increasing, decreasing, steady), the possible clusters are 3 × 6 = 18

Template-based clustering: an example
[Figure: matching of expression profiles against the templates]

Template-based clustering: real gene expression data
Cluster example: 2h-12h decreasing.

Template-based clustering with temporal abstractions
A QUALITATIVE representation of expression profiles through Temporal Abstractions (Shahar, 1997).

Temporal primitives
• Time point
• Interval
Temporal entities
• Events (<time-point, value>)
• Episodes (<interval, pattern>)
Pattern: a specific data course (decreasing, normal, stationary, …)
Time series: a sequence of events.
[Figure: BGL (U/ml) over time (days 1-14)]

Data abstraction methods
• Qualitative abstraction: quantitative data are abstracted into qualitative values (a BGL of 110 U/ml is abstracted into a normal value)
• Temporal Abstraction (TA): time-stamped data are aggregated into intervals associated to specific patterns

Temporal Abstractions
• Methods used to generate an abstract description of temporal data, represented by a sequence of episodes.
[Figure: BGL time series over days 1-14]

Temporal Abstractions
• Basic abstractions: State, Trend, Stationary
• Complex abstractions

State Temporal Abstractions
Example: low-normal BGL values.
[Figure: BGL (U/ml) over time (days 1-14)]

Trend Temporal Abstractions
Example: BGL decreasing trend.
[Figure: BGL (U/ml) over time (days 1-14)]

Stationary Temporal Abstractions
Example: BGL stationary.
[Figure: BGL (U/ml) over time]

Complex Abstractions
[Figure: two series over time, with the episode "Series1 OVERLAPS Series2"]

Complex Abstractions: example
Somogyi effect: a response to hypoglycemia while asleep, with counter-regulatory hormones causing morning hyperglycemia.
As a complex abstraction: hyperglycemia at breakfast OVERLAPS absence of glycosuria.

Relationships between intervals: Allen algebra
Before, Meets, Overlaps, Finished-by, Contains, Starts, Equals, Started-by, During, Finishes, Overlapped-by, Is met by, After.
[Diagram: the relations between two intervals A and C]
(Allen, J.F.: Towards a general theory of action and time. Artificial Intelligence, 1984)

Clustering with dynamic template generation
• Idea: apply Temporal Abstractions
• Generate TAs for each temporal profile
• Cluster together "similar" TAs

TA generation
• Dominant points of the original time series are detected with a piecewise linear approximation (J.A. Horst, I. Beichl, 1997)
• Linear regression on each segment; trend TAs are extracted from the local slopes (a threshold is needed)
[Figure: an expression profile over time with segments labeled D (Decreasing) and I (Increasing)]
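To make the trend-abstraction step concrete, here is a minimal Python sketch (not from the slides, and a simplified stand-in for the dominant-point / piecewise-linear method cited above): consecutive slopes are labeled Increasing, Decreasing or Steady with a threshold, and runs of equal labels are merged into episodes. The threshold and the BGL-like values are made up.

```python
# Minimal sketch (not from the slides): a simplified trend temporal abstraction.
# Each step is labeled I/D/S by thresholding the local slope, and runs of equal
# labels are merged into episodes (intervals). The threshold is arbitrary.

def trend_episodes(values, times, threshold=5.0):
    labels = []
    for k in range(len(values) - 1):
        slope = (values[k + 1] - values[k]) / (times[k + 1] - times[k])
        if slope > threshold:
            labels.append("I")      # Increasing
        elif slope < -threshold:
            labels.append("D")      # Decreasing
        else:
            labels.append("S")      # Steady
    # merge consecutive equal labels into (start_time, end_time, label) episodes
    episodes, start = [], 0
    for k in range(1, len(labels) + 1):
        if k == len(labels) or labels[k] != labels[start]:
            episodes.append((times[start], times[k], labels[start]))
            start = k
    return episodes

bgl = [160, 210, 280, 270, 200, 140, 135, 130]   # illustrative BGL-like values (U/ml)
days = [1, 2, 3, 4, 5, 6, 7, 8]
print(trend_episodes(bgl, days))  # [(1, 3, 'I'), (3, 6, 'D'), (6, 8, 'S')]
```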
Labeling at different abstraction levels (1)
Each trend TA is encoded with a symbol: I (Increasing), S (Steady).
[Figure: an expression profile labeled with sequences of I and S symbols at different levels of detail, e.g. IISII]

Labeling at different abstraction levels (2)
Profiles are labeled at three abstraction levels: L1 uses single symbols (S [Steady], I [Increasing]), while L2 and L3 use progressively longer symbol sequences (e.g. IS, SI, ISI, SIS, ISIS, SISI, IIIS, IISS, ISSS, SIII, SSII, SSIS, SISS, SIIS, IISI, ISSI, ISII).

Building clusters
The time series to be clustered are labeled at levels L1, L2 and L3; clusters are built by comparing the labels level by level.
[Diagram: time series → labels L1, L2, L3 → comparison at each level]

Results: taxonomy
Saccharomyces cerevisiae gene expression: an L2 template [Increasing Decreasing] is refined into L3 sub-clusters.
(S. Chu et al., The Transcriptional Program of Sporulation in Budding Yeast, Science, 1998)

Results (1)-(3)
[Figures: GO Process annotations of the resulting clusters, visualized with Osprey (B.J. Breitkreutz et al., Osprey: a network visualization system, Genome Biology, 2003)]

Outline
• Dynamic systems basics
– Basic concepts
– Linear and non-linear dynamic systems
• Structural and black box models of dynamic systems
– Time series analysis
• AI approaches for the analysis of time series
– Knowledge-discovery through clustering of time series
– Knowledge-based Temporal Abstractions

Conclusions
• Time is a (the?) crucial aspect of our lives
• It is therefore crucial for intelligent data analysis
• Understanding the dynamics of processes through modeling
• IDA, as an interdisciplinary field, manages time by combining systems theory, probability theory, AI, …