Network Inference - University of Glasgow :: School of

Network Inference
Umer Zeeshan Ijaz
Overview
• Introduction
• Application Areas
  • cDNA Microarray
  • EEG/ECoG
• Network Inference
  • Pair-wise Similarity Measures
    • Cross-correlation (static)
    • Coherence (static)
    • Autoregressive: Granger Causality (static)
  • Probabilistic Graphical Models
    • Directed
      • Kalman-filtering based EM algorithm (static)
    • Undirected
      • Kernel-weighted logistic regression method (dynamic)
      • Graphical Lasso model (static)
Introduction
cDNA Microarray
ECoG/EEG
Cross-correlation based (1)
For a pair of time series x_i[t] and x_j[t], each of length n, the sample cross-correlation at lag \tau is

C_{ij}[\tau] = \frac{1}{\hat{\sigma}_i \hat{\sigma}_j (n - 2\tau)} \sum_{t=1}^{n-\tau} (x_i[t] - \bar{x}_i)(x_j[t+\tau] - \bar{x}_j)

The measure of coupling is the maximum cross-correlation:

s_{ij} = \max_\tau | C_{ij}[\tau] |

z_{ij} = \frac{s_{ij}}{\sqrt{\operatorname{var}(\hat{l}_{ij})}}, \qquad \operatorname{var}(l) = \frac{1}{n} \sum_{\tau=-n}^{n} C_{ii}[\tau] \, C_{jj}[\tau], \qquad -n \le l \le n

where z_{ij} is the scaled value and \hat{l}_{ij} is the lag corresponding to s_{ij}. Use a p-value test to compare z_{ij} with a standard normal distribution with mean zero and variance 1.
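The coupling measure above can be sketched as follows (the toy series, lag range, and the common 1/(n - |tau|) normalization in place of the slide's 1/(n - 2\tau) factor are all illustrative assumptions):

```python
import numpy as np

def max_cross_correlation(x, y, max_lag):
    """Return (s_ij, lag) maximizing |C_ij[tau]| over -max_lag..max_lag.

    Uses a 1/(n - |tau|) normalization of the sample cross-correlation.
    """
    n = len(x)
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    best_c, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(x[:n - lag], y[lag:]) / (n - lag)
        else:
            c = np.dot(x[-lag:], y[:n + lag]) / (n + lag)
        if abs(c) > abs(best_c):
            best_c, best_lag = c, lag
    return best_c, best_lag

# toy pair: y is x delayed by 3 samples, so the peak should sit at lag 3
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 3)
s_ij, l_ij = max_cross_correlation(x, y, max_lag=10)
```

Scanning all pairs (i, j) this way yields the s_ij matrix whose significance the next slides assess.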
Cross-correlation based (2)
Significance test: ANALYTIC METHOD
Apply the Fisher transformation; the resulting distribution is normal and has standard deviation 1/\sqrt{n-3}:

C^F_{ij}[\tau] = \frac{1}{2} \ln \frac{1 + C_{ij}[\tau]}{1 - C_{ij}[\tau]}

z^F_{ij} = \frac{s^F_{ij}}{\sqrt{\operatorname{var}(C^F_{ij})}}

Use this scaled value, which is expected to behave like the maximum of the absolute values of a sequence of random numbers. Using now-established results for statistics of this form, we obtain

P[z] = \exp\{-2 \exp[-a_n(z - b_n)]\}

where P[z] = \Pr\{z^F_{ij} \le z\}, \quad a_n = \sqrt{2 \ln n}, \quad b_n = a_n - (2 a_n)^{-1} (\ln \ln n + \ln 4\pi)

*M. A. Kramer, U. T. Eden, S. S. Cash, and E. D. Kolaczyk. Network inference with confidence from multivariate time series, Physical Review E 79(061916), 1-13, 2009
Cross-correlation based (3)
Significance test: FREQUENCY-DOMAIN BOOTSTRAP METHOD
1) Compute the (Hanning-tapered) power spectrum of each series and average these power spectra over all the time series to obtain p[\omega].
2) Compute the standardized, whitened residuals for each time series:

e_i[t] = \mathcal{F}^{-1}\left( \tilde{x}_i[\omega] / \sqrt{p[\omega]} \right), \quad \text{where } \tilde{x}_i[\omega] = \mathcal{F}(x_i[t])

3) For each bootstrap replicate, RESAMPLE e_i[t] WITH REPLACEMENT and compute the surrogate data:

x^*_i[t] = \mathcal{F}^{-1}\left( \tilde{e}_i[\omega] \sqrt{p[\omega]} \right), \quad \text{where } \tilde{e}_i[\omega] \text{ is the Fourier transform of the resampled residuals}

4) Compute N_s such instances and calculate the maximum cross-correlation \hat{s}_{ij} for each pair of nodes i and j.
5) Finally, compare s_{ij} against the bootstrap distribution and assign a p-value.
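The five steps can be sketched as follows (series length, taper, replicate count, and the choice of tested pair are illustrative; the whitening divides by the square root of the averaged spectrum and the surrogate multiplies it back in):

```python
import numpy as np

def max_abs_xcorr(a, b, max_lag=20):
    """Maximum absolute sample cross-correlation over a lag window."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    return max(
        (abs(np.dot(a[:n - k], b[k:]) / (n - k)) if k >= 0
         else abs(np.dot(a[-k:], b[:n + k]) / (n + k)))
        for k in range(-max_lag, max_lag + 1)
    )

def surrogate_pair(x, y, p, rng):
    """One frequency-domain bootstrap surrogate of (x, y) under the null."""
    out = []
    for s in (x, y):
        # step 2: whitened residuals
        e = np.fft.irfft(np.fft.rfft(s) / np.sqrt(p), n=len(s))
        # step 3: resample with replacement, re-colour with the spectrum
        e_star = rng.choice(e, size=len(e), replace=True)
        out.append(np.fft.irfft(np.fft.rfft(e_star) * np.sqrt(p), n=len(s)))
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 256))                     # 4 toy series
taper = np.hanning(256)
# step 1: Hanning-tapered power spectra, averaged over all series
p = (np.abs(np.fft.rfft(X * taper, axis=1)) ** 2).mean(axis=0)
p = np.maximum(p, 1e-12)                          # guard against zeros
# steps 4-5: bootstrap null for one pair, then a p-value
null = [max_abs_xcorr(*surrogate_pair(X[0], X[1], p, rng)) for _ in range(50)]
s_obs = max_abs_xcorr(X[0], X[1])
p_value = (np.sum(np.array(null) >= s_obs) + 1) / (len(null) + 1)
```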
Cross-correlation based (4)
False Discovery Rate Test
1) Order the m = N(N-1)/2 p-values: p_{(1)} \le p_{(2)} \le ... \le p_{(m)}
2) Choose the FDR level q.
3) Compare each p_{(i)} to the critical value q \cdot i / m and find the largest k such that p_{(k)} \le q \cdot k / m.
4) Reject the null hypothesis that time series x_i[t] and x_j[t] are uncoupled for the pairs corresponding to p_{(1)}, p_{(2)}, ..., p_{(k)}.
*M. A. Kramer, U. T. Eden, S. S. Cash, and E. D. Kolaczyk. Network inference with confidence from multivariate time series, Physical Review E 79(061916), 1-13, 2009
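A minimal sketch of the procedure (the p-values and FDR level q are made up for illustration):

```python
import numpy as np

def fdr_reject(pvals, q):
    """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    sorted_p = p[order]
    # compare each ordered p-value to its critical value q*i/m
    below = sorted_p <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest i with p_(i) <= q*i/m
        reject[order[:k + 1]] = True       # reject everything up to p_(k)
    return reject

mask = fdr_reject([0.001, 0.008, 0.039, 0.041, 0.09, 0.2, 0.5], q=0.05)
```

Note that every hypothesis up to rank k is rejected, even ones whose individual p-value exceeds its own critical value.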
Coherence based (1)
Coherence: signals are fully correlated with constant phase shifts, although they may differ in amplitude:

C_{xy}(f) = | S_{xy}(f) |^2 / ( S_{xx}(f) \, S_{yy}(f) )

Cross-phase spectrum: provides information on the time relationship between two signals as a function of frequency; phase displacement may be converted into time displacement:

\Phi(f) = \arctan( \operatorname{Im} S_{xy}(f) / \operatorname{Re} S_{xy}(f) )
Coherence based (2)
*S. Weiss and H. M. Mueller. The contribution of EEG coherence to the investigation of language, Brain and Language 85(2), 325-343, 2003
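Both quantities can be sketched with SciPy's Welch-based estimators (the 10 Hz common drive, sampling rate, and segment length are illustrative choices, not from the slides):

```python
import numpy as np
from scipy import signal

# two channels driven by a common 10 Hz rhythm plus independent noise
fs = 200.0
t = np.arange(0, 8, 1 / fs)
rng = np.random.default_rng(0)
drive = np.sin(2 * np.pi * 10 * t)
x = drive + 0.5 * rng.normal(size=t.size)
y = drive + 0.5 * rng.normal(size=t.size)

# magnitude-squared coherence C_xy(f)
f, Cxy = signal.coherence(x, y, fs=fs, nperseg=256)

# cross-phase spectrum Phi(f) from the cross-spectral density S_xy(f)
f2, Sxy = signal.csd(x, y, fs=fs, nperseg=256)
phase = np.angle(Sxy)
```

Coherence should peak near 10 Hz, where the two channels share the common drive, and fall off at frequencies dominated by the independent noise.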
Granger Causality
Directed Transfer Function (DTF): directional influences between any given pair of channels in a multivariate data set.
Bivariate autoregressive process:

X_1(t) = \sum_{j=1}^{P} A_{11}(j) X_1(t-j) + \sum_{j=1}^{P} A_{12}(j) X_2(t-j) + E_1(t)
X_2(t) = \sum_{j=1}^{P} A_{21}(j) X_1(t-j) + \sum_{j=1}^{P} A_{22}(j) X_2(t-j) + E_2(t)

If the variance of the prediction error of one series is reduced by including the other series, then, in the sense of Granger causality, the first series depends on the second. Now taking the Fourier transform,

\begin{pmatrix} A_{11}(f) & A_{12}(f) \\ A_{21}(f) & A_{22}(f) \end{pmatrix} \begin{pmatrix} X_1(f) \\ X_2(f) \end{pmatrix} = \begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix}

where

A_{lm}(f) = \delta_{lm} - \sum_{j=1}^{P} A_{lm}(j) e^{-i 2 \pi f j}, \qquad \delta_{lm} = \begin{cases} 1 & l = m \\ 0 & l \ne m \end{cases}

Inverting,

\begin{pmatrix} X_1(f) \\ X_2(f) \end{pmatrix} = \begin{pmatrix} A_{11}(f) & A_{12}(f) \\ A_{21}(f) & A_{22}(f) \end{pmatrix}^{-1} \begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix} = \begin{pmatrix} H_{11}(f) & H_{12}(f) \\ H_{21}(f) & H_{22}(f) \end{pmatrix} \begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix}

Granger causality from channel j to i:

I^2_{j \to i}(f) = | H_{ij}(f) |^2 = \frac{| A_{ij}(f) |^2}{| \det A(f) |^2}
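A small numerical sketch of the pipeline (the VAR(2) coefficients, noise level, and evaluation frequency are invented; the fit is plain least squares rather than any particular toolbox):

```python
import numpy as np

rng = np.random.default_rng(1)
T, P = 2000, 2
# ground truth: channel 1 drives channel 2, with no feedback (A_12 = 0)
A_true = [np.array([[0.5, 0.0], [0.4, 0.5]]),
          np.array([[-0.2, 0.0], [0.3, -0.2]])]
X = np.zeros((T, 2))
for t in range(P, T):
    X[t] = sum(A_true[j] @ X[t - 1 - j] for j in range(P)) \
        + rng.normal(scale=0.1, size=2)

# least-squares fit of the VAR(P) coefficient matrices
Y = X[P:]
Z = np.hstack([X[P - 1 - j:T - 1 - j] for j in range(P)])  # lagged regressors
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)               # shape (2P, 2)
A_hat = [coef[2 * j:2 * j + 2].T for j in range(P)]

def H_at(f):
    """Transfer matrix H(f) = A(f)^{-1} with A_lm(f) = delta_lm - sum_j A_lm(j) e^{-i2 pi f j}."""
    Af = np.eye(2, dtype=complex)
    for j, Aj in enumerate(A_hat, start=1):
        Af -= Aj * np.exp(-2j * np.pi * f * j)
    return np.linalg.inv(Af)

H = H_at(0.1)
influence_1_to_2 = abs(H[1, 0]) ** 2   # should be large
influence_2_to_1 = abs(H[0, 1]) ** 2   # should be near zero
```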
Kalman Filter
State Space Model (State Variable Model; State Evolution Model)

State equation:       x_{k+1} = A x_k + w_k,  \quad x_k \in \mathbb{R}^{N \times 1}, \quad E[w_k w_k^T] = Q_k \in \mathbb{R}^{N \times N}
Measurement equation: g_k = C x_k + v_k,  \quad g_k \in \mathbb{R}^{M \times 1}, \quad E[v_k v_k^T] = R_k \in \mathbb{R}^{M \times M}

Error covariance: P_{k|k} = E[(x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})^T]

Measurement update (filtering):
K_k = P_{k|k-1} C^T [ C P_{k|k-1} C^T + R_k ]^{-1}
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k [ g_k - C \hat{x}_{k|k-1} ]
P_{k|k} = (I_N - K_k C) P_{k|k-1}

Time update (prediction):
\hat{x}_{k+1|k} = A \hat{x}_{k|k}
P_{k+1|k} = A P_{k|k} A^T + Q_k
Probabilistic graphical models (1)
Joint distribution over a set X = {X_1, ..., X_n}.
Bayesian networks associate with each variable X_i a conditional probability P(X_i | U_i), U_i \subseteq X. The resulting product is of the form

P(X_1, ..., X_n) = \prod_i P(X_i | U_i)

Example (edges A \to C, B \to C, A \to D, C \to E), with conditional probability table P(C | A, B):

A  B | C=0   C=1
0  0 | 0.9   0.1
0  1 | 0.2   0.8
1  0 | 0.9   0.1
1  1 | 0.01  0.99

P(A, B, C, D, E) = P(A) \, P(B) \, P(C|A,B) \, P(D|A) \, P(E|C)
EM Algorithm: Predicting gene regulatory network
Basic state-space model:
x_{t+1} = A x_t + w_t
g_t = C x_t + v_t

Extended model with inputs:
x_{t+1} = A x_t + B g_t + w_t
g_t = C x_t + D g_{t-1} + v_t

Define a new state vector \tilde{x}_t = [x_t^T, g_t^T]^T. Then the new state-space form is

\tilde{x}_{t+1} = A_0 \tilde{x}_t + \tilde{w}_t
g_t = H \tilde{x}_t

where

A_0 = \begin{pmatrix} A & B \\ CA & CB + D \end{pmatrix}, \quad H = (0 \;\; I), \quad \tilde{w}_t = \begin{pmatrix} w_t \\ C w_t + v_t \end{pmatrix}, \quad \tilde{Q} = \begin{pmatrix} Q & Q C^T \\ C Q & C Q C^T + R \end{pmatrix}

Constructing the network:
H_0: (CB + D)_{i,j} = 0 (no connection)
H_1: (CB + D)_{i,j} \ne 0 (connection)
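A quick numerical check of the augmented form (dimensions, matrices, and the noise-free setting are illustrative assumptions): with w_t = v_t = 0, propagating the augmented state with A_0 and reading it out through H reproduces exactly the g_t of the extended model.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, T = 3, 2, 30
A = rng.normal(scale=0.2, size=(N, N))
B = rng.normal(scale=0.2, size=(N, M))
C = rng.normal(scale=0.2, size=(M, N))
D = rng.normal(scale=0.2, size=(M, M))

# noise-free run of the extended model
x = rng.normal(size=N)
g_prev = rng.normal(size=M)
xs, gs = [], []
for _ in range(T):
    g = C @ x + D @ g_prev            # g_t = C x_t + D g_{t-1}
    xs.append(x); gs.append(g)
    x, g_prev = A @ x + B @ g, g      # x_{t+1} = A x_t + B g_t

# augmented system with state [x_t; g_t]
A0 = np.block([[A, B], [C @ A, C @ B + D]])
H = np.hstack([np.zeros((M, N)), np.eye(M)])

xt = np.concatenate([xs[0], gs[0]])
gs_aug = []
for _ in range(T):
    gs_aug.append(H @ xt)             # read out g_t
    xt = A0 @ xt                      # propagate x~_{t+1} = A0 x~_t
```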
EM Algorithm: Predicting gene regulatory network (2)
x_{t+1} = A x_t + B g_t + w_t
g_t = C x_t + D g_{t-1} + v_t

B and D are input matrices to the state and observation equations, respectively.

Conditional distributions of state and observables:
P(x_{t+1} | x_t, g_t) \sim \mathcal{N}(A x_t + B g_t, Q)
P(g_t | x_t, g_{t-1}) \sim \mathcal{N}(C x_t + D g_{t-1}, R)

P(x_{t+1} | x_t, g_t) = \frac{ \exp\left\{ -\frac{1}{2} (x_{t+1} - A x_t - B g_t)^T Q^{-1} (x_{t+1} - A x_t - B g_t) \right\} }{ (2\pi)^{K/2} |Q|^{1/2} }

P(g_t | x_t, g_{t-1}) = \frac{ \exp\left\{ -\frac{1}{2} (g_t - C x_t - D g_{t-1})^T R^{-1} (g_t - C x_t - D g_{t-1}) \right\} }{ (2\pi)^{p/2} |R|^{1/2} }

Factorization rule for the Bayesian network:

P(\{x_t\}, \{g_t\} | \theta) = P(x_1) \prod_{t=1}^{T-1} P(x_{t+1} | x_t, g_t) \prod_{t=1}^{T} P(g_t | x_t, g_{t-1})

Unknowns in the system: \theta = \{A, B, C, D, Q, R, Q_1, \mu_1\}
EM Algorithm: Predicting gene regulatory network (4)
Construct the likelihood:
L(\theta; x, g) = \log P(x, g | \theta)

Maximize the likelihood:
\hat{\theta} = \operatorname{argmax}_\theta L(\theta; x, g)

Marginalizing with respect to x and introducing a distribution Q gives a lower bound:
L(\theta; x, g) \ge F(Q(x), \theta)

which is maximized iteratively:
E step: Q^{k+1} = \operatorname{argmax}_Q F(Q, \theta^k), which gives Q^{k+1}(x) = P(x | g, \theta^k)
M step: \theta^{k+1} = \operatorname{argmax}_\theta F(Q^{k+1}, \theta)
Kalman filter based: Inferring network from microarray expression data (5)
\theta = \{A, B, C, D, Q, R, Q_1, \mu_1\}. Let's say we want to compute C. Setting

\frac{\partial}{\partial C} \left[ -2 \log P(\{x_t\}, \{g_t\} | \theta) \right] = \frac{\partial}{\partial C} \left[ \sum_{t=1}^{T} (g_t - C x_t - D g_{t-1})^T R^{-1} (g_t - C x_t - D g_{t-1}) \right] = 0

gives

-\sum_{t=1}^{T} g_t x_t^T + C \sum_{t=1}^{T} x_t x_t^T + D \sum_{t=1}^{T} g_{t-1} x_t^T = 0

Taking expectations,

C = \left( \sum_{t=1}^{T} g_t \hat{x}_t^T - D \sum_{t=1}^{T} g_{t-1} \hat{x}_t^T \right) \left( \sum_{t=1}^{T} P_t \right)^{-1}

We can repeat the same procedure for D and obtain the other parameters, but doing so requires

\hat{x}_t = E[x_t | \{g\}], \qquad P_t = E[x_t x_t^T | \{g\}], \qquad P_{t-1,t} = E[x_t x_{t-1}^T | \{g\}]

which we obtain from the KALMAN SMOOTHER.
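A sketch of the update for C on noise-free data, where the smoothed moments reduce to exact products and the formula recovers C exactly (all dimensions and matrices are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, T = 3, 2, 100
C_true = rng.normal(size=(M, N))
D = rng.normal(size=(M, M))
x = rng.normal(size=(T, N))           # stand-in for smoothed states x^_t
g_prev = rng.normal(size=(T, M))      # stand-in for g_{t-1}
g = x @ C_true.T + g_prev @ D.T       # noise-free g_t = C x_t + D g_{t-1}

S_gx = g.T @ x                        # sum_t g_t x^_t^T
S_gpx = g_prev.T @ x                  # sum_t g_{t-1} x^_t^T
S_xx = x.T @ x                        # sum_t P_t (exact x x^T here)
C_hat = (S_gx - D @ S_gpx) @ np.linalg.inv(S_xx)
```

With noisy data, the x products are replaced by the Kalman-smoother expectations, but the algebra is identical.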
Kalman filter based: Inferring network from microarray expression data (9)
Experimental Results: a standard T-cell activation model
*C. Rangel, J. Angus, Z. Ghahramani, M. Lioumi, E. Sotheran, A. Gaiba, D. L. Wild, and F. Falciani. Modeling T-cell activation using gene expression profiling and state-space models, Bioinformatics 20(9), 1361-1372, 2004
Probabilistic graphical models (2)
Markov networks represent the joint distribution as a product of potentials:

P(X_1, ..., X_n) = \frac{1}{Z} \prod_j \pi_j[c_j]

Example (cliques {A,B}, {B,C,E}, {C,D}, {A,D}), with potential \pi_1(A, B):

A  B | \pi_1(A,B)
0  0 | 1.0
0  1 | 0.5
1  0 | 0.5
1  1 | 2.0

P(A, B, C, D, E) = \frac{1}{Z} \, \pi_1(A, B) \, \pi_2(B, C, E) \, \pi_3(C, D) \, \pi_4(A, D)
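A brute-force sketch of the normalization constant Z (\pi_1 follows the slide's table; the other potentials are invented for illustration):

```python
from itertools import product

pi1 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}  # from the slide
# the remaining potentials are illustrative assumptions
pi2 = lambda b, c, e: 1.5 if b == c == e else 1.0
pi3 = lambda c, d: 2.0 if c == d else 0.5
pi4 = lambda a, d: 1.0 + a * d

def unnorm(a, b, c, d, e):
    """Unnormalized product of clique potentials."""
    return pi1[(a, b)] * pi2(b, c, e) * pi3(c, d) * pi4(a, d)

# partition function: sum over all 2^5 joint states
Z = sum(unnorm(*s) for s in product((0, 1), repeat=5))

def P(a, b, c, d, e):
    return unnorm(a, b, c, d, e) / Z

total = sum(P(*s) for s in product((0, 1), repeat=5))
```

The exponential cost of this sum over all joint states is exactly why Z is the hard part of working with Markov networks.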
Kernel-weighted logistic regression method (1)
Pair-wise Markov random field:

P^{(t)}(X) = \frac{1}{Z(\theta^{(t)})} \exp \left( \sum_{(u,v) \in E^{(t)}} \theta^{(t)}_{uv} X_u X_v \right)

[Figure: nodes x_1, ..., x_8 connected by edges with pairwise parameters \theta]

Logistic function:

P^{(t)}(X^{(t)}_u | X^{(t)}_{\setminus u}) = \frac{ \exp\left( 2 X^{(t)}_u \langle \theta^{(t)}_{\setminus u}, X^{(t)}_{\setminus u} \rangle \right) }{ \exp\left( 2 X^{(t)}_u \langle \theta^{(t)}_{\setminus u}, X^{(t)}_{\setminus u} \rangle \right) + 1 }

Log likelihood:

\ell(\theta; X) = \log P^{(t)}(X^{(t)}_u | X^{(t)}_{\setminus u})

Optimization problem:

\hat{\theta}^{(t)}_{\setminus u} = \operatorname{argmin}_{\theta^{(t)}_{\setminus u} \in \mathbb{R}^{p-1}} \left\{ -\sum_{i=1}^{n} w^{(t)}(t_i) \, \ell(\theta^{(t)}_{\setminus u}; X^{(t_i)}) + \lambda \| \theta^{(t)}_{\setminus u} \|_1 \right\}

where \langle a, b \rangle = a^T b, \quad \setminus u := V \setminus u, \quad \theta^{(t)}_{\setminus u} = \{ \theta^{(t)}_{uv} \,|\, v \in \setminus u \} is (p-1)-dimensional.
Kernel-weighted logistic regression method (2)

\hat{\theta}^{(t)}_{\setminus u} = \operatorname{argmin}_{\theta^{(t)}_{\setminus u} \in \mathbb{R}^{p-1}} \left\{ -\sum_{i=1}^{n} w^{(t)}(t_i) \, \ell(\theta^{(t)}_{\setminus u}; X^{(t_i)}) + \lambda \| \theta^{(t)}_{\setminus u} \|_1 \right\}
Kernel-weighted logistic regression method (3)
Interaction between gene ontological groups related to developmental process undergoing dynamic rewiring. The weight of an edge between two ontological groups is the total number of connections between genes in the two groups. In the visualization, the width of an edge is proportional to the edge weight. The edge weights are thresholded at 30, so that only interactions exceeding this number are displayed. The average network on the left is produced by averaging the networks on the right; in this case, the threshold is set to 20.
*L. Song, M. Kolar, and E. P. Xing. KELLER: estimating time-varying interactions between genes. Bioinformatics 25, i128-i136, 2009
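The node-wise objective can be sketched with a generic proximal-gradient (ISTA) solver; this is not the authors' KELLER implementation, and the data, the uniform weights standing in for the kernel weights w^{(t)}(t_i), \lambda, and the step size are all assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_l1_logistic(X, y, w, lam, eta=0.2, iters=3000):
    """Minimize -sum_i w_i log sigmoid(2 y_i <theta, x_i>) + lam ||theta||_1
    by proximal gradient. X: (n, p-1) in {-1,+1}; y: (n,) in {-1,+1};
    w: weights summing to 1."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        z = 2.0 * y * (X @ theta)
        grad = -(2.0 * w * y * sigmoid(-z)) @ X   # gradient of the smooth part
        theta = soft_threshold(theta - eta * grad, eta * lam)  # L1 prox step
    return theta

# synthetic node-wise problem: two true neighbours, two non-neighbours
rng = np.random.default_rng(0)
n = 1000
theta_true = np.array([1.0, -1.0, 0.0, 0.0])
X = rng.choice([-1.0, 1.0], size=(n, 4))
y = np.where(rng.random(n) < sigmoid(2.0 * (X @ theta_true)), 1.0, -1.0)
w = np.full(n, 1.0 / n)   # uniform weights stand in for the kernel weights
theta_hat = weighted_l1_logistic(X, y, w, lam=0.01)
```

Replacing the uniform w with a kernel centred at the target time t is what makes the estimate time-varying.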
Graphical Lasso Model (1)
Let Z = (Z_1, ..., Z_p)^T follow a p-variate Gaussian distribution with mean \mu and covariance matrix \Sigma.
Key idea: if the ij-th component of \Sigma^{-1} is zero, then i and j are conditionally independent.

\Theta is an estimator of \Sigma^{-1}: \quad \hat{\Theta} = \operatorname{argmax}_{\Theta \succ 0} \left\{ \log \det \Theta - \operatorname{tr}(S \Theta) - \lambda \| \Theta \|_1 \right\}

W is an estimator of \Sigma. Each column is obtained by solving the lasso problem

\min_\beta \left\{ \frac{1}{2} \| W_{11}^{1/2} \beta - b \|^2 + \lambda \| \beta \|_1 \right\}

where b = W_{11}^{-1/2} s_{12}; after solving for \beta, we can obtain w_{12} = W_{11} \beta.

Empirical covariance matrix:

S = \frac{1}{n} \sum_{k=1}^{n} (Z^{(k)} - \bar{\mu})(Z^{(k)} - \bar{\mu})^T

Partitioning:

W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^T & s_{22} \end{pmatrix}

Initialise: W = S + \lambda I

*O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, Journal of Machine Learning Research 9, 485-516, 2008
Graphical Lasso Model (2)
Solve the lasso problem for w_{12} one column at a time: for each j, permute the rows and columns of W and S so that column j occupies the last position, partition

W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^T & s_{22} \end{pmatrix}

solve for \beta, update w_{12} = W_{11} \beta, and cycle through the columns until convergence. (The slide illustrates this on a 5x5 example, swapping each column in turn into the last position of both W and S.) Convergence is monitored through the duality gap:

\operatorname{tr}(W^{-1} S) - p + \lambda \| W^{-1} \|_1 \le \epsilon

When converged, we can recover \hat{\Theta} = \hat{\Sigma}^{-1} from W knowing that

\begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix} \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^T & \theta_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0^T & 1 \end{pmatrix}

In this case, we can use the prestored quantities \hat{\beta} and W as follows:

\hat{\theta}_{22} = 1 / ( w_{22} - w_{12}^T \hat{\beta} ), \qquad \hat{\theta}_{12} = -\hat{\beta} \, \hat{\theta}_{22}

*O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, Journal of Machine Learning Research 9, 485-516, 2008
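The recovery formulas can be checked in the exact (\lambda = 0) case, where \beta = W_{11}^{-1} w_{12}, against direct inversion (the matrix W below is randomly generated for illustration):

```python
import numpy as np

# a well-conditioned symmetric positive-definite "covariance estimate" W
rng = np.random.default_rng(4)
p = 5
M = rng.normal(size=(p, p))
W = M @ M.T + p * np.eye(p)

# partition out the last row/column
W11, w12, w22 = W[:-1, :-1], W[:-1, -1], W[-1, -1]
beta = np.linalg.solve(W11, w12)       # exact beta (the lambda = 0 case)

# recovery formulas from the slide
theta22 = 1.0 / (w22 - w12 @ beta)
theta12 = -beta * theta22

Theta_direct = np.linalg.inv(W)        # ground truth by direct inversion
```

With \lambda > 0, \beta comes from the lasso subproblem instead, but the same block-inverse identities give the corresponding column of \hat{\Theta}.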
Graphical Lasso Model (3)
*Software under development @ Oxford Complex Systems Group with Nick Jones
*Results shown for Google Trends dataset
THE END