Network Inference
Umer Zeeshan Ijaz
Overview
•Introduction
•Application Areas
• cDNA Microarray
• EEG/ECoG
•Network Inference
• Pair-wise Similarity Measures
• Cross-correlation
• Coherence
• Autoregressive
• Granger Causality
• Probabilistic Graphical Models
• Directed
• Kalman-filtering based EM algorithm
• Undirected
• Kernel-weighted logistic regression method
• Graphical Lasso-model
[Slide annotation: each inference approach in the overview is tagged STATIC or DYNAMIC (five static, one dynamic).]
Introduction
cDNA Microarray
ECoG/EEG
Cross-correlation based (1)

For a pair of time series x_i[t] and x_j[t], each of length n, the sample cross-correlation at lag \tau is

C_{ij}[\tau] = \frac{1}{\hat{\sigma}_i \hat{\sigma}_j (n - 2|\tau|)} \sum_{t=1}^{n-|\tau|} (x_i[t] - \bar{x}_i)(x_j[t+\tau] - \bar{x}_j)

[Figure: example traces x_i[t] and x_j[t] and the resulting cross-correlation C_{ij}[\tau].]

The measure of coupling is the maximum cross-correlation

s_{ij} = \max_{\tau} | C_{ij}[\tau] |

with the variance of the estimate approximated by

\mathrm{var}(\hat{l}_{ij}) \approx \frac{1}{n} \sum_{l=-n}^{n} C_{ii}[l]\, C_{jj}[l]

The scaled value is

z_{ij} = \frac{s_{ij}}{\sqrt{\mathrm{var}(\hat{l}_{ij})}}

where \hat{l}_{ij} is the lag corresponding to s_{ij}. A p-value test compares z_{ij} against a standard normal distribution with mean zero and variance one.
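A minimal Python/NumPy sketch of this coupling measure (the function name and the handling of the normalisation n - 2|\tau| are my own choices, following the formula above):

import numpy as np

def max_cross_correlation(xi, xj, max_lag):
    """Return (s_ij, lag): the maximum absolute normalised cross-correlation and its lag."""
    n = len(xi)
    xi = xi - xi.mean()
    xj = xj - xj.mean()
    si, sj = xi.std(ddof=1), xj.std(ddof=1)
    best_s, best_lag = 0.0, 0
    for tau in range(-max_lag, max_lag + 1):          # assumes max_lag < n/2
        if tau >= 0:
            a, b = xi[:n - tau], xj[tau:]
        else:
            a, b = xi[-tau:], xj[:n + tau]
        c = np.sum(a * b) / (si * sj * (n - 2 * abs(tau)))
        if abs(c) > best_s:
            best_s, best_lag = abs(c), tau
    return best_s, best_lag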
Cross-correlation based (2)

Significance test: ANALYTIC METHOD

Apply the Fisher transformation; the resulting distribution is approximately normal with standard deviation 1/\sqrt{n-3}:

C_{ij}^{F}[\tau] = \frac{1}{2} \ln \frac{1 + C_{ij}[\tau]}{1 - C_{ij}[\tau]}

The scaled value

z_{ij}^{F} = \frac{s_{ij}^{F}}{\sqrt{\mathrm{var}(C_{ij}^{F})}}

is expected to behave like the maximum of the absolute values of a sequence of Gaussian random numbers. Using established results for statistics of this form (extreme-value theory), we obtain

P[z] = \exp\{ -2 \exp[ -a_n (z - b_n) ] \}

where P[z] = \Pr\{ z_{ij}^{F} \le z \},

a_n = \sqrt{2 \ln n}, \qquad b_n = a_n - (2 a_n)^{-1} (\ln \ln n + \ln 4\pi)
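A short Python sketch of this analytic test (the function name is mine; it Fisher-transforms the maximum correlation s_ij, scales it by \sqrt{n-3}, and evaluates the extreme-value tail formula above):

import numpy as np

def analytic_p_value(s_ij, n):
    """Approximate p-value for the maximum absolute cross-correlation of two length-n series."""
    z = 0.5 * np.log((1 + s_ij) / (1 - s_ij)) * np.sqrt(n - 3)   # Fisher transform / its std dev
    a_n = np.sqrt(2 * np.log(n))
    b_n = a_n - (np.log(np.log(n)) + np.log(4 * np.pi)) / (2 * a_n)
    cdf = np.exp(-2 * np.exp(-a_n * (z - b_n)))                  # P[z_F <= z]
    return 1 - cdf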
*M. A. Kramer, U. T. Eden, S. S. Cash, and E. D. Kolaczyk. Network inference with confidence from multivariate time series, Physical Review E 79, 061916, 2009.
Cross-correlation based (3)

Significance test: FREQUENCY DOMAIN BOOTSTRAP METHOD

1) Compute the (Hanning-tapered) power spectrum of each series and average these power spectra over all the time series to obtain \bar{p}[\omega].

2) Compute the standardized and whitened residuals for each time series:

e_i[t] = \mathcal{F}^{-1}\!\left( \tilde{x}_i[\omega] / \sqrt{\bar{p}[\omega]} \right)

where \tilde{x}_i[\omega] = \mathcal{F}(x_i[t]) is the Fourier transform of x_i[t].

3) For each bootstrap replicate, RESAMPLE e_i[t] WITH REPLACEMENT and compute the surrogate data

x_i^{*}[t] = \mathcal{F}^{-1}\!\left( \tilde{e}_i[\omega] \sqrt{\bar{p}[\omega]} \right)

where \tilde{e}_i[\omega] is the Fourier transform of the residuals e_i[t] resampled with replacement.

4) Compute N_s such instances and calculate the maximum cross-correlation \hat{s}_{ij} for each pair of nodes i and j.

5) Finally, compare s_{ij} with the bootstrap distribution and assign a p-value.
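A condensed Python sketch of the frequency-domain bootstrap (helper and variable names are mine; it reuses the max_cross_correlation function from the earlier sketch):

import numpy as np

def fd_bootstrap_pvalue(x, i, j, max_lag, n_boot=1000, rng=np.random.default_rng(0)):
    """x: array of shape (num_series, n). Bootstrap p-value for the coupling of nodes i and j."""
    num, n = x.shape
    taper = np.hanning(n)
    p_bar = (np.abs(np.fft.rfft(x * taper, axis=1)) ** 2).mean(axis=0)           # averaged power spectrum
    resid = np.fft.irfft(np.fft.rfft(x, axis=1) / np.sqrt(p_bar), n=n, axis=1)   # whitened residuals
    s_obs, _ = max_cross_correlation(x[i], x[j], max_lag)
    s_boot = np.empty(n_boot)
    for b in range(n_boot):
        surrogate = []
        for k in (i, j):                                   # resample residuals with replacement
            e_star = rng.choice(resid[k], size=n, replace=True)
            surrogate.append(np.fft.irfft(np.fft.rfft(e_star) * np.sqrt(p_bar), n=n))
        s_boot[b], _ = max_cross_correlation(surrogate[0], surrogate[1], max_lag)
    return (1 + np.sum(s_boot >= s_obs)) / (n_boot + 1)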
Cross-correlation based (4)

False Discovery Rate (FDR) Test

1) Order the m = N(N-1)/2 p-values:

p_1 \le p_2 \le \dots \le p_m

2) Choose an FDR level q.

3) Compare each p_i to the critical value q \cdot i / m and find the largest index k such that

p_k \le q \cdot k / m

4) Reject the null hypothesis that time series x_i[t] and x_j[t] are uncoupled for the pairs corresponding to

p_1 \le p_2 \le \dots \le p_k
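A minimal Python sketch of this Benjamini-Hochberg style selection step (the function name is mine):

import numpy as np

def fdr_select(p_values, q=0.05):
    """Return indices of the hypotheses rejected at FDR level q."""
    p = np.asarray(p_values)
    order = np.argsort(p)
    m = len(p)
    below = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    if below.size == 0:
        return np.array([], dtype=int)
    k = below[-1]                    # largest index with p_(k) <= q*k/m
    return order[:k + 1]             # reject the k smallest p-values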
*M. A. Kramer, U. T. Eden, S. S. Cash, and E. D. Kolaczyk. Network inference with confidence from multivariate time series, Physical Review E 79, 061916, 2009.
Coherence based

Coherence: the signals are fully correlated with a constant phase shift, although they may differ in amplitude.

C_{xy}(f) = \frac{| S_{xy}(f) |^2}{S_{xx}(f)\, S_{yy}(f)}

Cross-phase spectrum: provides information on the time relationship between the two signals as a function of frequency; the phase displacement can be converted into a time displacement.

\Phi_{xy}(f) = \arctan\!\left( \frac{\mathrm{Im}\, S_{xy}(f)}{\mathrm{Re}\, S_{xy}(f)} \right)
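A brief Python example using SciPy's spectral estimators (scipy.signal.coherence and scipy.signal.csd; the test signals and sampling rate are placeholders):

import numpy as np
from scipy import signal

fs = 250.0                                   # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)
y = np.sin(2 * np.pi * 10 * t + 0.3) + 0.5 * np.random.randn(t.size)

f, Cxy = signal.coherence(x, y, fs=fs, nperseg=256)       # magnitude-squared coherence
f, Sxy = signal.csd(x, y, fs=fs, nperseg=256)             # cross-spectral density
phase = np.arctan2(Sxy.imag, Sxy.real)                    # cross-phase spectrum Phi(f)
time_shift = phase / (2 * np.pi * np.maximum(f, 1e-12))   # phase converted to time displacement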
Coherence based (2)
*S. Weiss, and H. M. Mueller. The contribution of EEG coherence to the investigation of language, Brain and Language 85(2), 325-343, 2003
Granger Causality

Directed Transfer Function: captures directional influences between any given pair of channels in a multivariate data set.

Bivariate autoregressive process:

X_1(t) = \sum_{j=1}^{P} A_{11}(j) X_1(t-j) + \sum_{j=1}^{P} A_{12}(j) X_2(t-j) + E_1(t)

X_2(t) = \sum_{j=1}^{P} A_{21}(j) X_1(t-j) + \sum_{j=1}^{P} A_{22}(j) X_2(t-j) + E_2(t)

If the variance of the prediction error of one series is reduced by including the other series, then, in the sense of Granger causality, the first series depends on the second. Taking the Fourier transform,

\begin{pmatrix} A_{11}(f) & A_{12}(f) \\ A_{21}(f) & A_{22}(f) \end{pmatrix}
\begin{pmatrix} X_1(f) \\ X_2(f) \end{pmatrix}
=
\begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix}

where

A_{lm}(f) = \delta_{lm} - \sum_{j=1}^{P} A_{lm}(j)\, e^{-i 2 \pi f j},
\qquad
\delta_{lm} = \begin{cases} 1 & l = m \\ 0 & l \ne m \end{cases}

Inverting the system,

\begin{pmatrix} X_1(f) \\ X_2(f) \end{pmatrix}
=
\begin{pmatrix} A_{11}(f) & A_{12}(f) \\ A_{21}(f) & A_{22}(f) \end{pmatrix}^{-1}
\begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix}
=
\begin{pmatrix} H_{11}(f) & H_{12}(f) \\ H_{21}(f) & H_{22}(f) \end{pmatrix}
\begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix}

Granger causality (directed transfer function) from channel j to channel i:

I_{j \to i}^{2}(f) = | H_{ij}(f) |^2 = \frac{| A_{ij}(f) |^2}{| \det A(f) |^2}
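A compact Python sketch (all function and variable names are mine) that fits a bivariate AR model by ordinary least squares and evaluates |H_{ij}(f)|^2 on a grid of normalised frequencies:

import numpy as np

def dtf(x, order=5, n_freq=128):
    """x: array of shape (2, n). Returns frequencies and |H(f)|^2 with shape (n_freq, 2, 2)."""
    d, n = x.shape
    # Least-squares fit of x[:, t] = sum_j A_j x[:, t-j] + e_t
    Y = x[:, order:].T
    Z = np.hstack([x[:, order - j:n - j].T for j in range(1, order + 1)])
    coeffs, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    A_lags = coeffs.T.reshape(d, order, d).transpose(1, 0, 2)       # A_1, ..., A_order
    freqs = np.linspace(0.0, 0.5, n_freq)
    H2 = np.empty((n_freq, d, d))
    for k, f in enumerate(freqs):
        Af = np.eye(d, dtype=complex)
        for j in range(order):
            Af -= A_lags[j] * np.exp(-2j * np.pi * f * (j + 1))     # A(f)
        H2[k] = np.abs(np.linalg.inv(Af)) ** 2                      # |H_ij(f)|^2
    return freqs, H2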
Kalman Filter

- State Space Model (State Variable Model; State Evolution Model)

State equation:

x_{k+1} = A x_k + w_k, \qquad x_k \in \mathbb{R}^{N \times 1}, \qquad E[w_k w_k^T] = Q_k \in \mathbb{R}^{N \times N}

Measurement equation:

g_k = C x_k + v_k, \qquad g_k \in \mathbb{R}^{M \times 1}, \qquad E[v_k v_k^T] = R_k \in \mathbb{R}^{M \times M}

Error covariance:

E[(x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})^T] = P_{k|k}

Measurement update (filtering):

K_k = P_{k|k-1} C^T [ C P_{k|k-1} C^T + R_k ]^{-1}

P_{k|k} = (I_N - K_k C) P_{k|k-1}

\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k [ g_k - C \hat{x}_{k|k-1} ]

Time update (prediction):

P_{k+1|k} = A P_{k|k} A^T + Q_k

\hat{x}_{k+1|k} = A \hat{x}_{k|k}
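A minimal Kalman filter loop in Python implementing the two update blocks above (function and argument names are mine):

import numpy as np

def kalman_filter(g, A, C, Q, R, x0, P0):
    """g: observations of shape (T, M). Returns the filtered state estimates (T, N)."""
    x_pred, P_pred = x0, P0
    filtered = []
    for gk in g:
        # Measurement update (filtering)
        K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
        x_filt = x_pred + K @ (gk - C @ x_pred)
        P_filt = (np.eye(len(x0)) - K @ C) @ P_pred
        filtered.append(x_filt)
        # Time update (prediction)
        x_pred = A @ x_filt
        P_pred = A @ P_filt @ A.T + Q
    return np.array(filtered)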
Probabilistic graphical models (1)

Joint distribution over a set X = \{X_1, \dots, X_n\}.

Bayesian networks associate with each variable X_i a conditional probability P(X_i \mid U_i), where U_i \subseteq X.

The resulting product is of the form

P(X_1, \dots, X_n) = \prod_i P(X_i \mid U_i)
[Figure: directed graph over nodes A, B, C, D, E with edges A \to C, B \to C, A \to D, C \to E, annotated with the conditional probability table P(C \mid A, B).]

A  B | P(C=0 | A,B)  P(C=1 | A,B)
0  0 | 0.9           0.1
0  1 | 0.2           0.8
1  0 | 0.9           0.1
1  1 | 0.01          0.99
P(A, B, C, D, E) = P(A)\, P(B)\, P(C \mid A, B)\, P(D \mid A)\, P(E \mid C)
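As a toy illustration of this factorisation in Python (the marginals and the tables for D and E are made-up placeholders; the P(C=1 | A,B) entries follow the table above):

# P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|A) P(E|C), all variables binary
p_a = {0: 0.6, 1: 0.4}                                        # assumed marginal for A
p_b = {0: 0.7, 1: 0.3}                                        # assumed marginal for B
p_c1 = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.1, (1, 1): 0.99}  # P(C=1|A,B) from the slide table
p_d1 = {0: 0.5, 1: 0.9}                                       # assumed P(D=1|A)
p_e1 = {0: 0.2, 1: 0.7}                                       # assumed P(E=1|C)

def joint(a, b, c, d, e):
    pc = p_c1[(a, b)] if c else 1 - p_c1[(a, b)]
    pd = p_d1[a] if d else 1 - p_d1[a]
    pe = p_e1[c] if e else 1 - p_e1[c]
    return p_a[a] * p_b[b] * pc * pd * pe

print(joint(1, 0, 1, 1, 0))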
EM Algorithm: Predicting gene regulatory network

Basic state-space model:

x_{t+1} = A x_t + w_t
g_t = C x_t + v_t

Model with expression inputs:

x_{t+1} = A x_t + B g_t + w_t
g_t = C x_t + D g_{t-1} + v_t

Define the new state vector \tilde{x}_t = [x_t^T, g_t^T]^T. Then the new state-space form is

\tilde{x}_{t+1} = \tilde{A} \tilde{x}_t + \tilde{w}_t
g_t = H \tilde{x}_t

where

\tilde{A} = \begin{pmatrix} A & B \\ CA & CB + D \end{pmatrix}, \qquad H = (0 \;\; I)

\tilde{w}_t = \begin{pmatrix} w_t \\ C w_t + v_t \end{pmatrix}, \qquad
\tilde{Q} = \begin{pmatrix} Q & Q C^T \\ C Q & C Q C^T + R \end{pmatrix}

Constructing the network:

H_0: (CB + D)_{i,j} = 0 (no connection)
H_1: (CB + D)_{i,j} \ne 0 (connection)
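A small NumPy sketch (function name mine) assembling the augmented matrices defined above from given A, B, C, D, Q, R:

import numpy as np

def augment_state_space(A, B, C, D, Q, R):
    """Build A_tilde, H and Q_tilde for the augmented state x_tilde = [x; g]."""
    n, p = A.shape[0], C.shape[0]
    A_tilde = np.block([[A,     B],
                        [C @ A, C @ B + D]])
    H = np.hstack([np.zeros((p, n)), np.eye(p)])
    Q_tilde = np.block([[Q,     Q @ C.T],
                        [C @ Q, C @ Q @ C.T + R]])
    return A_tilde, H, Q_tilde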
EM Algorithm: Predicting gene regulatory network (2)

x_{t+1} = A x_t + B g_t + w_t
g_t = C x_t + D g_{t-1} + v_t

B and D are the input matrices of the state and observation equations, respectively.

Conditional distributions of state and observables:

P(x_{t+1} \mid x_t, g_t) \sim \mathcal{N}(A x_t + B g_t, \; Q)
P(g_t \mid x_t, g_{t-1}) \sim \mathcal{N}(C x_t + D g_{t-1}, \; R)

P(x_{t+1} \mid x_t, g_t) = \frac{\exp\!\left[ -\tfrac{1}{2} (x_{t+1} - A x_t - B g_t)^T Q^{-1} (x_{t+1} - A x_t - B g_t) \right]}{(2\pi)^{K/2} |Q|^{1/2}}

P(g_t \mid x_t, g_{t-1}) = \frac{\exp\!\left[ -\tfrac{1}{2} (g_t - C x_t - D g_{t-1})^T R^{-1} (g_t - C x_t - D g_{t-1}) \right]}{(2\pi)^{p/2} |R|^{1/2}}

Factorization rule for the Bayesian network:

P(\{x_t\}, \{g_t\} \mid \theta) = P(x_1) \prod_{t=1}^{T-1} P(x_{t+1} \mid x_t, g_t) \prod_{t=1}^{T} P(g_t \mid x_t, g_{t-1})

Unknowns in the system:

\theta = \{A, B, C, D, Q, R, Q_1, \mu_1\}
EM Algorithm: Predicting gene regulatory network (4)

Construct the likelihood:

L(\theta; x, g) = \log P(x, g \mid \theta)

Maximize the likelihood:

\hat{\theta} = \mathrm{argmax}_{\theta}\, L(\theta; x, g)

Marginalize with respect to x and introduce a distribution Q, giving the lower bound

L(\theta; x, g) \ge F(Q(x), \theta)

with equality when Q^{k+1}(x) = P(x \mid g, \theta).

E step: Q^{k+1} = \mathrm{argmax}_{Q} F(Q, \theta^k)
M step: \theta^{k+1} = \mathrm{argmax}_{\theta} F(Q^{k+1}, \theta)
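A schematic EM loop in Python reflecting the two steps above (e_step and m_step are hypothetical helpers standing in for the Kalman-smoother E step and the closed-form M step):

def em(g, theta0, n_iter=50, tol=1e-6):
    """Generic EM skeleton: alternate E and M steps until the likelihood stops improving."""
    theta, prev_ll = theta0, -float("inf")
    for _ in range(n_iter):
        Q = e_step(g, theta)          # E step: posterior over hidden states (hypothetical helper)
        theta, ll = m_step(g, Q)      # M step: re-estimate {A, B, C, D, Q, R, Q1, mu1} (hypothetical helper)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return theta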
Kalman filter based: Inferring network from microarray expression data (5)

\theta = \{A, B, C, D, Q, R, Q_1, \mu_1\}

Suppose we want to compute C. Differentiating the complete-data log-likelihood,

\frac{\partial}{\partial C} \left[ -2 \log P(\{x_t\}, \{g_t\} \mid \theta) \right]
= \frac{\partial}{\partial C} \left[ \sum_{t=1}^{T} (g_t - C x_t - D g_{t-1})^T R^{-1} (g_t - C x_t - D g_{t-1}) \right]

Setting the derivative to zero,

\sum_{t=1}^{T} g_t x_t^T - C \sum_{t=1}^{T} x_t x_t^T - D \sum_{t=1}^{T} g_{t-1} x_t^T = 0

Taking expectations,

C = \left( \sum_{t=1}^{T} g_t \hat{x}_t^T - D \sum_{t=1}^{T} g_{t-1} \hat{x}_t^T \right) \left( \sum_{t=1}^{T} P_t \right)^{-1}

We can repeat the same procedure for D and the other parameters, but doing so requires

\hat{x}_t = E[x_t \mid \{g\}], \qquad
P_t = E[x_t x_t^T \mid \{g\}], \qquad
P_{t-1,t} = E[x_{t-1} x_t^T \mid \{g\}]

which we obtain from the KALMAN SMOOTHER.
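A NumPy sketch of this closed-form update for C (names are mine; x_hat holds the smoothed state means and P_sum the summed smoothed second moments, both taken from a Kalman smoother):

import numpy as np

def update_C(g, x_hat, P_sum, D):
    """M-step update for C following the expectation above.
    g: (T, p) observations; x_hat: (T, n) smoothed state means;
    P_sum: (n, n) sum over t of E[x_t x_t^T | g]; D: (p, p) current observation input matrix."""
    g_prev = np.vstack([np.zeros((1, g.shape[1])), g[:-1]])   # g_{t-1}, with g_0 taken as zero
    term = g.T @ x_hat - D @ (g_prev.T @ x_hat)               # sum g_t x_hat_t^T - D sum g_{t-1} x_hat_t^T
    return term @ np.linalg.inv(P_sum)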
Kalman filter based: Inferring network from microarray expression data (9)

Experimental results: a standard T-cell activation model
*Claudia Rangel, John Angus, Zoubin Ghahramani, Maria Lioumi, Elizabeth Sotheran, Alessia Gaiba, David L. Wild, Francesco Falciani: Modeling T-cell activation using gene expression profiling and state-space
models. Bioinformatics 20(9): 1361-1372 (2004)
Probabilistic graphical models (2)

Markov networks represent the joint distribution as a product of potentials:

P(X_1, \dots, X_n) = \frac{1}{Z} \prod_j \pi_j[c_j]

[Figure: undirected graph over nodes A, B, C, D, E with cliques \{A,B\}, \{B,C,E\}, \{C,D\}, \{A,D\}, annotated with the potential table \pi_1(A,B).]

A  B | \pi_1(A,B)
0  0 | 1.0
0  1 | 0.5
1  0 | 0.5
1  1 | 2.0

P(A, B, C, D, E) = \frac{1}{Z}\, \pi_1(A, B)\, \pi_2(B, C, E)\, \pi_3(C, D)\, \pi_4(A, D)
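A brute-force Python sketch showing how the partition function Z normalises the product of potentials (\pi_1 uses the table above; \pi_2, \pi_3, \pi_4 are made-up placeholders):

import itertools

pi1 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}   # table from the slide
pi2 = lambda b, c, e: 1.5 if b == c == e else 1.0            # assumed potential over {B, C, E}
pi3 = lambda c, d: 2.0 if c == d else 0.5                    # assumed potential over {C, D}
pi4 = lambda a, d: 1.0 + a * d                               # assumed potential over {A, D}

def unnormalised(a, b, c, d, e):
    return pi1[(a, b)] * pi2(b, c, e) * pi3(c, d) * pi4(a, d)

Z = sum(unnormalised(*v) for v in itertools.product([0, 1], repeat=5))
joint = lambda a, b, c, d, e: unnormalised(a, b, c, d, e) / Z
print(Z, joint(1, 1, 0, 0, 1))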
Kernel-weighted logistic regression method (1)

Pair-wise Markov random field:

P_{\theta}^{(t)}(X) = \frac{1}{Z(\theta^{(t)})} \exp\!\left( \sum_{(u,v) \in E^{(t)}} \theta_{uv}^{(t)} X_u X_v \right)

[Figure: a network over nodes x_1, \dots, x_8 with edge parameters \theta_{uv}.]

Logistic function (conditional of node u given the others):

P_{\theta}^{(t)}(X_u^{(t)} \mid X_{\setminus u}^{(t)}) =
\frac{\exp\!\left( 2 X_u^{(t)} \langle \theta_{\setminus u}^{(t)}, X_{\setminus u}^{(t)} \rangle \right)}
{\exp\!\left( 2 X_u^{(t)} \langle \theta_{\setminus u}^{(t)}, X_{\setminus u}^{(t)} \rangle \right) + 1}

Log likelihood:

\gamma(\theta_{\setminus u}^{(t)}; X) = \log P_{\theta}^{(t)}(X_u^{(t)} \mid X_{\setminus u}^{(t)})

Optimization problem:

\hat{\theta}_{\setminus u}^{(t)} = \mathrm{argmin}_{\theta_{\setminus u}^{(t)} \in \mathbb{R}^{p-1}}
\left\{ \sum_{i=1}^{n} -\, w^{(t_i)}\, \gamma(\theta_{\setminus u}^{(t)}; X^{(t_i)}) + \lambda\, \| \theta_{\setminus u}^{(t)} \|_1 \right\}

where \langle a, b \rangle = a^T b, \; \setminus u := V \setminus \{u\}, and \theta_{\setminus u}^{(t)} = \{ \theta_{uv}^{(t)} \mid v \in \setminus u \} is (p-1)-dimensional.
Kernel-weighted logistic regression method (2)

\hat{\theta}_{\setminus u}^{(t)} = \mathrm{argmin}_{\theta_{\setminus u}^{(t)} \in \mathbb{R}^{p-1}}
\left\{ \sum_{i=1}^{n} -\, w^{(t_i)}\, \gamma(\theta_{\setminus u}^{(t)}; X^{(t_i)}) + \lambda\, \| \theta_{\setminus u}^{(t)} \|_1 \right\}

[Figure: illustration of the kernel weights w^{(t_i)} that localise the estimate around the target time t.]
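A rough Python sketch of the per-node, kernel-weighted L1 logistic regression (using scikit-learn's LogisticRegression with per-sample weights; the Gaussian kernel, bandwidth and all names are my assumptions):

import numpy as np
from sklearn.linear_model import LogisticRegression

def neighbourhood_at_time(X, times, u, t, bandwidth=0.1, C=0.5):
    """X: (n, p) samples in {-1, +1}; times: (n,) observation times in [0, 1].
    Estimate the neighbourhood of node u at time t by weighted L1 logistic regression."""
    w = np.exp(-((times - t) ** 2) / (2 * bandwidth ** 2))   # Gaussian kernel weights
    w /= w.sum()
    y = (X[:, u] > 0).astype(int)                            # predict node u ...
    Z = np.delete(X, u, axis=1)                              # ... from all the other nodes
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(Z, y, sample_weight=w)
    return clf.coef_.ravel()                                 # nonzero coefficients ~ estimated edges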
Kernel-weighted logistic regression method (3)

Interaction between gene ontological groups related to developmental processes undergoing dynamic rewiring. The weight of an edge between two ontological groups is the total number of connections between genes in the two groups. In the visualization, the width of an edge is proportional to the edge weight. The edge weight is thresholded at 30 so that only interactions exceeding this number are displayed. The average network on the left is produced by averaging the networks on the right; in that case the threshold is set to 20.
*L. Song, M. Kolar, and E. P. Xing. KELLER: estimating time-varying interactions between genes. Bioinformatics 25, i128-i136, 2009
Graphical Lasso Model (1)

Let Z = (Z_1, \dots, Z_p)^T be a p-variate Gaussian with mean \mu and covariance matrix \Sigma.

Key idea: if the ij-th component of \Sigma^{-1} is zero, then Z_i and Z_j are conditionally independent.

\hat{\Theta} is an estimator of \Sigma^{-1}:

\hat{\Theta} = \mathrm{argmax}_{\Theta \succ 0} \left\{ \log \det \Theta - \mathrm{tr}(S\Theta) - \lambda \| \Theta \|_1 \right\}

W is an estimator of \Sigma. Each column update solves the lasso problem

\min_{\beta}\; \tfrac{1}{2}\, \| W_{11}^{1/2} \beta - b \|^2 + \lambda \| \beta \|_1

where b = W_{11}^{-1/2} s_{12}; after solving for \beta, we obtain w_{12} = W_{11} \beta.

Empirical covariance matrix:

S = \frac{1}{n} \sum_{k=1}^{n} Z^{(k)} Z^{(k)T}

Block partition:

W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix}, \qquad
S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^T & s_{22} \end{pmatrix}
*O. Banerjee, L. E. Ghaoui, A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 2007.
Initialise: W = S + \lambda I
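In practice a packaged solver can be used; a minimal sketch with scikit-learn's GraphicalLasso (the regularisation value and the random data are placeholders):

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 10))            # placeholder data: n = 200 samples, p = 10 variables

model = GraphicalLasso(alpha=0.1).fit(Z)      # alpha plays the role of lambda above
Theta_hat = model.precision_                  # estimate of Sigma^{-1}
edges = (np.abs(Theta_hat) > 1e-8) & ~np.eye(10, dtype=bool)   # nonzero off-diagonals = edges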
Graphical Lasso Model (2)

Solve the lasso problem for w_{12}, cycling over the columns j = 1, \dots, p one at a time.
[Figure: the rows and columns of W and S are permuted so that the current target variable occupies the last row and column (w_{12}, s_{12}); the lasso sub-problem is solved for that column, W is updated, and the next column is rotated into place.]
When converged, we can recover \hat{\Sigma}^{-1} from W, knowing that

\begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix}
\begin{pmatrix} \hat{\Theta}_{11} & \hat{\theta}_{12} \\ \hat{\theta}_{12}^T & \hat{\theta}_{22} \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ 0^T & 1 \end{pmatrix}

Convergence is monitored through the duality gap \mathrm{tr}(W^{-1} S) - p + \lambda \| W^{-1} \|_1.

In this case, we can use the pre-stored quantities \hat{\beta} and W as follows:

\hat{\theta}_{12} = -\hat{\beta}\, \hat{\theta}_{22}, \qquad
\hat{\theta}_{22} = 1 / ( w_{22} - w_{12}^T \hat{\beta} )
*O. Banerjee, L. E. Ghaoui, A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 2007.
Graphical Lasso Model (3)
*Software under development @ Oxford Complex Systems Group
with Nick Jones
*Results shown for a Google Trends dataset
THE END