Fan Guo, Steve Hanneke, Wenjie Fu, Eric P. Xing School of Computer Science, Carnegie Mellon University 11/7/2015 ICML 2007 Presentation.

Physicist Collaborations
High School Dating
The Internet
All the images are from http://www-personal.umich.edu/~mejn/networks/. That page includes original citations.
Model for the yeast cell cycle transcriptional regulatory network
Fig. 4 from (T.I. Lee et al., Science 298, 799-804, 25 Oct 2002)
Protein-protein interaction network in S. cerevisiae
Fig. 1 from (H. Jeong et al., Nature 411, 41-42, 3 May 2001)
The small image is from http://www.raiks.de/img/dyna_title_zoom.jpg
Infer the hidden network topology from node attribute observations.
Methods: optimizing a score function; information-theoretic approaches; model-based approaches …
Most of these methods pool the data together to infer a single static network topology.

Network topologies and functions are not static:
Social networks can grow as we make more friends.
Biological networks rewire under different conditions.
Fig. 1b from "Genomic analysis of regulatory network dynamics reveals large topological changes", N. M. Luscombe et al., Nature 431, 308-312, 16 September 2004

Network topologies and functions are not always static.
We propose probabilistic models and algorithms for recovering, from node attribute observations, latent network topologies that change over time.

Networks rewire over discrete timesteps.
[Figure panels: Transition Model; Emission Model]
Part of the image is modified from Fig. 3b (E. Segal et al., Nature Genetics 34, 166-176, June 2003).
Latent network structures have higher dimensionality than the observed node attributes.
→ How can we place constraints on the latent space?
Limited evidence per timestep.
→ How can information be shared across time?

Energy-based conditional probability model
(recall Markov random fields…)
$$ p(x \mid y) = \frac{e^{-E(x,y)}}{\sum_{x'} e^{-E(x',y)}} = \frac{1}{Z(y)} \exp\Big\{ \sum_k \theta_k \psi_k(x,y) \Big\}, \qquad -E(x,y) = \sum_k \theta_k \psi_k(x,y) $$
Energy-based models are easier to analyze, but even designing an approximate inference algorithm can be hard.
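To make the normalizer Z(y) concrete, here is a minimal Python sketch that evaluates log p(x | y) for a short binary vector x by enumerating all 2^n configurations. The function name `log_conditional` and the representation of each feature ψ_k as a Python callable are assumptions of this sketch. The exponential cost of this enumeration is exactly why inference becomes hard once x is large.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple

# A feature psi_k maps a binary configuration z and a conditioning value y to a real number.
Feature = Callable[[Tuple[int, ...], object], float]


def log_conditional(x: Sequence[int], y: object,
                    thetas: Sequence[float], features: Sequence[Feature]) -> float:
    """log p(x | y) with p(x | y) = exp(sum_k theta_k * psi_k(x, y)) / Z(y).

    Z(y) is summed over all 2**n binary configurations, so this brute-force
    version is only usable for very small n.
    """
    n = len(x)

    def neg_energy(z: Tuple[int, ...]) -> float:
        # -E(z, y) = sum_k theta_k * psi_k(z, y)
        return sum(theta * psi(z, y) for theta, psi in zip(thetas, features))

    scores = [neg_energy(z) for z in itertools.product((0, 1), repeat=n)]
    m = max(scores)  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return neg_energy(tuple(x)) - log_z
```

For example, with a single feature ψ(x, y) = x_1 and θ = log 3, the sketch gives p(x_1 = 1) = 3 / (1 + 3) = 0.75.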

Based on our previous work on discrete temporal network models in the ICML’06 SNA-Workshop.
Model network rewiring as a Markov process.
An expressive framework using energy-based local probabilities (based on ERGM):
$$ p(A^t \mid A^{t-1}) = \frac{1}{Z(A^{t-1}, \theta)} \exp\Big\{ \sum_i \theta_i \psi_i(A^t, A^{t-1}) \Big\} $$
Features of choice:
$$ \psi_1 = \sum_{ij} A_{ij}^t \qquad \text{(Density)} $$
$$ \psi_2 = \sum_{ij} \big[ A_{ij}^t A_{ij}^{t-1} + (1 - A_{ij}^t)(1 - A_{ij}^{t-1}) \big] \qquad \text{(Edge Stability)} $$
$$ \psi_3 = \frac{\sum_{ijk} A_{ij}^t A_{ik}^{t-1} A_{kj}^{t-1}}{\sum_{ijk} A_{ik}^{t-1} A_{kj}^{t-1}} \qquad \text{(Transitivity)} $$
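Because ψ1, ψ2, and the numerator of ψ3 are each linear in the individual entries A_ij^t, while the denominator of ψ3 depends only on A^{t-1}, the exponent splits into one term per edge. Writing c_ij = Σ_k A_ik^{t-1} A_kj^{t-1} and C = Σ_ij c_ij, the partition function follows directly from the features above:
$$ Z(A^{t-1}, \theta) = \prod_{ij} \Big[ e^{\theta_2 (1 - A_{ij}^{t-1})} + e^{\theta_1 + \theta_2 A_{ij}^{t-1} + \theta_3 c_{ij} / C} \Big] $$
The Python sketch below uses this factorization to evaluate log p(A^t | A^{t-1}) and to sample A^t edge by edge. The function names, the skipping of self-loops, and the convention ψ3 = 0 when A^{t-1} contains no two-paths (C = 0) are assumptions of the sketch.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[int]]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def edge_potentials(A_prev: Matrix, theta: Sequence[float]) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Per-edge exponent contributions for A_ij^t = 0 and A_ij^t = 1.

    theta = (theta1, theta2, theta3) weights density, edge stability and
    transitivity. Self-loops are skipped (an assumption of this sketch).
    """
    th1, th2, th3 = theta
    n = len(A_prev)
    # two_paths[i][j] = c_ij = sum_k A_ik^{t-1} * A_kj^{t-1}
    two_paths = [[sum(A_prev[i][k] * A_prev[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    denom = sum(map(sum, two_paths))  # C, the denominator of psi_3, fixed by A^{t-1}
    pots = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            b = A_prev[i][j]
            trans = two_paths[i][j] / denom if denom > 0 else 0.0
            pot0 = th2 * (1 - b)                # edge absent: only stability can fire
            pot1 = th1 + th2 * b + th3 * trans  # edge present
            pots[(i, j)] = (pot0, pot1)
    return pots


def log_transition(A_t: Matrix, A_prev: Matrix, theta: Sequence[float]) -> float:
    """log p(A^t | A^{t-1}); Z(A^{t-1}, theta) is a product of per-edge sums."""
    total = 0.0
    for (i, j), (pot0, pot1) in edge_potentials(A_prev, theta).items():
        m = max(pot0, pot1)
        log_z_ij = m + math.log(math.exp(pot0 - m) + math.exp(pot1 - m))
        total += (pot1 if A_t[i][j] else pot0) - log_z_ij
    return total


def sample_transition(A_prev: Matrix, theta: Sequence[float], rng: random.Random) -> Matrix:
    """Draw A^t ~ p(. | A^{t-1}); edges are independent given A^{t-1}."""
    n = len(A_prev)
    A_t = [[0] * n for _ in range(n)]
    for (i, j), (pot0, pot1) in edge_potentials(A_prev, theta).items():
        A_t[i][j] = 1 if rng.random() < _sigmoid(pot1 - pot0) else 0
    return A_t
```

Since each edge only needs a two-term sum, both evaluation and sampling cost O(n^3) for the two-path counts plus O(n^2) for the edges, with no enumeration over whole networks.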


Given the network topology, how are the binary node attributes generated?
Another energy-based conditional model:
$$ p(x^t \mid A^t, \eta, \Lambda) = \frac{1}{Z(A^t, \eta, \Lambda)} \exp\Big\{ \eta \sum_{ij} \Phi(x_i^t, x_j^t, A_{ij}^t, \Lambda_{ij}) \Big\} $$
All features are pairwise, which induces an undirected graph corresponding to the time-specific network topology.
Additional information shared over time is represented by a matrix of parameters Λ.
The design of the feature function Φ is application-specific.

The feature function:
$$ \Phi(x_i^t, x_j^t, A_{ij}^t, \Lambda_{ij}) = A_{ij}^t \, \Lambda_{ij} \, (2x_i^t - 1)(2x_j^t - 1) $$
If there is no edge between i and j, Φ equals 0; otherwise, the sign of Φ depends on Λij and on the empirical correlation of xi and xj at time t.
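The following Python sketch evaluates Φ and the emission log-likelihood log p(x^t | A^t, η, Λ) for a single binary observation vector. It computes Z(A^t, η, Λ) by enumerating all 2^n configurations rather than by variable elimination, so it only suits toy sizes; the function names and the sum over ordered pairs i ≠ j are assumptions of the sketch.

```python
import itertools
import math
from typing import List, Sequence

Matrix = List[List[float]]


def phi(x_i: int, x_j: int, a_ij: int, lam_ij: float) -> float:
    """Phi = A_ij^t * Lambda_ij * (2 x_i^t - 1) * (2 x_j^t - 1).

    Zero without an edge; with an edge, +Lambda_ij if x_i and x_j agree
    and -Lambda_ij if they disagree.
    """
    return a_ij * lam_ij * (2 * x_i - 1) * (2 * x_j - 1)


def emission_score(x: Sequence[int], A: Matrix, Lam: Matrix, eta: float) -> float:
    """eta * sum_ij Phi(x_i, x_j, A_ij, Lambda_ij) over ordered pairs i != j."""
    n = len(x)
    return eta * sum(phi(x[i], x[j], A[i][j], Lam[i][j])
                     for i in range(n) for j in range(n) if i != j)


def log_emission(x: Sequence[int], A: Matrix, Lam: Matrix, eta: float) -> float:
    """log p(x^t | A^t, eta, Lambda); Z is enumerated by brute force (tiny n only)."""
    n = len(x)
    scores = [emission_score(z, A, Lam, eta)
              for z in itertools.product((0, 1), repeat=n)]
    m = max(scores)  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return emission_score(x, A, Lam, eta) - log_z
```

If several observation vectors are available per timestep and are treated as independent given A^t (an assumption here), their log-probabilities simply add.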
[Figure: hidden rewiring networks. Annotations: "Initial network to define the prior on A^1"; "Time-invariant parameters dictating the direction of pairwise correlation in the example".]

A natural approach to inferring the hidden networks A^{1:T} is Gibbs sampling, which draws each edge from
$$ p(A_{ij}^t \mid A^{t-1}, A_{-ij}^t, A^{t+1}, x^t) $$
To evaluate the log-odds
$$ \log \frac{p(A_{ij}^t = 1 \mid A^{t-1}, A_{-ij}^t, A^{t+1}, x^t)}{p(A_{ij}^t = 0 \mid A^{t-1}, A_{-ij}^t, A^{t+1}, x^t)} $$
only the conditional probabilities in the Markov blanket of A_ij^t are needed: p(A^t | A^{t-1}), p(A^{t+1} | A^t), and p(x^t | A^t).
Transition model: tractable, since the partition function is the product of per-edge terms; computation is straightforward.
Emission model:
$$ p(x^t \mid A^t, \eta, \Lambda) = \frac{1}{Z(A^t, \eta, \Lambda)} \exp\Big\{ \eta \sum_{ij} \Phi(x_i^t, x_j^t, A_{ij}^t, \Lambda_{ij}) \Big\} $$
Given the graphical structure, run variable elimination algorithms; this works well for small graphs.
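The log-odds reduces to comparing the Markov-blanket terms with A_ij^t set to 1 and to 0. Below is a minimal Python sketch of that comparison and of one Gibbs sweep, assuming the caller supplies hypothetical functions `log_trans` and `log_emit` that return normalized log-probabilities. Normalization matters here: Z(A^t, θ) in p(A^{t+1} | A^t) and Z(A^t, η, Λ) in p(x^t | A^t) both change when A_ij^t flips. The indexing convention (A[0] is the fixed initial network, x[0] unused) is also an assumption of the sketch.

```python
import math
import random
from typing import Callable, List, Sequence

Matrix = List[List[int]]
# Hypothetical caller-supplied callables; both must return *normalized*
# log-probabilities, because the partition functions depend on A^t.
LogTrans = Callable[[Matrix, Matrix], float]   # log p(A_next | A_prev)
LogEmit = Callable[[object, Matrix], float]    # log p(x^t | A^t)


def edge_log_odds(A: List[Matrix], x: Sequence[object], t: int, i: int, j: int,
                  log_trans: LogTrans, log_emit: LogEmit) -> float:
    """log p(A_ij^t = 1 | rest) - log p(A_ij^t = 0 | rest), for t >= 1.

    A[0] is the fixed initial network that defines the prior on A[1];
    x[t] is the observation at timestep t (x[0] is unused).
    """
    def blanket(value: int) -> float:
        A_t = [row[:] for row in A[t]]
        A_t[i][j] = value
        score = log_trans(A_t, A[t - 1]) + log_emit(x[t], A_t)
        if t + 1 < len(A):  # the last timestep has no successor term
            score += log_trans(A[t + 1], A_t)
        return score

    return blanket(1) - blanket(0)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gibbs_sweep(A: List[Matrix], x: Sequence[object],
                log_trans: LogTrans, log_emit: LogEmit, rng: random.Random) -> None:
    """One in-place sweep that resamples every off-diagonal A_ij^t, t = 1..T."""
    for t in range(1, len(A)):
        n = len(A[t])
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                p_one = _sigmoid(edge_log_odds(A, x, t, i, j, log_trans, log_emit))
                A[t][i][j] = 1 if rng.random() < p_one else 0
```

Copying A^t twice per edge keeps the sketch simple. A practical sampler would instead compute only the terms that change when A_ij^t flips.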


Grid search is very helpful, although Monte Carlo EM can also be implemented.
Trade-off between the transition model and the emission model:
Larger θ: better fit of the rewiring processes;
Larger η: better fit of the observations.
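A grid search over the parameters can be sketched in a few lines of Python. Because the trade-off is governed by the relative sizes of θ and η, the sketch searches over a θ2 by η grid (the pair varied in the experiments). The `score` callable is a hypothetical placeholder, for example F1 on simulated data with known ground truth or an approximate log-likelihood; the slides do not specify the criterion.

```python
import itertools
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical criterion: score(theta2, eta) -> float, higher is better.
Score = Callable[[float, float], float]


def grid_search(score: Score, theta2_grid: Iterable[float],
                eta_grid: Iterable[float]) -> Tuple[float, float, float]:
    """Evaluate score at every grid point; return (best_score, theta2, eta)."""
    best: Optional[Tuple[float, float, float]] = None
    for theta2, eta in itertools.product(theta2_grid, eta_grid):
        s = score(theta2, eta)
        if best is None or s > best[0]:
            best = (s, theta2, eta)
    if best is None:
        raise ValueError("empty parameter grid")
    return best
```

Each grid point typically requires a full Gibbs run, so the grid should stay coarse.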

Data generated from the proposed model.
Starting from a network (A0) of 10 nodes and 14 edges.
The length of the time series T = 50.

Compare three approaches using the F1 score:
avg: the averaged network from the "ground truth" (approximately an upper bound on the performance of any static network inference algorithm)
htERG: infers timestep-specific networks
sERG: the static counterpart of the proposed algorithm
Study the "edge-switching events".
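The comparison can be sketched in Python as an edge-level F1 score together with an "avg"-style baseline. Pooling the counts over all timesteps and ordered pairs, and binarizing the time-averaged ground truth with a 0.5 threshold, are assumptions of this sketch; the slides do not say how "avg" was thresholded.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def f1_score(pred: Sequence[Matrix], truth: Sequence[Matrix]) -> float:
    """Edge-level F1, pooled over all timesteps and off-diagonal pairs."""
    tp = fp = fn = 0
    for P, G in zip(pred, truth):
        n = len(G)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                if P[i][j] and G[i][j]:
                    tp += 1
                elif P[i][j]:
                    fp += 1
                elif G[i][j]:
                    fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def averaged_network(truth: Sequence[Matrix], threshold: float = 0.5) -> List[Matrix]:
    """'avg'-style baseline: threshold the time-averaged true networks and
    repeat that single static network at every timestep."""
    T, n = len(truth), len(truth[0])
    static = [[1 if i != j and sum(G[i][j] for G in truth) / T > threshold else 0
               for j in range(n)] for i in range(n)]
    return [[row[:] for row in static] for _ in range(T)]
```

Because the baseline is built from the true networks themselves, any error it makes comes only from forcing one topology onto all timesteps, which is why it roughly bounds static methods from above.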

F1 scores on different parameter settings (varying θ2, η)
θ1 = 0.5, θ3 = 4, D = 5, 100k iterations of Gibbs sampling, 10 repetitions
F1 scores for different numbers of examples
θ1 = 0.5, θ2 = 4, θ3 = 4, η = 1, 100k iterations of Gibbs sampling, 10 repetitions
Summary of capturing edge switching in networks
Three cases studied: offset, false positive, missing (false negative)
Mean and RMS of the offset, in timesteps
θ1 = 0.5, θ2 = 4, θ3 = 4, η = 1, D = 5, 100k iterations of Gibbs sampling, 10 repetitions
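The three cases can be sketched in Python for a single edge's 0/1 time series. The matching rule is an assumption of this sketch, not the procedure used in the experiments: each true switch is greedily paired with the nearest unused inferred switch within a hypothetical window `max_offset`. Leftover inferred switches count as false positives, and unpaired true switches count as missing.

```python
import math
from typing import List, Sequence, Tuple


def switch_times(series: Sequence[int]) -> List[int]:
    """Timesteps t at which an edge indicator differs from its value at t - 1."""
    return [t for t in range(1, len(series)) if series[t] != series[t - 1]]


def match_switches(true_series: Sequence[int], pred_series: Sequence[int],
                   max_offset: int) -> Tuple[List[int], int, int]:
    """Greedily pair each true switch with the nearest unused inferred switch.

    Returns (offsets, false_positives, missing). An offset is inferred time
    minus true time, so a positive offset means the switch was detected late.
    """
    unused = switch_times(pred_series)
    offsets: List[int] = []
    missing = 0
    for s in switch_times(true_series):
        candidates = [p for p in unused if abs(p - s) <= max_offset]
        if not candidates:
            missing += 1
            continue
        p = min(candidates, key=lambda q: (abs(q - s), q))
        unused.remove(p)
        offsets.append(p - s)
    return offsets, len(unused), missing


def offset_stats(offsets: Sequence[int]) -> Tuple[float, float]:
    """Mean and root-mean-square of the matched offsets (NaN if none matched)."""
    if not offsets:
        return math.nan, math.nan
    mean = sum(offsets) / len(offsets)
    rms = math.sqrt(sum(o * o for o in offsets) / len(offsets))
    return mean, rms
```

The mean reveals a systematic lag or lead, while the RMS also penalizes offsets that cancel out in the mean.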

The proposed model was applied to infer the muscle development subnetwork (Zhao et al., 2006) from Drosophila life-cycle gene expression data (Arbeitman et al., 2002).
11 genes, 66 timesteps over 4 developmental stages
Further biological experiments are necessary for verification.
[Figure panels: network in (Zhao et al., 2006); Embryonic; Larval; Pupal & Adult]
A new class of probabilistic models that addresses the problem of recovering hidden, time-dependent network topologies, with an example in a biological context.
An example of employing energy-based models to define meaningful features and simplify parameterization.
Future work:
Larger-scale network analysis (100+ nodes?)
Developing emission models for richer contexts

Yanxin Shi, CMU
Wentao Zhao, Texas A&M University
Hetunandan Kamisetty, CMU