Modularity in Biological networks

Modularity in Cellular Networks
• Hypothesis:
Biological functions are carried out by discrete functional modules.
Hartwell, L. H., Hopfield, J. J., Leibler, S., & Murray, A. W., Nature, 1999.
• Traditional view of modularity:
Question: Is modularity a myth, or a structural property of biological networks?
(Are biological networks fundamentally modular?)
Modularity in cell biology
Definition of a module
• Loosely linked island of densely connected nodes
• Groups of co-expressed genes
Concept of modules in a network
Definition of a module
Computational analysis of modular structures
Data clustering approach
Concept of data clustering analysis
• Partitioning a data set into groups so that points in one group are similar to each other and as different as possible from points in other groups.
• The validity of a clustering is often in the eye of the beholder.
Concept of data clustering analysis
• To decide whether two data points are similar, we need to define a similarity measure.
• We also need a score function that expresses our objective.
• A clustering algorithm then partitions the data set so that the score function is optimized.
Types of clustering algorithms
• Partition-based clustering algorithms
• Hierarchical clustering algorithms
• Probabilistic model-based clustering algorithms
Partitioning problem
• Given a network of n nodes, D = {x(1), x(2), …, x(n)}, the task is to find K clusters C = {C1, C2, …, CK} such that each node x(i) is assigned to exactly one cluster Ck and the score function S(C1, C2, …, CK) is optimized.
Community structure of biological network
[Figure: a biological network partitioned into Community 1, Community 2 and Community 3]
Score function for network clustering
• Maximize the number of intra-group connections and minimize the number of inter-group connections.
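A minimal sketch of this objective, assuming the network is an undirected edge list and a partition is a dictionary mapping each node to a cluster label (both input formats are assumptions). The score `intra - inter` is purely illustrative, not the score used in these slides; note that it is trivially maximized by putting every node in one cluster, which is one reason normalized scores such as the modularity Q are preferred.

```python
def edge_counts(edges, labels):
    """Return (intra, inter): edges inside one cluster vs. edges between clusters."""
    intra = sum(1 for u, v in edges if labels[u] == labels[v])
    return intra, len(edges) - intra


def simple_score(edges, labels):
    """Illustrative score (an assumption): reward intra-group, penalize inter-group edges."""
    intra, inter = edge_counts(edges, labels)
    return intra - inter


# Hypothetical network: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # one cluster per triangle
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}    # clusters cut across the triangles

print(edge_counts(edges, good), simple_score(edges, good))  # (6, 1) 5
print(edge_counts(edges, bad), simple_score(edges, bad))    # (2, 5) -3
```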
Spectral analysis clustering algorithm
Adjacency Matrix
• Aij = 1 if the ith protein interacts with the jth protein
• Aij = 0 otherwise
• Aij = Aji (undirected graph)
• A is a sparse matrix: most of its elements Aij are zero
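A minimal NumPy sketch of building A from a list of interacting protein pairs (the 0-based index pairs are a hypothetical input format). For genome-scale networks a sparse format such as `scipy.sparse` would normally replace the dense array used here.

```python
import numpy as np


def adjacency_matrix(n, interactions):
    """Dense 0/1 adjacency matrix of an undirected network with n nodes."""
    A = np.zeros((n, n), dtype=int)
    for i, j in interactions:
        A[i, j] = 1
        A[j, i] = 1  # undirected graph: Aij = Aji
    return A


A = adjacency_matrix(5, [(0, 1), (1, 2), (3, 4)])
print(A)
print("symmetric:", np.array_equal(A, A.T))                          # True
print("fraction of nonzero entries:", np.count_nonzero(A) / A.size)  # 0.24
```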
Spectral analysis
Algorithm (Spectral analysis)
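• Start from a randomly assigned vector X = (X1, X2, …, Xn)
• Iterate X(k+1) = AX(k), normalizing X at each step, until it converges (to the eigenvector whose eigenvalue has the largest magnitude)
• Repeat with a new vector that is kept perpendicular to the eigenspace already found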
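A minimal NumPy sketch of these steps, assuming a symmetric A whose eigenvalues have distinct magnitudes (ties, such as the +λ/−λ pairs of a bipartite graph, can stall plain power iteration). The Rayleigh-quotient stopping rule, the tolerance and the Gram-Schmidt projection are choices made for this sketch, not details taken from the slides.

```python
import numpy as np


def power_iteration_eigs(A, m, iters=5000, tol=1e-10, seed=0):
    """Return up to m (eigenvalue, eigenvector) pairs of a symmetric matrix A.

    Each pass starts from a random X, iterates X <- A X with normalization,
    and projects out the eigenvectors already found, so later passes
    converge to eigenvalues of smaller |lambda|.
    """
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(m):
        x = rng.standard_normal(A.shape[0])
        lam = 0.0
        for _ in range(iters):
            for _, v in pairs:                   # stay perpendicular to found eigenvectors
                x = x - (v @ x) * v
            x = x / np.linalg.norm(x)
            y = A @ x
            lam = float(x @ y)                   # Rayleigh quotient estimate of lambda
            if np.linalg.norm(y - lam * x) < tol:
                break                            # A x is (numerically) lambda x
            x = y
        pairs.append((lam, x / np.linalg.norm(x)))
    return pairs


# Hypothetical network: two triangles joined by the bridge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

lam, v = power_iteration_eigs(A, 1)[0]
print(round(lam, 6))  # 2.414214, i.e. 1 + sqrt(2)
```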
Topological Structure
[Figure: the original network and its hidden topological structure]
An example
Protein-protein interaction network of Saccharomyces cerevisiae
Data source
The data source assigns a confidence value to each of 80000 interactions among 5400 yeast proteins.
We take the 11855 interactions with high or medium confidence, which involve 2617 proteins, 353 of them of unknown function.
• Quasi-clique: positive eigenvalue
• Quasi-bipartite: negative eigenvalue
• With the spectral analysis, we obtain 48 quasi-cliques and 6 quasi-bipartites.
• A single quasi-clique can contain annotated proteins as well as unannotated and unknown proteins.
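A quick NumPy check of why the sign of the eigenvalue separates the two structures, using ideal cases chosen here for illustration: a clique's adjacency matrix has one large positive eigenvalue, while a bipartite graph's spectrum is symmetric about zero and the eigenvector of its most negative eigenvalue takes opposite signs on the two parts.

```python
import numpy as np

K4 = np.ones((4, 4)) - np.eye(4)          # complete graph: a perfect clique
K22 = np.array([[0, 0, 1, 1],             # complete bipartite graph with
                [0, 0, 1, 1],             # parts {0, 1} and {2, 3}
                [1, 1, 0, 0],
                [1, 1, 0, 0]], dtype=float)

print(np.round(np.linalg.eigvalsh(K4), 6))   # -1, -1, -1, 3: one large positive eigenvalue
print(np.round(np.linalg.eigvalsh(K22), 6))  # -2, 0, 0, 2: symmetric about zero

# Eigenvector of the most negative eigenvalue of K22 (eigh sorts ascending).
vals, vecs = np.linalg.eigh(K22)
print(np.sign(vecs[:, 0]))  # opposite signs on {0, 1} and {2, 3}
```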
Application: function prediction
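One simple way to turn a quasi-clique into a prediction, sketched here as an assumption rather than the method used in the slides, is a majority vote: each unannotated protein in a module receives the most common annotation among the module's annotated members. Protein IDs and annotation labels below are hypothetical.

```python
from collections import Counter


def predict_function(module_members, annotations):
    """Majority-vote prediction (illustrative rule) for unannotated module members."""
    known = [annotations[p] for p in module_members if p in annotations]
    if not known:
        return {}                                  # nothing to vote with
    best, _ = Counter(known).most_common(1)[0]     # most frequent annotation
    return {p: best for p in module_members if p not in annotations}


# Hypothetical quasi-clique: three annotated proteins and two unknown ones.
module = ["P1", "P2", "P3", "U1", "U2"]
annotations = {"P1": "ribosome", "P2": "ribosome", "P3": "transcription"}
print(predict_function(module, annotations))  # {'U1': 'ribosome', 'U2': 'ribosome'}
```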
Hierarchical clustering algorithm
• A similarity (distance) measure d(i,j) between nodes i and j
• The similarity measure turns the network into a weighted network with weights Wij.
Types of hierarchical clustering
• Agglomerative hierarchical clustering
• Divisive hierarchical clustering
Properties of similarity measure
• d(i,j)≥0
• d(i,j)=d(j,i)
• d(i,j)≤d(i,k)+d(k,j)
Similarity measure for agglomerative clustering
• Correlation
• Shortest path length
• Edge betweenness
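For the first measure, a minimal NumPy sketch, assuming that "correlation" means the Pearson correlation between rows of the adjacency matrix (the same call works on gene-expression profiles) and that 1 − r is used as the distance. The small matrix is hypothetical.

```python
import numpy as np

# Hypothetical 5-node network.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

R = np.corrcoef(A)   # Pearson correlation between rows (rows are the variables)
D = 1.0 - R          # correlation distance: 0 for identical connection profiles
print(np.round(D, 3))
```

Note that 1 − r is non-negative and symmetric but can violate the triangle inequality; sqrt(2(1 − r)) is a common variant that does satisfy it.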
How good is agglomerative clustering?
Hierarchical tree (Dendrogram)
[Figure: dendrogram cut at a threshold]
Distance between clusters: single link
[Figure: single-link distance between Cluster 1 and Cluster 2]
Distance between clusters: complete link
[Figure: complete-link distance between Cluster 1 and Cluster 2]
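The two cluster distances can be sketched in a few lines: single link takes the closest pair of members, complete link the farthest pair. The Euclidean point distance (`math.dist`) and the 2-D points are assumptions for illustration.

```python
import math
from itertools import product


def single_link(c1, c2, dist):
    """Distance between clusters = distance of their closest pair of members."""
    return min(dist(a, b) for a, b in product(c1, c2))


def complete_link(c1, c2, dist):
    """Distance between clusters = distance of their farthest pair of members."""
    return max(dist(a, b) for a, b in product(c1, c2))


# Hypothetical 2-D points for two small clusters.
cluster1 = [(0.0, 0.0), (1.0, 0.0)]
cluster2 = [(4.0, 0.0), (7.0, 0.0)]
print(single_link(cluster1, cluster2, math.dist))    # 3.0
print(complete_link(cluster1, cluster2, math.dist))  # 7.0
```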
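Example: distances between five points x1, …, x5
$$
D = \begin{pmatrix}
0 & 2 & 2.5 & 5.39 & 5 \\
2 & 0 & 1.5 & 5 & 5.29 \\
2.5 & 1.5 & 0 & 3.5 & 4.03 \\
5.39 & 5 & 3.5 & 0 & 2 \\
5 & 5.29 & 4.03 & 2 & 0
\end{pmatrix}
$$
Single link dendrogram (leaf order x2, x3, x1, x4, x5): x2 and x3 merge at 1.5; x1 joins {x2, x3} and x4 joins x5 at 2.0; the two remaining clusters merge at 3.5.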
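A naive agglomerative procedure on this five-point distance matrix can be sketched as follows: repeatedly merge the two closest clusters, where "closest" uses single link (min) or complete link (max). Tie-breaking by first pair found is an assumption of this sketch.

```python
def agglomerate(D, linkage="single"):
    """Naive agglomerative clustering on a distance matrix D (list of lists).

    Returns the merges as (cluster_a, cluster_b, height) with clusters as
    tuples of point indices; ties go to the first pair found.
    """
    combine = min if linkage == "single" else max   # single vs complete link
    clusters = [(i,) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Distances between x1..x5 (indices 0..4), as read from the example slide.
D = [[0,    2,    2.5,  5.39, 5],
     [2,    0,    1.5,  5,    5.29],
     [2.5,  1.5,  0,    3.5,  4.03],
     [5.39, 5,    3.5,  0,    2],
     [5,    5.29, 4.03, 2,    0]]

for linkage in ("single", "complete"):
    heights = [h for _, _, h in agglomerate(D, linkage)]
    print(linkage, heights)
# single   -> [1.5, 2.0, 2.0, 3.5]
# complete -> [1.5, 2.0, 2.5, 5.39]
```

Cutting the single-link tree at any threshold between 2.0 and 3.5 leaves two clusters, {x1, x2, x3} and {x4, x5}.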
Divisive hierarchical clustering
M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004)
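Definition of edge betweenness
$$
B_k(i,j) = \frac{\text{number of shortest paths between } i \text{ and } j \text{ passing through edge } k}{\text{number of shortest paths connecting nodes } i \text{ and } j}
$$
In the slide's example, 2 of the 5 paths connecting i and j pass through edge k, so B_k(i,j) = 2/5.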
Definition of edge betweenness
$$
\text{edge betweenness of edge } k = \sum_{i,j} B_k(i,j)
$$
$$
\text{scaled edge betweenness of edge } k = \frac{2}{(N-2)(N-3)} \sum_{i,j} B_k(i,j)
$$
Calculation of edge betweenness
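A minimal pure-Python sketch of the unscaled edge betweenness, followed by one divisive step that deletes the edge with the largest value (recomputing after every removal, as in the Newman and Girvan scheme cited above). Assumptions: the graph is a dict of neighbor sets, and the sum runs over unordered pairs i < j (a sum over ordered pairs would double every value). The number of shortest s-t paths through edge a→b is σ(s,a)·σ(b,t), counted only when d(s,a) + 1 + d(b,t) = d(s,t).

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Hop distances from s and the number of shortest paths from s to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def edge_betweenness(adj):
    """Sum of B_k(i, j) over unordered node pairs i < j, for every edge k."""
    nodes = sorted(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    eb = {e: 0.0 for e in edges}
    for s, t in combinations(nodes, 2):
        dist_s, sig_s = info[s]
        if t not in dist_s:
            continue                       # s and t are disconnected
        dist_t, sig_t = info[t]
        for u, v in edges:
            for a, b in ((u, v), (v, u)):  # try both directions of the edge
                if a in dist_s and b in dist_t and dist_s[a] + 1 + dist_t[b] == dist_s[t]:
                    # shortest s-t paths through a -> b, divided by all shortest s-t paths
                    eb[(u, v)] += sig_s[a] * sig_t[b] / sig_s[t]
    return eb


def girvan_newman_step(adj):
    """One divisive step: delete the edge with the largest betweenness."""
    eb = edge_betweenness(adj)
    u, v = max(eb, key=eb.get)
    adj[u].discard(v)
    adj[v].discard(u)
    return (u, v), eb[(u, v)]


# Hypothetical network: two triangles joined by the bridge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(edge_betweenness(adj).items()))
print(girvan_newman_step(adj))  # ((2, 3), 9.0): the bridge is removed first
```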
Quantitative measurement of network modularity
Modularity Q
$$
Q = \sum_i \left( e_{ii} - a_i^2 \right), \qquad a_i = \sum_j e_{ij}
$$
e_ij is the fraction of edges in the network connecting module i and module j.
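A minimal sketch of computing Q from an edge list and a node-to-module labeling (both hypothetical inputs). It assumes the symmetric convention in which an edge between two different modules contributes half to e_ij and half to e_ji, so that the a_i sum to 1.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Q = sum_i (e_ii - a_i^2) for an undirected edge list and node -> module labels."""
    m = len(edges)
    e = defaultdict(float)
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        if cu == cv:
            e[(cu, cu)] += 1.0 / m           # edge inside module cu
        else:
            e[(cu, cv)] += 0.5 / m           # edge between modules: split evenly
            e[(cv, cu)] += 0.5 / m
    modules = set(labels.values())
    a = {i: sum(e[(i, j)] for j in modules) for i in modules}
    return sum(e[(i, i)] - a[i] ** 2 for i in modules)


# Hypothetical network: two triangles joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
whole = {n: "A" for n in range(6)}
print(round(modularity(edges, split), 4))  # 0.3571, i.e. 5/14
print(modularity(edges, whole))            # numerically zero: one module has no structure
```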
Threshold selection
Karate club network
Examples of agglomerative hierarchical clustering
Can we identify the modules?
$$
O_T(i,j) = \frac{J(i,j)}{\min(k_i, k_j)}
$$
J(i,j): number of nodes to which both i and j are linked, plus 1 if there is a direct (i,j) link; k_i: degree of node i.
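A minimal sketch of the topological overlap for one node pair, assuming the graph is stored as a dict of neighbor sets; returning 0 when a node is isolated is an extra convention of this sketch.

```python
def topological_overlap(adj, i, j):
    """O_T(i, j) = J(i, j) / min(k_i, k_j) for an undirected graph (adj: dict of sets)."""
    J = len(adj[i] & adj[j])          # nodes linked to both i and j
    if j in adj[i]:
        J += 1                        # +1 for a direct (i, j) link
    k_min = min(len(adj[i]), len(adj[j]))
    return J / k_min if k_min > 0 else 0.0   # isolated node: overlap set to 0 (assumption)


# Hypothetical network: triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(topological_overlap(adj, 0, 1))  # 1.0  (same triangle)
print(topological_overlap(adj, 2, 3))  # 0.5  (linked, but no shared neighbors)
print(topological_overlap(adj, 0, 4))  # 0.0  (no shared neighbors, no direct link)
```

A quantity such as 1 − O_T(i,j) can then serve as the distance d(i,j) for agglomerative clustering.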
Modules in the E. coli metabolism
E. Ravasz et al., Science, 2002
Pyrimidine metabolism
Yeast signaling proteins in MIPS
PNAS 100, 1128 (2003).
$$
A_{ij} = \frac{1}{l_{ij}^2}
$$
l_ij: shortest path length between proteins i and j
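A minimal sketch of this weighting: compute all-pairs hop distances by breadth-first search, then set A_ij = 1/l_ij². The adjacency-list input is hypothetical, and setting the weight to 0 on the diagonal and for unreachable pairs is an assumption of this sketch.

```python
from collections import deque

import numpy as np


def shortest_path_lengths(adj, n):
    """All-pairs hop distances by BFS; np.inf where no path exists."""
    L = np.full((n, n), np.inf)
    for s in range(n):
        L[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if L[s, w] == np.inf:        # w not reached from s yet
                    L[s, w] = L[s, u] + 1
                    queue.append(w)
    return L


def inverse_square_weights(adj, n):
    """A_ij = 1 / l_ij^2 for reachable pairs i != j; 0 elsewhere (assumption)."""
    L = shortest_path_lengths(adj, n)
    W = np.zeros((n, n))
    off = (L > 0) & np.isfinite(L)
    W[off] = 1.0 / L[off] ** 2
    return W


# Hypothetical network: path 0-1-2-3 plus an isolated node 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
W = inverse_square_weights(adj, 5)
print(W[0])  # weights from node 0: 0 (itself), 1, 1/4, 1/9, 0 (unreachable)
```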
Spotted microarray for Saccharomyces cerevisiae
Similarity measure
Regulatory module network
Genome Biology 9, R2 (2008).