Transcript slides

Evolutionary Clues Embedded In
Network Structure
Eur. Phys. J. B 85, 106 (2012)
Zhu Guimei
NGS Graduate School for Integrative Science & Engineering,
Centre for Computational Sciences & Engineering,
National University of Singapore
Outline
• Introduction
• Localizations on complex networks
• Evolutionary ages
• Conclusions
Objectives and Scope
• Detecting structural patterns at different scales: multi-scale structure
• Finding an interesting network evolution mechanism based on multi-scale network structure
Complex Networks
Real-World Networks
Communication networks: telephone, internet, www…
Transportation networks: airports, highways, rail, electric power…
Biological networks: genetic, protein-protein interaction, metabolic…
Social networks: friendship networks, collaboration networks…
Figures: scientific collaboration networks, Internet networks, protein networks
Structure, Functions, Dynamics
Structure (degree, motif, modularity, …) → dynamic processes at different structural scales → function (?)
Dynamics: transport of mass, energy, signals, information, …
Structural measures are therefore the cornerstone for understanding the relations among structure, dynamics, and function.
How to measure multi-scale structure?
Dynamics on different structural scales:
Microscopic (degree, motif, clustering coefficient, …) → ? → Macroscopic (modules)
What is a meso (intermediate) pattern?
Define the mesoscopic pattern
In physics, the mesoscopic scale is well defined: it refers to materials with an intermediate length scale in condensed matter physics, with sizes between those of molecules and microns.
In complex networks, however, it is not yet well defined.
We detect structural patterns at different scales through a localization method.
Detect structure through localization: how?
We map a network to a large cluster (molecule): nodes as atoms, edges as bonds.
Consider an undirected complex network with N identical nodes; its topological structure can be described by the adjacency matrix $A_{ij}$ (or the Laplacian matrix $L_{ij}$).
For an electron moving in such a molecule, the tight-binding Hamiltonian is
$$\hat{H} = \sum_{n=1}^{N} \varepsilon_n |n\rangle\langle n| + \sum_{m \neq n}^{N} A_{mn} t_{mn} |m\rangle\langle n|$$
Hückel model (adjacency matrix): $\varepsilon_i = \varepsilon_0 = 0$ $(i = 1, 2, 3, \ldots, N)$, $t_{mn} = 1$, so that $\hat{H} = A$.
Adjacency matrix: $A_{ij} = 1$ if nodes i and j are connected, otherwise 0; all diagonal elements are 0.
Laplacian matrix:
$$L_{ij} = \begin{cases} k_i = \sum_{j=1}^{N} A_{ij}, & i = j \\ -A_{ij}, & i \neq j \end{cases} \qquad \text{i.e.} \qquad L_{ij} = k_i \delta_{ij} - A_{ij}$$
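A minimal NumPy sketch of the two Hückel choices above, on a small hypothetical 4-node path: it builds $A$ and $L = \mathrm{diag}(k) - A$ from an edge list and checks that the tight-binding Hamiltonian reduces to $A$ (with $\varepsilon_n = 0$, $t_{mn} = 1$) or to $L$ (with $\varepsilon_n = k_n$, $t_{mn} = 1$). The helper names and the uniform hopping parameter `t` are illustrative assumptions, not code from the paper.

```python
import numpy as np

def adjacency_matrix(n_nodes, edges):
    """Symmetric 0/1 adjacency matrix of an undirected simple graph."""
    A = np.zeros((n_nodes, n_nodes), dtype=float)
    for i, j in edges:
        if i == j:
            raise ValueError("self-loops are not allowed")
        A[i, j] = A[j, i] = 1.0
    return A

def laplacian_matrix(A):
    """L_ij = k_i * delta_ij - A_ij, where k_i is the degree of node i."""
    k = A.sum(axis=1)
    return np.diag(k) - A

def tight_binding_hamiltonian(onsite, hopping_matrix, t=1.0):
    """H = sum_n eps_n |n><n| + sum_{m != n} M_mn * t |m><n| (uniform hopping t, an assumption)."""
    H = t * np.asarray(hopping_matrix, dtype=float)  # new array; off-diagonal part = t * M
    np.fill_diagonal(H, onsite)                      # diagonal = on-site energies eps_n
    return H

# Hypothetical example: a 4-node path 0-1-2-3
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
L = laplacian_matrix(A)
H_A = tight_binding_hamiltonian(np.zeros(4), A)   # eps_n = 0,   t_mn = 1  ->  H = A
H_L = tight_binding_hamiltonian(A.sum(axis=1), L)  # eps_n = k_n, t_mn = 1  ->  H = L
print(np.allclose(H_A, A), np.allclose(H_L, L))  # True True
```

Every row of $L$ sums to zero, which is why the constant vector is always an eigenvector with eigenvalue 0.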
Emergence of different scale structures on complex networks
Dynamics on networks: diffusive process
Transport processes on networks: from micro- to macro-scales
2. Tight-binding Hamiltonian with the Laplacian matrix (diffusive process):
$$\hat{H} = \sum_{n=1}^{N} \varepsilon_n |n\rangle\langle n| + \sum_{m \neq n}^{N} L_{mn} t_{mn} |m\rangle\langle n|$$
Hückel model: $\varepsilon_i = k_i$ $(i = 1, 2, 3, \ldots, N)$, $t_{mn} = 1$, so that $\hat{H} = L$.
Structures of networks: from the micro-scale (motifs, …) to the macro-scale (modules).
Different eigenvalues represent different structural patterns.
How to describe localizations on complex networks?
The localization properties of electrons in the clusters can be used as measures of the structural properties of the networks, so different structural patterns can be detected from the spectra of complex networks.
The eigenvalues of L can be ranked as $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$.
Since $\hat{H} = L$, they correspond to the eigenfunctions from low to high energies.
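The ranking can be reproduced numerically with `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order together with unit-norm eigenvectors as columns. The sketch below uses a hypothetical 6-node ring, whose Laplacian eigenvalues are known in closed form ($2 - 2\cos(2\pi k/6)$, $k = 0, \ldots, 5$), as a sanity check.

```python
import numpy as np

def laplacian_spectrum(A):
    """Eigenvalues (ascending) and eigenvectors (columns) of L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)  # symmetric input -> real, ascending eigenvalues
    return eigvals, eigvecs

# Hypothetical example: ring of 6 nodes
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

lam, X = laplacian_spectrum(A)
print(lam)  # approximately [0, 1, 1, 3, 3, 4]
```

$\lambda_1 = 0$ always (constant eigenvector), and $\lambda_2 > 0$ exactly when the network is connected, which is one reason to work with the largest connected component.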
How do different eigenvalues represent structural patterns at different scales?
In eigenspace (for complex networks):
• Each eigenstate represents a specific wave function.
• Each eigenstate is sensitive to structural patterns that match its wavelength in size.
Eigenvectors associated with small eigenvalues usually have large wavelengths, so they are sensitive to perturbations on large sets of nodes in the network.
Eigenvectors associated with large eigenvalues have small wavelengths, so they are most sensitive to localized perturbations applied to a small set of nodes in the network.
Hence, the eigenvalues from $\lambda_2$ to $\lambda_N$ can detect structural patterns from macro- to micro-scales.
Eigenstates are sensitive to structural patterns that match their wavelength in size
(a): The eigenstates on a perfect regular network are periodic waves with the wavelengths from  to 2.
(b): We construct a local deformation in the segment from the 40th to the 60th node by adding edges. The eigenstates with large eigenvalues localize mainly in this region (local peak).
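The experiment in (b) can be mimicked with a short sketch. The specific choices here are assumptions, not the paper's setup: a ring of 100 nodes where each node links to its d = 2 nearest neighbours on each side, and a hypothetical deformation that adds edges i → i+3 and i → i+4 inside nodes 40 to 60. The check is that the eigenvector of the largest eigenvalue carries almost all of its weight near the deformed segment.

```python
import numpy as np

def ring_lattice(n, d):
    """Regular ring (assumed): node i is linked to its d nearest neighbours on each side."""
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(1, d + 1):
            A[i, (i + j) % n] = A[(i + j) % n, i] = 1.0
    return A

def laplacian_eigs(A):
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigh(L)  # ascending eigenvalues, unit-norm eigenvectors as columns

n = 100
A = ring_lattice(n, d=2)
# Hypothetical local deformation: extra edges i -> i+3, i -> i+4 inside nodes 40..60
for i in range(40, 61):
    for j in (3, 4):
        if i + j <= 60:
            A[i, i + j] = A[i + j, i] = 1.0

lam, X = laplacian_eigs(A)
top = X[:, -1]                                # eigenvector of the largest eigenvalue
weight_in_region = np.sum(top[30:71] ** 2)    # share of |X_i|^2 on nodes 30..70
print(lam[-1], weight_in_region)
```

The largest eigenvalue lies above the band of the undeformed ring (its maximum is about 6.25 for d = 2), so the corresponding state cannot propagate into the regular part and stays confined near the deformation.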
Methods to detect multi-scale structures:
The components of each eigenvector of L: $X_i$, $i = 1, 2, \ldots, N$.
Standardized components of the eigenvectors:
$$X_i^s = \frac{|X_i|}{\max_j |X_j|}, \qquad i = 1, 2, \ldots, N,$$
where $\max_j |X_j|$ is the value of the largest component.
A threshold $\theta$ can then be used to identify the nodes involved in the corresponding scale structure: nodes with large standardized components ($X_i^s \ge \theta$) are regarded as the nodes involved in that structure.
For each scale structure, the components of the nodes involved in it are distinguishably large compared with the others. Hence, the $\theta$-based results are robust.
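The standardization and threshold step can be written in a few lines. This is a sketch: the threshold value θ = 0.5 and the example eigenvector below are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def standardized_components(x):
    """X_i^s = |X_i| / max_j |X_j| for a single eigenvector x."""
    a = np.abs(np.asarray(x, dtype=float))
    return a / a.max()

def nodes_in_scale_structure(x, theta=0.5):
    """Indices of nodes whose standardized component reaches theta (theta is an assumed value)."""
    return np.flatnonzero(standardized_components(x) >= theta)

# Hypothetical eigenvector with two dominant components
x = np.array([0.02, -0.70, 0.65, -0.05, 0.10, 0.01])
print(nodes_in_scale_structure(x, theta=0.5))  # [1 2]
```

Because the involved components are much larger than the rest, moving θ over a moderate range leaves the selected node set unchanged, which is the robustness argument above.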
The Santa Fe Institute collaboration network (part)
We consider part of the largest component of the Santa Fe Institute collaboration network, N = 76.
The largest eigenvalues $\lambda_{76}$, $\lambda_{75}$, $\lambda_{74}$ detect the three hubs 40, 7 and 67 (red).
$\lambda_{73}$: a group of nodes numbered 17–25 (green)
$\lambda_{72}$: nodes 26–29, and also 34 (cyan)
$\lambda_{70}$: nodes 41–47 (blue)
$\lambda_{69}$: nodes 1–6 (magenta)
$\lambda_{68}$: nodes 48–53 (violet)
...
As the eigenvalue decreases, clusters at much larger scales can be identified (not shown).
Eigenmodes and Average Evolutionary Age:
BA Scale Free network
Three scale-free networks with w = 2, 4, 8 edges added per new node:
(a–c) average evolutionary age,
(d–f) average degree (on a logarithmic scale),
(g–i) size of the eigenvector,
each versus the eigenvalue index i.
Eigenvectors associated with large eigenvalues generally have small sizes, but their nodes are “older” in the network.
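The three quantities plotted per eigenvalue index (eigenvector size, average age of the involved nodes) can be computed along the following lines. This is a sketch under stated assumptions, none confirmed by the slides: the seed is a complete graph on w + 1 nodes, a node's age is N minus its arrival index (earlier nodes are older), and "size" counts nodes whose standardized component reaches a hypothetical θ = 0.5.

```python
import numpy as np

def barabasi_albert(n, w, rng):
    """Preferential growth: each new node adds w distinct edges, P(attach to i) ~ k_i.
    Assumed seed: complete graph on w + 1 nodes. No self- or double edges."""
    m0 = w + 1
    A = np.zeros((n, n))
    A[:m0, :m0] = 1.0 - np.eye(m0)
    for new in range(m0, n):
        k = A[:new, :new].sum(axis=1)                       # degrees of existing nodes
        targets = rng.choice(new, size=w, replace=False, p=k / k.sum())
        A[new, targets] = A[targets, new] = 1.0
    return A

def eigenmode_ages(A, theta=0.5):
    """Per Laplacian eigenvector: eigenvalue, size of node set, mean age of that set."""
    L = np.diag(A.sum(axis=1)) - A
    lam, X = np.linalg.eigh(L)
    n = A.shape[0]
    age = n - np.arange(n)          # hypothetical age: node 0 (first arrival) is the oldest
    sizes, mean_ages = [], []
    for col in range(n):
        a = np.abs(X[:, col])
        members = np.flatnonzero(a / a.max() >= theta)
        sizes.append(members.size)
        mean_ages.append(age[members].mean())
    return lam, np.array(sizes), np.array(mean_ages)

rng = np.random.default_rng(0)
A = barabasi_albert(300, w=2, rng=rng)
lam, sizes, mean_ages = eigenmode_ages(A)
# Compare modes with the largest eigenvalues against low-lying modes
print(mean_ages[-10:].mean(), mean_ages[1:11].mean())
```

Plotting `mean_ages`, `sizes` and the mean degree of each node set against the eigenvalue index gives panels analogous to (a–i) for one sample.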
Eigenvalues compared with degree as descriptors of average evolutionary age
Eigenmodes and Average Evolutionary Age:
Scale-free networks generated by another mechanism
Scale-free networks generated by the duplication/divergence-based mechanism from the PPI network of baker's yeast.
(d) Average age versus degree. Because of large fluctuations, degree cannot give age-related information, but the eigenvalues can.
Y11k: PPI network: Evolutionary Age
Yeast 11k network:
• Original data: 80 000 interactions among 5400 yeast proteins.
• We focus on the 11 855 interactions with high and medium confidence among 2617 proteins.
• Finally, we consider only the largest connected component, which contains 2235 of these 2617 proteins.
Protein-protein interaction networks: Isotemporal Classification of Proteins
First, we classify all yeast proteins into four isotemporal categories: prokaryotes, eukarya, fungi, and yeast only (yeast proteins without annotation).
Based on the universal tree of life, we assign evolutionary ages 4, 3, 2, 1, from ancient to modern: prokaryotes 4, eukarya 3, fungi 2, and yeast only 1.
(1) C. von Mering et al., Nature 417, 399 (2002).
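Given the isotemporal labels, the average evolutionary age of any node set (for example, the nodes selected by one eigenvector) is a simple lookup and mean. The protein identifiers and labels below are hypothetical; only the ordering 4, 3, 2, 1 from ancient to modern follows the classification.

```python
# Ages from ancient (4) to modern (1), one per isotemporal category
AGE_BY_CATEGORY = {"prokaryotes": 4, "eukarya": 3, "fungi": 2, "yeast only": 1}

def average_age(proteins, category_of):
    """Mean evolutionary age of a set of proteins, given each protein's category."""
    ages = [AGE_BY_CATEGORY[category_of[p]] for p in proteins]
    return sum(ages) / len(ages)

# Hypothetical protein labels for illustration only
category_of = {"P1": "prokaryotes", "P2": "fungi", "P3": "yeast only", "P4": "eukarya"}
print(average_age(["P1", "P2", "P3"], category_of))  # 7/3, about 2.33
```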
Eigenmodes and Average Evolutionary Age:
PPI Network
For the largest connected component of the PPI network of baker's yeast, with 2235 nodes:
(d) Average age versus degree. Degree does not reveal age-related information.
Summary
The localization properties of the eigenvectors from high
to low energies can detect patterns from micro- to macroscales.
Interestingly, the patterns contain significant clues of evolutionary age.
References
(1) G.M. Zhu, H.J. Yang, R. Yang, J. Ren, B. Li, and Y.-C. Lai, Eur. Phys. J. B 85, 106 (2012).
(2) G.M. Zhu, H.J. Yang, C. Yin, and B. Li, Localizations on complex networks, Phys. Rev. E 77, 066113 (2008).
(3) H.J. Yang, C. Yin, G.M. Zhu, and B. Li, Phys. Rev. E 77, 045101(R) (2008).
Complex Networks: Nontrivial Properties
A: random; small-world; scale-free (power-law degree distribution)
B: motif, modularity, hierarchy
C: fractal properties, and so on
Figures: (A) ER random network, N = 100, connection probability p = 0.02; SW network, rewiring probability r = 0.1; (B) hierarchical networks; (C) cauliflower, which is fractal in nature (self-similarity).
Santo Fortunato, Physics Reports 486 (2010) 75–174
BA scale-free network, N = 100, w = 2 (edges added per new node)
Complex Networks: Basic Concepts
Structure Description
Hierarchical Description: Module
Function
Structure Multi-Scale Measures
Fields: graph theory, bioinformatics, social networks
• Micro (node/edge-based, averaged): degree, clustering coefficient, shortest path; small-world, scale-free
• Motif … what is more?
• Macro (global): community, hierarchy, clustering (Newman)
Dynamics: from micro to macro. Dynamic processes are the bridge between structure and function.
R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002);
M. Newman, SIAM Review 45, 167–256 (2003);
C. Song et al., Nature 433, 392 (2005); Nature Physics 2, 275 (2006).
What is a mesoscopic pattern?
In physics, the mesoscopic scale is well defined: it refers to materials with an intermediate length scale in condensed matter physics, with sizes between a small quantity of atoms (such as molecules) and materials measuring microns.
In complex networks, it is not yet well defined.
It could be regarded as community structure in complex networks (but there are also other formations, such as tree or star structures).
We define it as intermediate-length-scale structures based on structure-induced localization.
What is a community in complex networks?
Community (clusters, modules): groups of vertices characterized by having more internal than external connections.
They share common properties and/or play similar roles within the graph.
Community detection methods:
• Graph partitioning
• Hierarchical clustering
• Partitional clustering
• Spectral clustering
It is a hot topic, but even the definition of a community is a controversial issue; people are still improving methods to detect the true communities in real-world networks.
Santo Fortunato, Physics Reports 486 (2010) 75–174
S. Fortunato and C. Castellano, 2009 (Springer, Berlin, Germany), volume 1, eprint arXiv:0712.2716.
Y11k: PPI network: multi-scale analysis
How to detect communities? Several methods:
Traditional methods
• Graph partitioning: dividing the vertices into g groups of predefined size
• Hierarchical clustering: based on a similarity measure between vertices
• Partitional clustering: separating the points into k clusters so as to maximize or minimize a given cost function based on distances between points
• Spectral clustering: using eigenvectors of the adjacency or Laplacian matrix
Divisive algorithms
• The algorithm of Girvan and Newman: removes edges according to the values of edge-centrality measures, which estimate the importance of edges according to some property or process running on the graph
Modularity-based methods
• Modularity optimization
• Modifications of modularity
• Limits of modularity
Spectral algorithms: use the eigenvalues and eigenvectors
Ahn, Y. Y., J. P. Bagrow, et al. (2010). "Link communities reveal multiscale complexity in networks." Nature 466(7307): 761–764.
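To make the spectral and modularity items concrete, here is a sketch of the classic Fiedler-vector bisection (split nodes by the sign of the eigenvector of $\lambda_2$) scored with Newman's modularity $Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$. This is a standard textbook technique, not the localization method of these slides, and the two-triangle test graph is hypothetical.

```python
import numpy as np

def fiedler_bisection(A):
    """Split nodes by the sign of the Fiedler vector (eigenvector of lambda_2 of L)."""
    L = np.diag(A.sum(axis=1)) - A
    _, X = np.linalg.eigh(L)            # ascending eigenvalues; column 1 is the Fiedler vector
    return (X[:, 1] >= 0).astype(int)   # label 0 or 1 per node (overall sign is arbitrary)

def modularity(A, labels):
    """Newman's Q = (1/2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    k = A.sum(axis=1)
    two_m = k.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

# Hypothetical test graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = fiedler_bisection(A)
Q = modularity(A, labels)
print(labels, round(Q, 4))  # the two triangles are separated; Q = 5/14, about 0.3571
```

Repeating the bisection recursively, or keeping several low-lying eigenvectors, extends this to more than two groups.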
Small-world, scale-free, and whole-cell networks
WSSW model: we first construct a regular circular lattice in which each node connects with its d right-handed nearest neighbors. Each edge is then rewired with probability $p_r$ to another randomly selected node. Self- and double edges are forbidden.
BASF model: preferential growth mechanism. Starting from several connected nodes as a seed, at each growth step a new node is added and w edges are established between this node and the existing network. The probability for an existing node to be connected with the new node is proportional to its degree. Self- and double edges are forbidden. For the resulting networks, the degree distribution obeys a power law.
Whole-cell networks: we consider cellular functions such as intermediate metabolism and bioenergetics, information pathways, electron transport, and transmembrane transport. Directed edges are simply replaced with undirected edges. We consider only cellular networks with more than 500 nodes.
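A sketch of the WSSW construction as described: a ring where each node links to its d right-hand neighbours, then each original edge is rewired with probability $p_r$ to a randomly chosen node, skipping choices that would create self- or double edges. Keeping the original endpoint i and moving the other end is an assumption about which end is rewired; the code also assumes d < n/2 so the lattice has no duplicate edges.

```python
import numpy as np

def ws_small_world(n, d, p_r, rng):
    """WSSW sketch: ring lattice (node i -> i+1..i+d), then rewire each edge with prob. p_r.
    Assumes d < n / 2. Self- and double edges are never created, so the edge count stays n*d."""
    A = np.zeros((n, n), dtype=int)
    edges = [(i, (i + j) % n) for i in range(n) for j in range(1, d + 1)]
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    for i, j in edges:
        if rng.random() < p_r:
            # Allowed new endpoints: not i itself and not already linked to i
            candidates = np.flatnonzero((A[i] == 0) & (np.arange(n) != i))
            if candidates.size == 0:
                continue
            new_j = rng.choice(candidates)
            A[i, j] = A[j, i] = 0                 # remove the old edge (i, j)
            A[i, new_j] = A[new_j, i] = 1         # add the rewired edge (i, new_j)
    return A

A = ws_small_world(100, d=2, p_r=0.1, rng=np.random.default_rng(0))
print(A.sum() // 2)  # number of edges is preserved: 100 * 2 = 200
```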
Statistical properties of the spectra
The PDF of the nearest-neighbor level spacing (NNLS) obeys the Brody distribution:
$$U(s) = \frac{\beta + 1}{\alpha} \left(\frac{s}{\alpha}\right)^{\beta} \exp\left[-\left(\frac{s}{\alpha}\right)^{\beta + 1}\right],$$
where $s$ is the NNLS and $\alpha$ is the characteristic distribution width. The limit $\beta = 0$ gives the Poisson distribution (localized states) and $\beta = 1$ gives the Wigner-like distribution (extended states).
To obtain the value of $\beta$, we use the accumulated function
$$C(s) = \int_0^s U(x)\,dx = 1 - \exp\left[-\left(\frac{s}{\alpha}\right)^{\beta + 1}\right].$$
Some trivial calculations lead to:
$$\ln R(s) \equiv \ln \ln \frac{1}{1 - C(s)} = (\beta + 1) \ln s - (\beta + 1) \ln \alpha.$$
From this formula, we can determine the values of $\alpha$ and $\beta$.
Fig. Value of the Brody parameter $\beta$ versus the network parameters $p_r$ and $w$: (a) WSSW and (b) BASF networks.
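The linearization turns the Brody fit into a straight-line fit of $\ln R(s)$ against $\ln s$: the slope is $\beta + 1$ and the intercept is $-(\beta + 1)\ln\alpha$. The sketch below assumes the spacings already come from an unfolded spectrum (unfolding is not shown) and uses midpoint plotting positions for the empirical $C(s)$, which is one common choice rather than the paper's stated procedure. The synthetic input is drawn at exact Brody quantiles, so the fit recovers the parameters.

```python
import numpy as np

def fit_brody(spacings):
    """Least-squares fit of ln ln[1/(1 - C(s))] = (beta+1) ln s - (beta+1) ln alpha."""
    s = np.sort(np.asarray(spacings, dtype=float))
    s = s[s > 0]                                  # ln s needs strictly positive spacings
    n = s.size
    C = (np.arange(1, n + 1) - 0.5) / n           # empirical C(s_i), kept strictly inside (0, 1)
    y = np.log(np.log(1.0 / (1.0 - C)))           # ln R(s)
    slope, intercept = np.polyfit(np.log(s), y, 1)
    beta = slope - 1.0                            # slope = beta + 1
    alpha = np.exp(-intercept / slope)            # intercept = -(beta + 1) ln alpha
    return beta, alpha

def brody_quantiles(n, beta, alpha):
    """Synthetic spacings at evenly spaced quantiles, using the inverse of C(s)."""
    u = (np.arange(1, n + 1) - 0.5) / n
    return alpha * np.log(1.0 / (1.0 - u)) ** (1.0 / (beta + 1.0))

beta_hat, alpha_hat = fit_brody(brody_quantiles(500, beta=0.4, alpha=1.3))
print(round(beta_hat, 6), round(alpha_hat, 6))  # 0.4 1.3
```

With real spectra, $\beta$ near 0 signals predominantly localized states and $\beta$ near 1 predominantly extended ones.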
Wavelet Transform
Wavelets are mathematical functions that cut up data into different frequency
components, and then study each component with a resolution matched to its scale.
They have advantages over traditional Fourier methods in analyzing physical
situations where the signal contains discontinuities and sharp spikes.
Wavelets were developed independently in the fields of mathematics, quantum
physics, electrical engineering, and seismic geology. Interchanges between these
fields during the last ten years have led to many new wavelet applications such as
image compression, turbulence, human vision, radar, and earthquake prediction.
http://www.amara.com/IEEEwave/IEEEwavelet.html
Fractal properties on networks: wavelet transform (WT)
We assume the probability values have been sorted in ascending order: $x_1 \le x_2 \le \cdots \le x_N$.
The WT can detect fractal properties based on this ascending-order-ranked series. As a standard procedure, we first find the WT maximal values $T_g(a, k)$ at positions $k = k_1, k_2, \ldots, k_J$, where $a$ is the given scale. In the limit of small scales, the partition function should scale as
$$Z(a, q) = \sum_{k = k_1}^{k_J} \left| T_g(a, k) \right|^q \sim a^{\tau(q)}.$$
The fractal dimension $D(h)$ (statistical properties of subsets) can be obtained through the Legendre transform:
$$D(h) = q h - \tau(q), \qquad h = \frac{d\tau(q)}{dq}.$$
Local Hurst exponent $h$: denotes local subsets.
Positive $q$: $\tau(q)$ reflects the scaling of large fluctuations.
Negative $q$: $\tau(q)$ reflects the scaling of small fluctuations.
Fig. 5. The branched multifractal behavior of the whole-cell network of M. jannaschii is presented as a typical example.
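The last two steps (reading $\tau(q)$ off the scaling of $Z(a, q)$, then the Legendre transform) can be sketched numerically. The wavelet transform and the tracing of maxima lines are not shown; the input $Z$ here is synthetic, built from an assumed monofractal $\tau(q) = qH - 1$ with $H = 0.7$, for which the spectrum collapses to the single point $h = H$, $D = 1$.

```python
import numpy as np

def tau_from_partition(scales, Z):
    """tau(q) as the slope of ln Z(a, q) versus ln a; Z has shape (len(scales), len(q))."""
    log_a = np.log(np.asarray(scales, dtype=float))
    return np.polyfit(log_a, np.log(Z), 1)[0]     # one fitted slope per column (per q)

def legendre_spectrum(q, tau):
    """h = d tau / dq (numerical derivative) and D(h) = q h - tau(q)."""
    q = np.asarray(q, dtype=float)
    h = np.gradient(tau, q)
    return h, q * h - tau

# Synthetic monofractal input (assumption): Z(a, q) = a ** tau(q), tau(q) = q * H - 1
q = np.linspace(-5, 5, 21)
scales = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
H = 0.7
Z = scales[:, None] ** (q * H - 1.0)[None, :]

tau = tau_from_partition(scales, Z)
h, D = legendre_spectrum(q, tau)
print(h[:3], D[:3])
```

For a multifractal series, $h$ varies with $q$ and $D(h)$ traces a curve instead of collapsing to a point.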
Structure, Functions, Dynamics
• Structural measures (cornerstone for understanding the relations): degree, clustering coefficient, shortest path, …
• Functions: ?
• Dynamics (can be regarded as transport processes of mass, energy, signals, information, and so on): diffusive processes at different structural scales
L. K. Gallos, C. Song, S. Havlin, Proc. Natl. Acad. Sci. U.S.A. 104, 7746 (2007).
H. Yang, C. Yin, G. Zhu, and B. Li, Phys. Rev. E 77, 045101(R) (2008).
G.M. Zhu, H.J. Yang, C. Yin, and B. Li, Phys. Rev. E 77, 066113 (2008).
Protein-protein interaction networks
DNA sequence → proteins → protein-protein interaction networks
Functions of proteins are realized by protein-protein interactions:
1. Signal transduction: interactions between signaling molecules
2. Protein complexes
Protein-protein interactions:
* One protein carries another, e.g., from the cytoplasm to the nucleus
* One protein modifies another
* Complex formation often serves to activate or inhibit one or more of the associated proteins
Metabolic networks (life processes)
The metabolism of an organism is the basic chemical system that generates:
(1) essential components such as amino acids, sugars, and lipids,
(2) the energy required to synthesize them,
(3) and the energy required to use them in creating proteins and cellular structures.
This system of connected chemical reactions is a metabolic network.