LECTURE 3
1. Complex Network Models
2. Properties of Protein-Protein Interaction
Networks
3. Usage of KNApSAcK Database
Complex Network Models:
Average Path length L, Clustering coefficient C, Degree
Distribution P(k) help understand the global structure
of the network.
Some well-known types of Network Models are as
follows:
•Regular Coupled Networks
•Random Graphs
•Small world Networks
•Scale-free Networks
•Hierarchical Networks
Regular networks
Diamond Crystal
Both diamond and
graphite are carbon
Graphite Crystal
Regular network (A ring lattice)
Average path length L is
high
Clustering coefficient C is
high
Degree distribution is delta
type.
[Figure: degree distribution P(k) of the ring lattice, a single peak of height 1]
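The ring-lattice properties listed above can be checked numerically. The following is a minimal sketch, assuming a ring of n nodes in which each node is linked to its k nearest neighbours; the function names and the values n = 12, k = 4 are illustrative choices, not taken from the slides.

```python
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all node pairs (connected graph)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering_coefficient(adj):
    """Average over nodes of (links among neighbours) / (possible links)."""
    values = []
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            values.append(0.0)
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        values.append(2 * links / (d * (d - 1)))
    return sum(values) / len(values)


def degree_distribution(adj):
    """P(k): fraction of nodes having degree k."""
    counts = {}
    for nbrs in adj.values():
        counts[len(nbrs)] = counts.get(len(nbrs), 0) + 1
    return {k: c / len(adj) for k, c in counts.items()}


lattice = ring_lattice(12, 4)
L = average_path_length(lattice)
C = clustering_coefficient(lattice)
P = degree_distribution(lattice)
print(L, C, P)
```

Every node has the same degree, so P(k) has a single entry (the delta type). Calling average_path_length on a larger ring such as ring_lattice(100, 4) shows L growing with the number of nodes while C stays the same.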
Random Graph
Erdős and Rényi introduced the concept of a random
graph around 60 years ago.
Random Graph
N = 10
Emax = N(N-1)/2 = 45
[Figure: example graphs for p = 0, p = 0.1, p = 0.15 and p = 0.25]
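A random graph of this kind can be generated by visiting each of the N(N-1)/2 possible node pairs and keeping it as an edge with probability p. The sketch below assumes this G(N, p) construction; the function name and the seed are illustrative.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """Return the edge list of a G(n, p) random graph on nodes 0..n-1."""
    rng = random.Random(seed)
    return [pair for pair in itertools.combinations(range(n), 2)
            if rng.random() < p]


n = 10
e_max = n * (n - 1) // 2          # 45 possible edges for N = 10
for p in (0, 0.1, 0.15, 0.25):
    edges = erdos_renyi(n, p, seed=1)
    print(f"p={p}: {len(edges)} of {e_max} edges "
          f"(expected about {p * e_max:.1f})")
```

The number of edges fluctuates from run to run around p * Emax; only p = 0 (no edges) and p = 1 (all 45 edges) are deterministic.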
Random Graph
Average path length L is
Low
Clustering coefficient C is
low
Degree distribution is
exponential (Poisson) type:
P(k) ≈ e^{-λ} λ^k / k!, where λ is the average degree.
[Figure: example random graph with p = 0.25]
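The degree of a node in a G(N, p) random graph follows a binomial distribution, which for large N is well approximated by the Poisson form P(k) ≈ e^{-λ} λ^k / k! with λ = p(N-1). A minimal sketch comparing the two; the values of N and p are illustrative and not from the slides.

```python
import math


def binomial_pmf(k, n, p):
    """Exact degree distribution in G(N, p): n = N - 1 possible links per node."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson approximation P(k) = exp(-lam) * lam**k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)


N, p = 1000, 0.004                 # illustrative values
lam = p * (N - 1)                  # average degree
for k in range(9):
    print(k, round(binomial_pmf(k, N - 1, p), 4), round(poisson_pmf(k, lam), 4))
```

Both columns peak near the average degree and fall off quickly for larger k, which is why hubs are very unlikely in this model.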
Random Graph
Usually to compare a real network with a random
network we first generate a random network of the
same size i.e. with the same number of nodes and
edges.
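One simple way to obtain such a reference network is to draw the same number of edges uniformly at random among the same number of nodes. This is a sketch under that assumption; the small "real" edge list is hypothetical.

```python
import itertools
import random


def random_graph_same_size(n_nodes, n_edges, seed=None):
    """Pick n_edges distinct node pairs uniformly at random."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(range(n_nodes), 2))
    return rng.sample(all_pairs, n_edges)


# Hypothetical "real" network: 6 nodes, 7 edges.
real_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
null_edges = random_graph_same_size(6, len(real_edges), seed=0)
print(sorted(null_edges))
```

Properties such as L and C are then computed on both networks, usually averaging over many random draws.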
Besides Erdős-Rényi random graphs, there are
other types of random graphs.
A Random graph can be constructed such that it matches the
degree distribution or some other topological properties of a
given graph
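One common way to build a random graph that matches the degree of every node of a given graph is repeated double-edge swapping. The slides do not say which method is meant, so the sketch below is only one possible choice, and the example edge list is hypothetical.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, seed=None):
    """Randomise an undirected simple graph by double-edge swaps.

    Each swap replaces edges (a, b), (c, d) with (a, d), (c, b), so every
    node keeps its degree. Swaps that would create a self-loop or a
    duplicate edge are skipped.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degrees(edges):
    """Degree of every node that appears in the edge list."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


original = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5), (5, 6), (2, 6)]
shuffled = degree_preserving_shuffle(original, n_swaps=20, seed=3)
print(degrees(original) == degrees(shuffled))
```

The wiring changes but the degree of each node, and therefore the degree distribution, stays exactly the same.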
Geometric random graphs:
each vertex is assigned random coordinates in a geometric
space of arbitrary dimensionality and random edges are
allowed between adjacent points or points constrained by
a threshold distance.
Geometric random graph: Example
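A minimal sketch of a geometric random graph, assuming points placed uniformly in the unit square (or unit cube of any dimension) and an edge between every pair closer than a threshold distance. The slides also allow edges to be chosen randomly among such pairs; this sketch takes the simpler deterministic rule, and all parameter values are illustrative.

```python
import itertools
import math
import random


def geometric_random_graph(n, radius, dim=2, seed=None):
    """Place n points uniformly in the unit cube and link pairs within radius."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= radius]
    return points, edges


points, edges = geometric_random_graph(n=30, radius=0.25, seed=2)
print(len(edges), "edges among", len(points), "points")
```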
Small world model (Watts and Strogatz)
Oftentimes, soon after meeting a stranger, one is surprised to
find that they have a common friend; so they both
cheer:
“What a small world!”
Small world model (Watts and Strogatz)
Begin with a nearest-neighbor coupled network.
Randomly rewire each edge of the network with some
probability p.
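The rewiring procedure can be sketched as follows, assuming a ring lattice as the nearest-neighbor coupled network and a rewiring rule that keeps one endpoint of the edge and avoids self-loops and duplicate edges. The parameter values are illustrative.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice (k nearest neighbours, k even); each edge rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            adj[i].add((i + step) % n)
            adj[(i + step) % n].add(i)
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                # Keep endpoint i and choose a new partner that creates
                # neither a self-loop nor a duplicate edge.
                choices = [v for v in range(n) if v != i and v not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].remove(j)
                    adj[j].remove(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


graph = watts_strogatz(n=20, k=4, p=0.3, seed=5)
print(sum(len(nbrs) for nbrs in graph.values()) // 2, "edges")
```

Rewiring moves edges without adding or removing any, so the edge count stays at n*k/2; with p = 0 the function returns the unchanged ring lattice.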
Small world model (Watts and Strogatz)
Average path length L is
Low
Clustering coefficient C is
high
Degree distribution is
exponential type.
Scale-free model (Barabási and Albert)
Start with a small number of nodes; at every time step,
a new node is introduced and is connected to already
existing nodes following Preferential Attachment
(a new node is more likely to be connected to
high-degree nodes).
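A minimal sketch of growth with preferential attachment. The slides do not specify the starting graph or how many links each new node brings, so the fully connected seed of m + 1 nodes and the parameter m are assumptions made here for illustration.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches to m existing nodes
    chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Assumed seed graph: m + 1 fully connected nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in 'stubs' once per incident edge, so a uniform
    # draw from the list is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = barabasi_albert(n=200, m=2, seed=7)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print("edges:", len(edges), "largest degree:", max(degree.values()))
```

Most nodes end up with a degree close to m, while a few early nodes collect many links and become hubs.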
Average path length L is
Low
Clustering coefficient C is
not clearly known.
Degree distribution is
power-law type.
P(k) ~ k^{-γ}
[Figure: log-log plot of P(k) against k for γ = 2 and γ = 3]
Scale-free networks exhibit robustness
Robustness – The ability of complex systems to maintain their
function even when the structure of the system changes significantly
Tolerant to random removal of nodes (mutations)
Vulnerable to targeted attack of hubs (mutations) – Drug
targets
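The contrast between random failure and targeted attack can be illustrated on a toy network. The network below (two hubs with five low-degree neighbours each) is hypothetical and far smaller than any real scale-free network; it only shows why losing a hub fragments the graph while losing a typical low-degree node does not.

```python
def largest_component(adj, removed=()):
    """Size of the largest connected component after deleting 'removed' nodes."""
    removed = set(removed)
    seen, best = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Toy hub-dominated network: two linked hubs, each with 5 leaves.
adj = {"hubA": {"hubB"}, "hubB": {"hubA"}}
for hub in ("hubA", "hubB"):
    for i in range(5):
        leaf = f"{hub}_leaf{i}"
        adj[hub].add(leaf)
        adj[leaf] = {hub}

intact = largest_component(adj)
after_leaf_loss = largest_component(adj, removed=["hubA_leaf0"])
after_hub_loss = largest_component(adj, removed=["hubA"])
print(intact, after_leaf_loss, after_hub_loss)
```

Because most nodes in a scale-free network have low degree, a randomly chosen node is usually one of these leaves, which is why random removal is tolerated well.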
Scale-free model (Barabási and Albert)
The term “scale-free” refers to any functional form
f(x) that remains unchanged to within a
multiplicative factor under a rescaling of the
independent variable x i.e. f(ax) = bf(x).
Power-law forms (P(k) ~ k^{-γ}) are the only
solutions to f(ax) = bf(x), and
hence “power-law” is referred to as “scale-free”.
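The scale-free property of a power law can be verified in one line. In the sketch below, C and κ are arbitrary constants introduced only for illustration; the exponential is included as a contrasting case that does not satisfy f(ax) = bf(x).

```latex
% A power law satisfies f(ax) = b f(x) with b independent of x
f(x) = C x^{-\gamma}
\;\Rightarrow\;
f(ax) = C (ax)^{-\gamma} = a^{-\gamma}\, C x^{-\gamma} = a^{-\gamma} f(x),
\qquad \text{so } b = a^{-\gamma}.

% An exponential does not: the ratio still depends on x
g(x) = e^{-x/\kappa}
\;\Rightarrow\;
\frac{g(ax)}{g(x)} = e^{-(a-1)x/\kappa}.
```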
Hierarchical Graphs
NETWORK BIOLOGY: UNDERSTANDING THE CELL’S FUNCTIONAL ORGANIZATION
Albert-László Barabási & Zoltán N. Oltvai
NATURE REVIEWS | GENETICS VOLUME 5 | FEBRUARY 2004 | 101
The starting point of this construction
is a small cluster of four densely
linked nodes (see the four central
nodes in figure). Next, three replicas of
this module are generated and the
three external nodes of the replicated
clusters connected to the central node
of the old cluster, which produces a
large 16-node module. Three replicas
of this 16-node module are then
generated and the 12 peripheral nodes
connected to the central node of the
old module, which produces a new
module of 64 nodes.
Hierarchical Graphs
The hierarchical network model seamlessly integrates a scale-free topology with
an inherent modular structure by generating a network that has a power-law
degree distribution with degree exponent γ = 1 +ln4/ln3 = 2.26 and a large,
system-size independent average clustering coefficient <C> ~ 0.6. The most
important signature of hierarchical modularity is the scaling of the clustering
coefficient, which follows C(k) ~ k^{-1}, a straight line of slope –1 on a log–log plot.
Comparison of random, scale-free and hierarchical networks
protein-protein interaction
Typical protein-protein interaction
A protein binds to one or several other proteins in
order to perform different biological functions; the resulting
assemblies are called protein complexes.
protein-protein interaction
This complex transports oxygen from the lungs to cells all
over the body through blood circulation.
“Protein-Protein Interactions” by Catherine Royer,
Biophysics Textbook Online
[Figure labels: twenty amino acids; four nucleotides]
Network of interactions and complexes
•Usually protein-protein interaction data are produced by
Laboratory experiments (Yeast two-hybrid, pull-down
assay etc.)
[Figure: detected complex data with bait protein A and interacting proteins B, C, D, E and F, converted to binary interactions by the spoke approach and by the matrix approach]
•The results of the experiments are converted to binary
interactions.
•The binary interactions can be represented as a
network/graph where a node represents a protein and an edge
represents an interaction.
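The two usual conversions of a detected complex into binary interactions can be sketched as follows: the spoke approach links the bait to each co-purified protein, and the matrix approach links every pair of proteins in the complex. The function names are illustrative; the protein labels A to F follow the slide.

```python
from itertools import combinations


def spoke_model(bait, preys):
    """Bait-to-prey pairs only."""
    return {frozenset((bait, prey)) for prey in preys}


def matrix_model(bait, preys):
    """All pairs among the bait and its preys."""
    return {frozenset(pair) for pair in combinations([bait, *preys], 2)}


bait, preys = "A", ["B", "C", "D", "E", "F"]
spoke = spoke_model(bait, preys)
matrix = matrix_model(bait, preys)
print(len(spoke), "spoke interactions;", len(matrix), "matrix interactions")
```

The matrix approach always contains the spoke interactions and adds all prey-to-prey pairs, so it yields more edges and typically more false positives.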
Network of interactions
List of interactions:
AtpB  AtpA
AtpG  AtpE
AtpA  AtpH
AtpB  AtpH
AtpG  AtpH
AtpE  AtpH

Adjacency matrix:
0 0 1 0 1
0 0 0 1 1
1 0 0 0 1
0 1 0 0 1
1 1 1 1 0

[Figure: corresponding network]
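Converting a list of interactions into an adjacency matrix can be sketched as below. The pairing of the protein names into six interactions and the row order of the matrix are a reading of the flattened slide, so both should be treated as assumptions; with this reading the code reproduces the five matrix rows shown on the slide.

```python
# Interaction list as read from the slide (pairing is an assumption).
interactions = [
    ("AtpB", "AtpA"), ("AtpG", "AtpE"), ("AtpA", "AtpH"),
    ("AtpB", "AtpH"), ("AtpG", "AtpH"), ("AtpE", "AtpH"),
]
order = ["AtpA", "AtpG", "AtpB", "AtpE", "AtpH"]   # assumed row/column order

index = {name: i for i, name in enumerate(order)}
matrix = [[0] * len(order) for _ in order]
for a, b in interactions:
    matrix[index[a]][index[b]] = 1      # undirected: fill both entries
    matrix[index[b]][index[a]] = 1

rows = ["".join(map(str, row)) for row in matrix]
for name, row in zip(order, rows):
    print(f"{name:5s} {row}")
```

The matrix is symmetric because interactions are undirected, and the row sums give the degree of each protein.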
The yeast protein interaction network evolves rapidly and contains
few redundant duplicate genes, by A. Wagner.
Mol. Biology and Evolution. 2001
985 proteins and 899
interactions
S. cerevisiae
Giant component consists
of 466 proteins
Average degree ~ 2
Clustering coefficient = 0.022
Degree distribution is scale free
An E. coli interaction network from DIP
(http://dip.mbi.ucla.edu/).
The components of this
graph have been
determined by applying
the depth-first search
algorithm.
There are 62
components in total.
Giant component
93 proteins
300 proteins and 287
interactions
E. coli
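Finding the components of an interaction network with a depth-first traversal can be sketched as follows. The node and edge lists are hypothetical toy data, not the DIP network, and the stack-based form is just one way to write the traversal.

```python
def connected_components(nodes, edges):
    """Return the components of an undirected graph, found by a stack-based
    depth-first traversal."""
    adj = {node: set() for node in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


# Hypothetical toy data.
nodes = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p1"), ("p4", "p5")]
parts = connected_components(nodes, edges)
giant = max(parts, key=len)
print(len(parts), "components; giant component has", len(giant), "proteins")
```

Proteins without any interaction form components of size one, which is why sparse interaction data produce many small components next to the giant component.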
[Figure: plot of Log(No. of Node) against Log(Degree)]
Average degree ~ 1.913
Clustering co-efficient
= 0.29
Degree distribution ~ scale free
Lethality and Centrality in protein networks by
H. Jeong, S. P. Mason, A.-L. Barabasi, Z. N. Oltvai
Nature, May 2001
Almost all proteins
are connected
1870 proteins and 2240
interactions
S. cerevisiae
Degree distribution is scale free
PPI network based on the MIPS database, consisting of 4546 proteins
and 12319 interactions
Average
degree 5.42
Clustering coefficient =
0.18
Giant
component
consists of
4385 proteins
Degree distribution ~ scale free
[Figure: plot of the degree distribution]
# of      # of      Average  Clustering  Giant         Degree
proteins  Interac.  degree   Coeffi.     Compo.        Distribu.
985       899       ~2       0.022       Exist 47.3%   Power law
300       287       1.913    0.29        Exist 31%     Almost Power law
1870      2240      ______   ______      Exist ~100%   Power law
4546      12319     5.42     0.18        Exist ~96%    Almost Power law
A complete PPI network tends to be a connected graph
and tends to have a power-law degree distribution.
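The average degree and giant-component share quoted for these networks follow directly from the counts: the average degree is 2E/N, because every interaction adds one to the degree of two proteins. A small sketch using only the counts stated on the slides; the function name and the dictionary labels are illustrative.

```python
def summarize(n_proteins, n_interactions, giant_size):
    """Average degree 2E/N and giant-component share in percent."""
    avg_degree = 2 * n_interactions / n_proteins
    giant_share = 100 * giant_size / n_proteins
    return avg_degree, giant_share


# (proteins, interactions, giant-component size) as quoted on the slides
networks = {
    "S. cerevisiae (Wagner 2001)": (985, 899, 466),
    "E. coli (DIP)": (300, 287, 93),
    "S. cerevisiae (MIPS)": (4546, 12319, 4385),
}

for name, counts in networks.items():
    avg_degree, giant_share = summarize(*counts)
    print(f"{name}: average degree = {avg_degree:.3f}, "
          f"giant component = {giant_share:.1f}%")
```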
Introduction to KNApSAcK database
http://kanaya.aist-nara.ac.jp/KNApSAcK/
FT-MS high accurate MW for metabolites [molecular weight (ppm)]
accurate mass: 226.0477
Candidates
of Metabolites
Molecular formula
600
# of candidates for
molecular formula
597
Since 2004
Error level
for FT-MS
251
32
1
0
±1
±1
±0.1
±0.1
±0.01
±0.01
±0.001
±0.001
(Mw  margin )
C10H10O6
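The effect of the mass margin on the number of candidate formulas can be sketched as a simple filter on monoisotopic mass. This does not use the KNApSAcK interface; the candidate list contains C10H10O6 plus three arbitrary formulas chosen for illustration, and the isotope masses are standard values for the most abundant isotopes.

```python
import re

# Monoisotopic masses of the most abundant isotopes (standard values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(formula):
    """Sum isotope masses for a formula such as 'C10H10O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def candidates_within(observed, margin, formulas):
    """Formulas whose mass lies within observed ± margin."""
    return [f for f in formulas
            if abs(monoisotopic_mass(f) - observed) <= margin]


# Illustrative candidate list, not taken from KNApSAcK.
formulas = ["C10H10O6", "C14H10O3", "C11H14O5", "C13H22O3"]
observed = 226.0477
for margin in (1, 0.1, 0.01, 0.001):
    print(f"±{margin}: {candidates_within(observed, margin, formulas)}")
```

As the margin narrows, fewer formulas remain, which mirrors how a more accurate mass measurement reduces the candidate list to a single molecular formula.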
KNApSAcK: Species-metabolite relation DB
Chorismic acid
Isochorismic acid
Now!
Species
Metabolite
Since 2004
Last update information
50,048 unique metabolites
101,500 species-metabolite relations
Current Status of KNApSAcK project
Plant kingdom (Predicted)
-- 200,000 D. Strack and R. Dixon (2003)
Known NPs (Predicted)
-- 50,000 /Plants, Luca and Pierre, (2000)
KNApSAcK(last update)
-- 50,048 unique metabolites
101,500 species-metabolite relations
Model species
Arabidopsis thaliana
-- 5,000 ca. 1/3 of 1200 protein types
Human
-- 2,500
Ryals (2004)
Bacteria (E. coli, B. subtilis)
-- 800 – 1700
Systematization of Species-metabolite relation
DB (KNApSAcK)
Basic study:
--Metabolomics (Systems Biol)
-- Evolution of NPs
-- Gene to metabolite relations
Applied works:
-- Food Sciences
-- Health creation
-- Herbal medicine
-- Drug development by Herb.
Main window
http://kanaya.naist.jp/KNApSAcK/
We can retrieve metabolite information by:
(a) Name (Organism, Metabolite)
(b) Mw ± margin
(c) Molecular formula
(d) Taxonomic hierarchy
(e) Ion mass of FT-MS with ionization mode
Substructure
(g) A list of retrieved metabolites
(h) Mode selection
Metabolites can be linked to KNApSAcK easily
by Keywords (Organism, Metabolite, Molecular Formula)
(+)-Sesamin is reported in 122 species
Input: Allium cepa
38 Metabolites
KNApSAcK (http://kanaya.naist.jp/KNApSAcK) (Since 2004)
Papers that utilized the KNApSAcK DB to examine metabolomics
(Thanks!)

6 papers-2009 (Red, Foreign country)
Davey, M.P., et al., Metabolomics, (2009)
Hounsome, N. et al., Postharvest Biol. Technol., (2009)
Xie, Z., et al., J. Exp. Botany, 60, 87-97, (2009)
Giavalisco, Anal. Chem. (2009)
Draper et al., BMC Bioinformatics, (2009)
Shroff et al., PNAS (2009)

17 papers-2008 (Red, Foreign country)
Malitsky, S., et al., Plant Physiol., (2008)
Warner, E., et al., J. Chromatography B, (2008)
Fait, A., et al., Plant Physiol., 148, 730-750 (2008)
Mintz-Oron, S., et al., Plant Physiol., 147, 823-851, (2008)
Hanhineva, K., et al., Phytochemistry, 69, 2463-2481 (2008)
Bottcher, C., et al., Plant Physiol., 147, 2107-2120, (2008)
Farder, A. et al., J. Nutrition, 138, 1282-1287, (2008)
Mintz-Oron, S., et al., Plant Physiol., 147, 823-825, (2008)
Overy, D.P., et al., Nature Protocols, 3, 471-485, (2008)
Dunn, W.B., Physical Biol., 5, 1-24, (2008)
Akiyama, K., In Silico Biol., 8, 27, (2008)
Sawada, Plant Cell Physiol., (2008)
Arita, M. and Suwa, K., BioData Mining, 1, 7.1-8 (2008)
Saito, K. et al., Trends in Plant Sci., 13, 36-43, (2008)
Akiyama, K., et al., In Silico Biol., 8, 339-345, (2008)
Takahashi, H., Anal. Bioanal Chem. (in press) (2008)
Iijima, Y., et al., Plant J., 54, 949-962, (2008)

10 papers-2007
Sofia, M., et al., Trends in Anal. Chem., 26, 855-866, (2007)
Hummel, J., et al., Topics in Curr. Genet., 18, 75-95, (2007)
Gaida, A., and Neumann, S., J. Int. Bioinf., (2007)
Griffiths, W.J., Metabolomics, Metabolonomics and Metabolite Profiling, (Royal Soc. Chem.), 2007
Ohta, D., et al., Anal. Biol. Chem. (2007)
Nakamura, Y., et al., Planta, (2007)
Suzuki, H., et al., Phytochemistry, (2007)
Sakakibara, K., et al., J. Biol. Chem., 282, 14932-14941, (2007)
Saito, K. et al., Trends in Plant Sci., 13, 36-42, (2007)
Want, E.J. et al., J. Proteome Res., 6, 459-468, (2007)

4 papers-2006
Kikuchi, K and Kakeya, H., Nature Chem. Biol., 2, 392-394, (2006)
Oikawa, A., et al., Plant Physiol., 142, 398-413, (2006)
Shinbo, Y., et al., Biotechnol. Agric. Forestry, 57, 166-181, (2006)
Shinbo, Y., et al., J. Comput. Aided Chem., 7, 94-101, (2006)

[Figure: bar charts, “Target of Research”. Labels: Metabolomics; Non-targeted Analysis; Review Article; Bioinformatics Methodology Development; Arabidopsis thaliana; Fragaria x ananassa; Solanum lycopersicum; Brassica oleracea; Curcuma longa; E. coli; Rattus norvegicus]

Web-sites linked to KNApSAcK
(WikiBook) http://en.wikibooks.org/wiki/Metabolomics/Databases
(UC Davis) http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/Structure_Elucidation/
(KEGG) http://fire3.scl.genome.ad.jp/dbget-bin/www_bfind?knapsack
(TAIR-Metabolomics Resource – Databases) http://www.arabidopsis.org/portals/metabolome/metabolome_database.jsp/
(LECO manual) Form No. 203-821-333
Recent information about the research works that used/introduced the KNApSAcK database
We welcome more researchers to use the KNApSAcK database.
That will encourage us!!
We learnt
1. Properties of some complex network
models
2. Properties of Protein-Protein Interaction
Networks
3. Usage of KNApSAcK Database