Topology of the protein network
H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature 411, 41-42 (2001)
Erdős-Rényi model
(1960)
Connect with
probability p
p=1/6
N=10
k ~ 1.5
- Democratic
- Random
Pál Erdős
(1913-1996)
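A minimal sketch of the construction on this slide, assuming the usual G(N, p) reading: every pair of nodes is linked independently with probability p. The function names are illustrative and not from the talk. For N=10 and p=1/6 the expected mean degree is p(N-1) = 1.5, matching the value quoted above.

```python
import random


def erdos_renyi(n, p, seed=None):
    """Return the edge list of a random graph: each pair is linked with probability p."""
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                edges.append((i, j))
    return edges


def mean_degree(n, edges):
    """Observed mean degree: every edge contributes to the degree of two nodes."""
    return 2 * len(edges) / n


def expected_mean_degree(n, p):
    """Expected mean degree <k> = p (N - 1)."""
    return p * (n - 1)


if __name__ == "__main__":
    graph = erdos_renyi(10, 1 / 6, seed=1)
    print("observed <k>:", mean_degree(10, graph))
    print("expected <k>:", expected_mean_degree(10, 1 / 6))
```

A single 10-node realisation fluctuates strongly around the expected value; averaging over many seeds brings the observed mean close to 1.5.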
Degree distribution of a random graph
P(k): the probability that a node has k links
P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}
For large N, P(k) can be replaced by a Poisson distribution:
P(k) \sim e^{-\langle k \rangle} \langle k \rangle^k / k!
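The two formulas above can be compared numerically. The sketch below evaluates the exact binomial degree distribution and its Poisson approximation with <k> = p(N-1). Function names are illustrative, and the example sizes are chosen only to show the approximation improving as N grows.

```python
import math


def binomial_degree_pmf(k, n, p):
    """Exact P(k) for a random graph with n nodes and link probability p."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_degree_pmf(k, mean_k):
    """Poisson approximation of P(k) with mean degree mean_k."""
    return math.exp(-mean_k) * mean_k**k / math.factorial(k)


if __name__ == "__main__":
    mean_k = 1.5
    for n in (10, 10_000):
        p = mean_k / (n - 1)  # keep <k> fixed while N grows
        worst = max(
            abs(binomial_degree_pmf(k, n, p) - poisson_degree_pmf(k, mean_k))
            for k in range(10)
        )
        print(f"N={n}: largest |binomial - Poisson| for k < 10 is {worst:.2e}")
```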
Poisson distribution
Scale-free Network
Exponential Network
World Wide Web
Nodes: WWW documents
Links: URL links
Over 3 billion documents
ROBOT: collects all URLs
found in a document and
follows them recursively
P(k) \sim k^{-\gamma}
R. Albert, H. Jeong, A-L. Barabasi, Nature 401, 130 (1999).
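The robot described on this slide can be sketched on a toy, in-memory web. Everything here is a stand-in: `TOY_WEB` is a hypothetical four-page web, `get_links` replaces fetching and parsing a real document, and a queue is used instead of literal recursion so that long link chains cannot overflow the call stack.

```python
from collections import Counter, deque

# Hypothetical toy web: page -> URLs found in that page.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["a"],  # never reached when the crawl starts from "a"
}


def crawl(start, get_links):
    """Follow links from `start`; return {page: outgoing links} for every page reached."""
    seen = {start}
    queue = deque([start])
    out_links = {}
    while queue:
        page = queue.popleft()
        links = list(get_links(page))
        out_links[page] = links
        for target in links:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return out_links


def in_degree_counts(out_links):
    """Count the incoming links of each page among the crawled pages."""
    return Counter(target for links in out_links.values() for target in links)


if __name__ == "__main__":
    web = crawl("a", lambda page: TOY_WEB.get(page, []))
    print(in_degree_counts(web))
```

Tabulating how many pages share each in-degree (or out-degree) gives the empirical P(k) that the slide summarises as a power law.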
Scale-free model
(1) Networks continuously expand
by the addition of new nodes
WWW : addition of new documents
Citation : publication of new papers
(2) New nodes prefer to link to
highly connected nodes.
GROWTH:
add a new node with m links
PREFERENTIAL ATTACHMENT: the
probability that a node connects to a
node with k links is proportional to k.
WWW : linking to well known sites
Citation : citing again highly cited papers
\Pi(k_i) = \frac{k_i}{\sum_j k_j}
P(k) \sim k^{-3}
Barabási & Albert, Science 286, 509 (1999)
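A minimal sketch of growth plus preferential attachment as stated on this slide. Two details are assumptions not given in the talk: the network starts from a fully connected core of m + 1 nodes, and each new node links to m distinct existing nodes. Drawing uniformly from a list that contains every node once per link it holds selects node i with probability k_i / sum_j k_j.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; each new node adds m links by preferential attachment."""
    rng = random.Random(seed)
    # Assumed seed network: m + 1 fully connected nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `stubs` once per link, so a uniform draw is proportional to degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


def degrees(edges):
    """Return a Counter mapping node -> degree."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


if __name__ == "__main__":
    deg = degrees(barabasi_albert(2000, 2, seed=1))
    histogram = Counter(deg.values())  # degree k -> number of nodes with that degree
    for k in sorted(histogram)[:10]:
        print(k, histogram[k])
```

On a log-log plot the histogram should fall roughly along a straight line, with a few highly connected hubs in the tail.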
Other Models
Internet
Metabolic network
Archaea
Bacteria
Eukaryotes
Organisms from all three domains of life are
scale-free networks!
H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A-L. Barabasi, Nature 407, 651 (2000)
http://www.orgnet.com
Many real world networks have a
similar architecture:
Scale-free networks
WWW, Internet (routers and domains), electronic circuits,
computer software, movie actors, coauthorship networks,
sexual web, instant messaging, email web, citations, phone
calls, metabolic, protein interaction, protein domains, brain
function web, linguistic networks, comic book characters,
international trade, bank system, encryption trust net,
energy landscapes, earthquakes, astrophysical network…
Interacting yeast proteins as detected in several studies
Deane, C. M. (2002)
Mol. Cell. Proteomics 1: 349-356
L. Salwinski & D. Eisenberg. Curr. Opin. Struct. Biol. 13 (2003)
J.S. Bader et al. Nature Biotechnology 22 (2004)
Filtered Yeast Interactome dataset
(Han et al., Nature 430, 2004)
High-throughput yeast two-hybrid (HT-Y2H) projects
(5,249 potential interactions - union of the available data sets)
Co-IP
(6,630 potential interactions from two datasets)
in silico computational predictions of interactions
(7,446 potential interactions from the 'von Mering' data set
obtained from the union of gene co-occurrence, gene
neighbourhood and gene fusion predictions)
'MIPS protein complexes' published singly in the literature
(9,597 pairwise interactions between components of complexes)
MIPS physical interactions
(excluding genome-scale experiments: 1,285 interactions).
J-D. Han et al. Nature 430 (2004)
•UNIPROT DATABASE filtered by SEG
Viruses
Bacteria
Archaea
Ascomycota
Metazoa
Spermatophyta
BLASTP
S. Cerevisiae sequence as a query
BACTERIA
METAZOA
VIRUSES
10^{-4}
ARCHAEA
1
ARCHAEA
Top-scoring sequences from each group → pairwise Smith-Waterman (SW) homology check with 100 randomisations and a Z-score cutoff
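The randomisation check can be illustrated as follows. This is a sketch under loud assumptions: the scoring scheme (match +2, mismatch -1, linear gap -1) is a toy substitute for a real substitution matrix, the null model simply shuffles the target sequence, and no cutoff value is implied because the slide does not give one.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b under a toy scoring scheme."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_z_score(query, target, n_shuffles=100, seed=None):
    """Z-score of the real score against scores for shuffled copies of the target."""
    rng = random.Random(seed)
    real = smith_waterman_score(query, target)
    letters = list(target)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(smith_waterman_score(query, "".join(letters)))
    mean_null = statistics.mean(null_scores)
    spread = statistics.pstdev(null_scores)
    if spread == 0:
        # Degenerate null distribution: no meaningful Z-score can be computed.
        return float("inf") if real > mean_null else 0.0
    return (real - mean_null) / spread


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
    print(shuffle_z_score(seq, seq, n_shuffles=100, seed=0))
```

Shuffling preserves the composition and length of the target, so a large Z-score indicates similarity beyond what those two properties alone would produce.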
A-L. Barabasi & Z.N. Oltvai. Nature Reviews Genetics 5 (2004)
Archaea, Eubacteria, Fungi, Plants, Animals (33/26)

protein synthesis machinery (40S and 60S ribosomal subunits,
translational factors, tRNA synthetases)

basic metabolism (e.g. ATP synthesis, Krebs cycle)

protein folding and degradation (chaperones, proteases)
domains participating in protein-protein interactions (TPR,
WD40)
+ Viruses (16/5)



replication (RNA polymerases, helicases, replication factor C,
ribonucleotide reductase)
protein degradation (19S proteasome)
40S ribosome
Mitochondrial ribosomal
40S subunit
S. Wuchty, Z.N. Oltvai & A-L. Barabasi. Nature Genetics 35 (2003)
Mitochondrial alpha-ketoglutarate
dehydrogenase and pyruvate
dehydrogenase complexes
Succinate dehydrogenase
General repressor of transcription
Coatomer
Nuclear
pore
Anaphase-promoting complex
(ubiquitin-protein ligase)
Cyclophilin, heat shock protein
(HSP82) and STI1 inhibitor
Eubacteria, Eukaryota | Archaea (4/4)
catalytic core delta subunit of mitochondrial ATP-ase

tubulin (BtubA/B - Prosthecobacter)

mitochondrial subunits of 60S ribosome

RNA-binding proteins (cleavage factor I)

Central stalk
of mitochondrial
F1F0 ATP synthase
Mitochondrial ribosomal
proteins of the 60S subunit
Subunits of cleavage factor I (HRP1, RNA15),
poly-A binding protein (PAB1, SGN1),
uncharacterized protein YGR250C
Alpha- and
beta-tubulin
Archaea, Eukaryota | Eubacteria (8/8)

RNA polymerase II (non-catalytic
subunits)

60S ribosomal subunits (cytoplasmic)

splicing (archaeal-like LSM proteins)

exosome 3'→5' exoribonuclease complex

20S proteasome
Protein components of large 60S ribosome subunit
Subunits of RNA polymerase II
Eukaryota | Archaea, Eubacteria, Viruses (24/19)

vesicle transport and membrane fusion
(multicompartmental cell)

mitochondrial transporters

regulation of actin cytoskeleton stability
Actin cytoskeleton
regulating complex
Actin capping heterodimer
Cofilin-like protein and adenylyl
cyclase-associated protein
Mitochondrial inner membrane ATP/ADP translocator
and mitochondrial inner membrane transporters (TIM22, TIM9, MRS1)
Mitochondrial transport system
M.C. Rivera & J.A.Lake Nature 431 (2004)
24
4
8
49
M.C. Rivera & J.A. Lake Nature 431 (2004)
L. Giot et al. Science 302 (2003)
Subunits of 26S proteasome complex
Trehalose-6-phosphate complex
[Figure: log-log plot of n_k, the number of S. cerevisiae proteins, versus the node degree k, with two fitted curves]
PL: n_k = 1.65 \times 10^{3} k^{-1.27}
GPL+EC: n_k = 2.4 \times 10^{3} (k + 0.3)^{-0.5} \exp(-k / 3.0)
Linear combination of exponential decays method:
n_k = \sum_{i=0}^{i_{max}} A_i \exp(-\lambda_i k)
[Figure: amplitudes A_i (± s.e.) versus decay rates \lambda_i, showing two peaks labelled "F" and "S"]
n_k = A_F \exp(-\lambda_F k) + A_S \exp(-\lambda_S k)
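The two-component form can be explored with a short sketch. The parameter values below are hypothetical placeholders, not the fitted values from the talk, and the code assumes each component decays with k (positive decay rates entering with a minus sign in the exponent). It evaluates each component, their sum, and the degree at which the two contributions are equal.

```python
import math


def component(amplitude, decay_rate, k):
    """One exponential decay term: A * exp(-lambda * k)."""
    return amplitude * math.exp(-decay_rate * k)


def double_exponential(k, a_f, lam_f, a_s, lam_s):
    """n_k as the sum of the "F" and "S" components."""
    return component(a_f, lam_f, k) + component(a_s, lam_s, k)


def crossover_degree(a_f, lam_f, a_s, lam_s):
    """Degree k* where both components contribute equally (requires lam_f != lam_s)."""
    return math.log(a_f / a_s) / (lam_f - lam_s)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    a_f, lam_f, a_s, lam_s = 3000.0, 1.0, 500.0, 0.25
    for k in range(0, 16, 3):
        f = component(a_f, lam_f, k)
        s = component(a_s, lam_s, k)
        print(f"k={k:2d}  F={f:9.2f}  S={s:8.2f}  n_k={f + s:9.2f}")
    print("crossover at k* =", crossover_degree(a_f, lam_f, a_s, lam_s))
```

With these placeholder values the component with the larger decay rate dominates at small k and the other takes over beyond the crossover degree.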
[Figure: log-log plot of n_k versus k with the DEL fit]
[Figure: the contribution of the "F" and "S" components, n_k^F and n_k^S, versus k (k = 0 to 15)]
[Figure: degree distributions n_k versus k (k = 0 to 20) for Saccharomyces cerevisiae, Arabidopsis thaliana, Escherichia coli, Caenorhabditis elegans, Helicobacter pylori and Drosophila melanogaster]
[Figure: Saccharomyces cerevisiae, n_k versus k (k = 0 to 20), comparing the PL, DEL and GPL+EC fits; AICc values shown in the plot: 94.6, 112.4, 135.8]
Akaike's Information Criterion (AICc)
AICc = z \ln(\sigma^2) + 2m + \frac{2m(m+1)}{z - m - 1}
\sigma^2 - the average squared residual for a given model, m - the number of model parameters, z - the number of observations
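A minimal sketch of the AICc computation as defined above, taking the list of fit residuals and the number of model parameters. The function name and the toy residuals are illustrative. Lower AICc indicates the preferred model; the last term penalises extra parameters more strongly when there are few observations.

```python
import math


def aicc(residuals, n_params):
    """Corrected Akaike criterion: z ln(sigma^2) + 2m + 2m(m+1)/(z - m - 1)."""
    z = len(residuals)
    if z - n_params - 1 <= 0:
        raise ValueError("need more observations than parameters + 1")
    sigma2 = sum(r * r for r in residuals) / z  # average squared residual
    if sigma2 == 0:
        raise ValueError("a perfect fit makes ln(sigma^2) undefined")
    correction = 2 * n_params * (n_params + 1) / (z - n_params - 1)
    return z * math.log(sigma2) + 2 * n_params + correction


if __name__ == "__main__":
    # Toy residuals: the same misfit costs more when it needs more parameters.
    residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, 0.2]
    for m in (2, 3, 4):
        print(f"m={m}: AICc = {aicc(residuals, m):.2f}")
```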
Interacting proteins
S. cerevisiae: 74%
E. coli: 23%
H. pylori: 71%
A. thaliana: 61%
C. elegans: 37%
D. melanogaster: 68%