Computational Biology

Semester schedule
Dates: 25.11., 30.11., 2.12. + 7.12., 9.12. + 14.12., 16.12.
Topics: Quality analysis in PI networks: Bayesian statistics; Phylogeny; Genome rearrangement; Christmas lecture
Assignments: Ass. #6, Ass. #7, Ass. #8
11.1.  V19  Introduction to metabolic networks
13.1.  V20  Extreme pathways (Ass. #9)
18.1.  V21  Elementary mode analysis
20.1.  V22  Integration of metabolic and regulatory networks (Ass. #10)
25.1.  V23  Modeling of signal transduction cascades
27.1.  Modeling of signal transduction cascades (II)
1.2.  Chemical genomics
3.2.  V12 pharmacogenomics
8.2.  Integrative network analysis
10.2.  Summary for the exam
Exam date: when?
11. Lecture WS 2004/05
Bioinformatics III
1
V11 – modules in cellular networks – wrap up
traditional biology (reductionist approach) produces long lists:
lists of genes in genomes
lists of transcripts in different cell types
lists of protein interactions in model organisms

genomes, transcriptomes, proteomes, interactomes,
databases of genetic perturbations, and corresponding phenotypes
How to make sense of it all?
Will meaningful hypotheses and discoveries emerge?
systems biology
Formalized mathematical modeling
simulations

quantitative measurements
still room for reductionism:
test hypothesis from
systems biology experiments
Gagneur et al. Genome Biology 5, R57 (2004)
Strategies to detect communities in networks
„Community“ stands for module, class, group, cluster, ...
Define community as a subset of nodes within the graph such that connections
between the nodes are denser than connections with the rest of the network.
The detection of community structure is generally intended as a procedure for
mapping the network into a tree („dendrogram“ in the social sciences).
Leaves: nodes. Branches join nodes or (at a higher level) groups of nodes.
Radicchi et al. PNAS 101, 2658 (2004)
Agglomerative algorithms for mapping to tree
Traditional method to perform this mapping: hierarchical clustering.
For every pair i,j of nodes in the network compute a weight W_ij that measures how
closely connected the vertices are.
Starting from the set of all nodes and no edges,
links are iteratively added between pairs of
nodes in order of decreasing weight.
In this way nodes are grouped into larger and larger
communities, and the tree is built up to the root,
which represents the whole network.
⇒ „agglomerative“ algorithm
Here: 3 communities of densely connected
vertices (circles with solid lines) with a
much lower density of connections
(gray lines) between them.
Girvan, Newman, PNAS 99, 7821 (2002)
Radicchi et al. PNAS 101, 2658 (2004)
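The agglomerative procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the method of the cited papers: it assumes the weights W_ij are already given as a dictionary, it merges groups as soon as one link joins them (single-linkage behavior), and the names `agglomerate`, `weights` and `merges` are hypothetical.

```python
def agglomerate(n, weights):
    """Add links in order of decreasing weight and record every merge.

    n: number of nodes, labeled 0..n-1
    weights: dict {(i, j): W_ij} (assumed precomputed)
    Returns the merges (weight, group_a, group_b) from the leaves towards the root.
    """
    parent = list(range(n))                 # union-find forest
    members = {i: [i] for i in range(n)}    # root -> nodes in that group

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    merges = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                        # link lies inside an existing group
        merges.append((w, sorted(members[ri]), sorted(members[rj])))
        parent[rj] = ri
        members[ri] = members[ri] + members.pop(rj)
    return merges
```

Reading `merges` from first to last entry gives the tree from the leaves up to the root; cutting it at a chosen weight yields the communities at that level.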
Possible definitions of the weights
(1) number of node-independent paths between vertices
Two paths that connect the same pair of vertices are said to be node-independent if
they share none of the same vertices other than their initial and final vertices.
(2) edge-independent paths.
It has been shown that the number of node-independent (edge-independent) paths
between 2 vertices i and j in a graph is equal to the minimum number of vertices
(edges) that must be removed from the graph to disconnect i and j from one
another (Menger, 1927).
⇒ These numbers are a measure of the robustness of the network to deletion of
nodes (edges).
Girvan, Newman, PNAS 99, 7821 (2002)
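One way to obtain the number of edge-independent paths is a unit-capacity maximum-flow computation; by Menger's theorem the result equals the minimum number of edges whose removal disconnects the pair. The sketch below is an illustration under that assumption (BFS augmenting paths, graph given as a symmetric adjacency dictionary); the function name is hypothetical.

```python
from collections import deque

def edge_independent_paths(adj, s, t):
    """Count edge-independent paths between s and t in an undirected graph.

    adj: dict {node: set(neighbors)}, symmetric.
    Every undirected edge is modeled as two arcs of capacity 1.
    """
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    count = 0
    while True:
        prev = {s: None}
        queue = deque([s])
        while queue and t not in prev:      # BFS in the residual graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in prev and cap[(u, v)] > 0:
                    prev[v] = u
                    queue.append(v)
        if t not in prev:
            return count                    # no augmenting path left
        v = t
        while prev[v] is not None:          # push one unit of flow back to s
            u = prev[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        count += 1
```

For two triangles joined by a single bridge, vertices inside one triangle are linked by 2 edge-independent paths, while vertices in different triangles are linked by only 1.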
Possible definitions of the weights (II)
(3) count total number of paths that run between them (not just those that are
node- or edge-independent).
Because the number of paths between any 2 vertices is either 0 or infinite, one
typically weights paths of length l by a factor α^l with small α so that the weighted
count of the number of paths converges.
Thus long paths contribute exponentially less weight than short paths.
These node- or edge-dependent path definitions for weights work okay for certain
community structures, but show typical pathologies.
Girvan, Newman, PNAS 99, 7821 (2002)
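The weighted path count can be written in closed form. Assuming A denotes the adjacency matrix of the graph (a symbol not used on the slide), the element (A^l)_{ij} is the number of paths of length l from i to j, so the weights W_ij form a geometric series of matrices:

```latex
W \;=\; \sum_{l=0}^{\infty} (\alpha A)^{l} \;=\; (I - \alpha A)^{-1},
\qquad \text{convergent for } \alpha < \frac{1}{\lambda_{\max}(A)}
```

Here \lambda_{\max}(A) is the largest eigenvalue of A. The l = 0 term only contributes the identity matrix, i.e. it changes the diagonal but not the weights between distinct vertices.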
Problems
In particular, counting either node- or edge-independent paths tends
to separate single peripheral vertices from the communities to which they should
rightly belong.
If a vertex is, e.g., connected to the rest of a network by only a single edge then, to
the extent that it belongs to any community, it should clearly be considered to
belong to the community at the other end of that edge.
Unfortunately, both the numbers of independent paths and the weighted path
counts for such vertices are small and hence single nodes often remain isolated
from the network when the communities are constructed.
These and other pathologies make the hierarchical clustering method, although
useful, far from perfect.
Girvan, Newman, PNAS 99, 7821 (2002)
New strategy: Use “betweenness” as definition of weights
Focus on those edges that are least central, i.e. that are most „between“ communities.
Define edge betweenness of an edge as the number of shortest paths between
pairs of vertices that run along it.
If there is more than one shortest path between a pair of vertices, each path is
given equal weight such that the total weight of all of the paths is 1.
If a network contains communities or groups that are only loosely connected by a
few intergroup edges, then all shortest paths between different communities must
go along one of these few edges.
⇒ The edges connecting communities will have high edge betweenness.
By removing these edges we separate groups from one another and so reveal the
underlying community structure of the graph.
Girvan, Newman, PNAS 99, 7821 (2002)
GN Algorithm
1. Calculate betweenness for all m edges in a graph of n vertices
(can be done in O(mn) time).
2. Remove the edge with the highest betweenness.
3. Recalculate betweenness for all edges affected by the removal.
4. Repeat from step 2 until no edges remain.
Because step 3 has to be repeated after each of the m edge removals, the algorithm runs in worst-case time
O(m²n).
Girvan, Newman, PNAS 99, 7821 (2002)
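The four steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: edge betweenness is computed with a BFS-based accumulation (one BFS per source, O(mn) in total), betweenness is recomputed for all edges after every removal instead of only the affected ones, and the loop stops at the first split. All function names are hypothetical, and the graph is assumed to be an undirected adjacency dictionary without self-loops.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness; several shortest paths for a pair share a total weight of 1."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0 for v in adj}          # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        for v in reversed(order):            # accumulate from the farthest nodes back
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                bet[frozenset((u, v))] += share
                delta[u] += share
    return {e: b / 2.0 for e, b in bet.items()}   # each pair was counted twice

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_first_split(adj):
    """Remove highest-betweenness edges until the number of components grows."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    n_start = len(components(adj))
    while len(components(adj)) == n_start:
        bet = edge_betweenness(adj)
        if not bet:
            break                                  # no edges left
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

Applying the split recursively to each resulting component would produce the full tree.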
Application of Girvan & Newman Algorithm
(a) The friendship network from Zachary's karate
club study. Nodes associated with the club
administrator's faction are drawn as circles, those
associated with the instructor's faction are drawn
as squares.
(b) Hierarchical tree showing the complete
community structure for the network calculated by
using the algorithm presented in this article. The
initial split of the network into two groups is in
agreement with the actual factions observed by
Zachary, with the exception that node 3 is
misclassified.
(c) Hierarchical tree calculated by using edge-independent path counts, which fails to extract the
known community structure of the network.
Girvan, Newman, PNAS 99, 7821 (2002)
Divisive algorithms for mapping to tree
The tree is constructed in the reverse order compared with agglomerative algorithms:
start with the whole graph and iteratively cut the edges
⇒ divide network progressively into smaller and smaller disconnected
subnetworks identified as the communities.
Crucial point: how to select those edges to be cut.
Example: Girvan & Newman algorithm (GN)
Problem of GN algorithm: requires the repeated evaluation of a global property, the
betweenness, for each edge, whose value depends on the properties of the whole
system.
⇒ becomes computationally very expensive for networks with e.g. ≈ 10000 nodes.
Radicchi et al. PNAS 101, 2658 (2004)
Faster algorithm
Introduce divisive algorithm that only requires the consideration of local quantities.
Need: quantity that can single out edges connecting nodes belonging to different
communities.
Consider edge-clustering coefficient:
number of triangles to which a given edge belongs divided by the number of
triangles that might potentially include it, given the degrees of the adjacent nodes.
For the edge connecting node i to node j, the edge-clustering coefficient is
\[ C^{(3)}_{i,j} = \frac{z^{(3)}_{i,j} + 1}{\min[(k_i - 1), (k_j - 1)]} \]
where $z^{(3)}_{i,j}$ is the number of triangles built on that edge and
$\min[(k_i - 1), (k_j - 1)]$ is the maximal possible number of them.
1 is added to $z^{(3)}_{i,j}$ to remove degeneracy for $z^{(3)}_{i,j} = 0$.
Radicchi et al. PNAS 101, 2658 (2004)
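A minimal sketch of the edge-clustering coefficient of order 3, assuming the graph is stored as a dictionary of neighbor sets. Returning infinity when one endpoint has degree 1 (denominator 0) is an assumption made here so that such edges are never selected for removal; the function name is hypothetical.

```python
def edge_clustering_coefficient(adj, i, j):
    """C^(3)_ij = (z + 1) / min(k_i - 1, k_j - 1) for the edge (i, j)."""
    z = len(adj[i] & adj[j])                      # each common neighbor closes one triangle
    max_triangles = min(len(adj[i]) - 1, len(adj[j]) - 1)
    if max_triangles == 0:
        return float("inf")                       # assumption: degree-1 endpoint
    return (z + 1) / max_triangles
```

For two triangles joined by a bridge, an edge inside a triangle gets C = 2.0 while the bridge gets C = 0.5, so the intercommunity edge has the smallest value.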
Faster algorithm
Edges connecting nodes in different communities are included in few or no
triangles and tend to have small values of $C^{(3)}_{i,j}$.
On the other hand, many triangles exist within clusters.
By considering higher-order cycles one can define coefficients of order g
\[ C^{(g)}_{i,j} = \frac{z^{(g)}_{i,j} + 1}{s^{(g)}_{i,j}} \]
where $z^{(g)}_{i,j}$ is the number of cyclic structures of order g the edge (i,j) belongs to,
and $s^{(g)}_{i,j}$ is the number of possible cyclic structures of order g that can be built
given the degrees of the nodes.
Define, for every g, a detection algorithm that works exactly as the GN method
with the difference that, at every step, the removed edges are those with the
smallest value of $C^{(g)}_{i,j}$.
By considering increasing values of g, one can smoothly interpolate between a
local and a nonlocal algorithm.
Radicchi et al. PNAS 101, 2658 (2004)
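The divisive loop for the local case g = 3 might look like the sketch below. It is an illustration only: it handles triangles but no higher-order cycles, removes one edge per step, stops at the first split, and treats edges with a zero denominator as having an infinite coefficient (an assumption). Node labels are assumed to be comparable, and the function name is hypothetical.

```python
def radicchi_split(adj):
    """Remove edges of smallest C^(3) until one removal disconnects its endpoints."""
    adj = {u: set(vs) for u, vs in adj.items()}    # work on a copy

    def coeff(i, j):
        denom = min(len(adj[i]) - 1, len(adj[j]) - 1)
        return float("inf") if denom == 0 else (len(adj[i] & adj[j]) + 1) / denom

    def component_of(start):
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        return comp

    removed = []
    while True:
        edges = [(u, v) for u in adj for v in adj[u] if u < v]
        if not edges:
            return removed, None                   # nothing left to cut
        i, j = min(edges, key=lambda e: coeff(*e)) # local quantity only
        adj[i].discard(j)
        adj[j].discard(i)
        removed.append((i, j))
        side = component_of(i)
        if j not in side:                          # the removal split the graph
            return removed, (side, component_of(j))
```

Unlike the betweenness-based loop, each step only inspects the neighborhoods of the two endpoints of an edge.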
Comparison with GN method
Test of the efficiency of the different algorithms in the analysis of the artificial graph
with four communities. Here N = 128 and p_in is changed together with p_out to keep the
average degree equal to 16.
(Left) Strong definition: fraction of successes for the different algorithms compared
with the analytical probability that four communities are actually defined.
(Right) Weak definition: in addition to the same quantities plotted in Left, here we
report, for every algorithm, the fraction f of nodes not correctly classified.
Radicchi et al. PNAS 101, 2658 (2004)
Comparison with GN algorithm
Plot of the dendrograms for the network of college football teams, obtained by
using the GN algorithm (Left) and the algorithm of Radicchi et al. with g = 4 (Right).
Different symbols denote teams belonging to different conferences.
In both cases, the observed communities perfectly correspond to the conferences,
with the exception of the six members of the „Independent conference“, which are
misclassified.
Radicchi et al. PNAS 101, 2658 (2004)
Simple network clustering based on shortest-path distance
Aim: compute modular organization of cellular networks controlling specific
biological responses.
Ideas:
(i) the shortest path between any two vertices (proteins) is probably the most
relevant for functional associations;
(ii) each vertex in a network has a unique profile of shortest-path distances through
the network to every other vertex;
(iii) module comembers are likely to have similar (clustered) shortest-path-distance
profiles.
Rives & Galitski PNAS 100, 1128 (2003)
Network clustering
Yeast PI network; 4079 proteins, 6761 protein interactions.
MIPS: 133 signaling proteins, of which 64 have ≥ 1 interaction with another signaling
protein.
Algorithm: assign length 1 to each edge in protein interaction network.
Compute all-pairs shortest-path distance matrix: contains length of the shortest
path (distance) d between every pair of vertices in the network.
Convert into „association matrix“ using 1/d².
⇒ Associations range from 0 to 1.
Emphasizes local association in subsequent clustering.
Use hierarchical agglomerative average-linkage clustering.
Rives & Galitski PNAS 100, 1128 (2003)
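The first two steps (all-pairs shortest paths with unit edge lengths, then conversion to 1/d² associations) can be sketched with one breadth-first search per vertex. Setting the self-association to 1 and the association of unreachable pairs to 0 are assumptions made here, and the function names are hypothetical; the subsequent average-linkage clustering is not shown.

```python
from collections import deque

def shortest_path_profiles(adj):
    """All-pairs shortest-path distances for unit-length edges (BFS from every vertex)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[s] = d
    return dist

def association_matrix(adj):
    """Map distances d to associations 1/d^2, stored as {(a, b): value}."""
    dist = shortest_path_profiles(adj)
    assoc = {}
    for a in adj:
        for b in adj:
            if a == b:
                assoc[a, b] = 1.0                   # assumption: self-association
            elif b in dist[a]:
                assoc[a, b] = 1.0 / dist[a][b] ** 2
            else:
                assoc[a, b] = 0.0                   # assumption: no connecting path
    return assoc
```

Direct neighbors get association 1, proteins two steps apart get 0.25, which is how the 1/d² transform emphasizes local association.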
Clustering of yeast signaling protein interaction network
A symmetrical matrix of 64 proteins of the
MIPS-database signaling category was
clustered identically in both dimensions. The
cluster tree is not shown. Each row or
column represents a protein. Each feature is
the intersection of two proteins and is a
grayscale representation of pairwise protein
association.
Columns to the right of the clustered network
represent MIPS-defined signaling pathways
[P, polarity-PKC; R, Ras; H, HOG; M,
mating/filamentation MAPK (mfMAPK)].
White bars in the MIPS-pathway columns
indicate protein members of the pathway.
Ras-pathway proteins form a single
cluster.
The 3 MAPK pathways appear as clusters.
Rives & Galitski PNAS 100, 1128 (2003)
Network clustering of high-throughput data sets
High-throughput (HTS) data usually have high (50%) false-positive error frequencies!
Also, many binary interactions may not occur within modules.
Because interacting proteins usually localize in the same subcellular compartment,
one may integrate interaction and localization data for the identification of modules.
Single proteins with many interactions in Y2H screens (hubs) nucleate large
clusters that are not modules.
Rives & Galitski PNAS 100, 1128 (2003)
examples of derived clusters
Clustering of the yeast nuclear-protein
network derived from high-throughput
interaction and localization data.
(A) Examples of clusters representing module
rudiments are labeled. The cluster tree is not
shown. Arrows indicate high-connectivity hub
proteins.
(B) Example clusters are shown in detail.
Cluster comembers participating in some
common structure or function have large bold
labels.
Rives & Galitski PNAS 100, 1128 (2003)
Properties of hubs
All hub proteins indicated bind > 90 proteins in the global Y2H network.
The proteins bound by these hubs are randomly distributed in cellular
compartments.
The nuclear-localized proteins bound by these hubs form the 4 largest clusters.
Proteins bound by high-connectivity hubs will have few or no interactions among
themselves if they are not functionally associated („hub-and-spokes“ structure).
⇒ Proteins bound by each high-connectivity hub are not functionally associated
with each other, and their clusters do not represent modules.
Rives & Galitski PNAS 100, 1128 (2003)
Connectivity ↔ neighborhood clustering
Global protein connectivity versus
neighborhood clustering. Each
protein in the global protein network is plotted by its connectivity,
k, and its neighborhood clustering,
C. Arrows indicate high-connectivity proteins shown in Fig. 2A.
The 4 high-connectivity hubs are among 15 outliers. Although these proteins have
exceedingly high connectivity, they almost completely lack neighborhood clustering.
⇒ useful criterion to distinguish modules from nonmodules?
Rives & Galitski PNAS 100, 1128 (2003)
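Both quantities in the plot can be computed per protein from the adjacency structure. The sketch assumes that the neighborhood clustering C is the usual local clustering coefficient (fraction of neighbor pairs that interact with each other); that reading, and the function name, are assumptions.

```python
def connectivity_and_clustering(adj, v):
    """Return (k, C): the degree of v and the fraction of its neighbor pairs that are linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return k, 0.0                       # no neighbor pairs to compare
    # every link among neighbors is seen from both of its endpoints
    links = sum(len(adj[u] & nbrs) for u in nbrs) // 2
    return k, 2.0 * links / (k * (k - 1))
```

A pure hub-and-spokes node yields high k with C = 0, whereas a member of a fully connected group yields C = 1.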
Application to biological-response networks
Incorporate network clustering into a 3-step process to study complex biomolecular
systems ⇒ generates a modular network-structure model
(i) compile known and suspected components of the response network (from
databases, expression profiling, proteomics, genetic screens, metabolite profiles ...)
(ii) cluster network based on interactions between vertices. Edges can represent
any type of interaction.
(iii) abstract modular network-structure model showing modules.
Cluster 90 filamentation-network proteins that have ≥ 1 interaction with other
filamentation proteins.
Rives & Galitski PNAS 100, 1128 (2003)
Clustering of the yeast filamentation network
Proteins of the yeast
filamentation network were
clustered. A tree-depth
threshold was set.
Tree branches with ≥ 3 leaves
(clusters with ≥ 3 proteins)
below the tree threshold are
shown.
Bullets and large bold labels
indicate proteins of highest
intracluster connectivity.
Rives & Galitski PNAS 100, 1128 (2003)
Modular model of the yeast filamentation network
Clusters indicated in Fig. 4 are
abstracted as modules. All intermodule
paths in the filamentation network are
indicated as black lines with the
interacting proteins at the termini.
A gray line connecting the Ras and
protein kinase A modules was added to
indicate a connection mediated by the
small molecule cAMP.
Rives & Galitski PNAS 100, 1128 (2003)
Filamentous growth-response of yeast cells
(A) Wild-type yeast-form cells grown in SHAD
liquid medium.
(B) Wild-type filamentous-form cells grown for
10 h on SLAD agar medium.
For budding yeast diploid cells, low availability
of ammonium and a solid growth substrate
trigger a dimorphic switch to filamentous-form
growth, characterized by cell elongation,
unipolar distal budding, adhesion and invasion.
Prominent involved pathways: cAMP-dependent protein kinase, fMAP kinase, Cdc28
kinase activity, ubiquitination by SCF ubiquitin ligase.
Here: investigate next step, ubiquitin-dependent
degradation by 26S proteasome.
Prinz et al. Genome Research 14, 380 (2004)
Integrated filamentation network
The filamentation network includes proteins
(rectangular nodes) implicated in filamentous
growth by expression profiling or known
phenotypes, and metabolites (triangular
nodes) that are either substrates or products
of filamentation-protein enzymes.
Not shown are filamentation proteins with neither a protein–metabolite
interaction nor a protein–protein interaction with
another filamentation protein.
Blue edges: protein–protein interactions.
Green edges: protein–metabolite interactions.
Each gene node is colored based on its
expression log-ratio. Shades of red indicate
higher expression in the filamentous form
relative to the yeast form; shades of blue
indicate the opposite response; white indicates
no difference.
Prinz et al. Genome Research 14, 380 (2004)
Collective Functions of Network Clusters
If clusters in an integrated network
represent biological modules, the
clusters should have collective
functions in specific biological
processes.
Specific biological-process gene
annotations (taken from GO
database) are found overrepresented in specific filamentation-network clusters.
Significance: -log (cumulative probability of the
observed data and all more extreme probabilities).
Prinz et al. Genome Research 14, 380 (2004)
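A significance score of this kind is commonly obtained from a hypergeometric tail probability. The sketch below assumes that model and a base-10 logarithm; neither is stated on the slide, and the parameter names (N genes in the network, K of them carrying the annotation, n genes in the cluster, k annotated genes in the cluster) are hypothetical.

```python
from math import comb, log10

def enrichment_significance(N, K, n, k):
    """-log10 of P(X >= k) for X ~ Hypergeometric(N, K, n).

    The tail sums the probability of the observed count k and all more extreme counts.
    """
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return -log10(tail / comb(N, n))
```

Larger values indicate that the annotation is less likely to be this frequent in the cluster by chance.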
Modular abstraction of the filamentation network
Network clusters are abstracted
as circular "module nodes."
The area of each module node
is proportional to the number of
member molecules. The color of
each module node reflects the
average expression log-ratio of
member genes.
Each module node is assigned
the name of the member node
of highest intracluster degree
(the highest number of
interactions with cluster comembers); most are proteins,
some are metabolites.
Prinz et al. Genome Research 14, 380 (2004)
Quantitative identification of network clusters
Nodes of the filamentation network were
iteratively joined into clusters.
(A) A cluster was defined as a joined group
containing at least 3 protein nodes.
The number of clusters is plotted as a
function of join number.
(B) The selection of nodes/clusters to join
was based on average-linkage Manhattan
distance of node shortest-paths-distance
profiles.
This distance metric is plotted as a function of
join number.
The arrows indicate join #535 corresponding
to the highest join number with the highest
number of clusters.
Prinz et al. Genome Research 14, 380 (2004)
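The join criterion in (B) can be illustrated as follows, assuming each node's shortest-path-distance profile is stored as a list of distances to all nodes in a fixed order. The function names and the `profiles` dictionary are hypothetical.

```python
def manhattan(profile_a, profile_b):
    """Manhattan (city-block) distance between two shortest-path-distance profiles."""
    return sum(abs(x - y) for x, y in zip(profile_a, profile_b))

def average_linkage(cluster_a, cluster_b, profiles):
    """Mean Manhattan distance over all pairs with one node from each cluster."""
    total = sum(manhattan(profiles[a], profiles[b])
                for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))
```

At each join step, the pair of nodes or clusters with the smallest average-linkage distance would be merged.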
RPN12, GRR1, and CDC28 modules and their components
Modules (A), and their respective
components (B) with collective functions in
cell-cycle control and ubiquitin-dependent
proteolysis are shown.
Prinz et al. Genome Research 14, 380 (2004)
growth behavior of rpn4 mutants
rpn4 mutants show Cln1-dependent
hyperelongation, and cell typeindependent agar adhesion.
(A) Diploid wild-type, rpn4 , cln1 ,
and rpn4 cln1 strains were grown
on SLAD agar plates and
photographed after 9 h.
(B) Patches of strains of the indicated
cell types and genotypes were
subjected to a wash-off assay of
adhesion. The plate was imaged
before and after washing with water.
Prinz et al. Genome Research 14, 380 (2004)
Stabilization of Cln1 protein in rpn4 mutants
(A) Northern blot analysis of total RNA from wild-type and rpn4Δ
strains, and a cln1Δ strain. The blot was probed
consecutively with probes for CLN1 and RPN12. The asterisk
in the CLN1 blot indicates a cross-hybridizing band that also
serves as a loading control.
(B) Western blot analysis of Cln1 protein in diploid wild-type
and rpn4Δ strains carrying HA-tagged CLN1, and a no-tag
wild-type control strain. Protein extracts were prepared from
cells grown for 10 h on SLAD agar plates. Pgk1 protein levels
served as a loading control.
(C) Cln1-HA protein was immunoprecipitated from an rpn4Δ
strain. Aliquots of the immunoprecipitate were incubated with
calf-intestine phosphatase (CIP), or without CIP, and analyzed
by Western blotting.
(D) myc-tagged Cln1 protein was immunoprecipitated in
diploid wild-type and rpn4Δ strains, and a no-tag control
strain. All strains had a multicopy plasmid expressing HA-tagged ubiquitin. Immunoprecipitates were analyzed by gel
electrophoresis and immunoblotting with anti-HA antibody to
detect ubiquitin conjugates. The blot membrane was stripped
and reprobed with anti-myc antibodies to detect the
immunoprecipitated Cln1.
Prinz et al. Genome Research 14, 380 (2004)
Non-random interaction among filamentation proteins
a: Interaction data include all protein-protein interactions plus all metabolic
interactions. Each analysis used either biological interaction data or 10 data
sets in which interactions were assigned randomly to pairs of proteins.
b: Each analysis included a list of either the 1026 filamentation proteins, or the
873 expression-implicated proteins, or 10 sets of random proteins.
c: The number of proteins in the list that have at least one interaction with another
protein in the list.
d: The number of direct interactions between pairs of proteins in the list.
e: Node degree: # of incident edges of the node. Mean node degree: ratio of 2*# of
interactions to # of interacting proteins.
Prinz et al. Genome Research 14, 380 (2004)
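The three table quantities (interacting proteins, direct interactions, mean node degree) follow directly from a protein list and an interaction list. A minimal sketch, assuming interactions are given as pairs of protein names and that self-interactions and pairs involving proteins outside the list are ignored; the function name is hypothetical.

```python
def interaction_stats(proteins, interactions):
    """Return (# interacting proteins, # direct interactions, mean node degree)."""
    listed = set(proteins)
    # keep each undirected interaction once, only if both partners are in the list
    edges = {frozenset(e) for e in interactions
             if len(set(e)) == 2 and set(e) <= listed}
    interacting = set().union(*edges) if edges else set()
    mean_degree = 2 * len(edges) / len(interacting) if interacting else 0.0
    return len(interacting), len(edges), mean_degree
```

The factor 2 appears because every interaction contributes one incident edge to each of its two proteins.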
Expression change within clusters
RPN12, GRR1, and CDC28
modules and their components.
Modules (A), and their respective
components (B) with collective
functions in cell-cycle control and
ubiquitin-dependent proteolysis are
shown. Graphic representations
are as in Figures 2 and 3.
... table continues ...
Prinz et al. Genome Research 14, 380 (2004)
Biological Insights from modular network abstraction
(1) In an integrated network, data on molecules and interactions show clustered
organization that can be identified quantitatively.
(2) Cluster co-member genes show significant coordination of expression change,
as expected for genes involved in a collective function.
(3) Cluster co-member genes show significant overrepresentation of biological-process annotations, indicating collective function.
(4) The modular network abstraction intuitively stimulates testable biological
insights on complex biological properties.
Prinz et al. Genome Research 14, 380 (2004)
Evolutionary conservation of motif constituents
in the yeast protein interaction network
Question: why are some cellular components conserved across species
but others evolve rapidly?
Many biological functions are carried out by the integrated activity of highly
interacting cellular components = functional modules
Motifs = topologically distinct interaction patterns within complex networks
may represent the simplest building blocks of modules.
Here, test the correlation between a protein's evolutionary rate and the
structure of the motif it is embedded in
⇒ identify all 2-, 3-, 4-node motifs and some 5-node motifs
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
shared components
Data from DIP database: 3183 interacting yeast proteins.
Algorithm to detect all n-node subgraphs:
scan all rows of the adjacency matrix M. For each non-zero
element (i,j) representing a link, scan through all neighbors of
(i,j) until a specific n-node subgraph is detected.
If there is evolutionary pressure to maintain specific motifs, their
components should be evolutionarily conserved and have identifiable
orthologs in other organisms.
Study conservation of 678 S. cerevisiae proteins with an ortholog in each of 5
higher eukaryotes: Arabidopsis thaliana, C. elegans,
Drosophila melanogaster, Mus musculus, Homo sapiens.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
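For the 3-node case, the subgraph scan can be illustrated by looking at every pair of neighbors of each node. This is a simplified stand-in for the adjacency-matrix scan described above, not the authors' code; it only distinguishes linear 3-node chains from triangles, and the function name is hypothetical.

```python
from itertools import combinations

def three_node_motifs(adj):
    """Count (linear 3-node chains, triangles) in an undirected graph."""
    chains = corners = 0
    for center in adj:
        for a, b in combinations(sorted(adj[center]), 2):
            if b in adj[a]:
                corners += 1      # a, center, b form a triangle (seen once per corner)
            else:
                chains += 1       # open path a - center - b
    return chains, corners // 3
```

Larger motifs would require extending each detected subgraph by further neighbors in the same way.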
shared components
#motifs of a given kind in the yeast PI
network
fraction of original yeast motifs that is
evolutionarily fully conserved: each of
their protein components belongs to
the 678 orthologous proteins
fraction of motifs that is fully conserved
for the random ortholog distribution
column 4 / column 5
less than 5% of #2 (linear 3-component
motifs) are completely maintained
47% of the fully connected pentagons
(#11) are fully conserved!
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
Topology ↔ conservation of individual proteins
Larger motifs tend to
be conserved as a
whole, where each
component has an
ortholog.
E.g. less than 1% of the fully connected pentagon motifs disappeared completely;
for 69% of them, each of the subunits had an ortholog in human.
Clear correlation between the conservation rate and the degree of saturation of
a motif.
Participation in motifs substantially influences the evolutionary conservation of
specific components.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
Clustering coefficient ↔ conservation of proteins?
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
From 65% (C = 0) to 84% (C = 1) of neighbors of a human ortholog were also
human orthologs (filled circles). The conserved fraction of the nonorthologous
protein's neighborhood is markedly smaller.
Enrichment = ratio between the percentages of orthologous proteins at distance d
from an ortholog in the natural and the random orthologous sets.
d: shortest distance between i and target protein measured along network links.
Proteins that interact directly with an ortholog (d = 1) have a 50% higher chance of
conservation than at random!
Function ↔ conservation?
Examine if the specific function of the yeast proteins within motifs affects their rate
of evolutionary conservation.
Assign each motif to functional class to which its protein components belong.
Larger motifs have a notable functional homogeneity:
- for 95% of fully connected yeast pentagon motifs (#11) all components shared at
least one common functional class,
- only 10% of the 2-node motifs (#1) are functionally conserved.
Identify type and number of evolutionarily fully conserved motifs of each functional
class in S. cerevisiae, for those that have an ortholog in humans.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
shared components
For 3 functional classes
(subcellular localization, protein
fate, transcription) each of the 11
studied motifs is considerably
overrepresented.
Some other functional classes
have only 1 or 2 characteristic
motifs.
No motifs are found for:
transposable elements, energy,
cellular fate, cellular communication, cellular rescue, cellular
organization, metabolism,
protein activity, protein binding
The fully connected motifs (#9 and #11) tend to identify protein complexes.
However, the mere existence of protein complexes cannot explain the observed
trends towards higher conservation rates of the highly connected motifs.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
shared components
Shared components = proteins or groups of proteins occurring in different
complexes are fairly common:
A shared component may be a small part of many complexes, acting as a unit that
is constantly reused for its function.
Also, it may be the main part of the complex e.g. in a family of variant complexes
that differ from each other by distinct proteins that provide functional specificity.
Aim: identify and properly represent the modularity of protein-protein interaction
networks by identifying the shared components and the way they are arranged to
generate complexes.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
Summary
Modules are a key intermediate level in the organizational hierarchy of cells.
Biological Module: loose association of preferred molecular interaction partners
that interact to perform a collective function.
Modules can be identified based on structural characteristics such as their closely
connected members and interfaces to other modules.
There is evidence that modules are evolutionarily conserved.
Module co-members tend to be coordinately expressed.