Transcript Slide 1
LECTURE 2 1. Complex Network Models 2. Properties of Protein-Protein Interaction Networks 3. Usage of KNApSack Database Complex Network Models: Average Path length L, Clustering coefficient C, Degree Distribution P(k) help understand the global structure of the network. Some well-known types of Network Models are as follows: •Regular Coupled Networks •Random Graphs •Small world Model •Scale-free Model •Hierarchical Networks Regular networks Regular networks Diamond Crystal Both diamond and graphite are carbon Graphite Crystal Regular network (A ring lattice) Average path length L is high Clustering coefficient C is high Degree distribution is delta type. P(k) 1 1 2 3 4 5 Random Graph Erdos and Renyi introduced the concept of random graph around 45 years ago. Random Graph N=10 Emax = N(N-1)/2 =45 p=0.1 p=0 p=0.15 p=0.25 Random Graph Average path length L is Low Clustering coefficient C is low Degree distribution is exponential type. p=0.25 P(k ) e k k! Random Graph Usually to compare a real network with a random network we first generate a random network of the same size i.e. with the same number of nodes and edges. Other than Erdos Reyini random graphs there are other type of random graphs A Random graph can be constructed such that it matches the degree distribution or some other topological properties of a given graph Geometric random graphs: each vertex is assigned random coordinates in a geometric space of arbitrary dimensionality and random edges are allowed between adjacent points or points constrained by a threshold distance. Geometric random graph: Example Small world model (Watts and Strogatz) Oftentimes,soon after meeting a stranger, one is surprised to find that they have a common friend in between; so they both cheer: “What a small world!” What a small world!! Small world model (Watts and Strogatz) Randomly rewire each edge Begin with a nearest-neighbor of the network with some coupled network probability p Small world model (Watts and Strogatz) Average path length L is Low Clustering coefficient C is high Degree distribution is exponential type. P(k) Scale-free model (Barabási and Albert) Start with a small number of nodes; at every time step, a new node is introduced and is connected to alreadyexisting nodes following Preferential Attachment (probability is high that a new node be connected to high degree nodes) Average path length L is Low Clustering coefficient C is not clearly known. Degree distribution is power-law type. 1 0.1 γ=2 0.01 γ=3 0.001 P(k) ~ k-γ 0.0001 1 10 100 1000 Scale-free networks exhibit robustness Robustness – The ability of complex systems to maintain their function even when the structure of the system changes significantly Tolerant to random removal of nodes (mutations) Vulnerable to targeted attack of hubs (mutations) – Drug targets Scale-free model (Barabási and Albert) The term “scale-free” refers to any functional form f(x) that remains unchanged to within a multiplicative factor under a rescaling of the independent variable x i.e. f(ax) = bf(x). This means power-law forms (P(k) ~ k-γ), since these are the only solutions to f(ax) = bf(x), and hence “power-law” is referred to as “scale-free”. Hierarchical Graphs NETWORK BIOLOGY: UNDERSTANDING THE CELL’S FUNCTIONAL ORGANIZATION Albert-László Barabási & Zoltán N. Oltvai NATURE REVIEWS | GENETICS VOLUME 5 | FEBRUARY 2004 | 101 The starting point of this construction is a small cluster of four densely linked nodes (see the four central nodes in figure).Next, three replicas of this module are generated and the three external nodes of the replicated clusters connected to the central node of the old cluster, which produces a large 16-node module. Three replicas of this 16-node module are then generated and the 12 peripheral nodes connected to the central node of the old module, which produces a new module of 64 nodes. Hierarchical Graphs The hierarchical network model seamlessly integrates a scale-free topology with an inherent modular structure by generating a network that has a power-law degree distribution with degree exponent γ = 1 +ln4/ln3 = 2.26 and a large, system-size independent average clustering coefficient <C> ~ 0.6. The most important signature of hierarchical modularity is the scaling of the clustering coefficient, which follows C(k) ~ k –1 a straight line of slope –1 on a log–log plot NETWORK BIOLOGY: UNDERSTANDING THE CELL’S FUNCTIONAL ORGANIZATION Albert-László Barabási & Zoltán N. Oltvai NATURE REVIEWS | GENETICS VOLUME 5 | FEBRUARY 2004 | 101 NETWORK BIOLOGY: UNDERSTANDING THE CELL’S FUNCTIONAL ORGANIZATION Albert-László Barabási & Zoltán N. Oltvai NATURE REVIEWS | GENETICS VOLUME 5 | FEBRUARY 2004 | 101 Comparison of random, scalefree and hierarchical networks protein-protein interaction Typical protein-protein interaction A protein binds with another or several other proteins in order to perform different biological functions---they are called protein complexes. protein-protein interaction This complex transport oxygen from lungs to cells all over the body through blood circulation PROTEINPROTEIN INTERACTIONS by Catherine Royer Biophysics Textbook Online protein-protein interaction PROTEINPROTEIN INTERACTIONS by Catherine Royer Biophysics Textbook Online Network of interactions and complexes •Usually protein-protein interaction data are produced by Laboratory experiments (Yeast two-hybrid, pull-down assay etc.) detected complex data A A B D C E F A Bait protein B Interacted protein C D E F Spoke approach B F C E D Matrix approach •The results of the experiments are converted to binary interactions. •The binary interactions can be represented as a network/graph where a node represents a protein and an edge represents an interaction. Network of interactions AtpB AtpG AtpA AtpB AtpG AtpE 00101 00011 10001 01001 11110 AtpA AtpE AtpH AtpH AtpH AtpH List of interactions Corresponding network Adjacency matrix The yeast protein interaction network evolves rapidly and contain few redundant duplicate genes by A. Wagner. Mol. Biology and Evolution. 2001 985 proteins and 899 interactions S. Cerevisiae giant component consists of 466 proteins The yeast protein interaction network evolves rapidly and contain few redundant duplicate genes by A. Wagner. Mol. Biol. Evol. 2001 Average degree ~ 2 Clustering coefficient = 0.022 Degree distribution is scale free An E. coli interaction network from DIP (http://dip.mbi.ucla.edu/). Components of this graph has been determined by applying Depth First Search Algorithm There are total 62 components Giant component 93 proteins 300 proteins and 287 interactions E. coli An E. coli interaction network from DIP (http://dip.mbi.ucla.edu/). 2.5 Log(No. of Node) 2 1.5 1 0.5 0 0 0.5 1 1.5 2 Log(Degree) Average degree ~ 1.913 Clustering co-efficient = 0.29 Degree distribution ~ scale free Lethality and Centrality in protein networks by H. Jeong, S. P. Mason, A.-L. Barabasi, Z. N. Oltvai Nature, May 2001 Almost all proteins are connected 1870 proteins and 2240 interactions S. Cerevisiae Degree distribution is scale free PPI network based on MIPS database consisting of 4546 proteins 12319 interactions Average degree 5.42 Clustering coefficient = 0.18 Giant component consists of 4385 proteins PPI network based on MIPS database consisting of 4546 proteins 12319 interactions 3.5 3 Degree distribution ~ scale free 2.5 2 1.5 1 0.5 0 0 0.5 1 1.5 2 2.5 3 # of protein s # of Interac. Average degree Clusterin g Coeffi. Giant Degree Compo. Distribu . 985 899 ~2 0.022 Exist 47.3% Power law 300 287 1.913 0.29 Exist 31% Almost Power law 1870 2240 ______ ______ Exist ~100% Power law 4546 12319 5.42 0.18 Exist ~96% Not exactly Power law A complete PPI network tends to be a connected graph And tends to have Power law distribution Introduction to KNApSaCK database http://kanaya.aist-nara.ac.jp/KNApSAcK/ FT-MS high accurate MW for metabolites [molecular weight (ppm)] accurate mass: 226.0477 Candidates of Metabolites Molecular formula 600 # of candidates for molecular formula 597 Since 2004 Error level for FT-MS 251 32 1 0 ±1 ±1 ±0.1 ±0.1 ±0.01 ±0.01 ±0.001 ±0.001 (Mw margin ) C10H10O6 KNApSAcK: Species-metabolite relation DB 35 Chorismic acid Isochorismic acid Now! Species Metabolite Since 2004 Last updata 2010/3/31 50,054 unique metabolites 102,005 species-metabolite relations 36 Current Status of KNApSAcK project Plant kingdom (Predicted) -- 200,000 D. Strack and R. Dixon (2003) Known NPs (Predicted) -- 50,000 /Plants, Luca and Pierre, (2000) KNApSAcK(last updata 2009/5/21) -- 50,054 unique metabolites 102,005 species-metabolite relations Model species Arabidopsis thaliana -- 5,000 ca. 1/3 of 1200 protein types Human -- 2,500 Ryals (2004) Bacteria (E. coli, B. subtilis) -- 800 – 1700 Systematization of Speciesmetabolite relation DB(KNApSAcK) Basic study: --Metabolomics (Systems Biol) -- Evolution of NPs -- Gene to metabolite relations Applied works: -- Food Sciences -- Health creation -- Herbal medicine -- Drug development by Herb. 37 Main window http://kanaya.naist.jp/KNApSAcK/ We can retrieve metabolite information by: (a) Name (Organism, Metabolite) (g) A list of retrieved metabolites (b) Mw margin (c) Molecular formula (h) Mode selection (d) Taxonomic hierarchy Substrucutre (e) Ion mass of FT-MS with ionization mode 38 Metabolites can be linked to KNApSAcK easily by Keywords (Organism, Metabolite, Molecular Formula) 39 (+)-Sesamin is reported in 122 species 40 Input: Allium cepa 38 Metabolites 41 KNApSAcK(http:/kanaya.naist.jp/KNApSAcK )(Since 2004) Papers utilized KNApSAcK DB to examine metabolomics ( Thanks!) Davey, M.P., et al., Metabolomics, (2009) Hounsome, N. et al., Postharvest Biol. Technol., (2009) Xie,Z., et al., J.Exp.Botany,60, 87-97, (2009) Giavalisco, Anal.Chem.(2009) Draper et al., BMC Bioinformatics, (2009) Shroff et al., 6 papers-2009 (Red, Foreign country) PNAS (2009) Malitsky, S.,., et al., Plant Physiol., (2008) Warner, E., et al., J.Chromatography B,(2008) Fait, A., et al., Plant Physiol., 148, 730-750 (2008) Mintz-Oron, S., et al., Plant Physiol., 147, 823-851, (2008) Hanhineva, K., et al., Phytochemistry, 69, 2463-2481 (2008) Bottcher,C., et al., Plant Physiol.,147,2107-2120, (2008) Farder, A. et al., J. Nutrition, 138, 1282-1287, (2008) Mintz-Oron, S., et al., Plant Physiol.,147,823-825, (2008) 17 papers-2008 (Red, Foreign country) 0 Target of Research ., Nature Protocols 15 20 Review Article Bioinformatics Methodology Development 10 papers-2007 Sofia, M., et al., Trends in Anal. Chem., 26, 855-866, (2007) Hummel, J., et al., Topics in Curr. Genet., 18, 75-95, (2007) Gaida, A., and Neumann, S., J. Int. Bioinf., (2007) Griffiths,W.J.,Metabolomics,Metabolonomics and Metabolite Profiling,(Royal Soc.Chem.),2007 Ohta, D., et al., Anal.Biol. Chem.(2007) Nakamura, Y., et al., Planta, (2007) Suzuki, H., et al., Phytochemistry, (2007) Sakakibara, K., et al., , J .Biol. Chem.,282, 14932-14941, (2007) Saito, K. et al., Trends in Plant Sci., 13, 36-42, (2007) Kikuchi, K and Kakeya, H., 10 Metabolomics Non-targeted Analysis Overy, D.P., et al , 3, 471-485, (2008) Dunn, W.B., Physical Biol.,5, 1-24, (2008) Akiyama, K., In Silico Biol., 8, 27, (2008) Sawada, Plant Cell Physiol., (2008) Arita,M. and Suwa, K., BioData Mining, 1,7.1-8 (2008) Saito, K. et al., Trends in Plant Sci.,13, 36-43, (2008) Akiyama, K., et al., In Silico Biol., 8, 339-345, (2008) Takahashi, H., Anal. Bioanal Chem. (in press) (2008) Iijima, Y., et al., Plant J., 54, 949-962, (2008) Want, E.J. et al., J. Proteome Res., 6, 459-468, (2007) 5 Natuure Chem. Biol., 2, 392-394, (2006) 0 5 10 Arabidopsis thaliana Fragaria x ananassa Salanum lycopersicum Brassica oleracea 4 papers-2006 Curcuma longa Oikawa, A.,et al., Plant Physiol., 142, 398-413, (2006) Shinbo, Y., et al., Biotchnol. Agric. Forestry, 57, 166-181, (2006) Shinbo, Y., et al., J. Comput. Aided Chem., 7, 94-101, (2006) since 2004 Web-sites linked to KNApSAcK (WikiBook) http://en.wikibooks.org/wiki/Metabolomics/Databases (UC Davis) http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/Structure_Elucidation/ (KEGG) http://fire3.scl.genome.ad.jp/dbget-bin/www_bfind?knapsack (TAIR-Metabolomics Resource – Databases) http://www.arabidopsis.org/portals/metabolome/metabolome_database.jsp/ (LECO manual) Form No. 203-821-333 E. coli Rattus norvegicus 42 KNApSAcK(http:/kanaya.naist.jp/KNApSAcK )(Since 2004) Papers utilized KNApSAcK DB to examine metabolomics ( Thanks!) Davey, M.P., et al., Metabolomics, (2009) Hounsome, N. et al., Postharvest Biol. Technol., (2009) Xie,Z., et al., J.Exp.Botany,60, 87-97, (2009) Giavalisco, Anal.Chem.(2009) Draper et al., BMC Bioinformatics, (2009) Shroff et al., 6 papers-2009 (Red, Foreign country) PNAS (2009) Malitsky, S.,., et al., Plant Physiol., (2008) Warner, E., et al., J.Chromatography B,(2008) Fait, A., et al., Plant Physiol., 148, 730-750 (2008) Mintz-Oron, S., et al., Plant Physiol., 147, 823-851, (2008) Hanhineva, K., et al., Phytochemistry, 69, 2463-2481 (2008) Bottcher,C., et al., Plant Physiol.,147,2107-2120, (2008) Farder, A. et al., J. Nutrition, 138, 1282-1287, (2008) Mintz-Oron, S., et al., Plant Physiol.,147,823-825, (2008) 17 papers-2008 (Red, Foreign country) 0 5 10 Target of Research Metabolomics Non-targeted Analysis Review Article Bioinformatics Methodology Development ., Nature Protocols Overy, D.P., et al , 3, 471-485, (2008) Dunn, W.B., Physical Biol.,5, 1-24, (2008) Akiyama, K., In Silico Biol., 8, 27, (2008) Sawada, Plant Cell Physiol., (2008) Arita,M. and Suwa, K., BioData Mining, 1,7.1-8 (2008) Saito, K. et al., Trends in Plant Sci.,13, 36-43, (2008) Akiyama, K., et al., In Silico Biol., 8, 339-345, (2008) Takahashi, H., Anal. Bioanal Chem. (in press) (2008) Iijima, Y., et al., Plant J., 54, 949-962, (2008) Want, E.J. et al., J. Proteome Res., 6, 459-468, (2007) 10 papers-2007 Sofia, M., et al., Trends in Anal. Chem., 26, 855-866, (2007) Hummel, J., et al., Topics in Curr. Genet., 18, 75-95, (2007) Gaida, A., and Neumann, S., J. Int. Bioinf., (2007) Griffiths,W.J.,Metabolomics,Metabolonomics and Metabolite Profiling,(Royal Soc.Chem.),2007 Ohta, D., et al., Anal.Biol. Chem.(2007) Nakamura, Y., et al., Planta, (2007) Suzuki, H., et al., Phytochemistry, (2007) Sakakibara, K., et al., , J .Biol. Chem.,282, 14932-14941, (2007) Saito, K. et al., Trends in Plant Sci., 13, 36-42, (2007) Kikuchi, K and Kakeya, H., Natuure Chem. Biol., 2, 392-394, (2006) 0 15 5 10 Arabidopsis thaliana Fragaria x ananassa Salanum lycopersicum Brassica oleracea 4 papers-2006 Curcuma longa Oikawa, A.,et al., Plant Physiol., 142, 398-413, (2006) Shinbo, Y., et al., Biotchnol. Agric. Forestry, 57, 166-181, (2006) Shinbo, Y., et al., J. Comput. Aided Chem., 7, 94-101, (2006) since 2004 Web-sites linked to KNApSAcK (WikiBook) http://en.wikibooks.org/wiki/Metabolomics/Databases (UC Davis) http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/Structure_Elucidation/ (KEGG) http://fire3.scl.genome.ad.jp/dbget-bin/www_bfind?knapsack (TAIR-Metabolomics Resource – Databases) http://www.arabidopsis.org/portals/metabolome/metabolome_database.jsp/ (LECO manual) Form No. 203-821-333 E. coli Rattus norvegicus 20 43 44 We learnt 1.Properties of some complex network models 2.Properties of Protein-Protein Interaction Networks 3.Usage of KNApSack Database