Transcript BioCyc - Artificial Intelligence Center
Pathway Tools / BioCyc Fundamentals
Peter D. Karp, Ph.D.
Bioinformatics Research Group, SRI International
BioCyc.org
EcoCyc.org, MetaCyc.org, HumanCyc.org
SRI International Bioinformatics
Pathway Tools Capabilities
Create and maintain an organism database integrating genome, pathway, regulatory information
Computational inference tools
Interactive editing tools
Query and visualize that database
Use the database to interpret omics data
Metabolic network analysis tools
Comparative analysis tools
Export the metabolic network to SBML
Speed creation of flux-balance models by an order of magnitude
BioCyc
Hundreds of microbial genomes
Inferred operons and metabolic networks
Couples curated data with computational predictions
Supports analysis of omics data
Comparative analysis tools
Microbial emphasis. Exceptions:
HumanCyc, MouseCyc, CattleCyc
Model Organism Databases / Organism Specific Databases
DBs that describe the genome and other information about an organism
Every sequenced organism with an active experimental community requires a MOD
Integrate genome data with information about the biochemical and genetic network of the organism
Integrate literature-based information with computational predictions
Curated by experts for that organism
No one group can curate all the world’s genomes
Distribute workload across a community of experts to create a community resource
Rationale for MODs
Each “complete” genome is incomplete in several respects:
40%-60% of genes have no assigned function
Roughly 7% of those assigned functions are incorrect
Many assigned functions are non-specific
Need continuous updating of annotations with respect to new experimental data and computational predictions
MODs are platforms for global analyses of an organism
Interpret omics data in a pathway context
In silico prediction of essential genes
Characterize systems properties of metabolic and genetic networks
What is Curation?
Ongoing updating and refinement of a PGDB
Correcting false-positive and false-negative predictions
Incorporating information from experimental literature
Authoring of comments and citations
Updating database fields
Gene positions, names, synonyms
Protein functions, activators, inhibitors
Addition of new pathways, modification of existing pathways
Defining TF binding sites, promoters, regulation of transcription initiation and other processes
Pathway/Genome Database
[Diagram of PGDB contents: Pathways, Reactions, Compounds, Proteins, RNAs, Genes, Chromosomes, Plasmids, Sequence Features, CELL; Regulation: Operons, Promoters, DNA Binding Sites, Regulatory Interactions]
BioCyc Collection of 507 Pathway/Genome Databases
Pathway/Genome Database (PGDB) – combines information about
Pathways, reactions, substrates
Enzymes, transporters
Genes, replicons
Transcription factors/sites, promoters, operons
Tier 1: Literature-Derived PGDBs
MetaCyc
EcoCyc -- Escherichia coli K-12
Tier 2: Computationally-derived DBs, Some Curation -- 24 PGDBs
HumanCyc
Mycobacterium tuberculosis
Tier 3: Computationally-derived DBs, No Curation -- 481 DBs
Pathway Tools Overview
[Diagram: PathoLogic takes an Annotated Genome and the MetaCyc Reference Pathway DB and produces a Pathway/Genome Database, which is accessed through the Pathway/Genome Editors and the Pathway/Genome Navigator]
Pathway Tools Software: PathoLogic
Computational creation of new Pathway/Genome Databases
Transforms genome into Pathway Tools schema and layers inferred information above the genome
Predicts operons
Predicts metabolic network
Predicts which genes code for missing enzymes in metabolic pathways
Infers transport reactions from transporter names
Bioinformatics 18:S225 2002
Pathway Tools Software: Pathway/Genome Editors
Interactively update PGDBs with graphical editors
Support geographically distributed teams of curators with object database system
Gene editor
Protein editor
Reaction editor
Compound editor
Pathway editor
Operon editor
Publication editor
Pathway Tools Software: Pathway/Genome Navigator
Querying and visualization of:
Pathways, Reactions, Metabolites, Proteins, Genes, Chromosomes
Two modes of operation:
Web mode
Desktop mode
Most functionality shared, but each has unique functionality
Pathway Tools Software: PGDBs Created Outside SRI
1,700+ licensees: 75+ groups applying software to 300+ organisms
Saccharomyces cerevisiae, SGD project, Stanford University
135 pathways / 565 publications
Candida albicans, CGD project, Stanford University
dictyBase, Northwestern University
Mouse, MGD, Jackson Laboratory
Under development:
Drosophila, FlyBase
C. elegans, WormBase
Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
288 pathways / 2282 publications
PlantCyc, Carnegie Institution of Washington
Six Solanaceae species, Cornell University
GrameneDB, Cold Spring Harbor Laboratory
Medicago truncatula, Samuel Roberts Noble Foundation
Pathway Tools Software: PGDBs Created Outside SRI
NIAID BRCs for Biodefense pathogens:
BioHealthBase -- Mycobacterium tuberculosis, Francisella tularensis
Pathema -- 80+ PGDBs
PATRIC – Brucella suis, Coxiella burnetii, Rickettsia typhi
EuPathDB – Cryptosporidium, Plasmodium
G. Xie, Los Alamos Lab, Dental pathogens
F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
V. Schachter, Genoscope, Acinetobacter
M. Bibb, John Innes Centre, Streptomyces coelicolor
G. Church, Harvard, Prochlorococcus marinus, multiple strains
E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
Sergio Encarnacion, UNAM, Sinorhizobium meliloti
Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
Michael Gottfert, Technische Universitat Dresden, Bradyrhizobium japonicum
Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472
Pathway Tools Software: PGDBs Created Outside SRI
Large scale users:
C. Medigue, Genoscope, 200+ PGDBs
G. Sutton, J. Craig Venter Institute, 80+ PGDBs
G. Burger, U Montreal, 60+ PGDBs
Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
Partial listing of outside PGDBs at BioCyc.org
Obtaining a PGDB for Organism of Interest
Find existing curated PGDB
Find existing PGDB in BioCyc
Create your own
EcoCyc Project – EcoCyc.org
E. coli Encyclopedia
Review-level Model-Organism Database for E. coli
Tracks evolving annotation of the E. coli genome and cellular networks
The two paradigms of EcoCyc
“Multi-dimensional annotation of the E. coli K-12 genome”
Positions of genes; functions of gene products – 76% / 66% exp
Gene Ontology terms; MultiFun terms
Gene product summaries and literature citations
Evidence codes
Multimeric complexes
Metabolic pathways
Cellular regulation
Karp, Gunsalus, Collado-Vides, Paulsen
Nuc. Acids Res. 35:7577 2007
ASM News 70:25 2004
Science 293:2040
EcoCyc = E. coli Dataset + Pathway/Genome Navigator
URL: EcoCyc.org
EcoCyc v13.6
Pathways: 246
Reactions: Metabolic: 1,394; Transport: 246
Proteins: 4,479
Complexes: 895
RNAs: 285
Genes: 4,492
Compounds: 1,830
Citations: 19,000
Gene Regulation: Operons: 3,369; Trans Factors: 196; Promoters: 1,796; TF Binding Sites: 2,205
Paradigm 1: EcoCyc as Textual Review Article
All gene products for which experimental literature exists are curated with a minireview summary
Found on protein and RNA pages, not gene pages!
3257 gene products contain summaries
Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
Additional summaries found in pages for operons, pathways
EcoCyc cites 17,300 publications
Paradigm 2: EcoCyc as Computational Symbolic Theory
Highly structured, high-fidelity knowledge representation provides computable information
Each molecular species defined as a DB object
Genes, proteins, small molecules
Each molecular interaction defined as a DB object
Metabolic reactions
Transport reactions
Transcriptional regulation of gene expression
220 database fields capture extensive properties and relationships
EcoCyc Procedures
DB updates performed by 5 staff curators
Information gathered from biomedical literature
Enter data into structured database fields
Author extensive summaries
Update evidence codes
Corrections submitted by E. coli researchers
Four releases per year
Quality assurance of data and software
Evaluate database consistency constraints
Perform element balancing of reactions
Run other checking programs
EcoCyc Accelerates Science
Experimentalists
E. coli experimentalists
Experimentalists working with other microbes
Analysis of expression data
Computational biologists
Biological research using computational methods
Genome annotation
Study connectivity of E. coli metabolic network
Study phylogenetic extent of metabolic pathways and enzymes in all domains of life
Bioinformaticists
Training and validation of new bioinformatics algorithms – predict operons, promoters, protein functional linkages, protein-protein interactions
Metabolic engineers
“Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”
Educators
MetaCyc: Metabolic Encyclopedia
Describe a representative sample of every experimentally determined metabolic pathway
Describe properties of metabolic enzymes
Literature-based DB with extensive references and commentary
Pathways, reactions, enzymes, substrates
Jointly developed by
P. Karp, R. Caspi, C. Fulcher, SRI International
L. Mueller, A. Pujar, Boyce Thompson Institute
S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2008
Applications of MetaCyc
Reference source on metabolic pathways
Metabolic engineering
Find enzymes with desired activities, regulatory properties
Determine cofactor requirements
Predict pathways from genomes
Systematic studies of metabolism
Computer-aided education
MetaCyc Data -- Version 13.6
Pathways: 1,436
Reactions: 8,200
Enzymes: 6,060
Small Molecules: 8,400
Organisms: 1,800
Citations: 21,700
Taxonomic Distribution of MetaCyc Pathways – version 13.1
Bacteria: 883
Green Plants: 607
Fungi: 199
Mammals: 159
Archaea: 112
Enzyme Data Available in MetaCyc
Reaction(s) catalyzed
Alternative substrates
Activators, inhibitors, cofactors, prosthetic groups
Subunit structure
Genes
Features on protein sequence
Cellular location
pI, molecular weight, Km, Vmax
Gene Ontology terms
Links to other bioinformatics databases
What is a Pathway?
A connected sequence of biochemical reactions
Occurs in one organism
Conserved through evolution
Regulated as a unit
Often starts or stops at one of 13 common intermediate metabolites
MetaCyc Pathway Variants
Pathways that accomplish similar biochemical functions using different biochemical routes
Alanine biosynthesis I – E. coli
Alanine biosynthesis II – H. sapiens
Pathways that accomplish similar biochemical functions using similar sets of reactions
Several variants of TCA Cycle
MetaCyc Super-Pathways
Groups of pathways linked by common substrates
Example: Super-pathway containing
Chorismate biosynthesis
Tryptophan biosynthesis
Phenylalanine biosynthesis
Tyrosine biosynthesis
Super-pathways defined by listing their component pathways
Multiple levels of super-pathways can be defined
Pathway layout algorithms accommodate super-pathways
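A minimal sketch of how a super-pathway defined as a list of component pathways can be expanded through several levels. This is an illustration only, not Pathway Tools code: the dictionary layout, the function name base_pathways, and the two super-pathway names are assumptions made for the example.

```python
# Hypothetical definitions: each super-pathway lists its component pathways,
# and a component may itself be a super-pathway (multiple levels).
SUPER_PATHWAYS = {
    "aromatic amino acid super-pathway (illustrative)": [
        "chorismate biosynthesis",
        "tryptophan biosynthesis",
        "phenylalanine biosynthesis",
        "tyrosine biosynthesis",
    ],
    "amino acid super-pathway (illustrative)": [
        "aromatic amino acid super-pathway (illustrative)",
        "alanine biosynthesis I",
    ],
}


def base_pathways(name, super_pathways=SUPER_PATHWAYS, _active=None):
    """Expand a pathway name into the ordered list of base pathways it contains."""
    active = set() if _active is None else _active
    if name in active:
        raise ValueError(f"cycle in super-pathway definitions at {name!r}")
    components = super_pathways.get(name)
    if components is None:
        return [name]  # not a super-pathway, so it is its own base pathway
    active.add(name)
    result = []
    for component in components:
        for base in base_pathways(component, super_pathways, active):
            if base not in result:  # keep first occurrence, preserve order
                result.append(base)
    active.discard(name)
    return result
```

Calling base_pathways on the two-level example returns the four aromatic pathways followed by alanine biosynthesis I.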
Comparison with KEGG
KEGG vs MetaCyc: Reference pathway collections
KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
KEGG maps contain multiple biological pathways
Two genes chosen at random from a BioCyc pathway are more likely to be related according to genome context methods than two genes from a KEGG pathway
KEGG maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms
KEGG has no literature citations, no comments, less enzyme detail
KEGG assigns half as many reactions to pathways as MetaCyc
KEGG vs organism-specific PGDBs
KEGG does not curate or customize pathway networks for each organism
Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis
Comparison of Pathway Tools to KEGG
Inference tools
KEGG does not predict presence or absence of pathways
KEGG lacks pathway hole filler, operon predictor
Curation tools
KEGG does not distribute curation tools
No ability to customize pathways to the organism
Pathway Tools schema much more comprehensive
Visualization and analysis
KEGG does not perform automatic pathway layout
KEGG metabolic-map diagram extremely limited
No comparative pathway analysis
Pathway Tools Implementation Details
Platforms:
Macintosh, PC/Linux, and PC/Windows platforms
Same binary can run as desktop app or Web server
Production-quality software
Version control
Two regular releases per year
Extensive quality assurance
Extensive documentation
Auto-patch
Automatic DB-upgrade
480,000 lines of Lisp code
Pathway Tools Architecture
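[Architecture diagram: the Pathway/Genome Navigator (Web Mode and Desktop Mode) and the Protein, Pathway, and Reaction Editors sit on the GFP API, which is also accessible from Lisp, Perl, and Java; the GFP API sits on the Ocelot DBMS, which stores data in a disk file or in Oracle or MySQL]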
Ocelot Knowledge Server Architecture
Frame data model
Minimizes size of schema relative to semantic complexity
Schema is stored within the DB
Schema is self documenting
Slot units define metadata about slots
Domain, range, inverse
Collection type, number of values, value constraints
Comment
Schema evolution facilitated by
Easy addition/removal of slots, or alteration of slot datatypes
Flexible data formats that do not require dumping/reloading of data
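A minimal sketch of the frame data model with slot units, written in Python for illustration rather than in Ocelot's Lisp. It shows a schema stored next to the data, with each slot unit recording domain, range, inverse, and a value-count constraint. All class names, method names, and the inverse slot name "Catalyzed-By" are assumptions, not the Ocelot API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlotUnit:
    """Metadata about one slot (hypothetical, simplified)."""
    name: str
    domain: str                       # class of frames allowed to carry the slot
    value_type: type                  # stands in for the slot's range
    inverse: Optional[str] = None     # slot maintained on the referenced frame
    max_values: Optional[int] = None  # None means unbounded
    comment: str = ""


@dataclass
class Frame:
    frame_id: str
    frame_class: str
    slots: dict = field(default_factory=dict)


class TinyKB:
    """Toy knowledge base: frames and slot units are stored side by side."""

    def __init__(self):
        self.frames = {}
        self.slot_units = {}

    def add_frame(self, frame_id, frame_class):
        self.frames[frame_id] = Frame(frame_id, frame_class)

    def define_slot(self, unit):
        # Adding a slot only adds a record; existing frames are untouched.
        self.slot_units[unit.name] = unit

    def add_value(self, frame_id, slot, value):
        unit = self.slot_units[slot]
        frame = self.frames[frame_id]
        if frame.frame_class != unit.domain:
            raise TypeError(f"{slot} does not apply to class {frame.frame_class}")
        if not isinstance(value, unit.value_type):
            raise TypeError(f"{slot} expects {unit.value_type.__name__}")
        values = frame.slots.setdefault(slot, [])
        if unit.max_values is not None and len(values) >= unit.max_values:
            raise ValueError(f"{slot} allows at most {unit.max_values} value(s)")
        values.append(value)
        if unit.inverse is not None:
            # Here the value is the id of another frame; keep the inverse in sync.
            self.frames[value].slots.setdefault(unit.inverse, []).append(frame_id)
```

Because slot definitions are ordinary records, adding or removing a slot does not require dumping and reloading the frames, which is the schema-evolution point made above.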
Ocelot Storage System Architecture
Persistent storage via disk files or Oracle or MySQL
Concurrent development: Oracle or MySQL
Single-user development: disk files
Oracle/MySQL DBMS storage
DBMS is submerged within Ocelot, invisible to users
Frames transferred from DBMS to Ocelot
On demand
By background prefetcher
Memory cache
Persistent disk cache to speed performance via Internet
Transaction logging facility
Why Do We Code in Common Lisp?
Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21 2000)
The average Lisp program ran 33 times faster than the average Java program
The average Lisp program was written 5 times faster than the average Java program
Roberts compared Java and Lisp implementations of a Domain Name Server (DNS) resolver
http://www.findinglisp.com/papers/case_study_java_lisp_dns.html
The Lisp version had half as many lines of code
Common Lisp Programming Environment
Interpreted and/or compiled execution
Fabulous debugging environment
High-level language
Interactive data exploration
Extensive built-in libraries
Dynamic redefinition
Find out more!
See ALU.org or http://www.international-lisp-conference.org/
PathoLogic Processing
1. Translate source genome to PGDB form
2. Predict operons
3. Predict metabolic pathways
4. Predict pathway hole fillers
5. Transport inference parser
6. Build metabolic overview diagram
PathoLogic Step 1: Translate Genome to PGDB
[Diagram: PathoLogic Software takes an Annotated Genomic Sequence (Genes/ORFs, Gene Products, DNA Sequences) and the Multi-organism Pathway Database, MetaCyc (Pathways, Reactions, Compounds), and produces a Pathway/Genome Database (Pathways, Reactions, Compounds, Gene Products, Genes, Genomic Map)]
Integrates genome and pathway data to identify putative metabolic networks
PathoLogic Step 2: Predict Operons
Predict adjacent genes A and B in same operon based on:
Intergenic distance
Functional relatedness of A and B
Tests for functional relatedness:
A and B in same gene functional class (MultiFun)
A and B in same metabolic pathway
A codes for enzyme in a pathway and B codes for transporter involving a substrate in that pathway
A and B are monomers in same protein complex
Correctly predicts 80% of E. coli transcription units
Marks predicted operons with computational evidence codes
Bioinformatics 20:709-17 2004
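A rough sketch of pairwise operon prediction from the two signals named on the Predict Operons slide: distance between adjacent genes and functional relatedness. The Gene fields, the two distance cutoffs, and the rule for combining the signals are assumptions chosen for illustration; the published PathoLogic predictor is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int
    end: int
    strand: str
    functional_class: str = ""                  # e.g. a MultiFun class
    pathways: frozenset = frozenset()           # pathways the product takes part in
    complexes: frozenset = frozenset()          # complexes the product is a monomer of
    enzyme_substrates: frozenset = frozenset()  # substrates of the enzyme's pathway
    transported: frozenset = frozenset()        # substrates moved, if a transporter


SHORT_GAP = 50   # hypothetical cutoff (bp): close enough on distance alone
LONG_GAP = 200   # hypothetical cutoff (bp): allowed only with functional support


def functionally_related(a, b):
    """Apply the four relatedness tests listed on the slide."""
    if a.functional_class and a.functional_class == b.functional_class:
        return True
    if a.pathways & b.pathways:
        return True
    if (a.enzyme_substrates & b.transported) or (b.enzyme_substrates & a.transported):
        return True
    return bool(a.complexes & b.complexes)


def same_operon(a, b):
    """Decide whether adjacent genes a (upstream) and b share an operon."""
    if a.strand != b.strand:
        return False
    gap = b.start - a.end
    if gap <= SHORT_GAP:
        return True
    return gap <= LONG_GAP and functionally_related(a, b)


def predict_operons(genes):
    """Chain adjacent same-operon pairs into operons; returns lists of gene names."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons and same_operon(operons[-1][-1], gene):
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return [[g.name for g in operon] for operon in operons]
```

In a real PGDB each predicted operon would also carry a computational evidence code, as the slide notes.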
PathoLogic Step 3: Prediction of Metabolic Pathways
Infer reaction complement of organism
Match enzymes in source genome to MetaCyc reactions they catalyze
Match enzyme names and EC numbers to MetaCyc
Support user in manually matching additional enzymes
Computationally predict which MetaCyc metabolic pathways are present
For each MetaCyc pathway, evaluate which of its reactions are catalyzed by the organism
Match Enzymes to Reactions
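[Flowchart: each gene product is matched against MetaCyc by EC number or by name (example: 5.1.3.2, UDP-glucose-4 epimerase, UDP-D-glucose ↔ UDP-galactose). A matched product is assigned to its reaction. An unmatched product that is a probable enzyme (name ends in -ase) goes to manual search and is either assigned or marked “can’t assign”; otherwise it is treated as not a metabolic enzyme.]
2057 proteins matched by EC#, 314 matched by name
[Other counts shown in the figure: 1320, 625]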
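The matching flow can be sketched as a lookup by EC number, then by normalized name, with an "-ase" check for products that match neither. This is an illustrative sketch: the function names, the result labels, and the reaction identifier EXAMPLE-RXN-1 are hypothetical, and the real name matcher is more elaborate.

```python
import re


def normalize(name):
    """Lowercase a product name and collapse punctuation to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def match_enzyme(product_name, ec_number, reactions_by_ec, reactions_by_name):
    """Return (status, reaction_id) for one annotated gene product."""
    if ec_number and ec_number in reactions_by_ec:
        return ("matched-by-ec", reactions_by_ec[ec_number])
    key = normalize(product_name)
    if key in reactions_by_name:
        return ("matched-by-name", reactions_by_name[key])
    if re.search(r"ase\b", key):
        # Probable enzyme with no automatic match: hand to the curator.
        return ("manual-review", None)
    return ("not-metabolic-enzyme", None)


# Hypothetical one-entry lookup tables built from a reference pathway DB.
REACTIONS_BY_EC = {"5.1.3.2": "EXAMPLE-RXN-1"}
REACTIONS_BY_NAME = {normalize("UDP-glucose 4-epimerase"): "EXAMPLE-RXN-1"}
```

For example, match_enzyme("UDP-glucose-4 epimerase", None, REACTIONS_BY_EC, REACTIONS_BY_NAME) matches by name because both spellings normalize to the same key.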
Import Pathways
[Flowchart labels: MetaCyc; reactions; containing pathways; Import All; Prune? (yes: delete, no: keep); Manual Review (yes: delete, no: keep)]
Pathway Prediction
Prediction is hard because
Enzyme naming is irregular
Some reactions present in multiple pathways
Pathway variants share many reactions in common
MetaCyc now has many pathways
Pathway Scoring Criteria
Imported pathways must satisfy:
Pathways outside their taxonomic range must have enzymes for all reactions
If any reactions in a pathway are designated as “key,” an enzyme must be present for at least one
Pathway P is imported if any conditions satisfied:
One unique enzyme present for P
P missing at most one reaction
More reactions present than absent for P
P is not a superset of another pathway with the same number of enzymes present
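The scoring criteria can be read as two mandatory checks followed by an any-of test. The sketch below encodes that reading. It is an interpretation, not the PathoLogic implementation: the function and parameter names are hypothetical, "unique enzyme" is taken to mean an enzyme for a reaction found in no other pathway, and the superset criterion is left out because it needs a comparison across pathways.

```python
def should_import(reactions, key_reactions, present, unique_reactions, in_taxonomic_range):
    """Decide whether to import one pathway.

    reactions          -- all reaction ids in the pathway
    key_reactions      -- reaction ids designated as "key"
    present            -- reaction ids with an enzyme in the organism
    unique_reactions   -- reaction ids that occur in no other pathway
    in_taxonomic_range -- True if the organism is in the pathway's expected range
    """
    reactions = set(reactions)
    present_here = reactions & set(present)
    missing = reactions - present_here

    # Mandatory: outside its taxonomic range, every reaction needs an enzyme.
    if not in_taxonomic_range and missing:
        return False
    # Mandatory: if key reactions exist, at least one must have an enzyme.
    keys = reactions & set(key_reactions)
    if keys and not (keys & present_here):
        return False

    # Any one of these is enough (superset criterion omitted, see lead-in).
    has_unique_enzyme = bool(present_here & set(unique_reactions))
    return (
        has_unique_enzyme
        or len(missing) <= 1
        or len(present_here) > len(missing)
    )
```

A pathway with three of four reactions present is imported under this reading unless its only key reaction is the missing one.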
Pathway Evidence Report
PathoLogic Step 4: Pathway Hole Filler
Definition: Pathway Holes are reactions in metabolic pathways for which no enzyme is identified
[Pathway diagram with holes marked: L-aspartate → (1.4.3.-) → iminoaspartate → (quinolinate synthetase, nadA) → quinolinate → (n.n. pyrophosphorylase, nadC) → nicotinate nucleotide → (2.7.7.18) → deamido-NAD → (6.3.5.1; NAD+ synthetase, NH3, CC3619) → NAD]
Step 1: Query UniProt for all sequences having EC# of pathway hole
Step 2: BLAST against target genome
[Figure: sequences of enzyme A from organisms 1 through 8 are each used as a BLAST query against the target genome]
Step 3 & 4: Consolidate hits and evaluate evidence
7 queries have high-scoring hits to sequence Y
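A minimal sketch of the consolidation step only: given BLAST hits from several query sequences, count how many distinct queries hit each target gene. The tuple layout, the E-value cutoff, and the ranking rule are assumptions for illustration; the evidence evaluation that yields the probabilities reported for hole fillers is not reproduced.

```python
from collections import defaultdict


def consolidate_hits(hits, max_evalue=1e-10):
    """Group hits by target gene and rank candidates.

    hits -- iterable of (query_id, target_gene, evalue) tuples
    Returns a list of dicts sorted by number of supporting queries,
    then by best (smallest) E-value.
    """
    best = defaultdict(dict)  # target gene -> {query id: best E-value}
    for query_id, target_gene, evalue in hits:
        if evalue > max_evalue:
            continue  # drop weak hits (hypothetical cutoff)
        previous = best[target_gene].get(query_id)
        if previous is None or evalue < previous:
            best[target_gene][query_id] = evalue
    ranked = [
        {"candidate": target, "n_queries": len(queries), "best_evalue": min(queries.values())}
        for target, queries in best.items()
    ]
    ranked.sort(key=lambda row: (-row["n_queries"], row["best_evalue"]))
    return ranked
```

With eight queries of which seven hit the same gene, that gene ranks first with n_queries equal to 7, mirroring the "7 queries have high-scoring hits to sequence Y" case.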
Pathway Hole Filler
Why should hole filler find things beyond the original genome annotation?
Reverse BLAST searches more sensitive
Reverse BLAST searches find second domains
Integration of multiple evidence types
Caulobacter crescentus Pathway Holes
130 pathways containing 582 reactions
92 pathways contain 236 pathway holes
Caulobacter holes filled:
77 holes filled at P >0.9
Previous functions of candidate hole fillers:
No predicted function
Correctly assigned single function
Incorrectly assigned function
Imprecise functional assignment
BMC Bioinformatics 5:76 2004
Example Pathway
[Pathway diagram with hole-filler candidates: L-aspartate → (1.4.3.-; CC2913, P=0.99) → iminoaspartate → (quinolinate synthetase, nadA, CC2912) → quinolinate → (n.n. pyrophosphorylase, nadC, CC2915) → nicotinate nucleotide → (2.7.7.18; CC3431*, P=0.90) → deamido-NAD → (6.3.5.1; CC3619, P=0.99; annotated as NAD+ synthetase, NH3) → NAD]
CC2913: L-aspartate oxidase (wrong EC# on rxn)
CC3431: ORF
CC3619: put. NAD(+)-synthetase (multidomain)
PathoLogic Step 5: Transport Inference Parser
Problem: Write a program to query a genome annotation to compute the substrates an organism can transport
Typical genome annotations for transporters:
ATP transporter for ribose
ribose ABC transporter
D-ribose ATP transporter
ABC transporter, membrane spanning protein [ribose]
ABC transporter, membrane spanning protein [D-ribose]
Transport Inference Parser
Input: “ATP transporter of phosphonate”
Output: Structured description of transport activity
Locates most transporters in genome annotation using keyword analysis
Parse product name using a series of rules to identify:
Transported substrate, co-substrate
Influx/efflux
Energy coupling mechanism
Creates transport reaction object: phosphonate [periplasm] + H2O + ATP = phosphonate + Pi + ADP
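A toy version of rule-based parsing of transporter product names, covering only the annotation styles quoted on the Transport Inference Parser slides. The three regular expressions, the output dictionary, and the assumption that ABC/ATP transporters import from the periplasm are illustrative choices, not the rule set used by Pathway Tools.

```python
import re

# Ordered rules: the first pattern that matches supplies the substrate.
SUBSTRATE_PATTERNS = [
    r"\[([^\]]+)\]",                          # "... protein [ribose]"
    r"transporter\s+(?:for|of)\s+(.+)$",      # "ATP transporter of phosphonate"
    r"^(.+?)\s+(?:ABC|ATP)\s+transporter",    # "ribose ABC transporter"
]


def parse_transporter(product_name):
    """Return a structured description, or None if no transporter is recognized."""
    if not re.search(r"transport|permease", product_name, re.IGNORECASE):
        return None  # keyword filter: not annotated as a transporter
    substrate = None
    for pattern in SUBSTRATE_PATTERNS:
        match = re.search(pattern, product_name, re.IGNORECASE)
        if match:
            substrate = match.group(1).strip()
            break
    if substrate is None:
        return None  # left for manual review
    uses_atp = re.search(r"\b(?:ABC|ATP)\b", product_name) is not None
    reaction = None
    if uses_atp:
        # Assumed direction: influx from the periplasm, driven by ATP hydrolysis.
        reaction = f"{substrate} [periplasm] + H2O + ATP = {substrate} + Pi + ADP"
    return {
        "substrate": substrate,
        "energy_coupling": "ATP" if uses_atp else "unknown",
        "reaction": reaction,
    }
```

Names that pass the keyword filter but match no substrate rule return None, which corresponds to the transporters a curator reviews by hand.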
Transport Inference Parser
Permits symbolic computation with transport activities:
Compute transportable substrates of the cell
Compute connectivity among compartments for substrates
Facilitate reasoning about transport/metabolism connections
Draw transport cartoon in protein pages, cellular overview
Transport Inference Parser
User reviews all assignments using interactive tool that allows assignments to be revised
User also reviews transporters for which no assignment was made
Regulation
Encoding Cellular Regulation in Pathway Tools -- Goals
Facilitate curation of wide range of regulatory information within a formal ontology
Compute with regulatory mechanisms and pathways
Summary statistics, complex queries
Pattern discovery
Visualization of network components
Provide training sets for inference of regulatory networks
Interpret gene-expression datasets in the context of known regulatory mechanisms
Regulatory Interactions Supported by Pathway Tools
Substrate-level regulation of enzyme activity
Binding to proteins or small molecules (phosphorylation)
Regulation of transcription initiation
Attenuation of transcription
Regulation of translation by proteins and by small RNAs
Regulation in Pathway Tools
Editing tools
Transcription factor display window
Transcription unit display window
Regulatory Overview / Omics Viewer
Regulatory Interaction Editor
Regulatory Overview and Omics Viewer
Show regulatory relationships among gene groups
Comparative Analysis
Via Cellular Overview
Comparative genome browser
Comparative pathway table
Comparative analysis reports
Compare reaction complements
Compare pathway complements
Compare transporter complements
Information Sources
Pathway Tools User’s Guide
aic-export/pathway-tools/ptools/13.0/doc/manuals/userguide.pdf
NOTE: Location of the aic-export directory can vary across different computers
Pathway Tools Web Site
http://bioinformatics.ai.sri.com/ptools/
Publications, FAQ, programming examples, etc.
Slides from this tutorial
http://www.ai.sri.com/pkarp/talks/
BioCyc Webinars
http://biocyc.org/webinar.shtml
BioCyc and Pathway Tools Availability
BioCyc.org Web site and database files freely available to all
Pathway Tools freely available to non-profits
Macintosh, PC/Windows, PC/Linux
Symbolic Systems Biology
Definition: Global analyses of biological systems using symbolic computing
Symbolic Systems Biology
“Symbolic computing is concerned with the representation and manipulation of information in symbolic form. It is often contrasted with numeric representation.” -- R. Cameron
Examples of symbolic computation:
Symbolic algebra programs, e.g., Mathematica, Graphing Calculator
Compilers and interpreters for programming languages
Database query languages
Text analysis programs, e.g., Google
String matching for DNA and protein sequences
Artificial Intelligence methods, e.g., expert systems, symbolic logic, machine learning, natural language understanding
Symbolic Systems Biology
Concerned with different questions than quantitative systems biology
Symbolic analyses can in many cases produce answers when quantitative approaches fail because of lack of parameters or intractable mathematics
Symbolic computation is intimately dependent on the use of structured ontologies
Pathway Tools Ontology
1064 classes
Main classes such as: Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
Taxonomies for Pathways, Reactions, Compounds
205 slots
Meta-data: Creator, Creation-Date
Comment, Citations, Common-Name, Synonyms
Attributes: Molecular-Weight, DNA-Footprint-Size
Relationships: Catalyzes, Component-Of, Product
Classes, instances, slots all stored side by side in DBMS
Critiquing the Parts List
Slide thanks to Hirotada Mori (minus the banana!)
Dead End Metabolites
A small molecule C is a dead-end if:
C is produced only by SMM (small-molecule metabolism) reactions in Compartment, and no transporter acts on C in Compartment
OR
C is consumed only by SMM reactions in Compartment, and no transporter acts on C in Compartment
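The dead-end definition translates directly into set operations. The sketch below assumes a single compartment and treats each reaction as one-directional (a reversible reaction would be listed in both directions); the function name and the input layout are hypothetical.

```python
def dead_end_metabolites(reactions, transported):
    """Find dead-end metabolites in one compartment.

    reactions   -- iterable of (substrates, products) pairs for the
                   small-molecule metabolism reactions in the compartment
    transported -- compounds acted on by some transporter in the compartment
    """
    produced, consumed = set(), set()
    for substrates, products in reactions:
        consumed.update(substrates)
        produced.update(products)
    only_produced = produced - consumed   # made but never used
    only_consumed = consumed - produced   # used but never made
    return (only_produced | only_consumed) - set(transported)
```

For the network A → B, B → C, D → B with a transporter for A, the dead ends are C (produced only) and D (consumed only, with no transporter to supply it).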
Dead End Metabolites
Not yet an official part of Pathway Tools
Contact us if you’d like to use it
Reachability Analysis of Metabolic Networks
Given:
A PGDB for an organism
A set of initial metabolites
Infer:
What set of products can be synthesized by the small-molecule metabolism of the organism
Motivations:
Quality control for PGDBs
Verify that a known growth medium yields known essential compounds
Experiment with other growth media
Experiment with reaction knock-outs
Limitations
Cannot properly handle compounds required for their own synthesis
Nutrients needed for reachability may be a superset of those required for growth
Romero and Karp, Pacific Symposium on Biocomputing, 2001
Algorithm: Forward Propagation Through Production System
Each reaction becomes a production rule
Each of the 21 metabolites in the nutrient set becomes an axiom
[Diagram: the nutrient set seeds the metabolite pool; a reaction from the PGDB reaction set “fires” when its reactants (e.g., A + B → C) are in the pool, and its products are added to the pool]
Example:
Nutrients: A, B, C, E, F
A + B → W
C + D → X
E + F → Y
W + Y → Z
Produced Compounds: W, Y, Z
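Forward propagation can be sketched as a fixed-point loop: treat each reaction as a production rule, seed the metabolite pool with the nutrients, and keep firing rules until nothing new is produced. The function name and the (reactants, products) input layout are assumptions; the example data are the five nutrients and four reactions from the slide.

```python
def reachable(nutrients, reactions):
    """Return every compound reachable from the nutrient set.

    reactions -- iterable of (reactants, products) pairs
    """
    pool = set(nutrients)
    fired = True
    while fired:
        fired = False
        for reactants, products in reactions:
            # A rule fires when all reactants are in the pool and it adds something new.
            if set(reactants) <= pool and not set(products) <= pool:
                pool.update(products)
                fired = True
    return pool


NUTRIENTS = {"A", "B", "C", "E", "F"}
REACTIONS = [
    ({"A", "B"}, {"W"}),
    ({"C", "D"}, {"X"}),  # never fires: D is not available
    ({"E", "F"}, {"Y"}),
    ({"W", "Y"}, {"Z"}),
]

produced = reachable(NUTRIENTS, REACTIONS) - NUTRIENTS
```

Here produced is the set W, Y, Z; X is never made because D is absent from the nutrient set. Removing a reaction from the list before calling reachable gives a simple knock-out experiment.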
Initial Metabolite Nutrient Set (Total: 21 compounds)
Nutrients (8) (M61 Minimal growth medium): H⁺, Fe²⁺, Mg²⁺, K⁺, NH₃, SO₄²⁻, PO₄²⁻, Glucose
Nutrients (10) (Environment): Water, Oxygen, Trace elements (Mn²⁺, Co²⁺, Mo²⁺, Ca²⁺, Zn²⁺, Cd²⁺, Ni²⁺, Cu²⁺)
Bootstrap Compounds (3): ATP, NADP, CoA
Essential Compounds E. coli Total: 41 compounds
Proteins (20): Amino acids
Nucleic acids (DNA & RNA) (8): Nucleosides
Cell membrane (3): Phospholipids
Cell wall (10): Peptidoglycan precursors; Outer cell wall precursors (Lipid-A, oligosaccharides)
http://brg.ai.sri.com/ptools09/slides/Tuesday/growth-experiment-Markus-Krummenacker.txt
Flux Balance Modeling
Generate, store, and update metabolic model within Pathway Tools
Fast, accurate generation of metabolic model
Close coupling to genome and regulatory information
Extensive schema
Extensive query and visualization tools
Debug/validate model using Pathway Tools
Export to SBML and import to constraint solver for model execution
Visualize reaction flux and omics data using overviews
Copy/update multiple PGDBs to reflect alternative strains