Introduction to the Pathway Tools Software and BioCyc Database Collection
MetaCyc Family of Pathway/Genome Databases
SRI International
Bioinformatics
2,500+ databases from multiple institutions
Cover all domains of life with microbial emphasis
All DBs derived from MetaCyc via computational pathway prediction
Common schema
Common controlled vocabularies
Common methodologies
Curated Databases Within the MetaCyc Family
Database    Organism        Organization       Curated From (publications)
MetaCyc     Multiorganism   SRI                34,000
EcoCyc      E. coli         SRI                23,000
HumanCyc    H. sapiens      SRI
AraCyc      A. thaliana     Carnegie Instit.   2,282
YeastCyc    S. cerevisiae   Stanford Univ      565
MouseCyc    M. musculus     Jackson Labs
BioCyc Collection of 1,700 Pathway/Genome Databases
Pathway/Genome Database (PGDB) – combines information about
Pathways, reactions, substrates
Enzymes, transporters
Genes, replicons
Transcription factors/sites, promoters, operons
Tier 1: Literature-Derived PGDBs
MetaCyc, HumanCyc, YeastCyc
EcoCyc -- Escherichia coli K-12
AraCyc – Arabidopsis thaliana
Tier 2: Computationally-derived DBs, Some Curation -- 34 PGDBs
Bacillus subtilis, Mycobacterium tuberculosis
Tier 3: Computationally-derived DBs, No Curation -- The remainder
Pathway/Genome Database
[Figure: a CELL described by linked PGDB object types: Pathways, Reactions, Proteins, RNAs, Genes, Compounds, Sequence Features, Operons, Promoters, DNA Binding Sites, Regulatory Interactions, Chromosomes, Plasmids]
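The way these object types link to one another can be sketched in a few lines of Python. This is only an illustration: the class names, fields, and the toy genes and reactions below are hypothetical and are not the Pathway Tools schema, which is far richer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gene:
    name: str
    replicon: str  # chromosome or plasmid that carries the gene


@dataclass
class Protein:
    name: str
    gene: Gene  # gene that encodes this protein


@dataclass
class Reaction:
    name: str
    substrates: List[str]
    products: List[str]
    enzymes: List[Protein] = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    reactions: List[Reaction]

    def genes(self) -> List[str]:
        """Follow pathway -> reaction -> enzyme -> gene links."""
        return sorted({e.gene.name for r in self.reactions for e in r.enzymes})


# Hypothetical toy data, not taken from any real PGDB.
gene_a = Gene("geneA", replicon="chromosome")
gene_b = Gene("geneB", replicon="plasmid-1")
r1 = Reaction("R1", ["A"], ["B"], [Protein("EnzA", gene_a)])
r2 = Reaction("R2", ["B"], ["C"], [Protein("EnzB", gene_b)])
pathway = Pathway("toy-pathway", [r1, r2])
print(pathway.genes())
```

Because every object points at its neighbors, a query such as "which genes does this pathway depend on" becomes a walk across links rather than a join across separate files.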
Pathway Tools Software: PGDBs Created Outside SRI
3,000+ licensees: 250+ groups applying software to 1,700 organisms
Saccharomyces cerevisiae, SGD project, Stanford University
135 pathways / 565 publications – BioCyc.org
FungiCyc, Broad Institute
Candida albicans, CGD project, Stanford University
dictyBase, Northwestern University
Mouse, MGD, Jackson Laboratory -- BioCyc.org
Drosophila, FlyBase, Harvard University -- BioCyc.org
Under development: C. elegans, WormBase
Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
288 pathways / 2282 publications – BioCyc.org
ChlamyCyc, GoFORSYS
PlantCyc, Carnegie Institution of Washington
Six Solanaceae species, Cornell University
GrameneDB, Cold Spring Harbor Laboratory
Medicago truncatula, Samuel Roberts Noble Foundation
Pathway Tools Software: PGDBs Created Outside SRI
G. Serres, MBL, Shewanella oneidensis
M. Bibb, John Innes Centre, Streptomyces coelicolor
TBDB Project, Mycobacterium tuberculosis
F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
Genoscope, Acinetobacter
R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
Sergio Encarnacion, UNAM, Sinorhizobium meliloti
Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
Pathway Tools Software:
PGDBs Created Outside SRI
Large scale users:
C. Medigue, Genoscope, 500+ PGDBs
J. Zucker, Broad Inst, 94 PGDBs
G. Sutton, J. Craig Venter Institute, 80+ PGDBs
G. Burger, U Montreal, 60+ PGDBs
E. Uberbacher, ORNL, 33 Bioenergy-related organisms
Bart Weimer, UC Davis, Lactococcus lactis, Brevibacterium linens,
Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii,
Listeria monocytogenes
Partial listing of outside PGDBs at http://biocyc.org/otherpgdbs.shtml
Pathway Tools Software
Comprehensive software environment spanning computational genomics and systems biology
Create and maintain an organism database integrating genome, pathway, regulatory information
Computational inference tools
Interactive editing tools
Query and visualize that database
Interpret genome-scale datasets
Comparative analysis tools
Generate flux-balance models
Pathway Tools Software
[Figure: component diagram showing Annotated Genome, PathoLogic, Pathway/Genome Database, Pathway/Genome Editors, Pathway/Genome Navigator, and Genome-Scale Flux Model]
Briefings in Bioinformatics 11:40-79 2010
Pathway Tools Software: PathoLogic
Computational creation of new Pathway/Genome Databases
Transforms genome into Pathway Tools schema and layers inferred information above the genome
Predicts operons
Predicts metabolic network
Predicts which genes code for missing enzymes in metabolic pathways
Infers transport reactions from transporter names
Bioinformatics 18:S225 2002
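To make the idea of predicting a metabolic network and then looking for missing enzymes concrete, here is a deliberately simplified toy in Python. It is not PathoLogic's algorithm: the scoring rule (fraction of a reference pathway's reactions that have an annotated enzyme), the 0.5 threshold, and all pathway and reaction names are assumptions made for illustration.

```python
def predict_pathways(reference, annotated, threshold=0.5):
    """Toy pathway prediction.

    reference: dict mapping pathway name -> list of reaction names
    annotated: set of reactions for which the genome has an enzyme
    Returns predicted pathways with their score and their "holes"
    (reactions in a predicted pathway that lack an annotated enzyme).
    """
    predictions = {}
    for name, reactions in reference.items():
        found = [r for r in reactions if r in annotated]
        score = len(found) / len(reactions)
        if score >= threshold:
            holes = [r for r in reactions if r not in annotated]
            predictions[name] = {"score": score, "holes": holes}
    return predictions


# Hypothetical reference pathways and genome annotation.
reference = {
    "pathway-1": ["R1", "R2", "R3", "R4"],
    "pathway-2": ["R5", "R6", "R7"],
}
annotated = {"R1", "R2", "R4", "R7"}
print(predict_pathways(reference, annotated))
```

In this toy run, pathway-1 is predicted present with one hole (R3), which is the kind of reaction a hole filler would then try to assign a gene to; pathway-2 falls below the threshold and is left out.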
Pathway Tools Software: Pathway/Genome Editors
Interactively update PGDBs with graphical editors
Support geographically distributed teams of curators with object database system
Gene editor
Protein editor
Reaction editor
Compound editor
Pathway editor
Operon editor
Publication editor
What is Curation?
Ongoing updating and refinement of a PGDB
Correcting false-positive and false-negative predictions
Incorporating information from experimental literature
Authoring of comments and citations
Updating database fields
Gene positions, names, synonyms
Protein functions, activators, inhibitors
Addition of new pathways, modification of existing pathways
Defining TF binding sites, promoters, regulation of transcription initiation and other processes
Pathway Tools Software: Pathway/Genome Navigator
Querying and visualization of:
Pathways
Reactions
Metabolites
Proteins
Genes
Chromosomes
Two modes of operation:
Web mode
Desktop mode
Most functionality shared, but each has unique functionality
Pathway Tools Ontology / Schema
Ontology classes: 1621
Datatype classes: Define objects from genomes to pathways
Classification systems for pathways, chemical compounds, enzymatic reactions (EC system)
Protein Feature ontology
Controlled vocabularies:
Cell Component Ontology
Evidence codes
Comprehensive set of 248 attributes and relationships
What is a Pathway?
A connected sequence of biochemical reactions
Occurs in one organism
Conserved through evolution
Regulated as a unit
Starts or stops at one of 13 common intermediate metabolites
Comparison of BioCyc to KEGG
KEGG approach: A static collection of reference pathway diagrams is color-coded to produce organism-specific views
KEGG vs MetaCyc: Resource on literature-derived pathways
KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
KEGG maps contain multiple biological pathways
KEGG maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms
KEGG has no literature citations, no comments, less enzyme detail
KEGG vs BioCyc organism-specific PGDBs
KEGG does not curate or customize pathway networks for each organism
Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis
KEGG re-annotates entire genome for each organism
Comparison of Pathway Tools to KEGG
Inference tools
KEGG does not predict presence or absence of pathways
KEGG lacks pathway hole filler, operon predictor
Curation tools
KEGG does not distribute curation tools
No ability to customize pathways to the organism
Pathway Tools schema much more comprehensive
Visualization and analysis
KEGG does not perform automatic pathway layout
No comparative pathway analysis
Pathway Tools Implementation Details
Allegro Common Lisp
PC/Windows, Linux, Macintosh platforms
Ocelot object database
600,000+ lines of code
Lisp-based WWW server at BioCyc.org
Manages 1,100+ PGDBs
EcoCyc iPhone App
Available in iTunes store
Free
Look up gene information while traveling, at a conference, or in the library
Automated Generation of Metabolic Flux Models from PGDBs
Joint work with Mario Latendresse
Flux-Balance Analysis
Steady state, constraint-based quantitative models of metabolism
Starting information for organism of interest:
Nutrients
Secretions
Metabolic Reaction List
Biomass
[Figure: example reaction network over metabolites A, B, C, D, and X]
Flux Balance Models
Submit to linear optimization package
Optimize biomass production, ATP production, etc
Results
Steady-state reaction fluxes for the metabolic network
Remove reactions from the model to predict knock-out phenotypes
Supply alternative nutrient sets to predict growth phenotypes
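The optimization behind these bullets can be written compactly. The following is the standard textbook form of flux-balance analysis rather than anything specific to Pathway Tools, and the symbols are introduced here for illustration: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c the objective weights (for example, 1 on the biomass reaction and 0 elsewhere), and l, u the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0 \qquad \text{(steady state: no metabolite accumulates)} \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j
\end{aligned}
```

In this notation, a knock-out of reaction k corresponds to setting l_k = u_k = 0 and re-solving, and an alternative nutrient set corresponds to changing the bounds on the uptake fluxes. A predicted no-growth phenotype is one where the optimal biomass flux drops to zero.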
Approach: Derive FBA Models from PGDBs
Store and update metabolic model within Pathway Tools
The PGDB is the model
All query and visualization tools applicable to FBA model
FBA model is tightly coupled to genome and regulatory information
Export to constraint solver for model execution/solving
Reaction balance checking
Dead-end metabolite analysis
Visualize reaction flux using cellular overview
Multiple gap filling
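Two of the checks listed above, reaction balance checking and dead-end metabolite analysis, are simple enough to sketch in Python. This is a minimal illustration under stated assumptions, not the Pathway Tools implementation: reactions are treated as irreversible, each is a dict of metabolite to signed coefficient (negative for consumed, positive for produced), and the metabolite names and formulas are made up.

```python
from collections import Counter


def is_balanced(reaction, formulas):
    """True when every element has the same atom count on both sides."""
    totals = Counter()
    for metabolite, coeff in reaction.items():
        for element, count in formulas[metabolite].items():
            totals[element] += coeff * count
    return all(value == 0 for value in totals.values())


def dead_end_metabolites(reactions, boundary=()):
    """Metabolites that are only produced or only consumed.

    boundary: metabolites allowed to enter or leave the model
    (nutrients and secretions), which are not counted as dead ends.
    """
    produced, consumed = set(), set()
    for stoich in reactions.values():
        for metabolite, coeff in stoich.items():
            (produced if coeff > 0 else consumed).add(metabolite)
    return sorted((produced ^ consumed) - set(boundary))


# Hypothetical formulas: one A splits into two B.
formulas = {
    "A": {"C": 6, "H": 12, "O": 6},
    "B": {"C": 3, "H": 6, "O": 3},
}
print(is_balanced({"A": -1, "B": 2}, formulas))
print(is_balanced({"A": -1, "B": 1}, formulas))

# Hypothetical network: D is made by R3 but nothing uses or secretes it.
reactions = {
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "R3": {"B": -1, "D": 1},
}
print(dead_end_metabolites(reactions, boundary={"A", "C"}))
```

Both checks only read the reaction list, which is why they can run directly against the PGDB before any solver is involved.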
Multiple Gap Filling of FBA Models
Reaction gap filling (Kumar et al, BMC Bioinf 2007 8:212):
Reverse directionality of selected reactions
Add a minimal number of reactions from MetaCyc to the model to enable a solution
Reaction cost is a function of reaction taxonomic range
Metabolite gap filling: Postulate additional nutrients and secretions
Partial solutions: Identify maximal subset of biomass components for which model can yield positive production rates
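The flavor of reaction gap filling and partial solutions can be shown with a small Python toy. Note the substitution: instead of the optimization-based formulation cited above, this sketch uses plain reachability (a metabolite counts as producible if some chain of reactions reaches it from the nutrients) and an exhaustive search over candidate subsets, which only works for tiny examples. The reaction names, costs, and metabolites are hypothetical; the cost field merely stands in for a taxonomic-range-based cost.

```python
from itertools import combinations


def producible(reactions, nutrients):
    """Network expansion: fire reactions whose substrates are all available."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def reachable_biomass(model, nutrients, biomass):
    """Partial solution: biomass components the current model can already make."""
    return sorted(set(biomass) & producible(model, nutrients))


def gap_fill(model, candidates, nutrients, biomass):
    """Cheapest set of candidate reactions that makes all biomass producible.

    candidates: dict name -> (substrates, products, cost)
    Returns (total_cost, [names]) or None if no subset works.
    """
    best = None
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            added = [candidates[n][:2] for n in subset]
            if set(biomass) <= producible(model + added, nutrients):
                cost = sum(candidates[n][2] for n in subset)
                if best is None or cost < best[0]:
                    best = (cost, list(subset))
    return best


# Hypothetical model with a gap between B and C.
model = [(("A",), ("B",)), (("C",), ("D",))]
candidates = {
    "X1": (("B",), ("C",), 1.0),  # cheap, e.g. common in related organisms
    "X2": (("A",), ("D",), 5.0),  # expensive shortcut
    "X3": (("B",), ("E",), 1.0),  # irrelevant to biomass
}
print(reachable_biomass(model, {"A"}, ["B", "D"]))
print(gap_fill(model, candidates, {"A"}, ["D"]))
```

Here the model can already make B but not D, and the search picks the cheap candidate X1 over the expensive shortcut X2, mirroring the idea that reaction cost steers which reactions are added.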
Downloading Pathway Tools
Obtain license
http://biocyc.org/download.shtml
Download directory offers several configurations
Choose platform and database configuration
Many combinations of databases available
The configuration with all databases requires a lot of memory
Use the registry to add PGDBs to the configuration you downloaded
Information Sources
Pathway Tools User’s Guide
aic-export/pathway-tools/ptools/14.0/doc/manuals/userguide.pdf
NOTE: Location of the aic-export directory can vary across different
computers
Pathway Tools Web Site
http://bioinformatics.ai.sri.com/ptools/
Publications, FAQ, programming examples, etc.
Slides from this tutorial
http://bioinformatics.ai.sri.com/ptools/tutorial/sessions/
BioCyc Webinars
http://biocyc.org/webinar.shtml
Desktop vs Web functionality in Pathway Tools
http://biocyc.org/desktop-vs-web-mode.shtml
Information Sources
Publications
“Pathway Tools version 13.0: Integrated Software for Pathway/Genome Informatics and Systems Biology”, Briefings in Bioinformatics 11:40-79 2010
“A survey of metabolic databases emphasizing the MetaCyc family”, Archives of Toxicology 2011
Information Sources
BioCyc Web site: Help Menu
Basic Help
Search Help
BioCyc Glossary
Publications
Website User Guide
PGDB Concepts
Guide to EcoCyc
Guide to MetaCyc