Transcript General
Overview of the Pathway Tools Software and Pathway/Genome Databases
Introductions
BRG Staff
Peter Karp
Tomer Altman
Joe Dale
Fred Gilham
John Myers
Suzanne Paley
Markus Krummenacker
Ingrid Keseler
Ron Caspi
Alex Shearer
Carol Fulcher
Attendees
Where from, what genome?
What do you hope to get out of the tutorial?
SRI International
Bioinformatics
SRI International
Private nonprofit research institute
No permanent funding sources
1300 staff in Menlo Park
– Founded in 1946 as Stanford Research Institute
– Separated from Stanford University in 1970
– Name changed to SRI International in 1977
SRI Organization
Bioinformatics Research Group
Information and Computing Sciences
Biopharmaceuticals and Pharmaceutical Discovery
Education and Policy
Engineering Systems and Sciences
Physical Sciences
Research in the SRI Bioinformatics Research Group
BioCyc Database Collection
EcoCyc
MetaCyc
Pathway Tools
BioWarehouse
Outline for Tutorial
Monday
Introduction
Pathway/Genome Navigator
Introduction to Pathway/Genome Editors
Tuesday
PathoLogic tutorial
PathoLogic lab session – Build initial version of PGDB
Pathway hole filler lecture+lab
Wednesday
PathoLogic: Creating protein complexes, operon predictor, transport inference parser
Pathway Tools Schema
Model organism database projects
Thursday
Advanced Pathway/Genome Editors
Friday
Overviews and Omics Viewers
Comparative analysis
Structured Advanced Query Form
Metabolite Tracing
Regulation
Tutorial Goals
General familiarity with Pathway Tools goals and functionality
Ability to create, edit, and navigate a new PGDB
Create a new PGDB for the genome(s) you brought with you
Familiarity with information resources available about Pathway Tools to continue your work
SRI’s Support for Pathway Tools
NIH grant finances software development and user support
Additional grants finance other software development
Email us bug reports, suggestions, questions
Comprehensive bug reports are required for us to fix the problem you reported
Keep us posted regarding your progress
Administrative Details
Please wear badge at all times
Escort required outside this room/hallway
Let us know when you are leaving
Use E-Bldg entrance
Phone numbers to call from entrance
Meals
Restrooms
Tutorial Format
Questions welcome during presentations
Lab sessions will take different amounts of time for different people
Refine your PGDB
Read Pathway Tools manuals
Computer logins
Internet connectivity
Pathway/Genome Database
Pathways
Reactions
Proteins
RNAs
Genes
Compounds
Sequence Features
Operons
Promoters
DNA Binding Sites
Regulatory Interactions
Chromosomes
Plasmids
CELL
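As a rough illustration of how the object types listed above link together, the Python sketch below models genes, enzymes, reactions and pathways as plain records and answers "which genes encode enzymes in this pathway?" by following pathway -> reaction -> enzyme -> gene links. The class and field names are hypothetical and far simpler than the real Pathway Tools schema; only the idea of linked database objects comes from the slide.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified record types; real PGDB objects carry many more fields.
@dataclass
class Gene:
    id: str
    name: str

@dataclass
class Protein:
    id: str
    name: str
    encoded_by: list[Gene] = field(default_factory=list)

@dataclass
class Reaction:
    id: str
    ec_number: str
    enzymes: list[Protein] = field(default_factory=list)

@dataclass
class Pathway:
    id: str
    name: str
    reactions: list[Reaction] = field(default_factory=list)

def genes_of_pathway(pathway: Pathway) -> set[str]:
    """Follow pathway -> reaction -> enzyme -> gene links and collect gene names."""
    return {
        gene.name
        for rxn in pathway.reactions
        for enzyme in rxn.enzymes
        for gene in enzyme.encoded_by
    }

# Example: first two steps of E. coli tryptophan biosynthesis (IDs are made up).
trpE = Gene("G1", "trpE")
trpD = Gene("G2", "trpD")
anth_synthase = Protein("P1", "anthranilate synthase", [trpE, trpD])
anth_prt = Protein("P2", "anthranilate phosphoribosyltransferase", [trpD])
trp = Pathway("PWY-1", "tryptophan biosynthesis (first two steps)", [
    Reaction("R1", "4.1.3.27", [anth_synthase]),
    Reaction("R2", "2.4.2.18", [anth_prt]),
])
print(sorted(genes_of_pathway(trp)))  # ['trpD', 'trpE']
```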
Pathway/Genome Database (PGDB) – combines information about
Pathways, reactions, substrates
Enzymes, transporters
Genes, replicons
Transcription factors/sites, promoters, operons
BioCyc Collection of Pathway/Genome Databases
Tier 1: Literature-Derived PGDBs
MetaCyc
EcoCyc -- Escherichia coli K-12
Tier 2: Computationally-derived DBs, Some Curation -- 20 PGDBs
HumanCyc
Mycobacterium tuberculosis
Tier 3: Computationally-derived DBs, No Curation -- 349 DBs
Terminology – Pathway Tools Software
PathoLogic
Predicts operons, metabolic network, and pathway hole fillers from the genome
Computational creation of new Pathway/Genome Databases
Pathway/Genome Editors
Distributed curation of PGDBs
Distributed object database system, interactive editing tools
Pathway/Genome Navigator
WWW publishing of PGDBs
Querying, visualization of pathways, chromosomes, operons
Analysis operations
Pathway visualization of gene-expression data
Global comparisons of metabolic networks
Bioinformatics 18:S225 2002
Pathway Tools Software:
PGDBs Created Outside SRI
1000+ licensees: 75+ groups applying software to 150+ organisms
Saccharomyces cerevisiae, SGD project, Stanford University
pathway.yeastgenome.org/biocyc/
Mouse, MGD, Jackson Laboratory
dictyBase, Northwestern University
Under development:
CGD (Candida albicans), Stanford University
Drosophila, P. Ebert in collaboration with FlyBase
C. elegans, P. Ebert in collaboration with WormBase
Planned:
RGD (Rat), Medical College of Wisconsin
Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
Tomato and Potato, Cornell University
GrameneDB, Cold Spring Harbor Laboratory
Medicago truncatula, Samuel Roberts Noble Foundation
Pathway Tools Software:
PGDBs Created Outside SRI
NIAID BRCs: BioHealthBase (M. tuberculosis, F. tularensis), PATRIC, ApiDB (Cryptosporidium)
F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
V. Schachter, Genoscope, Acinetobacter
M. Bibb, John Innes Centre, Streptomyces coelicolor
G. Church, Harvard, Prochlorococcus marinus, multiple strains
E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
Herbert Chiang, Washington University, Bacteroides thetaiotaomicron
Sergio Encarnacion, UNAM, Sinorhizobium meliloti
Gregory Fournier, MIT, Mesoplasma florum
Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
Michael Göttfert, Technische Universität Dresden, Bradyrhizobium japonicum
Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472
Kenneth J. Kauffman, University of California, Riverside, Desulfovibrio vulgaris
Pathway Tools Software:
PGDBs Created Outside SRI
Mike McLeod, University of British Columbia, Rhodococcus sp. RHA1
Robert S. Munson, Children's Research Institute, Ohio, Haemophilus ducreyi, Haemophilus influenzae 86-026NP
John Nash, Canadian NRC, Campylobacter jejuni
Christopher S. Reigstad, Washington University, Escherichia coli UTI89
Haluk Resat, Pacific Northwest Lab, Rhodobacter sphaeroides
Gary Xie, Los Alamos Lab, Bacillus cereus
Large scale users:
C. Medigue, Genoscope, 107 PGDBs
G. Burger, U Montreal, 48 PGDBs
Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
Partial listing of outside PGDBs at BioCyc.org
Terminology
“Database” = “DB” = “Knowledge Base” = “KB” = “Pathway/Genome Database” = “PGDB”
Why Create PGDBs?
Extract more information from your genome
Create an up-to-date computable information repository
about an organism
Perform analyses on the genome and pathway complement
of the organism
Analyses of omics data
Analyses of cellular systems (dead-end metabolites)
Reports generated by Pathway Tools
Perform comparative analyses with other organisms
Generate a genome poster and metabolic wall chart
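To make "dead-end metabolites" concrete: a compound is a dead end if the reactions in the network only produce it or only consume it, and it is not transported across the membrane. The sketch below is a minimal version of that check, assuming every reaction runs left to right (reversibility is ignored) and that the caller supplies the set of transported compounds; the function name and data layout are hypothetical, not a Pathway Tools API.

```python
def dead_end_metabolites(reactions, transported=frozenset()):
    """Return compounds that are only produced or only consumed.

    reactions: dict mapping reaction ID -> (substrates, products), each a set.
    transported: compounds that can enter or leave the cell, so they are
    not counted as dead ends even if used on one side only.
    Assumes every reaction is irreversible (left-to-right).
    """
    produced, consumed = set(), set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    return {
        cpd
        for cpd in produced | consumed
        if cpd not in transported and (cpd not in produced or cpd not in consumed)
    }

# Toy network: A is imported; C is made but never used; D is used but never made.
reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"D"}, {"B"}),
}
print(sorted(dead_end_metabolites(reactions, transported={"A"})))  # ['C', 'D']
```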
Sequence Project Workflow
Raw Sequence -> Phred -> Phrap -> GeneMark/Glimmer -> BLAST, BLOCKS -> Pathway Tools
Pathway Tools: PathoLogic -> P/G Editors -> P/G Navigator -> WWW Publishing, Analyses
EcoCyc Project – EcoCyc.org
E. coli Encyclopedia
Review-level Model-Organism Database for E. coli
Tracks evolving annotation of the E. coli genome and cellular networks
The two paradigms of EcoCyc
“Multi-dimensional annotation of the E. coli K-12 genome”
Positions of genes; functions of gene products – 76% / 66% exp
Gene Ontology terms; MultiFun terms
Gene product summaries and literature citations
Evidence codes
Multimeric complexes
Metabolic pathways
Regulation of transcription initiation
Karp, Gunsalus, Collado-Vides, Paulsen
Nuc. Acids Res. 35:7577 2007
ASM News 70:25 2004
Science 293:2040
Paradigm 1:
EcoCyc as Textual Review Article
All gene products for which experimental literature exists are curated with a minireview summary
Found on protein and RNA pages, not gene pages!
3257 gene products contain summaries
Summaries cover function, interactions, mutant
phenotypes, crystal structures, regulation, and more
Additional summaries found on operon and pathway pages
EcoCyc cites 15,880 publications
Paradigm 2: EcoCyc as
Computational Symbolic Theory
Highly structured, high-fidelity knowledge representation provides computable information
Each molecular species defined as a DB object
Genes, proteins, small molecules
Each molecular interaction defined as a DB object
Metabolic reactions
Transport reactions
Transcriptional regulation of gene expression
220 database fields capture extensive properties
and relationships
EcoCyc Procedures
DB updates performed by 5 staff curators
Information gathered from biomedical literature
Enter data into structured database fields
Author extensive summaries
Update evidence codes
Corrections submitted by E. coli researchers
Four releases per year
Quality assurance of data and software
Evaluate database consistency constraints
Perform element balancing of reactions
Run other checking programs
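Element balancing, in its simplest form, means counting each element on both sides of a reaction, weighted by its stoichiometric coefficient, and flagging any difference. The sketch below assumes plain molecular formulas without parentheses, charges or isotopes; a production checker must also handle those, along with protons and water. The function names are illustrative only, not the checking programs EcoCyc actually runs.

```python
import re
from collections import Counter

# One element symbol (capital letter, optional lowercase) plus an optional count.
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, n in FORMULA_TOKEN.findall(formula):
        counts[element] += int(n) if n else 1
    return counts

def side_totals(side):
    """side: list of (coefficient, formula) pairs for one side of a reaction."""
    total = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coeff * n
    return total

def imbalance(substrates, products):
    """Return {element: products - substrates} for every unbalanced element."""
    left, right = side_totals(substrates), side_totals(products)
    return {e: right[e] - left[e] for e in set(left) | set(right) if right[e] != left[e]}

# Hexokinase with neutral formulas: glucose + ATP -> glucose-6-phosphate + ADP
print(imbalance([(1, "C6H12O6"), (1, "C10H16N5O13P3")],
                [(1, "C6H13O9P"), (1, "C10H15N5O10P2")]))  # {}
```

An empty result means the reaction is balanced; a missing product shows up as negative counts for the atoms it would have carried.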
MetaCyc: Metabolic Encyclopedia
Describe a representative sample of every experimentally
determined metabolic pathway
Describe properties of metabolic enzymes
Literature-based DB with extensive references and
commentary
Pathways, reactions, enzymes, substrates
Jointly developed by
P. Karp, R. Caspi, C. Fulcher, SRI International
L. Mueller, A. Pujar, Cornell Univ
S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2008
MetaCyc Data -- Version 11.6
Pathways: 1,010
Reactions: 6,576
Enzymes: 4,582
Small Molecules: 6,561
Organisms: 1,077
Citations: 15,875
Taxonomic Distribution of MetaCyc Pathways
Bacteria: 517
Green Plants: 372
Mammals: 90
Fungi: 89
Archaea: 65
Family of Pathway/Genome Databases
EcoCyc
MetaCyc
CauloCyc
AraCyc
MtbRvCyc
HumanCyc
Comparison of BioCyc to KEGG: The Data
KEGG approach: Static collection of pathway diagrams that
are color-coded to produce organism-specific views
KEGG vs MetaCyc: Resource on literature-derived pathways
KEGG pathway maps are composites of pathways from many organisms; they do not identify which specific pathways were elucidated in which organisms
KEGG pathway maps encompass multiple biological pathways and are 2-4 times the size of MetaCyc pathways
KEGG has no literature citations, no summaries, less enzyme detail
KEGG vs BioCyc organism-specific PGDBs
KEGG re-annotates entire genome for each organism
KEGG does not curate or customize pathway networks for each organism
Comparison of Pathway Tools to KEGG: The Software
KEGG has no pathway hole filler, transport inference parser, or operon predictor
KEGG has no interactive editing tools – you cannot refine a KEGG pathway DB
KEGG has no algorithmic visualization tools – pathway diagrams are pre-drawn
May become out of date
Cannot show pathways at multiple detail levels
KEGG genome browser has very limited functionality
KEGG has one overview diagram with limited functionality
KEGG has no metabolite tracing tool
KEGG has no Structured Advanced Query Form
Overviews and Omics Viewers
Genome-scale Visualizations
Metabolic map
Transcriptional regulatory network
Genome map
Overlay gene expression, proteomics, metabolomics data
Obtain pathway-based visualizations of omics data
Numerical spectrum of expression values mapped to a color spectrum
Steps of overview painted with color corresponding to expression level(s)
of genes that encode enzyme(s) for that step
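The color mapping described above can be sketched as a linear interpolation from blue (low) to red (high), with values clamped to a chosen range. How one color is chosen when several genes encode the enzyme(s) for a step is an assumption here (the maximum value is used); the blue-to-red scale, the range and the helper names are hypothetical, not the Omics Viewer's actual settings.

```python
def value_to_rgb(value, lo, hi):
    """Map value onto a blue (lo) -> red (hi) spectrum, clamping out-of-range values."""
    if not lo < hi:
        raise ValueError("lo must be smaller than hi")
    t = (min(max(value, lo), hi) - lo) / (hi - lo)  # 0.0 at lo, 1.0 at hi
    return (round(255 * t), 0, round(255 * (1 - t)))

def step_color(step, step_genes, expression, lo, hi):
    """Color one overview step from the genes encoding its enzyme(s).

    step_genes: step ID -> list of gene IDs; expression: gene ID -> value.
    Assumption: the highest value among those genes sets the color.
    Returns None when no gene for the step has data (step left uncolored).
    """
    values = [expression[g] for g in step_genes.get(step, []) if g in expression]
    if not values:
        return None
    return value_to_rgb(max(values), lo, hi)

# Hypothetical log2 expression ratios, colored on a -2..2 scale.
expression = {"geneA": 1.5, "geneB": -0.5}
step_genes = {"STEP-1": ["geneA", "geneB"], "STEP-2": ["geneC"]}
print(step_color("STEP-1", step_genes, expression, -2, 2))
print(step_color("STEP-2", step_genes, expression, -2, 2))  # None
```

Clamping keeps a few extreme values from compressing the rest of the data into a narrow band of colors.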
Environment for Computational Exploration of Genomes
Powerful ontology opens many facets of the biology to computational exploration
Global characterization of metabolic network
Analysis of interface between transport and metabolism
Nutrient analysis of metabolic network
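One common way to approach nutrient analysis is forward propagation (often called network expansion): start from the nutrient compounds and repeatedly fire every reaction whose substrates are all available, adding its products, until nothing new appears. This is a generic technique, not necessarily the algorithm Pathway Tools uses; the sketch assumes irreversible reactions and ignores cofactors (such as ATP) that a real analysis would also seed.

```python
def reachable_compounds(nutrients, reactions):
    """Forward propagation from a nutrient set.

    reactions: list of (substrates, products) pairs of sets, assumed irreversible.
    Returns every compound producible from the nutrients.
    """
    available = set(nutrients)
    changed = True
    while changed:  # repeat passes until a full pass adds nothing
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

# Listed out of order on purpose: C -> D can only fire after A + B -> C.
rxns = [
    ({"C"}, {"D"}),
    ({"A", "B"}, {"C"}),
    ({"E"}, {"F"}),
]
print(sorted(reachable_compounds({"A", "B"}, rxns)))  # ['A', 'B', 'C', 'D']
```

Compounds the organism needs but that never become reachable point to missing nutrients or gaps in the reconstructed network.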
Pathway Tools Implementation Details
Allegro Common Lisp
Sun, Linux, Windows, Macintosh platforms
Ocelot object database
370,000+ lines of code
Lisp-based WWW server at BioCyc.org
Manages 370+ PGDBs
The Common Lisp Programming Environment
Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21 2000)
Survey
Please complete survey at end of each day
PGDB(s) That You Build
Before you leave
Tar up your PGDB directory and FTP it home, email it home, or copy it to a flash drive
We will create a backup copy of your PGDB directory if the
directory is still there at the end of the tutorial
Delete the PGDB directory if you don’t want us to back it up
We will not give the backed up data to anyone else
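For the "tar up your PGDB directory" step, a Python equivalent of `tar czf` is sketched below using the standard `tarfile` module. The directory path in the usage line is a placeholder (PGDB locations vary by installation); point it at your own PGDB directory.

```python
import tarfile
from pathlib import Path

def archive_pgdb(pgdb_dir, out_file):
    """Write a gzip-compressed tar archive of pgdb_dir and return its path."""
    pgdb_dir = Path(pgdb_dir)
    if not pgdb_dir.is_dir():
        raise NotADirectoryError(pgdb_dir)
    with tarfile.open(out_file, "w:gz") as tar:
        # arcname stores entries under the directory's own name, not its full path
        tar.add(pgdb_dir, arcname=pgdb_dir.name)
    return Path(out_file)

# Placeholder path: substitute the real location of your PGDB directory.
# archive_pgdb("/path/to/your/pgdb-directory", "mypgdb.tar.gz")
```

The resulting `.tar.gz` file can then be copied, emailed or transferred like any other single file.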
Information Sources
Pathway Tools User’s Guide
/root/aic-export/pathway-tools/ptools/11.5/doc/manuals/userguide.pdf
NOTE: Location of the aic-export directory can vary across different computers
Pathway Tools Web Site
Publications, FAQ, programming examples, etc.
http://bioinformatics.ai.sri.com/ptools/
BioCyc Publications Page
http://biocyc.org/publications.shtml
MetaCyc Guide
http://metacyc.org/MetaCycUserGuide.shtml
Slides from this tutorial
http://bioinformatics.ai.sri.com/ptools/tutorial/
BioCyc Webinars
http://biocyc.org/webinar.shtml
Reporting Pathway Tools Problems
[email protected]
Tell us:
What platform you are running on
What version of Pathway Tools you are running
The error message
Result of typing :zoom :count :all at the Lisp debugger prompt, e.g.
[1] EC(2): :zoom :count :all
What operation were you performing when the error occurred?
Auto-Patch: New patches are automatically downloaded and loaded when PTools starts up
Tools -> Instant Patch -> Download and Activate All Patches
Summary
Pathway Tools and Pathway/Genome Databases
Not just for pathways!
Computational inferences
Operons, metabolic pathways, pathway hole fillers
Editing tools
Analysis tools: Omics data on pathways
Web publishing of PGDBs
Main classes of users:
Develop PGDB to extract more information from genome for genome paper
Develop a model-organism DB for the organism that is updated regularly and published on the web