Overview of the Pathway Tools Software and Pathway/Genome Databases
Introductions

BRG Staff
 Peter Karp
 Tomer Altman
 Joe Dale
 Fred Gilham
 John Myers
 Suzanne Paley
 Markus Krummenacker
 Ingrid Keseler
 Ron Caspi
 Alex Shearer
 Carol Fulcher

Attendees
 Where from, what genome?
 What do you hope to get out of the tutorial?
SRI International
Bioinformatics
SRI International

Private nonprofit research institute

No permanent funding sources

1300 staff in Menlo Park
– Founded in 1946 as Stanford Research Institute
– Separated from Stanford University in 1970
– Name changed to SRI International in 1977
SRI Organization
Bioinformatics Research Group
Information and Computing Sciences
Biopharmaceuticals and Pharmaceutical Discovery
Education and Policy
Engineering Systems and Sciences
Physical Sciences
Research in the SRI Bioinformatics Research Group
 BioCyc Database Collection
 EcoCyc
 MetaCyc
 Pathway Tools
 BioWarehouse
Outline for Tutorial





Monday
 Introduction
 Pathway/Genome Navigator
 Introduction to Pathway/Genome Editors
Tuesday
 PathoLogic tutorial
 PathoLogic lab session – Build initial version of PGDB
 Pathway hole filler lecture+lab
Wednesday
 PathoLogic: Creating protein complexes, operon predictor, transport inference parser
 Pathway Tools Schema
 Model organism database projects
Thursday
 Advanced Pathway/Genome Editors
Friday
 Overviews and Omics Viewers
 Comparative analysis
 Structured Advanced Query Form
 Metabolite Tracing
 Regulation
Tutorial Goals
 General familiarity with Pathway Tools goals and functionality
 Ability to create, edit, and navigate a new PGDB
 Create new PGDB for genome(s) you brought with you
 Familiarity with information resources available about Pathway Tools to continue your work
SRI’s Support for Pathway Tools
 NIH grant finances software development and user support
 Additional grants finance other software development
 Email us bug reports, suggestions, questions
 Comprehensive bug reports are required for us to fix the problem you reported
 Keep us posted regarding your progress
Administrative Details
 Please wear badge at all times
 Escort required outside this room/hallway
 Let us know when you are leaving
 Use E-Bldg Entrance
 Phone numbers to call from entrance
 Meals
 Restrooms
Tutorial Format
 Questions welcome during presentations
 Lab sessions will take different amounts of time for different people
 Refine your PGDB
 Read Pathway Tools manuals
 Computer logins
 Internet connectivity
Pathway/Genome Database
Pathways
Reactions
Proteins
RNAs
Genes
Compounds
Sequence Features
Operons
Promoters
DNA Binding Sites
Regulatory Interactions
Chromosomes
Plasmids
CELL
BioCyc Collection of Pathway/Genome Databases
Pathway/Genome Database (PGDB) – combines information about
 Pathways, reactions, substrates
 Enzymes, transporters
 Genes, replicons
 Transcription factors/sites, promoters, operons
Tier 1: Literature-Derived PGDBs
 MetaCyc
 EcoCyc -- Escherichia coli K-12
Tier 2: Computationally-derived DBs, Some Curation -- 20 PGDBs
 HumanCyc
 Mycobacterium tuberculosis
Tier 3: Computationally-derived DBs, No Curation -- 349 DBs
Terminology – Pathway Tools Software

PathoLogic
 Predicts operons, metabolic network, pathway hole fillers, from genome
 Computational creation of new Pathway/Genome Databases

Pathway/Genome Editors
 Distributed curation of PGDBs
 Distributed object database system, interactive editing tools

Pathway/Genome Navigator
 WWW publishing of PGDBs
 Querying, visualization of pathways, chromosomes, operons
 Analysis operations
– Pathway visualization of gene-expression data
– Global comparisons of metabolic networks
Bioinformatics 18:S225 2002
Pathway Tools Software: PGDBs Created Outside SRI
1000+ licensees: 75+ groups applying software to 150+ organisms
Saccharomyces cerevisiae, SGD project, Stanford University
 pathway.yeastgenome.org/biocyc/
Mouse, MGD, Jackson Laboratory
dictyBase, Northwestern University
Under development:
 CGD (Candida albicans), Stanford University
 Drosophila, P. Ebert in collaboration with FlyBase
 C. elegans, P. Ebert in collaboration with WormBase
Planned:
 RGD (Rat), Medical College of Wisconsin
Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
Tomato and Potato, Cornell University
GrameneDB, Cold Spring Harbor Laboratory
Medicago truncatula, Samuel Roberts Noble Foundation
Pathway Tools Software: PGDBs Created Outside SRI
NIAID BRCs: BioHealthBase (M. tuberculosis, F. tularensis), PATRIC, ApiDB (Cryptosporidium)
F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
V. Schachter, Genoscope, Acinetobacter
M. Bibb, John Innes Centre, Streptomyces coelicolor
G. Church, Harvard, Prochlorococcus marinus, multiple strains
E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
Herbert Chiang, Washington University, Bacteroides thetaiotaomicron
Sergio Encarnacion, UNAM, Sinorhizobium meliloti
Gregory Fournier, MIT, Mesoplasma florum
Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
Michael Göttfert, Technische Universität Dresden, Bradyrhizobium japonicum
Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472
Kenneth J. Kauffman, University of California, Riverside, Desulfovibrio vulgaris
Pathway Tools Software:
PGDBs Created Outside SRI







Mike McLeod, University of British Columbia, Rhodococcus sp. RHA1
Robert S. Munson, Children's Research Institute, Ohio, Haemophilus ducreyi, Haemophilus influenzae 86-026NP
John Nash, Canadian NRC, Campylobacter jejuni
Christopher S. Reigstad, Washington University, Escherichia coli UTI89
Haluk Resat, Pacific Northwest Lab, Rhodobacter sphaeroides
Gary Xie, Los Alamos Lab, Bacillus cereus
Large scale users:
 C. Medigue, Genoscope, 107 PGDBs
 G. Burger, U Montreal, 48 PGDBs

Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
 Partial listing of outside PGDBs at BioCyc.org
Terminology
 “Database” = “DB” = “Knowledge Base” = “KB” = “Pathway/Genome Database” = “PGDB”
Why Create PGDBs?

Extract more information from your genome

Create an up-to-date computable information repository about an organism

Perform analyses on the genome and pathway complement of the organism
 Analyses of omics data
 Analyses of cellular systems (dead-end metabolites)
 Reports generated by Pathway Tools

Perform comparative analyses with other organisms

Generate a genome poster and metabolic wall chart
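
As a rough illustration of the dead-end metabolite analysis listed above, here is a minimal Python sketch. It assumes a simplified definition: a compound is a dead end if the network's reactions only produce it or only consume it, ignoring transport and reversibility. The toy network and the function name `find_dead_ends` are hypothetical and are not the Pathway Tools implementation.

```python
from typing import Dict, List, Set, Tuple

# A reaction is a pair (substrates, products). Hypothetical toy representation.
Reaction = Tuple[List[str], List[str]]


def find_dead_ends(reactions: Dict[str, Reaction]) -> Dict[str, Set[str]]:
    """Return compounds that are only produced or only consumed.

    Simplifying assumption: every reaction is one-directional and no
    transporters exist, so this over-reports compared with a real analysis.
    """
    consumed: Set[str] = set()
    produced: Set[str] = set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    return {
        "only_produced": produced - consumed,
        "only_consumed": consumed - produced,
    }


# Toy network: A -> B, B -> C, B -> D
toy_network = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C"]),
    "R3": (["B"], ["D"]),
}
dead_ends = find_dead_ends(toy_network)
```

In this toy network, C and D are never consumed and A is never produced, so all three are flagged. In a curated PGDB such hits would be candidates for missing reactions or transporters.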
Sequence Project Workflow
[Figure: sequence project workflow. Labels shown: Raw Sequence; Phred; Phrap; GeneMark/Glimmer; BLAST, BLOCKS; Pathway Tools (PathoLogic, P/G Editors, P/G Navigator); WWW Publishing; Analyses]
EcoCyc Project – EcoCyc.org

E. coli Encyclopedia
 Review-level Model-Organism Database for E. coli
 Tracks evolving annotation of the E. coli genome and cellular networks
 The two paradigms of EcoCyc

“Multi-dimensional annotation of the E. coli K-12 genome”
 Positions of genes; functions of gene products – 76% / 66% exp
 Gene Ontology terms; MultiFun terms
 Gene product summaries and literature citations
 Evidence codes
 Multimeric complexes
 Metabolic pathways
 Regulation of transcription initiation
Karp, Gunsalus, Collado-Vides, Paulsen
Nuc. Acids Res. 35:7577 2007
ASM News 70:25 2004
Science 293:2040
Paradigm 1: EcoCyc as Textual Review Article
 All gene products for which experimental literature exists are curated with a minireview summary
 Found on protein and RNA pages, not gene pages!
 3257 gene products contain summaries
 Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
 Additional summaries found in pages for operons, pathways
 EcoCyc cites 15,880 publications
Paradigm 2: EcoCyc as Computational Symbolic Theory
 Highly structured, high-fidelity knowledge representation provides computable information
 Each molecular species defined as a DB object
 Genes, proteins, small molecules
 Each molecular interaction defined as a DB object
 Metabolic reactions
 Transport reactions
 Transcriptional regulation of gene expression
 220 database fields capture extensive properties and relationships
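
To make the "everything is a DB object" idea concrete, here is a small Python sketch of a frame-style store in which genes, proteins, and reactions are objects linked through named fields. The class `Frame`, the IDs, and the slot names (`product`, `gene`, `catalyzes`, `left`, `right`) are invented for illustration and are not the actual Pathway Tools schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    """A generic DB object: a unique ID plus named slots holding values."""
    frame_id: str
    slots: Dict[str, List[str]] = field(default_factory=dict)

    def get(self, slot: str) -> List[str]:
        # Missing slots read as empty, so queries need no special cases.
        return self.slots.get(slot, [])


# Hypothetical knowledge base: one gene, its enzyme, and one reaction.
kb: Dict[str, Frame] = {}
for frame in (
    Frame("gene-1", {"product": ["enzyme-1"]}),
    Frame("enzyme-1", {"gene": ["gene-1"], "catalyzes": ["rxn-1"]}),
    Frame("rxn-1", {"left": ["cpd-A"], "right": ["cpd-B"]}),
):
    kb[frame.frame_id] = frame


def genes_for_reaction(rxn_id: str) -> List[str]:
    """Follow reaction -> enzyme -> gene links stored in the slots."""
    genes: List[str] = []
    for frame in kb.values():
        if rxn_id in frame.get("catalyzes"):
            genes.extend(frame.get("gene"))
    return genes
```

Because relationships are stored as data rather than prose, a question such as "which genes encode enzymes for this reaction?" becomes a short traversal, for example `genes_for_reaction("rxn-1")`.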
EcoCyc Procedures

DB updates performed by 5 staff curators
 Information gathered from biomedical literature
 Enter data into structured database fields
 Author extensive summaries
 Update evidence codes
 Corrections submitted by E. coli researchers

Four releases per year

Quality assurance of data and software
 Evaluate database consistency constraints
 Perform element balancing of reactions
 Run other checking programs
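
The element-balancing check can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes flat formulas such as `C6H12O6` (no parentheses, charges, or generic groups), and the helper names `parse_formula`, `side_total`, and `imbalance` are made up for this example.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# One side of a reaction: list of (stoichiometric coefficient, formula).
Side = List[Tuple[int, str]]


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula like 'C6H12O6' (no parentheses or charge)."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(side: Side) -> Counter:
    """Sum atom counts over one side, weighted by coefficients."""
    total: Counter = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coeff * n
    return total


def imbalance(left: Side, right: Side) -> Dict[str, int]:
    """Return {element: right minus left} for every unbalanced element."""
    lhs, rhs = side_total(left), side_total(right)
    elements = sorted(set(lhs) | set(rhs))
    return {e: rhs[e] - lhs[e] for e in elements if rhs[e] != lhs[e]}


# Glucose oxidation, balanced: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
balanced = imbalance([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")])
# Same reaction with a wrong water coefficient
unbalanced = imbalance([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (5, "H2O")])
```

An empty result means the reaction is balanced. A non-empty result names the elements a curator would need to look at.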
MetaCyc: Metabolic Encyclopedia

Describe a representative sample of every experimentally determined metabolic pathway
Describe properties of metabolic enzymes
Literature-based DB with extensive references and commentary
Pathways, reactions, enzymes, substrates
Jointly developed by
 P. Karp, R. Caspi, C. Fulcher, SRI International
 L. Mueller, A. Pujar, Cornell Univ
 S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2008
MetaCyc Data -- Version 11.6
Pathways          1,010
Reactions         6,576
Enzymes           4,582
Small Molecules   6,561
Organisms         1,077
Citations         15,875
Taxonomic Distribution of MetaCyc Pathways
Bacteria       517
Green Plants   372
Mammals        90
Fungi          89
Archaea        65
Family of Pathway/Genome Databases
EcoCyc
MetaCyc
CauloCyc
AraCyc
MtbRvCyc
HumanCyc
Comparison of BioCyc to KEGG: The Data

KEGG approach: Static collection of pathway diagrams that are color-coded to produce organism-specific views

KEGG vs MetaCyc: Resource on literature-derived pathways
 KEGG pathway maps are composites of pathways in many organisms; they do not identify which specific pathways were elucidated in which organisms
 KEGG pathway maps encompass multiple biological pathways; are 2-4 times the size of MetaCyc pathways
 KEGG has no literature citations, no summaries, less enzyme detail

KEGG vs BioCyc organism-specific PGDBs
 KEGG re-annotates entire genome for each organism
 KEGG does not curate or customize pathway networks for each organism
Comparison of Pathway Tools to KEGG: The Software

KEGG has no pathway hole filler or transport inference parser or operon predictor

KEGG has no interactive editing tools – you cannot refine a KEGG pathway DB

KEGG has no algorithmic visualization tools – pathway diagrams are pre-drawn
 May become out of date
 Cannot show pathways at multiple detail levels

KEGG genome browser has very limited functionality
KEGG has one overview diagram with limited functionality
KEGG has no metabolite tracing tool
KEGG has no Structured Advanced Query Tool
Overviews and Omics Viewers

Genome-scale Visualizations
 Metabolic map
 Transcriptional regulatory network
 Genome map

Overlay gene expression, proteomics, metabolomics data
Obtain pathway based visualizations of omics data
 Numerical spectrum of expression values mapped to a color spectrum
 Steps of overview painted with color corresponding to expression level(s) of genes that encode enzyme(s) for that step
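
The value-to-color mapping described above can be sketched as follows. This is an assumption-laden illustration, not the Omics Viewer code: it uses a simple linear blue-to-red ramp, clamps values outside the chosen range, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def value_to_rgb(value: float, vmin: float, vmax: float) -> RGB:
    """Map a number onto a linear blue (low) to red (high) ramp."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clamped = min(max(value, vmin), vmax)
    t = (clamped - vmin) / (vmax - vmin)
    return (round(255 * t), 0, round(255 * (1 - t)))


def color_steps(
    step_genes: Dict[str, List[str]],
    expression: Dict[str, float],
    vmin: float,
    vmax: float,
) -> Dict[str, List[RGB]]:
    """For each reaction step, return one color per gene that has data.

    Steps whose genes have no measurements are left out (uncolored).
    """
    colors: Dict[str, List[RGB]] = {}
    for step, genes in step_genes.items():
        values = [expression[g] for g in genes if g in expression]
        if values:
            colors[step] = [value_to_rgb(v, vmin, vmax) for v in values]
    return colors


# Toy data: step1 is encoded by two genes, step2 has no measurements.
painted = color_steps(
    {"step1": ["g1", "g2"], "step2": ["g3"]},
    {"g1": -2.0, "g2": 2.0},
    vmin=-2.0,
    vmax=2.0,
)
```

Returning one color per gene keeps the "level(s)" case visible: a step encoded by several genes can show several colors instead of a single averaged one.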

Environment for Computational Exploration of Genomes
 Powerful ontology opens many facets of the biology to computational exploration
 Global characterization of metabolic network
 Analysis of interface between transport and metabolism
 Nutrient analysis of metabolic network
Pathway Tools Implementation Details
 Allegro Common Lisp
 Sun, Linux, Windows, Macintosh platforms
 Ocelot object database
 370,000+ lines of code
 Lisp-based WWW server at BioCyc.org
 Manages 370+ PGDBs
The Common Lisp Programming Environment
 Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21 2000)
Survey
 Please complete survey at end of each day
PGDB(s) That You Build
 Before you leave
 Tar up your PGDB directory and FTP it home, email it home, or copy it to flash disk
 We will create a backup copy of your PGDB directory if the directory is still there at the end of the tutorial
 Delete the PGDB directory if you don’t want us to back it up
 We will not give the backed up data to anyone else
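
If you prefer a script to the `tar` command line, the "tar up your PGDB directory" step could look like the Python sketch below. The function name `archive_pgdb` and the example paths are hypothetical. Substitute the actual location of your PGDB directory, which is not specified here.

```python
import tarfile
from pathlib import Path


def archive_pgdb(pgdb_dir: str, archive_path: str) -> str:
    """Write a gzip-compressed tar archive of one PGDB directory.

    The archive should be written outside the directory being archived.
    """
    src = Path(pgdb_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Not a directory: {pgdb_dir}")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the directory name, not the full path,
        # so the archive unpacks cleanly on another machine.
        tar.add(str(src), arcname=src.name)
    return archive_path


# Example call (paths are placeholders, adjust to your setup):
# archive_pgdb("/path/to/your/pgdb-directory", "/tmp/my-pgdb.tar.gz")
```

The resulting `.tar.gz` file is a single file that can be sent by FTP or email, or copied to a flash disk.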
Information Sources

Pathway Tools User’s Guide
 /root/aic-export/pathway-tools/ptools/11.5/doc/manuals/userguide.pdf
 NOTE: Location of the aic-export directory can vary across different computers

Pathway Tools Web Site
 Publications, FAQ, programming examples, etc.
 http://bioinformatics.ai.sri.com/ptools/
BioCyc Publications Page
 http://biocyc.org/publications.shtml
MetaCyc Guide
 http://metacyc.org/MetaCycUserGuide.shtml



Slides from this tutorial
 http://bioinformatics.ai.sri.com/ptools/tutorial/

BioCyc Webinars
 http://biocyc.org/webinar.shtml
Reporting Pathway Tools Problems

[email protected]

Tell us:
 What platform you are running on
 What version of Pathway Tools you are running
 The error message
 Result of
[1] EC(2) :zoom :count :all
 What operation were you performing when the error occurred?

New patches are automatically downloaded and loaded when Pathway Tools starts up

Auto-Patch
 Tools -> Instant Patch -> Download and Activate All Patches
Summary
 Pathway Tools and Pathway/Genome Databases
 Not just for pathways!
 Computational inferences: operons, metabolic pathways, pathway hole fillers
 Editing tools
 Analysis tools: Omics data on pathways
 Web publishing of PGDBs

 Main classes of users:
 Develop PGDB to extract more information from genome for genome paper
 Develop a model-organism DB for the organism that is updated regularly and published on the web