BioCyc - Artificial Intelligence Center


Pathway Tools / BioCyc Fundamentals

1

Peter D. Karp, Ph.D.

Bioinformatics Research Group SRI International [email protected]

BioCyc.org

EcoCyc.org, MetaCyc.org, HumanCyc.org

SRI International Bioinformatics

Pathway Tools Capabilities

2 

Create and maintain an organism database integrating genome, pathway, regulatory information

  Computational inference tools
  Interactive editing tools

Query and visualize that database

Use the database to interpret omics data

Metabolic network analysis tools

Comparative analysis tools

Export the metabolic network to SBML

  Speeds creation of flux-balance models by an order of magnitude

BioCyc

3 

Hundreds of microbial genomes

 Inferred operons and metabolic networks 

Couples curated data with computational predictions

Supports analysis of omics data

Comparative analysis tools

Microbial emphasis. Exceptions:

 HumanCyc, MouseCyc, CattleCyc


Model Organism Databases / Organism Specific Databases

DBs that describe the genome and other information about an organism

Every sequenced organism with an active experimental community requires a MOD

  Integrate genome data with information about the biochemical and genetic network of the organism
  Integrate literature-based information with computational predictions

Curated by experts for that organism

  No one group can curate all the world’s genomes
  Distribute workload across a community of experts to create a community resource

Rationale for MODs

Each “complete” genome is incomplete in several respects:

  40%-60% of genes have no assigned function
  Roughly 7% of those assigned functions are incorrect
  Many assigned functions are non-specific

Need continuous updating of annotations with respect to new experimental data and computational predictions

MODs are platforms for global analyses of an organism

  Interpret omics data in a pathway context
  In silico prediction of essential genes
  Characterize systems properties of metabolic and genetic networks

What is Curation?

Ongoing updating and refinement of a PGDB

Correcting false-positive and false-negative predictions

Incorporating information from experimental literature

Authoring of comments and citations

Updating database fields

  Gene positions, names, synonyms
  Protein functions, activators, inhibitors

Addition of new pathways, modification of existing pathways

Defining TF binding sites, promoters, regulation of transcription initiation and other processes

Pathway/Genome Database

[Diagram: within a CELL, a Pathway/Genome Database links Pathways, Reactions, Proteins, RNAs, Genes, Chromosomes, Plasmids, Compounds, Sequence Features, and Regulation (Operons, Promoters, DNA Binding Sites, Regulatory Interactions)]

BioCyc Collection of 507 Pathway/Genome Databases

Pathway/Genome Database (PGDB) – combines information about

  Pathways, reactions, substrates
  Enzymes, transporters
  Genes, replicons
  Transcription factors/sites, promoters, operons

Tier 1: Literature-Derived PGDBs

  MetaCyc
  EcoCyc -- Escherichia coli K-12

Tier 2: Computationally-derived DBs, Some Curation -- 24 PGDBs

  HumanCyc
  Mycobacterium tuberculosis

Tier 3: Computationally-derived DBs, No Curation -- 481 DBs

Pathway Tools Overview

[Diagram: PathoLogic takes an Annotated Genome and the MetaCyc Reference Pathway DB and produces a Pathway/Genome Database, which is accessed through the Pathway/Genome Editors and the Pathway/Genome Navigator]

10

Pathway Tools Software: PathoLogic

Computational creation of new Pathway/Genome Databases

Transforms genome into Pathway Tools schema and layers inferred information above the genome

Predicts operons

Predicts metabolic network

Predicts which genes code for missing enzymes in metabolic pathways

Infers transport reactions from transporter names

Bioinformatics 18:S225 2002

Pathway Tools Software: Pathway/Genome Editors

Interactively update PGDBs with graphical editors

Support geographically distributed teams of curators with object database system

Gene editor

Protein editor

Reaction editor

Compound editor

Pathway editor

Operon editor

Publication editor

Pathway Tools Software: Pathway/Genome Navigator

Querying and visualization of:

  Pathways
  Reactions
  Metabolites
  Proteins
  Genes
  Chromosomes

Two modes of operation:

  Web mode
  Desktop mode
  Most functionality shared, but each has unique functionality

Pathway Tools Software: PGDBs Created Outside SRI

1,700+ licensees: 75+ groups applying software to 300+ organisms

Saccharomyces cerevisiae, SGD project, Stanford University

135 pathways / 565 publications

Candida albicans, CGD project, Stanford University

dictyBase, Northwestern University

Mouse, MGD, Jackson Laboratory

Under development:

  Drosophila, FlyBase
  C. elegans, WormBase

Arabidopsis thaliana, TAIR, Carnegie Institution of Washington

 288 pathways / 2282 publications 

PlantCyc, Carnegie Institution of Washington

Six Solanaceae species, Cornell University

GrameneDB, Cold Spring Harbor Laboratory

Medicago truncatula, Samuel Roberts Noble Foundation

Pathway Tools Software: PGDBs Created Outside SRI

NIAID BRCs for Biodefense pathogens:

  BioHealthBase -- Mycobacterium tuberculosis, Francisella tularensis
  Pathema -- 80+ PGDBs
  PATRIC – Brucella suis, Coxiella burnetii, Rickettsia typhi
  EuPathDB – Cryptosporidium, Plasmodium

G. Xie, Los Alamos Lab, Dental pathogens

F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa

V. Schachter, Genoscope, Acinetobacter

M. Bibb, John Innes Centre, Streptomyces coelicolor

G. Church, Harvard, Prochlorococcus marinus, multiple strains

E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis

R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579

Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major

Sergio Encarnacion, UNAM, Sinorhizobium meliloti

Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis

Michael Göttfert, Technische Universität Dresden, Bradyrhizobium japonicum

Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472

Pathway Tools Software: PGDBs Created Outside SRI

Large scale users:

C. Medigue, Genoscope, 200+ PGDBs

G. Sutton, J. Craig Venter Institute, 80+ PGDBs

G. Burger, U Montreal, 60+ PGDBs

Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes

Partial listing of outside PGDBs at BioCyc.org

Obtaining a PGDB for Organism of Interest

Find existing curated PGDB

Find existing PGDB in BioCyc

Create your own

EcoCyc Project – EcoCyc.org

E. coli Encyclopedia

  Review-level Model-Organism Database for E. coli
  Tracks evolving annotation of the E. coli genome and cellular networks
  The two paradigms of EcoCyc

“Multi-dimensional annotation of the E. coli K-12 genome”

  Positions of genes; functions of gene products – 76% / 66% exp
  Gene Ontology terms; MultiFun terms
  Gene product summaries and literature citations
  Evidence codes
  Multimeric complexes
  Metabolic pathways
  Cellular regulation

Karp, Gunsalus, Collado-Vides, Paulsen

Nuc. Acids Res. 35:7577 2007

ASM News 70:25 2004

Science 293:2040

EcoCyc = E. coli Dataset + Pathway/Genome Navigator

URL: EcoCyc.org

EcoCyc v13.6

  Pathways: 246
  Reactions: Metabolic: 1394, Transport: 246
  Proteins: 4,479
  Complexes: 895
  RNAs: 285
  Genes: 4,492
  Compounds: 1,830
  Citations: 19,000

Gene Regulation:

  Operons: 3,369
  Trans Factors: 196
  Promoters: 1,796
  TF Binding Sites: 2,205

Paradigm 1: EcoCyc as Textual Review Article

All gene products for which experimental literature exists are curated with a minireview summary

 Found on protein and RNA pages, not gene pages!

 3257 gene products contain summaries 

Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more

Additional summaries found in pages for operons, pathways

EcoCyc cites 17,300 publications

Paradigm 2: EcoCyc as Computational Symbolic Theory

Highly structured, high-fidelity knowledge representation provides computable information

Each molecular species defined as a DB object

 Genes, proteins, small molecules 

Each molecular interaction defined as a DB object

 Metabolic reactions  Transport reactions  Transcriptional regulation of gene expression 

220 database fields capture extensive properties and relationships

EcoCyc Procedures

21 

DB updates performed by 5 staff curators

  Information gathered from biomedical literature
  Enter data into structured database fields
  Author extensive summaries
  Update evidence codes
  Corrections submitted by E. coli researchers

Four releases per year

Quality assurance of data and software

  Evaluate database consistency constraints
  Perform element balancing of reactions
  Run other checking programs

EcoCyc Accelerates Science

Experimentalists

  E. coli experimentalists
  Experimentalists working with other microbes
  Analysis of expression data

Computational biologists

  Biological research using computational methods
  Genome annotation
  Study connectivity of E. coli metabolic network
  Study phylogenetic extent of metabolic pathways and enzymes in all domains of life

Bioinformaticists

  Training and validation of new bioinformatics algorithms – predict operons, promoters, protein functional linkages, protein-protein interactions

Metabolic engineers

  “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”

Educators

MetaCyc: Metabolic Encyclopedia

Describe a representative sample of every experimentally determined metabolic pathway

Describe properties of metabolic enzymes

Literature-based DB with extensive references and commentary

Pathways, reactions, enzymes, substrates

Jointly developed by

  P. Karp, R. Caspi, C. Fulcher, SRI International
  L. Mueller, A. Pujar, Boyce Thompson Institute
  S. Rhee, P. Zhang, Carnegie Institution

Nucleic Acids Research 2008

Applications of MetaCyc

24 

Reference source on metabolic pathways

Metabolic engineering

  Find enzymes with desired activities, regulatory properties
  Determine cofactor requirements

Predict pathways from genomes

Systematic studies of metabolism

Computer-aided education

MetaCyc Data -- Version 13.6

  Pathways: 1,436
  Reactions: 8,200
  Enzymes: 6,060
  Small Molecules: 8,400
  Organisms: 1,800
  Citations: 21,700

Taxonomic Distribution of MetaCyc Pathways – version 13.1

  Bacteria: 883
  Green Plants: 607
  Fungi: 199
  Mammals: 159
  Archaea: 112

Enzyme Data Available in MetaCyc

30 

Reaction(s) catalyzed

Alternative substrates

Activators, inhibitors, cofactors, prosthetic groups

Subunit structure

Genes

Features on protein sequence

Cellular location

pI, molecular weight, Km, Vmax

Gene Ontology terms

Links to other bioinformatics databases

What is a Pathway?

A connected sequence of biochemical reactions

Occurs in one organism

Conserved through evolution

Regulated as a unit

Often starts or stops at one of 13 common intermediate metabolites

MetaCyc Pathway Variants

Pathways that accomplish similar biochemical functions using different biochemical routes

  Alanine biosynthesis I – E. coli
  Alanine biosynthesis II – H. sapiens

Pathways that accomplish similar biochemical functions using similar sets of reactions

  Several variants of TCA Cycle

MetaCyc Super-Pathways

33 

Groups of pathways linked by common substrates

Example: Super-pathway containing

  Chorismate biosynthesis
  Tryptophan biosynthesis
  Phenylalanine biosynthesis
  Tyrosine biosynthesis

Super-pathways defined by listing their component pathways

Multiple levels of super-pathways can be defined

Pathway layout algorithms accommodate super-pathways
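As a rough illustration of how a super-pathway defined only by its component list could be expanded through several levels, here is a minimal Python sketch. The dictionary layout, the super-pathway names, and the second-level entry are assumptions made for the example; they are not MetaCyc identifiers or the Pathway Tools representation.

```python
# Hypothetical component lists; names are illustrative only.
SUPER_PATHWAYS = {
    "aromatic amino acid superpathway": [
        "chorismate biosynthesis",
        "tryptophan biosynthesis",
        "phenylalanine biosynthesis",
        "tyrosine biosynthesis",
    ],
    # A second level: a super-pathway that contains another super-pathway.
    "amino acid superpathway (hypothetical)": [
        "aromatic amino acid superpathway",
        "alanine biosynthesis I",
    ],
}


def base_pathways(name, table=SUPER_PATHWAYS):
    """Return the base pathways under `name`, expanding nested super-pathways."""
    if name not in table:  # not a super-pathway, so already a base pathway
        return [name]
    result = []
    for component in table[name]:
        for base in base_pathways(component, table):
            if base not in result:  # keep order, drop duplicates
                result.append(base)
    return result
```

Calling `base_pathways("amino acid superpathway (hypothetical)")` walks both levels and returns the five base pathways in listing order.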

35

Comparison with KEGG

KEGG vs MetaCyc: Reference pathway collections

  KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
  KEGG maps contain multiple biological pathways
  Two genes chosen at random from a BioCyc pathway are more likely to be related according to genome context methods than two genes from a KEGG pathway
  KEGG maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms
  KEGG has no literature citations, no comments, less enzyme detail
  KEGG assigns half as many reactions to pathways as MetaCyc

KEGG vs organism-specific PGDBs

  KEGG does not curate or customize pathway networks for each organism
  Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis

Comparison of Pathway Tools to KEGG

36 

Inference tools

  KEGG does not predict presence or absence of pathways
  KEGG lacks pathway hole filler, operon predictor

Curation tools

  KEGG does not distribute curation tools
  No ability to customize pathways to the organism
  Pathway Tools schema much more comprehensive

Visualization and analysis

  KEGG does not perform automatic pathway layout
  KEGG metabolic-map diagram extremely limited
  No comparative pathway analysis

Pathway Tools Implementation Details

37 

Platforms:

  Macintosh, PC/Linux, and PC/Windows platforms

Same binary can run as desktop app or Web server

Production-quality software

  Version control
  Two regular releases per year
  Extensive quality assurance
  Extensive documentation
  Auto-patch
  Automatic DB-upgrade

480,000 lines of Lisp code

Pathway Tools Architecture

[Diagram: the Pathway/Genome Navigator (Web Mode and Desktop Mode) and the Protein, Pathway, and Reaction Editors sit on top of the Ocelot DBMS, which stores data in a Disk File or in Oracle or MySQL; Lisp, Perl, and Java programs connect through the GFP API]

Ocelot Knowledge Server Architecture

Frame data model

 Minimizes size of schema relative to semantic complexity 

Schema is stored within the DB

Schema is self documenting

Slot units define metadata about slots

  Domain, range, inverse
  Collection type, number of values, value constraints
  Comment

Schema evolution facilitated by

  Easy addition/removal of slots, or alteration of slot datatypes
  Flexible data formats that do not require dumping/reloading of data
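To make the idea of slot units more concrete, the following Python sketch models a slot unit as a small record of metadata and checks stored values against it. This is only an illustration under stated assumptions: the `SlotUnit` class, the `check_slot` helper, and the example comment text are hypothetical and do not reflect Ocelot's actual Lisp implementation.

```python
from dataclasses import dataclass


@dataclass
class SlotUnit:
    """Metadata about one slot, stored alongside the data (hypothetical layout)."""
    name: str
    domain: str          # class of frames that may carry the slot
    range: type          # Python type standing in for the allowed value class
    inverse: str = ""    # name of the inverse slot, if any
    max_values: int = 0  # 0 means unlimited
    comment: str = ""


def check_slot(unit, frame_class, values):
    """Return a list of constraint violations for `values` stored in a slot.

    Simplification: the domain test is an exact class-name match and
    ignores the class hierarchy.
    """
    problems = []
    if frame_class != unit.domain:
        problems.append(f"{unit.name}: not applicable to {frame_class}")
    if unit.max_values and len(values) > unit.max_values:
        problems.append(f"{unit.name}: more than {unit.max_values} value(s)")
    for v in values:
        if not isinstance(v, unit.range):
            problems.append(f"{unit.name}: {v!r} is not a {unit.range.__name__}")
    return problems


# Example slot unit; the constraint values here are made up for illustration.
MOLECULAR_WEIGHT = SlotUnit("Molecular-Weight", domain="Proteins", range=float,
                            max_values=1, comment="illustrative comment text")
```

Because the constraints live in data rather than in code, adding or altering a slot means editing one record, which is one way to read the schema-evolution point above.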

Ocelot Storage System Architecture

41 

Persistent storage via disk files or Oracle or MySQL

  Concurrent development: Oracle or MySQL
  Single-user development: disk files

Oracle/MySQL DBMS storage

  DBMS is submerged within Ocelot, invisible to users
  Frames transferred from DBMS to Ocelot on demand or by background prefetcher
  Memory cache
  Persistent disk cache to speed performance via Internet

Transaction logging facility

Why Do We Code in Common Lisp?

42 

Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21 2000)

  The average Lisp program ran 33 times faster than the average Java program
  The average Lisp program was written 5 times faster than the average Java program

Roberts compared Java and Lisp implementations of a Domain Name Server (DNS) resolver

  http://www.findinglisp.com/papers/case_study_java_lisp_dns.html
  The Lisp version had half as many lines of code

Common Lisp Programming Environment

Interpreted and/or compiled execution

Fabulous debugging environment

High-level language

Interactive data exploration

Extensive built-in libraries

Dynamic redefinition

Find out more!

  See ALU.org or http://www.international-lisp-conference.org/


PathoLogic Processing

1. Translate source genome to PGDB form
2. Predict operons
3. Predict metabolic pathways
4. Predict pathway hole fillers
5. Transport inference parser
6. Build metabolic overview diagram

PathoLogic Step 1: Translate Genome to PGDB

[Diagram: the PathoLogic software combines an Annotated Genomic Sequence (Genes/ORFs, Gene Products, DNA Sequences) with the Multi-organism Pathway Database, MetaCyc (Pathways, Reactions, Compounds), to build a Pathway/Genome Database containing Pathways, Reactions, Compounds, Gene Products, Genes, and a Genomic Map]

Integrates genome and pathway data to identify putative metabolic networks

PathoLogic Step 2: Predict Operons

Predict adjacent genes A and B in same operon based on:

  Intergenic distance
  Functional relatedness of A and B

Tests for functional relatedness:

  A and B in same gene functional class (MultiFun)
  A and B in same metabolic pathway
  A codes for enzyme in a pathway and B codes for transporter involving a substrate in that pathway
  A and B are monomers in same protein complex

Correctly predicts 80% of E. coli transcription units

Marks predicted operons with computational evidence codes

Bioinformatics 20:709-17 2004
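The two kinds of evidence named on this slide, distance between adjacent genes and functional relatedness, can be combined in many ways. The Python sketch below shows one simple possibility and is not the published PathoLogic method: the `Gene` record, the two distance cutoffs, and the decision rule are all assumptions chosen for illustration, and the enzyme/transporter relatedness test is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    start: int
    end: int
    strand: str
    functional_class: str = ""
    pathways: set = field(default_factory=set)
    complexes: set = field(default_factory=set)


def functionally_related(a, b):
    """Apply three of the relatedness tests: class, pathway, protein complex."""
    same_class = bool(a.functional_class) and a.functional_class == b.functional_class
    return same_class or bool(a.pathways & b.pathways) or bool(a.complexes & b.complexes)


def same_operon(a, b, max_gap=50, max_gap_if_related=200):
    """Guess whether adjacent genes a and b (a first in coordinate order) share an operon.

    Both distance cutoffs are made-up illustrative values.
    """
    if a.strand != b.strand:
        return False
    gap = b.start - a.end
    if gap <= max_gap:
        return True
    # A larger gap is tolerated only when the genes look functionally related.
    return gap <= max_gap_if_related and functionally_related(a, b)
```

In this sketch a short gap is sufficient on its own, while functional relatedness only extends the distance that is tolerated; a scoring scheme trained on known operons would weigh the two kinds of evidence more carefully.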

48

PathoLogic Step 3: Prediction of Metabolic Pathways

Infer reaction complement of organism

  Match enzymes in source genome to MetaCyc reactions they catalyze
  Match enzyme names and EC numbers to MetaCyc
  Support user in manually matching additional enzymes

Computationally predict which MetaCyc metabolic pathways are present

  For each MetaCyc pathway, evaluate which of its reactions are catalyzed by the organism

Match Enzymes to Reactions

[Flowchart: each gene product (example: UDP-glucose-4 epimerase, EC 5.1.3.2) is matched against MetaCyc; on a match it is assigned to a reaction (UDP-D-glucose ⇌ UDP-galactose). Unmatched products whose names end in -ase are treated as probable enzymes and searched manually, ending in Assign or Can’t Assign; the rest are not metabolic enzymes. Counts shown: 2057 proteins matched by EC#, 314 matched by name, 1320, 625]
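The decision flow on this slide can be sketched in Python as follows. The one-entry `REFERENCE` table is a toy stand-in for MetaCyc, and the function name, return labels, and the second name spelling are assumptions made for the example rather than the actual PathoLogic matching code.

```python
# Toy stand-in for the MetaCyc reference data (illustrative only).
REFERENCE = [
    {"reaction": "UDP-D-glucose <=> UDP-galactose",
     "ec": "5.1.3.2",
     "names": {"udp-glucose-4 epimerase", "udp-glucose 4-epimerase"}},
]


def match_enzyme(product_name, ec_number=None, reference=REFERENCE):
    """Return (status, reaction) for one annotated gene product."""
    name = product_name.strip().lower()
    # First try the EC number, when the annotation supplies one.
    if ec_number:
        for entry in reference:
            if entry["ec"] == ec_number:
                return ("matched by EC#", entry["reaction"])
    # Then try an exact (case-insensitive) name match.
    for entry in reference:
        if name in entry["names"]:
            return ("matched by name", entry["reaction"])
    if name.endswith("ase"):
        # Looks like an enzyme but nothing matched: leave for manual search.
        return ("probable enzyme", None)
    return ("not a metabolic enzyme", None)
```

For example, `match_enzyme("UDP-glucose-4 epimerase")` is matched by name, while `match_enzyme("putative kinase")` falls through to the probable-enzyme branch and would be queued for manual review.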

Import Pathways

[Flowchart: all MetaCyc pathways containing the matched reactions are imported; each imported pathway is then considered for pruning, with manual review deciding whether it is kept or deleted]

Pathway Prediction

Prediction is hard because

  Enzyme naming is irregular
  Some reactions present in multiple pathways
  Pathway variants share many reactions in common
  MetaCyc now has many pathways

Pathway Scoring Criteria

52 

Imported pathways must satisfy:

  Pathways outside their taxonomic range must have enzymes for all reactions
  If any reactions in a pathway are designated as “key,” an enzyme must be present for at least one

Pathway P is imported if any conditions satisfied:

  One unique enzyme present for P
  P missing at most one reaction
  More reactions present than absent for P
  P is not a superset of another pathway with the same number of enzymes present
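The scoring criteria can be read as two required checks followed by a set of alternative sufficient checks. The Python sketch below encodes that reading; the function name and argument layout are assumptions for illustration, "unique enzyme" is interpreted as an enzyme for a reaction found in no other pathway, and the superset criterion is left out because it needs the full set of candidate pathways.

```python
def should_import(reactions, present, key=frozenset(), unique=frozenset(),
                  in_taxonomic_range=True):
    """Decide whether to import one pathway.

    reactions: all reaction ids of the pathway
    present:   reactions for which the organism has an enzyme
    key:       reactions designated as "key" for the pathway
    unique:    reactions that occur in no other pathway
    """
    reactions = set(reactions)
    present = set(present) & reactions
    missing = reactions - present
    # Required: outside its taxonomic range, a pathway must be complete.
    if not in_taxonomic_range and missing:
        return False
    # Required: if key reactions are designated, at least one needs an enzyme.
    if key and not (set(key) & present):
        return False
    # Any one of these conditions is sufficient.
    return (bool(set(unique) & present)
            or len(missing) <= 1
            or len(present) > len(missing))
```

A pathway with three of four reactions missing is rejected by default, but the same pathway is accepted when its single present reaction is unique to it.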

Pathway Evidence Report


PathoLogic Step 4: Pathway Hole Filler

Definition: Pathway Holes are reactions in metabolic pathways for which no enzyme is identified

[Figure: NAD biosynthesis from L-aspartate through iminoaspartate, quinolinate (quinolinate synthetase, nadA), nicotinate nucleotide (n.n. pyrophosphorylase, nadC), and deamido-NAD to NAD. CC3619 is annotated as NAD+ synthetase, NH3; the reactions labeled 1.4.3.-, 2.7.7.18, and 6.3.5.1 are marked as holes]

Step 1: Query UniProt for all sequences having EC# of pathway hole

Step 2: BLAST against target genome

[Figure: sequences of enzyme A from organisms 1 through 8 are used as queries]

Step 3 & 4: Consolidate hits and evaluate evidence

7 queries have high-scoring hits to sequence Y
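The consolidation step can be pictured as grouping BLAST hits by the gene they hit and counting how many distinct query sequences support each candidate. The sketch below assumes hits are already available as `(query_id, target_gene, e_value)` tuples and uses an arbitrary E-value cutoff; both are illustrative assumptions, and the evidence evaluation that produces the probabilities shown on later slides is not reproduced here.

```python
from collections import defaultdict


def consolidate_hits(hits, max_evalue=1e-10):
    """Group BLAST hits by candidate gene.

    hits: iterable of (query_id, target_gene, e_value) tuples.
    Returns {target_gene: number of distinct queries with a hit at or
    below max_evalue}; the cutoff is an illustrative value.
    """
    queries_per_target = defaultdict(set)
    for query_id, target, e_value in hits:
        if e_value <= max_evalue:
            queries_per_target[target].add(query_id)
    return {target: len(queries) for target, queries in queries_per_target.items()}
```

With seven of eight queries hitting the same gene at a strong E-value, that gene is returned with a count of 7, mirroring the "7 queries have high-scoring hits to sequence Y" case.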

57

Pathway Hole Filler

Why should hole filler find things beyond the original genome annotation?

Reverse BLAST searches more sensitive

Reverse BLAST searches find second domains

Integration of multiple evidence types

Caulobacter crescentus Pathway Holes

130 pathways containing 582 reactions

92 pathways contain 236 pathway holes

Caulobacter holes filled:

  77 holes filled at P > 0.9

Previous functions of candidate hole fillers:

  No predicted function
  Correctly assigned single function
  Incorrectly assigned function
  Imprecise functional assignment

BMC Bioinformatics 5:76 2004

Example Pathway

[Figure: NAD biosynthesis pathway (L-aspartate to NAD) with candidate hole fillers CC2913 (P=0.99), CC3619 (P=0.99), and CC3431* (P=0.90) shown for the holes at 1.4.3.-, 2.7.7.18, and 6.3.5.1; nadA is CC2912 and nadC is CC2915]

CC2913: L-aspartate oxidase (wrong EC# on rxn)

CC3431: ORF

CC3619: put. NAD(+)-synthetase (multidomain)

PathoLogic Step 5: Transport Inference Parser

Problem: Write a program to query a genome annotation to compute the substrates an organism can transport

Typical genome annotations for transporters:

  ATP transporter for ribose
  ribose ABC transporter
  D-ribose ATP transporter
  ABC transporter, membrane spanning protein [ribose]
  ABC transporter, membrane spanning protein [D-ribose]

Transport Inference Parser

61 

Input: “ATP transporter of phosphonate”

Output: Structured description of transport activity

Locates most transporters in genome annotation using keyword analysis

Parse product name using a series of rules to identify:

  Transported substrate, co-substrate
  Influx/efflux
  Energy coupling mechanism

Creates transport reaction object: phosphonate [periplasm] + H2O + ATP = phosphonate + Pi + ADP
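A very small rule-based parser in the spirit of this slide might look like the Python sketch below. The three regular-expression rules, the function names, and the returned dictionary are assumptions invented for the example; the actual Transport Inference Parser uses a much larger rule set and also infers co-substrates and direction.

```python
import re


def parse_transporter(product_name):
    """Extract substrate and energy coupling from a transporter annotation.

    Returns None when the name does not look like a transporter.
    """
    name = product_name.strip()
    if not re.search(r"transport", name, re.IGNORECASE):
        return None
    energy = "ATP" if re.search(r"\b(ATP|ABC)\b", name) else "unknown"
    patterns = [
        r"\[(?P<s>[^\]]+)\]",                         # ... protein [ribose]
        r"transporter\s+(?:for|of)\s+(?P<s>.+)$",     # ATP transporter for ribose
        r"^(?P<s>.+?)\s+(?:ATP|ABC)\s+transporter",   # ribose ABC transporter
    ]
    substrate = None
    for pattern in patterns:
        match = re.search(pattern, name, re.IGNORECASE)
        if match:
            substrate = match.group("s").strip()
            break
    return {"substrate": substrate, "energy_coupling": energy}


def atp_import_reaction(substrate):
    """Write an ATP-driven import reaction in the form shown on the slide."""
    return f"{substrate}[periplasm] + H2O + ATP = {substrate} + Pi + ADP"
```

Run on the annotation variants listed earlier, all five ribose spellings resolve to a ribose or D-ribose substrate with ATP coupling, which is the kind of normalization that lets later analyses treat them as the same transport activity.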

62

Transport Inference Parser

Permits symbolic computation with transport activities:

  Compute transportable substrates of the cell
  Compute connectivity among compartments for substrates
  Facilitate reasoning about transport/metabolism connections
  Draw transport cartoon in protein pages, cellular overview

Transport Inference Parser

User reviews all assignments using interactive tool that allows assignments to be revised

User also reviews transporters for which no assignment was made

Regulation


Encoding Cellular Regulation in Pathway Tools -- Goals

Facilitate curation of wide range of regulatory information within a formal ontology

Compute with regulatory mechanisms and pathways

  Summary statistics, complex queries
  Pattern discovery
  Visualization of network components

Provide training sets for inference of regulatory networks

Interpret gene-expression datasets in the context of known regulatory mechanisms

Regulatory Interactions Supported by Pathway Tools

Substrate-level regulation of enzyme activity

Binding to proteins or small molecules (phosphorylation)

Regulation of transcription initiation

Attenuation of transcription

Regulation of translation by proteins and by small RNAs

Regulation in Pathway Tools

Editing tools

Transcription factor display window

Transcription unit display window

Regulatory Overview / Omics Viewer

Regulatory Interaction Editor


Regulatory Overview and Omics Viewer

Show regulatory relationships among gene groups

Comparative Analysis

Via Cellular Overview

Comparative genome browser

Comparative pathway table

Comparative analysis reports

  Compare reaction complements
  Compare pathway complements
  Compare transporter complements

Information Sources

73 

Pathway Tools User’s Guide

 aic-export/pathway-tools/ptools/13.0/doc/manuals/userguide.pdf

 NOTE: Location of the aic-export directory can vary across different computers 

Pathway Tools Web Site

  http://bioinformatics.ai.sri.com/ptools/
  Publications, FAQ, programming examples, etc.

Slides from this tutorial

 http://www.ai.sri.com/pkarp/talks/ 

BioCyc Webinars

 http://biocyc.org/webinar.shtml


BioCyc and Pathway Tools Availability

BioCyc.org Web site and database files freely available to all

Pathway Tools freely available to non-profits

 Macintosh, PC/Windows, PC/Linux


Symbolic Systems Biology

Definition: Global analyses of biological systems using symbolic computing

Symbolic Systems Biology

“Symbolic computing is concerned with the representation and manipulation of information in symbolic form. It is often contrasted with numeric representation.” -- R. Cameron

Examples of symbolic computation:

  Symbolic algebra programs, e.g., Mathematica, Graphing Calculator
  Compilers and interpreters for programming languages
  Database query languages
  Text analysis programs, e.g., Google
  String matching for DNA and protein sequences
  Artificial Intelligence methods, e.g., expert systems, symbolic logic, machine learning, natural language understanding

Symbolic Systems Biology

Concerned with different questions than quantitative systems biology

Symbolic analyses can in many cases produce answers when quantitative approaches fail because of lack of parameters or intractable mathematics

Symbolic computation is intimately dependent on the use of structured ontologies

Pathway Tools Ontology

78 

1064 classes

  Main classes such as: Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
  Taxonomies for Pathways, Reactions, Compounds

205 slots

  Meta-data: Creator, Creation-Date
  Comment, Citations, Common-Name, Synonyms
  Attributes: Molecular-Weight, DNA-Footprint-Size
  Relationships: Catalyzes, Component-Of, Product

Classes, instances, slots all stored side by side in DBMS

Critiquing the Parts List

Slide thanks to Hirotada Mori (minus the banana!)


Dead End Metabolites

A small molecule C is a dead-end if:

  C is produced only by SMM (small-molecule metabolism) reactions in Compartment, and no transporter acts on C in Compartment

  OR

  C is consumed only by SMM reactions in Compartment, and no transporter acts on C in Compartment
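Read literally, the definition says a compound is a dead end when it appears on only one side of the small-molecule reactions in a compartment and no transporter touches it there. A minimal Python sketch of that reading follows; the function name and the `(reactants, products)` input format are assumptions, and reactions are treated as one-directional, so a reversible reaction would have to be listed in both directions.

```python
def dead_end_metabolites(reactions, transported=frozenset()):
    """Find dead-end compounds in one compartment.

    reactions:   iterable of (reactants, products) pairs of compound names
    transported: compounds acted on by some transporter in the compartment
    """
    consumed, produced = set(), set()
    for reactants, products in reactions:
        consumed.update(reactants)
        produced.update(products)
    # Symmetric difference: produced-only or consumed-only compounds.
    one_sided = produced ^ consumed
    return one_sided - set(transported)
```

For the chain A → B → C with a transporter for A, only C is reported: A is consumed-only but transported, and B is both produced and consumed.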

81

Dead End Metabolites

Not yet an official part of Pathway Tools

Contact us if you’d like to use it

Reachability Analysis of Metabolic Networks

Given:

  A PGDB for an organism
  A set of initial metabolites

Infer:

  What set of products can be synthesized by the small-molecule metabolism of the organism

Motivations:

  Quality control for PGDBs
  Verify that a known growth medium yields known essential compounds
  Experiment with other growth media
  Experiment with reaction knock-outs

Limitations

  Cannot properly handle compounds required for their own synthesis
  Nutrients needed for reachability may be a superset of those required for growth

Romero and Karp, Pacific Symposium on Biocomputing, 2001

Algorithm: Forward Propagation Through Production System

83 

Each reaction becomes a production rule

Each of the 21 metabolites in the nutrient set becomes an axiom

[Diagram: the nutrient set seeds a metabolite pool; reactions from the PGDB reaction set whose reactants are all in the pool are “fired” (for example A + B → C) and their products are added to the pool]

Nutrients: A, B, C, E, F

  A + B → W
  C + D → X
  E + F → Y
  W + Y → Z

Produced Compounds: W, Y, Z
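The worked example on this slide can be reproduced with a short forward-propagation loop. This Python sketch treats each reaction as a production rule that fires once all of its reactants are in the metabolite pool; the function name and the tuple encoding of reactions are assumptions for illustration, not the Pathway Tools implementation.

```python
def reachable(nutrients, reactions):
    """Fire reactions until no new compound appears.

    reactions: list of (reactants, products) pairs.
    Returns the compounds produced beyond the nutrient set.
    """
    pool = set(nutrients)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            # A reaction fires when every reactant is already in the pool.
            if set(reactants) <= pool and not set(products) <= pool:
                pool |= set(products)
                changed = True
    return pool - set(nutrients)


# The four reactions from the slide example.
REACTIONS = [
    (("A", "B"), ("W",)),
    (("C", "D"), ("X",)),
    (("E", "F"), ("Y",)),
    (("W", "Y"), ("Z",)),
]
```

Starting from nutrients A, B, C, E, F, the reaction C + D → X never fires because D is not available, so the produced set is W, Y, Z, as on the slide.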


Initial Metabolite Nutrient Set (Total: 21 compounds)

Nutrients (8) (M61 Minimal growth medium)

  H+, Fe2+, Mg2+, K+, NH3, SO4 2-, PO4 2-, Glucose

Nutrients (10) (Environment)

  Water, Oxygen, Trace elements (Mn2+, Co2+, Mo2+, Ca2+, Zn2+, Cd2+, Ni2+, Cu2+)

Bootstrap Compounds (3)

  ATP, NADP, CoA

Essential Compounds E. coli Total: 41 compounds

Proteins (20)

 Amino acids 

Nucleic acids (DNA & RNA) (8)

 Nucleosides 

Cell membrane (3)

 Phospholipids 

Cell wall (10)

 Peptidoglycan precursors  Outer cell wall precursors (Lipid-A, oligosaccharides)


http://brg.ai.sri.com/ptools09/slides/Tuesday/growth-experiment-Markus-Krummenacker.txt

Flux Balance Modeling

Generate, store, and update metabolic model within Pathway Tools

  Fast, accurate generation of metabolic model
  Close coupling to genome and regulatory information
  Extensive schema
  Extensive query and visualization tools

Debug/validate model using Pathway Tools

Export to SBML and import to constraint solver for model execution

Visualize reaction flux and omics data using overviews

Copy/update multiple PGDBs to reflect alternative strains