Transcript Slide 1
ShewCyc and BeoCyc: discovery platforms for environmental and bioenergy research
Tatiana Karpinets, Gretta Serres, and Michael Leuze
Oak Ridge National Lab, Marine Biological Laboratory
Pathway Tools Workshop 2010

ShewCyc and Shewanella Knowledgebase
http://shewanella-knowledgebase.org:8080/Shewanella/
Experimental data; computational predictions; analytical and visualization tools; biological insight

Manually curated PGDB for Shewanella oneidensis MR-1
•Manual reannotation
•Localization prediction
•Regulon predictions (http://regprecise.lbl.gov/RegPrecise/)
•Capture of information from literature, gene expression data, and proteomics
Yang et al. J Biol Chem. 281:29872-85
Regulators: Fur, Fnr, Crp, ArgR

Shewanella oneidensis Metabolic Pathway Viewer, developed by Erich Baker, Baylor University, TX
http://watson.ecs.baylor.edu/4360/

Multi-Genome Annotation Solution: Ortholog Editor in Combination with Genome Editors
Manual curation; consistency check; improved individual genome editors
Ortholog Table Tools: Sort, Search, Table Overview, Edit View, Evidence View, Alignments View, Consistency Check View, Download
Edit View
Alignment View: MUSCLE (3.6) multiple sequence alignment

Sfri_3956 MKIRVLISLATAFFMLNTSSAFAKDPADTAVQPLLVKPKVIIFDVNETLLDLENMRASVG
Swoo_1992 ---------------------MTLELRDTSIIKDF--PKAVIFDTDNTLYPYHYSHQQAS

Sfri_3956 KALNGREDLLPLWFSTMLHHSLVVSATGDYQTFGSIGVA---------SLQMVAEINGIA
Swoo_1992 LAVQQKAEKILGIKQSRFSDALKISKREIKERLGETASSHSRLLYFQRTIELLGLKTQIM

Sfri_3956 ITPEQAKTAILTPLRSLPAHPDVAEGLAKLKAQGYKLVTLTNSSLEGVTLQLKNANLSQY
Swoo_1992 TTLDLEQTYWRTFLTNSQLFPEMHEFLHDLRAHGIQSAVVTDLTAQIQFRKLVYFGLHEA

Sfri_3956 FDANLSIESVGVFKPHLKTYQWAIKDLGVNADEAL-MVAAHGW-DIAGADKAGLQTAFIR
Swoo_1992 FDYIVTTEEAGADKPNPLPFQLARSKLGLEKGDNLWMIGDHPVKDIQGAKKT-LGAITLQ

Sfri_3956 RQGKVLFPLAAQPDYNVL--DVNELASTLAKFN-----
Swoo_1992 KNHKDVKVLKGKEGPDILFDKYSELRELLGEISSNKGK

Evidence View
Consistency Check View: group annotation; protein length consistency; original annotations; automatic identification of bad grouping; domain consistency
Panels: Protein Length Consistency; Domain Consistency

ShewCyc
Probing Intergenic Regions (IGs) in S. oneidensis using microarrays
Experimental data (Many Microbe Microarrays Database): CRP mutant vs. wild-type MR-1 at various time points during the transition from aerobic growth on lactate to anaerobic growth on lactate and fumarate. The Affymetrix microarray was designed to probe transcripts derived from both genes and IGs.
Examples: IG SO0016_SO0022; IG SO0017-SO0015
A regulatory effect of IG transcription (upstream gene, IG, downstream gene)
Subset I: IG regions with the same direction of change in expression as their neighboring genes (1466)
Subset A: IG regions whose direction of change in expression is opposite to that of the upstream gene (805)
Subset B: IG regions whose direction of change in expression is opposite to that of the downstream gene (820)

Revealing a biological role of intergenic-region transcription using Pathway Tools
IG (SO2490_SO2491): SO2490 (HexR), SO2491 (PykA); enzymes of the Entner–Doudoroff (ED) pathway

BioEnergy Science Center (BESC)
Cellulosic biomass → breakdown into sugars → sugar fermentation → fuel(s)
BESC's approach: (1) designing plant cell walls for rapid deconstruction and (2) engineering microbes to convert plant biomass into biofuel in a single step (consolidated bioprocessing)

Manually curated (NREL, UGA) pathway genome database for Populus trichocarpa
Usage summary; private and public portal: http://besckb.ornl.gov

Metabolic reconstructions for BESC-relevant microorganisms (BeoCyc)
Integrating experimental data from LIMS and external resources
Computational predictions:
•Orthologs/inparalogs
•Protein domains
•Protein localization
•Metabolic enzymes and pathways
•Carbohydrate-active enzymes
•and more
Genome comparison, analysis
and visualization tools:
•Genome browsers
•Comparative chromosome maps (CMAP)
•Metabolic maps
•Omics viewers
•and more
Microbial phenotype comparison toolkit
Analysis framework

CAZymes Analysis Toolkit (CAT)
A novel approach based on association analysis to discover links between CAZy families and Pfam domains
Web site: http://cricket.ornl.gov/cgi-bin/cat.cgi
•Find conserved associations between CAZy families and Pfam domains
•Assign carbohydrate activity to unknown protein domains
•Suggest novel CAZy families
•Find CAZymes among hypothetical proteins
•Assign CAZy families to a sequence with high specificity and sensitivity

Private BeoCyc hosts a P. trichocarpa PGDB manually curated by the NREL team, including:
1. GDP-mannose biosynthesis II
2. GDP-L-fucose biosynthesis I (from GDP-D-mannose)
3. GDP-L-fucose biosynthesis II (from L-fucose)
4. UDP-D-galactose biosynthesis
5. UDP-D-galacturonate biosynthesis I (from UDP-D-glucuronate)
6. UDP-D-galacturonate biosynthesis II (from D-galacturonate)
7. UDP-D-glucose biosynthesis (from sucrose)
8. UDP-D-glucuronate biosynthesis (from myo-inositol)
9. UDP-D-xylose biosynthesis (compartmentalized)
10. UDP-L-arabinose biosynthesis I (from UDP-xylose) in endoplasmic reticulum
11. UDP-L-arabinose biosynthesis I (from UDP-xylose) in cytosol
12. UDP-L-arabinose biosynthesis I (from UDP-xylose) in Golgi lumen
13. UDP-L-arabinose biosynthesis II (from L-arabinose) in cytosol

BeoCyc and BESC knowledgebase: http://bobcat.ornl.gov/besc/index.jsp

Improving Populus trichocarpa genome annotation
•Poor annotation of the poplar genome (gene models and predicted enzymes)
•Poor representation of cell wall biosynthesis and related pathways in the reference databases (MetaCyc, KEGG, and PlantCyc)
Arabidopsis genes
Future!!!
RESD & PESD; Populus EC numbers; EC numbers; genes; BLAST; sequences; ortholog search; Kyoto Encyclopedia of Genes and Genomes (genomic and molecular information)

Integration of the metabolic reconstructions into the BESC knowledgebase
•RefSeq files from NCBI
•Input files for PathoLogic
•Enzyme information (KEGG, CAZy)
•Pathway/Genome Databases
•Refine the PGDBs
•Create MySQL tables
•Supplement databases with additional annotations
•Compare phenotypes of the organisms in terms of their genomic and metabolic characteristics

Challenge: automatic PGDB generation for draft genomes using one table of ORF predictions and FASTA contigs!!!
•Automatically predict TUs, complexes, and transporters

ORF prediction table (columns: Replicon, Locus, Start, Stop, Product, EC1–EC5), with rows for replicons C0, C1r, C3r, and C4r (e.g., loci or4062–or4069 on C0, or2287–or2294 on C1r, and or2604 onward on C3r). Products include glutaryl-CoA dehydrogenase, naphthoate synthase, cysteine desulfurase, 3-hydroxybutyryl-CoA epimerase, and hypothetical proteins; EC numbers include 4.3.1.19, 1.3.99.7, 4.1.3.36, 2.7.7.62, 5.1.2.3, and 1.1.1.35.

FASTA sequence for each contig:
>C0
ATAAAGACGAAAAGCACCGGAT
CGAACACCGCCACTTCGAAAAC
TTCGAACGTCTACGG…
>C1r
AGTGCGGCTAGGCCGTCGATGG
AGCTAGGCCGTCGA…
>C3r
GACGAAAAGCAGACGAAAAGCA
GACGAAAAGCAGCT…
…

Involvement of Single-Genotype Consortia in Degradation of Aromatic Compounds by Rhodopseudomonas palustris
Growth substrates: p-coumarate; benzoate
R. palustris modes of growth:
- anoxygenic photosynthesis
- aerobic or anaerobic respiration and fermentation
- fixation of nitrogen gas
- utilization of carbon through CO2 reduction using H2 as an electron donor
Average log2 ratio of the expression of nitrogenases with different cofactors during growth on p-coumarate and benzoate versus succinate
•Transporters
•Chemotaxis operons
•Curli formation operon
Expression of R. palustris phenotypes under p-coumarate-degrading (black columns) and benzoate-degrading (white columns) conditions, compared with growth on succinate
Structures of R. palustris consortia mediating anaerobic growth on p-coumarate (A) and on benzoate (B)
Putative electron donor and electron acceptor reactions under different modes of R. palustris growth
Changes in total nitrogen, ammonium, and dissolved nitrogen gas during benzoate degradation as functions of OD660

Acknowledgements
ShewCyc and Shewanella Knowledgebase: PNNL: Margaret Romine; Marine Biological Laboratory: Margrethe Serres; ORNL: Denise Schmoyer, Guruprasad Kora, Mustafa Syed, Erich Baker, Hoony Park, Nagiza Samatova, and Edward Uberbacher
BeoCyc and BESC Knowledgebase: NREL: Ambarish Nag, Christopher Chang; UGA: Maor Bar-Peled; ORNL: Mustafa Syed, Hoony Park, Morrey Parang, Denise Schmoyer, and Edward C. Uberbacher
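The draft-genome challenge described earlier (one table of ORF predictions plus FASTA contigs as the only input for automatic PGDB generation) can be illustrated with a short Python sketch. It is not the authors' pipeline. It assumes a tab-delimited table with the columns shown on that slide (Replicon, Locus, Start, Stop, Product, EC1–EC5) and writes each replicon's genes in a PathoLogic-style .pf layout using the attributes ID, NAME, STARTBASE, ENDBASE, PRODUCT-TYPE, FUNCTION, and EC, with "//" closing each record; those attribute names are an assumption to check against the Pathway Tools documentation. The helper names read_fasta and orf_table_to_pf and the toy input are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical helpers for illustration only; not the BESC/BeoCyc pipeline.


def read_fasta(handle):
    """Parse FASTA text into a {contig_id: sequence} dict."""
    seqs, current, chunks = {}, None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                seqs[current] = "".join(chunks)
            # The contig ID is the first word after ">".
            current, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if current is not None:
        seqs[current] = "".join(chunks)
    return seqs


def orf_table_to_pf(table_handle, contigs):
    """Turn a tab-delimited ORF table into one .pf-style text per replicon.

    Assumed columns: Replicon, Locus, Start, Stop, Product, EC1..EC5.
    Assumed .pf attributes: ID, NAME, STARTBASE, ENDBASE, PRODUCT-TYPE,
    FUNCTION, EC, with "//" ending each gene record.
    """
    per_replicon = defaultdict(list)
    for row in csv.DictReader(table_handle, delimiter="\t"):
        replicon, locus = row["Replicon"], row["Locus"]
        if replicon not in contigs:
            raise ValueError(f"{locus}: no FASTA contig named {replicon}")
        start, stop = int(row["Start"]), int(row["Stop"])
        contig_len = len(contigs[replicon])
        # Start > Stop is accepted, assuming reverse-strand ORFs are written that way.
        if not (1 <= min(start, stop) and max(start, stop) <= contig_len):
            raise ValueError(f"{locus}: coordinates outside contig {replicon}")
        lines = [
            f"ID\t{locus}",
            f"NAME\t{locus}",
            f"STARTBASE\t{start}",
            f"ENDBASE\t{stop}",
            "PRODUCT-TYPE\tP",
            f"FUNCTION\t{row['Product']}",
        ]
        for i in range(1, 6):  # up to five EC columns; blank or missing cells are skipped
            ec = (row.get(f"EC{i}") or "").strip()
            if ec:
                lines.append(f"EC\t{ec}")
        lines.append("//")
        per_replicon[replicon].append("\n".join(lines))
    return {rep: "\n".join(recs) + "\n" for rep, recs in per_replicon.items()}


if __name__ == "__main__":
    # Toy inputs invented for illustration, not data from the slides.
    fasta = io.StringIO(">C0 toy contig\nATGAAACCC\nGGGTTTTAA\n")
    table = io.StringIO(
        "Replicon\tLocus\tStart\tStop\tProduct\tEC1\tEC2\tEC3\tEC4\tEC5\n"
        "C0\ttoy_0001\t1\t18\thypothetical enzyme\t1.1.1.35\t\t\t\t\n"
    )
    for replicon, text in orf_table_to_pf(table, read_fasta(fasta)).items():
        print(f"== {replicon}.pf ==")
        print(text, end="")
```

Checking coordinates against the contig lengths catches mismatches between the ORF table and the FASTA file before anything reaches PathoLogic. PathoLogic also has to be told which annotation file belongs to which sequence file; that step is left out of this sketch.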