Transcript Document
Protein and Specialized Sequence Databases
© Wiley Publishing. 2007. All Rights Reserved.

Learning Objectives
Finding out the basics of protein maturation
Deciphering a Swiss-Prot entry
Getting to know specialized protein databases such as KEGG (the metabolic pathways database) or PDB (the structure database)

Outline
1. Getting from a gene to a mature protein
2. Reading a UniProt/Swiss-Prot entry
3. Exploring metabolic databases such as KEGG
4. Finding out about post-translational modifications

From a Gene to a Functional Protein
DNA genes get transcribed into mRNAs
mRNAs are translated into proteins
Proteins often need to be matured before becoming active
Matured proteins must be transported to their destination
• Cell nucleus
• Mitochondria or other organelle
• Periplasm (bacteria)
• Secreted outside the cell
The protein is functional when it reaches the place where it has to work (just like you and me)!

Protein Maturation
Maturation can involve
• Removal of some fragments
• Specific protein cleavage
• Chemical modifications
• Phosphorylation
• Addition of lipids (lipidation) or sugars (glycosylation)

Knowing Your Protein
To understand how your protein works, you need to know about
• Its maturation
• Its transportation
• Its mechanism of functioning
All this information must be determined experimentally
If it has been done, it’s in Swiss-Prot

The Swiss-Prot Database
Entries describe all proteins that have known functions
Small, non-redundant database: 100,000 entries
• TrEMBL contains the 4 million putative proteins found in GenBank
• Swiss-Prot contains the subset of TrEMBL with a known function
All entries are annotated manually
Most accurate database for protein function
Access Swiss-Prot at www.expasy.ch

Browsing a Swiss-Prot Entry
Find this entry at www.expasy.org/uniprot/P00533

The Main Sections of a Swiss-Prot Entry
General information
• Accession number
References
• Bibliography
Comment section
• Functional information
Cross-references
• Links to entries in other
databases
Feature table
• Mapping of every known function
Sequence

The General Information in a Swiss-Prot Entry
The Entry Name
• Identifies the entry
• Can change if the entry gets merged
The Primary Accession Number
• Has a form such as P00533
• Is permanent and never changes
Last Modified tells you when the entry was last changed
The Protein Name and Synonyms provide some common names of your protein
The From and Taxonomy fields indicate which organism the protein comes from
The References section lists all the references used to compile this entry

The Comments Section
The Comments section lists all the known functions of the protein
This section is a valuable document compiled manually by specialists
Comments deal with the most standard topics (see table)

Comment Section of the Entry P00533

The Cross-reference Section
Contains hyperlinks to related entries in other databases
Automatically updated

Some Important Cross-References
EMBL: Original DNA sequence (GenBank)
PDB: Experimental structure of your protein
DIP: Proteins interacting with your protein
GlycoSuiteDB: Glycosylations
MIM: List of genetic diseases involving your protein
Ontologies: Function of your protein
Profiles: Known protein domains in your protein
ENSEMBL: Genomic location of your protein

The Features Section
Maps every known functional feature of your protein precisely onto its sequence
TRANSMEM: Transmembrane domain
ACT_SITE: Active sites
BINDING: Binding sites
DISULFID: Disulfide bridge between cysteines

Finding Out More About Your Protein’s Maturation
Proteins are often modified to make them active
Modification can involve attaching a lipid or a sugar
Use these resources to determine the details of the modification
www.ebi.ac.uk/RESID
• This site details every known post-translational modification
www.glycosuite.com
• A complete database of all known sugars found in proteins
www.lipidbank.jp
• A database of lipids

The Function of Your Protein
The Features and the Comments sections give you valuable functional
information
To find out about the function of your protein, you will need to determine
• Where your protein works
• Metabolic pathway in which the protein is involved
• The protein’s 3D structure
• Which protein family it belongs to
You may find this data by following links in the cross-references section

Where Does Your Protein Work?
Proteins are often part of a metabolic pathway
A metabolic pathway is like a production line linking several different proteins
Metabolic pathways modify metabolites by passing them from one enzyme to the next
On a KEGG pathway map, each enzyme appears with its EC number

Some Important Resources for Metabolic Pathways
www.genome.ad.jp/kegg
• KEGG is the most extensive database of metabolic pathways
• You can use it to compare species
www.chem.qmul.ac.uk/iubmb
• The IUBMB assigns the EC numbers used to describe an enzyme activity
www.ecocyc.org
• An exhaustive list of all known metabolic pathways in E. coli and other bacteria

What Is the Structure of Your Protein?
A protein must have the right structure to perform its function
The structure of a protein is the key to understanding it
Predicting protein structures is very difficult
Precise structure determination requires experiments
• X-ray crystallography
• Nuclear magnetic resonance
Prediction from sequence alone is possible but unreliable

Some Databases of Protein Structures
www.rcsb.org/pdb
• The database of protein structures
• A protein’s “PDB” is often used as a synonym for its structure
www.ncbi.nlm.nih.gov/Structure
• The other home of protein structures
swissmodel.expasy.org
• Prediction of structures from sequences

Some Important Protein Families
Proteins can be classified into families
This classification is based on both function and sequence
Very specialized databases are available for the most important families
www.kinasenet.org
• Kinases regulate most cellular processes; their deregulation is a cause of many cancers
imgt.cines.fr
• Immunoglobulins are key elements of our natural defenses
rebase.neb.com
• This site is a key resource on restriction enzymes

Wrapping It Up
Predicting protein function is a central goal in biology
Protein databases help organize knowledge
They provide the material for
• Developing new biological experiments
• Developing new prediction algorithms
• Extrapolating experimental data to unknown sequences
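
As a small illustration of the permanent accession number described earlier, the sketch below checks whether a string looks like a UniProt accession and, if so, builds the entry address quoted on the "Browsing a Swiss-Prot Entry" slide. The pattern is the one given in the UniProt help pages, not something stated in these slides, and the function names are hypothetical. The address is copied from the slide as-is and may have moved since 2007.

```python
import re

# Accession pattern as documented in the UniProt help pages (assumption:
# the slides only describe the format loosely, so this is not from them).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Base address copied from the slide "Browsing a Swiss-Prot Entry".
ENTRY_BASE = "www.expasy.org/uniprot/"


def is_accession(text: str) -> bool:
    """Return True if the whole string matches the accession pattern."""
    return UNIPROT_ACCESSION.fullmatch(text) is not None


def entry_address(accession: str) -> str:
    """Build the entry address for a well-formed accession (hypothetical helper)."""
    if not is_accession(accession):
        raise ValueError(f"not a well-formed accession: {accession!r}")
    return ENTRY_BASE + accession


if __name__ == "__main__":
    # P00533 is the entry used throughout the slides; EGFR_HUMAN is an
    # entry name, which can change, so it is not accepted as an accession.
    for candidate in ("P00533", "P0053", "EGFR_HUMAN"):
        print(candidate, is_accession(candidate))
```

The check only confirms that the string is well formed. It does not confirm that such an entry exists in the database.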