Transcript Document

Protein and Specialized
Sequence Databases
© Wiley Publishing. 2007. All Rights Reserved.
Learning Objectives
Finding out the basics of protein maturation
Deciphering a Swiss-Prot entry
Getting to know specialized protein
databases such as KEGG (the metabolic pathways database) or PDB (the structure
database)
Outline
1. Getting from a gene to a mature protein
2. Reading a UniProt/Swiss-Prot entry
3. Exploring metabolic databases such as
KEGG
4. Finding out about post-translational
modifications
From a Gene to a Functional Protein
Genes (DNA) are transcribed into mRNAs
mRNAs are translated into proteins
Proteins often need to be matured before becoming active
Matured proteins must be transported to their destination
• Cell nucleus
• Mitochondria or other organelle
• Periplasm (bacteria)
• Secreted outside the cell
 The protein is functional when it reaches the place where it
has to work (just like you and me)!
Protein Maturation
Maturation can involve
• Removal of some fragments
• Specific protein cleavage
• Chemical modifications
• Phosphorylation
• Addition of lipids or sugars
(glycosylation)
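As a rough illustration of fragment removal, the sketch below cuts a leading fragment off a precursor sequence using 1-based, inclusive coordinates of the kind a sequence database reports. The sequence and coordinates are invented for the example, and `mature_chain` is a hypothetical helper, not part of any library.

```python
def mature_chain(precursor: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a precursor sequence."""
    if not (1 <= start <= end <= len(precursor)):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return precursor[start - 1:end]


# Hypothetical 12-residue precursor (not a real protein): residues 1-5 stand
# in for a removed fragment, residues 6-12 for the mature chain.
precursor = "MKLLAGSTPEQR"
mature = mature_chain(precursor, 6, len(precursor))
print(mature)
```

Chemical modifications such as phosphorylation do not change the sequence string, so they would have to be tracked separately as annotations on residue positions.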
Knowing Your Protein
 To understand how your protein works, you need to
know about
• Its maturation
• Its transportation
• Its mechanism of functioning
 All this information must be determined
experimentally
 If it has been done, it’s in Swiss-Prot
The Swiss-Prot Database
 Entries describe all proteins that have known functions
 Small, non-redundant database: 100,000 entries
• TrEMBL contains the 4 million putative proteins found in GenBank
• Swiss-Prot contains the subset of TrEMBL with a known function
 All entries annotated manually
 Most accurate database for protein function
 Access Swiss-Prot at www.expasy.ch
Browsing a Swiss-Prot Entry
Find this entry at www.expasy.org/uniprot/P00533
The Main Sections of a
Swiss-Prot Entry
 General information
• Accession number
 References
• Bibliography
 Comment section
• Functional information
 Cross-references
• Links to entries in other databases
 Feature table
• Mapping of every known functional feature onto the sequence
 Sequence
The General Information
in a Swiss-Prot Entry
 The Entry Name
• Identifies the entry
• Can change if the entry gets merged
 The Primary Accession Number
• Has the form PXXXXX (for example, P00533)
• Is permanent and never changes
Last Modified lets you know when the entry was last modified
The Protein Name and Synonyms provide some common names of your protein
The From and Taxonomy fields indicate where the protein comes from
The References section lists all the references used to compile this entry
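To tell an accession number apart from an entry name in a script, a pattern check is usually enough. The sketch below assumes the accession pattern that UniProt documents at the time of writing (worth re-checking against the current help pages); `is_accession` is a hypothetical helper name.

```python
import re

# Assumption: accession pattern as currently documented by UniProt.
# It covers the classic 6-character form (such as P00533) and the
# longer 10-character form introduced later.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text: str) -> bool:
    """True if the whole string looks like a UniProt accession number."""
    return ACCESSION_RE.fullmatch(text) is not None


print(is_accession("P00533"))      # accession number used in these slides
print(is_accession("EGFR_HUMAN"))  # an entry name, not an accession
```

Because the accession never changes while the entry name can, scripts and spreadsheets are better keyed on the accession.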
The Comments Section
 The Comments section lists all
the known functions of the
protein.
 This section is a valuable
document compiled manually by
specialists
 Comments deal with the most
standard topics (see table)
Comment Section of the Entry
P00533
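In the flat-file version of an entry, comments appear on lines starting with `CC`, and each topic opens with `-!- TOPIC:`. The sketch below groups such lines by topic. The sample lines are invented placeholders, not the real content of P00533, and `parse_comments` is a hypothetical helper.

```python
def parse_comments(lines):
    """Group Swiss-Prot CC lines into a {topic: text} dictionary."""
    topics = {}
    current = None
    for line in lines:
        if not line.startswith("CC"):
            continue
        body = line[2:].strip()
        if body.startswith("---"):
            # Start of the trailing copyright block: stop collecting.
            current = None
        elif body.startswith("-!-"):
            topic, _, text = body[3:].partition(":")
            current = topic.strip()
            topics[current] = text.strip()
        elif current is not None and body:
            # Continuation line of the current topic.
            topics[current] += " " + body
    return topics


# Invented sample lines, for illustration only.
sample = [
    "CC   -!- FUNCTION: Hypothetical receptor that binds an invented ligand",
    "CC       and relays the signal.",
    "CC   -!- SUBCELLULAR LOCATION: Membrane.",
]
comments = parse_comments(sample)
for topic, text in comments.items():
    print(topic, "->", text)
```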
The Cross-reference Section
 Contains hyperlinks to
other entries in other
databases
 Automatically updated
Some Important Cross-References
EMBL: GenBank original DNA sequence
PDB: Experimental structure of your protein
DIP: Proteins interacting with your protein
GlycoSuiteDB: Glycosylations
MIM: List of genetic diseases involving your protein
Ontologies: Function of your protein
Profiles: Known protein domains in your protein
ENSEMBL: Genomic location of your protein
The Features Section
Precisely locates every known functional feature of your protein on its sequence
TRANSMEM: Transmembrane domain
ACT_SITE: Active sites
BINDING: Binding sites
DISULFID: Disulfide bridges between cysteines
Finding Out More About Your
Protein’s Maturation
Proteins are often modified to make them active
Modification can involve attaching a lipid or a sugar
Use these resources to determine the details of the modification
www.ebi.ac.uk/RESID
• This site details every known post-translational modification
www.glycosuite.com
• A complete database of all known sugars found in proteins
www.lipidbank.jp
• A database of lipids
The Function of Your Protein
The Features and the Comments sections give you valuable functional information
To find out about the function of your protein, you will need to determine
• Where your protein works
• The metabolic pathway in which the protein is involved
• The protein’s 3D structure
• Which protein family it belongs to
You may find this data by following links in the cross-references section
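Feature annotations can also be pulled out of a flat-file entry with a few lines of code. The sketch below reads `FT` lines and returns each feature key with its start and end position. It assumes either the older layout (`FT   KEY   start   end   description`) or the newer `start..end` range notation, and it skips anything else, including single-position features in the newer layout. The sample lines are invented, and `parse_features` is a hypothetical helper.

```python
def parse_features(lines):
    """Return (key, start, end, note) tuples from Swiss-Prot FT lines."""
    features = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[0] != "FT":
            continue
        key = parts[1]
        if ".." in parts[2]:
            # Newer layout: "FT   KEY   start..end"
            start, _, end = parts[2].partition("..")
            note = " ".join(parts[3:])
        elif len(parts) >= 4:
            # Older layout: "FT   KEY   start   end   description"
            start, end = parts[2], parts[3]
            note = " ".join(parts[4:])
        else:
            continue
        # Continuation lines and fuzzy positions fail this check and are skipped.
        if start.isdigit() and end.isdigit():
            features.append((key, int(start), int(end), note))
    return features


# Invented sample lines, for illustration only.
sample = [
    "FT   SIGNAL        1      5       Hypothetical.",
    "FT   TRANSMEM     40     62       Hypothetical.",
    "FT   ACT_SITE     90     90",
    "FT   TRANSMEM     100..120",
]
features = parse_features(sample)
for feature in features:
    print(feature)
```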
Where Does Your Protein Work?
Many proteins, enzymes in particular, work as part of a metabolic pathway
A metabolic pathway is like a production line linking several different proteins
Metabolic pathways modify metabolites by passing them from one enzyme to the next
On a KEGG pathway map, each enzyme appears with its EC number
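An EC number has four dot-separated fields, and the first field gives the broad enzyme class. The sketch below splits an EC number and looks up that class. `describe_ec` is a hypothetical helper, and the example number is used only for illustration.

```python
# Top-level enzyme classes of the EC nomenclature.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def describe_ec(ec_number: str):
    """Return (class name, list of the four fields) for an EC number."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError("an EC number has four dot-separated fields")
    return EC_CLASSES[int(fields[0])], fields


ec_class, ec_fields = describe_ec("EC 2.7.10.1")
print(ec_class, ec_fields)
```

Partially characterized enzymes carry dashes in the trailing fields (for example `3.4.-.-`), and the lookup still works because only the first field is converted to a number.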
Some Important Resources for
Metabolic Pathways
www.genome.ad.jp/kegg
• KEGG is the most extensive database of metabolic pathways
• You can use it to compare species
www.chem.qmul.ac.uk/iubmb
• The IUBMB assigns the EC numbers used to describe an enzyme activity
www.ecocyc.org
• An exhaustive list of all known metabolic pathways in E. coli and other bacteria
What Is the Structure
of Your Protein?
A protein must have the right structure to perform its function
The structure of a protein is the key to understanding it
Predicting protein structures is very difficult
Precise structure determination requires experiments
• X-ray crystallography
• Nuclear magnetic resonance
 Prediction from sequence alone is possible but unreliable
Some Databases of
Protein Structures
www.rcsb.org/pdb
• The database of protein structures
• A protein’s “PDB” is often used as a synonym for
its structure
www.ncbi.nlm.nih.gov/Structure
• The other home of protein structures
swissmodel.expasy.org
• Prediction of structures from sequences
Some Important Protein Families
Proteins can be classified into families
This classification is based on both function and sequence
Very specialized databases are available for the most important families
www.kinasenet.org
• Kinases regulate most cellular processes; their deregulation is a cause of many cancers
imgt.cines.fr
• Immunoglobulins are key elements of our natural defenses
rebase.neb.com
• This site is a key resource on restriction enzymes
Wrapping It Up
 Predicting protein function is a central goal in
biology
 Protein databases help organize knowledge
 They provide the material for
• Developing new biological experiments
• Developing new prediction algorithms
• Extrapolating experimental data to unknown sequences