Transcript Document
Protein and Specialized Sequence Databases
© Wiley Publishing. 2007. All Rights Reserved.
Learning Objectives
Finding out the basics of protein maturation
Deciphering a Swiss-Prot entry
Getting to know specialized protein databases such as KEGG (the metabolic pathways database) or PDB (the structure database)
Outline
1. Getting from a gene to a mature protein
2. Reading a UniProt/Swiss-Prot entry
3. Exploring metabolic databases such as KEGG
4. Finding out about post-translational modifications
From a Gene to a Functional Protein
DNA genes get transcribed into mRNAs
mRNAs are translated into proteins
Proteins often need to be matured before becoming active
Matured proteins must be transported to their destination:
• Cell nucleus
• Mitochondria or other organelles
• Periplasm (bacteria)
• Outside the cell (secreted proteins)
The protein is functional when it reaches the place where it has to work (just like you and me)!
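The first two steps above can be illustrated with a minimal Python sketch. It transcribes a DNA coding strand into mRNA and then translates that mRNA with the standard genetic code. The sketch assumes a gene without introns, so there is no splicing, and it starts translation at the first AUG. The function names are hypothetical and do not come from any library.

```python
from itertools import product

# Standard genetic code: the 64 codons enumerated in U, C, A, G order
# (first, second, then third base), paired with one-letter amino acids.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG up to the first stop codon (or the end)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCCAAGTAA")))  # MAK
```

Real genes need more care than this. Eukaryotic introns, alternative start codons and non-standard genetic codes are all ignored here.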
Protein Maturation
Maturation can involve
• Removal of some fragments
• Specific protein cleavage
• Chemical modifications
• Phosphorylation
• Addition of lipids (lipidation) or sugars (glycosylation)
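Removing a fragment changes residue numbering. Swiss-Prot numbers its features on the full precursor sequence, so once an N-terminal fragment such as a signal peptide is cut off, every later position shifts. The sketch below illustrates this with a hypothetical helper and made-up data. It removes the fragment and renumbers the modification sites onto the mature chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modification:
    position: int  # 1-based position on the precursor sequence
    kind: str      # free-text label, e.g. "phosphorylation"


def mature(precursor: str, fragment_end: int,
           modifications: list[Modification]) -> tuple[str, list[Modification]]:
    """Remove residues 1..fragment_end (e.g. a signal peptide).

    Returns the mature sequence and the modifications renumbered on it;
    modifications inside the removed fragment are dropped.
    """
    mature_sequence = precursor[fragment_end:]
    kept = [
        Modification(m.position - fragment_end, m.kind)
        for m in modifications
        if m.position > fragment_end
    ]
    return mature_sequence, kept


# Hypothetical precursor with a 4-residue signal peptide.
sequence, sites = mature("MKLLAVSTQR", 4, [Modification(7, "phosphorylation")])
print(sequence, sites)  # the serine at precursor position 7 is now position 3
```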
Knowing Your Protein
To understand how your protein works, you need to know about
• Its maturation
• Its transportation
• Its mechanism of action
All this information must be determined experimentally
If this has been done, the results are in Swiss-Prot
The Swiss-Prot Database
Entries describe all proteins that have known functions
Small, non-redundant database: 100,000 entries
• TrEMBL contains the 4 million putative proteins translated from sequences in GenBank
• Swiss-Prot contains the subset of TrEMBL proteins with a known function
All entries annotated manually
Most accurate database for protein function
Access Swiss-Prot at www.expasy.ch
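Entries can also be retrieved programmatically. The sketch below assumes the current UniProt REST service at rest.uniprot.org, which is newer than the ExPASy addresses given here. The helper names are hypothetical, and the code uses only the Python standard library.

```python
import urllib.request

# Assumption: the present-day UniProt REST endpoint for UniProtKB entries.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def entry_url(accession: str, fmt: str = "fasta") -> str:
    """Build the download URL for one entry ("fasta" or "txt" flat file)."""
    return f"{UNIPROT_REST}/{accession}.{fmt}"


def fetch_entry(accession: str, fmt: str = "fasta") -> str:
    """Download an entry as text (requires network access)."""
    with urllib.request.urlopen(entry_url(accession, fmt)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> tuple[str, str]:
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header = lines[0].lstrip(">")
    sequence = "".join(lines[1:])
    return header, sequence
```

With network access, `parse_fasta(fetch_entry("P00533"))` should return the header and sequence of the entry browsed below.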
Browsing a Swiss-Prot Entry
Find this entry at www.expasy.org/uniprot/P00533
The Main Sections of a Swiss-Prot Entry
General information
• Accession number
References
• Bibliography
Comment section
• Functional information
Cross-references
• Links to entries in other databases
Feature table
• Position of every known feature on the sequence
Sequence
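In the downloadable text ("flat file") version of an entry, every line begins with a two-letter code. These codes map onto the sections listed above. The sketch below groups the lines of one record by section. The record is a hypothetical, heavily abbreviated example. The code-to-section mapping is a simplification that sends unlisted codes, such as keywords, to "other".

```python
from collections import defaultdict

# Two-letter line codes grouped into the sections of an entry.
SECTION_OF = {
    **dict.fromkeys(["ID", "AC", "DT", "DE", "GN", "OS", "OC", "OX"], "general"),
    **dict.fromkeys(["RN", "RP", "RC", "RX", "RG", "RA", "RT", "RL"], "references"),
    "CC": "comments",
    "DR": "cross-references",
    "FT": "features",
    "SQ": "sequence",
}

# Hypothetical, abbreviated record (not a real entry).
SAMPLE_RECORD = "\n".join([
    "ID   HYPO_HUMAN              Reviewed;          10 AA.",
    "AC   Q0HYP0;",
    "DE   RecName: Full=Hypothetical protein;",
    "OS   Homo sapiens (Human).",
    "RN   [1]",
    "RA   Doe J.;",
    "CC   -!- FUNCTION: Illustrative text only.",
    "DR   PDB; 0XYZ; X-ray; 2.00 A; A=1-10.",
    "FT   ACT_SITE        5",
    "SQ   SEQUENCE   10 AA;",
    "     MKLLAVSTQR",
    "//",
])


def split_sections(record: str) -> dict[str, list[str]]:
    """Group the content of each line under its entry section."""
    sections = defaultdict(list)
    for line in record.splitlines():
        if line.startswith("//"):        # end-of-entry marker
            break
        if line.startswith("     "):     # sequence data lines carry no code
            sections["sequence"].append(line.replace(" ", ""))
            continue
        code, content = line[:2], line[5:].rstrip()
        sections[SECTION_OF.get(code, "other")].append(content)
    return dict(sections)


print(split_sections(SAMPLE_RECORD)["comments"])
```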
The General Information in a Swiss-Prot Entry
The Entry Name
• Identifies the entry
• Can change if the entry gets merged
The Primary Accession Number
• Has a form such as P00533
• Is permanent and never changes
Last Modified lets you know when the entry was last modified
The Protein Name and Synonyms provide some common names of your protein
The From and Taxonomy fields indicate where the protein comes from
The References section lists all the references used to compile this entry
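Accession numbers follow a fixed format, so an accession is easy to tell apart from an entry name. The sketch below checks that format. It uses the regular expression published in the UniProt documentation, to the best of our knowledge. It tests the format only and does not check whether the entry exists.

```python
import re

# Accession format documented by UniProt: six characters, or ten for
# newer accessions.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text: str) -> bool:
    """True if text has the shape of a UniProt accession number."""
    return ACCESSION_RE.fullmatch(text) is not None


for candidate in ["P00533", "EGFR_HUMAN", "P0053"]:
    print(candidate, is_accession(candidate))
```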
The Comments Section
The Comments section lists all the known functions of the protein.
This section is a valuable summary compiled manually by specialists
Comments cover the most standard topics (see table)
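In the flat-file version of an entry, each comment topic starts with "CC   -!- TOPIC:" and continues on indented CC lines. The sketch below collects the text for each topic. The sample lines are hypothetical. Treating any other CC line as the end of a topic is an assumption based on the copyright notice that closes the block.

```python
import re

# Topic lines look like "CC   -!- TOPIC: text"; continuation lines start
# with "CC" followed by seven spaces.
TOPIC_RE = re.compile(r"^CC   -!- ([A-Z][A-Z -]*?): ?(.*)$")

# Hypothetical comment block.
SAMPLE_COMMENTS = "\n".join([
    "CC   -!- FUNCTION: Illustrative text describing the protein's role,",
    "CC       continued on a second line.",
    "CC   -!- SUBCELLULAR LOCATION: Cell membrane.",
    "CC   ---------------------------------------------------------------",
    "CC   Copyrighted by the UniProt Consortium",
])


def parse_comments(record: str) -> dict[str, list[str]]:
    """Collect the text of each topic (a topic may occur several times)."""
    topics: dict[str, list[str]] = {}
    current = None
    for line in record.splitlines():
        if not line.startswith("CC   "):
            continue
        match = TOPIC_RE.match(line)
        if match:
            current = match.group(1)
            topics.setdefault(current, []).append(match.group(2).strip())
        elif current is not None and line.startswith("CC       "):
            text = topics[current][-1] + " " + line[9:].strip()
            topics[current][-1] = text.strip()
        else:
            current = None  # e.g. the copyright notice closing the block
    return topics


print(parse_comments(SAMPLE_COMMENTS)["SUBCELLULAR LOCATION"])
```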
Comment Section of the Entry P00533
The Cross-reference Section
Contains hyperlinks to related entries in other databases
Automatically updated
Some Important Cross-References
EMBL: Original DNA sequence (EMBL/GenBank)
PDB: Experimental structure of your protein
DIP: Proteins interacting with your protein
GlycoSuiteDB: Glycosylations
MIM: List of genetic diseases involving your protein
Ontologies: Function of your protein
Profiles: Known protein domains in your protein
ENSEMBL: Genomic location of your protein
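In the flat-file version of an entry, each cross-reference is a DR line. It holds the database name followed by semicolon-separated fields. The sketch below groups these lines by database so you can find, say, all PDB structures at once. The sample identifiers are hypothetical placeholders, not real entries.

```python
from collections import defaultdict

# Hypothetical DR lines (identifiers are placeholders).
SAMPLE_XREFS = "\n".join([
    "DR   EMBL; X00000; CAA00000.1; -; mRNA.",
    "DR   PDB; 0XYZ; X-ray; 2.00 A; A=1-10.",
    "DR   PDB; 0XYW; NMR; -; A=1-10.",
])


def parse_cross_references(record: str) -> dict[str, list[list[str]]]:
    """Group DR lines by database: {"PDB": [["0XYZ", "X-ray", ...]], ...}."""
    refs = defaultdict(list)
    for line in record.splitlines():
        if not line.startswith("DR   "):
            continue
        # Drop the trailing period, then split the semicolon-separated fields.
        fields = [field.strip() for field in line[5:].rstrip(".").split(";")]
        database, identifiers = fields[0], fields[1:]
        refs[database].append(identifiers)
    return dict(refs)


for entry in parse_cross_references(SAMPLE_XREFS)["PDB"]:
    print(entry[0], entry[1])
```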
The Features Section
Maps every known functional feature of your protein precisely onto its sequence
TRANSMEM: Transmembrane domain
ACT_SITE: Active site
BINDING: Binding site
DISULFID: Disulfide bridge between two cysteines
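Feature positions are 1-based and inclusive, so extracting a feature from the sequence needs a small index conversion. The sketch below reads FT lines written as "KEY  start..end" or "KEY  position", which is an assumption based on the current flat-file layout (older releases used separate columns). It then cuts out the matching residues. Uncertain positions such as "<1" or "?" are skipped. The sequence and features are hypothetical.

```python
import re

FEATURE_RE = re.compile(r"^FT   (\w+)\s+(\d+)(?:\.\.(\d+))?\s*$")

# Hypothetical sequence and feature lines.
SEQUENCE = "MKCLLAVSTQRLLAVLCGWK"
SAMPLE_FEATURES = "\n".join([
    "FT   TRANSMEM        11..16",
    'FT                   /note="Helical"',
    "FT   ACT_SITE        8",
    "FT   DISULFID        3..17",
])


def parse_features(record: str) -> list[tuple[str, int, int]]:
    """Return (key, start, end) per feature; 1-based, inclusive positions."""
    features = []
    for line in record.splitlines():
        match = FEATURE_RE.match(line)  # qualifier lines (/note=...) don't match
        if match:
            key, start, end = match.groups()
            features.append((key, int(start), int(end or start)))
    return features


def feature_segment(sequence: str, start: int, end: int) -> str:
    """Residues start..end (1-based, inclusive) as a Python slice."""
    return sequence[start - 1:end]


for key, start, end in parse_features(SAMPLE_FEATURES):
    if key == "DISULFID":
        # The two positions are the bonded cysteines, not a segment.
        print(key, SEQUENCE[start - 1], SEQUENCE[end - 1])
    else:
        print(key, feature_segment(SEQUENCE, start, end))
```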
Finding Out More About Your Protein’s Maturation
Proteins are often modified to make them active
Modification can imply attaching a lipid or a sugar
Use these resources to determine the details of the modification
www.ebi.ac.uk/RESID
• This site details every known post-translational modification
www.glycosuite.com
• A complete database of all known sugars found in proteins
www.lipidbank.jp
• A database of lipids
The Function of Your Protein
The Features and the Comments sections give you valuable functional information
To find out about the function of your protein, you will need to determine
• Where your protein works
• The metabolic pathway in which the protein is involved
• The protein’s 3D structure
• Which protein family it belongs to
You may find this data by following links in the cross-references section
Where Does Your Protein Work?
Proteins are usually part of a metabolic pathway
A metabolic pathway is like a production chain linking several different proteins
Metabolic pathways modify metabolites by passing them from one enzyme to the next
On the KEGG pathway maps, each enzyme appears with its EC number
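The "production chain" idea can be sketched as a list of (substrate, EC number, product) steps. The walk below passes a metabolite from one enzyme to the next. The data is a simplified slice of glycolysis that ignores cofactors such as ATP/ADP and reversibility. Keying reactions by substrate assumes a linear chain with one reaction per substrate. Real KEGG maps are branched networks.

```python
# Simplified slice of glycolysis: (substrate, EC number, product).
REACTIONS = [
    ("glucose", "2.7.1.1", "glucose-6-phosphate"),                      # hexokinase
    ("glucose-6-phosphate", "5.3.1.9", "fructose-6-phosphate"),         # glucose-6-phosphate isomerase
    ("fructose-6-phosphate", "2.7.1.11", "fructose-1,6-bisphosphate"),  # 6-phosphofructokinase
]


def follow_pathway(start: str,
                   reactions: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Pass a metabolite from enzyme to enzyme until no listed reaction uses it."""
    next_step = {substrate: (ec, product) for substrate, ec, product in reactions}
    steps, seen = [], set()
    metabolite = start
    while metabolite in next_step and metabolite not in seen:  # guard against cycles
        seen.add(metabolite)
        ec, product = next_step[metabolite]
        steps.append((metabolite, ec, product))
        metabolite = product
    return steps


for substrate, ec, product in follow_pathway("glucose", REACTIONS):
    print(f"{substrate} --[EC {ec}]--> {product}")
```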
Some Important Resources for Metabolic Pathways
www.genome.ad.jp/kegg
• KEGG is the most extensive database of metabolic pathways
• You can use it to compare species
www.chem.qmul.ac.uk/iubmb
• The IUBMB assigns the EC numbers used to describe an enzyme activity
www.ecocyc.org
• An exhaustive list of all known metabolic pathways in E. coli and other bacteria
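An EC number has four levels, and the first level names the broad class of enzyme activity. The sketch below splits an EC number and looks up its top-level class. The class names are the IUBMB ones, including class 7 (translocases), which was added in 2018. The levels are kept as strings because incomplete numbers use "-".

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",  # class added by the IUBMB in 2018
}


def parse_ec(ec_number: str) -> tuple[str, list[str]]:
    """Split an EC number into its four levels and name its top-level class.

    Levels stay strings because incomplete numbers use "-" (e.g. "2.7.1.-").
    """
    levels = ec_number.removeprefix("EC ").strip().split(".")
    if len(levels) != 4 or not levels[0].isdigit() or int(levels[0]) not in EC_CLASSES:
        raise ValueError(f"not a valid EC number: {ec_number!r}")
    return EC_CLASSES[int(levels[0])], levels


print(parse_ec("EC 2.7.1.1"))  # ('Transferases', ['2', '7', '1', '1'])
```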
What Is the Structure of Your Protein?
A protein must have the right structure to perform its function
The structure of a protein is the key to understanding it
Predicting protein structures is very difficult
Precise determination requires experiments
• X-ray crystallography
• Nuclear magnetic resonance
Prediction from sequence alone is possible but unreliable
Some Databases of Protein Structures
www.rcsb.org/pdb
• The database of protein structures
• A protein’s PDB entry is often used as a synonym for its structure
www.ncbi.nlm.nih.gov/Structure
• The other home of protein structures
swissmodel.expasy.org
• Prediction of structures from sequences
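Structure files from the PDB store atoms as fixed-column ATOM records. For example, columns 31-54 hold the x, y and z coordinates in ångströms. The sketch below reads those columns and measures the distance between two alpha carbons. The three records are hypothetical. The coordinates were chosen to give the typical 3.8 Å spacing between consecutive alpha carbons. Only the fields used here are parsed.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


# Hypothetical records laid out in the fixed PDB columns.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  MET A   1       0.000   0.000   0.000",
    "ATOM     10  CA  GLY A   2       3.800   0.000   0.000",
])


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the PDB column positions (0-based slices)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the file's units (Å)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


alpha_carbons = [atom for atom in parse_atoms(SAMPLE_PDB) if atom.name == "CA"]
print(distance(alpha_carbons[0], alpha_carbons[1]))
```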
Some Important Protein Families
Proteins can be classified into families
This classification is based on both function and sequence
Very specialized databases are available for the most important families
www.kinasenet.org
• Kinases control almost every process in our cells; their deregulation is the cause of many cancers
imgt.cines.fr
• Immunoglobulins are key elements of our natural defenses
rebase.neb.com
• This site is a key resource on restriction enzymes
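A restriction enzyme cuts DNA at a short recognition site. REBASE lists each site together with the position of the cut. The sketch below finds every occurrence of a site and splits the sequence into fragments. The example uses EcoRI (site GAATTC, cut after the G). It only handles the top strand and exact, non-degenerate sites. The function names are hypothetical.

```python
def find_sites(sequence: str, site: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) match."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(site, start + 1)
    return positions


def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut the top strand cut_offset bases after the start of each site."""
    cuts = [position - 1 + cut_offset for position in find_sites(sequence, site)]
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


# EcoRI recognizes GAATTC and cuts between G and A (offset 1).
print(digest("AAGAATTCTTGAATTCAA", "GAATTC", 1))
```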
Wrapping It Up
Predicting protein function is a central goal in biology
Protein databases help organize knowledge
They provide the material for
• Developing new biological experiments
• Developing new prediction algorithms
• Extrapolating experimental data to unknown sequences