
Web Resources for Bioinformatics
Vadim Alexandrov and Mark Gerstein
What is Bioinformatics?
• (Molecular) Bio - informatics
• One idea for a definition?
Bioinformatics is conceptualizing biology in
terms of molecules (in the sense of physical chemistry) and then applying “informatics”
techniques (derived from disciplines such as
applied math, CS, and statistics) to understand and
organize the information associated with these
molecules on a large scale.
• Bioinformatics is “MIS” for Molecular Biology
Information. It is a practical discipline with many
applications.
Web Resources:
• Molecules
– Sequence, Structure,
Function
• Algorithms
– HMMs
– alignments
– simulations
• Databases
0. Good Starting Point
http://www.ncbi.nlm.nih.gov/
http://www.rcsb.org/pdb/
Web tour of UCL tools and resources
www.biochem.ucl.ac.uk/bsm/biocomp
1. PDBsum capabilities
PDBsum: www.biochem.ucl.ac.uk/bsm/pdbsum
Starting point for looking at PDB structure
Each entry contains:
a. View
• Schematic pictures of the entry
• Interactive views (RasMol/VRML)
b. Details
• Name, date and description of macromolecules in PDB entry
• Authors, resolution and R-factor
c. Links
• PDB header information
• PDB, NDB, SWISSPROT
• PQS (protein quaternary structure), MMDB
• CATH, SCOP, FSSP
• Structure check reports - PROCHECK, WHATIF
• Many others - enzyme, PRINTS etc.
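The "Details" items above (name, date, resolution) come from the header records of the PDB entry itself. As a rough illustration only, here is a minimal Python sketch that pulls a few of these fields out of PDB-format text. It assumes the standard fixed-column HEADER record and the usual "REMARK   2 RESOLUTION." line; the function name `parse_pdb_header` and the example entry are made up for this sketch and are not part of PDBsum.

```python
import re

def parse_pdb_header(lines):
    """Pull ID, classification, deposition date and resolution from PDB-format text."""
    info = {"id": None, "classification": None, "date": None, "resolution": None}
    for line in lines:
        if line.startswith("HEADER"):
            # Fixed columns (1-based): 11-50 classification, 51-59 date, 63-66 ID code.
            info["classification"] = line[10:50].strip()
            info["date"] = line[50:59].strip()
            info["id"] = line[62:66].strip()
        elif line.startswith("REMARK   2"):
            # Resolution line, e.g. "REMARK   2 RESOLUTION.    2.00 ANGSTROMS."
            m = re.search(r"RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS", line)
            if m:
                info["resolution"] = float(m.group(1))
    return info

# Made-up entry for illustration only (not a real PDB record).
example = [
    "HEADER    " + "HYDROLASE".ljust(40) + "01-JAN-00" + "   " + "1ABC",
    "REMARK   2 RESOLUTION.    2.00 ANGSTROMS.",
]
header_info = parse_pdb_header(example)
```

The R-factor is left out on purpose: it is reported in free-text REMARK 3 lines whose layout varies between entries, so a robust parser would need more care than this sketch takes.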
PDBsum capabilities, continued
d. Each chain
• CATH classification
• Plot of sequence, secondary structure and domain assignments
• PROMOTIF analysis
• TOPS topology diagram
• SAS – annotated FASTA alignment of related sequences in PDB
• PROSITE pattern
e. Nucleic acid ligands
• Base sequence
• NUCPLOT diagram of interactions
f. Small molecule ligands
• Schematic diagram of ligand
• LIGPLOT diagram of interactions
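The per-chain "PROSITE pattern" entry above refers to short sequence patterns written in PROSITE syntax. To make that notation concrete, here is a hedged Python sketch that translates a PROSITE-style pattern into a regular expression. The helper name `prosite_to_regex` is hypothetical, and the sketch covers only the common elements (x, [..], {..}, repeat counts, and the < and > anchors), not every corner of the syntax.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regular expression."""
    pattern = pattern.rstrip(".")           # patterns are often written with a final period
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")   # '<' pins the match to the N-terminus
        anchor_end = element.endswith(">")       # '>' pins the match to the C-terminus
        element = element.strip("<>")
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                           # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # {P} = any residue except P
        # [ST] and single letters are already valid regex as written.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)

# The N-glycosylation site pattern, used here as a familiar example.
glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
```

With this, `re.search(glyco, sequence)` reports whether a sequence contains the site. The toy sequences in any such check are illustrative only.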
2. SAS (Sequence Annotated by Structure): www.biochem.ucl.ac.uk/bsm/sas
Annotation of protein sequences by structural information.
a. Input for FASTA search of rest of PDB
- PDB code
- SWISS-PROT code
- Paste sequence
- Upload own alignment
b. Annotation
- Residue type
- Ligand contacts
- Active site residues
- CATH domains
- Residue similarity
c. Options
- Select inclusion in alignment
- Colour/b&w, secondary structure
d. View 3D structural superposition
- coloured by SAS annotation
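To give a feel for what a "residue similarity" annotation of an alignment could look like, here is a small Python sketch that scores each column of an alignment by how often the other sequences match the query residue. This is an assumption-laden stand-in: plain identity is used instead of whatever similarity measure SAS actually applies, and the function name `column_identity` and the toy alignment are invented for illustration.

```python
def column_identity(alignment):
    """For each column, the fraction of other sequences matching the first (query) sequence."""
    query, others = alignment[0], alignment[1:]
    if not others:
        raise ValueError("need the query plus at least one aligned sequence")
    scores = []
    for i, residue in enumerate(query):
        if residue == "-":
            scores.append(None)  # gap in the query: nothing to annotate
            continue
        matches = sum(1 for seq in others if seq[i] == residue)
        scores.append(matches / len(others))
    return scores

# Toy alignment (made up); all rows must have the same length.
toy_alignment = [
    "MKV-LA",
    "MKVQLA",
    "MRV-IA",
]
scores = column_identity(toy_alignment)
```

A per-column score like this is the kind of value that could then drive the colouring of an alignment or a superposed structure.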
3. CATH: www.biochem.ucl.ac.uk/bsm/cath
Hierarchical domain classification of protein structures in the PDB. Four basic levels:
a. Class (automated): secondary structure composition and packing within structure
- mainly-α, mainly-β, mixed α-β, low secondary structure
b. Architecture (manual): overall shape of the domain structure as determined by the orientations of the secondary structures. Connectivity is ignored.
- e.g. barrel, sandwich etc.
c. Topology (semi-automated): fold families determined by shape and connectivity of secondary structures
- e.g. mainly-β two-layer sandwich
d. Homologous superfamily (semi-automated): domains of common ancestors determined by sequence and structural similarity
e. Sequence family (automated): highly similar structures and function as determined by sequence identity
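The hierarchy above maps naturally onto a dotted numeric code, one number per level. The Python sketch below assumes the familiar C.A.T.H code form (for example 3.40.50.300) and the class numbering 1 to 4 in the order listed under "Class"; neither detail is stated on the slide, and the function names are hypothetical. The example codes serve only to exercise the parser.

```python
# Assumed class numbering, following the order given for the Class level.
CLASS_NAMES = {
    1: "mainly-alpha",
    2: "mainly-beta",
    3: "mixed alpha-beta",
    4: "low secondary structure",
}
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")

def parse_cath_code(code):
    """Split a dotted code such as '3.40.50.300' into named hierarchy levels."""
    numbers = [int(part) for part in code.split(".")]
    if not 1 <= len(numbers) <= len(LEVELS):
        raise ValueError("expected between 1 and 4 dot-separated numbers: " + code)
    node = dict(zip(LEVELS, numbers))
    node["class_name"] = CLASS_NAMES.get(numbers[0], "unknown")
    return node

def deepest_shared_level(code_a, code_b):
    """Name of the deepest level at which two codes still agree, or None."""
    deepest = None
    for level, a, b in zip(LEVELS, code_a.split("."), code_b.split(".")):
        if a != b:
            break
        deepest = level
    return deepest

node = parse_cath_code("3.40.50.300")
shared = deepest_shared_level("3.40.50.300", "3.40.50.720")
```

Two domains that agree down to the topology level but differ in the last number share a fold but are placed in different homologous superfamilies.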
4. Other classification databases
a. Enzyme structures database:
www.biochem.ucl.ac.uk/bsm/enzymes
- PDB enzyme structures classified by E.C. number
b. Protein-DNA database:
www.biochem.ucl.ac.uk/bsm/prot_dna/prot_dna.html
- PDB complex structures classified by binding motif
5. Protein sequence analysis:
www.biochem.ucl.ac.uk/bsm/dbbrowser
Protein sequence search using protein fingerprints.
Fingerprint: a group of conserved sequence motifs used to characterize a protein family.
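The idea of a fingerprint as a group of motifs can be sketched in a few lines of Python: a sequence "hits" only if every motif is found, in order, without overlap. This is a deliberately simplified illustration. The motifs are written as regular expressions, which is an assumption of this sketch rather than how the fingerprint search itself is implemented, and the toy motifs and sequences are invented.

```python
import re

def match_fingerprint(sequence, motifs):
    """Return the start position of each motif if all occur in order, else None."""
    positions = []
    search_from = 0
    for motif in motifs:
        m = re.compile(motif).search(sequence, search_from)
        if m is None:
            return None          # one missing motif means no fingerprint match
        positions.append(m.start())
        search_from = m.end()    # next motif must start after this one ends
    return positions

# Invented three-motif fingerprint and two toy sequences.
toy_fingerprint = ["G.GK[ST]", "DE.D", "W[FY]"]
hit = match_fingerprint("AAGSGKTAAADEADAAWFAA", toy_fingerprint)
miss = match_fingerprint("AAGSGKTAAAWFAADEAD", toy_fingerprint)
```

The second sequence contains all three motifs but in the wrong order, so it is rejected. Requiring several motifs together is what makes a fingerprint more selective than any single pattern.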
6. Gross-level protein properties
Protein-protein interaction server:
www.biochem.ucl.ac.uk/bsm/PP/server
Protein-DNA interaction server:
www.biochem.ucl.ac.uk/bsm/PP/server
7. Atomic-level protein properties
a. PROCAT: www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html
- Database of 3D enzyme active sites
b. Hydrogen bond atlas: www.biochem.ucl.ac.uk/~mcdonald/atlas
- Graphical summary of hydrogen-bonding properties of amino acids
c. Atlas of side chain-side chain/side chain-base interactions:
www.biochem.ucl.ac.uk/bsm/sidechains
- interaction geometries of side chain and side chain-base pairs
8. Publicly available software
(protein structure/interaction)
a. HBPLUS - calculation of interactions in PDB structures
b. LIGPLOT - schematic diagrams of protein-ligand interactions
c. NUCPLOT - schematic diagrams of protein-DNA interactions
d. PROMOTIF - analyze protein secondary structural motifs
e. NACCESS - calculate atomic accessibilities of protein surfaces
f. SURFNET - visualization of molecular surfaces, cavities etc.
g. PROCHECK - check stereochemical quality of protein structures
h. THREADER - prediction of protein tertiary structure
i. MEMSAT - prediction of transmembrane protein structure
j-z. BROWSE THE WEB IN YOUR SPARE TIME AND BOOKMARK ‘EM!
‘Domestic’ resources: http://bioinfo.mbb.yale.edu/partslist/