http://www.pdb.org/
Experimental approaches for
structural biology
• X-ray crystallography
• NMR
• cryoEM
Where to get structural data?
• biological molecules
– PDB – Protein Data Bank
http://www.pdb.org
free
– NDB – Nucleic Acid Database
http://ndbserver.rutgers.edu/
• organic molecules
– CSD – Cambridge Structural Database
paid
PDB History
1957
• Myoglobin structure determined
1970’s
• Discussions on how to establish an archive of protein structures
• PDB established at Brookhaven
– Oct 1971, 7 structures
1980’s
• Technology takes off
– molecular biology, instrumentation, computer hardware and software
• Number of structures increases
• Structural biology is able to focus on medical problems
• IUCr requires data deposition to the PDB
1990’s
• Complexity of structures increases
• Structural genomics begins
Current state of the PDB
• ~61,000 structures in the PDB archive
• Over 7,000 new structures deposited in 2008
• Depositions by macromolecule type
– 92% Protein
– 3.4% Nucleic acid
– 4% Protein-nucleic acid complexes
• Depositions by experimental technique:
– 86% X-ray diffraction
– 13.2% solution NMR
– 0.4% cryo-EM (256 structures)
data as of 9. 11. 2009
http://www.pdb.org/pdb/static.do?p=general_information/pdb_statistics/index.html
PDB ID
• Each structure in the PDB is represented
by a 4-character identifier of the form [0-9][a-z,0-9][a-z,0-9][a-z,0-9]
• 1B3T
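The identifier shape described above (one digit followed by three alphanumeric characters) can be checked with a regular expression. This is a minimal sketch; the function name is_pdb_id is made up for illustration, and case-insensitive matching is an assumption based on the uppercase example 1B3T.

```python
import re

# One digit followed by three letters or digits, as described on the slide.
# Case-insensitive matching is an assumption (the slide's example is uppercase).
PDB_ID_PATTERN = re.compile(r"[0-9][a-z0-9]{3}", re.IGNORECASE)


def is_pdb_id(text: str) -> bool:
    """Return True if text has the 4-character shape of a PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


print(is_pdb_id("1B3T"))   # example ID from the slide
print(is_pdb_id("B3T1"))   # starts with a letter
print(is_pdb_id("1B3T5"))  # five characters
```

The check only tests the shape of the string; it does not tell whether an entry with that ID actually exists in the archive.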
Data formats of PDB
• PDB format, mmCIF (and the derived XML
format, PDBML)
• Dictionary resources at:
http://mmcif.pdb.org/
• mmCIF is the PDB archival format
– all data released in all three formats
PDB Format
• legacy format
– http://www.wwpdb.org/docs.html
– Fortran-like, 80 columns wide
• not structured enough to describe
complicated 3D objects
• its limits have been broken several times
– 99,999 atoms, 34 (or 58) chains
• readable by most programs
• model – chain – residue – atom
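Because the format is column-based, a record is read by slicing fixed character positions rather than by splitting on whitespace. The sketch below reads a few fields of an ATOM/HETATM line using the column positions from the wwPDB format documentation. The names Atom and parse_atom_line are made up for illustration, and the sample line is invented, not taken from a real entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM record by its fixed column positions."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Invented record for illustration only.
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

atom = parse_atom_line(SAMPLE)
print(atom.chain_id, atom.res_name, atom.res_seq, atom.name)
```

The five-column serial field and the single-column chain field are where the atom and chain limits mentioned above come from.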
mmCIF language
based on community-agreed definitions
allows adding new features and
customization
mmCIF categories are easily
transformed to database tables
not designed to be read by humans,
data should be viewed through
programs and databases
http://ich.vscht.cz/~cechp/mmcif/
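To illustrate why mmCIF categories map easily onto database tables, here is a minimal sketch that turns one simple loop_ block into rows keyed by item name. It is not a full mmCIF parser: it assumes whitespace-separated values with no quoted or multi-line strings, and the function name parse_simple_loop and the sample values are made up for illustration.

```python
def parse_simple_loop(text: str) -> list[dict[str, str]]:
    """Turn one simple mmCIF loop_ block into a list of row dictionaries."""
    columns: list[str] = []
    rows: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_"):
            # "_category.item" names one column of the table
            columns.append(line.split(".", 1)[1])
        else:
            # Assumption: values contain no spaces or quotes
            rows.append(dict(zip(columns, line.split())))
    return rows


# Invented values; the item names follow the atom_site category.
EXAMPLE = """
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
1 N 27.340
2 CA 26.266
"""

rows = parse_simple_loop(EXAMPLE)
for row in rows:
    print(row)
```

Each category (here atom_site) corresponds to one table, each item to one column, and each data line to one row.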
Pubmed, MEDLINE, Entrez etc.
http://www.pubmed.gov
http://www.pubmed.org
NCBI
National Institutes of Health (NIH) – U.S. government
National Library of Medicine (NLM)
National Center for Biotechnology Information (NCBI)
NCBI (founded 1988, http://www.ncbi.nlm.nih.gov/)
• Genomic sequences - GenBank – open-access annotated
collection of all available nucleotide sequences, doubles every
18 months (October 2008 – 97 381 682 336 bp), new release
every 2 months, accession number (e.g. U49845) required upon
publication
• OMIM – Online Mendelian Inheritance in Man, db of diseases
together with their genetic components
• PubChem (http://pubchem.ncbi.nlm.nih.gov/) – db of small
organic molecules, includes information about their
bioactivities
• Entrez (http://www.ncbi.nlm.nih.gov/sites/gquery) – federated
search engine offering unified access to all NCBI databases
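The GenBank doubling time quoted above implies exponential growth, size(t) = size(0) * 2^(t / 18) with t in months. A rough sketch of that arithmetic, assuming the 18-month doubling time holds exactly (real growth will deviate); the function name projected_size is made up for illustration.

```python
def projected_size(base_pairs: float, months: float,
                   doubling_months: float = 18.0) -> float:
    """Project a size forward assuming a constant doubling time."""
    return base_pairs * 2 ** (months / doubling_months)


# Figure quoted on the slide for October 2008.
OCT_2008_BP = 97_381_682_336

# After one doubling period the size is twice the starting value.
print(projected_size(OCT_2008_BP, 18) / OCT_2008_BP)
# After two doubling periods it is four times the starting value.
print(projected_size(OCT_2008_BP, 36) / OCT_2008_BP)
```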
MEDLINE
• journal citations and abstracts for biomedical
literature
• since 1996 - free access to MEDLINE via
PubMed.
• PubMed - Web-based retrieval system
developed by the NCBI at the NLM. It is part of
NCBI's Entrez.
• PubMed contains
– abstracts
– links to full-text articles
– links to other databases
– …and much more
What’s in PubMed
• Most PubMed records are MEDLINE citations.
– citations and author abstracts from approx. 5 200
biomedical journals
– diverse topics: microbiology, delivery of health care,
nutrition, pharmacology and environmental health.
– currently over 19 million references dating back to
1948
– new material added Tuesday through Saturday
– about 90% of records are from English-language
sources or have English abstracts
– approximately 79% of the citations include
the published abstract
What’s in PubMed
• PubMed Central (PMC)
– http://www.pubmedcentral.nih.gov/
– db of free full texts
– since 2007, papers funded by the NIH must be freely
available through PMC no later than 12 months after
publication
• NCBI Bookshelf
– http://www.ncbi.nlm.nih.gov/sites/entrez?db=books
– free biomedical books (biochemistry, molecular
biology, …)
MeSH
• created in 1960 by the NLM
• "Medical Subject Headings."
– the authority list of the biomedical terms
– used for indexing journal articles for MEDLINE
• It imposes uniformity and consistency on the
indexing of biomedical literature.
• MeSH Tree.
• Citations are indexed manually.
• http://www.nlm.nih.gov/bsd/disted/video/index.html
• MeSH vocabulary is organized by 16 main branches:
1. Anatomy
2. Organisms
3. Diseases
4. Chemicals and Drugs
5. Analytical, Diagnostic and Therapeutic Techniques and
Equipment
6. Psychiatry and Psychology
7. Biological Sciences
8. Natural Sciences
9. Anthropology, Education, Sociology and Social Phenomena
10. Technology, Industry, Agriculture
11. Humanities
12. Information Science
13. Named Groups
14. Health Care
15. Publication Characteristics
16. Geographic Locations
Search Pubmed
• each citation has a unique PubMed ID (PMID),
www.pubmed.org/PMID
• Boolean operators
– must be UPPERCASE!
– AND is default
– parentheses: salmonella AND (hamburger OR
eggs)
• phrase searching
– “kidney failure”, kidney failure*, kidney
failure[tw]
• author names
– natural or inverted order (“julia wong”, “wong
julia”)
– searching last name only – use [au] tag (wheeler[au])
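The search rules above can be sketched as small string helpers that assemble a query. This only builds the query text and sends nothing to PubMed; all function names are made up for illustration.

```python
def phrase(text: str) -> str:
    """Wrap a multi-word term in double quotes for phrase searching."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """Join alternatives with uppercase OR and group them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join required terms with uppercase AND."""
    return " AND ".join(terms)


def last_name_only(name: str) -> str:
    """Restrict a bare last name to the author field with the [au] tag."""
    return f"{name}[au]"


query = all_of("salmonella", any_of("hamburger", "eggs"))
print(query)  # salmonella AND (hamburger OR eggs)
print(all_of(phrase("kidney failure"), last_name_only("wheeler")))
```

Writing AND explicitly is redundant, since it is the default, but it keeps the grouping visible when OR terms are mixed in.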
Search tags
• [ad] – affiliation of the first author
• [all] – all fields
• [au] – author
• [dp] – date of publication, yyyy/mm/dd; the month and day
are optional
• [ta] – journal title (abbreviated or full), see the Journals
database http://www.ncbi.nlm.nih.gov/journals
• [mh] – MeSH term
• [majr] – MeSH major topic
• [ti] – title
• [tiab] – title + abstract
• citation sensor
– choi blood 2008
• related articles
– sorted from most to least relevant
• All, Review, Free full text
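As a sketch of how the search tags combine, the helpers below attach a tag to a term and format a [dp] date with optional month and day. The function names are made up for illustration. The final query is one explicit reading of the words "choi blood 2008" (author, journal, year); that reading is an assumption and is not a description of how the citation sensor itself works.

```python
from typing import Optional


def tagged(term: str, tag: str) -> str:
    """Append a search field tag such as [au] or [ta] to a term."""
    return f"{term}[{tag}]"


def publication_date(year: int, month: Optional[int] = None,
                     day: Optional[int] = None) -> str:
    """Format a [dp] term as yyyy, yyyy/mm or yyyy/mm/dd."""
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
        # A day is only meaningful when the month is given
        if day is not None:
            parts.append(f"{day:02d}")
    return tagged("/".join(parts), "dp")


# Assumed reading: choi = author, blood = journal title, 2008 = year
explicit_query = " AND ".join([
    tagged("choi", "au"),
    tagged("blood", "ta"),
    publication_date(2008),
])
print(explicit_query)  # choi[au] AND blood[ta] AND 2008[dp]
```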