NCBI - Alumni Medical Library


Beyond PubMed and BLAST:
Exploring NCBI tools and
databases
Kate Bronstad
David Flynn
Alumni Medical Library
Alumni Medical Library
• Location
− 12th Floor Instructional Bldg
− www.medlib.bu.edu
• Services
− Electronic resources: full text access through PubMed,
Google Scholar, Web of Science
− Reference: drop in or by reservation
− Instruction: request class sessions or
creation of web tutorial
− Learning resource center: lab space, hands-on
instruction
NCBI
• National Center for Biotechnology Information
• Built on Entrez System
• Original database was Nucleotide
• PubMed built upon this original structure.
• PubMed, GENE, other molecular databases
interconnected
• Gene discovery, related data options in PubMed
• MyNCBI works with multiple databases
GENE
• Gives sequence, expression, and information about protein
structure and function.
• Doesn't list all known and predicted genes
• Focuses on completely sequenced genomes or
ones where research communities are actively
contributing genetic information.
• Information from RefSeq and collaborating model
organism databases.
• Mix of curated and automatically updated
information.
• Pulls in data from, and links out to, resources outside of NCBI.
• 4.6 million records for 5,588 taxa
GENE Record
• Summary
− Official full name, gene type, lineage, summary, aliases (AKA)
• Genomic regions, transcripts: structure, exon-intron
boundaries.
− Gene table for fuller display.
• Bibliography: GeneRIF (Gene Reference Into Function).
− Summary of gene functions with specific references to
related PubMed articles about the function of the gene or
its proteins. Compiled by NCBI staff.
− Not comprehensive, but will give you the most relevant
papers regarding function.
− Authors can contact the NCBI to submit their citations
RefSeq
• Reference Sequences
− Nucleotide sequences and protein translation
− Curated by NCBI or NCBI-approved programs.
• Difference between GenBank and RefSeq
− GenBank has raw data and duplicated records
− Metadata in GenBank can be incomplete
− RefSeq is annotated, curated, and non-redundant.
− NCBI takes best sequences from GenBank and
curates for RefSeq records
RefSeq Record Numbers
Category             Accession    Record type
mRNAs and Proteins   NM_123456    Curated mRNA
                     NP_123456    Curated Protein
                     NR_123456    Curated non-coding RNA
                     XM_123456    Predicted mRNA
                     XP_123456    Predicted Protein
                     XR_123456    Predicted non-coding RNA
Gene Records         NG_123456    Reference Genomic Sequence
Chromosome           NC_123455    Microbial replicons, organelle genomes, human chromosomes
                     AC_123455    Alternate assemblies
Assemblies           NT_123456    Contig
                     NW_123456    WGS Supercontig
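The prefix conventions listed above lend themselves to a simple lookup. The following is a minimal sketch, assuming the prefix meanings exactly as given on this slide; the function name `describe_refseq_accession` is hypothetical and is not part of any NCBI tool.

```python
# Prefix meanings copied from the RefSeq Record Numbers slide above.
REFSEQ_PREFIXES = {
    "NM": "Curated mRNA",
    "NP": "Curated Protein",
    "NR": "Curated non-coding RNA",
    "XM": "Predicted mRNA",
    "XP": "Predicted Protein",
    "XR": "Predicted non-coding RNA",
    "NG": "Reference Genomic Sequence",
    "NC": "Microbial replicons, organelle genomes, human chromosomes",
    "AC": "Alternate assemblies",
    "NT": "Contig",
    "NW": "WGS Supercontig",
}


def describe_refseq_accession(accession: str) -> str:
    """Return the record type implied by a RefSeq-style accession prefix.

    Hypothetical helper for illustration only: it inspects the two letters
    before the underscore and does not contact NCBI or validate the number.
    """
    prefix, separator, _ = accession.strip().upper().partition("_")
    if not separator:
        # GenBank-style accessions have no underscore.
        return "Not a RefSeq-style accession"
    return REFSEQ_PREFIXES.get(prefix, "Unknown RefSeq prefix")


if __name__ == "__main__":
    for example in ("NM_123456", "XP_123456", "NW_123456"):
        print(example, "->", describe_refseq_accession(example))
```

A quick check of the prefix is often enough to tell whether a record was curated (N-prefixed) or predicted (X-prefixed) before opening it.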
OMIM
• Online Mendelian Inheritance in Man
• Previously in print, 10 volumes, updated every
2 years.
• Catalogs known human genes and genetic disorders.
• Gives referenced explanations of cloning,
allelic variations, inheritance, mapping,
molecular genetics
• Links to clinical and testing information
• OMIA (Online Mendelian Inheritance in
Animals) is a separate database covering
animals.
Databases for Evidence
• GEO Profiles: public microarray data
repository
- Archives and freely distributes microarray,
next-generation sequencing, and other high-throughput functional genomic data.
- Submitted by researchers. Offers data
storage, web-based interfaces and
applications to query and download content
• Evidence Viewer: Graphical display of evidence
supporting a gene model
Genome
• Sequence and map data from the whole genomes of
over 1000 organisms
- Represents organisms that are completely
sequenced and those that are in progress.
• Graphical overviews of complete
genomes/chromosomes
• Specialized genome BLAST search to see
alignments in context of genome
• Good for microbial genomes.
Homologene
• May want to use instead of BLAST if looking for a
model organism with the same function or if making an
evolutionary comparison.
• Allows downloads of genomic information.
- Can capture a regulatory region by including
bases upstream or downstream.
• Multiple and pairwise alignment
• Protein Alignment scores
- Substitution rates, synonymous vs. nonsynonymous,
conservative vs. radical
• Polymorphisms in GeneView dbSNP link
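To make the synonymous vs. nonsynonymous distinction concrete, the sketch below classifies codon differences between two aligned coding sequences. It is a minimal illustration and not the method HomoloGene itself uses: it assumes the standard genetic code, sequences that are already aligned in frame, and no gaps or ambiguous bases. It produces raw counts, not substitution rates. The function name `classify_codon_changes` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(seq_a: str, seq_b: str) -> dict:
    """Count identical, synonymous, and nonsynonymous codon pairs.

    Assumes both inputs are ungapped, in-frame DNA coding sequences of
    equal length (an illustrative simplification).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must have equal length divisible by 3")

    counts = {"identical": 0, "synonymous": 0, "nonsynonymous": 0}
    for start in range(0, len(seq_a), 3):
        codon_a = seq_a[start:start + 3].upper()
        codon_b = seq_b[start:start + 3].upper()
        if codon_a == codon_b:
            counts["identical"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            # Different codons, same amino acid: synonymous change.
            counts["synonymous"] += 1
        else:
            # Different amino acid (or stop): nonsynonymous change.
            counts["nonsynonymous"] += 1
    return counts


if __name__ == "__main__":
    # ATG GCT AAA (Met Ala Lys) vs. ATG GCC AGA (Met Ala Arg)
    print(classify_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```

In the example, GCT to GCC leaves alanine unchanged (synonymous), while AAA to AGA replaces lysine with arginine (nonsynonymous).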
Structure and Models
• Structure, MMDB (Molecular Modeling Database)
- Access from Protein link, Related Structure
• Cn3D: application for viewing structures from different angles
and highlighting sequence within the structure.
• VAST (Vector Alignment Search Tool) searches
by geometric criteria
BLink
• BLAST Link
- Pre-run BLAST results
- NCBI runs weekly searches for every new protein
sequence.
• Can use instead of running BLAST search
- More information than in default BLAST:
taxonomy report, view multiple alignments,
search data against different
Links to Outside Databases
• MGI
• Ensembl
• KEGG: Kyoto Encyclopedia of Genes and Genomes
- Integrated databases
- Pathway, disease, drug
- Good for quick pathway and protein graphics
• UCSC Genome Browser
- Visualize tracks to compare information like gene
predictions, ESTs, conserved regions.
- BLAT (BLAST-Like Alignment Tool): quicker but not
as sensitive as BLAST.
Gene Information from GO
• Gene expression information from Gene Ontology (GO)
- Lists what has been assigned to the gene in:
Molecular Function
Biological Processes
Cellular Component
• Level of evidence and references linked when available.
• Links into AmiGO browser for more ontology or evidence
information
• Can search GENE for GO information by placing the [GO] suffix at
the end of the search
Ex: “vasodilation [GO]”
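The same search can also be expressed as a request to NCBI's E-utilities ESearch service. The sketch below only builds the request URL and does not contact NCBI. It assumes the `[GO]` field tag works through E-utilities the same way it does in the GENE search box, as the slide describes; the function name `gene_go_search_url` and the default of 20 results are illustrative choices.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_go_search_url(go_term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that queries GENE for a Gene Ontology term.

    Hypothetical helper: the [GO] field tag is taken from the slide above.
    """
    params = {
        "db": "gene",               # search the GENE database
        "term": f"{go_term}[GO]",   # restrict the term to the GO field
        "retmax": retmax,           # maximum number of IDs to return
        "retmode": "json",          # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(gene_go_search_url("vasodilation"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a list of matching GENE record IDs, subject to NCBI's usage guidelines for E-utilities.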
BU Resources
• Biostatistics
- Dr. Mayetri Gupta: created statistical software for
discovering transcription factor binding sites (motifs)
and regulatory modules, gene regulatory networks,
and phylogenetic inference.
- Dr. Paola Sebastiani: created software for network
modeling called Bayesware Discoverer, also CAGED,
BAGED for analysis of gene expression data.
Library Support
• Contact the library with any suggestions or
recommendations that we can list or promote for the BU
community
• Software and datasets can be archived in BU’s Digital
Common
• If there are resources we don’t have, we may be able
to procure them for you.
• Hands-on BLAST workshop offered.