Alternative Tools for Mining the Biomedical Literature
Rolando Garcia-Milian
Biomedical & Health Information Services Department, Health Sciences Center Library
February 14, 2014
In this session
Introduction
Novel online tools for mining the literature
Unified Medical Language System
Quertle
NextBio
Semantic MEDLINE
Problem – Rapid Growth of Biomedical data
[Figure: GenBank Statistics. Source: http://www.ncbi.nlm.nih.gov/genbank/genbankstats-2008/]
[Figure: Samples Submitted to Gene Expression Omnibus Database, 2000 to 2012. Compiled from GEO historic data: http://www.ncbi.nlm.nih.gov/geo/summary/?type=history]
Problem – Growth of the Biomedical Literature
[Figure: Number of Records in PubMed Biomedical Literature, x-axis 1940 to 2020. Compiled from PubMed: http://www.ncbi.nlm.nih.gov/pubmed]
• Huge volume (PubMed: 23132342 citations)
• High diversity
• High quality (peer review)
• Users overwhelmed by long lists of search results
• 1/3 of PubMed queries resulted in 100 or more citations (Islamaj, 2009)
Problem – Querying the Biomedical Literature
Querying the biomedical literature becomes more difficult:
• Boolean operators
• Filters
• Medical Subject Headings
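As a rough illustration of what such a query involves, the sketch below assembles a PubMed search string from Boolean operators, a MeSH field tag, and a publication-type filter, then wraps it in an NCBI E-utilities esearch URL. The search terms and the helper name build_query are assumptions made for this example and are not part of the original slides; no request is actually sent.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(mesh_terms, free_text, publication_type=None):
    """Combine MeSH terms, free-text terms, and an optional filter with AND."""
    parts = [f'"{t}"[MeSH Terms]' for t in mesh_terms]
    parts += [f"{t}[Title/Abstract]" for t in free_text]
    if publication_type:
        parts.append(f"{publication_type}[Publication Type]")
    return " AND ".join(parts)


# Illustrative terms only (assumed for this example).
query = build_query(["Breast Neoplasms"], ["BRCA1"], publication_type="review")
url = ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmax": 20})
print(query)
print(url)
```

Even this small query needs three different field tags, which hints at why long Boolean searches are hard for many users to write and maintain.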
Alternative Tools for Mining the Biomedical Literature
Alternative tools for mining the biomedical literature combine: Statistical methods, Ontologies, Natural Language Processing tools, Visualization tools
Reduced time for discovering meaningful results.
Information Retrieval and Information Extraction
Information Retrieval retrieves documents/records
[Diagram: query "EGFR" returns records]
Modified from OpenHelix
Information Extraction extracts facts
• T14D inhibited EGF receptor internalization
• EGFR regulates tumor cell proliferation
• EGFR is expressed in SCCHN
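The difference between the two tasks can be sketched in a few lines of Python. This is a toy illustration, not how any of the tools in this session work: the corpus is just the three statements above, and the fixed list of relation phrases is an assumption standing in for real natural language processing.

```python
import re

# Toy corpus: the three example statements from the slide above.
SENTENCES = [
    "T14D inhibited EGF receptor internalization",
    "EGFR regulates tumor cell proliferation",
    "EGFR is expressed in SCCHN",
]

# Assumed, hand-picked relation phrases; real systems rely on NLP,
# not on a fixed list like this one.
RELATIONS = ["is expressed in", "regulates", "inhibited"]
PATTERN = re.compile(r"^(.+?)\s+(" + "|".join(RELATIONS) + r")\s+(.+)$")


def retrieve(term, sentences):
    """Information retrieval: return the records that mention the term."""
    return [s for s in sentences if term.lower() in s.lower()]


def extract(sentences):
    """Information extraction: return (subject, relation, object) facts."""
    facts = []
    for s in sentences:
        match = PATTERN.match(s)
        if match:
            facts.append(match.groups())
    return facts


print(retrieve("EGFR", SENTENCES))
print(extract(SENTENCES))
```

Note that the keyword search for "EGFR" misses the sentence that says "EGF receptor"; this is the kind of synonym gap that ontologies are meant to close.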
Text Processing
[Diagram: text is extracted from a paper]
Modified from OpenHelix
[Figure: the sentences of a paper (Sentence 1 to Sentence 6) are marked up word by word with ontology category tags: association, molecular function, phenotype, anatomy, etc. Example query: phenotype + anatomy. Sentence 1 and Sentence 4 appear in marked-up form.]
From Müller H-M, Kenny EE, Sternberg PW (2004)
The Process of Marking up a Sentence
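The idea behind the figure can be sketched as follows: each word is tagged with an ontology category, and a query asks for sentences that contain every requested category. The mini-lexicon and the example sentences are invented for illustration; this is not Textpresso's actual lexicon or implementation.

```python
import re

# Hypothetical mini-lexicon mapping words to ontology categories
# (invented for illustration; not Textpresso's actual lexicon).
LEXICON = {
    "binds": "association",
    "kinase": "molecular function",
    "sterile": "phenotype",
    "uncoordinated": "phenotype",
    "gonad": "anatomy",
    "pharynx": "anatomy",
}


def mark_up(sentence):
    """Return (word, category) pairs; category is None for untagged words."""
    words = re.findall(r"[A-Za-z0-9-]+", sentence)
    return [(w, LEXICON.get(w.lower())) for w in words]


def search(sentences, categories):
    """Return sentences containing at least one word from every category."""
    wanted = set(categories)
    hits = []
    for s in sentences:
        found = {cat for _, cat in mark_up(s) if cat}
        if wanted <= found:
            hits.append(s)
    return hits


# Made-up example sentences.
sentences = [
    "Mutant animals are sterile and show defects in the gonad.",
    "The protein binds a kinase.",
    "Uncoordinated movement was observed.",
]
print(mark_up(sentences[1]))
print(search(sentences, ["phenotype", "anatomy"]))
```

Only the first sentence satisfies the query phenotype + anatomy, because it is the only one with a tagged word in both categories.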
Unified Medical Language System (UMLS)
• Started in 1986 (National Library of Medicine)
• A set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems (e.g., doctor, pharmacy, billing, biomedical literature mining)
Biomedical terminologies:
• Anatomy (FMA)
• Drugs (RxNorm)
• Medical devices (UMD)
• Clinical terms (SNOMED CT)
• Information sciences (MeSH)
• Administrative terminologies (ICD-9-CM, CPT)
• Data exchange terminologies (HL7, LOINC)
From Fitzman, 2011 Presentation at Biomedical Informatics course, MBL Woods Hole
Unified Medical Language System - Integrating Terminologies
From Fitzman, 2011 Presentation at Biomedical Informatics course, MBL Woods Hole
Unified Medical Language System (UMLS) - Overview
[Diagram: Text → Lexical Look-up → Syntactic Analysis → MetaMap → SemRep → Semantic Proposition. The processing steps draw on the three UMLS knowledge sources: Specialist Lexicon, Metathesaurus, Semantic Network.]
From Fitzman, 2011 Presentation at Biomedical Informatics course, MBL Woods Hole
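The step of mapping text to Metathesaurus concepts can be pictured as a dictionary look-up that prefers the longest matching phrase. The sketch below is a toy stand-in and not MetaMap itself: the concept table holds only the four concepts and semantic-type abbreviations that appear on the next slide, and the function name map_concepts is hypothetical.

```python
# Toy concept table: phrase -> (preferred name, semantic type abbreviation).
# Only the four concepts from the slides; a stand-in for the Metathesaurus.
CONCEPTS = {
    "albuterol": ("Albuterol", "phsu"),
    "dyspnea": ("Dyspnea", "sosy"),
    "brca1 gene": ("BRCA1 gene", "gngm"),
    "breast carcinoma": ("Breast carcinoma", "dsyn"),
}
MAX_WORDS = max(len(phrase.split()) for phrase in CONCEPTS)


def map_concepts(text):
    """Greedy longest-match look-up of known phrases in the text."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    found, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in CONCEPTS:
                found.append(CONCEPTS[phrase])
                i += n
                break
        else:
            i += 1  # no known concept starts at this word
    return found


print(map_concepts("Albuterol relieves dyspnea."))
print(map_concepts("The BRCA1 gene is associated with breast carcinoma."))
```

The example sentences are made up; the point is only that free text is reduced to concepts with semantic types before any relation between them is proposed.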
Unified Medical Language System (UMLS) - Overview
• Pharmacologic Substance TREATS Sign or Symptom
  Albuterol (phsu) TREATS Dyspnea (sosy)
• Gene or Genome ASSOCIATED_WITH Disease or Syndrome
  BRCA1 gene (gngm) ASSOCIATED_WITH Breast carcinoma (dsyn)
From Fitzman, 2011 Presentation at Biomedical Informatics course, MBL Woods Hole
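One way to picture these semantic propositions is as typed triples that are checked against a table of permitted type patterns. The sketch below is a minimal illustration under stated assumptions: the ALLOWED table contains only the two patterns from the slide (the full Semantic Network defines many more), and the class name Predication and its methods are hypothetical.

```python
from dataclasses import dataclass

# Semantic type abbreviations and names as given on the slide.
SEMANTIC_TYPES = {
    "phsu": "Pharmacologic Substance",
    "sosy": "Sign or Symptom",
    "gngm": "Gene or Genome",
    "dsyn": "Disease or Syndrome",
}

# Only the two relation patterns shown on the slide; the full Semantic
# Network defines many more.
ALLOWED = {
    ("phsu", "TREATS", "sosy"),
    ("gngm", "ASSOCIATED_WITH", "dsyn"),
}


@dataclass(frozen=True)
class Predication:
    subject: str
    subject_type: str
    predicate: str
    obj: str
    obj_type: str

    def is_licensed(self):
        """True if the type pattern appears in the ALLOWED table."""
        return (self.subject_type, self.predicate, self.obj_type) in ALLOWED

    def describe(self):
        """Spell out the type-level pattern behind this predication."""
        return (f"{SEMANTIC_TYPES[self.subject_type]} {self.predicate} "
                f"{SEMANTIC_TYPES[self.obj_type]}")


p1 = Predication("Albuterol", "phsu", "TREATS", "Dyspnea", "sosy")
p2 = Predication("BRCA1 gene", "gngm", "TREATS", "Dyspnea", "sosy")
print(p1.describe(), p1.is_licensed())
print(p2.is_licensed())
```

The second predication is rejected because a gene treating a symptom is not among the permitted patterns, which shows how the type level can filter out implausible extractions.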
Novel Online Tools for Mining the Biomedical Literature
From Lu, 2011 http://www.ncbi.nlm.nih.gov/pubmed/21245076
Comparison of three different literature mining tools

Quertle
• Coverage: MEDLINE/PubMed; full-text publications from PubMed Central; NIH RePORTER database of grant applications; NLM TOXLINE database (biochemical, pharmacological, toxicological effects of drugs/chemicals); news (FierceMarkets Life Sciences and Health Care); scientific whitepapers and research posters submitted to Quertle
• Account: Not required
• Presentation of results: Highlighted concepts in sentences

Semantic MEDLINE
• Coverage: MEDLINE/PubMed
• Account: Required - use of UMLS license
• Presentation of results: Network of concepts

NextBio
• Coverage: MEDLINE/PubMed; full-text publications from PubMed Central; clinical trials from ClinicalTrials.gov; Elsevier full-text journal articles (23 million - available to NextBio Enterprise customers who subscribe to ScienceDirect); news sourced from publicly available biology and health-related news publications
• Account: Academic recognized email required
• Presentation of results: Tag cloud
References
Campillos M*, Kuhn M*, Gavin AC, Jensen LJ, Bork P (2008) Drug target identification using side effect similarity. Science 321(5886): 263-6. http://www.ncbi.nlm.nih.gov/pubmed/18621671
Islamaj Dogan R, Murray GC, Névéol A, Lu Z (2009) Understanding PubMed user search behavior. Database (Oxford). http://www.ncbi.nlm.nih.gov/pubmed/20157491
Jensen LJ, Saric J, Bork P (2006) Literature mining for the biologist: from information retrieval to biological discovery. Nature Reviews Genetics 7: 119-129. http://www.nature.com/nrg/journal/v7/n2/pdf/nrg1768.pdf
Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P (2010) A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol 6: 343. Epub 2010 Jan 19. http://sideeffects.embl.de/drugs/56338/
Lu Z (2011) PubMed and beyond: a survey of web tools for searching biomedical literature. Database (Oxford). http://www.ncbi.nlm.nih.gov/pubmed/21245076
Müller H-M, Kenny EE, Sternberg PW (2004) Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature. PLoS Biol 2(11): e309. doi:10.1371/journal.pbio.0020309 http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0020309
Rindflesch TC et al. (2011) Semantic MEDLINE: An advanced information management application for biomedicine. Information Services & Use 31: 15-21. http://lhncbc.nlm.nih.gov/system/files/pub-lhncbc-2011-109.pdf