Alternative Tools for Mining the Biomedical Literature

Rolando Garcia-Milian

Biomedical & Health Information Services Department, Health Sciences Center Library

February 14, 2014

In this session

Introduction

Novel online tools for mining the literature

Unified Medical Language System

Quertle

NextBio

Semantic MEDLINE

Problem – Rapid Growth of Biomedical data

GenBank Statistics: http://www.ncbi.nlm.nih.gov/genbank/genbankstats-2008/

[Chart: Samples Submitted to Gene Expression Omnibus Database, by year, 2000 to 2012. Compiled from GEO historic data: http://www.ncbi.nlm.nih.gov/geo/summary/?type=history]

Problem – Growth of the Biomedical Literature

[Chart: Number of Records in PubMed Biomedical Literature, by year, 1940 to 2020. Compiled from PubMed: http://www.ncbi.nlm.nih.gov/pubmed]

• Huge volume (PubMed: 23,132,342 citations)
• High diversity
• High quality (peer review)
• Users overwhelmed by long lists of search results
• 1/3 of PubMed queries resulted in 100 or more citations (Islamaj, 2009)

Problem – Querying the Biomedical Literature

Querying the biomedical literature becomes more difficult:

• Boolean operators
• Filters
• Medical Subject Headings
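To make the three query ingredients above concrete, here is a minimal sketch of how a PubMed search string might be assembled from Medical Subject Headings, free-text terms, Boolean operators, and a filter. The helper names `build_query` and `esearch_url` and the chosen search terms are illustrative assumptions, not part of the original slides. The sketch only builds the request URL for the NCBI E-utilities `esearch` endpoint and does not contact the server.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(mesh_terms, text_terms, filters=()):
    """Combine MeSH headings, free-text terms, and filters with Boolean operators.

    Hypothetical helper: synonyms are joined with OR, groups with AND.
    """
    mesh = " OR ".join(f'"{t}"[MeSH Terms]' for t in mesh_terms)
    text = " OR ".join(f'"{t}"[Title/Abstract]' for t in text_terms)
    parts = [f"({mesh})", f"({text})"]
    parts.extend(filters)
    return " AND ".join(parts)


def esearch_url(term, retmax=20):
    """Return the esearch URL for a PubMed query (no network call is made)."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


# Example terms chosen for illustration only.
query = build_query(
    mesh_terms=["Asthma"],
    text_terms=["albuterol", "salbutamol"],
    filters=["english[Language]"],
)
url = esearch_url(query)
print(query)
print(url)
```

Even this small example shows why such queries get harder to write and maintain as the number of synonyms, headings, and filters grows.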

Alternative Tools for Mining the Biomedical Literature

Alternative tools for mining the biomedical literature combine statistical methods, ontologies, natural language processing tools, and visualization tools.

Reduced time for discovering meaningful results.

Information Retrieval and Information Extraction

Information Retrieval retrieves documents/records (e.g. a query for EGFR returns records).

Modified from OpenHelix

Information Extraction extracts facts:

• T14D inhibited EGF receptor internalization
• EGFR regulates tumor cell proliferation
• EGFR is expressed in SCCHN
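The difference between retrieval and extraction can be sketched in a few lines. This is a toy illustration under stated assumptions: the `DOCS` collection, the function names, and the small list of relation phrases are invented for the example, and real extraction systems rely on much richer language processing than a single regular expression.

```python
import re

# Toy document collection (illustrative; sentences reuse the examples above).
DOCS = {
    "doc1": "T14D inhibited EGF receptor internalization.",
    "doc2": "EGFR regulates tumor cell proliferation. EGFR is expressed in SCCHN.",
    "doc3": "BRCA1 is associated with breast carcinoma.",
}

# Hypothetical pattern: subject, one of a few known relation phrases, object.
FACT = re.compile(
    r"^(?P<subject>.+?)\s+"
    r"(?P<relation>inhibited|regulates|is expressed in)\s+"
    r"(?P<object>.+?)\.?$"
)


def retrieve(term):
    """Information retrieval: return the ids of documents that mention the term."""
    return sorted(d for d, text in DOCS.items() if term.lower() in text.lower())


def extract_facts(text):
    """Information extraction: return (subject, relation, object) triples."""
    facts = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        match = FACT.match(sentence)
        if match:
            facts.append((match["subject"], match["relation"], match["object"]))
    return facts


hits = retrieve("EGFR")
facts = [fact for doc_id in hits for fact in extract_facts(DOCS[doc_id])]
print(hits)
print(facts)
```

Retrieval hands back whole records for the reader to go through, while extraction returns the statements themselves.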

Modified from OpenHelix

Text Processing

[Figure: a paper is split into sentences (Sentence 1 to Sentence 6), and the words in each sentence are marked up with ontology category tags (association, molecular function, phenotype, anatomy, etc.). A query combining categories, here phenotype + anatomy, extracts the matching sentences (Sentence 1 and Sentence 4).]

From Müller H-M, Kenny EE, Sternberg PW (2004)

The Process of Marking up a Sentence
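The markup-and-query idea can be sketched as follows. This is not the Textpresso implementation: the `LEXICON`, the example sentences, and the function names are all made up for illustration, and a real system uses far larger ontologies and more careful tokenization.

```python
import re

# Toy lexicon mapping words to ontology categories (illustrative only).
LEXICON = {
    "uncoordinated": "phenotype",
    "paralyzed": "phenotype",
    "pharynx": "anatomy",
    "neuron": "anatomy",
    "binds": "association",
}

# Invented sentences standing in for the text of one paper.
SENTENCES = [
    "The mutant shows an uncoordinated phenotype in the pharynx.",
    "The protein binds DNA.",
    "Expression was observed in the neuron.",
    "Paralyzed animals have a defective neuron.",
]


def tag(sentence):
    """Mark up a sentence: pair each word with its category, or None."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence.lower())
    return [(token, LEXICON.get(token)) for token in tokens]


def categories(sentence):
    """Return the set of ontology categories found in a sentence."""
    return {category for _, category in tag(sentence) if category}


def search(sentences, required):
    """Return the sentences that contain every required category."""
    return [s for s in sentences if set(required) <= categories(s)]


matches = search(SENTENCES, ["phenotype", "anatomy"])
for sentence in matches:
    print(sentence)
```

Querying by category rather than by keyword means a sentence can match without the user having to list every phenotype or anatomy term in advance.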

Unified Medical Language System (UMLS)

Started in 1986 by the National Library of Medicine.

A set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems (e.g. doctor, pharmacy, billing, biomedical literature mining).

Biomedical terminologies:

• Anatomy (FMA)
• Drugs (RxNorm)
• Medical devices (UMD)
• Clinical terms (SNOMED CT)
• Information sciences (MeSH)
• Administrative terminologies (ICD-9-CM, CPT)
• Data exchange terminologies (HL7, LOINC)

From Fitzman, 2011 Presentation at Biomedical Informatics course, MBL Woods Hole

Unified Medical Language System - Integrating Terminologies

From Fitzman, 2011 Presentation at Biomedical Informatics course, MBL Woods Hole
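One way to picture terminology integration is a table in which terms from different source vocabularies are grouped under a shared concept identifier, so that a code in one vocabulary can be translated into another. The sketch below assumes that general design. Every identifier and code in it (`CONCEPT-1`, `msh-001`, and so on) is a made-up placeholder, not a real UMLS, MeSH, SNOMED CT, ICD-9-CM, or RxNorm value, and the grouping of terms is illustrative.

```python
from collections import defaultdict

# (concept id, source vocabulary, source code, term). All ids and codes are
# hypothetical placeholders, not real identifiers from any vocabulary.
ATOMS = [
    ("CONCEPT-1", "MeSH", "msh-001", "Dyspnea"),
    ("CONCEPT-1", "SNOMED CT", "sct-001", "Breathlessness"),
    ("CONCEPT-1", "ICD-9-CM", "icd-001", "Shortness of breath"),
    ("CONCEPT-2", "RxNorm", "rx-001", "Albuterol"),
    ("CONCEPT-2", "MeSH", "msh-002", "Salbutamol"),
]

# Index the table three ways: by term, by source code, and by concept.
by_term = {}
by_code = {}
by_concept = defaultdict(list)
for concept, source, code, term in ATOMS:
    by_term[term.lower()] = concept
    by_code[(source, code)] = concept
    by_concept[concept].append((source, code))


def concept_for_term(term):
    """Look up the shared concept for a term, ignoring case."""
    return by_term.get(term.lower())


def translate(source, code, target):
    """Map a code in one vocabulary to the equivalent codes in another."""
    concept = by_code.get((source, code))
    return [c for s, c in by_concept.get(concept, []) if s == target]


print(concept_for_term("breathlessness"))
print(translate("MeSH", "msh-001", "SNOMED CT"))
```

Because every lookup goes through the shared concept, adding a new vocabulary only requires adding rows, not new pairwise mappings.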

Unified Medical Language System (UMLS) - Overview

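[Diagram: Text is processed through Lexical Look-up, Syntactic Analysis, MetaMap, and SemRep to produce a Semantic Proposition. These steps draw on the UMLS knowledge sources: Specialist Lexicon, Metathesaurus, and Semantic Network.]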

From Fitzman, 2011 Presentation at Biomedical Informatics course, MBL Woods Hole

Unified Medical Language System (UMLS) - Overview

Pharmacologic Substance TREATS Sign or Symptom
Example: Albuterol (phsu) TREATS Dyspnea (sosy)

Gene or Genome ASSOCIATED_WITH Disease or Syndrome
Example: BRCA1 gene (gngm) ASSOCIATED_WITH Breast carcinoma (dsyn)

From Fitzman, 2011 Presentation at Biomedical Informatics course, MBL Woods Hole
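The two examples above share one shape: a subject and an object, each labeled with a semantic type abbreviation, joined by a predicate. A minimal sketch of how such propositions might be parsed and checked against permitted type patterns follows. The `Predication` class, the parser, and the two-entry `ALLOWED` table are assumptions made for illustration, covering only the patterns shown on the slide, and do not reproduce SemRep or the full Semantic Network.

```python
import re
from dataclasses import dataclass

# Semantic type abbreviations used in the slide examples.
SEMTYPE_NAMES = {
    "phsu": "Pharmacologic Substance",
    "sosy": "Sign or Symptom",
    "gngm": "Gene or Genome",
    "dsyn": "Disease or Syndrome",
}

# Type patterns taken from the slide; a real network defines many more.
ALLOWED = {
    ("phsu", "TREATS", "sosy"),
    ("gngm", "ASSOCIATED_WITH", "dsyn"),
}

# Matches strings like "Albuterol (phsu) TREATS Dyspnea (sosy)".
NOTATION = re.compile(r"^(.+?) \((\w+)\) ([A-Z_]+) (.+?) \((\w+)\)$")


@dataclass(frozen=True)
class Predication:
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str

    def pattern(self):
        """Return the type-level pattern, e.g. 'Gene or Genome ASSOCIATED_WITH ...'."""
        return (
            f"{SEMTYPE_NAMES[self.subject_type]} {self.predicate} "
            f"{SEMTYPE_NAMES[self.object_type]}"
        )

    def is_licensed(self):
        """Check the type pattern against the allowed relations."""
        return (self.subject_type, self.predicate, self.object_type) in ALLOWED


def parse(text):
    """Parse the slide notation into a Predication, or return None."""
    match = NOTATION.match(text.strip())
    return Predication(*match.groups()) if match else None


p = parse("BRCA1 gene (gngm) ASSOCIATED_WITH Breast carcinoma (dsyn)")
print(p.pattern(), p.is_licensed())
```

Storing propositions in this structured form is what allows them to be counted, filtered, and drawn as a network of concepts.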

Novel Online Tools for Mining the Biomedical Literature

From Lu, 2011 http://www.ncbi.nlm.nih.gov/pubmed/21245076

Comparison of three different literature mining tools

Quertle
• Coverage: MEDLINE/PubMed; full-text publications from PubMed Central; NIH RePORTER database of grant applications; NLM TOXLINE database (biochemical, pharmacological, toxicological effects of drugs/chemicals); news (FierceMarkets Life Sciences and Health Care); scientific whitepapers and research posters submitted to Quertle
• Account: Not required
• Presentation of results: Highlighted concepts in sentences

Semantic MEDLINE
• Coverage: MEDLINE/PubMed
• Account: Required - use of UMLS license
• Presentation of results: Network of concepts

NextBio
• Coverage: MEDLINE/PubMed; full-text publications from PubMed Central; clinical trials from ClinicalTrials.gov; Elsevier full-text journal articles (23 million - available to NextBio Enterprise customers who subscribe to ScienceDirect); news sourced from publicly available biology and health-related news publications
• Account: Academic recognized email required
• Presentation of results: Tag cloud

References

Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P. Drug target identification using side effect similarity. Science. 2008 Jul 11;321(5886):263-6. http://www.ncbi.nlm.nih.gov/pubmed/18621671

Islamaj Dogan R, Murray GC, Névéol A, Lu Z. (2009) Understanding PubMed user search behavior. Database (Oxford) http://www.ncbi.nlm.nih.gov/pubmed/20157491

Jensen LJ, Saric J, Bork P (2006) Literature mining for the biologist: from information retrieval to biological discovery. Nature Reviews Genetics 7: 119-129. Retrieved from http://www.nature.com/nrg/journal/v7/n2/pdf/nrg1768.pdf

Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol. 2010;6:343. Epub 2010 Jan 19. http://sideeffects.embl.de/drugs/56338/

Lu Z (2011) PubMed and beyond: a survey of web tools for searching biomedical literature. Database (Oxford) http://www.ncbi.nlm.nih.gov/pubmed/21245076

Müller H-M, Kenny EE, Sternberg PW (2004) Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature. PLoS Biol 2(11): e309. doi:10.1371/journal.pbio.0020309 http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0020309

Rindflesch, T.C. et al. (2011) Semantic MEDLINE: An advanced information management application for biomedicine. Information Services & Use, 31, 15-21. http://lhncbc.nlm.nih.gov/system/files/pub-lhncbc-2011-109.pdf