Mayo Clinical Text Analysis

Open Health Natural Language Processing Consortium
• www.ohnlp.org (part of caBIG Vocabulary Knowledge
Center web presence)
• Goal
• foster an open-source collaborative community around
clinical NLP that can deliver best-of-breed annotators,
leverage the dynamic features of UIMA flow-control, and
establish the infrastructure for clinical NLP.
• Two open source releases as part of OHNLP
• Mayo’s pipeline for processing clinical notes (cTAKES)
• IBM’s pipeline for processing medical notes (MedKAT)
and pathology reports (MedKAT/P)
cTAKES Technical Details
• Open source release March 15, 2009
• www.ohnlp.org
• Downloads: Documentation and Downloads
• Technical details: Publications
• Framework
• IBM’s Unstructured Information Management
Architecture (UIMA) open source framework
• Methods
• Natural Language Processing methods (NLP)
• Application
• High-throughput phenotype extraction system
(80M+ notes; 80B+ tokens)
cTAKES Components
• Core components
• Sentence boundary detection (OpenNLP)
• Tokenization (rule-based)
• Morphologic normalization (NLM’s “norm”)
• POS tagging (OpenNLP)
• Shallow parsing (OpenNLP)
• Named Entity Recognition
• Diseases/disorders, signs/symptoms, procedures,
anatomical sites, medications
• Dictionary mapping (lookup algorithm)
• Machine learning (MAWUI)
• Negation and status identification (NegEx)
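
The component list above can be illustrated with a toy pipeline. This is a minimal sketch and not cTAKES code: it does not use UIMA, OpenNLP, or the real NegEx trigger list. The dictionary entries, the trigger phrases, the five-token window, and all function names are assumptions chosen for illustration. It shows the general order of the steps only: split sentences, tokenize by rule, map token spans to dictionary terms, then flag a term as negated when a trigger phrase precedes it.

```python
import re

# Hypothetical toy dictionary: surface term -> semantic category.
DICTIONARY = {
    "chest pain": "sign/symptom",
    "pneumonia": "disease/disorder",
    "aspirin": "medication",
}

# A few NegEx-style trigger phrases (illustrative subset, not the NegEx list).
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")

WINDOW = 5  # assumed number of tokens to scan before a term for a trigger


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Rule-based tokenizer: lowercase alphanumeric runs, punctuation dropped."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def lookup(tokens, max_len=3):
    """Longest-match dictionary lookup; yields (start, end, term, category)."""
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = " ".join(tokens[i:i + n])
            if term in DICTIONARY:
                hits.append((i, i + n, term, DICTIONARY[term]))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return hits


def is_negated(tokens, start):
    """True if a trigger phrase occurs within WINDOW tokens before start."""
    context = " " + " ".join(tokens[max(0, start - WINDOW):start]) + " "
    return any(f" {trigger} " in context for trigger in NEGATION_TRIGGERS)


def annotate(text):
    """Run the toy pipeline and return one record per dictionary hit."""
    results = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for start, _end, term, category in lookup(tokens):
            results.append({
                "term": term,
                "category": category,
                "negated": is_negated(tokens, start),
            })
    return results
```

Calling annotate("Patient denies chest pain. She was started on aspirin for pneumonia.") returns three records, with "chest pain" marked negated and the other two not. Negation is checked per sentence, so a trigger in one sentence cannot affect a term in the next.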
cTAKES Type System
cTAKES example
Current Efforts - I
• Anaphoric relations and coreference (as part of the
Ontology Development and Information Extraction
project, University of Pittsburgh) (2008 - 2011)
• In collaboration with Chapman and Crowley
• Semantic processing of the clinical text (in
collaboration with Palmer, Martin and Ward,
University of Colorado) (2009 - 2011)
• Treebanking (deep parses)
• Predicate-argument structure and semantic labeling
(PropBanking)
• UMLS relations (except temporal relations)
Current Efforts - II
• Temporal relation discovery (2010-2014)
• In collaboration with Palmer, Martin and Ward,
University of Colorado
• Lexical resources for the clinical domain (2010-2015)
• In collaboration with Chapman, University of
Colorado and Elhadad, Columbia University
• A la Treebank and clinical named entities with
attributes and modifiers