Transcript UIMA - Mayo Clinic Informatics
UIMA SHARP 4 - NLP May 25, 2010
Outline • UIMA Terminology (not just TLAs) • Parts of a UIMA pipeline • Running a pipeline • Viewing annotations • Creating a new annotator
UIMA terminology • CAS • XCAS • JCAS • View • Analysis Engine (AE) / Annotator – Aggregate Analysis Engine • XML output: XCAS, XMI • Type System • JCasGen • CAS Visual Debugger (CVD) • CPE (Collection Processing Engine)
UIMA and Eclipse • UIMA plugin for Eclipse requires EMF (Eclipse Modeling Framework) • UIMA plugin provides visual editors for descriptors • An “Update site” exists for installing the plugin
UIMA Pipeline Flow • Collection Reader • (CAS Initializer - deprecated) • Analysis Engine (AE) / Annotator • CAS Consumer
Pipeline Example
UIMA term          | Example
Collection Reader  | Read files from a dir
Analysis Engine    | Sentence annotator
Analysis Engine    | Tokenizer annotator
CAS Consumer       | Output tokens to a DB
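The pipeline example above can be sketched in a few lines of plain Java. This is only an illustration of the flow and does not use the UIMA API: ToyPipeline, Doc, Span and Engine are hypothetical names standing in for the pipeline, the CAS, an annotation type and an Analysis Engine. An in-memory list of strings replaces the directory of files, and a returned list replaces the DB. The sentence and token rules are deliberately naive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToyPipeline {
    // Stand-in for an annotation type: a labeled span of the document text.
    static final class Span {
        final String type; final int begin; final int end;
        Span(String type, int begin, int end) { this.type = type; this.begin = begin; this.end = end; }
    }

    // Stand-in for the CAS: the text plus every annotation added so far.
    static final class Doc {
        final String text;
        final List<Span> spans = new ArrayList<>();
        Doc(String text) { this.text = text; }
        String covered(Span s) { return text.substring(s.begin, s.end); }
    }

    // Stand-in for an Analysis Engine: reads the Doc and adds annotations to it.
    interface Engine { void process(Doc doc); }

    // Naive sentence annotator: a sentence ends at '.', '!' or '?' (or end of text).
    static final Engine SENTENCES = doc -> {
        int start = -1;
        for (int i = 0; i < doc.text.length(); i++) {
            char c = doc.text.charAt(i);
            if (start < 0 && !Character.isWhitespace(c)) start = i;
            if ((c == '.' || c == '!' || c == '?') && start >= 0) {
                doc.spans.add(new Span("Sentence", start, i + 1));
                start = -1;
            }
        }
        if (start >= 0) doc.spans.add(new Span("Sentence", start, doc.text.length()));
    };

    // Naive tokenizer: splits each existing Sentence span on whitespace.
    static final Engine TOKENS = doc -> {
        for (Span s : new ArrayList<>(doc.spans)) {   // copy: we add spans while looping
            if (!s.type.equals("Sentence")) continue;
            int start = -1;
            for (int i = s.begin; i <= s.end; i++) {
                boolean boundary = i == s.end || Character.isWhitespace(doc.text.charAt(i));
                if (!boundary && start < 0) start = i;
                if (boundary && start >= 0) {
                    doc.spans.add(new Span("Token", start, i));
                    start = -1;
                }
            }
        }
    };

    // Reader -> engines in the given order -> consumer that collects token text.
    static List<String> run(List<String> inputs, List<Engine> engines) {
        List<String> sink = new ArrayList<>();         // stands in for the DB
        for (String text : inputs) {                   // "collection reader"
            Doc doc = new Doc(text);
            for (Engine e : engines) e.process(doc);   // "aggregate": fixed order
            for (Span s : doc.spans)                   // "CAS consumer"
                if (s.type.equals("Token")) sink.add(doc.covered(s));
        }
        return sink;
    }

    public static void main(String[] args) {
        List<String> tokens = run(
            Arrays.asList("Family history of hyperlipidemia."),
            Arrays.asList(SENTENCES, TOKENS));
        System.out.println(tokens);
    }
}
```

Because the tokenizer only looks inside Sentence spans, swapping the order of the two engines yields no tokens at all. That is the reason an aggregate has to fix the order of its engines.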
Options for running UIMA tools • Tools: – CPE Configurator – CVD • Options: – Command line scripts/.bat files – Run within Eclipse
Tying together a UIMA pipeline • Type System – Defines the data types passed along • CAS (Common Analysis Structure) – Container for the data
Tying together a UIMA pipeline • CPE descriptor – select the parts – Collection Reader – Analysis Engine(s) – CAS Consumer • Aggregate analysis engine – Multiple Analysis Engines and their order
Options for running a pipeline • CVD GUI – Single Aggregate Analysis Engine – No Collection Reader • CPE GUI • From Java: parse a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method (see “2.3. Running a CPE from Your Own Java Application” in the UIMA documentation)
Example: Running a pipeline • Running cTAKES within Eclipse using a CPE – Use run configuration UIMA_CPE_GUI--clinical_documents_pipeline – CPE test1.xml from clinical documents pipeline\desc\collection_processing_engine
Options for viewing annotations • CVD • Annotation viewer • XML viewer • Text editor
Example: Viewing annotations • Viewing annotations using the CVD – Load the Type System – Load the XCAS or XMI
Example: Running an AE in CVD • Using CVD to run an Analysis Engine – No Collection Reader – Single Analysis Engine (can be an aggregate) – No CAS Consumer – Just paste/type in text to process, e.g. “Family history of hyperlipidemia.”
Creating a New Annotator • Create Java project • Right click -> Add UIMA Nature • Add UIMA jars to .classpath (Build Path) • Create Analysis Engine (AE) descriptor • Add types to AE descriptor, or optionally create separate Type System descriptor • Write code!
Questions?
Supplemental slides follow
Example: Creating a PEAR file • Right click -> Add UIMA Nature • Right click -> Generate Pear • Select Analysis Engine descriptor • Select OS and JDK • Modify Properties if needed • Select what to include
Example: Modifying a parameter • UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.
Links • Getting started with UIMA http://uima.apache.org/doc-uima-annotator.html
• UIMA Update site for use in Eclipse http://www.apache.org/dist/incubator/uima/eclipse-update-site/