UIMA Introduction James Masanz
UIMA Introduction
SHARPn Summit
June 11, 2012
Outline
UIMA Terminology (not just TLAs)
Parts of a UIMA pipeline
Running a pipeline
Viewing annotations interactively
UIMA Terminology
CAS
JCas
View
Analysis Engine (AE) / Annotator
Type System
JCasGen
CAS Visual Debugger (CVD)
CPE (Collection Processing Engine)
XML output:
– XCAS
– XMI
UIMA
Framework
– Defining data types
– Passing data from one component to another
Tooling
– Viewing results
– Debugging
– Editing XML visually
Data Through a Pipeline
Type System
– Defines the data types passed along
CAS (Common Analysis Structure)
– Container for the data passed along
– Created by UIMA from the Type System
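The relationship between a Type System and a CAS can be sketched with a toy model in plain Java. This is only an illustration of the idea, not the UIMA API: the class names ToyTypeSystem, ToyAnnotation, and ToyCas are hypothetical, and the real CAS is considerably richer (feature structures, indexes, views).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a type system: just a set of allowed type names.
class ToyTypeSystem {
    private final Set<String> typeNames;

    ToyTypeSystem(String... names) {
        this.typeNames = new HashSet<>(Arrays.asList(names));
    }

    boolean declares(String typeName) {
        return typeNames.contains(typeName);
    }
}

// One annotation: a typed span [begin, end) over the document text.
class ToyAnnotation {
    final String type;
    final int begin;
    final int end;

    ToyAnnotation(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical stand-in for a CAS: document text plus annotations,
// created against a type system that limits what may be stored.
class ToyCas {
    private final ToyTypeSystem typeSystem;
    private final String text;
    private final List<ToyAnnotation> annotations = new ArrayList<>();

    ToyCas(ToyTypeSystem typeSystem, String text) {
        this.typeSystem = typeSystem;
        this.text = text;
    }

    // Only types declared in the type system are accepted.
    void add(String type, int begin, int end) {
        if (!typeSystem.declares(type)) {
            throw new IllegalArgumentException("Undeclared type: " + type);
        }
        if (begin < 0 || end > text.length() || begin > end) {
            throw new IllegalArgumentException("Span out of range");
        }
        annotations.add(new ToyAnnotation(type, begin, end));
    }

    // Text covered by every annotation of the given type, in insertion order.
    List<String> coveredText(String type) {
        List<String> out = new ArrayList<>();
        for (ToyAnnotation a : annotations) {
            if (a.type.equals(type)) {
                out.add(text.substring(a.begin, a.end));
            }
        }
        return out;
    }
}

class ToyCasDemo {
    public static void main(String[] args) {
        ToyTypeSystem ts = new ToyTypeSystem("Sentence", "Token");
        ToyCas cas = new ToyCas(ts, "Family history of hyperlipidemia.");
        cas.add("Token", 0, 6);
        cas.add("Token", 7, 14);
        System.out.println(cas.coveredText("Token"));
    }
}
```

In this sketch, adding an annotation whose type was never declared fails immediately, which mirrors the role of the Type System as the contract for the data passed between components.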
Parts of a UIMA Pipeline
Collection Reader
– Read input document
Analysis Engine(s) / Annotator(s)
– Process document
CAS Consumer
– Output data
Tying a Pipeline Together
CPE descriptor (Collection Processing Engine)
– Collection Reader
– Analysis Engine(s)
– CAS Consumer
Aggregate analysis engine
– Multiple Analysis Engines and their order
Pipeline Example
UIMA term            Example
Collection Reader    Read files from a dir
Analysis Engine      Sentence detector
Analysis Engine      Tokenizer annotator
Analysis Engine      Part of Speech tagger
CAS Consumer         Output tokens to DB
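The flow in this example can be sketched as a toy pipeline in plain Java. It is a simplified model under stated assumptions, not the UIMA API: the names MiniCas, Span, Engine, Aggregate, and MiniPipeline are hypothetical, the reader is an in-memory list instead of a directory, the consumer collects tokens into a list instead of a database, the part-of-speech step is left out, and the sentence and token rules are deliberately naive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A typed span [begin, end) over the document text.
class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical CAS-like container handed from one component to the next.
class MiniCas {
    final String text;
    final List<Span> spans = new ArrayList<>();

    MiniCas(String text) {
        this.text = text;
    }

    // Returns a copy, so callers may add spans while iterating the result.
    List<Span> select(String type) {
        List<Span> out = new ArrayList<>();
        for (Span s : spans) {
            if (s.type.equals(type)) out.add(s);
        }
        return out;
    }

    String covered(Span s) {
        return text.substring(s.begin, s.end);
    }
}

// Every analysis step reads the container and adds annotations to it.
interface Engine {
    void process(MiniCas cas);
}

// Naive rule: a sentence ends at '.', '!' or '?'.
class SentenceDetector implements Engine {
    public void process(MiniCas cas) {
        int start = 0;
        for (int i = 0; i < cas.text.length(); i++) {
            char c = cas.text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                addTrimmed(cas, start, i + 1);
                start = i + 1;
            }
        }
        addTrimmed(cas, start, cas.text.length()); // trailing text, if any
    }

    private static void addTrimmed(MiniCas cas, int begin, int end) {
        while (begin < end && Character.isWhitespace(cas.text.charAt(begin))) begin++;
        while (end > begin && Character.isWhitespace(cas.text.charAt(end - 1))) end--;
        if (begin < end) cas.spans.add(new Span("Sentence", begin, end));
    }
}

// Splits each Sentence on whitespace, so it depends on the sentence detector.
class Tokenizer implements Engine {
    public void process(MiniCas cas) {
        for (Span s : cas.select("Sentence")) {
            int i = s.begin;
            while (i < s.end) {
                while (i < s.end && Character.isWhitespace(cas.text.charAt(i))) i++;
                int start = i;
                while (i < s.end && !Character.isWhitespace(cas.text.charAt(i))) i++;
                if (start < i) cas.spans.add(new Span("Token", start, i));
            }
        }
    }
}

// An aggregate is itself an engine: it runs its delegates in a fixed order.
class Aggregate implements Engine {
    private final List<Engine> delegates;

    Aggregate(Engine... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    public void process(MiniCas cas) {
        for (Engine e : delegates) e.process(cas);
    }
}

class MiniPipeline {
    static List<String> run(List<String> documents, Engine engine) {
        List<String> sink = new ArrayList<>();
        for (String doc : documents) {             // Collection Reader role
            MiniCas cas = new MiniCas(doc);
            engine.process(cas);                   // Analysis Engine role
            for (Span t : cas.select("Token")) {   // CAS Consumer role
                sink.add(cas.covered(t));
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        Engine aggregate = new Aggregate(new SentenceDetector(), new Tokenizer());
        List<String> docs = Arrays.asList("Family history of hyperlipidemia. No chest pain.");
        System.out.println(run(docs, aggregate));
    }
}
```

The order of the delegates matters in this sketch: if the tokenizer runs before the sentence detector, it finds no sentences and produces no tokens, which is why an aggregate records both its engines and their order.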
UIMA plugin for Eclipse
Provides visual editors for descriptors
– Mini GUI for selecting options
– Rather than editing XML directly
An “Update site” exists for installing the plugin
http://www.apache.org/dist/incubator/uima/eclipse-update-site
UIMA Tooling Options
Tools:
– CPE Configurator
– CVD (CAS Visual Debugger)
Options:
– Command line scripts/.bat files
– Run within Eclipse
Running a Pipeline - CPE
cTAKES provides a script and a .bat file: runctakesCPE
Choose a CPE descriptor, such as test_plaintext.xml, from
cTAKESdesc/cdpdesc/collection_processing_engine
Viewing Annotations - CVD
Viewing annotations using the CVD
– Load the Type System
– Load the XCAS or XMI
Annotation Viewers
UIMA tools
– CVD (CAS Visual Debugger)
– Annotation viewer
Viewing XML output
– Any XML viewer
– Any text editor
Questions?
http://uima.apache.org/
Supplemental slides follow
Options to Run a Pipeline
CPE GUI
CVD GUI
– Single Aggregate Analysis Engine
– No Collection Reader
Programmatically: parse a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method
uimaFIT – removes dependency on XML
Creating a New Annotator
Within Eclipse
– Create Java project
– Right click -> Add UIMA Nature
– Add UIMA jars to .classpath (Build Path)
– Create Analysis Engine (AE) descriptor
– Add types to AE descriptor, or optionally create separate Type System descriptor
– Write code!
Running an AE in CVD
Using CVD to run an Analysis Engine
– No Collection Reader
– Single Analysis Engine (can be an aggregate)
– No CAS Consumer
– Load an Analysis Engine
– Paste/type in text to process, for example: Family history of hyperlipidemia.
Modifying a parameter
UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.
Links
Getting started with UIMA
http://uima.apache.org/doc-uima-annotator.html
UIMA Update site for use in Eclipse
http://www.apache.org/dist/incubator/uima/eclipse-update-site