Transcript Slides

Extracting BI-RADS Features
from Portuguese Clinical Texts
H. Nassif, F. Cunha, I.C. Moreira, R. CruzCorreia, E. Sousa, D. Page, E. Burnside,
and I. Dutra
University of Wisconsin – Madison,
and University of Porto, Portugal
The American Cancer Society, Cancer Facts & Figures 2009.
Mammogram
Radiologist
Impression
(free text)
Structured
Database
Benign
Predictive
Model
Malignant
BI-RADS Lexicon
Concepts
Example
• In the right breast, an approximately 1.0 cm
mass is identified in the right upper slightly
inner breast. This mass is noncalcified and
partially obscured and lobulated in
appearance.
Concepts
Lobular Shape Oval Shape Obscured Margin …
Report 1
0
1
0
…
Report 2
…
1
…
0
…
1
…
…
…
Nassif 09
Syntax Analyzer
•
•
•
•
Tokenize sentences
Discard punctuation
Keep stop words
Stem words
Nassif 09
Information from Lexicon
• Translate lexicon into Portuguese
• Lexicon specifies synonyms:
Eg: Equal density, Isodense
• Lexicon allows for ambiguous wording:
Text
indistinct margin
indistinct calcification
indistinct image
Concept
indistinct margin
amorphous calcification
not a concept
Nassif 09
Experts
• Provide domain specific information
– Synonyms: Oval, Ovoid
– Acronyms, abbreviations
– Domain idiosyncrasies
• Interact with and modify semantic rules
Nassif 09
Concept Finder
• Regular expression rules
• Extract concepts from text
• Rule formation:
– Initial rules based on lexicon
– Rules refined by experts
Rule Generation Example 1
•
•
•
•
•
Aim: Regional Distribution Concept
Lexicon specifies the word “regional”
Initial rule: presence of the word “regional”
Run on training set, experts see results
Many false positives:
– “regional medical center”, “regional hospital”
• Rule refined by experts:
– “regional .* !(medical|hospital)”
Rule Generation Example 2
• Aim: Skin Thickening Concept
• Lexicon specifies “skin thickening”
• Try “skin” and “thickening” in same sentence
– “skin retraction and thickening”
– “thickening of the overlying skin”
– “A BB placed on the skin overlying a palpable focal
area of thickening in the upper outer right breast”
• Experts suggest “skin” and “thickening” in
close proximity
Scope
• Scope: distance between two words
• Start with a large scope:
– assess number of true and false positives
• Move to smaller scopes:
– assess number of false negatives
• Check precision and recall estimates
• Experts decide on the best distance
Nassif 09
Negation Detector
• Negation triggers (Mutalik 01, Gindl 08):
– “não” (not) when not preceded by “onde” (where)
– “sem” (without)
– “nem” (nor).
• Precedes or appears within the subsentence
• Establish negation scope
• “without evidence of suspicious cluster of
microcalcifications”
Dataset
• Training set: 1,129 reports, unlabeled
• Testing set: 153 pairs, labeled by radiologist
– Basic screening report
– Detailed diagnostic report
• Perform three refinement passes
– Double blind, based on lexicon
– Refine rules
– Refine manual labeling and rules
Results
Conclusion
• Out of 48 disputed cases, parser correctly
classified 25 (52.1%)
• First Portuguese BI-RADS extractor
– Discovers features missed or misclassified
– Similar performance to manual annotation
• Method portable to other languages