Adapting an Algorithm to a Corpus
Peter Nelson
Carleton College
J. Starren, M.D., Ph.D.
L. Rasmussen
Project Purpose
In the context of a GWAS on hypothyroidism
A particular natural language processing
algorithm used to identify contextual features
Discover and evaluate automatic and
semi-automatic methods of adapting that
algorithm to a corpus of medical records
Project Motivation
PMRP
eMERGE
Hypothyroidism GWAS
– Phenotyping
Project Motivation - PMRP
Marshfield Clinic PMRP
– ~20,000 people from central WI
– EHR and blood samples
– Studies in the fields of: Population Genetics, Genetic Epidemiology, Pharmacogenetics
– Leverage genetic data to improve care
Project Motivation - eMERGE
eMERGE Network
Organized by NHGRI
Members
– Marshfield Clinic
– Vanderbilt
– Northwestern
– Mayo Clinic
– Group Health Cooperative
Genome Wide Association Studies
What is a GWAS? Why Do One?
“[A GWAS] involves rapidly scanning markers across
the… genomes of many people to find genetic
variations associated with a particular disease.”
“[R]esearchers can use the information to develop
better strategies to detect, treat and prevent the
disease.”
“…common, complex diseases, such as asthma,
cancer, diabetes….”
NHGRI website
(http://www.genome.gov/20019523)
Hypothyroidism GWAS
Insufficient hormone production by thyroid
gland can cause fatigue, weight gain, and
other symptoms.
Diagnosable and treatable
About 3% of the American population has the
clinical condition
Different Causes
Hypothyroidism GWAS
eMERGE Study
– Identify patients with presumptive Hashimoto’s disease induced hypothyroidism (Cases)
– Identify patients with normal thyroid function (Controls)
– Genotype cases and controls (by testing for 100,000s of SNPs)
– Genome-wide association analysis
Phenotyping in a GWAS
Doctors design an algorithm for phenotyping based
on the presence or absence of key procedures,
medicines, and conditions in a patient’s medical
history
EHR is used as a resource
– Coded fields
– Unmarked text
– Images
Manual vs Electronic Phenotyping
Manual phenotyping by chart abstractors
– Accurate (Gold standard)
– Far too expensive (~20,000 medical records to process)
Electronic phenotyping by computers
– Methods: query database of coded fields; natural language processing on free text; OCR and image processing on other resources
– Comparatively cheap
– Sample must be validated by chart abstractors
Natural Language Processing
What is it?
What problems must be solved?
How can they be solved?
Natural Language Processing
Search for concepts in free text of EHR
Simple keyword search insufficient
– “There was no evidence of polyps or ulceration.”
– “Rule out H. pylori, gastritis and gastropathy.”
– “She should return to the Emergency Department if she experiences nausea or vomiting.”
– “Patient should avoid any tests which involve the use of iodinated contrast material”
– “The indication for this procedure is family history of colon cancer.”
[The next three slides repeat these five sentences under the labels Negated, Hypothetical, and Family History in place of “Simple keyword search insufficient”.]
NegEx
Simple
Performs well
– Against gold standard
– Against MedLEE
– Against straight statistical methods
Recently extended
– Hypothetical & Family History
– “ConText”
NegEx
“There was no evidence of polyps or ulceration.”
[Build: dotted scope line under the sentence, ending in a bar at the right]
NegEx
“Rule out H. pylori, gastritis, and gastropathy.”
[Build: dotted scope line under the sentence, ending in a bar at the right]
NegEx
“Quantitative PCR testing for BK Virus is negative.”
[Build: dotted scope line under the sentence, starting from a bar at the left]
NegEx
“No evidence of spread of cancer to the lungs.”
“No residua of healed fractures can be seen otherwise.”
[Build: dotted scope line ending in a bar at the right, shown for each sentence]
NegEx
NegEx, and therefore ConText, require
carefully tuned lists of triggers and
pseudotriggers.
How big must a list be to perform well?
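As a rough illustration of how trigger and pseudotrigger lists drive the algorithm, here is a minimal Python sketch in the spirit of NegEx. It is not the published implementation: the three lists are tiny hypothetical samples, the function names are invented for this example, and scope is assumed to run from the trigger to the sentence boundary (forward for pre-triggers, backward for post-triggers).

```python
import re

# Hypothetical, deliberately tiny lists; real NegEx lists are much longer.
PRE_TRIGGERS = ["no evidence of", "rule out", "no", "denies", "without"]
POST_TRIGGERS = ["is negative"]
PSEUDOTRIGGERS = ["no change", "no increase", "without difficulty"]


def find_phrases(tokens, phrases, blocked=frozenset()):
    """Return (start, end) token spans of phrase matches, longest phrases first.

    Tokens listed in `blocked`, or already claimed by a longer match,
    cannot start or belong to another match.
    """
    spans, taken = [], set(blocked)
    for phrase in sorted(phrases, key=lambda p: -len(p.split())):
        words = phrase.split()
        for i in range(len(tokens) - len(words) + 1):
            idx = set(range(i, i + len(words)))
            if tokens[i:i + len(words)] == words and not (idx & taken):
                spans.append((i, i + len(words)))
                taken |= idx
    return spans


def negated_words(sentence):
    """Return the words that fall inside the scope of a trigger."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    # Pseudotriggers look like triggers but must not open a scope.
    pseudo = find_phrases(tokens, PSEUDOTRIGGERS)
    blocked = {i for start, end in pseudo for i in range(start, end)}
    negated = set()
    for start, end in find_phrases(tokens, PRE_TRIGGERS, blocked):
        negated.update(range(end, len(tokens)))   # scope runs forward
    for start, end in find_phrases(tokens, POST_TRIGGERS, blocked):
        negated.update(range(0, start))           # scope runs backward
    return [tokens[i] for i in sorted(negated)]


print(negated_words("There was no evidence of polyps or ulceration."))
print(negated_words("Quantitative PCR testing for BK Virus is negative."))
print(negated_words("No change in the lesion."))
```

With these sample lists, “No residua of healed fractures can be seen otherwise.” is still treated as negated, because “no residua” is not among the pseudotriggers. That is the kind of gap a longer, tuned list is meant to close.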
Scenarios
Annotated training set used to populate lists
Large unmarked training set used to extend
existing lists
Using Annotated Data
NegEx/ConText creators provide annotated excerpts
from medical records
Look for associations between words and negation
to populate list of triggers
Look for associations between words near triggers
and false positives to populate list of pseudotriggers
Identifying Triggers
Create a confusion matrix for each word
Sort words by some statistic based on these confusion matrices
Select or reject top candidate as a trigger
Repeat on yet unexplained sentences until stopping condition met

Confusion matrix (rows: predicted classification; columns: actual classification)

              Actual +   Actual -
Predicted +   TP         FP
Predicted -   FN         TN
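The loop on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the toy sentences are made up, the `min_score` and `max_triggers` stopping rule is hypothetical, the top candidate is accepted automatically (the slide lets a reviewer select or reject it), and F-measure stands in as the ranking statistic.

```python
from collections import Counter

# Hypothetical annotated data: (words in the sentence, is the finding negated?)
DATA = [
    ({"no", "fever"}, True),
    ({"no", "cough"}, True),
    ({"no", "pain"}, True),
    ({"denies", "pain"}, True),
    ({"denies", "fever"}, True),
    ({"has", "fever"}, False),
    ({"has", "cough"}, False),
    ({"reports", "pain"}, False),
]


def confusion_by_word(sentences):
    """Map each word to (TP, FP, FN, TN), predicting 'negated' when the word is present."""
    n_pos = sum(1 for _, negated in sentences if negated)
    n_neg = len(sentences) - n_pos
    tp, fp = Counter(), Counter()
    for words, negated in sentences:
        for word in words:
            (tp if negated else fp)[word] += 1
    return {w: (tp[w], fp[w], n_pos - tp[w], n_neg - fp[w])
            for w in set(tp) | set(fp)}


def f_measure(tp, fp, fn, tn):
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def greedy_triggers(sentences, min_score=0.5, max_triggers=10):
    """Repeatedly pick the best-scoring word on the sentences not yet explained."""
    remaining, triggers = list(sentences), []
    while remaining and len(triggers) < max_triggers:
        table = confusion_by_word(remaining)
        best = max(sorted(table), key=lambda w: f_measure(*table[w]))
        if f_measure(*table[best]) < min_score:
            break  # assumed stopping condition
        triggers.append(best)
        # Sentences containing the new trigger count as explained.
        remaining = [(ws, neg) for ws, neg in remaining if best not in ws]
    return triggers


print(greedy_triggers(DATA))
```

Recomputing the confusion matrices on the remaining sentences after each pick is what lets a second trigger surface once the sentences covered by the first are set aside.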
Identifying Triggers
Statistical measures used
– Log-likelihood ratio
– Precision (PPV)
– Recall (Sensitivity)
– F-measure
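For reference, the four measures can be computed from a confusion matrix as in the sketch below. The deck does not give its formulas, so the log-likelihood ratio is assumed here to be the G² statistic on the 2×2 table; the function names and the example counts are hypothetical.

```python
import math


def precision(tp, fp, fn, tn):
    """Share of predicted positives that are actual positives (PPV)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fp, fn, tn):
    """Share of actual positives that were predicted positive (sensitivity)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_from_rates(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def f_measure(tp, fp, fn, tn):
    return f_from_rates(precision(tp, fp, fn, tn), recall(tp, fp, fn, tn))


def log_likelihood_ratio(tp, fp, fn, tn):
    """G^2 = 2 * sum(O * ln(O / E)) over the four cells (assumed definition)."""
    n = tp + fp + fn + tn
    rows = (tp + fp, fn + tn)      # predicted +, predicted -
    cols = (tp + fn, fp + tn)      # actual +, actual -
    observed = ((tp, fp), (fn, tn))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o:                  # empty cells contribute nothing
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2 * total


counts = (30, 10, 20, 40)          # hypothetical TP, FP, FN, TN
print(precision(*counts), recall(*counts), f_measure(*counts))
print(log_likelihood_ratio(*counts))
```

As a consistency check, `f_from_rates(95.3, 69.9)` rounds to 80.6, which matches the F-measure the slides report for “no” alongside that precision and recall.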
Log-Likelihood Ratio
Precision, Recall, and F-measure are percentages.

Top candidate at each step

Candidate    LLR      Precision  Recall  F-measure
no           1763.2   95.3       69.9    80.6
denies       617.8    100.0      50.0    66.7
not          179.4    70.6       32.4    44.4
denied       187.6    100.0      34.0    50.7
without      79.9     60.0       27.3    37.5
negative     77.7     100.0      25.0    40.0
resolved     61.3     83.3       27.8    41.7
4-way tie!   -        -          -       -

Totals as the trigger list grows

Triggers                                                          LLR      Precision  Recall  F-measure
{ }                                                               0.0      0.0        0.0     0.0
{ no }                                                            1763.2   95.3       69.9    80.6
{ no, denies }                                                    2371.6   96.1       84.9    90.2
{ no, denies, not }                                               2519.5   94.2       89.8    92.0
{ no, denies, not, denied }                                       2704.2   94.4       93.3    93.9
{ no, denies, not, denied, without }                              2763.2   93.4       95.1    94.2
{ no, denies, not, denied, without, negative }                    2839.7   93.5       96.3    94.9
{ no, denies, not, denied, without, negative, resolved (post) }   2900.0   93.4       97.4    95.3
Other Measures
Precision (PPV)
– 271 tie for 100%
– Poor metric
Recall (sensitivity)
– Catches all the same ones as LLR
– Also finds “any”, “the”, and “for”
– Imprecise metric
F-measure
– Identical results to LLR
– Good metric
Identifying Pseudotriggers
Use analogous method to find words that predict
false-positives
Limit to words next to triggers
Filter out prospects with low precision
Sort by LLR
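A rough sketch of these four steps is shown below. It is only an illustration: the annotated sentences are invented, “next to triggers” is narrowed to the word immediately following a trigger, `min_precision` is a hypothetical threshold, and the LLR is assumed to be the G² statistic.

```python
import math
from collections import Counter

# Hypothetical annotated sentences: (tokens, is the finding really negated?)
DATA = [
    ("no change in size".split(), False),
    ("no change since prior exam".split(), False),
    ("no change noted".split(), False),
    ("no fever or chills".split(), True),
    ("no cough".split(), True),
    ("no fever".split(), True),
    ("no evidence of fracture".split(), True),
]


def g2(a, b, c, d):
    """G^2 log-likelihood ratio for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    total = 0.0
    for observed, row, col in ((a, a + b, a + c), (b, a + b, b + d),
                               (c, c + d, a + c), (d, c + d, b + d)):
        if observed:
            total += observed * math.log(observed * n / (row * col))
    return 2 * total


def pseudotrigger_candidates(sentences, triggers, min_precision=0.8):
    """Rank 'trigger + next word' phrases by how well they predict false positives."""
    with_fp, with_tp = Counter(), Counter()
    n_fp = n_tp = 0
    for tokens, is_negated in sentences:
        if not any(t in triggers for t in tokens):
            continue  # no trigger fired, so no positive prediction here
        phrases = {f"{t} {nxt}" for t, nxt in zip(tokens, tokens[1:])
                   if t in triggers}
        if is_negated:
            n_tp += 1
            with_tp.update(phrases)
        else:
            n_fp += 1
            with_fp.update(phrases)
    results = []
    for phrase in set(with_fp) | set(with_tp):
        a, b = with_fp[phrase], with_tp[phrase]
        prec = a / (a + b)
        if prec >= min_precision:          # drop low-precision prospects
            results.append((phrase, prec, g2(a, b, n_fp - a, n_tp - b)))
    return sorted(results, key=lambda r: -r[2])  # highest LLR first


for phrase, prec, llr in pseudotrigger_candidates(DATA, {"no"}):
    print(phrase, prec, round(llr, 2))
```

On this toy data only “no change” survives the precision filter, since every other phrase following “no” appears only in correctly negated sentences.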
Identifying Pseudotriggers
Some real pseudotriggers
– “no residua”
– “without difficulty”
Some that should be considered for addition to the list of pseudotriggers
– “not know”
– “no additional”
Some entirely anomalous pseudotriggers
– “no hepatosplenomegaly”
Further Work
Formalize stopping condition
Try other statistical measures
Can potential pseudotriggers be further
explored using unannotated EHR?
Evaluate the finished algorithm on ConText
data
Using Unmarked Data
Many pseudotriggers are variations on other pseudotriggers
– “No change”
– “No significant change”
– “No increase”
Could a large unmarked corpus of EHR be
searched for variations on pseudotriggers?
Phrase Comparison Methods
Edit Distance
N-gram similarity, Set similarity
Vector based methods
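Two of these phrase comparison methods are easy to sketch. The code below is a generic illustration rather than the project's implementation: it shows Levenshtein edit distance (usable on characters or on word lists) and a Jaccard set similarity over character n-grams. The helper names and the padding with spaces are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def char_ngrams(text, n=3):
    """Set of character n-grams, padded so word edges count too."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b, n=3):
    """Size of the n-gram intersection divided by the size of the union."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return len(x & y) / len(x | y) if x | y else 1.0


print(edit_distance("no change".split(), "no significant change".split()))
print(jaccard("no change", "no significant change"))
print(jaccard("no change", "without difficulty"))
```

At the word level, “no change” is one edit away from both “no increase” and “no fever”, so edit distance by itself cannot tell a variation on a pseudotrigger from an ordinary negation.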
Word Comparison Methods
Path-based methods
– Path
– Wu-Palmer
– Leacock-Chodorow
Path-based, with IC
– Resnik
– Jiang-Conrath
– Lin
Gloss-based
– Lesk (and Lesk Extended)
– Gloss-Vector
– LSA
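To make the path-based family concrete, here is a small sketch of the Path and Wu-Palmer measures on a toy is-a hierarchy. The taxonomy is entirely hypothetical and stands in for WordNet; the function names are invented for this example, and the root is assumed to sit at depth 1.

```python
# Hypothetical toy is-a taxonomy (child -> parent); WordNet would supply this in practice.
PARENT = {
    "finding": None,
    "change": "finding",
    "increase": "change",
    "decrease": "change",
    "symptom": "finding",
    "fever": "symptom",
}


def ancestors(word):
    """Path from the word up to the root, inclusive."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def lowest_common_ancestor(a, b):
    others = ancestors(b)
    return next(x for x in ancestors(a) if x in others)


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    lca = lowest_common_ancestor(a, b)
    edges = ancestors(a).index(lca) + ancestors(b).index(lca)
    return 1 / (1 + edges)


def wu_palmer(a, b):
    """2 * depth(lca) / (depth(a) + depth(b)), with the root at depth 1."""
    def depth(word):
        return len(ancestors(word))
    return 2 * depth(lowest_common_ancestor(a, b)) / (depth(a) + depth(b))


print(path_similarity("increase", "decrease"), wu_palmer("increase", "decrease"))
print(path_similarity("increase", "fever"), wu_palmer("increase", "fever"))
```

Both measures depend only on positions in the hierarchy, so their usefulness rests on how well the taxonomy covers the vocabulary being compared.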
Preliminary Results
Edit distance seems to be a poor phrase
comparison metric
Path-based measures seem to be poor word
comparison metrics
Further Work
Explore gloss-based measures of word
similarity
Explore measures of phrase similarity
other than edit distance
Evaluate the finished metric on ConText lists
Validation
Take algorithms developed on NegEx and
apply them to ConText
Have chart abstractors evaluate terms from
some documents in the hypothyroidism
GWAS. Compare performance of unmodified
ConText with that of extended version(s)
Results/Conclusion
The study is ongoing; no final results are available
The methods described in this presentation show
promise, but they must be validated before any
conclusions can be drawn
If the phrase comparison metric performs well, it
could potentially be used to solve smoothing
problems in n-gram models.
N-gram Interlude
N-gram models estimate probability based on leading context:
– “Class, please hand your homework ___”
– “I heard a sharp rap on the ___”
Many applications
– Machine translation
– OCR, speech recognition, spell checking
– Identifying pathogenicity islands in viral and bacterial genomes, predicting protein folding
N-gram Interlude
As size of the n-grams (i.e., n) increases
– Performance improves
– Number of parameters increases exponentially
– Size of data set necessary to accurately estimate parameters becomes impossibly large
– Missing parameters must be estimated based on existing ones (Smoothing)
Could smoothing be based on a phrase
similarity metric?
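For readers new to n-gram models, the sketch below trains a bigram model on a three-sentence toy corpus and shows why smoothing is needed. It uses plain add-alpha (Laplace) smoothing as a familiar baseline, not the phrase-similarity smoothing the slide asks about. The corpus, the function names, and the `<s>` and `</s>` boundary markers are assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy corpus.
CORPUS = [
    "a sharp rap on the door",
    "a knock on the door",
    "a rap on the window",
]


def train_bigrams(sentences):
    """Count bigrams, how often each word serves as context, and the vocabulary."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def probability(word, prev, model, alpha=1.0):
    """P(word | prev) with add-alpha smoothing; alpha=0 gives the raw estimate."""
    bigrams, contexts, vocab = model
    denominator = contexts[prev] + alpha * len(vocab)
    return (bigrams[(prev, word)] + alpha) / denominator if denominator else 0.0


def parameter_count(vocab_size, n):
    """Number of distinct n-grams over a vocabulary: grows exponentially with n."""
    return vocab_size ** n


model = train_bigrams(CORPUS)
print(probability("door", "the", model, alpha=0))    # seen bigram, raw estimate
print(probability("knock", "the", model, alpha=0))   # unseen bigram gets zero
print(probability("knock", "the", model, alpha=1))   # smoothing gives it some mass
print(parameter_count(len(model[2]), 3))
```

Add-alpha smoothing spreads probability mass evenly over unseen continuations. A similarity-based scheme would instead borrow mass from contexts or phrases judged to be similar, which is the possibility the slide raises.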
Bibliography
Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying
negated findings and diseases in discharge summaries. J Biomed Inform 2001;34(5):301–10.
Chu D, Dowling JN, Chapman WW. Evaluating the effectiveness of four contextual features in classifying
annotated clinical conditions in emergency department reports. AMIA Annu Symp Proc 2006:141–5.
Goryachev S, Kim H, Zeng-Treitler Q. 2008. Identification and extraction of family history information from
clinical reports. In proceedings of AMIA Annu Symp Proc. 2008 Nov 6:247-51.
Goryachev S, Sordo M, Zeng QT, and Ngo L. 2006. Implementation and evaluation of four different
methods of negation detection. Technical report, DSG.
Harkema H et al. ConText: An algorithm for determining negation, experiencer, and temporal status from
clinical reports. J Biomed Inform (2009), doi:10.1016/j.jbi.2009.05.002
Pedersen, Ted. 1996. Fishing for exactness. In Proceedings of the South-Central SAS Users Group
Conference, pages 188--200, Austin, TX.
Xu H, Anderson K, Grann VR, Friedman C. Facilitating cancer research using natural language processing
of pathology reports. Medinfo 2004;2004:565-72.
Acknowledgements
Luke Rasmussen
Laura Coleman & Ruth Zetek
Justin Starren
MCRF
Donors
Creators and Maintainers of
– NegEx/ConText : W. Chapman, H. Harkema, X. Shen, P. Kang
– NLTK : S. Bird, E. Klein, E. Loper, et al.
– WordNet Similarity : Ted Pedersen, et al.