Text-mining & ontologies Andrey Rzhetsky A (very) short introduction into text-mining GeneWays as an infogrinder On-line Journals GeneWays Pathways.
Graph: multi-type arcs and nodes (figure not available)

Typical arcs:
1001,'bind' 1004,'suppress' 1011,'replace' 1018,'interact' 1020,'activate' 1022,'stimulate' 1023,'phosphorylate' 1027,'increase' 1028,'associate' 1034,'up-regulate' 1036,'inhibit' 1040,'promote' 1041,'down-regulate' 1043,'trigger' 1049,'block' 1054,'modify' 1057,'digest' 1058,'degrade' 1062,'link' 1071,'cleave' 1072,'release' 1074,'catalyze' 1083,'inactivate' 1106,'repress' 1110,'acetylate' 1117,'methylate'

Typical nodes (the number is the database ID):
17767,'calcium channel antagonists' 20324,'hsp70 chaperone' 17467,'activator protein 1' 5104,'daunorubicin' 13194,'tyrosyl-phosphorylated' 9689,'paroxonase' 4190,'immunodeficiency' 4478,'iga2' 8552,'human fcgammarii' 4472,'iga1' 13151,'ikaros' 9820,'caveolin 1' 7277,'virus-triggered p-dcs' 4366,'complexes pr-3' 12290,'anti-alpha4 mabs' 2258,'gal4-mef2d' 14464,'polyneuropathy' 16044,'alk5' 10393,'mek-1 inhibitor' 13262,'pro-matrilysin'

Checking the internal consistency of statements (PLoS ONE, 2006)
AR, Tian Zheng

Statements (shown one at a time, then together):
George leaves the office when Bill arrives (B -| G)
George is in the oval office (G=1)
Bill is in the oval office (B=1)

Problem: get a consistent tissue/cell/organism-specific model of molecular interactions
Given: noisy statements about molecular states (nodes) and noisy statements about molecular interactions (arcs)

Inconsistent example (figure not available)
Consistent example (figure not available)
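The George/Bill statements can be made concrete with a small sketch. It is not the model from the PLoS ONE paper: it assumes every extracted statement is right with the same probability (`P_CORRECT`, a made-up value), treats B -| G as a hard rule, and simply enumerates every possible "world" to see which consistent ones best explain the three statements. All names in the code are illustrative.

```python
from itertools import product

P_CORRECT = 0.9  # assumed probability that an extracted statement is right

# What the text says: Bill is in (B=1), George is in (G=1), B -| G holds (arc=1).
STATED = {"B": 1, "G": 1, "arc": 1}

def is_consistent(world):
    """B -| G forbids both being in the office while the arc holds."""
    return not (world["arc"] and world["B"] and world["G"])

def likelihood(world):
    """Probability of seeing the three statements if `world` is the truth."""
    p = 1.0
    for name, said in STATED.items():
        p *= P_CORRECT if world[name] == said else 1.0 - P_CORRECT
    return p

def posterior():
    """Rank all consistent worlds by how well they explain the statements."""
    worlds = [dict(zip(STATED, bits)) for bits in product((0, 1), repeat=3)]
    scored = [(w, likelihood(w)) for w in worlds if is_consistent(w)]
    total = sum(score for _, score in scored)
    return sorted(((w, s / total) for w, s in scored), key=lambda pair: -pair[1])

ranked = posterior()
print("statements consistent as given:", is_consistent(STATED))
for world, prob in ranked[:3]:
    print(world, round(prob, 3))
```

Under these assumptions the statements taken at face value are inconsistent, and the three most probable repairs each drop exactly one of them: Bill is out, George is out, or the arc does not hold.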
Implementation: simple generalization of Bayesian networks
P(nodes|arcs): Bayesian network
P(arcs|nodes): additional level of modeling
Gibbs sampling: arc update, node update

Before, After, Difference, Entropy change (figures not available)

And Now For Something Completely Different… (with apologies to Monty Python)

How many events do we observe?
• (With apologies to Steven Pinker): 9/11, one event or two? It was one coordinated attack, but …
1 event --> $3.5 * 10^6
2 events --> $7 * 10^6 in insurance paid to Larry Silverstein
My point: seemingly incompatible descriptions of the same phenomenon coexist in publications.

Phosphorylation: how many events and players?
A phosphorylates B. Enzyme and substrate? Plus ATP? ADP? Intermediate complex?

Timeline: let’s think about implications for concepts & ontologies
Big Dipper: now and 100,000 years from now
Things and their perception change…
Time-stamped texts are like fossil layers
Fossilized concepts…

Time and meaning (figure not available; labels: Semantics, YOU ARE HERE, Text-miners are here (green), SUBFIELD A, SUBFIELD B)
Berlin-Kay (figure not available)

Point: with text-mining we can go 100, 200, … years back.
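Returning to the implementation slide above (Gibbs sampling that alternates a node update with an arc update), here is a minimal sketch on the George/Bill example. It is a toy: the joint weight, the statement reliability `P_CORRECT`, and the soft `PENALTY` for violating B -| G are all assumptions made for illustration, not the model from the PLoS ONE paper.

```python
import random

P_CORRECT = 0.9  # assumed reliability of each extracted statement
PENALTY = 0.01   # assumed weight for a state that violates the inhibition arc

# Statements as extracted from text: Bill is in, George is in, Bill inhibits George.
OBSERVED = {"B": 1, "G": 1, "arc": 1}

def weight(state):
    """Unnormalized probability of a hidden state given the noisy statements."""
    w = 1.0
    for name, said in OBSERVED.items():
        w *= P_CORRECT if state[name] == said else 1.0 - P_CORRECT
    if state["arc"] and state["B"] and state["G"]:
        w *= PENALTY  # B -| G is violated when both are in the office
    return w

def resample(state, name, rng):
    """Draw one variable from its conditional, holding the others fixed."""
    w1 = weight({**state, name: 1})
    w0 = weight({**state, name: 0})
    state[name] = 1 if rng.random() < w1 / (w0 + w1) else 0

def gibbs(n_sweeps=20000, burn_in=1000, seed=0):
    """Alternate node and arc updates; return estimated marginals."""
    rng = random.Random(seed)
    state = dict(OBSERVED)
    totals = {name: 0 for name in OBSERVED}
    for sweep in range(n_sweeps):
        for node in ("B", "G"):      # node update: nodes given arcs
            resample(state, node, rng)
        resample(state, "arc", rng)  # arc update: arcs given nodes
        if sweep >= burn_in:
            for name in totals:
                totals[name] += state[name]
    kept = n_sweeps - burn_in
    return {name: totals[name] / kept for name in totals}

marginals = gibbs()
print(marginals)
```

Each marginal is the estimated probability that the corresponding statement is true once consistency is taken into account. In this symmetric toy all three end up well below the assumed 0.9 reliability, because at least one of the statements has to give way.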
Even within a 20-year window we cannot neglect changes in conceptual semantics.

Minimum:
We need a representation of A and not-A: both can be true (with different probabilities)
Fuzzy sets representing concept inheritance

Even better…
We could introduce a binding between verbs and relations that changes with time
The binding between phrases and their semantics (mapping to the real or abstract world) can also change with time
Allow co-existence of concepts associated with incompatible theories (ether, chi, etc.)

Financial support comes from