Transcript Slide 1
Reasoning on the Web: Theory, Challenges, and Applications in Bioinformatics
Prof. Michael Schroeder
Biotec/Dept. of Computing, TU Dresden
[email protected]
comas.soi.city.ac.uk

Slide 2
Contents
Motivation
Beyond the web: Rules, Reasoning, Semantics, Ontologies
Semantics of Deduction Rules
Argumentation Semantics
Fuzzy Reasoning
Reaction rules
Vivid Agents
Prova
Applications in Bioinformatics
By Michael Schroeder, Biotec, 2003

Slide 3
The Web
A great success story, but… it’s the web for humans, not machines
Many areas, such as biology, have fully embraced the web
Human genome project is only the tip of the iceberg
More than 500 tools and databases online

Slide 4
Example: Pubmed
>12,000,000 literature abstracts
Great resource if one knows what one is looking for
“Kox1” has 17 hits
But “diabetes” will produce >200,000
Often need to automatically process abstracts

Slide 5
Results of PubMed (slide callouts label the Author, Title, Journal and Year fields)
Lorenz P, Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44.
Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31.
...
However, to a machine things look different!

Slide 6
Results of PubMed
Lorez P, Trascriptioal repressio mediated by the KRAB domai of the huma C2H2 zic figer protei Kox1/ZNF10 does ot require histoe deacetylatio. Biol Chem. 2001 Apr;382(4):63744.
Fredericks WJ. A egieered PAX3-KRAB trascriptioal repressor ihibits the maligat pheotype of alveolar rhabdomyosarcoma cells harborig the edogeous PAX3-FKHR ocogee. Mol Cell Biol. 2000 Jul;20(14):5019-31.
...
Solution: tag data (XML)

Slide 7
Results of PubMed
<author>Lorez P</author><title>Trascriptioal repressio mediated by the KRAB domai of the huma C2H2 zic figer protei Kox1/ZNF10 does ot require histoe deacetylatio. </title> <journal>Biol Chem </journal><year>2001</year>
<author>Lorez P</author><title>Trascriptioal repressio mediated by the KRAB domai of the huma C2H2 zic figer protei Kox1/ZNF10 does ot require histoe deacetylatio. </title> <journal>Biol Chem </journal><year>2001</year>
...
However, to a machine things look different!

Slide 8
Results of PubMed
<author>Lorez P</author><title>Trascriptioal repressio mediated by the KRAB domai of the huma C2H2 zic figer protei Kox1/ZNF10 does ot require histoe deacetylatio. </title> <joural>Biol Chem </joural><year>2001</year>
<author>Lorez P</author><title>Trascriptioal repressio mediated by the KRAB domai of the huma C2H2 zic figer protei Kox1/ZNF10 does ot require histoe deacetylatio. </title> <joural>Biol Chem </joural><year>2001</year>
...
Solution: use ontologies (Semantic Web)
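
The point of the last two slides can be made concrete with a small sketch: a program that looks up a field by its literal tag name stops working as soon as a source spells the tag differently, whereas a lookup through an agreed vocabulary still succeeds. The `<record>` wrapper, the sample records and the `TAG_TO_CONCEPT` table are assumptions made for illustration only; they stand in for a real ontology and are not part of the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the <record> wrapper is an assumption of this sketch.
TAGGED = ("<record><author>Lorenz P</author>"
          "<journal>Biol Chem</journal><year>2001</year></record>")
GARBLED = ("<record><author>Lorenz P</author>"
           "<joural>Biol Chem</joural><year>2001</year></record>")

# Hypothetical shared vocabulary: maps source-specific tag names to one concept.
TAG_TO_CONCEPT = {
    "author": "Author",
    "journal": "Journal",
    "joural": "Journal",
    "year": "Year",
}


def journal_by_tag(xml_text):
    """Look the journal up by its literal tag name; None if the tag is absent."""
    return ET.fromstring(xml_text).findtext("journal")


def field_by_concept(xml_text, concept):
    """Look a field up through the shared vocabulary instead of the raw tag."""
    for child in ET.fromstring(xml_text):
        if TAG_TO_CONCEPT.get(child.tag) == concept:
            return child.text
    return None


if __name__ == "__main__":
    print(journal_by_tag(TAGGED))                # Biol Chem
    print(journal_by_tag(GARBLED))               # None
    print(field_by_concept(GARBLED, "Journal"))  # Biol Chem
```

A flat dictionary is of course far weaker than an ontology; it only shows why agreement on meaning, not just on markup, is what lets a machine use the data.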

Slide 9
GeneOntology
Biologists have recognised the problem of semantic inter-operability between disparate information sources
GeneOntology (GO) is an effort to provide a common vocabulary for molecular biology
GO has >10,000 terms in three branches: “function”, “process”, “localisation”

Slide 10
GeneOntology
Has 13 levels
Width broadens to level 6 (3885 terms wide), then shrinks
Number of leaves per level broadens to level 6 (1223 leaves), then shrinks
Average term has 4 words
Maximal term has 29 words: Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors
[Chart: Breadth of GO, number of terms per level]

Slide 11
Motivation Summary
Web in the old days
  HTML (for humans)
Web these days
  HTML
  XML, Ontologies (for machines)
Web of the future
  HTML
  XML, Ontologies
  rules, reasoning, semantics
  access to computational resources (a la grid-computing)

Slide 12
Open Problems
Part I: Theory of rules and reasoning on the web:
  Knowledge representation: Which level of expressiveness?
  Semantics: How to guarantee inter-operability
  Reasoning: Fuzzy reasoning and unification
  Reactivity: Vivid agents
Part II: Applications of rules and reasoning on the web:
  Integration and querying of information sources
    Integration: transmembrane prediction tools
    Integration: protein structure DB and structure classification
  Consistency checking
    Ontology: If A is B and B is C, then the ontology should not explicitly mention A is C, as it is already implicit
    Annotation: Do different tools agree or disagree?

Slide 13
The wider Picture: www.RuleML.org
Goal: develop Web language for rules using XML markup, formal semantics, and efficient implementations.
Rules: derivation rules, transformation rules, and reaction rules.
RuleML can thus specify queries and inferences in Web ontologies, mappings between Web ontologies, and dynamic Web behaviors of workflows, services, and agents.
Currently, some 30 international members and close collaboration with W3C

Slide 14
The wider Picture: REWERSE
Reasoning on the Web with Rules and Semantics
FP6 Network of Excellence with nearly 30 partners
Working groups on Infrastructure and Applications: Composition, Typing, Policies, Querying, Reactivity and evolution, Personalised Web sites, Calendar systems, Bioinformatics

Slide 15
Part I: Theory
Motivation: Expressive Knowledge Representation
Part I.a: Argumentation as LP semantics
  Notions of attack and justified arguments
  Hierarchy of semantics
  Proof procedure
Part I.b: Fuzzy unification and argumentation
  Fuzzy negation
  Fuzzy argumentation
  Fuzzy unification
Part I.c: Vivid Agents

Slide 16
Part I.a: A Hierarchy of Semantics
RuleML caters for different degrees of knowledge representation
A hierarchy of semantics is required to guarantee inter-operation.
Analogy: In HTML, <b>Michael</b> will be interpreted differently in Netscape (Michael) and the text-based browser Lynx (Michael).
Problem: How can we guarantee inter-operability between different interpretations of rules?

Slide 17
Knowledge representation
Pete earns $500,000 p.a.
  earns(pete,500000).
Cross the street if there are no cars
  cross ← not car
  cross ← ¬car
The fridge is quite cheap
  cheap(fridge):70%
Does Mike live in Londn?
  address(mike,london) = address(mike,londn): 95%

Slide 18
Knowledge System Cube
r: relational
f: fuzzy
d: deductive
DB: database
FB: factbase
[Diagram: cube with corners rDB, rFB, fDB, fFB, dDB, dFB, fdDB, fdFB; labelled axes: deductive, negation]

Slide 19
Part I.a: Argumentation as semantics for Extended Logic Programs
[Diagram: the knowledge system cube of slide 18, repeated]

Slide 20
Extended Logic Programming
Logic Programming with 2 negations
Default negation: not p : true if all attempts to prove p fail.
Explicit negation: ¬p : falsehood of a literal may be stated explicitly.
Coherence principle: ¬p ⇒ not p

Slide 21
Argumentation
Interaction between agents in order to
  gain knowledge
  revise existing knowledge
  convince the opponent
  solve conflicts
Elegant way to define semantics for (extended) logic programming
  Dung
  Kowalski, Toni, Sadri
  Prakken & Sartor
  etc.

Slide 22
Arguments
An argument is a partial proof, with implicitly negated literals as assumptions.
Argument = sequence of rules

Slide 23
Attacking arguments
Two fundamental kinds of attack:
A undercuts B = A invalidates premise of B
  P: Let’s go to the lake as it is not snowing anymore
  O: Hang on, it is snowing
A rebuts B = A contradicts B
  P: Let’s go to the lake as it is not snowing
  O: Let’s not, as I’ve got to prepare my talk
Derived notions of attack used in the literature (u: undercuts, r: rebuts, a: attacks):
  A attacks B = A u B or A r B
  A defeats B = A u B or (A r B and not B u A)
  A strongly attacks B = A a B and not B u A
  A strongly undercuts B = A u B and not B u A

Slide 24
Proposition: Hierarchy of attacks
Attacks = a = u ∪ r
Defeats = d = u ∪ (r − u⁻¹)
Undercuts = u
Strongly attacks = sa = (u ∪ r) − u⁻¹
Strongly undercuts = su = u − u⁻¹

Slide 25
Fixpoint Semantics
Argumentation: game between proponent and opponent
  An argument A is acceptable if every x-attack by the opponent is countered by a y-attack from an argument the proponent has already accepted.
Acceptable
  Let x,y be notions of attack. An argument A is x,y-acceptable w.r.t. a set of arguments S iff for every argument B such that (B,A) ∈ x, there is a C ∈ S such that (C,B) ∈ y
Fixpoint semantics
  Fx/y(S) = { A | A is x,y-acceptable w.r.t. S }
  x/y-justified arguments = Least Fixpoint of Fx/y.
  x/y-overruled arguments = x-attacked by a justified argument.
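
The derived attack notions and the fixpoint definition on these slides can be sketched in a few lines. This is a minimal illustration, assuming arguments are opaque labels and that the undercut and rebut relations are given as finite sets of pairs; the function names and the five-argument example are invented here and do not come from the talk or from SLXA. Because Fx/y is monotone, iterating it from the empty set on a finite set of arguments reaches the least fixpoint.

```python
def inverse(rel):
    """Inverse relation: (a, b) in rel becomes (b, a)."""
    return {(b, a) for (a, b) in rel}


def derived_attacks(u, r):
    """Derived notions of attack built from undercut (u) and rebut (r)."""
    u_inv = inverse(u)
    return {
        "a": u | r,               # attacks
        "d": u | (r - u_inv),     # defeats
        "u": set(u),              # undercuts
        "sa": (u | r) - u_inv,    # strongly attacks
        "su": u - u_inv,          # strongly undercuts
    }


def acceptable(arg, s, x, y, arguments):
    """arg is x,y-acceptable w.r.t. s: each x-attacker is y-attacked from s."""
    return all(
        any((c, b) in y for c in s)
        for b in arguments
        if (b, arg) in x
    )


def justified(arguments, x, y):
    """Least fixpoint of F_{x/y}, computed by iteration from the empty set."""
    s = set()
    while True:
        nxt = {a for a in arguments if acceptable(a, s, x, y, arguments)}
        if nxt == s:
            return s
        s = nxt


def overruled(arguments, x, just):
    """Arguments that are x-attacked by some justified argument."""
    return {b for b in arguments if any((j, b) in x for j in just)}


if __name__ == "__main__":
    # Invented example: C undercuts B, B undercuts A, D and E rebut each other.
    args = {"A", "B", "C", "D", "E"}
    u = {("B", "A"), ("C", "B")}
    r = {("D", "E"), ("E", "D")}
    att = derived_attacks(u, r)

    j_aa = justified(args, att["a"], att["a"])
    print(sorted(j_aa))                             # ['A', 'C']
    print(sorted(overruled(args, att["a"], j_aa)))  # ['B']

    # Weakening the opponent from attack to undercut enlarges the justified set.
    print(sorted(justified(args, att["u"], att["a"])))  # ['A', 'C', 'D', 'E']
```

In the example, D and E are neither justified nor overruled under a/a, while under u/a the opponent can no longer use rebuts and both become justified.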
  x/y-defensible iff neither justified nor overruled

Slide 26
Theorem: Relationship of semantics
Weakening opponent or strengthening proponent increases justified arguments
Different notions of acceptability give rise to different argumentation semantics
If opponent is allowed to attack, type of defense does not matter
If opponent is allowed defeat, type of defense does not matter
If opponent is allowed undercut, defense with (a,d,sa) or without (su,u) rebut makes a difference
[Diagram: hierarchy of semantics with the following nodes]
  su/a=su/d
  su/u
  su/sa
  su/su
  sa/u=sa/d=sa/a
  sa/su=sa/sa
  u/a=u/d=u/sa (WFSX argumentation semantics)
  u/su=u/u
  d/su=d/u=d/a=d/d=d/sa (Prakken and Sartor’s semantics w/o priorities)
  a/su=a/u=a/a=a/d=a/sa (Dung’s grounded argumentation semantics)

Slide 27
Proof procedure
Dialogues: an x/y-dialogue is a sequence of moves such that
  Proponent and Opponent alternate
  Players cannot repeat arguments
  Opponent x-attacks Proponent’s last argument
  Proponent y-attacks Opponent’s last argument
Player wins dialogue if other player cannot move
Argument A is provably justified if proponent wins all branches of dialogue tree with root A
Concrete implementation SLXA:
  Since u/a=u/d=u/sa=WFSX, compute justified arguments with top-down proof procedure SLXA for WFSX [Alferes, Damasio, Pereira]
  SLXA can be adapted for other notions

Slide 28
Part I.b: Fuzzy unification and argumentation
[Diagram: the knowledge system cube (r: relational, f: fuzzy, d: deductive, DB: database, FB: factbase), repeated]

Slide 29
Classical Fuzzy Logic
Solution: Truth values in [0,1] instead of {0,1}.
Assertions: p:V (p a formula, V a truth value).
Conjunction: p:V, q:W ⇒ p ∧ q : min(V,W)
Disjunction: p:V, q:W ⇒ p ∨ q : max(V,W)
Inference: p ← q1, …, qn ; q1:V1, …, qn:Vn ⇒ p : min(V1, …, Vn)

Slide 30
Fuzzy Negation
Classical fuzzy negation: L:V ⇒ ¬L: 1−V (Zadeh)
Our setting (fuzzy adaptation of WFSX):
  L:V and ¬L:V’ with V’ ≠ 1−V possible
  L and ¬L not directly related.

Slide 31
Fuzzy Coherence Principle
If ¬L:V and V > 0, and not L:V’, then V’ ≥ V.
“If there is some explicit evidence that L is false, then there is at least the same evidence that L is false by default.”
If ¬L:V and V > 0, then not L: 1.

Slide 32
Law of excluded...
...contradiction
  p ∧ ¬p : V, V > 0 possible. Contradictory programs! Contradiction removal
  not p ∧ p : V, V > 0 possible. By coherence principle!
...middle
  not p ∨ p : V, V > 0
  p ∨ ¬p : V, V = 0 possible. p is unknown

Slide 33
Strength of an argument
Strength of an argument:
  Fact: value is given
  Rule: minimum of body literals
  Argument: value of its conclusion
Least fuzzy value of the facts contributing to the argument.

Slide 34
Theorems
Theorem (Soundness and Completeness)
  There is a justified argument of strength V for L iff there is a successful T-tree of truth value V for L
Theorem (Conservative Extension)
  Argumentation semantics is a conservative extension of WFSX.

Slide 35
Application: Fuzzy unification
Open systems:
  knowledge and ontologies may not match
  interaction with humans: “Does Mike live in Londn?”
Approach:
  address(mike,london) = address(mike,londn): 95%
  adapt unification algorithm (normalised edit distance over trees, net)
  embed into argumentation framework

Slide 36
Finding Mismatches: Edit distance
Edit distance between strings A and B: minimal number of delete, add, replace operations to convert A into B.
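
As a rough illustration of the edit distance defined on this slide, the sketch below uses the standard dynamic-programming recurrence with unit cost for delete, add and replace. The normalised variant divides by the length of the longer string, following the slide's ne(A,B); returning 0.0 for two empty strings is an assumption of this sketch, since the talk does not cover that case. The function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimal number of delete, add and replace operations turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # add cb
                prev[j - 1] + cost,  # replace ca by cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def normalised_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length; 0.0 for two empty strings."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


if __name__ == "__main__":
    print(edit_distance("address", "adresse"))  # 2
    print(edit_distance("007", "aa7"))          # 2
    print(normalised_edit_distance("london", "londn"))
```

Only two rows of the table are kept at a time, so memory use is linear in the length of the second string while running time stays proportional to the product of the two lengths.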
Efficient implementation with dynamic programming
Example: e(address,adresse)=2, e(007,aa7)=2
Normalise: ne(A,B) = e(A,B) / max{ |A|, |B| }
Trees: net = sum of all mismatches divided by sum of all max lengths

Slide 37
Fuzzy unification and arguments
net is a conservative extension of MGU (most general unifier)
net(t,t’) and ne(t,t’) are related (the relation symbol is missing from the transcript)
Adapt definition of argument for fuzzy unification
  V-argument: for all L in a body, there is L’ in a head such that net(L,L’) ≤ 1−V
  A V-undercuts B if A contains not L and B’s head is L’ and net(L,L’) ≤ 1−V
  A V-rebuts B if A’s head is L and B’s head is ¬L’ and net(L,L’) ≤ 1−V
Adapt previous definitions accordingly

Slide 38
Comparison: Argumentation
Our framework allows us to relate existing and new argumentation semantics:
  Dung = a/su=a/u=a/a=a/d=a/sa
  Prakken&Sartor = d/su=d/u=d/a=d/d=d/sa
  WFSX = u/a = u/d = u/sa
  Dung ⊆ Prakken&Sartor ⊆ WFSX
Proof Theory and Top-down Proof Procedure adapted from Alferes, Damasio, Pereira’s SLXA

Slide 39
Comparison: Fuzzy Argumentation
Wagner:
  Scale: -1 to +1
  Unlike WFSX, he relates F and ¬F: ¬F: -V iff F:V
  We adopted his interpretation for not: not F:1 if ¬F:V, V>0
  Relates his work to stable models, but there is no top-down proof procedure for stable models [Alferes&Pereira]
Our approach conservatively extends WFSX, hence we can adapt proof procedure SLXA

Slide 40
Comparison: Fuzzy unification
Arcelli, Formato, Gerla define abstract fuzzy unification/resolution framework
  cannot deal with missing parameters (common problem [Fung et al.])
  no conservative extension of classical unification
  we use concrete distance: edit distance
Evaluated idea on bioinfo DB

Slide 41
Conclusion
“A database needs two kinds of negation” (Wagner)
Argumentation is an elegant way of defining semantics
Our framework allows classification of various new and existing semantics
Efficient top-down proof procedure
for justified arguments
Argumentation as basis for belief revision (REVISE)
We cover the whole knowledge system cube including fuzzy argumentation
Defined fuzzy unification, which is useful in open systems

Slide 42
Part I.c: Vivid Agent
A vivid agent is a software-controlled system, whose state is represented by a knowledge base and whose behaviour is represented by action and reaction rules
Actions are planned and executed to achieve a goal
Reactions are triggered by events
  Epistemic RR: Effect <- Event, Cond
  Physical RR: Action, Effect <- Event, Cond
  Interaction RR: Msg, Effect <- Event, Cond

Slide 43
Vivid Agent
[Diagram: agent architecture with an Interface, Perception (events), a Reaction Cycle (reaction rules), a Planner (goals, action rules, intentions) and a KB (beliefs, updates)]

Slide 44
Agent State and Transition Semantics
Agent State: Event queue, Plan queue, Goal queue, Knowledge base
Transition semantics
  Perception: Add event to agent’s event queue
  Reaction: Pop event from event queue, execute reactions including update of knowledge base
  Plan execution: Execute action of plan in plan queue
  Replanning: If action fails, replan
  Planning: Pop goal from goal queue and generate plan

Slide 45
Implementation in Prova
Original implementation in PVM-Prolog
  Coarse-grain parallelism (PVM) for each agent and Prolog threads for an agent’s components
Currently: Prova is a Java-based rule engine
  easy integration of all kinds of data sources, e.g., databases, web services, etc.

Slide 46
Part II: Application to Bioinformatics
NSF and EU’s strategic research workshop found that bioinformatics could play the role for the semantic web, which physics played for the web.
Why?
Masses of information
Masses of publicly accessible online information (e.g.
8000 abstracts per month and over 500 tools)
Data (more and more often) published in XML
Data standards are accepted and actively developed
Much valuable information scattered (as production is cheap and hence not centralised)
Systems integration and interoperation prime concern (e.g. GeneOntology)

Slide 47
Example: Information Agents for…
… Protein interactions: PDB, SCOP
… Protein annotation: TOPPred, HMMTOP, …
[Diagram: Facilitator, Mediators, Wrappers, information sources]

Slide 48
Example 1: Protein Interaction:
PDB: Protein structures
SCOP: Structure classification

Slide 49
Example 1: PSIMAP: Structural Interactions

Slide 50
Example 1: Protein Interaction: How it is currently done
PDB: 15 Gigabyte in flat files
SCOP: 3 flat files
How?
  Download PDB, SCOP files
  Think up DB schema and populate MySQL DB
  Run some Perl scripts on various machines that grind through the data and analyse it
  Run some Java to visualise results
Problem: “Business logic” not separated

Slide 51
How our Prova system can run / execute
Declarative and executable specifications
Slide callouts on the rule below:
  Might be held locally in file, remotely from a DB, through a web service, on the grid, etc.
  Local or remote computation.
Interaction(Superfamily1, Superfamily2) if
  PDB(Protein),
  Domain(Protein,Domain1),
  Domain(Protein,Domain2),
  SCOP Superfamily(Domain1, Superfamily1),
  SCOP Superfamily(Domain2, Superfamily2),
  InteractionDD(Domain1,Domain2, 5 Ang, 5 Residues)
Separation of information integration workflow
Easier to maintain
Platform independence, because of Java
Flexible, optimized execution
Query optimization and load-balancing of computations

Slide 52
Actual Prova Code
  % ACTUAL PROVA CODE
  % Given the open database connection DB
  % and a unique protein identifier in Protein
  % Data Bank PDB_ID, test whether the provided
  % domains with IDs PXA and PXB interact
  % (have at least 5 atoms within 5 angstroms)
  scop_dom2dom(DB,PDB_ID,PXA,PXB) :-
      access_data(pdb,PDB_ID,Protein),
      scop_dom_atoms(DB,Protein,PXA,DomainA),
      scop_dom_atoms(DB,Protein,PXB,DomainB),
      DomainA.interacts(DomainB).

Slide 53
Caching
  % Two alternative rules for either retrieving data
  % from the cache or accessing the data from its
  % original location and caching it.
  access_data(Type,ID,Data,CacheData) :-
      % Attempt to retrieve the data
      Data=CacheData.get(ID),
      % Success, Data (whatever object it is) is returned
      !.
  access_data(Type,ID,Data,CacheData) :-
      % Retrieve the data from its location and update the cache
      retrieve_data_general(Type,ID,Data),
      update_cache(Type,ID,Data,CacheData).

Slide 54
Example 2: GoPubmed

Slide 55
Consistency of GO
Simple example:
Parsimony: If A is-a C is explicitly stated in the ontology, it should not be possible to derive it implicitly
I.e.
Don’t state A is-a C if you have already A is-a B and B is-a C
Done with Prova

Slide 56
Towards functional annotation through GoPubmed
[Table: protein names against enzyme activities, with an X marking each assigned activity; the positions of the X marks cannot be recovered from the transcript]
  Proteins: Pyruvate kinase M1 isozyme; CAMP dpt protein kinase type II regulatory chain; Galactokinase; Tropomyosin beta chain; HnRNP DO
  Enzyme activities: kinase, transferase, lyase, isomerase, oxidoreductase, hydrolase, cyclase, helicase, one other

Slide 57
Example 3: Consistent Integration of Protein Annotation

Slide 58
Conflicts

Slide 59
Example: Edit2TrEMBL
EditToTrEMBL (Steffen Möller, EBI): automate annotation of DNA sequences by combining results of various tools and databases, which are online
[Diagram: a Dispatcher passing Info objects to Analysers running on Hosts]

Slide 60
Challenge
Uncertain, incomplete, vague, contradictory information
Wrappers’ domains overlap:
  How can mediator resolve conflicts?
  How can mediator integrate information consistently?
  How can mediator improve info quality using overlapping info and inconsistencies?
Common Problem: Overlapping information can lead to inconsistencies
Solution: Semantic consistency checking
Mediator contains conflict resolution component
Semantic conflict resolution requires domain knowledge to identify conflicts
We use extended logic programming
[Diagram: Facilitator, Mediators, Wrappers, Sources]

Slide 61
Modelling domain knowledge
Facts, Rules, Assumptions, Integrity Constraints
For example:
The length of transmembrane regions is limited:
  false if ft(AccNo,transmembrane,From,To), To-From >25
  false if ft(AccNo,transmembrane,From,To), To-From <15
Maximal difference in membrane borders:
  false if ft(Agent1,Acc,transmembrane,From1,To1),
           ft(Agent2,Acc,transmembrane,From2,To2),
           (From1>From2,From1<To2;To1>From2,To1<To2),
           (abs(From2-From1)>4;abs(To2-To1)>4).
Assessment of predictions:
  probability(ft(tmhmm,p12345,transmem,6,26), 0.5)

Slide 62
REVISE
REVISE detects conflicting arguments and computes a minimal set of assumptions whose removal resolves the conflict
Dropping these assumptions yields minimal consistent annotation of all predictions
Minimality is based on probabilities given as part of predictions
  alternative: cardinality, set-inclusion

Slide 63
Vision: A semantic Grid for Bioinformatics
BioNet Explorer
Interaction Space: PSIMAP
Expression Space: Space Explorer
Pathway Space:
Literature Space: Classification Server

Slide 64
Conclusion
Advanced applications on the web will require rules and reasoning
Part I:
  Argumentation is an elegant way of defining semantics
  Classification of various new and existing semantics
  Fuzzy reasoning and unification
  Reactivity with vivid agents and Prova
Part II:
  Bioinformatics requires a semantic web and the semantic web requires bioinformatics

Slide 65
Acknowledgment
Ralf Schweimeier (Argumentation
semantics)
Panos Dafas, Dan Bolser (PSIMAP)
Steffen Moeller (Edit2Trembl)
David Gilbert (Fuzzy Unification)
Ralph Delfs, Alexander Kozlenkov (Go, Prova)
Carlos Damasio (REVISE)
More information at comas.soi.city.ac.uk
Email: [email protected]