Transcript Slide 1
Semantic Matching and Applications Dr. Virendrakumar (Virendra) C. Bhavsar Professor (Dean 2003-2008) Faculty of Computer Science University of New Brunswick (UNB) Fredericton, Canada [email protected] (Thanks - Lu Yang, Biplab Sarker, Harold Boley) 1 Faculty of Computer Science The First “Faculty of CS” in Canada University of New Brunswick Fredericton, New Brunswick Canada Oldest English Language University in Canada Established in 1785 2 3 4 Outline • Motivation • Syntactic Matching • Semantic Matching -Concept Similarity in a Taxonomy -Schema Matching • Weighted Tree Representation • Partonomy Tree Similarity Algorithm • Non-semantic matching on nodes • Semantic Matching • Inner nodes vs. leaf nodes • Global similarity measure (inner nodes) • Local similarity measures (for leaf nodes) • Conclusion 5 Syntactic Matching Exact String Matching • Binary result 0.0 or 1.0 Permutation of strings “Java Programming” versus “Programming in Java” Number of identical words Maximum length of the two strings Example 1 For two node labels “a b c” and “a b d e”, their similarity is: 2 = 0.5 4 6 Syntactic Matching Example 2: Node labels “electric chair” and “committee chair” 1 = 0.5 2 meaningful? • Syntactic Matching does not consider additional domain knowledge •Semantic matching techniques are needed for the above problems 7 Semantic Matching Applications • Databases • e-Business • Case-Based Reasoning (CBR) • Information Retrieval • Web Services • Natural Language Processing • Information Integration • Other applications 8 e-Business Applications Main Server Agents User Info User Profiles … User Agents … … User Web Browser To other sites (network) e-Market … Matcher1 Matchern • e-business, e-learning … • Buyer-Seller matching • Metadata for buyers and sellers • Keywords/keyphrases 9 Concept Similarity in a Taxonomy • Taxonomy A taxonomy is a scheme that partitions a body of knowledge and defines the relationships among the pieces Taxonomy A B Given a taxonomy and two concepts (e.g., A and B), find the semantic similarity (distance) of the two concepts 10 Semantic Matching Techniques • Schema Matching • Element Matching • Concept Matching 11 Schema Matching • Schema A schema is simply a set of elements connected by some structure • Schema matching (E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal 10:334-350, 2001) • Takes two schemas as input and produces a mapping between elements of the two schemas that correspond semantically to each other • Applications • Schema integration • Data warehouses • E-Commerce • Semantic query processing 12 Concept Similarity in a Taxonomy ─ WordNet • Semantic knowledge base: WordNet (G.. Miller. Nouns in WordNet: A Lexical Inheritance System. International Journal of Lexicography, 1990, 3(4):245-264.) WordNet is the product of a research project at Princeton University which has attempted to model the lexical knowledge of a native speaker of English • Information in WordNet is organized around logical groupings called synsets • Each synset consists of synonymous word forms and semantic pointers that describe relationships between current synset and other synsets 13 Concept Similarity in a Taxonomy (Cont’d) • A taxonomy is represented as a tree or a graph • IS-A/HAS A (Contributes 80% among all the relationships) • Part-of/Has-Part • Classification of the semantic concept similarity measures • Edge-based approaches — shortest path length • Node-based approaches — node information content • Combination of edge-based and node-based approaches • Edge-weight assignment 14 Concept Similarity in a Taxonomy ─ Two Classical Algorithms (I) • Edge-based approach — shortest path length (R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and Application of a Metric on Semantic Nets. IEEE Transactions on Systems, Man, and Cybernetics. 1989, 19(1):17-30) • Applied to the medical domain • Intuition The more similar two concepts are, the smaller the concept distance is between them • Distance (A, B) = minimum number of edges separating a and b where, A and B are two concepts represented by the nodes a and b in an IS-A semantic net 15 Concept Similarity in a Taxonomy ─ Two Classical Algorithms (I) • Problems with the Rada et al. algorithm • For larger or more general domains, this algorithm leads to less accurate results than expected (Resnik et al. 1995, 1999) • The distances between any two adjacent nodes are not necessarily equal (Richardson & Smeaton 1995) • It relies on the notion that links in a taxonomy represent uniform distances (Wu & Palmer 1995, Richardson & Smeaton 1995, Corley & Mihalcea 2005) • Certain sub-taxonomies might be much denser than others • Only IS-A relationship was considered (Resnik 1999) 16 Concept Similarity in a Taxonomy ─ Two Classical Algorithms (II) • Node-Based Approach — Node Information Content (Resnik et al.1995, 1999) Goal: To solve the problem that certain sub-taxonomies are much denser than others in a taxonomy • Intuition One key to the similarity of two concepts is the extent to which they share information in common, indicated in an IS-A taxonomy by a highly specific concept that subsumes them both • Method Associating probabilities with concepts in the taxonomy function p: C→[0,1], such that for any c in C, p(c) is the probability of encountering an instance of concept c => p is monotonic as one moves up the taxonomy: if c1 IS-A c2, then p(c1) ≤ p(c2) 17 Concept Similarity in a Taxonomy ─ Two Classical Algorithms (II) • Probability computation of a concept freq(c) = ∑ count(n) n in words(c) where words(c) is the set of words subsumed by concept c p(c) = freq(c)/N where N is the total number of words observed • Quantification of the information content* -logp(c) As probability increases, informativeness decreases, so the more abstract a concept, the lower its information content • Similarity computation sim(c1, c2) = max [-logp(c)] c in S(c1, c2) where S(c1, c2) is the set of concepts that subsume both c1 and c2 * S. Ross. A First course in Probability. Macmillan, 1976. 18 Concept Similarity in a Taxonomy ─ Two Classical Algorithms (II) • Problems of the information content based approach Example (Richardson & Smeaton, 1995) {Produce, Green goods} 3.034 {Fruit} 3.374 {Apple} 3.945 {Boxberry} 7.576 {Berry} 4.907 {Banana} 5.267 {Cranberry} 6.285 • It does not differentiate the similarity values of any pair of concepts in a sub-hierarchy as long as their “smallest common ancestor” is the same e.g. sim(Apple, Berry) = sim(Apple, Banana) = 3.374 19 Concept Similarity in a Taxonomy ─ Two Classical Algorithms (II) • It only considers the IS-A links too • Structure information is overlooked • It derives coarse similarity results 20 Concept Similarity in a Taxonomy ─ Other Algorithms length * (1) Sim log 2* D length is the length of the shortest path between two concepts and D is the maximum depth of the taxonomy 2 * N3 (2)Sim N1 N2 2 * N3 N3 (Wu & Palmer, 1994) N1 and N2 are the numbers of nodes from c1 and c2 to their most specific common superconcept c3; N3 is the number of nodes from c3 to the root of the taxonomy (3)Sim Root 2 * IC( LCS ) IC(concept1) IC(concept2 ) c3 N1 c1 N2 c2 (Lin, 1998) IC is the information content computed by [4]; LCS is the Least Common Subsumer 21 * C. Leacock and M. Chodorow. Combining Local context and WordNet Sense Similarity for Word Sense Disambiguation. In WordNet, An Electronic Lexical Database. The MIT Press, 1998. Concept Similarity in a Taxonomy ─ Other Algorithms (Cont’d) (4)Sim 1 IC(concept1) IC(concept2 ) 2 * IC( LCS ) (Jiang-Conrath, 1997) • (Li-Bandar-McLean, 2003) • The main problem of previous algorithms: Either information sources are directly used as metric of similarity or a method uses a particular information source without considering the contribution of others. • Intuition Concepts at upper layers of the hierarchy have more general semantics and less similarity between them, while concepts at lower layers have more concrete semantics and stronger similarity. This leads to the consideration of concept depth. • Three factors that affect concept similarity Path length, depth, and density 22 Concept Similarity in a Taxonomy ─ Other Algorithms (Cont’d) • The similarity of two concepts is represented by the function s(c1, c2) = f ( f1(l), f2(h), f3(d)) where l, h, and d are the shortest path length, the depth of the nearest common subsumer of c1 and c2, and the local density of c1 and c2, respectively f1(l ) el α is a constant eh eh f 2 (h) h h e e β>0 esim(c1 ,c2 ) esim(c1 ,c2 ) f3 (sim) sim(c ,c ) sim(c ,c ) 1 2 e 1 2 e λ>0 After integration and evaluation: eh eh s(c1, c2 ) f1(l ) f 2 (h) el h e eh was reported as the best one 23 Concept Similarity in a Taxonomy ─ Evaluation of the Algorithms • Basic idea Investigates the algorithms’ performance against human common sense • Benchmark word set A commonly used word set is from an experiment by Rubenstein and Goodenough1 (51/65) and 26 years later by Miller and Charles2(38/30). Subjects rated the semantic similarity values from 0 to 4 • Reference results Similarity method Correlation Resnik Replication to Miller-Charles 0.9583 Li-Bandar-McLean 0.8914 Jiang-Conrath 0.8484 Lin 0.8213 Resnik (Information Content) 0.745 Rada (Edge-Counting) 0.664 24 1 2 H. Rubenstein and J. B. Goodenough. Contextual Correlates of Synonymy. Comm. ACM. 1965. 8:627-633. G. a. Miller and W. G. Charles. Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 1991, 6(1):1-28. Concept Similarity in a Taxonomy ─ Edge-Weight Assignment • Most of the aspects that should be considered are the structural characteristics (Jiang & Conrath, 1995) • Local network density One easy way to measure the local density is to find the number of children of a given node. The greater the density, the closer the distance between the nodes Node depth Distance shrinks as one descends the hierarchy, since differentiation is based on finer and finer details • Type of link • Link strength The closeness between a child node and its parent node, against those siblings 25 Schema Matching • Schema representations Entity-Relationship (ER) model, object-oriented (OO) model, XML and directed graphs • Element vs. structure Match is performed for individual schema elements or for combinations of elements 26 Schema Matching ─ Approaches • Cupid (Madhavn et al. 2001) • Intended to be generic across data models • Three phrases • Linguistic element-level matching • Transformation from the original schema to a tree and the computation of the structural similarity between pairs of elements • Mapping between elements • Similarity flooding (Melnik et al. 2002) • A graph matching algorithm • Schemas are converted into directed labeled graphs • The element-level mapping is fed into the SF matcher • Filters are applied to select relevant subsets of match results 27 Schema Matching ─ Approaches (Cont’d) • COMA (Do & Rahm 2002) • Represents schemas as rooted directed acyclic graphs • Combines match algorithms in a flexible way • Represents a generic match system supporting different applications and multiple schema types such as XML and relational schemas • Allows an interactive and iterative match process • Reuses the previously obtained match results 28 Recap • Concept similarity matching in a taxonomy • Edge-based • Node-based • Considerations • Path length • Node information content • Node depth • Edge weight • Semantic matching are applicable in many applications • Schema matching • Structure matching • Element matching 29 Motivation • More and more on-line transactions (e.g. e-Bay, kajiji, etc.) • Buyers and sellers input key words and/or specify values for some product features • A list of recommended sellers (with product advertisements) and/or buyers (with product requests) is presented • Flat representation of products cannot represent the hierarchical ‘part-of’ relationship of product parts • Match-making is not precise • Negotiation space is big 30 e-Business Applications Main Server Agents User Info User Profiles … … User Web Browser User Agents … To other sites (network) e-Market … Matcher1 Matchern • e-business, e-learning … • Buyer-Seller matching •Ranked pairings of Buyers and Sellers • Metadata for buyers and sellers • Keywords/keyphrases 31 Match-Making • Web service discovery [Field and Hoffner 2003] [Kawamura et al. 2001] [Trastour et al. 2001] • Multi-agent systems [Bertels et al. 2004] [Decker et al. 1996] [Marsh et al. 2003] [Ogston and Vassiliadis 2001] • Database schema, XML schema, and Data integration [Melnik et al. 2001] [Madhavan et al. 2001] • Ontology integration [Jiang and Conrath 1997] [Maguitman et al. 2005] [Sussan 1993] •…… 32 Partonomy Tree Similarity Algorithm ─ Tree Representation • Tree representation for product/service descriptions [Bhavsar et al. 2004] • Characteristics of our trees • Node-labled, arc-labled and arc-weighted • Sibling arcs are labled in lexicographical order • Sibling arc weights sum to 1.0 A simple example “Car” tree: Car Year Color 0.3 Make 0.5 0.2 Black Ford 2002 33 Seller Weights (1) • Advertisements on TV, Internet, and in newspapers • Sellers emphasize specific product/service features to attract buyers • Sellers seek buyers having preferences close to their products 34 Seller Weights (2) • No weights seller Car Color 1/3 Make 1/3 Red Ford Year 1/3 2002 35 Seller Weights (3) • Weights are not specified buyer 0.7834 Car Year 0.8 Color 0.1 seller Car Color 1/3 Make 1/3 Make 0.1 Black Ford 2002 Red Ford Year 1/3 2002 36 Seller Weights (4) • Sellers give weights • All the seller trees below are identical except the arc weights 0.925 buyer Car Color 0.1 Black Make 0.1 Year 0.8 Ford 2002 seller1 Car Color 0.05 Red 0.85 Color 0.2 Red Make 0.05 Ford seller3 Car Car Ford 2002 0.65 seller2 Make 0.2 Year 0.9 Year 0.6 2002 Color 0.6 Red Make 0.1 Ford Year 0.3 2002 37 Seller Weights (5) • Using seller weights, both buyers and sellers can find the most promising trading partners • The negotiation space is decreased • Sellers can always select the same weight for all fanout branches if they do not want to emphasize any attributes of their products/services 38 Tree Similarity/Distance (1) Some important algorithms in literature: • Node-labeled tree edit distance [Lu 1979] [Wang et al. 1994] [Shasha et al. 2001] • Insertion, deletion and substitution • Minimum cost to transform one tree to another one • ATreeGrep [Shasha et al. 2002] • Approximate tree searching • Only nodes are labeled • Siblings cannot be identical • Schema tree matching — Cupid [Madhavan et al. 2001] • Transformation from the original schema to a tree and computation of the structural similarity between pairs of elements •…… 39 Partonomy Tree Similarity Algorithm ─ Similarity Algorithm • Partonomy similarity [Bhavsar et al. 2004] Fragments of learning object trees [Boley et al. 2005] for learning object matching (http://www.cs.unb.ca/agentmatcher) educational 0.3334 edu-set general 0.5 format 0.5 tec-set platform 0.5 technical 0.7 0.3 0.3333 gen-set 0.5 general technical 0.3333 language title lom t´ lom t gen-set language title format 0.2 0.8 tec-set platform 0.1 0.9 * WinXP en en Introduction HTML WinXP to Oracle Basic Oracle * : Don’t Care (si (wi + w'i)/2) A(si) ≥ si (A(si)(wi + w'i)/2) 40 Semantic Matching • Inner nodes versus Leaf nodes • Inner nodes — class-oriented Inner node labels can be classes • Classes are located in a taxonomy tree • Taxonomic class similarity measure (global similarity measure) • Leaf nodes — type-oriented • Address, currency, date, price and so on • Type similarity measures (local similarity measures) 41 Semantic Matching (Cont'd) Non-Semantic Matching Exact String Matching (both inner and leaf nodes) String Permutation (both inner and leaf nodes) Semantic Matching Taxonomic Class Similarity (inner nodes) Type Similarity (leaf nodes) 42 Semantic Matching ─ Global Similarity • Global similarity measure (for inner nodes) [Yang et al. 2005] Distributed Programming Tuition Credit 0.2 Duration Textbook0.4 0.1 0.3 2months “Introduction $800 3 to Distributed Programming” Object-Oriented Programming Tuition Credit 0.1 Duration Textbook0.2 0.5 0.2 $1000 3months “Objected-Oriented 3 Programming Essentials” t1 t2 Partonomy trees 43 Tree Similarity Algorithm Development — Tree Similarity Function • Desired General Properties • Similarity function maps to [0.0, 1.0] • Reflexivity sim(t,t) =1.0 • Symmetry (could be relaxed) sim(t,t’) = sim(t’,t) • Negative properties • The following two conditions do not hold • If sim(x,y) = 1.0, then x = y • “Triangle Inequality” [sim(x,y)+ sim(y,z)] sim(x,z) >= sim(x,y) sim(y,z) 44 Tree Similarity Algorithm Development — Tree Similarity Function • Desired Special Properties • • • • Breadth/Depth Robustness Subtree Aggregation Taxonomic class similarity of inner node labels Local similarity of leaf node labels Select a tree, t, in the tree data set as the reference tree, the similarity between t and the trees in the data set: Similarity 1.0 Similarity approaches 0.0 0.0 t Trees 45 Tree Similarity Algorithm Development — Tree Similarity Algorithm Desired properties • Realize all desired properties of the function • (Recursive) Top-down traversal via subtree pairs under identically labeled arcs • Bottom-up similarity computation • Combining weights of identically labeled arcs • Inner node similarity • Node-label similarity — syntactic or semantic • Aggregation of subtree similarities • Leaf node similarity — syntactic or semantic 46 Tree Similarity/Distance (2) • Arc-weighted tree distance [Amenta et al. 2007] • Point representation of trees • Arcs can be weighted/unweighted • Only leaf nodes are labeled • No arc labels • Robinson-Foulds distance is employed 47 Tree Simplicity • Buyers and sellers may omit some product/service features, which lead to missing (sub)trees in product/service trees • Calculate the simplicity of the missing (sub)tree for the similarity computation • The simpler the missing (sub)tree, the greater its similarity and the empty tree 48 Tree Simplicity • Desired properties A tree simplicity function on a tree t is a real valued function simp(t) → R[0.0, 1.0] with the following properties: 1) Breadth property simp[bt] ]: N1→ R[0.0, 1.0] bt is the fanout of tree t. For two trees t and t': bt > bt' => simp(t) < simp(t') The simplicity of a tree monotonically decreases with increasing tree breadth. 2) Depth property simp[dt]]: N0→ R[0.0, 1.0] dt is the depth of tree t. For two trees t and t': dt > dt' => simp(t) < simp(t') The simplicity of a tree monotonically decreases with increasing tree depth. 49 Semantic Matching ─ A Taxonomy Tree • The taxonomy tree of “Programming Techniques” according to the ACM Computing Classification System (http://www.acm.org/class/1998/ccs98.txt) Programming Techniques 0.6 0.7 Object-Oriented 0.9 0.8 0.5 Programming 0.5 General Concurrent Programming Sequential Applicative Automatic 0.7 0.5 Programming Programming Programming Parallel Distributed Programming Programming 50 Semantic Matching ─ Taxonomic Class Similarity • The arc weights can be determined by human experts or machine learning algorithms [Singh 2005] • Sibling arc weights do not need to add up to 1 • Three factors that affect the taxonomic class similarity • The shortest path length between two classes • Arc weights on the shortest path • Level difference of two classes 51 Semantic Matching ─ Taxonomic Class Similarity • Taxonomic class similarity computation [Yang et al. 2005] dc1 dc2 Ns TS (c1 , c2 ) (1 ) * M *G Nt where TS(c1, c2) is the taxonomic class similarity of classes c1 and c2 Ns: the number of edges of the shortest path Nt: the number of edges of the whole tree M: the product of the arc weights on the shortest path d c1 d c2 : the level difference factor where G’s value is in (0.0, 1.0) and | dc1 dc2 | is the absolute difference of the depths of classes c1 and c2 (We assume G=0.5 here) G 52 Semantic Matching ─ Taxonomic Class Similarity Example Programming Techniques 0.7 0.6 General 0.8 0.5 0.5 Applicative Automatic 0.7 Programming Programming 0.9 Object-Oriented Programming Concurrent Programming Sequential 0.5 Programming Parallel Distributed Programming Programming • red arrows stop at their nearest common ancestor TS (Distributed Programming, Object - Oriented Programming) 3 2 1 (1 ) * (0.7 * 0.5 * 0.7) * 0.5 0.0766 8 53 Semantic Matching ─ Encoding Subtaxonomies • Encoding subtaxonomy trees into partonomy trees • A converse task Computes the similarity of pairs of taxonomies e.g. subtaxonomies of the background taxonomy, as required in our Teclantic project (http://teclantic.cs.unb.ca) • Allows the direct reuse of our partonomy similarity algorithm and permits weighted (or ‘fuzzy’) taxonomic subsumption with no added effort 54 Semantic Matching ─ Encoding Subtaxonomies Background Taxonomy tree of “Programming Techniques” for encoding Programming Techniques Applicative Concurrent Object-Oriented Programming Programming Programming Automatic 0.1 0.1 0.15 Programming General 0.3 0.2 * * * * * Distributed Parallel Programming Programming 0.4 0.6 * Sequential Programming 0.15 * * • Sibling arc weights must sum up to 1.0 • Classes are represented as arc labels (lexicographical ordered) • All node labels except the root node label are changed into “Don’t Care” 55 Semantic Matching ─ Encoding Subtaxonomies Two course trees with encoded subtaxonomy trees course course Classification Tuition Duration Title 0.05 0.65 Credit 0.15 0.1 taxonomy $800 0.05 2months 3 Programming 1.0 Techniques * Sequential Concurrent Programming Programming 0.3 0.7 * * Parallel Distributed Programming Programming 0.4 0.6 * Distributed Programming Classification Tuition Duration 0.65 Title 0.05 Credit 0.05 0.05 taxonomy $1000 0.2 3months 3 Programming 1.0 Techniques * Sequential Object-Oriented Programming Programming 0.2 0.8 * * Object-Oriented Programming * • Weight assignment in the "Classification" branch (two options) • By human expert • By machine learning • Normalizes corresponding weights in the background taxonomy 56 Semantic Matching ─ Local Similarity • Local similarity measures (for leaf nodes) Special-purpose similarity measures for various data types realizing semantics to be invoked when computing similarity of any two of their instances • “Price” type • “Date” type [Yang et al. 2005] •... 57 Semantic Matching ─ Price Matching • Price • Price is the omnipresent factor that determines buyers’ and sellers’ decision-making • Price similarity seems to be asymmetric for buyers and sellers e.g. buyer asks $800 and seller asks $1000 — Unsuccessful buyer asks $1000 and seller asks $800 — Successful The similarity of $800 and $1000 is different for the above cases 58 Semantic Matching ─ Price Matching • Transform the asymmetry to symmetry • Buyers and sellers always have price ranges in their minds [Bpref, Bmax] and [Smin, Spref] Bpref : buyer’s preferred price Bmax : buyer’s maximum acceptable price Smin : seller’s minimum acceptable price Spref : seller’s preferred price • Our price-range similarity measure is based on the intuition that the greater the overlap between the buyer’s and seller’s price ranges, the higher is their similarity value 59 Semantic Matching ─ Price Matching Algorithm • Pseudo code of the price-range similarity algorithm PriceRangeSim ([Bpref, Bmax], [Smin, Spref]) Begin If Spref <= Bpref similarity = 1.0 else if Bmax < Smin similarity = 0.0 else if Bmax = Smin similarity = 0.005 MAX MIN else { MIN = min{MIN, Smin} MAX = max{MAX, Bmax} similarity = Bmax Smin MAX MIN } return similarity End. • This algorithm can be easily adapted to the “price”-typed attributes e.g. “salary range” in job seeking and recruiting e-Market 60 Semantic Matching ─ Date Matching • “Date”-typed leaf node similarity measure if | d1 – d2 | ≥ 365 0.0 { DS(d1, d2) = 1– | d1 – d2 | 365 otherwise where DS(d1, d2) is the date similarity of two dates d1 and d2 Project Project start_date 0.5 end_date 0.5 Nov 3, 2004 May 3, 2004 t1 start_date 0.5 0.74 end_date 0.5 Jan 20, 2004 Feb 18, 2005 t2 61 Evaluation • Analytic study of algorithmic properties (for special cases) • Experiments to assess properties of functions algorithms • Comparison with other algorithms Arc-weighted tree distance [Amenta et al. 2007] Weighted keywords/phrases similarity [Marsh 2003] Online hotel search [Niemann 2006] - and 62 Implementation, Testing and Evaluation • Implementation • In Java • Testing on systematically varied cases • Evaluation • Experiments to assess properties of algorithms • Analytic study of algorithmic properties • Show the behavior of the algorithms with increasing breadth and depth 63 Implementation, Testing and Evaluation • Comparison with other algorithms • Arc-weighted tree distance [Amenta et al. 2007] • Weighted keywords/phrases [Marsh et al. 2003] • ATreeGrep [Shasha et al. 2002] • Node-labeled tree edit distance [Lu 1979] 64 Partonomy Tree Similarity Algorithm ─ Application • eduSource project prefilter parameters (Query URI) user input (1) End user UI (Java) Search Results (3) (4) (5) WOO (2) RuleML file Similarity Engine (Java) prefiltered WOO RuleML (6) (7) CanCore files files Translator (XSLT) Prefilter (SQL) CanCore files partial CanCore files CANLOM (XML) (c) LOMGen (Java) HTML files (a) LOR (HTML) (b) Recommended results (8) Administrator input DATABASE (Access) Administrator Keyword Table 65 Partonomy Tree Similarity Algorithm ─ Application • Teclantic protal http://www.teclantic.ca •ca) 66 Conclusion • Weighted trees for product/service descriptions • Partonomy tree similarity algorithm • Synchronously traverses trees top-down • Aggregates intermediate similarity values bottom-up • Semantic Global and Local Matching • Taxonomic Class Similarity • Encoding Subtaxonomies into Partonomies • Leaf-Node Type Similarity Measures • Future Work • Improvement of Taxonomic Class Similarity • Generalization of Local Similarity Measures 67 Publications 1. Lu Yang, B. K. Sarker, V. C. Bhavsar, and H. Boley. “Range Similarity and Satisfaction Measures for Buyers and Sellers in eMarketplaces”. Journal of Intelligent Systems, pp. 61-65, 2007. (Revised version of [2]) 2. Lu Yang, Biplab K. Sarker, Virendrakumar C. Bhavsar, and Harold Boley, "Range Similarity Measures between Buyers and Sellers in eMarketplaces", In Proceedings of the Second Indian International Conference on Artificial Intelligence, Pune, Dec. 20-22, 2005. 3. Lu Yang, Marcel Ball, Virendrakumar C. Bhavsar, and Harold Boley, "Weighted Partonomy-Taxonomy Trees with Local Similarity Measures for Semantic Buyer-Seller Match-Making". Journal of Business and Technology. 68 Publications 4. Harold Boley, Virendrakumar C. Bhavsar, David Hirtle, Anurag Singh, Zhongwei Sun, and Lu Yang, "A Match-Making System for Learners and Learning Objects", International Journal of Interactive Technology and Smart Education, 2(3), 2005. 5. Jing Jin, Biplab K. Sarker, V.C. Bhavsar and H. Boley, and Lu Yang, "Towards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison", In Proceedings of the 8th International Conference on High Performance Computing in Asia Pacific Region, IEEE Computer Society, December 2005. 6. Lu Yang, Marcel Ball, Virendrakumar C. Bhavsar, and Harold Boley, 69 Publications 7. Lu Yang, Biplab K. Sarker, Virendrakumar C. Bhavsar, and Harold Boley, "A Weighted-Tree Simplicity Algorithm for Similarity Matching of Partial Product Descriptions", In Proceedings of ISCA 14th International Conference on Intelligent and Adaptive Systems and Software Engineering, Toronto, 2005, pp.55-60. 8. Virendrakumar C. Bhavsar, Harold Boley, and Lu Yang, "A WeightedTree Similarity Algorithm for Multi-Agent Systems in e-Business Environments", Computational Intelligence, 20(4), pp.584-602, 2004. 9. Riyanarto Sarno, Lu Yang, Virendrakumar C. Bhavsar, and Harold Boley, "The AgentMatcher Architecture Applied to Power Grid Transactions", In Proceedings of the First International Workshop on Knowledge Grid and Grid Intelligence, Halifax, 2003, pp.92-99. 10.Virendrakumar C. Bhavsar, Harold Boley, and Lu Yang, "A Weighted-Tree Similarity Algorithm for Multi-Agent Systems in eBusiness Environments", In Proceedings of 2003 Business Agents and the Semantic Web (BASeWEB'03) Workshop, Halifax, Canada, June 14, 2003. 70 Theses Lu Yang, Weighted Tree Similarity, Ph.D. Thesis, in progress. Anurag Singh, “LOMGenIE: A Weighted September 2005. Tree Metadata Extraction Tool,” MCS Thesis, Mathhieu Sebastien, “Match-making in Bartering Scenarios,” MCS Thesis, December 2005. Jin Jin Jing, “Similarity of Weighted Directed Acyclic Graphs,” MCS Thesis, September 2006. 56. Jie Li, “Rule-based Social Networking for Expert Finding,” (co-supervisor: Dr. Harold Boley, NRC), MCS Thesis, September 2006. 71 Thank you ! 72 Semantic Matching Review - References [1]. R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and Application of a Metric on Semantic Nets. IEEE Transactions on Systems, Man, and Cybernetics. 1989, 19(1):17-30. [2]. M. Sussna. Word Sense Disambiguation for Free-text Indexing Using a Massive Semantic Network. Proceedings of the Second International conference on Information and Knowledge Management, Arlington, Virginia, 1993. [3]. Z. Wu and M. Palmer. Verb Semantics and Lexical Selection. Proceedings of the 32nd Annual Meeting of the Associations for Computational Linguistics, Las Cruces, New Mexico, 1994, 133–138. [4]. P. Resnik. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, August 1995, 1:448-453. [5]. R. Richardson, and A. F. Smeaton. Using WordNet in a Knowledge-Based Approach to Information Retrieval. Working Paper, CA-0395, School of Computer Applications, Dublin City University, Ireland, 1995. [6]. J. J. Jiang and D. W. Conrath. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. Proceedings of International Conference Research on Computational Linguistics (ROCLING X), Taiwan, 1997. [7]. D. Lin. An Information-Theoretical Definition of Similarity. Proceedings of the Fifth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 1998, 296-304. [8]. P.Resnik. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research, 1999, 11:95-130. [9]. Y. Li, Z. A. Bandar, and D. McLean. An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(4):871-882. [10]. C. Corley and R. Mihalcea. Measuring the Semantic Similarity of Texts. Proceedings of the ACL Workshop on empirical Modeling of Semantic Equivalence and Entailment, Ann Arbor, June 2005, 13-18. [11]. A. G. Maguitman, F. Menczer, H. Roinestad and A. Vespingnan. Algorithmic Detection of Semantic Similarity. Proceedings of WWW 2005, Chiba, Japan, May 10-14, 2005. [12]. E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal 10:334-350, 2001. [13]. J. Madhavan, P. A. Bernstein and E. Rahm. Generic schema matching with Cupid. Technical Report. Microsoft Research. August, 2001. [14]. S. Melnik, H. G. Molina and E. Rahm. Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching. In Proceedings of 18th Intl. Conf. on Data Engineering (ICDE), San Jose, CA, 2002. [15]. H.-H. Do and E. Rahm. COMA - A System for Flexible Combination of Schema Matching Approaches. The VLDB Journal, 2002. [16]. Institute of Electrical and Electronics Engineers. (1990) IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. New York, NY. [17] G. Miller. Nouns in WordNet: A Lexical Inheritance System. International Journal of Lexicography, 1990, 3(4):245-264. 73 References Amenta, N., M. Godwin, N. Postarnakevich, and K. St. John. (2007) Approximating Geodesic Tree Distance. Information Processing Letters 103, 61-65. Bertels, K., N. Panchanathan, and S. Vassiliadis. (2004) Centralized Matchmaking for Minimal Agents. Proceedings of the Conference on Parallel and Distributed Computer Systems, November, 2004. Decker, K., M. Williamson, and K. Sycara. (1996) Matchmaking and Brokering. Proceedings of the Second International Conference on Multi-Agent Systems. Do, H.-H., and E. Rahm. (2002) COMA--A System for Flexible Combination of Schema Matching Approaches. Proceedings of 28th International Conference on Very Large Databases, Hong Kong, 610-621. Field, S., and Y. Hoffner. (2003) Web services and matchmaking. Int. J. Networking and Virtual Organisations, 2(1), 16-32. Jiang, J. J., and D. W. Conrath. (1997) Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. Proceedings of International Conference Research on Computational Linguistics (ROCLING X), Taiwan. Kawamura, T., T. Hasegawa, and A. Ohsuga. (2001) Proposal of Semantics-based Web Service Matchmaking. International Conference on Computational Intelligence and Multimedia Applications. Lu, S. (1979) A tree-to-tree distance and its application to cluster analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 219-224. Madhavan, J., P. A. Bernstein and E. Rahm. Generic schema matching with Cupid. Technical Report. Microsoft Research. August, 2001. Maguitman, A. G., F. Menczer, H. Roinestad and A. Vespingnan. (2005) Algorithmic Detection of Semantic Similarity. Proceedings of WWW 2005, Chiba, Japan, May 10-14, 2005. Marsh, S., A. Ghorbani and V. C. Bhavsar. (2003) The ACORN Multi-Agent System, Web Intelligence and Agent Systems, IOS Press, Amsterdam, 1(1), 121. Melnik, S., H. Garcia-Molina and E. Rahm. (2001) Similarity Flooding: A Versatile Graph Matching Algorithm. Extended Technical Report, http://dbpubs.stanford.edu/pub/2001-25. Ogston, E., and S. Vassiliadis. (2001) Matchmaking among Minimal Agents without a Facilitator. Proceedings of the 5th International Conference on carnomous Agents, 608-615. Shasha, D., J. Wang, and K. Zhang. (1994) Exact and Approximate Algorithm for Unordered Tree Matching. IEEE Transactions on Systems, Man and Cybernetics, 24(4), 668-678. Shasha, D., J. Wang, and R. Giugno. (2002) Algorithms and Applications of Tree and Graph Searching. Proc. Of the 21th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 39-52. Sussna, M. (1993) Word Sense Disambiguation for Free-text Indexing Using a Massive Semantic Network. Proceedings of the Second International conference on Information and Knowledge Management, Arlington, Virginia. Trastour, D., C. Bartolini, and J. G-C. (2001) A Semantic Web Approach to Service Description for Matchmaking of Services. Internal report, HewlettPackard Company. Wang, J., K. Zhang, K. Jeong, and D. Shasha (1994) A System for Approximate Tree Matching. IEEE Transactions on Knowledge and Data Engineering. 6(4), 559-570. Yang, L., V. C. Bhavsar, and H. Boley. (2008) On Semantic Concept Similarity Methods. The 4th International Conference on Information & Communication Technology and Systems (ICTS), Indonesia (to appear). 74 References Rada, R., H. Mili, E. Bicknell, and M. Blettner. (1989) Development and Application of a Metric on Semantic Nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1), 17-30. Lin, D. (1998) An Information-Theoretical Definition of Similarity. Proceedings of the Fifth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 296-304. Jiang, J. J., and D. W. Conrath. (1997) Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. Proceedings of International Conference Research on Computational Linguistics (ROCLING X), Taiwan. Yang, L., B. K. Sarker, V. C. Bhavsar, and H. Boley. (2007) Range Similarity and Satisfaction Measures for Buyers and Sellers in e-Marketplaces. Journal of Intelligent Systems. Niemann, M., Malgorzata Mochol, and Robert Tolksdrof. (2006) Improving Online Hotel Search - What Do We Need Semantics for?. Semantics: The New Paradigm Shift in IT November 28 - 30, Vienna, Austria. 75