OntoGen: Semi-Automatic Ontology Editor
http://ontogen.ijs.si
Blaz Fortuna, Marko Grobelnik, Dunja Mladenic
Jozef Stefan Institute, Slovenia
HCII2007, July 26th

Outline
- Motivation
- Functionality
- Conclusion
Motivation

What is an ontology?
An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts. Generally, it consists of:
- Classes: sets, collections, or types of objects
- Instances: the basic or "ground level" objects
- Relations: ways in which objects can be related to one another
It can be used as a schema for a knowledge management system, to reason about the objects within that domain, etc.

Sample Ontology
(Figure: a sample ontology.)

Creating an Ontology
An ontology is normally designed by knowledge engineers using ontology editors such as Protégé, OntoStudio, …
- Domain experts are needed to help the knowledge engineer understand the domain.
- Ontology editors are not aware of the ontology's domain.
Our goal is to make the ontology editor easy to use and domain-aware, so that it can be used by domain experts themselves, reducing the need for a knowledge engineer. This is done through the use of text mining and machine learning.
(Figure: a domain expert and a knowledge engineer working with an ontology editor on a collection of domain documents, such as company descriptions of Xerox Corporation, Yahoo! Inc. and The Washington Post Company.)
In this presentation we focus on the construction of topic ontologies.

How does it work?
- OntoGen suggests concepts. Suggestions are generated automatically, either from the text corpus by clustering similar documents or based on a user query. The user selects appropriate suggestions and adds them to the ontology.
- OntoGen helps in deciding which suggestions to include:
  - through a map of the text corpus,
  - by extracting the main keywords from the documents,
  - with ontology and concept visualizations,
  - by listing the documents behind each concept.
- Behind each concept there is a set of documents. Documents are assigned to concepts automatically, and the assignments can be edited manually.

Example
(Figure: a domain's text corpus and the ontology built from it, with Concept A, Concept B and Concept C.)

Functionality

Main Features
- Interactive user interface: the user can interact in real time with the integrated machine learning and text mining methods.
- Concept discovery methods:
  - Unsupervised: the system provides suggestions.
  - Supervised: concept learning.
- Methods that help in understanding the discovered concepts:
  - Keyword extraction: generates a list of characteristic keywords of a given concept.
  - Concept visualization: creates a map of documents from a given concept. Also available as a separate tool named Document Atlas (http://docatlas.ijs.si).

Main view
(Screenshot labels: ontology visualization, concept hierarchy, list of suggested sub-concepts, selected concept.)

Concept suggestion
(Screenshot labels: selected concept, suggested sub-concepts, "Add new concept", new concept.)

Personalized suggestions
(Screenshot: a Countries view and a Topics view of a news collection, with example articles "Lloyd's CEO questioned in recovery suit in U.S." and "UK takeovers and mergers".)

Concept learning
(Screenshot: a query is used to learn a new concept.)

Concept's instances visualization
- Instances are visualized as points on a 2D map.
- The distance between two instances on the map corresponds to their content similarity.
- Characteristic keywords are shown for all parts of the map.
- The user can select groups of instances on the map to create sub-concepts.
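The "How does it work?" slide describes OntoGen's unsupervised suggestions only at a high level: similar documents are clustered into candidate sub-concepts, main keywords are extracted to describe them, and documents are assigned to concepts automatically. The following Python sketch illustrates such a pipeline under explicit assumptions: documents are TF-IDF bag-of-words vectors, clustering is spherical k-means seeded with the first k documents, a concept's keywords are the highest-weighted terms of its centroid, and a new document goes to the most similar centroid. The stop-word list, the toy corpus and all function names (`tfidf_vectors`, `kmeans`, `keywords`, `nearest` and the other helpers) are hypothetical and are not taken from OntoGen's code.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "as", "after", "before", "in", "of", "the", "to"}


def tokenize(text):
    """Lower-case alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return unit-length TF-IDF vectors for docs plus the IDF table."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in df}
    return [normalize({t: n * idf[t] for t, n in c.items()}) for c in counts], idf


def vectorize(text, idf):
    """Project an unseen document onto the corpus vocabulary (unknown terms dropped)."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return normalize({t: n * idf[t] for t, n in counts.items()})


def cosine(a, b):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Normalized sum (mean direction) of a group of document vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return normalize(dict(total))


def nearest(vec, centroids):
    """Index of the most similar centroid (ties go to the lower index)."""
    return max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))


def kmeans(vectors, k, iterations=20):
    """Spherical k-means; seeds are simply the first k documents."""
    centroids = [dict(v) for v in vectors[:k]]
    assignment = None
    for _ in range(iterations):
        new_assignment = [nearest(v, centroids) for v in vectors]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = centroid(members)
    return assignment, centroids


def keywords(centroid_vec, n=3):
    """Highest-weighted centroid terms, used as a label for a suggested concept."""
    ranked = sorted(centroid_vec.items(), key=lambda tw: (-tw[1], tw[0]))
    return [t for t, _ in ranked[:n]]


# Hypothetical toy corpus: two finance and two football headlines.
DOCS = [
    "bank shares fell as the market reacted to interest rates",
    "the striker scored a late goal in the football match",
    "investors sold bank shares after the rates decision",
    "the football club signed a new striker before the match",
]

if __name__ == "__main__":
    vecs, idf = tfidf_vectors(DOCS)
    assignment, centroids = kmeans(vecs, k=2)
    for c, cen in enumerate(centroids):
        print(c, keywords(cen))  # candidate sub-concepts with keyword labels
    print(nearest(vectorize("central bank raises rates", idf), centroids))
```

On the toy corpus, the two finance headlines and the two football headlines fall into separate clusters labelled by their shared terms, and the unseen headline "central bank raises rates" is assigned to the finance cluster. Seeding with the first k documents keeps the example deterministic; a practical system would more likely use a better initialization such as k-means++ or several restarts.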
Concept management
(Screenshot labels: selected concept, selected instance, concept's details, keywords, concept's instance management.)

Adding new documents to the ontology
(Screenshot labels: new documents, selected document, content of the selected document, classification of the selected document.)

Conclusions

Evaluation
- The first prototype was successfully used in several commercial projects:
  - It was applied in multiple domains: business, legislation and digital libraries.
  - Users were always domain experts with limited knowledge of and experience with ontology construction / knowledge engineering.
  - Valuable data from the first trials was used as input for the interface design of the second prototype (the one presented here).
- Feedback from the users of the second prototype:
  - The main impression was that the tool saves time and is especially useful when working with large collections of documents.
  - Among the main disadvantages were abstraction and an unattractive look.
  - Many users use the program to explore the data.

Future work
- Tools for suggesting and learning more complex relations
- Extended support for collaborative editing of ontologies
- Easier input of background knowledge
- Improvement of the user interface based on feedback from user trials and real-world users

Thank you for listening! Questions? Comments?
http://ontogen.ijs.si