http://ontogen.ijs.si ONTOGEN SEMI-AUTOMATIC ONTOLOGY EDITOR Blaz Fortuna, Marko Grobelnik, Dunja Mladenic Jozef Stefan Institute Outline    Motivation Functionality Conclusion Blaz Fortuna, Jozef Stefan Institute, Slovenia HCII2007, July 26th.


http://ontogen.ijs.si
ONTOGEN
SEMI-AUTOMATIC
ONTOLOGY EDITOR
Blaz Fortuna, Marko Grobelnik, Dunja Mladenic
Jozef Stefan Institute
Outline
Motivation
Functionality
Conclusion
Blaz Fortuna, Jozef Stefan Institute, Slovenia
HCII2007, July 26th
Motivation
What is an ontology?
  An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts.
  It generally consists of:
    Classes: sets, collections, or types of objects
    Instances: the basic or "ground level" objects
    Relations: ways that objects can be related to one another
  It can be used:
    … as a schema for a knowledge management system,
    … to reason about the objects within that domain,
    etc.
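
The three building blocks listed above (classes, instances, relations) can be sketched as a small data structure. This is only an illustration, not OntoGen's internal representation: the names `Ontology`, `add_class`, `add_instance`, `relate`, and `instances_of` are hypothetical, and the example class, instance, and relation values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology holding classes, instances, and relations (illustrative only)."""
    classes: set = field(default_factory=set)
    instances: dict = field(default_factory=dict)  # instance name -> class name
    relations: set = field(default_factory=set)    # (subject, relation, object) triples

    def add_class(self, name):
        self.classes.add(name)

    def add_instance(self, name, class_name):
        if class_name not in self.classes:
            raise ValueError(f"unknown class: {class_name}")
        self.instances[name] = class_name

    def relate(self, subject, relation, obj):
        self.relations.add((subject, relation, obj))

    def instances_of(self, class_name):
        """Return the names of all instances of a class, sorted alphabetically."""
        return sorted(n for n, c in self.instances.items() if c == class_name)


# Made-up example data.
onto = Ontology()
onto.add_class("Company")
onto.add_class("Product")
onto.add_instance("Xerox", "Company")
onto.add_instance("Yahoo!", "Company")
onto.add_instance("Printer", "Product")
onto.relate("Xerox", "manufactures", "Printer")

print(onto.instances_of("Company"))
```

Storing relations as triples keeps the sketch close to the "ways that objects can be related" wording and makes simple lookups possible without any reasoning engine.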
Sample Ontology
Creating Ontology
  Ontology is normally designed by knowledge engineers using ontology editors:
    Protégé, OntoStudio, …
  Domain experts are needed to help the knowledge engineer understand the domain
  Ontology editors are not aware of the ontology's domain
  Our goal is to make the ontology editor easy to use and domain-aware so that it can be used by domain experts.
    Reduces the need for a knowledge engineer
    This is done through the use of text mining and machine learning.
  In this presentation we focus on the construction of Topic Ontologies
  Figure labels: Domain Expert, Knowledge Engineer, Ontology Editor, Xerox
  Sample documents shown on the slide:
    Xerox Corporation is a technology and services enterprise engaged in developing, manufacturing, marketing, servicing and financing a portfolio of document equipment, software, solutions and services. It manages its business in four segments: Production, Office, Developing Markets Operations (DMO) and Other. The Production segment includes black and white products, which operate at speeds over 90 pages per minute …
    Yahoo! Inc. is a provider of Internet products and services to consumers and businesses through the Yahoo! Network, its worldwide network of online properties. The Company's properties and services for consumers and businesses reside in four areas: Search and Marketplace, …
    The Washington Post Company's principal business activities consist of newspaper publishing (principally The Washington Post), television broadcasting (through the ownership and operation of six television broadcast stations), the ownership and operation of cable television systems, magazine publishing (principally Newsweek magazine), and (through its Kaplan subsidiary) the provision of educational services. …
How does it work?
  OntoGen suggests concepts
    Suggestions are generated automatically
      … from the text corpus by clustering similar documents
      … based on user query
      … through text corpus map
    User selects appropriate suggestions and adds them to the ontology
  OntoGen helps in deciding which suggestions to include
      … by extracting main keywords from the documents
      … with ontology and concept visualizations
      … by listing the documents behind concepts
  Behind each concept there is a set of documents
    Documents are automatically assigned to concepts
    Document assignments can be edited manually
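
The idea of generating suggestions "by clustering similar documents" can be sketched with a tiny k-means over bag-of-words vectors. The slides do not say which clustering algorithm or document representation OntoGen uses, so k-means with cosine similarity is an assumption made for illustration; `suggest_concepts` and the four toy documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def suggest_concepts(docs, k=2, iterations=10):
    """Cluster documents with a simple k-means; each cluster is one suggested concept."""
    vecs = [vectorize(d) for d in docs]
    # Deterministic farthest-first seeding: start with the first document, then
    # keep adding the document least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vecs)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cosine(vecs[i], vecs[s]) for s in seeds)))
    centers = [vecs[s] for s in seeds]
    assignment = []
    for _ in range(iterations):
        # Assign every document to its most similar center.
        assignment = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vecs]
        # Recompute each center as the summed vector of its members
        # (cosine similarity ignores the scale, so no averaging is needed).
        for c in range(k):
            members = [v for v, a in zip(vecs, assignment) if a == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centers[c] = total
    return assignment


# Made-up toy corpus.
docs = [
    "printer copier document equipment",
    "search engine internet portal",
    "document printer toner equipment",
    "internet search online network",
]
suggested = suggest_concepts(docs, k=2)
print(suggested)
```

Each distinct label in `suggested` stands for one candidate sub-concept, and the documents sharing that label are the set of documents "behind" it, which a user could then accept, reject, or edit.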
Example
  Figure labels: Text corpus, Domain, Ontology, Concept A, Concept B, Concept C
Functionality
Main Features
  Interactive user interface
    User can interact in real-time with the integrated machine learning and text mining methods
  Concept discovery methods:
    Unsupervised
      System provides suggestions
      Concept visualization
    Supervised
      Concept learning
  Methods for helping to understand the discovered concepts:
    Keyword extraction
      Generates a list of characteristic keywords of a given concept
    Concept visualization
      Creates a map of documents from a given concept
      Also available as a separate tool named Document Atlas (http://docatlas.ijs.si)
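
Keyword extraction, described above as generating "a list of characteristic keywords of a given concept", can be sketched with a TF-IDF style score. The slides do not state how OntoGen scores keywords, so the weighting below is an assumption; `concept_keywords` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def concept_keywords(concept_docs, all_docs, top_n=3):
    """Rank words by frequency inside the concept times rarity across the corpus.

    Assumes every document in concept_docs also appears in all_docs.
    """
    n = len(all_docs)
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(set(doc.lower().split()))  # count each word once per document
    term_freq = Counter()
    for doc in concept_docs:
        term_freq.update(doc.lower().split())
    scores = {w: tf * math.log(n / doc_freq[w]) for w, tf in term_freq.items()}
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Made-up toy corpus; the first three documents form the concept.
corpus = [
    "printer document equipment",
    "copier document equipment",
    "document printer services",
    "search internet services",
    "internet search portal",
]
keywords = concept_keywords(corpus[:3], corpus)
print(keywords)
```

A word such as "document" appears in every concept document here, yet it ranks below "copier" because it is also the most widespread word in the corpus, which is the sense in which the score favors characteristic words over merely frequent ones.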
Main view
  Screenshot labels: Ontology visualization, Concept hierarchy, List of suggested sub-concepts, Selected concept
Concept suggestion
  Screenshot labels: Add new concept, Selected concept, Suggested subconcepts, New concept
Personalized suggestions
  Figure labels: Countries view, Topics view
  Example documents:
    Lloyd's CEO questioned in recovery suit in U.S.
      Ronald Sandler, chief executive of Lloyd's of London, on Tuesday underwent a second day of court interrogation about …
    UK takeovers and mergers
      The following are additions and deletions to the takeovers and mergers list for the week beginning August 19, as provided by the Takeover …
Concept learning
  Screenshot labels: Query, New Concept
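
The step from a query to a new concept can be illustrated with a simple relevance-feedback loop: the system proposes the document that best matches the current concept prototype, the user accepts or rejects it, and accepted documents refine the prototype. The slides do not describe the learning method OntoGen uses, so this loop is a stand-in chosen for brevity; `learn_concept`, `simulated_user`, and the documents are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def learn_concept(query, docs, ask_user, rounds=4):
    """Grow a concept from a query by asking about the best unlabeled match each round."""
    prototype = bow(query)
    labels = {}  # document index -> True (in concept) or False (rejected)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(docs)) if i not in labels]
        if not unlabeled:
            break
        best = max(unlabeled, key=lambda i: cosine(prototype, bow(docs[i])))
        labels[best] = ask_user(docs[best])
        if labels[best]:
            # Accepted documents pull the prototype toward the concept's vocabulary.
            prototype.update(bow(docs[best]))
    return sorted(i for i, accepted in labels.items() if accepted)


def simulated_user(text):
    """Stand-in for a domain expert answering yes or no."""
    return "merger" in text or "takeover" in text


# Made-up toy documents.
docs = [
    "bank merger announced in london",
    "football match ends in draw",
    "takeover bid for uk bank",
    "merger and takeover list for the week",
]
concept = learn_concept("merger takeover", docs, simulated_user)
print(concept)
```

In an interactive tool the `ask_user` callback would be a dialog shown to the domain expert rather than a keyword test.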
Concept's instances visualization
  Instances are visualized as points on a 2D map
  The distance between two instances on the map corresponds to their content similarity
  Characteristic keywords are shown for all parts of the map
  User can select groups of instances on the map to create sub-concepts.
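
One way to obtain a 2D map in which distance reflects content similarity is to project document vectors onto their two main principal directions. The slides do not say which projection OntoGen or Document Atlas uses, so plain PCA by power iteration is swapped in here purely as an illustration; `project_2d`, `top_direction`, and the term-count matrix are hypothetical.

```python
import math


def top_direction(matrix, iterations=200):
    """Dominant eigenvector of a symmetric matrix, found by power iteration."""
    n = len(matrix)
    v = [float(i + 1) for i in range(n)]  # fixed start vector keeps results deterministic
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project_2d(vectors):
    """Project term-count vectors onto their two main principal directions."""
    n_docs, n_terms = len(vectors), len(vectors[0])
    means = [sum(v[t] for v in vectors) / n_docs for t in range(n_terms)]
    centered = [[v[t] - means[t] for t in range(n_terms)] for v in vectors]
    # Scatter matrix of the centered data (covariance up to a constant factor).
    cov = [[sum(row[a] * row[b] for row in centered) for b in range(n_terms)]
           for a in range(n_terms)]
    axes = []
    for _ in range(2):
        axis = top_direction(cov)
        eigenvalue = sum(axis[a] * cov[a][b] * axis[b]
                         for a in range(n_terms) for b in range(n_terms))
        # Deflate: remove the found direction so the next pass finds the following one.
        cov = [[cov[a][b] - eigenvalue * axis[a] * axis[b] for b in range(n_terms)]
               for a in range(n_terms)]
        axes.append(axis)
    return [tuple(sum(row[t] * axis[t] for t in range(n_terms)) for axis in axes)
            for row in centered]


# Made-up counts for four documents over the terms: printer, document, search, internet.
counts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]
points = project_2d(counts)
for point in points:
    print(point)
```

The first two documents share vocabulary and land close together, while the third and fourth form a separate group; selecting one such group on the map corresponds to the sub-concept creation described above.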
Concept management
  Screenshot labels: Selected concept, Selected instance, Concept's details, Keywords, Concept's instance management
Adding new documents to ontology
  Screenshot labels: New documents, Selected document, Content of selected document, Classification of selected document
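
Classifying a newly added document into existing concepts can be sketched as a nearest-centroid comparison: the document is scored against the combined vocabulary of each concept's documents. The classifier OntoGen actually uses is not named in the slides, so this is an assumed stand-in; `classify`, the concept names, and the documents are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(new_doc, concepts):
    """Score a new document against each concept's centroid, best match first."""
    scores = {}
    for name, docs in concepts.items():
        centroid = Counter()
        for doc in docs:
            centroid.update(bow(doc))
        scores[name] = cosine(bow(new_doc), centroid)
    return sorted(scores.items(), key=lambda item: -item[1])


# Made-up concepts and documents.
concepts = {
    "Office equipment": ["printer copier document equipment", "document printer toner"],
    "Internet": ["search engine internet portal", "internet search online network"],
}
ranking = classify("online search portal", concepts)
print(ranking)
```

Returning the full ranking rather than a single label leaves room for the manual correction of document assignments mentioned earlier in the talk.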
Conclusions
Evaluation
  First prototype was successfully used in several commercial projects:
    Applied in multiple domains: business, legislation and digital libraries
    Users were always domain experts with limited knowledge of and experience with ontology construction / knowledge engineering
    Valuable data from the first trials was used as input for the interface design of the second prototype (the one presented here).
  Feedback from the users of the second prototype:
    Main impression was that the tool saves time and is especially useful when working with large collections of documents
    Among the main disadvantages were abstraction and an unattractive look
    Many users use the program for exploration of the data
Future work
  Tools for suggestion and learning of more complex relations
  Extended support for collaborative editing of ontologies
  Easier input of background knowledge
  Improvement of the user interface based on the feedback from user trials and real-world users
Thank you for listening!
Questions? Comments?
http://ontogen.ijs.si