Does OWL Really Solve the Problems of Ontologies?


JSAI SIG Technical Report
SIG-SWO-A201-02
Does OWL Really Solve the Problems of Ontologies?
Takahira Yamaguchi
(Shizuoka University)
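The Nine Layers of the Semantic Web
Knowledge Engineering and Next-Generation Web Trends Concerning Ontologies
Knowledge engineering trends
• 90-: Explicit specification of a conceptualization
  (Tom Gruber's definition of an ontology)
  Ontology description language (Ontolingua)
  Knowledge interchange language (KIF)
  Generic Ontology
  CYC, WordNet, EDR…
  PSM
  Task Ontology
  Ontology construction methodologies
  ...
Next-generation Web trends
• 95-97: XML as arbitrary structures
• 97-98: RDF
• 98-99: RDFS (schema) as a frame-like system
• 00-01: DAML+OIL
• 02-07: OWL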
The Road to OWL
• Jan’00: On-To-Knowledge defines OIL
• Mid’00: Several major EU projects adopt OIL
• Sep’00: The DAML project links up with OIL
• Dec’00: Definition of DAML+OIL
• Mar’01: Revision of DAML+OIL
• Aug’01: DAML+OIL submitted to the W3C
• Oct’01: WebOnt Working Group launched
  ⇒ OWL draft
The Relationship between RDF(S) and OWL
RDF(S)
• class-def
• subclass-of
• slot-def
• subslot-of
• domain
• range
DAML+OIL ≒ OWL
• class-expressions
  • AND, OR, NOT
• slot-constraints
  • has-value, value-type
  • cardinality
• slot-properties
  • trans, symm
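As a rough illustration of what the class-expressions and slot-properties listed above add beyond RDF(S), the following Python sketch evaluates AND/OR/NOT expressions over sets of instances and computes the closure implied by a transitive slot. The class names, instances, and helper names (EXTENSIONS, evaluate, transitive_closure) are hypothetical toy data, not part of DAML+OIL or OWL.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data: class name -> set of instance names.
EXTENSIONS: Dict[str, Set[str]] = {
    "Player": {"p1", "p2"},
    "Pitcher": {"p3"},
    "Employee": {"p1", "p2", "p3", "p4"},
}
UNIVERSE: Set[str] = set().union(*EXTENSIONS.values())


def evaluate(expr) -> Set[str]:
    """Return the instances denoted by a nested class expression.

    An expression is either a class name or a tuple such as
    ("AND", "Employee", ("NOT", "Player")).
    """
    if isinstance(expr, str):
        return set(EXTENSIONS[expr])
    op, *args = expr
    if op == "AND":
        return set.intersection(*(evaluate(a) for a in args))
    if op == "OR":
        return set.union(*(evaluate(a) for a in args))
    if op == "NOT":
        return UNIVERSE - evaluate(args[0])
    raise ValueError(f"unknown operator: {op}")


def transitive_closure(pairs: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Add every pair implied by declaring a slot transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

For example, evaluate(("AND", "Employee", ("NOT", "Player"))) selects the employees who are not players, which a plain subclass-of hierarchy cannot express.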
DAML Ontology Library (Ontologies by Keyword)
http://www.daml.org/ontologies/keyword.html
Keyword                      URI
academic department          http://www.cs.umd.edu/projects/plus/DAML/onts/cs1.0.daml
academic department          http://www.cs.umd.edu/projects/plus/DAML/onts/cs1.1.daml
Academic Positions           http://www.daml.ri.cmu.edu/ont/homework/cmu-ri-employmenttypesont.daml
access control primitives    http://www.w3.org/2000/10/swap/pim/doc.rdf
acronym                      http://orlando.drc.com/daml/Ontology/Thesaurus/CALL/current/
activity                     http://www.kestrel.edu/DAML/2000/12/OPERATION.daml
Actors                       http://opencyc.sourceforge.net/daml/cyc.daml
Actors                       http://www.cyc.com/2002/04/08/cyc.daml
Actors                       http://www.cyc.com/cyc-2-1/cyc-vocab.daml
address book                 http://www.w3.org/2000/10/swap/pim/contact.rdf
agenda                       http://www.daml.org/2001/10/agenda/agenda-ont
Baseball Ontology
<rdf:RDF xmlns="http://www.daml.org/2001/08/baseball/baseball-ont#"
         xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
         xmlns:oiled="http://img.cs.man.ac.uk/oil/oiled#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <daml:Ontology rdf:about="">
    <rdfs:comment>baseball</rdfs:comment>
    <rdfs:comment>baseball ontology</rdfs:comment>
    <daml:versionInfo>"1.0"</daml:versionInfo>
  </daml:Ontology>
  <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#PostSeasonGame">
    <rdfs:label>PostSeasonGame</rdfs:label>
    <rdfs:comment></rdfs:comment>
    <oiled:creationDate>12:55:22 26.08.2001</oiled:creationDate>
    <rdfs:subClassOf>
      <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#Game"></rdfs:Class>
    </rdfs:subClassOf>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#Inning">
    <rdfs:label>Inning</rdfs:label>
    <rdfs:comment></rdfs:comment>
    <oiled:creationDate>12:22:49 26.08.2001</oiled:creationDate>
    <rdfs:subClassOf>
      <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#AggregateEvent"></rdfs:Class>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <daml:Restriction>
        <daml:onProperty rdf:resource="http://www.daml.org/2001/08/baseball/baseball-ont#number"></daml:onProperty>
        <daml:toClass>
          <rdfs:Class rdf:about="#http://www.w3.org/TR/xmlschema-2/#string"></rdfs:Class>
        </daml:toClass>
      </daml:Restriction>
    </rdfs:subClassOf>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#League">
    <rdfs:label>League</rdfs:label>
    <rdfs:comment></rdfs:comment>
    <oiled:creationDate>12:35:38 26.08.2001</oiled:creationDate>
    <rdfs:subClassOf>
      <daml:Restriction>
        <daml:onProperty rdf:resource="http://www.daml.org/2001/08/baseball/baseball-ont#division"></daml:onProperty>
        <daml:toClass>
          <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#Division"></rdfs:Class>
        </daml:toClass>
      </daml:Restriction>
    </rdfs:subClassOf>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#Person">
    <rdfs:label>Person</rdfs:label>
    <rdfs:comment></rdfs:comment>
    <oiled:creationDate>13:00:13 26.08.2001</oiled:creationDate>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#Player">
    <rdfs:label>Player</rdfs:label>
    <rdfs:comment></rdfs:comment>
    <oiled:creationDate>12:25:42 26.08.2001</oiled:creationDate>
    <rdfs:subClassOf>
      <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#Employee"></rdfs:Class>
    </rdfs:subClassOf>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#Relief">
    <rdfs:label>Relief</rdfs:label>
    <rdfs:comment></rdfs:comment>
    <oiled:creationDate>18:31:03 28.08.2001</oiled:creationDate>
    <rdfs:subClassOf>
      <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#LineupEvent"></rdfs:Class>
    </rdfs:subClassOf>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#Season">
    <rdfs:label>Season</rdfs:label>
    <rdfs:comment></rdfs:comment>
    <oiled:creationDate>12:13:04 26.08.2001</oiled:creationDate>
    <rdfs:subClassOf>
      <daml:Restriction>
        <daml:onProperty rdf:resource="http://www.daml.org/2001/08/baseball/baseball-ont#year"></daml:onProperty>
        <daml:toClass>
          <rdfs:Class rdf:about="#http://www.w3.org/TR/xmlschema-2/#string"></rdfs:Class>
        </daml:toClass>
      </daml:Restriction>
    </rdfs:subClassOf>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://www.daml.org/2001/08/baseball/baseball-ont#Series">
    <rdfs:label>Series</rdfs:label>
    <rdfs:comment></rdfs:comment>
    <oiled:creationDate>12:56:25 26.08.2001</oiled:creationDate>
    <rdfs:subClassOf>
      <daml:Restriction>
        <daml:onProperty rdf:resource="http://www.daml.org/2001/08/baseball/baseball-ont#home"></daml:onProperty>
        <daml:toClass rdf:resource="http://www.daml.org/2001/08/baseball/baseball-ont#Team"/>
      </daml:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <daml:Restriction>
        <daml:onProperty rdf:resource="http://www.daml.org/2001/08/baseball/baseball-ont#away"></daml:onProperty>
        <daml:toClass
Experience in (Manual) Ontology Development
for Software Development Processes
Approach
First, develop software process ontologies through domain analysis using two case studies.
Afterwards, design a system that generates software processes interactively with a user.
Issue
Real software processes change dynamically...
• very difficult to maintain
• need a facility for responding dynamically to changing processes
Goal
A Process-Centered Software Engineering Environment Using Ontologies
Development of Software Process Ontologies
Step1. Extract real software processes and objects from the two cases
Step2. Organize the extracted processes and objects into distinct groups so that the members of each group are similar in meaning, while defining the software process scheme
[Figure: software process scheme. A Process has input, output, and reference roles. Example process types shown: Invent, Join, Append, Extract.]
Step3. Give a detailed definition to each process using the software process scheme
Process Ontology
[Figure: excerpt of the process hierarchy under "activity". Concepts shown: Invent (Invent with references, Invent without references), Join (Join with references, Join without references), Append, Extract (Extract with references, Extract without references), Evaluate, Review, Examine. Annotations: "Focus on input, output, and reference roles" and "Focus on reference, tool and agent roles".]
Software Process Ontologies (conceptual hierarchy)
• Process Ontology: about 250 concepts
• Object Ontology: about 400 concepts
Software Process Automation using ontologies
[Figure: unknown processes (?) lying between a Start Object and a Goal Object.]
Retrieve processes that satisfy the constraints, using the Process Ontology and the Object Ontology
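The retrieval step above can be pictured as a search for a chain of processes whose inputs and outputs connect the start object to the goal object. The following is a minimal sketch under that assumption. The process table (PROCESSES), its input/output objects, and the function name find_plan are hypothetical; the slides do not specify how the environment actually searches.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical process definitions: name -> (input object, output object).
PROCESSES: Dict[str, Tuple[str, str]] = {
    "Design Basic Components": ("Requirement Specification", "Basic Design"),
    "Evaluate All Basic Designs": ("Basic Design", "Evaluated Basic Design"),
    "Design Detailed Components": ("Evaluated Basic Design", "Detailed Design"),
    "Design Logical Aspects of DB": ("Requirement Specification", "Logical DB Design"),
}


def find_plan(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest process chain from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        obj, plan = queue.popleft()
        if obj == goal:
            return plan
        for name, (inp, out) in PROCESSES.items():
            # A process is applicable when its input is the current object.
            if inp == obj and out not in seen:
                seen.add(out)
                queue.append((out, plan + [name]))
    return None  # no chain of processes links the two objects
```

Calling find_plan("Requirement Specification", "Detailed Design") returns one candidate plan. A fuller system would presumably return several candidates and let the user prune them interactively.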
Case Study
A query about SEP
Which SEP lies between the following input and output?
Input: Requirement Specification from S/W Subsystems
Output: Detailed Design of Each Component of S/W Subsystems
Initial Software Process Plan Candidates
[Figure: network of candidate processes between Input and Output.]
Input: Requirement Specification from S/W Subsystems
Output: Detailed Design of Each Component of S/W Subsystems
Processes shown:
• Specify Testing S/W Integration
• Design Basic Components of S/W Subsystems
• Design Detailed Components of S/W Subsystems
• Review All Detailed Designs Together
• Design Basic Components of I/F Subsystems
• Review All Basic Designs Together
• Design Detailed Components of I/F Subsystems
• Evaluate All Detailed Designs
• Specify Testing Units of S/W
• Design Logical Aspects of DB
• Evaluate All Basic Designs
• Design Physical Aspects of DB
Interim Software Process Plan Candidates
[Figure: network of candidate processes between Input and Output.]
Input: Requirement Specification from S/W Subsystems
Output: Detailed Design of Each Component of S/W Subsystems
Processes shown:
• Specify Testing S/W Integration
• Design Basic Components of S/W Subsystems
• Design Detailed Components of S/W Subsystems
• Review All Detailed Designs Together
• Design Basic Components of I/F Subsystems
• Design Detailed Components of I/F Subsystems
• Evaluate All Detailed Designs
• Specify Testing Units of S/W
• Design Logical Aspects of DB
• Evaluate All Basic Designs
• Design Physical Aspects of DB
Final Software Process Plan Candidates
[Figure: network of candidate processes between Input and Output.]
Input: Requirement Specification from S/W Subsystems
Output: Detailed Design of Each Component of S/W Subsystems
Processes shown:
• Specify Testing S/W Integration
• Design Basic Components of S/W Subsystems
• Design Detailed Components of S/W Subsystems
• Review All Detailed Designs Together
• Design Basic Components of I/F Subsystems
• Design Detailed Components of I/F Subsystems
• Evaluate All Detailed Designs
• Specify Testing Units of S/W
• Design Logical Aspects of DB
• Evaluate All Basic Designs
• Design Physical Aspects of DB
Evaluation
Applicability
Through interaction with the user, the environment generated SPP candidates that suited the user's query.
Issues and Future work
• The environment cannot allocate human and time resources
• The environment has no facility that helps users add their own specific processes
• The methodology for building ontologies lacks a theoretical basis, such as soundness
Experience in (Semi-Automatic) Ontology Development
from Machine-Readable Dictionaries and Text Corpora
How to Build Domain Ontologies at Lower Cost
• Reuse existing information resources
DODDLE Project
(a Domain Ontology rapiD DeveLopment Environment)
exploiting an MRD (machine-readable dictionary)
Constructs just a conceptual hierarchy
Why not use an MRD to build domain ontologies?
• Good News:
  MRDs contain many concepts.
• Bad News:
  The concepts have been defined from the viewpoint of natural language processing (common sense), so there is some concept drift between the MRD and specific domain ontologies.
Concept Drift
[Figure: an MRD and two domains (Domain A, Domain B), with regions labeled "Reusable Part" and "No reusable part because of concept drift".]
DODDLE
Inputs: a set of input terms; WordNet
1. Spell-Match ⇒ spell-matched results
2. Selection of best match ⇒ Initial Model
3. Trimming ⇒ Trimmed Model
4. Matched-Results Analysis and Trimmed-Results Analysis
5. Extension & modification (with the domain expert) ⇒ Concept Hierarchy
Matched-Results Analysis
[Figure: a tree from the Root through Internal Nodes to Best-Match nodes; sub-trees are marked "Move" or "Stay".]
Trimmed-Results Analysis
The user re-constructs the sub-tree.
[Figure: the Initial Model and the Trimmed Model for nodes A, B, C, D, with the trimmed part marked and the number of trimmed nodes shown on each link (0, 3, 0).]
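The step from the initial model to the trimmed model can be sketched as follows. This is a minimal illustration, not DODDLE's actual procedure: the toy hierarchy (PARENT) stands in for WordNet, and the trimming rule used here (drop unmatched internal nodes that have only one child, keeping parent-child and sibling relations among the matched nodes) is an assumption, since the slides do not state the exact criterion.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy (child -> parent); the real system uses WordNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "abstraction": "entity",
    "communication": "abstraction",
    "message": "communication",
    "offer": "message",
    "proposal": "message",
    "act": "entity",
    "speech act": "act",
    "assent": "speech act",
}


def path_to_root(node: Optional[str]) -> List[str]:
    """Return the node and all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def initial_model(matched: Set[str]) -> Dict[str, Optional[str]]:
    """Union of the paths from the matched nodes to the root."""
    keep: Set[str] = set()
    for term in matched:
        keep.update(path_to_root(term))
    return {node: PARENT[node] for node in keep}


def trim(model: Dict[str, Optional[str]], matched: Set[str]) -> Dict[str, Optional[str]]:
    """Assumed rule: keep the root, matched nodes, and branching nodes."""
    child_count: Dict[str, int] = {node: 0 for node in model}
    for parent in model.values():
        if parent is not None:
            child_count[parent] += 1
    kept = {n for n in model
            if n in matched or model[n] is None or child_count[n] >= 2}
    trimmed: Dict[str, Optional[str]] = {}
    for node in kept:
        parent = model[node]
        while parent is not None and parent not in kept:
            parent = model[parent]  # skip over trimmed ancestors
        trimmed[node] = parent
    return trimmed
```

With the matched terms {"offer", "proposal", "assent"}, the initial model contains all nine nodes, and trimming leaves only "entity", "message", and the three matched terms.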
Acquiring Not Only a Conceptual Hierarchy but Also Concept Definitions
DODDLE II Project
(Extending DODDLE)
Info. Resource: Domain-Specific Texts
Technique: Co-occurrence information
DODDLE-II Overview
Inputs: a set of domain terms; MRD (WordNet); domain-specific texts
• Taxonomic Relationship Acquisition Module (uses the MRD) ⇒ taxonomic relationships
• Non-Taxonomic Relationship Learning Module (uses WordSpace built from the domain-specific texts) ⇒ non-taxonomic relationships
• Both kinds of relationships are combined in the Concept Specification Template
• Extension & modification ⇒ Domain Ontology
Extracting Concept Relationships from Domain-Specific Texts
WordSpace (Marti A. Hearst, Hinrich Schütze)
• Words and phrases in texts can be represented as vectors containing co-occurrence statistics
• Inner products among the vectors serve as the similarity between the words and phrases
Context similarity between concepts C1 and C2
… wi … wj … C1 … wk …
… wi … wj … C2 … wk …
The value is high when similar words appear around the two concepts.
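A minimal sketch of the similarity measure described above. The slide speaks only of inner products; normalising the vectors to unit length first (so the result is a cosine between 0 and 1 for non-negative counts) is an assumption made here so that values are comparable across concepts. The function names are hypothetical.

```python
import math
from typing import Sequence


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def context_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of the length-normalised vectors (assumed normalisation)."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a concept with no co-occurrence data matches nothing
    return inner_product(u, v) / (norm_u * norm_v)
```

Two concepts whose surrounding words have the same distribution, for example vectors [1, 2, 0] and [2, 4, 0], score 1.0; concepts with no shared context score 0.0.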
Constructing WordSpace
1. Extract word 4-grams with high frequency
2. Collocation Matrix
3. Context Vectors
4. Word Vectors (WordSpace is a set of word vectors)
5. Vector representations of all concepts
[Figure: construction of WordSpace from texts represented as a 4-gram array (f1 … fn).]
• Collocation matrix: built from the 4-grams that co-occur within the collocation scope; it gives a vector representation for each 4-gram
• Context vector w: a sum of the 4-gram vectors around an occurrence of word w (within the context scope)
• Word vector W: a sum of the context vectors w in the texts
• Vector representation of a concept C: a sum of the word vectors W in the same WordNet synset
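The five construction steps can be sketched end to end as follows. This is an illustration under assumptions, not the DODDLE-II implementation: 4-grams are taken as overlapping word windows, the collocation scope counts only preceding 4-grams, the context scope is symmetric around a word, and the function names and small default parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]


def four_grams(tokens: Sequence[str]) -> List[Gram]:
    """Represent the text as an array of overlapping word 4-grams."""
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def build_wordspace(tokens: Sequence[str], min_freq: int = 2,
                    colloc_scope: int = 2, context_scope: int = 3
                    ) -> Dict[str, List[int]]:
    grams = four_grams(tokens)
    # Step 1: keep only high-frequency 4-grams.
    freq = Counter(grams)
    frequent = sorted(g for g, c in freq.items() if c >= min_freq)
    index = {g: k for k, g in enumerate(frequent)}
    n = len(frequent)
    # Step 2: collocation matrix; row i counts the frequent 4-grams seen
    # within colloc_scope positions before an occurrence of 4-gram i.
    matrix = [[0] * n for _ in range(n)]
    for pos, g in enumerate(grams):
        if g not in index:
            continue
        for other in grams[max(0, pos - colloc_scope):pos]:
            if other in index:
                matrix[index[g]][index[other]] += 1
    # Steps 3-4: a context vector sums the 4-gram vectors around one
    # occurrence of a word; the word vector sums these over the text.
    word_vectors: Dict[str, List[int]] = {}
    for pos, word in enumerate(tokens):
        vec = word_vectors.setdefault(word, [0] * n)
        lo = max(0, pos - context_scope)
        hi = min(len(grams), pos + context_scope + 1)
        for g in grams[lo:hi]:
            if g in index:
                for k, x in enumerate(matrix[index[g]]):
                    vec[k] += x
    return word_vectors


def concept_vector(synset_words: Sequence[str],
                   word_vectors: Dict[str, List[int]]) -> List[int]:
    """Step 5: sum the word vectors of all words in the same synset."""
    dim = len(next(iter(word_vectors.values()), []))
    total = [0] * dim
    for word in synset_words:
        for k, x in enumerate(word_vectors.get(word, [])):
            total[k] += x
    return total
```

The defaults are deliberately tiny so the sketch runs on a toy text; the case study in these slides reports an extraction frequency of 8, a collocation scope of 10, and a context scope of 60.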
Constructing the Concept Specification Template
Taxonomic relationships from the TR Acquisition Module
  ancestor, descendant and sibling concepts of Ci (e.g. Cb, Ck)
A set of concept pairs from the non-TR Learning Module
  (Ci, Ca), (Ci, Cb), (Ci, Cc), (Ci, Cd)
Concept Specification Template for Ci
  Ca: non-TAXONOMY?
  Cb: TAXONOMY
  Cc: non-TAXONOMY?
  Cd: non-TAXONOMY?
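The way the template combines the two modules' outputs can be sketched like this: each concept paired with Ci by the non-TR module is tagged TAXONOMY if the pair is already a known taxonomic relationship, and "non-TAXONOMY?" otherwise, leaving the final judgment to the domain expert. The function name build_template and the data layout are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set


def build_template(concept: str, related: Iterable[str],
                   taxonomic_pairs: Set[FrozenSet[str]]) -> Dict[str, str]:
    """Tag each related concept for review by the domain expert."""
    template: Dict[str, str] = {}
    for other in related:
        pair = frozenset((concept, other))
        # Pairs already explained by the hierarchy are marked TAXONOMY;
        # the rest are only candidates for non-taxonomic relationships.
        template[other] = "TAXONOMY" if pair in taxonomic_pairs else "non-TAXONOMY?"
    return template


# Toy data mirroring the slide: Cb and Ck are taxonomically related to Ci.
taxonomic = {frozenset(("Ci", "Cb")), frozenset(("Ci", "Ck"))}
template = build_template("Ci", ["Ca", "Cb", "Cc", "Cd"], taxonomic)
```

The question mark on "non-TAXONOMY?" reflects that a high context similarity only suggests a relationship; the expert still has to name it or discard it.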
DODDLE-II alpha on Perl/Tk (UNIX platform)
A Case Study
The Target Domain
  Contracts for the International Sale of Goods (CISG)
Input Terms to TR module
  46 legal terms from CISG Part II
Domain-Specific Texts to NonTR module
  full text of CISG (about 10,000 words)
Results
Taxonomic relationships for the input terms, constructed by the TR Acquisition Module
[Figure: flow from the 46 input terms through Select Best Matches, Trimming, and Modification to the Legal Ontology. Intermediate products: Initial Model (the paths from the spell-matched nodes to the root of WordNet) and Trimmed Model. Term counts shown: 46, 377, 113, 56, 61.]
Support Ratio
Support ratio: how much of the intermediate products at each DODDLE activity is included in the final domain ontology.
• WN's structure (52.2%)
• Small DT (8.0%)
• Strategy1 (5.3%)
• Strategy2 (6.9%)
• Correction by user (21.5%)
• Addition by user (6.1%)
[Figure: bar chart. Vertical axis: Component rate (%), 0.0 to 100.0. Horizontal axis (DODDLE's process): Trimmed Model, Extended Trimmed Model, Internal Domain Ontology1, Internal Domain Ontology2, Domain Ontology.]
Extracting Concept Relationships: setting values for WordSpace
Extraction frequency of 4-gram (times)    8
Collocation scope (4-grams)               10
Context scope (4-grams)                   60
Concept pairs are extracted according to context similarity (threshold 0.9993).
Domain Experts Modifying Concept Specification Templates
ex) non-Taxonomic Relationships for “assent”
Concept Specification Template for “assent”
  Candidate concepts: act, effect, offer, person, offeree, withdrawal, time, proposal, offeror
  (each tagged either TAXONOMY or non-TAXONOMY? in the template)
non-Taxonomic Relationships obtained from the template for “assent”
  AGENT: person
  LEGAL-SEQUENCE: offer
  ANTONYM: withdrawal
DE Modification
  non-taxonomic relationship: person, offer, withdrawal
  taxonomic relationship: act, proposal
  inheritance: offeror, offeree
  unnecessary: effect, time
Recall & Precision of Concept Specification Templates
[Figure: recall and precision plotted against the number of extracted concept pairs.]
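For reference, recall and precision over extracted concept pairs can be computed as below. This uses the standard definitions and assumes that is what the chart reports; the function name and the example pairs are hypothetical, not results from the case study.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def recall_precision(extracted: Set[Pair], correct: Set[Pair]) -> Tuple[float, float]:
    """Return (recall, precision) of the extracted pairs against the correct ones."""
    hits = len(extracted & correct)
    recall = hits / len(correct) if correct else 0.0
    precision = hits / len(extracted) if extracted else 0.0
    return recall, precision


# Hypothetical example: two pairs extracted, one of which the expert accepts.
extracted = {("assent", "offer"), ("assent", "time")}
correct = {("assent", "offer"), ("assent", "person")}
```

Lowering the similarity threshold extracts more pairs, which typically raises recall and lowers precision, so a chart against the number of extracted pairs shows that trade-off.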
Conclusions
DODDLE II: A Domain Ontology Construction Support Environment using MRD and Domain-Specific Texts
Legal experts appreciate DODDLE II to some extent.
Future Work
• How to decide the threshold for context similarity (CS)
• How to identify non-taxonomic relationships (NTR)
• Another DM method instead of WordSpace
• Further large-scale case studies
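OWL and Ontology Development
• OWL and ontology development
  ⇔ UML and software development
• OWL+RDF(S)+RDF
  ⇒ ontologies with real content
• Ontology construction support tools are needed
• Ontology weight ⇔ scalability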