Semantic Web: The Story So Far
Ian Horrocks
The Semantic Web
What is it?
• Web “invented” by Tim Berners-Lee (amongst others)
  – (Conceptual) simplicity of web has contributed to success, but is also a limiting factor
• Tim has ambitious goals for future of the web
  – Objective is to overcome existing limitations
  – “… a consistent logical web of data …”
  – “… information is given well-defined meaning …”
• This vision of the future of the Web has become known as the Semantic Web
Why do we want it?
• Many tasks are difficult or impossible using existing web:
• Complex queries involving background knowledge
  – Find information about “animals that use sonar but are neither bats nor dolphins”, e.g., Barn Owl
• Locating information in data repositories
  – Travel enquiries
  – Prices of goods and services
  – Results of human genome experiments
• Finding and using “web services”
  – Given DNA sequence, identify genes, determine proteins they produce, and hence biological processes they control
What is the Problem?
Consider a typical web page:
• Markup consists of:
  – Rendering information (e.g., font size and colour)
  – Hyper-links to related content
• Semantic content is accessible to humans, but not (easily) to computers…
How Will It Work?
• Add semantic annotations to web resources
Giving Semantics to Annotations
• Agree on meaning of a set of annotation tags
  – E.g., Dublin Core
  – Limited flexibility and extensibility
  – Limited number of things can be expressed
• Agree on language used to define meanings
  – E.g., an ontology language
  – Flexible and extensible
    • New terms can be formed by combining existing ones
  – Meaning (semantics) of such terms is formally specified
The Web Ontology Language OWL
Web Ontology Language OWL
• Semantic Web led to requirement for a “web ontology language”
• W3C set up Web-Ontology (WebOnt) Working Group
  – WebOnt developed OWL language
  – OWL based on earlier languages RDF, OIL and DAML+OIL
  – OWL now a W3C recommendation (i.e., a standard)
• OWL is a family of 3 languages: OWL Lite, OWL DL and OWL Full
• OIL, DAML+OIL and OWL (DL & Lite) based on Description Logics
  – Has facilitated development of wide range of high quality tools & infrastructure
• OWL now language of choice in many applications
What Are Description Logics?
• A family of logic based Knowledge Representation formalisms
  – Descendants of semantic networks and KL-ONE
  – Describe domain in terms of concepts (classes), roles (properties, relationships) and individuals
  – Operators allow for composition of complex concepts
  – Names can be given to complex concepts, e.g.:
      Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic)
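To make the operators concrete, here is a minimal Python sketch that evaluates the concept Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic) over a small finite interpretation. The individuals and their memberships are invented for illustration, and the helper names (`union`, `intersection`, `all_values_from`) are hypothetical; this only illustrates the set-based semantics and is not how a DL reasoner works.

```python
# Toy finite interpretation: every name below is invented for illustration.
domain = {"ann", "bob", "cat", "dan"}
concepts = {
    "Parent": {"ann", "bob"},
    "Intelligent": {"cat"},
    "Athletic": set(),
}
roles = {"hasChild": {("ann", "cat"), ("bob", "cat"), ("bob", "dan")}}


def union(c, d):
    """C ⊔ D: individuals in either extension."""
    return c | d


def intersection(c, d):
    """C ⊓ D: individuals in both extensions."""
    return c & d


def all_values_from(role, c):
    """∀role.C: individuals all of whose role-successors are in C."""
    pairs = roles[role]
    return {x for x in domain if all(y in c for (s, y) in pairs if s == x)}


clever_or_sporty = union(concepts["Intelligent"], concepts["Athletic"])
target = intersection(
    concepts["Parent"], all_values_from("hasChild", clever_or_sporty)
)
print(sorted(target))  # ['ann']
```

Note that individuals with no children satisfy the ∀ restriction vacuously; they are excluded here only because they are not in Parent.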
Why (Description) Logic?
• OWL exploits results of 15+ years of DL research
  – Well defined (model theoretic) semantics
    • Most DLs are subsets of C2, i.e., decidable fragments of FOL
  – Formal properties well understood (complexity, decidability)
    • “I can’t find an efficient algorithm, but neither can all these famous people.” [Garey & Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.]
  – Known reasoning algorithms
  – Implemented systems (highly optimised), e.g., KAON2, Pellet, CEL
Class/Concept Constructors
• Concept can be thought of as a FOL formula with one free variable
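As a worked instance of this reading, the concept Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic) corresponds under the standard translation to the following first-order formula with the single free variable x (the variable names x and y are arbitrary):

```latex
\[
\mathit{Parent}(x) \;\land\; \forall y.\,\bigl(\mathit{hasChild}(x,y) \rightarrow (\mathit{Intelligent}(y) \lor \mathit{Athletic}(y))\bigr)
\]
```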
Knowledge Base / Ontology Axioms
OWL RDF/XML Exchange Syntax
E.g., Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic):
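The slide's own listing did not survive in this transcript. As a stand-in, the sketch below shows one way this concept could be written in OWL RDF/XML, embedded in a Python string so that its well-formedness can be checked with the standard library. The `#Parent`, `#hasChild`, `#Intelligent` and `#Athletic` fragment identifiers are placeholders, and the check covers XML well-formedness only, not OWL validity.

```python
import xml.etree.ElementTree as ET

# One possible OWL RDF/XML serialisation of
#   Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic)
# The "#..." fragment identifiers are placeholders, not taken from the slides.
OWL_SNIPPET = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class>
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Parent"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasChild"/>
        <owl:allValuesFrom>
          <owl:Class>
            <owl:unionOf rdf:parseType="Collection">
              <owl:Class rdf:about="#Intelligent"/>
              <owl:Class rdf:about="#Athletic"/>
            </owl:unionOf>
          </owl:Class>
        </owl:allValuesFrom>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
</rdf:RDF>
"""

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

root = ET.fromstring(OWL_SNIPPET)  # raises ParseError if not well-formed XML
# Collect the named classes in document order.
named = [
    c.get(RDF + "about") for c in root.iter(OWL + "Class") if c.get(RDF + "about")
]
print(named)  # ['#Parent', '#Intelligent', '#Athletic']
```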
Ontology based Information Systems
• Similar to relational databases
  – Ontology ≈ schema; instances ≈ data
• Some important (dis)advantages
  + (Relatively) easy to maintain and update schema
    • Both schema and data are “self organising”
  + Query answers reflect both schema and data
  + Able to answer both intensional and extensional queries
  – Semantics may be counter-intuitive or even inappropriate
    • Open -v- closed world; axioms -v- constraints
  – Query answering (logical entailment) much more difficult
    • Can lead to scalability problems
Very useful, but don’t expect miracles!
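The “open -v- closed world” point is one of the more counter-intuitive differences from databases, so a minimal sketch may help. The fact set and the two function names below are hypothetical; `None` is used here simply to stand for “not entailed either way”.

```python
# Hypothetical facts: the only thing stated is that mary is a child of ian.
facts = {("hasChild", "ian", "mary")}


def closed_world(query):
    """Database-style reading: anything not stated is false."""
    return query in facts


def open_world(query):
    """Ontology-style reading: anything not stated is unknown, not false."""
    return True if query in facts else None  # None stands for "unknown"


q = ("hasChild", "ian", "peter")
print(closed_world(q))  # False
print(open_world(q))    # None
```

Under the open-world reading, the answer only becomes “false” if the ontology entails the negation, for example through an explicit negative assertion or a cardinality restriction.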
Ontologies and Reasoning
Support for Ontology Engineering
• Developing and maintaining quality ontologies is very challenging
• Users need tools and services, e.g., to help check if ontology is:
  – Meaningful: all named classes can have instances
  – Correct: captures intuitions of domain experts
  – Minimally redundant: no unintended synonyms (e.g., Banana split, Banana sundae)
Support for Ontology Engineering
• Range of new “non-standard” services supporting, e.g.:
  – Modular design and integration
    • What is the effect of merging O₂ into O₁?
    • In general, check that O₁ ∪ O₂ ⊨ C iff O₁ ⊨ C for any concept C constructed using vocabulary occurring in O₁
  – Module Extraction
    • Extract a (small) module from O about some vocabulary V capturing all “relevant” information
    • In general, find O′ ⊆ O s.t. O′ ⊨ C iff O ⊨ C for any concept C constructed using terms from V
  – Bottom-up design
    • Find a (small and specific) concept describing a set of individuals
    • In general, find most specific C s.t. O ⊨ C(i₁) ∧ … ∧ C(iₙ)
      – Where C may be “small” and/or in a sub-language (of O)
  – Error diagnosis and repair
Support for Query Answering
• In an Ontology based Information System (OIS), query answering ≈ computing logical entailment
  – Reasoner needed in order to answer queries, e.g.:
    • C is a sub-class of D iff O ⊨ ∀x. C(x) → D(x)
    • a is an instance of C iff O ⊨ C(a)
• OIS with no reasoner ≈ DBMS with no query engine
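The two entailment checks above can be illustrated with a deliberately tiny Python sketch. It assumes an ontology containing only atomic subclass axioms and class assertions (all names are invented), so entailment reduces to a transitive closure; reasoners for expressive DLs need far more sophisticated algorithms.

```python
# Hypothetical ontology O: told subclass axioms and class assertions.
subclass_axioms = {("Dolphin", "Mammal"), ("Bat", "Mammal"), ("Mammal", "Animal")}
class_assertions = {("flipper", "Dolphin")}


def superclasses(c):
    """All D such that O entails C ⊑ D (reflexive-transitive closure)."""
    seen, todo = {c}, [c]
    while todo:
        current = todo.pop()
        for sub, sup in subclass_axioms:
            if sub == current and sup not in seen:
                seen.add(sup)
                todo.append(sup)
    return seen


def is_subclass(c, d):
    """C is a sub-class of D iff O ⊨ ∀x. C(x) → D(x)."""
    return d in superclasses(c)


def is_instance(a, c):
    """a is an instance of C iff O ⊨ C(a)."""
    return any(ind == a and is_subclass(told, c) for ind, told in class_assertions)


print(is_subclass("Dolphin", "Animal"))  # True
print(is_instance("flipper", "Animal"))  # True
print(is_instance("flipper", "Bat"))     # False
```

A result of `False` here means “not entailed”, which under open-world semantics is weaker than “entailed to be false”.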
Example Applications
e-Science
• E.g., for “in silico” investigations and “hypothesis testing”
  – Comparing data (e.g., on proteins) to (model of) biological knowledge
  – Characteristics of proteins captured in an ontology O
    • Goal is to identify protein instances based on characteristics
  – Equivalent to answering queries of form: O ⊨ P(i)? for protein P and instance i
  – Result may be discovery of new kinds of protein
    • And these may be potential drug targets if unique to a pathogen
  – Result may also be discovery of errors in model
    • Which may reflect gaps/errors in existing knowledge
Healthcare
• UK NHS has a £6.2 billion “Connecting for Health” IT programme
• Key component is Care Records Service (CRS)
  – “Live, interactive patient record service accessible 24/7”
  – Patient data distributed across local centres in 5 regional clusters, and a national DB
    • Detailed records held by local service providers
    • Diverse applications support radiology, pharmacy, etc.
    • Applications exchange messages containing “semantically rich clinical information”
    • Summaries sent to national database
  – SNOMED-CT ontology provides common vocabulary for data
    • Clinical data uses terms drawn from ontology
SNOMED
• Over 400,000 concepts
• Schema only (no instances)
• Language used is a (well known) fragment of OWL
• NHS version extended with 1,000s of additional classes
  – OWL reasoner (FaCT++) used to classify and check ontology
    • Currently takes ≈ 4 hours
  – 180 missing subClass relationships were found, e.g.:
    • Periocular_dermatitis subClassOf Disease_of_face
    • Fibrin_measurement subClassOf Coagulation_factor_assay
SNOMED
• Vocabulary is extensible at point of use: “post coordination”
  – Users (e.g. clinicians) may add/define new vocabulary
  – Terminology service (reasoner) used to insert in ontology
• Typical new term:
  – almond_allergy ≡ “allergy caused_by almond”
  – OWL reasoner (FaCT++) used to classify new term
    • Takes <10 ms
  – Classified as a kind of “nut allergy”
    • Clearly of crucial importance to recognise patients with allergy caused by almond as kinds of patient with nut allergy
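The classification step can be sketched in a few lines of Python. This is a simplified structural subsumption check, not what FaCT++ actually does, and it rests on two assumptions that the slide does not state: that the ontology contains the axiom almond ⊑ nut, and that nut_allergy is defined as “allergy caused_by nut”.

```python
# Assumed background axiom (not shown on the slide): almond is a kind of nut.
told_subclass = {("almond", "nut")}

# Definitions as (atomic conjuncts, existential restrictions).
# nut_allergy's definition is an assumption made for this illustration.
definitions = {
    "almond_allergy": ({"allergy"}, {("caused_by", "almond")}),
    "nut_allergy": ({"allergy"}, {("caused_by", "nut")}),
}


def atom_subsumed(c, d):
    """True if atomic class c is d or a told descendant of d (acyclic axioms)."""
    if c == d:
        return True
    return any(sub == c and atom_subsumed(sup, d) for sub, sup in told_subclass)


def subsumed(c_name, d_name):
    """Structural check that c_name ⊑ d_name for two defined terms."""
    c_atoms, c_exists = definitions[c_name]
    d_atoms, d_exists = definitions[d_name]
    # Every atomic conjunct of D must be implied by some conjunct of C.
    atoms_ok = all(any(atom_subsumed(c, d) for c in c_atoms) for d in d_atoms)
    # Every restriction of D must be matched by a more specific one in C.
    exists_ok = all(
        any(r1 == r2 and atom_subsumed(f1, f2) for r1, f1 in c_exists)
        for r2, f2 in d_exists
    )
    return atoms_ok and exists_ok


print(subsumed("almond_allergy", "nut_allergy"))  # True
print(subsumed("nut_allergy", "almond_allergy"))  # False
```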
Recent Developments
Improving Scalability
• Optimisation techniques
  – Improve performance of DL reasoners, e.g., [Tsarkov et al, JAR, 2007]
• New reasoning techniques
  – Reduction to disjunctive Datalog [Motik et al, KR-04]
  – Hybrid DL-DB systems [Horrocks et al, CADE-05]
  – Hypertableau based algorithms [Motik et al, CADE-07]
• Polynomial time algorithms for sub-ALC logics
  – Graph based techniques for EL+ [Baader et al, IJCAI-05]
  – Database techniques for DL-Lite [Calvanese et al, AAAI-05]
Extending Tools and Infrastructure
• Editors/environments
  – OilEd, Protégé, Swoop, TopBraid, OntoTrack, …
• Reasoning systems
  – Cerebra, FaCT++, KAON2, Pellet, Racer, CEL, …
• Design methodologies
  – Modularity, foundational ontologies, etc.
[Figure: foundational ontology hierarchy with nodes Entity, Endurant, Quality, Substantial, Perdurant, Event, Stative, Achievement, Accomplishment]
Increasing Expressive Power
• Database style keys [Lutz et al, JAIR 2004]
• Rule language extensions
  – W3C RIF WG (see http://www.w3.org/2005/rules/)
  – First order extensions (e.g., SWRL) [Horrocks et al, JWS, 2005]
  – Hybrid language extensions, e.g., [Eiter et al, KR-04; Motik et al, ISWC-04; Rosati, JoWS, 2005]
  – LP/F-Logic/Common Logic [Chen et al, JLP, 1993; de Bruijn et al, WWW-05]
• Other extensions
  – Temporal, Fuzzy, …
• OWL 1.1 extension to OWL
OWL 1.1
• Is an extension of OWL
  – Addresses deficiencies identified by users and developers (at OWLED workshop)
• Is based on more expressive DL: SROIQ
  – (OWL is based on SHOIN)
• W3C working group now chartered
  – Will develop recommendation based on existing member submission
• Already supported by popular OWL tools
  – Protégé, Swoop, TopBraid, FaCT++, Pellet
What’s New in OWL 1.1?
• Four kinds of features:
• More expressive logic
  – qualified cardinality restrictions, e.g.:
      ObjectMinCardinality(2 friendOf hacker)
  – property chain inclusion axioms, e.g.:
      SubObjectPropertyOf(SubObjectPropertyChain(parent brother) uncle)
  – local reflexivity restrictions, e.g.:
      ObjectExistsSelf(likes) [for narcissists]
  – reflexive, irreflexive, symmetric, and antisymmetric properties, e.g.:
      ReflexiveObjectProperty(knows); IrreflexiveObjectProperty(husbandOf)
  – disjoint properties, e.g.:
      DisjointObjectProperties(childOf spouseOf)
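To illustrate what a property chain inclusion axiom licenses, here is a minimal Python sketch of the parent/brother/uncle example. The individuals are invented, `parent(x, y)` is read as “y is a parent of x”, and the `compose` helper is hypothetical; a reasoner would combine this inference with all the other axioms rather than compute it in isolation.

```python
# Hypothetical role assertions.
parent = {("alice", "bob")}      # bob is a parent of alice
brother = {("bob", "charlie")}   # charlie is a brother of bob
asserted_uncle = set()           # nothing stated directly about uncles


def compose(r, s):
    """Relational composition: (x, z) whenever r(x, y) and s(y, z)."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


# The chain axiom says every (parent then brother) pair is also an uncle pair.
uncle = asserted_uncle | compose(parent, brother)
print(uncle)  # {('alice', 'charlie')}
```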
What’s New in OWL 1.1?
• Four kinds of features:
• More expressive datatypes
  – User-defined datatypes using facets from XML Schema Datatypes, e.g.:
      SubClassOf(Adult DataSomeValuesFrom(age DatatypeRestriction(xsd:integer minInclusive "18"^^xsd:integer)))
  – Simple relationships between values of functional data-valued properties, e.g.:
      DataSomeValuesFrom(shoeSize IQ greaterThan)
What’s New in OWL 1.1?
• Four kinds of features:
• Metamodelling and annotations
  – Names can be used as any or all of an individual, a class, or a property
  – Allows for a restricted form of metamodelling (“punning”), e.g.:
      SubClassOf(SnowLeopard BigCat)
      ClassAssertion(SnowLeopard EndangeredSpecies)
  – Annotations of axioms as well as entities, e.g.:
      ClassAssertion(Comment("source: WWF") SnowLeopard EndangeredSpecies)
What’s New in OWL 1.1?
• Four kinds of features:
• Syntactic sugar (make things easier to say)
  – Disjoint unions, e.g.:
      DisjointUnion(Element Earth Wind Fire Water)
  – Negative assertions, e.g.:
      NegativeObjectPropertyAssertion(Ian hasChild Mary)
      NegativeDataPropertyAssertion(Ian hasAge 21)
Tractable Fragments
• OWL defines only one fragment (OWL Lite)
  – And it isn’t very tractable!
• OWL 1.1 defines several different fragments with useful computational properties
  – E.g., reasoning complexity in range LOGSPACE to PTIME
  – Smaller fragments implementable using RDBs
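One way a small fragment might be implemented on top of a relational database is query rewriting: the subclass axioms are compiled into the SQL query, and the database then evaluates it over the asserted data only. The sketch below is in that spirit but is not any specific published algorithm; the schema (one table per class), the class names and the `rewrite` helper are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one table per class, holding asserted instances only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Professor(id TEXT);
    CREATE TABLE Lecturer(id TEXT);
    INSERT INTO Professor VALUES ('alice');
    INSERT INTO Lecturer VALUES ('bob');
""")

# Hypothetical ontology: Professor ⊑ Teacher and Lecturer ⊑ Teacher.
subclass_axioms = [("Professor", "Teacher"), ("Lecturer", "Teacher")]
tables = {"Professor", "Lecturer"}


def rewrite(class_name):
    """Rewrite 'instances of class_name' into a UNION over entailed subclasses."""
    names, todo = {class_name}, [class_name]
    while todo:
        current = todo.pop()
        for sub, sup in subclass_axioms:
            if sup == current and sub not in names:
                names.add(sub)
                todo.append(sub)
    # Only classes that actually have a table contribute a SELECT.
    parts = [f"SELECT id FROM {n}" for n in sorted(names) if n in tables]
    return " UNION ".join(parts)


sql = rewrite("Teacher")
teachers = sorted(row[0] for row in conn.execute(sql))
print(sql)       # SELECT id FROM Lecturer UNION SELECT id FROM Professor
print(teachers)  # ['alice', 'bob']
```

There is no Teacher table at all, yet the query returns both individuals because the ontology has been folded into the SQL.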
Summary
• Semantic Web aims to make web content more accessible to automated processes
  – Adds semantic annotations to web resources
• OWL Ontologies provide vocabulary for annotations
  – Terms have well defined meaning
• OWL now being used in a wide range of applications
  – e-Science, medicine, geography, geology, …
• Reasoning enabled tools are of crucial importance
  – For both design and deployment of ontologies
• Active research area
  – Expressive power, scalability, methodologies, tools, …
Thank you for listening