Semantic Web: the Story So Far


Semantic Web The Story So Far Ian Horrocks

Oxford University Computing Laboratory

The Semantic Web

What is it?

• Web “invented” by Tim Berners-Lee (amongst others)
  – (Conceptual) simplicity of web has contributed to success, but is also a limiting factor
• Tim has ambitious goals for future of the web
  – Objective is to overcome existing limitations
  – “… a consistent logical web of data …”
  – “… information is given well-defined meaning …”
• This vision of the future of the Web has become known as the Semantic Web

Why do we want it?

Many tasks are difficult or impossible using existing web:
• Complex queries involving background knowledge
  – Find information about “animals that use sonar but are neither bats nor dolphins”, e.g., Barn Owl
• Locating information in data repositories
  – Travel enquiries
  – Prices of goods and services
  – Results of human genome experiments
• Finding and using “web services”
  – Given DNA sequence, identify genes, determine proteins they produce, and hence biological processes they control

[Example shown on slide: Rev. Alan M. Gates, Associate Rector of the Church of the Holy Spirit, Lake Forest, Illinois]

What is the Problem?

Consider a typical web page:
• Markup consists of:
  – Rendering information (e.g., font size and colour)
  – Hyper-links to related content
• Semantic content is accessible to humans, but not (easily) to computers…

How Will It Work?

• Add semantic annotations to web resources
  – Dr. Alan Rector, Professor of Computer Science, University of Manchester
  – Rev. Alan M. Gates, Associate Rector of the Church of the Holy Spirit, Lake Forest, Illinois
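A minimal sketch of why annotations help with the two “Rector” examples. It assumes annotations are stored as (resource, property, value) triples; the resource identifiers and the property names `name`, `title` and `affiliation` are hypothetical and are not taken from the slides.

```python
# Hypothetical annotations: (resource, property, value) triples.
# The resource ids and property names are illustrative assumptions.
ANNOTATIONS = [
    ("page1", "name", "Alan Rector"),
    ("page1", "title", "Professor of Computer Science"),
    ("page1", "affiliation", "University of Manchester"),
    ("page2", "name", "Alan M. Gates"),
    ("page2", "title", "Associate Rector"),
    ("page2", "affiliation", "Church of the Holy Spirit, Lake Forest, Illinois"),
]


def keyword_search(word):
    """Plain text matching: every resource mentioning the word anywhere."""
    return sorted({res for res, _, value in ANNOTATIONS if word in value})


def annotated_search(prop, word):
    """Matches the word only inside values of the given property."""
    return sorted(
        {res for res, p, value in ANNOTATIONS if p == prop and word in value}
    )


print(keyword_search("Rector"))            # both pages match
print(annotated_search("name", "Rector"))  # only the person named Rector
```

With annotations, a query can ask for “Rector” as a name rather than as a job title, which plain keyword matching cannot distinguish.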

[Cartoon caption: “Now... that should clear up a few things around here”]

Giving Semantics to Annotations

• Agree on meaning of a set of annotation tags
  – E.g., Dublin Core
  – Limited flexibility and extensibility
  – Limited number of things can be expressed
• Agree on language used to define meanings
  – E.g., an ontology language
  – Flexible and extensible: new terms can be formed by combining existing ones
  – Meaning (semantics) of such terms is formally specified

The Web Ontology Language OWL

Web Ontology Language OWL

• Semantic Web led to requirement for a “web ontology language”
• W3C set up Web-Ontology (WebOnt) Working Group
  – WebOnt developed OWL language
  – OWL based on earlier languages RDF, OIL and DAML+OIL
  – OWL now a W3C recommendation (i.e., a standard)
• OWL is a family of 3 languages: OWL Lite, OWL DL and OWL Full
• OIL, DAML+OIL and OWL (DL & Lite) based on Description Logics
  – Has facilitated development of wide range of high quality tools & infrastructure
• OWL now language of choice in many applications

What Are Description Logics?

• A family of logic based Knowledge Representation formalisms
  – Descendants of semantic networks and KL-ONE
  – Describe domain in terms of concepts (classes), roles (properties, relationships) and individuals
• Operators allow for composition of complex concepts
• Names can be given to complex concepts, e.g.:
  $\mathit{Parent} \sqcap \forall \mathit{hasChild}.(\mathit{Intelligent} \sqcup \mathit{Athletic})$

Why (Description) Logic?

• OWL exploits results of 15+ years of DL research
  – Well defined (model theoretic) semantics
    • Most DLs are subsets of C2, i.e., decidable fragments of FOL
  – Formal properties well understood (complexity, decidability)
  – Known reasoning algorithms
  – Implemented systems (highly optimised), e.g., KAON2, Pellet, CEL

[Cartoon: “I can’t find an efficient algorithm, but neither can all these famous people.” Garey & Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.]

Class/Concept Constructors

• Concept can be thought of as a FOL formula with one free variable
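As a worked illustration of this reading, the concept Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic) can be translated into a first-order formula with the single free variable $x$. The translation function name $\pi_x$ is notation assumed here for the sketch; it does not appear on the slides.

```latex
\begin{align*}
\pi_x\bigl(\mathit{Parent} \sqcap \forall \mathit{hasChild}.(\mathit{Intelligent} \sqcup \mathit{Athletic})\bigr)
  \;=\;\; & \mathit{Parent}(x) \,\wedge\,
  \forall y.\,\bigl(\mathit{hasChild}(x,y) \rightarrow
  (\mathit{Intelligent}(y) \vee \mathit{Athletic}(y))\bigr)
\end{align*}
```

Concept names become unary predicates, roles become binary predicates, and the only variable left free is $x$.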

Knowledge Base / Ontology Axioms

OWL RDF/XML Exchange Syntax

E.g., $\mathit{Parent} \sqcap \forall \mathit{hasChild}.(\mathit{Intelligent} \sqcup \mathit{Athletic})$:
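The RDF/XML itself did not survive in this transcript. The sketch below shows how the example concept is commonly serialised in OWL RDF/XML, held in a Python string so that it can be checked for well-formedness with the standard library. The `#Parent`-style fragment identifiers are placeholders, and this is an illustration rather than the exact markup from the slide.

```python
import xml.etree.ElementTree as ET

# RDF/XML for: Parent AND (ALL hasChild.(Intelligent OR Athletic)).
# The "#..." fragment identifiers are placeholders for real ontology IRIs.
RDF_XML = """\
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class>
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Parent"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasChild"/>
        <owl:allValuesFrom>
          <owl:Class>
            <owl:unionOf rdf:parseType="Collection">
              <owl:Class rdf:about="#Intelligent"/>
              <owl:Class rdf:about="#Athletic"/>
            </owl:unionOf>
          </owl:Class>
        </owl:allValuesFrom>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
</rdf:RDF>
"""

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

root = ET.fromstring(RDF_XML)  # raises ParseError if not well-formed

# Collect the named classes mentioned anywhere in the expression.
named = [
    c.get(RDF + "about")
    for c in root.iter(OWL + "Class")
    if c.get(RDF + "about")
]
print(named)
```

The nesting mirrors the concept: an intersection whose second operand is a universal restriction on hasChild, whose filler is a union.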

Ontology based Information Systems

• Similar to relational databases
  – Ontology ≈ schema; instances ≈ data
• Some important (dis)advantages
  + (Relatively) easy to maintain and update schema
    • Both schema and data are “self organising”
  + Query answers reflect both schema and data
  + Able to answer both intensional and extensional queries
  – Semantics may be counter-intuitive or even inappropriate
    • Open vs. closed world; axioms vs. constraints
  – Query answering (logical entailment) much more difficult
    • Can lead to scalability problems

Very useful, but don’t expect miracles!
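The open vs. closed world distinction can be sketched as follows. This is a toy illustration over a hypothetical set of facts, not how a database or a DL reasoner is actually implemented; the individual names are assumptions.

```python
# Asserted facts (hypothetical data); nothing is said about Ian and Mary.
FACTS = {("hasChild", "Ian", "Peter")}
# Explicit negative assertions (none in this example).
NEGATIVE_FACTS = set()


def closed_world(fact):
    """Database-style reading: anything not stored is taken to be false."""
    return fact in FACTS


def open_world(fact):
    """Ontology-style reading: True, False, or None if undecided."""
    if fact in FACTS:
        return True
    if fact in NEGATIVE_FACTS:
        return False
    return None  # unknown: a missing fact is not evidence of its negation


query = ("hasChild", "Ian", "Mary")
print(closed_world(query))  # False
print(open_world(query))    # None
```

The same data gives “no” under the closed world reading and “don’t know” under the open world reading, which is one source of the counter-intuitive semantics.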

Ontologies and Reasoning

Support for Ontology Engineering

• Developing and maintaining quality ontologies is very challenging
• Users need tools and services, e.g., to help check if ontology is:
  – Meaningful: all named classes can have instances
  – Correct: captures intuitions of domain experts
  – Minimally redundant: no unintended synonyms, e.g., Banana split ≡ Banana sundae
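A minimal sketch of the “minimally redundant” check, assuming the ontology contains only subclass axioms between named classes (real reasoners handle far richer axioms). The axioms below are hypothetical; two classes are reported as synonyms when each is inferred to be a subclass of the other.

```python
from itertools import combinations

# Hypothetical told subclass axioms (sub, super); names are illustrative.
AXIOMS = {
    ("BananaSplit", "BananaSundae"),
    ("BananaSundae", "BananaSplit"),
    ("BananaSundae", "Dessert"),
}


def subsumers(axioms):
    """Reflexive-transitive closure: each class mapped to its superclasses."""
    classes = {c for pair in axioms for c in pair}
    supers = {c: {c} for c in classes}
    changed = True
    while changed:
        changed = False
        for sub, sup in axioms:
            for c in classes:
                if sub in supers[c] and not supers[sup] <= supers[c]:
                    supers[c] |= supers[sup]
                    changed = True
    return supers


def synonyms(axioms):
    """Pairs of distinct classes that subsume each other (equivalent)."""
    supers = subsumers(axioms)
    return sorted(
        (a, b)
        for a, b in combinations(sorted(supers), 2)
        if b in supers[a] and a in supers[b]
    )


print(synonyms(AXIOMS))
```

An ontology editor could flag such pairs so that the author can decide whether the equivalence was intended.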

Support for Ontology Engineering

Range of new “non-standard” services supporting, e.g.:
– Modular design and integration
  • What is the effect of merging $O_2$ into $O_1$?
  • In general, check that $O_1 \cup O_2 \models C$ iff $O_1 \models C$ for any concept $C$ constructed using vocabulary occurring in $O_1$
– Module extraction
  • Extract a (small) module from $O$ about some vocabulary $V$ capturing all “relevant” information
  • In general, find $O' \subseteq O$ s.t. $O' \models C$ iff $O \models C$ for any concept $C$ constructed using terms from $V$
– Bottom-up design
  • Find a (small and specific) concept describing a set of individuals
  • In general, find most specific $C$ s.t. $O \models C(i_1) \wedge \ldots \wedge C(i_n)$
  • Where $C$ may be “small” and/or in a sub-language (of $O$)

Support for Ontology Engineering

Range of new “non-standard” services supporting, e.g.:
– Error diagnosis and repair

Support for Query Answering

• In an Ontology based Information System (OIS), query answering ≈ computing logical entailment
• Reasoner needed in order to answer queries, e.g.:
  – C is a sub-class of D iff $O \models \forall x.\, C(x) \rightarrow D(x)$
  – a is an instance of C iff $O \models C(a)$

OIS with no reasoner ≈ DBMS with no query engine
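The difference between looking up stored data and answering by entailment can be sketched on a toy ontology. The class and individual names are hypothetical, and only subclass axioms between named classes are handled, which is far less than a real DL reasoner does.

```python
# Hypothetical ontology: subclass axioms (schema) and assertions (data).
SUBCLASS_OF = {("Cat", "Mammal"), ("Mammal", "Animal")}
ASSERTIONS = {("Cat", "felix"), ("Animal", "nemo")}


def is_subclass(c, d):
    """True iff c is a subclass of d by the told axioms (reflexive, transitive)."""
    seen, frontier = {c}, [c]
    while frontier:
        current = frontier.pop()
        if current == d:
            return True
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in seen:
                seen.add(sup)
                frontier.append(sup)
    return False


def lookup(cls):
    """No reasoner: only explicitly asserted instances are returned."""
    return sorted(i for c, i in ASSERTIONS if c == cls)


def instances(cls):
    """With simple reasoning: instances of any subclass also count."""
    return sorted(i for c, i in ASSERTIONS if is_subclass(c, cls))


print(lookup("Animal"))     # ['nemo']
print(instances("Animal"))  # ['felix', 'nemo']
```

The second answer reflects both schema and data: felix is returned although no one asserted that felix is an Animal.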

Example Applications

e-Science

• E.g., for “in silico” investigations and “hypothesis testing”
  – Comparing data (e.g., on proteins) to (model of) biological knowledge
  – Characteristics of proteins captured in an ontology $O$
    • Goal is to identify protein instances based on characteristics
  – Equivalent to answering queries of form: $O \models P(i)$? for protein $P$ and instance $i$
  – Result may be discovery of new kinds of protein
    • And these may be potential drug targets if unique to a pathogen
  – Result may also be discovery of errors in model
    • Which may reflect gaps/errors in existing knowledge

Healthcare

• UK NHS has a £6.2 billion “Connecting for Health” IT programme
• Key component is Care Records Service (CRS)
  – “Live, interactive patient record service accessible 24/7”
  – Patient data distributed across local centres in 5 regional clusters, and a national DB
    • Detailed records held by local service providers
    • Diverse applications support radiology, pharmacy, etc.
    • Applications exchange messages containing “semantically rich clinical information”
    • Summaries sent to national database
  – SNOMED-CT ontology provides common vocabulary for data
    • Clinical data uses terms drawn from ontology

SNOMED

• Over 400,000 concepts
• Schema only: no instances
• Language used is a (well known) fragment of OWL
• NHS version extended with 1,000s of additional classes
  – OWL reasoner (FaCT++) used to classify and check ontology
    • Currently takes ≈ 4 hours
  – 180 missing subClass relationships were found, e.g.:
    • Periocular_dermatitis subClassOf Disease_of_face
    • Fibrin_measurement subClassOf Coagulation_factor_assay

SNOMED

• Vocabulary is extensible at point of use: “post coordination”
  – Users (e.g. clinicians) may add/define new vocabulary
  – Terminology service (reasoner) used to insert in ontology
• Typical new term:
  – almond_allergy ≡ “allergy caused_by almond”
  – OWL reasoner (FaCT++) used to classify new term
    • Takes <10 ms
  – Classified as a kind of “nut allergy”
    • Clearly of crucial importance to recognise patients with allergy caused by almond as kinds of patient with nut allergy
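A sketch of how a new post-coordinated term might be classified. It assumes each defined concept has the simple shape “base concept plus existential role restrictions” and uses a structural comparison, not the tableau algorithm that FaCT++ implements. The definition of nut_allergy and the axiom that almond is a kind of nut are assumptions made for illustration, not quotations from SNOMED.

```python
# Assumed primitive hierarchy (sub, super), taken to be acyclic.
PRIMITIVE_ISA = {("almond", "nut")}

# Defined concepts: name -> (base concept, {role: filler}).
# The nut_allergy definition is assumed for illustration only.
DEFINITIONS = {
    "nut_allergy": ("allergy", {"caused_by": "nut"}),
    "almond_allergy": ("allergy", {"caused_by": "almond"}),
}


def isa(c, d):
    """Reflexive-transitive subsumption between primitive concepts."""
    if c == d:
        return True
    return any(sub == c and isa(sup, d) for sub, sup in PRIMITIVE_ISA)


def subsumed_by(c_name, d_name):
    """Structural check: c's base and role fillers must specialise d's."""
    c_base, c_roles = DEFINITIONS[c_name]
    d_base, d_roles = DEFINITIONS[d_name]
    return isa(c_base, d_base) and all(
        role in c_roles and isa(c_roles[role], filler)
        for role, filler in d_roles.items()
    )


def classify(new_name):
    """Names of the other defined concepts that subsume the new term."""
    return sorted(
        d for d in DEFINITIONS
        if d != new_name and subsumed_by(new_name, d)
    )


print(classify("almond_allergy"))  # ['nut_allergy']
```

Nobody stated that almond allergy is a nut allergy; the relationship follows from the definitions, which is the point of using a reasoner as the terminology service.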

Recent Developments

Improving Scalability

• Optimisation techniques
  – Improve performance of DL reasoners, e.g., [Tsarkov et al, JAR, 2007]
• New reasoning techniques
  – Reduction to disjunctive Datalog [Motik et al, KR-04]
  – Hybrid DL-DB systems [Horrocks et al, CADE-05]
  – Hypertableau based algorithms [Motik et al, CADE-07]
• Polynomial time algorithms for sub-ALC logics
  – Graph based techniques for EL+ [Baader et al, IJCAI-05]
  – Database techniques for DL-Lite [Calvanese et al, AAAI-05]

Extending Tools and Infrastructure

• Editors/environments
  – OilEd, Protégé, Swoop, TopBraid, Ontotrack, …
• Reasoning systems
  – Cerebra, FaCT++, KAON2, Pellet, Racer, CEL, …
• Design methodologies
  – Modularity, foundational ontologies, etc.

[Figure: fragment of a foundational ontology with classes Entity, Endurant, Quality, Substantial, Perdurant, Event, Stative, Achievement, Accomplishment]

Increasing Expressive Power

• Database style keys [Lutz et al, JAIR 2004]
• Rule language extensions
  – W3C RIF WG (see http://www.w3.org/2005/rules/)
  – First order extensions (e.g., SWRL) [Horrocks et al, JWS, 2005]
  – Hybrid language extensions, e.g., [Eiter et al, KR-04; Motik et al, ISWC-04; Rosati, JoWS, 2005]
  – LP/F-Logic/Common Logic [Chen et al, JLP, 1993; de Bruijn et al, WWW-05]
• Other extensions
  – Temporal, Fuzzy, …
• OWL 1.1 extension to OWL

OWL 1.1

• Is an extension of OWL
  – Addresses deficiencies identified by users and developers (at OWLED workshop)
• Is based on more expressive DL: SROIQ
  – (OWL is based on SHOIN)
• W3C working group now chartered
  – Will develop recommendation based on existing member submission
• Already supported by popular OWL tools
  – Protégé, Swoop, TopBraid, FaCT++, Pellet

What’s New in OWL 1.1?

• Four kinds of features:

More expressive logic

– Qualified cardinality restrictions, e.g.:
  ObjectMinCardinality(2 friendOf hacker)
– Property chain inclusion axioms, e.g.:
  SubObjectPropertyOf(SubObjectPropertyChain(parent brother) uncle)
– Local reflexivity restrictions, e.g.:
  ObjectExistsSelf(likes) [for narcissists]
– Reflexive, irreflexive, symmetric, and antisymmetric properties, e.g.:
  ReflexiveObjectProperty(knows); IrreflexiveObjectProperty(husbandOf)
– Disjoint properties, e.g.:
  DisjointObjectProperties(childOf spouseOf)
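The effect of the property chain axiom can be sketched as relational composition over asserted pairs. The individuals are hypothetical, and the reading “parent(x, y) means y is a parent of x” is an assumption made for this example.

```python
# Hypothetical assertions as (subject, object) pairs.
# Assumed reading: parent(x, y) means "y is a parent of x".
PARENT = {("alice", "bob")}
BROTHER = {("bob", "charlie")}  # charlie is a brother of bob


def compose(r, s):
    """Relational composition: (x, z) whenever r(x, y) and s(y, z)."""
    return {(x, z) for x, y in r for y2, z in s if y == y2}


def apply_chain(chain, asserted_super):
    """Adds to the super-property every pair implied by the chain."""
    implied = chain[0]
    for step in chain[1:]:
        implied = compose(implied, step)
    return asserted_super | implied


UNCLE = apply_chain([PARENT, BROTHER], set())
print(UNCLE)  # {('alice', 'charlie')}
```

The axiom only says that the chain implies uncle; other uncle assertions may exist independently, which is why the implied pairs are added to the asserted ones instead of replacing them.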

What’s New in OWL 1.1?

• Four kinds of features:

More expressive datatypes

– User-defined datatypes using facets from XML Schema Datatypes, e.g.:
  SubClassOf(Adult DataSomeValuesFrom(age DatatypeRestriction(xsd:integer minInclusive "18"^^xsd:integer)))
– Simple relationships between values of functional data-valued properties, e.g.:
  DataSomeValuesFrom(shoeSize IQ greaterThan)

What’s New in OWL 1.1?

• Four kinds of features:

Metamodelling and annotations

– Names can be used as any or all of an individual, a class, or a property
– Allows for a restricted form of metamodelling (“punning”), e.g.:
  SubClassOf(SnowLeopard BigCat)
  ClassAssertion(SnowLeopard EndangeredSpecies)
– Annotations of axioms as well as entities, e.g.:
  ClassAssertion(Comment(“source: WWF”) SnowLeopard EndangeredSpecies)

What’s New in OWL 1.1?

• Four kinds of features:

Syntactic sugar (make things easier to say)

– Disjoint unions, e.g.:
  DisjointUnion(Element Earth Wind Fire Water)
– Negative assertions, e.g.:
  NegativeObjectPropertyAssertion(Ian hasChild Mary)
  NegativeDataPropertyAssertion(Ian hasAge 21)
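What DisjointUnion abbreviates can be sketched by checking one finite interpretation: the parts must be pairwise disjoint and together cover exactly the whole. The individuals `e1` to `e4` are hypothetical, and checking a single interpretation is only an illustration of the meaning, not entailment checking.

```python
from itertools import combinations

# One hypothetical finite interpretation: class name -> set of individuals.
EXTENSIONS = {
    "Element": {"e1", "e2", "e3", "e4"},
    "Earth": {"e1"},
    "Wind": {"e2"},
    "Fire": {"e3"},
    "Water": {"e4"},
}


def satisfies_disjoint_union(ext, whole, parts):
    """True iff whole equals the union of parts, which are pairwise disjoint."""
    covered = set().union(*(ext[p] for p in parts))
    disjoint = all(
        not (ext[a] & ext[b]) for a, b in combinations(parts, 2)
    )
    return covered == ext[whole] and disjoint


PARTS = ["Earth", "Wind", "Fire", "Water"]
print(satisfies_disjoint_union(EXTENSIONS, "Element", PARTS))  # True
```

Without the shorthand, the same constraint needs one equivalence axiom for the union plus a disjointness axiom over the parts.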

Tractable Fragments

• OWL defines only one fragment (OWL Lite)
  – And it isn’t very tractable!
• OWL 1.1 defines several different fragments with useful computational properties
  – E.g., reasoning complexity in range LOGSPACE to PTIME
  – Smaller fragments implementable using RDBs

Summary

• Semantic Web aims to make web content more accessible to automated processes
  – Adds semantic annotations to web resources
• OWL Ontologies provide vocabulary for annotations
  – Terms have well defined meaning
• OWL now being used in a wide range of applications
  – e-Science, medicine, geography, geology, …
• Reasoning enabled tools are of crucial importance
  – For both design and deployment of ontologies
• Active research area
  – Expressive power, scalability, methodologies, tools, …

Thank you for listening


Any questions?