Semantic Basics: Markup, Querying, and Reasoning Marlon Pierce Community Grids Lab Indiana University With Slides and Help from Sean Bechhofer, Carole Goble, Line Pouchard, and Dave.


Semantic Basics: Markup,
Querying, and Reasoning
Marlon Pierce
Community Grids Lab
Indiana University
With Slides and Help from Sean Bechhofer,
Carole Goble, Line Pouchard, and Dave De
Roure
Preface: Beyond XML
Reductio ad Absurdum

“Physics is the study of the harmonic
oscillator.”
• H. L. Richards

“Statistical Mechanics is the study of
the Ising Model”
• H. L. Richards

“Web Service standards are the
study of <xsd:any> sequences”
• M. E. Pierce, soon to be anonymous
Which Web Service Specs?
<xs:element name="Header" type="tns:Header" />
<xs:complexType name="Header">
  <xs:sequence>
    <xs:any namespace="##any" processContents="lax"
            minOccurs="0" maxOccurs="unbounded" />
  </xs:sequence>
  <xs:anyAttribute namespace="##other" processContents="lax" />
</xs:complexType>

<xsd:complexType name="SecurityHeaderType">
  <xsd:sequence>
    <xsd:any processContents="lax" minOccurs="0" maxOccurs="unbounded">
    </xsd:any>
  </xsd:sequence>
  <xsd:anyAttribute namespace="##other" processContents="lax" />
</xsd:complexType>
Which, What, and Why?

Which is what?
• Left (the xs:complexType named "Header") is the definition of the SOAP header.
• Right (the xsd:complexType named "SecurityHeaderType") is taken from the Web Service Secure Messaging Specification.
• You will find this pattern repeated often in web service specifications.

Why?
• We have limited ways of linking several XML schema data models.
  - Imagine schemas for science applications and computing resources.
• XML maps relationships to trees.
  - Link application and computer schemas with <xsd:any>.
  - In my application+computer schema, does application contain computer as a child node, or vice versa?
• Graphs are a more natural way of expressing many inter-relationships of concepts.
XML is not enough




XML defines grammars to verify and structure documents.
• The grammar enforces constraints on tags.
• Different grammars define the same content.
• XML lacks a semantic model: it only has a surface model, which is a tree.

Example statement: "The Creator of the Resource http://www.w3.org/Home/Lassila is Ora Lassila."

As a graph: http://www.w3.org/Home/Lassila --Creator--> Ora Lassila

Three different XML encodings of the same content:

<Creator>
  <uri>http://www.w3.org/Home/Lassila</uri>
  <name>Ora Lassila</name>
</Creator>

<Document uri="http://www.w3.org/Home/Lassila">
  <Creator>Ora Lassila</Creator>
</Document>

<Document uri="http://www.w3.org/Home/Lassila" Creator="Ora Lassila"/>
XML is not enough
Meaning of XML documents is intuitively clear
• “semantic” markup tags are domain terms

But computers do not have intuition
• Tag names per se do not provide semantics
• The semantics are encoded outside the XML
specification

XML makes no commitment on:
(1) Domain specific ontological vocabulary
(2) Ontological modeling primitives
=> requires pre-arranged agreement on (1) & (2)

Feasible for closed collaboration
• agents in a small & stable community
• pages on a small & stable intranet

Semantic Web markups are often expressed in XML, but they carry extra meaning.
Enter the Semantic Web/Grid
“The Semantic Web is the
representation of data on the World
Wide Web. It is a collaborative
effort led by W3C with participation
from a large number of researchers
and industrial partners. It is based
on the Resource Description
Framework (RDF), which integrates
a variety of applications using XML
for syntax and URIs for naming.”
The Semantic Stack
XML
  Defines the syntax for structured documents.
XML Schema
  Defines rules for XML dialects (SVG, GML, etc.) and also built-in data types.
RDF
  A data model definition language with XML bindings.
RDF Schema
  A way to define RDF-based languages (DAML+OIL, OWL).
OWL
  An extension of RDF/RDFS with extensive property/relationship definitions for expressing logical relationships.
Semantic Markups

All semantic markup languages
should be understood as assertion
languages.
• We will assert that certain relationships
between resources exist.
• We will express this using RDF, RDFS,
and OWL, all encoded in XML.

We must still provide tools for
processing (and verifying) the
assertions.
Resource Description
Framework
Overview of RDF basic ideas
and XML encoding.
Resource Description Framework
(RDF)


RDF is the simplest of the semantic languages.
Basic Idea #1: Triples
• RDF is based on a subject-verb-object statement
structure.
• RDF subjects are called resources (classes)
• Verbs (predicates) are called properties.
• Objects (values) may be simple literals or other
resources.

Basic Idea #2: Everything is a resource that is named with a URI
• RDF nouns, verbs, and objects are all labeled with URIs.
• Recall that a URI is just a name for a resource.
• It may be a URL, but not necessarily.
• A URI can name anything that can be described:
  - Web pages, creators of web pages, organizations that the creator works for, ...
RDF Graph Model





RDF is defined by a graph model.
Resources are denoted by ovals (nodes).
Lines (arcs) indicate properties.
Squares indicate string literals (no URI).
Resources and properties are labeled by a URI.
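[Figure: RDF graph. The resource http://.../CMCS/Entries/X has two arcs: http://purl.org/dc/elements/1.1/creator, pointing to the resource http://.../CMCS/People/DrY, and http://purl.org/dc/elements/1.1/title, pointing to the literal "H2O".]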
Encoding RDF in XML

The graph represents two statements.
• Entry X has a creator, Dr. Y.
• Entry X has a title, H2O.

In RDF XML, we have the following tags
• <RDF> </RDF> denote the beginning and end of the
RDF description.
• <Description>’s “about” attribute identifies the subject
of the sentence.
• <Description></Description> enclose the properties and
their values.
• We import Dublin Core conventional properties (creator,
title) from outside RDF proper.
RDF XML: The Gory Details
<rdf:RDF
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns:dc='http://purl.org/dc/elements/1.0/'>
  <rdf:Description rdf:about='http://.../X'>
    <dc:creator rdf:resource='http://…/people/MEP'/>
    <!-- The title is a string literal, so it is element text, not an rdf:resource. -->
    <dc:title>H2O</dc:title>
  </rdf:Description>
</rdf:RDF>
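A minimal Python sketch of how a program might read RDF/XML like the listing above back into (subject, predicate, object) triples, using only the standard library. The slide elides its URIs, so the example.org addresses are placeholders, and the helper name triples_from_rdfxml is made up for illustration. It only handles the flat rdf:Description form shown here; a real application would more likely use an RDF toolkit such as Jena.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Placeholder URIs (example.org) stand in for the elided ones on the slide.
RDF_XML = """
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:dc='http://purl.org/dc/elements/1.0/'>
  <rdf:Description rdf:about='http://example.org/X'>
    <dc:creator rdf:resource='http://example.org/people/MEP'/>
    <dc:title>H2O</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def triples_from_rdfxml(text):
    """Return one (subject, predicate, object) tuple per property element."""
    root = ET.fromstring(text.strip())
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells a qualified tag as "{namespace}local";
            # joining the two parts gives the property URI.
            predicate = prop.tag[1:].replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # An rdf:resource attribute means the object is another resource;
            # otherwise the element text is a literal value.
            obj = resource if resource is not None else (prop.text or "")
            triples.append((subject, predicate, obj))
    return triples


triples = triples_from_rdfxml(RDF_XML)
for triple in triples:
    print(triple)
```

The two printed tuples correspond to the two statements in the graph: the entry has a creator and the entry has a title.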
Encoding RDF as Triplets


In addition to graphs and XML, RDF
may be written as triple “sentences”.
A triple is just the subject, predicate,
and object (in that order) of a graph
segment.
<http://.../CMCS/Entries/X> <http://purl.org/dc/elements/1.1/creator> <http://.../CMCS/People/DrY>
• This structure may look trivial but is
useful in expressing queries (more
later).
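As a rough illustration, a triple can be held in a program as a plain 3-tuple and printed one statement per line. The sketch below assumes placeholder example.org URIs (the slide's are elided) and two hypothetical helpers, Literal and format_triple. It does no escaping of special characters, so treat it as a sketch rather than a serializer.

```python
DC = "http://purl.org/dc/elements/1.1/"
ENTRY = "http://example.org/CMCS/Entries/X"    # placeholder URI
PERSON = "http://example.org/CMCS/People/DrY"  # placeholder URI


class Literal(str):
    """Marks an object that is a plain string value rather than a URI."""


def format_triple(subject, predicate, obj):
    """Write subject, predicate, object in that order, one statement per line."""
    rendered = f'"{obj}"' if isinstance(obj, Literal) else f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered} ."


triples = [
    (ENTRY, DC + "creator", PERSON),
    (ENTRY, DC + "title", Literal("H2O")),
]
lines = [format_triple(*t) for t in triples]
print("\n".join(lines))
```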
Creating RDF Documents

Writing RDF XML (or DAML or OWL) by
hand is not easy.
• It’s a good way to learn to read/write, but
after you understand it, automate it.

Authoring tools are available
• OntoMat: buggy
• Protégé: preferred by CGL grad students
• IsaViz: another nice tool with very good
graphics.

You can also generate these
programmatically using Hewlett Packard
Labs’ Jena toolkit for Java.
• This is what I did in the previous example.
What is the Advantage?

So far, properties are just conventional URI names.
• All semantic web properties are conventional assertions about
relationships between resources.
• RDFS and OWL will offer more precise property capabilities.

But there is a powerful feature we are about to explore…
• Properties provide a powerful way of linking different RDF resources ("nuggets" of information).

For example, a publication is a resource that can be described by RDF.
• Author, publication date, and URL are all metadata property values.
• But publications have references that are just other publications.
• DC's "references" property can be used to point from one publication to another.

Publications also have authors.
• An author is more than a name.
• An author is also an RDF resource with a collection of properties: name, email, telephone number, ...
Graph Model Depicting vCard and
DC Linking
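[Figure: graph linking a Dublin Core description to a vCard. The resource http://.../CMCS/Entry/1 has dc:title "H2O" and dc:creator http://.../People/DrY. The creator resource has vcard:EMAIL (an email-address literal) and vcard:N, whose node carries vcard:Given and vcard:Family.]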
What Else Does RDF Do?

Collections: typically used as the object of an
RDF statement
• Bag: unordered collection of resources or literals.
• Sequence: ordered collection of resources or literals.
• Alternative: collection of resources or literals, from
which only one value may be chosen

And that’s about it. RDF does not define
properties, it just tells you where to put them.
• Definitions are done by specific groups for specific fields
(Dublin Core Metadata Initiative, for example).
• RDF Schema provides the rules for defining specific
resource classes and properties.

But the graph model has opened some doors
• Linked querying across data models.
• Reasoning about information
RDF Schema
RDF Schema

RDF Schema is a rules system for building RDF
languages.
• RDF and RDFS are defined in terms of RDFS
• DAML+OIL and OWL are defined by RDFS.

Take our Dublin Core RDF encoding as an
example:
• Can we formalize this process, defining a consistent set
of rules?

Previous example was valid RDF but how do I formalize the
process of writing sentences about creators of entries?
• Can we place restrictions and use inheritance to define
resources?

What really is the value of “creator”? Can I derive it from
another class, like “person”?
• Can we provide restrictions and rules for properties?

How can I express the fact that “title” should only appear
once?
• Current DC encoding in fact is defined by RDFS.
Some RDFS Classes (Subjects and Values)
RDFS: Resource
  The RDFS root element. All other tags derive from Resource.
RDFS: Class
  The Class class. Literals and Datatypes are example classes. Classes consist of entities that share properties.
RDFS: Literal
  The class for holding strings and integers. Literals are dead ends in RDF graphs.
RDFS: Datatype
  A type of data, a member of the Literal class.
RDFS: XMLLiteral
  A datatype for holding XML data.
RDFS: Property
  This is the base class for all properties (that is, verbs).
Some RDFS Properties
subClassOf
  Indicates the subject is a subclass of the object in a statement.
subPropertyOf
  The subject is a subproperty of the property (masquerading as an object).
domain
  Restricts a property to only apply to certain classes of subjects.
range
  Restricts the values of a property to be members of an indicated class or one of its subclasses.
type
  Denotes an instance of a particular class. Actually from RDF, not RDFS.
Sample RDFS: Defining
<Property>
<rdfs:Class rdf:ID="Property">
  <rdfs:isDefinedBy rdf:resource="http://.../some/uri"/>
  <rdfs:label>Property</rdfs:label>
  <rdfs:comment>The class of RDF properties.</rdfs:comment>
  <rdfs:subClassOf rdf:resource="http://.../#Resource"/>
</rdfs:Class>

This is the definition of <Property>, taken from the RDF schema.
The rdf:ID attribute names this nugget.
<Property> has several properties
• <label> and <comment> are self-explanatory.
• <subClassOf> means <Property> is a subclass of <Resource>.
• <isDefinedBy> points to the human-readable documentation.
Property Relationships and Simple
Reasoning

subClassOf:
• Carole is a member of the class <Professor>
• <Professor> is a subclass of
<UniversityEmployee>
• So Carole works for a university.

subPropertyOf:
• Marlon hasSibling Susan
• hasSibling is a subproperty of hasRelative
• So Marlon and Susan are related.

Domain and Range:
• hasSibling applies to animal subjects and
animal objects, so Marlon is a member of the
class <Animal>.
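The three inferences above can be sketched as a small forward-chaining loop in Python. This is only an illustration under stated assumptions: the short names ("type", "subClassOf", and so on) stand in for the full RDF/RDFS URIs, rdfs_closure is a hypothetical helper, and only the rules used on this slide are implemented, not the full RDFS entailment rules.

```python
TYPE, SUBCLASS, SUBPROP = "type", "subClassOf", "subPropertyOf"
DOMAIN, RANGE = "domain", "range"


def rdfs_closure(triples):
    """Apply the slide's rules repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # subClassOf and subPropertyOf chains are transitive.
                if p == p2 and p in (SUBCLASS, SUBPROP) and o == s2:
                    new.add((s, p, o2))
                # An instance of a class is an instance of its superclasses.
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # A statement using a property also holds for its superproperty.
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
                # Domain types the subject; range types the object.
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("Carole", TYPE, "Professor"),
    ("Professor", SUBCLASS, "UniversityEmployee"),
    ("Marlon", "hasSibling", "Susan"),
    ("hasSibling", SUBPROP, "hasRelative"),
    ("hasSibling", DOMAIN, "Animal"),
    ("hasSibling", RANGE, "Animal"),
}
closure = rdfs_closure(facts)
print(sorted(closure - facts))
```

The printed difference holds the derived statements: Carole is a UniversityEmployee, Marlon hasRelative Susan, and both Marlon and Susan are Animals.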
Web Ontology Language
(OWL)
Eeyore: W-O-L. That spells owl.
Owl: Bless my soul! So it does!
(Many Slides Courtesy of Sean
Bechhofer)
What’s an Ontology?

English definitions tend to be vague to
non-specialists
• “A formal, explicit specification of a shared
conceptualization”

Clearer definition: an ontology is a
taxonomy combined with inference rules
• T. Berners-Lee, J. Hendler, O. Lassila

But really, if you sit down to describe a
subject in terms of its classes and their
relationships, you are creating an
Ontology.
RDFS Limitations

RDFS too weak to describe resources in
sufficient detail
• No localised range and domain constraints

Can’t say that the range of hasChild is person when applied
to persons and elephant when applied to elephants
• No existence/cardinality constraints

Can’t say that all instances of person have a mother that is
also a person, or that persons have exactly 2 parents
• No transitive, inverse or symmetrical properties


Can’t say that isPartOf is a transitive property, that hasPart
is the inverse of isPartOf or that touches is symmetrical
Difficult to provide reasoning support
• No “native” reasoners for non-standard semantics
• May be possible to reason via FO axiomatisation
OWL Semantic Layering

Three language "layers":
• OWL Lite
  - A subset of OWL useful for expressing classifications and simple relationships.
• OWL DL (Description Logic)
  - Contains all OWL constructions but with limitations that guarantee computational completeness and decidability.
• OWL Full
  - All OWL constructs with no restrictions but no guaranteed processibility.

Syntactic Layering
• All legal Lite ontologies are legal DL ontologies.
• All legal DL ontologies are legal Full ontologies.
Semantic Layering
• Layers should agree on semantics.

[Figure: nested layers, with Lite inside DL inside Full.]
OWL Lite Synopsis

Built on RDFS, with usual RDFS classes
(see previous table in these slides).
• Includes a special class, <Thing>, that is the
superclass of all OWL classes.
• Built in class <Nothing> that is the most
specific class (has no instances or subclasses).
• Built-in class <Individual> for instances of
classes.




In OWL, properties may apply to either individuals or
to all members of a class.
So <worksForIU> applies to Marlon but not Dave.
Expresses concepts such as equivalent
classes, synonymous properties.
Allows you to assert that properties can be
inverse, transitive, and symmetric.
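To make these property characteristics concrete, here is a small Python sketch that derives new statements from inverse, transitive, and symmetric declarations. It reuses the isPartOf, hasPart, and touches examples from the RDFS Limitations slide; the individuals (wheel, car, fleet, A, B) and the helper apply_property_axioms are invented for illustration, and a real OWL reasoner does far more than this.

```python
def apply_property_axioms(facts, transitive=(), symmetric=(), inverse_of=None):
    """Add statements implied by the declared property characteristics."""
    inverses = dict(inverse_of or {})
    # inverseOf holds in both directions.
    inverses.update({v: k for k, v in list(inverses.items())})
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


facts = {
    ("wheel", "isPartOf", "car"),
    ("car", "isPartOf", "fleet"),
    ("A", "touches", "B"),
}
result = apply_property_axioms(
    facts,
    transitive={"isPartOf"},
    symmetric={"touches"},
    inverse_of={"hasPart": "isPartOf"},
)
print(sorted(result - facts))
```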
Some OWL DL and OWL Full
Extensions

Class Axioms:
• oneOf: a class can be defined by its
members (ex: daysOfWeek defined by
members)

An Enumeration class
• disjointWith

More Boolean Relationships:
• unionOf, complementOf, intersectionOf

Unrestricted cardinality
• Ex: daysOfWeek has cardinality of 7
Differences Between DL and Full

Both DL and Full use the same OWL vocabulary
• See previous slide.

Difference #1: DL classes and properties cannot also be
individuals (instances), and vice versa.
• That is, there is a strict separation between type and
subClassOf.
• So if you use <Merlot> as <rdf:type> of <Wine>, you can’t
subclass <Merlot> to add additional properties in OWL DL.
• “subClass versus instance” decisions should be made based on
the intended use of the ontology.


Don’t make Merlot an instance if you are developing an ontology to
describe your wine collection, which consists of many bottles of
Merlot (instances), and you want to use OWL DL
Difference #2: All DL properties are required to be either
• owl:ObjectProperty: used to connect instances of two
classes.
• owl:DatatypeProperty: used to connect class instances with
XML schema types and RDF literal strings.
• (OWL Full allows us to tag DatatypeProperties as
owl:InverseFunctionalProperty, so we can create a string
literal instance that uniquely identifies a class instance.)
An OWL Example
An Earth Systems Grid
example
(Courtesy of Line Pouchard)
An Example Ontology: Climate
Data


The example shows how to construct a really
simple ontology and instance.
We don’t use it to encode all data but rather to
encode metadata about data files.
• Where is the data file (URI) that has the temperature
associated with this dataset?

Two classes:
• dataset
• Parameter

One property:
• hasParameter


Several parameters: cloud_medium,
bounds_latitude, temperature
Line Pouchard (ORNL) created this for ESG using
Protégé and OilEd.
Let’s Begin

Front matter: OWL ontologies begin with the
<Ontology> header.
• A useful place to put metadata about the document.
• Line uses the Dublin Core to establish authorship.

Next, define two classes: dataset and parameter.
• Class definitions are almost trivial.
• We really state what something is by its properties.


Deep philosophical arguments here, I’m sure.
Most of the work will go into defining the
property, hasParameter.
• Begins on bottom of next slide
• But the full extent of the definition requires a separate
slide.
[Slide: OWL listing with three annotated parts: the ontology header with Dublin Core parameters, the class definitions, and the hasParameter definition.]
Defining hasParameter


hasParameter domain: it applies to the dataset
class.
hasParameter range: it applies to a list of 3 OWL
Things
• Cloud_medium, bounds_latitude, and temperature.
• This is done using the awkward RDF list structure.


“Give me the first of the rest recursively until I get to nil”
These three OWL Things are then defined.
• They are each of type “parameter”

That is, members of the parameter class.
• Each may also be further defined by additional
properties and classes.

Temperature has units, for example, bounds_latitude needs
starting and stopping values in decimal degrees,etc.
• Or it may be out of scope. I may just need to know that
the bounds_latitude for particular dataset is located in
some resource with a specific URI.
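The "first of the rest until nil" structure mentioned above can be sketched in Python as follows. The blank-node labels (_:l1 and so on) and the helper read_rdf_list are assumptions for illustration; rdf:first, rdf:rest, and rdf:nil are the standard RDF list terms, written here as short strings rather than full URIs.

```python
FIRST, REST, NIL = "rdf:first", "rdf:rest", "rdf:nil"

# The three-parameter list, one list cell per item (cell names are made up).
list_triples = [
    ("_:l1", FIRST, "cloud_medium"),
    ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "bounds_latitude"),
    ("_:l2", REST, "_:l3"),
    ("_:l3", FIRST, "temperature"),
    ("_:l3", REST, NIL),
]


def read_rdf_list(triples, head):
    """Follow rdf:rest links from the head cell, collecting each rdf:first."""
    lookup = {(s, p): o for s, p, o in triples}
    items = []
    node = head
    while node != NIL:
        items.append(lookup[(node, FIRST)])
        node = lookup[(node, REST)]
    return items


parameters = read_rdf_list(list_triples, "_:l1")
print(parameters)
```

Six triples to hold three values is what makes the structure awkward to write by hand, which is one more reason to generate it with a tool.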
Parameter:
Cloud_medium
Parameter:
Bounds_latitude
Parameter:
temperature
Finally, Apply It to Something

What is the file PCM.B06.10.dataset1?
• It’s a member of the dataset class, which we
have defined.

What properties does it have?
• bounds_latitude and cloud_medium, as all
such members do.

Where can I get the bounds_latitude for
this data set?
• It’s in the file indicated by the rdf:resource.
OWL Enriched RDF
Metadata about
PCM.B06.10.dataset1
Is It Lite, DL, or Full?

Our ontology example is (at least)
DL because we include the oneOf
property.
OWL Equivalence and Inheritance
<owl:Class rdf:ID="user">
  <owl:equivalentClass rdf:resource="#person"/>
</owl:Class>

Other logical relationships that can be asserted:
• inverseOf
• TransitiveProperty
• SymmetricProperty
• FunctionalProperty
• InverseFunctionalProperty

<owl:Class rdf:about="#magneticSpectrometer">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMagnets"/>
      <owl:allValuesFrom rdf:resource="#Spectrometer"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
Illustration of Inverse Properties
Querying Semantic Data
The Data Access Working
Group (DAWG)
What Is Semantic Querying?


Don’t confuse
querying with
inference.
Querying just means
retrieving data from
Semantic data
models.
• Post a query to the
world of distributed RDF
data nuggets.

For RDF-like
structures, this
amounts to querying
triples

Examples
• Finding an Email
address from a person’s
vCard.
• Searching across
subgraphs: get me the
email of the author of
this document (Dublin
Core + vCard).
• Persistent/scheduled
queries on updates to
several multimedia
databases.
The DAWG Working Group

Unfortunately, there are no standards for
querying RDF, etc.
• There are solutions, like RDQL/SquishQL
• These are just not “official”

The W3C Data Access Working Group
(DAWG) is filling the query gap.
• Formed Feb 2004.

This is a work in progress:
• Use Cases and Requirements:
http://www.w3.org/TR/rdf-dawg-uc/
• BRQL Query Language:
http://www.w3.org/2001/sw/DataAccess/rq23/
A Simple Query

Consider the following RDF triple
• <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "BRQL Tutorial"
• Recall this is equivalent to the sentence "book1 [has] title 'BRQL Tutorial'"
• We may have a large set of such triples in our
data store.

We want to make a query on this data like
this: “What is the title of book1?”
The Query and the Results

We can construct queries on any of the
parts of the triple, such as
SELECT ?title
WHERE { <http://example.org/book/book1>
        <http://purl.org/dc/elements/1.1/title> ?title .
}

This just means "what is the title of book1?"
?title = "BRQL Tutorial"
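What a query engine does with such a pattern can be sketched in a few lines of Python: compare the pattern with every stored triple and bind the ?variables where it fits. The function name match and the second triple in the store are invented for illustration; this is a sketch of the idea, not how BRQL or RDQL is actually implemented.

```python
BOOK1 = "http://example.org/book/book1"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

store = [
    (BOOK1, DC_TITLE, "BRQL Tutorial"),
    # A second, made-up triple so the query has something to skip.
    ("http://example.org/book/book2", DC_TITLE, "Another Book"),
]


def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


answers = match((BOOK1, DC_TITLE, "?title"), store)
print(answers)
```

Putting the variable in the subject or predicate position instead queries those parts of the triple, as in match(("?book", DC_TITLE, "BRQL Tutorial"), store).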
So What?


This was a trivial example in which we
posed a query on the triple’s object, which
was a string.
But the object of the triple may be a URI
(an RDF resource), not just a literal.
• Or we may construct queries against subjects
or verbs of triples.


For complicated graphs, this means that
the query returns a “pointer” to another
section of the graph.
This means that we can make linked
queries that allow us to navigate graphs.
Linked Queries Across Graph
Sections
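[Figure: the vCard and DC graph again, annotated with the question "What is the given name of the creator of Entry 1?" Following dc:creator from http://.../CMCS/Entry/1 leads to http://.../People/DrY, whose vcard:N node carries vcard:Given and vcard:Family. The entry also has dc:title "H2O", and the creator has vcard:EMAIL.]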
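A linked query is just several triple patterns that share variables. The Python sketch below answers the question in the figure by chaining three patterns. The example.org URIs, the blank node "_:name", and the name values are placeholders (the slide elides them), and unify and solve are hypothetical helpers rather than any toolkit's API.

```python
ENTRY = "http://example.org/CMCS/Entry/1"  # placeholder URI
DRY = "http://example.org/People/DrY"      # placeholder URI

graph = [
    (ENTRY, "dc:title", "H2O"),
    (ENTRY, "dc:creator", DRY),
    (DRY, "vcard:N", "_:name"),
    ("_:name", "vcard:Given", "GivenNameOfDrY"),    # made-up value
    ("_:name", "vcard:Family", "FamilyNameOfDrY"),  # made-up value
]


def unify(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    candidate = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if candidate.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return candidate


def solve(patterns, triples, binding=None):
    """Find every binding that satisfies all patterns at once."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    solutions = []
    for triple in triples:
        candidate = unify(patterns[0], triple, binding)
        if candidate is not None:
            solutions.extend(solve(patterns[1:], triples, candidate))
    return solutions


query = [
    (ENTRY, "dc:creator", "?who"),
    ("?who", "vcard:N", "?n"),
    ("?n", "vcard:Given", "?given"),
]
solutions = solve(query, graph)
print([s["?given"] for s in solutions])
```

The first pattern returns a "pointer" (the creator's URI), and the later patterns navigate from it into the vCard part of the graph.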
What If You Can’t Wait?



BRQL is still a work in progress.
If you need something now, there is
Jena’s RDQL.
RDQL allows you to pose triplet
queries similar to BRQL
• Jena has a programming interface that
allows you to construct and execute
these queries against RDF.
Tools for Playing with Things

Jena Toolkit: Java packages from HPLabs
for building Semantic Web applications.
• http://www.hpl.hp.com/semweb/
• Both IsaViz and Protégé use this.

IsaViz: A nice authoring/graphing tool
• http://www.w3.org/2001/11/IsaViz/

Protégé: Another ontology authoring tool
• http://protege.stanford.edu/

SiRPAC
• Allows you to parse RDF, convert RDF/XML into
graphs and triplets.
• http://www.w3.org/RDF/Validator/
Other Tutorials

Original Semantic Grid GGF tutorial
material is here:
• http://www.semanticgrid.org/presentations/ontologies-tutorial/

Beginner and Advanced OWL tutorials are
here:
• http://www.co-ode.org/resources/
• Lectures cover working examples (pizza
ontology) built with Protégé.
• http://www.semanticgrid.org/presentations/ontologies-tutorial/
Advanced OWL Tutorial
Courtesy of Sean Bechhofer
OWL Syntaxes

Abstract Syntax
• Used in the definition of the language and the
DL/Lite semantics

OWL as RDF triples (and thus as, e.g.
RDF/XML or N3)
• the “official” concrete syntax
• mapping rules describe how to translate from
abstract syntax to triples.

XML Presentation Syntax
• XML Schema definition
OWL Ontologies

An OWL ontology consists of a number of
Classes, Properties and Individuals
• All identified via URIs.

Classes
• Have “definitions” providing their characteristics

Properties
• Characteristics such as transitivity or functionality
• Domains and Ranges

Individuals
• Class membership
• Relationships to other individuals
• Concrete values.
XML Datatypes in OWL



OWL supports XML Schema primitive datatypes
Clean separation between ”object” classes and
datatypes
Philosophical reasons:
• Datatypes structured by built-in predicates
• Not appropriate to form new datatypes using ontology
language

Practical reasons:
• Ontology language remains simple and compact
• Implementability not compromised – can use hybrid
reasoner
OWL Class constructors


OWL has a number of operators for
constructing class expressions.
Boolean operators
• and, or, not

Restrictions
• slot fillers with explicit quantification

Enumerated Classes.
• explicit enumerations of the class
members
OWL Class Constructors
Constructor | Example
Classes | Human
intersectionOf (and) | intersectionOf(Human Male)
unionOf (or) | unionOf(Doctor Lawyer)
complementOf (not) | complementOf(Male)
oneOf | oneOf(john mary)
someValuesFrom | restriction(hasChild someValuesFrom Lawyer)
allValuesFrom | restriction(hasChild allValuesFrom Doctor)
minCardinality | restriction(hasChild minCardinality (2))
maxCardinality | restriction(hasChild maxCardinality (2))
OWL Class constructors

The operators have an associated
semantics
• Given in terms of a domain $D$
• and an interpretation function $I$:
  $I : \text{concepts} \to \mathcal{P}(D)$
  $I : \text{properties} \to \mathcal{P}(D \times D)$
  $I : \text{individuals} \to D$
• $I$ is then extended to concept expressions.
OWL Constructor Semantics
Constructor | Example | Semantics
Classes | Human | $I(\text{Human})$
intersectionOf | intersectionOf(Human Male) | $I(\text{Human}) \cap I(\text{Male})$
unionOf | unionOf(Doctor Lawyer) | $I(\text{Doctor}) \cup I(\text{Lawyer})$
complementOf | complementOf(Male) | $D \setminus I(\text{Male})$
oneOf | oneOf(john mary) | $\{I(\text{john}), I(\text{mary})\}$
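For a finite domain, these semantics map directly onto set operations. The Python sketch below assumes a tiny made-up domain D and interpretation I (the individuals sue and rex and the class memberships are invented), and treats each individual name as its own interpretation.

```python
# A made-up finite domain and interpretation, for illustration only.
D = {"john", "mary", "sue", "rex"}
I = {
    "Human": {"john", "mary", "sue"},
    "Male": {"john", "rex"},
    "Doctor": {"mary"},
    "Lawyer": {"sue"},
}


def intersection_of(a, b):
    return I[a] & I[b]


def union_of(a, b):
    return I[a] | I[b]


def complement_of(a):
    # The complement is taken relative to the whole domain D.
    return D - I[a]


def one_of(*individuals):
    # Each individual is assumed to be interpreted as itself.
    return set(individuals)


print(intersection_of("Human", "Male"))
print(union_of("Doctor", "Lawyer"))
print(complement_of("Male"))
print(one_of("john", "mary"))
```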
OWL Constructor Semantics
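Constructor | Example | Semantics
someValuesFrom | restriction(hasChild someValuesFrom Lawyer) | $\{x \mid \exists y.\ \langle x,y \rangle \in I(\text{hasChild}) \wedge y \in I(\text{Lawyer})\}$
allValuesFrom | restriction(hasChild allValuesFrom Doctor) | $\{x \mid \forall y.\ \langle x,y \rangle \in I(\text{hasChild}) \Rightarrow y \in I(\text{Doctor})\}$
minCardinality | restriction(hasChild minCardinality (2)) | $\{x \mid \#\{y : \langle x,y \rangle \in I(\text{hasChild})\} \geq 2\}$
maxCardinality | restriction(hasChild maxCardinality (2)) | $\{x \mid \#\{y : \langle x,y \rangle \in I(\text{hasChild})\} \leq 2\}$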
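The restriction semantics can be checked the same way over a finite domain. In this Python sketch the domain, the hasChild pairs, and the class memberships are all invented for illustration, and the four helper functions are hypothetical names that mirror the constructors in the table.

```python
# Made-up domain and interpretation for illustration.
D = {"ann", "bob", "cy", "dee"}
has_child = {("ann", "bob"), ("ann", "cy"), ("bob", "dee")}
Lawyer = {"bob"}
Doctor = {"bob", "cy"}


def some_values_from(prop, cls):
    """x has at least one prop-successor in cls."""
    return {x for x in D if any((x, y) in prop and y in cls for y in D)}


def all_values_from(prop, cls):
    """Every prop-successor of x is in cls (true when x has none)."""
    return {x for x in D if all(y in cls for y in D if (x, y) in prop)}


def min_cardinality(prop, n):
    return {x for x in D if sum(1 for y in D if (x, y) in prop) >= n}


def max_cardinality(prop, n):
    return {x for x in D if sum(1 for y in D if (x, y) in prop) <= n}


print(some_values_from(has_child, Lawyer))
print(all_values_from(has_child, Doctor))
print(min_cardinality(has_child, 2))
print(max_cardinality(has_child, 2))
```

Note that allValuesFrom also includes individuals with no children at all, since the implication holds vacuously for them.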
OWL Axioms

Axioms allow us to add further statements about
arbitrary concept expressions and properties
• Disjointness, equivalence, transitivity of properties etc.

An interpretation is then a model of the axioms iff
it satisfies every axiom in the ontology.
Axiom | Example | Semantics
EquivalentClasses | EquivalentClass(Man intersectionOf(Human Male)) | $I(\text{Man}) = I(\text{Human}) \cap I(\text{Male})$
DisjointClasses | DisjointClasses(Animal Plant) | $I(\text{Animal}) \cap I(\text{Plant}) = \emptyset$
SameIndividualAs | SameIndividualAs(GeorgeWBush PresidentBush) | $I(\text{GeorgeWBush}) = I(\text{PresidentBush})$
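Checking whether one given interpretation is a model of a set of axioms is a direct translation of the table. The Python sketch below covers only the two class axioms, uses made-up individuals, and represents class expressions as small functions of the interpretation; satisfies and is_model are hypothetical helper names.

```python
def satisfies(interp, axiom):
    """Check one axiom against a finite interpretation (dict of sets)."""
    kind, left, right = axiom
    if kind == "EquivalentClasses":
        return left(interp) == right(interp)
    if kind == "DisjointClasses":
        return not (left(interp) & right(interp))
    raise ValueError(f"unsupported axiom: {kind}")


def is_model(interp, axioms):
    """An interpretation is a model iff it satisfies every axiom."""
    return all(satisfies(interp, axiom) for axiom in axioms)


AXIOMS = [
    ("EquivalentClasses", lambda i: i["Man"], lambda i: i["Human"] & i["Male"]),
    ("DisjointClasses", lambda i: i["Animal"], lambda i: i["Plant"]),
]

# Made-up interpretations: the second puts a non-male human into Man.
good = {
    "Human": {"al", "bo"},
    "Male": {"al", "rex"},
    "Man": {"al"},
    "Animal": {"al", "bo", "rex"},
    "Plant": {"fern"},
}
bad = dict(good, Man={"al", "bo"})

print(is_model(good, AXIOMS), is_model(bad, AXIOMS))
```

Inference in the sense of the next slide quantifies over every model, which is why dedicated reasoners are needed; this sketch only tests a single candidate.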
Basic Inference Tasks

Inference can now be defined w.r.t.
interpretations/models.
• C subsumes D w.r.t. K iff for every model I of K, $I(D) \subseteq I(C)$
• C is equivalent to D w.r.t. K iff for every model I of K, $I(C) = I(D)$
• C is satisfiable w.r.t. K iff there exists some model I of K s.t. $I(C) \neq \emptyset$

Querying knowledge
• x is an instance of C w.r.t. K iff for every model I of K, $I(x) \in I(C)$
• $\langle x,y \rangle$ is an instance of R w.r.t. K iff for every model I of K, $(I(x), I(y)) \in I(R)$
Why Reasoning?

Why do we want it?
• Semantic Web aims at “machine understanding”
• Understanding closely related to reasoning

Given key role of ontologies in the Semantic Web, it will be
essential to provide tools and services to help users:
• Design and maintain high quality ontologies, e.g.:




Meaningful — all named classes can have instances
Correct — captured intuitions of domain experts
Minimally redundant — no unintended synonyms
Richly axiomatised — (sufficiently) detailed descriptions
• Answer queries over ontology classes and instances, e.g.:


Find more general/specific classes
Retrieve annotations/pages matching a given description
• Integrate and align multiple ontologies
Why Decidable Reasoning?


OWL DL constructors/axioms restricted so reasoning is
decidable
Consistent with Semantic Web's layered architecture
• XML provides syntax transport layer
• RDF(S) provides basic relational language and simple
ontological primitives
• OWL DL provides powerful but still decidable ontology
language
• Further layers may (will) extend OWL


Will almost certainly be undecidable
Facilitates provision of reasoning services
• Known “practical” algorithms
• Several implemented systems
• Evidence of empirical tractability

Understanding dependent on reliable & consistent
reasoning
Other Links
XML Primer
General characteristics of XML
Basic XML






XML consists of human
readable tags
Schemas define rules for a
particular dialect.
XML Schema is the root,
defines the rules for
making other XML
schemas.
Tree structure: tags must
be closed in reverse order
that they are opened.
Tags can be modified by
attributes
• name, minOccurs
Tags enclose either strings
or structured XML
<complexType name="FaultType">
  <sequence>
    <element name="FaultName" type="xsd:string" />
    <element name="MapView" />
    <element name="CartView" />
    <element name="MaterialProps" minOccurs="0" />
    <choice>
      <element name="Slip" />
      <element name="Rate" />
    </choice>
  </sequence>
</complexType>
Namespaces and URIs



XML documents can be
composed of several
different schemas.
Namespaces are used to
identify the source schema
for a particular tag.
• Resolves name conflicts ("full path")
Values of namespaces are URIs.
• URIs are just structured names.
  - May point to something not electronically retrievable
• URLs are special cases.
<xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:gem="http://commgrids.indiana.edu/GCWS/Schema/GEMCodes/Faults">
  <xsd:annotation>
    …
  </xsd:annotation>
  <gem:fault>
    …
  </gem:fault>
</xsd:schema>
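The "full path" idea is easy to see from a parser's point of view. In this Python sketch, two made-up vocabularies (the example.org namespace URIs and the element contents are assumptions) both define a tag called fault, and the standard-library parser keeps them apart by expanding each prefix to its namespace URI.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define a <fault> tag.
DOC = """<root xmlns:geo="http://example.org/geology"
      xmlns:sw="http://example.org/software">
  <geo:fault>strike-slip</geo:fault>
  <sw:fault>null pointer</sw:fault>
</root>"""

root = ET.fromstring(DOC)
# ElementTree reports each tag as "{namespace-uri}local-name".
tags = [child.tag for child in root]
print(tags)
```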
Metadata and the Dublin
Core
Define metadata and describe
its use in physical and
computer science.
What is Metadata?


Common definition: data about data
“Traditional” Examples
• Prescriptions of database structure and contents.
• File names and permissions in a file system.
• HDF5 metadata: describes scientific/numerical data set
characteristics such as array sizes, data formats, etc.


Metadata may be queried to learn the
characteristics of the data it describes.
Traditional metadata systems are functionally
tightly coupled to the data they describe.
• Prescriptive, needed to interact directly with data.
Descriptive Metadata and the Web

Traditional metadata concepts must be extended
as systems become more distributed, information
becomes broader
• Tight functional integration not as important
• Metadata used for information, becomes descriptive.
• Metadata may need to describe resources, not just data.

Everything is a resource
• People, computers, software, conference presentations,
conferences, activities, projects.

We’ll next look at several examples that use
metadata, featuring
• Dublin Core: digital libraries
• CMCS: chemistry
Thought Experiment: Construct
Your Own Metadata Set


Describe yourself: your occupation, your
interests, your place of residence, your
parents, spouse, children,….
Take each sentence:
• The verbs become properties
• The verbs’ objects are property values.


Metadata is just a collection of these
name/value pairs.
For particular fields (like publishing), we
can define a conventional set of property
names.
The Dublin Core: Metadata for
Digital Libraries

The Dublin Core is a set of simple
name/value properties that can describe
online resources.
• Usually Web content but generally usable
(CMCS)
• Intended to help classify and search online
library resources.
• Digital library card catalog.


DC elements may be either embedded in
the data or in a separate repository.
Initial set defined by 1995 Dublin, Ohio
meeting.
Dublin Core Elements

Content elements:
• Subject, title, description, type, relation,
source, coverage.

Intellectual property elements:
• Contributor, creator, publisher, rights

Instantiation elements:
• Date, format, identifier, language

In RDF, these are called properties.
Encoding the Dublin Core


DC elements are independent of the
encoding syntax.
Rules exist to map the DC into
• HTML
• RDF/XML

We provide more detailed info on
RDF/XML encoding in this seminar.
Sample RDF/HTML
<head>
<title>Expressing Dublin Core in HTML/XHTML
meta and link elements</title>
<meta name="DC.title" content="Expressing
Dublin Core in HTML/XHTML meta and link
elements" />
<meta name="DC.creator" content="Andy
Powell, UKOLN, University of Bath" />
<meta name="DC.type" content="Text" />
</head>
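A DC-aware application has to pull these meta elements back out of the page. The following Python sketch does that with the standard-library HTML parser; the class name DublinCoreExtractor is made up for illustration, and it only looks at meta elements whose name starts with "DC.".

```python
from html.parser import HTMLParser

PAGE = """
<head>
<title>Expressing Dublin Core in HTML/XHTML meta and link elements</title>
<meta name="DC.title" content="Expressing Dublin Core in HTML/XHTML meta and link elements" />
<meta name="DC.creator" content="Andy Powell, UKOLN, University of Bath" />
<meta name="DC.type" content="Text" />
</head>
"""


class DublinCoreExtractor(HTMLParser):
    """Collects DC.* meta elements as property name/value pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            # Drop the "DC." prefix to get the Dublin Core element name.
            self.properties[name[3:]] = attributes.get("content", "")


parser = DublinCoreExtractor()
parser.feed(PAGE)
dc = parser.properties
print(dc)
```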
Where Do I Put the Dublin Core
Metadata?

Dublin core elements may be placed
directly in HTML pages.
• Still need DC-aware crawlers or
applications to find and use them.

Or you may have a large database
of DC entries that are used by DC-aware applications.
• We’ll examine a WebDAV-based scheme
for chemistry in a second.
Dublin Core Element Refinements



Many of these exist, and the set is extensible.
See http://dublincore.org/documents/dcmi-terms/ for the comprehensive list of elements and refinements.
Examples:
• isVersionOf, hasVersion, isReplacedBy,
references, isReferencedBy.
OWL DL

Use of OWL vocabulary restricted
• Can’t be used to do “nasty things” (i.e.,
modify OWL)
• No classes as instances

Standard DL/FOL model theory (definitive)
• Direct correspondence with (first order) logic
• Reasoning via DL engines
  - Some problems with oneOf/inverse
• Reasoning for full language via FOL engines
  - Would need built in datatypes for performance
OWL Full

No restriction on use of OWL
vocabulary
(as long as legal RDF)
• Classes as instances
• Assertions about vocabulary

RDF style model theory
• Reasoning using FOL engines via
axiomatisation
• Semantics should correspond with
OWL DL for suitably restricted KBs
XML for Knowledge Representation
1. Definition of self-describing data in
worldwide standardized, non-proprietary
format.
2. Structured data and knowledge exchange
for enterprises in various industries.
3. Integration of information from different
sources to uniform documents.

Exchange of knowledge bases between
different AI languages, knowledge bases
and databases, application systems, etc.

But….
History: From RDF to OWL

Two languages developed by extending (part of) RDF
• OIL: developed by group of (largely) European researchers
• DAML-ONT: developed by group of (largely) US researchers
(in DARPA DAML programme)

Efforts merged to produce DAML+OIL
• Development was carried out by “Joint EU/US Committee on
Agent Markup Languages”
• Extends (“subset” of) RDF

DAML+OIL submitted to W3C as basis for standardisation
• Web-Ontology (WebOnt) Working Group formed
• WebOnt group developed OWL language based on DAML+OIL
• OWL language now a W3C Recommendation (Feb 2004)
RDFS Takeaway

RDFS defines a set of classes and properties that
can be used to define new RDF-like languages.
• RDFS actually bootstraps itself.


You can express inheritance, restriction
If you want to learn more, see the specification
• http://www.w3.org/TR/2003/WD-rdf-schema-20030123/

But don’t trust the write-up:
• Concepts are best understood by looking at the
RDF/XML. English descriptions get convoluted.

If you want to see RDFS in action, see the DC:
• http://dublincore.org/2003/03/24/dces#
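To make "you can express inheritance" concrete, here is a small sketch of the one inference that rdfs:subClassOf licenses: an instance of a class is also an instance of every superclass. This is plain Java rather than an RDF toolkit; the class name, method names, and the chemistry class names in the example are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: rdf:type plus rdfs:subClassOf, with transitive type inference. */
class RdfsSubclassSketch {
    // class -> its directly declared superclasses (rdfs:subClassOf)
    private final Map<String, Set<String>> superClasses = new HashMap<>();
    // instance -> its directly asserted classes (rdf:type)
    private final Map<String, Set<String>> types = new HashMap<>();

    void addSubClassOf(String sub, String sup) {
        superClasses.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    void addType(String instance, String cls) {
        types.computeIfAbsent(instance, k -> new HashSet<>()).add(cls);
    }

    /** Asserted classes plus everything reachable by following subClassOf links. */
    Set<String> inferredTypes(String instance) {
        Set<String> result = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>(
                types.getOrDefault(instance, Collections.emptySet()));
        while (!todo.isEmpty()) {
            String cls = todo.pop();
            // add() is false for classes already seen, so cycles terminate.
            if (result.add(cls)) {
                todo.addAll(superClasses.getOrDefault(cls, Collections.emptySet()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RdfsSubclassSketch kb = new RdfsSubclassSketch();
        kb.addSubClassOf("ChemicalDataSet", "DataSet");   // hypothetical classes
        kb.addSubClassOf("DataSet", "Resource");
        kb.addType("entry42", "ChemicalDataSet");          // hypothetical instance
        System.out.println(kb.inferredTypes("entry42"));
    }
}
```

Only the subclass rule is modelled; domain, range, and subPropertyOf would need further rules of the same shape.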
Web Ontology Language
Requirements
Desirable features identified for Web Ontology Language:

Extends existing Web standards
• Such as XML, RDF, RDFS

Easy to understand and use
• Should be based on familiar KR idioms

Of “adequate” expressive power

Formally specified

Possible to provide automated reasoning support
Short History of Description
Logics
Phase 1:
• Incomplete systems (Back, Classic, Loom, . . . )
• Based on structural algorithms
Phase 2:
• Development of tableau algorithms and complexity
results
• Tableau-based systems for Pspace logics (e.g., Kris,
Crack)
• Investigation of optimisation techniques
Phase 3:
• Tableau algorithms for very expressive DLs
• Highly optimised tableau systems for ExpTime logics
(e.g., FaCT, DLP, Racer)
• Relationship to modal logic and decidable fragments of
FOL
Latest Developments
Phase 4:
• Mature implementations
• Mainstream applications and Tools

Databases
• Consistency of conceptual schemata (EER, UML etc.)
• Schema integration
• Query subsumption (w.r.t. a conceptual schema)

Ontologies and Semantic Web (and Grid)
• Ontology engineering (design, maintenance,
integration)
• Reasoning with ontology-based markup (meta-data)
• Service description and discovery
• Commercial implementations: Cerebra system
from Network Inference Ltd
What Does This Have to Do with
Grid Computing?

RDF resources aren’t just web pages
• Can be computer codes, simulation and
experimental data, hardware, research groups,
algorithms, ….

Consider the CMCS chemistry example:
CMCS needed to describe the
provenance, annotation, and curation of
chemistry data.
• Compound X’s properties were calculated by
Dr. Y.


CMCS maps all of their metadata to the
Dublin Core.
The Dublin Core is encoded quite nicely as
RDF.
vCard: Representing People with
RDF Properties

The Dublin Core tags are best used to represent
metadata about “published content”
• Documents, published data

vCards are an IETF standard for representing
people
• Typical properties include name, email, organization
membership, mailing address, title, etc.
• See http://www.ietf.org/rfc/rfc2426.txt

Like the DC, vCards are independent of (and
predate) RDF but map naturally into RDF.
• Each vCard property maps naturally to an RDF property
• See http://www.w3.org/TR/2001/NOTE-vcard-rdf-20010222/
Example: A vCard in RDF/XML
<rdf:RDF
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'>
<rdf:Description rdf:about='http://cgl.indiana.edu/people/GCF'
vcard:EMAIL='[email protected]'>
<vcard:FN>Geoffrey Fox</vcard:FN>
<vcard:N
vcard:Given='Geoffrey'
vcard:Family='Fox'/>
</rdf:Description>
</rdf:RDF>
Linking vCard and Dublin Core
Resources


The real power of RDF is that you can link two
independently specified resources through the
use of properties.
We do this using URIs as universal pointers
• Identify specific resources (nouns) and specifications for
properties (verbs)
• The URIs may optionally be URLs that can be used to
fetch the information.

Linking these resource nuggets allows us to pose
queries like
• “What is the email address of the creator of this entry in
the chemical database?”
• “What other entries reference this data entry,
directly or indirectly?”

Linkages can be made at any time
• Don’t have to be designed into the system
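The two sample queries can be sketched over a toy in-memory triple list. This is not Jena; it is a hypothetical few-line store meant only to show how following property URIs links a DC-described entry to a vCard-described person. The example.org URIs and the email address are invented; the property URIs are dc:creator, dcterms:references, and the vCard EMAIL property used in the deck.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: link DC and vCard descriptions through shared URIs and query across them. */
class LinkedResourcesSketch {
    static final String DC_CREATOR = "http://purl.org/dc/elements/1.1/creator";
    static final String DCT_REFERENCES = "http://purl.org/dc/terms/references";
    static final String VCARD_EMAIL = "http://www.w3.org/2001/vcard-rdf/3.0#EMAIL";

    // Each statement is {subject, predicate, object}.
    private final List<String[]> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new String[] {s, p, o});
    }

    /** All objects of statements with the given subject and predicate. */
    List<String> objects(String s, String p) {
        List<String> out = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(s) && t[1].equals(p)) out.add(t[2]);
        }
        return out;
    }

    /** "What is the email address of the creator of this entry?" */
    List<String> creatorEmails(String entry) {
        List<String> emails = new ArrayList<>();
        for (String creator : objects(entry, DC_CREATOR)) {
            emails.addAll(objects(creator, VCARD_EMAIL));
        }
        return emails;
    }

    /** "What other entries reference this entry, directly or indirectly?" */
    Set<String> referencedBy(String target) {
        Set<String> found = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(target);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            for (String[] t : triples) {
                // found.add() is false for entries already visited, so cycles terminate.
                if (t[1].equals(DCT_REFERENCES) && t[2].equals(current) && found.add(t[0])) {
                    todo.add(t[0]);
                }
            }
        }
        return found;
    }

    /** Hypothetical sample data: three entries and one person. */
    static LinkedResourcesSketch sample() {
        LinkedResourcesSketch g = new LinkedResourcesSketch();
        g.add("http://example.org/chem/entry1", DC_CREATOR, "http://example.org/people/drY");
        g.add("http://example.org/people/drY", VCARD_EMAIL, "dry@example.org");
        g.add("http://example.org/chem/entry2", DCT_REFERENCES, "http://example.org/chem/entry1");
        g.add("http://example.org/chem/entry3", DCT_REFERENCES, "http://example.org/chem/entry2");
        return g;
    }

    public static void main(String[] args) {
        LinkedResourcesSketch g = sample();
        System.out.println(g.creatorEmails("http://example.org/chem/entry1"));
        System.out.println(g.referencedBy("http://example.org/chem/entry1"));
    }
}
```

Note that the person statements could be added long after the entry statements; nothing in the entry description has to anticipate them.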
A Simple Jena RDQL Example
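// Package names assume the legacy com.hp.hpl.jena releases.
import java.io.FileReader;
import com.hp.hpl.jena.mem.ModelMem;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdql.*;

// Load the RDF/XML file into an in-memory model.
Model model = new ModelMem();
model.read(new FileReader("a.rdf"), "");  // second argument is the base URI
// Select each resource ?x with a vCard EMAIL property, and the property's value.
String queryString = "SELECT ?x, ?fname WHERE "
    + "(?x, <http://www.w3.org/2001/vcard-rdf/3.0#EMAIL>, ?fname)";
Query query = new Query(queryString);
query.setSource(model);
QueryExecution qe = new QueryEngine(query);
QueryResults results = qe.exec();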
Building Semantic Markup
Languages

XML essentially defines syntax rules for markup languages.
• “Human readable” means humans provide meaning

We also would like some limited ability to encode meaning
directly within markup languages.

The semantic markup languages attempt to do that, with
increasing sophistication.

Stack indicates direct dependencies: OWL is defined in terms
of RDF, RDFS.
Eric Miller,
http://www.w3.org/2002/Talks/www2002-w3ct-swintro-em/
Other Semantic Markup Languages

RDF Schema (RDFS):
• Provides formal definitions of RDF
• Also provides language tools for writing more
specialized languages.
• We’ll examine in more detail.

DARPA Agent Markup Language (DAML):
• DAML+OIL is the language component of the
DAML project.
• Defined using RDF/RDFS.

Web-Ontology Language (OWL):
• Developed by the W3C’s Web-Ontology
Working Group
• Based on/replaces DAML+OIL
What Are Description Logics?

A family of logic based Knowledge Representation
formalisms
• Descendants of semantic networks and KL-ONE
• Describe domain in terms of concepts (classes), roles
(relationships) and individuals

Distinguished by:
• Formal semantics (typically model theoretic):
decidable fragments of FOL; closely related to
Propositional Modal & Dynamic Logics
• Provision of inference services:
sound and complete decision procedures for key problems;
implemented systems (highly optimised)