The Semantic Web -- an overview

Download Report

Transcript The Semantic Web -- an overview

The Semantic Web
-- an overview --
Dr Yuri A. Tijerino
Computer Science Department
Brigham Young University
The Book of Genesis tells of a great tower built by men
not only from fear of a second Flood but above all “to
make a name for themselves”. Gods’ punishment was
the Babylonian confusion of tongues, with men unable
to understand each other, the result being that the
tower was never finished.
Today’s Web
 information overload
 massive, heterogeneous data sets
 unstructured documents (e.g. e-mails)
 lack of context and meaning
 new forms of content
 software programs, sensors, ambient
devices …
 blur between content & services
 mixing Web content with “smart”
executables
Limitations of the Web Today…
The Semantic Web
 Tim Berners-Lee
 “an extension of the current web in
which information is given welldefined meaning, better enabling
computers and people to work in
cooperation”
 An open platform allowing
information to be shared and
processed automatically
 adding context and structure via
metadata
The Agent Computing Paradigm
 The old way of thinking about computer programs: a
program




begins executing
takes input
gives output
finishes executing
 The new way: programs
 interact with each other
 are always active
 should be robust (ie, able to deal
with the unexpected)
From Agents to Knowledge Markup
 Almost everything we need to know is on the
web.
 What a great resource for agents!
 But … Agents don’t understand web pages.
 Natural Language processing is too
hard for computers, and will remain so
for a long time.
 The solution: Knowledge Markup.
Knowledge Markup in a Nutshell
 A web page describes objects.
 Datasets, human beings, services, items for sale,
etc.
 The semantics of an object are defined by the place
it occupies in some domain ontology.
 The basic idea of knowledge markup is to use
ontologies to markup a web page according to the
location its objects occupy in the ontology.
 Essentially, knowledge markup is knowledge
representation done in ontologies.
Benefits of Knowledge Markup
 Agents can parse a page, and immediately
understand its semantics.
 No need for natural language processing.
 Searches can be done on concepts. The inheritance
mechanisms of the back-end knowledge base obviate
the need for keywords.
 Data and knowledge sharing.
Knowledge Markup Example (Hypothetical)
 You ask the system “Show me all universities
near the beach.”
 The UCLA page doesn’t say anything about
the beach, but it does say (through
knowledge markup) that it’s near the Pacific
Ocean.
 UCLA makes use of a geography ontology
which includes the rule “Ocean(x)
hasBeaches(x)”.
 When your search agent parses the UCLA
page, it loads in the relevant ontologies,
deduces that UCLA is near the beach, and
returns the page.
Semantic Web Vision
XML is a first step
Semantic markup
HTML  layout
XML  meaning
Metadata
within documents
not across documents
XML example
<play>
<title>The Life and Death of King John</title>
<Dramatis Personae>
<persona>The Earl of PEMBROKE</persona>
<persona>The Earl of ESSEX</persona>
……
</Dramatis Personae>
<Stagedir>SCENE England, the Court.</Stagedir>
<act>Act 1
<scene>Scene I.
<speech>
<speaker>JOHN</speaker>
<line>Now, Chatillon, what would France with us?</line>
</speech>
Resource Description Framework
(RDF)
A standard of W3C
Relationships between documents
(or parts of documents)
Can be an XML application
Consisting of triples or sentences:
subject
property or predicate (“verb”)
object
RDF & RDFS used to define ontologies
A simple example
“Tolkein wrote ‘The Lord of the Rings’ ”
hasWritten
(‘http://www.famouswriters.org/tolkein/’,
‘http://www.books.org/ISBN00001047582/’)
RDF Schema
 RDF Schema is a frame based language
used for defining RDF vocabularies.
 Introduces properties
rdfs:subPropertyOf and rdfs:subClassOf
 Defines semantics for inheritance and
transitivity.
 Introduces notions of rdfs:Domain and
rdfs:Range
 Also provides
rdfs:ConstraintProperty
RDF Schema Lexicon
from www.w3.org/TR/rdf-schema/
The Recapitulation of AI Research
 The last 5 years have seen a recapitulation of
40 years of AI history.
 Data Structures XML
 Semantic Networks RDF
 Early Frame Based Systems RDFS
As a mechanism for metadata encapsulation,
RDFS works just fine. But it is unsuited for
general purpose knowledge representation.
This is where the AI community steps in,
saying, essentially, “We know how to do this;
please let us help.”
What are Ontologies?
Ontologies provide a shared and
common understanding of a domain
a (shared) specification of a
conceptualisation
concept map
a simple example - Yahoo
Business&Economy > Finance > Banking
universal, non-consensual, manual,
changes slowly
for WWW, defined using RDF-Schema
(RDFS)
A History of Knowledge Representation

Knowledge representation (KR) is the branch of artificial
intelligence (AI) that deals with the construction,
description and use of ontologies.
 How do we model a domain for input into the
machine?

Ontology is the branch of philosophy that answers the
question
“what is there?”
 Some big names in ontology: Parmenides, Plato,
Aristotle, Kant, Pierce, Husserl

For a program to reason, it must have a conceptual
understanding of the world. This understanding is
provided by us. Thus we have to answer questions that
we’ve been considering for several thousand years.

Today, in computer science, an ontology is typically a
hierarchical collection of classes, permissible relationships
amongst those classes, and inference rules
Ontology as Taxonomy
Computer Networks
Distributed Systems
Network
OS
Client/
Server
Network Architectures
ATM
Wireless
Communication
ISDN
Ontology of People and their Roles
Employee
Manager
Expert
advises
Programme Mgr
funds
Project Mgr
Contractor
Analyst
Ontology and Logic
Reasoning over ontologies
Inferencing capabilities
X is author of Y  Y is written by X
X is supplier to Y; Y is supplier to Z

X and Z are part of the
same supply chain
Based on Description Logic
research
Example Ontology (Scientific Pedagogy)
Classes
Experimental science (ES)
Theoretical science (TS)
Good Teaching Example (GTE)
Relationships
Motivates
A particular instance of TS may motivate
an instance of ES.
Demonstrates
A particular instance of ES may
demonstrate an instance of TS.
Inference Rules
ES(X) and TS(Y) and Demonstrates(X,Y) GTE(X,Y)
The DAML Program

DAML: DARPA Agent Markup Language

Defense Advanced Research Agency (DARPA) program


Program Managers: James Hendler, Murray Burke
Begin in August 2000
 Web site: http://www.daml.org/

Goal: achieve semantic interoperability between Web pages,
databases, programs, and sensors

Integration contractor and 16 technology development teams


MIT (Tim Berners-Lee, Ben Grosof)

Stanford (Gio Weiderhold, Richard Fikes, Deborah McGuinness)

UMBC (Tim Finin)

U West Florida (Pay Hayes)

…
Yale (Drew McDermott)
 Cycorp (Doug Lenat)
 Nokia (Ora Lassila)
 Teknowledge (Bob Balzer)
Advisors: Ramanthan Guha, Peter Patel-Schneider, …
DAML+OIL = OWL
 A representation language for user-defined ontologies
 An ontology added to RDF and RDF-Schema
 Specification document:
http://www.daml.org/2001/03/daml+oil-index.html
 Expressive power analogous to:
 Description logics (e.g., CLASSIC)
 Monotonic frame languages (e.g., OKBC knowledge model)
 Designed in collaboration with the European Community
Designers of the Ontology Inference Layer (OIL)
 Basis for Web Ontology Language (OWL), the candidate W3C
standard
DAML+OIL Classes
Thing
Restriction
List
Ontology
AbstractProperty
TransitiveProperty
DatatypeProperty
UniqueProperty
UnambiguousProperty
Nothing
DAML+OIL Properties
 Equivalence
equivalentTo, sameClassAs,
samePropertyAs
 Lists
first, rest, item
 Properties
inverseOf
 Ontologies
versionInfo, imports
 Classes
disjointWith
 Defining Non-primitive classes
unionOf, disjointUnionOf, intersectionOf,
complementOf, oneOf
 Restrictions
onProperty, toClass, hasValue, hasClass,
hasClassQ
minCardinality, maxCardinality,
cardinality
minCardinalityQ, maxCardinalityQ,
cardinalityQ
Property Restrictions on Classes
<Class ID = "Person">
<comment> Person is a subclass of objects whose parents are persons. </comment>
<rdfs:subClassOf>
<daml:Restriction>
<daml:onProperty rdf:resource = “#hasParent” />
<daml:toClass rdf:resource = “#Person” />
</daml:Restriction>
</rdfs:subClassOf>
<comment > Person is a subclass of resources that have one father. </comment>
<rdfs:subClassOf>
All objects all
<daml:Restriction>
of whose
<daml:onProperty rdf:resource = “#hasFather” /> parents are
persons
All objects that
have exactly 1
father
<daml:cardinality> 1 </daml:cardinality>
</daml:Restriction>
</rdfs:subClassOf>
Person
Comments on DAML+OIL (OWL)
 Expressive power of a description logic
 Representation language for both classes and instances
 Additional expressive power needed (at least FOL)
 No rationale for excluding any axiom from an ontology that is
–
 Not a tautology
 Satisfied by the intended interpretation of the ontology
 Example of need for additional expressive power
“The magnitude of a physical quantity in a given unit of measure”
(=>
(AND (Quantity-Magnitude ?q ?u ?m)
(Quantity-Dimension ?q ?d))
(AND (type Physical-Quantity ?q) (type Unit-Of-Measure ?u)
(type Magnitude ?m) (Unit-Dimension ?u ?d)))
 May be too difficult for the Web community to understand
 Acceptance will be depend on user-friendly tools
 Ok to support development of Semantic Web technology
Issues Facing OWL: Need for Really Good
Annotation Tools
 OWL is not meant to be read or written by
human beings.
 Humans will make assertions through
intuitive user interfaces, which will
generate the appropriate OWL markup.
 In fact, the markup should “fall out” of the
activity of building a web page.
 This requires some thought.
Proof and Trust
Example 1: Focused Crawling
 Special purpose search engines will increasingly
replace all-purpose engines.
 The notion of an all-purpose search engine is
yielding to that of special-purpose engines.
 Such engines do not want to index irrelevant pages.
 Current “focused crawling” techniques employ
heuristics based on text mining, and collaborative
filtering.
 A cleaner approach would be for web sites to
describe themselves with RDF or DAML.
 An entire site map could be expressed in RDF, along
with metadata descriptions of each node in the
map.
 An agent would know precisely which of the site’s
pages are worth checking out.
Example 2: Indexing the Hidden Web
 Search engines – google, infoseek, etc. –
work by constantly crawling the web, and
building huge indexes, with entries for every
word encountered.
 But a lot of web information is not linked to
directly. It is “hidden” behind forms.
 eg www.allmovies.com allows you to search
a vast database of movies and actors. But it
does not link to those movies and actors.
You are required to enter a search term.
 A web-spider, not knowing how to interact
with such sites, cannot penetrate any deeper
than the page with the form.
Indexing the Hidden Web (Contd.)
 Now imagine that allmovies.com had some DAML
attached, which said
“I am allmovies.com. I am an interface to a vast
database of movie and actor information. If you input
a movie title into the box, I will return a page with the
following information about the movie: … If you input
an actor name, I will return a page with the following
information about the actor: …”
Indexing the Hidden Web (Contd.)
 An OWL aware spider can come to such a page and do
one of two things:
 If it is a spider for a specialized search engine, it may
ignore the site altogether.
 If not, it can say to itself: “I know some movie titles.
I’ll input them (being careful not to overwhelm the
site), and index the results (and keep on spidering
from the result pages).
 At the least, the search engine can record the fact
that
“www.allmovies.com/execperson?name=x” returns
information about the actor with name x.
Example 3: Knowledge Sharing/Corporate Memory
 Our problem: The Church is growing, and various
organizations, departments and divisions need to
collaborate and share knowledge.
 The wheel often gets reinvented.
 Our proposal:
 Build an ontology which captures gospel, family
history, education and other relevant knowledge.
 Mark up talks, scriptures, curriculum materials, etc.
according to this ontology.
 Harvest the information with OWL aware webcrawlers.
 Build OWL aware query agents.
Example 3 (Contd.)
 Leaders and members should be able to tell
the query agent the current form of their
data (e.g. a articles and a subject), their
desired output (e.g. all other related
lessons prepared by others), and get back
the series of available lessons and advice
necessary to prepare the lesson.
 We also have a chicken and egg problem
here.
 Leaders and members don’t want to invest time
in yet another knowledge technology.
 How do we do it?
DAML Example 4: ittalks.org
 www.ittalks.org will be a repository of
information about information technology
(IT) talks given at universities and research
organization across America.
 A user’s information (research interests,
schedule, constraints, etc.) will be stored on
their personal DAML page.
 When a new talk is added, the personal
agents of interested users will be notified.
 The personal agents will determine, based on
schedule, driving time, more refined interest
specifications, etc, whether or not to inform
the user.
ittalks.org (Contd.)
 Example Scenario
 You are going to be in Boston for a few days.
You enter this in your schedule, and you are
automatically notified of several talks, at several
Boston universities, that match your interests.
You select one that you would like to attend. You
get a call on your cell-phone letting you know
when it is time to leave for the talk.
The Road Ahead
 Enormous synergy between KM,
ubiquitous computing, and agents.
 Start Trek, here we come.
 The concept is clear, but many details
need to be worked out.
 Semantic Web systems can be built
incrementally.
 Start small. Even a very modest effort
can massively improve search results.
Conclusions
To conclude:
 The first version of the Web lacked a metadata
framework which was needed to describe
resources
 W3C developed RDF to provide this framework
 As well as providing an framework for metadata
applications, RDF allows software to reach beyond
individual Web sites
 The Semantic Web will be based on registries of
machine-understandable definition
 The Semantic Web standard language (OWL) was
derived from US and EC efforts (DAML+OIL)
 The Semantic Web will be difficult to achieve
 It will be expensive to provide rich interoperable
services without a Semantic Web
Find Out More (1)
Semantic Web, W3C
<http://www.w3c.org/2001/sw/>
Semantic Web Road map, Tim Berners-Lee
<http://www.w3c.org/DesignIssues/Semantic.html>
The Semantic Web, Scientific American
<http://www.sciam.com/2001/0501issue/0501berners
-lee.html>
The Semantic Web Community Portal,
<http://www.semanticweb.org/>
The Semantic Web: A Primer
<http://www.xml.com/pub/a/2000/11/01/semanticwe
b/>
All found using Google to search for “semantic Web”
Find Out More (2)
An introduction to RDF
<http://www-106.ibm.com/developerworks/xml/
library/w-rdf/>
The Semantic Web Community
Portal
<http://www.semanticweb.org/>
The Semantic Web: An
Introduction
<http://infomesh.net/2001/swintro/>