The Semantic Web Made Simple
David Price
December 2004
All Presentation Material Copyright Eurostep Group AB
Agenda
• The Current Web
– and its technologies
– How’s it work now?
• The Semantic Web
– is adding semantics
– How’s it going to work in the future?
The Current Web
• Web core concepts
– People read Web pages
– Web page authors can control basic layout
– Web pages need to link to each other
– Web pages need to link to online media
• that people read, view, listen to or interpret
– People use tools that search/recall Web content (Yahoo,
Google, Lycos, their own bookmarks)
What’s on a Web Page?
[Figure: screenshot of a Web page with callouts: categories of articles; text that’s actually graphics; online shopping link; article title and link; date and time; a photograph; article abstract; location, temperature and unit]
What we saw
• Things on the NY Times site
– Text that’s actually a graphic
– Categories of articles
– Online shopping link
– A photograph
– Article title and link
– Article abstract
– Date and time
– Location, temperature and unit
• How did we know that?
– Because we are humans who can read English and who
can interpret what we see
What did the editors do?
• Determined the layout of the pages as a whole
– Should it look like a real paper? Should there be
advertising?
• Wrote the text
• Decided on navigation
– Article categories called “International”, “National”,
“Sports”, etc.
• Each item in the category list links to a separate page for that
category, with a list of articles
– Users will have to scroll down the page to see the
headline articles
– Article titles will link directly to a separate page for each
article
How did they do that?
• They used HTML and graphic images
• Hypertext Markup Language (HTML) allows
editors to
– control presentation and layout
• Paragraph, Bold, Table/Column/Row
– add links to other pages
• Hyperlink Reference
– show graphics
• Images of many types are natively supported by browsers
– link to other media that have software to present them
• music, video, PDF, documents, presentations
A peek under the covers
How does that work?
• HTML is a standard language
– The World Wide Web Consortium (W3C) standardized it
• Companies have written software that reads
HTML and presents it to you
– These are Web browsers
• The presentation capabilities of HTML, the related
media and browsers are pretty powerful
How does HTML really work?
• What do the browsers understand?
– <P>This is a paragraph.</P>
• Present the text “This is a paragraph.” as a new paragraph
– <A HREF="newsitems.html">News</A>
• Present a hyperlink with the text “News” and, if it’s selected, present
a new page from the file “newsitems.html”
– <TR><TD>dog</TD><TD>cat</TD></TR>
• In the current row of the table, present the text “dog” in column 1
and the text “cat” in column 2
– <IMG SRC="p1.jpg" />
• Present an image from whatever is in the file named “p1.jpg”
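To make the point concrete, here is a minimal sketch of what software "sees" when it reads HTML. It uses Python's standard html.parser module as a stand-in for a browser's parser; the TagLogger class is a hypothetical name made up for this illustration. Notice that the output contains only structure (paragraph, link, text), with nothing saying that "News" refers to news articles.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records the structural events a parser sees (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs; names are lower-cased.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))

    def handle_data(self, data):
        # Skip whitespace-only text between tags.
        if data.strip():
            self.events.append(("text", data.strip(), {}))


parser = TagLogger()
parser.feed('<P>This is a paragraph.</P><A HREF="newsitems.html">News</A>')
parser.close()
for event in parser.events:
    print(event)
```

The parser reports a link to "newsitems.html" but has no way to know what kind of thing lives there.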
So, What’s the problem?
• Only a human being can read a Web page and
extract any meaning from it
– The Web browser does understand paragraph, image,
link
– The Web browser does not know it’s linking to a “News
Article” or the image is a “picture of photographs”
• It’s the meaning that’s really important
• Wouldn’t it be powerful if computers could get
some of the meaning out of Web pages?
Why is it a powerful idea?
• Using our NY Times/newspaper site example…
– Suppose you were an Environmental Group
– Suppose you want to monitor news stories about the
environment or pollution
– You could write a program that searches the Web sites of
media outlets
– That program could trigger a notification about articles
on environmental issues
– Or, it could contact members of your group in specific
locations when it finds legislation related to pollution in
particular US states
– This would save your members a lot of time searching
for themselves, wouldn’t it?
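The scenario above could be sketched roughly as follows, assuming each article already came with a machine-readable description of its subjects and location (which is exactly what today's Web lacks). The ArticleDescription class, the notifications function, the member names and the example.org URLs are all invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ArticleDescription:
    """Machine-readable description of one article (hypothetical fields)."""
    url: str
    subjects: FrozenSet[str]
    us_state: Optional[str] = None  # set when the article concerns one state


def notifications(articles, watched_subjects, member_states):
    """Return sorted (member, url) pairs for articles on watched subjects.

    An article tied to a US state goes only to members in that state;
    an article with no state goes to every member.
    """
    pairs = []
    for article in articles:
        if not article.subjects & watched_subjects:
            continue  # no watched subject on this article
        for member, state in member_states.items():
            if article.us_state is None or article.us_state == state:
                pairs.append((member, article.url))
    return sorted(pairs)


# All data below is invented for illustration.
articles = [
    ArticleDescription("http://example.org/a1",
                       frozenset({"pollution", "legislation"}), "Ohio"),
    ArticleDescription("http://example.org/a2", frozenset({"sports"})),
    ArticleDescription("http://example.org/a3", frozenset({"environment"})),
]
members = {"alice": "Ohio", "bob": "Texas"}
result = notifications(articles, {"environment", "pollution"}, members)
for member, url in result:
    print(member, url)
```

The hard part is not this filtering logic but getting the subject and location descriptions in the first place.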
The Semantic Web
• Figuring out how to get meaning out of things on
the Web using software is what “The Semantic
Web” is all about
– “using software” means “without humans doing the
interpretation”
• How would one do that?
– Clearly, HTML is not sufficient, so more powerful
languages are required
– Clearly, we cannot replace everything already on the Web,
so ways to add meaning are required
– Need to combine better languages/communication,
computer science and the study of what things mean
Semantics
• People have been studying what things exist and
what they mean for centuries
– This is called Philosophy
• People have been studying how people
communicate for decades
– This is called Linguistics
• People have been studying how computers can
“learn” for a few decades
– This is called Artificial Intelligence
The Semantic Web
• Vision of Web “inventor” Tim Berners-Lee and
others
– Wrote an article in Scientific American in 2001
• Goals
– Go beyond processing by human beings
– Make Web content computer processable
• How?
– Add semantics using ontologies
– Use inference/reasoning over ontologies
Ontologies
• Ontology
– A big word from philosophy, linguistics, and computer
science
– A formal, machine readable specification of a domain of
interest
• Names things and adds knowledge about and constraints on
the things
• Allows relationships between terms within and between
different ontologies
• Semantic Web researchers and W3C have been
working on ontology languages for several years now
OWL History
• US researchers produced DAML-ONT in 2000
– DARPA Agent Markup Language – Ontology Language
• European researchers produced OIL about the same time
– Ontology Inference Layer
• The two were merged to produce DAML+OIL, which was submitted as a
Note to W3C; the W3C WebOnt group was formed in 2001
• W3C WebOnt Group produced OWL in 2003
– OWL is now a W3C Recommendation
• This is not really that important for our purposes… just
remember that OWL didn’t appear overnight
What is OWL?
• The World Wide Web Consortium (W3C) created
the HTML and XML standards
• OWL is a next-generation W3C Web standard
– its purpose is to add “semantics” to the Web
• Therefore, it can be distributed and is Web-enabled and does
not assume a single source for everything
– In concept, it is very much like other data modelling
languages (it calls models or schemas “ontologies”)
• class, subclass, property, property type, instance/individual
– supports set theory and logic-based statements about
the classes and individuals
– it has more than one syntax, XML being one
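As a rough illustration of what a logic-based statement about classes buys you, the sketch below follows subclass links to infer extra types for an individual. It is plain Python, not OWL syntax, and the class names (Editorial, Article, Publication) and the inferred_types function are made up for this example.

```python
# Hypothetical subclass statements: each class maps to its direct superclass.
SUBCLASS_OF = {"Editorial": "Article", "Article": "Publication"}


def inferred_types(asserted_type, subclass_of):
    """Return the asserted type followed by every superclass it implies."""
    types = [asserted_type]
    current = asserted_type
    # Walk up the hierarchy; the membership check guards against cycles.
    while current in subclass_of and subclass_of[current] not in types:
        current = subclass_of[current]
        types.append(current)
    return types


print(inferred_types("Editorial", SUBCLASS_OF))
```

Nobody stated that an Editorial is a Publication; software derived it from the two subclass statements. Real OWL reasoners handle far richer statements than this single rule.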
RDF underlies OWL
• RDF is another W3C standard, the Resource
Description Framework
– RDF is simple in concept but sufficient for many basic
Semantic Web tasks (e.g. who created this
presentation?)
– It allows you to assign a property with a value to a Web
page (or any Web resource)
Resource: http://www.eurostep.com/TheSemanticWeb.ppt
Property: Creator
Value: David Price
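The resource/property/value idea can be sketched in a few lines of plain Python. This is not real RDF syntax or an RDF library; the Statement type and values_of function are hypothetical names used only to show the shape of the data and how software could answer "who created this presentation?".

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One resource/property/value statement (illustrative, not RDF syntax)."""
    resource: str
    prop: str
    value: str


statements = [
    Statement("http://www.eurostep.com/TheSemanticWeb.ppt",
              "Creator", "David Price"),
]


def values_of(statements, resource, prop):
    """Return every value recorded for the given resource and property."""
    return [s.value for s in statements
            if s.resource == resource and s.prop == prop]


print(values_of(statements,
                "http://www.eurostep.com/TheSemanticWeb.ppt", "Creator"))
```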
RDF underlies OWL
• RDF is another W3C standard, the Resource
Description Framework
– RDF is simple in concept but sufficient for many basic
Semantic Web tasks (e.g. who created this
presentation?)
– RDF is often represented by nodes and arcs
• Node http://www.eurostep.com/TheSemanticWeb.ppt, arc Creator, node “David Price”
Back to the NY Times
What we saw… again
• Things on the NY Times site
– Text that’s actually a graphic
– Categories of articles
– Online shopping link
– A photograph
– Article title and link
– Article abstract
– Date and time
– Location, temperature and unit
• How did we know that?
– Because we are humans who can read English and who
can interpret what we see
A peek under the semantic covers
[Figure: the Newspaper ontology, with callouts for Article title, Article Authors, Date and Article Subjects]
Without using an editor …
[Figure: Articles and Authors, with the callout “Now these are semantics a software application can understand…”]
On Annotating the Web
• You might ask: But what about the current Web
content? We’re not going to rewrite it all, are we?
• And we’d answer: Of course not, but you can
“annotate” it to add semantics.
• What this means is:
– Descriptive ontologies like the one for Newspapers are
being developed
– Descriptions are then linked to already existing Web
pages, including any multi-media content (e.g. video)
– The Semantic Web community calls this “annotating a
Web resource”
– You’ll also hear people use the term “metadata”
So, How does OWL Work?
• An Ontology
– is a formal description of a field of interest
– defines Classes – the kinds of things of interest
• Article, Person, etc.
– defines Properties – the relationships and characteristics related to
Classes
• Article is WrittenBy Person, Person has Name
• Then, based on the Ontology people create content
– An author writes articles using software that understands the
Newspaper Ontology
– The Publisher gathers all the articles, classifieds, etc. and links
them into the online version of the NY Times
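The Classes and Properties above can be sketched as plain data, with a small check that a statement fits the ontology. This is a simplification in plain Python, not OWL: the individual names (article-1, author-1) and the fits_ontology function are invented, and real OWL reasoners generally use this kind of information to infer new facts rather than to reject data.

```python
# Classes and Properties from the slide, written as plain data.
CLASSES = {"Article", "Person"}
# Property name -> (class it applies to, kind of value it takes).
PROPERTIES = {
    "WrittenBy": ("Article", "Person"),
    "Name": ("Person", "text"),
}

# Hypothetical individuals and the class each belongs to.
individuals = {"article-1": "Article", "author-1": "Person"}


def fits_ontology(subject, prop, value, individuals, properties):
    """Return True if the statement uses the property as the ontology defines it."""
    if prop not in properties:
        return False
    applies_to, value_kind = properties[prop]
    if individuals.get(subject) != applies_to:
        return False
    if value_kind == "text":
        return isinstance(value, str)
    # Otherwise the value must be an individual of the expected class.
    return individuals.get(value) == value_kind


print(fits_ontology("article-1", "WrittenBy", "author-1", individuals, PROPERTIES))
print(fits_ontology("author-1", "WrittenBy", "article-1", individuals, PROPERTIES))
```

The first statement fits (an Article is WrittenBy a Person); the second does not, because a Person cannot be the subject of WrittenBy.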
But how does that help?
• If everyone, or at least a reasonably large
community, agreed on an ontology for
Newspapers
– then sharing articles between sites is possible
– presentation can be layered on top of the semantic
content of the articles
– Web robots, only smarter than Google, can find and
relate content about specific subjects, by specific
authors, etc.
• The key is getting agreement on the ontologies
– This is ongoing in various standards bodies, consortia,
etc. but remains a major issue for the Semantic Web
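One way to picture "presentation layered on top of the semantic content" is a single article record rendered two different ways. The sketch below assumes an agreed set of property names (Title, WrittenBy, Date); the record's values and the as_html and as_text functions are invented for illustration.

```python
from html import escape

# One article described with agreed property names (values are invented).
article = {
    "Title": "Air & Water Rules",
    "WrittenBy": "A. Writer",
    "Date": "2004-12-01",
}


def as_html(article):
    """Render the record for a Web browser."""
    return "<H1>{}</H1><P>By {}, {}</P>".format(
        escape(article["Title"]),
        escape(article["WrittenBy"]),
        escape(article["Date"]),
    )


def as_text(article):
    """Render the same record as one plain-text line."""
    return "{}, by {} ({})".format(
        article["Title"], article["WrittenBy"], article["Date"])


print(as_html(article))
print(as_text(article))
```

Because both sites would agree on what Title and WrittenBy mean, either one could take the other's record and present it in its own layout.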
In Conclusion
• The Semantic Web goal is to make semantic
content of Web pages available for software
applications
• Work has been ongoing for several years
– Building on decades of research
• The OWL language is a key development
– As are the languages upon which it is based, such as
RDF Schema
• But that’s for another day…