Transcript Slide 1

Semantic Web Services
Web Science
Lecture III – 19th March 2009
Dieter Fensel
(contribution from Ioan Toma)
©www.sti-innsbruck.at
Copyright 2008 STI INNSBRUCK www.sti-innsbruck.at
Outline
1. Motivation
2. Web Science
3. Web Evolution
3.1. Web 1.0 – Traditional Web
3.2. Web 2.0
3.3. Web 3.0 – Semantic Web
4. What Web Science should be
Motivation
Motivation
The Web Today
http://www.youtube.com/watch?v=6gmP4nk0EOE
Motivation
“[…] As the Web has grown in complexity and the
number and types of interactions that take place have
ballooned, it remains the case that we know more about
some complex natural phenomena (the obvious example
is the human genome) than we do about this particular
engineered one.”
A Framework for Web Science
T. Berners-Lee, W. Hall, J. A. Hendler, K. O'Hara, N. Shadbolt, and D. J. Weitzner.
Foundations and Trends® in Web Science 1 (2006)
A new science that studies the complex phenomenon called the Web is needed!
Web Science
Web Science definition
A new science that focuses on how huge decentralized
Web systems work.
“The Web isn’t about what you can do with computers. It’s people and, yes,
they are connected by computers. But computer science, as the study of
what happens in a computer, doesn’t tell you about what happens on the
Web.”
Tim Berners-Lee
“A new field of science that involves a multi-disciplinary study and
inquiry for the understanding of the Web and its relationships to us”
Bebo White, SLAC, Stanford University
Shift from how a single computer works to how huge
decentralized Web systems work
Endorsements for Web Science
“Web science represents a pretty big next step in the evolution of
information. This kind of research is likely to have a lot of influence on
the next generation of researchers, scientists and, most importantly,
the next generation of entrepreneurs who will build new companies
from this.”
Eric E. Schmidt, CEO Google
“Web science research is a prerequisite to designing and building
the kinds of complex, human-oriented systems that we are after in
services science.”
Irving Wladawsky-Berger, IBM
Web science – multi-disciplinary approach
http://webscience.org/images/collide.jpg
The Goals of Web Science
• To understand what the Web is
• To engineer the Web’s future and provide
infrastructure
• To ensure the Web’s social benefit
Scientific method
• Natural Sciences such as physics, chemistry, etc. are
analytic disciplines that aim to find laws that generate or
explain observed phenomena
• Computer Science on the other hand is synthetic. It is
about creating formalisms and algorithms in order to
support particular desired behaviour.
• The scientific method of Web science has to be a combination
of these two paradigms
What Could Scientific Theories for the
Web Look Like?
• Some simple examples:
– Every page on the Web can be reached by following fewer than 10
links
– The average number of words per search query is greater than 3
– Web page download times follow a lognormal distribution function
(Huberman)
– The Web is a “scale-free” graph
• Can these statements be easily validated? Are they good
theories? What constitutes good theories about the
Web?
http://webcast.bibalex.org/Presentations/Bebo91108.ppt
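One way to make a statement such as "every page can be reached by following fewer than 10 links" testable is to measure link distances on a crawled sample. The following is a minimal sketch, assuming a hypothetical toy link graph called `links` (it is not real Web data); it uses breadth-first search to find the smallest number of links between pages.

```python
from collections import deque

# Hypothetical toy link graph: page -> pages it links to (not real Web data).
links = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["a", "e"],
    "e": [],
}

def link_distances(graph, start):
    """Breadth-first search: minimum number of links from start to each reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

def max_link_distance(graph):
    """Largest finite shortest-path length over all ordered pairs of pages."""
    return max(max(link_distances(graph, p).values()) for p in graph)

distances_from_a = link_distances(links, "a")
longest = max_link_distance(links)
```

On a real crawl the same measurement would only cover the sampled part of the Web, which is one reason such statements are hard to validate.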
Food For Thought
Electricity : 1800
Electricity Now
What are the analogies for Web Science and Design? Is
our understanding of the Web like that of 1800 electricity?
http://webcast.bibalex.org/Presentations/Bebo91108.ppt
In the rest of this lecture
• Web Evolution
– Web 1.0 - Traditional Web
– Web 2.0
– Web 3.0 - Semantic Web
• Future steps to realize Web science
– Large scale reasoning
– Rethinking Computer Science for the 21st
century
Web 1.0 – Traditional Web
Web 1.0
More than 2 billion users
more than 50 billion pages
Static
WWW
URI, HTML, HTTP
Web 1.0
• The World Wide Web ("WWW" or simply the
"Web") is a system of interlinked, hypertext
documents that runs over the Internet. With a
Web browser, a user views Web pages that may
contain text, images, and other multimedia and
navigates between them using hyperlinks. (Source: Wikipedia)
• The Web was created around 1990 by Tim
Berners-Lee working at CERN in Geneva,
Switzerland.
Web 1.0
• A distributed document delivery system implemented
using application-level protocols on the Internet
• A tool for collaborative writing and community building
• A framework of protocols that support e-commerce
• A network of co-operating computers interoperating using
HTTP and related protocols to form a ‘subnet’ of the
Internet
• A large, cyclical, directed graph made up of Web pages
and links
WWW Components
• Structural Components
– Clients/browsers – two dominant implementations
– Servers – run on sophisticated hardware
– Caches – many interesting implementations
– Internet – the global infrastructure which facilitates data
transfer
• Language and Protocol Components
– Uniform Resource Identifiers (URIs)
– Hyper Text Transfer Protocol (HTTP)
– Hyper Text Markup Language (HTML)
Uniform Resource Identifiers (URIs)
• Uniform Resource Identifiers (URIs) are used
to name/identify resources on the Web
• URIs are pointers to resources to which
request methods can be applied to generate
potentially different responses
• Resources can reside anywhere on the
Internet
• Most popular form of a URI is the Uniform
Resource Locator (URL)
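As a small illustration of how a URL names a resource, the sketch below splits one into its components with Python's standard `urllib.parse` module. The path, query, and fragment in the example URL are made up for illustration.

```python
from urllib.parse import urlsplit

# The URL below is only an illustration; its path, query, and fragment are made up.
uri = "http://www.sti-innsbruck.at/teaching/lecture?id=3#slides"
parts = urlsplit(uri)

scheme = parts.scheme      # protocol used to access the resource
host = parts.netloc        # where the resource resides
path = parts.path          # which resource on that host
query = parts.query        # parameters passed along with the request
fragment = parts.fragment  # position inside the retrieved document
```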
Hypertext Transfer Protocol (HTTP)
• Protocol for client/server communication
– The heart of the Web
– Very simple request/response protocol
• Client sends request message, server replies with response
message
– Provide a way to publish and retrieve HTML pages
– Stateless
– Relies on URI naming mechanism
HTTP Request Messages
• GET – retrieve document specified by URL
• PUT – store specified document under given URL
• HEAD – retrieve info. about document specified by URL
• OPTIONS – retrieve information about available options
• POST – give information (e.g. annotation) to the server
• DELETE – remove document specified by URL
• TRACE – loopback request message
• CONNECT – for use by proxies (tunnelling)
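The request/response exchange can be sketched without any network access. The example below is a simplified illustration, not a full HTTP implementation: the helper names `build_request` and `parse_response` are hypothetical, and the response text is canned rather than received from a server.

```python
def build_request(method, path, host):
    """Format a minimal HTTP/1.1 request: request line, Host header, blank line."""
    return f"{method} {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"

def parse_response(raw):
    """Split a raw HTTP response into status code, header dict, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split(" ")[1])
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status, headers, body

request = build_request("GET", "/index.html", "www.example.org")

# Canned response for illustration; a real server would send this over TCP.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)
status, headers, body = parse_response(raw_response)
```

Because the protocol is stateless, every request has to carry everything the server needs, here the method, the path, and the host.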
HTML
• Hyper-Text Markup Language
– An application of Standard Generalized Markup
Language (SGML)
– Facilitates a hyper-media environment
• Documents use elements to “mark up” or identify sections
of text for different purposes or display characteristics
• Mark up elements are not seen by the user when page is
displayed
• Documents are rendered by browsers
HTML
HTML markup consists of several types of entities,
including elements, attributes, data types, and character
references:
– Document type declaration, referencing a DTD (Document Type Definition):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
– Elements, such as the document element (<html>…</html>) and head elements
(<title>…</title>)
– Attributes: <span id='anId' class='aClass' style='color:red;'
title='HyperText Markup Language'>HTML</span>
– Data types: CDATA, URIs, dates, link types, language codes,
colors, text strings, etc.
– Character references, for referring to rarely used characters:
• "&#x6C34;" (in hexadecimal) represents the Chinese character for
water
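The attribute and character-reference examples above can be checked with Python's standard `html` and `html.parser` modules. This is a minimal sketch; the class name `AttributeCollector` is a hypothetical helper introduced only for this illustration.

```python
from html import unescape
from html.parser import HTMLParser

class AttributeCollector(HTMLParser):
    """Records the attributes of every start tag and the text between tags."""

    def __init__(self):
        super().__init__()
        self.attributes = {}
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.attributes[tag] = dict(attrs)

    def handle_data(self, data):
        self.text.append(data)

markup = ("<span id='anId' class='aClass' style='color:red;' "
          "title='HyperText Markup Language'>HTML</span>")
collector = AttributeCollector()
collector.feed(markup)
collector.close()

span_attributes = collector.attributes["span"]
visible_text = "".join(collector.text)
water = unescape("&#x6C34;")  # resolve the hexadecimal character reference
```

The markup is invisible to the reader of the page: only the text "HTML" is displayed, while the attributes steer presentation and tooltips.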
Web 2.0
Web 2.0
Web 2.0 is a term for a range of interactive and
collaborative systems on the Internet
What is the Web 2.0? "Definition" by O'Reilly
Web 1.0 → Web 2.0
DoubleClick → Google AdSense
Ofoto → Flickr
Britannica Online → Wikipedia
Content websites → blogging
Publishing → participation
CMS → wikis
Directories (taxonomy) → tagging (folksonomy)
Improvements: personalized; tagging, community; community, free; dialogue; flexibility, freedom; community
Consumers → Prosumers
What is the Web 2.0? - Examples
• Gmail
• Google Notebooks (collaborative notepad on the Web)
• Wikis
• Wikipedia
– World's biggest encyclopedia, top-30 website, 100 languages
• Del.icio.us (social tagging for bookmarks)
• Flickr (photo sharing and tagging)
• Blogs, RSS, Blogger.com
• Programmableweb.com: 150 Web APIs
Blogs
• Easy-to-use user interfaces for updating content
• Easy organization of content
• Easy usage of content
• Easy publishing of comments
• Social: collaborative (single users, but strongly connected)
Wikis
• Wiki: invented by Ward Cunningham
• Collection of HTML pages: read and edit
• Most famous and biggest wiki: Wikipedia (MediaWiki)
– But: also often used in intranets (e.g. in our group)
• Problems solved socially instead of technically
• Flexible structure
• Background algorithms + human intelligence
• No new technologies
• Social: collaborative (nobody owns the content)
Wikis: Design Principles
• Open
Should a page be found to be incomplete or poorly organized, any
reader can edit it as they see fit.
• Incremental
Pages can cite other pages, including pages that have not been
written yet.
• Organic
The structure and text content of the site are open to editing and
evolution.
• Mundane
A small number of (irregular) text conventions will provide access to
the most useful page markup.
• Universal
The mechanisms of editing and organizing are the same as those of
writing so that any writer is automatically an editor and organizer.
• Overt
The formatted (and printed) output will suggest the input required to
reproduce it.
Source: http://c2.com/cgi/wiki?WikiDesignPrinciples
Wikis: Design Principles
• Unified
Page names will be drawn from a flat space so that no
additional context is required to interpret them.
• Precise
Pages will be titled with sufficient precision to avoid most
name clashes, typically by forming noun phrases.
• Tolerant
Interpretable (even if undesirable) behavior is preferred
to error messages.
• Observable
Activity within the site can be watched and reviewed by
any other visitor to the site.
• Convergent
Duplication can be discouraged or removed by finding
and citing similar or related content.
Source: http://c2.com/cgi/wiki?WikiDesignPrinciples
Social Tagging
• Idea: Enrich contents by user chosen keywords
• Replace folder-based structure by an organisation using tags
• New: Simple user interfaces for tagging and tag based search
• First steps to Semantic Web?
• Technically: user interfaces
• Social: collaborative (own contents, shared tags)
Collaborative Tagging
Tagging: Flickr.com
Folksonomies
Data created by tagging, knowledge structures
[Figure: graph connecting users, tags, and resources]
Mary tags www.wikipedia.org with wiki wikipedia encyclopedia
Bob tags www.wikipedia.org with wiki web2.0 encyclopedia knowledge
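A folksonomy can be represented as a set of (user, tag, resource) assignments. The sketch below encodes the tag assignments of Mary and Bob stated above and derives two simple views of the data; the variable names are illustrative only.

```python
from collections import Counter, defaultdict

# (user, tag, resource) assignments taken from the Mary and Bob examples.
assignments = [
    ("Mary", "wiki", "www.wikipedia.org"),
    ("Mary", "wikipedia", "www.wikipedia.org"),
    ("Mary", "encyclopedia", "www.wikipedia.org"),
    ("Bob", "wiki", "www.wikipedia.org"),
    ("Bob", "web2.0", "www.wikipedia.org"),
    ("Bob", "encyclopedia", "www.wikipedia.org"),
    ("Bob", "knowledge", "www.wikipedia.org"),
]

# How often each tag was used, across all users and resources.
tag_counts = Counter(tag for _, tag, _ in assignments)

# Which users applied each tag.
users_by_tag = defaultdict(set)
for user, tag, _ in assignments:
    users_by_tag[tag].add(user)

# Tags on which more than one user agrees.
shared_tags = sorted(t for t, users in users_by_tag.items() if len(users) > 1)
```

Tags chosen independently by several users, here "encyclopedia" and "wiki", are the raw material from which shared knowledge structures emerge.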
Tag Clouds
• Size of tags: count of usage
• Browsing replaces searching
• Different meaning for different users
• Orientation in the information set
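The rule "size of tags: count of usage" can be sketched as a linear mapping from usage counts to font sizes. The function name `tag_cloud_sizes`, the size range, and the counts below are assumptions made for illustration; real tag clouds often use logarithmic scaling instead.

```python
def tag_cloud_sizes(counts, min_size=10, max_size=30):
    """Linearly map usage counts to font sizes between min_size and max_size."""
    low, high = min(counts.values()), max(counts.values())
    if high == low:
        return {tag: min_size for tag in counts}
    scale = (max_size - min_size) / (high - low)
    return {tag: round(min_size + (n - low) * scale) for tag, n in counts.items()}

# Hypothetical usage counts, for illustration only.
counts = {"wiki": 5, "web2.0": 3, "knowledge": 1}
sizes = tag_cloud_sizes(counts)
```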
What is the Web 2.0? Trends for Web
Applications
• Technical Evolution
– Web user interfaces become faster (AJAX)
– Desktop shifts to Web (GMail, Google Notebooks, AJAX)
• Social Evolution
– The collective creates additional value (wikis, tagging)
– Free content becomes popular (licenses)
– Attention is being monetized (text ads)
– Websites gain additional value by recombination (mash-ups, RSS)
Semantic Web
From Web to Semantic Web
[Figure: the static WWW (URI, HTML, HTTP) extended to the Semantic Web (RDF, RDF(S), OWL)]
Semantic Web
• If the Web is about the global networking of data through URL,
HTML, and HTTP…
• … the Semantic Web is about the global networking of
knowledge through URI, RDF, and SPARQL
• This knowledge can be an annotation of Web data (this picture
depicts Innsbruck) or just for knowledge's sake (Innsbruck is a city
in Austria)
Semantic Web
• URIs are used to identify resources, not just things that exist on
the Web, e.g. Sir Tim Berners-Lee
• RDF is used to make statements about resources in the form of
triples <entity, property, value>
• With RDFS, resources can belong to classes (my Mercedes
belongs to the class of cars) and classes can be subclasses or
superclasses of other classes (vehicles are a superclass of cars,
cabriolets are a subclass of cars)
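The triple model and the RDFS class hierarchy can be sketched with plain Python tuples. This is an illustration under simplifying assumptions: short strings stand in for URIs, the functions `superclasses` and `types_of` are hypothetical helpers, and only the subclass rule of RDFS is covered.

```python
# Triples as (entity, property, value); short names stand in for URIs.
triples = {
    ("myMercedes", "rdf:type", "Car"),
    ("Car", "rdfs:subClassOf", "Vehicle"),
    ("Cabriolet", "rdfs:subClassOf", "Car"),
    ("Innsbruck", "locatedIn", "Austria"),
}

def superclasses(cls, graph):
    """All direct and indirect superclasses of cls via rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found

def types_of(resource, graph):
    """Asserted classes of a resource plus classes inferred from the hierarchy."""
    asserted = {o for s, p, o in graph if s == resource and p == "rdf:type"}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, graph)
    return asserted | inferred

mercedes_types = types_of("myMercedes", triples)
cabriolet_supers = superclasses("Cabriolet", triples)
```

Nobody stated that the Mercedes is a vehicle; that fact follows from the class hierarchy, which is the kind of inference RDFS adds on top of plain RDF.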
Semantic Web Architecture
• Give URIs to concepts - Each URI
identifies one concept.
• Share these symbols between many
languages
• Support URI lookup
Semantic Web layer cake
URI and XML
• Uniform Resource Identifier (URI) is the dual of
URL on the Semantic Web
– Its purpose is to identify resources
• eXtensible Markup Language (XML) is a markup
language used to structure information
– Foundation of data representation on the Semantic Web
– Tags do not convey semantic information
RDF and OWL
• Resource Description Framework (RDF) is the
dual of HTML in the Semantic Web
– Simple way to describe resources on the Web
– Sort of simple ontology language (RDF-S)
– Based on triples (subject; predicate; object)
– Serialization is XML based
• Web Ontology Language (OWL) is a layered
language based on Description Logic (DL)
– More complex ontology language
– Overcomes some RDF(S) limitations
SPARQL and Rule languages
• SPARQL
– Query language for RDF triples
– A protocol for querying RDF data over the Web
• Rule languages (e.g. SWRL)
– Extend basic predicates in ontology languages with
proprietary predicates
– Based on different logics
• Description Logic
• Logic Programming
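The core of a SPARQL query is a triple pattern whose variables are bound by matching it against the RDF data. The sketch below imitates that matching step in plain Python; it is not a SPARQL engine, the function `match` is a hypothetical helper, and the data are illustrative.

```python
triples = [
    ("Innsbruck", "locatedIn", "Austria"),
    ("Vienna", "locatedIn", "Austria"),
    ("Geneva", "locatedIn", "Switzerland"),
]

def match(pattern, graph):
    """Return one variable binding per triple that matches the pattern.

    Pattern terms starting with '?' are variables; all other terms must be equal.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must be bound to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results

# Roughly corresponds to: SELECT ?city WHERE { ?city :locatedIn :Austria }
cities = [b["?city"] for b in match(("?city", "locatedIn", "Austria"), triples)]
```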
Semantic Web
• KIM Browser Plugin
Web content is annotated using
ontologies
Content can be searched and
browsed intelligently
Select one or more concepts
from the ontology…
… send the currently loaded
web page to the Annotation
Server
Annotated Content
Semantic Web
Dereferenceable
URI
Disco Hyperdata Browser
navigating the Semantic Web as an
unbound set of data sources
Web Evolution - Summary
Web Evolution - summary
Web 1.0 | Web 2.0 | Semantic Web
Personal Websites | Blogs | Semantic Blogs: semiBlog, Haystack, Semblog, Structured Blogging
Content Management Systems, Britannica Online | Wikis, Wikipedia | Semantic Wikis: Semantic MediaWiki, SemperWiki, Platypus, dbpedia, Rhizome
Altavista, Google | Google Personalised, DumbFind, Hakia | Semantic Search: SWSE, Swoogle, Intellidimension
CiteSeer, Project Gutenberg | Google Scholar, Book Search | Semantic Digital Libraries: JeromeDL, BRICKS, Longwell
Message Boards | Community Portals | Semantic Forums and Community Portals: SIOC, OpenLink DataSpaces
Buddy Lists, Address Books | Online Social Networks | Semantic Social Networks: FOAF, PeopleAggregator
… | … | Semantic Social Information Spaces: Nepomuk, Gnowsis
Web Evolution - summary
• Traditional Web (Web 1.0)
– Normal user: browsing
– Communication style: one-way communication (e.g. reading a book)
– Data: web data (string and syntactic format)
– Data contributor: webmaster or experienced user
– How to add data: compose HTML pages
• Social Web (Web 2.0)
– Normal user: browsing + publishing and organizing web data
– Communication style: human-human (sharing)
– Data: web data + tags
– Data contributor: normal user – revolution!
– How to add data: tagging
• Semantic Web
– Normal user: interacting (human-machine)
– Communication style: human-machine
– Data: web data + tags + metadata (in SW language)
– Data contributor: normal user, machine
– How to add data: machine-generated or user-published
WHAT WEB SCIENCE SHOULD BE
Web principles
In the context of the traditional Web (Web 1.0)
a set of principles was proposed:
• Web resources are identified by URIs
(Uniform Resource Identifiers)
• Namespaces should be used to denote
consistent information spaces
• Make use of HTML, XML and other W3C
Web technology recommendations, as well
as the decentralization of resources
Web 1.0 + semantics = Semantic Web
• The traditional Web represents information using
– natural language (English, German, Italian,…)
– graphics, multimedia, page layout
• Humans can process this easily
– can deduce facts from partial information
– can create mental associations
– are used to various sensory information
Web 1.0 + semantics = Semantic Web
• However…. Machines are ignorant!
– partial information is unusable
– difficult to make sense from, e.g., an image
– drawing analogies automatically is difficult
– difficult to combine information automatically
• is <foo:creator> same as <bar:author>?
• how to combine different XML hierarchies?
– …
Semantic Web
• Semantic Web is about applying semantics to
the traditional Web, Web 1.0
• Some of the benefits of Semantic Web:
– More precise queries
– Smarter apps with less work
– Share & link data between apps
– Information has machine-processable and
machine-understandable semantics
Limitations of applying semantics to traditional
Web
• The principal limits of describing large, heterogeneous, and
distributed systems
• The principal limits of self representation and self reflection
→ These limits necessitate incompleteness and incorrectness of semantic
descriptions.
Limitations of applying semantics to traditional
Web
The principal limits of describing large, heterogeneous, and distributed systems
Limitations of applying semantics to traditional
Web
The principal limits of self representation and self reflection
Limitations of applying semantics to traditional
Web
The principal limits of self representation and self reflection
[Figure: a text that contains nested copies of itself: "The mission of STI International is to establish Semantics as a core pillar of modern computer engineering. It is supposed to be the leading international think tank in this field."]
Limitations of applying semantics to traditional
Web
The principal limits of self representation and self reflection
Meta Layer
(encodes heuristics, i.e. strategic knowledge)
Introspection
Reflection
Object Layer
(encodes possibly complete reasoning
knowledge for the problem)
Limitations of applying semantics to traditional
Web
• The meta layer should apply heuristics that may help
– Speed up the overall reasoning process.
– Increase its flexibility.
• Therefore, it needs to be incomplete in various aspects
and resemble important aspects of our consciousness.
– Introspection
– Reflection
• Unbounded rationality, constrained rationality, limited
rationality.
Limitations of applying semantics to traditional
Web
• Description of data by metadata or programs by metaprograms
– Always larger (even infinitely large) …
– … or always an approximation
Data look-up on the Web
• In a large, distributed, and heterogeneous environment,
classical ACID guarantees of the database world no
longer scale in any sense.
• Even a simple read operation in an environment such as
the Web, a peer-to-peer storage network, a set of
distributed repositories, or a space, cannot guarantee
completeness in the sense of assuming that if data was
not returned, then it was not there.
• Similarly, a write cannot guarantee a consistent
state, i.e. that it is immediately replicated to all the storage
facilities at once.
Information retrieval on the Web
• Modern information retrieval applies the same principles
– In information retrieval, the notion of
completeness (recall) becomes more and more
meaningless in the context of Web scale
information infrastructures.
– It is very unlikely that a user requests all the
information relevant to a certain topic that exists
on a worldwide scale, since this could easily go
far beyond the amount of information processing
he or she is investing in achieving a certain goal.
– Therefore, instead of investigating the full space
of precision and recall, information retrieval is
starting to focus more on improving
precision and proper ranking of results.
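Precision and recall can be computed from the set of retrieved items and the set of relevant items. A minimal sketch, assuming hypothetical document ids; on the Web the full relevant set is usually unknown, which is part of why recall loses its meaning there.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document ids, for illustration only.
relevant = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
top_results = ["d1", "d2", "d9", "d3"]
precision, recall = precision_recall(top_results, relevant)
```

Here three of the four returned results are relevant, so precision is high even though most relevant documents were never returned.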
Reasoning on the Web
• What holds for a simple data look-up holds in an even
stronger sense for reasoning on Web scale.
• The notion of 100% completeness and correctness as
usually assumed in logic-based reasoning does not even
make sense anymore since the underlying fact base is
changing faster than any reasoning process can process
it.
• Therefore, we have to develop a notion of usability of
inferred results and relate it to the resources that
are required to obtain them.
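One way to relate inferred results to the resources spent on them is to bound the reasoning effort and report whether the result is complete. The sketch below is an illustration only, not the method of any particular reasoner: `bounded_closure` is a hypothetical function that performs forward chaining over simple rules and stops when a step budget is exhausted.

```python
def bounded_closure(facts, rules, max_steps):
    """Forward chaining that stops after max_steps rule applications.

    rules is a list of (premises, conclusion) pairs. Returns the facts known
    so far and a flag that is True only if a fixpoint was reached.
    """
    known = set(facts)
    steps = 0
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                if steps == max_steps:
                    return known, False  # budget exhausted: sound but incomplete
                known.add(conclusion)
                steps += 1
                changed = True
    return known, True

# Hypothetical rule base: a chain a -> b -> c -> d.
rules = [(["a"], "b"), (["b"], "c"), (["c"], "d")]
partial, partial_complete = bounded_closure({"a"}, rules, max_steps=2)
full, full_complete = bounded_closure({"a"}, rules, max_steps=10)
```

Every derived fact is still correct under the budget; what is given up is the guarantee that all derivable facts are returned.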
Reasoning on the Web
[Figure: precision (soundness) plotted against recall (completeness), positioning Logic, IR, and the Semantic Web]
LarKC
• LarKC – The Large Knowledge Collider
http://www.larkc.eu/
• An open source, modular, and distributed platform for
inference on the Web that makes use of new reasoning
techniques
• A plug-in architecture that supports cooperation between
distributed, heterogeneous modules, enabling
research into new and different reasoning techniques
WEB SCIENCE – THE COMPUTER SCIENCE OF THE
21st CENTURY
Web Science – The Computer Science of the 21st
Century
• With the Web we have an open, heterogeneous, distributed, and fast
changing computing environment.
• Therefore we need computing to be understood as
– A goal driven approach where the solution process
is only partially determined and actually decided
during runtime, based on available data and services.
– A heuristic approach that gives up on absolute
notions of completeness and correctness in order to
gain scalability.
• The times of 100% complete and correct solutions are gone.
Web Science – The Computer Science of the
21st Century
The Need for Trade-offs:
• In all areas one has to define the trade-off between the guarantees
one provides in terms of service level agreements (completeness
and correctness are just examples of very strong guarantees)
and what this requires in terms of assumptions and computational
complexity.
• Different heuristic problem solving approaches are just different
combinations of these three factors.
Web Science – The Computer Science of the
21st Century
• Service level agreements (or goals) define what has to
be provided as result of solving a problem.
• Do we request an optimal solution, a semi-optimal
solution, or just any solution?
Web Science – The Computer Science of the
21st Century
• Assumptions describe the generality of the problem
solving approach:
– Assuming that there is only one solution allows stopping the
search for an optimum immediately after a solution has been
found.
– Instead of a global optimization method, a much simpler heuristic
search method can be used in this case, which would still deliver
a global optimum.
• Computational complexity (scalability) describes the resources
that are required to close the gap between the assumptions
and the goals.
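The effect of the single-solution assumption on computational cost can be sketched with a simple search. The function `search` and the toy problem are hypothetical; the point is only that the assumption lets the search stop at the first solution while still returning the optimum.

```python
def search(candidates, is_solution, cost, assume_unique):
    """Return (best solution, number of candidates inspected).

    If assume_unique is True, the first solution found is returned at once:
    when only one solution exists, it is necessarily the optimum.
    """
    best, inspected = None, 0
    for candidate in candidates:
        inspected += 1
        if is_solution(candidate):
            if assume_unique:
                return candidate, inspected
            if best is None or cost(candidate) < cost(best):
                best = candidate
    return best, inspected

def is_square_root_of_144(n):
    return n * n == 144

# Hypothetical problem: find the number in 1..1000 whose square is 144.
candidates = range(1, 1001)
early, early_steps = search(candidates, is_square_root_of_144, abs, assume_unique=True)
exhaustive, exhaustive_steps = search(candidates, is_square_root_of_144, abs, assume_unique=False)
```

Both calls return the same answer, but the version that exploits the assumption inspects 12 candidates instead of 1000. If the assumption were wrong, the early result could be a non-optimal solution.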
Web Science – The Computer Science of the 21st
Century
• Computer science in the 20th century was about perfect solutions in
closed domains and applications.
• Web science, the new computer science of the 21st century, will be
about approximate solutions and frameworks that capture the
relationships of partial solutions and requirements in terms of
computational costs, i.e., the proper balance of their ratio.
Web Science – The Computer Science of the
21st Century
• This shift is comparable to the transition in physics from classical
physics to relativity theory and quantum mechanics,
where the notion of absolute space and time is replaced by
relativistic notions and the principal limits of precision:
the more precisely we know the location of a particle in space,
the less we know about its movement in time, and vice versa.
Relevant links and literature
• Web Science Research Initiative. http://webscience.org/
• T. Berners-Lee, W. Hall, J. Hendler, N. Shadbolt, D. Weitzner (2006): Creating a
science of the Web. http://eprints.ecs.soton.ac.uk/12615/
• T. Berners-Lee, W. Hall, J. Hendler, K. O’Hara, N. Shadbolt, D. Weitzner (2006): A
Framework for Web Science. http://eprints.ecs.soton.ac.uk/13347/
• N. Shadbolt: Web Science Research Initiative Seminar, November 2008.
http://www.ecs.soton.ac.uk/podcasts/video.php?id=153
• D. Fensel, F. van Harmelen: Unifying Reasoning and Search to Web Scale.
IEEE Internet Computing, 11(2), 2007
• D. Fensel, D. Wolf: The Scientific Role of Computer Science in the 21st Century. In
Proceedings of the Third International Workshop on Philosophy and Informatics
(WSPI 2006), Saarbruecken, Germany, May 3-4, 2006.
Next lecture
# | Date | Title
1 | 5th March | Introduction
2 | 12th March | Service Science
3 | 19th March | Web Science
4 | 26th March | Web Services (WSDL, SOAP, UDDI, XML)
5 | 2nd April | Web 2.0 services / RESTful services
6 | 23rd April | WSMO
7 | 30th April | WSML
8 | 7th May | WSMX
9 | 14th May | OWL-S and others
10 | 28th May | SA-WSDL, WSMO-Lite, MicroWSMO
11 | 4th June | SWS are good for what
12 | 18th June | seekda: the business point of view
13 | 25th June | Mobile services
14 | 2nd July | Exam
Questions?