Semantic Web

Andrejs Lesovskis

Agenda

• Syntax and semantics
• Introduction to Semantic Web
• Semantic Web layers
• Projects that use Semantic Web technologies

Syntax and semantics (1)

Syntax is the study of the rules governing the way words are combined to form sentences in a language.

In computer science, it refers to the ways symbols may be combined to create well-formed programs in a language.

It defines the formal relations between the constituents of a language.

Syntax and semantics (2)

Semantics is the study of the meaning of linguistic expressions. The language can be a natural language, such as English or Navajo, or an artificial language, like a computer programming language.

Natural-language semantics is important in trying to make computers better able to deal directly with human languages.

What is Semantic Web?

"The Semantic Web is not a separate Web but an extension of the current one (World Wide Web, WWW), in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

... a web of data that can be processed directly and indirectly by machines." Tim Berners-Lee, James Hendler, and Ora Lassila.

Semantic Web and World Wide Web

Semantic Web and Beyond

[Diagram labels: Creators; Semantic Web content; Users; applications; agents; Semantic Web; Semantic annotations; Languages]

WWW and Beyond

[Diagram labels: Creators; Ontologies; Tools; Web content; Logical Support; Applications / Services; Users]

Resource Integration

[Diagram labels: Semantic annotations; Web resources, services, databases; Shared ontology]

Resource integration

[Diagram labels: Industrial and business processes; External resources; Web resources, services, databases; Web users; Shared ontology; Multimedia resources; Mobile devices; Machines and devices; Web agents/applications]

Semantic Web and semantic network (1)

Semantic Web and semantic network (2)

Semantic Web inventor

Sir Timothy Berners-Lee is best known as the inventor of the World Wide Web. Berners-Lee is the director of the World Wide Web Consortium (W3C), which oversees the Web's continued development.

Semantic Web layers (1)

Semantic Web layers (2)

• URI and Unicode
• XML (eXtensible Markup Language)
• RDF (Resource Description Framework)
• Ontology
• Logic
• Proof
• Trust
• User interface and applications

Semantic Web layers (3)

XML and Semantic Web Standards Timeline

Project OpenCalais (Thomson Reuters)

• Thomson Reuters launched project Calais in January 2008.

• Calais Web Service processes unstructured text (like news articles, blog postings, scientific papers, etc.) and it returns semantic metadata in RDF format.

• Uses natural language processing and machine learning techniques to examine the text and locate the entities, facts, and events.

Swoogle search engine

Project DBPedia.org (1)

• DBpedia is a project that aims to extract structured content from the information created as part of the Wikipedia project ("infobox" tables). This structured information is then made available on the World Wide Web.

• The DBpedia knowledge base allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets.

• Technologies used: Scala, Java, Virtuoso Universal Server.
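Querying "relationships and properties" can be pictured with a tiny in-memory set of RDF-style triples. This is only a sketch: the three triples, the prefixed names, and the `match` helper are illustrative assumptions, not data pulled from DBpedia and not its actual query interface.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Illustrative triples only; they were not extracted from DBpedia.
TRIPLES: List[Triple] = [
    ("dbr:Riga", "rdf:type", "dbo:City"),
    ("dbr:Riga", "dbo:country", "dbr:Latvia"),
    ("dbr:Latvia", "dbo:capital", "dbr:Riga"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


# Everything stated about Riga, and the capital of Latvia.
riga_facts = list(match(TRIPLES, s="dbr:Riga"))
capital_of_latvia = [o for _, _, o in match(TRIPLES, s="dbr:Latvia", p="dbo:capital")]
```

A real knowledge base answers the same kind of pattern query at a much larger scale; the wildcard-per-position idea is what carries over.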

Project DBPedia.org (2)

Project DBPedia.org (3)

Project DBPedia.org (4)

DBpedia project results:

• Data extraction from 97 languages;
• The English version of the DBpedia knowledge base currently describes 3.77 million things, including 764,000 persons, 573,000 places, 333,000 creative works, 192,000 organizations, 202,000 species, and 5,500 diseases;
• Contains more than 672 million RDF triples;
• Tests show 87% precision;
• Developed a large multi-domain ontology.

RDF Site Summary (RSS)

RSS (Really Simple Syndication) is a family of web feed formats used to publish frequently updated works, such as blog entries, news headlines, audio, and video, in a standardized format.

Really Simple Syndication (RSS)

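<rss version="2.0">
<channel>
  <title>RSS Title</title>
  <description>This is an example of an RSS feed</description>
  <link>http://www.someexamplerssdomain.com/main.html</link>
  <lastBuildDate>Mon, 06 Sep 2010 00:01:00 +0000</lastBuildDate>
  <pubDate>Mon, 06 Sep 2009 16:45:00 +0000</pubDate>
  <ttl>1800</ttl>
  <item>
    <title>Example entry</title>
    <description>Here is some text containing an interesting description.</description>
    <link>http://www.wikipedia.org/</link>
    <guid>unique string per item</guid>
    <pubDate>Mon, 06 Sep 2009 16:45:00 +0000</pubDate>
  </item>
</channel>
</rss>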

URI and Unicode

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.

URI (Uniform Resource Identifier)

• URL (Uniform Resource Locator): http://www.google.com, mailto:[email protected]

• URN (Uniform Resource Name):
  URN of the "Spider-Man" movie: urn:isan:0000-0000-9E59-0000-O-0000-0000-2
  URN of the "Science of Computer Programming" journal: urn:issn:0167-6423
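The difference between the two kinds of identifier shows up when a URI is split into its parts. The sketch below uses Python's standard `urllib.parse`; the `describe` helper and its rule of classifying by the `urn` scheme alone are simplifying assumptions for illustration.

```python
from urllib.parse import urlparse

EXAMPLES = [
    "http://www.google.com",
    "urn:issn:0167-6423",
]


def describe(uri: str):
    """Return (kind, scheme, rest) for a URI.

    Anything with the 'urn' scheme is treated as a URN (a name);
    everything else is treated as a URL (a locator). This is a
    deliberate simplification.
    """
    parts = urlparse(uri)
    kind = "URN" if parts.scheme.lower() == "urn" else "URL"
    # URLs such as http://... carry a network location; URNs only have a path.
    return kind, parts.scheme, parts.netloc or parts.path


results = [describe(uri) for uri in EXAMPLES]
```

The URL tells a client where to fetch something (host `www.google.com`), while the URN only names a resource (`issn:0167-6423`) and says nothing about where it lives.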

XML (1)

XML (eXtensible Markup Language) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

Uses tags for markup: <tag>data</tag>

Some of the XML-based languages: Extensible Hypertext Markup Language (XHTML), Really Simple Syndication (RSS), Mathematical Markup Language (MathML), GraphML, Scalable Vector Graphics (SVG).

XML (2)

Example:
<food>
  <name>BELGIAN WAFFLES</name>
  <price>$5.95</price>
  <description>TWO OF OUR FAMOUS BELGIAN WAFFLES WITH PLENTY OF REAL MAPLE SYRUP</description>
  <calories>650</calories>
</food>
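To show what "machine-readable" means in practice, here is a minimal sketch that reads the waffle record with Python's standard `xml.etree.ElementTree`. The element names (`breakfast_menu`, `food`, `name`, `price`, `description`, `calories`) are assumptions; only the text values come from the example.

```python
import xml.etree.ElementTree as ET

# Element names are assumed for illustration; the values are from the slide.
MENU = """<breakfast_menu>
  <food>
    <name>BELGIAN WAFFLES</name>
    <price>$5.95</price>
    <description>TWO OF OUR FAMOUS BELGIAN WAFFLES WITH PLENTY OF REAL MAPLE SYRUP</description>
    <calories>650</calories>
  </food>
</breakfast_menu>"""

root = ET.fromstring(MENU)
food = root.find("food")

# Each child element becomes a (tag, text) pair.
record = {child.tag: child.text for child in food}

# XML itself carries only text; turning "650" into a number is up to the program.
calories = int(record["calories"])
```

The parser recovers the structure, but not the data types: that gap is what DTD and XML Schema address later in the deck.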

Scalable Vector Graphics (SVG)

Simple Object Access Protocol (SOAP)

• SOAP Version 1.2 (SOAP) is a lightweight protocol intended for exchanging structured information in a decentralized, distributed environment. It uses XML technologies to define an extensible messaging framework providing a message construct that can be exchanged over a variety of underlying protocols.

• SOAP 1.2 became a W3C Recommendation in 2003; its Second Edition was published in 2007.

SOAP envelope

SOAP example

POST /InStock HTTP/1.1
Host: www.example.org
Content-Type: application/soap+xml; charset=utf-8
Content-Length: 299
SOAPAction: "http://www.w3.org/2003/05/soap-envelope"

IBM
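A request body for this kind of call could be assembled roughly as follows. This is a sketch under stated assumptions: the `GetStockPrice` and `StockName` element names, the application namespace, and the `build_request` helper are hypothetical; only the SOAP 1.2 envelope namespace, the content type, and the value IBM come from the example above.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://www.example.org/stock"              # hypothetical application namespace


def build_request(stock_name: str) -> bytes:
    """Build a SOAP 1.2 envelope with an empty Header and a one-element Body."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("m", APP_NS)

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

    # Hypothetical operation and parameter names.
    request = ET.SubElement(body, f"{{{APP_NS}}}GetStockPrice")
    ET.SubElement(request, f"{{{APP_NS}}}StockName").text = stock_name

    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


payload = build_request("IBM")

# HTTP headers that would accompany the payload in a POST request.
headers = {
    "Content-Type": "application/soap+xml; charset=utf-8",
    "Content-Length": str(len(payload)),
}
```

The envelope is ordinary XML, which is why SOAP can travel over HTTP or any other protocol able to carry a byte payload.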

Web Services Description Language (WSDL)

• Web Services Description Language is an XML-based interface description language that is used for describing the functionality offered by a web service.

• A WSDL description of a web service (also referred to as a WSDL file) provides a machine-readable description of how the service can be called, what parameters it expects, and what data structures it returns.

• WSDL 2.0 became a W3C recommendation in June 2007.

Semantic Web service architecture

Simple Semantic Web Architecture and Protocol

Simple Semantic Web Architecture and Protocol (2)

The SSWAP architecture is based on the following five basic concepts:

• Provider: corresponds to the organizations that own and publish resources;
• Resource: arbitrary resources (for example, web pages, ontologies, or datasets), but they are primarily used to describe web services;
• Graph: concept that describes transformations performed by the service;
• Subject: input data that is given to the service;
• Object: service execution result.

Document Type Definition (DTD)

A Document Type Definition (DTD) is a set of markup declarations that define a document type for an SGML-family markup language (SGML, XML, HTML). DTD is part of the XML 1.0 specification.

Example: DTD ]> XML Mike's Store XML XML in Nutshell John Smith

DTD elements

External DTD declaration: <!DOCTYPE doc_elem SYSTEM/PUBLIC dtd_addr>

Element type declaration: <!ELEMENT name content_model>

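Any content: <!ELEMENT name ANY>

Children elements: <!ELEMENT name (child1, child2)>

Parsed character data: <!ELEMENT name (#PCDATA)>

Empty (has no content): <!ELEMENT name EMPTY>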

DTD quantifiers

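• a+ : one or more occurrences of a
• a* : zero or more occurrences of a
• a? : zero or one occurrence of a (optional)
• a, b : a followed by b (sequence)
• a | b : either a or b (choice)

]> Microsoft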
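These quantifiers behave much like regular-expression operators applied to the sequence of child element names. The sketch below illustrates that analogy only; it is not a DTD validator, it does not handle #PCDATA or mixed content, and the content model `(title, author+, note?)` and both helper names are made up for the example.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.-]*")  # a simplified element-name pattern


def model_to_regex(content_model: str) -> "re.Pattern[str]":
    """Translate a simple DTD content model into a regex over child names.

    Every element name becomes the group '(?:name;)', so +, *, ? and |
    apply to whole names. The ',' sequence operator is plain concatenation
    in a regex, so commas are dropped.
    """
    compact = re.sub(r"\s+", "", content_model)
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact)
    return re.compile(body.replace(",", ""))


def children_match(content_model: str, child_names) -> bool:
    """Check a list of child element names against the content model."""
    encoded = "".join(name + ";" for name in child_names)
    return model_to_regex(content_model).fullmatch(encoded) is not None


MODEL = "(title, author+, note?)"  # hypothetical content model
ok = children_match(MODEL, ["title", "author", "author"])
missing_author = children_match(MODEL, ["title"])
```

Here `ok` holds because `author+` accepts two authors and `note?` may be absent, while `missing_author` fails because `author+` requires at least one.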

DTD attributes

Attribute declaration template:

<!ATTLIST element_name
  attribute_name type default_value
  ...
  attribute_name type default_value>

Example: ...

XML Schema

XML Schema 1.0 was approved as a W3C Recommendation in 2001 and it was the first separate schema language for XML to receive this status.

A schema is an abstract collection of metadata that includes the following components: element and attribute declarations, and complex and simple type definitions.

Schema definition example: ...

Reference to an XML Schema:

XML Schema example

XML Schema elements

• Simple elements don't contain child elements or attributes:

Element types

Primitive types: string, boolean, decimal, float, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION.

Derived types: normalizedString, token, language, NMTOKEN, NMTOKENS, Name, NCName, ID, IDREF, IDREFS, ENTITY, ENTITIES, integer, nonPositiveInteger, negativeInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte.

Element occurrence indicators

The minOccurs indicator specifies the minimum number of times an element can occur. If minOccurs is equal to 0, then the element is optional.

The maxOccurs indicator specifies the maximum number of times an element can occur. If maxOccurs equals "unbounded", then the element is allowed to appear an unlimited number of times.
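The two indicators amount to a simple range check on how many times an element appears. A minimal sketch, assuming a hypothetical helper `occurrence_ok` (it is not part of any XML library); the defaults of 1 for both indicators follow XML Schema.

```python
from typing import Union


def occurrence_ok(count: int,
                  min_occurs: int = 1,
                  max_occurs: Union[int, str] = 1) -> bool:
    """Check an element count against minOccurs / maxOccurs.

    Both indicators default to 1 in XML Schema. max_occurs may be an
    integer or the string "unbounded".
    """
    if count < min_occurs:
        return False
    return max_occurs == "unbounded" or count <= int(max_occurs)


optional_absent = occurrence_ok(0, min_occurs=0)                 # optional element left out
required_absent = occurrence_ok(0)                               # required element missing
many_allowed = occurrence_ok(5, min_occurs=1, max_occurs="unbounded")
```

With the defaults, an element must appear exactly once; setting minOccurs to 0 makes it optional, and "unbounded" removes the upper limit.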

XML Schema attributes

Attribute declaration template:

Example: ...

DTD vs XML Schema (1)

DTD pros

• It has been around longer than XML Schema;
• It is part of the XML 1.0 specification.

DTD cons

• Uses a syntax different from XML;
• Doesn't support namespaces;
• Supports a limited number of data types;
• A DTD describes the whole document.

DTD vs XML Schema (2)

XML Schema pros

• Uses XML syntax (schemas themselves are XML documents);
• Supports more data types and lets you define your own types;
• A schema can define portions of the document.

XML Schema cons

Pretty much none these days.

Thank you!