Swoogle
search and metadata for the semantic web
Presented by eBiquity group, UMBC
CIKM’04, Nov 12, 2004
Partial research support was provided by DARPA contract F30602-00-0591
and by NSF by awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.
Outline
• Motivation
• Concepts
• Demo
• Architecture
    • document discovery
    • metadata creation
    • ontology rank
• Status
• Summary
http://swoogle.umbc.edu/
Swoogle, cikm'04 -- http://swoogle.umbc.edu/
Motivation


• Google plus the Web has made us all smarter.
• Something similar is needed by people and software agents for information on the Semantic Web.
Motivation – Common Questions

Find an ontology
    • What are the ontologies about “time”?
    • Shall I use an existing ontology or create one?
Find instance data
    • Show me the instances of a class “http://foo.com/Person”?
    • Gather relevant information for my application.
Characterize the Semantic Web
    • How many RDF documents are online?
    • What are the most popular ontologies?
    • What graph properties does the semantic web have?
    • Does a namespace URI link to the corresponding ontology?
The Role of Swoogle in Semantic Web
[Figure: the role of Swoogle. Software agents and applications use and search Swoogle, a directory/digest service that acts as a Service Finder and a Data Finder. Swoogle digests the Semantic Web, which consists of services and data: data services, SW data services, RDF documents, (Web) documents and databases.]
Related work

Ontology based annotation & search
    • Annotate web documents
        • SHOE (UMCP, 1997)
        • Ontobroker (AIFB, Karlsruhe, 1998)
        • WebKB (Martin & Eklund, 1999)
        • QuizRDF (BT, 2002)
    • Annotate proper reference & relations
        • CREAM (AIFB, 2003)
Ontology repositories
    • Ontology level
        • DAML Ontology Library
        • Schema Web
        • SemWebCentral
    • Term level
        • W3C’s Ontaria (2004)
Ontology management systems
    • Stanford’s Ontolingua
    • IBM’s Snobase
Swoogle aims to be a Google-like online ontology repository
    • Automated discovery
    • Based on both ontology and instance document
    • Search and rank ontologies and terms
    • Digest but not store
    • Create metadata based on RDF and OWL semantics
    • Provide services to both human and software agents
Concepts

Document

A Semantic Web Document (SWD) is an online document written in semantic web languages (i.e. RDF and OWL).
In Swoogle, a document D is a valid SWD iff JENA* correctly parses D and produces at least one triple.
*JENA is a Java framework for writing Semantic Web applications. http://www.hpl.hp.com/semweb/jena2.htm
An ontology document (SWO) is a SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
An instance document (SWI or SWDB) is a SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
Term

A term is a non-anonymous RDF resource which is the URI reference of
either a class or a property.
Example: foaf:Person rdf:type rdfs:Class
Individual

An individual refers to a non-anonymous RDF resource which is the URI
reference of a class member.
Example: http://.../foaf.rdf#finin rdf:type foaf:Person
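The definitions of term and individual above can be illustrated with a small sketch. It is not Swoogle's implementation: triples are modelled as plain (subject, predicate, object) tuples of QName strings, anonymous resources are assumed to be written with a `_:` prefix, and the function names `classify_resources` and `is_valid_swd` are hypothetical. The validity check assumes the triples come from a parse that already succeeded.

```python
# Assumption: these are the only type objects that mark a class or a property.
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property"}


def is_valid_swd(triples):
    """A parsed document counts as an SWD if it yields at least one triple."""
    return len(triples) >= 1


def classify_resources(triples):
    """Split typed, non-anonymous resources into classes, properties, individuals."""
    classes, properties, individuals = set(), set(), set()
    for subject, predicate, obj in triples:
        if predicate != "rdf:type" or subject.startswith("_:"):
            continue  # skip untyped statements and anonymous resources
        if obj in CLASS_TYPES:
            classes.add(subject)
        elif obj in PROPERTY_TYPES:
            properties.add(subject)
        else:
            individuals.add(subject)  # a member of some class
    return {
        "terms": classes | properties,
        "classes": classes,
        "properties": properties,
        "individuals": individuals,
    }


example = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("_:b0", "rdf:type", "foaf:Person"),  # anonymous, so not an individual here
]
result = classify_resources(example)
```

In this sketch foaf:Person and foaf:mbox come out as terms, and only the named resource ending in #finin comes out as an individual.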
Concepts Example
[Figure: two example SWDs.
 • SWI: http://foo.com/foaf.rdf#finin rdf:type foaf:Person; http://foo.com/foaf.rdf#finin foaf:mbox [email protected] (http://foo.com/foaf.rdf#finin is an Individual)
 • SWO http://xmlns.com/foaf/1.0/: foaf:Person rdf:type rdfs:Class; foaf:Person rdfs:subClassOf wordNet:Agent (foaf:Person is a Term and a Class); foaf:mbox rdf:type rdf:Property; foaf:mbox rdfs:domain foaf:Person (foaf:mbox is a Term and a Property)]
NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows
rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: => http://www.w3.org/2000/01/rdf-schema#
foaf: => http://xmlns.com/foaf/1.0/
wordNet: => http://xmlns.com/wordnet/1.6/
Demo
1 Find “Time” Ontology (Swoogle Search)
2 Digest “Time” Ontology
    • Document view
    • Term view
3 Find Term “Person” (Ontology Dictionary)
4 Digest Term “Person”
    • Class properties
    • (Instance) properties
5 Swoogle Statistics
Demo
1 Find “Time” Ontology
We can use a set of keywords to search for an ontology. For example, “time, before, after” are basic concepts for a “Time” ontology.
Usage of Terms in SWD
[Figure: how terms are used in SWDs.
 • http://www.cs.umbc.edu/~finin/foaf.rdf and http://foo.com/foaf.rdf each contain an individual with rdf:type foaf:Person (populated Class) and foaf:mbox [email protected] (populated Property); http://foo.com/foaf.rdf#finin is a defined Individual.
 • http://xmlns.com/foaf/1.0/ contains foaf:Person rdf:type rdfs:Class and foaf:Person rdfs:subClassOf wordNet:Agent (defined Class), and foaf:mbox rdf:type rdf:Property and foaf:mbox rdfs:domain foaf:Person (defined Property).]
Demo
2(a) Digest “Time” Ontology (term view)
[Screenshot: term list including TimeZone, before, …, intAfter]
Document Metadata

Web document metadata
    • When/how discovered/fetched
    • Suffix of URL
    • Last modified time
    • Document size
SWD metadata
    • Language features
        • OWL species
        • RDF encoding
    • Statistical features
        • Defined/used terms
        • Declared/used namespaces
        • Ontology Ratio
        • Ontology Rank
    • Ontology annotation
        • Label
        • Version
        • Comment
Related Relational Metadata
    • Links to other SWDs
        • Imported SWDs
        • Referenced SWDs
        • Extended SWDs
        • Prior version
    • Links to terms
        • Classes/Properties defined/used
Demo
2(b)
Digest “Time” Ontology (document view)
Demo
3
Find Term “Person”
Not capitalized! URIref is case sensitive!
Term Metadata: An integrated definition
Class Definition
• rdfs:subClassOf -- foaf:Agent
• rdfs:label -- “Person”
Properties (from SWO)
• foaf:mbox
• foaf:name
Properties (from SWI)
• foaf:name
• dc:title
[Figure: the definition of foaf:Person is integrated from three documents (Onto 1, Onto 2, SWD3). Edges shown: foaf:Person rdf:type owl:Class; foaf:Person rdfs:subClassOf foaf:Agent; foaf:Person rdfs:label “Person”; foaf:mbox rdfs:domain foaf:Person; foaf:name rdfs:domain foaf:Person; an individual with rdf:type foaf:Person and foaf:name “Tim Finin”.]
Demo
4
Digest Term “Person”
167 different properties
562 different properties
Demo
5
Swoogle Statistics
Swoogle Architecture
[Figure: Swoogle architecture, in four parts.
 • SWD discovery: Web Crawler and SWD Reader, fed by Candidate URLs from The Web
 • metadata creation: SWD Cache, SWD Metadata
 • data analysis: IR analyzer, SWD analyzer
 • interface: Web Server, Web Service, Agent Service]
1. SWD Discovery

Swoogle uses three crawlers to discover likely SWD URLs
    • A Google Crawler uses Google to find URLs using
        • Keywords: http://www.w3.org/2000/01/rdf-schema, ...
        • File type suffixes: .rdf, .owl
    • A Focused Crawler crawls through HTML files recursively within the given website.
    • A SWD Crawler crawls through SWDs and discovers URLs according to term semantics.
To determine the likely SWD URLs:
    • Non-SWD extension filter: .jpg, .mp3, etc.
    • Protocol filter: file://, urn:, etc.
    • Namespace of RDF resources in SWD
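The three criteria for likely SWD URLs can be sketched as follows. This is an illustration under stated assumptions, not the crawler's code: the filter sets contain only the examples named on the slide, the function names `is_candidate_swd_url` and `namespace_of` are hypothetical, and splitting a URIref at `#` or at the last `/` is an assumed way of deriving its namespace.

```python
import posixpath
from urllib.parse import urlparse

# Only the examples listed on the slide; a real filter would hold more entries.
NON_SWD_EXTENSIONS = {".jpg", ".mp3"}
REJECTED_SCHEMES = {"file", "urn"}


def is_candidate_swd_url(url):
    """Apply the protocol filter, then the non-SWD extension filter."""
    parsed = urlparse(url)
    if parsed.scheme.lower() in REJECTED_SCHEMES:
        return False
    extension = posixpath.splitext(parsed.path)[1].lower()
    return extension not in NON_SWD_EXTENSIONS


def namespace_of(uriref):
    """Derive a namespace URL from a URIref (assumed splitting rule)."""
    if "#" in uriref:
        return uriref.split("#", 1)[0]
    return uriref.rsplit("/", 1)[0] + "/"


def candidate_urls_from_resources(urirefs):
    """Namespaces of resources seen in an SWD become new candidate URLs."""
    namespaces = {namespace_of(u) for u in urirefs}
    return sorted(ns for ns in namespaces if is_candidate_swd_url(ns))


found = candidate_urls_from_resources([
    "http://foo.com/foaf.rdf#finin",
    "http://xmlns.com/foaf/1.0/Person",
    "urn:isbn:0451450523",
])
```

Here `found` holds the two HTTP namespaces, while the `urn:` resource is dropped by the protocol filter.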
2. Metadata Creation

Document metadata
    • General metadata
    • SWD metadata
    • Ontology metadata
Term Metadata (definition)
    • Class property
    • (Instance) property: i.e. class-property bond
Relational metadata

                Term                              Document
    Term        rdfs:subClassOf, rdfs:domain, …   rdfs:seeAlso, …
    Document    Uses, Defines, …                  owl:imports, …
2.1 Ontology Ratio


Why?
    • The fuzzy distinction between ontology and instance documents
Given a SWD foo, let
    • C(foo): the set of classes defined in foo
    • P(foo): the set of properties defined in foo
    • I(foo): the set of instances defined in foo
$$Ratio_{ontology}(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|}$$
Ontology Ratio as a heuristic to do the classification
    • 0: pure SWI
    • 1: pure SWO
    • > 0.8: foo is said to be an ontology.
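A minimal sketch of the ratio and the 0.8 heuristic, assuming the three sets have already been extracted from the document. The function names and the handling of a document that defines nothing are assumptions made for the example.

```python
def ontology_ratio(classes, properties, instances):
    """(|C| + |P|) / (|C| + |P| + |I|) for one SWD."""
    defined_terms = len(classes) + len(properties)
    total = defined_terms + len(instances)
    if total == 0:
        # Assumption: the ratio is left undefined when nothing is defined.
        raise ValueError("document defines no classes, properties or instances")
    return defined_terms / total


def is_ontology(ratio, threshold=0.8):
    """The slide's heuristic: a ratio above 0.8 marks the SWD as an ontology."""
    return ratio > threshold


# Hypothetical document: 4 classes, 5 properties, 1 instance.
mostly_terms = ontology_ratio(
    classes={"c1", "c2", "c3", "c4"},
    properties={"p1", "p2", "p3", "p4", "p5"},
    instances={"i1"},
)
pure_swi = ontology_ratio(set(), set(), {"i1", "i2"})
pure_swo = ontology_ratio({"c1"}, {"p1"}, set())
```

A document with nine defined terms and one instance has a ratio of 0.9, so the heuristic treats it as an ontology even though it is not a pure SWO.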
2.2 Relational Metadata

Inter-document relation




Inter-term relation





EXtension (EX), e.g. rdfs:subClassOf
use-TerM (TM), e.g. rdfs:range
use-INdividual (IN), e.g. owl:sameAs
Prior Version (PV, IPV, CPV)
Generalized inter-document relations



rdfs:seeAlso
IMport (IM), e.g. owl:imports
Similar/Equal SWD
Generalized from individual-level relations
Capture more relations with less complexity
Usage
    • Link SWDs
    • Ontology rank
3. Data analysis: Ranking SWD

Why?
    • Ranking captures page importance and popularity
    • Ranking has been proven useful in HTML search.
    • SWD is different from HTML and has more semantics
    • So, a new SWD ranking mechanism is needed!
Related ideas?
    • Google’s PageRank
    • Kleinberg’s HITS
[Figure: kinds of web content: HTML documents, images, audio files, video files, SWOs, SWIs]
3.1 Random surfer model (PageRank)

How is PageRank computed?
    • Page A’s rank is
$$P(A) = (1 - d) + d \sum_{i=1}^{n} \frac{P(T_i)}{C(T_i)}$$
    • where
        • {T_i} are the pages that link to A
        • C(X): number of page X’s out-links
        • d is a damping factor (e.g., 0.85)
    • Compute by iterating until convergence
Uniform probability of following any link is the convention in the Web but not in the SW
    • Links have semantics that influence the probability of following them
    • Rational users read an ontology and all ontologies it references.
[Flowchart: read page; bored? If yes, jump to a random page; if no, follow a random link.]
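The iteration behind the PageRank formula can be sketched in a few lines. This is a minimal illustration that assumes the link graph fits in a dictionary mapping each page to the list of pages it links to. The function name, the starting rank of 1.0 and the stopping tolerance are choices made for the example.

```python
def pagerank(out_links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate P(A) = (1 - d) + d * sum(P(T) / C(T)) until ranks stop changing."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    # Invert the graph: for each page, which pages link to it.
    in_links = {page: [] for page in pages}
    for source, targets in out_links.items():
        for target in targets:
            in_links[target].append(source)

    rank = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        new_rank = {
            page: (1 - d) + d * sum(rank[t] / len(out_links[t]) for t in in_links[page])
            for page in pages
        }
        delta = max(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical two-page graph: A links to B, B links nowhere.
chain = pagerank({"A": ["B"]})
# Hypothetical pair of pages that link to each other.
pair = pagerank({"A": ["B"], "B": ["A"]})
```

Every out-link of a page receives the same share of its rank here, which is exactly the uniform-probability assumption the slide questions for the Semantic Web.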
3.2 Rational Random Surfer Model

(1) Weighted random behavior
$$rawPR(A) = (1 - d) + d \sum_{i=1}^{n} rawPR(X_i) \frac{flow(X_i, A)}{flow(X_i)}$$
$$flow(X_i, A) = \sum_{l \in links(X_i, A)} weight(l)$$
$$flow(X_i) = \sum_{j=1}^{m} flow(X_i, A_j)$$
(2) Rational behavior
    • Rank of a SWI
$$PR(A) = rawPR(A)$$
    • Rank of a SWO
$$PR(A) = \sum_{X_i \in TC(A)} rawPR(X_i)$$
where TC(A) is the transitive closure of SWOs referencing A.
[Flowchart: read page; SWO? If yes, (2) read referenced SWOs; bored? If yes, jump to a random page; if no, (1) follow a random link.]
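The weighted random behavior (step 1) can be sketched as an iteration over typed links. This is an illustration, not Swoogle's code: links are assumed to be (source, target, relation) tuples, every weight is assumed to be positive, and the weights used in the example (TM = 1, EX = 3) are hypothetical values chosen only to show the effect.

```python
from collections import defaultdict


def raw_pagerank(links, weights, d=0.85, tol=1e-9, max_iter=1000):
    """rawPR(A) = (1 - d) + d * sum(rawPR(X) * flow(X, A) / flow(X))."""
    flow = defaultdict(float)        # flow(X, A): summed weight of links X -> A
    total_flow = defaultdict(float)  # flow(X): summed weight of all links from X
    docs = set()
    for source, target, relation in links:
        weight = weights[relation]
        flow[(source, target)] += weight
        total_flow[source] += weight
        docs.update((source, target))

    rank = {doc: 1.0 for doc in docs}
    for _ in range(max_iter):
        new_rank = {doc: 1 - d for doc in docs}
        for (source, target), f in flow.items():
            new_rank[target] += d * rank[source] * f / total_flow[source]
        delta = max(abs(new_rank[doc] - rank[doc]) for doc in docs)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical weights; the slides do not give the actual values.
example_weights = {"TM": 1.0, "EX": 3.0}
example_links = [
    ("finin-foaf", "foaf", "TM"),
    ("foaf", "wordnet", "EX"),
    ("foaf", "rdfs", "TM"),
    ("wordnet", "rdfs", "TM"),
]
raw = raw_pagerank(example_links, example_weights)
```

With these weights the foaf document passes three quarters of its outgoing flow to wordnet through the EX link and one quarter to rdfs through the TM link, instead of splitting it evenly.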
3.3 Ontology Rank Example
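[Figure: four SWDs and the relations between them.
 • http://www.cs.umbc.edu/~finin/foaf.rdf (an individual with rdf:type foaf:Person and foaf:mbox [email protected]) uses terms (TM) of http://xmlns.com/foaf/1.0/
 • http://xmlns.com/foaf/1.0/ (foaf:Person rdf:type rdfs:Class; foaf:Person rdfs:subClassOf wordNet:Person) extends (EX) http://xmlns.com/wordnet/1.6/ and uses terms (TM) of http://www.w3.org/2000/01/rdf-schema
 • http://xmlns.com/wordnet/1.6/ (wordNet:Person rdf:type rdfs:Class; wordNet:Person rdfs:subClassOf wordNet:Individual) uses terms (TM) of http://www.w3.org/2000/01/rdf-schema
 • http://www.w3.org/2000/01/rdf-schema (rdfs:Class, rdfs:subClassOf, rdf:Property)]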
3.3 Ontology Rank Example (cont’d)
SWD                                       rawPR   PR
http://www.w3.org/2000/01/rdf-schema      300     403
http://xmlns.com/wordnet/1.6/             3       103
http://xmlns.com/foaf/1.0/                100     100
http://www.cs.umbc.edu/~finin/foaf.rdf    0.2     0.2
Links shown in the figure: TM, TM, EX, TM
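The PR values in this example can be reproduced from the rawPR values with a short sketch. It assumes that TC(A) contains A itself plus every SWO that references A directly or transitively, which is the reading that matches the figures on the slide. The link list (finin's FOAF file to foaf, foaf to wordnet and rdf-schema, wordnet to rdf-schema) is inferred from the example figure, and the function name `ontology_rank` is hypothetical.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema"
WORDNET = "http://xmlns.com/wordnet/1.6/"
FOAF = "http://xmlns.com/foaf/1.0/"
FININ = "http://www.cs.umbc.edu/~finin/foaf.rdf"

raw_pr = {RDFS: 300, WORDNET: 3, FOAF: 100, FININ: 0.2}
swos = {RDFS, WORDNET, FOAF}  # FININ is an instance document (SWI)
references = [(FININ, FOAF), (FOAF, WORDNET), (FOAF, RDFS), (WORDNET, RDFS)]


def ontology_rank(doc, raw_pr, references, swos):
    """PR of an SWI is its rawPR; PR of an SWO sums rawPR over TC(doc)."""
    if doc not in swos:
        return raw_pr[doc]
    closure, stack = {doc}, [doc]
    while stack:
        current = stack.pop()
        for source, target in references:
            # Follow references backwards, through ontologies only.
            if target == current and source in swos and source not in closure:
                closure.add(source)
                stack.append(source)
    return sum(raw_pr[member] for member in closure)


ranks = {doc: ontology_rank(doc, raw_pr, references, swos) for doc in raw_pr}
```

Under these assumptions the RDF Schema document collects 300 + 3 + 100 = 403, WordNet collects 3 + 100 = 103, and FOAF keeps 100 because the only document referencing it is an SWI.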
Current Status

Swoogle Watch reported (Nov 7, 2004)
    • 40 M triples
    • 270 K SWDs: 4 K ontologies
    • 144 K terms: 91 K classes & 51 K properties
Ongoing work
    • Ontology Dictionary
    • Swoogle Statistics
    • Web Service interface (see Swoogle website)
    • IR with the Semantic Web (Content search)
        • Character N-Grams
        • Bag of URIrefs
        • Swangling
Summary
2004
Swoogle (Mar, 2004)
    • Automated SWD discovery
    • SWD metadata creation and search
    • Ontology rank (rational surfer model)
    • Swoogle watch
    • Web Interface
Swoogle2 (Sep, 2004)
    • Ontology dictionary
    • Swoogle statistics
    • Web service interface (WSDL)
    • Bag of URIref IR search
2005
Swoogle3
    • Better crawl & refresh strategies
    • More metadata (ontology mapping)
    • More IR features
    • Better web service interfaces
    • Capture and store all triples
    • More reasoning
The End



Website: http://swoogle.umbc.edu
Slides at: http://ebiquity.umbc.edu/v2.1/resource/html/id/66/
Demo: http://ebiquity.umbc.edu/v2.1/resource/html/id/65/