Swoogle
search and metadata for the semantic web
Presented by eBiquity group, UMBC
CIKM’04, Nov 12, 2004
Partial research support was provided by DARPA contract F30602-00-0591
and by NSF awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.
Motivation
Concepts
Demo
Architecture
Status
Summary
Outline
Motivation
Concepts
Demo
Architecture
document discovery
metadata creation
ontology rank
Status
Summary
http://swoogle.umbc.edu/
Swoogle, cikm'04 -- http://swoogle.umbc.edu/
Motivation
• (Google + Web) has made us all smarter.
• Something similar is needed by people and software agents for information on the semantic web.
Motivation – Common Questions
Find an ontology
• What are the ontologies about “time”?
• Shall I use an existing ontology or create one?
Find instance data
• Show me the instances of a class “http://foo.com/Person”.
• Gather relevant information for my application.
Characterize the Semantic Web
• How many RDF documents are online?
• What are the most popular ontologies?
• What graph properties does the semantic web have?
• Does a namespace URI link to the corresponding ontology?
The Role of Swoogle in Semantic Web
[Figure: Swoogle as a directory/digest service for the Semantic Web]
• Software Agents, Applications: use the Service Finder and Data Finder, and search Swoogle
• Directory/Digest Service: Service Finder, Data Finder, Swoogle (each digests the Semantic Web)
• Semantic Web: Services, Data Service
• Legend: RDF document, SW data service, (Web) document, database
Related work
Ontology based annotation & search
• Annotate web documents
• Annotate proper reference & relations
• Systems: SHOE (UMCP, 1997), Ontobroker (AIFB, Karlsruhe, 1998), WebKB (Martin & Eklund, 1999), QuizRDF (BT, 2002), CREAM (AIFB, 2003)
Ontology repositories
• DAML Ontology Library
• Schema Web
• SemWebCentral
• W3C’s Ontaria (2004)
Ontology management systems
• Stanford’s Ontolingua
• IBM’s Snobase
Swoogle aims to be a Google-like online ontology repository
• Automated discovery
• Search and rank ontologies and terms (ontology level, term level)
• Based on both ontology and instance document
• Digest but not store
• Create metadata based on RDF and OWL semantics
• Provide services to both human and software agents
Concepts
Document
• A Semantic Web Document (SWD) is an online document written in semantic web languages (i.e. RDF and OWL).
• In Swoogle, a document D is a valid SWD iff JENA* correctly parses D and produces at least one triple.
  *JENA is a Java framework for writing Semantic Web applications. http://www.hpl.hp.com/semweb/jena2.htm
• An ontology document (SWO) is an SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
• An instance document (SWI or SWDB) is an SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
Term
• A term is a non-anonymous RDF resource which is the URI reference of either a class or a property.
• Example: foaf:Person rdf:type rdfs:Class
Individual
• An individual is a non-anonymous RDF resource which is the URI reference of a class member.
• Example: http://.../foaf.rdf#finin rdf:type foaf:Person
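The term and individual definitions on this slide can be illustrated with a minimal Python sketch. It assumes triples are plain (subject, predicate, object) tuples written with QNames, that anonymous resources carry the conventional "_:" prefix, and that the sets of class and property meta-types below are sufficient; none of these choices come from Swoogle itself.

```python
# Meta-types that mark a resource as a class or a property (an assumed, partial list).
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def is_anonymous(resource):
    """Blank nodes are written here with the conventional '_:' prefix."""
    return resource.startswith("_:")


def classify(triples):
    """Split named, typed resources into terms (classes/properties) and individuals."""
    terms, individuals = set(), set()
    for subject, predicate, obj in triples:
        if predicate != "rdf:type" or is_anonymous(subject):
            continue
        if obj in CLASS_TYPES or obj in PROPERTY_TYPES:
            terms.add(subject)        # URI reference of a class or a property
        else:
            individuals.add(subject)  # URI reference of a class member
    return terms, individuals


triples = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("_:b0", "rdf:type", "foaf:Person"),  # anonymous, so neither term nor individual
]
terms, individuals = classify(triples)
```

Here foaf:Person and foaf:mbox come out as terms, and only the named resource ending in #finin counts as an individual.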
Concepts Example
[Figure: an SWD is either an SWO or an SWI]
SWI: http://foo.com/foaf.rdf
• http://foo.com/foaf.rdf#finin rdf:type foaf:Person (Individual)
• http://foo.com/foaf.rdf#finin foaf:mbox [email protected]
SWO: http://xmlns.com/foaf/1.0/
• foaf:Person rdf:type rdfs:Class (Term: Class)
• foaf:Person rdfs:subClassOf wordNet:Agent
• foaf:mbox rdf:type rdf:Property (Term: Property)
• foaf:mbox rdfs:domain foaf:Person
NOTE: Qualified Names (QName) are used to shorten well-known namespaces as follows
rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: => http://www.w3.org/2000/01/rdf-schema#
foaf: => http://xmlns.com/foaf/1.0/
wordNet: => http://xmlns.com/wordnet/1.6/
Demo
1. Find “Time” Ontology (Swoogle Search)
2. Digest “Time” Ontology
• Document view
• Term view
3. Find Term “Person” (Ontology Dictionary)
4. Digest Term “Person”
• Class properties
• (Instance) properties
5. Swoogle Statistics
Demo
1
Find “Time” Ontology
We can search for ontologies with a set of keywords. For example, “time”, “before”, and “after”
are basic concepts of a “Time” ontology.
Usage of Terms in SWD
[Figure: how terms are used across three SWDs]
http://www.cs.umbc.edu/~finin/foaf.rdf
• a resource with rdf:type foaf:Person and foaf:mbox [email protected]
http://foo.com/foaf.rdf
• http://foo.com/foaf.rdf#finin rdf:type foaf:Person (defined Individual; foaf:Person is a populated Class)
• http://foo.com/foaf.rdf#finin foaf:mbox [email protected] (foaf:mbox is a populated Property)
http://xmlns.com/foaf/1.0/
• foaf:Person rdf:type rdfs:Class (defined Class)
• foaf:Person rdfs:subClassOf wordNet:Agent
• foaf:mbox rdf:type rdf:Property (defined Property)
• foaf:mbox rdfs:domain foaf:Person
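The defined/populated distinction on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not Swoogle's analyzer: triples are QName tuples, a class or property counts as "defined" when it is typed with one of the meta-types below, a class is "populated" when it is the object of rdf:type for an instance, and a property is "populated" when it is used as a predicate outside the rdf:, rdfs: and owl: vocabularies. The mailbox value is a made-up placeholder.

```python
META_CLASSES = {"rdfs:Class", "owl:Class"}   # assumed list of class meta-types
META_PROPERTIES = {"rdf:Property"}           # assumed list of property meta-types
SCHEMA_PREFIXES = ("rdf:", "rdfs:", "owl:")  # vocabulary treated as schema-level


def term_usage(triples):
    """Report which terms one document defines and which it populates."""
    usage = {
        "defined_class": set(),
        "defined_property": set(),
        "populated_class": set(),
        "populated_property": set(),
    }
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj in META_CLASSES:
            usage["defined_class"].add(subject)
        elif predicate == "rdf:type" and obj in META_PROPERTIES:
            usage["defined_property"].add(subject)
        elif predicate == "rdf:type":
            usage["populated_class"].add(obj)          # obj has an instance here
        elif not predicate.startswith(SCHEMA_PREFIXES):
            usage["populated_property"].add(predicate)  # used to describe an instance
    return usage


swi = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "mailto:someone@example.org"),
]
swo = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Agent"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
swi_usage = term_usage(swi)
swo_usage = term_usage(swo)
```

With these inputs the instance document populates foaf:Person and foaf:mbox, while the ontology document defines them.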
Demo
2(a)
Digest “Time” Ontology (term view)
Terms shown: TimeZone, before, …, intAfter
Document Metadata
Web document metadata
• When/how discovered/fetched
• Suffix of URL
• Last modified time
• Document size
SWD metadata
• Language features: OWL species, RDF encoding
• Statistical features: Defined/used terms, Declared/used namespaces, Ontology Ratio, Ontology Rank
• Ontology annotation: Label, Version, Comment
Related Relational Metadata
• Links to other SWDs: Imported SWDs, Referenced SWDs, Extended SWDs, Prior version
• Links to terms: Classes/Properties defined/used
Demo
2(b)
Digest “Time” Ontology (document view)
Demo
3
Find Term “Person”
Note: not capitalized! URIrefs are case sensitive.
Term Metadata: An integrated definition
Class Definition
• rdfs:subClassOf -- foaf:Agent
• rdfs:label -- “Person”
Properties (from SWO)
• foaf:mbox
• foaf:name
Properties (from SWI)
• foaf:name
• dc:title
[Figure: the definition of foaf:Person is integrated from three sources, Onto 1, Onto 2, and SWD3. Triples shown: foaf:Person rdf:type owl:Class; foaf:Person rdfs:subClassOf foaf:Agent; foaf:Person rdfs:label “Person”; foaf:mbox rdfs:domain foaf:Person; foaf:name rdfs:domain foaf:Person; an instance with rdf:type foaf:Person and foaf:name “Tim Finin”.]
Demo
4
Digest Term “Person”
167 different properties
562 different properties
Demo
5
Swoogle Statistics
Swoogle Architecture
[Figure: Swoogle architecture]
• Stages: SWD discovery, metadata creation, data analysis, interface
• Components: The Web, Web Crawler, Candidate URLs, SWD Reader, SWD Cache, SWD Metadata, SWD analyzer, IR analyzer, Web Server, Web Service, Agent Service
1. SWD Discovery
Swoogle uses three crawlers to discover likely SWD URLs:
• A Google Crawler uses Google to find URLs using
  keywords: http://www.w3.org/2000/01/rdf-schema,...
  file type suffixes: .rdf, .owl
• A Focused Crawler crawls through HTML files recursively within a given website.
• An SWD Crawler crawls through SWDs and discovers URLs according to term semantics.
To determine the likely SWD URLs:
• Non-SWD extension filter: .jpg, .mp3, etc.
• Protocol filter: file://, urn:, etc.
• Namespace of RDF resources in SWD
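The three heuristics for picking likely SWD URLs can be sketched with the Python standard library. The function names and the exact filter sets are assumptions for illustration; the slide lists only .jpg, .mp3, file:// and urn: followed by "etc.", so the real filters are presumably longer.

```python
import posixpath
from urllib.parse import urldefrag, urlparse

NON_SWD_EXTENSIONS = {".jpg", ".mp3"}  # only the examples named on the slide
REJECTED_SCHEMES = {"file", "urn"}     # only the examples named on the slide


def is_candidate_swd_url(url):
    """Apply the protocol filter, then the non-SWD extension filter."""
    parsed = urlparse(url)
    if parsed.scheme.lower() in REJECTED_SCHEMES:
        return False
    extension = posixpath.splitext(parsed.path)[1].lower()
    return extension not in NON_SWD_EXTENSIONS


def namespace_url(uriref):
    """Guess the document URL behind an RDF resource from its namespace."""
    base, fragment = urldefrag(uriref)
    if fragment:
        return base                        # "hash" namespace: drop the fragment
    return uriref.rsplit("/", 1)[0] + "/"  # "slash" namespace: drop the local name


candidates = [
    url
    for url in (
        "http://foo.com/foaf.rdf",
        "http://foo.com/photo.JPG",
        "file:///tmp/foaf.rdf",
        namespace_url("http://xmlns.com/foaf/1.0/Person"),
    )
    if is_candidate_swd_url(url)
]
```

Only the .rdf document and the namespace URL derived from the term survive the filters in this example.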
2. Metadata Creation
Document metadata
• General metadata
• SWD metadata
• Ontology metadata
Term Metadata (definition)
• Class property
• (Instance) property: i.e. class-property bond
Relational metadata
            Term                             Document
Term        rdfs:subClassOf, rdfs:domain…    rdfs:seeAlso, …
Document    Uses, Defines,…                  owl:imports,…
2.1 Ontology Ratio
Why?
• The distinction between an ontology and an instance document is fuzzy.
Given an SWD foo, let
• C(foo): the set of classes defined in foo
• P(foo): the set of properties defined in foo
• I(foo): the set of instances defined in foo
\[ Ratio_{ontology}(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|} \]
The Ontology Ratio is a heuristic for this classification:
• 0: pure SWI
• 1: pure SWO
• > 0.8: foo is said to be an ontology.
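The ratio and its threshold are easy to check with a few lines of Python. This is a minimal sketch: the function names are illustrative, the counts are passed in directly rather than extracted from a document, and the handling of an empty document (raising an error) is an assumption the slide does not address.

```python
def ontology_ratio(num_classes, num_properties, num_instances):
    """(|C| + |P|) / (|C| + |P| + |I|) for one SWD."""
    defined_terms = num_classes + num_properties
    total = defined_terms + num_instances
    if total == 0:
        # Assumption: a document that defines nothing has no meaningful ratio.
        raise ValueError("document defines no classes, properties, or instances")
    return defined_terms / total


def is_ontology(ratio, threshold=0.8):
    """The slide's heuristic: a ratio strictly above 0.8 marks an ontology."""
    return ratio > threshold


pure_swi = ontology_ratio(0, 0, 5)  # only instances
pure_swo = ontology_ratio(3, 2, 0)  # only classes and properties
mixed = ontology_ratio(8, 1, 1)     # nine terms, one instance
```

A document with eight classes, one property, and one instance gets a ratio of 0.9 and is treated as an ontology, while one with four classes and one instance sits exactly at 0.8 and is not.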
2.2 Relational Metadata
Inter-document relation
Inter-term relation
Generalized inter-document relations
• EXtension (EX), e.g. rdfs:subClassOf
• use-TerM (TM), e.g. rdfs:range
• use-INdividual (IN), e.g. owl:sameAs
• IMport (IM), e.g. owl:imports
• Prior Version (PV, IPV, CPV)
• rdfs:seeAlso
• Similar/Equal SWD
• Generalized from individual-level relations
• Capture more relations with less complexity
Usage
• Link SWDs
• Ontology rank
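One way to picture how term-level triples are generalized into inter-document links is the Python sketch below. The predicate-to-relation table uses only the examples given on the slides (plus rdf:type as TM, as in the ontology rank example); the `defined_in` lookup from a term to its defining document is a hypothetical input, and real Swoogle metadata covers more predicates than this.

```python
# Relation codes from the slide, keyed by example predicates (a partial, assumed table).
PREDICATE_TO_RELATION = {
    "owl:imports": "IM",      # document imports another document
    "rdfs:subClassOf": "EX",  # document extends a term defined elsewhere
    "rdfs:range": "TM",       # document uses a term defined elsewhere
    "rdf:type": "TM",
    "owl:sameAs": "IN",       # document uses an individual defined elsewhere
}


def document_links(doc_url, triples, defined_in):
    """Lift the triples of one document to (source, relation, target) document links."""
    links = set()
    for _subject, predicate, obj in triples:
        relation = PREDICATE_TO_RELATION.get(predicate)
        if relation is None:
            continue
        # owl:imports already names a document; other objects are resolved
        # through the (hypothetical) term-to-document lookup.
        target = obj if relation == "IM" else defined_in.get(obj)
        if target is not None and target != doc_url:
            links.add((doc_url, relation, target))
    return links


FOAF = "http://xmlns.com/foaf/1.0/"
WORDNET = "http://xmlns.com/wordnet/1.6/"
RDFS = "http://www.w3.org/2000/01/rdf-schema"

defined_in = {"foaf:Person": FOAF, "wordNet:Person": WORDNET, "rdfs:Class": RDFS}
foaf_triples = [
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Person"),
    ("foaf:Person", "rdf:type", "rdfs:Class"),
]
links = document_links(FOAF, foaf_triples, defined_in)
```

Many term-level statements collapse into a handful of typed document-to-document links, which is what makes the relations usable for linking SWDs and for ranking.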
3. Data analysis: Ranking SWD
Why?
• Ranking captures page importance and popularity.
• Ranking has proven useful in HTML search.
• SWDs differ from HTML documents and carry more semantics.
• So a new SWD ranking mechanism is needed!
Related ideas?
• Google’s PageRank
• Kleinberg’s HITS
[Figure: kinds of documents on the Web: HTML documents, images, audio files, video files, SWOs, SWIs]
3.1 Random surfer model (PageRank)
How is PageRank computed?
Page A’s rank is
\[ PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)} \]
where
• {T_i} are the pages that link to A
• C(X) is the number of page X’s outgoing links
• d is a damping factor (e.g., 0.85)
• the ranks are computed by iterating until convergence
Following any link with uniform probability is the convention on the Web, but not on the Semantic Web:
• Links have semantics that influence the probability of following them.
• Rational users read an ontology and all the ontologies it references.
[Figure: random surfer flowchart. Read page; bored? If yes, jump to a random page; if no, follow a random link.]
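The PageRank recurrence on this slide, PR(A) = (1 - d) + d Σ PR(T_i)/C(T_i), can be iterated directly. The Python sketch below is a plain fixed-point iteration of that non-normalized form; the function name, the graph representation (a dict from page to the set of pages it links to), the starting value of 1.0, and the convergence tolerance are all illustrative choices.

```python
def pagerank(out_links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A."""
    pages = set(out_links) | {t for targets in out_links.values() for t in targets}
    rank = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        new_rank = {}
        for page in pages:
            # Each linking page passes on its rank split evenly over its out-links.
            incoming = sum(
                rank[source] / len(targets)
                for source, targets in out_links.items()
                if page in targets
            )
            new_rank[page] = (1 - d) + d * incoming
        delta = max(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if delta < tol:  # converged
            break
    return rank


# A and B link to each other; C links to A and has no incoming links.
ranks = pagerank({"A": {"B"}, "B": {"A"}, "C": {"A"}})
```

In this small graph C keeps only the base value 1 - d, while A outranks B because it also receives C's vote.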
3.2 Rational Random Surfer Model
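1. Weighted random behavior
\[ \mathrm{rawPR}(A) = (1 - d) + d \sum_{i=1}^{n} \mathrm{rawPR}(X_i) \, \frac{\mathrm{flow}(X_i, A)}{\mathrm{flow}(X_i)} \]
\[ \mathrm{flow}(X_i, A) = \sum_{l \in \mathrm{links}(X_i, A)} \mathrm{weight}(l) \]
\[ \mathrm{flow}(X_i) = \sum_{j=1}^{m} \mathrm{flow}(X_i, A_j) \]
where {X_i} are the SWDs that link to A and {A_j} are the SWDs that X_i links to.
2. Rational behavior
Rank of an SWI:
\[ PR(A) = \mathrm{rawPR}(A) \]
Rank of an SWO:
\[ PR(A) = \sum_{X_i \in TC(A)} \mathrm{rawPR}(X_i) \]
where TC(A) is the transitive closure of SWOs referencing A.
[Figure: rational random surfer flowchart. Read page; if the page is an SWO, read the referenced SWOs (step 2); bored? If yes, jump to a random page; if no, follow a random link (step 1).]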
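The weighted step of the rational surfer model can be sketched in Python as a small variation on PageRank: each link carries a weight that depends on its relation type, and a document passes rank to a neighbour in proportion to flow(X, A) / flow(X). The slide gives no concrete weights, so the values used below are purely hypothetical, as are the function name and the input format (a list of (source, relation, target) links). Weights are assumed to be positive.

```python
from collections import defaultdict


def raw_pagerank(links, weights, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate rawPR(A) = (1 - d) + d * sum(rawPR(X) * flow(X, A) / flow(X))."""
    flow = defaultdict(float)        # flow(X, A): summed weight of links X -> A
    total_flow = defaultdict(float)  # flow(X): summed weight of all links leaving X
    docs = set()
    for source, relation, target in links:
        weight = weights[relation]
        flow[(source, target)] += weight
        total_flow[source] += weight
        docs.update((source, target))

    rank = {doc: 1.0 for doc in docs}
    for _ in range(max_iter):
        new_rank = {doc: 1 - d for doc in docs}
        for (source, target), value in flow.items():
            new_rank[target] += d * rank[source] * value / total_flow[source]
        delta = max(abs(new_rank[doc] - rank[doc]) for doc in docs)
        rank = new_rank
        if delta < tol:  # converged
            break
    return rank


# Hypothetical weights: an EX link is followed three times as often as a TM link.
example_weights = {"TM": 1.0, "EX": 3.0}
raw = raw_pagerank([("X", "TM", "A"), ("X", "EX", "B")], example_weights)
```

With these made-up weights, X splits its rank one quarter to A and three quarters to B, so B ends up with the higher raw rank even though both have exactly one incoming link.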
3.3 Ontology Rank Example
[Figure: four SWDs and the inter-document links derived from their triples]
• SWDs: http://www.w3.org/2000/01/rdf-schema, http://xmlns.com/wordnet/1.6/, http://xmlns.com/foaf/1.0/, http://www.cs.umbc.edu/~finin/foaf.rdf
• Terms shown: rdfs:Class, rdf:Property, wordNet:Person, wordNet:Individual, foaf:Person, foaf:mbox
• Predicates shown: rdf:type, rdfs:subClassOf, foaf:mbox (with value [email protected])
• Link labels: TM (three links) and EX (one link)
3.3 Ontology Rank Example (cont’d)
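SWD                                       rawPR   PR
http://www.w3.org/2000/01/rdf-schema      300     403
http://xmlns.com/wordnet/1.6/             3       103
http://xmlns.com/foaf/1.0/                100     100
http://www.cs.umbc.edu/~finin/foaf.rdf    0.2     0.2
Links: foaf.rdf --TM--> foaf; foaf --EX--> wordnet; foaf --TM--> rdf-schema; wordnet --TM--> rdf-schema.
These values are consistent with summing rawPR over each SWO and the SWOs that reference it: 403 = 300 + 3 + 100 and 103 = 3 + 100. The SWI foaf.rdf keeps its rawPR and does not add to the SWO ranks.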
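The numbers in this example can be reproduced with a short Python sketch of the rational step. It assumes that TC(A) contains A itself together with every SWO that references A directly or transitively, an interpretation chosen because it matches the figures on the slide; the function name and the input structures are illustrative.

```python
def ontology_rank(doc, raw_pr, references, is_swo):
    """PR(doc): rawPR for an SWI; for an SWO, the sum of rawPR over TC(doc)."""
    if not is_swo[doc]:
        return raw_pr[doc]
    # Walk backwards along reference links, keeping only SWOs.
    closure, stack = {doc}, [doc]
    while stack:
        current = stack.pop()
        for source, target in references:
            if target == current and is_swo[source] and source not in closure:
                closure.add(source)
                stack.append(source)
    return sum(raw_pr[member] for member in closure)


RDFS = "http://www.w3.org/2000/01/rdf-schema"
WORDNET = "http://xmlns.com/wordnet/1.6/"
FOAF = "http://xmlns.com/foaf/1.0/"
FININ = "http://www.cs.umbc.edu/~finin/foaf.rdf"

raw_pr = {RDFS: 300, WORDNET: 3, FOAF: 100, FININ: 0.2}
is_swo = {RDFS: True, WORDNET: True, FOAF: True, FININ: False}
# (source, target) pairs for the TM and EX links in the example.
references = [(FININ, FOAF), (FOAF, WORDNET), (FOAF, RDFS), (WORDNET, RDFS)]

pr = {doc: ontology_rank(doc, raw_pr, references, is_swo) for doc in raw_pr}
```

Under that reading, the RDF Schema ontology collects the raw rank of the WordNet and FOAF ontologies that build on it, while the instance document contributes nothing to any ontology's rank.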
Current Status
Swoogle Watch reported (Nov 7, 2004)
• 40 M triples
• 270 K SWDs: 4 K ontologies
• 144 K terms: 91 K classes & 51 K properties
Ongoing work
• Ontology Dictionary
• Swoogle Statistics
• Web Service interface (see Swoogle website)
• IR with the Semantic Web (content search): Character N-Grams, Bag of URIrefs, Swangling
Summary
2004
Swoogle (Mar, 2004)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web Interface
Swoogle2 (Sep, 2004)
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search
2005
Swoogle3
• Better crawl & refresh strategies
• More metadata (ontology mapping)
• More IR features
• Better web service interfaces
• Capture and store all triples
• More reasoning
The End
Website: http://swoogle.umbc.edu
Slides at: http://ebiquity.umbc.edu/v2.1/resource/html/id/66/
Demo: http://ebiquity.umbc.edu/v2.1/resource/html/id/65/