Creating and Sharing Structured Semantic Web Contents through the Social Web (Main Evaluation) Aman Shakya Advisor: Prof.


Creating and Sharing Structured Semantic Web Contents through the Social Web
(Main Evaluation)
Aman Shakya
Advisor: Prof. Hideaki Takeda
Sub-advisors: Assoc. Prof. Nigel Collier, Assoc. Prof. Kenro Aihara
Outline
• Introduction
◦ Social Semantic Web
◦ State of the Art and Problems
• Proposed approach
◦ The StYLiD system
◦ Concept consolidation
◦ Concept grouping
• Evaluation
• Practical applications
• Conclusions

7/27/2009
main evaluation
2
Introduction
Background
• Information sharing
◦ Information publishing
◦ Understandable semantics
◦ Information dissemination
• Shared information
◦ Better utilization → increased value
• Shared information put together
◦ Valuable knowledge
Social Web and Web 2.0
◦ Easy to publish, understand and use
◦ Information sharing platform
◦ User-generated contents
◦ Connecting people
◦ Collaboration
◦ Mass participation – power of people
◦ Wisdom of the crowds
Current Limitations and Needs
• Data processing and automation
◦ Unstructured data only for humans
• Interoperability
◦ Sharing data across different applications
• Integration
◦ Combining data from different applications
The Semantic Web
• Web of structured data → machine-understandable semantics
• Ontologies
◦ Represent conceptualizations of things
◦ Consensus and common formats
• Enables
◦ Automated processing
◦ Interoperation and integration
◦ Effective search and browsing
Challenges
• Difficult to publish on the Semantic Web
• Wide variety of data to share
◦ Long tail of information domains (Huynh et al., 2007)
• Not enough ontologies
• Ontology creation is a difficult process
Goal: to enable people to easily share a wide variety of semantically structured data
Social Semantic Web
• Social software + Semantic Web
• Web 3.0
[Figure: the Social Semantic Web placed along two dimensions, social connectivity and information connectivity; adapted from (Decker, 2005)]
State of the Art: Social Semantic Web
Structured content creation on the Social Semantic Web
• Direct structured contents
◦ Instance data creation
▪ Semantic Blogging
▪ Semantic Bookmarking
▪ Semantic Desktop
▪ Semantic Annotation
◦ Ontology + instance data creation
▪ Semantic Wikis
▪ Collaborative Ontology Creation
• Derived structured contents
◦ Semantification of social data
▪ Data Exporters
▪ Scrapers
▪ Semantics of Tags
▪ Semantics from Text
◦ Emergent Semantics
Collaborative Knowledge Base Creation
[Figure: users contributing to a shared collaborative knowledge base]
Knowledge base = ontology + instance data
Collaborative Knowledge Base Creation Systems
(compared on ease of use, expressiveness, constraints, multiplicity and consensus)
• Semantic Wikis (SMW, IkeWiki, etc.)
◦ Ease of use: complex (extended wiki syntax, some training needed)
◦ Expressiveness: moderate (mainly instances, concept schemas possible)
◦ Constraints: strict type constraints
◦ Multiplicity: no
◦ Consensus: needed (wiki way)
• Freebase (Metaweb Inc.)
◦ Ease of use: moderate (interactive but elaborate interface)
◦ Expressiveness: moderate (concept schemas, instances)
◦ Constraints: strict type constraints
◦ Multiplicity: allowed, but concepts not related
◦ Consensus: mostly needed (wiki way, by admin)
• myOntology (Siorpaes & Hepp, 2007)
◦ Ease of use: complex (understanding of ontology needed)
◦ Expressiveness: moderate (concepts, relations, instances)
◦ Constraints: strict logical constraints
◦ Multiplicity: no
◦ Consensus: needed (wiki way)
• Ontology Maturing (Braun et al., 2007)
◦ Ease of use: fairly easy (need to build taxonomy)
◦ Expressiveness: low (concept hierarchy)
◦ Constraints: free tagging
◦ Multiplicity: no
◦ Consensus: needed (by interaction)
• Desired solution
◦ Ease of use: easy
◦ Expressiveness: moderate
◦ Constraints: minimum
◦ Multiplicity: yes
◦ Consensus: optional
Problems
1. Complexity and learning curve
◦ Powerful collaborative systems are difficult for ordinary people
2. Difficult to create perfect concept definitions and ontologies
◦ Difficult to accommodate all requirements
◦ Strict constraints can make the model rigid
3. Existence of multiple conceptualizations
◦ Different perspectives or contexts
4. Difficulty of collaboration and consensus
Proposed Approach
Proposed Collaborative Knowledge Base Creation
[Figure: groups of users, each with their own local KB, connected to a shared collaborative knowledge base]
Overview of Proposed Approach
[Figure: overview of the proposed approach. Components: user community; social platform for structured data authoring (structured data collection); concepts and instances as structured linked data; concept consolidation (schema alignment); concept grouping (grouped concepts); browsing, searching and services; emerging lightweight ontologies]
StYLiD
• Structure Your own Linked Data
• http://www.stylid.org
• Social software for sharing a wide variety of structured data
◦ Users freely define their own concepts
◦ Easy for ordinary people
• Consolidate multiple concept schemas
• Group and organize similar concepts
• Popular, evolving concept definitions
“Hotel” Concept
[Screenshot: creating a new concept, or reusing/modifying an existing one. Callouts: description, list of attributes, suggested value range]
Shinjuku Prince Hotel
[Screenshot: entering instance data. Callouts: literal value; value picked from the suggested range; resource URI; external URI; multiple values]
Concept Consolidation
[Figure: four user-defined "Hotel" concepts (Hotel 1 to Hotel 4). All four share "Name"; the other attributes vary across them: Amenities, Facilities, Capacity, No. of rooms, Contact, Phone-number, Price, Single room price, Double room price, Rating, Category, Access, Nearest station, Address, City, Country, Zip-code, Latitude, Longitude, Near-by attractions, No. of stories. Annotated correspondence types: same; synonymous / different labels; different contexts / perspectives; many-to-one; complementary]
Hotel (Consolidated Concept)
Name
Facilities
Capacity
Contact
Single room price
Double room price
Access
Rating
Address
Zip-code
Latitude
Longitude
Near-by attractions
No. of stories
Concept Consolidation
• A concept consolidation 𝒞 is defined as a triple ⟨C, S, A⟩, where
◦ C is the consolidated concept
◦ S is the set of constituent concepts {C1, C2, …, Cn}
◦ A is the attribute alignment between C and S
• Based on the Global-as-View (GAV) approach to data integration (Lenzerini, 2002)
◦ The global schema is defined as views over the source schemas
• Consolidated concept C with consolidated attributes
◦ aligned to the source concept attributes as views
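The triple can be pictured as a small data structure. The sketch below is a minimal illustration, assuming each mapping M_i is stored as a dictionary from an attribute of C_i to the consolidated attribute it is aligned with; the class and method names (`Concept`, `ConceptConsolidation`, `is_valid`, `view`) are hypothetical and not StYLiD's actual code. `view` returns the source attributes a consolidated attribute is defined over, which is the GAV reading of the alignment.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    attributes: list[str]


@dataclass
class ConceptConsolidation:
    """Hypothetical representation of a concept consolidation <C, S, A>."""
    consolidated: Concept                  # C: the consolidated concept
    constituents: list[Concept]            # S: constituent concepts C1..Cn
    # A = {M1, ..., Mn}: for each Ci, a map from its attribute to the aligned attribute of C
    alignment: dict[str, dict[str, str]] = field(default_factory=dict)

    def is_valid(self) -> bool:
        """Every aligned pair must name an existing attribute of Ci and of C."""
        source_attrs = {c.name: set(c.attributes) for c in self.constituents}
        target_attrs = set(self.consolidated.attributes)
        return all(
            ci in source_attrs and src in source_attrs[ci] and dst in target_attrs
            for ci, mapping in self.alignment.items()
            for src, dst in mapping.items()
        )

    def view(self, attr: str) -> list[tuple[str, str]]:
        """GAV reading: the source attributes that consolidated attribute `attr` is defined over."""
        return [(ci, src)
                for ci, mapping in self.alignment.items()
                for src, dst in mapping.items() if dst == attr]


# Hypothetical example based on the "Hotel" concepts
hotel1 = Concept("Hotel1", ["Name", "Amenities"])
hotel2 = Concept("Hotel2", ["Name", "Facilities"])
hotel = Concept("Hotel", ["Name", "Facilities"])
cc = ConceptConsolidation(hotel, [hotel1, hotel2], {
    "Hotel1": {"Name": "Name", "Amenities": "Facilities"},
    "Hotel2": {"Name": "Name", "Facilities": "Facilities"},
})
print(cc.is_valid(), cc.view("Facilities"))
```

Keeping A as one mapping per constituent concept mirrors A = {M1, …, Mn}; a mapping that points at an attribute missing from C or C_i is rejected by `is_valid`.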
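Concept Consolidation ⟨C, S, A⟩
[Figure: the consolidated concept C with attributes a_1, …, a_m, and the constituent concepts C_1, …, C_n; each C_i has attributes a_{i1}, …, a_{in_i}. Each aligned attribute a_{ij} of C_i has an image μ(a_{ij}) in C, so each consolidated attribute is a view over the source attributes aligned to it.]
$M_i = \{\mathrm{aligned}(a_1, a_{i1}),\ \mathrm{aligned}(a_2, a_{i2}),\ \ldots,\ \mathrm{aligned}(a_m, a_{in_i})\}$
$A = \{M_1, M_2, \ldots, M_n\}$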
Concept Consolidation
• Consolidated view of instances
• Translation of instances
◦ $v(k, a_j) \rightarrow v(k, \mu(a_j))$, where $v(k, a)$ is the value of attribute $a$ for instance $k$ and $\mu(a_j)$ is the attribute aligned to $a_j$
◦ From one conceptualization to another
• Query unfolding (advantage of GAV over LAV)
◦ Queries over C (in terms of its attributes) are unfolded into queries over {C1, C2, …, Cn}
◦ Using the alignment A
◦ Union of results: $Q(C) = Q_1(C_1) \cup Q_2(C_2) \cup \cdots \cup Q_n(C_n)$
• Translation of queries
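Instance translation and the consolidated view of instances can be sketched as follows, assuming the alignment μ is a lookup from (constituent concept, source attribute) to consolidated attribute. The attribute names echo the Freebase recipe example in this deck, but the instance values and the names `MU`, `translate` and `consolidated_view` are hypothetical. Each instance k is re-keyed so that its value for a_j becomes its value for μ(a_j), and the results are laid out as one homogeneous table.

```python
# Hypothetical alignment mu: (constituent concept, source attribute) -> consolidated attribute
MU = {
    ("r1", "ingredient"): "ingredients",
    ("r2", "ingredients"): "ingredients",
    ("r1", "steps"): "steps",
    ("r2", "instructions"): "steps",
}


def translate(concept, instance, mu):
    """v(k, a_j) -> v(k, mu(a_j)): re-key one instance's values by consolidated attribute.
    Unaligned attributes keep their own name and become extra consolidated attributes."""
    return {mu.get((concept, attr), attr): value for attr, value in instance.items()}


def consolidated_view(sources, mu):
    """Homogeneous table over all constituent concepts; None marks a missing value."""
    rows = [translate(concept, inst, mu)
            for concept, insts in sources.items() for inst in insts]
    columns = sorted({attr for row in rows for attr in row})
    return columns, [[row.get(col) for col in columns] for row in rows]


# Hypothetical instances of two differently structured recipe concepts
sources = {
    "r1": [{"ingredient": "flour, water", "steps": "mix and bake"}],
    "r2": [{"ingredients": "rice", "instructions": "boil", "tools_required": "pot"}],
}
columns, table = consolidated_view(sources, MU)
print(columns)
for row in table:
    print(row)
```

The sketch assumes one-to-one alignments; a many-to-one alignment (two source attributes feeding one consolidated attribute) would need an explicit merge rule, which is omitted here.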
Concept Cloud
[Screenshot: concept cloud, with a consolidated concept and its sub-cloud]
Experiment on Conceptualization
• Hypothesis
◦ Multiple conceptualizations of the same thing by different people can be consolidated
• Methodology
◦ Participants (6) were given short text passages
◦ They listed facts as an (Attribute, Value) table; the attribute column forms a concept schema, e.g.
attribute | value
name | Kiyomizu
location | Kyoto
….. | …..
◦ All concept schemas were aligned manually
Observations
[Charts: types of alignment relations found; attribute label similarity]
Remarks
• People can express their conceptualizations in terms of schemas
• Different people have different conceptualizations
◦ No one covers all possible attributes
• Conceptualizations overlap significantly
◦ Most parts can be aligned
◦ Most have simple alignment relations
• Multiple conceptualizations can be consolidated
Alignment of Concept Schemas
• Attribute alignments suggested automatically
◦ Alignment API implementation (with WordNet extension) (Euzenat, 2004)
• Community-supported alignment
◦ Human intelligence + machine intelligence
• Alignments are represented and saved
◦ Alignment ontology (Hughes and Ashpole, 2004)
◦ Alignment API alignment specification language (Euzenat et al., 2004)
▪ Other formats: C-OWL, SWRL, OWL axioms, XSLT, SEKT-ML and SKOS
◦ Incremental alignment (maintained collaboratively)
• A unified view
◦ Consolidated concept with consolidated attributes
◦ Homogeneous table of data
Semi-automatic Schema Alignment
[Screenshot: alignment of two Hotel concepts, with the resulting consolidated attributes]
Consolidated Structured Search
• Find all hotels with location “Tokyo” and type “luxury”
• Search on the consolidated concept
◦ Hotel 1 ↔ Hotel 2
◦ location ↔ address
◦ type ↔ category
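The search above amounts to query unfolding under GAV. A minimal sketch, assuming a one-to-one alignment per constituent concept stored as a dictionary from consolidated attribute to local attribute; the hotel instances ("Hotel A" to "Hotel D") and the names `ALIGNMENT`, `unfold` and `search` are hypothetical. A concept whose schema cannot express every queried attribute contributes no results, and the answer is the union of the per-concept results.

```python
# Hypothetical alignment: consolidated attribute -> attribute of each constituent concept
ALIGNMENT = {
    "Hotel1": {"location": "location", "type": "type"},
    "Hotel2": {"location": "address", "type": "category"},
}

# Hypothetical instances of the two constituent concepts
INSTANCES = {
    "Hotel1": {"Hotel A": {"location": "Tokyo", "type": "luxury"},
               "Hotel B": {"location": "Osaka", "type": "luxury"}},
    "Hotel2": {"Hotel C": {"address": "Tokyo", "category": "luxury"},
               "Hotel D": {"address": "Tokyo", "category": "budget"}},
}


def unfold(query, alignment):
    """Rewrite Q over consolidated attributes into one Q_i per constituent concept C_i."""
    local = {}
    for concept, amap in alignment.items():
        if all(attr in amap for attr in query):  # Q_i must be expressible over C_i
            local[concept] = {amap[attr]: value for attr, value in query.items()}
    return local


def search(query, alignment, instances):
    """Q(C) = Q_1(C_1) ∪ ... ∪ Q_n(C_n): evaluate each unfolded query and collect the hits."""
    hits = []
    for concept, local_query in unfold(query, alignment).items():
        for name, inst in instances.get(concept, {}).items():
            if all(inst.get(attr) == value for attr, value in local_query.items()):
                hits.append((concept, name))
    return hits


print(search({"location": "Tokyo", "type": "luxury"}, ALIGNMENT, INSTANCES))
# -> [('Hotel1', 'Hotel A'), ('Hotel2', 'Hotel C')]
```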
Concept Grouping
• Concept similarity
ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2)
• NameSim
◦ WordNet-based similarity: Lin’s algorithm (1998)
◦ Levenshtein distance
• SchemaSim
◦ Average similarity of best matching pairs of attributes
• Calculate ConceptSim between all pairs of concepts
→ Group similar concepts above a threshold
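A minimal sketch of the grouping step, assuming Levenshtein-based NameSim normalized to [0, 1] (one of the two options listed; Lin's WordNet measure is not shown) and treating groups as connected components of the graph whose edges join concept pairs with ConceptSim at or above the threshold. The connected-components rule is an assumption, and so is the Jaccard overlap used here as a placeholder SchemaSim; the slide's SchemaSim uses best matching attribute pairs instead. The defaults w1 = 0.7, w2 = 0.3 and threshold 0.8 echo settings from the grouping evaluation; the concepts are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def name_sim(n1, n2):
    """NameSim in [0, 1]: 1 - distance / length of the longer name (case-insensitive)."""
    n1, n2 = n1.lower(), n2.lower()
    longest = max(len(n1), len(n2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(n1, n2) / longest


def jaccard(s1, s2):
    """Placeholder SchemaSim: overlap of attribute-name sets (not the slide's method)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def concept_sim(c1, c2, schema_sim, w1=0.7, w2=0.3):
    """ConceptSim = w1 * NameSim(N1, N2) + w2 * SchemaSim(S1, S2); a concept is (name, attributes)."""
    return w1 * name_sim(c1[0], c2[0]) + w2 * schema_sim(c1[1], c2[1])


def group_concepts(concepts, schema_sim, threshold=0.8, w1=0.7, w2=0.3):
    """Link every pair at or above the threshold; return connected components (union-find)."""
    parent = list(range(len(concepts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(concepts)), 2):
        if concept_sim(concepts[i], concepts[j], schema_sim, w1, w2) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i, (name, _) in enumerate(concepts):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


# Hypothetical user-defined concepts: (name, set of attribute names)
CONCEPTS = [
    ("Hotel", {"name", "price", "address"}),
    ("Hotels", {"name", "price", "address"}),
    ("Recipe", {"ingredients", "steps"}),
]
print(group_concepts(CONCEPTS, jaccard))  # -> [['Hotel', 'Hotels'], ['Recipe']]
```

Here "Hotel" and "Hotels" score 0.7 × 5/6 + 0.3 × 1.0 ≈ 0.88 and are grouped, while "Recipe" shares no attributes with either, so its ConceptSim cannot exceed w1 = 0.7 and stays below the threshold.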
Schema Similarity
• Calculate NameSim for all pairs of attributes of S1 and S2 (attribute sets A1 and A2, with n1 = |A1| and n2 = |A2|) to create an n1 × n2 matrix
$M = [\mathrm{NameSim}(x, y)]_{x \in A_1,\ y \in A_2}$
• Find the best matching pairs by applying the Hungarian algorithm to M (Kuhn, 1955; Munkres, 1957)
• Calculate the matching average
$\mathrm{SchemaSim}(S_1, S_2) = \dfrac{2 \sum_{(x, y) \in P} \mathrm{NameSim}(x, y)}{|A_1| + |A_2|}$, where $P$ is the set of best matching pairs
• Adapted from semantic similarity between sentences (Simpson and Dao, 2005)
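The steps above can be sketched under two stated assumptions: `difflib.SequenceMatcher` ratio stands in for NameSim, and the best one-to-one assignment is found by exhaustive search over permutations. Exhaustive search reaches the same optimum as the Hungarian algorithm but only scales to small schemas; for larger ones, `scipy.optimize.linear_sum_assignment` solves the same assignment problem. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations


def name_sim(a, b):
    """Stand-in NameSim: difflib string ratio in [0, 1] (not WordNet/Lin)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matching_pairs(attrs1, attrs2, sim=name_sim):
    """One-to-one assignment maximizing total similarity over the n1 x n2 matrix M.
    Expects non-empty attribute lists. Exhaustive search stands in for the Hungarian algorithm."""
    m = [[sim(x, y) for y in attrs2] for x in attrs1]
    flipped = len(attrs1) > len(attrs2)
    if flipped:  # make rows the shorter side so every row gets a partner
        m = [list(col) for col in zip(*m)]
    best_total, best_perm = -1.0, ()
    for perm in permutations(range(len(m[0])), len(m)):
        total = sum(m[row][col] for row, col in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    # report pairs as (index in attrs1, index in attrs2)
    pairs = [(col, row) if flipped else (row, col) for row, col in enumerate(best_perm)]
    return best_total, pairs


def schema_sim(attrs1, attrs2, sim=name_sim):
    """SchemaSim = 2 * (sum of best-pair similarities) / (|A1| + |A2|)."""
    if not attrs1 or not attrs2:
        return 0.0
    total, _ = best_matching_pairs(attrs1, attrs2, sim)
    return 2 * total / (len(attrs1) + len(attrs2))


print(schema_sim(["name", "price"], ["price", "name"]))  # -> 1.0
print(schema_sim(["name"], ["name", "rating"]))
```

Because the denominator counts every attribute of both schemas, unmatched attributes lower the score: a one-attribute schema fully contained in a two-attribute schema scores 2 × 1 / 3 ≈ 0.67.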
Visualization of Concept Grouping
[Figure: concept groups visualized with Cytoscape]
Experiments on Freebase Data
• Purpose
◦ Evaluate automatic schema alignment
◦ Evaluate the proposed concept grouping method
◦ Observations about user-defined concepts
• Freebase: community-driven database of the world’s information
• User-defined types (concept schemas)
◦ Queried out (May 20, 2008)
• Cleaning
◦ Filter out test types, stop-words and types without instances
Observations
• After cleaning
◦ 1,412 concepts
◦ 500 users who defined concepts
• People want to share a wide variety of data
• People define their own concept schemas
• Most people define only a few concepts (1–5)
◦ Long tail of information types
Freebase Concept Consolidation
• Concepts with the same name, synonyms or morphological variants
◦ 57 consolidated concepts formed
• Multiple versions of a concept by different users
◦ Up to 6 versions of the same concept
◦ Some users also define multiple versions themselves
• Alignments suggested automatically
◦ 51 alignment relations (44 aligned attribute sets)
◦ Human judgement
◦ Precision 88.24%
◦ Recall 67.16%
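Precision and recall follow the usual definitions: the share of suggested alignments judged correct, and the share of correct alignments that were suggested. Assuming both rates count individual alignment relations, 88.24% of 51 suggestions corresponds to 45 correct ones (45/51 ≈ 0.8824), and a recall of 67.16% then implies 67 alignments in the human-judged reference (45/67 ≈ 0.6716). A minimal sketch of the computation on hypothetical toy pairs:

```python
def precision_recall(suggested, reference):
    """Precision = correct / suggested, recall = correct / reference."""
    suggested, reference = set(suggested), set(reference)
    correct = len(suggested & reference)
    precision = correct / len(suggested) if suggested else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data: an alignment is a pair of attributes judged equivalent
suggested = {("r1#ingredient", "r2#ingredients"),
             ("r1#steps", "r2#instructions"),
             ("r2#tools_required", "r3#taste")}       # one wrong suggestion
reference = {("r1#ingredient", "r2#ingredients"),
             ("r1#steps", "r2#instructions"),
             ("r2#ingredients", "r3#materials")}      # one alignment missed
print(precision_recall(suggested, reference))

# The slide's rates are consistent with 45 correct of 51 suggested and 67 reference alignments
print(round(100 * 45 / 51, 2), round(100 * 45 / 67, 2))  # -> 88.24 67.16
```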
Concept Consolidation Example
• {Recipe (user1), Recipe (user2), Recipes (user3) ….} = {r1, r2, r3}
• Consolidated concept: Recipe
• Consolidated attributes
◦ {r1#ingredient, r2#ingredients, r3#materials} (aligned attribute set)
◦ {r1#steps, r2#instructions} (aligned attribute set)
◦ r3#directions
◦ r2#tools_required
◦ r3#taste
◦ r3#author ……
(adapted from Freebase)
Evaluation of Concept Grouping
ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2)
[Chart: concept grouping with different thresholds (w1 = 0.7, w2 = 0.3)]
[Chart: concept grouping with different weights (threshold = 0.8)]
Emergence of Lightweight Ontologies
• Concepts contributed by the community
◦ Concept consolidation
◦ Concept grouping
◦ Popularity of concepts (as in tag clouds)
• Common vocabulary for structured information sharing
◦ Conceptual schemas (class/property)
◦ Informal organization by similarity
Informal Lightweight Ontology
[Figure: informal lightweight ontology; source: Schaffert et al. (2005), p. 7]
Evaluation
Evaluation of Usability
• Hypothesis
◦ StYLiD is more usable than Freebase (for the given tasks)
• Methodology
◦ Tasks performed with both StYLiD and Freebase
▪ Task 1: structured data authoring
▪ Task 2: concept schema creation
▪ Tasks 3, 4: modifying and reusing concepts
▪ Task 5: structured concepts and instances authoring
▪ Task 6: searching
◦ Observations
▪ Questionnaires, screen logs, comments, etc.
Example (Task 1)
[Screenshot: Task 1 input, a Band: The Beatles]
Participants
• 15 participants in total
◦ Including 6 without an IT background
◦ Different backgrounds
▪ Public policy, international relations, psychology, telecommunication, networks, hotel staff, etc.
◦ From 10 countries
◦ Age: 22–43 (avg. 28.3)
◦ Most did not know the systems before
Results
• System Usability Scale (SUS) (Digital Equipment Corp.)
◦ Average scores: StYLiD 69.7%, Freebase 39.3%
▪ Enhanced Semantic MediaWiki: 54.8% (Pfisterer et al., 2008)
• Aggregated results from the tasks
[Chart: aggregated task results (score: 0–4)]
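SUS has a fixed scoring rule, sketched here for reference: for each participant's ten answers on a 1–5 scale, odd-numbered items contribute (answer − 1), even-numbered items contribute (5 − answer), and the sum is multiplied by 2.5 to give a 0–100 score. Averaging per-participant scores with `statistics.mean` is an assumption about how the reported averages were aggregated, and the answers below are made up, not data from this study.

```python
from statistics import mean


def sus_score(responses):
    """Standard SUS score for one participant's ten 1-5 Likert answers (item 1 first)."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten answers on a 1-5 scale")
    # Odd-numbered items (index 0, 2, ...) are positively worded: contribute r - 1.
    # Even-numbered items (index 1, 3, ...) are negatively worded: contribute 5 - r.
    raw = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses))
    return raw * 2.5  # scale 0-40 raw points to 0-100


# Hypothetical answers from two participants
participants = [[4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
                [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]]
print(mean(sus_score(p) for p in participants))  # -> 87.5
```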
Results for non-IT participants
• 6 participants
• SUS scores
◦ StYLiD (71.67%), Freebase (50.42%)
Observations
• StYLiD is quite usable without any training, prior knowledge or help
→ Most users preferred StYLiD to Freebase
• Specifying attribute value ranges is not easy
→ Strict data type constraints can cause problems
→ Many people modify and reuse concepts
• People try to enter all data in the minimum number of steps
→ Data entry can be made easier and quicker
◦ Auto-complete mechanisms would be helpful
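Comparison with some systems
• Concept creation: UI supported (StYLiD), UI supported (Freebase), template markup (Semantic MediaWiki)
• Instance creation: form-based (StYLiD), form-based (Freebase), extended wiki syntax + forms (Semantic MediaWiki)
• Data authoring: blogging / social bookmarking (StYLiD), structured wiki (Freebase), wiki text annotation (Semantic MediaWiki)
• Data import: wrappers (StYLiD), bulk import facility (Freebase), not possible (Semantic MediaWiki)
• Constraints: flexible (StYLiD), strict type constraints (Freebase), strict type constraints (Semantic MediaWiki)
• Multiplicity: allowed (StYLiD), partly (Freebase), no (Semantic MediaWiki)
• Consolidation: schema-level (StYLiD), some instances (Freebase), no (Semantic MediaWiki)
• Organization: concept grouping (StYLiD), bases (Freebase), categories (Semantic MediaWiki)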
Practical Applications
Application Scenarios
[Figure: StYLiD as a social site for structured information sharing. Components: users; concept schemas; structured data; external data resources; CMS; integration via schema alignment; information sharing through a social semantic website]
Application Scenarios
[Figure: integrated semantic portal. Information sources IS1, IS2 and IS3 are connected through Wrapper1, Wrapper2 and Wrapper3 to StYLiD as the data backend, which holds structured data and concept schemas, draws on external data resources, and performs integration via schema alignment; users and an admin work with the integrated semantic portal]
Adapting to different scenarios
• Variable aspects
◦ Data and concepts acquisition
◦ Community and motivation
◦ Functionalities and constraints
◦ Data quality
• Ways of adaptation
◦ Use of wrappers, etc.
◦ Delegate functionalities/constraints
◦ Extensible and customizable open source
◦ Customized queries and views
Real practical applications
• Integration of research staff directories
◦ Osaka University and Nagoya University
◦ Data scraped from the websites
• A musical community website at the Tokyo International Exchange Center (TIEC)
• Social data bookmarking site StYLiD.org
• A document management system at AIT
University Directory Integration
• 10 alignments automatically suggested
◦ All correct
• 19 alignments in total
Integrated interface
TIEC Musical Community website
StYLiD.org Data Bookmarking
Document Management system
Structured Information Dissemination in Decentralized Communities
[Figure: multiple SocioBiblog systems, each handling publishing and aggregation, connected over the Web by social network links and extended RSS]
Conclusions
Conclusions
• Social web application for sharing structured Semantic Web contents
◦ StYLiD
◦ Free contribution, no strict constraints
◦ Usable (even without training)
• Concept consolidation
◦ Multiple conceptualizations exist
◦ They overlap significantly and can be consolidated
◦ Automatic alignments with good precision and recall (88.24% and 67.16% on Freebase data)
◦ A loose collaborative approach for creating shared concept definitions
Conclusions (contd.)
• Concept grouping by similarity
◦ Informal organization
◦ Good precision can be obtained
◦ Parameters can be tuned for appropriate coverage and precision
• Emergent lightweight informal ontologies
◦ Ontology as a by-product of information sharing and integration
• Practical applications
Future Directions
• Computing concept relations
◦ Hierarchical and non-hierarchical
• Better schema alignment techniques
• Consolidation of data instances
• Using existing vocabularies
• Mash-ups / plugins to utilize structured data
• Scrapers to collect data from the web
• …
Thank You!
• Questions
• Suggestions