Realizing the Relationship Web:
Morphing information access on the Web from today’s document- and entity-centric
paradigm to a relationship-centric paradigm
ACM Multimedia International Workshop on the Many Faces of Multimedia Semantics
September 28, 2007, Augsburg, Germany
Amit Sheth
Kno.e.sis Center, Wright State University,
Dayton, OH
This talk also represents work of several members of Kno.e.sis team, esp. the
Semantic Discovery and Semantic Sensor Data. http://knoesis.wright.edu
Thanks, M. Perry, C. Ramakrishnan, C. Thomas
Knowledge Enabled Information and Services Science
Objects of Interest
“An object by itself is intensely uninteresting”.
Grady Booch, Object Oriented Design with Applications, 1991
Keywords | Search
Entities | Integration
Relationships | Analysis, Insight
Entities + Relationships also needed to model & study Events
Semantics and Relationships
Increasing depth and sophistication in dealing with
semantics by dealing with (identifying/searching
to analyzing) documents, entities, and
relationships.
Future: Relationships
Current: Entities
Past: Documents/Media
Data, Information, and Insight
What: Which thing or which particular one
Who: What or which person or persons
Where: At or in what place
When: At what time
How: In what manner or way; by what means
Why: For what purpose, reason, or cause; with what
intention, justification, or motive
© Ramesh Jain
Insights require understanding Relationships
[Table: rows What, Who, Where, When, How, Why; columns Object, Location, Time, Relationships]
© Ramesh Jain
Semantics and Relationships
Semantics is derived from relationships. Consider the
linguistics perspective.
“Semantics is the study of meaning. …We may
distinguish a number of legitimate ways to approach
semantics:
• …
• the relationship between linguistic expressions (e.g.
synonymy, antonymy, hyperonymy, etc.): sense;
• the relationship of linguistic expressions to the "real
world": reference.”
Ontologies use a KR language to support modeling of relationships.
Quoted part from http://www.ncl.ac.uk/sml/staff/. © 2000 Jonathan West.
Why is this a hard problem?
Are objects/entities equivalent/equal (same)?
How (well) are they related?
• Implicit vs explicit; formal/assertional vs social
consensus based; powerful (beyond FOL): partial,
probabilistic and fuzzy match
• Degrees of relatedness and relevance: semantic
similarity, semantic proximity, semantic distance, ….
– [differentiation, disjointedness]
– related in a “context”
• Even is-a link involves different notions: identity, unity,
essence (Guarino and Welty 2002)
Semantic ambiguity, also based on incomplete,
inconsistent, approximate information/knowledge
Issues - Relationships
• Identifying Relationships (extraction)
• Expressing (specifying, representing)
relationships
• Discovering and Exploring Relationships
• Hypothesizing and Validating
Relationships
• Utilizing/exploiting Relationships for
Semantic Applications (in document
search, querying metadata, inferencing,
analysis, insight, discovery)
Information Extraction for Metadata Creation
Sources: WWW, Enterprise Repositories; Nexis, UPI, AP; Feeds/Documents; Digital Videos; Data Stores; Digital Maps; Digital Images; Digital Audios
Sources => EXTRACTORS => METADATA
Create/extract as much (semantic) metadata automatically as possible;
use ontologies to improve and enhance extraction
Automatic Semantic Metadata Extraction/Annotation
Semantic Annotation (Extraction + Enhancement)
COMTEX Tagging: limited tagging (mostly syntactic)
Value-added Voquette Semantic Tagging: content ‘enhancement’, rich semantic metatagging
Value-added relevant metatags added by Voquette to existing COMTEX tags:
• Private companies
• Type of company
• Industry affiliation
• Sector
• Exchange
• Company Execs
• Competitors
Semantic Metadata Enhancement
Automatic Classification & Metadata Extraction (Web page)
Video with Editorialized Text on the Web
Auto Categorization
Semantic Metadata
Ontology-directed Metadata Extraction
(Semi-structured data)
Web Page
Enhanced Metadata Asset
Extraction Agent
Semantic Extraction/Annotation of Experimental Data
ProPreO: Ontology-mediated provenance
Mass Spectrometry (MS) Data: ms/ms peaklist data
parent ion m/z | parent ion abundance | parent ion charge
830.9570 | 194.9604 | 2
fragment ion m/z | fragment ion abundance
580.2985 | 0.3592
688.3214 | 0.2526
779.4759 | 38.4939
784.3607 | 21.7736
1543.7476 | 1.3822
1544.7595 | 2.9977
1562.8113 | 37.4790
1660.7776 | 476.5043
Video metadata and search (Wired Mag, Aug 2007)

Technique: Scanning the script
Who’s trying it: Blinkx, TVEyes
How it works: Everything said in a clip is tracked through voice recognition software, closed-caption information, or a combination of the two.
Wired: Hunts down TV news references to, say, Lindsay Lohan.
Tired: A mere mention of her name doesn't guarantee Lindsay is in the clip.

Technique: Identifying what's being shown
Who’s trying it: Google, UCSD's Statistical Visual Computing Lab, VideoMining
How it works: Algorithms try to figure out what's in the video by monitoring attributes like behavior and movement (VideoMining), faces (Google), and objects (UCSD).
Wired: Finds friends, public figures, or specific actions like car chases.
Tired: The tech is stuck in the lab or limited to specialized search tasks.

Technique: Analyzing links and metadata
Who’s trying it: Dabble, Google
How it works: Conventional search spiders scan the text around a video and the pages that link to it.
Wired: The fastest and best way to search for video data.
Tired: Can't "see" what's in the videos or locate specific action in a long clip.
Semantic Sensor ML – Adding Ontological Metadata
Domain Ontology: Person, Company
Spatial Ontology: Coordinates, Coordinate System
Temporal Ontology: Time Units, Timezone
Mike Botts, "SensorML and Sensor Web Enablement," Earth System Science Center, UAB Huntsville
Relationships on the Web: early work
MREF (Metadata Reference Link -- complementing HREF)
Creating “logical web” through Media Independent Metadata based Correlation
Metadata Reference Link (<A MREF …>)
<A HREF="URL">Document Description</A>
physical link between document (components)
• <A MREF KEYWORDS=<list-of-keywords>; THRESH=<real>>Document Description</A>
• <A MREF ATTRIBUTES(<list-of-attribute-value-pairs>)>Document Description</A>
MREF
Metadata Reference Link -- complementing HREF (1996, 1998)
Creating “logical web” through Media Independent Metadata based Correlation
[Diagram: ONTOLOGY NAMESPACE, METADATA, and DATA layers shown twice, with MREF in RDF between them]
MREF (1998)
Model for Logical Correlation using Ontological Terms and Metadata: MREF
Framework for Representing MREFs: RDF
Serialization (one implementation choice): XML
K. Shah and A. Sheth, "Logical Information Modeling of Web-accessible Heterogeneous Digital Assets",
Proc. of the Forum on Research and Technology Advances in Digital Libraries (ADL'98),
Santa Barbara, CA, May 28-30, 1998, pp. 266-275.
Figure 3: XML, RDF, and MREF
Correlation based on Content-based Metadata
Some interesting information on dams is available here.
“information on dams” is defined by an MREF defining
keywords and metadata (which may be used for a query).
water.gif (Data)
Metadata Storage: water.gif, ……mpeg, ……ppm
Content Dependent Metadata: height, width and size
Content based Metadata: Major component (RGB): Blue
Domain Specific Correlation
Potential locations for a future shopping mall identified by all
regions having a population greater than 500 and area greater
than 50 sq meters having an urban land cover and moderate
relief
<A MREF ATTRIBUTES(population > 500; area > 50 &
region-type = ‘block’ & land-cover = ‘urban’ & relief =
‘moderate’)>can be viewed here</A>
=> media-independent relationships between domain specific
metadata: population, area, land cover, relief
=> correlation between image and structured data at a higher
domain specific level as opposed to physical “link-chasing” in
the WWW
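As a rough illustration of how an attribute-based MREF like this might be resolved, the sketch below filters region records using the criteria stated in the prose (population greater than 500, area greater than 50, urban land cover, moderate relief, block regions). The `Region` record, its field names, and the sample values are hypothetical and are not taken from the MREF work; in the talk the attributes come from different repositories and media types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical flattened view of the domain-specific metadata."""
    name: str
    population: int
    area: float          # same unit as the slide's "area" attribute
    region_type: str
    land_cover: str
    relief: str


def matches_mall_criteria(region: Region) -> bool:
    # Thresholds follow the prose on the slide; treat them as illustrative.
    return (
        region.population > 500
        and region.area > 50
        and region.region_type == "block"
        and region.land_cover == "urban"
        and region.relief == "moderate"
    )


def candidate_regions(regions):
    """Return the names of regions satisfying every attribute constraint."""
    return [r.name for r in regions if matches_mall_criteria(r)]


# Made-up sample data, for illustration only.
SAMPLE_REGIONS = [
    Region("block-A", 1200, 80.0, "block", "urban", "moderate"),
    Region("block-B", 300, 90.0, "block", "urban", "moderate"),
    Region("block-C", 900, 120.0, "block", "forest", "moderate"),
]

print(candidate_regions(SAMPLE_REGIONS))
```

The point of the sketch is that the link is resolved by evaluating metadata constraints rather than by following a fixed URL.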
Repositories and the Media Types
Metadata: Population, Area, Boundaries, Land cover, Relief
Extraction: Image Features (IP routines), Regions (SQL), Boundaries
Repositories: Census DB, TIGER/Line DB, Map DB
Complex Relationships
• Some relationships may not be manually
asserted, but derived from statistical
analyses of text, experimental data, etc.
• => allow association of provenance data
with classes, instances, relationship types
and direct relationships or statements
A simple relationship?
Smoking Causes Cancer
Complex Relationships - Cause-Effects & Knowledge discovery
[Diagram: entities VOLCANO, ENVIRON., LOCATION, BUILDING, WEATHER, PEOPLE, PLANT; relationships ASH RAIN, PYROCLASTIC FLOW, DESTROYS, COOLS TEMP, KILLS]
Knowledge Discovery - Example
Earthquake Sources
Nuclear Test Sources
Nuclear Test May Cause Earthquakes
Is it really true?
Complex Relationship: How do you model this?
Inter-Ontological Relationships
A nuclear test could have caused an earthquake
if the earthquake occurred some time after the
nuclear test was conducted and in a nearby region.
NuclearTest Causes Earthquake
<= dateDifference( NuclearTest.eventDate,
Earthquake.eventDate ) < 30
AND distance( NuclearTest.latitude,
NuclearTest.longitude,
Earthquake.latitude,
Earthquake.longitude ) < 10000
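A minimal sketch of how the rule above could be evaluated over event records. The slide does not state units, so the sketch assumes the date difference is in days and the distance is in kilometres; the great-circle (haversine) distance, the `Event` record, and the requirement that the earthquake not precede the test are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical event record with the three attributes the rule uses."""
    event_date: date
    latitude: float
    longitude: float


def date_difference_days(earlier: date, later: date) -> int:
    return (later - earlier).days


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (assumed metric)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(min(1.0, a)))


def may_have_caused(test: Event, quake: Event,
                    max_days: int = 30, max_distance: float = 10000) -> bool:
    """Candidate 'NuclearTest Causes Earthquake' link per the slide's rule."""
    days = date_difference_days(test.event_date, quake.event_date)
    # "some time after": the earthquake must not precede the test (assumption).
    if not 0 <= days < max_days:
        return False
    return distance_km(test.latitude, test.longitude,
                       quake.latitude, quake.longitude) < max_distance


# Made-up events, for illustration only.
test_event = Event(date(2000, 1, 1), 0.0, 0.0)
nearby_quake = Event(date(2000, 1, 15), 1.0, 1.0)
print(may_have_caused(test_event, nearby_quake))
```

A match only marks a candidate relationship; as the following slides argue, it still has to be explored and validated against the data.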
Knowledge Discovery – exploring relationship…
For each group of earthquakes with magnitudes in the ranges
5.8-6, 6-7, 7-8, 8-9, and >9 on the Richter scale per year
starting from 1900, find number of earthquakes
The number of earthquakes with magnitude > 7 is almost constant.
So nuclear tests probably only cause earthquakes with
magnitude < 7
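The grouping described on this slide can be sketched as a simple count per year and magnitude range. The input format (pairs of year and magnitude), the half-open treatment of range boundaries, and the sample values are assumptions; the slide does not say how boundary magnitudes such as exactly 6.0 were assigned.

```python
from collections import Counter

# Magnitude ranges from the slide, as (label, low, high) with low <= m < high.
# Putting boundary values in the higher range is an arbitrary choice here.
BINS = [
    ("5.8-6", 5.8, 6.0),
    ("6-7", 6.0, 7.0),
    ("7-8", 7.0, 8.0),
    ("8-9", 8.0, 9.0),
    (">9", 9.0, float("inf")),
]


def magnitude_bin(magnitude):
    """Return the label of the range containing the magnitude, or None."""
    for label, low, high in BINS:
        if low <= magnitude < high:
            return label
    return None


def count_by_year_and_bin(quakes, start_year=1900):
    """Count earthquakes per (year, magnitude range) from start_year onward."""
    counts = Counter()
    for year, magnitude in quakes:
        label = magnitude_bin(magnitude)
        if year >= start_year and label is not None:
            counts[(year, label)] += 1
    return counts


# Made-up records, for illustration only.
sample = [(1950, 6.5), (1950, 6.9), (1950, 7.2), (1899, 8.0), (1960, 5.0), (1960, 9.5)]
for key, n in sorted(count_by_year_and_bin(sample).items()):
    print(key, n)
```

Comparing these yearly counts before and after the start of nuclear testing is the kind of exploration the slide uses to qualify the hypothesized relationship.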
Now possible – Extracting relationships between MeSH terms from PubMed
UMLS Semantic Network: Biologically active substance, Lipid, Disease or Syndrome; relationships: complicates, affects, causes
MeSH: Fish Oils ??????? Raynaud’s Disease (each linked by instance_of to the UMLS Semantic Network)
PubMed: 9284 documents, 5 documents, 4733 documents
Schema-Driven Extraction of Relationships from Biomedical Text
Cartic Ramakrishnan, Krys Kochut, Amit P. Sheth: A Framework for Schema-Driven Relationship Discovery from Unstructured Text. International Semantic
Web Conference 2006: 583-596
Method – Parse Sentences in PubMed
SS-Tagger (University of Tokyo)
SS-Parser (University of Tokyo)
(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous)) (NN stimulation)) (PP (IN by) (NP (NN estrogen)))) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia)) (PP (IN of) (NP (DT the) (NN endometrium)))))))
• Entities (MeSH terms) in sentences occur in modified forms
• “adenomatous” modifies “hyperplasia”
• “An excessive endogenous or exogenous stimulation” modifies “estrogen”
• Entities can also occur as composites of 2 or more other entities
• “adenomatous hyperplasia” and “endometrium” occur as “adenomatous
hyperplasia of the endometrium”
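To make the idea of reading modifiers off a parse concrete, the sketch below parses the bracketed tree for this sentence and pairs each adjective with the head noun of the same flat noun phrase. It is a simplification written for illustration, not the method of the cited framework: it only covers the simplest case (such as “adenomatous” modifying “hyperplasia”) and does not attempt prepositional modifiers or composite entities. The function names are hypothetical.

```python
PARSE = (
    "(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) "
    "(JJ exogenous)) (NN stimulation)) (PP (IN by) (NP (NN estrogen)))) "
    "(VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia)) "
    "(PP (IN of) (NP (DT the) (NN endometrium)))))))"
)


def parse_bracketed(text):
    """Turn a bracketed parse into nested lists: [label, child, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token                 # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                         # consume the closing ")"
        return node

    return read()


def modifier_pairs(tree):
    """Collect (adjective, noun) pairs from NPs with direct JJ and NN children."""
    pairs = []

    def visit(node):
        if isinstance(node, str):
            return
        label, children = node[0], node[1:]
        if label == "NP":
            adjectives = [c[1] for c in children
                          if isinstance(c, list) and c[0] == "JJ"]
            nouns = [c[1] for c in children
                     if isinstance(c, list) and c[0] == "NN"]
            if adjectives and nouns:
                # Treat the last noun as the head of the phrase.
                pairs.extend((adj, nouns[-1]) for adj in adjectives)
        for child in children:
            visit(child)

    visit(tree)
    return pairs


print(modifier_pairs(parse_bracketed(PARSE)))
```

Adjectives nested inside the ADJP (“endogenous”, “exogenous”) are deliberately not picked up by this simple rule; a fuller treatment would walk those sub-phrases as well.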
Method – Identify entities and Relationships in Parse Tree
[Figure: parse tree of the example sentence, with Modifiers, Modified entities, and Composite Entities highlighted]
Resulting Semantic Web Data in RDF
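modified_entity1 --hasModifier--> “An excessive endogenous or exogenous stimulation”
modified_entity1 --hasPart--> estrogen
modified_entity2 --hasModifier--> adenomatous
modified_entity2 --hasPart--> hyperplasia
composite_entity1 --hasPart--> modified_entity2
composite_entity1 --hasPart--> endometrium
modified_entity1 --induces--> composite_entity1
Legend: Modifiers, Modified entities, Composite Entities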
Blazing Semantic Trails in Biomedical Literature
Cartic Ramakrishnan, Amit P. Sheth: Blazing Semantic Trails in Text: Extracting
Complex Relationships from Biomedical Literature. Tech. Report #TR-RS2007
Relationships -- Blazing the Trails
“The physician, puzzled by her patient's reactions, strikes the trail
established in studying an earlier similar case, and runs rapidly
through analogous case histories, with side references to the classics
for the pertinent anatomy and histology. The chemist, struggling
with the synthesis of an organic compound, has all the chemical
literature before him in his laboratory, with trails following the
analogies of compounds, and side trails to their physical and
chemical behavior.” [V. Bush, As We May Think. The Atlantic
Monthly, 1945. 176(1): p. 101-108. ]
Original documents
PMID-15886201
PMID-10037099
Semantic Trail
Semantic Trails over all types of Data
Semantic Trails can be built over a Web of Semantic (Meta)Data
extracted (manually, semi-automatically and automatically) and gleaned from
• Structured data (e.g., NCBI databases)
• Semi-structured data (e.g., XML based and semantic metadata
standards for domain specific data representations and exchanges)
• Unstructured data (e.g., PubMed and other biomedical literature)
• Various modalities (experimental data, medical images, etc.)
Applications
“Everything's connected, all along the line. Cause and effect.
That's the beauty of it.
Our job is to trace the connections and reveal them.”
Jack in Terry Gilliam’s 1985 film - “Brazil”
An application in Risk & Compliance
Ahmed Yaseer:
• Appears on Watchlist ‘FBI’
• Works for Company ‘WorldCom’
• Member of organization ‘Hamas’
Ahmed Yaseer --appears on Watchlist--> FBI Watchlist (Watch list)
Ahmed Yaseer --member of organization--> Hamas (Organization)
Ahmed Yaseer --works for Company--> WorldCom (Company)
Global Investment Bank
Sources: Watch Lists, Law Enforcement, Regulators, Public Records, World Wide Web content, BLOGS, RSS
Semi-structured Government Data; Unstructured text, Semi-structured Data
Establishing New Account
User will be able to navigate the ontology using a number of different interfaces
Scores the entity based on the content and entity relationships
Example of Fraud prevention application used in financial services
Hypothesis driven retrieval of Scientific Text
Semantic Browser
More about the Relationship Web
The Relationship Web takes you away from asking “which document”
could have the information I need, to “what’s in the
resources” that gives me the insight and knowledge I
need for decision making.
Amit P. Sheth, Cartic Ramakrishnan: Relationship Web: Blazing Semantic Trails
between Web Resources. IEEE Internet Computing July 2007 (to appear)
Events: 3 Dimensions – Spatial, Temporal and Thematic
Spatial
Temporal
Thematic
Events and STT dimensions
• Powerful mechanism to integrate content
– Describes the Real-World occurrences
– Can have video, images, text, audio all of the same event
– Search and Index based on events and STT relations
• Many relationship types
– Spatial:
• What events happened near this event?
• What entities/organizations are located nearby?
– Temporal:
• What events happened before/after/during this event?
– Thematic:
• What is happening?
• Who is involved?
• Going further
– Can we use What? Where? When? Who? to answer Why? / How?
– Use integrated STT analysis to explore cause and effect
Example Scenario: Sensor Data Fusion and Analysis
Low-level Sensor (S-L)
High-level Sensor (S-H)
[Diagram labels: H, L, A-H, E-H, A-L, E-L]
• How do we determine if A-H = A-L? (Same time? Same place?)
• How do we determine if E-H = E-L? (Same entity?)
• How do we determine if E-H or E-L constitutes a threat?
Sensor Data Pyramid (top to bottom)
Relationship Metadata: Semantics/Understanding/Insight
Entity Metadata, Feature Metadata: Information
Raw Sensor (Phenomenological) Data: Data
Sensor Data Architecture
Knowledge: Object-Event Relations, Spatiotemporal Associations, Provenance Pathways
Information: Entity Metadata, Feature Metadata
Data: Raw Phenomenological Data
Analysis Processes: Semantic Analysis (RDF KB)
Annotation Processes: Entity Detection, Feature Extraction, Fusion, Collection (formats shown: SML-S, O&M, TML)
Ontologies: Object-Event Ontology, Space-Time Ontology
Sensors (RF, EO, IR, HIS, acoustic)
Current Research Towards STT Relationship Analysis
• Modeling Spatial and Temporal data using SW standards (RDF(S)) [1]
– Upper-level ontology integrating thematic and spatial dimensions
– Use Temporal RDF [3] to encode temporal properties of relationships
– Demonstrate expressiveness with various query operators built upon
thematic contexts
• Graph Pattern queries over spatial and temporal RDF data [2]
– Extended ORDBMS to store and query spatial and temporal RDF
– User-defined functions for graph pattern queries involving spatial
variables and spatial and temporal predicates
– Implementation of temporal RDFS inferencing
1. Matthew Perry, Farshad Hakimpour, Amit Sheth. "Analyzing Theme, Space and Time: An Ontology-based Approach",
Fourteenth International Symposium on Advances in Geographic Information Systems (ACM-GIS '06), Arlington, VA,
November 10 - 11, 2006
2. Matthew Perry, Amit Sheth, Farshad Hakimpour, Prateek Jain. "Supporting Complex Thematic, Spatial and Temporal
Queries over Semantic Web Data", Second International Conference on Geospatial Semantics (GeoS '07), Mexico City,
MX, November 29 – 30, 2007
3. Claudio Gutiérrez, Carlos A. Hurtado, Alejandro A. Vaisman. "Temporal RDF", ESWC 2005: 93-107
Upper-level Ontology modeling Theme and Space
Classes: Continuant, Occurrent, Named_Place, Dynamic_Entity, Spatial_Occurrent, Spatial_Region
Properties: located_at, occurred_at
Legend: rdfs:subClassOf, property
Continuant: Concrete and Abstract Entities – persist over time
Occurrent: Events – happen and then don’t exist
Named_Place: Those entities with static spatial behavior (e.g. building)
Dynamic_Entity: Those entities with dynamic spatial behavior (e.g. person)
Spatial_Occurrent: Events with concrete spatial locations (e.g. a speech)
Spatial_Region: Records exact spatial location (geometry objects, coordinate system info)
located_at: Links Named_Places to their geographic locations
occurred_at: Links Spatial_Occurrents to their geographic locations
Upper-level Ontology
Upper-level Ontology classes: Continuant, Occurrent, Named_Place, Dynamic_Entity, Spatial_Occurrent, Spatial_Region
Upper-level Ontology properties: located_at, occurred_at
Domain Ontology classes: City, Person, Politician, Soldier, Military_Unit, Vehicle, Speech, Military_Event, Bombing, Battle
Domain Ontology relationship types: trains_at, gives, participates_in, assigned_to, on_crew_of, used_in
Legend: rdfs:subClassOf used for integration; rdfs:subClassOf; relationship type
Temporal RDF Graph: Platoon Membership
Nodes: E1:Soldier, E4:Soldier, E5:Soldier, E2:Platoon, E3:Platoon
Edges: assigned_to [1, 10], assigned_to [11, 20], assigned_to [5, 15] (twice)
Time interval represents valid time of the relationship
E1 is assigned to E2 from time 1 to 10 and then
assigned to E3 from time 11 to 20
Also need to handle inferencing:
(x rdf:type Grad_Student):[2004, 2006] AND
(x rdf:type Undergrad_Student):[2000, 2004]
=> (x rdf:type Student):[2000, 2006]
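The student example can be sketched as interval propagation along rdfs:subClassOf followed by merging of overlapping intervals. This is an illustrative simplification, not the ORDBMS implementation from the cited work: it assumes Grad_Student and Undergrad_Student are both subclasses of Student (implied but not stated on the slide), handles a single subclass level, and merges only intervals that overlap or share an endpoint.

```python
def merge_intervals(intervals):
    """Merge closed intervals that overlap or share an endpoint."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


def infer_superclass_intervals(type_facts, subclass_of):
    """Derive valid-time intervals for superclass membership of one resource.

    type_facts:  {class_name: [(start, end), ...]} for a single resource x
    subclass_of: {class_name: direct_superclass}   (one level only, assumed)
    """
    inferred = {}
    for cls, intervals in type_facts.items():
        parent = subclass_of.get(cls)
        if parent is not None:
            inferred.setdefault(parent, []).extend(intervals)
    return {cls: merge_intervals(ivals) for cls, ivals in inferred.items()}


# The facts from the slide; the subclass axioms are an assumption.
facts_for_x = {
    "Grad_Student": [(2004, 2006)],
    "Undergrad_Student": [(2000, 2004)],
}
subclasses = {"Grad_Student": "Student", "Undergrad_Student": "Student"}

print(infer_superclass_intervals(facts_for_x, subclasses))
```

Under this merge rule the platoon intervals [1, 10] and [11, 20] stay separate, because they do not share a time point; whether adjacent intervals should be coalesced depends on the time model chosen.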
ORDBMS Implementation: DB Structures
Unlike thematic relationships which are explicitly stated in
the RDF graph, many spatial and temporal relationships
(e.g., distance) are implicit and require additional
computation
Sample STT Query
Scenario (Biochemical Threat Detection): Analysts must examine
soldiers’ symptoms to detect possible biochemical attack
Query specifies
(1) a relationship between a soldier, a chemical agent and a battle
location (graph pattern 1)
(2) a relationship between members of an enemy organization and their
known locations (graph pattern 2)
(3) a spatial filtering condition based on the proximity of the soldier and
the enemy group in this context (spatial Constraint)
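A small in-memory sketch of the query shape described above: two graph patterns evaluated over triples, joined by a spatial proximity filter. Everything concrete here is an assumption made for illustration, including the predicates `shows_symptom_of` and `member_of`, the entity names, the planar coordinates, and the distance threshold; the actual work evaluates such queries inside an extended ORDBMS.

```python
import math

# Hypothetical triples; participates_in, occurred_at, and located_at echo the
# ontology slides, the other predicates are invented for this sketch.
TRIPLES = [
    ("soldier1", "participates_in", "battle1"),
    ("battle1", "occurred_at", "region1"),
    ("soldier1", "shows_symptom_of", "agentX"),
    ("enemy1", "member_of", "groupZ"),
    ("enemy1", "located_at", "region2"),
    ("enemy2", "member_of", "groupZ"),
    ("enemy2", "located_at", "region3"),
]

# Hypothetical planar coordinates for each spatial region.
COORDS = {"region1": (0.0, 0.0), "region2": (3.0, 4.0), "region3": (30.0, 40.0)}


def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def soldier_agent_locations():
    """Graph pattern 1: soldier, chemical agent, and battle location."""
    results = []
    for soldier, predicate, agent in TRIPLES:
        if predicate != "shows_symptom_of":
            continue
        for battle in objects(soldier, "participates_in"):
            for region in objects(battle, "occurred_at"):
                results.append((soldier, agent, region))
    return results


def enemy_locations(group):
    """Graph pattern 2: members of an organization and their known locations."""
    return [(member, region)
            for member, predicate, org in TRIPLES
            if predicate == "member_of" and org == group
            for region in objects(member, "located_at")]


def suspicious_matches(group, max_distance):
    """Join both patterns, keeping pairs that satisfy the spatial constraint."""
    return [(soldier, agent, enemy)
            for soldier, agent, region in soldier_agent_locations()
            for enemy, enemy_region in enemy_locations(group)
            if math.dist(COORDS[region], COORDS[enemy_region]) <= max_distance]


print(suspicious_matches("groupZ", 10.0))
```

The thematic patterns are answered from explicit triples, while the proximity test has to be computed from geometry, which is the distinction the previous slide draws between explicit and implicit relationships.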
The Machine Factor
Formal representation of knowledge
– RDF(S), OWL, etc.
Statistical analysis
– Similarity
– Cooccurrence
– Clustering
Intelligent aggregation of knowledge
– Collaboration/Problem Solving Environments
– Decision support tools
Putting the man back in Semantics
The Semantic Web focuses on artificial agents
“Web 2.0 is made of people” (Ross Mayfield)
“Web 2.0 is about systems that harness collective
intelligence.” (Tim O’Reilly)
The relationship web combines the skills of humans and
machines
Going places …
Formal
Powerful
Implicit
Social,
Informal