Compass - uni

Compass
Semantic search
www.ovitas.no
Basics
Knowledge-model-based information retrieval
Full-text search enhanced with Topic Maps = semantic search
Search-driven navigation
12.10.2006
TMRA '06

Search technologies
[Figure: search technologies plotted by level of precision ("intelligence") against data volume (domain size). Labels: Full-text search, Conceptual search, Semantic search, Compass.]

Given...
a web site with a lot of text,
which is unstructured (no markup, no tags),
a controlled domain (we know what the discourse domain is), and
an inadequate search engine...

We would like to...
get relevant hits within a meaningful context,
spare the work of structuring our data,
add semantics to the content by defining a knowledge model.

Compass-bowl:
Take a fulltext search engine.
Take a Topic Maps engine.
Add a hint of semantics.
Define the correct processes for orchestrating the components.
Mix them thoroughly.
Serve to public!

Full text search engine
Apache Lucene (open source)
Possible to index most file formats
    html, asp, php, jsp, pdf, rtf, txt, doc, ppt, xls, pst…
The index is independent of the model
    No need to re-index when changes are made to the model
Small index size
    typically less than 10% of the size of the data
Fast index lookup
    less than 20 ms for index size >20000

The knowledge model
Based on the ISO International Standard for Topic Maps
Semantic model of the discourse domain
Concept words = topic names/synonyms
Semantic relationships through associations
Compass Weight defines "closeness" between topics
    property on association types

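Example
[Figure: topic map fragment with the topics Ovitas, Compass and Christopher, connected by associations of type hasProduct and hasEmployee. Labels on the diagram: CW=0.8, CW=0.7, and two "type" links.]
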
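The knowledge model and the example can be sketched as a small data structure. This is an illustration only, not the TMCore API: the class and method names are hypothetical, the synonym "Chris" is invented for the example, and the pairing of CW=0.8 with hasProduct and CW=0.7 with hasEmployee is an assumption read off the order of the labels in the diagram.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    name: str
    synonyms: set[str] = field(default_factory=set)

    def matches(self, term: str) -> bool:
        # Concept words are the topic name plus its synonyms (case-insensitive).
        wanted = term.casefold()
        return wanted in {self.name.casefold(), *(s.casefold() for s in self.synonyms)}


@dataclass
class TopicMap:
    topics: dict[str, Topic] = field(default_factory=dict)
    # Compass Weight is a property on association types, not on single associations.
    compass_weight: dict[str, float] = field(default_factory=dict)
    associations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_topic(self, name: str, *synonyms: str) -> None:
        self.topics[name] = Topic(name, set(synonyms))

    def associate(self, assoc_type: str, a: str, b: str) -> None:
        self.associations.append((assoc_type, a, b))

    def find(self, term: str) -> Optional[Topic]:
        return next((t for t in self.topics.values() if t.matches(term)), None)

    def neighbours(self, name: str) -> list[tuple[str, float]]:
        # Associations are followed in both directions.
        result = []
        for assoc_type, a, b in self.associations:
            if name in (a, b):
                other = b if name == a else a
                result.append((other, self.compass_weight[assoc_type]))
        return result


# Assumed pairing of weights and association types (see lead-in).
tm = TopicMap(compass_weight={"hasProduct": 0.8, "hasEmployee": 0.7})
tm.add_topic("Ovitas")
tm.add_topic("Compass")
tm.add_topic("Christopher", "Chris")  # "Chris" is an invented synonym
tm.associate("hasProduct", "Ovitas", "Compass")
tm.associate("hasEmployee", "Ovitas", "Christopher")
```

Keeping the weight on the association type means the "closeness" of a whole class of relationships can be tuned in one place.
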
Compass orchestrator
Guides the processes of the search:
1. Search for the term in the topic map
2. Expand the map for relevant/related topics
3. Send all these terms off to a fulltext search
4. Calculate relevance (based on the combination of CW and Lucene weights) and prepare the result list as an XML instance
5. Render XML as wished

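The five steps can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the Compass implementation: the topic graph, documents and URLs are invented, the word-frequency score is only a stand-in for Lucene's scoring, and multiplying closeness by the text score is one possible way to combine the two weights (the slides do not give the formula). The XML element names are also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical weighted topic graph: topic -> [(related topic, Compass Weight)].
TOPIC_GRAPH = {
    "Ovitas": [("Compass", 0.8), ("Christopher", 0.7)],
    "Compass": [("Ovitas", 0.8)],
    "Christopher": [("Ovitas", 0.7)],
}

# Hypothetical documents keyed by URL.
DOCS = {
    "http://example.org/a": "Compass is a semantic search product. Compass uses Topic Maps.",
    "http://example.org/b": "Christopher presented at the conference.",
    "http://example.org/c": "Unrelated text about the weather.",
}


def expand(term, hops):
    """Steps 1-2: find the term in the map, then collect related topics."""
    if term not in TOPIC_GRAPH:
        return {}
    closeness = {term: 1.0}
    frontier = [term]
    for _ in range(hops):
        next_frontier = []
        for topic in frontier:
            for neighbour, cw in TOPIC_GRAPH.get(topic, []):
                score = closeness[topic] * cw  # weights multiply along a path (assumption)
                if score > closeness.get(neighbour, 0.0):
                    closeness[neighbour] = score
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return closeness


def fulltext_score(term, text):
    """Step 3 stand-in for the full-text engine: share of words equal to the term."""
    words = [w.strip(".,").casefold() for w in text.split()]
    return words.count(term.casefold()) / len(words) if words else 0.0


def search(term, hops=1):
    """Steps 3-4: query each expanded term, combine the weights, build the XML."""
    root = ET.Element("results", query=term)
    ranked_topics = sorted(expand(term, hops).items(), key=lambda kv: -kv[1])
    for topic, closeness in ranked_topics:
        group = ET.SubElement(root, "topic", name=topic, closeness=f"{closeness:.2f}")
        hits = []
        for url, text in DOCS.items():
            score = fulltext_score(topic, text)
            if score > 0:
                hits.append((closeness * score, url))
        for relevance, url in sorted(hits, reverse=True):
            ET.SubElement(group, "hit", url=url, relevance=f"{relevance:.3f}")
    return root


# Step 5: render the XML as wished; here it is simply serialised to a string.
xml_text = ET.tostring(search("Ovitas"), encoding="unicode")
```

In this sketch a search for "Ovitas" finds no document containing the word itself, but still returns hits grouped under the related topics Compass and Christopher.
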
[Screenshots: Compass search interface, with the following callouts]
Search term
Hits in the fulltext grouped by the related topics
Topic Map expansion
Relevant documents ranked by the weighting result
Search term in the topic map, but not in the text
Relevant information about "Chris Searle"
Synonym search

Creating/maintaining the model
An MS Excel plug-in serves as the topic map editor
The model can be put under version control
Import the model into the topic map engine: one click only
For complex topic maps a custom user interface can be used to enter instance data

Navigation
Navigation through the associations between topics
Navigation by search

User configurations




What pages to index
What topic map to use
The number of hops to perform
The threshold for relevance
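
The four settings above could be captured in a small configuration object. This is a hypothetical sketch: the field names, the default values, the file name model.xtm and the assumption that relevance lies between 0 and 1 are not taken from Compass.

```python
from dataclasses import dataclass, field


@dataclass
class CompassConfig:
    index_urls: list = field(default_factory=list)  # what pages to index
    topic_map: str = "model.xtm"                     # what topic map to use (hypothetical name)
    max_hops: int = 1                                # the number of hops to perform
    relevance_threshold: float = 0.1                 # the threshold for relevance

    def __post_init__(self):
        if self.max_hops < 0:
            raise ValueError("max_hops must not be negative")
        if not 0.0 <= self.relevance_threshold <= 1.0:
            raise ValueError("relevance_threshold must lie in [0, 1]")


def apply_threshold(hits, config):
    """Keep only (url, relevance) pairs at or above the configured threshold."""
    return [(url, rel) for url, rel in hits if rel >= config.relevance_threshold]


config = CompassConfig(index_urls=["http://example.org/"], max_hops=2, relevance_threshold=0.15)
kept = apply_threshold([("http://example.org/a", 0.16), ("http://example.org/b", 0.14)], config)
```

A higher hop count widens the set of related topics, while a higher threshold trims weakly related hits from the result list.
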

Content lifecycle management
Easy to integrate with content repositories
A content management or publishing system can send a request to the indexer to re-index a particular resource
Incremental indexing: add, update or delete documents
HTTP is used as the basic mechanism to address content

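Incremental indexing can be illustrated with a toy in-memory inverted index keyed by URL, since content is addressed over HTTP. This is a sketch of the idea only: the class and method names are hypothetical, and a real deployment would delegate this work to the full-text engine rather than to a Python dictionary.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that supports add, update and delete per URL."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> URLs containing it
        self._doc_terms = {}               # URL -> terms currently indexed for it

    @staticmethod
    def _tokenize(text):
        return {w.strip(".,;:!?").casefold() for w in text.split()} - {""}

    def delete(self, url):
        # Remove every posting of this document; unknown URLs are ignored.
        for term in self._doc_terms.pop(url, set()):
            self._postings[term].discard(url)
            if not self._postings[term]:
                del self._postings[term]

    def add_or_update(self, url, text):
        # Re-indexing one resource never touches the other documents.
        self.delete(url)
        terms = self._tokenize(text)
        self._doc_terms[url] = terms
        for term in terms:
            self._postings[term].add(url)

    def lookup(self, term):
        return set(self._postings.get(term.casefold(), set()))


index = IncrementalIndex()
index.add_or_update("http://example.org/a", "Compass uses Topic Maps.")
index.add_or_update("http://example.org/b", "Lucene builds the index.")
index.add_or_update("http://example.org/a", "Compass uses Lucene.")  # update
index.delete("http://example.org/b")                                  # delete
```

After these calls only the updated text of the first URL remains searchable, which is the behaviour a publishing system would expect when it asks for a single resource to be re-indexed.
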
Architecture
SOA (service-oriented architecture), no dependency on platform or components
Web service interface (HTTP/REST)
.NET platform
Integrated components:
    TMCore Topic Maps engine by NetworkedPlanet
    Apache Lucene: full text engine

Architecture diagram
[Figure: architecture diagram. Labelled components: TM Nav, TM Core, Excel Editor, TM editor (person), Full Text, Compass Service, Publishing System, Services, User.]

Compass - Summary
Semantic search based on Topic Maps
Search in any document format
Organize information in a topic-oriented manner
Link to relevant information without touching the data content
Conceptual navigation by Topic Maps
Tools for maintaining/evolving the classification
Fast and easy implementation