WP1: Integration of language resources in eLearning


Crosslingual
Ontology-Based
Document Retrieval (Search)
in an eLearning Environment
RANLP, Borovets, 2007
Eelco Mossel
University of Hamburg
Framework
• EU-Project LT4eL: Language Technology for eLearning
(www.lt4el.eu)
• Goal: use of Language Technology to improve the
effectiveness of Learning Management Systems
• Multilingual Setting: 8 languages
• 12 European partner universities/institutes
• Crosslingual search: joint work with:
– Cristina Vertan, Stefanie Reimers (University of Hamburg)
– Kiril Simov and his team (Bulgarian Academy of Sciences, Sofia)
– Alex Killing (ETH Zürich (Eidgenössische Technische Hochschule))
Overview
• Goals of semantic search
• Resources for search function
• Functionality and architecture
• Further work
Crosslingual semantic search
Goals of the approach
1. Improved retrieval of documents
– Find documents that a simple text search would miss
(simple search requires the exact search word to occur in the text)
– Example: a search for “screen” retrieves a document that
contains “monitor” but not “screen”.
2. Multilinguality
– One implementation for all languages in the project
3. Crosslinguality
– Find documents in languages different from
search/interface language
• No need to translate search query
• Search possible with passive foreign language knowledge
Overview of resources
• A multilingual document collection
• An ontology, including a domain ontology that covers the
subject area of the documents
• Concept lexicalisations in different languages
• Annotation of concepts in the documents
Overview of resources (graphical)
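[Figure: term-concept lexicons for BG, CS, DE, EN, MT, NL, PL, PT and RO link to a shared ontology, which in turn links to the learning objects (LOs) in BG, CS, DE, EN, MT, NL, PL, PT and RO.]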
Search procedure
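[Figure: search terms (in multiple languages) are looked up in the lexicons, which contain term-concept mappings, to obtain search concepts; the ontology (which contains the concepts) and the document database are then used to produce the retrieved documents and a visualisation.]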
Search with ILIAS
Internal components
Search functionality comprises:
1. Find terms in the lexicons that match the search
query.
2. Find the corresponding concepts for the terms
found.
3. Find relevant documents for the concepts.
4. Create a ranking for the set of found documents.
5. Create an ontology fragment containing the
information needed to present the concept
neighbourhood.
6. Find “shared concepts”.
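Steps 1 to 4 can be sketched end to end as follows. This is a minimal illustration, not the LT4eL implementation: the lexicon entries, concept identifiers, document names and the ranking rule are all invented for the example, and steps 5 and 6 are left out.

```python
from collections import Counter

# Toy data, invented for illustration only.
LEXICON = {  # term -> concepts it lexicalises
    "screen": {"concept:Display"},
    "monitor": {"concept:Display"},
    "bildschirm": {"concept:Display"},
    "editor": {"concept:TextEditor"},
}
ANNOTATIONS = {  # document -> how often each concept is annotated in it
    "doc_en_1": Counter({"concept:Display": 3}),
    "doc_de_1": Counter({"concept:Display": 1, "concept:TextEditor": 2}),
    "doc_en_2": Counter({"concept:TextEditor": 1}),
}


def find_terms(query):
    """Step 1: keep the query words that occur in the lexicon."""
    return [w for w in query.lower().split() if w in LEXICON]


def find_concepts(terms):
    """Step 2: collect the concepts that the terms lexicalise."""
    concepts = set()
    for term in terms:
        concepts |= LEXICON[term]
    return concepts


def find_documents(concepts):
    """Step 3: documents annotated with at least one search concept."""
    return [d for d, ann in ANNOTATIONS.items() if any(c in ann for c in concepts)]


def rank(documents, concepts):
    """Step 4: more distinct search concepts first, then more annotations."""
    def key(doc):
        ann = ANNOTATIONS[doc]
        distinct = sum(1 for c in concepts if c in ann)
        total = sum(ann[c] for c in concepts)
        return (-distinct, -total, doc)
    return sorted(documents, key=key)


def search(query):
    concepts = find_concepts(find_terms(query))
    return rank(find_documents(concepts), concepts)
```

Because retrieval goes through concepts rather than strings, `search("screen")` also returns the German document annotated with the same concept, without the query being translated.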
Architecture
[Figure: architecture diagram with the labels: LMS / ILIAS / other system using the search functionality; Crosslingual Search; Lexicon Lookup Component; Ontology Management System; Lexicon; Ontology; Ontology Search Engine; Lucene; Database.]
1: Query → Terms
• Why start with a free text query?
– User wants results fast (as in Google)
– Compete with fulltext search and keyword search
– Find starting point for ontology browsing
• Query → lexicon: adopted/implemented
matching strategies:
– Case- and diacritic-insensitive matching
– Create combinations for multiword terms
Example: Text Editor →
• text-editor
• texteditor
• text editor
• text
• editor
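The two strategies above can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, and for queries longer than two words it only joins all words at once rather than every sub-combination.

```python
import unicodedata


def normalise(text):
    """Strip diacritics and fold case, e.g. 'Zürich' becomes 'zurich'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def candidate_terms(query):
    """Build lexicon lookup candidates for a (possibly multiword) query."""
    words = normalise(query).split()
    candidates = []
    if len(words) > 1:
        # Hyphenated, concatenated and space-separated variants of the whole query.
        for separator in ("-", "", " "):
            candidates.append(separator.join(words))
    # The individual words are tried as well.
    candidates.extend(words)
    # Remove duplicates while keeping the order.
    return list(dict.fromkeys(candidates))
```

For the query "Text Editor" this yields the five candidates listed on the slide, in the same order.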
1: Query → Terms (continued)
• Other ideas to improve recognition of query:
– Lemmatisation of search terms
– Expansion of lexicon with word forms
– Match substrings
– Match similar strings
• Insertion of function words
e.g. Portuguese: “provedor acesso” → “provedor de acesso”
– Dynamic list of available terms that contain input so
far (involves change of GUI)
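Three of these ideas (substring matching, similar-string matching, and tolerating missing function words) might look like the sketch below. The term list and function names are hypothetical, the similarity cutoff of 0.8 is an arbitrary choice, and the gap matcher accepts any inserted word, not only function words.

```python
import difflib

# Hypothetical lexicon entries, for illustration only.
TERMS = ["text editor", "provedor de acesso", "monitor", "keyboard", "key"]


def substring_matches(fragment, terms=TERMS):
    """Terms that contain the input typed so far (basis for a dynamic list)."""
    fragment = fragment.lower()
    return [t for t in terms if fragment in t.lower()]


def similar_matches(word, terms=TERMS, cutoff=0.8):
    """Terms whose spelling is close to the input, best match first."""
    return difflib.get_close_matches(word.lower(), terms, n=3, cutoff=cutoff)


def matches_with_gaps(query, term):
    """True if the query words occur in the term in order, with extra words allowed."""
    remaining = iter(term.lower().split())
    # 'word in remaining' consumes the iterator, so the word order is enforced.
    return all(word in remaining for word in query.lower().split())
```

With this, "provedor acesso" matches the lexicon entry "provedor de acesso", and a misspelling such as "moniter" still finds "monitor".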
2: Term → Concept
Not always 1:1 mapping.
• Corresponding concept is missing from the ontology
– In LT4eL: the term is then not in the lexicon
• Unique result: term is lexicalisation of one concept
• Multiple concepts from one domain, e.g.:
– Key (from keyboard)
– Key (in database)
• Concepts from several domains:
– Window (graphical representation on monitor)
– Window (part of a building)
• Different concepts for different languages:
– “Kind” (English: sort/type)
– “Kind” (German: child)
→ Let the user choose: present multiple browsing units
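The cases above (missing, unique, ambiguous) can be made explicit in a lookup that returns every candidate concept instead of picking one. This is a sketch with invented lexicon entries and concept identifiers; the real lexicon format is not shown in the slides.

```python
# Hypothetical entries: (language, term) -> list of (concept, gloss).
LEXICON = {
    ("en", "key"): [
        ("concept:KeyboardKey", "key on a keyboard"),
        ("concept:DatabaseKey", "key in a database"),
    ],
    ("en", "monitor"): [("concept:Monitor", "display device")],
    ("en", "kind"): [("concept:Type", "sort/type")],
    ("de", "kind"): [("concept:Child", "child")],
}


def lookup(term, languages=("en", "de")):
    """Collect all concepts the term lexicalises in the given languages."""
    term = term.lower()
    hits = []
    for language in languages:
        for concept, gloss in LEXICON.get((language, term), []):
            hits.append((language, concept, gloss))
    return hits


def resolve(term):
    """Classify the lookup result so the interface can react to it."""
    hits = lookup(term)
    if not hits:
        return "missing", hits
    if len(hits) == 1:
        return "unique", hits
    # Several candidates: present each one as a separate browsing unit.
    return "ambiguous", hits
```

Returning the language and a gloss with each candidate gives the interface enough information to let the user choose between, for example, the English and German readings of "Kind".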
3: Concept → Documents
• Simplest:
– Disjunctive search with ranking
• For each concept, each document that is annotated with it is
returned
• Documents with more search concepts are ranked higher
– DISADVANTAGES:
• (too) many results
• slower
• Use super/subconcepts
• Further possibilities
– Conjunctive search:
• Combination of concepts must occur in a document
• Is taken into account by current ranking
– DISADVANTAGES:
• For automatic concept search: concept set might be larger than
expected, thus restricting search results too much
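The difference between the two search modes can be shown on a toy annotation table. The document names and their annotations are invented; only the concept identifiers are taken from the slides.

```python
# Hypothetical annotations: document -> set of concepts annotated in it.
ANNOTATIONS = {
    "doc1": {"lt4el:Program", "lt4el:Software"},
    "doc2": {"lt4el:Program"},
    "doc3": {"lt4el:Website"},
}


def disjunctive(concepts):
    """Any search concept is enough; more matching concepts rank higher."""
    hits = {}
    for doc, annotated in ANNOTATIONS.items():
        matched = len(concepts & annotated)
        if matched:
            hits[doc] = matched
    return sorted(hits, key=lambda doc: (-hits[doc], doc))


def conjunctive(concepts):
    """Every search concept must be annotated in the document."""
    if not concepts:
        return []
    return sorted(doc for doc, annotated in ANNOTATIONS.items() if concepts <= annotated)
```

For the concept set {lt4el:Program, lt4el:Software}, the disjunctive search returns doc1 and doc2, while the conjunctive search returns only doc1. This also illustrates the stated disadvantage: each concept added automatically to the set shrinks the conjunctive result further.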
3: Concept → Documents (continued)
• How useful is it to find documents that treat a
superconcept?
– Negative example: lt4el:Subroutine → lt4el:Software.
– Positive example: lt4el:WebPortal → lt4el:Website.
• How useful is it to find documents that treat a
subconcept?
– lt4el:Program has 93 subconcepts, e.g.:
• ApplicationProgram
• Computervirus
• Driver
• Unzip
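Expanding a search concept with its superconcepts and subconcepts could be sketched as below. The hierarchy fragment uses the concept names from this slide, but treating each pair as a direct link and adding the lt4el: prefix to the subconcept names are assumptions made for the example.

```python
# Assumed fragment of the hierarchy: concept -> direct superconcept.
SUPER = {
    "lt4el:ApplicationProgram": "lt4el:Program",
    "lt4el:Computervirus": "lt4el:Program",
    "lt4el:Driver": "lt4el:Program",
    "lt4el:Unzip": "lt4el:Program",
    "lt4el:WebPortal": "lt4el:Website",
    "lt4el:Subroutine": "lt4el:Software",
}


def superconcepts(concept):
    """All ancestors of a concept, nearest first."""
    chain = []
    while concept in SUPER:
        concept = SUPER[concept]
        chain.append(concept)
    return chain


def subconcepts(concept):
    """All descendants of a concept, found by walking the hierarchy downwards."""
    result = []
    for child, parent in SUPER.items():
        if parent == concept:
            result.append(child)
            result.extend(subconcepts(child))
    return result
```

A search for lt4el:Program expanded with `subconcepts` pulls in documents about viruses, drivers and unzip tools alike, which is why the usefulness of the expansion depends on the concept.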
4: Ranking
• Number of different search concepts
• Annotation frequency: number of times search
concepts are annotated in the document
– Normalise: divide by document length
• Superconcepts and subconcepts of search
concepts have lower weight
– A factor determines their weight
• Language of document:
– Sort per language? (current behaviour)
– Sort by ranking across all languages (independent of
language)?
– Make language a factor in ranking?
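One way to combine these factors is sketched below. The slides name the factors but not how they are combined, so the lexicographic order (distinct concepts first, then normalised frequency), the weight of 0.5 for related concepts, and the sample documents are all assumptions.

```python
def score(counts, length, search_concepts, related_concepts, related_weight=0.5):
    """Return (number of distinct search concepts, weighted normalised frequency)."""
    distinct = sum(1 for c in search_concepts if counts.get(c, 0) > 0)
    direct = sum(counts.get(c, 0) for c in search_concepts)
    related = sum(counts.get(c, 0) for c in related_concepts)
    # Super/subconcepts count with a lower weight; dividing by the document
    # length keeps long documents from winning on size alone.
    weighted = (direct + related_weight * related) / length
    return distinct, weighted


def rank(docs, search_concepts, related_concepts, per_language=True):
    """Order documents by score, optionally grouped per language."""
    def key(doc):
        distinct, weighted = score(
            doc["counts"], doc["length"], search_concepts, related_concepts)
        language = doc["lang"] if per_language else ""
        return (language, -distinct, -weighted)
    return [doc["id"] for doc in sorted(docs, key=key)]


# Invented sample documents.
DOCS = [
    {"id": "en1", "lang": "en", "length": 200, "counts": {"lt4el:Website": 4}},
    {"id": "de1", "lang": "de", "length": 100, "counts": {"lt4el:Website": 1}},
    {"id": "en2", "lang": "en", "length": 200, "counts": {"lt4el:WebPortal": 4}},
]
```

Sorting per language puts the German document first simply because "de" sorts before "en"; with `per_language=False` the higher-scoring English document leads, which is the trade-off raised in the last bullet.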
Evaluation
• Does semantic search return correct results?
(appropriate documents)
• How easy is it to use semantic search?
• Are the results better (precision/recall) than
with keyword search or fulltext search (also
available in ILIAS)?
– Relevant for monolingual scenario
• Is the learning process improved?
– Depends on quality of ontology and annotation
– In multilingual case: depends on domain knowledge
and language knowledge of multilingual test persons
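For the comparison with keyword and fulltext search, precision and recall can be computed per query from the retrieved documents and a set of documents judged relevant. The sketch below uses the standard definitions; the function name and the convention of returning 0.0 for an empty set are choices made for the example.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one result list against relevance judgements."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Running the same queries through semantic, keyword and fulltext search and comparing these two numbers gives the monolingual comparison named above.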
Future work
• Display document fragment for search results,
in addition to title.
– Choose contexts, where search concepts occur close
together
– More on this Thursday 18:30 at BIS-21++ information
session.
• Integrate faster document lookup component
• Improve: search term → lexicon entry
• Make use of more relations than
super/subconcepts
• Possibly other changes like:
– Sort differently than per language
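For the first item (choosing contexts where search concepts occur close together), one possible approach is to look for the shortest span of the document that contains every search concept at least once. The sketch below assumes that the token positions of the concept annotations are available; the function name and input format are hypothetical.

```python
def smallest_window(positions_by_concept):
    """Shortest (start, end) token span containing every concept, or None.

    positions_by_concept maps each search concept to the token positions
    at which it is annotated in one document.
    """
    if not positions_by_concept or any(not p for p in positions_by_concept.values()):
        return None
    # All annotations in document order.
    events = sorted((pos, c) for c, ps in positions_by_concept.items() for pos in ps)
    needed = len(positions_by_concept)
    counts = {}
    best = None
    left = 0
    for pos, concept in events:
        counts[concept] = counts.get(concept, 0) + 1
        # While the window holds every concept, record it and shrink from the left.
        while len(counts) == needed:
            start = events[left][0]
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)
            dropped = events[left][1]
            counts[dropped] -= 1
            if counts[dropped] == 0:
                del counts[dropped]
            left += 1
    return best
```

The returned span could then be widened by a few tokens on each side and shown as the document fragment next to the title.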
Thank you