Some LIRICS topics
Peter Wittenburg, Marc Kemps-Snijders
MPI for Psycholinguistics
1 hour ?
1
LMF Topics
• is this LIRICS?
• what is LMF compliance?
• some ideas about header and related standards (UNICODE)
• docking mechanism of components
• relation mechanism
• operation mechanism
• this is LIRICS for MPI and Sheffield primarily
• DCR Syntax API
• LMF API
• lexicon and component registries incl metadata
2
DCR API 1
• tools such as GATE, LEXUS, ANNEX, … have to make use of
the ISO DCR (and probably other ontologies/concept registries)
• so we need an API (that can also be re-used)
• the API ultimately has to be delivered as a web service, including all aspects
• UDDI layer to search/browse for services
• WSDL layer to describe the interface (methods, …)
• SOAP layer for message exchange
• SYNTAX ISO DCR was not set up as a service, but as a management tool for editing boards
• therefore a split into the final API (the ideal) and a first-phase API
• in the first phase: no web service; URL and details of the services are known
• all that follows is the result of a smooth interaction with the LORIA folks
3
DCR API 2
• function List loadProfiles ()
• give me all profiles in DCR
• function List loadDataCategories (aProfile, aWorkingLanguage, aObjectLanguage)
• give me all datcats for a certain profile; result is perhaps a structure
• function List searchDataCategories (aQueryString, aProfile, aWorkingLanguage,
aObjectLanguage)
• search for a datcat by specifying some pattern – mostly a name
• function DataCategory loadDataCategoryReduced (URID, aWorkingLanguage,
aObjectLanguage)
• give me some info for a certain datcat (ID, definition, conceptual domain)
• function DataCategory loadDataCategory (URID, aWorkingLanguage,
aObjectLanguage)
• give me all info for a certain datcat for specified languages
• function List loadAllTopBroaderGenericConcepts ()
• give me all top conceptual domains
• function List giveLinks (aDataCategory)
• give me additional information such as constraints (to be worked out!)
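A minimal Python sketch of how a client might see these calls. The class `InMemoryDCR`, the field names and the two sample datcats are assumptions made for illustration; the working-language and object-language parameters are left out to keep the sketch short, and a real client would call the DCR service instead of a local dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DataCategory:
    urid: str                      # identifier of the datcat
    name: str
    definition: str
    conceptual_domain: list = field(default_factory=list)
    profiles: list = field(default_factory=list)


class InMemoryDCR:
    """Hypothetical stand-in for the DCR, backed by a local dictionary."""

    def __init__(self, categories):
        self._by_urid = {c.urid: c for c in categories}

    def load_profiles(self):
        # All profiles used by any datcat, sorted for stable output.
        return sorted({p for c in self._by_urid.values() for p in c.profiles})

    def load_data_categories(self, profile):
        return [c for c in self._by_urid.values() if profile in c.profiles]

    def search_data_categories(self, query, profile):
        # Case-insensitive substring match on the datcat name.
        q = query.lower()
        return [c for c in self.load_data_categories(profile)
                if q in c.name.lower()]

    def load_data_category_reduced(self, urid):
        # Reduced view: ID, definition and conceptual domain only.
        c = self._by_urid[urid]
        return {"urid": c.urid, "definition": c.definition,
                "conceptual_domain": list(c.conceptual_domain)}


# Made-up sample content, not taken from the ISO DCR.
dcr = InMemoryDCR([
    DataCategory("dc-1", "partOfSpeech", "category of a word",
                 ["noun", "verb"], ["lexicon"]),
    DataCategory("dc-2", "gender", "grammatical gender",
                 ["masculine", "feminine"], ["lexicon", "morphosyntax"]),
])
```

For example, `dcr.search_data_categories("gend", "lexicon")` would return the single datcat named `gender`.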
4
DCR API 3
• function DataCategory loadBroaderGenericConceptDataCategory
(aDataCategory/URID)
• give me a broader concept for a name or URID
• function List loadDataCategoriesUsingBroaderGenericConcept (aDataCategory)
• give me all datcats for a broader concept
• function Workspace openWorkspace (aUserName, aUserLogin)
• open a private workspace for a user
• function DataCategory AddDataCategoryToWorkspace (aWorkspace,
aDataCategory, aStatus)
• add a datcat to a workspace
• log in to the system
• synchronize with a given cache
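The broader-concept and workspace calls could be sketched as follows. The `BROADER` table, the `Workspace` class and the status value are hypothetical and exist only for this illustration; no real authentication is performed.

```python
# Assumed toy hierarchy: datcat identifier -> its broader generic concept.
BROADER = {
    "commonNoun": "noun",
    "properNoun": "noun",
    "noun": "partOfSpeech",
}


def load_broader_generic_concept(urid):
    # Returns None when the datcat has no broader concept in the table.
    return BROADER.get(urid)


def load_data_categories_using_broader_generic_concept(urid):
    # All datcats whose broader generic concept is the given one.
    return sorted(child for child, parent in BROADER.items() if parent == urid)


class Workspace:
    """Hypothetical private workspace: datcat identifier -> status."""

    def __init__(self, user_name):
        self.user_name = user_name
        self.entries = {}

    def add_data_category(self, urid, status):
        self.entries[urid] = status


def open_workspace(user_name, user_login):
    # Placeholder: a real service would check the login before opening.
    return Workspace(user_name)


workspace = open_workspace("peter", "secret")
workspace.add_data_category("commonNoun", "draft")
```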
5
LMF Registries General
• an LMF API only makes sense if you have a service
• a service only makes sense if you can serve something
• what can LMF services serve:
• lexical schemas
• extensions i.e. ready-made components created by someone
• LMF compliant lexica created by someone
• other lexicon-related information
• so we need registration services
• MPI will start doing so since we need it now
• will set things up similar to IMDI
• all is open (registries and portal code)
• everyone can easily set up his/her own portal
• we will, and have to, synchronize on various things
• perhaps people will like it
6
LMF Registries
• registries must provide the following services
• register and store a lexical schema
• register and store a lexical component schema
• register (and store) a lexicon (storage can be everywhere)
• delete an entry
• modify an entry
• let the user browse or search for lexica (metadata based)
• let the user browse or search for schemas (metadata based)
• as a starting point for the metadata we suggest the results of the discussions in ISLE/MILE (see IMDI)
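The registry services listed above can be sketched as a small in-memory class. The class name, the identifier scheme and the metadata fields are assumptions for illustration only; they are not IMDI or ISLE/MILE element names.

```python
import itertools


class Registry:
    """Hypothetical metadata-based registry for lexica or schemas."""

    def __init__(self):
        self._entries = {}
        self._ids = itertools.count(1)

    def register(self, metadata, location):
        # Storage can be anywhere, so only the location is recorded.
        entry_id = f"reg-{next(self._ids)}"
        self._entries[entry_id] = {"metadata": dict(metadata),
                                   "location": location}
        return entry_id

    def modify(self, entry_id, **changes):
        self._entries[entry_id]["metadata"].update(changes)

    def delete(self, entry_id):
        del self._entries[entry_id]

    def search(self, **criteria):
        # An entry matches when every given metadata field has the given value.
        return [entry_id for entry_id, entry in self._entries.items()
                if all(entry["metadata"].get(key) == value
                       for key, value in criteria.items())]


registry = Registry()
dutch = registry.register({"language": "Dutch", "kind": "lexicon"},
                          "https://example.org/lexica/dutch.xml")
schema = registry.register({"language": "Dutch", "kind": "schema"},
                           "https://example.org/schemas/dutch.xsd")
```

Browsing would be a search without criteria: `registry.search()` returns every registered identifier.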
7
LMF API 1
• if we have found a lexicon what then …
• services based on web-services
• UDDI level – why not the ISLE/MILE stuff
• function LexicalDatabase createLexicalDatabase (Name)
create a lexicon in a workspace
• function LexicalDatabase loadLexicalDatabase (URID)
give me a certain lexicon (into workspace or local?)
• function LexicalDatabase loadLexicalDatabaseDetail (URID, aStructure)
give me a certain lexicon part (filtering into workspace or local?)
• function void storeLexicalDatabase (LexicalDatabase)
store/upload a lexical database
• function LexicalEntry createLexicalEntry (URID, LexicalDatabase)
create/add a lexical entry
• function LexicalEntry loadLexicalEntry (URID)
load a lexical entry
• function void storeLexicalEntry (URID, LexicalEntry)
update a lexical entry
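A rough sketch of the create/load/store cycle for databases and entries. `LexiconStore` is a hypothetical in-memory back end, the database name doubles as its identifier, and the question of workspace versus local storage raised on the slide is left open.

```python
class LexicalDatabase:
    def __init__(self, name):
        self.name = name
        self.entry_ids = []  # identifiers of the entries in this lexicon


class LexiconStore:
    """Hypothetical in-memory back end; plain strings stand in for URIDs."""

    def __init__(self):
        self._databases = {}
        self._entries = {}

    def create_lexical_database(self, name):
        database = LexicalDatabase(name)
        self._databases[name] = database  # assumption: name used as identifier
        return database

    def load_lexical_database(self, urid):
        return self._databases[urid]

    def create_lexical_entry(self, urid, database):
        self._entries[urid] = {}
        database.entry_ids.append(urid)
        return self._entries[urid]

    def load_lexical_entry(self, urid):
        # Return a copy so callers must store explicitly to update.
        return dict(self._entries[urid])

    def store_lexical_entry(self, urid, entry):
        self._entries[urid] = dict(entry)


store = LexiconStore()
lexicon = store.create_lexical_database("demo")
store.create_lexical_entry("entry-1", lexicon)
entry = store.load_lexical_entry("entry-1")
entry["lemma"] = "bank"
store.store_lexical_entry("entry-1", entry)
```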
8
LMF API 2
• function List searchLexicalEntries (aQuery)
unstructured search for lexical entries matching the query string
• function List searchLinguisticInformationUnits (aQuery, aStructure)
search for lexical entries matching the string on specific attributes
returning substructures (filtering)
In addition
• function LexicalDatabaseSchema loadSchema (URID)
give me a schema for a specific lexicon
• function void store (LexicalDatabaseSchema)
store and register a schema
• function GlobalInfo loadGlobalInfo (URID)
give me the metadata/globalInfo for a lexicon
• function void storeGlobalInfo (GlobalInfo)
store and register a lexicon with metadata
• function List searchLexicalDatabase (aQuery)
search for lexica based on metadata
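The difference between the two entry searches can be illustrated with a small sketch. The sample entries and the shape of the `structure` argument (attributes to match on, attributes to return) are assumptions; the slides do not define what `aStructure` looks like.

```python
# Made-up sample entries for illustration.
ENTRIES = [
    {"lemma": "bank", "pos": "noun", "gloss": "something broad to sit on"},
    {"lemma": "sit", "pos": "verb", "gloss": "to rest on a seat"},
]


def search_lexical_entries(entries, query):
    # Unstructured: match the query against every attribute value.
    q = query.lower()
    return [e for e in entries
            if any(q in str(value).lower() for value in e.values())]


def search_linguistic_information_units(entries, query, structure):
    # Structured: match only on selected attributes and return
    # only the requested substructure (filtering).
    q = query.lower()
    hits = []
    for e in entries:
        if any(q in str(e.get(attr, "")).lower()
               for attr in structure["match_on"]):
            hits.append({attr: e[attr]
                         for attr in structure["return"] if attr in e})
    return hits


everywhere = search_lexical_entries(ENTRIES, "sit")
lemma_only = search_linguistic_information_units(
    ENTRIES, "sit", {"match_on": ["lemma"], "return": ["lemma", "pos"]})
```

Here the unstructured search finds both entries, because "sit" also occurs in the gloss of "bank", while the structured search on the lemma finds only one.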
9
LMF API 3
• what about
• relations
• where to store relations – need registry mechanism
• if there is one integrated domain of lexica, relations can be registered
under this common root
• Gil: took UML – UML has everything in it
so also relations are in UML – so why bother
• Peter: where to register is the question
• these were just first ideas!!
• Monica/Thierry have built a tool for the constraints
10
What else: Relations
• actually component association is a relation of a special type
• need various types of relations between attributes and units in value strings
• each relation can be associated with features, i.e. relations can be seen as components in their own right
[Figure: example entries linked by relations. bank: breite Sitzgelegenheit (something broad to sit on); sitzgelegenheit: etwas um zu sitzen (something to sit on); schmal: gegenteil zu breit (contrary to broad)]
11
Relation Mechanism
• need a generalized relation mechanism (look at the Parole lexica etc.)
• prefer very simple graphics to UML, which hides the essentials
[Figure: relation V (type = any) with a "from" end and a "to" end, each with a cardinality, connecting component K and component L; the cardinality shown is 1..N]
• relation components are almost normal components, i.e. they can have components and datcats
• however, they don’t have a parent
• do we need the distinction between “to” and “from” in general relations?
• component reference is a special type of relation; here we need to distinguish “to” and “from”
• a few additional things were added in the paper (direction)
[Figure: relation U (type = refine) with "from" and "to" ends between component X and component Y; the cardinalities shown are 1..1 and 1..N]
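One possible way to write such a relation down as data is sketched below. The class and field names are assumptions, and the assignment of the cardinalities 1..1 and 1..N to the two ends is illustrative, since the diagram does not survive in this transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    parent: "Component | None" = None
    datcats: dict = field(default_factory=dict)


@dataclass
class Relation:
    # Almost a normal component: it can carry datcats,
    # but it deliberately has no parent field.
    name: str
    type: str
    source: Component            # the "from" end
    target: Component            # the "to" end
    source_cardinality: str
    target_cardinality: str
    datcats: dict = field(default_factory=dict)


def allows(cardinality, count):
    # Check a count against a cardinality string such as "1..N" or "1..1".
    low, high = cardinality.split("..")
    return int(low) <= count and (high == "N" or count <= int(high))


x = Component("component X")
y = Component("component Y")
u = Relation("relation U", "refine", x, y, "1..1", "1..N")
```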
12
What else: conditions (operations)
just one example from DOBES
[Figure: the value of lexemtype selects the structure of the entry. If lexemtype = “stem | idiom | lexical word”: head, outer-body-L, sense nr, meaning, etc. If lexemtype = “auxil | inflect affix”: sense nr, meaning effect, categorial effect, etc.]
• there are probably better examples around
if value(X) then modify constraints(Y)
etc
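The rule "if value(X) then modify constraints(Y)" could be sketched as a small rule table. The grouping of fields per lexemtype is one reading of the DOBES example and should be treated as an assumption, as are the function names.

```python
# Each rule: (attribute X, triggering values, fields that become required).
RULES = [
    ("lexemtype", {"stem", "idiom", "lexical word"},
     ["head", "sense nr", "meaning"]),
    ("lexemtype", {"auxil", "inflect affix"},
     ["sense nr", "meaning effect", "categorial effect"]),
]


def required_fields(entry, rules=RULES):
    # Collect the constraints switched on by the values in this entry.
    required = []
    for attribute, values, fields in rules:
        if entry.get(attribute) in values:
            required.extend(fields)
    return required


def missing_fields(entry, rules=RULES):
    return [name for name in required_fields(entry, rules)
            if name not in entry]


stem_entry = {"lexemtype": "stem", "head": "bank", "sense nr": 1}
affix_entry = {"lexemtype": "inflect affix", "sense nr": 1,
               "meaning effect": "plural", "categorial effect": "none"}
```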
13
Operation Mechanism
• well – nothing special perhaps (operators as datcats – see Gil)
• but need sequence of operations
• but we need to be able to add complex operations (code);
we then need an invocation and interfacing standard
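A sequence of operations with pluggable code could look roughly like this. The registry, the decorator and the two sample operations are hypothetical; the only interfacing convention assumed here is that every operation takes one value and returns one value.

```python
OPERATIONS = {}


def operation(name):
    # Register a function under a name so that sequences can refer to it.
    def register(func):
        OPERATIONS[name] = func
        return func
    return register


@operation("strip")
def strip(value):
    return value.strip()


@operation("lowercase")
def lowercase(value):
    return value.lower()


def run_sequence(names, value):
    # Apply the named operations in order, feeding each result to the next.
    for name in names:
        value = OPERATIONS[name](value)
    return value


result = run_sequence(["strip", "lowercase"], "  Bank ")
```

New complex operations are added by registering another function; the sequence itself stays a plain list of names.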
14
LEXUS etc
• we need to go ahead since we have to deliver usable infrastructures
• so we hope for critical comments and fast convergence
• relation mechanism is next on our action list
• LMF API relevant for us since we have to combine LEXUS and LAMUS
• LAMUS = Language Archive Management and Upload System
is ready and working for simple objects (annotated media, …)
• but need to handle complex objects such as lexica
• metadata is done – people can integrate and search for lexica
• registries for schemas and sub-schemas come next as well
15
LEXUS state
• ISO DCR integrated – Shoebox MDF as well, GOLD to come
• private and protected workspace is ready
• Shoebox/CHAT filters ready, XML grabbing to come
• first cross-lexicon search is ready
• working on private DCR (stripped but compliant Syntax)
• working on Concept Profiles (bottom up generated concept lists)
• working on tools to link bottom up stuff with ISO, …
• working on easy mapping framework
• first interaction with corpora is ready
• first merging is integrated
• what else??
16
Logging onto the application
Users must authenticate before logging on to the application.
Workshop ‘Lexical Databases and digital tools’, Nijmegen, April 2004
17
User workspace
Each user has his/her own personal workspace
where private lexica are stored
18
Lexicon creation
New lexica may be created…
19
Lexicon import
New lexica may be imported from a lexical resource…
20
Lexicon structure
The LMF core model can be identified in this simple structure.
Components and data categories can be identified by their different icons.
All may be dynamically created or modified.
21
Lexicon structure
Representation of a more complex structure. By selecting a node in the
tree, the content of a component or data category is shown and may
be modified.
22
Data category selection
Data categories can easily be selected from data category registries.
23
Lexical entry overview
Overview of lexical entries. By selecting a lexical entry the details
will be revealed.
24
Lexical entry details
Details of a lexical entry. Entry structure modifications are bound to
schema definition, e.g. cardinality.
25
Lexical entry details
Attribute values can be easily modified. Various value types are
supported (text, video, audio, image or file).
26
Lexical entry details
Example of uploading a video file.
27
Lexical entry details
Viewing multimedia content.
28
Alternative entry view
Alternative views are provided which may be customized in look and feel.
29
Synchronization of lexica
Personal Workspace
Main Lexicon
Lexica may be copied to and modified in the personal workspace.
30
Synchronization of lexica
Personal Workspace
Main Lexicon
Lexica may be synchronized with the main lexicon.
31
Synchronization of lexica
When synchronizing lexica the user is notified of structural changes and
is in total control of the synchronization process.
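A minimal sketch of such a synchronization step, assuming a lexicon is a dictionary with a `schema` list and an `entries` dictionary, and that the user's decision arrives through an `approve` callback. None of these names come from LEXUS itself.

```python
def structural_changes(main_schema, workspace_schema):
    # Compare the component/datcat names of the two schemas.
    main, work = set(main_schema), set(workspace_schema)
    return {"added": sorted(work - main), "removed": sorted(main - work)}


def synchronize(main_lexicon, workspace_lexicon, approve):
    # Notify the user of structural changes and let the user decide.
    changes = structural_changes(main_lexicon["schema"],
                                 workspace_lexicon["schema"])
    if (changes["added"] or changes["removed"]) and not approve(changes):
        return False  # user declined, main lexicon left untouched
    main_lexicon["schema"] = list(workspace_lexicon["schema"])
    main_lexicon["entries"] = dict(workspace_lexicon["entries"])
    return True


main = {"schema": ["lemma", "gloss"], "entries": {"e1": {"lemma": "bank"}}}
personal = {"schema": ["lemma", "gloss", "video"],
            "entries": {"e1": {"lemma": "bank", "video": "bank.mpg"}}}

seen = []
declined = synchronize(main, personal, lambda c: seen.append(c) or False)
accepted = synchronize(main, personal, lambda c: True)
```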
32
Future directions
• Support for various types of relations
• Import of data from other sources
• Support for other Data Category Registries, e.g. GOLD
• Integration with MPI archive
• Integration with exploitation tools (ELAN, ANNEX)
• Miscellaneous user requests
33