
English-Lithuanian-English
Lexicon Database
Management System for MT
Gintaras Barisevicius and Elvinas Cernys
Singleton Labs.
Kaunas University of Technology, Department of Software Engineering
Situation in Lithuania
Speak English: 35%
Don't speak English: 65%
2
Situation in Lithuania
• General electronic dictionaries:
  http://www.fotonija.lt; http://www.led.lt
• Morphological analysis tools
• Text corpora (100 million words):
  http://donelaitis.vdu.lt
• Speech recognition systems
• Machine translation research
3
Previous dictionary
Dictionary open to the user
Rigid dictionary structure
Too few attributes
Not all parts of speech included
Indexed files used for dictionary storage
No support for polysemy
No support for phrases
4
Requirements for the new system
Dictionary open to the user
Easy management of attributes
All parts of speech
Large-volume storage
Support for polysemy and synonymy
5
Project size
Comments: 5800 CS (20%)
System code: 19430 CS (67%)
Empty lines: 3770 CS (13%)
Total: 29000 CS
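The percentages follow directly from the counts. A small Java check (a sketch; the class and method names are hypothetical, and the counts stay in the slide's own unit, CS) confirms that each share is its count divided by the 29000 CS total:

```java
public class ProjectSizeSketch {
    /** Returns part as a percentage of total. */
    public static double percent(long part, long total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        // Counts taken from the slide, in its unit "CS".
        long comments = 5800;
        long systemCode = 19430;
        long emptyLines = 3770;
        long total = comments + systemCode + emptyLines; // 29000

        System.out.printf("Total: %d CS%n", total);
        System.out.printf("Comments: %.0f%%%n", percent(comments, total));
        System.out.printf("System code: %.0f%%%n", percent(systemCode, total));
        System.out.printf("Empty lines: %.0f%%%n", percent(emptyLines, total));
    }
}
```

Running it prints the same 20%, 67% and 13% shares shown on the slide.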
6
Current system
Oriented towards MT
More attributes, easy to extend
All parts of speech included
Database for dictionary storage
Polysemous words and subject domains
Automatic generation of morphological forms
Runs on various operating systems
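The slides do not describe how morphological forms are generated. As an illustration only, here is a minimal Java sketch assuming a simple stem-plus-ending table for a single paradigm: Lithuanian masculine common nouns ending in -as, such as stalas ("table"). The class, method and ending table are hypothetical. A real generator needs many more paradigms and subclasses; for example, -ias nouns like kelias (locative kelyje) end in "as" too, and this sketch would produce wrong forms for them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MorphologySketch {
    // Case names in the traditional Lithuanian order.
    static final List<String> CASES = List.of(
            "nominative", "genitive", "dative", "accusative",
            "instrumental", "locative", "vocative");

    // Hypothetical ending table for ONE paradigm: masculine common nouns in -as.
    static final List<String> SINGULAR_AS = List.of("as", "o", "ui", "ą", "u", "e", "e");
    static final List<String> PLURAL_AS = List.of("ai", "ų", "ams", "us", "ais", "uose", "ai");

    /** Generates singular and plural forms of an -as noun; rejects other lemmas. */
    public static Map<String, String> generateAsNoun(String lemma) {
        if (!lemma.endsWith("as") || lemma.length() <= 2) {
            throw new IllegalArgumentException("only -as nouns are covered by this sketch: " + lemma);
        }
        String stem = lemma.substring(0, lemma.length() - 2); // "stalas" -> "stal"
        Map<String, String> forms = new LinkedHashMap<>();
        for (int i = 0; i < CASES.size(); i++) {
            forms.put("sg." + CASES.get(i), stem + SINGULAR_AS.get(i));
            forms.put("pl." + CASES.get(i), stem + PLURAL_AS.get(i));
        }
        return forms;
    }

    public static void main(String[] args) {
        generateAsNoun("stalas").forEach((tag, form) -> System.out.println(tag + ": " + form));
    }
}
```

Keeping the endings in data tables rather than in code means that adding a paradigm only requires adding another pair of ending lists.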
7
Development process
Migration from C++ to Java
Rational Rose for UML modelling
CVS for version control
MySQL database
8
Adding new languages
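Word tables: Noun_Lithuanian (Word), Noun_English (Word), Noun_German (Word)
Meaning tables linking Lithuanian words to their translations:
Meanings1 (ID_LT, ID_ENG) links Noun_Lithuanian to Noun_English
Meanings2 (ID_LT, ID_GER) links Noun_Lithuanian to Noun_German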
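Here is a minimal in-memory Java sketch of this table layout, assuming integer IDs and hypothetical sample rows. The deck keeps these tables in a MySQL database; the class, method names and data here are illustrative only. It shows two properties of the layout. First, one English word can link to several Lithuanian words (polysemy). Second, adding German needs only a new word table and a new Meanings table, leaving the English tables untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LexiconSchemaSketch {
    /** One row of a Meanings table: a Lithuanian word ID linked to a word ID in another language. */
    public record Meaning(int idLt, int idOther) {}

    // Word tables (ID -> Word) with hypothetical sample rows.
    public static final Map<Integer, String> NOUN_LITHUANIAN = Map.of(1, "rašiklis", 2, "stalas", 3, "lentelė");
    public static final Map<Integer, String> NOUN_ENGLISH = Map.of(10, "pen", 11, "table");
    public static final Map<Integer, String> NOUN_GERMAN = Map.of(20, "Stift", 21, "Tisch", 22, "Tabelle");

    // Meanings1 (ID_LT, ID_ENG): English "table" links to two Lithuanian words.
    public static final List<Meaning> MEANINGS1 =
            List.of(new Meaning(1, 10), new Meaning(2, 11), new Meaning(3, 11));
    // Meanings2 (ID_LT, ID_GER): German added via a new word table and a new link table only.
    public static final List<Meaning> MEANINGS2 =
            List.of(new Meaning(1, 20), new Meaning(2, 21), new Meaning(3, 22));

    /** Lithuanian words linked to a foreign word through the given Meanings table. */
    public static List<String> toLithuanian(String word, Map<Integer, String> foreignWords, List<Meaning> meanings) {
        List<String> result = new ArrayList<>();
        for (Meaning m : meanings) {
            if (word.equals(foreignWords.get(m.idOther()))) {
                result.add(NOUN_LITHUANIAN.get(m.idLt()));
            }
        }
        return result;
    }

    /** Foreign words linked to a Lithuanian word through the given Meanings table. */
    public static List<String> fromLithuanian(String word, Map<Integer, String> foreignWords, List<Meaning> meanings) {
        List<String> result = new ArrayList<>();
        for (Meaning m : meanings) {
            if (word.equals(NOUN_LITHUANIAN.get(m.idLt()))) {
                result.add(foreignWords.get(m.idOther()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toLithuanian("table", NOUN_ENGLISH, MEANINGS1));   // [stalas, lentelė]
        System.out.println(fromLithuanian("stalas", NOUN_GERMAN, MEANINGS2)); // [Tisch]
    }
}
```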
9
System deployment (local)
Local dictionary database on a desktop PC
10
System deployment (online)
Dictionary server, router and Internet terminal
Clients: two notebooks and a desktop PC
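The slides do not say which protocol clients use to reach the dictionary server. Assuming plain HTTP, the online setup could be sketched in Java as follows. The /translate path, the word parameter and the two sample entries are hypothetical. The sketch uses only the JDK's built-in com.sun.net.httpserver.HttpServer and java.net.http.HttpClient (Java 11 or later).

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class DictionaryServerSketch {
    // Hypothetical sample entries; a real server would query its dictionary database.
    static final Map<String, String> EN_TO_LT = Map.of("pen", "rašiklis", "table", "stalas");

    /** Starts a server on 127.0.0.1 (port 0 picks a free port) answering GET /translate?word=... */
    public static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/translate", exchange -> {
                String query = exchange.getRequestURI().getRawQuery(); // e.g. "word=pen"
                String word = (query != null && query.startsWith("word="))
                        ? URLDecoder.decode(query.substring(5), StandardCharsets.UTF_8)
                        : "";
                String answer = EN_TO_LT.get(word);
                if (answer == null) {
                    exchange.sendResponseHeaders(404, -1); // -1 means: no response body
                } else {
                    byte[] body = answer.getBytes(StandardCharsets.UTF_8);
                    exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
                    exchange.sendResponseHeaders(200, body.length);
                    try (OutputStream out = exchange.getResponseBody()) {
                        out.write(body);
                    }
                }
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Asks the server at baseUrl for a translation; returns null if the word is unknown. */
    public static String lookup(String baseUrl, String word) {
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        URI uri = URI.create(baseUrl + "/translate?word=" + URLEncoder.encode(word, StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
            return response.statusCode() == 200 ? response.body() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the server", e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(0);
        try {
            String baseUrl = "http://127.0.0.1:" + server.getAddress().getPort();
            System.out.println(lookup(baseUrl, "pen"));   // rašiklis
            System.out.println(lookup(baseUrl, "chair")); // null
        } finally {
            server.stop(0);
        }
    }
}
```

In the local deployment, the same lookup would simply query the local database directly, without the network round trip.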
11
Future ambitions
Phrases
Text corpora
Implementation of syntax rules
Additional features:
• Alternative translation choices
• Web translation
• Video subtitle translation
12
Using text corpora in MT
Example: "The pen is on the table."
PEN → RAŠIKLIS
TABLE → STALAS or LENTELĖ (ambiguous)
Look up how each candidate is used with RAŠIKLIS in the text corpora:
RAŠIKLIS is used with STALAS more often, so TABLE → STALAS.
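The corpus check above amounts to counting co-occurrences. The Java sketch below assumes an already lemmatized corpus in which each sentence is a list of lemmas. The four sentences are invented for illustration and do not come from the donelaitis.vdu.lt corpus. The sketch picks the candidate that appears in the most sentences together with the context word. Plain sentence-level counting is a simplification; the slides do not say which statistic the system uses.

```java
import java.util.List;

public class CorpusDisambiguationSketch {
    // Hypothetical, already lemmatized mini-corpus (a real corpus needs morphological analysis first).
    public static final List<List<String>> CORPUS = List.of(
            List.of("rašiklis", "būti", "ant", "stalas"),
            List.of("padėti", "rašiklis", "ant", "stalas"),
            List.of("lentelė", "būti", "dokumentas"),
            List.of("rašiklis", "ir", "lentelė"));

    /**
     * Picks the candidate translation that occurs in the most sentences together with
     * the context word. On a tie, the candidate listed first wins.
     */
    public static String choose(List<String> candidates, String contextWord, List<List<String>> corpus) {
        String best = candidates.get(0);
        long bestCount = -1;
        for (String candidate : candidates) {
            long count = corpus.stream()
                    .filter(sentence -> sentence.contains(candidate) && sentence.contains(contextWord))
                    .count();
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "The pen is on the table.": PEN -> RAŠIKLIS is fixed; TABLE is ambiguous.
        System.out.println(choose(List.of("stalas", "lentelė"), "rašiklis", CORPUS)); // stalas
    }
}
```

With this toy corpus, stalas co-occurs with rašiklis in two sentences and lentelė in one, so TABLE is translated as STALAS.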
13
Conclusions
• A thorough analysis of Lithuanian and English has been conducted
• Further features (phrases, syntax rules) still have to be added to the dictionary
• Filling the dictionary can now begin
• Work on machine translation is underway
14
Thank you for your attention.
Gintaras Barisevicius
15