Transcript Slide 1
English-Lithuanian-English Lexicon Database Management System for MT
Gintaras Barisevicius and Elvinas Cernys
Singleton Labs.
Kaunas University of Technology, Department of Software Engineering
Situation in Lithuania
Speak English: 35%
Don't speak English: 65%
2
Situation in Lithuania
General electronic dictionaries
http://www.fotonija.lt; http://www.led.lt
Morphological analysis tools
Text corpora (100 million words)
http://donelaitis.vdu.lt
Speech recognition systems
Machine translation research
3
Previous dictionary
Dictionary open to the user
Rigid dictionary structure
Lack of attributes
Not all parts of speech included
Indexed files for dictionary storage
Polysemy not included
Phrases not included
4
Requirements for the new system
Dictionary open to the user
Easy management of the attributes
All parts of speech
Big volume storage
Solution to polysemy and synonyms
5
Project size
Comments: 5800 CS (20%)
System code: 19430 CS (67%)
Blank lines: 3770 CS (13%)
6
Current system
Oriented toward MT
More attributes, easy to extend
All parts of speech included
Database for dictionary storage
Polysemous words, domains
Automatic generation of morphological forms
Runs on various operating systems
7
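To make "automatic generation of morphological forms" concrete, here is a minimal sketch for a single Lithuanian noun pattern (singular forms of regular nouns ending in -as, such as STALAS). The slides do not describe the system's actual rules, so the function name, the ending table, and the restriction to this one pattern are illustrative assumptions.

```python
# Illustrative sketch only: singular case endings for regular Lithuanian
# nouns ending in "-as". The real system's rule set is not shown in the slides.
AS_ENDINGS = {
    "nominative": "as",
    "genitive": "o",
    "dative": "ui",
    "accusative": "ą",
    "instrumental": "u",
    "locative": "e",
    "vocative": "e",
}


def decline_as_noun(lemma):
    """Return the singular forms of a regular '-as' noun (hypothetical helper)."""
    # Nouns in "-ias" and "-jas" follow different endings, so they are rejected
    # here instead of being declined incorrectly.
    if not lemma.endswith("as") or lemma.endswith(("ias", "jas")):
        raise ValueError(f"unsupported noun pattern: {lemma!r}")
    stem = lemma[:-2]
    return {case: stem + ending for case, ending in AS_ENDINGS.items()}


if __name__ == "__main__":
    for case, form in decline_as_noun("stalas").items():
        print(f"{case}: {form}")
```

A full generator would need one such table per declension class and number, plus the other parts of speech; storing only the lemma and a class label in the database and deriving the forms on demand is one possible design.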
Development process
From C++ to Java
Rational Rose tool
CVS for version control management
MySQL database
8
Adding new languages
Noun_Lithuanian (Word)
Noun_English (Word)
Noun_German (Word)
Meanings1 (ID_LT, ID_ENG): links Noun_Lithuanian to Noun_English
Meanings2 (ID_LT, ID_GER): links Noun_Lithuanian to Noun_German
9
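The table layout on the slide above can be sketched as follows. This is an illustration under stated assumptions, not the project's schema: an in-memory SQLite database stands in for MySQL, the `id` primary-key columns and the helper functions are hypothetical, and only the table names and the Word, ID_LT, ID_ENG, and ID_GER columns come from the slide. It shows that adding German requires one new word table and one new meanings table, with the existing tables left unchanged.

```python
import sqlite3


def build_lexicon():
    """Create the Lithuanian-English part of the lexicon (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Noun_Lithuanian (id INTEGER PRIMARY KEY, Word TEXT NOT NULL);
        CREATE TABLE Noun_English (id INTEGER PRIMARY KEY, Word TEXT NOT NULL);
        CREATE TABLE Meanings1 (
            ID_LT INTEGER NOT NULL REFERENCES Noun_Lithuanian(id),
            ID_ENG INTEGER NOT NULL REFERENCES Noun_English(id)
        );
    """)
    conn.executemany("INSERT INTO Noun_Lithuanian VALUES (?, ?)",
                     [(1, "stalas"), (2, "lentele")])
    conn.execute("INSERT INTO Noun_English VALUES (1, 'table')")
    # Both Lithuanian nouns map to the same English word (polysemy of "table").
    conn.executemany("INSERT INTO Meanings1 VALUES (?, ?)", [(1, 1), (2, 1)])
    return conn


def add_german(conn):
    """Add a language: one new word table plus one new meanings table."""
    conn.executescript("""
        CREATE TABLE Noun_German (id INTEGER PRIMARY KEY, Word TEXT NOT NULL);
        CREATE TABLE Meanings2 (
            ID_LT INTEGER NOT NULL REFERENCES Noun_Lithuanian(id),
            ID_GER INTEGER NOT NULL REFERENCES Noun_German(id)
        );
    """)
    conn.execute("INSERT INTO Noun_German VALUES (1, 'Tisch')")
    conn.execute("INSERT INTO Meanings2 VALUES (1, 1)")


def translate(conn, word, meanings, target, target_id):
    """Look up translations of a Lithuanian noun through a meanings table."""
    # Table and column names are fixed constants supplied by the caller,
    # never user input, so interpolating them into the query is acceptable here.
    query = f"""
        SELECT t.Word FROM Noun_Lithuanian lt
        JOIN {meanings} m ON m.ID_LT = lt.id
        JOIN {target} t ON t.id = m.{target_id}
        WHERE lt.Word = ? ORDER BY t.Word
    """
    return [row[0] for row in conn.execute(query, (word,))]


if __name__ == "__main__":
    conn = build_lexicon()
    add_german(conn)
    print(translate(conn, "stalas", "Meanings1", "Noun_English", "ID_ENG"))
    print(translate(conn, "stalas", "Meanings2", "Noun_German", "ID_GER"))
```

In this layout Lithuanian acts as the hub: each added language links to Noun_Lithuanian through its own meanings table, so English-German lookups would go through the Lithuanian entries.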
System deployment locally
Local dictionary database
Desktop PC
10
System deployment online
Dictionary server
Router
Terminal to Internet
Notebook
Notebook
Desktop PC
11
Future ambitions
Phrases
Text corpora
Syntax rule realization
Additional features
Alternative translation choices
Web translation
Video subtitle translation
12
Text corpora usage in MT
The pen is on the table.
PEN: RASIKLIS
TABLE: STALAS or LENTELE
Look up usage of STALAS with RASIKLIS in the text corpora
Look up usage of LENTELE with RASIKLIS in the text corpora
RASIKLIS occurs with STALAS more often!
13
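The corpus lookup on the slide above can be sketched as a co-occurrence count. This is a minimal illustration, not the authors' method: the function names are hypothetical, the four-sentence corpus is invented for the example, and its tokens are assumed to be already lemmatized and written without diacritics, as on the slide.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each pair of words appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        # Sorting makes every pair key (a, b) satisfy a < b.
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def pick_translation(candidates, context_word, counts):
    """Choose the candidate that co-occurs most often with the context word."""
    def score(candidate):
        return counts[tuple(sorted((candidate, context_word)))]
    return max(candidates, key=score)


# Invented toy corpus of lemmatized sentences (an assumption for illustration).
TOY_CORPUS = [
    "rasiklis guleti stalas",
    "stalas rasiklis knyga",
    "lentele rodyti skaicius",
    "rasiklis lentele",
]

if __name__ == "__main__":
    counts = cooccurrence_counts(TOY_CORPUS)
    # "table" has two candidate translations; "pen" has already been resolved.
    print(pick_translation(["stalas", "lentele"], "rasiklis", counts))
```

With a real corpus of about 100 million words, raw counts would likely need normalizing by each word's overall frequency so that very common words do not dominate the choice.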
Conclusions
Thorough analysis of the Lithuanian and English languages conducted
Additional features have to be added to the dictionary (phrases, syntax rules)
Filling the dictionary can be started
Machine translation is underway
14
Thank you for your attention.
Gintaras Barisevicius
15