Transcript Slide 1
English-Lithuanian-English Lexicon Database Management System for MT
Gintaras Barisevicius and Elvinas Cernys
Singleton Labs. Kaunas University of Technology, Department of Software Engineering

Slide 2
Situation in Lithuania
Speak English: 35%
Don't speak English: 65%

Slide 3
Situation in Lithuania
General electronic dictionaries (http://www.fotonija.lt; http://www.led.lt)
Morphological analysis tools
Text corpora (100 million words), http://donelaitis.vdu.lt
Speech recognition systems
Machine translation research

Slide 4
Previous dictionary
Dictionary open to the user
Rigid dictionary structure
Lack of attributes
Not all parts of speech included
Indexed files for dictionary storage
Polysemy not included
Phrases not included

Slide 5
Requirements for the new system
Dictionary open to the user
Easy management of the attributes
All parts of speech
Large-volume storage
Solution to polysemy and synonyms

Slide 6
Project size
Comments: 5800 CS (20%)
System code: 19430 CS (67%)
Empty strings: 3770 CS (13%)

Slide 7
Current system
Oriented toward MT
More attributes, easy to extend
All parts of speech included
Database for dictionary storage
Polysemous words, domains
Automatic generation of morphological forms
Runs on various operating systems

Slide 8
Development process
From C++ to Java
Rational Rose tool
CVS for version control
MySQL database

Slide 9
Adding new languages
Noun_Lithuanian: Word
Noun_English: Word
Noun_German: Word
Meanings1: ID_LT, ID_ENG
Meanings2: ID_LT, ID_GER

Slide 10
System deployment locally
Local dictionary database
Desktop PC

Slide 11
System deployment online
Dictionary server
Router
Terminal to Internet
Notebook
Notebook
Desktop PC

Slide 12
Future ambitions
Phrases
Text corpora
Syntax rule implementation
Additional features
Possible other translation choices
Web translation
Video subtitle translation

Slide 13
Text corpora usage in MT
The pen is on the table.
PEN: RASIKLIS
TABLE: STALAS or LENTELE
Text corpora: look up the usage of STALAS with RASIKLIS and of LENTELE with RASIKLIS
RASIKLIS is used with STALAS more often!

Slide 14
Conclusions
Thorough analysis of the Lithuanian and English languages conducted
Additional features (phrases, syntax rules) have to be added to the dictionary
Filling the dictionary can be started
Machine translation is underway

Slide 15
Thank you for your attention.
Gintaras Barisevicius
[email protected] [email protected]
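
The corpus lookup from the "Text corpora usage in MT" slide can be sketched in Java, the language the project moved to. This is only an illustration under stated assumptions: the class and method names (CorpusDisambiguator, coOccurrences, choose) are hypothetical, the corpus is a toy list of invented, uninflected sentences, and co-occurrence is counted per sentence. The slides do not say how the real system would query the corpus.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Picks the translation candidate that co-occurs most often with an already chosen word. */
final class CorpusDisambiguator {

    /** Counts corpus sentences that contain both words as whole tokens, ignoring case. */
    static int coOccurrences(List<String> corpus, String a, String b) {
        String first = a.toLowerCase(Locale.ROOT);
        String second = b.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String sentence : corpus) {
            // Split on anything that is not a letter, so punctuation does not stick to tokens.
            List<String> tokens = Arrays.asList(
                    sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"));
            if (tokens.contains(first) && tokens.contains(second)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Returns the candidate with the highest co-occurrence count.
     * Ties keep the earlier candidate; the candidate list must not be empty.
     */
    static String choose(List<String> corpus, String context, List<String> candidates) {
        String best = candidates.get(0);
        int bestCount = -1;
        for (String candidate : candidates) {
            int count = coOccurrences(corpus, context, candidate);
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy corpus, invented for illustration only (not real corpus data).
        List<String> corpus = Arrays.asList(
                "rasiklis stalas",
                "stalas rasiklis knyga",
                "lentele skaicius");
        // "table" has two candidates; "pen" has already been translated as "rasiklis".
        System.out.println(choose(corpus, "rasiklis", Arrays.asList("stalas", "lentele")));
    }
}
```

A real Lithuanian corpus contains inflected forms, so a lookup like this would presumably need to run on lemmas produced by the morphological analysis tools rather than on raw tokens.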
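
The table layout on the "Adding new languages" slide suggests that each added language gets its own noun table plus a Meanings table that links back to ID_LT. The sketch below models that idea with in-memory maps instead of the MySQL database the system uses. The class PivotLexicon and all of its methods are hypothetical names, and treating Lithuanian as the pivot language is an assumption read off the slide, not something the slides state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the noun tables and Meanings link tables on the slide. */
final class PivotLexicon {
    // Noun_Lithuanian: ID_LT -> Word
    private final Map<Integer, String> lithuanianNouns = new HashMap<>();
    // One noun table per added language, e.g. "ENG" -> (ID_ENG -> Word)
    private final Map<String, Map<Integer, String>> nounTables = new HashMap<>();
    // One Meanings table per added language, e.g. "ENG" -> rows of {ID_LT, ID_ENG}
    private final Map<String, List<int[]>> meanings = new HashMap<>();

    void addLithuanianNoun(int idLt, String word) {
        lithuanianNouns.put(idLt, word);
    }

    String lithuanianNoun(int idLt) {
        return lithuanianNouns.get(idLt);
    }

    /** Adding a language creates two new tables and leaves the existing ones untouched. */
    void addLanguage(String code) {
        nounTables.putIfAbsent(code, new HashMap<>());
        meanings.putIfAbsent(code, new ArrayList<>());
    }

    /** The language must have been registered with addLanguage first. */
    void addNoun(String code, int id, String word) {
        nounTables.get(code).put(id, word);
    }

    /** Adds one Meanings row linking a Lithuanian noun to a noun of the given language. */
    void link(String code, int idLt, int idForeign) {
        meanings.get(code).add(new int[] {idLt, idForeign});
    }

    /** Returns all words of the given language linked to the Lithuanian noun idLt. */
    List<String> translations(int idLt, String code) {
        List<String> result = new ArrayList<>();
        for (int[] row : meanings.getOrDefault(code, new ArrayList<>())) {
            if (row[0] == idLt) {
                result.add(nounTables.get(code).get(row[1]));
            }
        }
        return result;
    }
}
```

Because a Meanings table may hold several rows for the same ID_LT, the same structure can also represent polysemous words and synonyms without changing the noun tables.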