Language Engineering

Language Engineering Research
at Resource Centre for Indian Language Technology Solutions
University of Hyderabad
6 July 2015
K. Narayana Murthy, DCIS, UoH
So far

UCSG System of Syntax, Parsers
English-Kannada Machine Aided Translation
OCR for Telugu and other Indian Languages
Telugu Corpus (10 Million Words)
Experimental Text-to-Speech System for Telugu
A Variety of Tools
Ongoing

Word Sense Disambiguation
Robust Shallow Parsing
Language Identification
Planned

Speech Technologies
– Speech Recognition
– Text-to-Speech

Long Term Vision: Speech-to-Speech Translation between English and Indian Languages (ILs)
The UCSG Architecture

[Diagram labels: Sentence, L Grammar, L, H Grammar, H, F Grammar, F, F Structure]
Architecture of a Hybrid Machine Translation System

[Diagram: processing stages from SL Sentence to TL Sentence]
SL Sentence
Tagger (HMM)
Identify/Rate Word Groups (FSM, Markov Models, MI)
Identify Clause Structure
Assign Functional Roles to Word Groups
Rate/Rank Role Assignments
Best First Search for Best Parse
Structural Description (SL Inst.): the Phrases & their Roles for each Clause
Clause/Phrase/Word level Transfer (WSD Statistics)
TL Sentence Planner
Structural Description (TL Inst.): the Phrases & their Roles of each Clause
Syntactic Generator
Post Editing
TL Sentence
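The hybrid MT architecture names a best-first search over rated role assignments. The sketch below is only an illustration of that general idea, not the system's actual parser: it assumes each word group comes with candidate roles and positive probabilities, that scores combine independently, and that a caller-supplied consistency check rejects bad combinations. The names `best_first_parse`, `no_repeated_role`, and the example roles are hypothetical.

```python
import heapq
import math

def best_first_parse(candidates, is_consistent):
    """Return (roles, log_score) for the best consistent role assignment.

    candidates: one list per word group of (role, probability) pairs,
                with probabilities > 0 (assumed, not from the slides).
    is_consistent: hypothetical predicate on a partial tuple of roles.
    """
    # Sort each group's options by decreasing probability.
    options = [sorted(c, key=lambda rp: -rp[1]) for c in candidates]
    # best_rest[i] = optimistic log score still achievable for groups i..end.
    best_rest = [0.0] * (len(options) + 1)
    for i in range(len(options) - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + math.log(options[i][0][1])

    # Heap entries: (negated optimistic total, tie-breaker, score so far, roles).
    heap = [(-best_rest[0], 0, 0.0, ())]
    counter = 1
    while heap:
        _, _, score, roles = heapq.heappop(heap)
        i = len(roles)
        if i == len(options):
            # The first complete assignment popped is the best one,
            # because the bound on partial assignments is optimistic.
            return list(roles), score
        for role, p in options[i]:
            new_roles = roles + (role,)
            if not is_consistent(new_roles):
                continue
            new_score = score + math.log(p)
            bound = new_score + best_rest[i + 1]
            heapq.heappush(heap, (-bound, counter, new_score, new_roles))
            counter += 1
    return None, float("-inf")

def no_repeated_role(roles):
    # Toy constraint: a role may be used at most once per clause.
    return len(set(roles)) == len(roles)

# Three word groups with made-up role probabilities.
groups = [
    [("subject", 0.6), ("object", 0.4)],
    [("subject", 0.7), ("object", 0.3)],
    [("verb", 1.0)],
]
roles, score = best_first_parse(groups, no_repeated_role)
print(roles, math.exp(score))
```

Expanding the most promising partial assignment first means that low-rated combinations are never fully built, which is the usual reason for preferring best-first search over exhaustive enumeration.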
English – Telugu – English Speech to Speech Translation

English Speech → English ASR → English Text → English-Telugu MT → Telugu Text → Telugu TTS → Telugu Speech
Telugu Speech → Telugu ASR → Telugu Text → Telugu-English MT → English Text → English TTS → English Speech
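The speech-to-speech design chains three stages in each direction: ASR, then MT, then TTS. A minimal sketch of that composition follows, assuming each stage can be treated as a plain function; the stage functions here are toy stand-ins that only make the example runnable, and every name is hypothetical.

```python
from typing import Callable

def make_speech_to_speech(
    asr: Callable[[bytes], str],
    mt: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Compose ASR, MT and TTS stages into one speech-to-speech function."""
    def translate(speech: bytes) -> bytes:
        source_text = asr(speech)      # speech -> source-language text
        target_text = mt(source_text)  # source text -> target-language text
        return tts(target_text)        # target text -> target-language speech
    return translate

# Toy stand-ins: "audio" is just UTF-8 bytes, and "translation" only tags text.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")

def toy_english_telugu_mt(text: str) -> str:
    return "[te] " + text

def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")

english_to_telugu = make_speech_to_speech(toy_asr, toy_english_telugu_mt, toy_tts)
result = english_to_telugu(b"hello")
print(result)
```

The reverse direction would be built the same way from a Telugu ASR, a Telugu-English MT, and an English TTS stage, so each component can be developed and replaced independently.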
What we have developed: Telugu

Telugu Corpus (10 Million words)
English-Telugu, Telugu-Hindi dictionaries
Telugu Morphological Analyzer
OCR System for Telugu
Telugu Spell Checker
Telugu TTS systems
Electronic versions of several dictionaries
What we have developed: Kannada
Kannada and English-Kannada dictionaries
Kannada Thesaurus
Kannada Morphological Analyzer and Generator
What we have developed: MT

English-Kannada Machine Aided Translation system for Govt. of Karnataka
Budget Speech Texts: 100 to 150 Pages
40–60% automatic, the rest semi-automatic
Powerful Post-Editor
Parser Based Translation
MAT: High quality can be ensured
Efficient: A full book processed in just a few minutes
MAT2: Interactive man-machine hybrid planned
Other Contributions
AKSHARA – Advanced Multi-Lingual Text Processor
VIDYA – Web Based Education system
History-Society-Culture Portal
On-Line Searchable Directory
Language Technology Tool Kits
Thank You
Visit
www.LanguageTechnologies.ac.in