Transliterated Search using Syllabification
of Computer Science, Gujarat University, Ahmedabad, India.
2L.J. College of Engineering, Ahmedabad, India
@FIRE, 4th Dec 2013
Web-based applications need local language support, because many
domains, such as e-commerce sites, currently require knowledge of
English.
The challenge in transliteration is that one word has many plausible
Roman spellings. For the word “राष्ट्रपति”, the spellings “rashtrapati”,
“rashtrapathi”, “raashtrapathy” and “raashtrpati” are all possible,
and deciding which one is correct is itself an issue.
Transliteration becomes difficult in the presence of
out-of-vocabulary (OOV) words and noisy words.
In both the subtasks, the transliteration was performed using
In subtask 1, we performed morphological analysis of English
words and then used a corpus-based approach to identify
frequently occurring Hindi words.
In subtask 2, the queries were formulated in two forms for
separate run submissions: one containing both Roman and
Devanagari script, and one containing Roman script only.
According to linguists, different languages have constraints on possible
consonant and vowel sequences that characterize not only the
word structure of the language but also its syllable structure.
Vowels at the center (nucleus)
Consonants at the beginning (onset)
The end is the coda
Syllable Structure Example
स ◌ु द ◌ुा क र
ज िु◌ ि ◌ु श
न ा र ा य ण
श ि व
म ा ध व
म ◌ु ह म ◌ु म द
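The onset / nucleus / coda split can be illustrated on Roman-script input with a minimal Python sketch. The digraph list, the rule that only the last consonant of a medial cluster opens the next syllable, and the names TOKEN and syllabify are assumptions made for illustration; this is not the syllabification model used in the submitted runs.

```python
import re

# Assumed tokenisation: long vowels and diphthongs first, then aspirated
# digraphs such as "dh" or "sh", then any single letter.
TOKEN = re.compile(r"aa|ee|ii|oo|uu|ai|au|[aeiou]|chh|[kgcjtdpbs]h|[a-z]")


def syllabify(word):
    """Split a Roman-script word into (onset, nucleus, coda) triples."""
    syllables = []  # each entry is [onset, nucleus, coda]
    pending = []    # consonants seen since the last nucleus
    for tok in TOKEN.findall(word.lower()):
        if tok[0] in "aeiou":
            if syllables and len(pending) > 1:
                # Medial cluster: all but the last consonant close the
                # previous syllable, the last one opens the new syllable.
                syllables[-1][2] = "".join(pending[:-1])
                pending = pending[-1:]
            syllables.append(["".join(pending), tok, ""])
            pending = []
        else:
            pending.append(tok)
    if syllables:
        # Trailing consonants form the coda of the last syllable.
        syllables[-1][2] = "".join(pending)
    elif pending:
        # Word without any vowel: keep it as a bare onset.
        syllables.append(["".join(pending), "", ""])
    return [tuple(s) for s in syllables]


if __name__ == "__main__":
    for w in ["narayan", "shiv", "madhav", "muhammad"]:
        print(w, syllabify(w))
```

The cluster rule is deliberately crude: a word such as "rashtrapati" would get the whole of "sht" as a coda, so a real system would need language-specific rules for legal onsets.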
Algorithm for subtask-I
Step 1: Look up each word in an English dictionary.
Step 2: Perform spell-checking, stemming and morphological
analysis for English; if there is no spelling error and a match is
found, label the word as English (E).
Step 3: If the word is not found as English, check it against an
English corpus of US newspapers.
Step 4: If the word is found there, check it against an English
corpus of Indian newspapers.
Step 5: If the word is found in the US newspaper corpus but not in
the Indian newspaper corpus, label it E.
Step 6: Steps 2 and 5 are applied in parallel to identify English
words, which are labeled \E.
Step 7: The remaining words are to be transliterated into
Hindi words and are labeled \H.
Step 8: Apply the Moses tool, which helps transliterate English
words into Hindi words.
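The labeling part of the steps above (Steps 1 to 7) can be sketched as follows. This is a minimal sketch under stated assumptions: the three word sets are tiny hypothetical stand-ins for the dictionary and the two newspaper corpora, crude_stems is a naive suffix stripper standing in for spell-checking, stemming and morphological analysis, and the Moses transliteration of Step 8 is not shown.

```python
def crude_stems(word):
    """Yield the word and naive suffix-stripped variants (hypothetical stand-in
    for the spell-check / stemming / morphological analysis of Step 2)."""
    yield word
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            yield word[: -len(suffix)]


def label_words(words, english_dictionary, us_news, indian_news):
    """Return (word, label) pairs, with label \\E for English and \\H for Hindi."""
    labeled = []
    for word in words:
        w = word.lower()
        # Steps 1-2: dictionary lookup on the word or one of its stems.
        in_dictionary = any(s in english_dictionary for s in crude_stems(w))
        # Steps 3-5: seen in the US corpus but not in the Indian corpus.
        us_only = w in us_news and w not in indian_news
        # Step 6: either test is enough to call the word English.
        if in_dictionary or us_only:
            labeled.append((word, r"\E"))
        else:
            # Step 7: everything else is treated as Hindi.
            labeled.append((word, r"\H"))
    return labeled


if __name__ == "__main__":
    # Hypothetical toy resources, for illustration only.
    dictionary = {"love", "song"}
    us_corpus = {"download", "rani"}
    indian_corpus = {"rani", "kab"}
    query = ["love", "songs", "rani", "download", "kab"]
    print(label_words(query, dictionary, us_corpus, indian_corpus))
```

Words that receive \H would then be passed to the transliteration step.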
RESULT OF SUBTASK-1
Results For Subtask 2
Run 1 “मर सापन न कक रानी काब आयगी ि mere sapnon ki rani
kab aayegi tu”.
Run 2 “mere sapnon ki rani kab aayegi tu”.
Error And Analysis
There are some problems in the transliteration which
decreased the precision.
Errors in the matra (vowel sign): “sapnon” => “सापन न”, “ki” => “की”,
“kab” => “काब”, “main” => “ममन”, “mein” => “मीन”, “na” => “न” and
“ka” => “क”
Multiple mappings for one letter, e.g. t = ि, ट: tera => टरा,
tum => िूम, to => ट, teri => टरर.
Missing sounds (फ, ख, छ ‘chh’, ksh), e.g. for the word “accha” we
got “आक्का”, and for “poochho” we got “पछ
Multiple transliterations: c, k
Vowels are not transliterated correctly,
e.g. “lo” => “लॉ”, “ho” => “ह र”, “ko” => “कॉ”
Conjunct formation (“kya” => “कया”)
Missing vowels: ‘ak tr khan’ (अक ु िर खान)
‘y’ As Vowel: ‘anthony’ & ‘Shyam’
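The multiple-mapping problem listed above can be made concrete with a small sketch: when a Roman letter maps to more than one Devanagari letter, every combination is a candidate, and one of them has to be picked. The mapping table, the frequency counts and the names CANDIDATES, HINDI_WORD_FREQ, candidates and most_probable are all hypothetical; choosing by corpus frequency is an assumption about how the most probable term could be selected, not a description of the submitted system.

```python
from itertools import product

# Hypothetical per-letter mapping: "t" is ambiguous between dental and retroflex.
CANDIDATES = {
    "t": ["त", "ट"],
    "u": ["ु"],
    "m": ["म"],
}

# Invented frequency counts standing in for a Hindi word corpus.
HINDI_WORD_FREQ = {"तुम": 120}


def candidates(word):
    """Enumerate every Devanagari string the per-letter mapping allows."""
    options = [CANDIDATES[ch] for ch in word]
    return ["".join(choice) for choice in product(*options)]


def most_probable(word):
    """Pick the candidate with the highest (assumed) corpus frequency."""
    return max(candidates(word), key=lambda c: HINDI_WORD_FREQ.get(c, 0))


if __name__ == "__main__":
    print(candidates("tum"))
    print(most_probable("tum"))
```

With a frequency table of this kind, an unseen word leaves all candidates tied at zero, which is one way the out-of-vocabulary problem shows up.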
We used the syllabification approach and considered the most
probable term in the transliteration process. The word labeling
task was performed assuming that a term belongs either to the
English language or to the Hindi language. We obtained high
English recall because the labeling approach combined
morphological analysis with a dictionary lookup. However, because
of the syllabification model, the transliteration did not reach high
precision, which lowered the precision of the transliteration tasks
and subsequently the precision metrics in the song lyrics