Special Indices for LaaLaLaa Lyric
Analysis & Generation Framework
Dr. V. Madhan Karky,
Tamil Computing Lab (TaCoLa),
College of Engineering Guindy,
Anna University, Chennai.
Overview
• Objective
• Introduction
• Background
• Rhyme Schemes in Tamil
• Meter Pattern
• System Architecture
• Indexing Structure
• Indexing Algorithm
• Results and Analysis
Objective
• To build special indices for the LaaLaLaa lyric
analysis and generation framework to
facilitate faster retrieval based on
– Meter Pattern
– Rhyme
Introduction
• Tamil is a vibrant language with a rich grammar and
vocabulary and an inherent poetic flavor.
• About 1000 lyrics are written every year for
private albums, jingles and original soundtracks of
mainstream movies.
Background
• WASP (Pablo Gervás (2000)) splits a given block of
text, identifies patterns and fits words from the
vocabulary to get verses of a similar pattern.
• COLIBRI (Agudo, Gervás, Calero (2002)) follows a
case-based approach to poem generation.
• Tra-la-Lyrics (Oliveira, Cardoso, Pereira (2005)) finds
the beat pattern of the MIDI file and places
words with a similar syllabic division and stress
pattern.
• Automatic generation of Tamil lyrics for melodies
(Ramakrishnan, Kuppan, Devi (2009)) converts the
MIDI file to a KNM (Kuril, Nedil, Mei) representation and
fits words to it from a corpus. The phrases are meaningful
only in parts, and edhugai, monai and iyaibu are not handled.
• LaaLaLaa (Sowmiya, Karky (2010)) describes splitting raw
text from a MIDI file into templates and filling them with
words from a wordnet according to the pattern mined
from an existing corpus of lyrics, with due consideration
to rhyme, meaning and flow.
Rhyme Schemes in Tamil
• Two words are said to rhyme in
– Monai (மோனை) - first letters are the same.
– Edhugai (எதுகை) - second letters are the same.
– Iyaibu (இயைபு) - last letters are the same.
• Examples:
– பறவை and பச்சை rhyme in monai.
– அருவி and விருப்பு rhyme in edhugai.
– காக்கை and வாழ்க்கை rhyme in iyaibu.
– அருவி and குருவி rhyme in edhugai and iyaibu.
– கவிதைகள் and கவிஞர்கள் rhyme in all the
three schemes.
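The three checks above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's own code: it assumes a "letter" is a base character plus any vowel sign or pulli that follows it, and it compares letters for exact equality as the slide defines them. The names `split_letters` and `rhyme_schemes` are hypothetical.

```python
import unicodedata

# Tamil marks that attach to the preceding base character:
# vowel signs and the pulli, U+0BBE..U+0BCD.
COMBINING = {chr(c) for c in range(0x0BBE, 0x0BCE)}


def split_letters(word):
    """Split a Tamil word into letters (base character plus its marks)."""
    letters = []
    # NFC composes two-part vowel signs into a single code point.
    for ch in unicodedata.normalize("NFC", word):
        if letters and ch in COMBINING:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def rhyme_schemes(word_a, word_b):
    """Return the set of schemes in which the two words rhyme."""
    a, b = split_letters(word_a), split_letters(word_b)
    schemes = set()
    if a and b and a[0] == b[0]:
        schemes.add("monai")      # first letters match
    if len(a) > 1 and len(b) > 1 and a[1] == b[1]:
        schemes.add("edhugai")    # second letters match
    if a and b and a[-1] == b[-1]:
        schemes.add("iyaibu")     # last letters match
    return schemes
```

For instance, `rhyme_schemes("அருவி", "குருவி")` reports edhugai and iyaibu, matching the example above.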
Meter Pattern
• Maathirai (மாத்திரை) - time taken to blink an eye.
• Maathirai-based classification of Tamil letters.
– Nedil (N) (நெடில்) - Letters which are pronounced for
2 maathirai.
– Kuril (K) (குறில்) - Letters which take 1 maathirai to be pronounced.
– Mei (M) (மெய்) - Letters which are pronounced for 0.5 maathirai.
• Meter pattern of a word refers to its Kuril Nedil Mei pattern.
• For example, the Meter pattern of the word பாடல் is NKM as பா is
a Nedil (N), ட is a Kuril (K) and ல் is a Mei (M).
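A meter pattern can be derived from the Unicode form of a word, roughly as below. This is a sketch under stated assumptions rather than the framework's extractor: it treats ஐ/ை and ஔ/ௌ as Nedil, does not handle the aytham (ஃ), and labels any character that is not recognised as long or pulli-marked as Kuril. The function name `meter_pattern` is hypothetical.

```python
import unicodedata

PULLI = "\u0bcd"  # dot that marks a pure consonant (Mei)
# Long independent vowels: ஆ ஈ ஊ ஏ ஐ ஓ ஔ
LONG_VOWELS = set("\u0b86\u0b88\u0b8a\u0b8f\u0b90\u0b93\u0b94")
# Long vowel signs: ா ீ ூ ே ை ோ ௌ
LONG_SIGNS = set("\u0bbe\u0bc0\u0bc2\u0bc7\u0bc8\u0bcb\u0bcc")
# All vowel signs plus the pulli, U+0BBE..U+0BCD
COMBINING = {chr(c) for c in range(0x0BBE, 0x0BCE)}


def meter_pattern(word):
    """Return the Kuril/Nedil/Mei pattern of a Tamil word, e.g. 'NKM'."""
    pattern = []
    # NFC composes two-part vowel signs into a single code point.
    for ch in unicodedata.normalize("NFC", word):
        if pattern and ch in COMBINING:
            if ch == PULLI:
                pattern[-1] = "M"   # consonant + pulli: 0.5 maathirai
            elif ch in LONG_SIGNS:
                pattern[-1] = "N"   # consonant + long vowel: 2 maathirai
            # a short vowel sign leaves the letter as Kuril
        else:
            # A bare consonant carries the inherent short 'a' (Kuril).
            pattern.append("N" if ch in LONG_VOWELS else "K")
    return "".join(pattern)
```

Calling `meter_pattern("பாடல்")` yields `NKM`, the pattern given in the example above.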
System Architecture
[Figure: system architecture. Components shown: Lyric DB, Word Object Convertor, Rhyme Extractor, Rhyme Pattern Extractor, Index Builder, Rhyme Meter Index.]
Indexing Structure
[Figure: index tree. Each Part of Speech branches into மோனை, எதுகை and இயைபு; each rhyme scheme branches into meter patterns (MeterPattern1, MeterPattern2); each meter pattern branches into letters (Letter1, Letter2); each letter holds its Words.]
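One way to picture this structure is as nested hash tables keyed by part of speech, rhyme scheme, meter pattern and letter. The sketch below is illustrative only: the nesting order is read off the diagram labels, and `WordObject`, its field names, `build_index`, `lookup` and the "noun" tag are all hypothetical rather than taken from the framework.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordObject:
    """Hypothetical word record; the field names are illustrative."""
    text: str
    pos: str      # part of speech
    meter: str    # Kuril/Nedil/Mei pattern, e.g. "NKM"
    first: str    # first letter (monai key)
    second: str   # second letter (edhugai key)
    last: str     # last letter (iyaibu key)


def build_index(words):
    """Build index[pos][scheme][meter][letter] -> list of word texts."""
    index = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    )
    for w in words:
        keys = {"monai": w.first, "edhugai": w.second, "iyaibu": w.last}
        for scheme, letter in keys.items():
            index[w.pos][scheme][w.meter][letter].append(w.text)
    return index


def lookup(index, pos, scheme, meter, letter):
    """Fetch matching words with four hash lookups and no vocabulary scan."""
    # .get() avoids creating empty entries for keys that are missing.
    return list(
        index.get(pos, {}).get(scheme, {}).get(meter, {}).get(letter, [])
    )


SAMPLE_WORDS = [
    WordObject("அருவி", "noun", "KKK", "அ", "ரு", "வி"),
    WordObject("குருவி", "noun", "KKK", "கு", "ரு", "வி"),
    WordObject("பாடல்", "noun", "NKM", "பா", "ட", "ல்"),
]
```

With this layout, `lookup(build_index(SAMPLE_WORDS), "noun", "edhugai", "KKK", "ரு")` returns the two sample words whose second letter is ரு and whose meter pattern is KKK.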
Indexing Algorithm
Results and Analysis
[Figure: retrieval time (in milliseconds, axis from 0 to 4000) for each word (1 to 494), comparing the Meter Rhyme Indexed Approach with the Word Indexed Approach.]
• The retrieval time of both approaches was
tested using a dataset of 500 Tamil words.
• The average retrieval time in
– Word indexed approach - 875.47 milliseconds
– Meter Rhyme Indexed approach - 1.90 milliseconds
• The drastic decrease in retrieval complexity from
O(α) to O(1) [α is the number of words in the
database] is due to
– The use of hash tables, which are efficient for
retrieval.
– Having separate hash tables for the மோனை,
எதுகை and இயைபு of each POS.
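The difference between O(α) and O(1) can be made concrete with a toy comparison. The sketch below assumes the baseline inspects every word record in turn, which is a guess at what a scan over the word list costs and not a description of the actual Word Indexed Approach; the function names and sample records are hypothetical.

```python
def scan_lookup(records, meter, letter):
    """Baseline: inspect every record, so the work grows with the word count."""
    matches, inspected = [], 0
    for word, word_meter, word_letter in records:
        inspected += 1
        if word_meter == meter and word_letter == letter:
            matches.append(word)
    return matches, inspected


def build_table(records):
    """Group words in a hash table keyed by (meter pattern, last letter)."""
    table = {}
    for word, word_meter, word_letter in records:
        table.setdefault((word_meter, word_letter), []).append(word)
    return table


def table_lookup(table, meter, letter):
    """One hash lookup on average, independent of the number of words."""
    return table.get((meter, letter), [])


# (word, meter pattern, last letter) records for an iyaibu lookup
RECORDS = [
    ("அருவி", "KKK", "வி"),
    ("குருவி", "KKK", "வி"),
    ("பாடல்", "NKM", "ல்"),
]
```

Both functions return the same words for the same query; the scan touches all α records to do so, while the table goes straight to the matching bucket.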
Thank You!!!