Morphology For Marathi POS-Tagger Veena Dixit

Morphology
For
Marathi POS-Tagger
Veena Dixit
11/10/2005
Contents
• Word
• Morphology
• Marathi Morphology - definition of the task and
difficulties thereto.
• Marathi Morphology - solutions to the challenges
• Different word classes
• Postpositions
• Particles
• Interjections
• Conjunctions
• Pronouns
• Adjectives
• Adverbs
• Verbs
• Nouns
Word
• Words are orthographic strings separated by spaces and some punctuation marks.
• For syntax, words combine into sentences; for morphology, a word has internal structure and different inflectional forms.
• The inflectional forms of a root word form a paradigm based on a principle.
• The root word is the form that is stored in lexicons and dictionaries.
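The definition of a word as a string bounded by spaces and punctuation can be sketched as a small tokenizer. This is only an illustration: the punctuation set and the name `split_words` are assumptions, not part of the tagger described in these slides.

```python
import re

# Assumption: a word is a maximal run of characters that are neither
# whitespace nor one of these punctuation marks. The set is illustrative.
PUNCTUATION = ".,;:!?\"'()\u0964"  # \u0964 is the Devanagari danda

TOKEN_RE = re.compile(r"[^\s" + re.escape(PUNCTUATION) + r"]+")


def split_words(text):
    """Return the orthographic words of `text`, dropping punctuation."""
    return TOKEN_RE.findall(text)


sentence = "ज्यापासून काल सुरुवात केली, त्यापासून आज नक्कीच सुरुवात करायला नको."
words = split_words(sentence)
```

Each item in `words` is one orthographic string; whether it is a root word or an inflected form is left to the later morphological steps.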
What is Morphology?
• Morphology is the study of forms of words in
the language, especially the different forms
used in declensions, conjugations, and word
building. It deals with the morphemes.
• Morpheme is a term which refers to the
smallest component of a word that (a) seems to
contribute some sort of meaning, or a
grammatical function to the word to which it
belongs, and (b) cannot be decomposed into
smaller morphemes.
Marathi Morphology
Definition of the task and difficulties thereto
• Morphological analysis of Marathi plays a significant role in natural language processing because Marathi, a pan-Indian language, is rich in morphology.
• Marathi is spoken in a centrally situated region, so it is influenced by almost all language groups of India.
• This makes Marathi morphology more complicated.
Marathi Morphology
solutions to the challenges
• Morphological analysis is done category by category.
• The parameters that cause changes in the root word are identified.
• Rules are constructed in tabular form to facilitate computation.
Marathi Word Classes
• Nouns
• Pronouns
• Adjectives
• Verbs
• Adverbs
• Postpositions
• Conjunctions
• Interjections
• Particles
• Punctuation marks
Postpositions
• A postposition is a morpheme that follows a word and shows the relation between that word and other words in the sentence.
• Case markers and shabdayogi avyaya are both classified as postpositions in Marathi because they show the same behavior.
(ref. ‘Classification of Words’, Veena Dixit, Proceedings of the 26th AICL, Shillong, 2004)
Postpositions (continued)
• In Marathi, postpositions are attached to all classes of words except interjections.
• When a postposition is attached to a stem, it mainly produces an adverb, but it can also produce an adjective or a conjunction.
• Postpositions are handled along with the other word classes.
• Five subgroups of postpositions are identified according to the possible order of their attachment and the group of words to which they can be attached.
Particles
• Strings like ही – hi_also, च – cha_only, सुद्धा – suddhaa_also are
– sometimes attached to other words (e.g. खाली – khaali_under – खालीसुद्धा – khaalisuddhaa_under also / झाड – jhaaDa_tree – झाडसुद्धा – jhaaDasuddhaa_tree also)
– and sometimes written separately (e.g. झाडाखाली – jhaaDaakhaali_under the tree – झाडाखाली सुद्धा – jhaaDaakhaali suddhaa_under the tree also).
• When such a particle is attached to another word, the word to which it is attached is not inflected.
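Because the host word is not inflected, an attached particle can in principle be split off and the remainder looked up unchanged. The following is a minimal sketch under that assumption; `LEXICON` is a hypothetical three-word sample built from the examples in these slides, and `split_particle` is an illustrative name.

```python
# The three particles named in the slides, longest first.
PARTICLES = ["सुद्धा", "ही", "च"]

# Hypothetical sample lexicon; a real system would use the full word list.
LEXICON = {"खाली", "झाड", "झाडाखाली"}


def split_particle(token, lexicon):
    """Return (host, particle), or (token, None) if no particle is attached.

    A particle is split off only when the remainder is itself a known word,
    since many ordinary words simply end in the same letters.
    """
    if token in lexicon:
        return token, None
    for particle in PARTICLES:
        if token.endswith(particle):
            host = token[: -len(particle)]
            if host in lexicon:
                return host, particle
    return token, None
```

For example, `split_particle("खालीसुद्धा", LEXICON)` separates खाली from सुद्धा. When the particle is written separately it already arrives as its own token, so it needs no splitting.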
Interjections
• Interjections are identified from the lexicon and stored to produce the tag.
Conjunctions
• Conjunctions are identified from the lexicon and stored to produce the tag.
• Morphology also plays a role in the case of conjunctions.
Conjunctions (continued)
• When some Marathi postpositions are attached to a pair of demonstrative pronouns, they produce a pair of conjunctions in some instances.
जो – ज्यापासून (jo – jyaapaasuna --- which – from which)
तो – त्यापासून (to – tyaapaasuna --- that – from that)
ज्यापासून काल सुरुवात केली, त्यापासून आज नक्कीच सुरुवात करायला नको. – jyaapaasuna kaala suruvaata keli, tyaapaasuna aaja nakkicha suruvaata karaayalaa nako_One should not start from the (same point) from which it was started yesterday.
Pronouns
• The number of inflected forms of a pronoun is almost equal to the number of rules needed to describe that inflection.
• Pronouns and their inflected forms are finite and few compared to those of verbs and nouns.
• All inflected forms of the pronouns are therefore stored to produce the pronoun tag.
• The derivational morphology of pronouns is handled with rules.
Pronouns (continued)
Inflectional forms of pronouns act as adjectives (माझा – maajhaa_my), as adverbs (मला – malaa_to me), or as conjunctions (जो – ज्यापासून (jo – jyaapaasuna --- which – from which), तो – त्यापासून (to – tyaapaasuna --- that – from that)).
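Storing every inflected pronoun form amounts to a lookup table. The sketch below uses only the four forms quoted on these slides. The table layout, the function name and the pairing of माझा and मला with the root मी are assumptions for illustration, not the actual stored data.

```python
# Inflected form -> (root pronoun, how the form behaves in the sentence).
# Only forms quoted in the slides; the full table would hold all 526 forms.
PRONOUN_FORMS = {
    "माझा": ("मी", "adjective"),
    "मला": ("मी", "adverb"),
    "ज्यापासून": ("जो", "conjunction"),
    "त्यापासून": ("तो", "conjunction"),
}


def analyse_pronoun(token):
    """Return (root, function) for a stored pronoun form, else None."""
    return PRONOUN_FORMS.get(token)
```

A plain table suits pronouns because the set of forms is closed and small, so no rule application is needed at tagging time.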
Pronouns (continued)
• Altogether, 29 pronouns have 526 inflectional forms, which are either words or stems.
• 21 paradigms are identified, generating several rules.
Adjectives
• Adjectives are mainly of two kinds: inflectional and non-inflectional.
• Inflectional adjectives inflect for gender, number, and the attachment of a postposition to the noun they modify.
• Adjectives in Marathi agree in gender and number with the nouns they modify.
Adjectives (continued)
• All inflectional adjectives belong to one paradigm, which corresponds to several rules for generating inflectional and derivational forms from an adjective.
• Most adjectives ending in ‘aa’ agree in that form with masculine nouns and are further inflected according to the gender and number of the noun they modify (मोकळा / मोकळी / मोकळे / मोकळ्या_mokaLaa / mokaLi / mokaLe / mokaLyaa_empty).
• There are some exceptions to this rule, such as जादा – jaadaa_extra, नाना – naanaa_different, वाया – vaayaa_wasted.
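The single adjective paradigm and its exceptions can be sketched as a small rule table. This covers only the four forms shown for मोकळा; the table layout and the function name are assumptions, and the real rule set also generates derivational forms.

```python
# Rule table: each row is (transliterated ending, Devanagari ending).
# The Devanagari ending replaces the final vowel sign aa of the adjective.
AA_PARADIGM = [
    ("aa", "ा"),
    ("i", "ी"),
    ("e", "े"),
    ("yaa", "्या"),
]

# Adjectives ending in aa that the slides list as exceptions.
NON_INFLECTING = {"जादा", "नाना", "वाया"}


def adjective_forms(adjective):
    """Return the forms of an adjective under the rule table above."""
    if adjective in NON_INFLECTING or not adjective.endswith("ा"):
        return [adjective]  # treated as non-inflectional
    stem = adjective[:-1]  # drop the final vowel sign
    return [stem + ending for _, ending in AA_PARADIGM]
```

Calling `adjective_forms("मोकळा")` yields the four forms listed above, while `adjective_forms("जादा")` returns the word unchanged. Since only most ‘aa’-ending adjectives inflect, the exception set would need to be complete for this to be reliable.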
Adverbs
• Adverbs are mainly of two kinds: inflectional and non-inflectional.
• Inflectional adverbs inflect for the attachment of postpositions.
खाली – khaali_under --- खालपासून – khaalapaasuna_from underneath
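Unlike a particle, a postposition changes the adverb it attaches to: खाली becomes the stem खाल before पासून. A minimal sketch of the analysis direction follows, assuming a stored stem-to-root table. Both tables hold only the single example above, and all names are hypothetical.

```python
# Postpositions to try; only the one from the example above.
POSTPOSITIONS = ["पासून"]

# Stem that appears before a postposition -> adverb as stored in the lexicon.
ADVERB_STEMS = {"खाल": "खाली"}


def analyse_adverb(token):
    """Return (root adverb, postposition) for an inflected adverb, else None."""
    for postposition in POSTPOSITIONS:
        if token.endswith(postposition):
            stem = token[: -len(postposition)]
            root = ADVERB_STEMS.get(stem)
            if root is not None:
                return root, postposition
    return None
```

Keeping the stem change in a table avoids guessing a general rule from one example; `analyse_adverb("खालपासून")` recovers खाली together with पासून.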
Verbs and nouns will be discussed in the next sessions.
Thank you.
Veena Dixit
11/10/2005