Constraint Based Hindi Parser
LTRC, IIIT Hyderabad
Introduction
- Broad coverage parser
- Very crucial for Indian language to Indian language (IL-IL) machine translation (MT) systems, information extraction (IE), co-reference resolution, etc.
Why Dependency?
- Phrase structures
  - Intrinsically presume order
  - Context Free Grammar (CFG) not well-suited for free-word order languages (Shieber, 1985)
  - Particularly ill suited to Indian languages
- Dependency structures
  - Give flexibility
  - Common structures
  - With appropriate labels, closer to semantics
Computational Paninian Grammar (CPG)
- Based on Panini’s Grammar (500 BC)
- Inspired by an inflectionally rich language (Sanskrit)
- A dependency based analysis
Computational Paninian Grammar (The Basic Framework)
- Treats a sentence as a set of modifier-modified relations
- A sentence has a primary modified, or root (which is generally a verb)
- Gives us the framework to identify these relations
- Relations between a noun constituent and the verb are called ‘karaka’
- karakas are syntactico-semantic in nature
- Syntactic cues help us in identifying the karakas
karta – karma karaka
- Example: The boy opened the lock
  - k1 (karta): open → boy
  - k2 (karma): open → lock
- karta and karma usually correspond to agent and theme, but not always
- karakas are direct participants in the activity denoted by the verb
Basic karaka relations
karaka       role                           relation label
karta        agent/doer/force               k1
karma        object/patient                 k2
karana       instrument                     k3
sampradaan   beneficiary                    k4
apaadaan     source                         k5
adhikarana   location in place/time/other   k7p/k7t/k7

For the complete list of dependency relations, see Begum et al. (2008).
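The label inventory above can be held in a small lookup table. The following is only a sketch: the dictionary name `KARAKA_LABELS` and the helper `describe` are hypothetical, and the entries simply restate the basic relations listed on this slide, not the complete scheme of Begum et al. (2008).

```python
# Basic karaka relations from the slide: label -> (karaka name, role).
# The names KARAKA_LABELS and describe are illustrative, not from the source.
KARAKA_LABELS = {
    "k1": ("karta", "agent/doer/force"),
    "k2": ("karma", "object/patient"),
    "k3": ("karana", "instrument"),
    "k4": ("sampradaan", "beneficiary"),
    "k5": ("apaadaan", "source"),
    "k7p": ("adhikarana", "location in place"),
    "k7t": ("adhikarana", "location in time"),
    "k7": ("adhikarana", "location (other)"),
}


def describe(label: str) -> str:
    """Return a readable description of a relation label."""
    name, role = KARAKA_LABELS[label]
    return f"{label}: {name} ({role})"
```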
Basic karaka relations (examples)
- raama phala khaataa hai
  ‘Ram eats fruit’
- raama chaaku se saiv kaatataa hai
  ‘Ram cuts the apple with a knife’
- raama ne mohana ko pustaka dii
  ‘Ram gave a book to Mohan’
Why Paninian Labels
Other choices for labels could be:
- Grammatical relations (subject, object, etc.)
  - Behavioral tests (Mohanan, 1994)
- Thematic roles (agent, patient, etc.)
  - No concrete cues
  - Difficult to extract them automatically
Karakas can be computationally exploited:
- Syntactically grounded, semantically loaded
- Give a level of interface
Levels of Language Analysis
- Morphological analysis (morph info.)
- Analysis in local context (POS tagging)
- Sentence analysis (chunking, parsing)
- Semantic analysis (word sense disambiguation, etc.)
- Discourse processing (anaphora resolution, informational structure, etc.)
Example
rAma ne mohana ko puswaka xI |
Example – Parsed Output
xI ‘give’
  k1: rAma
  k4: mohana
  k2: puswaka ‘book’
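One way to hold such a parse in memory is as a list of labeled head-to-dependent arcs. This is a minimal sketch, not the parser's actual output format; the `Arc` type and the `dependents` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """A labeled dependency arc from a head word to a dependent word."""
    head: str
    label: str
    dependent: str


# The parsed output of "rAma ne mohana ko puswaka xI |" as arcs from the verb.
parse = [
    Arc("xI", "k1", "rAma"),
    Arc("xI", "k4", "mohana"),
    Arc("xI", "k2", "puswaka"),
]


def dependents(arcs, head):
    """Map each relation label to the dependent attached to `head`."""
    return {arc.label: arc.dependent for arc in arcs if arc.head == head}
```

Calling `dependents(parse, "xI")` returns the verb's arguments keyed by karaka label.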
Parser
- Two stage strategy
- Appropriate constraints formed
- Stage I (intra-clausal relations)
  - Dependency relations marked
  - Relations such as k1, k2, k3, etc. for each verb
- Stage II (inter-clausal relations and conjunct relations)
  - Conjuncts, relative clauses, kriya mula, etc.
Demand Frame for Verb
- A demand frame, or karaka frame, for a verb indicates the demands the verb makes.
- It depends on the verb and its tense, aspect and modality (TAM) label.
- A mapping is specified between karaka relations and vibhaktis (post-positions, suffixes).
Karaka Frame
- It specifies which karakas are mandatory or optional for the verb, and which vibhaktis (post-positions) they take
- Each verb belongs to a specific verb class
- Each class has a basic karaka frame
- Each TAM specifies a transformation rule
Example
rAma mohana ko puswaka xewA hE |

Parsed Dependency Tree
xewA hE ‘give is’
  k1: rAma
  k4: mohana
  k2: puswaka ‘book’
Transformations
Based on the TAM of the verb, the appropriate transformation is applied:
- rAma ne mohana ko KilOnA xi yA | (TAM: yA)
- rAma ko mohana ko KilOnA xe nA padZA | (TAM: nA padZA)
Example
rAma ne mohana ko puswaka xI |
Karaka Frame – xe (give)
Transformation Rule – yA (TAM)
Karaka Frame
rAma ne mohana ko KilOnA xi yA | (yA TAM)

Transformed frame for xe after applying the yA transformation (the k1 vibhakti changes from 0 to ne):

------------------------------------------------------------
arc-label  necessity  vibhakti  lextype  src-pos  arc-dir
------------------------------------------------------------
k1         m          ne        n        l        c
k2         m          0|ko      n        l        c
k3         d          se        n        l        c
k4         d          ko        n        l        c
------------------------------------------------------------
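The transformation step can be sketched as a rewrite of selected frame rows. This is an illustration under stated assumptions, not the parser's implementation: the base frame `BASE_XE` is assumed to differ from the transformed frame only in the k1 vibhakti, the necessity codes are read as `m` = mandatory and `d` = desirable, and the rule format in `TAM_RULES` is hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Demand:
    """One row of a karaka frame."""
    label: str
    necessity: str          # assumed: "m" = mandatory, "d" = desirable
    vibhakti: tuple         # allowed vibhaktis; "0" means no post-position
    lextype: str = "n"
    src_pos: str = "l"
    arc_dir: str = "c"


# ASSUMPTION: base frame for xe (give) before any TAM transformation.
BASE_XE = [
    Demand("k1", "m", ("0",)),
    Demand("k2", "m", ("0", "ko")),
    Demand("k3", "d", ("se",)),
    Demand("k4", "d", ("ko",)),
]

# ASSUMPTION: a TAM rule maps a karaka label to its new vibhaktis.
TAM_RULES = {"yA": {"k1": ("ne",)}}


def transform(frame, tam):
    """Return a new frame with the TAM rule applied; unknown TAMs change nothing."""
    rule = TAM_RULES.get(tam, {})
    return [
        replace(d, vibhakti=rule[d.label]) if d.label in rule else d
        for d in frame
    ]


transformed = transform(BASE_XE, "yA")
```

Keeping the rows immutable and returning a new list leaves the basic frame of the verb class untouched, so the same base frame can be reused for other TAMs.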
Parsed Output
xI ‘give’
  k1: rAma
  k4: mohana
  k2: puswaka ‘book’
Other frames
Adjectives
Steps in Parsing
SENTENCE
→ Morph, POS tagging, Chunking
→ Identify Demand Groups
→ Load Frames & Transform
→ Find Candidates
→ Apply Constraints & Solve
→ Final Parse
Example:
rAma ne mohana ko KilOnA xiyA |
Identify the demand group, Load and Transform DF
- Demand group: xiyA (the only verb)
- Transformed frame, using the ‘yA’ TAM info:

------------------------------------------------------------
arc-label  necessity  vibhakti  lextype  src-pos  arc-dir
------------------------------------------------------------
k1         m          ne        n        l        c
k2         m          0|ko      n        l        c
k3         d          se        n        l        c
k4         d          ko        n        l        c
------------------------------------------------------------
Candidates
rAma ne mohana ko KilOnA xiyA _ROOT_ |

Candidate arcs:
- k1: xiyA → rAma
- k2: xiyA → mohana
- k2: xiyA → KilOnA
- k4: xiyA → mohana
- main: _ROOT_ → xiyA
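Candidate arcs of this kind can be found by matching each chunk's vibhakti against the vibhaktis allowed by each row of the transformed frame. The sketch below assumes the chunks and their vibhaktis are already available from the earlier morph and chunking steps; the function name `find_candidates` and the tuple layout are hypothetical.

```python
# Chunks of the example sentence as (head word, vibhakti); "0" = no post-position.
chunks = [("rAma", "ne"), ("mohana", "ko"), ("KilOnA", "0")]

# Transformed frame for xe as (arc label, necessity, allowed vibhaktis).
frame = [
    ("k1", "m", {"ne"}),
    ("k2", "m", {"0", "ko"}),
    ("k3", "d", {"se"}),
    ("k4", "d", {"ko"}),
]


def find_candidates(verb, frame, chunks):
    """Propose a (verb, label, word) arc wherever a chunk's vibhakti fits a demand."""
    return [
        (verb, label, word)
        for label, _necessity, allowed in frame
        for word, vibhakti in chunks
        if vibhakti in allowed
    ]


candidates = find_candidates("xiyA", frame, chunks)
```

Here mohana receives two candidate labels (k2 and k4) because ko is allowed by both demands; the constraints are what later decide between them.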
Constraints
C1: For each of the mandatory demands in a demand frame for each demand group, there should be exactly one outgoing edge from the demand group labeled by the demand.
C2: For each of the optional demands in a demand frame for each demand group, there should be at most one outgoing edge from the demand group labeled by the demand.
C3: There should be exactly one incoming arc into each source group.
Constraints
- A parse of a sentence is obtained by satisfying all the above constraints
- Ambiguous sentences have multiple parses
- Ill formed sentences have no parse
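The three constraints can be checked directly by enumerating subsets of the candidate arcs. This brute-force search is only a sketch for small examples and is not the integer programming formulation the parser uses; the function names are hypothetical, and the candidate arcs are those of rAma ne mohana ko KilOnA xiyA.

```python
from itertools import combinations


def outgoing(arcs, head, label):
    """Number of chosen arcs leaving `head` with the given label."""
    return sum(1 for h, l, _ in arcs if h == head and l == label)


def incoming(arcs, source):
    """Number of chosen arcs entering `source`."""
    return sum(1 for _, _, d in arcs if d == source)


def find_parses(candidates, mandatory, optional, sources):
    """Return every subset of candidate arcs that satisfies C1, C2 and C3."""
    parses = []
    for size in range(len(candidates) + 1):
        for arcs in combinations(candidates, size):
            c1 = all(outgoing(arcs, h, l) == 1 for h, l in mandatory)
            c2 = all(outgoing(arcs, h, l) <= 1 for h, l in optional)
            c3 = all(incoming(arcs, s) == 1 for s in sources)
            if c1 and c2 and c3:
                parses.append(set(arcs))
    return parses


candidates = [
    ("xiyA", "k1", "rAma"),
    ("xiyA", "k2", "mohana"),
    ("xiyA", "k2", "KilOnA"),
    ("xiyA", "k4", "mohana"),
    ("_ROOT_", "main", "xiyA"),
]
mandatory = [("xiyA", "k1"), ("xiyA", "k2"), ("_ROOT_", "main")]
optional = [("xiyA", "k3"), ("xiyA", "k4")]
sources = ["rAma", "mohana", "KilOnA", "xiyA"]

parses = find_parses(candidates, mandatory, optional, sources)
```

An empty result corresponds to an ill formed sentence, and more than one result to an ambiguous one.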
Parse - I
rAma ne mohana ko KilOnA xiyA _ROOT_ |

_ROOT_
  main: xiyA
    k1: rAma
    k4: mohana
    k2: KilOnA
Integer Programming Constraints
- $x_{ikj}$ represents a possible arc from word group $i$ to word group $j$ with karaka label $k$.
- It takes the value 1 if the solution has that arc and 0 otherwise. It cannot take any other values.
- The constraint rules are formulated into constraint equations.
Constraint Equations
C1: For each demand group $i$, for each of its mandatory demands $k$, the following equalities must hold:

$$M_{ik}: \sum_{j} x_{ikj} = 1$$

C2: For each demand group $i$, for each of its optional or desirable demands $k$, the following inequalities must hold:

$$O_{ik}: \sum_{j} x_{ikj} \le 1$$

C3: For each of the source groups $j$, the following equalities must hold:

$$S_{j}: \sum_{i,k} x_{ikj} = 1$$
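The equations can be generated mechanically from the candidate arcs by numbering each arc as one 0/1 variable. The sketch below returns each equation as a tuple of name, variable numbers, relation and right-hand side; this representation and the name `build_equations` are assumptions made for illustration, and passing the result to an actual integer programming package is left out.

```python
def build_equations(arcs, mandatory, optional, sources):
    """Build C1, C2 and C3 over arcs given as (i, k, j) triples.

    Arc number n (1-based) stands for the 0/1 variable x_n. Each equation is
    (name, variable numbers summed on the left, relation, right-hand side).
    """
    numbered = list(enumerate(arcs, start=1))
    equations = []
    for i, k in mandatory:   # C1: exactly one outgoing arc per mandatory demand
        xs = [n for n, (ai, ak, _) in numbered if (ai, ak) == (i, k)]
        equations.append((f"M[{i},{k}]", xs, "==", 1))
    for i, k in optional:    # C2: at most one outgoing arc per optional demand
        xs = [n for n, (ai, ak, _) in numbered if (ai, ak) == (i, k)]
        equations.append((f"O[{i},{k}]", xs, "<=", 1))
    for j in sources:        # C3: exactly one incoming arc per source group
        xs = [n for n, (_, _, aj) in numbered if aj == j]
        equations.append((f"S[{j}]", xs, "==", 1))
    return equations


arcs = [
    ("xiyA", "k1", "rAma"),
    ("xiyA", "k2", "mohana"),
    ("xiyA", "k2", "KilOnA"),
    ("xiyA", "k4", "mohana"),
    ("_ROOT_", "main", "xiyA"),
]
equations = build_equations(
    arcs,
    mandatory=[("xiyA", "k1"), ("xiyA", "k2"), ("_ROOT_", "main")],
    optional=[("xiyA", "k3"), ("xiyA", "k4")],
    sources=["rAma", "mohana", "KilOnA", "xiyA"],
)
```

A mandatory demand with no candidate arcs yields an empty sum that must equal 1, which no assignment can satisfy; that matches the statement that ill formed sentences have no parse.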
Multiple Frames
- If there is more than one karaka frame for a verb: call the Integer Programming package for each frame
- If there is more than one demand group (e.g., multiple verbs) in the sentence with multiple demand frames: call the Integer Programming package for each combination of such frames
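Enumerating the frame combinations is a Cartesian product over the frames available for each verb. The sketch below shows only that enumeration; the frame identifiers are placeholder strings and `frame_combinations` is a hypothetical name, with the solver call for each combination omitted.

```python
from itertools import product


def frame_combinations(frames_by_verb):
    """Yield one {verb: frame} assignment per combination of available frames."""
    verbs = list(frames_by_verb)
    for combo in product(*(frames_by_verb[v] for v in verbs)):
        yield dict(zip(verbs, combo))


# Placeholder frame names, purely illustrative.
frames_by_verb = {
    "xe": ["xe_frame_1", "xe_frame_2"],
    "khaa": ["khaa_frame_1"],
}
combinations_to_try = list(frame_combinations(frames_by_verb))
```

With two frames for one verb and one for the other, the solver would be called twice, once per combination.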
Other frames
- Common karaka frame
  - Attached to each karaka frame
  - Preference given to main frame if there are clashes
- Fallback karaka frame
  - Used when the required karaka frame is missing
  - Graceful degradation
Stage I: Types being handled
- Simple verbs
- Non-finite verbs: wA_huA, wA_hI, nA, kara, 0_rahe, etc.
- Copula
- Genitive
Example (Complex Sentence)
rAma  ne     phala  khaakara        mohana  ko     KilOnA  xiyA
Ram   ‘ERG’  fruit  ‘having eaten’  Mohan   ‘DAT’  toy     gave
‘Having eaten the fruit, Ram gave the toy to Mohan’
Candidates
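rAma ne phala khaakara mohana ko KilOnA xiyA _ROOT_ |

Candidate arcs:
- X1: k1, xiyA → rAma
- X2: k2, xiyA → KilOnA
- X3: k2, xiyA → mohana
- X4: k2, xiyA → phala
- X5: k4, xiyA → mohana
- X6: k2, khaakara → phala
- X7: vmod, xiyA → khaakara
- X8: main, _ROOT_ → xiyA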
Constraint Equations
Verb ‘xe’
- Mandatory demands (C1)
  - k1: x1 = 1
  - k2: x2 + x3 + x4 = 1
- Optional demands (C2)
  - k4: x5 <= 1
Verb ‘khaa’
- Mandatory demands (C1)
  - k2: x6 = 1
  - vmod: x7 = 1
_ROOT_
- C1
  - main: x8 = 1
Constraint Equations (contd.)
Incoming arcs into source (C3)
- rAma: x1 = 1
- phala: x4 + x6 = 1
- khaa: x7 = 1
- mohana: x3 + x5 = 1
- KilOnA: x2 = 1
- xe: x8 = 1
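With only eight 0/1 variables, the system above can be solved by trying every assignment. The sketch below does this exhaustively as a stand-in for an integer programming package; the list names `equalities` and `at_most_one` are hypothetical, and the equations are copied from the two constraint slides.

```python
from itertools import product

# Each inner list holds the variable numbers whose sum must equal 1.
equalities = [
    [1], [2, 3, 4],                       # C1 for xe: k1, k2
    [6], [7],                             # C1 for khaa: k2, vmod
    [8],                                  # C1 for _ROOT_: main
    [1], [4, 6], [7], [3, 5], [2], [8],   # C3: rAma, phala, khaa, mohana, KilOnA, xe
]
# Each inner list holds the variable numbers whose sum must be at most 1.
at_most_one = [[5]]                       # C2 for xe: k4

found = []
for bits in product((0, 1), repeat=8):
    x = dict(zip(range(1, 9), bits))      # x[n] is the value of variable xn
    if all(sum(x[v] for v in eq) == 1 for eq in equalities) and all(
        sum(x[v] for v in c) <= 1 for c in at_most_one
    ):
        found.append(x)
```

Exactly one assignment survives: x2 = 1 is forced by KilOnA, which rules out x3 and x4, and that in turn forces x5 = 1 for mohana and x6 = 1 for phala.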
Solution Graph
_ROOT_
  main: xiyA
    k1: rAma
    k2: KilOnA
    vmod: khaakara
      k2: phala
    k4: mohana
References
Akshar Bharati and Rajeev Sangal. 1993. Parsing free word order languages in Paninian Framework. In ACL:93, Proc. of Annual Meeting of Association of Computational Linguistics. Association of Computational Linguistics, New Jersey, USA.
Akshar Bharati, Rajeev Sangal, T Papi Reddy. 2002. A Constraint Based Parser Using Integer Programming. In Proc. of ICON-2002: International Conference on Natural Language Processing.
Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti Misra Sharma, Lakshmi Bai and Rajeev Sangal. 2008. Dependency Annotation Scheme for Indian Languages. In Proceedings of The Third International Joint Conference on Natural Language Processing (IJCNLP). Hyderabad, India.
S. M. Shieber. 1985. Evidence against the context-freeness of natural language. In Linguistics and Philosophy, 8, pp. 333–343.
Tara Mohanan. 1994. Argument Structure in Hindi. CSLI Publications.
THANKS!!