Interlingua Methodology
 Directly obtain the meaning of the source sentence.
 Do target sentence generation from the meaning
representation.
 John gave the book to Mary.
 Meaning representation:
 give-action:
 agent: john
 object: the book
 receiver: mary
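The frame above can be sketched in code as a language-neutral record from which each target language generates its own word order. This is only an illustration: the class name, the two generator functions and the verb-final gloss format are assumptions, not part of any system described in these slides.

```python
from dataclasses import dataclass


@dataclass
class GiveFrame:
    """Meaning of a giving event; field names follow the slide's frame."""
    agent: str
    object: str
    receiver: str


def generate_english(frame: GiveFrame) -> str:
    # English template: subject, verb, object, then the recipient.
    return f"{frame.agent} gave {frame.object} to {frame.receiver}."


def generate_verb_final_gloss(frame: GiveFrame) -> str:
    # Hypothetical verb-final ordering, written as an English gloss.
    return f"{frame.agent} {frame.receiver}-to {frame.object} gave."


frame = GiveFrame(agent="John", object="the book", receiver="Mary")
print(generate_english(frame))
print(generate_verb_final_gloss(frame))
```

Both generators read the same frame; nothing in the frame itself records the source word order.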
Competing approaches
Direct
Transfer based
Direct approach
 Word replacements
I like mangoes
मैं अच्छा लग आम
I like (root) mangoes
 Morphology
मैं अच्छा लगता आम
I like mangoes
 Syntactic re-arrangement
मैं आम अच्छा लगता है
I mangoes like
 Semantic embellishment
मुझे आम अच्छा लगता है
I (dative) mangoes like
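The four stages of the direct approach can be sketched as a chain of small string transformations. This is a toy under stated assumptions: the three-word lexicon, the romanised Hindi forms (main, aam, achchha lagta hai, mujhe) and every function name are illustrative, and each stage is hard-wired to this single sentence.

```python
# Toy lexicon: English word -> romanised Hindi root (illustrative only).
LEXICON = {"I": "main", "like": "achchha lag", "mangoes": "aam"}


def replace_words(words):
    # Stage 1: word-for-word replacement from the bilingual lexicon.
    return [LEXICON[w] for w in words]


def apply_morphology(words):
    # Stage 2: inflect the verb root (hard-wired for this one verb).
    return ["achchha lagta" if w == "achchha lag" else w for w in words]


def rearrange(words):
    # Stage 3: subject-verb-object becomes subject-object-verb plus auxiliary.
    subject, verb, obj = words
    return [subject, obj, verb, "hai"]


def embellish(words):
    # Stage 4: this verb takes a dative subject, so "main" becomes "mujhe".
    return ["mujhe" if w == "main" else w for w in words]


def translate(sentence):
    words = sentence.split()
    for stage in (replace_words, apply_morphology, rearrange, embellish):
        words = stage(words)
    return " ".join(words)


print(translate("I like mangoes"))
```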
Transfer Based
Source sentence processed for parsing, chunking etc.
[S [NP I] [VP [V like] [NP mangoes]]]
Transfer Based
Transfer structures obtained for the target sentence.
[S [NP I] [VP [NP mangoes] [V like]]]
Transfer Based
Morphology and language specific modifications
[S [NP मुझे] [VP [NP आम] [V अच्छा लगता है]]]
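The structural step of the transfer approach, moving the verb after its object inside the VP, can be sketched on a nested-tuple parse tree. The tuple encoding and the function names are assumptions made for illustration; a real transfer component would carry many such rules.

```python
def transfer(tree):
    """Recursively reorder every [VP V NP] into [VP NP V]."""
    if isinstance(tree, str):          # a leaf word
        return tree
    label, *children = tree
    children = [transfer(child) for child in children]
    if (label == "VP" and len(children) == 2
            and children[0][0] == "V" and children[1][0] == "NP"):
        children.reverse()             # verb-object becomes object-verb
    return (label, *children)


def leaves(tree):
    """Read the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


source = ("S", ("NP", "I"), ("VP", ("V", "like"), ("NP", "mangoes")))
target = transfer(source)
print(leaves(target))
```

Lexical replacement and morphology would then operate on the leaves of the reordered tree.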
MT Architectures: Vauquois' triangle
Relation Between the Transfer and the Interlingua Models
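[Figure: Vauquois' triangle. On the source side, parsing takes source language words to the source language parse tree, and interpretation takes the parse tree up to the interlingua at the apex. Transfer links the source language parse tree to the target language parse tree. On the target side, generation leads down from the interlingua to the target language parse tree and from there to target language words.]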
State of Affairs
 Systran reports 19 different language
pairs.
 8 are adequate for their intended use.
 Even fewer are capable of quality written
or spoken text translation.
ENGLISH-SPANISH-ENGLISH
 ...In that Empire, the Art of Cartography attained such
Perfection that the map of a single Province occupied
the entirety of a City, and the map of the Empire, the
entirety of a Province
 ... en ese imperio, el arte de la cartografía logró tal
perfección que el mapa de una sola provincia ocupó
la totalidad de una ciudad, y el mapa del imperio, la
totalidad de una provincia
 ... in that empire, the art of the cartography obtained
such perfection that the map of a single province
occupied the totality of a city, and the map of the
empire, the totality of a province
Provided by Systran on 19/11/02
ENGLISH-KOREAN-ENGLISH
 ...In that Empire, the Art of Cartography attained such Perfection
that the map of a single Province occupied the entirety of a City,
and the map of the Empire, the entirety of a Province
저 제국안에, 단순한 지방의 지도가 도시
의 완전을 점유했다 고 Cartography의 예
술은 같은 얀벽,및 제국, 지방의 완전의
지도 를 달성했다
 Inside that empire, the map of the region where it is simple
occupied the perfection of the city the art of the Cartography is
same, yan it attained the map of of perfection of the wall and
empire and region
Provided by Systran on 19/11/02
UNL Based MT: the scenario
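[Figure: UNL sits at the centre, linked to English, Russian, French and Hindi. Enconversion leads from a natural language into UNL; deconversion leads from UNL back into a natural language.]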
Universal Networking
Language
 Common language for computers to express
information written in natural language
(Uchida et al. 2000)
 Application:
 Electronic language to overcome language barrier
 Information Distribution System
UNL Example
[UNL graph with arrange as the head node]
agt(arrange, John)
obj(arrange, meeting)
plc(arrange, residence)
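A UNL graph of this kind can be held as a list of labelled edges. The sketch below stores the example as (relation, head, dependent) triples and looks up the arguments of a head; the triple layout and the helper name are assumptions for illustration, not a UNL tool API.

```python
# Each edge is (relation label, head word, dependent word).
relations = [
    ("agt", "arrange", "John"),
    ("obj", "arrange", "meeting"),
    ("plc", "arrange", "residence"),
]


def arguments_of(head, edges):
    """Map each relation label to the dependent attached to `head`."""
    return {label: dep for label, h, dep in edges if h == head}


print(arguments_of("arrange", relations))
```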
Components of the UNL System
 Universal Word
 Relation Labels
 Attributes
Universal Word
[साया] "shadow(icl>darkness)"; the place was now in shadow
[लेशमात्र] "shadow(icl>iota)"; not a shadow of doubt about his guilt
[संकेत] "shadow(icl>hint)"; the shadow of the things to come
[छाया] "shadow(icl>deterrant)"; a shadow over his happiness
Universal Word
(foreign concepts)
[aput] "snow(icl>thing)";
[pukak] "snow(aoj<salt like)";
[mauja] "snow(aoj<soft, aoj<deep)";
[massak] "snow(aoj<soft)";
[mangokpok] "snow(aoj<watery)";
Relation
agt (agent) Agt defines a thing which initiates an action.
agt (do, thing)
Syntax
agt[":"<Compound UW-ID>] "(" {<UW1>|":"<Compound UW-ID>}
"," {<UW2>|":"<Compound UW-ID>} ")"
Detailed Definition
Agent is defined as the relation between:
UW1 - do, and
UW2 - a thing
where:
UW2 initiates UW1, or
UW2 is thought of as having a direct role in making UW1 happen.
Examples and readings
agt(break(icl>do), John(icl>person)) John breaks
agt(translate(icl>do), computer(icl>machine)) computer translates
Attributes
 Used to describe what is said from the
speaker's point of view.
 In particular captures number, tense,
aspect and modality information.
Example Attributes
 I see a flower
UNL: obj(see(icl>do), flower(icl>thing))
 I saw flowers
UNL: obj(see(icl>do).@past, flower(icl>thing).@pl)
 Did I see flowers?
UNL: obj(see(icl>do).@past.@interrogative,
flower(icl>thing).@pl)
 Please see the flowers.
UNL: obj(see(icl>do).@request, flower(icl>thing).@pl.@definite)
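Expressions such as these can be split mechanically into a relation label, two universal words and their attributes. The parser below is a minimal sketch that handles only the binary form shown on these slides (no compound UW-IDs, no scopes); the class and function names are assumptions.

```python
import re
from typing import NamedTuple


class UW(NamedTuple):
    headword: str
    constraint: str      # text inside the parentheses, e.g. "icl>do"
    attributes: tuple    # e.g. ("@past", "@pl")


class Relation(NamedTuple):
    label: str
    uw1: UW
    uw2: UW


UW_RE = re.compile(
    r"^(?P<head>[^(.@]+)(?:\((?P<cons>[^()]*)\))?(?P<attrs>(?:\.@\w+)*)$"
)


def parse_uw(text):
    match = UW_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a universal word: {text!r}")
    attrs = tuple(re.findall(r"@\w+", match.group("attrs")))
    return UW(match.group("head"), match.group("cons") or "", attrs)


def split_top_level(text):
    """Split at the first comma that is not nested inside parentheses."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return text[:i], text[i + 1:]
    raise ValueError(f"expected two arguments in {text!r}")


def parse_relation(text):
    label, _, rest = text.strip().partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced relation: {text!r}")
    left, right = split_top_level(rest[:-1])
    return Relation(label, parse_uw(left), parse_uw(right))


rel = parse_relation("obj(see(icl>do).@past.@interrogative, flower(icl>thing).@pl)")
print(rel.label, rel.uw1.attributes, rel.uw2.attributes)
```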
The Analyser Machine
[Figure: the analyser machine. The Enconverter uses the analysis rules and the dictionary to process the node list (nodes ni-1 to ni+3, marked A and C) and to build the node-net (nodes A to E).]
Strategy for Analysis
 Morphological Analysis
 Syntactico-Semantic Analysis
Analysis of a simple sentence
<< A Report of John’s genius reached King’s ears>>
The article and noun are combined and the attribute @indef is added to the noun.
<<[Report ][of] John’s genius reached king’s ears>>
Right shift to put preposition with the succeeding noun.
<</Report /[of ][John’s] genius reached king’s ears>>
John’s being a possessive noun, shift right.
<</Report //of / [John’s] [genius] reached king’s ears>>
These two nouns are resolved into relation pos and first noun is deleted:
Simple sentence (continued)
<</Report /[of][genius] reached King’s ears>>
The preposition of is then combined with noun and a dynamic attribute OFRES is
added to entry of genius.
<<[Report][of genius ] reached King’s ears>>
Using the attribute OFRES these two nouns are resolved to relation mod and the
second noun is deleted.
<<[Report ][reached] King’s ears>>
Shift right again and solve King’s ears, relation pof is generated.
<</Report /[reached][ ears]>>
Relation obj is generated here and then relation agt is generated between Report
and ears
<</reached />>
UNL as Interlingua and
Language Divergence
(Dave, Parikh, Bhattacharyya, JMT, 2003)
 Stands for the discrepancy in representation
due to the inherent characteristics of the
languages.
 Syntactic Divergence
 Lexical Semantic Divergence
Issue of free word order
जीम ने चोरी करनेवाले लड़के को लाठी से मारा।
जीम ने लाठी से चोरी करनेवाले लड़के को मारा।
चोरी करनेवाले लड़के को जीम ने लाठी से मारा।
चोरी करनेवाले लड़के को लाठी से जीम ने मारा।
लाठी से जीम ने चोरी करनेवाले लड़के को मारा।
 Use is made of the fact that Hindi postpositions stay adjacent to their nouns (as opposed to the preposition stranding divergence).
 Flexibility in parsing: hit and preserve the predicate till the end.
Conjunct and Compound verbs
Typical Indian language phenomenon. Compound for verb-verb, conjunct for other POS+verb.
H: वह गाने लगी
E: She started singing
H: चले जाओ
E: Go away.
H: रुक जाओ
E: Stop there.
H: झुक जाओ
E: Bend down.
Possibility of combinatorial explosion in the lexicon. Possible
solution: wordnet?
Use of Lexical Resources
 Automatic Generation of the UW to language dictionary
(Verma and Bhattacharyya, Global Wordnet Conference, Czech Republic, 2004)
 Universal Word generation
 Semantic attribute generation
 Heavy use of wordnets and ontologies
Languages under Study
Language    Analysis Status      Generation Status
English     D- 60000, R- 5000    D- 60000, R- 400
Hindi       D- 75000, R- 5700    D- 75000, R- 6500
Marathi     D- 4000, R- 2200     D- 4000, R- 6000
Bengali     D- 500, R- 1800      D- 500, R- 2100
Conclusions
 Predicate preservation strategy used for
English, Hindi, Marathi, Bengali (Spanish
being added).
 Focus on morphology for Marathi.
 Focus on kaarak (case) system for Bengali.
 Extremely lexical knowledge hungry.
Conclusions
 Work going on in the creation of Indian language
wordnets (Hindi, Marathi in IIT Bombay; Dravidian in
Anna University).
 Interlingua has the attractive possibility of being used as a knowledge representation and applied to interesting applications such as summarization, text clustering, and meaning-based multilingual search engines.
Generation of the Hindi Case System in an
Interlingua based MT
Framework
Debasri Chakrabarti, Sunil Kumar Dubey, Pushpak Bhattacharyya.
Computer Science and Engineering Department,
Indian Institute of Technology, Bombay,
Mumbai, 400076, India.
debasri,dubey,[email protected]
Introduction
Role of the case marker in a language:
it plays an important role in the structure of a sentence
it helps to impart meaning and naturalness
Example
*मोटे तौर पर कृषि भूमि की जुताई, फसलों की रुपाई, कटाई, पालतू पशु प्रजनन, पालन, दुग्ध-व्यवसाय और वनीकरण सम्मिलित होता है।
In a broad sense, agriculture includes cultivation of the soil and
growing and harvesting crops and breeding and raising livestock
and dairying and forestry.
The Case System in Hindi
Hindi is characterized by a rich subsystem of
case
Example:
राम ने रवि को किताब दी।
Ram Erg Ravi Dat book Nom give + past
Ram gave a book to Ravi.
Hindi has the following cases
nominative, ergative, accusative, instrumental, dative, genitive, locative
Language Universal Case Feature
Case: Conditions
Nominative (NOM): case of the subjects. If a language has two distinct cases for subjects, one inflected and the other without inflection, then NOM refers to the uninflected one.
Ergative (ERG): the inflected case associated with the subject
Accusative (ACC): case attached with the object
Language Universal Case Feature
Case: Conditions
Dative (DAT): case of goals/recipients
Instrumental (INS): case of instruments used to accomplish an action
Genitive (GEN): case of possessors
Locative (LOC): case of physical place
Case features of Hindi
1. Nominative. Marker: Ө
a. Subject. Example: राम आम खा रहा था।
b. Inanimate primary object. Example: राम आम खा रहा था।
2. Ergative. Marker: ने
a. Agentive subject with perfective aspect. Example: राम ने श्याम को किताब दी थी।
b. Simple past. Example: राम ने किताब पढ़ी।
Case features of Hindi
3. Accusative. Marker: को
a. Animate primary object. Example: राम ने सीता को देखा।
b. Definite, inanimate primary object. Example: राम ने उस किताब को पढ़ा।
4. Dative. Marker: को
Goal of the sentence. Example: राम ने सीता को किताब दी।
5. Ablative. Marker: से
Source. Example: पेड़ से पत्ते गिर रहे हैं।
Case features of Hindi
6. Instrumental. Marker: से
a. Instrument. Example: राम ने चाकू से फल काटा।
b. Intermediary agent [cause]. Example: राम ने सीता से चिट्ठी लिखवाई।
c. Denoted agent of passive. Example: राम से खाना नहीं खाया गया।
7. Genitive. Markers: का, की, के
a. Possessor [involving ownership of something]. Example: राम की किताब अच्छी है।
Case features of Hindi
7. Genitive. Markers: का, की, के
b. Relationship to somebody. Example: राम का भाई अच्छा है।
8. Locative. Markers: में, पर
a. में: in, within. Example: राम दिल्ली में रहता है।
b. पर: on, at. Example: राम पेड़ पर चढ़ गया।
Nominative ~ Ergative alternation in
the agent position
agent of an action may bear either nominative
case or ergative case
ergative case appears in Hindi
 simple past form
 perfective aspect
Examples
 राम ने रवि को पीटा।
Ram erg Ravi acc beat+past
Ram beat Ravi.
 राम ने रवि को पीटा था।
Ram erg Ravi acc beat+past perfect
Ram had beaten Ravi.
 राम ने रवि को पीटा है।
Observations
There is a correlation between the ergative case
and the aspectual property of the main verb
This is morphologically overt on the verb
 Simple Past Tense: पीटा
 Perfective Aspect: पीटा था
Morphological Rule
 Simple Past Tense: V + आ → ने
 Perfective Aspect: V + आ + (Tense morphology) → ने
Nominative ~ Ergative Alternation
 Some Complex Phenomena
 the agent can bear nominative case even with the aspectual features mentioned above
 Is the nominative ~ ergative alternation subject to transitivity?
cross-linguistically, transitivity determines nom ~ erg
in Hindi, three patterns occur independently of transitivity
Nominative ~ Ergative Alternation
Three patterns are:
only nom agents
only erg agents
either nom or erg agents
Examples of Intransitive verbs
 Only nom agents
i) राम गिरा।
Ram+nom fall+past
ii) *राम ने गिरा।
Ram erg fall+past
Ram fell down.
Intransitive Verbs
 Only erg agents
i) राम ने प्रतीक्षा की।
Ram erg wait+past
ii) *राम प्रतीक्षा किया।
Ram+nom wait+past
Ram waited.
 Either nom or erg agents
i) राम खेला।
Ram+nom play+past
ii) राम ने खेला।
Ram erg play+past
Ram played.
Transitive Verbs
 Only nom agents
i) राम शीशा लाया।
Ram+nom glass bring+past
ii) *राम ने शीशा लाया।
Ram erg glass bring+past
Ram brought the glass.
 Only erg agents
i) राम ने शीशा तोड़ा।
Ram erg glass break+past
ii) *राम शीशा तोड़ा।
Ram+nom glass break+past
Ram broke the glass.
Transitive Verbs
 Either nom or erg agents
i) राम ने समझा कि घर मेरा है।
Ram erg think+past that house mine is
ii) राम समझा कि घर मेरा है।
Ram think+past that house mine is
Ram thought that the house is mine.
Inferences
 Ergative case in Hindi is semantically driven
 action performed deliberately : ergative case
 action performed non deliberately: nominative
case
 Examples of deliberate and non-deliberate action
राम गिरा।
Ram+nom fall+past
Ram fell down.
राम ने मोहन को गिराया।
Ram erg Mohan acc cause to fall down
Ram made Mohan fall down.
Accusative ~ Nominative Alternation in the
Object
 Primary objects in Hindi
either accusative : को
or nom uninflected : Ө
 Examples
राम ने चावल खाया।
Ram erg rice+nom eat+past
Ram ate rice.
राम ने रावण को मारा।
Ram erg Ravan acc kill+past
Ram killed Ravan.
Accusative ~ Nominative Alternation
 Generalization
animate objects are accusative
inanimate objects are nominative
 Counter examples of this generalization
accusative case with the inanimate objects
राम ने किताब को उठाया।
Ram erg book acc lift+past
Ram lifted the book.
राम ने किताब उठाई।
Ram erg book lift+past
Ram lifted a book.
Summarization of the theoretical
approach
 NOM~ERG Alternation
subject to the semantic property of the verb
this semantic property is conscious choice
 NOM~ACC Alternation
subject to animacy
subject to definiteness
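The two alternations summarised above reduce to two small decisions. The sketch below is a simplification under stated assumptions: it treats deliberateness as a yes/no property of the verb (so verbs that allow either case are not modelled), it names the tense and aspect values with invented strings, and it returns an empty string for the unmarked nominative.

```python
ERGATIVE_CONTEXTS = {"simple_past", "perfective"}  # illustrative labels


def agent_marker(deliberate: bool, tense_aspect: str) -> str:
    """Return 'ने' (ergative) or '' (unmarked nominative) for the agent."""
    if deliberate and tense_aspect in ERGATIVE_CONTEXTS:
        return "ने"
    return ""


def object_marker(animate: bool, definite: bool) -> str:
    """Return 'को' (accusative) or '' (unmarked nominative) for the object."""
    if animate or definite:
        return "को"
    return ""


# "Ram broke the glass": deliberate action, simple past, indefinite inanimate object.
print(repr(agent_marker(True, "simple_past")), repr(object_marker(False, False)))
```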
How to generate the Case Markers in the
UNL System
Three components to the generation system
Lexicon
Rule Base
UNL Expression
Lexicon
attribute for a verb is taken from a verb hierarchy
attribute of conscious choice is [DLBRT-ACT]
[DLBRT-ACT] stands for deliberate action
Case Markers in the UNL System
Rule Base
Hindi is a SOV language
left insertion rules are used frequently for Hindi
a child node is almost always inserted to the left of the parent node given in the UNL expression
Format for the Rule
:"<COND1>:<ACTION1>:<RELATION1>:<ROLE1>"{<COND2>:<ACTION2>:<RELATION2>:<ROLE2>}
[Figure: step-by-step generation from the UNL graph eat@entry with agt Sachin and obj Rice. Between <<SHEAD>> and <<STAIL>>, चावल and सचिन are inserted to the left of खा@entry.]
Interpretation of a rule
 Example of a left insertion rule
:"<agt:+blk,+agt,+!ne,+sufc:agt:"{V,>agt,@past,DLBRT-ACT,^@progress:+!agt::}P242;
{} indicates parent node
“ ” indicates child node
condition of the parent node
V,>agt,@past,DLBRT-ACT,^@progress
condition of the child node
<agt
priority is denoted by P followed by a priority number
P242
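A rule written in this format can be taken apart mechanically: the quoted part describes the child node, the braced part the parent node, and the trailing P-number the priority. The parser below is a sketch of that reading only; the class names and the regular expression are assumptions, and it does not interpret what the individual conditions or actions mean.

```python
import re
from typing import NamedTuple


class NodeSpec(NamedTuple):
    conditions: tuple
    actions: tuple
    relation: str
    role: str


class Rule(NamedTuple):
    child: NodeSpec
    parent: NodeSpec
    priority: int


RULE_RE = re.compile(
    r'^:"(?P<child>[^"]*)"\{(?P<parent>[^}]*)\}P(?P<priority>\d+);$'
)


def _split_items(text):
    # Comma-separated items; an empty field yields an empty tuple.
    return tuple(item for item in text.split(",") if item)


def _parse_node(text):
    # Four colon-separated fields: condition, action, relation, role.
    cond, action, relation, role = text.split(":")
    return NodeSpec(_split_items(cond), _split_items(action), relation, role)


def parse_rule(text):
    match = RULE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a generation rule: {text!r}")
    return Rule(
        _parse_node(match.group("child")),
        _parse_node(match.group("parent")),
        int(match.group("priority")),
    )


rule = parse_rule(
    ':"<agt:+blk,+agt,+!ne,+sufc:agt:"'
    '{V,>agt,@past,DLBRT-ACT,^@progress:+!agt::}P242;'
)
print(rule.parent.conditions, rule.priority)
```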
Generation of the Case marker on the
Agent
 Rules for the generation system to handle the case of the
agent
R1
:"<agt:+blk,+agt,+!ne,+sufc:agt:"{V,>agt,@past,DLBRT-ACT,^@progress:+!agt::}P242;
R2
:"<agt:+blk,+agt,+sufc:agt:"{V,>agt,@past,^@progress:!agt::}P241;
 Priority plays an important role in generation
 R1 → Ergative
 R2 → Nominative
Generation of the Case marker on the
Object
 Rules for the generation system to handle the case of
the object
R5
:"<obj,INANI,MALE,^V,^SCOPE:+obj,+sufc,+blk:obj:"{>obj:+!obj,+male::}P160;
R6
:"<obj,ANIMT,MALE,^V,^SCOPE:+obj,+sufc,+blk,+!ko:obj:"{>obj:+!obj,+male::}P160;
R7
:"<obj,@def,MALE,^V,^SCOPE:+obj,+sufc,+blk,+!ko:obj:"{>obj:+!obj,+male::}P163;
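How priority decides between overlapping rules can be sketched with the child conditions of R5 to R7. Three assumptions are made here and are not confirmed by the slides: a condition prefixed with ^ means the feature must be absent, the <obj condition is reduced to a plain feature named obj, and the rule with the larger P-number wins when several rules match.

```python
# (name, child conditions, marker inserted, priority); conditions copied from R5-R7.
RULES = [
    ("R5", {"obj", "INANI", "MALE", "^V", "^SCOPE"}, "", 160),
    ("R6", {"obj", "ANIMT", "MALE", "^V", "^SCOPE"}, "को", 160),
    ("R7", {"obj", "@def", "MALE", "^V", "^SCOPE"}, "को", 163),
]


def matches(conditions, features):
    """Plain conditions must be present; ^-prefixed ones must be absent."""
    for cond in conditions:
        if cond.startswith("^"):
            if cond[1:] in features:
                return False
        elif cond not in features:
            return False
    return True


def select_rule(features):
    """Return the matching rule with the highest priority, or None."""
    candidates = [rule for rule in RULES if matches(rule[1], features)]
    if not candidates:
        return None
    return max(candidates, key=lambda rule: rule[3])


# A definite inanimate object matches both R5 and R7; R7 has the higher priority.
print(select_rule({"obj", "INANI", "MALE", "@def"})[0])
```

Under these assumptions, giving R7 a higher priority than R5 is what lets a definite inanimate object receive को while an indefinite one stays unmarked.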
Conclusion
 Result
 provides accuracy in the generation of case markers for the UNL
relations (see table)
 lends naturalness in the generation of the Hindi sentences
 This alternation is extended for the pronominal cases
 Future Work
 enhance the study for Dative and Genitive case markers
and their corresponding UNL relations
Demo