Parallel Syntactic Annotation of Multiple Languages

Owen Rambow, Bonnie Dorr, David Farwell,
Rebecca Green, Nizar Habash, Stephen
Helmreich, Eduard Hovy, Lori Levin, Keith J.
Miller, Teruko Mitamura, Florence Reeder,
Advaith Siddharthan
Interlingual Annotation of Multilingual Text Corpora (IAMTC)
• CMU
– Lori Levin, Teruko Mitamura
• Columbia
– Owen Rambow, Advaith Siddharthan
• ISI
– Eduard Hovy
• MITRE
– Keith Miller, Flo Reeder
• New Mexico State University
– David Farwell, Stephen Helmreich
• University of Maryland
– Bonnie Dorr, Rebecca Green, Nizar Habash
Goals of IAMTC
• Design an Interlingua
– Language-independent representation of text meaning
– Useful for MT, IR, IE, QA,…
• Develop an Annotation Methodology
– Manuals, tools, evaluations
• Annotate multi-lingual, multi-parallel texts
– Foreign language original and 2 English translations
– Foreign languages: Arabic, French, Hindi, Japanese,
Korean, Spanish
IL Development: Three Levels
• IL0: syntactic dependency tree
• IL1: semantic annotations
– Concepts:
• ‘senses’ from ISI’s Omega ontology
• for Nouns, Verbs, Adjs, Advs
– Semantic Roles
• Theta Roles from Dorr’s LCS work
• IL2: reconciliation of different IL1s with
same meaning but different syntax:
– Predicate argument structure
– Sentence plan: main and embedded clauses
Outline
• Goals of IAMTC
➢ IL0: A deep syntactic dependency representation
– How and why it is different from other dependency
representations
• Examples:
– Copula
– Future tense
– Causative
– Light verbs
• Comparison to other work
– Prague tectogrammatical representation
– PropBank
Example of IL0
TrEd, Pajas, 1998
Sheikh Mohammed, who is also the
Defense Minister of the United Arab
Emirates, announced at the inauguration
ceremony that “we want to make Dubai
a new trading center”
IL0 Design:
Reduce cross-linguistic Differences
• Retain content words
• Replace function words with syntactic
features
– Tense, definiteness, etc.
• Retain information about the event and
participants
• Neutralize information about the
organization of the information or how it is
communicated
IL0 Features
• Parts of Speech
– Verb, noun, proper noun, adjective, adverb,
preposition, conjunction, determiner, aux
(modal), punctuation, symbols, speech
sounds, misc
• Features of Nouns
– Number, Definiteness
• Features of Verbs
– Progressive, Perfective, Tense, Mood
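The design principles and feature inventory above can be illustrated with a small data structure. This is a minimal sketch only: the names `Node` and `fold_function_words`, the lookup tables, and the rule of attaching a feature to the next content word are all assumptions made for illustration, not the project's actual tools or formats.

```python
from dataclasses import dataclass, field

# Hypothetical lookup tables: which function words map to which features.
DETERMINER_FEATURES = {"the": ("definiteness", "def"), "a": ("definiteness", "indef")}
AUX_FEATURES = {"will": ("tense", "future")}


@dataclass
class Node:
    """One IL0-style node: a content word plus syntactic features."""
    lemma: str
    pos: str  # e.g. "verb", "noun"
    features: dict = field(default_factory=dict)


def fold_function_words(tokens):
    """Turn (word, pos) pairs into nodes, replacing function words by features.

    A known determiner or auxiliary is dropped and its feature is attached
    to the next content word (a simplification that only suits toy input).
    """
    nodes, pending = [], {}
    for word, pos in tokens:
        if pos == "determiner" and word in DETERMINER_FEATURES:
            key, value = DETERMINER_FEATURES[word]
            pending[key] = value
        elif pos == "aux" and word in AUX_FEATURES:
            key, value = AUX_FEATURES[word]
            pending[key] = value
        else:
            nodes.append(Node(word, pos, dict(pending)))
            pending = {}
    return nodes


nodes = fold_function_words(
    [("the", "determiner"), ("cat", "noun"), ("will", "aux"), ("eat", "verb")]
)
for node in nodes:
    print(node.lemma, node.pos, node.features)
```

Only the two content words survive as nodes; the determiner and the auxiliary are kept as a definiteness feature on the noun and a tense feature on the verb.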
Summary of IL0
• No auxiliary verbs
• No determiners
• Add empty arguments
– I want ___ to go
• “Undo” passives and clefts
• Copular sentences are headed by the predicate
– The umbrella is red
• Retain causative markers and light verbs only if they
affect the argument structure of the sentence or have a
literal meaning
• Includes syntactic roles (Subj, Obj, IndObj, Mod)
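The rule that copular sentences are headed by the predicate can be sketched for "The umbrella is red". The dictionary layout, the labels "SUBJ" and "PRED" on the input, the copula lemma "be", and the choice to keep the copula's tense as a feature on the new head are all assumptions of this sketch rather than documented IL0 conventions.

```python
def normalize_copula(tree):
    """Re-head a copular clause on its predicate.

    `tree` is a toy surface analysis of the form
    {"head": ..., "features": {...}, "deps": {label: subtree}}.
    """
    deps = tree["deps"]
    if tree["head"] != "be" or "PRED" not in deps or "SUBJ" not in deps:
        return tree  # not a copular clause in this toy format: unchanged
    predicate = deps["PRED"]
    return {
        "head": predicate["head"],
        # Assumption: the dropped copula's tense moves onto the new head.
        "features": {**predicate["features"], **tree["features"]},
        "deps": {"SUBJ": deps["SUBJ"], **predicate["deps"]},
    }


surface = {
    "head": "be",
    "features": {"tense": "present"},
    "deps": {
        "SUBJ": {
            "head": "umbrella",
            "features": {"number": "sg", "definiteness": "def"},
            "deps": {},
        },
        "PRED": {"head": "red", "features": {}, "deps": {}},
    },
}

il0 = normalize_copula(surface)
print(il0["head"], il0["features"], list(il0["deps"]))
```

After the rewrite the adjective is the root and the subject depends on it directly, so the copula no longer appears as a node.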
Annotations done so far
• Annotations of 6 English Texts
• Each translated from a different source
language
• Two translations of each text
• 10 – 12 annotators for each text
• Approximately 144 annotated texts
total
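The approximate total can be checked by multiplying the counts on this slide. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
source_texts = 6          # English texts, each from a different source language
translations = 2          # two English translations of each text
annotators_low, annotators_high = 10, 12

low = source_texts * translations * annotators_low
high = source_texts * translations * annotators_high
print(low, high)
```

The figure of 144 corresponds to the upper end of the annotator range (12 annotators for every text).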
IL0 Annotation Manuals
• English
• Arabic
• French
• Hindi
• Japanese
• Korean
• Spanish
Outline
• Goals of IAMTC
• IL0: A deep syntactic dependency representation
– How and why it is different from other dependency
representations
➢ Examples:
– Copula
– Future tense
– Causative
– Light verbs
• Comparison to other work
– Prague tectogrammatical representation
– PropBank
Copula
• English: overt copula
– The umbrella was red.
• Arabic: overt copula in past tense
– kAnat AlmiZl~apu HamrA’F
• Japanese: optional copula (desu)
– Kasa wa akai.
IL0 for Copula Sentences
IL1 for Copula Sentence
Future Tense
Spanish: Llegará Juan
English: Juan will arrive
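Spanish marks the future with verbal inflection while English uses the auxiliary "will"; with function words replaced by features, both can end up with the same structure apart from the lemma. The following is a toy sketch under that reading: the two analyzer functions and the suffix rule are hypothetical and stand in for real morphological analysis.

```python
import unicodedata

FUTURE_3SG_SUFFIX = "r\u00e1"  # "rá", written with an escape to avoid encoding issues


def analyze_spanish(verb_form, subject):
    """Toy analysis of a third-person singular future form ending in "-rá"."""
    form = unicodedata.normalize("NFC", verb_form).lower()
    if not form.endswith(FUTURE_3SG_SUFFIX):
        raise ValueError("this sketch only handles forms ending in -rá")
    return {
        "head": form[:-1],                # "llegará" -> "llegar"
        "pos": "verb",
        "features": {"tense": "future"},  # inflection becomes a feature
        "deps": {"SUBJ": subject},
    }


def analyze_english(words):
    """Toy analysis of a three-word clause 'SUBJECT will VERB'."""
    subject, aux, verb = words
    if aux != "will":
        raise ValueError("this sketch only handles the auxiliary 'will'")
    return {
        "head": verb,
        "pos": "verb",
        "features": {"tense": "future"},  # auxiliary becomes a feature
        "deps": {"SUBJ": subject},
    }


def shape(node):
    """Everything except the language-specific lemma."""
    return (node["pos"], node["features"], node["deps"])


es = analyze_spanish("Llegar\u00e1", "Juan")
en = analyze_english(["Juan", "will", "arrive"])
print(es["head"], en["head"], shape(es) == shape(en))
```

The two analyses differ only in the head lemma (llegar vs. arrive); part of speech, tense feature, and subject are identical.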
Causative Sentences in English,
Japanese, and Arabic
• English: main clause and embedded clause
I made [the cat eat the fish]
• Japanese: productive causative morpheme
Watashi-wa neko-ni sakana-wo tabe-sase-ta
I-TOP cat-DAT fish-ACC eat-CAUSE-PAST
• Arabic: lexical causatives
>ak~altu AlqiT~apa Alsamakpa
Eat-CAUSE cat.DEF.ACC fish.DEF.ACC
IL0 for causative sentences in
English, Japanese, and Arabic
English:
Make[V,past]
  SUBJ: I[N]
  OBJ: eat[V]
    SUBJ: cat[N,sg,def]
    OBJ: fish[N,sg,def]
Japanese:
sase[V,past]
  SUBJ: watashi[N]
  OBJ: tabe[V]
    SUBJ: neko[N,sg,def]
    OBJ: sakana[N,sg,def]
Arabic:
>ak~al[V,cause,past]
  SUBJ: Empty[N]
  IOBJ: cat[N,sg,def]
  OBJ: fish[N,sg,def]
Reduce differences between languages but only to the extent allowed
by the syntax, morphology, and lexical items
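The three causative trees on this slide can be compared mechanically. The sketch below encodes one reading of the slide (English and Japanese with an embedded clause under the causative head, Arabic with a single lexical causative verb) as nested tuples; the encoding and the helper names `skeleton` and `depth` are assumptions for illustration.

```python
# Each tree is (node_label, {role: subtree}); leaves have an empty dict.
ENGLISH = ("Make[V,past]", {
    "SUBJ": ("I[N]", {}),
    "OBJ": ("eat[V]", {
        "SUBJ": ("cat[N,sg,def]", {}),
        "OBJ": ("fish[N,sg,def]", {}),
    }),
})
JAPANESE = ("sase[V,past]", {
    "SUBJ": ("watashi[N]", {}),
    "OBJ": ("tabe[V]", {
        "SUBJ": ("neko[N,sg,def]", {}),
        "OBJ": ("sakana[N,sg,def]", {}),
    }),
})
ARABIC = (">ak~al[V,cause,past]", {
    "SUBJ": ("Empty[N]", {}),
    "IOBJ": ("cat[N,sg,def]", {}),
    "OBJ": ("fish[N,sg,def]", {}),
})


def skeleton(tree):
    """Keep only role labels and nesting, dropping words and features."""
    _, deps = tree
    return {role: skeleton(sub) for role, sub in deps.items()}


def depth(tree):
    """Number of levels in the tree, counting the root as 1."""
    _, deps = tree
    return 1 + max((depth(sub) for sub in deps.values()), default=0)


print(skeleton(ENGLISH) == skeleton(JAPANESE))  # same role structure
print(skeleton(ENGLISH) == skeleton(ARABIC))    # different role structure
print(depth(ENGLISH), depth(JAPANESE), depth(ARABIC))
```

Under this encoding the English and Japanese trees share one skeleton while the Arabic tree stays flat, which matches the point that IL0 reduces differences only as far as syntax, morphology, and lexical items allow.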
Hindi Light Verbs
Hum santre kha gaye
We oranges eat went
“We ate oranges”
Hindi Light Verbs
Ram santra kha-kar jayega
Ram orange eat-then go
“Ram will eat the orange and then leave”
Outline
• Goals of IAMTC
• IL0: A deep syntactic dependency representation
• Examples:
– Copula
– Future tense
– Causative
– Light verbs
➢ Comparison to other work
– Prague tectogrammatical representation
– PropBank
Comparison to other work
• Compared to annotation projects
– IAMTC is an interlingua project
– IAMTC annotates multi-lingual, multi-parallel
texts in order to reconcile differences between
languages
• Compared to interlingua design projects
– IAMTC is a corpus driven project
– IAMTC is an annotation project
Comparison to Tectogrammatical
Representation
• IL0 has only syntactic relation labels
– In IL0: all adjuncts are marked “adj”
• IL0 retains strongly governed prepositions
– give X to Y
• IL0: prepositions are heads
– But there is some flexibility for each language
to decide
Comparison to PropBank
• IAMTC is more syntactic
• Thematic paraphrases: same arguments
filling the same roles for the same verb
– Load hay on truck/load truck with hay
– Same in PropBank
– Different in IL0
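The contrast can be made concrete with the load/hay/truck pair. In the sketch below the IL0-style entries record grammatical functions and the role-based entries record who plays which part in the event; the labels "PREP:on", "PREP:with", "CARGO", and "DESTINATION" are placeholders chosen for this illustration, not the actual IL0 or PropBank label sets.

```python
# Toy syntactic analyses (IL0-style): grammatical function -> filler.
SYNTACTIC = {
    "load hay on truck": {"OBJ": "hay", "PREP:on": "truck"},
    "load truck with hay": {"OBJ": "truck", "PREP:with": "hay"},
}

# Toy role-based analyses (PropBank-style): event role -> filler.
ROLE_BASED = {
    "load hay on truck": {"CARGO": "hay", "DESTINATION": "truck"},
    "load truck with hay": {"CARGO": "hay", "DESTINATION": "truck"},
}


def same_analysis(table):
    """True if both paraphrases receive an identical analysis."""
    first, second = table.values()
    return first == second


print(same_analysis(SYNTACTIC), same_analysis(ROLE_BASED))
```

The syntactic table distinguishes the two paraphrases because the direct object changes, while the role-based table assigns them the same analysis.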
End
Extra Slides
IL0 Differences Between
Languages
• Morphological features on nodes different
between languages
• No raising verbs in Arabic, Hindi, Japanese,
Korean; raising verbs have no subject
John seems to like beans
• Serial verbs in Hindi: additional verb with only
aspectual meaning (?) treated as dependent
on main verb
hum santre kha gaye
we oranges eat went
‘We ate the oranges’
IL0 Differences Between
Languages (2)
• Morphological causatives in Japanese:
causative morpheme is head
私は (猫に 魚を 食べ-) -させた
1sg-TOP (cat-DAT fish-OBJ eat-) -CAUSE-PAST
I made the cat eat the fish
• Prepositions as heads in all our
languages, but probably not others
(Czech)
Summary:
What is Normalized Where?
• Syntactic variation: IL0
– The gangster killed at least 3 innocent bystanders
– At least 3 innocent bystanders were killed by the
gangster
• Lexical synonymy: IL1
– The toddler sobbed, and he attempted to console her
– The baby wailed, and he tried to comfort her
• Diathesis alternation: IL1 (caveat)
– The men loaded hay into the trucks
– The men loaded the trucks with hay
Summary:
What is Normalized Where?
• Part-of-speech class and derivational
morphology: IL1/2
– I was surprised that he destroyed the old house
– I was surprised by his destruction of the old house
• Possession: IL1
– Dubai’s oil, oil of Dubai
• Clause combination: IL2
– This is Joe’s new car, which he bought in New York
– This is Joe’s new car. He bought it in New York
Summary:
What is Normalized Where?
• Different argument realizations: IL1/2
– Bob enjoys playing with his kids
– Playing with his kids pleases Bob
• Noun-noun compounds: IL2
– She loves velvet dresses
– She loves dresses made of velvet
• Head switching: IL2
– Mike Mussina excels at pitching
– Mike Mussina pitches well
– Mike Mussina is a good pitcher
Summary:
What is Normalized Where?
• Overlapping meanings: IL2
– Lindbergh flew across the Atlantic Ocean
– Lindbergh crossed the Atlantic Ocean by
plane
• Locus of Negation: IL2
– I have not bought any cheese
– I have bought no cheese
Summary:
What is Normalized Where?
• Light verbs: IL2
– conduct a tightening = tighten
– witness a growth rate of = grow by
• Direct and indirect discourse: IL2
– said “X” vs. said that X
Not Normalized
at IL0, IL1, or IL2
• Logical inferences
– He’s smarter than everybody else
– He’s the smartest one
• Real-World Inference
– The tight end caught the ball in the end zone
– The tight end scored a touchdown
• Different syntactic sentence types, same
pragmatic meaning
– Who composed the Brandenburg Concertos?
– Tell me who composed the Brandenburg Concertos
Not Normalized
at IL0, IL1, or IL2
• Viewpoint variation
– The U.S.-led invasion/liberation/occupation of
Iraq
– He is getting in the way vs.
He is only trying to help
Differences from other projects
• Eurotra, Euro WordNet, UNL
– Share the goal of defining an interlingua
– Don’t share the goal of producing an annotated
corpus
• ParGram
– Grammars for several languages developed in close
consultation
– Based on the assumption of universal grammar
– Not an annotation project
– Not corpus based
Getting at Meaning
(Two translations of Korean original text)
Translation 1:
Starting on January 1 of next year, SK Telecom subscribers can switch to less expensive LG Telecom or KTF. … The Subscribers cannot switch again to another provider for the first 3 months, but they can cancel the switch in 14 days if they are not satisfied with services like voice quality.
Translation 2:
Starting January 1st of next year customers of SK Telecom can change their service company to LG Telecom or KTF … Once a service company swap has been made, customers are not allowed to change companies again within the first three months, although they can cancel the change anytime within 14 days if problems such as poor call quality are experienced.
Getting at Meaning
(Two translations of Korean original text, repeated with color-coded phrase alignment)
• black: same words, same meaning
• green: small syntactic differences
• blue: lexical differences
• red: not contained in other text
• purple: more complex relations