GECCo - International scientific conference «Corpus linguistics

A CORPUS LINGUISTIC
STUDY OF ELLIPSIS AS A
COHESIVE DEVICE
Katrin Menzel
Institute of Applied Linguistics, Translation and
Interpreting, Saarland University
Corpus Linguistics Conference – 27 June 2013, St. Petersburg
GECCo project
http://www.gecco.uni-saarland.de/GECCo/Home.html
GECCo: German-English Contrasts in Cohesion
• supported by DFG
1st phase 2011-2013
2nd phase 2013-2016
• Project Team:
Marilisa Amoia
Ekaterina Lapshinova
Erich Steiner
Kerstin Kunz
Katrin Menzel
Main research questions
Which systemic resources of cohesion are
instantiated in English and German texts in
different registers/genres?
How frequent are they?
Which cohesive meanings do they express?
Research goals
• analyse cohesive resources provided by
the language systems and their instantiations in
texts
• explore contrasts in form, frequency,
function and meaning relations across and
between languages, registers and
production types
Motivation
Filling major research gaps:
Comprehensive accounts of cohesion: only
existent from a monolingual perspective (e.g.
Halliday & Hasan 1976)
empirical monolingual or contrastive analyses on
text and discourse level mainly deal with
individual phenomena
CORPUS RESOURCES
procedures to extract cohesive
phenomena require compilation,
annotation and exploitation of GECCo
Corpus (written and spoken texts)
assumption: no clear dividing line but a
continuum from written to spoken
written part of GECCo is a translation
corpus and consists of various genres
(popular-scientific, fictional and tourism
texts, prepared speeches, political essays,
corporate communication, instruction
manuals, websites) of English and
German original texts that are aligned
with their translations
spoken part of the corpus is a
comparable corpus of English
and German original texts
(interviews, academic lectures,
web-forum, talkshows…)
Corpus resources
http://fedora.clarin-d.uni-saarland.de/cqpweb/
http://de.clarin.eu/en
http://fedora.clarin-d.uni-saarland.de/cqpweb/doc/Simple_query_language.pdf
Types of cohesive devices
Present Study:
ellipsis
types: nominal, verbal, clausal
(cf. Halliday&Hasan, 1976)
across:
- languages: English vs. German
- registers: different text types
- production types: originals vs.
translations
Research goals
describing ellipsis from a cross-linguistic
viewpoint in English and German
enhancing corpus linguistic methods to
cover a comprehensive variety of ellipses
in different registers of spoken and
written language in a bilingual corpus
(GECCo) of about 1 million words
Defining cohesive ellipsis
Ellipsis as a cohesive device is the omission of an
element normally required by the grammar that
can be recovered from the linguistic context.
Halliday/Hasan: nominal, verbal, clausal ellipsis
Examples:
There are two approaches to problem-solving: the
empirical [ ] and the rational [ ].
I want to help you, but I can’t [ ].
What is the capital of the Philippines? – Manila [ ].
Ellipsis as a cohesive device
• cohesive ellipsis vs. other types of ellipsis
and fragments (e.g. headlines, exophoric
ellipsis without textual antecedent,
lexicalised ellipsis)
• missing information must be supplied
from the surrounding co-text (usually
anaphorically)
Some differences between English and
German
e.g. nominal ellipsis: ellipsis remnant
has to show strong morphological
agreement in order to license the
elided noun in German
ein grünes [Haus], keine [Häuser], keins [?]
in a few cases, this also happens in
English (mine, none…)
Verbal ellipsis in English and German
lack of correspondence between English and German verbal
system → more differences between E/G than with regard
to nominal ellipsis
e.g. inclusive imperative: Let’s [go]. / Let’s not. (does not
exist in German: *Lass uns!)
English examples in GECCo: many subtypes of verbal ellipsis
with varying degree of complexity
German: mainly ellipses of modal verb complement
(Er muss [ ])
Clausal ellipsis in English and German
Differences G/E: case
Von wem wurde der Junge untersucht? –
(Von) Einer Psychologin.
* Eine Psychologin.
Who was the boy examined by? –
A psychologist.
Sluicing:
Er will jemandem schmeicheln, aber sie wissen nicht wem [ ]
He wants to flatter someone, but they don't know who [ ].
Practical Issues
Annotating / querying ellipsis
in corpora
Manual annotation with MMAX2
→ to compare with automatic annotation
http://www.h-its.org/english/research/nlp/download/mmax.php
Pointer relation can be used to link a bridging expression to its bridging antecedent.
CQP queries: to query empty elements
we have to find syntactically incomplete
or deficient structures
German: Stuttgart-Tübingen-TagSet
STTS, English: Penn Treebank tagset
Querying corpus with CQP
Sample patterns nominal ellipsis
CQP query design:
1. possessive marker 's not followed by noun
[word='s'][pos!='nn|ne']
2. nominal ellipsis after article/det/numeral/quantifier/possessive marker (+ optional adjective)
e.g. in German subcorpora:
[pos='adja'][pos='vafin']; (adjective + finite verb) or
[pos='art'][pos='adja'][pos!='nn|ne']; (article + adjective, not followed by noun/proper noun)
in English subcorpora (different tagset):
[pos='jj'][pos='vv.*']; (adjective + verb)
…
Examples English:
Mrs. Wood’s [hat]
That was your dream. Kim’s [dreams] were all nightmares.
I accept the first argument, but reject the other two [ ] / the third [ ].
While Kim had lots of books, Pat had very few [ ].
I went up that skyscraper in Boston, but the tallest [ ] is in Chicago.
Sample CQP queries
GO:
• [pos='adja'][pos='vafin'];
(adjective + finite verb);
• [pos='art'] [pos='adja'][pos!='nn|ne'];
(article + adjective, not followed by
noun/proper noun)
• EO/ETRANS (different tagset) :
[pos='jj'][pos='vv.*']; (adj. + verb)
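The query patterns above can be emulated outside CQP for illustration. What follows is only a minimal sketch, not the CQP engine: the helper name match_pos_pattern is hypothetical, the example sentence is tagged by hand (the lowercase tags follow the spelling used in the queries above), and each regular expression is assumed to match the whole tag.

```python
import re

def match_pos_pattern(tokens, pattern):
    """Return the start indices where consecutive tokens satisfy the pattern.

    tokens:  list of (word, pos) pairs
    pattern: list of (regex, negated) pairs, one per token position;
             negated=True mimics CQP's pos != '...'
    """
    hits = []
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        ok = all(
            (re.fullmatch(rx, pos) is None) == negated
            for (_, pos), (rx, negated) in zip(window, pattern)
        )
        if ok:
            hits.append(start)
    return hits

# Hand-tagged illustration (tags assigned manually for this sketch).
tokens = [
    ("Der", "art"), ("größte", "adja"), ("und", "kon"),
    ("schönste", "adja"), ("ist", "vafin"),
    ("der", "art"), ("Naschmarkt", "ne"), (".", "$."),
]

# [pos='art'][pos='adja'][pos!='nn|ne']
ART_ADJ_NO_NOUN = [("art", False), ("adja", False), ("nn|ne", True)]
# [pos='adja'][pos='vafin']
ADJ_FINVERB = [("adja", False), ("vafin", False)]

art_adj_hits = match_pos_pattern(tokens, ART_ADJ_NO_NOUN)
adj_verb_hits = match_pos_pattern(tokens, ADJ_FINVERB)
print(art_adj_hits, adj_verb_hits)
```

Both patterns flag the same elliptical noun phrase from different ends: the first finds "Der größte" followed by a non-noun, the second finds "schönste ist". Hits of this kind are candidates only and still need manual checking.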
→ some manual correction necessary
difficulty for tagger: in English, many
ellipsis remnants have multiple word class
membership
→ pronouns (e.g. "other": det/adj/pron),
words ending in -ing:
the second being very...
- to know whether "being" is a verb or a noun,
the context has to be taken into account
(tagging is sometimes wrong and leads to
irrelevant examples in query results)
e.g. "one": number/pronoun/det/adj/noun
- sometimes used with nominal ellipsis,
sometimes with nominal substitution:
the green one (= nominal substitution)
we saw one [lion] (= nominal ellipsis)
sometimes ellipsis remnants are
zero derivations (especially in
English, this additionally contributes
to word class ambiguity for taggers,
e.g. N/V: salt, ship; Adj./N: modal)
- some nominalised elements (tagged as adj.
/ numerals), which often refer to people or
abstract concepts + lexicalised / context-free
ellipsis also have to be sorted out manually:
- the immoral, the rich,
- the elderly, a 1 year old
- the Fantastic Four
- the big two [?] (referring to Oxford and
Cambridge university, lexicalised?)
- lexicalised idiomatic ellipsis: eine [ ] rauchen
Normalized frequencies of typical ellipsis subtypes per
100,000 words in 4 German & English registers of GECCo

Register       Nominal ellipsis   Verbal   Clausal       ∑
GO Interview               62.2      9.7      42.2   114.1
EO Interview              129.3     58.0      42.2   229.5
GO Academic               124.4      9.8      43.9   178.1
EO Academic               131.0     29.7      12.4   173.1
GO Fiction                114.2     38.1      51.7   204.0
EO Fiction                154.1     37.8      27.0   218.9
GO Tourism                 24.6     13.7      16.4    54.7
EO Tourism                 52.9      5.6       0      58.5
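The normalization behind these figures can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function name per_100k and the raw counts in the usage line (57 hits in 45,000 tokens) are hypothetical. The per-register values are copied from the table and used only to check that the ∑ column equals the sum of the three subtypes.

```python
def per_100k(hits, corpus_tokens):
    """Normalize a raw hit count to a frequency per 100,000 tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return hits / corpus_tokens * 100_000

# Hypothetical raw counts, for illustration only.
example_rate = per_100k(57, 45_000)

# (nominal, verbal, clausal) per 100,000 words, as reported in the table.
reported = {
    "GO Interview": (62.2, 9.7, 42.2),
    "EO Interview": (129.3, 58.0, 42.2),
    "GO Academic": (124.4, 9.8, 43.9),
    "EO Academic": (131.0, 29.7, 12.4),
    "GO Fiction": (114.2, 38.1, 51.7),
    "EO Fiction": (154.1, 37.8, 27.0),
    "GO Tourism": (24.6, 13.7, 16.4),
    "EO Tourism": (52.9, 5.6, 0.0),
}

# Row totals, rounded to one decimal place like the table.
totals = {register: round(sum(values), 1) for register, values in reported.items()}

for register, total in totals.items():
    print(f"{register}: {total}")
```

Normalizing to a common base is what makes registers of different sizes comparable; the raw hit counts themselves are not given in the slides.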
Spoken registers EO/GO GECCo:
Redundant elements were inserted rather than
elided, and words were repeated, even in an
ungrammatical way, to remind the hearer of items that
were mentioned earlier in the text.
- Da machen wir etwas was es absolut verrückt ist.
- Ich war bis 1975 war ich in Stuttgart (GO Interview)
- For me it’s important is identifying where you come
from. (EO Interview)
Translation as a cause of
linguistic change with
regard to cohesive devices
cohesive devices, especially ellipsis and
substitution, are particular elements
where translations involve specific
shifts and some kind of ‘fingerprints’
(Gellerstam 2005) or ‘shining through’
(Teich 2003) from the source language
into the target language
‘shining through’ (Teich 2003) from source
language into target language:
empirically identifiable traces of source
language interference in terms of
proportional frequencies of constructions
that have the potential to spread from
translated to non-translated target language
texts
- translation-induced language change is
subtle and often overlooked, but in recent
years, some interesting studies have
demonstrated the significance of translation
as a site of language contact (e.g. House
2006)
- lexical and orthographic level is probably
affected most frequently as words are
sometimes borrowed through translation
- source language interference with regard to
syntactic or discourse-structural patterns,
such as the use of cohesive devices, is more
complex and less easily perceptible without a
quantitative analysis of proportional
frequencies in larger text corpora
- using translation and parallel text corpora,
House (2011) for instance has demonstrated
that textual norms in German are adapted to
anglophone ones
analysis of GECCo corpus indicates that, compared to
English originals, English translations of German texts
include a higher frequency of nominal ellipsis after
adjectives where we would normally expect for
example ‘one/s’, ‘of them’, a general or a specific noun:
(1) ein Denken …, das strenger ist als das begriffliche [ ]
translation: a thinking more rigorous than the
conceptual [ ]
(2) Der größte und schönste [ ] ist der Naschmarkt.
translation: The largest and most impressive [ ] is
Naschmarkt.
On the other hand, translations into
English seem to have a higher
frequency of 'one' as a substitute
where it is not obligatory (e.g. after
'next', 'second', 'another', 'which').
translators often insert ‘tun‘ in the case of
English lexical verb ellipsis or use it as a direct
translation of ‘do’
If we do not, no one else will [ ].
translation in corpus: Wenn wir es nicht tun,
wird niemand es tun.
just as Ukraine and South Africa had done and
as Libya is doing today
translation: so wie es die Ukraine und Südafrika
getan haben, und wie Libyen es heute tut
corpus extraction results show that the
number of hits for the lemma ‘tun’ is much
higher in German translations
(41 / 100,000 words) than in German
originals (29 / 100,000 words)
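To make the size of this difference concrete, the two normalized frequencies can be compared directly. A minimal sketch using only the two reported figures; the function name relative_increase is hypothetical. Since the raw counts and subcorpus sizes are not given here, no significance test is attempted.

```python
def relative_increase(rate_a, rate_b):
    """Return (ratio, percentage increase) of rate_a relative to rate_b."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    ratio = rate_a / rate_b
    return ratio, (ratio - 1) * 100

# Reported hits of the lemma 'tun' per 100,000 words.
TUN_TRANSLATIONS = 41.0
TUN_ORIGINALS = 29.0

ratio, pct = relative_increase(TUN_TRANSLATIONS, TUN_ORIGINALS)
print(f"ratio: {ratio:.2f}, increase: {pct:.1f}%")
```

On these figures, ‘tun’ is roughly 1.4 times as frequent in the German translations as in the German originals.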
translations contribute to semantic
bleaching of this verb (writers of
German original texts usually tend to
avoid the verb ‘tun’ as a substitute for
a main verb for stylistic reasons)
depending on various factors such as
standardization of the language and genre
and amount and prestige of translated
texts, language specific structures and
innovations may spread from translated to
non-translated target language texts
References:
Evert, S. 2005. The CQP Query Language Tutorial. IMS, Universität Stuttgart.
Gellerstam, M. 2005. Fingerprints in Translation. In: G. Anderman & M. Rogers (eds.). In and out of English: For Better, for Worse. Clevedon: Multilingual Matters. pp. 201-213.
Halliday, M. A. K. & R. Hasan. 1976. Cohesion in English. London: Longman.
House, J. 2006. Covert Translation, Language Contact, Variation and Change. In: SYNAPS 19. pp. 25-47.
House, J. 2011. Using translation and parallel text corpora to investigate the influence of Global English on text norms in other languages. In: A. Kruger et al. (eds.). Corpus-based Translation Studies. London: Continuum.
Kunz, K. & E. Lapshinova-Koltunski. 2011. Tools to Analyse German-English Contrasts in Cohesion. In: Proceedings of GSCL-2011, Hamburg, Germany.
Neumann, S. & S. Hansen-Schirra. 2005. The CroCo Project. Cross-linguistic corpora for the investigation of explicitation in translations. In: Proceedings from the Corpus Linguistics Conference Series (PCLC). Vol. 1, no. 1.
Steiner, E. 2008. Empirical studies of translations as a mode of language contact - “explicitness” of lexicogrammatical encoding as a relevant dimension. In: P. Siemund & N. Kintana (eds.). Language contact and contact languages. Amsterdam: John Benjamins (Hamburg Studies in Multilingualism Vol. 7). pp. 317-346.
Teich, E. 2003. Cross-Linguistic Variation in System and Text: A Methodology for the Investigation of Translations and Comparable Texts. Berlin: Mouton de Gruyter.
Thank you for your attention!
Do you have any questions? Comments?
Katrin Menzel