Non-discreteness in language and focal structure
A. A. Kibrik (Institute of Linguistics RAS and Moscow State University)
НПММВЯ, 12 October 2013

The problem
We tend to think about language as a system of discrete, segmental units (phonemes, morphemes, words, sentences...)
But this view does not survive an encounter with reality

Simple example: morpheme fusion
Russian adjective детский ‘children’s, childish’
  det-sk-ij
  child-Attr-M.Nom
  Root-Suffix-Ending
[d’eck’-ij]: the segment [c] belongs to both the root and the suffix
Many human languages have something like that in morphological structure

Similar phenomena abound at all linguistic levels
Phonemes
Syllables
Words
Clauses
Sentences

Phonemes
Coarticulation: cat, keep, cool
Engwall (2000): articulographic study of how the pronunciation of Swedish fricatives is affected by surrounding vowels
Sequences such as asa, ɪsɪ, ɔsɔ, ʊsʊ, aɕa, ɔʂɔ, ʊfʊ, etc.
For example: the context of labial vowels strongly increases lip protrusion

Phonemes (continued)
Also, the tongue is more anterior in the context of the front vowel /ɪ/ compared to back vowels (Engwall 2000: 10)
That is, boundaries between “segments” are not really segmental
Trying to posit boundaries in the signal inevitably means a kind of digitalization

Syllables
Language speakers often naturally “feel” the syllabic structure
But segmentation into syllables is usually less than clear-cut
For example, speakers of Pulaar confidently segment words into syllables, e.g. gor|ko ‘man’
But cf. the behavior of geminated consonants
  • On the one hand, when asked to segment a word into syllables, speakers of Pulaar usually posit a boundary between the two copies of a geminated consonant: hok|kam ‘give me’
  • On the other hand, a Pulaar secret language is reported in which the encrypting sequence lfV is inserted after the first syllable of a word (Gaden 1914, Labouret 1952: 108):
    hokkam ndiyam ‘give.me water’ > holfokkam ndilfiyam
Geminates such as kk are thus inconsistent:
  • in some way they belong to two different syllables
  • in some other way they form the onset of a syllable (Koval 2000: 114, 185)

Words
Possessive constructions N + N
Analytic of-genitive: the retinue of the queen
On the one hand, of is a preposition and thus clearly belongs to the possessor rather than to the possessed: the retinue [of the queen], lots [of stuff]
On the other hand, there are indications of reanalysis
  • Jurafsky et al. 1998: of is so often reduced that one must posit the allomorph [ɔ]
  • Native users of English feel that and render it in spelling, also altering the affiliation of the clitic: lots of > lotsa, couple of > coupla, “Kinda outta luck” (song by Lana Del Rey)
  • Graphic practices of this kind suggest that language users attach the clitic of to the possessed rather than to the possessor
  • In terms of Nichols 1986, in these kinds of examples English hesitates between dependent-marking and head-marking behavior
Of displays double-faced behavior in two ways:
  • like any clitic, it is a semi-word, that is, something between a word and an affix
  • it oscillates between two possible hosts

Clauses
Widely held view of “syntax from the discourse perspective” (see Chafe 1994):
Local discourse structure consists of quanta, or chunks, or elementary discourse units (EDUs) (Kibrik and Podlesskaya eds. 2009)
EDUs can be defined by a set of prosodic criteria
Thus identified EDUs typically coincide with clauses
The level of such coincidence mostly varies within the range between 1/2 and 3/4

Clauses (continued)
Language                                      Percentage of clausal EDUs
English (Chafe 1994)                          60%
Mandarin (Iwasaki and Tao 1993)               39.8%
Sasak (Wouk 2008)                             51.7%
Japanese (Matsumoto 2000)                     68%
Russian (Kibrik and Podlesskaya eds. 2009)    67.7%
Upper Kuskokwim (Kibrik 2012)                 70.8%

Clauses (continued)
However, there is a significant residue
Non-clausal EDUs
Subclausal EDUs
  • Increments (translation from a Russian spoken corpus, “Night Dream Stories”, Kibrik and Podlesskaya eds. 2009)
    And suddenly I saw a box. With a ribbon on top.
Increments appear after a clear prosodic boundary
At the same time, they semantically and grammatically fit into the preceding base clause
Such increments simultaneously belong and do not belong to the preceding clause
They are outliers in clause structure

Paradigmatics
So far we have only discussed difficulties associated with the syntagmatic identification of units
The same problem applies to paradigmatic boundaries
That is, boundaries between classes, types, or categories in an inventory
Marginal phonemes
  • “One might consider the voiceless velar fricative /x/ occurring in words such as Bach (the German composer) or loch (a Scottish lake) as a marginal phoneme for some speakers of English” (Brinton and Brinton 2010: 53)
Russian [w] in loan words
  • Russian has phonemes /v/ and /u/
  • English William > Russian Вильям Vil’jam [v] or Уильям Uil’jam [u], increasingly [w]
  • English wow > Russian: usually spelled вау vau, pronounced [wau]

Semantics
Semantics provides particularly abundant evidence of non-discrete boundaries
A plethora of examples have been discussed in cognitive semantics
Textbook example from Labov’s 1973 “The boundaries of words and their meanings”: cup vs. bowl

Diachronic change
Diachrony provides innumerable examples of non-discrete boundaries between linguistic elements or stages
Hock and Joseph 1996: 237-238
Old English wēod ‘plant’ and wæ̅d(e) ‘garment’
Both developed into modern English weed
The meaning ‘garment’ only survives in a couple of expressions, such as widow’s weed ‘a widow’s mourning clothes’
Modern speakers tend to connect this usage with the winning weed
The erstwhile meaning of wæ̅d(e) is echoed in the modern language as a faint trace

Language wholeness
Languages are identifiable, but every language has internal variation
Consider a very small language, Upper Kuskokwim Athabaskan
Ethnic group of about 200 individuals in central interior Alaska
About 20 remaining speakers
The members of the group have a clear feeling of identity, as well as separateness from other neighboring Athabaskan languages
Still, striking dialectal variation
In particular, the rendering of the Proto-Athabaskan coronal consonant series
[Figure © Michael Krauss, 2011]

Language wholeness (continued)
Dialect               Interdental    Dental    Retroflex    As in
                      ‘my tongue’    ‘snow’    ‘raven’
Conservative          sitsula’       tsetł'    dotron'      Tanana
Standard merger       sitsula’       tsetł'    dotron'      Tsetsaut
Downriver merger      sitsula’       tsetł'    dotson'      Koyukon
Merger of all three   sitsula’       tsetł'    dotson'      Ahtna
Outcome labels on the slide: no merger, loss of interdentals, loss of retroflex

Language wholeness (continued)
Note that the rendering of the coronal series is traditionally used as the basis for classifying the family into branches
This situation can be explained by geographical and demographic factors
The Upper Kuskokwim traditional territory probably occupied over 50,000 square kilometers
Traditionally, contact between families/bands was seasonal or sporadic
Still, what identifies the language’s wholeness and boundaries in terms of internal characteristics?

Proto-languages
Linguists often speak about proto-languages (Proto-Germanic, Proto-IE, etc.) as if they were spoken by fixed, 100% homogeneous communities without any internal variation
Dahl (2001) discussed the status of Old Nordic
He questions the notion of Common Nordic and the assumption that the Scandinavians “changed their language all at the same time and in the same fashion, as if conforming to a EU regulation on the length of cucumbers” (p. 227)
Contrary to the traditional tree-like picture of a proto-language splitting into daughter languages, Dahl suggests that the spread of prestige dialects may have led to a decrease in diversity and to unification

Language contact
Trudgill 2011: 56-58
Contact with Low German affected Scandinavian languages significantly
This influence can generally be described as simplification
That was possible because in the 1400s cities such as Bergen and Stockholm had a population that was about one third or more German
When the non-native population approaches 50%, natives accommodate
Boundaries between languages are thus penetrable
Almost all languages are creoles to a degree

Other cognitive domains
Studies by the Russian psychologist Yuri Alexandrov
Alexandrov and Sergienko 2003: psychophysiological experiments demonstrate the non-disjunctive character of mind and behavior
  • “Continuity is the overarching principle in the organization of living things at various levels” (p. 105)
Alexandrov and Alexandrova 2010: complementary, non-disjunctive character of cultures
  • Niels Bohr, discussing the relationships between cultures, emphasized that, “unlike physics <...> there is no mutual exclusion of properties belonging to different cultures”.

Intermediate conclusion
Language (as well as cognition in general) simultaneously:
  • longs for discrete, segmented structure
  • tries to avoid it
The omnipresence of non-discreteness effects has not yet led to proper recognition in mainstream linguistic thinking
Linguists are often bashful about non-discreteness
But non-discreteness is not just a nuisance
Non-discrete effects permeate every single aspect of language
This problem is at the core of theoretical debates about language

Possible reactions
“Digital” linguistics: ignore non-discrete phenomena or dismiss them as minor
  • Ferdinand de Saussure: language only consists of identities and differences
  • the discreteness delusion
  • appeal of scientific rigor
  • but reductionism, a bit too simplistic
More inclusive (“analog”) linguistics:
  • often a mere statement of continuous boundaries and countless intermediate/borderline cases

Cognitive science
Wittgenstein: family resemblance
Rosch: prototype theory
Lakoff: radial categories
[Diagram of a category with members A, B, C, and D; picture from Janda and Nesset 2012]
A is the prototypical phoneme/word/clause/meaning...
B, C, and D are less prototypical representatives
We still need a theory for:
  • boundaries between related categories
  • boundaries in the syntagmatic structure

My main suggestion
In the case of language we see a structure that combines the properties of the discrete and the non-discrete: focal structure
Focal phenomena are simultaneously distinct and related
Focal structure is a special kind of structure found in linguistic phenomena, alternative to the discrete structure
It is the hallmark of linguistic and, possibly, cognitive phenomena, in contrast to simpler kinds of matter

Various kinds of structures
[Diagram contrasting three ways of relating elements 1 and 2: discrete structure, continuous structure, and focal structure. Labels in the focal structure: focal point (or anchor point) 1, outlier, hybrid, focal point 2.]

A possible analogy: neuronal structure with synapses

Examples
Domain         Focal point 1    Hybrid             Focal point 2
Syntagm.       det              [c]                sk
Paradigm.      v                w                  u
Diachr.        wēod             (widow’s) weed     wæ̅d(e)
Lg. contact    Old Norse        Norwegian          Low German
etc., etc.

Caveat
The claim about non-discrete boundaries should not be overstated
Phonemes, words, clauses, and languages do exist
They are just not as discrete and segmental as we apparently want them to be
We should not replace the discrete structure with the idea of a mere continuum, basically non-structure
Cf. Goddard 2011: 233 defending the discrete character of meaning by dismissing the idea of a continuum or merging
Something like focal structure is in order as the major model of linguistic and cognitive “matter”

Peripheral status of non-discrete phenomena in linguistics
Are linguists unaware of non-discreteness effects?
No, they are aware of them: “distinct but related”
But they tend to ignore them
Why? I am not sure
But I suspect the answer is related to Kant’s well-known problem

Kant’s puzzle
The Critique of Pure Reason: the role of the observer, or cognizer, crucially affects the knowledge of the world
“The schematicism by which our understanding deals with the phenomenal world ... is a skill so deeply hidden in the human soul that we shall hardly guess the secret trick that Nature here employs.”
It is possible that the human analytical mind is digital, and it wants its object of observation to be digital as well
In addition, standards of scientific thought have developed on the basis of physical, rather than cognitive, reality
Physical reality is much more amenable to the discrete approach
Compared to the physical world, in the case of language and other cognitive processes Kant’s problem is much more acute, because the mind here functions both as an observer and as an object of observation, so making the distinction between the two is difficult

A paradoxical state of affairs
Language is full of non-discrete phenomena
But our “digital” mind is biased towards discreteness
“The tyranny of the discontinuous mind” (R. Dawkins)
It is like eyeglasses keeping only a part of the reality and filtering out the rest
Addressing the “analog” reality in its entirety is often perceived as pseudo-science, or quasi-science at best
Language is unknowable, a Ding an sich?
Perhaps, partly because of the scientific tradition based on segmentation and categorization (Aristotelian, “rational”, “left-hemispheric”, etc.)

What to do?
We need to develop a more embracing linguistics and cognitive science that address non-discrete phenomena:
  • not as exceptions or periphery of language and cognition
  • but rather as their core
Can we outwit our mind?
Two suggestions towards this goal
1. Object of investigation: concentrate on obviously non-discrete communication channels, not so burdened with the tradition of discrete analysis
2. Methodology: new type of models

SUGGESTION 1: Look at communication channels other than verbal
Explore gesticulation accompanying speech
Explore prosody
These communication channels are obviously less discrete than the verbal code
So it may be a good idea to develop new theoretical approaches on the basis of gesticulation and prosody, then apply them to traditional, “segmental” language
Michael Tomasello (2009): in order to “understand how humans communicate with one another using a language <…> we must first understand how humans communicate with one another using natural gestures”
I discuss a case study in “Reference in discourse” (2011)
Sandro Kodzasov (2009): “there is a multitude of prosodic techniques <...> defining the basic gestalts of our perception of the world”

Sentences
In written language, sentences are separated from each other by dedicated punctuation marks
Is the notion of sentence applicable to spoken language?
  • cf. the “written language bias” (Linell 2005): written language, inherently digital, hypnotizes people and makes them think that language is generally discrete
“Is sentence viable?” (Kibrik 2008)
In brief, spoken Russian displays two major prosodic patterns:
  • “comma intonation” ( /, ): rising on the main accent of the EDU
  • “period intonation” ( \. ): final falling on the main accent of the EDU
But also “falling comma intonation” ( \, ), that is, non-final falling:
  • similar to comma intonation in terms of discourse semantics
  • formally similar to period intonation

Proposed solution
It appears that non-final falling is not as low as final falling
But the difference cannot be identified in absolute terms
  • Great variation (gender, individual)
  • What is final falling in one person can be non-final in another
Employ the speaker’s “prosodic portrait”
Final falling targets the bottom of the given speaker’s F0 range
Non-final falling targets a level several dozen Hz (several semitones) higher than the final falling of the given speaker

F0 graph for an example
There was a lake, either a river, or a lake, but I guess a lake, because somehow it was small, not a big one. And across it there was a log, like a bridge.
[F0 graph of the Russian original, with rising ( / ) and falling ( \ ) accents marked. Accented words shown: \ozero, \malen’koe takoe, \nebol’šoe. \brevno kakoe-to, \mosta.]

Representation of EDU continuity types (or “phase” types) in corpus
[Bar chart of the shares of EDU phase types. Categories: final falling, non-final falling, (non-final) rising. Values shown: 44%, 33%, 23%.]

Sentences (continued)
There are clearly contrasted, focal patterns:
  • final falling (end)
  • rising (non-end)
Speakers and listeners usually “know” when a sentence is completed and when it is not
Spoken sentences are the prototype of written sentences
In addition, the hybrid type must be recognized: non-final falling
It can be identified on the basis of speakers’ prosodic portraits
This helps to deal with tremendous phonetic variation
With this analysis, the notion of spoken sentence remains viable

SUGGESTION 2: Entertain another type of models
Methodological point
1960s: a fashion of “mathematical methods” in linguistics
That did not bear much fruit, primarily because of the non-discreteness effects
Time for another attempt at bringing in more useful kinds of mathematics

Ongoing project: Modeling referential choice in discourse
When we mention a person/object, we choose from a set of options:
  • proper name: Kant
  • description: the philosopher
  • reduced form: he
Corpus of Wall Street Journal texts
  • words: 45016, EDUs: 5497, anaphors: 3994
Annotation for multiple variables, candidate factors of referential choice:
  • distances to antecedent
  • antecedent’s syntactic role
  • protagonisthood
  • animacy
  • ...
Machine learning algorithms:
  • logical algorithms
  • logistic regression
  • compositions
Two-way task: full NP vs. pronoun
Three-way task: proper name vs. description vs. pronoun

Results of machine learning modeling (RefRhet 3, Khudyakova 2013)
Algorithm                                    Match    Mismatch    Accuracy
Logistic regression                          1615     237         87.2%
C4.5 decision trees                          1735     117         93.7%
C4.5 decision trees, improved by bagging     1655     197         89.4%
C4.5 decision trees, improved by boosting    1658     194         89.5%

Non-categorical referential choice
100% accuracy cannot be reached
The choice is not always deterministic (Kibrik 1999):
  • often only one option is appropriate
  • sometimes both Kant and he are appropriate
Experiment (Mariya Khudyakova)
Nine texts in which the algorithms deviated in their prediction compared to the original referential choice: pronoun instead of a proper name
Each text was presented to 60 experiment participants, in one of the two variations: original (proper name) and altered (pronoun)
1 question about the referent of a proper name/pronoun, 2 control questions

Non-categorical referential choice (continued)
84%: correctness of answers to proper names
80%: correctness of answers to pronouns (in 7 texts)
  • In these instances the algorithm correctly predicted a pronoun, even though deviating from the original referential choice
In two instances participants showed a significant drop in their accuracy
  • In these instances the algorithms erred in their prediction
Logistic regression provides the degree of certainty in prediction
That can be, with due caution, interpreted as probability
The degree of certainty in the prediction of a pronoun varied between 0.5 and 0.8 (in all instances but one):
  • moderate probability of a pronoun, according to the algorithm’s judgment

New type of models
Non-categorical referential choice: a hybrid between the clear, crisp, focal instances
Probabilistic modeling and machine learning techniques can be used to simulate human behavior in non-categorical situations
We need to employ (and develop!) mathematical methods appropriate for the “cognitive matter”

Conclusion
As soon as we invoke scientific thinking, we tend to immediately turn to discrete analysis
This may be the reason why discrete linguistics is so popular, in spite of the omnipresence and obviousness of non-discrete effects
This may be our inherent bias, or a habit developed in natural sciences, or a cultural preference
But in the case of language and other cognitive processes we do see the limits of the traditional discrete approach
It remains an open question whether linguists and cognitive scientists will eventually be able to overcome the strong bias towards “pure reason” and discrete analysis, or whether language will remain a Ding an sich
But it is worth trying to circumvent this bias and to seriously explore the focal, non-discrete structure that is at the very core of language and cognition
In the future, mathematics may play a crucial role in linguistics and cognitive science, as it already did in physics and biology

Thanks for your attention

CONGENIAL QUOTATIONS
“Unfortunately, or luckily, no language is tyrannically consistent. All grammars leak.” (Sapir 1921: 38)
“Words as well as the world itself display the ‘orderly heterogeneity’ which characterizes language as a whole” (Labov 1973: 30)
“The mind-brain is both modular and interconnected <...> To insist on one to the exclusion of the other is to short-change the enormous complexity of this quintessentially hybrid system” (Givón 1999: 107-108)

References
Alexandrov, Yuri I., and Natalia L. Alexandrova. 2010. Komplementarnost’ kul’tur. In: M. A. Kozlova (ed.) Ot sobytija k bytiju. Moscow: Izd. dom VShE, 298-335.
Alexandrov, Yuri I., and Elena A. Sergienko. 2003. Psixologicheskoe i fiziologicheskoe: kontinual’nost’ i/ili diskretnost’? Psixologicheskij zhurnal 24.6, 98-109.
Brinton, Laurel J., and Donna Brinton. 2010. The linguistic structure of modern English. Amsterdam: Benjamins.
Chafe, W. 1994. Discourse, consciousness, and time. Chicago: University of Chicago Press.
Dahl, Östen. 2001. The origin of the Scandinavian languages. In: Dahl, Östen, and Maria Koptjevskaja-Tamm (eds.) The Circum-Baltic languages. Typology and contact. Vol. 1. Amsterdam: Benjamins, 215-236.
Engwall, Olov. 2000. Dynamical aspects of coarticulation in Swedish fricatives: a combined EMA & EPG study. TMH-QPSR 4/2000.
Givón, T. 1999. Generativity and variation: The notion ‘Rule of grammar’ revisited. In: B. MacWhinney (ed.) The emergence of language. Mahwah: Erlbaum, 81-114.
Goddard, Cliff. 2011. Semantic analysis: A practical introduction. Oxford: OUP.
Hock, Hans Henrich, and Brian Joseph. 1996. Language history, language change, and language relationship. Berlin: Mouton de Gruyter.
Iwasaki, S., and H.-Y. Tao. 1993. A comparative study of the structure of the intonation unit in English, Japanese, and Mandarin Chinese. Paper presented at the annual meeting of LSA.
Jurafsky, Daniel, Alan Bell, Eric Fosler-Lussier, Cynthia Girand, and William Raymond. 1998. Reduction of English function words in Switchboard. In: Proceedings of ICSLP-98, Sydney.
Kibrik, A. A. 2008a. Est’ li predlozhenie v ustnoj rechi? In: A. V. Arxipov et al. (eds.) Fonetika i nefonetika. Moscow: JaSK, 104-115.
Kibrik, A. A. 2011. Reference in discourse. Oxford: OUP.
Kibrik, A. A. 2012. Prosody and local discourse structure in a polysynthetic language.
Kibrik, A. A., and V. I. Podlesskaya (eds.) 2009. Rasskazy o snovidenijax: Korpusnoe issledovanie ustnogo russkogo diskursa [Night Dream Stories: A corpus study of spoken Russian discourse]. Moscow: JaSK.
Koval, A. I. 2000. Morfemika Pulaar-Fulfulde [Formal morphology of Pulaar-Fulfulde]. In: V. A. Vinogradov (ed.) Osnovy afrikanskogo jazykoznanija. Morfemika. Moscow: Vost. literatura, 103-290.
Labouret, Henri. 1952. La langue des Peuls ou Foulbé. Dakar: IFAN.
Labov, William. 1973. The boundaries of words and their meanings. In: R. Fasold (ed.) Variation in the form and use of language. Georgetown University Press, 29-62.
Linell, P. 1982. The written language bias in linguistics. Linköping, Sweden: University of Linköping.
Matsumoto, K. 2000. Japanese intonation units and syntactic structure. Studies in Language 24: 525-564.
Trudgill, Peter. 2011. Sociolinguistic typology: Social determinants of linguistic complexity. Oxford: Oxford University Press.
Wouk, F. 2008. The syntax of intonation units in Sasak. Studies in Language 32: 137-162.

Acknowledgements
Yuri Alexandrov
Mira Bergelson
Svetlana Burlak
Olga Fedorova
Vera Podlesskaya
Natalia Slioussar
Valery Solovyev