A Lexical Theory of Variation


Andries W. Coetzee
Workshop on Variation, Gradience and Frequency in Phonology
Stanford University, July 2007

Things that are known to influence variation

Grammar

(i) Where: where a process applies and where it does not
(ii) Frequency: how often a process applies in a given context

Lexical frequency

Some variable processes affect frequent words more; others affect infrequent words more.

Extra-grammatical factors

Speech style, speech rate, etc.


Existing theories of variation

Grammatical

• Variable rules in the Labovian tradition (Labov 1972; Sankoff 1988)
• Several OT models (Anttila 1997; Boersma and Hayes 2000; Coetzee 2006; Reynolds 1994)
Reasonably successful at accounting for the influence of grammar.

Usage-based/exemplar models

(Bybee 2001, 2002; Pierrehumbert 2001) Reasonably successful at accounting for the influence of lexical frequency.

Interaction between the two

Models that incorporate both are still largely absent.


Structure of the presentation

1. Usage frequency and variation
2. The basics of the proposal
3. Phonetically motivated variation
4. Analogically motivated variation
5. Learning lexical distributions

Usage Frequency and Variation


Phonetically motivated variable process

Typical phonological process: applies more often to lexical items with higher usage frequency.
Example: t/d deletion
  Pre-C: west bank ~ wes bank
  Pre-V: west end ~ wes end
  Pre-##: west ~ wes

Chicano English (Santa Ana 1991)
  Context   n       % deleted
  Pre-C     3,693   62
  Pre-V     1,574   45
  Pre-##    1,024   37

Influence of frequency (Bybee 2000:70)
  Frequency        n       % deleted
  High frequency   1,650   54.4
  Low frequency    399     34.4

Analogically motivated variable process

• Usually some kind of regularization process: an irregular plural or past tense is replaced with the regular form.
• Applies more often to lexical items with lower usage frequency.
• Example: regularization of past tense verbs. Infrequent verbs are more likely to regularize (Hooper 1976:100; Bybee 1985:120, 2002:269; Bybee & Slobin 1982).

  Less likely to regularize      More likely to regularize
  Present   Raw   Log            Present   Raw   Log
  keep      348   2.54           creep     19    1.28
  leave     345   2.54           leap      20    1.30
  sleep     106   2.03           weep      22    1.34
  drive     174   2.24           dive      32    1.51

  Kučera and Francis (1982) frequencies, as calculated at www.iphod.com.

• Also many examples from the historical literature (Phillips 1984, 2001 and references therein).

The challenge

A formal theory of variation that:
• Captures the role of grammar
  – Determines what kind of variation is possible
  – Influences the frequency of application
• Captures the role of lexical frequency
  – A variable process applies differently to different lexical items.
  – Different kinds of processes are influenced differently by lexical frequency.

The Proposal: Variation Through Lexical Indexation


Variable lexical indexation

• Lexically indexed constraints (Pater 1994, 2000; Itô & Mester 1995, 1999)
  – Allow a way in for lexical influence
  – Yet still keep control in the hands of the grammar
• Variation through variable lexical class affiliation

                 | MAX-L2 |   M    | MAX-L1 |
/west_L2/ ☞ west |        |   *    |        |
            wes  |   *!   |        |        |
/west_L1/   west |        |   *!   |        |
          ☞ wes  |        |        |   *    |

• Note that the grammar stays constant: what varies is the lexical class affiliation of lexical items. Variation is hence moved from the grammar into the lexicon.
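The tableau logic can be made concrete with a small evaluator. This is a minimal sketch, not the author's implementation: it assumes each candidate is given as a dictionary of violation counts, and it keeps the slide's generic markedness constraint name `M`. Strict domination is implemented by comparing violation tuples in ranking order.

```python
from typing import Dict, List, Tuple

# A candidate is an output form paired with its violation counts per constraint.
Candidate = Tuple[str, Dict[str, int]]


def evaluate(ranking: List[str], candidates: List[Candidate]) -> str:
    """Return the optimal output under strict domination.

    Each candidate's violations are listed in ranking order; Python compares
    tuples lexicographically, so the highest-ranked constraint on which the
    candidates differ decides, as in an OT tableau.
    """
    def profile(candidate: Candidate) -> Tuple[int, ...]:
        return tuple(candidate[1].get(name, 0) for name in ranking)

    return min(candidates, key=profile)[0]


# Ranking from the tableau: MAX-L2 >> M >> MAX-L1
RANKING = ["MAX-L2", "M", "MAX-L1"]


def west_candidates(index: int) -> List[Candidate]:
    """Candidates for /west/ when the item is indexed to lexical class L<index>.

    Faithful [west] violates the markedness constraint M; [wes] deletes /t/
    and so violates the MAX constraint indexed to the item's class.
    """
    return [("west", {"M": 1}), ("wes", {f"MAX-L{index}": 1})]


for index in (2, 1):
    print(f"/west/ indexed to L{index}: {evaluate(RANKING, west_candidates(index))}")
# /west/ indexed to L2: west
# /west/ indexed to L1: wes
```

Only the index passed to `west_candidates` changes between the two calls; the ranking is identical, which is the point of the slide: the variation lives in the lexicon, not in the grammar.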

Lexical distribution functions

What determines the lexical class affiliation of a lexical item?
• Each lexical item is stored with a probability density function.
• Every time a lexical item is submitted to the grammar for evaluation, a value is chosen at random along the x-axis of its distribution function.
• The x-axis is divided into equally sized adjacent regions, one for each indexed version of the constraint.
• Frequency correlates with the skewness of the distribution function:
  – Frequent lexical items = left-skewed function
  – Infrequent lexical items = right-skewed function

[Figure: distribution functions for low-, average- and high-frequency items over an x-axis divided into two regions, L2 and L1]
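The sampling step described on this slide can be sketched as follows. This is an illustrative sketch, not the author's code: it assumes the distribution function is a Beta distribution (the form adopted later in the talk), draws the x-value with Python's `random.betavariate`, and assumes, as in the figures, that the lowest region of the x-axis belongs to the highest-numbered class (L_n) and the highest region to L1. The shape parameters in the example are made up.

```python
import random
from typing import Dict


def sample_class(alpha: float, beta: float, n_classes: int, rng: random.Random) -> int:
    """Draw x from the item's distribution function and return k for class L_k.

    The unit interval is cut into n_classes equal regions. The lowest region
    is L_n and the highest is L1, so a left-skewed function (mass near 1)
    favours L1.
    """
    x = rng.betavariate(alpha, beta)
    region = min(int(x * n_classes), n_classes - 1)  # 0 = lowest region; guards x == 1.0
    return n_classes - region


def class_shares(alpha: float, beta: float, n_classes: int = 2,
                 draws: int = 20_000, seed: int = 1) -> Dict[int, float]:
    """Proportion of evaluations in which the item is affiliated with each class."""
    rng = random.Random(seed)
    counts = {k: 0 for k in range(1, n_classes + 1)}
    for _ in range(draws):
        counts[sample_class(alpha, beta, n_classes, rng)] += 1
    return {k: c / draws for k, c in counts.items()}


# Hypothetical shape parameters: a frequent item (alpha > beta, left skewed)
# and an infrequent item (alpha < beta, right skewed).
print("frequent:  ", class_shares(4.0, 2.0))
print("infrequent:", class_shares(2.0, 4.0))
```

For Beta(4, 2) the exact probability of landing in the upper half (L1) is 0.8125, and for Beta(2, 4) it is 0.1875, so the sampled shares should come out close to those values.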

Example 1: Phonetically Motivated Variation


t/d-deletion again

Context
  Context   n       % deleted
  Pre-C     3,693   62
  Pre-V     1,574   45
  Pre-##    1,024   37

Frequency
  Frequency        n       % deleted
  High frequency   1,650   54.4
  Low frequency    399     34.4

Grammar

• Markedness constraints: contextual licensing constraints à la Steriade (1997)
  – *PRE-C: no t/d in the context C_#C
  – *PRE-V: no t/d in the context C_#V
  – *PRE-##: no t/d in the context C_##
• Four indexed versions of MAX.

Ranking: MAX-L4 ≫ *PRE-C ≫ MAX-L3 ≫ *PRE-V ≫ MAX-L2 ≫ *PRE-## ≫ MAX-L1

The grammar in Pre-C condition

Preservation if MAX-L4; deletion if MAX-L3, MAX-L2 or MAX-L1.

                           | MAX-L4 | *PRE-C | MAX-L3 | *PRE-V | MAX-L2 | *PRE-## | MAX-L1 |
/west_L4 bank/ ☞ west bank |        |   *    |        |        |        |         |        |
                 wes bank  |   *!   |        |        |        |        |         |        |
/west_L3 bank/   west bank |        |   *!   |        |        |        |         |        |
               ☞ wes bank  |        |        |   *    |        |        |         |        |
/west_L2 bank/   west bank |        |   *!   |        |        |        |         |        |
               ☞ wes bank  |        |        |        |        |   *    |         |        |
/west_L1 bank/   west bank |        |   *!   |        |        |        |         |        |
               ☞ wes bank  |        |        |        |        |        |         |   *    |

The grammar in Pre-V condition

Preservation if MAX-L4 or MAX-L3; deletion if MAX-L2 or MAX-L1.

                         | MAX-L4 | *PRE-C | MAX-L3 | *PRE-V | MAX-L2 | *PRE-## | MAX-L1 |
/west_L4 end/ ☞ west end |        |        |        |   *    |        |         |        |
                wes end  |   *!   |        |        |        |        |         |        |
/west_L3 end/ ☞ west end |        |        |        |   *    |        |         |        |
                wes end  |        |        |   *!   |        |        |         |        |
/west_L2 end/   west end |        |        |        |   *!   |        |         |        |
              ☞ wes end  |        |        |        |        |   *    |         |        |
/west_L1 end/   west end |        |        |        |   *!   |        |         |        |
              ☞ wes end  |        |        |        |        |        |         |   *    |

The grammar in Pre-Pause condition

Preservation if MAX-L4, MAX-L3 or MAX-L2; deletion if MAX-L1.

                 | MAX-L4 | *PRE-C | MAX-L3 | *PRE-V | MAX-L2 | *PRE-## | MAX-L1 |
/west_L4/ ☞ west |        |        |        |        |        |    *    |        |
            wes  |   *!   |        |        |        |        |         |        |
/west_L3/ ☞ west |        |        |        |        |        |    *    |        |
            wes  |        |        |   *!   |        |        |         |        |
/west_L2/ ☞ west |        |        |        |        |        |    *    |        |
            wes  |        |        |        |        |   *!   |         |        |
/west_L1/   west |        |        |        |        |        |   *!    |        |
          ☞ wes  |        |        |        |        |        |         |   *    |

Likelihood of deletion based on grammar alone

Grammar: MAX-L4 ≫ *PRE-C ≫ MAX-L3 ≫ *PRE-V ≫ MAX-L2 ≫ *PRE-## ≫ MAX-L1

  Context                               Pre-C        Pre-V      Pre-Pause
  Example                               west side    west end   west
  Indexations resulting in retention    L4           L4, L3     L4, L3, L2
  Indexations resulting in deletion     L3, L2, L1   L2, L1     L1
  % indexations resulting in deletion   75%          50%        25%

Note that the grammar determines:
• What variation is observed: only a process that reduces markedness, i.e. only a grammatically motivated process.
• How frequently the process applies in each context.

But we still need to give the lexicon its due.
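The percentages in the table follow from counting which indexations place MAX below the context's markedness constraint. A minimal sketch, assuming (as the table does) that each of the four indexations counts equally, i.e. grammar alone, before lexical frequency is brought in:

```python
# Ranking from the talk, highest-ranked first.
RANKING = ["MAX-L4", "*PRE-C", "MAX-L3", "*PRE-V", "MAX-L2", "*PRE-##", "MAX-L1"]
MARKEDNESS = {"Pre-C": "*PRE-C", "Pre-V": "*PRE-V", "Pre-Pause": "*PRE-##"}
INDICES = [4, 3, 2, 1]


def deletes(context: str, index: int) -> bool:
    """With only two candidates (retain vs. delete t/d), deletion wins exactly when
    the context's markedness constraint outranks the MAX indexed to the item."""
    return RANKING.index(MARKEDNESS[context]) < RANKING.index(f"MAX-L{index}")


def deletion_share(context: str) -> float:
    """Percentage of the four indexations that yield deletion in this context."""
    return 100 * sum(deletes(context, i) for i in INDICES) / len(INDICES)


for context in MARKEDNESS:
    deleting = [f"L{i}" for i in INDICES if deletes(context, i)]
    print(context, deleting, f"{deletion_share(context):.0f}%")
# Pre-C ['L3', 'L2', 'L1'] 75%
# Pre-V ['L2', 'L1'] 50%
# Pre-Pause ['L1'] 25%
```

Comparing positions in the ranking is enough here because each context has just one markedness constraint at stake and only two candidates; a general tableau evaluator would give the same results.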


The influence of lexical frequency

                      Infrequent   Intermediate   Frequent   Mean
  Word                vest         modest         best
  Raw frequency       4            29             361        29.8
  Log frequency       0.60         1.46           2.56       1.47
  Expected deletion   Low          Medium         High

Frequencies from Francis & Kučera (1982), calculated at www.iphod.com.

[Figure: distribution functions for vest, modest and best over an x-axis divided into regions MAX-L4, MAX-L3, MAX-L2 and MAX-L1; *PRE-C, *PRE-V and *PRE-## are also labelled]

Example 2: Analogically Motivated Variation


Regularization of the strong past tense in English

• Specific examples from Kučera and Francis (1982) (www.iphod.com):

                  speed   dive   leap   mean
  Raw             91      32     20     29.8
  Base log        1.96    1.51   1.30   1.47
  Regular past    3       5      20
  Strong past     9       4      2
  % regular       25      56     91

• Irregular morphology/suppletion as allomorphy
  – Two morphological options for formation of the past tense.
  – Both options are input to the grammar, so that choosing one allomorph does not violate faithfulness relative to the other.
  (Anttila 1997, Bonet 2004, Itô and Mester 2006, Kager 1996, Mascaró 1996, etc.)
• Constraints
  – OO-FAITH: some kind of paradigm uniformity (Benua 2000, Kenstowicz 1996, etc.)
  – USE-LISTED: the input of a candidate must be a single lexical entry (Zuraw 2000)

The grammar

Inputs {/leap_Lk + ed/, /leapt_Lk/}, OO-Base: leap

                                       | OO-FAITH-L2 | USE-LISTED | OO-FAITH-L1 |
{/leap_L1 + ed/, /leapt_L1/}   leaped  |             |     *!     |             |
                             ☞ leapt   |             |            |      *      |
{/leap_L2 + ed/, /leapt_L2/} ☞ leaped  |             |     *      |             |
                               leapt   |     *!      |            |             |

And the influence from the lexicon

[Figure: distribution functions for leap, dive and speed over an x-axis divided into regions OO-FAITH-L2 and OO-FAITH-L1; USE-LISTED is also labelled]
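The two leap tableaux can be evaluated the same way as the t/d-deletion ones. A minimal sketch, assuming (as read from the constraint definitions) that [leaped] violates USE-LISTED because it combines two lexical entries, while [leapt] violates OO-FAITH because it differs from the base [leap]. In the model the index would come from the item's distribution function; here it is simply passed in.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, Dict[str, int]]

# Ranking from the tableau: OO-FAITH-L2 >> USE-LISTED >> OO-FAITH-L1
RANKING = ["OO-FAITH-L2", "USE-LISTED", "OO-FAITH-L1"]


def leap_candidates(index: int) -> List[Candidate]:
    """Candidates for the input {/leap + ed/, /leapt/}, with OO base [leap].

    [leaped] is built from two lexical entries, so it violates USE-LISTED.
    [leapt] is a single listed entry but is unfaithful to the base [leap],
    so it violates the OO-FAITH constraint indexed to the item's class.
    """
    return [("leaped", {"USE-LISTED": 1}), ("leapt", {f"OO-FAITH-L{index}": 1})]


def winner(index: int) -> str:
    """Pick the candidate with the best violation profile under strict domination."""
    def profile(candidate: Candidate) -> Tuple[int, ...]:
        return tuple(candidate[1].get(name, 0) for name in RANKING)

    return min(leap_candidates(index), key=profile)[0]


print("indexed to L1:", winner(1))  # indexed to L1: leapt  (strong past retained)
print("indexed to L2:", winner(2))  # indexed to L2: leaped (regularized)
```

Because infrequent verbs have right-skewed functions, they land in the OO-FAITH-L2 region more often and so surface with the regular past more often, which is the pattern in the leap/dive/speed table.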

Lexical Distribution Functions


What needs to be learned?

Grammar: ranking between constraints.
Lexicon: lexical items, with their probability distribution functions.

These are two separate learning problems, each with their own solution.

Learning the grammar

There is a well-developed learnability literature in OT (Tesar and Smolensky 1998, 2000, etc.), including work specifically on learning an indexed grammar (Pater 2006, to appear).

I will therefore not dwell on this aspect here.

Learning the lexicon

Focus here on how the lexical distribution functions might be acquired.


General properties of lexical distribution functions

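[Figure: distribution functions for infrequent, average and frequent items; the same x-axis is divided into different numbers of indexed regions for MAX, DEP and IDENT[F] (four regions, L1 to L4, for IDENT[F])]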


Basic requirements
• Minimum and maximum value.
• Shape parameters that determine skewness.

Beta-distribution

(Evans, Hastings & Peacock 2000)
• α = β: symmetric
• α < β: right skewed
• α > β: left skewed

$$f(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du}$$
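The Beta density can be checked numerically with a short sketch that uses only Python's standard library. The normalising integral in the denominator is the Beta function, computed here from `math.lgamma`; the midpoint-rule function is only there to confirm that the density integrates to 1.

```python
import math


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of the Beta(alpha, beta) distribution at 0 < x < 1.

    The integral in the denominator equals the Beta function
    B(alpha, beta) = Gamma(alpha) * Gamma(beta) / Gamma(alpha + beta);
    it is computed in log space to avoid overflow.
    """
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp((alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x) - log_norm)


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of Beta(alpha, beta); above 0.5 when alpha > beta (mass toward 1)."""
    return alpha / (alpha + beta)


def numeric_area(alpha: float, beta: float, n: int = 10_000) -> float:
    """Midpoint-rule check that the density integrates to 1 over (0, 1)."""
    h = 1.0 / n
    return sum(beta_pdf((i + 0.5) * h, alpha, beta) for i in range(n)) * h


print(beta_pdf(0.5, 2, 2))                                 # ≈ 1.5, since Beta(2, 2) has density 6x(1 - x)
print(beta_mean(2, 2), beta_mean(4, 2), beta_mean(2, 4))   # symmetric, left skewed, right skewed
print(numeric_area(4, 2))                                  # close to 1
```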

A small scale simulation

• IPhOD 1.3 (www.iphod.com)
  – 33,432 words, with CMU transcriptions and Kučera-Francis frequencies
  – Multiplied KF frequencies by 10 to avoid having to work with log(1) = 0 …
• Calculated the following:
  – Mean frequency of all words in IPhOD: μ = 297.89, log(μ) = 2.47.
  – Collected all words that end in [-Ct] or [-Cd], excluding past tense verbs, and took the log of the frequency of each of these.

• Distribution functions (μ = mean IPhOD frequency):
  Frequency            Skewness         α        β
  frequent (f > μ)     left (α > β)     log(f)   log(μ)
  infrequent (f < μ)   right (α < β)    log(f)   log(μ)

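The setup steps can be sketched as follows, under stated assumptions. Transcriptions are taken to be space-separated ARPAbet strings as in the CMU dictionary (vowels carry a stress digit); past-tense status is supplied as a hypothetical boolean flag, since the slide does not say how past tense verbs were identified; and the shape parameters assume α = log(f) and β = log(μ), which is one assignment consistent with the skewness conditions stated here. The raw KF frequencies of the example words are the values on the following table divided by 10. Names such as `collect` and `shape_params` are illustrative only.

```python
import math
from typing import Dict, List, Tuple

MEAN_FREQ = 297.89                 # mean (KF x 10) frequency over all IPhOD words
LOG_MEAN = math.log10(MEAN_FREQ)   # log(mu), about 2.47


def ends_in_consonant_plus_td(arpabet: str) -> bool:
    """True if a space-separated ARPAbet transcription ends in a consonant + T or D.

    ARPAbet vowels carry a stress digit (e.g. EH1, AH0), so a phone without a
    final digit is treated as a consonant.
    """
    phones = arpabet.split()
    return len(phones) >= 2 and phones[-1] in ("T", "D") and not phones[-2][-1].isdigit()


def shape_params(kf_frequency: int) -> Tuple[float, float]:
    """Assumed assignment: alpha = log10(10 * f), beta = log10(mean)."""
    return math.log10(10 * kf_frequency), LOG_MEAN


# Hypothetical mini-lexicon: (word, CMU transcription, raw KF frequency, is_past_tense).
LEXICON: List[Tuple[str, str, int, bool]] = [
    ("aghast", "AH0 G AE1 S T", 1, False),
    ("vest", "V EH1 S T", 4, False),
    ("modest", "M AA1 D AH0 S T", 29, False),
    ("best", "B EH1 S T", 361, False),
    ("most", "M OW1 S T", 1161, False),
]


def collect(lexicon: List[Tuple[str, str, int, bool]]) -> Dict[str, Tuple[float, float]]:
    """Keep non-past-tense words ending in [-Ct]/[-Cd] and attach (alpha, beta)."""
    return {
        word: shape_params(freq)
        for word, trans, freq, is_past in lexicon
        if not is_past and ends_in_consonant_plus_td(trans)
    }


for word, (alpha, beta) in collect(LEXICON).items():
    skew = "left" if alpha > beta else "right"
    print(f"{word:7} alpha={alpha:.2f} beta={beta:.2f} {skew} skewed")
```

Because β is the same for every word, skewness depends only on whether a word's frequency is above or below the mean, which is the frequent/infrequent split in the table above.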

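  Word       Frequency (KF × 10)   Log
  aghast     10                    1
  vest       40                    1.60
  modest     290                   2.46
  best       3610                  3.56
  most       11610                 4.07
  mean (μ)   297.89                2.47

[Figure: distribution functions for aghast, vest, modest, best and most over an x-axis divided into regions MAX-L4, MAX-L3, MAX-L2 and MAX-L1; *PRE-C, *PRE-V and *PRE-## are also labelled]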

How well do the predictions line up with reality?

Once the values of α and β for a word are known, it is easy to calculate the likelihood of an x-value falling in a specific range along the x-axis, and hence the likelihood of deletion in each of the three contexts for each word.

Using this, I ran a simulation, feeding each [-Ct] and [-Cd] word through the grammar, according to its frequency in IPhOD.

Phonological context (value in brackets is ratio to Pre-C)
            Chicano English (Santa Ana 1991)   Predictions of LTV
  Pre-C     62                                 90
  Pre-V     45 (.73)                           62 (.69)
  Pre-##    37 (.60)                           27 (.30)

Frequency (value in brackets is ratio to > 35/million)
                  Chicano English (Bybee 2000)   Predictions of LTV
  > 35/million    54                             65
  < 35/million    34 (.63)                       43 (.66)
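The per-word calculation described above can be sketched as follows. Assumptions: the four MAX regions are equal quarters of the x-axis with L4 lowest and L1 highest, so deletion in Pre-C, Pre-V and Pre-Pause corresponds to x above 0.25, 0.5 and 0.75; shape parameters are α = log(f) and β = log(μ) on the KF × 10 scale; and instead of sampling tokens, the token-level rate is computed as the frequency-weighted average of each word's deletion probability, which is what a large token-by-token simulation would converge to. The five words are only the examples from the simulation slide, so the printed rates are not expected to reproduce the reported 90/62/27.

```python
import math
from typing import Dict

MEAN_FREQ = 297.89
# Lower edge of the deletion-yielding region in each context (L4 = lowest quarter).
THRESHOLDS = {"Pre-C": 0.25, "Pre-V": 0.50, "Pre-Pause": 0.75}
# Example words with KF x 10 frequencies, from the simulation slide.
WORDS = {"aghast": 10, "vest": 40, "modest": 290, "best": 3610, "most": 11610}


def beta_pdf(x: float, a: float, b: float) -> float:
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


def upper_tail(t: float, a: float, b: float, n: int = 10_000) -> float:
    """P(x > t) for x ~ Beta(a, b), by the midpoint rule on (t, 1)."""
    h = (1.0 - t) / n
    return sum(beta_pdf(t + (i + 0.5) * h, a, b) for i in range(n)) * h


def deletion_probs(freq: float) -> Dict[str, float]:
    """Probability that a single token of this word undergoes deletion, per context."""
    a, b = math.log10(freq), math.log10(MEAN_FREQ)
    return {ctx: upper_tail(t, a, b) for ctx, t in THRESHOLDS.items()}


def token_rates(words: Dict[str, float]) -> Dict[str, float]:
    """Frequency-weighted average deletion probability per context."""
    total = sum(words.values())
    rates = {ctx: 0.0 for ctx in THRESHOLDS}
    for freq in words.values():
        for ctx, p in deletion_probs(freq).items():
            rates[ctx] += p * freq / total
    return rates


for word, freq in WORDS.items():
    print(word, {ctx: round(p, 2) for ctx, p in deletion_probs(freq).items()})
print("token-weighted:", {ctx: round(r, 2) for ctx, r in token_rates(WORDS).items()})
```

Weighting by the KF × 10 values gives the same averages as weighting by raw KF counts, since the factor of 10 cancels.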

How can this be refined further?

• Currently, the lexical distribution functions are determined purely by lexical frequency. But we know that different dialects show different deletion rates.
  – Either different dialects have different lexical frequencies,
  – or there are other parameters that can be set independently of lexical frequency.
• Maybe some constant is added to or subtracted from the mean?
  – Added = more words become “infrequent” = more conservative dialect.
  – Subtracted = more words become “frequent” = more deletion.
• Maybe the lexical space can be warped, i.e. the regions along the x-axis that correspond to lexical classes are not of equal size.
• Maybe lexical distribution functions are best-fit functions, i.e. a function is learned that would result in the correct deletion rate … but then we lose the connection between usage frequency and deletion rates.
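The first two refinements can be explored with the same machinery. This is a speculative sketch of ideas the slide only raises as questions, not a worked-out proposal: `shift` is a hypothetical constant added to the mean before its log is taken, and `boundaries` is a hypothetical set of region edges that lets the lexical space be warped. It reuses the assumed α = log(f), β = log(μ) assignment; the word frequency and shift values are illustrative only.

```python
import math
from typing import Dict, Sequence

MEAN_FREQ = 297.89
CONTEXTS = ("Pre-C", "Pre-V", "Pre-Pause")


def beta_pdf(x: float, a: float, b: float) -> float:
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


def upper_tail(t: float, a: float, b: float, n: int = 10_000) -> float:
    """P(x > t) for x ~ Beta(a, b), by the midpoint rule on (t, 1)."""
    h = (1.0 - t) / n
    return sum(beta_pdf(t + (i + 0.5) * h, a, b) for i in range(n)) * h


def deletion_by_context(freq: float, shift: float = 0.0,
                        boundaries: Sequence[float] = (0.25, 0.5, 0.75)) -> Dict[str, float]:
    """Deletion probability per context for one word.

    shift:      hypothetical constant added to the mean (negative = subtracted).
    boundaries: hypothetical region edges L4|L3, L3|L2, L2|L1; equal quarters by default.
    """
    if MEAN_FREQ + shift <= 1:
        raise ValueError("shifted mean must exceed 1 so that beta = log10(mean) > 0")
    a = math.log10(freq)
    b = math.log10(MEAN_FREQ + shift)
    return {ctx: upper_tail(t, a, b) for ctx, t in zip(CONTEXTS, boundaries)}


modest = 290  # KF x 10 frequency, from the simulation slide
for shift in (-200.0, 0.0, 1000.0):
    probs = deletion_by_context(modest, shift)
    print(f"shift {shift:+.0f}:", {ctx: round(p, 2) for ctx, p in probs.items()})
warped = deletion_by_context(modest, boundaries=(0.4, 0.6, 0.8))
print("warped (wider L4):", {ctx: round(p, 2) for ctx, p in warped.items()})
```

Raising the mean makes β larger, pushing probability mass toward the L4 end, so deletion drops (the "more conservative dialect" case); widening the L4 region has a similar effect without touching the distribution functions themselves.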

Conclusion

• Existing grammatical models of variation do not give the lexicon enough opportunity to play a role.
  Pierrehumbert (2001: 138, 148):
  “A second challenge arises from the fact that the differential phonetic outcomes relate specifically to word frequency. Standard generative models do not encode word frequency. They treat the word frequency effects … as matters of linguistic performance rather than linguistic competence. Thus the intrusion of word frequency into a traditional area of linguistics, namely to conditioning of allophony, is not readily accommodated in the classical generative viewpoint.”
  “The exemplar model is the only current model which has these properties.”

• Purely usage-based models probably do not give the grammar enough say.
  Bybee (2000:73): “… it does mean that there is no variable rule of t/d-deletion. Rather there is a gradual process of shortening or reducing the lingual gesture …”
  Bybee (2002:268): “If we take linguistic behavior to be highly practiced neuromotor activity … then we can view reductive sound changes as the result of the automation of linguistic production. It is well known that repeated neuromotor patterns become more efficient as they are practiced; transitions are smoothed by the anticipatory overlap of gestures, and unnecessary or extreme gestures decrease in magnitude or are omitted.”

• LTV is an attempt to do both. Does it succeed?

References

Anttila, Arto. 1997. Deriving variation from grammar. In Frans Hinskens, Roeland van Hout and Leo Wetzels, eds. Variation, Change and Phonological Theory. Amsterdam: John Benjamins. p. 35-68.

Benua, Laura. 2000. Phonological Relations Between Words. New York: Garland.

Boersma, Paul and Bruce Hayes. 2000. Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32:45-86.

Bonet, Eulàlia. 2004. Morph insertion and allomorphy in Optimality Theory. International Journal of English Studies, 4:73-104.

Bybee, Joan L. 1985. Morphology: A Study of the Relation Between Meaning and Form. Amsterdam: Benjamins.

Bybee, Joan L. 2000. The phonology of the lexicon: evidence from lexical diffusion. In Michael Barlow and Suzanne Kemmer, eds. Usage-Based Models of Language. Stanford: CSLI Publications. p. 65-85.

Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.

Bybee, Joan. 2002. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change, 14:261-290.

Bybee, Joan L. and Dan I. Slobin. 1982. Rules and schemas in the development and use of the English past tense. Language, 58:265-289.

Coetzee, Andries W. 2006. Variation as accessing “non-optimal” candidates. Phonology, 23:337-385.

Hooper, Joan B. 1976. Word frequency in lexical diffusion and the source of morphological change. In William M. Christie, ed. Current Progress in Historical Linguistics. Amsterdam: North-Holland Publishing Co. p. 95-105.

Itô, Junko and Armin Mester. 1995. The core-periphery structure of the lexicon and constraints on reranking. In J. Beckman, S. Urbanczyk, and L. Walsh, eds. University of Massachusetts Occasional Papers in Linguistics 18: Papers in Optimality Theory. Amherst: GLSA. p. 181-209.

Itô, Junko and Armin Mester. 1999. The structure of the phonological lexicon. In Natsuko Tsujimura, ed. The Handbook of Japanese Linguistics. Malden: Blackwell. p. 62-100.

Itô, Junko and Armin Mester. 2006. Indulgentia parentum filiorum pernicies: Lexical allomorphy in Latin and Japanese. In Eric Bakovic, Junko Ito, and John McCarthy, eds. Wondering at the Natural Fecundity of Things: Essays in Honor of Alan Prince. Paper 9. (http://repositories.cdlib.org/lrc/prince/9).

Kager, René. 1996. On affix allomorphy and syllable counting. In Ursula Kleinhenz, ed. Interfaces in Phonology. Berlin: Akademie Verlag. p. 155-171.

Kenstowicz, Michael. 1996. Base-identity and uniform exponence: alternatives to cyclicity. In J. Durand and B. Laks, eds. Current Trends in Phonology: Models and Methods. Paris-X and Salford: University of Salford Publications. p. 363-393.

Labov, William. 1972. The internal evolution of linguistic rules. In Robert P. Stockwell and Ronald K.S. Macaulay, eds. Linguistic Change and Generative Theory. Bloomington: Indiana University Press. p. 101-171.

Mascaró, Joan. 1996. External allomorphy as emergence of the unmarked. In Jacques Durand and Bernard Laks, eds. Current Trends in Phonology: Models and Methods. Salford, Manchester: European Studies Research Institute, University of Salford. p. 473-483.

Pater, Joe. 1994. Against the underlying specification of an ‘exceptional’ English stress pattern. Toronto Working Papers in Linguistics, 13:95-121.

Pater, Joe. 2000. Non-uniformity in English secondary stress: the role of ranked and lexically specific constraints. Phonology, 17:237-274.

Pater, Joe. 2006. The Locus of Exceptionality: Morpheme Specific Phonology as Constraint Indexation. In L. Bateman, M. O'Keefe, E. Reilly, and A. Werle, eds. University of Massachusetts Occasional Papers in Linguistics 32: Papers in Optimality Theory III. Amherst: GLSA. p. 259-296.

Pater, Joe. to appear. Morpheme-specific phonology: constraint indexation and inconsistency resolution. In Steve Parker, ed. Phonological Argumentation. London: Equinox Publishers.

Phillips, Betty S. 1984. Word frequency and the actuation of sound change. Language, 60:320-342.

Phillips, Betty S. 2001. Lexical diffusion, lexical frequency, and lexical analysis. In Joan Bybee and Paul Hopper, eds. Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. p. 123-136.

Pierrehumbert, Janet. 2001. Exemplar dynamics: Word frequency, lenition, and contrast. In Joan Bybee and Paul Hopper, eds. Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. p. 137-157.

Reynolds, Bill. 1994. Variation and Phonological Theory. Ph.D. dissertation, University of Pennsylvania.

Sankoff, David. 1988. Variable rules. In Ulrich Ammon, Norbert Dittmar and Klaus J. Mattheier, eds. Sociolinguistics: An International Handbook of the Science of Language and Society. Berlin & New York: Walter de Gruyter. p. 984-997.

Santa Ana, Otto. 1991. Phonetic Simplification Processes in the English of the Barrio: A Cross-Generational Sociolinguistic Study of the Chicanos of Los Angeles. Ph.D. dissertation, University of Pennsylvania.

Steriade, Donca. 1997. Phonetics in Phonology: The Case of Laryngeal Neutralization. Ms. UCLA.

Tesar, Bruce, & Paul Smolensky. 1998. Learnability in Optimality Theory. Linguistic Inquiry, 29:229-268.

Tesar, Bruce, & Paul Smolensky. 2000. Learnability in Optimality Theory. Cambridge, MA: MIT Press.

Zuraw, Kie. 2000. Patterned Exceptions in Phonology. Ph.D. dissertation, UCLA.

The end