What are “phonetic categories”?
Sarah Hawkins & Ingrid Johnsrude
Phonetics Laboratory, Dept. of Linguistics, University of Cambridge, UK; Dept. of Psychology, Queen’s University, Canada
The problem
Models of speech perception often emphasize phonetic or phonological categories (features, phonemes, gestures) that:
• are stable, abstract entities;
• result from stripping (irrelevant) variation from the speech stream;
• are prerequisite to the processing of other aspects of speech (grammar and meaning).
[Diagram: the “bottom-up” sequence of processing levels typically assumed: abstract initial categories (e.g. phonological features) → abstract phonemes → abstract word representations → grammar → meaning, with ‘?’ marking the links between levels. Abstract phonemes and abstract word representations are the standard domain of models of speech perception; grammar and meaning are usually neglected by such models. “Top-down” influences are poorly understood but are typically assumed to be separable from bottom-up processes.]
But:
1) phonemic category boundaries shift with phonetic context, meaning, and the function of the utterance.
2) much variability in speech sounds is systematic and potentially informative about features of speech other than phonemic categories.
1) Phonemic category boundaries are context-dependent; thus not stable
Range & frequency effects: category boundaries tend towards the middle of the stimulus series. When stimuli are removed from one end of a continuum, the boundary shifts towards the other end. Stimulus frequency (& the previous stimulus) affect the current decision.1-4
Meaning: phonemic boundaries favour the phoneme related to the word in word-nonword continua;5 they favour sensible meanings in word-word continua in sentences.6
Perceptual learning: rather little exposure to a novel pronunciation is required for a phonemic category boundary to shift.3,7-9
[Figure: a phonemic category boundary shift due to the Ganong effect: % /d/ responses (0-100) as a function of VOT, from short VOT (d) to long VOT (t), for the dask-task and dash-tash continua.]
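The Ganong-style boundary shift just described can be made concrete with a toy calculation. The following Python sketch is purely illustrative (a hypothetical logistic acoustic cue combined with an invented lexical prior via Bayes' rule); it is not a model from the poster or its references, and all parameter values are made up.

```python
import math

def p_t_given_vot(vot_ms, boundary_ms=35.0, slope=0.25):
    """Acoustic evidence: probability that the sound is /t/ given VOT (logistic)."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

def p_t_posterior(vot_ms, prior_t):
    """Combine the acoustic likelihood with a lexical prior for /t/ (Bayes' rule)."""
    like_t = p_t_given_vot(vot_ms)
    like_d = 1.0 - like_t
    return (like_t * prior_t) / (like_t * prior_t + like_d * (1.0 - prior_t))

def category_boundary(prior_t):
    """VOT (ms) at which the /d/ vs /t/ decision flips, found by a simple scan."""
    for vot in range(0, 81):
        if p_t_posterior(float(vot), prior_t) >= 0.5:
            return vot
    return None

# dash-tash continuum: 'dash' is a word, so the prior favours /d/ (low prior for /t/);
# dask-task continuum: 'task' is a word, so the prior favours /t/.
print("dash-tash boundary (ms VOT):", category_boundary(prior_t=0.3))
print("dask-task boundary (ms VOT):", category_boundary(prior_t=0.7))
# The boundary sits at longer VOTs when the /d/-initial item is a word, i.e. listeners
# report more /d/ on the dash-tash continuum: a Ganong-style boundary shift.
```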
2) Acoustic variability is systematic and potentially informative
The acoustic realization of a phoneme is systematically influenced by10,11:
1) allophonic variation:
• position in the syllable (eg “tip” vs “pit”)
• boundaries between words (eg “grey train” vs “great rain”)
• grammatical status (eg the productivity of a morpheme;
content vs function words)
2) speaker intent & register (discourse function, casualness, rate)
3) talker identity
Experiments show that listeners use much of this systematic variability.12-17
Examples: Grammatical information conveyed by systematic acoustic-phonetic variation
Productive vs unproductive morpheme: syllable-internal spectro-temporal relationships indicate morphemic productivity. [Figure: spectrograms of ‘mistimes’ & ‘mistakes’ from ‘I’d be surprised if Tess ___ it.’] The first four phonemes (/mist/) are the same. Their acoustic differences produce a different rhythm that may signal that ‘mis’ in ‘mistimes’ is a productive morpheme, whereas ‘mis’ in ‘mistakes’ is not.
This systematic acoustic variation has implications for models of word recognition incorporating lexical competition.15,18-20
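The role suggested here for fine phonetic detail in lexical competition can be sketched with invented numbers. The snippet below is a hypothetical toy model, not TRACE, Shortlist, or any model cited above: it simply lets a duration-ratio cue re-weight two candidates that match the /mist/ phonemes equally well (loosely following the relative-duration relationships tabulated later in the poster).

```python
# Hypothetical sketch: fine phonetic detail (a single duration-ratio cue)
# modulating activation in a lexical-competition race between two candidates.
# All numbers are invented for illustration; this is not a published model.

def activation(phoneme_match, cue_fit_value, cue_weight=0.4):
    """Blend segmental match with goodness-of-fit of fine phonetic detail."""
    return (1.0 - cue_weight) * phoneme_match + cue_weight * cue_fit_value

def cue_fit(observed_ratio, expected_ratio):
    """Closeness of an observed sonorant:sibilant duration ratio to a word's typical value."""
    return 1.0 / (1.0 + abs(observed_ratio - expected_ratio))

# Both candidates match the first four phonemes /mist/ equally well...
phoneme_match = 1.0
# ...but they differ in the rhythm they typically impose on /mist/ (invented expected ratios).
expected = {"mistimes": 1.0, "mistakes": 0.5}   # sonorant:sibilant roughly 1:1 vs 1:2

observed = 0.95   # a token whose sonorant:sibilant ratio is near 1:1
scores = {w: activation(phoneme_match, cue_fit(observed, r)) for w, r in expected.items()}
print(scores)   # 'mistimes' wins the competition despite identical phonemes
```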
Anatomical considerations
The anatomical organization of the brain is not consistent with serial,
feedforward models of speech perception.19
[Figure: the anatomical organization of the macaque cortex suggests four or five discrete, hierarchically organized stages of auditory processing between primary core and frontal cortex.21 Source: Petrides & Pandya (1988), J Comp Neurol 273: 52-66.]
[Figure: temporofrontal connections are parallel among multiple levels of auditory cortex (belt to superior temporal sulcus), segregated, bidirectional, and follow a strict anterior-posterior topographic organization.24,25 Source: Seltzer & Pandya (1989), J Comp Neurol 281: 97-113.]
Results of functional neuroimaging studies of speech perception are
consistent with multiple, parallel, cascaded auditory streams of
processing.22,23,26-28
Information flow in the auditory system is not unidirectional. Cortical
feedforward connections each have their feedback complement.29-31
Anatomy suggests converging influences from multiple higher stages
of perception, removed from the stage in question by zero, one, or
more intervening stages.21
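To make the idea of complementary feedforward and feedback connections concrete, here is a deliberately simplified sketch (invented stage names and gains, not a model of real cortex) in which activity at every stage comes to reflect both lower and higher stages after a few passes.

```python
# Minimal, hypothetical sketch of a cascaded hierarchy in which every feedforward
# step has a feedback complement, so higher stages (one or more levels removed)
# can influence lower ones. Purely illustrative.

levels = ["core", "belt", "STS", "frontal"]   # assumed stage names
state = {lvl: 0.0 for lvl in levels}          # activity at each stage
state["core"] = 1.0                           # sensory input arrives at core

FF, FB = 0.6, 0.3                             # feedforward / feedback gains (invented)

for _ in range(5):                            # let activity settle over a few passes
    new = dict(state)
    for i, lvl in enumerate(levels):
        if i > 0:
            new[lvl] += FF * state[levels[i - 1]]   # feedforward from the stage below
        if i < len(levels) - 1:
            new[lvl] += FB * state[levels[i + 1]]   # feedback from the stage above
    state = {lvl: v / (1.0 + v) for lvl, v in new.items()}  # squash to keep values bounded
    print(state)
# After a few iterations every stage's activity reflects both lower and higher stages.
```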
Neurophysiological studies suggest that information in even core auditory cortex regions is integrated over multiple time domains.32,33
Anatomical considerations (cont)
The multiple cognitive processes required for speech comprehension probably rely on multiple cortical networks that operate in parallel. This functional organization may, in humans, map onto anatomically segregated, hierarchically organized processing streams similar to those identified in macaque monkeys.21,22
Different pathways may be differentially specialized to serve different processes or operate on complementary representations of speech (eg articulatory; phonological; crossmodal).23
Speech acoustics inform about multiple linguistic levels simultaneously12,13,34
Speakers exhibit different assimilation in function and content words. E.g. /m/ assimilates to the place of the next consonant in ‘I’m’ but not in ‘lime’ or ‘crime’:11
I’m blowing/going/watching: aIm aIN aIw
lime bark, lime goes, crime wave: aIm
In principle, the acoustic pattern can be used by the listener to inform about the grammatical class of the speech segment being perceived. In its place in an utterance, ‘I’m’ has few or no acoustic competitors.
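As a toy illustration of the inference this licenses (not an analysis from the poster), the sketch below maps a nasal's surface place, relative to the following consonant, onto a guess about grammatical class, following the 'I'm' vs 'lime'/'crime' examples above; the mapping is a simplification for illustration.

```python
# Toy sketch: using fine phonetic detail (place assimilation of a word-final nasal)
# to infer grammatical class, following the "I'm" vs "lime"/"crime" examples above.

def likely_grammatical_class(nasal_realization, next_consonant_place):
    """nasal_realization: 'm', 'n', or 'N' (velar); next_consonant_place: 'bilabial', 'alveolar', 'velar'."""
    assimilated = (
        (nasal_realization == "N" and next_consonant_place == "velar")
        or (nasal_realization == "n" and next_consonant_place == "alveolar")
    )
    if assimilated:
        # Function words such as "I'm" readily assimilate to the following place of articulation.
        return "function word (e.g. I'm)"
    # A nasal that stays [m] before a non-bilabial consonant suggests a content word.
    return "content word (e.g. lime, crime)"

print(likely_grammatical_class("N", "velar"))   # "I'm going"  -> function word
print(likely_grammatical_class("m", "velar"))   # "lime goes"  -> content word
```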
[Waveforms: ˅ t ɛ s m ɪ s t a ɪ m z ɪ t (‘Tess mistimes it’) and ˅ t ɛ s m ɪ s t e ɪ k s ɪ t (‘Tess mistakes it’), with acoustic-cue events 1-4 marked on each.]
Perceptual information available in the short sections of sound, ‘mist’, taken from ‘Tess mistimes it’ and ‘Tess mistakes it’. Information about featural, phonemic and lexical identity, and syllabic, morphemic and grammatical structure is conveyed simultaneously in the fine acoustic-phonetic detail, comprising both ‘events’ at segment boundaries and longer-term relationships. Prior knowledge is required for linguistic information, at all levels, to be extracted from sensory input. No unit is identifiable independent of context, and no unit/level is primary. Information is mapped onto prosodic structures linked to grammatical structures12,13,34 (example at http://kiri.ling.cam.ac.uk/sarah/docs/CNS06trees.pdf). Bold font = nodes in linguistic structure = potential perceptual units.
For each acoustic cue (events 1-4, see waveforms, and the longer-term relationships), the listing below gives its nature and perceptual correlate in the productive morpheme (‘mistimes’) and the unproductive morpheme (‘mistakes’).
Event 1: periodic, nasal
• ‘mistimes’, perceptual correlate: new syllable (simple onset); morpheme; word; poor segment identity
• ‘mistakes’: same as ‘mistimes’
Event 2: nasal-oral boundary + formant definition
• ‘mistimes’, nature: abrupt; perceptual correlate: features for [m]; phoneme /m/?; clear high front vowel?
• ‘mistakes’, perceptual correlate: unclear features for nasal? labial??; unclear high vowel? front vowel??
Event 3: frication start
• ‘mistimes’, nature: rel. late; perceptual correlate: syllable coda starts; rhyme has voiceless coda??; features for [s]; phoneme /s/?; syllable is unstressed (weak, light)??
• ‘mistakes’, nature: rel. early; perceptual correlate: same as ‘mistimes’ except: syllable is unstressed (weak)?
Event 4: fricative-silence boundary
• ‘mistimes’, nature: rel. early; perceptual correlate: phoneme /s/; voiceless coda; coda ends?; new syllable?; features for [t]?; morpheme ends??; productive morpheme/same word??; weak light syllable?
• ‘mistakes’, nature: rel. late; perceptual correlate: phoneme /s/; features for mis, maybe dis; features for [t]?; syllable coda continues??; morpheme continues (is nonproductive)??; weak heavy syllable
Relationships
• ‘mistimes’, relative durations: sonorant:sibilant 1:1, plus sibilant:silence 2:1; transient + aspiration long; perceptual correlates: weak, light syllable; productive morpheme mis? (dis??); silence + intonation heralds new syll. onset, new foot?; confirms: productive morpheme mis (dis??); new strong syllable onset [th], new foot, new morpheme, same polymorphemic word; features for [th]; phoneme /t/
• ‘mistakes’, relative durations: sonorant:sibilant 1:2, plus sibilant:silence 3:1; transient + aspiration short; perceptual correlates: weak heavy syllable; strong (stressed) syll. onset of same word (monomorphemic polysyllable)?; defocussed verb missed??; new strong syll. onset [st], new foot; confirms monomorphemic word beginning mis(t), vis, bis (dis?); features for [t]; phoneme /t/
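The relative-duration relationships above can be made concrete with a small calculation. The segment durations below are invented to roughly reproduce the tabulated ratios, and the decision rule is a deliberately crude illustration, not a claim from the poster.

```python
# Hypothetical segment durations (ms) for the /mist/ stretch of the two utterances.
# The numbers are invented to approximate the ratios listed above.
tokens = {
    "Tess mistimes it": {"sonorant": 120, "sibilant": 120, "silence": 60},   # about 1:1 and 2:1
    "Tess mistakes it": {"sonorant": 80,  "sibilant": 160, "silence": 55},   # about 1:2 and 3:1
}

for utterance, d in tokens.items():
    son_sib = d["sonorant"] / d["sibilant"]
    sib_sil = d["sibilant"] / d["silence"]
    # Crude decision rule mirroring the contrast above: a sibilant no longer than the sonorant,
    # plus a relatively long closure, points to a productive 'mis' + strong-onset /t/ ('mistimes').
    productive = son_sib >= 0.8 and sib_sil <= 2.5
    print(f"{utterance}: sonorant:sibilant = {son_sib:.2f}, sibilant:silence = {sib_sil:.2f}, "
          f"productive morpheme? {productive}")
```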
Conclusions
Phonetic and anatomical data are consistent with the hypotheses that:
• fine phonetic detail informs about perceptual units at multiple linguistic ‘levels’ (phonetics/phonology/grammar/meaning) simultaneously
• and thus over different time domains (variable grain sizes)12,13,35
Hence a phonetic category:
• is relational & plastic: each element is bound with other elements (larger, smaller) and no element can be described independently of its prosodic, grammatical, & functional context
• entails cognitively and neuropsychologically distributed processes which operate on different types of information13,36,37
Some implications for models of speech perception:
• speech perception, like visual object perception,38,39 may conform to Bayesian models: e.g. hypotheses about speech segmental identity (at multiple scales of temporal integration) may be generated by ‘higher-order’ regions and tested in ‘lower-order’ regions (a toy sketch follows after this list).
• major challenges for the next generation of models include:
  • use of acoustic-phonetic information at all linguistic levels
  • long-range phonetic dependencies, at all linguistic levels
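A minimal sketch of the Bayesian idea above, using invented numbers: priors stand in for hypotheses generated by 'higher-order' regions, and likelihoods for the fit of each hypothesis to the 'lower-order' acoustic evidence.

```python
# Minimal Bayesian sketch of hypothesis testing over candidate words.
# All numbers are invented; this is an illustration, not a model from the poster.

def posterior(priors, likelihoods):
    """Bayes' rule over a small set of competing hypotheses."""
    unnorm = {w: priors[w] * likelihoods[w] for w in priors}
    z = sum(unnorm.values())
    return {w: v / z for w, v in unnorm.items()}

# Context ("I'd be surprised if Tess ___ it") makes both verbs plausible a priori...
priors = {"mistimes": 0.5, "mistakes": 0.5}
# ...but the fine phonetic detail of /mist/ fits one rhythm better than the other.
likelihoods = {"mistimes": 0.8, "mistakes": 0.3}

print(posterior(priors, likelihoods))   # evidence decides: 'mistimes' wins (~0.73)

# A biasing context (a prior generated by 'higher-order' regions) can shift the percept
# even when the acoustic evidence is unchanged:
priors_biased = {"mistimes": 0.2, "mistakes": 0.8}
print(posterior(priors_biased, likelihoods))   # now roughly 0.40 vs 0.60
```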
References
1. Parducci, A. (1974) Psychophysical Judgment & Measurement, ed. Carterette/Friedman, 127-141.
2. Rosen, S. (1979) Journal of Phonetics 7, 393-402.
3. Pastore, R. (1987) Categorical Perception, ed. Harnad, Cambridge, 29-52.
4. Hawkins/Stevens (1985) J. Acoust. Soc. Am. 77, 1560-75.
5. Ganong, W.F. (1980) J. Exp. Psych.: HPP 6, 110-125.
6. Borsky, S., et al. (2000) J. Psycholing. Res. 29, 155-168.
7. Ladefoged/Broadbent (1957) J. Acoust. Soc. Am. 29, 98-104.
8. Norris, D., et al. (2003) Cognit. Psychol. 47, 204-38.
9. Eisner/McQueen (2006) J. Acoust. Soc. Am. 119, 1950-3.
10. Abercrombie, D. (1967) Elements of General Phonetics.
11. Local, J.K. (2003) J. Phonetics 31, 321-339.
12. Hawkins/Smith (2001) Italian J. Linguistics 13, 99-188.
13. Hawkins, S. (2003) J. Phonetics 31, 373-405.
14. Pisoni, D.B. (1997) Talker Variability in Speech Processing, ed. Johnson, JW, Academic, 9-32.
15. Davis et al. (2002) J. Exp. Psych.: HPP 28, 218-244.
16. Kemps, R., et al. (2005) Mem. Cognit. 33, 430-46.
17. Salverda, A., et al. (2003) Cognition 90, 51-89.
18. Marslen-Wilson (1990) In Cognitive Models of Speech Processing, ed. Altmann, Cambridge, 148-172.
19. Norris, D. (1994) Cognition 52, 189-234.
20. McClelland/Elman (1986) Cognit. Psychol. 18, 1-86.
21. Kaas, J., et al. (1999) Curr. Opin. Neurobiol. 9, 164-170.
22. Davis/Johnsrude (2003) J. Neurosci. 23, 3423-31.
23. Scott/Johnsrude (2003) Trends Neurosci. 26, 100-7.
24. Petrides/Pandya (1988) J. Comp. Neurol. 273, 52-66.
25. Seltzer/Pandya (1989) J. Comp. Neurol. 281, 97-113.
26. Davis/Johnsrude/Horwitz (2004) Soc. Neurosci. Ann. Mtg.
27. Rodd, R., et al. (2005) Cereb. Cortex 15, 1261-9.
28. Buchsbaum, B.R., et al. (2005) Neuron 48, 687-97.
29. Pickles, J. (1988) An Introduction to the Physiology of Hearing. London: Academic Press.
30. Pandya, D.N. (1995) Rev. Neurol. 151, 486-494.
31. de la Motte, L., et al. (2006) J. Comp. Neurol. 496, 27-71.
32. Nelken, I., et al. (2003) Biol. Cybern. 89, 397-406.
33. Ulanovsky, N., et al. (2004) J. Neurosci. 24, 10440-53.
34. Ogden, R., et al. (2000) Comput. Sp. & Lang. 14, 177-210.
35. Boemio, A., et al. (2005) Nat. Neurosci. 8, 389-95.
36. Andruski, J., et al. (1994) Cognition 52, 163-187.
37. Blumstein, S., et al. (2005) J. Cog. Neurosci. 17, 1353-66.
38. Murray, S., et al. (2002) Proc. Natl. Acad. Sci. 99, 5164-9.
39. Kersten, D., et al. (2004) Ann. Rev. Psychol. 55, 271-304.
Funded in part by the Leverhulme Trust