CS 551/652:
Structure of Spoken Language
Lecture 2: Spectrogram Reading and
Introductory Phonetics
John-Paul Hosom
Fall 2010
Spectrogram Reading
Why bother??
What’s the point of spectrogram reading? Do people read
spectrograms as part of their job? Do computers “read” spectrograms
in order to recognize speech?
There are some jobs that require spectrogram reading (e.g. phonetic
time alignment), but not many. Automatic speech recognition
systems do not process speech in this way.
Primary reason for spectrogram reading:
If you’re going to work on a problem, it’s advisable to
understand the nature of that problem. Spectrogram reading
provides a direct method for “hands-on” learning of the
characteristics of speech. Studying phonetics, signal processing,
or techniques in speech recognition/speech synthesis does not
fully convey the complexity and structure of spoken language.
More Formant Data…
(source unknown)
Phonetics: Introduction
Phonology:
A description of the systems and patterns of sounds
that occur in a language (abstract), often involving
comparisons between languages and/or evolution of
a language over time.
Phonetics:
A branch of phonology that deals with individual speech
sounds, their production, and their written representation.
Phoneme:
• A unit of speech that can be used to differentiate words
(e.g. “cat” /k ae t/ vs. “bat” /b ae t/).
• Phonemes identify minimal pairs in a language.
• The set of phonemes in a language is subject to interpretation;
most languages have 20 to 40 phonemes.
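The idea that phonemes differentiate words can be sketched as a search for minimal pairs. The example below is only an illustration: the toy lexicon and the function names are assumptions, and it treats a minimal pair as two pronunciations of equal length that differ in exactly one phoneme.

```python
from itertools import combinations

# Hypothetical toy lexicon: word -> phoneme sequence, written with the
# ASCII phoneme symbols used in these slides.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "bit": ["b", "ih", "t"],
    "beat": ["b", "iy", "t"],
    "cats": ["k", "ae", "t", "s"],
}


def is_minimal_pair(a, b):
    """Return True if two phoneme sequences differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def minimal_pairs(lexicon):
    """Return all word pairs (alphabetical order) that form a minimal pair."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if is_minimal_pair(p1, p2):
            pairs.append((w1, w2))
    return pairs
```

Calling `minimal_pairs(LEXICON)` on this toy lexicon pairs “bat” with “cat”, “bit”, and “beat”, but not with “cats”, because the two pronunciations differ in length.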
Phonetics: Introduction
Allophone:
A speech sound constituting one of the systematic phonetic
variants of a given phoneme. Different allophones are
predictable from environment (e.g. “toe”, “caught”,
“fitness”, “writer”; “sill”, “still”, “spill”)
Phone:
An acoustic realization of a phoneme. (Many different
phones may represent the same phoneme.)
“The phoneme /s/ consists of more than 100 allophones”
− Pickett, The Acoustics of Speech Communication, p. 7.
Phonemes indicated by / /; phones (allophones) indicated by [ ].
Phonetics: Introduction
Syllable:
• Unit of speech containing one or more phonemes.
• A vowel in a syllable is called the syllable nucleus.
• Most syllables contain one vowel (or diphthong);
some contain only a lateral (“bott/le”) or nasal
(“butt/on”) as the most intense sound.
• Syllable boundaries sometimes ambiguous
(“tas/ty” vs. “tast/y” vs. “ta/sty”)
Coarticulation:
The “blending” of two or more adjacent phones, causing
a non-distinct boundary between them. Coarticulation
is caused by smooth changes in the articulators (lips,
tongue, jaw) over time.
Phonetics: Introduction
Coarticulation Example:
“you are”: /y uw aa r/
[Figure: “you are” with phone labels y, uw, aa, r]
Phonetics: Introduction
Another Example of Coarticulation:
Phonetics: Introduction (adapted from Schane, p. 4-6)
• Speech signal is continuous; we perceive discrete entities.
(How many sound units are in the word “cat”?)
• One assumption of phonology: utterances can be represented as
a sequence of discrete units.
• Are such units purely an “invention” of linguistics?
Spoonerisms (“belly jeans” vs. “jelly beans”; named after Reverend
William Archibald Spooner, 1844-1930) and rhymes indicate that
small units of language exist.
• Utterances of the same word(s) have many differences… we’re
usually only interested in those differences that are “linguistically
significant” or that are “perceived as different”.
• Implies a somewhat subjective nature to phonology, whereas
we want an objective measure of perceived or produced units.
Phonetics: Distinctive Phonetic Features
• Phonemes do not differ randomly from one another; there
are relationships among phonemes (e.g. /p/ vs. /t/ vs. /ah/)
• A (distinctive) feature is a “phonetic property that can be
used to classify sounds” [Ladefoged, p. 42]
• Typically, features are associated with aspects of articulation
• Features may be binary or multi-valued
• Capital letters indicate feature name: Manner
square brackets [] indicate feature value: [+fricative]
Phonetics: Distinctive Phonetic Features
• Exact set of features and feature values depends on goals
(no “right” or “wrong” set of features or values)
• Distinctive features provide a vocabulary for describing speech
• Are distinctive features purely an “invention” of linguistics?
memory tasks show that when people forget a phoneme, they
usually remember a phoneme with similar distinctive features
Phonetics: Distinctive Phonetic Features
[Figure: The Speech Production Apparatus (from Olive, p. 23). Labeled parts: nasal tract, velic port, velum (soft palate), (hard) palate, alveolar ridge, oral tract, lips, teeth, tongue, tongue tip, pharynx, glottis (vocal folds and space between vocal cords), vocal folds (= vocal cords), larynx (voice box)]
Phonetics: Distinctive Phonetic Features*
Feature       Description
Consonantal   produced with a constriction along center line of oral cavity.
              Only vowels, /w/, /h/, and /y/ are not.
Vocalic       largely unobstructed vocal tract. Vowels and liquids (/l/, /r/)
              are vocalic; glides (/w/, /y/) are not.
Anterior      point of articulation near alveolar ridge, including all labial
              and dental sounds.
Coronal       articulation involves front of tongue
Continuant    no complete obstruction in oral cavity; only nasals, stops, and
              affricates are non-continuant
Strident      articulation with long, narrow constriction, such as
              /s/, /z/, /f/, /v/, /sh/, /zh/, /ch/, /jh/
Voiced        vibration of the vocal folds occurs during articulation
Phonetics: Distinctive Phonetic Features*
Feature       Description
Lateral       contact between corona of tongue and roof of mouth, with
              lowering of sides of tongue (only /l/ in English)
Nasal         lowering of the velic port and opening of nasal cavity.
High          vowel with high tongue position (narrow constriction);
              in English, /iy/, /ih/, /uh/, /uw/
Low           vowel with low tongue position (no constriction);
              /ae/, /ao/, /aa/ are (some) low vowels in English.
Back          vowels produced with tongue toward back of mouth;
              /uw/, /uh/, /ah/, /ao/, /aa/, /ow/ are back vowels
Round         articulation involving rounding of the lips; only /uw/, /ow/,
              /ao/, and /uh/ are rounded in English. However, /uh/ may take
              an unrounded form.
*Adapted from “Language” by C.E. Cairns and F. Williams in Normal Aspects of Speech, Hearing,
and Language, edited by Minifie, Hixon, and Williams, 1973, p. 424, as printed in Daniloff p. 51.
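One way to see how distinctive features relate phonemes to one another is to store feature values in a small table and compare rows. The sketch below is illustrative only: it covers a handful of consonants and four features, with values filled in from the descriptions in these slides, and the function names are assumptions.

```python
# Feature values for a few English consonants (True = "+", False = "-"),
# following the feature descriptions in this lecture: nasals and stops are
# non-continuant; /s z f v/ are strident; nasals are voiced.
NAMES = ("voiced", "continuant", "strident", "nasal")

FEATURES = {
    "p": (False, False, False, False),
    "b": (True, False, False, False),
    "t": (False, False, False, False),
    "d": (True, False, False, False),
    "f": (False, True, True, False),
    "v": (True, True, True, False),
    "s": (False, True, True, False),
    "z": (True, True, True, False),
    "m": (True, False, False, True),
    "n": (True, False, False, True),
}


def differing_features(a, b):
    """List the names of the features on which phonemes a and b disagree."""
    return [name for name, x, y in zip(NAMES, FEATURES[a], FEATURES[b]) if x != y]


def natural_class(**wanted):
    """Return (sorted) all phonemes matching every requested feature value."""
    index = {name: i for i, name in enumerate(NAMES)}
    return sorted(
        phoneme
        for phoneme, values in FEATURES.items()
        if all(values[index[name]] == value for name, value in wanted.items())
    )
```

For example, `differing_features("p", "b")` reports only voicing, while `natural_class(strident=True, voiced=False)` selects /f/ and /s/ from this small table. Note that /p/ and /t/ look identical here because place features such as Anterior and Coronal are left out of the sketch.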
Phonetics: More Distinctive Phonetic Features*
Feature       Description
Sonorant      “resonant quality” of a sound; vowels are +sonorant, stops and
              fricatives are –sonorant. Nasals are also sonorant.
Obstruent     non-sonorants, e.g. stops, fricatives, affricates, which are
              formed by obstructing the airflow.
Syllabic      is the phoneme the main sound in a syllable? Vowels are
              syllabic, stops are usually –syllabic, but there are syllabic
              nasals and liquids.
Tense         tense vowels are longer, more fully articulated, and more
              “distinct,” e.g. /iy ey uw ow aa/; lax vowels are less so,
              e.g. /ih eh uh ah/.
Aspirated     produced without a constriction in the vocal tract, but also
              without voicing (/h/).
Glottalized   produced with aperiodic or extremely low-frequency vibrations
              of the vocal cords.
* from Schane, pp. 26-32
Phonetics: Distinctive Phonetic Features
Physiological Features:
• Manner
stop /p/, fricative /s/, affricate /ch/, liquid /l/ /r/,
glide /y/ /w/, nasal /m/, vowel /ah/, aspiration /h/
• Place
bilabial /p/, labiodental /f/, dental /th/, alveolar /t/,
palato-alveolar /r/, palatal /sh/, velar /k/, glottal /h/,
front /iy/, mid /ah/, back /aa/ (can combine mid + back)
• Height
high /iy/, mid-high /ih/, mid /ax/, mid-low /eh/, low /aa/
or high /iy/, mid /eh/, low /aa/ (3 values, plus tense/lax)
• Tenseness, Nasality, Rounding
same as previous descriptions
Phonetics: Distinctive Feature Relationships: Vowels
        Front                  Back
        Unrounded   Rounded    Unrounded   Rounded
High    i (iy)      ü          ɨ (ix)      u (uw)
Mid     ɛ (eh)      ö          ʌ (ah)      o (ow)
Low     æ (ae)      œ          a (aa)      ɔ (ao)

[Table: English vowels by height against Front, –Round; Back, +Round; and Back, –Round, each split into Tense and Lax columns.
High: iy, ih, uw, uh, ix
Mid: ey, eh, ow, ah, ax†
Low: ae, ao, aa]

* from Schane, pp. 12-13. †/ax/ is slightly more centralized than /ah/, and shorter in duration
Phonetics: Distinctive Phonetic Features: The Case of /ae/
• /ae/ is classified in the preceding table as “lax”, but we have been
considering it as “tense”.
• One Rule for Differentiating Tense/Lax:
A lax vowel can never be a word-final stressed vowel
e.g. /iy/ can be word final: “be” /b iy/, “tea” /t iy/
/ih/ can not be word final in one-syllable word: /b ih/, /t ih/
/ah/ can be word final, but only if unstressed.
• According to this rule, both /eh/ and /ae/ are lax, because they can
not be word-final stressed vowels. In this case, the tense vowel in
contrast to /eh/ is /ey/.
• However, /ae/ is long in duration (e.g. Forgie and Forgie (1959) and
Peterson and Lehiste (1960)), making it acoustically more similar to
a tense vowel.
• For spectrogram reading, we’re more concerned with acoustics, so
we’ll call /ae/ a tense vowel, although others may call it lax.
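The word-final rule above can be written as a small check. This is a sketch under stated assumptions: stress is marked by a trailing digit on vowels (“1” for stressed, “0” for unstressed), a convention borrowed for illustration and not used elsewhere in these slides, and the function names are hypothetical.

```python
# Vowels that the word-final rule classifies as lax. /ae/ is included
# because the rule (unlike its duration) groups it with the lax vowels.
LAX_VOWELS = {"ih", "eh", "uh", "ah", "ae"}


def split_stress(phone):
    """Split a phone such as 'iy1' into its symbol and a stressed flag.

    Assumed convention: a trailing '1' marks a stressed vowel, a trailing
    '0' an unstressed vowel; consonants carry no digit.
    """
    if phone.endswith("1"):
        return phone[:-1], True
    return phone.rstrip("0"), False


def ends_in_stressed_lax_vowel(phones):
    """Return True if the last phone is a stressed lax vowel.

    Under the rule, a pronunciation for which this returns True should
    not occur as an ordinary English word.
    """
    if not phones:
        return False
    symbol, stressed = split_stress(phones[-1])
    return stressed and symbol in LAX_VOWELS
```

With this check, “be” as `["b", "iy1"]` passes the rule, the non-word `["b", "ih1"]` is flagged, and a word ending in unstressed /ah/ such as `["s", "ow1", "f", "ah0"]` is not flagged.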
Phonetics: Distinctive Phonetic Features: The Case of /ae/
• Looking at 130,000 words in the CMU dictionary:
PHN    CNT     PCNT      EXAMPLES
/iy/   12945   0.10002
/ih/   15      0.00012   “chui”, “des”, “kiwani”, “lui”, “moishe”, “pih”, “to”
/eh/   30      0.00023   “bienvenue”, “des”, “eh”, “moshe”, “yahweh”, “zeh”
/ae/   5       0.00004   “dhaka”, “lashua”, “losoya”, “pah”, “yeah”
/uw/   714     0.00552
/uh/   2       0.00002   “l’heureux”, “milieu”
/ah/   6413    0.04955
/aa/   170     0.00131
/ao/   243     0.00188
/ey/   962     0.00743
/ay/   379     0.00293
/oy/   167     0.00129
/yu/   171     0.00132
/aw/   226     0.00175
/ow/   5137    0.03969
Total          0.21280
21% of words end in vowel/diphthong
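A tally like the one above could be produced by scanning a pronunciation dictionary for word-final vowels. The sketch below assumes CMUdict-style lines (a word followed by ARPAbet phones, with stress digits 0/1/2 on vowels and `;;;` comment lines). The sample entries are illustrative rather than an excerpt of the actual file, and this is not necessarily how the counts on the slide were obtained. The /yu/ category on the slide is not a single CMUdict symbol, so it is not counted separately here.

```python
from collections import Counter

# ARPAbet vowel symbols corresponding to the slide's table (without /yu/).
VOWELS = {
    "IY", "IH", "EH", "AE", "UW", "UH", "AH",
    "AA", "AO", "EY", "AY", "OY", "AW", "OW",
}

# Illustrative entries in CMUdict-style format (not an actual excerpt).
SAMPLE = """\
;;; comment lines start with three semicolons
BE  B IY1
TEA  T IY1
SOFA  S OW1 F AH0
CAT  K AE1 T
LAW  L AO1
GO  G OW1
"""


def final_vowel_counts(lines):
    """Count word-final vowels; return (Counter by vowel, number of words)."""
    counts = Counter()
    n_words = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blanks and comments
        _word, *phones = line.split()
        if not phones:
            continue
        n_words += 1
        last = phones[-1].rstrip("012")  # drop the stress digit, if any
        if last in VOWELS:
            counts[last] += 1
    return counts, n_words


counts, n_words = final_vowel_counts(SAMPLE.splitlines())
for vowel, count in counts.most_common():
    print(f"/{vowel.lower()}/ {count} {count / n_words:.5f}")
```

Dividing each count by the total number of words gives the PCNT column, and summing the counts gives the share of words that end in a vowel or diphthong.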
Phonetics: Distinctive Feature Relationships: Vowels
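[Figure: vowel chart with Front / Central / Back on the horizontal axis and High / Mid / Low on the vertical axis, showing the vowels iy, ih, eh, ae, ix, ax, ah, aa, ao, uh, uw and the diphthongs ey, ay, oy, aw, ow, ju]
from Ladefoged, pp. 38, 81, 218 with correction to /aw/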
Phonetics: Distinctive Feature Relationships: Consonants
Place (columns, front to back): bilabial, labiodental, dental, alveolar,
palato-alveolar, palatal, velar, glottal

Manner        Voicing   Phonemes (in column order)
obstruent
  stops       +voice    b  d  g
              -voice    p  t  k
  fricatives  +voice    v  dh  z  zh
              -voice    f  th  s  sh
  affricates  +voice    jh
              -voice    ch
nasals        +voice    m  n  ng
approximant
  glides      +voice    w  y  (w)
  retroflex   +voice    r
  lateral     +voice    l
glottal                 h
from Olive, p. 28 and Daniloff, p. 56
Phonetics: Distinctive Feature Relationships: Consonants
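[Table: consonants classified by place (Labial; Coronal, split into +anterior and -anterior; Dorsal) and by the features nasal, stop, fricative (strong fricative), approximant, sibilant, and lateral.
+nasal: m, n, ng
-nasal, stop: p b, t d, k g, ch jh
-nasal, fricative: f v, th dh, s z, sh zh
-nasal, approximant: w, r, y (-lateral); l (+lateral)]
from Ladefoged, p. 44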
Approximants: Terminology
• “Approximants” are NOT the same as “Semi-Vowels”
(although Rabiner states they are the same…). American
English /r/ is debatable, but we’ll exclude it from the
Semi-Vowels for consistency. (Ladefoged p. 229)
• Approximants can be divided into two groups: Liquids and Glides
Liquid = {/l/, /r/}, Glide = {/w/, /y/}
(Again, Rabiner confuses things by mixing up these sets)
• Lateral = {/l/}
• Retroflex = {/r/, /er/, /axr/}.
(In some cases, /er/ is considered a retroflex but /r/ isn’t;
we’ll keep things simple by calling /r/ a retroflex).
• Central Approximants = {/r/, /w/, /y/},
Lateral Approximant = {/l/}
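The terminology above amounts to a few set relationships, which can be checked directly. This is a minimal sketch; the set and function names are chosen here for illustration.

```python
# Sets as defined on this slide.
LIQUIDS = {"l", "r"}
GLIDES = {"w", "y"}
APPROXIMANTS = LIQUIDS | GLIDES          # liquids and glides together
LATERALS = {"l"}
RETROFLEXES = {"r", "er", "axr"}

# Derived groups: an approximant is lateral if it is in LATERALS,
# otherwise it is central.
LATERAL_APPROXIMANTS = APPROXIMANTS & LATERALS
CENTRAL_APPROXIMANTS = APPROXIMANTS - LATERALS


def labels(phone):
    """Return every category name from the slide that applies to a phone."""
    categories = (
        ("approximant", APPROXIMANTS),
        ("liquid", LIQUIDS),
        ("glide", GLIDES),
        ("lateral", LATERALS),
        ("retroflex", RETROFLEXES),
    )
    return [name for name, members in categories if phone in members]
```

Here `CENTRAL_APPROXIMANTS` works out to {/r/, /w/, /y/}, matching the last bullet, and `labels("er")` returns only the retroflex label because /er/ is a retroflex but not an approximant in this scheme.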
Approximants: Terminology
Approximant
  Semi-Vowel / Glide: /y/, /w/ (central approximants)
  Liquid
    Retroflex: /r, er, axr/ (central approximants)
    Lateral: /l/ (lateral approximant)