From Sounds to Language
Lecture 2
Spoken Language Processing
Prof. Andrew Rosenberg
Linguistic sounds
• How does a sound wave become
language?
• Sounds are continuous wave forms.
• Linguistic units are categorical.
• How is the human perceptual system able
to categorize and combine linguistic
sounds into language?
1
Studying Speech
• Who studies speech?
– Linguists (phoneticians, phonologists, forensic
linguists)
– Speech Engineers
• Speech recognition
• Speech synthesis
• etc.
– Speech Pathologists
– Language Instructors
– Singers
– Marketing experts
2
Marketing experts?
3
Studying speech
• Major questions in studying speech.
– What is the sound inventory of a language?
• Which variations are linguistically relevant?
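– R/L in some Asian languages (e.g. Japanese), where the two are not contrastive
– P/Ph (unaspirated vs. aspirated [p]) in English, where the difference never distinguishes words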
– How are speech sounds produced?
– What sounds are shared by two languages,
and which are not?
– How do sounds vary in context?
• “Green banana” vs. “Greem banana”
4
Representing speech sounds
• Why are representations important?
– translation between sounds and words
• ASR and TTS
– Learning pronunciation
– Having a shared vocabulary to discuss language.
• How should we represent speech sounds?
– Orthography?
– Special symbols?
– Abstract classes based on sound and/or
articulatory similarities
5
Using orthography to represent sounds
• A single orthographic letter is realized in
many different ways (in English)
– o: comb, tomb, bomb
– c: court, center, chess
– oo: food, good, blood
– s: reason, sunrise, shy, collision
6
Using orthography to represent sounds
• A single sound can be written in many
different ways (in English)
– [i]: sea, see, scene, receive, thief
– [s]: cereal, same, miss
– [u]: true, few, choose, lieu, do
– [ay]: lie, prime, pry, buy
• How is orthography looking as a choice in
English?
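The mismatch between spelling and sound can be made concrete with a small lookup. This is only a sketch: the words come from the slides, but the ARPAbet transcriptions are hand-written assumptions (not taken from a pronunciation dictionary), and the helper names `words_containing` and `same_pronunciation` are hypothetical.

```python
# Hand-written ARPAbet transcriptions (an assumption for illustration,
# not an authoritative pronunciation dictionary).
PRONUNCIATIONS = {
    "sea": ["s", "iy"],
    "see": ["s", "iy"],
    "scene": ["s", "iy", "n"],
    "thief": ["th", "iy", "f"],
    "cereal": ["s", "ih", "r", "iy", "ax", "l"],
    "same": ["s", "ey", "m"],
    "miss": ["m", "ih", "s"],
}


def words_containing(phone):
    """Return the sorted words whose transcription contains `phone`."""
    return sorted(w for w, phones in PRONUNCIATIONS.items() if phone in phones)


def same_pronunciation(word_a, word_b):
    """True if two differently spelled words share one transcription."""
    return PRONUNCIATIONS[word_a] == PRONUNCIATIONS[word_b]


if __name__ == "__main__":
    # One sound, many spellings: ea, ee, e_e, ie, e
    print(words_containing("iy"))
    # Two spellings, one pronunciation
    print(same_pronunciation("sea", "see"))
```

Looking words up by sound rather than by letter is what a phonetic symbol set makes possible; orthography alone cannot support this query.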
7
Phonetic Symbol Sets
• International Phonetic Alphabet (IPA)
– Single (unique) character for each sound
– Represents all sounds of the world’s
languages, but is large and requires a special
(non-ASCII) font.
• ARPAbet, TIMIT, etc.
– Multiple characters for each sound
– Language specific. A new symbol set is
required for each language.
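The trade-off between the two symbol sets shows up when converting from one to the other. Below is a minimal sketch, assuming a partial ARPAbet-to-IPA table for American English; the table covers only a subset of symbols, and the function name `arpabet_to_ipa` is hypothetical.

```python
# Partial ARPAbet-to-IPA table for American English (a subset chosen
# for illustration; a complete table would have more entries).
ARPABET_TO_IPA = {
    # consonants
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    "f": "f", "v": "v", "th": "θ", "dh": "ð", "s": "s", "z": "z",
    "sh": "ʃ", "zh": "ʒ", "ch": "tʃ", "jh": "dʒ", "h": "h",
    "m": "m", "n": "n", "ng": "ŋ", "l": "l", "r": "ɹ",
    "w": "w", "y": "j", "dx": "ɾ",
    # vowels
    "iy": "i", "ih": "ɪ", "eh": "ɛ", "ae": "æ", "aa": "ɑ",
    "ao": "ɔ", "uh": "ʊ", "uw": "u", "ah": "ʌ", "ax": "ə",
    "ey": "eɪ", "ay": "aɪ",
}


def arpabet_to_ipa(phones):
    """Convert a space-separated ARPAbet string to an IPA string.

    Raises KeyError for symbols missing from the partial table.
    """
    return "".join(ARPABET_TO_IPA[p] for p in phones.split())


if __name__ == "__main__":
    print(arpabet_to_ipa("th iy f"))     # thief
    print(arpabet_to_ipa("ch ih m p"))   # chimp
```

ARPAbet needs separators because its symbols are multi-character ASCII strings, whereas IPA symbols can be concatenated directly.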
8
Exercise:
Write your full name in English
orthography and in ARPAbet.
9
Sound categories
• Phone: Basic speech sound of a language
– A minimal sound difference between two words
• too vs. zoo
– Not every sound made by a human speaker is
phonetic
• Sniffs, laughs, coughs, breaths…
• Phoneme: Class of speech sounds
– Phoneme may include several phones
– /t/ in top, stop, little, butter, winter
• Allophones: the phonetic variants that
together make up a phoneme.
– e.g. /t/: {[t], [ɾ], …}
10
Speech Production
• The articulatory organs
• General Process:
– Air is expelled from the lungs through the
windpipe (trachea), leaving via the mouth (and
nose)
– From the trachea, air passes through the
larynx, which contains the vocal folds; the
space between them is the glottis.
– When the vocal folds vibrate, voiced sounds are
produced; otherwise the sound is voiceless
(e.g. voiced [v] vs. voiceless [f])
11
Vocal Fold Vibration
Slow motion video of normal vocal folds
12
Articulators
• “Why did Ken set the net on the soggy deck?”
• Queens University ATR Labs X-ray Film Database
http://psyc.queensu.ca/~munhallk/05_database.htm
13
Vocal Organs
14
Recording Articulatory Data
• X-Ray Microbeam Database
– Track motion of small gold pellets on the tongue,
jaw, lips and soft palate
• Electroglottography
– Run a high-frequency current through the glottal area of
a speaker.
– There is lower resistance when the vocal folds
are closed.
• Electromagnetic articulography (EMMA)
– 3 transmitters on a helmet allow for triangulation
of 5-15 sensor positions
15
Classes of Sounds
• Consonants and Vowels
– Consonants:
• Restricted or blocked airflow (e.g. [s])
• Voiced or unvoiced
– Vowels
• Unrestricted airflow
• voiced
– Semivowels (approximants): [w], [y]
16
Consonants: Place of Articulation
• What is the point of maximum air
restriction?
– Labial: bilabial [b], [p]; labiodental [v], [f]
– Dental: [], [] thief vs. them
– Alveolar: [t], [d], [s], [z]
– Palatal: [], [t] shrimp vs. chimp
– Velar: [k], [g]
– Glottal: [?] glottal stop
17
Consonants: Place of Articulation
• What is the point of maximum air
restriction?
– Approximant: [w], [y]
• 2 articulators come close but don’t restrict much
• Somewhere between vowels and consonants
• lateral: [l]
– Tap or flap: [ɾ] e.g. butter
18
Places of Articulation
[Figure: diagram labeling the places of articulation: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal]
http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
19
Consonants: Manner of articulation
• How is the airflow restricted?
– Stop (or plosive): [p], [t], [g], …
• Airflow is completely blocked (closure) and then released
(release)
• Glottal stop, e.g. before word-initial vowels in English
after a pause. “three even”
– Nasal: air is released through the nose [m], [ng]
– Fricative: [s], [z], [f] air is forced through a narrow
channel, leading to turbulent airflow
– Affricates: [tʃ] begin as stops, but the release is
fricative
20
Articulation map
Rows: MANNER OF ARTICULATION. Columns: PLACE OF ARTICULATION.
VOICING: within each pair, the voiceless sound is listed first and the voiced sound second.

MANNER    bilabial  labiodental  interdental  alveolar  palatal  velar  glottal
stop      p b                                 t d                k g    q
fric.               f v          th dh        s z       sh zh           h
affric.                                                 ch jh
nasal     m                                   n                  ng
approx.   w                                   l/r       y
flap                                          dx
21
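The articulation map is effectively a feature table: each consonant is identified by place, manner and voicing. The sketch below encodes part of it as a Python dictionary. The feature values follow the map, while the `Consonant` record and the `voicing_pairs` helper are hypothetical names introduced only for illustration.

```python
from collections import namedtuple

# One record per consonant: place, manner, and whether it is voiced.
Consonant = namedtuple("Consonant", ["place", "manner", "voiced"])

# ARPAbet symbols with features read off the articulation map (a subset).
CONSONANTS = {
    "p": Consonant("bilabial", "stop", False),
    "b": Consonant("bilabial", "stop", True),
    "t": Consonant("alveolar", "stop", False),
    "d": Consonant("alveolar", "stop", True),
    "k": Consonant("velar", "stop", False),
    "g": Consonant("velar", "stop", True),
    "f": Consonant("labiodental", "fricative", False),
    "v": Consonant("labiodental", "fricative", True),
    "th": Consonant("interdental", "fricative", False),
    "dh": Consonant("interdental", "fricative", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
    "sh": Consonant("palatal", "fricative", False),
    "zh": Consonant("palatal", "fricative", True),
    "ch": Consonant("palatal", "affricate", False),
    "jh": Consonant("palatal", "affricate", True),
    "m": Consonant("bilabial", "nasal", True),
    "n": Consonant("alveolar", "nasal", True),
    "ng": Consonant("velar", "nasal", True),
}


def voicing_pairs():
    """Return sorted (voiceless, voiced) pairs sharing place and manner."""
    pairs = []
    for a, fa in CONSONANTS.items():
        for b, fb in CONSONANTS.items():
            same_slot = fa.place == fb.place and fa.manner == fb.manner
            if same_slot and not fa.voiced and fb.voiced:
                pairs.append((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    for voiceless, voiced in voicing_pairs():
        print(voiceless, voiced)
```

Pairs such as s/z differ only in voicing, which is the contrast behind the earlier "too vs. zoo" example of [t] vs. [z] being distinct phones.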
Vowels
• All voiced
• Tongue position
– Height: how high is the tongue? High or low?
– Backness: where is its highest point? Front or back?
• How rounded are the lips?
• Monophthong [eh] vs. diphthong [ey]
– One vowel sound vs. two
22
American English Vowel Space
[Figure: vowel chart with axes HIGH to LOW and FRONT to BACK, plotting iy, ix, ih, ux, uw, uh, ax, eh, ah, ae, ao, aa]
23
Compare to vowel spaces in other
languages
• British English
• Indian English
• Swedish
• Spanish
• Mandarin Chinese
• Japanese
24
[iy] vs [uw] – “key” vs “coo”
(From a lecture given by Rochelle Newman)
25
[ae] vs [aa] – “cat” vs. “cot”
(From a lecture given by Rochelle Newman)
26
Acoustic Landmarks
“Patricia and Patsy and Sally”
[Figure: phone labels on the utterance: [p] [ix] [t] [ih] [sh] [ax], [p] [ae] [t] [s] [iy], [s] [ae] [l] [iy]; the labels [p] [ix] [t] [ih] appear a second time]
27
Coarticulation
• The same phone can be produced differently
depending on phonetic context.
• Articulations overlap as articulators move in
different timing patterns to produce
consecutive sounds
– Eight vs. Eighth
• Articulation moves forward
– Met vs. Men
• Vowel becomes nasalized
– Green Banana
• or “greem” banana?
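The "green banana" case can be written as a context-dependent rewrite rule: [n] takes on the place of a following bilabial. The sketch below is a deliberately simplified, categorical version of that rule. Real coarticulation is gradual, the ARPAbet transcription of the phrase is a hand-written assumption, and `assimilate_nasals` is a hypothetical helper name.

```python
# Bilabial consonants that can trigger the assimilation (ARPAbet symbols).
BILABIALS = {"p", "b", "m"}


def assimilate_nasals(phones):
    """Return a copy of `phones` with [n] changed to [m] before a bilabial."""
    out = list(phones)
    for i in range(len(out) - 1):
        if out[i] == "n" and out[i + 1] in BILABIALS:
            out[i] = "m"
    return out


if __name__ == "__main__":
    # Hand-written transcription of "green banana" (an assumption).
    green_banana = ["g", "r", "iy", "n", "b", "ax", "n", "ae", "n", "ax"]
    print(" ".join(assimilate_nasals(green_banana)))
```

Only the [n] directly before [b] changes; the two [n] sounds inside "banana" are followed by vowels and stay as they are.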
28
Articulator mistiming
• “Probably” is canonically [p r aa b ax b l iy]
– [p r aa b iy]
– [p r aw l uh]
– [p r ah b iy]
– [p r aa l iy]
• “Sense” is canonically [s eh n s]
– [s eh n t s]
– [s ih t s]
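One way to quantify how far a produced variant drifts from the canonical form is an edit distance over phone sequences. The sketch below uses plain Levenshtein distance with unit costs, which is a choice made here for illustration and not something the slides prescribe; `phone_edit_distance` is a hypothetical name.

```python
def phone_edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    probably = "p r aa b ax b l iy".split()
    for variant in ["p r aa b iy", "p r aa l iy"]:
        print(variant, phone_edit_distance(probably, variant.split()))

    sense = "s eh n s".split()
    for variant in ["s eh n t s", "s ih t s"]:
        print(variant, phone_edit_distance(sense, variant.split()))
```

Under unit costs, [p r aa b iy] is three deletions away from the canonical "probably", and [s eh n t s] is a single insertion away from the canonical "sense".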
29
IPA Consonants
30
IPA Vowels
31
Representations for Sounds
• With ways to represent sounds (IPA, ARPAbet,
etc.) we can classify and manipulate these
units.
– Automatic Speech Recognition
– Speech synthesis
– Speech pathology
– Language ID
– Speaker ID
• But…how do we recognize these different
sounds automatically from sound data?
– Acoustic analysis (digital signal processing)
32
Next Class
• Overview of Spoken Dialog Systems
• Readings: J&M 24.1, 24.2
33