Chapter 13 Speech Perception

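The Speech Stimulus
• Computer speech recognition systems still fall short of ideal performance, because
– Speech perception is discrete, but the speech signal is continuous and overlapping
– The properties of the speech signal vary with the speaker's sex, age, speaking rate, and accent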
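• Phonemes
– The basic components of the speech stimulus
• The smallest unit of sound that, when changed, changes the meaning of a word
• The unit of speech perception
– Syllable? Phoneme?
– The number of phonemes differs across languages
• English: 37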
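• The acoustic signal
– Produced by air-pressure changes created by the vocal apparatus
– Articulators
• Tongue, lips, teeth, jaw, soft palate
– Vowels
• Produced by vibration of the vocal cords
• Each vowel has its own vocal tract shape, which changes the resonant frequencies
• Energy is concentrated at a few main frequencies
– Formants
» Peak frequencies
» 1st formant, 2nd formant, etc. (from low to high frequency)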
• Sound spectrogram
– Formants of /ae/: 500, 1700, and 2500 Hz (Fig. 13.3)
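– Consonants
• Produced by closing or constricting the vocal tract
– e.g., /d/ vs. /f/
• Formant transition
– Rapid frequency changes just before or after a formant, associated with consonants
» e.g., T2 and T3 are associated with /r/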
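To make the idea of formants as peak frequencies concrete, here is a minimal sketch. It builds a toy signal with energy at the three /ae/ formant frequencies listed above and recovers them as the largest peaks of its spectrum. The sample rate, signal length, and relative amplitudes are assumptions for illustration, and a sum of three sinusoids is only a crude stand-in for a real vowel.

```python
import cmath
import math

SAMPLE_RATE = 8000              # Hz (assumed)
N = 800                         # samples, so each DFT bin is 10 Hz wide
FORMANTS = [500, 1700, 2500]    # Hz, the /ae/ values from the slide
AMPLITUDES = [1.0, 0.6, 0.3]    # assumed relative strengths

# Toy "vowel": one sinusoid per formant frequency.
signal = [
    sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE)
        for f, a in zip(FORMANTS, AMPLITUDES))
    for n in range(N)
]

def magnitude_spectrum(x):
    """Magnitude of a plain DFT for bins 0 .. len(x)/2 - 1."""
    size = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size)))
        for k in range(size // 2)
    ]

spectrum = magnitude_spectrum(signal)
bin_hz = SAMPLE_RATE / N

# The three strongest bins, reported from low to high frequency.
strongest = sorted(range(len(spectrum)), key=spectrum.__getitem__, reverse=True)[:3]
peak_frequencies = [k * bin_hz for k in sorted(strongest)]
print(peak_frequencies)  # [500.0, 1700.0, 2500.0]
```

A spectrogram repeats this kind of analysis over successive short stretches of the signal, which is how formant transitions show up as rapid shifts in the peaks.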
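The Speech Stimulus and Speech Perception
• The segmentation problem
– How does speech perception divide the continuous speech signal into individual words?
– In the spectrogram the acoustic signal is continuous, yet listeners perceive clear word boundaries
• Visual analogy: scene → objects
• Speech in an unfamiliar foreign language also sounds continuous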
• The variability problem
– The mapping between the acoustic signal and phonemes depends on context
• /i/: 1st formant = 200 Hz, 2nd formant = 2600 Hz
• /u/: 1st formant = 200 Hz, 2nd formant = 600 Hz
– But the formant transition for /d/ differs between /di/ and /du/
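• Coarticulation
– The shape of the vocal tract while producing a phoneme is influenced by the shapes needed for the neighboring phonemes
» e.g., boat vs. bat
– Yet despite coarticulation, we perceive the phoneme as the same
» Perceptual constancy
– Speakers
• Pitch, speed, accent
– "This was a best buy": "bes" vs. "best"
– "Did you go to the store?": "dijoo" vs. "did you"
(Fig. 13.7)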
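Key Findings and Issues in Speech Perception
• The search for invariant acoustic cues
– Acoustic cues that stay the same when a phoneme is produced in different contexts or by different speakers, and so can be used to identify it
– Short-term spectrum
– Running spectral display: a series of short-term spectra
– Invariant acoustic cues in the running spectral display
• e.g., continuing low-frequency peaks indicate vocal cord vibration; /da/ has them, but /pi/ does not
– But invariant cues can be found for only some speech sounds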
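A running spectral display, as described above, is a series of short-term spectra. The sketch below shows one possible way to compute such a series and to check for a continuing low-frequency peak. The sample rate, frame length, cutoff, threshold, and the two toy signals are all assumptions for illustration; real analyses typically use windowed, overlapping frames on recorded speech.

```python
import cmath
import math

SAMPLE_RATE = 4000   # Hz (assumed for this toy example)
FRAME_LEN = 200      # samples per frame: 50 ms, 20 Hz per DFT bin

def short_term_spectrum(frame):
    """Magnitude of a plain DFT of one short frame (bins 0 .. len/2 - 1)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

def running_spectral_display(signal, frame_len=FRAME_LEN):
    """A series of short-term spectra, one per consecutive frame."""
    return [
        short_term_spectrum(signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def has_continuing_low_peak(display, cutoff_hz=300, min_magnitude=10.0):
    """True if every frame has a peak below cutoff_hz (hypothetical voicing check)."""
    bin_hz = SAMPLE_RATE / FRAME_LEN
    low_bins = range(1, int(cutoff_hz / bin_hz) + 1)
    return all(
        max(spectrum[k] for k in low_bins) > min_magnitude
        for spectrum in display
    )

def tone(freq_hz, amplitude, n_samples):
    """A pure sinusoid used as a building block for the toy signals."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]

n_total = 5 * FRAME_LEN
high = tone(1000, 0.5, n_total)
# Toy "voiced" sound: a 100 Hz component runs through every frame.
voiced = [h + low for h, low in zip(high, tone(100, 1.0, n_total))]
# Toy "voiceless" sound: no low-frequency component at all.
voiceless = high

print(has_continuing_low_peak(running_spectral_display(voiced)))     # True
print(has_continuing_low_peak(running_spectral_display(voiceless)))  # False
```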
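• Categorical perception
– When a physical dimension of the speech sound is varied continuously, the change in perception is nonetheless categorical
– VOT (voice onset time)
• The time interval between the onset of the sound and the onset of vocal cord vibration
– VOT is varied continuously, but listeners perceive only two categories
• Phonetic boundary: the VOT at which the perceived category changes
– Categorical perception is a form of constancy
• When the VOTs of two sounds to be discriminated fall on the same side of the phonetic boundary, they are heard as the same sound
• This supports more direct and efficient analysis and perception of speech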
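The logic of categorical perception can be sketched as a step function over VOT. The category labels /da/ and /ta/ and the 35 ms boundary below are assumptions chosen for illustration; the slides do not give a boundary value. Real identification curves are also steep rather than perfectly sharp.

```python
PHONETIC_BOUNDARY_MS = 35  # assumed boundary, for illustration only

def perceived_category(vot_ms, boundary_ms=PHONETIC_BOUNDARY_MS):
    """Map a continuous VOT value onto one of two perceived categories."""
    return "/da/" if vot_ms < boundary_ms else "/ta/"

def heard_as_different(vot_a_ms, vot_b_ms, boundary_ms=PHONETIC_BOUNDARY_MS):
    """Two sounds are told apart only if they straddle the phonetic boundary."""
    return (perceived_category(vot_a_ms, boundary_ms)
            != perceived_category(vot_b_ms, boundary_ms))

# VOT changes in equal 10 ms steps, but the label changes only once.
labels = [perceived_category(vot) for vot in range(0, 81, 10)]
print(labels)

print(heard_as_different(10, 30))  # False: same side of the boundary
print(heard_as_different(30, 50))  # True: opposite sides of the boundary
```

Both pairs differ by the same 20 ms, yet only the pair that crosses the boundary is heard as two different sounds, which is the constancy described above.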
• The multimodal nature of speech perception
– McGurk effect (with eyes closed, the sound is perceived correctly)
http://www.infectiousvideos.com/index.php?p=showvid&a=playvid&sid=1419&cr=hotplay
– Lipreading activates visual (motion) areas and auditory areas of the temporal lobe
• Silent lipreading vs. static condition
Participants watch a silent video of a speaker's lips moving and repeat the number in their minds, vs. watch a static face and repeat the number "one" in their minds
Perceiving Words in Sentences
• Miller and Isard
– Shadowing: repeating aloud sentences heard through earphones
• 89% accurate with normal sentences
• 79% accurate for anomalous sentences
• 56% accurate for ungrammatical word strings
– Knowledge helps us understand sentences even in a noisy environment.
Perceiving Breaks between Words
• The segmentation problem - there are no physical
breaks in the continuous acoustic signal.
• Segmentation is affected by context, meaning, and our
knowledge of word structure.
• Knowledge of word structure
– Transitional probabilities: the chance that one sound will
follow another in a language
• Statistical learning: the process of learning transitional probabilities
and other language characteristics
– Infants as young as eight months show statistical learning.
• Learning phase:
– 4 nonsense words (bidaku, padoti, golabu, & tupiro) form a 2-minute string of words in random order.
• Transitional probabilities
– Syllables within a word:
bidaku: bi + da ~ 100%
– Syllables between words:
ku (bidaku) + pa (padoti) ~ 33%
+ tu (tupiro) ~ 33%
+ go (golabu) ~ 33%
• Testing phase
• Whole-words
• Part-words: the end of one word plus the beginning of the next
– Ex. tibida: padoti + bidaku
• Infants treated part-words as new stimuli in the testing phase
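The transitional probabilities above can be estimated directly from a syllable stream. The following is a minimal sketch: it generates a random string of the four nonsense words and counts adjacent syllable pairs. The stream length, the random seed, and the rule that a word never follows itself are assumptions made so that each between-word transition comes out near 33%.

```python
import random
from collections import Counter

WORDS = ["bidaku", "padoti", "golabu", "tupiro"]

def syllables(word):
    """Split a six-letter nonsense word into three two-letter syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def make_stream(n_words, seed=0):
    """Concatenate words in random order, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(syllables(word))
        previous = word
    return stream

def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

stream = make_stream(3000)
tp = transitional_probabilities(stream)

print(tp[("bi", "da")])            # within a word: 1.0
print(round(tp[("ku", "pa")], 2))  # across a word boundary: close to 1/3
```

A part-word such as tibida contains the low-probability transition ti + bi, which is what distinguishes it statistically from a whole word.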
Cognitive Dimensions of Speech Perception
• Meaning and segmentation
– Segmentation
Cues for where words begin and end are not 100% unambiguous
Anna Mary Candy Lights Since Imp Pulp Lay Things
vs.
An American delights in simple playthings
– Big girl vs. big Earl
– How to recognize speech?
vs.
How to wreck a nice beach
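– A Chinese example: the same string of characters read with different segmentations
• As a quatrain: 黄河遠上白雲間,一片孤城萬仞山,羌笛何須怨楊柳,春風不渡玉門關
• Unsegmented, without 間: 黄河遠上白雲一片孤城萬仞山羌笛何須怨楊柳春風不渡玉門關
• Re-segmented: 黄河遠上,白雲一片,孤城萬仞山,羌笛何須怨,楊柳春風,不渡玉門關 (Ji Xiaolan vs. the Qianlong Emperor)
– Another example
• 清明時節雨,紛紛路上,行人欲斷魂,借問酒家何處,有牧童,遙指杏花村
• 清明時節雨紛紛,路上行人欲斷魂。借問酒家何處有?牧童遙指杏花村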
• Meaning and phoneme perception
– Judging which stimuli start with /b/: 631 ms for nonwords vs. 580 ms for real words
– The phonemic restoration effect
"The state governors met with their respective legislatures convening in the capital city"
/s/ is replaced by a cough
• Participants could not identify where the cough occurred, nor could they tell that /s/ was missing
Warren (1970)
– The phonemic restoration effect is affected by
• Whether the word is a real word
• Whether the masking sound is similar to the
missing phoneme
• Meaning and word perception
– Shadowing accuracies were
• 89% for normal sentences (Gadgets simply work around the house)
• 79% for anomalous sentences (Gadgets kill passengers from the eyes)
• 56% for ungrammatical strings (Between gadgets highways passengers the steal)
– Under noise they were 63%, 22%, and 3%
Miller & Isard (1963)
• These findings show that knowledge helps listeners decode the acoustic signal into phonemes, words, and sentences
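One way to read the shadowing numbers is to ask how much of the quiet-condition accuracy survives under noise. The small calculation below uses only the percentages reported above; "percent retained" is a summary measure introduced here for illustration, not one reported in the slides.

```python
# (accuracy in quiet, accuracy under noise), in percent, from the figures above
ACCURACY = {
    "normal": (89, 63),
    "anomalous": (79, 22),
    "ungrammatical": (56, 3),
}

def percent_retained(quiet, noise):
    """Share of quiet-condition accuracy that is still achieved under noise."""
    return 100 * noise / quiet

retained = {
    condition: round(percent_retained(quiet, noise))
    for condition, (quiet, noise) in ACCURACY.items()
}
print(retained)  # {'normal': 71, 'anomalous': 28, 'ungrammatical': 5}
```

The less meaning and grammar a listener can draw on, the smaller the share of performance that survives the noise.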
• Speaker characteristics
– Indexical characteristics
– Age, sex, place of origin, emotion, intent (tone of voice)
• e.g., 'yeah, right'
• Speaker identity affects recognition accuracy
Physiology of speech perception
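• Neural responses to speech
– Reflect the frequency components of speech
– Areas of the monkey brain corresponding to the human speech area respond to monkey calls
• Localization of function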
– Broca’s area (frontal lobe)
• Speaking
– Wernicke’s area (temporal lobe)
• Understanding speech
– Aphasia
• Damage to Broca's or Wernicke's area results in aphasia
• Wernicke’s aphasia
– Cannot comprehend speech
– Produce word salads
– A problem similar to the inability to
perceive and produce phonemes foreign
to the language
• Some patients with brain damage have trouble discriminating syllables yet can still understand words.
– A “voice area” in the
STS activated more by
human voices than other
sounds.
– Dual stream model of
speech perception:
• Ventral stream for
recognizing speech
• Dorsal stream linking the
acoustic signal to
movements for producing
speech
Motor Theory of Speech Perception
• Motor mechanisms responsible for producing
sounds activate mechanisms for perceiving
sound.
– Monkeys have audiovisual mirror neurons.
• e.g., they respond to hearing the sound of a peanut being broken
The relationship between speech perception and speech production
Fig. 13-18, p. 299
Is speech perception special?
• Does speech perception
differ from the perception
of other auditory stimuli?
• Duplex perception
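– (Figure at right) Participants hear
• /da/ ← speech mode
• a 'chirp' ← auditory mode
– When the 3rd formant transition is at low intensity, listeners hear only /da/; duplex perception arises only at high intensity
→ Speech takes priority over general auditory signals
– However, other researchers have produced duplex perception with non-speech stimuli
• Sound of a door closing
– High frequencies to the left ear, low frequencies to the right ear
– Participants hear a "door closing" sound plus a "rustling" sound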
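American vs. Japanese Listeners
• Phoneme perception
– 6 months old
• Both Japanese and American infants can discriminate /r/ and /l/
– 12 months old
• Japanese infants can no longer discriminate them, while American infants discriminate them even better
– Also the case for babbling
– Adults
• American: categorical perception
• Japanese: continuous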
Subvocal speech recognition
• http://www.nasa.gov/centers/ames/news/releases/2004/subvocal/subvocal.html
• NASA Ames Research Center
• silently spelled out ‘NASA’
• then submitted it to a Web search engine
• electronically numbered the Web pages
that came up as search results
• used the numbers again to choose Web
pages to examine
• we could browse the Web without touching
a keyboard
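Final Exam Practice Questions
• How can trichromatic theory and opponent-process theory be integrated to explain human color experience?
• Explanations of the Müller-Lyer illusion.
• The corollary discharge theory account of motion perception.
• J. J. Gibson's ecological approach to perception.
• Békésy's place theory of hearing and the evidence supporting it.
• What is categorical perception? Why is it important?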