Identifying frication & aspiration noise in the frequency

The role of prosody in dialect synthesis and authentication
Kyuchul Yoon
Division of English
Kyungnam University
Spring 2008 Joint Conference of KSPS & KASS
Goals
1. Synthesize Masan utterances from matching Seoul utterances by prosody cloning
2. Examine the role of prosody in the authentication of synthetic Masan utterances (listening experiment)
Background
• Differences among dialects
– Segmental differences
• Fricative differences in the time domain (Lee, 2002)
– Busan fricatives have shorter frication/aspiration intervals than Seoul fricatives
• Fricative differences in the frequency domain (Kim et al., 2002)
– The low cutoff frequency of Kyungsang fricatives was higher than that of Cholla fricatives (> 1,000 Hz)
– Non-segmental or prosodic differences
• Intonation or fundamental frequency (F0) contour difference
• Intensity contour difference
• Segment durational difference
• Voice quality difference
Synthesis
• Simulating (by prosody cloning) Masan dialect from Seoul dialect
• The simulated Masan utterances will have
– the speech segments of Seoul dialect
– the prosody of Masan dialect
• F0 contour
• Intensity contour
• Segmental duration
Evaluation
• Through a listening experiment
• Stimuli consist of
– #1. Authentic, but synthetic, Masan utterance
– #2. Seoul utterance with Masan segmental durations (D)
– #3. Seoul utterance with Masan F0 contour (F)
– #4. Seoul utterance with Masan intensity contour (I)
– #5. Seoul utterance with Masan durations and F0 contour (D+F)
– #6. Seoul utterance with Masan durations and intensity contour (D+I)
– #7. Seoul utterance with Masan F0 contour and intensity contour (F+I)
– #8. Seoul utterance with Masan durations, F0 contour and intensity contour (D+F+I)
• Sentences: (1) 동대구에 볼 일이 없습니다. ("I have no business in Dongdaegu.") (2) 바다에 보물섬이 없다. ("There is no treasure island in the sea.")
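The eight stimulus types per sentence follow a simple pattern: the Masan utterance itself, plus every non-empty combination of the three transplanted features. A minimal sketch of that enumeration is shown here; the function names are hypothetical and only the labels D, F and I come from the slide.

```python
from itertools import combinations

# Prosodic features that can be transplanted from Masan onto Seoul speech.
# Labels follow the slide: D = durations, F = F0 contour, I = intensity contour.
FEATURES = ("D", "F", "I")

def seoul_based_conditions():
    """Return every non-empty feature subset, e.g. 'D', 'D+F', 'D+F+I'."""
    out = []
    for size in range(1, len(FEATURES) + 1):
        for combo in combinations(FEATURES, size):
            out.append("+".join(combo))
    return out

def stimuli_per_sentence():
    """Condition #1 is the Masan utterance; the rest are Seoul-based."""
    return ["Masan"] + seoul_based_conditions()

conditions = stimuli_per_sentence()
print(conditions)
print(len(conditions) * 2)  # two sentences in the experiment
```

Three binary features give 2^3 - 1 = 7 Seoul-based conditions, so each sentence contributes 8 stimuli.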
Prosody transfer (PSOLA algorithm)
• Three aspects of the prosody
– Fundamental frequency (F0) contour
– Intensity contour
– Segmental durations
• Pitch-Synchronous OverLap and Add (PSOLA) algorithm (Moulines & Charpentier, 1990)
– Implemented in Praat (Boersma, 2005)
– Use of a script for semi-automatic segment-by-segment manipulation (Yoon, 2007)
Prosody transfer (PSOLA algorithm)
• Procedures for full prosody transfer
– Align segments between Masan and Seoul utterances
– Make the segment durations of the two identical
– Make the two F0 contours identical
– Make the two intensity contours identical
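The duration step can be pictured as computing one stretch factor per aligned segment. The sketch below is not the Praat script cited on the slides; it is a plain-Python illustration that assumes each utterance is already labeled as a list of (label, start, end) tuples in seconds, and the segment times are made-up values.

```python
def duration_factors(seoul_segments, masan_segments):
    """Per-segment stretch factors that map Seoul durations onto Masan ones.

    Each argument is a list of (label, start_s, end_s) tuples, assumed to be
    aligned one-to-one (same labels, same order).
    """
    if len(seoul_segments) != len(masan_segments):
        raise ValueError("segment lists must be aligned one-to-one")
    factors = []
    for (s_lab, s_start, s_end), (m_lab, m_start, m_end) in zip(
        seoul_segments, masan_segments
    ):
        if s_lab != m_lab:
            raise ValueError(f"label mismatch: {s_lab!r} vs {m_lab!r}")
        s_dur = s_end - s_start
        m_dur = m_end - m_start
        if s_dur <= 0 or m_dur <= 0:
            raise ValueError("segments must have positive duration")
        # A factor above 1 lengthens the Seoul segment, below 1 shortens it.
        factors.append((s_lab, m_dur / s_dur))
    return factors

# Hypothetical segment times, for illustration only.
seoul = [("b", 0.00, 0.05), ("a", 0.05, 0.15)]
masan = [("b", 0.00, 0.04), ("a", 0.04, 0.19)]
for label, factor in duration_factors(seoul, masan):
    print(label, round(factor, 2))
```

A resynthesis method such as PSOLA would then apply each factor to the corresponding Seoul segment.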
Prosody transfer (PSOLA algorithm)
Align segments between Masan and Seoul utterances
Make the segment durations of the two utterances identical
[Figure: aligned segments (ㅂ ㅏ ㄹ ㅏ ㅁ) of the Masan and Seoul utterances for "…바람…"]
Prosody transfer (PSOLA algorithm)
Make the two F0 contours identical
[Figure: Masan F0 and Seoul F0 contours over the aligned segments (ㅂ ㅏ ㄹ ㅏ ㅁ) of the Masan and Seoul utterances]
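One way to think about copying an F0 contour between two aligned utterances is to move each pitch point onto the target timeline, segment by segment. The sketch below is an illustrative generalization, not the procedure from the slides: there the durations are equalized first, in which case the time warp reduces to the identity and the Masan contour can simply replace the Seoul one. All names and values are hypothetical.

```python
def warp_time(t, src_segments, dst_segments):
    """Map time t from the source timeline to the destination timeline.

    Both arguments are lists of (start_s, end_s) pairs for aligned segments.
    Within a segment the mapping is linear.
    """
    for (s_start, s_end), (d_start, d_end) in zip(src_segments, dst_segments):
        if s_start <= t <= s_end:
            rel = (t - s_start) / (s_end - s_start)
            return d_start + rel * (d_end - d_start)
    raise ValueError("time lies outside the aligned segments")

def transfer_f0(masan_points, masan_segments, seoul_segments):
    """Place Masan (time, Hz) pitch points onto the Seoul timeline."""
    return [
        (warp_time(t, masan_segments, seoul_segments), hz)
        for t, hz in masan_points
    ]

# Hypothetical values, for illustration only.
masan_segs = [(0.0, 0.1), (0.1, 0.3)]
seoul_segs = [(0.0, 0.2), (0.2, 0.3)]
masan_f0 = [(0.05, 120.0), (0.2, 150.0)]
print(transfer_f0(masan_f0, masan_segs, seoul_segs))
```

The F0 values themselves are left untouched; only their time stamps move.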
Prosody transfer (PSOLA algorithm)
Make the two intensity contours identical
[Figure: Masan intensity and Seoul intensity contours over the aligned segments (ㅂ ㅏ ㄹ ㅏ ㅁ) of the Masan and Seoul utterances]
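Matching intensity contours amounts to scaling the Seoul signal so that its level in dB follows the Masan level. A minimal sketch of the arithmetic, assuming both contours are sampled at the same time points (which holds once durations are equalized); the function name and the dB values are hypothetical.

```python
def gain_factors(masan_db, seoul_db):
    """Linear amplitude multipliers that bring Seoul frames to Masan levels.

    A level difference of d dB corresponds to an amplitude ratio of 10**(d/20).
    """
    if len(masan_db) != len(seoul_db):
        raise ValueError("contours must have the same number of frames")
    return [10 ** ((m - s) / 20) for m, s in zip(masan_db, seoul_db)]

# Hypothetical frame levels in dB, for illustration only.
masan_levels = [70.0, 65.0, 60.0]
seoul_levels = [50.0, 65.0, 66.0]
print([round(g, 3) for g in gain_factors(masan_levels, seoul_levels)])
```

A frame that is 20 dB too quiet is multiplied by 10, and a frame already at the target level is left unchanged.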
Synthetic (simulated) Masan stimulus
Synthetic authentic Masan stimulus
Listening experiment
• 16 stimuli (8 + 8)
• Presented to 13 Masan/Changwon listeners
– On a scale of 1 (worst) to 10 (best)
– Used Praat ExperimentMFC object
– Allowed repetition of stimulus: up to 10 times
Results & Conclusion
Histogram of listener responses
Results & Conclusion
[Figure: listener responses on the 1 to 10 scale; F0 contour transfer]
Results & Conclusion
[Figure: listener responses by stimulus type, with axis labels in extracted order: Masan, F, D, FI, DF, I, DFI, DI (Seoul utterances with Masan prosody)]
Results & Conclusion
• Main effects of
– Segmental durations; F(1,12)=11.53, p=0.005
– F0 contour; F(1,12)=141.12, p=0.00000005
• Regression analysis
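The degrees of freedom in F(1,12) are consistent with a within-listener design with 13 listeners (13 - 1 = 12). For a two-level within-listener factor, the main-effect F equals the square of a paired t statistic on each listener's mean rating with and without the feature. The sketch below illustrates that relation only; it assumes this is the kind of analysis behind the slide, and the ratings in the example are invented.

```python
from math import sqrt
from statistics import mean, stdev

def main_effect_f(with_feature, without_feature):
    """F statistic and (df1, df2) for a two-level within-listener factor.

    Each list holds one mean rating per listener. F is computed as the
    square of the paired t statistic on the per-listener differences.
    """
    if len(with_feature) != len(without_feature):
        raise ValueError("need one pair of ratings per listener")
    diffs = [a - b for a, b in zip(with_feature, without_feature)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / sqrt(n))
    return t * t, (1, n - 1)

# Invented ratings for three listeners, for illustration only.
f_value, df = main_effect_f([5.0, 7.0, 9.0], [4.0, 5.0, 6.0])
print(round(f_value, 2), df)
```

With 13 pairs of ratings the function returns degrees of freedom (1, 12), the same shape as the values reported above.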
Results & Conclusion
• Prosody cloning not sufficient for dialect simulation
– (Sub)Segmental differences may be at work
– Quality of synthetic stimuli
• F0 contour transfer (from Masan to Seoul)
– Most influential on shifting perception from Seoul to Masan utterances
References
[1] Kyung-Hee Lee, “Comparison of acoustic characteristics between Seoul and Busan dialect on fricatives”, Speech Sciences, Vol.9/3, pp.223-235, 2002.
[2] Hyun-Gi Kim, Eun-Young Lee, and Ki-Hwan Hong, “Experimental phonetic study of Kyungsang and Cholla dialect using power spectrum and laryngeal fiberscope”, Speech Sciences, Vol.9/2, pp.25-47, 2002.
[3] Kyuchul Yoon, “Imposing native speakers’ prosody on non-native speakers’ utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature, Vol.25/4, pp.197-215, 2007.
[4] E. Moulines and F. Charpentier, “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication, Vol.9/5-6, 1990.
[5] P. Boersma, “Praat, a system for doing phonetics by computer”, Glot International, Vol.5, 9/10, pp.341-345, 2005.