Transcript CTT and TMH
Processing the Prosody of Oral Presentations
Rebecca Hincks
KTH, The Royal Institute of Technology
Department of Speech, Music and Hearing
The Unit for Language and Communication
English in Sweden
• A second language rather than a foreign language
• Nearly all beginners are children
• ASR not appropriate or necessary for acquisition of sounds
Support for advanced L2 users?
• Vision: Speech checker analogous to a spellchecker or grammar checker
• Practice an oral presentation, get feedback on:
  – Lexicon
  – Pronunciation
  – Prosody
• Making a presentation can be difficult in a native language, and is even more difficult in an L2
• Standard advice for how to deliver a presentation: use a lively voice, don’t speak too fast, take pauses
• These qualities can be processed automatically using speech analysis
What is a lively voice?
• A voice that varies in pitch and rhythm
• A voice that shows enthusiasm
• Difficult for native speakers, but more difficult for non-native speakers
• Studies have shown that non-natives use a narrower pitch range than natives (Pickering 2004)
• Tools for helping speakers increase their liveliness should be welcomed
• Research Question: How can we measure liveliness automatically?
Corpus of student speech
• Audio recordings of 35 ten-minute presentations in English made by engineering students
• Recordings made in the classroom
• Selected 10 women and 10 men
  – Varied levels of ability in English
  – All native speakers of Swedish
• Written feedback on the presentations from teachers and classmates
• In preparation: listener ratings of liveliness and fluency
Pitch dynamism quotient, PDQ

$$\mathrm{PDQ} = \frac{\text{standard deviation of } F_0 \text{ in Hertz}}{\text{mean } F_0 \text{ in Hertz}}$$

• F0 = fundamental frequency = pitch
• Necessary to normalize the standard deviation in order to compare voices that are naturally high or naturally low
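A minimal sketch of how the quotient on this slide could be computed from a list of F0 values. The function name, the convention that unvoiced frames are coded as 0, and the use of the population standard deviation are assumptions for illustration; the slides do not specify them.

```python
from statistics import fmean, pstdev


def pitch_dynamism_quotient(f0_hz):
    """Return std(F0) / mean(F0), computed over voiced frames only."""
    # Assumption: unvoiced frames are coded as 0 and are excluded.
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        return None  # not enough voiced frames to measure variation
    # Assumption: population standard deviation (not the sample version).
    return pstdev(voiced) / fmean(voiced)


# Two made-up contours with the same relative variation:
# one low-pitched, one exactly an octave higher.
low_voice = [100, 0, 120, 80, 0, 100]
high_voice = [200, 0, 240, 160, 0, 200]

print(pitch_dynamism_quotient(low_voice))   # about 0.141
print(pitch_dynamism_quotient(high_voice))  # about 0.141
```

The two invented contours differ in standard deviation (roughly 14 Hz versus 28 Hz) but give the same quotient, which is the point of dividing by the mean.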
Time, frequencies and editing
• Between 7 and 10 minutes per person
• Divided into intervals of (1 min, 30 s, 15 s,) 10 seconds
• WaveSurfer’s ESPS settings: 60-400 Hz men, 75-600 Hz women
• Have also analyzed at 25-400 Hz men, 25-500 Hz women
• Visually inspected every contour and edited away as many errors as possible
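The division into ten-second intervals could be sketched as follows: an F0 track is cut into consecutive windows and the quotient of standard deviation to mean is computed per window. The frame rate of 100 F0 values per second, the zero coding of unvoiced frames, and all function names are assumptions, not details taken from the slides.

```python
from statistics import fmean, pstdev


def pdq(f0_hz):
    """Standard deviation of F0 divided by mean F0, voiced frames only."""
    voiced = [f for f in f0_hz if f > 0]  # assumption: 0 means unvoiced
    if len(voiced) < 2:
        return None
    return pstdev(voiced) / fmean(voiced)


def windowed_pdq(f0_hz, frames_per_second=100, window_seconds=10):
    """Return one PDQ value per consecutive window of the F0 track.

    frames_per_second is an assumed frame rate; the final window may be
    shorter than the others and is kept as is.
    """
    size = frames_per_second * window_seconds
    return [pdq(f0_hz[i:i + size]) for i in range(0, len(f0_hz), size)]


# Tiny made-up track, one frame per second and four-second windows,
# so that the windows are easy to inspect by hand.
track = [100, 120, 80, 100, 200, 200, 200, 200, 0, 0, 0, 0]
print(windowed_pdq(track, frames_per_second=1, window_seconds=4))
```

In this toy track the first window varies, the second is completely flat (quotient 0.0), and the third has no voiced frames, so it yields `None` rather than a number.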
Mean pitch dynamism quotient for 7-10 minutes of speech
[Figure: mean PDQ per student, females and males; x-axis: student, by placement test; y-axis: PDQ]
Three proficient speakers
[Figure: PDQ for speakers 85OM, 88TO and 89EH; x-axis: consecutive time periods of 10 seconds; y-axis: PDQ, 0.00 to 0.35]
Lively speaker 1
[Figure: PDQ for speaker 85OM4; x-axis: consecutive time periods of 10 seconds; y-axis: PDQ, 0.00 to 0.35]
• Mean PDQ: .23
“the divergence”
The presentation was “well-structured,” “confident,” “easy to follow,” and “very coherent,” and the speech “well-modulated” with “varied intonation.”
Lively speaker 2
• Mean PDQ: .21
Her presentation was “well-rehearsed” and “professional.”
[Figure: PDQ for speaker 88TO4; x-axis: consecutive time periods of 10 seconds; y-axis: PDQ, 0.00 to 0.35]
Monotone speaker
• Mean PDQ: .12
“why is voice over IP interesting?”
[Figure: PDQ for speaker 89EH4; x-axis: consecutive time periods of 10 seconds; y-axis: PDQ, 0.00 to 0.35]
Delivery was “a little deadpan,” “more animated facial expressions would be good,” and the presentation would be improved by “showing more enthusiasm.”
Selection of files for listening test
• Test values; 9 per speaker
  – 3 lowest PDQ
  – 3 highest
[Figure: PDQ of the selected segments, males and females; x-axis: individual ten-second segment; y-axis: PDQ, 0.00 to 0.40]
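One way the lowest and highest segments might be picked from a speaker's per-segment values, as a sketch only. The function name and the sample values are invented; the slide names only the lowest and highest groups, so the selection of the remaining test values per speaker is left out here.

```python
def select_extremes(segment_pdq, n=3):
    """Return (lowest, highest): indices of the n segments with the lowest
    and the n segments with the highest PDQ, each in ascending PDQ order.

    Assumes len(segment_pdq) >= 2 * n so the two groups do not overlap.
    """
    ranked = sorted(range(len(segment_pdq)), key=lambda i: segment_pdq[i])
    return ranked[:n], ranked[-n:]


# Made-up PDQ values for eight ten-second segments of one speaker.
values = [0.12, 0.30, 0.08, 0.21, 0.25, 0.15, 0.18, 0.33]
lowest, highest = select_extremes(values)
print(lowest)   # [2, 0, 5]
print(highest)  # [4, 1, 7]
```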
Conclusions
• Normalized standard deviation can be used as a measure of liveliness in speaking styles used for oral presentations
• Hypothesis: PDQ values over .15 lively, over .30 very lively, between .20 and .25 a good target
  – Different preferences depending on personality and culture?
• Unclear effect of Swedish L1 and of proficiency in English
• Applications: teaching, presentation skills
• Appropriate feedback: not values but a talking head that moves from alert to sleepy
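The hypothesized thresholds could be turned into coarse feedback along these lines. This is a sketch under assumptions: the function name and the label for values at or below .15 are not from the slides, and the thresholds themselves are presented there as a hypothesis rather than an established scale.

```python
def describe_pdq(pdq):
    """Map a PDQ value to a coarse label plus a flag for the target band."""
    if pdq > 0.30:
        label = "very lively"
    elif pdq > 0.15:
        label = "lively"
    else:
        # Assumption: the slides give no name for values at or below .15.
        label = "below the lively threshold"
    in_target_band = 0.20 <= pdq <= 0.25  # hypothesized "good target"
    return label, in_target_band


# Mean PDQ values reported for the three example speakers.
for speaker, mean_pdq in [("85OM4", 0.23), ("88TO4", 0.21), ("89EH4", 0.12)]:
    print(speaker, describe_pdq(mean_pdq))
```

A feedback tool of the kind described would presumably map such labels onto the state of the talking head rather than show the numbers themselves.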
Thank you for your attention…