CTT and TMH


Processing the Prosody of Oral Presentations

Rebecca Hincks

KTH, The Royal Institute of Technology
Department of Speech, Music and Hearing
The Unit for Language and Communication


English in Sweden

• A second language rather than a foreign language
• Nearly all beginners are children
• ASR not appropriate or necessary for acquisition of sounds

Support for advanced L2 users?

• Vision: Speech checker analogous to a spellchecker or grammar checker
• Practice an oral presentation, get feedback on:
  - Lexicon
  - Pronunciation
  - Prosody
• Making a presentation can be difficult in a native language, and is even more difficult in an L2
• Standard advice for how to deliver a presentation: use a lively voice, don’t speak too fast, take pauses
• These qualities can be processed automatically using speech analysis

What is a lively voice?

• A voice that varies in pitch and rhythm
• A voice that shows enthusiasm
• Difficult for native speakers, but more difficult for non-native speakers
• Studies have shown that non-natives use a narrower pitch range than natives (Pickering 2004)
• Tools for helping speakers increase their liveliness should be welcomed
• Research Question: How can we measure liveliness automatically?

Corpus of student speech

• Audio recordings of 35 ten-minute presentations in English made by engineering students
• Recordings made in the classroom
• Selected 10 women and 10 men
  - Varied levels of ability in English
  - All native speakers of Swedish
• Written feedback on the presentations from teachers and classmates
• In preparation: listener ratings of liveliness and fluency

Pitch dynamism quotient, PDQ

$$\mathrm{PDQ} = \frac{\text{standard deviation of } F_0 \text{ in Hertz}}{\text{mean } F_0 \text{ in Hertz}}$$

• $F_0$ = fundamental frequency = pitch
• Necessary to normalize the standard deviation in order to compare voices that are naturally high or naturally low
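As a rough illustration of the quotient defined above, the sketch below computes PDQ from a list of F0 values in Hertz. It rests on several assumptions that are not in the slides: the function name `pitch_dynamism_quotient` is hypothetical, unvoiced frames are assumed to be marked with 0 and are skipped, and the population standard deviation is used because the slides do not say which variant was applied.

```python
from statistics import mean, pstdev


def pitch_dynamism_quotient(f0_hz):
    """Return std(F0) / mean(F0), computed over voiced frames only."""
    # Assumption: a value of 0 marks an unvoiced frame and carries no pitch.
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    # Assumption: population standard deviation (the slides do not specify).
    return pstdev(voiced) / mean(voiced)


# Two made-up contours with the same shape, one an octave above the other.
low_voice = [100.0, 120.0, 0.0, 80.0, 100.0]
high_voice = [200.0, 240.0, 0.0, 160.0, 200.0]

print(pitch_dynamism_quotient(low_voice))
print(pitch_dynamism_quotient(high_voice))
```

The two invented contours differ in raw standard deviation but give the same quotient, which is the point of dividing by the mean: a naturally high and a naturally low voice become comparable.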

Time, frequencies and editing

• Between 7 and 10 minutes per person
• Divided into intervals of (1 min, 30 s, 15 s,) 10 seconds
• WaveSurfer’s ESPS settings: 60-400 Hz men, 75-600 Hz women
• Have also analyzed at 25-400 Hz men, 25-500 Hz women
• Visually inspected every contour and edited away as many errors as possible
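The division into 10-second intervals could be sketched as follows, with one quotient (standard deviation of F0 over mean F0) per interval. This is only an illustration under stated assumptions: the names `windowed_pdq`, `FRAME_STEP_S` and `WINDOW_S` are hypothetical, the 10 ms frame step is an assumed value rather than a setting taken from the slides, and unvoiced frames are assumed to be stored as 0.

```python
from statistics import mean, pstdev

FRAME_STEP_S = 0.01  # assumption: one F0 value every 10 ms
WINDOW_S = 10.0      # interval length named on the slide


def windowed_pdq(f0_hz, frame_step_s=FRAME_STEP_S, window_s=WINDOW_S):
    """Return one std/mean quotient per consecutive window (None if unvoiced)."""
    frames_per_window = int(round(window_s / frame_step_s))
    quotients = []
    for start in range(0, len(f0_hz), frames_per_window):
        chunk = f0_hz[start:start + frames_per_window]
        # Assumption: 0 marks an unvoiced frame, which is skipped.
        voiced = [f for f in chunk if f > 0]
        if len(voiced) < 2:
            quotients.append(None)  # too little voiced speech to measure
        else:
            quotients.append(pstdev(voiced) / mean(voiced))
    return quotients


# Tiny made-up track: 1 s frame step and 4 s windows keep the example readable.
track = [100.0, 120.0, 80.0, 100.0, 150.0, 150.0, 150.0, 150.0, 0.0, 0.0]
print(windowed_pdq(track, frame_step_s=1.0, window_s=4.0))
```

Returning `None` for windows without enough voiced frames is a design choice of this sketch; it keeps the window index aligned with time so that silent stretches are not mistaken for monotone speech.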

[Figure: Mean pitch dynamism quotient for 7-10 minutes of speech. X-axis: student, by placement test. Series: females, males.]

Three proficient speakers

[Figure: PDQ over consecutive time periods of 10 seconds for speakers 85OM, 88TO and 89EH.]

Lively speaker 1

• Mean PDQ: .23

“the divergence”

[Figure: PDQ over consecutive time periods of 10 seconds, speaker 85OM4.]

The presentation was “well-structured,” “confident,” “easy to follow,” and “very coherent,” and the speech “well-modulated” and with “varied intonation.”

Lively speaker 2

• Mean PDQ: .21

Her presentation was “well-rehearsed” and “professional.”

[Figure: PDQ over consecutive time periods of 10 seconds, speaker 88TO4.]

Monotone speaker

• Mean PDQ: .12

“why is voice over IP interesting?”

[Figure: PDQ over consecutive time periods of 10 seconds, speaker 89EH4.]

Delivery was “a little deadpan,” “more animated facial expressions would be good,” and the presentation would be improved by “showing more enthusiasm.”

Selection of files for listening test

Test values: 9 per speaker

• 3 lowest PDQ
• 3 highest PDQ

[Figure: PDQ of individual ten-second segments, males and females.]
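Picking the lowest and highest segments for one speaker might look like the sketch below. It covers only the two groups named on the slide; the function name `pick_extremes` and the PDQ values in the example are made up for illustration and are not data from the study.

```python
def pick_extremes(segment_pdqs, n=3):
    """Return (indices of the n lowest, indices of the n highest) PDQ segments."""
    # Keep the segment index with each value; skip segments with no measurement.
    ranked = sorted((v, i) for i, v in enumerate(segment_pdqs) if v is not None)
    if len(ranked) < 2 * n:
        raise ValueError("not enough measured segments to pick disjoint groups")
    lowest = [i for _, i in ranked[:n]]
    highest = [i for _, i in ranked[-n:]]
    return lowest, highest


# Hypothetical per-segment PDQ values for one speaker (not from the corpus).
example = [0.12, 0.31, 0.08, 0.22, 0.19, 0.27, 0.05, 0.15]
lowest, highest = pick_extremes(example)
print("lowest segments:", lowest)
print("highest segments:", highest)
```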

Conclusions

• Normalized standard deviation can be used as a measure of liveliness in speaking styles used for oral presentations
• Hypothesis: PDQ values over .15 lively, over .30 very lively, between .20 and .25 a good target
  - Different preferences depending on personality and culture?
• Unclear effect of Swedish L1 and of proficiency in English
• Applications: teaching, presentation skills
• Appropriate feedback: not values but a talking head that moves from alert to sleepy
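The hypothesized thresholds can be written out as a small lookup, sketched here under assumptions: the function name `describe_pdq` and the wording of the label for values at or below .15 are hypothetical, and the thresholds themselves are the slide's hypothesis rather than validated cut-offs.

```python
def describe_pdq(pdq):
    """Map a PDQ value to the hypothesized label and a good-target flag."""
    if pdq > 0.30:
        label = "very lively"
    elif pdq > 0.15:
        label = "lively"
    else:
        # Assumption: the slides give no label for values at or below .15.
        label = "below the lively threshold"
    in_good_target = 0.20 <= pdq <= 0.25
    return label, in_good_target


# Mean PDQ values reported on the three speaker slides.
for speaker, value in [("85OM4", 0.23), ("88TO4", 0.21), ("89EH4", 0.12)]:
    print(speaker, describe_pdq(value))
```

A feedback tool along the lines suggested above would presumably hide these numbers from the speaker and drive something like the talking head's alertness instead.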

Thank you for your attention…
