
Introducing the Buckeye Speech Corpus
http://buckeyecorpus.osu.edu
Kyuchul Yoon
English Division, Kyungnam University
March 21, 2008
School of English, Kyung Hee University

The Buckeye Speech Corpus
• What is it?
• Project Personnel
• Collection & Recording
• Transcription & Analysis
• Why create the corpus?

What is it?
• The Buckeye Corpus of conversational speech
• 40 speakers from Columbus, OH, conversing freely with an interviewer
• Orthographically transcribed and phonetically labeled
• Audio and text files with time-aligned phonetic labels (viewable in Xwaves or Wavesurfer)
• Available to researchers in academia and industry

Project Personnel
• Principal Investigators
– Mark Pitt (Department of Psychology)
– Eric Fosler-Lussier (Department of Computer Science and Engineering)
– Elizabeth Hume (Department of Linguistics)
– Keith Johnson (Department of Linguistics)
• Postdoctoral researchers (4)
• Graduate students (7)
• Undergraduate students (15)

Collection & Recording
• Speech collection was completed by spring 2000
• 40 speakers, all natives of Central Ohio (i.e., born in or near Columbus, or moved there no later than age 10)
• The sample design is stratified by age and sex
– Social class was not strictly controlled
– Most speakers are middle class to upper working class
• About 300,000 words of speech (about 40 hours) were collected from the 40 speakers
– This large sample should ensure that estimates of the forms and frequencies of phonological variation are representative of the population under study
– It should also yield many tokens of many variant forms, appearing in different phonetic environments
– This makes the corpus well suited to studying variation
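As a rough check on scale, the approximate totals above (about 300,000 words and about 40 hours from 40 speakers) imply the averages computed below. This is a minimal sketch and the constant names are ours. Because the hour count is recording time, which presumably also includes interviewer turns and pauses, the words-per-minute figure is an average over the recordings, not a speaking rate.

```python
# Approximate totals taken from the slide; every result below is an average,
# not a fact about any individual speaker or interview.
TOTAL_WORDS = 300_000
TOTAL_HOURS = 40
N_SPEAKERS = 40

words_per_speaker = TOTAL_WORDS / N_SPEAKERS          # average words per speaker
hours_per_speaker = TOTAL_HOURS / N_SPEAKERS          # average recording time per speaker
words_per_minute = TOTAL_WORDS / (TOTAL_HOURS * 60)   # words per minute of recording

print(f"{words_per_speaker:.0f} words and {hours_per_speaker:.1f} h per speaker")
print(f"about {words_per_minute:.0f} words per minute of recording")
```

On average, then, each speaker contributes roughly 7,500 words in about an hour of recording.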
• Qualifying speakers conversed about everyday topics such as politics, sports, traffic, and schools
• A modified sociolinguistic interview format was used
• Interviews were conducted in a small seminar room by the (male) postdoctoral researcher and the (female) graduate assistant

Transcription & Analysis
• A detailed description of the procedures and conventions used in creating the corpus can be found in the corpus manual
• Sound files and text transcriptions
– Digital recordings were transferred to a PC using a digital I/O card
– The recorded conversations were transcribed into written English by undergraduate transcribers using the Soundscriber software (http://www-personal.umich.edu/~ebreck/sscriber.html)
– Transcripts are stored as ASCII text files
• Automatic word and phone alignment
– The sound files and written transcriptions were input to an automatic phonetic transcription program, Entropic's Aligner
– Aligner uses acoustic phone models trained on the TIMIT corpus of spoken English, and it comes with a dictionary that lists several alternative pronunciations for many words
– Research assistants (RAs) used Aligner to select the best-fitting pronunciation of each word from among the alternatives listed in the dictionary, and to align the selected words and their phones with the corresponding portion of the sound wave
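The idea of picking one pronunciation from a list of dictionary alternatives can be illustrated with a toy sketch. Aligner scores the alternatives against the audio using its TIMIT-trained acoustic models, which cannot be reproduced here; this sketch instead swaps in a plain edit-distance comparison against a hypothetical observed phone string. The dictionary entries, the phone symbols, and the names `DICTIONARY`, `edit_distance`, and `best_variant` are illustrative assumptions, not Aligner's API or data.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical dictionary: each word maps to alternative pronunciations
# (lists of phone symbols). These entries are illustrative, not Aligner's.
DICTIONARY: Dict[str, List[List[str]]] = {
    "and": [["ae", "n", "d"], ["ah", "n", "d"], ["ah", "n"]],
    "the": [["dh", "ah"], ["dh", "iy"]],
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete pa
                            curr[j - 1] + 1,            # insert pb
                            prev[j - 1] + (pa != pb)))  # substitute or match
        prev = curr
    return prev[-1]


def best_variant(word: str, observed: Sequence[str]) -> Tuple[List[str], int]:
    """Return the listed variant closest to `observed`, with its distance.

    Stand-in for acoustic scoring; ties go to the earlier-listed variant.
    """
    variants = DICTIONARY[word]
    scored = [(edit_distance(v, observed), i) for i, v in enumerate(variants)]
    dist, idx = min(scored)
    return variants[idx], dist


print(best_variant("and", ["ah", "n"]))
print(best_variant("and", ["eh", "n", "d"]))
```

For example, `best_variant("and", ["eh", "n", "d"])` finds a tie at distance 1 between the first two variants and returns the earlier-listed one, `["ae", "n", "d"]`.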
• Hand realignment
– Errors produced by Aligner were corrected by phonetically trained RAs
– Corrections were made when Aligner placed labels at the wrong locations, or when a label outside Aligner's segmental repertoire was needed
– During hand alignment, the appropriate transcription of a given sequence was decided by inspecting combined waveform and spectrographic displays of the signal in Entropic's waves+ or in Wavesurfer
• The .words / .phones / .log label files
– The alignment procedure creates three ASCII text "label" files for each sound file
– The .words file contains the word labels and their offset times
– The .phones file contains the phone labels and their offset times
– The .log file contains notes supplied by the labelers, marking instances of unusual voice quality, manner of speaking, nasality, etc.
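A minimal reader for these label files might look like the sketch below. It assumes an xwaves-style layout, which should be checked against the corpus manual: a header that ends with a line containing only `#`, followed by one line per label giving the offset (end) time in seconds, a numeric color code, and the label text. Because only offset times are stored, each label's start time is taken to be the previous label's offset, with the first label starting at 0.0 (also an assumption). The sample contents and the names `Label` and `parse_label_file` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Label:
    start: float  # inferred: previous label's offset (0.0 for the first label)
    end: float    # offset time as stored in the file
    text: str     # label text, e.g. a phone symbol or a word entry


def parse_label_file(contents: str) -> List[Label]:
    """Parse xwaves-style label text (assumed layout; see the lead-in)."""
    lines = contents.splitlines()
    marker = next((i for i, ln in enumerate(lines) if ln.strip() == "#"), None)
    if marker is None:
        raise ValueError("no '#' header terminator found")
    labels: List[Label] = []
    prev_end = 0.0
    for ln in lines[marker + 1:]:
        parts = ln.split(None, 2)  # offset time, color code, label text
        if len(parts) < 3:
            continue               # skip blank or malformed lines
        end = float(parts[0])
        labels.append(Label(start=prev_end, end=end, text=parts[2].strip()))
        prev_end = end
    return labels


# Hypothetical .phones-style contents, for illustration only.
SAMPLE = """signal s01
nfields 1
#
    0.250  122 SIL
    0.310  122 dh
    0.365  122 ah
"""

for lab in parse_label_file(SAMPLE):
    print(f"{lab.text:>4} {lab.start:.3f}-{lab.end:.3f} ({lab.end - lab.start:.3f} s)")
```

If a label in a `.words` file carries several fields, the sketch keeps them together as one string in `text`, to be split further as the manual's conventions require.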

Why create the corpus?
• The corpus can be used both for pure research and for applied research and product development
• As a resource for pure research
– The corpus provides one of the richest sources of data on pronunciation variation in conversational speech, for example for studying:
• Auditory word recognition (psycholinguistics)
• Rules of pronunciation variation (phonology)
• Age- and gender-related conditioning of pronunciation variation (sociolinguistics)
• Effects of pronunciation variation on automatic speech recognition
• On the applied side
– Training acoustic models for speech recognition systems
– Training lexicons that handle pronunciation variation
– Serving as a testbed for grammar training
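For the lexicon-training use, one simple starting point is to estimate how often each pronunciation variant occurs for each word. The sketch below computes relative-frequency (maximum-likelihood) estimates of P(pronunciation | word) from (word, realized pronunciation) pairs. The tokens are invented for illustration and are not drawn from the corpus, and `estimate_lexicon` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Pron = Tuple[str, ...]  # a pronunciation as a tuple of phone symbols


def estimate_lexicon(tokens: Iterable[Tuple[str, Pron]]) -> Dict[str, Dict[Pron, float]]:
    """Relative-frequency (maximum-likelihood) estimate of P(pronunciation | word)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, pron in tokens:
        counts[word][pron] += 1
    lexicon: Dict[str, Dict[Pron, float]] = {}
    for word, c in counts.items():
        total = sum(c.values())
        # most_common() lists the variants from most to least frequent
        lexicon[word] = {pron: n / total for pron, n in c.most_common()}
    return lexicon


# Hypothetical observed tokens (word, realized phones); not corpus data.
TOKENS = [
    ("and", ("ae", "n", "d")),
    ("and", ("ah", "n")),
    ("and", ("ah", "n")),
    ("and", ("n",)),
    ("the", ("dh", "ah")),
]

for word, variants in estimate_lexicon(TOKENS).items():
    for pron, p in variants.items():
        print(word, " ".join(pron), f"{p:.2f}")
```

With these tokens, "and" receives probability 0.50 for "ah n" and 0.25 each for "ae n d" and "n". No smoothing is applied, so a variant never observed gets zero probability; a practical lexicon would usually smooth or prune such estimates.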

Corpus Citation
• Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E., and Fosler-Lussier, E. (2007). Buckeye Corpus of Conversational Speech (2nd release) [www.buckeyecorpus.osu.edu]. Columbus, OH: Department of Psychology, Ohio State University (Distributor).

What it looks like
[Slide image not available]
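
Coming Up Next
• Introduction to a Buckeye Corpus search script
• How to save Internet broadcasts, and an introduction to commercial programs
• Introduction to a formant modification/synthesis script
• Introduction to a script for adjusting voice bar/prevoicing/VOT duration
• Introduction to a script for automatic TextGrid generation
• …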