Slide 1 - Ohio State Linguistics


LSA 2009 January 10
VOT is necessary but not sufficient
for describing the voicing contrast
in Japanese
Eun Jong Kong*, Mary E. Beckman*, Jan Edwards †
(*Ohio State University, †Univ. of Wisconsin at Madison)
1
Introduction

Since the seminal work of Lisker and Abramson (1964), Voice Onset Time
(VOT) has been used as the primary measure for comparing word-initial
stop voicing and aspiration contrasts across languages.
e.g.,
• Spanish: /d/ vs. /t/: lead VOT vs. short lag VOT
• English: /d/ vs. /t/: lead or short lag VOT vs. long lag VOT
• Cantonese: /t/ vs. /th/: short lag vs. long lag VOT
Figure 1. Voice onset time distribution of apical (dental and alveolar) stops of two-category languages. Taken from Lisker & Abramson (1964).
[Figure panels: Spanish, English, Cantonese; x-axis: voice onset time (msec), with VOT = 0 marked; y-axis: frequency]
2
Introduction

VOT has also been a useful acoustic measure for describing
children’s mastery of word-initial stops in languages with
voicing and/or aspiration contrasts.
e.g., Thai (Gandour et al. 1986)
- stops with three-way contrast: /d/ vs. /t/ vs. /th/
- lead VOT mastered later than short lag VOT or long lag VOT
Figure 2. VOT distribution of alveolar stops in Thai. Taken from Gandour et al. (1986).
[Figure panels: 3 year olds, 5 year olds, 7 year olds; categories: /d/, /t/, /th/]
3
Introduction

Is VOT the whole story?

Japanese stops and VOT

• Two-way voicing contrast (Homma 1980, Shimizu 1989)
• voiced stops: not only lead VOT, but also short lag VOT (Takada 2004)
• voiceless stops: neither clearly short lag nor clearly long lag, but intermediate between the two (Riney et al. 2007)
• This results in overlap in VOT range between the two categories.
• Is there another acoustic measure that helps to disambiguate?
4
Goal of the study

To evaluate whether VOT is a sufficient acoustic measure
in distinguishing voiced stops from voiceless stops in
Japanese, we investigate


• how the acoustic parameter of VOT relates to native speaker/transcriber judgments of accuracy for voiced and voiceless stop consonants in English- and Japanese-acquiring children.
• whether another acoustic parameter is also needed to predict native speaker/transcriber judgments of these productions.
5
Research questions

Children’s stop productions were analyzed to address the
following questions.
Question 1) Are there differences between the time-courses for
mastering the stop voicing contrasts in English and Japanese?
Method: judgments by trained native speaker/phoneticians, logistic regression.
Question 2) How well does the single acoustic dimension of VOT
predict the native speaker/transcriber’s judgments of voiced vs.
voiceless stops produced by English- and Japanese-acquiring
children?
Question 3) Is there another acoustic dimension that improves the
prediction of the native speaker/transcriber’s judgments of the
voicing contrast in stops produced by these children?
Method: acoustic analysis, logistic regression.
6
Data collection
1) Production data come from the paidologos (παιδολογος) project
- cross-language investigation of phonological development
www.ling.ohio-state.edu/~edwards/
2) Subjects


• 51 children (2;0-6;0), 20 adults (18;0-30;0), recorded in Tokyo
• 50 children (2;0-6;0), 15 adults (18;0-30;0), recorded in Ohio
3) Materials: word-initial pre-vocalic lingual stops — e.g.,


Japanese /d/ daikon ‘radish’ vs. /t/ tamago ‘egg’
English /d/ dove vs. /t/ tongue
(velar stops were also recorded but not discussed here)
7
tamago ‘egg’
8
9
daikon ‘radish’
Correct Voicing
Voicing Error
10
Correct Voicing
Voicing Error
11
Analysis 1: Transcription
Question 1) Are there differences between the time-courses for mastering the stop voicing contrasts in English and Japanese?

Measure: voicing accuracy from transcriptions by a trained
phonetician native speaker of English/Japanese.



voicing correct: /t/ → [t], /d/ → [d], /d/ → [g], /t/ → [k]
voicing error: /t/ → [d], /d/ → [t], /t/ → [n]
Criterion for mastery: 75% voicing accuracy (adapted from
criteria used in norming studies such as Smit et al., 1990).
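
The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the token list is hypothetical, the voicing class of each transcribed symbol is inferred from the examples on this slide (for instance, [n] counts as voiced because /t/ → [n] is listed as a voicing error), and the criterion is assumed to mean "at least 75%".

```python
# Voicing class of each transcribed symbol (inferred from the slide's examples).
VOICING = {"t": "voiceless", "k": "voiceless",
           "d": "voiced", "g": "voiced", "n": "voiced"}

MASTERY_CRITERION = 0.75  # assumed to mean "at least 75% voicing accuracy"


def is_voicing_correct(target, transcribed):
    """True when the transcribed sound has the same voicing as the target,
    regardless of place of articulation (so /d/ -> [g] is correct)."""
    return VOICING[target] == VOICING[transcribed]


def voicing_accuracy(tokens):
    """Proportion of (target, transcribed) pairs with correct voicing."""
    correct = sum(is_voicing_correct(t, tr) for t, tr in tokens)
    return correct / len(tokens)


# Hypothetical tokens for one child: (target, transcription) pairs.
tokens = [("t", "t"), ("t", "k"), ("t", "d"), ("d", "d"),
          ("d", "g"), ("d", "t"), ("t", "t"), ("d", "d")]

accuracy = voicing_accuracy(tokens)
mastered = accuracy >= MASTERY_CRITERION
print(f"voicing accuracy = {accuracy:.0%}, mastered = {mastered}")
```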
12
Transcription: results
[Figure: % voicing accuracy (0-100) by age in months for English and Japanese consonants, with the 75% accuracy criterion marked. Annotations: "before 24 mo" and "/d/ at 42 mo".]
Mixed effects logistic regression.
• Dependent variable: token by token voicing accuracy (correct / incorrect)
• Independent variable: age of child and target voicing (fixed effect) + subject (random effect)
13
Analysis 1: interim conclusion
Transcription Analysis
• The voicing contrast is mastered later by Japanese-speaking children, as compared to English-speaking children.
14
Analysis 2: VOT
Question 2) How well does the single acoustic dimension of
VOT predict the native speaker/transcriber’s judgments of
voiced vs. voiceless stops produced by English- and
Japanese-acquiring children?

VOT: the latency between the burst and the voicing onset.
[Figure: waveform of /t/ in “torn” with the burst and the voice onset marked; VOT is the interval between the two. x-axis: Time (s).]
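
As a minimal sketch of this definition, VOT can be computed as the voicing onset time minus the burst time, so that voicing before the burst gives a negative value (lead) and voicing after the burst gives a positive value (lag). The function names and the time stamps below are hypothetical and are not measurements from the study.

```python
def vot_seconds(burst_time, voicing_onset_time):
    """VOT = time of voicing onset minus time of the stop burst (seconds)."""
    return voicing_onset_time - burst_time


def vot_direction(vot):
    """Negative VOT is voicing lead; zero or positive VOT is voicing lag."""
    return "lead" if vot < 0 else "lag"


# Hypothetical time stamps (seconds from the start of a recording).
lag_example = vot_seconds(burst_time=1.000, voicing_onset_time=1.075)
lead_example = vot_seconds(burst_time=1.000, voicing_onset_time=0.900)

print(f"{lag_example * 1000:.0f} ms -> {vot_direction(lag_example)}")
print(f"{lead_example * 1000:.0f} ms -> {vot_direction(lead_example)}")
```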
15
VOT: results (adults)
[Figure: histograms of adult VOT (in seconds) for /d/ and /t/ in English and Japanese, with separate panels for male and female speakers; y-axis: no. of counts; VOT = 0 and VOT medians marked.]
• English: clear separation between short lag (/d/) vs. long lag (/t/)
• Japanese: lead or short lag (/d/) vs. intermediate lag (/t/), with much overlap.
16
VOT: results (children)
[Figure: histograms of children's VOT (in seconds) for /d/ and /t/ in English and Japanese, with separate panels for 2-year-olds and 5-year-olds; y-axis: no. of counts; VOT = 0 and VOT medians marked.]
Language-specific VOT distributions in children’s stops
• English: clearly separated peaks.
• Japanese: intermediate values for /t/ with even more overlap with /d/ than in adults.
17
VOT: results (children)
Mixed effects logistic regression
• Dependent variable: token by token voicing judgment (/t/ or not /t/)
• Independent variable: VOT
[Figure: probability of transcription as /t/ (0.0-1.0) as a function of VOT (seconds, -0.3 to 0.3), one panel per language. English: correctly predicted 94%. Japanese: correctly predicted 80%.]
18
VOT: results (children)

Evaluation of predictive value


• Baseline prediction accuracy with no independent variable, i.e., the proportion of tokens where the transcriber transcribed a voiceless consonant:
‘Baseline’: 49.7% and 63.3%
• Model’s prediction accuracy with VOT as an independent variable, i.e., the proportion of tokens where the predicted probability of transcribing /t/ is greater than 50% and the transcriber actually transcribed /t/:
‘VOT model’: 94% (English) and 80% (Japanese)
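
The comparison between a baseline and a VOT-based model can be illustrated with a short sketch. It makes several simplifying assumptions: the data are synthetic and are not the study's tokens, the model is a plain logistic regression fitted by gradient descent (the random effect of subject used in the study is left out), and accuracy is scored in the usual way, as the share of tokens whose predicted category at a 50% threshold matches the transcription.

```python
import numpy as np


def fit_logistic(x, y, steps=5000, learning_rate=0.5):
    """Fit P(y = 1) = sigmoid(b0 + b1 * z) by gradient descent,
    where z is the standardized predictor."""
    mean, sd = x.mean(), x.std()
    z = (x - mean) / sd
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * z)))
        b0 -= learning_rate * np.mean(p - y)
        b1 -= learning_rate * np.mean((p - y) * z)
    return b0, b1, mean, sd


def predict_probability(model, x):
    """Predicted probability that a token is transcribed as /t/."""
    b0, b1, mean, sd = model
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * (x - mean) / sd)))


# Synthetic VOT values in seconds (hypothetical, for illustration only).
rng = np.random.default_rng(0)
vot = np.concatenate([rng.normal(0.015, 0.010, 200),   # tokens heard as /d/
                      rng.normal(0.080, 0.020, 200)])  # tokens heard as /t/
heard_as_t = np.concatenate([np.zeros(200), np.ones(200)])

# Baseline as defined on the slide: proportion of tokens transcribed voiceless.
baseline = heard_as_t.mean()

model = fit_logistic(vot, heard_as_t)
predicted_t = predict_probability(model, vot) > 0.5
accuracy = np.mean(predicted_t == (heard_as_t == 1))

print(f"baseline = {baseline:.1%}, VOT model accuracy = {accuracy:.1%}")
```

The gap between the two numbers is what the slide reports for each language. With well-separated VOT distributions like the synthetic ones here, the gap is large.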
19
Analysis 2: interim conclusion
Transcription Analysis
• The voicing contrast is mastered later by Japanese-speaking children, as compared to English-speaking children.
VOT
• The single acoustic dimension of VOT predicts the transcribed voicing for English productions 94% of the time.
• Accuracy of prediction for Japanese productions is much lower (80%).
20
Analysis 3: H1-H2 by VOT

Question 3) Is there another acoustic dimension that improves the
prediction of the native speaker/transcriber’s judgments of the
voicing contrast in stops produced by these children?
H1-H2
• A type of breathiness measure.
• Amplitude difference between the first harmonic and the second harmonic.
[Figure: waveform of “torn” with a 25 ms analysis window, and the spectrum of that window (sound pressure level in dB/Hz by frequency in Hz) with the first harmonic (H1) and the second harmonic (H2) marked; H1-H2 is their difference in dB.]
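
A minimal sketch of the measure, assuming the fundamental frequency of the analysis window is already known: take the magnitude spectrum of a windowed 25 ms frame, find the spectral peak near f0 (H1) and near 2 × f0 (H2), and subtract their levels in dB. The function name, the Hann window, the search band, and the synthetic test signal are assumptions made for illustration. The slides do not say how H1-H2 was extracted.

```python
import numpy as np


def h1_minus_h2_db(frame, sample_rate, f0, search_hz=50.0):
    """H1-H2 in dB: level of the spectral peak near f0 minus the level
    of the spectral peak near 2 * f0."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    def peak_level_db(target_hz):
        # Strongest spectral component within +/- search_hz of the target.
        band = (freqs >= target_hz - search_hz) & (freqs <= target_hz + search_hz)
        return 20.0 * np.log10(magnitude[band].max())

    return peak_level_db(f0) - peak_level_db(2.0 * f0)


# Synthetic 25 ms frame: a 200 Hz fundamental whose second harmonic has
# half the amplitude of the first, so H1-H2 should be close to 6 dB.
sample_rate = 16000
t = np.arange(int(0.025 * sample_rate)) / sample_rate
frame = 1.0 * np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)

h1_h2 = h1_minus_h2_db(frame, sample_rate, f0=200.0)
print(f"H1-H2 = {h1_h2:.1f} dB")
```

A larger positive H1-H2 means a relatively stronger first harmonic, which is why the measure is used as an index of breathiness.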
21
H1-H2 by VOT: adults
[Figure: scatterplots of H1-H2 (dB) against log VOT (ms) for adult speakers, with separate panels for male and female speakers in English and in Japanese.]
• English: higher H1-H2 and longer VOT for /t/. No overlap between VOT ranges.
• Japanese: higher H1-H2 and longer VOT for /t/. Overlap between VOT ranges.
22
H1-H2 by VOT: children
[Figure: scatterplots of H1-H2 (dB) against log VOT (ms) for 2-year-olds and 5-year-olds in English and Japanese. Legend: /d/ on target, /t/ on target, [t] off target, [d] off target. Points show /t/ and /d/ as perceived by the transcriber.]
• English /t/: longer lag VOT
• Japanese /t/: longer lag VOT, higher H1-H2
23
VOT: results (children)

Mixed effects logistic regression
• Dependent variable: token by token voicing judgment (/t/ or not /t/)
• Independent variables: VOT + H1-H2
24
VOT and H1-H2: results (children)
Evaluation of predictive value
• Baseline prediction accuracy with no independent variable, i.e., the proportion of tokens where the transcriber transcribed a voiceless consonant: 49.7% and 63.3%
• Model’s prediction accuracy with VOT as an independent variable: 94% and 80%
• Model’s prediction accuracy with VOT and H1-H2 as independent variables: 94% and 83%
[Figure: normalized coefficients for VOT and H1-H2, plotted on a 0.0-1.0 scale, with asterisks marking P < 0.05.]
• English children: VOT 7.91 > H1-H2 0.27 (29.4 times)
• Japanese children: VOT 5.84 > H1-H2 1.1 (5.3 times)
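
The "times" figures on this slide are the ratio of the VOT coefficient to the H1-H2 coefficient. A quick check from the printed values follows. Because the printed coefficients are rounded, the recomputed English ratio comes out near 29.3 rather than exactly 29.4.

```python
# Normalized coefficients as printed on the slide (rounded values).
coefficients = {
    "English": {"VOT": 7.91, "H1-H2": 0.27},
    "Japanese": {"VOT": 5.84, "H1-H2": 1.1},
}


def vot_to_h1h2_ratio(coefs):
    """How many times larger the VOT coefficient is than the H1-H2 one."""
    return coefs["VOT"] / coefs["H1-H2"]


ratios = {lang: vot_to_h1h2_ratio(c) for lang, c in coefficients.items()}
for lang, ratio in ratios.items():
    print(f"{lang}: VOT coefficient is about {ratio:.1f} times the H1-H2 coefficient")
```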
25
Analysis 3: interim conclusion
Transcription Analysis
• The voicing contrast is acquired later by Japanese-speaking children, as compared to English-speaking children.
VOT
• The single acoustic dimension of VOT is adequate to characterize the transcription results for English.
• However, VOT alone does not adequately characterize the transcription results for Japanese.
H1-H2 by VOT
• In Japanese, the additional acoustic parameter of H1-H2 improves the prediction of the transcription results.
• The effect of VOT relative to H1-H2 was greater in English than in Japanese.
26
Summary and conclusion

• Japanese-speaking children showed mastery of the voicing contrast at a later age than English-speaking children.
• However, the VOT ranges for the productions of Japanese-speaking children were similar to those of adults.
• When VOT alone was used to predict the judgments of a trained native speaker/transcriber, it was only 80% successful in Japanese, whereas it was 94% successful in English.
• Adding the acoustic parameter of H1-H2 improved the prediction of the native speaker/transcriber judgments for the productions of the Japanese-speaking children, but not for those of the English-speaking children.
27
Summary and conclusion

• English and Japanese encode their stop voicing contrast in the acoustic dimensions in language-specific ways.
• English: exclusively along the VOT dimension
• Japanese: along more than the VOT dimension
• Unlike in English, VOT is not a sufficient acoustic measure of the stop voicing contrast in Japanese.
• It was necessary to examine other relevant acoustic dimensions such as breathiness to correctly characterize the Japanese stop voicing contrast.
28
Acknowledgement


This work was supported by NIDCD grant 02932 to Jan Edwards.
We thank the children who participated in the task, the
parents who gave their consent, and the principals and
teachers at the schools at which the data were collected.
Thank you for your attention!
29
References
Gandour, J., S. H. Petty, R. Dardarananda, S. Dechongkit, and S. Mukngoen. 1986. The acquisition of the voicing contrast in Thai: A study of voice onset time in word-initial stop consonants. Journal of Child Language, 13.
Homma, Y. 1980. Voice onset time in Japanese stops. Onseigakkai Kaihoo, 163, 7-9.
Lisker, L. and A. Abramson. 1964. A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20.
Riney, T., N. Takagi, K. Ota, and Y. Uchida. 2007. The intermediate degree of VOT in Japanese initial voiceless stops. Journal of Phonetics, 35.
Sander, E. 1972. When are speech sounds learned? Journal of Speech and Hearing Disorders, 37, 55-63.
Smit, A. B., L. Hand, J. Freilinger, J. Bernthal, and A. Bird. 1990. The Iowa articulation norms project and its Nebraska replication. Journal of Speech and Hearing Disorders, 55.
Takada, M. 2004. VOT tendency in the initial voiced alveolar plosive /d/ in Japanese and the speakers' age. Journal of the Phonetic Society of Japan, 8(3), 57-66.
30
Extra I: Velars
adults scatterplots
English adults: coronals + velars
Japanese adults: coronals (top) +
velars (bottom)
31
Extra I: Velars
children scatterplots
English children
(alv: left, velar: right)
- VOT only model:
93%
- VOT&H1-H2 model:
no improvement.
VOT was the only
effective parameter.

32
Extra I: Velars
children scatterplots
Japanese children
(alv: left, velar: right)
- VOT only model:
87%
- VOT&H1-H2 model:
no improvement.
VOT was the only
effective parameter.

33