Variability of Formant Measurements – Part 2 Philip Harrison J P French Associates & Department of Language & Linguistic Science, York University IAFPA 2006 Annual Conference Göteborg,

Variability of Formant
Measurements – Part 2
Philip Harrison
J P French Associates &
Department of Language & Linguistic Science,
York University
IAFPA 2006 Annual Conference
Göteborg, Sweden
Summary
• Briefly recap previous analysis & last year’s presentation
• New analysis & results
• PhD research
• Questions
Study
• Aim: Investigate the variability of formant measurements which exists both within and between different software programs currently used in the field of forensic phonetics.
– 3 programs – Praat, Multispeech & Wavesurfer
– 3 analysis parameters – LPC order, analysis (frame/window) width, pre-emphasis
– Word list – 5 vowel categories – 6 tokens per category – read 3 times – total = 90 tokens
– 2 speakers – Peter French & me
– 2 simultaneous recordings – microphone & telephone
Results & Analysis
• Scripts used to obtain 37,260 individual formant measurements using LPC formant trackers
• Analysis – microphone data only
– Initial observations of raw formant data
– Quantitative analysis of results
– Statistical analysis
My F1s from Praat – LPC Variation
[Figure: F1 frequency (Hz) against token number, with series for LPC orders 6, 8, 10, 12, 14, 16 & 18; vowel categories labelled FLEECE, TRAP, PALM, GOOSE & SCHWA.]
The Plot Shows…
• Scripts work – (used in fault finding)
• Vowel categories clear
• Greatest deviation – LPC orders 6 & 8
• Orders 10 to 18 very similar for FLEECE, GOOSE & SCHWA
• Generated many more plots for all formants, parameters & software
– Lots of variation
– Difficult to interpret
Quantitative Analysis
• Quantitative Difference Analysis
– No absolute measurement to compare formants with – values are the outcome of analysis, not directly comparable with acoustic reality
– Difference calculated between value obtained with default analysis settings and value obtained with varied settings
– Absolute difference calculated for each formant then averaged by vowel category
– Shows variation between two analyses
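The difference analysis described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the scripts used in the study: the function name `mean_abs_difference`, the input layout (one value per token, in matching order) and the example numbers are all assumptions made for the sketch.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_difference(default, varied, categories):
    """Average |varied - default| per vowel category.

    default, varied: formant values in Hz for the same tokens, same order.
    categories: vowel category label for each token.
    (Hypothetical helper; the input layout is an assumption.)
    """
    if not (len(default) == len(varied) == len(categories)):
        raise ValueError("inputs must have one entry per token")
    diffs = defaultdict(list)
    for d, v, cat in zip(default, varied, categories):
        # Absolute difference between the two analyses of one token.
        diffs[cat].append(abs(v - d))
    return {cat: mean(vals) for cat, vals in diffs.items()}


# Made-up F1 values (Hz), for illustration only.
default_f1 = [260.0, 270.0, 750.0, 770.0]
varied_f1 = [250.0, 290.0, 740.0, 800.0]
labels = ["FLEECE", "FLEECE", "TRAP", "TRAP"]
result = mean_abs_difference(default_f1, varied_f1, labels)
print(result)
```

Taking the absolute value before averaging means that shifts in opposite directions do not cancel, so the figure reflects how far two analyses disagree rather than any net bias.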
Observations
• Numerical analysis confirmed impression from plots
• Clear differences between vowel categories, speakers, formants, software & settings
• Complex set of results with no clear patterns
Statistical Analysis
• Paired t-test between measurements from default settings and varied settings for each vowel category
– Null hypothesis – altering analysis settings → no effect
– Experimental hypothesis – altering analysis settings → effect
• Number of significant ‘hits’ summed – max 15
• Higher number = greater variation in formant measurements
• 2 significance levels – 0.01 & 0.05
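As a rough sketch of the hit-counting procedure, the Python below computes a paired t statistic with the standard library and counts how many comparisons exceed a two-tailed critical value. The helper names, the dictionary keyed by (vowel category, formant) and the sample values are assumptions for illustration. The slide's maximum of 15 would follow if each comparison set covered 5 vowel categories × 3 formants, but that breakdown is also an assumption. The critical value is passed in by the caller; 3.182 is the two-tailed 0.05 value for 3 degrees of freedom, matching the four made-up pairs used here.

```python
import math
from statistics import mean, stdev


def paired_t(default, varied):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    if len(default) != len(varied) or len(default) < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    diffs = [v - d for d, v in zip(default, varied)]
    n = len(diffs)
    m = mean(diffs)
    sd = stdev(diffs)
    if sd == 0:
        # Identical differences: no shift gives t = 0, otherwise t is unbounded.
        return (0.0 if m == 0 else math.copysign(math.inf, m)), n - 1
    return m / (sd / math.sqrt(n)), n - 1


def count_hits(pairs, t_critical):
    """Count comparisons whose |t| exceeds the two-tailed critical value."""
    return sum(
        1
        for default, varied in pairs.values()
        if abs(paired_t(default, varied)[0]) > t_critical
    )


# Made-up values (Hz); keys are hypothetical (vowel category, formant) labels.
pairs = {
    ("FLEECE", "F1"): ([500.0, 510.0, 520.0, 530.0], [510.0, 521.0, 529.0, 541.0]),
    ("TRAP", "F1"): ([500.0, 510.0, 520.0, 530.0], [505.0, 505.0, 525.0, 525.0]),
}
hits = count_hits(pairs, t_critical=3.182)  # df = 3, two-tailed 0.05
print(hits)
```

The first pair shifts consistently by about 10 Hz and counts as a hit; the second shifts up and down by equal amounts, so its mean difference is zero and it does not.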
Conclusions
• Hoped to have clear patterns, able to produce set of guidelines/recommendations
• Patterns only at specific, detailed level
• Very clear that many factors affect formant measurements
• No software is obviously better than others
• Care should be taken when measuring formants
New Work!!!
• Initial data contained obviously incorrect measurements
• Discard measurements – criterion?
• Determine acceptable band
– Spectrograms – no
– Formant bandwidths – no (attempted)
– LPC tracker & spectrogram – no (attempted)
– Spectrum of selection – yes but still encountered problems
• Band limit 300 Hz – impressionistic
Spectrum Measurements
• Used to determine centre of 300 Hz acceptable band
• Spectrum with 260 Hz bandwidth – same as default spectrogram
• Measured peaks F1, F2 & F3
• Issues/problems
– Windowed -> biased to centre of selection
– Formant peaks not always clear – some tokens ignored
– Double peaks – highest peak measured
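One way to read the acceptance criterion is as a simple band check around the spectral peak. The sketch below assumes the 300 Hz band is symmetric about the measured peak (so ±150 Hz) and that values on the band edge are accepted; neither detail is stated in the slides, and `is_accepted` is a hypothetical name.

```python
BAND_WIDTH_HZ = 300.0  # band limit given on the slides


def is_accepted(lpc_hz, spectrum_peak_hz, band_width_hz=BAND_WIDTH_HZ):
    """True if the LPC measurement lies inside the band centred on the peak.

    Assumes a symmetric band with inclusive edges (an assumption, not
    something the slides specify).
    """
    return abs(lpc_hz - spectrum_peak_hz) <= band_width_hz / 2.0


# Made-up values (Hz), for illustration only.
print(is_accepted(400.0, 300.0))  # 100 Hz from the peak, inside the band
print(is_accepted(500.0, 300.0))  # 200 Hz from the peak, outside the band
```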
Analysis of Accepted Measurements
• Analyse LPC variation only – other parameters more stable – not altered
• No accurate reference which raw measurements can be judged against
• Accepted results provide indication of accuracy & consistency
• Clear patterns in accepted formants
• Condense results – % accepted per vowel category
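Condensing accept/reject decisions into a percentage per vowel category and LPC order could look like the following. It is a sketch under assumed names and an assumed record layout of `(vowel_category, lpc_order, accepted)` tuples; the example records are made up.

```python
from collections import defaultdict


def percent_accepted(records):
    """Return {(vowel_category, lpc_order): percentage of accepted tokens}.

    records: iterable of (vowel_category, lpc_order, accepted) tuples,
    a hypothetical layout chosen for this sketch.
    """
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for category, order, ok in records:
        totals[(category, order)] += 1
        if ok:
            accepted[(category, order)] += 1
    return {key: 100.0 * accepted[key] / totals[key] for key in totals}


# Made-up decisions, for illustration only.
records = [
    ("FLEECE", 10, True),
    ("FLEECE", 10, True),
    ("FLEECE", 10, False),
    ("FLEECE", 10, True),
    ("TRAP", 6, False),
    ("TRAP", 6, True),
]
summary = percent_accepted(records)
print(summary)
```

Plotting these percentages against LPC order, with one series per vowel category, gives the kind of summary plot shown on the slides that follow.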
Plot of Accepted Results
[Figure: Praat Me Mic F1 – percentage accepted (0 to 100) against LPC order (6 to 18), one series per vowel category: FLEECE, TRAP, PALM, GOOSE & SCHWA.]
Me Microphone Accepted
[Figure: percentage accepted against LPC order – panels for Praat, Multispeech & Wavesurfer, each for F1, F2 & F3.]
Me Telephone Accepted
[Figure: percentage accepted against LPC order – panels for Praat, Multispeech & Wavesurfer, each for F1, F2 & F3.]
JPF Microphone Accepted
[Figure: percentage accepted against LPC order – panels for Praat, Multispeech & Wavesurfer, each for F1, F2 & F3.]
JPF Telephone Accepted
[Figure: percentage accepted against LPC order – panels for Praat, Multispeech & Wavesurfer, each for F1, F2 & F3.]
General Patterns
• Praat & Multispeech – bell curves
– Most consistent setting – Praat 10, Multispeech 10 to 14
– Curves shifted to left (lower LPC) for phone
• Wavesurfer – horizontal
– Different behaviour to Praat & Multispeech
– Some very weak results – especially F3
– For me better results for phone recording (also true for Praat & Multispeech)
• Most consistent setting Praat LPC 10
• Again variation across vowel category, speaker, formant, software & condition
Microphone vs Telephone
• Künzel (2001):
– Landline phone vs microphone
– Largest F1 difference in region of 14% for close vowels
• Byrne & Foulkes (2004):
– GSM mobile phone vs microphone
– F1 average 29% higher for GSM
• Not big differences for F2 & F3
• Current data (spectral comparisons) – only 2 speakers
Comparison Tables
Me
              FLEECE    TRAP    PALM   GOOSE   SCHWA
F1 (Hz)          258     771     690     260     502
F1 % Diff         26       0       6      33       0
F2 (Hz)         2171    1394    1125    1748    1486
F2 % Diff          0       1      -1       0       1
F3 (Hz)         2891    2632    2626    2242    2513
F3 % Diff          0      -1      -2       0      -1
JPF
              FLEECE    TRAP    PALM   GOOSE   SCHWA
F1 (Hz)          254     661     607     269     528
F1 % Diff         13       2       6      11       1
F2 (Hz)         2140    1413    1037    1105    1330
F2 % Diff          0      -1      -1      -1       0
F3 (Hz)         2551    2306    2439    2222    2274
F3 % Diff          0       0       0       0       0
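The slides do not say how the "% Diff" rows were computed. A common convention, assumed here, is the change of one measurement relative to a reference, expressed as a percentage of the reference. In the sketch below the reference would be the microphone value and the comparison the telephone value, which is a guess based on the surrounding discussion rather than something the source confirms. The function name and example numbers are hypothetical.

```python
def percent_difference(reference_hz, comparison_hz):
    """Percentage change of comparison_hz relative to reference_hz.

    Positive means the comparison value is higher than the reference.
    Which recording serves as the reference is an assumption.
    """
    if reference_hz <= 0:
        raise ValueError("reference frequency must be positive")
    return 100.0 * (comparison_hz - reference_hz) / reference_hz


# Made-up values (Hz), for illustration only.
print(percent_difference(250.0, 300.0))    # comparison 50 Hz above reference
print(percent_difference(2000.0, 1980.0))  # comparison 20 Hz below reference
```

Because the difference is scaled by the reference, the same shift in Hz is a much larger percentage for a low F1 than for F2 or F3, which is worth bearing in mind when reading the tables.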
General Observations
• LPC tracks for phone recordings more stable, easier to measure
– Less ‘information’ above F3
– Possibly pre-filter recordings?
• Different LPC orders produce better tracks for different formants of the same token
– Contradicts my previous advice to keep LPC setting constant across vowel categories
PhD Next Steps
• Use synthesised speech
• Formant values specified
• Repeat software experiments
• Other factors to investigate
– Pitch
– Voice quality
– Interaction of analysis parameters
Other Potential Areas of Investigation for PhD
• Effects of GSM coding & transmission
• Acoustic environments
• Pseudo-formants – source???
• Mouth/telephone distance & orientation
• Any other ideas…?
Questions?
Thanks to Peter French & Paul Foulkes