Applications
EARs, 22-23 August 2011 Stanley Wenndt, PhD AFRL/RIGC Rome Research Site
Familiar Speaker Recognition
• Two motivations
– Finish MS Neuroscience degree
  • Needed 700-level NEU course, Ind Study only option
– Speech Power versus Speech Intelligibility
  • Gerber 1974

Frequency Range (Hz)   % Power   % Intelligibility
0-500                  60        5
500-1000               35        35
1000-2000              3         35
2000-4000              1         13
4000-8000              1         12

– What about SID
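The contrast in the Gerber (1974) table can be made concrete by summing the bands on either side of 1 kHz. The following is a minimal sketch using only the percentages listed on the slide; the helper names `share_below` and `share_above` are illustrative and not part of the original material.

```python
# Band edges (Hz) and percentages as listed in the Gerber (1974) table.
bands = [(0, 500), (500, 1000), (1000, 2000), (2000, 4000), (4000, 8000)]
power_pct = [60, 35, 3, 1, 1]
intelligibility_pct = [5, 35, 35, 13, 12]

def share_below(cutoff_hz, values):
    """Sum the percentages of all bands whose upper edge is at or below cutoff_hz."""
    return sum(v for (lo, hi), v in zip(bands, values) if hi <= cutoff_hz)

def share_above(cutoff_hz, values):
    """Sum the percentages of all bands whose lower edge is at or above cutoff_hz."""
    return sum(v for (lo, hi), v in zip(bands, values) if lo >= cutoff_hz)

# Below 1 kHz: most of the power, less than half of the intelligibility.
print(share_below(1000, power_pct), share_below(1000, intelligibility_pct))  # 95 40
# Above 1 kHz: very little power, most of the intelligibility.
print(share_above(1000, power_pct), share_above(1000, intelligibility_pct))  # 5 60
```

Whether speaker identification (SID) follows the power distribution, the intelligibility distribution, or neither is the open question the slide raises; the sums above only restate the table.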
28 April 2020 2
Audio Data
• In-House Database
– Longitudinal study (20 sessions over 3 years)
– 65 subjects
  • 25 (20 males, 5 females) connected to the Audio Group
– Read, Digits, Short Sentences, Conversations
• 10 Short Sentences
– Two intonations
  • Let’s go skiing today.
– Visual and audible cue
  • Natural elicitation
• Shortfalls (hindsight)
– Unequal Sentences
– Different degrees of familiarity between listeners/speakers
Listening Experiments
• Session 1 – Pure Tone Test
• Session 2 – Familiarization with Test Set-up
• Session 3 – Clean
• Session 4 – 0-1K Hz, -20 dB, Speech shaped, add WGN
• Session 5 – 1-2K Hz, -20 dB, Speech shaped, add WGN
• Session 6 – 2-3K Hz, -20 dB, Speech shaped, add WGN
• Session 7 – 3-4K Hz, -20 dB, Speech shaped, add WGN
• Session 8 – 0-4K Hz, 0 dB, Speech shaped, add WGN
• Session 9 – Clean
• Session 10 – Whispered
• Session 11 – Time-reversed
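The noisy sessions list a level in dB for each condition. As a rough illustration of how noise can be scaled to a target level before mixing, here is a minimal sketch. It assumes the "-20 dB" and "0 dB" figures are signal-to-noise ratios, which the slides do not state, and it leaves out the band-limiting and speech-shaping steps. The function name `mix_at_snr` and the sine-wave stand-in for speech are hypothetical.

```python
import math
import random

def mean_power(samples):
    """Average squared amplitude of a sample sequence."""
    return sum(x * x for x in samples) / len(samples)

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled

rng = random.Random(0)
# Stand-in "speech": a 200 Hz tone at an assumed 8 kHz sample rate (not study data).
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
# White Gaussian noise (WGN) before any band-limiting or shaping.
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

mixed, scaled = mix_at_snr(speech, noise, -20.0)
achieved_snr_db = 10 * math.log10(mean_power(speech) / mean_power(scaled))
print(round(achieved_snr_db, 6))
```

At -20 dB under this reading, the noise carries 100 times the power of the speech, which gives a sense of how severe the band-limited conditions are compared with the 0 dB full-band session.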
Listening Experiments
• Results reported in 2 groups
– Normal Hearing
– Hearing Deficit
• Hard to draw conclusions from 2nd group
– Don’t know severity of hearing loss
• Experiments are a rough 1st pass
– 10 SID Listening Sessions
– Analyze data
– Learn from mistakes
Listening Experiments

Group    Clean   0-1K   1K-2K   2K-3K   3K-4K   Clean
Normal   90.0    82.2   80.9    76.0    79.1    94.9
HD       73.3    62.0   49.3    50.0    58.0    73.0

[Chart: Normal Listeners and Listeners with Hearing Deficit across conditions Clean, 0-1K, 1-2K, 2-3K, 3-4K, Clean]
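One simple way to read the results table is as a drop from the first clean session for each noise band. The sketch below uses the Normal and HD values from the slide, read as one Normal/HD pair per condition. Treating the first "Clean" column as the baseline is an assumption, and `drop_from_first_clean` is an illustrative name.

```python
noise_bands = ["0-1K", "1K-2K", "2K-3K", "3K-4K"]

# Scores in slide order: Clean, 0-1K, 1K-2K, 2K-3K, 3K-4K, Clean (second clean session).
scores = {
    "Normal": [90.0, 82.2, 80.9, 76.0, 79.1, 94.9],
    "HD": [73.3, 62.0, 49.3, 50.0, 58.0, 73.0],
}

def drop_from_first_clean(values):
    """Difference between the first clean score and each of the four noise-band scores."""
    baseline = values[0]
    return [round(baseline - v, 1) for v in values[1:5]]

drops = {group: drop_from_first_clean(values) for group, values in scores.items()}

for group, group_drops in drops.items():
    worst = noise_bands[group_drops.index(max(group_drops))]
    print(group, dict(zip(noise_bands, group_drops)), "largest drop:", worst)
```

These are differences of group averages only. They say nothing about statistical significance, which needs the per-listener data.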
Current Research
• Data Analysis
– Difficult to compare between sessions
  • Is the performance statistically different
    – Between group, within group?
– Current data analysis is focused on individual sentences
  • Let’s go skiing today.
    – Same phonetic content
    – Same noise (or lack of)
    – Same intonation
    – Same session
    – Main variable is the speaker
      • Formants, shimmer, jitter, energy, etc
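Two of the speaker features named above, jitter and shimmer, are commonly defined as cycle-to-cycle perturbation measures. The sketch below uses the widely used "local" definition (mean absolute difference between consecutive cycles divided by the mean). The slides do not say which definition or tool was used, so this is an assumption, and the input values are hypothetical, not study data.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Hypothetical glottal-cycle measurements for one utterance (not data from the study).
periods_ms = [5.0, 5.1, 4.9, 5.0, 5.0]             # cycle durations
peak_amplitudes = [0.80, 0.82, 0.78, 0.80, 0.80]   # per-cycle peak amplitudes

jitter = local_perturbation(periods_ms)        # period-to-period variation
shimmer = local_perturbation(peak_amplitudes)  # amplitude-to-amplitude variation
print(f"jitter = {jitter:.3f}, shimmer = {shimmer:.3f}")
```

Because the sentence, noise, intonation, and session are held fixed, differences in such per-speaker measures could be compared directly across speakers.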
“Male 8”
• Most easily recognized voice
• Except for Session 6
– 2K-3K noise
• Currently, we build models the same
– Good or bad?
• Can we figure out what is unique or not unique about an individual’s voice?
           Session 2   Clean   0-1K   1K-2K   2K-3K   3K-4K   0-4K   Clean   Whis   Rev
“Male 8”   35          36      36     33      16      33      34     35      34     33
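The "Male 8" row can be checked for sessions that stand apart from the rest with a simple robust rule. The sketch below flags values more than `k` median absolute deviations (MAD) from the median. The rule and the threshold `k=3.0` are assumptions chosen for illustration, not the analysis used in the study, and the labels follow the column order of the slide.

```python
from statistics import median

# Column labels and "Male 8" values in slide order.
sessions = ["Session 2", "Clean", "0-1K", "1K-2K", "2K-3K",
            "3K-4K", "0-4K", "Clean (second)", "Whis", "Rev"]
male8 = [35, 36, 36, 33, 16, 33, 34, 35, 34, 33]

def flag_outliers(labels, values, k=3.0):
    """Return labels whose value lies more than k MADs away from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return [label for label, v in zip(labels, values) if abs(v - center) > k * mad]

print(flag_outliers(sessions, male8))  # ['2K-3K']
```

With a median of 34 and a MAD of 1, only the 2K-3K session is flagged, matching the exception noted on the slide.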