Transcript Slide 1

Applications

EARs, 22-23 August 2011 Stanley Wenndt, PhD AFRL/RIGC Rome Research Site

Familiar Speaker Recognition

Two motivations

– Finish MS Neuroscience degree
  • Needed 700-level NEU course, Ind Study only option
– Speech Power versus Speech Intelligibility
  • Gerber 1974

Frequency Range (Hz)   % Power   % Intelligibility
0-500                  60        5
500-1000               35        35
1000-2000              3         35
2000-4000              1         13
4000-8000              1         12

– What about SID

28 April 2020 2

Audio Data

• In-House Database
  – Longitudinal study (20 sessions over 3 years)
  – 65 subjects
    • 25 (20 males, 5 females) connected to the Audio Group
  – Read, Digits, Short Sentences, Conversations

• 10 Short Sentences
  – Two intonations
    • Let’s go skiing today.
  – Visual and audible cue
    • Natural elicitation

• Shortfalls (hindsight)
  – Unequal Sentences
  – Different degrees of familiarity between listeners/speakers

Listening Experiments

• Session 1 – Pure Tone Test
• Session 2 – Familiarization with Test Set-up
• Session 3 – Clean
• Session 4 – 0-1K Hz, -20 dB, Speech shaped, add WGN
• Session 5 – 1-2K Hz, -20 dB, Speech shaped, add WGN
• Session 6 – 2-3K Hz, -20 dB, Speech shaped, add WGN
• Session 7 – 3-4K Hz, -20 dB, Speech shaped, add WGN
• Session 8 – 0-4K Hz, 0 dB, Speech shaped, add WGN
• Session 9 – Clean
• Session 10 – Whispered
• Session 11 – Time-reversed
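The noise sessions each list a level in dB. A minimal sketch of mixing noise into a signal at a chosen level is given here, assuming the dB figure is a signal-to-noise ratio (the slides do not say so explicitly). Band-limiting and speech shaping of the noise are left out, and the function names, the sine-wave stand-in for speech, and the sample rate are all hypothetical.

```python
import math
import random

def mean_power(samples):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to `speech`. Returns the mixture and the noise gain used."""
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain

# Hypothetical stand-ins: a 200 Hz tone in place of speech, and
# unfiltered white Gaussian noise (ASSUMPTION: no band-limiting here).
rng = random.Random(0)
fs = 8000
speech = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

# ASSUMPTION: "-20 dB" on the slide is read as an SNR of -20 dB.
mixed, gain = mix_at_snr(speech, noise, snr_db=-20.0)

# Check the SNR actually achieved after scaling.
scaled_noise = [gain * n for n in noise]
achieved_snr_db = 10 * math.log10(mean_power(speech) / mean_power(scaled_noise))
```

At a negative SNR the noise carries more power than the speech, so a band-limited version of this noise would be expected to mask most of the speech energy in that band.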

Listening Experiments

• Results reported in 2 groups
  – Normal Hearing
  – Hearing Deficit

• Hard to draw conclusions from 2nd group
  – Don’t know severity of hearing loss

• Experiments are a rough 1st pass
  – 10 SID Listening Sessions
  – Analyze data
  – Learn from mistakes

Listening Experiments

Group    Clean   0-1K   1K-2K   2K-3K   3K-4K   Clean
Normal   90.0    82.2   80.9    76.0    79.1    94.9
HD       73.3    62.0   49.3    50.0    58.0    73.0

[Chart: Normal Listeners and Listeners with Hearing Deficit plotted across Clean, 0-1K, 1-2K, 2-3K, 3-4K, Clean]

Current Research

• Data Analysis
  – Difficult to compare between sessions
    • Is the performance statistically different
      – Between group, within group?
  – Current data analysis is focused on individual sentences
    • Let’s go skiing today.
      – Same phonetic content
      – Same noise (or lack of)
      – Same intonation
      – Same session
      – Main variable is the speaker
        • Formants, shimmer, jitter, energy, etc.
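One way to approach the "statistically different" question for a between-group comparison is a two-proportion z-test on correct/total counts. The sketch below is only an illustration: the slides give no trial counts, so the counts used are hypothetical, and the test treats every trial as independent, which repeated listening by the same subjects does not strictly satisfy. For within-group comparisons across sessions, where the same listeners are tested twice, a paired test would likely be a better fit.

```python
import math

def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value). Uses the pooled proportion for the standard
    error, which is the usual form under the null of equal proportions.
    """
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# HYPOTHETICAL counts, not taken from the slides: 200 trials per
# condition, 180 correct in one and 152 correct in the other.
z, p = two_proportion_z_test(180, 200, 152, 200)
print(f"z = {z:.2f}, two-sided p = {p:.4g}")
```

With small trial counts the same percentage gap may not reach significance, which is one reason the raw percentages alone are hard to compare between sessions.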

“Male 8”

• Most easily recognized voice
• Except for Session 6
  – 2K-3K noise
• Currently, we build models the same
  – Good or bad?
• Can we figure out what is unique or not unique about an individual’s voice?

Condition   “Male 8”
Session 2   35
Clean       36
0-1K        36
1K-2K       33
2K-3K       16
3K-4K       33
0-4K        34
Clean       35
Whis        34
Rev         33
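The "Male 8" numbers can be screened for the condition that stands out with a simple robust outlier check. The sketch below is one possible approach, not the analysis used in the talk: it uses the median and the median absolute deviation (MAD), reads the slide's values in the order they appear, and uses a cut-off of 3.5 that is a common rule of thumb rather than anything stated in the source. The labels "Clean (first)" and "Clean (second)" are added only to keep the two clean sessions distinct.

```python
from statistics import median

# "Male 8" scores per condition, read off the slide in order.
# The unit is whatever the slide reports; it is not stated there.
scores = {
    "Session 2": 35,
    "Clean (first)": 36,
    "0-1K": 36,
    "1K-2K": 33,
    "2K-3K": 16,
    "3K-4K": 33,
    "0-4K": 34,
    "Clean (second)": 35,
    "Whis": 34,
    "Rev": 33,
}

def robust_outliers(values, threshold=3.5):
    """Return the names whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD. The threshold is an
    ASSUMED rule-of-thumb value, not something given in the slides.
    """
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        return []  # no spread to measure against
    return [
        name
        for name, v in values.items()
        if abs(0.6745 * (v - center) / mad) > threshold
    ]

flagged = robust_outliers(scores)
print(flagged)
```

On these numbers only the 2K-3K condition is flagged, which matches the slide's remark that this voice was easy to recognize except in Session 6.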