PROJECT PROPOSAL Shamalee Deshpande

Problem Statement
Extracting soft biometric features from speech:
• Age
• Gender
• Accent
Speaker Database
• A speaker database from the LDC Corpus Catalog*
• Preferably, use half the speaker set for training and the other half for verifying the results
• Should contain speakers of varying gender, age and accent
*Linguistic Data Consortium, http://www.ldc.upenn.edu/Catalog/
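The half-and-half split described above might be sketched as follows. This is only an illustration: the function name `split_speakers` and the seeded shuffle are assumptions, not part of the proposal. The shuffle is added so that both halves are likely to contain a mix of gender, age and accent.

```python
import random

def split_speakers(speaker_ids, seed=0):
    """Split speaker IDs into a training half and a verification half.

    Hypothetical helper: the shuffle (an assumption, not from the proposal)
    avoids a split that follows the ordering of the corpus.
    """
    ids = sorted(speaker_ids)           # deterministic starting order
    random.Random(seed).shuffle(ids)    # reproducible shuffle
    half = len(ids) // 2
    return ids[:half], ids[half:]       # (training, verification)
```

Splitting by speaker rather than by utterance keeps every verification speaker unseen during training.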
Possible Computation for Gender
• Pitch
In cepstrum analysis, the formant (vocal tract) contribution is confined to the low-quefrency part of the cepstrum and can be separated out, which isolates the pitch frequency. LPC can also be used to find pitch.
Pitch is used to classify speech with regard to gender.
Average male pitch = 100-132 Hz; average female pitch = 142-256 Hz
Speech -> Window -> DFT -> LOG -> IDFT -> Cepstrum
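The window, DFT, log and inverse DFT steps could be sketched roughly as below, using NumPy. The function names, the 70-400 Hz search range and the 137 Hz decision threshold (the midpoint of the gap between the two pitch ranges quoted above) are assumptions made for illustration; the proposal does not specify them.

```python
import numpy as np

def cepstral_pitch(frame, sample_rate, fmin=70.0, fmax=400.0):
    """Estimate the pitch (Hz) of one voiced frame from its real cepstrum."""
    windowed = frame * np.hamming(len(frame))        # Window
    spectrum = np.fft.rfft(windowed)                 # DFT
    log_mag = np.log(np.abs(spectrum) + 1e-10)       # LOG of the magnitude
    cepstrum = np.fft.irfft(log_mag)                 # IDFT -> cepstrum
    # Search for the pitch peak only at quefrencies matching fmin..fmax
    # (assumed search range, in samples).
    qmin = int(sample_rate / fmax)
    qmax = int(sample_rate / fmin)
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    return sample_rate / peak

def classify_gender_by_pitch(pitch_hz, threshold_hz=137.0):
    """Threshold classifier; 137 Hz is an assumed midpoint, not a tuned value."""
    return "male" if pitch_hz < threshold_hz else "female"
```

In practice the estimate would be averaged over many voiced frames per speaker before thresholding, since a single frame can be unvoiced or noisy.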
Possible Computation for Accent
• People usually develop characteristic styles of pronouncing phonemes from an early age, dependent on the primary language learned.
• Cepstral coefficients, presumably MFCCs, may again be used to analyze the speech spectrum and identify local/non-local speakers in a database.
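For reference, MFCCs for a single frame could be computed along these lines. This is a minimal NumPy sketch under common textbook assumptions (the 2595/700 mel formula, 26 triangular filters, 13 coefficients, an unnormalized DCT-II); none of these settings come from the proposal, and the function names are illustrative.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    energies = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II written out explicitly to avoid extra dependencies.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return basis @ log_energies
```

The per-frame vectors would then feed whatever classifier is chosen to separate local from non-local speakers; that choice is left open here, as it is in the proposal.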
Possible Computation for Age
BUZZER: glottal excitation, characterized by intensity and pitch
TUBE: vocal tract, characterized by formants
Vocal tract length is said to be a good classifier of the age of a speaker.
Formant frequencies derived using LPC correlate with the length of the vocal tract.
Children are said to have a higher formant frequency range than adults.
Elderly speakers are said to have lower formant frequencies (F1, F2, F3) than their younger counterparts, most noticeably in F1.
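One way to go from LPC to formants to a vocal tract length estimate is sketched below. The autocorrelation-method LPC and root-finding steps are standard, but the length formula assumes a uniform tube closed at the glottis and open at the lips, where F_n = (2n - 1) c / (4 L). That tube model, the 35000 cm/s speed of sound and all function names are assumptions for illustration, not something the proposal specifies.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns [1, -a1, ..., -ap] for A(z)."""
    windowed = frame * np.hamming(len(frame))
    autocorr = np.correlate(windowed, windowed, mode="full")[len(windowed) - 1:]
    # Normal equations R a = r, with R a Toeplitz matrix of autocorrelations.
    R = np.array([[autocorr[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    r = autocorr[1:order + 1]
    a = np.linalg.solve(R, r)
    return np.concatenate(([1.0], -a))

def formants_from_lpc(lpc, sample_rate, min_hz=90.0):
    """Formant candidates (Hz) from the angles of the complex LPC poles."""
    roots = np.roots(lpc)
    roots = roots[np.imag(roots) > 0]            # one pole per conjugate pair
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    return np.sort(freqs[freqs > min_hz])

def vocal_tract_length_cm(formants_hz, speed_of_sound_cm_s=35000.0):
    """Average L = (2n - 1) c / (4 F_n) over the given formants F1, F2, ...

    Assumes a uniform tube closed at one end (a simplification).
    """
    formants = np.asarray(formants_hz, dtype=float)
    n = np.arange(1, len(formants) + 1)
    return float(np.mean((2 * n - 1) * speed_of_sound_cm_s / (4.0 * formants)))
```

Under this model, lower formants imply a longer estimated tract, which is consistent with the child-versus-adult trend noted above. Real LPC poles also include wide-bandwidth roots that are not formants, so a bandwidth filter would likely be needed in practice.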