Vocal Tract & Lip Shape Estimation

By

MS Shah & Vikash Sethia Supervisor: Prof. PC Pandey

EE Dept, IIT Bombay

AIM-2003, EE Dept, IIT Bombay, 27th June, 2003

ABSTRACT:

The display of intensity, pitch, and vocal tract shape is considered helpful in speech training of the hearing impaired. A speech analysis package has been developed in MATLAB for displaying speech waveforms, pitch and energy contours, the spectrogram, and the areagram (a two-dimensional plot of the cross-sectional area of the vocal tract as a function of time and position along the tract length). Vocal tract shape estimation works satisfactorily for vowels, but during stop closures the place of closure cannot be estimated because the signal energy is very low. Methods need to be investigated for predicting the vocal tract shape during stop closure from the shapes estimated on either side of the closure. Work is in progress on lip shape estimation, which may find application in video telephony.


Introduction

Hearing impairment → Lack of auditory feedback during speech production → Speech impairment

Speech training for hearing-impaired children by visual (using a mirror) & tactile feedback: some important features and efforts are not distinguishable

Speech training aids: display of articulatory efforts and acoustic parameters (vocal tract and lip shape, pitch, and energy variations)


Vocal tract shape estimation

General model for speech production system

$$s(n) = u(n) \ast g(n) \ast v(n) \ast r(n)$$

where $\ast$ denotes convolution and

s(n) = speech signal,
u(n) = glottal excitation,
g(n) = glottis impulse response,
v(n) = impulse response of the vocal tract,
r(n) = impulse response of radiation from lips.
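The cascade in this model can be illustrated with a minimal sketch in plain Python. The four short sequences are hypothetical toy values chosen only to keep the arithmetic easy to follow; they are not measured glottal, vocal tract, or radiation responses.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def synthesize(u, g, v, r):
    """s(n) = u(n) * g(n) * v(n) * r(n), where * is convolution."""
    return convolve(convolve(convolve(u, g), v), r)


# Hypothetical toy sequences (assumptions for illustration only).
u = [1.0, 0.0, 0.0, 1.0]  # excitation: two impulses, three samples apart
g = [1.0, 0.5]            # glottis impulse response
v = [1.0, -0.5]           # vocal tract impulse response
r = [1.0, -1.0]           # lip radiation modelled as a first difference

s = synthesize(u, g, v, r)
print(s)
```

Because convolution is commutative and associative, the order in which g, v, and r are applied does not change s(n). This is what allows the three responses to be lumped into one filter for analysis.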


Cont..

Acoustic tube model of the vocal tract

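At the m-th section:

volume velocity:
$$u_m(x,t) = u_m^{+}(x,t) - u_m^{-}(x,t)$$

pressure:
$$p_m(x,t) = \frac{\rho c}{A_m}\left[u_m^{+}(x,t) + u_m^{-}(x,t)\right]$$

reflection coefficient:
$$r_m = \frac{A_m - A_{m+1}}{A_m + A_{m+1}}$$

where A_m = cross-sectional area of the m-th section, ρ = air density, c = speed of sound, and u_m^+, u_m^- = forward and backward travelling volume-velocity waves.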
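The link between reflection coefficients and section areas can be sketched as below. The sketch assumes the sign convention r_m = (A_m - A_{m+1}) / (A_m + A_{m+1}); other texts use the opposite sign or number the sections from the other end of the tract, which changes the recursion. The function names and the area values are illustrative.

```python
def reflection_coefficients(areas):
    """r_m = (A_m - A_{m+1}) / (A_m + A_{m+1}) for each junction (assumed convention)."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def relative_areas(refl, first_area=1.0):
    """Invert the relation above: A_{m+1} = A_m * (1 - r_m) / (1 + r_m).

    Only area ratios are determined by the reflection coefficients, so the
    first area is an arbitrary reference (hence "relative" areas).
    """
    areas = [first_area]
    for r in refl:
        areas.append(areas[-1] * (1.0 - r) / (1.0 + r))
    return areas


# Hypothetical four-section tube (areas in arbitrary units).
tube = [1.0, 2.0, 0.5, 4.0]
refl = reflection_coefficients(tube)
recovered = relative_areas(refl, first_area=tube[0])
print(refl)
print(recovered)
```

A reflection coefficient of zero means two adjacent sections have equal area, and values near ±1 correspond to an abrupt constriction or widening.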

Cont..

Speech analysis model (Wakita, 1973)

Assumption: vocal tract represented as an all-pole filter with impulse response

$$h(n) = g(n) \ast v(n) \ast r(n)$$

Algorithmic steps:

■ inverse filtering for error signal with LMS technique
■ set of simultaneous equations solved with Robinson’s algorithm for reflection coefficients & relative area values
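The second step can be sketched with the Levinson-Durbin recursion, a standard recursive solver for the Toeplitz normal equations of linear prediction. It is used here as a stand-in: whether it matches the slides' "Robinson's algorithm" step by step is an assumption, and the function names and test sequence are illustrative. The sketch uses the inverse filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so the error signal is e(n) = s(n) + a_1 s(n-1) + ... + a_p s(n-p).

```python
def autocorrelation(x, max_lag):
    """Autocorrelation R(0..max_lag) of a finite (already windowed) frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(R, order):
    """Solve the linear-prediction normal equations recursively.

    Returns (a, k, err): inverse-filter coefficients a[0..order] with a[0] = 1,
    the reflection (PARCOR) coefficients k[0..order-1], and the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = R[0]
    for m in range(1, order + 1):
        acc = R[m] + sum(a[i] * R[m - i] for i in range(1, m))
        km = -acc / err
        k.append(km)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + km * a[m - i]
        new_a[m] = km
        a = new_a
        err *= 1.0 - km * km
    return a, k, err


# Autocorrelation of a first-order process, R(k) = 0.5**k (toy values).
a, k, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, k, err)
```

For this toy input the second reflection coefficient is zero, because a first-order predictor already explains the data. The sign of k relative to the tube reflection coefficients depends on the convention adopted and on which end of the tract the sections are numbered from.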


Cont..

Implementation

■ Set-up: PC with sound card for signal acquisition (sampling rate used: 11.025 kSa/s)

■ “VTAG-1” developed for speech processing & display

■ Pre-emphasis for 6 dB/octave equalization; analysis window: 256-sample Hamming with 50% overlap

■ Robinson’s algorithm for obtaining reflection coefficients & area values

■ Bezier-form algorithm for interpolation of area values
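The front-end steps listed above (pre-emphasis, then 256-sample Hamming windows with 50% overlap) can be sketched as follows. The first-order difference filter and the coefficient 0.95 are assumptions, a common way to obtain roughly 6 dB/octave of high-frequency boost; the slides do not state which filter VTAG-1 uses. The function names are illustrative.

```python
import math


def pre_emphasis(x, alpha=0.95):
    """y(n) = x(n) - alpha * x(n-1); alpha = 0.95 is an assumed value."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(length):
    """Hamming window, symmetric form."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def windowed_frames(x, length=256, overlap=0.5):
    """Split x into Hamming-windowed frames; the hop is length * (1 - overlap)."""
    hop = int(length * (1.0 - overlap))
    w = hamming(length)
    return [[x[start + n] * w[n] for n in range(length)]
            for start in range(0, len(x) - length + 1, hop)]


# Toy input: 1024 samples of a 200 Hz sine at the 11025 Sa/s rate from the slides.
fs = 11025
signal = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(1024)]
frames = windowed_frames(pre_emphasis(signal))
print(len(frames), len(frames[0]))
```

At 11.025 kSa/s a 256-sample frame spans about 23 ms and the 128-sample hop gives a new area estimate roughly every 11.6 ms.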


VTAG-1 result for all-vowel word /aIje/

Synthesized vowels: /a/, /i/, /u/

Amplitude/pitch modulated synthesized vowel /a/: amplitude modulated, pitch modulated, amplitude & pitch modulated

Spectrograms for V-C-V sequences: /ata/, /aka/, /ada/, /aga/

Areagrams for V-C-V sequences: /ata/, /aka/, /ada/, /aga/

Lip shape estimation

Mouth parameters:

Parameter estimation:

■ Pitch tracking: odd harmonics absent for analysis window length = 2 × pitch period

■ Magnitude spectrum above 4000 Hz clipped to zero

■ Mean & variance used for generation of predictor surfaces
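One reading of the pitch-tracking remark is that when the analysis frame spans exactly two pitch periods, the DFT bins fall at multiples of half the pitch frequency, so a periodic signal leaves the odd-numbered bins empty. The sketch below illustrates this with a rectangular window and a synthetic two-harmonic signal; the pitch period of 20 samples, the signal, and the function name are assumptions for illustration, not the authors' procedure.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 of a real frame (direct O(N^2) DFT)."""
    n_pts = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                    for n in range(n_pts)))
            for k in range(n_pts // 2 + 1)]


P = 20  # hypothetical pitch period in samples
one_period = [math.sin(2.0 * math.pi * n / P) + 0.5 * math.sin(4.0 * math.pi * n / P)
              for n in range(P)]

# Frame length = 2 * pitch period: harmonics land on even bins only.
matched = dft_magnitudes(one_period * 2)
odd_matched = max(matched[1::2])

# Frame length not equal to 2 * pitch period: energy leaks into odd bins.
mismatched = dft_magnitudes((one_period * 2)[:36])
odd_mismatched = max(mismatched[1::2])

print(odd_matched, odd_mismatched)
```

A pitch tracker built on this idea could vary the frame length and look for the length that minimises the odd-bin energy; that search is not shown here.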

Lip shape estimation results

Pitch and mean vs. variance result (1): synthesized amplitude modulated vowel /u/

Pitch and mean vs. variance result (2): synthesized pitch/amplitude modulated vowel /a/

Pitch and mean vs. variance result (3): synthesized pitch modulated vowel /i/

Summary

■ Analysis & display package VTAG-1 developed for displaying pitch/energy variation, spectrogram, & areagram (2-D plot of vocal tract area), to investigate the problems in estimation of vocal tract shape, for use in speech training aids for hearing-impaired children.

Cont.

■ Area estimation for vowels: not affected by amplitude & pitch variation

■ Area estimation during stop closure: place of closure cannot be estimated from the analysis result during stop closure

■ Further work:

  - Investigate methods for predicting vocal tract area during stop closure from the areas estimated on either side of closure

  - Implement algorithm for generation of predictor surfaces for extraction of lip shape estimation parameters