MELP Vocoders
Nima Moghadam
SN#:82245502
Saeed Nari
SN#:82270309
Supervisor
Dr. Saameti
April 2005
Sharif University of Technology
Page 0 of 23
Outline

Introduction
 MELP Vocoder Features
 Algorithm Description
 Parameters & Comparison
Page 1 of 23
Introduction

Traditional pitch-excited LPC vocoders use either a periodic pulse train or white noise as the excitation for the synthesis filter
– This gives intelligible speech at very low bit rates
– But it sometimes results in a mechanical or buzzy sound and is prone to tonal noise
Page 2 of 23
Introduction

These problems arise from:
– Inability of a simple pulse train to reproduce all kinds of voiced speech

The MELP vocoder uses a mixed-excitation model, which represents a richer ensemble of speech characteristics
– Produces more natural-sounding speech
Page 3 of 23
MELP vocoder


Robust in background noise environments
Based on the traditional LPC model, but also includes additional features:
– Mixed excitation
– Aperiodic pulses
– Pulse dispersion
– Adaptive spectral enhancement
Page 4 of 23
Mixed Excitation

Mixed excitation is implemented using a multi-band mixing model
– This model can simulate frequency-dependent voicing strength
– It uses a mixture of aperiodic/periodic pulses and white noise as the excitation
– The primary effect of this unit is to reduce the buzz heard in broadband acoustic noise
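The multi-band mixing idea can be sketched as follows. This is a minimal illustration, not the standard's implementation: the five band edges, the 31-tap windowed-sinc filters, the 8 kHz sample rate, and all function names are assumptions made for the example. Each band gets a voicing strength between 0 and 1; the pulse shaping filter is the voicing-weighted sum of the band filters, and the noise shaping filter uses the complementary weights.

```python
import math
import random

FS = 8000                                        # assumed sample rate (Hz)
BAND_EDGES_HZ = [0, 500, 1000, 2000, 3000, 4000]  # assumed band layout


def lowpass_fir(cutoff, taps=31):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of FS (0..0.5)."""
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(sinc * window)
    return h


def band_filters(taps=31):
    """Band-pass filters built as differences of adjacent low-pass filters."""
    lows = [lowpass_fir(edge / FS, taps) for edge in BAND_EDGES_HZ]
    return [[hi - lo for hi, lo in zip(lows[k + 1], lows[k])]
            for k in range(len(lows) - 1)]


def shaping_filters(voicing, taps=31):
    """Pulse filter = sum(v_k * h_k); noise filter = sum((1 - v_k) * h_k)."""
    bands = band_filters(taps)
    pulse_h = [sum(v * b[n] for v, b in zip(voicing, bands)) for n in range(taps)]
    noise_h = [sum((1 - v) * b[n] for v, b in zip(voicing, bands)) for n in range(taps)]
    return pulse_h, noise_h


def convolve(x, h):
    """Plain linear convolution of two lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def mixed_excitation(voicing, pitch_period, n_samples, seed=0):
    """Sum of a shaped periodic pulse train and shaped white noise."""
    rng = random.Random(seed)
    pulses = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    pulse_h, noise_h = shaping_filters(voicing)
    shaped_pulses = convolve(pulses, pulse_h)
    shaped_noise = convolve(noise, noise_h)
    return [p + q for p, q in zip(shaped_pulses, shaped_noise)][:n_samples]


# Example: strongly voiced low bands, noisy high bands.
excitation = mixed_excitation([1.0, 1.0, 0.6, 0.2, 0.0], pitch_period=40, n_samples=160)
```

Because the band filters here sum to a delayed unit impulse, setting every voicing strength to 1 reduces the output to a plain (delayed) pulse train, and setting every strength to 0 reduces it to white noise.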
Page 5 of 23
Aperiodic pulses

When input signal is voiced, MELP
vocoder can synthesize speech using
either aperiodic or periodic pulses.
– Aperiodic pulses are used during transition regions between voiced and unvoiced segments of the speech signal
– This produces erratic glottal pulses without introducing tonal noise
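A rough sketch of how aperiodic pulse positions might be generated: each pulse spacing is the pitch period perturbed by a random fraction. The jitter range of ±25% of the pitch period, the uniform distribution, and the function name are assumptions for illustration only.

```python
import random


def pulse_positions(pitch_period, n_samples, aperiodic, jitter=0.25, seed=0):
    """Return sample indices of excitation pulses within one block.

    With aperiodic=False the pulses are exactly one pitch period apart;
    with aperiodic=True each spacing is scaled by a random factor in
    [1 - jitter, 1 + jitter] (jitter value is an assumed parameter).
    """
    rng = random.Random(seed)
    positions = []
    t = 0.0
    while t < n_samples:
        positions.append(int(t))
        step = float(pitch_period)
        if aperiodic:
            step *= 1.0 + rng.uniform(-jitter, jitter)
        t += step
    return positions


periodic = pulse_positions(40, 160, aperiodic=False)
jittered = pulse_positions(40, 160, aperiodic=True)
```

The periodic call yields evenly spaced pulses, while the jittered call yields irregular spacings around the same average pitch, which is the "erratic glottal pulse" behaviour described above.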
Page 6 of 23
Pulse Dispersion

Pulse dispersion is implemented using a fixed pulse dispersion filter based on a flattened triangle pulse

The pulse dispersion filter improves the match between bandpass-filtered synthetic and natural speech waveforms in frequency bands that do not contain a formant resonance.
– Spreads the excitation energy within a pitch period
– Reduces the harsh quality of the synthetic speech
Page 7 of 23
Adaptive spectral enhancement filter

– Based on the poles of the vocal tract filter
– Used to enhance the formant structure in the synthetic speech
– This filter improves the match between synthetic and natural bandpass waveforms, giving a more natural speech output
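One common way to build a filter from the poles of the vocal tract filter is bandwidth expansion of the LPC polynomial. The sketch below assumes that approach: with A(z) = 1 + a1 z^-1 + ..., it forms the pole-zero filter A(z/alpha) / A(z/beta). The values alpha = 0.5 and beta = 0.8, the sign convention for A(z), and the function names are illustrative assumptions, and any additional tilt compensation is left out.

```python
def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma): lpc[i] -> lpc[i] * gamma**i (lpc[0] == 1)."""
    return [c * gamma ** i for i, c in enumerate(lpc)]


def pole_zero_filter(x, b, a):
    """Direct-form filter: y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


def spectral_enhance(excitation, lpc, alpha=0.5, beta=0.8):
    """Filter the excitation with A(z/alpha) / A(z/beta).

    With beta > alpha the poles sit closer to the unit circle than the
    zeros, so peaks at the formant frequencies are emphasised.
    """
    numerator = bandwidth_expand(lpc, alpha)
    denominator = bandwidth_expand(lpc, beta)
    return pole_zero_filter(excitation, numerator, denominator)


# Impulse response for a toy first-order LPC polynomial A(z) = 1 - 0.9 z^-1.
response = spectral_enhance([1.0, 0.0, 0.0, 0.0], [1.0, -0.9])
```

Choosing alpha equal to beta makes the zeros cancel the poles, so the filter passes the signal unchanged; widening the gap between the two strengthens the formant emphasis.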
Page 8 of 23
MELP Algorithm Description
(Encoder)
1. Filter the input speech to remove any low-frequency noise
2. This filtered speech is again filtered in order to perform the initial pitch search for the pitch estimation
3. The next step is to perform the bandpass voicing analysis
- In this step we decide whether to use a periodic/aperiodic pulse train or the white-noise model
Page 9 of 23
MELP Algorithm Description
(Encoder) cont’d


In this stage a voicing-strength parameter is estimated in each band, based on the normalized correlation function of the speech signal and of the smoothed rectified signal in the non-DC band
Let s_k(n) denote the speech signal in band k and u_k(n) denote the DC-removed, smoothed, rectified signal of s_k(n). The correlation function is:
$$R_x(p) = \frac{\sum_{n=0}^{N-1} x(n)\,x(n+p)}{\left[\sum_{n=0}^{N-1} x^2(n)\,\sum_{n=0}^{N-1} x^2(n+p)\right]^{1/2}}$$
P – the pitch of the current frame
N – the frame length
Voicing strength for band k – defined as max(R_{s_k}(P), R_{u_k}(P))
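The voicing-strength computation can be sketched directly from the correlation formula. The envelope step below (full-wave rectification, a one-pole smoother with coefficient 0.9, then mean removal) is only an assumed stand-in for the "DC-removed smoothed rectified signal"; the function names are hypothetical.

```python
import math


def normalized_correlation(x, p, n):
    """R_x(p) over n samples; x must contain at least n + p samples."""
    num = sum(x[i] * x[i + p] for i in range(n))
    energy_0 = sum(x[i] ** 2 for i in range(n))
    energy_p = sum(x[i + p] ** 2 for i in range(n))
    denom = math.sqrt(energy_0 * energy_p)
    return num / denom if denom > 0 else 0.0


def dc_removed_envelope(s, smooth=0.9):
    """Assumed envelope: rectify, smooth with a one-pole filter, remove the mean."""
    env, state = [], 0.0
    for v in s:
        state = smooth * state + (1.0 - smooth) * abs(v)
        env.append(state)
    mean = sum(env) / len(env)
    return [e - mean for e in env]


def band_voicing_strength(s_k, pitch, n):
    """max(R_sk(P), R_uk(P)) for one band signal s_k."""
    u_k = dc_removed_envelope(s_k)
    return max(normalized_correlation(s_k, pitch, n),
               normalized_correlation(u_k, pitch, n))


# A signal that repeats every 40 samples correlates strongly at lag 40.
band = [math.sin(2 * math.pi * i / 40) for i in range(200)]
strength = band_voicing_strength(band, pitch=40, n=160)
```

A strength near 1 indicates a strongly periodic band, while values near 0 indicate a noise-like band.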
Page 10 of 23
MELP Algorithm Description
(Encoder ) cont’d

The jittery state is determined by the peakiness of the full-wave rectified LP residue e(n):
$$\text{Peakiness} = \frac{\left[\frac{1}{N}\sum_{n=0}^{N-1} e^2(n)\right]^{1/2}}{\frac{1}{N}\sum_{n=0}^{N-1} |e(n)|}$$
If the peakiness is greater than some threshold, the speech frame is flagged as jittered (the aperiodic flag is set)
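The peakiness measure is the ratio of the RMS value to the mean absolute value of the residual, which can be sketched in a few lines. The slides do not give the threshold, so it is left as a caller-supplied parameter here; the function names and the value 1.5 in the example are hypothetical.

```python
import math


def peakiness(e):
    """Ratio of RMS value to mean absolute value of the residual frame e."""
    n = len(e)
    rms = math.sqrt(sum(v * v for v in e) / n)
    mean_abs = sum(abs(v) for v in e) / n
    return rms / mean_abs if mean_abs > 0 else 0.0


def is_jittered(e, threshold):
    """Flag the frame as jittered when peakiness exceeds the given threshold."""
    return peakiness(e) > threshold


flat = [1.0] * 160                 # constant magnitude: lowest possible peakiness
spiky = [0.0] * 160
spiky[80] = 1.0                    # single isolated pulse: very high peakiness
flags = (is_jittered(flat, threshold=1.5), is_jittered(spiky, threshold=1.5))
```

A frame of constant magnitude has peakiness 1, while a frame of length N holding one isolated pulse has peakiness sqrt(N), so the measure grows as the residual energy concentrates into a few samples.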
Page 11 of 23
MELP Algorithm Description
(Encoder) cont’d
4. Apply LPC analysis
5. Calculate the final pitch estimate
6. Calculate the gain estimate
7. Quantize the LPC coefficients, pitch, gain and bandpass voicing
8. Determine and quantize the Fourier magnitudes
– The information in these coefficients improves the accuracy of the speech production model at the perceptually important lower frequencies
Page 12 of 23
MELP Encoder
[Block diagram. Blocks: Input signal, Pre-filter, LPC analysis filter, Pitch search, Final pitch and voicing decision, Fourier magnitude calculation, Bandpass voicing decision, LSF quantization, Gain calculator, Quantize gain/pitch/voicing/jitter, Apply forward error correction, Transmitted bitstream]
Page 13 of 23
MELP Algorithm (Decoder)
1. Decode the pitch
2. Apply gain attenuation
3. Linearly interpolate all of the synthesis parameters pitch-synchronously
4. Generate the mixed excitation
Page 14 of 23
MELP Algorithm (Decoder) cont’d
5. Apply the adaptive spectral enhancement filter
6. Perform LPC synthesis and apply the gain factor
7. Apply dispersion filtering
Page 15 of 23
MELP Decoder
[Block diagram. Blocks: Received bitstream, Decode parameters, Noise generator, Noise shaping filter, Pulse generator, Pulse position jitter, Pulse shaping filter, Adder (+), Adaptive spectral enhancement, LPC synthesis filter, Gain, Pulse dispersion filter, Synthesized speech]
Page 16 of 23
Parameter Quantization
Parameters                    Voiced   Unvoiced
LSF parameters                25       25
Fourier magnitudes            8        -
Gain (2 per frame)            8        8
Pitch, overall voicing        7        7
Bandpass voicing              4        -
Aperiodic flag                1        -
Error protection              -        13
Sync bit                      1        1
Total bits / 22.5 ms frame    54       54
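As a quick check of the table, the per-parameter bit counts can be summed for each frame type and converted to a bit rate. The dictionary layout and variable names below are just for illustration; a dash in the table is treated as 0 bits.

```python
# (voiced bits, unvoiced bits) per parameter, taken from the table above.
BITS_PER_FRAME = {
    "LSF parameters": (25, 25),
    "Fourier magnitudes": (8, 0),
    "Gain (2 per frame)": (8, 8),
    "Pitch, overall voicing": (7, 7),
    "Bandpass voicing": (4, 0),
    "Aperiodic flag": (1, 0),
    "Error protection": (0, 13),
    "Sync bit": (1, 1),
}
FRAME_MS = 22.5

voiced_total = sum(v for v, _ in BITS_PER_FRAME.values())
unvoiced_total = sum(u for _, u in BITS_PER_FRAME.values())

# Bits per frame divided by frame duration in seconds gives bits per second.
bit_rate = voiced_total * 1000 / FRAME_MS

print(voiced_total, unvoiced_total, bit_rate)
```

Both frame types come to 54 bits, and 54 bits every 22.5 ms corresponds to the 2400 bps rate of the coder. In unvoiced frames, the bits not needed for Fourier magnitudes, bandpass voicing and the aperiodic flag are reused for error protection.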
Page 17 of 23
Bit transmission order
Page 18 of 23
Comparison of the 2400 BPS MELP with
other Standard Coders
Diagnostic Acceptability Measure
Two Conditions
– Quiet
– Office
Continuously Variable Slope Delta Modulation (CVSD)
• 16,000 bps
Code Excited Linear Prediction (CELP)
• 4800 bps
• FS1016
Mixed Excitation Linear Prediction (MELP)
• 2400 bps
• FIPS Publication 137
Linear Predictive Coding (LPC)
• 2400 bps
Page 19 of 23
Comparison of the 2400 BPS MELP with
other Standard Coders (cont’d)

Mean Opinion Score in Six Conditions
Quiet
– Anechoic Sound Chamber
– Dynamic Microphone
Quiet - H250
– Anechoic Sound Chamber
– H250 Microphone
1% Random Bit Errors
– Anechoic Sound Chamber
– Dynamic Microphone
0.5% Random Block Errors
– Anechoic Sound Chamber
– Dynamic Microphone
– 50% Errors within a 35 ms block
Office
– Modern Office Environment
– Dynamic Microphone
Mobile Command Environment
– Field Shelter
– EV M87 Microphone
Page 20 of 23
Comparison of the 2400 BPS MELP with
other Standard Coders (cont’d)

Complexity, compared using three measurements
– RAM
– ROM
– MIPS
Page 21 of 23
Voice samples
Original Sound
MELP 1800
MELP 2000
MELP 2200
Page 22 of 23
Any Questions?
Page 23 of 23