Waveform Speech Coding Algorithms: An Overview
Outline
Introduction
Concepts
Quantization
PCM
DPCM
ADPCM
Standards & Applications
G711
G726
Performance Comparison & Examples
Summary & Conclusion
Technical Presentation Page 2
Introduction
Motivation
What is speech coding?
It is the procedure of representing a digitized speech signal as efficiently as possible,
while maintaining a reasonable level of speech quality.
Why would we want to do that?
To answer this, let’s have a look at the structure of the coding system.
[Figure: structure of the coding system, with one block labelled “Our Guy”]
Introduction
Motivation
Filtering & Sampling
Most speech content lies between 300 and 3400 Hz.
According to the Nyquist theorem, Fs >= 2 fm (to avoid aliasing).
A sampling frequency of 8 kHz is selected (8 >= 2 × 3.4).
For good quality, 16 bits are used to represent each sample.
Bit-rate = 8 kHz × 16 bits = 128 kbps
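As a quick check of the arithmetic above, here is a minimal sketch that computes the minimum sampling rate from the Nyquist criterion and the resulting uncompressed bit rate. The function names are illustrative and do not come from the slides.

```python
def min_sampling_rate_hz(max_signal_freq_hz):
    """Nyquist criterion: the sampling rate must satisfy Fs >= 2 * fm."""
    return 2.0 * max_signal_freq_hz


def pcm_bit_rate_bps(sampling_rate_hz, bits_per_sample):
    """Uncompressed bit rate of a single channel in bits per second."""
    return sampling_rate_hz * bits_per_sample


# Speech band edge of 3400 Hz needs at least 6800 Hz; 8000 Hz satisfies this.
print(min_sampling_rate_hz(3400))
# 8 kHz sampling with 16 bits per sample.
print(pcm_bit_rate_bps(8000, 16))
```

The second value is in bits per second, so 128000 corresponds to the 128 kbps quoted above.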
Input Rate
The input rate can be even higher. Skype, for example, uses a 16 kHz sampling frequency,
which at 16 bits per sample results in an input rate of 256 kbit/s. This is a waste of
bandwidth that could instead be used by other services and applications.
Source Coding (Speech Coding in this Context)
[1]
Introduction
Motivation
Desirable Properties of a Speech Coder
Low Bit-Rate: A lower bit-rate needs less transmission bandwidth,
leaving room for other services and applications.
High Speech Quality: Speech quality is the rival of “low bit-rate”. It is important for the
decoded speech quality to be acceptable for the target application.
Low Coding Delay: The process of speech coding introduces extra delay, which might
affect applications that have real-time requirements.
[1]
Introduction
Speech Coding Categories
What are the different categories of speech coding?
Speech coding is divided into three different categories:
Waveform codecs (PCM, DM, APCM, DPCM, ADPCM)
Vocoders (LPC, homomorphic, etc.)
Hybrid codecs (CELP, SELP, RELP, APC, SBC, etc.)
[2]
Concepts
Quantization
What Is Quantization ?
Quantization is the process of transforming the sample amplitude of a message into a
discrete amplitude from a finite set of possible amplitudes.
[3]
Each sampled value is approximated with a quantized pulse. With a step size of q between
levels, the approximation results in an error no larger than q/2 in the positive direction or
–q/2 in the negative direction.
Concepts
Quantization
Understanding Quantization
To understand quantization better, let’s have a look at the following example:
Concepts
Quantization
Classification Of Quantization Process
The Quantization process is classified as follows:
Uniform Quantization: The representation levels are equally spaced (Uniformly spaced)
Midtread type
Midrise type
Non-Uniform Quantization: The representation levels have variable spacing from one
another .
[4]
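The two uniform quantizer types can be sketched as follows. This is a minimal illustration, assuming a step size q and an unbounded input range; the function names are hypothetical. A midtread quantizer has zero as one of its output levels, while a midrise quantizer has a decision boundary at zero. In both cases the error stays within q/2.

```python
import math


def quantize_midtread(x, q):
    """Uniform midtread quantizer: levels at ..., -q, 0, q, ..."""
    return q * math.floor(x / q + 0.5)


def quantize_midrise(x, q):
    """Uniform midrise quantizer: levels at ..., -q/2, q/2, 3q/2, ..."""
    return q * (math.floor(x / q) + 0.5)


q = 1.0
for x in (-0.3, 0.0, 0.3, 1.2):
    # Print the input next to both quantized values for comparison.
    print(x, quantize_midtread(x, q), quantize_midrise(x, q))
```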
But why do we need such a classification?
Concepts
Quantization
Human Speech – Excursion & Recap (1)
Speech can be broken into two different categories:
Voiced (zzzzz)
Unvoiced (sssss)
Naturally occurring speech signals are composed of a combination of the above categories.
Take the word “Goat” for example:
[4]
“Goat” contains two voiced signals, followed by a partial closure of the vocal tract and then an
unvoiced signal. These occur at 3400-3900, 3900-5400, and 6300-6900, respectively.
Concepts
Quantization - why do we need such classification ?! (1)
Human Speech – Excursion & Recap (2)
It should be noted that:
The peak-to-peak amplitude of voiced signals is approximately ten times that of un-voiced
signal.
Un-voiced signals contain more information, and thus higher entropy than voiced signals.
The telephone system must provide higher resolution for lower amplitude signals
Statistics of Speech Signals:
[Figure: probability of occurrence versus amplitude of speech signals] [3] [6]
Concepts
Quantization - why do we need such classification ?! - (2)
Quantization Noise
The quantization process is lossy (it introduces error).
The error is defined as the difference between the input signal M and the output signal V. This
error E is called the quantization noise.
Consider the simple example:
M = (3.117, 4.56, 2.31, 7.82, 1)
V = (3, 3, 2, 7, 2)
E = M – V = (0.117, 1.56, 0.31, 0.82, -1)
How do we calculate the noise power?
Consider an input m of continuous amplitude in the range (-M_max, M_max).
Assuming a uniform quantizer, how do we get the quantization noise power?
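For reference, the standard textbook result for a uniform quantizer with L levels is a step size q = 2 · M_max / L and, assuming the error is uniformly distributed over one step, a noise power of q² / 12. The sketch below compares that formula with the mean squared error measured over a dense grid of inputs. The function names, the midrise quantizer and the grid size are assumptions made for illustration.

```python
import math


def quantization_noise_power(m_max, levels):
    """Theoretical noise power q^2 / 12 of a uniform quantizer."""
    q = 2.0 * m_max / levels
    return q * q / 12.0


def measured_noise_power(m_max, levels, n_points=100_000):
    """Mean squared error over evenly spaced inputs in (-m_max, m_max)."""
    q = 2.0 * m_max / levels
    total = 0.0
    for i in range(n_points):
        # Midpoint grid, so no input lands exactly on a decision boundary.
        m = -m_max + (i + 0.5) * (2.0 * m_max / n_points)
        # Midrise quantizer: output is the centre of the cell containing m.
        v = (math.floor(m / q) + 0.5) * q
        total += (m - v) ** 2
    return total / n_points


print(quantization_noise_power(1.0, 16))
print(measured_noise_power(1.0, 16))
```

The two printed values should agree closely, since the grid covers every quantization cell evenly.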
Concepts
Quantization - why do we need such classification ?! - (3)
Comparison – Uniform Vs. Non-Uniform Usage
Speech signals don’t require high quantization resolution for
high amplitudes (50% vs. 15%).
Is it wasteful to use a uniform quantizer?
The goal is to increase the SQNR: more levels for low amplitudes, fewer levels for high ones.
Maybe use a non-uniform quantizer?
[3]
Concepts
Quantization
More About Non-Uniform Quantizers (Companding)
Non-uniform quantizer = use more levels where you need them.
The human ear follows a logarithmic process in which high-amplitude sounds don’t
require the same resolution as low-amplitude sounds.
One way to achieve non-uniform quantization is to use what is called “companding”.
Companding = “Compressing + Expanding”
Compressor function -> Uniform quantization -> Expander function (inverse of the compressor function)
Concepts
Quantization
What is the purpose of a Compander ?
The purpose of a compander is to equalize the histogram of speech signals so that the
reconstruction levels tend to be equally used.
[6]
There are two well-known companding techniques, each following its own encoding law:
A-Law Companding
µ-Law Companding
[6]
Concepts
Quantization
A-Law Encoding
µ-Law Encoding
[3]
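For reference, the standard continuous forms of the two laws, for an input normalized to |x| <= 1, are F(x) = sgn(x) · ln(1 + µ|x|) / ln(1 + µ) for µ-law, and for A-law F(x) = sgn(x) · A|x| / (1 + ln A) when |x| < 1/A, otherwise sgn(x) · (1 + ln(A|x|)) / (1 + ln A). The sketch below implements them directly. The function names are illustrative, and the default A = 87.6 is the commonly quoted standard value, passed as a parameter so it can be changed.

```python
import math

MU = 255.0  # value of µ used in practice
A = 87.6    # commonly quoted standard value (assumption, see lead-in)


def mu_law_compress(x, mu=MU):
    """µ-law compressor for a normalized input in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def a_law_compress(x, a=A):
    """A-law compressor: linear below 1/A, logarithmic above."""
    ax = abs(x)
    if ax < 1.0 / a:
        y = a * ax / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)


# Small amplitudes are boosted before uniform quantization.
print(mu_law_compress(0.01), a_law_compress(0.01))
```

Applying a uniform quantizer between the compressor and the expander gives finer effective steps for small amplitudes and coarser steps for large ones.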
Concepts
Quantization
Companding Approximation
Logarithmic functions are slow to compute, so why not approximate?
3 bits, 8 segments (chords) approximate the curve
P is the sign bit of the output
S’s are the segment code bits
Q’s are the quantization code bits
[3]
Concepts
Quantization
Companding Approximation – Algorithm
Encoding
Add a bias of 33 to the absolute value of the input sample
Determine the bit position of the most significant 1 bit among bits 5 to 12 of the biased input
Subtract 5 from that position; the result is the segment code
Finally, the 4-bit quantization code is set to the 4 bits that follow the most
significant 1 bit found among bits 5 to 12
Decoding
Multiply the quantization code by 2 and add the bias of 33 to the result
Multiply the result by 2 raised to the power of the segment code
Subtract the bias from the result
Use the P bit to determine the sign of the result
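The encoding and decoding steps above can be sketched as follows. This is a minimal illustration rather than a complete G.711 implementation: the function names, the packing of the result as one 8-bit integer in the order P, segment code, quantization code, and the clipping of the magnitude to 8158 (so the biased value fits in bits 0 to 12) are assumptions. The bit inversion that G.711 applies to transmitted µ-law codes is left out.

```python
BIAS = 33
MAX_MAGNITUDE = 8158  # assumption: keeps magnitude + BIAS within 13 bits


def mu_law_encode(sample):
    """Encode a signed linear sample into an 8-bit P/SSS/QQQQ code."""
    sign = 1 if sample < 0 else 0
    biased = min(abs(sample), MAX_MAGNITUDE) + BIAS
    # Position of the most significant 1 bit (always between 5 and 12).
    position = biased.bit_length() - 1
    segment = position - 5
    # The 4 bits that follow the most significant 1 bit.
    quant = (biased >> (position - 4)) & 0x0F
    return (sign << 7) | (segment << 4) | quant


def mu_law_decode(code):
    """Decode an 8-bit P/SSS/QQQQ code back to a signed linear sample."""
    sign = (code >> 7) & 1
    segment = (code >> 4) & 0x07
    quant = code & 0x0F
    magnitude = ((2 * quant + BIAS) << segment) - BIAS
    return -magnitude if sign else magnitude


code = mu_law_encode(-656)
print(format(code, "08b"), mu_law_decode(code))
```

For the input -656 the sketch yields the code 1 100 0101 and decodes it to -655.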
Example ?!
[3]
Concepts
Quantization
µ-Law Encoding - Example
Example input: -656
P   S S S   Q Q Q Q
1   1 0 0   0 1 0 1
The sample is negative, so bit P becomes 1
Add 33 to the absolute value to bias high input values (due to wrapping)
The result after adding is 689 = 0001010110001 (binary)
The most significant 1 bit among positions 5 to 12 is at position 9
Subtracting 5 from the position value yields 4, the segment code (100)
Finally, the 4 bits after that position (0101) are inserted as the quantization code
Concepts
Quantization
µ-Law Decoding - Example
Example: the codeword produced for the input -656
P   S S S   Q Q Q Q
1   1 0 0   0 1 0 1
The quantization code is 0101 = 5, so 5 × 2 + 33 = 43
The segment code is 100 = 4, so 43 × 2^4 = 688
Subtract the bias: 688 - 33 = 655
But P is 1, so the final result is -655
The quantization noise is 1 (very small)
Concepts
Quantization
µ-Law Encoding
Approximately linear for small input values and logarithmic for large input values
The value of µ used in practice is 255
Used for speech signals
Used for PCM telephone systems in the US, Canada and Japan
A-Law Encoding
Linear segments for low-level inputs and a logarithmic segment for high-level inputs
The value of A used in practice is 87.6
Used for PCM telephone systems in Europe
Concepts
Pulse Code Modulation (PCM)
PCM Description
Sampling results in PAM (pulse amplitude modulation)
PCM uniformly quantizes the PAM samples
The results of PCM are PCM words
Each PCM word is l = log2(L) bits long, where L is the number of quantization levels
[3]
Concepts
Differential Pulse Code Modulation (DPCM)
DPCM Description
In signals that are sampled at a high rate, neighbouring samples are highly correlated.
The difference between successive samples will not be large.
Instead of quantizing each sample, why not quantize the difference?
This results in a quantizer that needs far fewer bits.
[7]
This is the simple first-order form, where the prediction is the previous sample.
More than one previous sample can be used in the prediction (N-th order).
Problems with this approach?
[7]
Concepts
Differential Pulse Code Modulation (DPCM)
DPCM Example
[7]
It is clear here from the table that the error adds up to produce an output signal which is
completely different from the original one
Concepts
Differential Pulse Code Modulation (DPCM)
DPCM Prediction
Previously, the input to the predictor in the encoder was different from the one in the decoder.
This difference between the predictors led to a reconstruction error e[n] = x[n] – x’[n].
To solve this problem, the same predictor that is used in the decoder is also used in the
encoder, so that both predict from the reconstructed samples.
[Figure: encoder and decoder connected through the channel]
Therefore the reconstruction error at the decoder output will be the same as the
quantization error at the encoder.
There will be no accumulation of quantization error.
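The effect can be illustrated with a minimal first-order DPCM sketch. It assumes a uniform quantizer with a fixed step and a predictor that simply returns the previous sample; the function names are hypothetical. The open-loop encoder predicts from the original samples, while the closed-loop encoder predicts from the reconstructed samples, exactly as the decoder does.

```python
import math


def quantize(d, step):
    """Uniform quantizer applied to the prediction difference."""
    return step * math.floor(d / step + 0.5)


def dpcm_encode_open_loop(x, step):
    """Predict from the ORIGINAL previous sample (decoder cannot do this)."""
    diffs, prev = [], 0.0
    for s in x:
        diffs.append(quantize(s - prev, step))
        prev = s
    return diffs


def dpcm_encode_closed_loop(x, step):
    """Predict from the RECONSTRUCTED previous sample, like the decoder."""
    diffs, prev_rec = [], 0.0
    for s in x:
        dq = quantize(s - prev_rec, step)
        diffs.append(dq)
        prev_rec += dq
    return diffs


def dpcm_decode(diffs):
    """Accumulate the quantized differences."""
    out, prev = [], 0.0
    for dq in diffs:
        prev += dq
        out.append(prev)
    return out


x = [0.4 * k for k in range(1, 21)]  # slow ramp, step of 0.4 per sample
open_rec = dpcm_decode(dpcm_encode_open_loop(x, 1.0))
closed_rec = dpcm_decode(dpcm_encode_closed_loop(x, 1.0))
print(max(abs(a - b) for a, b in zip(x, open_rec)))
print(max(abs(a - b) for a, b in zip(x, closed_rec)))
```

With this ramp, every open-loop difference rounds to zero, so the decoded signal never moves and the error keeps growing, while the closed-loop error stays within half a quantizer step.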
Concepts
Adaptive Differential Pulse Code Modulation (ADPCM)
ADPCM Description
As can be inferred from the name, ADPCM builds on PCM and DPCM and adds adaptation.
The “A” in ADPCM stands for “Adaptive”.
In DPCM, the difference between x[k] and x[k-1] is transmitted instead of x[k].
To further reduce the number of bits per sample, ADPCM adapts the quantization levels to
the characteristics of the analog signal. The original 32 kbps ADPCM used 4 bits per sample.
[9]
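The idea of adapting the quantizer can be sketched with a toy example. This is not the G.726 algorithm: the adaptation rule (grow the step by 1.5 when the code falls in the outer half of its range, shrink it by 0.9 otherwise), the step limits and all names are assumptions chosen for illustration. The key point is that the decoder repeats the same adaptation from the received codes alone, so no step size needs to be transmitted.

```python
import math


def _clamp(value, low, high):
    return max(low, min(high, value))


def _next_step(step, code, bits):
    """Toy adaptation rule (assumption): react to how large the code was."""
    outer_half = abs(code) >= (1 << (bits - 2))
    factor = 1.5 if outer_half else 0.9
    return _clamp(step * factor, 1e-3, 1e3)


def adpcm_encode(samples, bits=4, step=1.0):
    """Return one signed integer code of `bits` bits per sample."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    codes, pred = [], 0.0
    for s in samples:
        code = _clamp(math.floor((s - pred) / step + 0.5), low, high)
        codes.append(code)
        pred += code * step          # reconstructed sample, as in the decoder
        step = _next_step(step, code, bits)
    return codes


def adpcm_decode(codes, bits=4, step=1.0):
    """Rebuild the signal, adapting the step exactly as the encoder did."""
    out, pred = [], 0.0
    for code in codes:
        pred += code * step
        out.append(pred)
        step = _next_step(step, code, bits)
    return out


codes = adpcm_encode([10.0] * 5)
print(codes, adpcm_decode(codes))
```

With 4-bit codes at 8 kHz this corresponds to the 32 kbps rate mentioned above.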
Standards, Examples & Applications
G711
G711 Description
A waveform codec that was released in 1972
Its formal name is Pulse Code Modulation (PCM) of voice frequencies, since it uses PCM in its encoding
G711 achieves a 64 kbps bit rate (8 kHz sampling frequency × 8 bits per sample)
G711 defines two main companding algorithms
µ-Law (used in North America & Japan)
A-Law (used in Europe and the rest of the world)
The µ-law and A-law encoders take 14-bit and 13-bit signed linear PCM samples, respectively,
and compress them to 8-bit samples
Applications
Public Switched Telephone Network (PSTN)
Wi-Fi phones (VoWLAN)
Wideband IP Telephony
Audio & Video Conferencing
H.320 & H.323 specifications
Standards, Examples & Applications
G726
G726 Description
G726 converts a 64 kbps A-law or µ-law PCM channel to and from a 40, 32, 24
or 16 kbps channel.
The conversion is applied to the raw PCM stream using the ADPCM encoding technique
Different rates are achieved by adapting the number of quantization levels
4 levels (2 bits and 16 kbps)
7 levels (3 bits and 24 kbps)
15 levels (4 bits and 32 kbps)
31 levels (5 bits and 40 kbps)
Includes G721 and G723
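The four channel rates follow directly from the number of bits per sample at the 8 kHz sampling frequency. A small sketch, with illustrative names, that also shows the compression relative to the 64 kbps PCM channel:

```python
SAMPLING_RATE_HZ = 8000
PCM_CHANNEL_RATE_BPS = 64000  # A-law or µ-law PCM input channel


def adpcm_rate_bps(bits_per_sample):
    """Channel rate for a given ADPCM code width at 8 kHz sampling."""
    return SAMPLING_RATE_HZ * bits_per_sample


for bits in (2, 3, 4, 5):
    rate = adpcm_rate_bps(bits)
    # Bits per sample, rate in bits per second, compression versus PCM.
    print(bits, rate, PCM_CHANNEL_RATE_BPS / rate)
```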
[12]
Performance Comparison
[1]
Summary & Conclusion
Summary
We talked about quantization concepts in all their flavors
We discussed the waveform coding category (PCM, DPCM and ADPCM)
We presented the ITU standards (G711 and G726) and mentioned some examples and
applications
Finally, we compared the most prominent speech codecs out there.
Conclusion
Speech coding is an important concept that is required to use the existing
bandwidth efficiently
There are many important metrics to keep in mind when doing speech coding. It is
important for a good speech coder to balance those metrics. The most important ones are
Data Rate
Speech Quality
Delay
Waveform codecs achieve the best speech quality as well as low delays.
Vocoders achieve a low data rate, but at the cost of delay and speech quality.
Hybrid coders achieve acceptable speech quality with acceptable delay and data rate.
References
1. Wai C. Chu: Speech Coding Algorithms: Foundation & Evolution of Standardized Coders
2. Speech Coding: http://www-mobile.ecs.soton.ac.uk/speech_codecs/
3. Sklar: Digital Communication Fundamentals And Applications
4. A-Law and mu-Law Companding Implementations Using the TMS320C54x
5. Michael Langer: Data Compression – Introduction to lossy compression
6. Signal Quantization and Compression Overview: http://www.ee.ucla.edu/~dsplab/sqc/over.html
7. Wajih Abu-Al-Saud: Ch. VI Sampling & Pulse Code Mod. Lecture 25
8. Yuli You: Audio Coding: Theory And Applications
9. Tarmo Anttalainen: Introduction to telecommunication Networks Engineering
10. Wikipedia G711: http://en.wikipedia.org/wiki/G.711
11. David Salomon: Data Communication the Complete Reference
12. ITU CCIT Recommendation G.726 ADPCM
Questions & Discussion
Thank you!!