A Tutorial on MPEG/Audio Compression


Davis Pan, IEEE Multimedia Journal, Summer 1995
Presented by: Randeep Singh Gakhal, CMPT 820, Spring 2004

Outline  Introduction  Technical Overview  Polyphase Filter Bank  Psychoacoustic Model  Coding and Bit Allocation  Conclusions and Future Work

Introduction  What does MPEG-1 Audio provide?

A transparently lossy audio compression system based on the weaknesses of the human ear.

 Can provide compression by a factor of 6 and retain sound quality.

 One part of a three-part standard that includes audio, video, and audio/video synchronization.

Technical Overview

MPEG-I Audio Features
 PCM sampling rate of 32, 44.1, or 48 kHz
 Four channel modes:
  Monophonic and Dual-monophonic
  Stereo and Joint-stereo
 Three modes (layers in MPEG-I speak):
  Layer I: Computationally cheapest, bit rates > 128 kbps
  Layer II: Bit rate ~128 kbps, used in VCD
  Layer III: Most complicated encoding/decoding, bit rates ~64 kbps, originally intended for streaming audio

Human Audio System (ear + brain)
 Human sensitivity to sound is non-linear across the audible range (20 Hz – 20 kHz)
 Audible range is broken into regions where humans cannot perceive a difference  called the critical bands

MPEG-I Encoder Architecture [1]
 Polyphase Filter Bank: Transforms PCM samples to frequency-domain signals in 32 subbands
 Psychoacoustic Model: Calculates acoustically irrelevant parts of the signal
 Bit Allocator: Allots bits to subbands according to input from the psychoacoustic calculation
 Frame Creation: Generates an MPEG-I compliant bit stream

The Polyphase Filter Bank

Polyphase Filter Bank  Divides audio signal into 32 equal width subband streams in the frequency domain.

 Inverse filter at decoder cannot recover signal without some, albeit inaudible, loss.

 Based on work by Rothweiler [

2

].

 Standard specifies 512 coefficient analysis window, C[n]

Polyphase Filter Bank   Buffer of 512 PCM samples with 32 new samples, X[n], shifted in every computation cycle Calculate window samples for i=0…511:

Z

[

i

] 

C

[

i

] 

X

[

i

]  Partial calculation for i=0…63:  Calculate 32 subsamples:

Y

[

i

] 

j

7   0

Z

[

i

 64

j

]

S

[

i

] 

k

63   0

Y

[

i

] 

M

[

i

][

k

]
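The three-step computation above can be sketched in NumPy. The window below is a placeholder for the standard's tabulated C[n] coefficients, so the numeric output is illustrative only; the structure of the calculation is the point:

```python
import numpy as np

# Sketch of the three-step subband computation (the Hann window is a
# stand-in for the standard's tabulated 512-tap window C[n]).
C = np.hanning(512)

ii = np.arange(32).reshape(-1, 1)        # subband index i = 0..31
kk = np.arange(64).reshape(1, -1)        # partial index k = 0..63
M = np.cos((2 * ii + 1) * (kk - 16) * np.pi / 64)

def analyze(X):
    """One cycle: X is the 512-sample buffer (32 new PCM samples shifted
    in each cycle); returns the 32 subband samples S[0..31]."""
    Z = C * X                            # Z[i] = C[i] * X[i],      i = 0..511
    Y = Z.reshape(8, 64).sum(axis=0)     # Y[i] = sum_j Z[i + 64j], i = 0..63
    return M @ Y                         # S[i] = sum_k M[i][k] * Y[k]

S = analyze(np.random.default_rng(0).standard_normal(512))
print(S.shape)                           # (32,)
```

The `reshape(8, 64).sum(axis=0)` step is exactly the partial sum over j, since element (j, i) of the reshaped buffer is Z[i + 64j].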

Polyphase Filter Bank  Visualization of the filter [1] :

Polyphase Filter Bank  The net effect:  Analysis matrix:

S

[

i

] 

k

63   0

M

[

i

][

k

]

j

7   0

C

[

i

 64

j

]

X

[

i

 64

j

]

M

[

i

][

k

]  cos ( 2

i

 1 )(

k

 16 )  64   Requires 512 + 32x64 = 2560 multiplies.

Each subband has bandwidth π/32T centered at odd multiples of π/64T

Polyphase Filter Bank  Shortcomings:  Equal width filters do not correspond with critical band model of auditory system.

 Filter bank and its inverse are NOT lossless.

 Frequency overlap between subbands.

Polyphase Filter Bank  Comparison of filter banks and critical bands [1]:

Polyphase Filter Bank  Frequency response of one subband [1] :

Psychoacoustic Model

The Weakness of the Human Ear
 Frequency-dependent resolution:
 We do not have the ability to discern minute differences in frequency within the critical bands.

 Auditory masking:  When two signals of very close frequency are both present, the louder will mask the softer.

 A masked signal must be louder than some threshold for it to be heard  gives us room to introduce inaudible quantization noise.

MPEG-I Psychoacoustic Models
 MPEG-I standard defines two models:
 Psychoacoustic Model 1:
  Less computationally expensive
  Makes some serious compromises in what it assumes a listener cannot hear
 Psychoacoustic Model 2:
  Provides more features suited for Layer III coding, assuming, of course, increased processor bandwidth.

Psychoacoustic Model  Convert samples to frequency domain  Use a Hann weighting and then a DFT  Simply gives an edge artifact (from finite window size) free frequency domain representation.

 Model 1 uses 512 (Layer I) or 1024 (Layers II and III) sample window.

 Model 2 uses a 1024 sample window and two calculations per frame.
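A minimal sketch of this spectral estimate, assuming a 512-sample block (the Model 1 / Layer I window size) and NumPy's real-input FFT:

```python
import numpy as np

# Hann-weight one 512-sample PCM block, then DFT it to get the power
# spectrum the psychoacoustic model works on. The random block is a
# stand-in for real PCM data.
pcm = np.random.default_rng(0).standard_normal(512)
windowed = np.hanning(512) * pcm              # Hann weighting
spectrum = np.fft.rfft(windowed)              # DFT of the real-valued block
power_db = 10 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
print(power_db.shape)                         # (257,) - DC up to Nyquist
```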

Psychoacoustic Model  Need to separate sound into “tones” and “noise” components  Model 1:  Local peaks are tones, lump remaining spectrum per critical band into noise at a representative frequency.

 Model 2:  Calculate “tonality” index to determine likelihood of each spectral point being a tone  based on previous two analysis windows

Psychoacoustic Model  “Smear” each signal within its critical band  Use either a masking (Model 1) or a spreading function (Model 2).

 Adjust calculated threshold by incorporating a “quiet” mask – masking threshold for each frequency when no other frequencies are present.
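The “quiet” mask is the absolute threshold of hearing. The standard tabulates its own values; Terhardt's closed-form approximation, used here as a stand-in (function name assumed), captures its shape:

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing,
    in dB SPL, for a frequency given in Hz."""
    f = np.asarray(f_hz, dtype=float) / 1000.0   # convert to kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The ear is most sensitive near 3-4 kHz, where the curve dips lowest;
# very low and very high frequencies need far more energy to be heard.
print(threshold_in_quiet_db(3300) < threshold_in_quiet_db(100))   # True
```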

Psychoacoustic Model    Calculate a masking threshold for each subband in the polyphase filter bank Model 1:   Selects minima of masking threshold values in range of each subband Inaccurate at higher frequencies – recall how subbands are linearly distributed, critical bands are NOT!

Model 2:   If subband wider than critical band:  Use minimal masking threshold in subband If critical band wider than subband:  Use average masking threshold in subband

Psychoacoustic Model  The hard work is done – now, we just calculate the signal-to-mask ratio (SMR) per subband  SMR = signal energy / masking threshold  We pass our result on to the coding unit which can now produce a compressed bitstream

Psychoacoustic Model (example)  Input [1] :

Psychoacoustic Model (example)  Transformation to perceptual domain [1] :

Psychoacoustic Model (example)  Calculation of masking thresholds [1] :

Psychoacoustic Model (example)  Signal-to-mask ratios [1] :

Psychoacoustic Model (example)  What we actually send [1] :

Coding and Bit Allocation

Layer Specific Coding  Layer specific frame formats [1] :

Layer Specific Coding  Stream of samples is processed in groups [1] :

Layer I Coding  Group 12 samples from each subband and encode them in each frame (=384 samples)  Each group encoded with 0-15 bits/sample  Each group has 6-bit scale factor

Layer II Coding
 Similar to Layer I, except:
 Groups are now 3 × 12 samples per subband = 1152 samples per frame
 Can have up to 3 scale factors per subband to avoid audible distortion in special cases
 Called scale factor selection information (SCFSI)

Layer III Coding
 Further subdivides subbands using the Modified Discrete Cosine Transform (MDCT) – a lossless (invertible) transform
 Larger frequency resolution => smaller time resolution  possibility of pre-echo
 Layer III encoder can detect and reduce pre-echo by “borrowing bits” from future encodings

Bit Allocation  Determine number of bits to allot for each subband given SMR from psychoacoustic model.

 Layers I and II:   Calculate mask-to-noise ratio:  MNR = SNR – SMR (in dB)  SNR given by MPEG-I standard (as function of quantization levels) Now iterate until no bits to allocate left:  Allocate bits to subband with lowest MNR.

 Re-calculate MNR for subband allocated more bits.
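The iteration above can be sketched as a greedy loop. The standard's SNR table is replaced here by a generic "~6 dB per bit" approximation, and the 15-bit cap on quantizer resolution is omitted, so this is a sketch of the control flow, not the normative algorithm:

```python
# Greedy Layer I/II bit allocation: repeatedly give one more bit per
# sample to the subband with the lowest mask-to-noise ratio.
def snr_db(bits):
    return 6.02 * bits                    # stand-in for the standard's SNR table

def allocate(smr, total_bits, samples_per_band=12):
    """smr: per-subband signal-to-mask ratios in dB;
    total_bits: bit budget for one frame's sample data."""
    alloc = [0] * len(smr)                # bits per sample, per subband
    budget = total_bits
    while budget >= samples_per_band:
        # MNR = SNR - SMR; the lowest-MNR subband is the most audibly
        # distorted, so it receives the next bit.
        mnr = [snr_db(a) - s for a, s in zip(alloc, smr)]
        worst = mnr.index(min(mnr))
        alloc[worst] += 1
        budget -= samples_per_band        # one more bit on each of 12 samples
    return alloc

print(allocate(smr=[30.0, 12.0, 3.0, -5.0], total_bits=300))   # [9, 7, 5, 4]
```

Note how the subband with the highest SMR ends up with the most bits: the loop keeps feeding it until its MNR catches up with the others.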

Bit Allocation  Layer III:  Employs “noise allocation”  Quantizes each spectral value and employs Huffman coding  If Huffman encoding results in noise in excess of allowed distortion for a subband, encoder increases resolution on that subband  Whole process repeats until one of three specified stop conditions is met.

Conclusions and Future Work

Conclusions  MPEG-I provides tremendous compression for relatively cheap computation.

 Not suitable for archival or audiophile grade music as very seasoned listeners can discern distortion.

 Modifying or searching MPEG-I content requires decompression and is not cheap!

Future Work      MPEG-1 audio lays the foundation for all modern audio compression techniques Lots of progress since then (1994!) MPEG-2 (1996) extends MPEG audio compression to support 5.1 channel audio MPEG-4 (1998) attempts to code based on perceived audio objects in the stream Finally, MPEG-7 (2001) operates at an even higher level of abstraction, focusing on meta-data coding to make content searchable and retrievable

References
[1] D. Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia Journal, 1995.
[2] J. H. Rothweiler, “Polyphase Quadrature Filters – A New Subband Coding Technique”, Proc. of the Int. Conf. IEEE ASSP, 27.2, pp. 1280–1283, Boston, 1983.