MPEG-1 - Demokritos

MPEG-1
MUMT-614
Jan.23, 2002
Wes Hatch
Purpose of MPEG encoding
• To decrease data rate
• How?
– two choices:
• could decrease sample rate, but this would cause a
decrease in available BW
• or, we could decrease word size. This will introduce
noise into the signal (lower S:N ratio).
• Solution:
– perceptual coding: reduce word size based on
signal conditions
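As a rough illustration of why a smaller word size lowers the S:N ratio, the sketch below uses the textbook rule of thumb for an ideal uniform quantizer driven by a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB). The function name is made up for this example, and real converters and codecs will not hit these figures exactly.

```python
def quantization_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal uniform quantizer with a full-scale sine input."""
    return 6.02 * bits + 1.76


# Each bit removed from the word costs roughly 6 dB of signal-to-noise ratio.
for bits in (16, 12, 8, 4):
    print(f"{bits:2d} bits -> {quantization_snr_db(bits):5.1f} dB")
```

Perceptual coding accepts this extra noise only where the signal itself is expected to mask it.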
Quick Overview
• MPEG removes “irrelevancy” & statistical
redundancy
• lossy (but not perceptibly so)
• 1.41 Mbps (CD audio) → between 64 and
448 kbps (95% to 68% reduction)
• ratios of 4:1, 6:1 can be transparent in
advanced listening tests
• supports 32, 44.1, 48 kHz sampling rates
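The data-rate figures above can be checked with a few lines of arithmetic. This is only a sketch: it assumes CD audio means 44.1 kHz, 16 bits, 2 channels, and the helper names are invented for the example.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Assumption: CD audio is 44.1 kHz, 16-bit, stereo.
CD_RATE = pcm_bit_rate(44_100, 16, 2)


def reduction_percent(coded_bps: float, source_bps: float = CD_RATE) -> float:
    """Percentage of the source bit rate that coding removes."""
    return 100.0 * (1.0 - coded_bps / source_bps)


def compression_ratio(coded_bps: float, source_bps: float = CD_RATE) -> float:
    """Source bit rate divided by coded bit rate."""
    return source_bps / coded_bps


print(CD_RATE)                              # bits per second for CD audio
print(round(reduction_percent(64_000)))     # reduction at the lowest rate
print(round(reduction_percent(448_000)))    # reduction at the highest rate
```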
MPEG-1 Types
• 3 layers: I, II, and III
– I is simplest → III is most complex
– a decoder for one layer can also play encodings of the layers beneath it
• e.g. a Layer III decoder can play I, II, and III; a Layer II decoder can only
play I and II
Components
• There are two “components” to MPEG-1:
the encoder and the decoder.
– the decoder is what is actually described under
the specification; the encoder is not.
– improvements to the encoder will have
immediate effects in quality without
necessitating corresponding changes to the
decoder
Encoder vs. Decoder
• Encoder
– does all the work
– forward adaptive encoding
• all allocation of bits is performed by the encoder
• the psychoacoustic model used to determine
“irrelevant” data is contained here
• improving psychoacoustic models/changes to
encoder doesn’t require changing the decoder
• Decoder does less work
Encoder Details
• audio (PCM) passes through a polyphase
filter bank, splitting the signal into 32 bands
– filter outputs one sample per band for every 32
samples in
• layer I: after each band gets 12 samples the encoder
determines the bit allocation for that band
• layer II: operates on 12 × 3 = 36 samples per band
(larger frame). Lower bands may receive up to 15 bits,
middle bands 7 bits, and high bands 3 bits
• layer III is different…we’ll come back to it.
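The framing numbers follow directly from the 32-band split. The sketch below works them out for layers I and II; the function names are invented for the example, and the 48 kHz sample rate is just one of the supported rates.

```python
SUBBANDS = 32  # the polyphase filter bank splits the signal into 32 bands


def frame_samples(samples_per_band: int) -> int:
    """PCM samples consumed to give every band `samples_per_band` samples."""
    return SUBBANDS * samples_per_band


def frame_duration_ms(samples_per_band: int, sample_rate_hz: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_samples(samples_per_band) / sample_rate_hz


def subband_width_hz(sample_rate_hz: int) -> float:
    """Width of each of the 32 equal bands, up to the Nyquist frequency."""
    return (sample_rate_hz / 2) / SUBBANDS


for layer, per_band in (("I", 12), ("II", 36)):
    print(layer, frame_samples(per_band), frame_duration_ms(per_band, 48_000))
print(subband_width_hz(48_000))
```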
Encoder Details
• FFT is performed (w/Hann window)
– 512 point for layer I
– 1024 point for layer II
• a psychoacoustic model analyses the FFT
output and calculates the masking
thresholds
• these determine which components are
audible, i.e. the signal-to-mask ratio (SMR)
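A minimal sketch of the first analysis step: a Hann-windowed spectrum of one 512-sample block, plus the SMR as a simple difference in dB. It uses a slow direct DFT from the standard library instead of a real FFT, and it does not model masking at all; the masking threshold passed to `smr_db` is a made-up input. The function names are hypothetical.

```python
import cmath
import math


def hann(n_points: int) -> list:
    """Periodic Hann window of length n_points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]


def power_spectrum_db(samples: list) -> list:
    """Hann-windowed power spectrum in dB for bins 0 .. N/2 - 1 (direct DFT, O(N^2))."""
    n = len(samples)
    windowed = [s * w for s, w in zip(samples, hann(n))]
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))
        power = abs(acc) ** 2 / n ** 2
        spectrum.append(10 * math.log10(power + 1e-12))  # small floor avoids log(0)
    return spectrum


def smr_db(signal_level_db: float, masking_threshold_db: float) -> float:
    """Signal-to-mask ratio: how far the signal sits above the masking threshold."""
    return signal_level_db - masking_threshold_db


# A pure tone centred on bin 32 of a 512-point block (the layer I size).
tone = [math.sin(2 * math.pi * 32 * i / 512) for i in range(512)]
spectrum = power_spectrum_db(tone)
print(spectrum.index(max(spectrum)))
```

A positive SMR means the component is above the mask and needs bits; a negative one means it can be quantized coarsely or dropped.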
More details...
How bits are allocated
• data in the band is coded, NOT the FFT data.
• more “audible” components (i.e. those highest above the
masking threshold) are assigned the most bits
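One way to picture this allocation is a greedy loop: keep giving one more bit to whichever band currently has its quantization noise furthest above the mask. The sketch below is an illustration under simplifying assumptions, not the procedure from the standard: it counts the bit pool in "bits per sample" steps, assumes each bit buys about 6.02 dB, and uses a hypothetical `allocate_bits` function.

```python
def allocate_bits(smr_db, bit_pool, max_bits=15, db_per_bit=6.02):
    """Greedy allocation: returns bits per band for the given SMR values (dB)."""
    bits = [0] * len(smr_db)
    while bit_pool > 0:
        # Noise-to-mask ratio per band: SMR minus the SNR bought so far.
        nmr = [smr - db_per_bit * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # noise is already below the mask in every band
        bits[worst] += 1
        bit_pool -= 1
    return bits


# Band 0 is well above the mask, band 1 slightly above, band 2 already masked.
print(allocate_bits([20.0, 5.0, -3.0], bit_pool=10))
```

Bands whose SMR is negative never receive bits in this sketch, which mirrors the idea that masked components are "irrelevant".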
Encoder Details
• Scale factor is calculated
– largest sample value in the band for each frame
is found. Each of the 12 samples in the band is
divided by this factor
– layer II has 3 scale factors (for 3 groups of 12
samples), but one may suffice if the differences
are small
• Corresponds to max. SPL in each band
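The scale factor step can be sketched as a simple peak normalization of one 12-sample block. This is a simplification: it divides by the exact peak value, whereas a real encoder is expected to pick from a fixed set of allowed scale factors. The function name and sample values are invented for the example.

```python
def normalize_block(samples):
    """Return (scale_factor, normalized_samples) for one block of subband samples."""
    scale = max(abs(s) for s in samples)
    if scale == 0.0:
        return 0.0, [0.0] * len(samples)  # silent block: nothing to scale
    return scale, [s / scale for s in samples]


# Hypothetical 12 samples from one subband.
block = [0.1, -0.4, 0.2, 0.0, 0.3, -0.1, 0.05, 0.25, -0.35, 0.15, -0.2, 0.4]
scale, normalized = normalize_block(block)
print(scale, max(abs(s) for s in normalized))
```

After normalization every sample lies in the range -1 to +1, so the quantizer can use its full range regardless of how loud the band is.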
Encoder Schematic
Encoder Details (layer III)
• layer III:
– each band is transformed into 18 spectral
coefficients with a MDCT (50% overlap)
• gives 32 × 18 = 576 coefficients, each representing a BW of
41.67 Hz at 48 kHz → 24 ms
– window size of the MDCT is variable
• long window (36 samples) for steady-state signals,
short windows (12 samples) for transients
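To make the numbers concrete, here is a direct-form MDCT that turns 2N input samples into N coefficients (36 → 18 for the long window, 12 → 6 for the short one), together with the bandwidth arithmetic. It is a sketch only: it applies no window function and none of the other processing a real Layer III encoder performs, and the names are hypothetical.

```python
import math


def mdct(block):
    """Direct-form MDCT: 2N input samples -> N coefficients (no windowing)."""
    two_n = len(block)
    n = two_n // 2
    return [
        sum(
            block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(two_n)
        )
        for k in range(n)
    ]


SUBBANDS = 32
COEFFS_PER_BAND = 18
TOTAL_COEFFS = SUBBANDS * COEFFS_PER_BAND


def coefficient_bandwidth_hz(sample_rate_hz):
    """Bandwidth represented by each coefficient, up to the Nyquist frequency."""
    return (sample_rate_hz / 2) / TOTAL_COEFFS


print(len(mdct([0.0] * 36)), len(mdct([0.0] * 12)))
print(TOTAL_COEFFS, round(coefficient_bandwidth_hz(48_000), 2))
```

Because consecutive blocks overlap by 50%, each new block of 18 subband samples still yields 18 new coefficients, so the transform does not increase the data rate.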
Encoder Details (layer III)
• framerate varies in layer III
• can also use a bit reservoir when more
accuracy is needed
• Huffman encoding employed
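The idea behind Huffman coding is that frequent values get short codes and rare values get long ones. The sketch below builds a code from the symbol counts of a made-up list of quantized values. Note the assumption: Layer III selects from predefined tables rather than building a tree per signal, so this shows the principle only.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical quantized coefficients: small values dominate.
quantized = [0] * 8 + [1] * 4 + [-1] * 3 + [2]
codes = huffman_code(quantized)
total_bits = sum(len(codes[q]) for q in quantized)
print(codes, total_bits)
```

With four distinct values a fixed-length code would need 2 bits per value; the variable-length code needs fewer bits in total because zero is so common.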
Encoder
• a portion of the data stream is consumed by
coding info:
– headers
– bit allocation info
– scale factors
– samples from each band
Other Features
• stereo joint coding
– stereophonic irrelevance/redundancy eliminated
– sum and difference signals (layer III)
– L/R high frequency band samples summed into
one channel, but scale factors remain
independent
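The sum and difference idea can be sketched as a mid/side transform. When the two channels are similar, the difference signal is close to zero and cheap to code. The 1/√2 scaling used here is a common convention and is an assumption of this example, as are the function names; the high-frequency summing described above is not shown.

```python
import math

SQRT2 = math.sqrt(2)


def ms_encode(left, right):
    """Convert left/right samples to sum (mid) and difference (side) signals."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid/side signals."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


# Hypothetical, nearly identical channels: the side signal is tiny.
left = [0.50, 0.25, -0.10, 0.30]
right = [0.48, 0.26, -0.10, 0.31]
mid, side = ms_encode(left, right)
print(max(abs(s) for s in side))
```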
Decoder Details
• Put signal back together:
– decode bit allocation info
– samples multiplied by scale factors and run
through an inverse filterbank
– delays typically range from 10 to 30 ms
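The first two decoder steps can be sketched for a single band as below. This assumes a plain uniform quantizer that maps integer codes back to the range -1 to +1 before applying the scale factor; the actual requantization formula in the specification differs, the inverse filterbank is left out, and `dequantize_band` is a hypothetical name.

```python
def dequantize_band(codes, bits, scale_factor):
    """Map integer codes in [0, 2**bits - 1] back to samples in [-scale, +scale]."""
    if bits == 0:
        return [0.0] * len(codes)  # band received no bits: decode as silence
    levels = 2 ** bits - 1
    return [scale_factor * (2.0 * c / levels - 1.0) for c in codes]


# Decoded bit allocation for this band: 4 bits per sample; scale factor 0.5.
print(dequantize_band([0, 15], bits=4, scale_factor=0.5))
```

All of the decisions were made by the encoder; the decoder only reads the allocation and scale factors from the stream and applies them.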
Decoder schematic
Summary
• Split signal into 32 bands
• determine max. SPL levels for each band
• FFT to calculate masking thresholds
– determine global masking curve
• calculate SMR for each band and assign bits
accordingly