Image and Video Compression
• Two dimensional array of pixel values
• Spatial redundancy and temporal redundancy
• Human eye is less sensitive to chrominance signal than to
luminance signal (U and V can be coarsely coded)
• Human eye is less sensitive to the higher spatial frequency
components
• Human eye is less sensitive to quantizing distortion at high
luminance levels
JPEG Encoder
• International standards body -- Joint Photographic Experts
Group
• JPEG encoder schematic
• Image/block preparation
• DCT computation
• Quantization
• Entropy coding -- vectoring, differential encoding, run-length encoding, Huffman encoding
• Frame building
Image/block Preparation
• Source image as 2-D matrix of pixel values
• R, G, B format requires three matrices, one each for R, G,
B quantized values
• In Y, U, V representation, the U and V matrices can be smaller
than the Y matrix (for example, half its width and half its height)
• Source image matrix is divided into 8 X 8 submatrices
(blocks)
• A small block size keeps the DCT computation manageable.
Individual blocks are fed sequentially to the DCT, which
transforms each block separately
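The preparation step above can be sketched in a few lines. This is a minimal illustration, not the standard's procedure: the helper names `subsample_2x2` and `split_into_blocks` are hypothetical, the chrominance plane is reduced by simple 2 x 2 averaging (one of several possible choices), and both image dimensions are assumed to be multiples of the block size.

```python
def subsample_2x2(plane):
    """Halve a plane in both directions by averaging each 2 x 2 neighbourhood.

    Assumes the plane has an even number of rows and columns.
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def split_into_blocks(plane, size=8):
    """Return the size x size blocks of a plane in raster order.

    Assumes both dimensions are multiples of `size`.
    """
    rows, cols = len(plane), len(plane[0])
    blocks = []
    for top in range(0, rows, size):
        for left in range(0, cols, size):
            blocks.append([row[left:left + size] for row in plane[top:top + size]])
    return blocks


# A 16 x 16 luminance plane yields four 8 x 8 blocks.
y_plane = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
y_blocks = split_into_blocks(y_plane)

# A 16 x 16 chrominance plane, halved in each direction, yields a single block.
u_plane = [[100] * 16 for _ in range(16)]
u_blocks = split_into_blocks(subsample_2x2(u_plane))

print(len(y_blocks), len(u_blocks))
```

Each block in `y_blocks` and `u_blocks` would then be handed to the DCT stage on its own.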
DCT Computation
• Each pixel value in the 2-D matrix is quantized using 8 bits which
produces a value in the range of 0 to 255 for the intensity/luminance
values and the range of -128 to +127 for the chrominance values. All
values are shifted to the range of -128 to +127 before computing DCT
• All 64 values in the input matrix contribute to each entry in the
transformed matrix
• The value in the location F[0,0] of the transformed matrix is called the
DC coefficient and is proportional to the average of all 64 values in the matrix
• The other 63 values are called the AC coefficients and each has a
horizontal and a vertical spatial frequency associated with it
• Spatial frequencies increase as we move from left to right
(horizontally) or from top to bottom (vertically). Low spatial
frequencies are clustered in the top left corner.
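A direct, unoptimized sketch of the forward 8 x 8 DCT may help make these points concrete. It assumes the usual JPEG normalization (a factor of 1/4 and a weight of 1/sqrt(2) for index 0); the function names are illustrative, and real encoders use fast factorizations rather than this four-level loop.

```python
import math

N = 8


def level_shift(block):
    """Shift 8-bit samples from 0..255 to -128..127."""
    return [[p - 128 for p in row] for row in block]


def dct_2d(block):
    """Forward 8 x 8 DCT; every input sample contributes to every output entry."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for i in range(N):          # vertical frequency index
        for j in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * i * math.pi / 16)
                        * math.cos((2 * y + 1) * j * math.pi / 16)
                    )
            out[i][j] = 0.25 * c(i) * c(j) * total
    return out


# A flat block has only a DC term: 8 times the mean of the shifted samples.
flat_coeffs = dct_2d(level_shift([[140] * N for _ in range(N)]))

# A block that varies only from left to right has energy only in row 0
# (horizontal frequencies); all vertical-frequency rows are zero.
ramp_coeffs = dct_2d(level_shift([[16 * y for y in range(N)] for _ in range(N)]))

print(round(flat_coeffs[0][0], 6))
```

With this normalization F[0,0] equals eight times the block mean, which is why the DC coefficient is described as tracking the average of the block.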
Quantization
• The human eye responds mainly to the DC coefficient and the
lower spatial frequency coefficients
• If the magnitude of a higher frequency coefficient is below
a certain threshold, the eye will not detect it
• Set the frequency coefficients in the transformed matrix
whose amplitudes are less than a defined threshold to zero
(these coefficients cannot be recovered during decoding)
• During quantization, the magnitudes of the DC and AC
coefficients are reduced
• A division operation is performed using the predefined
threshold value as the divisor, and the quotient is rounded
to the nearest integer
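The divide-and-round step, and the reason the loss cannot be undone, can be illustrated as follows. This is a sketch under stated assumptions: the coefficient and threshold values are made up for the example, and the helper names are hypothetical.

```python
def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)


def quantize(coeffs, table):
    """Divide each coefficient by its threshold and round the quotient."""
    return [
        [round_half_away(c / q) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coeffs, table)
    ]


def dequantize(levels, table):
    """Decoder side: multiply back by the same thresholds."""
    return [
        [level * q for level, q in zip(l_row, q_row)]
        for l_row, q_row in zip(levels, table)
    ]


# Illustrative values only (one row of a block, not real image data).
coeffs = [[96.0, -31.0, 7.0, 3.0]]
table = [[16, 11, 10, 16]]

levels = quantize(coeffs, table)
restored = dequantize(levels, table)
print(levels)    # [[6, -3, 1, 0]]
print(restored)  # [[96, -33, 10, 0]]
```

The last coefficient (3.0) is below half its threshold, so it becomes zero and stays zero after dequantization; the others come back only approximately.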
Quantization Table
• Threshold values vary for each of the 64 DCT coefficients
and are held in a 2-D matrix
• Trade off between the level of compression required and
the information loss that is acceptable
• JPEG standard includes two default quantization tables --
one for the luminance coefficients and the other for use
with the two sets of chrominance coefficients. Customized
tables may be used
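The trade-off between compression and information loss can be seen by counting how many coefficients quantize to zero as the thresholds grow. In the sketch below the table is the example luminance table from the JPEG standard as recalled here, so it should be checked against the specification before being relied on. The synthetic coefficients and the uniform `scale` factor are assumptions made purely for illustration.

```python
# Example luminance thresholds (recalled from Annex K of the JPEG standard;
# verify against the specification before relying on these values).
LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(x):
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)


def count_zeros(coeffs, table, scale=1):
    """Quantize with thresholds multiplied by `scale` and count zero results."""
    zeros = 0
    for c_row, q_row in zip(coeffs, table):
        for c, q in zip(c_row, q_row):
            if round_half_away(c / (q * scale)) == 0:
                zeros += 1
    return zeros


# Synthetic coefficients whose magnitude falls off with spatial frequency.
coeffs = [[200.0 / (1 + i + j) for j in range(8)] for i in range(8)]

zeros_fine = count_zeros(coeffs, LUMA_TABLE, scale=1)
zeros_coarse = count_zeros(coeffs, LUMA_TABLE, scale=4)
print(zeros_fine, zeros_coarse)
```

Larger thresholds produce more zeros, which compress better in the later entropy-coding stage, at the cost of discarding more detail.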
Entropy Coding
• Vectoring -- the 2-D matrix of quantized DCT coefficients is
represented in the form of a single-dimensional vector
• After quantization, most of the high frequency
coefficients (lower right corner) are zero.
• To exploit the number of zeros, a zig-zag scan of the
matrix is used
• Zig-zag scan allows the DC coefficient and the lower
frequency AC coefficients to be scanned first
• DC coefficients are encoded using differential encoding and AC
coefficients are encoded using run-length encoding.
Huffman coding is then applied to both.
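A short sketch of the zig-zag scan shows how the 2-D matrix becomes a vector with the zeros gathered at the end. The function names are illustrative, and the quantized block is a made-up example.

```python
def zigzag_order(n=8):
    """Return (row, col) positions in zig-zag order for an n x n matrix."""
    order = []
    for s in range(2 * n - 1):          # s = row + col indexes one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()          # even diagonals run from bottom-left to top-right
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block into a vector following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Made-up quantized block: only a few low-frequency entries are non-zero.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 6, -3, 2, 1

vector = zigzag_scan(block)
print(vector[:6])  # [6, -3, 2, 0, 1, 0]
```

The first entry of `vector` is the DC coefficient; the remaining 63 entries are the AC coefficients, ending in a long run of zeros.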
Differential Encoding
• DC coefficient is typically the largest in the transformed matrix.
• DC coefficient varies slowly from one block to the next.
• Only the difference in value of the DC coefficients is
encoded. Number of bits required to encode is reduced.
• The difference values are encoded in the form (SSS, value)
where SSS field indicates the number of bits needed to
encode the value and the value field indicates the binary
form.
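The (SSS, value) form can be sketched as below. One detail goes beyond the bullet text and is stated here as an assumption of the sketch: negative differences are written as the ones' complement of their magnitude, which is the convention JPEG uses. The first difference is taken against zero, and the function names are hypothetical.

```python
def sss_value(diff):
    """Return (SSS, bit string) for one difference value."""
    if diff == 0:
        return 0, ""
    sss = abs(diff).bit_length()        # number of bits needed for the magnitude
    if diff < 0:
        diff += (1 << sss) - 1          # ones' complement of the magnitude
    return sss, format(diff, "0{}b".format(sss))


def encode_dc(dc_values):
    """Differentially encode the DC coefficients of successive blocks."""
    previous = 0                        # assumed predictor for the first block
    encoded = []
    for dc in dc_values:
        encoded.append(sss_value(dc - previous))
        previous = dc
    return encoded


# DC coefficients of five consecutive blocks (illustrative values).
dc_encoded = encode_dc([12, 13, 11, 11, 10])
print(dc_encoded)
# [(4, '1100'), (1, '1'), (2, '01'), (0, ''), (1, '0')]
```

After the first block the differences are 1, -2, 0 and -1, so each needs at most two bits instead of the four needed for the raw values.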
Run-length Encoding
• 63 values of the AC coefficients
• Long strings of zeros because of the zig-zag scan
• Each AC coefficient encoded as a pair of values -- (skip,
value), skip indicates the number of zeros in the run and
value is the next non-zero coefficient
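A minimal sketch of the (skip, value) pairing follows. The closing `(0, 0)` pair, used here to mark that only zeros remain in the block, is an assumption of this sketch rather than something stated above, and the function name is illustrative.

```python
def run_length_encode(ac):
    """Encode the 63 AC coefficients as (skip, value) pairs."""
    pairs = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1                    # extend the current run of zeros
        else:
            pairs.append((run, value))  # zeros skipped, then the non-zero value
            run = 0
    pairs.append((0, 0))                # assumed end-of-block marker
    return pairs


# AC coefficients in zig-zag order (illustrative values), padded to 63 entries.
ac = [-3, 2, 0, 1, 0, 0, 0, 5] + [0] * 55
ac_pairs = run_length_encode(ac)
print(ac_pairs)  # [(0, -3), (0, 2), (1, 1), (3, 5), (0, 0)]
```

The 55 trailing zeros cost nothing beyond the final marker, which is where most of the saving comes from.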
Huffman Encoding
• Long strings of binary digits replaced by shorter
codewords
• Prefix property of the Huffman codewords enables decoding
the encoded bitstream unambiguously
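The prefix property can be demonstrated with a small Huffman coder built from symbol frequencies. This is a generic sketch: JPEG encoders commonly use predefined code tables rather than building one per image, and the symbol stream below (for example, a list of SSS fields) is made up for illustration.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a symbol -> bit-string table from the symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                  # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (count, tie_breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_decode(bits, code):
    """Decode by extending the current prefix until it matches a codeword."""
    inverse = {word: sym for sym, word in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    return decoded


symbols = [4, 1, 2, 0, 1, 1, 1, 2, 1]   # illustrative symbol stream
code = huffman_code(symbols)
bits = "".join(code[s] for s in symbols)
decoded = huffman_decode(bits, code)
print(code, bits)
```

Because no codeword is the start of another, the decoder never needs separators between codewords, and the most frequent symbol receives the shortest codeword.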
Frame Building
• Encapsulates the information relating to an encoded image
Video Compression
• Video as a sequence of pictures (or frames)
• JPEG algorithm applied to each frame independently -- moving JPEG
(MJPEG), also called Motion JPEG. Exploits only spatial redundancy.
• High correlation between successive frames. Only small
portion of each frame is involved with any motion that is
taking place.
• A combination of actual frame contents and predicted
frame contents is used.
• Motion estimation and motion compensation
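Motion estimation and compensation can be sketched with an exhaustive block-matching search. This is only one possible approach: the sum of absolute differences (SAD) as the matching cost, the small block size and search range, and the function names are all assumptions chosen to keep the example short.

```python
def sad(cur, ref, top, left, ref_top, ref_left, size):
    """Sum of absolute differences between a current and a reference block."""
    return sum(
        abs(cur[top + i][left + j] - ref[ref_top + i][ref_left + j])
        for i in range(size)
        for j in range(size)
    )


def estimate_motion(cur, ref, top, left, size=4, search=2):
    """Return (dy, dx, cost) of the best-matching reference block."""
    rows, cols = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r_top, r_left = top + dy, left + dx
            if r_top < 0 or r_left < 0 or r_top + size > rows or r_left + size > cols:
                continue                # candidate falls outside the reference frame
            cost = sad(cur, ref, top, left, r_top, r_left, size)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


def residual(cur, ref, top, left, dy, dx, size=4):
    """Motion compensation: current block minus the predicted block."""
    return [
        [cur[top + i][left + j] - ref[top + dy + i][left + dx + j] for j in range(size)]
        for i in range(size)
    ]


# A 4 x 4 pattern moves two pixels to the right between two 8 x 8 frames.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for i in range(4):
    for j in range(4):
        ref[2 + i][1 + j] = 10 + 4 * i + j
        cur[2 + i][3 + j] = 10 + 4 * i + j

dy, dx, cost = estimate_motion(cur, ref, top=2, left=3)
error_block = residual(cur, ref, 2, 3, dy, dx)
print(dy, dx, cost)  # 0 -2 0
```

Only the motion vector and the residual (here all zeros) would need to be coded for this block, instead of the full block contents.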
Frame/Picture Types
• Interframe and intraframe coding. High compression ratios can be
achieved by using both. Random access requirement of image retrieval
is satisfied by pure intraframe coding.
• I-frames are coded without reference to other frames. Serve as
reference pictures for predictive-coded frames.
• P-frames are coded using motion compensated prediction from a past
I-frame or P-frame.
• B-frames are bidirectionally predictive-coded. Highest degree of
compression, but require both past and future reference pictures for
motion compensation.
• D-frames are DC-coded. Of the DCT coefficients only the DC
coefficients are present. Used in interactive applications like VoD for
rewind and fast-forward operations.
Picture Sequence
• I B B P B B P B B I (display order)
• Bitstream order -- I P B B P B B I B B (each reference picture is
sent before the B-frames that depend on it)
• Prediction span, Group of Pictures (GOP)
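The relationship between display order and bitstream order can be sketched as a simple reordering rule: hold back each B-frame until the next I- or P-frame it depends on has been emitted. The frame labels (type letter plus display index) and the function name are hypothetical.

```python
def display_to_bitstream(display):
    """Reorder frames so every reference precedes the B-frames that use it."""
    bitstream = []
    pending_b = []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)     # needs a future reference, so hold it back
        else:
            bitstream.append(frame)     # I- or P-frame goes out first
            bitstream.extend(pending_b)
            pending_b = []
    bitstream.extend(pending_b)         # trailing B-frames, if any
    return bitstream


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
bitstream = display_to_bitstream(display)
print(" ".join(bitstream))
# I0 P3 B1 B2 P6 B4 B5 I9 B7 B8
```

The decoder reverses this reordering before display, which is why B-frames add buffering delay.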
MPEG-video Encoding
• Input frames are preprocessed (color space conversion and spatial
resolution adjustment).
• Frame types are decided for each frame/picture
• Each picture is divided into macroblocks of 16 X 16 pixels.
• Macroblocks are intracoded for I frames and predictive coded or
intracoded for P and B frames
• Each macroblock is divided into six blocks of 8 X 8 pixels (4 luminance
and 2 chrominance). DCT is applied to each block, and the transform
coefficients are quantized, zig-zag scanned, and variable-length
coded.
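The six-block structure of a macroblock can be sketched as follows, assuming the chrominance planes have already been subsampled so that a 16 x 16 luminance area corresponds to one 8 x 8 block of each chrominance component. The function name and the sample values are illustrative.

```python
def macroblock_to_blocks(y_mb, u_mb, v_mb):
    """Split one macroblock into four luminance blocks plus U and V blocks.

    y_mb is 16 x 16; u_mb and v_mb are 8 x 8 (already subsampled).
    """
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            blocks.append([row[left:left + 8] for row in y_mb[top:top + 8]])
    blocks.append(u_mb)
    blocks.append(v_mb)
    return blocks


y_mb = [[r * 16 + c for c in range(16)] for r in range(16)]
u_mb = [[1] * 8 for _ in range(8)]
v_mb = [[2] * 8 for _ in range(8)]

mb_blocks = macroblock_to_blocks(y_mb, u_mb, v_mb)
print(len(mb_blocks))  # 6
```

Each of the six blocks then goes through the DCT, quantization, zig-zag scan and variable-length coding described above.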