Introduction to Image and Video Coding Algorithms © 2002-2003 by Yu Hen Hu ECE533 Digital Image Processing.
Outline
Transform-based Image and Video Coding
Linear Transformation – DCT
Quantization
» Scalar Quantization
» Vector Quantization
Entropy Coding
Video Coding – Motion Compensation

Transform-based Image Coding
Input image → Linear Transform → Quantization → Entropy Coding → Binary bit stream

Linear Transform
If the signal is formatted as a vector, a linear transform can be formulated as a matrix-vector product that transforms the signal into a different domain.
Examples:
» K-L expansion
» Discrete Fourier transform
» Discrete cosine transform
» Discrete wavelet transform
Energy compaction property: the transformed signal vector has a few large coefficients and many small, nearly zero coefficients. The few large coefficients can be encoded efficiently with few bits while retaining most of the energy of the original signal.

Block-based Image Coding
An image is a 2D signal of pixel intensities (including colors). A block-based image coding scheme partitions the entire image into blocks of 8 by 8, 16 by 16, or some other size. The coding algorithm is then applied to each block independently.
Blocks may be overlapping or non-overlapping.
Advantage: individual blocks can be processed in parallel. On hand-held devices, only one block needs to be loaded into main memory at a time.
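The block partitioning described above can be sketched in a few lines of Python. This is only an illustration for the non-overlapping case: the function name split_into_blocks and the requirement that the image dimensions be multiples of the block size are assumptions made here, not part of the slides.

```python
def split_into_blocks(image, size=8):
    """Split a 2D list of pixels into non-overlapping size x size blocks.

    Returns a dict mapping (block_row, block_col) to a block (list of rows).
    Assumption: image height and width are multiples of `size`.
    """
    rows, cols = len(image), len(image[0])
    if rows % size or cols % size:
        raise ValueError("image dimensions must be multiples of the block size")
    blocks = {}
    for br in range(rows // size):
        for bc in range(cols // size):
            # Take `size` rows, then the matching `size` columns of each row.
            blocks[(br, bc)] = [
                row[bc * size:(bc + 1) * size]
                for row in image[br * size:(br + 1) * size]
            ]
    return blocks


# A 16 x 16 test image yields four 8 x 8 blocks.
image = [[r * 16 + c for c in range(16)] for r in range(16)]
blocks = split_into_blocks(image)
```

Because each block is an independent list, a coder can process the blocks one at a time (low memory) or hand them to separate workers (parallel processing).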
JPEG Image Coding Algorithms
JPEG encoding process (block diagram): 8x8 block → DCT → Q (using the quantization matrix), then
» DC coefficient → DPCM → DC Huffman coding
» AC coefficients → zig-zag scan → AC Huffman coding
The Huffman coders use code books.

JPEG Decoding
JPEG decoding process (block diagram):
» DC: DC Huffman decoding → IDPCM
» AC: AC Huffman decoding
followed by IQ (inverse quantization) → IDCT → 8x8 block.

Pre-Processing
Color sub-sampling
» A color image is converted from the RGB to the YUV color space. Each component of each pixel takes 1 byte.
» The U and V planes are sub-sampled: 4:1:1 scheme.
» For every 16 by 16 block of a color image, six 8 by 8 blocks are encoded: four 8×8 blocks of luminance pixels plus two 8×8 blocks of sub-sampled chrominance components make up a 16 by 16 macro-block.
Level shifting: 128 is subtracted from each pixel value so that it lies in the range [–128, 127].

Discrete Cosine Transform
8×8 two-dimensional separable DCT:
$$
F(u,v) =
\begin{cases}
\dfrac{1}{4} \displaystyle\sum_{m=0}^{7} \sum_{n=0}^{7} f(m,n), & u = v = 0; \\[2ex]
\dfrac{1}{8} \displaystyle\sum_{m=0}^{7} \sum_{n=0}^{7} f(m,n) \cos\dfrac{(2m+1)u\pi}{16} \cos\dfrac{(2n+1)v\pi}{16}, & 0 \le u, v \le 7;\ u + v > 0.
\end{cases}
$$
DCT is chosen because it yields superior energy compaction for natural images.
F(0,0): the DC coefficient ranges over [–128×64/4, 127×16] = [–2048, 2032] and needs 12 bits to represent (including the sign bit). 12 bits are more than enough for the remaining AC coefficients (u > 0 or v > 0).

Inverse DCT (IDCT)
8×8 two-dimensional separable IDCT:
$$
f(m,n) =
\begin{cases}
\dfrac{1}{4} \displaystyle\sum_{u=0}^{7} \sum_{v=0}^{7} F(u,v) \cos\dfrac{(2m+1)u\pi}{16} \cos\dfrac{(2n+1)v\pi}{16}, & m = n = 0; \\[2ex]
\dfrac{1}{8} \displaystyle\sum_{u=0}^{7} \sum_{v=0}^{7} F(u,v) \cos\dfrac{(2m+1)u\pi}{16} \cos\dfrac{(2n+1)v\pi}{16}, & 0 \le m, n \le 7;\ m + n > 0.
\end{cases}
$$
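The forward and inverse 8×8 transforms can be sketched directly from their double sums. Note the assumption: this sketch uses the normalization of the JPEG standard, (1/4)·C(u)·C(v) with C(0) = 1/√2 and C(k) = 1 otherwise, rather than the scale factors written above, so that idct2 exactly inverts dct2. The names dct2, idct2, and _c are illustrative, and the direct four-loop form is chosen for readability, not speed.

```python
import math

N = 8  # block size used throughout the slides


def _c(k):
    # Normalization factor of the standard JPEG DCT (an assumption of this sketch).
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def _basis(m, u):
    # One-dimensional cosine basis value cos((2m+1) u pi / 16).
    return math.cos((2 * m + 1) * u * math.pi / 16)


def dct2(f):
    """Forward 8x8 DCT of a block f[m][n] given as a list of lists."""
    F = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for m in range(N):
                for n in range(N):
                    s += f[m][n] * _basis(m, u) * _basis(n, v)
            F[u][v] = 0.25 * _c(u) * _c(v) * s
    return F


def idct2(F):
    """Inverse 8x8 DCT: recovers f[m][n] from the coefficients F[u][v]."""
    f = [[0.0] * N for _ in range(N)]
    for m in range(N):
        for n in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += _c(u) * _c(v) * F[u][v] * _basis(m, u) * _basis(n, v)
            f[m][n] = 0.25 * s
    return f


# A level-shifted ramp block survives the round trip up to rounding error.
ramp = [[m * 8 + n - 128 for n in range(N)] for m in range(N)]
reconstructed = idct2(dct2(ramp))
```

With this scaling, a constant block of value c has F(0,0) = 8c and all other coefficients equal to zero, which is the extreme case of energy compaction.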
IDCT can be computed using the same routine as the DCT.

DCT Basis Functions
[Figure: DCT basis functions]

Quantization of DCT Coefficients
[Figure: quantization of DCT coefficients]

DPCM of DC coefficients
DC coding: the DC coefficients of all 8 by 8 blocks of the entire image are collected into a sequence of DC coefficients. DPCM is then applied:
DiffDC(block_i) = DC(block_i) – DC(block_{i–1})
The DiffDC values are then encoded using Huffman entropy coding.
Example:
Original:   1216 1232 1224 1248 1248 1208
After DPCM: 1216  +16   -8  +24    0  -40

Huffman Entropy Coding
Entropy coding:
» Task: assign a variable-length binary code to each symbol of a finite alphabet.
» Goal: minimize the average code length (number of bits) per symbol.
» Approach: shorter codes for symbols that occur more frequently, longer codes for infrequent ones.
Optimal solution:
» Reached when the average code length approaches the entropy of the source.
Huffman coding:
» Code words are derived from a (possibly unbalanced) binary tree.
Arithmetic coding is another entropy coding method.

Huffman Encoding of DC Coefficients
Encoding and decoding of Huffman codes are done via look-up tables.
In JPEG, DC coefficients (after DPCM) are first grouped into categories according to their magnitudes. Each category is treated as a symbol and assigned a code from a given Huffman table. For example, –7 to –4 and 4 to 7 form category 3, which has the code "00".
If the number is positive, its binary representation is appended directly to the Huffman code of its category. For example, 6 is encoded as 00 110.
If the number is negative, the appended bits are the 1's complement of its magnitude. For example, –5 is encoded as 00 010.
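The DPCM step and the category-plus-appended-bits rule can be sketched as follows. The function names are illustrative, and BASE_CODE holds only the two base codes quoted in these slides (category 3 → 00, category 4 → 101); the complete JPEG table is not reproduced here, so other categories raise a KeyError in this sketch.

```python
def dpcm(dc_values):
    """Keep the first DC value, then store differences between neighbours."""
    return [dc_values[0]] + [b - a for a, b in zip(dc_values, dc_values[1:])]


def category(value):
    """Size category: number of bits needed for |value| (0 for value 0)."""
    return abs(value).bit_length()


def appended_bits(value):
    """Bits appended after the base code of the category."""
    cat = category(value)
    if cat == 0:
        return ""
    if value > 0:
        return format(value, "b")  # plain binary, exactly `cat` bits long
    # Negative value: 1's complement of the magnitude, kept to `cat` bits.
    return format(~(-value) & ((1 << cat) - 1), "0{}b".format(cat))


# Only the base codes quoted in the slides; a real coder needs the full table.
BASE_CODE = {3: "00", 4: "101"}


def encode_dc(diff):
    """Base code of the category followed by the appended bits."""
    return BASE_CODE[category(diff)] + appended_bits(diff)


diffs = dpcm([1216, 1232, 1224, 1248, 1248, 1208])
```

Here encode_dc(6) gives "00110" and encode_dc(-5) gives "00010", matching the two worked examples above.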
Question: Given such a table, how can dedicated hardware be devised to implement the encoding procedure?

JPEG Huffman Table: Categories
[Table: JPEG Huffman table of categories]

JPEG DC Entropy Coding
Example:
» –9 belongs to category 4, hence base code = 101
» 1's complement of the magnitude of –9: 1C(1001) = 0110
» Code word = 101 followed by 0110 = 1010110
Note that category 3 occurs most frequently and hence has the shortest base code word.

AC Coefficients
AC coefficients are first weighted with a quantization matrix, Cq(i,j) = C(i,j)/q(i,j), and then quantized (rounded to integers).
They are then scanned in zig-zag order into a 1D sequence that is passed to AC Huffman encoding.
Question: Given an 8 by 8 array, how can it be converted into a vector according to the zig-zag scan order? What is the algorithm?
Zig-zag scan order:
 1  2  6  7 15 16 28 29
 3  5  8 14 17 27 30 43
 4  9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64

AC Coefficients Huffman Encoding
The symbols used for encoding AC coefficients combine the number of significant bits of a nonzero AC coefficient with the run of 0s preceding it.
For example, 5 0 2 0 0 –1 is encoded as: 100101 11100110 110110
This follows from the table below:
Number    Run/Category   Base code   Length   Final code
5         0/3            100         6        100 101
0 2       1/2            111001      8        111001 10
0 0 –1    2/1            11011       6        11011 0

Huffman Decoding
A look-up table procedure. Challenge: how can decoding be performed fast?
Example: a Huffman table for six symbols:
Symbol   Codeword
A        0
B        10
C        1100
D        1101
E        1110
F        1111
The decoding process can be modeled as a finite state machine that decodes one bit of the input bit stream per clock cycle.
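For the six-symbol table above, the bit-serial finite-state-machine view of decoding can be sketched as a transition table. The state names a to e and the dictionary layout are assumptions of this sketch: state a is the start state, and each entry maps an input bit to the next state and the symbol emitted, if any.

```python
# state -> {input bit: (next state, emitted symbol or None)}
FSM = {
    "a": {"0": ("a", "A"), "1": ("b", None)},   # start state
    "b": {"0": ("a", "B"), "1": ("c", None)},   # seen "1"
    "c": {"0": ("d", None), "1": ("e", None)},  # seen "11"
    "d": {"0": ("a", "C"), "1": ("a", "D")},    # seen "110"
    "e": {"0": ("a", "E"), "1": ("a", "F")},    # seen "111"
}


def decode(bits):
    """Decode a string of '0'/'1' characters, one bit per step."""
    state = "a"
    symbols = []
    for bit in bits:
        state, symbol = FSM[state][bit]
        if symbol is not None:
            symbols.append(symbol)
    if state != "a":
        raise ValueError("bit stream ends in the middle of a codeword")
    return "".join(symbols)


message = decode("0" "10" "1100" "1101" "1110" "1111")
```

Each loop iteration corresponds to one clock cycle of the state machine, so the longest codeword (4 bits) takes four cycles to decode.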
State diagram of the decoder (input bit / output symbol → next state; "-" means no output):
State a: 0/A → a, 1/- → b
State b: 0/B → a, 1/- → c
State c: 0/- → d, 1/- → e
State d: 0/C → a, 1/D → a
State e: 0/E → a, 1/F → a
Question: How can this process be made fast enough to match any input bit rate?

Video Coding
Video coding is often implemented as encoding a sequence of images.
Motion compensation is used to exploit temporal redundancy between successive frames.
Examples: MPEG-1, MPEG-2, MPEG-4, H.261, H.263, H.263+, etc.
Existing video coding standards combine JPEG-style image compression with motion compensation.

MPEG Encoding
Simplified block diagram (the encoding of intra-coded frames is not shown):
» Forward path: current frame x(t) → subtract the predicted frame x̂(t) → residue r(t) → DCT → Q → VLC → bit stream buffer → bit stream, with buffer control.
» Reconstruction loop: Q → Q^-1 → IDCT → reconstructed residue Q[r(t)] → add x̂(t) → reconstructed current frame x̃(t) → frame buffer → x̃(t-1).
» Motion estimation & compensation takes x(t) and x̃(t-1) and produces the motion vectors MV and the predicted frame x̂(t).
$$ MV = \mathrm{ME}\left[x(t), \tilde{x}(t-1)\right] $$
$$ \hat{x}(t) = \mathrm{MC}\left[\tilde{x}(t-1), MV\right] $$
$$ r(t) = x(t) - \hat{x}(t) $$
$$ \tilde{x}(t) = \hat{x}(t) + Q[r(t)] $$

MPEG Decoding
VLD: variable length decoding.
Block diagram: received bit stream → bit stream buffer → VLD → Q^-1 → IDCT → reconstructed residue Q[r(t)] → add the predicted frame x̂(t) → reconstructed current frame x̃(t) → frame buffer → x̃(t-1) → motion compensation (using the decoded motion vectors) → x̂(t).
$$ \hat{x}(t) = \mathrm{MC}\left[\tilde{x}(t-1), MV\right] $$
$$ \tilde{x}(t) = \hat{x}(t) + Q[r(t)] $$

Motion Estimation
Three types of frames:
» Intra (I): the frame is coded as if it were a still image.
» Predicted (P): predicted from an I or P frame.
» Bi-directional (B): forward and backward predicted from a pair of I or P frames.
A typical frame arrangement is (subscripts are used to distinguish the frames):
I1 B1 B2 P1 B3 B4 P2 B5 B6 I2
P1 is forward-predicted from I1, and P2 from P1. B1 and B2 are interpolated from I1 and P1, B3 and B4 from P1 and P2, and B5 and B6 from P2 and I2.
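The prediction dependencies of the frame arrangement above can be derived mechanically. This sketch assumes the usual MPEG convention that a P frame references the nearest preceding I or P frame and a B frame references the nearest I or P frame on each side; the function name reference_frames is illustrative.

```python
def reference_frames(sequence):
    """Map each frame name to the list of frames it is predicted from.

    Assumption: the sequence is in display order and every B frame has an
    I or P frame somewhere before it and somewhere after it.
    """
    anchors = [i for i, name in enumerate(sequence) if name[0] in "IP"]
    refs = {}
    for i, name in enumerate(sequence):
        before = [a for a in anchors if a < i]
        after = [a for a in anchors if a > i]
        if name[0] == "I":
            refs[name] = []                        # coded on its own
        elif name[0] == "P":
            refs[name] = [sequence[before[-1]]]    # nearest earlier I or P
        else:
            refs[name] = [sequence[before[-1]], sequence[after[0]]]
    return refs


gop = ["I1", "B1", "B2", "P1", "B3", "B4", "P2", "B5", "B6", "I2"]
refs = reference_frames(gop)
```

Since B5 and B6 depend on I2, a decoder needs I2 before it can reconstruct them, even though I2 is displayed later.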
Forward Motion Estimation
[Figure: the current frame x̂(t) is constructed from different parts of the reference frame x̃(t-1).]

Block Motion Estimation
MAD: mean absolute difference between the (i, j)-th pixel of the current block, x(i, j), and the (i+m, j+n)-th pixel of the reference frame, y(i+m, j+n):
$$ \mathrm{MAD}(m,n) = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| x(i,j) - y(i+m, j+n) \right| $$
$$ MV = \arg\min_{-p \le m, n \le p} \mathrm{MAD}(m,n) $$
MV is the motion vector corresponding to the macro-block. N is the block size, and m and n range over the search range.
It is similar to DPCM in the temporal domain, and has less to do with object motion.
[Figure: the current block in the current frame, and the search area and motion vector in the reference frame.]

Video sequence: Tennis, frame 0
[Figure: previous frame (frame 0). Prepared by Surin Kittitornkun.]

Video sequence: Tennis, frame 1
[Figure: current frame (frame 1).]

Frame Difference
[Figure: frame difference between frames 0 and 1.]

What is motion estimation?
[Figure: motion vector field of frame 1.]

What is motion compensation?
[Figure: motion compensated frame.]

Motion Compensated Frame Difference
[Figure: motion compensated frame difference between frames 0 and 1.]

6-Level Nested Do Loop
Do h = 0 to Nh-1
  Do v = 0 to Nv-1
    MV(h,v) = (0,0)
    Dmin(h,v) = ∞
    Do m = -p to p (-1)
      Do n = -p to p (-1)
        MAD(m,n) = 0
        Do i = hN to hN+N-1
          Do j = vN to vN+N-1
            MAD(m,n) = MAD(m,n) + |x(i,j) - y(i+m,j+n)|
          End do j
        End do i
        If Dmin(h,v) > MAD(m,n)
          Dmin(h,v) = MAD(m,n)
          MV(h,v) = (m,n)
        End if
      End do n
    End do m
  End do v
End do h
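The six nested loops translate almost line for line into Python. The sketch below is a plain full-search block matcher under a few assumptions that are not in the slides: frames are lists of rows, candidate blocks that would fall outside the reference frame are skipped, the search covers -p to p inclusive, and the accumulated sum is divided by N² so that the compared value is the MAD as defined earlier. The name full_search is illustrative.

```python
def full_search(cur, ref, N, p):
    """Return {(h, v): (m, n)}: one motion vector per N x N block of `cur`."""
    rows, cols = len(cur), len(cur[0])
    motion_vectors = {}
    for h in range(rows // N):                      # block row
        for v in range(cols // N):                  # block column
            d_min, best = float("inf"), (0, 0)
            for m in range(-p, p + 1):              # vertical displacement
                for n in range(-p, p + 1):          # horizontal displacement
                    top, left = h * N + m, v * N + n
                    # Assumption: ignore candidates outside the reference frame.
                    if top < 0 or left < 0 or top + N > rows or left + N > cols:
                        continue
                    total = 0
                    for i in range(N):
                        for j in range(N):
                            total += abs(cur[h * N + i][v * N + j]
                                         - ref[top + i][left + j])
                    mad = total / (N * N)
                    if d_min > mad:
                        d_min, best = mad, (m, n)
            motion_vectors[(h, v)] = best
    return motion_vectors


# Synthetic test: the current frame is the reference shifted by (1, 2).
ref = [[(31 * i + 17 * j) % 251 for j in range(16)] for i in range(16)]
cur = [[ref[i + 1][j + 2] if i + 1 < 16 and j + 2 < 16 else 0
        for j in range(16)] for i in range(16)]
vectors = full_search(cur, ref, N=8, p=2)
```

The cost grows as Nh·Nv·(2p+1)²·N² absolute differences per frame, which is why the loop nest is a common target for parallel hardware.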