Introduction to Image and
Video Coding Algorithms
© 2002-2003 by Yu Hen Hu
ECE533 Digital Image Processing
Outline



Transform-based Image and Video Coding
Linear Transformation – DCT
Quantization
» Scalar Quantization
» Vector Quantization


Entropy Coding
Video Coding – Motion Compensation
Transform-based Image Coding
Input Image → Linear Transform → Quantization → Entropy Coding → Binary bit stream
Linear Transform


If the signal is formatted as a vector, a linear transform can be formulated as a matrix-vector product that transforms the signal into a different domain.
Examples:
» K-L Expansion
» Discrete Fourier Transform
» Discrete cosine transform
» Discrete wavelet transform

Energy compaction property: The transformed signal vector has a few large coefficients and many small, nearly zero coefficients. These few large coefficients can be encoded efficiently with few bits while retaining the majority of the energy of the original signal.
Block-based Image Coding


An image is a 2D signal of pixel intensities (including colors).
A block-based image coding scheme partitions the entire image into 8 by 8 or 16 by 16 (or other size) blocks. Then the coding algorithm is applied to individual blocks independently.
Blocks may be overlapping or non-overlapping.
Advantage: individual blocks can be processed in parallel. For hand-held devices, only one block needs to be loaded into main memory at a time.
JPEG Image Coding Algorithms
JPEG Encoding Process:
8×8 block → DCT → Q (using the quantization matrix)
DC coefficient → DPCM → DC Huffman coding
AC coefficients → Zig-Zag scan → AC Huffman coding
The DC and AC Huffman coders use code books.
JPEG Decoding
JPEG Decoding Process:
DC: DC Huffman decoding → IDPCM
AC: AC Huffman decoding
DC and AC coefficients → IQ → IDCT → 8×8 block
Pre-Processing

Color sub-sampling
» A color image is converted from RGB to YUV color space. Each component of each pixel takes 1 byte.
» Sub-sample U-V planes: 4:1:1 scheme.
» For every 16 by 16 block of a color image, six 8 by 8 blocks are encoded: four 8×8 blocks of luminance pixels plus two 8×8 sub-sampled chrominance blocks make up a 16 by 16 macro-block.

Level shifting: 128 is subtracted from each pixel value so that it lies in the range [–128, 127].
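The two pre-processing steps can be sketched on a single 16 by 16 macro-block as follows. This is only an illustration: the slide does not say how the sub-sampled chrominance values are computed, so averaging each 2 by 2 neighbourhood is an assumption, and all function names are hypothetical.

```python
def level_shift(block):
    """Subtract 128 from every 8-bit sample so values lie in [-128, 127]."""
    return [[v - 128 for v in row] for row in block]

def subsample_2x2(plane):
    """Halve a plane in both directions by averaging 2x2 neighbourhoods (assumed filter)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) // 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def split_8x8(plane):
    """Cut a plane into non-overlapping 8x8 blocks, row by row."""
    return [[row[c:c + 8] for row in plane[r:r + 8]]
            for r in range(0, len(plane), 8)
            for c in range(0, len(plane[0]), 8)]

def macroblock_blocks(y, u, v):
    """Return the six level-shifted 8x8 blocks of one 16x16 macro-block."""
    blocks = split_8x8(y) + split_8x8(subsample_2x2(u)) + split_8x8(subsample_2x2(v))
    return [level_shift(b) for b in blocks]

# Toy 16x16 planes with constant values (illustrative data only).
y = [[200] * 16 for _ in range(16)]
u = [[128] * 16 for _ in range(16)]
v = [[0] * 16 for _ in range(16)]
blocks = macroblock_blocks(y, u, v)
print(len(blocks), "blocks of size", len(blocks[0]), "x", len(blocks[0][0]))
```

Four of the six blocks come from the luminance plane and one each from the two sub-sampled chrominance planes.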
Discrete Cosine Transform

8×8 two-dimensional separable DCT:
$$
F(u,v) =
\begin{cases}
\dfrac{1}{4}\displaystyle\sum_{m=0}^{7}\sum_{n=0}^{7} f(m,n), & u = v = 0; \\[2ex]
\dfrac{1}{8}\displaystyle\sum_{m=0}^{7}\sum_{n=0}^{7} f(m,n)\cos\dfrac{(2m+1)u\pi}{16}\cos\dfrac{(2n+1)v\pi}{16}, & 0 \le u, v \le 7;\ u + v > 0.
\end{cases}
$$
DCT is chosen because it leads to superior energy compaction for natural images.
F(0,0): the DC coefficient ranges from –128×64/4 = –2048 to 127×16 = 2032, so it needs 12 bits to represent (including the sign bit). 12 bits are more than enough for the remaining AC coefficients (u > 0 or v > 0).
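A minimal pure-Python sketch of a separable 8×8 DCT and its inverse, applied to a smooth block to show energy compaction. Note the assumption: the code uses the common orthonormal scaling (1/4)·C(u)·C(v) with C(0) = 1/√2, so its DC term is scaled by 1/8 rather than the 1/4 used on this slide. The helper names are illustrative.

```python
import math

N = 8

def c(k):
    """Scale factor of the orthonormal DCT-II (assumed normalisation)."""
    return math.sqrt(0.5) if k == 0 else 1.0

# COS[k][x] = cos((2x + 1) k pi / 16), the kernel shared by rows and columns.
COS = [[math.cos((2 * x + 1) * k * math.pi / (2 * N)) for x in range(N)]
       for k in range(N)]

def dct_1d(vec):
    return [0.5 * c(k) * sum(vec[x] * COS[k][x] for x in range(N)) for k in range(N)]

def idct_1d(coef):
    return [0.5 * sum(c(k) * coef[k] * COS[k][x] for k in range(N)) for x in range(N)]

def transpose(a):
    return [list(r) for r in zip(*a)]

def apply_2d(block, f):
    """Separable 2-D transform: apply f to every row, then to every column."""
    rows = [f(r) for r in block]
    return transpose([f(col) for col in transpose(rows)])

def dct_2d(block):
    return apply_2d(block, dct_1d)

def idct_2d(coef):
    return apply_2d(coef, idct_1d)

# A smooth ramp block (illustrative data): most energy lands in three coefficients.
ramp = [[10 * m + 5 * n for n in range(N)] for m in range(N)]
F = dct_2d(ramp)
total = sum(v * v for row in F for v in row)
low = F[0][0] ** 2 + F[0][1] ** 2 + F[1][0] ** 2
print("fraction of energy in F(0,0), F(0,1), F(1,0):", low / total)
```

Because the transform is separable, the 2-D routine is just the 1-D routine applied along rows and then along columns.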
Inverse DCT (IDCT)

8×8 two-dimensional separable IDCT:
$$
f(m,n) =
\begin{cases}
\dfrac{1}{4}\displaystyle\sum_{u=0}^{7}\sum_{v=0}^{7} F(u,v)\cos\dfrac{(2m+1)u\pi}{16}\cos\dfrac{(2n+1)v\pi}{16}, & m = n = 0; \\[2ex]
\dfrac{1}{8}\displaystyle\sum_{u=0}^{7}\sum_{v=0}^{7} F(u,v)\cos\dfrac{(2m+1)u\pi}{16}\cos\dfrac{(2n+1)v\pi}{16}, & 0 \le m, n \le 7;\ m + n > 0.
\end{cases}
$$
IDCT can be computed using the same routine as DCT.
DCT Basis Functions
Quantization of DCT Coefficients
DPCM of DC coefficients


DC coding: The DC coefficients of all the 8 by 8 blocks of the entire image are collected into a sequence of DC coefficients. Next, DPCM is applied:
DiffDC(block_i) = DC(block_i) – DC(block_{i–1})

The DiffDCs are then encoded using Huffman entropy coding.

Example:
Original: 1216 → 1232 → 1224 → 1248 → 1248 → 1208
After DPCM: 1216 → +16 → –8 → +24 → 0 → –40
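The DPCM step and its inverse can be sketched in a few lines. This is a minimal illustration using the DC sequence from this slide; the function names are hypothetical, and the first value is passed through unchanged as in the example.

```python
def dpcm_encode(dc):
    """Keep the first DC value, then store each value minus its predecessor."""
    if not dc:
        return []
    return [dc[0]] + [cur - prev for prev, cur in zip(dc, dc[1:])]

def dpcm_decode(diff):
    """Invert dpcm_encode with a running sum."""
    out = []
    for d in diff:
        out.append(d if not out else out[-1] + d)
    return out

dc = [1216, 1232, 1224, 1248, 1248, 1208]
diff = dpcm_encode(dc)
print(diff)                # [1216, 16, -8, 24, 0, -40]
print(dpcm_decode(diff))   # [1216, 1232, 1224, 1248, 1248, 1208]
```

The differences are much smaller in magnitude than the DC values themselves, which is what makes the subsequent entropy coding effective.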
Huffman Entropy Coding

Entropy coding:
» Task: to assign a variable-length binary code to each symbol of a finite alphabet.
» Goal: to minimize the average length (number of bits) per symbol.
» Approach: shorter codes for symbols that occur more frequently, longer codes for infrequent ones.

Optimal solution:
» When the average code length approaches the entropy of the source.

Huffman coding:
» Code words are derived from a (perhaps) unbalanced binary tree.

Arithmetic coding is another entropy coding method.
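As a sketch of how code words come out of an unbalanced binary tree, the following builds a Huffman code by repeatedly merging the two least probable subtrees, then compares the average code length with the source entropy. The four-symbol source and its probabilities are made up for illustration.

```python
import heapq
import math
from itertools import count

def huffman_code(freq):
    """Build a prefix code from {symbol: weight} by merging the two lightest subtrees."""
    tie = count()  # tie-breaker so the heap never has to compare the dicts
    heap = [(w, next(tie), {s: ""}) for s, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in freq}
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}  # illustrative source
code = huffman_code(probs)
avg_len = sum(p * len(code[s]) for s, p in probs.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
print(code)
print("average length:", avg_len, "entropy:", entropy)
```

For this source every probability is a power of 1/2, so the average code length equals the entropy; in general it lies within one bit above it.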
Huffman Encoding of DC Coefficients


Encoding and decoding of Huffman codes is done via look-up tables.
In JPEG, DC coefficients (after DPCM) are first grouped into categories according to their magnitudes. Each category is treated as a symbol, and a Huffman table is given for these symbols. For example, –7 to –4 and 4 to 7 are listed as category 3, which has the code "00".

If the number is positive, its binary representation is appended directly to the Huffman code of the category. For example, 6 is encoded as 00 110. If the number is negative, the appended code is the 1's complement of its magnitude. For example, –5 is encoded as 00 010.
Question: Given such a table, how can dedicated hardware be devised to implement the encoding procedure?
JPEG Huffman Table: Categories
JPEG DC Entropy Coding

Example:
» –9: category 4. Hence base code = 101
» 1's complement of the magnitude 9: 1C(1001) = 0110
» Code word = 101 followed by 0110 = 1010110

Note that category 3 occurs most frequently and hence has the shortest base code word.
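The category-plus-appended-bits rule can be sketched as below. Only the two base codes quoted on the slides (category 3 → 00, category 4 → 101) are included; the remaining entries of the table are not reproduced, and the function names are hypothetical.

```python
# Base codes quoted on the slides; other categories are deliberately left out.
BASE_CODE = {3: "00", 4: "101"}

def category(value):
    """Number of bits needed for |value| (0 for value == 0)."""
    return abs(value).bit_length()

def appended_bits(value):
    """Binary of value if positive, 1's complement of |value| if negative."""
    k = category(value)
    if k == 0:
        return ""
    if value < 0:
        value += (1 << k) - 1  # flips the k bits of |value|
    return format(value, "0{}b".format(k))

def encode_dc_diff(value):
    """Base code of the category followed by the appended bits."""
    return BASE_CODE[category(value)] + appended_bits(value)

for d in (6, -5, -9):
    print(d, "->", encode_dc_diff(d))
```

Since a positive number always starts with bit 1 and the 1's complement of a negative number always starts with bit 0, the decoder can recover the sign from the first appended bit.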
AC Coefficients

AC coefficients are first weighted with a quantization matrix:
Cq(i,j) = C(i,j)/q(i,j)
and then quantized.

They are then scanned in a zig-zag order into a 1D sequence, which is subject to AC Huffman encoding.
Question: Given an 8 by 8 array, how can it be converted into a vector according to the zig-zag scan order? What is the algorithm?
 1   2   6   7  15  16  28  29
 3   5   8  14  17  27  30  43
 4   9  13  18  26  31  42  44
10  12  19  25  32  41  45  54
11  20  24  33  40  46  53  55
21  23  34  39  47  52  56  61
22  35  38  48  51  57  60  62
36  37  49  50  58  59  63  64
Zig-Zag scan order
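One possible answer to the question on this slide, as a sketch: all entries on the same anti-diagonal share the value row + column, so the scan can walk the diagonals in order and alternate the direction of travel. The function names are illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n-by-n array in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_scan(block):
    """Flatten a square block into a 1D list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Rebuild the scan-order table: rank[r][c] is the 1-based position of entry (r, c).
rank = [[0] * 8 for _ in range(8)]
for k, (r, c) in enumerate(zigzag_order(8), start=1):
    rank[r][c] = k
for row in rank:
    print(" ".join("{:2d}".format(v) for v in row))
```

Scanning the rank table itself with `zigzag_scan` returns 1, 2, ..., 64, which is a quick self-check of the ordering.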
AC Coefficients Huffman Encoding



Each symbol used for encoding an AC coefficient consists of both the number of significant bits (the category) and the run of 0s preceding the nonzero AC coefficient. For example, 5 0 2 0 0 –1 is encoded as: 100101 11100110 110110
This is according to the table below:
Number    Run/Category   Base code   Length   Final code
5         0/3            100         6        100 101
0 2       1/2            111001      8        111001 10
0 0 –1    2/1            11011       6        11011 0
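The run/category symbol formation can be sketched as follows, using only the three base codes listed in the table on this slide. End-of-block handling and very long zero runs are left out of this illustration, and the function names are hypothetical.

```python
# Base codes copied from the table on this slide, keyed by (run, category).
BASE_CODE = {(0, 3): "100", (1, 2): "111001", (2, 1): "11011"}

def category(value):
    """Number of significant bits of |value|."""
    return abs(value).bit_length()

def appended_bits(value):
    """Binary of a positive value, or 1's complement of |value| for a negative one."""
    k = category(value)
    if value < 0:
        value += (1 << k) - 1
    return format(value, "0{}b".format(k))

def run_category_symbols(ac):
    """Turn a zig-zag ordered AC sequence into (run, category, value) triples."""
    symbols, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            symbols.append((run, category(v), v))
            run = 0
    return symbols  # trailing zeros would need an end-of-block symbol (not modelled)

def encode_ac(ac):
    return [BASE_CODE[(run, cat)] + appended_bits(v)
            for run, cat, v in run_category_symbols(ac)]

print(run_category_symbols([5, 0, 2, 0, 0, -1]))
print(encode_ac([5, 0, 2, 0, 0, -1]))
```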
Huffman Decoding



A look-up table procedure.
Challenge: How to perform decoding fast?
Example: a Huffman table for six symbols:
Symbol   Codeword
A        0
B        10
C        1100
D        1101
E        1110
F        1111

The decoding process can be modeled as a finite state machine. It decodes one bit of the input bit stream per clock cycle.
[State diagram: states a, b, c, d, e; edges are labeled input/output, e.g. 0/A, 0/B, 0/C, 1/D, 0/E, 1/F; 1/– marks a transition that outputs no symbol.]
Question: How to make this process fast enough to match any input bit rate?
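A bit-serial decoder for the six-symbol table can be written directly as a transition table. In this sketch state a is taken to be the start state and b, c, d, e the states reached after the prefixes 1, 11, 110 and 111; that assignment of state names is an assumption, since only the code table is fully recoverable here.

```python
# (state, input bit) -> (next state, output symbol or None)
TRANSITIONS = {
    ("a", "0"): ("a", "A"), ("a", "1"): ("b", None),
    ("b", "0"): ("a", "B"), ("b", "1"): ("c", None),
    ("c", "0"): ("d", None), ("c", "1"): ("e", None),
    ("d", "0"): ("a", "C"), ("d", "1"): ("a", "D"),
    ("e", "0"): ("a", "E"), ("e", "1"): ("a", "F"),
}

CODE = {"A": "0", "B": "10", "C": "1100", "D": "1101", "E": "1110", "F": "1111"}

def encode(symbols):
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    """Consume one bit per step, emitting a symbol whenever a code word completes."""
    state, out = "a", []
    for b in bits:
        state, symbol = TRANSITIONS[(state, b)]
        if symbol is not None:
            out.append(symbol)
    if state != "a":
        raise ValueError("bit stream ends in the middle of a code word")
    return "".join(out)

bits = encode("ABCDEF")
print(bits, "->", decode(bits))
```

Because this machine consumes exactly one bit per step, its throughput in symbols per step depends on the code word lengths, which is the difficulty raised by the question on this slide.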
Video Coding



Video coding is often implemented as encoding a sequence of images. Motion compensation is used to exploit temporal redundancy between successive frames.
Examples: MPEG-1, MPEG-2, MPEG-4, H.261, H.263, H.263+, etc.
Existing video coding standards are based on JPEG-style image compression together with motion compensation.
MPEG Encoding
[Block diagram. Blocks: DCT, Q, VLC, bit stream buffer (with buffer control), Q⁻¹, IDCT, frame buffer, motion estimation & compensation. Outputs: bit stream and motion vectors.]
Signals:
x(t): current frame
x̂(t): predicted frame
r(t): residue
Q[r(t)]: reconstructed residue
x̃(t): reconstructed current frame
x̃(t−1): previous reconstructed frame, held in the frame buffer
$$
\begin{aligned}
\mathrm{MV} &= \mathrm{ME}\big(x(t),\ \tilde{x}(t-1)\big) \\
\hat{x}(t) &= \mathrm{MC}\big(\tilde{x}(t-1),\ \mathrm{MV}\big) \\
r(t) &= x(t) - \hat{x}(t) \\
\tilde{x}(t) &= \hat{x}(t) + Q[r(t)]
\end{aligned}
$$
This is a simplified block diagram where the encoding of intra coded frames is not shown.
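The reason the encoder contains its own reconstruction loop can be illustrated with a toy sketch: predicting from the reconstructed frame x̃(t−1), not from the original, keeps the encoder and decoder in step. Several simplifications are assumptions made for brevity: frames are short lists of samples, the DCT and VLC stages are omitted, the quantizer is a uniform one with a made-up step size, and motion compensation is replaced by a zero motion vector (plain copy of the previous reconstructed frame).

```python
STEP = 8  # quantiser step size (illustrative)

def quantise(r):
    """Q[.]: quantise then de-quantise each residue sample with a uniform step."""
    return [STEP * round(v / STEP) for v in r]

def predict(prev_recon):
    """Stand-in for MC(x~(t-1), MV) with MV = 0: copy the previous reconstructed frame."""
    return list(prev_recon)

def encode(frames):
    recon_prev = [0] * len(frames[0])          # frame buffer, assumed to start at zero
    residues, recon = [], []
    for x in frames:
        x_hat = predict(recon_prev)                              # x^(t)
        q_r = quantise([a - b for a, b in zip(x, x_hat)])        # Q[r(t)]
        recon_prev = [p + q for p, q in zip(x_hat, q_r)]         # x~(t) = x^(t) + Q[r(t)]
        residues.append(q_r)
        recon.append(recon_prev)
    return residues, recon

def decode(residues):
    recon_prev = [0] * len(residues[0])
    out = []
    for q_r in residues:
        recon_prev = [p + q for p, q in zip(predict(recon_prev), q_r)]
        out.append(recon_prev)
    return out

frames = [[10, 50, 90, 130], [12, 54, 95, 128], [20, 60, 99, 120]]
residues, encoder_recon = encode(frames)
print(decode(residues) == encoder_recon)
```

In this sketch the quantization error of each frame stays within half a step and does not accumulate from frame to frame.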
MPEG Decoding
VLD: Variable Length Decoding
[Block diagram. Received bit stream → bit stream buffer → VLD → Q⁻¹ → IDCT → adder → x̃(t); the frame buffer holds x̃(t−1) and feeds motion compensation, which uses the decoded motion vectors to form x̂(t).]
Signals:
x̂(t): predicted frame
Q[r(t)]: reconstructed residue
x̃(t): reconstructed current frame
$$
\begin{aligned}
\hat{x}(t) &= \mathrm{MC}\big(\tilde{x}(t-1),\ \mathrm{MV}\big) \\
\tilde{x}(t) &= \hat{x}(t) + Q[r(t)]
\end{aligned}
$$
Motion Estimation

Three types of frames:
» Intra (I): the frame is coded as if it is an image
» Predicted (P): predicted from an I or P frame
» Bi-directional (B): forward and backward predicted from a
pair of I or P frames.

A typical frame arrangement is (subscripts are used
to distinguish them):
I1 B1 B2 P1 B3 B4 P2 B5 B6 I2

P1 is forward-predicted from I1, and P2 from P1. B1, B2 are interpolated from I1 and P1; B3, B4 are interpolated from P1 and P2; and B5, B6 are interpolated from P2 and I2.
Forward Motion Estimation
[Figure: the current frame x̂(t) is constructed from different parts (numbered blocks 1 to 16) of the reference frame x̃(t − 1).]
Block Motion Estimation


MAD: Mean absolute difference between the (i,j)-th pixel of the current block, x(i,j), and the (i+m, j+n)-th pixel of the reference frame, y(i+m, j+n):
$$
\mathrm{MAD}(m,n) = \frac{1}{N^2}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left| x(i,j) - y(i+m,\ j+n) \right|
$$
$$
\mathrm{MV} = \arg\min_{-p \le m,n \le p} \mathrm{MAD}(m,n)
$$
MV is the motion vector corresponding to the macro-block; m and n range over the search area, and N is the block size.
It is similar to DPCM in the temporal domain, and has less to do with object motion.
[Figure: the current block in the current frame, its search area in the reference frame, and the motion vector between them.]
Video sequence : Tennis frame 0
[Figure: previous frame (Tennis, frame 0).]
Prepared by Surin Kittitornkun
Video sequence : Tennis frame 1
[Figure: current frame (Tennis, frame 1).]
Frame Difference
[Figure: frame difference between frames 0 and 1.]
What is motion estimation?
[Figure: motion vector field of frame 1.]
What is motion compensation ?
[Figure: motion compensated frame.]
Motion Compensated Frame Difference
[Figure: motion compensated frame difference between frames 0 and 1.]
6-Level Nested Do Loop
Do h=0 to Nh-1
  Do v=0 to Nv-1
    MV(h,v)=(0,0)
    Dmin(h,v)=∞
    Do m=-p to p
      Do n=-p to p
        MAD(m,n)=0
        Do i=hN to hN+N-1
          Do j=vN to vN+N-1
            MAD(m,n)=MAD(m,n)+|x(i,j)-y(i+m,j+n)|
          End do j
        End do i
        If Dmin(h,v)>MAD(m,n)
          Dmin(h,v)=MAD(m,n)
          MV(h,v)=(m,n)
        End if
      End do n
    End do m
  End do v
End do h
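The nested loops translate almost line for line into a runnable full-search sketch. Two details are assumptions not stated on the slide: candidate displacements that would reach outside the reference frame are skipped, and the accumulated value is the sum of absolute differences (the 1/N² factor does not change which displacement is the minimum). The function names and the test frame are made up for illustration.

```python
def sad(cur, ref, top, left, m, n, N):
    """Sum of absolute differences between a current block and the displaced reference block."""
    return sum(abs(cur[top + i][left + j] - ref[top + i + m][left + j + n])
               for i in range(N) for j in range(N))

def full_search(cur, ref, N, p):
    """Return {(h, v): (m, n)}: the best displacement for every N-by-N block."""
    rows, cols = len(cur), len(cur[0])
    mv = {}
    for h in range(rows // N):                      # block row index
        for v in range(cols // N):                  # block column index
            top, left = h * N, v * N
            best, best_mv = float("inf"), (0, 0)
            for m in range(-p, p + 1):
                for n in range(-p, p + 1):
                    # Assumption: ignore candidates that fall outside the reference frame.
                    if not (0 <= top + m and top + m + N <= rows and
                            0 <= left + n and left + n + N <= cols):
                        continue
                    d = sad(cur, ref, top, left, m, n, N)
                    if d < best:
                        best, best_mv = d, (m, n)
            mv[(h, v)] = best_mv
    return mv

# Synthetic 8x8 reference frame; the current frame is the reference shifted by (1, 2),
# with the border clamped.
ref = [[10 * r * r + c * c for c in range(8)] for r in range(8)]
cur = [[ref[min(r + 1, 7)][min(c + 2, 7)] for c in range(8)] for r in range(8)]
print(full_search(cur, ref, N=4, p=2)[(0, 0)])
```

The cost is (2p+1)² candidate positions per block with N² absolute differences each, which is why the six-level loop nest is a common target for fast search strategies and parallel hardware.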