
Introduction to ME and DCT



Variable Length Code
Motion Estimation
Discrete Cosine Transform
2015/7/16
ME & DCT 2009
VCLab
1
Compression Basics


Information entropy: Claude E. Shannon 1948, “A Mathematical Theory of Communication”
Lossless coding
  Entropy coding methods: Huffman code, arithmetic code
  Dictionary-based, LZ77 (Lempel-Ziv)
PCM, prediction coding, differential PCM (DPCM)
Sub-band coding
Transform coding
Motion compensation
Outline










Typical motion in videos
The exercises of motion
Motion representation
How to find the Motion?
How to find the motion in a block?
Residuals
Block matching algorithms (BMAs)
Problems with BMA
Fast BMAs
Intra frame and inter frame
Typical Motion in Video Clips

Local motions

Global motions
Background
The Exercises of Motion

Intra: Compress one frame independently.
  Each pixel has to be compressed.
  DCT → Quantization → Binary coding

Inter: Compress one frame depending on the previous frame.
  Background can be ignored.
  Only compress moving objects and new objects.
Example
1. Compress [figure] and [figure] in frame 1.
2. Compress the motion of [figure] in the remaining frames (direction and magnitude).
[Figure: frames 1, 2, 3, 4]
Motion Model

Video Model & Format

3-D Motion Model

2-D Motion Model

Constant intensity assumption & optical flow model
Motion Model

Object model, Illumination model & Camera model
ambient, point
diffuse reflection
3-D Motion VS. 2-D Motion
rigid: rotation & translation
perspective projection
2-D Motion Approximations

Approximations: affine (6 parameters) or bilinear model (8 parameters)
Constant Intensity Assumption &
Optical Flow Equation

Valid under:


Constant ambient illumination
Diffuse reflecting surface
Motion

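As a reminder, the constant intensity assumption and the optical flow equation it leads to are usually written as below. The symbols are notation assumed here rather than taken from the slide: \psi is the image intensity, (d_x, d_y) the displacement over the time step d_t, and (v_x, v_y) the velocity.

```latex
% Constant intensity assumption: a moving point keeps its intensity
\psi(x + d_x,\; y + d_y,\; t + d_t) = \psi(x, y, t)

% First-order Taylor expansion gives the optical flow equation
\frac{\partial \psi}{\partial x} v_x
  + \frac{\partial \psi}{\partial y} v_y
  + \frac{\partial \psi}{\partial t} = 0
```

One equation in two unknowns per pixel is why the motion cannot be recovered from a single pixel alone, and why flat regions (zero spatial gradient) are indeterminate.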
Motion Representation

Four types of motion info
Region-based
Global
Block-based
Pixel-based

Parameters: 2 (translation), 6 (affine) or 8 (bilinear, projective)
How to Find the Motion?
How to Find the Motion?

Block matching: 2-D translation (2 parameters)
  Assume all pixels in a block undergo a coherent motion, and search for the motion parameters for each block independently.
Deformable BMA: node-based, affine (6 parameters) or bilinear model (8)
Mesh-based motion estimation
Region-based motion estimation
Multi-resolution approach
Overlapped motion estimation (H.263 optional)
Deformable BMA


Node-based, affine (6 parameters) or bilinear model (8)
Mesh-based motion estimation
Block-based VS. Mesh-based

Block-based

Mesh-based
Region-based
[Figure: block diagram of a region-based coder.
Encoder labels: original image, quadtree decomposition, adaptive threshold, merge, region-based decomposition, quadtree structure, fractal coefficients, region contours.
Decoder labels: rebuild region, initial image with quadtree decomposition, initial image with region-based decomposition, iterated reconstruct, reconstructed image.]
Multi-resolution Approach
Overlapped Motion Estimation
H.263
How to Find the Motion in a Block?

Block matching
[Figure: a block in frame i (current frame, to be encoded) is matched against frame i-1 (reference frame, already existing). Labels: motion vector, matched, occlusion.]
Block Matching

Compare the difference between two blocks (one is in the current frame, and the other is in the reference frame):
\[ E = \sum_{\text{block}} \left| \text{candidate block} - \text{current block} \right|^p \]
p = 1: sum of absolute differences (SAD)
p = 2: mean square error (up to division by the number of pixels in the block)
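The matching error above can be sketched in C as follows. This is a minimal illustration, assuming 8-bit frames stored row by row; the function name `block_error` and its parameter layout are choices made here, not part of the slides.

```c
#include <stdlib.h>

/* Matching error between an n x n block of the current frame whose top-left
   corner is (cx, cy) and the candidate block of the reference frame displaced
   by (dx, dy). p = 1 gives the sum of absolute differences, p = 2 the sum of
   squared differences. Frames are row-major 8-bit arrays of the given width
   (an assumed layout); the caller keeps both blocks inside the frame. */
long block_error(const unsigned char *cur, const unsigned char *ref,
                 int width, int cx, int cy, int dx, int dy, int n, int p)
{
    long err = 0;
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            int c = cur[(cy + y) * width + (cx + x)];
            int r = ref[(cy + y + dy) * width + (cx + x + dx)];
            int d = abs(r - c);
            err += (p == 2) ? (long)d * d : d;
        }
    }
    return err;
}
```

A candidate with zero error is a perfect match; the motion vector is the (dx, dy) that minimizes this value over the search range.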
Objective Quality Measurement

Peak Signal to Noise Ratio (PSNR)
\[ \mathrm{PSNR}_{dB} = 10 \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}} \]
where n is the number of bits per sample.

Other objective quality metrics: ITU-T Video Quality Experts Group (VQEG)

Currently, no objective measurement system is able to replace subjective testing, and no single objective model outperforms the others in all cases.
Scan Line Order, MB by MB

Scan Line Order
Frame n-1

Frame n
Search Range
MV(1,0)
MV(0,0)
Residuals (1)
[Figure labels: occlusion, motion, residuals]
Residuals (2)
Residual only

Encoder (DCT → Quantization → Binary coding)
[Figure: encoder loop. Blocks: DCT + Q, iDCT + iQ, motion compensation, previous frame buffer. Signals: residuals, MV = (dx, dy).]
Residuals (3)

Decoder
[Figure: decoder. Blocks: VLD, Q^-1, IDCT, motion compensation, previous frame memory. Signals: coded bitstream, residual, MV, reconstructed frame.]
Block Matching Algorithm - Full Search Method
[Figure: 15 x 15 search window, candidates visited in scan line order]
Problems with Block Matching (1)
1. Blocking effect (discontinuity across block boundaries)
• Because the block-wise translation model is not accurate
• Real motion in a block may be more complicated than translation
• There may be multiple objects with different motions in a block
• Intensity changes may be due to illumination effects
2. Motion field is somewhat chaotic
• MVs are estimated independently from block to block
Problems with Block Matching (2)
3. Wrong MV in the flat region
• Motion is indeterminate when the spatial gradient is near zero
4. Motion vectors over picture boundaries
5. Motion is undefined in occluded regions
6. Requires tremendous computation
H.264 & Other Solutions
• H.264/AVC
  Variable-block-size motion estimation
  1/4-, 1/8-pixel motion vector precision
  Multiple reference pictures
• Wavelet-based: Motion Compensated Temporal Filtering (MCTF)
• Deformable BMA
• Mesh-based motion estimation
• Region-based motion estimation
• Multi-resolution approach
• Overlapped motion estimation
• Fast algorithms
Complexity of Integer-Pel EBMA

• Assumption
– Image size: M x M
– Block size: N x N
– Search range: (-R, R) in each dimension
– Search step size: 1 pixel (assuming integer MV)
• Operation counts (1 operation = 1 “-”, 1 “+”, 1 “*”):
– Each candidate position: N^2
– Each block going through all candidates: (2R+1)^2 N^2
– Entire frame: (M/N)^2 (2R+1)^2 N^2 = M^2 (2R+1)^2
• Independent of block size!
• Example: M = 512, N = 16, R = 16, 30 fps
Total operation count = 2.85 x 10^8/frame = 8.56 x 10^9/second
• Regular structure suitable for VLSI implementation
• Challenging for software-only implementation
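A minimal sketch of the exhaustive search whose cost is counted above, assuming 8-bit row-major frames and the SAD error measure. The names `MotionVector`, `full_search` and `ebma_ops_per_frame` are illustrative, and skipping candidates that leave the frame is one possible boundary policy, not the one the slides prescribe.

```c
#include <limits.h>
#include <stdlib.h>

typedef struct { int dx, dy; long sad; } MotionVector;

/* Exhaustive block matching for the n x n block at (bx, by): every integer
   displacement in [-range, range]^2 is tested in scan line order.
   Candidates that fall outside the frame are skipped (assumed policy). */
MotionVector full_search(const unsigned char *cur, const unsigned char *ref,
                         int width, int height, int bx, int by,
                         int n, int range)
{
    MotionVector best = {0, 0, LONG_MAX};
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + n > width || ry + n > height)
                continue;
            long sad = 0;
            for (int y = 0; y < n; y++)
                for (int x = 0; x < n; x++)
                    sad += abs(ref[(ry + y) * width + rx + x] -
                               cur[(by + y) * width + bx + x]);
            if (sad < best.sad) {
                best.dx = dx; best.dy = dy; best.sad = sad;
            }
        }
    }
    return best;
}

/* Operation count per frame for an M x M image: M^2 (2R+1)^2. */
double ebma_ops_per_frame(int m, int r)
{
    return (double)m * m * (2.0 * r + 1.0) * (2.0 * r + 1.0);
}
```

With M = 512 and R = 16, `ebma_ops_per_frame` gives 285,474,816 operations, the 2.85 x 10^8 figure quoted in the example.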
Fast Algorithms for BMA

Reduce the number of search candidates (full search tests (2R+1)^2 of them):
• Only search for those that are likely to produce small errors.
• Predict possible remaining candidates, based on previous search results.

Simplify the error measure (DFD) to reduce the computation involved for each candidate:
\[ E(d_x, d_y) = \sum_{i=1}^{16} \sum_{j=1}^{16} \left| \mathrm{ref}(i + d_x,\, j + d_y) - \mathrm{current}(i, j) \right|^p \]
Fast Block Matching Algorithms

TSS: Three-Step Search Algorithm
NTSS: New 3-Step Search Algorithm
FSS: Four-Step Search Algorithm
BBGDS: Block-Based Gradient Descent Search Algorithm
DS: Diamond Search Algorithm
TSS
(Three-Step Search Algorithm)
[Figure: TSS search points in a -7 ... 7 search window]
Number of checking points: 25 for every motion vector (i, j) with -7 ≤ i, j ≤ 7.
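The three-step search can be sketched as below: 9 points on a grid of step 4, then 8 new points at step 2, then 8 at step 1, which gives the constant 25 checking points. The callback type `CostFn`, the struct `SearchResult` and the toy `quadratic_cost` function are assumptions made for this illustration; a real encoder would plug in a block matching error such as SAD.

```c
typedef struct { int dx, dy; long cost; int checked; } SearchResult;

/* Matching cost of displacement (dx, dy); ctx carries user data. */
typedef long (*CostFn)(int dx, int dy, void *ctx);

/* Three-step search over a +/-7 window: the centre plus 8 neighbours at
   step 4, then 8 neighbours at step 2 and 8 at step 1, each time around
   the best point found so far. */
SearchResult three_step_search(CostFn cost, void *ctx)
{
    SearchResult best = {0, 0, cost(0, 0, ctx), 1};
    for (int step = 4; step >= 1; step /= 2) {
        int cx = best.dx, cy = best.dy;     /* centre of this step */
        for (int j = -1; j <= 1; j++) {
            for (int i = -1; i <= 1; i++) {
                if (i == 0 && j == 0)
                    continue;               /* centre already evaluated */
                int dx = cx + i * step, dy = cy + j * step;
                long c = cost(dx, dy, ctx);
                best.checked++;
                if (c < best.cost) {
                    best.dx = dx; best.dy = dy; best.cost = c;
                }
            }
        }
    }
    return best;
}

/* Toy cost for demonstration only: squared distance to a target vector
   passed through ctx as int[2]. */
long quadratic_cost(int dx, int dy, void *ctx)
{
    const int *t = (const int *)ctx;
    long ex = dx - t[0], ey = dy - t[1];
    return ex * ex + ey * ey;
}
```

On a smooth, single-minimum error surface the search reaches the true vector; on real video it can be trapped in a local minimum, which is the price paid for testing 25 instead of 225 candidates.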
BBGDS
(Block-Based Gradient Descent Search Algorithm)
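[Figure: BBGDS search points in a -7 ... 7 search window]
Number of checking points for each motion vector (rows and columns are the two motion vector components, from -7 to 7):
-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7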
7 44 42 40 38 36 34 32 30 32 34 36 38 40 42 44
6 42 39 37 35 33 31 29 27 29 31 33 35 37 39 42
5 40 37 34 32 30 28 26 24 26 28 30 32 34 37 40
4 38 35 32 29 27 25 23 21 23 25 27 29 32 35 38
3 36 33 30 27 24 22 20 18 20 22 24 27 30 33 36
2 34 31 28 25 22 19 17 15 17 19 22 25 28 31 34
1 32 29 26 23 20 17 14 12 14 17 20 23 26 29 32
0 30 27 24 21 18 15 12 9 12 15 18 21 24 27 30
-1 32 29 26 23 20 17 14 12 14 17 20 23 26 29 32
-2 34 31 28 25 22 19 17 15 17 19 22 25 28 31 34
-3 36 33 30 27 24 22 20 18 20 22 24 27 30 33 36
-4 38 35 32 29 27 25 23 21 23 25 27 29 32 35 38
-5 40 37 34 32 30 28 26 24 26 28 30 32 34 37 40
-6 42 39 37 35 33 31 29 27 29 31 33 35 37 39 42
-7 44 42 40 38 36 34 32 30 32 34 36 38 40 42 44
\[ C_{BBGDS}(i, j) = \min\{|i|, |j|\} \times 5 + (\max\{|i|, |j|\} - \min\{|i|, |j|\}) \times 3 + 9 \]
NTSS
(New 3-Step Search Algorithm)
[Figure: NTSS search points in a -7 ... 7 search window]
Number of checking points for each motion vector (rows and columns are the two motion vector components, from -7 to 7):
-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7
7 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
6 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
5 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
4 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
3 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
2 33 33 33 33 33 22 20 20 20 22 33 33 33 33 33
1 33 33 33 33 33 20 22 20 22 20 33 33 33 33 33
0 33 33 33 33 33 20 20 17 20 20 33 33 33 33 33
-1 33 33 33 33 33 20 22 20 22 20 33 33 33 33 33
-2 33 33 33 33 33 22 20 20 20 22 33 33 33 33 33
-3 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
-4 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
-5 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
-6 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
-7 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
NTSS (2)
1st step of NTSS: 17 checking points
Decision 1: min at the search window center?
  T: MV = 0
  F: Decision 2: min at one neighbor of center?
    T: 2nd step of NTSS: 3 or 5 checking points
    F: 2nd and 3rd step of NTSS (same as in TSS)
IEEE Transactions on Circuits and Systems for Video Technology, June 1994.
FSS
(Four-Step Search Algorithm)
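[Figure: FSS search points in a -7 ... 7 search window]
Number of checking points for each motion vector (rows and columns are the two motion vector components, from -7 to 7):
-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7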
7 32 32 30 30 28 28 26 26 26 28 28 30 30 32 32
6 32 32 30 30 28 28 26 26 26 28 28 30 30 32 32
5 30 30 27 27 25 25 23 23 23 25 25 27 27 30 30
4 30 30 27 27 25 25 23 23 23 25 25 27 27 30 30
3 28 28 25 25 22 22 20 20 20 22 22 25 25 28 28
2 28 28 25 25 22 22 20 20 20 22 22 25 25 28 28
1 26 26 23 23 20 20 17 17 17 20 20 23 23 26 26
0 26 26 23 23 20 20 17 17 17 20 20 23 23 26 26
-1 26 26 23 23 20 20 17 17 17 20 20 23 23 26 26
-2 28 28 25 25 22 22 20 20 20 22 22 25 25 28 28
-3 28 28 25 25 22 22 20 20 20 22 22 25 25 28 28
-4 30 30 27 27 25 25 23 23 23 25 25 27 27 30 30
-5 30 30 27 27 25 25 23 23 23 25 25 27 27 30 30
-6 32 32 30 30 28 28 26 26 26 28 28 30 30 32 32
-7 32 32 30 30 28 28 26 26 26 28 28 30 30 32 32
\[ C_{FSS}(i, j) = \left\lfloor \frac{\min\{|i|, |j|\}}{2} \right\rfloor \times 5 + \left( \left\lfloor \frac{\max\{|i|, |j|\}}{2} \right\rfloor - \left\lfloor \frac{\min\{|i|, |j|\}}{2} \right\rfloor \right) \times 3 + 17 \]
DS
(Diamond Search Algorithm)
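[Figure: DS search points in a -7 ... 7 search window]
Number of checking points for each motion vector (rows and columns are the two motion vector components, from -7 to 7):
-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7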
7 34 31 33 30 32 29 31 28 31 29 32 30 33 31 34
6 31 31 28 30 27 29 26 28 26 29 27 30 28 31 31
5 33 28 28 25 27 24 26 23 26 24 27 25 28 28 33
4 30 30 25 25 22 24 21 23 21 24 22 25 25 30 30
3 32 27 27 22 22 19 21 18 21 19 22 22 27 27 32
2 29 29 24 24 19 19 16 18 16 19 19 24 24 29 29
1 31 26 26 21 21 16 16 13 16 16 21 21 26 26 31
0 28 28 23 23 18 18 13 13 13 18 18 23 23 28 28
-1 31 26 26 21 21 16 16 13 16 16 21 21 26 26 31
-2 29 29 24 24 19 19 16 18 16 19 19 24 24 29 29
-3 32 27 27 22 22 19 21 18 21 19 22 22 27 27 32
-4 30 30 25 25 22 24 21 23 21 24 22 25 25 30 30
-5 33 28 28 25 27 24 26 23 26 24 27 25 28 28 33
-6 31 31 28 30 27 29 26 28 26 29 27 30 28 31 31
-7 34 31 33 30 32 29 31 28 31 29 32 30 33 31 34
\[ C_{DS}(i, j) = \min\{|i|, |j|\} \times 3 + \left\lfloor \frac{\max\{|i|, |j|\} - \min\{|i|, |j|\}}{2} \right\rfloor \times 5 + 13 \]
Adaptive Search Patterns
TSS + DS + BBGDS

Cheapest search pattern for each motion vector (T = TSS, D = DS, B = BBGDS):
-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7
7 T T T T T T T T T T T T T T T
6 T T T T T T T T T T T T T T T
5 T T T T T T T D T T T T T T T
4 T T T D D D D B D D D D T T T
3 T T T D D D B D B D D D T T T
2 T T T D D D D B D D D D T T T
1 T T T D B D B B B D B D T T T
0 T T D B D B B B B B D B D T T
-1 T T T D B D B B B D B D T T T
-2 T T T D D D D B D D D D T T T
-3 T T T D D D B D B D D D T T T
-4 T T T D D D D B D D D D T T T
-5 T T T T T T T D T T T T T T T
-6 T T T T T T T T T T T T T T T
-7 T T T T T T T T T T T T T T T

Number of checking points with the cheapest pattern:
-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7
7 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25
6 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25
5 25 25 25 25 25 25 25 23 25 25 25 25 25 25 25
4 25 25 25 25 22 24 21 21 21 24 22 25 25 25 25
3 25 25 25 22 22 19 20 18 20 19 22 22 25 25 25
2 25 25 25 24 19 19 16 15 16 19 19 24 25 25 25
1 25 25 25 21 20 16 14 12 14 16 20 21 25 25 25
0 25 25 23 21 18 15 12 9 12 15 18 21 23 25 25
-1 25 25 25 21 20 16 14 12 14 16 20 21 25 25 25
-2 25 25 25 24 19 19 16 15 16 19 19 24 25 25 25
-3 25 25 25 22 22 19 20 18 20 19 22 22 25 25 25
-4 25 25 25 25 22 24 21 21 21 24 22 25 25 25 25
-5 25 25 25 25 25 25 25 23 25 25 25 25 25 25 25
-6 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25
-7 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25
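The adaptive pattern table can be reproduced from the checking-point formulas of the individual algorithms. The sketch below is an illustration under assumptions: the function names are invented here, and the tie-breaking order (DS preferred on equal counts) is inferred from the table rather than stated on the slides.

```c
#include <stdlib.h>

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Checking points needed to reach motion vector (i, j). */
int cost_tss(int i, int j) { (void)i; (void)j; return 25; }

int cost_bbgds(int i, int j)
{
    int lo = imin(abs(i), abs(j)), hi = imax(abs(i), abs(j));
    return lo * 5 + (hi - lo) * 3 + 9;
}

int cost_fss(int i, int j)
{
    int lo = imin(abs(i), abs(j)), hi = imax(abs(i), abs(j));
    return (lo / 2) * 5 + (hi / 2 - lo / 2) * 3 + 17;
}

int cost_ds(int i, int j)
{
    int lo = imin(abs(i), abs(j)), hi = imax(abs(i), abs(j));
    return lo * 3 + ((hi - lo) / 2) * 5 + 13;
}

/* Cheapest of TSS ('T'), DS ('D') and BBGDS ('B') for vector (i, j).
   DS wins ties (assumed rule). The point count is stored in *points. */
char cheapest_pattern(int i, int j, int *points)
{
    char which = 'D';
    int best = cost_ds(i, j);
    if (cost_bbgds(i, j) < best) { which = 'B'; best = cost_bbgds(i, j); }
    if (cost_tss(i, j) < best)   { which = 'T'; best = cost_tss(i, j); }
    *points = best;
    return which;
}
```

Small motion favors BBGDS, medium motion DS, and large motion TSS, which is the structure visible in the table.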
Fractional Pixel Accuracy

Fractional pixel accuracy

e.g., half-pixel accuracy
[Figure: integer-pixel and half-pixel sample positions; example motion vector (dx, dy) = (1.5, 1)]
[Plot: H.263, Foreman, QCIF, SKIP=2, Q=4,5,7,10,15,25]
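A half-pixel motion vector such as (1.5, 1) needs reference samples between integer positions. The sketch below uses bilinear averaging with rounding, a common choice for half-pel interpolation; the slides do not specify the filter, so treat it as an assumption. The name `half_pel_sample` and the half-pel coordinate convention are also invented for this example.

```c
/* Reference sample at position (x2 / 2, y2 / 2), where x2 and y2 are
   non-negative coordinates in half-pixel units. Integer positions return
   the stored pixel; half positions return the rounded average of the 2 or
   4 surrounding pixels. The caller keeps x + 1 and y + 1 inside the frame
   whenever a half position is requested. */
int half_pel_sample(const unsigned char *ref, int width, int x2, int y2)
{
    int x = x2 >> 1, y = y2 >> 1;   /* integer part */
    int fx = x2 & 1, fy = y2 & 1;   /* 1 if halfway to the next pixel */
    int a = ref[y * width + x];
    int b = ref[y * width + x + fx];
    int c = ref[(y + fy) * width + x];
    int d = ref[(y + fy) * width + x + fx];
    return (a + b + c + d + 2) / 4;
}
```

For the vector (dx, dy) = (1.5, 1), a pixel at integer position (x, y) is predicted from `half_pel_sample(ref, width, 2 * x + 3, 2 * y + 2)`.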
Introduction to ME and DCT


Motion Estimation
Discrete Cosine Transform
Outline







Transform Coding
Principles of Transform Coding
Computation of Transform Coding
Discrete Fourier Transform
What Is DCT And Why Use DCT
How to Compute DCT
Program the DCT
Transform Coding






The picture is handled in a semi-parallel manner, in blocks of elements.
For each block, the transform generates a set of output values that are
no longer correlated with one another, and in which most of the energy
of the original block is contained in a small proportion of the output
samples.
Karhunen-Loeve transform (KLT)
Discrete and Fast Fourier transform
Hadamard transform
Discrete cosine transform
Wavelet transform
Principles of Transform Coding



Decorrelation: to generate less correlated
or uncorrelated transform coefficients for
greater or greatest efficiencies respectively.
Linearity: to allow a one-to-one mapping
between the spatial and transform domains.
Orthogonality: the energy in both domains
should be the same, and therefore no
energy is either lost or carried redundantly.
Principles of Transform Coding


Sparsity (or energy compaction) in the transformed domain: the energy in
the transform domain tends to be concentrated in the same subset of
components, usually those representing low frequencies.
In mathematical terms, the analysis or first stage of the encoding
process is a linear transformation that converts a set of highly
correlated pixels with uniform probability density functions into a new
set of less correlated coefficients with non-uniform pdfs.
Computation of Transform Coding


1-D transform
2-D transform: Usually 2-D transformation matrices are separable, and the
2-D transform coefficients are derived from two 1-D transforms of lengths
M and N. In implementation, this is done by
• N 1-D transforms of M pixels each in the line scan direction, which results in MN 1-D transform coefficients;
• a further M 1-D transforms of the sets of N 1-D transform coefficients;
• the final and true 2-D transform coefficients are then derived.
Discrete Fourier Transform






Basis function: $e^{j 2\pi nk/N}$
N: the matrix length,
k: the frequency or sequence index of the basis vectors,
n: the pixel index.
Fast Fourier transform algorithm: requires O(N log2 N) time complexity.
Spurious spectral components are generated due to the implicit periodicity of the image blocks.
Discrete Cosine Transform




The most efficient suboptimum transform
The elements of the kth basis vector: $\cos(k\pi(2n+1)/2N)$
It has the advantage over all other suboptimum transforms in that its basis vectors closely resemble the basis vectors of the KLT for smoothly varying video inputs (for a first-order Markov source model).
The N-point DCT can be computed as a 2N-point DFT
Transform Coding System
Input samples → Forward transform → Quantizer → Binary encoder → Network → Binary decoder → Inverse quantizer → Inverse transform → Output samples

Binary encoder: e.g. Huffman coding (e.g. zip, RAR)

Quantizer (÷10):
575 205 215 140        57 20 21 14
355 155 105  20   →    35 15 10  2
150 200  65  25        15 20  6  2
100  70  30  10        10  7  3  1

Inverse quantizer (×10):
57 20 21 14        570 200 210 140
35 15 10  2   →    350 150 100  20
15 20  6  2        150 200  60  20
10  7  3  1        100  70  30  10
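The quantizer and inverse quantizer of the example amount to an integer division and a multiplication by the step size. A minimal sketch, assuming non-negative coefficients and truncating division as in the numbers above; the function names are illustrative.

```c
/* Uniform quantizer: divide each coefficient by the step size. C integer
   division truncates toward zero, which matches the example (575 -> 57). */
void quantize(const int *in, int *out, int count, int step)
{
    for (int i = 0; i < count; i++)
        out[i] = in[i] / step;
}

/* Inverse quantizer: multiply each level by the step size. The remainder
   discarded by the quantizer is lost, which is where the coding becomes
   lossy (575 -> 57 -> 570). */
void dequantize(const int *in, int *out, int count, int step)
{
    for (int i = 0; i < count; i++)
        out[i] = in[i] * step;
}
```

Applied to the first row of the example, 575 205 215 140 becomes 57 20 21 14 after quantization and 570 200 210 140 after inverse quantization.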
Representation of An Image
How to code an image?

1. Spatial domain (pixel-based)
2. Transform domain
Why Use DCT?
Properties of DCT




Uses the cosine function as its basis function
Performance approaches KLT
Fast algorithm exists
Most popular in image compression applications
Does Transform Really Make Sense?
Energy compaction
De-correlation: dependency elimination
An Example
8 x 8 block of pixel values:
139 148 150 149 155 164 165 168
 98 115 130 135 143 146 142 147
 89 110 125 128 129 121 104 106
 96 116 128 132 134 132 113 109
111 125 127 131 137 137 120 110
122 126 126 131 133 131 126 112
133 134 136 138 140 144 141 139
138 139 139 139 140 146 148 147
An Example
A pixel expressed by its value
The coefficient of the basis vector (0,0)
Pixel values in spatial domain → DCT → DCT coefficients in transform domain
DCT coefficients in transform domain → IDCT → pixel values in spatial domain
Definition of DCT Basis Function
Basis function of the 1-D N-point DCT:
\[ u_{k;n} = \alpha(k) \cos\frac{(2n+1)k\pi}{2N}, \quad n = 0, 1, \ldots, N-1 \]
\[ \alpha(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k = 1, 2, \ldots, N-1 \end{cases} \]
For N = 8:
\[ u_{k;n} = \alpha(k) \cos\frac{(2n+1)k\pi}{16}, \quad n = 0, 1, \ldots, 7 \]
An Example, N = 4
An Example, N = 4
An Example
Represent a vector (e.g. a block of image samples) as the
superposition of some typical vectors (block patterns)
Basic diagram of DCT
Discrete cosine transform and inverse DCT
(1) \[ t_k = \sum_{n=0}^{N-1} u^{*}_{k;n} s_n = \alpha(k) \sum_{n=0}^{N-1} s_n \cos\frac{(2n+1)k\pi}{2N}, \quad k = 0, 1, \ldots, N-1 \]
(2) \[ s_n = \sum_{k=0}^{N-1} u_{k;n} t_k = \sum_{k=0}^{N-1} \alpha(k)\, t_k \cos\frac{(2n+1)k\pi}{2N}, \quad n = 0, 1, \ldots, N-1 \]
The Basis of 2D-DCT with 8x8 Block
Again – Do You Know What DCT Means?
A pixel expressed by its value
The coefficient of the basis vector (0,0)
Pixel values in spatial domain → DCT → DCT coefficients in transform domain
DCT coefficients in transform domain → IDCT → pixel values in spatial domain
How to Compute: 1-D vs. 2-D

[1-D] For an M × N 2-D block, we can use the 1-D N-point DCT in the row
direction, then the 1-D M-point DCT in the column direction to get the
2D-DCT
[2-D] If 8 × 8 blocks are applied, the 2D-DCT will be
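For reference, the standard 8 × 8 2D-DCT, written to agree with the 1-D definition and with the constants used in the program listings (0.35355339 and 0.5), is sketched below. The names f(i, j) for the pixel block and F(u, v) for the coefficients are notation assumed here.

```latex
F(u, v) = C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i, j)
          \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16},
          \qquad u, v = 0, 1, \ldots, 7

C(0) = \frac{1}{2\sqrt{2}}, \qquad C(k) = \frac{1}{2} \quad (k = 1, \ldots, 7)
```

The program listings additionally subtract 128 from each pixel before the transform, which only changes the DC coefficient F(0, 0).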
DCT Matrix is Orthonormal
\[ u_{i;u} = \alpha(u) \cos\frac{(2i+1)u\pi}{16}, \quad i = 0, 1, \ldots, 7 \]
\[ \sum_{0 \le i \le 7} \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2i+1)v\pi}{16} = \frac{1}{2} \sum_{0 \le i \le 7} \left[ \cos\frac{(2i+1)(u-v)\pi}{16} + \cos\frac{(2i+1)(u+v)\pi}{16} \right] \]
The above sum is zero if u ≠ v: the basis vectors are orthogonal.
The basis vector of DCT has unit norm
Energy Compaction of Orthonormal Transform
Separable Transform (1/2)



Separable Transform (2/2)
Fast DCT Algorithm (1/2)
Fast DCT Algorithm (2/2)
How to program (1/3) - Basis
/***************************************************************************/
/* 2D 8*8 DCT (direct evaluation of the definition)                        */
/* Input                                                                   */
/* int argSource[8][8]: One block in the original image                    */
/* Output                                                                  */
/* int argDCT[8][8]: The block in frequency domain corresponding to        */
/* argSource[8][8]                                                         */
/***************************************************************************/
#include <math.h>

#define PI 3.14159265358979323846

void DCT(int argDCT[8][8], int argSource[8][8])
{
    float C[8], Cos[8][8];
    float temp;
    int i, j, u, v;

    /* Cos[i][j] = cos((2i+1) j PI / 16) */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
            Cos[i][j] = (float)cos((2 * i + 1) * j * PI / 16);

    /* Normalization: C[0] = 1/(2*sqrt(2)), C[k] = 1/2 otherwise */
    C[0] = 0.35355339f;
    for (i = 1; i < 8; i++)
        C[i] = 0.5f;

    for (u = 0; u < 8; u++)
        for (v = 0; v < 8; v++)
        {
            temp = 0.0f;
            for (i = 0; i < 8; i++)
                for (j = 0; j < 8; j++)
                    temp += Cos[i][u] * Cos[j][v] * (argSource[i][j] - 128);
            temp *= C[u] * C[v];
            argDCT[u][v] = (int)temp;   /* truncated toward zero */
        }
}
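How to program (2/3) – A fast algorithm
/***************************************************************************/
/* 2D 8*8 DCT computed as two matrix products: argDCT = C * X * Ct         */
/* Input                                                                   */
/* int argSource[8][8]: One block in the original image                    */
/* Output                                                                  */
/* int argDCT[8][8]: The block in frequency domain corresponding to        */
/* argSource[8][8]                                                         */
/* C[k][n] = a(k) cos((2n+1) k PI / 16) is the 1-D DCT matrix and Ct its   */
/* transpose. The slide assumes both already exist; InitDCT() is a helper  */
/* added here to fill them and must be called once before DCT().           */
/***************************************************************************/
#include <math.h>

#define PI 3.14159265358979323846
#define ROUND(x) ((int)floor((x) + 0.5))

static float C[8][8], Ct[8][8];

void InitDCT(void)
{
    int k, n;
    for (k = 0; k < 8; k++)
        for (n = 0; n < 8; n++)
        {
            C[k][n] = (float)((k == 0 ? 0.35355339 : 0.5) * cos((2 * n + 1) * k * PI / 16));
            Ct[n][k] = C[k][n];
        }
}

void DCT(int argDCT[8][8], int argSource[8][8])
{
    float temp[8][8], temp1;
    int i, j, k;

    /* row transform: temp = (X - 128) * Ct */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
        {
            temp[i][j] = 0.0f;
            for (k = 0; k < 8; k++)
                temp[i][j] += (argSource[i][k] - 128) * Ct[k][j];
        }

    /* column transform: argDCT = C * temp */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
        {
            temp1 = 0.0f;
            for (k = 0; k < 8; k++)
                temp1 += C[i][k] * temp[k][j];
            argDCT[i][j] = ROUND(temp1);
        }
}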
How to program (3/3) - for hardware implementation
#include <stdio.h>
#define RS(r,s) ((r) >> (s))
#define SCALE(exp) RS((exp),10)
void DCT(short int*input, short int*output)
{
short int jc, i, j, k;
short int b[8];
short int b1[8];
short int d[8][8];
int c0=724; /* cosine constants are scaled by 2^10 (left shift 10); SCALE() shifts back */
int c1=502;
int c2=474;
int c3=426;
int c4=362;
int c5=284;
int c6=196;
int c7=100;
for (i = 0, k = 0; i < 8; i++, k += 8)
{
for (j = 0; j < 8; j++)
{
b[j] = input[k+j];
}
/* row transform */
for (j = 0; j < 4; j++)
{
jc = 7 - j;
b1[j] = b[j] + b[jc];
b1[jc] = b[j] - b[jc];
}
b[0] = b1[0] + b1[3];
b[1] = b1[1] + b1[2];
b[2] = b1[1] - b1[2];
b[3] = b1[0] - b1[3];
b[4] = b1[4];
b[5] = SCALE((b1[6] - b1[5]) * c0);
b[6] = SCALE((b1[6] + b1[5]) * c0);
b[7] = b1[7];
d[i][0] = SCALE((b[0] + b[1]) * c4);
d[i][4] = SCALE((b[0] - b[1]) * c4);
d[i][2] = SCALE(b[2] * c6 + b[3] * c2);
d[i][6] = SCALE(b[3] * c6 - b[2] * c2);
b1[4] = b[4] + b[5];
b1[7] = b[7] + b[6];
b1[5] = b[4] - b[5];
b1[6] = b[7] - b[6];
d[i][1] = SCALE(b1[4] * c7 + b1[7] * c1);
d[i][5] = SCALE(b1[5] * c3 + b1[6] * c5);
d[i][7] = SCALE(b1[7] * c7 - b1[4] * c1);
d[i][3] = SCALE(b1[6] * c3 - b1[5] * c5);
}
/* column transform */
for (i = 0; i < 8; i++) {
for (j = 0; j < 4; j++) {
jc = 7 - j;
b1[j] = d[j][i] + d[jc][i];
b1[jc] = d[j][i] - d[jc][i];
}
b[0] = b1[0] + b1[3];
b[1] = b1[1] + b1[2];
b[2] = b1[1] - b1[2];
b[3] = b1[0] - b1[3];
b[4] = b1[4];
b[5] = SCALE((b1[6] - b1[5]) * c0);
b[6] = SCALE((b1[6] + b1[5]) * c0);
b[7] = b1[7];
d[0][i] = SCALE((b[0] + b[1]) * c4);
d[4][i] = SCALE((b[0] - b[1]) * c4);
d[2][i] = SCALE(b[2] * c6 + b[3] * c2);
d[6][i] = SCALE(b[3] * c6 - b[2] * c2);
b1[4] = b[4] + b[5];
b1[7] = b[7] + b[6];
b1[5] = b[4] - b[5];
b1[6] = b[7] - b[6];
d[1][i] = SCALE(b1[4] * c7 + b1[7] * c1);
d[5][i] = SCALE(b1[5] * c3 + b1[6] * c5);
d[7][i] = SCALE(b1[7] * c7 - b1[4] * c1);
d[3][i] = SCALE(b1[6] * c3 - b1[5] * c5);
}
for (i = 0; i < 8; i++) {
/* store 2-D array (8*8) data into a 1-D array */
for (j = 0; j < 8; j++) {
*(output + i*8 + j) = (d[i][j]);
}
}
}
Conclusion


DCT provides a new way to express an image in terms of the properties
of the image.
The fast algorithm makes hardware implementation possible.
Reference Software

H.264: http://iphome.hhi.de/suehring/tml/
MPEG-4: http://www.xvid.org
Leakage Reduction (1)



Leakage is inherent in the discrete Fourier transform because of the
required time-domain truncation.
If the truncation interval is chosen equal to a multiple of the period,
the frequency-domain sampling points coincide with the zeros of the
sin(f)/f function, so the truncation does not alter the DFT results.
If the truncation interval is NOT chosen equal to a multiple of the
period, the side-lobe characteristics of the sin(f)/f frequency function
result in additional frequency components (leakage) in the DFT domain.
Leakage Reduction (2)


To reduce this leakage it is necessary to employ a time-domain
truncation function whose side lobes are of smaller magnitude.
The Hanning function: the effect is to reduce the discontinuity that
results from the rectangular truncation function.