Interframe Coding
Heejune AHN
Embedded Communications Laboratory
Seoul National Univ. of Technology
Fall 2008
Last updated 2008. 10. 12
Agenda
Interframe Coding Concept
Block Matching Algorithm
Fast Block Matching Algorithms
Block Matching Algorithm Variations
Enhanced Motion Models
Implementation Cases
Heejune AHN: Image and Video Compression
1. Interframe Coding
Motivation
Video has High Temporal Correlation between frames.
Var[ X(t+1) – X(t) ] << Var[ X(t+1) ]
[Figure: two successive video frames and their DFD (displaced frame difference)]
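The inequality Var[ X(t+1) – X(t) ] << Var[ X(t+1) ] can be checked numerically on two frames. The following is a minimal sketch, assuming 8-bit frames stored as flat arrays; the function names `frame_variance` and `diff_variance` are illustrative and do not come from the slides.

```c
#include <stddef.h>

/* Variance of a frame of n 8-bit samples. */
double frame_variance(const unsigned char *x, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
        sum_sq += (double)x[i] * (double)x[i];
    }
    double mean = sum / (double)n;
    return sum_sq / (double)n - mean * mean;
}

/* Variance of the plain frame difference next[i] - prev[i]
 * (no motion compensation applied). */
double diff_variance(const unsigned char *prev, const unsigned char *next,
                     size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)next[i] - (double)prev[i];
        sum += d;
        sum_sq += d * d;
    }
    double mean = sum / (double)n;
    return sum_sq / (double)n - mean * mean;
}
```

For frames with little motion, `diff_variance(prev, next, n)` should come out far smaller than `frame_variance(next, n)`, which is the property interframe coding exploits.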
Motion Estimation and compensation
Motion estimation
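• Find the motion parameters that best predict the current frame from the reference frames
Motion compensation
• Encoder: subtract the predicted values from the current frame to obtain the DFD frame
• Decoder: add the predicted values to the decoded DFD frame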
[Figure: encoder and decoder block diagrams. Labels: current frame, reference frames, ME, MC, residual, encode, texture info, motion parameters, reconstruction]
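The subtract/add role of motion compensation can be sketched in a few lines. This is an illustrative sketch only: it assumes the prediction has already been formed by MC, ignores the transform and quantization of the residual, and uses hypothetical function names (`mc_residual`, `mc_reconstruct`).

```c
/* Encoder side: residual (DFD) = current - prediction. */
void mc_residual(const unsigned char *cur, const unsigned char *pred,
                 short *res, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = (short)((int)cur[i] - (int)pred[i]);
}

/* Decoder side: reconstruction = prediction + residual, clipped to 8 bits. */
void mc_reconstruct(const unsigned char *pred, const short *res,
                    unsigned char *rec, int n)
{
    for (int i = 0; i < n; i++) {
        int v = (int)pred[i] + (int)res[i];
        if (v < 0)   v = 0;
        if (v > 255) v = 255;
        rec[i] = (unsigned char)v;
    }
}
```

With a lossless residual the reconstruction equals the current frame; in a real codec the residual is quantized, so the encoder must reconstruct its reference frames the same way the decoder does.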
Performance Criteria
Coding performance
• Residual signal has low energy (variance measure)
Complexity
• Computational and implementation complexity
Storage and Delay
• Number of required frames
Side Information
• Size and complexity of motion parameters
Error resilience
• When data is partially lost.
Some factors trade off against each other
Coding performance against complexity, storage, side information, and error resilience.
2D Motion
[Figure: previous frame (time t − Δt) and current frame (time t). A moving object is shifted by the "displacement vector" (d_x, d_y) in front of a stationary background]
Prediction for the luminance signal S(x,y,t) within the moving object:
\hat{S}(x, y, t) = S(x - d_x, y - d_y, t - \Delta t)
2. Block Matching Algorithm
BMA (Block Matching Algorithm)
Segment the frame into equal-sized rectangular blocks
One 2-D translational motion vector (mvx, mvy) per block
[Figure: a block in X(t+1) matched against X(t); real motion versus the estimated MV]
Difference Measure
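r: reference block, c: current block, N x N samples, m_r and m_c: block means
MSE
MSE = \frac{1}{N^2} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \left( r(x + mv_x, y + mv_y) - c(x, y) \right)^2
MAE and SAE
MAE = \frac{1}{N^2} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \left| r(x + mv_x, y + mv_y) - c(x, y) \right|
SAE = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \left| r(x + mv_x, y + mv_y) - c(x, y) \right|
CCF (Cross Correlation Function)
CCF = \frac{1}{N^2} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \left( r(x + mv_x, y + mv_y) - m_r \right) \left( c(x, y) - m_c \right)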
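The MSE and CCF measures can be written directly from their definitions. The sketch below assumes row-major 8-bit frames with a common `stride`, and that `r` and `c` point at the co-located top-left sample of the block in the reference and current frame; `block_mse` and `block_ccf` are illustrative names, and the caller is assumed to keep the displaced block inside the reference frame.

```c
/* Mean squared error between the n x n current block and the reference
 * block displaced by (mvx, mvy). */
double block_mse(const unsigned char *r, const unsigned char *c,
                 int stride, int n, int mvx, int mvy)
{
    double acc = 0.0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++) {
            double d = (double)r[(y + mvy) * stride + (x + mvx)]
                     - (double)c[y * stride + x];
            acc += d * d;
        }
    return acc / ((double)n * n);
}

/* Mean-removed cross correlation of the same two blocks.
 * Larger values indicate a better match (unlike MSE, MAE and SAE). */
double block_ccf(const unsigned char *r, const unsigned char *c,
                 int stride, int n, int mvx, int mvy)
{
    double mr = 0.0, mc = 0.0, acc = 0.0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++) {
            mr += r[(y + mvy) * stride + (x + mvx)];
            mc += c[y * stride + x];
        }
    mr /= (double)n * n;
    mc /= (double)n * n;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            acc += ((double)r[(y + mvy) * stride + (x + mvx)] - mr)
                 * ((double)c[y * stride + x] - mc);
    return acc / ((double)n * n);
}
```

Note the opposite sense of the two measures: a search minimizes MSE but maximizes CCF.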
Full Search Algorithm
"Full search" does not mean searching the whole frame, but every position within a limited search window
Method
• Raster order or spiral order (Figure 6.6)
[Figure: raster-order and spiral-order scans of a search window from (x-6, y-6) to (x+6, y+6)]
Full Search Complexity
(2w+1) x (2w+1) search points (for search window [-w, w])
One N x N block difference per search point, i.e. (2w+1)^2 x N^2 absolute differences per block
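/* uchar, N, width, w and ABS() are assumed to be defined elsewhere. */
/* SAE between the N x N block at g and the block at f displaced by (mvx, mvy). */
int SAE(const uchar *f, const uchar *g, int mvx, int mvy){
    int x, y, sae = 0;
    for (x = 0; x < N; x++){
        for (y = 0; y < N; y++){
            sae += ABS(*(f + (y+mvy)*width + (x+mvx)) - *(g + y*width + x));
        }
    }
    return sae;
}
/* Full search: pre and cur point to the co-located block in each frame. */
mvx_min = mvy_min = 0;
min = SAE(pre, cur, 0, 0);
for (mvy = -w; mvy <= w; mvy++)
    for (mvx = -w; mvx <= w; mvx++){
        sae = SAE(pre, cur, mvx, mvy);
        if (min > sae)
            mvx_min = mvx, mvy_min = mvy, min = sae;
    }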
3. Fast BMAs
Complexity Reduction Approaches
Reduce test points
• Monotonic variation assumption
– The closer to the optimal point, the smaller the difference
• Binary search rather than linear search over [-w, +w]
Change the test-point order (most likely candidates first)
• Benefit from early stop of the block difference calculation
Reduce the computation at one point
• Sub-sampled values
Note
Trade-off: less computation against estimation accuracy
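Two of the ideas above, early stop and sub-sampling, can be sketched as variants of the SAE routine. This is a sketch under assumptions: row-major 8-bit frames with a common `stride`, pointers at the co-located top-left sample, and illustrative names (`sae_early_stop`, `sae_subsampled`) that are not from the slides.

```c
#include <stdlib.h>

/* SAE with early stop: gives up as soon as the partial sum reaches the
 * best SAE found so far, since this candidate can no longer win.
 * The returned value is then only a lower bound on the true SAE. */
int sae_early_stop(const unsigned char *r, const unsigned char *c,
                   int stride, int n, int mvx, int mvy, int best)
{
    int sae = 0;
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++)
            sae += abs((int)r[(y + mvy) * stride + (x + mvx)]
                     - (int)c[y * stride + x]);
        if (sae >= best)      /* checked once per row to keep it cheap */
            return sae;
    }
    return sae;
}

/* SAE over every second sample in both directions: about a quarter of
 * the work, at the price of a less reliable measure. */
int sae_subsampled(const unsigned char *r, const unsigned char *c,
                   int stride, int n, int mvx, int mvy)
{
    int sae = 0;
    for (int y = 0; y < n; y += 2)
        for (int x = 0; x < n; x += 2)
            sae += abs((int)r[(y + mvy) * stride + (x + mvx)]
                     - (int)c[y * stride + x]);
    return sae;
}
```

Early stop pays off most when good candidates are tested first, which is why it pairs well with a "most likely first" search order.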
TSS (3-Step Search)
Step 0: Search center (0,0), n = w
Step 1: n = floor[ n / 2 ]
Step 2: Search the 8 points at distance n around the current center and move the center to the minimum
Step 3: if n == 1 stop, otherwise go to Step 1
Properties
Logarithmic/binary search (only 3 steps when w = 8)
Search with decreasing distance
• w/2 => w/4 => w/8 . . . . until 1
Complexity: O(log2 w)
[Figure: TSS search points of steps 1, 2 and 3 in a window from (x-6, y-6) to (x+6, y+6)]
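The TSS steps can be sketched as follows. This is a minimal sketch, not a reference implementation: it assumes row-major 8-bit frames, SAE as the difference measure, and that the caller guarantees every tested position lies inside the reference frame. The names `block_sae` and `tss_search` are illustrative.

```c
#include <stdlib.h>

/* SAE between the n x n current block and the reference block displaced
 * by (mvx, mvy); ref and cur point at the co-located top-left sample. */
static int block_sae(const unsigned char *ref, const unsigned char *cur,
                     int stride, int n, int mvx, int mvy)
{
    int sae = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sae += abs((int)ref[(y + mvy) * stride + (x + mvx)]
                     - (int)cur[y * stride + x]);
    return sae;
}

/* Three-step search with initial range w (w = 8 gives steps 4, 2, 1). */
void tss_search(const unsigned char *ref, const unsigned char *cur,
                int stride, int n, int w, int *mvx, int *mvy)
{
    int cx = 0, cy = 0;
    int best = block_sae(ref, cur, stride, n, 0, 0);    /* Step 0 */
    int step = w;

    while (step > 1) {
        step /= 2;                                      /* Step 1 */
        int bx = cx, by = cy;
        for (int dy = -step; dy <= step; dy += step)    /* Step 2 */
            for (int dx = -step; dx <= step; dx += step) {
                if (dx == 0 && dy == 0)
                    continue;                           /* center already known */
                int sae = block_sae(ref, cur, stride, n, cx + dx, cy + dy);
                if (sae < best) {
                    best = sae;
                    bx = cx + dx;
                    by = cy + dy;
                }
            }
        cx = bx;                                        /* recentre on the minimum */
        cy = by;
    }                                                   /* Step 3: stop at step 1 */
    *mvx = cx;
    *mvy = cy;
}
```

With w = 8 the search tests at most 1 + 3 x 8 = 25 positions and can reach displacements up to 4 + 2 + 1 = 7 in each direction.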
2D Logarithmic Search
Step 0: Search center (0,0)
Step 1: Search the 4 points at step size S (left, right, above and below the center)
Step 2: Find the minimum; if it is the center, S = S/2, otherwise move the center to the minimum location
Step 3: If S = 1, go to Step 4, else go to Step 1
Step 4: Search the 8 neighbors and decide the minimum
Properties
Similar to TSS, but more accurate
Complexity ~ O(log2 w), but the loop count is not fixed
[Figure: 2D logarithmic search points labelled by step number 1 to 5]
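The 2D logarithmic search steps can be sketched in the same style. The slide does not state the initial step size, so it is passed in as the parameter `s`, which is an assumption of this sketch, as are the names `block_sae` and `tdl_search` and the clamping of candidates to the window [-w, w].

```c
#include <stdlib.h>

/* SAE between the n x n current block and the displaced reference block. */
static int block_sae(const unsigned char *ref, const unsigned char *cur,
                     int stride, int n, int mvx, int mvy)
{
    int sae = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sae += abs((int)ref[(y + mvy) * stride + (x + mvx)]
                     - (int)cur[y * stride + x]);
    return sae;
}

/* 2D logarithmic search inside [-w, w] with initial step size s. */
void tdl_search(const unsigned char *ref, const unsigned char *cur,
                int stride, int n, int w, int s, int *mvx, int *mvy)
{
    static const int cross[4][2] = { {0, -1}, {-1, 0}, {1, 0}, {0, 1} };
    int cx = 0, cy = 0;
    int best = block_sae(ref, cur, stride, n, 0, 0);        /* Step 0 */

    while (s > 1) {                                         /* Step 3 */
        int bx = cx, by = cy;
        for (int i = 0; i < 4; i++) {                       /* Step 1 */
            int mx = cx + cross[i][0] * s;
            int my = cy + cross[i][1] * s;
            if (abs(mx) > w || abs(my) > w)
                continue;                                   /* outside window */
            int sae = block_sae(ref, cur, stride, n, mx, my);
            if (sae < best) { best = sae; bx = mx; by = my; }
        }
        if (bx == cx && by == cy)                           /* Step 2 */
            s /= 2;              /* center is the minimum: halve the step */
        else {
            cx = bx;             /* otherwise move the center */
            cy = by;
        }
    }

    int bx = cx, by = cy;                                   /* Step 4 */
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int mx = cx + dx, my = cy + dy;
            if ((dx == 0 && dy == 0) || abs(mx) > w || abs(my) > w)
                continue;
            int sae = block_sae(ref, cur, stride, n, mx, my);
            if (sae < best) { best = sae; bx = mx; by = my; }
        }
    *mvx = bx;
    *mvy = by;
}
```

The loop ends because each pass either halves the step or moves to a strictly smaller SAE; how many passes that takes depends on the content, which matches the "not fixed loop count" property.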
Examples
TSS (Three Step Search)
Logarithmic Search
Cross Search
One-at-a-time Search
Nearest Neighbors Search
From other sources:
TSS (Three Step search)
TDL (Two Dim. Logarithmic)
CDS (Conjugate Direction Search)
CSA (Cross Search Algorithm)
OSA (Orthogonal Search Algorithm)
Fast BMA Performance
Complexity
Algorithm   Maximum number of search points   w = 4   w = 8   w = 16
FSM         (2w + 1)^2                           81     289     1089
TDL         2 + 7 log2 w                         16      23       30
TSS         1 + 8 log2 w                         17      25       33
MMEA        1 + 6 log2 w                         13      19       25
CDS         3 + 2w                               11      19       35
OSA         1 + 4 log2 w                          9      13       17
CSA         5 + 4 log2 w                         13      17       21
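The table entries follow directly from the formulas in its second column. A small sketch that evaluates them, assuming w is a power of two; the function names are illustrative.

```c
/* Integer log2 for w >= 1 (exact when w is a power of two). */
static int ilog2(int w)
{
    int k = 0;
    while (w > 1) {
        w >>= 1;
        k++;
    }
    return k;
}

/* Maximum number of search points per block for search range w. */
int points_fsm(int w)  { return (2 * w + 1) * (2 * w + 1); }
int points_tdl(int w)  { return 2 + 7 * ilog2(w); }
int points_tss(int w)  { return 1 + 8 * ilog2(w); }
int points_mmea(int w) { return 1 + 6 * ilog2(w); }
int points_cds(int w)  { return 3 + 2 * w; }
int points_osa(int w)  { return 1 + 4 * ilog2(w); }
int points_csa(int w)  { return 5 + 4 * ilog2(w); }
```

Only FSM grows quadratically and CDS linearly in w; the others grow with log2 w, which is why their advantage widens as the search range increases.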
Estimation Performance
            Split screen                            Trevor white
Algorithm   entropy (bits/pel)   std. deviation     entropy (bits/pel)   std. deviation
FSM         4.57                 7.39               4.41                 6.07
TDL         4.74                 8.23               4.60                 6.92
TSS         4.74                 8.19               4.58                 6.86
MMEA        4.81                 8.56               4.69                 7.46
CDS         4.84                 8.86               4.74                 7.54
OSA         4.85                 8.81               4.72                 7.51
CSA         4.82                 8.65               4.68                 7.42
Issues in Fast MC Algorithm
Local Minimum Error
Fast algorithms evaluate only a few of the candidate positions
In many cases the error surface is not "monotonic" (not a single valley)
The search can therefore end in a local minimum
See Figure 6.15
Hierarchical MC
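Reduced images
• Sub-sampled and filtered
• N levels, each with half the resolution of the level below
Search the top (N-th) level fully
• Reduced search window range (w / 2^(N-1))
Search the lower N-1 levels
• Only the 9 positions (the center and its 8 neighbors) around the vector passed down from the level above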
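Each level of the reduced-image pyramid can be built by filtering and sub-sampling the level below. The sketch below uses a 2 x 2 mean as the filter, which is an assumption (the slides do not name a filter), and the function name `downsample2x` is illustrative.

```c
/* Build one pyramid level: each output sample is the rounded mean of a
 * 2 x 2 group of input samples, so the output is (w/2) x (h/2).
 * src is w x h, row-major; dst must hold (w/2) * (h/2) samples. */
void downsample2x(const unsigned char *src, int w, int h, unsigned char *dst)
{
    int dw = w / 2, dh = h / 2;
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++) {
            const unsigned char *p = src + (2 * y) * w + 2 * x;
            int sum = p[0] + p[1] + p[w] + p[w + 1];
            dst[y * dw + x] = (unsigned char)((sum + 2) >> 2);
        }
}
```

A vector (mvx, mvy) found at one level corresponds to (2 mvx, 2 mvy) at the next lower level, which is then refined over the 9 positions around it.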
Benefits of hierarchical search
Escape Local minimum
Complexity Reduction
• e.g. search range w = 32
full search: (2 × 32 + 1)^2 = 4225 operations
HBMA with N = 4: (2 × 4 + 1)^2 + 3 × 9 = 108 operations
[Figure: sub-sampled signal versus original signal]
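The operation counts in this example can be reproduced from the hierarchical search rule: a full search at the top level over a window of w / 2^(N-1), then 9 positions at each of the N-1 lower levels. A small sketch with illustrative function names:

```c
/* Search positions for a full search over [-w, w] in both directions. */
int full_search_ops(int w)
{
    return (2 * w + 1) * (2 * w + 1);
}

/* Search positions for hierarchical block matching with `levels` levels:
 * full search at the top level with range w / 2^(levels-1), then
 * 9 positions at each of the remaining levels. */
int hbma_ops(int w, int levels)
{
    int w_top = w >> (levels - 1);
    return full_search_ops(w_top) + (levels - 1) * 9;
}
```

For w = 32 and 4 levels this gives 4225 against 108 positions, matching the figures above; the count ignores the cost of building the reduced images and the smaller block size at the upper levels.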
4. Variations of BMA: Multi-frame MC
Multiple Frame MC
"Forward" prediction: since H.261
"Backward" and "bidirectional" prediction: since MPEG-1
"Multiple reference" prediction (each MB selects its own reference picture): since H.264
[Figure: forward prediction, backward prediction, and bidirectional prediction as the average of the two]
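Bidirectional prediction as an average can be sketched as below. The inputs are assumed to be the already motion-compensated forward and backward predictions of one block; rounding halves upward is an assumption of this sketch, and `bipred_average` is an illustrative name.

```c
/* Bidirectional prediction: sample-wise average of the forward and
 * backward motion-compensated predictions, with halves rounded up. */
void bipred_average(const unsigned char *fwd, const unsigned char *bwd,
                    unsigned char *pred, int n)
{
    for (int i = 0; i < n; i++)
        pred[i] = (unsigned char)(((int)fwd[i] + (int)bwd[i] + 1) >> 1);
}
```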
4. Variations of BMA: Multi-frame MC
Multiple frame distance
Search range = frame distance x window
• Since displacement = velocity x time
e.g. w = 8: 64 points (1 frame distance), 256 points (2 frame distance), i.e. the search area grows fourfold
[Figure: search window [-w, +w] in frame t-1 with (mvx1, mvy1), and [-2w, +2w] in frame t-2 with (mvx2, mvy2)]
Practice
• For (mvx2, mvy2), search only [-w, w] around (mvx1, mvy1)
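The practical shortcut, searching a window of the usual size but centred on a vector that is already known, can be sketched as a full search around an arbitrary center. This is a sketch under assumptions: SAE as the measure, row-major 8-bit frames, all tested positions inside the reference frame, and illustrative names `block_sae` and `full_search_around`.

```c
#include <limits.h>
#include <stdlib.h>

/* SAE between the n x n current block and the displaced reference block. */
static int block_sae(const unsigned char *ref, const unsigned char *cur,
                     int stride, int n, int mvx, int mvy)
{
    int sae = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sae += abs((int)ref[(y + mvy) * stride + (x + mvx)]
                     - (int)cur[y * stride + x]);
    return sae;
}

/* Full search over [cx - w, cx + w] x [cy - w, cy + w].
 * For frame t-2, (cx, cy) would be the vector (mvx1, mvy1) already
 * found for frame t-1, instead of searching all of [-2w, 2w]. */
void full_search_around(const unsigned char *ref, const unsigned char *cur,
                        int stride, int n, int w, int cx, int cy,
                        int *mvx, int *mvy)
{
    int best = INT_MAX;
    *mvx = cx;
    *mvy = cy;
    for (int my = cy - w; my <= cy + w; my++)
        for (int mx = cx - w; mx <= cx + w; mx++) {
            int sae = block_sae(ref, cur, stride, n, mx, my);
            if (sae < best) {
                best = sae;
                *mvx = mx;
                *mvy = my;
            }
        }
}
```

The number of tested positions stays at (2w+1)^2 regardless of the frame distance, at the risk of missing motion that changes strongly between frames.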
MV at Boundary
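Restricted MV range
MVs must point inside the reference picture
In H.261, MPEG-1, MPEG-2, MPEG-4
Unrestricted MV
MVs may point outside the picture; the reference is extrapolated (extended with the nearest boundary pixel value)
In H.263 Annex D, H.264
[Figure: search window [-w, +w] at the picture boundary of frame t, and frame t-1 extrapolated beyond its edges]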
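Extension with the boundary pixel value does not require a physically enlarged reference frame; clamping the coordinates on every access has the same effect. A minimal sketch, with illustrative names `clampi` and `ref_pixel_extended`:

```c
/* Clamp v to the range [lo, hi]. */
static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Read a reference sample at (x, y), where (x, y) may lie outside the
 * picture. Outside positions return the nearest boundary sample, which
 * is equivalent to extending the picture with its edge values. */
unsigned char ref_pixel_extended(const unsigned char *frame,
                                 int width, int height, int x, int y)
{
    x = clampi(x, 0, width - 1);
    y = clampi(y, 0, height - 1);
    return frame[y * width + x];
}
```

Clamping per sample is simple but slow; implementations often pad the reference frame once by the search range instead, trading memory for speed.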
Sub-pixel Motion Estimation
Note
Objects do not necessarily move by an integer number of pixels
We have only integer-pixel samples
Sub-pixel estimation
Compute the fractional-pel values in the reference frame
Normally by linear interpolation
Half-pel/quarter-pel accuracy
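Half-pel values by linear (bilinear) interpolation can be sketched as below. Coordinates are given in half-pel units, so (2x, 2y) is the integer sample (x, y). The sketch assumes non-negative coordinates and that the right and lower neighbours exist whenever the position is fractional; `sample_halfpel` is an illustrative name.

```c
/* Reference sample at half-pel position (x2, y2), in half-pel units.
 * Integer positions return the sample itself; half positions return the
 * rounded mean of the 2 or 4 surrounding integer samples. */
unsigned char sample_halfpel(const unsigned char *frame, int stride,
                             int x2, int y2)
{
    int x = x2 >> 1, y = y2 >> 1;     /* integer part */
    int fx = x2 & 1, fy = y2 & 1;     /* 1 if the position is at a half */
    const unsigned char *p = frame + y * stride + x;

    /* When fx (or fy) is 0 the neighbour index collapses onto p[0],
     * so no sample outside the needed ones is read. */
    int a = p[0];
    int b = p[fx];
    int c = p[fy * stride];
    int d = p[fy * stride + fx];
    return (unsigned char)((a + b + c + d + 2) >> 2);
}
```

A half-pel search typically first finds the best integer vector and then tests the 8 half-pel positions around it using such interpolated values.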
5. Enhanced Motion Models
More general motion models
Rigid 2D Translation (BMA)
• + Transformation
Global Motion
• + Illumination variation
• + zoom-in/out
Object Model
• + overlapping of objects
• + 3D Rotation
• + Non rigid objects (deformation)
Some come from the computer vision area
At present, most of these tools are too complex to apply to video coding
Some are included in MPEG-4 Part 2's object-oriented coding
Examples
Region-based motion compensation
• How to get/describe shape and motion
Global motion (picture warping)
• Also called camera motion
Mesh-based deformation
6. Implementation
Video Encoder and Decoder Complexity Profiling
SW Optimization
Algorithm-level optimization: independent of CPU
Data structure design (most modern CPUs, RISC)
Memory cache optimization
• Keep the current blocks in cache
Loop unrolling (See Fig. 6.21)
• Reduces pointer operations and jumps, which helps branch prediction and pipelining
CPU-specific optimization
SIMD (Single Instruction, Multiple Data)
• Packed instructions (See Fig. 6.22)
• TI DSP, Intel MMX, etc.
MIMD (multiple parallel processing cores)
• VLIW (Very Long Instruction Word) of TI DSP
GPU
DMA utilization
Coprocessor utilization
• DCT, ME, post/pre-processing
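Loop unrolling applied to the SAE inner loop might look like the sketch below. It is not a reproduction of Fig. 6.21; the unroll factor of 4, the fixed row length of 16 and the name `sae_row16_unrolled` are assumptions chosen for illustration.

```c
#include <stdlib.h>

/* SAE of one 16-sample row, unrolled by 4: the loop test, index update
 * and jump are executed 4 times instead of 16. */
int sae_row16_unrolled(const unsigned char *r, const unsigned char *c)
{
    int sae = 0;
    for (int x = 0; x < 16; x += 4) {
        sae += abs((int)r[x]     - (int)c[x]);
        sae += abs((int)r[x + 1] - (int)c[x + 1]);
        sae += abs((int)r[x + 2] - (int)c[x + 2]);
        sae += abs((int)r[x + 3] - (int)c[x + 3]);
    }
    return sae;
}
```

Modern compilers often unroll such loops on their own at higher optimization levels, and may go further and map them to packed SIMD instructions, so the gain of doing it by hand should be measured.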
HW Optimization
Criteria
Performance, cycle count, gate count, data flow
Example #1: Full Search
Parallelization
• M function blocks give an M-fold speed-up
[Figure: search window memory (DRAM/SRAM) and current MB (SRAM) feed four parallel SAE units, whose outputs go to a comparator]
Example #2: Fast Search
TSS and hierarchical search (have a fixed number of clock cycles)
Pipelining blocks for speed-up
[Figure: pipeline fed by search window memory (DRAM/SRAM) and current MB (SRAM), with four stages: Step 1 (+/-4), Step 2 (+/-2), Step 3 (+/-1), Step 4 (+/-1/2). At t = 1 block 1 is in Step 1; at t = 2 block 2 enters Step 1 while block 1 moves to Step 2; by t = 4 blocks 4, 3, 2 and 1 occupy Steps 1 to 4]