Lecture 11 - New Mexico Tech


CSE 489-02 & CSE 589-02 Multimedia Processing
Lecture 11 Video Coding
Spring 2009
New Mexico Tech
4/13/2015
History
H.264/AVC
MC-DCT Coding Framework
• Motion estimation/compensation based on previously decoded frames
• Block-translation motion model
  - Inter-coding: DCT-based coding of the prediction error (residue)
  - Intra-coding: if motion estimation fails or synchronization is desired, the macroblock is encoded in intra mode
• Most international video coding standards are based on this coding framework
  - Video teleconferencing: H.261, H.263, H.263++, H.264
  - Video archive & playback: MPEG-1, MPEG-2 (in DVDs), MPEG-4
Hybrid MC-DCT Encoder
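[Figure: hybrid MC-DCT encoder. The motion-compensated prediction is subtracted from the input macroblock. The result goes through transform, quantization and entropy coding, producing the encoded residual (to channel). An embedded decoder (entropy decoding, inverse Q, inverse transform) adds the prediction back to form the decoded macroblock (to display), which is stored in a frame buffer (delay). Motion estimation on the buffered frame drives the motion-compensated predictor. Motion vectors and block mode data are sent as side information (to channel).]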
Inter and Intra Coding
• Intra
  - MB is encoded as is, without motion compensation
  - DCT followed by Q, zig-zag scan, run-length coding and Huffman coding
• Inter
  - Block-matching motion estimation
  - The prediction residue from the best-match block is DCT encoded (similarly to intra mode)
  - Motion vector is differentially encoded
Intra-Coding Mode
[Figure: intra-coding mode. Encoder: input MB → DCT → Q → entropy coder E → bit-stream. The quantized data also pass through Q⁻¹ and IDCT to the motion-compensated frame. Decoder: bit-stream → E⁻¹ → Q⁻¹ → IDCT → display frame.]
Inter-Coding Mode
xn
input
MB
rn
DCT
xˆMC n 1
E
Q
to bit-stream
Q 1
IDCT
Encoder
rˆn
xˆMC n 1
MC
4/13/2015
xn
ME
xˆn  1
D
reference
frame
xˆn
7
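The important property of the inter-coding loop is that the encoder predicts from its own reconstruction of the previous frame, not from the original frame. Encoder and decoder therefore stay in step. The sketch below is minimal and rests on a few assumptions. It works on a flat list of pixels and receives a prediction that is already motion compensated. A plain uniform quantizer stands in for the DCT + Q stage, because the transform is omitted for brevity. All helper names are hypothetical.

```python
def quantize(residual, step):
    """Uniform scalar quantizer (stands in for DCT + Q in this sketch)."""
    return round(residual / step)


def dequantize(level, step):
    return level * step


def encode_inter(x_n, pred, step):
    """Code one MB given its motion-compensated prediction (x_hat^MC_{n-1})."""
    levels = [quantize(x - p, step) for x, p in zip(x_n, pred)]      # r_n -> Q
    # Local decoder: x_hat_n = prediction + reconstructed residual,
    # exactly what the real decoder will compute from the levels
    recon = [p + dequantize(q, step) for p, q in zip(pred, levels)]
    return levels, recon


def decode_inter(levels, pred, step):
    return [p + dequantize(q, step) for p, q in zip(pred, levels)]


x_n = [100, 105, 97, 121]
pred = [101, 100, 100, 110]
levels, enc_recon = encode_inter(x_n, pred, step=4)
dec_recon = decode_inter(levels, pred, step=4)
print(levels, enc_recon, dec_recon)
```

The quantization error stays in the reconstruction, but it is the same on both sides. The next frame's prediction therefore does not drift.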
Video Sequence and Picture
[Figure: a sequence of pictures labeled Intra 0, Inter 1, Inter 2, Inter 3, Inter 4, Inter 5]
• Intra picture (I-picture)
  - Encoded without referencing other pictures
  - All MBs are intra coded
• Inter picture (P-picture, B-picture)
  - Encoded by referencing other pictures
  - Some MBs are intra coded, and some are inter coded
Group of Pictures
[Figure: a video stream (… B B B P …) divided into consecutive groups of pictures: GOP, GOP, …, GOP]
Group of Pictures (GOP):
  Picture type:     I  B  B  P  B  B  P
  Frame order:      0  1  2  3  4  5  6
  Encoding order:   0  2  3  1  5  6  4
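The encoding order in the table follows from one rule: a B-picture can only be coded after both of its anchors (the I- or P-pictures before and after it) are available. The minimal sketch below assumes the GOP is given as a string of picture types in display order. The function names are hypothetical.

```python
def coding_order(types):
    """Display indices of a GOP in the order they are encoded.

    types: picture types in display order, e.g. "IBBPBBP".
    """
    order, pending_b = [], []
    for i, t in enumerate(types):
        if t == "B":
            pending_b.append(i)        # a B-picture waits for its next anchor
        else:                          # I- or P-picture (anchor)
            order.append(i)            # code the anchor first...
            order.extend(pending_b)    # ...then the B-pictures it closes
            pending_b = []
    order.extend(pending_b)            # trailing Bs (would really wait for the next GOP)
    return order


def encoding_index(types):
    """For each picture in display order, its position in the encoding order."""
    index = [0] * len(types)
    for k, frame in enumerate(coding_order(types)):
        index[frame] = k
    return index


print(coding_order("IBBPBBP"))    # display indices in coding order
print(encoding_index("IBBPBBP"))  # one entry per picture in display order
```

In practice, B-pictures at the very end of a GOP would wait for the next GOP's anchor. The sketch simply appends them.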
Coding of I-Slice
[Figure: original block → DCT → transformed block → quantization (with a quantization matrix) → zig-zag scan → 15 0 -2 -1 -1 -1 0 … → entropy coding → bit-stream]
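The zig-zag scan reorders the quantized block so that low-frequency coefficients come first and the zeros bunch up at the end, which makes run-length coding effective. The minimal sketch below builds a JPEG-style zig-zag order and forms (run, level) pairs. It assumes, as in JPEG-style coding, that the DC value (15 in the example) is coded separately. The `EOB` marker and the helper names are illustrative, and no real VLC table is reproduced.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG-style zig-zag order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal; alternate the direction on odd/even diagonals
    return sorted(positions,
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))


def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length(ac_coeffs):
    """(zero-run, level) pairs; the final run of zeros is replaced by "EOB"."""
    pairs, run = [], 0
    for v in ac_coeffs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs


# AC part of the scanned sequence on the slide (DC = 15 handled separately)
print(run_length([0, -2, -1, -1, -1, 0, 0, 0]))
```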
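Coding of P-Slice
[Figure: motion estimation between the original current frame and the reconstructed reference frame (from the frame buffer) yields motion vectors. Motion compensation forms a prediction, and the residual (current frame minus the motion-compensated prediction) is coded.]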
Motion Estimation in H.261
• Macroblock
  - Luminance: 16x16, four 8x8 blocks (Y)
  - Chrominance: two 8x8 blocks (Cr, Cb)
• Motion estimation is only performed for the luminance component
• Motion vector range: [-15, 15]
[Figure: macroblock layout (four 8x8 Y blocks, one Cr and one Cb block) and the search area in the reference frame, extending 15 pixels around the MB in each direction]
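The slides do not name a search strategy, so it is left to the encoder. The simplest choice is an exhaustive (full) search over every integer displacement in [-15, 15]. The minimal sketch below stores frames as lists of rows of luminance values. It uses the sum of absolute differences (SAD) as the matching criterion; SAD is an assumption, since the slides do not name a criterion. The `demo` data is synthetic.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n=16, rng=15):
    """Return (dx, dy, cost) of the best integer-pel match within +/-rng pixels."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            # Skip candidates whose block would fall outside the reference frame
            if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def demo():
    """Synthetic case: the current frame satisfies cur[y][x] = ref[y+3][x-5]."""
    rnd = random.Random(0)
    size = 48
    ref = [[rnd.randrange(256) for _ in range(size)] for _ in range(size)]
    cur = [[ref[y + 3][x - 5] if y + 3 < size and x - 5 >= 0 else 0
            for x in range(size)] for y in range(size)]
    return full_search(cur, ref, bx=16, by=16)


print(demo())
```

Full search costs (2·15+1)² = 961 block comparisons per macroblock. This is why practical encoders often use faster, suboptimal search patterns.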
Coding of Motion Vectors
• MV has range [-15, 15]
• Integer-pixel ME search only
• Motion vectors are differentially and separably encoded:
$$MVD_x = MV_x[n] - MV_x[n-1]$$
$$MVD_y = MV_y[n] - MV_y[n-1]$$
• 11-bit VLC for MVD
Example
  MV  = 2 2 3 5 3 1 -1 …
  MVD = 0 1 2 -2 -2 -2 …
  Binary: 1 010 0010 0011 0011 0011 …
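The differential coding and the VLC in the example can be checked mechanically. The minimal sketch below handles one component; the x and y components are coded separately in the same way. The codeword table holds only the values that appear in the example (0, +1, +2, -2), plus -1. The codeword `011` for -1 is taken from the H.261 MVD table rather than from the slide. The first MV is used only as the starting predictor.

```python
# Partial MVD codeword table: small magnitudes only, enough for the example
MVD_VLC = {0: "1", 1: "010", -1: "011", 2: "0010", -2: "0011"}


def mv_differences(mvs):
    """MVD[n] = MV[n] - MV[n-1] for n >= 1."""
    return [cur - prev for prev, cur in zip(mvs, mvs[1:])]


def encode_mvds(mvds):
    # Raises KeyError for values outside this partial table
    return " ".join(MVD_VLC[d] for d in mvds)


mvds = mv_differences([2, 2, 3, 5, 3, 1, -1])
print(mvds, encode_mvds(mvds))
```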
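Inter/Intra Switching
• Based on the energy of the prediction error
  - High energy: scene change, occlusions, uncovered areas, … → use intra mode
  - Low energy: stationary background, translational motion, … → use inter mode
• Both measures are computed over the 256 pixels of the 16x16 macroblock. Here c is the current frame, $\bar{c}$ its mean over the MB, r the reference frame and (dx, dy) the motion vector:
$$VAR = \frac{1}{256}\sum_{(x,y)\in MB}\left(c[x,y]-\bar{c}\right)^2$$
$$MSE = \frac{1}{256}\sum_{(x,y)\in MB}\left(c[x,y]-r[x-dx,\,y-dy]\right)^2$$
[Figure: INTER and INTRA decision regions in the MSE-VAR plane, with the value 64 marked on both axes]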
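VAR and MSE are straightforward to compute. The decision rule in `choose_mode` is only an assumed reading of the INTER/INTRA figure: intra is chosen when the prediction error exceeds both the threshold 64 and the block's own variance. Treat it as an illustration, not as the exact boundary of any reference model. Blocks are 16x16 lists, and `pred` is the motion-compensated block r[x-dx, y-dy].

```python
def var_mse(cur, pred):
    """VAR of the current MB and MSE of its motion-compensated prediction."""
    pixels = [v for row in cur for v in row]
    n = len(pixels)                                   # 256 for a 16x16 MB
    mean = sum(pixels) / n
    var = sum((v - mean) ** 2 for v in pixels) / n
    mse = sum((c - p) ** 2
              for crow, prow in zip(cur, pred)
              for c, p in zip(crow, prow)) / n
    return var, mse


def choose_mode(cur, pred, threshold=64):
    """Assumed rule: intra only if MSE exceeds both the threshold and VAR."""
    var, mse = var_mse(cur, pred)
    return "INTRA" if mse > threshold and var < mse else "INTER"


flat = [[100] * 16 for _ in range(16)]
darker = [[80] * 16 for _ in range(16)]
print(choose_mode(flat, flat), choose_mode(flat, darker))
```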
Loop Filter
• Optional
• Can be turned on or off for each block; usually used together with MC
• Advantages
  - Decreases prediction error by smoothing the prediction frame
  - Reduces high-frequency artifacts such as mosquito effects
• Disadvantage
  - Increases complexity and overhead
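The slides do not give the filter itself. The sketch below assumes the separable filter of H.261: taps 1/4, 1/2, 1/4 applied horizontally and then vertically inside each 8x8 block, with the taps changed to 0, 1, 0 at block edges. Under that assumption, the smoothing works like this:

```python
def filter_1d(line):
    """[1, 2, 1]/4 smoothing; the first and last samples are left unchanged."""
    out = list(line)
    for i in range(1, len(line) - 1):
        out[i] = (line[i - 1] + 2 * line[i] + line[i + 1]) / 4
    return out


def loop_filter(block):
    """Separable 2-D filter on one block: rows first, then columns, then round."""
    rows = [filter_1d(r) for r in block]
    cols = [filter_1d(list(c)) for c in zip(*rows)]
    # Round half up (pixel values are non-negative)
    return [[int(v + 0.5) for v in r] for r in zip(*cols)]


impulse = [[0] * 8 for _ in range(8)]
impulse[3][3] = 160
smoothed = loop_filter(impulse)
print(smoothed[2][2:5], smoothed[3][2:5], smoothed[4][2:5])
```

An isolated spike is spread over its 3x3 neighbourhood while the block's total is preserved. This is the smoothing of the prediction that reduces high-frequency prediction error.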
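Quantization
• Uniform mid-rise quantizer for intra DC coefficients
• Uniform mid-tread quantizer with double dead zone for inter DC and all AC coefficients
[Figure: staircase characteristics of output Y versus input X, drawn against the line Y = X, with axis ticks at ±Q and ±2Q and output levels ±1, ±2. Left: mid-rise quantizer for intra DC. Right: mid-tread quantizer with dead zone for inter DC and all AC coefficients.]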
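The key difference between the two characteristics is whether zero is a reconstruction level. The minimal sketch below implements both with a generic step size `step`, which is a hypothetical parameter. The exact H.261 step sizes and reconstruction rules are not reproduced. The dead zone comes from truncating toward zero, which makes the zero interval (-step, step) twice as wide as the others.

```python
import math


def midrise(x, step):
    """Mid-rise: no zero output level; reconstruction at the middle of each interval."""
    k = math.floor(x / step)
    return k, (k + 0.5) * step


def deadzone_midtread(x, step):
    """Mid-tread with dead zone: truncation toward zero maps (-step, step) to 0."""
    k = int(x / step)                      # int() truncates toward zero
    if k == 0:
        return 0, 0.0
    return k, math.copysign(abs(k) + 0.5, k) * step


for x in (-2.7, -0.3, 0.3, 1.2):
    print(x, midrise(x, 1.0), deadzone_midtread(x, 1.0))
```

Small inter/AC values, which are mostly noise, collapse to zero, and long zero runs are cheap to code. An intra DC value is never zero-forced.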
H.263
• Standardization effort started Nov 1993
• Aim
  - Low bit-rate video communications, less than 64 kbps
  - Target PSTN and mobile networks: 10-32 kbps
• Near-term: H.263 and H.263+, established late 1997
• Long-term: H.26L, which became H.264 (then still under investigation)
• Main properties
  - H.261 with many MPEG features, optimized for low bit rates
  - Performance: 3-4 dB improvement over H.261 at less than 64 kbps; 30% bit-rate saving over MPEG-1
MPEG
• Coding and communication of moving pictures and associated audio for digital storage and archival
• MPEG: Moving Picture Experts Group
• MPEG family
  - MPEG-1: Nov 1992
  - MPEG-2: Nov 1994
  - MPEG-4: Oct 1998
  - MPEG-7: ongoing work
• Main features of the MPEG video family
  - Bi-directional ME/MC
  - I-frame, P-frame, B-frame
  - Structure: group of pictures (GOP), picture, slice, macroblock
  - Coding decisions
MPEG Goals and Applications
• MPEG-1
  - Optimized for applications that support a continuous transfer bit rate of about 1.5 Mbps (e.g., CD-ROM)
  - Target: 1.2 Mbps for video and 250-300 kbps for audio, around analog VHS quality
  - Does not support interlaced sources
  - Main target source: SIF, YCbCr 4:2:0, 352 x 240 at 30 fps
  - Application: VCD
• MPEG-2
  - The most commercially successful international coding standard
  - Wide range of bit rates: 4-80 Mbps; optimized for 4 Mbps
  - Target: high-resolution, high-quality video broadcast and playback
Requirements
• Coding of generic video at around 1.5 Mbps at reasonable (VHS) quality
• Random access capability, with frequent access points
• Fast-forward and fast-rewind capability
• Audio-video synchronization during play and access
• Simple decoder
• Flexibility of data format
• A certain degree of robustness to communication errors
• Real-time encoder possibility
From H.261 to MPEG-1
• MPEG-1 adds a few new features compared with the pioneering H.261 codec:
  - Flexible data sizes and frame rates
  - A more flexible slice structure replaces the fixed GOB structure
  - Data structure: the group of pictures (GOP) is introduced, allowing frequent access points
  - Bi-directional motion compensation, B-frames
  - Half-pixel motion compensation
  - More finely tuned VLCs for different purposes
  - A quantization table (like JPEG) replaces the single Q step size
Bidirectional MC Properties
• Advantages
  - Higher coding efficiency: the frame rate can be increased significantly with few extra bits
  - More accurate motion estimation and compensation
  - No error propagation (B-frames are not used as references for other frames)
• Disadvantages
  - More memory buffer for frame storage (minimum of 3 frames)
  - More end-to-end delay
H.264/AVC History
• In the early 1990s, the first video compression standards were introduced:
  - H.261 (1990) and H.263 (1995) from ITU
  - MPEG-1 (1993) and MPEG-2 (1996) from ISO
• Since then, the technology has advanced rapidly
  - H.263 was followed by H.263+, H.263++, H.26L
  - MPEG-1/2 were followed by MPEG-4 Visual
  - But industry and research coders are still way ahead
• H.264/AVC is a joint project of ITU and ISO to create an up-to-date standard
Scope and Context
• Aimed at providing high-quality compression for various services:
  - IP streaming media (50-1500 kbps)
  - SDTV and HDTV broadcast and video-on-demand (1-8+ Mbps)
  - DVD
  - Conversational services (<1 Mbps, low latency)
• The standard defines:
  - Decoder functionality (but not the encoder)
  - File and stream structure
• Final results:
  - 2-fold improvement in compression
  - Same fidelity at half the size, compared to H.263 and MPEG-2
Video Compression
• Motion compensation / prediction
  - Describe the current frame based on the previous frame
  - Output the description plus a residual image
  - Predicted frames are called “inter-frames”
  - Some frames (intra-frames) are encoded without prediction, as natural images
• Image transform
  - Concentrate the image energy in relatively few numeric coefficients
• Lossy coding
  - Compress coefficient values in a lossy manner
  - Try to keep the most important information
The H.263 Standard Coder
[Figure: original video → Motion Compensation → Image Transform → Lossy Coding → compressed video]
H.263 Motion Compensation
• Image is divided into 16x16 macroblocks
• Each macroblock is matched against nearby blocks in the previous frame (called the reference frame)
  - “Nearby” = within a 15-pixel horizontal/vertical range
  - Half-pixel accuracy (with bilinear pixel interpolation)
• The best match is used to predict the macroblock
• The relative displacement, or motion vector, is encoded and transmitted to the decoder
• The prediction errors of all blocks constitute the residual
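Half-pixel positions are not stored in the reference frame, so they are interpolated from the neighbouring integer pixels. The minimal sketch below performs bilinear half-pel sampling. It assumes coordinates given in half-pixel units and the H.263-style rounding (a+b+1)/2 and (a+b+c+d+2)/4 with integer division. Callers must keep coordinates inside the frame, since Python's negative indexing would silently wrap.

```python
def half_pel_sample(ref, x2, y2):
    """Sample ref at (x2/2, y2/2); x2, y2 are in half-pixel units."""
    x, y = x2 // 2, y2 // 2          # integer-pixel part
    fx, fy = x2 % 2, y2 % 2          # 1 if on a half-pixel position
    a = ref[y][x]
    if not fx and not fy:
        return a
    if fx and not fy:                # halfway between two horizontal neighbours
        return (a + ref[y][x + 1] + 1) // 2
    if fy and not fx:                # halfway between two vertical neighbours
        return (a + ref[y + 1][x] + 1) // 2
    # Centre of four pixels
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4


def predict_block(ref, bx, by, mvx2, mvy2, n):
    """n x n prediction for the block at (bx, by) with a half-pel motion vector."""
    return [[half_pel_sample(ref, 2 * (bx + i) + mvx2, 2 * (by + j) + mvy2)
             for i in range(n)] for j in range(n)]


ref = [[0, 10], [20, 30]]
print([half_pel_sample(ref, x2, y2) for x2, y2 in ((1, 0), (0, 1), (1, 1))])
```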
Motion Compensation Example
[Figure: frames at T=1 (reference) and T=2 (current)]
H.263 Image Transform
• Residual is divided into 8x8 blocks
• An 8x8 2-D Discrete Cosine Transform (DCT) is applied to each block independently
• DCT coefficients describe spatial frequencies in the block:
  - High frequencies correspond to small features and texture
  - Low frequencies correspond to larger features
  - The lowest-frequency coefficient, called DC, corresponds to the average intensity of the block
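For reference, the orthonormal 8x8 DCT-II can be written directly from its definition. This direct O(N⁴) form is for illustration only; real codecs use fast factorizations. With this normalization, the DC coefficient equals 8 times the block mean. This is the sense in which DC "corresponds to the average intensity".

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of an n x n block, computed from the definition."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


flat = dct2([[100] * 8 for _ in range(8)])
print(round(flat[0][0], 6))   # DC = 8 * mean for an 8x8 block
```

A perfectly flat block puts all its energy into the single DC coefficient, and every AC coefficient is zero.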
8x8 DCT Example
[Figures: three example slides]
H.263 Lossy Coding
• Transform coefficients are quantized:
  - Some less-significant bits are dropped
  - Only the remaining bits are encoded
• For inter-frames, all coefficients get the same number of bits, except for the DC, which gets more
• For intra-frames, lower-frequency coefficients get more bits
  - To preserve larger features better
• The actual number of bits used depends on a quantization parameter (QP), whose value depends on the bit-allocation policy
• Finally, bits are encoded using an entropy (lossless) code
  - Traditionally a Huffman-style code
Changes in Motion Compensation
• Quarter-pixel accuracy
  - A gain of 1.5-2 dB across the board over ½-pixel accuracy
  - Anti-aliasing sub-pixel interpolation
    - Removes some common artifacts in the residual
• Variable block size:
  - Every 16x16 macroblock can be subdivided
  - Each sub-block gets predicted separately
• Multiple and arbitrary reference frames
  - Vs. only the previous frame (H.263) or the previous and next frames (MPEG)
Variable Block-Size MC
• Motivation: the size of moving/stationary objects is variable
  - Many small blocks may take too many bits to encode
  - A few large blocks give poor prediction
• In H.264, each 16x16 macroblock may be:
  - Kept whole,
  - Divided horizontally (vertically) into two sub-blocks of size 16x8 (8x16), or
  - Divided into four 8x8 sub-blocks
• In the last case, the four sub-blocks may be divided once more into 2 or 4 smaller blocks.
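The partitioning rules above give a surprisingly large menu of choices. The small sketch below enumerates them. It assumes sub-block sizes of 8x8, 8x4, 4x8 and 4x4 for each 8x8 quarter, which is the "divided once more into 2 or 4" case. It counts the layouts and the number of motion vectors each layout needs, which is the bits-versus-prediction trade-off named in the motivation.

```python
from itertools import product

# Motion vectors needed by each partition choice
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2}
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}   # per 8x8 quarter


def all_configurations():
    """Yield (layout, motion vector count) for every way to split one macroblock."""
    for name, mvs in MB_PARTITIONS.items():
        yield name, mvs
    # Four 8x8 quarters, each split independently
    for quarters in product(SUB_PARTITIONS, repeat=4):
        yield "8x8: " + ", ".join(quarters), sum(SUB_PARTITIONS[q] for q in quarters)


configs = list(all_configurations())
print(len(configs), min(m for _, m in configs), max(m for _, m in configs))
```

One macroblock can need anywhere from 1 to 16 motion vectors. The encoder's job is to pick the layout whose prediction gain justifies its motion-vector bits.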
H.264 Variable Block Sizes
[Figure only]
Motion Scale Example
[Figures: three slides, each showing frames at T=1 and T=2]
H.264 VBS Example
[Figure: frames at T=1 and T=2]
Arbitrary Reference Frames
• In H.263, the reference frame for prediction is always the previous frame
• In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction)
• In H.264, any one frame may be used as reference:
  - Encoder and decoder maintain synchronized buffers of available (previously decoded) frames
  - The reference frame is specified as an index into this buffer
• In bi-predictive mode, each macroblock may be:
  - Predicted from one of the two references
  - Predicted from both, using a weighted mean of the predictors
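The two ideas above, a shared reference buffer addressed by index and a weighted mean of two predictors, can be sketched in a few lines. This is a hypothetical structure, not H.264's actual reference-list management. It assumes that index 0 refers to the most recently decoded frame and that the oldest frame is dropped when the buffer is full. The weighting uses generic integer weights with rounding; H.264's explicit weighted-prediction syntax is not reproduced.

```python
from collections import deque


class ReferenceBuffer:
    """Hypothetical buffer of previously decoded frames, mirrored in encoder and decoder."""

    def __init__(self, size):
        self.frames = deque(maxlen=size)   # the oldest frame drops out when full

    def add(self, frame):
        self.frames.appendleft(frame)      # index 0 = most recently decoded frame

    def get(self, ref_idx):
        return self.frames[ref_idx]        # ref_idx is what the bit-stream would carry


def bipredict(p0, p1, w0=1, w1=1):
    """Rounded weighted mean of two predictor blocks (w0 = w1 gives a plain average)."""
    total = w0 + w1
    return [[(w0 * a + w1 * b + total // 2) // total for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]


buf = ReferenceBuffer(size=2)
for frame in ("A", "B", "C"):      # stand-ins for decoded frames
    buf.add(frame)
print(list(buf.frames), bipredict([[10, 11]], [[20, 20]]))
```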
Intra Prediction
• Motivation: intra-frames are natural images, so they exhibit strong spatial correlation
  - Implemented to some extent in H.263++ and MPEG-4, but in the transform domain
• Macroblocks in intra-coded frames are predicted based on previously coded ones
  - Above and/or to the left of the current block
  - The macroblock may be divided into 16 4x4 sub-blocks, which are predicted in cascading fashion
• An encoded parameter specifies which neighbors should be used to predict, and how
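Three of the simplest 4x4 intra modes can be sketched directly from the neighbours above and to the left: vertical, horizontal and DC (the rounded mean of the 8 neighbours). The diagonal modes are omitted. The mode names are descriptive labels, not the standard's mode numbers. Choosing the mode by smallest SAD is an assumption for illustration; real encoders often use other costs.

```python
def intra4x4(mode, above, left):
    """Predict a 4x4 block from 4 reconstructed pixels above and 4 to the left."""
    if mode == "vertical":            # each column copies the pixel above it
        return [list(above) for _ in range(4)]
    if mode == "horizontal":          # each row copies the pixel to its left
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":                  # rounded mean of the 8 neighbours
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(a, b):
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_mode(block, above, left):
    """Encoder-side choice: the mode whose prediction has the smallest SAD."""
    return min(("vertical", "horizontal", "dc"),
               key=lambda m: sad(block, intra4x4(m, above, left)))


stripes = [[10, 20, 30, 40] for _ in range(4)]
print(best_mode(stripes, above=[10, 20, 30, 40], left=[5, 5, 5, 5]))
```

Only the chosen mode (the "encoded parameter") and the residual need to be sent. The decoder rebuilds the same prediction from pixels it has already decoded.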
Intra-Prediction Examples
[Figures: example slides for the Vertical, Horizontal and Main Diagonal prediction modes]
H.264 Image Transform
• Motivation:
  - The DCT requires real-number operations, which may cause inaccuracies (encoder/decoder mismatch) in inversion
• H.264 uses a very simple integer 4x4 transform
  - A (fairly crude) approximation to the 4x4 DCT
  - Transform matrix contains only ±1 and ±2
  - Can be computed with only additions, subtractions, and shifts
• Results show negligible loss in quality (~0.02 dB)
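The claim "only additions, subtractions, and shifts" can be checked directly. The sketch below shows only the forward core transform Y = C X Cᵀ. In H.264, the scaling that turns it into a DCT approximation is folded into quantization and is omitted here. The butterfly version is compared against a direct matrix product.

```python
# Core 4x4 integer transform matrix (entries only +/-1 and +/-2)
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def forward_1d(a0, a1, a2, a3):
    """Multiply one row/column by C using only adds, subtracts and shifts."""
    s0, s3 = a0 + a3, a0 - a3
    s1, s2 = a1 + a2, a1 - a2
    return [s0 + s1, (s3 << 1) + s2, s0 - s1, s3 - (s2 << 1)]


def forward_4x4(block):
    """Y = C X C^T: transform rows, then columns."""
    rows = [forward_1d(*r) for r in block]
    cols = [forward_1d(*c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def reference_4x4(block):
    """Direct matrix form, used only to check the butterfly version."""
    ct = [list(r) for r in zip(*C)]
    return matmul(matmul(C, block), ct)


X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
print(forward_4x4(X))
```

Because every operation is an exact integer operation, the encoder and decoder compute bit-identical results. This removes the inversion mismatch named in the motivation.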
Deblocking Filter
[Figure: non-deblocked image vs. deblocked image]
Courtesy: images from http://compression.ru/video/deblocking/
Entropy Coding
• Motivation: traditional coders use fixed (non-adaptive) variable-length codes
  - Essentially Huffman-style codes
  - Non-adaptive
  - Cannot encode symbols with probability > 0.5 efficiently, since at least one bit per symbol is required
• H.263 Annex E defines an arithmetic coder
  - Still non-adaptive
  - Uses multiple non-binary alphabets, which results in high computational complexity
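A quick calculation shows what the "probability > 0.5" limitation costs. Consider a binary source where one symbol has probability p. A Huffman-style code still spends at least 1 bit per symbol. The entropy, which an arithmetic coder can approach, is much lower for skewed p:

```python
import math


def binary_entropy(p):
    """Entropy in bits/symbol of a binary source with probabilities p and 1 - p."""
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


for p in (0.5, 0.9, 0.99):
    print(f"p = {p}: entropy = {binary_entropy(p):.3f} bits/symbol vs. 1 bit for a VLC")
```

At p = 0.9, the entropy is about 0.47 bits per symbol, so a 1-bit-per-symbol code wastes more than half its bits.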
Entropy Coding: CABAC
• Context-adaptive binary arithmetic coding (CABAC) framework designed specifically for H.264
• Binarization: all syntax symbols are translated to bit-strings
• 399 predefined context models, used in groups
  - E.g., models 14-20 are used to code the macroblock type for inter-frames
  - The model to use next is selected based on previously coded information (the context)
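The toy model below illustrates the two ideas above: binarization, and picking an adaptive probability model per bin from the context. It is not CABAC. It uses unary binarization, takes the bin's position as its context, and uses a simple count-based probability estimate in place of CABAC's state machine. It reports only the ideal code length (-log2 p per bin) rather than running an arithmetic coder. All names and parameters are hypothetical.

```python
import math


def unary(n):
    """Unary binarization: n ones followed by a terminating zero."""
    return [1] * n + [0]


def adaptive_cost(symbols, max_ctx=2):
    """Ideal code length in bits when each bin position has its own adaptive model.

    Context = index of the bin inside the bit-string (capped at max_ctx).
    Each context counts the 0s and 1s seen so far (starting from 1 each)
    and uses those counts as its probability estimate for the next bin.
    """
    counts = [[1, 1] for _ in range(max_ctx + 1)]
    bits = 0.0
    for s in symbols:
        for i, b in enumerate(unary(s)):
            c = counts[min(i, max_ctx)]
            bits += -math.log2(c[b] / (c[0] + c[1]))   # cost of this bin
            c[b] += 1                                   # adapt to what was coded
    return bits


print(adaptive_cost([0] * 100))   # far below the 100 bits a 1-bit-per-symbol VLC needs
```

A run of highly predictable symbols costs only a few bits in total once the model adapts. This is exactly where fixed VLCs fall short.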
Comparison to MPEG-2, H.263, MPEG-4 Part 2
[Figure: rate-distortion quality comparison on Tempete (CIF, 30 Hz): Y-PSNR (dB, 25-38) versus bit rate (kbit/s, 0-3500) for JVT/H.264/AVC, MPEG-4 Visual, MPEG-2 and H.263]