Overview of the Scalable Video Coding Extension of the H.264/AVC Standard
Kai-Chao Yang, NTHU, Taiwan
2007/8

Outline
• Introduction
    • Problems
    • Definition
    • Functionality
    • Goal
    • Competition
    • Applications
    • Targets
• History of SVC
• Structure of SVC
• Temporal Scalability
• Spatial Scalability
• Quality Scalability
• Combined Scalability
• Profiles of SVC
• Conclusions

Introduction - problem
• Non-scalable video streaming
    • Multiple video streams are needed for heterogeneous clients
[Figure: separate non-scalable streams at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 kb/s serving different clients]

Introduction - definition
[Figure: a scalable video stream consisting of sub-streams 1 to n; reconstruction from sub-streams k1, k2, …, ki ranges from low quality to high quality]
• Scalability: removal of parts of the video bit-stream to adapt to the various needs of end users and to varying terminal capabilities or network conditions

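The definition above (removing parts of the bit-stream) can be sketched as a filter over NAL units. This is a minimal sketch, assuming every NAL unit is tagged with the D, T, and Q identifiers described on the bit-stream format slide; the class NalUnit, the function extract_substream, and the example stream are hypothetical. A real extractor must also keep exactly the lower-layer data that the target layer needs for inter-layer prediction, which this simplification ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    dependency_id: int  # D: spatial / CGS layer
    temporal_id: int    # T: temporal level
    quality_id: int     # Q: MGS / FGS refinement layer
    size_bytes: int


def extract_substream(units, max_d, max_t, max_q):
    """Keep only the NAL units at or below the target (D, T, Q) operating point.

    Simplification: the rule is applied uniformly to every layer; inter-layer
    prediction dependencies are not checked.
    """
    return [u for u in units
            if u.dependency_id <= max_d
            and u.temporal_id <= max_t
            and u.quality_id <= max_q]


# Hypothetical stream: 2 dependency layers x 3 temporal levels x 2 quality layers.
stream = [NalUnit(d, t, q, 100) for d in range(2) for t in range(3) for q in range(2)]
low = extract_substream(stream, max_d=0, max_t=1, max_q=0)
print(len(low), sum(u.size_bytes for u in low))  # 2 200
```

Dropping a temporal level halves the frame rate, dropping a dependency layer lowers the resolution, and dropping a quality layer lowers the fidelity; all three are the same "discard NAL units" operation.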
Introduction - functionality
• Functionality of SVC
    • Graceful degradation when the “right” parts of the bit-stream are lost
    • Bit-rate adaptation to match the channel throughput
    • Format adaptation for backward-compatible extension
    • Power adaptation for a trade-off between runtime and quality

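Bit-rate adaptation can be sketched as choosing how many layers to forward for a measured channel throughput. The sketch assumes each layer has a known incremental rate; the layer split below is hypothetical, chosen so that the cumulative rates reproduce the five simulcast rates of the problem slide, and select_operating_point is an illustrative name.

```python
def select_operating_point(layer_rates_kbps, throughput_kbps):
    """Return (number of layers, total rate in kb/s) that fits the channel.

    layer_rates_kbps[i] is the incremental rate of layer i. Layers are
    cumulative: layer i is only decodable if layers 0..i-1 are also sent,
    so layers are added from the base upward until one no longer fits.
    """
    total = 0
    count = 0
    for rate in layer_rates_kbps:
        if total + rate > throughput_kbps:
            break
        total += rate
        count += 1
    return count, total


# Hypothetical split: cumulative rates are 512, 1000, 4000, 6000, 8000 kb/s.
layers = [512, 488, 3000, 2000, 2000]
print(select_operating_point(layers, 5000))  # (3, 4000)
```

With one scalable stream, the same bit-stream serves every client of the problem slide; only the number of forwarded layers changes.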
Introduction - mode
• Example
[Figure: bit-plane example. A residual is represented by bits ordered from the most significant bit; a base layer and enhancement layers 1 to 5 each carry part of the bits (patterns shown: 0 1 1 0 1; 10010, 01101, 10010, 11001, 00101)]
• Scalability mode
    • Fidelity reduction (SNR scalability)
    • Picture size reduction (spatial scalability)
    • Frame rate reduction (temporal scalability)
    • Sharpness reduction (frequency scalability)
    • Selection of content (ROI or object-based scalability)

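The bit-plane example illustrates fidelity (SNR) scalability: sending more planes, most significant first, refines the residual. A minimal sketch, assuming non-negative residual magnitudes (signs ignored) and using the five 5-bit patterns printed in the figure as example values; how the figure maps them to layers is not recoverable, so the MSB-first plane order is an assumption. This is the classic bit-plane idea used by MPEG-4 FGS, not the SVC MGS coding method, and the function names are hypothetical.

```python
def to_bit_planes(values, num_bits):
    """Split non-negative integers into bit planes, most significant plane first."""
    return [[(v >> b) & 1 for v in values] for b in range(num_bits - 1, -1, -1)]


def reconstruct(planes, num_bits):
    """Rebuild values from the received planes (MSB first).

    Planes that were not received are treated as zeros, so dropping the
    least significant planes truncates the values (lower fidelity).
    """
    values = [0] * (len(planes[0]) if planes else 0)
    for i, plane in enumerate(planes):
        weight = 1 << (num_bits - 1 - i)
        for j, bit in enumerate(plane):
            values[j] += bit * weight
    return values


residual = [0b10010, 0b01101, 0b10010, 0b11001, 0b00101]  # 18, 13, 18, 25, 5
planes = to_bit_planes(residual, 5)
print(reconstruct(planes[:2], 5))  # [16, 8, 16, 24, 0]
print(reconstruct(planes, 5))      # [18, 13, 18, 25, 5]
```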
Structure of SVC
[Figure: SVC encoder structure. The input is spatially decimated into layers; each layer has temporal scalable coding, SNR scalable coding, and base layer coding, with prediction from the lower layer; all layers are multiplexed into one bit-stream]

Temporal Scalability
• Hierarchical prediction structures (numbers give the coding order of the pictures, listed in display order)
    • Hierarchical B pictures, GOP of 8: 0 4 3 5 2 7 6 8 1 12 11 13 10 15 14 16 9
    • Non-dyadic hierarchical prediction: 0 3 4 2 6 7 5 8 9 1 12 13 11 15 16 14 17 18 10
    • Hierarchical prediction with zero delay: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

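The dyadic coding order above can be generated recursively. A minimal sketch, assuming a power-of-two GOP whose last picture is a key picture coded first, followed by a depth-first visit of the B pictures (middle picture of an interval, then its left half, then its right half); the function name is hypothetical. For two GOPs of 8 it reproduces the order printed for the hierarchical B pictures.

```python
def dyadic_coding_order(num_gops, gop_size):
    """Return the coding index of each picture, listed in display order.

    Picture 0 is the initial key picture. In each GOP the key picture at the
    end is coded first; then B pictures are visited depth-first: the middle
    picture of an interval, then its left half, then its right half.
    """
    total = num_gops * gop_size + 1
    order = [0] * total
    next_index = 1

    def visit(left, right):
        nonlocal next_index
        if right - left < 2:
            return
        mid = (left + right) // 2
        order[mid] = next_index
        next_index += 1
        visit(left, mid)
        visit(mid, right)

    for g in range(num_gops):
        start = g * gop_size
        end = start + gop_size
        order[end] = next_index  # key picture of this GOP
        next_index += 1
        visit(start, end)
    return order


print(dyadic_coding_order(2, 8))
# [0, 4, 3, 5, 2, 7, 6, 8, 1, 12, 11, 13, 10, 15, 14, 16, 9]
```

Other orders that respect the prediction dependencies, such as coding all pictures of one temporal level before the next, are also valid; the depth-first order is simply the one the figure shows.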
Temporal Scalability
• Video coding experiment with H.264/MPEG4-AVC
    • Foreman, CIF, 30 Hz @ 1320 kb/s
    • Performance as a function of N (GOP size)
    • N=1: I P P P P P P P P
    • N=2: I B0 P B0 P B0 P B0 P
    • N=4: I B1 B0 B1 P B1 B0 B1 P
    • N=8: I B2 B1 B2 B0 B2 B1 B2 P
    • N=2, 4, 8 additionally provide temporal scalability
    • Cascaded QP assignment: QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5
This slide is copied from JVT-W132-Talk

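The picture-type patterns and the cascaded QP rule can be generated for any power-of-two GOP size N. A sketch, assuming the hierarchy level of a B picture follows from the number of trailing zero bits of its position in the GOP (B0 being the coarsest level), and assuming the I picture uses the same QP as the P pictures, which the slide does not state; the function names are hypothetical.

```python
def hierarchy_pattern(gop_size):
    """Picture types of one GOP in display order, e.g. I B2 B1 B2 B0 B2 B1 B2 P."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1              # log2(gop_size)
    types = ["I"]
    for j in range(1, gop_size):
        trailing_zeros = (j & -j).bit_length() - 1
        types.append(f"B{levels - 1 - trailing_zeros}")
    types.append("P")
    return types


def cascaded_qp(picture_type, qp_p):
    """Cascaded QP from the slide: QP(Bk) = QP(P) + 3 + k; I uses QP(P) (assumption)."""
    if picture_type in ("I", "P"):
        return qp_p
    return qp_p + 3 + int(picture_type[1:])


pattern = hierarchy_pattern(8)
print(" ".join(pattern))                      # I B2 B1 B2 B0 B2 B1 B2 P
print([cascaded_qp(t, 30) for t in pattern])  # [30, 35, 34, 35, 33, 35, 34, 35, 30]
```

Pictures at higher levels (B2) are never used as references by lower levels, so they can be quantized more coarsely and dropped for temporal scalability.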
Spatial Scalability
[Figure: SVC encoder with spatial layers. The input is spatially decimated; the base layer is coded by an H.264/AVC compatible coder (H.264/AVC MCP & intra-prediction) and yields an H.264/AVC compatible base layer bit-stream; each higher layer uses hierarchical MCP & intra-prediction plus inter-layer prediction (intra, motion, residual) from the texture and motion of the layer below; all layers are multiplexed into the scalable bit-stream]

Spatial Scalability
• Similar to MPEG-2, H.263, and MPEG-4
• Arbitrary resolution ratio
• The same coding order in all spatial layers
• Combination with temporal scalability
• Inter-layer prediction
[Figure: two spatial layers (Spatial 0, Spatial 1) with temporal levels 0 to 2, intra pictures, and inter-layer prediction between the layers]

Spatial Scalability
• The prediction signals are formed by
    • MCP inside the enhancement layer (temporal), suited to small motion and high spatial detail
    • Up-sampling from the lower layer (spatial)
    • The average of the above two predictions (temporal + spatial)
• Inter-layer prediction
    • Three kinds of inter-layer prediction
        • Inter-layer motion prediction
        • Inter-layer residual prediction
        • Inter-layer intra prediction
    • Base mode MB: only the residual is transmitted, with no additional side information

Spatial Scalability
• Inter-layer motion prediction
    • base_mode_flag = 1 and the reference layer is inter-coded: the following data are derived from the reference layer
        • MB partitioning
        • Reference indices
        • MVs
    • motion_pred_flag
        • 1: MV predictors are obtained from the reference layer
        • 0: MV predictors are obtained by conventional spatial predictors
[Figure: an 8x8 block from (x1,y1) to (x2,y2) in the reference layer maps to a 16x16 macroblock from (2x1,2y1) to (2x2,2y2) in the enhancement layer]

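For the dyadic case in the figure, deriving enhancement-layer motion data amounts to scaling the reference-layer partition and its motion vector by 2. A minimal sketch, assuming an integer ratio of 2 and motion vectors in quarter-sample units; arbitrary ratios need the rounding and derivation rules of the standard, which are not reproduced here. The Partition class and upscale_partition are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    x: int        # top-left corner, in luma samples
    y: int
    width: int
    height: int
    mv: tuple     # (mvx, mvy) in quarter-sample units
    ref_idx: int


def upscale_partition(p: Partition, ratio: int = 2) -> Partition:
    """Derive enhancement-layer motion data from a reference-layer partition.

    Dyadic case only: position, size and motion vector are multiplied by the
    resolution ratio; the reference index is copied unchanged.
    """
    return Partition(
        x=p.x * ratio,
        y=p.y * ratio,
        width=p.width * ratio,
        height=p.height * ratio,
        mv=(p.mv[0] * ratio, p.mv[1] * ratio),
        ref_idx=p.ref_idx,
    )


ref = Partition(x=8, y=0, width=8, height=8, mv=(3, -5), ref_idx=1)
print(upscale_partition(ref))
# Partition(x=16, y=0, width=16, height=16, mv=(6, -10), ref_idx=1)
```

With base_mode_flag = 1 these derived values are used directly; with motion_pred_flag = 1 the scaled vector only serves as the predictor for a transmitted MV difference.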
Spatial Scalability
• Inter-layer residual prediction
    • residual_pred_flag = 1
    • Predictor: block-wise up-sampling, by a bilinear filter, of the corresponding 8x8 sub-MB in the reference layer
    • Up-sampling is done on a transform block basis (samples from neighboring transform blocks are not used)

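Block-wise bilinear up-sampling of a residual can be sketched as follows. The sketch assumes a 2x ratio, centre-aligned sample positions (output sample i maps to input position (i + 0.5) / 2 - 0.5), and floating-point arithmetic; SVC specifies its own integer filter and sample phases, which are not reproduced. Clamping positions to the block edge models the transform-block restriction, since no sample outside the block is read. The function name is hypothetical.

```python
def upsample_block_bilinear(block):
    """2x bilinear up-sampling of one residual block, without crossing its edges.

    Output sample i maps to input position (i + 0.5) / 2 - 0.5. Positions
    outside the block are clamped to its edge, so no samples from
    neighboring transform blocks are used.
    """
    n_rows, n_cols = len(block), len(block[0])

    def coords(i, n):
        pos = (i + 0.5) / 2 - 0.5
        pos = min(max(pos, 0.0), n - 1.0)   # stay inside the block
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return lo, hi, pos - lo             # two neighbors and the weight of hi

    out = []
    for r in range(2 * n_rows):
        r0, r1, fr = coords(r, n_rows)
        row = []
        for c in range(2 * n_cols):
            c0, c1, fc = coords(c, n_cols)
            top = block[r0][c0] * (1 - fc) + block[r0][c1] * fc
            bottom = block[r1][c0] * (1 - fc) + block[r1][c1] * fc
            row.append(top * (1 - fr) + bottom * fr)
        out.append(row)
    return out


up = upsample_block_bilinear([[0, 8], [8, 16]])
print(up[0])  # [0.0, 2.0, 6.0, 8.0]
print(up[3])  # [8.0, 10.0, 14.0, 16.0]
```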
Spatial Scalability
• Inter-layer intra prediction
    • base_mode_flag = 1 and the reference layer is intra-coded
    • Prediction by up-sampling the reconstructed reference layer
        • Luma: one-dimensional 4-tap FIR filter
        • Chroma: bilinear filter

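Separable up-sampling with a one-dimensional 4-tap FIR filter, applied horizontally and then vertically, can be sketched as below. The taps (-3, 19, 19, -3)/32 are an illustrative assumption, not quoted from the standard; SVC defines its own phase-dependent 4-tap filter table and sample positions. The sketch assumes a 2x ratio, 8-bit samples, and edge padding; the function names are hypothetical.

```python
def upsample_1d_4tap(samples, taps=(-3, 19, 19, -3), shift=5):
    """Double the length of a row of luma samples with a 4-tap FIR filter.

    Even outputs copy the input; odd outputs interpolate halfway between
    samples[i] and samples[i + 1] from samples[i - 1 .. i + 2]. Borders are
    padded by repeating the edge sample; results are rounded and clipped to 8 bits.
    """
    n = len(samples)

    def at(i):
        return samples[min(max(i, 0), n - 1)]

    out = []
    for i in range(n):
        out.append(samples[i])
        acc = sum(t * at(i - 1 + k) for k, t in enumerate(taps))
        value = (acc + (1 << (shift - 1))) >> shift   # divide by 32 with rounding
        out.append(min(max(value, 0), 255))
    return out


def upsample_2d_4tap(picture):
    """Apply the 1-D filter to every row, then to every column."""
    rows = [upsample_1d_4tap(row) for row in picture]           # horizontal pass
    cols = [upsample_1d_4tap(list(col)) for col in zip(*rows)]  # vertical pass
    return [list(r) for r in zip(*cols)]                         # transpose back


print(upsample_1d_4tap([0, 0, 64, 64]))  # [0, 0, 0, 32, 64, 70, 64, 64]
```

The overshoot to 70 next to the edge comes from the negative outer taps; a longer filter sharpens edges at the cost of such ringing, while bilinear filtering (used for chroma) never overshoots.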
Spatial Scalability
• Past spatial scalable video:
    • Inter-layer intra prediction requires complete decoding of the base layer.
    • Multiple motion compensation and deblocking filter loops are needed.
    • Full decoding + inter-layer prediction: complexity > simulcast.
• Single-loop decoding
    • Inter-layer intra prediction is restricted to MBs for which the co-located base layer MB is intra-coded.

Spatial Scalability
• Single-loop vs. multi-loop decoding
[Figure: single-loop vs. multi-loop decoding of I, P, and B pictures with inter-layer prediction]
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf

Spatial Scalability
• Generalized spatial scalability in SVC
    • Arbitrary ratio
        • Neither the horizontal nor the vertical resolution can decrease from one layer to the next.
    • Cropping
        • Containing new regions
        • Higher quality of interesting regions

Spatial Scalability
• Encoder control (JSVM)
    • Base layer
$$p_0' = \arg\min_{p_0} \{ D_0(p_0) + \lambda_0 R_0(p_0) \}$$
        • p0' is optimized for the base layer
    • Enhancement layer
$$p_1' = \arg\min_{p_1 \mid p_0} \{ D_1(p_1 \mid p_0) + \lambda_1 R_1(p_1 \mid p_0) \}$$
        • Decisions of p1 depend on p0
        • p1' is optimized for the enhancement layer
    • Efficient base layer coding but inefficient enhancement layer coding

Spatial Scalability
• Encoder control (optimization)
    • Base layer
        • Considering enhancement layer coding
        • Eliminating choices of p0 that disadvantage enhancement layer coding
$$p_0' = \arg\min_{p_0,\, p_1 \mid p_0} \{ (1 - w)[D_0(p_0) + \lambda_0 R_0(p_0)] + w[D_1(p_1 \mid p_0) + \lambda_1 R_1(p_1 \mid p_0)] \}$$
    • Enhancement layer
        • No change
    • w
        • w = 0: JSVM encoder control
        • w = 1: Single-loop encoder control (base layer is not controlled)

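The weighted base-layer decision can be illustrated on a toy problem with two base-layer options and two enhancement-layer options per base choice. The sketch evaluates the weighted Lagrangian cost exhaustively; all distortion and rate numbers and the mode names are invented for illustration, and choose_modes is a hypothetical name. A real encoder makes these decisions per macroblock over many modes.

```python
def choose_modes(base_costs, enh_costs, lam0, lam1, w):
    """Jointly choose base-layer and enhancement-layer coding parameters.

    base_costs: {p0: (D0, R0)}
    enh_costs:  {p0: {p1: (D1, R1)}}   # enhancement options depend on p0
    p0 minimizes (1 - w) * [D0 + lam0 * R0] + w * [D1 + lam1 * R1], with p1 the
    best enhancement choice given p0. w = 0 reduces to JSVM-style control.
    """
    def j0(p0):
        d, r = base_costs[p0]
        return d + lam0 * r

    def best_p1(p0):
        options = enh_costs[p0]
        return min(options, key=lambda p1: options[p1][0] + lam1 * options[p1][1])

    def j1(p0):
        d, r = enh_costs[p0][best_p1(p0)]
        return d + lam1 * r

    p0 = min(base_costs, key=lambda p: (1 - w) * j0(p) + w * j1(p))
    return p0, best_p1(p0)


base = {"A": (100, 20), "B": (90, 40)}                # invented (D0, R0)
enh = {"A": {"inter_layer": (80, 50), "own": (70, 70)},
       "B": {"inter_layer": (40, 30), "own": (70, 70)}}  # invented (D1, R1)
for w in (0.0, 0.5, 1.0):
    print(w, choose_modes(base, enh, lam0=1.0, lam1=1.0, w=w))
# 0.0 ('A', 'inter_layer')
# 0.5 ('B', 'inter_layer')
# 1.0 ('B', 'inter_layer')
```

Option B is slightly worse for the base layer alone but gives a much better predictor for the enhancement layer, so any sufficiently large w switches the base-layer decision to it.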
Quality Scalability
• Coarse-grain quality scalability (CGS)
    • A special case of spatial scalability
        • Identical sizes for base and enhancement layers
        • Smaller quantization step sizes for higher enhancement residual layers
    • Designed for only several selected bit-rate points
        • Supported bit-rate points = number of layers
        • Switching can only occur at IDR access units

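A CGS configuration can be sketched as a list of layers of the same picture size with decreasing QP. The sketch uses the approximate H.264/AVC relation in which the quantization step size doubles every 6 QP (step 1.0 at QP 4), and assumes a QP decrease of 6 per layer, a common but not mandatory choice; the function names are hypothetical.

```python
def qstep(qp):
    """Approximate H.264/AVC quantization step size: 1.0 at QP 4, doubling every 6 QP."""
    return 2 ** ((qp - 4) / 6)


def cgs_layers(base_qp, num_layers, delta_qp=6):
    """(QP, approximate step size) for each CGS layer, base layer first.

    Every layer has the same picture size; each enhancement layer re-quantizes
    the residual with a smaller step size and adds exactly one bit-rate point.
    """
    return [(base_qp - i * delta_qp, qstep(base_qp - i * delta_qp))
            for i in range(num_layers)]


print(cgs_layers(40, 3))  # [(40, 64.0), (34, 32.0), (28, 16.0)]
```

Three layers give exactly three extraction points, which is why CGS suits only a few selected bit-rates.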
Quality Scalability
• Medium-grain quality scalability (MGS)
    • More enhancement layers are supported
        • Refinement quality layers of the residual
    • Key pictures
        • Drift control
        • Switching can occur at any access unit
    • MGS = CGS + key pictures + refinement quality layers

Quality Scalability
• Drift control
    • Drift: the effect caused by unsynchronized MCP at the encoder and decoder side
    • Trade-off of MCP in quality SVC
        • Coding efficiency vs. drift

Quality Scalability
• MPEG-4 quality scalability with FGS
[Figure: base layer plus refinement (possibly lost or truncated)]
    • The base layer is stored and used for MCP of following pictures
    • Refinement data are not used for MCP
    • Drift: drift free
    • Complexity: low
    • Efficiency: efficient base layer but inefficient enhancement layer

Quality Scalability
• MPEG-2 quality scalability (without FGS)
[Figure: base layer plus refinement (possibly lost or truncated)]
    • Only 1 reference picture (including the refinement) is stored and used for MCP of following pictures
    • Drift: both base layer and enhancement layer
        • Frequent intra updates are necessary
    • Complexity: low
    • Efficiency: efficient enhancement layer but inefficient base layer

Quality Scalability
• 2-loop prediction
[Figure: base layer plus refinement (possibly lost or truncated)]
    • Several closed encoder loops run at different bit-rate points in a layered structure
    • Drift: enhancement layer
    • Complexity: high
    • Efficiency: efficient base layer and moderately efficient enhancement layer

Quality Scalability
• SVC concepts
[Figure: base layer plus refinement (possibly lost or truncated), with key pictures marked]
    • Key pictures
        • Trade-off between coding efficiency and drift
        • MPEG-4 FGS: all pictures are key pictures
        • MPEG-2 quality scalability: no key pictures

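The key-picture trade-off can be illustrated with a deliberately crude drift model. It assumes an IPPP-style chain (each non-key picture is predicted from the enhancement-layer reconstruction of the previous picture), that all refinement data is lost at the decoder, and that each such prediction adds one unit of mismatch; the function name and the unit error are hypothetical. A key-picture interval of 1 then behaves like MPEG-4 FGS, and no key pictures after the first behaves like MPEG-2 quality scalability.

```python
def drift_profile(num_pictures, key_interval, error_per_picture=1.0):
    """Toy encoder/decoder mismatch per picture when all refinements are lost.

    Key pictures are predicted from base-layer references, which the decoder
    always has, so the mismatch resets to zero. Every other picture inherits
    the mismatch of its reference and adds error_per_picture.
    key_interval=None means no key pictures after picture 0.
    """
    mismatch = []
    current = 0.0
    for n in range(num_pictures):
        is_key = n == 0 or (key_interval is not None and n % key_interval == 0)
        current = 0.0 if is_key else current + error_per_picture
        mismatch.append(current)
    return mismatch


print(drift_profile(6, 1))     # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  (FGS-like)
print(drift_profile(6, None))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  (MPEG-2-like)
print(drift_profile(6, 3))     # [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]  (periodic key pictures)
```

Longer key-picture intervals let more pictures use the better enhancement-layer reference (higher coding efficiency) but allow more drift to build up before it is reset.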
Quality Scalability
• Drift control with hierarchical prediction
[Figure: hierarchical prediction with key pictures (P) and B1/B2 pictures, each with a base layer and a refinement (possibly lost or truncated)]
    • Key pictures: the base layer is stored and used for the MCP of following pictures
    • Other pictures: the enhancement layer is stored and used for the MCP of following pictures
    • GOP size adjusts the trade-off between enhancement layer coding efficiency and drift

Combined Scalability
• SVC encoder structure
[Figure: SVC encoder structure with dependency layers, each using temporal decomposition; the layers inside a dependency layer share the same motion/prediction information]

Combined Scalability
• Dependency and quality refinement layers
[Figure: a scalable bit-stream made of dependency layers D=0, 1, 2, each with quality refinement layers Q=0, 1, 2]

Combined Scalability
[Figure: two dependency layers (D0, D1), each with quality layers Q0 and Q1, over pictures at temporal levels T0 T2 T1 T2 T0]

Combined Scalability
• Bit-stream format
[Figure: NAL unit header, followed by the NAL unit header extension carrying the P, T, D, and Q fields and single-bit flags, followed by the NAL unit payload]
    • P (priority_id): indicates the importance of a NAL unit
    • T (temporal_id): indicates the temporal level
    • D (dependency_id): indicates the spatial/CGS layer
    • Q (quality_id): indicates the MGS/FGS layer

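Reading the P, T, D, and Q fields from a NAL unit can be sketched as below. The sketch assumes the field order and widths of nal_unit_header_svc_extension() in the published H.264/AVC Annex G syntax (1-byte NAL unit header, then a first extension bit named svc_extension_flag in current editions, then 23 extension bits with a 4-bit quality_id); the field widths drawn on this slide may reflect an earlier draft. It also assumes the bytes start at the NAL unit header (start code removed); the function name and test bytes are made up.

```python
def parse_svc_nal_header(data: bytes) -> dict:
    """Parse the 1-byte NAL unit header plus the 3-byte SVC header extension.

    Only NAL unit types 14 (prefix NAL unit) and 20 (coded slice extension)
    carry this extension.
    """
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    b0 = data[0]
    nal_unit_type = b0 & 0x1F
    if nal_unit_type not in (14, 20):
        raise ValueError(f"NAL unit type {nal_unit_type} has no SVC extension")
    ext = int.from_bytes(data[1:4], "big")  # 24 extension bits
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nal_ref_idc": (b0 >> 5) & 0x3,
        "nal_unit_type": nal_unit_type,
        "svc_extension_flag": (ext >> 23) & 0x1,
        "idr_flag": (ext >> 22) & 0x1,
        "priority_id": (ext >> 16) & 0x3F,      # P
        "no_inter_layer_pred_flag": (ext >> 15) & 0x1,
        "dependency_id": (ext >> 12) & 0x7,     # D
        "quality_id": (ext >> 8) & 0xF,         # Q
        "temporal_id": (ext >> 5) & 0x7,        # T
        "use_ref_base_pic_flag": (ext >> 4) & 0x1,
        "discardable_flag": (ext >> 3) & 0x1,
        "output_flag": (ext >> 2) & 0x1,
        "reserved_three_2bits": ext & 0x3,
    }


h = parse_svc_nal_header(bytes([0x74, 0x85, 0x12, 0x67]))
print(h["priority_id"], h["temporal_id"], h["dependency_id"], h["quality_id"])  # 5 3 1 2
```

Because P, T, D, and Q sit at fixed positions in the first four bytes, a network node can drop NAL units for adaptation without parsing the payload.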
Combined Scalability
• Bit-stream switching
    • Inside a dependency layer
        • Switching everywhere
    • Outside a dependency layer
        • Switching up only at IDR access units
        • Switching down everywhere if multiple-loop decoding is used

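The switching rules can be written as a small decision function. The three rules come from the slide; the behavior for switching down with single-loop decoding is not stated there, so the sketch assumes it is only allowed at IDR access units (labeled in the code). The function name is hypothetical.

```python
def can_switch(from_d, to_d, at_idr, multi_loop):
    """Whether a decoder may change its target dependency layer at this access unit.

    from_d, to_d: current and target dependency_id
    at_idr:       True if the access unit is an IDR access unit
    multi_loop:   True if the decoder reconstructs all lower dependency layers
    """
    if to_d == from_d:
        return True              # inside a dependency layer: everywhere
    if to_d > from_d:
        return at_idr            # switching up: only at IDR access units
    # Switching down: everywhere with multiple-loop decoding.
    # Assumption (not on the slide): single-loop decoders need an IDR access unit.
    return multi_loop or at_idr


print(can_switch(0, 1, at_idr=False, multi_loop=True))  # False
print(can_switch(1, 0, at_idr=False, multi_loop=True))  # True
```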
Profiles of SVC
• Scalable Baseline
    • For conversational and surveillance applications requiring low decoding complexity
    • Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
    • Temporal and quality scalability: arbitrary
    • No interlaced coding tools
    • Enhancement layers support B-slices, weighted prediction, CABAC, and the 8x8 luma transform
    • The base layer conforms to the Baseline profile of H.264/AVC

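The spatial restriction of Scalable Baseline can be sketched as a check on one pair of successive layers. The sketch assumes the ratio is measured between the lower layer and the region of the upper layer it covers after cropping, that the same ratio must hold horizontally and vertically, and that "MB-aligned" means offsets that are multiples of 16 luma samples; the function name is hypothetical and this is not a full conformance check.

```python
from fractions import Fraction

ALLOWED_RATIOS = {Fraction(1), Fraction(3, 2), Fraction(2)}


def baseline_spatial_ok(lower_size, window_size, window_offset):
    """Check one pair of successive spatial layers against Scalable Baseline limits.

    lower_size:    (width, height) of the lower layer, in luma samples
    window_size:   (width, height) of the upper-layer region covered by the
                   up-sampled lower layer (after cropping)
    window_offset: (x, y) of that region inside the upper layer
    """
    rx = Fraction(window_size[0], lower_size[0])
    ry = Fraction(window_size[1], lower_size[1])
    same_ratio = rx == ry   # assumption: equal ratio in both directions
    mb_aligned = window_offset[0] % 16 == 0 and window_offset[1] % 16 == 0
    return same_ratio and rx in ALLOWED_RATIOS and mb_aligned


print(baseline_spatial_ok((176, 144), (352, 288), (0, 0)))    # True  (QCIF -> CIF)
print(baseline_spatial_ok((176, 144), (264, 216), (16, 32)))  # True  (ratio 1.5)
print(baseline_spatial_ok((176, 144), (320, 240), (0, 0)))    # False
```

Exact fractions avoid floating-point surprises when testing for a ratio of 1.5.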
Profiles of SVC
• Scalable High
    • For broadcast, streaming, and storage
    • Spatial, temporal, and quality scalability: arbitrary
    • The base layer conforms to the High profile of H.264/AVC
• Scalable High Intra
    • Scalable High with all pictures coded as IDR pictures

References
• H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Trans. Circuits and Systems for Video Technology (CSVT), 2007.
• T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
• T. Wiegand, “Scalable Video Coding,” Digital Image Communication, course at Technical University of Berlin, 2006. (Available at http://iphome.hhi.de/wiegand/dic.htm)
• H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. ICIP 2005.