Overview of the Scalable Video
Coding Extension of the
H.264/AVC Standard
Kai-Chao Yang
2007/8
Kai-Chao Yang, NTHU, Taiwan
Outline
Introduction
Problems
Definition
Functionality
Goal
Competition
Applications
Targets
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Quality Scalability
Combined Scalability
Profiles of SVC
Conclusions
Introduction - problem
Non-Scalable Video Streaming
Multiple video streams are needed for
heterogeneous clients
[Figure: example stream bit rates for different clients: 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, 512 kb/s]
Introduction - definition
Scalable video stream
[Figure: a scalable video stream made of sub-streams 1, 2, …, n; reconstruction from a subset of sub-streams k1, k2, …, ki gives low to high quality]
Scalability
Removal of parts of the video bit-stream to
adapt to the various needs of end users and to
varying terminal capabilities or network
conditions
Introduction - functionality
Functionality of SVC
Graceful degradation when “right” parts of the
bit-stream are lost
Bit-rate adaptation to match the channel
throughput
Format adaptation for backwards compatible
extension
Power adaptation for trade-off between
runtime and quality
Introduction - mode
Example
[Figure: layered coding example. Labels: most significant bit, base layer, enhancement layer, Enhancement 1 to Enhancement 5; residual values 10010, 01101, 10010, 11001, 00101]
Scalability mode
Fidelity reduction (SNR scalability)
Picture size reduction (spatial scalability)
Frame rate reduction (temporal scalability)
Sharpness reduction (frequency scalability)
Selection of content (ROI or object-based
scalability)
Structure of SVC
[Figure: encoder block diagram with the blocks spatial decimation, temporal scalable coding, prediction, base layer coding, SNR scalable coding, and multiplex]
Temporal Scalability
Hierarchical prediction structures
(The numbers give the coding order of the pictures, listed in display order.)
Hierarchical B pictures (GOP)
0 4 3 5 2 7 6 8 1 12 11 13 10 15 14 16 9
Non-dyadic hierarchical prediction
0 3 4 2 6 7 5 8 9 1 12 13 11 15 16 14 17 18 10
Hierarchical prediction with zero delay
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
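The coding-order numbers for hierarchical B pictures can be reproduced with a short sketch. It assumes a dyadic GOP whose size is a power of two and a depth-first traversal of the hierarchy; the function names `dyadic_coding_order` and `temporal_level` are illustrative and are not taken from the standard or from JSVM.

```python
def dyadic_coding_order(gop_size):
    """Return display indices 1..gop_size of one GOP in coding order.

    Assumes gop_size is a power of two. The key picture (display index
    gop_size) is coded first, then every interval is split at its midpoint.
    """
    order = [gop_size]

    def visit(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)      # B picture predicted from pictures lo and hi
        visit(lo, mid)
        visit(mid, hi)

    visit(0, gop_size)
    return order


def temporal_level(display_idx, gop_size):
    """Temporal level: 0 for key pictures, larger values for finer levels."""
    if display_idx % gop_size == 0:
        return 0
    depth = gop_size.bit_length() - 1                      # log2(gop_size)
    trailing_zeros = (display_idx & -display_idx).bit_length() - 1
    return depth - trailing_zeros


order = dyadic_coding_order(8)
rank = {d: i + 1 for i, d in enumerate(order)}   # coding position per picture
coding_positions = [rank[d] for d in range(1, 9)]
levels = [temporal_level(d, 8) for d in range(0, 9)]
```

For a GOP of 8, `coding_positions` is 4 3 5 2 7 6 8 1, the same numbers listed for pictures 1 to 8 in the hierarchical B picture example. Discarding every picture whose temporal level exceeds a chosen threshold leaves a decodable sequence at a reduced frame rate.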
Temporal Scalability
Video Coding Experiment with H.264/MPEG4-AVC
Foreman, CIF 30Hz @ 1320kbps
Performance as a function of N
N=1: I P P P P P P P P
N=2: I B0 P B0 P B0 P B0 P
N=4: I B1 B0 B1 P B1 B0 B1 P
N=8: I B2 B1 B2 B0 B2 B1 B2 P
Cascaded QP assignment
QP(P) = QP(B0)-3 = QP(B1)-4 = QP(B2)-5
Temporal scalability
This slide is copied from JVT-W132-Talk
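As a small worked example of the cascaded QP assignment, the sketch below reads the QP line as QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5. That reading is an assumption, because the relation symbols are missing in the extracted slide text. The offset for I pictures is not given on the slide and is set to 0 here purely for illustration; `cascaded_qps` is a hypothetical helper name.

```python
# Offsets relative to QP(P), under the assumed reading of the slide.
# The "I" entry is an assumption; the slide does not state it.
QP_OFFSET = {"I": 0, "P": 0, "B0": 3, "B1": 4, "B2": 5}


def cascaded_qps(pattern, qp_p):
    """Map a picture-type pattern such as 'I B0 P' to per-picture QP values."""
    return [qp_p + QP_OFFSET[pic_type] for pic_type in pattern.split()]


qps = cascaded_qps("I B2 B1 B2 B0 B2 B1 B2 P", 28)
```

With QP(P) = 28, the N=8 pattern receives the QP values 28 33 32 33 31 33 32 33 28, so pictures in finer temporal levels are quantized more coarsely.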
Spatial Scalability
[Figure: SVC encoder structure for spatial scalability. Each enhancement layer applies hierarchical MCP and intra-prediction, producing texture and motion data for base layer coding; layers are linked by spatial decimation of the input and by inter-layer prediction (intra, motion, residual). The lowest layer uses an H.264/AVC compatible coder (H.264/AVC MCP and intra-prediction) and yields an H.264/AVC compatible base layer bit-stream. All layers are multiplexed into the scalable bit-stream.]
Spatial Scalability
Similar to MPEG-2, H.263, and MPEG-4
Arbitrary resolution ratio
The same coding order in all spatial layers
Combination with temporal scalability
Inter-layer prediction
[Figure: inter-layer prediction between spatial layer 0 and spatial layer 1, combined with temporal levels 0 to 2; intra pictures are marked]
Spatial Scalability
The prediction signals are formed by
MCP inside the enhancement layer (Temporal) (small motion
and high spatial detail)
Up-sampling from the lower layer (Spatial)
Average of the above two predictions (Temporal +
Spatial)
Inter-layer prediction
Three kinds of inter-layer prediction
Inter-layer motion prediction
Inter-layer residual prediction
Inter-layer intra prediction
Base mode MB
Only the residual is transmitted, with no additional side information.
Spatial Scalability
Inter-layer motion prediction
base_mode_flag = 1
The reference layer is inter-coded
Data are derived from the reference layer
MB partitioning
Reference indices
MVs
[Figure: an 8x8 block in the reference layer, labelled (x1,y1) and (x2,y2), corresponds to a 16x16 block in the enhancement layer, labelled (2x1,2y1) and (2x2,2y2)]
motion_pred_flag
1: MV predictors are obtained from the reference layer
0: MV predictors are obtained by conventional spatial
predictors.
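The derivation of motion data from the reference layer can be sketched for the dyadic case (resolution ratio 2), where the co-located 8x8 block of the reference layer covers one 16x16 enhancement-layer MB. This is a simplified illustration: cropping, non-dyadic ratios, and the exact derivation rules of the standard are ignored, and the `Partition` type and `derive_base_mode_motion` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    x: int          # top-left corner inside the block, in luma samples
    y: int
    w: int          # partition width and height, in luma samples
    h: int
    ref_idx: int    # reference index
    mv: tuple       # motion vector (mvx, mvy)


def derive_base_mode_motion(ref_partitions, scale=2):
    """Upsample the partitioning of the co-located reference-layer block.

    Partition geometry and motion vectors are scaled by `scale`;
    reference indices are copied unchanged.
    """
    return [
        Partition(p.x * scale, p.y * scale, p.w * scale, p.h * scale,
                  p.ref_idx, (p.mv[0] * scale, p.mv[1] * scale))
        for p in ref_partitions
    ]


# Co-located 8x8 reference-layer block split into two 8x4 partitions.
ref_block = [Partition(0, 0, 8, 4, 0, (3, -1)),
             Partition(0, 4, 8, 4, 1, (-2, 5))]
enh_mb = derive_base_mode_motion(ref_block)
```

Here the two 8x4 partitions become two 16x8 partitions of the enhancement-layer MB, with doubled motion vectors and the same reference indices.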
Spatial Scalability
Inter-layer residual prediction
residual_pred_flag = 1
Predictor
Block-wise up-sampling by a bi-linear filter from the
corresponding 8×8 sub-MB in the reference layer
Transform block basis
Spatial Scalability
Inter-layer intra prediction
base_mode_flag = 1
The reference layer is intra-coded
Up-sampling from the reference layer
Luma: one-dimensional 4-tap FIR filter
Chroma: bi-linear filter
Spatial Scalability
Past spatial scalable video:
Inter-layer intra prediction requires complete
decoding of the base layer.
Multiple motion compensation and deblocking
filter operations are needed.
Full decoding + inter-layer prediction: complexity >
simulcast.
Single-loop decoding
Inter-layer intra prediction is restricted to MBs for
which the co-located base layer is intra-coded
Spatial Scalability
Single-loop vs. multi-loop decoding
[Figure: comparison of single-loop and multi-loop decoding; labels: Inter, I, B, P]
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf
Spatial Scalability
Generalized spatial scalability in SVC
Arbitrary ratio
Neither the horizontal nor the vertical resolution can
decrease from one layer to the next.
Cropping
Containing new regions
Higher quality of interesting regions
Spatial Scalability
Encoder control (JSVM)
Base layer
$p_0' = \arg\min_{\{p_0\}} \{ D_0(p_0) + \lambda_0 R_0(p_0) \}$
p0’ is optimized for base layer
Enhancement layer
$p_1' = \arg\min_{\{p_1 | p_0\}} \{ D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0) \}$
Decisions of p1 depend on p0
p1’ is optimized for enhancement layer
Efficient base layer coding but inefficient
enhancement layer coding
Spatial Scalability
Encoder control (optimization)
Base layer
Considering enhancement layer coding
Eliminating choices of p0 that disadvantage enhancement layer coding
$p_0' = \arg\min_{\{p_0, p_1 | p_0\}} \{ (1 - w) [ D_0(p_0) + \lambda_0 R_0(p_0) ] + w [ D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0) ] \}$
Enhancement layer
No change
w
w = 0: JSVM encoder control
w = 1: Single-loop encoder control (base layer is not
controlled)
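The effect of the weight w can be illustrated with a toy decision. The sketch weights the base-layer Lagrangian cost D0 + λ0·R0 by (1 - w) and the best achievable enhancement-layer cost D1 + λ1·R1 by w, then picks the base-layer choice with the lowest total. The candidate names, distortion and rate numbers, and the `choose_base_mode` helper are all made up for illustration; this is not the JSVM implementation.

```python
def choose_base_mode(candidates, lam0, lam1, w):
    """Pick p0 minimising (1-w)*(D0 + lam0*R0) + w*min_p1(D1 + lam1*R1)."""
    def cost(name):
        c = candidates[name]
        j0 = c["D0"] + lam0 * c["R0"]
        # Best enhancement-layer decision p1 given this base-layer choice p0.
        j1 = min(d1 + lam1 * r1 for d1, r1 in c["enh"])
        return (1 - w) * j0 + w * j1
    return min(candidates, key=cost)


# Hypothetical candidates: (D1, R1) pairs list the enhancement-layer options
# that remain available once the base-layer choice is fixed.
candidates = {
    "A": {"D0": 10.0, "R0": 4.0, "enh": [(9.0, 6.0), (8.0, 8.0)]},
    "B": {"D0": 12.0, "R0": 4.0, "enh": [(5.0, 3.0), (4.0, 5.0)]},
}

base_only = choose_base_mode(candidates, lam0=1.0, lam1=1.0, w=0.0)
joint = choose_base_mode(candidates, lam0=1.0, lam1=1.0, w=0.5)
```

With w = 0 the base layer picks "A" (cost 14 versus 16), which is best for the base layer alone. With w = 0.5 the choice moves to "B", because its cheaper enhancement layer (cost 8 versus 15) outweighs the slightly worse base layer.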
Quality Scalability
Coarse-grain quality scalability (CGS)
A special case of spatial scalability
Identical sizes for base and enhancement layers
Smaller quantization step sizes for higher
enhancement residual layers
Designed for only several selected bit-rate
points
Supported bit-rate points = Number of layers
Switch can only occur at IDR access units
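To put "smaller quantization step sizes" into numbers: in H.264/AVC the quantizer step size doubles for every increase of 6 in QP. The sketch below uses the commonly tabulated step sizes for QP 0 to 5; the function name `qstep` and the per-layer QP values are illustrative assumptions, not values from the slides.

```python
# Commonly tabulated H.264/AVC quantizer step sizes for QP 0..5.
_BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Approximate quantizer step size for 0 <= qp <= 51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP out of range")
    return _BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


# Hypothetical QP values for a base layer and two CGS enhancement layers.
layer_qps = [36, 30, 24]
steps = [qstep(qp) for qp in layer_qps]
```

Lowering QP by 6 from one layer to the next halves the step size (40, 20, 10 in this example), so each CGS layer refines the residual of the layer below it.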
Quality Scalability
Medium-grain quality scalability (MGS)
More enhancement layers are supported
Refinement quality layers of residual
Key pictures
Drift control
Switch can occur at any access unit
CGS + key pictures + refinement quality layers
Quality Scalability
Drift control
Drift: The effect caused by unsynchronized MCP
at the encoder and decoder side
Trade-off of MCP in quality SVC
Coding efficiency vs. drift
Quality Scalability
MPEG-4 quality scalability with FGS
[Figure: base layer with a refinement layer (possibly lost or truncated)]
Base layer is stored and used for MCP of following pictures
Refinement data are not used for MCP
Drift: Drift free
Complexity: Low
Efficiency: Efficient base layer but inefficient enhancement
layer
Quality Scalability
MPEG-2 quality scalability (without FGS)
[Figure: base layer with a refinement layer (possibly lost or truncated)]
Only 1 reference picture is stored and used for MCP of
following pictures
Drift: Both base layer and enhancement layer
Frequent intra updates are necessary
Complexity: Low
Efficiency: Efficient enhancement layer but inefficient base
layer
Quality Scalability
2-loop prediction
[Figure: base layer with a refinement layer (possibly lost or truncated)]
Several closed encoder loops run at different bit-rate points in a layered structure
Drift: Enhancement layer
Complexity: High
Efficiency: Efficient base layer and moderately efficient
enhancement layer
Quality Scalability
SVC concepts
[Figure: base layer with a refinement layer (possibly lost or truncated); key pictures are marked]
Trade-off between coding efficiency and drift
MPEG-4 FGS: All key pictures
MPEG-2 quality scalability: No key pictures
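The trade-off can be made concrete with a toy simulation. The sketch below is a scalar stand-in for a video codec, not the SVC decoding process: each "picture" is one number, predicted from the previous reconstruction and coded as a coarse base residual plus a fine refinement. The encoder predicts non-key pictures from the enhancement-quality reconstruction, while a decoder that receives only the base layer must predict from its own reconstruction. Key pictures are predicted from the base-quality reconstruction of the previous key picture. All names and numbers are assumptions made for illustration.

```python
def simulate_drift(frames, q_base, q_enh, key_interval):
    """Per-picture reference mismatch seen by a base-only decoder.

    key_interval = 1 makes every picture a key picture;
    key_interval = 0 means there are no key pictures at all.
    """
    def quant(value, step):
        return step * round(value / step)

    enc_ref = 0.0    # encoder reference (enhancement quality)
    dec_ref = 0.0    # reference held by the base-only decoder
    key_ref = 0.0    # base-quality reconstruction of the last key picture
    drift = []
    for t, x in enumerate(frames):
        is_key = key_interval > 0 and t % key_interval == 0
        ref = key_ref if is_key else enc_ref
        dec_used = key_ref if is_key else dec_ref
        drift.append(abs(dec_used - ref))       # encoder/decoder mismatch
        residual = x - ref
        base = quant(residual, q_base)
        refinement = quant(residual - base, q_enh)
        if is_key:
            key_ref = ref + base
        enc_ref = ref + base + refinement
        dec_ref = dec_used + base               # refinement never arrives
    return drift


frames = [10.0, 13.0, 17.0, 18.0, 24.0, 25.0, 31.0, 30.0]
no_keys = simulate_drift(frames, 8, 2, key_interval=0)
all_keys = simulate_drift(frames, 8, 2, key_interval=1)
some_keys = simulate_drift(frames, 8, 2, key_interval=4)
```

With every picture a key picture the mismatch is always zero, at the price of predicting from the coarser reference. With no key pictures the mismatch accumulates from picture to picture. With a key picture every fourth picture the mismatch returns to zero at each key picture.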
Quality Scalability
Drift control with hierarchical prediction
[Figure: base layer with a refinement layer (possibly lost or truncated) over a hierarchical structure of P, B1, and B2 pictures; the P pictures are key pictures]
Key pictures
Base layer is stored and used for the MCP of following pictures
Other pictures
Enhancement layer is stored and used for the MCP of following
pictures
GOP size adjusts the trade-off between enhancement
layer coding efficiency and drift
Combined Scalability
SVC encoder structure
[Figure: SVC encoder structure. Labels: dependency layer, temporal decomposition, the same motion/prediction information]
Combined Scalability
Dependency and Quality refinement layers
[Figure: scalable bitstream made of dependency layers D=0, D=1, and D=2, each with quality refinement layers Q=0, Q=1, and Q=2]
Combined Scalability
[Figure: combined scalability example with dependency layers D0 and D1, quality layers Q0 and Q1 in each, and temporal levels T0, T2, T1, T2, T0 across the pictures]
Combined Scalability
Bit-stream format
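[Figure: NAL unit layout: NAL unit header, NAL unit header extension, NAL unit payload. The extension carries the fields P, T, D, and Q. Field widths in bits as extracted from the figure: 2, 6, 3, 3, 2 and 1, 1, 1, 1, 1, 3]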
P (priority_id): indicates the importance of a NAL unit
T (temporal_id): indicates temporal level
D (dependency_id): indicates spatial/CGS layer
Q (quality_id): indicates MGS/FGS layer
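A sub-stream extractor only needs these identifiers. The sketch below parses the three extension bytes that follow the one-byte NAL unit header and applies a simplified keep/drop rule. It assumes the bit layout of the published H.264 Annex G syntax (also used in RFC 6190), which may differ from the field widths drawn on this 2007 slide. The names `SvcNalInfo`, `parse_svc_extension`, and `keep_nal`, and the extraction rule itself, are illustrative assumptions rather than the normative extraction process.

```python
from typing import NamedTuple


class SvcNalInfo(NamedTuple):
    priority_id: int
    dependency_id: int
    quality_id: int
    temporal_id: int


def parse_svc_extension(ext: bytes) -> SvcNalInfo:
    """Parse the 3 bytes that follow the 1-byte NAL unit header.

    Assumed layout, most significant bit first:
      byte 0: flag(1), idr_flag(1), priority_id(6)
      byte 1: no_inter_layer_pred_flag(1), dependency_id(3), quality_id(4)
      byte 2: temporal_id(3), three 1-bit flags, reserved(2)
    """
    if len(ext) < 3:
        raise ValueError("need 3 extension bytes")
    return SvcNalInfo(
        priority_id=ext[0] & 0x3F,
        dependency_id=(ext[1] >> 4) & 0x07,
        quality_id=ext[1] & 0x0F,
        temporal_id=(ext[2] >> 5) & 0x07,
    )


def keep_nal(info, max_t, max_d, max_q):
    """Simplified rule: keep lower dependency layers completely and limit
    the quality level only inside the target dependency layer."""
    if info.temporal_id > max_t:
        return False
    if info.dependency_id < max_d:
        return True
    return info.dependency_id == max_d and info.quality_id <= max_q


# Example bytes encoding P=5, D=2, Q=1, T=3.
info = parse_svc_extension(bytes([0x85, 0x21, 0x67]))
```

For the example bytes the parser returns P=5, D=2, Q=1, T=3. The unit is kept for a target of T=3, D=2, Q=1 and dropped when the target temporal level is lowered to 2.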
Combined Scalability
Bit-stream switching
Inside a dependency layer
Switching everywhere
Outside a dependency layer
Switching up only at IDR access units
Switching down everywhere if using multiple-loop
decoding
Profiles of SVC
Scalable Baseline
For conversational and surveillance applications
requiring low decoding complexity
Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
Temporal and quality scalability: arbitrary
No interlaced coding tools
B-slices, weighted prediction, CABAC, and 8x8 luma
transform
The base layer conforms to the Baseline profile of H.264/AVC
Profiles of SVC
Scalable High
For broadcast, streaming, and storage
Spatial, temporal, and quality scalability:
arbitrary
The base layer conforms to the High profile of
H.264/AVC
Scalable High Intra
Scalable High + all IDR pictures
References
H. Schwarz, D. Marpe, and T. Wiegand, “Overview of
the Scalable Video Coding Extension of the H.264/AVC
Standard,” CSVT 2007.
T. Wiegand, “Scalable Video Coding,” Joint Video
Team, doc. JVT-W132, San Jose, USA, April 2007.
T. Wiegand, “Scalable Video Coding,” Digital Image
Communication, Course at Technical University of
Berlin, 2006. (Available on
http://iphome.hhi.de/wiegand/dic.htm)
H. Schwarz, D. Marpe, and T. Wiegand, “Constrained
Inter-Layer Prediction for Single-Loop Decoding in
Spatial Scalability,” Proc. of ICIP’05.