Overview of the Scalable
Video Coding Extension of
the H.264/AVC Standard
Heiko Schwarz, Detlev Marpe, and
Thomas Wiegand
CSVT, Sept. 2007
2009/5
MC2008, VCLAB
Outline
Introduction
Problems
Definition
Functionality
Goal
Competition
Applications
Targets
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Quality Scalability
Combined Scalability
Profiles of SVC
Conclusions
Introduction - problem
Non-Scalable Video Streaming
Multiple video streams are needed for heterogeneous clients
[Figure: clients served by separate streams at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 kb/s]
Introduction - definition
Scalable video stream
[Figure: scalable video stream of sub-streams 1, 2, …, n (reconstruction: high quality); subset of sub-streams k1, k2, …, ki (reconstruction: low quality)]
Scalability
Removal of parts of the video bit-stream to adapt to the various needs of end users and to varying terminal capabilities or network conditions
Introduction - functionality
Functionality of SVC
Graceful degradation when the “right” parts of the bit-stream are lost
Bit-rate adaptation to match the channel throughput
Format adaptation for backwards-compatible extension
Power adaptation for trade-off between runtime and quality
Introduction - goal
Goal of SVC
[Figure: sub-streams k1, k2, …, ki = (quality) H.264/AVC bit-stream]
Scalability modes
Fidelity reduction (SNR scalability)
Picture size reduction (spatial scalability)
Frame rate reduction (temporal scalability)
Sharpness reduction (frequency scalability)
Selection of content (ROI or object-based scalability)
Introduction - competition
SVC is an old research topic (> 20 years) and has been included in H.262/MPEG-2, H.263, and MPEG-4 Visual.
Rarely used because of
The characteristics of traditional video transmission systems
A significant loss of coding efficiency and a large increase in decoder complexity
Competition
Simulcast
Transcoding
Introduction - applications
Applications
Heterogeneous clients
Unequal protection
Surveillance
Problems
Increased decoder complexity
Decreased coding efficiency
Temporal scalability is more often supported than spatial and quality scalability.
Introduction - targets
Targets
Little decrease in coding efficiency
Little increase in decoding complexity
Support of temporal, spatial, and quality scalability
A backward compatible base layer
Simple bit-stream adaptations after encoding
History of SVC
October 2003: MPEG issues a call for proposals on Scalable Video Coding
12 wavelet-based proposals
2 extensions of H.264/AVC
~October 2004: MSRA vs. HHI proposal (wavelet-based vs. H.264/AVC extension)
October 2004: HHI proposal adopted as the starting point (due to reduced encoder and decoder complexity and improved coding efficiency)
January 2005: MPEG and VCEG agree to jointly finalize the SVC project as an amendment of H.264/AVC
Spring 2007: Finalization
Structure of SVC
[Figure: SVC encoder structure. Each spatial layer (linked by spatial decimation) has temporal scalable coding, prediction, base layer coding, and SNR scalable coding; the layer outputs go to a multiplexer]
Outline
Introduction
History of SVC
Structure of SVC
Temporal Scalability
Hierarchical prediction structure
Spatial Scalability
Quality Scalability
Combined Scalability
Profiles of SVC
Conclusions
Temporal Scalability
Hierarchical prediction structures
(The numbers give the coding order of each picture, listed in display order.)
Hierarchical B pictures (one GOP spans the pictures between two key pictures)
0 4 3 5 2 7 6 8 1 12 11 13 10 15 14 16 9
Non-dyadic hierarchical prediction
0 3 4 2 6 7 5 8 9 1 12 13 11 15 16 14 17 18 10
Hierarchical prediction with zero delay
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
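The coding-order numbers of the dyadic hierarchical B-picture structure can be reproduced with a short sketch. It assumes a GOP size of 8 (inferred from the listed numbers) and a depth-first, midpoint-first coding order inside each GOP. The function names are illustrative and do not come from any SVC software.

```python
def gop_coding_order(lo, hi):
    """Display indices strictly between lo and hi, midpoint first (depth-first)."""
    if hi - lo < 2:
        return []
    mid = (lo + hi) // 2
    return [mid] + gop_coding_order(lo, mid) + gop_coding_order(mid, hi)


def coding_numbers(num_gops, gop_size=8):
    """Coding-order number of every picture, listed in display order."""
    order = [0]  # the first key picture is coded first
    for g in range(num_gops):
        lo, hi = g * gop_size, (g + 1) * gop_size
        order.append(hi)                        # key picture that closes the GOP
        order.extend(gop_coding_order(lo, hi))  # then the B pictures of the GOP
    numbers = [0] * len(order)
    for coding_idx, display_idx in enumerate(order):
        numbers[display_idx] = coding_idx
    return numbers


def temporal_level(display_idx, gop_size=8):
    """Temporal level: 0 for key pictures, highest for odd-indexed pictures.

    Assumes gop_size is a power of two.
    """
    if display_idx % gop_size == 0:
        return 0
    level = gop_size.bit_length() - 1  # log2(gop_size)
    while display_idx % 2 == 0:
        display_idx //= 2
        level -= 1
    return level


print(coding_numbers(2))
print([temporal_level(i) for i in range(9)])
```

Dropping every picture of the highest temporal level halves the frame rate, and the remaining pictures can still be decoded because they never reference a higher level.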
Temporal Scalability
Combination with multiple reference pictures
Arbitrary modification of the prediction structure
Issue of quantization
Lower layers with higher fidelity → smaller QPs are used in lower layers
Propagation of quantization error → smaller QPs are used in higher layers
Temporal Scalability
Video coding experiment with H.264/MPEG4-AVC
Foreman, CIF 30Hz @ 1320kbps
Performance as a function of N
Cascaded QP assignment: QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5
N=1: I P P P P P P P P
N=2: I B0 P B0 P B0 P B0 P
N=4: I B1 B0 B1 P B1 B0 B1 P
N=8: I B2 B1 B2 B0 B2 B1 B2 P
This slide is copied from JVT-W132-Talk
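As a small illustration of the cascaded QP assignment, the sketch below reads the slide's line as QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5, which is an interpretation of the slide text. The clamp to 51 assumes the usual 8-bit H.264/AVC QP range. The names `CASCADE_OFFSETS` and `cascaded_qps` are hypothetical.

```python
# Offsets relative to QP(P), read from the cascaded QP assignment on the slide.
CASCADE_OFFSETS = {"P": 0, "B0": 3, "B1": 4, "B2": 5}

# Picture types present for each hierarchy size N used in the experiment.
TYPES_BY_N = {
    1: ["P"],
    2: ["P", "B0"],
    4: ["P", "B0", "B1"],
    8: ["P", "B0", "B1", "B2"],
}


def cascaded_qps(qp_p, n, qp_max=51):
    """Return the QP of each picture type for hierarchy size n."""
    return {t: min(qp_max, qp_p + CASCADE_OFFSETS[t]) for t in TYPES_BY_N[n]}


print(cascaded_qps(28, 8))
```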
Temporal Scalability
Coding efficiency of hierarchical prediction
JSVM11, High profile with CABAC
Only one reference frame
CIF
Temporal Scalability
Compared with IPPP (with and without delay constraint)
Providing temporal scalability usually has no negative impact on coding efficiency
Outline
Introduction
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Inter layer prediction
Inter layer motion prediction
Inter layer residual prediction
Inter layer intra prediction
Quality Scalability
Combined Scalability
Profiles of SVC
Conclusions
Spatial Scalability
[Figure: SVC encoder with spatial layers. Each spatially decimated layer applies hierarchical MCP & intra-prediction and base layer coding of texture and motion; inter-layer prediction (intra, motion, residual) links the layers; the lowest layer is an H.264/AVC compatible coder (H.264/AVC MCP & intra-prediction) that produces an H.264/AVC compatible base layer bit-stream; a multiplexer outputs the scalable bit-stream]
Spatial Scalability
Similar to MPEG-2, H.263, and MPEG-4
Arbitrary resolution ratio
The same coding order in all spatial layers
Combination with temporal scalability
Inter-layer prediction
[Figure: inter-layer prediction between spatial layers 0 and 1 across temporal levels 0, 1, and 2, with intra pictures marked]
Spatial Scalability
The prediction signals are formed by
MCP inside the enhancement layer (temporal), suited to small motion and high spatial detail
Up-sampling from the lower layer (spatial)
Averaging the above two predictions (temporal + spatial)
Inter-layer prediction
Three kinds of inter-layer prediction
Inter-layer motion prediction
Inter-layer residual prediction
Inter-layer intra prediction
Base mode MB
Only the residual is transmitted, with no additional side information.
Spatial Scalability
Inter-layer motion prediction
base_mode_flag = 1
The reference layer is inter-coded
Data are derived from the reference layer
MB partitioning
Reference indices
MVs
[Figure: an 8x8 block in the reference layer, labeled (x1,y1) and (x2,y2), corresponds to a 16x16 MB in the enhancement layer, labeled (2x1,2y1) and (2x2,2y2)]
motion_pred_flag
1: MV predictors are obtained from the reference layer
0: MV predictors are obtained by conventional spatial predictors.
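For the dyadic case suggested by the (x, y) to (2x, 2y) labels and the 8 to 16 block sizes, deriving enhancement-layer motion data from the reference layer can be sketched as follows. The `Partition` type and `upscale_partition` function are hypothetical, and the sketch assumes a resolution ratio of exactly 2 with motion vectors stored as integer pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    x: int          # top-left corner, in luma samples
    y: int
    width: int
    height: int
    ref_idx: int    # reference picture index
    mv: tuple       # motion vector (mvx, mvy)


def upscale_partition(p, ratio=2):
    """Scale geometry and motion vector by the resolution ratio; keep ref_idx."""
    return Partition(
        x=p.x * ratio,
        y=p.y * ratio,
        width=p.width * ratio,
        height=p.height * ratio,
        ref_idx=p.ref_idx,
        mv=(p.mv[0] * ratio, p.mv[1] * ratio),
    )


base = Partition(x=8, y=0, width=8, height=8, ref_idx=1, mv=(3, -1))
print(upscale_partition(base))
```

An 8x8 reference-layer block thus maps to a 16x16 enhancement-layer MB whose partitioning, reference index, and doubled motion vector are inherited rather than transmitted.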
Spatial Scalability
Inter-layer residual prediction
residual_pred_flag = 1
Predictor
Block-wise up-sampling by a bi-linear filter from the corresponding 8x8 sub-MB in the reference layer
Transform block basis
Spatial Scalability
Inter-layer intra prediction
base_mode_flag = 1
The reference layer is intra-coded
Up-sampling from the reference layer
Luma: one-dimensional 4-tap FIR filter
Chroma: bi-linear filter
Spatial Scalability
Past spatial scalable video coding:
Inter-layer intra prediction requires complete decoding of the base layer.
Multiple motion compensation and deblocking filter operations are needed.
Full decoding + inter-layer prediction: complexity > simulcast.
Single-loop decoding
Inter-layer intra prediction is restricted to MBs for which the co-located base layer block is intra-coded
Spatial Scalability
Single-loop vs. multi-loop decoding
[Figure: single-loop vs. multi-loop decoding, with I, P, and B pictures and inter prediction]
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf
Spatial Scalability
Generalized spatial scalability in SVC
Arbitrary ratio
Only restriction: neither the horizontal nor the vertical resolution can decrease from one layer to the next.
Cropping
Containing new regions
Higher quality of interesting regions
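The single restriction on layer resolutions can be expressed as a simple check. This is a minimal sketch; `valid_spatial_layers` is a hypothetical helper, and resolutions are given as (width, height) pairs ordered from the base layer upward.

```python
def valid_spatial_layers(resolutions):
    """True if neither width nor height decreases from one layer to the next."""
    return all(
        w_next >= w and h_next >= h
        for (w, h), (w_next, h_next) in zip(resolutions, resolutions[1:])
    )


print(valid_spatial_layers([(176, 144), (352, 288), (704, 576)]))  # dyadic ratios
print(valid_spatial_layers([(640, 360), (720, 576)]))              # arbitrary ratio
print(valid_spatial_layers([(352, 288), (320, 480)]))              # width decreases
```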
Spatial Scalability
Coding efficiency
Multi-loop > Single-loop
Spatial Scalability
Coding efficiency (IPPP…)
Multi-loop > Single-loop
Spatial Scalability
Encoder control (JSVM)
Base layer
p0 is optimized for the base layer
$p_0' = \arg\min_{\{p_0\}} \{ D_0(p_0) + \lambda_0 R_0(p_0) \}$
Enhancement layer
p1 is optimized for the enhancement layer
$p_1' = \arg\min_{\{p_1 | p_0\}} \{ D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0) \}$
Decisions of p1 depend on p0
Efficient base layer coding but inefficient enhancement layer coding
Spatial Scalability
Encoder control (optimization)
Base layer
Considering enhancement layer coding
Eliminating p0's that disadvantage enhancement layer coding
$p_0' = \arg\min_{\{p_0, p_1 | p_0\}} \{ (1-w)[D_0(p_0) + \lambda_0 R_0(p_0)] + w[D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0)] \}$
Enhancement layer
No change
w
w = 0: JSVM encoder control
w = 1: Single-loop encoder control (base layer is not controlled)
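The weighted base-layer decision can be illustrated with a toy search, assuming each cost is a Lagrangian of the form D + λR and that the best enhancement-layer choice is taken for each base-layer candidate. The candidate table, the λ values, and the function names are made up for illustration and are not taken from the JSVM software.

```python
def lagrangian(distortion, rate, lam):
    """Rate-distortion cost D + lambda * R."""
    return distortion + lam * rate


def choose_base_params(candidates, w, lam0, lam1):
    """Pick the base-layer parameters p0 that minimize the weighted joint cost.

    candidates maps p0 -> (D0, R0, [(D1, R1), ...]), where the list holds the
    enhancement-layer options that are available given p0.
    """
    def joint_cost(p0):
        d0, r0, enh_options = candidates[p0]
        j0 = lagrangian(d0, r0, lam0)
        j1 = min(lagrangian(d1, r1, lam1) for d1, r1 in enh_options)
        return (1 - w) * j0 + w * j1

    return min(candidates, key=joint_cost)


# Hypothetical numbers: "A" is best for the base layer alone,
# "B" lets the enhancement layer be coded more cheaply.
toy = {
    "A": (10.0, 100.0, [(8.0, 90.0), (9.0, 70.0)]),
    "B": (12.0, 105.0, [(5.0, 60.0), (6.0, 40.0)]),
}
for w in (0.0, 0.5, 1.0):
    print(w, choose_base_params(toy, w, lam0=0.1, lam1=0.1))
```

With w = 0 the choice depends only on the base-layer cost, and with w = 1 only on the enhancement-layer cost. Intermediate weights trade the two off.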
Spatial Scalability
Coding efficiency of optimal encoder control
Optimized encoder vs. JSVM encoder (QPE = QPB + 4)
Outline
Introduction
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Quality Scalability
CGS
MGS
Drift control
Combined Scalability
Profiles of SVC
Conclusions
Quality Scalability
Coarse-grain quality scalability (CGS)
A special case of spatial scalability
Smaller quantization step sizes for higher enhancement residual layers
Designed for only several selected bit-rate points
Identical sizes (resolution) for base and enhancement layers
Supported bit-rate points = Number of layers
Switching can only occur at IDR access units
Quality Scalability
Medium-grain quality scalability (MGS)
More enhancement layers are supported
Key pictures
Refinement quality layers of residual
Drift control
Switching can occur at any access unit
CGS + key pictures + refinement quality layers
Quality Scalability
Drift control
Drift: the effect caused by unsynchronized MCP at the encoder and decoder side
Trade-off of MCP in quality SVC
Coding efficiency vs. drift
Quality Scalability
MPEG-4 quality scalability with FGS
[Figure: base layer with refinement (possibly lost or truncated)]
Base layer is stored and used for MCP of following pictures
Refinement data are not used for MCP
Drift: Drift-free
Complexity: Low
Efficiency: Efficient base layer but inefficient enhancement layer
Quality Scalability
MPEG-2 quality scalability (without FGS)
[Figure: base layer with refinement (possibly lost or truncated)]
Only 1 reference picture is stored and used for MCP of following pictures
Drift: Both base layer and enhancement layer
Frequent intra updates are necessary
Complexity: Low
Efficiency: Efficient enhancement layer but inefficient base layer
Quality Scalability
2-loop prediction
[Figure: base layer with refinement (possibly lost or truncated)]
Several closed encoder loops run at different bit-rate points in a layered structure
Drift: Enhancement layer
Complexity: High
Efficiency: Efficient base layer and moderately efficient enhancement layer
Quality Scalability
SVC concepts
[Figure: base layer with refinement (possibly lost or truncated)]
Key picture
Trade-off between coding efficiency and drift
MPEG-4 FGS: All key pictures
MPEG-2 quality scalability: Non-key pictures
Quality Scalability
Drift control with hierarchical prediction
[Figure: base layer and refinement (possibly lost or truncated) of P, B1, and B2 pictures in hierarchical GOPs]
Key pictures
Base layer is stored and used for the MCP of following pictures
Other pictures
Enhancement layer is stored and used for the MCP of following pictures
GOP size adjusts the trade-off between enhancement layer coding efficiency and drift
Quality Scalability
Comparisons of drift control
[Figure: drift control concepts compared along two axes: high vs. low efficiency, and drift-free vs. drift]
Quality Scalability
Comparisons of coding efficiency
$Q_{step} = 2^{(QP-4)/6}$
[Figure: coding efficiency comparison for high dQP and low dQP]
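Reading the QSTEP line as the quantization step size growing exponentially with QP, that is Qstep = 2^((QP-4)/6), a short sketch shows what a QP difference (dQP) between layers means in terms of step size. The function names are illustrative.

```python
def qstep(qp):
    """Quantization step size for a given QP, assuming Qstep = 2^((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6.0)


def step_ratio(dqp):
    """Ratio between the step sizes of two layers whose QPs differ by dqp."""
    return 2.0 ** (dqp / 6.0)


for qp in (4, 10, 28, 34):
    print(qp, qstep(qp))
print(step_ratio(6), step_ratio(2))
```

Under this relation, a dQP of 6 between the base and enhancement layer doubles the quantization step size, while a small dQP such as 2 changes it by a factor of about 1.26.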
Quality Scalability
MGS with key pictures using optimized encoder control
Only base layer
Outline
Introduction
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Quality Scalability
Combined Scalability
SVC encoder structure
Dependency and Quality refinement layers
Bit-stream format
Bit-stream switching
Profiles of SVC
Conclusions
Combined Scalability
SVC encoder structure
[Figure: SVC encoder structure with temporal decomposition in each dependency layer; figure labels: "Dependency layer", "The same motion/prediction information"]
Combined Scalability
Dependency and Quality refinement layers
[Figure: scalable bit-stream with dependency layers D=0, D=1, and D=2, each containing quality refinement layers Q=0, Q=1, and Q=2]
Combined Scalability
[Figure: combined scalability with dependency layers D0 and D1, quality layers Q0 and Q1 in each, and temporal levels T0, T1, and T2 across the pictures]
Combined Scalability
Bit-stream format
[Figure: NAL unit layout: NAL unit header, NAL unit header extension, NAL unit payload. The extension carries the fields P, T, D, and Q (labeled 6, 3, 3, and 2 in the figure) along with five fields labeled 1]
P (priority_id): indicates the importance of a NAL unit
T (temporal_id): indicates the temporal level
D (dependency_id): indicates the spatial/CGS layer
Q (quality_id): indicates the MGS/FGS layer
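Because every NAL unit carries T, D, and Q identifiers, a sub-stream can be obtained by simply discarding NAL units. The sketch below is a simplified illustration: the `NalUnit` type and `extract_substream` function are hypothetical, the NAL units are already-parsed records rather than real bit-stream data, and all quality layers of lower dependency layers are kept, which is a conservative assumption rather than the behavior of any particular extractor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    priority_id: int
    temporal_id: int
    dependency_id: int
    quality_id: int


def extract_substream(nal_units, max_t, max_d, max_q):
    """Keep the NAL units needed for the target (T, D, Q) operating point."""
    kept = []
    for nal in nal_units:
        if nal.temporal_id > max_t or nal.dependency_id > max_d:
            continue  # above the target frame rate or spatial/CGS layer
        if nal.dependency_id == max_d and nal.quality_id > max_q:
            continue  # above the target quality in the top dependency layer
        kept.append(nal)
    return kept


# Toy stream: 2 dependency layers x 2 quality layers x 3 temporal levels.
stream = [
    NalUnit(priority_id=0, temporal_id=t, dependency_id=d, quality_id=q)
    for d in range(2) for q in range(2) for t in range(3)
]
print(len(extract_substream(stream, max_t=1, max_d=1, max_q=0)))
print(len(extract_substream(stream, max_t=0, max_d=0, max_q=0)))
```

The priority_id field is not used here. An extractor could use it instead to rank NAL units by importance when adapting to a bit-rate budget.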
Combined Scalability
Bit-stream switching
Inside a dependency layer
Switching everywhere
Outside a dependency layer
Switching up only at IDR access units
Switching down everywhere if using multi-loop decoding
Outline
Introduction
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Quality Scalability
Combined Scalability
Profiles of SVC
Scalable Baseline
Scalable High
Scalable High Intra
Conclusions
Profiles of SVC
Scalable Baseline
For conversational and surveillance applications requiring low decoding complexity
Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
Temporal and quality scalability: arbitrary
No interlaced coding tools
Enhancement layers support B-slices, weighted prediction, CABAC, and the 8x8 luma transform
The base layer conforms to the Baseline profile of H.264/AVC
Profiles of SVC
Scalable High
For broadcast, streaming, and storage
Spatial, temporal, and quality scalability: arbitrary
The base layer conforms to the High profile of H.264/AVC
Scalable High Intra
Scalable High + all IDR pictures
Conclusions
Temporal scalability
Hierarchical prediction structure
Spatial and quality scalability
Inter-layer prediction of intra, motion, and residual information
Single-loop MC decoding
Identical size for each spatial layer – CGS
CGS + key pictures + quality refinement layer – MGS
Applications
Power adaptation – decoding only the needed part of the video stream
Graceful degradation – when the “right” parts are lost
Format adaptation – backwards-compatible extension in mobile TV
What’s next in SVC?
Bit-depth scalability (8-bit 4:2:0 → 10-bit 4:2:0)
Color format scalability (4:2:0 → 4:4:4)
References
H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” CSVT, Sept. 2007.
T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
T. Wiegand, “Scalable Video Coding,” Digital Image Communication, Course at Technical University of Berlin, 2006. (Available at http://iphome.hhi.de/wiegand/dic.htm)
H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. of ICIP’05.