Overview of the Scalable
Video Coding Extension of
the H.264/AVC Standard
Heiko Schwarz, Detlev Marpe, and
Thomas Wiegand
CSVT, Sept. 2007
2009/5
MC2008, VCLAB
Outline

- Introduction
  - Problems
  - Definition
  - Functionality
  - Goal
  - Competition
  - Applications
  - Targets
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions
2007/8
Introduction - problem

Non-Scalable Video Streaming

- Multiple video streams are needed for heterogeneous clients

[Figure: streams at 8 Mb/s, 512 Kb/s, 1 Mb/s, 6 Mb/s, and 4 Mb/s]
Introduction - definition

Scalable video stream

[Figure: a scalable video stream made of sub-streams 1 to n; reconstruction from sub-streams k1, k2, …, ki ranges from low quality to high quality]

Scalability

- Removal of parts of the video bit-stream to adapt to the various needs of end users and to varying terminal capabilities or network conditions
Introduction - functionality

Functionality of SVC

- Graceful degradation when “right” parts of the bit-stream are lost
- Bit-rate adaptation to match the channel throughput
- Format adaptation for backwards compatible extension
- Power adaptation for trade-off between runtime and quality
Introduction - goal

Goal of SVC

[Figure: sub-streams k1, k2, …, ki = (quality) H.264/AVC bit-stream]

Scalability mode

- Fidelity reduction (SNR scalability)
- Picture size reduction (spatial scalability)
- Frame rate reduction (temporal scalability)
- Sharpness reduction (frequency scalability)
- Selection of content (ROI or object-based scalability)
Introduction - competition

SVC is an old research topic (> 20 years) and has been included in H.262/MPEG-2, H.263, and MPEG-4 Visual.

Rarely used because of

- The characteristics of traditional video transmission systems
- Significant loss of coding efficiency and large increase in decoder complexity

Competition

- Simulcast
- Transcoding
Introduction - applications

Applications

- Heterogeneous clients
- Unequal protection
- Surveillance

Problems

- Increased decoder complexity
- Decreased coding efficiency
- Temporal scalability is more often supported than spatial and quality scalability.
Introduction - targets

Targets

- Little decrease in coding efficiency
- Little increase in decoding complexity
- Support of temporal, spatial, and quality scalability
- A backward compatible base layer
- Simple bit-stream adaptations after encoding
History of SVC



- October 2003: MPEG issues a call for proposals on Scalable Video Coding
  - 12 wavelet-based proposals
  - 2 extensions of H.264/AVC
- ~October 2004: MSRA vs. HHI proposal (wavelet-based vs. H.264 extension)
- October 2004: HHI proposal adopted as the starting point (due to reduced encoder and decoder complexity and improved coding efficiency)
- January 2005: MPEG and VCEG agree to jointly finalize the SVC project as an Amendment of H.264/AVC
- Spring 2007: Finalization
Structure of SVC
[Figure: block diagram of the SVC structure. Layers are connected by spatial decimation; each layer contains temporal scalable coding, prediction, base layer coding, and SNR scalable coding blocks, and the outputs feed a multiplex.]
Outline

- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
  - Hierarchical prediction structure
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions
Temporal Scalability

Hierarchical prediction structures

The numbers give the coding order of the pictures, which are listed in display order.

- Hierarchical B pictures (GOP of 8 pictures)
  0 4 3 5 2 7 6 8 1 12 11 13 10 15 14 16 9
- Non-dyadic hierarchical prediction
  0 3 4 2 6 7 5 8 9 1 12 13 11 15 16 14 17 18 10
- Hierarchical prediction with zero delay
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
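
The coding-order numbers listed for the dyadic hierarchical B structure can be reproduced with a short sketch. It assumes a GOP of 8 pictures, that the key picture closing each GOP is coded first, and that the remaining pictures are coded by recursively taking the middle picture of each interval, left half before right half. The function name `hierarchical_coding_order` is illustrative and is not taken from the JSVM software.

```python
def hierarchical_coding_order(num_gops, gop_size=8):
    """Return, for each picture in display order, its coding-order index.

    Assumes a dyadic hierarchy, so gop_size should be a power of two.
    """
    total = num_gops * gop_size + 1
    order = [None] * total
    order[0] = 0  # the first picture is coded first
    counter = 1

    def code_interval(lo, hi):
        # Code the middle picture of (lo, hi), then recurse into both halves.
        nonlocal counter
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order[mid] = counter
        counter += 1
        code_interval(lo, mid)
        code_interval(mid, hi)

    for g in range(num_gops):
        lo, hi = g * gop_size, (g + 1) * gop_size
        order[hi] = counter  # key picture at the end of the GOP is coded first
        counter += 1
        code_interval(lo, hi)
    return order


print(hierarchical_coding_order(2))
# prints [0, 4, 3, 5, 2, 7, 6, 8, 1, 12, 11, 13, 10, 15, 14, 16, 9]
```

Dropping every picture coded at the deepest recursion level halves the frame rate while leaving all remaining references intact, which is what makes this structure temporally scalable.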
Temporal Scalability



- Combination with multiple reference pictures
- Arbitrary modification of the prediction structure
- Issue of quantization
  - Lower layers with higher fidelity → smaller QPs are used in lower layers
  - Propagation of quantization error → smaller QPs are used in higher layers
Temporal Scalability
Video Coding Experiment with H.264/MPEG4-AVC

- Foreman, CIF 30 Hz @ 1320 kbps
- Performance as a function of N
  - N=1: I P P P P P P P P
  - N=2: I B0 P B0 P B0 P B0 P
  - N=4: I B1 B0 B1 P B1 B0 B1 P
  - N=8: I B2 B1 B2 B0 B2 B1 B2 P
- Cascaded QP assignment: QP(P) = QP(B0)-3 = QP(B1)-4 = QP(B2)-5

This slide is copied from JVT-W132-Talk
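
As a minimal sketch of the cascaded QP assignment on this slide, the relation is read here as QP(B0) = QP(P) + 3, QP(B1) = QP(P) + 4, QP(B2) = QP(P) + 5. That reading, the function name `cascaded_qps`, and the clamp to the H.264/AVC maximum QP of 51 are assumptions made for illustration, not the JSVM implementation.

```python
def cascaded_qps(qp_p, num_b_levels=3, max_qp=51):
    """Return the QP for P pictures and for each B level B0, B1, ...

    Assumed offsets (read off the slide): +3 for B0, +4 for B1, +5 for B2.
    """
    qps = {"P": qp_p}
    for level in range(num_b_levels):
        # Each further B level gets a QP that is one step coarser.
        qps[f"B{level}"] = min(max_qp, qp_p + 3 + level)
    return qps


print(cascaded_qps(28))
# prints {'P': 28, 'B0': 31, 'B1': 32, 'B2': 33}
```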
Temporal Scalability

Coding efficiency of hierarchical prediction

- JSVM11, High profile with CABAC
- Only one reference frame
- CIF
Temporal Scalability

Compared with IPPP (with and without delay constraint)

- Providing temporal scalability usually doesn’t have any negative impact on coding efficiency
Outline

- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
  - Inter-layer prediction
    - Inter-layer motion prediction
    - Inter-layer residual prediction
    - Inter-layer intra prediction
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions
Spatial Scalability
[Figure: SVC encoder structure for spatial scalability. Layers are connected by spatial decimation. The lowest layer uses an H.264/AVC compatible coder (H.264/AVC MCP & intra-prediction, base layer coding of texture and motion) and yields an H.264/AVC compatible base layer bit-stream. The higher layers use hierarchical MCP & intra-prediction and base layer coding of texture and motion, with inter-layer prediction (intra, motion, residual). All layers are multiplexed into the scalable bit-stream.]
Spatial Scalability





- Similar to MPEG-2, H.263, and MPEG-4
- Arbitrary resolution ratio
- The same coding order in all spatial layers
- Combination with temporal scalability
- Inter-layer prediction

[Figure: spatial layers 0 and 1 with temporal levels 0, 1, and 2 and intra pictures, linked by inter-layer prediction]
Spatial Scalability

The prediction signals are formed by

- MCP inside the enhancement layer (temporal) (small motion and high spatial detail)
- Up-sampling from the lower layer (spatial)
- Average of the above two predictions (temporal + spatial)

Inter-layer prediction

- Three kinds of inter-layer prediction
  - Inter-layer motion prediction
  - Inter-layer residual prediction
  - Inter-layer intra prediction
- Base mode MB
  - Only the residual is transmitted, with no additional side information
Spatial Scalability

Inter-layer motion prediction

- base_mode_flag = 1
  - The reference layer is inter-coded
  - Data are derived from the reference layer
    - MB partitioning
    - Reference indices
    - MVs
- motion_pred_flag
  - 1: MV predictors are obtained from the reference layer
  - 0: MV predictors are obtained by conventional spatial predictors

[Figure: an 8x8 block in the reference layer with MVs (x1,y1) and (x2,y2) corresponds to a 16x16 macroblock in the enhancement layer with MVs (2x1,2y1) and (2x2,2y2)]
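
The derivation of motion data from the reference layer can be sketched for the dyadic case shown in the figure, where both dimensions double between layers. The `MotionBlock` type and the `derive_enhancement_block` function are hypothetical names introduced for this illustration. The sketch only scales the block geometry and the motion vector and copies the reference index; it does not reproduce the full derivation process of the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionBlock:
    x: int          # top-left corner, in luma samples
    y: int
    width: int
    height: int
    mv: tuple       # motion vector (mv_x, mv_y)
    ref_idx: int    # reference picture index


def derive_enhancement_block(ref, ratio=2):
    """Scale a reference-layer block to the enhancement layer.

    Assumes the same integer ratio horizontally and vertically.
    """
    return MotionBlock(
        x=ref.x * ratio,
        y=ref.y * ratio,
        width=ref.width * ratio,
        height=ref.height * ratio,
        mv=(ref.mv[0] * ratio, ref.mv[1] * ratio),
        ref_idx=ref.ref_idx,  # reference indices are copied unchanged
    )


base = MotionBlock(x=8, y=0, width=8, height=8, mv=(3, -1), ref_idx=0)
print(derive_enhancement_block(base))
```

An 8x8 block of the reference layer therefore becomes a 16x16 block with a doubled motion vector, matching the (x1,y1) to (2x1,2y1) mapping in the figure.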
Spatial Scalability

Inter-layer residual prediction

- residual_pred_flag = 1
- Predictor
  - Block-wise up-sampling by a bi-linear filter from the corresponding 8×8 sub-MB in the reference layer
  - Transform block basis
Spatial Scalability

Inter-layer intra prediction

- base_mode_flag = 1
- The reference layer is intra-coded
- Up-sampling from the reference layer
  - Luma: one-dimensional 4-tap FIR filter
  - Chroma: bi-linear filter
Spatial Scalability

Past spatial scalable video

- Inter-layer intra prediction requires complete decoding of the base layer.
- Multiple motion compensation and deblocking filter operations are needed.
- Full decoding + inter-layer prediction: complexity > simulcast.

Single-loop decoding

- Inter-layer intra prediction is restricted to MBs for which the co-located base layer block is intra-coded
Spatial Scalability

Single-loop vs. multi-loop decoding

[Figure: single-loop vs. multi-loop decoding; labels: Inter, I, B, P]

This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf
Spatial Scalability

Generalized spatial scalability in SVC

- Arbitrary ratio
  - Only restriction: neither the horizontal nor the vertical resolution can decrease from one layer to the next.
- Cropping
  - Containing new regions
  - Higher quality of interesting regions
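
The single restriction on layer resolutions can be expressed as a small check. This is an illustrative sketch, with `is_valid_layer_order` as a made-up helper name; it takes the layer sizes as (width, height) pairs ordered from the base layer upward and ignores cropping.

```python
def is_valid_layer_order(resolutions):
    """Check that neither width nor height decreases from layer to layer."""
    return all(
        w_next >= w_prev and h_next >= h_prev
        for (w_prev, h_prev), (w_next, h_next) in zip(resolutions, resolutions[1:])
    )


print(is_valid_layer_order([(176, 144), (352, 288), (704, 576)]))  # dyadic ratios
print(is_valid_layer_order([(320, 240), (480, 360)]))              # ratio 1.5
print(is_valid_layer_order([(352, 288), (320, 480)]))              # width shrinks
```

The first two calls return `True` and the last returns `False`. Equal sizes in consecutive layers also pass the check.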
Spatial Scalability

Coding efficiency

- Multiple-loop > Single-loop
Spatial Scalability

Coding efficiency (IPPP…)

- Multi-loop > Single-loop
Spatial Scalability

Encoder control (JSVM)

- Base layer
  - p0 is optimized for the base layer
    $p_0' = \arg\min_{\{p_0\}} \{ D_0(p_0) + \lambda_0 R_0(p_0) \}$
- Enhancement layer
  - p1 is optimized for the enhancement layer
    $p_1' = \arg\min_{\{p_1 | p_0\}} \{ D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0) \}$
  - Decisions of p1 depend on p0
- Efficient base layer coding but inefficient enhancement layer coding
Spatial Scalability

Encoder control (optimization)

- Base layer
  - Considering enhancement layer coding
  - Eliminating choices of p0 that disadvantage enhancement layer coding
    $p_0' = \arg\min_{\{p_0, p_1 | p_0\}} \{ (1 - w)[D_0(p_0) + \lambda_0 R_0(p_0)] + w[D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0)] \}$
- Enhancement layer
  - No change
- w
  - w = 0: JSVM encoder control
  - w = 1: Single-loop encoder control (base layer is not controlled)
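
The effect of the weight w can be illustrated with a toy search over made-up candidates. In this sketch each base-layer choice p0 has a distortion D0 and rate R0, and a set of enhancement-layer choices p1 with D1 and R1 conditional on that p0. The function `select_parameters`, the candidate names, and all numbers are hypothetical; a real encoder evaluates coding modes and motion parameters instead of a lookup table.

```python
def lagrangian(distortion, rate, lam):
    """Rate-distortion cost D + lambda * R."""
    return distortion + lam * rate


def select_parameters(candidates, lam0, lam1, w):
    """Pick (p0, p1) minimizing (1 - w) * J0(p0) + w * J1(p1 | p0)."""
    best = None
    for p0, (d0, r0, enh_options) in candidates.items():
        j0 = lagrangian(d0, r0, lam0)
        # Best enhancement-layer choice given this p0.
        p1, j1 = min(
            ((name, lagrangian(d1, r1, lam1)) for name, (d1, r1) in enh_options.items()),
            key=lambda item: item[1],
        )
        cost = (1 - w) * j0 + w * j1
        if best is None or cost < best[0]:
            best = (cost, p0, p1)
    return best[1], best[2]


# Made-up numbers: p0_a is cheaper for the base layer alone,
# p0_b leaves the enhancement layer in a much better position.
candidates = {
    "p0_a": (10.0, 4.0, {"p1_x": (8.0, 6.0), "p1_y": (9.0, 3.0)}),
    "p0_b": (12.0, 4.0, {"p1_x": (3.0, 4.0), "p1_y": (5.0, 5.0)}),
}

print(select_parameters(candidates, 1.0, 1.0, w=0.0))  # prints ('p0_a', 'p1_y')
print(select_parameters(candidates, 1.0, 1.0, w=0.5))  # prints ('p0_b', 'p1_x')
```

With w = 0 the base layer is chosen on its own cost, as in the JSVM control; raising w lets a slightly worse base-layer choice win when it pays off in the enhancement layer.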
Spatial Scalability

Coding efficiency of optimal encoder control

- Optimized encoder vs. JSVM encoder (QPE = QPB + 4)
Outline

- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
  - CGS
  - MGS
  - Drift control
- Combined Scalability
- Profiles of SVC
- Conclusions
Quality Scalability

Coarse-grain quality scalability (CGS)

- A special case of spatial scalability
  - Identical sizes (resolution) for base and enhancement layers
- Smaller quantization step sizes for higher enhancement residual layers
- Designed for only several selected bit-rate points
  - Supported bit-rate points = Number of layers
  - Switch can only occur at IDR access units
Quality Scalability

Medium-grain quality scalability (MGS)

- More enhancement layers are supported
  - Refinement quality layers of residual
- Key pictures
  - Drift control
- Switch can occur at any access unit
- CGS + key pictures + refinement quality layers
Quality Scalability

Drift control

- Drift: the effect caused by unsynchronized MCP at the encoder and decoder side
- Trade-off of MCP in quality SVC
  - Coding efficiency vs. drift
Quality Scalability

MPEG-4 quality scalability with FGS

[Figure: base layer and refinement (possibly lost or truncated)]

- Base layer is stored and used for MCP of following pictures
  - Refinement data are not used for MCP
- Drift: Drift free
- Complexity: Low
- Efficiency: Efficient base layer but inefficient enhancement layer
Quality Scalability

MPEG-2 quality scalability (without FGS)

[Figure: base layer and refinement (possibly lost or truncated)]

- Only 1 reference picture is stored and used for MCP of following pictures
- Drift: Both base layer and enhancement layer
  - Frequent intra updates are necessary
- Complexity: Low
- Efficiency: Efficient enhancement layer but inefficient base layer
Quality Scalability

2-loop prediction

[Figure: base layer and refinement (possibly lost or truncated)]

- Several closed encoder loops run at different bit-rate points in a layered structure
- Drift: Enhancement layer
- Complexity: High
- Efficiency: Efficient base layer and medium efficient enhancement layer
Quality Scalability

SVC concepts

[Figure: base layer and refinement (possibly lost or truncated), with key pictures]

- Key picture
  - Trade-off between coding efficiency and drift
  - MPEG-4 FGS: All key pictures
  - MPEG-2 quality scalability: Non-key pictures
Quality Scalability

Drift control with hierarchical prediction

[Figure: base layer and refinement (possibly lost or truncated) over a hierarchical structure of P, B1, and B2 pictures]

- Key pictures
  - Base layer is stored and used for the MCP of following pictures
- Other pictures
  - Enhancement layer is stored and used for the MCP of following pictures
- GOP size adjusts the trade-off between enhancement layer coding efficiency and drift
Quality Scalability

Comparisons of drift control

[Figure: drift-control schemes compared; labels: high efficiency, low efficiency, drift-free, drift]
Quality Scalability

Comparisons of coding efficiency

- QSTEP = 2^((QP-4)/6)

[Figure: coding-efficiency comparison; labels: high dQP, low dQP]
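
The quantization step relation on this slide, read as QSTEP = 2^((QP-4)/6), can be evaluated directly. The sketch below is only an illustration of that formula. It assumes that dQP denotes the QP difference between two quality layers, and the helper names `qstep` and `qstep_ratio` are made up here.

```python
def qstep(qp):
    """Quantization step size as a function of QP: 2 ** ((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6)


def qstep_ratio(dqp):
    """Factor by which the step size grows when QP increases by dqp."""
    return 2.0 ** (dqp / 6)


print(qstep(4))        # step size 1.0 at QP = 4
print(qstep(28))       # step size 16.0 at QP = 28
print(qstep_ratio(6))  # +6 in QP doubles the step size
```

Under this relation a dQP of 6 between layers corresponds to halving the step size in the enhancement layer, and smaller dQP values give finer rate steps.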
Quality Scalability

MGS with key pictures using optimized encoder control

[Figure label: Only base layer]
Outline

- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
  - SVC encoder structure
  - Dependency and quality refinement layers
  - Bit-stream format
  - Bit-stream switching
- Profiles of SVC
- Conclusions
Combined Scalability

SVC encoder structure

[Figure: SVC encoder structure; labels: dependency layer, temporal decomposition, the same motion/prediction information (shown twice)]
Combined Scalability

Dependency and quality refinement layers

[Figure: the scalable bitstream consists of dependency layers D=0, D=1, and D=2, each with quality refinement layers Q=0, Q=1, and Q=2]
Combined Scalability
[Figure: dependency layers D0 and D1, each with quality layers Q0 and Q1, over pictures at temporal levels T0, T2, T1, T2, T0]
Combined Scalability

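Bit-stream format

[Figure: a NAL unit consists of the NAL unit header, the NAL unit header extension, and the NAL unit payload. Field widths shown for the extension: P = 6 bits, T = 3 bits, D = 3 bits, Q = 2 bits, plus several 1-bit fields.]
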
P (priority_id): indicates the importance of a NAL unit
T (temporal_id): indicates temporal level
D (dependency_id): indicates spatial/CGS layer
Q (quality_id): indicates MGS/FGS layer
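
Because every NAL unit carries T, D, and Q, a sub-stream can be obtained by dropping NAL units without touching the payload. The following is a simplified sketch of that idea. The `NalUnit` type and the `extract_substream` function are hypothetical, header parsing is left out, and the rule of keeping all quality layers of the lower dependency layers is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    priority_id: int
    temporal_id: int
    dependency_id: int
    quality_id: int
    payload: bytes = b""


def extract_substream(nal_units, max_t, target_d, max_q):
    """Keep the NAL units needed for temporal level max_t,
    dependency layer target_d, and quality level max_q."""
    kept = []
    for nal in nal_units:
        if nal.temporal_id > max_t:
            continue  # higher frame rate than requested
        if nal.dependency_id > target_d:
            continue  # higher spatial/CGS layer than requested
        if nal.dependency_id == target_d and nal.quality_id > max_q:
            continue  # extra quality refinement of the target layer
        kept.append(nal)
    return kept


# Toy stream: 3 temporal levels x 2 dependency layers x 2 quality layers.
stream = [
    NalUnit(priority_id=0, temporal_id=t, dependency_id=d, quality_id=q)
    for t in range(3) for d in range(2) for q in range(2)
]
sub = extract_substream(stream, max_t=1, target_d=1, max_q=0)
print(len(stream), len(sub))  # prints 12 6
```

This kind of filtering is what makes bit-stream adaptation after encoding simple: the extractor inspects only the header fields.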
Combined Scalability

Bit-stream switching

- Inside a dependency layer
  - Switching everywhere
- Outside a dependency layer
  - Switching up only at IDR access units
  - Switching down everywhere if using multiple-loop decoding
Outline

- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
  - Scalable Baseline
  - Scalable High
  - Scalable High Intra
- Conclusions
Profiles of SVC

Scalable Baseline

- For conversational and surveillance applications requiring low decoding complexity
- Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
- Temporal and quality scalability: arbitrary
- No interlaced coding tools
- B-slices, weighted prediction, CABAC, and 8x8 luma transform
- The base layer conforms to the Baseline profile of H.264/AVC
Profiles of SVC

Scalable High

- For broadcast, streaming, and storage
- Spatial, temporal, and quality scalability: arbitrary
- The base layer conforms to the High profile of H.264/AVC

Scalable High Intra

- Scalable High + all IDR pictures
Conclusions




- Temporal scalability
  - Hierarchical prediction structure
- Spatial and quality scalability
  - Inter-layer prediction of intra, motion, and residual information
  - Single-loop MC decoding
  - Identical size for each spatial layer – CGS
  - CGS + key pictures + quality refinement layer – MGS
- Applications
  - Power adaptation – decoding only the needed part of the video stream
  - Graceful degradation – when “right” parts are lost
  - Format adaptation – backwards compatible extension in mobile TV
- What’s next in SVC?
  - Bit-depth scalability (8-bit 4:2:0 → 10-bit 4:2:0)
  - Color format scalability (4:2:0 → 4:4:4)
References




- H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” CSVT 2007.
- T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
- T. Wiegand, “Scalable Video Coding,” Digital Image Communication, Course at Technical University of Berlin, 2006. (Available on http://iphome.hhi.de/wiegand/dic.htm)
- H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. of ICIP’05.