Multimedia Communications


Scalable Video Coding
Scalable Extension of H.264 / AVC
Scalable Video Coding





Video streaming over the Internet is growing in popularity, driven by video conferencing and video telephony applications.
The heterogeneous, dynamic, best-effort nature of the Internet motivates a scalability feature that adapts video streams to fluctuations in the available bandwidth.
The goal is to optimize the video quality over a large range of bit rates.
A video bit stream is called scalable if part of the stream can be removed in such a way that the resulting bit stream is still decodable.
Scalability here implies:
 Single encode
 Multiple possibilities to transmit and decode the bitstream
Scalable Video Coding
A video bit stream is called scalable if part of the stream can be
removed in such a way that the resulting bit stream is still
decodable, to adapt to the various needs of end users and to
varying terminal capabilities or network conditions.
SVC - Standardization
SVC Principle : one encoding
SVC Principle : multiple decoding
H.264/AVC Simulcast vs. SVC

Typical quality gains from SVC spatial scalability (as opposed to simulcast) are in the range of 0.5 dB to 1.5 dB PSNR,
or equivalently a 10 to 30% bit-rate reduction.
The gap widens when there is more than one SNR layer per spatial layer.
H.264 Simulcast vs. SVC
[Figure: Y-PSNR (dB, axis 37 to 45) versus bitrate (0 to 10000 kbps) for the ManInRestaurent sequence, comparing H.264 simulcast of HD + SD (1920x1080 + 960x540) with SVC using 2 spatial layers (1920x1080 <-> 960x540).]
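As a rough illustration of the 10 to 30% bit-rate reduction quoted for SVC over simulcast, the arithmetic can be sketched as follows. The HD and SD rates are hypothetical values chosen for the example, not numbers read from the plot, and `svc_bitrate_range` is an illustrative helper name.

```python
def svc_bitrate_range(simulcast_kbps, min_saving=0.10, max_saving=0.30):
    """Return (lowest, highest) expected SVC bit rate in kbps at equal quality.

    The default savings are the 10% and 30% bounds quoted on the slide.
    """
    return (simulcast_kbps * (1.0 - max_saving),
            simulcast_kbps * (1.0 - min_saving))


# Hypothetical operating point: these two rates are assumptions.
hd_kbps = 6000.0
sd_kbps = 1500.0

# Simulcast sends both streams independently, so the rates add up.
simulcast_total = hd_kbps + sd_kbps
svc_low, svc_high = svc_bitrate_range(simulcast_total)
print(f"simulcast: {simulcast_total:.0f} kbps, "
      f"SVC: {svc_low:.0f} to {svc_high:.0f} kbps")
```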
Functionalities and Applications


SVC can reconstruct lower-resolution or lower-quality signals from partial bit streams.
Partial decoding of the bit stream allows
 Graceful degradation in case part of the bit stream is lost
 Bit-rate adaptation
 Format adaptation
 Power adaptation
Beneficial for transmission services with uncertainties regarding
 Resolution required at the terminal
 Channel conditions or device types
SVC Basics





Straightforward extension to H.264 with very limited added complexity
Layered approach
 One base layer
 One or more enhancement layers
Base layer is H.264/AVC compliant.
An H.264 decoder can therefore decode the base layer of an SVC stream.
Enhancement layers enable Temporal, Spatial or Quality (SNR) scalability.
SVC Profiles

SVC Standard defines 3 profiles
Scalable Baseline profile
 Targeted for conversational and surveillance applications.
 Support for spatial scalable coding is restricted to ratios of 1.5 and 2 between successive spatial layers.
 Interlaced video not supported.
Scalable High profile
 Designed for broadcast, storage and streaming applications.
 Spatial scalable coding with arbitrary resolution ratios supported.
 Interlaced video supported.
Scalable High Intra profile
 Designed for professional applications.
 Contains only IDR pictures for all layers.
 All other coding tools are the same as in the Scalable High profile.
Temporal Scalability
(Dyadic prediction structure)
[Figure: dyadic hierarchical prediction structure over one Group of Pictures (GOP). Temporal layers in display order between the GOP borders: T0 T3 T2 T3 T1 T3 T2 T3 T0. The T0 pictures at the GOP borders are key pictures.]
Frame rate = 30 fps with all layers, 15 fps up to T2, 7.5 fps up to T1, 3.75 fps with T0 only
Tx: Temporal Layer Identifier
Structural Delay = 7 frames
Key Picture: Typically Intra-coded
Hierarchically predicted B Pictures: Motion-Compensated Prediction
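The dyadic layer assignment above can be reproduced with a few lines of Python. This is a minimal sketch assuming a power-of-two GOP size and pictures indexed in display order. The function names `temporal_id` and `frame_rate` are illustrative and not taken from the standard.

```python
def temporal_id(picture_index, gop_size):
    """Temporal layer of a picture (display order) in a dyadic GOP."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1        # log2(gop_size)
    pos = picture_index % gop_size
    if pos == 0:
        return 0                              # key picture at a GOP border
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def frame_rate(full_fps, tid, gop_size):
    """Frame rate obtained when decoding temporal layers 0..tid."""
    levels = gop_size.bit_length() - 1
    return full_fps / (2 ** (levels - tid))


ids = [temporal_id(i, 8) for i in range(9)]
rates = {t: frame_rate(30.0, t, 8) for t in range(4)}
print(ids)    # [0, 3, 2, 3, 1, 3, 2, 3, 0]
print(rates)  # {0: 3.75, 1: 7.5, 2: 15.0, 3: 30.0}
```

Dropping every picture whose identifier exceeds a chosen value halves the frame rate once per removed layer, which is why the slide lists 30, 15, 7.5 and 3.75 fps.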
Hierarchical B-pictures
• Above is a non-dyadic prediction structure, which provides 2 independently decodable subsequences with 1/9th and 1/3rd of the full frame rate.
• Structural delay = 8 frames
• Above is a non-dyadic prediction structure, which provides zero structural delay but lower coding efficiency than the examples above.
• The chosen prediction structure need not be constant over time. It can be arbitrarily modified, e.g., to improve coding efficiency.
Figure courtesy “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” Schwarz et al., IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007
Group Of Pictures (GOP)

IPP: GOP Size 1
 No temporal scalability
 Only Temporal Level 0
IBP: GOP Size 2
 Temporal Levels 0, 1
GOP Size 4
 Temporal Levels 0, 1, 2
GOP Size 8
 Temporal Levels 0, 1, 2, 3
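The relation between GOP size and temporal levels listed above can be checked with a short sketch. It assumes dyadic structures only, and `pictures_per_level` is an illustrative helper name.

```python
def pictures_per_level(gop_size):
    """Map each temporal level to its number of pictures in one dyadic GOP."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1        # log2(gop_size)
    counts = {0: 1}                           # one key picture per GOP
    for level in range(1, levels + 1):
        counts[level] = 2 ** (level - 1)      # each level doubles the picture count
    return counts


for gop in (1, 2, 4, 8):
    counts = pictures_per_level(gop)
    print(gop, sorted(counts), counts)
```

For a GOP size of 8 this gives levels 0 to 3 with 1, 1, 2 and 4 pictures respectively, which add up to the 8 pictures of the GOP.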
Coding efficiency of Hierarchical Prediction Structures
Significant improvement in coding efficiency for high-delay applications.
 A larger GOP size gives a larger PSNR improvement.
Depends on how QP is chosen for the different temporal layers.
 Smaller QP for lower temporal layers.
Figure courtesy “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” Schwarz et al., IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007
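One way to give lower temporal layers a smaller QP is a fixed cascade. The sketch below assumes the simple rule QP_T = QP_0 + 3 + T for T > 0, which is my recollection of the strategy described in the Schwarz et al. overview paper. Treat the constants as an assumption rather than as part of the standard. The clip to 51 reflects the H.264 QP range for 8-bit video.

```python
MAX_QP = 51  # upper end of the H.264 QP range for 8-bit video


def cascaded_qp(base_qp, temporal_level):
    """QP for a temporal level under the assumed rule QP_T = QP_0 + 3 + T."""
    if temporal_level == 0:
        return base_qp                         # key pictures get the finest quantization
    return min(MAX_QP, base_qp + 3 + temporal_level)


qps = [cascaded_qp(28, t) for t in range(4)]
print(qps)  # [28, 32, 33, 34]
```

Pictures in level 0 are referenced, directly or indirectly, by every other picture in the GOP, so spending more bits on them tends to improve the prediction for the whole GOP.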
Spatial Scalability
The base layer contains a reduced-resolution version of each coded
frame. Decoding the base layer alone produces a low-resolution
output sequence and decoding the base layer with enhancement
layer(s) produces a higher-resolution output.
Sub-sample and Encode to form Base Layer
Decode and Up-sample to original Resolution
Subtract Predicted from Original
Encode residue to form Enhancement Layer
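The four steps of this slide can be sketched on a tiny grayscale image. This is a toy model under stated assumptions: 2x2 averaging for sub-sampling, nearest-neighbour up-sampling, and a uniform quantizer standing in for the actual encoder. None of these are the filters or tools defined by SVC, and all function names are illustrative.

```python
def downsample2(img):
    """Average each 2x2 block (assumes even width and height)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upsample2(img):
    """Nearest-neighbour 2x up-sampling."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def quantize(img, step):
    """Uniform quantizer used here as a stand-in for lossy encoding."""
    return [[round(v / step) * step for v in row] for row in img]


def encode_two_layers(original, base_step=8, enh_step=2):
    base = quantize(downsample2(original), base_step)   # base layer
    predicted = upsample2(base)                         # decode and up-sample
    residue = [[o - p for o, p in zip(ro, rp)]
               for ro, rp in zip(original, predicted)]  # original minus predicted
    return base, quantize(residue, enh_step)            # enhancement layer


def decode(base, enhancement=None):
    if enhancement is None:
        return base                                     # low-resolution output
    predicted = upsample2(base)
    return [[p + e for p, e in zip(rp, re)]
            for rp, re in zip(predicted, enhancement)]


original = [[10, 12, 200, 202],
            [14, 16, 204, 206],
            [50, 52, 90, 92],
            [54, 56, 94, 96]]
base, enhancement = encode_two_layers(original)
low_res = decode(base)
full_res = decode(base, enhancement)
max_error = max(abs(o - r) for ro, rr in zip(original, full_res)
                for o, r in zip(ro, rr))
print(len(low_res), len(full_res), max_error)
```

Decoding the base layer alone yields a 2x2 picture, while adding the enhancement layer restores the 4x4 resolution with an error bounded by half the enhancement quantizer step.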
Spatial Scalability


The prediction signals are formed by

MCP inside the enhancement layer (Temporal) (small motion and high spatial detail)

Up-sampling from the lower layer (Spatial)

Average of the above two predictions (Temporal + Spatial)
Inter-layer prediction

Three kinds of inter-layer prediction

Inter-layer motion prediction

Inter-layer residual prediction

Inter-layer intra prediction (when the co-located lower layer MB is intra coded)

Base mode MB

Only residuals are transmitted, but no additional side info.
Extended Spatial Scalability (ESS)

This is required in many applications where different display sizes
from broadcasting, communications and IT environments are
commonly mixed, having different aspect ratios (like 4:3 or 16:9 etc).
Quality / Fidelity / SNR Scalability

Types
 Coarse Grain Scalability (CGS)
 Medium Grain Scalability (MGS)
 Fine Grain Scalability (FGS)
  Not supported by the SVC standard because of very poor enhancement layer coding efficiency.
Bit rate adaptation at same spatial/temporal resolution
SVC supports up to 16 SNR layers for each spatial layer
Coarse-grain quality scalability (CGS)

A special case of spatial scalability
 Identical sizes (resolution) for base and enhancement layers
 Smaller quantization step sizes for higher enhancement residual layers
Designed for only several selected bit-rate points
 Supported bit-rate points = Number of layers
 Switch can only occur at IDR access units
Medium-grain quality scalability (MGS)

More enhancement layers are supported
 Refinement quality layers of residual
Key pictures
 Drift control
Switch can occur at any access unit
CGS + key pictures + refinement quality layers
Drift control


Drift: the effect caused by unsynchronized MCP at the encoder and decoder side
Trade-off of MCP in quality SVC
 Coding efficiency vs. drift
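A toy scalar model can show how drift builds up and how key pictures bound it. This sketch is not the SVC decoding process. It assumes the encoder predicts non-key pictures from a lossless enhancement-quality reference, while a decoder that receives only the base quality layer adds the coarse residual to its own reference. At key pictures the prediction uses the base-quality reference, which both sides share. All names and numbers are illustrative.

```python
def quantize(value, step):
    """Uniform quantizer standing in for base-layer residual coding."""
    return round(value / step) * step


def base_only_errors(signal, step, key_interval=None):
    """Reconstruction error per picture for a decoder that gets only the base layer."""
    enc_ref = 0.0   # encoder's enhancement-quality reference (assumed lossless)
    dec_ref = 0.0   # base-quality reconstruction, identical at encoder and decoder
    errors = []
    for n, x in enumerate(signal):
        is_key = key_interval is not None and n % key_interval == 0
        ref = dec_ref if is_key else enc_ref   # key pictures predict from base quality
        base_residual = quantize(x - ref, step)
        dec_ref = dec_ref + base_residual      # decoder adds the residual to its own reference
        enc_ref = x
        errors.append(abs(x - dec_ref))
    return errors


signal = [3.0 * n for n in range(1, 17)]       # slowly increasing sample value
no_key = base_only_errors(signal, step=8)
with_key = base_only_errors(signal, step=8, key_interval=4)
print(max(no_key), max(with_key))
```

Without key pictures the mismatch grows with every picture. With a key picture every four pictures the error is reset to at most half the quantizer step, at the cost of a less accurate prediction reference for those pictures. That cost is the coding-efficiency side of the trade-off.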
SVC Encoder
[Figure: SVC encoder block diagram. Figure labels: dependency layer, temporal decomposition, the same motion/prediction information.]
SVC: Combined Scalability
Spatio-Temporal-Quality Cube
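Combined Scalability
Dependency and Quality refinement layers
[Figure: a scalable bitstream made of three dependency layers (D=0, D=1, D=2), each carrying quality refinement layers Q=0, Q=1 and Q=2.]
Combined Scalability
[Figure: two dependency layers (D0, D1), each with quality layers Q0 and Q1, over pictures with temporal levels T0 T2 T1 T2 T0.]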
Combined Scalability

Bit-stream format
[Figure: NAL unit layout consisting of the NAL unit header, the NAL unit header extension and the NAL unit payload. The extension carries the fields P (6 bits), T (3 bits), D (3 bits) and Q (2 bits), plus several one-bit flags.]
P (priority_id): indicates the importance of a NAL unit
T (temporal_id): indicates temporal level
D (dependency_id): indicates spatial/CGS layer
Q (quality_id): indicates MGS/FGS layer
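The T, D and Q identifiers are what make it possible to remove part of the stream without parsing the payload. The sketch below filters a list of NAL units down to a target operating point. The `NalUnit` class is a hypothetical container, not a parser for the real header. The filtering rule (drop higher temporal levels, drop higher dependency layers, drop higher quality layers only within the target dependency layer) is a simplified reading of the sub-bitstream extraction idea and should be treated as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical container for the header fields named on the slide."""
    priority_id: int
    temporal_id: int
    dependency_id: int
    quality_id: int
    payload: bytes = b""


def extract_substream(nal_units, max_t, max_d, max_q):
    """Keep only the NAL units needed for the target (T, D, Q) operating point."""
    kept = []
    for nal in nal_units:
        if nal.temporal_id > max_t:
            continue                           # above the target frame rate
        if nal.dependency_id > max_d:
            continue                           # above the target spatial/CGS layer
        if nal.dependency_id == max_d and nal.quality_id > max_q:
            continue                           # above the target quality in the top layer
        kept.append(nal)
    return kept


# Illustrative stream: 3 temporal levels, 2 dependency layers, 2 quality layers.
stream = [NalUnit(priority_id=0, temporal_id=t, dependency_id=d, quality_id=q)
          for t in range(3) for d in range(2) for q in range(2)]

base_only = extract_substream(stream, max_t=0, max_d=0, max_q=0)
mid_point = extract_substream(stream, max_t=1, max_d=1, max_q=0)
full = extract_substream(stream, max_t=2, max_d=1, max_q=1)
print(len(stream), len(base_only), len(mid_point), len(full))
```

A network node could apply such a filter per receiver, which matches the single encode, multiple decode principle from the opening slides.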

Bit-stream switching

Inside a dependency layer
 Switching everywhere
Outside a dependency layer
 Switching up only at IDR access units
 Switching down everywhere if using multiple-loop decoding
Profiles of SVC

Scalable Baseline
 For conversational and surveillance applications requiring low decoding complexity
 Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
 Temporal and quality scalability: arbitrary
 No interlaced coding tools
 B-slices, weighted prediction, CABAC, and 8x8 luma transform
 The base layer conforms to the Baseline profile of H.264/AVC
Scalable High
 For broadcast, streaming, and storage
 Spatial, temporal, and quality scalability: arbitrary
 The base layer conforms to the High profile of H.264/AVC
Scalable High Intra
 Scalable High + all IDR pictures
Conclusions




Temporal scalability
 Hierarchical prediction structure
Spatial and quality scalability
 Inter-layer prediction of intra, motion, and residual information
 Single-loop MC decoding
 Identical size for each spatial layer – CGS
 CGS + key pictures + quality refinement layer – MGS
Applications
 Power adaptation – decoding only the needed part of the video stream
 Graceful degradation – when the “right” parts are lost
 Format adaptation – backwards-compatible extension in mobile TV
What’s next in SVC?
 Bit-depth scalability (8-bit 4:2:0 to 10-bit 4:2:0)
 Color format scalability (4:2:0 to 4:4:4)
2007/8
MC2008, VCLAB
References




H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007.
T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
T. Wiegand, “Scalable Video Coding,” Digital Image Communication, Course at Technical University of Berlin, 2006. (Available at http://iphome.hhi.de/wiegand/dic.htm)
H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. of ICIP’05.