Transcript 1 - Tinkos
Operational Rate-Distortion
information theory in optimization of
advanced digital video codec
Dragorad A. Milovanović
Zoran S. Bojković
University of Belgrade
TINKOS
25.09.2013.
CONTENTS
1. Rate-Distortion theory
1.1 Source coding and R-D function
1.2 Operational R-D framework
1.3 Formulation of efficient video coding
2. Operational control of standard-based encoder
2.1 Operational MPEG framework
2.2 Performance/efficiency of digital video codec
2.3 Bitrate control and joint optimization
1. Rate-Distortion theory
Information Transmission System (message, symbols encoding, entropy)
[Figure: transmission system with source S, bitrate R, and reconstruction Ŝ]
Source coding: perceptual signals and distortion criterion D ≤ Dmax
Average distortion: $D(S,\hat{S}) = \sum_i \sum_j p(s_i,\hat{s}_j)\, d_{i,j}$, where $d_{i,j} = 0$ for $s_i = \hat{s}_j$ and $d_{i,j} > 0$ for $s_i \ne \hat{s}_j$
Rate-distortion theory calculates the minimum transmission bitrate R
for a required video quality D.
Mutual information is the information that symbols S and symbols Sˆ convey about
each other.
Average mutual information: $I(S;\hat{S}) = H(S) - H(S|\hat{S}) = H(\hat{S}) - H(\hat{S}|S) = \sum_{i,j} p(s_i,\hat{s}_j) \log_2 \frac{p(s_i,\hat{s}_j)}{p(s_i)\,p(\hat{s}_j)}$
Channel coding: channel capacity C is a maximum of mutual information I between
source and destination.
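As a small numerical illustration of the average distortion and the average mutual information defined above, the following sketch evaluates both for a joint distribution p(s_i, ŝ_j) given as a table. The binary example (a symmetric source reconstructed with 10% symbol errors), the Hamming distortion d_ij, and the function names are assumptions made for illustration; they are not taken from the slides.

```python
import math

def average_distortion(joint, dist):
    """D(S, S^) = sum_i sum_j p(s_i, s^_j) * d_ij."""
    return sum(p * dist[i][j]
               for i, row in enumerate(joint)
               for j, p in enumerate(row))

def mutual_information(joint):
    """I(S; S^) in bits, computed from the joint probability table."""
    p_s = [sum(row) for row in joint]        # marginal p(s_i)
    p_r = [sum(col) for col in zip(*joint)]  # marginal p(s^_j)
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                total += p * math.log2(p / (p_s[i] * p_r[j]))
    return total

# Assumed example: equiprobable binary source, 10% of symbols reconstructed wrongly.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
hamming = [[0, 1],
           [1, 0]]  # d_ij = 0 if s_i == s^_j, 1 otherwise

D = average_distortion(joint, hamming)
I = mutual_information(joint)
print(f"D = {D:.3f}, I = {I:.3f} bits/symbol")
```

In this example the reconstruction conveys about 0.53 bits per symbol about the source, at an average distortion of 0.1.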
1.1 Source coding and R-D function
For a given maximum average distortion $D_{\max}$, the rate-distortion function is the lower bound for the transmission bitrate: $R_L(D) = \min_{D \le D_{\max}} I(S;\hat{S})$
1. The Shannon lower bound RL(D) assumes statistical independence between distortion and reconstruction.
2. The R(D) function is a non-increasing and convex function of D.
3. For a continuous source S, the function R(D) approaches infinity as D approaches zero.
4. For a discrete source S, the minimum rate required for lossless transmission is equal to the entropy rate R(0)=H(S) (lossless coding).
Stochastic model of Laplacian pdf source (variance $\sigma^2 = 1$): $D_L(R) = \frac{e}{\pi}\,\sigma^2\, 2^{-2R}$
Stochastic model of Gauss-Markov source (correlation $0 < \rho < 0.9$): $D_L(R) = (1-\rho^2)\,\sigma^2\, 2^{-2R}$
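To make the two lower bounds concrete, the sketch below evaluates them at a few rates and converts the result to SNR in dB. The function names, the chosen rates, and the value ρ = 0.5 are illustrative assumptions.

```python
import math

def laplacian_bound(rate, variance=1.0):
    """Shannon lower bound D_L(R) = (e/pi) * sigma^2 * 2^(-2R) for a Laplacian source."""
    return (math.e / math.pi) * variance * 2.0 ** (-2.0 * rate)

def gauss_markov_bound(rate, rho, variance=1.0):
    """D_L(R) = (1 - rho^2) * sigma^2 * 2^(-2R) for a Gauss-Markov source."""
    return (1.0 - rho ** 2) * variance * 2.0 ** (-2.0 * rate)

def snr_db(distortion, variance=1.0):
    """Signal-to-noise ratio 10*log10(sigma^2 / D) in dB."""
    return 10.0 * math.log10(variance / distortion)

for rate in (1, 2, 3):
    print(rate,
          round(snr_db(laplacian_bound(rate)), 2),
          round(snr_db(gauss_markov_bound(rate, rho=0.5)), 2))
```

Both models share the factor 2^(-2R), so each additional bit per sample lowers the bound by about 6.02 dB; only the constant in front differs.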
1.2 Operational (R,D) framework
In a practical coding framework, the structure of the coder is fixed and a finite set of
encoding modes is defined. In addition, it is usually difficult or simply impossible to
find closed-form expressions for the R(D) and D(R) functions for general sources.
Each choice of encoding parameters then leads to a pair of rate and distortion values,
that is, an operational point in the R-D plane. The lower bound of all these rate-distortion
pairs is referred to as the ORD (operational rate-distortion) function.
Block diagram for a typical lossy source coding system:
block code QN={αN,βN,γN} (N consecutive input samples are independently coded)
bitrate R (average number of bits per source symbol)
additive distortion measure D (MSE of source/reconstructed symbols)
Operational R-D function
For a given source S and code Q, the operational point (R,D) is defined by R=r(Q) and D=δ(Q).
A rate-distortion point (R,D) in the operational R-D plane is achievable if there is a code Q
with r(Q)≤R and δ(Q)≤D. The function R(D) that describes the boundary of the region of
achievable points for a given source S is the operational function ORD.
The ORD boundary of the region of achievable rate-distortion points specifies:
A. the minimum rate R that is required for representing the source S with a distortion less than or equal
to a given value D or, alternatively,
B. the minimum distortion D that can be achieved if the source S is coded at a rate less than or equal to
a given value R.
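The two readings A and B of the ORD boundary can be sketched for a finite set of measured operational points. The point list below is made up for illustration, and the function names are hypothetical.

```python
def min_rate_for_distortion(points, d_max):
    """Reading A: smallest rate among points whose distortion is <= d_max."""
    rates = [r for r, d in points if d <= d_max]
    return min(rates) if rates else None

def min_distortion_for_rate(points, r_max):
    """Reading B: smallest distortion among points whose rate is <= r_max."""
    dists = [d for r, d in points if r <= r_max]
    return min(dists) if dists else None

# Assumed operational points (R, D), one per choice of encoding parameters.
points = [(0.5, 9.0), (1.0, 4.0), (1.5, 4.5), (2.0, 1.0), (3.0, 0.3)]

print(min_rate_for_distortion(points, d_max=4.2))  # rate needed for D <= 4.2
print(min_distortion_for_rate(points, r_max=1.6))  # distortion reachable with R <= 1.6
```

The point (1.5, 4.5) is never returned by either query: it costs more rate than (1.0, 4.0) and still has higher distortion, so it lies inside the achievable region rather than on its boundary.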
[Figure: R-D plane showing the region of achievable rate-distortion points (R,D) bounded by the operational R(D) function; axis marks Rmin, R=Max, Dmin, D=Max]
Quantization
Uniform scalar quantizer (Δ = const, $D \approx \Delta^2/12$, optimal γ)
Non-uniform optimal quantizer (Lloyd–Max centroids of pdf)
Asymptotic performance $D_L(R) = \sigma^2\, \varepsilon_S^2\, 2^{-2R}$ (Shannon lower bound)
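The approximation D ≈ Δ²/12 for the uniform scalar quantizer can be checked with a short simulation. This is a sketch under assumptions that are not in the slides: a unit-variance Gaussian source, a mid-tread quantizer, a step Δ = 0.25, and a fixed random seed.

```python
import random

def uniform_quantize(x, step):
    """Mid-tread uniform quantizer: map x to the nearest multiple of step."""
    return step * round(x / step)

def measured_mse(samples, step):
    """Mean squared error between the samples and their quantized values."""
    return sum((x - uniform_quantize(x, step)) ** 2 for x in samples) / len(samples)

random.seed(0)  # fixed seed so the run is repeatable
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

step = 0.25
mse = measured_mse(samples, step)
model = step ** 2 / 12  # high-resolution approximation D ~ step^2 / 12
print(f"measured {mse:.6f}, model {model:.6f}")
```

The agreement holds when Δ is small compared with the spread of the source; for coarse quantization the measured distortion departs from Δ²/12.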
Entropy coding (γ)
Variable length code (VLC):
Huffman code minimizes the average code length
$L_s = \sum_i p(s_i) \cdot \mathrm{length}(s_i)$ [bits/symbol]
Optimal code for distribution p*(si): the average length is bounded from below by the first-order entropy
$H_s = -\sum_i p^*(s_i) \log_2 p^*(s_i)$ [bits/symbol]
K = 2: p(s1) = P1, p(s2) = 1-P1
$H_s = -P_1 \log_2 P_1 - (1-P_1) \log_2 (1-P_1)$ bits/symbol
P1 = 0.5: max Hs = 1, Redundancy = log2 K - Hs = 0
Arithmetic encoder (CABAC):
adaptive estimation of statistical distribution p(si)
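The relation between the average Huffman code length Ls and the first-order entropy Hs can be illustrated with a minimal sketch. The four-symbol distribution is an assumed example, and the helper names are hypothetical; only code lengths are computed, not the codewords themselves.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return the Huffman codeword length of each symbol."""
    # Heap items: (probability, tie_breaker, indices of symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # merging two subtrees adds one bit to every symbol in them
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def average_length(probs, lengths):
    return sum(p * n for p, n in zip(probs, lengths))

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]  # assumed dyadic distribution
lengths = huffman_lengths(probs)
print(lengths, average_length(probs, lengths), entropy(probs))
```

For this dyadic distribution the average length equals the entropy (1.75 bits/symbol). For general distributions the Huffman average length lies between Hs and Hs + 1.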
Predictive coding
Differential coder
Predictive coder (DPCM)
Linear prediction Ŝn:
prediction coefficient pi
prediction error Un
reconstruction error U'n
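Optimal linear prediction (Un orthogonal to Ŝn)
Prediction error variance $\sigma'^2 = \varepsilon_\alpha^2 \sigma^2 \ge \gamma_S^2\, \varepsilon_\alpha^2 \sigma_S^2$, where $\gamma_S^2$ is the spectral flatness measure (sfm)
Asymptotic performance: coding gain $CG = 1/\gamma_S^2$
N = 1: $p_{1,opt} = \rho_1$, $CG = 1/(1-\rho_1^2)$
N = 2: $p_{1,opt} = \rho_1 (1-\rho_2)/(1-\rho_1^2)$, $p_{2,opt} = (\rho_2-\rho_1^2)/(1-\rho_1^2)$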
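The optimal coefficients for N = 2 follow from the normal (Yule-Walker) equations [[1, ρ1], [ρ1, 1]]·[p1, p2]ᵀ = [ρ1, ρ2]ᵀ. The sketch below solves them in closed form and evaluates the resulting prediction gain. The first-order Markov source with ρ1 = 0.9 and ρ2 = ρ1² is an assumed test case, and the function names are hypothetical.

```python
def optimal_predictor_order2(rho1, rho2):
    """Solve [[1, rho1], [rho1, 1]] [p1, p2]^T = [rho1, rho2]^T in closed form."""
    det = 1.0 - rho1 ** 2
    p1 = rho1 * (1.0 - rho2) / det
    p2 = (rho2 - rho1 ** 2) / det
    return p1, p2

def prediction_gain(rhos, coeffs):
    """Signal variance over prediction error variance for optimal coefficients."""
    normalized_error = 1.0 - sum(p * r for p, r in zip(coeffs, rhos))
    return 1.0 / normalized_error

rho1 = 0.9
rho2 = rho1 ** 2  # assumed first-order Markov source: rho_k = rho1^k
p1, p2 = optimal_predictor_order2(rho1, rho2)
gain = prediction_gain([rho1, rho2], [p1, p2])
print(p1, p2, gain)
```

For a first-order Markov source the second coefficient vanishes and the gain equals the N = 1 value 1/(1 - ρ1²), so a longer predictor brings no further benefit for that model.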
Transform coding
Linear transformation
A: transformation matrix
B: inverse matrix
A orthogonal matrix: $A^{-1} = A^T$, $A^T A = A A^T = I$
B orthonormal matrix: $B = A^{-1} = A^T$ (sum of N variances of coeff. = variance of s)
Optimal linear transformation KLT (eigenvectors of the auto-covariance matrix RSS; the coefficient variances are its eigenvalues)
Asymptotic performance: Coding gain CG =1/ γS2
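Optimal bitrate allocation R between N quantizers:
N=2: $R_{SS} = \sigma_S^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$, $A_{KLT} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, $\sigma_0^2 = \sigma_S^2 (1+\rho)$, $\sigma_1^2 = \sigma_S^2 (1-\rho)$
Quantizers $q_0$, $q_1$: $R_0 = R + \frac{1}{2} \log_2 \frac{\sigma_0}{\sigma_1}$, $R_1 = R + \frac{1}{2} \log_2 \frac{\sigma_1}{\sigma_0}$
$CG = \frac{\sigma_S^2}{(\sigma_0^2\, \sigma_1^2)^{1/2}} = \frac{1}{\sqrt{1-\rho^2}}$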
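The optimal bitrate allocation between N = 2 quantizers can be worked through numerically. The sketch assumes the standard two-point KLT of a source with correlation ρ, whose coefficient variances are σS²(1+ρ) and σS²(1-ρ), and the high-rate allocation rule R_i = R + ½·log2(σ_i² / geometric mean). The values ρ = 0.9 and R = 2 bits are illustrative, and the function name is hypothetical.

```python
import math

def klt2_allocation(rho, var_s, avg_rate):
    """Rates for the two KLT coefficients and the transform coding gain (N = 2)."""
    var0 = var_s * (1.0 + rho)  # variance of the first KLT coefficient
    var1 = var_s * (1.0 - rho)  # variance of the second KLT coefficient
    geo_mean = math.sqrt(var0 * var1)
    r0 = avg_rate + 0.5 * math.log2(var0 / geo_mean)
    r1 = avg_rate + 0.5 * math.log2(var1 / geo_mean)
    gain = var_s / geo_mean     # arithmetic mean over geometric mean of the variances
    return r0, r1, gain

r0, r1, gain = klt2_allocation(rho=0.9, var_s=1.0, avg_rate=2.0)
print(round(r0, 3), round(r1, 3), round(gain, 3))
```

The coefficient with the larger variance receives more bits, the average of the two rates stays at R, and the gain equals 1/√(1-ρ²).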
1.3 Formulation of efficient video coding
A standard-based codec requires an optimization procedure over the set of
allowed operating parameters, as well as additional criteria that arise from
real-time operation (complexity, delay).
The goal of operational information theory is to find a set of encoder
operating parameters that is optimal in the R(D) sense. An efficient
optimization procedure, based on fast algorithms instead of a full search
of the parameter space, is also required.
The practical trade-off between the allowed distortion D and the available
bitrate R in designing an encoder is based on a discrete optimization
procedure that finds locally optimal operational (R, D) points.
Lagrange multiplier method
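Formulation of R-D problem: cost function $\min_R D(R)$ with constraint $R \le R_{\max}$
Necessary condition for the existence of a minimum: $\frac{\partial D}{\partial R} = 0$
The solution: $R^* = R_{\max}$, $D_{\min} = D(R_{\max})$
Unconstrained Lagrangian cost function: $\min_R J(R,\lambda)$, $J(R,\lambda) = D(R) + \lambda R$, $\lambda \ge 0$
Necessary condition for the existence of a minimum: $\frac{\partial J(R,\lambda)}{\partial R} = 0$
The solution is simultaneous iteration of R and λ: $\frac{\partial D(R^*)}{\partial R} + \lambda^* = 0$, $R^* - R_{\max} = 0$, that is, $\lambda^* = -\frac{\partial D(R^*)}{\partial R}$, $R^* = R_{\max}$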
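For a closed-form distortion model the Lagrangian solution can be written out directly. The sketch below assumes the model D(R) = σ²·2^(-2R), which is an illustrative choice and not given on this slide. Setting the derivative of J = D(R) + λR to zero gives λ = 2·ln2·D(R), so the multiplier that meets the rate constraint is λ* = 2·ln2·D(Rmax).

```python
import math

def distortion(rate, var=1.0):
    """Assumed model D(R) = sigma^2 * 2^(-2R)."""
    return var * 2.0 ** (-2.0 * rate)

def lagrange_multiplier(r_max, var=1.0):
    """lambda* = -dD/dR evaluated at R = R_max."""
    return 2.0 * math.log(2.0) * distortion(r_max, var)

def rate_for_lambda(lam, var=1.0):
    """Stationary point of J(R, lambda) = D(R) + lambda * R."""
    return 0.5 * math.log2(2.0 * math.log(2.0) * var / lam)

lam_star = lagrange_multiplier(r_max=2.0)
print(lam_star, rate_for_lambda(lam_star))
```

Minimizing J with λ* returns R* = Rmax, which is the consistency between R and λ that the iteration on the slide aims for.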
Geometrical interpretation
The operational R-D function is the convex border which
connects a subset of locally optimal operational
points (the connected operational points are the suboptimal solutions of the Lagrange method).
The optimal operational point (D,R), as a solution of the
Lagrange method min(D + λR) for constant λ, is the
operational point on the convex border at which a line
with slope of magnitude λ touches the border.
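The geometrical interpretation can be tried on a finite set of operational points: for each λ the Lagrangian picks the point that minimizes D + λR, and sweeping λ visits only points on the convex border. The point set below is made up for illustration, and the function name is hypothetical.

```python
def lagrangian_point(points, lam):
    """Return the operational point (R, D) that minimizes D + lam * R."""
    return min(points, key=lambda rd: rd[1] + lam * rd[0])

# Assumed operational points (R, D); (1.5, 3.8) lies above the convex border.
points = [(0.5, 9.0), (1.0, 4.0), (1.5, 3.8), (2.0, 1.0), (3.0, 0.3)]

for lam in (0.0, 1.0, 5.0, 20.0):
    print(lam, lagrangian_point(points, lam))

# Sweep lambda and collect every point that is ever selected.
selected = {lagrangian_point(points, 0.1 * k) for k in range(300)}
print(sorted(selected))
```

Small λ favors low distortion and large λ favors low rate. The point (1.5, 3.8) is never selected for any λ ≥ 0, which is why the Lagrange method reaches only the convex-border subset of operational points.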
Optimal bit allocation
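Formulation:
Optimal bit allocation $\min_{R_i} \sum_{i=1}^{N} D_i(R_i)$ with constraint $\sum_{i=1}^{N} R_i \le R_{\max} = \mathrm{const}$
Unconstrained Lagrangian cost function: $\min_{R_i} J$, $J = \sum_{i=1}^{N} D_i(R_i) + \lambda \sum_{i=1}^{N} R_i$, $\lambda \ge 0$
Necessary condition for the existence of a minimum: $\frac{\partial J(R_k,\lambda)}{\partial R_k} = \frac{\partial \sum_{i=1}^{N} D_i(R_i)}{\partial R_k} + \lambda = 0$
The solution is simultaneous iteration of Ri and λ: $\frac{\partial D_k(R_k^*)}{\partial R_k} = -\lambda^* = \mathrm{const}$, $\sum_{i=1}^{N} R_i^* - R_{\max} = 0$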
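A discrete version of this allocation can be sketched as follows: for a fixed λ each component independently picks the operating point that minimizes D_i + λR_i, and λ is then adjusted by bisection until the total rate meets Rmax. The two components, the model D_i(R) = σ_i²·2^(-2R) on integer rates, and all names are assumptions made for illustration.

```python
def pick(operating_points, lam):
    """Per-component choice: the (R, D) pair minimizing D + lam * R."""
    return min(operating_points, key=lambda rd: rd[1] + lam * rd[0])

def allocate(components, r_max, lam_hi=1e6, iterations=60):
    """Bisection on lambda; the total rate is non-increasing in lambda."""
    lam_lo = 0.0
    for _ in range(iterations):
        lam = 0.5 * (lam_lo + lam_hi)
        rate = sum(pick(points, lam)[0] for points in components)
        if rate > r_max:
            lam_lo = lam  # too many bits spent: penalize rate more
        else:
            lam_hi = lam  # budget met: try a smaller penalty
    return lam_hi, [pick(points, lam_hi) for points in components]

# Assumed components with variances 16 and 1, integer rates 0..4 bits each.
variances = [16.0, 1.0]
components = [[(r, var * 2.0 ** (-2 * r)) for r in range(5)] for var in variances]

lam, choice = allocate(components, r_max=4)
print(lam, choice)
```

With a budget of 4 bits the high-variance component receives 3 bits and the low-variance one receives 1 bit. Both end up at the same distortion, which reflects the equal-slope condition ∂D_k/∂R_k = const.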
Joint hierarchical optimization
Optimal image decomposition and bitrate allocation:
1. discrete version of Lagrange multiplier method,
2. deterministic dynamic programming (forward/backward).
The solution:
1. The image is decomposed to a pre-specified number of levels.
2. For the adopted value of the quality parameter λ = const, the operational point
min(D + λR) is calculated on each level of decomposition for each partition and the
specified set of quantizers.
3. At each level of decomposition, a split/merge decision is made (principle of optimality)
by comparing the Lagrangian cost functions of successive levels of decomposition:
split if $(D_{c1} + D_{c2}) + \lambda (R_{c1} + R_{c2}) < D_p + \lambda R_p$, otherwise merge.
4. Binary search (Newton method) determines the optimal λ* for a given bitrate Rmax
and the initial search interval $[\lambda_l, \lambda_u]$ with $R^*(\lambda_l) \ge R_{\max} \ge R^*(\lambda_u)$.
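Steps 2 and 3 amount to a bottom-up pruning of the decomposition tree. The following is a minimal sketch, assuming that each node already carries the (R, D) of its best quantizer, and that a parent p is compared with the sum of its children c1 and c2. The dictionary layout and the numbers are hypothetical.

```python
def prune(node, lam):
    """Return (cost, leaves) of the cheapest decomposition rooted at node."""
    merged_cost = node["D"] + lam * node["R"]  # Lagrangian cost of coding the parent whole
    children = node.get("children", [])
    if not children:
        return merged_cost, [node["name"]]
    split_cost, leaves = 0.0, []
    for child in children:                     # best cost of each child subtree
        cost, child_leaves = prune(child, lam)
        split_cost += cost
        leaves += child_leaves
    if split_cost < merged_cost:               # split only if the children are cheaper
        return split_cost, leaves
    return merged_cost, [node["name"]]

# Assumed two-level tree: parent p with children c1 and c2.
tree = {"name": "p", "D": 10.0, "R": 2.0, "children": [
    {"name": "c1", "D": 2.0, "R": 2.0},
    {"name": "c2", "D": 3.0, "R": 2.0},
]}

print(prune(tree, lam=1.0))  # low rate penalty
print(prune(tree, lam=5.0))  # high rate penalty
```

With λ = 1 the split is kept because the distortion saving outweighs the extra rate. With λ = 5 the extra rate is too expensive and the children are merged back into the parent. Step 4 would wrap this pruning in a search over λ until the total rate meets Rmax.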
2. Operational control of standard-based encoder
A digital video encoder exploits the statistical redundancy of the source
as well as the perceptual irrelevancy for the user.
Block-adaptive hybrid transform-entropy encoder with motion estimation & compensation:
Scope of standardization
2.1 Operational MPEG framework
ITU/MPEG process of standardization:
Encoding techniques and operational parameters:
Set of operational parameters
The task of an encoder control is to determine the values of the
standardized syntax elements, and thus the bitstream b, for a given
input sequence in a way that the distortion between the input
sequence and its reconstruction is minimized subject to a set of
constraints on average and maximum bit rate.
[Figure: R(QP) and D(QP) curves]
Let $B_c$ be the set of all conforming bitstreams that obey the given
set of constraints. For distortion measure D, the optimal bitstream in
the rate–distortion sense is given by $b^* = \arg\min_{b \in B_c} D(s, s'(b))$
Due to the huge parameter space and encoding delay, it is
impossible to directly apply the minimization. Instead, the overall
minimization problem is split into a series of K smaller minimization
problems (p is subset of operational parameters)
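$\min_{p \in P_k} D_k(p)$ subject to $R_k(p) \le R_c$, with $D = \sum_{i \in B} |s_i - \hat{s}_i|^e$
The constrained minimization problem can be reformulated as an
unconstrained minimization, where $\lambda_{MODE} = c\,Q^2$ and $\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$:
$\min J_{MOTION}$, $J = D(\mathrm{MAD}) + \lambda_{MOTION}\, R(Q)$
$\min J_{MODE}$, $J = D + \lambda_{MODE}\, R(Q)$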
Q denotes the quantization step size, which is controlled by the
quantization parameter QP.
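The unconstrained mode decision can be sketched for one block: each candidate mode has a measured distortion D and rate R, and the encoder keeps the mode with the smallest J = D + λ_MODE·R. The constant c = 0.85, the candidate list, and the function names are assumptions for illustration; the slides state only the form λ_MODE = c·Q².

```python
import math

def lagrange_multipliers(q_step, c=0.85):
    """lambda_MODE = c * Q^2 and lambda_MOTION = sqrt(lambda_MODE); c is an assumed value."""
    lam_mode = c * q_step ** 2
    return lam_mode, math.sqrt(lam_mode)

def best_mode(candidates, lam_mode):
    """Pick the coding mode with minimum Lagrangian cost J = D + lam_mode * R."""
    return min(candidates, key=lambda m: m["D"] + lam_mode * m["R"])

# Assumed per-block measurements: distortion (e.g. SSD) and rate in bits.
candidates = [
    {"name": "SKIP", "D": 900.0, "R": 1},
    {"name": "INTER", "D": 300.0, "R": 40},
    {"name": "INTRA", "D": 250.0, "R": 120},
]

for q_step in (2.0, 10.0):
    lam_mode, lam_motion = lagrange_multipliers(q_step)
    print(q_step, round(lam_mode, 2), best_mode(candidates, lam_mode)["name"])
```

A fine quantizer (small Q) makes rate cheap, so the lower-distortion INTER mode wins. A coarse quantizer raises λ_MODE and the decision moves to the cheapest mode in bits.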
2.2 Performance/efficiency of digital video codec
[Figure: H.265 vs. H.263 on the three HD720 test sequences (1/2/3) at BR=512]
Sequence 1: H.265 QP=30, PSNR=39.66 dB; H.263 QP=20, PSNR=34.00 dB
Sequence 2: H.265 QP=30, PSNR=39.36 dB; H.263 QP=31, PSNR=30.94 dB
Sequence 3: H.265 QP=30, PSNR=39.24 dB; H.263 QP=25, PSNR=32.78 dB
Coding gain BRCG, PSNR=const
Three test sequences (1/2/3) with typical video conferencing content were selected for the
experiments (Vidyo 1280x720 60fps x10s).
Each test sequence was coded at 12 different bitrates. The ORD functions PSNRYUV(BR) are shown
for bitrates BR = 0.256, 0.384, 0.512, 0.850, 1.500 Mbps.
The combined PSNRYUV is calculated as the weighted sum of the per-picture PSNR of the
individual components: $PSNR_{YUV} = (6 \cdot PSNR_Y + PSNR_U + PSNR_V)/8$,
where the individual components are computed as $PSNR = 10 \log_{10} \left( (2^B-1)^2 / MSE \right)$, B=8.
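The PSNR definitions translate directly into code. The sketch below assumes that the per-component MSE values of one picture are already available; the numeric MSE values are made up, and the function names are hypothetical.

```python
import math

def psnr(mse, bit_depth=8):
    """PSNR = 10 * log10((2^B - 1)^2 / MSE) in dB."""
    peak = (2 ** bit_depth - 1) ** 2
    return 10.0 * math.log10(peak / mse)

def psnr_yuv(mse_y, mse_u, mse_v, bit_depth=8):
    """Combined PSNR_YUV = (6 * PSNR_Y + PSNR_U + PSNR_V) / 8."""
    return (6.0 * psnr(mse_y, bit_depth)
            + psnr(mse_u, bit_depth)
            + psnr(mse_v, bit_depth)) / 8.0

# Assumed MSE values for the Y, U and V components of one picture.
print(round(psnr(650.25), 2))
print(round(psnr_yuv(650.25, 65.025, 65.025), 2))
```

The weights 6/1/1 let the luma component dominate the combined figure, so a chroma PSNR that is 10 dB higher raises the result by only 2.5 dB here.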
BitRate reduction of HEVC
vs. AVC based on subjective
MOS performance for typical
video conferencing bitrates
Coding gain PSNRCG, BR=const
Variability PSNRY per frame (time) for BR=const
(BR~0.512Mbps: QPHEVC=30, QPAVC=32, QPH.263=20/31/25)
Complexity of encoder/decoder
The encoding and decoding times for the representative HD720 sequences (60fps x 10s) are
shown. Times are recorded in units of 10 seconds in order to illustrate the ratio to real-time operation:
the HEVC encoding time exceeds 1000 times real-time,
the decoding time exceeds 4 times real-time on an Ultrabook x86-64 Core i5 2/[email protected] 4GB RAM.
2.3 Bitrate control
The objective of rate control is to regulate the MPEG coded bit
stream to satisfy certain given conditions (variable/constant bits
budget constraints, buffer over/underflow prevention).
Variable/Constant (VBR/CBR) bitrate is under control of
constant/variable quantization parameter QP in open/closed loop.
A typical rate-control scheme consists of two basic operations:
1. bit allocation (R-D model), and
2. bit rate control (buffer occupancy measure).
To achieve the target bit rate R, the rate control scheme
chooses an appropriate quantization parameter Q. The accuracy
of the R-Q (rate-quantization) model is therefore important.
Together with the distortion-quantization (D-Q) function, the R-Q function
characterizes the rate-distortion (R-D) behavior of video encoding.
The first step in the derivation of a rate control formula is to
approximate the rate-quantization function R-Q by an inversely
proportional curve, as shown in the figure.
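The inversely proportional R-Q model can be written as R ≈ X/Q, where X is a complexity estimate taken from previously coded pictures. The sketch below shows how such a model yields a quantizer for a target bit budget. The symbol X, the quantizer range 1 to 31, and the numbers are assumptions made for illustration and are not taken from the slides.

```python
def update_complexity(bits_used, q_used):
    """Estimate X from the last coded picture, assuming R = X / Q."""
    return bits_used * q_used

def choose_q(target_bits, complexity, q_min=1.0, q_max=31.0):
    """Invert the model R = X / Q and clamp Q to the allowed range."""
    q = complexity / target_bits
    return min(q_max, max(q_min, q))

# Assumed history: the previous picture used 40000 bits at Q = 10.
x = update_complexity(bits_used=40000, q_used=10.0)

print(choose_q(target_bits=20000, complexity=x))  # half the bits: Q doubles
print(choose_q(target_bits=1000, complexity=x))   # very small budget: Q is clamped
```

In a closed-loop scheme the target would additionally be corrected by the buffer occupancy before the quantizer is chosen.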
Joint encoding (Det/Stat Mux)
Deterministic multiplex of L video sequences, CBR encoded with constant bitrate Ri
(variable Di and picture quality) in fixed channel capacity Rc:
$\sum_i R_i \le R_{channel}$, $R_i = \mathrm{const}$ (CBR)
[Figure: sequences 1/2/3 with panel labels Ri 7836, Ri 9331, Ri 7427]
Statistical multiplex of L+SMCG video sequences, VBR encoded with variable bitrate Ri
(constant Di and picture quality). The criterion is the joint buffer occupancy measure $0 \le B_f(t) \le B_s$:
$\sum_i R_i \le R_{channel}$, $R_i \le R_{channel}$ (VBR)
[Figure: statistical multiplex of sequences 1, 2, 3, ... with labels Xi]