Distributed Source Coding
By Raghunadh K. Bhattar, EE Dept., IISc
Under the guidance of Prof. K. R. Ramakrishnan
Outline of the Presentation
• Introduction
• Why Distributed Source Coding
• Source Coding
• How Source Coding Works
• How Channel Coding Works
• Distributed Source Coding
• Slepian-Wolf Coding
• Wyner-Ziv Coding
• Applications of DSC
• Conclusion
Why Distributed Source Coding?
• Low-complexity encoders
• Error resilience – robust to transmission errors
• The above two attributes make DSC an enabling technology for wireless communications
[Figure: low-complexity wireless handsets. Courtesy of Nicolas Gehrig.]
Distributed Source Coding (DSC)
• Compression of correlated sources – separate encoding and joint decoding
[Figure: two statistically dependent but physically distinct sources X and Y are encoded separately at rates RX and RY; a single decoder reconstructs (X, Y) jointly.]
Source Coding (Data Compression)
• Exploits the redundancy in the source to reduce the data required for storage or transmission
• Highly complex encoders are required for compression (MPEG, H.264, …)
• However, the decoders are simple!
• The highly complex encoders lead to
  • Bulky handsets
  • High power consumption
  • Reduced battery life
How Source Coding Works
• Types of redundancy
  • Spatial redundancy – transform or predictive coding
  • Temporal redundancy – predictive coding
• In predictive coding, the next value in the sequence is predicted from the past values, and the predicted value is subtracted from the actual value
• Only the difference is sent to the decoder
• Let the past values be C and the predicted value be y = f(C). If the actual value is x, then (x – y) is sent to the decoder.
• The decoder, knowing the past values C, can also predict the value y.
• With the knowledge of (x – y), the decoder recovers x, which is the desired value
[Figure: predictive coding block diagram. The encoder forms x – y, where y is predicted from the past values C; the decoder forms the same prediction y and adds it back to recover x.]
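The predictive-coding loop above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the choice of a first-order predictor, f(C) = previously decoded value, is an assumption for the example.

```python
# Minimal predictive-coding (DPCM-style) sketch.
# Assumption: the predictor is simply the previously decoded value.

def encode(samples):
    """Send only the difference between each sample and its prediction."""
    residuals = []
    prediction = 0  # both sides agree on the same initial state
    for x in samples:
        residuals.append(x - prediction)  # transmit (x - y)
        prediction = x                    # past value C for the next step
    return residuals

def decode(residuals):
    """Add each received difference back to the local prediction."""
    samples = []
    prediction = 0
    for d in residuals:
        x = prediction + d                # recover x from (x - y) and y
        samples.append(x)
        prediction = x
    return samples

data = [10, 12, 13, 13, 15]
assert decode(encode(data)) == data
```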
Compression – Toy Example
• Suppose X and Y are two uniformly distributed i.i.d. sources, each taking 3-bit values in {000, 001, 010, 011, 100, 101, 110, 111}, so H(X) = H(Y) = 3 bits.
• If they are related (i.e., correlated), can we reduce the data rate?
• Let the relation be: X and Y differ in at most one bit, i.e., the Hamming distance between X and Y is at most one:
  Σᵢ (xᵢ ⊕ yᵢ) ≤ 1
• Then H(X|Y) = 2 bits. For example, if Y = 101, then X ∈ {101, 100, 111, 001}, indexed 0, 1, 2, 3.
• Code = X ⊕ Y
• We need 2 bits to transmit X and 3 bits for Y, a total of 5 bits for both X and Y instead of 6 bits.
• Here the encoder must know the outcome of Y in order to code X with 2 bits.
• Decoding: X = Y ⊕ Code, where Code 0 = 000, 1 = 001, 2 = 010, 3 = 100.



• Now assume that we do not know the outcome of Y at the encoder of X (Y is still sent to the decoder using 3 bits). Can we still transmit X using 2 bits?
• The answer is YES (surprisingly!)
• How?
Partition
• Group all eight symbols into four sets, each consisting of two members:
  {(000),(111)} → 0
  {(001),(110)} → 1
  {(010),(101)} → 2
  {(100),(011)} → 3
• The trick: partition so that the two members of each set are at Hamming distance 3 from each other.




• The encoding of X is done simply by sending the index of the set that contains X
• Let X = (100); then the index for X is 3
• Suppose the decoder has already received a correlated Y = (101)
• How do we recover X knowing Y = (101) (from now on called the side information) at the decoder and the index (3) sent for X?
• Since the index is 3, we know that the value of X is either (100) or (011)
• Measure the Hamming distance between the two possible values of X and the side information Y:
  (100) ⊕ (101) = (001) → Hamming distance = 1
  (011) ⊕ (101) = (110) → Hamming distance = 2
• Therefore X = (100)
Source Coding (Y known at the encoder)
Y = 101, X = 100
Code = (100) ⊕ (101) = 001 = 1
Decoding: Y ⊕ Code = (101) ⊕ (001) = 100 = X

If the decoder's side information were some other Y, decoding X' = Y ⊕ Code with Code = 001 would give:

  Y     Code   X' = Y ⊕ Code   Correct?
  000   001    001             ✗
  001   001    000             ✗
  010   001    011             ✗
  011   001    010             ✗
  100   001    101             ✗
  101   001    100             ✓ (X recovered)
  110   001    111             ✗
  111   001    110             ✗

Only the Y actually used at the encoder (101) recovers X; the XOR code therefore requires the encoder to know Y.
Coset coding: Code = 3, i.e., X ∈ {100, 011}

  Side Information   Decoding                              Output
  Y = 000            d(011, 000) = 2, d(100, 000) = 1      X = 100
  Y = 110            d(011, 110) = 2, d(100, 110) = 1      X = 100
  Y = 101            d(011, 101) = 2, d(100, 101) = 1      X = 100
  Y = 100            d(011, 100) = 3, d(100, 100) = 0      X = 100
  Y = 111            d(011, 111) = 1, d(100, 111) = 2      X = 011

With correlated side information (Y within Hamming distance 1 of X = 100) there is no error in decoding; with uncorrelated side information (Y = 111) the decoding is erroneous.
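The whole toy scheme, including the effect of uncorrelated side information, can be sketched in a few lines of Python. This is my own illustration, not part of the slides; the four cosets are hard-coded exactly as in the partition above.

```python
# Toy DSC example: transmit only the 2-bit coset index of X and decode with side information Y.
COSETS = {0: ["000", "111"], 1: ["001", "110"], 2: ["010", "101"], 3: ["100", "011"]}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def encode(x):
    """Return the index of the coset containing the 3-bit string x."""
    return next(i for i, members in COSETS.items() if x in members)

def decode(index, y):
    """Pick the member of the indexed coset closest to the side information y."""
    return min(COSETS[index], key=lambda candidate: hamming(candidate, y))

index = encode("100")          # index = 3
print(decode(index, "101"))    # '100': correlated Y recovers X
print(decode(index, "111"))    # '011': uncorrelated Y decodes erroneously
```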
Distributed Source Coding
• How do we partition the input sample space? Do we always have to find some trick? If the input sample space is large (even a few hundred symbols), can we still find one?
• The trick is matrix multiplication: we need a matrix that partitions the input space.
• For the toy example above the matrix is

      | 1 1 0 |
  H = | 1 0 1 |

• Index = X·Hᵀ over the field GF(2)
• H is the parity-check matrix in error-correction terminology
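As a quick check, the coset index can be computed as the GF(2) syndrome X·Hᵀ. The sketch below is not from the slides; it simply verifies the partition using NumPy.

```python
import numpy as np

# Parity-check matrix of the 3-bit repetition code, as on the slide.
H = np.array([[1, 1, 0],
              [1, 0, 1]])

def coset_index(x_bits):
    """Syndrome s = x * H^T over GF(2), returned as an integer index."""
    s = (np.array(x_bits) @ H.T) % 2
    return int(s[0]) * 2 + int(s[1])

# Reproduces the partition {000,111}->0, {001,110}->1, {010,101}->2, {100,011}->3
assert coset_index([1, 0, 0]) == 3
assert coset_index([0, 1, 1]) == 3
assert coset_index([0, 0, 1]) == 1
```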
Coset Partition
• Now let us look at the partitions again:
  {(000),(111)} → 0
  {(001),(110)} → 1
  {(010),(101)} → 2
  {(100),(011)} → 3
• {(000),(111)} is the repetition code (in error-correction terminology)
• The other sets are the cosets of the repetition code induced by the elements of the sample space of X
Channel Coding
• In channel coding, controlled redundancy is added to the information bits to protect them from channel noise
• We can classify channel coding (error-control coding) into two categories:
  • Error detection
  • Error correction
• In error detection, the introduced redundancy is just enough to detect errors
• In error correction, we need to introduce more redundancy
• A code with minimum distance d_min can correct up to t errors, where
  t = ⌊(d_min – 1) / 2⌋
[Figure: example codes with d_min = 1, d_min = 2 and d_min = 3.]
Parity Check
  X     Parity
  000   0
  001   1
  010   1
  011   0
  100   1
  101   0
  110   0
  111   1
• Minimum Hamming distance = 2
• How do we make the Hamming distance 3?
• It is not clear (or not easy) how to make the minimum Hamming distance 3
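A quick sketch (not from the slides) that verifies the minimum distance of the single-parity-check code above is 2:

```python
from itertools import combinations, product

# All 4-bit codewords of the (4,3) single-parity-check code: 3 data bits plus even parity.
codewords = [bits + (sum(bits) % 2,) for bits in product((0, 1), repeat=3)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

d_min = min(hamming(a, b) for a, b in combinations(codewords, 2))
print(d_min)  # prints 2: a single parity bit can detect, but not correct, one error
```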
Slepian-Wolf theorem
• The Slepian-Wolf theorem states that correlated sources that do not communicate with each other can be coded at the same total rate at which they could be coded jointly; no performance loss occurs, provided they are decoded jointly.
• When correlated sources are coded independently but decoded jointly, the achievable data rates are bounded by
  R_X + R_Y ≥ H(X, Y)
  R_X ≥ H(X | Y)
  R_Y ≥ H(Y | X)
• The total data rate must be at least H(X,Y), and the individual data rates must be at least H(X|Y) and H(Y|X) respectively.
D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inf. Theory, vol. 19, pp. 471–480, July 1973.
DISCUS (DIstributed Source Coding Using Syndromes)
• The first constructive realization of the Slepian-Wolf bound using practical channel codes was proposed in DISCUS, where single-parity-check codes were used with the binning scheme.
• Wyner first proposed using a capacity-achieving binary linear channel code to solve the Slepian-Wolf compression problem for a class of joint distributions.
• DISCUS extended Wyner's idea to the distributed rate-distortion (lossy compression) problem using channel codes.
S. Pradhan and K. Ramchandran, “Distributed source coding using syndromes (DISCUS),” in IEEE Data Compression Conference, DCC-1999, Snowbird, UT, 1999.
Distributed Source Coding (Compression with Side Information)
[Figure: X is encoded at rate R_X ≥ H(X|Y) and decoded losslessly with the statistically dependent side information Y, available at the decoder at rate R_Y ≥ H(Y); the decoder outputs (X, Y).]
Achievable Rate Region – Slepian-Wolf Coding
[Figure: the Slepian-Wolf rate region in the (R_X, R_Y) plane, with axes marked at H(X|Y), H(X), H(X,Y) and H(Y|X), H(Y), H(X,Y).
• Separate coding with no errors requires R_X ≥ H(X) and R_Y ≥ H(Y).
• Slepian-Wolf coding achieves R_X ≥ H(X|Y), R_Y ≥ H(Y|X) and R_X + R_Y ≥ H(X,Y), the sum rate of joint encoding and decoding, with vanishing error probability for long sequences.
• One corner point codes X with Y as side information (R_X = H(X|Y), R_Y = H(Y)); the other codes Y with X as side information (R_Y = H(Y|X), R_X = H(X)); the points between them on the line R_X + R_Y = H(X,Y) are reached by time-sharing, source splitting or code partitioning.]
How Compression Works?
Redundant data (correlated data) → remove redundancy → compressed data (decorrelated data)

How Channel Coding Works?
Decorrelated data → add redundancy (redundant data generator) → correlated data

Duality Between Source Coding and Channel Coding
  Source Coding             Channel Coding
  Compresses the data       Expands the data
  De-correlates the data    Correlates the data
  Complex encoder           Simple encoder
  Simple decoder            Complex decoder
Channel Coding (Error-Correction Coding)
[Figure: k information bits are encoded into an n-bit codeword consisting of the k information bits plus n – k parity bits (data expansion n/k); the codeword passes through an additive-noise channel and the decoder recovers the information bits.]
[Figure: parity-based compression. Only the n – k parity bits of the k information bits X are transmitted as the compressed data; decompression combines them with the side information Y.]
Channel Codes for DSC
[Figure: correlation model. The side information is modeled as the output of a virtual channel, Y = X ⊕ noise; the channel decoder uses Y and the transmitted bits to recover X.]
Turbo Coder for Slepian-Wolf Encoding
[Figure: X (L bits) enters two rate-(n–1)/n systematic convolutional encoders, the second through an interleaver of length L. The systematic bits are discarded and only the parity streams X_P1 and X_P2 (L/(n–1) bits each) are kept, giving a total rate R_X = 2/(n–1).]
Courtesy of Anne Aaron and Bernd Girod
Turbo Decoder for Slepian-Wolf Decoding
[Figure: the received parity streams X_P1 and X_P2 (L/(n–1) bits each) and the side information Y feed channel-probability calculations P(x|y). Two SISO decoders exchange extrinsic/a-priori probabilities through an interleaver and deinterleaver of length L; a decision on the a-posteriori probabilities yields the decoded X.]
Courtesy of Anne Aaron and Bernd Girod
Wyner's Scheme
• Use a linear block code and send the syndrome
• For an (n, k) block code there are 2^(n–k) syndromes, each corresponding to a set of 2^k words of length n
• Each set is a coset of the code
• Compression ratio of n : (n – k)
[Figure: X enters a lossless encoder (syndrome former) operating at rate R ≥ H(X|Y); the syndrome bits and the side information Y enter a joint decoder that outputs X̂.]
A. D. Wyner, “Recent results in the Shannon theory,” IEEE Trans. Inf. Theory, vol. IT-20, no. 1, January 1974.
A. D. Wyner, “On source coding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. 21, no. 3, pp. 294–300, May 1975.
[Figure: syndrome-based compression. The syndrome former multiplies the length-n input x by H to produce the (n – k)-bit syndrome, which is the compressed data; decompression decodes x from the corrupted codeword (the side information) together with the syndrome.]
Linear Block Codes for DSC
[Figure: the correlation between the length-n source X and the side information Y is modeled as a virtual channel, Y = X ⊕ noise. An LDPC encoder (syndrome former) computes the syndrome s = xHᵀ, which is entropy coded as the compressed data (compression ratio n : (n – k)). The LDPC decoder uses s, the side information Y and the correlation model to output the decompressed data X̂.]
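The syndrome approach can be illustrated with a small sketch. It is not the LDPC codec of the slides; it uses the (7,4) Hamming code and assumes the side information Y differs from X in at most one bit, so the 7-bit block is compressed to its 3-bit syndrome and recovered exactly at the decoder.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code; column j is the binary expansion of j.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def syndrome(v):
    """3-bit syndrome s = H v over GF(2)."""
    return (H @ np.asarray(v)) % 2

def compress(x):
    """Wyner-style compression: send only the syndrome of the 7-bit block x."""
    return syndrome(x)               # 3 bits instead of 7 (ratio n : n-k = 7 : 3)

def decompress(s_x, y):
    """Recover x from its syndrome and side information y (assumed within 1 bit of x)."""
    diff = (s_x + syndrome(y)) % 2   # syndrome of the error pattern e = x XOR y
    pos = int("".join(map(str, diff)), 2)  # a nonzero syndrome encodes the differing position
    x_hat = np.array(y, copy=True)
    if pos:
        x_hat[pos - 1] ^= 1          # flip the single differing bit
    return x_hat

x = np.array([1, 0, 1, 1, 0, 0, 1])
y = np.array([1, 0, 1, 0, 0, 0, 1])  # side information: differs from x in bit 4
assert np.array_equal(decompress(compress(x), y), x)
```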
The Wyner-Ziv theorem
• Wyner and Ziv extended the work of Slepian and Wolf by studying the lossy case in the same scenario, where the signals X and Y are statistically dependent.
• Y is transmitted at a rate equal to its entropy (Y is then called the side information), and what needs to be found is the minimum transmission rate for X that introduces no more than a certain distortion D.
• The Wyner-Ziv rate-distortion function is the lower bound for R_X.
• For MSE distortion and Gaussian statistics, the rate-distortion functions with and without side information at the encoder are the same.
A. D. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1–10, January 1976.
Wyner-Ziv Codec
• A codec that encodes the signals X and Y separately and decodes them jointly, but does not aim at recovering them perfectly (it accepts some distortion D in the reconstruction), is called a Wyner-Ziv codec.
Wyner-Ziv Coding – Lossy Compression with Side Information
[Figure: two systems. In the first, X is encoded at rate R_X|Y(d) with the side information Y available at both encoder and decoder. In the second, X is encoded at rate R*(d) with Y available only at the decoder. The rate loss R*(d) – R_X|Y(d) is bounded, and for MSE distortion and Gaussian statistics the rate-distortion functions of the two systems are the same.]

• The structure of Wyner-Ziv encoding and decoding:
• Encoding consists of quantization followed by a binning operation that encodes the quantized value U into a bin (coset) index.
• Decoding consists of “de-binning” followed by estimation (reconstruction using the side information).
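A toy sketch of this quantize-then-bin structure follows. It is my own illustration, not the codec described in the slides; it assumes a uniform scalar quantizer, modulo binning, and side information within half of the bin spacing of the source.

```python
# Toy Wyner-Ziv style coder: quantize, send only the bin (coset) index,
# and let the decoder pick the quantizer level closest to its side information.

STEP = 1.0      # quantizer step size (assumption for this sketch)
NUM_BINS = 4    # number of bin indices actually transmitted (2 bits)

def encode(x):
    """Quantize x and keep only the quantizer index modulo NUM_BINS."""
    q = round(x / STEP)
    return q % NUM_BINS

def decode(bin_index, y):
    """De-binning: among the levels whose index falls in this bin,
    choose the one closest to the side information y and use it as the estimate."""
    q_near = round(y / STEP)
    candidates = [q for q in range(q_near - NUM_BINS, q_near + NUM_BINS + 1)
                  if q % NUM_BINS == bin_index]
    q_hat = min(candidates, key=lambda q: abs(q * STEP - y))
    return q_hat * STEP

x, y = 7.3, 7.9              # correlated source sample and side information
print(decode(encode(x), y))  # 7.0: the quantized x, recovered from 2 bits plus y
```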
Wyner-Ziv Coding (WZC) - A joint source-channel
coding problem
Pixel-Domain Wyner-Ziv Residual Video Codec
[Figure: Wyner-Ziv frames X are turned into residuals with respect to a reference from the frame memory, scalar quantized, and passed through a Slepian-Wolf codec built from an LDPC encoder, a buffer, and an LDPC decoder with a request-bits feedback channel. Key frames I are coded and decoded with conventional intraframe coding; interpolation/extrapolation of the decoded frames provides the side information Y. The reconstruction stage produces the decoded Wyner-Ziv frame X'.]
Distributed Video Coding
• Distributed coding is a new paradigm for video compression, based on Slepian and Wolf's (lossless coding) and Wyner and Ziv's (lossy coding) information-theoretic results.
• It enables low-complexity video encoding where the bulk of the computation is shifted to the decoder.
• A second architectural goal is to allow far greater robustness to packet and frame drops.
• It is useful for wireless video applications through the use of a transcoding architecture.
PRISM
• PRISM (Power-efficient, Robust, hIgh-compression, Syndrome-based Multimedia coding)
• PRISM is a practical video coding framework built on distributed source coding principles.
• Flexible encoding/decoding complexity
• High compression efficiency
• Superior robustness to packet/frame drops
• Light yet rich encoding syntax
R. Puri, A. Majumdar, and K. Ramchandran, “PRISM: A video coding paradigm with motion estimation at the decoder,” IEEE Trans. Image Process., vol. 16, no. 10, pp. 2436–2448, October 2007.
DIStributed COding for Video sERvices (DISCOVER)
• DISCOVER is a video coding scheme with strong potential for new applications, targeting new advances in coding efficiency, error resilience and scalability.
• At the encoder side the video is split into two parts.
• The first set of frames, called key frames, are encoded with a conventional H.264/AVC encoder.
• The remaining frames, known as Wyner-Ziv frames, are coded using distributed coding principles.
X. Artigas, J. Ascenso, M. Dalai, D. Kubasov, and M. Ouaret, “The DISCOVER codec: Architecture, techniques and evaluation,” Picture Coding Symposium, 2007.
www.discoverdvc.org
A Typical Distributed Video Coding Scheme
Side Information from Motion-Compensated Interpolation
[Figure: Wyner-Ziv frames W are coded by a Wyner-Ziv residual encoder, using the previous key frame as the encoder reference, and only WZ parity bits are sent. The Wyner-Ziv residual decoder combines the parity bits with side information Y obtained by motion-compensated interpolation of the decoded key frames I, producing the decoded WZ frames W'.]
Wyner-Ziv DCT Video Codec
[Figure: Wyner-Ziv frames W are transformed (DCT), and each transform band X_k is quantized with 2^(M_k) levels. Bit-planes 1 … M_k are extracted and turbo encoded; the parity bits are buffered and released on request from the decoder. Key frames K are coded with conventional intraframe coding; the decoded key frames are interpolated/extrapolated to form the side information Y, which is transformed (DCT) into Y_k. The turbo decoder and reconstruction stage produce X_k', and the inverse DCT yields the decoded WZ frames W'.]
Sample Frame (Foreman sequence)
[Images: side information frame and the frame after Wyner-Ziv coding, 16-level quantization (~1 bpp).]
Carphone sequence
[Images: H.263+ intraframe coding at 410 kbps vs. Wyner-Ziv codec at 384 kbps.]
Salesman sequence at 10 fps
[Images: DCT-based intracoding at 247 kbps (PSNR_Y = 33.0 dB) vs. Wyner-Ziv DCT codec at 256 kbps (PSNR_Y = 39.1 dB, GOP = 16).]
Salesman sequence at 10 fps
[Images: H.263+ I-P-P-P at 249 kbps (PSNR_Y = 43.4 dB, GOP = 16) vs. Wyner-Ziv DCT codec at 256 kbps (PSNR_Y = 39.1 dB, GOP = 16).]
Hall Monitor sequence at 10 fps
[Images: DCT-based intracoding at 231 kbps (PSNR_Y = 33.3 dB) vs. Wyner-Ziv DCT codec at 227 kbps (PSNR_Y = 39.1 dB, GOP = 16).]
Hall Monitor sequence at 10 fps
[Images: H.263+ I-P-P-P at 212 kbps (PSNR_Y = 43.0 dB, GOP = 16) vs. Wyner-Ziv DCT codec at 227 kbps (PSNR_Y = 39.1 dB, GOP = 16).]
Facsimile Image Compression with DSC
[Images: CCITT 8 test image; reconstructed image with 30 errors; Fax4 reconstructed image with 8 errors.]
Applications
• Very low complexity encoders
• Compression for networks of cameras
• Error-resilient transmission of signal waveforms
• Digitally enhanced analog transmission
• Unequal error protection without layered coding
• Image authentication
• Random access
• Compression of encrypted signals
Cosets
• Let G = {000, 001, …, 111}
• Let H = {000, 111} be a subgroup of G
• The cosets are formed by adding (XOR) each element of G to H:
  001 ⊕ 000 = 001
  001 ⊕ 111 = 110
  Hence {001, 110} is one coset
  010 ⊕ 000 = 010
  010 ⊕ 111 = 101
  {010, 101} is another coset, and so on …
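A small sketch (my own illustration) that enumerates these cosets:

```python
from itertools import product

G = ["".join(bits) for bits in product("01", repeat=3)]  # {000, 001, ..., 111}
H = {"000", "111"}                                        # subgroup (repetition code)

def xor(a, b):
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

cosets = {frozenset(xor(g, h) for h in H) for g in G}
for coset in sorted(map(sorted, cosets)):
    print(coset)
# prints the four cosets: {000,111}, {001,110}, {010,101}, {011,100}
```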
Hamming Distance
• The Hamming distance is a distance measure defined as the number of bit positions in which two binary sequences differ
• Let X and Y be two binary sequences; the Hamming distance between X and Y is defined as
  Hamming distance = Σᵢ (xᵢ ⊕ yᵢ)
• Example:
  Let X = {0 0 1 1 1 0 1 0 1 0}
  Let Y = {0 1 0 1 1 1 0 0 1 1}
  Hamming distance = sum({0 1 1 0 0 1 1 0 0 1}) = 5
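The same computation in a couple of lines of Python (a sketch for illustration):

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(xi ^ yi for xi, yi in zip(x, y))

X = [0, 0, 1, 1, 1, 0, 1, 0, 1, 0]
Y = [0, 1, 0, 1, 1, 1, 0, 0, 1, 1]
print(hamming_distance(X, Y))  # 5
```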