Lossless Compression Statistical Model
Part II: Arithmetic Coding
2015/7/16 · Data Compression (資料壓縮), Unit 4: Arithmetic Coding
CONTENTS
- Introduction to Arithmetic Coding
- Arithmetic Coding & Decoding Algorithm
- Generating a Binary Code for Arithmetic Coding
- Higher-order and Adaptive Modeling
- Applications of Arithmetic Coding
Arithmetic Coding
- Huffman codes must be an integral number of bits long, while the entropy of a symbol is almost always a fractional number, so the theoretically possible compressed message length cannot be achieved.
  - For example, if a statistical model assigns a 90% probability to a given character, the optimal code size would be 0.15 bits.
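The 0.15-bit figure is just the self-information, -log2(p), of a symbol with probability p; a quick check of the slide's arithmetic (a sketch, using only the standard library):

```python
import math

# Optimal code length for a symbol of probability p is -log2(p) bits.
p = 0.9
optimal_bits = -math.log2(p)
print(round(optimal_bits, 2))  # 0.15
```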
Arithmetic Coding
- Arithmetic coding bypasses the idea of replacing each input symbol with a specific code. Instead, it replaces a stream of input symbols with a single floating-point output number.
- Arithmetic coding is especially useful when dealing with sources with small alphabets, such as binary sources, and with alphabets with highly skewed probabilities.
Arithmetic Coding — Example (1)

Character   Probability   Range
^(space)    1/10          0.00 ≤ r < 0.10
A           1/10          0.10 ≤ r < 0.20
B           1/10          0.20 ≤ r < 0.30
E           1/10          0.30 ≤ r < 0.40
G           1/10          0.40 ≤ r < 0.50
I           1/10          0.50 ≤ r < 0.60
L           2/10          0.60 ≤ r < 0.80
S           1/10          0.80 ≤ r < 0.90
T           1/10          0.90 ≤ r < 1.00

Suppose that we want to encode the message BILL GATES.
Arithmetic Coding — Example (1)

[Figure: the interval narrows with each symbol of BILL GATES — 'B' selects [0.2, 0.3), 'I' narrows it to [0.25, 0.26), 'L' to [0.256, 0.258), 'L' to [0.2572, 0.2576), '^' to [0.25720, 0.25724), and so on toward the final tag.]
Arithmetic Coding — Example (1)

New character   Low value      High value
B               0.2            0.3
I               0.25           0.26
L               0.256          0.258
L               0.2572         0.2576
^(space)        0.25720        0.25724
G               0.257216       0.257220
A               0.2572164      0.2572168
T               0.25721676     0.25721680
E               0.257216772    0.257216776
S               0.2572167752   0.2572167756
Arithmetic Coding — Example (1)
- The final value, called a tag, 0.2572167752, uniquely encodes the message 'BILL GATES'.
- Any value between 0.2572167752 and 0.2572167756 can serve as a tag for the encoded message and can be uniquely decoded.
Arithmetic Coding
- Encoding algorithm for arithmetic coding:

low = 0.0 ; high = 1.0 ;
while not EOF do
    read(c) ;
    range = high - low ;
    high = low + range * high_range(c) ;
    low = low + range * low_range(c) ;
enddo
output(low) ;
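The loop above can be run directly; a minimal Python sketch with the Example (1) ranges (the `RANGES` table and function name are illustrative, not from the slides):

```python
# Cumulative ranges from Example (1): character -> (low_range, high_range)
RANGES = {
    ' ': (0.0, 0.1), 'A': (0.1, 0.2), 'B': (0.2, 0.3),
    'E': (0.3, 0.4), 'G': (0.4, 0.5), 'I': (0.5, 0.6),
    'L': (0.6, 0.8), 'S': (0.8, 0.9), 'T': (0.9, 1.0),
}

def encode(message):
    low, high = 0.0, 1.0
    for c in message:
        rng = high - low
        # Narrow [low, high) to the sub-interval assigned to c.
        high = low + rng * RANGES[c][1]
        low = low + rng * RANGES[c][0]
    return low

tag = encode("BILL GATES")
print(tag)  # ~0.2572167752, matching the trace table
```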
Arithmetic Coding
- Decoding is the inverse process.
- Since 0.2572167752 falls between 0.2 and 0.3, the first character must be 'B'.
- Remove the effect of 'B' from 0.2572167752 by first subtracting the low value of 'B', 0.2, giving 0.0572167752.
- Then divide by the width of the range of 'B', 0.1. This gives a value of 0.572167752.
Arithmetic Coding
- Then determine where that value lands: in the range of the next letter, 'I'.
- The process repeats until r reaches 0 or the known length of the message is reached.
r              c          Low   High   Range
0.2572167752   B          0.2   0.3    0.1
0.572167752    I          0.5   0.6    0.1
0.72167752     L          0.6   0.8    0.2
0.6083876      L          0.6   0.8    0.2
0.041938       ^(space)   0.0   0.1    0.1
0.41938        G          0.4   0.5    0.1
0.1938         A          0.1   0.2    0.1
0.938          T          0.9   1.0    0.1
0.38           E          0.3   0.4    0.1
0.8            S          0.8   0.9    0.1
0.0
Arithmetic Coding
- Decoding algorithm:

r = input_code ;
repeat
    search c such that r falls in its range ;
    output(c) ;
    r = r - low_range(c) ;
    r = r / (high_range(c) - low_range(c)) ;
until r equals 0 ;
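Run with exact rational arithmetic (so the "until r equals 0" test terminates cleanly despite repeated division), the decoder recovers the message from the tag; a sketch using the Example (1) ranges:

```python
from fractions import Fraction

# Ranges from Example (1), as exact fractions so r reaches 0 exactly.
RANGES = {
    ' ': (Fraction(0, 10), Fraction(1, 10)), 'A': (Fraction(1, 10), Fraction(2, 10)),
    'B': (Fraction(2, 10), Fraction(3, 10)), 'E': (Fraction(3, 10), Fraction(4, 10)),
    'G': (Fraction(4, 10), Fraction(5, 10)), 'I': (Fraction(5, 10), Fraction(6, 10)),
    'L': (Fraction(6, 10), Fraction(8, 10)), 'S': (Fraction(8, 10), Fraction(9, 10)),
    'T': (Fraction(9, 10), Fraction(1, 1)),
}

def decode(r):
    out = []
    while True:
        # Find the symbol whose range contains r.
        c = next(ch for ch, (lo, hi) in RANGES.items() if lo <= r < hi)
        out.append(c)
        lo, hi = RANGES[c]
        r = (r - lo) / (hi - lo)   # remove the effect of c
        if r == 0:
            return ''.join(out)

print(decode(Fraction("0.2572167752")))  # BILL GATES
```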
Arithmetic Coding — Example (2)

Symbol   Probability   Range
1        0.80          [0.00, 0.80)
2        0.02          [0.80, 0.82)
3        0.18          [0.82, 1.00)

Suppose that we want to encode the message 1321.
Arithmetic Coding — Example (2)

[Figure: successive narrowing of the interval for the message 1321 — '1' selects [0, 0.8), '3' narrows it to [0.656, 0.8), '2' to [0.7712, 0.77408), and '1' to [0.7712, 0.773504).]
Arithmetic Coding — Example (2)
Encoding:

New character   Low value   High value
(start)         0.0         1.0
1               0.0         0.8
3               0.656       0.800
2               0.7712      0.77408
1               0.7712      0.773504

The tag is the midpoint of the final interval:

T(1321) = (0.7712 + 0.773504) / 2 = 0.772352
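Any number in the final interval works as a tag; taking the midpoint, as in the formula above, is a one-line computation:

```python
# Final interval for the message 1321, from the encoding table.
low, high = 0.7712, 0.773504
tag = (low + high) / 2
print(tag)  # ~0.772352
```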
Arithmetic Coding — Example (2)
Decoding:

r          c   Low    High   Range   Next r
0.772352   1   0      0.8    0.8     (0.772352 − 0) / 0.8 = 0.96544
0.96544    3   0.82   1.0    0.18    (0.96544 − 0.82) / 0.18 = 0.808
0.808      2   0.8    0.82   0.02    (0.808 − 0.8) / 0.02 = 0.4
0.4        1   0      0.8    0.8
Arithmetic Coding
- In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol.
- The new range is proportional to the predefined probability attached to that symbol.
- Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.
Arithmetic Coding
- The coding rate approaches the high-order entropy theoretically.
- Not as popular as Huffman coding because multiplications (×) and divisions (÷) are needed.
- Average bits/byte on 14 files (program, object, text, etc.):

Huff.   LZW    LZ77/LZ78   Arithmetic
4.99    4.71   2.95        2.48
Generating a Binary Code for Arithmetic Coding
- Problem:
  - The binary representation of some of the generated floating-point values (tags) would be infinitely long.
  - We need increasing precision as the length of the sequence increases.
- Solution:
  - Synchronized rescaling and incremental encoding.
Generating a Binary Code for Arithmetic Coding
- If the upper bound and the lower bound of the interval are both less than 0.5, rescale the interval and transmit a '0' bit.
- If the upper bound and the lower bound of the interval are both greater than 0.5, rescale the interval and transmit a '1' bit.
- Mapping rules:

E1 : [0, 0.5) → [0, 1);  E1(x) = 2x
E2 : [0.5, 1) → [0, 1);  E2(x) = 2(x − 0.5)
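A sketch of the rescaling loop applied to Example (2)'s message 1321, reproducing the bit stream 1100011 (the function name is illustrative; the terminating '1' encodes the value 0.5, which happens to lie inside the final interval for this message):

```python
# Ranges from Example (2); message "1321".
RANGES = {'1': (0.00, 0.80), '2': (0.80, 0.82), '3': (0.82, 1.00)}

def encode_rescaled(message):
    low, high, bits = 0.0, 1.0, []
    for c in message:
        rng = high - low
        high = low + rng * RANGES[c][1]
        low = low + rng * RANGES[c][0]
        # E1/E2 rescaling: emit a bit whenever the interval is
        # entirely inside [0, 0.5) or entirely inside [0.5, 1).
        while high <= 0.5 or low >= 0.5:
            if high <= 0.5:            # E1: [0, 0.5) -> [0, 1), emit '0'
                bits.append('0')
                low, high = 2 * low, 2 * high
            else:                      # E2: [0.5, 1) -> [0, 1), emit '1'
                bits.append('1')
                low, high = 2 * (low - 0.5), 2 * (high - 0.5)
    # Terminate: for this message a single '1' (the value 0.5)
    # falls inside the final interval [low, high).
    bits.append('1')
    return ''.join(bits)

print(encode_rescaled("1321"))  # 1100011
```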
Arithmetic Coding — Example (2)

[Figure: Example (2) with rescaling — after '3' the interval [0.656, 0.8) is rescaled (E2) to [0.312, 0.6); after '2' the interval [0.5424, 0.54816) is rescaled five times, through [0.0848, 0.09632), [0.1696, 0.19264), [0.3392, 0.38528), and [0.6784, 0.77056), to [0.3568, 0.54112); the final '1' narrows it to [0.3568, 0.504256).]
Encoding:

New character   Lower    Upper      Code
(start)         0        1
1               0        0.8
3               0.656    0.8
rescale         0.312    0.6        1
2               0.5424   0.54816
rescale         0.0848   0.09632    1
rescale         0.1696   0.19264    0
rescale         0.3392   0.38528    0
rescale         0.6784   0.77056    0
rescale         0.3568   0.54112    1
1               0.3568   0.504256
EOF                                 1

To terminate, transmit any binary value between lower and upper; here a single '1' bit (the value 0.5) suffices.
- Decoding the bit stream starting with 1100011…
- The number of bits needed to distinguish the different symbols is ⌈−log2 0.02⌉ = 6 bits, so the decoder examines a 6-bit window of the stream, rescaling it in step with the encoder.
- Window 110001 = 0.765625 ∈ [0, 0.8) → c = 1; the interval becomes [0, 0.8).
- 0.765625 ∈ [0.656, 0.8) → c = 3; rescale [0.656, 0.8) → [0.312, 0.6) and shift the window to 100011 = 0.546875.
- (0.546875 − 0.312) / (0.6 − 0.312) = 0.815 ∈ [0.80, 0.82) → c = 2; the interval becomes [0.5424, 0.54816); five rescales take the window through 000110 ([0.0848, 0.09632)), 001100 ([0.1696, 0.19264)), 011000 ([0.3392, 0.38528)), 110000 ([0.6784, 0.77056)) to 100000 ([0.3568, 0.54112)).
- Window 100000 = 0.5: (0.5 − 0.3568) / (0.54112 − 0.3568) = 0.777 ∈ [0, 0.8) → c = 1.
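The 6-bit figure is just the precision needed to resolve the narrowest symbol range (width 0.02); checking the arithmetic:

```python
import math

# The narrowest range in Example (2) has width 0.02 (symbol '2'),
# so ceil(-log2(0.02)) bits are needed to resolve it.
bits_needed = math.ceil(-math.log2(0.02))
print(bits_needed)  # 6
```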
Higher-order and Adaptive Modeling
- To achieve a good compression ratio with statistical-model compression methods, the model should:
  - accurately predict the frequency/probability of symbols in the data stream;
  - yield a non-uniform distribution.
- Finite-context modeling provides better prediction ability.
Higher-order and Adaptive Modeling
- Finite-context modeling:
  - Calculates the probabilities for each incoming symbol based on the context in which the symbol appears.
    - e.g. p(u) = 0.05, but p(u | q) = 0.95
  - The order of the model refers to the number of previous symbols that make up the context.
    - e.g. p(x_n | x_{n-1}, x_{n-2}, ..., x_{n-k})
  - In information theory, this type of finite-context modeling is called a Markov process/system.
Higher-order and Adaptive Modeling
- Problem:
  - As the order of the model increases linearly, the memory consumed by the model increases exponentially.
  - e.g. for q symbols and order k, the table size will be q^k.
- Solution:
  - Adaptive modeling.
Higher-order and Adaptive Modeling
- Adaptive modeling:
  - In adaptive data compression, both the compressor and decompressor start with the same model.
  - The compressor encodes a symbol using the existing model, then updates the model to account for the new symbol.
  - The decompressor likewise decodes a symbol using the existing model, then updates the model.
Higher-order and Adaptive Modeling
- Adaptive data compression has a slight disadvantage in that it starts compressing with less-than-optimal statistics.
- After subtracting the cost of transmitting the statistics with the compressed data, however, an adaptive algorithm will usually perform better than a fixed statistical model.
- Adaptive compression also pays a cost for updating the model.
Higher-order and Adaptive Modeling
- Encoding phase:

low = 0.0 ; high = 1.0 ;
while not EOF do
    read(c) ;
    range = high - low ;
    high = low + range * high_range(context, c) ;
    low = low + range * low_range(context, c) ;
    update_model(context, c) ;
    context = c ;
enddo
output(low) ;
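A minimal order-1 adaptive model that could back the `high_range` / `low_range` / `update_model` calls above (a sketch; the count-based ranges and the `AdaptiveModel` class are illustrative, not from the slides):

```python
from collections import defaultdict

class AdaptiveModel:
    """Order-1 adaptive model: one frequency table per context symbol."""

    def __init__(self, alphabet):
        self.alphabet = alphabet
        # Start every context with a count of 1 per symbol so no range is empty.
        self.counts = defaultdict(lambda: {s: 1 for s in alphabet})

    def ranges(self, context):
        """Cumulative [low_range, high_range) for each symbol in this context."""
        table = self.counts[context]
        total = sum(table.values())
        cum, out = 0, {}
        for s in self.alphabet:
            out[s] = (cum / total, (cum + table[s]) / total)
            cum += table[s]
        return out

    def update(self, context, c):
        self.counts[context][c] += 1

model = AdaptiveModel("ab")
print(model.ranges(None)["a"])  # (0.0, 0.5)
model.update(None, "a")         # after seeing 'a' in this context...
print(model.ranges(None)["a"])  # ...'a' now covers [0, 2/3)
```

Because the decoder performs the same updates in the same order, both sides keep identical tables without any statistics being transmitted.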
Higher-order and Adaptive Modeling
- Instead of just having a single context table, we now have a set of q context tables.
  - Every symbol is encoded using the context table of the previously seen symbol, and only the statistics for the selected context are updated after the symbol is seen.
Higher-order and Adaptive Modeling
- Decoding phase:

r = input_code ;
repeat
    search c in context_table[context] such that r falls in its range ;
    output(c) ;
    range = high_range(context, c) - low_range(context, c) ;
    r = r - low_range(context, c) ;
    r = r / range ;
    update_model(context, c) ;
    context = c ;
until r equals 0 ;
Applications
The JBIG Standard
- JBIG: the Joint Bi-level Image Experts Group.
- JBIG was issued in 1993 by ISO/IEC for the progressive lossless compression of binary and low-precision gray-level images (typically having fewer than 6 bits/pixel).
- The major advantages of JBIG over other existing standards are its capability of progressive encoding and its superior compression efficiency.
The JBIG Standard
- Context-based arithmetic coder: the core of JBIG is an adaptive, context-based arithmetic coder.
- Suppose the probability of encountering a black pixel, p, is 0.2 and the probability of encountering a white pixel, q, is 0.8.
- Using a single arithmetic coder, the entropy is

H = −0.2 log2 0.2 − 0.8 log2 0.8 ≈ 0.722 bits/pixel
The JBIG Standard
- Context-based arithmetic coder: group the pixels into Set A (80%) and Set B (20%) and use two coders:
  - Set A: pw = 0.95, pb = 0.05, HA ≈ 0.286
  - Set B: pw = 0.3, pb = 0.7, HB ≈ 0.881
  - The average entropy is then H = 0.8 × HA + 0.2 × HB ≈ 0.405 bits/pixel.
- The number of possible context patterns is 1024; the JBIG coder uses 1024 or 4096 coders.

[Figure: the context template of neighboring pixels surrounding the pixel to be coded.]
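The entropy figures on these two slides can be verified directly (a quick check of the arithmetic; the helper name `H` is illustrative):

```python
import math

def H(p):
    """Binary entropy in bits of a source emitting black with probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

single = H(0.2)                        # one coder for the whole image
split = 0.8 * H(0.05) + 0.2 * H(0.7)   # Set A (80%) and Set B (20%)
print(round(single, 3))  # 0.722
print(round(split, 3))   # 0.405
```

Splitting the pixels by context nearly halves the entropy, which is exactly why JBIG dedicates a separate adaptive coder to each context pattern.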
Experimental Results
Compression using adaptive arithmetic codes on pixel values:

Image Name   Bits/Pixel   Total Size (bytes)   Ratio (arithmetic)   Ratio (Huffman)
Sena         6.52         56,431               1.23                 1.16
Sensin       7.12         58,306               1.12                 1.27
Earth        4.67         38,248               1.71                 1.67
Omaha        6.84         56,061               1.17                 1.14
Experimental Results
Compression using adaptive arithmetic codes on pixel differences:

Image Name   Bits/Pixel   Total Size (bytes)   Ratio (arithmetic)   Ratio (Huffman)
Sena         3.89         31,847               2.06                 2.08
Sensin       4.56         37,387               1.75                 1.73
Earth        3.92         32,137               2.04                 2.04
Omaha        6.27         51,393               1.28                 1.26
Conclusions
- Compression-ratio tests show that statistical modeling can perform at least as well as dictionary-based methods, but high-order models are at present somewhat impractical because of their resource requirements.
- JPEG and MPEG-1/2 use Huffman and arithmetic coding, preprocessed by DPCM.
- JPEG-LS
- JPEG2000 and MPEG-4 use arithmetic coding only.
- Order-3 models give the best performance for Unix files.