Transcript: Lecture 4

Source Coding and Compression

Dr.-Ing. Khaled Shawky Hassan
Room: C3-222, ext: 1204, Email: [email protected]

Static vs. Adaptive Coding

Static (Two-Pass Model)

Encoder
1. Initialize the data model based on a first pass over the data (i.e., perform the probability analysis).
2. Transmit the data model (encoder).
3. Send the data; while there is more data to send:
   -- Encode the next symbol using the existing data model and send it.

Decoder
1. Receive the data model (decoder).
2. Receive the data; while there is more data to receive:
   -- Decode the next symbol using the data model and output it.

Summary of the two-pass procedure:
1. Collect statistics, generate codewords (1st pass).
2. Perform the actual encoding/compression (2nd pass).
3. Not practical in many situations (e.g., compressing network transmissions).
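As an illustration of the two passes, here is a minimal Python sketch (the helper names are mine, not from the lecture): pass 1 builds a Huffman code from symbol counts, pass 2 does the actual encoding. The code table produced in pass 1 is exactly the "data model" that would have to be transmitted.

import heapq
from collections import Counter

def build_huffman_code(data):
    """Pass 1: collect statistics and generate codewords."""
    freq = Counter(data)
    # Heap entries: (weight, unique_id, tree); a tree is a symbol or a pair.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # merge the two lightest subtrees,
        w2, _, t2 = heapq.heappop(heap)   # exactly the bottom-up construction
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: recurse 0/1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: a source symbol
            code[tree] = prefix or "0"    # single-symbol alphabet edge case
    walk(heap[0][2], "")
    return code

def encode(data, code):
    """Pass 2: perform the actual encoding/compression."""
    return "".join(code[s] for s in data)

data = "aardvark"
code = build_huffman_code(data)   # pass 1 (the table itself must be sent too)
print(code, encode(data, code))   # pass 2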

Static vs. Adaptive Coding

Adaptive (One-Pass Model)

Encoder
1. Initialize the data model with fixed probabilities and a fixed code length.
2. Send the data; while there is more data to send:
   a. Encode the next symbol using the data model (if we have one) and send it.
   b. Modify the existing data model based on the last symbol.

Decoder
1. Initialize the data model as per agreement.
2. While there is more data to receive:
   a. Decode the next symbol using the data model and output it.
   b. Modify the data model based on the decoded symbol.

What do we find? No encoder map to send!

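The essential point is that the encoder and the decoder apply the identical update after every symbol, so the two models stay synchronized without ever being transmitted. A minimal runnable sketch of this discipline, using a toy adaptive model of my own (move-to-front ranks, not the lecture's Huffman model):

class ToyAdaptiveModel:
    def __init__(self, alphabet):
        self.order = list(alphabet)      # agreed initial model

    def encode(self, s):
        return self.order.index(s)       # "code" = current rank of s

    def decode(self, rank):
        return self.order[rank]

    def update(self, s):                 # move-to-front: recently seen
        self.order.remove(s)             # symbols get smaller ranks
        self.order.insert(0, s)

def adaptive_encode(text, model):
    out = []
    for s in text:
        out.append(model.encode(s))      # use the *current* model ...
        model.update(s)                  # ... then adapt it
    return out

def adaptive_decode(ranks, model):
    out = []
    for r in ranks:
        s = model.decode(r)              # same current model the encoder had
        model.update(s)                  # identical update keeps them in sync
        out.append(s)
    return "".join(out)

alphabet = "abcdefghijklmnopqrstuvwxyz"
codes = adaptive_encode("aardvark", ToyAdaptiveModel(alphabet))
assert adaptive_decode(codes, ToyAdaptiveModel(alphabet)) == "aardvark"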

Huffman Coding (e.g., Lossless JPEG)

Properties:
I. Huffman codes are built from the bottom up, starting with the leaves of the tree and working progressively closer to the root.
II. Huffman coding is always at least as efficient as Shannon-Fano coding, so it has become the predominant entropy-coding method.
III. It was shown that Huffman coding cannot be improved upon by any other coding scheme with integral bit widths.

Sibling Property, defined by Gallager [Gallager 1978]: "A binary code tree has the sibling property if each node (except the root) has a sibling and if the nodes can be listed in order of nonincreasing weight with each node adjacent to its sibling."

Thus: if A is the parent node of B, and C is a child of B, then W(A) > W(B) > W(C); and if A is the parent node of B (left) and C (right), then W(B) ≤ W(C).

Huffman Coding Properties

A binary tree is a Huffman tree if and only if it obeys the sibling property, i.e., W(#1) ≤ W(#2) ≤ W(#3) ≤ … ≤ W(#7) ≤ W(#8) ≤ W(#9).

[Figure: example Huffman tree with leaves #1 A(1), #2 B(2), #3 C(2), #4 D(2), #8 E(10) and internal nodes #5(3), #6(4), #7(7), root #9(17).]
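As a quick sanity check, the weights of this example tree can be tested against the sibling property (a Python sketch; the node weights and sibling pairings are my reading of the figure):

weights = {1: 1, 2: 2, 3: 2, 4: 2, 5: 3, 6: 4, 7: 7, 8: 10, 9: 17}

# Listed by node number, the weights must be nondecreasing ...
assert all(weights[i] <= weights[i + 1] for i in range(1, 9))

# ... and consecutive nodes #1/#2, #3/#4, #5/#6, #7/#8 are siblings whose
# weights sum to their parent's weight (#5, #6, #7, #9 respectively).
for a, b, parent in [(1, 2, 5), (3, 4, 6), (5, 6, 7), (7, 8, 9)]:
    assert weights[a] + weights[b] == weights[parent]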

Adaptive Huffman Coding

Algorithm:

o Given: alphabet S = {s_1, …, s_n} (NO probabilities!!!)
o Pick fixed default binary codes for all symbols (block/quadratic code)
o Start with an empty "Huffman" tree (I said it and I mean it: Empty)
o Read symbol s from the source:
    If NYT(s)   %% (//) Not Yet Transmitted
        Send NYT, default(s) (except for the first symbol)
        Update the tree (and keep it Huffman)
    Else
        Send the codeword for s
        Update the tree
o Repeat until done with all symbols in the source
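A control-flow sketch of this loop in Python. The Huffman-tree bookkeeping (codeword lookup and the update procedure, which is the hard part) is abstracted behind the tree_code and update_tree callables; those names, like fixed_code, are mine, not the lecture's.

def adaptive_huffman_encode(source, fixed_code, tree_code, update_tree):
    seen = set()
    for i, s in enumerate(source):
        if s not in seen:                  # NYT: s not yet transmitted
            if i > 0:                      # (no NYT escape for the very
                yield tree_code("NYT")     #  first symbol: tree is empty)
            yield fixed_code(s)            # fall back to the default code
            seen.add(s)
        else:
            yield tree_code(s)             # current Huffman codeword for s
        update_tree(s)                     # update, keeping the tree Huffman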

Example (Adaptive Huffman)

• Assume we are encoding the message [a a r d v a r k].
• The total number of nodes in this tree will be (at most) 2n − 1 + 2 = 2·26 − 1 + 2 = 53, where n is the number of usable alphabet symbols and the +2 is only for the "NYT" and its "node".
• The first letter to be transmitted is "a".
• As "a" does not yet exist in the tree, we send the binary code 00000 for "a" and then add "a" to the tree.
• The NYT node gives birth to a new NYT node and a terminal node corresponding to "a".
• In this example, we will consider only 51 nodes and leaves (instead of 53!!); however, the correct count is 53. The weight of the terminal node will be higher than that of the NYT node, so we assign the number 49 to the NYT node and 50 to the terminal node "a".
• The next symbol is "a", and the transmitted code is now just 1 (as a = 1 only now!).
• Let's see an example … (we first start with a fixed code!)

Example: Adaptive Huffman Coding

Input: aardvark
Output: (nothing sent yet)

Symbol: NYT  a  r  d  v  k
Code:   0    -  -  -  -  -

(In the following slides, the bracketed input symbol is the one currently being encoded, and the bits after "|" in the output are the ones just sent.)

Note: to keep the rest of the slides as they are, we start with 51 nodes as the book does; however, the correct thing is to start with 53!

Example: Adaptive Huffman Coding

Input: [a]ardvark
Output: 00000

Symbol: NYT  a  r  d  v  k
Code:   0    -  -  -  -  -

Example: Adaptive Huffman Coding

Input: [a]ardvark
Output: 00000

Symbol: NYT  a  r  d  v  k
Code:   0    1  -  -  -  -

Example: Adaptive Huffman Coding

Input: a[a]rdvark
Output: 00000 | 1

Symbol: NYT  a  r  d  v  k
Code:   0    1  -  -  -  -

Example: Adaptive Huffman Coding

Input: aa[r]dvark
Output: 000001 | 0 10001

Symbol: NYT  a  r   d  v  k
Code:   00   1  01  -  -  -

Example: Adaptive Huffman Coding

Input: aar[d]vark
Output: 000001010001 | 00 00011

Symbol: NYT  a  r   d    v  k
Code:   000  1  01  001  -  -

Example: Adaptive Huffman Coding

Input: aard[v]ark
Output: 0000010100010000011 | 000 …

Symbol: NYT  a  r   d    v       k
Code:   000  1  01  001  0001??  -


Example: Adaptive Huffman Coding

Input: aard[v]ark
Output: 0000010100010000011 | 000 …

Symbol: NYT  a  r   d    v       k
Code:   000  1  01  001  0101??  -

Example: Adaptive Huffman Coding

Input: aard[v]ark
Output: 0000010100010000011 | 000 …

Symbol: NYT  a  r   d    v   k
Code:   000  1  01  001  ??  -

Example: Adaptive Huffman Coding

Input: aard[v]ark
Output: 0000010100010000011000 | 10101

Symbol: NYT   a  r   d    v     k
Code:   1100  0  10  111  1101  -

Example: Adaptive Huffman Coding

Input: aardv[a]rk
Output: 000001010001000001100010101 | 0

Symbol: NYT   a  r   d    v     k
Code:   1100  0  10  111  1101  -

Example: Adaptive Huffman Coding

Input: aardva[r]k
Output: 0000010100010000011000101010 | 10

Symbol: NYT   a  r   d    v     k
Code:   1100  0  10  111  1101  -

Example: Adaptive Huffman Coding

Input: aardvar[k]
Output: 000001010001000001100010101010 | 1100 01010

Symbol: NYT   a  r   d    v     k
Code:   1100  0  10  111  1101  11001??

Example: Adaptive Huffman Coding

Input: aardvar[k]
Output: 0000010100010000011000101010101100 | 01010

Symbol: NYT   a  r   d    v     k
Code:   1100  0  10  111  1101  11001??

Example: Adaptive Huffman Coding

Input: aardvar[k]
Output: 0000010100010000011000101010101100 01010 (complete)

Symbol: NYT    a  r   d    v     k
Code:   11100  0  10  110  1111  11101

Adaptive Huffman Decoding

Input (bits): 00000 | 10100010000011000101010 10110001010
Output (decoded): a

Symbol: NYT  a  r  d  v  k
Code:   0    1  -  -  -  -

Adaptive Huffman Decoding

Input (bits): 000001 | 0 100010000011000101010 10110001010
Output (decoded): a[a]

Symbol: NYT  a  r  d  v  k
Code:   0    1  -  -  -  -

Adaptive Huffman Decoding

Input (bits): 000001 0 10001 | 00 00011000101010 10110001010
Output (decoded): aa[r]

Symbol: NYT  a  r   d  v  k
Code:   00   1  01  -  -  -

Adaptive Huffman Decoding

Input (bits): 00000101000100 00011 | 000 101010 10110001010
Output (decoded): aar[d]

Symbol: NYT  a  r   d    v  k
Code:   000  1  01  001  -  -

Adaptive Huffman Decoding

Input (bits): 0000010100010000011 000 10101 | 0 10110001010
Output (decoded): aard[v]

Symbol: NYT  a  r   d    v       k
Code:   000  1  01  001  0001??  -


Adaptive Huffman Decoding

Input (bits): 0000010100010000011 000 10101 | 0 10110001010
Output (decoded): aard[v]

Symbol: NYT  a  r   d    v       k
Code:   000  1  01  001  0101??  -

Adaptive Huffman Decoding

Input (bits): 0000010100010000011 000 10101 | 0 10110001010
Output (decoded): aard[v]

Symbol: NYT  a  r   d    v   k
Code:   000  1  01  001  ??  -

Adaptive Huffman Decoding

Input (bits): 0000010100010000011 000 10101 | 0 10110001010
Output (decoded): aard[v]

Symbol: NYT   a  r   d    v     k
Code:   1100  0  10  111  1101  -

Adaptive Huffman Decoding

Input (bits): 000001010001000001100010101 0 | 10110001010
Output (decoded): aardv[a]

Symbol: NYT   a  r   d    v     k
Code:   1100  0  10  111  1101  -

Adaptive Huffman Decoding

Input (bits): 0000010100010000011000101010 10 | 110001010
Output (decoded): aardva[r]

Symbol: NYT   a  r   d    v     k
Code:   1100  0  10  111  1101  -

Adaptive Huffman Decoding

Input (bits): 0000010100010000011000101010 10 1100 | 01010
Output (decoded): aardvar

Symbol: NYT   a  r   d    v     k
Code:   1100  0  10  111  1101  -

Adaptive Huffman Decoding

Input (bits): 0000010100010000011000101010 10 1100 01010
Output (decoded): aardvar[k]

Symbol: NYT    a  r   d    v     k
Code:   11000  0  10  111  1101  ??

Adaptive Huffman Decoding

Input (bits): 0000010100010000011000101010 10 1100 01010 (all consumed)
Output (decoded): aardvar[k]

Symbol: NYT    a  r   d    v     k
Code:   11100  0  10  110  1111  11101

Adaptive Huffman Exercise

Try to solve the following! Find the adaptive Huffman encoding (compression) of the following text: raaaabcbaacvkl, assuming a 26-letter alphabet!

Adaptive Huffman Notes

To follow the textbook example: if the source has an alphabet {a_1, a_2, …, a_m} of size m, then pick e and r such that m = 2^e + r and 0 ≤ r < 2^e. The letter a_k is encoded as the (e+1)-bit binary representation of k−1, iff 1 ≤ k ≤ 2r; else, a_k is encoded as (only) the e-bit binary representation of k−r−1.

Example: suppose m = 26; then e = 4 and r = 10. The symbol a1 is encoded as 00000 ("a" in English), the symbol a2 is encoded as 00001 ("b" in English), and the symbol a22 is encoded as 1011 ("v" in English).
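To make the rule concrete, here is a small Python sketch of this default code (the function name fixed_code is mine):

def fixed_code(k, m=26):
    """Default code for symbol a_k in an m-letter alphabet (k is 1-indexed)."""
    e = m.bit_length() - 1     # largest e with 2**e <= m
    r = m - 2**e               # so m = 2**e + r, with 0 <= r < 2**e
    if 1 <= k <= 2 * r:
        return format(k - 1, f"0{e + 1}b")   # (e+1)-bit code of k-1
    return format(k - r - 1, f"0{e}b")       # e-bit code of k-r-1

# For m = 26: e = 4, r = 10, and
print(fixed_code(1))    # a  -> 00000
print(fixed_code(2))    # b  -> 00001
print(fixed_code(22))   # v  -> 1011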

Adaptive Huffman Applications: Lossless Image Compression

Steps for lossless image compression:
1. Generate a Huffman code for each uncompressed image (or one already quantized and compressed with lossy methods).
2. Encode the image using the Huffman code.
3. Save it in a file again!!!

The original (uncompressed) image representation uses 8 bits/pixel. The image consists of 256 rows of 256 pixels, so the uncompressed representation uses 65,536 bytes.

Compression ratio → number of bytes (uncompressed) : number of bytes (compressed)
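As a quick check of this definition (a sketch; the compressed byte count is taken from the Sena row of the first results table below):

uncompressed = 256 * 256                   # 65,536 bytes at 8 bits/pixel
compressed = 57_504                        # Sena, Huffman on pixel values
print(f"{uncompressed / compressed:.2f}")  # -> 1.14, matching the table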

Adaptive Huffman Applications: Lossless Image Compression

[Figure]

Adaptive Huffman Applications: Lossless Image Compression

Image Name | Bits/Pixel | Total Size (B) | Compression Ratio
Sena       | 7.01       | 57,504         | 1.14
Sensin     | 7.49       | 61,430         | 1.07
Earth      | 4.94       | 40,534         | 1.62
Omaha      | 7.12       | 58,374         | 1.12

Huffman (lossless JPEG) compression based on pixel values.

Adaptive Huffman Applications: Lossless Image Compression

Image Name | Bits/Pixel | Total Size (B) | Compression Ratio
Sena       | 4.02       | 32,968         | 1.99
Sensin     | 4.70       | 38,541         | 1.70
Earth      | 4.13       | 33,880         | 1.93
Omaha      | 6.42       | 52,643         | 1.24

Huffman compression based on pixel difference values and the two-pass model.

Adaptive Huffman Applications: Lossless Image Compression

Image Name | Bits/Pixel | Total Size (B) | Compression Ratio
Sena       | 3.93       | 32,261         | 2.03
Sensin     | 4.63       | 37,896         | 1.73
Earth      | 4.82       | 39,504         | 1.66
Omaha      | 6.39       | 52,321         | 1.25

Huffman compression based on pixel difference values and the one-pass adaptive model.


Optimality of Huffman Codes!

The necessary conditions for an optimal variable-length binary code:

Condition 1: Given any two letters a_j and a_k, if P(a_j) ≥ P(a_k), then l_j ≤ l_k, where l_j is the number of bits in the codeword for a_j.

Condition 2: The two least probable letters have codewords with the same maximum length l_m.

Condition 3: In the tree corresponding to the optimum code, there must be two branches stemming from each intermediate node.

Condition 4: Suppose we change an intermediate node into a leaf node by combining all the leaves descending from it into a composite word of a reduced alphabet. Then, if the original tree was optimal for the original alphabet, the reduced tree is optimal for the reduced alphabet.

Minimum Variance Huffman Codes

By performing the sorting procedure in a slightly different manner, we could have found a different Huffman code.

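A numeric illustration: for the classic five-letter source with probabilities (0.4, 0.2, 0.2, 0.1, 0.1) (my example; the slide gives no numbers), two different sorting choices during the merge step yield two valid Huffman codes with equal average length but very different variance of the codeword lengths:

P = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths_a = [1, 2, 3, 4, 4]   # ties resolved placing merged nodes low
lengths_b = [2, 2, 2, 3, 3]   # minimum variance: merged nodes placed high

for L in (lengths_a, lengths_b):
    avg = sum(p * l for p, l in zip(P, L))
    var = sum(p * (l - avg) ** 2 for p, l in zip(P, L))
    print(f"average = {avg:.2f} bits, variance = {var:.2f}")
# -> average = 2.20 bits, variance = 1.36
# -> average = 2.20 bits, variance = 0.16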

Huffman Coding: Self Study!

3.2.1 Minimum Variance Huffman Codes (pp. 46–47; redo the examples)
3.2.3 Length of Huffman Codes (pp. 49–51, and Example 3.2.2)
3.2.2 Huffman code optimality conditions!!