Huffman Code and Data Compression
Pranav Shah, CS157B
Why Data Compression?
 Fixed-length data is inefficient for transfer and storage.
Types of Compression
 Lossless Compression
 The exact original data is reconstructed from the compressed data.
 Nothing is lost.
 Examples: ZIP archives, bank account records
Types of Compression
 Lossy Compression
 An approximation of the original data is reconstructed from the compressed data.
 Example: JPEG – image quality degrades after repeated compressions.
[Image: file-size comparison – 87 KB original vs. 26 KB compressed]
Variable Length Bit Coding
 Maps source symbols to a variable number of bits.
 Allows sources to be compressed and decompressed with
zero error.
 Examples: Huffman Coding, Lempel-Ziv Coding, and Arithmetic Coding
Variable Bit Coding Rules
 Use the minimum number of bits.
 This speeds up transfers and increases effective storage capacity.
Variable Bit Coding Rules
 No codeword may be a prefix of another codeword.
 Example: if A has the code 01, then B cannot have the code 010, because 01 is a prefix of 010.
 This enables unambiguous left-to-right decoding.
 Example: after reading 01, you know the symbol is A and not the start of any other codeword (not B!).
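The prefix rule can be checked mechanically. A minimal sketch (the function name `is_prefix_free` is ours, not from the slides):

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another, i.e. the code
    can be decoded unambiguously left to right."""
    return not any(
        a != b and b.startswith(a) for a in codewords for b in codewords
    )

# The slide's example: A = 01, so B = 010 is illegal (01 is a prefix of 010).
print(is_prefix_free(["01", "010"]))       # False
print(is_prefix_free(["01", "10", "11"]))  # True
```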
Huffman Code
 An entropy encoding algorithm used for lossless data compression.
 A variable-length code with average length L = l1p1 + l2p2 + … + lMpM, where l1, l2, …, lM are the codeword lengths and p1, p2, …, pM are the probabilities of the source alphabet symbols A1, A2, …, AM being generated.
 Uses a binary tree.
 The code is generated using the binary Huffman code construction method.
 When all symbols are equally likely (and their number is a power of two), it reduces to simple binary block encoding (e.g., ASCII).
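The average-length formula can be evaluated directly. A small sketch using the deck's five-symbol example (the code lengths 2, 2, 3, 2, 3 are those of the code the deck derives at the end):

```python
# L = l1*p1 + l2*p2 + ... + lM*pM  (bits per source symbol).
probs   = {"A": 0.19, "B": 0.28, "C": 0.13, "D": 0.30, "E": 0.10}
lengths = {"A": 2, "B": 2, "C": 3, "D": 2, "E": 3}

L = sum(lengths[s] * probs[s] for s in probs)
print(round(L, 2))  # 2.23 bits/symbol, vs. 3 bits for a fixed-length code
```

A fixed-length code for five symbols needs ceil(log2 5) = 3 bits per symbol, so the variable-length code saves 0.77 bits per symbol on average.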
Algorithm
 1. Make a leaf node for each source symbol.
 2. Assign each symbol's generation probability to its leaf node.
 3. Take the two nodes with the smallest probabilities and connect them under a new node.
 4. Label one of the two branches 0 and the other 1.
 5. The probability of the new node is the sum of the probabilities of the two connected nodes.
 6. If only one node is left, the code construction is complete; otherwise, go back to step 3.
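The steps above can be sketched in Python with a min-heap. This is our own sketch (the function name, the tuple tree representation, and the tie-break counter are assumptions; the slides specify only the abstract steps):

```python
import heapq

def huffman_codes(freqs):
    """Build Huffman codes from {symbol: probability}: repeatedly merge
    the two lowest-probability nodes, labelling branches 0 and 1."""
    # Heap entries: (probability, tiebreak, tree); a tree is either a
    # symbol or a (left, right) pair of subtrees.
    heap = [(p, i, s) for i, (s, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # smallest probability
        p2, _, t2 = heapq.heappop(heap)   # second smallest
        count += 1
        # New node's probability is the sum of the two merged nodes.
        heapq.heappush(heap, (p1 + p2, count, (t1, t2)))
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")   # left branch gets a 0
            walk(tree[1], prefix + "1")   # right branch gets a 1
        else:
            codes[tree] = prefix or "0"   # single-symbol edge case
    walk(heap[0][2], "")
    return codes
```

Run on the deck's example frequencies, this produces codeword lengths of 2, 2, 3, 2, 3 bits for A, B, C, D, E (the exact bit patterns depend on how 0/1 are assigned at each merge).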
Example
Characters  Frequency
A           19% (0.19)
B           28% (0.28)
C           13% (0.13)
D           30% (0.30)
E           10% (0.10)
Step 1
Take the two lowest frequencies (E = 0.10 and C = 0.13) and merge them into a node EC with probability 0.23.

[Tree: node EC (0.23) with children E (0.10) and C (0.13)]

Characters  Frequency
A           19% (0.19)
EC          23% (0.23)
B           28% (0.28)
C           13% (0.13)
D           30% (0.30)
E           10% (0.10)
Step 2
Take the next two lowest (A = 0.19 and EC = 0.23) and connect them into a node AEC with probability 0.42.

[Tree: node AEC (0.42) with children A (0.19) and EC (0.23); EC has children E (0.10) and C (0.13)]

Characters  Frequency
A           19% (0.19)
EC          23% (0.23)
B           28% (0.28)
C           13% (0.13)
D           30% (0.30)
AEC         42% (0.42)
E           10% (0.10)
Step 3
 Continue: merge B = 0.28 and D = 0.30 into a node BD with probability 0.58.

[Trees: AEC (0.42) over A (0.19) and EC (0.23), with EC over E (0.10) and C (0.13); BD (0.58) over B (0.28) and D (0.30)]

Characters  Frequency
A           19% (0.19)
EC          23% (0.23)
B           28% (0.28)
C           13% (0.13)
D           30% (0.30)
AEC         42% (0.42)
BD          58% (0.58)
E           10% (0.10)
Completed Tree
[Tree: root (1.0) with children AEC (0.42) and BD (0.58); AEC splits into A (0.19) and EC (0.23); EC splits into E (0.10) and C (0.13); BD splits into B (0.28) and D (0.30)]
Add 0 or 1 to each branch
[Tree with branch labels: root → 0 → AEC (0.42), root → 1 → BD (0.58); AEC → 0 → A (0.19), AEC → 1 → EC (0.23); EC → 0 → E (0.10), EC → 1 → C (0.13); BD → 0 → B (0.28), BD → 1 → D (0.30)]
Generated Code
Characters  Frequency   Code
A           19% (0.19)  00
B           28% (0.28)  10
C           13% (0.13)  011
D           30% (0.30)  11
E           10% (0.10)  010
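The generated code can be exercised end to end. A short sketch (the `encode`/`decode` helpers are our own, built from the table above):

```python
# Codeword table from the slides' worked example.
code = {"A": "00", "B": "10", "C": "011", "D": "11", "E": "010"}

def encode(text):
    return "".join(code[ch] for ch in text)

def decode(bits):
    """Left-to-right decoding: because the code is prefix-free, a symbol
    is emitted as soon as the buffered bits match a codeword."""
    rev = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in rev:
            out.append(rev[buf])
            buf = ""
    return "".join(out)

print(encode("ABCDE"))          # 001001111010
print(decode(encode("ABCDE")))  # ABCDE
```

The message ABCDE takes 12 bits here versus 15 bits with a fixed 3-bit code, and the decoder never needs delimiters between codewords.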
References
 http://gadgethobby.com/wp-content/plugins/blog/images/data-compression.jpg
 http://www.steves-digicams.com/knowledge-center/jpeg-images-counting-your-losses.html
 http://en.wikipedia.org/wiki/Variable-length_code
 http://en.wikipedia.org/wiki/Huffman_coding
 http://www.aykew.com/aboutwork/speed.html
 http://www.000studio.com/kobe_biennale2007/main/gallery.php?id=1