Huffman Code and Data Decomposition
Pranav Shah CS157B
Why Data Compression?
Fixed-length data encoding is inefficient for transfer and storage.
Types of Compressions
Lossless Compression
The exact original data is reconstructed from the compressed data.
Nothing is lost.
Examples: ZIP archives, bank account records
Types of Compressions
Lossy Compression
An approximation of the original data is reconstructed from the compressed data.
Example: JPEG, which loses image quality after repeated compressions.
[Figure: two images captioned "File Size: 87 KB" and "File Size: 26 KB"]
Variable Length Bit Coding
Maps source symbols to a variable number of bits.
Allows sources to be compressed and decompressed with
zero error.
Examples: Huffman coding, Lempel-Ziv coding, and arithmetic coding
Variable Bit Coding Rules
Use the minimum number of bits.
Fewer bits speed up transfers and reduce the storage needed.
Variable Bit Coding Rules
No code may be a prefix of another code.
Example: Assume A has the code 01. Then B cannot have the code 010, because 010 begins with A's code.
This enables unambiguous left-to-right decoding.
Example: When the decoder reads 01, it knows the symbol is A and not any other character (not B!)
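The prefix rule can be checked mechanically. The sketch below is illustrative only: the function name is_prefix_free and the replacement code 10 for B are assumptions, not part of the original slides.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            # b.startswith(a) also catches two symbols sharing the same code.
            if i != j and b.startswith(a):
                return False
    return True

bad = {"A": "01", "B": "010"}   # B's code begins with A's code
good = {"A": "01", "B": "10"}   # hypothetical replacement code for B

print(is_prefix_free(bad))   # False
print(is_prefix_free(good))  # True
```

The pairwise comparison is quadratic in the number of symbols, which is fine for small alphabets like the ones in these slides.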
Huffman Code
Entropy encoding algorithm used for lossless data compression.
Variable-length code that minimizes the average code length L = l_1 p_1 + l_2 p_2 + … + l_M p_M, where l_1, l_2, …, l_M are the code lengths and p_1, p_2, …, p_M are the probabilities of the source symbols A_1, A_2, …, A_M being generated.
Uses a binary tree.
The Huffman code is generated using the binary Huffman code construction method.
Equivalent to simple binary block encoding (example: ASCII) when all symbols are equally probable and the number of symbols is a power of two.
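A minimal sketch of the average-length formula in Python. The function name average_length and the four-symbol uniform distribution are assumptions chosen for illustration; they show the case where a variable-length code does no better than a fixed-length block code.

```python
def average_length(lengths, probabilities):
    """Average code length L = l_1*p_1 + ... + l_M*p_M, in bits per symbol."""
    if len(lengths) != len(probabilities):
        raise ValueError("need one length per probability")
    return sum(l * p for l, p in zip(lengths, probabilities))

# Assumed illustration: four equally likely symbols, each given a 2-bit code.
uniform = average_length([2, 2, 2, 2], [0.25, 0.25, 0.25, 0.25])
print(uniform)  # 2.0, the same as a fixed-length 2-bit block code
```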
Algorithm
(1) Make a leaf node for each code symbol. Add the generation probability of each symbol to its leaf node.
(2) Take the two nodes (leaves or previously merged nodes) with the smallest probabilities and connect them into a new node. Add 1 or 0 to each of the two branches. The probability of the new node is the sum of the probabilities of the two connected nodes.
(3) If there is only one node left, the code construction is completed. If not, go back to (2).
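The construction can be sketched in Python with a priority queue. This is one possible implementation under stated assumptions, not the author's: the function name huffman_code is hypothetical, symbols are assumed to be strings, ties are broken by insertion order, and the less probable of the two merged nodes is given branch 0 (the slides leave the 0/1 choice open). The probabilities are the ones used in the example that follows.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Return {symbol: bitstring}, built by repeatedly merging the two least probable nodes."""
    tiebreak = count()  # keeps heap entries comparable when probabilities are equal
    # Heap entry: (probability, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    heap = [(p, next(tiebreak), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, t0 = heapq.heappop(heap)  # smallest probability
        p1, _, t1 = heapq.heappop(heap)  # second smallest
        heapq.heappush(heap, (p0 + p1, next(tiebreak), (t0, t1)))

    codes = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):          # internal node: label its two branches
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:                                # leaf: record the accumulated bits
            codes[tree] = prefix or "0"      # a one-symbol alphabet still needs one bit

    assign(heap[0][2], "")
    return codes

example = {"A": 0.19, "B": 0.28, "C": 0.13, "D": 0.30, "E": 0.10}
codes = huffman_code(example)
for symbol in sorted(codes):
    print(symbol, codes[symbol])
```

Swapping the 0 and 1 labels at any node gives a different but equally short code, so other implementations may print different bit patterns with the same code lengths.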
Example
Characters   Frequency
A            19% (0.19)
B            28% (0.28)
C            13% (0.13)
D            30% (0.30)
E            10% (0.10)
Step 1
Take the two lowest frequencies and make a node.
0.23
  0.10 (E)
  0.13 (C)
Characters   Frequency
A            19% (0.19)
B            28% (0.28)
C            13% (0.13)
D            30% (0.30)
E            10% (0.10)
EC           23% (0.23)
Step 2
Take the next two lowest and connect them into a node.
0.42
  0.19 (A)
  0.23 (EC)
    0.10 (E)
    0.13 (C)
Characters   Frequency
A            19% (0.19)
B            28% (0.28)
C            13% (0.13)
D            30% (0.30)
E            10% (0.10)
EC           23% (0.23)
AEC          42% (0.42)
Step 3
Continue…
0.42
  0.19 (A)
  0.23 (EC)
    0.10 (E)
    0.13 (C)
0.58
  0.28 (B)
  0.30 (D)
Characters   Frequency
A            19% (0.19)
B            28% (0.28)
C            13% (0.13)
D            30% (0.30)
E            10% (0.10)
EC           23% (0.23)
AEC          42% (0.42)
BD           58% (0.58)
Completed Tree
1.0
  0.42 (AEC)
    0.19 (A)
    0.23 (EC)
      0.10 (E)
      0.13 (C)
  0.58 (BD)
    0.28 (B)
    0.30 (D)
Add 0 or 1 to each branch
1.0
  0: 0.42
    0: 0.19 (A)
    1: 0.23
      0: 0.10 (E)
      1: 0.13 (C)
  1: 0.58
    0: 0.28 (B)
    1: 0.30 (D)
Generated Code
Characters   Frequency    Code
A            19% (0.19)   00
B            28% (0.28)   10
C            13% (0.13)   011
D            30% (0.30)   11
E            10% (0.10)   010
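To see the generated code in use, the sketch below encodes and decodes a message with the table above and computes the average code length. The helper names encode and decode and the test message "DECADE" are assumptions made for illustration.

```python
CODES = {"A": "00", "B": "10", "C": "011", "D": "11", "E": "010"}
PROBS = {"A": 0.19, "B": 0.28, "C": 0.13, "D": 0.30, "E": 0.10}

def encode(text):
    """Concatenate the code word of each character."""
    return "".join(CODES[ch] for ch in text)

def decode(bits):
    """Read bits left to right, emitting a symbol as soon as a code word matches."""
    lookup = {code: sym for sym, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:      # safe because no code word is a prefix of another
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)

message = "DECADE"                 # hypothetical test message
bits = encode(message)
assert decode(bits) == message     # round trip recovers the original text

average = sum(len(CODES[s]) * p for s, p in PROBS.items())
print(bits, len(bits))                                    # 110100110011010 15
print(f"average length: {average:.2f} bits per symbol")   # average length: 2.23 bits per symbol
```

A fixed-length code for five symbols needs 3 bits per symbol, so the same six-character message would take 18 bits instead of 15.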