
Chapter 4
Variable-Length Codes:
Huffman Codes
Outline
• 4.1 Introduction
• 4.2 Unique Decoding
• 4.3 Instantaneous Codes
• 4.4 Construction of Instantaneous Codes
• 4.5 The Kraft Inequality
• 4.6 Huffman Codes
4.1 Introduction
• Consider the problem of efficient coding of messages to be sent over a "noiseless" channel:
  – maximize the number of messages that can be sent in a given period of time.
  – transmit a message in the shortest possible time.
  – make the codewords as short as possible.
4.2 Unique Decoding
• Source symbols (alphabet): { s1, . . . , sq }
• Code alphabet: { C1, C2, . . . , Cr }
• X is a random variable
• X → { s1, . . . , sq } with probabilities { p1, . . . , pq }
• X is observed over and over again, i.e., it generates a sequence of symbols si sj · · · sk from { s1, . . . , sq }, which is encoded as a sequence of code symbols Ci Cj · · · Ck
• Ex: s1 → 000
      s2 → 111
• The collection of all codewords is called a ‘code’.
• Our Objective:
– Minimize the average codeword length
i.e., min Σ_{i=1}^{q} p_i n_i, where n_i is the codeword length of s_i
– Unique decodability − the received message must have a single,
unique possible interpretation.
Ex. Source alphabet: {s1, s2, s3, s4}
    Code alphabet: {0, 1}
    s1 → 0
    s2 → 01
    s3 → 11
    s4 → 00
Then 0011 decodes as either s4 s3 or s1 s1 s3,
so this code doesn't satisfy unique decodability.
• Ex:
  s1 → 0
  s2 → 010
  s3 → 01
  s4 → 10
  Then 010 decodes as s2, as s1 s4, or as s3 s1.
  It also doesn't satisfy unique decodability.
• Ex:
  s1 → 0
  s2 → 01
  s3 → 011
  s4 → 111
  It is a uniquely decodable code.
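• Checks like the ones above can be mechanized. A minimal Python sketch (not from the slides) of the Sardinas–Patterson test for unique decodability; the function names are my own:

```python
def is_uniquely_decodable(code):
    """Sardinas-Patterson test: True iff 'code' is uniquely decodable."""
    code = set(code)

    def dangling(a, b):
        # suffixes left over when a word of 'a' is a proper prefix of a word of 'b'
        return {y[len(x):] for x in a for y in b
                if y.startswith(x) and len(y) > len(x)}

    seen, current = set(), dangling(code, code)
    while current:
        if current & code:              # a dangling suffix is itself a codeword
            return False
        if frozenset(current) in seen:  # suffix sets repeat: decodable
            return True
        seen.add(frozenset(current))
        current = dangling(current, code) | dangling(code, current)
    return True

print(is_uniquely_decodable(["0", "010", "01", "10"]))   # False
print(is_uniquely_decodable(["0", "01", "011", "111"]))  # True
```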
• Definition:
  – The nth extension of a code is simply the set of all possible concatenations of n codewords of the original code.
• No two encoded concatenations can be the same, even for different extensions.
• Every finite sequence of code characters corresponds to at most one message,
  ≡ every distinct sequence of source symbols has a corresponding encoded sequence that is unique.
4.3 Instantaneous Codes
• Decision (decoding) tree for the code
  s1 = 0, s2 = 10, s3 = 110, s4 = 111:

  Initial state
    ├─ 0 → s1
    └─ 1
        ├─ 0 → s2
        └─ 1
            ├─ 0 → s3
            └─ 1 → s4
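• A sketch of how the decoding tree is used (assuming the code on this slide): each received bit follows one edge, and reaching a terminal state emits a symbol and resets to the initial state.

```python
# Terminal states of the decision tree above: path from the initial state -> symbol.
TREE = {"0": "s1", "10": "s2", "110": "s3", "111": "s4"}

def decode(bits):
    symbols, path = [], ""
    for b in bits:
        path += b                 # follow one edge per received bit
        if path in TREE:          # terminal state reached
            symbols.append(TREE[path])
            path = ""             # back to the initial state
    return symbols

print(decode("010110111"))  # ['s1', 's2', 's3', 's4']
```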
• Note that each bit of the received stream is examined only once, and that the terminal states of this tree are the four source symbols s1, s2, s3 and s4.
• Definition: A code is instantaneous if it is decodable without lookahead (i.e., a codeword can be recognized as soon as it is complete).
• When a complete codeword is received, the receiver knows this immediately and does not have to look further before deciding which source symbol was received.
• A code is instantaneous iff no codeword is a prefix of another codeword.
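• The prefix condition is easy to test directly. A small sketch (helper name mine); sorting puts any prefix immediately before a word that extends it, so checking adjacent pairs suffices:

```python
def is_instantaneous(code):
    """True iff no codeword is a prefix of another (prefix condition)."""
    words = sorted(code)   # a prefix sorts directly before its extensions
    return not any(words[i + 1].startswith(words[i])
                   for i in range(len(words) - 1))

print(is_instantaneous(["0", "10", "110", "111"]))  # True
print(is_instantaneous(["0", "01", "011", "111"]))  # False
```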
• The existence of a decoding tree
  ≡ instantaneous decodability.
• Ex. Let n be a positive integer. A comma code is a code with codewords
  1, 01, 001, 0001, . . . , 0···01 (n−1 zeros), 0···0 (n zeros)
• "1" serves as a comma marking the end of a codeword.
• Because a comma code is prefix-free, it is an instantaneous code.
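• A sketch generating the binary comma code of this slide (helper name mine) and confirming it is prefix-free:

```python
def comma_code(n):
    """Codewords 0^k 1 for k = 0..n-1, plus the all-zero word 0^n."""
    return ["0" * k + "1" for k in range(n)] + ["0" * n]

code = comma_code(3)
print(code)  # ['1', '01', '001', '000']
# no codeword is a prefix of another:
print(all(not b.startswith(a) for a in code for b in code if a != b))  # True
```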
• Ex:
  s1 → 0
  s2 → 01
  s3 → 011
  s4 → 111
  Not an instantaneous code, but it is still a uniquely decodable code.
• Every I.C. is U.D., and an I.C. is better than a code that is merely U.D.
  ex: decoding 01111·····111 requires reading all the way to the end of the run of 1's before even the first symbol can be identified (the tail is some number of s4's).
• So it is better to use a comma code:
  s1 → 1
  s2 → 01
  s3 → 001
  s4 → 000
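• A small illustration (helper name mine) of why that U.D. code needs lookahead: for the string 0 followed by k ones, the first symbol depends on k mod 3, so it cannot be known until the whole run of ones has been read.

```python
# Code: s1=0, s2=01, s3=011, s4=111.  "0" + "1"*k parses uniquely as
#   k = 3m -> s1 s4...s4,  k = 3m+1 -> s2 s4...s4,  k = 3m+2 -> s3 s4...s4
def first_symbol(k):
    return {0: "s1", 1: "s2", 2: "s3"}[k % 3]

for k in (6, 7, 8):
    print("0" + "1" * k, "starts with", first_symbol(k))
```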
4.4 Construction of Instantaneous Codes
• Given five symbols si in the source code S:

  C1: s1 → 0        C2: s1 → 00
      s2 → 10           s2 → 01
      s3 → 110          s3 → 10
      s4 → 1110         s4 → 110
      s5 → 1111         s5 → 111

• Both C1 and C2 are instantaneous codes; which one is better?
  Answer: it depends on the frequency of occurrence of the symbols.
4.5 Kraft Inequality
• Theorem: A necessary and sufficient condition for the existence of an instantaneous code S of q symbols si (i = 1, .., q) with encoded words of lengths l1 ≤ l2 ≤ ··· ≤ lq is
  Σ_{i=1}^{q} 1/r^{l_i} ≤ 1
  where r is the radix (number of symbols) of the alphabet of the encoded symbols.
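• The inequality itself is a one-line check. A sketch (name mine):

```python
def kraft_sum(lengths, r=2):
    """Sum of r**(-l) over codeword lengths; an instantaneous code
    with these lengths exists iff the sum is <= 1."""
    return sum(r ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0   -> realizable, e.g. 0, 10, 110, 111
print(kraft_sum([1, 2, 2, 3]))  # 1.125 -> no instantaneous code exists
```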
• Thm: An instantaneous code with word lengths n1, n2, . . ., nM exists iff
  Σ_{i=1}^{M} D^{−n_i} ≤ 1
  where D is the size of the code alphabet.
  (⇐) For simplicity, we assume D = 2.
  (1) When M = 2, n1 = 1 and n2 = 1:
      1/2 + 1/2 = 1 ≤ 1, which is OK for a tree of depth 1
      (s1 → 0, s2 → 1).
(2) If every M ≤ k is OK, consider M = k + 1:
    split the k + 1 codewords into two subtrees of sizes k' and k'' according to their first bit.
    Each subtree has at most k words, so by the induction hypothesis its Kraft sum (with the first bit removed) is ≤ 1, and
    (1/2)·Σ_{k'} + (1/2)·Σ_{k''} ≤ (1/2) + (1/2) = 1
    ⇒ M = k + 1 is also OK.
By induction, the inequality is true.
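• The sufficiency direction is constructive. A minimal sketch (canonical assignment; naming mine) that builds a binary prefix code from any lengths satisfying the inequality:

```python
def code_from_lengths(lengths):
    """Build a binary prefix code with the given codeword lengths,
    assuming they satisfy the Kraft inequality."""
    assert sum(2 ** -l for l in lengths) <= 1, "Kraft inequality violated"
    words, next_val, prev_len = [], 0, 0
    for l in sorted(lengths):
        next_val <<= l - prev_len       # move down to depth l in the tree
        words.append(format(next_val, "0%db" % l))
        next_val += 1                   # next free node at this depth
        prev_len = l
    return words

print(code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```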
• Another proof:
  (⇒) Let C = {c1, c2, …, cM} have codeword lengths l1, …, lM, and let L = max{ li }.
  If ci = x1x2...x_{li} ∈ C, then any word x = x1x2...x_{li} y_{li+1}...y_L, where the yj are arbitrary code symbols, cannot be in C, because ci is a prefix of x.
  Each ci accounts for D^{L−l_i} such words x of length L, and by the prefix condition these sets are disjoint, so
  Σ_{i=1}^{M} D^{L−l_i} ≤ D^L
  Dividing by D^L gives Σ_{i=1}^{M} D^{−l_i} ≤ 1.

=> If there are α1 words of length 1, then α1 ≤ r.
   If there are α2 words of length 2, then α2 ≤ r² − α1·r.
   Similarly, α3 ≤ r³ − α1·r² − α2·r.
=> α1 ≤ r
   α1·r + α2 ≤ r²
   α1·r² + α2·r + α3 ≤ r³
   …
   α1·r^{n−1} + α2·r^{n−2} + ··· + αn ≤ r^n
So if the last inequality is satisfied, then all the inequalities hold.
⇔ α1/r + α2/r² + ··· + αn/r^n ≤ 1 => It satisfies Kraft's inequality.
Note: A code may obey the Kraft inequality and still not be instantaneous.
EX: 0
    01      1/2 + 1/4 + 1/8 + 1/8 = 1 ≤ 1
    011     but it is not an I.C. (0 is a prefix of 01).
    111
EX: Binary block codes (error-correcting codes): an (n, k) block code has 2^k codewords, each of length n, so
    2^k · 2^{−n} ≤ 1, and (being fixed-length) it is an I.C.
• Ex: Comma code over a code alphabet of size D:
  length 1: 1 word (it must be the comma)
  length 2: D − 1 words
  length 3: (D − 1)² words
  …
  length k: D(D − 1)^{k−1} words (at the maximum length no comma is needed)
  Kraft sum:
  1/D + (D−1)/D² + (D−1)²/D³ + ··· + (D−1)^{k−2}/D^{k−1} + D(D−1)^{k−1}/D^k
  With ratio r = (D−1)/D this is a geometric series, and the sum is exactly 1.
• The Kraft inequality can be extended to any uniquely decodable code.
• McMillan Inequality:
  Thm: A uniquely decodable code with word lengths l1, l2, …, lq exists iff
  Σ_{i=1}^{q} r^{−l_i} ≤ 1
  (r is the size of the code alphabet).
  (⇐) Trivial, because an I.C. is one kind of U.D. code.
  (⇒) ( Σ_{i=1}^{q} r^{−l_i} )^n = Σ_{k=n}^{nl} N_k · r^{−k}
  where l is the length of the longest codeword, i.e., l = max{l1, …, lq}, and Nk is the number of sequences of n codewords whose encodings (in radix r) have total length k.
  By unique decodability, distinct codeword sequences give distinct strings, so Nk ≤ r^k (the number of distinct sequences of length k in radix r). Hence
  K^n = Σ_{k=n}^{nl} N_k · r^{−k} ≤ Σ_{k=n}^{nl} 1 = nl − n + 1 ≤ nl
  where K = Σ_{i=1}^{q} r^{−l_i}.
  If K > 1, we can find an n such that K^n > nl (exponential growth beats linear) →←
  Hence K ≤ 1.
4.6 Huffman Codes
• Lemma: If a code C is optimal within the class of instantaneous codes, then C is optimal within the entire class of U.D. codes.
• pf: Suppose C' is a U.D. code with a smaller average codeword length than C.
  Let n1', n2', . . . , nM' be the codeword lengths of C'. Then
  Σ_{i=1}^{M} D^{−n_i'} ≤ 1 (it satisfies the Kraft inequality),
  so an instantaneous code with these lengths exists.
  So C is not optimal among I.C. →←
• Optimal Codes:
  Given a binary I.C. C with codeword lengths n1, …, nM associated with probabilities p1, …, pM.
  For convenience, let p1 ≥ p2 ≥ ··· ≥ pM−1 ≥ pM
  (and ni ≤ ni+1 ≤ ··· ≤ ni+r if pi = pi+1 = ··· = pi+r).
  Then if C is optimal within the class of I.C., it must have the following properties:
• (a) More probable symbols have shorter codewords,
  i.e., if pj > pk => nj ≤ nk.
• (b) The 2 least probable symbols have codewords of equal length, i.e., nM−1 = nM.
• (c) Among the codewords of length nM, two of them agree in all digits except the last one.
• Ex:
  x1 → 0
  x2 → 100
  x3 → 101
  x4 → 1101
  x5 → 1110
  This doesn't satisfy (c); dropping the last digit gives the better code
  x4 → 110
  x5 → 111
• pf:
  (a) If (pj > pk) and (nj > nk), then we can construct a better code C' by interchanging codewords j and k.
  (b) From (a), if pM−1 > pM then nM−1 ≤ nM, and by assumption if pM−1 = pM then nM−1 ≤ nM. If nM−1 < nM, we may shorten the longest codeword to make nM−1 = nM and still have an I.C. better than the original one.
  (c) If condition (c) is not true, we may drop the last digit of all such codewords to obtain a better code.
• Huffman coding ─ construction of optimal (instantaneous) codes
• Let x1, …, xM be an array of symbols with probabilities p1, …, pM (p1 ≥ p2 ≥ ··· ≥ pM).
  (1) Combine xM−1, xM into xM−1,M with probability pM−1 + pM.
  (2) Assume we can construct an O.C. C2 for x1, x2, …, xM−1,M.
  (3) Now construct a code C1 for x1, …, xM as follows:
  – The codewords associated with x1, …, xM−2 in C1 are exactly the same as the corresponding codewords of C2.
  – Let wM−1,M be the codeword of xM−1,M in C2.
    The codewords for xM−1, xM in C1 are
    wM−1,M 0 → xM−1 and
    wM−1,M 1 → xM
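• A compact Python sketch of this construction (done iteratively with a heap rather than by explicit recursion; names mine):

```python
import heapq
import itertools

def huffman(probs):
    """Binary Huffman code for {symbol: probability}: repeatedly merge
    the two least probable groups, prepending 0/1 to their codewords."""
    tie = itertools.count()  # tie-breaker so equal probabilities compare
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # least probable group
        p2, _, c2 = heapq.heappop(heap)   # second least probable group
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]
```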
• Claim: C1 is an optimal code for the set of probabilities p1, …, pM.
• Ex: (repeatedly combining the two least probable symbols)

  x1 0.3     x1   0.3    x1     0.3     x3,4,5,6 0.45    x1,2     0.55
  x2 0.25    x2   0.25   x2     0.25    x1       0.3     x3,4,5,6 0.45
  x3 0.2     x3   0.2    x4,5,6 0.25    x2       0.25
  x4 0.1     x5,6 0.15   x3     0.2
  x5 0.1     x4   0.1
  x6 0.05
  Assigning codewords backwards through the reductions:

  x1,2     → 0      x1 → 00
  x3,4,5,6 → 1      x2 → 01
                    x3 → 10
  x4,5,6   → 11     x4 → 110
  x5,6     → 111    x5 → 1110
                    x6 → 1111
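• Running the sketch above on this example reproduces the codeword lengths (the exact bits, and which of x4/x5 gets length 3, depend on how ties are broken):

```python
probs = {"x1": 0.3, "x2": 0.25, "x3": 0.2, "x4": 0.1, "x5": 0.1, "x6": 0.05}
code = huffman(probs)  # huffman() from the sketch above
print(sorted(len(w) for w in code.values()))            # [2, 2, 2, 3, 4, 4]
print(sum(p * len(code[s]) for s, p in probs.items()))  # average length 2.4
```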
• pf:
  – We assume that C1 is not optimal.
  – Let C1' be an optimal instantaneous code for x1, …, xM.
  – Then C1' has codewords w1', w2', …, wM' with lengths n1', n2', …, nM'.
  – If there are only two symbols of maximum length in a tree, they must have their last decision node in common, and they must be the two least probable symbols. Before we reduce a tree, the two symbols contribute nM(pM + pM−1) to the average length, and after the reduction they contribute (nM − 1)(pM + pM−1).
  – So the reduction shortens the average code length by (pM + pM−1).
    Average length of C1 > Average length of C1' --- (1)
  – After reduction,
    Average length of C2 > Average length of C2'
    (both sides of (1) reduced by pM + pM−1)
  – But C2 is optimal →←
• If there are more than two symbols of the maximum length, we can use the following proposition:
  – Symbols whose codewords have the same length may be interchanged without changing the average code length.
• We can therefore arrange for the two least probable symbols to share a last decision node, and then merge them as before.
• Huffman encoding is not unique.
• Ex:
  p1 = 0.4 → 00       Average length:
  p2 = 0.2 → 10       L = 0.4·2 + 0.2·2 + 0.2·2 + 0.1·3 + 0.1·3
  p3 = 0.2 → 11         = 2.2
  p4 = 0.1 → 010
  p5 = 0.1 → 011
  Or
  p1 = 0.4 → 1        Average length:
  p2 = 0.2 → 01       L = 0.4·1 + 0.2·2 + 0.2·3 + 0.1·4 + 0.1·4
  p3 = 0.2 → 000        = 2.2
  p4 = 0.1 → 0010
  p5 = 0.1 → 0011
• Which encoding is better?
  – Var(I) = 0.4(2−2.2)² + 0.2(2−2.2)² + 0.2(2−2.2)² + 0.1(3−2.2)² + 0.1(3−2.2)² = 0.16 (Good!)
  – Var(II) = 0.4(1−2.2)² + 0.2(2−2.2)² + 0.2(3−2.2)² + 0.1(4−2.2)² + 0.1(4−2.2)² = 1.36
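• The comparison can be checked numerically. A small sketch (names mine) computing average length and variance for both codes:

```python
def stats(probs, lengths):
    avg = sum(p * n for p, n in zip(probs, lengths))
    var = sum(p * (n - avg) ** 2 for p, n in zip(probs, lengths))
    return avg, var

p = [0.4, 0.2, 0.2, 0.1, 0.1]
print(stats(p, [2, 2, 2, 3, 3]))  # code I : avg 2.2, variance 0.16
print(stats(p, [1, 2, 3, 4, 4]))  # code II: avg 2.2, variance 1.36
```

Both codes are optimal in average length; the smaller variance of code I means a steadier transmission rate, which is why it is preferred.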