
Information Theory
Linawati
Electrical Engineering Department
Udayana University





Information Source
Measuring Information
Entropy
Source Coding
Designing Codes
Information Source
 Four characteristics of an information source:
 The number of symbols, n
 The symbols, S1, S2, …, Sn
 The probability of occurrence of each symbol, P(S1), P(S2), …, P(Sn)
 The correlation between successive symbols
 Memoryless source: each symbol is independent of the previous symbols
 A message: a stream of symbols from the sender to the receiver
Examples …
 Ex. 1: A source that sends binary information (streams of 0s and 1s) with each symbol having equal probability and no correlation can be modeled as a memoryless source
 n = 2
 Symbols: 0 and 1
 Probabilities: P(0) = ½ and P(1) = ½
Measuring Information
 To measure the information contained in a message
 How much information does a message carry from the sender to the receiver?
 Examples
 Ex. 2: Imagine a person sitting in a room. Looking out the window, she can clearly see that the sun is shining. If at this moment she receives a call from a neighbor saying “It is now daytime”, does this message contain any information?
 Ex. 3: A person has bought a lottery ticket. A friend calls to tell her that she has won first prize. Does this message contain any information?
Examples …
 Ex. 2: It does not; the message contains no information. Why? Because she is already certain that it is daytime.
 Ex. 3: It does. The message contains a lot of information, because the probability of winning first prize is very small.
 Conclusion
 The information content of a message is inversely proportional to the probability of the occurrence of that message.
 If a message is very probable, it carries little information. If it is very improbable, it carries a lot of information.
Symbol Information
 To measure the information contained in a message, we need to measure the information contained in each symbol:

 I(s) = log2 [1/P(s)] bits

 Here “bits” (the unit of information) is different from “bit” (binary digit), which denotes a 0 or 1
 Examples
 Ex. 5: Find the information content of each symbol when the source is binary (sending only 0 or 1 with equal probability)
 Ex. 6: Find the information content of each symbol when the source is sending four symbols with probabilities P(S1) = 1/8, P(S2) = 1/8, P(S3) = ¼, and P(S4) = ½
Examples …
 Ex. 5:
 P(0) = P(1) = ½, so the information content of each symbol is
 I(0) = log2 [1/P(0)] = log2 [2] = 1 bit
 I(1) = log2 [1/P(1)] = log2 [2] = 1 bit
 Ex. 6:
 I(S1) = log2 [1/P(S1)] = log2 [8] = 3 bits
 I(S2) = log2 [1/P(S2)] = log2 [8] = 3 bits
 I(S3) = log2 [1/P(S3)] = log2 [4] = 2 bits
 I(S4) = log2 [1/P(S4)] = log2 [2] = 1 bit
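 The values above can be checked with a short Python sketch (an illustration, not part of the original slides; the symbol_information helper is just a name chosen here), evaluating I(s) = log2(1/P(s)):

import math

def symbol_information(p):
    """Information content, in bits, of a symbol with probability p."""
    return math.log2(1 / p)

# Ex. 5: equal-probability binary source
print(symbol_information(1 / 2))                  # 1.0 bit for 0 and for 1

# Ex. 6: four symbols with P(S1)=1/8, P(S2)=1/8, P(S3)=1/4, P(S4)=1/2
for name, p in [("S1", 1 / 8), ("S2", 1 / 8), ("S3", 1 / 4), ("S4", 1 / 2)]:
    print(name, symbol_information(p))            # 3, 3, 2 and 1 bits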
Examples …
 Ex. 6:
 The symbols S1 and S2 are the least probable. At the receiver, each carries more information (3 bits) than S3 or S4. The symbol S3 is less probable than S4, so S3 carries more information than S4.
 Defining the relationships:
 If P(Si) = P(Sj), then I(Si) = I(Sj)
 If P(Si) < P(Sj), then I(Si) > I(Sj)
 If P(Si) = 1, then I(Si) = 0
Message Information
 If the message comes from a memoryless source, each symbol is independent and the probability of receiving a message with symbols Si, Sj, Sk, … (where i, j, and k can be the same) is:
 P(message) = P(Si) P(Sj) P(Sk) …
 Then the information content carried by the message is
 I(message) = log2 [1/P(message)]
 I(message) = log2 [1/P(Si)] + log2 [1/P(Sj)] + log2 [1/P(Sk)] + …
 I(message) = I(Si) + I(Sj) + I(Sk) + …
Example …
 Ex. 7:
 An equal-probability binary source sends an 8-bit message. What is the amount of information received?
 The information content of the message is
 I(message) = I(first bit) + I(second bit) + … + I(eighth bit) = 8 bits
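 A minimal Python check of Ex. 7 (illustrative only), using the fact that for a memoryless source the information of a message is the sum of the information of its symbols:

import math

def message_information(symbol_probs):
    """I(message) = sum of log2(1/P(Si)) over the symbols of the message."""
    return sum(math.log2(1 / p) for p in symbol_probs)

# Ex. 7: an 8-bit message from an equal-probability binary source
print(message_information([1 / 2] * 8))           # 8.0 bits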
Entropy
 Entropy (H) of the source
 The average amount of information contained in the symbols:
 H(Source) = P(S1)xI(S1) + P(S2)xI(S2) + … + P(Sn)xI(Sn)
 Example
 What is the entropy of an equal-probability binary source?
 H(Source) = P(0)xI(0) + P(1)xI(1) = 0.5x1 + 0.5x1 = 1 bit
 1 bit per symbol
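 A short Python sketch of the entropy formula (illustrative, not from the slides), reproducing the 1 bit per symbol result for the binary source:

import math

def entropy(probs):
    """H(source) = sum of P(Si) * log2(1/P(Si)), in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probs)

print(entropy([0.5, 0.5]))                        # 1.0 bit per symbol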
Maximum Entropy
 For a particular source with n symbols, maximum entropy can be achieved only if all the probabilities are the same. The value of this maximum is
 Hmax(Source) = Σ P(Si) log2 [1/P(Si)] = Σ (1/n) log2 n = log2 n
 In other words, the entropy of every source has an upper limit defined by
 H(Source) ≤ log2 n
Example …
 What is the maximum entropy of a binary source?
 Hmax = log2 2 = 1 bit
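 An illustrative Python check (not from the slides) that a uniform source reaches the upper limit log2 n while a non-uniform source stays below it; the skewed probabilities here are made up for the example:

import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs)

n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]                     # arbitrary example probabilities
print(entropy(uniform), math.log2(n))             # 2.0 and 2.0: the maximum
print(entropy(skewed))                            # about 1.36 < log2(4)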
Source Coding
 To send a message from a source to a destination, a symbol is normally coded into a sequence of binary digits.
 The result is called a code word
 A code is a mapping from a set of symbols into a set of code words.
 Example: ASCII code is a mapping of a set of 128 symbols into a set of 7-bit code words
 A -> 1000001
 B -> 1000010
 Set of symbols -> Set of binary streams
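 The ASCII mapping can be reproduced with Python's built-in ord and binary formatting (a small illustration, not from the slides):

# 7-bit ASCII code words for a few symbols
for ch in "AB":
    print(ch, "->", format(ord(ch), "07b"))       # A -> 1000001, B -> 1000010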
Fixed- and Variable-Length Code
 A code can be designed with all the code words the same length (fixed-length code) or with different lengths (variable-length code)
 Examples
 A code with fixed-length code words:
 S1 -> 00; S2 -> 01; S3 -> 10; S4 -> 11
 A code with variable-length code words:
 S1 -> 0; S2 -> 10; S3 -> 11; S4 -> 110
Distinct Codes
 Each code word is different from every other code word
 Example
 S1 -> 0; S2 -> 10; S3 -> 11; S4 -> 110
 Uniquely Decodable Codes
 A distinct code is uniquely decodable if each code word can be decoded when inserted between other code words.
 Example
 Not uniquely decodable:
 S1 -> 0; S2 -> 1; S3 -> 00; S4 -> 10, because
 0010 -> S3 S4 or S3 S2 S1 or S1 S1 S4
Instantaneous Codes
 A uniquely decodable code:
 S1 -> 0; S2 -> 01; S3 -> 011; S4 -> 0111
 A 0 uniquely defines the beginning of a code word
 A uniquely decodable code is instantaneously decodable if no code word is the prefix of any other code word
Examples …
 A code word and its prefixes (note that each code word is also a prefix of itself)
 S -> 01001; prefixes: 0, 01, 010, 0100, 01001
 A uniquely decodable code that is instantaneously decodable:
 S1 -> 0; S2 -> 10; S3 -> 110; S4 -> 111
 When the receiver receives a 0, it immediately knows that it is S1; no other code word starts with a 0. When the receiver receives a 10, it immediately knows that it is S2; no other code word starts with 10, and so on
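 The prefix condition is easy to test in code. The sketch below (illustrative only; is_instantaneous is just a name chosen here) checks whether no code word is a prefix of another:

def is_instantaneous(code_words):
    """True if no code word is a prefix of any other code word."""
    return not any(
        w1 != w2 and w2.startswith(w1)
        for w1 in code_words for w2 in code_words
    )

print(is_instantaneous(["0", "10", "110", "111"]))   # True: instantaneous
print(is_instantaneous(["0", "01", "011", "0111"]))  # False: 0 is a prefix of 01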
Relationship between different types of coding
 Instantaneous codes ⊂ Uniquely decodable codes ⊂ Distinct codes ⊂ All codes
 (each class is a subset of the next)
Code …
 Average code length
 L = L(S1)xP(S1) + L(S2)xP(S2) + …
 Example
 Find the average length of the following code:
 S1 -> 0; S2 -> 10; S3 -> 110; S4 -> 111
 P(S1) = ½, P(S2) = ¼, P(S3) = 1/8, P(S4) = 1/8
 Solution
 L = 1x½ + 2x¼ + 3x1/8 + 3x1/8 = 1¾ bits
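 The same calculation in a few lines of Python (illustrative):

# Average code length: L = sum of L(Si) * P(Si)
lengths = [1, 2, 3, 3]                # S1 -> 0, S2 -> 10, S3 -> 110, S4 -> 111
probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
L = sum(l * p for l, p in zip(lengths, probs))
print(L)                              # 1.75 bits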
Code …
 Code efficiency
 η (code efficiency) is defined as the entropy of the source divided by the average length of the code:
 η = [H(source) / L] x 100%
 Example
 Find the efficiency of the following code:
 S1 -> 0; S2 -> 10; S3 -> 110; S4 -> 111
 P(S1) = ½, P(S2) = ¼, P(S3) = 1/8, P(S4) = 1/8
 Solution
 L = 1¾ bits
 H(source) = ½ log2(2) + ¼ log2(4) + 1/8 log2(8) + 1/8 log2(8) = 1¾ bits
 η = (1¾ / 1¾) x 100% = 100%
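 The efficiency calculation, sketched in Python for the same code (illustrative only):

import math

probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
lengths = [1, 2, 3, 3]                # S1 -> 0, S2 -> 10, S3 -> 110, S4 -> 111
H = sum(p * math.log2(1 / p) for p in probs)      # entropy of the source
L = sum(p * l for p, l in zip(probs, lengths))    # average code length
print(f"H = {H} bits, L = {L} bits, efficiency = {H / L:.0%}")   # 100%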
Designing Codes
 Two examples of instantaneous codes
 Shannon – Fano code
 Huffman code
 Shannon – Fano code
 An instantaneous variable-length encoding method in which the more probable symbols are given shorter code words and the less probable are given longer code words
 Design builds a binary tree (top-to-bottom construction) following the steps below:
 1. List the symbols in descending order of probability
 2. Divide the list into two equal (or nearly equal) probability sublists. Assign 0 to the first sublist and 1 to the second
 3. Repeat step 2 for each sublist until no further division is possible
Example of Shannon – Fano Encoding
 Find the Shannon – Fano code words for the following source
 P(S1) = 0.3; P(S2) = 0.2; P(S3) = 0.15; P(S4) = 0.1; P(S5) = 0.1; P(S6) = 0.05; P(S7) = 0.05; P(S8) = 0.05
 Solution
 Because each code word is assigned a leaf of the tree, no code word is the prefix of any other. The code is instantaneous. Calculating the average length and the efficiency of this code:
 H(source) ≈ 2.7 bits
 L = 2.75 bits
 η ≈ 98%
Example of Shannon – Fano Encoding
 [Figure: Shannon – Fano tree, built by repeatedly splitting the probability-ordered list into two (nearly) equal-probability sublists. The resulting code words are:]
 S1 (0.30) -> 00
 S2 (0.20) -> 01
 S3 (0.15) -> 100
 S4 (0.10) -> 101
 S5 (0.10) -> 1100
 S6 (0.05) -> 1101
 S7 (0.05) -> 1110
 S8 (0.05) -> 1111
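 Below is a small Python sketch of the splitting procedure from the previous slide (illustrative only; the shannon_fano helper is not from the slides). When two split points are equally balanced the choice is arbitrary, so the code words for the least probable symbols may differ from the figure, but the average length still works out to 2.75 bits and the efficiency to about 98%:

import math

def shannon_fano(symbols):
    """symbols: list of (name, probability) in descending order of probability.
    Recursively split into two sublists of (nearly) equal total probability,
    prepending 0 for the first sublist and 1 for the second."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    best_i, best_diff, running = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)   # imbalance between the two sublists
        if diff < best_diff:
            best_i, best_diff = i, diff
    codes = {}
    for bit, part in zip("01", (symbols[:best_i], symbols[best_i:])):
        for sym, word in shannon_fano(part).items():
            codes[sym] = bit + word
    return codes

probs = {"S1": 0.30, "S2": 0.20, "S3": 0.15, "S4": 0.10,
         "S5": 0.10, "S6": 0.05, "S7": 0.05, "S8": 0.05}
codes = shannon_fano(sorted(probs.items(), key=lambda kv: kv[1], reverse=True))
H = sum(p * math.log2(1 / p) for p in probs.values())
L = sum(probs[s] * len(w) for s, w in codes.items())
print(codes)
print(f"H = {H:.2f} bits, L = {L:.2f} bits, efficiency = {H / L:.0%}")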
Huffman Encoding
 An instantaneous variable-length encoding method in which the more probable symbols are given shorter code words and the less probable are given longer code words
 Design builds a binary tree (bottom-up construction):
 1. Combine the two least probable symbols (adding their probabilities) into a single node
 2. Repeat step 1 until no further combination is possible
Example Huffman encoding
 Find the Huffman code words for the following source
 P(S1) = 0.3; P(S2) = 0.2; P(S3) = 0.15; P(S4) = 0.1; P(S5) = 0.1; P(S6) = 0.05; P(S7) = 0.05; P(S8) = 0.05
 Solution
 Because each code word is assigned a leaf of the tree, no code word is the prefix of any other. The code is instantaneous. Calculating the average length and the efficiency of this code:
 H(source) ≈ 2.70 bits; L = 2.75 bits; η ≈ 98%
Example Huffman encoding
 [Figure: Huffman tree, built bottom-up by repeatedly merging the two least probable nodes. The resulting code words are:]
 S1 (0.30) -> 00
 S2 (0.20) -> 10
 S3 (0.15) -> 010
 S4 (0.10) -> 110
 S5 (0.10) -> 111
 S6 (0.05) -> 0110
 S7 (0.05) -> 01110
 S8 (0.05) -> 01111
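 Below is a small Python sketch of the bottom-up construction (illustrative only; the huffman_code_lengths helper is not from the slides). It tracks code lengths rather than the code words themselves; ties among equal probabilities can shift individual lengths relative to the figure, but the average length (2.75 bits) and the efficiency (about 98%) come out the same:

import heapq
import math

def huffman_code_lengths(probs):
    """Repeatedly merge the two least probable nodes (bottom-up tree building).
    Every merge pushes the symbols in the merged subtree one level deeper,
    i.e. adds one bit to their eventual code words."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, i2, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, i2, syms1 + syms2))
    return lengths

probs = {"S1": 0.30, "S2": 0.20, "S3": 0.15, "S4": 0.10,
         "S5": 0.10, "S6": 0.05, "S7": 0.05, "S8": 0.05}
lengths = huffman_code_lengths(probs)
H = sum(p * math.log2(1 / p) for p in probs.values())
L = sum(probs[s] * lengths[s] for s in probs)
print(lengths)
print(f"H = {H:.2f} bits, L = {L:.2f} bits, efficiency = {H / L:.0%}")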