COM347J1 Networks and Data Communications L1


COM342
Networks and Data Communications
Lecture 4: Data Compression, Error
Detection and Error Correction
Ian McCrum
Room 5B18
Tel: 90 366364 voice mail on 6th ring
Email: [email protected]
Web site: http://www.eej.ulst.ac.uk
10/10/04
www.eej.ulster.ac.uk/~ian/modules/COM342/COM342_L4.ppt
L4/1/39
The encoding and compression of data
• Introduction
• Information Content of a message stream
• simple coding methods
• Huffman coding
• compression techniques
REDUNDANCY
• Suppose you received the following telegram:
• RONMIE (ROCKTT) O’SULLIVON 146 CREAK
• Because natural language is inherently redundant, the message can be reconstructed as shown below.
• RONNIE (ROCKET) O’SULLIVAN 146 BREAK
• But what about the numbers in the message? Digits carry no such redundancy, so a corrupted digit in 146 could be neither detected nor repaired.
Redundancy
• Redundancy arises from the correlation between letters occurring in natural language. Consider the word:
• YACH (if the final T is sent it carries no information, because the receiver can already predict it)
• Is it possible for a coding scheme to produce an ideal code?
Reduction of Redundancy
• observe the
– Statistical occurrence of symbols
– Repetition of symbols
• employ
– Fano coding, Huffman coding (the most common symbols are given shorter codes)
– data compression (e.g. coding repetition as a special case)
Packed decimal / half byte compression
• When frames contain only numeric characters
– use binary coded decimal (four bits per digit) instead of 7 bit ASCII or 8 bit EBCDIC, as only the four least significant bits change from one digit to the next.
– In ASCII, “:” and “;” sit in the same column as the digits, so their half-byte values are used as decimal point and space respectively.
Packed Decimal
STX | Cntrl | XX | ‘2’ ‘6’ | ‘:’ ‘3’ | ‘2’ ‘;’ | ‘4’ ‘5’ | ETX | BCC
• STX: opening flag
• Cntrl: control character for half byte compression
• XX: number of digits following
• ‘2’ ‘6’ ‘:’ ‘3’ ‘2’: 1st number, 26.32
• ETX BCC: closing flag and block check character
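The half byte packing in this frame can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pack_half_bytes` is hypothetical, the framing bytes (STX, Cntrl, count, ETX, BCC) are left out, and "." and " " are mapped to the low four bits of ":" and ";" as the previous slide describes.

```python
# Half-byte (packed decimal) sketch: two 4-bit symbols per byte.
# Assumption: '.' uses the low nibble of ':' (0xA) and ' ' the low nibble of ';' (0xB).
NIBBLE = {str(d): d for d in range(10)}
NIBBLE["."] = ord(":") & 0x0F   # decimal point
NIBBLE[" "] = ord(";") & 0x0F   # space


def pack_half_bytes(text: str) -> bytes:
    """Pack an even-length string of digits, '.' and ' ' into half bytes."""
    if len(text) % 2:
        raise ValueError("this sketch expects an even number of symbols")
    nibbles = [NIBBLE[ch] for ch in text]
    # First symbol of each pair goes in the high nibble, second in the low nibble.
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


packed = pack_half_bytes("26.32 45")
print(packed.hex())  # 26a32b45
```

Eight characters are carried in four bytes, which is where the name half byte compression comes from.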
Relative encoding
• Whenever only small differences occur between
successive values
• send only that difference
• very effective in data logging
• consider level of a river
Relative encoding using sign, number and delimiter:
STX ‘+’ ‘1’ ‘¬’ ‘+’ ‘4’ ‘¬’ ETX BCC
Relative encoding using signed 8 bit integers:
STX +3 -95 +11 +124 -100 ETX BCC
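As a rough sketch of the signed 8 bit variant, the following Python sends the first reading in full and every later reading as a one byte difference. The river levels are hypothetical values, chosen so that the differences match the +3, -95, +11, +124, -100 example; the function names are also assumptions, and the framing bytes are omitted.

```python
import struct


def relative_encode(readings):
    """Return the first reading plus the signed 8-bit differences as bytes."""
    diffs = [b - a for a, b in zip(readings, readings[1:])]
    for d in diffs:
        if not -128 <= d <= 127:
            raise ValueError(f"difference {d} does not fit in a signed byte")
    return readings[0], struct.pack(f"{len(diffs)}b", *diffs)


def relative_decode(first, payload):
    """Rebuild the readings by accumulating the differences."""
    values = [first]
    for (d,) in struct.iter_unpack("b", payload):
        values.append(values[-1] + d)
    return values


levels = [500, 503, 408, 419, 543, 443]   # hypothetical river levels
first, payload = relative_encode(levels)
print(list(struct.unpack("5b", payload)))  # [3, -95, 11, 124, -100]
print(relative_decode(first, payload) == levels)  # True
```

A difference outside -128..+127 cannot be sent this way, so a real logger would need an escape mechanism or a full resend for large jumps.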
Character suppression
• In a data stream there are often sequences of the same character, most frequently spaces.
• If a continuous string of three or more identical characters occurs, it is replaced by Cntrl, char, number.
• Thus Cntrl F 25 means 25 Fs in a sequence.
• This is a type of run-length encoding.
Character suppression
STX | Cntrl | sp | 45 | ‘A’ ‘B’ | ETX | BCC
• STX: opening flag
• Cntrl: control character
• sp: character being suppressed (space)
• 45: number of characters
• ‘A’ ‘B’: single letters
• ETX BCC: closing flag and block check character
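The frame above (45 spaces followed by A and B) can be reproduced with a minimal sketch. The control byte value 0x1D is an arbitrary assumption and is taken never to occur in the data; the function names are hypothetical, and STX, ETX and BCC are omitted.

```python
CNTRL = 0x1D  # hypothetical control byte, assumed never to occur in the data


def suppress(data: bytes, min_run: int = 3) -> bytes:
    """Replace runs of min_run or more identical bytes by Cntrl, char, number."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # The count is sent as one byte, so a single run is capped at 255.
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        if run >= min_run:
            out += bytes([CNTRL, data[i], run])
        else:
            out += data[i:i + run]
        i += run
    return bytes(out)


def expand(coded: bytes) -> bytes:
    """Undo suppress(): expand every Cntrl, char, number triple."""
    out = bytearray()
    i = 0
    while i < len(coded):
        if coded[i] == CNTRL:
            out += bytes([coded[i + 1]]) * coded[i + 2]
            i += 3
        else:
            out.append(coded[i])
            i += 1
    return bytes(out)


frame = b" " * 45 + b"AB"
coded = suppress(frame)
print(len(frame), "->", len(coded))  # 47 -> 5
```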
Run length encoding
• Run-length compression where the codeword itself contains the number of repetitions.
• A three byte minimum repetition is chosen, so that every run of 3 or more identical bytes is encoded as
• <char><char><char><n>
• With a one byte count n, this four byte codeword can represent runs of up to 258 (3 + 255).
Input                             Encoded as
<char>                            <char>
<char><char>                      <char><char>
<char><char><char>                <char><char><char><0>
<char><char><char><char>          <char><char><char><1>
<char><char><char><char><char>    <char><char><char><2>
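The table can be checked with a short sketch of this scheme. It assumes the count n is a single byte (0..255); the function names are hypothetical. No control character is needed, because three identical bytes in a row always announce a count byte.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode runs of 3 or more as <char><char><char><n>, n = extra repeats."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # One count byte holds 0..255 extra repeats, so a codeword covers at most 258.
        while i + run < len(data) and data[i + run] == data[i] and run < 258:
            run += 1
        if run >= 3:
            out += bytes([data[i]]) * 3 + bytes([run - 3])
        else:
            out += data[i:i + run]
        i += run
    return bytes(out)


def rle_decode(coded: bytes) -> bytes:
    """Three equal bytes in a row are always followed by a count byte."""
    out = bytearray()
    i = 0
    while i < len(coded):
        if i + 3 < len(coded) and coded[i] == coded[i + 1] == coded[i + 2]:
            out += bytes([coded[i]]) * (3 + coded[i + 3])
            i += 4
        else:
            out.append(coded[i])
            i += 1
    return bytes(out)


for n in range(1, 6):
    print(n, rle_encode(b"A" * n))
```

Note that a run of exactly three bytes grows to four, so this variant only pays off for runs of five or more.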
Huffman coding
• Instead of representing every symbol with a fixed number of bits, fewer bits are used for frequently occurring symbols and more bits for rare ones.
• Method: determine the relative frequency of the symbols, then create an unbalanced tree with unequal branches.
Example of Huffman
• Consider that a group of characters A to H is to be transmitted. This comprises
• 9 As, 9 Bs, 5 Cs, 5 Ds, 2 Es, 2 Fs, 2 Gs, 2 Hs (36 characters in total)
• Sequence of operations.
– a) Order the symbols in terms of probability.
– b) Combine the two least frequently occurring symbols,
– c) assigning 1 (upper) and 0 (lower) to each.
– d) The combined pair is now considered to be one entity, with the sum of the two frequencies.
Huffman continued
• Perform the same steps until only two
symbols are left.
• Determine the codeword by reading from
left to right. The first bit being read is the
least significant one.
The resulting Huffman Codes for these
symbols are:
A9 --> 1 0
B9 --> 0 1
C5 --> 1 1 1
D5 --> 1 1 0
E2 --> 0 0 0 1
F2 --> 0 0 0 0
G2 --> 0 0 1 1
H2 --> 0 0 1 0
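The merging procedure can be sketched with a priority queue. This is a minimal illustration, not the lecturer's construction: the function name `huffman_codes` and the tie-breaking counter are assumptions, and because ties can be broken in several ways the individual bit patterns may differ from those on the slide, although the total number of bits is the same for any valid Huffman tree.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a prefix code by repeatedly merging the two least frequent entities."""
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, lower = heapq.heappop(heap)   # least frequent entity gets a 0
        f1, _, upper = heapq.heappop(heap)   # next least frequent gets a 1
        merged = {s: "0" + c for s, c in lower.items()}
        merged.update({s: "1" + c for s, c in upper.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


freqs = {"A": 9, "B": 9, "C": 5, "D": 5, "E": 2, "F": 2, "G": 2, "H": 2}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
for sym in sorted(codes):
    print(sym, codes[sym])
print(total_bits, "bits for", sum(freqs.values()), "symbols")  # 98 bits for 36 symbols
```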
Comparison
• If there were N symbols then N codewords would be sent. With fixed length binary codes, eight different symbols need 3 bits each, so this would be represented by 3N bits.
• How does this compare with those required
by this example of Huffman encoding?
Message   Prob   No of bits   Bits for N messages
A         9/36   2            18/36 N
B         9/36   2            18/36 N
C         5/36   3            15/36 N
D         5/36   3            15/36 N
E         2/36   4             8/36 N
F         2/36   4             8/36 N
G         2/36   4             8/36 N
H         2/36   4             8/36 N
Total number of bits: 98/36 N = 2.72N bits
Therefore there has been a saving of 0.28N bits in comparison with fixed length binary codes of 3 bits each.
Redundancy: taking the information content as the Shannon entropy of these symbol frequencies (the sum of -p log2 p over the eight symbols), the ideal code for this sequence of symbols would take about 2.718N bits, i.e. this is the actual information content of the stream of codewords.
Redundancy = 1 - (Information content / Number of bits sent)
Thus for fixed length binary codes
Redundancy = 1 - 2.718N/3.0N = 9.4%
and for Huffman (98/36 N = 2.722N bits)
Redundancy = 1 - 2.718N/2.722N = about 0.2%
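The redundancy formula can be evaluated directly from the symbol counts. The sketch below assumes that "information content" means the Shannon entropy of the symbol frequencies, and it reuses the codeword lengths from the table; the variable and function names are illustrative only.

```python
from math import log2

freqs = {"A": 9, "B": 9, "C": 5, "D": 5, "E": 2, "F": 2, "G": 2, "H": 2}
lengths = {"A": 2, "B": 2, "C": 3, "D": 3, "E": 4, "F": 4, "G": 4, "H": 4}
total = sum(freqs.values())

# Shannon entropy: average information content in bits per symbol.
entropy = -sum((f / total) * log2(f / total) for f in freqs.values())
# Average codeword length of the variable-length code, in bits per symbol.
average_bits = sum(freqs[s] * lengths[s] for s in freqs) / total


def redundancy(bits_sent: float) -> float:
    """Redundancy = 1 - information content / number of bits sent."""
    return 1 - entropy / bits_sent


print(f"information content  {entropy:.3f} bits/symbol")
print(f"Huffman              {average_bits:.3f} bits/symbol, redundancy {redundancy(average_bits):.1%}")
print(f"fixed 3 bit codes    redundancy {redundancy(3):.1%}")
```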
MNP Class 5 Compression
• is a combination of Huffman and run-length encoding.
• The symbol stream is run-length encoded with a minimum repetition of 3 bytes and then Huffman encoded using a statistically generated table.
• During transmission the statistics for the occurrence of each symbol are updated and the allocation of codewords is dynamically changed.
• MNP Class 5 compression achieves 2:1 compression on a regular basis. Its major drawback is that it cannot turn itself off when it offers no gain, so that an incompressible file actually expands by >10%.
Error detection and protection
• Introduction
• Error Detection
– recognise that one has happened
• Error Correction
– repair damaged data
• parity and CRC.
• BCC and Hamming.
Data errors
• Errors can arise due to attenuation of signal
strength and due to other reasons.
– well shaped signals can become distorted and thus
misinterpreted.
• Random errors (each occurs with certain
probability)
– noise in electronics
– distance traveled
• Burst errors (groups of bits in error occur)
– source interference
– faults in equipment
Error detection
• A sequence of bits (I0 … In) is subjected to some
processing (P) giving rise to a check sequence (C0…Ck)
• Both are transmitted toward a receiver and incur a
possibility of corruption.
• Upon reception the bit stream is separated into received
data (I0r … Inr) and received check sequence (C0r…Ckr).
• The received data (I0r … Inr) is assumed to be correct and
the same processing (P) is performed on it giving the
reconstructed sequence (C0rr...Ckrr).
• If received check sequence (C0r…Ckr) and the
reconstructed sequence (C0rr...Ckrr) are equal then no
detectable error has occurred.
Parity for ASCII codes
• Consider a seven-bit ASCII code comprising the bits labelled I6, I5, I4, I3, I2, I1, I0.
• A parity bit P0 is placed beside the most significant bit I6 so that the codeword P0, I6, I5, I4, I3, I2, I1, I0 is formed.
• The parity bit is chosen so that for odd parity there is an odd number of 1s in the codeword,
• and for even parity there is an even number of 1s in the codeword.
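A minimal sketch of the parity rule, with P0 placed in the most significant bit of an 8 bit codeword. The function names are hypothetical.

```python
def add_parity(code7: int, odd: bool = False) -> int:
    """Return the 8-bit codeword P0 I6..I0 for a 7-bit ASCII code."""
    ones = bin(code7 & 0x7F).count("1")
    # Even parity: P0 makes the total count of 1s even; odd parity inverts P0.
    p0 = (ones % 2) ^ (1 if odd else 0)
    return (p0 << 7) | (code7 & 0x7F)


def parity_ok(codeword: int, odd: bool = False) -> bool:
    """Check that the count of 1s in the codeword has the expected parity."""
    ones = bin(codeword & 0xFF).count("1")
    return ones % 2 == (1 if odd else 0)


a = ord("A")                                    # 1000001, two 1s
print(format(add_parity(a), "08b"))             # 01000001 (even parity)
print(format(add_parity(a, odd=True), "08b"))   # 11000001 (odd parity)
print(parity_ok(add_parity(a) ^ 0b00000100))    # False: a single bit error is detected
```

Flipping any two bits leaves the parity unchanged, so a single parity bit cannot detect an even number of errors.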
Block Sum Check Character
                    P0 I6 I5 I4 I3 I2 I1 I0
Codeword 1           1  0  1  1  1  0  1  0
Codeword 2           0  1  1  1  1  0  0  1
Codeword 3           1  1  0  1  1  1  0  0
Codeword 4           0  0  1  0  0  1  1  0
Codeword 5           1  1  1  0  1  0  0  1
Codeword 6           1  1  1  1  0  1  1  1
Block Check Char.    0  1  0  1  1  0  0  0
Slide callout: Hey! See me!!
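The block check character can be recomputed from the data bits. The sketch below assumes the slide's bit values are read column by column (P0 first, I0 last), which gives odd parity along every row and down every data column; the helper name `odd_parity_bit` is hypothetical. It then corrupts one bit to show how a failing row and a failing column together locate a single error.

```python
def odd_parity_bit(bits):
    """Bit that makes the total number of 1s odd."""
    return 1 - sum(bits) % 2


# I6..I0 of the six codewords (assumed column-by-column reading of the slide).
data = [
    [0, 1, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0, 0, 1],
    [1, 0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 1],
]

rows = [[odd_parity_bit(r)] + r for r in data]            # P0 I6..I0 per codeword
bcc_data = [odd_parity_bit(col) for col in zip(*data)]    # odd parity down each column
bcc = [odd_parity_bit(bcc_data)] + bcc_data               # the BCC carries its own P0
print("BCC:", bcc)  # BCC: [0, 1, 0, 1, 1, 0, 0, 0]

# Corrupt one bit (codeword 3, bit I2) and locate it again.
received = [row[:] for row in rows]
received[2][5] ^= 1
bad_rows = [i for i, r in enumerate(received) if sum(r) % 2 == 0]
bad_cols = [j for j in range(1, 8)
            if (sum(r[j] for r in received) + bcc[j]) % 2 == 0]
print(bad_rows, bad_cols)  # [2] [5]
```

Two errors in the same row leave that row's parity intact, and four errors at the corners of a rectangle pass every check, which is where the method ceases to work.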
Block Sum Check Character
• Consider what this method can do:
– in terms of detecting errors.
– in terms of correcting errors.
• Can you see where it might be used in
practice?
• Where will it cease to work adequately?
Cyclic Redundancy Check (CRC)
• The CRC is so called because the codes belong to the class of cyclic codes, in which cyclically shifting a legal codeword produces another legal codeword. When the check bits are added to a sequence of bits they increase the redundancy of the codeword.
• The data sequence is divided by a standard polynomial and the remainder forms the check bits, or CRC.
• The polynomial is of the form
– 1.X^4 + 0.X^3 + 1.X^2 + 0.X^1 + 1
– more usually written X^4 + X^2 + 1
– and in binary it takes the form 10101
The arithmetic is different! But easier
• In decimal, 0..9 plus 0..9 means 100 different additions and 19 different answers (0..18).
• In binary, using a half adder or exclusive OR, there are (0 1) and (0 1), meaning 4 different additions and only 2 answers.
• Thus 0 ⊕ 0 = 0, 0 ⊕ 1 = 1, 1 ⊕ 0 = 1 and 1 ⊕ 1 = 0
– ⊕ being the symbol for exclusive OR.
– think of exclusive OR as the sum output of a half adder: an adder without a carry.
To perform CRC determination
• Get the data to be protected, say 11011
• Choose a polynomial, say X^4 + X^2 + 1
• Append to the data as many zero bits as the highest order of the polynomial (4), giving 110110000
• Divide this number by the polynomial using modulo-2 arithmetic, thus
– 110110000 / 10101
• Take the remainder and send it after the original data.
• Upon reception, compare the received CRC with the reconstructed CRC to determine error conditions.
Use the polynomial x^4 + x^2 + 1 to generate CRC
            11101
10101 ) 110110000
        10101
         11100
         10101
          10010
          10101
            11100
            10101
             1001
Thus the remainder is 1001 and the codeword is 110111001
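The long division above can be reproduced with a small modulo-2 division routine. This is a teaching sketch that works on strings of "0" and "1"; the function names `mod2_remainder` and `crc_codeword` are hypothetical.

```python
def mod2_remainder(bits: str, poly: str) -> str:
    """Remainder of modulo-2 (XOR) long division of bits by poly."""
    work = [int(b) for b in bits]
    divisor = [int(b) for b in poly]
    for i in range(len(work) - len(divisor) + 1):
        if work[i]:  # quotient bit is 1: subtract (XOR) the divisor here
            for j, d in enumerate(divisor):
                work[i + j] ^= d
    # The remainder is one bit shorter than the divisor.
    return "".join(map(str, work[-(len(divisor) - 1):]))


def crc_codeword(data: str, poly: str) -> str:
    """Append the CRC of data, computed with the given generator polynomial."""
    padded = data + "0" * (len(poly) - 1)  # one zero per order of the polynomial
    return data + mod2_remainder(padded, poly)


print(mod2_remainder("110110000", "10101"))  # 1001
print(crc_codeword("11011", "10101"))        # 110111001
```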
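Does 111010010 contain an error? It was generated using the same polynomial as before. Recompute the CRC from the received data bits 11101 with four zeros appended:
            11010
10101 ) 111010000
        10101
         10000
         10101
           10100
           10101
             0010
Thus the remainder is 0010, which matches the CRC received in the codeword 111010010, so no error is detected.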
Or divide the received data and CRC together by the generating polynomial; the remainder should be zero
            11010
10101 ) 111010010
        10101
         10000
         10101
           10101
           10101
             0000
Thus the remainder is 0000 and the data 11101 was received OK!
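Both receiver checks can be sketched as follows. The helper `mod2_remainder` is a hypothetical name for a plain modulo-2 division on bit strings, and the corrupted codeword at the end is an invented example with one bit flipped.

```python
def mod2_remainder(bits: str, poly: str) -> str:
    """Remainder of modulo-2 (XOR) long division of bits by poly."""
    work = [int(b) for b in bits]
    divisor = [int(b) for b in poly]
    for i in range(len(work) - len(divisor) + 1):
        if work[i]:  # quotient bit is 1: XOR the divisor in at this position
            for j, d in enumerate(divisor):
                work[i + j] ^= d
    return "".join(map(str, work[-(len(divisor) - 1):]))


POLY = "10101"
received = "111010010"
data, crc = received[:5], received[5:]

# Method 1: recompute the CRC from the received data and compare.
recomputed = mod2_remainder(data + "0000", POLY)
print(recomputed, recomputed == crc)   # 0010 True

# Method 2: divide data and CRC together; a zero remainder means no detected error.
print(mod2_remainder(received, POLY))  # 0000

# A hypothetical corrupted codeword (one bit flipped) leaves a non-zero remainder.
corrupted = "111110010"
print(mod2_remainder(corrupted, POLY) != "0000")  # True
```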
Hamming Codes
position in codeword:     11 10  9  8  7  6  5  4  3  2  1
information and checks:   I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
Given an ASCII code 1001010 what is the Hamming Code?
position in codeword:     11 10  9  8  7  6  5  4  3  2  1
information and checks:   I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
bit values:                1  0  0  x  1  0  1  x  0  x  x
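How to determine the values of C3, C2, C1 and C0
Write the position number of every information bit that is 1 in binary and add the columns modulo 2:
Position     C3 C2 C1 C0
11            1  0  1  1
7             0  1  1  1
5             0  1  0  1
Check bits    1  0  0  1
I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
 1  0  0  1  1  0  1  0  0  0  1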
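The check bit calculation can be sketched as an exclusive OR over position numbers. The layout constants follow the position table on the slides; the function name `hamming_encode` is hypothetical.

```python
from functools import reduce

INFO_POSITIONS = [11, 10, 9, 7, 6, 5, 3]   # positions of I6 I5 I4 I3 I2 I1 I0
CHECK_POSITIONS = [8, 4, 2, 1]             # positions of C3 C2 C1 C0


def hamming_encode(ascii_bits: str) -> str:
    """Return the 11-bit codeword, written from position 11 down to position 1."""
    word = dict(zip(INFO_POSITIONS, map(int, ascii_bits)))
    # XOR together the position numbers of all information bits that are 1.
    checks = reduce(lambda a, b: a ^ b, (p for p, bit in word.items() if bit), 0)
    for p in CHECK_POSITIONS:
        word[p] = 1 if checks & p else 0
    return "".join(str(word[p]) for p in range(11, 0, -1))


print(hamming_encode("1001010"))  # 10011010001
```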
How does this detect an error?
I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
 1  0  0  1  1  1  1  0  0  0  1
Position        C3 C2 C1 C0
11               1  0  1  1
8                1  0  0  0
7                0  1  1  1
6                0  1  1  0
5                0  1  0  1
1                0  0  0  1
Bit in error     0  1  1  0
Therefore the 6th bit was received in error
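The receiver's calculation can be sketched the same way: XOR the position numbers of every received 1, and a non-zero result names the position in error. The function names are hypothetical, and the correction step assumes at most one bit is wrong.

```python
from functools import reduce


def syndrome(codeword: str) -> int:
    """XOR of the position numbers (11 down to 1) of every bit that is 1."""
    positions = range(len(codeword), 0, -1)
    return reduce(lambda a, b: a ^ b,
                  (p for p, bit in zip(positions, codeword) if bit == "1"), 0)


def correct(codeword: str) -> str:
    """Flip the bit named by the syndrome, assuming a single-bit error."""
    s = syndrome(codeword)
    if s == 0:
        return codeword
    if s > len(codeword):
        raise ValueError("syndrome points outside the codeword: multiple errors")
    i = len(codeword) - s  # string index of position s
    flipped = "1" if codeword[i] == "0" else "0"
    return codeword[:i] + flipped + codeword[i + 1:]


received = "10011110001"
print(syndrome(received))  # 6
print(correct(received))   # 10011010001
```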
Summary
• Hamming codes have their redundant bits in the positions
which are powers of 2 ie 1,2,4,8 etc
• They can detect and correct single errors.
• They can indicate multiple error conditions but cannot correct them.
• Used for random errors.
• Can you think of how they might be applied to a circumstance where a burst error could occur? Assume that the burst is shorter than 8 bits and there are 256 bytes to be transmitted.