Base64 and CRC


Base64 Encoding

The SMTP protocol only allows 7-bit ASCII data, so how can you send me a picture of Avril Lavigne, which is an 8-bit binary JPEG file?

Encode it.

But back to Base64 encoding…

The encoding method used is simple and elegant. Each group of 3 bytes is encoded as 4 bytes, each containing only 6 bits of data. These are sent as 7-bit ASCII.

Why is it called Base64? Because 6 bits give decimal values in the range 0-63. By assigning a character to each of these 64 values, we can encode any 6-bit value as one single character. Base 64 requires 64 symbols, just as decimal (base 10) requires 10 symbols and hexadecimal (base 16) requires 16 symbols.

The Base64 Alphabet: (values given in decimal)

 0 A    17 R    34 i    51 z
 1 B    18 S    35 j    52 0
 2 C    19 T    36 k    53 1
 3 D    20 U    37 l    54 2
 4 E    21 V    38 m    55 3
 5 F    22 W    39 n    56 4
 6 G    23 X    40 o    57 5
 7 H    24 Y    41 p    58 6
 8 I    25 Z    42 q    59 7
 9 J    26 a    43 r    60 8
10 K    27 b    44 s    61 9
11 L    28 c    45 t    62 +
12 M    29 d    46 u    63 /
13 N    30 e    47 v
14 O    31 f    48 w    (pad) =
15 P    32 g    49 x
16 Q    33 h    50 y
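As a rough illustration, the alphabet can be written as a single 64-character string, so that looking up a value is just indexing. The names `BASE64_ALPHABET` and `base64_char` are made up for this sketch.

```python
import string

# The 64 symbols in value order: A-Z, a-z, 0-9, then '+' and '/'.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)


def base64_char(value: int) -> str:
    """Return the Base64 character for a 6-bit value (0-63)."""
    if not 0 <= value <= 63:
        raise ValueError("a 6-bit value must be in the range 0-63")
    return BASE64_ALPHABET[value]


print(len(BASE64_ALPHABET))                               # 64
print(base64_char(0), base64_char(43), base64_char(63))   # A r /
```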

We take 3 bytes and encode to 4 bytes:

3 bytes to encode:    10101111 11001010 11101010
24 bit stream:        101011111100101011101010
Four 6-bit values:    101011 111100 101011 101010
decimal value:        43     60     43     42
Base64 character:     r      8      r      q

We then use the table to send the ASCII codes for each BASE64 character.
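The 3-bytes-to-4-characters step can be sketched in a few lines of Python. This is only an illustration of the idea: `encode_group` is a hypothetical helper name, and the standard library's `base64.b64encode` is called purely as a cross-check.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Join the 3 bytes into one 24-bit integer.
    stream = int.from_bytes(three_bytes, "big")
    # Peel off four 6-bit values, most significant first.
    values = [(stream >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[v] for v in values)


group = bytes([0b10101111, 0b11001010, 0b11101010])
print(encode_group(group))                # r8rq
print(base64.b64encode(group).decode())   # r8rq (standard library, for comparison)
```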


There is a slight problem when the number of bytes to be encoded is not an exact multiple of 3. In this case, zeros are added to make the last group of bytes (i.e. 1 or 2 bytes) up to a multiple of 6 bits. One or two padding characters (=) are then added to make the encoded data a multiple of 4 characters.

For example:

4 bytes to encode:    10101111 11001010 11101010 00100011
32 bit stream:        10101111110010101110101000100011
Six 6-bit values:     101011 111100 101011 101010 001000 11 0000
decimal value:        43     60     43     42     08     48
Base64 characters:    r      8      r      q      I      w
Add padding:          r      8      r      q      I      w      =  =

In this case four zeros are added, then two padding characters.
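The zero-fill and padding rules can be sketched as a small encoder that works on a string of bits, mirroring the worked example. The function name `base64_encode` is invented here, and the bit-string approach is chosen for readability rather than speed.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_encode(data: bytes) -> str:
    """Encode bytes as Base64, zero-filling and padding the last group."""
    # Write the input as one long string of bits.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Add zeros until the stream is a multiple of 6 bits.
    bits += "0" * (-len(bits) % 6)
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Add '=' until the output is a multiple of 4 characters.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)


data = bytes([0b10101111, 0b11001010, 0b11101010, 0b00100011])
print(base64_encode(data))   # r8rqIw==
```

A 1-byte remainder gets four zero bits and two `=` characters, as in the example; a 2-byte remainder gets two zero bits and one `=`.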


Example Email Message with GIF attachment - BASE64 encoded

MIME-Version: 1.0
Content-Type: Multipart/mixed; BOUNDARY="Part10510241718.A"

--Part10510241718.A
Content-Type: Text/Plain; charset="us-ascii"

This email contains an attachment - a small GIF file.

Jim
---------------------

--Part10510241718.A
Content-Type: Image/gif; name="pin.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="pin.gif"

R0lGODlhDgARAPIAAAAAAL8AAICAgP8AAP///wAAAAAAAAAAACH5BAEAAAQA
LAAAAAAOABEAAAM/SArRoRAy5yIBMwwynqTb1kjMtHHeFWRal2JUTAZCPJJA
XTdYhOWrX8/3w1mOgqFCwGwyLU4nNPqcRo9LqSUBADs=

--Part10510241718.A
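To see that the attachment really is binary data in disguise, the encoded text can be decoded again. The sketch below uses Python's standard `base64.b64decode`. The detail that a GIF file begins with a 6-byte signature followed by a little-endian width and height is background knowledge about the GIF format, not something stated in the message itself.

```python
import base64

# The three encoded lines from the message body, joined together.
encoded = (
    "R0lGODlhDgARAPIAAAAAAL8AAICAgP8AAP///wAAAAAAAAAAACH5BAEAAAQA"
    "LAAAAAAOABEAAAM/SArRoRAy5yIBMwwynqTb1kjMtHHeFWRal2JUTAZCPJJA"
    "XTdYhOWrX8/3w1mOgqFCwGwyLU4nNPqcRo9LqSUBADs="
)

gif_bytes = base64.b64decode(encoded)

# A GIF file starts with a 6-byte signature, followed by the image
# width and height as 16-bit little-endian integers.
signature = gif_bytes[:6]
width = int.from_bytes(gif_bytes[6:8], "little")
height = int.from_bytes(gif_bytes[8:10], "little")

print(signature)        # b'GIF89a'
print(width, height)    # 14 17
```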

CRC – Cyclic Redundancy Check

Errors happen!

One simple method of error checking is to do a checksum. All the bytes in the message are added up and the result is transmitted with the message. The receiver does the sum again and compares the result with the transmitted checksum. This can detect lots of errors, but it is easy to see that one bit changed in one byte could be cancelled out by one bit changed in another byte, leaving the sum unchanged. This is not a very reliable method of error checking.
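A small sketch of the weakness: the checksum below keeps only the low 8 bits of the sum, which is an assumption made for this example (real checksums vary in width), and the message `b"HELLO"` is made up. Two single-bit errors in different bytes cancel out and go unnoticed.

```python
def checksum(message: bytes) -> int:
    """Add up all the bytes, keeping only the low 8 bits of the sum."""
    return sum(message) % 256


original = b"HELLO"
# Two errors that cancel: one bit flips in the first byte ('H' becomes 'I',
# value +1) and one bit flips in the second ('E' becomes 'D', value -1).
corrupted = bytes([original[0] + 1, original[1] - 1]) + original[2:]

print(corrupted)                                    # b'IDLLO'
print(checksum(original) == checksum(corrupted))    # True: the error is missed
```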

The CRC is extensively used for error checking in many network protocols.

It is based on some very complex mathematics, concerned with polynomial arithmetic. If you wish to investigate the theory behind CRC, this is a good starting point: http://www.ross.net/crc/links.html

Why polynomial arithmetic? In any number system, numbers can be considered as polynomials. In our familiar decimal system, the number 3807 can be expressed as:

3*10^3 + 8*10^2 + 0*10^1 + 7*10^0

Things are actually simplified if we are working in binary, as the coefficients can only be 0 or 1. So if we consider the binary number 101101, this is:

1*2^5 + 0*2^4 + 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0

or

x^5 + x^3 + x^2 + 1
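The correspondence between a binary number and its polynomial can be sketched as follows. `to_polynomial` is a hypothetical helper written for this illustration; it simply lists one term for each 1 bit.

```python
def to_polynomial(number: int) -> str:
    """Write a non-negative integer as a polynomial in x with 0/1 coefficients."""
    terms = []
    for power in range(number.bit_length() - 1, -1, -1):
        if (number >> power) & 1:
            if power == 0:
                terms.append("1")
            elif power == 1:
                terms.append("x")
            else:
                terms.append(f"x^{power}")
    return " + ".join(terms) if terms else "0"


print(to_polynomial(0b101101))   # x^5 + x^3 + x^2 + 1
print(to_polynomial(0b1001))     # x^3 + 1
```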

The CRC works by division, rather than addition. The data (the transmitted message) is considered to be a big binary number, which could be represented as a polynomial. This polynomial is divided by another, carefully chosen, polynomial to give a result which is used to check the data, in the same way as a checksum.

By using a division algorithm, this method of error checking can detect many more errors than a simple checksum.

Rather than using a straightforward binary division, the CRC uses modulo-2 arithmetic. This means effectively doing a normal long division, but with a few strange rules. In modulo-2 arithmetic, subtraction and addition are identical, since there are no “carries”. The logical function is actually XOR. Deciding whether the divisor “goes into” the current part of the dividend depends only on whether the MSB of that part is 1, the same as the MSB of the divisor. So 1111 would “go into” 1000.
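These rules are easy to try out. In the sketch below the function names are invented for illustration; the only real operation involved is Python's bitwise XOR operator `^`.

```python
def mod2_add(a: int, b: int) -> int:
    """Add two binary numbers with no carries: bitwise XOR."""
    return a ^ b


def mod2_sub(a: int, b: int) -> int:
    """Subtract with no borrows: exactly the same XOR."""
    return a ^ b


def goes_into(divisor: int, part: int) -> bool:
    """True when the bit of `part` under the divisor's MSB is 1."""
    msb_position = divisor.bit_length() - 1
    return (part >> msb_position) & 1 == 1


print(f"{mod2_sub(0b1011, 0b1001):04b}")   # 0010
print(goes_into(0b1111, 0b1000))           # True
print(goes_into(0b1001, 0b0101))           # False
```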

Example from the book:

Data to be checked: 101110
Generator polynomial: 1001 (x^3 + 1)

The data is first multiplied by 2^3, since the generator polynomial is of order 3, done by adding 3 zeros: 101110 000

Next this value is divided by the generator polynomial (1001), by long division, using modulo-2 arithmetic, where subtraction becomes the XOR function:

          101011
       ---------
1001 | 101110000
       1001
       ----
         101
         000
         ---
         1010
         1001
         ----
           110
           000
           ---
           1100
           1001
           ----
            1010
            1001
            ----
             011  Remainder

The value actually transmitted is the data plus the remainder, i.e. in this case: 101110 011
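The sender's side of this example can be sketched in Python, working directly on strings of '0' and '1' so that it follows the long division step by step. The names `mod2_remainder` and `crc_append` are made up for this sketch, which assumes the generator starts with a 1 and is at least 2 bits long.

```python
def mod2_remainder(bits: str, generator: str) -> str:
    """Remainder of modulo-2 long division, working on strings of '0'/'1'."""
    work = list(bits)
    for i in range(len(bits) - len(generator) + 1):
        if work[i] == "1":                      # the generator "goes into" this position
            for j, g in enumerate(generator):   # subtract it, i.e. XOR bit by bit
                work[i + j] = "0" if work[i + j] == g else "1"
    # Whatever is left in the last len(generator) - 1 positions is the remainder.
    return "".join(work[-(len(generator) - 1):])


def crc_append(data: str, generator: str) -> str:
    """Return the data followed by its CRC remainder."""
    shifted = data + "0" * (len(generator) - 1)   # multiply by 2^r
    return data + mod2_remainder(shifted, generator)


print(mod2_remainder("101110000", "1001"))   # 011
print(crc_append("101110", "1001"))          # 101110011
```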

When the CRC is calculated at the receiver the value should be zero, as the remainder value has been added to the original data.

          101011
       ---------
1001 | 101110011
       1001
       ----
         101
         000
         ---
         1010
         1001
         ----
           110
           000
           ---
           1101
           1001
           ----
            1001
            1001
            ----
             000  Remainder
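The receiver's check can be sketched with integers and shifts instead of pencil-and-paper long division. `mod2_remainder` here is a hypothetical helper, and the flipped bit is chosen arbitrarily to show that a damaged message no longer divides exactly.

```python
def mod2_remainder(value: int, generator: int) -> int:
    """Remainder of modulo-2 division, using integers and shifts."""
    width = generator.bit_length()
    while value.bit_length() >= width:
        # Line the generator up under the leading 1 and XOR it away.
        value ^= generator << (value.bit_length() - width)
    return value


GENERATOR = 0b1001
received = 0b101110011    # data 101110 followed by remainder 011

print(mod2_remainder(received, GENERATOR))          # 0: no error detected

damaged = received ^ 0b000010000                    # flip one bit in transit
print(mod2_remainder(damaged, GENERATOR) != 0)      # True: the error is caught
```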

International standards have been established for various different CRC generators of different bit lengths. For example, this is the CRC-32-IEEE 802.3 polynomial, used for Ethernet:

x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1 (V.42)

The CRC can always detect burst errors of fewer than r+1 bits (where r is the order of the generator polynomial). There is also a good probability of longer burst errors being detected. A CRC whose generator polynomial has x+1 as a factor can also detect any odd number of bit errors.
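As a rough check, the polynomial can be turned into a number by setting one bit per term, and a short burst error can be tried against Python's standard `zlib.crc32`, which uses this same generator. The frame contents and the position of the burst are invented for this example.

```python
import zlib

# Exponents of the terms of the CRC-32 generator polynomial.
EXPONENTS = [32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0]

# One bit per term gives the generator as a 33-bit number.
generator = sum(1 << e for e in EXPONENTS)
print(hex(generator))                     # 0x104c11db7

frame = b"An example frame of data"
good_crc = zlib.crc32(frame)

# Corrupt a burst of 8 adjacent bits (well under r+1 = 33) and recompute.
damaged = frame[:5] + bytes([frame[5] ^ 0xFF]) + frame[6:]
print(zlib.crc32(damaged) != good_crc)    # True: the burst is detected
```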

The CRC is easy to implement in software and is often implemented in hardware (using shift registers and XOR gates).
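A software version modelled on the shift register might look like the sketch below. It follows the common CRC-32 conventions (register preset to all ones, bits processed least significant first, result inverted at the end), which are details of the widely used CRC-32 variant rather than anything stated above. The function name is made up, and `zlib.crc32` serves only as a cross-check.

```python
import zlib

# 0x04C11DB7 (the CRC-32 polynomial without its x^32 term), bit-reversed.
REFLECTED_POLY = 0xEDB88320


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32, modelled on a 32-bit shift register."""
    register = 0xFFFFFFFF                    # register starts with all ones
    for byte in data:
        register ^= byte
        for _ in range(8):
            feedback = register & 1          # the bit about to be shifted out
            register >>= 1
            if feedback:
                register ^= REFLECTED_POLY   # the XOR gates on the feedback path
    return register ^ 0xFFFFFFFF             # final inversion


message = b"123456789"
print(hex(crc32_bitwise(message)))                      # 0xcbf43926
print(crc32_bitwise(message) == zlib.crc32(message))    # True
```

Real implementations usually process a byte at a time with a 256-entry lookup table, but the bit-at-a-time loop is closer to what the hardware does.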