Fixed Point Numbers


Floating Point Numbers

Topics Covered

• Fixed point numbers
• Representation of floating point numbers
• IEEE 32-bit floating point numbers
• Floating point arithmetic

Fixed Point Numbers

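• The binary (or decimal) point is assumed to be in a fixed position.
• Base 10 fixed point arithmetic: the digits are added as integers, and the point keeps its fixed position.

     7632135        763.2135
   + 1794821      + 179.4821
   ---------      ----------
     9426956        942.6956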

Fixed Point (Binary) Numbers

• Example: Add 3.625 and 6.5

1. Convert the numbers to 8-bit form (4-bit integer, 4-bit fraction):
   3.625 → 11.101 → 0011.1010
   6.500 → 110.10 → 0110.1000

2. Treat the numbers as having an imaginary binary point and add them in the normal way:
   00111010 + 01101000 = 10100010

3. The integer part of the result (1010) converts to 10, and the fractional part (.0010) is interpreted as .125. Therefore, the result is 10.125.
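The three steps above can be sketched in Python. This is a minimal illustration, assuming the 4-bit integer, 4-bit fraction layout from the example; the helper names `to_fixed` and `from_fixed` are hypothetical and not part of the original slides.

```python
FRACTION_BITS = 4  # 4-bit integer part, 4-bit fraction part, as in the example


def to_fixed(value: float) -> int:
    """Scale a value by 2^4 and keep the low 8 bits (assumes it is exactly representable)."""
    return int(round(value * (1 << FRACTION_BITS))) & 0xFF


def from_fixed(raw: int) -> float:
    """Interpret an 8-bit pattern as a number with 4 fraction bits."""
    return raw / (1 << FRACTION_BITS)


a = to_fixed(3.625)      # 0011.1010
b = to_fixed(6.5)        # 0110.1000
total = (a + b) & 0xFF   # ordinary integer addition, kept to 8 bits

print(f"{a:08b} + {b:08b} = {total:08b}")
print(from_fixed(total))
```

The addition itself is plain integer addition; the binary point only matters when the bit pattern is converted to or from a real value.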

Problem with Fixed Point (Binary) Numbers

• Some systems require a large range of numbers:

1. Mass of the sun: 1990000000000000000000000000000000 grams ($1.99 \times 10^{33}$ g). Requires about 14 bytes.

2. Mass of an electron: 0.000000000000000000000000000910956 grams ($9.10956 \times 10^{-28}$ g). Requires about 12 bytes.
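The byte counts above can be checked with a short Python sketch. It assumes the sun's mass is stored as a pure binary integer and that the electron's mass needs enough fraction bits to reach its first nonzero bit; the variable names and the `bytes_needed` helper are illustrative only.

```python
from fractions import Fraction

sun_grams = 199 * 10**31                     # 1.99e33 g as an exact integer
electron_grams = Fraction(910956, 10**33)    # 9.10956e-28 g as an exact fraction

# Integer bits needed to hold the mass of the sun.
sun_bits = sun_grams.bit_length()

# Fraction bits needed before the first nonzero bit of the electron mass appears.
electron_bits = 0
while Fraction(1, 2**electron_bits) > electron_grams:
    electron_bits += 1


def bytes_needed(bits: int) -> int:
    """Round a bit count up to whole bytes."""
    return (bits + 7) // 8


print(sun_bits, bytes_needed(sun_bits))
print(electron_bits, bytes_needed(electron_bits))
```

A single fixed point format that had to hold both values would need the integer bits of the first and the fraction bits of the second at the same time, which is what motivates floating point.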

Floating Point Numbers

Definitions

• Range
   How small and how large the numbers can be.

• Precision
   The number of significant figures used to represent the number; a measure of a number's exactness.
   PI = 3.141592 is more precise than PI = 3.14.

• Accuracy
   A measure of the correctness of a number.
   PI = 3.241592 is more precise than PI = 3.14, but PI = 3.14 is more accurate.
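The distinction can be made concrete with a small Python sketch. Counting digits is only a rough proxy for precision, and the `significant_digits` helper is a hypothetical name introduced for this illustration.

```python
import math


def significant_digits(text: str) -> int:
    """Count significant figures in a plain decimal string (rough proxy for precision)."""
    return len(text.replace(".", "").lstrip("0"))


for text in ("3.141592", "3.241592", "3.14"):
    error = abs(float(text) - math.pi)   # distance from the true value = (in)accuracy
    print(text, significant_digits(text), error)
```

The value 3.241592 carries more digits than 3.14 but lies farther from pi, so it is more precise and less accurate.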

IEEE Floating Point Numbers

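Single Precision Format

$$(-1)^{S} \times 2^{E - B} \times 1.F, \qquad B = 127$$

Here S is the sign bit, E is the 8-bit stored (biased) exponent, F is the 23-bit fraction, and B is the bias.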
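As a sketch of how the formula is evaluated, the following Python function extracts S, E, and F from a 32-bit pattern and applies it directly. The function name `decode_single` is an assumption made for this example, and the special exponents 0 and 255 are deliberately rejected because the formula does not apply to them.

```python
import struct

BIAS = 127


def decode_single(bits: int) -> float:
    """Evaluate (-1)^S * 2^(E-B) * 1.F for a normalized 32-bit pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent in (0, 255):
        raise ValueError("reserved exponent: the formula covers normalized numbers only")
    significand = 1 + fraction / 2**23          # restore the hidden leading 1
    return (-1) ** sign * 2.0 ** (exponent - BIAS) * significand


pattern = 0x42F70000   # sign 0, exponent 10000101, fraction 1110111 followed by zeros
print(decode_single(pattern))
# Cross-check against the machine's own interpretation of the same four bytes.
print(struct.unpack(">f", struct.pack(">I", pattern))[0])
```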

IEEE Floating Point Numbers

Range of Mantissa

• A floating point mantissa x is limited to one of three ranges:
   -2 < x <= -1
   x = 0
   +1 <= x < +2

IEEE Floating Point Numbers

Exponent

Binary Value    True Exponent    Biased Exponent    Special Numbers
0000 0000       -127             0                  zero
0000 0001       -126             1
0000 0010       -125             2
0000 0011       -124             3
. . .
0111 1111       0                127
. . .
1111 1100       125              252
1111 1101       126              253
1111 1110       127              254
1111 1111       128              255                +- infinity

IEEE Floating Point Numbers

Excess-n

• The stored exponent is also called excess-n, or excess-127 for the IEEE single precision format.
• The stored exponent exceeds the true exponent by 127, the bias:

   $b' = b + 127$

   where $b'$ is the biased exponent and $b$ is the true exponent.

Examples:

• If the true exponent is 2, the exponent is stored in biased form as 2 + 127 = 129 = 1000 0001.
• If the stored exponent is 0000 0001, the true exponent is 1 - 127 = -126.
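The two examples can be reproduced with a minimal Python sketch of the excess-127 conversion. The function names `to_biased` and `to_true` are hypothetical.

```python
BIAS = 127


def to_biased(true_exponent: int) -> str:
    """Return the 8-bit stored exponent b' = b + 127 as a bit string."""
    stored = true_exponent + BIAS
    if not 0 <= stored <= 255:
        raise ValueError("exponent does not fit in 8 bits")
    return format(stored, "08b")


def to_true(stored_bits: str) -> int:
    """Return the true exponent b = b' - 127 from an 8-bit string (spaces allowed)."""
    return int(stored_bits.replace(" ", ""), 2) - BIAS


print(to_biased(2))           # stored form of true exponent 2
print(to_true("0000 0001"))   # true exponent of stored 0000 0001
```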

IEEE Floating Point Numbers

Representation of Zero

• The smallest stored exponent, 0000 0000 (in biased form), which would correspond to a true exponent of -127, is reserved: combined with an all-zero fraction, it represents zero.

IEEE Floating Point Numbers

Infinity and Not a Number (NaN)

• Stored exponent 1111 1111 with mantissa = 0 is used as +- infinity.
• Stored exponent 1111 1111 with mantissa != 0 is used as NaN.
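The reserved exponent values can be told apart with a few bit tests. The sketch below is illustrative only: the `classify` name is an assumption, and the "subnormal" case (stored exponent 0 with a nonzero fraction) is included for completeness even though the slides do not discuss it.

```python
import math
import struct


def classify(bits: int) -> str:
    """Name the kind of value encoded by a 32-bit single precision pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    if exponent == 0:
        return "zero" if fraction == 0 else "subnormal"   # subnormal: not covered in the slides
    return "normalized"


def as_float(bits: int) -> float:
    """Let the machine interpret the same 32 bits, as a cross-check."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


for pattern in (0x00000000, 0x7F800000, 0xFF800000, 0x7FC00000, 0x42F70000):
    print(f"{pattern:08X}", classify(pattern), as_float(pattern))
```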

IEEE Floating Point Numbers

Example Representation

• Represent -2345.125 as a single precision IEEE floating point number.

   $-2345.125_{10} = -100100101001.001_2 = -1.00100101001001_2 \times 2^{11}$

• S = 1 (negative).
• The biased exponent is $11 + 127 = 138_{10} = 10001010_2$.
• The fractional part of the mantissa is .00100101001001000000000 (23 bits).
• Therefore, $-2345.125_{10}$ is stored as: 1 10001010 00100101001001000000000
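The hand conversion can be checked against the machine's own encoding. The sketch below uses Python's standard `struct` module to obtain the 32-bit pattern and splits it into the three fields; `single_fields` is a hypothetical helper name.

```python
import struct


def single_fields(value: float) -> str:
    """Return 'S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF' for a value stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    text = f"{bits:032b}"
    return f"{text[0]} {text[1:9]} {text[9:]}"


print(single_fields(-2345.125))
```

Because -2345.125 is exactly representable in 23 fraction bits, no rounding occurs and the printed fields can be compared directly with the worked example.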

IEEE Floating Point Numbers

Arithmetic Example #1

1. Convert the decimal numbers 123.5 and 100.25 into the IEEE 32-bit floating point number representation. Then carry out the subtraction of 123.5 - 100.25 and express the result as a normalized 32-bit floating point number.

• $123.5_{10} = 1111011.1_2 = 1.1110111_2 \times 2^{6}$
• The mantissa is positive, and so S = 0.
• The exponent is +6, which is stored in biased form as $6 + 127 = 133_{10} = 10000101_2$.
• The mantissa is 1.1110111, which is stored in 23 bits, with the leading '1' suppressed.
• Therefore, $123.5_{10}$ is stored as: 0 10000101 11101110000000000000000

IEEE Floating Point Numbers

Arithmetic Example #1 (Continued)

• $100.25_{10} = 1100100.01_2 = 1.10010001_2 \times 2^{6}$
• The mantissa is positive, and so S = 0.
• The exponent is +6, which is stored in biased form as $6 + 127 = 133_{10} = 10000101_2$.
• The mantissa is 1.10010001, which is stored in 23 bits, with the leading '1' suppressed.
• Therefore, $100.25_{10}$ is stored as: 0 10000101 10010001000000000000000

IEEE Floating Point Numbers

Arithmetic Example #1 (Continued)

• The two IEEE numbers are first unpacked: the sign, exponent, and mantissa (including its suppressed leading '1') must be reconstituted.
• The two exponents are compared. If they are the same, the mantissas are added. If they are not, the number with the smaller exponent is denormalized by shifting its mantissa right (i.e., dividing by 2) and incrementing its exponent (i.e., multiplying by 2) until the two exponents are equal. Then the numbers are added.
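The exponent comparison and denormalizing shift can be sketched as follows. This is a simplified illustration, not the procedure real hardware uses: mantissas are held as 24-bit integers with the leading 1 explicit, bits shifted out on the right are simply dropped (no guard or rounding bits), and the `align` function and its sample operands are assumptions made for the example.

```python
def align(exp_a: int, mant_a: int, exp_b: int, mant_b: int):
    """Shift the mantissa with the smaller exponent right until both exponents match."""
    while exp_a < exp_b:
        mant_a >>= 1   # divide the mantissa by 2 ...
        exp_a += 1     # ... and multiply by 2 through the exponent
    while exp_b < exp_a:
        mant_b >>= 1
        exp_b += 1
    return exp_a, mant_a, mant_b


# Hypothetical operands: 1.5 x 2^3 and 1.25 x 2^5, as 24-bit mantissas.
exponent, mant_a, mant_b = align(3, 0xC00000, 5, 0xA00000)
print(exponent, f"{mant_a:024b}", f"{mant_b:024b}")
```

After alignment both mantissas are scaled by the same power of two, so they can be added or subtracted as integers.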

IEEE Floating Point Numbers

Arithmetic Example #1 (Continued)

• After unpacking, insert the leading '1' and perform the subtraction:

     1.11101110000000000000000
   - 1.10010001000000000000000
   ---------------------------
     0.01011101000000000000000

• Normalize the result by shifting it left two places: 1.01110100000000000000000

IEEE Floating Point Numbers

Arithmetic Example #1 (Continued)

• Because the mantissa was shifted left two places during normalization, the exponent must be decreased by 2: $10000101_2 - 2_{10} = 10000011_2$
• The result (23.25) expressed in IEEE format is: 0 10000011 01110100000000000000000
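The whole of Example #1 (unpack, subtract, normalize, repack) can be traced with a short Python sketch. It handles only the case that arises here, two positive operands with equal exponents and the larger one first; the function names are hypothetical and rounding is not modeled.

```python
def unpack(word: str):
    """Split 'S EEEEEEEE F...' into sign, biased exponent, and 24-bit mantissa."""
    sign, exponent, fraction = word.split()
    return int(sign), int(exponent, 2), (1 << 23) | int(fraction, 2)   # restore leading 1


def pack(sign: int, exponent: int, mantissa: int) -> str:
    """Suppress the leading 1 again and format the three fields."""
    return f"{sign} {exponent:08b} {mantissa & 0x7FFFFF:023b}"


def subtract_same_exponent(a: str, b: str) -> str:
    """Compute a - b for positive operands with equal exponents and a > b."""
    sign_a, exp_a, mant_a = unpack(a)
    sign_b, exp_b, mant_b = unpack(b)
    if not (sign_a == sign_b == 0 and exp_a == exp_b and mant_a > mant_b):
        raise ValueError("this sketch covers only the case in the worked example")
    mantissa, exponent = mant_a - mant_b, exp_a
    while mantissa < (1 << 23):    # normalize: bring the leading 1 back to bit 23
        mantissa <<= 1
        exponent -= 1
    return pack(0, exponent, mantissa)


result = subtract_same_exponent(
    "0 10000101 11101110000000000000000",   # 123.5
    "0 10000101 10010001000000000000000",   # 100.25
)
print(result)
```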

IEEE Floating Point Numbers

Arithmetic Example #2

2. Convert the decimal numbers 42.6875 and -0.09375 into the IEEE 32-bit floating point number representation. Then carry out the addition of 42.6875 and -0.09375 and express the result as a normalized 32-bit floating point number.

• $42.6875_{10} = 101010.1011_2 = 1.010101011_2 \times 2^{5}$
• The mantissa is positive, and so S = 0.
• The exponent is +5, which is stored in biased form as $5 + 127 = 132_{10} = 10000100_2$.
• The mantissa is 1.010101011, which is stored in 23 bits, with the leading '1' suppressed.
• Therefore, $42.6875_{10}$ is stored as: 0 10000100 01010101100000000000000

IEEE Floating Point Numbers

Arithmetic Example #2 (Continued)

• $-0.09375_{10} = -0.00011_2 = -1.1_2 \times 2^{-4}$
• The mantissa is negative, and so S = 1.
• The exponent is -4, which is stored in biased form as $-4 + 127 = 123_{10} = 01111011_2$.
• The mantissa is 1.1, which is stored in 23 bits, with the leading '1' suppressed.
• Therefore, $-0.09375_{10}$ is stored as: 1 01111011 10000000000000000000000

IEEE Floating Point Numbers

Arithmetic Example #2 (Continued)

…

   $+42.6875_{10}$: 0 10000100 01010101100000000000000
   $-0.09375_{10}$: 1 01111011 10000000000000000000000

• In order to perform the addition, the exponents must be the same.
• Restore the leading '1' of each mantissa, then increase the second exponent by 9 and shift its mantissa right 9 times to get:

   $+42.6875_{10}$: 0 10000100 1.01010101100000000000000
   $-0.09375_{10}$: 1 10000100 0.00000000110000000000000

IEEE Floating Point Numbers

Arithmetic Example #2 (Continued)

…

   $+42.6875_{10}$: 0 10000100 1.01010101100000000000000
   $-0.09375_{10}$: 1 10000100 0.00000000110000000000000

• Adding the mantissas (the signs differ, so the smaller magnitude is subtracted from the larger), we get: 1.01010100110000000000000
• The result is positive with a biased exponent of 10000100.
• Therefore, the result is stored as: 0 10000100 01010100110000000000000
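Example #2 can be traced in a few lines of Python and compared with the machine's own result. The sketch assumes 24-bit mantissas with the leading 1 made explicit and covers only this case (opposite signs, larger magnitude first, no renormalization needed); all variable names are illustrative.

```python
import struct

# 24-bit mantissas with the leading 1 explicit, taken from the worked example.
exp_a, mant_a = 0b10000100, 0b1010101011 << 14   # +42.6875 = 1.010101011 x 2^5
exp_b, mant_b = 0b01111011, 0b11 << 22           # magnitude of -0.09375 = 1.1 x 2^-4

shift = exp_a - exp_b         # difference between the stored exponents
mant_b >>= shift              # align the smaller operand to the larger exponent
mantissa = mant_a - mant_b    # signs differ, so the magnitudes are subtracted

# Bit 23 is still set, so the result is already normalized.
stored = f"0 {exp_a:08b} {mantissa & 0x7FFFFF:023b}"
print(shift, stored)

# Cross-check against the 32-bit encoding produced by the machine.
(bits,) = struct.unpack(">I", struct.pack(">f", 42.6875 - 0.09375))
hardware = f"{bits >> 31} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}"
print(hardware == stored)
```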
