Floating Point Representation
Major: All Engineering Majors
Authors: Autar Kaw, Matthew Emmons
http://numericalmethods.eng.usf.edu
Transforming Numerical Methods Education for STEM
Undergraduates
4/13/2015
Floating Decimal Point : Scientific Form
256.78 is written as $+2.5678 \times 10^{2}$
0.003678 is written as $+3.678 \times 10^{-3}$
-256.78 is written as $-2.5678 \times 10^{2}$
Example
The form is
$\text{sign} \times \text{mantissa} \times 10^{\text{exponent}}$
or
$\sigma \times m \times 10^{e}$
Example: For $-2.5678 \times 10^{2}$,
$\sigma = -1$
$m = 2.5678$
$e = 2$
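The decomposition into sign, mantissa and exponent can be sketched in Python. This is a minimal illustration only; the function name `to_scientific` is hypothetical, and it relies on `math.log10`, so values that sit extremely close to a power of ten may need extra care.

```python
import math


def to_scientific(x):
    """Split x into (sign, mantissa, exponent) so that
    x = sign * mantissa * 10**exponent with 1 <= mantissa < 10."""
    if x == 0:
        return 1, 0.0, 0
    sign = -1 if x < 0 else 1
    # floor(log10(|x|)) is the power of ten of the leading digit
    exponent = math.floor(math.log10(abs(x)))
    mantissa = abs(x) / 10.0 ** exponent
    return sign, mantissa, exponent


for value in (256.78, 0.003678, -256.78):
    print(value, to_scientific(value))
```

The printed mantissas may show small rounding noise in the last digits, because the decimal inputs are themselves stored in binary.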
Floating Point Format for Binary Numbers
$y = \sigma \times m \times 2^{e}$
$\sigma$ = sign of number (0 for +ve, 1 for -ve)
$m$ = mantissa, with $(1)_2 \le m < (10)_2$
The leading 1 is not stored, as it is always given to be 1.
$e$ = integer exponent
Example
9-bit hypothetical word:
the first bit is used for the sign of the number,
the second bit for the sign of the exponent,
the next four bits for the mantissa, and
the next three bits for the exponent.
$(54.75)_{10} = (110110.11)_2 = (1.1011011)_2 \times 2^{5}$
$\approx (1.1011)_2 \times 2^{(101)_2}$
We have the representation as
0 | 0 | 1 0 1 1 | 1 0 1
Sign of the number: 0
Sign of the exponent: 0
Mantissa: 1 0 1 1
Exponent: 1 0 1
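The encoding into the hypothetical 9-bit word can be sketched as follows. The function name `encode_9bit` is an assumption made for illustration, and the sketch chops (truncates) the extra mantissa bits, which is what the worked example does when it drops the trailing 011.

```python
def encode_9bit(x):
    """Encode a nonzero x in the hypothetical 9-bit word:
    1 sign bit, 1 exponent-sign bit, 4 mantissa bits, 3 exponent bits.
    Extra mantissa bits are chopped, not rounded."""
    if x == 0:
        raise ValueError("zero has no normalized 1.m form in this sketch")
    sign_bit = "1" if x < 0 else "0"

    # Normalize |x| to m * 2**e with 1 <= m < 2
    m, e = abs(x), 0
    while m >= 2:
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1

    # Only 3 bits are available for the exponent magnitude
    if abs(e) > 7:
        raise ValueError("exponent does not fit in 3 bits")

    # Extract 4 bits after the binary point; the leading 1 is not stored
    frac = m - 1
    mantissa_bits = ""
    for _ in range(4):
        frac *= 2
        bit = int(frac)
        mantissa_bits += str(bit)
        frac -= bit

    exp_sign_bit = "1" if e < 0 else "0"
    exp_bits = format(abs(e), "03b")
    return sign_bit + exp_sign_bit + mantissa_bits + exp_bits


print(encode_9bit(54.75))  # 001011101
```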
Machine Epsilon
Machine epsilon is a measure of the accuracy of a floating point
representation. It is found as the difference between 1 and the next
larger number that can be represented.
Example
Ten-bit word:
Sign of number
Sign of exponent
Next four bits for exponent
Next four bits for mantissa
0 | 0 | 0 0 0 0 | 0 0 0 0 represents $(1)_{10}$
Next number:
0 | 0 | 0 0 0 0 | 0 0 0 1 represents $(1.0001)_2 = (1.0625)_{10}$
$\epsilon_{mach} = 1.0625 - 1 = 2^{-4}$
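A short Python sketch of the same idea, under the assumption that the leading 1 is implicit and only the mantissa bits are stored. The helper names are hypothetical. The second function estimates the epsilon of Python's own `float` (a 64-bit double on typical platforms) by repeated halving.

```python
import sys


def machine_epsilon(mantissa_bits):
    """Gap between 1 and the next representable number when
    mantissa_bits bits are stored after the binary point."""
    return 2.0 ** -mantissa_bits


def measured_epsilon():
    """Halve eps until adding half of it to 1.0 no longer changes 1.0."""
    eps = 1.0
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps


print(machine_epsilon(4))        # 0.0625, the ten-bit word above
print(1 + machine_epsilon(4))    # 1.0625, the next number after 1
print(measured_epsilon() == sys.float_info.epsilon)
```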
Relative Error and Machine Epsilon
The absolute relative true error in representing
a number will be less than the machine epsilon.
Example
$(0.02832)_{10} \cong (1.1100)_2 \times 2^{-6} = (1.1100)_2 \times 2^{-(0110)_2}$
10-bit word (sign, sign of exponent, 4 for exponent, 4 for mantissa)
0 | 1 | 0 1 1 0 | 1 1 0 0
Sign of the number: 0
Sign of the exponent: 1
Exponent: 0 1 1 0
Mantissa: 1 1 0 0
$(1.1100)_2 \times 2^{-(0110)_2} = 0.02734375$
$\epsilon_a = \left| \dfrac{0.02832 - 0.02734375}{0.02832} \right| = 0.034472 < 2^{-4} = 0.0625$
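The comparison between the relative error and machine epsilon can be checked numerically. The sketch below is illustrative: `chop` is a hypothetical helper that normalizes a number to 1.m form and chops the mantissa to a given number of bits, ignoring any limit on the exponent field.

```python
import math


def chop(x, mantissa_bits):
    """Return (stored_value, exponent) after normalizing x to
    (1.m)_2 * 2**exponent and chopping m to mantissa_bits bits."""
    m, e = math.frexp(abs(x))   # |x| = m * 2**e with 0.5 <= m < 1
    m, e = m * 2, e - 1         # shift to 1 <= m < 2
    scale = 2 ** mantissa_bits
    m_chopped = math.floor(m * scale) / scale
    return math.copysign(m_chopped * 2.0 ** e, x), e


x = 0.02832
stored, exponent = chop(x, 4)
rel_error = abs((x - stored) / x)
eps_mach = 2.0 ** -4

print(stored, exponent)
print(rel_error, rel_error < eps_mach)
```

The relative error comes out near 0.0345, below the machine epsilon of 0.0625 for a four-bit mantissa.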
IEEE 754 Standards for Single Precision Representation
IEEE-754 Floating Point Standard
• Standardizes representation of floating point numbers on
different computers in single and double precision.
• Standardizes representation of floating point operations on
different computers.
One Great Reference
What every computer scientist (and even if
you are not) should know about floating point
arithmetic!
http://www.validlab.com/goldberg/paper.pdf
IEEE-754 Format Single Precision
32 bits for single precision
0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Sign (s): 1 bit
Biased Exponent (e'): 8 bits
Mantissa (m): 23 bits
$\text{Value} = (-1)^{s} \times (1.m)_2 \times 2^{e' - 127}$
Example#1
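1 | 1 0 1 0 0 0 1 0 | 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Sign (s): 1
Biased Exponent (e'): 1 0 1 0 0 0 1 0
Mantissa (m): 1 0 1 followed by twenty 0s
$\text{Value} = (-1)^{s} \times (1.m)_2 \times 2^{e' - 127}$
$= (-1)^{1} \times (1.10100000)_2 \times 2^{(10100010)_2 - 127}$
$= (-1) \times (1.625) \times 2^{162 - 127}$
$= (-1) \times (1.625) \times 2^{35} = -5.5834 \times 10^{10}$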
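The decoding in Example#1 can be reproduced in Python. The sketch below applies the value formula by hand and then cross-checks it with the standard `struct` module; the function names are hypothetical, and the hand formula covers normal numbers only (biased exponent between 1 and 254).

```python
import struct


def decode_single(bits):
    """Decode a 32-character bit string with the single precision formula
    (-1)**s * (1.m)_2 * 2**(e' - 127). Normal numbers only."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    s = int(bits[0], 2)
    e_biased = int(bits[1:9], 2)
    m = int(bits[9:], 2) / 2 ** 23   # fraction part of 1.m
    value = (-1) ** s * (1 + m) * 2.0 ** (e_biased - 127)
    return s, e_biased, m, value


def decode_with_struct(bits):
    """Let the platform interpret the same 32 bits as a big-endian float."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


example = "1" + "10100010" + "101" + "0" * 20
print(decode_single(example))       # (1, 162, 0.625, -55834574848.0)
print(decode_with_struct(example))  # -55834574848.0
```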
Example#2
Represent $-5.5834 \times 10^{10}$ as a single
precision floating point number.
? | ? ? ? ? ? ? ? ? | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Sign (s)
Biased Exponent (e')
Mantissa (m)
$-5.5834 \times 10^{10} = (-1)^{1} \times (1.?)_2 \times 2^{?}$
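One way to check a hand conversion is to let Python pack the number as a single precision float and read back the three fields. This is a sketch using the standard `struct` module; `encode_single` is a hypothetical helper name. Because $-5.5834 \times 10^{10}$ is a rounded decimal, its mantissa field need not match that of $-1.625 \times 2^{35}$ exactly, although the sign and biased exponent do.

```python
import struct


def encode_single(x):
    """Return (sign, biased exponent, mantissa) bit strings of x
    stored as an IEEE-754 single precision number."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(n, "032b")
    return bits[0], bits[1:9], bits[9:]


sign, exponent, mantissa = encode_single(-5.5834e10)
print(sign, exponent, mantissa)
print(int(exponent, 2) - 127)  # unbiased exponent
```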
Exponent for 32 Bit IEEE-754
8 bits would represent
$0 \le e' \le 255$
Bias is 127; so subtract 127 from
representation
$-127 \le e \le 128$
Exponent for Special Cases
Actual range of $e'$
$1 \le e' \le 254$
$e' = 0$ and $e' = 255$ are reserved for special numbers
Actual range of $e$
$-126 \le e \le 127$
Special Exponents and Numbers
$e' = 0$: all zeros
$e' = 255$: all ones

s       e'         m          Represents
0       all zeros  all zeros  0
1       all zeros  all zeros  -0
0       all ones   all zeros  $+\infty$
1       all ones   all zeros  $-\infty$
0 or 1  all ones   non-zero   NaN
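The special bit patterns can be built directly from their three fields and interpreted with the standard `struct` module. The helper name `from_fields` is hypothetical; the NaN example sets the top mantissa bit, which is one of many non-zero mantissas that represent NaN.

```python
import math
import struct


def from_fields(s, e_biased, m):
    """Assemble sign (1 bit), biased exponent (8 bits) and mantissa
    (23 bits) into a 32-bit word and interpret it as a single."""
    n = (s << 31) | (e_biased << 23) | m
    return struct.unpack(">f", struct.pack(">I", n))[0]


pos_zero = from_fields(0, 0, 0)
neg_zero = from_fields(1, 0, 0)
pos_inf = from_fields(0, 255, 0)
neg_inf = from_fields(1, 255, 0)
nan = from_fields(0, 255, 1 << 22)

print(pos_zero, neg_zero, pos_inf, neg_inf, nan)
```

Note that the two zeros compare equal in Python; `math.copysign` is one way to tell them apart.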
IEEE-754 Format
The largest number by magnitude
$(1.1\ldots1)_2 \times 2^{127} = 3.40 \times 10^{38}$
The smallest number by magnitude
$(1.00\ldots0)_2 \times 2^{-126} = 1.18 \times 10^{-38}$
Machine epsilon
$\epsilon_{mach} = 2^{-23} = 1.19 \times 10^{-7}$
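These limits can be recomputed from their bit patterns rather than quoted. The sketch below uses the standard `struct` module; `bits_to_single` is a hypothetical helper, and "smallest" is taken to mean the smallest normalized number, with biased exponent 1.

```python
import struct


def bits_to_single(n):
    """Interpret a 32-bit unsigned integer as an IEEE-754 single."""
    return struct.unpack(">f", struct.pack(">I", n))[0]


# e' = 254 (largest non-reserved exponent), mantissa all ones
largest = bits_to_single(0x7F7FFFFF)
# e' = 1 (smallest non-reserved exponent), mantissa all zeros
smallest_normal = bits_to_single(0x00800000)
# Next single after 1.0 (e' = 127, mantissa = 0...01) minus 1.0
eps_mach = bits_to_single(0x3F800001) - 1.0

print(f"{largest:.4e}")
print(f"{smallest_normal:.4e}")
print(f"{eps_mach:.4e}")
```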
Additional Resources
For all resources on this topic such as digital audiovisual
lectures, primers, textbook chapters, multiple-choice tests,
worksheets in MATLAB, MATHEMATICA, MathCad and
MAPLE, blogs, related physical problems, please visit
http://numericalmethods.eng.usf.edu/topics/floatingpoint_representation.html
THE END