CSE 431. Computer Architecture


CSCI-365
Computer Organization
Lecture
Note: Some slides and/or pictures in the following are adapted from:
Computer Organization and Design, Patterson & Hennessy, ©2005
slides ©2008 UCB
Number Representations

32-bit signed numbers (2’s complement):
0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten (maxint)
1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten (minint)
1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten
...
1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten
1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten
The leftmost bit is the MSB and the rightmost bit is the LSB

Converting <32-bit values into 32-bit values
• copy the most significant bit (the sign bit) into the “empty” bits
  0010 -> 0000 0010
  1010 -> 1111 1010
• sign extend versus zero extend (lb vs. lbu)
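The two extension rules above can be sketched in C. This is only an illustration: the function names `sign_extend` and `zero_extend` and the `bits` parameter are hypothetical, not MIPS instructions, and the final cast assumes a two's complement compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to 32 bits
   (what lb does for bits = 8). Hypothetical helper. */
int32_t sign_extend(uint32_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    if (bits == 32)
        return (int32_t)value;
    uint32_t low_mask = (1u << bits) - 1u;   /* selects the original field */
    uint32_t sign_bit = 1u << (bits - 1);    /* MSB of the original field */
    value &= low_mask;
    if (value & sign_bit)
        value |= ~low_mask;                  /* fill the "empty" bits with 1s */
    /* Converting values above INT32_MAX is implementation-defined in C;
       two's complement compilers give the expected negative value. */
    return (int32_t)value;
}

/* Zero-extend the low `bits` bits of `value` to 32 bits
   (what lbu does for bits = 8). Hypothetical helper. */
uint32_t zero_extend(uint32_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    if (bits == 32)
        return value;
    return value & ((1u << bits) - 1u);      /* "empty" bits become 0 */
}
```

With the 4-bit pattern 1010, `sign_extend(0xA, 4)` yields -6 while `zero_extend(0xA, 4)` yields 10, which is the lb versus lbu distinction.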
MIPS Arithmetic Logic Unit (ALU)

Must support the Arithmetic/Logic operations of the ISA
add, addi, addiu, addu
sub, subu
mult, multu, div, divu
sqrt
and, andi, nor, or, ori, xor, xori
beq, bne, slt, slti, sltiu, sltu

[Figure: ALU with 32-bit inputs A and B, 4-bit operation select m, 32-bit result, and 1-bit zero and ovf outputs]

With special handling for
• sign extend – addi, addiu, slti, sltiu
• zero extend – andi, ori, xori
• overflow detection – add, addi, sub
Dealing with Overflow

Overflow occurs when the result of an operation cannot
be represented in 32-bits, i.e., when the sign bit contains
a value bit of the result and not the proper sign bit


When adding operands with different signs or when subtracting
operands with the same sign, overflow can never occur
Operation   Operand A   Operand B   Result indicating overflow
A+B         ≥0          ≥0          <0
A+B         <0          <0          ≥0
A-B         ≥0          <0          <0
A-B         <0          ≥0          ≥0
MIPS signals overflow with an exception (aka interrupt) –
an unscheduled procedure call where the EPC contains
the address of the instruction that caused the exception
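The sign rules for overflow can be checked in software as in the sketch below. The function names are hypothetical, and the arithmetic is done on unsigned values because signed overflow is undefined in C; real MIPS hardware raises an exception instead of returning a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a + b cannot be represented in 32-bit two's complement. */
bool add_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                  /* unsigned wraparound is defined */
    /* Operands have the same sign and the result's sign differs. */
    return (~(ua ^ ub) & (ua ^ sum) & 0x80000000u) != 0;
}

/* True if a - b cannot be represented in 32-bit two's complement. */
bool sub_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;
    /* Operands have different signs and the result's sign differs from a's. */
    return ((ua ^ ub) & (ua ^ diff) & 0x80000000u) != 0;
}

/* Self-check using extreme operands. */
void check_overflow_rules(void) {
    assert(add_overflows(INT32_MAX, 1));     /* >=0 plus >=0 gave <0 */
    assert(!add_overflows(INT32_MAX, -1));   /* different signs: never overflows */
    assert(sub_overflows(INT32_MIN, 1));     /* <0 minus >=0 gave >=0 */
}
```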
Two’s Complement Arithmetic

Addition is accomplished by adding the codes, ignoring
any final carry

Subtraction: change the sign and add

16 + (-23) =?

16 - (-23) =?

-23 - (-16) =?
Hardware for Addition and Subtraction
Multiply

Binary multiplication is just a bunch of right shifts and
adds
[Figure: an n-bit multiplicand and an n-bit multiplier form a partial product array that is summed into a 2n-bit double precision product]
The partial products can be formed in parallel and added in parallel for faster multiplication
Multiplication Example
      1011   Multiplicand (11 dec)
    x 1101   Multiplier (13 dec)
      1011   Partial products
     0000
    1011
   1011
  10001111   Product (143 dec)
Note: if multiplier bit is 1 copy multiplicand (place value), otherwise zero
Note: need double length result
Add and Right Shift Multiplier Hardware
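[Figure: multiplicand register, 32-bit ALU (add), product register (shift right) whose low half initially holds the multiplier, and Control]
Example: multiplicand 0110 (= 6), multiplier 0101 (= 5)
                      product  multiplier
initial               0000     0101   (= 5)
add                   0110     0101
shift right           0011     0010
no add (bit is 0)     0011     0010
shift right           0001     1001
add                   0111     1001
shift right           0011     1100
no add (bit is 0)     0011     1100
shift right           0001     1110   (= 30)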
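A minimal C sketch of the add and right shift scheme, assuming n-bit unsigned operands with n at most 31. The function name and the `n` parameter are illustrative; the loop models one add/shift step per multiplier bit rather than any particular hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two n-bit unsigned values by repeated add and shift right.
   The 2n-bit product register starts with the multiplier in its low half. */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier,
                            unsigned n) {
    assert(n >= 1 && n <= 31);
    assert(multiplicand < (1u << n) && multiplier < (1u << n));
    uint64_t product = multiplier;
    for (unsigned step = 0; step < n; step++) {
        if (product & 1u)                            /* current multiplier bit */
            product += (uint64_t)multiplicand << n;  /* add into the upper half */
        product >>= 1;                               /* shift product right */
    }
    return product;
}
```

Calling `shift_add_multiply(6, 5, 4)` passes through the same product register values as a 0110 x 0101 trace and returns 30.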
Unsigned Binary Multiplication
Execution of Example
Multiplying Negative Numbers

This does not work!

Solution 1
• Convert to positive if required
• Multiply as above
• If signs were different, negate answer
Solution 2
• Booth’s algorithm
Booth’s Algorithm
Example of Booth’s Algorithm
MIPS Multiply Instruction

Multiply (mult and multu) produces a double precision product
mult $s0, $s1    # hi||lo = $s0 * $s1
Encoding fields: 0 | 16 | 17 | 0 | 0 | 0x18
• Low-order word of the product is left in processor register lo and the high-order word is left in register hi
• Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
Multiplies are usually done by fast, dedicated hardware and are much more complex (and slower) than adders
MIPS Multiplication
Two 32-bit registers for product
• HI: most-significant 32 bits
• LO: least-significant 32 bits
Instructions
• mult rs, rt / multu rs, rt
  - 64-bit product in HI/LO
• mfhi rd / mflo rd
  - Move from HI/LO to rd
  - Can test HI value to see if product overflows 32 bits
• mul rd, rs, rt
  - Least-significant 32 bits of product –> rd
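The HI test can be modeled in C as follows. The `hilo` struct and the `model_*` and `fits_*` names are assumptions made for illustration; the sketch only mimics how the 64-bit product is split across HI and LO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } hilo;   /* hypothetical model of HI/LO */

/* Model of multu: unsigned 64-bit product split across HI and LO. */
hilo model_multu(uint32_t rs, uint32_t rt) {
    uint64_t product = (uint64_t)rs * rt;
    hilo r = { (uint32_t)(product >> 32), (uint32_t)product };
    return r;
}

/* Model of mult: signed 64-bit product split across HI and LO. */
hilo model_mult(int32_t rs, int32_t rt) {
    uint64_t product = (uint64_t)((int64_t)rs * rt);
    hilo r = { (uint32_t)(product >> 32), (uint32_t)product };
    return r;
}

/* Unsigned product fits in 32 bits when HI is zero. */
bool fits_unsigned32(hilo p) { return p.hi == 0; }

/* Signed product fits when HI is just the sign extension of LO. */
bool fits_signed32(hilo p) {
    uint32_t sign_fill = (p.lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return p.hi == sign_fill;
}

void check_hilo(void) {
    assert(fits_signed32(model_mult(-2, 3)));
    assert(!fits_unsigned32(model_multu(0x10000u, 0x10000u)));
}
```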
Division

Division is just a bunch of quotient digit guesses and left
shifts and subtracts
dividend = quotient x divisor + remainder
[Figure: n-bit quotient above the dividend and divisor, a partial remainder array, and an n-bit remainder]
Division of Unsigned Binary Integers
                00001101   Quotient
Divisor  1011 ) 10010011   Dividend
                 1011
                001110     Partial remainders
                  1011
                  001111
                    1011
                     100   Remainder
Left Shift and Subtract Division Hardware
[Figure: divisor register, 32-bit ALU (subtract), remainder/quotient register (shift left) that initially holds the dividend, and Control]
Example: dividend 0110 (= 6), divisor 0010 (= 2)
                     remainder  quotient
initial              0000       0110   (= 6)
shift left           0000       1100
sub                  1110       1100   rem neg, so ‘ient bit = 0
restore remainder    0000       1100
shift left           0001       1000
sub                  1111       1000   rem neg, so ‘ient bit = 0
restore remainder    0001       1000
shift left           0011       0000
sub                  0001       0001   rem pos, so ‘ient bit = 1
shift left           0010       0010
sub                  0000       0011   rem pos, so ‘ient bit = 1
= 3 with 0 remainder
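The left shift and subtract loop can be sketched in C as below, assuming n-bit unsigned operands with n at most 31. The names are hypothetical, and the "restore" step is modeled by comparing before subtracting instead of subtracting and adding back.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t quotient, remainder; } divmod_t;

/* Divide two n-bit unsigned values with shift left and subtract.
   The remainder lives in the upper half of rq, the quotient in the lower. */
divmod_t restoring_divide(uint32_t dividend, uint32_t divisor, unsigned n) {
    assert(n >= 1 && n <= 31);
    assert(divisor != 0);            /* the caller must rule out division by 0 */
    assert(dividend < (1u << n) && divisor < (1u << n));
    uint64_t rq = dividend;
    for (unsigned step = 0; step < n; step++) {
        rq <<= 1;                                  /* shift left */
        if ((rq >> n) >= divisor) {                /* subtraction stays non-negative */
            rq -= (uint64_t)divisor << n;          /* subtract from the upper half */
            rq |= 1u;                              /* quotient bit = 1 */
        }
        /* else: quotient bit stays 0 and the remainder is left "restored" */
    }
    divmod_t r = { (uint32_t)(rq & ((1u << n) - 1u)), (uint32_t)(rq >> n) };
    return r;
}
```

For 6 / 2 with n = 4 the loop ends with quotient 3 and remainder 0.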
Division of Signed Binary Integers
MIPS Divide Instruction

Divide (div and divu) generates the remainder in hi and the quotient in lo
div $s0, $s1    # lo = $s0 / $s1
                # hi = $s0 mod $s1
Encoding fields: 0 | 16 | 17 | 0 | 0 | 0x1A
• Instructions mfhi rd and mflo rd are provided to move the quotient and remainder to (user accessible) registers in the register file
As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.
MIPS Division

Use HI/LO registers for result
• HI: 32-bit remainder
• LO: 32-bit quotient
Instructions
• div rs, rt / divu rs, rt
• No overflow or divide-by-0 checking
  - Software must perform checks if required
• Use mfhi, mflo to access result
Representation of Fractions
“Binary Point” like decimal point signifies boundary between integer and fractional parts:
Example 6-bit representation: xx.yyyy
Bit weights: 2^1  2^0 . 2^{-1}  2^{-2}  2^{-3}  2^{-4}
10.1010two = 1x2^1 + 1x2^{-1} + 1x2^{-3} = 2.625ten
If we assume “fixed binary point”, range of 6-bit representations with this format:
0 to 3.9375 (almost 4)
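As a small illustration of the fixed binary point, the 6-bit pattern can be read as an integer and scaled by 2^-4. The function name below is hypothetical.

```c
#include <assert.h>

/* Interpret a 6-bit pattern xx.yyyy as an unsigned fixed-point value.
   The 4 fractional bits mean the integer pattern is divided by 2^4. */
double fixed6_to_double(unsigned bits) {
    assert(bits < 64u);      /* only 6 bits are meaningful */
    return bits / 16.0;
}
```

The pattern 101010 (42 as an integer) gives 2.625, and the largest pattern 111111 gives 3.9375.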
Fractional Powers of 2
i    2^{-i}               fraction
0    1.0                  1
1    0.5                  1/2
2    0.25                 1/4
3    0.125                1/8
4    0.0625               1/16
5    0.03125              1/32
6    0.015625
7    0.0078125
8    0.00390625
9    0.001953125
10   0.0009765625
11   0.00048828125
12   0.000244140625
13   0.0001220703125
14   0.00006103515625
15   0.000030517578125
Example:
0.828125 and 0.1640625
(done in class)
Representation of Fractions
So far, in our examples we used a “fixed” binary point. What we
really want is to “float” the binary point. Why?
A floating binary point makes the most effective use of our limited bits (and thus gives more accuracy in our number representation):
example: put 0.1640625 into binary. Represent it in 5 bits, choosing where to put the binary point.
… 000000.001010100000…
Store these bits (10101) and keep track of the binary point 2 places to the left of the MSB
Any other solution would lose accuracy!
With floating point rep., each numeral carries an exponent field recording the whereabouts of its binary point.
The binary point can be outside the stored bits, so very large and small numbers can be represented.
Scientific Notation (in Decimal)
6.02ten x 10^{23}
(significand 6.02, decimal point, radix (base) 10, exponent 23)
• Normalized form: no leading 0s (exactly one digit to left of decimal point)
• Alternatives to representing 1/1,000,000,000
  - Normalized: 1.0 x 10^{-9}
  - Not normalized: 0.1 x 10^{-8}, 10.0 x 10^{-10}
Scientific Notation (in Binary)
1.0two x 2^{-1}
(significand 1.0, “binary point”, radix (base) 2, exponent -1)
• Computer arithmetic that supports it is called floating point, because it represents numbers where the binary point is not fixed, as it is for integers
  - Declare such a variable in C as float
Floating Point Representation

Normal format: +1.xxxxxxxxxxtwo * 2^{yyyytwo}

32-bit version (C “float”)
bits:   31    | 30 … 23  | 22 … 0
field:  S     | Exponent | Significand
size:   1 bit | 8 bits   | 23 bits
S represents Sign
Exponent represents y’s
Significand represents x’s
Floating Point Representation

What if result too large?
• Overflow! Exponent larger than represented in 8-bit Exponent field
What if result too small?
• Underflow! Negative exponent larger than represented in 8-bit Exponent field
[Figure: number line marking overflow beyond the largest magnitudes on both sides and underflow in the gap around 0, between -1 and 1]
What would help reduce chances of overflow and/or underflow?
Double Precision Fl. Pt. Representation

64 bit version (C “double”)
bits:   31    | 30 … 20  | 19 … 0
field:  S     | Exponent | Significand
size:   1 bit | 11 bits  | 20 bits
second word: Significand (cont’d), 32 bits
• Double Precision (vs. Single Precision)
– C variable declared as double
– Primary advantage is greater accuracy due to larger significand
QUAD Precision Fl. Pt. Representation

Next Multiple of Word Size (128 bits)
• Unbelievable range of numbers
• Unbelievable precision (accuracy)
Currently being worked on (IEEE 754r)
• Current version has 15 exponent bits and 112 significand bits (113 precision bits)
Oct-Precision?
• Some have tried, no real traction so far
Half-Precision?
• Yep, that’s for a short (16 bit)
en.wikipedia.org/wiki/Quad_precision
en.wikipedia.org/wiki/Half_precision
IEEE 754 Floating Point Standard
Single Precision (DP similar):
bits:   31    | 30 … 23  | 22 … 0
field:  S     | Exponent | Significand
size:   1 bit | 8 bits   | 23 bits

Sign bit: 1 means negative, 0 means positive

Significand:
• To pack more bits, leading 1 implicit for normalized numbers
• 1 + 23 bits single, 1 + 52 bits double
• always true: 0 ≤ Significand < 1 (for normalized numbers)
Note: 0 has no leading 1, so reserve exponent value 0 just for number 0
IEEE 754 Floating Point Standard
IEEE 754 uses “biased exponent” representation
• Designers wanted FP numbers to be used even if no FP hardware; e.g., sort records with FP numbers using integer compares
• Wanted bigger (integer) exponent field to represent bigger numbers
• 2’s complement poses a problem (because negative numbers look bigger)
1.0 x 2^{-1} and 1.0 x 2^{1} (done in class)
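The integer-compare idea can be tried out in C. The sketch assumes that `float` is a 32-bit IEEE 754 single on the target machine and covers only operands with a clear sign bit (NaNs are not handled); both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw bit pattern of a float into an integer. */
uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);   /* assumption: 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Compare two non-negative floats using only an integer compare.
   This works because the biased exponent sits above the significand
   and grows with the true exponent. */
bool less_than_by_bits(float a, float b) {
    assert((float_bits(a) >> 31) == 0 && (float_bits(b) >> 31) == 0);
    return float_bits(a) < float_bits(b);
}
```

For 1.0 x 2^-1 and 1.0 x 2^1 the stored exponent fields are 126 and 128, so the integer compare orders 0.5 before 2.0. With a two's complement exponent, -1 would be stored as 1111 1111 and look larger than +1.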
IEEE 754 Floating Point Standard
• Called Biased Notation, where bias is number subtracted to get real number
– IEEE 754 uses bias of 127 for single precision
– Subtract 127 from Exponent field to get actual value for exponent
– 1023 is bias for double precision

• Summary (single precision):
bits:   31    | 30 … 23  | 22 … 0
field:  S     | Exponent | Significand
size:   1 bit | 8 bits   | 23 bits
(-1)^S x (1 + Significand) x 2^{(Exponent-127)}
– Double precision identical, except with exponent bias of 1023 (half, quad similar)
Single-Precision Range
Exponents 00000000 and 11111111 reserved
Smallest value
• Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
• Fraction: 000…00 ⇒ significand = 1.0
• ±1.0 × 2^{–126} ≈ ±1.2 × 10^{–38}
Largest value
• Exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
• Fraction: 111…11 ⇒ significand ≈ 2.0
• ±2.0 × 2^{+127} ≈ ±3.4 × 10^{+38}
Double-Precision Range
Exponents 0000…00 and 1111…11 reserved
Smallest value
• Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
• Fraction: 000…00 ⇒ significand = 1.0
• ±1.0 × 2^{–1022} ≈ ±2.2 × 10^{–308}
Largest value
• Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
• Fraction: 111…11 ⇒ significand ≈ 2.0
• ±2.0 × 2^{+1023} ≈ ±1.8 × 10^{+308}
Floating-Point Precision

Relative precision
• all fraction bits are significant
• Single: approx 2^{–23}
  - Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of precision
• Double: approx 2^{–52}
  - Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision
Floating-Point Example

Represent –0.75
• –0.75 = (–1)^1 × 1.1two × 2^{–1}
• S = 1
• Fraction = 1000…00two
• Exponent = –1 + Bias
  - Single: –1 + 127 = 126 = 01111110two
  - Double: –1 + 1023 = 1022 = 01111111110two

Single: 1011111101000…00

Double: 1011111111101000…00
Floating-Point Example

What number is represented by the single-precision float
11000000101000…00

• S = 1
• Fraction = 01000…00two
• Exponent = 10000001two = 129
x = (–1)^1 × (1 + .01two) × 2^{(129 – 127)}
= (–1) × 1.25 × 2^2
= –5.0
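The decoding steps of this example can be written as a short C sketch. The struct and function names are hypothetical, and only normalized numbers (Exponent 1 to 254) are handled; the power of two is built with a loop to keep the example free of library calls.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, biased */
    uint32_t fraction;   /* 23 bits */
} fp_fields;

/* Split a single-precision bit pattern into its three fields. */
fp_fields unpack_float(uint32_t bits) {
    fp_fields f = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return f;
}

/* Value of a normalized single:
   (-1)^S x (1 + Significand) x 2^(Exponent-127). */
double normalized_value(uint32_t bits) {
    fp_fields f = unpack_float(bits);
    assert(f.exponent >= 1 && f.exponent <= 254);        /* normalized only */
    double value = 1.0 + f.fraction / 8388608.0;         /* 8388608 = 2^23 */
    int e = (int)f.exponent - 127;                       /* remove the bias */
    for (int i = 0; i < e; i++) value *= 2.0;
    for (int i = 0; i > e; i--) value /= 2.0;
    return f.sign ? -value : value;
}
```

The pattern 1 10000001 01000…00 is 0xC0A00000 in hexadecimal, and `normalized_value(0xC0A00000u)` evaluates to -5.0.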
Example: Converting Binary FP to Decimal
1 10001001 10010010011000000000000
(done in class)
Example: Converting Decimal to FP
63.25
(done in class)
Representation for 0

Represent 0?
• exponent all zeroes
• significand all zeroes
• What about sign? Both cases valid
  +0: 0 00000000 00000000000000000000000
  -0: 1 00000000 00000000000000000000000
Special Numbers

What have we defined so far? (Single Precision)
Exponent   Significand   Object
0          0             0
0          nonzero       ???
1-254      anything      +/- fl. pt. #
255        0             +/- ∞
255        nonzero       ???
Representation for Not a Number

What do I get if I calculate sqrt(-4.0) or 0/0?
• If ∞ not an error, these shouldn’t be either
• Called Not a Number (NaN)
• Exponent = 255, Significand nonzero
Why is this useful?
• Hope NaNs help with debugging?
• They contaminate: op(NaN, X) = NaN
Infinities and NaNs

Exponent = 111...1, Fraction = 000...0
• ±Infinity
• Can be used in subsequent calculations, avoiding need for overflow check
Exponent = 111...1, Fraction ≠ 000...0
• Not-a-Number (NaN)
• Indicates illegal or undefined result
  - e.g., 0.0 / 0.0
• Can be used in subsequent calculations
Representation for Denorms

Problem: There’s a gap among representable FP numbers around 0
• Normalization and implicit 1 is to blame!
(done in class)
[Figure: number line from - to + showing gaps between 0 and the smallest representable positive numbers a and b]
Representation for Denorms

Solution:
• We still haven’t used Exponent = 0, Significand nonzero
• Denormalized number: no (implied) leading 1, implicit exponent = -126
• Smallest representable pos num: a = 2^{-149}
• Second smallest representable pos num: b = 2^{-148}
[Figure: number line from - to + with the gap around 0 filled in]
Special Numbers

What have we defined so far? (Single Precision)
Exponent   Significand   Object
0          0             0
0          nonzero       Denorm
1-254      anything      +/- fl. pt. #
255        0             +/- ∞
255        nonzero       NaN
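The table translates directly into a classifier over the Exponent and Significand fields. The enum and function names below are hypothetical, and `float_bits` assumes that `float` is a 32-bit IEEE 754 single.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    KIND_ZERO, KIND_DENORM, KIND_NORMAL, KIND_INFINITY, KIND_NAN
} fp_kind;

/* Classify a single-precision bit pattern by its field values. */
fp_kind classify_single(uint32_t bits) {
    unsigned exponent = (bits >> 23) & 0xFFu;
    uint32_t significand = bits & 0x7FFFFFu;
    if (exponent == 0)
        return significand == 0 ? KIND_ZERO : KIND_DENORM;
    if (exponent == 255)
        return significand == 0 ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMAL;                 /* exponent 1-254, any significand */
}

/* Raw bit pattern of a float. */
uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);    /* assumption: 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The sign bit is ignored, so both +0 and -0 classify as zero and both infinities as infinity.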
Floating-Point Addition

Consider a 4-digit decimal example
• 9.999 × 10^1 + 1.610 × 10^{–1}
1. Align decimal points
• Shift number with smaller exponent
• 9.999 × 10^1 + 0.016 × 10^1
2. Add significands
• 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
3. Normalize result & check for over/underflow
• 1.0015 × 10^2
4. Round and renormalize if necessary
• 1.002 × 10^2
Floating Point Addition

Addition (and subtraction)
(F1 × 2^{E1}) + (F2 × 2^{E2}) = F3 × 2^{E3}
• Step 0: Restore the hidden bit in F1 and in F2
• Step 1: Align fractions by right shifting F2 by E1 - E2 positions (assuming E1 ≥ E2), keeping track of (three of) the bits shifted out in G, R, and S
• Step 2: Add the resulting F2 to F1 to form F3
• Step 3: Normalize F3 (so it is in the form 1.XXXXX …)
  - If F1 and F2 have the same sign ⇒ F3 ∈ [1,4) ⇒ at most a 1 bit right shift of F3 and an increment of E3 (check for overflow)
  - If F1 and F2 have different signs ⇒ F3 may require many left shifts, each time decrementing E3 (check for underflow)
• Step 4: Round F3 and possibly normalize F3 again
• Step 5: Rehide the most significant bit of F3 before storing the result
Floating Point Addition Example

Add
(0.5 = 1.0000 × 2^{-1}) + (-0.4375 = -1.1100 × 2^{-2})
• Step 0: Hidden bits restored in the representation above
• Step 1: Shift significand with the smaller exponent (1.1100) right until its exponent matches the larger exponent (so once)
• Step 2: Add significands
  1.0000 + (-0.111) = 1.0000 – 0.111 = 0.001
• Step 3: Normalize the sum, checking for exponent over/underflow
  0.001 x 2^{-1} = 0.010 x 2^{-2} = .. = 1.000 x 2^{-4}
• Step 4: The sum is already rounded, so we’re done
• Step 5: Rehide the hidden bit before storing
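The align, add, and normalize steps of this example can be traced with a toy model in C. Everything here is an assumption made for illustration: the `toy_fp` type keeps a signed significand scaled by 2^4 (hidden bit included) with an unbiased exponent, bits shifted out during alignment are simply dropped (no guard, round, or sticky bits), and exponent over/underflow is not checked.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 4   /* fraction bits after the binary point, as in 1.0000 */

typedef struct {
    int64_t sig;   /* signed significand with hidden bit, scaled by 2^FRAC_BITS */
    int exp;       /* unbiased exponent */
} toy_fp;

toy_fp toy_add(toy_fp a, toy_fp b) {
    /* Step 1: align by right shifting the operand with the smaller exponent. */
    if (a.exp < b.exp) { toy_fp t = a; a = b; b = t; }
    int shift = a.exp - b.exp;
    assert(shift < 63);
    int64_t b_mag = b.sig < 0 ? -b.sig : b.sig;
    b_mag >>= shift;                                   /* shifted-out bits dropped */
    /* Step 2: add the aligned significands. */
    toy_fp sum = { a.sig + (b.sig < 0 ? -b_mag : b_mag), a.exp };
    if (sum.sig == 0) { sum.exp = 0; return sum; }
    /* Step 3: normalize so the magnitude is in [1, 2). */
    int64_t mag = sum.sig < 0 ? -sum.sig : sum.sig;
    int64_t one = (int64_t)1 << FRAC_BITS;
    while (mag >= 2 * one) { mag >>= 1; sum.exp++; }   /* right shift, bump exponent */
    while (mag < one)      { mag <<= 1; sum.exp--; }   /* left shifts, lower exponent */
    sum.sig = sum.sig < 0 ? -mag : mag;
    return sum;
}
```

With 1.0000 × 2^-1 encoded as {16, -1} and -1.1100 × 2^-2 as {-28, -2}, the result is {16, -4}, that is 1.0000 × 2^-4.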
Hardware for Floating-Point Addition
[Figure 12.5: Simplified schematic of a floating-point adder. Input 1 and Input 2 are unpacked into signs, exponents, and significands; the significands pass through possible swap & complement, align significands, add, and normalize & round stages under control & sign logic (Add/Sub select, mux, exponent subtract); the resulting sign, exponent, and significand are packed into the output.]
Floating Point Multiplication

Multiplication
(F1 × 2^{E1}) x (F2 × 2^{E2}) = F3 × 2^{E3}
• Step 0: Restore the hidden bit in F1 and in F2
• Step 1: Add the two (biased) exponents and subtract the bias from the sum, so E1 + E2 – 127 = E3; also determine the sign of the product (which depends on the sign of the operands (most significant bits))
• Step 2: Multiply F1 by F2 to form a double precision F3
• Step 3: Normalize F3 (so it is in the form 1.XXXXX …)
  - Since F1 and F2 come in normalized ⇒ F3 ∈ [1,4) ⇒ at most a 1 bit right shift of F3 and an increment of E3
  - Check for overflow/underflow
• Step 4: Round F3 and possibly normalize F3 again
• Step 5: Rehide the most significant bit of F3 before storing the result
Floating Point Multiplication Example

Multiply
(0.5 = 1.0000 × 2^{-1}) x (-0.4375 = -1.1100 × 2^{-2})
• Step 0: Hidden bits restored in the representation above
• Step 1: Add the exponents (not in bias would be -1 + (-2) = -3, and in bias would be (-1+127) + (-2+127) – 127 = (-1-2) + (127+127-127) = -3 + 127 = 124)
• Step 2: Multiply the significands
  1.0000 x 1.110 = 1.110000
• Step 3: Normalize the product, checking for exp over/underflow
  1.110000 x 2^{-3} is already normalized
• Step 4: The product is already rounded, so we’re done
• Step 5: Rehide the hidden bit before storing
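A toy C model of the multiplication steps, under stated assumptions: the `toy_fp` type, `FRAC_BITS`, and `toy_mul` are hypothetical names, significands carry 4 fraction bits with the hidden bit restored, the extra product bits are truncated rather than rounded, and exponent overflow/underflow is not checked.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 4    /* fraction bits, as in 1.0000 */
#define BIAS 127       /* single-precision exponent bias */

typedef struct {
    unsigned sign;     /* 0 = positive, 1 = negative */
    int biased_exp;    /* exponent with bias added */
    uint32_t sig;      /* significand with hidden bit, scaled by 2^FRAC_BITS */
} toy_fp;

toy_fp toy_mul(toy_fp a, toy_fp b) {
    uint32_t one = 1u << FRAC_BITS;
    /* Inputs must be normalized: significand in [1, 2). */
    assert(a.sig >= one && a.sig < 2u * one);
    assert(b.sig >= one && b.sig < 2u * one);
    toy_fp p;
    p.sign = a.sign ^ b.sign;                            /* sign of the product */
    p.biased_exp = a.biased_exp + b.biased_exp - BIAS;   /* Step 1 */
    uint32_t wide = a.sig * b.sig;                       /* Step 2: scaled by 2^(2*FRAC_BITS) */
    if (wide >= (2u * one) << FRAC_BITS) {               /* Step 3: product in [2, 4) */
        wide >>= 1;
        p.biased_exp++;
    }
    p.sig = wide >> FRAC_BITS;                           /* drop extra bits (truncation) */
    return p;
}
```

For 1.0000 × 2^-1 (biased exponent 126) times -1.1100 × 2^-2 (biased exponent 125) the model returns sign 1, biased exponent 124, and significand 1.1100, that is -1.110 × 2^-3 = -0.21875.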
Hardware for Floating-Point Multiplication and Division
[Figure 12.6: Simplified schematic of a floating-point multiply/divide unit. Input 1 and Input 2 are unpacked into signs, exponents, and significands; the significands pass through multiply or divide and normalize & round stages under control & sign logic (Mul/Div select); the resulting sign, exponent, and significand are packed into the output.]
Floating Point Examples

Add
(0.75) + (-0.375)

Multiplication
(0.75) * (-0.375)
MIPS Floating Point Instructions


MIPS has a separate Floating Point Register File ($f0, $f1, …, $f31) (whose registers are used in pairs for double precision values) with special instructions to load to and store from them
lwc1 $f1,54($s2)     #$f1 = Memory[$s2+54]
swc1 $f1,58($s4)     #Memory[$s4+58] = $f1
And supports IEEE 754 single
add.s $f2,$f4,$f6    #$f2 = $f4 + $f6
and double precision operations
add.d $f2,$f4,$f6    #$f2||$f3 = $f4||$f5 + $f6||$f7
similarly for sub.s, sub.d, mul.s, mul.d, div.s, div.d
MIPS Floating Point Instructions, Con’t

And floating point single precision comparison operations
c.x.s $f2,$f4    #if($f2 < $f4) cond=1; else cond=0
where x may be eq, neq, lt, le, gt, ge
and double precision comparison operations
c.x.d $f2,$f4    #if($f2||$f3 < $f4||$f5) cond=1; else cond=0

And floating point branch operations
bc1t 25          #if(cond==1) go to PC+4+25
bc1f 25          #if(cond==0) go to PC+4+25
FP Example: °F to °C
C code:
float f2c (float fahr) {
  return ((5.0/9.0)*(fahr - 32.0));
}
• fahr in $f12, result in $f0, literals in global memory space
Compiled MIPS code:
f2c: lwc1  $f16, const5($gp)
     lwc1  $f18, const9($gp)
     div.s $f16, $f16, $f18
     lwc1  $f18, const32($gp)
     sub.s $f18, $f12, $f18
     mul.s $f0, $f16, $f18
     jr    $ra
FP Example: Array Multiplication
X = X + Y × Z
• All 32 × 32 matrices, 64-bit double-precision elements
C code:
void mm (double x[][32],
         double y[][32], double z[][32]) {
  int i, j, k;
  for (i = 0; i != 32; i = i + 1)
    for (j = 0; j != 32; j = j + 1)
      for (k = 0; k != 32; k = k + 1)
        x[i][j] = x[i][j]
                  + y[i][k] * z[k][j];
}
• Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2
FP Example: Array Multiplication

MIPS code:
    li   $t1, 32        # $t1 = 32 (row size/loop end)
    li   $s0, 0         # i = 0; initialize 1st for loop
L1: li   $s1, 0         # j = 0; restart 2nd for loop
L2: li   $s2, 0         # k = 0; restart 3rd for loop
    sll  $t2, $s0, 5    # $t2 = i * 32 (size of row of x)
    addu $t2, $t2, $s1  # $t2 = i * size(row) + j
    sll  $t2, $t2, 3    # $t2 = byte offset of [i][j]
    addu $t2, $a0, $t2  # $t2 = byte address of x[i][j]
    l.d  $f4, 0($t2)    # $f4 = 8 bytes of x[i][j]
L3: sll  $t0, $s2, 5    # $t0 = k * 32 (size of row of z)
    addu $t0, $t0, $s1  # $t0 = k * size(row) + j
    sll  $t0, $t0, 3    # $t0 = byte offset of [k][j]
    addu $t0, $a2, $t0  # $t0 = byte address of z[k][j]
    l.d  $f16, 0($t0)   # $f16 = 8 bytes of z[k][j]
    …
FP Example: Array Multiplication
    …
    sll   $t0, $s0, 5       # $t0 = i*32 (size of row of y)
    addu  $t0, $t0, $s2     # $t0 = i*size(row) + k
    sll   $t0, $t0, 3       # $t0 = byte offset of [i][k]
    addu  $t0, $a1, $t0     # $t0 = byte address of y[i][k]
    l.d   $f18, 0($t0)      # $f18 = 8 bytes of y[i][k]
    mul.d $f16, $f18, $f16  # $f16 = y[i][k] * z[k][j]
    add.d $f4, $f4, $f16    # $f4 = x[i][j] + y[i][k]*z[k][j]
    addiu $s2, $s2, 1       # k = k + 1
    bne   $s2, $t1, L3      # if (k != 32) go to L3
    s.d   $f4, 0($t2)       # x[i][j] = $f4
    addiu $s1, $s1, 1       # j = j + 1
    bne   $s1, $t1, L2      # if (j != 32) go to L2
    addiu $s0, $s0, 1       # i = i + 1
    bne   $s0, $t1, L1      # if (i != 32) go to L1
Accurate Arithmetic
IEEE Std 754 specifies additional rounding control
• Extra bits of precision (guard, round, sticky)
• Choice of rounding modes
• Allows programmer to fine-tune numerical behavior of a computation
Not all FP units implement all options
• Most programming languages and FP libraries just use defaults
Trade-off between hardware complexity, performance, and market requirements
Problem

Calculate the sum of A and B assuming the 16-bit NVIDIA format (1 sign bit, 5 exponent bits, and 10 significand bits), as well as 1 guard bit, 1 round bit, and 1 sticky bit.

A = -1.278 x 10^3

B = 3.90625 x 10^{-1}
Problem

Calculate the product of A and B assuming the 16-bit NVIDIA format (1 sign bit, 5 exponent bits, and 10 significand bits), as well as 1 guard bit, 1 round bit, and 1 sticky bit.

A = 5.66015625 x 10^0

B = 8.59375 x 10^0
Associativity

Parallel programs may interleave operations in unexpected orders
• Assumptions of associativity may fail
                 (x+y)+z             x+(y+z)
x                -1.50E+38           -1.50E+38
y                 1.50E+38            1.50E+38
z                 1.0                 1.0
intermediate     x+y = 0.00E+00      y+z = 1.50E+38
result           1.00E+00            0.00E+00
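The same experiment can be run in C with single-precision variables. The function names are hypothetical, and the sketch assumes `float` is an IEEE 754 single and that the compiler does not reorder floating-point operations (for example, no fast-math style options).

```c
#include <assert.h>

/* (x + y) + z, with the intermediate sum rounded to float. */
float sum_left(float x, float y, float z) {
    float partial = x + y;
    return partial + z;
}

/* x + (y + z), with the intermediate sum rounded to float. */
float sum_right(float x, float y, float z) {
    float partial = y + z;
    return x + partial;
}

/* Self-check with the values from the table. */
void check_associativity(void) {
    float x = -1.5e38f, y = 1.5e38f, z = 1.0f;
    assert(sum_left(x, y, z) == 1.0f);    /* x + y cancels exactly, then + 1.0 */
    assert(sum_right(x, y, z) == 0.0f);   /* 1.0 is lost when added to 1.5e38 */
}
```

In the second ordering, 1.0 is far below the spacing between adjacent floats near 1.5E+38, so y + z rounds back to y and the final sum is 0.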
Problem

Calculate (A+B)+C and A+(B+C) assuming the 16-bit NVIDIA format (1 sign bit, 5 exponent bits, and 10 significand bits), as well as 1 guard bit, 1 round bit, and 1 sticky bit.

A = 2.865625 x 10^1

B = 4.140625 x 10^{-1}

C = 1.2140625 x 10^1