Chapter 2 - Iowa State University
CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Chapter 2
Instructions: Language of the Computer
Zhao Zhang
Iowa State University
Revised from original slides provided by MKP
Review of Week 4
§2.8 Supporting Procedures in Computer Hardware
MIPS procedure/function call convention
Leaf and non-leaf examples
Clearing array example
String copy example
Other issues:
Load 32-bit immediate
Assembler, loader, and compiler effects
Chapter 2 — Instructions: Language of the Computer — 2
Announcements
Exam 1 on Friday Oct. 4
Course review on Wednesday Oct. 2
HW4 is due on Sep. 27
HW5 will be due on Oct. 11
Do HW5 as exercise before Exam 1
No HW and quizzes next week
Lab 2 demo is due this week and Lab 3
demo due next week
Lab 4 starts next week, due in one week
Exam 1
Open book and open notes; calculators are allowed
E-book readers are allowed
Must be put in airplane mode
Coverage
Chapter 1, Computer Abstractions and Technology
Chapter 2, Instructions: Language of the Computer
Some contents from Appendix B
MIPS floating-point instructions
Exam Question Types
Short conceptual questions
Calculation: speedup, power saving, CPI, etc.
MIPS assembly programming
Translate C statements to MIPS (arithmetic,
load/store, branch and jump, others)
Translate C functions to MIPS (call convention)
Among others
Suggestions:
Review slides and textbook
Review homework and quizzes
Overview for Week 5
Overview for Week 5, Sep. 23 - 27
Bubble sorting example
It will be used in Mini-Projects
Floating point instructions
ARM and x86 instruction set overview
Classic Bubble Sorting
Bubble sort: Swap two adjacent elements if they are out of order
Pass over the array up to n times; on each pass the largest remaining element floats to the top
Look at the first pass over five elements
1st try: 5 3 8 2 7 => 3 5 8 2 7
2nd try: 3 5 8 2 7 => 3 5 8 2 7
3rd try: 3 5 8 2 7 => 3 5 2 8 7
4th try: 3 5 2 8 7 => 3 5 2 7 8
Classic Bubble Sorting
Pass i only has to make (n - i) comparisons
In each pass, an element may float up until it meets a larger element
The sorted sub-array grows by one element per pass
1st pass: 5 3 8 2 7 => 3 5 2 7 8
2nd pass: 3 5 2 7 8 => 3 2 5 7 8
3rd pass: 3 2 5 7 8 => 2 3 5 7 8
4th pass: 2 3 5 7 8 => 2 3 5 7 8
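The classic scheme traced above can be sketched in C as follows. This is only an illustration: the names `swap_adjacent` and `bubble_sort_classic` are hypothetical and do not come from the slides or the textbook.

```c
#include <stddef.h>

/* Exchange v[k] and v[k+1] (hypothetical helper for this sketch). */
static void swap_adjacent(int v[], size_t k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Classic bubble sort: pass number `pass` (counting from 1) makes
 * n - pass comparisons, so the largest remaining element ends up
 * at index n - pass. */
void bubble_sort_classic(int v[], size_t n)
{
    for (size_t pass = 1; pass < n; pass++) {
        for (size_t j = 0; j + pass < n; j++) {
            if (v[j] > v[j + 1])
                swap_adjacent(v, j);
        }
    }
}
```

Calling `bubble_sort_classic` on the five-element array 5 3 8 2 7 goes through the same intermediate arrays as the four passes listed above.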
Revised Bubble Sorting
The textbook bubble-sort is optimized to
reduce comparisons
void sort (int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
            swap(v, j);
    }
}
Revised Bubble Sorting
The classic version lets the largest element float to the top of the unsorted sub-array
The revised version lets an element float to its right place in the sorted sub-array
1st pass: 5 3 8 2 7 => 3 5 8 2 7
2nd pass: 3 5 8 2 7 => 3 5 8 2 7
3rd pass: 3 5 8 2 7 => 2 3 5 8 7
4th pass: 2 3 5 8 7 => 2 3 5 7 8
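To make the saving in comparisons concrete, here is a small C sketch of the revised loop that also counts how many times two elements are compared. The function name `sort_revised_count` and the counter are assumptions added for illustration; the loop itself follows the textbook `sort` but is restructured so the comparison can be counted.

```c
/* Revised (textbook-style) sort that returns the number of
 * element comparisons it performed. Hypothetical helper. */
int sort_revised_count(int v[], int n)
{
    int comparisons = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i - 1; j >= 0; j--) {
            comparisons++;
            if (v[j] <= v[j + 1])
                break;              /* element has reached its place */
            int temp = v[j];        /* otherwise swap v[j] and v[j+1] */
            v[j] = v[j + 1];
            v[j + 1] = temp;
        }
    }
    return comparisons;
}
```

On the example array 5 3 8 2 7 this version makes 7 comparisons, while the classic version with passes of 4, 3, 2 and 1 comparisons makes 10.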
§2.13 A C Sort Example to Put It All Together
The Swap Function
The swap function is a leaf function
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}
v in $a0, k in $a1, temp in $t0
The Swap Function
swap: sll $t1, $a1, 2      # $t1 = k * 4
      add $t1, $a0, $t1    # $t1 = v + (k * 4) (address of v[k])
      lw  $t0, 0($t1)      # $t0 (temp) = v[k]
      lw  $t2, 4($t1)      # $t2 = v[k+1]
      sw  $t2, 0($t1)      # v[k] = $t2 (v[k+1])
      sw  $t0, 4($t1)      # v[k+1] = $t0 (temp)
      jr  $ra              # return to calling routine
The Sort Function
for (i = 0; i < n; i++) {
    for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
        swap(v, j);
}
Save $ra to stack, as it’s a non-leaf function
Assign i and j to $s0 and $s1
They must be preserved when calling swap()
Move v, n from $a0 and $a1 to $s2 and $s3
They must be preserved, too
$a0 and $a1 are used when calling swap()
We need a stack frame of 5 words or 20 bytes
Sort Prologue and Epilogue
sort:   addi $sp, $sp, -20   # make room on stack for 5 registers
        sw   $ra, 16($sp)    # save $ra on stack
        sw   $s3, 12($sp)    # save $s3 on stack
        sw   $s2, 8($sp)     # save $s2 on stack
        sw   $s1, 4($sp)     # save $s1 on stack
        sw   $s0, 0($sp)     # save $s0 on stack
        …                    # procedure body
exit1:  lw   $s0, 0($sp)     # restore $s0 from stack
        lw   $s1, 4($sp)     # restore $s1 from stack
        lw   $s2, 8($sp)     # restore $s2 from stack
        lw   $s3, 12($sp)    # restore $s3 from stack
        lw   $ra, 16($sp)    # restore $ra from stack
        addi $sp, $sp, 20    # restore stack pointer
        jr   $ra             # return to calling routine
• Entry: Get a frame, save $ra and $s3-$s0
• Exit: Restore $s0-$s3 and $ra, free the frame
Sort Function Body
A new pseudo instruction
move rd, rs
is equivalent to
add rd, rs, $zero
Example
move $s2, $a0    # $s2 = $a0
move $s3, $a1    # $s3 = $a1
No use of pseudo assembly instructions in Exam 1
Sort Function Body
                                  # Move params
         move $s2, $a0            # save $a0 into $s2
         move $s3, $a1            # save $a1 into $s3
                                  # Outer loop
         move $s0, $zero          # i = 0
for1tst: slt  $t0, $s0, $s3       # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
         beq  $t0, $zero, exit1   # go to exit1 if $s0 ≥ $s3 (i ≥ n)
                                  # Inner loop
         addi $s1, $s0, -1        # j = i - 1
for2tst: slti $t0, $s1, 0         # $t0 = 1 if $s1 < 0 (j < 0)
         bne  $t0, $zero, exit2   # go to exit2 if $s1 < 0 (j < 0)
         sll  $t1, $s1, 2         # $t1 = j * 4
         add  $t2, $s2, $t1       # $t2 = v + (j * 4)
         lw   $t3, 0($t2)         # $t3 = v[j]
         lw   $t4, 4($t2)         # $t4 = v[j + 1]
         slt  $t0, $t4, $t3       # $t0 = 0 if $t4 ≥ $t3
         beq  $t0, $zero, exit2   # go to exit2 if $t4 ≥ $t3
                                  # Pass params & call
         move $a0, $s2            # 1st param of swap is v (old $a0)
         move $a1, $s1            # 2nd param of swap is j
         jal  swap                # call swap procedure
                                  # Inner loop
         addi $s1, $s1, -1        # j -= 1
         j    for2tst             # jump to test of inner loop
                                  # Outer loop
exit2:   addi $s0, $s0, 1         # i += 1
         j    for1tst             # jump to test of outer loop
Sort Function Optimized
Old version:
void sort(int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
            swap(v, j);
    }
}
New version:
void sort(int v[], int n)
{
    int *pi, *pj;
    for (pi = v; pi < &v[n]; pi++)
        for (pj = pi - 1; pj >= v && swap(pj); pj--)
            {}
}
New Swap Function
A more efficient swap function that reduces
memory loads
// swap two adjacent elements if they are
// out of order. Return 1 if swapped, 0
// otherwise
int swap(int *p)
{
if (p[0] > p[1]) {
int tmp = p[0];
p[0] = p[1];
p[1] = tmp;
return 1;
}
else
return 0;
}
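A runnable C sketch of the pointer-based sort built on this kind of swap is given below. It is a variant, not the slide code: the names `swap_if_unordered` and `sort_ptr` are hypothetical, and the inner loop is written as `pj > v && swap_if_unordered(pj - 1)` so that the code never forms a pointer before the start of the array, which strict C does not define. The comparisons performed are the same as in the version with `pj = pi - 1`.

```c
/* Swap p[0] and p[1] if they are out of order.
 * Returns 1 if swapped, 0 otherwise. */
int swap_if_unordered(int *p)
{
    if (p[0] > p[1]) {
        int tmp = p[0];
        p[0] = p[1];
        p[1] = tmp;
        return 1;
    }
    return 0;
}

/* Pointer-based sort: pi walks the array; pj moves the new element
 * down until it meets a smaller or equal neighbor. */
void sort_ptr(int v[], int n)
{
    int *pi, *pj;
    for (pi = v; pi < v + n; pi++)
        for (pj = pi; pj > v && swap_if_unordered(pj - 1); pj--)
            {}
}
```

Because the swap routine both compares and exchanges, each pair of elements is loaded once per step instead of once in the loop test and again inside the swap.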
New Swap Function
A new swap function
swap:  lw   $t0, 0($a0)        # load p[0]
       lw   $t1, 4($a0)        # load p[1]
       slt  $t2, $t1, $t0      # p[1] < p[0]?
       beq  $t2, $zero, else
       sw   $t1, 0($a0)        # swap
       sw   $t0, 4($a0)        # swap
       addi $v0, $zero, 1      # $v0 = 1
       jr   $ra
else:  addi $v0, $zero, 0      # $v0 = 0
       jr   $ra
New Sort Function
The sort() function optimized
Register usage
$s0: v
$s1: &v[n]
$s2: pi
$s3: pj
Need a frame of 5 words to save $ra and $s0-$s3
Sort Prologue and Epilogue
sort:  addi $sp, $sp, -20    # frame of 5 words
       sw   $ra, 16($sp)
       sw   $s3, 12($sp)
       sw   $s2, 8($sp)
       sw   $s1, 4($sp)
       sw   $s0, 0($sp)
       (MIPS code for sort function body)
       lw   $s0, 0($sp)
       lw   $s1, 4($sp)
       lw   $s2, 8($sp)
       lw   $s3, 12($sp)
       lw   $ra, 16($sp)
       addi $sp, $sp, 20     # release frame
       jr   $ra
New Sort: Outer Loop
for (pi = v; pi < &v[n]; pi++)
    /* C code for the inner loop */
    for (pj = pi - 1; pj >= v && swap(pj); pj--)
        {}
           add  $s0, $a0, $zero        # $s0 = v
           sll  $a1, $a1, 2            # $a1 = 4*n
           add  $s1, $s0, $a1          # $s1 = &v[n]
           add  $s2, $s0, $zero        # pi = v
           j    for1_tst
for1_loop:
           (MIPS code for the inner loop)
           addi $s2, $s2, 4            # pi++
for1_tst:
           slt  $t0, $s2, $s1          # pi < &v[n]?
           bne  $t0, $zero, for1_loop  # yes? repeat
New Sort: Inner Loop
for (pj = pi-1; pj >= v && swap(pj); pj--)
{}
           addi $s3, $s2, -4           # pj = pi-1
           j    for2_tst
for2_loop:
           addi $s3, $s3, -4           # pj--
for2_tst:
           slt  $t0, $s3, $s0          # pj < v?
           bne  $t0, $zero, for2_exit  # yes? exit
           add  $a0, $s3, $zero        # $a0 = pj
           jal  swap                   # swap(pj)
           bne  $v0, $zero, for2_loop  # ret 1? cont
for2_exit:
Lab Mini-Projects
You will use the sorting code to test your
CPU design in the lab mini-projects
Use the new sorting code
The new code is more optimized
It will simplify the debugging
FP Instructions in MIPS
Reading: Textbook Ch. 3.5 and B-71 to B-80
FP hardware is coprocessor 1
Adjunct processor that extends the ISA
Separate FP registers
32 single-precision: $f0, $f1, … $f31
Paired for double-precision: $f0/$f1, $f2/$f3, …
Release 2 of MIPS ISA supports 32 × 64-bit FP registers
Chapter 3 — Arithmetic for Computers — 25
FP Instructions in MIPS
FP instructions operate only on FP
registers
Programs generally don’t do integer ops
on FP data, or vice versa
More registers with minimal code-size
impact
FP Instructions in MIPS
FP load and store instructions
lwc1, ldc1, swc1, sdc1
e.g., ldc1 $f8, 32($sp)
lwc1, swc1: Load/store single-precision
ldc1, sdc1: Load/store double-precision
FP Instructions in MIPS
Single-precision arithmetic
add.s, sub.s, mul.s, div.s
e.g., add.s $f0, $f1, $f6
Double-precision arithmetic
add.d, sub.d, mul.d, div.d
e.g., mul.d $f4, $f4, $f6
FP Instructions in MIPS
Single- and double-precision comparison
c.xx.s, c.xx.d (xx is eq, lt, le, …)
Sets or clears FP condition-code bit
e.g. c.lt.s $f3, $f4
Branch on FP condition code true or false
bc1t, bc1f
e.g., bc1t TargetLabel
MIPS Call Convention: FP
The first two FP parameters in registers
1st parameter in $f12 or $f12:$f13
A double-precision parameter takes two registers
2nd FP parameter in $f14 or $f14:$f15
Extra parameters in stack
$f0 stores single-precision FP return value
$f0:$f1 stores double-precision FP return
value
$f0-$f19 are FP temporary registers
$f20-$f31 are FP saved temporary registers
FP Example: °F to °C
C code:
float f2c (float fahr)
{
return ((5.0/9.0) * (fahr - 32.0));
}
fahr in $f12, result in $f0
Assume literals in global memory space,
e.g. const5 for 5.0 and const9 for 9.0
Can FP immediate be encoded in MIPS
instructions?
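As a hedged C-level illustration of keeping the literals in memory, the sketch below stores the three constants in static variables, mirroring the `const5`, `const9` and `const32` words that the compiled code loads relative to `$gp`. The `f` suffix on the literals is an addition made here: it keeps the whole computation in single precision, whereas unsuffixed literals such as `5.0` are doubles in C.

```c
/* Constants kept in static storage, mirroring const5, const9 and
 * const32 in the global data area (names reused for illustration). */
static const float const5 = 5.0f;
static const float const9 = 9.0f;
static const float const32 = 32.0f;

/* Fahrenheit to Celsius, computed entirely in single precision. */
float f2c(float fahr)
{
    return (const5 / const9) * (fahr - const32);
}
```

For example, `f2c(32.0f)` gives exactly 0, and `f2c(212.0f)` gives 100 up to single-precision rounding.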
FP Example: °F to °C
Compiled MIPS code:
f2c:  lwc1  $f16, const5($gp)     # $f16 = 5.0
      lwc1  $f18, const9($gp)     # $f18 = 9.0
      div.s $f16, $f16, $f18      # $f16 = 5.0 / 9.0
      lwc1  $f18, const32($gp)    # $f18 = 32.0
      sub.s $f18, $f12, $f18      # $f18 = fahr - 32.0
      mul.s $f0, $f16, $f18       # $f0 = (5.0 / 9.0) * (fahr - 32.0)
      jr    $ra                   # return
FP Example: Function Call
extern float fahr, cel;
cel = f2c(fahr);
Assume fahr is at 100($gp), cel is at 104($gp)
      lwc1  $f12, 100($gp)    # load 1st para
      jal   f2c
      swc1  $f0, 104($gp)     # save ret val
FP Example: Max
double max(double x, double y)
{
return (x > y) ? x : y;
}
max:   c.lt.d $f14, $f12    # y < x?
       bc1f   else          # if false, do else
       mov.d  $f0, $f12     # $f0:$f1 = x
       jr     $ra
else:  mov.d  $f0, $f14     # $f0:$f1 = y
       jr     $ra
FP Example: Max
How to call max?
Assume a, b, c at 100($gp), 108($gp), and 116($gp)
extern double a, b, c;
c = max(a, b);
      ldc1  $f12, 100($gp)    # $f12:$f13 = a
      ldc1  $f14, 108($gp)    # $f14:$f15 = b
      jal   max
      sdc1  $f0, 116($gp)     # c = $f0:$f1
FP Example: Search Value
int search(double X[], int size, double value)
{
for (int i = 0; i < size; i++)
if (X[i] == value)
return 1;
return 0;
}
Note 1: There are integer and FP parameters, and the
return value is integer
Note 2: A real program may search a value in a range, e.g.
[value - delta, value + delta]
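The range search mentioned in Note 2 could look like the following C sketch. The function name `search_near` and the extra `delta` parameter are assumptions made for illustration; they are not part of the slide's `search` function.

```c
/* Return 1 if some X[i] lies in [value - delta, value + delta],
 * 0 otherwise. Hypothetical variant of search() with a tolerance. */
int search_near(const double X[], int size, double value, double delta)
{
    for (int i = 0; i < size; i++)
        if (X[i] >= value - delta && X[i] <= value + delta)
            return 1;
    return 0;
}
```

A tolerance matters because results of floating-point arithmetic are rounded: a computed element such as `0.1 + 0.2` is found by `search_near(X, size, 0.3, 1e-9)` even if its stored value is not bit-for-bit equal to the literal `0.3`.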
FP Example: Search Value
search:
          add    $t0, $zero, $zero     # i = 0
          j      for_cond
for_loop:
          sll    $t1, $t0, 3           # $t1 = 8*i
          add    $t1, $a0, $t1         # $t1 = &X[i]
          ldc1   $f2, 0($t1)           # $f2:$f3 = X[i]
          c.eq.d $f2, $f12             # X[i] == value?
          bc1f   endif                 # if false, skip
          addi   $v0, $zero, 1         # $v0 = 1
          jr     $ra                   # return
endif:
          addi   $t0, $t0, 1           # i++
for_cond:
          slt    $t1, $t0, $a1         # i < size?
          bne    $t1, $zero, for_loop  # repeat if true
          add    $v0, $zero, $zero     # to return 0
          jr     $ra
FP Example: Array Multiplication
X = X + Y × Z
All 32 × 32 matrices, 64-bit double-precision elements
C code:
void mm (double x[][32],
         double y[][32], double z[][32]) {
    int i, j, k;
    for (i = 0; i != 32; i = i + 1)
        for (j = 0; j != 32; j = j + 1)
            for (k = 0; k != 32; k = k + 1)
                x[i][j] = x[i][j]
                          + y[i][k] * z[k][j];
}
Addresses of x, y, z in $a0, $a1, $a2, and
i, j, k in $s0, $s1, $s2
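The address arithmetic that the compiled code has to perform for each array element can be sketched in C. The helper name `elem_offset` is hypothetical, and the sketch assumes 8-byte doubles and row-major storage, as in the example.

```c
#include <stddef.h>

/* Byte offset of element [row][col] in a row-major 32 x 32 array of
 * 8-byte doubles: (row * 32 + col) * 8, written with shifts. */
size_t elem_offset(size_t row, size_t col)
{
    return ((row << 5) + col) << 3;
}
```

Shifting left by 5 multiplies the row index by the row length of 32, and shifting left by 3 scales the element index to bytes. Adding the result to the base address of the array gives the address of the element.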
FP Example: Array Multiplication
MIPS code:
      li    $t1, 32          # $t1 = 32 (row size/loop end)
      li    $s0, 0           # i = 0; initialize 1st for loop
L1:   li    $s1, 0           # j = 0; restart 2nd for loop
L2:   li    $s2, 0           # k = 0; restart 3rd for loop
      sll   $t2, $s0, 5      # $t2 = i * 32 (size of row of x)
      addu  $t2, $t2, $s1    # $t2 = i * size(row) + j
      sll   $t2, $t2, 3      # $t2 = byte offset of [i][j]
      addu  $t2, $a0, $t2    # $t2 = byte address of x[i][j]
      l.d   $f4, 0($t2)      # $f4 = 8 bytes of x[i][j]
L3:   sll   $t0, $s2, 5      # $t0 = k * 32 (size of row of z)
      addu  $t0, $t0, $s1    # $t0 = k * size(row) + j
      sll   $t0, $t0, 3      # $t0 = byte offset of [k][j]
      addu  $t0, $a2, $t0    # $t0 = byte address of z[k][j]
      l.d   $f16, 0($t0)     # $f16 = 8 bytes of z[k][j]
      …
FP Example: Array Multiplication
      …
      sll   $t0, $s0, 5       # $t0 = i * 32 (size of row of y)
      addu  $t0, $t0, $s2     # $t0 = i * size(row) + k
      sll   $t0, $t0, 3       # $t0 = byte offset of [i][k]
      addu  $t0, $a1, $t0     # $t0 = byte address of y[i][k]
      l.d   $f18, 0($t0)      # $f18 = 8 bytes of y[i][k]
      mul.d $f16, $f18, $f16  # $f16 = y[i][k] * z[k][j]
      add.d $f4, $f4, $f16    # $f4 = x[i][j] + y[i][k] * z[k][j]
      addiu $s2, $s2, 1       # k = k + 1
      bne   $s2, $t1, L3      # if (k != 32) go to L3
      s.d   $f4, 0($t2)       # x[i][j] = $f4
      addiu $s1, $s1, 1       # j = j + 1
      bne   $s1, $t1, L2      # if (j != 32) go to L2
      addiu $s0, $s0, 1       # i = i + 1
      bne   $s0, $t1, L1      # if (i != 32) go to L1
§2.16 Real Stuff: ARM Instructions
ARM & MIPS Similarities
ARM: the most popular embedded core
Similar basic set of instructions to MIPS
                        ARM              MIPS
Date announced          1985             1985
Instruction size        32 bits          32 bits
Address space           32-bit flat      32-bit flat
Data alignment          Aligned          Aligned
Data addressing modes   9                3
Registers               15 × 32-bit      31 × 32-bit
Input/output            Memory mapped    Memory mapped
Compare and Branch in ARM
Uses condition codes for result of an
arithmetic/logical instruction
Negative, zero, carry, overflow
Compare instructions to set condition codes
without keeping the result
Each instruction can be conditional
Top 4 bits of instruction word: condition value
Can avoid branches over single instructions
Instruction Encoding
§2.17 Real Stuff: x86 Instructions
The Intel x86 ISA
Evolution with backward compatibility
8080 (1974): 8-bit microprocessor
  Accumulator, plus 3 index-register pairs
8086 (1978): 16-bit extension to 8080
  Complex instruction set (CISC)
8087 (1980): floating-point coprocessor
  Adds FP instructions and register stack
80286 (1982): 24-bit addresses, MMU
  Segmented memory mapping and protection
80386 (1985): 32-bit extension (now IA-32)
  Additional addressing modes and operations
  Paged memory mapping as well as segments
The Intel x86 ISA
Further evolution…
i486 (1989): pipelined, on-chip caches and FPU
  Compatible competitors: AMD, Cyrix, …
Pentium (1993): superscalar, 64-bit datapath
  Later versions added MMX (Multi-Media eXtension) instructions
  The infamous FDIV bug
Pentium Pro (1995), Pentium II (1997)
  New microarchitecture (see Colwell, The Pentium Chronicles)
Pentium III (1999)
  Added SSE (Streaming SIMD Extensions) and associated registers
Pentium 4 (2001)
  New microarchitecture
  Added SSE2 instructions
The Intel x86 ISA
And further…
AMD64 (2003): extended architecture to 64 bits
EM64T – Extended Memory 64 Technology (2004)
  AMD64 adopted by Intel (with refinements)
  Added SSE3 instructions
Intel Core (2006)
  Added SSE4 instructions, virtual machine support
AMD64 (announced 2007): SSE5 instructions
  Intel declined to follow, instead…
Advanced Vector Extension (announced 2008)
  Longer SSE registers, more instructions
If Intel didn’t extend with compatibility, its competitors would!
  Technical elegance ≠ market success
Basic x86 Registers
Basic x86 Addressing Modes
Two operands per instruction
Source/dest operand    Second source operand
Register               Register
Register               Immediate
Register               Memory
Memory                 Register
Memory                 Immediate
Memory addressing modes
Address in register
Address = Rbase + displacement
Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
Address = Rbase + 2^scale × Rindex + displacement
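The most general of these modes, base plus scaled index plus displacement, can be written out as a small C function. This is only a sketch of the arithmetic for 32-bit addresses; the function name `effective_address` and its parameter names are hypothetical.

```c
#include <stdint.h>

/* Effective address = base + 2^scale * index + displacement,
 * with scale in 0..3, computed modulo 2^32. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t displacement)
{
    return base + (index << scale) + (uint32_t)displacement;
}
```

With `scale = 2` the index is multiplied by 4, which is the typical way to step through an array of 32-bit words: `effective_address(0x1000, 3, 2, 8)` yields `0x1014`. Setting the index and scale to 0 reduces the formula to the base-plus-displacement mode.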
x86 Instruction Encoding
Variable length encoding
Postfix bytes specify addressing mode
Prefix bytes modify operation
  Operand length, repetition, locking, …
Implementing IA-32
Complex instruction set makes
implementation difficult
Hardware translates instructions to simpler
microoperations
Simple instructions: 1–1
Complex instructions: 1–many
Microengine similar to RISC
Market share makes this economically viable
Comparable performance to RISC
Compilers avoid complex instructions
§2.18 Fallacies and Pitfalls
Fallacies
Powerful instruction ⇒ higher performance
  Fewer instructions required
  But complex instructions are hard to implement
    May slow down all instructions, including simple ones
  Compilers are good at making fast code from simple instructions
Use assembly code for high performance
  But modern compilers are better at dealing with modern processors
  More lines of code ⇒ more errors and less productivity
Fallacies
Backward compatibility ⇒ instruction set doesn’t change
  But they do accrete more instructions
x86 instruction set
Pitfalls
Sequential words are not at sequential
addresses
Increment by 4, not by 1!
Keeping a pointer to an automatic variable
after procedure returns
e.g., passing pointer back via an argument
Pointer becomes invalid when stack popped
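The first pitfall can be checked from C by measuring the byte distance between two consecutive 32-bit words. The helper name `word_stride` is hypothetical and used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between element i and element i + 1 of an array of
 * 32-bit words. The caller must ensure both elements exist. */
ptrdiff_t word_stride(const int32_t *words, size_t i)
{
    return (const char *)&words[i + 1] - (const char *)&words[i];
}
```

On a byte-addressed machine with 8-bit bytes the result is 4 for every `i`. C pointer arithmetic hides this scaling, but assembly code has to add 4 to the address explicitly to move to the next word.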
§2.19 Concluding Remarks
Concluding Remarks
Design principles
1. Simplicity favors regularity
2. Smaller is faster
3. Make the common case fast
4. Good design demands good compromises
Layers of software/hardware
  Compiler, assembler, hardware
MIPS: typical of RISC ISAs
  c.f. x86
Concluding Remarks
Measure MIPS instruction executions in benchmark programs
Consider making the common case fast
Consider compromises
Instruction class   MIPS examples                        SPEC2006 Int   SPEC2006 FP
Arithmetic          add, sub, addi                       16%            48%
Data transfer       lw, sw, lb, lbu, lh, lhu, sb, lui    35%            36%
Logical             and, or, nor, andi, ori, sll, srl    12%            4%
Cond. Branch        beq, bne, slt, slti, sltiu           34%            8%
Jump                j, jr, jal                           2%             0%