Chapter 2 - Iowa State University


CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Chapter 2
Instructions: Language of the Computer
Zhao Zhang
Iowa State University
Revised from original slides provided by MKP





Review of Week 4
§2.8 Supporting Procedures in Computer Hardware
• MIPS procedure/function call convention
• Leaf and non-leaf examples
• Clearing array example
• String copy example
• Other issues:
  - Load 32-bit immediate
  - Assembler, loader, and compiler effects
Chapter 2 — Instructions: Language of the Computer — 2
Announcements
• Exam 1 on Friday Oct. 4
  - Course review on Wednesday Oct. 2
• HW4 is due on Sep. 27
• HW5 will be due on Oct. 11
  - Do HW5 as an exercise before Exam 1
  - No HW and quizzes next week
• Lab 2 demo is due this week and Lab 3 demo is due next week
• Lab 4 starts next week, due in one week
Chapter 1 — Computer Abstractions and Technology — 3
Exam 1
• Open book, open notes; calculators are allowed
• E-book readers are allowed
  - Must be put in airplane mode
• Coverage
  - Chapter 1, Computer Abstractions and Technology
  - Chapter 2, Instructions: Language of the Computer
  - Some contents from Appendix B
  - MIPS floating-point instructions
Exam Question Types
• Short conceptual questions
• Calculation: speedup, power saving, CPI, etc.
• MIPS assembly programming
  - Translate C statements to MIPS (arithmetic, load/store, branch and jump, others)
  - Translate C functions to MIPS (call convention)
  - Among others
Suggestions:
• Review slides and textbook
• Review homework and quizzes
Overview for Week 5, Sep. 23-27
• Bubble sorting example
  - It will be used in Mini-Projects
• Floating point instructions
• ARM and x86 instruction set overview
Classic Bubble Sorting
• Bubble sort: swap two adjacent elements if they are out of order
  - Pass over the array n times; in each pass, the largest remaining element floats to the top
  - Look at the first pass over five elements:
1st try: 5 3 8 2 7 => 3 5 8 2 7
2nd try: 3 5 8 2 7 => 3 5 8 2 7
3rd try: 3 5 8 2 7 => 3 5 2 8 7
4th try: 3 5 2 8 7 => 3 5 2 7 8
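A minimal C sketch of a single classic pass, written to reproduce the trace above; the helper name `bubble_pass` is hypothetical. It compares each adjacent pair from left to right and swaps any pair that is out of order.

```c
/* One pass of classic bubble sort over v[0..m-1] (hypothetical helper).
   Compares each adjacent pair (v[j], v[j+1]) from left to right and swaps
   the pair when it is out of order, so the largest of the m elements ends
   up in v[m-1]. Returns the number of swaps performed. */
int bubble_pass(int v[], int m)
{
    int swaps = 0;
    for (int j = 0; j + 1 < m; j++) {
        if (v[j] > v[j + 1]) {      /* out of order: swap the neighbors */
            int tmp = v[j];
            v[j] = v[j + 1];
            v[j + 1] = tmp;
            swaps++;
        }
    }
    return swaps;
}
```

On `{5, 3, 8, 2, 7}` one call leaves `{3, 5, 2, 7, 8}` and returns 3, since the 1st, 3rd, and 4th tries swap.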

Classic Bubble Sorting
• Pass i (counting from 1) only has to make n - i comparisons
• In each pass, an element may float up until it meets a larger element
• The sorted sub-array at the top grows by one element per pass
1st pass: 5 3 8 2 7 => 3 5 2 7 8
2nd pass: 3 5 2 7 8 => 3 2 5 7 8
3rd pass: 3 2 5 7 8 => 2 3 5 7 8
4th pass: 2 3 5 7 8 => 2 3 5 7 8
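The pass structure can be sketched in C as below; the comparison counter and the name `bubble_sort_count` are illustrative additions, not part of the slides. Because pass i stops after n - i comparisons, the total is (n-1) + (n-2) + ... + 1 = n(n-1)/2, which is 10 for the five-element example. This sketch runs n - 1 passes, since an nth pass would make no comparisons.

```c
/* Classic bubble sort with a comparison counter added for illustration.
   Pass i (counting from 1) compares v[j] with v[j+1] for j = 0 .. n-i-1,
   i.e. n - i comparisons, because the top i - 1 elements are already in
   their final places. Returns the total number of comparisons. */
long bubble_sort_count(int v[], int n)
{
    long comparisons = 0;
    for (int i = 1; i < n; i++) {           /* passes 1 .. n-1 */
        for (int j = 0; j < n - i; j++) {   /* n - i comparisons */
            comparisons++;
            if (v[j] > v[j + 1]) {          /* out of order: swap */
                int tmp = v[j];
                v[j] = v[j + 1];
                v[j + 1] = tmp;
            }
        }
    }
    return comparisons;
}
```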
Revised Bubble Sorting
• The textbook's bubble sort is optimized to reduce comparisons
void sort (int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
            swap(v, j);
    }
}
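A self-contained version of the revised sort, with the slides' `swap(v, k)` (shown on the following slides) included so that it compiles. Because the inner loop stops at the first in-order pair, each pass moves v[i] down into the sorted prefix v[0..i-1]; it never makes more comparisons than the classic version and usually makes fewer.

```c
/* swap(v, k): exchange v[k] and v[k+1], as on the slides. */
void swap(int v[], int k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Revised sort: v[i] is swapped down into its place in the already
   sorted prefix v[0..i-1], stopping as soon as v[j] <= v[j+1]. */
void sort(int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j--)
            swap(v, j);
    }
}
```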
Revised Bubble Sorting
• The classic version lets the largest element float to the top of the unsorted sub-array
• The revised version lets an element float to its correct place in the sorted sub-array
1st pass: 5 3 8 2 7 => 3 5 8 2 7
2nd pass: 3 5 8 2 7 => 3 5 8 2 7
3rd pass: 3 5 8 2 7 => 2 3 5 8 7
4th pass: 2 3 5 8 7 => 2 3 5 7 8

The Swap Function
§2.13 A C Sort Example to Put It All Together
• The swap function is a leaf function
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}
• v in $a0, k in $a1, temp in $t0
The Swap Function
swap: sll $t1, $a1, 2     # $t1 = k * 4
      add $t1, $a0, $t1   # $t1 = v+(k*4) (address of v[k])
      lw  $t0, 0($t1)     # $t0 (temp) = v[k]
      lw  $t2, 4($t1)     # $t2 = v[k+1]
      sw  $t2, 0($t1)     # v[k] = $t2 (v[k+1])
      sw  $t0, 4($t1)     # v[k+1] = $t0 (temp)
      jr  $ra             # return to calling routine
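To connect the assembly to C, here is a sketch that performs the same steps with explicit byte addresses; the name `swap_model` is hypothetical, and the code assumes a 4-byte `int` (as on MIPS32) and a flat address space where a `uintptr_t` round trip yields a usable pointer. The shift by 2 appears because MIPS memory is byte-addressed, so element k of a word array starts 4k bytes past v.

```c
#include <stdint.h>

/* C model of the MIPS swap code, one statement per instruction.
   Register names become local variables; memory is addressed in bytes.
   Assumes sizeof(int) == 4, as on MIPS32. */
void swap_model(int v[], int k)
{
    uintptr_t a0 = (uintptr_t)v;        /* $a0 = v (base address)       */
    uintptr_t a1 = (uintptr_t)k;        /* $a1 = k (assumed k >= 0)     */
    uintptr_t t1 = a1 << 2;             /* sll $t1, $a1, 2   : k * 4    */
    t1 = a0 + t1;                       /* add $t1, $a0, $t1 : &v[k]    */
    int t0 = *(int *)(t1 + 0);          /* lw  $t0, 0($t1)   : v[k]     */
    int t2 = *(int *)(t1 + 4);          /* lw  $t2, 4($t1)   : v[k+1]   */
    *(int *)(t1 + 0) = t2;              /* sw  $t2, 0($t1)              */
    *(int *)(t1 + 4) = t0;              /* sw  $t0, 4($t1)              */
}                                       /* jr  $ra                      */
```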
The Sort Function
for (i = 0; i < n; i++) {
    for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
        swap(v, j);
}
• Save $ra to the stack, as sort is a non-leaf function
• Assign i and j to $s0 and $s1
  - They must be preserved when calling swap()
• Move v, n from $a0 and $a1 to $s2 and $s3
  - They must be preserved, too
  - $a0 and $a1 are used when calling swap()
• We need a stack frame of 5 words ($ra and $s0-$s3), or 20 bytes
Sort Prologue and Epilogue
sort:  addi $sp, $sp, -20   # make room on stack for 5 registers
       sw   $ra, 16($sp)    # save $ra on stack
       sw   $s3, 12($sp)    # save $s3 on stack
       sw   $s2, 8($sp)     # save $s2 on stack
       sw   $s1, 4($sp)     # save $s1 on stack
       sw   $s0, 0($sp)     # save $s0 on stack
       …                    # procedure body
       …
exit1: lw   $s0, 0($sp)     # restore $s0 from stack
       lw   $s1, 4($sp)     # restore $s1 from stack
       lw   $s2, 8($sp)     # restore $s2 from stack
       lw   $s3, 12($sp)    # restore $s3 from stack
       lw   $ra, 16($sp)    # restore $ra from stack
       addi $sp, $sp, 20    # restore stack pointer
       jr   $ra             # return to calling routine
• Entry: get a frame, save $ra and $s3-$s0
• Exit: restore $s0-$s3 and $ra, free the frame
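The frame layout can be pictured as a C struct; `struct sort_frame` and `sort_frame_size` are hypothetical names used only for illustration, and the sketch assumes the struct has no padding (true for five `uint32_t` fields on common ABIs). Field offsets then equal the displacements in the sw/lw instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of sort's 20-byte stack frame. Fields are listed
   from the lowest address (the new $sp) upward; each saved register is
   one 32-bit word. */
struct sort_frame {
    uint32_t s0;   /* 0($sp)  */
    uint32_t s1;   /* 4($sp)  */
    uint32_t s2;   /* 8($sp)  */
    uint32_t s3;   /* 12($sp) */
    uint32_t ra;   /* 16($sp) */
};

/* Bytes the prologue subtracts from $sp (and the epilogue adds back). */
size_t sort_frame_size(void)
{
    return sizeof(struct sort_frame);
}
```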
Sort Function Body
• A new pseudo-instruction
    move rd, rs
  is equivalent to
    add rd, rs, $zero
• Example
    move $s2, $a0    # $s2 = $a0
    move $s3, $a1    # $s3 = $a1
• Do not use pseudo-instructions in Exam 1
Sort Function Body
         # Move params
         move $s2, $a0           # save $a0 into $s2
         move $s3, $a1           # save $a1 into $s3
         # Outer loop
         move $s0, $zero         # i = 0
for1tst: slt  $t0, $s0, $s3      # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
         beq  $t0, $zero, exit1  # go to exit1 if $s0 ≥ $s3 (i ≥ n)
         # Inner loop
         addi $s1, $s0, -1       # j = i - 1
for2tst: slti $t0, $s1, 0        # $t0 = 1 if $s1 < 0 (j < 0)
         bne  $t0, $zero, exit2  # go to exit2 if $s1 < 0 (j < 0)
         sll  $t1, $s1, 2        # $t1 = j * 4
         add  $t2, $s2, $t1      # $t2 = v + (j * 4)
         lw   $t3, 0($t2)        # $t3 = v[j]
         lw   $t4, 4($t2)        # $t4 = v[j + 1]
         slt  $t0, $t4, $t3      # $t0 = 0 if $t4 ≥ $t3
         beq  $t0, $zero, exit2  # go to exit2 if $t4 ≥ $t3
         # Pass params & call
         move $a0, $s2           # 1st param of swap is v (old $a0)
         move $a1, $s1           # 2nd param of swap is j
         jal  swap               # call swap procedure
         # Inner loop
         addi $s1, $s1, -1       # j -= 1
         j    for2tst            # jump to test of inner loop
         # Outer loop
exit2:   addi $s0, $s0, 1        # i += 1
         j    for1tst            # jump to test of outer loop
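To check the translation, the sort body can be rewritten in C with the same labels and gotos as the MIPS code; this is an illustrative sketch (the name `sort_goto` is hypothetical), not compiler output. Each C statement corresponds to the instructions named in its comment.

```c
/* Helper with the slides' swap(v, k) semantics. */
static void swap(int v[], int k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* sort() with labels and gotos that follow the MIPS code line by line:
   i plays $s0, j plays $s1, v plays $s2 and n plays $s3. */
void sort_goto(int v[], int n)
{
    int i, j;
    i = 0;                                /* move $s0, $zero            */
for1tst:
    if (!(i < n)) goto exit1;             /* slt $t0,$s0,$s3; beq ...   */
    j = i - 1;                            /* addi $s1, $s0, -1          */
for2tst:
    if (j < 0) goto exit2;                /* slti $t0,$s1,0; bne ...    */
    if (!(v[j + 1] < v[j])) goto exit2;   /* lw; lw; slt; beq ...       */
    swap(v, j);                           /* move $a0; move $a1; jal    */
    j = j - 1;                            /* addi $s1, $s1, -1          */
    goto for2tst;                         /* j for2tst                  */
exit2:
    i = i + 1;                            /* addi $s0, $s0, 1           */
    goto for1tst;                         /* j for1tst                  */
exit1:
    return;                               /* epilogue, then jr $ra      */
}
```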
Sort Function Optimized
Old version:
void sort(int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
            swap(v, j);
    }
}
New version:
void sort(int v[], int n)
{
    int *pi, *pj;
    for (pi = v; pi < &v[n]; pi++)
        for (pj = pi - 1; pj >= v && swap(pj); pj--)
            {}
}
New Swap Function
• A more efficient swap function that reduces memory loads: the comparison moves into swap, so the two elements are loaded only once
// swap two adjacent elements if they are
// out of order. Return 1 if swapped, 0
// otherwise
int swap(int *p)
{
    if (p[0] > p[1]) {
        int tmp = p[0];
        p[0] = p[1];
        p[1] = tmp;
        return 1;
    }
    else
        return 0;
}
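A runnable sketch of the new swap together with the pointer-based sort. One deliberate deviation from the slide, flagged in the code: the slide's inner loop forms `pi - 1` even when `pi == v`, a pointer value that C does not define, so this version tracks `q = pj + 1` instead; the sequence of swaps is unchanged.

```c
/* Swap two adjacent elements if they are out of order.
   Return 1 if swapped, 0 otherwise (as on the slide). */
int swap(int *p)
{
    if (p[0] > p[1]) {
        int tmp = p[0];
        p[0] = p[1];
        p[1] = tmp;
        return 1;
    }
    return 0;
}

/* Pointer-based sort. q stands for pj + 1, so the slide's test
   "pj >= v" becomes "q > v" and swap(pj) becomes swap(q - 1). */
void sort(int v[], int n)
{
    int *pi, *q;
    for (pi = v; pi < &v[n]; pi++)
        for (q = pi; q > v && swap(q - 1); q--)
            ;   /* empty body: swap() does the work and ends the loop */
}
```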
New Swap Function
• A new swap function
swap:  lw   $t0, 0($a0)        # load p[0]
       lw   $t1, 4($a0)        # load p[1]
       slt  $t2, $t1, $t0      # p[1] < p[0]?
       beq  $t2, $zero, else   # no: skip the swap
       sw   $t1, 0($a0)        # swap
       sw   $t0, 4($a0)        # swap
       addi $v0, $zero, 1      # $v0 = 1
       jr   $ra
else:  addi $v0, $zero, 0      # $v0 = 0
       jr   $ra
New Sort Function
• The sort() function optimized
  - Register usage
      $s0: v
      $s1: &v[n]
      $s2: pi
      $s3: pj
• Need a frame of 5 words to save $ra and $s0-$s3
Sort Prologue and Epilogue
sort:  addi $sp, $sp, -20    # frame of 5 words
       sw   $ra, 16($sp)
       sw   $s3, 12($sp)
       sw   $s2, 8($sp)
       sw   $s1, 4($sp)
       sw   $s0, 0($sp)
       # MIPS code for sort function body
       lw   $s0, 0($sp)
       lw   $s1, 4($sp)
       lw   $s2, 8($sp)
       lw   $s3, 12($sp)
       lw   $ra, 16($sp)
       addi $sp, $sp, 20     # release frame
       jr   $ra
New Sort: Outer Loop
for (pi = v; pi < &v[n]; pi++)
    for (pj = pi - 1; pj >= v && swap(pj); pj--)   // C code for the inner loop
        {}
          add  $s0, $a0, $zero        # $s0 = v
          sll  $a1, $a1, 2            # $a1 = 4*n
          add  $s1, $s0, $a1          # $s1 = &v[n]
          add  $s2, $s0, $zero        # pi = v
          j    for1_tst
for1_loop:
          # MIPS code for the inner loop
          addi $s2, $s2, 4            # pi++
for1_tst: slt  $t0, $s2, $s1          # pi < &v[n]?
          bne  $t0, $zero, for1_loop  # yes? repeat
New Sort: Inner Loop
for (pj = pi-1; pj >= v && swap(pj); pj--)
    {}
           addi $s3, $s2, -4           # pj = pi-1
           j    for2_tst
for2_loop: addi $s3, $s3, -4           # pj--
for2_tst:  slt  $t0, $s3, $s0          # pj < v?
           bne  $t0, $zero, for2_exit  # yes? exit
           add  $a0, $s3, $zero        # $a0 = pj
           jal  swap                   # swap(pj)
           bne  $v0, $zero, for2_loop  # ret 1? continue
for2_exit:
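Both loops above follow the same shape: jump to the test first, and keep the test at the bottom of the loop. The sketch below reproduces that control flow in C with gotos (the names `sort_rotated` and the local `q` are hypothetical; as before, `q` stands for pj + 1 so that C never forms the pointer v - 1). With the test at the bottom, each iteration avoids the unconditional jump that a top-tested loop would execute.

```c
/* Swap p[0] and p[1] if they are out of order and return 1, else 0. */
static int swap(int *p)
{
    if (p[0] > p[1]) {
        int tmp = p[0];
        p[0] = p[1];
        p[1] = tmp;
        return 1;
    }
    return 0;
}

/* Optimized sort with gotos arranged like the MIPS code. */
void sort_rotated(int v[], int n)
{
    int *end = v + n;           /* $s1 = &v[n]: sll + add in MIPS  */
    int *pi = v;                /* $s2 = pi                        */
    int *q;                     /* $s3 = pj, modeled as q = pj + 1 */
    goto for1_tst;
for1_loop:
    q = pi;                     /* pj = pi - 1                     */
    goto for2_tst;
for2_loop:
    q--;                        /* pj--                            */
for2_tst:
    if (q <= v) goto for2_exit;         /* pj < v? yes: exit       */
    if (swap(q - 1)) goto for2_loop;    /* returned 1? continue    */
for2_exit:
    pi++;                       /* pi++                            */
for1_tst:
    if (pi < end) goto for1_loop;       /* pi < &v[n]? repeat      */
}
```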
Lab Mini-Projects
• You will use the sorting code to test your CPU design in the lab mini-projects
• Use the new sorting code
  - The new code is more optimized
  - It will simplify the debugging
FP Instructions in MIPS
Reading: Textbook Ch. 3.5 and pp. B-71 to B-80
• FP hardware is coprocessor 1
  - Adjunct processor that extends the ISA
• Separate FP registers
  - 32 single-precision: $f0, $f1, … $f31
  - Paired for double-precision: $f0/$f1, $f2/$f3, …
  - Release 2 of the MIPS ISA supports 32 × 64-bit FP registers
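To see why a double needs a register pair on a 32-bit FPU, the sketch below splits a 64-bit IEEE 754 value into two 32-bit words; `split_double` is a hypothetical helper, and it assumes `double` is IEEE 754 binary64 with the same byte order as `uint64_t`, as on common platforms. Which half the hardware places in the even register is not modeled here.

```c
#include <stdint.h>
#include <string.h>

/* Split a double into its high and low 32-bit words. */
void split_double(double d, uint32_t *hi, uint32_t *lo)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret the 64 bits safely   */
    *hi = (uint32_t)(bits >> 32);     /* sign, exponent, top of fraction  */
    *lo = (uint32_t)bits;             /* low 32 bits of the fraction      */
}
```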
Chapter 3 — Arithmetic for Computers — 25
FP Instructions in MIPS
• FP instructions operate only on FP registers
  - Programs generally don't do integer ops on FP data, or vice versa
  - More registers with minimal code-size impact
FP Instructions in MIPS
• FP load and store instructions
  - lwc1, ldc1, swc1, sdc1
  - e.g., ldc1 $f8, 32($sp)
  - lwc1, swc1: load/store single-precision
  - ldc1, sdc1: load/store double-precision

FP Instructions in MIPS
• Single-precision arithmetic
  - add.s, sub.s, mul.s, div.s
  - e.g., add.s $f0, $f1, $f6
• Double-precision arithmetic
  - add.d, sub.d, mul.d, div.d
  - e.g., mul.d $f4, $f4, $f6
FP Instructions in MIPS
• Single- and double-precision comparison
  - c.xx.s, c.xx.d (xx is eq, lt, le, …)
  - Sets or clears FP condition-code bit
  - e.g., c.lt.s $f3, $f4
• Branch on FP condition code true or false
  - bc1t, bc1f
  - e.g., bc1t TargetLabel
MIPS Call Convention: FP
• The first two FP parameters are passed in registers
  - 1st FP parameter in $f12 or $f12:$f13
  - 2nd FP parameter in $f14 or $f14:$f15
  - A double-precision parameter takes two registers
  - Extra parameters go on the stack
• $f0 stores a single-precision FP return value
• $f0:$f1 stores a double-precision FP return value
• $f0-$f19 are FP temporary registers
• $f20-$f31 are FP saved temporary registers
FP Example: °F to °C
• C code:
float f2c (float fahr)
{
    return ((5.0/9.0) * (fahr - 32.0));
}
  - fahr in $f12, result in $f0
  - Assume literals are in global memory space, e.g. const5 for 5.0, const9 for 9.0, and const32 for 32.0
• Can FP immediates be encoded in MIPS instructions?
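FP Example: °F to °C
• Compiled MIPS code:
f2c: lwc1  $f16, const5($gp)    # $f16 = 5.0
     lwc1  $f18, const9($gp)    # $f18 = 9.0
     div.s $f16, $f16, $f18     # $f16 = 5.0 / 9.0
     lwc1  $f18, const32($gp)   # $f18 = 32.0
     sub.s $f18, $f12, $f18     # $f18 = fahr - 32.0
     mul.s $f0, $f16, $f18      # $f0 = (5.0/9.0) * (fahr - 32.0)
     jr    $ra                  # return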
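The compiled code can be mirrored statement by statement in C, as sketched below (the local names `f16` and `f18` are assumptions chosen to match the registers). One subtlety: in the C source, `5.0`, `9.0`, and `32.0` are `double` literals, so strict C semantics would compute in double precision and convert the result to `float`; the single-precision `div.s`/`sub.s`/`mul.s` sequence corresponds to writing `5.0f`, `9.0f`, and `32.0f`, which is what this sketch does.

```c
/* Literals stored in global memory, mirroring const5, const9 and
   const32 addressed through $gp in the compiled code. */
static const float const5 = 5.0f;
static const float const9 = 9.0f;
static const float const32 = 32.0f;

/* One C statement per instruction of the compiled MIPS f2c.
   Every operation is single precision, like div.s, sub.s and mul.s. */
float f2c(float fahr)              /* fahr arrives in $f12 */
{
    float f16 = const5;            /* lwc1  $f16, const5($gp)   */
    float f18 = const9;            /* lwc1  $f18, const9($gp)   */
    f16 = f16 / f18;               /* div.s $f16, $f16, $f18    */
    f18 = const32;                 /* lwc1  $f18, const32($gp)  */
    f18 = fahr - f18;              /* sub.s $f18, $f12, $f18    */
    return f16 * f18;              /* mul.s $f0, $f16, $f18     */
}                                  /* jr $ra; result in $f0     */
```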
FP Example: Function Call
extern float fahr, cel;
cel = f2c(fahr);
• Assume fahr is at 100($gp), cel is at 104($gp)
      lwc1 $f12, 100($gp)   # load 1st param
      jal  f2c
      swc1 $f0, 104($gp)    # save ret val
FP Example: Max
double max(double x, double y)
{
    return (x > y) ? x : y;
}
max:   c.lt.d $f14, $f12     # y < x?
       bc1f   else           # if false, do else
       mov.d  $f0, $f12      # $f0:$f1 = x
       jr     $ra
else:  mov.d  $f0, $f14      # $f0:$f1 = y
       jr     $ra
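A C model of the MIPS max code, in which a local `bool` stands in for the FP condition-code bit that `c.lt.d` sets and `bc1f` tests; `max_model` and `fp_cc` are hypothetical names. Testing `y < x` and branching away when it is false is equivalent to the C condition `x > y`.

```c
#include <stdbool.h>

double max_model(double x, double y)     /* x in $f12:$f13, y in $f14:$f15 */
{
    bool fp_cc = (y < x);                /* c.lt.d $f14, $f12 : y < x?      */
    if (!fp_cc) goto else_part;          /* bc1f else                       */
    return x;                            /* mov.d $f0, $f12 ; jr $ra        */
else_part:
    return y;                            /* mov.d $f0, $f14 ; jr $ra        */
}
```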
FP Example: Max
• How to call max?
  - Assume a, b, c are at 104($gp), 112($gp), and 120($gp) (ldc1/sdc1 require doubleword-aligned addresses)
extern double a, b, c;
c = max(a, b);
      ldc1 $f12, 104($gp)   # $f12:$f13 = a
      ldc1 $f14, 112($gp)   # $f14:$f15 = b
      jal  max
      sdc1 $f0, 120($gp)    # c = $f0:$f1
FP Example: Search Value
int search(double X[], int size, double value)
{
    for (int i = 0; i < size; i++)
        if (X[i] == value)
            return 1;
    return 0;
}
Note 1: There are integer and FP parameters, and the return value is an integer
Note 2: A real program may search for a value within a range, e.g. [value - delta, value + delta]
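Note 2 can be sketched as follows; the function name `search_range` and the `delta` parameter are hypothetical. An exact `==` test (the `c.eq.d` in the MIPS code) misses values that differ only by rounding error: in IEEE 754 double precision, 0.1 + 0.2 is not exactly 0.3, but it lies within a small delta of it.

```c
/* Return 1 if some X[i] lies in [value - delta, value + delta], else 0.
   With delta == 0.0 this reduces to the exact comparison of search(). */
int search_range(const double X[], int size, double value, double delta)
{
    for (int i = 0; i < size; i++)
        if (X[i] >= value - delta && X[i] <= value + delta)
            return 1;
    return 0;
}
```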
FP Example: Search Value
search:   add    $t0, $zero, $zero      # i = 0
          j      for_cond
for_loop: sll    $t1, $t0, 3            # $t1 = 8*i
          add    $t1, $a0, $t1          # $t1 = &X[i]
          ldc1   $f2, 0($t1)            # $f2:$f3 = X[i]
          c.eq.d $f2, $f12              # X[i] == value?
          bc1f   endif                  # if false, skip
          addi   $v0, $zero, 1          # $v0 = 1
          jr     $ra                    # return
endif:    addi   $t0, $t0, 1            # i++
for_cond: slt    $t1, $t0, $a1          # i < size?
          bne    $t1, $zero, for_loop   # repeat if true
          add    $v0, $zero, $zero      # to return 0
          jr     $ra
FP Example: Array Multiplication
• X = X + Y × Z
  - All 32 × 32 matrices, 64-bit double-precision elements
• C code:
void mm (double x[][32], double y[][32], double z[][32])
{
    int i, j, k;
    for (i = 0; i != 32; i = i + 1)
        for (j = 0; j != 32; j = j + 1)
            for (k = 0; k != 32; k = k + 1)
                x[i][j] = x[i][j]
                          + y[i][k] * z[k][j];
}
• Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2
FP Example: Array Multiplication
• MIPS code:
      li   $t1, 32         # $t1 = 32 (row size/loop end)
      li   $s0, 0          # i = 0; initialize 1st for loop
L1:   li   $s1, 0          # j = 0; restart 2nd for loop
L2:   li   $s2, 0          # k = 0; restart 3rd for loop
      sll  $t2, $s0, 5     # $t2 = i * 32 (size of row of x)
      addu $t2, $t2, $s1   # $t2 = i * size(row) + j
      sll  $t2, $t2, 3     # $t2 = byte offset of [i][j]
      addu $t2, $a0, $t2   # $t2 = byte address of x[i][j]
      l.d  $f4, 0($t2)     # $f4 = 8 bytes of x[i][j]
L3:   sll  $t0, $s2, 5     # $t0 = k * 32 (size of row of z)
      addu $t0, $t0, $s1   # $t0 = k * size(row) + j
      sll  $t0, $t0, 3     # $t0 = byte offset of [k][j]
      addu $t0, $a2, $t0   # $t0 = byte address of z[k][j]
      l.d  $f16, 0($t0)    # $f16 = 8 bytes of z[k][j]
      …
FP Example: Array Multiplication
…
sll $t0, $s0, 5
addu $t0, $t0, $s2
sll
$t0, $t0, 3
addu $t0, $a1, $t0
l.d
$f18, 0($t0)
mul.d $f16, $f18, $f16
add.d $f4, $f4, $f16
addiu $s2, $s2, 1
bne
$s2, $t1, L3
s.d
$f4, 0($t2)
addiu $s1, $s1, 1
bne
$s1, $t1, L2
addiu $s0, $s0, 1
bne
$s0, $t1, L1
#
#
#
#
#
#
#
#
#
#
#
#
#
#
$t0 = i*32 (size of row of y)
$t0 = i*size(row) + k
$t0 = byte offset of [i][k]
$t0 = byte address of y[i][k]
$f18 = 8 bytes of y[i][k]
$f16 = y[i][k] * z[k][j]
f4=x[i][j] + y[i][k]*z[k][j]
$k k + 1
if (k != 32) go to L3
x[i][j] = $f4
$j = j + 1
if (j != 32) go to L2
$i = i + 1
if (i != 32) go to L1
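The address arithmetic can be checked in C: the sketch below computes byte offsets with the same shifts as the MIPS code and runs the loop nest over raw byte addresses, keeping x[i][j] in a local for the whole k loop just as the MIPS code keeps it in $f4. The names `byte_offset` and `mm_bytes` are hypothetical, and the casts assume the arrays really hold `double`s.

```c
#include <stddef.h>

enum { N = 32 };   /* row size; the value the MIPS code keeps in $t1 */

/* Byte offset of element [r][c] in a row-major N x N array of doubles:
   << 5 multiplies by 32 (elements per row), << 3 by 8 (bytes per double). */
size_t byte_offset(size_t r, size_t c)
{
    size_t t = r << 5;   /* sll  $t2, $s0, 5   : r * 32      */
    t = t + c;           /* addu $t2, $t2, $s1 : r * 32 + c  */
    return t << 3;       /* sll  $t2, $t2, 3   : bytes       */
}

/* mm over raw byte addresses; x, y and z play $a0, $a1 and $a2. */
void mm_bytes(double x[][N], double y[][N], double z[][N])
{
    char *bx = (char *)x, *by = (char *)y, *bz = (char *)z;
    for (size_t i = 0; i != N; i++)
        for (size_t j = 0; j != N; j++) {
            double *px = (double *)(bx + byte_offset(i, j)); /* $t2     */
            double f4 = *px;                                  /* l.d $f4 */
            for (size_t k = 0; k != N; k++) {
                double f16 = *(double *)(bz + byte_offset(k, j)); /* z[k][j] */
                double f18 = *(double *)(by + byte_offset(i, k)); /* y[i][k] */
                f4 = f4 + f18 * f16;                     /* mul.d, add.d */
            }
            *px = f4;                                         /* s.d $f4 */
        }
}
```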


§2.16 Real Stuff: ARM Instructions
ARM & MIPS Similarities
• ARM: the most popular embedded core
• Similar basic set of instructions to MIPS

                       ARM            MIPS
Date announced         1985           1985
Instruction size       32 bits        32 bits
Address space          32-bit flat    32-bit flat
Data alignment         Aligned        Aligned
Data addressing modes  9              3
Registers              15 × 32-bit    31 × 32-bit
Input/output           Memory mapped  Memory mapped
Compare and Branch in ARM
• Uses condition codes for the result of an arithmetic/logical instruction
  - Negative, zero, carry, overflow
  - Compare instructions set condition codes without keeping the result
• Each instruction can be conditional
  - Top 4 bits of the instruction word: condition value
  - Can avoid branches over single instructions
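A C model of ARM-style condition codes; `struct flags`, `cmp_flags`, and `max_signed` are hypothetical names, and this sketches the flag semantics rather than any real API. A compare computes a - b, sets N, Z, C, and V, and discards the result; an instruction predicated on GT then executes only if Z is clear and N equals V.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool n, z, c, v; };   /* negative, zero, carry, overflow */

/* Flags a compare (a - b) would set, without keeping the result. */
struct flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                       /* wraps modulo 2^32   */
    struct flags f;
    f.n = (r >> 31) & 1;                      /* sign bit of result  */
    f.z = (r == 0);                           /* result is zero      */
    f.c = (a >= b);                           /* carry = no borrow   */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1;    /* signed overflow     */
    return f;
}

/* After CMP a, b, the GT condition (signed a > b) is Z == 0 && N == V.
   ARM would use one predicated move here instead of a branch. */
int32_t max_signed(int32_t a, int32_t b)
{
    struct flags f = cmp_flags((uint32_t)a, (uint32_t)b);
    bool gt = !f.z && (f.n == f.v);
    return gt ? a : b;
}
```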
Instruction Encoding

§2.17 Real Stuff: x86 Instructions
The Intel x86 ISA
• Evolution with backward compatibility
  - 8080 (1974): 8-bit microprocessor
    - Accumulator, plus 3 index-register pairs
  - 8086 (1978): 16-bit extension to 8080
    - Complex instruction set (CISC)
  - 8087 (1980): floating-point coprocessor
    - Adds FP instructions and register stack
  - 80286 (1982): 24-bit addresses, MMU
    - Segmented memory mapping and protection
  - 80386 (1985): 32-bit extension (now IA-32)
    - Additional addressing modes and operations
    - Paged memory mapping as well as segments
The Intel x86 ISA
• Further evolution…
  - i486 (1989): pipelined, on-chip caches and FPU
    - Compatible competitors: AMD, Cyrix, …
  - Pentium (1993): superscalar, 64-bit datapath
    - Later versions added MMX (Multi-Media eXtension) instructions
    - The infamous FDIV bug
  - Pentium Pro (1995), Pentium II (1997)
    - New microarchitecture (see Colwell, The Pentium Chronicles)
  - Pentium III (1999)
    - Added SSE (Streaming SIMD Extensions) and associated registers
  - Pentium 4 (2001)
    - New microarchitecture
    - Added SSE2 instructions
The Intel x86 ISA
• And further…
  - AMD64 (2003): extended architecture to 64 bits
  - EM64T, Extended Memory 64 Technology (2004)
    - AMD64 adopted by Intel (with refinements)
    - Added SSE3 instructions
  - Intel Core (2006)
    - Added SSE4 instructions, virtual machine support
  - AMD64 (announced 2007): SSE5 instructions
    - Intel declined to follow, instead…
  - Advanced Vector Extension (announced 2008)
    - Longer SSE registers, more instructions
• If Intel didn't extend with compatibility, its competitors would!
  - Technical elegance ≠ market success
Basic x86 Registers
Basic x86 Addressing Modes
• Two operands per instruction
    Source/dest operand   Second source operand
    Register              Register
    Register              Immediate
    Register              Memory
    Memory                Register
    Memory                Immediate
• Memory addressing modes
  - Address in register
  - Address = Rbase + displacement
  - Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
  - Address = Rbase + 2^scale × Rindex + displacement
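The most general memory mode can be computed as below; `effective_address` is a hypothetical helper for a 32-bit address space. For example, with the base register holding the start of an `int` array, the index register holding i, and scale = 2 (factor 4), the mode addresses element i plus a byte displacement. The simpler modes are special cases with index 0 or displacement 0.

```c
#include <stdint.h>

/* Address = Rbase + 2^scale * Rindex + displacement.
   The scale is a 2-bit field, so only factors 1, 2, 4, 8 are possible. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t disp)
{
    uint32_t scaled = index << (scale & 3u);  /* 2^scale * index            */
    return base + scaled + (uint32_t)disp;    /* arithmetic wraps mod 2^32  */
}
```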
x86 Instruction Encoding
• Variable length encoding
  - Postfix bytes specify addressing mode
  - Prefix bytes modify operation
    - Operand length, repetition, locking, …
Implementing IA-32
• Complex instruction set makes implementation difficult
  - Hardware translates instructions to simpler microoperations
    - Simple instructions: 1–1
    - Complex instructions: 1–many
  - Microengine similar to RISC
  - Market share makes this economically viable
• Comparable performance to RISC
  - Compilers avoid complex instructions

§2.18 Fallacies and Pitfalls
Fallacies
• Powerful instruction → higher performance
  - Fewer instructions required
  - But complex instructions are hard to implement
    - May slow down all instructions, including simple ones
  - Compilers are good at making fast code from simple instructions
• Use assembly code for high performance
  - But modern compilers are better at dealing with modern processors
  - More lines of code → more errors and less productivity
Fallacies
• Backward compatibility → instruction set doesn't change
  - But they do accrete more instructions
[Figure: x86 instruction set]
Pitfalls
• Sequential words are not at sequential addresses
  - Increment by 4, not by 1!
• Keeping a pointer to an automatic variable after procedure returns
  - e.g., passing pointer back via an argument
  - Pointer becomes invalid when stack popped
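A short C sketch of both pitfalls; `next_word_address` and `compute_into` are hypothetical helpers. C pointer arithmetic scales by the element size automatically, but a raw byte address, like a MIPS register holding an address, must be advanced by 4 per word. For the second pitfall, the safe pattern copies the value into storage the caller owns instead of handing back the address of a local.

```c
#include <stdint.h>

/* Pitfall 1: consecutive 32-bit words are 4 bytes apart. */
uintptr_t next_word_address(uintptr_t byte_addr)
{
    return byte_addr + sizeof(int32_t);   /* +4, not +1 */
}

/* Pitfall 2, safe alternative: the caller owns the storage. Returning
   &local instead would leave a pointer into a popped stack frame. */
void compute_into(int32_t *out)
{
    int32_t local = 42;                   /* lives only during this call */
    *out = local;                         /* copy the value, not &local  */
}
```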

§2.19 Concluding Remarks
Concluding Remarks
• Design principles
  1. Simplicity favors regularity
  2. Smaller is faster
  3. Make the common case fast
  4. Good design demands good compromises
• Layers of software/hardware
  - Compiler, assembler, hardware
• MIPS: typical of RISC ISAs
  - cf. x86
Concluding Remarks
• Measure MIPS instruction executions in benchmark programs
  - Consider making the common case fast
  - Consider compromises

Instruction class  MIPS examples                        SPEC2006 Int  SPEC2006 FP
Arithmetic         add, sub, addi                       16%           48%
Data transfer      lw, sw, lb, lbu, lh, lhu, sb, lui    35%           36%
Logical            and, or, nor, andi, ori, sll, srl    12%           4%
Cond. branch       beq, bne, slt, slti, sltiu           34%           8%
Jump               j, jr, jal                           2%            0%