Load and Store - Iowa State University
CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Exam 1 Review
Dr. Zhao Zhang
Iowa State University
What We Have Learned
Ch. 1: Computer Abstractions and Technology
Technology Trends
CPU Performance
Instruction count, CPI, and cycle time
Processor power efficiency
Processor manufacturing and cost
Chapter 1 — Computer Abstractions and Technology — 2
Question Styles and Coverage
Short conceptual questions
Calculation questions
Performance improvement (speedup)
Power rate and energy saving
CPU time, CPI, Instruction Count, Cycle Time
CPU time = # Cycles × CT = IC × CPI × CT
Speedup = Old Time / New Time
The coverage excludes
Manufacturing and cost
Question 1
A MIPS processor runs at 1.0GHz, and for a given
benchmark program its CPI is 1.5. A design
optimization will improve the clock rate to 1.5GHz
and increase the CPI to 1.8. What is the speedup
from the optimization?
Instruction count remains the same
Clock rate change: 1.5/1.0 = 1.5x
Cycle time improvement factor is 1.50x
CPI change: 1.8/1.5 = 1.2x
Improvement factor is 0.83x (degradation)
Overall performance improvement is 1.50*0.83 = 1.25x
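The arithmetic in Question 1 can be checked with a minimal C sketch of the two formulas, CPU time = IC × CPI × CT and Speedup = Old Time / New Time. The function names `cpu_time` and `speedup` are hypothetical helpers, not part of the course material, and the clock rate is passed in Hz so that cycle time is 1 / clock rate.

```c
#include <assert.h>

/* CPU time in seconds: instruction count x CPI x cycle time,
   where cycle time = 1 / clock rate (clock rate given in Hz). */
double cpu_time(double instr_count, double cpi, double clock_hz)
{
    assert(clock_hz > 0.0);
    return instr_count * cpi / clock_hz;
}

/* Speedup = old time / new time; a value above 1 is an improvement. */
double speedup(double old_time, double new_time)
{
    assert(new_time > 0.0);
    return old_time / new_time;
}
```

Because the instruction count is the same before and after, any positive value can be used for it: `speedup(cpu_time(ic, 1.5, 1.0e9), cpu_time(ic, 1.8, 1.5e9))` evaluates to 1.25 for every `ic > 0`.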
Question 2
A processor spends 60% of its time on load/store
instructions. A new design improves
load/store performance by 2.0 times. What is
the overall performance improvement?
Amdahl’s Law: Speedup = 1/((1-f)+f/s)
f: Fraction of time that the optimization applies to
s: The improvement factor of the optimization
Speedup = 1/(0.4 + 0.6/2.0) = 1/0.7 = 1.43
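Amdahl's Law as written above translates directly into code. The following is a small sketch; the name `amdahl_speedup` is an assumption chosen for illustration, with `f` and `s` meaning the same as in the formula.

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction f of the execution
   time is improved by a factor s. */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

For Question 2, `amdahl_speedup(0.6, 2.0)` gives 1/0.7, about 1.43. Even an unboundedly large `s` cannot push the result past 1/(1-f) = 2.5 here, because the unimproved 40% remains.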
What We Have Learned
Ch. 2, Instructions: Language of the Computer
Instruction set architecture
MIPS binary instruction format
Plus floating-point instructions
Question 3
Translate the following C statement into MIPS.
Variables f, g, h are global and located at
100($gp), 104($gp) and 108($gp), respectively.
extern int f, g, h;
f = g + 4 * h;
Try to predict how many instructions you will need
Question 3
# Load g, load h, multiply, add, store
lw   $t0, 104($gp)    # load g
lw   $t1, 108($gp)    # load h
sll  $t1, $t1, 2      # 4*h
add  $t0, $t0, $t1    # g+4*h
sw   $t0, 100($gp)    # store f
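The five-instruction answer (load g, load h, shift, add, store) relies on a left shift by 2 being the same as multiplying by 4. The sketch below mirrors the register steps in C using 32-bit unsigned values, so wraparound matches the hardware; `compute_f` is a hypothetical name, and the loads and the store are represented by the function's parameters and return value.

```c
#include <assert.h>
#include <stdint.h>

/* Register-level model of f = g + 4 * h. */
uint32_t compute_f(uint32_t g, uint32_t h)
{
    uint32_t t0 = g;          /* lw  $t0, 104($gp) */
    uint32_t t1 = h;          /* lw  $t1, 108($gp) */
    t1 = t1 << 2;             /* sll $t1, $t1, 2   */
    t0 = t0 + t1;             /* add $t0, $t0, $t1 */
    assert(t0 == g + 4u * h); /* shift by 2 equals multiply by 4 (mod 2^32) */
    return t0;                /* sw  $t0, 100($gp) */
}
```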
Exam Strategy
In your exam, write comments with the MIPS code
It helps you write the code
It helps the grader understand your code
You may get more partial credit in case your code is not 100% correct
Load and Store
Three factors: address, size and extension
Load/store word: lw, sw
Half word: lh, lhu, sh
Byte: lb, lbu, sb
Choose sign extension or zero extension when loading a half word or a byte
Floating-point load and store
Single precision: lwc1, swc1
Double precision: ldc1, sdc1
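The difference between the signed loads (lb, lh) and the unsigned loads (lbu, lhu) is only in how the upper bits of the 32-bit register are filled. The following C sketch models that step; `zero_extend` and `sign_extend` are hypothetical helper names, and `raw` stands for the byte or half word read from memory.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension (lbu, lhu): the upper register bits become 0. */
uint32_t zero_extend(uint32_t raw, unsigned bits)
{
    assert(bits == 8 || bits == 16);
    return raw & ((1u << bits) - 1u);
}

/* Sign extension (lb, lh): the upper register bits copy the top bit
   of the loaded value. */
int32_t sign_extend(uint32_t raw, unsigned bits)
{
    assert(bits == 8 || bits == 16);
    uint32_t sign_bit = 1u << (bits - 1);
    uint32_t value = raw & ((1u << bits) - 1u);
    return (int32_t)(value ^ sign_bit) - (int32_t)sign_bit;
}
```

For the byte 0xFF, `sign_extend(0xFF, 8)` yields -1 while `zero_extend(0xFF, 8)` yields 255, which is why a C `unsigned short` or `unsigned char` should be loaded with lhu or lbu.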
Array access
Load from an array element
extern unsigned short X[];
h = X[i];
Assume h in $s2, X in $s0, i in $s1.
sll  $t0, $s1, 1      # $t0=i*2
add  $t0, $s0, $t0    # $t0=&X[i]
lhu  $s2, 0($t0)      # h=X[i]
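The sll/add pair computes the byte address of an element as base + index × element size, with the multiplication done as a shift. A minimal C sketch of that address calculation follows; `element_address` is a hypothetical name, and `log2_size` is the shift amount (1 for a half word, 2 for a word, 3 for a double).

```c
#include <assert.h>
#include <stdint.h>

/* Byte address of element `index` in an array starting at `base`,
   for elements of size 2^log2_size bytes. */
uintptr_t element_address(uintptr_t base, uint32_t index, unsigned log2_size)
{
    assert(log2_size <= 3);   /* element sizes of 1, 2, 4, or 8 bytes */
    return base + ((uintptr_t)index << log2_size);
}
```

With a base of 0x1000 and index 3, the result is 0x1006 for `unsigned short`, 0x100C for `int`, and 0x1018 for `double`.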
Array Access
Store to an array element
extern int Y[];
Y[j] = g;
Assume g in $s2, Y in $s0, j in $s1.
sll  $t0, $s1, 2      # $t0=j*4
add  $t0, $s0, $t0    # $t0=&Y[j]
sw   $s2, 0($t0)      # Y[j]=g
Array Access
Load and store floating point numbers
extern double X[], Y[];
Y[i] = X[i];
Assume i in $s0, X in $a0, Y in $a1
sll  $t0, $s0, 3      # $t0=8*i
add  $t1, $a0, $t0    # $t1=&X[i]
ldc1 $f0, 0($t1)      # $f0:$f1=X[i]
add  $t1, $a1, $t0    # $t1=&Y[i]
sdc1 $f0, 0($t1)      # Y[i]=$f0:$f1
16-bit and 32-bit Constants
Load a 16-bit immediate
f = 0x1000; // f in $s0
addi $s0, $zero, 0x1000
Load a 32-bit immediate
f = 0xFFFF1000;
lui  $s0, 0xFFFF
ori  $s0, $s0, 0x1000
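The lui/ori pair builds a 32-bit constant from two 16-bit halves: lui places its immediate in the upper half and clears the lower half, and ori fills in the lower half without sign extension. The C sketch below models the three instructions on 32-bit values; the names `mips_lui`, `mips_ori`, and `mips_addi` are hypothetical, and each immediate is assumed to fit in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* lui: immediate goes to the upper 16 bits, lower 16 bits are 0. */
uint32_t mips_lui(uint32_t imm16)
{
    assert(imm16 <= 0xFFFFu);
    return imm16 << 16;
}

/* ori: immediate is zero-extended, then ORed into the register. */
uint32_t mips_ori(uint32_t rs, uint32_t imm16)
{
    assert(imm16 <= 0xFFFFu);
    return rs | imm16;
}

/* addi: immediate is sign-extended before the add (overflow trap ignored). */
uint32_t mips_addi(uint32_t rs, uint32_t imm16)
{
    assert(imm16 <= 0xFFFFu);
    uint32_t extended = (imm16 ^ 0x8000u) - 0x8000u; /* sign-extend mod 2^32 */
    return rs + extended;
}
```

`mips_ori(mips_lui(0xFFFF), 0x1000)` produces 0xFFFF1000. The sign extension in `mips_addi` is the reason ori, not addi, is the safer choice for the lower half: `mips_addi(0, 0x8000)` gives 0xFFFF8000 rather than 0x00008000.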
Pointer Access
Pointer access
int h, *p;
Assume h in $t0, p in $s0.
h = *p;
lw   $t0, 0($s0)      # h = *p
*p = h;
sw   $t0, 0($s0)      # *p = h
Branches
Only two branches in the original MIPS
beq  rs, rt, label
bne  rs, rt, label
Branch if true/non-zero
bne  rs, $zero, label
Branch if false/zero
beq  rs, $zero, label
If-else Statement
Evaluate condition, branch if false
if (a < 0)
a = -a;
Assume a in $s0
slt  $t0, $s0, $zero      # a < 0?
beq  $t0, $zero, endif    # false? skip
sub  $s0, $zero, $s0      # a = -a
endif:
If-else Structure
Evaluate condition, branch if false
if (a > b) max = a; else max = b;
Assume max in $s2, a in $s0, b in $s1
      slt  $t0, $s1, $s0      # b < a
      beq  $t0, $zero, else   # false?
      add  $s2, $s0, $zero    # max = a
      j    endif
else: add  $s2, $s1, $zero    # max = b
endif:
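The "evaluate condition, branch if false" pattern can be written in C with explicit gotos, which makes the correspondence with the MIPS code line by line. This is only an illustrative sketch; the function name `max_with_goto` and the label `else_label` are made up for the example (`else` itself is a C keyword and cannot be a label).

```c
#include <assert.h>

/* if (a > b) max = a; else max = b;  written in branch-if-false form. */
int max_with_goto(int a, int b)
{
    int max;
    int t0 = (b < a);               /* slt $t0, $s1, $s0    */
    if (t0 == 0) goto else_label;   /* beq $t0, $zero, else */
    max = a;                        /* add $s2, $s0, $zero  */
    goto endif;                     /* j   endif            */
else_label:
    max = b;                        /* add $s2, $s1, $zero  */
endif:
    assert(max == (a > b ? a : b)); /* agrees with the structured form */
    return max;
}
```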
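FOR Loop
[Figure: control and data flow graph of a for loop (Init-expr, Cond with T and F edges, For-body, Incr-expr) shown next to its linear code layout]
Linear Code Layout
Init-expr
Jump (to Test cond)
For-body
Incr-expr
Test cond
Branch if true (back to For-body)
(Optional: prologue and epilogue)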
Function with For-loop
Translate the following C function into MIPS
short checksum(short X[], int N)
{
int i;
short checksum = 0;
for (i = 0; i < N; i++)
checksum = checksum ^ X[i];
return checksum;
}
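Before translating to MIPS, it can help to rewrite the loop in the linear layout first: init, jump to the test, body, increment, test, branch back if true. The sketch below does that in C with gotos; `checksum_goto` is a hypothetical name, and the behavior is intended to match the `checksum` function above.

```c
#include <assert.h>
#include <stddef.h>

/* Same computation as checksum(), laid out as straight-line code. */
short checksum_goto(const short X[], int N)
{
    int i;
    short checksum = 0;
    assert(N <= 0 || X != NULL);

    i = 0;                                 /* init-expr       */
    goto loop_cond;                        /* jump to test    */
loop:
    checksum = (short)(checksum ^ X[i]);   /* for-body        */
    i++;                                   /* incr-expr       */
loop_cond:
    if (i < N) goto loop;                  /* test, branch if true */
    return checksum;
}
```

Placing the test at the bottom means each iteration executes one conditional branch and no unconditional jump; the single `goto loop_cond` runs only once, on entry.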
Function with For-loop
# X=>$a0, N=>$a1, i=>$t0, checksum=>$v0
checksum:
    addi $v0, $zero, 0      # checksum = 0
    addi $t0, $zero, 0      # i = 0
    j    loop_cond
loop:
    sll  $t1, $t0, 1        # i*2
    add  $t1, $a0, $t1      # &X[i]
    lh   $t1, 0($t1)        # load X[i]
    xor  $v0, $v0, $t1      # checksum ^= X[i]
    addi $t0, $t0, 1        # i++
loop_cond:
    slt  $t1, $t0, $a1      # i < N
    bne  $t1, $zero, loop   # loop
    jr   $ra
Leaf and Non-Leaf Functions
Leaf function doesn’t call another function
Stack frame is not necessary
Prefer to use temp registers (t-registers)
Non-leaf function calls some other function(s)
Must use a stack frame, has to save $ra
Usually has to use saved registers (s-registers)
Non-Leaf Function
What is the size of the frame?
extern short xor(short, short);
short checksum(short X[], int N)
{
int i;
short checksum = 0;
for (i = 0; i < N; i++)
checksum = xor(checksum, X[i]);
return checksum;
}
Non-Leaf Function
X, N, i, and $ra must be preserved
Need a stack frame of 16 bytes
addi $sp, $sp, -16
sw   $ra, 12($sp)       # for return address
sw   $s2, 8($sp)
sw   $s1, 4($sp)
sw   $s0, 0($sp)
add  $s0, $a0, $zero    # $s0 = X
add  $s1, $a1, $zero    # $s1 = N
addi $s2, $zero, 0      # i = 0
Non-Leaf Function
…                       # function body
lw   $s0, 0($sp)
lw   $s1, 4($sp)
lw   $s2, 8($sp)
lw   $ra, 12($sp)
addi $sp, $sp, 16
jr   $ra
Register Name and Call Convention
NAME      Number  Use                                                     Preserved?
$zero     0       Constant value 0                                        N/A
$at       1       Assembler temporary                                     No
$v0-$v1   2-3     Values for function results and expression evaluation   No
$a0-$a3   4-7     Arguments                                               No
$t0-$t7   8-15    Temporaries                                             No
$s0-$s7   16-23   Saved temporaries                                       Yes
$t8-$t9   24-25   Temporaries                                             No
$k0-$k1   26-27   Saved for OS kernel                                     No
$gp       28      Global pointer                                          Yes
$sp       29      Stack pointer                                           Yes
$fp       30      Frame pointer                                           Yes
$ra       31      Return address                                          Yes
MIPS Call Convention: FP
The first two FP parameters in registers
1st parameter in $f12 or $f12:$f13
A double-precision parameter takes two registers
2nd FP parameter in $f14 or $f14:$f15
Extra parameters on the stack
$f0 stores single-precision FP return value
$f0:$f1 stores double-precision FP return value
$f0-$f19 are FP temporary registers
$f20-$f31 are FP saved temporary registers
FP Example: Call a Function
extern double a, b, c;
extern double max(double, double);
c = max(a, b);
ldc1 $f12, 100($gp)    # $f12:$f13 = a
ldc1 $f14, 108($gp)    # $f14:$f15 = b
jal  max
sdc1 $f0, 116($gp)     # c = $f0:$f1
Assume a, b, c assigned to 100($gp), 108($gp), and 116($gp)
FP Instructions in MIPS
Single-precision arithmetic
add.s, sub.s, mul.s, div.s
e.g., add.s $f0, $f1, $f6
Double-precision arithmetic
add.d, sub.d, mul.d, div.d
e.g., mul.d $f4, $f4, $f6
Chapter 3 — Arithmetic for Computers — 29
FP Instructions in MIPS
Single- and double-precision comparison
c.xx.s, c.xx.d (xx is eq, lt, le, …)
Sets or clears FP condition-code bit
e.g. c.lt.s $f3, $f4
Branch on FP condition code true or false
bc1t, bc1f
e.g., bc1t TargetLabel