COMP 3221 Microprocessors and Embedded Systems Lecture 4: Memory Access


COMP 3221
Microprocessors and Embedded Systems
Lecture 4: Memory Access
http://www.cse.unsw.edu.au/~cs3221
March, 2004
Modified from Notes by Saeid Nooshabadi
Elec2041 lec-11-mem-I.1
Saeid Nooshabadi
Data Transfer: Memory to Reg (#4/4)
° Example: ldr a1, [v1, #8]
This instruction takes the pointer in v1, adds 8 bytes to it, and
then loads the value from the memory location at this
calculated address into register a1
[Figure: array elements arr[0] to arr[3] in memory; offset #8 from the base selects arr[2], which holds 25]
° Notes:
• v1 is called the base register
• 8 is called the offset
• the offset is generally used to access elements of an array: the base
register points to the beginning of the array
° Example: ldr a1, [v1, v2]
This instruction takes the pointer in v1, adds the index offset held in
register v2 to it, and then loads the value from the memory location
at this calculated address into register a1
° Notes:
• v1 is called the base register
• v2 is called the index register
• the index is generally used to access elements of an array with a
variable index: the base register points to the beginning of the array
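As a rough C analogy for base-plus-offset addressing, here is a minimal sketch. It assumes memory can be modelled as a flat byte array; the helper names load_word and demo_offset_load are hypothetical and not part of the lecture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modelling "ldr rd, [base, offset]" on a byte array
   that stands in for memory: returns the 32-bit word at base + offset. */
uint32_t load_word(const uint8_t *mem, uint32_t base, uint32_t offset)
{
    uint32_t addr = base + offset;   /* effective address */
    uint32_t value;
    assert(addr % 4 == 0);           /* word loads are expected to be aligned */
    memcpy(&value, mem + addr, sizeof value);
    return value;
}

/* Places a 4-word array at address 0 and reads arr[2] through offset 8. */
uint32_t demo_offset_load(void)
{
    uint32_t arr[4] = {10, 20, 25, 40};
    uint8_t mem[16];
    memcpy(mem, arr, sizeof arr);
    return load_word(mem, 0, 8);     /* like: ldr a1, [v1, #8] */
}
```

The offset is counted in bytes, so offset 8 reaches the third word, arr[2].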
Data Transfer: Other Mem to Reg Variants (#1/2)
° Pre-Indexed Load Example:
ldr a1, [v1, #12]!
This instruction takes the pointer in v1, adds 12 bytes to
it, and then loads the value from the memory location at
this calculated address into register a1.
v1 is then updated to the computed sum of v1 and 12
(v1 ← v1 + 12).
° Pre-Indexed Load Example:
ldr a1, [v1, v2]!
This instruction takes the pointer in v1, adds the index
offset held in register v2 to it, and then loads the value from the
memory location at this calculated address into register a1.
v1 is then updated to the computed sum of v1 and v2
(v1 ← v1 + v2).
Data Transfer: Other Mem to Reg Variants (#2/2)
° Post-Indexed Load Example:
ldr a1, [v1], #12
This instruction loads the value from the memory location pointed
to by register v1 into register a1.
v1 is then updated to the computed sum of v1 and 12
(v1 ← v1 + 12).
° Post-Indexed Load Example:
ldr a1, [v1], v2
This instruction loads the value from the memory location pointed
to by register v1 into register a1.
v1 is then updated to the computed sum of v1 and v2
(v1 ← v1 + v2).
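The difference between the pre-indexed and post-indexed forms can be sketched in C with a pointer that is updated in place. This is only an analogy: the function names are hypothetical, and the offset is assumed to be a multiple of 4 so that it maps onto whole words.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-indexed with write-back, like "ldr rd, [rn, #off]!":
   update the pointer first, then load from the new address. */
uint32_t load_pre_indexed(const uint32_t **rn, int32_t off_bytes)
{
    assert(off_bytes % 4 == 0);
    *rn += off_bytes / 4;            /* rn <- rn + off */
    return **rn;
}

/* Post-indexed, like "ldr rd, [rn], #off":
   load from the current address, then update the pointer. */
uint32_t load_post_indexed(const uint32_t **rn, int32_t off_bytes)
{
    uint32_t value;
    assert(off_bytes % 4 == 0);
    value = **rn;
    *rn += off_bytes / 4;            /* rn <- rn + off */
    return value;
}
```

Both forms leave the pointer at the same final address; they differ only in which address the load uses.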
Data Transfer: Reg to Memory (1/2)
° Also want to store value from a register into
memory
° Store instruction syntax is identical to Load
instruction syntax
° Instruction Name:
str (meaning Store from Register, so 32 bits
or one word are stored from register to
memory at a time)
Data Transfer: Reg to Memory (2/2)
° Example: str a1,[v1, #12]
This instruction will take the pointer in v1, add 12 bytes to
it, and then store the value from register a1 into the
memory address pointed to by the calculated sum
° Example: str a1,[v1, v2]
This instruction will take the pointer in v1, add the value in register v2
to it, and then store the value from register a1 into the
memory address pointed to by the calculated sum.
Data Transfer: Other Reg to Mem Variants (#1/2)
° Pre-Indexed Store Example:
str a1, [v1, #12]!
This instruction takes the pointer in v1, adds 12 bytes to it,
and then stores the value from register a1 into the memory
location at the calculated address.
v1 is then updated to the computed sum of v1 and 12
(v1 ← v1 + 12).
° Pre-Indexed Store Example:
str a1, [v1, v2]!
This instruction takes the pointer in v1, adds the value in register v2 to
it, and then stores the value from register a1 into the memory
location at the calculated address.
v1 is then updated to the computed sum of v1 and v2
(v1 ← v1 + v2).
Data Transfer: Other Reg to Mem Variants (#2/2)
° Post-Indexed Store Example:
str a1, [v1], #12
This instruction stores the value from register a1 into the
memory location pointed to by register v1.
v1 is then updated to the computed sum of v1 and 12
(v1 ← v1 + 12).
° Post-Indexed Store Example:
str a1, [v1], v2
This instruction stores the value from register a1 into the
memory location pointed to by register v1.
v1 is then updated to the computed sum of v1 and v2
(v1 ← v1 + v2).
Pointers v. Values
° Key Concept: A register can hold any 32-bit
value. That value can be a (signed) int, an
unsigned int, a pointer (memory
address), etc.
° If you write
add v3, v2, v1
then v1 and v2 had better contain values
° If you write
ldr a1, [v1]
then v1 had better contain a pointer
° Don’t mix these up!
Addressing: Byte vs. halfword vs. word
° Every word in memory has an address, similar to an index
in an array
° Early computers numbered words like C numbers
elements of an array:
• Memory[0], Memory[1], Memory[2], …
Called the “address” of a word
° Computers needed to access 8-bit bytes, half
words (2 bytes/halfword) as well as words (4
bytes/word)
° Today machines address memory as bytes, hence
• Half word addresses differ by 2
Memory[0], Memory[2], Memory[4], …
• word addresses differ by 4
Memory[0], Memory[4], Memory[8], …
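The address arithmetic behind byte addressing can be written down as a one-line function. This is a minimal sketch; element_address is a hypothetical name introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address of element `index` in an array that starts at `base`
   and whose elements are `elem_size` bytes each (1, 2 or 4). */
uint32_t element_address(uint32_t base, uint32_t index, uint32_t elem_size)
{
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4);
    return base + index * elem_size;
}
```

With a base of 0, consecutive halfwords land at 0, 2, 4, ... and consecutive words at 0, 4, 8, ..., matching the lists above.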
Compilation with Memory
° What offset in ldr selects my_Array[8] (an array of words) in C?
° 4 x 8 = 32: the offset is counted in bytes, and each word is 4 bytes
° Compile by hand using registers:
g = h + my_Array[8];
• g: v1, h: v2, v3: base address of my_Array
° First transfer from memory to register:
ldr v1, [v3, #32] ; v1 gets my_Array[8]
• Add 32 to v3 to select my_Array[8], put into v1
° Next add it to h and place in g:
add v1, v2, v1 ; v1 = h + my_Array[8]
Same thing in pictures
° v3 contains the address of the base of my_Array.
° The value in register v3 is an address; think of it as a pointer into
memory.
° ldr v1, [v3, #32]
Adds the offset 8 x 4 = 32 to the base to select my_Array[8], and
puts its value into v1.
° add v1, v2, v1
The value in register v1 becomes the sum of v2 (h) and v1, which is g.
[Figure: memory from address 0 to 0xFFFFFFFF, with my_Array[0] at the base address and my_Array[8] at offset 32; registers v1 (g), v2 (h) and v3 (pointer to my_Array)]
Compile with variable index
° What if array index not a constant?
g = h + my_Array[i];
• g: v1, h: v2, i: v3,
v4: base address of my_Array
° To load my_Array[i] into a register, first turn i
into a byte offset: multiply by 4
° How to multiply using adds?
• i + i = 2i, 2i + 2i = 4i
mov a1, v3 ; a1 = i
add a1, a1, a1 ; a1 = 2*i
add a1, a1, a1 ; a1 = 4*i
Better alternative: mov a1, v3, lsl #2
Compile with variable index, con’t
° Now load my_Array[i], located at (address of my_Array[0]) + 4*i,
into register v1:
ldr v1, [v4, a1] ; v1 = my_Array[i]
° Finally add h to it and put the sum in g:
add v1, v1, v2 ; g = h + my_Array[i]
Compile with variable index: Summary
° C statement:
g = h + my_Array[i];
° Compiled ARM assembly instructions:
mov a1, v3, lsl #2 ; a1 = 4*i
ldr v1, [v4, a1] ; v1 = my_Array[i] (v4: base register, a1: index register)
° Finally add h to it and put the sum in g:
add v1, v1, v2 ; g = h + my_Array[i]
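The same three steps can be spelled out in C, using explicit byte arithmetic instead of the compiler's built-in array indexing. This is a sketch for illustration only; add_indexed is a hypothetical name, and the comments map each line to the assembly under the register assignment used above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* g = h + my_Array[i], done the way the assembly does it:
   scale the index to a byte offset, add it to the base, then load. */
int32_t add_indexed(const int32_t *my_array, uint32_t i, int32_t h)
{
    uint32_t byte_offset = i << 2;                       /* mov a1, v3, lsl #2 */
    const unsigned char *addr =
        (const unsigned char *)my_array + byte_offset;   /* v4 + a1 */
    int32_t element;
    memcpy(&element, addr, sizeof element);              /* ldr v1, [v4, a1] */
    return element + h;                                  /* add v1, v1, v2 */
}
```

In ordinary C one would simply write h + my_array[i]; the compiler inserts the multiplication by 4.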
Compile with variable index Example
° Compile this into ARM code:
B_Array[i] = h + A_Array[i];
• h: v1, i:v2, v3:base address of A_Array,
v4:base address of B_Array
Compile with variable index Example (Solution)
° Compile this C code into ARM:
B_Array[i] = h + A_Array[i];
• h: v1, i: v2, v3: base address of A_Array, v4: base
address of B_Array
mov a1, v2, lsl #2 ; a1 = 4*i
ldr a2, [v3, a1] ; v3 + a1 = address of A_Array[i]
                 ; (v3: base register, a1: index register)
                 ; a2 = A_Array[i]
add a2, a2, v1 ; a2 = h + A_Array[i]
str a2, [v4, a1] ; v4 + a1 = address of B_Array[i]
                 ; B_Array[i] = a2
Notes about Memory
° Pitfall: Forgetting that sequential word
addresses in machines with byte
addressing do not differ by 1.
• Many an assembly language programmer has toiled over
errors made by assuming that the address of the next
word can be found by incrementing the address in a
register by 1 instead of by the word size in bytes.
• So remember that for both ldr and str, the sum of the
base address and the offset must be a multiple of 4 (to be
word aligned)
More Notes about Memory: Alignment (#1/2)
° ARM requires that all words start at addresses
that are multiples of 4 bytes
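[Figure: memory drawn as rows of four bytes (byte lanes 3, 2, 1, 0), showing one aligned word and several words that are not aligned]
° This is called alignment: objects must fall on an address
that is a multiple of their size.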
° Some machines like Intel allow non-aligned
accesses
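The alignment rule is easy to state as a check in C. This is a minimal sketch, assuming object sizes are powers of two (1, 2 or 4 bytes here); is_aligned is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` is a multiple of `size`, where `size` is a power of two. */
bool is_aligned(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);   /* power of two only */
    return (addr & (size - 1)) == 0;
}
```

For example, address 0x80 is word aligned, 0x82 is halfword aligned but not word aligned, and 0x81 is only byte aligned.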
More Notes about Memory: Alignment (#2/2)
° A non-aligned word load rotates the bytes of the addressed word
to the right
[Figure: the word at address 0x80 holds 0x0982a22e, with byte 0x2e at 0x80, 0xa2 at 0x81, 0x82 at 0x82 and 0x09 at 0x83]
Word load (ldr) from address    Result in a1
0x80                            0x0982a22e
0x81                            0x2e0982a2
0x82                            0xa22e0982
0x83                            0x82a22e09
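The values in this example are consistent with a right rotation of the aligned word by 8 bits for every byte of misalignment. The C sketch below models only the behaviour shown on this slide; the function names are hypothetical, and other processors (or other ARM configurations) may handle non-aligned loads differently.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is taken modulo 32). */
uint32_t rotate_right(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Result of a word load from `addr`, given the aligned word that
   contains it: rotate right by 8 bits per byte of misalignment. */
uint32_t unaligned_ldr(uint32_t aligned_word, uint32_t addr)
{
    return rotate_right(aligned_word, 8u * (addr & 3u));
}
```

Calling unaligned_ldr(0x0982a22e, addr) for addr = 0x80 to 0x83 reproduces the four results listed above.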
Role of Registers vs. Memory
° What if more variables than registers?
• The compiler tries to keep the most frequently used variables in
registers
• Writing less frequently used variables to memory is called spilling
° Why not keep all variables in memory?
• Smaller is faster:
registers are faster than memory
• Registers are more versatile:
- ARM data processing instructions can read 2 operands,
operate on them, and write 1 result per instruction
- ARM data transfer instructions only read or write 1 operand per
instruction, and perform no operation on it
Overview
° Word/ Halfword/ Byte Addressing
° Byte ordering
° Signed Load Instructions
° Instruction Support for Characters
Data Transfer: More Mem to Reg Variants (#1/2)
° Load Byte Example:
ldrb a1, [v1, #12]
This instruction takes the pointer in v1, adds
12 bytes to it, and then loads the byte value
from the memory location at this
calculated address into register a1.
° Load Byte Example:
ldrb a1, [v1, v2]
This instruction takes the pointer in v1,
adds the index offset held in register v2 to it, and
then loads the byte value from the memory location
at this calculated address into
register a1.
[Figure: 1 word = 4 bytes; the byte at v1 + 12 is copied into the lowest byte of a1 and the upper three bytes of a1 are set to 0]
Data Transfer: More Mem to Reg Variants (#2/2)
° Load Half Word Example:
ldrh a1, [v1, #12]
This instruction takes the pointer in v1, adds
12 bytes to it, and then loads the half word
value from the memory location at this
calculated address into register a1.
° Load Half Word Example:
ldrh a1, [v1, v2]
This instruction takes the pointer in v1,
adds the index offset held in register v2 to it, and
then loads the half word value from the
memory location at this calculated address
into register a1.
[Figure: 1 word = 4 bytes; the half word at v1 + 12 is copied into the lower half of a1 and the upper 16 bits of a1 are set to 0]
Data Transfer: More Reg to Mem Variants (#1/2)
° Store Byte Example:
strb a1, [v1, #12]
This instruction takes the
pointer in v1, adds 12 bytes to it,
and then stores the least significant
byte of register a1 into the
memory location at
the calculated address.
° Store Byte Example:
strb a1, [v1, v2]
This instruction takes the
pointer in v1, adds the value in register v2 to
it, and then stores the least significant
byte of register a1 into the
memory location at
the calculated address.
[Figure: 1 word = 4 bytes; the lowest byte of a1 is written to the byte at v1 + 12]
Data Transfer: More Reg to Mem Variants (#2/2)
° Store Half Word Example:
strh a1, [v1, #12]
This instruction takes the
pointer in v1, adds 12 bytes to it,
and then stores the least significant
half word of register a1 into the
memory location at
the calculated address.
° Store Half Word Example:
strh a1, [v1, v2]
This instruction takes the
pointer in v1, adds the value in register v2 to it,
and then stores the least significant half
word of register a1 into the
memory location at the
calculated address.
[Figure: 1 word = 4 bytes; the lower half word of a1 is written to the half word at v1 + 12]
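The narrowing performed by the byte and half word stores can be sketched in C on a byte array standing in for memory. The function names are hypothetical, and the half word version assumes little-endian byte order, which is an assumption of this sketch rather than something stated on these slides.

```c
#include <assert.h>
#include <stdint.h>

/* Like "strb": only the least significant byte of the register is stored. */
void store_byte(uint8_t *mem, uint32_t addr, uint32_t reg)
{
    mem[addr] = (uint8_t)(reg & 0xFFu);
}

/* Like "strh": only the least significant half word is stored.
   Assumption: little-endian layout (low byte at the lower address). */
void store_halfword(uint8_t *mem, uint32_t addr, uint32_t reg)
{
    assert(addr % 2 == 0);                         /* half word aligned */
    mem[addr]     = (uint8_t)(reg & 0xFFu);
    mem[addr + 1] = (uint8_t)((reg >> 8) & 0xFFu);
}
```

The upper bits of the register are simply ignored by both stores; the register itself is unchanged.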
Compilation with Memory (Byte Addressing)
° What offset in ldrb selects my_Array[8]
(defined as char) in C?
° 1 x 8 = 8: each element is 1 byte
° Compile by hand using registers:
g = h + my_Array[8];
• g: v1, h: v2, v3: base address of my_Array
° First transfer from memory to register:
ldrb v1, [v3, #8] ; v1 gets my_Array[8]
• Add 8 to v3 to select my_Array[8], put into v1
° Next add it to h and place in g:
add v1, v2, v1 ; v1 = h + my_Array[8]
Compilation with Memory (half word Addressing)
° What offset in ldrh selects my_Array[8] (defined
as halfword) in C?
° 2 x 8 = 16: each element is 2 bytes
° Compile by hand using registers:
g = h + my_Array[8];
• g: v1, h: v2, v3: base address of my_Array
° First transfer from memory to register:
ldrh v1, [v3, #16] ; v1 gets my_Array[8]
• Add 16 to v3 to select my_Array[8], put into v1
° Next add it to h and place in g:
add v1, v2, v1 ; v1 = h + my_Array[8]
More Notes about Memory: Word
° How are bytes numbered in a word?
• Little endian: byte 0 is the least significant byte (lsb); bytes are numbered 3 2 1 0 from msb to lsb
• Big endian: byte 0 is the most significant byte (msb); bytes are numbered 0 1 2 3 from msb to lsb
[Figure: the words "COMP" and "3221" stored at addresses 100 to 107 in both byte orders. Taking 'C' and '3' as the most significant bytes of their words:]
Address   Big endian   Little endian
100       'C'          'P'
101       'O'          'M'
102       'M'          'O'
103       'P'          'C'
104       '3'          '1'
105       '2'          '2'
106       '2'          '2'
107       '1'          '3'
• Gulliver's Travels: Which end of egg to open?
Cohen, D. "On holy wars and a plea for peace (data transmission)."
Computer, vol. 14, no. 10, Oct. 1981, pp. 48-54.
• Little Endian: the address of a word is the address of its least significant byte:
Intel 80x86, DEC Alpha
• Big Endian: the address of a word is the address of its most significant byte:
HP PA, IBM/Motorola PowerPC, SGI, Sparc
• ARM is Little Endian by default; however, it can be
made Big Endian by configuration.
Endianness Example
r0 = 0x11223344
STR r0, [r1] ; r1 = 0x100
Address   Little-endian memory   Big-endian memory
0x100     0x44                   0x11
0x101     0x33                   0x22
0x102     0x22                   0x33
0x103     0x11                   0x44
LDRB r2, [r1]
• Little-endian: r2 = 0x00000044
• Big-endian: r2 = 0x00000011
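The two byte orders in this example can be reproduced in C by writing the bytes of a word out explicitly. This is a minimal sketch with hypothetical function names; it does not depend on the byte order of the machine it runs on.

```c
#include <assert.h>
#include <stdint.h>

/* Store a word little-endian: least significant byte at the lowest address. */
void store_word_le(uint8_t *mem, uint32_t w)
{
    mem[0] = (uint8_t)(w & 0xFFu);
    mem[1] = (uint8_t)((w >> 8) & 0xFFu);
    mem[2] = (uint8_t)((w >> 16) & 0xFFu);
    mem[3] = (uint8_t)((w >> 24) & 0xFFu);
}

/* Store a word big-endian: most significant byte at the lowest address. */
void store_word_be(uint8_t *mem, uint32_t w)
{
    mem[0] = (uint8_t)((w >> 24) & 0xFFu);
    mem[1] = (uint8_t)((w >> 16) & 0xFFu);
    mem[2] = (uint8_t)((w >> 8) & 0xFFu);
    mem[3] = (uint8_t)(w & 0xFFu);
}
```

Reading mem[0] afterwards plays the role of LDRB r2, [r1]: it yields 0x44 after the little-endian store and 0x11 after the big-endian store.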
Code Example
° Write a segment of code that adds together
elements x to x+(n-1) of an array, where
element 0 is the first element of the array.
° Each element of the array is word sized (i.e. 32
bits).
° The segment should use post-indexed addressing.
° At the start of your segment, you should assume
that:
• a1 points to the start of the array.
• a2 = x
• a3 = n
[Figure: a1 points to element 0 of the array; the n elements x, x+1, ..., x+(n-1) are the ones to be summed]
Code Example: Sample Solution
add a1, a1, a2, lsl #2 ; set a1 to the address of element x
add a3, a1, a3, lsl #2 ; set a3 to the address of element x+n
mov a2, #0 ; initialise accumulator
loop:
ldr a4, [a1], #4 ; access element and move to next
add a2, a2, a4 ; add contents to accumulator
cmp a1, a3 ; have we reached element x+n?
blt loop ; if not, repeat for next element
; on exit, the sum is contained in a2
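For comparison, the same loop can be sketched in C with a pointer that is post-incremented after each access. The function name sum_range is hypothetical; like the assembly, the loop tests at the bottom, so it assumes n is at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* Sum elements x to x+(n-1) of a word array, mirroring the assembly. */
uint32_t sum_range(const uint32_t *array, uint32_t x, uint32_t n)
{
    const uint32_t *p = array + x;   /* add a1, a1, a2, lsl #2 */
    const uint32_t *end = p + n;     /* add a3, a1, a3, lsl #2 */
    uint32_t sum = 0;                /* mov a2, #0 */
    assert(n > 0);                   /* the loop body always runs once */
    do {
        sum += *p++;                 /* ldr a4, [a1], #4 ; add a2, a2, a4 */
    } while (p < end);               /* cmp a1, a3 ; blt loop */
    return sum;
}
```

The expression *p++ is the C counterpart of post-indexed addressing: use the current address, then advance the pointer by one element.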
Sign Extension and Load Byte & Load Half Word
° The ARM instruction ldrsb automatically
extends the sign of the byte on a load byte.
ldrsb a1, [v1, #12]
ldrsb a1, [v1, v2]
[Figure: bit 7 of the loaded byte, the sign bit S, is copied into bits 31 to 8 of a1]
° The ARM instruction ldrsh automatically
extends the sign of the half word on a load half word.
ldrsh a1, [v1, #12]
ldrsh a1, [v1, v2]
[Figure: bit 15 of the loaded half word, the sign bit S, is copied into bits 31 to 16 of a1]
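The difference between zero extension (ldrb, ldrh) and sign extension (ldrsb, ldrsh) can be sketched in C. The function names below are hypothetical; the signed versions subtract 2^8 or 2^16 when the sign bit is set, which gives the same result as copying the sign bit into the upper bits.

```c
#include <assert.h>
#include <stdint.h>

/* Like "ldrb": the upper 24 bits of the result are zero. */
uint32_t zero_extend_byte(uint8_t b)
{
    return (uint32_t)b;
}

/* Like "ldrsb": bit 7 is copied into bits 31 to 8. */
int32_t sign_extend_byte(uint8_t b)
{
    return (b & 0x80u) ? (int32_t)b - 256 : (int32_t)b;
}

/* Like "ldrsh": bit 15 is copied into bits 31 to 16. */
int32_t sign_extend_halfword(uint16_t h)
{
    return (h & 0x8000u) ? (int32_t)h - 65536 : (int32_t)h;
}
```

For instance, the byte 0xFF becomes 255 when zero extended and -1 when sign extended.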
Instruction Support for Characters
° ARM (and most other instruction sets)
include instructions to operate on bytes:
• load byte (ldrb) loads a byte from memory, placing it in the
rightmost 8 bits of a register; store byte (strb) does the reverse
° Byte variables are declared in C as "char"
° Assume x, y are declared char, with x in memory at
[v1, #4] and y at [v1, #0].
What is the ARM code for x = y; ?
ldrb a1, [v1, #0] ; load y
strb a1, [v1, #4] ; store to x: transfers y to x
Strings in C: Example
° A string is simply an array of char
void strcpy (char x[], char y[]){
int i = 0; /* declare, initialize i */
while ((x[i] = y[i]) != '\0') /* copy and test byte */
i = i + 1;
}
° Register assignment:
i: v1, addr. of x[0]: a1, addr. of y[0]: a2, function
return addr.: lr
strcpy:
mov v1, #-1 ; i = -1
L1: add v1, v1, #1 ; i = i + 1
ldrb a3, [a2, v1] ; a3 = y[i]
strb a3, [a1, v1] ; x[i] = y[i]
cmp a3, #0 ; y[i] != 0?
bne L1 ; if so, goto L1
mov pc, lr ; return
Strings in C: Example using pointers
° A string is simply an array of char
void strcpy2 (char *px, char *py){
while ((*px++ = *py++) != '\0') /* copy and test byte */
;
}
° Register assignment:
addr. of x[0]: v2, addr. of y[0]: v3, function return addr.: lr
strcpy2:
L1: ldrb a1, [v3], #1 ; a1 = *py, py = py + 1
strb a1, [v2], #1 ; *px = a1, px = px + 1
cmp a1, #0 ; copied byte != 0?
bne L1 ; if so, goto L1
mov pc, lr ; return
° Ideally the compiler optimizes the code for you
Block Copy Transfer (#1/5)
° Consider the following code:
str a1, [v1], #4
str a2, [v1], #4
str a3, [v1], #4
str a4, [v1], #4
Replace this with
stmia v1!, {a1-a4}
STMIA: STORE MULTIPLE INCREMENT AFTER
[Figure: memory words at 0x100, 0x104, 0x108 and 0x10C; a1 to a4 are stored at ascending addresses, with v1 advancing by 4 after each store]
° Consider the following code:
str a1, [v1, #4]!
str a2, [v1, #4]!
str a3, [v1, #4]!
str a4, [v1, #4]!
Replace this with
stmib v1!, {a1-a4}
STMIB: STORE MULTIPLE INCREMENT BEFORE
[Figure: memory words at 0x100, 0x104, 0x108 and 0x10C; a1 to a4 are stored at ascending addresses, with v1 advancing by 4 before each store]
Block Copy Transfer (#2/5)
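° Consider the following code:
str a4, [v1], #-4
str a3, [v1], #-4
str a2, [v1], #-4
str a1, [v1], #-4
Replace this with
stmda v1!, {a1-a4}
STMDA: STORE MULTIPLE DECREMENT AFTER
(A store multiple always places the lowest-numbered register at the lowest address, so a4 goes to the highest address and a1 to the lowest.)
[Figure: memory words at 0x100, 0x104, 0x108 and 0x10C; a4 down to a1 are stored at descending addresses, with v1 stepping down by 4 after each store]
° Consider the following code:
str a4, [v1, #-4]!
str a3, [v1, #-4]!
str a2, [v1, #-4]!
str a1, [v1, #-4]!
Replace this with
stmdb v1!, {a1-a4}
STMDB: STORE MULTIPLE DECREMENT BEFORE
[Figure: memory words at 0x100, 0x104, 0x108 and 0x10C; a4 down to a1 are stored at descending addresses, with v1 stepping down by 4 before each store]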
Block Copy Transfer (#3/5)
° Consider the following code:
str a1, [v1]
str a2, [v1, #4]
str a3, [v1, #8]
str a4, [v1, #12]
Replace this with
stmia v1, {a1-a4}
STMIA: STORE MULTIPLE INCREMENT AFTER
[Figure: memory words at 0x100, 0x104, 0x108 and 0x10C; a1 to a4 are stored at ascending addresses starting at v1, and v1 itself is unchanged]
° Consider the following code:
str a1, [v1, #4]
str a2, [v1, #8]
str a3, [v1, #12]
str a4, [v1, #16]
Replace this with
stmib v1, {a1-a4}
STMIB: STORE MULTIPLE INCREMENT BEFORE
[Figure: memory words at 0x100, 0x104, 0x108 and 0x10C; a1 to a4 are stored at ascending addresses starting at v1 + 4, and v1 itself is unchanged]
Block Copy Transfer (#4/5)
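° Consider the following code:
str a4, [v1]
str a3, [v1, #-4]
str a2, [v1, #-8]
str a1, [v1, #-12]
Replace this with
stmda v1, {a1-a4}
STMDA: STORE MULTIPLE DECREMENT AFTER
(A store multiple always places the lowest-numbered register at the lowest address, so a4 goes to the highest address and a1 to the lowest.)
[Figure: memory words at 0x100, 0x104, 0x108 and 0x10C; a4 down to a1 are stored at descending addresses starting at v1, and v1 itself is unchanged]
° Consider the following code:
str a4, [v1, #-4]
str a3, [v1, #-8]
str a2, [v1, #-12]
str a1, [v1, #-16]
Replace this with
stmdb v1, {a1-a4}
STMDB: STORE MULTIPLE DECREMENT BEFORE
[Figure: memory words at 0x100, 0x104, 0x108 and 0x10C; a4 down to a1 are stored at descending addresses starting at v1 - 4, and v1 itself is unchanged]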
Block Data Transfer (#5/5)
° Similarly we have
• LDMIA : Load Multiple Increment After
• LDMIB : Load Multiple Increment Before
• LDMDA : Load Multiple Decrement After
• LDMDB : Load Multiple Decrement Before
For details See Chapter 3, page 61 – 62
Steve Furber: ARM System On-Chip; 2nd Ed,
Addison-Wesley, 2000, ISBN: 0-201-67519-6.
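The four addressing modes differ only in where the block of words starts relative to the base register and in the direction of the write-back. The C sketch below summarises this for a transfer of n registers; the type and function names are hypothetical, and it relies on the rule that the lowest-numbered register always goes to the lowest address of the block.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { STM_IA, STM_IB, STM_DA, STM_DB } stm_mode;

/* Lowest address touched by an n-register block transfer from `base`.
   The lowest-numbered register is transferred at this address and the
   remaining registers follow at +4, +8, ... */
uint32_t stm_lowest_address(stm_mode mode, uint32_t base, uint32_t n)
{
    assert(n >= 1);
    switch (mode) {
    case STM_IA: return base;               /* increment after  */
    case STM_IB: return base + 4;           /* increment before */
    case STM_DA: return base - 4 * n + 4;   /* decrement after  */
    case STM_DB: return base - 4 * n;       /* decrement before */
    }
    return base;
}

/* Value of the base register after write-back (the "!" form). */
uint32_t stm_writeback(stm_mode mode, uint32_t base, uint32_t n)
{
    if (mode == STM_IA || mode == STM_IB)
        return base + 4 * n;
    return base - 4 * n;
}
```

For four registers, each of IA from base 0x100, IB from 0x0FC, DA from 0x10C and DB from 0x110 covers the same block 0x100 to 0x10C.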
COMP3221 Reading Materials (Week #4)
° Week #4: Steve Furber: ARM System On-Chip; 2nd Ed,
Addison-Wesley, 2000, ISBN: 0-201-67519-6. We use
chapters 3 and 5
° ARM Architecture Reference Manual –On CD ROM
“And in Conclusion…” (#1/2)
° In ARM Assembly Language:
• Registers replace C variables
• One Instruction (simple operation) per line
• Simpler is Better
• Smaller is Faster
° Memory is byte-addressable, but ldr and str
access one word at a time.
° Access byte and halfword using ldrb,
ldrh,ldrsb and ldrsh
° A pointer (used by ldr and str) is just a
memory address, so we can add to it or
subtract from it (using offset).
“And in Conclusion…”(#2/2)
° New Instructions:
ldr, str
ldrb, strb
ldrh, strh
ldrsb, ldrsh