Transcript Document

Exercises Embedded Systems
William Sandqvist [email protected]
1.1 The C-function

fac_c(5) calculates 1*5*4*3*2*1 = 120.

int fac_c(int x)
{
  int f;
  if(x <= 0) f = 0;
  else
  {
    f = 1;
    while(x > 1)
    {
      f = f * x;
      x--;
    }
  }
  return f;
}

We should document our code. You can find a flowchart tool in Word or PowerPoint; this could be useful for lab reports.

Flowchart
[Flowchart of int fac_c(int x): if x<=0 (Y), f=0; else (N), f=1, then while x>1 (Y): f=f*x, x = x-1, and back to the test; when x>1 is N (and after f=0): return f, End.]
main in C

Message to the linker: extern int fac_asm(int); says that fac_asm() is an external function (from another file).

#include <stdio.h>

extern int fac_asm(int);
int fac_c(int);

int main(void)
{
  int c_result, asm_result;
  int x;
  while(1)
  {
    printf("Enter a number: ");
    if(scanf("%d", &x) != 1) break;   /* stop on end of input or non-numeric input */
    c_result = fac_c(x);
    asm_result = fac_asm(x);
    printf("C-result: %d\n", c_result);
    printf("Asm-result: %d\n", asm_result);
  }
  return 0;
}
Structure diagram?
To document the program structure, a structure diagram could be useful.
It can be translated directly into structured programming (while, if, else …).
But in assembler we are not interested in the program structure, but in the program flow.
The Flowchart
The flowchart could be
directly translated to
assembler code.
How to program the Nios processor?
The Nios II processor is Altera's soft-core RISC processor, with an instruction set similar to MIPS.
It is designed to make efficient use of the resources in an FPGA.
It comes in three versions: small, medium and large …
Nios II registers 0…15
r0 (zero): use as constant "0"!
If you call a subroutine, save the contents of the registers you've used (the caller-saved registers r8…r15) on the stack!
Nios II registers 16…31
sp (r27): points to the stack!
Register operations, R-type instructions
Program constants, I-type instructions
Some pseudoinstructions:
movi rB, IMMED    →  addi rB, r0, IMMED
movia rB, label   →  orhi rB, r0, %hiadj(label)
                     addi rB, rB, %lo(label)
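I-type, Branch
Pseudoinstruction: ble (branch if less than or equal, signed).
ble is bge with registers A and B swapped: ble rA, rB, label is assembled as bge rB, rA, label.
The IMM16 field is a signed byte offset relative to the instruction that follows the branch; its two least significant bits are always 0 because instructions must be word-aligned.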
Conditional operators of C
Compare two registers and branch (PC-relative) if the condition is true.
Every C conditional operator has a corresponding assembly instruction (or pseudoinstruction).
Memory content, Load and Store
Store in memory …
stw r6, 100(rA)   # store the 32-bit word in r6 at address rA + 100
The call and ret instructions
From Flowchart to assembler
Assembler

        .global fac_asm   # fac_asm has to be made known to other files
        .text
# Parameter in r4 (and if needed in r5, r6, r7)
# Return value in r2 (and r3 if long long or double)
# We can use r2 and r3 for calculations until return
# r8 … r15 must be saved by the caller of a sub

fac_asm:
# int r2 fac_asm(int r4 x), the function prototype
# r3: for constant "1"
if:     ble  r4, r0, else     # if(x <= 0)
        movi r3, 1            # constant "1"
        mov  r2, r3           # f = 1
while:  ble  r4, r3, endsub   # while(x > 1){
        mul  r2, r2, r4       #   f = f*x
        sub  r4, r4, r3       #   x = x - 1
        br   while            # }
else:   mov  r2, r0           # f = 0
endsub: ret                   # return r2
        .end
2.1 Prioritized interrupts
2.2 Input/Output
Connect an 8-register memory-mapped peripheral to the CPU.
The CPU has 8-bit address and data buses.
The peripheral should have register addresses 0x10…0x17.
R/W reverses the direction of the data bus.
CS (Chip Select) enables the chip.
Decoding: the door lock
How do you open the door lock? Press 4 (d) and 8 (h) simultaneously, but don't press any other key!
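Connections
Address   A7A6A5A4A3.A2A1A0
0x10    = 00010.000
0x11    = 00010.001
0x12    = 00010.010
0x13    = 00010.011
0x14    = 00010.100
0x15    = 00010.101
0x16    = 00010.110
0x17    = 00010.111

Decoder: $CS = \overline{A_7}\,\overline{A_6}\,\overline{A_5}\,A_4\,\overline{A_3}$ (the chip is selected only when A7…A3 = 00010)
A2 A1 A0 connect to the register-select inputs RS2 RS1 RS0.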
Why memory cache?
3.2 Hit rate and access time
h is the hit rate. The cache access time is tC = 5 ns and the memory access time is tM = 70 ns.
a) tAVG = 8 ns, h = ?
b) tAVG = 15 ns, h = ?
c) tAVG = 6 ns, h = ?
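Hit rate calculations
$$t_{AVG} = h\,t_C + (1-h)\,t_M = h\,(t_C - t_M) + t_M \quad\Rightarrow\quad h = \frac{t_M - t_{AVG}}{t_M - t_C}$$
With tC = 5 ns, tM = 70 ns and tAVG = 8, 15, 6 ns:
a) $h = \frac{70 - 8}{70 - 5} \approx 0.954$
b) $h = \frac{70 - 15}{70 - 5} \approx 0.846$
c) $h = \frac{70 - 6}{70 - 5} \approx 0.985$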
3.1 Memory system
In this example, the block transfer is one cache line of 2 words.
The memory is byte-organized, but we can draw it as if it were organized in memory lines of the same size as the cache line. This simplifies all the figures.
Direct address mapping:
memory line i → cache line j = i % K (K = number of cache lines)
Why block transfer?
• Transferring 1 "random" word from memory takes three bus cycles: 3 TBus/word (2 TBus are wait states).
• Transferring a burst of 2 words takes 3+1 bus cycles: 4/2 = 2 TBus/word.
• Transferring a burst of 4 words takes 3+1+1+1 bus cycles: 6/4 = 1.5 TBus/word.
• Transferring a burst of 8 words takes 3+1+1+1+1+1+1+1 bus cycles: 10/8 = 1.25 TBus/word.
Remember: to make these gains, you must actually use most of the transferred words; otherwise block transfer can be even slower than random transfer!
This is just an example. Other access patterns exist, e.g. 5+3+3+3.
The bus clock is derived from the processor clock, for example TBus = 10·TCPU.
Mapping of memory address
Memory: 4 kB = $4 \cdot 2^{10} = 2^{12}$ bytes. Memory address: mmmmmmmmmmmm (12 bits)
Cache: 8 words = 8·4 = 32 bytes. Cache line: 2 words = 2·4 = 8 bytes.
Cache address: ll.w.bb
Memory to cache mapping:
mmmmmmm.mm.m.mm
ttttttt.ll.w.bb
The address tag: the address in the cache does not depend on the tag bits!
Our example: the data addresses are accessed four times in this order:
0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C
Memory and Cache
Data is accessed from three different locations (tags), but they map to the same lines in this small cache!
Direct mapped cache
Memory address   Memory location    Tag (#)        Cache .ll.w.bb   Line#(Tag#)
0x010            0000000.10.0.00    0000000. (0)   .10.0.00         2(0)
0x1FC            0001111.11.1.00    0001111. (1)   .11.1.00         3(1)
0x168            0001011.01.0.00    0001011. (2)   .01.0.00         1(2)
0x008            0000000.01.0.00    0000000. (0)   .01.0.00         1(0)
0x014            0000000.10.1.00    0000000. (0)   .10.1.00         2(0)
0x1F8            0001111.11.0.00    0001111. (1)   .11.0.00         3(1)
0x00C            0000000.01.1.00    0000000. (0)   .01.1.00         1(0)

Program execution
Data addresses are accessed four times in this order:
0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C
Cache access line#(tag#), and result of each access:
Pass 1:  2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)    C C C M H H H
Pass 2:  2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)    H H M M H H H
Pass 3:  2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)    H H M M H H H
Pass 4:  2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)    H H M M H H H
C, cold miss = line entry to a previously unused cache line (this counts as a miss)
M, miss = the previous line entry was from another location (tag)
H, hit = the previous line entry was from the same location (tag)
$$h = \frac{3 + 3 \cdot 5}{4 \cdot 7} = \frac{18}{28} \approx 0.64$$
2-way set associative cache
Memory address:  mmmmmmmm.m.m.mm
Address mapping: tttttttt.l.w.bb
OBSERVE! The set number is not included in the address map. Logic circuits within the associative cache take care of the set number and connect the CPU with the correct set.
(Tags are stored in the associative cache for each line in every set. All sets are searched in parallel for the tag.)
Example of how an associative cache
can boost performance
Memory: 0x010, Tag: 0x01 Cache: 0x0=0b0.0.00
Memory: 0x1FC, Tag: 0x1F Cache: 0xC=0b1.1.00
Memory: 0x168, Tag: 0x16 Cache: 0x8=0b1.0.00
Memory: 0x008, Tag: 0x00 Cache: 0x8=0b1.0.00
Memory: 0x014, Tag: 0x01 Cache: 0x4=0b0.1.00
Memory: 0x1F8, Tag: 0x1F Cache: 0x8=0b1.0.00
Memory: 0x00C, Tag: 0x00 Cache: 0xC=0b1.1.00
(Nice example: the cache part is exactly one hex digit.)
Fewer conflict misses
Memory locations 0x010 and 0x014 map to cache line 0 with the same tag (0x01), so they belong to the same memory line and are stored together in one cache line; nothing else competes for line 0.
0x1FC, 0x168, 0x008, 0x1F8 and 0x00C map to cache line 1. They come from three different memory lines (tags 0x1F, 0x16 and 0x00), and two of them can be stored simultaneously, one in each set.
You have to consider the exchange (replacement) policy to analyse this example in full detail (it is not given).
Exchange policies: FIFO, RANDOM, LRU …
If the exchange policy were known, we could follow the cache accesses step by step to calculate the hit rate: line,set(tag) line,set(tag) …