Overview of SHARC processor
ADSP-21061 and ADSP-21065L
M. R. Smith,
Electrical and Computer Engineering,
University of Calgary, Alberta, Canada
smithmr @ ucalgary.ca
To be tackled today
Reference sources
Register file and operations
Memory configuration and operations
Sample instructions
Program Flow
Some warnings of expected errors
Code review and code review standards
Some recent architectural advances
  Tiger-SHARC and Hammerhead-SHARC
7/20/2015
ENCM515 -- Review of SHARC Processor
Copyright [email protected]
2 / 36
This is a
FAMILIARIZATION lecture
Learn a useful subset of instructions
by OSMOSIS later on in course
Reference Sources
ADSP-2106x SHARC User’s Manual, 2nd edition, Analog Devices -- provided to everybody
ENCM515 SHARC Reference card
ENCM515 Course, Reference and Laboratory Notes
SHARC Developers CD (borrow from office and install the manuals)
SHARC Navigator Tutorial Tool
http://www.analogdevices.com/industry/dsp/training/index.html#Navigator
Picture Source

SHARC Navigator Tutorial Tool
T. [email protected]
Talik Alukaidey
Dept. of EEE
University of Hertfordshire, Hatfield, U.K.
ADSP-2106x Core Architecture
[Block diagram: cache memory (32 x 48), JTAG test & emulation, flags, timer, DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), program sequencer; PMA bus (24), DMA bus (32), PMD bus (48), DMD bus (40), bus connect; register file (16 x 40) feeding the floating & fixed-point multiplier with fixed-point accumulator, the 32-bit barrel shifter, and the floating-point & fixed-point ALU]
Register File and A.L.Us

Key issues



5 data paths FROM ALU
5 data paths TO ALU
Highly parallel operations UNDER THE RIGHT CONDITIONS
Register File
Key issues
40 bits wide
Top 32 bits used for integer
Top 32 bits used for float
All 40 bits used for extended-precision float
32 registers available, 16 at a time
A register is always 40 bits
can be processed as a float
can be processed as an integer
Must convert integer <-> float
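Why the conversion is needed can be seen off-target. The sketch below is ordinary Python, not SHARC code. It assumes 32-bit IEEE single-precision values and uses the standard `struct` module to reinterpret a bit pattern; `bits_of_float` and `float_of_bits` are illustrative helper names, not part of any SHARC tool.

```python
import struct

def bits_of_float(x):
    """Return the 32-bit pattern of x stored as an IEEE single-precision float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def float_of_bits(bits):
    """Interpret a 32-bit pattern as an IEEE single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# The float 1.0 and the integer 1 are different bit patterns.
print(hex(bits_of_float(1.0)))        # 0x3f800000

# Reading the integer pattern 3 as a float gives a tiny denormal, not 3.0.
print(float_of_bits(3))

# A real conversion changes the bits, which is the job FLOAT and FIX do.
print(bits_of_float(float(3)) == 3)   # False
```

The same register contents mean different things to an integer operation and a float operation, so a value must be converted before switching between the two.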
Sample ALU Instructions
SEE REF-CARD
Fixed point
Rn = Rx + Ry
Rn = Rx - Ry
Rn = Rx + Ry + CI (Carry In)
Rn = Rx - Ry + CI - 1
Rn = (Rx + Ry) / 2
COMP(Rx, Ry)
Rn = Rx + CI - 1
Rn = Rx + 1
Rn = Rx - 1
Rn = -Rx
Rn = ABS Rx
Rn = PASS Rx
Rn = Rx AND Ry
Rn = Rx OR Ry
Rn = NOT Rx
Rn = MIN(Rx, Ry)
Rn = MAX(Rx, Ry)
Rn = CLIP Rx BY Ry
Floating point
Fn = Fx + Fy
Fn = Fx - Fy
Fn = ABS(Fx + Fy)
Fn = ABS(Fx - Fy)
Fn = (Fx + Fy) / 2
COMP(Fx, Fy)
Fn = -Fx
Fn = ABS Fx
Fn = PASS Fx
Fn = RND Fx
Fn = SCALB Fx BY Ry
Rn = MANT Fx
Rn = LOGB Fx
Rn = FIX Fx BY Ry
Fn = FLOAT Rx BY Ry
Rn = TRUNC Fx
Fn = RECIPS Fx
Fn = RSQRTS Fx
ALU instructions -- Common errors

Key issues -- what IS NOT there rather than what IS there -- REMEMBER -- Superscalar RISC DSP CPU
Rx = Ry + CONSTANT --- NOT THERE
This twin instruction MUST be used
Rtemp = CONSTANT
Rx = Ry + Rtemp
VERY COMMON TIME WASTER IN LABS WHEN NOT CHECKED
NOTE: -- Rx = constant is not an ALU operation but an Immediate Move Universal Register instruction, bringing in a value from PROGRAM memory as part of the op-code -- MOVEQ equivalent
Ureg = <data32>
MAC instructions -- mainly INTEGER
Multiply and Accumulate
SEE REF-CARD
Rn = Rx * Ry
Using MRB
MRB = Rx * Ry
Rn = MRB + Rx * Ry
MRB = MRB + Rx * Ry
Rn = MRB - Rx * Ry
MRB = MRB - Rx * Ry
Rn = SAT MRB
MRB = SAT MRB
Rn = RND MRB
MRB = RND MRB
Using MRF
MRF = Rx * Ry
Rn = MRF + Rx * Ry
MRF = MRF + Rx * Ry
Rn = MRF - Rx * Ry
MRF = MRF - Rx * Ry
Rn = SAT MRF
MRF = SAT MRF
Rn = RND MRF
MRF = RND MRF
Moves
Rn = MR
MR = Rn
FLOAT
Fn = Fx * Fy
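The accumulate-then-saturate pattern (MRF = MRF + Rx * Ry followed by Rn = SAT MRF) can be sketched in Python. This is a simplified model, not the processor's exact behaviour: it assumes signed integer operands, treats the accumulator as unbounded, and saturates to the signed 32-bit range. It ignores the fractional formats and the accumulator's real width. The names `mac` and `saturate32` are illustrative.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1

def saturate32(acc):
    """Clamp a wide accumulator value to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, acc))

def mac(samples, coeffs):
    """Accumulate products in a wide accumulator, then saturate once at the end."""
    acc = 0                        # plays the role of MRF
    for x, c in zip(samples, coeffs):
        acc += x * c               # MRF = MRF + Rx * Ry
    return saturate32(acc)         # Rn = SAT MRF

print(mac([1, 2, 3], [4, 5, 6]))                      # 32
print(mac([1 << 20, 1 << 20], [1 << 11, 1 << 11]))    # 2147483647
print(mac([1 << 30, -(1 << 30)], [4, 4]))             # 0
```

The last call shows why the accumulator is wider than a data register: the intermediate sum exceeds 32 bits, but the final result is still correct because nothing is clipped until the end.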
Shifter Instructions -- mainly integer
SEE REF-CARD
Rn = LSHIFT Rx BY Ry/<data8>
Rn = Rn OR LSHIFT Rx BY Ry/<data8>
Rn = ASHIFT Rx BY Ry/<data8>
Rn = ROT Rx BY Ry/<data8>
Rn = BCLR Rx BY Ry/<data8>
Rn = BSET Rx BY Ry/<data8>
Rn = BTGL Rx BY Ry/<data8>
BTST Rx BY Ry/<data8>
Rn = Rn OR FDEP Rx BY Ry/<bit6>:<len6> (SE)
Rn = FEXT Rx BY Ry/<bit6>:<len6> (SE)
Rn = EXP Rx (EX)
Rn = LEFTZ Rx
Rn = LEFTO Rx
Rn = FPACK Fx
Fn = FUNPACK Rx
FPACK is a cast and means (32bit -> 16bit) Fx
FUNPACK is a cast and means (16bit -> 32bit) Rx
BUT A LOT OF HIDDEN STUFF TOO!
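The bit-field deposit (FDEP) and extract (FEXT on the reference card) operations are easier to follow with a small model. The Python sketch below is an assumption-laden simplification: it treats registers as plain 32-bit values, whereas the processor positions fields within its 40-bit registers, and it models sign extension only for extraction. The function names `fext` and `fdep` simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def fext(rx, bit, length, sign_extend=False):
    """Extract `length` bits of rx starting at position `bit`, right-justified."""
    field = (rx >> bit) & ((1 << length) - 1)
    if sign_extend and length > 0 and field & (1 << (length - 1)):
        field -= 1 << length          # top bit of the field is set: make it negative
    return field

def fdep(rx, bit, length):
    """Deposit the low `length` bits of rx at position `bit`, zeros elsewhere."""
    return ((rx & ((1 << length) - 1)) << bit) & MASK32

word = 0x00ABCDEF
print(hex(fext(word, 8, 8)))                 # 0xcd
print(fext(word, 8, 8, sign_extend=True))    # -51
print(hex(fdep(0x5, 4, 3)))                  # 0x50
```

Combining `fdep` with a bitwise OR corresponds to the "Rn = Rn OR FDEP ..." form, which builds a packed word one field at a time.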
21061 ALU instructions

Under the RIGHT conditions can do multiple operations at the same time
Certain ops using certain registers -- see reference material
1 MAC, 1 ALU (SOMETIMES 2), 1 SHIFTER ?, AND 1 DM ACCESS, 1 PM ACCESS
Certain combinations can also be CONDITIONAL
We are going to write code in a format that will allow us to parallelize instructions -- an expectation for the course
Depends on what you do and who you do it to (special register combos)
only a certain number of bits available in the opcode (48 bits), so not all reasonable combinations are possible
REMINDER
This is a
FAMILIARIZATION lecture
Learn a useful subset
by OSMOSIS later on in course
21061 Memory Accesses
Under the right conditions -- 3 memory accesses at same time
Program Memory, Data Memory, Instruction Cache
PLUS up to 3 Arithmetic operations at the same time
Data Address Generators -- DAG
DAG1 -- best for accessing Data memory
(0 -- 7)
DAG2 -- best for accessing Program memory
(8 -- 15)
MUST be used in this fashion for simultaneous memory ops
Also an alternate set of DAGs
Register and Register Ops in DAG1
SPECIAL CIRCBUFFER STUFF
SPECIAL FFT BIT
DAG register info
SEE REF-CARD
Index registers I0 -- I7 (dm -- data mem), I8 -- I15 (pm -- program mem)
  “like” 68K address registers A0 -- A6
Modify registers M0 -- M7, M8 -- M15
  Can be offset registers (c.f. 68K (4, SP))
  Can be used for high speed post increment
Special Hardware for Circular Buffers
  Base registers B0 -- B7, B8 -- B15
  Length registers L0 -- L7, L8 -- L15
See labs 2 -- 4 and associated lectures
SHARC Universal Registers -- Ureg
SEE REF-CARD
See first column of ENEL515 reference card
UNIVERSAL REGISTER INFORMATION -- UREG
DATA -- R0 to R15
INDEX (Address) -- I0 to I7, I8 to I15
MODIFY -- M0 to M7, M8 to M15
LENGTH -- L0 to L7, L8 to L15
BASE -- B0 to B7, B8 to B15 (Setting Bx also sets Ix)
Ia/Mb refers to DAG1 (dm) registers -- Ic/Md refers to DAG2 (pm) registers
PROGRAM SEQUENCER -- PC, PCSTK, PCSTKP, FADDR, DADDR, LADDR, CURLCNTR, LCNTR
BUS EXCHANGE -- PX1, PX2, PX
TIMER -- TPERIOD, TCOUNT
SYSTEM REGISTERS -- sreg -- MODE1, MODE2, IRPTL, IMASK, IMASKP, ASTAT, STKY, USTAT1, USTAT2
KEY ISSUES -- Can do certain things to Ureg that you can’t do to other system registers
Some memory access instructions
SEE REF-CARD
ureg = dm(<addr32>);
dm(<addr32>) = ureg;
dm(<addr32>, Ia) = ureg;
ureg = dm(<addr32>, Ia);
IMMEDIATE MOVE
ureg = pm(<addr24>);
pm(<addr24>) = ureg;
pm(<addr24>, Ic) = ureg;
ureg = pm(<addr24>, Ic);
Add the following to your reference sheet
ureg = <data32>
Special hardware addressing modes
SEE REF-CARD
PRE AND POST MODIFY OPERATIONS
PRE-MOD -- (Mb, Ia) -- Use address (Mb + Ia) -- Leave Ia unchanged
IF La register = 0 -- causes normal array operation
POST-MOD -- (Ia, Mb) -- Use address (Ia) -- Change Ia to Ia + Mb
IF La register != 0 -- causes circular buffer operations
POST-MOD -- (Ia, Mb) -- Use address (Ia) -- Change Ia to Ia + Mb
then perform Ia - La or Ia + La until Ia in range Ba to Ba + La - 1
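The pre-modify and post-modify rules can be written out as a short Python model. It is a sketch of the rules as stated on this slide, not a cycle-accurate DAG simulation. The parameters `i`, `m`, `b` and `l` stand for the Ia, Mb, Ba and La registers, and the function names are illustrative.

```python
def pre_modify(i, m):
    """(Mb, Ia): access address Ia + Mb; Ia itself is left unchanged."""
    return i + m, i

def post_modify(i, m, b=0, l=0):
    """(Ia, Mb): access address Ia, then update Ia, wrapping if La != 0."""
    address = i
    i += m
    if l != 0:                       # circular buffer of length l starting at b
        while i >= b + l:
            i -= l
        while i < b:
            i += l
    return address, i

# Walk a 4-word circular buffer based at 0x100 with a stride of 3.
i, visited = 0x100, []
for _ in range(6):
    address, i = post_modify(i, 3, b=0x100, l=4)
    visited.append(address - 0x100)
print(visited)    # [0, 3, 2, 1, 0, 3]
```

With the length left at zero the same call steps linearly through an array, which is why the length registers must be zero for normal array operation.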
Compute with Move instructions
COMPUTE AND MOVE INSTRUCTIONS
compute, dm(Ia, Mb) = dreg1, pm(Ic, Md) = dreg2;
dreg1 = dm(Ia, Mb), dreg2 = pm(Ic, Md);
IF condition compute;
N.B. italics = optional part of instruction
N.B. IF affects the WHOLE instruction
IF condition compute,
dm(Pre/Post with REGISTERS) = ureg;
pm(Pre/Post with REGISTERS) = ureg;
ureg = dm(Pre/Post with REGISTERS);
ureg = pm(Pre/Post with REGISTERS);
N.B. ureg can’t be from same DAG as Pre/Post registers
SEE REF-CARD
KEY ISSUES -- Multiple operations available in 1 instruction
Compute, 1 dm access, 1 pm access
PROVIDED you are describing memory operations using registers
and not by a number -- dm(I1, 1) BAD -- dm(I1, M6) (with M6 = 1) GOOD
MOST INSTRUCTIONS ARE CONDITIONAL (Why?)
SHARC Program Flow
SEE REF-CARD
IF condition JUMP <addr24> (DB/LA/CI);
IF condition JUMP (PC, <reladdr24>) (DB/LA/CI);
IF condition CALL <addr24> (DB);
IF condition CALL (PC, <reladdr24>) (DB);
IF condition JUMP (Md, Ic) (DB/LA/CI), compute;
IF condition JUMP (PC, <reladdr24>) (DB/LA/CI), ELSE compute;
ditto using CALL
IF condition JUMP (Md, Ic), ELSE compute, dm(Ia, Mb) = dreg;
Key issues
Condition affects ALL of the instruction
Compute and jump both become conditional
JUMP and also JUMP (DB)
Delayed Branch -- A killer!
SEE REF-CARD
R2 = 1;
R8 = PASS R2;
IF NE JUMP (PC, _LABEL) (DB);
R8 = 2;       // Executed whether jump or not
R7 = 1;       // Executed whether jump or not
R8 = 3;
_LABEL:
Warning: R7 = 1 whether jump occurs OR NOT,
R8 = 3 if jump DOES NOT OCCUR,
R8 = 2 (and not 1) if jump occurs
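The warning can be checked with a tiny Python trace of the sequence. This is a model of the rule "the two instructions after a delayed branch always execute", nothing more. In the slide's code the NE condition is true because R8 = PASS R2 produces a non-zero result, so `taken` is exposed as a parameter here only to show both outcomes. The function name `run` is illustrative.

```python
def run(taken):
    """Trace the slide's sequence; `taken` says whether the NE condition holds."""
    regs = {"R2": 1, "R8": 1}          # state after R2 = 1; R8 = PASS R2;
    after_branch = [
        ("R8", 2),                     # first delay slot
        ("R7", 1),                     # second delay slot
        ("R8", 3),                     # skipped only when the jump is taken
    ]
    delay_slots = 2
    for index, (reg, value) in enumerate(after_branch):
        if taken and index >= delay_slots:
            break                      # control has reached _LABEL
        regs[reg] = value
    return regs

print(run(taken=True))     # {'R2': 1, 'R8': 2, 'R7': 1}
print(run(taken=False))    # {'R2': 1, 'R8': 3, 'R7': 1}
```

Either way R7 ends up as 1, and R8 never keeps the value 1 that it held when the condition was tested.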
High Speed Loops available
Some possible HARDWARE loop operation instructions
LCNTR = <data16>, DO <addr24> UNTIL LCE;
LCNTR = ureg, DO (PC, <reladdr24>) UNTIL LCE;
DO <addr24> UNTIL termination;
DO (PC, <reladdr24>) UNTIL termination;
You think you have it bad


Some recent architecture advances -- SISD, SIMD, VLIW
Tiger SHARC 21161 -- 2 CPUs on the same chip -- working with the same instruction
  One CPU uses R registers and memory moves using the value in the I register
  The other CPU uses S registers and memory moves using the value in the I register PLUS 1
Hammerhead SHARC -- 2 CPUs that can be independently controlled -- 8 possible operations in a single instruction
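The "I register PLUS 1" rule for the second CPU can be modelled in a few lines of Python. The sketch assumes memory is a simple list of words and that the modify value is 2 so each load steps past both words; that stride is an assumption made for the illustration, not something stated on the slide. The name `simd_load` is illustrative.

```python
def simd_load(memory, i, m):
    """Model r0 = dm(i, m) in SIMD mode: R0 gets memory[i], S0 gets memory[i + 1]."""
    r0 = memory[i]          # first CPU uses the address in the I register
    s0 = memory[i + 1]      # second CPU uses the address in the I register plus 1
    return r0, s0, i + m    # index register after the post-modify

memory = [10, 11, 20, 21, 30, 31]
i = 0
pairs = []
for _ in range(3):
    r0, s0, i = simd_load(memory, i, 2)
    pairs.append((r0, s0))
print(pairs)    # [(10, 11), (20, 21), (30, 31)]
```

One instruction therefore feeds both register sets, which is why data for the two CPUs is stored in adjacent (interleaved) locations.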
TigerSHARC ADSP-21160 Core Architecture
[Block diagram: cache memory (32 x 48), JTAG test & emulation, flags, timer, DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 32), program sequencer; PMA bus (32), DMA bus (32), PMD bus (64), DMD bus (64), bus connect; two sets of processing hardware, each with a register file (16 x 40), a floating & fixed-point multiplier with fixed-point accumulator, a 32-bit barrel shifter, and a floating-point & fixed-point ALU]
Normal-Word - Dual Data - SISD
r0=dm(i0,m0), r4=pm(i8,m8) -- Like normal SHARC
[Diagram: Memory Block 0 and Memory Block 1 (32-bit words N0 -- N3 and M0 -- M3), 64-bit PM and DM data buses, PEy and PEx register files; R0 and R4 are loaded]
•I0 points to normal word space in block 1
•I8 points to normal word space in block 0
Long-Word - Dual Data - SISD
r0=dm(i0,m0), r4=pm(i8,m8) -- 64 bit precision
[Diagram: Memory Block 0 and Memory Block 1 (32-bit words N0 -- N3 and M0 -- M3 at odd and even addresses), 64-bit PM and DM data buses, PEy and PEx register files; R0, R1, R4 and R5 are loaded]
•I0 points to long word space in block 1
•I8 points to long word space in block 0
Normal-Word - Dual Data - SIMD
r0=dm(i0,m0), r4=pm(i8,m8)
[Diagram: Memory Block 0 and Memory Block 1 (32-bit words N0 -- N3 and M0 -- M3), 64-bit PM and DM data buses, PEy and PEx register files; S0, R0, S4 and R4 are loaded]
•I0 points to normal word space in block 1
•I8 points to normal word space in block 0
Data Interleave for Multi-Channel Optimization
Instruction:
vecprod: f8=f0*f4, f12=f8+f12, f0=dm(i0,m1), f4=pm(i8,m9);
Up to 6 ALU Ops + 5 memory
Memory Block 0            Memory Block 1
Address   Value           Value   Address
0x40000   B[0]            A[0]    0x50000
0x40001   D[0]            C[0]    0x50001
0x40002   B[1]            A[1]    0x50002
0x40003   D[1]            C[1]    0x50003
[Diagram: 64-bit PM and DM data buses feeding the PEy register file (S0, S4) and the PEx register file (R0, R4)]
I0 points to normal word space in Block 1
I8 points to normal word space in Block 0
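A multifunction instruction such as the vecprod line reads all of its source registers before it writes any result, so the multiply, the add and the two loads each work on values produced by earlier iterations. The Python sketch below models that behaviour for a single processing element only. It ignores SIMD mode and memory blocks, assumes both input lists have the same length, and pads the loads with zeros once the data runs out. The variable names follow the registers in the instruction; everything else is illustrative.

```python
def vecprod(a, b):
    """Dot product using one modeled multifunction instruction per iteration."""
    f0 = f4 = f8 = f12 = 0.0
    i0 = i8 = 0
    # Two extra passes drain the load -> multiply -> add pipeline.
    for _ in range(len(a) + 2):
        # Read every source first: all four operations see the OLD register values.
        product = f0 * f4
        total = f8 + f12
        load_a = a[i0] if i0 < len(a) else 0.0
        load_b = b[i8] if i8 < len(b) else 0.0
        # Then write all destinations together.
        f8, f12, f0, f4 = product, total, load_a, load_b
        i0 += 1
        i8 += 1
    return f12

print(vecprod([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))    # 32.0
```

Each pass of the loop stands for one instruction: the product being added was computed on the previous pass from operands loaded the pass before that.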
Special instructions to handle “C”
CJUMP -- getting to “C” compatible subroutine
  Processor architecture customized for C
    Replaces 3 instructions for faster operations
    Difficult to use in ENCM515
  Will not be having assembly code calling other subroutines (95%) -- Why bother since slow!
RFRAME -- returning to “C” environment
  Processor architecture customized for C
    Part of MAGIC lines of code
    See reference card
“C” interface to assembly code
C/ASSEMBLY LANGUAGE INTERFACE
Special Purpose Registers – usage predetermined by compiler
I7 – C runtime stack pointer – next empty place – NOT last used
I6 – C runtime frame pointer – start of frame of current function
(cdefines.i -- I7 = CTOPstack, I6 = FP)
L6/L7 – must remain as zero – controls stack memory characteristics
DAG1 registers – M5 (0), M6 (1), M7 (-1) – in “C” runtime header
(cdefines.i -- zeroDM, plus1DM, minus1DM)
DAG2 registers – M13 (0), M14 (1), M15 (-1) – in “C” header
(cdefines.i – zeroPM, plus1PM, minus1PM)
LENGTH registers MUST RETURN to 0 – don’t touch L6/L7
SEE REF-CARD
21k Volatile registers when using “C”
R4 (INPAR1), R8 (INPAR2), R12 (INPAR3), R0 (retvalue)
Scratch or Volatile registers (cdefines.i definitions)
Don’t keep useful values in them across subroutine calls
R0, R1, R2 (cdefines.i -- retvalue, scratchR1, scratchR2)
R4, I4, M4 (cdefines.i -- INPAR1, scratchDMpt, scratchMDM)
R8 (cdefines.i -- INPAR2)
R12, I12, M12 (cdefines.i -- INPAR3, scratchPMpt, scratchMPM)
SEE REF-CARD
Anticipated Errors while coding
ANTICIPATED ERRORS – CHECK DURING CODE REVIEW
1) 2 instructions executed after any delayed BRANCH/JUMP/CALL
2) Wrong Ia/Mb etc (DAG1) with pm memory (DAG2) -- and vice versa
3) Incorrect use of scratch registers, parameter passing, stack operations
4) Illegal instruction combinations -- looks correct but not enough bits in opcode
IF xxxx dm(Pre/Post with REGISTERS) = ureg; OK
IF xxxx dm(Pre/Post with CONSTANT) = ureg; NO
SEE REF-CARD
METRICS FOR QUALITY CODE INDICATION
Design Review / Design Time > 25%
Code Review / Coding Time > 50%
DEFECTS found in assemble/compile < 10 / kLOC
DEFECTS found in TEST < 5 / kLOC
Code Review rate < 150 LOC / hr
If, after code review, you find many SYNTAX errors in your code, it is an indication that there are a large
number of LOGICAL defects left in your code and undiscovered by assembler/compiler
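The indicators above are simple ratios, so they are easy to check for a lab assignment. The Python sketch below applies the thresholds exactly as listed. The function name, the parameter names and the sample numbers are hypothetical, and times are assumed to be in hours.

```python
def review_metrics(design_hours, design_review_hours, coding_hours,
                   code_review_hours, compile_defects, test_defects, loc):
    """Evaluate the slide's indicators; each entry is (value, target met?)."""
    kloc = loc / 1000.0
    design_ratio = design_review_hours / design_hours
    code_ratio = code_review_hours / coding_hours
    compile_density = compile_defects / kloc
    test_density = test_defects / kloc
    review_rate = loc / code_review_hours
    return {
        "design review / design time": (design_ratio, design_ratio > 0.25),
        "code review / coding time": (code_ratio, code_ratio > 0.50),
        "compile defects per kLOC": (compile_density, compile_density < 10),
        "test defects per kLOC": (test_density, test_density < 5),
        "review rate (LOC/hr)": (review_rate, review_rate < 150),
    }

# Hypothetical numbers for a 500-line lab assignment.
report = review_metrics(4, 1.5, 6, 4, 3, 3, 500)
for name, (value, ok) in report.items():
    print(f"{name}: {value:.2f} {'OK' if ok else 'CHECK'}")
```

In this made-up case every indicator is met except the defects found in test (6 per kLOC against a target of fewer than 5).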
Tackled today
Reference sources
Register file and operations
Memory configuration and operations
Sample instructions
Program Flow
Some warnings of expected errors
Code review and code review standards
Some recent architectural advances
  Tiger-SHARC and Hammerhead-SHARC