ENCM515 -- Compare architectures

Comparing the SHARC and HAMMERHEAD DSPs
Prepared by Eugene So
To be tackled today

Comparison of Processor Architectures
    ADSP2106X (SHARC)
    ADSP2116X (HAMMERHEAD)
Focus on the advantages of the Hammerhead
    SIMD Engine
    Benchmark
    CRISP requirements
7/17/2015
Comparing the SHARC and HAMMERHEAD
2 / 13
SHARC Architecture
[Block diagram of the SHARC core, showing:]
    Cache memory: 32 x 48
    JTAG test & emulation
    Flags
    Timer
    Program sequencer
    DAG 1: 8 x 4 x 32
    DAG 2: 8 x 4 x 24
    PMA bus (program memory address): 24 bits
    DMA bus (data memory address): 32 bits
    PMD bus (program memory data): 48 bits
    DMD bus (data memory data): 40 bits
    Bus connect (PMD/DMD)
    Register file: 16 x 40
    Floating- and fixed-point multiplier, fixed-point accumulator
    32-bit barrel shifter
    Floating-point and fixed-point ALU
Hammerhead Architecture
32-bit PMA bus (vs 24-bit)
48/64-bit PMD bus (vs 48-bit)
40/64-bit DMD bus (vs 40-bit)
Dual processor,
Shared memory
Single Instruction Multiple Data Engine

“The Hammerhead is a SHARC” with two CPUs

The same instruction is executed in both CPUs, but each CPU operates on different data
    CPU1 can be controlled
    CPU2 simply performs the same operation as CPU1 on the next memory location.
This is called a Single Instruction Multiple Data (SIMD) Engine
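
The lockstep behaviour described on this slide can be modelled in a few lines of Python. This is only a toy sketch, not DSP code: `simd_step` and the list standing in for memory are hypothetical names introduced here for illustration.

```python
# Toy model of two CPUs executing one instruction in lockstep.
# `simd_step` and the list-based "memory" are illustrative assumptions.

def simd_step(op, memory, addr):
    """Apply one instruction `op` on both CPUs.

    CPU1 works on memory[addr]; CPU2 performs the same operation
    on the next memory location, memory[addr + 1].
    """
    cpu1_result = op(memory[addr])
    cpu2_result = op(memory[addr + 1])
    return cpu1_result, cpu2_result


def double(x):
    return 2.0 * x


memory = [1.0, 2.0, 3.0, 4.0]

results = []
for addr in range(0, len(memory), 2):  # one "instruction" per pair of locations
    results.extend(simd_step(double, memory, addr))

print(results)
```

Four memory locations are processed in two steps, because each step covers a pair of adjacent locations.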
Advantages of SIMD

Double the performance per cycle
    If an algorithm can be broken into 2 components, then CPU1 can process memory elements 1, 3, 5, 7, etc., and CPU2 can process 2, 4, 6, 8, etc.
Examples:
    FIR, divide, inverse square root, matrix multiply
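
As a rough illustration of breaking an algorithm into 2 components, the sketch below splits a sum of products across two accumulators, one per CPU. The function name `dot_two_lanes` is hypothetical, and an even number of elements is assumed.

```python
# Sketch: a sum of products split into two interleaved halves.
# Assumes len(a) == len(b) and that the length is even.

def dot_two_lanes(a, b):
    acc1 = 0.0  # CPU1: elements 1, 3, 5, ... (1-based numbering)
    acc2 = 0.0  # CPU2: elements 2, 4, 6, ...
    steps = 0
    for i in range(0, len(a), 2):
        acc1 += a[i] * b[i]
        acc2 += a[i + 1] * b[i + 1]
        steps += 1  # both multiply-accumulates count as one step
    return acc1 + acc2, steps


total, steps = dot_two_lanes([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
print(total, steps)
```

The two partial sums are combined once at the end, so the loop runs half as many times as there are elements.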
FIR example

$y(n) = \sum_{k=0}^{M} h(k)\, x(n-k)$
/* HAMMERHEAD CODE SEGMENT ... */
lcntr=samples, do macs until lce;   /* FIR loop */
macs: f12=f0*f4, f8=f8+f12, f0=dm(sample_pt,m3), f4=pm(coeff_pt,m9);
      /* (2 samples) * (2 coeffs), accum, read 2 samples, read 2 coeffs */

Cycle count
    Hammerhead: TAPS/2 + 16
    SHARC: TAPS + 9
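
Taking the two cycle-count formulas at face value, a short Python sketch shows where the Hammerhead starts to win. The function names are hypothetical, and an even number of taps is assumed so that TAPS/2 is a whole number.

```python
# Cycle-count formulas from the slide, evaluated for a few filter lengths.
# Assumes `taps` is even.

def hammerhead_cycles(taps):
    return taps // 2 + 16


def sharc_cycles(taps):
    return taps + 9


for taps in (8, 14, 64, 256):
    hh = hammerhead_cycles(taps)
    sh = sharc_cycles(taps)
    print(f"TAPS={taps:4d}  Hammerhead={hh:4d}  SHARC={sh:4d}  ratio={sh / hh:.2f}")
```

Setting TAPS/2 + 16 = TAPS + 9 gives TAPS = 14, so under these formulas the larger fixed overhead of the Hammerhead is paid back for filters longer than 14 taps, and the ratio approaches 2 as the filter grows.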
Advantages of SIMD (cont.)

Double Bandwidth
    Accessing two memory locations in a single instruction essentially doubles the bandwidth between memory and the CPUs
Special Addressing Modes
    The data buses are widened to 64 bits on the Hammerhead to accommodate SIMD
    One addressing mode on the Hammerhead allows identical data to be copied into both register files
    Another allows 64-bit data to be split between the two register files
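
The two addressing modes can be pictured with a small Python model. This is only a sketch: `broadcast_load` and `split_load64` are hypothetical names, and which half of the 64-bit word goes to which register file is an assumption made here for illustration, not something stated on the slide.

```python
# Toy model of the two special addressing modes.
# Names and the low/high assignment are illustrative assumptions.

def broadcast_load(memory, addr):
    """Copy the same value into both register files."""
    value = memory[addr]
    return value, value  # (CPU1 register, CPU2 register)


def split_load64(word64):
    """Split one 64-bit word between the two register files.

    Assumption: the low 32 bits go to CPU1 and the high 32 bits to CPU2.
    """
    low = word64 & 0xFFFFFFFF
    high = (word64 >> 32) & 0xFFFFFFFF
    return low, high


print(broadcast_load([3.5, 7.0], 1))
print([hex(half) for half in split_load64(0x1122334455667788)])
```

The broadcast mode suits values that both CPUs need, such as a shared coefficient, while the split mode suits data pairs such as the real and imaginary parts of one sample.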
Advantages of SIMD (cont.)

Complex Arithmetic Handling
    CPU1 handles the real component
    CPU2 handles the imaginary component
Complex Vector Add example

z(n) = x(n) + y(n), where x(n) = Xr(n) + jXi(n) and y(n) = Yr(n) + jYi(n)
/* HAMMERHEAD CODE SEGMENT */
#include "def21160.h"                /* Symbol Definition File */
.global cx_vec_add;
.section/pm seg_pmco;                /* program memory code */
cx_vec_add:
     bit set MODE1 PEYEN;            /* SIMD enabled */
     lcntr=samples, do add until lce;
     f0=dm(i0,m0), f4=pm(i8,m8);     /* f0 = Xr, s0 = Xi, f4 = Yr, s4 = Yi */
add: f8=f0+f4, dm(i1,m0)=f8;         /* f8 = Xr + Yr, s8 = Xi + Yi, store result */
     rts (db);
     bit clr MODE1 PEYEN;            /* SIMD disabled */
     dm(i1,m0)=f8;                   /* store last result */
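
The same idea can be sketched in Python, with one line standing in for each CPU. The interleaved layout (real part, then imaginary part, in adjacent locations) is an assumption inferred from the register comments in the code; the Python function below is a model for illustration, not a translation of the assembly.

```python
# Model of a complex vector add where CPU1 takes the real part and
# CPU2 takes the imaginary part held in the next memory location.
# Assumed layout: [re0, im0, re1, im1, ...] for both inputs.

def cx_vec_add(x, y):
    z = []
    for n in range(0, len(x), 2):        # one step per complex sample
        real = x[n] + y[n]               # CPU1: Xr + Yr
        imag = x[n + 1] + y[n + 1]       # CPU2: Xi + Yi
        z.extend((real, imag))
    return z


x = [1.0, 2.0, 3.0, 4.0]     # (1 + 2j), (3 + 4j)
y = [0.5, 0.5, 1.0, -1.0]    # (0.5 + 0.5j), (1 - 1j)
z = cx_vec_add(x, y)
print(z)
```

Because the real and imaginary additions are independent, one instruction per complex sample is enough when both CPUs are active.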
Benchmark

Description                               21061    21160M
Type                                      FP       FP
Clock speed (MHz)                         50       80
Instruction cycle (ns)                    20       12.5
FIR filter, cycles per tap                1        1
FIR filter, time per tap (ns)             20       6.25
IIR filter, cycles per biquad             4        4
IIR filter, time per biquad (ns)          80       25
Radix-4 FFT, 1024 points, bit reversed    370      115
Divide (y/x), cycles                      6        6
Divide (y/x), time (ns)                   120      37.5
Inverse square root, cycles               9        9
Inverse square root, time (ns)            180      56.25
DMA transfer rate (MBytes/s)              300      560
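
The time rows of the benchmark can be re-derived from the clock speed and the cycle counts. The Python sketch below uses hypothetical function names, and the factor of 2 applied to the 21160M is an assumption (that each counted pass yields two results, one per CPU) which happens to reproduce the benchmark figures.

```python
# Re-deriving the benchmark times from clock speed and cycle counts.
# `results_per_pass=2` for the 21160M is an assumption, not a stated fact.

def cycle_ns(clock_mhz):
    """Instruction cycle time in nanoseconds."""
    return 1000.0 / clock_mhz


def time_ns(cycles, clock_mhz, results_per_pass=1):
    """Effective time per result in nanoseconds."""
    return cycles * cycle_ns(clock_mhz) / results_per_pass


benchmarks = {"FIR tap": 1, "IIR biquad": 4, "Divide": 6, "Inverse square root": 9}

for name, cycles in benchmarks.items():
    sharc = time_ns(cycles, 50)                          # 21061 at 50 MHz
    hammerhead = time_ns(cycles, 80, results_per_pass=2)  # 21160M at 80 MHz
    print(f"{name:20s} 21061: {sharc:6.2f} ns   21160M: {hammerhead:6.2f} ns")
```

Under that assumption the gain splits into two parts: a factor of 1.6 from the faster clock and a factor of 2 from SIMD, or 3.2 overall for these kernels.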
CRISP Requirements
Requirement    SHARC    Hammerhead
    Fast instruction cycle
    Cycle time adjustable according to instruction type
    Fast hardware multiplier
    Floating point for easier algorithm design
    High precision, implying wide data buses for memory, internal processor transfers, registers and on-board processing units
    Several data buses available to reduce bus conflict transfer overhead
    Harvard architecture and/or instruction cache
    Duplicate resources for parallel computation of complex numbers
    Dedicated hardware required for address calculations
    Extensive temporary registers to reduce unwanted fetches of continually used data or single cycle, highly parallel, memory operations
    Fast and reliable, easily programmed, developed and upgraded
    Lower power consumption with a standby mode
Conclusion


“The Hammerhead is a SHARC” with two CPUs
Two heads are (usually) better than one! But also more costly!