Target - technology presentation


Ideas for the design of an ASIP for LQCD

Target Compiler Technologies CASTNESS’11, Rome, Italy

CASTNESS11, Rome Italy © 2011 Target Compiler Technologies L 1

Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

ASIPs in Multi-Core SoC

ASIP: Application-Specific Processor

• Anything between general-purpose µP and hardwired data-path
• Flexibility through programmability and design-time reconfigurability
• High throughput, low energy through parallelism and specialization

ASIP is foundation of heterogeneous multi-core SoC

• Balanced SoC architecture offers best performance at lowest energy and lowest cost

Why ASIPs?

Maximise performance

• Specialisation
• Parallelism: VLIW, SIMD, multi-core

Minimise power dissipation

• Specialisation
• Parallelism: VLIW, SIMD, multi-core
• Power-optimised RTL generation

Leverage the benefits of programmability

• React to changing requirements
• Ship first for evolving standards
• Remedy defects
• Extend products to new markets without an SoC respin

IP Designer Tool Suite


nML – ASIP description language

Example: architectural specialisation

• Absolute-difference instruction in motion estimation

Structural skeleton

• Registers, busses, functional units
• Application specific data type ‘vector’

    reg V[4];
    trn vecr;
    trn vecs;
    trn vecd;
    trn vect;
    fu vec;
    fu vabs;
    ...

Instruction-set grammar

    opn vec_adiff_opn(t:c2u, r:c2u) {
      action {
        stage E1:
          vecd = vsub(vecr=V[r],vecs=V[t]) @vec;
          V[t] = vect = vabs(vecd) @vabs;
      }
      syntax : "vadiff v"t ",v"r ",v"t;
      image : t::r;
    }

Primitive functions:

• vsub()
• vabs()

Operation pattern: V ← vabs() ← vsub() ← V, V
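The behaviour of the absolute-difference instruction can be sketched in plain C. This is only an illustrative model, not the IP Designer primitive library: the lane count `LANES`, the integer lane type and the C function names are assumptions, since the slide gives neither the vector width nor the element type.

```c
#include <stdlib.h>

#define LANES 4      /* assumed lane count; not stated on the slide */
#define NUM_VREGS 4  /* matches "reg V[4]" in the nML skeleton */

/* Hypothetical stand-in for the application-specific 'vector' type. */
typedef struct { int lane[LANES]; } vector_t;

/* Vector register file V[0..3]. */
vector_t V[NUM_VREGS];

/* Primitive vsub(): lane-wise subtraction a - b. */
vector_t vsub(vector_t a, vector_t b) {
    vector_t d;
    for (int i = 0; i < LANES; i++) d.lane[i] = a.lane[i] - b.lane[i];
    return d;
}

/* Primitive vabs(): lane-wise absolute value. */
vector_t vabs(vector_t a) {
    vector_t r;
    for (int i = 0; i < LANES; i++) r.lane[i] = abs(a.lane[i]);
    return r;
}

/* "vadiff vt,vr,vt": V[t] = vabs(vsub(V[r], V[t])), chained in one stage. */
void vadiff(unsigned t, unsigned r) {
    V[t] = vabs(vsub(V[r], V[t]));
}
```

The point of the specialisation is that the chained pattern V ← vabs() ← vsub() ← V, V is issued as a single instruction rather than as two.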

Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD


EURETILE hardware platform

Communication

• DNP

Control

• RISC

Computation

• DSP
• ASIPs: specialised towards the application
  − Lattice quantum chromodynamics (LQCD)
  − Neural network (Izhikevich)

[Figure: tile block diagram with DNP, DSP, RISC, ASIP1, ... and MEM]

Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD


LQCD ASIP

Goals

• Increase performance
• Decrease gate count or usage of FPGA blocks

Means

• Task level parallelism (multi tile architecture)
• Data level parallelism
• Instruction level parallelism
• Architecture specialisation

LQCD ASIP

Data level parallelism

• 3-way SIMD (c1, c2, c3) fits with SU(3) matrix algebra
• 3x speed improvement over scalar architecture
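As a rough illustration of why three lanes suit SU(3) algebra, the sketch below multiplies a 3x3 complex matrix by a three-component vector using three 3-lane multiply-accumulates in place of nine scalar ones. The type names, the column-major storage and the helper `vmac` are assumptions made for this example; they are not taken from the ASIP design.

```c
#include <complex.h>

/* Three complex lanes, one per component c1, c2, c3 (assumed layout). */
typedef struct { float complex c[3]; } color_vec;

/* 3x3 complex matrix stored column by column (assumption). */
typedef struct { color_vec col[3]; } su3_matrix;

/* One 3-lane operation: acc += a * s, with the scalar s broadcast to all lanes. */
color_vec vmac(color_vec acc, color_vec a, float complex s) {
    for (int i = 0; i < 3; i++) acc.c[i] += a.c[i] * s;
    return acc;
}

/* y = U * x: three vector multiply-accumulates instead of nine scalar ones. */
color_vec su3_mul(const su3_matrix *U, color_vec x) {
    color_vec y = {{0, 0, 0}};
    for (int j = 0; j < 3; j++) y = vmac(y, U->col[j], x.c[j]);
    return y;
}
```

On a scalar machine the loop inside `vmac` costs three complex multiply-accumulates per call; on a 3-way SIMD unit it would be a single operation.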

Instruction level parallelism

• VLIW instruction word

    VU_1 … VU_n LS_0 … LS_m

• Arithmetic operations in parallel with load/store operations
• Appropriate mix of n and m based on feedback from compilation of Qphi() function
• n*m speed improvement over scalar architecture
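One simple way to reason about the mix of n vector units and m load/store units is an idealised issue-slot bound, sketched below. It assumes perfect packing and no data dependencies, so it only gives a lower bound on cycles. The function names and the operation counts used with it are hypothetical; real figures would come from compiling Qphi().

```c
#include <assert.h>

/* Integer ceiling division for positive b. */
unsigned ceil_div(unsigned a, unsigned b) {
    return (a + b - 1) / b;
}

/*
 * Lower bound on the cycles for a kernel with arith_ops arithmetic
 * operations and mem_ops load/store operations, on a VLIW with
 * n vector-unit slots and m load/store slots per instruction word.
 * The slower of the two resource classes determines the bound.
 */
unsigned vliw_cycles(unsigned arith_ops, unsigned mem_ops,
                     unsigned n, unsigned m) {
    assert(n > 0 && m > 0);
    unsigned arith_cycles = ceil_div(arith_ops, n);
    unsigned mem_cycles = ceil_div(mem_ops, m);
    return arith_cycles > mem_cycles ? arith_cycles : mem_cycles;
}
```

For example, with a made-up kernel of 120 arithmetic and 60 load/store operations, going from n = 2 to n = 4 with m = 1 leaves the bound unchanged, because the single load/store slot has become the bottleneck.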

LQCD ASIP

 

Architecture specialisation: complex floating point operations

• C + C, C + i*C, C – C, C – i*C → 2x speedup over scalar architecture
• C * R, C * C, … → 4x, 8x speedup over scalar architecture
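To make the operation classes concrete, the sketch below spells out each complex operation in terms of the real floating-point operations a scalar architecture would issue one by one. The `cplx` struct and the function names are illustrative assumptions, not the functional-unit models of the ASIP.

```c
/* Hypothetical complex type: real and imaginary part as floats. */
typedef struct { float re, im; } cplx;

/* C + C: two real additions. */
cplx c_add(cplx a, cplx b) {
    return (cplx){a.re + b.re, a.im + b.im};
}

/* C + i*C: i*(x + iy) = -y + ix, so one subtraction and one addition,
 * with no multiplication needed. */
cplx c_add_i(cplx a, cplx b) {
    return (cplx){a.re - b.im, a.im + b.re};
}

/* C * R: two real multiplications. */
cplx c_mul_r(cplx a, float r) {
    return (cplx){a.re * r, a.im * r};
}

/* C * C: four real multiplications plus one subtraction and one addition. */
cplx c_mul(cplx a, cplx b) {
    return (cplx){a.re * b.re - a.im * b.im,
                  a.re * b.im + a.im * b.re};
}
```

A functional unit that performs one of these as a single instruction replaces the corresponding group of scalar operations, which is where a speedup over the scalar architecture comes from.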

Behaviour of floating point operations

• Defined in a C dialect intended for the modelling of functional units
• Translated into simulation and implementation (RTL) models
• Synthesis on standard cell library, mapping on FPGA primitives

Vector types and operators defined for the C compiler

    vector v1, va[4], vb[4];
    v1 += va[0] * vb[1];

LQCD ASIP

Architecture specialisation: address generation

Goal: Vector units should be used every cycle, address generation must be done in parallel

How: to be investigated, after feedback from C compiler!

Deliverables

• SDK (Compiler, Assembler, Linker, Simulator, Debugger) based on IP Designer
• SystemC model
• RTL Model + FPGA mapping
