Target - technology presentation

Ideas for the design of an ASIP for LQCD

Target Compiler Technologies CASTNESS’11, Rome, Italy

Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

ASIPs in Multi-Core SoC

ASIP: Application-Specific Processor

• Anything between general-purpose µP and hardwired data-path
• Flexibility through programmability and design-time reconfigurability
• High throughput, low energy through parallelism and specialization

ASIP is foundation of heterogeneous multi-core SoC

Why ASIPs?

Maximise performance

• Specialisation
• Parallelism: VLIW, SIMD, multi-core

Minimise power dissipation

• Specialisation
• Parallelism: VLIW, SIMD, multi-core
• Power-optimised RTL generation

Leverage the benefits of programmability

• React to changing requirements
• Ship first for evolving standards
• Remedy defects
• Extend products to new markets without an SoC respin

CASTNESS11, Rome Italy © 2011 Target Compiler Technologies

IP Designer Tool Suite

nML – ASIP description language

Example: architectural specialisation

• Absolute-difference instruction in motion estimation

Structural skeleton

• Registers, busses, functional units
• Application-specific data type ‘vector’

    reg V[4];
    trn vecr;
    trn vecs;
    trn vecd;
    trn vect;
    fu vec;
    fu vabs;
    ...

Instruction-set grammar

    opn vec_adiff_opn(t:c2u, r:c2u) {
      action {
        stage E1:
          vecd = vsub(vecr=V[r], vecs=V[t]) @vec;
          V[t] = vect = vabs(vecd) @vabs;
      }
      syntax : "vadiff v"t ",v"r ",v"t;
      image : t::r;
    }
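The behaviour of the vadiff instruction above can be mimicked in plain C, which may help readers unfamiliar with nML. This is only a sketch: the lane count (LANES) and the 16-bit element type are assumptions, since the slide does not say how wide the ‘vector’ type is, and no saturation is modelled. The names vsub, vabs and V follow the nML fragment; vector_t and vadiff are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define LANES 8        /* assumption: the slide does not give the lane count */
#define NUM_VREGS 4    /* matches `reg V[4]` in the nML fragment */

/* Hypothetical stand-in for the application-specific 'vector' data type. */
typedef struct { int16_t lane[LANES]; } vector_t;

vector_t V[NUM_VREGS]; /* the vector register file */

/* Lane-wise subtraction, as performed on functional unit `vec`. */
static vector_t vsub(vector_t r, vector_t s) {
    vector_t d;
    for (int i = 0; i < LANES; ++i)
        d.lane[i] = (int16_t)(r.lane[i] - s.lane[i]);
    return d;
}

/* Lane-wise absolute value, as performed on functional unit `vabs`. */
static vector_t vabs(vector_t d) {
    vector_t t;
    for (int i = 0; i < LANES; ++i)
        t.lane[i] = (int16_t)abs(d.lane[i]);
    return t;
}

/* Models "vadiff vT,vR,vT": V[t] = |V[r] - V[t]| in every lane. */
void vadiff(unsigned t, unsigned r) {
    vector_t vecd = vsub(V[r], V[t]);
    V[t] = vabs(vecd);
}
```

In the ASIP both steps execute in the single pipeline stage E1, so the whole function corresponds to one instruction.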

Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

EURETILE hardware platform

  

Communication

• DNP

Control

• RISC

Computation

[Figure: tile block diagram with blocks DNP, DSP, RISC, ASIP1 ***, MEM]

Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

LQCD ASIP

Goals

• Increase performance
• Decrease gate count or usage of FPGA blocks

Means

• Task level parallelism (multi tile architecture)
• Data level parallelism
• Instruction level parallelism
• Architecture specialisation

LQCD ASIP

Data level parallelism

[Figure: vector with elements c1, c2, c3]

• 3-way SIMD fits with SU(3) matrix algebra
• 3x speed improvement over scalar architecture
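To illustrate why three lanes suit SU(3) algebra, the sketch below multiplies a 3x3 complex matrix by a three-component vector using only 3-lane operations. It is written in standard C99 with complex.h rather than the ASIP's own vector types; the names vec3c, su3_matrix, vec_mac_scalar and su3_mul_vec, and the column-wise matrix storage, are assumptions made for this example.

```c
#include <complex.h>

/* One 3-lane SIMD word holding the elements c1, c2, c3 (hypothetical type). */
typedef struct { float complex c[3]; } vec3c;

/* 3x3 complex matrix stored column by column (an assumption of this sketch). */
typedef struct { vec3c col[3]; } su3_matrix;

/* Stand-in for one 3-lane operation: acc += a * s, with the scalar s
   broadcast to all three lanes. */
static vec3c vec_mac_scalar(vec3c acc, vec3c a, float complex s) {
    for (int i = 0; i < 3; ++i)
        acc.c[i] += a.c[i] * s;
    return acc;
}

/* y = U * x, expressed as three 3-lane multiply-accumulates:
   y = col0 * x1 + col1 * x2 + col2 * x3. */
vec3c su3_mul_vec(const su3_matrix *U, vec3c x) {
    vec3c y = {{0, 0, 0}};
    for (int j = 0; j < 3; ++j)
        y = vec_mac_scalar(y, U->col[j], x.c[j]);
    return y;
}
```

Every inner loop runs over exactly three lanes, so a 3-way SIMD unit would leave no lane idle in this formulation.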

Instruction level parallelism

• VLIW instruction word

VU_1 … VU_n LS_0 … LS_m

• Arithmetic operations in parallel with load/store operations
• Appropriate mix of n and m based on feedback from compilation of Qphi() function
• n*m speed improvement over scalar architecture

LQCD ASIP

 

Architecture specialisation: complex floating point operations

• C + C, C + i*C, C - C, C - i*C
  → 2x speedup over scalar architecture
• C * R, C * C, …
  → 4x speedup over scalar architecture
  → 8x speedup over scalar architecture
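The gain from complex-valued instructions comes from folding several real floating point operations into one. The sketch below spells out the real operations behind some of the listed complex operations in plain C. The type cplx and the function names are hypothetical; the operation counts in the comments are ordinary arithmetic facts, and how they map onto the speedup figures quoted above is not spelled out in the slides.

```c
/* Hypothetical complex type made of two real floats. */
typedef struct { float re, im; } cplx;

/* C + C: 2 real additions. */
cplx c_add(cplx a, cplx b) {
    return (cplx){ a.re + b.re, a.im + b.im };
}

/* C + i*C: since i*(x + iy) = -y + ix, this needs no multiplication,
   only 1 real subtraction and 1 real addition. */
cplx c_add_i(cplx a, cplx b) {
    return (cplx){ a.re - b.im, a.im + b.re };
}

/* C - i*C: likewise 1 real addition and 1 real subtraction. */
cplx c_sub_i(cplx a, cplx b) {
    return (cplx){ a.re + b.im, a.im - b.re };
}

/* C * R: 2 real multiplications. */
cplx c_mul_r(cplx a, float r) {
    return (cplx){ a.re * r, a.im * r };
}

/* C * C: 4 real multiplications plus 1 subtraction and 1 addition. */
cplx c_mul(cplx a, cplx b) {
    return (cplx){ a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re };
}
```

On a scalar machine each real operation costs one instruction, whereas a specialised functional unit can issue the whole complex operation at once.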

Behaviour of floating point operations

• Defined in a C dialect intended for the modelling of functional units
• Translated into simulation and implementation (RTL) models
• Synthesis on standard cell library, mapping on FPGA primitives

Vector types and operators defined for the C compiler

    vector v1, va[4], vb[4];
    v1 += va[0] * vb[1];
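Standard C has no operator overloading, so the statement v1 += va[0] * vb[1] only compiles with the ASIP's own compiler. As a rough portable equivalent, the sketch below expresses it with ordinary functions. It assumes that a vector holds three complex lanes and that * and += act lane by lane; neither is stated on the slide. The names VLEN, vec_mul, vec_add and mac_example are hypothetical.

```c
#include <complex.h>

#define VLEN 3  /* assumption: three complex lanes per vector */

typedef struct { float complex lane[VLEN]; } vector;

/* Lane-wise product: the assumed meaning of `*` on two vectors. */
static vector vec_mul(vector a, vector b) {
    vector p;
    for (int i = 0; i < VLEN; ++i)
        p.lane[i] = a.lane[i] * b.lane[i];
    return p;
}

/* Lane-wise sum: the assumed meaning of `+` on two vectors. */
static vector vec_add(vector a, vector b) {
    vector s;
    for (int i = 0; i < VLEN; ++i)
        s.lane[i] = a.lane[i] + b.lane[i];
    return s;
}

/* Portable equivalent of:  v1 += va[0] * vb[1];
   Returns the updated value of v1. */
vector mac_example(vector v1, const vector va[4], const vector vb[4]) {
    return vec_add(v1, vec_mul(va[0], vb[1]));
}
```

With the vector operators built into the compiler, the same computation is written as a single expression and can be mapped directly onto the vector units.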

LQCD ASIP

Architecture specialisation: address generation

Goal: Vector units should be used every cycle, so address generation must be done in parallel

How: to be investigated, after feedback from C compiler!

Deliverables