CS5222 Advanced Computer Architecture
Part 3: VLIW Architecture
Fall Term, 2004/2005
Chi Chi Hung (email: [email protected])
Building S/17, Rm 5-13
Phone: 6874-2832


Chi C.H. Fall 2004 NUS

Basic Working Principles of VLIW

° Aim at speeding up computation by exploiting instruction-level parallelism.

° Same hardware core as superscalar processors: multiple execution units (EUs) working in parallel.

° An instruction consists of multiple operations; typical word length ranges from 52 bits to 1 Kbits.

° All operations in an instruction are executed in lock-step mode.

° One or multiple register files for FX (fixed-point) and FP (floating-point) data.

° Rely on the compiler to find parallelism and schedule dependency-free program code.
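The lock-step behaviour can be sketched with a toy model. The `Op` format, the two opcodes, and the register names below are hypothetical and chosen only for illustration; the point is that every slot of a word reads the register state from before the word issues, and all results are written back together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One operation slot of a long instruction word (hypothetical format)."""
    opcode: str  # "add" or "mul" in this toy model
    dst: str
    src1: str
    src2: str


def execute_word(word, regs):
    """Execute all non-empty slots of one instruction word in lock-step.

    Every slot reads the register values from before the word issues;
    results are committed together once all slots have been evaluated.
    """
    alu = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    results = {}
    for op in word:
        if op is None:  # empty slot: that execution unit idles this cycle
            continue
        results[op.dst] = alu[op.opcode](regs[op.src1], regs[op.src2])
    new_regs = dict(regs)
    new_regs.update(results)
    return new_regs


regs = {"r1": 2, "r2": 3, "r3": 4, "r4": 0, "r5": 0}
word = [Op("add", "r4", "r1", "r2"), Op("mul", "r5", "r2", "r3"), None]
regs = execute_word(word, regs)
print(regs["r4"], regs["r5"])
```

The compiler, not this execution loop, is responsible for ensuring that the operations packed into one word do not depend on each other.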

Basic VLIW Approach

Register File Structure for VLIW

° What is the challenge for the register file in VLIW? The number of read/write (R/W) ports.
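A rough count shows why ports become the concern. The sketch below assumes that every execution unit issues one operation per cycle and that each operation reads two source registers and writes one result; these per-operation figures are assumptions for illustration, not numbers from the slides.

```python
def register_ports(num_units, reads_per_op=2, writes_per_op=1):
    """Return (read_ports, write_ports) needed if every unit issues each cycle."""
    return num_units * reads_per_op, num_units * writes_per_op


for n in (1, 4, 8):
    read_ports, write_ports = register_ports(n)
    print(f"{n} units: {read_ports} read ports, {write_ports} write ports")
```

Under these assumptions the port count grows linearly with the number of execution units, which is one motivation for splitting the register file into several smaller ones.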

Differences Between VLIW & Superscalar Architecture (I)

Differences Between VLIW & Superscalar Architecture (II)

° Instruction formulation:

Superscalar:

- Receives conventional instructions conceived for sequential processors.

VLIW:

- Receives (very) long instruction words, each comprising a field (or opcode) for each execution unit.

- Instruction word length depends on (a) the number of execution units, and (b) the code length needed to control each unit (such as opcode length, register names, …).

- Typical word length is 64 – 1024 bits, much longer than the conventional machine word length.
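The dependence of word length on (a) and (b) can be made concrete with a small calculation. The field layout below (one opcode plus three register names per operation) and the example sizes are hypothetical; real VLIW formats also carry immediates and control bits.

```python
def bits_needed(num_values):
    """Number of bits needed to encode `num_values` distinct values."""
    return (num_values - 1).bit_length()


def op_field_bits(num_opcodes, num_registers, regs_per_op=3):
    """Bits for one operation field: an opcode plus its register names."""
    return bits_needed(num_opcodes) + regs_per_op * bits_needed(num_registers)


def word_length_bits(num_units, num_opcodes, num_registers):
    """Total word length with one operation field per execution unit."""
    return num_units * op_field_bits(num_opcodes, num_registers)


# Assumed example: 8 execution units, 64 opcodes, 64 registers.
print(op_field_bits(64, 64))        # 24 bits per operation field
print(word_length_bits(8, 64, 64))  # 192 bits per instruction word
```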


Differences Between VLIW & Superscalar Architecture (III)

° Instruction scheduling:

Superscalar:

- Done dynamically at run-time by the hardware.

- Data dependency is checked and resolved in hardware.

- Needs a lookahead hardware window for instruction fetch.
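What the superscalar hardware does at run-time can be approximated in software. The following is a simplified sketch, not a model of any particular processor: instructions are `(destination, sources)` pairs, and an instruction is selected for issue only if it has no RAW, WAW, or WAR dependency on any earlier instruction in the window.

```python
def independent(later, earlier):
    """True if `later` has no data dependency on `earlier`."""
    dst_l, srcs_l = later
    dst_e, srcs_e = earlier
    raw = dst_e in srcs_l   # later reads what earlier writes
    waw = dst_e == dst_l    # both write the same register
    war = dst_l in srcs_e   # later overwrites what earlier reads
    return not (raw or waw or war)


def select_for_issue(window, width):
    """Return indices of up to `width` instructions that can issue together."""
    issued = []
    for i, instr in enumerate(window):
        if len(issued) == width:
            break
        if all(independent(instr, window[j]) for j in range(i)):
            issued.append(i)
    return issued


window = [
    ("r1", ("r2", "r3")),  # 0: r1 = r2 + r3
    ("r4", ("r1", "r5")),  # 1: r4 = r1 + r5  (RAW on instruction 0)
    ("r6", ("r7", "r8")),  # 2: r6 = r7 + r8
    ("r9", ("r6", "r2")),  # 3: r9 = r6 + r2  (RAW on instruction 2)
]
print(select_for_issue(window, width=2))  # [0, 2]
```

The pairwise comparison inside the window is the work that hardware must perform every cycle, and it grows with the window size.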


Differences Between VLIW & Superscalar Architecture (IV)

° Instruction scheduling (cont’d):

VLIW:

- Static scheduling done at compile-time by the compiler.

Advantages:

– Reduced hardware complexity.

– Tasks such as decoding, data dependency detection, and instruction issue become simple.

– Potentially higher clock rate.

– Higher degree of parallelism, since the compiler has global program information.
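Static scheduling can be illustrated with a greedy packing pass. This is a minimal sketch under stated assumptions: every operation has unit latency, any slot can hold any operation, and the operation names and dependency table are made up for the example. Production compilers use far more elaborate techniques.

```python
def schedule(ops, deps, width):
    """Pack operations into long instruction words at compile time.

    ops   : operation names in program order (dependencies come first)
    deps  : dict mapping an operation to the set of operations it needs
    width : number of operation slots per instruction word
    Returns a list of words; None marks an empty slot.
    """
    cycle_of = {}
    words = []
    for op in ops:
        # Earliest word is one after the latest producer (unit latency).
        cycle = 1 + max((cycle_of[d] for d in deps.get(op, ())), default=-1)
        while True:
            while len(words) <= cycle:
                words.append([])
            if len(words[cycle]) < width:
                words[cycle].append(op)
                cycle_of[op] = cycle
                break
            cycle += 1  # word is full, try the next one
    return [w + [None] * (width - len(w)) for w in words]


ops = ["a", "b", "c", "d", "e"]
deps = {"c": {"a", "b"}, "d": {"c"}}
for word in schedule(ops, deps, width=2):
    print(word)
```

Because all dependency checking happens here, the hardware can issue each word without any run-time comparison of register names.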


Differences Between VLIW & Superscalar Architecture (V)

° Instruction scheduling (cont’d):

VLIW:

Disadvantages:

– Higher complexity of the compiler.

– Compiler optimization needs to consider technology-dependent parameters such as latencies and the load-use time of the cache. (Question: What happens to the software if the hardware is updated?)

– Cache misses are non-deterministic, so code scheduling has to assume the worst case.

– Unfilled opcode fields in a (V)LIW waste memory space and instruction bandwidth.
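The interaction between the last two points can be shown with a toy model. It assumes an uncompressed encoding in which a load is followed by a single dependent use and no other work is available to fill the gap; the latencies are arbitrary example values. The longer the load latency the compiler has to assume, the more empty slots end up in the code.

```python
def schedule_load_use(assumed_load_latency, width):
    """Words for one load followed by one dependent use, nothing else to do."""
    words = [["load"] + [None] * (width - 1)]
    # Empty words issued while the load result is assumed to be pending.
    words += [[None] * width for _ in range(assumed_load_latency - 1)]
    words.append(["use"] + [None] * (width - 1))
    return words


def wasted_fraction(words):
    """Fraction of operation slots that are empty."""
    total = sum(len(w) for w in words)
    empty = sum(slot is None for w in words for slot in w)
    return empty / total


for latency in (2, 10):  # assumed hit latency vs. assumed worst case
    words = schedule_load_use(latency, width=4)
    print(latency, len(words), round(wasted_fraction(words), 3))
```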


Development History of Proposed/Commercial VLIWs

Case Study of VLIW: Trace 200 Family (I)

Case Study of VLIW: Trace 200 Family (II)

° Only two branches might be used in the Trace 7/200.

Code Expansion in VLIW

° Code in VLIW is found to expand roughly by a factor of three.

° For a “long” VLIW, more opcode fields will be left empty, which wastes bandwidth and storage space. Can you propose a solution for it?
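One possible answer to the question above, offered as a sketch rather than as the intended solution, is to store each word in compressed form: a presence mask with one bit per slot, followed by only the non-empty operations. The function names and the example word are hypothetical.

```python
def compress(word):
    """Return (mask, ops); bit i of mask is set if slot i holds an operation."""
    mask = 0
    ops = []
    for i, slot in enumerate(word):
        if slot is not None:
            mask |= 1 << i
            ops.append(slot)
    return mask, ops


def expand(mask, ops, width):
    """Rebuild the full-width word from its mask and packed operations."""
    packed = iter(ops)
    return [next(packed) if (mask >> i) & 1 else None for i in range(width)]


word = ["add", None, None, "load", None, None, None, "br"]
mask, ops = compress(word)
print(bin(mask), ops)
print(expand(mask, ops, width=8) == word)
```

In this example three operations plus an 8-bit mask are stored instead of eight full fields; the word is expanded back to full width before it is issued.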
