FUTURE
• Timing seemed good
• However, the only student to give feedback
marked it confusing (2 of 5 on clarity) and too
fast (5 of 5 on pace).
• VLIW and MIMD do not mean anything to EE
undergrads.
• In general, the taxonomy at the end is not the
big revelation it was for UCB CS grad
students
Penn ESE680-002 Spring2007 -- DeHon
ESE680-002 (ESE534):
Computer Organization
Day 8: February 5, 2007
Computing Requirements and
Instruction Space
Previously
• Fixed and Programmable Computation
• Area-Time-Energy Tradeoffs
• VLSI Scaling
Today
• Computing Requirements
• Instructions
– Requirements
– Taxonomy
Computing Requirements
(review)
Requirements
• In order to build a general-purpose
(programmable) computing device, we
absolutely must have?
–_
–_
–_
–_
–_
Primitive compute
elements enough?
Compute and Interconnect
Sharing
Interconnect
Resources
Sharing Interconnect and Compute Resources
What role are the
memories playing
here?
Memory block or
Register File
Interconnect:
moves data from
input to storage
cell;
or from storage
cell to output.
What do I need to be able to use this circuit properly?
(reuse it on different data?)
Requirements
• In order to build a general-purpose
(programmable) computing device, we
absolutely must have?
– Compute elements
– Interconnect: space
– Interconnect: time (retiming)
– Interconnect: external (IO)
– Instructions
Instruction Taxonomy
• Distinguishing feature of programmable
architectures?
– Instructions -- bits which tell the device
how to behave
[Figure: example instruction bits driving a Compute element: 0000 00 → net0, 010 → add, 11 0110 → mem slot#6]
Focus on Instructions
• Instruction organization has a large
effect on:
– size or compactness of an architecture
– realm of efficient utilization for an
architecture
Terminology
• Primitive Instruction (pinst)
– Collection of bits which tell a single bit-processing element what to do
– Includes:
• select compute operation
• input sources in space
– (interconnect)
• input sources in time
– (retiming)
[Figure: example pinst fields driving a Compute element: 0000 00 → net0, 010 → add, 11 0110 → mem slot#6]
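The pinst fields listed above (compute operation, input source in space, input source in time) can be illustrated with a small decoder. This is only a sketch: the field widths (6, 3, 6 bits), the field order, the opcode table, and the reading of the last field as "2 mode bits + 4 slot bits" are assumptions inferred from the example bit groups on the slide, not a format the lecture defines.

```python
from dataclasses import dataclass

# Hypothetical field widths, read off the slide's example bit groups.
SRC_BITS, OP_BITS, MEM_BITS = 6, 3, 6

# Hypothetical opcode table; only 010 -> "add" appears on the slide.
OPCODES = {0b010: "add"}


@dataclass(frozen=True)
class Pinst:
    net: int       # input source in space (interconnect selection)
    op: str        # compute operation
    mem_slot: int  # input source in time (retiming memory slot)


def decode(bits: str) -> Pinst:
    """Split a pinst bit string into its three assumed fields."""
    bits = bits.replace(" ", "")
    if len(bits) != SRC_BITS + OP_BITS + MEM_BITS:
        raise ValueError("unexpected pinst length")
    net = int(bits[:SRC_BITS], 2)
    opcode = int(bits[SRC_BITS:SRC_BITS + OP_BITS], 2)
    mem_field = int(bits[SRC_BITS + OP_BITS:], 2)
    # Assumption: the low 4 bits of the last field select the memory slot.
    mem_slot = mem_field & 0b1111
    return Pinst(net, OPCODES.get(opcode, f"op{opcode}"), mem_slot)


print(decode("0000 00 010 11 0110"))
```

Under these assumptions, the slide's example bits decode to net 0, the add operation, and memory slot 6.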
Computational Array Model
• Collection of
computing elements
– compute operator
– local storage/retiming
• Interconnect
• Instruction
“Ideal” Instruction Control
• Issue a new instruction to every
computational bit operator on every
cycle
“Ideal” Instruction Distribution
• Why don’t we do this?
“Ideal” Instruction Distribution
• Problem: Instruction bandwidth (and
storage area) quickly dominates
everything else
– Compute Block ~ 1Mλ² (1Kλ × 1Kλ)
– Instruction ~ 64 bits
– Wire Pitch ~ 8λ
– Memory bit ~ 1.2Kλ²
Instruction Distribution
64 × 8λ = 512λ
Two instructions in 1024λ
Instruction Distribution
Distribute from both sides = 2x
Instruction Distribution
Distribute X and Y = 2x
Instruction Distribution
• Room to distribute 2 instructions across
PE per metal layer (1024λ = 2 × 8λ × 64)
• Feed top and bottom (left and right) = 2×
• Two complete metal layers = 2×
• ⇒ 8 instructions / PE side
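The arithmetic behind the "8 instructions per PE side" figure can be written out as a short calculation. This is a sketch using the slide's numbers; it takes the 1Kλ PE side as 1024λ, and the function and constant names are illustrative.

```python
PE_SIDE_LAMBDA = 1024     # PE side length (1Kλ taken as 1024λ)
WIRE_PITCH_LAMBDA = 8     # wire pitch in λ
INSTRUCTION_BITS = 64     # bits (wires) per instruction


def instructions_per_pe_side(pe_side=PE_SIDE_LAMBDA,
                             wire_pitch=WIRE_PITCH_LAMBDA,
                             instruction_bits=INSTRUCTION_BITS,
                             feed_sides=2,
                             metal_layers=2):
    """Instructions that can be routed across one PE side."""
    wires_per_layer = pe_side // wire_pitch            # 1024 / 8 = 128 wires
    per_layer = wires_per_layer // instruction_bits    # 128 / 64 = 2 instructions
    return per_layer * feed_sides * metal_layers       # 2 x 2 x 2


print(instructions_per_pe_side())  # 8
```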
Instruction Distribution
• Maximum of 8 instructions per PE side
• Saturate wire channels at 8√N = N
• ⇒ at 64 PE
– beyond this:
• instruction distribution dominates area
• Instruction consumption goes with area
• Instruction bandwidth goes with
perimeter
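The perimeter-versus-area argument can be checked numerically. The sketch below assumes a square array of side × side PEs fed along one array edge at 8 instructions per PE side, so supply grows with the side length while demand grows with the PE count; the function names are illustrative.

```python
def supply(side_pes, per_pe_side=8):
    """Instructions per cycle that fit through the array edge."""
    return per_pe_side * side_pes


def demand(side_pes):
    """Instructions per cycle consumed: one per PE in a side x side array."""
    return side_pes * side_pes


def saturation_array_size(per_pe_side=8, max_side=1024):
    """Largest square array (in PEs) whose demand the edge can still supply."""
    side = 1
    while side < max_side and supply(side + 1, per_pe_side) >= demand(side + 1):
        side += 1
    return side * side


print(saturation_array_size())  # 64
```

With 8 instructions per PE side, the crossover is an 8 × 8 array, matching the 64 PE figure on the slide.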
Instruction Distribution
• Beyond 64 PE, instruction bandwidth
dictates PE size
PEarea 4N
=N
(648l)
PEarea =16Kl2N
• As we build larger arrays
processing elements become less dense
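A small calculation shows how the PE area must grow once instruction bandwidth is the limit. This is a sketch under the slide's numbers, taking 1Mλ² as 1024 × 1024 λ² and 4 as the combined feed factor (two sides times two metal layers); the function name is illustrative.

```python
import math

BASE_PE_AREA = 1024 * 1024   # λ², compute block alone (1Mλ²)
INSTRUCTION_BITS = 64
WIRE_PITCH = 8               # λ
FEEDS = 4                    # two sides x two metal layers


def pe_area(n_pes):
    """PE area in λ² for an array of n_pes fed from its perimeter."""
    # Solve side * FEEDS * sqrt(N) = N * (INSTRUCTION_BITS * WIRE_PITCH) for side.
    side = n_pes * INSTRUCTION_BITS * WIRE_PITCH / (FEEDS * math.sqrt(n_pes))
    # Below saturation the compute block itself sets the PE size.
    return max(BASE_PE_AREA, side * side)


for n in (16, 64, 256, 1024):
    print(n, pe_area(n))
```

Up to 64 PEs the area stays at 1Mλ²; beyond that it grows linearly with N (16Kλ² × N), so a 256 PE array needs PEs four times larger.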
Avoid Instruction BW Saturation?
• How might we avoid this?
Instruction Memory
Requirements
• Idea: put instruction memory in array
• Problem: Instruction memory can
quickly dominate area, too
– Memory Area = 64 × 1.2Kλ²/instruction
– PE_area = 1Mλ² + (Instructions) × 80Kλ²
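To see how quickly local instruction memory takes over, the area model above can be evaluated for a few instruction depths. This is a sketch using the slide's rounded figures (1Mλ² of compute, about 80Kλ² per stored instruction); the function names are illustrative.

```python
COMPUTE_AREA = 1_000_000       # λ², compute block (1Mλ²)
AREA_PER_INSTRUCTION = 80_000  # λ², 64 bits x 1.2Kλ² rounded up to 80Kλ²


def pe_area_with_memory(n_instructions):
    """PE area in λ² when n_instructions are stored locally."""
    return COMPUTE_AREA + n_instructions * AREA_PER_INSTRUCTION


def memory_fraction(n_instructions):
    """Fraction of the PE area spent on instruction memory."""
    return n_instructions * AREA_PER_INSTRUCTION / pe_area_with_memory(n_instructions)


for n in (1, 4, 13, 64):
    print(n, pe_area_with_memory(n), round(memory_fraction(n), 2))
```

Under this model, instruction memory exceeds half of the PE area once about 13 instructions are stored per PE.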
Instruction Pragmatics
• Instruction requirements could dominate
array size.
• Standard architecture trick:
– Look for structure to exploit in “typical
computations”
Typical Structure?
• What structure do we usually expect?
Two Extremes
• SIMD Array
(microprocessors)
– Instruction/cycle
– share instruction
across array of PEs
– uniform operation in
space
– operation variance in
time
• FPGA
– Instruction/PE
– assume temporal
locality of instructions
(same)
– operation variance in
space
– uniform operations in
time
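The contrast between the two extremes can be made concrete with a toy model. This is a sketch only: the operation table, the function names, and the use of word-level Python integers in place of bit-level PEs are simplifying assumptions.

```python
import operator

# Hypothetical operation set for the toy PEs.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_simd(program, a, b):
    """SIMD extreme: one instruction per cycle, shared by every PE.

    program is a list of op names, one per cycle. The operation is
    uniform in space and varies in time.
    """
    return [[OPS[op](x, y) for x, y in zip(a, b)] for op in program]


def run_fpga(config, a_stream, b_stream):
    """FPGA extreme: one instruction per PE, held fixed over time.

    config is a list of op names, one per PE. The operation varies
    in space and is uniform in time.
    """
    return [[OPS[op](x, y) for op, x, y in zip(config, a, b)]
            for a, b in zip(a_stream, b_stream)]


print(run_simd(["add", "mul"], [1, 2, 3], [4, 5, 6]))
print(run_fpga(["add", "mul", "sub"], [[1, 2, 3], [4, 5, 6]], [[1, 1, 1], [2, 2, 2]]))
```

In this model the SIMD array needs instruction storage proportional to the number of cycles in the program, while the FPGA-style array needs storage proportional to the number of PEs.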
Placing Architectures
• What programmable architectures
(organizations) are you familiar with?
• What differentiates a VLIW from a
multicore?
– E.g.
• 4-issue VLIW vs.
• 4 single-issue processors
Gross Parameters
• Instruction sharing width
– SIMD width
– granularity
• Instruction depth
– Instructions stored locally per compute
element
• pinsts per control thread
– E.g. VLIW width
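One way to make the three gross parameters concrete is to record them per architecture and derive instruction storage and thread counts from them. This is a rough sketch: the class, its derived quantities, and the example parameter values (array sizes, depths) are illustrative assumptions rather than figures from the lecture, and each issue slot is treated as a single compute element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionOrg:
    name: str
    n_pes: int              # compute elements in the array
    sharing_width: int      # elements sharing one pinst (SIMD width)
    depth: int              # pinsts stored locally per compute element
    pinsts_per_thread: int  # distinct pinsts issued together by one control thread

    def pinst_streams(self):
        """Independent pinst streams after sharing across elements."""
        return self.n_pes // self.sharing_width

    def stored_pinsts(self):
        """Total pinsts held in instruction memory."""
        return self.pinst_streams() * self.depth

    def control_threads(self):
        """Control threads needed to sequence all pinst streams."""
        return self.pinst_streams() // self.pinsts_per_thread


# Hypothetical example points in the design space.
examples = [
    InstructionOrg("FPGA-like", n_pes=64, sharing_width=1, depth=1, pinsts_per_thread=64),
    InstructionOrg("SIMD-like", n_pes=64, sharing_width=64, depth=1024, pinsts_per_thread=1),
    InstructionOrg("4-issue VLIW", n_pes=4, sharing_width=1, depth=1024, pinsts_per_thread=4),
    InstructionOrg("4 single-issue processors", n_pes=4, sharing_width=1, depth=1024, pinsts_per_thread=1),
]

for org in examples:
    print(org.name, org.stored_pinsts(), org.control_threads())
```

In this model the 4-issue VLIW and the four single-issue processors store the same number of pinsts and differ only in the number of control threads (one versus four).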
Architecture Instruction
Taxonomy
Instruction Message
• Architectures fall out of:
– general model too expensive
– structure exists in common problems
– exploit structure to reduce resource
requirements
• Architectures can be viewed in a unified
design space
Admin
• Instruction assignment due Wednesday
• Reading for today and Wed. on web
• Try GRW262 for André Office Hours this
week
Big Ideas
[MSB Ideas]
• Basic elements of a programmable
computation
– Compute
– Interconnect
• (space and time, outside system [IO])
– Instructions
• Instruction resources can be significant
– dominant/limiting resource
Big Ideas
[MSB-1 Ideas]
• Two key functions of memory
– retiming
– instructions
• description of computation