Dynamically Programmable
Array Architecture
Robert Heaton
Obsidian Technology
Confidential
Mesh of Trees
[Figure: 16 PUs connected as a mesh of trees]
- Busses are bi-directional
- 2 cycles to exchange data
- Separate X and Y dimensions
- Diagonal routing not directly supported
- PUs difficult to program to take advantage of structure
Two Dimensional Mesh
[Figure: two-dimensional mesh of 16 PUs]
4x4 Hierarchical Cluster
[Figure: 4x4 hierarchical cluster of 16 PUs and 5 RUs]
Simple 4x4 Cluster Wiring
[Figure: 4x4 cluster wiring; labels: 6*N wires, Hin1, Hout1, Hadr1, N, 2L-2, Joint, Switch, 1.4 M2 pitch]
Bus width = 140 µm for 16 bit busses
That is a lot of wires!
Budget 4x4 cluster area is 1 mm²
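As a rough cross-check of the wiring budget above, the sketch below multiplies wire count by metal pitch. It assumes the bus is 6*N parallel wires on one M2 layer at a 1.4 µm pitch. The figure labels suggest this, but the slide does not state the formula, and the function name is illustrative.

```python
def bus_width_um(bus_bits: int, busses: int = 6, m2_pitch_um: float = 1.4) -> float:
    """Width of `busses` parallel N-bit busses laid side by side on one metal layer."""
    return busses * bus_bits * m2_pitch_um


width = bus_width_um(16)
print(f"{width:.1f} um")  # 134.4 um, close to the 140 um quoted on the slide
```

Under these assumptions a single 16 bit bus bundle already takes roughly an eighth of the side of a square 1 mm² cluster.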
Routing Hierarchy
[Figure: routing hierarchy of PUs and RUs, with routing units labelled RU, RU1, RU2, and RU3]
- 256 PUs
- 4 levels of hierarchy
Address fields: Hadr L0adr L1adr L2adr L3adr
- Hadr: up level till
- L0adr: local address
- L1adr: level 1 address
- L2adr: level 2 address
- L3adr: level 3 address
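The link between 256 PUs and 4 levels of hierarchy can be sketched as follows. The sketch assumes each level address is 2 bits wide (as listed on the Instruction Fields slide), so every RU fans out to 4 children. The flat PU index and both function names are hypothetical, introduced only for illustration.

```python
LEVELS = 4          # 4 levels of hierarchy (from the slide)
BITS_PER_LEVEL = 2  # assumed 2 bit address per level, i.e. 4 children per RU


def split_address(pu_index: int) -> list:
    """Split a flat PU index into [L0adr, L1adr, L2adr, L3adr], lowest level first."""
    if not 0 <= pu_index < 4 ** LEVELS:
        raise ValueError("PU index out of range")
    return [(pu_index >> (BITS_PER_LEVEL * lvl)) & 0b11 for lvl in range(LEVELS)]


def join_address(fields: list) -> int:
    """Inverse of split_address."""
    return sum(f << (BITS_PER_LEVEL * lvl) for lvl, f in enumerate(fields))


print(4 ** LEVELS)          # 256 addressable PUs
print(split_address(0xB4))  # [0, 1, 3, 2]
```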
Week's Investigation (9/12/97)
Investigate routing structures
Dynamic routing assignment/programming
Compromise between area and flexibility
Support for tree of trees
Not a complete story yet!
Routing Unit
[Figure: one Routing Unit (RU) connected to four Process Units (PUs)]
- Full duplex connect busses
- Each PU node controls its source port via a 2 bit local or 6 bit hierarchical address
- Broadcast support
- Any node may listen to any other input to the cluster
- Hierarchical node addressing must not clash
Routing Unit PU Port Detail
[Figure: PU port detail; inputs from port 0, port 1, port 2, and port H are selected onto the N bit PU Input by the PU Input address (labels: 6, 2, 4, s0, s1, &); the N bit PU Output goes to the other ports]
- Port numbering is clockwise and relative to each PU port
- HBUS port is always at port 3
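A minimal sketch of the source selection described above, for one RU with four PU ports. The mapping of relative port numbers to neighbours (port 0 is the next PU clockwise) is an assumption, since the slide only says numbering is clockwise and relative. The function `route` is hypothetical.

```python
HBUS_PORT = 3  # the slide fixes the hierarchical bus at port 3


def route(pu_outputs, hbus_in, selects):
    """Return the value each of the 4 PU inputs sees.

    pu_outputs: 4 values driven by the PUs, listed clockwise.
    hbus_in:    value arriving on the hierarchical bus.
    selects:    4 two-bit local source addresses, one per PU.
    """
    inputs = []
    for pos, sel in enumerate(selects):
        if sel == HBUS_PORT:
            inputs.append(hbus_in)
        else:
            # Assumed mapping: relative port 0 is the next PU clockwise.
            inputs.append(pu_outputs[(pos + 1 + sel) % 4])
    return inputs


# Broadcast: PUs 1, 2, and 3 all listen to PU 0, while PU 0 listens to the HBUS.
print(route(["a", "b", "c", "d"], "h", [3, 2, 1, 0]))  # ['h', 'a', 'a', 'a']
```

Because every PU picks its own source, broadcast needs no extra hardware in this model: several listeners simply select the same port.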
PU Overview
Simple data path functionality
Primitive control options
Wide instructions control data path function and operand routing
Conditions may be inverted for “repeat until” or “Branch If” control
Very primitive address arithmetic
32 or fewer instructions per program
N Bit Functional Unit
[Figure: N bit functional unit datapath; labels: A, Constbit, mux0, mux1, LSin, RSin, SFTCTL, Bit Shift, ALUCTL, ALU/MULT, Cin, Cout, Carry Logic, DFF, mux2, F]
- Logic functions: OR, XOR, AND, 0, 1
- Arithmetic: add, subtract, multiply
- Shifts: single bit left and right
- Conditional detection: 0, -1, <0, >0
- More optimization needed
- Routing issues need more work
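The function list above can be sketched as a small behavioural model. This is an illustration under assumptions, not the circuit on the slide: results wrap to N bits in two's complement, the constant "1" is taken to mean all ones, and the operation names are invented for the example.

```python
def alu(op: str, a: int, b: int, n_bits: int = 16):
    """Apply one function to N-bit operands and return (result, flags)."""
    mask = (1 << n_bits) - 1
    ops = {
        "OR": a | b, "XOR": a ^ b, "AND": a & b,
        "ZERO": 0, "ONE": mask,          # assumption: "1" means all ones
        "ADD": a + b, "SUB": a - b, "MUL": a * b,
        "SHL": a << 1, "SHR": (a & mask) >> 1,  # single bit shifts
    }
    result = ops[op] & mask              # wrap to N bits
    sign = result >> (n_bits - 1)
    flags = {
        "zero": result == 0,                     # 0
        "minus_one": result == mask,             # -1
        "negative": sign == 1,                   # <0
        "positive": sign == 0 and result != 0,   # >0
    }
    return result, flags


print(alu("SUB", 3, 4, n_bits=8))
```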
N Bit Functional Unit (V2)
[Figure: N bit functional unit (V2) datapath; labels: Operands, N bit RAM (x2), mux0, mux1, LSin, RSin, SFTCTL, B Shift, Multiply Sequencer, ALUCTL, ALU, Cin, Cout, Carry Logic, DFF, mux2, Out]
- Logic functions: OR, XOR, AND, 0, 1
- Arithmetic: add, subtract
- Shifts: right and left shifts
- Conditional detection: 0, <0, >0, OF
- Memory mapped RAM access to operands
Instruction Fields
Field         Comment                                      Bits
ALU_CTL       Control of basic ALU functions               5
SHIFT_CTL     Control of the operand shift                 2
MUX_CTL       Control operand muxes                        3
BRANCH_ADR    Next address if condition true               2
COND_MSK      Condition mask                               5
COND_FLD      Condition field                              5
EXT_COND_SRC  Select source for external condition inputs  2
HEIR_ADDR     Hierarchical routing level address           2
L0_ADDR       Level 0 source address                       2
L1_ADDR       Level 1 source address                       2
L2_ADDR       Level 2 source address                       2
L3_ADDR       Level 3 source address                       2
?? + XN Bits per context
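To make the field list concrete, the sketch below tallies the listed widths and packs a set of field values into one word. The bit order (first field in the most significant bits) and the helper names are assumptions. The slide gives only names and widths and leaves the total open.

```python
FIELDS = [  # (name, width in bits), in the order listed on the slide
    ("ALU_CTL", 5), ("SHIFT_CTL", 2), ("MUX_CTL", 3), ("BRANCH_ADR", 2),
    ("COND_MSK", 5), ("COND_FLD", 5), ("EXT_COND_SRC", 2), ("HEIR_ADDR", 2),
    ("L0_ADDR", 2), ("L1_ADDR", 2), ("L2_ADDR", 2), ("L3_ADDR", 2),
]


def total_bits() -> int:
    return sum(width for _, width in FIELDS)


def pack(values: dict) -> int:
    """Pack field values into one word; unspecified fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Inverse of pack."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


print(total_bits())  # 34
```

The listed widths sum to 34 bits, so they are the fixed part of the per-context total that the slide leaves as "??".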
PU Instruction Types
32 Bits
Type          Code  Fields
Data Process  00    ALU_CTL, SFT_CTL, MUX_CTL, ROUTE_CTL
Move          01    OP_SEL, R/W, Operand_Value
Multiply      100   OP_SEL, Options, Immediate Operand
Attention     101   Condition, Branch_Adr, Options, Flag
Branch        110   Condition, Branch_Adr, Options, Link
ROUTE_CTL Field: Hadr L0adr L1adr L2adr L3adr
Condition Field: Invert +ve -ve zero OF X1 X0 | Condition Mask | Ext’ Source Sel (15 Bits)
Condition Field
Condition Field: Invert +ve -ve zero OF X1 X0 | Condition Mask | Ext’ Source Sel (15 Bits)
X[1:0] are external condition bits and may be sourced from:
- Operand bits
- Global synchronization bus
- Nearest neighbour condition outputs
Condition Mask is ANDed with flag bits
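One way to read the condition logic above is sketched here. The slide states only that the mask is ANDed with the flag bits. Combining the masked flags with an OR and applying Invert last are assumptions, and `condition_true` is a hypothetical name.

```python
FLAGS = ["+ve", "-ve", "zero", "OF", "X1", "X0"]  # order as listed on the slide


def condition_true(flags: dict, mask: dict, invert: bool) -> bool:
    """AND each flag with its mask bit, OR the results, then optionally invert."""
    hit = any(flags.get(name, False) and mask.get(name, False) for name in FLAGS)
    return hit != invert


flags = {"zero": True}
print(condition_true(flags, {"zero": True}, invert=False))  # branch if zero
print(condition_true(flags, {"zero": True}, invert=True))   # repeat until zero
```

With Invert set, the same mask turns a "Branch If" test into a "repeat until" loop exit, matching the control options listed in the PU overview.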
Static Program
[Figure: two-instruction loop: Data Process, Adr +1, Branch (Always)]
PU never changes function
Branch is set to always true
Just two instructions
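The two-instruction static program can be sketched as a tiny interpreter. Everything beyond the Data Process / Branch pair is assumed for illustration: the tuple encoding, the one-cycle branch, and the increment used as the data path function.

```python
def run(program, cycles, acc=0):
    """Execute a list of (kind, arg) instructions for a fixed number of cycles."""
    adr = 0
    trace = []
    for _ in range(cycles):
        kind, arg = program[adr]
        if kind == "data":
            acc = arg(acc)  # apply the fixed data path function
            adr += 1        # Adr + 1
        elif kind == "branch":
            adr = arg       # condition is always true, so always taken
        trace.append(acc)
    return trace


static_program = [
    ("data", lambda x: x + 1),  # hypothetical data path function
    ("branch", 0),              # always branch back to address 0
]
print(run(static_program, 6))  # [1, 1, 2, 2, 3, 3]
```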
More Typical Program
Open Issues
PU Data path width
Complexity of shift operations
RU Trunking
Number of contexts per PU
Flexible context RAM partitioning
Improve PU synchronization
Shifter Instructions
Design Tools
PU Assembler
Architecture mapping
Global resource allocation
Conditional N Bit PU Cell
[Figure: conditional N bit PU cell; labels: Input, Port address, A, B, Constbit, mux0, mux1, mux2, EXT[1:0], ALUCTL, SFTCTL, LSin, RSin, ColSel, RAM, Address Logic, Bit Shift, ALU/MULT, Branch, Condition Logic, Carry Logic, Cin, Cout, DFF, F, Out]
Commercial Viability
5x performance improvement over conventional solutions (mix of cost & power)
Conceptually simple
Clearly defined target applications
Simple system connections
Scalable
Support hardware & software standards
Conditional N Bit DPA Cell
[Figure: conditional N bit DPA cell surrounded by Routing Matrix blocks; labels: A, B, Constbit, mux0, mux1, mux2, EXT[1:0], LSin, RSin, SFTCTL, Bit Shift, ALUCTL, ALU, Branch, Carry Logic, Condition Logic, Cin, Cout, DFF, F, ColSel, RAM, Address Logic]
4 Bit Cell:
180 Gates
112 Bits RAM
N Bit Wide DPA
[Figure: three N bit wide DPA blocks, each with Program Storage, M Plane FU Decode, RAM StatusReg, N bit wide FU, and Condition Logic; busses A, B, C]
N Bit Wide PU Block
Instruction Format: OP Code | Source A | Source B | Shift Op | PipeBus | Status Msk
[Figure: N bit wide PU block; labels: Arbit, Local RAM, N bit wide Shift, Arbit, Inst RAM, Addr Logic, StatusReg, N bit wide ALU, Condition Logic, PipeBus]
NOTES/QUESTIONS
- Inst has no const, but has offsets.
- Inst RAM can be small. 64 words?
- Note: counter takes 3 instructions.
- How much subroutine support? None?
- Simplified 16 bit or full 32 bit instructions?
- 2 or 4 local area busses?
- Synchronization issue: master states accessible, cond mask use.
- Option to break or combine N bit DP elements?
- Resource pool on busses? E.g. MULT?
- Approx. size of 32 bit FU 800u x 500u?
- If so, a 16x8 processor array is possible,
- i.e. 128 processors at 100 MHz = 12800 MIPS.
- Turn off till global state instruction for power reduction.
- Handling of interrupts (if at all).
- Handle global signal interrupts how?
- Multiple bit wide segmentation through masks? E.g. 2 counters in one PU?
[Figure labels, continued: I Decode, State, HierBus, BusW, BusX, B, A]
Potential Configuration
- 128 32 bit “Pico” Process Units
- 12800 MIPS @ 100 MHz
- 80 mm² in 0.35 µm CMOS
- Concept of hierarchical hardware scope
- Very fast streaming operations
- Simple PU programming model
- Applications:
  - Video processing
  - LAN routing
  - DSP fast prototyping
[Figure: chip block diagram; labels: 16 x 8 PU ARRAY, Controller, Global RAM, MUX/DMA/FIFO, RAMBUS Interface, 256]
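The headline figures for this configuration follow from simple arithmetic, sketched below. It assumes one instruction per PU per clock and uses the tentative 800u x 500u FU size from the notes slide. The function name and the per-PU area model are illustrative only.

```python
def array_estimate(cols=16, rows=8, clock_mhz=100, fu_w_um=800, fu_h_um=500):
    """Peak MIPS and raw PU area, assuming one instruction per PU per clock."""
    pus = cols * rows
    mips = pus * clock_mhz
    pu_area_mm2 = pus * (fu_w_um / 1000) * (fu_h_um / 1000)
    return pus, mips, pu_area_mm2


pus, mips, area = array_estimate()
print(pus, mips, round(area, 1))  # 128 12800 51.2
```

On these numbers the PU array alone would take about 51 mm² of the 80 mm² die, leaving the remainder for routing, controller, global RAM, and interface blocks.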
PU Program Environment
Operands: BusW, BusX, Accumulator, HierBus, PipeBus, Local RAM.
Use: a PU typically runs a small program
– May be as few as two instructions
– 64 words of code maximum
Instruction types:
Arithmetic, logical
Data moving
Interrupt
Function             Instructions
Arithmetic           1
Counter              1-2
Mux                  1
Multiply Accumulate  3
FIFO Stage           3
Multiport Register   1
Shift Register       2
Architecture Figures of Merit
Average density vs application specific cells
Speed of applications vs hardwired logic
Percentage reuse
Next Steps
VHDL Modeling of Architecture
Primitive assembler tools for PUs
Selection, coding, and simulation of applications
Architecture tuning
Layout and verification of complete DPA
Design Tools
Tanner:
Schematic entry, logic simulation, custom layout, layout verification.
Circuit Simulation.
PC & Sun platforms.
MOSIS Libraries.
Mentor Graphics:
VHDL compilation and simulation.
Basic FU Routing
[Figure: basic routing between 12 FUs]