Coarse-Grained Reconfigurable Devices
Dataflow Machines
• General Structure:
 ALU-computing elements,
 Programmable interconnections,
 I/O components.
• Most prominent coarse-grained systems:
 PACT XPP
 NEC-DRP
 PicoChip
 Morphosys
 [RaPiD]
 Chameleon
PACT XPP
• V. Baumgarte, G. Ehlers, F. May, A. Nueckel, M. Vorbach, and
M. Weinhardt, “PACT XPP - A self-reconfigurable data
processing architecture,” J. Supercomput., vol. 26, no. 2, pp.
167–184, 2003.
• M. Petrov, T. Murgan, F. May, M. Vorbach, P. Zipf, and M.
Glesner, “The XPP architecture and its co-simulation within the
simulink environment.” in Proceedings of International
Conference on Field-Programmable Logic and Applications
(FPL), ser. Lecture Notes in Computer Science (LNCS), vol.
3203. Antwerp, Belgium: Springer, Aug. 2004, pp. 761–770.
• http://www.pactxpp.com/
PACT XPP
• Aim:
 Efficiently process streams of data from different sources (e.g. A/D converters) rather than single instructions (as in von Neumann computers).
• Characteristic:
 Computation should be done while data are streaming through the processing elements (PEs).
 It is therefore suitable to configure the PEs to match the natural computation paradigm of a given application.
Coarse-Grained Architectures
PACT XPP: Architecture
• XPP (Extreme Processing Platform)
 A hierarchical structure consisting of PAEs
• PAEs (Processing Array Elements)
 Coarse-grained PEs
 Adaptive
 Clustered in PACs (Processing Array Clusters)
 PA = PAC + CM
 A hierarchical configuration tree
 Memory elements (beside the PAs)
 I/O elements (on each side of the chip)
PACT XPP Architecture: CM
• CM (Configuration Manager):
 Powerful run-time reconfiguration:
− Configuration control is distributed over several CMs.
− PAEs can be configured rapidly in parallel while neighboring PAEs are processing data.
 Entire applications can be configured and run independently on different parts of the array.
 Reconfiguration can be triggered:
− externally, or
− internally, by special event signals originating within the array (self-reconfiguration).
• Local CM:
 One configuration manager (CM) attached to a local memory is responsible for writing the configuration onto a PA.
 The CMs at a lower level are controlled by a CM at the next higher level.
• Root CM:
 Attached to an external configuration memory.
 Supervises the whole device configuration.
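The configuration tree described above can be illustrated with a small sketch. This is not PACT's implementation: the class `ConfigManager`, the helper `build_tree`, the count of four local CMs, and the dictionary standing in for the external configuration memory are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConfigManager:
    """Hypothetical model of one CM in the configuration tree."""
    name: str
    children: List["ConfigManager"] = field(default_factory=list)
    # Stands in for the CM's internal RAM used for configuration caching.
    cache: Dict[str, str] = field(default_factory=dict)

    def load(self, config_id: str, source: Dict[str, str], log: List[str]) -> None:
        """Fetch a configuration from `source` (if not cached) and pass it down."""
        if config_id not in self.cache:
            self.cache[config_id] = source[config_id]
        log.append(f"{self.name}: {config_id}")
        for child in self.children:
            # A lower-level CM is served by the CM at the next higher level.
            child.load(config_id, self.cache, log)


def build_tree() -> ConfigManager:
    """Build a root CM supervising four local CMs (the count is an assumption)."""
    local_cms = [ConfigManager(f"local_cm_{i}") for i in range(4)]
    return ConfigManager("root_cm", children=local_cms)


if __name__ == "__main__":
    external_memory = {"fir_filter": "bitstream"}  # placeholder configuration data
    trace: List[str] = []
    build_tree().load("fir_filter", external_memory, trace)
    print(trace)
```

Only the root reads from the external memory here; every other CM receives the configuration from its parent, which mirrors the control hierarchy on the slide but says nothing about the real configuration protocol.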
XPP Architecture
• Scalability:
 Multiple devices can be cascaded in a multi-chip module.
 Their root CMs then act like ordinary, subordinate CMs.
• CM:
 Consists of a state machine and
 internal RAM for configuration caching
PACT XPP Architecture: PAE
1. ALU PAE contains:
 ALU: configured to perform basic operations:
− Common fixed-point arithmetic and logical operations
− Special three-input opcodes (e.g. multiply-add, sort, counters)
− Event generation (e.g. counter termination, overflow, …)
 Back Register: provides routing channels for data and events from bottom to top
 Forward Register: provides routing channels from top to bottom
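As a rough illustration of a three-input opcode and of event generation, the following sketch models a multiply-add that raises an overflow event and a counter that raises an event on termination. The function names, the 32-bit default width, and the unsigned wrap-around behavior are assumptions made for this example, not documented XPP semantics.

```python
def multiply_add(a, b, c, width=32):
    """Return (result, overflow_event) for a*b + c on `width`-bit unsigned data.

    The wrap-around on overflow is an assumption made for illustration.
    """
    full = a * b + c
    mask = (1 << width) - 1
    return full & mask, full > mask


def counter(limit, step=1):
    """Yield (value, event) pairs; the event is True only on the last count."""
    value = 0
    while value < limit:
        nxt = value + step
        yield value, nxt >= limit
        value = nxt


if __name__ == "__main__":
    print(multiply_add(2, 3, 4))
    print(list(counter(3)))
```

In a real device such events would travel as event packets and could, for example, trigger a reconfiguration; here they are simply returned as booleans.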
PACT XPP Architecture: PAE
 Dataflow Registers: used at the object output for data buffering in case of a pipeline stall.
 Input Registers: can be preloaded with configuration data and always provide single-cycle stall.
PACT XPP Architecture: PAE
2. RAM PAE:
 Like the ALU PAE, but with a dual-port RAM instead of the ALU
 Useful for data storage (intermediate results)
− Can be used in FIFO or RAM mode
 Useful for LUT-based functions
 The RAM generates a data packet after an address has been received at the input.
 Writing to the RAM requires two data packets:
1. one for the address
2. one for the data to be written.
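The packet behavior of the RAM PAE can be sketched as a toy model: a read consumes one address packet and produces one data packet, while a write consumes an address packet and a data packet. The class `RamPae` and its method names are hypothetical, and the FIFO is modeled with a separate queue purely for simplicity.

```python
from collections import deque


class RamPae:
    """Toy model of a RAM PAE; names and sizes are illustrative assumptions."""

    def __init__(self, size):
        self.mem = [0] * size
        self.fifo = deque()

    def read(self, address):
        # RAM mode: one address packet in, one data packet out.
        return self.mem[address]

    def write(self, address, data):
        # RAM mode: a write needs two packets, the address and the data.
        self.mem[address] = data

    def push(self, data):
        # FIFO mode: store a packet at the tail.
        self.fifo.append(data)

    def pop(self):
        # FIFO mode: emit the oldest stored packet.
        return self.fifo.popleft()


if __name__ == "__main__":
    lut = RamPae(16)
    for x in range(16):
        lut.write(x, x * x)  # preload a lookup table for f(x) = x^2
    print(lut.read(5))
```

Preloading the memory with function values, as in the usage lines, is one way to realize the LUT-based functions mentioned above.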
PACT XPP Architecture: Communication
• PAE objects communicate via a packet-oriented network:
 Two types of packets:
− Data packets: uniform bit width within a device (specific to the device type, e.g. 32 bits)
− Event packets: one or a few bits wide
 Self-synchronizing:
− An operation is performed as soon as all necessary input data packets are available.
− The results are forwarded as soon as they are available, provided the previous results have been consumed.
− It is thus possible to map a data-flow graph (DFG) directly onto ALU objects and to pipeline input data streams through it.
 Event signals:
− Can trigger a self-reconfiguration
− Can control the merging of data streams
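The self-synchronizing firing rule can be sketched in a few lines. The model below maps the DFG for a*b + c onto two objects and pipelines three input streams through it. `AluObject` and `run_mac_stream` are hypothetical names, and the single output register per object is an assumption used to express "provided the previous results have been consumed".

```python
from collections import deque


class AluObject:
    """Toy dataflow node: fires only when all inputs hold a packet."""

    def __init__(self, op, n_inputs):
        self.op = op
        self.inputs = [deque() for _ in range(n_inputs)]
        self.output = None  # None means the previous result was consumed

    def try_fire(self):
        ready = all(len(q) > 0 for q in self.inputs)
        if ready and self.output is None:
            args = [q.popleft() for q in self.inputs]
            self.output = self.op(*args)
            return True
        return False

    def take(self):
        value, self.output = self.output, None
        return value


def run_mac_stream(a_stream, b_stream, c_stream):
    """Pipeline the streams through mul -> add, i.e. the DFG of a*b + c."""
    mul = AluObject(lambda a, b: a * b, 2)
    add = AluObject(lambda p, c: p + c, 2)
    mul.inputs[0].extend(a_stream)
    mul.inputs[1].extend(b_stream)
    add.inputs[1].extend(c_stream)
    results = []
    progress = True
    while progress:
        progress = False
        if add.try_fire():
            results.append(add.take())
            progress = True
        # Forward the product only when the adder's input slot is free.
        if mul.output is not None and not add.inputs[0]:
            add.inputs[0].append(mul.take())
            progress = True
        if mul.try_fire():
            progress = True
    return results


if __name__ == "__main__":
    print(run_mac_stream([1, 2, 3], [4, 5, 6], [10, 20, 30]))
```

No global controller sequences the operations: each object checks only its own inputs and output, which is the sense in which the network is self-synchronizing.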
PACT XPP: Routing
• Routing and Communication:
 Two independent networks:
1. for data transmission
2. for event transmission
PACT XPP: Routing
1. Horizontal Channel
• to connect PAEs within a row.
2. Vertical Channel
• to connect objects to a given horizontal bus.
3. Configuration Bus
[Figure labels: vertical routing channels, horizontal routing channels]
PACT XPP: Interface
 Number and type of interfaces vary from device to device
• XPP42-A1: 6 internal interfaces consisting of:
 4 identical general-purpose on-chip I/O interfaces (bottom left, upper left, upper right, and bottom right)
 One configuration manager (not shown in the picture)
 One JTAG (Joint Test Action Group, "IEEE Standard 1149.1") boundary-scan interface for testing purposes
2.1 The PACT XPP - Interface
 The I/O interfaces can operate independently of each other.
• Two operation modes:
 RAM mode
 Streaming mode
• RAM mode:
 Each port can access external static RAM (SRAM).
 Control signals for the SRAM transactions are available.
 No additional logic is required.
2.1 The PACT XPP - Interface
• Streaming mode:
 For high-speed streaming of data to and from the device
 Each I/O element provides two bidirectional ports for data streaming
 Handshake signals are used to synchronize data packets with the external port
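The slide does not specify the handshake protocol, so the following is only a generic sketch that assumes a valid/ready scheme: a packet crosses the port in a cycle only when the sender has data (valid) and the receiver signals that it can accept it (ready). The function `stream_transfer` and its arguments are hypothetical.

```python
def stream_transfer(packets, receiver_ready):
    """Simulate one cycle per entry of `receiver_ready`.

    Returns (received, pending): the packets that crossed the port and the
    packets still waiting at the sender. The valid/ready scheme is an
    assumption; the actual XPP handshake signals are not given here.
    """
    pending = list(packets)
    received = []
    for ready in receiver_ready:
        valid = len(pending) > 0
        if valid and ready:
            received.append(pending.pop(0))
    return received, pending


if __name__ == "__main__":
    # The receiver stalls in the second cycle, so that cycle transfers nothing.
    print(stream_transfer([1, 2, 3], [True, False, True, True]))
```

Because a stalled cycle transfers nothing and loses nothing, the two sides can run at different rates without dropping packets.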