A Framework of Embedded
Reconfigurable Systems based on
Re-locatable Virtual Components
Authors: Victor Dumitriu and Lev Kirischian
Source: International Journal of Embedded Systems, vol. 4, no. 3/4, pp. 182–194, 2010
Presenter: Jun-Yang Peng
Date: 2016/8/3
Outline
• Introduction
• Related Works
• Architecture
• Virtual Component Organization
• On-Chip Loading and Relocation
• System Level Support
• Evaluation Results
• Conclusions and Future Work
2
Introduction
• High performance embedded systems are
oriented towards multi-task and multi-modal
data-stream processing applications
• The cost-performance ratio is the limiting factor for
these embedded systems
3
Introduction (cont.)
• The main reason is lack of reuse of relatively expensive
FPGA resources in the time domain
• Modern FPGA devices allow
– run-time reconfiguration of computing and interface
resources
– reconfiguration of part of the resources without
suspending the rest
• Multi-task and multi-modal applications
– only a part of the circuitry needs to be allocated in the chip for
the entire period of application activity
– the remaining components can be kept in “virtual” form
4
Related Works
• Virtualization of on-chip computing resources
– divide the computation process into fixed-size
segments that are time-multiplexed onto on-chip
physical hardware resources
• Coarse-grained versus fine-grained
– fine-grained homogeneous resources allow
more efficient adaptation of the data-path to
variations of algorithm and data structure
6
Virtual Hardware Components
• VHCs are macro-function-specific processing
circuits represented in the form of a configuration
bit-stream
– require a certain static infrastructure at start-up
time
– the infrastructure incorporates a set of slots-and-sockets for VHCs
– and the associated communication routing for all modes of
the task
– depending on the task mode (task algorithm or data
structure), the associated set of VHCs is loaded into
the addressed slots
7
Effective Problem
• Run-time allocation and re-location of virtual
hardware components in the target FPGA
– requires quite specific architectural organization
of VHCs
– organization of static interconnect infrastructure
• Inter-component communication
– Networks-on-Chip
– Dedicated communication (Multiplexer)
8
Architecture
• Include
– System-level requirements
– On-Chip-level requirements
– Individual components
• Analyze
– Timing overhead
– Performance limitations
– Hardware overhead
10
Virtual Component Organization
• Effectiveness of re-locatable VHC systems
– a portion of the device is reconfigured while the
rest of the device continues operating
– only the components needed for a particular
mode are activated
– a smaller FPGA device can be used
12
Static & Dynamic Multi-Modal Systems
Figure 1: Difference between static and dynamic multi-modal system implementation (figure labels: Static, Dynamic, MUX)
13
VHC Structure and Behavior
• VHC behavior can be divided into two aspects:
a) perform the required processing
b) accommodate the virtual nature of the system
• VHCs should:
1) be able to synchronize themselves with the system
2) have “safe” states
3) present a uniform communication interface
14
VHC Structure and Behavior (cont.)
• A VHC consists of two primary components:
a) data-path (e.g. buffering or processing)
b) control unit (e.g. stall or synchronize)
Figure 2: VHC structure and mode variation
15
On-Chip Loading and Relocation
• Infrastructure Specification and Requirements
– Flexibility: load any VHC into any of the system
slots (loading policy)
– Efficiency: accommodate changes in workload or
mode (homogeneous resources)
Figure 3: General architecture for systems using virtual hardware resources
17
Loading Process
• Loading process
1. Configure the FPGA with the VHC bit-stream
targeting a specific slot
2. Stall components that depend on the newly-loaded component
3. Connect the newly-loaded component to the
rest of the system
4. Start / re-synchronize the loaded component
and the stalled components surrounding it
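The four loading steps above can be sketched as a small software model. This is only an illustration: the `Slot` class, the `load_vhc` function, and the slot and VHC names are hypothetical and do not come from the paper, and the bit-stream download is reduced to an assignment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slot:
    """Hypothetical model of one reconfigurable slot."""
    vhc: Optional[str] = None   # name of the loaded VHC, None if the slot is empty
    connected: bool = False     # attached to the communication infrastructure
    stalled: bool = False       # held in a "safe" state


def load_vhc(slots: Dict[int, Slot], target: int, vhc: str,
             dependents: List[int]) -> List[str]:
    """Run the four loading steps in slide order and return a log of actions."""
    log = []
    # 1. Configure the FPGA with the VHC bit-stream targeting a specific slot.
    slots[target] = Slot(vhc=vhc)
    log.append(f"configure {vhc} into slot {target}")
    # 2. Stall components that depend on the newly-loaded component.
    for d in dependents:
        slots[d].stalled = True
        log.append(f"stall slot {d}")
    # 3. Connect the newly-loaded component to the rest of the system.
    slots[target].connected = True
    log.append(f"connect slot {target}")
    # 4. Start / re-synchronize the loaded component and the stalled ones.
    for d in dependents:
        slots[d].stalled = False
    log.append(f"start slot {target} and resume slots {dependents}")
    return log
```

For example, `load_vhc({0: Slot("smoothing", True), 1: Slot()}, 1, "sobel_y", [0])` loads a second component into slot 1 while the component in slot 0 is briefly stalled.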
18
Relocation Process
• Relocation Process
1. Stall / stop components that depend on the
component being relocated
2. Disconnect the component being relocated from the
rest of the system
3. Configure the FPGA with the VHC bit-stream
targeting a new slot
4. Re-connect the component to the rest of the system.
5. Start / re-synchronize the loaded component and
the stalled components surrounding it
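As a rough sketch, the five relocation steps can be expressed as an ordered list of actions over a placement table. The function name `relocate_vhc`, the action strings, and the bookkeeping of free slots are assumptions made for illustration; in particular, treating the old slot as free after relocation is an assumption, not something the slides state.

```python
from typing import Dict, List, Set


def relocate_vhc(placement: Dict[str, int], free_slots: Set[int],
                 component: str, new_slot: int,
                 dependents: List[str]) -> List[str]:
    """Move `component` to `new_slot`; return the actions in the order issued."""
    if new_slot not in free_slots:
        raise ValueError(f"slot {new_slot} is not free")
    old_slot = placement[component]
    actions = []
    # 1. Stall / stop components that depend on the component being relocated.
    actions += [f"stall {d}" for d in dependents]
    # 2. Disconnect the component from the rest of the system.
    actions.append(f"disconnect {component} from slot {old_slot}")
    # 3. Configure the FPGA with the VHC bit-stream targeting the new slot.
    actions.append(f"configure {component} into slot {new_slot}")
    placement[component] = new_slot
    free_slots.remove(new_slot)
    free_slots.add(old_slot)    # assumption: the old slot becomes available
    # 4. Re-connect the component to the rest of the system.
    actions.append(f"connect {component} at slot {new_slot}")
    # 5. Start / re-synchronize the component and the stalled components.
    actions.append(f"resync {component}")
    actions += [f"resync {d}" for d in dependents]
    return actions
```

Keeping the actions as an explicit list makes the ordering constraint visible: dependents are stalled before the disconnect and resumed only after the re-connect.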
19
Connection/Disconnection example
Figure 4: Connection/Disconnection Protocol
20
Signal Propagation Delay
1. VHCs be loaded into any slot in the system
2. The relocation capabilities of the system
• Signal Propagation Delay
– incorrect system operation
– performance decrease
(delay changes with the slot a VHC occupies, since wire lengths differ between slots)
Figure 5: Delay variation due to relocation
21
Communication Infrastructure
• The communication infrastructure should:
a) provide physical communication channels
between any components in the system
b) change communication channels if necessary
c) provide connection and disconnection
operations between component slots
d) offer some immunity from delay variations
22
Communication Infrastructure (cont.)
• The switch simply offers a physical channel
between components
• The switch is controlled by an external entity
Control line
Figure 6: Four-port switch
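A behavioural sketch of such a switch is given below. It assumes a simple crossbar in which an external controller chooses, for each output port, which input port drives it; the class name `FourPortSwitch` and its methods are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


class FourPortSwitch:
    """Hypothetical four-port crossbar driven by external control lines."""
    PORTS = 4

    def __init__(self) -> None:
        # select[out] is the input port routed to output `out`, or None.
        self.select: List[Optional[int]] = [None] * self.PORTS

    def _check(self, port: int) -> None:
        if not 0 <= port < self.PORTS:
            raise ValueError(f"port {port} out of range")

    def connect(self, src: int, dst: int) -> None:
        """Control operation: route input `src` to output `dst`."""
        self._check(src)
        self._check(dst)
        self.select[dst] = src

    def disconnect(self, dst: int) -> None:
        """Control operation: leave output `dst` undriven."""
        self._check(dst)
        self.select[dst] = None

    def route(self, inputs: List[object]) -> List[object]:
        """Data path: return the value seen at each output port."""
        return [None if s is None else inputs[s] for s in self.select]
```

The switch itself holds no policy: `connect` and `disconnect` stand in for the control line, and `route` only forwards data along whatever channels are currently set.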
23
Eliminate Signal Propagation Delay
• System timing is no longer dictated by propagation delays
caused by the wire length
– monitor wire delays
– manually insert registers
Figure 7: Four slot system with delay variation mitigation
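The effect of inserting registers can be illustrated with standard pipelining arithmetic: a long wire is split into segments that each fit within one clock period, at the cost of extra latency cycles. The function below is a sketch under simplifying assumptions (register setup and clock-to-output times are ignored), and the wire delays used in the example are hypothetical, not measurements from the paper.

```python
import math


def registers_needed(wire_delay_ns: float, clock_mhz: float) -> int:
    """Registers to insert so every wire segment fits in one clock period."""
    period_ns = 1000.0 / clock_mhz
    # Number of segments, each no longer than one clock period.
    segments = max(1, math.ceil(wire_delay_ns / period_ns))
    # n segments are separated by n - 1 registers (= extra latency cycles).
    return segments - 1


# Hypothetical wire delays at a 425 MHz clock (period of about 2.35 ns).
short_path = registers_needed(1.0, 425.0)   # fits in one period
long_path = registers_needed(5.0, 425.0)    # needs three segments
```

Each inserted register adds one cycle of latency but leaves throughput unchanged, which is why the clock rate no longer depends on which slot a component occupies.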
24
Hierarchical Inter-component Infrastructure
• Complex hierarchical system structures can be
composed using multiple switches
– complex switch structures only increase initial
latency
Figure 8: Hierarchical communication infrastructure
25
Hierarchical Inter-component Infrastructure (cont.)
• The communication infrastructure architecture can be
varied using VHCs, without
– external intervention
– complete reconfiguration
Figure 9: Variation in communication infrastructure architecture using VHCs
26
System Level Architecture Organization
• The block diagram is divided into two parts:
– The upper part is associated with hierarchy of
configuration memory and sub-system
Figure 10: Block diagram of RCS architecture
28
System Level Architecture Organization (cont.)
• The block diagram is divided into two parts:
– The bottom part is associated with data-stream(s)
processing
Figure 10: Block diagram of RCS architecture
29
Video Processing Using VHCs
• The system consists of a static acquisition and
display unit
– used to connect to a camera board and a VGA
monitor
• Two VHC types were used
– a smoothing mask (eliminate noise)
– a Y-dimensional Sobel mask (edge detection)
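To make the two processing stages concrete, the sketch below applies a 3x3 mask to a small grayscale image in software. The slides do not give the mask coefficients, so both are assumptions: the smoothing mask is taken to be a 3x3 box (mean) filter, and the Sobel mask is the standard Y-direction kernel. The function name `apply_mask` is hypothetical, and border pixels are simply left at zero.

```python
from typing import List

Image = List[List[int]]

# Assumed coefficients: a 3x3 box filter and the standard Sobel Y kernel.
SMOOTH = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def apply_mask(image: Image, mask: List[List[int]], divisor: int = 1) -> Image:
    """Slide a 3x3 mask over the image interior; border pixels stay 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += mask[dy][dx] * image[y + dy - 1][x + dx - 1]
            out[y][x] = acc // divisor
    return out


# A 4x4 test image with a horizontal edge between rows 1 and 2.
frame = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
edges = apply_mask(frame, SOBEL_Y)
smoothed = apply_mask(frame, SMOOTH, divisor=9)
```

In the VHC system each mask would be a separate hardware component placed in its own slot; chaining `apply_mask` calls plays the role of connecting one component's source to the next component's sink.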
31
Video Processing Using VHCs (cont.)
• A host PC was used to
– configure the FPGA
– store the full and partial bit-streams needed by
the system
– act as the control entity that connects/disconnects components
• JTAG configuration interface
32
Video Processing Systems
Figure 11-a: Four-slot video processing system
Figure 12-a: Eight-slot video processing system
33
Video Processing Systems (cont.)
Figure 11-b: Four-slot system post-place-and-route schematic
Figure 12-b: Eight-slot system post-place-and-route schematic
34
Video Processing Systems (cont.)
Figure 11-b: Four-slot system post-place-and-route schematic
Figure 13: VHC post-place-and-route schematic
35
Relocation Process Timing Analysis
• Timing overhead:
• Transmitting control information takes less than 160 µs;
switch delay is at most 100 ns
1. Disconnect the down-stream sink from the source of
the VHC being relocated.
2. Partially reconfigure the FPGA with the new VHC bit-stream, targeting the destination slot.
3. Connect the sink of the relocated VHC to the up-stream source.
4. Connect the down-stream sink to the source of the
relocated VHC.
36
Relocation Process Timing Analysis
• Timing overhead:
• Relocation
– reconfiguration time needed to download a new
bit-stream to the FPGA
Bit-stream | Size (bytes) | Tconf JTAG (s) | Tconf SelectMAP32 (ms)
System (Complete) | 5043464 | 6.73 | 12.6
Component (VHC) | 90578 | 0.12 | 0.23
Table 1: Bit-stream size and configuration times
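The configuration times in Table 1 are consistent with dividing the bit-stream size by the interface throughput. The sketch below reproduces them under assumed throughputs that are inferred from the table itself and not stated in the slides: roughly 6 Mbit/s for JTAG and 400 MB/s for SelectMAP32 (a 32-bit interface at 100 MHz). The function and constant names are hypothetical.

```python
def config_time_s(size_bytes: int, bytes_per_second: float) -> float:
    """Time to download a bit-stream at a given sustained throughput."""
    return size_bytes / bytes_per_second


# Assumed throughputs, back-derived from Table 1 (not given in the slides).
JTAG_BYTES_PER_S = 6e6 / 8            # about 6 Mbit/s
SELECTMAP32_BYTES_PER_S = 4 * 100e6   # 4 bytes per cycle at 100 MHz

FULL_SYSTEM_BYTES = 5043464
VHC_BYTES = 90578

full_jtag_s = config_time_s(FULL_SYSTEM_BYTES, JTAG_BYTES_PER_S)
vhc_jtag_s = config_time_s(VHC_BYTES, JTAG_BYTES_PER_S)
full_smap_ms = config_time_s(FULL_SYSTEM_BYTES, SELECTMAP32_BYTES_PER_S) * 1e3
vhc_smap_ms = config_time_s(VHC_BYTES, SELECTMAP32_BYTES_PER_S) * 1e3

# A VHC bit-stream is about 56 times smaller than the full one, so
# relocating one component is that much faster than full reconfiguration.
size_ratio = FULL_SYSTEM_BYTES / VHC_BYTES
```

Because configuration time scales linearly with bit-stream size on either interface, the benefit of partial reconfiguration is the same ratio regardless of which interface is used.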
37
Infrastructure Resource Utilization
• As the number of system slots increases
– the size of a fully connected switch increases
rapidly with the number of ports
Infrastructure | System Gates | LUTs | Flip-Flops | Overhead per slot
INT2 | 660 | 30 | 60 | 0.67%
INT4 | 2173 | 154 | 154 | 1.16%
INT4M | 2173↑ | 394 | 362 | 2.84%
INT8 | 6000 | 569 | 337 | 1.71%
INT8M | 6000↑ | 569 | 817 | 2.61%
Table 2: Communication infrastructure resource use
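The growth can be read directly from Table 2 by dividing the system-gate count by the number of slots. The sketch below does this for the three base configurations, assuming that the number in each name (INT2, INT4, INT8) is the slot count; that reading, and the helper name `gates_per_slot`, are assumptions rather than statements from the slides.

```python
from typing import Dict, Tuple

# (slots, system gates) taken from Table 2; slot counts are assumed
# from the configuration names.
SYSTEM_GATES: Dict[str, Tuple[int, int]] = {
    "INT2": (2, 660),
    "INT4": (4, 2173),
    "INT8": (8, 6000),
}


def gates_per_slot(table: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Infrastructure gate cost attributed to each slot."""
    return {name: gates / slots for name, (slots, gates) in table.items()}


per_slot = gates_per_slot(SYSTEM_GATES)
# INT2: 330.0, INT4: 543.25, INT8: 750.0 gates per slot
```

The per-slot cost more than doubles between two and eight slots, which matches the statement that a fully connected switch grows faster than linearly with its port count.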
38
Infrastructure Performance
• Verified via the ChipScope Analyzer
• The infrastructure could operate correctly at
operating clock frequencies of up to 425 MHz
– meaning it can accommodate high-performance
systems
39
Conclusions
• This paper presented a framework for the
design of systems utilizing virtual hardware
resources in the form of virtual hardware
components (VHCs)
• In addition, the structure can accommodate
component loading and relocation
41
Future Work
• Development of
– memory hierarchy for VHC storage
– dedicated cache
– automated management and control modules
42
Afterthought
Figure 11-a: Four-slot video processing system
Figure 12-a: Eight-slot video processing system
43
Thank you for listening