A Framework of Embedded Reconfigurable Systems based on Re-locatable Virtual Components.pptx
A Framework of Embedded Reconfigurable Systems based on Re-locatable Virtual Components
Authors: Victor Dumitriu and Lev Kirischian
Source: International Journal of Embedded Systems, vol. 4, no. 3/4, pp. 182–194, 2010
Presenter: Jun-Yang Peng
Date: 2016/8/3

Outline
• Introduction
• Related Works
• Architecture
• Virtual Component Organization
• On-Chip Loading and Relocation
• System Level Support
• Evaluation Results
• Conclusions and Future Work

Introduction
• High-performance embedded systems are oriented towards multi-task and multi-modal data-stream processing applications
• The cost-performance ratio is the limiting factor for these embedded systems

Introduction (cont.)
• The main reason is the lack of reuse of relatively expensive FPGA resources in the time domain
• Modern FPGA devices allow
  – run-time reconfiguration of computing and interface resources
  – reconfiguration of part of the resources without suspending the rest
• Multi-task and multi-modal applications
  – only part of the circuitry needs to be allocated on the chip for the entire period of application activity
  – the remaining components can be kept in “virtual” form

Related Works
• Virtualization of on-chip computing resources
  – divide the computation process into fixed-size segments that are time-multiplexed on the on-chip physical hardware resources
• Coarse-grained versus fine-grained
  – fine-grained homogeneous resources allow more efficient adaptation of the data-path to variations in algorithm and data structure

Virtual Hardware Components
• VHCs are macro-function-specific processing circuits represented in the form of configuration bit-streams
  – they require a certain static infrastructure at start-up time
  – the infrastructure incorporates a set of slots and sockets for VHCs
  – it also provides the associated communication routing for all modes of the task
  – depending on the task mode (task algorithm or data
structure), the associated set of VHCs is loaded into the addressed slots

Effective Problem
• Run-time allocation and re-location of virtual hardware components in the target FPGA
  – requires a quite specific architectural organization of VHCs
  – requires organization of the static interconnect infrastructure
• Inter-component communication
  – Networks-on-Chip
  – Dedicated communication (multiplexer)

Architecture
• Include
  – System-level requirements
  – On-chip-level requirements
  – Individual components
• Analyze
  – Timing overhead
  – Performance limitations
  – Hardware overhead

Virtual Component Organization
• Effectiveness of re-locatable VHC systems
  – a portion of the device is reconfigured while the rest of the device continues operating
  – only the components needed for a particular mode are activated
  – a smaller FPGA device can be used

Static & Dynamic Multi-Modal Systems
Figure 1: Difference between static and dynamic multi-modal system implementation (figure labels: Static, Dynamic, MUX)

VHC Structure and Behavior
• VHC behavior can be divided into two aspects:
  a) fulfilling the processing function
  b) accommodating the virtual nature of the system
• VHCs should:
  1) be able to synchronize themselves with the system
  2) have “safe” states
  3) present a uniform communication interface

VHC Structure and Behavior (cont.)
• A VHC consists of two primary components:
  a) data-path (e.g. buffering or processing)
  b) control unit (e.g.
stall or synchronize)
Figure 2: VHC structure and mode variation

On-Chip Loading and Relocation
• Infrastructure specification and requirements
  – Flexibility: load any VHC into any of the system slots (loading policy)
  – Efficiency: accommodate changes in workload or mode (homogenized resources)
Figure 3: General architecture for systems using virtual hardware resources

Loading Process
1. Configure the FPGA with the VHC bit-stream targeting a specific slot
2. Stall components that depend on the newly loaded component
3. Connect the newly loaded component to the rest of the system
4. Start / re-synchronize the loaded component and the stalled components surrounding it

Relocation Process
1. Stall / stop components that depend on the component being relocated
2. Disconnect the component being relocated from the rest of the system
3. Configure the FPGA with the VHC bit-stream targeting a new slot
4. Re-connect the component to the rest of the system
5. Start / re-synchronize the loaded component and the stalled components surrounding it

Connection/Disconnection Example
Figure 4: Connection/Disconnection Protocol

Signal Propagation Delay
1. VHCs can be loaded into any slot in the system
2. The system supports relocation of VHCs
• As a result, signal propagation delay varies with placement, which can cause
  – incorrect system operation
  – decreased performance
Figure 5: Delay variation due to relocation

Communication Infrastructure
• The communication infrastructure should:
  a) provide physical communication channels between any components in the system
  b) change communication channels if necessary
  c) provide connection and disconnection operations between component slots
  d) offer some immunity from delay variations

Communication Infrastructure (cont.)
• A switch simply offers a physical channel between components
• The switch is controlled by an external entity
Figure 6: Four-port switch (figure label: Control line)

Eliminate Signal Propagation Delay
• System timing is no longer dictated by propagation delays caused by wire length
  – monitor wire delays
  – manually insert registers
Figure 7: Four-slot system with delay variation mitigation

Hierarchical Inter-component Infrastructure
• Complex hierarchical system structures can be composed using multiple switches
  – complex switch structures only increase initial latency
Figure 8: Hierarchical communication infrastructure

Hierarchical Inter-component Infrastructure (cont.)
• The infrastructure architecture can be varied using VHCs without
  – external intervention
  – complete reconfiguration
Figure 9: Variation in communication infrastructure architecture using VHCs

System Level Architecture Organization
• The block diagram is divided into two parts:
  – the upper part is associated with the hierarchy of configuration memory and sub-system
  – the bottom part is associated with data-stream(s) processing
Figure 10: Block diagram of RCS architecture

Video Processing Using VHCs
• The system consists of a static acquisition and display unit
  – used to connect to a camera board and a VGA monitor
• Two VHC types were used
  – a smoothing mask (noise elimination)
  – a Y-dimensional Sobel mask (edge detection)

Video Processing Using VHCs (cont.)
• A host PC was used to
  – configure the FPGA
  – store the full and partial bit-streams needed by the system
  – act as the control entity that connects/disconnects components
• JTAG configuration interface

Video Processing Systems
Figure 11-a: Four-slot video processing system
Figure 12-a: Eight-slot video processing system

Video Processing Systems (cont.)
Figure 11-b: Four-slot system post-place-and-route schematic
Figure 12-b: Eight-slot system post-place-and-route schematic

Video Processing Systems (cont.)
Figure 11-b: Four-slot system post-place-and-route schematic
Figure 13: VHC post-place-and-route schematic

Relocation Process Timing Analysis
• Timing overhead:
  – transmitting control information (less than 160 µs) and switch delay (at most 100 ns)
1. Disconnect the down-stream sink from the source of the VHC being relocated.
2. Partially reconfigure the FPGA with the new VHC bit-stream, targeting the destination slot.
3. Connect the sink of the relocated VHC to the upstream source.
4. Connect the down-stream sink to the source of the relocated VHC.
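
The four relocation steps listed above can be sketched as a small host-side simulation. This is only an illustrative model, not the authors' control software: the names `SlotSystem`, `relocate`, and the slot labels `in`, `s0`, `s1`, `out` are hypothetical, and partial reconfiguration is modelled as a simple slot assignment.

```python
from dataclasses import dataclass, field


@dataclass
class SlotSystem:
    """Toy model (hypothetical): slots hold VHC names, links are (source, sink) pairs."""
    slots: dict = field(default_factory=dict)   # slot label -> VHC name or None
    links: set = field(default_factory=set)     # directed (source slot, sink slot)
    log: list = field(default_factory=list)     # ordered record of control actions

    def connect(self, source, sink):
        self.links.add((source, sink))
        self.log.append(f"connect {source}->{sink}")

    def disconnect(self, source, sink):
        self.links.discard((source, sink))
        self.log.append(f"disconnect {source}->{sink}")

    def reconfigure(self, slot, vhc):
        # Stands in for downloading the partial bit-stream into the slot.
        self.slots[slot] = vhc
        self.log.append(f"reconfigure {slot} with {vhc}")


def relocate(system, old_slot, new_slot, upstream, downstream):
    """Apply the four steps from the slide, in order."""
    vhc = system.slots[old_slot]
    # 1. Disconnect the down-stream sink from the source of the VHC being relocated.
    system.disconnect(old_slot, downstream)
    # 2. Partially reconfigure the FPGA, targeting the destination slot.
    system.reconfigure(new_slot, vhc)
    # 3. Connect the sink of the relocated VHC to the upstream source.
    system.connect(upstream, new_slot)
    # 4. Connect the down-stream sink to the source of the relocated VHC.
    system.connect(new_slot, downstream)
    # Assumption: the old slot's input link is left untouched, because the
    # listed steps do not mention it.


# Example: a smoothing VHC moves from slot s0 to slot s1.
system = SlotSystem(
    slots={"in": "acquisition", "s0": "smoothing", "s1": None, "out": "display"},
    links={("in", "s0"), ("s0", "out")},
)
relocate(system, "s0", "s1", upstream="in", downstream="out")
for entry in system.log:
    print(entry)
```

In this model only step 2 is slow; steps 1, 3 and 4 are control messages, which is consistent with the slide separating the control and switch overhead from the reconfiguration time.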

Relocation Process Timing Analysis (cont.)
• Timing overhead:
  – Relocation: reconfiguration time needed to download a new bit-stream to the FPGA

Table 1: Bit-stream size and configuration times
                     Size (bytes)   Tconf JTAG (s)   Tconf SelectMAP32 (ms)
System (Complete)    5043464        6.73             12.6
Component (VHC)      90578          0.12             0.23

Infrastructure Resource Utilization
• As the number of system slots increases
  – the size of a fully connected switch increases rapidly with the number of ports

Table 2: Communication infrastructure resource use
Infrastructure   System Gates   LUTs   Flip-Flops   Overhead per slot
INT2             660            30     60           0.67%
INT4             2173           154    154          1.16%
INT4M            2173↑          394    362          2.84%
INT8             6000           569    337          1.71%
INT8M            6000↑          569    817          2.61%

Infrastructure Performance
• Measured via the ChipScope Analyzer
• The infrastructure operated correctly at clock frequencies of up to 425 MHz
  – meaning it can accommodate high-performance systems

Conclusions
• This paper presented a framework for the design of systems utilizing virtual hardware resources in the form of virtual hardware components
• In addition, the structure can accommodate component loading and relocation

Future Work
• Development of
  – a memory hierarchy for VHC storage
  – a dedicated cache
  – automated management and control modules

Afterthought
Figure 11-a: Four-slot video processing system
Figure 12-a: Eight-slot video processing system

Thank you for listening
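
As a rough cross-check of Table 1, configuration time should scale with bit-stream size if each interface runs at a constant throughput. The sketch below derives the effective throughput from the full-system row and uses it to predict the VHC row. The constant-throughput assumption and the function names are ours, not something the slides state.

```python
# Values copied from Table 1.
SYSTEM_BYTES = 5_043_464
VHC_BYTES = 90_578
T_SYSTEM_S = {"JTAG": 6.73, "SelectMAP32": 12.6e-3}          # seconds
T_VHC_REPORTED_S = {"JTAG": 0.12, "SelectMAP32": 0.23e-3}    # seconds


def effective_throughput(size_bytes, seconds):
    """Bytes per second implied by one (size, time) measurement."""
    return size_bytes / seconds


def predicted_time(size_bytes, throughput):
    """Configuration time for a bit-stream, assuming constant throughput."""
    return size_bytes / throughput


predicted = {}
for iface, t_sys in T_SYSTEM_S.items():
    rate = effective_throughput(SYSTEM_BYTES, t_sys)
    predicted[iface] = predicted_time(VHC_BYTES, rate)
    print(f"{iface}: {rate / 1e6:.2f} MB/s, "
          f"predicted VHC time {predicted[iface] * 1e3:.3f} ms, "
          f"reported {T_VHC_REPORTED_S[iface] * 1e3:.3f} ms")
```

Under that assumption the predicted VHC times round to 0.12 s for JTAG and 0.23 ms for SelectMAP32, matching the reported row, so the partial bit-stream costs roughly 1.8% of a full reconfiguration on either interface.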