12JAN07 Talk for I/UCRC Annual Meeting

FPGA Partial Reconfiguration
Presented by: Abelardo Jara-Berrocal
HCS Research Laboratory
College of Engineering
University of Florida
April 10th, 2009
Outline
- Introduction
- Partial Reconfiguration (PR) Overview
- Proposed Design Methodologies
- Framework analysis
- F4: Virtual Architecture for Partial Reconfiguration and Design Automation for PR Design
Introduction – Fully reconfigurable systems
[Figure: fully reconfigurable system. Labels: battery; FPGA; required design (Module A, Module B, Module C) that doesn't fit in the device; full configurations Config 1, Config 2 and Config 3; system controller; bitstreams storage; configuration lines (enabled/disabled); general purpose I/O; external I/O; shared memory; design station issuing Config 1 and Config 2 requests.]
1. Device too small for complex designs
2. Big full bitstreams (long reconfiguration time)
3. Complete system operation is halted prior to reconfiguration
Introduction – Modular Reconfiguration

Types of Modular Dynamic Reconfiguration:
- Static Partial Reconfiguration: reconfiguring a portion of the device (changing its functionality) while the device is inactive, without affecting other areas of the device.
- Dynamic Partial Reconfiguration (DPR, also abbreviated PDR): reconfiguring a portion of the device while the remaining design is still active and operating, without affecting the remaining portion of the device.
Virtex-4 and Virtex-5 devices support DPR.
[Figure: FPGA with reconfigurable region 1 and reconfigurable region 2]
Partial Reconfiguration
Partial Reconfiguration is useful for systems with multiple functions that can time-share the same FPGA resources.
TERMINOLOGY
- Reconfigurable Region (PRR)
- Reconfigurable Module (PRM)
- Static Logic
- Bus Macro
- Partial Bitstream
- Merged Bitstream
Introduction – A sample PR architecture
[Figure: sample PR architecture. Labels: battery; FPGA split into a static area and a reconfigurable area; controller (Microblaze); ICAP; flash controller; bitstreams storage; JTAG (base system configuration); external I/O; Modules A, B and C (enabled/disabled); Module A request.]
1. System controller does not need to be placed in an external device
2. Access to fast Internal Configuration Access Port (ICAP – 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt complete system when reconfiguring a module
5. Time multiplexing of FPGA resources, load and unload HW modules on demand
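The ICAP figures quoted above (32 bits, 100 MHz) imply a peak rate of 400 MB/s. The sketch below turns that into a lower bound on reconfiguration time. It assumes one word is written per clock cycle, 1 kB = 1000 bytes, and no controller or memory-fetch overhead; the function names are hypothetical, and the 100 kB size is illustrative only.

```python
def icap_peak_bytes_per_second(width_bits: int = 32, clock_hz: float = 100e6) -> float:
    """Peak ICAP throughput, assuming one word of width_bits per clock cycle."""
    return width_bits / 8 * clock_hz


def reconfiguration_time_ms(bitstream_kb: float,
                            width_bits: int = 32,
                            clock_hz: float = 100e6) -> float:
    """Lower bound on the time to write a bitstream through the ICAP, in ms.

    Assumes 1 kB = 1000 bytes and ignores any software or bus overhead.
    """
    total_bytes = bitstream_kb * 1000
    seconds = total_bytes / icap_peak_bytes_per_second(width_bits, clock_hz)
    return seconds * 1000


if __name__ == "__main__":
    print("peak MB/s:", icap_peak_bytes_per_second() / 1e6)
    # 100 kB is a made-up partial bitstream size; 1614 kB is the
    # Beamforming non-PR baseline reported in the framework analysis.
    for size_kb in (100, 1614):
        print(size_kb, "kB ->", round(reconfiguration_time_ms(size_kb), 3), "ms")
```

Measured times on real hardware are normally longer than this bound, since the bitstream still has to be fetched from storage and pushed through the controller.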
Medium for Partial Reconfiguration



- External: JTAG, UART (RS-232)
- Internal: ICAP
ICAP (Internal Configuration Access Port)
- Self-reconfiguration controlled by a soft processor
  - Internal read and write access to configuration logic
  - Faster than the external interfaces
- HWICAP (provided by Xilinx)
  - Wraps the ICAP with additional logic to read and write frames to BRAM
  - Slave to the PLB (Processor Local Bus)
  - 100 MHz, 32 bits
Additional considerations

General benefits from PDR:
- Saves space on the FPGA
- Takes less time, since only part of the design is changed
- Reduces power dissipation by storing unused functionality in external memory
- Smaller FPGAs can be used to run an application
- Architecture adaptation
Architecture adaptability
Main advantage: the system can modify its internal modules based on two schemes:
- Data-driven: characteristics of the input data change at runtime
  - Artificial intelligence, evolutionary architectures, adaptive signal processing
- Situation-driven: the system loads and unloads modules to adapt to environment conditions
  - Adaptive fault tolerance, intelligent management of system resources
Bus Macros
- Bus macros: means of communication between PRMs and the static design
- All connections between PRMs and the static design must pass through a bus macro, with the exception of clock signals
Types of bus macros:
- Tri-state buffer (TBUF) based bus macros
- Slice-based (or LUT-based) bus macros
Advantage of slice-based bus macros:
- No signal lines should cross the border of a partially reconfigurable region
- TBUF-based: will ignore the boundaries
- Slice-based: signals do not cross boundaries
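The bus macro rule (every connection between a PRM and the static design goes through a bus macro, clocks excepted) can be expressed as a simple check. This is a minimal sketch over a made-up net list: the `Net` record and its fields are hypothetical and do not correspond to any Xilinx tool format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Net:
    """Hypothetical record for one net that crosses a PRR boundary."""
    name: str
    is_clock: bool = False
    via_bus_macro: bool = False


def boundary_violations(nets: List[Net]) -> List[str]:
    """Names of boundary-crossing nets that break the rule:
    every non-clock net must pass through a bus macro."""
    return [n.name for n in nets if not n.is_clock and not n.via_bus_macro]


nets = [
    Net("clk", is_clock=True),           # clocks are exempt
    Net("data_in", via_bus_macro=True),  # routed through a bus macro
    Net("data_out"),                     # crosses the boundary directly
]
print(boundary_violations(nets))
```

Here only `data_out` is reported, since it is neither a clock nor routed through a bus macro.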
LUT-based Slice Macros
Introduction – Current PR Design Flow




Many manual steps:
- Partition the system into modules
- Define static modules and reconfigurable modules
- Decide the number of PR regions (PRRs)
- Decide PRR sizes, shapes and locations
- Map modules to PRRs
- Define PRR interfaces, instantiate slice macros for PRR interfaces
Steps:
- Design partitioning
- Number of PRRs
- PRR sizes, shapes and locations
- Mapping PRMs to PRRs
- Type and placement of PRR interfaces
[Figure: design partitioning followed by design floorplanning and budgeting. Labels: FPGA; static region with static modules, ICAP, controller (Microblaze) and flash controller; reconfigurable modules (PRMs) Module A, Module B and Module C; PRR 1 (modules A and B); PRR 2 (module C); callout "# of PRRs? 1, 2".]
Introduction – Early Access PR Design Flow


- Introduced by Xilinx at FPL'06
- Major improvements:
  - Automatic implementation scripts
  - Rectangular regions (not full-column reconfiguration)
  - Static nets can cross reconfigurable regions
  - Slice macros replace bus macros
- Partitioning and floorplanning steps are manually executed
  - Design guidelines for these steps are not provided
[Figure: reconfigurable design specifications -> design partitioning, design floorplanning and budgeting (manual) -> placement and PRR constraints -> Xilinx PR Implementation Flow (automatic) -> PRM bitstreams and full initial bitstream]
Potential for development of automatic CAD tools
Introduction – Current PR design tools limitations


- PR design is a very specialized task
- Only a physical level of support is provided
- Partitioning and floorplanning steps are manually executed
- Architectural knowledge of the target device is a must
- Not very flexible, many design constraints
- No performance-sensitive design guidelines are provided
- No automatic, heuristics-based design flow is available either
- Lack of abstraction from low-level details
PR Overview – Taxonomy of PR systems design flows
PR Designs: special purpose and multipurpose

Special purpose
- Highly specialized systems design
- All PRMs that will exist on the system are known at design time
- Each PRR is independently optimized (size, shape, location, interface) based on the PRMs that will be mapped to it
- Output is:
  1) Floorplan defining a static region and a set of optimized PRRs
  2) The set of PRMs that can be placed in each PRR (PRMs to PRRs mapping)

Multipurpose
- Not optimized for a specific application
- PRMs required by the application are not known when designing the base system
- Goal is to design a flexible and reusable base design that can be used for several different PR systems
- Base system designer defines a set of PRRs with fixed shapes, sizes, locations and interfaces
- Generated floorplan is used as input template for the PRMs implementation
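For the special purpose case, where each PRR is sized from the PRMs mapped to it, one simple starting point is the per-resource maximum over those PRMs. Because the PRMs time-share the region, only one is resident at a time, so the PRR needs the maximum rather than the sum. The sketch below assumes that model; the resource names and counts are hypothetical, and a real floorplan would also have to account for routing and region shape.

```python
from typing import Dict, List

Resources = Dict[str, int]  # e.g. {"slices": 1200, "dsp48": 8, "ramb16": 4}


def prr_requirement(prms: List[Resources]) -> Resources:
    """Per-resource maximum over the PRMs that time-share one PRR."""
    kinds = {kind for prm in prms for kind in prm}
    return {kind: max(prm.get(kind, 0) for prm in prms) for kind in sorted(kinds)}


# Hypothetical resource counts for two PRMs mapped to the same PRR.
prm_a = {"slices": 1200, "dsp48": 8, "ramb16": 4}
prm_b = {"slices": 900, "dsp48": 0, "ramb16": 10}
print(prr_requirement([prm_a, prm_b]))
```

With these inputs the PRR needs 1200 slices, 8 DSP48s and 10 RAMB16s, which is less than the 2100 slices a design holding both modules at once would need.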
PRR Geometries

- PR system design flows require:
  - Proper metrics for PRR performance analysis
  - Design guidelines for efficient PRR floorplanning
- Study of the effects of varying PRR shape over:
  - Maximum clock frequency
  - Partial bitstream size
- Five separate test cores:
  - Beamforming (DSP/slice)
  - CFAR (slice/memory)
  - AES (register)
- Performed on V4SX55 thus far
Aspect ratio = PRR Height / PRR Width
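A shape sweep like the one described here can be set up by enumerating rectangles that cover a required area and computing the aspect ratio of each. This is a minimal sketch, assuming height and width are counted in some uniform grid unit of the device; the unit, the bounds and the function names are hypothetical and not taken from the study.

```python
from typing import List, Tuple


def aspect_ratio(height: int, width: int) -> float:
    """Aspect ratio as defined above: PRR height divided by PRR width."""
    return height / width


def candidate_shapes(min_area: int, max_height: int,
                     max_width: int) -> List[Tuple[int, int, float]]:
    """For each height, the narrowest rectangle with height * width >= min_area.

    Returns (height, width, aspect_ratio) tuples that fit inside the bounds.
    """
    shapes = []
    for height in range(1, max_height + 1):
        width = -(-min_area // height)  # ceiling division
        if width <= max_width:
            shapes.append((height, width, aspect_ratio(height, width)))
    return shapes


for height, width, ratio in candidate_shapes(min_area=12, max_height=4, max_width=6):
    print(height, width, round(ratio, 2))
```

Each candidate would then be implemented and measured for clock frequency and partial bitstream size, which is the part this sketch does not cover.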
Framework analysis – Beamforming (~125 MHz, 40%)



- 5022 slices
- 16 DSP48s
- 17 RAMB16s
Baseline, non-PR performance = 1614 kB, 127.845 MHz
[Plots: bitstream size (kB) vs. aspect ratio; clock frequency (MHz) vs. aspect ratio]
Framework analysis – CFAR (~100 MHz, 16%)


- 2610 slices
- 2 DSP48s
- 34 RAMB16s
Baseline, non-PR performance = 1001 kB, 103.616 MHz
[Plots: bitstream size (kB) vs. aspect ratio; clock frequency (MHz) vs. aspect ratio]
Framework analysis – AES (~80 MHz, 13.75%)


- 3634 slices
- 3943 registers
- 4 RAMB16s
Baseline, non-PR performance = 1393 kB, 80.483 MHz
[Plots: bitstream size (kB) vs. aspect ratio; clock frequency (MHz) vs. aspect ratio]
F4: Virtual Architecture and Design Automation for Partial Reconfiguration
UF ECE Faculty: Dr. Ann Gordon-Ross, Dr. Alan D. George
CHREC Students: Abelardo Jara, Shaon Yousuf, Rohit Kumar, Terence Frederick
Approach

PR for Application Designers
- Task 1: VA for PR Adaptive Embedded Systems
  - SCORES inter-module communication architecture
  - VAPRES multipurpose base embedded platform
  - Initial research on fast algorithms for online PRM placement and scheduling
- Task 2: PR Design Flow Automation
  - Framework to model and design PR systems
  - Identification of points in the Xilinx PR design flow amenable to automation
  - Software tools (C/C++ programs/scripts) for automatable steps
- Task 3: Bitstream Relocation
  - Port Bit Reloc to Microblaze
  - Context save and restore for PRMs
Background – VA for Adaptive PR Embedded Systems
- Multi-purpose base system platform to build runtime-adaptive HW processing embedded systems
  - Architectural support for on-demand HW module loading/unloading
- HW modules can offer better performance than SW modules
  - Exploit increased parallelism
- Main bottleneck:
  - Inter-module communication flows through centralized controller
  - Can be alleviated by adding custom inter-module communication architecture
- VA benefits:
  - Adaptive base system platform
  - Response to environmental changes
  - HW/SW partitioned applications
  - Time-shared virtual resources enable larger available area for system operations
  - Improved system resource utilization
- Case study application: PR for Mobile Agents
  - e.g. geographical area divided into 4 regions (one processing node per region)
[Figure: adaptive embedded system at each processing node. Labels: Target A (Type A target); Target B (Type B target); controller and peripherals; SCORES; external memory; Type A modules; Type B module; free slot.]
VAPRES

VAPRES (Virtual Architecture for Partially Reconfigurable Adaptive Embedded Systems)
[Figure: VAPRES architecture. Labels: Microblaze; PLB bus; UART; USB; flash controller; ICAP; shared memory; Fast Simplex Link (FSL); BUFR; PRR1 to PRR4, each with an interface; PRM A; network-on-chip (SCORES) with network switch; network to other VAPRES nodes.]

VAPRES Motivations/Benefits
- Embedded base architecture for multi-purpose PR systems
- Facilitates dynamic HW module placement and scheduling
- Provides dynamic module frequency scaling
- Computing power can be distributed among VAPRES-based nodes

VAPRES Architectural Components
- Partially Reconfigurable Regions (PRRs)
  - Independently clocked using BUFRs
  - PR modules (PRMs) can span multiple PRRs
- Controlling agent (Microblaze):
  - Dynamic module placement and scheduling
  - Module control and context save/restore
  - Partial reconfiguration through ICAP
  - Communication with other VAPRES nodes
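To make dynamic module placement concrete, here is a minimal first-fit sketch in which a PRM that spans several PRRs is placed in the first run of free regions. First-fit is an illustrative choice, not the placement algorithm used in VAPRES, and the sketch assumes that a spanning PRM needs adjacent PRRs; the function name and slot model are hypothetical.

```python
from typing import List, Optional


def first_fit(slots: List[Optional[str]], prm: str, span: int) -> Optional[int]:
    """Place prm in the first run of `span` contiguous free PRRs.

    slots[i] is None when PRR i is free, otherwise the name of the PRM in it.
    Returns the index of the first PRR used, or None if nothing fits.
    """
    if span < 1:
        raise ValueError("span must be at least 1")
    run = 0  # length of the current run of free PRRs
    for i, occupant in enumerate(slots):
        run = run + 1 if occupant is None else 0
        if run == span:
            start = i - span + 1
            for j in range(start, i + 1):
                slots[j] = prm
            return start
    return None


# Four PRRs, with a hypothetical PRM "A" already loaded in the second one.
prrs: List[Optional[str]] = [None, "A", None, None]
print(first_fit(prrs, "B", span=2))
print(prrs)
```

In this example the two-PRR module cannot use the first region because its neighbor is occupied, so it lands in the last two regions.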
Background – Current Application PR Design Flow
PR is a very powerful feature of Xilinx FPGAs, but requires specialized skills.
Manual steps:
- Partition the application into modules
- Define static modules and partially reconfigurable modules (PRMs)
- Determine the number of PR regions (PRRs)
- Determine PRR sizes, shapes, and locations (resource allocation)
- Map PRMs to PRRs
- Define PRR interfaces and instantiate slice macros for PRR interfaces
Automatable points and optimization problems (design-time):
- Design partitioning
- Number of PRRs
- PRR sizes, shapes, and locations
- Mapping PRMs to PRRs
- Type and placement of PRR interfaces
- Reconfiguration schedule
[Figure: design partitioning followed by design floorplanning and budgeting. Labels: FPGA; static region with static modules, ICAP, central controlling agent and mem controller; reconfigurable modules (PRMs) Module A, Module B and Module C; PRR 1 (modules A and B); PRR 2 (module C); callout "# of PRRs? 1, 2".]
Potential for automation through C/C++ programs or scripts
Questions