SHARC programming model

Lecture 7
FPGA technology
Implementation Platform Comparison
FPGA main components and features





Logic block architecture
Interconnect architecture
Programming technology
Power dissipation
Reconfiguration model
FPGA model

Interconnect Network Topologies





Island style
Row-based
Sea-of-gates
Hierarchical
One-dimensional structures
Island-Style Architecture
Row-Based Architecture
Sea-of-Gates Architecture
Hierarchical Architecture
One-Dimensional Architecture
Logic Cluster Parameters




The size of (number of inputs to) a LUT.
The number of CLBs in a cluster.
The number of inputs to the cluster for use as inputs by the LUTs.
The number of clock inputs to a cluster (for use by the registers).
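To make the first parameter above concrete, a K-input LUT can be modelled as a table of 2^K configuration bits addressed by the inputs. The following is only an illustrative sketch; the class name `LUT` and its methods are hypothetical and are not part of any vendor tool or of the original slides.

```python
from typing import Sequence


class LUT:
    """A K-input look-up table: 2**K configuration bits select the output."""

    def __init__(self, k: int, config_bits: Sequence[int]):
        if len(config_bits) != 2 ** k:
            raise ValueError(f"a {k}-input LUT needs {2 ** k} configuration bits")
        self.k = k
        self.config_bits = list(config_bits)

    def evaluate(self, inputs: Sequence[int]) -> int:
        """Treat the inputs as a binary address (inputs[0] is the LSB)."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.config_bits[address]


# Program a 4-input LUT as a 4-input AND: only address 0b1111 outputs 1.
and4 = LUT(4, [0] * 15 + [1])
# Program a 2-input LUT as XOR: addresses 0b01 and 0b10 output 1.
xor2 = LUT(2, [0, 1, 1, 0])
```

Each additional input doubles the number of configuration bits, which is one reason LUT size is a central area/delay trade-off.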
Studies on the CLB structure


Area optimal: 3-4 input LUTs
For multiple-output LUTs:
  Optimal area: 4-input LUTs
  Optimal delay: 5-6 input LUTs
Clusters of 4-input LUTs are about 10% more area-efficient than single 4-input LUTs
Programming Technology




Volatile (SRAM)
Irreversible (Antifuse)
EPROM, EEPROM and Flash
The programming technology affects the FPGA area
SRAM Programming Technology




Configuration storage on SRAM cells
Volatile (FPGA has to be reprogrammed on power-up)
Large area (SRAM cells)
Allows dynamic and partial reconfiguration
Antifuse Programming Technology




Programming element is an antifuse: normally high impedance (open circuit), permanently switched to low impedance (connection) by a high programming voltage
Small area
Non-volatile (no need for reprogramming on power-up)
Irreversible (design errors cannot be corrected)
EPROM, EEPROM and Flash Programming Technology

Non-volatile
Reprogrammable: erased by exposure to ultraviolet light (EPROM) or by electrical signals (EEPROM/Flash)
Slower programming than SRAM
FPGA Power Consumption

FPGA power dissipation components:




Interconnection network
Clock network
Input/Output
Logic block
FPGA Power Consumption Breakdown (XC4003)
Dynamic vs Static Power Consumption
Dynamic power consumption is still dominant, even though the static component grows as feature size decreases.
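For reference, the usual first-order CMOS model behind this statement can be written as follows. These are standard textbook relations, not taken from the slides, and the symbols are introduced here only for illustration.

```latex
P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}}, \qquad
P_{\text{dynamic}} \approx \alpha \, C \, V_{DD}^{2} \, f, \qquad
P_{\text{static}} \approx I_{\text{leak}} \, V_{DD}
```

Here \alpha is the switching activity, C the switched capacitance, V_{DD} the supply voltage, f the clock frequency and I_{leak} the leakage current. Leakage current tends to rise as transistors shrink, which is why the static term grows with each process generation.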
Reconfiguration Models






Static Reconfiguration
Dynamic Reconfiguration
Single Context
Multi-Context
Partial Reconfiguration
Pipeline Reconfiguration
Static Reconfiguration




Compile-time reconfiguration
Most common approach
One configuration per application
System must be halted and then restarted with the new program
Dynamic Reconfiguration



Run-time Reconfiguration
Based on virtual hardware
Trade-off between time and space
Single Context




One configuration at a time
Programming using a serial bitstream
High overhead for small configuration changes
Not suitable for run-time reconfiguration
Multi-Context



Multiple memory bits for each programming bit location
Equivalent to a multiplexed set of single-context devices
One context can be reprogrammed while another is active
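The multi-context idea described above can be sketched as several configuration planes behind a multiplexer. This is a toy model under stated assumptions: the class `MultiContextFPGA` and its methods are hypothetical names, and a real device switches planes in hardware rather than through method calls.

```python
from typing import List


class MultiContextFPGA:
    """Toy model: several configuration planes, one of which drives the logic."""

    def __init__(self, num_contexts: int, num_bits: int):
        # One list of configuration bits per context (plane).
        self.planes: List[List[int]] = [[0] * num_bits for _ in range(num_contexts)]
        self.active = 0

    def load(self, context: int, bits: List[int]) -> None:
        """Write a background context; the active one is left undisturbed."""
        if context == self.active:
            raise RuntimeError("cannot reprogram the active context in this model")
        if len(bits) != len(self.planes[context]):
            raise ValueError("bitstream length does not match the plane size")
        self.planes[context] = list(bits)

    def switch(self, context: int) -> None:
        """Select which plane drives the fabric (the multiplexer select)."""
        if not 0 <= context < len(self.planes):
            raise IndexError("no such context")
        self.active = context

    def active_config(self) -> List[int]:
        return self.planes[self.active]
```

A typical use is to call `load(1, bits)` while context 0 keeps running, then `switch(1)` once the new configuration is complete, so the slow load is hidden behind useful work.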
Partial Reconfiguration



Addresses are used to specify the target location of the configuration data
Undisturbed portions of the array can continue execution during reconfiguration
Reduces the amount of data that must be transferred to the FPGA
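The difference in transferred data between a full and a partial reconfiguration can be illustrated with a toy frame-addressed configuration memory. The frame size `FRAME_BITS`, the class `ConfigMemory` and its methods are all hypothetical, and the sketch ignores the address and command overhead that a real bitstream carries.

```python
from typing import Dict, List

FRAME_BITS = 32  # hypothetical frame size, chosen only for illustration


class ConfigMemory:
    """Toy configuration memory organised as addressable frames."""

    def __init__(self, num_frames: int):
        self.frames: List[List[int]] = [[0] * FRAME_BITS for _ in range(num_frames)]

    def full_reconfigure(self, bitstream: List[List[int]]) -> int:
        """Single-context style: every frame is rewritten. Returns bits sent."""
        if len(bitstream) != len(self.frames):
            raise ValueError("a full bitstream must cover every frame")
        self.frames = [list(frame) for frame in bitstream]
        return len(bitstream) * FRAME_BITS

    def partial_reconfigure(self, updates: Dict[int, List[int]]) -> int:
        """Write only the addressed frames; all other frames stay untouched."""
        for address, frame in updates.items():
            if len(frame) != FRAME_BITS:
                raise ValueError("frame has the wrong length")
            self.frames[address] = list(frame)
        return len(updates) * FRAME_BITS
```

With 100 frames, a full reconfiguration in this model always transfers 3200 bits, whereas updating a single frame transfers 32 bits and leaves the other 99 frames as they were.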
Pipeline Reconfiguration


Partial reconfiguration in increments of pipeline stages
Used in datapath-style computations
Run-Time Reconfiguration








Algorithmic Reconfiguration
Architectural Reconfiguration
Functional Reconfiguration
Fast Configuration
Configuration Prefetching
Configuration Compression
Relocation and Defragmentation in Partially Reconfigurable Systems
Configuration Caching
Algorithmic Reconfiguration


Reconfigure the system with an algorithm that provides the same functionality but has different requirements
Adapt dynamically to environmental or operational changes
Architectural Reconfiguration

Modify the hardware topology by reallocating resources to computations
Functional Reconfiguration


Execute different functions on the same resources
Time-share resources across computational tasks
Fast Configuration

Reconfigure the device as fast as possible in order to minimize reconfiguration overhead
Configuration Prefetching


Loading a configuration onto a device in advance, in order to overlap reconfiguration with useful computation
The challenge is to determine future configurations
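The benefit of overlapping reconfiguration with computation can be estimated with a simple timing model. This sketch assumes that the next configuration is always predicted correctly and that loading it does not disturb the running task; the function names and the `Task` tuple are hypothetical.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # (reconfiguration time, compute time), arbitrary units


def time_without_prefetch(tasks: List[Task]) -> float:
    """Each task is configured, then executed, strictly in sequence."""
    return sum(load + compute for load, compute in tasks)


def time_with_prefetch(tasks: List[Task]) -> float:
    """The next configuration loads while the current task computes."""
    if not tasks:
        return 0.0
    total = tasks[0][0]  # the first load cannot be hidden
    for i, (_, compute) in enumerate(tasks):
        next_load = tasks[i + 1][0] if i + 1 < len(tasks) else 0.0
        total += max(compute, next_load)  # overlap compute with the next load
    return total


tasks = [(2.0, 5.0), (3.0, 1.0), (2.0, 4.0)]
print(time_without_prefetch(tasks))  # 17.0
print(time_with_prefetch(tasks))     # 13.0
```

In this example the second load is fully hidden behind the first computation, while the third load is only partly hidden because the second task is short. A misprediction would add a further penalty that the model does not capture.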
Configuration Compression

Minimize the data that must be loaded to the device in a multi-context environment
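As a stand-in for whatever scheme a particular device uses, run-length encoding shows why configuration data often compresses well: bitstreams tend to contain long runs of identical bits for unused resources. The functions below are a minimal sketch with hypothetical names; they do not reproduce any vendor's compression format.

```python
from typing import List, Tuple


def rle_encode(bits: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive identical bits into (bit, run length) pairs."""
    runs: List[Tuple[int, int]] = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            runs.append((bit, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (bit, run length) pairs back into the original bit list."""
    bits: List[int] = []
    for bit, count in runs:
        bits.extend([bit] * count)
    return bits
```

A 24-bit stream of twelve zeros, three ones and nine zeros encodes to three pairs, and decoding restores it exactly. The decompression step has to run on or near the device, so its cost counts against the saved transfer time.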
Configuration Caching


Reducing the amount of configuration data that must be transferred to the device
The challenge is to determine which configuration to retain and which to flush
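The retain-or-flush decision can be illustrated with a small cache of configurations. The sketch below uses least-recently-used replacement, which is only one possible policy and is an assumption made here; the class `ConfigCache` and the `fetch` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ConfigCache:
    """Keeps recently used configurations on-chip; evicts the least recently used."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch = fetch  # loads a bitstream from off-chip storage
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, name: str) -> bytes:
        if name in self.store:
            self.store.move_to_end(name)  # mark as most recently used
            return self.store[name]
        self.misses += 1  # an off-chip transfer is needed
        bitstream = self.fetch(name)
        self.store[name] = bitstream
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # flush the least recently used
        return bitstream
```

With a capacity of two and the request sequence fir, fft, fir, aes, fft, this model incurs four off-chip transfers instead of five, and ends up holding aes and fft.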
Commercial Fine-Grain Reconfigurable Architectures

Xilinx
  Spartan-3/Spartan-3L
  Virtex-4
  Virtex-5
Altera
  Cyclone
  Cyclone II
  Stratix II/Stratix II GX
Actel
  Fusion
  ProASIC3/ProASICPLUS
  Axcelerator
  Varicore
Atmel
  AT40K/AT40KLV
  AT6000
Quicklogic
  PolarPro
  Eclipse II
Lattice
  LatticeECP2
  LatticeXP
Xilinx Spartan-3

CLB
  Four slices
  Two logic function generators/slice
  Two storage elements/slice
Interconnect
  Long lines (one out of every six CLBs)
  Hex lines (one out of every three CLBs)
  Double lines (every other CLB)
  Direct lines (each CLB with its neighbours)
Advanced features
  BlockRAM
  Dedicated Multipliers
  Digital Clock Managers
Configuration
  SRAM
Xilinx Spartan-3
Xilinx Virtex-4


Three variations (LX, FX, SX)
CLB
  Four slices
  Two logic function generators/slice
  Two storage elements/slice
Advanced features
  BlockRAM
  XtremeDSP slices
  Digital Clock Managers
  Additional features in the FX family
    8–24 RocketIO Multi-Gigabit Serial Transceivers
    One or two PowerPC cores
    Two or four Tri-MAC cores
Configuration
  SRAM
Xilinx Virtex-5


65 nm
ExpressFabric
  6-input LUTs
Interconnect
  Diagonal symmetric interconnect
Advanced features
  DCM and PLLs
  BlockRAM
  DSP48E slices
Configuration
  SRAM
  Advanced Encryption Standard technology for bitstream protection
Altera Cyclone/Cyclone II


Essentially the same architecture in 130 nm (Cyclone) and 90 nm (Cyclone II)
LE (10 per LAB):
  4-input LUT
  Register
  Carry chain
MultiTrack Interconnect
  Row and column interconnects spanning fixed distances
Advanced Features:
  Embedded Memory
  PLLs
  External RAM interfacing
  Embedded multipliers (Cyclone II only)
Cyclone Logic Element
Altera Stratix II/ Stratix II GX



Adaptive Logic Modules
MultiTrack Interconnect
Advanced Features:
  TriMatrix Memory
Adaptive Logic Module
Review Questions



Can you partially reconfigure a single-context FPGA?
How often do you need to reconfigure an FPGA device with SRAM configuration memory?
Two designs, one comprising 200 CLBs and one comprising 400 CLBs, are to be downloaded to the same device, which does not support dynamic reconfiguration. How large is the bitstream of the second design compared with that of the first?