Integrated Tool Suite for Post Synthesis FPGA Power Consumption Analysis
Matthew French, Li Wang
University of Southern California, Information Sciences Institute
Tyler Anderson, Michael Wirthlin
Brigham Young University
French 207
Slide 1
MAPLD 2005
FPGA Power Trends & Needs
• Number of logic blocks & maximum operating frequency track Moore's Law
• Voltage reduction is slower
• Resulting power increase is exponential
• Power needs to be a first-class design constraint
• Limited power tools available
  – Spreadsheets
    • Manual entry
    • Prone to guess-timation
  – XPower (post-routing)
    • At end of design cycle
    • Profiled after timing simulation
    • Time intensive
    • Unwieldy file sizes
    • Limited reporting
      • Only total power consumed
      • No ability to capture power transients
    • Limited design path if specifications are not met
    • Routing tools optimize only throughput
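[Figure: Internal Power Consumption by Xilinx family (Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Virtex 4 LX); logarithmic axis from 1 to 1,000,000 plotting Number of F-F's, Power (mW), Clocking Frequency (MHz), and Voltage (V).]
Power calculated assuming 80% device utilization, 80% peak clock frequency, 12.5% toggling rate. Internal logic only, no I/O.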
Power Tools: Goals
• Push power analysis, visualization, and optimization to the front of the tool chain:
  – Analyze power consumption at logic simulation with two levels of accuracy
    • Pre-place-and-route, using heuristic estimates based on fanout
    • Back-annotated with precise post-place-and-route RC data
  – Visualize by providing intuitive views that help the designer rapidly find and correct inefficient circuits, operating modes, data patterns, etc.
  – Optimize systems by automatically identifying problem paths and suggesting improvements
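The two accuracy levels above can be pictured as a lookup with a fallback: use back-annotated routing data when it exists, otherwise estimate from fanout. The sketch below is a minimal illustration, not the tool's implementation; the function name `wire_capacitance_pf` and the linear fanout coefficients are hypothetical placeholders, and back-annotated data is assumed to arrive as a simple net-to-picofarad mapping.

```python
from typing import Mapping, Optional

# Hypothetical fanout-heuristic coefficients (pF); placeholders, not values from the tools.
BASE_CAP_PF = 0.5
CAP_PER_FANOUT_PF = 0.3


def wire_capacitance_pf(
    net: str,
    fanout: int,
    back_annotated: Optional[Mapping[str, float]] = None,
) -> float:
    """Estimate a net's wire capacitance in pF at one of two accuracy levels."""
    # Level 2: precise post-place-and-route data, when it has been back-annotated.
    if back_annotated is not None and net in back_annotated:
        return back_annotated[net]
    # Level 1: pre-place-and-route heuristic based only on fanout.
    return BASE_CAP_PF + CAP_PER_FANOUT_PF * fanout


if __name__ == "__main__":
    print(wire_capacitance_pf("data<3>", fanout=4))  # heuristic estimate
    print(wire_capacitance_pf("data<3>", fanout=4, back_annotated={"data<3>": 1.46}))
```

The same simulation-driven power calculation can then run unchanged before and after place and route; only the capacitance source changes.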
FPGA Tool Flow
• Benefits
  – Closer to the logical level and design entry
  – Power profiling during functional simulation
  – Early estimation before place and route
  – Automatic power details for specific resource utilization
  – Facilitates exploration of high-level design alternatives
[Figure: FPGA tool flow marking the Current Power Tool Entry Point and the Proposed Power Tool Entry Point]
Tool Backbone: JHDL & EDIF Parser
• Leverage the JHDL simulation environment together with EDIF Parser circuit manipulation
• JHDL
  – Java-based structural design tool for FPGAs
  – Circuits described by creating Java classes
  – Design libraries provided for several FPGA families
  – http://www.jhdl.org
• JHDL design aids
  – Logic simulator & waveform viewer
  – Circuit schematic & hierarchy browser
  – Module generators
    • Circuit designer does not need to know Java!
• EDIF Parser
  – Supports multiple EDIF files
  – Virtex2 libraries and memory initialization
  – Support for "black boxes"
  – No JHDL wrapper required
  – http://splish.ee.byu.edu/reliability/edif/
  – Verified: Synplicity, Synplicity Pro, Coregen, System Generator, Chipscope
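EDIF netlists are S-expressions, so the primitive usage that the power model needs ("components known from EDIF") can be tallied with a small reader. The sketch below is a hypothetical Python illustration, not the BYU EDIF Parser's Java API; it assumes each primitive appears as an `(instance ... (viewRef ... (cellRef NAME ...)))` form, and the netlist snippet is invented for the example.

```python
import re
from collections import Counter
from typing import List, Optional, Union

SExpr = Union[str, List["SExpr"]]


def parse_sexpr(text: str) -> List[SExpr]:
    """Parse EDIF text into nested lists (one list per parenthesized form)."""
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()]+', text)
    stack: List[List[SExpr]] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    return stack[0]


def find_cellref(node: SExpr) -> Optional[str]:
    """Return the cell name of the first (cellRef NAME ...) form inside node."""
    if isinstance(node, list) and node:
        head = node[0]
        if isinstance(head, str) and head.lower() == "cellref" and len(node) > 1:
            return node[1] if isinstance(node[1], str) else None
        for child in node:
            found = find_cellref(child)
            if found is not None:
                return found
    return None


def count_primitives(tree: SExpr) -> Counter:
    """Count how many times each cell is instantiated anywhere in the netlist."""
    counts: Counter = Counter()
    if isinstance(tree, list):
        if tree and isinstance(tree[0], str) and tree[0].lower() == "instance":
            cell = find_cellref(tree)
            if cell is not None:
                counts[cell] += 1
        for child in tree:
            counts.update(count_primitives(child))
    return counts


# Hypothetical, hand-written netlist fragment for illustration only.
SAMPLE_EDIF = """
(edif top (edifVersion 2 0 0)
 (library work (edifLevel 0)
  (cell top (cellType GENERIC)
   (view netlist (viewType NETLIST)
    (contents
     (instance reg_0 (viewRef netlist (cellRef FDC (libraryRef VIRTEX2))))
     (instance reg_1 (viewRef netlist (cellRef FDC (libraryRef VIRTEX2))))
     (instance lut_0 (viewRef netlist (cellRef LUT4 (libraryRef VIRTEX2))))
)))))
"""

if __name__ == "__main__":
    print(count_primitives(parse_sexpr(SAMPLE_EDIF)))
```

A production parser would also resolve `rename` forms, library scoping, and net connectivity; the point here is only that primitive counts fall out of a simple tree walk.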
[Figure: tool backbone diagram with blocks 3rd Party Tools, EDIF Netlist, EDIF Parser, EDIF Data Structure, Manipulation Tools, and JHDL Environment (JHDL Data Structure)]
Power Tool Flow: Timing-Level
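[Figure: timing-level power tool flow. Blocks: Source Code (VHDL, Verilog, JHDL), Synthesis, EDIF, Xilinx Tool Flow (Map, Place & Route, Bitgen, XPower; .ncd and .pwr files; To Target), EDIF Parser, JHDL Routed Circuit Model, and Power Tools (Power Analysis & Visualization).]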
• Event model restructured
  – Tool interoperability
  – Cross-probing enabled
• Support dynamic insertion of 3rd-party (power) tools
  – Circuit APIs in place
  – Graphical user interface (GUI) support
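One way to support dynamic insertion of third-party power tools is an observer-style event model: the simulator publishes net-value changes, and any registered tool reacts to them. The classes below (`Simulator`, `ToggleCounter`) are hypothetical stand-ins written for illustration; they are not the JHDL API. The example tool derives per-net toggle rates, which is the simulator-derived input the power model relies on.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Listener signature: (cycle, net, old_value, new_value) -> None
Listener = Callable[[int, str, int, int], None]


class Simulator:
    """Toy simulator that notifies listeners whenever a net changes value.

    Hypothetical stand-in for a restructured event model; not the JHDL API.
    """

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        # Tools can be plugged in at run time without changing the simulator.
        self.listeners.append(listener)

    def set_net(self, cycle: int, net: str, value: int) -> None:
        old = self.values.get(net, 0)  # nets are assumed to start at 0
        self.values[net] = value
        if old != value:
            for listener in self.listeners:
                listener(cycle, net, old, value)


class ToggleCounter:
    """Example 'third-party' power tool: counts toggles per net."""

    def __init__(self) -> None:
        self.toggles: Dict[str, int] = defaultdict(int)

    def __call__(self, cycle: int, net: str, old: int, new: int) -> None:
        self.toggles[net] += 1

    def toggle_rate(self, net: str, cycles: int) -> float:
        # Fraction of simulated clock cycles in which the net toggled.
        return self.toggles[net] / cycles


if __name__ == "__main__":
    sim = Simulator()
    counter = ToggleCounter()
    sim.add_listener(counter)
    for cycle, value in enumerate([1, 0, 0, 1]):
        sim.set_net(cycle, "q", value)
    print(counter.toggles["q"], counter.toggle_rate("q", cycles=4))
```

Because only value changes are reported, glitches that a cycle-based simulator never produces are invisible to the listener, consistent with the note that glitching information is lost.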
Power Visualization Tool
• Two views:
  – Instantaneous vs. cumulative power consumption over time
  – Sorted tree view of "worst offenders"
• Integrated "cross-probing" with existing JHDL tools
  – Unified environment
  – Allows experimentation
  – Smart re-use of CPU memory
• Helps rapidly identify inefficient circuits and operating modes
• Per-cell / per-bit granularity
• Simulation trigger on power specification
[Figure: power visualization screenshot illustrating cross probing]
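The data behind the two views and the power-specification trigger can be sketched from per-cycle, per-net power samples. This is a hypothetical illustration, not the tool's code: the input format (one dict of net name to mW per clock cycle) and the function name `power_views` are assumptions, and "cumulative" is taken as a running sum of per-cycle power, which is proportional to energy at a fixed clock period.

```python
from itertools import accumulate
from typing import Dict, List, Optional, Tuple


def power_views(
    per_cycle: List[Dict[str, float]], spec_mw: float
) -> Tuple[List[float], List[float], List[Tuple[str, float]], Optional[int]]:
    """Build the data behind the two views and a power-spec trigger.

    per_cycle: one dict per clock cycle mapping net name -> power (mW).
    Returns (instantaneous, cumulative, worst_offenders, trigger_cycle).
    """
    # Instantaneous view: total power in each cycle.
    instantaneous = [sum(cycle.values()) for cycle in per_cycle]
    # Cumulative view: running sum (proportional to energy at a fixed cycle time).
    cumulative = list(accumulate(instantaneous))
    # Per-net totals, sorted so the "worst offenders" come first.
    totals: Dict[str, float] = {}
    for cycle in per_cycle:
        for net, mw in cycle.items():
            totals[net] = totals.get(net, 0.0) + mw
    worst = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # Trigger: first cycle whose instantaneous power exceeds the specification.
    trigger = next((i for i, p in enumerate(instantaneous) if p > spec_mw), None)
    return instantaneous, cumulative, worst, trigger


if __name__ == "__main__":
    trace = [{"a": 1.0, "b": 2.0}, {"a": 4.0}, {"b": 0.5}]
    print(power_views(trace, spec_mw=3.5))
```

Keying the totals by hierarchical cell name instead of net name would give the per-cell granularity, and the trigger cycle is the point at which a simulator could be paused for cross-probing.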
Post Synthesis Level Power Modeling
• Power modeling
  – Quiescent power based on total circuit size
  – Dynamic power (at a fixed supply voltage; dynamic CMOS power also scales with $V^2$):
      $\text{Power} \propto (\%\text{toggle})\,(\text{Freq}_{\text{Clock}})\,(\text{Cap}_{\text{Component}} + \text{Cap}_{\text{Wire}})$
    • Toggle rates (data dependent)
    • Components used
    • Routing interconnect
  – Actual quiescent and dynamic power not known until the circuit is placed and routed
• Leverage existing JHDL tool environment
  – Toggling rates derived from simulator
    • Will lose glitching information
  – Components known from EDIF or JHDL primitives
    • Component capacitance imported from XPower
  – How to model routing interconnect?
    • Do not have exact routing information at synthesis
    • Routing tools can pick a different route each iteration
  – Interconnect length and combinations vary

XPower Component Capacitance
Component   Cap (pF)
FF          1.21
LUT         1.0
SRL         3.0
LD          1.0
INV         1.0
AND         1.0
RAM         1.0
MULT        17.2
DLL         40.0
IBUF        1.0
BUFG        6.0
BRAM        59.0

XPower Interconnect Capacitance
Interconnect     Cap (pF)
Long Line        11.8
Hex Line         0.59
Double Line      0.44
Direct Connect   0.29
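The dynamic-power term above can be evaluated per net from the XPower capacitances in the two tables. This is a minimal sketch under stated assumptions: the slide's proportionality is turned into an equation by multiplying by $V^2$, the supply is assumed to be 1.5 V (the Virtex-II core voltage), the toggle rate is expressed as a fraction rather than a percentage, and the function name `net_dynamic_power_mw` is hypothetical. Routing segments are passed in explicitly, which is only possible after place and route; before that, a fanout estimate would stand in for `c_wire`.

```python
# Capacitances (pF) from the XPower tables above.
COMPONENT_CAP_PF = {
    "FF": 1.21, "LUT": 1.0, "SRL": 3.0, "LD": 1.0, "INV": 1.0, "AND": 1.0,
    "RAM": 1.0, "MULT": 17.2, "DLL": 40.0, "IBUF": 1.0, "BUFG": 6.0, "BRAM": 59.0,
}
INTERCONNECT_CAP_PF = {"long": 11.8, "hex": 0.59, "double": 0.44, "direct": 0.29}


def net_dynamic_power_mw(toggle_rate, f_clock_mhz, loads, segments, vdd=1.5):
    """Dynamic power of one net in mW.

    toggle_rate: fraction of clock cycles in which the net toggles (0.125 = 12.5%).
    loads: component types driven by the net (keys of COMPONENT_CAP_PF).
    segments: routing resources used (keys of INTERCONNECT_CAP_PF).
    vdd: assumed core supply in volts (1.5 V is the Virtex-II core voltage).
    """
    c_component = sum(COMPONENT_CAP_PF[t] for t in loads)
    c_wire = sum(INTERCONNECT_CAP_PF[s] for s in segments)
    c_total_f = (c_component + c_wire) * 1e-12  # pF -> F
    power_w = toggle_rate * f_clock_mhz * 1e6 * c_total_f * vdd ** 2
    return power_w * 1e3  # W -> mW


if __name__ == "__main__":
    # A net toggling 12.5% of cycles at 100 MHz, driving an FF and a LUT
    # over one double line and one direct connect.
    print(net_dynamic_power_mw(0.125, 100, ["FF", "LUT"], ["double", "direct"]))
```

Summing this over all nets and adding the quiescent estimate gives the total; the table shows why a single long line (11.8 pF) can outweigh several logic loads.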
Wire Power Model Analysis
• Developed power tools to analyze relationships
• Can plot capacitance vs.
  – Fanout
  – Programmable interconnect points
  – Wire length
  – Total number of nets
  – Total number of components
• Which relationships maintain correlation from synthesis to place and route?
  – Optimizer removes components and nets
• Can also use tools to judge routing quality
• Optimization candidates
  – Identify outliers
  – Information available to do power-weighted placement and routing
    • Use placement macros in JHDL
    • Use UCF placement and/or timing constraints
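Identifying outliers can be framed as comparing each net's routed capacitance with what a fanout model predicts and flagging large relative deviations. The sketch below is a hypothetical illustration: `find_outliers`, the 50% threshold, and the linear model used in the usage example are assumptions, not the tool's actual criteria.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, int, float]  # (net name, fanout, routed capacitance in pF)


def find_outliers(
    nets: Iterable[Row],
    predict: Callable[[int], float],
    rel_threshold: float = 0.5,
) -> List[Tuple[str, int, float, float, float]]:
    """Flag nets whose routed capacitance deviates strongly from the fanout model.

    Returns (name, fanout, actual_pf, expected_pf, relative_error) tuples,
    largest deviation first.
    """
    outliers = []
    for name, fanout, actual in nets:
        expected = predict(fanout)
        rel_err = (actual - expected) / expected
        if abs(rel_err) > rel_threshold:
            outliers.append((name, fanout, actual, expected, rel_err))
    outliers.sort(key=lambda row: abs(row[4]), reverse=True)
    return outliers


if __name__ == "__main__":
    model = lambda f: 0.5 + 0.3 * f  # hypothetical fanout model (pF)
    nets = [("a", 1, 0.8), ("b", 1, 2.45), ("c", 2, 1.1)]
    for row in find_outliers(nets, model):
        print(row)
```

Nets flagged this way are natural candidates for the placement macros or UCF constraints listed above.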
Low Fanout Capacitance Variance
• Not all routes are created equal
• Up to 60% variance on "same" route length
• East-West vs. North-South bias
• Switches sometimes use Doubles instead of Direct Connects
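[Figure: switch/logic routing examples comparing routes of nominally the same length, including "Direct vs Double" (Direct Connect vs. Double Wire). Annotated routes: YQ -> F2 (omux-B3), YQ -> G4 (omux-B4), YQ -> F4 (omux-A2), YQ -> F2 (omux-A7); annotated capacitances: 2.45 pF (#2727), 2.37 pF (#4791), 1.46 pF (#2768), 0.75 pF (#131).]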
Capacitance vs Fanout
• Fanout model well correlated
• Secondary fit line corresponds to macros
• High variance at low fanout
• Achieving 4.3% average error, 16% variance
• Explored device utilization models as well
[Figure: capacitance vs. fanout plot; a secondary cluster is labeled "Placement Macros"]
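The slide does not say how "average error" and "variance" were computed. As a hedged illustration, the sketch below fits capacitance = a + b·fanout by ordinary least squares and reports mean absolute percentage error; the error metric, the function names `fit_line` and `mean_abs_pct_error`, and the sample points are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_pct_error(xs: Sequence[float], ys: Sequence[float],
                       intercept: float, slope: float) -> float:
    """Average |predicted - actual| / actual, in percent."""
    errors: List[float] = [abs((intercept + slope * x - y) / y) for x, y in zip(xs, ys)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    fanouts = [1, 2, 3, 4]           # hypothetical sample nets
    caps_pf = [0.8, 1.1, 1.4, 1.7]   # hypothetical routed capacitances
    a, b = fit_line(fanouts, caps_pf)
    print(a, b, mean_abs_pct_error(fanouts, caps_pf, a, b))
```

Nets inside placement macros would likely be fitted separately, since they form the secondary line on the plot.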
Resulting Power Tool Flow
[Figure: resulting power tool flow. Same blocks as the timing-level flow (Source Code: VHDL, Verilog, JHDL; Synthesis; EDIF; Xilinx Tool Flow: Map, Place & Route, Bitgen, XPower, .ncd, .pwr, To Target; EDIF Parser; JHDL Routed Circuit Model; Power Tools: Power Analysis & Visualization), with a Virtex II Power Model added.]
Power Optimization Approach
• Influence Xilinx place-and-route tools for power efficiency
  – Minimize clock/wire lengths of high-power nets
• Use power analysis tools to identify hot spots and generate constraints
  – Timing constraints on non-clock signals
  – Location constraints on sink flip-flops of clock signals
• Verify power optimization approaches
  – Use final circuit timing model to verify power savings
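[Figure: optimization and verification flow. Blocks: Xilinx Tool Flow (Ngdbuild & Map, Place & Route, bitgen; .ncd files), EDIF, EDIF Parser, Power Tools, .ucf constraint file carrying Timing Constraint (ns) and Placement Constraint (X,Y), XPower, and ModelSim Tool (.vhd, .vcd); paths labeled Optimization and Verification.]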
Timing Constraint Power Optimization
• Wire power is optimized by reducing wire length
  – The MAXDELAY constraint in the UCF file defines the maximum delay allowed on a net
• Power tools contain a Wire Table database
  – Sortable by: average power, toggling rate, fanout, load
  – Apply constraints
[Figure: Wire Table screenshot and results for two constraint settings]
Default constraints: Constraint Freq 50 MHz, Operating Freq 50 MHz (poor power efficiency)
Power timing constraints: Constraint Freq 100 MHz, Operating Freq 50 MHz (better power efficiency)
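Applying constraints from the Wire Table amounts to ranking nets and writing UCF `NET "name" MAXDELAY = value ns;` lines with a period tighter than the operating clock requires. The sketch below assumes that "Top 25%" means the top quarter of the Wire Table ranked by average power and uses the constraint-frequency period directly as the MAXDELAY value; both are simplifying assumptions, and `maxdelay_constraints` is a hypothetical helper, not part of the tools.

```python
from typing import List, Tuple


def maxdelay_constraints(
    wire_table: List[Tuple[str, float]],
    constraint_freq_mhz: float,
    top_fraction: float = 0.25,
) -> List[str]:
    """Emit UCF MAXDELAY lines for the highest-power nets.

    wire_table: (net_name, average_power_mw) rows, as in the Wire Table.
    constraint_freq_mhz: deliberately higher than the operating frequency
    (e.g. 100 MHz for a 50 MHz design) so the router keeps these nets short.
    """
    period_ns = 1000.0 / constraint_freq_mhz
    ranked = sorted(wire_table, key=lambda row: row[1], reverse=True)
    count = max(1, round(len(ranked) * top_fraction))
    return [f'NET "{name}" MAXDELAY = {period_ns:g} ns;' for name, _ in ranked[:count]]


if __name__ == "__main__":
    table = [("a", 1.0), ("b", 5.0), ("c", 3.0), ("d", 0.5)]  # hypothetical nets
    for line in maxdelay_constraints(table, constraint_freq_mhz=100):
        print(line)
```

Because the constrained design is still clocked at 50 MHz, the tighter delay budget trades unused timing slack for shorter, lower-capacitance routes.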
Timing Constraint Power Optimization: Preliminary Results
Configuration               Nets constrained   Clock (mW)   Signal (mW)   Total power (mW)
                            (% of total)                                  (Clock + Signal)
Baseline, no constraints    N/A                442.5        19.9          462.4
All nets constrained        12.5%              439.3        29.4          468.7 (-1.4%)
Fanout < 10 constrained     11.1%              394.2        23.7          417.9 (9.6%)
Fanout < 4 constrained      10.6%              400.6        23.1          423.7 (8.4%)
Top 25% constrained         4.1%               384.5        23.4          407.9 (11.8%)
• Total power reduction ranges from –1.4% (a slight increase) to 11.8% relative to the baseline
• More constraints are not necessarily better
• The amount by which nets are timing-constrained can also be varied
• Circuits still meet the original timing specification requirements
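The percentages in the table can be reproduced from the clock and signal columns: each is the reduction in total power relative to the baseline, so a negative value is an increase. A short check using only the table's own numbers (the names `BASELINE`, `RUNS`, and `reduction_pct` are illustrative):

```python
# Clock and signal power (mW) taken from the timing-constraint results table.
BASELINE = (442.5, 19.9)
RUNS = {
    "All nets constrained": (439.3, 29.4),
    "Fanout < 10 constrained": (394.2, 23.7),
    "Fanout < 4 constrained": (400.6, 23.1),
    "Top 25% constrained": (384.5, 23.4),
}


def reduction_pct(baseline_total: float, total: float) -> float:
    """Percent reduction in total power; negative means power went up."""
    return 100.0 * (baseline_total - total) / baseline_total


if __name__ == "__main__":
    base_total = sum(BASELINE)
    for name, parts in RUNS.items():
        total = sum(parts)
        print(f"{name}: {total:.1f} mW ({reduction_pct(base_total, total):.1f}%)")
```

Running it prints the same totals and percentages as the table, and it makes visible that the "All nets" case loses because its signal power rises from 19.9 mW to 29.4 mW.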
Location Constraint Power Optimization
• Power optimization guidelines
  – Minimize clock zone utilization
  – Group flip-flops as tightly as possible
  – Group flip-flops closer to clock trunks
[Figure: flip-flop placements labeled Less Power Efficient and More Power Efficient]
Reduce clock paths by constraining flip-flop locations, thus reducing clock capacitance and power.
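The guidelines can be turned into UCF location constraints by computing a compact block of slices for a clock's sink flip-flops and tying them to it with `INST "name" AREA_GROUP = "group";` and `AREA_GROUP "group" RANGE = SLICE_XaYb:SLICE_XcYd;`. This is a hypothetical sketch: centring the block on a given `trunk_x` column is a simplification of "closer to clock trunks", the `fill` parameter stands in for "how tightly" flip-flops are packed, and the helper names are not from the tools. It relies on each Virtex-II slice holding two flip-flops.

```python
import math
from typing import List, Sequence, Tuple

FFS_PER_SLICE = 2  # each Virtex-II slice contains two flip-flops


def tight_region(num_ffs: int, trunk_x: int, base_y: int = 0,
                 fill: float = 1.0) -> Tuple[int, int, int, int]:
    """Smallest near-square block of slices that holds num_ffs flip-flops.

    trunk_x: slice column assumed to sit next to the clock trunk; the block is centred on it.
    fill: fraction of flip-flop sites to use (1.0 = as tight as possible).
    """
    slices = math.ceil(num_ffs / (FFS_PER_SLICE * fill))
    width = math.ceil(math.sqrt(slices))
    height = math.ceil(slices / width)
    x0 = max(0, trunk_x - width // 2)
    return x0, base_y, x0 + width - 1, base_y + height - 1


def area_group_ucf(group: str, instances: Sequence[str],
                   region: Tuple[int, int, int, int]) -> List[str]:
    """UCF lines that tie the sink flip-flops to one AREA_GROUP range."""
    x0, y0, x1, y1 = region
    lines = [f'INST "{inst}" AREA_GROUP = "{group}";' for inst in instances]
    lines.append(f'AREA_GROUP "{group}" RANGE = SLICE_X{x0}Y{y0}:SLICE_X{x1}Y{y1};')
    return lines


if __name__ == "__main__":
    ffs = [f"cnt_reg<{i}>" for i in range(32)]  # hypothetical flip-flop instances
    region = tight_region(len(ffs), trunk_x=10)
    print("\n".join(area_group_ucf("AG_clk", ffs[:2], region)))
```

A near-square block keeps the clock spine-to-flip-flop distances short in both directions; lowering `fill` trades a larger clock footprint for placement freedom.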
Location Constraint Power Optimization: Interface
• The clock table can be sorted by power, number of flip-flops, etc.
• Users can select locations of flip-flops
  – Users can select how tightly flip-flops are placed
  – Users can define the area where flip-flops are placed; the tool checks the validity of constraint areas
  – Users can select which flip-flop groups are included in the constraints
[Figure: Clock Table screenshot]
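The validity check on constraint areas could, at minimum, confirm that the region lies on the device and has enough flip-flop sites for the selected group. The sketch below is a hypothetical illustration of such a check; the function name, the `blocked` set for unavailable slices, and the device dimensions in the usage example are assumptions, not the tool's actual rules.

```python
from typing import FrozenSet, List, Tuple

FFS_PER_SLICE = 2  # each Virtex-II slice contains two flip-flops

Region = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in slice coordinates, inclusive


def check_region(region: Region, num_ffs: int, device_cols: int, device_rows: int,
                 blocked: FrozenSet[Tuple[int, int]] = frozenset()) -> List[str]:
    """Return a list of problems with a proposed constraint area (empty = valid)."""
    x0, y0, x1, y1 = region
    if not (0 <= x0 <= x1 < device_cols and 0 <= y0 <= y1 < device_rows):
        return ["region lies outside the device"]
    # Count slices in the region that are still available for placement.
    usable = sum(
        1
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
        if (x, y) not in blocked
    )
    capacity = usable * FFS_PER_SLICE
    if capacity < num_ffs:
        return [f"region holds {capacity} flip-flops but {num_ffs} are required"]
    return []


if __name__ == "__main__":
    # Hypothetical 48 x 40 slice array.
    print(check_region((0, 0, 3, 3), 32, device_cols=48, device_rows=40))
    print(check_region((0, 0, 3, 3), 33, device_cols=48, device_rows=40))
```

Returning a list of messages rather than a boolean lets an interface show the user why a proposed area was rejected.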
Location Constraint Power Optimization: Preliminary Results
Configuration                       Clock (mW)        Signal (mW)      Logic (mW)       Total power (mW)
                                                                                        (Clock + Signal + Logic)
Baseline, no constraints            442.5             19.9             285.8            748.2
All FFs placed                      293.7 (33.6%)     27.6 (-38.8%)    255.4 (10.6%)    576.7 (22.9%)
IOs in IOBs, all other FFs placed   356.251 (19.5%)   21.909 (-10%)    285.787 (0%)     663.947 (11.3%)
• Individual clock net improvement ranged from -4% to 57%
• Up to 22.9% total power improvement achieved
• Circuits still meet timing requirements if IO buffer flip-flops are left in IOBs
• Power could be reduced further if IO buffer flip-flops were not constrained to be within IOBs
[Figure: flip-flop placement, Unconstrained vs. Constrained]
Conclusions
• Post-synthesis-level power modeling is feasible
  – Some accuracy trade-offs inevitable
  – Quicker power results enable:
    • Capability to determine power specifications early in the design flow
    • Feedback on design-level circuit power ramifications
    • Tighter feedback loop to the designer for more design iterations
• Optimization
  – Preliminary results encouraging
  – Tools do not alter original circuit functionality & use COTS inputs
  – Developing optimization algorithms & routines
• Tools are open source: http://rhino.east.isi.edu
• This research was made possible by a grant from the NASA Earth-Sun System Technology Office