Transcript Document
UltraFast™
Guidelines For Predictable Success
© Copyright 2013 Xilinx .
Xilinx Delivers an ASIC-Class Advantage Through Silicon, Tools, and Methodology
Agenda
– UltraFast Methodology Introduction
– Write HDL code that best fits the hardware
– Timing constraints creation and validation
– Clock planning, pin planning, floorplanning
UltraFast™ Methodology Benefits
Fast Compile Times and Predictable Results
– Require good methodology
Project Schedules Drive Time To Market
– Manage risk effectively
– Minimize iterations, especially late-stage changes
– Explore options early with estimation and progressive analysis
Proven Recommendations from Successful Customers
– Best Practices with Checklists and Links to Documentation
– Verification Tools and Reports
– Linting and DRC
UltraFast User Guide: UG949
PCB planning: Avoid board re-spins
– Use XPE to validate power against budget
– Use Vivado I/O planning & DRC on a top level including all interfaces
Design Creation: Coding style for best QoR
– Use HDL language templates in Vivado
– New linting capability: Methodology DRC ruledeck
Implementation: Rapid convergence & signoff timing
– Rapid convergence technique: Closure with the simplest constraints
– Signoff convergence: Closure with pristine constraints
– Use XDC language templates & Timing DRC ruledeck
Overall Strategy for Accelerated Design Cycle Earlier Iterations
Start closure at the front-end of the design flow
– Engage UltraFast early
– Faster iterations than in the back-end
– Greater impact on Quality of Results (QoR)
[Figure: impact on QoR falls from 100x and 10x at the front-end to 1.2x and 1.1x at the back-end, across the stages Device/IP selection; PCB / Planning; IP Integration, RTL Design, Verification; Implementation Closure; Config., Bring-up, Debug]
Reduce Design Cycle Time & Cost
UltraFast Design Methodology Guide – UG949
– Project Planning & Kickoff
– Board Planning & Schematic Creation
– Design Creation & IP Integration
– Implementation & Design Closure
– Configuration Programming & Hardware Debug
Design Methodology Checklist in DocNav
Sample section
Checklist
Spreadsheet-based checklist to be used by designer and FAE to review key portions of the board schematic for the FPGA/SoC
– Power Distribution System, Configuration, Transceivers, XADC, I/O Interfaces
UltraFast™ Design Methodology
Guidelines For Predictable Success
Vivado Enables Design Methodology
Key Technology: Shared, Scalable Data Model
Progressive estimation accuracy across the entire flow
Reduce iterations late in the cycle
[Figure: Estimation, IP Integration, RTL Design, Synthesis, and Place & Route on top of the Shared, Scalable Data Model]
Shares design information between implementation steps
– Ensures fast convergence and timing closure
Enables use of the same commands & reports to analyze design at every step
Enables cross-probing
Highly efficient memory utilization
– Scalable for next decade of designs
[Figure: cross-probing views (RTL, Schematics, Timing Report, Placement) and the corresponding actions (Code Changes, Tool Settings, Reports, Placement Edits)]
Technique for Rapid Timing Closure
Baselining
Prioritize and close 1 step at a time
– Converge first at Synthesis (faster, higher impact), then in back-end
– Start with the simplest (baseline) constraint:
• Internal Fmax (flop-to-flop constraints) which is the problem 9/10 times
• Define proper clock dependencies
– Make sure the design & constraints are reasonable
– Analyze, get to root cause, then decide how to fix it
• Clock path vs. data path vs. interconnect delay vs. logic delay…
– Add I/O constraints (with Vivado XDC templates) and redo…
Do not confuse with “Signoff” Constraints
– You still want complete constraints
View QuickTime Video for UltraFast Design Methodology for Timing Closure
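As a rough illustration of what a baseline constraint file might contain, the sketch below defines only primary clocks and their relationship and leaves every I/O path unconstrained. The port names, periods, and the decision to treat the two domains as asynchronous are assumptions for the example, not taken from the original design.

```tcl
# baseline.xdc (hypothetical file name): flop-to-flop timing only.
# Port names and periods are placeholders.
create_clock -name sys_clk -period 10.000 [get_ports sys_clk]
create_clock -name rx_clk  -period 8.000  [get_ports rx_clk]

# Declare the domains asynchronous only after confirming that proper
# CDC synchronizers exist between them.
set_clock_groups -asynchronous \
    -group [get_clocks -include_generated_clocks sys_clk] \
    -group [get_clocks -include_generated_clocks rx_clk]

# Deliberately no set_input_delay / set_output_delay yet:
# I/O constraints are added once internal paths converge.
```

Running report_timing_summary after synthesis with only these constraints shows whether internal Fmax is within reach before any back-end effort is spent.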
Progressive Approach to Design Closure
Baseline Constraints: Optimize Internal Paths
– Synthesis • Analysis, Place • Analysis, Route • Analysis
– Result: Fmax with Baseline XDC
Add I/O Constraints: Optimize Entire Chip
– Synthesis • Analysis, Place • Analysis, Route • Analysis
– Result: Fmax with Complete XDC
If needed, Add Timing Exceptions and/or Floorplan: Fine-tune
– Synthesis • Analysis, Place • Analysis, Route • Analysis
– Result: Fmax with Final XDC
Critical Path could be a Moving Target
Example from a Real Design
Post-synthesis estimates (the real problem)
– Worst path: 13 levels of logic, 4.3 ns
Post-place
– Worst path: 7 levels of logic, 4.2 ns
– Paths with 7-13 levels got placed locally
Post-route (the side-effect of the real problem)
– Worst path: 4 levels of logic, 4.1 ns
– Paths with 5-13 levels got preferred routing
Analyze & fix timing issues at early stages for faster timing convergence
Write HDL Code That Best Fits the Hardware
Impact of HDL Coding Style
Block inference
– Follow recommended templates for RAM, DSP, LUTRAM, SRL inference
Pipeline your design to reduce levels of logic
Think about Reset
– Taxes routing
– Not always needed: Xilinx devices boot in a known state
– Dedicated shifters (SRLs) and RAM memory arrays don’t use resets
Synchronous resets are preferred
– Allows packing of registers into dedicated RAM and DSP blocks
– Tools have the option to implement reset in datapath (LUT)
Give more freedom to Synthesis
– Revisit attributes needed by other synthesis engines or older releases
– Avoid KEEP, dont_touch, syn_preserve, max_fanout attributes…
Review Design Creation Chapter in UG949
Review Design Creation tab in the Design Methodology Checklist
Using HDL Language Templates
Accessing templates in IDE
– Window → Language Templates
Synthesis Templates
– BRAM, LUTRAM, ROM, SRL
– Counter, MULT
– FSM, Decoder, Encoder
– …
Coding to Match the Hardware: DSP48 Blocks and BRAM Blocks
Leverage DSP block cascading capabilities
– Pipelined adder chain delivers optimal performance
– Adder tree becomes a performance bottleneck
[Figure: adder tree versus pipelined adder chain across cascaded DSP48 blocks, from in to out]
Avoid Block RAM collision avoidance logic (*)
– Synthesis assumes collision
[Figure: RAMB with rdaddr/wraddr comparison logic on dout, versus RAMB inferred with collision check disabled (rdaddr, wraddr, din, dout)]
(*): logic added by default by Synplify (attribute syn_no_rw_check removes the logic)
The Impact of Resets
Increase performance with the right reset choice
– Think Local, not Global with resets
– No reset at all (if possible) is best
– Synchronous rather than asynchronous reset
– Active HIGH rather than active low reset
– Default register value can be controlled via the INIT property or at signal declaration in RTL
From: UG949 Chapter 4 Design Creation – Control Signals and Control Sets
Reset Routing
Resets compete for the same resources as the rest of the active signals of the design
– Including the critical datapaths
Designs that minimize or eliminate resets have
– About 18% fewer timing paths on average
– About 15% less runtime on average
– 10% fewer registers and 7% fewer LUTs
– 20% lower timing scores
– Lower memory usage
Be selective with where you code resets
Initialize all registers in the VHDL / Verilog code
More on Resets
Many designs need some resets
– Very few designs require resets on all registers
• Most ASICs require a described reset on every register for testability
• But the FPGA has a built-in Global Set/Reset (GSR)
Guideline: Be selective with where you code resets
– Only place resets that have impact on functionality
• I/O, state machines, critical control logic, etc.
– Omit resets that do not
Initialize all registers in the VHDL / Verilog code
– This should be done whether using a reset or not
VHDL: signal my_register : std_logic_vector (7 downto 0) := "01010101";
Verilog: reg [7:0] my_register = 8'h55;
Gauging Other Design Metrics
report_high_fanout_nets
– To reduce fanout on a net use…
• max_fanout (Vivado synthesis and XST)
• syn_maxfan (Synplify)
– Use phys_opt_design for timing-driven replication
From: Design Methodology Checklist – Design Creation tab
Gauging Other Design Metrics
report_control_sets
– Indicator of possible packing fragmentation and fitting issues
– Run with the -verbose option to generate a full list
– Use Synplify’s syn_reduce_controlset_size attribute for control
• Default is 2; set it to 8 to eliminate most of the lowest-fanout control sets
From: Design Methodology Checklist – Design Creation tab
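The two metric reports from the Gauging Other Design Metrics slides can be collected in one pass from the Tcl console. This is a minimal sketch assuming a synthesized or implemented design is already open; the report file names and the limit of 20 nets are arbitrary choices for illustration.

```tcl
# Run in the Vivado Tcl console with a design open.
# List the 20 highest-fanout nets (the limit is an arbitrary example value).
report_high_fanout_nets -max_nets 20 -file high_fanout.rpt

# Full list of control sets (clock / enable / set-reset combinations).
report_control_sets -verbose -file control_sets.rpt

# Timing-driven replication is a separate step, run after place_design:
#   phys_opt_design
```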
Methodology DRCs
Two new rule decks in 2013.3
– methodology_checks
– timing_checks
Usage:
– report_drc -ruledeck methodology_checks
– report_drc -ruledeck timing_checks
– Specific “methodology_checks” available only for the elaborated design
– In the IDE: Tools → Report → Report DRCs
Review and Resolve Critical Warnings
Vivado does not stop for Critical Warnings
– Enables fixing many issues at once
– Bitstream generation will error with unresolved critical warnings
From: UG949 Chapter 5 Implementation – Moving past Synthesis
Review and Resolve Critical Warnings
Critical warnings are serious design issues
– Invalid constraints or XDC syntax errors
– Path segmentation
– Netlist or target objects not found or invalid
Address these warnings before moving forward
– Results of design analysis may be inaccurate
– Critical Warnings may prevent design success
Timing Constraints Creation and Validation
Timing Constraints Need to Be "Clean"
When constraints (clock, IO) are missing
– The corresponding paths are timed optimistically
– No violation will be reported but design may not work on HW
When paths are incorrectly constrained
– Runtime and optimization efforts will be spent on the wrong paths
– Reported timing violations may not result in any issues on HW
When constraints create wrong HOLD violations
– May result in long runtime and SETUP violations
– P&R fixes HOLD violations as #1 priority, because:
• Designs with HOLD violations won’t work on HW
• Designs with SETUP violations will work, but slower
Review the Creating Constraints section of the Design Creation Chapter in UG949 & checklist
Include IP Constraints
Many cores have their own constraints / exceptions
– PCIE, MIG, RAM-based asynchronous FIFOs…
Non-native IP: Be careful!
– Very easy to drop the IP constraints especially if provided as .ngc files
Native IP: Constraints included
– Sources window in IDE: Compile Order → Constraints
– Use report_compile_order -constraints to identify constraint file sources
Method to Create Good Constraints
Create clocks and define clock interactions
– Four-step guideline
Set input and output delays
– Beware of creating incorrect HOLD violations
Set timing exceptions
– Less is more!
– Beware of creating incorrect HOLD violations
Use report commands to validate each step
Clock Ground Rules
For SDC-based timers, clocks only exist if you create them
– Use create_clock for primary clocks
Clocks propagate automatically through clocking modules
– MMCM and PLL output clocks are automatically generated
– Gigabit transceiver clocks are not generated automatically; create them manually
[Figure: annotated clock network marking where to apply create_clock and where not to]
Use create_generated_clock for internal clocks (if needed)
All inter-clock paths are evaluated by default
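The ground rules above might translate into constraints along the following lines. This is a sketch only: the port, cell, and pin names (sys_clk, gt0/RXOUTCLK, clk_div_reg) and all periods are hypothetical placeholders, and the divide-by-2 register is just one example of an internal clock that needs create_generated_clock.

```tcl
# Primary clock: defined on the top-level input port.
create_clock -name sys_clk -period 10.000 -waveform {0.000 5.000} [get_ports sys_clk]

# MMCM/PLL outputs driven by sys_clk need no constraint here;
# the tool derives them automatically.

# Transceiver recovered clock: created manually on the GT output pin
# (hypothetical instance and pin name).
create_clock -name gt_rxclk -period 3.200 [get_pins gt0/RXOUTCLK]

# Internal clock from a register-based divide-by-2
# (hypothetical cell clk_div_reg clocked by sys_clk).
create_generated_clock -name clk_div2 \
    -source [get_pins clk_div_reg/C] \
    -divide_by 2 \
    [get_pins clk_div_reg/Q]
```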
Four Steps for Creating Clocks
Run report_timing_summary before starting constraint capture
– View report_clocks section to see all signals driving clock pins
Step 1
– Use create_clock for all primary clocks on top level ports
– Run the design (synthesis) or open netlist design
Step 2
– Run report_clocks
– Study the report to verify period, phase and propagation
– Apply corrections to your constraints (if needed)
Output of report_clocks (excerpt)
Attributes P: Propagated G: Generated
Clock          Period  Waveform       Attributes  Sources
sys_clk        10.000  {0.000 5.000}  P           {sys_clk}
pll0/clkfbout  10.000  {0.000 5.000}  P,G         {pll0/plle2_adv_inst/CLKFBOUT}
pll0/clkout0   2.500   {0.000 1.250}  P,G         {pll0/plle2_adv_inst/CLKOUT0}
pll0/clkout1   10.000  {0.000 5.000}  P,G         {pll0/plle2_adv_inst/CLKOUT1}
Four Steps for Creating Clocks (continued)
Step 3
– Evaluate the clock interaction using report_clock_interaction
BEWARE: All inter-clock paths are constrained by default!
– Mark inter-clock paths (Clock Domain Crossing) as asynchronous
• Make sure you designed proper CDC synchronizers
• Use set_clock_groups (preferred over set_false_path)
BEWARE: This overrides any set_max_delay constraints!
– Do you have unconstrained objects?
• Find out with check_timing
Step 4
– Run report_clock_networks
– You want the design to have clean clock lines without logic
• Tip: Use clock gating option in synthesis to remove LUTs on the clock line
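The validation side of the four steps can be captured as a short Tcl console sequence. This is a sketch assuming a synthesized design is open and the create_clock constraints from Step 1 are already loaded; the report file names are arbitrary.

```tcl
# Step 2: confirm period, waveform and propagation of every clock.
report_clocks -file clocks.rpt

# Step 3: review which clock pairs are timed together, then look for
# unconstrained objects.
report_clock_interaction -file clock_interaction.rpt
check_timing -file check_timing.rpt

# Step 4: inspect the clock trees for logic on the clock lines.
report_clock_networks -file clock_networks.rpt
```

Re-running the same sequence after each constraint change keeps the comparison between iterations straightforward.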
Defining & Validating Clock Interactions
Constraining Cross Clock Domains
Use appropriate synchronizing techniques
– 2 or more register stages, for single bit
– FIFO for buses
Maximize MTBF
– ASYNC_REG to place synchronizing flops in the same slice for best Mean Time Between Failures (MTBF)
set_property ASYNC_REG TRUE [get_cells [list sync0_reg sync1_reg]]
Constraints for Asynchronous CDC
Ignoring timing paths between individual clocks
set_clock_groups -asynchronous -group {clk1} -group {clk2}
This is equivalent to:
set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
set_false_path -from [get_clocks clk2] -to [get_clocks clk1]
BEWARE: This overrides any set_max_delay constraints!
Ignoring timing paths between groups of clocks
# SDC create_clock for the two primary clocks
create_clock -name clk_oxo -period 10 [get_ports clk_oxo]
create_clock -name clk_core -period 10 [get_ports clk_core]
# Set Asynchronous Clock Groups
set_clock_groups -asynchronous \
    -group [get_clocks -include_generated_clocks clk_oxo] \
    -group [get_clocks -include_generated_clocks clk_core]
BEWARE: This overrides any set_max_delay constraints!
Setting Input / Output Delays
Start with no IO constraints
– Focus on finding and fixing core timing issues
– Vivado does not time from IOs without IO constraints
• No need to set_false_path -from or -to [get_ports] to ignore IO timing
Specify realistic IO delays once core timing is reasonable
– Use set_input_delay and set_output_delay
– Wrong delay value (e.g. <0 ns) can cause invalid analysis
The delay value specified is the external delay
– Default in UCF: internal delay
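A sketch of what the I/O delay step might look like for a simple system-synchronous interface follows. The clock name, port names (din, dout), and all delay values are placeholders; real values come from the external device's datasheet plus board trace delays, since the delay specified is the one outside the FPGA.

```tcl
# Input side: external clock-to-out of the upstream device plus trace delay.
set_input_delay  -clock sys_clk -max 3.000 [get_ports {din[*]}]
set_input_delay  -clock sys_clk -min 1.000 [get_ports {din[*]}]

# Output side: what the downstream device needs outside the FPGA.
set_output_delay -clock sys_clk -max 2.000 [get_ports {dout[*]}]
set_output_delay -clock sys_clk -min 0.500 [get_ports {dout[*]}]

# Validate from the Tcl console (not part of the XDC file):
#   report_timing -from [get_ports {din[*]}] -setup
#   report_timing -to   [get_ports {dout[*]}] -hold
```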
Multicycle Paths
set_multicycle_path N implies a HOLD check at N-1
– E.g.: a multicycle_path of 10 implies a HOLD requirement of 9 cycles!
Whenever setup check is changed, hold check is also changed
Guidelines for proper multicycle path constraints
– Should always be pairs of set_multicycle_path constraints
• One for -setup and one for -hold
– Bring the HOLD requirement back to 0 (reduce by N-1) to avoid incorrect HOLD violations
[Figure: regA to regB, both clock-enabled and clocked by CLK, with Multicycle Path = 3T; waveforms of regA/CLK, regB/CLK and regB/D marking the SETUP and HOLD check edges]
set_multicycle_path 3 -setup -from [get_cells regA] -to [get_cells regB]
set_multicycle_path 2 -hold -from [get_cells regA] -to [get_cells regB]
Hold checked at edge 3-1-2 = 0
Using Vivado Language Templates
XDC Templates
Accessing templates in IDE
– Window → Language Templates
SDR & DDR Templates
– Inputs and outputs
– Source / System synchronous
– Center / Edge aligned
Reading the Reports
Reading the report_timing_summary
– Intra-clock report – Inter-clock report
Use report_timing for interactivity and advanced options
– You would typically use it in the Tcl console
• report_timing -through [get_nets {/cpu_top/crit_net_name}]
• report_timing -setup -max_paths 10 # For 10 worst setup paths
• report_timing -hold -to [get_cells {/top/item}] # Hold on "item"
– Use filters from your XDC files to check each expression
• set_multicycle_path -from [get_pins regA/C] -to [get_pins regB/D]
• report_timing -from [get_pins regA/C] -to [get_pins regB/D]
Timing Command Summary
Obtain full timing summary of the design
– report_timing_summary: summary subsections for all timing checks
Create and validate clocks
– check_timing: for missing clocks and IO constraints
– report_clocks: check frequency and phase
– report_clock_networks: possible clock root
Validate clock groups
– report_clock_interaction
Validate I/O delays
– report_timing -from [input_port] -setup/-hold
– report_timing -to [output_port] -setup/-hold
Add exceptions if necessary
– Validate using report_timing
Managing Constraint Files
Using a single XDC file
– The XDC applies to both synthesis & implementation
Using multiple XDC files
– Main XDC with top level constraints
• Primary clocks and I/O delays
• Exceptions on clocks and RTL objects
– Implementation specific XDC
• Physical constraints
• Exceptions based on physical netlist
The order of constraint files matters!
– To report the order of XDC files: report_compile_order -constraints
[Figure: main.xdc and impl.xdc mapped onto the Elaboration, Synthesis, and Implementation steps]
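One way the two-file split could be set up in a project-mode flow is sketched below. The file names main.xdc and impl.xdc come from the slide; the fileset name constrs_1 is the usual Vivado default and is assumed here, as is the choice to exclude impl.xdc from synthesis.

```tcl
# Add both constraint files to the constraints fileset (assumed: constrs_1).
add_files -fileset constrs_1 {main.xdc impl.xdc}

# Keep physical constraints and netlist-based exceptions out of synthesis.
set_property USED_IN_SYNTHESIS false [get_files impl.xdc]

# Confirm the resulting read order.
report_compile_order -constraints
```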
Managing IP Constraint Files
Some IP come with their own XDC constraints
– Example: The clocking wizard
The clocking wizard XDC will be read before the user XDC by default (user constraints can override IP defined clocks by default)
The order of constraint files matters!
– To report the order of XDC files: report_compile_order -constraints
– Always verify the clocks using report_clocks (step 2 of 4-step process)
– To change the default processing order: set_property PROCESSING_ORDER EARLY | LATE [get_files IP_XDC_File]
– If necessary, IP_XDC_files can be enabled/disabled
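As an illustration of the ordering and enable controls mentioned above, the sketch below moves a user file to the end of the read order and disables one IP-delivered file. Both file names (user_overrides.xdc, clk_wiz_0.xdc) are hypothetical, and whether disabling an IP constraint file is appropriate depends on the core.

```tcl
# Read a user file after all others so it can override IP-defined clocks.
set_property PROCESSING_ORDER LATE [get_files user_overrides.xdc]

# Disable an IP-delivered constraint file (hypothetical file name).
set_property IS_ENABLED false [get_files clk_wiz_0.xdc]

# Re-check the order, then verify the clocks with report_clocks.
report_compile_order -constraints
```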
Clock Planning, Pin Planning, and Floorplanning
Clock and Pin Planning
Pin and Clock Planning often happens early in the Project
– Decisions here can have far-reaching effects throughout the design
• Excessive clock skew
• Poor I/O timing
• Timing hazardous clock domain crossing
• Less flexible logic placement
• Fewer clocking resource choices
• Excessive routing delays
• Reduced device utilization
Pin and Clock Planning should be considered together
– Choices made for clock pins affect clocking timing and resources choices
– Choices made for data pins affect clock pin placement decisions
Review the Board & Device Planning Chapter in UG949
Review the Board and FPGA Planning tab in the Design Methodology Checklist
Clock and Pin Planning
Considerations for clock pin planning
– Generate all I/O interface and clocking IP prior to pin assignment
– Consolidate clocking where possible and consolidate MMCMs
• Fewer clocks and MMCMs means fewer clock resources and crossings
– Consider all CDC when assigning clocking resources and pins
Considerations for data pin planning
– Group related data pins in same bank, or adjacent banks if single bank not possible
• Place associated I/O clock in same bank when possible
– Consider associated control signal placement along with data paths
– Consider data flow when planning the pinout
• Choose a pinout that has clean passage through device
– Place high fanout signals towards the middle of the chip
• Really high fanout signals considered for CCIO pins with BUFG resources
– Evaluate all pin attributes (I/O Standard, Slew, etc.) during placement
Clock and Pin Planning
Use Vivado Pin Planning capabilities
– Import pin & clocking assignments from generated IP
– Visualization of I/O resource placement on package and in device
– DRC, SSN and other checks available to validate choices
– Configuration pin assignments & possible device migration considerations
Re-evaluate in Vivado any subsequent pin changes
– Understand how PCB pin swaps affect timing & resources
Vivado I/O & Clock Planning Tutorial UG935
– Available in DocNav and Vivado
Additional Considerations for SSI Devices
Clocking
– High fanout clocks should be placed in center SLRs
– Place regional clocks on center clock region within an SLR
– Place clock pin / MMCMs in same SLR as timing critical I/O interfaces (avoid driving timing critical I/O interfaces from a different SLR)
– Clock pin choices should be balanced across upper & lower SLR:
• 2 upper SLR clock domains have 8 BUFG x 2
• 4 lower SLR clock domains have 4 BUFG x 4
Pinout
– High fanout signals feeding all SLRs placed in center SLRs
– I/O interfaces should not span across SLRs
– Pay attention to data flow across SLRs
• Avoid the need for multiple SLR crossings due to pinout decisions
For more details
– Consult UG872: Large FPGA Methodology Guide
Improving Placement Through Floorplanning
First improve HDL, synthesis & constraints
– Not floorplanning, when avoidable, is easier and more repeatable
Start design without any floorplanning
– See what P&R algorithms can do without restrictions
Using Vivado IDE
– Highlight placement per module as guideline
– Visualize placement of critical timing paths
• Understand data flow in & out of Pblocks
• Understand effects of Pblock inside & out
• Resources around placement can affect data flow
– Create Pblocks minding resource utilization
Careful not to over floorplan – Less is best
– Only floorplan the critical areas of the design
– Do not create Pblocks with very high utilization
• Can create routing congestion or new timing problems
– Avoid overlapping Pblocks
• Creates more complex placement and clock scenarios
Baseline run with highlighted regions
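For a critical module that still needs help after HDL and constraint fixes, a single Pblock might be created roughly as follows. The Pblock name, the cell u_dsp_core, and the SLICE range are hypothetical; a real range has to be valid for the target device and sized so that utilization inside the Pblock stays moderate.

```tcl
# Create one Pblock for a single critical module (names are placeholders).
create_pblock pblock_dsp_core
add_cells_to_pblock [get_pblocks pblock_dsp_core] [get_cells u_dsp_core]

# Give it a rectangular region; the range must exist on the target device.
resize_pblock [get_pblocks pblock_dsp_core] -add {SLICE_X0Y100:SLICE_X47Y149}

# Check how full the Pblock is before re-running placement.
report_utilization -pblocks [get_pblocks pblock_dsp_core]
```

Comparing the result against the unconstrained baseline run shows whether the Pblock helped or only moved the problem.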
UltraFast™ Methodology Review
For optimal results, adapt your HDL style to the FPGA
– Be mindful of BRAM, LUTRAM, DSP, SRL inference needs
– Avoid asynchronous reset and wired resets in general
– Minimize control signals
– For large FPGAs, design with the dataflow and floorplanning in mind
Baseline your constraints to converge rapidly
Provide clean timing constraints
– Bad constraints result in bad runtime, performance and HW failures
– Learn the essentials of timing creation & validation methods
Follow pin/clock planning guidelines
– Must follow dataflow
– Place large fanout clocks and pins in the center of SSIT devices