Transcript Document

UltraFast™

Guidelines For Predictable Success

© Copyright 2013 Xilinx .

Xilinx Delivers an ASIC-Class Advantage Through Silicon, Tools, and Methodology


Agenda

UltraFast Methodology Introduction

Write HDL code that best fits the hardware

Timing constraints creation and validation

Clock planning, Pin planning, Floorplanning


UltraFast™ Methodology Benefits

Fast Compile Times and Predictable Results

– Require good methodology

Project Schedules Drive Time To Market

– Manage risk effectively
– Minimize iterations, especially late-stage changes
– Explore options early with estimation and progressive analysis

Proven Recommendations from Successful Customers

– Best Practices with Checklists and Links to Documentation
– Verification Tools and Reports
– Linting and DRC

UltraFast User Guide: UG949

PCB planning: Avoid board re-spins

– Use XPE (Xilinx Power Estimator) to validate power against budget
– Use Vivado I/O planning & DRC on a top level including all interfaces

Design Creation: Coding style for best QoR

– Use HDL language templates in Vivado
– New linting capability: Methodology DRC ruledeck

Implementation: Rapid convergence & signoff timing

– Rapid convergence technique: Closure with the simplest constraints
– Signoff convergence: Closure with pristine constraints
– Use XDC language templates & Timing DRC ruledeck

Overall Strategy for Accelerated Design Cycle Earlier Iterations

Start closure at the front-end of the design flow

– Engage UltraFast early
– Faster iterations than in the back-end
– Greater impact on Quality of Results (QoR)

[Figure: impact on QoR by design stage. Stages shown: Device/IP selection; PCB / Planning; IP Integration, RTL Design, Verification; Implementation Closure; Config., Bring-up, Debug. Impact values shown: 100x, 10x, 1.2x, 1.1x]

Reduce Design Cycle Time & Cost

UltraFast Design Methodology Guide – UG949

– Project Planning & Kickoff
– Board Planning & Schematic Creation
– Design Creation & IP Integration
– Implementation & Design Closure
– Configuration Programming & Hardware Debug

Design Methodology Checklist in DocNav

Sample section


Checklist

Spreadsheet-based checklist to be used by designer and FAE to review key portions of board schematic for FPGA/SoC

– Power Distribution System, Configuration, Transceivers, XADC, I/O Interfaces

UltraFast™ Design Methodology

Guidelines For Predictable Success

Vivado Enables Design Methodology

Key Technology: Shared, Scalable Data Model

Progressive estimation accuracy across the entire flow

– Reduce iterations late in the cycle

[Figure: Shared, Scalable Data Model spanning Estimation, IP Integration, RTL Design, Synthesis, and Place & Route]

Shares design information between implementation steps

– Ensures fast convergence and timing closure

Enables use of the same commands & reports to analyze design at every step

Enables cross-probing

Highly efficient memory utilization

– Scalable for next decade of designs

[Figure: cross-probing among RTL source (entity FIR), schematics, timing reports (Timing Path #1 to #3), and placement views, leading to code changes, tool settings, and placement edits]

Technique for Rapid Timing Closure

Baselining

Prioritize and close 1 step at a time

– Converge first at Synthesis (faster, higher impact), then in back-end
– Start with the simplest (baseline) constraint:
• Internal Fmax (flop-to-flop constraints), which is the problem 9 times out of 10
• Define proper clock dependencies
– Make sure the design & constraints are reasonable
– Analyze, get to root cause, then decide how to fix it
• Clock path vs. data path vs. interconnect delay vs. logic delay…
– Add I/O constraints (with Vivado XDC templates) and redo…

Do not confuse with “Signoff” Constraints

– You still want complete constraints
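
A baseline constraint file could look roughly like the sketch below. It is only an illustration: the port names, clock names, and periods are assumptions, not values from a real design, and the two domains are assumed to be truly asynchronous with synchronizers on every crossing.

```tcl
# Baseline constraints: clocks and clock relationships only, no I/O delays yet.
# All names and periods are placeholders.
create_clock -name sys_clk -period 10.000 [get_ports sys_clk]
create_clock -name rx_clk  -period 8.000  [get_ports rx_clk]

# Declare the domains asynchronous only after confirming that every
# crossing goes through a proper synchronizer.
set_clock_groups -asynchronous \
    -group [get_clocks -include_generated_clocks sys_clk] \
    -group [get_clocks -include_generated_clocks rx_clk]

# Check internal (flop-to-flop) timing right after synthesis.
report_timing_summary -file baseline_post_synth.rpt
```

I/O delays and exceptions are added in later passes, once the internal paths converge.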

View QuickTime Video for UltraFast Design Methodology for Timing Closure


Progressive Approach to Design Closure

[Figure: three closure passes, each running Synthesis, Place, and Route with analysis after every step and ending at an Fmax checkpoint]

– Baseline XDC: Baseline Constraints → Optimize Internal Paths
– Complete XDC: Add I/O Constraints → Optimize Entire Chip
– Final XDC: If needed, Add Timing Exceptions and/or Floorplan → Fine-tune

Critical Path could be a Moving Target

Example from a Real Design

Post-synthesis estimates (the real problem)

– Worst path: 13 levels of logic (4.3 ns)

Post-place

– Worst path: 7 levels (4.2 ns)
– Paths with 7-13 levels got placed locally

Post-route (the side-effect of the real problem)

– Worst path: 4 levels of logic (4.1 ns)
– Paths with 5-13 levels got preferred routing

Analyze & fix timing issues at early stages for faster timing convergence

Write HDL Code That Best Fits the Hardware

Impact of HDL Coding Style

Block inference

– Follow recommended templates for RAM, DSP, LUTRAM, SRL inference

Pipeline your design to reduce levels of logic

Think about Reset

– Taxes routing and is not always needed: Xilinx devices boot in a known state
– Dedicated shifters (SRLs) and RAM memory arrays don’t use resets

Synchronous resets are preferred

– Allows packing of registers into dedicated RAM and DSP blocks
– Tools have the option to implement reset in datapath (LUT)

Give more freedom to Synthesis

– Revisit attributes needed by other synthesis engines or older releases
– Avoid KEEP, dont_touch, syn_preserve, max_fanout attributes…

Review Design Creation Chapter in UG949

Review Design Creation tab in the Design Methodology Checklist

Using HDL Language Templates

Accessing templates in IDE

– Windows → Language Templates

Synthesis Templates

– BRAM, LUTRAM, ROM, SRL
– Counter, MULT
– FSM, Decoder, Encoder
– …

Coding to Match the Hardware: DSP48 Blocks and BRAM Blocks

Leverage DSP block cascading capabilities

– Adder tree becomes a performance bottleneck
– Pipelined adder chain delivers optimal performance

[Figure: input feeding an adder tree versus a cascaded chain of four DSP48 blocks driving the output]

Avoid Block RAM collision avoidance logic (*)

– Synthesis assumes collision: extra logic compares rdaddr and wraddr around the RAMB
– Inference with collision check disabled: rdaddr, wraddr, and din connect to the RAMB, which drives dout

(*): logic added by default by Synplify (attribute syn_no_rw_check removes the logic)

The Impact of Resets

Increase performance with the right reset choice

– Think Local, not Global with resets
– No reset at all (if possible) is best
– Synchronous rather than asynchronous reset
– Active HIGH rather than active low reset
– Default register value can be controlled via the INIT property or at signal declaration in RTL

From: UG949 Chapter 4 Design Creation – Control Signals and Control Sets

Reset Routing

Resets compete for the same resources as the rest of the active signals of the design

– Including the critical datapaths

Designs that minimize or eliminate resets have

– About 18% fewer timing paths on average
– About 15% less runtime on average
– 10% fewer registers and 7% fewer LUTs
– 20% lower timing scores
– Lower memory usage

Be selective with where you code resets

Initialize all registers in the VHDL / Verilog code

More on Resets

Many designs need some resets

– Very few designs require resets on all registers
• Most ASICs require a described reset on every register for testability
• But the FPGA has a built-in Global Set/Reset (GSR)

Guideline: Be selective with where you code resets

– Only place resets that have impact on functionality
• I/O, State-machines, critical control logic, etc.
– Omit resets that do not

Initialize all registers in the VHDL / Verilog code

– This should be done whether using a reset or not

VHDL: signal my_register : std_logic_vector (7 downto 0) := "01010101";
Verilog: reg [7:0] my_register = 8'h55;

Gauging Other Design Metrics

report_high_fanout_nets

– To reduce fanout on a net use…
• max_fanout (Vivado synthesis and XST)
• syn_maxfan (Synplify)
– Use phys_opt_design for timing driven replication

From: Design Methodology Checklist – Design Creation tab
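
As a rough sketch of how these commands might be combined in the Tcl console after placement (the net count of 20 is an arbitrary choice for illustration):

```tcl
# List the 20 highest-fanout nets of the placed design.
report_high_fanout_nets -max_nets 20

# Let physical optimization replicate high-fanout drivers where timing needs it.
phys_opt_design

# Re-check timing after replication.
report_timing_summary
```

Relying on phys_opt_design keeps the replication timing-driven, whereas a hard max_fanout attribute forces replication whether or not timing needs it.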

Gauging Other Design Metrics

report_control_sets

– Indicator of possible packing fragmentation and fitting issues
– Run with the -verbose option to generate a full list
– Use Synplify’s syn_reduce_controlset_size attribute for control
• Default is 2; set it to 8 to eliminate most of the lowest-fanout control sets

From: Design Methodology Checklist – Design Creation tab

Methodology DRCs

Two new rule decks in 2013.3

– methodology_checks
– timing_checks

Usage:

– report_drc -ruledeck methodology_checks
– report_drc -ruledeck timing_checks
– Specific “methodology_checks” available only for the elaborated design
– Tools → Report → Report DRCs

Review and Resolve Critical Warnings

Vivado does not stop for Critical Warnings

– Enables fixing many issues at once
– Bitstream generation will error with unresolved critical warnings

From: UG949 Chapter 5 Implementation – Moving past Synthesis

Review and Resolve Critical Warnings

Critical warnings are serious design issues

– Invalid constraints or XDC syntax errors
– Path segmentation
– Netlist or target objects not found or invalid

Address these warnings before moving forward

– Results of design analysis may be inaccurate
– Critical Warnings may prevent design success

Timing Constraints Creation and Validation

Timing Constraints Need to Be "Clean"

When constraints (clock, IO) are missing

– The corresponding paths are timed optimistically
– No violation will be reported but design may not work on HW

When paths are incorrectly constrained

– Runtime and optimization efforts will be spent on the wrong paths
– Reported timing violations may not result in any issues on HW

When constraints create wrong HOLD violations

– May result in long runtime and SETUP violations
– P&R fixes HOLD violations as #1 priority, because:
• Designs with HOLD violations won’t work on HW
• Designs with SETUP violations will work, but slower

Review the Creating Constraints section of the Design Creation Chapter in UG949 & checklist

Include IP Constraints

Many cores have their own constraints / exceptions

– PCIE, MIG, RAM-based asynchronous FIFOs…

Non-native IP: Be careful!

– Very easy to drop the IP constraints especially if provided as .ngc files

Native IP: Constraints included

– Sources window in IDE: Compile Order → Constraints
– Use report_compile_order -constraints to identify constraint file sources

Method to Create Good Constraints

Create clocks and define clock interactions

– Four-step guideline

Set input and output delays

– Beware of creating incorrect HOLD violations

Set timing exceptions

– Less is more!

– Beware of creating incorrect HOLD violations

Use report commands to validate each step


Clock Ground Rules

For SDC-based timers, clocks only exist if you create them

– Use create_clock for primary clocks

Clocks propagate automatically through clocking modules

– MMCM and PLL output clocks are automatically generated
– Gigabit transceiver clocks are not generated automatically; create them manually

[Figure: clock tree annotated with where to apply create_clock and where not to]

Use create_generated_clock for internal clocks (if needed)

All inter-clock paths are evaluated by default
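
The ground rules above might translate into constraints along these lines. This is a sketch only: every port, pin, and cell name is hypothetical, and the periods are placeholders.

```tcl
# Primary clock on a top-level port.
create_clock -name sys_clk -period 10.000 [get_ports sys_clk]

# MMCM/PLL outputs driven by sys_clk are derived automatically,
# so they need no constraint of their own.

# Transceiver recovered clock: defined manually on the output pin
# (hypothetical hierarchical pin name).
create_clock -name gt_rxclk -period 6.400 [get_pins gt0/gtxe2_i/RXOUTCLK]

# Internal divide-by-2 clock produced by a register (hypothetical cell name).
create_generated_clock -name clk_div2 -source [get_ports sys_clk] \
    -divide_by 2 [get_pins clk_div_reg/Q]
```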

Four Steps for Creating Clocks

Run report_timing_summary before starting constraint capture

– View report_clocks section to see all signals driving clock pins

Step 1

– Use create_clock for all primary clocks on top level ports
– Run the design (synthesis) or open netlist design

Step 2

– Run report_clocks
– Study the report to verify period, phase and propagation
– Apply corrections to your constraints (if needed)

Output of report_clocks (excerpt)

Attributes: P: Propagated, G: Generated

Clock          Period   Waveform        Attributes  Sources
sys_clk        10.000   {0.000 5.000}   P           {sys_clk}
pll0/clkfbout  10.000   {0.000 5.000}   P,G         {pll0/plle2_adv_inst/CLKFBOUT}
pll0/clkout0   2.500    {0.000 1.250}   P,G         {pll0/plle2_adv_inst/CLKOUT0}
pll0/clkout1   10.000   {0.000 5.000}   P,G         {pll0/plle2_adv_inst/CLKOUT1}

Four Steps for Creating Clocks (continued)

Step 3

– Evaluate the clock interaction using report_clock_interaction
BEWARE: All inter-clock paths are constrained by default!
– Mark inter-clock paths (Clock Domain Crossing) as asynchronous
• Make sure you designed proper CDC synchronizers
• Use set_clock_groups (preferred over set_false_path)
BEWARE: This overrides any set_max_delay constraints!
– Do you have unconstrained objects?
• Find out with check_timing

Step 4

– Run report_clock_networks
– You want the design to have clean clock lines without logic
• Tip: Use clock gating option in synthesis to remove LUTs on the clock line
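
Once the primary clocks are defined and a synthesized design is open, the validation part of the four steps could be run from the Tcl console roughly as follows. This is a sketch of the sequence, not a complete script.

```tcl
# Step 2: verify period, waveform, and propagation of every clock.
report_clocks

# Step 3: review every clock pair, then look for unconstrained objects.
report_clock_interaction
check_timing

# Step 4: inspect the clock trees for logic on the clock lines.
report_clock_networks
```

Re-running the same sequence after each constraint change keeps the validation incremental.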

Defining & Validating Clock Interactions


Constraining Cross Clock Domains

Use appropriate synchronizing techniques

– 2 or more register stages, for single bit
– FIFO for buses

Maximize MTBF

– ASYNC_REG to place synchronizing flops in the same slice for best Mean Time Between Failures (MTBF)

set_property ASYNC_REG TRUE [get_cells [list sync0_reg sync1_reg]]

Constraints for Asynchronous CDC

Ignoring timing paths between individual clocks

set_clock_groups -asynchronous -group {clk1} -group {clk2}

This is equivalent to:
set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
set_false_path -from [get_clocks clk2] -to [get_clocks clk1]

BEWARE: This overrides any set_max_delay constraints!

Ignoring timing paths between groups of clocks

# SDC create_clock for the two primary clocks
create_clock -name clk_oxo -period 10 [get_ports clk_oxo]
create_clock -name clk_core -period 10 [get_ports clk_core]
# Set Asynchronous Clock Groups
set_clock_groups -asynchronous \
    -group [get_clocks -include_generated_clocks clk_oxo] \
    -group [get_clocks -include_generated_clocks clk_core]

BEWARE: This overrides any set_max_delay constraints!

Setting Input / Output Delays

Start with no IO constraints

– Focus on finding and fixing core timing issues
– Vivado does not time from IOs without IO constraints
• No need to use set_false_path -from or -to get_ports to ignore IO timing

Specify realistic IO delays once core timing is reasonable

– Use set_input_delay and set_output_delay
– Wrong delay value (e.g. <0 ns) can cause invalid analysis

The delay value specified is the external delay

– Default in UCF: internal delay
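
A minimal sketch of external-delay constraints for a simple system-synchronous interface follows. The clock name, port names, and delay values are assumptions for illustration; real values come from the external device datasheet and board trace delays.

```tcl
# Input side: delay from the external clock edge until data arrives at the FPGA pin.
set_input_delay  -clock sys_clk -max 3.0 [get_ports data_in]
set_input_delay  -clock sys_clk -min 1.0 [get_ports data_in]

# Output side: time the downstream device needs outside the FPGA.
set_output_delay -clock sys_clk -max 2.0 [get_ports data_out]
set_output_delay -clock sys_clk -min 0.5 [get_ports data_out]

# Validate the new constraints on the I/O paths.
report_timing -from [get_ports data_in] -setup
report_timing -to [get_ports data_out] -hold
```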

Multicycle Paths

set_multicycle_path N implies a HOLD check at N-1

– E.g.: a multicycle_path of 10 implies a HOLD requirement of 9 cycles!

Whenever setup check is changed, hold check is also changed

Guidelines for proper multicycle path constraints

– Should always be pairs of set_multicycle_path constraints
• One for -setup and one for -hold
– Bring the HOLD requirement back to 0 (reduce by N-1) to avoid incorrect HOLD violations

[Figure: regA drives regB, both clocked by CLK and gated by CE; multicycle path = 3T; waveforms of regA/CLK, regB/CLK, and regB/D mark the SETUP and HOLD edges]

set_multicycle_path 3 -setup -from [get_cells regA] -to [get_cells regB]
set_multicycle_path 2 -hold -from [get_cells regA] -to [get_cells regB]

Hold checked at edge 3-1-2 = 0

Using Vivado Language Templates

XDC Templates

Accessing templates in IDE

– Windows → Language Templates

SDR & DDR Templates

– Inputs and outputs
– Source / System synchronous
– Center / Edge aligned

Reading the Reports

Reading the report_timing_summary

– Intra-clock report – Inter-clock report

Use report_timing for interactivity and advanced options

– You would typically use it in the Tcl window
• report_timing -through [get_nets {/cpu_top/crit_net_name}]
• report_timing -setup -max_paths 10    # For 10 worst setup paths
• report_timing -hold -to [get_cells {/top/item}]    # Hold on "item"
– Use filters from your XDC files to check each expression
• set_multicycle_path -from [get_pins regA/C] -to [get_pins regB/D]
• report_timing -from [get_pins regA/C] -to [get_pins regB/D]

Timing Command Summary

Obtain full timing summary of the design

– report_timing_summary: summary subsections for all timing checks

Create and validate clocks

– check_timing: for missing clocks and IO constraints
– report_clocks: check frequency and phase
– report_clock_networks: possible clock root

Validate clock groups

– report_clock_interaction

Validate I/O delays

– report_timing -from [input_port] -setup/-hold
– report_timing -to [output_port] -setup/-hold

Add exceptions if necessary

– Validate using report_timing

Managing Constraint Files

Using a single XDC file

– XDC applies to both synthesis & implementation

Using multiple XDC files

– Main XDC (main.xdc) with top level constraints
• Primary clocks and I/O delays
• Exceptions on clocks and RTL objects
– Implementation specific XDC (impl.xdc)
• Physical constraints
• Exceptions based on physical netlist

The order of constraint files matters!

– To report the order of XDC files: report_compile_order -constraints

[Figure: main.xdc and impl.xdc mapped onto the Elaboration, Synthesis, and Implementation steps]
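
One way to keep the implementation-specific file out of synthesis in a project flow is sketched below. The file name impl.xdc follows the slide, and the sketch assumes the file has already been added to the project.

```tcl
# Keep physical constraints and netlist-based exceptions out of synthesis.
set_property USED_IN_SYNTHESIS false [get_files impl.xdc]

# main.xdc keeps the default: used in both synthesis and implementation.

# Confirm which constraint files are read, and in what order.
report_compile_order -constraints
```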

Managing IP Constraint Files

Some IP come with their own XDC constraints

– Example: The clocking wizard

The clocking wizard XDC will be read before the user XDC by default (user constraints can override IP defined clocks by default)

The order of constraint files matters!

– To report the order of XDC files: report_compile_order -constraints
– Always verify the clocks using report_clocks (step 2 of 4-step process)
– To change the default processing order: set_property PROCESSING_ORDER EARLY|LATE [get_files IP_XDC_File]
– If necessary, IP_XDC_files can be enabled/disabled
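
As a hedged illustration of the ordering controls, the sketch below adjusts a user constraint file and disables an unused one. Both file names are hypothetical, and whether a given IP-delivered XDC file can be modified depends on how the IP was generated.

```tcl
# Read this user file after the IP constraints so it can refer to IP-defined clocks.
set_property PROCESSING_ORDER LATE [get_files user_exceptions.xdc]

# Disable a constraint file that should not be applied.
set_property IS_ENABLED false [get_files unused_constraints.xdc]

# Confirm the resulting order, then re-verify the clocks (step 2 of the 4-step process).
report_compile_order -constraints
report_clocks
```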

Clock Planning, Pin Planning, and Floorplanning

Clock and Pin Planning

Pin and Clock Planning often happens early in the Project

– Decisions here can have far-reaching effects throughout the design
• Excessive clock skew
• Poor I/O timing
• Timing hazardous clock domain crossing
• Less flexible logic placement
• Fewer clocking resource choices
• Excessive routing delays
• Reduced device utilization

Pin and Clock Planning should be considered together

– Choices made for clock pins affect clock timing and resource choices
– Choices made for data pins affect clock pin placement decisions

Review the Board & Device Planning Chapter in UG949

Review the Board and FPGA Planning tab in the Design Methodology Checklist

Clock and Pin Planning

Considerations for clock pin planning

– Generate all I/O interface and clocking IP prior to pin assignment
– Consolidate clocking where possible and consolidate MMCMs
• Fewer clocks and MMCMs mean fewer clock resources and crossings
– Consider all CDC when assigning clocking resources and pins

Considerations for data pin planning

– Group related data pins in the same bank, or adjacent banks if a single bank is not possible
• Place associated I/O clock in same bank when possible
– Consider associated control signal placement along with data paths
– Consider data flow when planning the pinout
• Choose a pinout that has clean passage through the device
– Place high fanout signals towards the middle of the chip
• Consider CCIO pins with BUFG resources for really high fanout signals
– Evaluate all pin attributes (I/O Standard, Slew, etc.) during placement

Clock and Pin Planning

Use Vivado Pin Planning capabilities

– Import pin & clocking assignments from generated IP
– Visualization of I/O resource placement on package and in device
– DRC, SSN and other checks available to validate choices
– Configuration pin assignments & possible device migration considerations

Re-evaluate in Vivado any subsequent pin changes

– Understand how PCB pin swaps affect timing & resources

Vivado I/O & Clock Planning Tutorial UG935

– Available in DocNav and Vivado

Additional Considerations for SSI Devices

Clocking

– High fanout clocks should be placed in center SLRs
– Place regional clocks on center clock region within an SLR
– Place clock pin / MMCMs in same SLR as timing critical I/O interfaces (avoid driving timing critical I/O interfaces from a different SLR)
– Clock pin choices should be balanced across upper & lower SLR:
• 2 upper SLR clock domains have 8 BUFG x 2
• 4 lower SLR clock domains have 4 BUFG x 4

Pinout

– High fanout signals feeding all SLRs placed in center SLRs
– I/O interfaces should not span across SLRs
– Pay attention to data flow across SLRs
• Avoid the need for multiple SLR crossings due to pinout decisions

For more details

– Consult UG872: Large FPGA Methodology Guide

Improving Placement Through Floorplanning

First improve HDL, synthesis & constraints

– Easier, more repeatable to not floorplan when avoidable

Start design without any floorplanning

– See what P&R algorithms can do without restrictions

Using Vivado IDE

– Highlight placement per module as guideline
– Visualize placement of critical timing paths
• Understand data flow in & out of Pblocks
• Understand effects of Pblock inside & out
• Resources around placement can affect data flow
– Create Pblocks minding resource utilization

Careful not to over floorplan – Less is best

– Only floorplan the critical areas of the design
– Do not create Pblocks with very high utilization
• Can create routing congestion or new timing problems
– Avoid overlapping Pblocks
• Creates more complex placement and clock scenarios
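
For reference, a single Pblock around one critical module could be created along these lines. The Pblock name, cell name, and site ranges are placeholders; valid ranges depend on the target device and should leave utilization headroom.

```tcl
# Create a Pblock and assign one critical module to it (hypothetical names).
create_pblock pblock_fir
add_cells_to_pblock [get_pblocks pblock_fir] [get_cells fir_inst]

# Give it a slice range and a DSP range (placeholder coordinates).
resize_pblock [get_pblocks pblock_fir] \
    -add {SLICE_X0Y0:SLICE_X39Y49 DSP48_X0Y0:DSP48_X1Y19}

# Check utilization inside the Pblock before running placement.
report_utilization -pblocks [get_pblocks pblock_fir]
```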

Baseline run with highlighted regions


UltraFast™ Methodology Review

For optimal results, adapt your HDL style to the FPGA

– Be mindful of BRAM, LUTRAM, DSP, SRL inference needs
– Avoid asynchronous reset and wired resets in general
– Minimize control signals
– For large FPGAs, design with the dataflow and floorplanning in mind

Baseline your constraints to converge rapidly

Provide clean timing constraints

– Bad constraints result in bad runtime, performance and HW failures
– Learn the essentials of timing constraint creation & validation methods

Follow pin/clock planning guidelines

– Must follow dataflow
– Place large fanout clocks and pins in the center of SSIT devices
