METHODOLOGY FOR HIGH-SPEED CLOCK TREE IMPLEMENTATION IN LARGE CHIPS
Ravinder Rachala
Aaron Grenat
Prashanth Vallur
Christopher Ang
January 31, 2013
ADVANTAGES OF CUSTOM CLOCK DISTRIBUTION
Low skew
Smaller AOCV timing uncertainty compared to full CTS
Custom buffers are more tolerant of OCV, IR drop, and supply noise
The plot here shows a scenario where increased skew would require boosting voltage to achieve the target Fmax. Effectively, skew translates to higher power (dynamic and leakage) for meeting a target frequency.
[Plot: Fmax (1E+09 to 3E+09) versus supply voltage (0.75 to 1.15), one curve per skew value from 0ps to 100ps in 10ps steps, annotated from "Low Skew" to "High Skew"]
2 | Methodology for High-Speed Clock Tree Implementation in Large Chips | January 31, 2013 | Public
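The trade-off described for the plot can be sketched numerically. This is only an illustration, not the model behind the plot: it assumes skew adds directly to the minimum cycle time, and the zero-skew Fmax-versus-voltage table (`FMAX_ZERO_SKEW_HZ`) is made up.

```python
# Hypothetical zero-skew Fmax (Hz) per supply voltage (V); illustrative values only.
FMAX_ZERO_SKEW_HZ = {0.80: 1.6e9, 0.90: 2.0e9, 1.00: 2.4e9, 1.10: 2.8e9}

def min_period_s(fmax_zero_skew_hz, skew_s):
    """Assumption: skew adds directly to the minimum achievable cycle time."""
    return 1.0 / fmax_zero_skew_hz + skew_s

def min_voltage_for_target(target_hz, skew_s, table=FMAX_ZERO_SKEW_HZ):
    """Return the lowest tabulated voltage that meets target_hz, or None."""
    target_period_s = 1.0 / target_hz
    for voltage in sorted(table):
        if min_period_s(table[voltage], skew_s) <= target_period_s:
            return voltage
    return None

# With this made-up table, a 2 GHz target needs a higher voltage as skew grows.
for skew_ps in (0, 50, 100):
    print(skew_ps, "ps ->", min_voltage_for_target(2.0e9, skew_ps * 1e-12), "V")
```

Under these assumptions, every picosecond of skew has to be bought back with a higher supply voltage, which is the power cost the slide refers to.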
OLD METHODOLOGY – CLOCK SPINE FRIENDLY FPLAN
[Figure: floorplan with clock spine, macros, and PLL]
• A regular, repetitive floorplan like this one is conducive to thin, long clock macro structures. Here we build 2 unique types of clock macros and stamp them, so the custom macro effort is relatively small compared to more complex floorplans.
OLD FLOW - CLOCK SPINE TOPOLOGY IN COMPLEX FLOORPLAN
• In more complex floorplans like the one shown, we would end up needing too many custom clock spine macros, which are resource intensive and hard to converge in time for chip tapeout.
• The traditional clock spine macro style is not scalable for today's complex chips.
ISSUES WITH OLD METHODOLOGY
• Very resource intensive. The increasing number of SOCs in the roadmap makes this even more challenging.
• Area taken by the clock trees is poorly utilized (<10%).
• The increasing size of the macros (on the order of ~20mm) runs the risk of not converging through the custom macro/IP build flow.
• Floorplan challenges in accommodating the clock macros and minimizing the number of unique macros typically consume a lot of resource energy and time.
• Re-use of clock macros across projects is heavily restricted by even small floorplan changes between projects.
TMAC FLOW : NEW METHODOLOGY
• Clock macros are broken down into cells (called TMACs: Tiny-MACros) that are instantiated flat at the IP level.
• Connections between the TMACs are made in overlay (or RDL - Route Distribution Layers).
[Figure: clock macro broken down into TMAC cells; scale marker: 1mm]
TMAC FLOW : NEW METHODOLOGY – SAMPLE CLOCK SPINE + MESH TOPOLOGY
[Figure labels: CTS root buffer or clock gater; MH (horizontal low-res layer); MV (vertical low-res layer)]
PRIOR WORK: EXAMPLE
[Figure: IP floorplan of Tile/RLM blocks with the PLL and a region marked "Bad skew"]
Clock macros: Conduit - 1, Vtree - 1, Htree - 8 (all 8 flavors are delay-matched)
Total unique clock macros = 10
• Driving large areas of the design from a corner (i.e., huge cap on the buffer, big current through the wire) causes EM and self-heating issues.
• A long distribution wire is susceptible to ringing/reflections (parasitic inductance).
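To see why driving a large load from one corner stresses the wire, a rough current estimate helps. The sketch below uses the standard relation I_avg = C * V * f for a node that is charged rail to rail once per cycle. The capacitance, voltage, and frequency values are made-up placeholders, not figures from this design.

```python
def average_clock_current_a(load_cap_f, vdd_v, freq_hz):
    """Average supply current for charging load_cap_f to vdd_v once per clock cycle."""
    return load_cap_f * vdd_v * freq_hz

# Hypothetical numbers: one corner driver versus the same load split over 8 drivers.
TOTAL_CAP_F = 200e-12
VDD_V = 1.0
FREQ_HZ = 2.0e9

single_driver_a = average_clock_current_a(TOTAL_CAP_F, VDD_V, FREQ_HZ)
per_driver_a = average_clock_current_a(TOTAL_CAP_F / 8, VDD_V, FREQ_HZ)
print(f"single driver: {single_driver_a:.3f} A, one of eight: {per_driver_a:.3f} A")
```

The current through any one wire scales with the capacitance it drives, so spreading the load over more drivers lowers the per-wire current that EM and self-heating limits have to tolerate.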
CLOCK COVERAGE IS BETTER IN NEW METHODOLOGY
[Figure: IP floorplan of Tile/RLM blocks with the PLL, TMAC cells, and overlay routing]
• 1 clock spine
• All TMAC cells connected in overlay
• More clock coverage
TMAC FLOW : NEW METHODOLOGY BENEFITS
• The entire distribution is contained in one clock spine.
• Reduces the number of circuit and layout resources.
• Frees up area between the TMACs for RLMs/Tiles.
• The TMAC library of cells is built once per technology node (e.g. GF 28nm) and reused across all projects in that process technology.
• Floorplan changes can be easily accommodated, even in late stages of the design cycle.
• Provides more complete and robust clock coverage. Bad skew zones are avoided and reliability concerns minimized.
• Instance swapping (sizing clock mesh drivers for power and performance optimization) can be done easily based on the clock mesh load.
• Creates a full-custom quality clock spine network with significantly less effort.
GRID CAP OPTIMIZATION, SDF FOR SKEW ANNOTATION
• Clock grid optimization techniques reduced clock metal capacitance (by ~45%):
– Classic clock mesh pruning methods like on-demand-grid
– Pushing the VIA stack into the MPCTS (Multi-Point CTS) buffer
• Providing clock arrival times at each MPCTS entry point on the mesh (SDF file) for the full-chip timing flow
[Figure, standard MPCTS buffer cell: CLK pin on M2, below the clock mesh (MH layer). The auto router builds the connection from the ‘CLK’ pin to the ‘MH’ clock grid route.]
[Figure, new MPCTS buffer cell: CLK pin on both M2 and MH, under the clock mesh (MH layer). The connection from the M2 pin to the MH layer is built into the cell, and the pin is elevated to the MH layer. The new cell is the same size as the standard cell. All of this route cap is saved, and skew from the circuitous route is avoided.]
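The arrival times handed to the full-chip timing flow can be summarized as a skew number. The sketch below assumes the arrival times have already been read into a dictionary keyed by MPCTS entry point. The entry-point names and values are hypothetical, and no SDF parsing is shown.

```python
def mesh_skew_ps(arrival_ps):
    """Skew across the mesh: latest minus earliest clock arrival."""
    return max(arrival_ps.values()) - min(arrival_ps.values())

def relative_arrivals_ps(arrival_ps):
    """Arrival of each entry point relative to the earliest one."""
    earliest = min(arrival_ps.values())
    return {point: t - earliest for point, t in arrival_ps.items()}

# Hypothetical arrival times (ps) at three MPCTS entry points.
arrivals = {"entry_a": 412, "entry_b": 418, "entry_c": 409}
print(mesh_skew_ps(arrivals))
print(relative_arrivals_ps(arrivals))
```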
TMAC METHODOLOGY : FLOW CHART
• Import the IP/SOC floorplan (DEF or GDS) into Cadence Virtuoso Layout XL.
• Draw the full clock spine in Cadence Virtuoso XL (schematic, layout).
• Export the entire clock spine layout to a DEF file (using an internal flow).
• Merge the clock spine DEF with the other overlay DEF (top layer power grid + clock mesh etc.) in First Encounter.
• Push down the clock design (distribution + mesh/grid) into the floorplan views so RLMs/tiles can see it for CTS buffer placement etc.
• Extract the clock routes (StarRCXT) at the IP/SOC top level and run timing using PrimeTime.
CUSTOM DESIGN DATA TO DEF CONVERSION FLOW CHART
[Flow chart: cdl; component cell list; gdsii; lvs cross reference files; annotated gdsii file; data processing tools; internal database; def writer; def]
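The last stage of this conversion, the def writer, can be sketched for the simplest case of placed cell instances. The internal flow is not described in the slides, so this is only an illustration: it emits a small subset of DEF (header plus a COMPONENTS section), and the function name, instance names, cell names, version string, and database units are all assumptions.

```python
def write_def_components(design, placements, dbu_per_micron=1000):
    """Return DEF text with one fixed component per (inst, cell, x_um, y_um)."""
    lines = [
        "VERSION 5.8 ;",  # assumed DEF version
        f"DESIGN {design} ;",
        f"UNITS DISTANCE MICRONS {dbu_per_micron} ;",
        f"COMPONENTS {len(placements)} ;",
    ]
    for inst, cell, x_um, y_um in placements:
        # DEF coordinates are integers in database units.
        x = round(x_um * dbu_per_micron)
        y = round(y_um * dbu_per_micron)
        lines.append(f"- {inst} {cell} + FIXED ( {x} {y} ) N ;")
    lines += ["END COMPONENTS", "END DESIGN"]
    return "\n".join(lines) + "\n"

# Hypothetical TMAC placements in microns.
def_text = write_def_components(
    "clk_spine",
    [("tmac_0", "TMAC_BUF", 10.0, 20.5), ("tmac_1", "TMAC_BUF", 1010.0, 20.5)],
)
print(def_text)
```

A real clock spine DEF would also carry the routed nets and shielding. Those sections are omitted here.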
CLOCK GRID INSERTION AND SDF GENERATION: FLOW CHART
• A top-level script prunes the MH route completely and inserts back the shortest possible segment to connect the CTS entry buffers to the nearest MV layer.
• Draw the clock mesh/grid routes in FE (spec from the clock circuit team: route width, space, shielding).
• Push down the mesh into the tiles. The CTS buffer placement flow is run. Tiles close placement, routing, and timing.
• Run the CES flow. Skew (clock arrival times, SDF file) is reported to the full-chip timing flow. Here the clock routes are analyzed for EM pass/fail criteria as well.
• All tile DEFs are exported for the full clock mesh extraction and spice simulation flow (CES).
• Extract the clock distribution routes at the IP/SOC level and run full-chip STA timing (PrimeTime).
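The pruning step, which removes the MH route and adds back only the shortest stub from each CTS entry buffer to the nearest MV route, can be sketched geometrically. This is not the actual top-level script: it assumes MV routes are vertical tracks at known x coordinates, that each stub is a horizontal MH segment at the buffer's y coordinate, and the buffer names are hypothetical.

```python
def nearest_mv_stub(buffer_xy, mv_track_xs):
    """Return the horizontal MH segment from a buffer to its closest MV track."""
    bx, by = buffer_xy
    nearest_x = min(mv_track_xs, key=lambda x: abs(x - bx))
    # Segment is returned left-to-right as ((x0, y), (x1, y)).
    return ((min(bx, nearest_x), by), (max(bx, nearest_x), by))

def rebuild_mh_stubs(buffers, mv_track_xs):
    """Replace the full MH routes with one minimal stub per CTS entry buffer."""
    return {name: nearest_mv_stub(xy, mv_track_xs) for name, xy in buffers.items()}

# Hypothetical MV track positions and CTS entry buffer locations (microns).
mv_tracks = [0, 10, 20]
stubs = rebuild_mh_stubs({"cts_buf_0": (12, 5), "cts_buf_1": (17, 3)}, mv_tracks)
print(stubs)
```

Keeping only these stubs is one way to picture how pruning removes mesh metal that no buffer needs.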
BENEFITS PROVEN IN RECENT AMD SOCS
• Lower resource needs
– 32nm SOI APU Graphics IP: 7 clocks, ~30 clock macros, 4 circuit and 4 layout resources
– 28nm APU Graphics IP: 9 clocks, 1 clock spine DEF, 1.5 circuit and 1 layout resource
• Area savings
– 32nm SOI APU Graphics IP area: 98 mm², clock macro area: 1.21 mm² (1.23%)
– 28nm APU Graphics IP area: 131 mm², clock macro area: 0.18 mm² (0.12%)
• Floorplan flexibility
– With the new methodology (TMAC flow), high-speed clock distribution can be designed to fit into any floorplan.
– E.g.: We were able to deliver a clock distribution design to a server SOC in ¼ the time it takes in the old clock spine macro flow.
• Reuse across projects
– The TMAC library (clock buffer cells etc.) developed for a process technology is being leveraged for multiple APU projects.
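The area-savings percentages are the ratio of clock macro area to IP area. The sketch below recomputes them from the rounded areas quoted above, so the results can differ slightly from the percentages printed on the slide.

```python
def clock_area_percent(clock_macro_mm2, ip_mm2):
    """Clock macro area as a percentage of total IP area."""
    return 100.0 * clock_macro_mm2 / ip_mm2

pct_32nm = clock_area_percent(1.21, 98.0)   # old clock spine macro flow
pct_28nm = clock_area_percent(0.18, 131.0)  # TMAC flow
print(f"32nm SOI: {pct_32nm:.2f}%  28nm: {pct_28nm:.2f}%")
print(f"reduction factor: {pct_32nm / pct_28nm:.1f}x")
```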
Q&A
Thank You
Trademark Attribution
AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States
and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of
their respective owners.
©2012 Advanced Micro Devices, Inc. All rights reserved.