Dynamic Power Analysis of
Custom Macros
Stephen Bijansky
Bassam Mohd
Baker Mohammad
2
Outline
• Motivation
• HSIM Power Analysis
• ESP-CV Power analysis
• ESP-CV Flow
• Results
• Conclusions
3
Motivation
• Power characterization is an important part of low
power design
• Custom macros with transistor-level design are
challenging to model for active power
– SPICE-level simulation is slow
– Characterizing all custom cells is a big task
– Need a detailed gate-level model to use the ASIC design flow
– Changes in the top level affect macro power
– Need ongoing modeling of power with new stimulus
4
Overview
• Power estimation for custom macros
– Transistor level schematics
– Post-layout capacitance extraction
• Reduce analysis time
• Improve accuracy for long test cases
• This work is used extensively in Qualcomm’s 45nm
low power DSPs
5
Traditional Approach (.lib)
• Fast SPICE simulator HSIM
• Assume certain activities on data
• Append power into lib files
– conditional statements based on control signals
• Limitation on conditional statement
– Mutually exclusive
• Depends on internal state nodes
– Has the macro just come out of reset, or has the macro
been running for a while?
– Potentially 2^(M+N) entries in the lib file
6
Cont. Traditional Approach (HSIM)
• Fast SPICE simulator
• Accuracy within 2% to 3% of HSPICE
• Use HSIM to run the entire power benchmark
• Power benchmark might be thousands of cycles
– Potential for long run time
– Large macros could take days or weeks
• Reduce benchmark to only 100 cycles
– Which 100 cycle window should be used?
– Resulting power estimate could be too high or too low
• Setting initial conditions can be time-consuming and
error-prone
7
1st Order Power Equation
Power = Activity Factor * Cap * Voltage^2 * Freq
• Capacitance – LPE
• Voltage – Fixed
• Frequency – Fixed
• Activity Factor – Unknown
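The first-order equation above can be written as a one-line function. This is a minimal sketch; the function name and the example node values (10 fF, 1.0 V, 500 MHz, activity factor 0.15) are hypothetical and are not taken from the slides.

```python
def dynamic_power(activity_factor, cap_farads, voltage, freq_hz):
    """First-order dynamic power in watts: AF * C * V^2 * f."""
    return activity_factor * cap_farads * voltage ** 2 * freq_hz


# Hypothetical node: 10 fF of capacitance, 1.0 V supply, 500 MHz clock,
# and an activity factor of 0.15.
power_w = dynamic_power(0.15, 10e-15, 1.0, 500e6)
print(f"{power_w * 1e6:.3f} uW")
```

With capacitance, voltage, and frequency known, the activity factor is the only input that has to come from simulation.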
8
ESP-CV Simulation
• Symbolic equivalence checking of schematics vs RTL
• Input to ESP-CV is a standard Verilog testbench
• Use ESP-CV as a Verilog simulator for schematics
• Verilog simulation orders of magnitude faster than
Spice
– Functional simulation
– Only need to determine activity factor
9
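RC verilog switch-level simulator
[Figure: transistor symbol (G, D, S) and a comparison of the simulators HSPICE, HSIM, ESPCV, and VCS. Labels: "Gold standard" for accuracy (HSPICE); "High Performance" for accuracy; "Functional Accuracy" with automated modeling; "Extremely Fast" with no timing]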
10
ESP-CV Simulation
• ESP-CV converts schematic to switch level verilog
– Special directives for transistor strengths
– Internal node names in a custom macro are not in RTL
– ESP-CV uses the internal nodes in the schematic
• Run entire benchmarks using thousands of cycles
– Same benchmarks used in PT-PX for power estimation of
synthesized logic
– Includes reset and initialization
• Fast run time allows running many more benchmarks
11
Flow Steps
• Inputs to the flow
– Spice netlist of the custom macro design, e.g. xmp d g s b qc_pch l=40e-9 w=120e-9
– VCD on the macro boundaries
– Cap file
• Output: power value in W
• Integrate the flow with the PTPX chip level run
• Steps
– Full chip simulation (stimulus such as T=0 010101001, T=1 101010101) produces an fsdb from top level VCS sims
– Vtran converts the fsdb into a Verilog test bench for the macro interface
– ESPCV reads the Spice netlist for the macro and produces a *GV file
– ESPCV simulates the Verilog test bench and produces a VCD dump of all nodes
– Vcd2saif produces the node activity factors (SAIF)
– Power_calc_script combines the SAIF with the cap file from Nanotime
– Result is a power value (avg, peak, static), e.g. set_annotated_power -internal_power 2.452e-02
12
Flow Steps
[Diagram: RTL Simulation → Create Verilog Test Bench → ESP-CV Switch Level Verilog Simulation → Calculate Activity Factor → Calculate Power, with Node Caps as a second input to Calculate Power]
13
RTL Simulation and
Testbench Creation
• Entire benchmark is simulated for the top level design
– Verilog VCS simulation
– Starts from reset, performs initialization, then benchmark
– Single fsdb dump file for each benchmark
• Vtran converts the fsdb dump of the benchmark to a
Verilog testbench
– Macro testbench has all of the same inputs as the top level
simulation
15
Calculate Activity Factor
• Process ESP-CV VCD dump file and calculate an
activity factor for each node
• Vcd2saif produces a switch activity interchange
format (SAIF) file
– Time spent at 0/1/Z, numbers of transitions, …
– Computed for only the window of interest
• Process the SAIF file to get the activity factor for each
node
– Transitions / Number of cycles
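A minimal sketch of the last step follows, assuming simplified SAIF-style net records of the form `(name (T0 ..) (T1 ..) (TX ..) (TC ..) (IG ..))`. Real SAIF files carry headers and instance hierarchy that this sketch ignores, and the net names and counts are hypothetical. The activity factor follows the slide's definition (transitions divided by cycles); whether an extra factor of 1/2 applies depends on a convention the slides do not spell out.

```python
import re

# Simplified SAIF-style records (hypothetical names and counts).
SAIF_TEXT = """
(clk (T0 500) (T1 500) (TX 0) (TC 200) (IG 0))
(xblock.lclk (T0 700) (T1 300) (TX 0) (TC 30) (IG 0))
"""

# Captures the net name and the T0, T1, TX, and TC (toggle count) fields.
NET_RE = re.compile(
    r"\((\S+)\s+\(T0\s+(\d+)\)\s*\(T1\s+(\d+)\)\s*\(TX\s+(\d+)\)\s*\(TC\s+(\d+)\)"
)


def activity_factors(saif_text, num_cycles):
    """Return {node: transitions / num_cycles} for every net record found."""
    return {
        m.group(1): int(m.group(5)) / num_cycles
        for m in NET_RE.finditer(saif_text)
    }


af = activity_factors(SAIF_TEXT, num_cycles=100)
for node, value in af.items():
    print(node, value)
```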
16
Node Capacitances
• Calibre layout parasitic extraction (LPE)
• Nanotime calculates the total cap of every node
– Reads Calibre SPEF file
– Adds gate, diffusion, and wire caps
• Qcs_process_cap_rpt.pl
– Converts Nanotime report to an easy to use column based
text file format
• For nodes, such as bitlines, that do not have a full rail
swing, the caps can be scaled
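The cap scaling for reduced-swing nodes could look like the sketch below. The slides do not give the scaling rule; this assumes a common first-order choice, the ratio of swing voltage to supply voltage. Node names, cap values, and the 0.25 swing fraction are hypothetical.

```python
def scale_caps(node_caps, swing_fraction):
    """Scale the cap of each reduced-swing node by its swing fraction.

    swing_fraction maps node name -> assumed Vswing / Vdd; nodes that are
    not listed are treated as full rail (factor 1.0).
    """
    return {
        node: cap * swing_fraction.get(node, 1.0)
        for node, cap in node_caps.items()
    }


# Hypothetical caps, in whatever unit the cap report uses.
caps = {"clk": 0.100, "array/bitline0": 0.200}
swing = {"array/bitline0": 0.25}  # assumed: bitline swings 25% of the rail

scaled = scale_caps(caps, swing)
print(scaled)
```

Scaling the capacitance rather than the voltage keeps the later power calculation identical for every node.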
18
Calculate Power
• Qcs_calc_power.pl
– Combines switching activities with the capacitances to
compute the power
– Voltage and frequency are fixed
• Output is a text file with the power, activity factor,
capacitance, and name for each node
– Easily sort to determine which nodes use the most power
– Retains hierarchy → easy to filter
– Can partition to determine power on multiple supply nets
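As an illustration of what such a script computes, here is a minimal Python sketch; it is not the Qcs_calc_power.pl source. It combines per-node activity factors and capacitances using the first-order equation, sorts by power, and filters by hierarchy prefix. All node names and values are hypothetical.

```python
def node_power_report(activity, caps_farads, voltage, freq_hz):
    """Return rows of (power_w, activity_factor, cap_farads, node), highest power first."""
    rows = []
    for node, af in activity.items():
        if node not in caps_farads:
            continue  # skip nodes without an extracted capacitance
        cap = caps_farads[node]
        rows.append((af * cap * voltage ** 2 * freq_hz, af, cap, node))
    rows.sort(reverse=True)
    return rows


# Hypothetical inputs: activity factors and node caps in farads.
activity = {"clk": 1.0, "xblock/lclk": 0.5, "xblock/data0": 0.15}
caps = {"clk": 94e-15, "xblock/lclk": 127e-15, "xblock/data0": 20e-15}

rows = node_power_report(activity, caps, voltage=1.0, freq_hz=500e6)
total_w = sum(row[0] for row in rows)

# Hierarchical names make it easy to filter for one block.
xblock_rows = [row for row in rows if row[3].startswith("xblock/")]

for power_w, af, cap, node in rows:
    print(f"{power_w:.3e} {af:.2f} {cap:.3e} {node}")
```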
19
100 Cycle Validation
[Chart: power (axis 0 to 3) for Test1 through Test6; series HSIM and ESP-CV]
20
100 Cycle Validation
• Run ESP-CV with the same 100 cycle window that is
used for HSIM
• For tests that use more than 1 mW of power,
ESP-CV is within 3% of the HSIM
• For tests that use less than 1 mW,
ESP-CV is within 0.08 mW of HSIM
• ESP-CV has good correlation to HSIM
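The two correlation criteria above can be expressed as a small check. This is a sketch: the function name is hypothetical, the 1 mW threshold is assumed to apply to the HSIM value, and the sample numbers are illustrative rather than the measured results.

```python
def correlates(hsim_mw, espcv_mw, rel_tol=0.03, abs_tol_mw=0.08, threshold_mw=1.0):
    """Apply the slide's criteria: 3% relative above 1 mW, 0.08 mW absolute otherwise."""
    diff = abs(espcv_mw - hsim_mw)
    if hsim_mw > threshold_mw:
        return diff / hsim_mw <= rel_tol
    return diff <= abs_tol_mw


# Illustrative (hypothetical) HSIM / ESP-CV pairs in mW.
print(correlates(2.50, 2.45))  # 2% apart, above 1 mW
print(correlates(0.40, 0.45))  # 0.05 mW apart, below 1 mW
print(correlates(2.00, 2.20))  # 10% apart, above 1 mW
```

Using an absolute bound for small values avoids flagging large percentage differences that amount to very little power.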
21
ESP-CV for Entire Test
versus HSIM for 100 Cycles
[Chart: power (axis 0 to 3) for Test1 through Test6; series HSIM (100 cycles) and ESP-CV (entire test)]
22
Results
• 100 cycles do not accurately model an entire test
• Test3 reported 4.7X more power using 100 cycles
compared to the entire test
• Test4 reported 55% less power using 100 cycles
compared to the entire test
• Difficult to choose a good 100 cycle window
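The two comparisons above are a ratio and a percentage difference against the full-test value. A minimal sketch follows, using normalized hypothetical powers (full test = 1.0) chosen only to reproduce figures of the same size; "4.7X more" is read here as a ratio of 4.7.

```python
def window_vs_full(window_power, full_power):
    """Return (ratio, percent difference) of a window estimate against the full test."""
    ratio = window_power / full_power
    pct_diff = (window_power - full_power) / full_power * 100.0
    return ratio, pct_diff


# Hypothetical normalized powers, not the measured data.
over_ratio, over_pct = window_vs_full(4.7, 1.0)     # window overestimates
under_ratio, under_pct = window_vs_full(0.45, 1.0)  # window underestimates
print(f"{over_ratio:.1f}X, {under_pct:.0f}%")
```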
23
Run Time Comparison
[Chart: run time in seconds (axis 0 to 10,000) for Test1 through Test6; series HSIM and ESP-CV; annotations "100 cycles" and "240,000 cycles"]
24
Run Time Comparison
• ESP-CV full test simulations
– Test3 with 49,101 cycles took 406 seconds
– Test4 with 240,510 cycles took 3,267 seconds
– Event-based simulation scales with the number of cycles
• ESP-CV 100 cycle simulations needed 21 seconds
– Not many events in 100 cycles
• HSIM needed between 1,950 seconds (Test5) and
9,468 seconds (Test2) to run 100 cycles
– Large differences in run time with fixed number of cycles
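Converting the run times above into simulated cycles per second makes the gap easier to see. The sketch below uses only the cycle counts and run times quoted on this slide, taking the Test4 figure of 3267 to be seconds; the dictionary keys are descriptive labels.

```python
# (cycles, seconds) pairs quoted on the slide.
runs = {
    "ESP-CV Test3 (full test)": (49_101, 406),
    "ESP-CV Test4 (full test)": (240_510, 3_267),  # unit assumed to be seconds
    "HSIM Test5 (100 cycles)": (100, 1_950),
    "HSIM Test2 (100 cycles)": (100, 9_468),
}

# Simulated cycles per second of run time.
rates = {name: cycles / seconds for name, (cycles, seconds) in runs.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.4f} cycles/s")
```

On these figures ESP-CV simulates on the order of 100 cycles per second, while HSIM needs tens of seconds for each cycle.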
25
IR Drop Analysis
• Compute fixed activity factor power for use in
Redhawk IR drop analysis
• Every clock node is assigned an activity factor
of 100%
• Every non-clock node is assigned an activity
factor of 15% which is 3 transitions per every 10
clock cycles
• This is a worst-case analysis used to stress
the power grid and find its weak points
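The fixed activity factor assignment can be sketched as below. How clock nodes are identified is not described on the slide, so the sketch takes them as a given set; node names, caps, voltage, and frequency are hypothetical. Note that 3 transitions per 10 cycles equals 15% only if 2 transitions per cycle counts as 100%, which matches the clock assignment.

```python
CLOCK_AF = 1.00      # every clock node: 100%
NON_CLOCK_AF = 0.15  # every other node: 3 transitions per 10 cycles


def fixed_activity(nodes, clock_nodes):
    """Assign the fixed worst-case activity factor to every node."""
    return {n: CLOCK_AF if n in clock_nodes else NON_CLOCK_AF for n in nodes}


def worst_case_power(caps_farads, clock_nodes, voltage, freq_hz):
    """Total first-order power in watts with fixed activity factors."""
    af = fixed_activity(caps_farads, clock_nodes)
    return sum(af[n] * c * voltage ** 2 * freq_hz for n, c in caps_farads.items())


# Hypothetical node caps in farads; the clock set is assumed to be known.
caps = {"clk": 100e-15, "xblock/lclk": 100e-15, "xblock/data0": 200e-15}
clocks = {"clk", "xblock/lclk"}

power_w = worst_case_power(caps, clocks, voltage=1.0, freq_hz=1e9)
print(f"{power_w * 1e3:.3f} mW")
```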
26
Conclusion
• Simulate an entire benchmark instead of trying to
guess at a subset of the benchmark
– The wrong subset led to a 4.7X overestimation of power
– Includes reset and initialization
• Fast simulation enables running more benchmarks
• ESP-CV is being used to generate power estimations
of longer benchmarks
27
Future Work
• Short circuit power modeling
– The current flow does not address this
• Leakage power modeling
– Active leakage power is not accurately modeled
• Enable other methods to calculate node capacitances
• More calibration on different circuit families
28
Thank You!
Questions
29
Backup Slides
30
Nanotime Capacitance Report
# max rise CAP
NODE : clk
C_diff    : 0.000
C_overlap : 0.004
C_gate    : 0.003
C_wire    : 0.081
C_pin     : 0.006
C_total   : 0.094
# max rise CAP
NODE : xblock/lclk
C_diff    : 0.000
C_overlap : 0.013
C_gate    : 0.012
C_wire    : 0.093
C_pin     : 0.009
C_total   : 0.127
31
Process Capacitance Report
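use strict;
use warnings;
use List::Util qw(max);

# Map each node name to the largest C_total reported for it.
my %nodeCap;
# Assumption: the cap report path is passed as the first argument.
open(my $cap_fh, '<', $ARGV[0]) or die "Cannot open cap report: $!";
while (my $line = <$cap_fh>) {
    next unless $line =~ /^NODE\s*:\s*(\S+)/;
    my $node = $1;
    # C_total is the sixth line after NODE (diff, overlap, gate, wire, pin, total).
    $line = <$cap_fh> for 1 .. 6;
    if (defined $line && $line =~ /^C_total\s*:\s*(\S+)/) {
        # A node may be reported more than once; keep the largest total.
        $nodeCap{$node} = max($1, $nodeCap{$node} // 0);
    }
}
close($cap_fh);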