Enhancing Performance of Iterative Heuristics for VLSI Netlist Partitioning
Dr. Sadiq M. Sait
Dr. Aiman El-Maleh
Mr. Raslan Al Abaji
Computer Engineering Department
King Fahd University of Petroleum & Minerals
Outline
• Introduction
• Problem Formulation
• Cost Functions
• PowerFM
• Experimental Results
• Conclusion
The VLSI Chip in 2006
Technology        0.1 µm
Transistors       200 M
Logic gates       40 M
Size              520 mm²
Clock             2 - 3.5 GHz
Chip I/O's        4,000
Wiring levels     7-8
Voltage           0.9 - 1.2 V
Power             160 W
Supply current    ~160 A

Design tradeoffs: performance, power consumption, noise immunity, area, cost, and time-to-market.
Why Do We Need Partitioning?
• It decomposes a complex system into smaller subsystems.
• Each subsystem can be designed independently, speeding up the design process (divide-and-conquer approach).
• A complex IC is decomposed into a number of functional blocks, each designed by one engineer or a team of engineers.
• The decomposition scheme has to minimize the interconnections between subsystems.
Levels of Partitioning
• System-level partitioning: system → PCBs
• Board-level partitioning: PCBs → chips
• Chip-level partitioning: chips → sub-circuits/blocks
Classification of Partitioning Algorithms
• Group Migration
  1. Kernighan-Lin
  2. Fiduccia-Mattheyses (FM)
  3. Multilevel K-way Partitioning
• Simulation-Based Iterative
  1. Simulated Annealing
  2. Simulated Evolution
  3. Tabu Search
  4. Genetic Algorithms
• Performance Driven
  1. Lawler et al.
  2. Vaishnav
  3. Choi et al.
  4. Jun'ichiro et al.
• Others
  1. Spectral
  2. Multilevel Spectral
Problem Formulation
• Objective: design a class of iterative algorithms for VLSI multi-objective partitioning that optimize power, delay, and cutset simultaneously.
• Constraint: the partitions must be balanced within a given tolerance (10%).
Cutset
• Based on the hypergraph model H = (V, E).
• c(e) = 1 if hyperedge e spans more than one block, and 0 otherwise.
• Cutset = sum of the hyperedge costs.
Example (figure): cutset = 3.
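The cutset definition can be sketched in a few lines of Python. This is a minimal illustration, assuming each hyperedge is given as a set of cell names and each cell's block is known; the netlist below is hypothetical and is not the one in the slide's figure.

```python
from typing import Dict, List, Set


def cutset(hyperedges: List[Set[str]], block_of: Dict[str, int]) -> int:
    """Sum of hyperedge costs, where c(e) = 1 if e spans more than one block."""
    total = 0
    for e in hyperedges:
        blocks_touched = {block_of[v] for v in e}
        total += 1 if len(blocks_touched) > 1 else 0  # c(e)
    return total


# Hypothetical 6-cell netlist split into blocks 0 and 1.
nets = [{"a", "b"}, {"b", "c", "d"}, {"d", "e"}, {"e", "f"}, {"a", "f"}]
blocks = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(cutset(nets, blocks))  # {b, c, d} and {a, f} are cut -> 2
```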
Delay
• Gate delay: d(v)
• Constant inter-chip wire delay d_c, with d_c \gg d(v)
• Path delay between nodes v_i and v_j: d(p_{ij}), where V(p_{ij}) is the set of nodes on path p_{ij}
• Number of cut nets along path p_{ij}: n_cut(p_{ij})
• Objective: minimize
$$d(p_{ij}) = \sum_{v \in V(p_{ij})} d(v) + d_c \cdot n_{cut}(p_{ij})$$
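The path-delay formula can be evaluated as follows. This sketch assumes a path is an ordered list of gate names and, as a labeled assumption, counts one cut wherever two consecutive gates on the path lie in different blocks; the gate names, delays, and d_c value are hypothetical.

```python
from typing import Dict, List


def path_delay(path: List[str], gate_delay: Dict[str, float],
               block_of: Dict[str, int], d_c: float) -> float:
    """d(p) = sum of d(v) over gates on p + d_c * n_cut(p)."""
    gates = sum(gate_delay[v] for v in path)
    # Assumption: a cut is counted when consecutive gates sit in different blocks.
    n_cut = sum(1 for u, v in zip(path, path[1:]) if block_of[u] != block_of[v])
    return gates + d_c * n_cut


delays = {"g1": 1.0, "g2": 2.0, "g3": 1.5}
blocks = {"g1": 0, "g2": 1, "g3": 1}
print(path_delay(["g1", "g2", "g3"], delays, blocks, d_c=10.0))  # 4.5 + 10 * 1 = 14.5
```

Because d_c is much larger than any d(v), each extra cut on a path costs more than several gates, which is why the partitioner tries to keep critical paths within one block.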
Power
The average dynamic power consumed by a CMOS logic gate i in a synchronous circuit is given by:
$$P_i^{average} = 0.5 \cdot \frac{V_{dd}^2}{T_{cycle}} \cdot C_i^{Load} \cdot N_i$$
• N_i: the number of output transitions of gate i per cycle (switching probability)
• C_i^Load: the load capacitance of gate i
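As a quick arithmetic check of the per-gate formula, the sketch below plugs in illustrative values (a 1.0 V supply, a 1 ns cycle, a 10 fF load, and 0.5 transitions per cycle). These numbers are assumptions chosen for the example, not values from the study.

```python
def gate_average_power(vdd: float, t_cycle: float, c_load: float, n_i: float) -> float:
    """P_i^average = 0.5 * Vdd^2 / T_cycle * C_i^Load * N_i (SI units in, watts out)."""
    return 0.5 * vdd ** 2 / t_cycle * c_load * n_i


# Illustrative values only: 1.0 V, 1 ns cycle, 10 fF load, 0.5 transitions/cycle.
p = gate_average_power(vdd=1.0, t_cycle=1e-9, c_load=10e-15, n_i=0.5)
print(f"{p:.3e} W")  # 2.500e-06 W
```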
Power
$$C_i^{Load} = C_i^{basic} + C_i^{extra}$$
• C_i^basic: load capacitance driven by cell i before partitioning
• C_i^extra: additional load due to off-chip capacitance (cut net)
Total power dissipation of a circuit:
$$P = \frac{V_{dd}^2}{2\,T_{cycle}} \sum_i \left(C_i^{basic} + C_i^{extra}\right) N_i$$
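The total-power sum can be sketched as below. It assumes each gate is described by a hypothetical tuple (C_i^basic, N_i, drives_cut_net), that C_i^extra is nonzero only for gates whose output net is cut, and that one shared C_i^extra value applies to every cut net, as the following slide assumes.

```python
from typing import List, Tuple


def total_power(vdd: float, t_cycle: float,
                gates: List[Tuple[float, float, bool]], c_extra: float) -> float:
    """P = Vdd^2 / (2 T_cycle) * sum_i (C_i^basic + C_i^extra) * N_i.

    Each gate is (C_i^basic, N_i, drives_cut_net). C_i^extra is added only when
    the gate's output net is cut; a single c_extra is assumed for all cut nets.
    """
    weighted = 0.0
    for c_basic, n_i, drives_cut_net in gates:
        c_load = c_basic + (c_extra if drives_cut_net else 0.0)
        weighted += c_load * n_i
    return vdd ** 2 / (2.0 * t_cycle) * weighted


# Normalized, hypothetical values: the second gate drives a cut net.
gates = [(1.0, 0.5, False), (1.0, 0.2, True)]
print(total_power(vdd=1.0, t_cycle=1.0, gates=gates, c_extra=10.0))
```

Only the C_i^extra terms depend on the partition, which is what motivates the simplified objective on the next slide.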
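Power
• C_i^extra \gg C_i^basic
• C_i^extra can be assumed identical for all nets.
• Since C_i^basic does not depend on the partition and C_i^extra is the same for every cut net, minimizing P reduces to minimizing the switching of the gates that drive cut nets.
• Objective: minimize
$$\sum_{i \in V_v} N_i$$
where V_v is the set of visible gates, i.e., gates driving a load outside their partition.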
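The simplified power objective only needs each gate's fanout and the block assignment. In this sketch the fanout map, block assignment, and N_i values are hypothetical; a gate counts as visible when at least one of its sinks lies in a different block.

```python
from typing import Dict, List


def visible_switching(fanout: Dict[str, List[str]], block_of: Dict[str, int],
                      n: Dict[str, float]) -> float:
    """Sum of N_i over visible gates (gates driving a sink in another block)."""
    total = 0.0
    for gate, sinks in fanout.items():
        if any(block_of[s] != block_of[gate] for s in sinks):
            total += n[gate]
    return total


fanout = {"a": ["b", "d"], "b": ["c"], "c": ["f"], "d": ["e"]}
block_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
n = {"a": 0.3, "b": 0.6, "c": 0.2, "d": 0.9}
print(visible_switching(fanout, block_of, n))  # gates a and c are visible: 0.3 + 0.2
```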
Balance
As a constraint, balance requires the numbers of cells in Block 1 and Block 2 to differ by no more than the allowed tolerance (10%).
However, balance as a hard constraint is not appealing because it may prohibit many good moves.
Objective: minimize |Cells(Block1) − Cells(Block2)|
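Treated as an objective rather than a constraint, balance is simply the absolute difference in cell counts. This minimal sketch assumes a two-way partition whose blocks are labeled 1 and 2; a move that worsens this value is still allowed and is penalized through the cost instead of being rejected.

```python
from typing import Dict


def imbalance(block_of: Dict[str, int]) -> int:
    """Balance objective |Cells(Block1) - Cells(Block2)| for blocks labeled 1 and 2."""
    n1 = sum(1 for b in block_of.values() if b == 1)
    n2 = sum(1 for b in block_of.values() if b == 2)
    return abs(n1 - n2)


print(imbalance({"a": 1, "b": 1, "c": 1, "d": 2}))  # 3 cells vs 1 cell -> 2
```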
Fuzzy Cost Function
• A good partitioning can be described by the following fuzzy rule:
IF a solution has a small cutset AND low power AND short delay AND good balance
THEN it is a good solution.
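Fuzzy Cost Function
The above rule is translated into an AND-like OWA (ordered weighted averaging) operator:
$$\mu(x) = \beta \cdot \min\left(\mu_C, \mu_P, \mu_D, \mu_B\right) + (1 - \beta) \cdot \frac{1}{4}\left(\mu_C + \mu_P + \mu_D + \mu_B\right)$$
• μ(x) represents the total fuzzy fitness of solution x; the aim is to maximize this fitness.
• μ_C, μ_P, μ_D, μ_B are the cutset, power, delay, and balance fitness values, respectively.
• β ∈ [0, 1] is a constant that sets how strongly the min (pure AND) term dominates the average.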
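The AND-like OWA combination can be sketched directly from the formula. The membership values and β = 0.7 below are illustrative assumptions, not parameters reported in the study.

```python
from typing import Sequence


def and_like_owa(memberships: Sequence[float], beta: float) -> float:
    """mu(x) = beta * min(mu_i) + (1 - beta) * mean(mu_i)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    mean = sum(memberships) / len(memberships)
    return beta * min(memberships) + (1.0 - beta) * mean


# (mu_C, mu_P, mu_D, mu_B) for a hypothetical solution.
mu = [0.8, 0.6, 0.9, 1.0]
print(and_like_owa(mu, beta=0.7))  # 0.7 * 0.6 + 0.3 * 0.825 = 0.6675
```

With β = 1 the fitness is the pure minimum, so the worst objective alone decides; lowering β lets good values in the other objectives partly compensate.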
Membership Functions
• O_i and C_i are the lower bound and the actual cost of objective i.
• μ_i(x) is the membership of solution x in the fuzzy set "good i".
• g_i is the relative acceptance limit for objective i.
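The membership-function plot itself did not survive, so the shape below is an assumption: the common piecewise-linear form in which μ_i is 1 when C_i ≤ O_i, falls linearly, and reaches 0 at C_i = g_i · O_i. The bound and acceptance limit in the example are hypothetical.

```python
def membership(cost: float, lower_bound: float, g: float) -> float:
    """Assumed piecewise-linear mu_i: 1 up to O_i, linear down to 0 at g_i * O_i."""
    if g <= 1.0 or lower_bound <= 0.0:
        raise ValueError("expects g_i > 1 and O_i > 0")
    upper = g * lower_bound
    if cost <= lower_bound:
        return 1.0
    if cost >= upper:
        return 0.0
    return (upper - cost) / (upper - lower_bound)


# Hypothetical objective: O_i = 100, g_i = 1.5, so costs above 150 are unacceptable.
print(membership(125.0, lower_bound=100.0, g=1.5))  # halfway between 100 and 150
```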
PowerFM Algorithm
Start with a balanced partition P = {X, Y}.
Repeat
    For i = 1 to n:
        Choose a free cell b ∈ X ∪ Y such that moving b to the other side gives the highest power gain, Pgain(b), and moving b preserves the balance of P.
        Move and lock b.
        Let g_i = Pgain(b).
    Find k such that G = g_1 + g_2 + ... + g_k is maximized, and move the k cells to their complementary partitions.
Until G = 0
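The pass structure above can be sketched as runnable Python. This is a simplified illustration under stated assumptions, not the authors' implementation. It assumes a two-way partition with sides 0 and 1, and each net is given as (cells, S_j), where S_j is the net's switching probability. A net is treated as cut-critical (X_i) when moving the cell would uncut it, and as uncut-critical (U_i) when moving the cell would cut it. Balance is enforced with a hypothetical `max_imbalance` limit on the cell-count difference. Gains are recomputed from scratch at each step instead of using FM's bucket structure, so this sketch does not reproduce FM's linear-time behavior.

```python
from typing import Dict, List, Tuple

Net = Tuple[List[str], float]  # (cells on the net, switching probability S_j)


def power_gain(cell: str, side: Dict[str, int],
               nets_of: Dict[str, List[int]], nets: List[Net]) -> float:
    """Pgain(cell): S_j of nets the move would uncut (X_i) minus S_j of nets it would cut (U_i)."""
    gain = 0.0
    for j in nets_of[cell]:
        members, s_j = nets[j]
        same_side = sum(1 for c in members if side[c] == side[cell])
        if same_side == len(members):
            if len(members) > 1:   # uncut net that becomes cut: j in U_i
                gain -= s_j
        elif same_side == 1:       # cut net whose last cell on this side leaves: j in X_i
            gain += s_j
    return gain


def power_fm_pass(cells: List[str], nets: List[Net], side: Dict[str, int],
                  max_imbalance: int) -> Tuple[Dict[str, int], float]:
    """One pass: tentatively move and lock cells, then keep the best prefix of moves."""
    nets_of: Dict[str, List[int]] = {c: [] for c in cells}
    for j, (members, _) in enumerate(nets):
        for c in members:
            nets_of[c].append(j)
    side = dict(side)
    locked, moved, gains = set(), [], []
    for _ in range(len(cells)):
        diff = sum(1 if side[c] == 0 else -1 for c in cells)  # |X| - |Y|
        best_cell, best_gain = None, float("-inf")
        for c in cells:
            if c in locked:
                continue
            new_diff = diff - 2 if side[c] == 0 else diff + 2
            if abs(new_diff) > max_imbalance:  # move would break balance
                continue
            g = power_gain(c, side, nets_of, nets)
            if g > best_gain:
                best_cell, best_gain = c, g
        if best_cell is None:              # no free cell can move legally
            break
        side[best_cell] ^= 1               # move and lock b
        locked.add(best_cell)
        moved.append(best_cell)
        gains.append(best_gain)
    # Find k maximizing G = g1 + ... + gk (k = 0 keeps no moves).
    best_k, best_total, running = 0, 0.0, 0.0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_total + 1e-12:   # tolerance guards against float round-off
            best_k, best_total = k, running
    for c in moved[best_k:]:               # undo moves beyond the best prefix
        side[c] ^= 1
    return side, best_total


def power_fm(cells: List[str], nets: List[Net], side: Dict[str, int],
             max_imbalance: int = 2) -> Dict[str, int]:
    """Repeat passes until the best partial sum G is 0."""
    while True:
        side, total = power_fm_pass(cells, nets, side, max_imbalance)
        if total == 0.0:
            return side


# Hypothetical 4-cell example: the start cuts both high-switching nets.
cells = ["a", "b", "c", "d"]
nets = [(["a", "b"], 0.5), (["c", "d"], 0.5), (["a", "c"], 0.1)]
start = {"a": 0, "b": 1, "c": 0, "d": 1}
print(power_fm(cells, nets, start))  # a, b end on one side; c, d on the other
```

In this example, the first pass keeps two moves with a total gain of 0.9, leaving only the low-switching net {a, c} cut. The second pass finds no positive partial sum, so the loop stops.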
An Example
[Figure: a PowerFM pass on cells a to f; each step moves and locks one cell, recording gains g1 through g6.]
If G = g1 + g2 + g3 + g4 is the largest partial sum, only the first four moves are kept, and the final partition after this pass is:
[Figure: final partition of cells a to f after the pass.]
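Choosing k is a maximum-prefix-sum search over the recorded gains. The gain values below are hypothetical integers chosen so that, as in the example, the best prefix ends at g4.

```python
from itertools import accumulate
from typing import List, Tuple


def best_prefix(gains: List[int]) -> Tuple[int, int]:
    """Return (k, G) where G = g1 + ... + gk is the largest partial sum (k = 0 if none is positive)."""
    best_k, best_g = 0, 0
    for k, g in enumerate(accumulate(gains), start=1):
        if g > best_g:
            best_k, best_g = k, g
    return best_k, best_g


# Hypothetical gains g1..g6; partial sums are 3, 5, 4, 7, 3, 0.
print(best_prefix([3, 2, -1, 3, -4, -3]))  # (4, 7): keep the first four moves
```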
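Power Gain Calculation
$$\mathrm{Pgain}(i) = \sum_{j=1}^{K} S_j\,\delta_j(X_i) - \sum_{j=1}^{K} S_j\,\delta_j(U_i)$$
• X_i: the set of cut critical nets of cell i.
• U_i: the set of uncut critical nets of cell i.
• S_j: the switching probability of net j; δ_j(·) is 1 if net j belongs to the given set and 0 otherwise.
Examples (from the slide's figure):
Pgain(7) = 0 − (0.3 + 0.4) = −0.7
Pgain(1) = 0.1 − 0 = 0.1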
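The gain formula can be written with an explicit indicator over all K nets. In this sketch, the net names n1 to n3 and their assignment to X_i and U_i are hypothetical. They were chosen only to reproduce the two worked examples, whose switching values (0.1, 0.3, 0.4) come from the slide.

```python
from typing import Dict, Set


def pgain(S: Dict[str, float], X_i: Set[str], U_i: Set[str]) -> float:
    """Pgain(i) = sum_j S_j * delta_j(X_i) - sum_j S_j * delta_j(U_i) over all K nets."""
    gained = sum(s for net, s in S.items() if net in X_i)  # cut critical nets that become uncut
    lost = sum(s for net, s in S.items() if net in U_i)    # uncut critical nets that become cut
    return gained - lost


S = {"n1": 0.1, "n2": 0.3, "n3": 0.4}  # hypothetical net names
print(pgain(S, X_i=set(), U_i={"n2", "n3"}))  # cell 7: 0 - (0.3 + 0.4), about -0.7
print(pgain(S, X_i={"n1"}, U_i=set()))        # cell 1: 0.1 - 0 = 0.1
```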
Experimental Results
ISCAS-85 and ISCAS-89 benchmark circuits
PowerFM vs. SimE for Power
For larger circuits, the performance of PowerFM degrades relative to SimE.
GA Starting from PowerFM vs. Random Start
(D = delay, C = cutset, P = power)

           GA Random Start          GA Start From PowerFM
Circuit    D      C      P          D      C      P
s298       233    19     1013       191    10     921
s386       356    36     1529       345    31     1401
s641       1043   45     2355       861    43     2343
s832       444    45     3034       441    42     3032
s953       526    96     2916       465    89     3012
s1196      396    123    5443       390    86     4921
s1238      475    127    5713       461    91     5702
s1488      571    104    5648       541    83     5248
s1494      614    102    5474       601    97     5123
s2081      302    26     787        260    15     740
s3330      571    299    10358      435    203    9296
s5378      587    573    18437      442    423    15356
s9234      1313   1090   38149      856    375    28305
s13207     1399   1683   45611      951    750    39620
s15850     1820   2183   51747      1350   851    43680
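The GA comparison can be checked directly from the table. This sketch transcribes the GA rows and counts, per objective, the circuits where the PowerFM start gives a lower value. It also computes the mean percentage reduction. It assumes the extracted values were laid out as D, C, P row by row, as in the reconstructed table.

```python
# (circuit, D, C, P with random start, D, C, P starting from PowerFM)
GA_ROWS = [
    ("s298", 233, 19, 1013, 191, 10, 921),
    ("s386", 356, 36, 1529, 345, 31, 1401),
    ("s641", 1043, 45, 2355, 861, 43, 2343),
    ("s832", 444, 45, 3034, 441, 42, 3032),
    ("s953", 526, 96, 2916, 465, 89, 3012),
    ("s1196", 396, 123, 5443, 390, 86, 4921),
    ("s1238", 475, 127, 5713, 461, 91, 5702),
    ("s1488", 571, 104, 5648, 541, 83, 5248),
    ("s1494", 614, 102, 5474, 601, 97, 5123),
    ("s2081", 302, 26, 787, 260, 15, 740),
    ("s3330", 571, 299, 10358, 435, 203, 9296),
    ("s5378", 587, 573, 18437, 442, 423, 15356),
    ("s9234", 1313, 1090, 38149, 856, 375, 28305),
    ("s13207", 1399, 1683, 45611, 951, 750, 39620),
    ("s15850", 1820, 2183, 51747, 1350, 851, 43680),
]


def compare(rows):
    """Per objective: (circuits improved by the PowerFM start, mean % reduction)."""
    summary = {}
    for k, name in enumerate(("D", "C", "P")):
        wins = sum(1 for r in rows if r[4 + k] < r[1 + k])
        mean_red = sum((r[1 + k] - r[4 + k]) / r[1 + k] for r in rows) / len(rows) * 100
        summary[name] = (wins, round(mean_red, 1))
    return summary


for objective, (wins, pct) in compare(GA_ROWS).items():
    print(f"{objective}: lower in {wins}/{len(GA_ROWS)} circuits, mean reduction {pct}%")
```

Delay and cutset improve on every circuit. Power improves on all but s953.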
TS Starting from PowerFM vs. Random Start
(D = delay, C = cutset, P = power)

           TS Random Start          TS Start From PowerFM
Circuit    D      C      P          D      C      P
s298       197    24     926        189    10     849
s386       386    30     1426       333    27     1264
s641       889    59     2281       844    48     2476
s832       446    50     2731       431    40     3135
s953       466    99     2518       430    85     2999
s1196      301    106    4920       335    77     4823
s1238      408    79     4597       401    74     5190
s1488      528    98     5529       521    94     6005
s1494      585    101    5339       534    95     5058
s2081      225    17     770        244    12     704
s3330      533    295    10298      419    257    9288
s5378      590    430    16527      432    400    15319
s9234      1052   918    34055      835    705    31837
s13207     843    1332   41114      823    1310   40235
s15850     1411   1671   47480      1210   1332   45320
Conclusion
• Proposed PowerFM, a modification of the FM algorithm that targets low power.
• PowerFM results are comparable to those of SimE, with a faster runtime.
• Investigated the use of PowerFM as a starting solution for the iterative algorithms GA and TS.
• GA performed significantly better when starting from PowerFM.