Simulated Evolution Algorithm for Multiobjective VLSI Netlist Bi-Partitioning
Sadiq M. Sait, Aiman El-Maleh, Raslan Al-Abaji
King Fahd University of Petroleum & Minerals
Dhahran, Saudi Arabia
27th May, ISCAS-2003, Bangkok, Thailand
Outline
• Introduction
• Problem Formulation
• Cost Functions
• Proposed Approach
• Experimental Results
• Conclusion
2
VLSI Technology Trends
Design Characteristics        Key CAD Capabilities
0.06M, 2 MHz, 6 um            SPICE Simulation
0.13M, 12 MHz, 1.5 um         CAE Systems, Silicon Compilation
1.2M, 50 MHz, 0.8 um          HDLs, Synthesis
3.3M, 200 MHz, 0.6 um         Top-Down Design, Emulation
7.5M, 333 MHz, 0.25 um        Cycle-Based Simulation, Formal Verification
The challenges of sustaining such fast growth toward giga-scale
integration have shifted, to a large degree, from process and
manufacturing technology to design technology. New issues have
also emerged.
3
VLSI Design Cycle
The VLSI design process comprises a number of levels:
1. System Specification
2. Functional Design
3. Logic Design
4. Circuit Design
5. Physical Design
6. Design Verification
7. Fabrication
8. Packaging, Testing and Debugging
4
Physical Design
What is Physical Design? A process that translates a
structural (netlist) description into a geometric description
that is used to manufacture a chip.
The physical design cycle consists of:
1. Partitioning
2. Floorplanning and Placement
3. Routing
4. Compaction
Why do we need partitioning?
5
Levels of Partitioning
System → (System Level Partitioning) → PCBs → (Board Level Partitioning) → Chips → (Chip Level Partitioning) → Subcircuits/Blocks
6
Classification of Partitioning Algorithms
Partitioning Algorithms
• Group Migration
  1. Kernighan-Lin
  2. Fiduccia-Mattheyses (FM)
  3. Multilevel K-way Partitioning
• Iterative Heuristics
  1. Simulated Annealing
  2. Simulated Evolution
  3. Tabu Search
  4. Genetic Algorithm
• Performance Driven
  1. Lawler et al.
  2. Vaishnav
  3. Choi et al.
  4. Jun’ichiro et al.
• Others
  1. Spectral
  2. Multilevel Spectral
7
Related Previous Work
1969: Lawler et al. proposed a bottom-up approach (clustering) for
delay optimization.
1998: Jun’ichiro et al. proposed a circuit partitioning algorithm under
a path delay constraint. The algorithm consists of a clustering phase
and an iterative improvement phase.
1999: Choi et al. proposed two low-power-oriented techniques based on
the simulated annealing (SA) algorithm.
1999: Vaishnav et al. proposed an enumerative partitioning algorithm
targeting low power. It enumerates alternate partitionings and selects
one that has the same delay but lower power dissipation.
8
Motivation & Objective
Need for power optimization:
• Portable devices
• Power consumption is a hindrance to further integration
• Increasing clock frequency
Need for delay optimization:
• In current submicron designs, wire delays tend to dominate gate delays
• Larger die sizes imply long on-chip wires, which affect performance
• Delay due to off-chip capacitance
Objectives: power, delay, and cutset are optimized
Constraint: balanced partitions (with some tolerance)
9
Problem formulation

• The circuit is modeled as a hypergraph $H(V, E)$, where
$V = \{v_1, v_2, v_3, \ldots, v_n\}$ is a set of modules (cells).
• $E = \{e_1, e_2, e_3, \ldots, e_k\}$ is a set of hyperedges representing
the signal nets. Each net is a subset of $V$ containing the
modules that the net connects.
• A 2-way partitioning of a set of nodes $V$ determines
subsets $V_A$ and $V_B$ such that $V_A \cup V_B = V$ and $V_A \cap V_B = \emptyset$.
10
Cutset
• Based on the hypergraph model $H = (V, E)$
• Cost: $c(e) = 1$ if $e$ spans more than 1 block, and 0 otherwise
• Cutset = sum of hyperedge costs
• Efficient gain computation and update
Example from the slide figure: cutset = 3
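The cutset definition above can be sketched in a few lines of Python. This is only an illustration: the six-cell netlist, the net names, and the block labels "A" and "B" are hypothetical and do not come from the slides.

```python
def cutset_size(nets, block_of):
    """Count the hyperedges (nets) whose cells lie in more than one block.

    nets:     dict mapping a net name to the set of cells it connects
    block_of: dict mapping each cell to its block label
    """
    return sum(
        1
        for cells in nets.values()
        if len({block_of[c] for c in cells}) > 1  # c(e) = 1 when e is cut
    )


# Hypothetical netlist (illustrative names only).
nets = {
    "e1": {"v1", "v2"},
    "e2": {"v2", "v3", "v4"},
    "e3": {"v4", "v5"},
    "e4": {"v5", "v6"},
    "e5": {"v1", "v6"},
}
# A 2-way partition: every cell is in exactly one of the two blocks.
block_of = {"v1": "A", "v2": "A", "v3": "A", "v4": "B", "v5": "B", "v6": "B"}

print(cutset_size(nets, block_of))  # nets e2 and e5 are cut
```

Storing the partition as a cell-to-block mapping guarantees that the two blocks are disjoint and together cover all cells.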
11
Delay
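[Figure: cells C1 to C7 and elements SE1, SE2 placed on both sides of a cut line, with Metal 1 and Metal 2 wiring and the off-chip capacitance CoffChip]
$Delay(P_i) = \sum_{cell \in P_i} Delay(cell) + \sum_{net \in P_i} Delay(net)$
Objective: $\max_{P_i \in P} Delay(P_i)$
Example path: SE1 → C1 → C4 → C5 → SE2
$Delay = CD_{SE1} + CD_{C1} + CD_{C4} + CD_{C5} + CD_{SE2}$
$CD_{C1} = BD_{C1} + LF_{C1} \cdot (C_{offchip} + CINP_{C2} + CINP_{C3} + CINP_{C4})$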
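The delay equations above can be traced with a small Python sketch. It assumes that CD is a cell delay, BD a base delay, LF a load factor, and CINP the input capacitance of a driven cell; the slide does not spell these abbreviations out, and all numeric values below are made up for illustration.

```python
def cell_delay(base_delay, load_factor, input_caps, offchip_cap=0.0):
    """CD = BD + LF * (C_offchip + sum of the driven input capacitances)."""
    return base_delay + load_factor * (offchip_cap + sum(input_caps))


def path_delay(cell_delays, net_delays=()):
    """Delay of one path: sum of its cell delays plus its net delays."""
    return sum(cell_delays) + sum(net_delays)


def max_path_delay(paths):
    """Delay of the slowest path; paths is a list of (cell_delays, net_delays)."""
    return max(path_delay(cells, nets) for cells, nets in paths)


# Hypothetical numbers: C1 drives C2, C3 and C4.
fanout_caps = [0.25, 0.25, 0.5]
cd_c1_uncut = cell_delay(1.0, 0.5, fanout_caps)                # no off-chip load
cd_c1_cut = cell_delay(1.0, 0.5, fanout_caps, offchip_cap=2.0)  # net crosses the cut

# Path SE1 -> C1 -> C4 -> C5 -> SE2 with unit delays for the other cells.
delay = path_delay([1.0, cd_c1_cut, 1.0, 1.0, 1.0])
print(cd_c1_uncut, cd_c1_cut, delay)
```

With these numbers, cutting the net driven by C1 adds the off-chip capacitance to its load and raises the path delay accordingly.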
12
Power
The average dynamic power consumed by a CMOS logic gate in a
synchronous circuit is given by:
$P_i^{average} = 0.5 \cdot \frac{V_{dd}^2}{T_{cycle}} \cdot C_i^{Load} \cdot N_i$
where $N_i$ is the number of output gate transitions per cycle (switching probability) and
$C_i^{Load} = C_i^{basic} + C_i^{extra}$
that is, load capacitance = load capacitance before partitioning + load due to off-chip capacitance.
Objective: $\sum_{i \in v} N_i$
Total power dissipation of a circuit:
$P = \sum_i P_i^{average} = 0.5 \cdot \frac{V_{dd}^2}{T_{cycle}} \sum_i (C_i^{basic} + C_i^{extra}) \cdot N_i$
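As a rough illustration of the power model above, the following Python sketch sums the per-gate average power over a list of gates. The tuple layout and the numbers are assumptions chosen only to make the arithmetic easy to follow.

```python
def gate_power(c_basic, c_extra, n_i, vdd, t_cycle):
    """Average dynamic power of one gate: 0.5 * Vdd^2 / Tcycle * C_load * N_i."""
    return 0.5 * vdd ** 2 / t_cycle * (c_basic + c_extra) * n_i


def total_power(gates, vdd, t_cycle):
    """Total power: sum of the per-gate average power.

    gates: iterable of (c_basic, c_extra, n_i) tuples, where c_extra is the
    additional off-chip load (0 for gates whose output net is not cut).
    """
    return sum(gate_power(cb, ce, n, vdd, t_cycle) for cb, ce, n in gates)


# Hypothetical values: the second gate drives a cut net, so it sees extra load.
gates = [(1.0, 0.0, 0.5), (1.0, 2.0, 0.25)]
print(total_power(gates, vdd=2.0, t_cycle=1.0))
```

Only the gates with a nonzero extra capacitance are affected by the partitioning, which is why the switching activity of those gates matters for the power objective.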
13
Unifying Objectives by Fuzzy logic
Weighted Sum Approach
$Cost = W_c \cdot \frac{CutsetCost}{MaxCutset} + W_p \cdot \frac{Power}{MaxPower} + W_d \cdot \frac{Delay\_Cost\_of\_Circuit}{MaxDelay}$
1. Problems in choosing weights
2. Need to tune for every circuit
• Imprecise values of the objectives: best represented by linguistic terms, which are the basis of fuzzy algebra
• Conflicting objectives
• Operators for the aggregating function
14
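For comparison with the fuzzy approach, the weighted-sum cost from the slide above might be computed as follows. The function signature, the weights, and the normalization bounds are illustrative assumptions, not values from the presentation.

```python
def weighted_sum_cost(cutset, power, delay,
                      max_cutset, max_power, max_delay,
                      w_c, w_p, w_d):
    """Weighted sum of the three objectives, each normalized by its maximum."""
    return (w_c * cutset / max_cutset
            + w_p * power / max_power
            + w_d * delay / max_delay)


# Hypothetical objective values and weights.
cost = weighted_sum_cost(cutset=50, power=30, delay=20,
                         max_cutset=100, max_power=60, max_delay=80,
                         w_c=0.5, w_p=0.25, w_d=0.25)
print(cost)
```

The result depends directly on the three weights, which is the tuning problem the slide points out.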
Fuzzy logic for Multi-objective function
1. The cost to membership mapping
2. Linguistic fuzzy rule for combining the membership values in
an aggregating function
3. Translation of the linguistic rule into appropriate fuzzy
operators
4. Fuzzy operators
• And-like operators: Min operator $\mu = \min(\mu_1, \mu_2)$
• And-like OWA: $\mu = \beta \cdot \min(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$
• Or-like operators: Max operator $\mu = \max(\mu_1, \mu_2)$
• Or-like OWA: $\mu = \beta \cdot \max(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$
where $\beta$ is a constant in the range [0, 1]
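The OWA operators listed above can be sketched in Python as below. The functions generalize the two-argument form on the slide to any number of memberships by using the mean in the second term; the function and parameter names are illustrative, and the constant in [0, 1] is passed as `beta`.

```python
def and_like_owa(memberships, beta):
    """And-like OWA: beta * min + (1 - beta) * mean of the memberships."""
    mean = sum(memberships) / len(memberships)
    return beta * min(memberships) + (1 - beta) * mean


def or_like_owa(memberships, beta):
    """Or-like OWA: beta * max + (1 - beta) * mean of the memberships."""
    mean = sum(memberships) / len(memberships)
    return beta * max(memberships) + (1 - beta) * mean


# Two memberships, as on the slide (values are hypothetical).
print(and_like_owa([0.5, 1.0], beta=0.5))
print(or_like_owa([0.5, 1.0], beta=0.5))
# With beta = 1 the and-like OWA reduces to the pure Min operator.
print(and_like_owa([0.5, 1.0], beta=1.0))
```

A smaller `beta` lets a good value in one objective partly compensate for a poor value in another, while `beta = 1` falls back to the strict Min or Max operator.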
15
Membership functions
Here $O_i$ and $C_i$ are the lower bound and the actual cost of objective $i$,
$\mu_i(x)$ is the membership of solution $x$ in the fuzzy set "good $i$", and $g_i$ is the
relative acceptance limit for each objective.
16
Fuzzy linguistic rule & Cost function
A good partitioning can be described by the following fuzzy
rule:
IF a solution has small cutset AND low power AND short delay AND good balance
THEN it is a good solution.
The above rule is translated to the AND-like OWA:
$\mu(x) = \beta \cdot \min(\mu_C, \mu_P, \mu_D, \mu_B) + (1 - \beta) \cdot \frac{1}{4}(\mu_C + \mu_P + \mu_D + \mu_B)$
$\mu(x)$ represents the total fuzzy fitness of the solution; our aim
is to maximize this fitness.
$\mu_C$, $\mu_P$, $\mu_D$, $\mu_B$ are the memberships for cutset, power, delay, and balance, respectively.
17
Simulated Evolution
Algorithm Simulated_Evolution
Begin
  Start with an initial feasible partition S
  Repeat
    Evaluation:
      For each cell Vi Do
        Evaluate the goodness Gi
      End For
    Selection:
      For each cell Vi Do
        If Random Rm > Gi Then select the cell
      End For
    Allocation:
      For each selected cell Vi Do
        Move the cell to its destination block
      End For
  Until the stopping criteria are satisfied
  Return the best solution
End
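The loop above can be turned into a small runnable Python sketch. To stay short it makes several simplifying assumptions that are not in the slides: goodness is the cut goodness only (connected nets minus cut nets, divided by connected nets), the destination block is simply the other block, a move is skipped if it would exceed a fixed imbalance, and the run stops after a fixed number of iterations. All names are illustrative.

```python
import random


def cutset_size(nets, block_of):
    """Number of nets whose cells lie in more than one block."""
    return sum(1 for cells in nets.values()
               if len({block_of[c] for c in cells}) > 1)


def cut_goodness(cell, nets_of, nets, block_of):
    """(connected nets - cut nets) / connected nets for one cell."""
    connected = nets_of[cell]
    if not connected:
        return 1.0
    cut = sum(1 for n in connected
              if len({block_of[c] for c in nets[n]}) > 1)
    return (len(connected) - cut) / len(connected)


def simulated_evolution(nets, start, iterations=50, max_imbalance=2, seed=0):
    rng = random.Random(seed)
    block_of = dict(start)
    nets_of = {c: [n for n, cells in nets.items() if c in cells]
               for c in block_of}
    best, best_cost = dict(block_of), cutset_size(nets, block_of)
    for _ in range(iterations):
        # Evaluation: goodness of every cell in the current partition.
        goodness = {c: cut_goodness(c, nets_of, nets, block_of)
                    for c in block_of}
        # Selection: cells with low goodness are more likely to be picked.
        selected = [c for c in block_of if rng.random() > goodness[c]]
        # Allocation: move each selected cell to the other block,
        # unless the move would violate the balance tolerance.
        for c in selected:
            target = "B" if block_of[c] == "A" else "A"
            size_a = sum(1 for b in block_of.values() if b == "A")
            new_size_a = size_a + (1 if target == "A" else -1)
            if abs(2 * new_size_a - len(block_of)) <= max_imbalance:
                block_of[c] = target
        cost = cutset_size(nets, block_of)
        if cost < best_cost:
            best, best_cost = dict(block_of), cost
    return best, best_cost


# Hypothetical netlist: two triangles of cells joined by one net.
nets = {"e1": {"a", "b"}, "e2": {"b", "c"}, "e3": {"a", "c"},
        "e4": {"d", "e"}, "e5": {"e", "f"}, "e6": {"d", "f"},
        "e7": {"c", "d"}}
start = {"a": "A", "b": "B", "c": "A", "d": "B", "e": "A", "f": "B"}
best, best_cost = simulated_evolution(nets, start)
print(cutset_size(nets, start), best_cost)
```

Because the best partition seen so far is kept separately, the returned cost is never worse than that of the initial partition, even though individual iterations may move to worse solutions.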
18
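Cut goodness
$gc_i = \frac{d_i - w_i}{d_i}$
Example: $gc_5 = \frac{3 - 2}{3} = 0.33$
$d_i$: the set of all nets connected to cell $i$ (cut or not).
$w_i$: the set of all nets connected to cell $i$ that are cut.
The formula uses the number of nets in each set.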
19
Power Goodness
$gp_i = \frac{\sum_{j=1}^{k} S_j(V_i) - \sum_{j=1}^{k} S_j(U_i)}{\sum_{j=1}^{k} S_j(V_i)}$
$V_i$ is the set of all nets connected to cell $i$,
and $U_i$ is the set of all nets
connected to cell $i$ that are cut.
Example: $gp_5 = \frac{0.7 - 0.4}{0.7} = 0.428$
20
Delay Goodness
$gd_i = \frac{K_i - L_i}{K_i}$
$K_i$: the set of cells in all paths passing
through cell $i$.
$L_i$: the set of cells in all paths passing through
cell $i$ that are not in the same block as $i$.
Examples: $gd_5 = \frac{5 - 2}{5} = 0.6$ and $gd_4 = \frac{5 - 3}{5} = 0.4$
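The three goodness measures share the same "total minus penalized, divided by total" shape, which a short Python sketch can make concrete. It assumes that S_j is the switching probability of net j, which the slides do not state explicitly; the net names, cell names, and per-net probabilities are hypothetical and were chosen only so that the totals match the slide examples.

```python
def cut_goodness(connected_nets, cut_nets):
    """(number of connected nets - number of cut nets) / connected nets."""
    return (len(connected_nets) - len(cut_nets)) / len(connected_nets)


def power_goodness(connected_nets, cut_nets, switching):
    """Same ratio, with each net weighted by its switching probability."""
    total = sum(switching[n] for n in connected_nets)
    cut = sum(switching[n] for n in cut_nets)
    return (total - cut) / total


def delay_goodness(path_cells, cells_in_other_block):
    """(cells on paths through i - those in the other block) / cells on paths."""
    return (len(path_cells) - len(cells_in_other_block)) / len(path_cells)


# Cell 5 in the slide examples: 3 connected nets, 2 of them cut.
connected = {"n1", "n2", "n3"}
cut = {"n1", "n2"}
# Assumed switching probabilities: total 0.7, cut nets contribute 0.4.
switching = {"n1": 0.1, "n2": 0.3, "n3": 0.3}

gc5 = cut_goodness(connected, cut)                # about 0.33
gp5 = power_goodness(connected, cut, switching)   # about 0.428
gd5 = delay_goodness({"c1", "c2", "c3", "c4", "c5"}, {"c1", "c2"})        # 0.6
gd4 = delay_goodness({"c1", "c2", "c3", "c4", "c5"}, {"c1", "c2", "c3"})  # 0.4
print(gc5, gp5, gd5, gd4)
```

Each measure equals 1 when nothing connected to the cell crosses the cut and drops toward 0 as more of its nets or path cells end up in the other block.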
21
Final selection Fuzzy rule
IF cell ‘i’ is
near its optimal cut-set goodness as compared to other cells
AND near its optimal power goodness as compared to other cells
AND near its optimal net delay goodness as compared to other cells
OR T(max)(i) is much smaller than Tmax
THEN it has a high goodness.
22
Experimental Results
ISCAS 85-89 Benchmark Circuits
23
SimE versus Tabu Search & GA against time
Circuit: s13207
24
Experimental Results: SimE versus TS and GA
SimE results were better than TS and GA, with faster execution time.
25
Conclusion

• The present work addressed the partitioning of
VLSI circuits with the objective of reducing power
and delay (in addition to the number of nets cut).
• Fuzzy logic was used to combine the multiple objectives.
• Iterative algorithms (GA, TS, and SimE) were
investigated and compared in terms
of solution quality and run time.
• SimE outperformed TS and GA in terms of solution quality
and execution time.
26
Thank you
27
Fuzzy Goodness
Tmax: delay of the most critical path
in the current iteration.
T(max)(i): delay of the longest path
traversing cell i.
Xpath = Tmax / T(max)(i)
$g_i = \mu_i(x) = \beta \cdot \min(\mu_{iC}, \mu_{iP}, \mu_{iD}) + (1 - \beta) \cdot \frac{1}{3}(\mu_{iC} + \mu_{iP} + \mu_{iD})$
$\mu_{iC}$, $\mu_{iP}$, $\mu_{iD}$ are the cutset, power, and delay goodness, respectively.
28