Simulated Evolution Algorithm for Multi- Objective VLSI Netlist Bi-Partitioning Dhahran, Saudi Arabia

Download Report

Transcript Simulated Evolution Algorithm for Multi- Objective VLSI Netlist Bi-Partitioning Dhahran, Saudi Arabia

Simulated Evolution Algorithm for Multi Objective VLSI Netlist Bi-Partitioning

Sadiq M. Sait,, Aiman El-Maleh, Raslan Al Abaji King Fahd University of Petroleum & Minerals Dhahran, Saudi Arabia

27 May 2003, IEEE ISCAS, Bangkok, Thailand 1

Outline • Introduction • Problem Formulation • Cost Functions • Proposed Approaches • Experimental results • Conclusion

2

VLSI Technology Trends

Design Characteristics 0.06M

2MHz 6um 0.13M

12MHz 1.5um 1.2M

50MHz 0.8um 3.3M

200MHz 0.6um Top-Down Design, Emulation 7.5M

333MHz 0.25um Cycle-based simulation, Formal Verification SPICE Simulation CAE Systems, Silicon compilation HDLs, Synthesis Key CAD Capabilities

The challenges to sustain such a fast growth to achieve giga-scale integration have shifted in a large degree, from the process of manufacturing technologies to the design technology .

3

The VLSI Chip in 2006

Technology Transistors Logic gates Size Clock Chip I/O’s Wiring levels Voltage 0.1 um 200 M 40 M 520 mm 7 - 8 0.9 - 1.2

2 2 - 3.5 GHz 4,000 Power 160 Watts Supply current ~160 Amps Performance Power consumption Noise immunity Area Cost Time-to-market Tradeoffs!!!

4

VLSI Design Cycle

VLSI design process is carried out at a number of levels.

1. System Specification 2. Functional Design 3. Logic Design 4. Circuit Design

5. Physical Design

6. Design Verification 7. Fabrication 8. Packaging Testing and Debugging 5

Physical Design

The physical design cycle consists of:

1. Partitioning

2. Floorplanning and Placement 3. Routing 4. Compaction Physical design converts a circuit description (behavioral/structural), into a geometric description. This description is used to manufacture a chip. 6

Why we need Partitioning ?

• Decomposition of a complex system into smaller subsystems.

• Each subsystem can be designed independently speeding up the design process ( divide-and conquer-approach ).

• Dividing a complex IC into a number of functional blocks, each of them designed by one or a team of engineers.

• The partitioning scheme has to minimize the interconnections between subsystems.

7

Levels of Partitioning

System Level Partitioning Board Level Partitioning Chip Level Partitioning System PCBs Chips Subcircuits / Blocks 8

Classification of Partitioning Algorithms

Partitioning Algorithms Group Migration 1.

2.

3.

Kernighan-Lin Fiduccia Mattheyeses (FM) Multilevel K-way Partitioning Iterative Heuristics 1.

2.

3.

4.

Simulated annealing Simulated evolution Tabu Search Genetic 1.

2.

3.

4.

Performance Driven Lawler et al.

1.

Vaishnav Choi et al.

Jun’ichiro et al.

2.

Others Spectral Multilevel Spectral

9

Related previous Works

1999 Two low power oriented techniques based on simulated annealing (SA) algorithm by choi et al.

1969 A bottom-up approach for delay optimization (clustering) was proposed by Lawler et al.

1998 1999 A circuit partitioning algorithm under path delay constraint is proposed by jun’ichiro et al. The proposed algorithm consists of the clustering and iterative improvement phases.

Enumerative partitioning algorithm targeting low power is proposed in Vaishnav et al. Enumerates alternate partitionings and selects a partitioning that has the same delay but less power dissipation . (not feasible for huge circuits.) 10

Motivation

Need for Power optimization • Portable devices • Power consumption is a hindrance in further integration • Increasing clock frequency Need for delay optimization • In current sub micron design wire delay tend to dominate gate delay . • Larger die size imply long on-chip global routes, which affect performance • Optimizing delay due to off-chip capacitance 11

Objective

• Design a class of iterative algorithms for VLSI multi-objective partitioning.

• Explore partitioning from a wider angle and consider circuit delay, power dissipation and interconnect in the same time, under a given balance constraint 12

Problem formulation

Objectives • Power cost is optimized • Delay cost is optimized • Cutset cost is optimized Constraint • Balanced partitions to a certain tolerance degree (10%) 13

Problem formulation

• the circuit is modeled as a hypergraph H(V,E), where V

={v 1 ,v 2 ,v 3 ,… v n }

is a set of modules (cells).

• And E

= {e 1 , e 2 , e 3 ,… e k }

is a set of hyperedges. Being the set of signal nets , each net is a subset of V containing the modules that the net connects.

• A two-way partitioning of a set of nodes V is to determine two subsets and V A  V B =  V A and V B such that V A U V B = V 14

Cutset

• Based on hypergraph model H = (V, E) • Cost 1: c(e) = 1 if e spans more than 1 block • Cutset = sum of hyperedge costs • Efficient gain computation and update

cutset = 3

15

SE1 Metal 1 Metal 2

Delay Model

C2 C3 C1 C4 C5 C6 CoffChip C7 SE2

path

: SE 1

C 1

C 4

C 5

SE 2 .

Delay  = CD SE1 + CD C1 + CD C4 + CD C5 + CD SE2 CD C1 = BD C1 + LF C1 * ( Coffchip + CINP C2 + CINP C3 + CINP C4 ) 16

Delay

Delay(Pi) =

cell

Pi

Delay

(

cell

)  

net

Pi

Delay

(

net

)

Objective

:

Max Pi

P

Delay

(

Pi

) 

Pi: is any path Between 2 cells or nodes P: set of all paths of the circuit.

17

Power

The average dynamic power consumed by CMOS logic gate in a synchronous circuit is given by:

P i average

 0 .

5 

V

2

dd T cycle

C i Load

N i

Ni is the number of output gate transition per cycle (Switching Probability)

C i Load

: is the load capacitance 18

Power

C i Load

C i basic

C i extra C i basic

: Load Capacitances driven by a cell before Partitioning

C i extra

: Additional load due to off chip capacitance. (cut net) Total Power dissipation of a Circuit:

P

   

V dd

2

T cycle

 

i

  

C i basic

C i extra

 

N i

19

Power

C i extra



C i basic C i extra

: Can be assumed identical for all nets

objective

:

i

  

v

N

i

v

:Set of Visible gates Driving a load outside the partition.

20

Unifying Objectives, How ?

Weighted Sum Approach

cos

t

W d

*  

DelayCosto fCircuit Maxdelay

  

W c CutsetCost MaxCutest

W p

    

Power MaxPower

    

1. Problems in choosing weights.

2. Need to tune for every circuit.

21

Fuzzy logic for cost function

•Imprecise values of the objectives – best represented by linguistic terms that are basis of fuzzy algebra • Conflicting objectives • Operators for aggregating function 22

Fuzzy logic for Multi-objective function

1. The cost to membership mapping 2. Linguistic fuzzy rule for combining the membership values in an aggregating function 3. Translation of the linguistic rule in form of 4.

appropriate fuzzy operators And-like operators: Min operator And-like OWA:  =   = min (  1,  2) * min (  1,  2) + ½ (1  ) (  1+  2) Or-like operators Where  Max operator Or-like OWA:  =   2) * max (  1,   = max (  1,  2) 2) + ½ (1 is a constant in range [0,1]  ) (  1+ 23

Membership functions

Where

O i

and

C i

are lower bound and actual cost of objective “i” i (x) is the membership of solution x in set “good ‘i’ ” g i is the relative acceptance limit for each objective .

24

Fuzzy linguistic rule

• A good partitioning can be described by the following fuzzy rule

IF

solution has small cutset

AND

low power

AND

short delay

AND

good Balance.

THEN

it is a good solution

25

Fuzzy cost function

The above rule is translated to AND-like OWA

 (

x

)    min  

C

, 

P

, 

D

, 

B

  1     1 4  

C

 

P

 

D

 

B

  (

x

) Represent the total Fuzzy fitness of the solution, our aim is to Maximize this fitness 

C

, 

P

, 

D

, 

B

Respectively (Cutset, Power, Delay, Balance) Fitness 26

Simulated Evolution

Algorithm Simulated evolution

Begin

Start with an initial feasible Partition S

Repeat

Evaluation :

Evaluate the G i (goodness) of all modules

Selection :

For

each V i (cell)

DO begin

if Random Rm > G i then select the cell

End For

Allocation

: For

each selected V i (cell)

DO begin

Move the cell to destination Block.

End For Until

Return best solution.

End

Stopping criteria is satisfied.

27

Cut goodness

gc gc

5

i

d i

d i

 3  2 3

w i

 0 .

33 d i : set of all nets, Connected and not cut.

w i : set of all nets, Connected and cut.

28

Power Goodness

gp i

j k

  1

S j

j

V I

 

j k

  1

S j

j

U i

j k

  1

S j

j

V I

V i is the set of all nets connected and Ui is the set of all nets connected and cut.

gp

5  0 .

7  0 .

4 0 .

7  0 .

428 29

Delay Goodness

gd i

K i

L i K i

Ki: is the set of cells in all paths passing by cell i.

Li: is the set of cells in all paths passing by cell i and are not in same block as i.

gd

5  5  2 5  0 .

6

gd

4  5  3 5  0 .

4 30

Final selection Fuzzy rule

IF

Cell I is near its optimal Cut-set goodness as compared to other cells

AND

near its optimal power goodness compared to other cells

AND

near its optimal net delay goodness as compared to other cells

OR

T (max) (i) is much smaller than T max

THEN

it has a high goodness.

31

Fuzzy Goodness

T max :delay of most critical path in current iteration.

T (max) (i) :delay of longest path traversing cell i.

X path = T max / T (max) (i) Fuzzy Goodness:

g i

 

i

(

x

)    min

iC

, 

iP

, 

iD

   

3 

iC

iC

, 

iP

, 

iD

 

iP

 

iD

Respectively (Cutset, Power, Delay ) goodness.

32

Experimental Results

ISCAS 85-89 Benchmark Circuits 33

SimE versus Tabu Search & GA against time Circuit S13207 34

Experimental Results SimE versus Ts versus GA SimE results were better than TS and GA, with faster execution time.

35

Conclusion

Re-write this

• The present work successfully addressed the important issue of reducing power and delay consumption in VLSI circuits.

• The present work successfully formulate and provide solutions to the problem of multi objective VLSI partitioning • TS partitioning algorithm outperformed GA in terms of quality of solution and execution time 36

Thank you

37