
Low-Power Design of Digital VLSI Circuits Gate-Level Power Optimization

Vishwani D. Agrawal
James J. Danaher Professor
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal

Copyright Agrawal, 2011
Lectures 11-14: Gate-level optimization

Components of Power

• Dynamic
  • Signal transitions
    • Logic activity
    • Glitches
  • Short-circuit (often neglected)
• Static
  • Leakage

Power of a Transition

[Figure: circuit model of a gate output transition, labeled $V_i$, $V_o$, $V_{DD}$, $R$, $i_{sc}$, $C_L$ and Ground]

Dynamic Power = $C_L V_{DD}^2 / 2 + P_{sc}$

Dynamic Power

• Each transition of a gate consumes $CV^2/2$.
• Methods of power saving:
  • Minimize load capacitances
    • Transistor sizing
    • Library-based gate selection
  • Reduce transitions
    • Logic design
    • Glitch reduction

Glitch Power Reduction

• Design a digital circuit for minimum transient energy consumption by eliminating hazards.

Theorem 1

• For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition.
  • Output logic state changes: one transition is necessary.
  • Output logic state unchanged: no transition is necessary.

Event Propagation

• Single lumped inertial delay modeled for each gate.
• PI transitions assumed to occur without time skew.

[Figure: example circuit with gate delays and event times along paths P1, P2 and P3]

Inertial Delay of an Inverter

[Figure: inverter waveforms $V_{in}$ and $V_{out}$ versus time, showing delays $d_{HL}$ and $d_{LH}$]

$d = \frac{d_{HL} + d_{LH}}{2}$

Multi-Input Gate

DPD: differential path delay between inputs A and B.

[Figure: gate with inputs A and B, output C and delay $d <$ DPD; the waveform at C shows a hazard or glitch]

Balanced Path Delays

[Figure: the same gate (delay $d <$ DPD) with a delay buffer inserted on an input path; the waveform at C shows no glitch]

Glitch Filtering by Inertia

[Figure: gate with inputs A and B, output C and delay $d >$ DPD; the glitch at C is filtered]

Theorem

• Given that events occur at the input of a gate, whose inertial delay is $d$, at times $t_1 \le \dots \le t_n$, the number of events at the gate output cannot exceed

$\min\left(n,\; 1 + \frac{t_n - t_1}{d}\right)$

[Figure: time axis showing input events at $t_1, t_2, t_3, \dots, t_n$, spanning an interval $t_n - t_1$]
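The bound in the theorem can be evaluated directly. The following is a minimal Python sketch, not part of the original slides; the function name `max_output_events` is hypothetical, and taking the floor of the ratio is an added assumption that reflects the event count being an integer.

```python
import math


def max_output_events(arrival_times, d):
    """Upper bound on output events for a gate with inertial delay d.

    arrival_times: times of the input events (any order).
    Implements min(n, 1 + floor((t_n - t_1) / d)); the floor is an
    assumption added here because the number of events is an integer.
    """
    if d <= 0:
        raise ValueError("inertial delay must be positive")
    if not arrival_times:
        return 0
    n = len(arrival_times)
    span = max(arrival_times) - min(arrival_times)
    return min(n, 1 + math.floor(span / d))


# Four input events spread over 6 time units, gate delay 2.
print(max_output_events([0, 1, 3, 6], 2))
# Three input events inside a window shorter than the gate delay.
print(max_output_events([0, 1, 2], 5))
```

When all input events fall within a window shorter than $d$, the bound drops to one event, which is the situation the minimum transient design condition aims for.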

Minimum Transient Design

• Minimum transient energy condition for a Boolean gate:

$|t_i - t_j| < d$

where $t_i$ and $t_j$ are arrival times of input events and $d$ is the inertial delay of the gate.

Balanced Delay Method

• All input events arrive simultaneously
• Overall circuit delay not increased
• Delay buffers may have to be inserted

[Figure: example circuit with unit gate delays and delay buffers inserted; no increase in critical path delay]

Hazard Filter Method

• Gate delay is made greater than maximum input path delay difference
• No delay buffers needed (least transient energy)
• Overall circuit delay may increase

[Figure: the same example circuit with one gate delay increased to 3 and no buffers]

Designing a Glitch-Free Circuit

• Maintain specified critical path delay.
• Glitch suppressed at all gates by
  • Path delay balancing
  • Glitch filtering by increasing inertial delay of gates or by inserting delay buffers when necessary.
• A linear program optimally combines all objectives.

[Figure: gate with delay $D$ fed by two paths with delays $d_1$ and $d_2$; condition $|d_1 - d_2| < D$]

Problem Complexity

• Number of paths in a circuit can be exponential in circuit size.
• Considering all paths through enumeration is infeasible for large circuits.
• Example: c880 has 6.96M path constraints.

Define Arrival Time Variables

• $d_i$: gate delay.
• Define two timing window variables per gate output:
  • $t_i$: earliest time of signal transition at gate $i$.
  • $T_i$: latest time of signal transition at gate $i$.
• Glitch suppression constraint: $T_i - t_i < d_i$

[Figure: gate $i$ with delay $d_i$, input windows $(t_1, T_1), \dots, (t_n, T_n)$ and output window $(t_i, T_i)$]

Reference: T. Raja, Master's Thesis, Rutgers Univ., 2002.

Linear Program

• Variables: gate and buffer delays, arrival time variables.
• Objective: minimize number of buffers.
• Subject to: overall circuit delay constraint for all input-output paths.
• Subject to: minimum transient energy condition for all multi-input gates.

An Example: Full Adder add1b

[Figure: gate-level schematic of add1b with unit gate delays]

Critical path delay = 6

Linear Program

• Gate variables: $d_4 \dots d_{12}$
• Buffer delay variables: $d_{15} \dots d_{29}$
• Window variables: $t_4 \dots t_{29}$ and $T_4 \dots T_{29}$

Multiple-Input Gate Constraints

For Gate 7:

$T_7 \ge T_5 + d_7$
$T_7 \ge T_6 + d_7$
$t_7 \le t_5 + d_7$
$t_7 \le t_6 + d_7$
$d_7 > T_7 - t_7$ (glitch suppression)
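The window constraints for a multi-input gate can be illustrated with a forward pass over a small netlist. This is a Python sketch under stated assumptions, not the AMPL model used in the lectures: the three-gate netlist, its delays, and the names `timing_windows` and `glitchy_gates` are hypothetical. It takes the tightest windows the constraints allow (earliest fanin time plus $d$, latest fanin time plus $d$) and flags gates that violate $d > T - t$.

```python
def timing_windows(gates, pi_window=(0, 0)):
    """Compute (t, T) for each gate.

    gates: dict name -> (delay, [fanin names]), listed in topological
    order. Fanin names that are not gates are treated as primary inputs
    with window pi_window (an assumption: all PIs switch at time 0).
    """
    windows = {}
    for name, (delay, fanins) in gates.items():
        ins = [windows.get(f, pi_window) for f in fanins]
        t = min(w[0] for w in ins) + delay   # earliest output transition
        T = max(w[1] for w in ins) + delay   # latest output transition
        windows[name] = (t, T)
    return windows


def glitchy_gates(gates, windows):
    """Gates whose window width is not smaller than their own delay."""
    return [g for g, (d, _) in gates.items()
            if not (windows[g][1] - windows[g][0] < d)]


# Hypothetical netlist: a, b, c are primary inputs.
gates = {
    "g5": (1, ["a", "b"]),
    "g6": (1, ["g5", "c"]),
    "g7": (3, ["g5", "g6"]),
}
windows = timing_windows(gates)
print(windows)
print(glitchy_gates(gates, windows))
```

In this example `g6` sees inputs one time unit apart with a delay of only one unit, so it can glitch, while `g7` filters the same skew because its delay is larger. The linear program treats the delays as variables and finds values that clear every such violation.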

Single-Input Gate Constraints

Buffer 19:

$T_{16} + d_{19} = T_{19}$
$t_{16} + d_{19} = t_{19}$

Critical Path Delay Constraints

$T_{11} \le$ maxdelay
$T_{12} \le$ maxdelay

maxdelay is specified.

Objective Function

• Need to minimize the number of buffers.
• Because that leads to a nonlinear objective function, we use an approximate criterion: minimize $\sum_{\text{all buffers}} (\text{buffer delay})$, i.e., minimize $d_{15} + d_{16} + \cdots + d_{29}$
• This gives a near optimum result.

AMPL Solution: maxdelay = 6

[Figure: add1b schematic annotated with the delays assigned by the linear program]

Critical path delay = 6

AMPL Solution: maxdelay = 7

[Figure: add1b schematic annotated with the delays assigned by the linear program]

Critical path delay = 7

AMPL Solution: maxdelay ≥ 11

[Figure: add1b schematic annotated with the delays assigned by the linear program]

Critical path delay = 11

ALU4: Four-Bit ALU 74181

maxdelay            7    10    12    15
Buffers inserted    5     2     1     0

Maximum Power Savings (zero-buffer design): Peak = 33%, Average = 21%

ALU4: Original and Low-Power

Benchmark Circuits

Circuit   Max-delay   No. of     Normalized Power
          (gates)     Buffers    Average    Peak
ALU4          7           5       0.80      0.68
             15           0       0.79      0.67
C880         24          62       0.68      0.54
             48          34       0.68      0.52
C6288        47         294       0.40      0.36
             94         120       0.36      0.34
c7552        43         366       0.44      0.34
             86         111       0.42      0.32

C7552 Circuit: SPICE Simulation

Power Saving: Average 58%, Peak 68%

References

• R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
• M. Berkelaar and E. Jacobs, "Using Gate Sizing to Reduce Glitch Power," Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188.
• V. D. Agrawal, "Low Power Design by Hazard Filtering," Proc. 10th Int'l Conf. VLSI Design, Jan. 1997, pp. 193-197.
• V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, "Digital Circuit Design for Minimum Transient Energy and Linear Programming Method," Proc. 12th Int'l Conf. VLSI Design, Jan. 1999, pp. 434-439.
• T. Raja, V. D. Agrawal and M. L. Bushnell, "Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program," Proc. 16th Int'l Conf. VLSI Design, Jan. 2003, pp. 527-532.
• T. Raja, V. D. Agrawal and M. L. Bushnell, "Transistor Sizing of Logic Gates to Maximize Input Delay Variability," J. Low Power Electron., vol. 2, no. 1, pp. 121-128, Apr. 2006.
• T. Raja, V. D. Agrawal and M. L. Bushnell, "Variable Input Delay CMOS Logic for Low Power Design," IEEE Trans. VLSI Systems, vol. 17, no. 10, pp. 1534-1545, October 2009.

Exercise: Dynamic Power

• An average gate
  • VDD, V = 1 volt
  • Output capacitance, C = 1pF
  • Activity factor, α = 10%
  • Clock frequency, f = 1GHz
• What is the dynamic power consumption of a 1 million gate VLSI chip?

Answer

• Dynamic energy per transition = $0.5\,CV^2$
• Dynamic power per gate = Energy per second
  = $0.5\,CV^2 \alpha f$
  = $0.5 \times 10^{-12} \times 1^2 \times 0.1 \times 10^{9}$
  = $0.5 \times 10^{-4}$ W
  = 50 μW
• Power for 1 million gate chip = 50W
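The arithmetic in this answer can be checked with a few lines of Python. This is only a sketch of the formula $0.5\,CV^2 \alpha f$ with the exercise's values; the function name `dynamic_power_per_gate` is a hypothetical helper.

```python
def dynamic_power_per_gate(c_farads, vdd_volts, activity, freq_hz):
    """Average dynamic power of one gate: 0.5 * C * V^2 * alpha * f (watts)."""
    return 0.5 * c_farads * vdd_volts ** 2 * activity * freq_hz


# Values from the exercise: C = 1 pF, V = 1 V, alpha = 10%, f = 1 GHz.
per_gate_w = dynamic_power_per_gate(1e-12, 1.0, 0.10, 1e9)
chip_w = per_gate_w * 1_000_000  # one million gates

print(f"per gate: {per_gate_w * 1e6:.1f} uW")
print(f"chip:     {chip_w:.1f} W")
```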

Components of Power

• Dynamic
  • Signal transitions
    • Logic activity
    • Glitches
  • Short-circuit
• Static
  • Leakage

Subthreshold Conduction

$I_{ds} = I_0 \exp\left(\frac{V_{gs} - V_{th}}{nV_T}\right) \times \left(1 - \exp\left(\frac{-V_{ds}}{V_T}\right)\right)$

[Figure: $I_{ds}$ on a logarithmic axis (10pA to 1mA) versus $V_{gs}$ (0 to 1.8 V), marking $V_{th}$, the subthreshold slope and the saturation region]

Thermal Voltage, $V_T$

$V_T = kT/q$ = 26 mV, at room temperature.

When $V_{ds}$ is several times greater than $V_T$:

$I_{ds} = I_0 \exp\left(\frac{V_{gs} - V_{th}}{nV_T}\right)$

Leakage Current

• Leakage current equals $I_{ds}$ when $V_{gs} = 0$
• Leakage current, $I_{ds} = I_0 \exp(-V_{th}/nV_T)$
• At cutoff, $V_{gs} = V_{th}$, and $I_{ds} = I_0$
• Lowering leakage to $10^{-b} \times I_0$:
  $V_{th} = b\,nV_T \ln 10 = 1.5\,b \times 26 \ln 10 = 90\,b$ mV
• Example: To lower leakage to $I_0/1{,}000$, $V_{th}$ = 270 mV
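The relation between threshold voltage and leakage can be checked numerically. This is a minimal Python sketch using $n = 1.5$ and $V_T = 26$ mV, the values implied by the slide's arithmetic; the function names are hypothetical. The exact results are slightly below the rounded figures of $90\,b$ mV and 270 mV quoted on the slide.

```python
import math


def leakage_current(i0, vth_mv, n=1.5, vt_mv=26.0):
    """Leakage at Vgs = 0: I0 * exp(-Vth / (n * VT)); voltages in mV."""
    return i0 * math.exp(-vth_mv / (n * vt_mv))


def vth_for_reduction(b, n=1.5, vt_mv=26.0):
    """Threshold voltage (mV) that lowers leakage to 10**(-b) * I0."""
    return b * n * vt_mv * math.log(10)


vth_1 = vth_for_reduction(1)   # per decade of leakage reduction
vth_3 = vth_for_reduction(3)   # leakage lowered to I0 / 1000
print(f"{vth_1:.1f} mV per decade, {vth_3:.1f} mV for I0/1000")
print(leakage_current(1.0, vth_3))  # fraction of I0 remaining
```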

Threshold Voltage

$V_{th} = V_{t0} + \gamma\left[(\Phi_s + V_{sb})^{1/2} - \Phi_s^{1/2}\right]$

• $V_{t0}$ is threshold voltage when source is at body potential (0.4 V for 180nm process)
• $\Phi_s = 2V_T \ln(N_A/n_i)$ is surface potential
• $\gamma = (2q\,\varepsilon_{si} N_A)^{1/2}\, t_{ox}/\varepsilon_{ox}$ is body effect coefficient (0.4 to 1.0)
• $N_A$ is doping level = $8 \times 10^{17}$ cm$^{-3}$
• $n_i = 1.45 \times 10^{10}$ cm$^{-3}$

Threshold Voltage, $V_{sb}$ = 1.1V

• Thermal voltage, $V_T = kT/q$ = 26 mV
• $\Phi_s$ = 0.93 V
• $\varepsilon_{ox} = 3.9 \times 8.85 \times 10^{-14}$ F/cm
• $\varepsilon_{si} = 11.7 \times 8.85 \times 10^{-14}$ F/cm
• $t_{ox}$ = 40 Å
• $\gamma$ = 0.6 V$^{1/2}$
• $V_{th} = V_{t0} + \gamma\left[(\Phi_s + V_{sb})^{1/2} - \Phi_s^{1/2}\right]$ = 0.68 V
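The numbers on this slide can be reproduced from the formulas on the previous one. The sketch below is illustrative Python, not part of the original lecture; the function names are hypothetical, $q = 1.602 \times 10^{-19}$ C is the standard electron charge, and $V_{t0} = 0.4$ V is taken from the 180nm value quoted earlier.

```python
import math

Q = 1.602e-19        # electron charge, C
EPS0 = 8.85e-14      # permittivity of free space, F/cm
EPS_SI = 11.7 * EPS0
EPS_OX = 3.9 * EPS0


def surface_potential(n_a, n_i, vt=0.026):
    """Phi_s = 2 * VT * ln(NA / ni), in volts."""
    return 2 * vt * math.log(n_a / n_i)


def body_effect_coefficient(n_a, tox_cm):
    """gamma = sqrt(2 * q * eps_si * NA) * tox / eps_ox, in V^0.5."""
    return math.sqrt(2 * Q * EPS_SI * n_a) * tox_cm / EPS_OX


def threshold_voltage(vt0, gamma, phi_s, vsb):
    """Vth = Vt0 + gamma * (sqrt(Phi_s + Vsb) - sqrt(Phi_s))."""
    return vt0 + gamma * (math.sqrt(phi_s + vsb) - math.sqrt(phi_s))


phi_s = surface_potential(8e17, 1.45e10)
gamma = body_effect_coefficient(8e17, 40e-8)   # 40 angstrom = 40e-8 cm
vth = threshold_voltage(0.4, 0.6, 0.93, 1.1)   # rounded slide values

print(f"Phi_s = {phi_s:.3f} V, gamma = {gamma:.3f} V^0.5, Vth = {vth:.3f} V")
```

The computed values round to the slide's 0.93 V, 0.6 V$^{1/2}$ and 0.68 V.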

A Sample Calculation

• $V_{DD}$ = 1.2V, 100nm CMOS process
• Transistor width, W = 0.5 μm
• OFF device ($V_{gs} = V_{th}$) leakage
  • $I_0$ = 20nA/μm, for low threshold transistor
  • $I_0$ = 3nA/μm, for high threshold transistor
• 100M transistor chip
  • Power = $(100 \times 10^{6}/2)(0.5 \times 20 \times 10^{-9}\,\text{A})(1.2\,\text{V})$ = 600mW for all low-threshold transistors
  • Power = $(100 \times 10^{6}/2)(0.5 \times 3 \times 10^{-9}\,\text{A})(1.2\,\text{V})$ = 90mW for all high-threshold transistors

Dual-Threshold Chip

• Low-threshold only for 20% transistors on critical path.
• Leakage power = $600 \times 0.2 + 90 \times 0.8$ = 120 + 72 = 192 mW
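The sample calculation and the dual-threshold estimate can be reproduced as follows. This is a Python sketch with a hypothetical helper `chip_leakage_w`; the factor of 1/2 is taken directly from the slide's expression, and reading it as "half of the transistors are OFF at any time" is an assumption.

```python
def chip_leakage_w(n_transistors, width_um, i0_a_per_um, vdd):
    """Leakage power in watts, following the slide's expression
    (N / 2) * (W * I0) * VDD. The N / 2 factor presumably counts
    the OFF transistors; that interpretation is an assumption."""
    return (n_transistors / 2) * (width_um * i0_a_per_um) * vdd


low = chip_leakage_w(100e6, 0.5, 20e-9, 1.2)    # all low-threshold
high = chip_leakage_w(100e6, 0.5, 3e-9, 1.2)    # all high-threshold
dual = 0.2 * low + 0.8 * high                   # 20% low, 80% high

print(f"low: {low * 1e3:.0f} mW, high: {high * 1e3:.0f} mW, "
      f"dual: {dual * 1e3:.0f} mW")
```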

Dual-Threshold CMOS Circuit

Dual-Threshold Design

• To maintain performance, all gates on critical paths are assigned low $V_{th}$.
• Most other gates are assigned high $V_{th}$.
• But some gates on non-critical paths may also be assigned low $V_{th}$ to prevent those paths from becoming critical.

Integer Linear Programming (ILP) to Minimize Leakage Power

• Use dual-threshold CMOS process
• First, assign all gates low $V_{th}$
  • Use an ILP model to find the delay ($T_c$) of the critical path
• Use another ILP model to find the optimal $V_{th}$ assignment as well as the reduced leakage power for all gates without increasing $T_c$
• Further reduction of leakage power possible by letting $T_c$ increase

ILP - Variables

For each gate $i$ define two variables.

• $T_i$: the longest time at which the output of gate $i$ can produce an event after the occurrence of an input event at a primary input of the circuit.
• $X_i$: a variable specifying low or high $V_{th}$ for gate $i$; $X_i$ is an integer in [0, 1]:
  • 1 → gate $i$ is assigned low $V_{th}$,
  • 0 → gate $i$ is assigned high $V_{th}$.

ILP - Objective Function

Leakage power: $P_{leak} = V_{dd} \sum_i I_{leak,i}$

• Minimize the sum of all gate leakage currents, given by

$\min \sum_i \left[ X_i I_{Li} + (1 - X_i) I_{Hi} \right]$

• $I_{Li}$ is the leakage current of gate $i$ with low $V_{th}$
• $I_{Hi}$ is the leakage current of gate $i$ with high $V_{th}$
• Using SPICE simulation results, construct a leakage current look-up table, which is indexed by the gate type and the input vector.

ILP - Constraints

• For each gate:

(1) $T_i \ge T_j + X_i D_{Li} + (1 - X_i) D_{Hi}$, where the output of gate $j$ is a fanin of gate $i$
(2) $0 \le X_i \le 1$

[Figure: gate $j$ with output time $T_j$ driving gate $i$ with output time $T_i$]

• Max delay constraints for primary outputs (PO):

$T_i \le T_{max}$

$T_{max}$ is the maximum delay of the critical path.

ILP Constraint Example

[Figure: example circuit with gates numbered 0, 1, 2 and 3]

$T_i \ge T_j + X_i D_{Li} + (1 - X_i) D_{Hi}$

• Assume all primary input (PI) signals on the left arrive at the same time.
• For gate 2, constraints are

$T_2 \ge T_0 + X_2 D_{L2} + (1 - X_2) D_{H2}$
$T_2 \ge 0 + X_2 D_{L2} + (1 - X_2) D_{H2}$

ILP – Constraints (cont.)

• $D_{Hi}$ is the delay of gate $i$ with high $V_{th}$
• $D_{Li}$ is the delay of gate $i$ with low $V_{th}$
• A second look-up table is constructed and specifies the delay for given gate types and fanout numbers.

ILP – Finding Critical Delay

$T_i \le T_{max}$

• $T_{max}$ can be specified or be the delay of longest path ($T_c$).
• To find $T_c$, we first delete the above constraint and assign all gates low $V_{th}$, i.e., replace $0 \le X_i \le 1$ with $X_i = 1$.
• Maximum $T_i$ in the ILP solution is $T_c$.
• If we replace $T_{max}$ with $T_c$, the objective function then minimizes leakage power without sacrificing performance.
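The two-step procedure (find $T_c$ with all gates at low $V_{th}$, then minimize leakage subject to $T_i \le T_{max}$) can be illustrated on a tiny circuit. The Python sketch below is not an ILP solver: it swaps in exhaustive search over all $X_i$ assignments, which is only practical for a handful of gates. The five-gate netlist, the delay and leakage values, and all function names are hypothetical.

```python
from itertools import product

D_LOW, D_HIGH = 5, 12    # gate delay in ps for low / high Vth (illustrative)
I_LOW, I_HIGH = 10, 1    # gate leakage in nA for low / high Vth (illustrative)

# Hypothetical netlist: gate -> fanin gates (empty list: fed by PIs only).
FANIN = {"g0": [], "g1": [], "g2": ["g0", "g1"], "g3": ["g2"], "g4": []}
ORDER = ["g0", "g1", "g2", "g3", "g4"]   # topological order
OUTPUTS = ["g3", "g4"]                   # primary outputs


def critical_delay(x):
    """Longest arrival time at any PO; x[g] = 1 means low Vth."""
    T = {}
    for g in ORDER:
        d = D_LOW if x[g] else D_HIGH
        T[g] = max((T[j] for j in FANIN[g]), default=0) + d
    return max(T[g] for g in OUTPUTS)


def leakage(x):
    """Sum of X_i * I_L + (1 - X_i) * I_H over all gates."""
    return sum(I_LOW if x[g] else I_HIGH for g in ORDER)


def best_assignment(t_max):
    """Exhaustive stand-in for the ILP: minimum leakage with delay <= t_max."""
    best = None
    for bits in product((1, 0), repeat=len(ORDER)):
        x = dict(zip(ORDER, bits))
        if critical_delay(x) <= t_max and (
                best is None or leakage(x) < leakage(best)):
            best = x
    return best


t_c = critical_delay({g: 1 for g in ORDER})   # step 1: all gates low Vth
x_tc = best_assignment(t_c)                   # step 2: no delay increase
x_relaxed = best_assignment(22)               # relaxed delay budget
print(t_c, leakage(x_tc), leakage(x_relaxed))
```

With no delay increase only the off-critical gate `g4` can take high $V_{th}$; relaxing the budget lets more gates switch, which is the power-delay tradeoff discussed next.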

Power-Delay Tradeoff

[Figure: curves for C432, C880 and C1908 plotted against Normalized Critical Path Delay (1 to 1.5); vertical axis from 0.1 to 1]

Power-Delay Tradeoff

• If we gradually increase $T_{max}$ from $T_c$, leakage power is further reduced, because more gates can be assigned high $V_{th}$.
• But the reduction tends to become slower.
• When $T_{max}$ = (130%) $T_c$, the reduction nearly levels off because almost all gates are assigned high $V_{th}$.
• Maximum leakage reduction can be 98%.

Leakage & Dynamic Power Optimization

70nm CMOS c7552 Benchmark Circuit @ 90°C

[Figure: bar chart comparing leakage power, dynamic power and total power for the original circuit and the optimized design; vertical axis from 0 to 900]

Y. Lu and V. D. Agrawal, "CMOS Leakage and Glitch Minimization for Power-Performance Tradeoff," Journal of Low Power Electronics (JOLPE), vol. 2, no. 3, pp. 378-387, December 2006.

Summary

• Leakage power is a significant fraction of the total power in nanometer CMOS devices.
• Leakage power increases with temperature; can be as much as dynamic power.
• Dual threshold design can reduce leakage.
• Reference: Y. Lu and V. D. Agrawal, "CMOS Leakage and Glitch Minimization for Power-Performance Tradeoff," J. Low Power Electronics, vol. 2, no. 3, pp. 378-387, December 2006.
• Access other papers at http://www.eng.auburn.edu/~vagrawal/TALKS/talks.html

Problem: Leakage Reduction

The following circuit is designed in 65nm CMOS technology using low threshold transistors. Each gate has a delay of 5ps and a leakage current of 10nA. Given that a gate with high threshold transistors has a delay of 12ps and leakage of 1nA, optimally design the circuit with dual-threshold gates to minimize the leakage current without increasing the critical path delay. What is the percentage reduction in leakage power? What will the leakage power reduction be if a 30% increase in the critical path delay is allowed?

Solution 1: No Delay Increase

Three critical paths are from the first, second and third inputs to the last output, shown by a dashed line arrow. Each has five gates and a delay of 25ps. None of the five gates on the critical path (red arrow) can be assigned a high threshold. Also, the two inverters that are on four-gate long paths cannot be assigned high threshold because then the delay of those paths will become 27ps. The remaining three inverters and the NOR gate can be assigned high threshold. These gates are shaded blue in the circuit.

The reduction in leakage power = $1 - (4 \times 1 + 7 \times 10)/(11 \times 10)$ = 32.73%

[Figure: circuit with four high-threshold gates (12ps) shaded and seven low-threshold gates (5ps); critical path delay = 25ps]

Solution 2: 30% Delay Increase

Several solutions are possible. Notice that any 3-gate path can have 2 high threshold gates. Four and five gate paths can have only one high threshold gate. One solution is shown in the figure below where six high threshold gates are shown with shading and the critical path is shown by a dashed red line arrow.

The reduction in leakage power = $1 - (6 \times 1 + 5 \times 10)/(11 \times 10)$ = 49.09%

[Figure: circuit with six high-threshold gates (12ps) shaded and five low-threshold gates (5ps); critical path delay = 29ps]
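The path-delay reasoning and the leakage percentages in both solutions can be checked with a short Python sketch. The helper names `path_delay` and `leakage_reduction` are hypothetical; the gate delays (5ps, 12ps) and leakage currents (10nA, 1nA) come from the problem statement.

```python
D_LOW, D_HIGH = 5, 12        # ps per gate, low / high threshold
I_LOW, I_HIGH = 10.0, 1.0    # nA per gate, low / high threshold


def path_delay(n_gates, n_high):
    """Delay of a path with n_high high-threshold gates out of n_gates."""
    return (n_gates - n_high) * D_LOW + n_high * D_HIGH


def leakage_reduction(n_high, n_low):
    """Fractional leakage reduction relative to an all-low-threshold design."""
    total = n_high + n_low
    return 1 - (n_high * I_HIGH + n_low * I_LOW) / (total * I_LOW)


t_c = path_delay(5, 0)       # five-gate critical path, all low threshold
budget = 1.3 * t_c           # 30% delay increase allowed

print(path_delay(4, 1))      # four-gate path with one high-threshold gate
print(path_delay(3, 2), path_delay(4, 2), path_delay(5, 1), budget)
print(f"{leakage_reduction(4, 7):.2%}  {leakage_reduction(6, 5):.2%}")
```

A four-gate path with one high-threshold gate takes 27ps, which exceeds the original 25ps, and a four-gate path with two high-threshold gates takes 34ps, which exceeds the 32.5ps budget. Both results match the restrictions stated in the two solutions.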