ELEN 468 Advanced Logic Design

Delay Model and Simulation
ELEN 468 Lecture 30
Simulation with Delay
[Figure: circuit with inputs A and B, internal node C, and output D, annotated with gate delays; simulation timeline at tsim = 0, 10, 15, 20, 30, 40, 50 listing the value changes of A, B, C, and D, all of which start at x]
Delay Models
Gate delay
  Intrinsic delay
  Layout-induced delay due to capacitive load
  Waveform slope-induced delay
Net delay/transport delay
  Signal propagation delay along interconnect wires
Module path delay
  Delay between input port and output port
Inertial Delay
Delay is caused by charging and discharging node capacitors in the circuit
[Figure: circuit with nodes A, B, C, and D]
Gate delay and wire delay
Pulse rejection
  If the pulse width is less than the delay, the pulse is ignored
Gate Delay
and (yout, x1, x2);
// default, zero gate delay
and #3 (yout, x1, x2);
// 3 units delay for all transitions
and #(2,3) G1(yout, x1, x2); // rising, falling delay
and #(2,3) G1(yout, x1, x2), G2(yout2, x3, x4);
// Multiple instances
a_buffer #(3,5,2) (yout, x); // UDP, rise, fall, turnoff
bufif1 #(3:4:5, 6:7:9, 5:7:8) (yout, xin, enable);
// min:typ:max / rise, fall, turnoff
• Simulators simulate with only one of the min, typ, and max delay values
• The selection is made through compiler directives or the user interface
• The default delay is the typ delay
Gate and Wire Model
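r: resistance per unit length
c: capacitance per unit length
[Figure: a gate modeled by resistance R and capacitance C; a wire of length L modeled by a series resistance rL with a capacitance of cL/2 at each end]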
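The wire model above can be sketched numerically. This is a minimal illustration, assuming R is the driver resistance and C is the load capacitance at the far end of the wire; the function names and the sample values are hypothetical.

```python
def pi_model(r, c, length):
    """Return (series resistance, capacitance at each end) for a wire.

    r and c are per-unit-length values; the wire capacitance c*length
    is split into two halves, one at each end of the series resistance.
    """
    return r * length, c * length / 2.0


def stage_delay(R, C, r, c, length):
    """RC delay of a driver (assumed resistance R) through the wire into
    an assumed load C. R charges all capacitance; the wire resistance
    charges only the far-end half plus the load."""
    r_wire, c_end = pi_model(r, c, length)
    return R * (2 * c_end + C) + r_wire * (c_end + C)


# Hypothetical values: ohm, fF, and um, so the result is in ohm*fF (fs).
delay = stage_delay(R=100.0, C=10.0, r=0.1, c=0.2, length=1000.0)
```

With these sample values the wire contributes 100 ohm and 100 fF at each end, so the stage delay comes out near 32000 fs (32 ps).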
Example of Model
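[Figure: RC tree for a net with driver resistance R at node 0 and wire segments L1 (node 0 to node 1), L2 (node 1 to node 2), and L3 (node 1 to node 3). Segment resistances are rL1, rL2, and rL3. Node capacitances are cL1/2 at node 0, (L1+L2+L3)c/2 at node 1, cL2/2+C2 at node 2, and cL3/2+C3 at node 3]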
Delay Estimation
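[Figure: RC tree. Driver resistance R feeds node 0 (capacitance C0); R1 connects node 0 to node 1 (C1); R2 connects node 1 to node 2 (C2); R3 connects node 1 to node 3 (C3)]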
D0 = R ( C0 + C1 + C2 + C3 )
D1 = D0 + R1 ( C1 + C2 + C3 )
D2 = D1 + R2 C2
D3 = D1 + R3 C3
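The four delay expressions follow one rule: each resistance is multiplied by all capacitance downstream of it, and delays accumulate along the path from the source. A minimal sketch of that rule for a general RC tree is given below; the dictionary-based tree representation and the sample values are assumptions made for illustration.

```python
def tree_delays(parent, res, cap):
    """Delay at every node of an RC tree.

    parent[n]: upstream node of n (None for the node driven by the source)
    res[n]:    resistance of the edge entering node n
    cap[n]:    capacitance attached to node n
    """
    def depth(n):
        d = 0
        while parent[n] is not None:
            n = parent[n]
            d += 1
        return d

    order = sorted(parent, key=depth)      # source side first
    downstream = dict(cap)                 # capacitance at and below each node
    for n in reversed(order):              # leaves first
        if parent[n] is not None:
            downstream[parent[n]] += downstream[n]

    delay = {}
    for n in order:
        upstream = delay[parent[n]] if parent[n] is not None else 0
        delay[n] = upstream + res[n] * downstream[n]
    return delay


# Tree from the slide: node 0 drives node 1, which drives nodes 2 and 3.
parent = {0: None, 1: 0, 2: 1, 3: 1}
res = {0: 2, 1: 1, 2: 3, 3: 4}   # hypothetical R, R1, R2, R3
cap = {0: 1, 1: 2, 2: 3, 3: 4}   # hypothetical C0, C1, C2, C3
delays = tree_delays(parent, res, cap)
```

For the sample values, D0 = 2(1+2+3+4) = 20, D1 = 20 + 1(2+3+4) = 29, D2 = 29 + 3·3 = 38, and D3 = 29 + 4·4 = 45, matching the formulas term by term.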
Clock Scheduling
LD: logic delay
[Figure: register i, clocked at time ti, drives combinational logic whose output is captured by register j, clocked at time tj; both registers are fed from the same clock]
Timing Constraints
[Figure: timing diagram showing clock arrival times ti and tj, the setup and hold windows, and the logic delay bounds LDmin and LDmax]
skew_ij = t_i - t_j >= hold_max - LD_min
skew_ij = t_i - t_j <= CP - LD_max - setup_max
CP: clock period
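The two inequalities bound the permissible skew from below (hold) and from above (setup). A small sketch of the check follows; the function names and the numbers are hypothetical and only illustrate the arithmetic.

```python
def skew_window(cp, ld_min, ld_max, setup_max, hold_max):
    """Return (lower, upper) bounds on skew_ij = t_i - t_j."""
    lower = hold_max - ld_min           # hold constraint
    upper = cp - ld_max - setup_max     # setup constraint
    return lower, upper


def skew_is_legal(t_i, t_j, cp, ld_min, ld_max, setup_max, hold_max):
    lower, upper = skew_window(cp, ld_min, ld_max, setup_max, hold_max)
    return lower <= t_i - t_j <= upper


# Hypothetical numbers in arbitrary time units.
window = skew_window(cp=10, ld_min=2, ld_max=7, setup_max=1, hold_max=1)
```

Here the window is -1 <= skew_ij <= 2. If the lower bound exceeds the upper bound, no clock schedule can satisfy both constraints at that clock period.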
Assignment
Blocking and Non-blocking Assignment
initial
begin
a = 1;
b = 0;
a = b; // a = 0;
b = a; // b = 0;
end
initial
begin
a = 1;
b = 0;
a <= b; // a = 0;
b <= a; // b = 1;
end
Blocking assignment "="
  Statement order matters
  A statement has to be executed before the next statement
Non-blocking assignment "<="
  Concurrent assignment
  Normally the last assignment at a given simulation time step
  If it triggers other blocking assignments, it is executed before the blocking assignments it triggers
  If there are multiple non-blocking assignments to the same variable in the same behavior, the later one overwrites the earlier one
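The difference between the two assignment types can be mimicked outside a simulator. The sketch below is a simplified model, not simulator behavior in full: each statement is a hypothetical (target, source) pair, blocking statements update the environment one at a time, and non-blocking statements sample every right-hand side before any update.

```python
def run_blocking(env, statements):
    """Execute 'target = source' statements in order."""
    env = dict(env)
    for target, source in statements:
        env[target] = env[source]      # later statements see this update
    return env


def run_nonblocking(env, statements):
    """Execute 'target <= source' statements of one time step."""
    sampled = [(target, env[source]) for target, source in statements]
    env = dict(env)
    for target, value in sampled:      # updates happen after all sampling
        env[target] = value            # a later write to the same target wins
    return env


start = {"a": 1, "b": 0}
swap = [("a", "b"), ("b", "a")]
blocking_result = run_blocking(start, swap)        # {'a': 0, 'b': 0}
nonblocking_result = run_nonblocking(start, swap)  # {'a': 0, 'b': 1}
```

The results reproduce the comments in the two initial blocks: blocking leaves a = 0, b = 0, while non-blocking leaves a = 0, b = 1.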
Procedural Continuous Assignment
Continuous assignment establishes
static binding for net variables
Procedural continuous assignment
(PCA) establishes dynamic binding for
variables


“assign … deassign” for register variables
only
“force … release” for both register and
net variables
Intra-assignment Delay:
Blocking Assignment
// B = 0 at time 0
// B = 1 at time 4
…
#5 A = B; // A = 1
C = D;
…
A = #5 B; // A = 0
C = D;
…
A = @(enable) B;
C = D;
…
A = @(named_event) B;
C= D;
…
If the timing control operator (#, @) is on the LHS
  Blocking delay
  RHS evaluated at (#, @)
  Assignment at (#, @)
If the timing control operator (#, @) is on the RHS
  Intra-assignment delay
  RHS evaluated immediately
  Assignment at (#, @)
Example
initial begin
a = #10 1;
b = #2 0;
c = #3 1;
end
initial begin
d <= #10 1;
e <= #2 0;
f <= #3 1;
end
t     0   2   3   10  12  15
a     x   x   x   1   1   1
b     x   x   x   x   0   0
c     x   x   x   x   x   1
d     x   x   x   1   1   1
e     x   0   0   0   0   0
f     x   x   1   1   1   1
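The update times in the table follow from one rule: blocking intra-assignment delays accumulate because each statement waits for the previous one, while non-blocking delays are all measured from the time the statements are scheduled. A short sketch of that arithmetic, with hypothetical function names:

```python
def blocking_update_times(delays, start=0):
    """Times at which 'x = #d value;' statements complete, in order."""
    times, now = [], start
    for d in delays:
        now += d              # the next statement starts only after this one
        times.append(now)
    return times


def nonblocking_update_times(delays, start=0):
    """Times at which 'x <= #d value;' statements take effect."""
    return [start + d for d in delays]   # all scheduled at the same time


delays = [10, 2, 3]                                # a/d, b/e, c/f
blocking_times = blocking_update_times(delays)
nonblocking_times = nonblocking_update_times(delays)
```

For the delays 10, 2, and 3, the blocking block updates a, b, and c at times 10, 12, and 15, and the non-blocking block updates d, e, and f at times 10, 2, and 3.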
Tell the Differences
always @ (a or b)
y = a|b;
Which one describes an OR gate?
always @ (a or b)
#5 y = a|b;
Event control is blocked
always @ (a or b)
y = #5 a|b;
always @ (a or b)
y <= #5 a|b;
Race Condition
always @ ( posedge clk )
c = b;
// c will get previous b or new b ?
always @ ( posedge clk )
b = a;
Avoid Race Condition
always @ ( posedge clk )
begin
c = b; b = a;
end
// Solution 1: merge always
always @ ( posedge clk )
c = #1 b;
always @ ( posedge clk )
b = #1 a;
// Solution 2: intra-assignment delay
always @ ( posedge clk )
c <= b;
always @ ( posedge clk )
b <= a;
// Solution 3: non-blocking assignment
Finite State Machine
FSM Example: Speed Machine
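[Figure: state diagram with states stopped, low, medium, and high. Each a = 1, b = 0 transition moves to the next higher speed (high stays at high); each b = 1 transition moves to the next lower speed (stopped stays at stopped). Block diagram: inputs accelerator, brake, and clock; output speed]
a: accelerator
b: brake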
Verilog Code for Speed Machine
// Explicit FSM style
module speed_machine ( clock, accelerator, brake, speed );
input clock, accelerator, brake;
output [1:0] speed;
reg [1:0] state, next_state;
parameter stopped = 2'b00;
parameter s_low = 2'b01;
parameter s_medium = 2'b10;
parameter s_high = 2'b11;
assign speed = state;
// State register
always @ ( posedge clock )
state <= next_state;
// Next-state logic: brake takes priority over accelerator
always @ ( state or accelerator or brake )
if ( brake == 1'b1 )
case ( state )
stopped: next_state <= stopped;
s_low: next_state <= stopped;
s_medium: next_state <= s_low;
s_high: next_state <= s_medium;
default: next_state <= stopped;
endcase
else if ( accelerator == 1'b1 )
case ( state )
stopped: next_state <= s_low;
s_low: next_state <= s_medium;
s_medium: next_state <= s_high;
s_high: next_state <= s_high;
default: next_state <= stopped;
endcase
else next_state <= state;
endmodule
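As a cross-check of the next-state logic, the same transitions can be written as a small reference model. This is only a sketch for comparing against simulation results; the Python names are hypothetical and the list order mirrors the state diagram.

```python
STATES = ["stopped", "low", "medium", "high"]


def next_state(state, accelerator, brake):
    """Next state of the speed machine; brake has priority over accelerator."""
    index = STATES.index(state)
    if brake:
        return STATES[max(index - 1, 0)]                 # step down, floor at stopped
    if accelerator:
        return STATES[min(index + 1, len(STATES) - 1)]   # step up, cap at high
    return state                                         # hold


def run(inputs, state="stopped"):
    """Apply a sequence of (accelerator, brake) pairs, one per clock edge."""
    trace = []
    for accelerator, brake in inputs:
        state = next_state(state, accelerator, brake)
        trace.append(state)
    return trace


trace = run([(1, 0), (1, 0), (1, 0), (1, 0), (1, 1), (0, 0), (0, 1)])
```

The trace climbs from low to high, stays at high, drops to medium when the brake is applied together with the accelerator, holds, and then drops to low.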
State Encoding Example
#   Binary  Gray  Johnson  One-hot
0   000     000   0000     00000001
1   001     001   0001     00000010
2   010     011   0011     00000100
3   011     010   0111     00001000
4   100     110   1111     00010000
5   101     111   1110     00100000
6   110     101   1100     01000000
7   111     100   1000     10000000
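Each column of the table can be generated from the state number. The sketch below uses the common constructions (Gray as i XOR (i >> 1), Johnson as a twisted-ring shift, one-hot as a single shifted bit); the function names are hypothetical.

```python
def binary_code(i, width):
    return format(i, "0{}b".format(width))


def gray_code(i, width):
    return format(i ^ (i >> 1), "0{}b".format(width))


def johnson_code(i, width):
    """Shift left i times, feeding the inverted MSB into the LSB."""
    state = 0
    mask = (1 << width) - 1
    for _ in range(i):
        msb = (state >> (width - 1)) & 1
        state = ((state << 1) | (msb ^ 1)) & mask
    return format(state, "0{}b".format(width))


def one_hot_code(i, width):
    return format(1 << i, "0{}b".format(width))


table = [
    (i, binary_code(i, 3), gray_code(i, 3), johnson_code(i, 4), one_hot_code(i, 8))
    for i in range(8)
]
```

For eight states, binary and Gray need 3 bits, Johnson needs 4, and one-hot needs 8, which is the register-size trade-off the table illustrates.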
State Encoding
A state machine with N states requires a register of at least ceil(log2 N) bits to store the encoded representation of the states
Binary and Gray encoding use the minimum number of bits for the state register
Gray and Johnson code:
  Two adjacent codes differ by only one bit
  Reduce simultaneous switching
  Reduce crosstalk
  Reduce glitches
One-hot Encoding
Employs a one-bit register for each state
Less combinational logic to decode
Consumes greater area, which does not matter for certain hardware such as FPGAs
Easier to design, friendly to incremental change
case and if statements may give different results for one-hot encoding
Runs faster
`define state_0 3'b001
`define state_1 3'b010
`define state_2 3'b100
Transistor Level Model
Static CMOS Circuits
module cmos_inverter ( out, in );
output out;
input in;
supply0 GND;
supply1 PWR;
pmos ( out, PWR, in );
nmos ( out, GND, in );
endmodule
[Figure: CMOS inverter schematic with input in, output out, and supply Vdd; transistor terminals labeled drain, source, and gate]
Pull Gates
module nmos_nand_2 ( Y, A, B );
output Y;
input A, B;
supply0 GND;
tri w;
pullup ( Y );
nmos ( Y, w, A );
nmos ( w, GND, B );
endmodule
[Figure: two schematics of the NAND gate with output Y pulled up to Vdd and series transistors gated by A and B]
Assign Drive Strengths
nand ( pull1, strong0 ) G1( Y, A, B );
wire ( pull0, weak1 ) A_wire = net1 || net2;
assign ( pull1, weak0 ) A_net = reg_b;
Drive strength is specified through an unordered pair
  one value from { supply0, strong0, pull0, weak0, highz0 }
  the other from { supply1, strong1, pull1, weak1, highz1 }
Only scalar nets may receive a strength assignment
When a tri0 or tri1 net is not driven, it is pulled to the indicated logic value with a strength of pull0 or pull1
A trireg net models a capacitance that holds its charge after the drivers are removed; the net has a charge strength of small, medium (default), or large capacitor
Signal Strength Levels
Su0   Supply Drive      Su1
St0   Strong Drive      St1
Pu0   Pull Drive        Pu1
La0   Large Capacitor   La1
We0   Weak Drive        We1
Me0   Medium Capacitor  Me1
Sm0   Small Capacitor   Sm1
HiZ0  High Impedance    HiZ1
Signal strength: a signal's ability to act as a logic driver determining the resultant logic value on a net
  Signal contention between multiple drivers of nets
  Charge distribution between nodes in a circuit
Default: strong drive
Capacitive strengths may be assigned only to trireg nets
Strength Reduction
Dependence of output strength on input
strength


Combinational and pull gate – NO, except
3-state gates
Transistor switch and bi-directional gates –
YES
In general, output strength <= input
strength
Transistor Switch and Bi-directional Gate
Transistor switch

nmos, pmos, cmos
Bi-directional gate

tran, tranif0, tranif1
If input ( supply0 or supply1 )

Output ( strong0, strong1 )
Otherwise

Output strength = input strength
Signal Contention:
Known Strength and Known Value
Signal with greater strength dominates
Same strength, different logic values


wand -> and, wor -> or
Otherwise -> x
[Figure: driver1 (We0) and driver2 (Pu1) drive the same net; the stronger signal dominates and the net resolves to Pu1]
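The contention rule for two drivers of known strength and value can be sketched as a lookup on the strength ordering. This is a simplified model for a plain wire only (wand and wor nets resolve equal-strength conflicts differently, as noted above), and the tuple representation is an assumption made for illustration.

```python
# Strength levels from strongest to weakest, using the slide's abbreviations.
STRENGTH = {"Su": 7, "St": 6, "Pu": 5, "La": 4, "We": 3, "Me": 2, "Sm": 1, "HiZ": 0}


def resolve(driver1, driver2):
    """Resolve two (strength, value) drivers on a plain wire."""
    (s1, v1), (s2, v2) = driver1, driver2
    if STRENGTH[s1] > STRENGTH[s2]:
        return driver1                 # stronger signal dominates
    if STRENGTH[s2] > STRENGTH[s1]:
        return driver2
    if v1 == v2:
        return driver1                 # same strength, same value
    return (s1, "x")                   # same strength, conflicting values


net = resolve(("We", 0), ("Pu", 1))
```

With a weak 0 against a pull 1, the net takes the pull 1, as in the figure.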
Synthesis
Unexpected and Unwanted Latch
Combinational logic must specify the output value for all input values
Incomplete case statements and conditionals (if) imply
  The output should retain its value for unspecified input values
  Unwanted latches
Example of Unwanted Latch
module myMux( y, selA, selB, a, b );
input selA, selB, a, b;
output y;
reg y;
always @ ( selA or selB or a or b )
case ( {selA, selB} )
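2'b10: y = a;
2'b01: y = b;
// 2'b00 and 2'b11 are not specified, so y must hold its value: a latch is inferred
endcase
endmodule
[Figure: synthesized circuit in which a and b are gated by selA, selB, and their complements, feeding a latch with enable en and output y]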
Synthesis of case and if
case and if statement imply priority


Synthesis tool will determine if case items of a
case statement are mutually exclusive
If so, synthesis will treat them with same priority
and synthesize a mux
A synthesis tool will treat casex and casez
same as case


“x” and “z” will be treated as don’t cares
Post-synthesis simulation result may be different
from pre-synthesis simulation
Example of if and case
…
input [3:0] data;
output [1:0] code;
reg [1:0] code;
always @(data)
begin // implicit priority
if ( data[3] ) code = 3;
else if (data[2]) code = 2;
else if (data[1]) code = 1;
else if (data[0]) code = 0;
else code = 2'bx;
end
…
…
input [3:0] data;
output [1:0] code;
reg [1:0] code;
always @(data)
case (data)
4'b1000: code = 3;
4'b0100: code = 2;
4'b0010: code = 1;
4'b0001: code = 0;
default: code = 2'bx;
endcase
…
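The two descriptions agree only when exactly one bit of data is set. A small sketch makes the difference visible by modeling both in software; the function names are hypothetical and the string "x" stands in for the Verilog unknown value.

```python
def encode_if(data):
    """Model of the if/else-if chain: the highest set bit wins."""
    for bit in (3, 2, 1, 0):
        if (data >> bit) & 1:
            return bit
    return "x"


def encode_case(data):
    """Model of the case statement: only exact one-hot patterns match."""
    patterns = {0b1000: 3, 0b0100: 2, 0b0010: 1, 0b0001: 0}
    return patterns.get(data, "x")


# Inputs on which the two models disagree.
differences = [d for d in range(16) if encode_if(d) != encode_case(d)]
```

For example, data = 4'b1100 gives code = 3 in the if version but falls to the default in the case version, so the case items are mutually exclusive and carry no priority.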
Synthesis of Register Variables
A hardware register will be generated for a
register variable when



It is referenced before value is assigned in a
behavior
Assigned value in an edge-sensitive behavior and is
referenced by an assignment outside the behavior
Assigned value in one clock cycle and referenced in
another clock cycle
Multi-phased latches may not be supported in
synthesis
Synthesis of Arithmetic Operators
If corresponding library cell exists, an operator will be
directly mapped to it
A synthesis tool may select among different options in the library, for example, when synthesizing an adder
  Small wordlength -> ripple-carry adder
  Long wordlength -> carry-look-ahead adder
  Need small area -> bit-serial adder
Implementation of "*" and "/"
  May be inefficient when both operands are variables
  If the multiplier or the divisor is a power of two, the operation can be implemented through shifting
Static Loops without Internal Timing
Controls –> Combinational Logic
module count1sA ( bit_cnt, data, clk, rst );
parameter data_width = 4; parameter cnt_width = 3;
output [cnt_width-1:0] bit_cnt;
input [data_width-1:0] data; input clk, rst;
reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
always @ ( posedge clk )
if ( rst ) begin cnt = 0; bit_cnt = 0; end
else begin cnt = 0; tmp = data;
for ( i = 0; i < data_width; i = i + 1 )
begin
if ( tmp[0] ) cnt = cnt + 1;
tmp = tmp >> 1; end
bit_cnt = cnt;
end
endmodule
Static Loops with Internal Timing
Controls –> Sequential Logic
module count1sB ( bit_cnt, data, clk, rst );
parameter data_width = 4; parameter cnt_width = 3;
output [cnt_width-1:0] bit_cnt;
input [data_width-1:0] data; input clk, rst;
reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
always @ ( posedge clk )
if ( rst ) begin cnt = 0; bit_cnt = 0; end
else begin
cnt = 0; tmp = data;
for ( i = 0; i < data_width; i = i + 1 )
@ ( posedge clk )
begin if ( tmp[0] ) cnt = cnt + 1;
tmp = tmp >> 1; end
bit_cnt = cnt;
end
endmodule
Non-Static Loops without Internal Timing
Controls –> Not Synthesizable
module count1sC ( bit_cnt, data, clk, rst );
parameter data_width = 4; parameter cnt_width = 3;
output [cnt_width-1:0] bit_cnt;
input [data_width-1:0] data; input clk, rst;
reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
always @ ( posedge clk )
if ( rst ) begin cnt = 0; bit_cnt = 0; end
else begin
cnt = 0; tmp = data;
for ( i = 0; | tmp; i = i + 1 )
begin if ( tmp[0] ) cnt = cnt + 1;
tmp = tmp >> 1; end
bit_cnt = cnt;
end
endmodule
Non-Static Loops with Internal
Timing Controls –> Sequential Logic
module count1sD ( bit_cnt, data, clk, rst );
parameter data_width = 4; parameter cnt_width = 3;
output [cnt_width-1:0] bit_cnt;
input [data_width-1:0] data; input clk, rst;
reg [cnt_width-1:0] cnt, bit_cnt, i; reg [data_width-1:0] tmp;
always @ ( posedge clk )
if ( rst ) begin cnt = 0; bit_cnt = 0; end
else begin: bit_counter
cnt = 0; tmp = data;
while ( tmp )
@ ( posedge clk ) begin
if ( rst ) begin cnt = 0; disable bit_counter; end
else begin cnt = cnt + tmp[0]; tmp = tmp >> 1; end
bit_cnt = cnt;
end
end
endmodule
VHDL
Example
-- eqcomp4 is a four bit equality comparator
-- Entity declaration
entity eqcomp4 is
port ( a, b: in bit_vector( 3 downto 0 );
equals: out bit ); -- equals is active high
end eqcomp4;
-- Architecture body
architecture dataflow of eqcomp4 is
begin
equals <= '1' when ( a = b ) else '0';
end dataflow;
Behavioral Descriptions
library ieee;
use ieee.std_logic_1164.all;
entity eqcomp4 is port (
a, b: in std_logic_vector( 3 downto 0 );
equals: out std_logic );
end eqcomp4;
architecture behavioral of eqcomp4 is
begin
comp: process ( a, b ) -- sensitivity list
begin
if a = b then equals <= '1';
else equals <= '0'; -- sequential assignment
end if;
end process comp;
end behavioral;
Dataflow Descriptions
library ieee;
use ieee.std_logic_1164.all;
entity eqcomp4 is port (
a, b: in std_logic_vector( 3 downto 0 );
equals: out std_logic );
end eqcomp4;
architecture dataflow of eqcomp4 is
begin
equals <= '1' when ( a = b ) else '0';
end dataflow;
-- No process
-- Concurrent assignment
Structural Descriptions
library ieee;
use ieee.std_logic_1164.all;
entity eqcomp4 is port (
a, b: in std_logic_vector( 3 downto 0 );
equals: out std_logic );
end eqcomp4;
use work.gatespkg.all;
architecture struct of eqcomp4 is
signal x : std_logic_vector( 0 to 3);
begin
u0: xnor2 port map ( a(0), b(0), x(0) ); -- component instantiation
u1: xnor2 port map ( a(1), b(1), x(1) );
u2: xnor2 port map ( a(2), b(2), x(2) );
u3: xnor2 port map ( a(3), b(3), x(3) );
u4: and4 port map ( x(0), x(1), x(2), x(3), equals );
end struct;
Test and Design For Testability
Single Stuck-at Fault
Three properties define a single stuck-at fault
  Only one line is faulty
  The faulty line is permanently set to 0 or 1
  The fault can be at an input or output of a gate
Example: XOR circuit has 12 fault sites (marked in the figure) and 24 single stuck-at faults
[Figure: XOR circuit with lines a through k and output z, showing the test vector for the h s-a-0 fault; good circuit values are shown with faulty circuit values in parentheses]
Stuck-Open Example
Vector 1: test for A s-a-0 (initialization vector)
Vector 2: test for A s-a-1
Two-vector s-op test can be constructed by ordering two s-at tests
[Figure: CMOS gate with inputs A and B, output C, pMOS FETs connected to VDD, and nMOS FETs, with one transistor stuck open; output C is annotated 1(Z), that is, 1 in the good circuit and Z in the faulty circuit]
Stuck-Short Example
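Test vector for A s-a-0
[Figure: CMOS gate with inputs A and B, output C, PFETs connected to VDD, and NFETs, with one transistor stuck short, which creates an IDDQ path in the faulty circuit; output C is annotated 0 (X), that is, 0 in the good circuit and X in the faulty circuit]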
Test Pattern for Stuck-At Faults
[Figure: three-input NAND gate with inputs a, b, and c, shown fault-free and with input a stuck at 1 (SA1)]
Ygood = (a●b●c)'
Ya-SA1 = (b●c)'
Test pattern: {a,b,c} = 011
No need to enumerate all input combinations to detect a fault
Fault Simulation
Fault simulation problem: Given
  A circuit
  A sequence of test vectors
  A fault model
Determine
  Fault coverage: fraction (or percentage) of modeled faults detected by the test vectors
  Set of undetected faults
Motivation
  Determine test quality and in turn product quality
  Find undetected fault targets to improve tests
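A minimal fault simulator for a single gate shows the inputs and outputs of the problem. The sketch assumes a three-input NAND as the circuit and single stuck-at faults on its three inputs and its output as the fault model; all names are hypothetical.

```python
def nand3_output(vector, fault=None):
    """Output of a 3-input NAND, optionally with one (line, value) stuck-at fault."""
    values = dict(zip("abc", vector))
    if fault is not None and fault[0] in values:
        values[fault[0]] = fault[1]              # stuck input line
    y = 1 - (values["a"] & values["b"] & values["c"])
    if fault is not None and fault[0] == "y":
        y = fault[1]                             # stuck output line
    return y


def fault_coverage(vectors, faults):
    """Return (coverage fraction, list of undetected faults)."""
    undetected = [
        f for f in faults
        if all(nand3_output(v) == nand3_output(v, f) for v in vectors)
    ]
    return 1 - len(undetected) / len(faults), undetected


faults = [(line, value) for line in "abcy" for value in (0, 1)]   # 8 faults
coverage, undetected = fault_coverage([(1, 1, 1), (0, 1, 1)], faults)
```

The two vectors 111 and 011 detect 6 of the 8 modeled faults (75% coverage); b SA1 and c SA1 remain undetected and become targets for additional vectors.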
Goal of Design for Testability (DFT)
Improve



Controllability
Observability
Predictability
Scan Storage Cell
[Figure: scan storage cell (SSC) with data input D, scan input Si, mode select N'/T, clock Clk, and output Q, which also serves as scan output So]
Integrated Serial Scan
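[Figure: integrated serial scan. Scan flip-flops (SFF) are chained from SCANIN to SCANOUT around the combinational logic, which has primary inputs (PI) and primary outputs (PO); a Control signal is also shown]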
Interconnect Timing Optimization
Buffers Reduce Wire Delay
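[Figure: a wire of length x driven by resistance R into load C, with a buffer inserted at the midpoint; each half of length x/2 has resistance rx/2 and capacitance cx/4 at each end. A plot shows ∆t versus x]
t_unbuf = R( cx + C ) + rx( cx/2 + C )
t_buf = 2R( cx/2 + C ) + rx( cx/4 + C ) + tb
t_buf – t_unbuf = RC + tb – rcx^2/4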
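The difference expression shows that a buffer helps only once the wire is long enough for the rcx^2/4 term to outweigh RC + tb. The sketch below evaluates the formulas and the break-even length; it assumes tb is the intrinsic delay of the buffer and that the buffer has the same R and C as the driver and load, and the sample values are hypothetical.

```python
import math


def t_unbuf(R, C, r, c, x):
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R, C, r, c, tb):
    """Length at which t_buf equals t_unbuf: RC + tb = r*c*x^2/4."""
    return 2 * math.sqrt((R * C + tb) / (r * c))


# Hypothetical values: ohm, fF, um; delays are in ohm*fF (fs).
R, C, r, c, tb = 100.0, 10.0, 0.1, 0.2, 5000.0
x = 2000.0
gain = t_unbuf(R, C, r, c, x) - t_buf(R, C, r, c, x, tb)
x_min = break_even_length(R, C, r, c, tb)
```

With these numbers the buffer saves 14000 fs on a 2000 um wire, and the break-even length is about 1095 um; shorter wires are faster without the buffer.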
Buffers Improve Slack
RAT = Required Arrival Time
Slack = RAT - Delay
[Figure: a net driving two sinks, shown without and with a buffer]
Without buffer: slackmin = -50
  RAT = 300, Delay = 350, Slack = -50
  RAT = 700, Delay = 600, Slack = 100
With buffer: slackmin = 50
  RAT = 300, Delay = 250, Slack = 50
  RAT = 700, Delay = 400, Slack = 300
Decouple capacitive load from critical path
Slew Constraints
When a buffer is inserted, assume ideal slew
rate at its input
Check slew rate at downstream buffers/sinks
If slew is too large, candidate is discarded
Cost-Slack Trade-off
[Figure: plot of slack (ps), ranging from -4000 to 1000, versus number of buffers, ranging from 0 to 7]
Wire Sizing: Monotone Property
Ancestor edges cannot be narrower
than downstream edges
Area or Radius?
Radius: the longest source-sink path length
• Prim’s minimum spanning tree
  • Small total wire length
  • Long path to sinks
• Dijkstra’s shortest path tree
  • Short path to sinks
  • Large total wire length
Area Radius Trade-off
Find a solution in middle


Not too much area
Not too long radius
How to find an ideal
point?
Gate Characteristics
I-V Characteristics
[Figure: transistor symbol with terminals d, g, and s; plot of Ids versus Vds]
Cutoff region
  Vgs < Vt
  Ids = 0
Linear region
  Vgs > Vt, 0 < Vds < Vgs-Vt
  Ids = B[(Vgs-Vt)Vds – Vds^2/2]
Saturation region
  Vgs > Vt, 0 < Vgs-Vt < Vds
  Ids = B(Vgs-Vt)^2/2
B = a W/L
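The three regions combine into one piecewise function. The sketch below is a direct transcription of the equations above; treating the boundaries Vgs = Vt as cutoff and Vds = Vgs-Vt as saturation is a choice made here, since the slide uses strict inequalities, and the sample values are hypothetical.

```python
def drain_current(vgs, vds, vt, beta):
    """Ids of an n-channel device from the cutoff/linear/saturation model."""
    if vgs <= vt:
        return 0.0                                   # cutoff
    overdrive = vgs - vt
    if vds < overdrive:
        return beta * (overdrive * vds - vds ** 2 / 2)   # linear
    return beta * overdrive ** 2 / 2                 # saturation


# Hypothetical values: Vt = 0.5 V, beta = 2.0 A/V^2.
i_cutoff = drain_current(0.3, 1.0, 0.5, 2.0)
i_linear = drain_current(1.5, 0.5, 0.5, 2.0)
i_saturation = drain_current(1.5, 1.2, 0.5, 2.0)
```

At Vds = Vgs-Vt the linear expression evaluates to B(Vgs-Vt)^2/2, so the two branches meet and the current is continuous across the boundary.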
Falling Time
Falling time = t1 + t2
t1 = Vout drops from 0.9Vdd to Vdd-Vt
t2 = Vout drops from Vdd-Vt to 0.1Vdd
Falling time = rising time ≈ k C / (B Vdd)
Delay ≈ Falling time / 2
Gate Power Dissipation
Leakage power
Dynamic power
Short circuit power
Leakage Power
Static
Leakage current = a ● Vdd
Leakage current = b/Vt
Killer to CMOS technology
[Figure: two inverter schematics with supply Vdd and output out, showing the leakage paths; operating regions labeled Linear and Saturation]
Dynamic Power
Occurs at each switching
Pd = CL●Vdd^2●fp
fp: switching frequency
[Figure: two inverter schematics with supply Vdd and output out; operating regions labeled Linear and Saturation]
Short Circuit Power
During switching, there is a short moment when both PMOS and NMOS are partially on
Ps = Q●(Vdd-Vt)^3●tr●fp
tr: rising time
[Figure: two inverter schematics with supply Vdd and output out, one for input falling and one for input rising]
Low Power Design
Clock Gating
Gate off clock to idle functional units
  e.g., floating point units
  need logic to generate disable signal
    increases complexity of control logic
    consumes power
    timing critical to avoid clock glitches at OR gate output
  additional gate delay on clock signal
    gating OR gate can replace a buffer in the clock distribution tree
[Figure: clock and disable feed an OR gate whose output clocks the register (Reg) in front of the functional unit]
Active Power Reduction - Supply
Voltage Reduction
Static
  [Figure: slow blocks on the low supply voltage, fast block on the high supply voltage]
  Pros:
  • Always active in saving
  Cons:
  • Additional power delivery network
  • Needs special care of interface between power domains
  • signals close to Vt – excessive leakage and reduced noise margins
Dynamic
  Adjusting operation voltage and frequency to performance requirements:
  • High performance – high Vdd & frequency
  • Power saving – low Vdd & frequency
  Pros:
  • Doesn’t limit performance
  Cons:
  • Penalty of transition between different power states can be high (in performance and power)
  • Additional control logic
Dynamic Frequency and
Voltage Scaling
Always run at the lowest supply voltage that meets the timing constraints
  DFS (dynamic frequency scaling) saves only power
  DVS (dynamic voltage scaling) + DFS saves both energy and power
A DVS+DFS system requires the following
  A programmable clock generator (PLL)
    PLL from 200 MHz to 700 MHz in increments of 33 MHz
  A supply regulation loop that sets the minimum VDD necessary for operation at the desired frequency
    32 levels of VDD from 1.1 V to 1.6 V
  An operating system that sets the required frequency + supply voltage to meet the task completion deadlines
    heavier load -> ramp up VDD, when stable speed up clock
    lighter load -> slow down clock, when PLL locks onto new rate, ramp down VDD
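The claim that DFS saves only power while DVS + DFS saves energy as well follows from the dynamic power formula Pd = CL●Vdd^2●fp. A rough sketch, assuming a task that needs a fixed number of clock cycles, that only dynamic power matters, and that the load switches at the clock frequency; all names and numbers are hypothetical.

```python
def dynamic_power(c_load, vdd, freq):
    return c_load * vdd ** 2 * freq


def task_energy(c_load, vdd, freq, cycles):
    """Energy for a fixed-cycle task: power times run time (cycles / freq)."""
    return dynamic_power(c_load, vdd, freq) * (cycles / freq)


C_LOAD, CYCLES = 1e-9, 1e6          # hypothetical switched capacitance and work

base = (dynamic_power(C_LOAD, 1.6, 700e6), task_energy(C_LOAD, 1.6, 700e6, CYCLES))
dfs = (dynamic_power(C_LOAD, 1.6, 350e6), task_energy(C_LOAD, 1.6, 350e6, CYCLES))
dvfs = (dynamic_power(C_LOAD, 1.1, 350e6), task_energy(C_LOAD, 1.1, 350e6, CYCLES))
```

Halving the frequency alone halves the power but the task runs twice as long, so its energy is unchanged; lowering VDD from 1.6 V to 1.1 V as well scales both power and energy by (1.1/1.6)^2, roughly 0.47.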
Design with Dual Vth
Dual Vth evaluation
Dual Vth design


Two flavors of transistors: slow – high Vth, fast – low Vth
Low Vth are faster, but have ≈10X leakage
Power Gating Using Sleep Transistors
Or can reduce leakage by
gating the supply rails when
the circuit is in sleep mode


in normal mode, sleep = 0 and
the sleep transistors must
present as small a resistance as
possible (via sizing)
in sleep mode, sleep = 1, the
transistor stack effect reduces
leakage by orders of magnitude
Or can eliminate leakage by switching off the power
supply (but lose the memory state)