Transmission gate latches - ACSEL

Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 8: State-of-the-Art Clocked Storage Elements in CMOS Technology
Wiley-Interscience and IEEE Press, January 2003

• Master-Slave Latches
• Flip-Flops
• CSEs with local clock gating
• Low clock swing
• Dual-edge triggering

Nov. 14, 2003

Transmission gate latches
[Figure: transmission gate latch variants (a)-(c), with input D, clock Clk, state node S and output Q]
(a) Simplest implementation
- Only 4 transistors
- Dynamic when S=1
- Susceptible to noise
(b) Basic static latch
- Pull-up/pull-down keeper
- Conflict at node S whenever new data is written
(c) Complete implementation
- Feedback turned off when writing to the latch
- No conflict
- Larger clock load

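The logic-level behavior shared by variants (a)-(c) can be sketched as a small Python model. It is a minimal sketch assuming an ideal latch that is transparent while Clk=1 and holds node S while Clk=0; it does not model the keeper conflict, charge leakage or noise discussed above, and the class name `TransparentLatch` is hypothetical.

```python
class TransparentLatch:
    """Ideal level-sensitive latch (assumed): follows D while clk=1, holds while clk=0."""

    def __init__(self, state: int = 0) -> None:
        self.state = state  # stored value (node S in the figure)

    def evaluate(self, clk: int, d: int) -> int:
        if clk:
            # Transmission gate on: the storage node follows the input.
            self.state = d
        # clk == 0: transmission gate off, the stored value is held.
        return self.state


if __name__ == "__main__":
    latch = TransparentLatch()
    clk_wave = [1, 1, 0, 0, 1]
    d_wave = [1, 0, 1, 1, 1]
    print([latch.evaluate(c, d) for c, d in zip(clk_wave, d_wave)])
```

In the example, the change of D while Clk=0 (third sample) never reaches the output; the latch only follows D during the Clk=1 samples.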
Transmission Gate Master-Slave Latch (MSL)
[Figure: master transmission gate latch (input D, state node SM, output QM) followed by slave transmission gate latch (state node SS, output Q), clocked by Clk and Clk1]
MSL with unprotected input
(Gerosa et al. 1994), Copyright © 1994 IEEE

Transmission Gate MS Latch (continued)
[Figure: MSL whose input transmission gate is removed and replaced by gate isolation; signals D, SM, QM, SS, Q, Clk, Clk1]
• Protection from input noise
MSL with input gate isolation
(Markovic et al. 2001), Copyright © 2001 IEEE

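Chaining two level-sensitive latches on opposite clock phases is what makes the MSL edge-triggered. The sketch below assumes, for illustration, that the master is transparent while Clk=0 and the slave while Clk=1 (rising-edge triggering); the slides do not fix the phase, and the class names are hypothetical. It models logic values only, so setup time, hold time and the input-isolation benefit are out of scope.

```python
class Latch:
    """Ideal level-sensitive latch: transparent while enable=1, opaque while enable=0."""

    def __init__(self) -> None:
        self.q = 0

    def evaluate(self, enable: int, d: int) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveLatch:
    """Assumed phasing: master transparent while clk=0, slave while clk=1 -> rising edge."""

    def __init__(self) -> None:
        self.master = Latch()  # drives QM
        self.slave = Latch()   # drives Q

    def evaluate(self, clk: int, d: int) -> int:
        # Master first: when clk=1 it is opaque, so the slave copies a stable QM.
        qm = self.master.evaluate(1 - clk, d)
        return self.slave.evaluate(clk, qm)


if __name__ == "__main__":
    msl = MasterSlaveLatch()
    clk = [0, 1, 1, 0, 1, 1]
    d = [1, 0, 1, 0, 1, 0]
    print([msl.evaluate(c, x) for c, x in zip(clk, d)])
```

Q changes only on the samples where Clk rises (second and fifth), each time to the value D had just before the edge; the values D takes on those samples themselves (0 and 1) are ignored.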
Noise Robustness of MS Latch
[Figure: latch state node S, driven through input D by a distant driver, with numbered noise sources between V_DD and V_SS]
Sources of noise affecting the latch state node:
• Noise on input
• Leakage
• α-particles and cosmic rays
• Unrelated signal coupling
• Power supply ripple
(Partovi in Chandrakasan et al. 2001), Copyright © 2001 IEEE

Clocked CMOS (C2MOS) Latch
[Figure: transmission gate latch with gate isolation (dynamic) and C2MOS latch (dynamic), each with input D, clock Clk and output Q]

Clocked CMOS (C2MOS) MS Latch
[Figure: C2MOS master latch (D to QM) and slave latch (QM to Q), clocked by Clk and Clk1]
• State-keeping feedbacks outside the D-to-Q path
(Suzuki et al. 1973), Copyright © 1973 IEEE

MS Latches: Comparison
[Figure: delay (D) and race immunity (R); energy per cycle]
C2MOS has larger clock transistors:
- Smaller delay and race immunity (80% of MSL)
- Higher energy consumption (1.4x that of MSL)

Section: Flip-Flops

Hybrid Latch Flip-Flop (HLFF)
[Figure: HLFF schematic with input D, clocks Clk and Clk1, internal node S and output Q]
• Transparent to D only when Clk and Clk1 are both high
• Limited clock uncertainty absorption
• Small D-Q delay
• Small clock load
(Partovi et al. 1996), Copyright © 1996 IEEE

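The "transparent only when Clk and Clk1 are both high" condition turns the flip-flop into a short pulse-triggered latch. The sketch below assumes Clk1 is an inverted copy of Clk delayed by a few gate delays (as from an odd inverter chain); the slide does not define Clk1, and the function names are hypothetical. Time is sampled in integer steps.

```python
from __future__ import annotations


def delayed_inverted(clk: list[int], delay: int, initial: int = 0) -> list[int]:
    """Assumed Clk1: clk inverted and delayed by `delay` samples (hypothetical inverter chain)."""
    padded = [initial] * delay + clk  # clk was `initial` before the first sample
    return [1 - v for v in padded[: len(clk)]]


def transparency_window(clk: list[int], delay: int) -> list[int]:
    """1 where Clk and Clk1 are both high, i.e. where the flip-flop is transparent to D."""
    clk1 = delayed_inverted(clk, delay)
    return [c & c1 for c, c1 in zip(clk, clk1)]


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
    print(transparency_window(clk, delay=2))
```

With `delay=2`, the window is two samples wide and opens at each rising edge of Clk. A longer delay widens the window and, with it, the interval during which D must stay stable (hold time).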
Semidynamic Flip-Flop (SDFF)
[Figure: SDFF schematic with input D, clocks Clk and Clk1, internal nodes S and I, output Q]
• Dynamic-style first stage
• Fast, small clock load, logic embedding
• Consumes energy for evaluation whenever D=1
• Dynamic-to-static latch in second stage
• “Static 1” hazard
(Klass 1998), Copyright © 1998 IEEE

“Static 1” hazard in SDFF
• If D=Q=1 in the previous cycle, a race between Clk and S causes Q to switch falsely to 0 → a glitch is generated
• Also seen in HLFF

Sense-Amplifier Flip-Flop (SAFF)
• When Clk=0, S and R are high; Q and Q̄ are unchanged
• At the rising edge of Clk:
  - The sense amplifier in the 1st stage generates a “low” pulse on either S or R, depending on which of D and D̄ is higher
  - The other node (R or S) is driven high, preventing further changes
• The latch captures the low level of S or R and updates the output
[Figure: SAFF schematic with differential inputs D and D̄, clock Clk, first-stage outputs S and R, and a NAND-based latch driving Q and Q̄]
• Both NAND gates must switch sequentially to change Q and Q̄
Original design (Montanaro et al. 1996), Copyright © 1996 IEEE

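The two-stage operation described above can be sketched at the logic level. The sketch assumes ideal devices: the sense amplifier is reduced to a comparison of the D and D̄ voltages at the rising edge, S and R are active-low, and the second stage is the cross-coupled NAND latch. Function names and input voltages are hypothetical. In the settle loop, Q̄ is computed from the freshly updated Q, so a change ripples through one NAND gate and then the other, mirroring the sequential switching noted on the slide.

```python
from __future__ import annotations


def sense_amp(d: float, d_bar: float) -> tuple[int, int]:
    """Idealized first stage at the rising Clk edge: returns (S, R), both active-low.

    The node on the side of the higher input is pulled low; the other is driven high.
    """
    return (0, 1) if d > d_bar else (1, 0)


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def nand_sr_latch(s: int, r: int, q: int, q_bar: int) -> tuple[int, int]:
    """Second stage: cross-coupled NAND gates, iterated until the outputs settle.

    Q_bar is computed from the freshly updated Q, so a change ripples through
    one NAND gate and then the other.
    """
    for _ in range(4):
        q_new = nand(s, q_bar)
        q_bar_new = nand(r, q_new)
        if (q_new, q_bar_new) == (q, q_bar):
            break
        q, q_bar = q_new, q_bar_new
    return q, q_bar


if __name__ == "__main__":
    q, q_bar = 0, 1
    for d in (0.9, 0.2, 0.7):  # hypothetical input voltages, with D_bar = 1.1 - D
        s, r = sense_amp(d, 1.1 - d)
        q, q_bar = nand_sr_latch(s, r, q, q_bar)
        q, q_bar = nand_sr_latch(1, 1, q, q_bar)  # Clk=0: S=R=1, latch holds
        print(f"D={d}: S={s} R={r} -> Q={q} Q_bar={q_bar}")
```

When the latch is reset from Q=1, the loop first drives Q̄ high and only on the next pass pulls Q low, which is the sequential dependence that makes the original SAFF output stage slow.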
SAFF: Evolution of 2nd Stage Latch
[Figure: three second-stage latch variants, each driven by S and R with outputs Q and Q̄]
• All-n-MOS push-pull (Gieseke et al. 1991)
• Complementary push-pull (Oklobdzija and Stojanovic 2001)
• Complementary push-pull with gated keeper (Nikolic, Stojanovic, Oklobdzija, Jia, Chiu, Leung 1999)

Modified Sense Amplifier Flip-Flop (MSAFF)
• Sense amplifier in the 1st stage generates a “low” pulse on either S or R, depending on which of D and D̄ is higher
• Symmetric latch in the 2nd stage:
  - Outputs are simultaneously pulled to Vdd and Gnd → fast
  - Large drive capability → can be small
  - Keeper in the latch is active only when there is no change
  - No conflict
[Figure: M-SAFF schematic with differential inputs D and D̄, clock Clk, first-stage nodes S and R, outputs Q and Q̄]
(Nikolic et al. 1999), Copyright © 1999 IEEE

Flip-Flops and MS Latches: Delay Comparison (DQ)
[Figure: D-Q delay in FO4 (0 to 5.0) for MSL, C2MOS, HLFF, SDFF, SAFF and M-SAFF; CSE delay comparison (0.18 µm, high load)]
• MS latches are slow: positive setup time and two latches in the critical path
• SAFF is slow: it waits for one output to switch the other
• The fastest structures are simple flip-flops with negative setup time

Flip-Flops: Timing Comparisons with Voltage Scaling
[Figure: delay and internal race immunity versus supply voltage (0.25 µm, light load)]
Delay comparison:
- Relative delay reduces with supply voltage due to reduction of body effect
Internal race immunity comparison:
- Small race immunity, usually not a concern in critical paths

Flip-Flops and MS Latches: Energy Comparison
[Figure: energy in fJ (0 to 120) for MSL, C2MOS, HLFF, SDFF, SAFF and M-SAFF, broken into external clock, external data, internal clock and internal non-clock energy; CSE energy breakdown (0.18 µm, 50% activity, high load)]
• In MS latches, internal nodes change only when input D changes
• SAFF, M-SAFF: very small clock load, small 2nd-stage latch
• Most energy is consumed in HLFF and SDFF, due to the pulse generator and high internal switching activity

Flip-Flops and MS Latches: Energy Comparisons
[Figure: energy comparison (0.25 µm, light load)]
(Markovic et al. 2001), Copyright © 2001 IEEE

Section: CSEs with local clock gating

Gated Transmission Gate MS Latch
• Concept: inhibit clock switching when the new D equals Q
• comp = D XNOR Q
• If comp=0 (D≠Q), the circuit works as an MSL
• If comp=1 (D=Q), Clk=0 and Clk1=1 → latches closed, no output change, no internal power
[Figure: transmission gate MSL (D, SM, QM, SS, Q, QS) whose local clocks Clk and Clk1 are generated from the global clock and comp]
Gated MSL
(Markovic et al. 2001), Copyright © 2001 IEEE

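The gating rule can be expressed as a cycle-level sketch. It assumes one D sample per clock cycle and treats the MSL as an ideal edge-triggered element; the function name `gated_msl` is hypothetical, and the count of clocked cycles stands in for the internal clock energy that gating saves.

```python
from __future__ import annotations


def gated_msl(d_seq: list[int], q0: int = 0) -> tuple[list[int], int]:
    """Cycle-level model of the gated MSL (assumes one D sample per cycle, ideal MSL).

    Returns Q after each clock cycle and the number of cycles in which the
    local clocks (Clk, Clk1) actually toggled.
    """
    q = q0
    outputs: list[int] = []
    clocked_cycles = 0
    for d in d_seq:
        comp = 1 - (d ^ q)  # comp = D XNOR Q
        if comp == 0:
            # D != Q: local clocks run and the MSL captures D as usual.
            q = d
            clocked_cycles += 1
        # comp == 1: Clk held at 0 and Clk1 at 1, both latches stay closed.
        outputs.append(q)
    return outputs, clocked_cycles


if __name__ == "__main__":
    d_seq = [0, 1, 1, 1, 0, 0, 1, 0]
    outputs, clocked = gated_msl(d_seq)
    print(outputs, f"clocked in {clocked} of {len(d_seq)} cycles")
```

For this input, the outputs equal the input sequence, exactly as for an ungated MSL, but the local clocks toggle in only 4 of 8 cycles, the ones where D differs from Q.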
Gated TG MS Latch: Timing and Energy
[Figure: setup time (U) and hold time (H) comparison with MSL; energy comparison with MSL]
• Setup time increases in the gated MSL because the comparator is included in the critical path → slower than the conventional MSL
• Smaller energy per transition if the switching activity of D is < 0.3
• For higher switching activity, the comparator and clock generator dominate the energy consumption

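Why a break-even activity exists can be seen from a deliberately simplified linear energy model (an assumption for illustration, not the model used in the cited comparison). The conventional MSL spends clock energy E_clk every cycle plus E_data per data transition; the gated MSL spends comparator and clock-generator energy E_cmp every cycle, and E_clk + E_data + E_gen only in cycles where D changes, E_gen being hypothetical extra clock-generator switching. Equating the two gives α* = (E_clk − E_cmp)/(E_clk + E_gen). All numbers below are arbitrary.

```python
def e_msl(alpha: float, e_clk: float, e_data: float) -> float:
    """Conventional MSL (simplified model): clock every cycle, data energy per transition."""
    return e_clk + alpha * e_data


def e_gated(alpha: float, e_clk: float, e_data: float, e_cmp: float, e_gen: float) -> float:
    """Gated MSL (simplified model): comparator every cycle, the rest only when D changes."""
    return e_cmp + alpha * (e_clk + e_data + e_gen)


def break_even_activity(e_clk: float, e_cmp: float, e_gen: float) -> float:
    """Solve e_gated == e_msl for alpha.

    e_cmp + a*(e_clk + e_data + e_gen) = e_clk + a*e_data
    => a*(e_clk + e_gen) = e_clk - e_cmp
    """
    return (e_clk - e_cmp) / (e_clk + e_gen)


if __name__ == "__main__":
    # Arbitrary illustrative energies (same units), not measured values.
    e_clk, e_data, e_cmp, e_gen = 10.0, 5.0, 3.0, 10.0
    a_star = break_even_activity(e_clk, e_cmp, e_gen)
    print(f"break-even activity: {a_star:.2f}")
    for a in (0.1, a_star, 0.6):
        print(f"alpha={a:.2f}: MSL={e_msl(a, e_clk, e_data):.2f}, "
              f"gated={e_gated(a, e_clk, e_data, e_cmp, e_gen):.2f}")
```

With these arbitrary values the crossover lands at α = 0.35: below it the gated latch wins, above it the comparator and clock-generator overhead dominates, the same qualitative picture as the slide. In this model E_data cancels out, so only the clock-related terms set the crossover.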
Data-transition look-ahead latch (DTLA-L)
[Figure: pulsed latch (D, QM, Q) with a pulse generator (Clk, CPI, CP), clock control and a data-transition look-ahead circuit producing P1]
• Pulsed latch in which the generation of clock pulses is gated by an XOR DTLA circuit
• If D≠Q → P1=0: the circuit operates as a conventional pulsed latch
• If D=Q → P1=1 → CP=0: no output change and no energy consumption in the latch
• XOR circuit and clock control in the critical path → large setup time and D-Q delay
(Nogawa and Ohtomo 1998), Copyright © 1998 IEEE

DTLA-L: Analysis of Energy Consumption
[Figure: DTLA-L without clock gating, the conventional MS latch (CMSL) and the DTLA-L pulse generator; signals D, QM, Q, Clk, CP, CPI and clock-control input capacitance C_in(CC)]
[Equations: the energies of CMSL and DTLA-L are expressed through the transition energies E_01 and E_10, the idle energy E_Didle (formed from E_00 and E_11), the clock energy E_Clk, the input energy E_Cin, the latch energies E_DL, the internal and external energies E_int and E_ext, E_G, and the pulse-generator term E_PG/N]
• α – input switching activity
• Pulse generator shared among N DTLA-Ls

Energy comparison of DTLA-L and CMSL
[Figure: regions where E(DTLA-L) < E(CMSL) and where E(DTLA-L) > E(CMSL)]
• DTLA-L is more energy-efficient than CMSL when N > 2 and α < 0.25

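The role of N can be illustrated with a simplified linear model (an assumption for illustration, not the equations of the cited analysis): the shared pulse generator's per-cycle energy E_PG is divided among the N latches it drives, while each latch pays its own look-ahead XOR every cycle and a clock pulse only when D ≠ Q. The parameter names and values in `EnergyParams` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnergyParams:
    """Hypothetical per-cycle energies (arbitrary units), not taken from the slides."""
    e_clk: float    # CMSL clock energy, spent every cycle
    e_data: float   # energy of a data transition through the latch
    e_pg: float     # shared pulse generator, spent every cycle
    e_look: float   # per-latch XOR look-ahead, spent every cycle
    e_pulse: float  # local clock pulse, spent only when D != Q


def e_cmsl(alpha: float, p: EnergyParams) -> float:
    return p.e_clk + alpha * p.e_data


def e_dtla(n: int, alpha: float, p: EnergyParams) -> float:
    # Pulse generator cost is amortized over the N latches that share it.
    return p.e_pg / n + p.e_look + alpha * (p.e_pulse + p.e_data)


def dtla_wins(n: int, alpha: float, p: EnergyParams) -> bool:
    return e_dtla(n, alpha, p) < e_cmsl(alpha, p)


if __name__ == "__main__":
    params = EnergyParams(e_clk=10, e_data=5, e_pg=12, e_look=2, e_pulse=12)
    alphas = (0.05, 0.1, 0.25, 0.5)
    for n in (1, 2, 4, 8):
        row = ["<" if dtla_wins(n, a, params) else ">" for a in alphas]
        print(f"N={n}: {' '.join(row)}")  # "<" means E(DTLA-L) < E(CMSL)
```

With these arbitrary values, sharing the pulse generator among more latches enlarges the low-activity region in which DTLA-L wins, consistent in direction with the N > 2, α < 0.25 condition on the slide.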
Clock-on-demand PL (COD-PL)
• Pulsed latch in which the generation of clock pulses is gated by an XNOR DTLA circuit
• If D≠Q → XNOR=0: CP→1 when Clk rises, and CP→0 after Q has changed to D
• If D=Q → XNOR=1 → CP=0: no output change and no energy consumption in the latch
• The pulse generator includes the clock control
• It cannot be shared among multiple PLs
[Figure: COD-PL with XNOR data-transition look-ahead on D and Q, and a pulse generator driven by Clk producing CP]
(Hamada et al. 1999), Copyright © 1999 IEEE

Energy-Efficient Pulse Generator in COD-PL
• Straightforward implementation with CMOS gates:
  - C_int switches in each cycle
  - Energy-inefficient
• Compound AND-NOR gate:
  - Energy-efficient
[Figure: pulse generators driving CP from Clk, its inverted copy (inv) and the XNOR output, built from separate CMOS gates with internal node C_int, and as a compound XNOR AND-NOR gate; logic levels "0" and "1" marked on internal nodes]

Impact of circuit sizing on the energy efficiency of COD-PL
• COD-PL is more effective with high-speed sizing, due to its large clock transistors
(Markovic et al. 2001), Copyright © 2001 IEEE

Conditional capture flip-flop
• First stage: pulse generator with internal clock gating
  - When Clk=1, S=R=1
  - When Clk=1, Clk1=0: S can switch low if D=1, Q=0; R can switch low if D=0, Q=1
  - Otherwise, S=R=1 → no energy consumption
• Second stage: pass-gate implementation of the M-SAFF latch (Oklobdzija, Stojanovic)
• No setup time degradation due to clock gating
[Figure: CCFF schematic with inputs D and D̄, clocks Clk and Clk1, first-stage nodes S and R, outputs Q and Q̄]
(Kong et al. 2000), Copyright © 2000 IEEE

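The conditional-capture rule can be sketched at the cycle level. In the sketch, `window=1` stands for the evaluation interval stated on the slide (Clk=1, Clk1=0), and the second stage is idealized as a set/reset latch; the function names are hypothetical. Counting first-stage switching events shows where the energy saving comes from.

```python
from __future__ import annotations


def ccff_first_stage(window: int, d: int, q: int) -> tuple[int, int]:
    """Return (S, R). Both stay precharged high unless the output must change."""
    s, r = 1, 1
    if window:  # evaluation interval (Clk=1, Clk1=0 on the slide)
        if d == 1 and q == 0:
            s = 0  # set request
        elif d == 0 and q == 1:
            r = 0  # reset request
    return s, r


def ccff_cycle(d: int, q: int) -> tuple[int, bool]:
    """One clock cycle with an idealized second stage: new Q, and whether S or R switched."""
    s, r = ccff_first_stage(1, d, q)
    if s == 0:
        q = 1
    elif r == 0:
        q = 0
    return q, (s == 0 or r == 0)


def run(d_seq: list[int], q0: int = 0) -> tuple[list[int], int]:
    """Apply one D value per cycle; count cycles with first-stage switching."""
    q, events, outputs = q0, 0, []
    for d in d_seq:
        q, switched = ccff_cycle(d, q)
        events += int(switched)
        outputs.append(q)
    return outputs, events


if __name__ == "__main__":
    print(run([1, 1, 0, 0, 1]))
```

Over the sequence D = 1, 1, 0, 0, 1 starting from Q=0, the first stage switches in only three of five cycles; in the other two, S and R stay precharged high.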
Comparison of latches and flip-flops with local clock gating: Timing
[Figure: delay and internal race immunity versus supply voltage]
Delay comparison:
- Delay relatively constant with supply voltage
- Latches with clock gating have very large delay due to large setup time
Internal race immunity comparison:
- Generally R(FF) < R(MSL) < R(gated MSL)
- COD-PL has low race immunity due to its wide clock pulse
(Markovic et al. 2001), Copyright © 2001 IEEE

Comparison of latches and flip-flops with local clock gating: Energy, EDP
Energy comparison:
- Latches with gated clock consume less energy than MSL if α < 0.2–0.3
Energy-Delay Product comparison:
- α < 0.03 → G-MSL best
- 0.03 < α < 0.23 → DTLA-L best
- 0.23 < α → conventional MSL best
(Markovic et al. 2001), Copyright © 2001 IEEE

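The EDP ranges above amount to a simple selection rule by data activity α. The sketch encodes only the thresholds stated on the slide, which come from the cited comparison (its circuits, sizing and technology), so it is not a general design rule; assigning α exactly at a boundary to the upper range is an arbitrary choice, and the function name is hypothetical.

```python
def best_for_edp(alpha: float) -> str:
    """Lowest energy-delay product by data activity, per the ranges on this slide.

    Boundary values (exactly 0.03 or 0.23) go to the upper range; the slide
    does not specify which side they belong to.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("switching activity must be in [0, 1]")
    if alpha < 0.03:
        return "G-MSL"
    if alpha < 0.23:
        return "DTLA-L"
    return "MSL"


if __name__ == "__main__":
    for a in (0.01, 0.1, 0.5):
        print(a, best_for_edp(a))
```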
Section: Low clock swing

N-only clocked latches
[Figure: (a) N-MSL, (b) N-FF, (c) N-PL and (d) N-PPL, with input D, clocks Clk and CP, output Q, internal nodes QM, SM, SS, d1, N1, N2, and a pulse generator for (b)-(d)]
• Concept: bring the clock only to n-MOS transistors, to allow a reduced clock swing without conflict with partially turned-off p-MOS transistors
• Reduced clock swing reduces clocking energy, with some penalty in performance
• The clock is always in the critical path, as its edge signals when to change the output
(a) conventional TG MSL, (b) pulsed-latch, (c) conventional PL, (d) push-pull PL

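A first-order estimate shows where the savings come from. Charging a clock load C from 0 to V_swing draws charge C·V_swing from the supply that drives it, so the energy taken from that supply is C·V_swing·V_supply per cycle: C·V_DD² at full swing, C·V_swing·V_DD if the low swing is derived from V_DD, and C·V_swing² if a separate low supply drives the clock. The 100 fF load and the voltages below are hypothetical.

```python
def clock_energy(c_clk: float, v_swing: float, v_supply: float) -> float:
    """First-order energy drawn from the supply per clock cycle (J).

    Charging c_clk from 0 to v_swing takes charge c_clk * v_swing from a supply
    at v_supply; the energy drawn is that charge times v_supply.
    """
    return c_clk * v_swing * v_supply


if __name__ == "__main__":
    c = 100e-15            # hypothetical clock load, 100 fF
    vdd, vlow = 1.2, 0.6   # hypothetical supply and reduced swing, V
    full = clock_energy(c, vdd, vdd)
    cases = {
        "full swing": full,
        "low swing from VDD": clock_energy(c, vlow, vdd),
        "low swing, separate supply": clock_energy(c, vlow, vlow),
    }
    for name, e in cases.items():
        print(f"{name}: {e * 1e15:.1f} fJ ({e / full:.0%} of full swing)")
```

With these values the three cases come to about 144 fJ, 72 fJ and 36 fJ per cycle. Halving the swing saves half the clock energy when the swing is derived from V_DD, and three quarters with a dedicated low supply.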
Low clock swing CSEs comparison: energy and delay
• Full swing:
  - PL preferred for high speed
  - MSL preferred for low energy
• Low-swing clock:
  - N-FF preferred for high speed
  - N-PPL preferred for low energy
[Figure: normalized energy per cycle versus data-to-Q delay (FO3 inverter delays) for (a) high-Vdd clock (PL, N-FF, N-PPL, MSL) and (b) low-swing clock (N-FF, N-PPL, N-PL, N-MSL)]
130nm technology, 50fF load, max. input cap=12.5fF, data activity=0.1: (a) high-Vdd and (b) low-swing Clk

Effect of clock noise on low-swing clock latch delay
[Figure: Clk-Q delay degradation (0% to 20%) versus noise on the low-swing clock (0% to 12%) for N-CL, N-PPL and N-FF]
• All latches fail for clock noise > 12% of the clock voltage
• N-FF gives the best clock noise rejection

Section: Dual-edge triggering

DET Latch-mux circuit (DET-LM)
• Pass-gate latches:
  - One transparent when Clk=0
  - One transparent when Clk=1
• Pass-gate multiplexer that selects the output of the opaque latch
[Figure: DET-LM schematic with input D, clock Clk and output Q]
(Llopis and Sachdev 1996), Copyright © 1996 IEEE

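The latch-mux principle can be sketched at the logic level: two level-sensitive latches on opposite clock phases, with the multiplexer passing whichever latch is currently opaque. The sketch assumes ideal pass-gate latches and one sample per half clock period; the class names are hypothetical.

```python
class Latch:
    """Ideal level-sensitive latch: transparent while enable=1."""

    def __init__(self) -> None:
        self.q = 0

    def evaluate(self, enable: int, d: int) -> int:
        if enable:
            self.q = d
        return self.q


class DetLatchMux:
    """Dual-edge triggered latch-mux: two latches on opposite phases plus a mux."""

    def __init__(self) -> None:
        self.low_latch = Latch()   # transparent while clk=0
        self.high_latch = Latch()  # transparent while clk=1

    def evaluate(self, clk: int, d: int) -> int:
        a = self.low_latch.evaluate(1 - clk, d)
        b = self.high_latch.evaluate(clk, d)
        # The mux passes the opaque latch: low_latch when clk=1, high_latch when clk=0.
        return a if clk else b


if __name__ == "__main__":
    det = DetLatchMux()
    clk = [0, 1, 1, 0, 0, 1]
    d = [1, 0, 0, 1, 0, 0]
    print([det.evaluate(c, x) for c, x in zip(clk, d)])
```

Q updates at both the rising edge (second sample) and the falling edge (fourth sample), each time to the D value just before that edge. Capturing on both edges is why, in the comparisons that follow, the DET circuits run at 250 MHz against 500 MHz for the SET ones at the same data rate.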
C2MOS Latch-mux (C2MOS-LM)
• C2MOS latches:
  - One transparent when Clk=1
  - One transparent when Clk=0
• Multiplexer: two C2MOS inverters that propagate the output of the opaque latch
• Large clock transistors shared between the latches and the multiplexer
[Figure: C2MOS-LM schematic with input D, clock Clk, output Q and shared internal nodes N2-N5]
(Gago et al. 1993), Copyright © 1993 IEEE

Pulsed-latch (DET-PL)
[Figure: (a) single-edge and (b) dual-edge triggered pulsed latch; signals D, Q, Clk, Clk1, Clk2]
• Pulse generator transparent to D only when Clk=Clk1=1, or when Clk=Clk2=1 → shortly after both edges of the clock
• DET PL consumes a lot of energy in its four clocked pass gates
• To improve speed, the design is modified from the original (Strollo et al. 1999), which used an n-MOS-only pass gate and a p-MOS-only keeper

DET Symmetric pulse generator flip-flop (SPGFF)
• Two pulse generators: X active at the rising edge of the clock, Y active at the falling edge of the clock
• SX and SY alternately precharge and evaluate
• At any moment, one of SX and SY keeps the value of data sampled at the most recent clock edge
• The other (SX or SY) is precharged high
[Figure: SPGFF with 1st stage X, 1st stage Y and 2nd stage; signals D, Clk, Clk1, Clk2, SX, SY, Q]
• Pulses at SX and SY have the same width as the clock
• Second stage is a simple NAND gate → no need for a latch

SET vs. DET: Delay comparison
[Figure: delay in FO4 (0 to 6), single-edge (SE) vs. dual-edge (DE), for MSL/LM, C2MOS-LM, PL and SPGFF]
• Latch-MUXes have two equally critical paths, somewhat shorter than that of the MSL
• DET PL is more complex, adding more capacitance to the critical path compared to SET PL
• SPGFF has a short domino-like critical path → fastest

SET vs. DET: Power consumption comparison
[Figure: power in µW (0 to 180), split into clock and non-clock components, for MSL, LM, PL SE, PL DE, C2MOS SE, C2MOS DE and SPGFF (0.18 µm, 500MHz for SET, 250MHz for DET, high load)]
• LMs benefit from a clever implementation of the latch-mux structure with shared clock transistors
• DET PL adds extra high-activity capacitance compared to SET PL
• SPGFF power consumption is in the middle, mainly due to the alternate switching of nodes SX and SY

SET vs. DET: EDP comparison
[Figure: EDP [fJ/500MHz], [fJ/250MHz] (0 to 60), single edge vs. double edge, for MSL/LM, C2MOS, PL and SPGFF (0.18 µm, 500MHz for SET, 250MHz for DET, high load)]
• Latch-MUXes have similar or better EDP than their SET counterparts
• DET PL exhibits worse delay and energy than SET PL, due to its more complex design
• SPGFF is fastest with moderate energy consumption: lowest EDP
• EDP(SPGFF) < EDP(LM) < EDP(PL)