3. Performance of CMOS Circuits
Performance of CMOS
Circuits
Instructed by Shmuel Wimer
Eng. School, Bar-Ilan University
Credits: David Harris
Harvey Mudd College
(Some material copied/taken/adapted from
Harris’ lecture notes)
Dec 2010
Performance of CMOS Circuits
1
Outline
Gate and Diffusion Capacitance
RC Delay Models
Power and Energy
Dynamic Power
Static Power
Low Power Design
MOSFET Capacitance
[Figure: MOSFET capacitances: gate to source, gate to drain, and gate to substrate]
Any two conductors separated by an insulator have
capacitance
Gate to channel capacitor is very important
– Creates channel charge necessary for operation
Source and drain have capacitance to body
– Across reverse-biased diodes
– Called diffusion capacitance because it is
associated with source/drain diffusion
Gate Capacitance
Approximate channel as connected to source
Cgs = εox·W·L/tox = Cox·W·L = Cpermicron·W
Cpermicron is typically about 2 fF/μm
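As a rough numerical check of the parallel-plate formula, the sketch below evaluates Cpermicron = Cox·L. The channel length and oxide thickness are hypothetical values chosen only for illustration; they are not taken from the slides.

```python
EPS0 = 8.854e-12      # permittivity of free space, F/m
EPS_OX = 3.9 * EPS0   # SiO2 permittivity (eps_ox = 3.9 eps_0, as on the slide)

def gate_cap_per_micron(length_m, tox_m):
    """Return Cpermicron = Cox * L in fF per micron of gate width."""
    cox = EPS_OX / tox_m             # oxide capacitance per unit area, F/m^2
    c_per_metre = cox * length_m     # F per metre of gate width
    return c_per_metre * 1e15 * 1e-6 # convert F/m to fF/um

# Assumed (hypothetical) process: L = 0.18 um, tox = 4 nm
c_um = gate_cap_per_micron(length_m=0.18e-6, tox_m=4e-9)
print(f"Cpermicron ~ {c_um:.2f} fF/um")
```

With these assumed dimensions the result lands in the same range as the 2 fF/μm rule of thumb.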
[Figure: nMOS cross-section showing the polysilicon gate (width W, length L), SiO2 gate oxide of thickness tox (good insulator, εox = 3.9 ε0), n+ source and drain, and p-type body]
Accumulation occurs when Vg is negative (for P material). Holes are induced under the oxide.
Cgate = Cox·A, where Cox = εSiO2·ε0/tox
Depletion occurs
when Vg is near zero
but < Vtn. Here the
Cgate is given by CoxA
in series with
depletion layer
capacitance Cdep
Inversion occurs when Vg is positive and > Vtn (for P material). A model for inversion is comprised of Cox·A connecting from gate to channel and Cdep connecting from channel to substrate.
Normalized gate
capacitance versus
Gate voltage Vgs.
High freq behavior
is due to the
distributed
resistance of
channel
Normalized Experimental MOS Gate Capacitance
Measurements vs Vds, Vgs
For Vds = 0, the total gate capacitance Cox A splits
equally to the drain and source of the transistor.
For Vds > 0, the gate capacitance tilts more toward
the source and becomes roughly 2/3 CoxA to the
source and 0 to the drain for high Vds.
Higher Vgs – Vt forces this tilting to occur later,
since the device is linear up to Vgs – Vt = Vds.
MOS Transistor Gate Capacitance Model
Gate capacitance has different components in different modes, but total
remains constant.
Diffusion Capacitance
Csb, Cdb
Undesirable, called parasitic capacitance
Capacitance depends on area and perimeter
– Use small diffusion nodes
– Comparable to Cg for contacted diffusion
– ½ Cg for uncontacted
– Varies with process
Diffusion Capacitance (Cont’d)
[Figure: diffusion layout alternatives, labeled from worst to best]
Effective Resistance
Shockley models have limited value
– Not accurate enough for modern transistors
– Too complicated for much hand analysis
Simplification: treat transistor as resistor
– Replace Ids(Vds, Vgs) with effective resistance R
• Ids = Vds/R
– R averaged across switching of digital gate
Too inaccurate to predict current at any given time
– But good enough to predict RC delay
RC Delay Model
Use equivalent circuits for MOS transistors
– Ideal switch + capacitance and ON resistance
– Unit nMOS has resistance R, capacitance C
– Unit pMOS has resistance 2R, capacitance C
Capacitance proportional to width
Resistance inversely proportional to width
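[Figure: equivalent circuits for transistors of width k. nMOS: resistance R/k between source and drain, with capacitance kC on the gate, source, and drain. pMOS: resistance 2R/k between source and drain, with capacitance kC on the gate, source, and drain.]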
RC Values
Capacitance
– C = Cg = Cs = Cd = 2 fF/μm of gate width
– Values similar across many processes
Resistance
– R ≈ 6 kΩ·μm in 0.6 μm process
– Improves with shorter channel lengths
Unit transistors
– May refer to minimum contacted device (4/2 λ)
– Or maybe 1 μm wide device
– Doesn’t matter as long as you are consistent
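The width scaling can be sketched numerically. The snippet below takes the slide's figures (6 kΩ·μm, 2 fF/μm) as given and shows that the RC product of a single nMOS does not depend on its width; the function name is illustrative only.

```python
R_OHM_UM = 6e3       # effective nMOS resistance times width, ohm*um (slide value)
C_FF_PER_UM = 2.0    # gate, source, or drain capacitance per um of width (slide value)

def nmos_rc(width_um):
    """Return (R in ohms, C in farads) for an nMOS of the given width."""
    r = R_OHM_UM / width_um                # resistance falls with width
    c = C_FF_PER_UM * 1e-15 * width_um     # capacitance grows with width
    return r, c

for w in (1.0, 2.0, 8.0):
    r, c = nmos_rc(w)
    print(f"W = {w:4.1f} um  R = {r:7.1f} ohm  C = {c * 1e15:5.1f} fF  RC = {r * c * 1e12:.1f} ps")
```

Because R scales as 1/W and C as W, widening a transistor only speeds up a gate when the load it drives does not grow along with it.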
Inverter Delay Estimate
Estimate the delay of a fanout-of-1 inverter
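[Figure: inverter (pMOS width 2, nMOS width 1) with input A and output Y driving an identical inverter, and its equivalent circuit: resistance R, with 2C and C annotated on the pMOS and nMOS terminals]
d = 6RC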
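The 6RC figure can be reproduced by tallying the capacitance on the output node. The sketch below assumes the unit inverter of the RC model (pMOS width 2, nMOS width 1) and generalizes to h identical loads; the generalization to h > 1 is an extension for illustration, not something stated on the slide.

```python
def inverter_delay_in_rc(h=1):
    """Delay, in units of RC, of a unit inverter driving h identical inverters."""
    p_width, n_width = 2, 1
    c_diffusion = p_width + n_width   # driver's drain capacitances on Y: 2C + C
    c_gates = h * (p_width + n_width) # gate capacitance of each load: 2C + C
    r_drive = 1                       # nMOS: R/1; pMOS: 2R/2; both equal R
    return r_drive * (c_diffusion + c_gates)

print(inverter_delay_in_rc(1))  # fanout-of-1 case from the slide
```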
Transient Response
DC analysis tells us Vout if Vin is constant
Transient analysis tells us Vout(t) if Vin(t) changes
– Requires solving differential equations
Input is usually considered to be a step or ramp
– From 0 to VDD or vice versa
Inverter Step Response
Ex: find step response of inverter driving load cap
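$V_{in}(t) = u(t - t_0)\,V_{DD}$
$V_{out}(t < t_0) = V_{DD}$
$\frac{dV_{out}(t)}{dt} = -\frac{I_{dsn}(t)}{C_{load}}$
$I_{dsn}(t) = \begin{cases} 0 & t \le t_0 \\ \frac{\beta}{2}\left(V_{DD} - V_t\right)^2 & V_{out} > V_{DD} - V_t \\ \beta\left(V_{DD} - V_t - \frac{V_{out}(t)}{2}\right)V_{out}(t) & V_{out} < V_{DD} - V_t \end{cases}$
[Figure: inverter with input Vin(t), output Vout(t), load Cload, and pull-down current Idsn(t)]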
[Figure: Vin(t) stepping at t0 and the resulting Vout(t) of the inverter driving Cload, versus t]
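The falling step response can be integrated numerically. The following is a minimal forward-Euler sketch using the Shockley (long-channel) nMOS current, with the input assumed to have already stepped to VDD at t = 0. All parameter values (VDD, Vt, beta, Cload, time step) are hypothetical and chosen only for illustration.

```python
def ids_n(vout, vdd, vt, beta):
    """Shockley nMOS current with Vgs = VDD (input already high)."""
    vov = vdd - vt
    if vout > vov:                           # saturation region
        return 0.5 * beta * vov ** 2
    return beta * (vov - vout / 2.0) * vout  # linear region

def step_response(vdd=1.8, vt=0.4, beta=200e-6, cload=10e-15,
                  dt=1e-13, t_end=1e-9):
    """Integrate dVout/dt = -Idsn/Cload starting from Vout = VDD."""
    t, vout = 0.0, vdd
    trace = [(t, vout)]
    while t < t_end:
        vout -= ids_n(vout, vdd, vt, beta) / cload * dt
        t += dt
        trace.append((t, vout))
    return trace

def time_to_cross(trace, level):
    """First time at which the output is at or below the given level."""
    for t, v in trace:
        if v <= level:
            return t
    return None

trace = step_response()
t50 = time_to_cross(trace, 0.9)  # 50% of the assumed VDD = 1.8 V
print(f"output falls to VDD/2 after about {t50 * 1e12:.0f} ps")
```

The output first falls linearly while the transistor is saturated, then decays roughly exponentially once it enters the linear region, which is why a first-order RC approximation works reasonably well.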
Delay Definitions
[Figure: input and output waveforms marking the rising delay (low-to-high propagation delay) and the falling delay (high-to-low propagation delay)]
Delay Definitions (Cont’d)
tpdr: rising propagation delay
– Maximum time from input crossing 50% to rising
output crossing 50%
tpdf: falling propagation delay
– Maximum time from input crossing 50% to falling
output crossing 50%
tpd: average propagation delay
– tpd = (tpdr + tpdf)/2
tr: rise time
– From output crossing 0.2 VDD to 0.8 VDD
Delay Definitions (Cont’d)
tf: fall time
– From output crossing 0.8 VDD to 0.2 VDD
tcdr: rising contamination delay
– Minimum time from input crossing 50% to rising
output crossing 50%
tcdf: falling contamination delay
– Minimum time from input crossing 50% to falling
output crossing 50%
tcd: average contamination delay
– tcd = (tcdr + tcdf)/2
Simulated Inverter Delay
Solving differential equations by hand is too hard
SPICE simulator solves the equations numerically
– Uses more accurate I-V models too!
But simulations take time to write
[Figure: simulated Vin and Vout (V) versus t (s) from 0 to 1 ns; tpdf = 66 ps, tpdr = 83 ps]
Delay Estimation
We would like to be able to easily estimate delay
– Not as accurate as simulation
– But easier to ask “What if?”
The step response usually looks like a 1st order RC
response with a decaying exponential.
Use RC delay models to estimate delay
– C = total capacitance on output node
– Use effective resistance R
– So that tpd = RC
Characterize transistors by finding their effective R
– Depends on average current as gate switches
Example: 3-input NAND
Sketch a 3-input NAND with transistor widths
chosen to achieve effective rise and fall resistances
equal to a unit inverter (R).
[Figure: 3-input NAND with three parallel pMOS transistors of width 2 and three series nMOS transistors of width 3]
In the worst case for the pMOS network, only one device is ON.
For the nMOS network, all devices must be ON.
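The sizing argument can be checked with the unit-transistor model (nMOS resistance R/k, pMOS resistance 2R/k for width k). The sketch below is an illustration with hypothetical function names; resistances are expressed in units of R.

```python
def nand_widths(n_inputs):
    """Widths giving a NAND the same worst-case drive as a unit inverter."""
    return {"pmos": 2, "nmos": n_inputs}

def pulldown_resistance(n_inputs):
    """All n series nMOS conduct: n * (R / width)."""
    return n_inputs * 1.0 / nand_widths(n_inputs)["nmos"]

def worst_case_pullup_resistance(n_inputs):
    """Only one of the parallel pMOS conducts: 2R / width."""
    return 2.0 / nand_widths(n_inputs)["pmos"]

print(nand_widths(3), pulldown_resistance(3), worst_case_pullup_resistance(3))
```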
3-input NAND Caps
Annotate the 3-input NAND gate with gate and
diffusion capacitance.
[Figure: 3-input NAND with 2C annotated on the gate, source, and drain of each pMOS (width 2) and 3C on the gate, source, and drain of each nMOS (width 3)]
3-input NAND Caps (Cont’d)
[Figure: 3-input NAND with lumped capacitances: 5C on each input, 9C on the output, and 3C on each of the two internal nodes of the nMOS stack]
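The lumped values follow from adding up the per-terminal capacitances, with a transistor of width k contributing kC per terminal. A small sketch, assuming the NAND sizing used above (pMOS width 2, nMOS width equal to the number of inputs); the function name is illustrative.

```python
def nand_lumped_caps(n_inputs):
    """Lumped capacitances of an n-input NAND, in units of C."""
    wp, wn = 2, n_inputs
    return {
        "per_input": wp + wn,         # one pMOS gate plus one nMOS gate
        "output": n_inputs * wp + wn, # every pMOS drain plus the top nMOS drain
    }

print(nand_lumped_caps(3))
```

The same tally for a 2-input NAND gives 4C per input and 6C on the output.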
Elmore Delay
ON transistors look like resistors
Pullup or pulldown network modeled as RC ladder
Elmore delay of RC ladder
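$t_{pd} \approx \sum_{\text{nodes } i} R_{i\text{-to-source}}\,C_i = R_1 C_1 + (R_1 + R_2)\,C_2 + \dots + (R_1 + R_2 + \dots + R_N)\,C_N$
[Figure: RC ladder with series resistors R1, R2, R3, ..., RN and capacitors C1, C2, C3, ..., CN to ground at successive nodes]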
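The ladder formula maps directly onto a running sum of resistance. A minimal sketch (function name and example values are illustrative):

```python
def elmore_ladder(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] is the resistor leading into node i (counted from the
    source) and capacitances[i] is the capacitance to ground at node i.
    """
    delay = 0.0
    r_to_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_source += r          # total resistance between source and this node
        delay += r_to_source * c  # this node's contribution
    return delay

# R1*C1 + (R1+R2)*C2 + (R1+R2+R3)*C3 with R = 1, 2, 3 and C = 1, 1, 1
print(elmore_ladder([1, 2, 3], [1, 1, 1]))
```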
For a step input Vin, the delay at any node i of an RC tree can be estimated with the Elmore delay equation
$t_{Di} = \sum_j C_j \sum_{k \in \text{path}(i) \cap \text{path}(j)} R_k$
where the inner sum runs over the resistors shared by the paths from the source to node i and to node j.
For example, the Elmore delay at node 7 is given by:
R1 (C1 + C2 + C3 + C4 + C5 + C6 + C7 + C8) +
R6 (C6 + C7 + C8) +
R7 (C7 + C8)
Example: 2-input NAND
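Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates.
[Figure: 2-input NAND (pMOS width 2, nMOS width 2) with 6C on output Y, 2C on internal node x, and 4hC of load from h copies; rising equivalent circuit: resistance R charging (6+4h)C]
$t_{pdr} = (6 + 4h)RC$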
Example: 2-input NAND
Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates.
[Figure: falling equivalent circuit: two series resistances R/2, with 2C on internal node x and (6+4h)C on output Y]
$t_{pdf} = (2C)\left(\frac{R}{2}\right) + \left[(6 + 4h)C\right]\left(\frac{R}{2} + \frac{R}{2}\right) = (7 + 4h)RC$
Delay Components
Delay has two parts
– Parasitic delay
• 6 or 7 RC
• Independent of load
– Effort delay
• 4h RC
• Proportional to load capacitance
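The split into parasitic and effort delay can be made explicit by evaluating the 2-input NAND delays at h = 0 and subtracting. A small sketch in units of RC, using the equivalent circuits of the example (function names are illustrative):

```python
def nand2_tpdr(h):
    """Rising delay: one pMOS (2R/2 = R) charges (6 + 4h)C."""
    return 1.0 * (6 + 4 * h)

def nand2_tpdf(h):
    """Falling delay via Elmore: 2C behind R/2, (6 + 4h)C behind R/2 + R/2."""
    return 0.5 * 2 + (0.5 + 0.5) * (6 + 4 * h)

def delay_components(delay_fn, h):
    """Return (parasitic, effort): the h = 0 delay and the load-dependent rest."""
    parasitic = delay_fn(0)
    return parasitic, delay_fn(h) - parasitic

for h in (1, 4):
    print(h, delay_components(nand2_tpdr, h), delay_components(nand2_tpdf, h))
```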
Contamination Delay
Best-case (contamination) delay can be substantially
less than propagation delay.
Ex: If both inputs fall simultaneously
[Figure: 2-input NAND with both pMOS transistors ON; equivalent circuit: two resistances R in parallel charging (6+4h)C on output Y]
$t_{cdr} = (3 + 2h)RC$
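The contamination delay follows from the same RC model with both pull-up transistors conducting in parallel. A short sketch in units of RC (function names are illustrative):

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def nand2_tcdr(h):
    """Both pMOS ON: R in parallel with R charges (6 + 4h)C."""
    return parallel(1.0, 1.0) * (6 + 4 * h)

def nand2_tpdr(h):
    """Worst case for comparison: a single pMOS (R) charges (6 + 4h)C."""
    return 1.0 * (6 + 4 * h)

for h in (1, 4):
    print(h, nand2_tcdr(h), nand2_tpdr(h))
```

In this model the best-case rising delay is exactly half the worst-case rising delay for any fanout h.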
Diffusion Capacitance
We assumed contacted diffusion on every source/drain
Good layout minimizes diffusion area
Ex: NAND3 layout shares one diffusion contact
– Reduces output capacitance by 2C
– Merged uncontacted diffusion might help too
[Figure: NAND3 layout and schematic (pMOS width 2, nMOS width 3) labeling shared contacted diffusion, isolated contacted diffusion, and merged uncontacted diffusion; annotated capacitances: 7C on the output and 3C on the remaining nodes]
Layout Comparison
Layout representation by stick diagram. Which circuit is this?
Which layout is better?
[Figure: two stick-diagram layouts, each with VDD and GND rails, inputs A and B, and output Y]
Power and Energy
Power is drawn from a voltage source attached to
the VDD pin(s) of a chip.
Instantaneous power: $P(t) = i_{DD}(t)\,V_{DD}$
Energy: $E = \int_0^T P(t)\,dt = \int_0^T i_{DD}(t)\,V_{DD}\,dt$
Average power: $P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt$
Dynamic Power
Dynamic power is required to charge and discharge
load capacitances when transistors switch
One cycle involves a rising and falling output
On rising output, charge Q = CVDD is required
On falling output, charge is dumped to GND
This repeats T·fsw times over an interval of T
[Figure: supply VDD delivering iDD(t) to a gate that switches at fsw and drives load C]
$P_{dynamic} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^2 f_{sw}$
Activity Factor
Suppose the system clock frequency = f
Let fsw = αf, where α = activity factor
– If the signal is a clock, α = 1
– If the signal switches once per cycle, α = ½
– Static gates:
• Depends on design, but typically α = 0.1
– Dynamic gates:
• Switch either 0 or 2 times per cycle, α = ½
Dynamic power: $P_{dynamic} = \alpha C V_{DD}^2 f$
Short Circuit Current
When transistors switch, both nMOS and pMOS
networks may be momentarily ON at once
Leads to a blip of “short circuit” current.
< 10% of dynamic power if rise/fall times are
comparable for input and output
Power Dissipation Sources
Ptotal = Pdynamic + Pstatic
Dynamic power: Pdynamic = Pswitching + Pshortcircuit
– Switching load capacitances
– Short-circuit current
Static power: Pstatic = (Isub + Igate + Ijunct + Icontention)VDD
– Sub-threshold leakage
– Gate leakage
– Junction leakage
– Contention current
Dynamic Power Example
1 billion transistor chip
– 50M logic transistors
• Average width: 12 λ
• Activity factor = 0.1
– 950M memory transistors
• Average width: 4 λ
• Activity factor = 0.02
– 1.0 V 65 nm process
– C = 1 fF/μm (gate) + 0.8 fF/μm (diffusion)
Estimate dynamic power consumption @ 1 GHz.
Neglect wire capacitance and short-circuit current.
Power Estimate Ex (Cont’d)
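$C_{logic} = (50 \times 10^6)(12\lambda)(0.025\ \mu\text{m}/\lambda)(1.8\ \text{fF}/\mu\text{m}) = 27\ \text{nF}$
$C_{mem} = (950 \times 10^6)(4\lambda)(0.025\ \mu\text{m}/\lambda)(1.8\ \text{fF}/\mu\text{m}) = 171\ \text{nF}$
$P_{dynamic} = \left[0.1\,C_{logic} + 0.02\,C_{mem}\right](1.0\ \text{V})^2(1.0\ \text{GHz}) = 6.1\ \text{W}$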
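The arithmetic of this estimate can be reproduced in a few lines. The sketch below uses only the numbers given in the example; variable and function names are illustrative.

```python
LAMBDA_UM = 0.025        # um per lambda in the 65 nm process
C_FF_PER_UM = 1.0 + 0.8  # gate plus diffusion capacitance, fF/um
VDD = 1.0                # volts
FREQ = 1.0e9             # hertz

def total_cap_farads(n_transistors, avg_width_lambda):
    """Total switched capacitance of a group of transistors, in farads."""
    width_um = n_transistors * avg_width_lambda * LAMBDA_UM
    return width_um * C_FF_PER_UM * 1e-15

c_logic = total_cap_farads(50e6, 12)
c_mem = total_cap_farads(950e6, 4)
p_dynamic = (0.1 * c_logic + 0.02 * c_mem) * VDD ** 2 * FREQ

print(f"Clogic = {c_logic * 1e9:.0f} nF, Cmem = {c_mem * 1e9:.0f} nF, "
      f"Pdynamic = {p_dynamic:.1f} W")
```

Changing the activity factors or VDD here is a quick way to see which term dominates the total.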
Dynamic Power Reduction
$P_{switching} = \alpha C V_{DD}^2 f$
Try to minimize:
– Activity factor
– Capacitance
– Supply voltage
– Frequency
Activity Factor Estimation
Let Pi = Prob(node i = 1)
αi = Pi (1 − Pi)
Completely random data has P = 0.5 and α = 0.25
Data is often not completely random
– e.g. upper bits of 64-bit words representing bank
account balances are usually 0
Data propagating through ANDs and ORs has lower
activity factor
– Depends on design, but typically α ≈ 0.1
Switching Probability
Example
A 4-input AND is built out of two levels of gates
Estimate the activity factor at each node if the inputs
have P = 0.5
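A sketch of the estimate, using αi = Pi (1 − Pi) and assuming the inputs are statistically independent. The slide does not say which gates form the two levels; the code below assumes two NAND2 gates feeding a NOR2, which is one common way to build a 4-input AND. That structure and the function names are assumptions made for illustration.

```python
def activity(p):
    """Activity factor of a node that is 1 with probability p."""
    return p * (1.0 - p)

def p_nand2(pa, pb):
    """Output is 0 only when both independent inputs are 1."""
    return 1.0 - pa * pb

def p_nor2(pa, pb):
    """Output is 1 only when both independent inputs are 0."""
    return (1.0 - pa) * (1.0 - pb)

p_in = 0.5
p_n1 = p_nand2(p_in, p_in)  # first-level NAND outputs
p_n2 = p_nand2(p_in, p_in)
p_y = p_nor2(p_n1, p_n2)    # final output: AND of all four inputs

for name, p in (("input", p_in), ("NAND output", p_n1), ("Y", p_y)):
    print(f"{name}: P = {p}, alpha = {activity(p)}")
```

The activity factor falls at each level, consistent with the observation that data propagating through ANDs and ORs has a lower activity factor.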
Clock Gating
The best way to reduce the activity is to turn
off the clock to registers in unused blocks
– Saves clock activity (α = 1)
– Eliminates all switching activity in the block
– Requires determining if block will be used
Capacitance
Gate capacitance
– Fewer stages of logic
– Small gate sizes
Wire capacitance
– Good floorplanning to keep communicating
blocks close to each other
– Drive long wires with inverters or buffers rather
than complex gates
Voltage / Frequency
Run each block at the lowest possible
voltage and frequency that meets
performance requirements
Voltage Domains
– Provide separate supplies to different blocks
– Level converters required when crossing
from low to high VDD domains
Dynamic Voltage Scaling
– Adjust VDD and f according to
workload
Static Power
Static power is consumed even when chip is
quiescent
– Ratioed circuits burn power in fight between ON
transistors. Occurs when output is low (0).
– Leakage draws power from nominally OFF
devices
Static Power Example
Revisit power estimation for 1 billion transistor chip
Estimate static power consumption
– Subthreshold leakage
• Normal Vt: 100 nA/μm
• High Vt: 10 nA/μm
• High Vt used in all memories and in 95% of logic gates
– Gate leakage: 5 nA/μm
– Junction leakage: negligible
Solution
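$W_{normal\text{-}V_t} = (50 \times 10^6)(12\lambda)(0.025\ \mu\text{m}/\lambda)(0.05) = 0.75 \times 10^6\ \mu\text{m}$
$W_{high\text{-}V_t} = \left[(50 \times 10^6)(12\lambda)(0.95) + (950 \times 10^6)(4\lambda)\right](0.025\ \mu\text{m}/\lambda) = 109.25 \times 10^6\ \mu\text{m}$
$I_{sub} = \left[W_{normal\text{-}V_t} \times 100\ \text{nA}/\mu\text{m} + W_{high\text{-}V_t} \times 10\ \text{nA}/\mu\text{m}\right]/2 = 584\ \text{mA}$
$I_{gate} = \left[\left(W_{normal\text{-}V_t} + W_{high\text{-}V_t}\right) \times 5\ \text{nA}/\mu\text{m}\right]/2 = 275\ \text{mA}$
$P_{static} = (584\ \text{mA} + 275\ \text{mA})(1.0\ \text{V}) = 859\ \text{mW}$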
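The solution can be checked numerically. The sketch below follows the equations as written, including the division by 2, which is taken directly from the solution (it presumably reflects that only about half of the transistors are leaking at any time; that reading is an assumption). Variable names are illustrative.

```python
LAMBDA_UM = 0.025  # um per lambda
VDD = 1.0          # volts

# Total transistor widths in um
w_normal = 50e6 * 12 * LAMBDA_UM * 0.05
w_high = (50e6 * 12 * 0.95 + 950e6 * 4) * LAMBDA_UM

# Leakage currents in amperes (per-um figures are given in nA/um)
i_sub = (w_normal * 100 + w_high * 10) / 2 * 1e-9
i_gate = (w_normal + w_high) * 5 / 2 * 1e-9

p_static = (i_sub + i_gate) * VDD
print(f"Isub = {i_sub * 1e3:.0f} mA, Igate = {i_gate * 1e3:.0f} mA, "
      f"Pstatic = {p_static * 1e3:.0f} mW")
```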
Leakage Control
Leakage and delay trade off
– Aim for low leakage in sleep and low delay in
active mode
To reduce leakage:
– Increase Vt: multiple Vt
• Use low Vt only in critical circuits
– Increase Vs: stack effect
• Input vector control in sleep
– Decrease Vb
• Reverse body bias in sleep
• Or forward body bias in active mode
Gate Leakage
Extremely strong function of tox and Vgs
– Negligible for older processes
– Approaches subthreshold leakage at 65 nm and
below in some processes
An order of magnitude less for pMOS than nMOS
Control leakage in the process using tox > 10.5 Å
– High-k gate dielectrics help
– Some processes provide multiple tox
• e.g. thicker oxide for 3.3 V I/O transistors
Control leakage in circuits by limiting VDD
Power Gating
Turn OFF power to blocks when they are idle to
save leakage
– Use virtual VDD (VDDV)
– Gate outputs to prevent
invalid logic levels to next block
Voltage drop across sleep transistor degrades
performance during normal operation
– Size the transistor wide enough to minimize
impact
Switching wide sleep transistor costs dynamic power
– Only justified when circuit sleeps long enough
Low Power Design
Reduce dynamic power
– α: clock gating, sleep mode
– C: small transistors (esp. on clock), short wires
– VDD: lowest suitable voltage
– f: lowest suitable frequency
Reduce static power
– Selectively use ratioed circuits
– Selectively use low Vt devices
– Leakage reduction:
stacked devices, body bias, low temperature