Power Consumption by Integrated Circuits

Lin Zhong
ELEC518, Spring 2011
Power consumption of processing
• Dynamic power
Busy power vs. delay vs. energy
$P_{dyn} = a \cdot C \cdot V_{dd}^2 \cdot f$
$t \propto \frac{V_{dd}}{(V_{dd} - V_T)^2}$
Analysis and Design of Digital ICs, Hodges et al
Core 2 Duo, for example
• Intel® Core™2 Duo processor
– T7800 at 2.6 GHz
– T7700 at 2.4 GHz, available on the ThinkPad T61p
– 0.75-1.35 V, 35 W
• Intel® Core™2 Duo Low Voltage
– L7500 at 1.6 GHz, available on the ThinkPad X61
– 0.75-1.3 V, 17 W
• Intel® Core™2 Duo Ultra Low Voltage
– U7500 at 1.06 GHz, available on the Dell D430
– 0.75-0.975 V, 10 W
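A rough sanity check of $P = a \cdot C \cdot V^2 \cdot f$ against the three parts above. It assumes (this is not stated on the slide) that a and C are identical across the three SKUs and that each runs at the top of its voltage range. The listed wattages are presumably power envelopes that also cover static power, so only approximate agreement should be expected.

```python
# Sanity check of P = a*C*V^2*f across three Core 2 Duo parts.
# Assumption (not from the slide): a and C are identical for all three SKUs,
# and each part runs at the top of its listed voltage range.
SKUS = {
    # name: (max Vdd [V], clock [Hz], listed power [W])
    "T7800": (1.35, 2.6e9, 35.0),
    "L7500": (1.30, 1.6e9, 17.0),
    "U7500": (0.975, 1.06e9, 10.0),
}


def relative_dynamic_power(v, f, v_ref, f_ref):
    """P / P_ref when a and C cancel: (V / V_ref)^2 * (f / f_ref)."""
    return (v / v_ref) ** 2 * (f / f_ref)


v0, f0, p0 = SKUS["T7800"]
for name, (v, f, p) in SKUS.items():
    model = relative_dynamic_power(v, f, v0, f0)
    print(f"{name}: V^2*f model {model:.2f}, listed power ratio {p / p0:.2f}")
```

The $V^2 f$ model gives roughly 1.00 / 0.57 / 0.21, against listed ratios of roughly 1.00 / 0.49 / 0.29, so it captures the trend but not the exact numbers.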
Switching energy
$e = \frac{1}{2} \cdot C \cdot V^2$
Switching power
$P = b \cdot C \cdot V^2 = a \cdot C \cdot V^2 \cdot f$
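A minimal numeric sketch of the two formulas above. The values of C, V, f and the activity factor a are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def switching_energy(c, v):
    """Energy of one switching event: e = 1/2 * C * V^2 (joules)."""
    return 0.5 * c * v ** 2


def switching_power(a, c, v, f):
    """Dynamic power P = a * C * V^2 * f (watts); a is the activity factor."""
    return a * c * v ** 2 * f


# Hypothetical values: 1 nF switched capacitance, 1.0 V, 1 GHz, a = 0.1.
C, V, F, A = 1e-9, 1.0, 1e9, 0.1
print(switching_energy(C, V))           # about 5e-10 J per event
print(switching_power(A, C, V, F))      # about 0.1 W
print(switching_power(A, C, V / 2, F))  # halving V cuts power to one quarter
```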
Higher integration
• Selling the chipset (or solution or platform)
– Intel Centrino
• Centrino Duo includes a Core 2 Duo processor, a 9XX Express-series chipset, and a Wi-Fi adapter
– TI TCS2600 chipset
System-on-a-chip (SoC)
• TI OMAP
SiP: Multiple-chip product (MCP)
[Figure: Intel MCP (400MHz, 32MB); examples: Siemens SX66 PDA Phone, Audiovox PPC6601KIT. Source: Intel.com]
SiP: Stacked-die approach
[Figure: Qualcomm 3G CDMA2000 chip with seven power regimes and 100 clock regimes; ISSCC 2004]
Moore’s Law
[Figure: Moore’s Law, annotated "known", "unknown", and "exciting"]
MOSFET at nanoscale
Sunlin Chou, “Extending Moore’s Law in the Nanotechnology Era” (www.intel.com).
Given workload L and deadline T
• L measured in CPU cycles
• Clock speed $f \ge L/T$
• Time to finish: $t = L/f$
• Energy to finish: $P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
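The three relations above can be sketched as small helpers. The workload and deadline in the example are hypothetical; the point is that f is bounded below by L/T and that f cancels out of the energy.

```python
def min_frequency(cycles, deadline):
    """Lowest clock speed that still meets the deadline: f >= L / T."""
    return cycles / deadline


def time_to_finish(cycles, f):
    """t = L / f."""
    return cycles / f


def energy_to_finish(a, c, v, cycles):
    """E = P * t = a*C*V^2*f * (L/f) = a*C*V^2*L; f cancels out."""
    return a * c * v ** 2 * cycles


# Hypothetical workload: 2e9 cycles due within 1 s.
L, T = 2e9, 1.0
f_min = min_frequency(L, T)               # 2 GHz
print(f_min, time_to_finish(L, f_min))    # finishes exactly at the deadline
print(time_to_finish(L, 4e9))             # twice the clock, half the time
```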
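Effect of lower clock speed (f)
Power consumption
$P = a \cdot C \cdot V^2 \cdot f$
Energy consumption
$E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
• Lowering f alone reduces power proportionally but leaves the energy for a fixed workload unchanged.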
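A short check of the point above, with hypothetical values for a, C, V and L: halving f at a fixed V halves the power, but the job takes twice as long, so the energy stays the same.

```python
def power(a, c, v, f):
    """P = a*C*V^2*f."""
    return a * c * v ** 2 * f


def energy(a, c, v, f, cycles):
    """E = P * t with t = L / f."""
    return power(a, c, v, f) * (cycles / f)


# Hypothetical numbers; V is held fixed while f is halved.
A, C, V, L = 0.1, 1e-9, 1.0, 2e9
for f in (2e9, 1e9):
    print(f"f={f:.0e} Hz  P={power(A, C, V, f):.3f} W  E={energy(A, C, V, f, L):.3f} J")
```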
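Effect of lower supply voltage (V)
Maximum clock speed
$f = b \cdot V$
Power consumption
$P = a \cdot C \cdot V^2 \cdot f = k \cdot V^3 = x \cdot f^3$, with $k = a \cdot C \cdot b$ and $x = a \cdot C / b^2$
Energy consumption
$E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
• Lowering V (and with it f) cuts power cubically and energy quadratically.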
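A sketch of the voltage-scaling relations, assuming hypothetical constants for $a \cdot C$, b and L. Halving V halves the maximum clock, divides power by $2^3$, and divides energy by $2^2$.

```python
# With f = b*V: P = a*C*V^2*f = (a*C*b)*V^3 and E = a*C*V^2*L.
# Hypothetical constants: a*C = 1e-10 F, b = 2e9 Hz/V, L = 2e9 cycles.
AC, B, L = 1e-10, 2e9, 2e9


def max_frequency(v):
    """f = b*V."""
    return B * v


def power_at(v):
    """k*V^3 with k = a*C*b, running at the maximum clock for this V."""
    return AC * v ** 2 * max_frequency(v)


def energy_at(v):
    """a*C*V^2*L: independent of f, quadratic in V."""
    return AC * v ** 2 * L


for v in (1.0, 0.5):
    print(v, max_frequency(v), power_at(v), energy_at(v))
```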
Given workload L and deadline T
Single processor
• The processor can run at any frequency (voltage)
– $f = b \cdot V$
• The processor can be completely off when work is done (zero power when idle)
• To minimize energy consumption, at which frequency should the processor run?
– $f \ge L/T$ (in order to meet the deadline)
– $E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
– f = ????
[Figure: clock frequency f vs. time; $f_1 = L/T$ finishes the workload at T, while $f_2 = L/(T/2) = 2f_1$ finishes it at T/2]
[Figure: power P vs. time; $P_1 = x \cdot f_1^3$ over the whole deadline T, versus $P_2 = 2^3 P_1$ for half as long]
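The comparison in the figure, worked out numerically under the slide's zero-idle-power assumption (the constants are hypothetical). Running twice as fast multiplies power by $2^3$ but halves the busy time, so energy grows by a factor of 4; with no idle power, the slowest feasible clock $f = L/T$ minimizes energy.

```python
# P = x*f^3, and the processor draws nothing once it is done (slide assumption).
def energy_no_idle(x, f, cycles):
    """Busy power x*f^3 times busy time L/f, i.e. x*L*f^2."""
    return x * f ** 3 * (cycles / f)


X, L, T = 1e-27, 1e9, 1.0   # hypothetical constants
f1 = L / T                  # just meets the deadline
f2 = L / (T / 2)            # finishes in T/2, i.e. 2*f1
ratio = energy_no_idle(X, f2, L) / energy_no_idle(X, f1, L)
print(ratio)                # approximately 4
```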
Given workload L and deadline T
M processors
• The workload can be divided without overhead: $L = L_1 + L_2 + \dots + L_M$ ($L \ge L_i \ge 0$)
• To minimize energy consumption, at which frequency should processor i run?
– $f_i = L_i/T$ and $V_i = u \cdot L_i$
– $E_i = a \cdot C \cdot V_i^2 \cdot L_i = w \cdot L_i^3$
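The constants u and w follow from $f = b \cdot V$; the derivation below fills them in, with x being the same constant as in $P = x \cdot f^3$.

```latex
\begin{aligned}
f_i &= \frac{L_i}{T}, \qquad V_i = \frac{f_i}{b} = \frac{L_i}{bT}
  \quad\Rightarrow\quad u = \frac{1}{bT},\\
E_i &= a \cdot C \cdot V_i^2 \cdot L_i = \frac{a \cdot C}{b^2 T^2}\, L_i^3
  \quad\Rightarrow\quad w = \frac{a \cdot C}{b^2 T^2} = \frac{x}{T^2}.
\end{aligned}
```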
Given workload L and deadline T
M processors
• The workload can be divided without overhead: $L = L_1 + L_2 + \dots + L_M$ ($L \ge L_i \ge 0$)
• To minimize the TOTAL energy consumption, how should the workload be allocated?
– $E = E_1 + E_2 + \dots + E_M = w \cdot L_1^3 + w \cdot L_2^3 + \dots + w \cdot L_M^3$
– $= w(L_1^3 + L_2^3 + \dots + L_M^3)$
From high school
• $[(a+b)/2]^2 \le (a^2+b^2)/2$
• Quadratic mean ≥ arithmetic mean ≥ geometric mean ≥ harmonic mean
From high school (Contd.)
• $[(a+b)/2]^3 \le (a^3+b^3)/2$ (for $a, b \ge 0$)
– $E = w(L_1^3 + L_2^3 + \dots + L_M^3)$ ??? $(L_1 + L_2 + \dots + L_M)^3$
From college: Convex (Concave)
By definition of “convex”
Jensen’s Inequality (finite form)
• ϕ(x) is convex
– $\phi(t \cdot x_1 + (1-t) \cdot x_2) \le t \cdot \phi(x_1) + (1-t) \cdot \phi(x_2)$ for $0 \le t \le 1$
http://en.wikipedia.org/wiki/Jensen%27s_inequality#Proof_1_.28finite_form.29
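• $a_i = 1/n$
• $\phi(x) = x^2$ (convex): $(x_1^2 + \dots + x_n^2)/n \ge [(x_1 + \dots + x_n)/n]^2$
• $\phi(x) = x^3$ (convex for $x \ge 0$)
– $E = w(L_1^3 + L_2^3 + \dots + L_M^3) = w \cdot M \cdot (L_1^3 + L_2^3 + \dots + L_M^3)/M$
– $\ge w \cdot M \cdot [(L_1 + L_2 + \dots + L_M)/M]^3 = w \cdot L^3/M^2$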
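A quick numerical check of the bound $E \ge w \cdot L^3/M^2$: the equal split reaches it, and random splits never go below it. The values of w, L and M are hypothetical, and `random_split` is an illustrative helper, not part of the lecture.

```python
import random


def total_energy(w, loads):
    """E = w * (L1^3 + ... + LM^3) for a split of the workload."""
    return w * sum(l ** 3 for l in loads)


def equal_split_bound(w, total, m):
    """Jensen lower bound: E >= w * L^3 / M^2, reached when every Li = L/M."""
    return w * total ** 3 / m ** 2


def random_split(total, m, rng):
    """Random non-negative split of `total` into m parts (hypothetical helper)."""
    cuts = sorted(rng.uniform(0, total) for _ in range(m - 1))
    points = [0.0] + cuts + [total]
    return [hi - lo for lo, hi in zip(points, points[1:])]


rng = random.Random(0)
W, L, M = 1.0, 12.0, 4
bound = equal_split_bound(W, L, M)
smallest_gap = min(total_energy(W, random_split(L, M, rng)) - bound for _ in range(1000))
print(bound, total_energy(W, [L / M] * M), smallest_gap >= 0)
```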
More about Convexity
[Figure: cost vs. return]
Examples:
• Workload distribution: cost = energy; return = workload finished within T
• Eating: cost = price of apples; return = pleasure from eating apples
• Helicopter engine: cost = price of engine; return = engine thrust
• Law of diminishing marginal returns: cost = cost of production; return = increase in production
More about Convexity (Contd.)
• Greedy optimization works
• Combine simpler/cheaper components
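One concrete form of "greedy optimization works" for convex costs is marginal-cost allocation: hand out the workload one chunk at a time, each to the processor whose cost rises least. For a separable convex cost this greedy rule reaches the minimum total cost. Splitting the workload into equal discrete chunks and setting w = 1 are assumptions of this sketch.

```python
import heapq


def greedy_allocate(units, m, cost):
    """Assign `units` equal workload chunks one at a time, each to the processor
    whose cost rises least. For a convex, separable cost this greedy rule
    reaches the minimum total cost."""
    loads = [0] * m
    # Heap of (marginal cost of adding one more chunk, processor index).
    heap = [(cost(1) - cost(0), i) for i in range(m)]
    heapq.heapify(heap)
    for _ in range(units):
        _, i = heapq.heappop(heap)
        loads[i] += 1
        heapq.heappush(heap, (cost(loads[i] + 1) - cost(loads[i]), i))
    return loads


# Per-processor energy w*Li^3 with w = 1 (hypothetical).
print(greedy_allocate(12, 4, lambda l: l ** 3))   # [3, 3, 3, 3]
```

The greedy result matches the equal split that Jensen's inequality identifies as optimal.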
Check the assumptions
• Power consumption is zero when the
processor is not active
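Idle power (Static power)
$P_{static} \propto T^2 \cdot e^{-\alpha/T}$ (T: temperature)
$P_{static} \propto V_{dd} \cdot e^{\beta V_{dd}}$
with technology-dependent constants $\alpha, \beta > 0$
Consumed when the IC is idle but not powered off, e.g., SRAM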
Leakage power
Scaling down
Scaling down (Contd.)
• Quantum dynamics: individual molecules; high variation and likely defective
• Thermodynamics: gas; uniform (central limit theorem)
Scaling: Not that simple (Contd.)
Tunneling effect
[Figure: power P vs. time; $P_1 = x \cdot f_1^3 + P_{static}$ over the whole deadline T, versus $P_2 = 2^3 \cdot x \cdot f_1^3 + P_{static}$ for half as long]
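The same two schedules, now with static power added. This sketch assumes the chip can be switched fully off at no cost once the work is done, and uses hypothetical constants with $x \cdot f_1^3 = 1$ W. Setting $(x f_1^3 + P_{static}) T = (8 x f_1^3 + P_{static}) T/2$ gives a break-even point at $P_{static} = 6 \cdot x \cdot f_1^3$; above it, racing to finish is cheaper.

```python
def energy_with_static(x, p_static, f, busy_time):
    """Energy while busy: (x*f^3 + Pstatic) * t; the chip is assumed fully off afterwards."""
    return (x * f ** 3 + p_static) * busy_time


X, L, T = 1e-27, 1e9, 1.0   # hypothetical constants (x*f1^3 = 1 W at f1 = 1 GHz)
f1, f2 = L / T, 2 * L / T
for p_static in (1.0, 10.0):
    e1 = energy_with_static(X, p_static, f1, T)
    e2 = energy_with_static(X, p_static, f2, T / 2)
    print(p_static, e1, e2, "run slow" if e1 < e2 else "run fast")
```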
Why is static power important?
ITRS, 2009
[Figure: die photos of Pentium II (Klamath, 7.5M transistors) and Pentium III (Coppermine, 28M transistors)]
Core 2 Duo (Conroe)
[Figure: die photo with Core 1 and Core 2 labeled]
• 64KB L1 cache, 4MB L2 cache, 291M transistors
Solutions to the “never-enough” challenge
• 234M transistors
• 24M go to the L2 cache
• 8 SPEs, each with 20.9M transistors (167M transistors in total)
• Each SPE has four 64KB SRAMs (12M transistors)
• SRAM takes 122M transistors (>50%)
Multiple power/clock domains
• Multimedia phone: NTT DoCoMo 3G FOMA 902i, to be released with OMAP2420
• TI OMAP 2 architecture, ISSCC 2005
Given workload L and deadline T
Single processor
• The processor can run at any frequency (voltage)
– $f = b \cdot V$
• The processor can be completely off when work is done, but it consumes static power $P_{static}$ while on
– Shutting the processor down costs an energy overhead $E_{overhead}$
• To minimize energy consumption, at which frequency should the processor run?
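A sketch of one way to answer the question above numerically. The modeling choices here are assumptions: busy power is $x \cdot f^3 + P_{static}$, and for the idle time left before the deadline the processor either stays on (paying $P_{static}$) or shuts down (paying $E_{overhead}$), whichever is cheaper. Minimizing $x L f^2 + P_{static} L / f$ gives a "critical speed" $f_c = (P_{static}/(2x))^{1/3}$; a grid search over $[L/T, f_{max}]$ should land near it. All constants are hypothetical.

```python
def total_energy(f, x, p_static, e_overhead, cycles, deadline):
    """Energy to run `cycles` at clock f, then cover the idle time until the deadline.
    Busy: (x*f^3 + Pstatic) * L/f. Idle: stay on (Pstatic * idle time) or shut down
    (Eoverhead), whichever is cheaper (a modeling assumption)."""
    busy = cycles / f
    idle = deadline - busy
    return (x * f ** 3 + p_static) * busy + min(p_static * idle, e_overhead)


def critical_speed(x, p_static):
    """Minimizer of x*L*f^2 + Pstatic*L/f: f = (Pstatic / (2x))^(1/3)."""
    return (p_static / (2 * x)) ** (1 / 3)


# Hypothetical constants.
X, P_STATIC, E_OVERHEAD = 1e-27, 0.5, 0.01
L, T, F_MAX = 1e8, 1.0, 3e9
f_min = L / T                     # slowest clock that meets the deadline
steps = 100_000
grid = [f_min + (F_MAX - f_min) * k / steps for k in range(steps + 1)]
best_f = min(grid, key=lambda f: total_energy(f, X, P_STATIC, E_OVERHEAD, L, T))
print(f"grid optimum {best_f / 1e6:.0f} MHz, "
      f"critical speed {critical_speed(X, P_STATIC) / 1e6:.0f} MHz")
```

If the critical speed falls below L/T, the deadline forces f = L/T; if it exceeds the maximum clock, the processor runs at its maximum. The grid search handles both cases because it only scans the feasible range.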
Why is there an overhead to powering off a circuit?
Clock generator
• Resonant circuit + amplifier
• Resonant circuit (Oscillator)
– Crystal oscillator (>$2 \times 10^9$/yr)
• ~10 kHz to ~10 MHz
• Quartz, ceramics (low cost, low accuracy), surface acoustic wave (SAW) quartz crystal (expensive, accurate)
• Real-time clocks
– 32.768 kHz ($2^{15}$ Hz), 4.194304 MHz ($2^{22}$ Hz)
• Application-specific
– 4.9152 MHz (4 × 1.2288 MHz, CDMA baseband frequency)……
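The crystal frequencies listed above are chosen for easy division. A chain of 15 divide-by-2 stages turns a 32.768 kHz crystal into a 1 Hz tick, and 4.9152 MHz is exactly four times the 1.2288 MHz CDMA baseband frequency. The helper `divide_by_two` is illustrative only.

```python
RTC_CRYSTAL_HZ = 32_768          # 32.768 kHz real-time-clock crystal
CDMA_BASEBAND_HZ = 1_228_800     # 1.2288 MHz CDMA baseband frequency


def divide_by_two(freq_hz, stages):
    """Output frequency of `stages` cascaded divide-by-2 flip-flops."""
    return freq_hz / 2 ** stages


print(RTC_CRYSTAL_HZ == 2 ** 15)           # True
print(4_194_304 == 2 ** 22)                # True
print(divide_by_two(RTC_CRYSTAL_HZ, 15))   # 1.0 (one tick per second)
print(4 * CDMA_BASEBAND_HZ)                # 4915200, i.e. 4.9152 MHz
```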
Oscillator (Contd.)
• LC/RLC circuit
• Ring oscillator
– Applications other than as an oscillator?
• Voltage-controlled oscillator (VCO)
– Varicap: variable capacitance diode (tuning diode)
– Phase-locked loop for high-speed clock (next slide)
– Frequency scaling of IC for energy saving
Phase-locked loop (PLL)
• High-speed clock from a master oscillator
• Digital PLL
[Figure: block diagram: master oscillator → phase-frequency detector → (control voltage) → VCO, with a frequency divider (N) in the feedback path from the VCO back to the detector]
• Clock generation, recovery, synchronization
– Digital computing, RF communication
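A toy, frequency-only model of the feedback loop in the block diagram: the divider output $f_{VCO}/N$ is compared with the master oscillator, and the VCO is nudged in proportion to the error until $f_{VCO} = N \cdot f_{ref}$. This is a simplification (a real PLL compares phase, not just frequency), and the 19.2 MHz reference, N = 50, starting frequency, and loop gain are hypothetical.

```python
def lock_frequency(f_ref, n, f_vco, gain=0.2, steps=100):
    """Toy frequency-tracking loop: each step compares the divided VCO output
    (f_vco / N) with the reference and nudges the VCO in proportion to the
    error. Real PLLs compare phase, not just frequency."""
    for _ in range(steps):
        error = f_ref - f_vco / n     # detector output, in reference-frequency units
        f_vco += gain * n * error     # VCO frequency follows its control input
    return f_vco


# Hypothetical: 19.2 MHz master oscillator, divider N = 50, VCO starting at 800 MHz.
print(lock_frequency(19.2e6, 50, 800e6))   # settles near 50 * 19.2 MHz = 960 MHz
```

Each step shrinks the remaining error by a factor of (1 - gain), so any gain between 0 and 1 converges in this model.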
Threshold voltage
Vdd scales slowly & Vth scales even more slowly
• Vth is limited by the thermal voltage
• Vdd needs to stay considerably higher than Vth to curb leakage current
• This ends up breaking the scaling rules
– low channel mobility
Plummer and Griffin, 2001 (Data from ITRS/NTRS)
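Why the thermal voltage limits Vth: for an ideal MOSFET (ideality factor n = 1), subthreshold current changes tenfold for every $\ln(10) \cdot kT/q$ of gate voltage, about 60 mV at room temperature, so each such reduction in Vth raises subthreshold leakage roughly tenfold. The sketch below computes these quantities from the exact SI constants; real devices have n > 1 and therefore a larger swing.

```python
import math

BOLTZMANN = 1.380649e-23             # J/K (exact SI value)
ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact SI value)


def thermal_voltage(temp_k):
    """kT/q in volts."""
    return BOLTZMANN * temp_k / ELEMENTARY_CHARGE


def ideal_subthreshold_swing(temp_k):
    """ln(10) * kT/q: gate-voltage change per 10x change in subthreshold
    current for an ideal MOSFET (ideality factor n = 1)."""
    return math.log(10) * thermal_voltage(temp_k)


for t in (300.0, 373.0):
    print(f"T={t:.0f} K  kT/q={thermal_voltage(t) * 1e3:.1f} mV  "
          f"swing={ideal_subthreshold_swing(t) * 1e3:.1f} mV/decade")
```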
Check the assumptions (Contd.)
• The workload can be divided without
overhead: L = L1+L2+…+LM (L ≥ Li≥0)
• Communication cost between processors!!!
Quadrotor vs. Helicopter
• De Bothezat Quadrotor, 1923
• A.R. Drone, 2010
Wire power consumption
• Inter-processor communication