Michael P. Frank
http://www.eng.fsu.edu/~mpf
Requirements for
Energy-Efficient Computing
Beyond the von Neumann Limit
ECE Graduate Seminar
Thursday, October 20, 2005
Abstract of Talk
• Fundamental physics limits the performance of
conventional computing technologies.
– The energy efficiency of conventional machines will be
forced to level off in roughly the next 10-20 years.
• Practical computer performance must then plateau as well.
• However, all of the proven limits to computer energy
efficiency can, in principle, be circumvented…
– but only if computing undergoes a radical paradigm shift.
• The essential new paradigm: Reversible computing.
– It involves reusing energy to improve energy efficiency.
• However, doing this well tightly constrains computer design at all
levels from devices through logic, architectures, and algorithms.
• In this talk, I review the stringent physical and logical
requirements that must be met,
– if we wish to break through the near-term barriers,
• and approach the true physical limits of computing.
Moore’s Law and Performance
[Chart: “Moore’s Law – Transistors per Chip.” Devices per IC vs. year of introduction, 1950–2010, from the 4004 and 8086 through the 286, 386, 486DX, Pentium, P2, P3, P4, and Itanium 2 (Madison); average increase of ~57%/year.]
• Gordon Moore, 1975:
– Devices per IC can be doubled every 18 months
• Borne out by history!
• Some fortuitous corollaries:
– Every 3 years: devices ½ as long
– Every 1.5 years: ~½ as much stored energy per bit!
• It is this that has enabled us to throw away bits (and their energies)
2× more frequently every 1.5 years, at reasonable power levels!
– And thereby double processor performance every 1.5 years!
• Increased energy efficiency of computation is a
prerequisite for improved raw performance!
– Given realistic levels of total power consumption.
Efficiency in General,
and Energy Efficiency
• The efficiency η of any process is: η = P/C
– Where P = Amount of some valued product produced
– and C = Amount of some costly resources consumed
• In energy efficiency ηe, the cost C measures energy.
• We can talk about the energy efficiency of:
– A heat engine: ηhe = W/Q, where:
• W = work energy output, Q = heat energy input
– An energy recovering process : ηer = Eend/Estart, where:
• Eend = available energy at end of process,
• Estart = energy input at start of process
– A computer: ηec = Nops/Econs, where:
• Nops = useful operations performed
• Econs = free-energy consumed
Gate Energy Trends
Trend of minimum transistor switching energy, based on ITRS ’97–’03 roadmaps
[Chart: minimum gate switching energy CV²/2 (J) vs. year, 1995–2045, for ITRS technology nodes 250, 180, 130, 90, 65, 45, 32, and 22 nm (DRAM half-pitch), with separate low-power (LP) and high-performance (HP) curves descending from the fJ range toward the aJ range; an annotation marks a practical limit for CMOS. Horizontal reference lines: the room-temperature 100 kT reliability limit, one electron volt, the room-temperature kT thermal energy, and the room-temperature von Neumann–Landauer limit kT ln 2 (zJ range).]
Some Lower Bounds on Energy
Dissipation
• In today’s 90 nm VLSI technology, for minimal operations
(e.g., conventional switching of a minimum-sized transistor):
– Ediss,op is on the order of 1 fJ (femtojoule) ⇒ ηec ≲ 10¹⁵ ops/sec/watt.
• Will be a bit better in coming technologies (65 nm, maybe 45 nm)
• But, conventional digital technologies are subject to several
lower bounds on their energy dissipation Ediss,op for digital
transitions (logic / storage / communication operations),
– And thus, corresponding upper bounds on their energy efficiency.
• Some of the known bounds include:
– Leakage-based limit for high-performance field-effect transistors:
• Maybe roughly ~5 aJ (attojoules) ⇒ ηec ≲ 2×10¹⁷ operations/sec/watt
– Reliability-based limit for all non-energy-recovering technologies:
• On the order of 1 eV (electron-volt) ⇒ ηec ≲ 6×10¹⁸ ops/sec/watt
– von Neumann-Landauer (VNL) bound for all irreversible technologies:
• Exactly kT ln 2 ≈ 18 meV ⇒ ηec ≲ 3.5×10²⁰ ops/sec/watt
– For systems whose waste heat ultimately winds up in Earth’s atmosphere,
» i.e., at temperature T ≈ Troom = 300 K.
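As a quick sanity check (my own addition, not part of the original slides), these efficiency ceilings follow directly from ηec ≤ 1/Ediss,op; the short Python sketch below reproduces them from standard physical constants and the energy figures quoted above.

```python
import math

# Hedged sketch: reproduce the slide's efficiency ceilings as 1/E_diss.
# The ~1 fJ and ~5 aJ figures are the slide's own estimates.
k_B = 1.380649e-23      # Boltzmann's constant, J/K
eV  = 1.602176634e-19   # 1 electron volt, J
T   = 300.0             # room temperature, K

bounds = {
    "90 nm CMOS switching (~1 fJ)":     1e-15,
    "FET leakage-based limit (~5 aJ)":  5e-18,
    "reliability-based limit (~1 eV)":  1.0 * eV,
    "von Neumann-Landauer (kT ln 2)":   k_B * T * math.log(2),
}
for name, E in bounds.items():
    # Maximum ops/sec/watt is just 1 / (energy dissipated per op, in joules).
    print(f"{name}: eta_ec <= {1.0 / E:.1e} ops/sec/watt")
```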
Reliability Bound on Logic
Signal Energies
• Let Esig denote the logic signal energy,
– The energy involved (transferred, manipulated) in the process of storing,
transmitting, or transforming a bit’s worth of digital information.
• But note that “involved” does not necessarily mean “dissipated!”
• As a result of fundamental thermodynamic considerations, it is required
that Esig ≳ kBTsig ln r (with quantum corrections that are small for large r)
– Where kB is Boltzmann’s constant, 1.38×10⁻²³ J/K;
– and Tsig is the temperature in the degrees of freedom carrying the signal;
– and r is the reliability factor, i.e., the improbability of error, 1/perr.
• In non-energy-recovering logic technologies (totally dominant today)
– Basically all of the signal energy is dissipated to heat on each operation.
• And often additional energy (e.g., short-circuit power) as well.
• In this case, minimum sustainable dissipation is Ediss,op ≳ kBTenv ln r,
– Where Tenv is now the temperature of the waste-heat reservoir (environment)
• Averages around 300 K (room temperature) in Earth’s atmosphere
• For a decent r of, e.g., 2×10¹⁷, this energy is on the order of ~40 kT ≈ 1 eV.
– Therefore, if we want energy efficiency ηec > ~1 op/eV, we must recover some
of the signal energy for later reuse.
• Rather than dissipating it all to heat with each manipulation of the signal.
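A one-line numeric check of that r-dependence (my own illustration, not from the slides): for r = 2×10¹⁷ the required signal energy kT ln r is indeed about 40 kT, i.e., roughly 1 eV.

```python
import math

k_B, T, eV = 1.380649e-23, 300.0, 1.602176634e-19
r = 2e17                          # reliability factor 1/p_err quoted on the slide
E_sig = k_B * T * math.log(r)     # minimum signal energy for this reliability
print(E_sig / (k_B * T))          # ~39.8  -> "about 40 kT"
print(E_sig / eV)                 # ~1.03  -> "about 1 eV"
```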
The von Neumann-Landauer
(VNL) Principle
• First alluded to by John von Neumann in 1949.
– Developed explicitly by Rolf Landauer of IBM in 1961.
• The principle is a rigorous theorem of physics!
– It follows from the reversibility of fundamental dynamics.
• A correct statement of the principle is the following:
– Any process that loses or obliviously erases 1 bit of known
(correlated) information increases total entropy by at least
∆S = 1 bit = kB ln 2,
and thus implies the eventual dissipation of at least
Ediss = kBTenv ln 2
of free energy to the environment as waste heat.
• where kB = log e = 1.38×10⁻²³ J/K is Boltzmann’s constant
• and Tenv = temperature of the waste-heat reservoir (environment)
– Not less than about room temperature, or 300 K, for earthbound
computers ⇒ Ediss ≥ 18 meV.
Definition of Reversibility
• What does it mean for a dynamical system (either
continuous or discrete) to be (time-) reversible?
– Let x(t) denote the state of the system at time t.
• The universe, or any closed system of interest (e.g. a computer).
– Let Ft→u(x) be the transition relation operating between a
given two times t and u; i.e., x(u) = Ft→u[x(t)].
• Determined by the system’s dynamics (laws of physics, or a FSM).
– Then the system is called “dynamically reversible” iff Ft→u
is a one-to-one function, for any times (t, u) where u > t.
• That is, ∀ t < u: ¬∃ x1 ≠ x2: Ft→u(x1) = Ft→u(x2).
– That is, no two distinct states would ever go to the same state over the
course of a given time interval.
– The definition implies determinism, if we also allow u < t.
• A reversible system is deterministic in the reverse time direction.
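To make the definition concrete, here is a small illustration (mine, not the talk’s) that tests whether a finite transition function is one-to-one, i.e., dynamically reversible in the sense defined above.

```python
# Minimal sketch: checking dynamical reversibility of a finite-state
# transition function F by testing that it is one-to-one (injective).
def is_reversible(F, states):
    """True iff F maps distinct states to distinct states."""
    images = [F(x) for x in states]
    return len(set(images)) == len(images)

states  = range(4)                     # a toy 2-bit state space
F_rev   = lambda x: (x + 1) % 4        # a permutation: reversible
F_irrev = lambda x: x & 0b10           # merges states (bit erasure): irreversible

print(is_reversible(F_rev, states))    # True
print(is_reversible(F_irrev, states))  # False
```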
Types of Dynamics
• Nondeterministic,
irreversible
• Nondeterministic,
reversible
• Deterministic,
irreversible
• Deterministic,
reversible
[2×2 diagram of the four combinations; annotation: “WE ARE HERE”]
Physics is Reversible!
• All successful models of fundamental physics are
expressible in the Hamiltonian formalism.
– Including: Classical mechanics, quantum mechanics,
special and general relativity, quantum field theories.
• The latter two (GR & QFT) are backed up by enormous,
overwhelming mountains of evidence confirming their predictions!
– 11 decimal places of precision so far! And, no contradicting evidence.
• In Hamiltonian systems, the dynamical state x(t)
obeys a differential equation that’s first-order in time,
dx/dt = g(x)
(where g is some function)
– This immediately implies determinism of the dynamics.
• And, since the time differential dt can be taken to be
negative, the formalism also implies reversibility!
– Thus, dynamical reversibility is one of the most firmly established, fundamental, inviolable facts of physics!
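A toy numerical illustration of this point (my own addition): integrating a simple Hamiltonian system, here a harmonic oscillator, forward with a time-reversible leapfrog scheme and then backward with dt negated returns the initial state, up to floating-point rounding.

```python
# Sketch: forward-then-backward integration of a Hamiltonian system
# (harmonic oscillator, V(q) = q^2/2) recovers the initial state,
# reflecting the reversibility of first-order Hamiltonian dynamics.
def leapfrog(q, p, dt, steps, dVdq=lambda q: q):
    for _ in range(steps):
        p -= 0.5 * dt * dVdq(q)   # half kick
        q += dt * p               # drift
        p -= 0.5 * dt * dVdq(q)   # half kick
    return q, p

q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, dt=0.01, steps=10_000)    # forward in time
q2, p2 = leapfrog(q1, p1, dt=-0.01, steps=10_000)   # backward in time (dt < 0)
print(q2 - q0, p2 - p0)   # ~0 (to rounding): the dynamics is invertible
```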
Illustration of VNL Principle
•
Either digital state is initially encoded by any of N possible physical microstates
– Illustrated as 4 in this simple example (the real number would usually be much larger)
– Initial entropy S = log[#microstates] = log 4 = 2 bits.
•
Reversibility of physics ensures “bit erasure” operation can’t possibly merge two
microstates, so it must double the possible microstates in the digital state!
– Entropy S = log[#microstates] increases by log 2 = 1 bit = (log e)(ln 2) = kB ln 2.
– To prevent entropy from accumulating locally, it must be expelled into the environment.
[Animated diagram: four microstates represent logical “0” and four represent logical “1.” Initially the entropy is S = log 4 = 2 bits; after the “erasure,” the digital state can be encoded by 8 possible microstates, so S′ = log 8 = 3 bits, and ∆S = S′ − S = 3 bits − 2 bits = 1 bit.]
Reversible Computing
• The basic idea is simply this:
– Don’t erase information when performing logic / storage /
communication operations!
• Instead, just reversibly (invertibly) transform it in place!
• When reversible digital operations are implemented
using well-designed energy-recovering circuitry,
– This can result in local energy dissipation Ediss << Esig,
• this has already been empirically demonstrated by many groups.
– and even total energy dissipation Ediss << kT ln 2!
• This is easily shown in theory & simulations,
– but we are not yet to the point of demonstrating such low levels of total
dissipation empirically in a physical experiment.
• Achieving this goal requires very careful design,
– and verifying it requires very sensitive measurement equipment.
How Reversible Logic Avoids the
von Neumann-Landauer Bound
• We arrange our logical manipulations to never
attempt to merge two distinct digital states,
– but only to reversibly
transform them from
one state to another!
• E.g., illustrated is a
reversible operation
cCLR (controlled CLR)
[Diagram: the four digital states logic 00, 01, 10, and 11, with cCLR shown as a one-to-one mapping among the states used]
– It and its inverse cSET
enable arbitrary logic!
A Few Highlights Of Reversible
Computing History
• Charles Bennett @ IBM, 1973-1989:
– Reversible Turing machines & emulation algorithms
• Can emulate irreversible machines on reversible architectures.
– But, the emulation introduces some inefficiencies
– Models of chemical & Brownian-motion physical realizations.
• Fredkin and Toffoli’s group @ MIT, late 1970’s/early 1980’s
– Reversible logic gates and networks (space/time diagrams)
– Ballistic and adiabatic circuit implementation proposals
• Groups @ Caltech, ISI, Amherst, Xerox, MIT, ‘85-’95:
– Concepts for & implementations of adiabatic circuits in VLSI tech.
– Small explosion of adiabatic circuit literature since then!
• Mid 1990s-today:
– Better understanding of overheads, tradeoffs, asymptotic scaling
– A few groups begin development of post-CMOS implementations
• Most notably, the Quantum-dot Cellular Automata group at Notre Dame
Caveat #1
• Technically, avoiding the VNL bound doesn’t actually require that the digital
operation be reversible at the level of the logical states…
– It can be logically irreversible if the information in the digital state is already entropy!
• In the below example, the non-digital entropy doesn’t change, because the operation is also
nondeterministic (N to N), and the transition relation between logical states has semi-detailed
balance, so the entropy in the digital state remains constant.
• However, such operations just re-randomize bits that are already random!
– It’s not clear if this kind of operation is computationally useful.
[Diagram: a digital bit with unknown value (0 or 1) evolving under physical dynamics whose precise details may be uncertain]
Caveat #2
• Operations that are logically N-to-1 can be used, if
there are sufficient compensating 1-to-N
(nondeterministic) logical operations.
– All that is really required is that the logical dynamics be 1-to-1 in the long-term average.
• Thus, it’s possible to thermally generate random bits and discard
them later when we are through with them.
– While maintaining overall thermodynamic reversibility.
• This ability is useful for probabilistic (randomized) algorithms.
[Diagram: logic 0 and logic 1 states]
Reversibility and Reliability
• A widespread myth: “Future low-level digital
devices will necessarily be highly unreliable.”
– This comes from a flawed line of reasoning:
• Faster ⇒ more energy efficient ⇒ lower bit energies ⇒ high rate of
bit errors from thermal noise
– However, this scaling strategy doesn’t work, because:
• High rate of thermal errors ⇒ high power dissipation from error
correction ⇒ less energy efficient ⇒ ultimately slower!
• But in contrast, using reversible computing, we can
achieve arbitrarily high energy efficiency while also
maintaining arbitrarily high reliability!
– The key is to keep bit energies reasonably high!
• While recovering most of the bit energy…
Minimizing Energy Dissipation
Due to Thermal Errors
• Let perr = 1/r be the bit-error probability per operation.
– Where r quantifies the “reliability level.”
– And pok = 1 − perr is the probability the bit is correct
• The necessary entropy increase ∆S per op due to error occurrence is
given by the (binary) Shannon entropy of the bit-value after the operation:
H(perr) = perr log(1/perr) + pok log(1/pok).
• For r >> 1 (i.e., as r → ∞), this increase approaches 0:
∆S = H(perr) ≈ perr log(1/perr) = (log r)/r → 0
• Thus, the required energy dissipation per op also approaches 0:
Ediss = T∆S ≈ (kT ln r)/r → 0
• Could get the same result by assuming the signal energy Esig = kT ln r
required for reliability level r is dissipated each time an error occurs:
Ediss = perrEsig = perr(kT ln r) = (kT ln r)/r → 0 as r → ∞.
• Further, note that as r → ∞, the required signal energy grows only very
slowly…
– Specifically, only logarithmically in the reliability, i.e., Esig = Θ(log r).
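The limit can be made concrete numerically; the sketch below (my own, using standard constants) tabulates Esig = kT ln r alongside the error-induced dissipation kT ln 2 · H(perr), which shrinks toward zero as r grows even though Esig keeps growing logarithmically.

```python
import math

# Hedged numeric check: dissipation needed to "pay for" thermal bit errors
# vanishes as the reliability r grows, while the signal energy grows ~log r.
k_B, T = 1.380649e-23, 300.0   # J/K, K
eV = 1.602176634e-19

for r in (1e3, 1e6, 1e12, 1e17):
    p_err = 1.0 / r
    # Binary Shannon entropy of the bit after the operation, in bits:
    H = p_err * math.log2(1 / p_err) + (1 - p_err) * math.log2(1 / (1 - p_err))
    E_diss = k_B * T * H * math.log(2)   # convert bits -> J at temperature T
    E_sig  = k_B * T * math.log(r)       # signal energy needed for reliability r
    print(f"r={r:.0e}: E_sig={E_sig/eV:.2f} eV, E_diss={E_diss/eV:.2e} eV")
```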
Device-Level Requirements for
Reversible Computing
• A good reversible device technology should have:
– Low manufacturing cost ¢d per device
• Important for good overall (system-level) cost-efficiency
– Low rate of static power dissipation Pleak due to energy leakage.
• Required for energy-efficient storage especially (but also in logic)
– Low energy coefficient cE = Ediss/f (energy dissipated per operation, per unit
transition frequency) for adiabatic transitions.
• Implies we can achieve a high operating frequency (and thus good cost-performance) at a given level of energy efficiency.
– High maximum available transition frequency fmax.
• Important for those applications in which the latency of serial threads of computation
dominates total cost
• Important: For system-level energy efficiency, Pleak and cE must be taken
as effective global values measuring the implied amount of energy emitted
into the outside environment at temperature Tenv.
– With an ideal (Carnot) refrigerator, Pleak = StTenv and cE = cSTenv,
• Where St = the static rate of leakage entropy generation per unit time,
• and cS = Sgen/f adiabatic entropy coefficient, or entropy generated per unit transition
frequency.
Early Chemical Implementations
• How to physically implement reversible logic?
– Bennett’s original inspiration: DNA polymerization!
• Reversible copying of a DNA strand
– Molecular basis of cell division / organism reproduction
• This (like all chemical reactions) is reversible…
– Direction (forward vs. backward) & reaction rate depend on the relative
concentrations of reagent and product species, which set the free-energy difference
• Energy dissipated per step turns out to be proportional to speed.
– Implies process is characterized by an energy-time constant.
» I call this the “energy coefficient” cEt ≡ Ediss,op·top = Ediss,op/fop.
• For DNA, typical figures are 40 kT ≈ 1 eV @ ~1,000 bp/s
– Thus, the energy coefficient cE is about 1 eV/kHz.
• Can we achieve better energy coefficients?
– Yes, in fact, we had already beat DNA’s cE in reversible
CMOS VLSI technology available circa 1995!
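Just to spell out the arithmetic behind the DNA figure quoted above (my own check, not from the slides): ~40 kT dissipated per base pair at ~1,000 base pairs per second works out to roughly 1 eV/kHz.

```python
# Quick arithmetic: DNA-polymerization energy coefficient c_E = E_diss / f_op.
k_B, T = 1.380649e-23, 300.0
eV = 1.602176634e-19

E_diss = 40 * k_B * T        # ~40 kT per base pair, in joules
f_op   = 1_000.0             # ~1,000 base pairs per second
c_E = E_diss / f_op          # J/Hz (equivalently J*s)
print(E_diss / eV)           # ~1.0  -> roughly 1 eV per step
print(c_E / eV * 1_000)      # ~1.0  -> roughly 1 eV/kHz
```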
Energy & Entropy Coefficients in Electronics
[Diagram: a charge Q transferred through a path of resistance R]
• For a transition involving the adiabatic transfer
of an amount Q of charge along a path with
resistance R:
– The raw (local) energy coefficient is given by
cEt = Ediss·t = Pdiss·t² = IV·t² = I²R·t² = Q²R.
• Where V is the voltage drop along the path.
– The entropy coefficient is cSt = Q²R/Tpath.
• where Tpath is the local thermodynamic temperature in
the path.
– The effective (global) energy coefficient is
cEt,eff = Q²R·(Tenv/Tpath).
• We pay a penalty for low-T operation!
Example of Electronic cEt
• In a fairly recent (180 nm) CMOS VLSI technology:
– Energy stored per min. sized transistor gate: ~1 fJ @ 2V
• Corresponds to charge per gate of Q = 1 fC ≈ 6,000 electrons
– Resistance per turned-on min-sized nFET of ~14 kΩ
• Order of the quantum resistance R = R0 = 1/G0 = h/2q² = 12.9 kΩ
– Ideal energy coefficient for a single-gate transition:
~1.4×10⁻²⁶ J/Hz
• Or in more convenient units, ~80 eV/GHz = 0.08 eV/MHz!
– with some expected overheads for a simple test circuit,
calculated energy coefficient comes out to about 8× higher,
or ~10⁻²⁵ J·s
• Or ~600 eV/GHz = 0.6 eV/MHz.
– Detailed Cadence simulations gave us, per transistor:
• @ 1 GHz: P = 20 μW, E = 20 fJ = 1.2 keV, so Ec = 1.2 eV/MHz
• @ 1 MHz: P = 0.35 pW, E = 0.35 aJ ≈ 2.2 eV, so Ec = 2.1 eV/MHz
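These figures can be cross-checked against the cEt = Q²R formula from the previous slide; the sketch below (mine, using the slide’s ~1 fJ @ 2 V and ~14 kΩ numbers) reproduces the ~1.4×10⁻²⁶ J/Hz ideal coefficient.

```python
# Consistency check of the "ideal" adiabatic energy coefficient c_E = Q^2 * R
# for the 180 nm example quoted on this slide.
eV = 1.602176634e-19

E_gate, V = 1e-15, 2.0      # ~1 fJ stored at 2 V
Q = 2 * E_gate / V          # E = Q*V/2  ->  Q = 1e-15 C (~6,000 electrons)
R = 14e3                    # on-resistance of a min-sized nFET, ohms

c_E = Q**2 * R              # J/Hz: energy dissipated per op, per unit frequency
print(c_E)                  # ~1.4e-26 J/Hz
print(c_E / eV * 1e9)       # ~87 eV/GHz (the slide rounds this to ~80 eV/GHz)
```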
Cadence Simulation Results
• 2LAL = Two-level adiabatic logic
– in a shift register.
• Graph shows power dissipation vs. frequency
[Chart: “Power vs. freq., TSMC 0.18, Std. CMOS vs. 2LAL.” Average power dissipation per nFET (W, 10⁻¹⁴ to 10⁻⁵) vs. frequency (Hz, 10³ to 10⁹), with curves for standard CMOS and 2LAL, and the energy dissipated per nFET per cycle.]
• At moderate frequencies (1 MHz),
– Reversible uses < 1/100th the power of irreversible!
• At ultra-low power (1 pW/transistor),
– Reversible is 100× faster than irreversible!
• Minimum energy dissipation < 1 eV!
– 500× lower than best irreversible!
• 500× higher computational energy efficiency!
• Energy transferred is still ~10 fJ (~100 keV)
– So, energy recovery efficiency is 99.999%!
• Not including losses in the power supply
A Useful Two-Bit Primitive:
Controlled-SET or cSET(a,b)
• Semantics: If a=1, then set b:=1.
– Truth table (a b → a′ b′): 0 0 → 0 0; 0 1 → 0 1; 1 0 → 1 1.
– Conditionally reversible, if the special precondition ab=0 is met.
• Note it’s 1-to-1 on the subset of states used
– Sufficient to avoid Landauer’s principle!
• We can implement cSET in dual-rail CMOS with a pair of transmission gates
– Each needs just 2 transistors,
• plus one controlling “drive” signal
[Diagram: a “drive” signal (0→1) connected to b through a transmission-gate switch controlled by a]
• This 2-bit semi-reversible operation & its inverse cCLR are universal for
reversible (and irreversible) logic!
– If we compose them in special ways.
• And include latches for sequential logic.
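A toy software model of this primitive (my own sketch, not from the talk) makes the conditional-reversibility claim easy to check: cSET is one-to-one on the precondition subset ab=0, and cCLR inverts it there.

```python
# Toy model of cSET and its inverse cCLR, checked for conditional reversibility.
def cset(a, b):
    # Semantics: if a == 1, set b := 1.  Reversibility precondition: a*b == 0.
    return (a, 1 if a else b)

def cclr(a, b):
    # Inverse operation: if a == 1, clear b := 0.  Precondition: a implies b.
    return (a, 0 if a else b)

allowed = [(0, 0), (0, 1), (1, 0)]                  # states satisfying a*b == 0
images  = [cset(a, b) for a, b in allowed]
assert len(set(images)) == len(images)              # cSET is 1-to-1 on this subset
assert all(cclr(*cset(a, b)) == (a, b) for a, b in allowed)   # cCLR undoes cSET
print(dict(zip(allowed, images)))   # {(0,0):(0,0), (0,1):(0,1), (1,0):(1,1)}
```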
Reversible OR (rOR) from cSET
• Semantics: rOR(a,b) ::= if a|b, c:=1.
– Set c:=1, on the condition that either a or b is 1.
• Reversible under precondition that initially a|b → ~c.
• Two parallel cSETs simultaneously driving a shared output line
implement the rOR operation!
– This type of gate composition was not traditionally considered.
• Similarly one can do rAND, and reversible versions of all operations.
– Logic synthesis with these is extremely straightforward…
[Hardware diagram: inputs a and b each control a cSET driving the shared output line c. Spacetime diagram: a, b, and c = 0 enter; a′, b′, and c′ = a OR b leave.]
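Continuing the toy model (again my own sketch), the rOR composition can be expressed as two cSETs driving a shared output and checked to be one-to-one on the states allowed by its precondition.

```python
# Sketch: reversible OR built from two cSETs sharing an output line.
def ror(a, b, c):
    # rOR semantics: if a or b, set c := 1.
    c = 1 if a else c      # cSET(a, c)
    c = 1 if b else c      # cSET(b, c)
    return (a, b, c)

# Precondition: initially (a or b) implies (not c).
allowed = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)
           if not ((a or b) and c)]
images = [ror(*s) for s in allowed]
assert len(set(images)) == len(images)   # 1-to-1 on the allowed states
for s, t in zip(allowed, images):
    print(s, "->", t)
```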
CMOS Gate Implementing
rLatch / rUnLatch
• Symmetric Reversible Latch
[Figure: implementation, icon, and spacetime-diagram views of the crLatch (in → mem) and crUnLatch (mem → in) operations, showing the “connect” clock phase]
• The hardware is just a CMOS transmission gate again
• This time controlled by a clock, with the data signal driving
• Concise, symmetric hardware icon
– Just a short orthogonal line
• Thin strapping lines denote connection in the spacetime diagram.
Example:
Building cNOT from rlXOR
• rlXOR(a,b,c): Reversible latched XOR.
– Semantics: c := ab.
• Reversible under precondition that c is initially clear.
• cNOT(a,b): Controlled-NOT operation.
– Semantics: b := ab. (No preconditions.)
• A classic “primitive” operation in reversible & quantum computing
– But, it turns out to be fairly complex to implement cNOT in
available fully adiabatic hardware technologies…
• Thus, it’s really not a very good building block for practical
reversible hardware designs!
– Of course, we can still build it, if we really want to.
• Since, as I said, our gate set is universal for reversible logic
cNOT from rlXOR:
Hardware Diagram
• A logic block providing an in-place cNOT
operation (a cNOT “gate”) can be constructed
from 2 rlXOR gates and two latched buffers.
[Diagram: inputs A and B pass through two rlXOR gates and two reversible latches, with X as an intermediate signal]
• The key is:
– Operate some of the gates in reverse!
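The slide doesn’t spell out the circuit details, but the algebra that makes this work can be sketched as follows (a hedged reading, my own): a forward rlXOR computes X = A⊕B into a clear location, an rlXOR run in reverse can then clear the old B (since B = A⊕X), and the latches route X out as the new value of b.

```python
# Hedged sketch (not the talk's exact construction): the algebra behind
# building an in-place cNOT from a forward rlXOR plus an rlXOR run in reverse.
def rlxor(a, b, c=0):
    assert c == 0                  # precondition: target initially clear
    return a ^ b                   # c := a XOR b

def rlxor_reverse(a, b, c):
    assert c == (a ^ b)            # running backward requires c = a XOR b
    return 0                       # ...and restores the target to 0

for A in (0, 1):
    for B in (0, 1):
        X = rlxor(A, B)                        # forward gate: X := A XOR B
        B_cleared = rlxor_reverse(A, X, B)     # reverse gate clears old B, since B = A XOR X
        new_B = X                              # latched buffers route X out as the new b
        assert (A, new_B) == (A, A ^ B)        # net effect: b := a XOR b (cNOT), a unchanged
```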
Θ(log n)-time carry-skip adder
(8 bit segment shown)
[Circuit diagram: an 8-bit carry-skip segment built from sum/propagate/generate cells (signals S, A, B, G, P, Cin, Cout), with least-significant (LS) and most-significant (MS) skip logic (Gls, Pls, Pms), annotated with the 2nd, 3rd, and 4th carry ticks.]
With this structure, we can do a
2ⁿ-bit add in 2(n+1) logic levels
→ 4(n+1) reversible ticks
→ n+1 clock cycles.
Hardware overhead is < 2× regular ripple-carry!
Spacetime overhead only ~2(n+1)× a conventional single-cycle equivalent.
32-bit Adder Simulation Results
[Two charts: 32-bit adder power (W) vs. add frequency (Hz, 10⁴–10⁸) and energy per add (J) vs. add frequency, comparing adiabatic power and energy against conventional CMOS at 1 V and 0.5 V. Annotation: ~20× better performance @ 3 nW/adder.]
(All results here are normalized to a throughput level of 1 add/cycle.)
Technological Challenges
• Fundamental theoretical challenges:
– Find more efficient reversible algorithms
• Or, prove rigorous lower bounds on complexity overheads
– Study fundamental physical limits of reversible computing
• Implementation challenges:
– Design new devices with lower energy coefficients cEt
– Design high-quality resonators for driving transitions
– Empirically demonstrate large system-level power savings
• Application development challenges:
– Find a plausible near- to medium-term “killer app” for RC
• Something that’s very valuable, and can’t be done without it
– Build a prototype RC-based solution
Plenty of Room for
Device Improvement
[Chart: “Power vs. freq., alt. device techs.” Power per device (W, from 10⁻³ down to 10⁻³¹) vs. frequency (Hz, 10³ to 10¹²), comparing 0.18 µm CMOS and 0.18 µm 2LAL against various proposed reversible device technologies (nSQUID, QCA cell, quantum FET, rod logic, parametric quantron, helical logic), with the kT ln 2 level shown for reference.]
• Recall, irreversible device technology has at most ~3–4 orders of magnitude of power-performance improvements remaining.
– And then, the firm kT ln 2 (VNL) limit is encountered.
• But, a wide variety of proposed reversible device technologies have been analyzed by physicists.
– With theoretical power-performance up to 10–12 orders of magnitude better than today’s CMOS!
• Ultimate limits are unclear.
Limiting Cases of
Energy/Entropy Coefficients
• Energy/entropy coefficients in adiabatic “single-electronics”:
– Suppose the amount of charge moved |Q| = q (a single electron)
– Let the path consist of a single quantum channel (chain of states)
• Has quantum resistance R = R0 = 1/G0 = h/2q² = 12.9 kΩ.
– Then cE = h/2 = 2.07 meV/THz (very low!)
• If path is at Tpath = Troom = 300 K, then cS = 0.08 k/THz.
– For N× better efficiency than this, let the path consist of N parallel
quantum channels ⇒ N× lower resistance.
• What about systems where resistive models may not apply?
– E.g., superconductors, photonics, etc.
• A more general and rigorous (but perhaps loose) lower bound
on the energy coefficient in all adiabatic quantum systems is
given by the expression cE ≥ h²/(4·Eg·t),
– where Eg = energy gap between ground & excited states,
– and t = time taken for a single orthogonalizing transition
– Ex.: Let Eg = 1 eV, t = 1 ps. Then cE ≥ 4.28 μeV/THz.
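The quoted limiting values are easy to verify numerically (my own check, using standard constants):

```python
# Numeric checks of the limiting coefficients quoted above.
h   = 6.62607015e-34     # Planck's constant, J*s
q   = 1.602176634e-19    # elementary charge, C (also 1 eV in J)
k_B = 1.380649e-23       # J/K
THz = 1e12

R0  = h / (2 * q**2)                    # quantum resistance, ~12.9 kOhm
c_E = q**2 * R0                         # = h/2: one electron through one quantum channel
print(R0)                               # ~12906 Ohm
print(c_E / q * THz * 1e3)              # ~2.07 meV/THz
print(c_E / (k_B * 300.0) * THz)        # ~0.08 k per THz at T_path = 300 K

E_g, t = 1.0 * q, 1e-12                 # 1 eV gap, 1 ps transition time
print(h**2 / (4 * E_g * t) / q * THz * 1e6)   # ~4.28 ueV/THz lower bound
```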
Requirements for Energy-Recovering Clock/Power Supplies
• All known reversible computing schemes require a periodic
global signal that synchronizes and drives adiabatic
transitions.
– For good system-level energy efficiency, this signal must oscillate
resonantly and near-ballistically, with a high effective quality factor.
• Several factors make the design of a satisfactory resonator
quite difficult:
– Need to avoid uncompensated back-action of logic on resonator
– In some resonators, Q factor may scale unfavorably with size
– Effective quality factor problem
• There’s no reason to think that it’s impossible to do…
– But it is definitely a nontrivial hurdle that we need to face up to, pretty
urgently…
• If we want to make reversible computing practical in time to avoid an
extended period of stagnation in computer performance growth.
The Back-Action Problem
• The ideal resonator signal is a pure periodic signal.
– A pretty general result from communications theory:
• A resonator’s quality factor is inversely proportional to its signal bandwidth B.
– E.g., for an EM cavity w. resonant frequency ω0,
• the half-maximum BW is B = ∆ω = ω0/(2πQ) [1].
– Thus Q → ∞ ⇒ B → 0.
• There must be little or no information in the resonator signal!
• However, if the logic load being driven varies from one cycle to the next,
– whether due to data-dependent variations,
– or structural variations (different amounts of logic being driven per cycle)
• this will tend to produce impedance nonuniformities, which will lead to
nonuniform reflections of the resonator signal
– and thereby introduce nonzero bandwidth into that signal.
• Even more generally, any departure of resonator energy away from its
ideal desired trajectory represents a form of effective energy dissipation!
– we must control exactly where (into what states) all of the energy goes!
• the set of possible microstates of the system must not grow quickly
[1] Schwartz, Principles of Electrodynamics, Dover, 1972.
Unfavorable Scaling of Resonator
Quality Factor with Size?
• I don’t yet have a perfectly clear and general understanding of
this issue, but…
– In a lot of oscillating systems I’ve looked at, the resonant Q factor may
tend to get worse (or at least, not very much better) as the resonator
dimensions get smaller.
• E.g., in LC oscillators, inductor Q scales inversely to frequency
– EM emission is greater at high frequencies
– But, the tendency is for low f ⇒ large coil sizes, not small!
• Anecdotal reports from people working in NEMS community…
– It can be difficult to get high Q in nanoscale electromechanical resonators
» Perhaps due to present difficulty of precision engineering at nanoscale?
• Our own experience working with transmission-line resonators
• Example: In a cubical EM cavity of length L,
– We have 2πQ = L / 8δ, where δ = skin depth. ([1] again)
• Skin depth δ = (2πσk)^(−1/2), where σ = wall conductivity, k = wave #.
– So if L is fixed, high Q ⇒ small δ ⇒ large k ⇒ high f ⇒ low Q in the logic!
The Effective Quality Factor
Problem
• Actual quality factor of resonator Q = Eres/Edissr.
– Where Eres = energy contained in resonator signal
– and Edissr = energy dissipated in resonator per cycle.
• But the effective quality factor, for purposes of doing
energy-efficient logic transitions is Qeff = Edeliv/Edissr.
– Where Edeliv = energy delivered to the logic per transition.
• Since 1/Qeff of the logic signal energy is dissipated per cycle.
• Thus, Qeff = Q · (Edeliv/Eres).
– That is, the effective Q is taken down by the fraction of
resonator energy delivered to the logic per cycle.
• If a resonator needs to be large to attain high Q,
– it may also hold a large amount of energy Eres,
• and so it may not have a very high effective Q for driving the logic!
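In code form (my illustration, with made-up numbers to show the scale of the effect):

```python
# Sketch of the effective-Q relation Q_eff = Q * (E_deliv / E_res).
def effective_Q(Q, E_res, E_deliv):
    """Effective quality factor seen by the logic, given the resonator's own Q."""
    return Q * (E_deliv / E_res)

# Hypothetical numbers: a Q = 10,000 resonator storing 1 nJ but delivering
# only 1 pJ to the logic per transition behaves like Q_eff = 10.
print(effective_Q(Q=10_000, E_res=1e-9, E_deliv=1e-12))   # 10.0
```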
(PATENT PENDING, UNIVERSITY OF FLORIDA)
Trapezoidal Resonator Concept
[Diagram: a moving metal plate on a support arm/electrode, with the arm anchored to nodal points of fixed-fixed beam flexures located a little ways away in both directions (for symmetry); the plate moves between a phase-0° electrode and a phase-180° electrode, and insets plot each capacitance C(θ) over 0°–360°. The interdigitated structure repeats arbitrarily many times along the y axis, all anchored to the same flexure.]
Previous CMOS-MEMS Resonators
in post-CMOS DRIE process (in use at UF)
[Photos: front-side and back-side views of 150 kHz resonators, showing the serpentine spring, proof mass, and comb drive]
PATENT PENDING, UNIVERSITY OF FLORIDA
Resonator Schematic
[Circuit schematic: actuators and sensors coupled to the resonator; capacitances Ca, Cs, Cr; DC bias voltages Vp, Vc, Vb; AC drive signal vac]
PATENT PENDING, UNIVERSITY OF FLORIDA
Post-TSMC35 AdiaMEMS Resonator
(Coventorware model)
Taped out
April ‘04
Drive
comb
Sense
comb
Flex
arm
Quasi-Trapezoidal MEMS
Resonator: 1st Fabbed Prototype
• Post-etch process is still being fine-tuned.
– Parts are not yet ready for testing…
[Micrograph: primary flexure (fin), sense comb, and drive comb]
PATENT PENDING, UNIVERSITY OF FLORIDA
Conclusions
• Reversible computing will become necessary
within our lifetimes,
– if we wish to continue progress in computing
performance/power beyond the next 1-2 decades.
• Much progress in our understanding of RC
has been made in the past three decades…
– But much important work still remains to be done.
• I encourage my audience to join the
community of researchers who are working to
address the reversible computing challenge.