Michael P. Frank http://www.eng.fsu.edu/~mpf Introduction to Reversible Computing: Motivation, Progress, and Challenges ACM Computing Frontiers Conference 2005 Special Session: 1st Int’l Workshop on Reversible Computing Thursday, May 5,

Michael P. Frank
http://www.eng.fsu.edu/~mpf
Introduction to Reversible
Computing:
Motivation, Progress, and Challenges
ACM Computing Frontiers Conference 2005
Special Session:
1st Int’l Workshop on Reversible Computing
Thursday, May 5, 2005
Abstract of Talk
• The practical performance of a computational
process is ultimately limited by its energy efficiency.
– Useful work accomplished per unit energy dissipated.
• Fundamental physics limits the energy efficiency of
conventional, irreversible logic.
– The energy efficiency of conventional devices will likely be
forced to level off in roughly the next 10-20 years.
• Further advances beyond this point will require the
use of highly energy-recovering circuit techniques…
– and (eventually) this will require an increasing degree of
logical reversibility throughout the digital design.
• In this talk, we:
– explain these motivations for reversible computing,
– summarize some recent progress towards its realization
– and discuss some outstanding challenges for the field.
11/7/2015
M. Frank, "Introduction to Reversible Computing"
PART 1:
Motivation
Energy Efficiency
• The efficiency η of a process that consumes valued
resource R and produces valued product P is the
ratio between the amount of product produced, and
the amount of resource consumed: η = Pprod/Rcons.
– Example 1: A heat engine “consumes” (which in this case,
means “degrades”) an amount Q of high-temperature heat
energy, and produces an amount W of work.
• The heat engine’s efficiency is thus ηh.e. = W/Q. (Dimensionless.)
– Of course, ηh.e < 1 because of the conservation of energy…
– In the 19th cent., Sadi Carnot showed that ηh.e. ≤ (TH − TL)/TH.
» Where TH,TL = temps. of hot, cold thermal reservoirs
– Example 2: A computer (i.e., “computational engine”)
consumes an amount Econs of free energy, and performs
Nops useful computational operations (produces Nops
operations worth of useful computational “effort”).
• The computer’s (energy) efficiency is thus ηE,comp = Nops/Econs.
– Units: Operations per unit energy, or ops/sec/watt.
Lower Bounds on
Energy Dissipation
• In today’s 90 nm VLSI technology, for minimal operations
(e.g., conventional switching of a minimum-sized transistor):
– Ediss,op is on the order of 1 fJ (femtojoule) ⇒ ηE ≲ 10^15 ops/sec/watt.
• Will be a bit better in coming technologies (65 nm, maybe 45 nm)
• But, conventional digital technologies are subject to several
lower bounds on their energy dissipation Ediss,op for digital
transitions (logic / storage / communication operations),
– And thus, corresponding upper bounds on their energy efficiency.
• Some of the known bounds include:
– Leakage-based limit for high-performance field-effect transistors:
• Roughly 5 aJ (attojoules) ⇒ ηE ≲ 2×10^17 ops/sec/watt
– Reliability-based limit for all non-energy-recovering technologies:
• Roughly 1 eV (electron-volt) ⇒ ηE ≲ 6×10^18 ops/sec/watt
– von Neumann-Landauer (VNL) bound for all irreversible technologies:
• Exactly kT ln 2 ≈ 18 meV ⇒ ηE ≲ 3.5×10^20 ops/sec/watt
– For systems whose waste heat ultimately winds up in Earth’s atmosphere,
» i.e., at temperature T ≈ Troom = 300 K.
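The bounds listed above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original slides; the names `ops_per_joule` and `bounds` are illustrative, and it relies on the fact that 1 op/sec/watt is the same as 1 op/joule.

```python
import math

K_B = 1.380649e-23      # Boltzmann's constant, J/K
EV = 1.602176634e-19    # one electron-volt, J
T_ROOM = 300.0          # assumed environment temperature, K


def ops_per_joule(e_diss_joules):
    """Upper bound on efficiency (ops/J = ops/sec/watt) for a given dissipation per op."""
    return 1.0 / e_diss_joules


# Dissipation per operation for each bound quoted on the slide, in joules.
bounds = {
    "90 nm CMOS (~1 fJ)": 1e-15,
    "FET leakage limit (~5 aJ)": 5e-18,
    "reliability limit (~1 eV)": EV,
    "VNL bound (kT ln 2)": K_B * T_ROOM * math.log(2),
}

for name, energy in bounds.items():
    print(f"{name:28s} E = {energy:.3g} J  ->  eta <= {ops_per_joule(energy):.3g} ops/J")
```

The last row reproduces the slide's figures: kT ln 2 at 300 K is about 18 meV, giving roughly 3.5×10^20 ops/sec/watt.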
Gate Energy Trends
Trend of ITRS Min. '97-'03 Transistor Switching Energy
[Figure: minimum transistor switching energy (CVV/2 energy, J; log scale from 1.E-14 down to 1.E-22) vs. year (1995 to 2045), based on ITRS ’97-’03 roadmaps. Data points are labeled with node numbers (nm DRAM hp): 250, 180, 130, 90, 65, 45, 32, 22. Series: LP min gate energy, HP min gate energy. Reference levels: room-temperature 100 kT reliability limit, one electron volt, room-temperature kT thermal energy, room-temperature von Neumann-Landauer limit (kT ln 2). Annotation: “Practical limit for CMOS?”]
Reliability Bound on Logic
Signal Energies
• Let Esig denote the logic signal energy,
– The energy involved in storing, transmitting, or transforming a bit’s worth of
digital information.
• But note that “involved” does not necessarily mean “dissipated!”
• As a result of fundamental thermodynamic considerations, it is required
that Esig ≥ kBTsig ln R,
– Where kB is Boltzmann’s constant, 1.38×10^−23 J/K;
– and Tsig is the temperature of the local subsystem carrying the signal;
– and R is the reliability factor, i.e., the improbability 1/perr of error.
• In non-energy-recovering logic technologies (totally dominant today)
– Basically all of the signal energy is dissipated to heat on each operation.
• And often additional energy (e.g., short-circuit power) as well.
• In this case, minimum sustainable dissipation is Ediss,op ≳ kBTenv ln R,
– Where Tenv is now the temperature of the waste-heat reservoir
• Averages around 300 K (room temperature) in Earth’s atmosphere
• For a decent R = 2×10^17, this energy is ~40 kT ≈ 1 eV.
– ⇒ For energy efficiency > 1 op/eV, we must recover some of the signal energy.
• Rather than dissipating it all to heat with each manipulation of the signal.
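As a quick check of the reliability bound, the sketch below evaluates kB·T·ln R for the slide's example value R = 2×10^17. The function name `min_signal_energy` is illustrative, not from the talk, and T = 300 K is the room-temperature value assumed above.

```python
import math

K_B = 1.380649e-23      # Boltzmann's constant, J/K
EV = 1.602176634e-19    # one electron-volt, J


def min_signal_energy(reliability, temperature_k=300.0):
    """Lower bound E_sig >= k_B * T * ln(R), in joules."""
    return K_B * temperature_k * math.log(reliability)


energy = min_signal_energy(2e17)
in_units_of_kt = energy / (K_B * 300.0)
in_ev = energy / EV
print(f"E_sig >= {in_units_of_kt:.1f} kT = {in_ev:.2f} eV")
```

This gives a little under 40 kT, or just over 1 eV, which matches the "~40 kT ≈ 1 eV" figure.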
(von Neumann?)-Landauer (VNL) Bound
A rigorous result first stated clearly by Rolf Landauer, IBM, 1961
(von Neumann had suggested something similar in 1949 but did not publish details)
• Bound is a simple, direct logical consequence of the time-reversibility (invertibility) of all fundamental physical dynamics.
– This in turn is implied by the Hamiltonian formulation of all mechanics;
e.g., the unitarity of quantum mechanics.  Very firmly established!
• Invertibility implies physical information can’t be destroyed!
– Only reversibly (i.e., mathematically invertibly) transformed!
• When we lose or discard a bit’s worth of logical information,
– e.g., by erasing or destructively overwriting a bit storage location…
• the ‘lost’ information must actually remain in existence,
– if not in a known form, then as a bit’s worth (k ln 2) of physical entropy.
• Entropy simply means unknown information residing in the physical state.
• If the logical bit was originally known (not entropy)
– then, entropy has increased in this process by ∆S = 1 bit = k ln 2.
• The energy in the heat reservoir must be increased by an amount ∆S·Tenv
= kTenv ln 2 in order to accommodate this additional entropy.
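The entropy bookkeeping behind this argument can be sketched numerically. The example below is illustrative only: it assumes 8 physical microstates per logical state (a small number chosen for clarity) and a 300 K environment, and the helper name `entropy_bits` is not from the talk.

```python
import math

K_B = 1.380649e-23   # Boltzmann's constant, J/K
T_ENV = 300.0        # assumed environment temperature, K


def entropy_bits(num_microstates):
    """Entropy, in bits, of a uniform distribution over the given number of microstates."""
    return math.log2(num_microstates)


n = 8  # assumed microstates per logical state

# Before erasure the logical bit is known, so the system occupies one of n microstates.
entropy_before = entropy_bits(n)

# After erasure both logical histories end in state "0"; because the dynamics are
# invertible, the 2n distinct trajectories must land in 2n distinct microstates.
entropy_after = entropy_bits(2 * n)

delta_bits = entropy_after - entropy_before
delta_s = delta_bits * K_B * math.log(2)   # entropy increase in J/K
e_diss = delta_s * T_ENV                   # heat added to the environment, J
print(f"dS = {delta_bits} bit, E_diss = {e_diss:.3g} J")
```

The result does not depend on n: doubling the number of accessible microstates always adds exactly one bit, or k ln 2, of entropy.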
VNL Bound on Energy Dissipation from Information Loss
• Follows directly from the reversibility of fundamental physics!
[Figure: physical microstate trajectories during a bit erasure. Before the operation, logical state “0” and logical state “1” each contain N physical microstates (shown as 8 for clarity in this simple example), so S = k ln 8 = 3 bits for each. After the operation, all trajectories end in logical state “0”, where S = k ln 16 = 4 bits.]
• ∆S = 1 bit = k ln 2
• Ediss = ∆S·Tenv = kTenv ln 2
Reversible Computing
• The basic idea is simply this:
– Don’t erase information when performing logic / storage /
communication operations!
• Instead, just reversibly (invertibly) transform it in place!
• When reversible digital operations are implemented
using well-designed energy-recovering circuitry,
– This can result in local energy dissipation Ediss << Esig,
• this has already been empirically demonstrated by many groups.
– and even total energy dissipation Ediss << kT ln 2!
• This has been shown in theory, but we are not yet to the point of
demonstrating such low levels of dissipation experimentally.
– Achieving this goal requires very careful design,
– and verifying it requires very sensitive measurement equipment.
PART 2:
Progress
(1973-2005)
A Few Highlights Of Reversible
Computing History
• Bennett, 1973-1989:
– Reversible Turing machines & emulation algorithms
• Can run “virtual” irreversible machines on reversible architectures.
– But, the emulation introduces some inefficiencies
– Early chemical & Brownian-motion models of physical
implementations.
• Fredkin and Toffoli, late 1970’s/early 1980’s
– Reversible logic gates and networks
– Ballistic and adiabatic implementation schemes
• Groups @ Caltech,ISI,Amherst,Xerox,MIT, ‘85-’95:
– Concepts & implementation for adiabatic circuits in VLSI
– Small explosion of adiabatic circuit literature since then
• Mid 1990s-today:
– Better understanding of overheads, tradeoffs, asymptotic scaling
– A few groups begin exploring post-CMOS implementations
Early Chemical Implementations
• How to physically implement reversible logic?
– Bennett’s original inspiration: DNA polymerization!
• Reversible copying of a DNA strand
– Molecular basis of cell division / organism reproduction
• This (and all) chemical reactions are reversible…
– Direction (forward vs. backward) & reaction rate depend on relative
concentrations of reagent and product species, which affect the free energy
• Energy dissipated per step turns out to be proportional to speed.
– Implies process is characterized by an energy-time constant.
» I call this the “energy coefficient” cE ≡ Ediss,op·top = Ediss,op/fop.
• For DNA, typical figures are 40 kT ≈ 1 eV @ ~1,000 bp/s
– Thus, the energy coefficient cE is about 1 eV/kHz.
• Can we achieve better energy coefficients?
– Yes, in fact, we had already beaten DNA’s cE in reversible
CMOS VLSI technology circa 1995!
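The DNA figure can be reproduced directly from the definition cE = Ediss,op/fop. This is a minimal sketch with illustrative names (`energy_coefficient`, `to_ev_per_khz`), assuming T = 300 K when converting 40 kT to joules.

```python
K_B = 1.380649e-23      # Boltzmann's constant, J/K
EV = 1.602176634e-19    # one electron-volt, J


def energy_coefficient(e_diss_joules, ops_per_second):
    """Energy coefficient c_E = E_diss,op / f_op, in J*s (equivalently J/Hz)."""
    return e_diss_joules / ops_per_second


def to_ev_per_khz(c_e):
    """Convert an energy coefficient from J/Hz to eV/kHz."""
    return c_e / EV * 1e3


# DNA polymerization: about 40 kT per step at roughly 1,000 base pairs per second.
dna_c_e = energy_coefficient(40 * K_B * 300.0, 1000.0)
dna_ev_per_khz = to_ev_per_khz(dna_c_e)
print(f"DNA c_E ~ {dna_c_e:.3g} J*s = {dna_ev_per_khz:.2f} eV/kHz")
```

Because the energy dissipated per step is proportional to speed, halving the rate would halve the energy per step while leaving cE unchanged.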
Energy Coefficients
in Electronics
• For a transition involving the adiabatic transfer of an amount
Q of charge along a path with resistance R:
– The raw (local) energy coefficient is given by
$c_E = E_\mathrm{diss}\,t = P_\mathrm{diss}\,t^2 = I V t^2 = I^2 R\,t^2 = Q^2 R$.
• Here, V is the voltage drop along the path
• Example: In a fairly recent (180 nm) CMOS VLSI technology:
– Energy stored per min. sized transistor gate: ~1 fJ @ 2V
• Corresponds to charge per gate of Q = 1 fC ≈ 6,000 electrons
– Resistance per turned-on transistor of ~14 kΩ
• Order of quantum resistance R = R0 = 1/G0 = h/(2q^2) = 12.9 kΩ
– Ideal energy coefficient for a single-gate transition ~1.4×10^−26 J/Hz
• Or in more convenient units, ~80 eV/GHz = 0.08 eV/MHz!
– with some expected overheads for a simple test circuit, calculated
energy coefficient comes out to about 8× higher, or ~10^−25 J·s
• Or ~600 eV/GHz = 0.6 eV/MHz.
– Detailed Cadence simulations gave us, per transistor:
• @ 1 GHz: P = 20 μW, E = 20 fJ = 1.2 keV, so Ec = 1.2 eV/MHz
• @ 1 MHz: P = 0.35 pW, E = 0.35 aJ = 2.2 eV, so Ec = 2.1 eV/MHz
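The ideal single-gate numbers in this example follow from cE = Q²R. The sketch below reworks them; the function names are illustrative, and the relation E = QV/2 for the energy stored on a capacitive gate node is an assumption used to recover Q from the quoted 1 fJ at 2 V.

```python
H = 6.62607015e-34        # Planck's constant, J*s
Q_E = 1.602176634e-19     # elementary charge, C (numerically also joules per eV)


def gate_charge(energy_j, voltage_v):
    """Charge on a capacitive node storing energy E = Q*V/2 (assumed relation)."""
    return 2.0 * energy_j / voltage_v


def energy_coefficient(charge_c, resistance_ohm):
    """Adiabatic energy coefficient c_E = Q^2 * R, in J*s."""
    return charge_c ** 2 * resistance_ohm


def to_ev_per_ghz(c_e):
    """Convert an energy coefficient from J/Hz to eV/GHz."""
    return c_e / Q_E * 1e9


q = gate_charge(1e-15, 2.0)           # ~1 fJ stored at 2 V
electrons = q / Q_E
r_quantum = H / (2 * Q_E ** 2)        # quantum resistance h/(2q^2)
c_e = energy_coefficient(q, 14e3)     # ~14 kOhm per turned-on transistor

print(f"Q = {q:.3g} C ({electrons:.0f} electrons)")
print(f"R0 = {r_quantum:.0f} ohm")
print(f"c_E = {c_e:.3g} J*s = {to_ev_per_ghz(c_e):.0f} eV/GHz")
```

With R = 14 kΩ this gives roughly 87 eV/GHz; using the quantum resistance of 12.9 kΩ instead gives about 80 eV/GHz, consistent with the rounded figure quoted above.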
Simulation Results from Cadence
Power vs. freq., TSMC 0.18, Std. CMOS vs. 2LAL
[Figure: average power dissipation per nFET (W, log scale, 1.E-05 down to 1.E-14), also labeled as energy dissipated per nFET per cycle, vs. frequency (Hz, 1.E+09 down to 1.E+03), for standard CMOS and 2LAL. Annotations: “<.01× the power @ 1 MHz” and “>100× faster @ 1 pW/T”.]
Assumptions & caveats:
• Assumes ideal trapezoidal power/clock waveform.
• Minimum-sized devices, 2λ×3λ
– .18 µm (L) × .24 µm (W)
• nFET data is shown
– pFET data is very similar
• Various body biases tried
– Higher Vth suppresses leakage
• Room temperature operation.
• Interconnect parasitics have not yet been included.
• Activity factor (transitions per device-cycle) is 1 for CMOS, 0.5 for 2LAL in this graph.
• Hardware overhead from fully-adiabatic design style is not yet reflected
– ≥2× transistor-tick hardware overhead in known reversible CMOS design styles
A Useful Two-Bit Primitive: Controlled-SET or cSET(a,b)
• Semantics: If a=1, then set b:=1.
– Conditionally reversible, if the special precondition ab=0 is met.
• Note it’s 1-to-1 on the subset of states used
– Sufficient to avoid Landauer’s principle

a  b  →  a’ b’
0  0  →  0  0
0  1  →  0  1
1  0  →  1  1

• Can implement cSET in dual-rail CMOS with a pair of transmission gates
– Each needs just 2 transistors
• plus one drive signal
• This 2-bit semi-reversible operation & its inverse are together universal for reversible (and irreversible) logic!
– If we compose them in special ways.
[Figure: hardware diagram showing a drive signal (0→1), a switch (T-gate), and the signals a and b.]
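The conditional reversibility of cSET can be checked by brute force. This is a small illustrative model in Python (the function names `cset` and `uncset` are not from the talk): it confirms that cSET is one-to-one on the states satisfying ab=0, but not on all four states.

```python
def cset(a, b):
    """Controlled-SET: if a == 1, set b := 1; a is left unchanged."""
    return (a, 1 if a == 1 else b)


def uncset(a, b):
    """Inverse of cSET on its allowed image: if a == 1, clear b := 0."""
    return (a, 0 if a == 1 else b)


all_states = [(a, b) for a in (0, 1) for b in (0, 1)]
allowed = [(a, b) for (a, b) in all_states if a * b == 0]   # precondition ab = 0

images_allowed = [cset(a, b) for (a, b) in allowed]
images_all = [cset(a, b) for (a, b) in all_states]

for state, image in zip(allowed, images_allowed):
    print(state, "->", image)

# One-to-one on the allowed subset, so no information is lost there.
injective_on_allowed = len(set(images_allowed)) == len(allowed)
# Not one-to-one on all states: (1,0) and (1,1) both map to (1,1).
injective_on_all = len(set(images_all)) == len(all_states)
print(injective_on_allowed, injective_on_all)
```

Running `uncset` on each image returns the original allowed state, which is what makes the operation usable without erasing information.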
Reversible OR (rOR) from cSET
• Semantics: rOR(a,b) ::= if a|b, c:=1.
– Set c:=1 on the condition that either a or b is 1.
• Reversible under precondition that initially a|b → ~c.
• Two parallel cSETs simultaneously driving a single output line implement the rOR operation!
– This type of composition is not traditionally considered.
• Similarly one can do rAND, and reversible versions of all operations.
– Logic synthesis is extremely straightforward…
[Figure: hardware diagram with inputs a, b and output c; spacetime diagram in which a, b, and c = 0 enter and a’, b’, and c’ = a OR b leave.]
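The same brute-force check applies to rOR. The sketch below is an illustrative model (the name `r_or` is hypothetical): it enumerates the states satisfying the precondition a|b → ~c and confirms that rOR maps them to distinct outputs.

```python
from itertools import product


def r_or(a, b, c):
    """Reversible OR: if a or b is 1, set c := 1; a and b are left unchanged."""
    return (a, b, 1 if (a or b) else c)


def precondition(a, b, c):
    """Initially, (a OR b) implies (NOT c)."""
    return not (a or b) or c == 0


allowed = [s for s in product((0, 1), repeat=3) if precondition(*s)]
images = [r_or(*s) for s in allowed]

for state, image in zip(allowed, images):
    print(state, "->", image)

# Distinct allowed inputs give distinct outputs, so the operation is invertible there.
injective = len(set(images)) == len(allowed)
print("one-to-one on allowed states:", injective)
```

With c initially 0, the output c’ is simply a OR b, as in the spacetime diagram.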
O(log n)-time carry-skip adder
(8 bit segment shown)
[Figure: 8-bit segment of the adder, built from per-bit cells with ports S, A, B, G, P, Cin and combining cells with ports Gls, Pls, Pms, GCout, Cin (marked LS and MS). The 2nd, 3rd, and 4th carry ticks are marked.]
• With this structure, we can do a 2^n-bit add in 2(n+1) logic levels
→ 4(n+1) reversible ticks
→ n+1 clock cycles.
• Hardware overhead is <2× regular ripple-carry!
32-bit Adder Simulation Results
[Figure, two panels: “32-bit adder power vs. frequency” (Power (W) vs. Add Frequency (Hz), both log scale) and “32-bit adder energy vs. frequency” (Energy/Add (J) vs. Add Frequency (Hz), both log scale). Series: CMOS pwr, Adia. pwr, CMOS energy, Adia. enrgy, with 1V CMOS and 0.5V CMOS curves. Annotation: “20x better perf. @ 3 nW/adder”.]
(All results normalized to a throughput level of 1 add/cycle)
CMOS Gate Implementing rLatch / rUnLatch
• Symmetric Reversible Latch
[Figure: three views of the latch: implementation, icon, and spacetime diagram. The crLatch and crUnLatch operations each connect the signals in and mem.]
• Just a transmission gate again
• This time controlled by a clock, with the data signal driving
• Concise, symmetric hardware icon – Just a short orthogonal line
• Thin strapping lines denote connection in spacetime diagram.
Example:
Building cNOT from rlXOR
• rlXOR(a,b,c): Reversible latched XOR.
– Semantics: c := ab.
• Reversible under precondition that c is initially clear.
• cNOT(a,b): Controlled-NOT operation.
– Semantics: b := ab. (No preconditions.)
• A classic “primitive” in reversible & quantum computing
– But, it turns out to be fairly complex to implement cNOT in
available fully adiabatic hardware…
• Thus, it’s really not a very good building block for practical
hardware designs!
– We can (of course) still build it, if we really want to.
• Since, as I said, our gate set is universal for reversible logic
cNOT from rlXOR: Hardware Diagram
• A logic block providing an in-place cNOT operation (a cNOT “gate”) can be constructed from 2 rlXOR gates and two latched buffers.
[Figure: hardware diagram with signals A, B, and X and the reversible latches.]
• The key is:
– Operate some of the gates in reverse!
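One way such a construction could work is sketched below. This is an assumed sequence of steps, not a transcription of the hardware diagram: the helper names and the ordering are hypothetical, chosen only to show how two rlXOR operations (one run in reverse) plus a latch and an unlatch can yield an in-place cNOT. Each helper asserts the precondition under which it is reversible.

```python
def rl_xor(a, b, c):
    """Forward rlXOR: c := a XOR b. Reversible only if c is initially clear."""
    assert c == 0, "precondition: output must start clear"
    return a ^ b


def rl_xor_reverse(a, b, c):
    """rlXOR run in reverse: clears c. Valid only if c currently equals a XOR b."""
    assert c == a ^ b, "precondition: c must hold a XOR b"
    return 0


def latch(src, dst):
    """Reversible latch: copy src into a clear destination."""
    assert dst == 0, "precondition: destination must start clear"
    return src


def unlatch(src, dst):
    """Latch run in reverse: clears dst. Valid only if dst is a copy of src."""
    assert dst == src, "precondition: dst must equal src"
    return 0


def cnot(a, b):
    """In-place cNOT, b := a XOR b, using a temporary x that starts and ends clear."""
    x = 0
    x = rl_xor(a, b, x)           # x = a XOR b
    b = rl_xor_reverse(a, x, b)   # b == a XOR x, so b can be cleared reversibly
    b = latch(x, b)               # move the result into b
    x = unlatch(b, x)             # x == b, so x can be cleared reversibly
    assert x == 0
    return a, b


for a in (0, 1):
    for b in (0, 1):
        print((a, b), "->", cnot(a, b))
```

Every step either writes into a location known to be clear or clears a location whose value is recomputable from the others, so no information is erased along the way.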
PART 3:
Challenges
for the Field
Challenges for the Field
• If we want our field to go beyond academia,
– and become a practical computing technology,
• then we need to address both:
– a few remaining technological challenges
– and also, a variety of “PR” type challenges
• because these are closely coupled!
– A convincing technology gets people excited
– Positive perceptions ⇒ more funding, workers
Technological Challenges
• Fundamental theoretical challenges:
– Find more efficient reversible algorithms
• Or prove rigorous lower bounds on complexity overheads
– Study fundamental physical limits of reversible computing
• Implementation challenges:
– Design new devices with lower energy coefficients
– Design high-quality resonators for driving transitions
– Empirically demonstrate large system-level power savings
• Application development challenges:
– Find a plausible near- to medium-term “killer app” for RC
• Something that’s very valuable, and can’t be done without it
– Build a prototype RC-based solution
Plenty of Room for Device Improvement
• Recall, irreversible device technology has at most ~3-4 orders of magnitude of power-performance improvements remaining.
– And then, the firm kT ln 2 limit is encountered.
• But, a wide variety of proposed reversible device technologies have been analyzed by physicists.
– With theoretical power-performance up to 10-12 orders of magnitude better than today’s CMOS!
• Ultimate limits are unclear.
[Figure: Power vs. freq., alt. device techs. Power per device (W, log scale, 1.E-03 down to 1.E-31) vs. frequency (Hz, 1.E+12 down to 1.E+03). Curves: .18um CMOS, .18um 2LAL, nSQUID, QCA cell, Quantum FET, Rod logic, Param. quantron, Helical logic, and kT ln 2. A group of curves is labeled “Various reversible device proposals”.]
MEMS Resonator (One Concept)
(PATENT PENDING, UNIVERSITY OF FLORIDA)
[Figure: interdigitated MEMS resonator drawn with x, y, and z axes. Labels: moving metal plate support arm/electrode; moving plate and its range of motion; phase 0° electrode; phase 180° electrode; plots of capacitance C(θ) vs. θ from 0° to 360° for each electrode.]
• Arm anchored to nodal points of fixed-fixed beam flexures, located a little ways away, in both directions (for symmetry)
• Repeat interdigitated structure arbitrarily many times along y axis, all anchored to the same flexure
A Challenge for Our Community
• I suspect that the field’s critics will never be silenced by theory
and simulations alone…
– To prove to the world that reversible computing can really work will
require a complete empirical demonstration.
• We thus cannot afford to continue to sweep issues such as
resonator design under the rug…
– A convincing demonstration of low total system power must be
completely self-contained, including the resonator.
• with only DC power input as needed to keep it running
• My challenge to us:
– Let’s work together to fabricate and empirically demonstrate a simple
test chip (e.g., a binary counter) that measurably dissipates much less
than the logic signal energy, and eventually much less than some small
multiple of kT energy (within a room temperature environment)
• Where this measures “wall-plug” power, as our critics like to put it.
Public Relations Challenges
• Difficulty: Reversible computing is little known
– And people have a lot of misconceptions about it.
• We need to strive to do better at things like:
– Educating the broader science, engineering, and
CS community about the field
• Including overcoming misconceptions and prejudices
– Gaining “political” standing with funding agencies,
industry, investors, professional organizations
• To lead to the “next level” of more intensive research
– Working collaboratively with colleagues in other
disciplines (outside CS) who have relevant skills
• Device physicists, analog circuit designers, etc.
Conclusions
• Reversible computing will very likely become
necessary within our lifetimes,
– if we are to continue progress in computing
performance/power.
• Much progress in our understanding of RC
has been made in the past three decades…
– But much important work still remains to be done.
• Let’s work together to solve the difficult
technological challenges, as well as to raise
awareness & improve perceptions of the field.
– I hope this workshop will help that to happen
Structure of Today’s Session
• Sub-session 1: Perspectives on RC (-11:00 am)
– Bennett’s keynote, this introductory talk
– Eric DeBenedictis on supercomputing apps
• Sub-session 2: Novel Impl. Techs. (11:20-12:50)
– Sarah Frost, Notre Dame, RC with Quantum Dots
– Erik Forsberg, KTH/Zhejiang, Y-branch switches
• Sub-session 3: Quasi-reversible circuits (2-3:50)
– Four talks, groups from USA, Korea, Germany
• Sub-session 4: Rev. comp. theory (4:20-5:20)
– Paul Vitanyi, time/space/energy tradeoffs
– Levitin & Toffoli, on thermodynamic limits of RC
• Panel Discussion: What next steps should we take?