Reversible Computing A Brief Introduction Dr. Michael P. Frank [email protected] Dept. of Computer & Information Science & Engineering (Affil.


Reversible Computing
A Brief Introduction
Dr. Michael P. Frank
[email protected]
Dept. of Computer & Information Science & Engineering
(Affil. Dept. of Electrical & Computer Engineering)
University of Florida, Gainesville, Florida
Presented at:
2004 Computing Beyond Silicon Summer School (Week 4)
California Institute of Technology
Pasadena, California, July 6-8, 2004
Abstract
• The performance of power-limited computing systems is
directly limited by the energy efficiency of logic operations.
Performance (ops / time) = Power (energy dissipated / time) ×
Energy efficiency (ops / energy dissipated)
• Traditional logic techniques are approaching a number of
very general physical limits on energy efficiency.
– Due to quite fundamental thermodynamic considerations.
• The only potential way to circumvent all of these limits is
through (logically & physically) reversible computing (RC).
– It is related to quantum computing, but easier in some ways.
• RC appears to be doable, but it is still very challenging…
– But, it is a challenge that we must meet, for continued progress.
• In this talk, we survey fundamental concepts, available
technologies, and outstanding problems of RC.
Moore's Law: Devices per IC
[Figure: transistors per chip (log scale, 1 to 1,000,000,000) vs. year (1950–2010), from early Fairchild ICs through Intel µprocessors (4004, 8086, 286, 386, 486DX, Pentium, P2, P3, P4, Itanium 2, Madison); average increase of 57%/year.]
Device Size Scaling Trends
[Figure: ITRS '03 feature lengths (nm, log scale) vs. year of production (1990–2045), based on ITRS '97-'03 roadmaps. Curves: DRAM hp, MPU M1 hp, poly hp, printed GL, physical GL, node, and EOT. Node labels run from 1000 nm (1 µm) through 350, 250, 180, 130, 100, 90, 65, 45, 32 and 22 nm; reference sizes marked: virus, protein molecule (10 nm), DNA/CNT radius (1 nm), silicon atom, hydrogen atom (0.1 nm).]
ITRS '97-'03 Gate Energy Trends
[Figure: trend of minimum transistor switching energy (CVV/2 energy, J, log scale from 10⁻¹⁴ down to 10⁻²², i.e., fJ through aJ to zJ) vs. year (1995–2045), based on ITRS '97-'03 roadmaps, with LP and HP minimum gate energies (aJ) for the 250 nm through 22 nm nodes. Reference levels: one electron volt (1 eV), the room-temperature 100 kT reliability limit (100 k(300 K)), room-temperature kT thermal energy (k(300 K)), and the room-temperature von Neumann–Landauer limit (ln(2) k(300 K)). A "Practical limit for CMOS?" annotation is also marked.]
The Leakage Problem
• The primary traditional approach to decrease energy dissipation per
logic-op has been:
– Simply decrease the magnitude of the ½CV² energy that is stored per bit.
• This is done by moving to smaller transistor structures, which
decreases C and usable V.
– However, as V decreases, there is a problem.
• An upper bound on the on/off ratio Ron/off = Ion/Ioff of transistors is
given by the relation log₁₀ Ron/off ≲ V/s.
– The parameter s is called the subthreshold slope.
• Typical units: mV/decade (decade = log 10)
• The exact value of s depends on the precise device geometry.
– It is reduced by going to multi-gate or surround-gate structures.
• But s has a fundamental minimum at room temperature T:
s ≥ (kT/q)·ln 10 per decade ≈ 60 mV/decade in FETs, independent
of materials! (Whether carbon nanotubes, Si nanowires, etc.)
– This is just due to the ratio between above-barrier state occupancy
probabilities for a change in barrier height of V. (From Boltzmann distrib.)
• At low voltages (e.g., a few hundred mV), transistors can’t turn off
effectively, and there is substantial continuous power dissipation.
– Leakage already accounts for as much as 40% of total power in many
designs!
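As a rough check on the 60 mV/decade floor and the voltage scaling it implies, here is a minimal Python sketch. It assumes T = 300 K and treats log₁₀ Ron/off ≲ V/s as an equality for an ideal device; the function names are hypothetical, chosen for illustration only.

```python
import math

# Exact SI values of the physical constants.
K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def min_subthreshold_slope_mv(temp_k: float = 300.0) -> float:
    """Thermal floor on FET subthreshold slope, (kT/q) * ln 10, in mV/decade."""
    return 1e3 * (K_B * temp_k / Q_E) * math.log(10)


def max_on_off_decades(voltage_v: float, slope_mv_per_decade: float) -> float:
    """Upper bound on log10(Ion/Ioff) from log10 Ron/off <~ V/s."""
    return 1e3 * voltage_v / slope_mv_per_decade


if __name__ == "__main__":
    s = min_subthreshold_slope_mv()
    print(f"s_min at 300 K: {s:.1f} mV/decade")
    for v in (1.0, 0.5, 0.3):
        decades = max_on_off_decades(v, s)
        print(f"V = {v:.1f} V: on/off ratio at most about 10^{decades:.1f}")
```

At 0.5 V the bound works out to only about 10^8.4, which is consistent with the ~0.5 V voltage floor in the scenario that follows.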
A Fairly Conventional “Optimistic”
Technology Scenario for CMOS
• Suppose device lengths are cut in half every 3 years…
– From 90 nm today down to 22 nm node in 2010 (then stop).
– Node capacitances, gate delays also decrease accordingly…
• “Technology boosters” such as high-κ dielectrics &
novel FET structures (FinFET, surround-gate, etc.)
keep leakage power manageable, for a little while…
– However, the absolute minimum room-T subthreshold slope
for FETs will remain 60 mV/decade! (= (kT/q)·ln 10)
• Assume this point is also reached by around 2007.
• Voltages then reach a minimum of ~0.5 V in 2007.
– Can't go lower while keeping on/off ratio above 10⁸ level!
• A minimum level chosen so as to keep leakage small
• Now, consider what all this implies about future chip
performance, given a 100 W maximum power level…
– Let max raw performance = 100 W / (½CV² gate energy)
Not much life left for standard CMOS…
CMOS Raw Performance - "Optimistic" Scenario
[Figure: device-ops/second per 100 W chip (log scale, 10¹⁷ to 10¹⁹) vs. year (2004–2020) for CMOS. Annotations: "e.g., 67 million devices actively switching @ 3 GHz" and "e.g., 825 million devices actively switching @ 4 GHz, ~7,000 kT dissip. per device-op".]
Now, even if the leakage problem were solved, the ~100 kT limit for
reliable switching is only another factor of 70 beyond this point!
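The chart's annotation and the factor of 70 can be reproduced with a few lines of arithmetic. A minimal Python sketch, assuming T = 300 K and taking the ~7,000 kT per device-op and ~100 kT reliability figures from the slides; the helper names are illustrative only.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def thermal_energy_j(temp_k: float = 300.0) -> float:
    """kT in joules at the given temperature."""
    return K_B * temp_k


def max_raw_ops_per_second(power_w: float, energy_per_op_j: float) -> float:
    """Raw device-ops/second that a power budget allows: P / E_op."""
    return power_w / energy_per_op_j


kT = thermal_energy_j()
e_op = 7000 * kT  # ~7,000 kT dissipated per device-op (chart annotation)

print(f"kT at 300 K: {kT:.3e} J")
print(f"100 W / (7,000 kT): {max_raw_ops_per_second(100.0, e_op):.2e} ops/s")
# Cross-check: 825 million devices switching at 4 GHz.
print(f"825e6 x 4e9: {825e6 * 4e9:.2e} ops/s")
# Headroom left before the ~100 kT reliability limit.
print(f"7,000 kT / 100 kT = {7000 / 100:.0f}x")
```

The two throughput estimates agree to within about 5%, so the annotation is self-consistent.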
Reversible Computing
Motivation & Basic Concepts
Landauer's (1961) principle (hinted at by von Neumann, '49):
The minimum energy cost of oblivious bit erasure
[Figure: Before bit erasure, N possible distinct states s0…sN−1 have the bit = 0 and N possible distinct states s′0…s′N−1 have the bit = 1. Unitary (one-to-one) evolution maps these 2N states onto 2N possible distinct states t0…t2N−1, all of which have the bit = 0 after erasure.]
Increase in entropy: ∆S = log 2 = k ln 2. Energy dissipated to heat: T∆S = kT ln 2
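To put k ln 2 and kT ln 2 in everyday units, here is a minimal Python sketch, assuming T = 300 K; the function name is hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
EV = 1.602176634e-19   # one electron volt, J


def landauer_limit_j(temp_k: float = 300.0) -> float:
    """Minimum heat from obliviously erasing one bit: kT ln 2."""
    return K_B * temp_k * math.log(2)


entropy_increase = K_B * math.log(2)  # Delta S = k ln 2, in J/K
e_min = landauer_limit_j()
print(f"Delta S = k ln 2 = {entropy_increase:.3e} J/K")
print(f"T Delta S at 300 K = {e_min:.3e} J = {1e3 * e_min / EV:.1f} meV")
print(f"1 eV is about {EV / e_min:.0f} times larger")
```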
Non-oblivious “erasure” (by decomputing known
bits) avoids the von Neumann–Landauer bound
[Figure: Before decomputing B, N possible distinct states s0…sN−1 have A = B = 0 and N possible distinct states s′0…s′N−1 have A = B = 1. Unitary (one-to-one) evolution maps them onto N states t0…tN−1 with A = 0, B = 0 and N states t′0…t′N−1 with A = 1, B = 0. The number of possible distinct states does not change.]
Increase in entropy: ∆S → 0. Energy dissipated to heat: T∆S → 0
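The difference between the two pictures can be seen by counting distinct states. A minimal Python sketch, assuming B holds a known copy of A and that "decomputing B" is done with a controlled-NOT (flip B when A is 1); the helper names are hypothetical.

```python
from itertools import product


def oblivious_erase(b: int) -> int:
    """Reset B to 0 without looking at anything else."""
    return 0


def decompute_with_cnot(a: int, b: int) -> tuple[int, int]:
    """Controlled-NOT: flip B when A is 1 (its own inverse)."""
    return a, a ^ b


b_states = (0, 1)
ab_states = list(product((0, 1), repeat=2))

# Erasure sees only B, so two distinct inputs collapse onto one output.
erase_images = {oblivious_erase(b) for b in b_states}
# cNOT sees A and B together, so every input keeps its own output.
cnot_images = {decompute_with_cnot(a, b) for a, b in ab_states}

print(len(b_states), "->", len(erase_images))   # 2 -> 1: many-to-one
print(len(ab_states), "->", len(cnot_images))   # 4 -> 4: one-to-one

# In the states that actually occur (B == A), cNOT leaves B = 0 every time.
print([decompute_with_cnot(a, a) for a in (0, 1)])  # [(0, 0), (1, 0)]
```

Both routes end with B = 0, but only the oblivious one is many-to-one on the state space it acts on.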
Reversible Computing
• A reversible digital logic operation is:
– Any operation that performs an invertible (one-to-one)
transformation of the device’s local digital state space.
• Or at least, of that subset of states that are actually used in a design.
• Landauer’s principle only limits the energy dissipation of
ordinary irreversible (many-to-one) logic operations.
– Reversible logic operations can dissipate much less energy,
• Since they can be implemented in a thermodynamically reversible way.
• In 1973, Charles Bennett (IBM Research) showed how
any desired computation can in fact be performed using
only reversible operations (with basically no bit erasure).
– This opened up the possibility of a vastly more energy-efficient
alternative paradigm for digital computation.
• After 30 years of (sporadic) research, this idea is finally
approaching the realm of practical implementability…
– Making it happen is the goal of the RevComp project at UF.
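Bennett's idea can be illustrated on a toy scale: compute a result into scratch bits, copy it out with a controlled-NOT, then run the computation backwards so the scratch bits return to 0 without ever being erased. The sketch below is a minimal Python illustration in that compute/copy/uncompute style, assuming Toffoli (controlled-controlled-NOT) gates as the reversible primitive and a hypothetical 6-bit register layout; it is not Bennett's original machine-level construction.

```python
from itertools import product

# Hypothetical register layout: x0, x1, x2 (inputs), s1, s2 (scratch), out.
X0, X1, X2, S1, S2, OUT = range(6)


def toffoli(state, c1, c2, t):
    """Flip bit t when bits c1 and c2 are both 1 (self-inverse)."""
    s = list(state)
    s[t] ^= s[c1] & s[c2]
    return tuple(s)


def cnot(state, c, t):
    """Flip bit t when bit c is 1 (self-inverse)."""
    s = list(state)
    s[t] ^= s[c]
    return tuple(s)


# Forward computation of x0 AND x1 AND x2, leaving intermediate "garbage" in s1.
COMPUTE = [(X0, X1, S1), (S1, X2, S2)]


def and3_reversibly(state):
    for gate in COMPUTE:              # 1. compute (result lands in s2)
        state = toffoli(state, *gate)
    state = cnot(state, S2, OUT)      # 2. copy the result into a fresh output bit
    for gate in reversed(COMPUTE):    # 3. uncompute: scratch bits return to 0
        state = toffoli(state, *gate)
    return state


if __name__ == "__main__":
    for x in product((0, 1), repeat=3):
        print(x, "->", and3_reversibly((*x, 0, 0, 0)))
```

Because every gate is a permutation of the 64 register states, the whole sequence is one-to-one, and the scratch bits are cleared by decomputation rather than by erasure.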
Requirements for Reversible vs. Quantum Computing
Property of Computing Mechanism | Approximate Meaning | Required for Quantum Computing? | Required for Reversible Computing?
(Treated As) Unitary | System's full invertible quantum evolution, w. all phase information, is modeled & tracked | Yes, device & system evolution must be modeled as ~unitary, within threshold | No, only reversible evolution of classical state variables must be modeled & tracked
Coherent | Pure quantum states don't decohere (for us) into statistical mixtures | Yes, must maintain full global coherence, locally within threshold | No, only maintain stability of local pointer states & transitions
Adiabatic | No heat flow in/out of computational subsystem | Yes, must be above a certain threshold | Yes, adiabaticity as high as possible
Isentropic / Thermodynamically Reversible | No new entropy generated by mechanism | Yes, must be above a certain threshold | Yes, isentropicity as high as possible
Time-Independent Hamiltonian, Self-Controlled | Closed system, evolves autonomously w/o external control | No, transitions can be externally timed & controlled | Yes, if we care about energy dissipation in the driving system
Ballistic | System evolves w. net forward momentum | No, transitions can be externally driven | Yes, if we care about performance
Some Doubts and Their Answers
Some Claims Against Reversible Computing | Eventual Resolution of Claim
• John von Neumann, 1949 – Offhandedly claims during a lecture that computing requires kT ln 2 dissipation per “elementary act of decision” (bit-operation).
– Resolution: No proof provided. Twelve years later, Rolf Landauer of IBM tries valiantly to prove it, but succeeds only for logically irreversible operations.
• Rolf Landauer, 1961 – Proposes that the logically irreversible operations which necessarily cause dissipation are unavoidable.
– Resolution: Landauer's argument for the unavoidability of logically irreversible operations was conclusively refuted by Bennett's 1973 paper.
• Bennett's 1973 construction is criticized for using too much memory.
– Resolution: Bennett devises a more space-efficient version of the algorithm in 1989.
• Bennett's models are criticized by various parties for depending on random Brownian motion, and not making steady forward progress.
– Resolution: Fredkin and Toffoli at MIT, 1980, provide a ballistic “billiard ball” model of reversible computing that makes steady progress.
• Various parties note that Fredkin's original classical-mechanical billiard-ball model is chaotically unstable.
– Resolution: Zurek, 1984, shows that quantum models can avoid the chaotic instabilities. (Though there are workable classical ways to fix the problem also.)
• Various parties propose that classical reversible logic principles won't work at the nanoscale, for unspecified or vaguely-stated reasons.
– Resolution: Drexler, 1980s, designs various mechanical nanoscale reversible logics and carefully analyzes their energy dissipation.
• Carver Mead, CalTech, 1980 – Attempts to show that the kT bound is unavoidable in electronic devices, via a collection of counter-examples.
– Resolution: No general proof provided. Later he asked Feynman about the issue; in 1985 Feynman provided a quantum-mechanical model of reversible computing.
• Various parties point out that Feynman's model only supports serial computation.
– Resolution: Margolus at MIT, 1990, demonstrates a parallel quantum model of reversible computing, but only with 1 dimension of parallelism.
• People question whether the various theoretical models can be validated with a working electronic implementation.
– Resolution: Seitz and colleagues at CalTech, 1985, demonstrate working energy recovery circuits using adiabatic switching principles.
• Seitz, 1985: Has some working circuits, but is unsure whether arbitrary logic is possible.
– Resolution: Koller & Athas, Hall, and Merkle (1992) separately devise general reversible combinational logics.
• Koller & Athas, 1992 – Conjecture that reversible sequential feedback logic is impossible.
– Resolution: Younis & Knight at MIT build reversible sequential, pipelineable circuits in 1993-94.
• Some computer architects wonder whether the constraint of reversible logic leads to unreasonable design convolutions.
– Resolution: Vieri, Frank and coworkers at MIT, 1995-99, refute these qualms by demonstrating straightforward designs for fully-reversible, scalable gate arrays, microprocessors, and instruction sets.
• Some computer science theorists suggest that the algorithmic overheads of reversible computing might outweigh its practical benefits.
– Resolution: Frank, 1997-2003, publishes a variety of rigorous theoretical analyses refuting these claims for the most general classes of applications.
• Various parties point out that high-quality power supplies for adiabatic circuits seem difficult to build electronically.
– Resolution: Frank, 2000, suggests microscale/nanoscale electromechanical resonators for high-quality energy recovery with the desired waveform shape and frequency.
• Frank, 2002: Briefly wonders whether synchronization of parallel reversible computation in 3 dimensions (not covered by Margolus) might not be possible.
– Resolution: Later that year, Frank devises a simple mechanical model showing that parallel reversible systems can indeed be synchronized locally in 3 dimensions.
Adiabatic Circuits
• Reversible logic can be implemented today using fairly
ordinary voltage-coded CMOS VLSI circuits.
– With a few changes to the logic-gate/circuit architecture.
• We avoid dissipating most of the circuit node energy
when switching, by transferring charges in a nearly
adiabatic (literally, “without flow of heat”) fashion.
– I.e., asymptotically thermodynamically reversible.
• In the limit, as various low-level technology parameters are scaled.
• There are many designs for purported “adiabatic” circuits
in the literature, but most of them contain fatal flaws and
are not truly adiabatic.
– Many past designers are unaware of (or accidentally failed to
meet) all the requirements for true thermodynamic reversibility.
Reversible and/or Adiabatic VLSI Chips
Designed @ MIT, 1996-1999
By Frank and other then-students in the MIT Reversible Computing group,
under CS/AI lab members Tom Knight and Norm Margolus.
Transition Tables
• Recall how a truth table for Boolean
logic lists all possible input
combinations on the left, and the
corresponding output(s) on the right.
– E.g., AND:
A B | Q
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
• A transition table is a similar device designed to allow us to easily distinguish
reversible operations from irreversible ones.
– We list each combination of all local bits
once in both “before” and “after” columns.
• Corresponding to just before the operation begins,
and just after it is completely finished.
– We draw an arrow from each before state
to the particular after state that it transforms to.
• Red if the transition is dissipative, green otherwise.
– Must obey the following rule: Only one of the
arrows going into any given after state may be green.
• It is convenient to order the after column so that
all the green arrows go straight horizontally.
– It may be that only a subset of the input and/or output
states arise in the context of a given circuit design.
• We may “fade away” the particular states and
transitions which never arise.
– An operation is always reversible iff there are no
red arrows in the table.
• This means the operation is one-to-one.
– An operation is reversible in context iff there are
no un-faded red arrows in the resulting table.
• I.e., the operation is 1-1 on the states that arise.
We will find these tables to be very useful.
Standard inverter (present-day “NOT gate”) operation.
Function: out := ¬in
Before (in out) → After (in out)
00 → 01 (red: old out is overwritten)
01 → 01
10 → 10
11 → 10 (red: old out is overwritten)
Usually irreversible. Only reversible in the context that its input never changes!
cNOT (controlled-NOT) “gate” (operation)
Function: D := C ⊕ D
Before (C D) → After (C D)
00 → 00
01 → 01
10 → 11
11 → 10
Always reversible.
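The transition-table test can be mechanized: an operation is reversible exactly when no two states that can arise map to the same after state. A minimal Python sketch, assuming two-bit local states written as tuples; `is_one_to_one` and the gate functions are hypothetical helpers, and cNOT is taken to flip D when C is 1.

```python
from itertools import product


def is_one_to_one(op, states):
    """True if op sends every listed state to a distinct after state."""
    states = list(states)
    return len({op(s) for s in states}) == len(states)


def inverter(state):
    """Static CMOS inverter: out := NOT in, overwriting the old out."""
    inp, _old_out = state
    return inp, 1 - inp


def cnot(state):
    """Controlled-NOT: D := C XOR D."""
    c, d = state
    return c, c ^ d


ALL_STATES = list(product((0, 1), repeat=2))   # every (first bit, second bit) pair
STEADY_STATES = [(0, 1), (1, 0)]               # inverter states if In never changes

print(is_one_to_one(inverter, ALL_STATES))     # False: irreversible
print(is_one_to_one(cnot, ALL_STATES))         # True: always reversible
print(is_one_to_one(inverter, STEADY_STATES))  # True: reversible in context
```

Restricting the state list plays the role of “fading away” the states that never arise.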
Bistable Potential-Energy Wells
A Technology-Independent Model of Digital Devices
(Landauer ’61)
• Consider any system having an (adjustable) potential
energy surface (PES) in its configuration space.
– The PES should have at least two local minima (or wells)
– Therefore the system is bistable
• It has two stable (or at least metastable) configurations
– Located at well bottoms
– One state can represent 0, the other 1.
• This picture can also be easily generalized to
larger numbers of stable states.
• Consider now the PES having
two adjustable parameters:
– (1) “Height” (energy) of the potential energy
barrier between wells, relative to well bottoms
– (2) Relative height of the left and right
states in the wells (call this “bias”)
• The two stable states form a natural bit.
[Figure: potential energy vs. generalized configuration coordinate, showing two wells labeled 0 and 1.]
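A concrete potential makes the two adjustable parameters tangible. The sketch below assumes a quartic double well, U(x) = h(x² − 1)² − (b/2)x, where h sets the barrier height and b the bias; this functional form is an illustrative choice, not something Landauer's model requires, and the helper names are hypothetical. It scans a grid for local minima to show when the system is bistable.

```python
def potential(x: float, barrier: float, bias: float) -> float:
    """Quartic double well: wells near x = -1 ("0") and x = +1 ("1").

    `barrier` is the barrier height at x = 0 when bias = 0; a positive
    `bias` tilts the surface so that the right-hand ("1") well is lower.
    """
    return barrier * (x * x - 1.0) ** 2 - 0.5 * bias * x


def local_minima(barrier: float, bias: float, lo: float = -2.0, hi: float = 2.0,
                 n: int = 4001) -> list[tuple[float, float]]:
    """Grid search for interior local minima of the potential, as (x, U) pairs."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    us = [potential(x, barrier, bias) for x in xs]
    return [(xs[i], us[i]) for i in range(1, n - 1)
            if us[i] < us[i - 1] and us[i] <= us[i + 1]]


if __name__ == "__main__":
    for h, b in [(1.0, 0.0), (1.0, 0.2), (1.0, 10.0)]:
        wells = local_minima(h, b)
        print(f"barrier={h}, bias={b}: {len(wells)} well(s) at",
              [round(x, 3) for x, _ in wells])
```

With no bias the two wells are degenerate; a small bias makes one well lower while both stay stable; a bias that is large relative to the barrier removes one well entirely, forcing the state.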
Possible Parameter Settings
• In the following slides, we will distinguish
six qualitatively different settings of the
well parameters, as shown below…
[Figure: six well shapes, arranged by barrier height (raised, lowered) and direction of bias force (left, neutral, right).]
One Mechanical Implementation
[Figure: labeled parts: box, spring, bias rod, fixed sleeve bearing, gate rod, state knob, barrier wedge; panels show barrier up / barrier down and leftward / rightward bias.]
MOSFET Implementation
• The logical state is in the location of a charge packet
(excess of electrons) on either side terminal of a FET.
– The charge packet might even consist of just a single excess
electron in a sufficiently small (nanoscale) logic node.
• The potential energy barrier is provided by the built-in
voltage across the PN junctions in the FET.
– The barrier height is lowered when the device is turned on
by adjusting the voltage on the gate electrode.
• Bias forces can be provided by (e.g.) capacitive
coupling to nearby electrodes.
[Figure: FET cross-section (n, p, n regions) with a packet of excess electrons (e e e) on one n-type terminal.]
Possible Well Transitions
• Catalog of all the possible transitions in
the bistable wells, adiabatic & not...
(Ignoring
superposition
states.)
– We can characterize a wide variety of digital
logic and memory styles in terms of how their
operation corresponds to subgraphs of this diagram.
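[Figure: transition graph between the “0” states, the “1” states and the neutral state N, arranged by barrier height and direction of bias force; annotations mark leakage (“leak”) transitions, dissipative transitions (∆E), and a k ln 2 entropy increase.]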
Logic & Memory Styles
All describable within the potential-well paradigm!
• Irreversible styles:
– Input-barrier, fixed-bias logic.
• E.g. standard static CMOS inverters & combinational gates.
– Input-bias, clocked-barrier latching.
• Standard static CMOS latches, dynamic RAM cells, etc.
• Reversible styles:
– Type 1: Input-bias, clocked-barrier latching.
– Type 2: Input-barrier, clocked-bias logic.
– Type 3: Input-barrier, clocked-bias latching logic.
• All of these are available in a very wide variety of
different physical instantiations of the bistable well.
– E.g., CMOS, superconducting, quantum-dot, Y-branch
switches, mechanical implementations, etc.
Ordinary Irreversible Logics
• Principle of operation: Lower a barrier, or not,
based on input. Series/parallel combinations of
barriers do logic. Major dissipation in at least one of
the possible transitions.
• Can amplify input signals.
[Figure: input changes, barrier lowered; output irreversibly changed to 0.]
Example: Ordinary CMOS logics
Irreversible SET/CLR operations
• Irreversible SET: Turn on a pFET connecting node B to a high
voltage source.
– SET operation (B before → after): 0 → 1, 1 → 1; ½CV² dissipated when B changes.
• Irreversible CLR: Turn on an nFET connecting node B to a
low voltage source.
– CLR operation (B before → after): 0 → 0, 1 → 0; ½CV² dissipated when B changes.
(Voltage color scheme: Low / High)
Conventional Logic is Irreversible
Even a simple NOT gate, as it’s traditionally implemented!
• Here’s what all of today’s logic gates (including NOT)
do continually, i.e., every time their input changes:
– They overwrite the previous output with a function of their input.
– They perform a many-to-one transformation of the local digital state!
– ⇒ They are required to dissipate ≳kT on average, by Landauer's principle.
– They incur ½CV² energy dissipation when the output changes.
Example: Static CMOS Inverter (in → out)
Inverter transition table:
Just before transition (in out) → After transition (in out)
0 0 → 0 1
0 1 → 0 1
1 0 → 1 0
1 1 → 1 0
Example: Standard CMOS Inverter
[Figure: with In = 0, the pFET to Power (Vdd) is on and the nFET to Ground (0V) is off, so Out = 1. When the input goes high, the barrier between Out and Ground is lowered and the charge “falls” to a lower energy level, giving Out = 0. When the input goes low, the barrier between Vdd and Out is lowered and charge from Vdd falls in. Each panel is paired with a simplified picture of the PES (Vdd, Out, GND). Voltage color scheme: Low / High.]
Spacetime Logic Network Diagrams
• In this general class of diagrams (popular in reversible & quantum logic),
– Time is plotted in one direction, often left→right,
– Horizontal lines denote locations (nodes, bits of state).
– Operations (potential change events) are denoted by icons on
and/or connections between bit-lines.
• Please keep in mind: These diagrams do not directly depict the spatial
structure of how a physical circuit is wired!
– E.g., a long horizontal line denotes the evolution of a localized node in a physical
circuit over a long period of time, not a long, spatially extended wire.
– A vertical connection between lines or an icon on a line (often called a “gate”)
denotes a momentary interaction event, not a perpetual physical link, or a
physical object.
[Figure: spacetime diagram, Location vs. Time. An arrow on line I denotes that some external event causes the value of node I to change at this time. The change in I is propagated so as to cause node O to change a moment later; an icon on line O denotes that O potentially changes (whether spontaneously or under external control) at this time.]
Inverter action in spacetime diagram
• Note: This notation makes it explicit that an ordinary
inverter’s real semantics is that it should carry out a
logically irreversible transformation of its output node.
[Figure: Location vs. Time. Some outside influence causes In to possibly change; the (standard) inverter icon denotes that In’s value gets copied (with gain & delay) & inverted to produce the new Out, and the “×” icon denotes that the old value of Out gets obliviously overwritten.]
Ordinary Irreversible Memory
• (1) Lower a barrier, obliviously erasing stored
information. (2) Apply an input bias. (3) Raise
the barrier to latch the new information
into place. (4) Remove input bias.
– (1) and (2) can also be in the opposite order.
– Examples: ordinary DRAM cell, rod logic register
[Figure: paths through the well settings for input “0” and input “1”: (1) barrier lowered to N, (2) input bias applied, (3) barrier up, (4) input retracted. Annotation at the erasure step: “Dissipation here can be made as low as kT ln 2.”]
Example: NMOS latch / DRAM cell
• Sequence corresponds exactly to general
picture illustrated on previous slide.
[Figure: input node I connected to memory node M through a pass transistor, in four steps: (1) Oblivious erasure (transistor on); (2) Apply input bias; (3) Raise barrier (transistor off); (4) Remove input bias (& back to start). Steps (1) and (2) could also be done in the other order. Voltage color scheme: Low / Medium / High.]
Irreversible latch in spacetime diagram
• Again, this notation makes it clear that irreversible
behavior is occurring.
[Figure: Location vs. Time. Outside influence causes I to possibly change. The “×” & arrow denote that the old value of M gets obliviously erased or overwritten by I when the barrier is lowered; a later arrow denotes that I gets reflected (without gain) in location M with a small delay. The barrier is raised shortly afterwards (end of shaded area), after which I may change again without necessarily affecting the value of M.]
Conventional vs. Adiabatic Charging
For charging a capacitive load C through a voltage swing V
• Conventional charging:
– Constant voltage source V, through resistance R, into C (final charge Q = CV)
– Energy dissipated: $E_{diss} = \tfrac{1}{2}CV^2$
• Ideal adiabatic charging:
– Constant current source I, through resistance R, into C (final charge Q = CV after time t)
– Energy dissipated: $E_{diss} = I^2 R t = \left(\frac{Q}{t}\right)^2 R t = \frac{RC}{t}\,CV^2$
Note: Adiabatic beats conventional by advantage factor $A = t/2RC$.
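To make the advantage factor concrete, here is a minimal Python sketch comparing the two charging schemes. The component values (C = 10 fF, V = 1 V, R = 1 kΩ, t = 1 ns) are hypothetical, chosen only for illustration.

```python
def conventional_dissipation(c: float, v: float) -> float:
    """Charging C to V from a constant-voltage source: E = 1/2 C V^2 (independent of R)."""
    return 0.5 * c * v ** 2


def adiabatic_dissipation(c: float, v: float, r: float, t: float) -> float:
    """Ideal constant-current charging over time t: E = I^2 R t = (RC/t) C V^2."""
    current = c * v / t  # I = Q / t with Q = C V
    return current ** 2 * r * t


def advantage_factor(r: float, c: float, t: float) -> float:
    """A = t / (2 R C): how much less energy the adiabatic scheme dissipates."""
    return t / (2 * r * c)


if __name__ == "__main__":
    C, V, R, T = 10e-15, 1.0, 1e3, 1e-9  # hypothetical example values
    e_conv = conventional_dissipation(C, V)
    e_adia = adiabatic_dissipation(C, V, R, T)
    print(f"conventional: {e_conv:.2e} J, adiabatic: {e_adia:.2e} J")
    print(f"ratio {e_conv / e_adia:.1f}, A = t/2RC = {advantage_factor(R, C, T):.1f}")
```

Here RC = 10 ps, so stretching the transition to 1 ns buys a factor of 50; doubling t doubles the advantage.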
Adiabatic Switching with MOSFETs
• Use a voltage ramp to approximate
an ideal current source.
• Switch conditionally, if MOSFET gate voltage
$V_g > V + V_T$ during ramp.
• Can discharge the load later using a similar ramp.
– Either through the same path, or a different path.
[Figure: ramp source V (rising over time t) charging load C to Q = CV through a MOSFET of on-resistance ~R with gate voltage V_g.]
$t \gg RC \Rightarrow E_{diss} \approx \frac{RC}{t}\,CV^2$
$t \ll RC \Rightarrow E_{diss} \approx \tfrac{1}{2}CV^2$
Exact formula, given speed fraction $s := RC/t$:
$E_{diss} = s\left[1 + s\left(e^{-1/s} - 1\right)\right]CV^2$
Athas ’96, Tzartzanis ’98
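The exact expression, E_diss = s[1 + s(e^(−1/s) − 1)]·CV², interpolates between the slow and fast limits. A minimal Python sketch that evaluates E_diss/CV² as a function of the speed fraction s = RC/t, assuming the linear-ramp drive described on this slide; the function name is hypothetical.

```python
import math


def ramp_dissipation_fraction(s: float) -> float:
    """E_diss / (C V^2) for a linear ramp of duration t into a series RC, with s = RC/t."""
    if s <= 0:
        raise ValueError("speed fraction s = RC/t must be positive")
    return s * (1.0 + s * (math.exp(-1.0 / s) - 1.0))


if __name__ == "__main__":
    for s in (0.001, 0.01, 0.1, 1.0, 10.0, 100.0):
        frac = ramp_dissipation_fraction(s)
        print(f"s = {s:g}: E_diss/CV^2 = {frac:.4g}")
```

For small s the result tracks s itself (the adiabatic regime); for large s it saturates at ½, the conventional-charging value.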
Requirements for True Adiabatic Logic
in Voltage-coded, FET-based circuits
• Avoid passing current through diodes.
– Crossing the “diode drop” leads to irreducible dissipation.
• Follow a “dry switching” discipline (in the relay lingo):
– Never turn on a transistor when VDS ≠ 0.
– Never turn off a transistor when IDS ≠ 0.
• Together these rules imply (important, but often neglected!):
– The logic design must be logically reversible
• There is no way to erase information under these rules!
– Transitions must be driven by a quasi-trapezoidal waveform
• It must be generated resonantly, with high Q
• Of course, leakage power must also be kept manageable.
– Because of this, the optimal design point will not necessarily
use the smallest devices that can ever be manufactured!
• Since the smallest devices may have insoluble problems with leakage.
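The two dry-switching rules are easy to audit mechanically on a simulated switching trace. The sketch below assumes a hypothetical event record (device name, on/off action, and V_DS and I_DS at the switching instant) and hypothetical tolerances; it is a bookkeeping aid under those assumptions, not a circuit simulator.

```python
from dataclasses import dataclass


@dataclass
class SwitchEvent:
    device: str    # transistor name (hypothetical)
    action: str    # "on" or "off"
    v_ds: float    # drain-source voltage at the switching instant, V
    i_ds: float    # drain-source current at the switching instant, A


def dry_switching_violations(events, v_tol=1e-3, i_tol=1e-9):
    """Return a description of every event that breaks a dry-switching rule."""
    problems = []
    for e in events:
        if e.action == "on" and abs(e.v_ds) > v_tol:
            problems.append(f"{e.device}: turned on with V_DS = {e.v_ds} V")
        elif e.action == "off" and abs(e.i_ds) > i_tol:
            problems.append(f"{e.device}: turned off with I_DS = {e.i_ds} A")
    return problems


if __name__ == "__main__":
    trace = [
        SwitchEvent("n1", "on", 0.0, 0.0),     # dry: no voltage across it yet
        SwitchEvent("n1", "off", 0.0, 0.0),    # dry: ramp finished, no current
        SwitchEvent("p2", "on", 1.0, 0.0),     # wet: conventional SET-style turn-on
        SwitchEvent("n3", "off", 0.0, 2e-6),   # wet: interrupts a flowing current
    ]
    for line in dry_switching_violations(trace):
        print(line)
```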
Erasing Digital Entropy
• Note that if the information in a bit-system is already entropy,
– Then erasing it just moves this entropy to the surroundings.
– This can be done with a thermodynamically reversible process, and does not
necessarily increase total entropy!
• However, if/when we take a bit that is known, and irrevocably
commit ourselves to thereafter treating it as if it were unknown,
– that is the true irreversible step,
– and that is when the entropy is
effectively generated!!
[Figure: well states 0, 1, an unknown bit (“0 ?1”) and N. Annotations: “This state contains 1 bit of decomputable information, in a stable, ‘digital’ form”; “This state contains 1 bit of physical entropy, but in a stable, ‘digital’ form”; “Note: This transformation is reversible!!”; “In these 3 states, there is no entropy in the digital state; it has all been pushed out into the environment.”]
Reversible Set (rSET) & Clear (rCLR)
• rSET operation semantics: Given assurance that a bit is initially 0,
unconditionally change it to 1.
– To implement: Traverse the adiabat (reversible trajectory) shown below.
• Reverse this path to perform rCLR.
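[Figure: the rSET adiabat, steps (1)–(6), from the “0” state through the neutral state N to the “1” state, in the plane of barrier height vs. direction of bias force; work is taken out (“Get work out”) on one part of the path and put back in (“Put work back in”) on another.]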
Taking rSET & rCLR out of context
• What happens if we attempt to perform rSET on a bit that is already a 1?
– It still ends up with the right value (1), but…
– Irreversible dissipation occurs in step 2 (when barrier is lowered), as shown below.
• Similarly if we try to rCLR a 0.
[Figure: the same path, steps (1)–(6), applied to a bit that is already 1: raising the 1 takes work, which is then dissipated as heat when the barrier is lowered.]
rSET/rCLR transition tables
• Note that these tables are not reversible according to the strict
traditional definition…
– Since they don’t represent a 1-1 transformation of all possible input
states.
• However, if we restrict our use of these operations so as to
always avoid the input states that actually result in dissipation,
– Then, we obtain a 1-1 transformation of the subset of the input states that
are actually used,
– And that is the correct statement of the true logical requirement for
avoiding Landauer’s principle!
rSET (before → after): 0 → 1; 1 → 1 (dissipative, avoided in context)
rCLR (before → after): 1 → 0; 0 → 0 (dissipative, avoided in context)
Type 1: Input-Bias Clocked-Barrier
Reversible Latching (& Logic)
• Cycle of operation:
– (1) Data input applies bias
• Add forces to do majority logic
– (2) Clock signal raises barrier
• Can amplify/restore input signal in the barrier-raising step.
– (3) Data input bias removed
– (4) Can reset latch reversibly given copy of contents.
• Examples: Adiabatic QCA, SCRL latch, Rod logic latch, PQ logic, Buckled logic, Helical logic
[Figure: the cycle (1)–(4) through the well settings for stored values 0 and 1, via the neutral state N.]
Type 1 Example: Adiabatic
NMOS latch / DRAM cell
• Same as irrev. latch, just skip the erasure step!
[Figure: input node I connected to memory node M through a pass transistor: (1) Apply input bias (transistor on); (2) Raise barrier (transistor off); (3) Remove input bias. Reverse the steps to reversibly unlatch M. Voltage color scheme: Low / Medium / High.]
• Can similarly use a CMOS transmission gate (nFET/pFET pair) to latch a full-swing signal if necessary.
A Simple Reversible CMOS Latch
• Uses a single standard CMOS transmission gate (T-gate).
• Sequence of operation:
(0) input level initially tied to latch ‘contents’ (output);
(1) input changes gradually ⇒ output follows closely;
(2) latch closes, charge is stored dynamically (node floats);
(3) afterwards, the input signal can be removed.
[Figure: “Reversible latch”: a T-gate, controlled by P, between in and out, shown at stages (0)–(3).]
Transition tables (in out):
– Before input: 0 0
– Input arrived: 0 0, 1 1
– Input removed: 0 0, 0 1
• Later, we can reversibly “unlatch” the data with
an exactly time-reversed sequence of steps.
Reversible latch in spacetime diagram
[Figure: Location vs. Time. Outside influence causes I to possibly change; an arrow to a dotted line denotes that the change to I is reversibly carried through (without gain) to location M at this time (energy transferred into I is also fanned out to M). Dotted lines denote that these nodes contain no information at these times (they are in a predetermined state). The barrier is lowered some time beforehand (start of shaded area) and raised some time afterwards (end of shaded area); I may then be restored to neutral without necessarily affecting the value of M.]
Unlatching sequence: [Figure: I and M vs. Time.]
Note this operation is reversible only if I and M match up exactly when they are first connected together!
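The energy cost of connecting I and M when they do not match can be estimated with the standard charge-sharing result: joining two capacitors C₁ and C₂ at voltages V₁ and V₂ through any resistive switch dissipates ½·(C₁C₂/(C₁ + C₂))·(V₁ − V₂)², independent of the resistance. A minimal Python sketch; the 1 fF node capacitances are hypothetical.

```python
def charge_sharing_loss(c1: float, v1: float, c2: float, v2: float) -> float:
    """Energy dissipated when capacitors at v1 and v2 are shorted together."""
    c_series = c1 * c2 / (c1 + c2)
    return 0.5 * c_series * (v1 - v2) ** 2


def stored_energy(c: float, v: float) -> float:
    """Electrostatic energy 1/2 C V^2 held on a capacitor."""
    return 0.5 * c * v ** 2


if __name__ == "__main__":
    c_i = c_m = 1e-15  # hypothetical node capacitances for I and M
    for v_i, v_m in [(1.0, 1.0), (1.0, 0.0)]:
        loss = charge_sharing_loss(c_i, v_i, c_m, v_m)
        # Cross-check by energy bookkeeping: charge is conserved when the nodes join.
        v_f = (c_i * v_i + c_m * v_m) / (c_i + c_m)
        before = stored_energy(c_i, v_i) + stored_energy(c_m, v_m)
        after = stored_energy(c_i + c_m, v_f)
        print(f"V_I = {v_i}, V_M = {v_m}: loss = {loss:.3e} J "
              f"(bookkeeping {before - after:.3e} J)")
```

Matched nodes lose nothing; a full-swing mismatch between equal capacitances throws away ¼CV², no matter how slowly the switch is closed.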
Simplified Version of Diagram
• Suppose the signal on the input node I was produced
as a temporary copy of some origin node O.
– We will see how to implement this reversibly later.
• Then for simplicity of our diagrams, we may wish to
omit explicit representation of the intermediate node I.
– However, we must keep in mind that there is then a small
additional space usage not explicitly shown in the diagram.
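[Figure: origin node O copied reversibly (“Reversible copy”) via I into M; the simplified version omits I and shows only O and M, vs. Time.]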
Type 2: Input-Barrier, Clocked-Bias
Reversible Retractile Logic
• Cycle of operation:
– (1) Inputs raise or lower barriers
• Do logic w. series/parallel barriers
– (2) Clock applies bias force, which changes state, or not
• Barrier signal is amplified!
– Gain, restoring logic, fan-out.
• Must reset output prior to changing input.
• Combinational logic only!
• Examples: Hall’s logic, SCRL gates, Rod logic interlocks
[Figure: (1) input sets barrier height; (2) clocked bias force applied; well states 0, 1 and N.]
Type 2 example: Adiabatic CMOS
“buffer” (really, a cSET/cCLR gate)
• Controlled-SET / controlled-CLEAR.
• Structure: Essentially just a pair of CMOS transmission gates
– 2 transistors each, an nFET and a pFET in parallel
• Using dual-rail signaling, we can reversibly set or clear a bit on an
unoccupied logic node (pair of voltage nodes), conditionally on an input
node.
– Amplifies input signal.
– Fully restores logic levels.
[Figure: dual-rail T-gates in which InN and InP control the connection between DriveN and OutN (and similarly for OutP); panels show which transistors are on or off as the drive signal changes. Voltage color scheme: Low / High.]
Spacetime diagram for buffer
• Subscript NP notation denotes shorthand for dual-rail NP pair of wires.
– Still denotes a single logical bit.
• Diagram emphasizes that the buffer copies InNP’s value to a new location.
– The value simultaneously remains available in the old location.
• Dotted horizontal line shows that OutNP is empty prior to the operation.
– The absence of “×” icon shows that the operation is reversible.
• Buffer icon indicates that the input signal is being amplified and restored.
– Note that the input comes from InNP, not from previous value of OutNP.
• Downward wedges remind us the output remains dependent on the input.
– Input can’t be changed without (possibly) irreversibly destroying output.
• Fortunately, the buffer’s entire operation sequence is reversible!
– So, sometime later on, we can unbuffer the output,
• and then we are free to change the input.
[Figure: InNP and OutNP vs. Time, for the buffering operation and the later unbuffering, after which OutNP is restored to null and the input value can be changed afterwards.]
Reversible Buffered Latch
• Uses two dual-rail T-gates.
• Combines a buffer and latch.
– Reversibly copies InNP to MemNP when operated.
• T-gate icon: This is our icon for a CMOS transmission gate (T-gate). It says that nodes A and B are connected whenever the control signal CNP has logic value 1.
[Figure: physical structure built from InNP, IntNP, DriveNP, LatchNP and MemNP, with the spacetime diagram for the operation sequence.]
Implements “reversible copy”: InNP → MemNP
Transition Table for cSET
• It is not always reversible,
– Not a one-to-one transformation of all possible local states,
• But, it is reversible in context
– I.e., in the context that input state 1,1 is avoided.

Before cSET             After cSET
Source   Destination    Source   Destination
0        0              0        0
0        1              0        1
1        0              1        1
1        1              1        0
Type 2 example: SCRL inverter (w/o latch)
• Same structure as static CMOS inverter, but used reversibly.
• Produces a fully-restored, amplified output signal.
• Inverters can be cascaded, but need latches to get feedback.
[Figure: SCRL inverter operation sequence, with drive rails driveH and driveL, transistors switching on and off, and the In and Out voltages. Voltage color scheme: Low / Medium / High.]
SCRL Inverter Transition Table

Before SCRL-Inv    After SCRL-Inv
In    Out          In    Out
0     0
0     ½            0     1
0     1
½     0
½     ½
½     1
1     0
1     ½            1     0
1     1

• Reversible in context, if the input is valid and the output is ½ just before the drivers do their thing.
• No point in even listing the table entries that don’t occur; we can summarize the operation below.

Before SCRL-Inv    After SCRL-Inv
In    Out          In    Out
0     ½            0     1
1     ½            1     0
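“Reversible in context” can be checked mechanically: a transition is reversible on a set of allowed states if no two of them map to the same after-state, in which case the inverse map exists and describes the decomputation. Below is a minimal Python sketch of that check applied to the summary table, assuming `Fraction(1, 2)` models the ½ (mid-level) output; the names `SCRL_INV`, `is_reversible_in_context`, and `inverse` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)   # the mid-level ("½") voltage on Out

# Reduced SCRL inverter table: (In, Out) before -> (In, Out) after.
# Only contexts that actually occur are listed: valid input, Out parked at 1/2.
SCRL_INV = {
    (0, HALF): (0, 1),
    (1, HALF): (1, 0),
}


def is_reversible_in_context(table):
    """True if no two allowed 'before' states map to the same 'after' state."""
    afters = list(table.values())
    return len(afters) == len(set(afters))


def inverse(table):
    """Build the reverse transition (decomputing Out back to 1/2)."""
    if not is_reversible_in_context(table):
        raise ValueError("transition is not one-to-one on the listed states")
    return {after: before for before, after in table.items()}


if __name__ == "__main__":
    undo = inverse(SCRL_INV)
    for before, after in SCRL_INV.items():
        assert undo[after] == before
    print(is_reversible_in_context(SCRL_INV))  # True
```

Adding a hypothetical entry whose after-state collides with an existing one would make `inverse` raise, which is exactly the situation the “in context” restriction rules out.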
Spacetime Diagram for SCRL Inverter
• Note that the notation shows that Out is being
computed from In on a separate wire.
– In is explicitly not being inverted “in place.”
• Wedge symbols show ongoing dependence.
– Of course, we can always undo the op later.
[Figure: spacetime diagram, with Out computed from In on a separate wire.]
Example: Adiabatic NMOS OR gate
[Figure: operation sequence of an adiabatic NMOS OR gate with inputs A and B, a Drive signal, and output Out; Out = A ∨ B.]
• Reverse sequence decomputes Out.
• Can’t change A, B freely until then.
• With NMOS, Out is weak (orange).
• Can use an SCRL inverter to restore the signal levels.
– If appropriately biased…
• Or, just use CMOS transmission gates instead (8T OR).
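A toy trace of the compute/decompute sequence may help. This is a minimal Python sketch under a loudly idealized assumption: A and B gate two parallel switches between Drive and Out, so Out simply follows Drive when A ∨ B = 1. The function name `adiabatic_or` and the step count are hypothetical, and the weak NMOS output level is ignored.

```python
from typing import List


def adiabatic_or(a: int, b: int, steps: int = 4) -> List[float]:
    """Trace of Out while Drive ramps 0 -> 1 (compute) and back 1 -> 0 (decompute).

    Idealized model, an assumption rather than a circuit simulation: A and B
    gate two parallel switches between Drive and Out, so Out follows Drive when
    either input is 1 and stays at 0 otherwise. The NMOS threshold drop that
    makes Out weak is ignored. A and B are held fixed for the whole trace.
    """
    ramp_up = [k / steps for k in range(steps + 1)]
    drive = ramp_up + ramp_up[-2::-1]          # 0 ... 1 ... 0
    conducting = bool(a or b)
    return [d if conducting else 0.0 for d in drive]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            trace = adiabatic_or(a, b)
            print(a, b, "peak Out =", max(trace), "final Out =", trace[-1])
```

Out returns to 0 at the end of every trace, which is why A and B become free to change only after the reverse sequence completes.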
Type 3: Input-Barrier, Clocked-Bias Latching Logic
● Cycle of operation:
1. Input conditionally lowers barrier
• Do logic with series/parallel barriers
2. Clock applies bias force; conditional bit flip
3. Input removed, raising the barrier & locking in the state-change
4. Clock bias can retract
[Figure: states 0, N, and 1 at steps (1)–(4) of the cycle.]
Examples: Mike’s 4-cycle 2-level adiabatic CMOS logic (2LAL)
2LAL: 2-level Adiabatic Logic
A pipelined fully-adiabatic logic invented at UF (Spring 2000), implementable using ordinary CMOS transistors.
• Use simplified T-gate symbol (control rails TN, TP).
• Basic buffer element:
– cross-coupled T-gates:
• need 8 transistors to buffer 1 dual-rail signal
• Only 4 timing signals φ0–φ3 are needed. Only 4 ticks per cycle:
– φi rises during ticks t ≡ i (mod 4)
– φi falls during ticks t ≡ i+2 (mod 4)
(implicit dual-rail encoding everywhere)
[Figures: simplified T-gate symbol; basic buffer element (in → out); animation of timing signals φ0–φ3 over tick # 0, 1, 2, 3, …]
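The phase rule can be tabulated directly. Below is a minimal Python sketch assuming each rise or fall occupies one whole tick (so φi is high during tick i+1 and low during tick i+3, mod 4); `phase_activity` and `schedule` are hypothetical helper names.

```python
from typing import List

ACTIVITIES = ("rise", "high", "fall", "low")


def phase_activity(i: int, t: int) -> str:
    """What timing signal phi_i does during tick t (4 ticks per cycle).

    Assumes each transition occupies one whole tick, so phi_i rises during
    ticks t = i (mod 4), stays high for one tick, falls during ticks
    t = i + 2 (mod 4), and stays low for one tick.
    """
    return ACTIVITIES[(t - i) % 4]


def schedule(n_ticks: int) -> List[List[str]]:
    """Rows are ticks; columns are phi_0 .. phi_3."""
    return [[phase_activity(i, t) for i in range(4)] for t in range(n_ticks)]


if __name__ == "__main__":
    for t, row in enumerate(schedule(4)):
        print(f"tick {t}: " + "  ".join(f"phi{i}={a}" for i, a in enumerate(row)))
```

In this model every tick has exactly one phase rising and exactly one phase falling.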
2LAL Cycle of Operation
[Animation: 2LAL buffer operation over Tick #0 through Tick #3 for in = 0, showing rails in0, in1, out0, and out1; result out = 0.]
A Schematic Notation for 2LAL
[Figure, panels (a)–(h): shorthand symbols for 2LAL, including T-gates with PN/PP rail pairs, a clocked buffer driven by φt mod 4, AND (AB) and OR (A+B) gates, an inverter (~A), a multi-stage delay from in0 to out5, and example signal values (A, ~A, B, and AB each shown at 0 and 1).]
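2LAL Shift Register Structure
• 1-tick delay per logic stage:
[Animation: stages clocked by φ1, φ2, φ3, φ0 in turn; input applied at tick 0 (in@0), output available at tick 4 (out@4).]
• Logic pulse timing and signal propagation:
[Animation: inN and inP pulses over ticks 0, 1, 2, 3, …]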
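The 1-tick-per-stage timing can be checked with a timing-only model. This Python sketch is an assumption-laden illustration, not a simulation of the 2LAL circuit: stage k is taken to be clocked by φ(k mod 4), buffering its predecessor when that phase rises and returning to null when it falls, and the input pulse is held for ticks 0 and 1. `simulate` is a hypothetical helper.

```python
from typing import List, Optional

Node = Optional[int]  # None = null (no pulse); otherwise the logic value


def simulate(in_value: int, n_stages: int, n_ticks: int) -> List[List[Node]]:
    """Tick-by-tick snapshots of nodes 0..n_stages (node 0 is the input).

    Timing-only model (an assumption, not a circuit simulation):
    - the input node holds its value during ticks 0 and 1, then returns to null;
    - stage k is clocked by phi_(k mod 4): it copies node k-1 during ticks
      t = k (mod 4) and returns its own output to null during ticks t = k + 2 (mod 4).
    """
    nodes: List[Node] = [None] * (n_stages + 1)
    snapshots = []
    for t in range(n_ticks):
        new = nodes[:]
        new[0] = in_value if t in (0, 1) else None
        for k in range(1, n_stages + 1):
            offset = (t - k) % 4
            if offset == 0:        # phi_(k mod 4) rises: buffer the previous node
                new[k] = nodes[k - 1]
            elif offset == 2:      # phi_(k mod 4) falls: output restored to null
                new[k] = None
        nodes = new
        snapshots.append(nodes)
    return snapshots


if __name__ == "__main__":
    for t, snap in enumerate(simulate(1, 4, 7)):
        print(t, snap)
```

With four stages, the pulse first appears at the last stage's output on tick 4, matching in@0 → out@4.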
More Complex Logic Functions
• Non-inverting multi-input Boolean functions:
[Figure: AND gate with inputs A0 and B0 producing (AB)1 (plus delayed A); OR gate with inputs A0 and B0 producing (A∨B)1.]
• One way to do inverting functions in pipelined logic is to use a quad-rail logic encoding:
– To invert, just swap the rails!
• Zero-transistor “inverters.”
[Figure: quad-rail encodings of A=0 and A=1 on rails AN and AP.]
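Rail swapping can be illustrated with a hypothetical encoding. In this minimal Python sketch, each quad-rail bit is modeled (our assumption) as two dual-rail (N, P) pairs, one active when A = 1 and one active when A = 0; inversion just relabels the pairs, so it costs no transistors. `QuadRail`, `encode`, `decode`, and `invert` are illustrative names only.

```python
from typing import NamedTuple, Tuple

Pair = Tuple[int, int]          # (N rail, P rail) of one dual-rail wire pair
ACTIVE: Pair = (1, 0)           # N high, P low: the T-gates it drives conduct
INACTIVE: Pair = (0, 1)         # N low, P high: the T-gates it drives are off


class QuadRail(NamedTuple):
    """Hypothetical quad-rail bit: one dual-rail pair for A, one for ~A."""
    a: Pair
    not_a: Pair


def encode(bit: int) -> QuadRail:
    return QuadRail(ACTIVE, INACTIVE) if bit else QuadRail(INACTIVE, ACTIVE)


def decode(q: QuadRail) -> int:
    if q == (ACTIVE, INACTIVE):
        return 1
    if q == (INACTIVE, ACTIVE):
        return 0
    raise ValueError(f"invalid quad-rail state: {q}")


def invert(q: QuadRail) -> QuadRail:
    """Zero-transistor inverter: relabel the pairs, i.e. swap the rails."""
    return QuadRail(q.not_a, q.a)


if __name__ == "__main__":
    for bit in (0, 1):
        print(bit, "->", decode(invert(encode(bit))))
```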
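Minimum Losses w. Leakage
$E_{\text{tot}} = E_{\text{adia}} + E_{\text{leak}}$, where $E_{\text{adia}} = c_E / t_r$ and $E_{\text{leak}} = P_{\text{leak}} \cdot t_r$
$t_{\text{opt}} = \sqrt{c_E / P_{\text{leak}}} = \sqrt{c_S / S_{\text{leak}}}$
$E_{\text{tot}} \ge 2\sqrt{P_{\text{leak}}\, c_E} = 2T\sqrt{S_{\text{leak}}\, c_S}$, with equality at $t_r = t_{\text{opt}}$ (here $c_S = c_E/T$ and $S_{\text{leak}} = P_{\text{leak}}/T$ are the corresponding entropy coefficients)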
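The optimum follows from setting $dE_{\text{tot}}/dt_r = -c_E/t_r^2 + P_{\text{leak}} = 0$. The Python sketch below evaluates the closed form and checks it against a brute-force scan over $t_r$; the numeric values of $c_E$ and $P_{\text{leak}}$ are hypothetical, chosen only for illustration, and the helper names are ours.

```python
import math
from typing import Tuple


def total_energy(t_r: float, c_e: float, p_leak: float) -> float:
    """E_tot = E_adia + E_leak = c_E / t_r + P_leak * t_r (joules)."""
    return c_e / t_r + p_leak * t_r


def optimum(c_e: float, p_leak: float) -> Tuple[float, float]:
    """Closed form: t_opt = sqrt(c_E / P_leak), E_min = 2 * sqrt(P_leak * c_E)."""
    return math.sqrt(c_e / p_leak), 2.0 * math.sqrt(p_leak * c_e)


if __name__ == "__main__":
    c_e = 1e-27     # hypothetical energy coefficient, J*s
    p_leak = 1e-9   # hypothetical leakage power, W
    t_opt, e_min = optimum(c_e, p_leak)
    # Brute-force check on a log-spaced grid spanning 100x either side of t_opt.
    grid = [t_opt * 10 ** (k / 100) for k in range(-200, 201)]
    e_scan = min(total_energy(t, c_e, p_leak) for t in grid)
    print(f"t_opt = {t_opt:.3e} s, E_min = {e_min:.3e} J, scan min = {e_scan:.3e} J")
```

Going faster than $t_{\text{opt}}$ raises the adiabatic term, going slower raises the leakage term; the minimum sits where the two are equal.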
UF CONFIDENTIAL – PATENT PENDING
MEMS Resonator Concept
A potential approach for efficiently
driving adiabatic logic transitions
The Power Supply Problem
• In adiabatics, the factor of reduction in energy
dissipated per switching event is limited to (at most)
the Q factor of the clock/power supply.
$Q_{\text{overall}} = \left(Q_{\text{logic}}^{-1} + Q_{\text{supply}}^{-1}\right)^{-1}$
• Electronic resonator designs typically have low Q
factors, due to considerations such as:
– Energy overhead of switching a clamping power MOSFET
to limit the voltage swing of a sinusoidal LC oscillator.
– Low coil count, substrate coupling in integrated inductors.
– Unfavorable scaling of inductor Q with frequency.
• Our proposed solution:
– Use electromechanical resonators instead!
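Because each loss mechanism dissipates a fraction of the stored energy proportional to 1/Q per cycle, the 1/Q terms add, and the worse of the two Q factors dominates. A minimal Python sketch of the formula above follows; the function name and the example Q values are hypothetical.

```python
def q_overall(q_logic: float, q_supply: float) -> float:
    """Overall Q when the logic and the supply each dissipate ~1/Q of the energy per cycle."""
    return 1.0 / (1.0 / q_logic + 1.0 / q_supply)


if __name__ == "__main__":
    # Hypothetical values: near-ideal logic paired with a mediocre resonator.
    print(round(q_overall(1e6, 100.0), 2))  # 99.99
```

Even with logic Q of a million, a supply Q of 100 caps the overall Q just below 100, which is the motivation for high-Q resonators.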
MEMS (& NEMS) Resonators
• State of the art of technology demonstrated in lab:
– Frequencies up to the 100s of MHz, even GHz
– Q’s >10,000 in vacuum, several thousand even in air!
• An important emerging technology being explored for use in RF filters, etc., in communications SoCs, e.g. for cellphones.
[Figure: U. Mich. poly resonator, f = 156 MHz, Q = 9,400; 34 µm scale.]
Original Concept
• Imagine a set of charged plates whose horizontal position
oscillates between two sets of interdigitated fixed plates.
– Structure forms a variable capacitor and voltage divider with the load.
• Capacitance changes substantially only when crossing the border.
– Produces nearly flat-topped (quasi-trapezoidal) output waveforms.
– The two output signals have opposite phases (2 of the 4 φ’s in 2LAL)
[Figure: moving plate position x(t) and output voltages V1(t) and V2(t) driving logic loads #1 and #2, each modeled as RL and CL.]
MEMS Resonant Power Supply for
Ultra-Low-Power Adiabatic Circuits
A.k.a. The “AdiaMEMS” Project
• Part of CISE’s Reversible & Quantum Computing group
– Collab. with Huikai Xie (MEMS, ECE dept.)
• Goal: Demonstrate orders-of-magnitude improvement in power-performance efficiency of digital CMOS circuits.
– Based on reversible logic in adiabatic circuits powered by high-quality custom microelectromechanical resonators.
• Funding: $40K seed grant from SRC’s Cross-Disciplinary Semiconductor Research (CSR) Program
MEMS Designer: Maojiao He
VLSI designer: Krishna Natarajan
Key Characteristics of Resonator
• Goal: Produce a near-ideal trapezoidal output voltage
waveform resonantly, with high Q.
• To be optimized with logic: Resonant frequency f.
• Key resonator figures of merit:
– Effective quality factor: Qeff = Etrans/Ediss.
– Area efficiency: EA = Etrans/A.
• Key resonator figures of demerit:
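– Maximum relative transition slope: $s_{\max} = \dfrac{(dC/dt)_{\max}}{\Delta C_{\max} / \Delta t_{\text{trans}}}$
– Fractional capacitance variation: $v_C = \Delta C_{\text{var}} / \Delta C_{\max}$
[Figure: capacitance waveform annotated with $(dC/dt)_{\max}$, $\Delta t_{\text{trans}}$, $\Delta C_{\text{var}}$, and $\Delta C_{\max}$.]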
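As a sanity check on the definition of $s_{\max}$: an ideal linear ramp gives $s_{\max} = 1$, while a sinusoidal transition gives $\pi/2 \approx 1.57$. The Python sketch below estimates $s_{\max}$ from sampled $C(t)$ with forward differences, assuming the samples cover exactly one transition; `max_relative_slope` is a hypothetical helper, and the waveforms are synthetic rather than measured.

```python
import math
from typing import Sequence


def max_relative_slope(ts: Sequence[float], cs: Sequence[float]) -> float:
    """s_max = (dC/dt)_max / (dC_max / dt_trans) for samples spanning one transition.

    dC/dt is estimated with forward differences; dC_max is the full swing of
    the samples and dt_trans is the time from the first to the last sample.
    """
    slopes = [abs(c2 - c1) / (t2 - t1) for t1, t2, c1, c2 in zip(ts, ts[1:], cs, cs[1:])]
    average_slope = (max(cs) - min(cs)) / (ts[-1] - ts[0])
    return max(slopes) / average_slope


if __name__ == "__main__":
    n = 10_001
    ts = [k / (n - 1) for k in range(n)]                 # one transition, normalized time
    ramp = list(ts)                                       # ideal trapezoid edge
    sine = [-math.cos(math.pi * x) for x in ts]           # half-period of a sinusoid
    print(f"ramp: {max_relative_slope(ts, ramp):.4f}")   # 1.0000
    print(f"sine: {max_relative_slope(ts, sine):.4f}")   # 1.5708 (pi/2)
```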
First MEMS Technology Tried
• MEMS process donated by Robert Bosch corp.
• It is a thin-film technology
– We have since moved to a multi-layer, bulk single-crystal process, which can be expected to do better.
• Integrated CMOS/MEMS devices will eventually be available in this process.
– However, our initial design was dual-die.
• The CMOS side was not yet mature in this process.
• Minimum etched structure width: λ = 0.5 µm
• Minimum etched gap size: d = 0.1 µm
Some Early Resonator Designs
By Ph.D. student Maojiao He, under supervision of Huikai Xie
[Figures: drive comb and sense comb; close-up of sense fingers; another finger design.]
Resonator Schematic
[Figure: resonator schematic with actuator and sensor combs (capacitances Ca, Cs, Cr), bias voltages Vc and Vb, AC drive vac, and Vp expressed in terms of Vc and Vb.]
Sensor Design
[Figure: four-finger sensor geometry, with finger dimensions ds, Ls, Lst, Ws, Wst, Wsst and overall span X expressed in terms of the minimum gap d; 4Csf ≈ 8×10⁻¹⁶ F.]
[Plot: Simulated Output Waveform, capacitance (F) vs. t, for an early design with thin fingers.]
Dissipation in Resonator
Ways to minimize some major sources of dissipation:
• Air damping:
– Vacuum packaging, small size, or optimized airflow
• Clamping losses to the substrate:
– Locate support at a nodal point of vibration mode
– Use impedance-mismatched supports to reflect energy back
• Thermoelastic dissipation (heat flow resulting from
nonuniform strain):
– Small size
– Use stiff, high thermal conductivity materials (Si, diamond?)
– Utilize modes with uniform compression/expansion
• Surface loss mechanisms:
– Avoid layered structures (thin-film interfaces) at surfaces
• Intrinsic material losses:
– Prefer single-crystal materials
Status / Plans for Near Future
• Improved resonator designs afforded by a suitably
modified post-CMOS process flow are being developed.
– I will briefly review some aspects of the new process.
• A small prototype resonator design was taped out in a post-CMOS MEMS process (TSMC 0.35 µm).
– Parts were just received last week and are presently being etched.
• Process donation has been obtained from MOSIS for fabricating an integrated CMOS/MEMS test chip (~$20k).
– Resonator driving a simple 2LAL shift register or adder pipeline
– Tape-out for this chip is scheduled for July 26.
• Test the various parts separately, & together.
– Characterize power dissipation using sensitive calorimetry
techniques.
Post CMOS-MEMS Process (DRIE)
(a) Backside etch. STS: 12-sec etching (130-sccm SF6, 13-sccm O2, 23 mT, 600 W coil power, 12 W platen power); 8-sec passivation (85-sccm C4F8, 12 mT, 600 W coil power, 0 platen power).
(b) Oxide etch. PlasmaTherm-790: 22.5-sccm CHF3, 16-sccm O2, 100 W, 125 mT for 125 minutes and then 100 mT for 10 minutes.
[Figure: cross-sections: CMOS region with metal-1, metal-2, metal-3, oxide, and poly-Si over a single-crystal Si (SCS) membrane.]
Post CMOS-MEMS Process (DRIE), continued
(c) Deep Si etch. STS: same as Step (a).
(d) Si undercut. STS: 130-sccm SF6, 13-sccm O2, 23 mT, 600 W coil power, and 0 platen power.
[Figure: CMOS layer over an SCS layer (20~100 µm), showing a flat structure and a thin-film structure.]
H. Xie et al., J. MEMS, vol. 11, no. 2, 2002.
Electrical Isolation of Silicon
• Electrically isolated silicon island
• Electrically isolated comb fingers
• Using n-well to improve undercut yield
[Figure: cross-section labels: n-well, Al, oxide.]
DRIE CMOS-MEMS Resonators
[Figure: front-side and back-side views of 150 kHz resonators, showing the serpentine spring, proof mass, and comb drive.]
Post-TSMC35 AdiaMEMS Resonator
Taped out April ’04
[Figure: resonator layout showing the drive comb, sense comb, and flex arm.]
Close-Up View, Drive/Sense Combs
Side View, Showing Si Undercut
New Comb Finger Shape
Concepts
For improved waveform shape and
area efficiency
New Comb Finger Shape I
[Figure (cut-away view): moving plate on a support arm/electrode, travelling between fixed plates, with a load electrode. Annotations: maximum vertical (z) thickness for maximum overlap capacitance per planar area; minimum gap size for maximum overlap capacitance per area; minimum support-arm thickness to minimize undesired arm-load capacitance; moving plate range of motion. Color key: metal/oxide layers; silicon substrate material.]
Note that the new configuration increases the magnitude of the capacitance variation while reducing the magnitude of departures from the desired trapezoidal wave shape.
New Comb Finger Shape II
[Figure (cut-away view): moving plate on a support arm/electrode, travelling between fixed plates. Annotations: maximum vertical (z) thickness for maximum overlap capacitance per planar area; minimum gap size for maximum overlap capacitance per area; moving plate range of motion. Color key: metal/oxide layers; silicon substrate material.]
Note that the new configuration increases the magnitude of the capacitance variation while reducing the magnitude of departures from the desired trapezoidal wave shape.
In addition, the structures are made of silicon.
New Comb Finger Shape III
[Figure: moving plate on a support arm/electrode, travelling between fixed plates, with a load electrode. Annotations: high vertical (z) thickness for large overlap capacitance per planar area; minimum gap size for maximum overlap capacitance per area; separation to reduce undesired arm-load capacitance; plate range of motion. Color key: metal/oxide layers; silicon substrate material.]
Note that the new configuration increases the magnitude of the capacitance variation while reducing the magnitude of departures from the desired trapezoidal wave shape.
New Comb Finger Shape IV
Arm anchored to nodal points of fixed-fixed beam flexures, located a short distance away in both directions (for symmetry).
[Figure: moving metal plate on a support arm/electrode, moving between a phase 0° electrode and a phase 180° electrode (axes x, y, z), with plots of each electrode's capacitance C(θ) for θ from 0° to 360°.]
Repeat the interdigitated structure arbitrarily many times along the y axis, all anchored to the same flexure.
Or, if we can do the structure on the previous slide, then why not this one too? Or, will there be a problem etching the intervening silicon out from between the metal/oxide layers and the bulk substrate?
New Comb Finger Shape V
[Figure: a moving plate interleaved between fixed plates.]
In this design, the plates are attached directly to a support arm which extends in the y direction instead of x. This arm can be the flexure, or it can be attached to a surrounding frame anchored to a flexure. Note that in the initial position, at all points, we only need to etch from the top and/or bottom, with no undercuts. Also, the flexure can be single-crystal Si.
Requires accurate, variable-depth backside etch (not presently available).
New finger: One Candidate Layout
New finger simulation results
[Plots: simulation results for the new finger design (two panels).]
Cadence simulation results
Work by AdiaMEMS project students Krishna Natarajan and Venkiteswaran Anantharam (UF ECE Dept., under supervision of Dr. Frank, CISE/ECE)
[Figures: 2LAL 8-stage circular shift register; shift register layout, in progress; pulse propagation in the 8-stage circuit.]
Simulation Results from Cadence
Power vs. freq., TSMC 0.18, Std. CMOS vs. 2LAL
[Plot: average power dissipation per nFET (W) and energy dissipated per nFET per cycle vs. frequency (Hz, 10³ to 10⁹), standard CMOS vs. 2LAL. Annotations: “<0.01× the power @ 1 MHz”; “>100× faster @ 1 pW/T”.]
Assumptions & caveats:
• Assumes ideal trapezoidal power/clock waveform.
• Minimum-sized devices, 2λ×3λ
* 0.18 µm (L) × 0.24 µm (W)
• nFET data is shown
* pFET data is very similar
• Various body biases tried
* Higher Vth suppresses leakage
• Room temperature operation.
• Interconnect parasitics have not yet been included.
• Activity factor (transitions per device-cycle) is 1 for CMOS, 0.5 for 2LAL in this graph.
• Hardware overhead from the fully-adiabatic design style is not yet reflected
* ≥2× transistor-tick hardware overhead in known reversible CMOS design styles
O(log n)-time carry-skip adder
(8-bit segment shown)
[Figure: 8-bit adder segment built from per-bit S/A/B cells producing G (generate) and P (propagate) signals, combined across least- and most-significant halves (LS/MS, with Gls, Pls, Pms) using Cin/Cout; the 2nd, 3rd, and 4th carry ticks are marked.]
• With this structure, we can do a 2ⁿ-bit add in 2(n+1) logic levels → 4(n+1) reversible ticks → n+1 clock cycles.
• Hardware overhead is <2× that of regular ripple-carry.
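The latency arithmetic on this slide can be tabulated. This Python sketch assumes a word size of 2ⁿ bits (which is what makes the depth logarithmic) and 4 ticks per 2LAL clock cycle; `adder_latency` is a hypothetical helper. For a 32-bit adder (n = 5) it gives 12 logic levels, 24 ticks, and 6 clock cycles.

```python
def adder_latency(bits: int) -> dict:
    """Latency of the carry-skip adder for a word of bits = 2**n, following the
    slide: 2(n+1) logic levels -> 4(n+1) reversible ticks -> n+1 clock cycles
    (4 ticks per 2LAL clock cycle)."""
    n = bits.bit_length() - 1
    if bits < 1 or bits != 1 << n:
        raise ValueError("word size must be a power of two")
    return {
        "n": n,
        "logic_levels": 2 * (n + 1),
        "ticks": 4 * (n + 1),
        "cycles": n + 1,
    }


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, adder_latency(width))
```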
Adder Schematic – High 16 Bits
32-bit Adder Simulation Results
[Plots: 32-bit adder power (W) vs. add frequency (Hz), and 32-bit adder energy per add (J) vs. add frequency (Hz), comparing 1V CMOS, 0.5V CMOS, and adiabatic curves. Annotation: “20x better perf. @ 3 nW/adder”.]
(All results normalized to a throughput level of 1 add/cycle)
Power vs. freq., alt. device techs.
[Plot: power per device (W) vs. frequency (Hz) for .18um CMOS, .18um 2LAL, and various reversible device proposals (nSQUID, QCA cell, Quantum FET, Rod logic, Param. quantron, Helical logic), with a kT ln 2 reference line.]
Plenty of Room for Device Improvement
• Recall, irreversible device technology has at most ~3-4 orders of magnitude of power-performance improvements remaining.
– And then, the firm kT ln 2 limit is encountered.
• But, a wide variety of proposed reversible device technologies have been analyzed by physicists.
– With theoretical power-performance up to 10-12 orders of magnitude better than today’s CMOS!
• Ultimate limits are unclear.
A Potential Scaling Scenario for
Reversible Computing Technology
Make same assumptions as previously, except:
• Assume energy coefficient (energy diss. / freq.)
of reversible technology continues declining at
historical rate of 16× / 3 years, through 2020.
– For adiabatic CMOS, $c_E = CV^2 \cdot RC = C^2 V^2 R$.
• This has been going as ~ℓ⁴ (ℓ = feature length) under constant-field scaling.
– But, requires new devices after CMOS scaling stops.
• However, many candidates are waiting in the wings…
• Assume number of affordable layers of active
circuitry per chip (or per package, e.g., stacked
dies) doubles every 3 years, through 2020.
– Competitive pressures will tend to ensure this will
happen, esp. if device-size scaling stops, as assumed.
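The compounding in these assumptions can be checked quickly. This Python sketch (helper name hypothetical) assumes smooth exponential trends from 2004 to 2020; starting from a single active layer in 2004 (also our assumption), doubling every 3 years gives about 2^(16/3) ≈ 40 layers by 2020, and the energy coefficient falls by roughly 16^(16/3) ≈ 2.6×10⁶.

```python
def compound(factor: float, period_years: float, years: float) -> float:
    """Total change after `years` when the quantity changes by `factor` every `period_years`."""
    return factor ** (years / period_years)


YEARS = 2020 - 2004
ce_gain = compound(16.0, 3.0, YEARS)     # reduction of c_E (16x per 3 years)
layer_gain = compound(2.0, 3.0, YEARS)   # growth in active layers (2x per 3 years)

if __name__ == "__main__":
    print(f"c_E reduced ~{ce_gain:.2e}x; layers grow ~{layer_gain:.1f}x")
```

Since 16 = 2⁴, the $c_E$ gain is exactly the fourth power of the layer gain under these assumptions.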
Result of Scenario
A Potential Scenario for CMOS vs. Reversible Raw Affordable Chip Performance
[Plot: device-ops/second per affordable 100W chip (10¹⁷ to 10²³) vs. year (2004 to 2020), CMOS vs. Reversible. Annotations: Reversible: 40 layers, each with 8 billion active devices, freq. 180 GHz, 0.4 kT dissip. per device-op. CMOS: e.g. 1 billion devices actively switching at 3.3 GHz, ~7,000 kT dissip. per device-op.]
Note that by 2020, there could be a factor of 20,000× difference in raw
performance per 100W package. (E.g., a 100× overhead factor from reversible
design could be absorbed while still showing a 200× boost in performance!)
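The two annotated operating points can be cross-checked against the 100 W budget. This Python sketch assumes T = 300 K and that every active device performs one operation per clock cycle (both our assumptions); under those assumptions each configuration dissipates about 95 W, and the throughput ratio is about 1.7×10⁴, the same order as the ~20,000× quoted above. `chip_budget` is a hypothetical helper.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def chip_budget(devices: float, freq_hz: float, kt_per_op: float, temp_k: float = 300.0):
    """Return (device-ops per second, dissipated watts) for a fully active chip."""
    ops_per_s = devices * freq_hz
    watts = ops_per_s * kt_per_op * K_B * temp_k
    return ops_per_s, watts


rev_ops, rev_w = chip_budget(40 * 8e9, 180e9, 0.4)    # 40 layers x 8 billion devices
cmos_ops, cmos_w = chip_budget(1e9, 3.3e9, 7000.0)    # 1 billion devices at 3.3 GHz

if __name__ == "__main__":
    print(f"reversible: {rev_ops:.2e} ops/s at {rev_w:.0f} W")
    print(f"CMOS:       {cmos_ops:.2e} ops/s at {cmos_w:.0f} W")
    print(f"ratio:      {rev_ops / cmos_ops:,.0f}x")
```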
Conclusions
• Standard CMOS is approaching imminent limits on
raw performance per unit power consumed.
– Due to various lower bounds on the energy dissipated by
conventional irreversible switching.
• Only mostly-reversible logic architectures have the
potential to bypass all of the known energy limits!
– Via migration to an increasingly adiabatic, ballistic mode of
operation, and an increasingly reversible logic design.
• With increasingly high-Q energy transfers during logic.
• UF’s AdiaMEMS project is refining techniques for
near-term reversible computing in CMOS/MEMS.
– Potentially viable technology for ultra-low-power products.
• Long-term, digital circuit architectures that are designed in a mostly-reversible logic style will be the only ones that can be easily ported to future ultra-high-performance reversible logic-device nanotechnologies.
– We need to start paying more attention to these issues!
AdiaMEMS Project Members – Thanks!
[Photo] Left to right: Venki, Mike, Maojiao, Krishna, & Huikai