http://www.eng.fsu.edu/~mpf
Low Power Electronics
Exploring the Fundamental Limits
of Computation
Dr. Michael P. Frank
Invited Videoconference Talk
Pragyaa Festival
Shri Guru Gobind Singhji Institute of Engineering
and Technology, Vishnupuri, Nanded, India
April 3, 2005
Abstract of Talk
• The electronics industry is rapidly approaching various
fundamental physical limits to the energy efficiency of
conventional digital technologies.
– As a result, the performance of practical digital systems based on
conventional technology must level off within at most 1-3 decades.
• Our generation will be forced to deal with the consequences!
• There is only one potential way to circumvent all these limits
that is consistent with the known laws of physics.
– Namely: To develop highly reversible computing technologies.
• These recycle and reuse most of the logic signal energy.
• Reversible computing opens the door to potentially unlimited
future improvements in computer efficiency.
– This could have enormous potential implications!
• But, to develop reversible computing into a fully practical
technology is a very challenging engineering problem…
– The world’s brightest, most creative people will be needed to solve it!
• With hard work, maybe you will be the one to make the key breakthroughs!
11/7/2015
M. Frank, "Low-Power Electronics"
2
Computer Performance
versus Power
• Computer performance Π is defined as the rate at which
operations (of some standard size) are performed;
– i.e., the number nop of operations per unit t of time, Π ≡ nop/t.
• Example: Boolean logic operations per second.
• In contrast, power P in physics refers to an amount E of
energy per unit of time t, that is, P ≡ E/t.
– I.e., a rate at which energy undergoes some process.
• E.g., being transmitted, transformed, or dissipated to heat.
– The question of exactly what the power/energy is doing is a crucial one!
• Please, please! Do not confuse this technical meaning of the
word “power” with other, more informal uses in English!
– E.g., to mean performance:
• “My computer is more powerful than yours.”
⇒ You really mean, it has better performance.
– Or to mean energy:
• “How much power do we require in order to add two numbers?”
⇒ You really mean, how much energy must be dissipated.
Computational Energy
Efficiency
• We define the energy efficiency ηE of a computer as its
performance per unit of organized power being used up,
ηE ≡ Π/Pdiss.
– Where “used up” means transformed into disorganized waste heat
power that is dissipated out into the environment.
• Therefore, ηE = (nop/t)/(Ediss/t) = nop/Ediss.
– Energy efficiency is thus also the number of operations that can be
performed per unit of energy that is dissipated.
• High energy efficiency is desirable because it can allow us to:
– Compute faster while consuming power at a given fixed level, or…
– Perform more computational work before running out of energy.
• Let Ediss,op denote the amount of energy dissipated in
performing 1 standard operation.
– Note this has a reciprocal relationship with energy efficiency.
• That is, we have ηE = (1 op)/Ediss,op and Ediss,op = (1 op)/ηE.
• Thus, high energy efficiency ηE ⇔ low Ediss,op.
Gate Energy Trends
Trend of Minimum Transistor Switching Energy
Based on ITRS ’97-’03 roadmaps
[Figure: CV²/2 gate energy (J, log scale from 1.E-14 down to 1.E-22, spanning fJ, aJ, and zJ) versus year (1995 to 2045). Points are labeled by node number (nm DRAM half-pitch): 250, 180, 130, 90, 65, 45, 32, 22. Series: LP min gate energy and HP min gate energy. Reference lines: room-temperature 100 kT reliability limit, 100 k(300 K); one electron volt, 1 eV; room-temperature kT thermal energy, k(300 K); room-temperature von Neumann-Landauer limit, ln(2) k(300 K). Annotation: “Practical limit for CMOS?”]
Lower Bounds on
Energy Dissipation
• In today’s 90 nm VLSI technology, for minimal operations
(conventional switching of a minimum-sized transistor):
– Ediss,op is on the order of 1 fJ (femtojoule) ⇒ ηE ≲ 10¹⁵ ops/sec/watt (i.e., operations per joule).
• Will be a bit better in coming technologies (65 nm, maybe 45 nm)
• Conventional digital technologies are subject to several lower
bounds on their energy dissipation Ediss,op for digital logic /
storage / communication operations,
– And thus, corresponding upper bounds on their energy efficiency.
• Some of the known bounds include:
– Leakage-based limit for high-performance field-effect transistors:
• Roughly at least ~5 aJ (attojoules) ⇒ ηE ≲ 2×10¹⁷ operations/sec/watt
– Reliability-based limit for all non-energy-recovering technologies:
• Roughly 1 eV (electron-volt) ⇒ ηE ≲ 6×10¹⁸ operations/sec/watt
– von Neumann-Landauer (VNL) bound for all irreversible technologies:
• Exactly kT ln 2 ≈ 18 meV (on Earth) ⇒ ηE ≲ 3.5×10²⁰ operations/sec/watt
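The efficiency figures quoted for each bound follow from the reciprocal relation ηE = (1 op)/Ediss,op. The short Python sketch below redoes that arithmetic. The constant values and the 300 K room temperature are assumptions made here for illustration, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann's constant, J/K
EV = 1.602176634e-19    # one electron volt, J
T_ROOM = 300.0          # assumed room temperature, K

def efficiency_ops_per_joule(e_diss_op):
    """Energy efficiency eta_E = (1 op) / E_diss,op, in operations per joule."""
    return 1.0 / e_diss_op

# Dissipation per operation for each bound listed on the slide, in joules.
bounds = {
    "90 nm CMOS (~1 fJ)": 1e-15,
    "leakage limit (~5 aJ)": 5e-18,
    "reliability limit (~1 eV)": EV,
    "VNL bound (kT ln 2)": K_B * T_ROOM * math.log(2),
}

for name, e_diss in bounds.items():
    eta = efficiency_ops_per_joule(e_diss)
    print(f"{name}: {e_diss:.3g} J/op -> {eta:.3g} ops/J")
```

Since one watt is one joule per second, operations per joule is the same unit as operations/sec/watt.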
Reliability Bound on Logic
Signal Energies
• Let Esig denote the logic signal energy,
– The energy involved in storing, transmitting, or transforming a bit’s worth of
digital information.
• But note that “involved” does not necessarily mean “dissipated!”
• As a result of fundamental thermodynamic considerations, it is required
that Esig ≥ kBTsig ln R,
– Where kB is Boltzmann’s constant, 1.38×10⁻²³ J/K;
– Tsig is the temperature of the local subsystem carrying the signal;
– R is the reliability factor, i.e., the improbability 1/perr of error.
• In non-energy-recovering logic technologies (totally dominant today)
– Basically all of the signal energy (and often additional energy) is dissipated to
heat on each operation.
• In this case, minimum sustainable dissipation is Ediss,op ≳ kBTenv ln R
– Where Tenv is the temperature of the waste-heat reservoir
• Averages around 300 K (room temperature) in Earth’s atmosphere
• For a decent R = 2×10¹⁷, this energy is ~40 kT ≈ 1 eV.
– For better energy efficiency, we must recover some of the signal energy.
• Rather than dissipating it all to heat with each manipulation of the signal.
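As a quick check of the reliability bound Esig ≥ kB·Tsig·ln R, the following Python sketch evaluates it for the R = 2×10¹⁷ example. The function name and the default of 300 K are illustrative assumptions, not part of the talk.

```python
import math

K_B = 1.380649e-23      # Boltzmann's constant, J/K
EV = 1.602176634e-19    # one electron volt, J

def min_signal_energy(reliability, temperature_k=300.0):
    """Lower bound k_B * T * ln(R) on signal energy, in joules.

    `reliability` is R = 1/p_err; `temperature_k` defaults to an
    assumed room temperature of 300 K.
    """
    return K_B * temperature_k * math.log(reliability)

R = 2e17
energy_in_kt = math.log(R)               # bound expressed in units of kT
energy_in_ev = min_signal_energy(R) / EV

print(f"ln R = {energy_in_kt:.1f} kT, or about {energy_in_ev:.2f} eV at 300 K")
```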
VNL Bound on Energy Dissipation
from Information Loss
Follows directly from the reversibility of fundamental physics!
[Figure: physical microstate trajectories during a bit erasure. Before the operation, logical state “0” and logical state “1” each contain N physical microstates (shown as 8 for clarity in this simple example), so each has S = k ln 8 = 3 bits. After the operation, the single logical state “0” contains all 16 microstates, so S = k ln 16 = 4 bits. ∆S = 1 bit = k ln 2, and Ediss = ∆S·Tenv = kTenv ln 2.]
Von Neumann-Landauer Bound
• Follows directly from the time-reversibility (invertibility) of all
fundamental physical dynamics.
– This in turn is implied by the Hamiltonian formulation of mechanics
and the unitarity of quantum mechanics. ⇒ Very well-established.
• Implies that physical information can never be destroyed!
– Only reversibly transformed!
• When we lose or discard a bit’s worth of logical information,
– e.g., by erasing or destructively overwriting a bit storage location…
• the ‘lost’ information must actually remain in existence,
– if in no other form, then as a bit’s worth (k ln 2) of physical entropy.
• Entropy simply means unknown information in the physical state.
• If the logical bit was originally known (not entropy)
– then entropy has increased in this process by ∆S = 1 bit = k ln 2.
• The energy in the heat reservoir must be increased by an amount ∆S·Tenv
= kTenv ln 2 in order to contain this additional entropy.
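The entropy bookkeeping behind the bound can be sketched in a few lines of Python. This is only an illustrative count of microstates under the assumption that all microstates are equally likely; the function names and the 300 K environment temperature are hypothetical choices.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant, J/K

def entropy_bits(num_microstates):
    """Entropy, in bits, of a uniform distribution over the given microstates."""
    return math.log2(num_microstates)

def erasure_heat(bits_erased, t_env=300.0):
    """Minimum heat (J) sent to a reservoir at t_env: Delta S * T = bits * k T ln 2."""
    return bits_erased * K_B * t_env * math.log(2)

n = 8                               # microstates per logical state
s_before = entropy_bits(n)          # known logical bit: one of n microstates
s_after = entropy_bits(2 * n)       # after erasure: one of 2n microstates
delta_s = s_after - s_before        # entropy increase in bits

print(f"S before = {s_before} bits, S after = {s_after} bits, delta = {delta_s} bit")
print(f"Minimum heat at 300 K: {erasure_heat(delta_s):.3g} J")
```

The count of 8 microstates is arbitrary: the difference log2(2N) - log2(N) is 1 bit for any N.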
Reversible Computing
• The basic idea is simply this:
– Don’t erase information when performing logic / storage /
communication operations!
• Instead, just reversibly transform it in place!
• When reversible digital operations are implemented
using well-designed energy-recovering circuitry,
– This can result in energy dissipation Ediss << Esig,
• This has already been empirically demonstrated in many chips.
– and even (in principle) energy dissipation Ediss << kT ln 2!
• This is pretty clear in theory, but we are not yet to the point of
achieving such low levels of dissipation experimentally.
– Achieving this goal requires very careful design,
– and verifying it requires very sensitive measurement equipment.
Adiabatic Circuits
• Reversible logic can be implemented today using
fairly ordinary voltage-coded CMOS VLSI circuits.
– With a few changes to the logic-gate/circuit architecture.
• We avoid dissipating most of the circuit node energy
when switching, by transferring charges in a nearly
adiabatic (literally, “without flow of heat”) fashion.
– I.e., asymptotically thermodynamically reversible.
• In the limit, as various low-level technology parameters are scaled.
• There are many designs for purported “adiabatic”
circuits in the literature,
– but, watch out! Most of the designs out there contain fatal
flaws, and are not truly adiabatic.
• Many past designers were unaware of (or accidentally failed to
meet) all the requirements for true thermodynamic reversibility.
Reversible and/or Adiabatic VLSI Chips
Designed @ MIT, 1996-1999
By Frank and other then-students in the MIT Reversible Computing group,
under CS/AI lab members Tom Knight and Norm Margolus.
Bistable Potential-Energy Wells
A Technology-Independent Model of Digital Devices
(Landauer ’61)
• Consider any system having an (adjustable) potential
energy surface (PES) in its configuration space.
– The PES should have at least two local minima (or wells)
– Therefore the system is bistable
• It has two stable (or at least metastable) configurations
– Located at well bottoms
– One state can represent 0, the other 1.
• This picture can also be easily generalized to
larger numbers of stable states.
• Consider now the PES having
two adjustable parameters:
– (1) “Height” (energy) of the potential energy
barrier between wells, relative to well bottoms
– (2) Relative height of the left and right
states in the well (call this “bias”)
• The two stable states form a natural bit.
[Figure: double-well potential energy curve plotted against a generalized configuration coordinate, with the two wells labeled 0 and 1.]
Possible Parameter Settings
• In some of the following slides, we will
mention six qualitatively different settings
of the well parameters, as shown below…
[Figure: grid of well diagrams for the six settings: barrier height (raised or lowered) against direction of bias force (left, neutral, or right).]
MOSFET Implementation
• The logical state is in the location of a charge packet
(excess of electrons) on either side terminal of a FET.
– The charge packet might even consist of just a single excess
electron in a sufficiently small (nanoscale) logic node.
• The potential energy barrier is provided by the built-in
voltage across the PN junctions in the FET.
– The barrier height is lowered when the device is turned on
by adjusting the voltage on the gate electrode.
• Bias forces can be provided by (e.g.) capacitive
coupling to nearby electrodes.
[Figure: n-p-n structure of the FET, with a charge packet (e e e) shown on one side terminal.]
Possible Well Transitions
• Catalog of all the possible transitions in
the bistable wells, adiabatic & not...
(Ignoring superposition states.)
– We can characterize a wide variety of digital
logic and memory styles in terms of how their
operation corresponds to subgraphs of this diagram.
[Figure: transition diagram over the well states, arranged by barrier height and direction of bias force, with the “0” states, the “1” states, and the neutral state N. Non-adiabatic transitions are marked “leak”, ∆E, and k ln 2.]
Ordinary Irreversible Logics
• Principle of operation: Lower a barrier, or not,
based on input. Series/parallel combinations of
barriers do logic. Major dissipation in at least one of
the possible transitions.
• Can amplify input signals.
Example: Ordinary CMOS logics
[Figure: well diagram in which the input changes, the barrier is lowered, and the output is irreversibly changed to 0.]
Irreversible SET/CLR operations
• Irreversible SET: Turn on a pFET connecting node B to a high
voltage source.
[Figure: SET operation, showing node B before and after for initial states 0 and 1, with ½CV² dissipated. Voltage color scheme: Low / High.]
• Irreversible CLR: Turn on an nFET connecting node B to a
low voltage source.
[Figure: CLR operation, showing node B before and after for initial states 0 and 1, with ½CV² dissipated.]
Conventional Logic is Irreversible
Even a simple NOT gate, as it’s traditionally implemented!
• Here’s what all of today’s logic gates (including NOT)
do continually, i.e., every time their input changes:
– They overwrite previous output with a function of their input.
– Performs many-to-one transformation of local digital state!
– ⇒ required to dissipate ≳kT on avg., by Landauer principle
– Incurs ½CV² energy dissipation when the output changes.
Example: Static CMOS Inverter (in → out)
Inverter transition table:
Just before transition    After transition
in  out                   in  out
0   0                     0   1
0   1                     0   1
1   0                     1   0
1   1                     1   0
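The many-to-one character of the inverter's transition table can be checked mechanically. The Python sketch below, with hypothetical helper names, groups the four (in, out) states by the state they map to; any output with more than one preimage marks a loss of information.

```python
from collections import defaultdict

def inverter_step(state):
    """Static inverter: the new output depends only on the input bit."""
    in_bit, _old_out = state
    return (in_bit, 1 - in_bit)

def preimages(transition, states):
    """Map each resulting state to the list of states that lead to it."""
    result = defaultdict(list)
    for s in states:
        result[transition(s)].append(s)
    return dict(result)

ALL_STATES = [(i, o) for i in (0, 1) for o in (0, 1)]

pre = preimages(inverter_step, ALL_STATES)
is_one_to_one = all(len(sources) == 1 for sources in pre.values())

for after, before in pre.items():
    print(f"{before} -> {after}")
print("one-to-one:", is_one_to_one)
```

Two prior states merge into each resulting state, so one bit of information about the previous output is discarded on every transition.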
Example: Standard CMOS Inverter
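[Figure: standard CMOS inverter between Power (Vdd) and Ground (0V), drawn in two states next to a simplified picture of the PES. Voltage color scheme: Low / High. With In = 0, the upper transistor is on and the lower one is off, so Out = 1. When the input goes high (In = 1), the upper transistor turns off and the lower one turns on: the barrier between Out and Ground is lowered, and the charge “falls” to the lower energy level, giving Out = 0. When the input goes low, the barrier to GND is raised, the barrier to Vdd is lowered, and charge falls in from Vdd to Out.]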
Ordinary Irreversible Memory
• (1) Lower a barrier, obliviously erasing stored
information. (2) Apply an input bias. (3) Raise
the barrier to latch the new information
into place. (4) Remove input bias.
– (1) and (2) can also be in the opposite order.
– Dissipation here can be made as low as kT ln 2.
• Examples: ordinary DRAM cell, rod logic register
[Figure: well-transition diagram for the cycle. From a stored 0 or 1, the barrier is lowered to reach the neutral state N (1); an input “0” or input “1” bias is applied (2); the barrier is raised (3); and the input is retracted (4).]
Example: NMOS latch / DRAM cell
• Sequence corresponds exactly to general
picture illustrated on previous slide.
[Figure: NMOS pass transistor between input node I and memory node M, shown through the cycle with the transistor on or off at each step. Voltage color scheme: Low / Medium / High. (1) Oblivious erasure; (2) apply input bias (steps (1) and (2) could also be done in the other order); (3) raise barrier; (4) remove input bias (and back to start).]
Conventional vs. Adiabatic Charging
For charging a capacitive load C through a voltage swing V
• Conventional charging:
– Constant voltage source V, charging C through resistance R until Q = CV
– Energy dissipated: $E_\mathrm{diss} = \tfrac{1}{2} C V^2$
• Ideal adiabatic charging:
– Constant current source I, delivering Q = CV through resistance R over time t
– Energy dissipated: $E_\mathrm{diss} = I^2 R t = \frac{Q^2 R}{t} = C V^2 \, \frac{RC}{t}$
Note: Adiabatic beats conventional by advantage factor A = t/2RC.
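To make the advantage factor concrete, here is a small Python sketch that evaluates both dissipation formulas for one set of component values. The values (1 fF, 1 V, 10 kΩ, 1 ns) are assumptions picked only for illustration, and the function names are hypothetical.

```python
def conventional_dissipation(c, v):
    """Energy lost charging C to V from a constant voltage source: (1/2) C V^2."""
    return 0.5 * c * v ** 2

def adiabatic_dissipation(c, v, r, t):
    """Energy lost charging C to V with a constant current over time t: I^2 R t."""
    current = c * v / t          # I = Q / t with Q = C V
    return current ** 2 * r * t

def advantage_factor(r, c, t):
    """Ratio of conventional to ideal adiabatic dissipation, A = t / (2 R C)."""
    return t / (2.0 * r * c)

# Assumed example values, not taken from the talk.
C = 1e-15    # load capacitance, F
V = 1.0      # voltage swing, V
R = 10e3     # series resistance, ohm
T = 1e-9     # charging time, s (100 times the RC constant of 10 ps)

e_conv = conventional_dissipation(C, V)
e_adia = adiabatic_dissipation(C, V, R, T)

print(f"conventional: {e_conv:.3g} J, adiabatic: {e_adia:.3g} J")
print(f"ratio: {e_conv / e_adia:.3g}, A = t/2RC: {advantage_factor(R, C, T):.3g}")
```

With these values the charging time is 100 RC, so the adiabatic transfer dissipates 50 times less energy.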
Adiabatic Switching with MOSFETs
• Use a voltage ramp to approximate
an ideal current source.
• Switch conditionally,
if MOSFET gate voltage
Vg > V+VT during ramp.
• Can discharge the load later using a similar ramp.
– Either through the same path, or a different path.
[Figure: voltage ramp of duration t driving a load C (Q = CV) through a MOSFET of on-resistance ~R whose gate is at Vg.]
• t ≫ RC ⇒ $E_\mathrm{diss} \to C V^2 \, \frac{RC}{t}$
• t ≪ RC ⇒ $E_\mathrm{diss} \to \tfrac{1}{2} C V^2$
• Exact formula, given speed fraction $s :\equiv RC/t$:
$E_\mathrm{diss} = s \left[ 1 + s \left( e^{-1/s} - 1 \right) \right] C V^2$
(Athas ’96, Tzartzanis ’98)
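The exact ramp formula can be cross-checked numerically. The Python sketch below evaluates the closed form E_diss/(CV²) = s[1 + s(e^(-1/s) - 1)] with s = RC/t, and compares it with a simple forward-Euler simulation of an RC circuit driven by a linear ramp that then holds at V. The simulation is an illustrative assumption (normalized units C = V = t = 1, settling time of 20 RC), not a procedure from the cited papers.

```python
import math

def ramp_dissipation_fraction(s):
    """Closed form E_diss / (C V^2) for a linear ramp, with s = RC / t."""
    return s * (1.0 + s * math.expm1(-1.0 / s))

def simulate_ramp_dissipation(s, steps_per_unit=20000, settle_time_constants=20.0):
    """Forward-Euler estimate of E_diss / (C V^2) in normalized units.

    Units: ramp time t = 1, C = 1, V = 1, so R = s and RC = s.
    The source ramps from 0 to 1 over one time unit, then holds at 1
    while the load settles.
    """
    r = s
    dt = 1.0 / steps_per_unit
    t_end = 1.0 + settle_time_constants * s
    v_load = 0.0
    energy = 0.0
    for k in range(int(t_end / dt)):
        tau = k * dt
        v_source = min(tau, 1.0)
        current = (v_source - v_load) / r
        energy += current * current * r * dt   # I^2 R dt dissipated in R
        v_load += current * dt                 # dV = I dt / C, with C = 1
    return energy

for s in (0.1, 0.5):
    exact = ramp_dissipation_fraction(s)
    approx = simulate_ramp_dissipation(s)
    print(f"s = {s}: closed form {exact:.5f}, simulated {approx:.5f}")

# Limiting cases: slow ramp tends to s, fast ramp tends to 1/2.
print(ramp_dissipation_fraction(1e-3), ramp_dissipation_fraction(1e3))
```

The closed form includes the small residual dissipated while the load settles after the ramp ends, which is why the simulation keeps running past the end of the ramp.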
Requirements for True Adiabatic Logic
in Voltage-coded, FET-based circuits
• Avoid passing current through diodes.
– Crossing the “diode drop” leads to irreducible dissipation.
• Follow a “dry switching” discipline (in the relay lingo):
– Never turn on a transistor when VDS ≠ 0.
– Never turn off a transistor when IDS ≠ 0.
• Together these rules imply:
(Important but often neglected!)
– The logic design must be logically reversible
• There is no way to erase information under these rules!
– Transitions must be driven by a quasi-trapezoidal waveform
• It must be generated resonantly, with high Q
• Of course, leakage power must also be kept manageable.
– Because of this, the optimal design point will not necessarily
use the smallest devices that can ever be manufactured!
• Since the smallest devices may have insoluble problems with leakage.
Reversible Set (rSET) & Clear (rCLR)
• rSET operation semantics: Given assurance that a bit is initially 0,
unconditionally change it to 1.
– To implement: Traverse the adiabat (reversible trajectory) shown below.
• Reverse this path to perform rCLR.
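[Figure: the rSET adiabat, steps (1) through (6), drawn on the well diagram of barrier height against direction of bias force. The path leads from the “0” states through the neutral state N to the “1” states; work is gotten out along one part of the path and put back in along another.]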
rSET/rCLR transition tables
• Note that these tables are not reversible according to
the strict traditional definition…
– Since they don’t represent a 1-1 transformation of all
possible input states.
• However, if we restrict our use of these operations so
as to always avoid the input states that actually result
in dissipation,
– Then, we obtain a 1-1 transformation of the subset of the
input states that are actually used,
– And that is the correct statement of the true logical
requirement for avoiding Landauer’s principle!
Before rSET    After rSET
0              1
1              1

Before rCLR    After rCLR
0              0
1              0
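The notion of conditional reversibility described above, a one-to-one map on the subset of input states actually used, can be expressed as a small Python check. The helper names below are hypothetical.

```python
def r_set(bit):
    """rSET: result is 1 (intended to be used only when the bit is initially 0)."""
    return 1

def r_clr(bit):
    """rCLR: result is 0 (intended to be used only when the bit is initially 1)."""
    return 0

def is_one_to_one_on(operation, allowed_inputs):
    """True if no two allowed input states map to the same output state."""
    outputs = [operation(x) for x in allowed_inputs]
    return len(set(outputs)) == len(outputs)

# Over all input states the tables are many-to-one ...
print(is_one_to_one_on(r_set, [0, 1]))   # False
print(is_one_to_one_on(r_clr, [0, 1]))   # False

# ... but restricted to the states that are actually used, they are one-to-one.
print(is_one_to_one_on(r_set, [0]))      # True
print(is_one_to_one_on(r_clr, [1]))      # True
```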
Type 1: Input-Bias Clocked-Barrier
Reversible Latching (& Logic)
• Cycle of operation:
– (1) Data input applies bias
• Add forces to do majority logic
– (2) Clock signal raises barrier
• (Can amplify/restore input signal in the barrier-raising step.)
– (3) Data input bias removed
– (4) Can reset latch reversibly given copy of contents.
• Examples: Adiabatic QCA, SCRL latch, Rod logic latch, PQ logic,
Buckled logic, Helical logic
[Figure: well-transition diagram showing steps (1) through (3) from the neutral state N to a latched 0 or 1, together with the reversible reset paths (4).]
Type 1 Example: Adiabatic
NMOS latch / DRAM cell
• Same as irrev. latch, just skip the erasure step!
• Can similarly use a CMOS transmission gate (nFET/pFET pair)
to latch a full-swing signal if necessary.
[Figure: NMOS pass transistor between input node I and memory node M, shown with the transistor on or off at each step. Voltage color scheme: Low / Medium / High. (1) Apply input bias; (2) raise barrier; (3) remove input bias. (Reverse steps to reversibly unlatch M.)]
A Simple Reversible CMOS Latch
• Uses a single standard CMOS transmission gate (T-gate).
• Sequence of operation:
(0) input level initially tied to latch ‘contents’ (output);
(1) input changes gradually ⇒ output follows closely;
(2) latch closes, charge is stored dynamically (node floats);
(3) afterwards, the input signal can be removed.
[Figure: “Reversible latch”, a T-gate between in and out controlled by signal P, with waveforms for steps (0) (1) (2) (3).]
State tables (in out):
Before input:     0 0
Input arrived:    0 0  or  1 1
Input removed:    0 0  or  0 1
• Later, we can reversibly
“unlatch” the data with
an exactly time-reversed
sequence of steps.
Type 2: Input-Barrier, Clocked-Bias
Reversible Retractile Logic
• Cycle of operation:
– (1) Inputs raise or lower barriers
• Do logic w. series/parallel barriers
– (2) Clock applies bias force, which changes state, or not
• Barrier signal is amplified!
Gain, restoring logic, fan-out.
• Must reset output prior to
changing input.
• Combinational logic only!
• Examples: Hall’s logic, SCRL gates, Rod logic interlocks
[Figure: well-transition diagram. (1) The input sets the barrier height; (2) the clocked bias force is applied, moving the state from 0 through N to 1 only if the barrier is lowered.]
Type 2 example: Adiabatic CMOS
“buffer” (really, a cSET/cCLR gate)
• Controlled-SET / controlled-CLEAR.
• Structure: Essentially just a pair of CMOS transmission gates
– 2 transistors each, an nFET and a pFET in parallel
• Using dual-rail signaling, we can reversibly set or clear a bit on an unoccupied
logic node (pair of voltage nodes), conditionally on an input node.
– Amplifies input signal.
– Fully restores logic levels.
[Figure: T-gate connecting DriveN to OutN, gated by InN and InP, shown at successive steps with the gate on or off as the input and drive levels change. Voltage color scheme: Low / High. (And similarly for OutP.)]
Transition Table for cSET
• It is not unconditionally reversible,
– Not a one-to-one transformation of all possible
local states,
• But, it is conditionally reversible
– I.e., on condition that input state 1,1 is avoided.
Before cSET               After cSET
Source  Destination       Source  Destination
0       0                 0       0
0       1                 0       1
1       0                 1       1
1       1                 (input state to be avoided)
Type 2 example: SCRL inverter
• Same structure as static CMOS inverter, but used reversibly.
• Produces a fully-restored, amplified output signal.
• Inverters can be cascaded, but need latches to get feedback.
[Figure: SCRL inverter, with In gating the transistors between driveH and Out and between Out and driveL, shown at successive steps with each transistor on or off. Voltage color scheme: Low / Medium / High.]
SCRL Inverter Transition Table
Before SCRL-Inv    After SCRL-Inv
In  Out            In  Out
0   0              0   1
0   ½              0   1
0   1              0   1
½   0              (none)
½   ½              (none)
½   1              (none)
1   0              1   0
1   ½              1   0
1   1              1   0
• Conditionally reversible, if input
is valid and output is ½ just
before drivers do their thing.
• No point in even listing the
table entries that don’t occur;
can summarize operation below.
Before SCRL-Inv    After SCRL-Inv
In  Out            In  Out
0   ½              0   1
1   ½              1   0
Example: Adiabatic NMOS OR gate
• Input barriers along two parallel paths
• Out = A ∨ B
• Reverse sequence
decomputes Out.
• Can’t change A,B
freely until then.
• With NMOS, Out
is weak (orange).
• Can use an SCRL
inverter to restore
the signal levels.
– If appropriately
biased…
• Or, just use CMOS
transmission gates
instead (8T OR)
[Figure: two parallel NMOS pass transistors, gated by A and B, connecting Drive to Out, shown for successive input and drive states.]
Type 3: Input-Barrier, Clocked-Bias
Latching Logic
● Cycle of operation:
1. Input conditionally lowers barrier
• Do logic w. series/parallel barriers
2. Clock applies bias force; conditional bit flip
3. Input removed, raising the barrier &
locking in the state-change
4. Clock bias can retract
Examples: Mike’s
4-cycle 2-level adiabatic
CMOS logic (2LAL)
[Figure: well-transition diagram showing steps (1) through (4) among the 0 states, the neutral state N, and the 1 states.]
2LAL: 2-level Adiabatic Logic
A pipelined fully-adiabatic logic invented at UF (Spring 2000),
implementable using ordinary CMOS transistors.
• Use simplified T-gate symbol:
[Figure: T-gate with control terminals TP and TN, abbreviated to a single symbol labeled T.]
• Basic buffer element:
– cross-coupled T-gates:
• need 8 transistors to
buffer 1 dual-rail signal
(implicit dual-rail encoding everywhere)
• Only 4 timing signals φ0-φ3 are
needed. Only 4 ticks per cycle:
– φi rises during ticks t≡i (mod 4)
– φi falls during ticks t≡i+2 (mod 4)
[Figure: buffer element with in, out, and timing-signal connections, plus an animation of the four timing waveforms 0 through 3 against tick # 0 1 2 3…]
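The timing rule above (timing signal i rises during ticks t ≡ i mod 4 and falls during ticks t ≡ i+2 mod 4) can be tabulated with a short Python sketch. It assumes, as the trapezoidal waveform implies, that a signal stays high between its rising and falling ticks and low otherwise; the function name is hypothetical.

```python
PHASES = ("rising", "high", "falling", "low")

def clock_phase(i, tick):
    """State of timing signal i during the given tick.

    Rises during ticks congruent to i (mod 4), falls during ticks
    congruent to i + 2 (mod 4); it is assumed to hold high or low
    in the ticks in between.
    """
    return PHASES[(tick - i) % 4]

# Print one full cycle of the four timing signals.
print("tick   " + "  ".join(f"{t:>7}" for t in range(4)))
for i in range(4):
    row = "  ".join(f"{clock_phase(i, t):>7}" for t in range(4))
    print(f"sig {i}  {row}")
```

Each signal is the previous one delayed by one tick, so exactly one signal is rising and exactly one is falling during any given tick.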
2LAL Cycle of Operation
[Animation: the buffer element over ticks #0 through #3, showing the levels of the in rails (in0, in1) and out rails (out0, out1) at each tick, for the case in=0, out=0.]
2LAL Shift Register Structure
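• 1-tick delay per logic stage:
[Animation: chain of buffer stages driven by timing signals 1, 2, 3, 0 and 0, 1, 2, 3, with in@0 entering and out@4 emerging.]
• Logic pulse timing and signal propagation:
[Figure: inN and inP pulse waveforms against ticks 0 1 2 3 ...]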
More Complex Logic Functions
• Non-inverting multi-input Boolean functions:
– AND gate (plus delayed A): inputs A0, B0; outputs A1, (A∧B)1
– OR gate: inputs A0, B0; output (A∨B)1
• One way to do inverting functions in pipelined
logic is to use a quad-rail logic encoding:
– To invert, just
swap the rails!
• Zero-transistor
“inverters.”
[Figure: rail waveforms AN and AP for A=0 and for A=1.]
Cadence simulation results
Work by AdiaMEMS project students:
Krishna Natarajan
Venkiteswaran Anantharam
(UF ECE Dept., under supervision of
Dr. Frank, CISE/ECE)
Simulation Results from Cadence
Power vs. freq., TSMC 0.18, Std. CMOS vs. 2LAL
[Figure: log-log plot, against frequency (1.E+09 down to 1.E+03 Hz), of average power dissipation per nFET (W) and energy dissipated per nFET per cycle, on a vertical scale from 1.E-05 down to 1.E-14, for standard CMOS and 2LAL. Annotations: <.01× the power @ 1 MHz; >100× faster @ 1 pW/T.]
Assumptions & caveats:
• Assumes ideal trapezoidal
power/clock waveform.
• Minimum-sized devices, 2λ×3λ
– .18 µm (L) × .24 µm (W)
• nFET data is shown
– pFET data is very similar
• Various body biases tried
– Higher Vth suppresses leakage
• Room temperature operation.
• Interconnect parasitics have not
yet been included.
• Activity factor (transitions per
device-cycle) is 1 for CMOS,
0.5 for 2LAL in this graph.
• Hardware overhead from fully-adiabatic
design style is not yet reflected
– ≥2× transistor-tick hardware
overhead in known reversible
CMOS design styles
O(log n)-time carry-skip adder
(8 bit segment shown)
[Figure: carry-skip adder schematic built from cells with S, A, B, G, P, Cin, and Cout ports, combined through most-significant (MS) and least-significant (LS) group signals Pms, Pls, and Gls, with the 2nd, 3rd, and 4th carry ticks marked.]
• With this structure, we can do a
2ⁿ-bit add in 2(n+1) logic levels
→ 4(n+1) reversible ticks
→ n+1 clock cycles.
• Hardware overhead is <2× regular ripple-carry.
32-bit Adder Simulation Results
[Figure: two log-log plots against add frequency (1.E+08 down to 1.E+04 Hz). “32-bit adder power vs. frequency”: power (W), 1.E-04 down to 1.E-10, for CMOS pwr and Adia. pwr. “32-bit adder energy vs. frequency”: energy/add (J), 1.E-11 down to 1.E-15, for CMOS energy and Adia. energy. CMOS curves are shown for 1V CMOS and 0.5V CMOS. Annotation: 20x better perf. @ 3 nW/adder.]
(All results normalized to a
throughput level of 1 add/cycle)
Thanks to AdiaMEMS Project Members
Left to right: Venki, Mike, Maojiao, Krishna, & Huikai
Power vs. freq., alt. device techs.
Plenty of room for
device improvement…
• Recall, irreversible device
technology has at most
~3-4 orders of magnitude
of power-performance
improvements remaining.
– And then, the firm kT ln 2
limit is encountered.
• But, a wide variety of
proposed reversible device
technologies have been
analyzed by physicists.
– With theoretical power-performance
up to 10-12 orders of magnitude better
than today’s CMOS!
• Ultimate limits are unclear.
[Figure: “Power per device, vs. frequency”, a log-log plot of power per device (W, 1.E-03 down to 1.E-31) against frequency (1.E+12 down to 1.E+03 Hz). Various reversible device proposals are shown: .18um 2LAL, nSQUID, QCA cell, Quantum FET, Rod logic, Param. quantron, Helical logic. They are compared with .18um CMOS and the kT ln 2 limit.]
A Potential Scaling Scenario for
Reversible Computing Technology
• Assume energy coefficient (energy diss. / freq.)
of reversible technology continues declining at
historical rate of 16× / 3 years, through 2020.
– For adiabatic CMOS, cE = CV²·RC = C²V²R.
• This has been going as ~ℓ⁴ (ℓ = feature length) under constant-field scaling.
– Requires new devices after CMOS scaling stops.
• But, many potential candidates are waiting in the wings…
• Assume affordable number of layers of active
circuitry per chip (or per package, e.g., stacked
dies) doubles every 3 years, through 2020.
– Competitive pressures will tend to reduce per-layer
cost, esp. if device-size scaling stops, as assumed.
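The two growth assumptions of this scenario can be compounded with a few lines of Python. The sketch below is illustrative only: the 2004 base year is an assumption (taken from the start of the chart on the following slide), the component values passed to the coefficient formula are made up, and the function names are hypothetical.

```python
def growth_factor(factor_per_period, years, period_years=3.0):
    """Compound growth: factor_per_period applied once every period_years."""
    return factor_per_period ** (years / period_years)

def adiabatic_energy_coefficient(c, v, r):
    """c_E = C V^2 * RC = C^2 V^2 R, in joule-seconds."""
    return c ** 2 * v ** 2 * r

YEARS = 2020 - 2004   # assumed span of the scenario

coefficient_reduction = growth_factor(16, YEARS)   # 16x decline per 3 years
layer_growth = growth_factor(2, YEARS)             # layers double per 3 years

print(f"energy coefficient falls by a factor of about {coefficient_reduction:.3g}")
print(f"affordable layer count grows by a factor of about {layer_growth:.3g}")

# Dissipation per operation is c_E times frequency (assumed example values).
c_e = adiabatic_energy_coefficient(c=1e-15, v=1.0, r=10e3)
print(f"c_E = {c_e:.3g} J*s, E_diss at 1 GHz = {c_e * 1e9:.3g} J")
```

Starting from a single layer, sixteen years of doubling every three years gives roughly 40 layers.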
Result of Scenario
A Potential Scenario for CMOS vs. Reversible Raw Affordable Chip Performance
[Figure: device-ops/second per affordable 100W chip (log scale, 1.00E+17 to 1.00E+23) against year (2004 to 2020), for CMOS and Reversible. CMOS annotation: e.g. 1 billion devices actively switching at 3.3 GHz, ~7,000 kT dissip. per device-op. Reversible annotation: 40 layers, ea. w. 8 billion active devices, freq. 180 GHz, 0.4 kT dissip. per device-op.]
Note that by 2020, there could be a factor of 20,000× difference in raw
performance per 100W package. (E.g., a 100× overhead factor from reversible
design could be absorbed while still showing a 200× boost in performance!)
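The two design points annotated on the chart can be checked against the 100 W budget with a short Python sketch. It assumes kT is evaluated at 300 K; the function name is hypothetical.

```python
K_B = 1.380649e-23   # Boltzmann's constant, J/K
T_ROOM = 300.0       # assumed operating temperature, K

def chip_stats(devices_per_layer, freq_hz, kt_per_op, layers=1):
    """Return (device-ops per second, dissipated power in watts)."""
    ops_per_sec = layers * devices_per_layer * freq_hz
    power_w = ops_per_sec * kt_per_op * K_B * T_ROOM
    return ops_per_sec, power_w

# CMOS point: 1 billion devices at 3.3 GHz, ~7,000 kT per device-op.
cmos_ops, cmos_power = chip_stats(1e9, 3.3e9, 7000)

# Reversible point: 40 layers of 8 billion devices at 180 GHz, 0.4 kT per device-op.
rev_ops, rev_power = chip_stats(8e9, 180e9, 0.4, layers=40)

print(f"CMOS:       {cmos_ops:.3g} device-ops/s at {cmos_power:.1f} W")
print(f"Reversible: {rev_ops:.3g} device-ops/s at {rev_power:.1f} W")
print(f"Raw performance ratio: {rev_ops / cmos_ops:.3g}")
```

Both points come out close to 100 W, and the ratio of raw device-ops/second is on the order of the 20,000× quoted above.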
Possible Cosmic (!) Implications of
Reversible Computing
• Astrophysicists Krauss and Starkman have argued that,
– even if we someday colonize the stars,
• The total energy we can ever harvest is finite!
– We can never reach galaxies beyond a certain distance,
• due to the accelerating expansion of the universe.
• Thus if we never create reversible computing,
– Then someday we must run out of energy! (Due to VNL.)
• And then, all computation (thus all life) will permanently cease.
• However, if we invent reversible computing,
– and if we can make it ever more energy-efficient over time,
• Then potentially, an infinite number of computations (thoughts?) can
be performed using only a finite supply of energy!
• ⇒ Reversible computing is needed to save the universe!
– If Krauss & Starkman’s basic arguments are correct.
Infinite Computation with Finite Energy
• Suppose we perform N operations using total energy E,
– And we then discover how to make computation twice as energy-efficient.
• Then, we perform N more operations with energy E/2,
– and then discover how to make computation twice as efficient again.
• Then, do N more operations using energy E/4,
– and so on forever (you get the picture)
$$\text{Total operations done} = \sum_{i=0}^{\infty} N = \infty \quad \text{(infinite)}$$
$$\text{Total energy used} = \sum_{i=0}^{\infty} \frac{E}{2^i} = 2E \quad \text{(finite)}$$
[Figure: bars of halving size labeled E, E/2, E/4, E/8, E/16.]
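The argument can be illustrated numerically with partial sums. In the Python sketch below, the values of N and E and the function name are arbitrary illustrative choices.

```python
def totals_after(rounds, n_ops, energy):
    """Operations done and energy used after the given number of rounds.

    Round i performs n_ops operations using energy / 2**i.
    """
    total_ops = rounds * n_ops
    total_energy = sum(energy / 2 ** i for i in range(rounds))
    return total_ops, total_energy

N = 10     # operations per round (arbitrary)
E = 1.0    # energy used in the first round (arbitrary units)

for rounds in (1, 2, 5, 10, 50):
    ops, used = totals_after(rounds, N, E)
    print(f"rounds = {rounds:>2}: operations = {ops:>3}, energy used = {used:.12f}")
```

The operation count grows without bound as rounds are added, while the energy used approaches 2E from below and never reaches it.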
Challenges that Must be Met for
Reversible Computing to Happen
• Need to design extremely efficient energy-recovering
power-clock resonators.
– With very high quality factor Q = Esig/Ediss.
• Requires very precise engineering, and very refined designs.
• Need to design novel logic devices with a very low
adiabatic energy coefficient cE = Ediss·top.
– And develop a cost-effective manufacturing process for
fabricating them in large numbers.
• Need to optimize reversible logic circuits,
architectures, and algorithms.
– To minimize the overheads of reversible operation.
• These tasks are all quite difficult!
Skills that the Inventors of Future
Reversible Computers will Need
• Very strong mathematics background:
– E.g., Linear (matrix) algebra, abstract algebra (e.g. group
theory), real & complex analysis, probability & statistics.
• Very strong grasp of fundamental physics:
– Mechanics, thermodynamics, electrodynamics, quantum
mechanics, condensed matter, quantum chemistry, relativity…
• Solid engineering knowledge & skills:
– Electrical engineering, solid-state devices, digital circuits,
digital logic design, information & communication theory,
computer architecture, systems engineering & optimization,
software engineering, algorithm design.
• As you can see, this task is not for the faint of heart!
– We must seek great breadth, depth, and quality of knowledge,
– and close cooperation with others in large collaborations.
Conclusions
• The evolution of conventional computing technology
is reaching a dead end…
– That is, a permanent limit on its practical performance, due
to power dissipation constraints.
• These limits might possibly be circumvented…
– But only by aggressively moving towards new energy-recovering, reversible digital logic technologies.
• Reversible computing is physically possible, according
to our best modern knowledge of quantum physics.
– But, achieving it will require extremely high-precision, high-quality engineering of nanoscale devices and systems,
• And many bright, inventive, creative people, working hard together.
• The future of our technology, our civilization, and
perhaps even all life in the universe, might just depend
on whether you and your peers choose to pursue the
goal of meeting the reversible computing challenge!