Requirements for Energy-Efficient Computing Beyond the von Neumann Limit
Michael P. Frank, http://www.eng.fsu.edu/~mpf
ECE Graduate Seminar, Thursday, October 20, 2005
Abstract of Talk
• Fundamental physics limits the performance of conventional computing technologies.
  – The energy efficiency of conventional machines will be forced to level off within roughly the next 10-20 years.
    • Practical computer performance must then plateau as well.
• However, all of the proven limits to computer energy efficiency can, in principle, be circumvented…
  – but only if computing undergoes a radical paradigm shift.
• The essential new paradigm: reversible computing.
  – It involves reusing energy to improve energy efficiency.
  – However, doing this well tightly constrains computer design at all levels, from devices through logic, architectures, and algorithms.
• In this talk, I review the stringent physical and logical requirements that must be met
  – if we wish to break through the near-term barriers
  – and approach the true physical limits of computing.

Moore's Law and Performance
[Chart: "Moore's Law - Transistors per Chip," devices per IC vs. year of introduction, 1970-2010, from the 4004 through the 8086, 286, 386, 486DX, Pentium, P2, P3, P4, and Madison Itanium 2; average increase of 57%/year.]
• Gordon Moore, 1975: devices per IC can be doubled every 18 months.
  – Borne out by history!
• Some fortuitous corollaries:
  – Every 3 years: devices become half as long.
  – Every 1.5 years: roughly half as much stored energy per bit!
• It is this that has enabled us to throw away bits (and their energies) twice as frequently every 1.5 years, at reasonable power levels,
  – and thereby to double processor performance every 1.5 years!
• Increased energy efficiency of computation is a prerequisite for improved raw performance,
  – given realistic levels of total power consumption.

Efficiency in General, and Energy Efficiency
• The efficiency η of any process is η = P/C,
  – where P = amount of some valued product produced,
  – and C = amount of some costly resource consumed.
• In energy efficiency ηe, the cost C measures energy.
• We can talk about the energy efficiency of:
  – A heat engine: ηhe = W/Q, where
    • W = work energy output, Q = heat energy input.
  – An energy-recovering process: ηer = Eend/Estart, where
    • Eend = available energy at the end of the process,
    • Estart = energy input at the start of the process.
  – A computer: ηec = Nops/Econs, where
    • Nops = useful operations performed,
    • Econs = free energy consumed.

ITRS '97-'03 Gate Energy Switching Trends
[Chart: "Trend of Minimum Transistor Energy Based on ITRS '97-'03 Roadmaps," CV²/2 gate energy in joules (fJ down to zJ) vs. year, 1995-2045, showing low-power and high-performance minimum gate energies at the 250, 180, 130, 90, 65, 45, 32, and 22 nm DRAM half-pitch nodes. Reference lines mark a practical limit for CMOS, the room-temperature 100 kT reliability limit, one electron volt, the room-temperature kT thermal energy, and the room-temperature von Neumann-Landauer limit, k(300 K) ln 2.]
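As a quick sanity check on the chart's reference lines, here is a minimal Python sketch that computes them for an assumed room temperature of 300 K. The CV²/2 line at the end uses illustrative values (1 fF at 1 V), not ITRS data.

    import math

    k_B = 1.380649e-23        # Boltzmann's constant, J/K
    T   = 300.0               # assumed room temperature, K
    eV  = 1.602176634e-19     # J per electron-volt

    vnl_limit = k_B * T * math.log(2)   # ~2.87e-21 J (von Neumann-Landauer limit)
    kT        = k_B * T                 # ~4.14e-21 J (room-T thermal energy)
    rel_limit = 100 * k_B * T           # ~4.14e-19 J (~100 kT reliability level)
    one_eV    = 1.0 * eV                # ~1.60e-19 J

    # Switching energy of a minimum-sized gate is roughly C*V^2/2; the values
    # below are illustrative assumptions only, not roadmap numbers.
    C, V = 1e-15, 1.0                   # 1 fF, 1 V (hypothetical)

    print(f"kT ln 2      = {vnl_limit:.2e} J")
    print(f"kT           = {kT:.2e} J")
    print(f"100 kT       = {rel_limit:.2e} J")
    print(f"1 eV         = {one_eV:.2e} J")
    print(f"CV^2/2 (ex.) = {0.5*C*V**2:.2e} J")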
Frank, "Approaching the Physical Limits of Computing" 5 Some Lower Bounds on Energy Dissipation • In today’s 90 nm VLSI technology, for minimal operations (e.g., conventional switching of a minimum-sized transistor): – Ediss,op is on the order of 1 fJ (femtojoule) ηec ≲ 1015 ops/sec/watt. • Will be a bit better in coming technologies (65 nm, maybe 45 nm) • But, conventional digital technologies are subject to several lower bounds on their energy dissipation Ediss,op for digital transitions (logic / storage / communication operations), – And thus, corresponding upper bounds on their energy efficiency. • Some of the known bounds include: – Leakage-based limit for high-performance field-effect transistors: • Maybe roughly ~5 aJ (attojoules) ηec ≲ 2×1017 operations/sec./watt – Reliability-based limit for all non-energy-recovering technologies: • On the order of 1 eV (electron-volt) ηec ≲ 6×1018 ops./sec/watt – von Neumann-Landauer (VNL) bound for all irreversible technologies: • Exactly kT ln 2 ≈ 18 meV ηec ≲ 3.5×1020 ops/sec/watt – For systems whose waste heat ultimately winds up in Earth’s atmosphere, » i.e., at temperature T ≈ Troom = 300 K. 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 6 Reliability Bound on Logic Signal Energies • Let Esig denote the logic signal energy, – The energy involved (transferred, manipulated) in the process of storing, transmitting, or transforming a bit’s worth of digital information. • But note that “involved” does not necessarily mean “dissipated!” • As a result of fundamental thermodynamic considerations, it is required that Esig ≲ kBTsig ln r (with quantum corrections that are small for large r) – Where kB is Boltzmann’s constant, 1.38×10−12 J/K; – and Tsig is the temperature in the degrees of freedom carrying the signal; – and r is the reliability factor, i.e., the improbability of error, 1/perr. • In non-energy-recovering logic technologies (totally dominant today) – Basically all of the signal energy is dissipated to heat on each operation. • And often additional energy (e.g., short-circuit power) as well. • In this case, minimum sustainable dissipation is Ediss,op ≳ kBTenv ln r, – Where Tenv is now the temperature of the waste-heat reservoir (environment) • Averages around 300 K (room temperature) in Earth’s atmosphere • For a decent r of e.g. 2×1017, this energy is on the order ~40 kT ≈ 1 eV. – Therefore, if we want energy efficiency ηec > ~1 op/eV, we must recover some of the signal energy for later reuse. • Rather than dissipating it all to heat with each manipulation of the signal. 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 7 The von Neumann-Landauer (VNL) Principle • First alluded to by John von Neumann in 1949. – Developed explicitly by Rolf Landauer of IBM in 1961. • The principle is a rigorous theorem of physics! – It follows from the reversibility of fundamental dynamics. • A correct statement of the principle is the following: – Any process that loses or obliviously erases 1 bit of known (correlated) information increases total entropy by at least ∆S = 1 bit = kB ln 2, and thus implies the eventual dissipation at least Ediss = kBTenv ln 2 of free energy to the environment as waste heat. • where kB = log e = 1.38×10−23 J/K is Boltzmann’s constant • and Tenv = temperature of the waste-heat reservoir (environment) – Not less than about room temperature, or 300 K for earthbound computers. implies Ediss ≥ 18 meV. 11/7/2015 M. 
Frank, "Approaching the Physical Limits of Computing" 8 Definition of Reversibility • What does it mean for a dynamical system (either continuous or discrete) to be (time-) reversible? – Let x(t) denote the state of the system at time t. • The universe, or any closed system of interest (e.g. a computer). – Let Ft→u(x) be the transition relation operating between a given two times t and u; i.e., x(u) = Ft→u[x(t)]. • Determined by the system’s dynamics (laws of physics, or a FSM). – Then the system is called “dynamically reversible” iff Ft→u is a one-to-one function, for any times (t, u) where u > t. • That is, t >u: ¬ x1x2: Ft→u(x1) = Ft→u(x2). – That is, no two distinct states would ever go to the same state over the course of a given time interval. – The definition implies determinism, if we also allow u < t. • A reversible system is deterministic in the reverse time direction. 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 9 Types of Dynamics • Nondeterministic, irreversible • Nondeterministic, reversible • Deterministic, irreversible • Deterministic, reversible 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" WE ARE HERE 10 Physics is Reversible! • All successful models of fundamental physics are expressible in the Hamiltonian formalism. – Including: Classical mechanics, quantum mechanics, special and general relativity, quantum field theories. • The latter two (GR & QFT) are backed up by enormous, overwhelming mountains of evidence confirming their predictions! – 11 decimal places of precision so far! And, no contradicting evidence. • In Hamiltonian systems, the dynamical state x(t) obeys a differential equation that’s first-order in time, dx/dt = g(x) (where g is some function) – This immediately implies determinism of the dynamics. • And, since the time differential dt can be taken to be negative, the formalism also implies reversibility! – Thus, dynamical reversibility is one of the most firmlyestablished, fundamental, inviolable facts of physics! 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 11 Illustration of VNL Principle • Either digital state is initially encoded by any of N possible physical microstates – Illustrated as 4 in this simple example (the real number would usually be much larger) – Initial entropy S = log[#microstates] = log 4 = 2 bits. • Reversibility of physics ensures “bit erasure” operation can’t possibly merge two microstates, so it must double the possible microstates in the digital state! – Entropy S = log[#microstates] increases by log 2 = 1 bit = (log e)(ln 2) = kB ln 2. – To prevent entropy from accumulating locally, it must be expelled into the environment. Microstates representing logical “0” Entropy S′ S == log 8 4= 3 2 bits 11/7/2015 Microstates representing logical “1” ∆S = S′ − S = 3 bits − 2 bits = 1 bit M. Frank, "Approaching the Physical Limits of Computing" [Play as slideshow to see animations] Entropy S= log 4 = 2 bits 12 Reversible Computing • The basic idea is simply this: – Don’t erase information when performing logic / storage / communication operations! • Instead, just reversibly (invertibly) transform it in place! • When reversible digital operations are implemented using well-designed energy-recovering circuitry, – This can result in local energy dissipation Ediss << Esig, • this has already been empirically demonstrated by many groups. – and even total energy dissipation Ediss << kT ln 2! 
Reversible Computing
• The basic idea is simply this:
  – Don't erase information when performing logic / storage / communication operations!
    • Instead, just reversibly (invertibly) transform it in place!
• When reversible digital operations are implemented using well-designed energy-recovering circuitry,
  – this can result in local energy dissipation Ediss << Esig,
    • as has already been empirically demonstrated by many groups,
  – and even total energy dissipation Ediss << kT ln 2!
    • This is easily shown in theory and simulations,
    • but we are not yet to the point of demonstrating such low levels of total dissipation empirically in a physical experiment.
      – Achieving this goal requires very careful design,
      – and verifying it requires very sensitive measurement equipment.

How Reversible Logic Avoids the von Neumann-Landauer Bound
• We arrange our logical manipulations to never attempt to merge two distinct digital states,
  – but only to reversibly transform them from one state to another!
• E.g., illustrated is a reversible operation cCLR (controlled CLR) acting on the logic states 00, 01, 10, 11.
  – It and its inverse cSET enable arbitrary logic!

A Few Highlights of Reversible Computing History
• Charles Bennett @ IBM, 1973-1989:
  – Reversible Turing machines & emulation algorithms.
    • Can emulate irreversible machines on reversible architectures,
    • but the emulation introduces some inefficiencies.
  – Models of chemical & Brownian-motion physical realizations.
• Fredkin and Toffoli's group @ MIT, late 1970s / early 1980s:
  – Reversible logic gates and networks (space/time diagrams).
  – Ballistic and adiabatic circuit implementation proposals.
• Groups @ Caltech, ISI, Amherst, Xerox, MIT, '85-'95:
  – Concepts for & implementations of adiabatic circuits in VLSI technology.
  – Small explosion of adiabatic circuit literature since then!
• Mid-1990s to today:
  – Better understanding of overheads, tradeoffs, asymptotic scaling.
  – A few groups begin development of post-CMOS implementations,
    • most notably the Quantum-dot Cellular Automata group at Notre Dame.

Caveat #1
• Technically, avoiding the VNL bound doesn't actually require that the digital operation be reversible at the level of the logical states…
  – It can be logically irreversible if the information in the digital state is already entropy!
    • In the example below, the non-digital entropy doesn't change, because the operation is also nondeterministic (N-to-N), and the transition relation between logical states has semi-detailed balance, so the entropy in the digital state remains constant.
  – However, such operations just re-randomize bits that are already random!
    • It's not clear whether this kind of operation is computationally useful.
[Diagram: a digital bit with unknown value (0 or 1) undergoing physical dynamics whose precise details may be uncertain.]

Caveat #2
• Operations that are logically N-to-1 can be used, if there are sufficient compensating 1-to-N (nondeterministic) logical operations.
  – All that is really required is that the logical dynamics be 1-to-1 in the long-term average.
• Thus, it's possible to thermally generate random bits and discard them later when we are through with them,
  – while maintaining overall thermodynamic reversibility.
• This ability is useful for probabilistic (randomized) algorithms.
[Diagram: logic 0 and logic 1 states being merged and re-split.]
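A small sketch of the entropy bookkeeping behind Caveat #1, under the simplifying assumption that the digital bit is already uniformly random: an N-to-N transition with semi-detailed balance (modeled here as a doubly stochastic map) leaves the digital entropy unchanged, whereas an N-to-1 erasure removes it, and that removed entropy must then show up in the environment.

    import math

    def shannon_bits(dist):
        return -sum(p * math.log2(p) for p in dist if p > 0)

    def apply(matrix, dist):
        # new[j] = sum_i dist[i] * matrix[i][j]
        return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
                for j in range(len(matrix[0]))]

    uniform = [0.5, 0.5]                 # digital bit that is already pure entropy

    # "Re-randomize" map: doubly stochastic (rows and columns each sum to 1)
    rerandomize = [[0.5, 0.5],
                   [0.5, 0.5]]
    # Deterministic erase-to-0 map: both logical states go to state 0 (N-to-1)
    erase = [[1.0, 0.0],
             [1.0, 0.0]]

    print(shannon_bits(apply(rerandomize, uniform)))  # 1.0 bit: digital entropy kept
    print(shannon_bits(apply(erase, uniform)))        # 0.0 bits: 1 bit must be exported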
Frank, "Approaching the Physical Limits of Computing" 17 Reversibility and Reliability • A widespread myth: “Future low-level digital devices will necessarily be highly unreliable.” – This comes from a flawed line of reasoning: • Faster more energy efficient lower bit energies high rate of bit errors from thermal noise – However, this scaling strategy doesn’t work, because: • High rate of thermal errors high power dissipation from error correction less energy efficient ultimately slower! • But in contrast, using reversible computing, we can achieve arbitrarily high energy efficiency while also maintaining arbitrarily high reliability! – The key is to keep bit energies reasonably high! • While recovering most of the bit energy… 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 18 Minimizing Energy Dissipation Due to Thermal Errors • Let perr = 1/r be the bit-error probability per operation. – Where r quantifies the “reliability level.” – And pok = 1 − perr is the probability the bit is correct • The necessary entropy increase ∆S per op due to error occurrence is given by the (binary) Shannon entropy of the bit-value after the operation: H(perr) = perr log perr-1 + pok log pok-1. • For r >> 1 (i.e., as r → ∞), this increase approaches 0: ∆S = H(perr) ≈ perr log perr-1 = (log r)/r → 0 • Thus, the required energy dissipation per op also approaches 0: Ediss = T∆S ≈ (kT ln r)/r → 0 • Could get the same result by assuming the signal energy Esig = kT ln r required for reliability level r is dissipated each time an error occurs: Ediss = perrEsig = perr(kT ln r) = (kT ln r)/r → 0 as r → ∞. • Further, note that as r → ∞, the required signal energy grows only very slowly… – Specifically, only logarithmically in the reliability, i.e., Esig = Θ(log r). 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 19 Device-Level Requirements for Reversible Computing • A good reversible device technology should have: – Low manufacturing cost ¢d per device • Important for good overall (system-level) cost-efficiency – Low rate of static power dissipation Pleak due to energy leakage. • Required for energy-efficient storage especially (but also in logic) – Low energy coefficient cE = Ediss/f (energy dissipated per operation, per unit transition frequency) for adiabatic transitions. • Implies we can achieve a high operating frequency (and thus good costperformance) at a given level of energy efficiency. – High maximum available transition frequency fmax. • Important for those applications in which the latency of serial threads of computation dominates total cost • Important: For system-level energy efficiency, Pleak and cE must be taken as effective global values measuring the implied amount of energy emitted into the outside environment at temperature Tenv. – With an ideal (Carnot) refrigerator, Pleak = StTenv and cE = cSTenv, • Where St = the static rate of leakage entropy generation per unit time, • and cS = Sgen/f adiabatic entropy coefficient, or entropy generated per unit transition frequency. 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 20 Early Chemical Implementations • How to physically implement reversible logic? – Bennett’s original inspiration: DNA polymerization! • Reversible copying of a DNA strand – Molecular basis of cell division / organism reproduction • This (and all) chemical reactions are reversible… – Direction (forward vs. 
backward) & reaction rate depends on relative concentrations of reagent and product species affect free energy • Energy dissipated per step turns out to be proportional to speed. – Implies process is characterized by an energy-time constant. » I call this the “energy coefficient” cEt ≡ Ediss,optop = Ediss,op/fop. • For DNA, typical figures are 40 kT ≈ 1eV @ ~1,000 bp/s – Thus, the energy coefficient cE is about 1 eV/kHz. • Can we achieve better energy coefficients? – Yes, in fact, we had already beat DNA’s cE in reversible CMOS VLSI technology available circa 1995! 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 21 Energy & Entropy Coefficients Q in Electronics R • For a transition involving the adiabatic transfer of an amount Q of charge along a path with resistance R: – The raw (local) energy coefficient is given by cEt = Edisst = Pdisst2 = IVt2 = I2Rt2 = Q2R. • Where V is the voltage drop along the path. – The entropy coefficient cSt = Q2R/Tpath. • where Tpath is the local thermodynamic temperature in the path. – The effective (global) energy coefficient is cEt,eff = Q2R(Tenv/Tpath). • We pay a penalty for low-T operation! 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 22 Example of Electronic cEt • In a fairly recent (180 nm) CMOS VLSI technology: – Energy stored per min. sized transistor gate: ~1 fJ @ 2V • Corresponds to charge per gate of Q = 1 fC ≈ 6,000 electrons – Resistance per turned-on min-sized nFET of ~14 kΩ • Order of the quantum resistance R = R0 = 1/G0 = h/2q2 = 12.9 kΩ – Ideal energy coefficient for a single-gate transition ~1.4×10−26 J/Hz • Or in more convenient units, ~80 eV/GHz = 0.08 eV/MHz! – with some expected overheads for a simple test circuit, calculated energy coefficient comes out to about 8× higher, or ~10−25 J·s • Or ~600 eV/GHz = 0.6 eV/MHz. – Detailed Cadence simulations gave us, per transistor: • @ 1 GHz: P = 20 μW, E = 20 fJ = 1.2 keV, so Ec = 1.2 eV/MHz • @ 1 MHz: P = 0.35 pW, E = 3.5 aJ = 2.2 eV, so Ec = 2.1 eV/MHz 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 23 Cadence Simulation Results Power vs. freq., TSMC 0.18, Std. CMOS vs. 2LAL • 2LAL = Two-level adiabatic logic 1.E-05 – in a shift register. • 1.E-07 Standard CMOS 1.E-09 1.E-10 1.E-11 1.E-12 1.E-13 1.E-14 1.E+09 1.E+08 1.E+07 1.E+06 1.E+05 1.E+04 1.E+03 11/7/2015 Energy dissipated per nFET per cycle Average power dissipation per nFET, W 1.E-06 1.E-08 Graph shows power dissipation vs. frequency At moderate frequencies (1 MHz), – Reversible uses < 1/100th the power of irreversible! • At ultra-low power (1 pW/transistor) – Reversible is 100× faster than irreversible! • Minimum energy dissipation < 1 eV! – 500× lower than best irreversible! • 500× higher computational energy efficiency! • Energy transferred is still ~10 fJ (~100 keV) – So, energy recovery efficiency is 99.999%! Frequency, Hz M. Frank, "Approaching the Physical Limits of Computing" • Not including losses in power supply 24 A Useful Two-Bit Primitive: Controlled-SET or cSET(a,b) • Semantics: If a=1, then set b:=1. a 0 0 1 – Conditionally reversible, if the special precondition ab=0 is met. • Note it’s 1-to-1 on the subset of states used – Sufficient to avoid Landauer’s principle! • We can implement cSET in dual-rail CMOS with a pair of transmission gates a • This 2-bit semi-reversible operation & its inverse cCLR are universal for reversible (and irreversible) logic! – If we compose them in special ways. • And include latches for sequential logic. 11/7/2015 M. 
Frank, "Approaching the Physical Limits of Computing" a’ b’ 0 0 0 1 1 1 drive (0→1) – Each needs just 2 transistors, • plus one controlling “drive” signal b 0 1 0 switch (T-gate) b a b 25 Reversible OR (rOR) from cSET • Semantics: rOR(a,b) ::= if a|b, c:=1. – Set c:=1, on the condition that either a or b is 1. • Reversible under precondition that initially a|b → ~c. • Two parallel cSETs simultaneously Hardware diagram driving a shared output line a implement the rOR operation! c – This type of gate composition was not traditionally considered. • Similarly one can do rAND, and reversible versions of all operations. – Logic synthesis with these is extremely straightforward… 11/7/2015 b Spacetime diagram a’ a c 0 b M. Frank, "Approaching the Physical Limits of Computing" a OR b c’ b’ 26 CMOS Gate Implementing rLatch / rUnLatch • Symmetric Reversible Latch Implementation Icon Spacetime Diagram crLatch connect in 2 in mem mem crUnLatch in or connect in mem mem (in) • The hardware is just a CMOS transmission gate again • This time controlled by a clock, with the data signal driving • Concise, symmetric hardware icon – Just a short orthogonal line • Thin strapping lines denote connection in spacetime diagram. 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 27 Example: Building cNOT from rlXOR • rlXOR(a,b,c): Reversible latched XOR. – Semantics: c := ab. • Reversible under precondition that c is initially clear. • cNOT(a,b): Controlled-NOT operation. – Semantics: b := ab. (No preconditions.) • A classic “primitive” operation in reversible & quantum computing – But, it turns out to be fairly complex to implement cNOT in available fully adiabatic hardware technologies… • Thus, it’s really not a very good building block for practical reversible hardware designs! – Of course, we can still build it, if we really want to. • Since, as I said, our gate set is universal for reversible logic 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 28 cNOT from rlXOR: Hardware Diagram • A logic block providing an in-place cNOT operation (a cNOT “gate”) can be constructed from 2 rlXOR gates and two latched buffers. A B Reversible latches X • The key is: – Operate some of the gates in reverse! 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 29 Θ(log n)-time carry-skip adder (8 bit segment shown) 3rd carry tick 4th carry tick S AB G S AB Cin P Pms G S AB GCoutCin G P S AB P Gls Pls MS Pms GCout P S AB GCoutCin Cin G P Gls LS G Gls S AB P Pls Pms G Cin S AB GCoutCin Cin P Pms With this structure, we can do a 2n-bit add in 2(n+1) logic levels → 4(n+1) reversible ticks → n+1 clock cycles. 2nd carry tick G P Gls Pls MS Pms GCout Gls GCout LS P P Gls LS Pls Cin Pls Cin P Pms Gls GCout LS Hardware overhead is < 2× regular ripple-carry! P Pms MS GCoutCin P P Pls S AB Cin Pls Cin Spacetime overhead only ~2(n+1)× a conventional single-cycle equivalent. P 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 30 32-bit Adder Simulation Results 32-bit adder power vs. frequency 32-bit adder energy vs. frequency 1.E-04 1.E-11 Energy/Add (J) 1.E-05 Power (W) 1.E-06 1.E-07 1.E-12 1V CMOS 0.5V CMOS 1.E-13 1.E-14 1.E-08 Adia. enrgy 20x better perf. @ 3 nW/adder CMOS pwr 1.E-09 CMOS energy 1.E-15 1.E+08 Adia. pwr 1.E+07 1.E+06 1.E+05 1.E+04 Add Frequency (Hz) 1.E-10 1.E+08 1.E+07 1.E+06 1.E+05 Add Frequency (Hz) 11/7/2015 1.E+04 (All results here are normalized to a throughput level of 1 add/cycle) M. 
Frank, "Approaching the Physical Limits of Computing" 31 Technological Challenges • Fundamental theoretical challenges: – Find more efficient reversible algorithms • Or, prove rigorous lower bounds on complexity overheads – Study fundamental physical limits of reversible computing • Implementation challenges: – Design new devices with lower energy coefficients cEt – Design high-quality resonators for driving transitions – Empirically demonstrate large system-level power savings • Application development challenges: – Find a plausible near- to medium-term “killer app” for RC • Something that’s very valuable, and can’t be done without it – Build a prototype RC-based solution prototype 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 32 Power vs. freq., alt. device techs. Plenty of Room for Device Improvement Power per device, vs. frequency 1.E-03 1.E-04 1.E-05 1.E-06 • Recall, irreversible device technology has at most ~34 orders of magnitude of power-performance improvements remaining. 1.E-07 1.E-08 1.E-09 1.E-10 1.E-11 1.E-12 1.E-13 1.E-15 – And then, the firm kT ln 2 (VNL) limit is encountered. 1.E-16 1.E-17 1.E-18 • But, a wide variety of proposed reversible device technologies have been analyzed by physicists. 1.E-19 1.E-20 1.E-21 .18um 2LAL nSQUID QCA cell Quantum FET Rod logic Param. quantron Helical logic .18um CMOS kT ln 2 – With theoretical powerperformance up to 10-12 orders of magnitude better than today’s CMOS! • Ultimate limits are unclear. 11/7/2015 Power per device (W) 1.E-14 1.E+12 1.E+11 1.E+10 1.E+09 1.E-22 1.E-23 1.E-24 Various reversible device proposals 1.E-25 1.E-26 1.E-27 1.E-28 1.E-29 1.E-30 1.E+08 1.E+07 Frequency (Hz) M. Frank, "Approaching the Physical Limits of Computing" 1.E+06 1.E+05 1.E+04 1.E-31 1.E+03 33 Limiting Cases of Energy/Entropy Coefficients • Entropy/entropy coefficients in adiabatic “single electronics:” – Suppose the amount of charge moved |Q| = q (a single electron) – Let the path consist of a single quantum channel (chain of states) • Has quantum resistance R = R0 = 1/G0 = h/2q2 = 12.9 kΩ. – Then cE = h/2 = 2.07 meV/THz (very low!) • If path is at Tpath = Troom = 300 K, then cS = 0.08 k/THz. – For N× better efficiency than this, let the path consist of N parallel quantum channels. N× lower resistance. • What about systems where resistive models may not apply? – E.g., superconductors, photonics, etc. • A more general and rigorous (but perhaps loose) lower bound on the energy coefficient in all adiabatic quantum systems is given by the expression cE ≥ h2/4Egt, – where Eg = energy gap between ground & excited states, – and t = time taken for a single orthogonalizing transition – Ex.: Let Eg = 1 eV, t = 1 ps. Then cE ≥ 4.28 μeV/THz. 11/7/2015 M. Frank, "Approaching the Physical Limits of Computing" 34 Requirements for EnergyRecovering Clock/Power Supplies • All known reversible computing schemes require a periodic global signal that synchronizes and drives adiabatic transitions. – For good system-level energy efficiency, this signal must oscillate resonantly and near-ballistically, with a high effective quality factor. 
Requirements for Energy-Recovering Clock/Power Supplies
• All known reversible computing schemes require a periodic global signal that synchronizes and drives adiabatic transitions.
  – For good system-level energy efficiency, this signal must oscillate resonantly and near-ballistically, with a high effective quality factor.
• Several factors make the design of a satisfactory resonator quite difficult:
  – the need to avoid uncompensated back-action of the logic on the resonator;
  – in some resonators, the Q factor may scale unfavorably with size;
  – the effective quality factor problem.
• There's no reason to think that it's impossible to do…
  – but it is definitely a nontrivial hurdle that we need to face up to, pretty urgently,
    • if we want to make reversible computing practical in time to avoid an extended period of stagnation in computer performance growth.

The Back-Action Problem
• The ideal resonator signal is a pure periodic signal.
  – A pretty general result from communications theory:
    • a resonator's quality factor is inversely proportional to its signal bandwidth B.
      – E.g., for an EM cavity with resonant frequency ω0, the half-maximum bandwidth is B = ∆ω = ω0/(2πQ) [1].
      – Thus Q → ∞ ⇒ B → 0.
    • There must be little or no information in the resonator signal!
• However, if the logic load being driven varies from one cycle to the next,
  – whether due to data-dependent variations,
  – or structural variations (different amounts of logic being driven per cycle),
  – this will tend to produce impedance nonuniformities, which will lead to nonuniform reflections of the resonator signal,
    • and thereby introduce nonzero bandwidth into that signal.
• Even more generally, any departure of resonator energy away from its ideal desired trajectory represents a form of effective energy dissipation!
  – We must control exactly where (into what states) all of the energy goes!
    • The set of possible microstates of the system must not grow quickly.
[1] Schwartz, Principles of Electrodynamics, Dover, 1972.

Unfavorable Scaling of Resonator Quality Factor with Size?
• I don't yet have a perfectly clear and general understanding of this issue, but…
  – In a lot of the oscillating systems I've looked at, the resonant Q factor tends to get worse (or at least, not very much better) as the resonator dimensions get smaller.
    • E.g., in LC oscillators, inductor Q scales inversely with frequency:
      – EM emission is greater at high frequencies,
      – but the tendency is for low f ⇒ large coil sizes, not small!
    • Anecdotal reports from people working in the NEMS community:
      – it can be difficult to get high Q in nanoscale electromechanical resonators,
        • perhaps due to the present difficulty of precision engineering at the nanoscale.
    • Our own experience working with transmission-line resonators.
• Example: in a cubical EM cavity of edge length L,
  – we have 2πQ = L/8δ, where δ = skin depth ([1] again).
    • Skin depth δ = (2πσk)^(-1/2), where σ = wall conductivity and k = wave number.
  – So if L is fixed: high Q ⇒ small δ ⇒ large k ⇒ high f ⇒ low Q in the logic!

The Effective Quality Factor Problem
• The actual quality factor of a resonator is Q = Eres/Edissr,
  – where Eres = energy contained in the resonator signal,
  – and Edissr = energy dissipated in the resonator per cycle.
• But the effective quality factor, for purposes of doing energy-efficient logic transitions, is Qeff = Edeliv/Edissr,
  – where Edeliv = energy delivered to the logic per transition,
    • since 1/Qeff of the logic signal energy is dissipated per cycle.
• Thus, Qeff = Q · (Edeliv/Eres).
  – That is, the effective Q is reduced by the fraction of resonator energy delivered to the logic per cycle.
• If a resonator needs to be large to attain high Q,
  – it may also hold a large amount of energy Eres,
  – and so it may not have a very high effective Q for driving the logic!
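A toy illustration of the effective-Q bookkeeping above; the specific resonator numbers are hypothetical, chosen only to show how a large, high-Q resonator can still have a modest effective Q.

    def effective_Q(Q, E_res, E_deliv):
        # Q_eff = Q * (E_deliv / E_res): only the delivered fraction counts
        return Q * (E_deliv / E_res)

    # Hypothetical example: raw Q = 10,000, the resonator stores 1 nJ total,
    # but delivers only 10 pJ per cycle to the logic it drives.
    Q, E_res, E_deliv = 1e4, 1e-9, 1e-11
    Q_eff = effective_Q(Q, E_res, E_deliv)
    print(Q_eff)            # 100.0

    # Per the slide, 1/Q_eff of the logic signal energy is dissipated per cycle,
    # so the energy-recovery efficiency seen by the logic is about 1 - 1/Q_eff:
    print(1 - 1 / Q_eff)    # 0.99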
Trapezoidal Resonator Concept (patent pending, University of Florida)
[Diagram: a moving metal plate on a support arm/electrode, with its arm anchored to the nodal points of fixed-fixed beam flexures located a little way away in both directions (for symmetry); phase-0° and phase-180° electrodes whose capacitance C(θ) varies over the 0°-360° cycle of the plate's range of motion; the interdigitated structure repeats arbitrarily many times along the y axis, all anchored to the same flexure.]

Previous CMOS-MEMS Resonators (in the post-CMOS DRIE process in use at UF)
[Photos: front-side and back-side views of 150 kHz resonators with serpentine spring, proof mass, and comb drive.]

Resonator Schematic (patent pending, University of Florida)
[Schematic: actuator and sensor combs with capacitances Ca, Cs, and Cr, bias voltages Vb, Vc, and Vp, and AC drive signal vac.]

Post-TSMC35 AdiaMEMS Resonator (patent pending, University of Florida)
[Coventorware model, taped out April '04: drive comb, sense comb, and flex arm.]

Quasi-Trapezoidal MEMS Resonator: First Fabbed Prototype (patent pending, University of Florida)
• The post-etch process is still being fine-tuned.
  – Parts are not yet ready for testing…
[Micrograph: primary flexure (fin), sense comb, and drive comb.]

Conclusions
• Reversible computing will become necessary within our lifetimes,
  – if we wish to continue progress in computing performance per unit power beyond the next 1-2 decades.
• Much progress in our understanding of RC has been made in the past three decades…
  – but much important work still remains to be done.
• I encourage my audience to join the community of researchers who are working to address the reversible computing challenge.