CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing
What have we done so far?
Theoretical computer science: Abstract models of computing
Turing machines, computability, time and space complexity
Physical Instantiations
1. Digital Computing
Silicon switches manipulate binary variables with near-zero
error
2. DNA computing
Massive parallelism and biochemical properties of organic
molecules allow fast solutions to hard search problems
3. Neural Computing
Distributed networks of neurons compute fast, parallel,
adaptive, and fault-tolerant solutions to hard pattern
recognition and motor control problems
Overview of Today’s Lecture
Information theory and Kolmogorov Complexity
What is information?
Definition based on probability theory
Error-correcting codes and compression
An algorithmic definition of information (Kolmogorov complexity)
Thermodynamics
The physics of computation
Relation to information theory
Energy requirements for computing
Reversible Computing
Computing without energy consumption?
Biological example
Reversible logic gates
Quantum computing (next week!)
Information and Algorithmic Complexity
3 principal results:
Shannon’s source-coding theorem
The main theorem of information content
A measure of the number of bits needed to specify the expected
outcome of an experiment
Shannon’s noisy-channel coding theorem
Describes how much information we can transmit over a channel
A strict bound on information transfer
Kolmogorov complexity
Measures the algorithmic information content of a string
An uncomputable function
What is information?
First try at a definition…
Suppose you have stored n different bookmarks on your web
browser.
What is the minimum number of bits you need to store these
as binary numbers?
Let I be the minimum number of bits needed. Then,
2^I >= n, so I >= log2 n
So, the “information” contained in your collection of n
bookmarks is I0 = log2 n
Deterministic information I0
Consider a set of alternatives: X = {a1, a2, a3, …aK}
When the outcome is a3, we say x = a3
I0(X) is the amount of information needed to specify the
outcome of X
I0(X) = log2 |X|
We will assume base 2 from now on (unless stated otherwise)
Units are bits (binary digits)
Relationship between bits and binary digits
B = {0, 1}
X = B^M = set of all binary strings of length M
I0(X) = log |B^M| = log 2^M = M bits
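As a quick sanity check of this definition, a minimal Python sketch (the value n = 37 and the function name are illustrative, not from the lecture):

import math

def min_bits(n):
    """Minimum number of binary digits needed to give each of n alternatives a distinct label."""
    return math.ceil(math.log2(n))

# Example: 37 bookmarks need ceil(log2 37) = 6 bits, and indeed 2^6 = 64 >= 37.
n = 37
print(min_bits(n), 2 ** min_bits(n) >= n)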
Is this definition satisfactory?
Appeal to your intuition…
Which of these two messages contains more “information”?
“Dog bites man”
or
“Man bites dog”
Is this definition satisfactory?
Appeal to your intuition…
Which of these two messages contains more “information”?
“Dog bites man”
or
“Man bites dog”
Same number of bits to represent each message!
But, it seems like the second message contains a lot more
information than the first. Why?
Enter probability theory…
Surprising events (unexpected messages) contain more
information than ordinary or expected events
“Dog bites man” occurs much more frequently than “Man bites dog”
Messages about less frequent events carry more information
So, information about an event varies inversely with the
probability of that event
But, we also want information to be additive
If message xy contains sub-parts x and y, we want:
I(xy) = I(x) + I(y)
Use the logarithm function: log(xy) = log(x) + log(y)
New Definition of Information
Define the information contained in a message x in terms of
log of the inverse probability of that message:
I(x) = log(1/P(x)) = - log P(x)
First defined rigorously and studied by Shannon (1948)
“A mathematical theory of communication” – electronic handout
(PDF file) on class website.
Our previous definition is a special case:
Suppose you had n equally likely items (e.g. bookmarks)
For any item x, P(x) = 1/n
I(x) = log(1/P(x)) = log n
Same as before (minimum number of bits needed to store n items)
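A minimal Python sketch of this definition; the two headline probabilities below are made-up values for illustration only:

import math

def self_information(p):
    """I(x) = log2(1/P(x)), in bits."""
    return -math.log2(p)

# Illustrative (assumed) probabilities for the two headlines:
print(self_information(1e-2))   # "Dog bites man": about 6.6 bits
print(self_information(1e-6))   # "Man bites dog": about 19.9 bits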
Review: Axioms of probability theory
Kolmogorov, 1933
P(a) >= 0, where a is an event
P(Ω) = 1, where Ω is the certain event
P(a + b) = P(a) + P(b), where a and b are mutually exclusive
Kolmogorov (axiomatic) definition is computable
Probability theory forms the basis for information theory
Classical definition based on event frequencies (Bernoulli) is
uncomputable:
P(a) = lim (n → ∞) n_a / n
Review: Results from probability theory
Joint probability of two events a and b: P(ab)
Independence
Events a and b are independent if P(ab) = P(a)P(b)
Conditional probability: P(a|b) = probability that event a
happens given that b has happened
P(a|b) = P(ab)/P(b)
P(b|a) = P(ba)/P(a) = P(ab)/P(a)
We just proved Bayes’ Theorem: P(a|b) = P(b|a) P(a) / P(b)
P(a) is called the a priori probability of a
P(a|b) is called the a posteriori probability of a
Summary: Postulates of information theory
1. Information is defined in the context of a set of alternatives.
The amount of information quantifies the number of bits
needed to specify an outcome from the alternatives
2. The amount of information is independent of the semantics
(only depends on probability)
3. Information is always positive
4. Information is measured on a logarithmic scale
Probabilities are multiplicative, but information is
additive
In-Class Example
Message y contains duplicates: y = xx
Message x has probability P(x)
What is the information content of y?
Is I(y) = 2 I(x)?
In-Class Example
Message y contains duplicates: y = xx
Message x has probability P(x)
What is the information content of y?
Is I(y) = 2 I(x)?
I(y) = log(1/P(xx)) = log[1/(P(x|x)P(x))]
= log(1/P(x|x)) + log(1/P(x))
= 0 + log(1/P(x))     [since P(x|x) = 1]
= I(x)
Duplicates convey no additional information!
Definition: Entropy
The average self-information or entropy of an ensemble X=
{a1, a2, a3, …aK}
H(X) = E[ log(1/P(x)) ] = Σ (k = 1..K) P(ak) log(1/P(ak))
E[·] denotes the expected (or average) value
Properties of Entropy
0 <= H(X) <= I0(X)
Equals I0(X) = log |X| if all the ak’s are equally probable
Equals 0 if only one ak is possible
Consider the case where K = 2
X = {a1, a2}
P(a1) = p; P(a2) = 1 – p
H(X) = p log(1/p) + (1 – p) log(1/(1 – p))
Examples
Entropy is a measure of randomness of the source producing
the events
Example 1 : Coin toss: Heads or tails with equal probability
H = -(½ log ½ + ½ log ½) = -(½ (-1) + ½ (-1)) = 1 bit per coin toss
Example 2 : P(heads) = ¾ and P(tails) = ¼
H = -(¾ log ¾ + ¼ log ¼) = 0.811 bits per coin toss
As things get less random, entropy decreases
Redundancy and regularity increase
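A minimal Python sketch reproducing the two coin-toss entropies above (the helper name entropy is my own):

import math

def entropy(probs):
    """H = sum_k p_k * log2(1/p_k), in bits; terms with p = 0 contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin: 1.0 bit per toss
print(entropy([0.75, 0.25]))  # biased coin: ~0.811 bits per toss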
Question
If we have N different symbols, we can encode them in
log(N) bits. Example: English - 26 letters need ceil(log2 26) = 5 bits
So, over many, many messages, the average cost/symbol is
still 5 bits.
But, letters occur with very different probabilities! “A” and
“E” much more common than “X” and “Q”. The log(N)
estimate assumes equal probabilities.
Question: Can we encode symbols based on probabilities so
that the average cost/symbol is minimized?
Shannon’s noiseless source-coding theorem
Also called the fundamental theorem. In words:
You can compress N independent, identically distributed (i.i.d.)
random variables, each with entropy H, down to NH bits with
negligible loss of information (as N → ∞)
If you compress them into fewer than NH bits you will dramatically
lose information
The theorem:
Let X be an ensemble with H(X) = H bits. Let Hδ(X) be the entropy
of an encoding of X with allowable probability of error δ
Given any ε > 0 and 0 < δ < 1, there exists a positive integer N0 such
that, for N > N0,
| (1/N) Hδ(X^N) – H | < ε
Comments on the theorem
What do the two inequalities tell us?
(1/N) Hδ(X^N) <= H + ε
The number of bits (1/N) Hδ(X^N) that we need to specify outcomes x^N
with vanishingly small error probability δ does not exceed H + ε
If we accept a vanishingly small error, the number of bits we need
to specify x^N drops to N(H + ε)
H – ε <= (1/N) Hδ(X^N)
The number of bits (1/N) Hδ(X^N) that we need to specify outcomes x^N,
even with a large allowable error probability δ, is at least H – ε
Source coding (data compression)
Question: How do we compress the outcomes X^N?
With vanishingly small probability of error
How do we encode the elements of X such that the number of bits we
need to encode X^N drops to N(H + ε)?
Symbol coding: Given x = a3 a2 a7 … a5
Generate codeword c(x) = 01 1010 00
Want I0(c(x)) ≈ H(X)
Well-known coding examples
Zip, gzip, compress, etc.
The performance of these algorithms is, in general, poor when
compared to the Shannon limit
Source-coding definitions
A code is a function c: X → B+
B = {0, 1}
B+ is the set of finite strings over B
B+ = {0, 1, 00, 01, 10, 11, 000, 001, …}
c(x) = c(x1) c(x2) c(x3) … c(xN)
A code is uniquely decodable (UD) iff
its extension c: X+ → B+ is one-to-one
A code is instantaneous iff
no codeword is the prefix of another
c(x1) is not a prefix of c(x2) for x1 ≠ x2
Huffman coding
Given X = {a1, a2, …aK}, with associated probabilities P(ak)
Given a code with codeword lengths n1, n2, …nK
The expected code length is n̄ = Σ (k = 1..K) Pk nk
No instantaneous, UD code can achieve a smaller n̄ than a
Huffman code:
n̄ = Σ (k = 1..K) Pk nk <= H(X) + 1
Constructing a Huffman code
Feynman example: Encoding an alphabet
Code is instantaneous and UD: 00100001101010 = ANOTHER
Code achieves close to Shannon limit
H(X) = 2.06 bits; n = 2.13 bits
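Feynman's alphabet and figure are not reproduced here; the following is a minimal Python sketch of the greedy Huffman construction over an illustrative five-symbol alphabet (symbols and probabilities are assumptions, not Feynman's):

import heapq

def huffman_code(probs):
    """Build a Huffman code. probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # merge the two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Expected length sum(p * len(codeword)) lands within 1 bit of H(X), as the bound above says.
probs = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}
code = huffman_code(probs)
print(code, sum(probs[s] * len(code[s]) for s in probs))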
Information channels
Input x → channel → Output y
H(X) is the entropy of the input ensemble X
I(X;Y) is the average mutual information between X and Y:
what we know about X given Y
I(X;Y) = H(X) – H(X|Y)
       = Σ (x,y) P(xy) log [ P(xy) / (P(x) P(y)) ]
       = H(Y) – H(Y|X)
Definition: Channel capacity
The information capacity of a channel is: C = max[I(X;Y)]
The channel may add noise
Corrupting our symbols
Example: Channel capacity
Problem: A binary source sends equiprobable messages in a
time T, using the alphabet {0, 1} with a symbol rate R. As a
result of noise, a “0” may be mistaken for a “1”, and a “1” for
a “0”, both with probability q. What is the channel capacity C?
Channel is discrete
and memoryless
Example: Channel capacity (con’t)
Assume no noise (no errors)
T is the time to send the string, R is the rate
The number of possible message strings is 2^RT
The maximum entropy of the source is H0 = log(2^RT) = RT bits
The source rate is (1/T) Ho = R bits per second
The entropy of the noise (per transmitted bit) is
Hn = qlog[1/q] + (1–q)log[1/(1–q)]
The channel capacity C (bits/sec) = R – RHn = R(1 – Hn)
C is always less than R (a fixed fraction of R)!
We must add code bits to correct the received message
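A minimal Python sketch of this capacity calculation for the binary symmetric channel; the rate R and crossover probability q below are illustrative values, not from the lecture:

import math

def h2(q):
    """Entropy of the noise per transmitted bit: Hn = q*log2(1/q) + (1-q)*log2(1/(1-q))."""
    if q in (0.0, 1.0):
        return 0.0
    return q * math.log2(1.0 / q) + (1.0 - q) * math.log2(1.0 / (1.0 - q))

R = 1000.0   # symbols per second (assumed)
q = 0.1      # probability a bit is flipped (assumed)
C = R * (1.0 - h2(q))
print(h2(q), C)   # Hn ~ 0.469 bits, so C ~ 531 bits/sec, a fixed fraction of R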
How many code bits must we add?
We want to send a message string of length M
We add code bits to M, thereby increasing its length to Mc
How are M, Mc, and q related?
M = Mc(1 – Hn)
Intuitively, from our example
Also see pgs. 106 – 110 of Feynman
Note: this is an asymptotic limit
May require a huge Mc
Shannon’s Channel-Coding Theorem
The Theorem:
There is a nonnegative channel capacity C associated with each
discrete memoryless channel with the following property: For any
symbol rate R < C, and any error rate ε > 0, there is a protocol that
achieves a rate >= R and a probability of error <= ε
In words:
If the entropy of our symbol stream is equal to or less than the
channel capacity, then there exists a coding technique that enables
transmission over the channel with arbitrarily small error
Can transmit information at a rate H(X) <= C
Shannon’s theorem tells us the asymptotically maximum rate
It does not tell us the code that we must use to obtain this rate
Achieving a high rate may require a prohibitively long code
Error-correction codes
Error-correcting codes allow us to detect and correct errors in
symbol streams
Used in all signal communications (digital phones, etc)
Used in quantum computing to ameliorate effects of decoherence
Many techniques and algorithms
Block codes
Hamming codes
BCH codes
Reed-Solomon codes
Turbo codes
Hamming codes
An example: Construct a code that corrects a single error
We add m check bits to our message
Can encode at most (2^m – 1) error positions
Errors can occur in the message bits and/or in the check bits
If n is the length of the original message then 2^m – 1 >= (n + m)
Examples:
If n = 11, m = 4: 2^4 – 1 = 15 >= (n + m) = 15
If n = 1013, m = 10: 2^10 – 1 = 1023 >= (n + m) = 1023
Hamming codes (cont.)
Example: An 11/15 SEC Hamming code
Idea: Calculate parity over subsets of input bits
Four subsets: Four parity bits
Check bit x stores parity of input bit positions
whose binary representation holds a “1” in
position x:
Check bit c1: Bits 1,3,5,7,9,11,13,15
Check bit c2: Bits 2,3,6,7,10,11,14,15
Check bit c3: Bits 4,5,6,7,12,13,14,15
Check bit c4: Bits 8,9,10,11,12,13,14,15
The parity-check bits are called a
syndrome
The syndrome tells us the location of the error
Position in message (binary = decimal):
0001 = 1    0010 = 2    0011 = 3    0100 = 4
0101 = 5    0110 = 6    0111 = 7    1000 = 8
1001 = 9    1010 = 10   1011 = 11   1100 = 12
1101 = 13   1110 = 14   1111 = 15
Hamming codes (con’t)
The check bits specify the error location
Suppose check bits turn out to be as follows:
Check c1 = 1 (Bits 1,3,5,7,9,11,13,15)
Error is in one of bits 1,3,5,7,9,11,13,15
Check c2 = 1 (Bits 2,3,6,7,10,11,14,15)
Error is in one of bits 3,7,11,15
Check c3 = 0 (Bits 4,5,6,7,12,13,14,15)
Error is in one of bits 3,11
Check c4 = 0 (Bits 8,9,10,11,12,13,14,15)
Rules out bit 11, so the error is in bit 3!!
Hamming codes (cont.)
Example: Encode 10111011011
Code position:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
Code symbol:     1  0  1  1  1  0  1 c4  1  0  1 c3  1 c2 c1
Codeword:        1  0  1  1  1  0  1  1  1  0  1  1  1  0  1
Notice that we can generate the code bits on the fly!
What if we receive 101100111011101?
Recompute each check over the received word 101100111011101:
c4 = 1 (parity over bits 8-15 fails)
c3 = 0 (parity over bits 4-7 and 12-15 is consistent)
c2 = 1 (parity over bits 2,3,6,7,10,11,14,15 fails)
c1 = 1 (parity over bits 1,3,5,7,9,11,13,15 fails)
The error is in location c4c3c2c1 = 1011 (binary) = bit 11
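A minimal Python sketch of the 11/15 single-error-correcting Hamming code described above; the function names are my own, and data bits are listed by increasing position (the slide lists position 15 first):

def encode_hamming_15_11(data):
    """Place 11 data bits into positions 3,5,6,7,9..15 and set check bits 1,2,4,8 (even parity)."""
    assert len(data) == 11
    word = [0] * 16                      # indices 1..15 used; index 0 ignored
    data_positions = [p for p in range(1, 16) if p not in (1, 2, 4, 8)]
    for pos, bit in zip(data_positions, data):
        word[pos] = bit
    for c in (1, 2, 4, 8):               # parity over positions whose binary form contains bit c
        word[c] = sum(word[p] for p in range(1, 16) if p & c and p != c) % 2
    return word[1:]

def syndrome(codeword):
    """Return the error position (0 if none) for a received 15-bit word."""
    word = [0] + list(codeword)
    return sum(c for c in (1, 2, 4, 8) if sum(word[p] for p in range(1, 16) if p & c) % 2)

# The slide's message 10111011011, reordered by increasing bit position:
data = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
cw = encode_hamming_15_11(data)
cw[11 - 1] ^= 1          # corrupt position 11, as in the worked example
print(syndrome(cw))      # -> 11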
Kolmogorov Complexity (Algorithmic Information)
Computers represent information as stored symbols
Not probabilistic (in the Shannon sense)
Can we quantify information from an algorithmic standpoint?
Kolmogorov complexity K(s) of a finite binary string s is the
minimum length (in bits) of a program p that generates s when
run on a Universal Turing machine U
K(s) is the algorithmic information content of s
Quantifies the “algorithmic randomness” of the string
K(s) is an uncomputable function
Similar argument to the halting problem
How do we know when we have the shortest program?
Kolmogorov Complexity: Example
Randomness of a string is defined by the shortest algorithm that
can print it out.
Suppose you were given the binary string x:
“11111111111111….11111111111111111111111” (1000 1’s)
Instead of 1000 bits, you can compress this string to a few
tens of bits, representing the length |P| of the program:
For I = 1 to 1000
Print “1”
So, K(x) <= |P|
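K(x) itself is uncomputable, but the length of any compressed version of x gives an upper bound on it (up to the constant cost of the decompressor). A minimal Python sketch using zlib as the stand-in compressor, not part of the lecture:

import zlib

s = b"1" * 1000                    # the string of 1000 1's from the example
compressed = zlib.compress(s, 9)
# len(compressed), plus a constant for the decompression program, upper-bounds K(s);
# a truly random 1000-bit string would not compress much at all.
print(len(s), len(compressed))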
Possible project topic: Quantum Kolmogorov complexity?
5-minute break…
Next: Thermodynamics and Reversible Computing
Thermodynamics and the Physics of Computation
Physics imposes fundamental limitations on computing
Computers are physical machines
Computers manipulate physical quantities
Physical quantities represent information
The limitations are both technological and theoretical
Physical limitations on what we can build
Example: Silicon-technology scaling
Major limiting factor in the future: Power Consumption
Theoretical limitations of energy consumed during computation
Thermodynamics and computation
Principal Questions of Interest
How much energy must we use to carry out a computation?
The theoretical, minimum energy
Is there a minimum energy for a certain rate of computation?
A relationship between computing speed and energy consumption
What is the link between energy and information?
Between information–entropy and thermodynamic–entropy
Is there a physical definition for information content?
The information content of a message in physical units
Main Results
Computation has no inherent thermodynamic cost
A reversible computation, that proceeds at an infinitesimal rate,
consumes no energy
Destroying information requires kTln2 joules per bit
Information-theoretic bits (not binary digits)
Driving a computation forward requires kTln(r) joules per
step
r is the ratio of the rate of going forward to the rate of going backward
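To put the kT ln 2 figure above in physical units, a minimal Python sketch (the room-temperature value T = 300 K is an assumption, not from the lecture):

import math

k = 1.380649e-23        # Boltzmann's constant, joules per kelvin
T = 300.0               # assumed room temperature, kelvin
print(k * T * math.log(2))   # ~2.87e-21 joules per erased bit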
Basic thermodynamics
First law: Conservation of energy
(heat put into system) + (work done on system) = increase in energy
of a system
ΔQ + ΔW = ΔU
Total energy of the universe is constant
Second law: It is not possible to have heat flow from a colder
region to a hotter region, i.e. ΔQ/T >= 0
Change in entropy: ΔS = ΔQ/T
Equality holds only for reversible processes
The entropy of the universe is always increasing
Heat engines
A basic heat engine: Q2 = Q1 – W
T1 and T2 are temperatures
T1 > T2
Reversible heat engines are those
that have:
No friction
Infinitesimal heat gradients
The Carnot cycle: Motivation
was steam engine
Reversible
Pumps heat ΔQ from T1 to T2
Does work W = ΔQ
Heat engines (cont.)
The Second Law
No engine that takes heat Q1 at T1 and delivers heat Q2 at T2
can do more work than a reversible engine
W = Q1 – Q2 = Q1(T1 – T2) / T1
Heat will not, by itself, flow from a cold object to a hot object
Thermodynamic entropy
If we add heat ΔQ reversibly to a system at fixed temperature
T, the increase in entropy of the system is ΔS = ΔQ/T
S is a measure of degrees of
freedom
The probability of a configuration
The probability of a point in
phase space
In a reversible system, the total
entropy is constant
In an irreversible system, the total
entropy always increases
Thermodynamic versus Information Entropy
Assume a gas containing N atoms
Occupies a volume V1
Ideal gas: No attraction or repulsion between particles
Now shrink the volume
Isothermally (at constant temperature, immerse in a bath)
Reversibly, with no friction
How much work does this require?
Compressing the gas
From mechanics
work = force × distance: ΔW = F Δx
force = pressure × (area of piston): F = pA
volume change = (area of piston) × distance: ΔV = A Δx
Solving: ΔW = p ΔV
From gas theory
The ideal gas law: pV = NkT
N is the number of molecules
k is Boltzmann's constant (in joules/Kelvin)
Solving:
W = ∫ from V1 to V2 of (NkT / V) dV = NkT ln(V2 / V1)
A few notes
W is negative because we are doing work on the gas: V2 < V1
W would be positive if the gas did work for us
Where did the work go?
Isothermal compression
The temperature is constant (same before and after)
First law: The work went into heating the bath
Second law: We decreased the entropy of the gas
and increased the entropy of the bath
Free energy and entropy
The total energy of the gas, U, remains unchanged
Same number of particles
Same temperature
The “free energy” Fe, and the entropy S both change
Both are related to the number of states (degrees of freedom)
Fe = U – TS
For our experiment, change in free energy is equal to the
work done on the gas and U remains unchanged
ΔFe = ΔU – TΔS = –TΔS = NkT ln(V1/V2)
ΔS = Nk ln(V2/V1)
ΔFe equals the heat siphoned off
into the bath (the negative of the
heat added to the gas)
Special Case: N = 1
Imagine that our gas contains only one molecule
Take statistical averages of same molecule over time rather than over
a population of particles
Halve the volume
Fe increases by +kTln2
S decreases by kln2
But U is constant
What’s going on?
Our knowledge of the possible locations
of the particle has changed!
Fewer places that the molecule can be in,
now that the volume has been halved
The entropy, a measure of the uncertainty
of a configuration, has decreased
Thermodynamic entropy revisited
Take the probability of a gas configuration to be P
Then S ~ klnP
Random configurations (molecules moving haphazardly) have
large P and large S
Ordered configurations (all molecules moving in one direction)
have small P and small S
The less we know about a gas…
the more states it could be in
and the greater the entropy
A clear analogy with information theory
The fuel value of knowledge
Analysis is from Bennett: Tape cells with particles coding 0 (left side)
or 1 (right side)
If we know the message on a tape
Then randomizing the tape can do useful work
Increasing the tape’s entropy
What is the fuel value of the tape
(i.e. what is the fuel value of our knowledge)?
Bennett’s idea
The procedure
Tape cell comes in with known particle location
Orient a piston depending on whether cell is a 0 or a 1
Particle pushes piston outward
Increasing the entropy by kln2
Providing free energy of kTln2 joules per bit
Tape cell goes out with randomized particle location
The energy value of knowledge
Define fuel value of tape = (N – I)kTln2
N is the number of tape cells
I is information (Shannon)
Examples
Random tape (I = N) has no fuel value
Known tape (I = 0) has maximum fuel value
Feynman’s tape-erasing machine
Define the information in the tape to be the amount of free
energy required to reset the tape
The energy required to compress each bit to a known state
Only the “surprise” bits cost us energy
Doesn’t take any energy to reset known bits
Cost to erase the tape: IkTln2 joules
For known bits, just move
the partition (without
changing the volume)
Reversible Computing
A reversible computation that proceeds at an infinitesimal
rate, destroying no information, consumes no energy
Regardless of the complexity of the computation
The only cost is in resetting the machine at the end
Erasing information costs energy
Reversible computers are like heat engines
If we run a reversible heat engine at an infinitesimal pace, it
consumes no energy other than the work that it does
Energy cost versus speed
We want our computations to run in finite time
We need to drive the computation forward
Dissipates energy (kinetic, thermal, etc.)
Assume we are driving the computation forward at a rate r
The computation is r times as likely to go forward as go backward
What is the minimum energy per computational step?
Energy-driven computation
Computation is a transition between states
State transitions have an associated energy diagram
Assume forward state E2 has a lower energy than backward state E1
“A” is the activation energy for a state transition
Thermal fluctuations cause the computer to move between states
Whenever the energy exceeds “A”
We also used this
model in neural
networks (e.g. Hopfield
networks)
State transitions
The probability of a transition between states differing in
positive energy DE is proportional to exp(–DE/kT)
Our state transitions have unequal probabilities
The energy required for a forward step is (A – E1)
The energy required for a backward step is (A – E2)
forward rate = C e^[–(A – E1)/kT], and backward rate = C e^[–(A – E2)/kT]
r = forward rate / backward rate = e^[(E1 – E2)/kT]
Driving computation by energy differences
The (reaction) rate r depends only on the energy difference
between successive states
The bigger (E1 – E2), the more likely the state transitions, and the
faster the computation
Energy expended per step = E1 – E2 = kT ln r
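A small Python sketch of this relationship, evaluating kT ln r at an assumed room temperature for a few illustrative drive ratios r:

import math

k = 1.380649e-23   # J/K
T = 300.0          # assumed room temperature, kelvin

# Minimum energy dissipated per step, kT ln r, for several forward/backward rate ratios:
for r in (1.001, 2, 10, 100):
    print(r, k * T * math.log(r))
# As r -> 1 (quasi-reversible operation) the cost per step approaches zero.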
Driving computation by state availability
We can drive a computation even if the forward and
backward states have the same energy
As long as there are more forward states than backward states
The computation proceeds by diffusion
More likely to move into a state with greater availability
Thermodynamic entropy drives the computation
r = forward rate / backward rate = n2 / n1
kT ln r = kT [ln(n2) – ln(n1)] = (S2 – S1) T
Rate-Driven Reversible Computing: A Biological Example
Protein synthesis is an example...
of (nearly) reversible computation
of the copy computation
of a computation driven forward by thermodynamic entropy
Protein synthesis is a 2-stage process
1. DNA forms mRNA
2. mRNA forms a protein
We will consider step 1
DNA
DNA comprises a double-stranded helix
Each strand comprises alternating phosphate and sugar groups
One of four bases attaches to each sugar
Adenine (A)
Thymine (T)
Cytosine (C)
Guanine (G)
(base + sugar + phosphate) group is called a nucleotide
DNA provides a template for protein synthesis
The sequence of nucleotides forms a code
RNA polymerase
RNA polymerase attaches itself to a DNA strand
Moves along, building an mRNA strand one base at a time
RNA polymerase catalyzes the copying reaction
Within the nucleus there is DNA, RNA polymerase, and triphosphates
(nucleotides with 2 extra phosphates), plus other stuff
The triphosphates are
adenosine triphosphate (ATP)
cytidine triphosphate (CTP)
guanosine triphosphate (GTP)
uridine triphosphate (UTP)
mRNA
The mRNA strand is complementary to the DNA
The matching pairs are:
DNA → RNA
A → U
T → A
C → G
G → C
As each nucleotide is added, two phosphates are released
Bound as a pyrophosphate
The process
RNA polymerase is a catalyst
Catalysts influence the rate of a biochemical reaction
But not the direction
Chemical reactions are reversible
RNA polymerase can unmake an mRNA strand
Just as easily as it can make one
Grab a pyrophosphate, attach to a base, and release
The direction of the reaction depends on the relative
concentrations of the pyrophosphates and triphosphates
More triphosphates than pyrophosphates: Make RNA
More pyrophosphates than triphosphates: Unmake RNA
DNA, entropy, and states
The relative concentrations of pyrophosphate and
triphosphate define the number of states available
Cells hydrolyze pyrophosphate to keep the reactions going forward
How much energy does a cell use to drive this reaction?
Energy = kT ln r = (S2 – S1)T ~ 100 kT/bit
Efficiency of a representation
Cells create protein engines (mRNA) for 100kT/bit
0.03µm transistors consume 100kT per switching event
Think of representational efficiency
What does each system get for 100kT?
Digital logic uses an impoverished representation
10^4 switching events to perform an 8-bit multiply
Semiconductor scaling doesn’t improve the representation
We pay a huge thermodynamic cost to use discrete math
Example 2: Computing using Reversible Logic Gates
Two reversible gates: controlled not (CN) and controlled
controlled not (CCN).
A CN gate has inputs A, B and outputs A', B'; a CCN gate has inputs A, B, C and outputs A', B', C'.

CN gate:
A B | A' B'
0 0 | 0  0
0 1 | 0  1
1 0 | 1  1
1 1 | 1  0

CCN gate:
A B C | A' B' C'
0 0 0 | 0  0  0
0 0 1 | 0  0  1
0 1 0 | 0  1  0
0 1 1 | 0  1  1
1 0 0 | 1  0  0
1 0 1 | 1  0  1
1 1 0 | 1  1  1
1 1 1 | 1  1  0
CCN is complete: we can form any Boolean
function using only CCN gates: e.g. AND if C = 0
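A minimal Python sketch of these two gates (truth-table behavior only, not a physical model), including the AND-from-CCN construction mentioned above:

def cn(a, b):
    """Controlled-NOT: (a, b) -> (a, a XOR b)."""
    return a, a ^ b

def ccn(a, b, c):
    """Controlled-controlled-NOT (Toffoli): (a, b, c) -> (a, b, c XOR (a AND b))."""
    return a, b, c ^ (a & b)

# Both gates are their own inverses (reversible): applying them twice restores the inputs.
assert cn(*cn(1, 0)) == (1, 0)
assert ccn(*ccn(1, 1, 0)) == (1, 1, 0)

# AND from CCN with the target line initialized to 0: the third output is a AND b.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, ccn(a, b, 0)[2])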
Next Week: Quantum Computing
Reversible Logic Gates and Quantum Computing
Quantum versions of CN and CCN gates
Quantum superposition of states allows exponential speedup
Shor’s fast algorithm for factoring and breaking the RSA
cryptosystem
Grover’s database search algorithm
Physical substrates for quantum computing
Next Week…
Guest Lecturer: Dan Simon, Microsoft Research
Introductory lecture on quantum computing and Shor’s algorithm
Discussion and review afterwards
Homework # 4 due: submit code and results electronically
by Thursday (let us know if you have problems meeting the
deadline)
Sign up for project and presentation times
Feel free to contact instructor and TA if you want to discuss
your project
Have a great weekend!