Transcript Slide 1

Biology in Computation
and
Computation in Biology
Molecular Computation of Solutions to Combinatorial Problems,
Leonard M. Adleman,
Science 1994
Imposing specificity by localization: mechanism and evolvability,
Mark Ptashne and Alexander Gann,
Current Biology 1998
What kind of computations can be done with DNA?
Molecular Computation of Solutions
to Combinatorial Problems
DNA complementary
The directed Hamiltonian path problem
A directed graph G with vertices vin and vout
is said to have a Hamiltonian path
if and only if
there exists a sequence of “one-way” edges e1, e2... en (that is, a path)
that begins at vin, ends at vout and enters every other vertex exactly once.
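[Figure: two example directed graphs, each with a vin and a vout. One has vertices 0-6 and the Hamiltonian path 0→1, 1→2, 2→3, 3→4, 4→5, 5→6; the other, labeled a-g, is marked "No Hamiltonian path".]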
The directed Hamiltonian path problem
There is no known efficient algorithm for finding a Hamiltonian path/circuit.
The fastest known algorithms take exponential time.
In general, this is an NP-complete problem.
A closely related problem is the Traveling Salesman problem,
where a salesman wants to visit n cities via the shortest route.
[Figure: example directed graph with vertices 0-6, vin = 0, vout = 6]
Solving the Hamiltonian path problem
An algorithm solving the Hamiltonian path problem:
Step 1: Generate random paths through the graph
Step 2: Keep only those paths that begin with vin and end with vout
Step 3: If the graph has n vertices, then keep only those paths
that enter exactly n vertices
Step 4: Keep only those paths that enter all of the
vertices of the graph at least once
Step 5: If any paths remain, say “yes”;
otherwise, say “no”.
[Figure: example directed graph with vertices 0-6, vin = 0, vout = 6]
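The five steps above can be sketched in software as a generate-and-filter search. This is only an illustrative sketch: the edge list `EDGES`, the function name `find_hamiltonian_paths`, the number of trials and the random seed are all assumptions made for the example, and the graph is not the one drawn on the slides.

```python
import random

# Hypothetical 7-vertex directed graph (NOT the graph from the slides).
# Its only Hamiltonian path from 0 to 6 is 0->1->2->3->4->5->6.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (0, 3), (1, 3), (2, 1), (4, 1)]


def find_hamiltonian_paths(edges, n, v_in, v_out, trials=20000, seed=0):
    """Return the set of random paths that survive steps 2-4."""
    rng = random.Random(seed)
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)

    # Step 1: generate random paths through the graph.
    paths = set()
    for _ in range(trials):
        v = rng.randrange(n)
        path = [v]
        for _ in range(rng.randrange(1, 2 * n)):
            if v not in successors:
                break  # dead end: no outgoing edge
            v = rng.choice(successors[v])
            path.append(v)
        paths.add(tuple(path))

    # Step 2: keep only paths that begin with v_in and end with v_out.
    paths = {p for p in paths if p[0] == v_in and p[-1] == v_out}
    # Step 3: keep only paths that enter exactly n vertices.
    paths = {p for p in paths if len(p) == n}
    # Step 4: keep only paths that enter every vertex at least once.
    paths = {p for p in paths if set(p) == set(range(n))}
    return paths


# Step 5: if any paths remain, say "yes"; otherwise, say "no".
remaining = find_hamiltonian_paths(EDGES, 7, 0, 6)
print("yes" if remaining else "no")
```

Because step 1 is random, a "no" answer from this sketch only means that no Hamiltonian path was sampled; the molecular version relies on a very large number of molecules to make a miss unlikely.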
Solving the Hamiltonian path problem
Implementing the algorithm at the molecular level:
Drawbacks:
This computation requires ~7 days of lab work
Possible errors, for example:
- “Pseudopaths” caused by incompatible ligation
→ unlikely to survive all the separation steps;
confirm that the Hamiltonian path obtained actually occurs in the graph
- Inexact reactions such as:
loss of Hamiltonian path molecules that failed to bind and
retention of non-Hamiltonian path molecules that succeeded in binding
→ use more stringent or repeated separation procedures
Solving the Hamiltonian path problem
Implementing the algorithm at the molecular level:
Advantages:
With the described algorithm, the number of procedures grows
linearly with the number of vertices in the graph: O(n)
The number of oligonucleotides grows linearly with the number of edges.
Supercomputers vs. DNA computation
- 10^12 op/sec vs. 10^14 op/sec
- 10^9 op/J vs. 10^19 op/J (in the ligation step)
- 1 bit per 10^12 nm^3 vs. 1 bit per 1 nm^3
(video tape vs. DNA molecules)
Solving the Hamiltonian path problem
Implementing the algorithm at the molecular level:
“For certain intrinsically complex problems,
such as the directed Hamiltonian path problem
where existing electronic computers are very inefficient
and where massively parallel searches can be organized to take advantage
of the operations that molecular biology currently provides,
it is conceivable that molecular computation might compete with
electronic computation in the near term”
DNA self-assembled nanostructures
DNA computers
DNA Nanotechnology and its Biological Applications,
Chapter 13 of Book: Bio-inspired and Nano-scale Integrated Computing, Publisher: Wiley, USA, (2007).
Going into the biological system…
Biology in Computation
Molecular computation of solutions to combinatorial problems
Computation in Biology
Combinatorial computation within molecular biological systems
Imposing specificity by localization: mechanism and evolvability,
Mark Ptashne and Alexander Gann,
Current Biology 1998
Specificity by localization
[Figure: a signal (A or C) directs a machine to one of inputs 1-4, and the machine produces the corresponding output.]
How is specificity encoded?
Specificity by localization → Transcription regulation
We have a powerful machine that can bring the “instructions” to life
[Figure: signal C directs the machine (the enzyme RNA pol) to input 3 (gene 3); the output is the mRNA of gene 3 (the gene product). Inputs 1, 2 and 4 are other genes.]
How does it know which instructions should be performed
at any given time?
How is specificity encoded?
Transcription regulation
Input
signal
Allosteric change
of a target protein
• Activation of transcription factors
• Inactivation of transcription factors
These transcription factors then serve as “locators”
Transcription regulation
A typical activator has 2 domains:
1. An ‘activating domain’ – that interacts with RNA polymerase
2. A DNA binding domain
RNA pol
Activator
DNA
The specificity is thus determined by the binding of the activator to a site
– a DNA binding address - on one/several promoters.
Similarly, a typical repressor binds to
specific sites on a promoter and blocks the
polymerase from accessing these regions
Transcription regulation
[Figure: with an activator bound, RNA pol is recruited and the gene is ON; with a repressor bound, RNA pol is blocked and the gene is OFF.]
Once the RNA polymerase
is brought to a specific promoter
the transcription proceeds spontaneously
Binding sites combinatorics
Modulating the binding function (and thus the expression function) –
Weak sites versus Strong sites
Cooperativity (synergism) in DNA binding:
- Between an activator and the polymerase – enhanced recruitment
- Between 2 activators - fine tuning the function
- Cooperativity via nucleosomes
- Phage lambda’s sensitive switch
Combining signals – creating an AND gate:
- Sugar metabolism genes in E. coli
- Human interferon-β gene – combinatorics in eukaryotes
Cooperativity (synergism) in polymerase activation:
- Via multiple sites
- Via multiple components in the initiation machinery
The main players
Specificity by localization
Why is the strategy of imposing specificity by localization
found so widely in nature?
Let’s consider an alternative method:
determining specificity purely by allosteric control.
This would require a separate RNA polymerase for each promoter –
the integration of the relevant signals will induce an allosteric transition
in the appropriate polymerase – triggering transcription.
However, designing such a variety of polymerases seems quite difficult…
With localization, the same enzyme can be used in many different pathways:
this requires that the enzyme work in combination with many different regulators.
It is hard to imagine how a purely allosteric-based
“implementation” could possess combinatorial control
as flexible and sensitive as the one achieved by the
strategy of localization
The End…
Solving the Hamiltonian path problem
Implementing the algorithm at the molecular level:
Step 1: Generate random paths through the graph
[Figure: example directed graph with vertices 0-6, vin = 0, vout = 6]
Vertex i: a random 20-mer sequence of DNA, denoted Oi
Oi = 5’ A C A T G A G C T G G G T A C G A A T T 3’
Watson-Crick complementary of Oi:
T G T A C T C G A C C C A T G C T T A A
Vertex j:
Oj = A T C C C G A T T A T G T C A G A C G G
Edge ij: an oligonucleotide consisting of
the 3’ 10-mer of Oi followed by the 5’ 10-mer of Oj
Oij = G G T A C G A A T T A T C C C G A T T A
(if i=0 then all of Oi is used; if j=6 then all of Oj is used)
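The encoding of vertices and edges can be sketched with plain strings. The sequences below are the Oi and Oj shown on the slide; the function names `watson_crick` and `edge_oligo`, and the reading of the i=0 / j=6 rule as "use the whole 20-mer at that end", are assumptions made for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def watson_crick(oligo):
    """Base-by-base complement, written in the same left-to-right order
    as the input (as drawn on the slide, not reversed)."""
    return "".join(COMPLEMENT[base] for base in oligo)


def edge_oligo(o_i, o_j, i_is_start=False, j_is_end=False):
    """3' 10-mer of o_i followed by the 5' 10-mer of o_j.

    Assumption: for the start vertex the whole of o_i is used, and for
    the end vertex the whole of o_j is used.
    """
    left = o_i if i_is_start else o_i[10:]
    right = o_j if j_is_end else o_j[:10]
    return left + right


O_I = "ACATGAGCTGGGTACGAATT"  # vertex i, from the slide
O_J = "ATCCCGATTATGTCAGACGG"  # vertex j, from the slide

print(edge_oligo(O_I, O_J))   # GGTACGAATTATCCCGATTA
print(watson_crick(O_I))      # TGTACTCGACCCATGCTTAA
```

The complement of a vertex overlaps the tail of one edge oligonucleotide and the head of the next, which is what lets it hold two compatible edges together for ligation.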
Solving the Hamiltonian path problem
Implementing the algorithm at the molecular level:
Step 1: Generate random paths through the graph
[Figure: example directed graph with vertices 0-6, vin = 0, vout = 6]
For each vertex (except i=0,6) and for each edge in the graph,
50 pmol of the complement of Oi and 50 pmol of Oij
were mixed together in a single ligation reaction
[Figure: edge oligonucleotides Oij and Ojk held together by the complement of the vertex oligonucleotide Oj]
Oij:
G G T A C G A A T T A T C C C G A T T A
Complements of Oi and Oj:
T G T A C T C G A C C C A T G C T T A A T A G G G C T A A T A C A G T C T G C C
The ligation reaction results in the formation of DNA molecules
encoding random paths through the graph
The scale of this ligation far exceeds what is necessary for this graph:
for each edge in the graph, ~3 x 10^13 copies of the associated oligonucleotide
were added to the ligation reaction
→ many DNA molecules encoding the Hamiltonian path were created
It seems a much larger graph could have been processed with
the quantities used here.
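The figure of roughly 3 x 10^13 copies follows from the 50 pmol quantity and Avogadro's number. A minimal arithmetic check, with the function name `copies_from_picomoles` chosen only for this example:

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_from_picomoles(picomoles):
    """Number of molecules in the given amount of substance."""
    return picomoles * 1e-12 * AVOGADRO


# 50 pmol of an oligonucleotide is about 3.0e13 molecules.
print(f"{copies_from_picomoles(50):.2e}")
```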
Solving the Hamiltonian path problem
Implementing the algorithm at the molecular level:
Step 2: Keep only those paths that
begin with vin and end with vout
The product of step 1
→ Selective amplification by
PCR with primers o0 and o6
→ Only those molecules encoding paths that begin
with vertex 0 and end with vertex 6 were amplified
[Figure: example directed graph with vertices 0-6, vin = 0, vout = 6]
Solving the Hamiltonian path problem
Implementing the algorithm at the molecular level:
Step 3: Keep only those paths that
enter exactly n vertices
The product of step 2
→ Run on agarose gel and
extract 140 bp bands (7 vertices x 20 bp)
→ Only those molecules encoding paths that
enter exactly 7 vertices
were extracted and amplified
[Figure: example directed graph with vertices 0-6, vin = 0, vout = 6]
Solving the Hamiltonian path problem
Implementing the algorithm at the molecular level:
Step 4: Keep only those paths that
enter all of the vertices at least once
The product of step 3
→ Generate single-stranded DNA and
incubate it with the complement of o1
conjugated to magnetic beads
→ Only molecules containing o1
annealed to the bound complement and were retained
→ Repeat with o2, o3, o4, o5
→ Only molecules that entered
vertices 1, 2, 3, 4 and 5 were retained
[Figure: example directed graph with vertices 0-6, vin = 0, vout = 6]
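The repeated bead separation in step 4 can be sketched as a chain of substring filters. This is a simplification: a substring match stands in for annealing to the bead-bound probe, the 6-mer vertex sequences in `OLIGOS` are made up for the example (the experiment used 20-mers), and the candidate paths are hypothetical.

```python
# Hypothetical short vertex sequences, for illustration only.
OLIGOS = {0: "ACGTAC", 1: "TTGACA", 2: "GGATCC", 3: "CATCAT",
          4: "AGAGTG", 5: "TCCGGA", 6: "GTTTAA"}


def strand_for(path):
    """Concatenate the vertex sequences along a path."""
    return "".join(OLIGOS[v] for v in path)


def affinity_select(strands, probe):
    """Keep only strands that contain the probe sequence."""
    return [s for s in strands if probe in s]


# Hypothetical survivors of steps 2 and 3: all start at 0, end at 6,
# and enter exactly 7 vertices.
candidates = [strand_for(p) for p in [
    (0, 1, 2, 3, 4, 5, 6),
    (0, 3, 4, 3, 4, 5, 6),
    (0, 1, 2, 1, 2, 5, 6),
]]

# One separation per intermediate vertex: o1, then o2, ..., then o5.
retained = candidates
for vertex in (1, 2, 3, 4, 5):
    retained = affinity_select(retained, OLIGOS[vertex])

print(len(retained))  # only the strand visiting every vertex remains
```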
Solving the Hamiltonian path problem
Implementing the algorithm at the molecular level:
Step 5: If any paths remain, say “yes”;
otherwise, say “no”
The product of step 4
[Figure: example directed graph with vertices 0-6, vin = 0, vout = 6]
Identifying the Hamiltonian path:
Graduated PCR –
a method for “printing” results
by running different PCR reactions,
each with O0 as the right primer
and Oi as the left primer
For the molecules encoding the
Hamiltonian path 0→1, 1→2, 2→3, 3→4, 4→5, 5→6,
this method will produce bands of
40, 60, 80, 100, 120 and 140 bp
in successive lanes
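The band sizes follow from the 20-base vertex encoding: the product for vertex i spans from the start of the molecule through vertex i. A small sketch of that arithmetic, with the function name `graduated_pcr_bands` chosen for this example:

```python
def graduated_pcr_bands(path, oligo_len=20):
    """Map each non-start vertex to its expected band length in bp.

    The product for the vertex at position k (0-based) covers k + 1
    vertex sequences of oligo_len bases each.
    """
    return {v: oligo_len * (pos + 1) for pos, v in enumerate(path) if pos > 0}


bands = graduated_pcr_bands([0, 1, 2, 3, 4, 5, 6])
print(bands)  # {1: 40, 2: 60, 3: 80, 4: 100, 5: 120, 6: 140}

# Reading the gel in the other direction: sorting vertices by band
# length recovers the order in which the path visits them.
order = [0] + sorted(bands, key=bands.get)
```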
Weak site vs. strong site
A protein recognizes different sequences with different affinities.
A likely situation: occupancy of sites 1 and 2 depends on the factor concentration.
[Figure: TF binding probability (Pbound, 0 to 1) versus TF concentration (log scale, 10^-3 to 10^2)]
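One common way to model such a curve is a simple one-site binding isotherm, Pbound = c / (c + Kd). The slides do not state which model produced the plot, so the formula, the function name `p_bound` and the two dissociation constants below are assumptions for illustration.

```python
def p_bound(conc, kd):
    """Probability that a single site is occupied (simple binding model)."""
    return conc / (conc + kd)


KD_STRONG = 0.1   # hypothetical: strong site, half-occupied at low concentration
KD_WEAK = 10.0    # hypothetical: weak site, half-occupied at high concentration

for conc in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(conc, round(p_bound(conc, KD_STRONG), 3), round(p_bound(conc, KD_WEAK), 3))
```

Under this model, at an intermediate factor concentration the strong site is mostly occupied while the weak site is mostly empty.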
Cooperativity in DNA binding
Two DNA binding proteins
The sites are filled as a highly sigmoidal
function of the protein concentration
• Confers a buffer against minor fluctuations
in the protein concentration
• Confers the ability for a dramatic change
when a significant proportion of the
protein is activated / inactivated at once
[Figure: TF binding probability (Pbound, 0 to 1) versus TF concentration (log scale, 10^-3 to 10^2), curves 1-3]
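A sigmoidal response of this kind is often approximated with a Hill function, where a coefficient n > 1 stands for cooperative binding. This is a standard approximation rather than the model used in the slides; the function name `p_bound_hill` and the parameter values are assumptions.

```python
def p_bound_hill(conc, k_half, n):
    """Hill-type occupancy: n = 1 is non-cooperative, n > 1 is sigmoidal."""
    return conc ** n / (k_half ** n + conc ** n)


# Compare a non-cooperative site (n = 1) with a cooperative one (n = 4).
for conc in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(conc,
          round(p_bound_hill(conc, 1.0, 1), 3),
          round(p_bound_hill(conc, 1.0, 4), 3))
```

With the larger coefficient, occupancy stays near 0 below the half-saturation point and near 1 above it, which is the buffering and the switch-like change described in the two bullets.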
Cooperativity in DNA binding
If activation merely involves locating the transcription
machinery at the gene, any factors that inhibit
or facilitate that relocation process can have an effect
on gene expression.
Such factors are nucleosomes…
Another possible form of cooperativity between transcription factors:
cooperativity via nucleosomes
(and not protein-protein interaction)
[Figure: panels 1-4 showing activators and nucleosomes on the DNA, with RNA pol OFF and ON; TF binding probability (Pbound, 0 to 1) versus TF concentration (log scale, 10^-4 to 10^1)]
See works from Jon Widom lab
Phage lambda’s sensitive switch
Induction signal: lysogenic state → lytic state
Lysogenic state:
the phage’s lytic genes,
within a host E. coli,
are in a silent state
Lytic state:
the phage’s lytic genes,
within a host E. coli,
are active
PRM = promoter controlling
the repressor gene
PR = promoter controlling
the lytic genes
An “all-or-none” switch implemented by
two adjacent promoters:
when one is “on” the other is “off”!
The main players
Phage lambda’s sensitive switch
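Induction signal: lysogenic state → lytic state
Lysogenic state (lambda repressor):
two repressor dimers at OR1 and OR2;
the repressor dimer at OR2 recruits the polymerase
PRM (promoter controlling the repressor gene): ON
PR (promoter controlling the lytic genes): OFF
Lytic state (Cro repressor):
PRM (promoter controlling the repressor gene): OFF
PR (promoter controlling the lytic genes): ON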
Phage lambda’s sensitive switch
Switch properties:
Protein-protein interactions:
1. Repressor dimerization
2. Cooperative interaction of
repressor dimers
3. Cooperative binding of RNA
polymerase and the activator
(lambda repressor) to the PRM promoter
The surfaces involved in these interactions are interchangeable –
an example of an “activator bypass” experiment
Phage lambda’s sensitive switch
Switch properties:
Both the protein-protein
and the binding interactions
are relatively weak interactions
The cooperative nature of the
interaction is necessary for the
performance of the switch
The components are
maintained in a relatively
narrow range of concentrations
Sugar metabolism genes in E.coli
The genes are transcribed if and only if:
1. Glucose is absent
2. The relevant sugar is present
→ AND gate
→ Expression of alternative sugar genes
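The AND gate can be written as a small truth function. This is a deliberately coarse sketch that treats each signal as on/off and uses the Lac genes as the example; the function and variable names are made up for illustration.

```python
def lac_genes_transcribed(glucose_present, lactose_present):
    """Boolean sketch of the two-signal control of the Lac genes."""
    # Low glucose -> high cAMP -> CAP-cAMP complex binds the DNA.
    cap_bound = not glucose_present
    # A lactose derivative inactivates the Lac repressor, so the
    # repressor stays on the DNA only when lactose is absent.
    repressor_bound = not lactose_present
    return cap_bound and not repressor_bound


for glucose in (False, True):
    for lactose in (False, True):
        print(glucose, lactose, lac_genes_transcribed(glucose, lactose))
```

Only the combination "no glucose, lactose present" returns True.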
Sugar metabolism genes in E.coli
Let’s take a closer look at the Lac genes:
Signal → allosteric change → localization:
- Low glucose → high cAMP → CAP-cAMP complex (allosteric change of CAP)
→ the CAP-cAMP complex binds the DNA
- High lactose → a metabolic derivative of lactose
binds the lac repressor (allosteric change)
→ the inactive Lac repressor cannot bind the DNA
Interpretation of the signal at
the DNA binding level: information processing
Synergism in polymerase activation
[Figure: the level of transcription elicited by contact 1, by contact 2, and by the two contacts together]
One such example:
measuring expression from an
artificial PRM promoter construct (PRM driving lacZ)
The construct contains:
* a CAP site
* a lambda repressor site
The sites are positioned so that
each of the factors can make its
natural contact with polymerase
It seems the factors contact the
polymerase simultaneously
(at different subunits),
resulting in a synergistic response
Joung JK, Koepp DM, Hochschild A: Synergistic activation of transcription by
bacteriophage λ cI protein and E. coli cAMP receptor protein. Science 1994
The main players…
Prokaryotes Vs. Eukaryotes
The prokaryotes are a group of organisms, mostly unicellular, that lack a cell
nucleus or any other membrane-bound organelles.
Animals, plants, fungi, and protists are eukaryotes - organisms whose cells are
organized into complex structures enclosed within membranes.
The distinction between prokaryotes and eukaryotes is that eukaryotes have
"true" nuclei containing their DNA, whereas the genetic material in prokaryotes
is not membrane-bound.
Escherichia coli
phage λ
Prokaryotes
Bacteria
The main players…
A bacteriophage is any one of a number of viruses that infect bacteria
Enterobacteria phage λ (lambda phage) –
A temperate bacteriophage that infects Escherichia coli.
Lambda phage is a virus particle consisting of a head,
containing double-stranded linear DNA as its genetic material,
and a tail through which it injects its DNA into its host.
The lytic pathway:
The phage replicates its DNA, degrades the host DNA and
hijacks the cell's replication, transcription and translation mechanisms
to produce as many phage particles as cell resources allow.
When cell resources are depleted, the phage will lyse (break open) the host cell,
releasing the new phage particles.
The lysogenic pathway:
The phage DNA integrates itself into the host cell chromosome.
In this state, the λ DNA is called a prophage and stays resident within the host's
genome without apparent harm to the host.
The prophage is duplicated with every subsequent cell division of the host.
The phage genes expressed in this dormant state code for proteins that repress
expression of other phage genes.
When the host cell is under stress, these proteins are broken down,
resulting in the expression of the repressed phage genes.
The activated prophage then enters its lytic pathway.
NP-complete
An important aspect of computational complexity theory is to categorize computational problems
and algorithms into complexity classes
Complexity classes:
• P - the set of decision problems that can be solved by a deterministic machine in polynomial time.
• NP - the set of decision problems that can be solved by a non-deterministic machine in polynomial
time. A proposed solution to any problem in this class can be verified in polynomial time
The most important open question of complexity theory is whether P = NP
• NP-complete is a subset of NP - A decision problem X is NP-complete if:
- X is in NP
- Every problem in NP is reducible to X (every other problem in NP can be transformed into X in polynomial time)
Although any given solution to such a problem can be verified quickly, there is no known efficient
way to locate a solution in the first place; indeed, the most notable characteristic of NP-complete
problems is that no fast solution to them is known. That is, the time required to solve the problem
using any currently known algorithm increases very quickly as the size of the problem grows. As a
result, the time required to solve even moderately large versions of many of these problems easily
reaches into the billions or trillions of years, using any amount of computing power available today.
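The "verified quickly" half of that statement can be made concrete for the Hamiltonian path problem: checking a proposed path takes time roughly linear in the size of the graph, even though finding one may not. A minimal sketch, in which the function name `is_hamiltonian_path` and the example graph are assumptions for illustration:

```python
def is_hamiltonian_path(edges, n, v_in, v_out, path):
    """Check a proposed path against a directed graph on vertices 0..n-1."""
    edge_set = set(edges)
    return (len(path) == n                      # enters exactly n vertices
            and path[0] == v_in                 # begins at v_in
            and path[-1] == v_out               # ends at v_out
            and set(path) == set(range(n))      # every vertex appears
            and all((a, b) in edge_set          # every step is a real edge
                    for a, b in zip(path, path[1:])))


# Hypothetical graph, not the one from the slides.
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (2, 1), (1, 3)]
print(is_hamiltonian_path(edges, 4, 0, 3, [0, 1, 2, 3]))  # True
print(is_hamiltonian_path(edges, 4, 0, 3, [0, 2, 3]))     # False
```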