Conformation Networks: an Application to Protein Folding

Zoltán Toroczkai
Erzsébet Ravasz
Center for Nonlinear Studies
Gnana Gnanakaran (T-10)
Theoretical Biology and Biophysics
Los Alamos National Laboratory
Proteins
- the most complex molecules in nature
- globular or fibrous
- basic functional units of a cell
- chains of amino acids (50 – 10^3)
- peptide bonds link the backbone
Native state
- unique 3D structure (native physiological conditions)
- biological function
- fold in nanoseconds to minutes
- about 1000 known 3D structures: X-ray crystallography, NMR
Myoglobin
153 residues, molecular weight = 17,181 Da, 1260 atoms
Main function: primary oxygen storage and carrier in muscle tissue.
It contains a heme (iron-containing porphyrin) group in the center: C34H32N4O4FeHO
Protein conformations
• defined by dihedral angles
- 2 angles (φ, ψ) per amino acid, each with 2-3 local minima of the torsion energy, i.e. roughly 3 × 3 ≈ 10 combinations per monomer
[Figure: amino acid]
- N monomers → about 10^N different conformations
Levinthal’s paradox
• Anfinsen: thermodynamic hypothesis
native state is at the global minimum of the free energy
Epstein, Goldberger & Anfinsen, Cold Spring Harbor Symp. Quant. Biol. 28, 439 (1963)
• Levinthal’s paradox, 1968
Levinthal, J. Chim. Phys. 65, 44-45 (1968)
- finding the native state by random sampling is not possible
- 40-monomer polypeptide at 10^13 conf/s
- 3×10^19 years to sample all
- universe ~1.4×10^10 years old
- nucleation
- folding pathways
Wetlaufer, P.N.A.S. 70, 691 (1973)
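Levinthal's estimate is plain arithmetic. A quick check, assuming (as on the previous slide) about 10 conformations per monomer, a sampling rate of 10^13 conformations per second, and a Julian year of 365.25 days:

```python
# Levinthal's estimate: time to visit every conformation of an N-monomer chain once,
# assuming ~10 conformations per monomer (10^N in total) sampled at 10^13 conf/s.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, about 3.156e7 s

def levinthal_years(n_monomers, states_per_monomer=10, rate_per_s=1e13):
    """Years needed to enumerate states_per_monomer**n_monomers conformations."""
    n_conformations = states_per_monomer ** n_monomers
    return n_conformations / rate_per_s / SECONDS_PER_YEAR

print(f"{levinthal_years(40):.1e} years")  # about 3e19 years, vs ~1.4e10 years for the universe
```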
Free energy landscapes
• Bryngelson & Wolynes, 1987
Bryngelson & Wolynes, P.N.A.S. 84, 7524 (1987)
- free energy landscape
- a random heteropolymer typically does NOT fold
Experiment:
– random sequences
– GLU, ARG, LEU
– 80-100 amino acids
~95% did not fold in a stable manner
Davidson & Sauer, P.N.A.S. 91, 2146 (1994)
Funnels
• Leopold, Montal & Onuchic, 1992
Leopold, Montal & Onuchic, P.N.A.S. 89, 8721 (1992)
Energy funnels
Given any amino-acid sequence: can we tell if it is a good folder?
- experiments (X-ray, NMR)
- molecular dynamics simulations
- homology modeling
- many folding pathways
Difficult and slow
Molecular dynamics
• State of the art
Sanbonmatsu, Joseph & Tung, P.N.A.S. 102, 15854 (2005)
- supercomputer (LANL)
Ribosome in explicit solvent:
– targeted MD
– 2.64×10^6 atoms (2.5×10^5 + water)
– Q machine, 768 processors
– 260 days of simulation (event: 2 ns), ~10^16 times slower than the real event
- distributed computing (Stanford, Folding@home)
– more than 100,000 CPUs
– simulation of a complete folding event
» BBA5, 23 residues, implicit water
» 10,000 CPU days per folding event (~1 µs)
Shirts & Pande, Science 290, 1903 (2000)
Snow, Nguyen, Pande, Gruebele, Nature 420, 102 (2002)
Configuration networks
Protein conformations
- dihedral angles have few preferred values
Ramachandran & Sasisekharan, J. Mol. Biol. 7, 95 (1963)
• Helix
• Sheet
• other
[Figure: Ramachandran map of PDB structures]
NODE → configuration
LINK → change of one degree of freedom (angle)
- refinement of angle values → continuous case
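A minimal sketch of the NODE/LINK rule, assuming each of n dihedral angles takes one of m discrete preferred values (m = 3 would match helix/sheet/other; the sizes below are illustrative). A node is a tuple of angle states, and its neighbors differ in exactly one angle. Because the network has m^n nodes, neighbors are generated on demand, and only a small instance is built explicitly:

```python
from itertools import product

def neighbors(conf, m):
    """Yield every configuration that differs from `conf` in exactly one angle.

    conf: tuple of angle states, each an integer in range(m).
    """
    for pos, state in enumerate(conf):
        for new_state in range(m):
            if new_state != state:
                yield conf[:pos] + (new_state,) + conf[pos + 1:]

def build_network(n, m):
    """Explicit adjacency dict; feasible only for small m**n."""
    return {conf: set(neighbors(conf, m)) for conf in product(range(m), repeat=n)}

net = build_network(n=4, m=3)
print(len(net))                          # 3**4 = 81 nodes
print({len(nb) for nb in net.values()})  # {8}: every node has n*(m-1) neighbors
```

Every node has the same degree n(m − 1), so without steric constraints this network is perfectly homogeneous.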
Why networks?
• VERY LARGE: 100 monomers → 10^100 nodes. However:
Generic features of folding are determined by STATISTICAL properties of the configuration network
- toolkit from network research
- captures the high dimensionality
  – degree distribution
  – average distance
  – clustering
  – degree correlations
Albert & Barabási, Rev. Mod. Phys. 74, 47 (2002); Newman, SIAM Rev. 45, 167 (2003)
- faster algorithms to simulate folding events
- pre-screening synthetic proteins
- insights into misfolding
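Each statistic listed above can be computed with breadth-first search and counting. A sketch in plain Python for a graph stored as an adjacency dict; the four-node toy graph is a hypothetical stand-in for a configuration network:

```python
from collections import Counter, deque

def degree_distribution(adj):
    """P(k): fraction of nodes with degree k."""
    counts = Counter(len(nb) for nb in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}

def average_distance(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS from every node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    acc = 0.0
    for u, nb in adj.items():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for v in nb for w in nb if v < w and w in adj[v])
        acc += 2 * links / (k * (k - 1))
    return acc / len(adj)

def knn(adj):
    """Average nearest-neighbor degree as a function of k (a measure of degree correlations)."""
    sums, counts = Counter(), Counter()
    for u, nb in adj.items():
        k = len(nb)
        if k:
            sums[k] += sum(len(adj[v]) for v in nb) / k
            counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}

toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}  # a triangle with one tail node
print(degree_distribution(toy), average_distance(toy), average_clustering(toy), knn(toy))
```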
A real example
• The Protein Folding Network: F. Rao, A. Caflisch, J. Mol. Biol. 342, 299 (2004)
- beta3s: 20 monomers, antiparallel beta sheets
- MD simulation, implicit water
- 330 K, equilibrium: folded ⇌ random coil
NODE: string of secondary-structure letters, one of 8 letters per amino acid
LINK: 2 ps transition
Its native conformation has been studied by NMR experiments: De Alba et al., Protein Sci. 8, 854 (1999).
Beta3s in aqueous solution forms a monomeric triple-stranded antiparallel beta sheet in equilibrium with the denatured state.
• Simulations at 330 K
• The average folding time from the denatured state is ~83 ns
• The average unfolding time is ~83 ns
• Simulation time ~12.6 µs
• Coordinates saved every 20 ps (5×10^5 snapshots in 10 µs)
• Secondary structures: H, G, I, E, B, T, S, - (α-helix, 3₁₀-helix, π-helix, extended, isolated β-bridge, hydrogen-bonded turn, bend and unstructured).
• The native state: -EEEESSEEEEEESSEEEE
• There are approximately 8^18 ≈ 1.8×10^16 conformations.
• Nodes: conformations; links: transitions.
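Following the NODE/LINK definition above, a conformation network can be assembled from a trajectory of secondary-structure strings: each distinct string is a node, and consecutive snapshots that differ define a link. A sketch; the short trajectory below is made-up toy data (not beta3s output), and treating links as undirected with transition counts as weights is an assumption:

```python
from collections import Counter

def trajectory_network(snapshots):
    """Nodes: distinct secondary-structure strings (with visit counts).
    Links: transitions between consecutive snapshots (undirected, with counts);
    snapshots that repeat the previous string add no link."""
    nodes = Counter(snapshots)
    links = Counter()
    for a, b in zip(snapshots, snapshots[1:]):
        if a != b:
            links[frozenset((a, b))] += 1
    return nodes, links

# Hypothetical toy trajectory of 6-residue strings in the alphabet H,G,I,E,B,T,S,-
traj = ["-EEEE-", "-EEES-", "-EEEE-", "-EETT-", "-EEEE-", "-EEES-", "------"]
nodes, links = trajectory_network(traj)
degree = Counter(node for link in links for node in link)
print(len(nodes), len(links), degree["-EEEE-"])  # 4 nodes, 3 links; "-EEEE-" has 2 neighbors

# Size of the full space if 18 positions are free and each takes one of 8 letters:
print(f"{8 ** 18:.2e}")  # 1.80e+16
```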
Scale-free network
[Figure: degree distributions of the beta3s network and of its randomized version]
Many real-world networks are scale free
- hubs
- many reasons behind SF topology
Barabási & Albert, Science 286, 509 (1999)
- co-authorship (γ = 1-2.5)
- citations (γ = 3)
- sexual contacts (γ = 3.4)
- movie actors (γ = 2.3)
- Internet (γ = 2.4)
- World Wide Web (γ = 2.1/2.5)
- genetic regulation (γ = 1.3)
- protein-protein interactions (γ = 2.4)
- metabolic pathways (γ = 2.2)
- food webs (γ = 1.1)
• Why is the protein network scale free?
• Why does the randomized chain have a similar degree distribution?
• Why is the degree exponent -2?
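The exponents quoted above come from fitting P(k) ∝ k^(-γ). One standard way to estimate γ (not necessarily the method used for these networks) is maximum likelihood. A sketch of the continuous estimator γ̂ = 1 + n / Σ ln(x_i / x_min), checked on synthetic data drawn by inverse-transform sampling:

```python
import math
import random

def sample_power_law(n, gamma, x_min=1.0, seed=0):
    """Draw n samples from p(x) ∝ x^(-gamma), x >= x_min, by inverse-transform sampling."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]

def mle_exponent(xs, x_min):
    """Continuous maximum-likelihood estimate of gamma from all samples >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

data = sample_power_law(20000, gamma=2.4)
print(round(mle_exponent(data, x_min=1.0), 2))  # close to 2.4
```

For integer degrees, the discrete estimator (or the common approximation that replaces x_min by k_min − 1/2) is preferable to the continuous form used here.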
Robot arm networks
[Figure: robot arm configuration networks for n = 0, 1, 2; nodes are labeled by strings of joint states 0, 1, 2]
- n-dimensional hypercube
• Steric constraints?
- binomial degree distribution: homogeneous
- missing nodes, missing links: “Swiss cheese”
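A sketch of the "Swiss cheese" picture, assuming steric constraints can be mimicked by deleting each configuration independently with probability q (a simplification: real excluded-volume constraints are correlated). With three states per joint, the unconstrained network is the Hamming graph H(n, 3), a generalized hypercube in which every node has 2n neighbors. A surviving node keeps each neighbor with probability 1 − q, so its degree follows the binomial distribution B(2n, 1 − q) and the network stays homogeneous:

```python
import random
from collections import Counter
from itertools import product
from math import comb

def swiss_cheese_degrees(n, q, states=3, seed=1):
    """Degrees of surviving nodes after randomly deleting a fraction q of configurations."""
    rng = random.Random(seed)
    alive = {c for c in product(range(states), repeat=n) if rng.random() >= q}
    degrees = []
    for c in alive:
        k = 0
        for pos in range(n):
            for s in range(states):
                if s != c[pos] and c[:pos] + (s,) + c[pos + 1:] in alive:
                    k += 1
        degrees.append(k)
    return degrees

n, q = 8, 0.3
degs = swiss_cheese_degrees(n, q)
hist = Counter(degs)
kmax = 2 * n
for k in range(kmax + 1):
    binomial = comb(kmax, k) * (1 - q) ** k * q ** (kmax - k)
    print(k, round(hist[k] / len(degs), 3), round(binomial, 3))  # measured vs B(2n, 1-q)
```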
A bead-chain model
• Beads on a chain in 3D: robot arm model
- similar to Cα protein models
- rod-rod angle θ
- 3 positions around axis
N = 18, θ = 120°: 2212112212111122
Honeycutt & Thirumalai, Biopolymers 32, 695 (1992)
N = 6, θ = 90°
- homogeneous network
Another example: L = 7, θ = 75°, r = 0.25
[Figure: state “00100”; allowed and forbidden states]
Adding monomers not only increases the number of nodes in the network but also its dimensionality. The combined effect is a small-world network.
Shortcuts in Folding Space
The “dilemma”
HOMOGENEOUS
• from studies of conformation networks
- bead chain
- robot arm
SCALE FREE
• from polypeptide MD simulations
- beta3s
- randomized version
?
Gradient Networks
Gradients of a scalar (temperature, concentration, potential, etc.) induce flows (heat,
particles, currents, etc.).
Naturally, gradients will induce flows on networks as well.
Examples: load balancing in parallel computation and packet routing on the Internet
Y. Rabani, A. Sinclair and R. Wanka, Proc. 39th Symp. on Foundations of Computer Science (FOCS), 1998: “Local Divergence of Markov Chains and the Analysis of Iterative Load-balancing Schemes”
References:
Z. T. and K. E. Bassler, “Jamming is Limited in Scale-free Networks”, Nature 428, 716 (2004)
Z. T., B. Kozma, K. E. Bassler, N. W. Hengartner and G. Korniss, “Gradient Networks”, http://www.arxiv.org/cond-mat/0408262
Setup:
Let G=G(V,E) be an undirected graph, which we call the substrate network.
The vertex set:
$V \equiv \{x_0, x_1, \dots, x_{N-1}\} \equiv \{0, 1, 2, \dots, N-1\}$
The edge set:
$E \subseteq V \times V$, $e \in E$, $e = (x_i, x_j) \equiv (i, j)$, $(x, x) \notin E$ (no self-loops)
A simple representation of E is via the N×N adjacency matrix A:
$A(x_i, x_j) \equiv a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{if } (i, j) \notin E \end{cases}$
Let us consider a scalar field
$\{h\} : V \to \mathbb{R}$   (1)
Set of nearest-neighbor nodes of i on G: $S_i^{(1)}$
Definition 1
The gradient $\nabla h(i)$ of the field {h} in node i is a directed edge:
$\nabla h(i) \equiv (i, \mu(i))$   (2)
which points from i to that node $\mu \in S_i^{(1)} \cup \{i\}$ for which the increase in the scalar is the largest, i.e.,
$\mu(i) = \arg\max_{j \in S_i^{(1)} \cup \{i\}} (h_j)$   (3)
The weight associated with edge (i, μ) is given by:
$\nabla h(i) = h_\mu - h_i$
If $\mu(i) = i$, then $\nabla h(i) = (i, i) \equiv \mathbf{0}^{(i)}$.
The self-loop $\mathbf{0}^{(i)}$ is a loop through i with zero weight.
Definition 2
The set F of directed gradient edges on G, together with the vertex set V, forms the gradient network:
$G_\nabla = G_\nabla(V, F)$
If (3) admits more than one solution, then the gradient in i is degenerate.
In the following we will only consider scalar fields with non-degenerate gradients. This means:
$\mathrm{Prob}\{h_i = h_j \text{ for } (i, j) \in E\} = 0$
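Theorem 1
Non-degenerate gradient networks form forests.
Proof: every node has exactly one outgoing gradient edge, and h strictly increases along every edge that is not a self-loop, so no directed cycle can form; each connected component is therefore a tree that ends in a self-loop.
Theorem 2
The number of trees in this forest = number of local maxima of {h} on G.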
[Figure: example gradient network on a substrate graph, with nodes labeled by their scalar values h_i]
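Both theorems can be checked numerically. A sketch, assuming an Erdős–Rényi substrate and i.i.d. uniform scalars (so ties have probability zero): each node points to the node with the largest h in its closed neighborhood, and following these pointers always ends at a self-loop, because h strictly increases along every gradient edge. Each tree is thus labeled by the local maximum at its root:

```python
import random

def erdos_renyi(n, p, rng):
    """Undirected G(n, p) as an adjacency dict."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def gradient_network(adj, h):
    """mu[i] = argmax of h over {i} ∪ neighbors (Definition 1); mu[i] == i is a self-loop."""
    return {i: max([i, *adj[i]], key=lambda j: h[j]) for i in adj}

def tree_roots(mu):
    """Follow gradient edges from every node until a self-loop; memoize the root reached."""
    root_of = {}
    for start in mu:
        path, i = [], start
        while i not in root_of and mu[i] != i:
            path.append(i)
            i = mu[i]
        r = root_of.get(i, i)
        root_of[i] = r
        for j in path:
            root_of[j] = r
    return root_of

rng = random.Random(42)
adj = erdos_renyi(200, 0.03, rng)
h = {i: rng.random() for i in adj}
mu = gradient_network(adj, h)
root_of = tree_roots(mu)
maxima = {i for i in mu if mu[i] == i}
print(len(set(root_of.values())), len(maxima))  # number of trees == number of local maxima
```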
For Erdős–Rényi random graph substrates with i.i.d. random numbers as scalars, the in-degree distribution in the limit $p \to 0$, $N \to \infty$, $z \equiv Np = \text{const.}$, $z \gg 1$ is:
$R_N(l) \simeq \frac{1}{z\,l}, \quad 1 \le l \ll z = Np$
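A quick Monte Carlo check of this asymptotic form, assuming a single G(N, p) sample, uniform i.i.d. scalars, and that the self-loop at a local maximum does not count toward its in-degree. For small l, the measured fractions should be of the order of 1/(z l), up to sampling noise:

```python
import random
from collections import Counter

def gradient_indegrees(n, p, seed=0):
    """In-degree histogram of the gradient network on one G(n, p) sample."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    h = [rng.random() for _ in range(n)]
    indeg = [0] * n
    for i in range(n):
        target = max([i, *adj[i]], key=h.__getitem__)
        if target != i:  # self-loops at local maxima are not counted
            indeg[target] += 1
    return Counter(indeg)

n, p = 2000, 0.02
z = n * p
hist = gradient_indegrees(n, p)
for l in (1, 2, 5, 10):
    print(l, hist[l] / n, 1 / (z * l))  # measured R_N(l) vs 1/(z l)
```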
The Configuration model
A. Clauset, C. Moore, Z.T., E. Lopez, to be published.
Generating functions (p_k: degree distribution):
$g(z) = \sum_k p_k z^k$
$R(z) = \int_0^1 dx\; g\!\left(1 - (1 - z)\,\frac{x\,g'(x)}{g'(1)}\right)$
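The generating function can be expanded numerically. Writing π(x) = x g'(x)/g'(1) for the probability that a given edge of a node with scalar value x (uniform on [0, 1]) points into that node, the coefficient of z^l in R(z) is R_l = ∫_0^1 dx Σ_k p_k C(k, l) π^l (1 − π)^(k−l). A sketch with a midpoint-rule integral; the two-valued degree distribution is a hypothetical example. A useful check is that the mean in-degree equals 1 − Σ_k p_k/(k + 1), the fraction of nodes that are not local maxima:

```python
from math import comb

def in_degree_distribution(p, n_steps=20000):
    """R_l = [z^l] R(z) for a configuration model with degree distribution p = {k: p_k}."""
    mean_k = sum(k * pk for k, pk in p.items())  # g'(1)

    def g_prime(x):
        return sum(k * pk * x ** (k - 1) for k, pk in p.items() if k > 0)

    R = [0.0] * (max(p) + 1)
    dx = 1.0 / n_steps
    for s in range(n_steps):
        x = (s + 0.5) * dx                # midpoint rule on [0, 1]
        pi = x * g_prime(x) / mean_k      # probability that one edge points into the node
        for k, pk in p.items():
            for l in range(k + 1):
                R[l] += pk * comb(k, l) * pi ** l * (1.0 - pi) ** (k - l) * dx
    return R

p = {3: 0.5, 5: 0.5}
R = in_degree_distribution(p)
print([round(r, 4) for r in R])
print(sum(l * r for l, r in enumerate(R)), 1 - sum(pk / (k + 1) for k, pk in p.items()))
```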
K-th Power of a Ring
$R^{(2K)}(l) = \begin{cases}
\dfrac{4\,(3 + 9K + 4K^2 + 2Kl)}{(2K+l)(2K+l+1)(2K+l+2)(2K+l+3)}, & 1 \le l \le K-1 \\[2ex]
\dfrac{6\,(2 + 7K + 7K^2)}{3K\,(3K+1)(3K+2)(3K+3)}, & l = K \\[2ex]
\dfrac{4\,(2K+1)}{(2K+l+1)(2K+l+2)(2K+l+3)}, & K+1 \le l \le 2K-1 \\[2ex]
\dfrac{1}{4K+1}, & l = 2K
\end{cases}$
Power law in 2K + l with exponent -3
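The closed form can be checked exactly for small K. For node 0, neighbor j points to 0 exactly when h_0 is the largest value in the window [j − K, j + K]. For i.i.d. continuous scalars, the probability that every neighbor in a set T points to 0 is 1/(|U_T| + 1), where U_T is the union of their windows without node 0. Inclusion–exclusion then gives the probability of in-degree exactly l. A sketch with exact fractions (the self-loop of a local maximum is not counted):

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def exact_indegree(K):
    """Exact P(in-degree of node 0 = l), l = 0..2K, on the K-th power of a ring."""
    nbrs = [j for j in range(-K, K + 1) if j != 0]
    n = len(nbrs)
    S = [Fraction(0) for _ in range(n + 1)]  # S[k]: sum over k-subsets of P(all point to 0)
    for k in range(n + 1):
        for T in combinations(nbrs, k):
            below = set()
            for j in T:
                below.update(range(j - K, j + K + 1))
            below.discard(0)  # these values must all be smaller than h_0
            S[k] += Fraction(1, len(below) + 1)
    return {l: sum((-1) ** (k - l) * comb(k, l) * S[k] for k in range(l, n + 1))
            for l in range(n + 1)}

def closed_form(K, l):
    """R^(2K)(l) from the piecewise formula, for 1 <= l <= 2K."""
    if 1 <= l <= K - 1:
        return Fraction(4 * (3 + 9 * K + 4 * K * K + 2 * K * l),
                        (2 * K + l) * (2 * K + l + 1) * (2 * K + l + 2) * (2 * K + l + 3))
    if l == K:
        return Fraction(6 * (2 + 7 * K + 7 * K * K),
                        3 * K * (3 * K + 1) * (3 * K + 2) * (3 * K + 3))
    if K + 1 <= l <= 2 * K - 1:
        return Fraction(4 * (2 * K + 1), (2 * K + l + 1) * (2 * K + l + 2) * (2 * K + l + 3))
    if l == 2 * K:
        return Fraction(1, 4 * K + 1)
    raise ValueError("l must satisfy 1 <= l <= 2K")

exact = exact_indegree(3)
for l in range(1, 7):
    print(l, exact[l], closed_form(3, l))
```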
The energy landscape
• Energy associated with each node (configuration)
- the gradient network ↔ most favorable transitions
- T = 0 backbone of the flow
- MD simulation tracks the flow network: a biased walk close to the gradient network
- trees ↔ basins of local minima
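A sketch of why low-temperature dynamics follows the gradient network. As a hypothetical stand-in for MD, assume a heat-bath move from node i to a node j in its closed neighborhood with probability ∝ exp(−E_j/T). The most probable move is always the gradient edge toward the lowest-energy node, and the probability of taking it grows monotonically to 1 as T → 0. The ring-plus-shortcuts substrate and the random energies are illustrative only:

```python
import math
import random

def transition_probs(adj, energy, i, T):
    """Heat-bath step from i: P(i -> j) ∝ exp(-E_j / T) over j in {i} ∪ neighbors (assumed rule)."""
    options = [i, *adj[i]]
    e_min = min(energy[j] for j in options)  # shift energies to avoid overflow
    weights = [math.exp(-(energy[j] - e_min) / T) for j in options]
    total = sum(weights)
    return {j: w / total for j, w in zip(options, weights)}

def gradient_target(adj, energy, i):
    """Gradient edge on the energy landscape: the lowest-energy node in {i} ∪ neighbors."""
    return min([i, *adj[i]], key=lambda j: energy[j])

def mass_on_gradient(adj, energy, T):
    """Average probability that one step follows the T = 0 backbone (gradient) edge."""
    return sum(transition_probs(adj, energy, i, T)[gradient_target(adj, energy, i)]
               for i in adj) / len(adj)

rng = random.Random(7)
n = 300
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}  # ring substrate ...
for _ in range(600):                                      # ... plus random shortcuts
    a, b = rng.randrange(n), rng.randrange(n)
    if a != b:
        adj[a].add(b)
        adj[b].add(a)
energy = {i: rng.random() for i in adj}
for T in (1.0, 0.1, 0.01, 1e-4):
    print(T, round(mass_on_gradient(adj, energy, T), 3))
```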
What generates an exponent of -2?
The random energy model (REM) generates an exponent of -1.
Model ingredients
• A network model of configuration spaces
- network topology: homogeneous, with degree correlations
- how to associate energies:
  constrained (folded): small k_conf, lower energy
  loose (random coil): large k_conf, higher energy
  (E increases with k)
Random geometric graph
• random geometric graph
R = 0.113, <k> = 20
Dall & Christensen, Phys. Rev. E 66, 026121 (2002)
• Energy proportional to connectivity
[Figure: energy E versus degree k]
- in higher D: similar to a hypercube with holes
- degree correlations
N = 30000, <k> = 1000, d = 2.
Exponent is -2
2 essential ingredients:
1) k1-k2 (degree-degree) correlations
2) <E> monotonic in k
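A small-scale sketch of the two ingredients on a random geometric graph (much smaller than the N = 30000 case above), assuming N points uniform in the unit square, a link between points closer than R, and an energy equal to the degree plus a tiny random tie-breaker so that gradients are non-degenerate. It measures the degree correlation between linked nodes (Pearson assortativity) and the in-degree histogram of the gradient network pointing toward lower energy:

```python
import math
import random
from collections import Counter

def random_geometric_graph(n, radius, rng):
    """Points uniform in the unit square, linked when closer than `radius`."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) < radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge (both orientations)."""
    xs, ys = [], []
    for u, nb in adj.items():
        for v in nb:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(3)
adj = random_geometric_graph(1000, 0.08, rng)  # <k> ≈ 1000·π·0.08² ≈ 20, minus edge effects
energy = {i: len(adj[i]) + 1e-6 * rng.random() for i in adj}  # E grows with k
target = {i: min([i, *adj[i]], key=energy.__getitem__) for i in adj}
indeg = Counter(t for i, t in target.items() if t != i)
hist = Counter(indeg[i] for i in adj)
print(round(assortativity(adj), 2))
print(sorted(hist.items())[:10])
```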
Bead-chain model
• more realistic model: bead chain
- configuration network
- excluded volume
- energy: Lennard-Jones
[Figure: Lennard-Jones potential, with repulsive and attractive parts]
L = 30, θ = 75°
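The bead-chain energy can be written as a sum of Lennard-Jones terms. A sketch, assuming every pair of beads not joined by a rod interacts through the standard 12-6 potential with well depth ε and size σ (the slides do not give these parameters); coordinates can come from any chain-building routine:

```python
import math

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """V(r) = 4ε[(σ/r)^12 − (σ/r)^6]: repulsive below r = 2^(1/6)σ, attractive beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def chain_energy(beads, epsilon=1.0, sigma=1.0):
    """Total LJ energy over bead pairs with |i - j| >= 2 (pairs not joined by a rod; assumed)."""
    return sum(lennard_jones(math.dist(beads[i], beads[j]), epsilon, sigma)
               for i in range(len(beads)) for j in range(i + 2, len(beads)))

r_min = 2 ** (1 / 6)
print(lennard_jones(1.0), lennard_jones(r_min))  # 0.0 and approximately -1.0 (the well depth -ε)
print(chain_energy([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```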
The case of the α-helix
AKA peptide
• ALA: orange
• LYS: blue
• TYR: green
MD simulations, no water.
The MD traced network
T = 400 K
More than one simulation
3 different runs: yellow, red and green
The role of temperature
Conclusions
• A network approach was introduced to study sterically constrained conformations of ball-chain-like objects.
• This network approach is based on the “statistical dogma” stating that generic features must be the result of statistical properties of the networks and should not depend on details.
• Protein conformation dynamics happens in high-dimensional spaces that are not adequately described by simplistic reaction coordinates.
• The dynamics performs a locally biased sampling of the full conformational network. For low enough temperatures the sampled network is a gradient graph, which is typically a scale-free structure.
• The -2 degree exponent appears at and below the temperature where the basins of the local energy minima become kinetically disconnected.
• Understanding the protein folding network has the potential of leading to faster
simulation algorithms towards closing the gap between nature’s speed and ours.
Coming up: conditions on side chain distributions for the
existence of funneled energy landscapes.