Bio-CS Exploration of Molecular Conformational Spaces Jean-Claude Latombe Computer Science Department Robotics Laboratory & Bio-X Clark Center.
Download ReportTranscript Bio-CS Exploration of Molecular Conformational Spaces Jean-Claude Latombe Computer Science Department Robotics Laboratory & Bio-X Clark Center.
Bio-CS Exploration of Molecular Conformational Spaces Jean-Claude Latombe Computer Science Department Robotics Laboratory & Bio-X Clark Center Range of Bio-CS Research Body system Tissue/Organs Cells Molecules Gene Molecular structures, similarities and motions Robotic surgery Soft-tissue simulation and surgical training Simulation of cell interaction Range of Bio-CS Research Body system Tissue/Organs Cells Molecules Gene Molecular structures, similarities and motions Robotic surgery Soft-tissue simulation and surgical training Simulation of cell interaction Accuray Range of Bio-CS Research Body system Tissue/Organs Cells Molecules Gene Molecular structures, similarities and motions Robotic surgery Soft-tissue simulation and surgical training Simulation of cell interaction Motion Structure 4 2 1 3 Motion Structure Function Develop efficient algorithms and data structures to explore protein conformational spaces: Sampling Similarities Pathways Vision for the Future In-silico experiments Drugs on demand “Interactive” Biology Analogy with Robotics free space [Kavraki, Svetska, Latombe,Overmars, 95] But Biology Robotics … Energy field, instead of joint control Continuous energy field, instead of binary free and in-collision spaces Multiple pathways, instead of single collision-free path Potentially many more degrees of freedom Relation to real world is more complex Overview Part I Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe, and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Computational Biology, 10(34):257-281, 2003. Part II ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins I. Lotan, F. Schwarzer, J.C. Latombe. Efficient Energy Computation for Monte Carlo Simulation of Proteins. 3rd Workshop on Algorithms in Bioinformatics (WABI), Budapest, Hungary, Sept., 2003. Part I Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions Serkan Apaydin, Doug Brutlag1, Carlos Guestrin, David Hsu2, Jean-Claude Latombe, Chris Varma Computer Science Department Stanford University 1 Department of Biochemistry, Stanford University 2 Computer Science Department, Nat. Univ. of Singapore Initial Work [Singh, Latombe, Brutlag, 99] Study of ligand-protein binding Probabilistic roadmaps with edges weighted by energetic plausibility vi wij vj Initial Work [Singh, Latombe, Brutlag, 99] Study of ligand-protein binding Probabilistic roadmaps with edges weighted by energetic plausibility vi wij vj Search of most plausible path Initial Work [Singh, Latombe, Brutlag, 99] Study of energy profiles along most plausible paths energy Catalytic Site Extensions to protein folding [Song and Amato, 01] [Apaydin et al., 01] But: Molecules fold/bind along a myriad of pathways. Any single pathway is of limited interest. New Idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges vi Pij vj Edge probabilities exp( Eij / k BT ) , if Eij 0; Ni Follow Metropolis criteria: Pij 1 , otherwise. N i vi Self-transition probability: Pii 1 Pij j i Pii Pij vj Stochastic Roadmap Simulation Pij S Stochastic simulation on roadmap and Monte Carlo simulation converge to same Boltzmann distribution Problems with Monte Carlo Simulation Much time is wasted escaping local minima Each run generates a single pathway Proposed Solution Pij Treat a roadmap as a Markov chain and use First-Step Analysis tool Example #1: Probability of Folding pfold HIV integrase [Du et al. ‘98] 1- pfold pfold “We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is Unfolded state very computationally intensive.” Folded state Du, Pande, Grosberg, Tanaka, and Shakhnovich “On the Transition Coordinate for Protein Folding” Journal of Chemical Physics (1998). First-Step Analysis U: Unfolded set F: Folded set One linear equation per node Solution gives pfold for all nodes l k No explicit simulation run j Pik Pil All pathways are taken Pij into account m Pim Sparse linear system i Pii Let fi = pfold(i) After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm =1 =1 In Contrast … Computing pfold with MC simulation requires: For every conformation c of interest Perform many MC simulation runs from c Count number of times F is attained first Computational Tests • 1ROP (repressor of primer) • 2 a helices • 6 DOF • 1HDD (Engrailed homeodomain) • 3 a helices • 12 DOF H-P energy model with steric clash exclusion [Sun et al., 95] Correlation with MC Approach 1ROP Computation Times (1ROP) Monte Carlo: 49 conformations Over 11 days of computer time Over 106 energy computations Roadmap: 5000 conformations 1.5 hours of computer time ~15,000 energy computations ~4 orders of magnitude speedup! Example #2: Ligand-Protein Interaction Computation of escape time from funnels of attraction around potential binding sites funnel = ball of 10Å rmsd [Camacho, Vajda, 01] Similar Computation Through Simulation [Sept, Elcock and McCammon `99] 10K to 30K independent simulations Computing Escape Time with Roadmap l k j Pil Pik Pij i Pii m Pim Funnel of Attraction ti = 1 + Pii ti + Pij tj+ Pik tk + Pil tl + Pim tm =0 (escape time is measured as number of steps of stochastic simulation) Distinguishing Catalytic Site Given several potential binding sites, which one is the catalytic site? Energy: electrostatic + van der Waals + solvation free energy terms Complexes Studied ligand protein # random nodes # DOFs oxamate 1ldm 8000 7 Streptavidin 1stp 8000 11 Hydroxylamine 4ts1 8000 9 COT 1cjw 8000 21 THK 1aid 8000 14 IPM 1ao5 8000 10 PTI 3tpi 8000 13 Distinction Based on Energy Protein Bound state Best potential binding site 1stp -15.1 -14.6 4ts1 -19.4 -14.6 3tpi -25.2 -16.0 1ldm -11.8 -13.6 1cjw -11.7 -18.0 1aid -11.2 -22.2 1ao5 -7.5 -13.1 Able to distinguish catalytic site Not able (kcal/mol) Distinction Based on Escape Time Protein 1stp 4ts1 3tpi 1ldm 1cjw 1aid 1ao5 Bound state 3.4E+9 3.8E+10 1.3E+11 8.1E+5 5.4E+8 9.7E+5 6.6E+7 Best potential binding site 1.1E+7 1.8E+6 5.9E+5 3.4E+6 4.2E+6 1.6E+8 5.7E+6 Able to distinguish catalytic site Not able (# steps) Conclusion Probabilistic roadmaps are a promising tool for computing ensemble properties of molecular pathways Current work: Non-uniform sampling strategies to handle more complex molecules More realistic energetic models Extension to molecular dynamic simulation Connection to in-vitro experiments (interaction of two proteins) Part II ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins Itay Lotan, Fabian Schwarzer, Dan Halperin1, Jean-Claude Latombe Computer Science Department Stanford University 1 Computer Science Department, Tel Aviv University Monte Carlo Simulation (MCS) Used to study thermodynamic and kinetic properties of proteins Random walk through conformation space At each attempted step: – Perturb current conformation at random – Accept step with probability: P(accept ) min 1, e E / kbT Problem: How to maintain energy efficiently? Energy Function E = S bonded terms + S non-bonded terms Bonded terms, e.g. bond length Easy to compute Non-bonded terms, e.g. Van der Waals, depend on distances between pairs of atoms Expensive to compute, O(n2) Energy Function Non-bonded terms Use cutoff distance (6 - 12Å) Only O(n) interacting pairs [Halperin & Overmars ’98] Problem: How to find interacting pairs without enumerating all atom pairs? Grid Method Subdivide space into cubic cells Compute cell that contains each atom center Store results in hash table dcutof f • Θ(n) time to update grid • O(1) time to find interactions for each atom • Θ(n) to find all interactions Asymptotically optimal in worst-case! Can We Do Better on Average? Proteins are long kinematic chains Protein’s Kinematic Structure torsional dof Angles j, y for backbone and c for side-chains Conformational space Can We Do Better on Average? Proteins are long chain kinematics Few DOFs are perturbed at each MC step How to retrieve unchanged partial sums? Long sub-chains stay rigid at each step Many partial energy sums remain constant Two New Data Structures 1. ChainTree Fast detection of interacting atom pairs 2. EnergyTree Reuse of unchanged partial energy sums ChainTree Combination of two hierarchies: Transform hierarchy: Bounding volume hierarchy: ChainTree Combination of two hierarchies: Transform hierarchy: approximate kinematics of protein backbone at successive resolutions ChainTree Combination of two hierarchies: (Larsen et al., ’00) Bounding volume hierarchy: approximate geometry of protein at successive resolutions ChainTree Updating the ChainTree Update path to root – Recompute transforms that shortcut change – Recompute bounding volumes that contain change Finding Interacting Pairs vs. • Do not search inside rigid sub-chains (unmarked nodes) • Do not test two nodes with no marked node between them Finding Interacting Pairs vs. • Do not search inside rigid sub-chains (unmarked nodes) • Do not test two nodes with no marked node between them Computational Complexity • n : total number of DOFs in protein backbone • k : number of simultaneous DOF changes at each step of MCS • Updating complexity: n O k log k • Worst-case complexity of finding all 4 interacting pairs: 3 (n ) but performs much better in practice!!! EnergyTree E(P,P) E(N,N) E(N,O) E(O,O) EnergyTree E(P,P) E(N,N) E(N,O) E(O,O) E (a , ) E (al , l ) E (a r , r ) E (al , r ) E (a r , l ) E (a ,a ) E (al ,al ) E (a r ,a r ) E (al ,a r ) Experimental Setup Energy function: – – – – Van der Waals Electrostatic Attraction between native contacts Cutoff at 12Å 300,000 steps MCS Early rejection for large vdW terms Results: 1-DOF change (68) (144) (374) (755) Results: 5-DOF change (68) (144) (374) (755) Two-Pass ChainTree (68) (144) (374) (755) Conclusion • Chain/EnergyTree reduces average time per step in MCS of proteins (vs. grid) • Exploit chain kinematics of protein • Larger speed-up for bigger proteins and for smaller number of simultaneous DOF changes What is Computational Biology? Using computers in Biology? Designing efficient algorithms for analyzing biological data and simulating biological processes? Using Biology to design new algorithms and computing hardware? Cultural clash Biology classification Computer Science abstraction In any case, Computational Biology will be a critical domain for the next 20 years, probably the next “big thing” after the Internet