Languages for Systems Biology
Luca Cardelli
Microsoft Research Cambridge UK
http://www.luca.demon.co.uk/BioComputing.htm
http://research.microsoft.com/bioinfo

50 Years of Molecular Cell Biology
● Genes are made of DNA
  – Store digital information as sequences of 4 different nucleotides
  – Direct protein assembly through RNA and the Genetic Code
● Proteins (>10000) are made of amino acids
  – Process signals
  – Activate genes
  – Move materials
  – Catalyze reactions to produce substances
  – Control energy production and consumption
● Bootstrapping still a mystery
  – DNA, RNA, proteins, membranes are today interdependent. Not clear who came first
  – Separation of tasks happened a long time ago
  – Not understood, not essential
11/6/2015

Towards Systems Biology
● Biologists now understand many of the cellular components
  – A whole team of biologists will typically study a single protein for years
  – When each component and each reaction is understood, the system is understood (?)
● But this has not led to an understanding of how “the system” works
  – Behavior comes from complex chains of interactions between components
  – Predictive biology and pharmacology still rare
  – Synthetic biology still unreliable
● New approach: try to understand “the system”
  – Experimentally: massive data gathering and data mining (e.g. Genome projects)
  – Conceptually: modeling and analyzing networks (i.e. interactions) of components
● What kind of a system?
  – Just beyond the basic chemistry of energy and materials processing…
  – Built right out of digital information (DNA)
  – Based on information processing for both survival and evolution
● Can we fix it when it breaks?
  – Really becomes: How is information structured and processed?

Size
[Figure: Pentium II, E. Coli and DVD, each shown with a 1 micron scale bar. Gate = 0.25 micron in Pentium II]

Performance
● Pentium II
  – ~3 million transistors
  – ~1/4 megabyte of memory
  – ~100 million operations per second
● E. Coli
  – ~1 million macromolecules
  – ~1 megabyte of static genetic memory
  – ~1 million amino-acids per second
● DVD
  – 4700 megabytes of memory
  – 1.385 megabytes per second
Comparison courtesy of Erik Winfree

Aims
● Modeling biological systems.
  – By adapting paradigms and techniques developed for modeling information-processing systems.
● Because they have some similar features:
  – Deep layering of abstractions.
  – Complex composition of simpler components.
  – Discrete (non-linear) evolution.
  – Digital coding of information.
  – Reactive information-driven behavior.
  – Very high degree of concurrency.
  – “Emergent behavior” (not obvious from part list).

EU Commission, Health Research
Report on Computational Systems Biology
● General Modelling Requirements
  – Research projects should focus on integrated modelling of several cellular processes leading to as complete an understanding as possible of the dynamic behaviour of a cell. Several projects may be required to develop modules (metabolism, signalling, trafficking, organelles, cell cycle, gene expression, replication, cytoskeleton) in model organisms. This modelling should involve realistic analysis of experimental data, including a wide range of data for transcriptomics, proteomics and functional genomics, and interactions with cellular pathways including signal transduction, regulatory cascades, metabolic pathways etc. It should involve:
    ● Coherent, high-quality, quantitative, heterogeneous and dynamic data sets as a basis for novel model constructions to advance from analytical to predictive modelling.
    ● Experimental functional analysis tools (in-situ proteomics, protein-protein interactions, metabolic fluxes, etc)

Methods
● Applying techniques unique to Computing:
● Model Construction (writing things down precisely)
  – Studying the notations used in systems biology.
  – Devising formal languages to reflect them.
  – Studying their dynamics (semantics).
● Model Validation (using models for postdiction and prediction)
  – Stochastic Simulation
    ● Stochastic = Quantitative concurrent semantics.
    ● Based on compositional descriptions.
  – “Program” Analysis
    ● Control flow analysis
    ● Causality analysis
  – Model checking
    ● Standard, Quantitative, Probabilistic

Structural Architecture
Eukaryotic Cell (10~100 trillion in human body)
Membranes everywhere
[Figure labels: Nuclear membrane, Mitochondria, Golgi, Vesicles, E.R., Plasma membrane (<10% of all membranes)]

Functional Architecture
The Abstract Machines of Systems Biology
The “hardware” (biochemistry) is fairly well understood. But what is the “software” that runs on these machines?
● Gene Machine (Nucleotides)
  – Regulation
● Protein Machine (Amino acids)
  – Metabolism, Propulsion
  – Signal Processing
  – Molecular Transport
● Membrane Machine (Phospholipids)
  – Confinement
  – Storage
  – Bulk Transport
● Between Protein Machine and Membrane Machine
  – Holds receptors, actuators, hosts reactions
  – Implements fusion, fission
[Other diagram labels: Notations already used in Biology; Biochemical toolkits; Model Integration: Different time and space scales; P, Q]

1: The Protein Machine
cf. BioCalculus [Kitano&Nagasaki], κ-calculus [Danos&Laneve]
Pretty close to the atoms.
[Diagram labels: Protein, On/Off switches, Binding Sites, Inaccessible]
Each protein has a structure of binary switches and binding sites. But not all may always be accessible.
Switching of accessible switches.
- May cause other switches and binding sites to become (in)accessible.
- May be triggered or inhibited by nearby specific proteins in specific states.
Binding on accessible sites.
- May cause other switches and binding sites to become (in)accessible.
- May be triggered or inhibited by nearby specific proteins in specific states.

Molecular Interaction Maps
JDesigner: http://www.cds.caltech.edu/~hsauro/index.htm
[Figure: The p53-Mdm2 and DNA Repair Regulatory Network. Taken from Kurt W. Kohn]

2. The Gene Machine
[Diagram labels: Positive Regulation, Negative Regulation, Input]
Pretty far from the atoms.
cf. Hybrid Petri Nets [Matsuno, Doi, Nagasaki, Miyano]
[Diagram labels: Transcription, Output, Gene (Stretch of DNA), Coding region, Regulatory region]
Regulation of a gene (positive and negative) influences transcription. The regulatory region has precise DNA sequences, but they are not meant for coding proteins: they are meant for binding regulators.
Transcription produces molecules (RNA or, through RNA, proteins) that bind to the regulatory regions of other genes (or that are end products).
[Diagram: one Input with two outputs, Output1 and Output2: “External Choice”. The phage lambda switch]

Human (and mammalian) Genome Size
– 3Gbp (Giga base pairs): 750MB @ 4bp/Byte (CD)
– Non-repetitive: 1Gbp, 250MB
– In genes: 320Mbp, 80MB
– Coding: 160Mbp, 40MB
– Protein-coding genes: 30,000-40,000
Other genomes
– M.Genitalium (smallest true organism): 580,073bp, 145KB (eBook)
– E.Coli (bacteria): 4Mbp, 1MB (floppy)
– Yeast (eukarya): 12Mbp, 3MB (MP3 song)
– Wheat: 17Gbp, 4.25GB (DVD)

Gene Regulatory Networks
NetBuilder: http://strc.herts.ac.uk/bio/maria/NetBuilder/
[Figure: gene regulatory network diagram. Taken from Eric H Davidson. Labels: DNA, Begin coding region, And, Or, Sum, Amplify, Gate]

3. The Membrane Machine
Very far from the atoms.
[Diagram: operations on membrane-bound compartments P and Q: Mate and Mito (Fusion, Fission), Exo and Endo (Fusion, Fission)]

Membrane Transport Algorithms
[Figure panels: Protein Production and Secretion; LDL-Cholesterol Degradation; Viral Replication. Taken from MCB p.730]

Process Calculi
● Today we represent, store, and analyze:
  – Gene sequence data
  – Protein structure data
  – Metabolic network data
  – …
● In the long run, how can we represent, store, and analyze biological processes?
  – We want to do better than informal “circuit diagrams”, or huge lists of chemical reactions.
  – Scalable, precise, dynamic, highly structured, maintainable representations for systems biology.
● Process Calculi
  – General formal framework for the description and analysis of highly concurrent interacting processes.
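
To make "highly concurrent interacting processes" and "stochastic = quantitative concurrent semantics" more concrete, here is a minimal sketch, loosely in the spirit of a stochastic process calculus. It is not a notation or tool from these slides. Everything named in it is an assumption made for illustration: the Na/Cl example, the rate values, and the identifiers `PROCESSES`, `RATES`, `offers`, `weighted_pick` and `simulate`. Each molecule is described on its own as a small state machine that offers an output ("!") or an input ("?") on a named channel. A Gillespie-style direct method then picks which channel fires next and when.

```python
import random

# Hypothetical model, not from the slides. Each state lists the actions it
# offers: "!" = output on a channel, "?" = input; third field = next state.
PROCESSES = {
    "Na":  [("!", "ionize", "Na+")],
    "Cl":  [("?", "ionize", "Cl-")],
    "Na+": [("!", "deionize", "Na")],
    "Cl-": [("?", "deionize", "Cl")],
}
RATES = {"ionize": 1.0, "deionize": 0.5}  # assumed values, arbitrary units


def offers(counts, channel, polarity):
    """Return (state, next_state, count) for populated states offering the action."""
    return [
        (state, nxt, counts.get(state, 0))
        for state, actions in PROCESSES.items()
        for pol, chan, nxt in actions
        if pol == polarity and chan == channel and counts.get(state, 0) > 0
    ]


def weighted_pick(options, rng):
    """Pick one tuple with probability proportional to its last field."""
    r = rng.uniform(0, sum(o[-1] for o in options))
    for option in options:
        r -= option[-1]
        if r <= 0:
            return option
    return options[-1]


def simulate(counts, t_end, seed=0):
    """Gillespie-style direct method over channel interactions."""
    rng = random.Random(seed)
    counts = dict(counts)
    t = 0.0
    trace = [(t, dict(counts))]
    while True:
        # Propensity of a channel = rate * (number of senders) * (number of receivers).
        channels = []
        for chan, rate in RATES.items():
            outs = offers(counts, chan, "!")
            ins = offers(counts, chan, "?")
            a = rate * sum(o[2] for o in outs) * sum(i[2] for i in ins)
            if a > 0:
                channels.append((outs, ins, a))
        total = sum(c[-1] for c in channels)
        if total == 0:
            break  # no interaction is possible any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        outs, ins, _ = weighted_pick(channels, rng)
        # One sender and one receiver move to their next states together.
        for state, nxt, _ in (weighted_pick(outs, rng), weighted_pick(ins, rng)):
            counts[state] -= 1
            counts[nxt] = counts.get(nxt, 0) + 1
        trace.append((t, dict(counts)))
    return trace


if __name__ == "__main__":
    start = {"Na": 100, "Cl": 100, "Na+": 0, "Cl-": 0}
    final_time, final_counts = simulate(start, t_end=0.05)[-1]
    print(final_time, final_counts)
```

The design point is that the model is written per component, not as a global reaction list: adding a new species means adding entries to `PROCESSES`, and the possible interactions follow from matching channel names.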

Conclusions
● Identifying the architecture
  – Emphasis on architecture, not components
● Modeling the system
  – Information-oriented language-based models
● Analyzing the model
  – Exploiting techniques unique to computing
● Perturbing, predicting, engineering

“The data are accumulating and the computers are humming, what we are lacking are the words, the grammar and the syntax of a new language…”
D. Bray (TIBS 22(9):325-326, 1997)

“Although the road ahead is long and winding, it leads to a future where biology and medicine are transformed into precision engineering.”
Hiroaki Kitano.