Languages for
Systems Biology
Luca Cardelli
Microsoft Research
Cambridge UK
http://www.luca.demon.co.uk/BioComputing.htm
http://research.microsoft.com/bioinfo
50 Years of Molecular Cell Biology
● Genes are made of DNA
– Store digital information as sequences of 4
different nucleotides
– Direct protein assembly through RNA and the
Genetic Code
● Proteins (>10000) are made of amino acids
– Process signals
– Activate genes
– Move materials
– Catalyze reactions to produce substances
– Control energy production and consumption
● Bootstrapping still a mystery
– DNA, RNA, proteins, membranes are today
interdependent. Not clear who came first
– Separation of tasks happened a long time ago
– Not understood, not essential
11/6/2015
Towards Systems Biology
● Biologists now understand many of the cellular components
– A whole team of biologists will typically study a single protein for years
– When each component and each reaction is understood, the system is understood (?)
● But this has not led to an understanding of how “the system” works
– Behavior comes from complex chains of interactions between components
– Predictive biology and pharmacology still rare
– Synthetic biology still unreliable
● New approach: try to understand “the system”
– Experimentally: massive data gathering and data mining (e.g. Genome projects)
– Conceptually: modeling and analyzing networks (i.e. interactions) of components
● What kind of a system?
– Just beyond the basic chemistry of energy and materials processing…
– Built right out of digital information (DNA)
– Based on information processing for both survival and evolution
● Can we fix it when it breaks?
– Really becomes: How is information structured and processed?
Size
[Figure: three images, each labeled 1 micron: Pentium II Gate, E. Coli, DVD. λ = 0.25 micron in Pentium II.]
Performance
● Pentium II
– ~3 million transistors
– ~1/4 megabyte of memory
– ~100 million operations per second
● E. Coli
– ~1 million macromolecules
– ~1 megabyte of static genetic memory
– ~1 million amino-acids per second
● DVD
– 4700 megabytes of memory
– 1.385 megabytes per second
Comparison courtesy of Erik Winfree
Aims
● Modeling biological systems.
– By adapting paradigms and techniques developed
for modeling information-processing systems.
● Because they have some similar features:
– Deep layering of abstractions.
– Complex composition of simpler components.
– Discrete (non-linear) evolution.
– Digital coding of information.
– Reactive information-driven behavior.
– Very high degree of concurrency.
– “Emergent behavior” (not obvious from part list).
EU Commission, Health Research Report
on Computational Systems Biology
● General Modelling Requirements
– Research projects should focus on integrated modelling
of several cellular processes leading to as complete an
understanding as possible of the dynamic behaviour of a
cell. Several projects may be required to develop
modules (metabolism, signalling, trafficking, organelles,
cell cycle, gene expression, replication, cytoskeleton) in
model organisms. This modelling should involve realistic
analysis of experimental data, including a wide range of
data for transcriptomics, proteomics and functional
genomics, and interactions with cellular pathways
including signal transduction, regulatory cascades,
metabolic pathways etc. It should involve:
● Coherent, high-quality, quantitative, heterogeneous and
dynamic data sets as a basis for novel model constructions
to advance from analytical to predictive modelling.
● Experimental functional analysis tools (in-situ proteomics,
protein-protein interactions, metabolic fluxes, etc)
Methods
● Applying techniques unique to Computing:
● Model Construction (writing things down precisely)
– Studying the notations used in systems biology.
– Devising formal languages to reflect them.
– Studying their dynamics (semantics).
● Model Validation (using models for postdiction and prediction)
– Stochastic Simulation
● Stochastic = Quantitative concurrent semantics.
● Based on compositional descriptions.
– “Program” Analysis
● Control flow analysis
● Causality analysis
– Model checking
● Standard, Quantitative, Probabilistic
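The stochastic simulation item above can be illustrated with Gillespie's direct method, one common way to simulate a set of chemical reactions one event at a time. The slides do not say which algorithm is intended, so this choice is an assumption. The names `propensity`, `gillespie`, and `BINDING`, the reaction tuple layout, and the rate values are all hypothetical and introduced only for illustration.

```python
import math
import random


def propensity(rate, reactants, state):
    """Rate constant times the number of distinct reactant combinations."""
    a = rate
    for species, n in reactants.items():
        a *= math.comb(state.get(species, 0), n)
    return a


def gillespie(state, reactions, t_end, rng):
    """Simulate until t_end; return a list of (time, state) snapshots.

    state:     dict mapping species name -> molecule count
    reactions: list of (rate, reactants, products); reactants and products
               are dicts mapping species name -> stoichiometry
    rng:       a random.Random instance (seed it for repeatable runs)
    """
    t = 0.0
    state = dict(state)
    trace = [(t, dict(state))]
    while True:
        props = [propensity(k, lhs, state) for k, lhs, _ in reactions]
        total = sum(props)
        if total <= 0.0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its propensity.
        pick = rng.random() * total
        acc = 0.0
        chosen = None
        for reaction, a in zip(reactions, props):
            if a > 0.0:
                chosen = reaction
                acc += a
                if pick < acc:
                    break
        _, lhs, rhs = chosen
        for species, n in lhs.items():
            state[species] -= n
        for species, n in rhs.items():
            state[species] = state.get(species, 0) + n
        trace.append((t, dict(state)))
    return trace


# Hypothetical reversible binding with made-up rates: A + B <-> AB
BINDING = [
    (1.0, {"A": 1, "B": 1}, {"AB": 1}),
    (0.5, {"AB": 1}, {"A": 1, "B": 1}),
]
```

Calling `gillespie({"A": 10, "B": 10, "AB": 0}, BINDING, 5.0, random.Random(0))` returns one possible trajectory. Each reaction is described on its own and the simulator composes them, which is the sense in which the description is compositional.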
Structural Architecture
[Figure: Eukaryotic Cell (10~100 trillion in human body), with membranes everywhere. Labels: Nuclear membrane, Mitochondria, Golgi, Vesicles, E.R., Plasma membrane (<10% of all membranes).]
Functional Architecture
The Abstract Machines of Systems Biology
The “hardware” (biochemistry) is fairly well understood. But what is the “software” that runs on these machines?
[Figure: three abstract machines and their interactions.
Gene Machine (Nucleotides): Regulation.
Protein Machine (Aminoacids): Metabolism, Propulsion, Signal Processing, Molecular Transport.
Membrane Machine (Phospholipids): Confinement, Storage, Bulk Transport.
Labels between machines: Holds receptors, actuators, hosts reactions; Implements fusion, fission.
Other labels: Notations already used in Biology; Biochemical toolkits; Model Integration: Different time and space scales; P, Q.]
1. The Protein Machine
cf. BioCalculus [Kitano&Nagasaki], κ-calculus [Danos&Laneve]
Pretty close to the atoms.
[Figure: a Protein with On/Off switches and Binding Sites, some of them Inaccessible.]
Each protein has a structure of binary switches and binding sites, but not all of them are always accessible.
● Switching of accessible switches.
– May cause other switches and binding sites to become (in)accessible.
– May be triggered or inhibited by nearby specific proteins in specific states.
● Binding on accessible sites.
– May cause other switches and binding sites to become (in)accessible.
– May be triggered or inhibited by nearby specific proteins in specific states.
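The switch-and-site description above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the notation of BioCalculus or of Danos and Laneve's calculus: the class `Protein`, the `gates` mapping that decides accessibility, and the kinase example are hypothetical names. Triggering or inhibition by nearby proteins is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Protein:
    name: str
    switches: Dict[str, bool]           # switch name -> on/off
    sites: Dict[str, Optional[str]]     # site name -> bound partner, or None
    # Assumption: accessibility of a switch or site is a predicate on the
    # protein's current state; elements without a gate are always accessible.
    gates: Dict[str, Callable[["Protein"], bool]] = field(default_factory=dict)

    def accessible(self, element: str) -> bool:
        gate = self.gates.get(element)
        return True if gate is None else gate(self)

    def flip(self, switch: str) -> None:
        """Toggle a switch; only accessible switches can change."""
        if not self.accessible(switch):
            raise ValueError(f"switch {switch!r} is inaccessible")
        self.switches[switch] = not self.switches[switch]

    def bind(self, site: str, partner: str) -> None:
        """Occupy a free, accessible binding site."""
        if not self.accessible(site):
            raise ValueError(f"site {site!r} is inaccessible")
        if self.sites[site] is not None:
            raise ValueError(f"site {site!r} is already occupied")
        self.sites[site] = partner


def example_kinase() -> Protein:
    """Hypothetical protein: the site opens only once the switch is on."""
    return Protein(
        name="K",
        switches={"phospho": False},
        sites={"substrate": None},
        gates={"substrate": lambda p: p.switches["phospho"]},
    )
```

In this sketch, flipping the `phospho` switch of `example_kinase()` makes the `substrate` site accessible, which mirrors the statement that switching may cause binding sites to become (in)accessible.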
Molecular Interaction Maps
JDesigner: http://www.cds.caltech.edu/~hsauro/index.htm
[Figure: The p53-Mdm2 and DNA Repair Regulatory Network. Taken from Kurt W. Kohn.]
2. The Gene Machine
cf. Hybrid Petri Nets [Matsuno, Doi, Nagasaki, Miyano]
Pretty far from the atoms.
[Figure: a Gene (Stretch of DNA) with a Regulatory region and a Coding region. Input: Positive Regulation, Negative Regulation. Output: Transcription.]
Regulation of a gene (positive and negative) influences transcription. The regulatory region has precise DNA sequences, but not meant for coding proteins: meant for binding regulators.
Transcription produces molecules (RNA or, through RNA, proteins) that bind to the regulatory region of other genes (or that are end products).
[Figure: “External Choice”. Labels: Input, Output1, Output2. The phage lambda switch.]
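The input/output view of a gene described above can be sketched as a synchronous Boolean network, a deliberately crude stand-in for real regulation. The firing rule is an assumption, not something the slides state: a gene transcribes when no repressor is present and either it has no activators or at least one activator is present. The names `Gene`, `step`, and the three-gene example are hypothetical, and products are assumed to vanish after one step.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Gene:
    output: str                               # product made by transcription
    activators: FrozenSet[str] = frozenset()  # positive regulation inputs
    repressors: FrozenSet[str] = frozenset()  # negative regulation inputs

    def transcribes(self, present: Set[str]) -> bool:
        # Assumed rule: any bound repressor blocks transcription.
        if self.repressors & present:
            return False
        # Assumed rule: genes without activators are always on.
        return not self.activators or bool(self.activators & present)


def step(genes: Iterable[Gene], present: Set[str]) -> Set[str]:
    """One synchronous update: the products present after this round."""
    return {g.output for g in genes if g.transcribes(present)}


# Hypothetical network: a is always made, a activates b, b represses c.
NETWORK = [
    Gene("a"),
    Gene("b", activators=frozenset({"a"})),
    Gene("c", repressors=frozenset({"b"})),
]
```

Iterating `step(NETWORK, present)` from an empty set shows how the output of one gene becomes the input of another; a quantitative model would replace the Boolean rule with rates.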
Human (and mammalian) Genome Size
3Gbp (Giga base pairs) 750MB @ 4bp/Byte (CD)
Non-repetitive: 1Gbp 250MB
In genes: 320Mbp 80MB
Coding: 160Mbp 40MB
Protein-coding genes: 30,000-40,000
M.Genitalium (smallest true organism): 580,073bp 145KB (eBook)
E.Coli (bacteria): 4Mbp 1MB (floppy)
Yeast (eukarya): 12Mbp 3MB (MP3 song)
Wheat 17Gbp 4.25GB (DVD)
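The sizes above follow from the 4bp/Byte figure: with 4 possible nucleotides, each base pair needs 2 bits, so 4 base pairs fit in one byte. A minimal sketch of that arithmetic, using decimal units (1MB = 10^6 bytes) as an assumption; the function name `genome_bytes` and the `GENOMES` table are hypothetical, with the base-pair counts copied from the list above.

```python
BITS_PER_BASE_PAIR = 2  # 4 nucleotides -> log2(4) = 2 bits


def genome_bytes(base_pairs: int) -> float:
    """Storage needed for a genome at 2 bits per base pair, in bytes."""
    return base_pairs * BITS_PER_BASE_PAIR / 8


# Base-pair counts as given in the list above.
GENOMES = {
    "Human": 3_000_000_000,
    "M.Genitalium": 580_073,
    "E.Coli": 4_000_000,
    "Yeast": 12_000_000,
    "Wheat": 17_000_000_000,
}

if __name__ == "__main__":
    for name, bp in GENOMES.items():
        print(f"{name}: {genome_bytes(bp) / 1e6:.3f} MB")
```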
Gene Regulatory Networks
NetBuilder: http://strc.herts.ac.uk/bio/maria/NetBuilder/
[Figure: gene regulatory network diagram. Taken from Eric H Davidson. Legend: DNA, Begin coding region, And, Or, Sum, Amplify, Gate.]
3. The Membrane Machine
Very far from the atoms.
[Figure: operations on membranes P and Q. Mate and Mito, labeled Fusion and Fission; Exo and Endo, labeled Fusion and Fission.]
Membrane Transport Algorithms
[Figure: Protein Production and Secretion; LDL-Cholesterol Degradation; Viral Replication. Taken from MCB p.730.]
Process Calculi
● Today we represent, store, and analyze:
– Gene sequence data
– Protein structure data
– Metabolic network data
– …
● In the long run, how can we represent, store, and analyze biological
processes?
– We want to do better than informal “circuit diagrams” or huge lists of chemical reactions.
– Scalable, precise, dynamic, highly structured, maintainable representations
for systems biology.
● Process Calculi
– General formal framework for the description and analysis of highly
concurrent interacting processes.
Conclusions
● Identifying the architecture
– Emphasis on architecture, not components
● Modeling the system
– Information-oriented language-based models
● Analyzing the model
– Exploiting techniques unique to computing
● Perturbing, predicting, engineering
“The data are accumulating and the
computers are humming, what we are
lacking are the words, the grammar
and the syntax of a new language…”
D. Bray (TIBS 22(9):325-326, 1997)
“Although the road ahead is long
and winding, it leads to a future
where biology and medicine are
transformed into precision
engineering.” Hiroaki Kitano.