Transcript Slide 1

BioUML - extensible workbench for
systems biology
Laboratory of Bioinformatics
Novosibirsk, Russia.
www.itcsoftware.com
© 2004-2011 ITC Software. All rights reserved.
1
Schedule
9.30 – 9.40    Introduction
9.40 – 10.00   Visual modeling, graphical notations
10.00 – 10.20  BioUML main concepts and architecture overview
10.20 – 11.00  Demonstration of the BioUML workbench from the customer's point of view
11.00 – 11.15  Coffee break
11.15 – 11.30  Simulations with BioUML: MATLAB plug-in, Java plug-in
11.30 – 11.45  Use of BioUML in real-life projects: the Cyclonet and Biopath (codename) projects
11.45 – 12.15  Current work and further development
12.15 – 12.30  Discussion
2
The challenge
Sydney Brenner, 2002
Nobel Prize winner
"We now have unprecedented ability to collect
data about nature but there is now a crisis
developing in biology, in that completely
unstructured information does not enhance
understanding. We need a framework to put all
of this knowledge and data into — that is going
to be the problem in biology.
We've reached the stage where we can't talk to
each other — we've all become highly
specialized. We need a framework, a
framework where people can come back to us
and say, 'Yes, I understand.' Driving toward that
framework is really the big challenge."
Sydney Brenner
3
4
Main BioUML concepts and ideas
• Visual modeling
• Plug-in based architecture (Eclipse platform runtime from
IBM).
• Meta model – a problem-domain-neutral level of
abstraction that describes a system as a compartmentalized
graph. It is also a common contract for all parties (software
developers, problem domain specialists, annotators,
mathematicians).
• Diagram type concept – formally defines a graphical
notation and provides for its incorporation into the BioUML
workbench.
• Module concept – allows a developer to incorporate
databases on biological pathways into the BioUML
framework, taking database peculiarities into account.
5
Visual Modeling
Modeling and simulating complex systems can be made
significantly simpler for customers by software that
provides visual modeling.
Visual depictions offer an alternative syntax for
specifying models completely and formally.
A number of visual syntaxes have been developed and
implemented in software for electrical
engineering and computer science. The best-known
graphical language in computer science is UML, the
Unified Modeling Language.
6
The OMG specification states:
"The Unified Modeling Language (UML) is a
graphical language for visualizing,
specifying, constructing, and documenting the
artifacts of a software-intensive system.
The UML offers a standard way to write a
system's blueprints, including conceptual
things such as business processes and system
functions as well as concrete things such
as programming language statements,
database schemas, and reusable software
components."
7
UML diagrams
• use case diagram
• class diagram
• behavior diagrams:
– statechart diagram
– activity diagram
– interaction diagrams:
• sequence diagram
• collaboration diagram
• implementation diagrams:
– component diagram
– deployment diagram
8
UML use case diagram
9
UML class diagram
10
UML state chart diagram
11
UML activity diagram
12
UML sequence diagram
13
UML collaboration diagram
14
Considering the UML architecture from a developer's point
of view, we note:
1) UML was designed for modeling software
systems and is hardly suitable for other problem
domains.
2) UML has a complicated structure that is quite hard to
implement. The OMG specification is more than 700
pages long.
3) UML was not designed for visual modeling and
simulation of the dynamics of complex systems.
That is why we need a new language for modeling biological
systems. We called this language BioUML.
15
Graphical notations for
biological pathways
16
Some graphical notations for biological pathways
Kohn K.W. (1999). Molecular Interaction Map of the
Mammalian Cell Cycle Control and DNA Repair Systems.
Mol. Biol. Cell 10, 2703-2734.
Kitano H. (2003). A graphical notation for biochemical
networks. BIOSILICO Vol. 1, No. 5.
Maimon R. and Browning S. (2001). Diagrammatic
Notation and Computational Grammar for Gene Networks.
Proceedings of the International Conference on Systems
Biology.
Cook D.L. et al. (2001). A basis for a visual language for
describing, archiving and analyzing functional models of
complex biological systems. Genome Biol. 2, RESEARCH0012.
Database-specific notations:
- KEGG/Metabolic pathways; GeneNet system; TRANSPATH …
17
Kohn K.W. (1999). Molecular Interaction Map of the Mammalian Cell
Cycle Control and DNA Repair Systems. Mol. Biol. Cell 10, 2703-2734.
18
19
Representation of multimolecular complexes: stimulatory and
inhibitory complexes of E2F1, DP1, and pRb. (a) E2F1:DP1
dimer; (b) E2F1:DP1:pRb trimer; (c) E2F1:DP1 bound to
promoter element E2 (transcriptional activation shown); (d)
E2F1:DP1:pRb bound to E2 (transcriptional inhibition shown).
Note that the promoter element can be occupied either by
E2F1:DP1 or by E2F1:DP1:pRb (alternative binding
represented by interaction lines joined at an acute angle).
20
21
Kitano H. (2003). A graphical notation for biochemical
networks. BIOSILICO Vol. 1. No. 5.
22
23
24
R. Maimon and S. Browning.
Diagrammatic Notation and Computational Grammar for
Gene Networks. Proceedings of the International
Conference on Systems Biology. 2001.
25
26
Formal description and modeling of biological
systems require the coordinated efforts of different
groups of researchers:
• programmers: they provide software tools
for this task.
• problem domain experts: they specify
what should be described and how.
• experimenters and annotators: they
describe the corresponding data following these
rules.
• mathematicians: they provide methods for
model analysis and simulation.
The BioUML architecture separates these tasks so that each
can be solved effectively by the corresponding group
of researchers, and provides a simple contract for how
these groups and the corresponding software parts
should communicate.
27
p53, The gatekeeper of death:
28
Cook D.L. et al. (2001). A basis for a visual language for
describing, archiving and analyzing functional models of
complex biological systems. Genome Biol. 2, RESEARCH0012.
29
30
31
KEGG - metabolic pathways
32
KEGG - signaling pathways
33
GeneNet system
A chemical formalism is used as the basis for
describing the events that occur
in biological pathways:
$$A_1 + \dots + A_N \xrightarrow{\;B_1 + \dots + B_M\;} C_1 + \dots + C_K$$
Two types of relationships between
entities:
reaction - an interaction
between entities that leads
to the appearance of a new entity
regulatory event - the effect of
an entity on a certain reaction
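The two relationship types can be captured in a small data structure. The sketch below is only an illustration: the class names `Reaction`, `RegulatoryEvent` and `Pathway`, their fields, and the `effect` labels are hypothetical and are not taken from GeneNet or BioUML. It also assumes that A1..AN are the inputs of a reaction, C1..CK its products, and B1..BM the entities that regulate it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reaction:
    """An interaction between entities that leads to new entities."""
    name: str
    inputs: tuple    # assumed role of A1..AN
    outputs: tuple   # assumed role of C1..CK


@dataclass(frozen=True)
class RegulatoryEvent:
    """The effect of one entity on a certain reaction (assumed role of B1..BM)."""
    regulator: str
    reaction: str    # name of the affected reaction
    effect: str      # hypothetical label, e.g. "activation" or "inhibition"


@dataclass
class Pathway:
    reactions: list = field(default_factory=list)
    events: list = field(default_factory=list)

    def regulators_of(self, reaction_name):
        """Return the entities that act on the given reaction."""
        return [e.regulator for e in self.events if e.reaction == reaction_name]


pathway = Pathway()
pathway.reactions.append(Reaction("R1", inputs=("A",), outputs=("C",)))
pathway.events.append(RegulatoryEvent("B", "R1", "activation"))
print(pathway.regulators_of("R1"))
```

Keeping regulatory events separate from reactions mirrors the distinction made on the slide: a regulator points at a reaction, not at another entity.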
34
GeneNet – antiviral response
35
TRANSPATH database
36
TRANSPATH – p53 pathway
37
BioUML architecture
38
Plug-in based architecture
A plug-in is the smallest unit of BioUML workbench functionality that can be
developed and delivered separately. A plug-in is described in an XML manifest
file called plugin.xml. The parsed contents of plug-in manifest files are made
available programmatically through the plug-in registry API provided by the
Eclipse runtime.
- Extension points are well-defined points in the system where other
plug-ins can contribute functionality.
- An extension is a specific contribution to an extension point. Plug-ins can
define their own extension points, so that other plug-ins can integrate tightly
with them.
[Figure: several plug-ins, each consisting of plugin.xml, Java jar files, etc., on top of the Eclipse platform runtime.]
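The relationship between extension points and extensions can be illustrated with a toy registry. This is a minimal sketch and not the Eclipse plug-in registry API: the class `ExtensionRegistry` and its methods are hypothetical, and the rule that a full extension point id is the plug-in id plus the local point id is an assumption made for the example.

```python
class ExtensionRegistry:
    """Toy registry: plug-ins declare extension points and contribute extensions."""

    def __init__(self):
        self._extensions = {}   # full extension point id -> list of extensions

    def declare_point(self, plugin_id, point_id):
        """Register an extension point and return its full id."""
        full_id = f"{plugin_id}.{point_id}"
        self._extensions.setdefault(full_id, [])
        return full_id

    def contribute(self, point, extension):
        """Attach an extension to an extension point that was declared earlier."""
        if point not in self._extensions:
            raise KeyError(f"unknown extension point: {point}")
        self._extensions[point].append(extension)

    def extensions(self, point):
        """Return a copy of the extensions contributed to a point."""
        return list(self._extensions.get(point, []))


registry = ExtensionRegistry()
point = registry.declare_point("ru.biosoft.plugins.javascript", "function")
registry.contribute(point, {"name": "help"})
print(registry.extensions(point))
```

A plug-in that owns an extension point only needs to query the registry. It does not need to know which other plug-ins contributed to it, which is what lets plug-ins be developed and delivered separately.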
39
[Figure: BioUML architecture. Labels recovered from the diagram:]
Modules: Standard module, GeneNet module, KEGG/pathways module, TRANSPATH module, SBML module
Module contents: diagram types (Semantic map, Pathway, Pathway simulation), database, database adapter, Java objects (Gene, Protein), query engine, …
Meta model: graph structure, executable model
ModuleType: diagram types, data categories, query engine
DiagramType: semantic controller, diagram view builder, diagram filter
Diagram editor: diagram view part, diagram editor part
Tools: analysis tools, simulation tools, other tools
Workbench UI: perspectives, views, editors, menus, toolbars, etc.
Eclipse platform runtime
40
41
42
Plugin.xml example (1)
<?xml version="1.0" encoding="UTF-8"?>
<plugin
    id="ru.biosoft.plugins.javascript"
    version="0.7.4">
  <requires>
    <import plugin="org.apache.log4j"/>
    <import plugin="ru.biosoft.access"/>
    <import plugin="ru.biosoft.workbench"/>
    <import plugin="biouml.workbench"/>
  </requires>
  <runtime>
    <library name="jsplugin.jar"/>
  </runtime>
  <extension-point name="function" id="function"/>
  <extension-point name="function" id="hostObject"/>
43
Plugin.xml example (2)
  <extension-point name="function" id="function"/>
  <extension-point name="function" id="hostObject"/>
  <extension id="help" point="ru.biosoft.plugins.javascript.function">
    <function
        name="help"
        class="ru.biosoft.plugins.javascript.Global"
        method="help"
        varargs="true">
      <doc description="%help.descr">
        <argument name="obj" type="Object" obligatory="true"
            description="%help.arg_1"/>
        <example code="%help.ex_1c" description="%help.ex_1d"/>
        <example code="%help.ex_2c" description="%help.ex_2d"/>
        <example code="%help.ex_3c" description="%help.ex_3d"/>
      </doc>
    </function>
  </extension>
44
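To show what "parsed contents of the manifest" can look like, here is a minimal sketch that reads a shortened version of this manifest with Python's standard `xml.etree.ElementTree`. It is not how the Eclipse runtime parses plugin.xml. The function `summarize` is hypothetical, the manifest is abbreviated, and a closing `</plugin>` tag is added so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# Abbreviated manifest based on the slide; the XML declaration is omitted
# and </plugin> is added so the text parses on its own.
MANIFEST = """<plugin id="ru.biosoft.plugins.javascript" version="0.7.4">
  <requires>
    <import plugin="ru.biosoft.access"/>
    <import plugin="biouml.workbench"/>
  </requires>
  <extension-point name="function" id="function"/>
  <extension id="help" point="ru.biosoft.plugins.javascript.function">
    <function name="help" class="ru.biosoft.plugins.javascript.Global" method="help"/>
  </extension>
</plugin>
"""


def summarize(manifest_text):
    """Collect the plug-in id, its imports, extension points and extensions."""
    root = ET.fromstring(manifest_text)
    plugin_id = root.get("id")
    return {
        "id": plugin_id,
        "requires": [imp.get("plugin") for imp in root.iter("import")],
        # Assumption: the full point id is '<plugin id>.<local point id>'.
        "extension_points": [f"{plugin_id}.{p.get('id')}"
                             for p in root.iter("extension-point")],
        "extensions": [(e.get("id"), e.get("point")) for e in root.iter("extension")],
    }


info = summarize(MANIFEST)
print(info)
```

In this example the `point` attribute of the `help` extension matches the full id of the `function` extension point declared by the same plug-in.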
Plugin.properties example
help.descr = Returns description for the specified JavaScript function \
or host object.\nThis description should be loaded using \
'biouml.plugins.javascript.function' or \
\n'biouml.plugins.javascript.hostObject' extension point.
help.arg_1 = JavaScript object or function name.
help.ex_1c = help()
help.ex_1d = will print description of 'help' function itself.
help.ex_2c = help(print)
help.ex_2d = will print description of 'print' function.
help.ex_3c = help(sbw)
help.ex_3d = will print description of sbw host object.
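Attribute values that start with `%` in plugin.xml refer to keys in this properties file. The sketch below shows one way such a lookup could work. It handles only a small subset of the Java .properties format (`key = value` pairs, comments, backslash line continuation) and leaves escape sequences such as `\n` untouched. The functions `parse_properties` and `resolve` are hypothetical helpers, not part of BioUML or Eclipse.

```python
def parse_properties(text):
    """Parse 'key = value' pairs with backslash line continuation."""
    props = {}
    pending = ""                       # text accumulated from continued lines
    for raw in text.splitlines():
        line = pending + raw.strip()
        if line.endswith("\\"):
            pending = line[:-1]        # drop the backslash and keep accumulating
            continue
        pending = ""
        if not line or line.startswith(("#", "!")):
            continue                   # blank line or comment
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value, props):
    """Replace a '%key' attribute value by its text, if the key is known."""
    if value.startswith("%"):
        return props.get(value[1:], value)
    return value


SAMPLE = r"""help.descr = Returns description for the specified JavaScript function \
or host object.
help.ex_1c = help()
help.ex_1d = will print description of 'help' function itself.
"""

props = parse_properties(SAMPLE)
print(resolve("%help.ex_1c", props))
print(resolve("%help.descr", props))
```

Keeping the texts in a separate file lets the manifest stay language-neutral while the descriptions are translated or edited independently.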
45
BioUML meta model
The core of the BioUML workbench is the meta model.
Unlike the UML meta model, the BioUML meta model is
problem domain neutral and provides an abstract
layer for comprehensive formal description of
a wide range of biological and other complex
systems.
The content of databases on biological pathways or
of SBML models is expressed in terms of the meta
model and can then be used by other workbench
plug-ins.
46
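Example of two consecutive chemical reactions
[Figure: A (100) --eq1-- R1 --eq2-- B (0) --eq3-- R2 --eq4-- C (0)]
The corresponding mathematical model
$$\frac{dA}{dt} = \mathrm{eq}_1, \qquad \frac{dB}{dt} = \mathrm{eq}_2 + \mathrm{eq}_3, \qquad \frac{dC}{dt} = \mathrm{eq}_4$$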
47
Example of a formalized description of a
system of two chemical reactions

Description of system components in the database:
ID A
CC ..
...
//
ID R1
A->B
...
//
ID B
CC ..
...
//
ID R2
B->C
...
//
ID C
CC ..
...
//

System structure is described as a graph:
A (100) --[-k1[A]]-- R1 --[k1[A]]-- B (0) --[-k2[B]]-- R2 --[k2[B]]-- C (0)

Mathematical model of the system:
$$\frac{dA}{dt} = -k_1[A], \qquad \frac{dB}{dt} = k_1[A] - k_2[B], \qquad \frac{dC}{dt} = k_2[B]$$
with the values A = 100, B = 0, C = 0 shown at the nodes.
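The step from graph to mathematical model can be sketched in a few lines: every edge contributes one term to the equation of the species it touches, and the terms are summed per species. The code below is an illustration and not BioUML's generator. The rate constants `K1` and `K2` are hypothetical, the node values 100, 0, 0 are assumed to be initial values, and forward Euler integration is used only because it is short.

```python
K1, K2 = 0.5, 0.2   # hypothetical rate constants, not from the slides

# (species, term) pairs: one term per graph edge, as in the example system.
EDGE_TERMS = [
    ("A", lambda c: -K1 * c["A"]),   # edge A -- R1
    ("B", lambda c:  K1 * c["A"]),   # edge R1 -- B
    ("B", lambda c: -K2 * c["B"]),   # edge B -- R2
    ("C", lambda c:  K2 * c["B"]),   # edge R2 -- C
]


def derivatives(conc):
    """Sum the edge terms per species to obtain dX/dt for every species."""
    rates = {name: 0.0 for name in conc}
    for species, term in EDGE_TERMS:
        rates[species] += term(conc)
    return rates


def simulate(conc, dt, steps):
    """Integrate the model with the explicit (forward) Euler method."""
    conc = dict(conc)
    for _ in range(steps):
        rates = derivatives(conc)
        conc = {name: conc[name] + dt * rates[name] for name in conc}
    return conc


final = simulate({"A": 100.0, "B": 0.0, "C": 0.0}, dt=0.01, steps=1000)
print(final)
```

Because every term that leaves one species enters the next one with the opposite sign, the total A + B + C stays constant during the simulation, which is a quick sanity check for a model assembled this way.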
48
Main BioUML meta-model classes for describing system
structure as a compartmentalized graph
49
50
51
Diagram type concept
A diagram type defines:
• which system components can be shown
in the diagram;
• a diagram view builder, which
generates the view for each diagram element,
taking problem domain
peculiarities into account;
• a semantic controller, which maintains the semantic
integrity of the diagram while it is
edited;
• filters, which hide or highlight diagram
elements according to selection
criteria.
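The roles listed for a diagram type can be pictured with a small sketch. The classes `DiagramType` and `Diagram` below are hypothetical stand-ins with invented method names. They do not reproduce the BioUML classes, and the "view" is reduced to a text label.

```python
from dataclasses import dataclass, field


@dataclass
class DiagramType:
    """Stand-in for a diagram type: it knows which element kinds it allows."""
    name: str
    allowed_kinds: frozenset

    def can_add(self, kind):
        # Semantic controller role: reject elements the notation does not define.
        return kind in self.allowed_kinds

    def build_view(self, kind, title):
        # View builder role: produce a (here purely textual) view of one element.
        return f"[{kind}] {title}"


@dataclass
class Diagram:
    diagram_type: DiagramType
    elements: list = field(default_factory=list)   # (kind, title) pairs

    def add(self, kind, title):
        if not self.diagram_type.can_add(kind):
            raise ValueError(f"{kind!r} is not allowed in a {self.diagram_type.name} diagram")
        self.elements.append((kind, title))

    def render(self, element_filter=lambda kind, title: True):
        # Filter role: show only elements that match the selection criterion.
        return [self.diagram_type.build_view(kind, title)
                for kind, title in self.elements if element_filter(kind, title)]


pathway_type = DiagramType("pathway", frozenset({"gene", "protein", "reaction"}))
diagram = Diagram(pathway_type)
diagram.add("protein", "p53")
diagram.add("reaction", "R1")
print(diagram.render(lambda kind, title: kind == "protein"))
```

Putting the checks on the diagram type rather than on the editor means the same editor can serve different notations.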
52
53
Module Concept
• The module concept allows a
developer to define new diagram
types and incorporate other
databases on biological
pathways into the BioUML
framework.
• A module defines the mapping of
database content to diagram
elements and the diagram types
that can be used with the
database.
• A module also provides a query
engine that the
BioUML workbench can use to find
interacting components of the
system.
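As an illustration of what such a query engine might answer, the sketch below finds the components that take part in the same reaction as a given component. The record layout in `REACTIONS` and the function `interacting_components` are hypothetical. A real module would read the records through its database adapter.

```python
# Hypothetical records, as a module's database adapter might return them:
# each reaction lists the components it consumes and produces.
REACTIONS = {
    "R1": {"inputs": ["A"], "outputs": ["B"]},
    "R2": {"inputs": ["B"], "outputs": ["C"]},
}


def interacting_components(component, reactions=REACTIONS):
    """Return the components that share at least one reaction with `component`."""
    partners = set()
    for reaction in reactions.values():
        members = set(reaction["inputs"]) | set(reaction["outputs"])
        if component in members:
            partners |= members - {component}
    return sorted(partners)


print(interacting_components("B"))
```

Repeating the query for each returned partner is a simple way to grow a diagram outwards from one starting component.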
54
Modules
• standard BioUML module for biological
pathways;
• module for models in SBML format;
• module for models in CellML format
(only biochemical models);
• GeneNet database module;
• module for KEGG/Pathways database (draft);
• module for TRANSPATH database (draft);
• GeneOntology (under development).
55
Standard BioUML module
for biological pathways
The module defines the most common biological data types
(gene, protein, RNA, substance, reaction, etc.), their
mapping to a simple text database, and three diagram
types for describing biological pathways on several
semantic levels:
1. Semantic network (ontology): this diagram type is used
to describe semantic relationships between system
components, system states, and related problem domain
concepts.
2. Pathway: this diagram type is used for formalized description
of biological pathway structure. It uses the
GeneNet graphical notation.
3. Pathway simulation: this diagram type is an extension of the pathway
structure diagram in which variables are associated with
graph nodes and differential equations with graph edges.
This allows the BioUML workbench to generate the
mathematical model of the system automatically and simulate its
dynamics.
56
57
58
59
• The suggested approach can be applied to modeling
biological systems using:
– Systems of ordinary differential equations
– Systems of differential-algebraic equations
– State and transition diagrams
– Hybrid models
– Boolean and logical networks
– Petri nets
– Markov chains
– Stochastic models
– …
• Limitations
– Spatial models
– PDE
– …
60
[Figure: BioUML architecture diagram, shown again: modules (Standard, GeneNet, KEGG/pathways, TRANSPATH, SBML), meta model, diagram editor, analysis and simulation tools, workbench UI, Eclipse platform runtime.]
61
Formal description and modeling of biological systems
require the coordinated efforts of different groups of
researchers:
• programmers: they provide software tools
for this task.
• problem domain experts: they specify what
should be described and how.
• experimenters and annotators: they describe
the corresponding data following these rules.
• mathematicians: they provide methods for
model analysis and simulation.
The BioUML architecture separates these tasks so that each can
be solved effectively by the corresponding group of
researchers, and provides a simple contract for how these
groups and the corresponding software parts should
communicate.
62
BioUML live demonstration
63
Biological databases
Data search and retrieval
Formal description of the structure of a
biological system
Visual modeling
Automated code generation for
simulation of model behavior
MATLAB code
Simulation using MATLAB.
JMatLink allows the BioUML
workbench to start MATLAB and
retrieve simulation results.
… code
Java code
Java simulation plug-in.
Contains ODE solvers ported
from odeToJava and methods
for hybrid model support.
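Automated code generation can be pictured as turning the equations attached to a model into source text for the target language. BioUML targets MATLAB and Java. The sketch below emits Python instead, purely so that the generated code can be executed in the same example. The model layout, the parameter values and the function `generate_source` are hypothetical.

```python
# Terms per species as plain strings, e.g. collected from the edges of a
# pathway simulation diagram. The rate constants are hypothetical.
MODEL = {
    "A": ["-k1*A"],
    "B": ["k1*A", "-k2*B"],
    "C": ["k2*B"],
}
PARAMETERS = {"k1": 0.5, "k2": 0.2}


def generate_source(model, parameters):
    """Emit Python source for a function dydt(...) that returns the derivatives."""
    species = list(model)
    lines = [f"def dydt({', '.join(species)}):"]
    for name, value in parameters.items():
        lines.append(f"    {name} = {value!r}")
    for name in species:
        lines.append(f"    d{name} = {' + '.join(model[name])}")
    lines.append(f"    return ({', '.join('d' + s for s in species)},)")
    return "\n".join(lines)


source = generate_source(MODEL, PARAMETERS)
namespace = {}
exec(source, namespace)   # compile the generated code into a callable
print(source)
print(namespace["dydt"](100.0, 0.0, 0.0))
```

The generated function can then be handed to any ODE solver, which is the division of labor the slide describes between the workbench and the MATLAB or Java simulation back ends.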
64
Cyclonet database
65
Cyclonet database
Using the BioUML workbench, we are developing the
Cyclonet database, a database on cell cycle
regulation in eukaryotes.
The database content is available via the Internet
using BeanExplorer Enterprise Edition.
The database contains information about cell-cycle-specific
genes, proteins, protein complexes and
their interactions, diagrams of cell cycle regulation
for vertebrates, models of the cell cycle and results of
their analysis, microarray data, literature
references and other related resources.
66
Cyclonet database content
Data:
for semantic networks
- concepts
- relations
for pathway diagrams
- cells
- compartments
- genes
- proteins
- RNAs
- substances
- reactions
for simulation
- constants
- equations
Diagrams
Publications
Resources:
- formats
- software
- sites, etc.
Microarray data
Analysis results
TRANSPATH
TRANSFAC
Dictionaries:
- databases
- relation types
- species
- units
- constants
PASS
EnSmart
GeneOntology
67
Cyclonet system architecture
Client side:
- Web browser
- BioUML workbench with the Cyclonet module
Server side:
- Servlet container: Tomcat
- BeanExplorer Enterprise Edition
- JDBC
- Cyclonet MySQL database
- GeneOntology MySQL database
68
69
70
Biopath (codename) database
The purpose of this project is to create a formal
description of the pathogenesis of chronic
respiratory diseases and to optimize their
treatment.
The work is performed jointly with the Institute
of Medical Problems of the North (Krasnoyarsk)
and the Institute of Pulmonology (Moscow).
71
Current work
• Graph search engine
– Graph search engine completion and testing
– Adaptation of a new graph layout library
• Complete support of SBML level 2
– Time delay
– Algebraic rules
– Units
– SBML semantic tests
• Formula editor
• State and transition diagrams.
72
Force-directed graph layout
(takes node sizes into account)
(layout time ~50 ms, 300 iterations)
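For readers unfamiliar with the technique, here is a toy force-directed layout. It is not the algorithm or library used by BioUML. The function `force_layout`, its constants and the way node size enters (distances are measured between node borders rather than centers) are assumptions made for this sketch.

```python
import math


def force_layout(sizes, edges, iterations=300, spring=50.0):
    """Toy layout: `sizes` maps node -> radius, `edges` is a list of node pairs."""
    nodes = list(sizes)
    n = len(nodes)
    # Deterministic start: place the nodes on a circle.
    pos = {v: [100.0 * math.cos(2 * math.pi * i / n),
               100.0 * math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    linked = {frozenset(edge) for edge in edges}
    for step in range(iterations):
        temperature = 10.0 * (1.0 - step / iterations)   # cap on movement per step
        disp = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Node size: use the border-to-border gap, not the center distance.
                gap = max(dist - sizes[u] - sizes[v], 1.0)
                force = spring * spring / gap            # repulsion between all pairs
                if frozenset((u, v)) in linked:
                    force -= gap * gap / spring          # attraction along edges
                fx, fy = force * dx / dist, force * dy / dist
                disp[u][0] += fx
                disp[u][1] += fy
                disp[v][0] -= fx
                disp[v][1] -= fy
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            scale = min(length, temperature) / length    # limit the step length
            pos[v][0] += disp[v][0] * scale
            pos[v][1] += disp[v][1] * scale
    return pos


layout = force_layout({"A": 10.0, "B": 10.0, "C": 20.0}, [("A", "B"), ("B", "C")])
print(layout)
```

In this sketch repulsion and attraction between linked nodes balance when the gap between their borders equals `spring`, so larger nodes end up with their centers further apart.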
73
74
75
76
Further steps
• Library of predefined kinetic laws
• Stochastic modeling
• BioPAX support
• New diagram type for gene regulatory regions
• TRANSPATH support
• Microarray data support
77
Availability
The BioUML workbench (including source code) is
freely available at www.itcsoftware.com
Cyclonet database, Biopath database: see the site for details.
78
Acknowledgments
This work was partially supported by the following
grants: Volkswagen-Stiftung (I/75941), INTAS
Nr. 03-51-5218 and RFBR Nr. 04-04-49826-а.
The author is grateful to the following people for useful
comments, discussions and technical support:
Alexander Kel
Sergey Zhatchenko
Software developers
Mikhail Puzanov
Vasiliy Hudyakov
Alexandr Koshukov
Annotators
Igor Tyazhev
Ruslan Sharipov
Vlad Zhvaleev
Elena Cheremushkina
Oleg Onegov
Artem Shaidukov
79
About ITC Software
ITC Software is an advanced information technology solutions provider, focused on
the global commercial and scientific markets.
With over 4047 software engineers in the Americas, Europe, Russia and Asia, and a host of
world-renowned scientists at, inter alia, Harvard and MIT, ITC Software’s services range from
research & development in finance, bioinformatics and FEA engineering, to cutting-edge
custom software development and maintenance services in most industry verticals, to the
turnkey establishment of Offshore IT Centers.
ITC Software has won numerous international awards and its centers are ISO 9001:2000
certified and CMMI level 4 pre-certified. Pricing is extremely competitive for both complex
scientific applications such as bioinformatics and biomechanics, and commercial software
development projects.
We hope you enjoyed our presentation !
Please call or email us, if you have any questions, need more information, would like to
embark on a research project, or require software development services
Tel. +1 978 287 4855
Become a high performance enterprise with Strategic Outsourcing, visit www.itcsoftware.com
Thank you!
80