Bergen Center for Computational Science

Bioinformatics & systems biology
Pål Puntervoll
Outline
• Bergen Center for Computational Science
– Computational Biology Unit
• Bioinformatics
– From sequence analysis to systems biology
• eSysbio – e-infrastructure for systems biology
– Software development; integration/workflows
• Department of UNIFOB AS
• Four research units:
– CBU: Computational Biology Unit
– CMU: Computational Mathematics Unit
– BCPL: Bergen Computational Physics Laboratory
– Parallab: high performance computing laboratory
• These groups share significant needs for:
– Computational resources
– Methods
– Algorithms
– Software
• A main objective of BCCS: boost cross-disciplinary activity
BCCS Administration
Research director: Petter Bjørstad
4 staff members
CBU
Leader: Inge Jonassen
Researchers: 8
Engineers: 3
PostDocs: 2
PhD students: 12
Total staff: 25
CMU
Leader: Øyvind Thiem
5
BCPL
Leader: Csaba Anderlik
1
2
2
2
2
9
5
BCCS: 55 employees
Parallab
Leader: Klaus Johannsen
4
5
2
11
Research groups:
• Inge Jonassen: Pattern discovery in molecular biology data
• Nathalie Reuter: Molecular modeling of proteins
• Boris Lenhard: Transcription and transcriptional gene regulation
• Igor Berezovsky: Protein stability and adaptation, protein-protein interactions, and evolution of protein function
• Rein Aasland: Functional annotation of proteins and protein domains
Service group:
• Pål Puntervoll: leader
• Yvan Strahm: service scientist
• Svenn Helge Grindhaug: system engineer/programmer
• Alexandr Oltu: system engineer
Services to Norwegian researchers:
- molecular biology/biochemistry
- biomedicine
- microbiology
Types of services:
- consultancy/support
- analysis
- programming
- online web tools
Courses/workshops
The central dogma of molecular biology
DNA (genes) → RNA → Protein (molecular machines of the cell)
DNA
…atg ggg ctc agc gac ggg gaa tgg cag ttg…
||| ||| ||| ||| ||| ||| ||| ||| ||| |||
…tac ccc gag tcg ctg ccc ctt acc gtc aac…
RNA
…aug ggg cuc agc gac ggg gaa ugg cag uug…
Protein
 M   G   L   S   D   G   E   W   Q   L  …
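The DNA, RNA and protein lines above can be reproduced with a minimal Python sketch. The codon table below is an illustrative subset covering only the codons shown on the slide, not the full genetic code, and the function names are chosen here for illustration.

```python
# Codon table restricted to the codons that appear on the slide
# (an illustrative subset, not the full 64-codon genetic code).
CODON_TABLE = {
    "aug": "M", "ggg": "G", "cuc": "L", "agc": "S", "gac": "D",
    "gaa": "E", "ugg": "W", "cag": "Q", "uug": "L",
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (t becomes u)."""
    return coding_dna.lower().replace("t", "u")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon; trailing incomplete codons are ignored."""
    usable = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


# Coding strand from the slide, written without spaces.
dna = "atggggctcagcgacggggaatggcagttg"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # auggggcucagcgacggggaauggcaguug
print(protein)  # MGLSDGEWQL
```

A real analysis would use a complete codon table, for example from an established bioinformatics library, and would handle stop codons.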
Myoglobin
< 90% identical
83% identical, 89% similar
41% identical, 62% similar
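Percent identity figures like those above are computed from a pairwise alignment. The sketch below shows one common convention: identical positions divided by aligned, non-gap positions. The slide does not state which convention or which sequences were used, so the function and the second example sequence are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned (non-gap) columns.

    Both inputs must come from the same alignment and therefore have
    equal length; '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# First sequence: the myoglobin start from the slide.
# Second sequence: a made-up variant with one substitution (hypothetical).
print(percent_identity("MGLSDGEWQL", "MGLSDAEWQL"))  # 90.0
```

Percent similarity additionally counts substitutions between chemically similar amino acids, which requires a substitution matrix and is left out of this sketch.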
DNA sequence databases
GenBank
EMBL
DDBJ
Growth of GenBank
“… from 1982 to the present, the number of bases in GenBank has doubled approximately every 18 months.”
[Plot: bases vs. year]
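A doubling time of about 18 months corresponds to exponential growth. The short sketch below works through the arithmetic, assuming a constant doubling time of 1.5 years; the function name is illustrative, and the result is a growth factor, not an actual GenBank base count.

```python
def growth_factor(years: float, doubling_time_years: float = 1.5) -> float:
    """Factor by which a quantity grows if it doubles every doubling_time_years."""
    return 2.0 ** (years / doubling_time_years)


print(growth_factor(1.5))  # 2.0, one doubling
print(growth_factor(3.0))  # 4.0, two doublings
# Over the 25 years from 1982 to 2007 the same rule gives a factor on the
# order of 10**5, assuming the doubling time stayed constant throughout.
print(f"{growth_factor(25.0):.3g}")
```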
Data in GenBank
December 2007
• More than 190 billion bases
• More than 570 complete microbial genomes
– 200 added in the last year
• More than 190 eukaryotic genomes
– With significant coverage
• More than 260,000 named species
– 1,700 new species per month
Challenges in bioinformatics
• A typical analysis involves many resources
– Local tools and databases
– Web tools and databases
• No common data model for biological data
– Some exceptions
Stein, L., Nature. 2002 417(6885):119-120
“EMBRACE is an EU-sponsored Network of Excellence aimed at enabling bioinformatics research through better operability of databases, servers, and services.”
Taverna Workbench
a free software tool for designing and executing workflows
Workflow for identifying candidate human pathways from differentially expressed genes
Paul Fisher
http://www.myexperiment.org/workflows/143
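The idea of a workflow, chaining analysis steps so that the output of one feeds the next, can be sketched in plain Python. This is not Taverna's workflow format and not the myExperiment workflow referenced above; the gene names, expression values, pathway names and the two-fold threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical toy inputs: (control, treated) expression values per gene,
# and a made-up gene-to-pathway annotation.
EXPRESSION = {
    "geneA": (100.0, 400.0),
    "geneB": (200.0, 210.0),
    "geneC": (300.0, 75.0),
}
GENE_TO_PATHWAYS = {
    "geneA": ["pathway1", "pathway2"],
    "geneB": ["pathway2"],
    "geneC": ["pathway1"],
}


def differentially_expressed(expression, min_fold=2.0):
    """Step 1: keep genes whose expression changes at least min_fold-fold."""
    hits = []
    for gene, (control, treated) in expression.items():
        # Assumes strictly positive expression values.
        fold = max(control, treated) / min(control, treated)
        if fold >= min_fold:
            hits.append(gene)
    return hits


def candidate_pathways(genes, gene_to_pathways):
    """Step 2: count how many selected genes fall into each pathway."""
    counts = Counter()
    for gene in genes:
        counts.update(gene_to_pathways.get(gene, []))
    return counts


def run_workflow(expression, gene_to_pathways):
    """Chain the two steps: the output of step 1 is the input of step 2."""
    genes = differentially_expressed(expression)
    return candidate_pathways(genes, gene_to_pathways)


result = run_workflow(EXPRESSION, GENE_TO_PATHWAYS)
print(result.most_common())  # [('pathway1', 2), ('pathway2', 1)]
```

In a workflow tool the individual steps would typically be remote services or local tools rather than Python functions, with the tool handling the data flow between them.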
Metabolism map
Metabolism: “the set of chemical reactions that occur in living organisms in order to maintain life”
KEGG, http://www.genome.jp/kegg/atlas/
DNA microarrays: differential gene expression
Systems biology
“Systems biology is the science of discovering, modelling, understanding and ultimately engineering at the molecular level the dynamic relationships between the biological molecules that define living organisms.”
Leroy Hood, President, Institute for Systems Biology
eSysbio
an e-science environment for systems biology
“The key to performing excellent systems biology research is the ability to exploit all available sources of information, and to integrate existing knowledge with newly generated experimental data.”
“To facilitate the interdisciplinary interaction … we propose to construct a collaborative virtual workspace … an integrated environment for entering, analysing and accessing data.”
The buzzword slide
• Web services
– XML
– WSDL
– SOAP (XML messages)
• Service-oriented architecture
• GRID technology (distributed computing)
• Semantic web
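To make the "SOAP (XML messages)" bullet concrete, the sketch below builds a minimal SOAP 1.1 request envelope with Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the operation name `getSequence` and the parameter are hypothetical and do not refer to any real service.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for illustration.
SERVICE_NS = "http://example.org/sequence-service"


def build_request(operation: str, params: dict) -> bytes:
    """Build a SOAP envelope whose body holds one operation element."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical operation and parameter.
message = build_request("getSequence", {"accession": "EXAMPLE_ID"})
print(message.decode("utf-8"))

# Parsing the message back shows that it is ordinary XML.
parsed = ET.fromstring(message)
```

In practice the operations and message types a service accepts are described in its WSDL file, and client libraries generate such envelopes from that description.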