Bioinformatics: extracting meaning from (пос

Bioinformatics:
Data-driven molecular biology
Mikhail Gelfand
A.A.Kharkevich Institute for Information Transmission Problems, RAS
Moscow
II Spanish-Russian Forum on Information and Communication Technologies
Madrid, 21-25 / IX / 2009
Exponential increase of data volume
[Figure: growth curves on a logarithmic vertical axis (100 to 100,000,000,000), horizontal axis 1982-2007]
red – papers (PubMed)
blue – sequence fragments (GenBank)
green – nucleotides (GenBank)
Of 18 million papers in PubMed, ~675 thousand have the keywords “bioinformat* OR comput*”
622 complete genomes (bacteria)
[Figure: bar chart of complete bacterial genomes per year, 1995-2007; vertical axis 0-200. Bar labels (assignment to individual years lost in extraction): 3, 3, 6, 6, 7, 30, 25, 19, 48, 66, 81, 142, 186]
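As a quick consistency check, the bar labels on this slide can be summed and compared with the stated total of 622 complete bacterial genomes. The sketch below assumes that each of the 13 labels is the count for one of the years 1995-2007. The transcript does not preserve which label belongs to which year, so only the total is checked; the variable names are illustrative.

```python
# Bar labels as they appear in the slide transcript.
# Assumption: one label per year, 1995-2007; the year order is NOT preserved.
bar_labels = [186, 142, 81, 66, 48, 3, 3, 6, 6, 7, 30, 25, 19]
years = list(range(1995, 2008))  # 1995 .. 2007 inclusive

# The slide title states 622 complete bacterial genomes in total.
stated_total = 622
total_genomes = sum(bar_labels)

print(f"{len(bar_labels)} labels for {len(years)} years, sum = {total_genomes}")
print("matches stated total:", total_genomes == stated_total)
```

The labels add up to the total given in the slide title, which supports reading them as per-year counts rather than cumulative ones.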
>45 thousand Google hits on “genome deciphered”
Top 10 hits:
• bioremediation
– bacterium Pseudomonas
• agriculture and biotech
– crop and biofuel plant Sorghum
– rice
• medicine
– pathogenic bacterium Staphylococcus
– SARS (atypical pneumonia) virus
– Brugia worm (elephantiasis)
• individual genome (medicine)
– James Watson
• science / model organism
– macaque
• science / evolution
– mammoth (mitochondrial)
– platypus
Sequencing is just the beginning
Bacterial genome: several million nucleotides
600 to 9,000 genes (~90% of the genome codes for proteins)
This slide: 0.1% of the Escherichia coli genome
Human genome: 3 billion nucleotides, 25-30 thousand genes
polymorphisms (individual differences): ~1 per 1,000 nucleotides
differences between human and chimpanzee: ~1 per 100 nucleotides
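The rates on this slide translate into absolute numbers with simple arithmetic. The sketch below is a rough, order-of-magnitude illustration only. The genome length and rates are the rounded figures from the slide; the E. coli genome size (~4.6 million nucleotides) is an assumption added here and is not stated in the talk; the function name is hypothetical.

```python
# Rounded figures quoted on the slide.
HUMAN_GENOME_NT = 3_000_000_000      # 3 billion nucleotides
NT_PER_POLYMORPHISM = 1_000          # ~1 individual difference per 1,000 nt
NT_PER_HUMAN_CHIMP_DIFF = 100        # ~1 human/chimpanzee difference per 100 nt

# Assumption (not from the slide): approximate size of the E. coli genome.
ECOLI_GENOME_NT = 4_600_000


def expected_differences(genome_length_nt: int, nt_per_difference: int) -> int:
    """Expected number of differing positions at one difference per N nucleotides."""
    return genome_length_nt // nt_per_difference


polymorphisms = expected_differences(HUMAN_GENOME_NT, NT_PER_POLYMORPHISM)
human_chimp = expected_differences(HUMAN_GENOME_NT, NT_PER_HUMAN_CHIMP_DIFF)
# 0.1% of a genome is one nucleotide in every 1,000.
slide_fragment_nt = ECOLI_GENOME_NT // 1_000

print(f"polymorphisms between two humans: ~{polymorphisms:,}")
print(f"human/chimpanzee differences:    ~{human_chimp:,}")
print(f"0.1% of the E. coli genome:      ~{slide_fragment_nt:,} nt")
```

Under these rounded inputs, two human genomes differ at roughly three million positions, and human and chimpanzee at roughly thirty million.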
Not just genomes
Other types of large-scale experiments / datasets:
• State of the genome (gene expression)
– methylation
– nucleosome positioning
– histone modifications
• Transcriptomics, protein abundance (gene expression)
• Protein-protein interactions
– signaling etc.
– functional complexes
• Protein-DNA interactions (regulation)
• etc.
Goals
• Functional annotation of genes and proteins
– biological function
– regulation (in what conditions)
• Functional annotation of genomes
– metabolic reconstruction and modeling
– regulatory networks and development
– prediction of organism properties from its genome
Applications: biotechnology
• Improvement of production strains (chemistry, pharma, food industry)
– via modeling of metabolic pathways
• New enzymes (new functions, stress tolerance)
– via sequencing and functional annotation
• Biofuels
– fast-growing, stress-tolerant plants; identification of genes
– microbes as producers of ethanol or fatty acids: targeted genome design
Applications: medicine and pharma
• Personalized medicine
– identification of predisposing alleles: lifestyle
– pharmacogenomics (metabolic alleles)
– diagnostics
• Drug targets (chronic disease)
– analysis of signaling pathways
• Anti-infectives
– identification of drug targets
• Drug design; identification of drug candidates
– modeling of protein structure and interactions of proteins with small molecules
Methods. Integration of data
• Systems biology: integration of diverse datasets for one organism
• Comparative genomics: simultaneous analysis of genomic data for many organisms
• Comparative systems biology: understanding the evolution of gene regulation and expression, signaling etc.
• Comparative structural biology
Bioinformatics in Russia
• Few high-throughput experiments
– Open data
– Collaborations
– Theory (evolution), methods, algorithms
• Highlights:
– Evolution (IITP RAS) and taxonomy (IPCB MSU)
– Regulation (FBB MSU, GosNIIGenetika, IITP RAS, ICaG SB RAS)
– Annotation (FBB MSU, IITP RAS)
– Protein Structure (IPR RAS, IMB RAS, IPCB MSU, BF MSU)
– Modeling
• Metabolism (IPCB MSU, ICaG SB RAS)
• Regulation (SpBSPU, ICaG SB RAS)
– Drug design (IBMC RAMS)
Research and Training Center “Bioinformatics”, Institute of Information Transmission Problems (5 years: 2003-2009)
• Molecular evolution
– Alternative splicing as a driver of evolution in eukaryotes
– Positive selection
• Comparative genomics of regulation in bacteria
– Evolution of regulatory pathways
– Protein-DNA interactions
• Annotation
– Gene recognition
– Functional annotation
– Regulation
Comparative genomics in action:
confirmed predictions
• Regulatory mechanisms
– riboswitches (thiamin – vitamin B1, riboflavin – vitamin B2)
– antisense regulation of the methionine-cysteine pathway
– role of the ribosome in zinc homeostasis
• Regulators: NrdR, MtaR/MetR, CmbR, NiaR
• Enzymes: FadE, ThiN, TenA, CobZ, CobX/CbiZ, PduX, NagP, NagB-II
• Microcins (capistruin, Burkholderia thailandensis)
• Transporters
– ABC transporters with universal energizing components: Co, Ni, biotin (vitamin H), thiamin (vitamin B1), riboflavin (vitamin B2)
– other: threonine, methionine, oligogalacturonides, N-acetylglucosamine, corrinoids, niacin, riboflavin, Co
• Regulatory motifs: nitrogen fixation, fatty acid biosynthesis, iron homeostasis, catabolism of chitin and pectin
• Regulatory sites: several dozen
Functional annotation of genomes
Translation
Transcription
Replication and repair
Cell division
Signaling pathways
Outer membrane
Motility
Protein turnover
Ions
Defense
Secretion
Energy
Sugars
Amino acids
Nucleotides
Coenzymes
Lipids
Secondary metabolism
Poorly characterized
Uncharacterized
First Russian bacterial genome, Acholeplasma laidlawii (2008):
sequencing and proteomics: Institute of Physico-Chemical Medicine;
annotation: IITP. ~1.5 Mb; ~1,400 genes.
Function established for ~80% of genes; metabolic reconstruction
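The figures for Acholeplasma laidlawii can be turned into two derived numbers: the average stretch of genome per gene and the approximate count of genes with an established function. This is a minimal sketch using only the rounded values from the slide; the variable names are illustrative, and the results are no more precise than the "~" in the inputs.

```python
# Rounded figures from the slide for Acholeplasma laidlawii.
GENOME_LENGTH_NT = 1_500_000     # ~1.5 Mb
GENE_COUNT = 1_400               # ~1,400 genes
ANNOTATED_PERCENT = 80           # function established for ~80% of genes

# Average genome length per gene (includes intergenic regions).
nt_per_gene = GENOME_LENGTH_NT // GENE_COUNT

# Approximate number of genes with and without an established function.
annotated_genes = GENE_COUNT * ANNOTATED_PERCENT // 100
unannotated_genes = GENE_COUNT - annotated_genes

print(f"~{nt_per_gene:,} nt of genome per gene")
print(f"~{annotated_genes:,} genes with established function")
print(f"~{unannotated_genes:,} genes without established function")
```

About one gene per thousand nucleotides is consistent with the earlier statement that roughly 90% of a bacterial genome codes for proteins.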
Publications (refereed)
[Figure: bar chart by year, 2003-2009 and average; vertical axis 0-35; legend: Book Chapters, Russian Journals, International Journals, Collaboration (USA), Collaboration (Europe)]
Collaborations
• European Molecular Biology Laboratory *
• Germany
– Humboldt University, Berlin
– Munich Technical University
• France
– Lyon University
• United Kingdom
– University of East Anglia
• Spain
– Center for Genome Regulation (Barcelona)
• USA
– MIT
– Burnham Institute *
– Lawrence Berkeley National Laboratory *
– Stowers Institute *
– Rutgers University
• China
– China-Germany Partner Institute of Molecular Genetics (Shanghai)
• Industry
– Biomax (Germany)
– Integrated Genomics (USA)
Bold: on-going
* Former students