Structural proteomics of Thermotoga maritima.

JCSG Bioinformatics core
overview: 2006
BIC - last two years
• Organizational and personnel changes
– Two sites (UCSD & Burnham)
– Six people left, five new people hired
• Transformation to a production center
– Core tools developed, but significant tool
development still ongoing
– Increasing role of data analysis
John Wooley (UCSD)
co-PI
Josie Alaoen
Adam Godzik
core leader
Slawek Grzechnik
Core manager
Andrew Morse
Lukasz Jaroszewski
Tamara Astakhova
Krishna Subramanian
Lian Duan
Dana Weekes
Naomi Cotton
Piotr Kozbial
Hemant Joshi*
Bioinformatics - convergence of
methods, but also challenges
• Maximizing production
• Data management for
high throughput
• Covering the universe of
proteins with structures
• Maximizing impact of
structures
• Making sense of
structures one at a time
• Understanding protein
universe using structures
Bioinformatics core of JCSG - integration within
and outside
• Integrating data across
JCSG
– Flow of data connects cores
across physical locations,
different proteins
– “intuitive crystallography”
doesn’t scale up to high
throughput, centralized data
management does
– Growing production,
growing challenges, new
robot, new databases
• Leveraging JCSG
experiences and results
– CAMERA: developing new
generation of biological
databases, new horizons in
protein universe
– JCMM: improving modeling by
protein structure analysis
– “experimental bioinformatics”: JCSG structures and
bioinformatics function
predictions guiding
biochemistry and biology
experiments
CAMERA: first look at the ever
expanding universe of proteins
• New type of genomics
• New types of data (and
lots of it)
– 17M new (predicted)
proteins! 4-5x growth in
just a few months
– New challenges of really
high throughput genomics
– Genomics without genomes
- metagenomics and its
challenges
Joint Center for Molecular Modeling
• Newly funded (3/28/06) P20
center in response to NIGMS
RFA “High accuracy protein
structure modeling”
• Burnham/UCSD collaboration
• PI - Adam Godzik, co-PIs - Pavel Pevzner (UCSD), Yuzhen
Ye (Burnham)
• Goals:
– Improve modeling by analysis of
existing structures
• Methods
– New approaches to structure
comparison
• Evolution of protein structures
• Protein is a graph
• Comparing graphs has a long
history and many tools are
available
– New ways of evaluating protein
models
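The “protein is a graph” idea above can be illustrated with a residue contact graph. This is only a minimal sketch, not the JCMM method: the function name, the 8 Å C-alpha cutoff, the minimum sequence separation and the toy coordinates are all assumptions made for illustration.

```python
import math

def contact_graph(ca_coords, cutoff=8.0, min_separation=2):
    """Link residues whose C-alpha atoms lie within `cutoff` (in Å).

    The cutoff and min_separation values are illustrative assumptions.
    Returns a dict mapping residue index -> set of contacting indices.
    """
    n = len(ca_coords)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        # Skip near neighbours along the chain; they are always close.
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph

# Toy coordinates (Å), not taken from any real structure.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
       (7.6, 3.8, 0.0), (3.8, 3.8, 0.0)]
graph = contact_graph(toy)
```

Once a structure is in this form, general-purpose graph comparison tools can be applied to it, which is the point the slide makes.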
These tools allow us to study entire
structural families
Multiple structural alignment is
actually a graph (POG)
• Partial order graphs have been extensively
studied in mathematics and have many
interesting properties
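A partial order graph can be sketched by treating each structure as a path through shared aligned blocks and merging the paths into one directed acyclic graph. The block labels, function names and the use of Kahn's algorithm below are illustrative assumptions, not the POG implementation referred to in the slides.

```python
from collections import defaultdict, deque

def build_pog(paths):
    """Merge per-structure paths over shared block labels into one directed graph."""
    edges = defaultdict(set)
    nodes = set()
    for path in paths:
        nodes.update(path)
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    return nodes, edges

def topological_order(nodes, edges):
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    for a in edges:
        for b in edges[a]:
            indegree[b] += 1
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in sorted(edges.get(n, ())):
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle found: the blocks do not form a partial order")
    return order

# Hypothetical block labels: three structures share blocks A and D
# but take different routes between them.
paths = [["A", "B", "D"], ["A", "C", "D"], ["A", "D"]]
nodes, edges = build_pog(paths)
order = topological_order(nodes, edges)
```

Blocks B and C are not ordered relative to each other in the graph itself, which is what makes the order partial rather than linear; the sorted tie-break here only makes the output deterministic.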
Using these tools we can identify
“microdomains” in proteins
[Figure: structural alignment of 15 protein kinase domains (SCOP family d.144.1.7): d1a06_, d1blxa, d1byga, d1ckia, d1cm8a, d1csn_, d1f3mc, d1fgka, d1fmk_, d1fota, d1fvra, d1gjoa, d1gz8a, d1gzka, d1h4la]
Aligned segments length: 98 aa, Cα-RMSD: 1.8 Å
These “microdomains” move
independently from each other
[Figure: a second aligned segment in the same 15 protein kinase domains (SCOP family d.144.1.7) as listed above]
Aligned segments length: 33 aa, Cα-RMSD: 1.9 Å
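For readers unfamiliar with the C-alpha RMSD values quoted for the aligned segments, the quantity can be sketched as below. This assumes the two coordinate sets are already superposed and paired residue by residue; the superposition step itself is not shown, and the function name and toy coordinates are illustrative.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD = sqrt(sum(d_i^2) / N) over paired, pre-superposed C-alpha atoms."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Toy coordinates (Å): every atom in b is shifted by exactly 1 Å from a.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
rmsd = ca_rmsd(a, b)
```

In this toy case every pair is 1 Å apart, so the RMSD is 1.0 Å.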
Universe of protein structures and PSI goals
Fold
Superfamily
Family
Evolution of folds and structures
[Figure: folds covered by the PDB, predicted new superfamilies in known folds, and expected new superfamilies in yet-to-be-discovered (“new”) folds]
Nothing in Biology Makes Sense Except in the
Light of Evolution
You are here
JCSG is here
But most elements
of machinery of life
were developed here
Tree of life from Carl Woese, et al
We are built from the same parts!
E. coli – rat oxidoreductase
RMSD of 2.5 Å on 140 positions
7% (!!!!) sequence identity
E. coli – human ribokinase
RMSD of 2.4 Å on 300 aa
18% sequence identity
E. coli – mouse ribonucleotide reductase
RMSD of 2.2 Å on 320 aa
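The sequence identity figures quoted for these pairs depend on how identity is counted. A minimal sketch, assuming identity is the fraction of matching residues over aligned (non-gap) positions; other conventions use a different denominator, and the toy sequences below are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

# Invented toy alignment: 8 gap-free columns, 5 of them identical.
toy_a = "MKT-AYIAKQ"
toy_b = "MRTGAY-SKE"
identity = percent_identity(toy_a, toy_b)
```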
Some statistics
• At least 70% of all human proteins have at least
one domain that has homologs in bacteria
• Ribosomal proteins and enzymes involved in
central metabolism are well represented, but so
are stress response and regulatory proteins (and
a lot of domains with unknown functions).
Domains of Central Machinery of Life
[Figure: Venn diagram of Pfam families with no fold prediction, present in eukaryotes, and present in prokaryotes; values shown: 430 and 0]
Distribution of CML
targets in different
prokaryotes
Aeropyrum_pernix
Archaeoglobus_fulgidus
Bacillus_cereus_ATCC14579
Bacillus_halodurans
Bacillus_subtilis
Bacteroides_thetaiotaomicron_VPI-5482
Bartonella_henselae_Houston-1
Bordetella_bronchiseptica
Bordetella_parapertussis
Bordetella_pertussis
Borrelia_burgdorferi
Campylobacter_jejuni
Caulobacter_crescentus
Chlorobium_tepidum_TLS
Chromobacterium_violaceum
Clostridium_acetobutylicum
Clostridium_perfringens
Corynebacterium_diphtheriae
Corynebacterium_glutamicum_ATCC_13032_Bielefeld
Deinococcus_radiodurans
Desulfovibrio_vulgaris_Hildenborough
Enterococcus_faecalis_V583
Escherichia_coli_K12
Geobacter_sulfurreducens
Haemophilus_ducreyi_35000HP
Haemophilus_influenzae
Halobacterium_sp
Helicobacter_pylori_26695
Lactobacillus_plantarum
Listeria_innocua
Listeria_monocytogenes
Mesorhizobium_loti
Methanococcus_jannaschii
Methanococcus_maripaludis_S2
Mycoplasma_genitalium
Neisseria_gonorrhoeae_FA_1090
Neisseria_meningitidis_Z2491
Nitrosomonas_europaea
Nostoc_sp
Porphyromonas_gingivalis_W83
Pseudomonas_aeruginosa
Pseudomonas_putida_KT2440
Pseudomonas_syringae
Pyrococcus_furiosus
Pyrococcus_horikoshii
Rhodopseudomonas_palustris_CGA009
Salmonella_enterica_Choleraesuis
Shewanella_oneidensis
Shigella_flexneri_2a
Staphylococcus_aureus_MW2
Staphylococcus_epidermidis_ATCC_12228
Streptococcus_agalactiae_2603
Streptococcus_mutans
Streptococcus_pneumoniae_TIGR4
Streptococcus_pyogenes_M1_GAS
Streptomyces_avermitilis
Sulfolobus_solfataricus
Thermoplasma_acidophilum
Thermotoga_maritima
Thermus_thermophilus_HB27
Xanthomonas_campestris
Xylella_fastidiosa
[Figure: bar chart of CML targets for each of the organisms listed above; axis ticks 10 to 100]
CML targets - first results
Expanding the scope of target selection
[Figure: Venn diagram of Pfam families with no fold prediction that are present in prokaryotes; value shown: 1367]
PFAM targets - very first results
Next steps - going where no PFAM has gone before
[Figure: universe of known proteins with the Pfam region marked; value shown: 400]
The future - how large is the universe of proteins? First
GOS results
[Figure: GOS data (“and we know it's just the beginning”) compared with the universe of proteins we know today and the Pfam region; value shown: 200]
Growing structural coverage of T. maritima
• Direct structural coverage of 32% of the expressed soluble proteins and ~13%
of the proteome (238 unique PDB structures).
• With homology and fold recognition models, over 72% (89% of predicted
crystallizable non-orphan proteins), one of the highest structural coverages of
any organism.
Structural coverage of T. maritima proteome
[Figure: % of proteome covered (0 to 80) by year (1980 to 2005), with curves for three criteria: sequence identity > 30%, BLAST e-value < 0.001, FFAS score < -9.5; annotation: ~73% of feasible targets in PDB]
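The three criteria in the coverage plot can be read as successively more permissive definitions of a protein being “covered” by a known structure. The sketch below assumes that reading; the function names, record fields and example values are hypothetical, and only the three thresholds come from the slide.

```python
def coverage_tier(seq_identity=None, blast_evalue=None, ffas_score=None):
    """Return the first criterion met, checked from strictest to most permissive.

    The ordering of the criteria is an assumption; the thresholds are from the slide.
    """
    if seq_identity is not None and seq_identity > 30.0:
        return "sequence identity > 30%"
    if blast_evalue is not None and blast_evalue < 0.001:
        return "BLAST e-value < 0.001"
    if ffas_score is not None and ffas_score < -9.5:
        return "FFAS score < -9.5"
    return None

def percent_covered(best_hits):
    """Percent of proteins whose best hit meets at least one criterion."""
    if not best_hits:
        return 0.0
    covered = sum(1 for hit in best_hits if coverage_tier(**hit) is not None)
    return 100.0 * covered / len(best_hits)

# Hypothetical best-hit records for four proteins (values are invented).
best_hits = [
    {"seq_identity": 45.0, "blast_evalue": 1e-40, "ffas_score": -60.0},
    {"seq_identity": 22.0, "blast_evalue": 1e-5, "ffas_score": -30.0},
    {"seq_identity": 12.0, "blast_evalue": 0.5, "ffas_score": -12.3},
    {"seq_identity": 9.0, "blast_evalue": 3.0, "ffas_score": -5.0},
]
tiers = [coverage_tier(**hit) for hit in best_hits]
coverage = percent_covered(best_hits)
```

Here three of the four invented proteins meet at least one criterion, so coverage is 75%; the fourth is covered by none of them.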
What is the real impact of PSI - are new folds most important?
TM0875 from T. maritima
• new fold
• no homologs – an “orphan”
• no corresponding Pfam family
53686717 from N. punctiforme
• two domains of known folds but no
recognizable sequence similarity to
known structures
• C-terminal domain provides the first
structural template for a Pfam family of over
500 sequences (PF00877)
GNF & TSRI
Crystallomics Core
Scott Lesley
Mark Knuth
Dennis Carlton
Marc Deller
Thomas Clayton
Michael DiDonato
Glen Spraggon
Andreas Kreusch
Daniel McMullan
Heath Klock
Polat Abdubek
Eileen Ambing
Joanna C. Hale
Eric Hampton
Eric Koesema
Edward Nigoghossian
Aprilfawn White
Sanjay Agarwalla
Christina Trout
Ylva Elias
Hope Johnson
Jessica Paulsen
Linda Okach
Bernhard Geierstanger
Julie Feuerhelm
Jessica Canseco
UCSD & Burnham
Bioinformatics Core
John Wooley
Adam Godzik
Slawomir Grzechnik
Lukasz Jaroszewski
Sri Krishna Subramanian
Andrew Morse
Tamara Astakhova
Lian Duan
Piotr Kozbial
Naomi Cotton
Dana Weekes
Lukasz Slabinski
Josie Alaoen
TSRI
NMR Core
Kurt Wüthrich
Reto Horst
Maggie Johnson
Marcius Almeida
Michael Gerault
Wojtek Augustyniak
Pedro Serrano
Bill Pedrini
Stanford/SSRL
Structure
Determination Core
Keith Hodgson
Ashley Deacon
Mitchell Miller
Herbert Axelrod
Hsiu-Ju (Jessica) Chiu
Kevin Jin
Christopher Rife
Qingping Xu
Silvya Oommachen
Henry van den Bedem
Scott Talafuse
Ronald Reyes
Abhinav Kumar
Jonathan Caruthers
Chloe Zabieta
Amanda Prado
TSRI
Administrative Core
Ian Wilson
Marc Elsliger
Jason Kay
Gye Won Han
David Marciano
Scientific Advisory Board
Sir Tom Blundell
Univ. Cambridge
Homme Hellinga
Duke University Medical Center
James Naismith
The Scottish Structural Proteomics facility
Univ. St. Andrews
James Paulson,
Consortium for Functional Glycomics,
The Scripps Research Institute
Robert Stroud,
Center for Structure of Membrane Proteins,
Membrane Protein Expression Center
UC San Francisco
Todd Yeates,
UCLA-DOE, Inst. for
Genomics and Proteomics
Soichi Wakatsuki,
Photon Factory, KEK, Japan
James Wells,
UC San Francisco
The JCSG is supported by the NIH Protein
Structure Initiative grant U54 GM074898
from the National Institute of General
Medical Sciences (www.nigms.nih.gov).