Transcript Slide 1

Ligand Search and Data Mining of
Structural Genomics Structures
Abhinav Kumar, Herbert Axelrod, Ashley Deacon
Structure Determination Core, Joint Center for Structural Genomics (JCSG), Stanford Synchrotron Radiation Laboratory, Menlo Park, CA, USA
The Joint Center for Structural Genomics (JCSG)
The Role of the Structure Determination Core in the JCSG
The JCSG (www.jcsg.org) is one of the four large-scale structural genomics centers funded by
NIGMS as part of the production phase of the Protein Structure Initiative (PSI). More than 2600
structures have been deposited into the PDB by the PSI centers as of 2007, of which the JCSG
has contributed well over 500 structures. Although the major part of JCSG's resources is
dedicated to protein structure determination, we are also making efforts to disseminate
information gained from these structures to a larger community of researchers. Here we report
the development of a web-based data mining engine
(smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl) that queries all of the PSI
structures based on a variety of search criteria. The main objective is to extract ligands,
biological or otherwise, bound to these structures, and to explore them further with a number of
associated links. In addition, the structures can be queried by a host of other criteria, such as
target names, PDB IDs, PFAM family names, structure descriptions, organisms, and PSI
centers. Preliminary analysis indicates that 1515 of these PSI structures have some type of
bound ligand, metal or solvent molecules, and 262 contain 136 unique biological ligands.
Interestingly, several of these ligands had not been previously identified in structures in the
PDB. In addition, 21 different co-factors have been observed in 210 structures.
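The ligand extraction described above can be illustrated with a minimal Python sketch. It is not the JCSG engine (a Perl CGI script whose internals are not shown on this poster); it only assumes the standard PDB file layout, in which HETATM records carry the residue name in columns 18-20, the chain ID in column 22, and the residue number in columns 23-26. The `CATEGORY` lookup, the helper `hetatm_line`, and the sample records are hypothetical and use only a few ligand codes named on this poster.

```python
from collections import Counter

# Illustrative lookup with a few codes from this poster's categories.
# The real engine's category tables are not shown; this mapping is an assumption.
CATEGORY = {
    "FMN": "co-factor", "NAD": "co-factor", "FAD": "co-factor",
    "MG": "metal ion", "ZN": "metal ion", "CA": "metal ion",
    "SO4": "non-metal ion", "CL": "non-metal ion",
    "GOL": "cryo", "EDO": "cryo",
    "UNL": "ligand",
}


def hetatm_line(serial, atom, res, chain, resseq):
    """Build a minimal, column-correct HETATM record for the demo below."""
    return f"HETATM{serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}"


def het_groups(pdb_text):
    """Count hetero groups per residue code, ignoring water (HOH)."""
    groups = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        code = line[17:20].strip()          # residue name, columns 18-20
        if code == "HOH":
            continue
        chain = line[21:22]                 # chain ID, column 22
        resseq = line[22:26].strip()        # residue number, columns 23-26
        groups.add((code, chain, resseq))   # one entry per bound group
    return Counter(code for code, _chain, _num in groups)


def categorize(counts):
    """Map each observed code to a category, or 'uncategorized'."""
    return {code: CATEGORY.get(code, "uncategorized") for code in counts}


# Made-up records: one FMN (two atoms), two sulfates, one water.
sample = "\n".join([
    hetatm_line(1, "N1", "FMN", "A", 201),
    hetatm_line(2, "C2", "FMN", "A", 201),
    hetatm_line(3, "S", "SO4", "A", 202),
    hetatm_line(4, "S", "SO4", "A", 203),
    hetatm_line(5, "O", "HOH", "A", 301),
])
found = het_groups(sample)
print(found)
print(categorize(found))
```

Counting groups rather than atoms keeps a multi-atom ligand such as FMN from being counted once per atom.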
JCSG Ligand Search
1. Screen Crystals and Collect Data
2. Automatically Process Data
Autoindex
Integrate
Scale
Solve
Trace
3. Refine and Evaluate Structures
4. Disseminate Information*
• Publish
• Web-based Tools
• TOPSAN (www.topsan.org)
• Ligand Search (smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl)
* in collaboration with JCSG Bioinformatics Core
Non-metal Ions (692 structures; 22 different non-metal ions): SO4(324), CL(243), PO4(118), NO3(11), IOD(10), BR(10), SCN(8), CO3(4), CAC(4), POP(3), AZI(3),
SUL(2), BCT(2), ALF(2), OXL(2), PER(1), SO3(1), MLI(1), PO3(1), THJ(1), 1AL(1), NH4(1)
Organics (90 structures; 26 different organics): IPA(14), EOH(13), BME(9), BEZ(5), TLA(5), SEO(5), AKG(5), ETX(4), TAR(4), PGO(4), DTT(4), OAA(2), ACE(2),
DMS(2), MLA(1), DOX(1), XYL(1), MOH(1), 3OH(1), AZ1(1), PPI(1), IOH(1), FOR(1), MYR(1), GTT(1), LMT(1)
Detergents (2 structures; 1 detergent): BOG(2)
Salts (3 structures; 3 different salts): DPO(1), AF3(1), PPC(1)
Precipitants (98 structures; 13 different precipitants): PEG(38), PG4(28), PGE(16), 1PE(8), P6G(7), 2PE(3), PE4(3), P33(3), PE5(2), PEF(1), BU3(1), 1PG(1), PE8(1)
Buffers (240 structures; 15 different buffers): ACT(86), ACY(47), FMT(37), CIT(27), TRS(16), EPE(15), MES(12), IMD(8), TMN(2), 10A(2), BTB(2), ICT(1), CPS(1),
FLC(1), NHE(1)
Metal Ions (647 structures; 30 different metal ions): MG(177), ZN(174), NA(102), CA(83), NI(40), MN(31), FE(26), K(16), FE2(9), CD(8), PT(8), HG(7), CO(5), SM(2),
WO4(2), PR(2), AU(2), BA(1), CS(1), MW2(1), SE(1), ARS(1), ZN3(1), O4M(1), YT3(1), LI(1), MO2(1), MO3(1), VO4(1), MO6(1)
Co-factors (211 structures; 21 different co-factors): FMN(36), NAD(29), COA(18), NAP(17), PLP(15), ADP(15), FAD(15), SAM(14), ATP(9), SAH(9), AMP(9), HEM(8),
ACO(7), GDP(4), FS4(3), U5P(2), MLC(1), COD(1), CNC(1), UTP(1), CTP(1)
Ligands (269 structures; 140 different ligands): UNL(70), UNX(22), LLP(6), SIN(6), NDP(6), MA7(6), NAG(5), PLM(4), UNK(4), GUN(3), APC(3), SUC(3), BAL(3),
GLC(3), PAF(3), APR(2), GAL(2), NCN(2), CSD(2), SAI(2), CEI(2), BIO(2), HMH(2), SAP(2), GNP(2), 144(2), NCA(2), G4P(2), MPO(2), SRT(2), ANP(2), PCP(2),
BGC(2), PAJ(2), NIG(1), PRP(1), NIO(1), ABF(1), IPR(1), MTA(1), CP(1), MLT(1), DI6(1), MED(1), MLZ(1), 5GP(1), CSO(1), CDP(1), I3A(1), 2PL(1), HED(1),
G1P(1), NBZ(1), CSY(1), FRU(1), PLG(1), THF(1), B1M(1), ACP(1), DU(1), MMZ(1), OHA(1), 16A(1), THT(1), M7P(1), 3GC(1), CF5(1), PEO(1), CTZ(1), ADE(1),
FT6(1), KEG(1), LUM(1), XLS(1), BAM(1), ADN(1), PMP(1), ADQ(1), B33(1), DGI(1), G3H(1), OXG(1), NDS(1), SAL(1), 3SL(1), SIB(1), STH(1), FEO(1), G3P(1),
OXN(1), FES(1), TYD(1), DGT(1), 8PP(1), CO2(1), MP5(1), NTM(1), PNS(1), AES(1), APK(1), UVW(1), TRE(1), PYR(1), NAI(1), TCL(1), NMN(1), MAN(1), BFD(1),
HHP(1), RIP(1), RBF(1), ORO(1), SNN(1), DTP(1), ZID(1), DEP(1), UPG(1), HXA(1), AAT(1), DTY(1), DON(1), NPO(1), C2E(1), AGC(1), BDF(1), PHT(1), OSB(1),
NVA(1), CRO(1), BDN(1), TNE(1), SOG(1), AGS(1), TLP(1), 1PS(1), DUT(1), CXS(1), GEQ(1), MRD(1), G6P(1)
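The category summaries on this poster all follow one pattern: a category name, a structure count, a count of distinct codes, then `CODE(count)` pairs. The sketch below shows one way to parse such a line into a dictionary so the stated number of distinct codes can be checked against the list. The function name `parse_summary` and the regular expressions are assumptions for illustration, not part of the JCSG tool.

```python
import re

# "Name (N structures; K different ...): CODE(count), CODE(count), ..."
SUMMARY = re.compile(
    r"^(?P<name>[^(]+)\((?P<n>\d+) structures; (?P<k>\d+)[^)]*\):\s*(?P<items>.*)$"
)
ITEM = re.compile(r"(\w+)\((\d+)\)")


def parse_summary(text):
    """Return (category, n_structures, n_codes, {code: count})."""
    flat = " ".join(text.split())           # undo line wrapping
    m = SUMMARY.match(flat)
    if m is None:
        raise ValueError("not a summary line")
    counts = {code: int(num) for code, num in ITEM.findall(m["items"])}
    return m["name"].strip(), int(m["n"]), int(m["k"]), counts


# The co-factor summary as it appears on this poster (wrapped over two lines).
line = """Co-factors (211 structures; 21 different co-factors): FMN(36), NAD(29), COA(18), NAP(17), PLP(15), ADP(15), FAD(15), SAM(14), ATP(9), SAH(9), AMP(9), HEM(8),
ACO(7), GDP(4), FS4(3), U5P(2), MLC(1), COD(1), CNC(1), UTP(1), CTP(1)"""

name, n_structures, n_codes, counts = parse_summary(line)
print(name, n_structures, n_codes, len(counts))
```

The per-code counts can sum to more than the structure count, because a single structure may contain several different co-factors.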
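Distribution of Ligands
[Figure: bar charts by ligand code; panels: Ligands, Metal Ions, Non-metal Ions, Co-factors, Buffers, Precipitants]
Summary of Ligands (1606 structures)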
Cryos (502 structures; 5 different cryos): GOL(244), EDO(241), MPD(32), EGL(3), CRY(2)
Examples of Search Queries
A Typical Search Result
Unique PSI Ligands
PDB | PSI | Ligand | Ligand Name
2A3L | CESG | CF5 | Coformycin 5'-Phosphate
2OU3 | JCSG | I3A | 1H-Indole-3-Carbaldehyde
1VR0 | JCSG | 3SL | (2R)-3-Sulfolactic Acid
2OD6 | JCSG | OHA | 10-Oxohexadecanoic Acid
1X92 | MCSG | M7P | D-Glycero-D-Mannopyranose-7-Phosphate
1O8B | MCSG | ABF | Beta-D-Arabinofuranose-5'-Phosphate
2OSU | MCSG | DON | 6-Diazenyl-5-Oxo-L-Norleucine
1M33 | MCSG | 3OH | 3-Hydroxy-Propanoic Acid
1RTW | NESG | MP5 | (4-Amino-2-Methylpyrimidin-5-Yl)Methyl Dihydrogen Phosphate
2NW9 | NESG | FT6 | 6-Fluoro-L-Tryptophan
1XKL | NESG | STH | 2-Amino-4H-1,3-Benzoxathiin-4-Ol
1LW4 | NYSGXRC | TLP | 3-Hydroxy-2-[(3-Hydroxy-2-Methyl-5-Phosphonooxymethyl-Pyridin-4-Ylmethyl)-Amino]-Butyric Acid
2B4B | NYSGXRC | B33 | N-Ethyl-N-[3-(Propylamino)Propyl]Propane-1,3-Diamine
1TUF | NYSGXRC | AZ1 | Azelaic Acid
2PUZ | NYSGXRC | NIG | N-(Iminomethyl)-L-Glutamic Acid
2Q09 | NYSGXRC | DI6 | 3-[(4S)-2,5-Dioxoimidazolidin-4-Yl]Propanoic Acid
2GVC | NYSGXRC | MMZ | 1-Methyl-1,3-Dihydro-2H-Imidazole-2-Thione
1Y0G | NYSGXRC | 8PP | 2-[(2E,6E,10E,14E,18E,22E,26E)-3,7,11,15,19,23,27,31-Octamethyldotriaconta-2,6,10,14,18,22,26,30-Octaenyl]Phenol
1Z2L | NYSGXRC | 1AL | Allantoate Ion
1Y80 | SECSG | B1M | Co-5-Methoxybenzimidazolylcobamide
1KPH | TBSGC | 10A | Didecyl-Dimethyl-Ammonium
1KPI | TBSGC | 10A | Didecyl-Dimethyl-Ammonium
1N2H | TBSGC | PAJ | Pantoyl Adenylate
1N2I | TBSGC | PAJ | Pantoyl Adenylate
1BVR | TBSGC | THT | Trans-2-Hexadecenoyl-(N-Acetyl-Cysteamine)-Thioester
1QPR | TBSGC | PPC | 5-Phosphoribosyl-1-(Beta-Methylene) Pyrophosphate
1P44 | TBSGC | GEQ | 5-{[4-(9H-Fluoren-9-Yl)Piperazin-1-Yl]Carbonyl}-1H-Indole
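Search Results (35 hits)
Columns: N, Target, PDB, PFAM, Accession, Description, Organism, Ligands, PSI
1. Target FB10607B, PDB 2r6v, PFAM PF01613, Accession NP_142786.1
   Description: Crystal Structure of FMN-binding Protein (NP_142786.1) from Pyrococcus Horikoshii at 1.35 Å resolution
   Organism: Pyrococcus Horikoshii Ot3
   Ligands: EDO, FMN, NCA
2. Target FH7614A, PDB 2ig6, PFAM PF01243, Accession NP_349178.1
   Description: Crystal Structure of NIMC/NIMA Family Protein (NP_349178.1) from Clostridium Acetobutylicum at 1.80 Å resolution
   Organism: Clostridium Acetobutylicum
   Ligands: EDO, FMN, SO4, UNL
   PSI: JCSG
3. Target FJ9446A, PDB 2ou5, PFAM PF01243, Accession YP_508196.1
   Description: Crystal Structure of Pyridoxamine 5'-phosphate Oxidase-Related FMN-binding (YP_508196.1) From Jannaschia Sp. Ccs1 at 1.60 Å resolution
   Organism: Jannaschia Sp. Ccs1
   Ligands: FMN, GOL, SO4
   PSI: JCSG
…
34. Target SGT98480, PDB 1q45, PFAM PF00724, Accession NP_178662.1
   Description: 12-Oxo-Phytodienoate Reductase Isoform 3
   Organism: Arabidopsis Thaliana
   Ligands: FMN
   PSI: CESG
35. Target TB0885A, PDB 1vp8, PFAM PF08981, Accession NP_068944.1
   Description: Crystal Structure of Hypothetical Protein (NP_068944.1) from Archaeoglobus Fulgidus at 1.30 Å resolution
   Organism: Archaeoglobus Fulgidus Dsm 4304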
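A search like the FMN query shown in these results can be pictured as a filter over per-structure records. The sketch below is a hypothetical in-memory version in Python. The record type `Hit`, the function `search`, and the choice of fields are assumptions, and the actual engine may work quite differently. The three records reuse target, PDB, PFAM, and ligand values from hits 1, 3, and 34 of the result list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One structure record; the field selection is illustrative."""
    target: str
    pdb: str
    pfam: str
    ligands: tuple


# Values taken from hits 1, 3 and 34 of the search result on this poster.
HITS = [
    Hit("FB10607B", "2r6v", "PF01613", ("EDO", "FMN", "NCA")),
    Hit("FJ9446A", "2ou5", "PF01243", ("FMN", "GOL", "SO4")),
    Hit("SGT98480", "1q45", "PF00724", ("FMN",)),
]


def search(hits, ligand=None, pfam=None):
    """Return hits matching every criterion that was supplied."""
    matches = []
    for hit in hits:
        if ligand is not None and ligand.upper() not in hit.ligands:
            continue
        if pfam is not None and pfam.upper() != hit.pfam:
            continue
        matches.append(hit)
    return matches


# All records containing FMN, then FMN restricted to one PFAM family.
print([h.pdb for h in search(HITS, ligand="fmn")])
print([h.pdb for h in search(HITS, ligand="FMN", pfam="PF01243")])
```

Criteria left as `None` are ignored, so the same function serves single-field and combined queries.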
Target
PDB
Description
Organism
Putative ATTH (NP_841447.1) at 2.00 A
Nitrosomonas Europaea
FMN
UNL
Ligand
NHE
Clostridium Acetobutylicum 3SL
TM0160
1VJL
Thermotoga Maritima
UNL
TM0449
1KQ4 Thy1-complementing Protein at 2.25 A
Thermotoga Maritima
FAD
TM0574
1VKY S-adenosylmethionine Trna Ribosyltransferase at 2.00 A
Thermotoga Maritima
UNL
TM1394
1VQ0 33 kDa Chaperonin (heat Shock Protein 33 Homolog) at 2.20 A
Thermotoga Maritima
UNL
TM1464
1VKM Conserved Hypothetical Protein Possibly Involved in Carbohydrate Metabolism at 1.90 A Thermotoga Maritima Msb8 UNL
TM1506
1VK9
TM1553
1VRM Hypothetical Protein at 1.58 A
Predicted Protein related to Wound Inducive Proteins in Plants at 1.90 A
Hypothetical Protein at 2.70 A
Thermotoga Maritima
UNL
Thermotoga Maritima Msb8 UNL
1VRM
JCSG
1VJL
1VR0
1KQ4
Ligand Visualization Links
HIC-Up: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4
1VQ0
1VKM
1VK9
1VKY
Ligand Depot: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4
UCSD & Burnham
(Bioinformatics Core)
John Wooley
Lukasz Jaroszewski
Lian Duan
Natasha Sefcovic
Andrew Morse
Tamara Astakhova
Cindy Cook
Adam Godzik
Slawomir Grzechnik
Sri Krishna Subramanian
Piotr Kozbial
Prasad Burra
Josie Alaoen
Dana Weekes
GNF & TSRI
(Crystallomics Core)
Scott Lesley
Dennis Carlton
Marc Deller
Polat Abdubek
Julie Feuerhelm
Hope Johnson
Sebastian Sudek
Glen Spraggon
Charlene Cho
Jessica Canseco
Mark Knuth
Heath Klock
Thomas Clayton
Kevin D. Murphy
Daniel McMullan
Christina Trout
Claire Acosta
Linda M. Columbus
Joanna C. Hale
Thamara Janaratne
Linda Okach
Edward Nigoghossian
Aprilfawn White
Bernhard Geierstanger
Ylva Elias
Sanjay Agarwalla
Bi-Ying Yeh
Anna Grzechnik
Mimmi Brown
Stanford /SSRL
(Structure Determination Core)
Keith Hodgson
Mitchell Miller
Hsiu-Ju (Jessica) Chiu
Christopher Rife
Silvya Oommachen
Henry van den Bedem
Christine Trame
Ashley Deacon
Debanu Das
Kevin Jin
Qingping Xu
Scott Talafuse
Ronald Reyes
TSRI
(NMR Core)
Kurt Wüthrich
Reto Horst
Maggie Johnson
Amaranth Chatterjee
Michael Geralt
Wojtek Augustyniak
Pedro Serrano
Bill Pedrini
William Placzek
TSRI
(Admin Core)
Ian Wilson
Marc Elsliger
Gye Won Han
David Marciano
Henry Tien
Xiaoping Dai
Lisa van Veen
Scientific Advisory Board
Sir Tom Blundell
Univ. Cambridge
Homme Hellinga
Duke University Medical Center
James Naismith
The Scottish Structural Proteomics facility
Univ. St. Andrews
Soichi Wakatsuki
Photon Factory, KEK, Japan
James Wells
UC San Francisco
Robert Stroud
Center for Structure of Membrane Proteins
Membrane Protein Expression Center
UC San Francisco
James Paulson
Consortium for Functional Glycomics
The Scripps Research Institute
Todd Yeates
UCLA-DOE Inst. for Genomics and Proteomics
Unique Ligands
(R)-2-Hydroxy-3-Sulfopropanoic acid (3SL) bound to the putative 2-phosphosulfolactate phosphatase from Clostridium Acetobutylicum (1VR0)
Indole-3-Carboxaldehyde (I3A) bound to tellurite resistance protein of COG3793 (ZP_00109916.1) from Nostoc Punctiforme PCC 73102 (2OU3)
10-Oxohexadecanoic acid (OHA) bound to Ferredoxin-like protein (JCVI_PEP_1096682647733) from an environmental metagenome (unidentified marine microbe) (2OD6)
Unknown Ligands (UNL)
FB8805A (2Q9K): Protein of unknown function
FK9436A (2OH1): Acetyltransferase Gnat family
TB0797A 1VR0 Putative 2-phosphosulfolactate Phosphatase at 2.6 A
2ICH
Archaeoglobus
Fulgidus Dsm
4304
PSI
CESG
JCSG
JCSG
JCSG
MCSG
MCSG
MCSG
MCSG
NESG
NESG
NESG
NYSGXRC
NYSGXRC
NYSGXRC
NYSGXRC
NYSGXRC
NYSGXRC
NYSGXRC
NYSGXRC
SECSG
TBSGC
TBSGC
TBSGC
TBSGC
TBSGC
TBSGC
TBSGC
Ligands bound to JCSG Structures with New Folds
CL6107A 2ICH
Crystal Structure of NIMC/NIMA Family
Protein (NP_349178.1) from Clostridium
Acetobutylicum at 1.80 Å resolution
Ligand
CF5
I3A
3SL
OHA
M7P
ABF
DON
3OH
MP5
FT6
STH
TLP
B33
AZ1
NIG
DI6
MMZ
8PP
1AL
B1M
10A
10A
PAJ
PAJ
THT
PPC
GEQ
9 out of 26 new fold structures from JCSG have bound ligands, which identify their active sites and give some clues
to function. Often the ligands are modeled as UNL, because their precise identity is unknown.
PSI
JCSG
Exploring the Binding Modes of Ligands
Over 340 structures in the PDB have the co-factor Flavin Mononucleotide (FMN) bound to the protein. FMN displays considerable variation in binding due to the torsional flexibility in the molecule. However, unique binding modes can be observed in proteins belonging to specific PFAM families.
[Figure panel: PF01243 (Pyridox._oxidase)]
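Torsional flexibility of the kind noted above is usually quantified with dihedral (torsion) angles, each defined by four consecutive atoms. The sketch below computes such an angle in plain Python. It is a generic geometric calculation, not a procedure taken from this poster, and the coordinates in the example are invented test points rather than FMN atoms. Choosing which four FMN atoms to compare between structures is left as an assumption for the reader.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the axis.
    s0 = _dot(b0, axis)
    s2 = _dot(b2, axis)
    v = (b0[0] - s0 * axis[0], b0[1] - s0 * axis[1], b0[2] - s0 * axis[2])
    w = (b2[0] - s2 * axis[0], b2[1] - s2 * axis[1], b2[2] - s2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Invented points: eclipsed (cis), perpendicular, and anti (trans) arrangements.
cis = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
perp = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, perp, trans)
```

Comparing the same set of torsions across FMN-bound structures would give one numeric way to group binding modes by family.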
Number of Structures
PFAM | PSI | Non-PSI | Total
PF01243 | 7 | 14 | 21
PF00881 | 9 | 8 | 17
PF00258 | 3 | 13 | 16
PF00724 | 2 | 8 | 10
PF01613 | 2 | 7 | 9
PF01180 | 1 | 8 | 9
PF01070 | 0 | 8 | 8
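The table of FMN-bound structures per PFAM family can be checked and summarized with a few lines of Python. The sketch below copies the counts from the table, confirms that PSI plus Non-PSI equals Total in each row, and computes the PSI share per family. The names `FMN_BY_PFAM`, `inconsistent_rows`, and `psi_share` are hypothetical.

```python
# (PFAM family, PSI structures, non-PSI structures, total) from the table.
FMN_BY_PFAM = [
    ("PF01243", 7, 14, 21),
    ("PF00881", 9, 8, 17),
    ("PF00258", 3, 13, 16),
    ("PF00724", 2, 8, 10),
    ("PF01613", 2, 7, 9),
    ("PF01180", 1, 8, 9),
    ("PF01070", 0, 8, 8),
]


def inconsistent_rows(rows):
    """Return the families whose PSI + non-PSI does not equal the total."""
    return [pfam for pfam, psi, non_psi, total in rows if psi + non_psi != total]


def psi_share(rows):
    """Fraction of each family's FMN-bound structures that came from PSI."""
    return {pfam: psi / total for pfam, psi, _non_psi, total in rows}


print(inconsistent_rows(FMN_BY_PFAM))
print(sum(psi for _pfam, psi, _non, _tot in FMN_BY_PFAM),
      sum(tot for _pfam, _psi, _non, tot in FMN_BY_PFAM))
for pfam, share in psi_share(FMN_BY_PFAM).items():
    print(f"{pfam}: {share:.0%} from PSI")
```

For the seven families listed, PSI centers account for 24 of 90 structures.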
[Figure panels: PF01180 (DHOdehase), PF01613 (Flavin reductase-like), PF00724 (Oxidored._FMN)]
The JCSG is supported by the NIH Protein
Structure Initiative (PSI) Grant U54 GM074898 from
NIGMS (www.nigms.nih.gov). Portions of this
research were carried out at the Stanford Synchrotron
Radiation Laboratory (SSRL). The SSRL is a national
user facility operated by Stanford University on behalf
of the U.S. Department of Energy, Office of Basic
Energy Sciences. The SSRL Structural Molecular
Biology Program is supported by the Department of
Energy, Office of Biological and Environmental
Research, and by the NIH.
Annual meeting with SAB 2007
[Figure panels: PF00258 (Flavodoxin_1), PF00881 (Nitroreductase), PF01070 (FMN-dependent dehydrogenase)]