Transcript Slide 1

Ligand search and data mining of Structural Genomics structures
Abhinav Kumar, Herbert Axelrod, Ashley Deacon
Structure Determination Core, Joint Center for Structural Genomics (JCSG), Stanford Synchrotron Radiation Laboratory, Menlo Park, CA, USA
The Joint Center for Structural Genomics (JCSG)
The JCSG (www.jcsg.org) is one of the four large-scale structural genomics centers funded by NIGMS as part of the production phase of the Protein Structure Initiative (PSI). More than 2600 structures have been deposited into the PDB by the PSI centers as of 2007, of which the JCSG has contributed over 500 structures.

The JCSG Target Pipeline
Each project moves from target selection through publication along the Target Pipeline.

The Role of the Structure Determination Core in the JCSG
1. Screen Crystals and Collect Data
2. Automatically Process Data
   • Autoindex
   • Integrate
   • Scale
   • Solve
   • Trace
3. Refine and Evaluate Structures
4. Disseminate Information*
   • Publish
   • Web-based Tools
      • TOPSAN (www.topsan.org)
      • Ligand Search (smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl)
* in collaboration with BIC
JCSG Ligand Search
Summary of Ligands (1606 structures)
Ligands (269 structures; 140 different ligands): UNL(70), UNX(22), LLP(6), SIN(6), NDP(6), MA7(6), NAG(5), PLM(4), UNK(4), GUN(3), APC(3), SUC(3), BAL(3),
GLC(3), PAF(3), APR(2), GAL(2), NCN(2), CSD(2), SAI(2), CEI(2), BIO(2), HMH(2), SAP(2), GNP(2), 144(2), NCA(2), G4P(2), MPO(2), SRT(2), ANP(2), PCP(2),
BGC(2), PAJ(2), NIG(1), PRP(1), NIO(1), ABF(1), IPR(1), MTA(1), CP(1), MLT(1), DI6(1), MED(1), MLZ(1), 5GP(1), CSO(1), CDP(1), I3A(1), 2PL(1), HED(1),
G1P(1), NBZ(1), CSY(1), FRU(1), PLG(1), THF(1), B1M(1), ACP(1), DU(1), MMZ(1), OHA(1), 16A(1), THT(1), M7P(1), 3GC(1), CF5(1), PEO(1), CTZ(1), ADE(1),
FT6(1), KEG(1), LUM(1), XLS(1), BAM(1), ADN(1), PMP(1), ADQ(1), B33(1), DGI(1), G3H(1), OXG(1), NDS(1), SAL(1), 3SL(1), SIB(1), STH(1), FEO(1), G3P(1),
OXN(1), FES(1), TYD(1), DGT(1), 8PP(1), CO2(1), MP5(1), NTM(1), PNS(1), AES(1), APK(1), UVW(1), TRE(1), PYR(1), NAI(1), TCL(1), NMN(1), MAN(1), BFD(1),
HHP(1), RIP(1), RBF(1), ORO(1), SNN(1), DTP(1), ZID(1), DEP(1), UPG(1), HXA(1), AAT(1), DTY(1), DON(1), NPO(1), C2E(1), AGC(1), BDF(1), PHT(1), OSB(1),
NVA(1), CRO(1), BDN(1), TNE(1), SOG(1), AGS(1), TLP(1), 1PS(1), DUT(1), CXS(1), GEQ(1), MRD(1), G6P(1)
Co-factors (211 structures; 21 different co-factors): FMN(36), NAD(29), COA(18), NAP(17), PLP(15), ADP(15), FAD(15), SAM(14), ATP(9), SAH(9), AMP(9), HEM(8),
ACO(7), GDP(4), FS4(3), U5P(2), MLC(1), COD(1), CNC(1), UTP(1), CTP(1)
Metal Ions (647 structures; 30 different metal ions): MG(177), ZN(174), NA(102), CA(83), NI(40), MN(31), FE(26), K(16), FE2(9), CD(8), PT(8), HG(7), CO(5), SM(2),
WO4(2), PR(2), AU(2), BA(1), CS(1), MW2(1), SE(1), ARS(1), ZN3(1), O4M(1), YT3(1), LI(1), MO2(1), MO3(1), VO4(1), MO6(1)
Non-metal Ions (692 structures; 22 different non-metal ions): SO4(324), CL(243), PO4(118), NO3(11), IOD(10), BR(10), SCN(8), CO3(4), CAC(4), POP(3), AZI(3),
SUL(2), BCT(2), ALF(2), OXL(2), PER(1), SO3(1), MLI(1), PO3(1), THJ(1), 1AL(1), NH4(1)
Organics (90 structures; 26 different organics): IPA(14), EOH(13), BME(9), BEZ(5), TLA(5), SEO(5), AKG(5), ETX(4), TAR(4), PGO(4), DTT(4), OAA(2), ACE(2),
DMS(2), MLA(1), DOX(1), XYL(1), MOH(1), 3OH(1), AZ1(1), PPI(1), IOH(1), FOR(1), MYR(1), GTT(1), LMT(1)
Buffers (240 structures; 15 different buffers): ACT(86), ACY(47), FMT(37), CIT(27), TRS(16), EPE(15), MES(12), IMD(8), TMN(2), 10A(2), BTB(2), ICT(1), CPS(1),
FLC(1), NHE(1)
Precipitants (98 structures; 13 different precipitants): PEG(38), PG4(28), PGE(16), 1PE(8), P6G(7), 2PE(3), PE4(3), P33(3), PE5(2), PEF(1), BU3(1), 1PG(1), PE8(1)
Salts (3 structures; 3 different salts): DPO(1), AF3(1), PPC(1)
Detergents (2 structures; 1 detergent): BOG(2)
Cryos (502 structures; 5 different cryos): GOL(244), EDO(241), MPD(32), EGL(3), CRY(2)
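Each category line in the summary above follows the same pattern: a category name, a structure count, and a list of component codes with per-code counts. As a minimal sketch of how such a line could be mined, the following parses one line into a dictionary. The function name `parse_summary_line` is an illustrative assumption and is not part of the JCSG Ligand Search tool.

```python
import re

def parse_summary_line(line):
    """Parse 'Category (N structures; ...): AAA(n), BBB(m), ...' into its parts."""
    header, _, body = line.partition("): ")
    category = header.split(" (")[0]
    n_structures = int(re.search(r"\((\d+) structures", header).group(1))
    # Each entry is a component code followed by its count in parentheses.
    counts = {code: int(n) for code, n in re.findall(r"(\w+)\((\d+)\)", body)}
    return category, n_structures, counts

# The co-factor line, copied from the summary above.
cofactor_line = (
    "Co-factors (211 structures; 21 different co-factors): FMN(36), NAD(29), "
    "COA(18), NAP(17), PLP(15), ADP(15), FAD(15), SAM(14), ATP(9), SAH(9), "
    "AMP(9), HEM(8), ACO(7), GDP(4), FS4(3), U5P(2), MLC(1), COD(1), CNC(1), "
    "UTP(1), CTP(1)"
)

category, n_structures, counts = parse_summary_line(cofactor_line)
total_occurrences = sum(counts.values())
print(category, n_structures, len(counts), total_occurrences)
# prints: Co-factors 211 21 215
```

The per-code counts sum to 215, slightly more than the 211 structures stated, presumably because a single structure can contain more than one co-factor.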
Search Results (35 hits)
N | Target | PDB | PFAM | Accession | Description | Organism | Ligands | PSI
1 | FB10607B | 2r6v | PF01613 | NP_142786.1 | Crystal Structure of FMN-binding Protein (NP_142786.1) from Pyrococcus Horikoshii at 1.35 Å resolution | Pyrococcus Horikoshii Ot3 | EDO FMN NCA | JCSG
2 | FH7614A | 2ig6 | PF01243 | NP_349178.1 | Crystal Structure of NIMC/NIMA Family Protein (NP_349178.1) from Clostridium Acetobutylicum at 1.80 Å resolution | Clostridium Acetobutylicum | EDO FMN SO4 UNL | JCSG
3 | FJ9446A | 2ou5 | PF01243 | YP_508196.1 | Crystal Structure of Pyridoxamine 5'-phosphate Oxidase-Related FMN-binding (YP_508196.1) from Jannaschia Sp. Ccs1 at 1.60 Å resolution | Jannaschia Sp. Ccs1 | FMN GOL SO4 | JCSG
…
34 | SGT98480 | 1q45 | PF00724 | NP_178662.1 | 12-Oxo-Phytodienoate Reductase Isoform 3 | Arabidopsis Thaliana | FMN | CESG
35 | TB0885A | 1vp8 | PF08981 | NP_068944.1 | Crystal Structure of Hypothetical Protein (NP_068944.1) from Archaeoglobus Fulgidus at 1.30 Å resolution | Archaeoglobus Fulgidus Dsm 4304 | FMN UNL | JCSG

Unique PSI Ligands
Ligand | Ligand Name | PDB | PSI
CF5 | Coformycin 5'-Phosphate | 2A3L | CESG
I3A | 1H-Indole-3-Carbaldehyde | 2OU3 | JCSG
3SL | (2R)-3-Sulfolactic Acid | 1VR0 | JCSG
OHA | 10-Oxohexadecanoic Acid | 2OD6 | JCSG
M7P | D-Glycero-D-Mannopyranose-7-Phosphate | 1X92 | MCSG
ABF | Beta-D-Arabinofuranose-5'-Phosphate | 1O8B | MCSG
DON | 6-Diazenyl-5-Oxo-L-Norleucine | 2OSU | MCSG
3OH | 3-Hydroxy-Propanoic Acid | 1M33 | MCSG
MP5 | (4-Amino-2-Methylpyrimidin-5-Yl)Methyl Dihydrogen Phosphate | 1RTW | NESG
FT6 | 6-Fluoro-L-Tryptophan | 2NW9 | NESG
STH | 2-Amino-4H-1,3-Benzoxathiin-4-Ol | 1XKL | NESG
TLP | 3-Hydroxy-2-[(3-Hydroxy-2-Methyl-5-Phosphonooxymethyl-Pyridin-4-Ylmethyl)-Amino]-Butyric Acid | 1LW4 | NYSGXRC
B33 | N-Ethyl-N-[3-(Propylamino)Propyl]Propane-1,3-Diamine | 2B4B | NYSGXRC
AZ1 | Azelaic Acid | 1TUF | NYSGXRC
NIG | N-(Iminomethyl)-L-Glutamic Acid | 2PUZ | NYSGXRC
DI6 | 3-[(4S)-2,5-Dioxoimidazolidin-4-Yl]Propanoic Acid | 2Q09 | NYSGXRC
MMZ | 1-Methyl-1,3-Dihydro-2H-Imidazole-2-Thione | 2GVC | NYSGXRC
8PP | 2-[(2E,6E,10E,14E,18E,22E,26E)-3,7,11,15,19,23,27,31-Octamethyldotriaconta-2,6,10,14,18,22,26,30-Octaenyl]Phenol | 1Y0G | NYSGXRC
1AL | Allantoate Ion | 1Z2L | NYSGXRC
B1M | Co-5-Methoxybenzimidazolylcobamide | 1Y80 | SECSG
10A | Didecyl-Dimethyl-Ammonium | 1KPH | TBSGC
10A | Didecyl-Dimethyl-Ammonium | 1KPI | TBSGC
PAJ | Pantoyl Adenylate | 1N2H | TBSGC
PAJ | Pantoyl Adenylate | 1N2I | TBSGC
THT | Trans-2-Hexadecenoyl-(N-Acetyl-Cysteamine)-Thioester | 1BVR | TBSGC
PPC | 5-Phosphoribosyl-1-(Beta-Methylene) Pyrophosphate | 1QPR | TBSGC
GEQ | 5-{[4-(9H-Fluoren-9-Yl)Piperazin-1-Yl]Carbonyl}-1H-Indole | 1P44 | TBSGC
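One simple way to mine the Unique PSI Ligands listing is to count how many distinct ligand codes each PSI center contributed. The sketch below does this with rows transcribed from the listing (ligand code, PDB ID, center). The function name `distinct_ligands_by_center` is an illustrative assumption, not something provided by the JCSG tools.

```python
from collections import defaultdict

# (ligand code, PDB ID, PSI center), transcribed from the Unique PSI Ligands listing.
ROWS = [
    ("CF5", "2A3L", "CESG"),
    ("I3A", "2OU3", "JCSG"), ("3SL", "1VR0", "JCSG"), ("OHA", "2OD6", "JCSG"),
    ("M7P", "1X92", "MCSG"), ("ABF", "1O8B", "MCSG"), ("DON", "2OSU", "MCSG"),
    ("3OH", "1M33", "MCSG"),
    ("MP5", "1RTW", "NESG"), ("FT6", "2NW9", "NESG"), ("STH", "1XKL", "NESG"),
    ("TLP", "1LW4", "NYSGXRC"), ("B33", "2B4B", "NYSGXRC"), ("AZ1", "1TUF", "NYSGXRC"),
    ("NIG", "2PUZ", "NYSGXRC"), ("DI6", "2Q09", "NYSGXRC"), ("MMZ", "2GVC", "NYSGXRC"),
    ("8PP", "1Y0G", "NYSGXRC"), ("1AL", "1Z2L", "NYSGXRC"),
    ("B1M", "1Y80", "SECSG"),
    ("10A", "1KPH", "TBSGC"), ("10A", "1KPI", "TBSGC"), ("PAJ", "1N2H", "TBSGC"),
    ("PAJ", "1N2I", "TBSGC"), ("THT", "1BVR", "TBSGC"), ("PPC", "1QPR", "TBSGC"),
    ("GEQ", "1P44", "TBSGC"),
]

def distinct_ligands_by_center(rows):
    """Return {center: number of distinct ligand codes} for the given rows."""
    by_center = defaultdict(set)
    for ligand, _pdb, center in rows:
        by_center[center].add(ligand)
    return {center: len(ligands) for center, ligands in by_center.items()}

per_center = distinct_ligands_by_center(ROWS)
for center, n in sorted(per_center.items()):
    print(center, n)
```

Using a set per center means that a ligand seen in two structures from the same center (for example 10A in 1KPH and 1KPI) is counted once.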
Ligands bound to JCSG new folds
Target | PDB | Description | Organism | Ligand
CL6107A | 2ICH | Putative ATTH (NP_841447.1) at 2.00 Å | Nitrosomonas Europaea | NHE
TB0797A | 1VR0 | Putative 2-phosphosulfolactate Phosphatase at 2.6 Å | Clostridium Acetobutylicum | 3SL
TM0160 | 1VJL | Predicted Protein related to Wound Inducive Proteins in Plants at 1.90 Å | Thermotoga Maritima | UNL
TM0449 | 1KQ4 | Thy1-complementing Protein at 2.25 Å | Thermotoga Maritima | FAD
TM0574 | 1VKY | S-adenosylmethionine tRNA Ribosyltransferase at 2.00 Å | Thermotoga Maritima | UNL
TM1394 | 1VQ0 | 33 kDa Chaperonin (Heat Shock Protein 33 Homolog) at 2.20 Å | Thermotoga Maritima | UNL
TM1464 | 1VKM | Conserved Hypothetical Protein Possibly Involved in Carbohydrate Metabolism at 1.90 Å | Thermotoga Maritima Msb8 | UNL
TM1506 | 1VK9 | Hypothetical Protein at 2.70 Å | Thermotoga Maritima | UNL
TM1553 | 1VRM | Hypothetical Protein at 1.58 Å | Thermotoga Maritima Msb8 | UNL

Unique Ligands
Indole-3-Carboxaldehyde (I3A) bound to the structure of tellurite resistance protein of COG3793 (ZP_00109916.1) from Nostoc Punctiforme PCC 73102 (2OU3)
10-Oxohexadecanoic acid (OHA) bound to the structure of Ferredoxin-like Protein (JCVI_PEP_1096682647733) from an environmental metagenome (Unidentified Marine Microbe) (2OD6)
(R)-2-Hydroxy-3-Sulfopropanoic acid (3SL) bound to the structure of putative 2-phosphosulfolactate phosphatase from Clostridium Acetobutylicum (1VR0)

Unknown Ligands (UNL)
FB8805A (2Q9K): Unknown protein
FK9436A (2OH1): Acetyltransferase GNAT family

Ligand Visualization Links
1y30
HIC-Up: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4
Ligand Depot: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4

GNF & TSRI (Crystallomics Core)
Scott Lesley
Thomas Clayton
Marc Deller
Polat Abdubek
Julie Feuerhelm
Hope Johnson
Sebastian Sudek
Glen Spraggon
Charlene Cho
Jessica Canseco
Mark Knuth
Dennis Carlton
Kevin D. Murphy
Christina Trout
Daniel McMullan
Heath Klock
Claire Acosta
Linda M. Columbus
Joanna C. Hale
Thamara Janaratne
Linda Okach
Edward Nigoghossian
Aprilfawn White
Bernhard Geierstanger
Ylva Elias
Sanjay Agarwalla
Bi-Ying Yeh
Anna Grzechnik
Mimmi Brown

UCSD & Burnham (Bioinformatics Core)
John Wooley
Adam Godzik
Slawomir Grzechnik
Lukasz Jaroszewski
Dana Weekes
Lian Duan
Sri Krishna Subramanian
Natasha Sefcovic
Piotr Kozbial
Andrew Morse
Prasad Burra
Tamara Astakhova
Josie Alaoen
Cindy Cook

TSRI (NMR Core)
Kurt Wüthrich
Reto Horst
Maggie Johnson
Amaranth Chatterjee
Michael Geralt
Wojtek Augustyniak
Pedro Serrano
Bill Pedrini
William Placzek

Stanford / SSRL (Structure Determination Core)
Keith Hodgson
Mitchell Miller
Hsiu-Ju (Jessica) Chiu
Christopher Rife
Silvya Oommachen
Henry van den Bedem
Christine Trame
Ashley Deacon
Debanu Das
Kevin Jin
Qingping Xu
Scott Talafuse
Ronald Reyes

Scientific Advisory Board
Sir Tom Blundell, Univ. Cambridge
Homme Hellinga, Duke University Medical Center
James Naismith, The Scottish Structural Proteomics Facility, Univ. St. Andrews
Soichi Wakatsuki, Photon Factory, KEK, Japan
James Wells, UC San Francisco
Robert Stroud, Center for Structure of Membrane Proteins, Membrane Protein Expression Center, UC San Francisco
James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute
Todd Yeates, UCLA-DOE Inst. for Genomics and Proteomics

The JCSG is supported by the NIH Protein Structure Initiative (PSI) Grant U54 GM074898 from NIGMS (www.nigms.nih.gov). Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the NIH.

Binding Modes of Ligands
There are over 340 structures in the PDB with the co-factor flavin mononucleotide (FMN) bound to the protein. The binding poses of FMN display considerable variation due to the torsional flexibility of the molecule. However, unique binding poses can be observed in proteins belonging to specific PFAM families.
PF01243 (Pyridoxamine 5'-phosphate oxidase)
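The Binding Modes of Ligands panel attributes the variation in FMN binding poses to torsional flexibility in the molecule. One common way to quantify such flexibility is to measure dihedral (torsion) angles along rotatable bonds and compare them between structures. The sketch below computes a dihedral angle from four atom positions using the standard atan2 formulation. The coordinates are made-up test points, not FMN atoms, and the function name `dihedral_deg` is an illustrative assumption rather than the method used on the poster.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical points: central bond along z, outer atoms at right angles.
angle = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

For a real comparison, the four positions would be the coordinates of consecutive bonded atoms read from each PDB entry, and the same torsion would be tabulated across the structures in a PFAM family.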
Number of Structures
PFAM | Family | PSI | Non-PSI | Total
PF01243 | Pyridoxamine 5'-phosphate oxidase | 7 | 14 | 21
PF01613 | Flavin reductase like domain | 2 | 8 | 10
PF04289 | Unknown Function DUF447 | 3 | 0 | 3
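The structure counts per PFAM family can be checked and summarized with a few lines of code. The sketch below verifies that the PSI and non-PSI counts add up to the stated totals and computes the fraction contributed by PSI centers. The names `PFAM_COUNTS`, `STATED_TOTALS` and `psi_share` are illustrative assumptions.

```python
# (PSI, non-PSI) structure counts per PFAM family, as listed in the table.
PFAM_COUNTS = {
    "PF01243": (7, 14),
    "PF01613": (2, 8),
    "PF04289": (3, 0),
}
# Totals as stated in the table, used as a consistency check.
STATED_TOTALS = {"PF01243": 21, "PF01613": 10, "PF04289": 3}

def psi_share(psi, non_psi):
    """Fraction of a family's structures that came from PSI centers."""
    return psi / (psi + non_psi)

shares = {}
for pfam, (psi, non_psi) in PFAM_COUNTS.items():
    assert psi + non_psi == STATED_TOTALS[pfam]
    shares[pfam] = psi_share(psi, non_psi)
    print(f"{pfam}: {shares[pfam]:.0%} from PSI")
```

In this table, every PF04289 (DUF447) structure comes from a PSI center, while PSI centers account for one third of PF01243 and one fifth of PF01613.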