Computational Physics Applied to Macromolecules

Protein structure prediction
Pier Luigi Martelli
Università di Bologna
[email protected]
051 2094005
338 3991609
3D structure prediction of proteins
[Figure: prediction methods along a homology (%) axis running from 0 to 100. New folds: ab initio prediction. Existing folds: threading, then building by homology as homology increases.]
“Comparative modelling” of proteins
From: Martí-Renom et al. (2000) Annu. Rev. Biophys. Biomol. Struct. 29:291
“Comparative modelling” of proteins
Homology modelling
On a large scale?
Reliable models for only 45% of the proteins in Swiss-Prot (MODBASE)
http://alto.compbio.ucsf.edu/modbase
Can the sequence identity threshold be lowered?
From: Sanchez et al. (2000) Nature Struct. Biol. (Suppl) 7:986
Comparative Modelling
Selection of Templates
Alignment of the Target sequence with the Template
Modelling of the Target on the Template
Evaluation of the Model
THE TEMPLATE: 1f13
Sequence alignment of TGL3_HUMAN with 1f13
TGL3   MAALGVQSINWQKAFNRQAHHTDKFSSQELILRRGQNFQVLMIMNKGLGSNERLEFIDTT 60
1F13A  VHLFKERWDTNKVDHHTDKYENNKLIVRRGQSFYVQIDFSRPYDPRRDLFRVEYVIGRYP 60

TGL3   GPYPSESAMTKAVFPLSNGSSGGWSAVLQASNGNTLTISISSPASAPIGRYTMALQIFSQ 120
1F13A  QENKGTYIPVPIVSELQSGKWGAKIVMREDRSVRLSIQSSPKCIVGKFRMYVAVWTPYGV 120

TGL3   GGISSVKLGTFILLFNPWLNVDSVFMGNHAEREEYVQEDAGIIFVGSTNRIGMIGWNFGQ 180
1F13A  LRTSRNPETDTYILFNPWCEDDAVYLDNEKEREEYVLNDIGVIFYGEVNDIKTRSWSYGQ 180

TGL3   FEEDILSICLSILDRSLNFRRDAATDVASRNDPKYVGRVLSAMINSNDDNGVLAGNWSGT 240
1F13A  FEDGILDTCLYVMDR-------AQMDLSGRGNPIKVSRVGSAMVNAKDDEGVLVGSWDNI 233

TGL3   YTGGRDPRSWDGSVEILKNWKKSGFSPVRYGQCWVFAGTLNTALRSLGIPSRVITNFNSA 300
1F13A  YAYGVPPSAWTGSVDILLEYRSSENPVRYGQCWVFAGVFNTFLRCLGIPARIVTNYFSAH 293

TGL3   HDTDRNLSVDVYYDPMGNPLDKGSDSVWNFHVWNEGWFVRSDLGPPYGGWQVLDATPQER 360
1F13A  DNDANLQMDIFLEEDGNVNSKLTKDSVWNYHCWNEAWMTRPDLPVGFGGWQAVDSTPQEN 353

TGL3   SQGVFQCGPASVIGVREGDVQLNFDMPFIFAEVNADRITWLYDNTTGKQWKNSVNSHTIG 420
1F13A  SDGMYRCGPASVQAIKHGHVCFQFDAPFVFAEVNSDLIYITAKKDGTHVVENVDATHIGK 413

TGL3   RYISTKAVGSNARMDVTDKYKYPEGSDQERQVFQKALGKLKPNTPFAATSSMGLETEEQE 480
1F13A  LIVTKQIGGDGMMDITDTYKFQEGQEEERLALETALMYGAKKPLNT--------EGVMKS 465

TGL3   PSIIGKLKVAGMLAVGKEVNLVLLLKNLSRDTKTVTVNMTAWTIIYNGTLVHEVWKDSAT 540
1F13A  RSNVDMDFEVENAVLGKDFKLSITFRNNSHNRYTITAYLSANITFYTGVPKAEFKKETFD 525

TGL3   MSLDPEEEAEHPIKISYAQYERYLKSDNMIRITAVCKVPDESEVVVERDIILDNPTLTLE 600
1F13A  VTLEPLSFKKEAVLIQAGEYMGQLLEQASLHFFVTARINETRDVLAKQKSTVLTIPEIII 585

TGL3   VLNEARVRKPVNVQMLFSNPLDEPVRDCVLMVEGSGLLLGNLKIDVPTLGPKERSRVRFD 660
1F13A  KVRGTQVVGSDMTVTVEFTNPLKETLRNVWVHLDGPGVTRPMKKMFREIRPNSTVQWEEV 645

TGL3   ILPSRSGTKQLLADFSCNKFPAIKAMLSIDVAE 693
1F13A  CRPWVSGHRKLIASMSSDSLRHVYGELDVQIQR 678
sequence identity 34%
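A figure such as the sequence identity quoted above can be computed directly from the two aligned rows. The following is a minimal sketch, assuming that identity is counted only over columns where neither row has a gap; other conventions (for example dividing by the length of the shorter sequence) give slightly different values. The function name `percent_identity` is hypothetical and is not part of MODELLER or of any alignment program.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over the aligned columns where neither row has a gap.

    Assumption: the two rows come from the same pairwise alignment and
    therefore have the same length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# One block of the TGL3 / 1F13A alignment, used only as sample input.
tgl3 = "FEEDILSICLSILDRSLNFRRDAATDVASRNDPKYVGRVLSAMINSNDDNGVLAGNWSGT"
f13a = "FEDGILDTCLYVMDR-------AQMDLSGRGNPIKVSRVGSAMVNAKDDEGVLVGSWDNI"
print(f"identity over this block: {percent_identity(tgl3, f13a):.1f}%")
```

A single block says little about the whole alignment; the value reported on the slide refers to the full-length sequences.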
Building the Model: MODELLER
http://salilab.org/modeller/modeller.html
THE TARGET: TGL3_HUMAN
Evaluating the Model: PROCHECK
http://biotech.ebi.ac.uk:8400/
Servers:
http://www.expasy.ch/swissmod/SWISS-MODEL.html
Servers:
http://www.salilab.org/modbase/
Low-identity modelling
•Template selection based on experimental data
Experimental determination of the function, or of the presence of metals or prosthetic groups, greatly reduces the number of possible folds
Low-identity modelling
•Template selection based on experimental data
•Multiple alignment of proteins from the same family
ADH1_SULSO  GVPKPKGPQVLIKVEAAGVCHSDVH-MRQGRFGNLRIVEDLGVKLPVTLGHEIAGKIEEVGDE
ADHE_HORSE  EVAPPKAHEVRIKMVATGICRSDDH-VVSGTLV--------T-PLPVIAGHEAAGIVESIGEG
ADHS_HORSE  EVAPPKAHEVRIKMVAAGICRSDDH-VVSGTLV--------A-PLPVIAGHEAAGIVESIGEG
ADH_GADCA   EVDVPHANEIRIKIIATGVCHTDLYHLFEGKHK--------DG-FPVVLGHEGAGIVESVGPG
ADH7_HUMAN  EVAPPKTKEVRIKILATGICRTDDH-VIKGTMV--------S-KFPVIVGHEATGIVESIGEG
ADHX_HUMAN  EVAPPKAHEVRIKIIATAVCHTDAY-TLSGADP--------EGCFPVILGHEGAGIVESVGEG
ADHB_HUMAN  EVAPPKAYEVRIKMVAVGICRTDDH-VVSGNLV--------T-PLPVILGHEAAGIVESVGEG
ADH3_ECOLI  DVAPPKKGEVLIKVTHTGVCHTDAF-TLSGDDP--------EGVFPVVLGHEGAGVVVEVGEG
Identifying the most conserved residues pins down the residues that are important within the family and whose position must be preserved
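The idea of picking out the most conserved positions of a family alignment can be sketched in a few lines. This is only an illustration under stated assumptions: conservation is measured as the fraction of rows that carry the most common residue in a column, and a gap counts as a non-matching row. The function names and the toy alignment are hypothetical; real analyses often use substitution-aware or entropy-based scores instead.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return, for each column, (most common residue, fraction of rows with it).

    Assumption: gaps are never the 'conserved residue' and count against
    conservation, because the fraction is taken over all rows.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    result = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
            result.append((residue, count / len(alignment)))
        else:
            result.append((gap, 0.0))  # column made only of gaps
    return result


def conserved_columns(alignment, threshold=0.9):
    """Indices (0-based) of columns whose conservation reaches the threshold."""
    return [i for i, (_, frac) in enumerate(column_conservation(alignment))
            if frac >= threshold]


# Toy alignment (hypothetical rows, not taken from the slides).
toy = ["GHEAAG", "GHEGAG", "GHEATG", "GHE-AG"]
print(conserved_columns(toy, threshold=0.9))   # columns conserved in >= 90% of rows
print(conserved_columns(toy, threshold=1.0))   # columns that are always conserved
```

With a threshold of 1.0 the same function returns the positions that are conserved in every sequence, which are the ones a low-identity alignment with the template should never break.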
Low-identity modelling
•Template selection based on experimental data
•Multiple alignment of proteins from the same family
•Use of predictors (secondary structure, solvent accessibility, bonding state of cysteines, transmembrane segments, ...)
TARGET    PDDAEMQGTIRSLDENVRSKAKDYMRRIVSSICGIYGATCEVKFMEDVYPTTVNN----
TEMPLATE  PASATLNADVRYARNEDFDAAMKTLEERAQQKKLP---EADVKVIVTR-----GRPAFNA

TARGET    PEVTDEVMKILSSISTV------VETEPVLGAEDFSRFLQKAPGTYFFLGTRNEKKGCIY
TEMPLATE  GEGGKKLVDKAVAYYKEAGGTLGVEERTGGGTDAAYAALSG---KPVIES--LGLPGFGY
[Legend: α-helix, β-strand]
Predicting structural features of the target helps its alignment with the template
Alcohol dehydrogenase from Sulfolobus solfataricus
Experimental data
•Contains 2 zinc atoms per monomer
•Active as a tetramer
Structures available in the database
•Dimeric alcohol dehydrogenases with 2 zinc atoms
2OHX (horse liver)
ID: 24%
•Tetrameric alcohol dehydrogenases with 1 zinc atom
1YKF (Thermoanaerobacter brockii)
ID: 23%
Similar monomers (RMSD < 0.2 nm). Differences in:
•the loop that coordinates the second zinc atom
•the tetramerization regions
Alignment of 87 ADHs with 2 Zn atoms per monomer
[Figure: multiple sequence alignment, columns 1 to 347, shown in three panels. Rows displayed: ADH1_SULSO, ADHE_HORSE, ADHS_HORSE, ADH_GADCA, ADH7_HUMAN, ADHX_HUMAN, ADHB_HUMAN, ADH1_PEA, ADH3_ECOLI, ADH3_SOLTU, ADH2_BACST, ADH1_ZYMMO, ADHP_ECOLI, ADH2_EMENI, ADH_MYCTU; the remaining sequences are not shown.]
•38 residues are conserved in more than 90% of the sequences
•12 residues are always conserved
Among these are the residues involved in coordinating the two metal centres
Alignment of 24 tetrameric ADHs
[Figure: multiple sequence alignment, columns 1 to 347, shown in three panels. Rows displayed: ADH1_SULSO, ADH_CLOBE, ADH_THEBR, ADH1_SOLTU, ADH2_LYCES, ADH1_ASPFL, ADH1_EMENI, ADH1_KLULA, ADH1_KLUMA, ADH1_YEAST, ADH1_CANAL, ADH1_PICST, ADH_SCHPO, ADH2_EMENI, ADH_ALCEU; the remaining sequences are not shown.]
Alignment between the target and two templates
[Figure: alignment of the target with a 2-Zn ADH and a tetrameric ADH; α-helices and β-strands are highlighted]
The alignment takes into account: conserved positions, secondary structure, solvent accessibility.
Model of the monomer
[Figure labels: coenzyme-binding domain, catalytic domain, catalytic zinc, structural zinc]
Model of the tetramer
Casadio R, Martelli PL, Giordano A, Rossi M, Raia CA
A low-resolution 3D model of the tetrameric alcohol dehydrogenase from Sulfolobus solfataricus
Protein eng 15:215-223 (2002)
Confirmation: the structure of the protein has since been solved
Model vs. X-ray structure (1JVB)
RMSD = 0.25 nm
Casadio et al., Protein Eng 15:215 (March 2002)
Esposito et al., JMB 318:463 (April 2002)
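The RMSD used to compare the model with the X-ray structure is the root-mean-square distance between equivalent atoms. The sketch below assumes that the two coordinate sets are already optimally superposed, list the same atoms in the same order, and use the same length unit (nm here); the superposition step itself is not shown, and the function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumption: the structures are already superposed, so no rotation or
    translation is applied here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must list the same atoms")
    if not coords_a:
        raise ValueError("no atoms given")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Two hypothetical atoms per structure, coordinates in nm.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
xray = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
print(f"RMSD = {rmsd(model, xray):.3f} nm")
```

Without a prior superposition the value also includes any rigid-body offset between the two structures, so it would overestimate the structural difference.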
Carboxypeptidase from Sulfolobus solfataricus
Experimental data
•Contains 1 zinc atom per monomer
•Active in oligomeric form; the number of monomers is unknown
Structures available in the database
•Carboxypeptidase with 1 zinc atom
1OBR (Thermoactinomyces vulgaris)
ID: 16%
symmetry compatible with hexamers
•Carboxypeptidase with 2 zinc atoms
1CG2 (Pseudomonas spirullum)
ID: 21%
symmetry compatible with tetramers
Structural superposition of the catalytic domains of 1OBR and 1CG2
[Figure labels: 1CG2:His90, 1OBR:His69, 1OBR:His204, 1CG2:Glu178, 1CG2:Asp119, 1OBR:Glu72]
RMSD = 0.25 nm
Alignment between the target and 1OBR
CPSso  MDLVEKLKNDVREIEDWIIQIRRKIHEYPELSYKEYNTSKLVAETLRKLGVEVEEGVGLP
1OBR   ------------------------DFPSYDSGYHNYNEMVNKINTVASNYPNI------V

CPSso  TAVVGKIR-GSKPGKTVALRADMDALPVEENTDLEFKSKVKGVMHACGH--DTH-VAMLL
1OBR   KKFSIGKSYEGR--ELWAVKIS-DNVGTDEN-------EPEVLYTALHHAREHLTVEMAL

CPSso  GGAYLLVKNKDLISGEIRLIFQPAEEDGGLGGAKPMIEAGVMNGVDYVFGIHISSSYP-S
1OBR   YTLDLFTQNYNLDSRITNLVNN-REIYIVFNINPDGGEYDISSGS---YKSWRKNRQPNS

CPSso  GVFATRKGPIMATPDAFKIIVHGKGGHGSAPHETIDP--IFISLQIANAIYGITARQIDP
1OBR   G--SSYVGTDLNRNYGYKWGCCG-GSSGSPSSETYRGRSAFSAPETAAMRDFINSRVVGG

CPSso  VQPFIISITTIHSGTKDNIIPDDAEMQGTIRSLDENVRSKAKDYMRRIVSSICGIYGATC
1OBR   KQ-QIKTLITFHTYSELILYPYGYTYTDVP-----SDMT-QDDFNVFKTMANTMAQTNGY

CPSso  EVKFMEDVYPTTVNNPEVTDEVMKILSSISTVVETEPVLGA----EDFSRFLQKAPGTYF
1OBR   TPQQASDLY---ITDGDMTDWAYGQHKIFAFTFEMYPTSYNPGFYPPDEVIGRETSRN--

CPSso  FLGTRNEKKGCIYPNHSSKFCVDEDVLKLGALAHALLAVKFSNK
1OBR   -------KEAVLYVAEKAD-CPYSVIGKSCSTK-----------
[Legend: α-helix, β-strand]
The alignment takes into account: zinc ligands, secondary structure, solvent accessibility.
Model of CPSso based on 1OBR
[Figure labels: zinc, water]
His 108, Asp 109 and His 245 coordinate the zinc
Glu 327 coordinates the water
Alignment between the target and 1CG2
CPSso  -------MDLVEKLKNDVREIEDWIIQIRRKIHEYPELSYKEYNTSKLVAETLRKLGVEV
1CG2   ALAQKRDNVLFQAATDEQPAVIKTLEKLVNIETGTGD---AEGIAAAGNFLEAELKNLGF

CPSso  EEGVGL------PTAVVGKIRGSKPGKTVALRADMDALPVEENTDLEFKSKVKGVMHACG
1CG2   TVTRSKSAGLVVGDNIVGKIK--------------GRGGK--------------NLLLMS

CPSso  H----------------------------DTHVAMLLGGAYLLVKNKDLIS--GEIRLIF
1CG2   HMDTVYLKGILAKAPFRVEGDKAYGPGIADDKGGNAVILHTLKLLKEYGVRDYGTITVLF

CPSso  QPAEEDGGLG---GAKPMIEAGVMNGV-DYVFGIHISSSYPSGVFATRKGPIMATPDAFK
1CG2   NTDEE-KGS-FGSRDLIQEEAKL----ADYVLSFEPTS-AGDEKLSL-GTS---GIAYVQ

CPSso  IIVHGKGGHGSAPHETIDPIFISLQIANAIYGITARQIDPVQPFIISITTIHSGTKDNII
1CG2   VNITGKASHAGAAPELGVN---ALVEASDLVLRTMNIDDKAKNLRFNWTIAKAGNVSNII

CPSso  PDDAEMQGTIRSLDENVRSKAKDYMRRIVSSICGIYGATCEVKFMEDVYPTTVNN----
1CG2   PASATLNADVRYARNEDFDAAMKTLEERAQQKKLP---EADVKVIVTR-----GRPAFNA

CPSso  PEVTDEVMKILSSISTV------VETEPVLGAEDFSRFLQKAPGTYFFLGTRNEKKGCIY
1CG2   GEGGKKLVDKAVAYYKEAGGTLGVEERTGGGTDAAYAALSG---KPVIES--LGLPGFGY

CPSso  PNHSSKFCVDEDVLKLGALAHALLAVKFSNK
1CG2   HSDKAEYVDISAIPRRLYMAARLIMDLGAGK
[Legend: α-helix, β-strand]
The alignment takes into account: zinc ligands, secondary structure, solvent accessibility.
Model of CPSso based on 1CG2
[Figure labels: zinc, water]
His 108, Asp 109 and His 168 coordinate the zinc
Glu 142 coordinates the water
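Site-directed mutagenesis
Zinc ligands in the 1OBR-based model: His 108, Asp 109, His 245
Zinc ligands in the 1CG2-based model: His 108, Asp 109, His 168
Mutants: H108A, D109L, H245A, H168A
Results (in slide order): Inactive, Inactive, Active, Inactive, Aggregates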
Model based on 1obr
Hexameric symmetry
Model based on 1cg2
Tetrameric symmetry
Small-angle X-ray scattering
Occhipinti E, Martelli PL, Spinozzi F, Corsi F, Formantici C, Molteni L, Amenitsch H, Mariani P, Tortora P, Casadio R
3D structure of Sulfolobus solfataricus carboxypeptidase developed by molecular modeling is confirmed by site-directed mutagenesis and small angle X-ray scattering
Biophys J 85:1165-1175 (2003)
Conclusions
Modelling at low sequence identity can give good results if all the available information (both experimental and derived from predictions) is used for the choice of the template and for the alignment.
These procedures are still largely NOT automated.
A low-resolution 3D model of VDAC (the sequence from Neurospora crassa)
Structural alignment of VDAC with the template
2omf_.seq/  AEIYNKDGNK VDLYGKAVGL HYFSKGNGEN SYGGNGDMTY ARLGFKGETQ
2omf_.str/  CCCCCCCCEE EEEEEEEEEE EEECCCCCCC CCCCCCCCCE EEEEEEEEEE
protx.str/  *******CCC CCCCEEEEEE EEEC****** ********CE EEEEEEEECC
protx.seq/  *******KGY NFGLWKLDLK TKTS****** ********SG IEFNTAGHSN

2omf_.seq/  I*NSDLTGYG QWEYNFQGNN SEGADAQTGN KTRLAFAGLK YADVGSFDYG
2omf_.str/  C*CCCEEEEE EEEEEEECCC CCCCCCCCCC EEEEEEEEEE ECCCEEEEEE
protx.str/  CCCCCEEEEE EEEEEEC*** ********** EEEEEEEEEC CCCCCEEEEE
protx.seq/  QESGKVFGSL ETKYKVK*** ********** DYGLTLTEKW NTDNTLFTEV

2omf_.seq/  RNYGVVYDAL GYTDMLPEFG GDTAYSDDFF VGRVGGVATY RNSNFFGLVD
2omf_.str/  ECCCCCCCCC CCCCCCCCCC CCCCCCCCCC CCCCCCEEEE EECCCCCCCC
protx.str/  EEEECC**** ********** ********** **CCEEEEEE EEECCCCCCC
protx.seq/  AVQDQL**** ********** ********** **LEGLKLSL EGNFAPQSGN

2omf_.seq/  GLNFAVQYLG KNER****** *********D TARRSNGDGV GGSISYEYE*
2omf_.str/  CEEEEEEEEC CCCC****** *********C CCCCCCCCEE EEEEEEEEC*
protx.str/  EEEEEEEEEE EEEECCCCCC CCCCCCCEEE EEEEEEEEEE EEEEEEECCC
protx.seq/  KNGKFKVAYG HENVKADSDV NIDLKGPLIN ASAVLGYQGW LAGYQTAFDT

2omf_.seq/  **GFGIVGAY GAADRTNLQE AQPLGNGKKA EQWATGLKYD ANNIYLAANY
2omf_.str/  **CEEEEEEE EEEECCCCCC CCCCCCCCEE EEEEEEEEEE ECCEEEEEEE
protx.str/  CCEEEEEEEE EEEEEEEEEE EEECCCCCCC EEEEEEEEEE CEEEEEEEEE
protx.seq/  QQSKLTTNNF ALGYTTKDFV LHTAVNDGQE FSGSIFQRTS DKLDVGVQLS

2omf_.seq/  GETRNATPIT NKFTNTSGFA NKTQDVLLVA QYQFDFGLRP SIAYTKSKAK
2omf_.str/  EEEECCCCCC CCCCCCCCCC CEEEEEEEEE EEECCCCEEE EEEEEEEEEE
protx.str/  EEECC***** ********** *CCCEEEEEE EEECCCCEEE EEEEEEC***
protx.seq/  WASGT***** ********** *SNTKFAIGA KYQLDDDARV RAKVNNA***

2omf_.seq/  DVEGIGDVDL VNYFEVGATY YFNKNMSTYV DYIINQIDSD NKLGVGSDDT
2omf_.str/  CCCCCCCEEE EEEEEEEEEE ECCCCEEEEE EEEEECCCCC CCCCCCCCCE
protx.str/  *********E EEEEEEEEEE EC***EEEEE EEEEECCC** *****CCCCE
protx.seq/  *********S QVGLGYQQKL RT***GVTLT LSTLVDGK** *****NFNAG

2omf_.seq/  VAVGIVYQF* ***
2omf_.str/  EEEEEEEEE* ***
protx.str/  EEEEEEEEEE EC*
protx.seq/  GHKIGVGLEL EA*
Prediction with HMM
A low resolution 3D model of VDAC:
location of mutated residues
Casadio et al., FEBS Lett 520:1-7 (2002)
Threading
Thread the sequence ….ACDGGTKLMAG…… into each model:
Model 1: Score 1
Model 2: Score 2
Model 3: Score 3
The best-scoring model is chosen as the candidate fold for the sequence
THREADING SERVERS
TOPITS (PredictProtein) Burkhard Rost (Columbia Univ.)
http://cubic.bioc.columbia.edu/predictprotein/
FRSVR David Eisenberg (UCLA)
http://fold.doe-mbi.ucla.edu/
3DPSSM Michael Sternberg (Imperial Cancer Res. Fund)
http://www.sbg.bio.ic.ac.uk/~3dpssm/
GenTHREADER David Jones (Brunel Univ.)
http://bioinf.cs.ucl.ac.uk/psipred/
HoMo
1D
FoRc
….the art of
being humble
Ab initio methods:
•Knowledge based potentials
•Contact map predictions
Prediction of Contact Maps
[Figure: residues in contact in a 3D structure: F 156, F 297, V 299, I 269, V 238, V 271, I 240]
Contact definition:
•Cβ-Cβ distance < 0.8 nm
•Sequence gap > 7 residues
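The contact definition above translates almost directly into code. The following is a minimal sketch, assuming one Cβ coordinate per residue in nm (with some substitute position supplied by the caller for glycine, which has no Cβ) and reading "sequence gap > 7" as |i - j| > 7. The function name and the toy coordinates are hypothetical.

```python
import math


def contact_map(cb_coords, cutoff_nm=0.8, min_gap=7):
    """Binary, symmetric contact map from one C-beta position per residue.

    Residues i and j are in contact when their distance is below cutoff_nm
    and their sequence separation |i - j| is larger than min_gap.
    """
    n = len(cb_coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j - i <= min_gap:
                continue  # too close along the sequence to count
            if math.dist(cb_coords[i], cb_coords[j]) < cutoff_nm:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy hairpin of 10 residues (hypothetical coordinates, in nm):
# residues 0-4 run along x, residues 5-9 run back 0.5 nm away in y.
hairpin = [(0.38 * i, 0.0, 0.0) for i in range(5)]
hairpin += [(0.38 * (9 - i), 0.5, 0.0) for i in range(5, 10)]

cmap = contact_map(hairpin)
pairs = [(i, j) for i in range(10) for j in range(i + 1, 10) if cmap[i][j]]
print(pairs)
```

In the toy hairpin, residues 1 and 8 are only 0.5 nm apart but are separated by exactly 7 positions, so under this reading of the definition they are not counted as a contact.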
Computation of Contact Maps
From 3D Structure to Contact Map
[Figure: a 3D structure with the contacting residues F 156, F 297, I 269, V 238, V 299, V 271, I 240 labelled, and a contact map whose two axes carry the sequence TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN]
We can build the correct structure from the correct contact map
[Figure: 1QHJ (1.9 Å) and the model obtained from its contact map, with the N and C termini marked; labels: Contact map, MARC, Model]
RMSD = 2.5 Å
Representation of the input coding based on ordered couples.
(A) An alignment of 5 (hypothetical) sequences as they are represented in an HSSP file (Sander and Schneider, 1991). i and j stand for the positions of the two residues making or not making contact (A and D in the leading sequence, or sequence 1). (B) Single sequence coding. The position representing the couple (AD) in the vector is set to 1.0 while the other positions are set to 0. (C) Multiple sequence coding. For each sequence in the alignment (1 to 5 in the scheme in A) a couple of residues in position i and j is counted. The final input coding, representing the frequency of each couple in the alignment, is normalized to the number of sequences.
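The coding described in this caption can be sketched as follows. It is an illustration under stated assumptions, not the network's actual input layer: the vector is taken to have one element for each of the 20 x 20 ordered couples of standard residues, and sequences with a gap or a non-standard symbol at position i or j add nothing to the counts. The names `PAIR_INDEX` and `encode_pair` and the five toy sequences are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# One vector position for every ordered couple (a, b); 400 in total (assumption).
PAIR_INDEX = {pair: k for k, pair in enumerate(product(AMINO_ACIDS, repeat=2))}


def encode_pair(alignment, i, j):
    """Frequency of each ordered residue couple at columns (i, j).

    The counts are normalized to the number of sequences in the alignment,
    so a single sequence gives 1.0 at its own couple and 0 elsewhere.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    vector = [0.0] * len(PAIR_INDEX)
    for row in alignment:
        couple = (row[i], row[j])
        if couple in PAIR_INDEX:  # skips gaps and non-standard symbols
            vector[PAIR_INDEX[couple]] += 1.0
    return [value / len(alignment) for value in vector]


# Single sequence coding: only the couple (A, D) is set.
single = encode_pair(["ACD"], 0, 2)

# Multiple sequence coding on five hypothetical aligned sequences.
multiple = encode_pair(["ACD", "ACD", "GCD", "A-D", "SCE"], 0, 2)
print(multiple[PAIR_INDEX[("A", "D")]], multiple[PAIR_INDEX[("G", "D")]])
```

Because the couples are ordered, (A, D) and (D, A) occupy different positions in the vector, which keeps the information about which residue sits at i and which at j.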
The neural network architecture for prediction
of contact maps
T0087: 310 residues
A=20 % (FR/NF)
C
N
T0110: 128 residues
A=30% (NF)
N
C
Bioinformatics I
Model Accuracy Evaluation
CASP
Community Wide Experiment on the Critical Assessment
of Techniques for Protein Structure Prediction
http://PredictionCenter.llnl.gov/casp5/
EVA
Evaluation of Automatic protein structure prediction
[ Burkhard Rost, Andrej Sali, http://maple.bioc.columbia.edu/eva/ ]
3D - Crunch
Very Large Scale Protein Modelling Project
http://www.expasy.org/swissmod/SM_LikelyPrecision.html
Protein Structure Resources
PDB
http://www.pdb.org
PDB – Protein Data Bank of experimentally solved structures (RCSB)
CATH
http://www.biochem.ucl.ac.uk/bsm/cath/
Hierarchical classification of protein domain structures
SCOP
http://scop.mrc-lmb.cam.ac.uk/scop/
Alexey Murzin’s Structural Classification of proteins
DALI
http://www2.ebi.ac.uk/dali/
Liisa Holm and Chris Sander’s protein structure comparison server
SS-Prediction and Fold Recognition
PHD
http://cubic.bioc.columbia.edu/predictprotein/
Burkhard Rost’s Secondary Structure and Solvent Accessibility Prediction Server
3DPSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/
Fold Recognition Server using 1D and 3D Sequence Profiles coupled with Secondary
Structure and Solvation Potential Information.
Protein Homology Modeling Resources
SWISS MODEL: http://www.expasy.ch/swissmod/
Deep View - SPDBV:
homepage: http://www.expasy.ch/spdbv/
Tutorials http://www.usm.maine.edu/~rhodes/SPVTut/
http://www.bbsrc.ac.uk/molbiol/
WhatIf http://www.cmbi.kun.nl/whatif/
Gert Vriend’s protein structure modeling analysis program WhatIf
Modeller: http://guitar.rockefeller.edu/modeller/
Andrej Sali's homology protein structure modelling by satisfaction of spatial restraints
FAMS: http://physchem.pharm.kitasato-u.ac.jp/FAMS/fams.html
Full Automatic Modelling System (FAMS); Kitasato University; Tokyo, Japan
3D-JIGSAW: http://www.bmm.icnet.uk/people/paulb/3dj/form.html
Comparative Modelling Server; Imperial Cancer Research Fund; London, UK
CPHmodels: http://www.cbs.dtu.dk/services/CPHmodels/
Centre for Biological Sequence Analysis; The Technical University of Denmark; Denmark
SDSC1: http://cl.sdsc.edu/hm.html
SDSC Structure Homology Modelling Server; San Diego Supercomputing Centre