MHC binding and MHC polymorphism
Or: Finding the needle in the haystack

MHC-I molecules present peptides on the surface of most cells.

HLA peptide binding
Figure by Anne Mølgaard

HLA binding motif
SLLPAIVEL LLDVPTAAV HLIDYLVTS ILFGHENRV LERPGGNEI PLDGEYFTL ILGFVFTLT KLVALGINA KTWGQYWQV SLLAPGAKQ ILTVILGVL TGAPVTYST GAGIGVAVL KARDPHSGH AVFDRKSDA GLCTLVAML VLHDDLLEA ISNDVCAQV YTAFTIPSI NMFTPYIGV VVLGVVFGI GLYDGMEHL EAAGIGILT YLSTAFARV FLDEFMEGV AAGIGILTV AAGIGILTV YLLPAIVHI VLFRGGPRG ILAPPVVKL ILMEHIHKL ALSNLEVKL GVLVGVALI LLFGYPVYV DLMGYIPLV TITDQVPFS KIFGSLAFL KVLEYVIKV VIYQYMDDL IAGIGILAI KACDPHSGH LLDFVRFMG FIDSYICQV LMWITQCFL VKTDGNPPE RLMKQDFSV LMIIPLINV ILHNGAYSL KMVELVHFL TLDSQVMSL YLLEMLWRL ALQPGTALL FLPSDFFPS FLPSDFFPS TLWVDPYEV MVDGTLLLL ALFPQLVIL ILDQKINEV ALNELLQHV RTLDKVLEV GLSPTVWLS RLVTLKDIV AFHHVAREL ELVSEFSRM FLWGPRALV VLPDVFIRC LIVIGILIL ACDPHSGHF VLVKSPNHV IISAVVGIL SLLMWITQC SVYDFFVWL RLPRIFCSC TLFIGSHVV MIMVKCWMI YLQLVFGIE STPPPGTRV SLDDYNHLV VLDGLDVLL SVRDRLARL AAGIGILTV GLVPFLVSV YMNGTMSQV GILGFVFTL SLAGGIIGV DLERKVESL HLSTAFARV WLSLLVPFV MLLAVLYCL YLNKIQNSL KLTPLCVTL GLSRYVARL VLPDVFIRC LAGIGLIAA SLYNTVATL GLAPPQHLI VMAGVGSPY QLSLLMWIT FLYGALLLA FLWGPRAYA SLVIVTTFV MLGTHTMEV MLMAQEALA KVAELVHFL RTLDKVLEV SLYSFPEPE SLREWLLRI FLPSDFFPS KLLEPVLLL MLLSVPLLL STNRQSGRQ LLIENVASL FLGENISNF RLDSYVRSL FLPSDFFPS AAGIGILTV MMRKLAILS VLYRYGSFS FLLTRILTI AVGIGIAVV VDGIGILTI RGPGRAFVT LLGRNSFEV LLWTLVVLL LLGATCMFV VLFSSDFRI RLLQETELV VLQWASLAV MLGTHTMEV LMAQEALAF IMIGVLVGV GLPVEYLQV ALYVDSLFF LLSAWILTA AAGIGILTV LLDVPTAAV SLLGLLVEV GLDVLTAKV FLLWATAEA ALSDHHIYL YMNGTMSQV CLGGLLTMV YLEPGPVTA AIMDKNIIL YIGEVLVSV HLGNVKYLV LVVLGLLAV GAGIGVLTA NLVPMVATV PLTFGWCYK SVRDRLARL RLTRFLSRV LMWAKIGPV SLFEGIDFY ILAKFLHWL SLADTNSLA VYDGREHTV ALCRWGLLL KLIANNTRV SLLQHLIGL AAGIGILTV FLWGPRALV LLDVPTAAV ALLPPINIL RILGAVAKV SLPDFGISY GLSEFTEYL GILGFVFTL FIAGNSAYE LLDGTATLR IMDKNIILK CINGVCWTV GIAGGLALL ALGLGLLPV AAGIGIIQI GLHCYEQLV VLEWRFDSR LLMDCSGSI YMDGTMSQV SLLLELEEV
SLDQSVVEL STAPPHVNV LLWAARPRL YLSGANLNL LLFAGVQCQ FIYAGSLSA ELTLGEFLK AVPDEIPPL ETVSEQSNV LLDVPTAAV TLIKIQHTL QVCERIPTI KKREEAPSL STAPPAHGV ILKEPVHGV KLGEFYNQM ITDQVPFSV SMVGNWAKV VMNILLQYV GLQDCTMLV GIGIGVLAA QAGIGILLA PLKQHFQIV TLNAWVKVV CLTSTVQLV FLTPKKLQC SLSRFSWGA RLNMFTPYI LLLLTVLTV GVALQTMKQ RMFPNAPYL VLLCESTAV KLVANNTRL MINAYLDKL FAYDGKDYI ITLWQRPLV

Peptide binding motif
• Height of a column equals the information content I at that position
• Relative height of a letter is its frequency p
HLA-A0201
High information positions

NetMHC-3.2
www.cbs.dtu.dk/services/NetMHC-3.2
• 79 ANNs covering human, primate and mouse MHC
• Predictions can be made for 8-11 mer peptides

HLA polymorphism
The IMGT/HLA Sequence Database currently encompasses more than 1500 HLA class I proteins
Source: http://www.anthonynolan.com/HIG/index.html

HLA polymorphism
• < 70 HLA alleles are characterized by binding data
• Reliable MHC class I binding predictions (NetMHC-3.2) for 57 HLA-A and HLA-B molecules
• No methods for HLA-C and HLA-E
• Long way to over 1500!

More MHC molecules: more diversity in the presented peptides
• 1% probability that an MHC molecule presents a peptide
• Different hosts sample different peptides from the same pathogen

HLA polymorphism
• Few human beings will share the same set of HLA alleles
  – Different persons will react differently to infection by the same pathogen
• A CTL-based vaccine must include epitopes specific for each HLA allele in a population
  – A CTL-based vaccine must consist of ~1000 HLA class I epitopes

HLA specificity clustering
A0201 A0101 A6802 B0702
Logos of HLA-A alleles
O Lund et al., Immunogenetics. 2004 55:797-810

Coverage of HLA alleles
Supertype    Selected allele
A1           A*0101
A2           A*0201
A3           A*1101
A24          A*2401
A26 (new*)   A*2601
B7           B*0702
B8 (new*)    B*0801
B27          B*2705
B39 (new*)   B*3901
B44          B*4001
B58          B*5801
B62          B*1501
Clustering in: O Lund et al., Immunogenetics.
2004 55:797-810

Data
             HLA-A   HLA-B   HLA-C
Proteins     681     1165    569
SYFPEITHI    27      59      4
IEDB         34      28      0
• Alleles characterized with 5 or more data points
• 3% covered

Supertypes. What are they good for?
• Alleles within a supertype present the same set of peptides!
• Is this really so?
  – Less than 50% of A6802 binders will bind to A0201!
  – Less than 33% of A0201 binders will bind to A6802!

The truth about supertypes!
A3 A26 A24 A2 A1

HLA polymorphism!
[Figure: HLA-A and HLA-B allele names (B0807, A6601, B4058, ..., B5706, B3546)]

HLA polymorphism!
[Figure: subset of the allele names (B1513, B3811, A3106, ..., B0709, A2442)]

Predicting the specificity
Align A3001 (365) versus A3002 (365).
Aln score 2445.000   Aln len 365   Id 0.9890

A3001   0 MAVMAPRTLLLLLSGALALTQTWAGSHSMRYFSTSVSRPGSGEPRFIAVGYVDDTQFVRFDSDAA
          :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
A3002   0 MAVMAPRTLLLLLSGALALTQTWAGSHSMRYFSTSVSRPGSGEPRFIAVGYVDDTQFVRFDSDAA

A3001  65 SQRMEPRAPWIEQERPEYWDQETRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQIMYGCDVGSD
          :::::::::::::::::::::::::::: :::::  :::::::::::::::::::::::::::::
A3002  65 SQRMEPRAPWIEQERPEYWDQETRNVKAHSQTDRENLGTLRGYYNQSEAGSHTIQIMYGCDVGSD

A3001 130 GRFLRGYEQHAYDGKDYIALNEDLRSWTAADMAAQITQRKWEAARWAEQLRAYLEGTCVEWLRRY
          ::::::::::::::::::::::::::::::::::::::::::::: :::::::::::::::::::
A3002 130 GRFLRGYEQHAYDGKDYIALNEDLRSWTAADMAAQITQRKWEAARRAEQLRAYLEGTCVEWLRRY

A3001 195 LENGKETLQRTDPPKTHMTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPA
          :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
A3002 195 LENGKETLQRTDPPKTHMTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPA

A3001 260 GDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPKPLTLRWELSSQPTIPIVGIIAGLVLLGAVITGA
          :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
A3002 260 GDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPKPLTLRWELSSQPTIPIVGIIAGLVLLGAVITGA

A3001 325 VVAAVMWRRKSSDRKGGSYTQAASSDSAQGSDVSLTACKV
          ::::::::::::::::::::::::::::::::::::::::
A3002 325 VVAAVMWRRKSSDRKGGSYTQAASSDSAQGSDVSLTACKV

HLA-A*3001   HLA-A*3002

NetMHCpan - the method
NetMHC   NetMHCpan

Pan-specific method
• Include polymorphic residues in potential contact with the bound peptide
• The contact residues are defined as being within 4.0 Å of the peptide in any of a representative set of HLA-A and -B structures with nonamer peptides
• Only polymorphic residues from A, B, and C alleles are included
• Pseudo-sequence consisting of 34 amino acid residues
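
The alignment and the pseudo-sequence idea above can be sketched in a few lines of Python. This is a minimal illustration, not the NetMHCpan implementation: the function names are hypothetical, the two strings are a 15-residue excerpt of the A3001/A3002 alignment (the stretch starting at NVKA), and the excerpt simply keeps every polymorphic position, whereas the method described above also requires the position to be a peptide contact residue.

```python
def differing_positions(seq_a, seq_b):
    """Return 0-based positions where two pre-aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def identity(seq_a, seq_b):
    """Fraction of identical positions (the 'Id' value of an alignment)."""
    return 1.0 - len(differing_positions(seq_a, seq_b)) / len(seq_a)


def pseudo_sequence(seq, positions):
    """Concatenate the residues found at the selected positions."""
    return "".join(seq[i] for i in positions)


# Excerpt of the alignment shown above (hypothetical choice of window).
a3001 = "NVKAQSQTDRVDLGT"
a3002 = "NVKAHSQTDRENLGT"

polymorphic = differing_positions(a3001, a3002)
print(polymorphic)                           # [4, 10, 11]
print(pseudo_sequence(a3001, polymorphic))   # QVD
print(pseudo_sequence(a3002, polymorphic))   # HEN
```

Over the full 365-residue alignment the two alleles differ at four positions, which gives 361/365 ≈ 0.9890, the identity reported in the alignment header.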

Example
Peptide     Amino acids of HLA pockets   HLA     Aff
VVLQQHSIA   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.131751
SQVSFQQPL   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.487500
SQCQAIHNV   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.364186
LQQSTYQLV   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.582749
LQPFLQPQL   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.206700
VLAGLLGNV   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.727865
VLAGLLGNV   YFAVWTWYGEKVHTHVDTLLRYHY     A0202   0.706274
VLAGLLGNV   YFAEWTWYGEKVHTHVDTLVRYHY     A0203   1.000000
VLAGLLGNV   YYAVLTWYGEKVHTHVDTLVRYHY     A0206   0.682619
VLAGLLGNV   YYAVWTWYRNNVQTDVDTLIRYHY     A6802   0.407855

NetMHCpan-2.2
www.cbs.dtu.dk/services/NetMHCpan
• More than 1500 MHC
• Predictions for 8-11 mer peptides
• Predictions for novel HLA alleles

Specificity clustering
HLA-C HLA-C HLA-C*0102

Evaluation. MHC ligands from SYFPEITHI
• Sort on binding
• Top rank: F-rank = 0.0
• Random rank: F-rank = 0.5
SYFPEITHI benchmark (1400 ligands restricted to 46 HLA molecules)

Prediction

Primate MHCs
• Can we predict binding specificities for non-human primates using the NetMHCpan method trained on human specificity data only?

Yes. Monkeys are just like humans
Patr B*0101   Patr A*0101
Sidney et al. (2006)

And even pigs and cows are (somewhat) like humans
Horses, pigs, cows, and sheep

HLA class II polymorphism
• More than 2000 HLA class II allele combinations
  – HLA-DR
  – HLA-DQ
  – HLA-DP
• Only data for 14 of the more than 500 known HLA-DR alleles (< 3%)
• No data for HLA-DQ and HLA-DP

Class II MHC binding
• MHC class II binds peptides in the class II antigen presentation pathway
• Binds peptides of length 9-18 (even whole proteins can bind!)
• Binding cleft is open
• Binding core is 9 aa

NN-align
Update method to minimize prediction error

PEPTIDE           Pred   Meas
VPLTDLRIPS        0.00   0.03
GWPYIGSRSQIIGRS   0.19   0.08
ILVQAGEAETMTPSG   0.07   0.24
HNWVNHAVPLAMKLI   0.77   0.59
SSTVKLRQNEFGPAR   0.15   0.19
NMLTHSINSLISDNL   0.17   0.02
LSSKFNKFVSPKSVS   0.81   0.97
GRWDEDGAKRIPVDV   0.39   0.45
ACVKDLVSKYLADNE   0.58   0.57
NLYIKSIQSLISDTQ   0.84   0.66
IYGLPWMTTQTSALS   1.00   0.93
QYDVIIQHPADMSWC   0.12   0.11

NN-align
Predict binding affinity and core
GRWDEDGAKRIPVDV     0.45
GRWDEDGAKRIP        0.15
G RWDEDGAKRIPV      0.03
GR WDEDGAKRIPVD     0.39
GRW DEDGAKRIP VDV   0.05
Calculate prediction error
Nielsen et al. BMC Bioinformatics 2009, 10:296

NetMHCII (NN-align)
[Figure annotations: P<0.001, P<0.05, P<0.05]
Nielsen et al. BMC Bioinformatics 2009, 10:296

NetMHC-IIpan
• Train on peptide:MHC sequences
  – As for NetMHCpan for class I
• Need to identify the peptide core!
  – Use NetMHCII

Data

MHC pseudo-sequence
• Include polymorphic residues in potential contact with the bound peptide
• The contact residues are defined as being within 4.0 Å of the peptide in any of a representative set of HLA-DR, -DQ, and -DP structures with peptides
• Only polymorphic residues are included
• Pseudo-sequence consisting of 25 amino acid residues

NetMHCIIpan server

MHC class II pathway
Figure by Eric A.J. Reits

Evaluation. MHC ligands from SYFPEITHI

Performance details

Epitope-based vaccines and diagnostics
• Challenges
  – Identify epitopes in the pathogen genome
    A small viral genome contains >> 1000 potential CTL epitopes
  – HLA diversity
    No two humans will induce the same reaction to a pathogen infection
  – Viral escape
    No two viral strains will “host” the same set of T cell epitopes

Viral escape
Figure courtesy Mette Voldby Larsen

Viral escape
The virus of today is different from the virus of tomorrow (viral escape)
??? ?? ????
Figure courtesy Mette Voldby Larsen

Immune dominance
Dominance
• Highly immunogenic peptides
• High variability = easily escapable
• Immune response useless
Subdominance
• Weakly immunogenic peptides
• Low variability = not escapable
• Immune response highly effective = good vaccine candidates

Pathogen variability

Rational epitope selection
• We have more than 2000 MHC molecules
• We have more than 500 different pathogenic strains
• How to design a method to select a small pool of peptides that will cover both the MHC polymorphism and the pathogen diversity?
  – No peptide will bind to all MHC molecules, and few (maybe even no) peptides will be present in all pathogenic strains

Polyvalent vaccines
• The equivalent of this in epitope-based vaccines is to select epitopes so that together they cover all strains
Uneven coverage, average coverage = 2 [Figure: epitopes on Strain 1 and Strain 2]
Even coverage, average coverage = 2 [Figure: epitopes on Strain 1 and Strain 2]

EpiSelect. Pathogen diversity
S_j^G = \sum_i \frac{P_i^j}{C_i}

Cross-clade immunogens
Table 3. Highly immunogenic epitopes and their cross-clade recognition. 21 HLA-supertype restricted epitopes were highly immunogenic and induced a CTL response in at least four subjects. The table shows the subtype the responding subjects were infected with and at which frequency the epitope sequence is found among the HIV-1 subtype reference strains.

Epitope sequence   HLA-supertype & protein region   Subtypes of the responders
QVPLRPMTY          A1-nef                           B, B, C, D, AE, nd
LTDTTNQKT          A1-pol                           B, B, B, C, C, AE
KIQNFRVYY          A1-pol                           B, D, AE, nd
FLGKIWPSHK         A2-gag                           A1, A1, A1, B, B, B, B, C, AE, nd
SLYNTVATL          A2-gag                           A1, B, B, B, C, C, C
GALDLSHFL          A2-nef, var. 1 (2)               A1, B, B, B, C, AG
AAVDLSHFL          A2-nef, var. 2                   A1, B, B, B, AG
ILKEPVHGV          A2-pol                           B, B, B, B, C, C, nd
QLTEAVQKI          A2-pol                           B, C
AVDLSHFLK          A3-nef, var. 1                   A1, B, D, nd
ALDLSHFLK          A3-nef, var. 2                   A1, B, D, nd
AFDLSFFLK          A3-nef, var. 3                   B, C, C, C, C, AE, AE
WYIKIFIII          A24-env                          B B, B, C, C
HYMLKHLVW          A24-gag                          A1, B, B, C
IPRRIRQGL          B7-env, var 1                    A1, B, C, AE
IPRRIRQGF          B7-env, var 2                    A1, B, AE, CPX06
HPVHAGPVA          B7-gag                           A1, B, C, D
RALGPGATL          B7-gag                           A1, B, C, D
TPQDLNTML          B7-pol                           A1, B, C, C
SPAIFQSSM          B7-pol                           A1, A1, B, C, C, D, AE
QEILDLWVY          B44-nef                          A1, A1, B, B, B, C

Frequency of the epitope sequence in subtype (1): A, B, C, D, AE [color-coded cells]
(1) The color represents the frequencies of the exact epitope sequence in the different subtypes; blue: 0%, light blue: 1-24%, orange: 25-49% and red: >50%.
(2) Subtype variants of the same epitope.
nd: not determined
Perez et al., JI, 2008

All HIV responsive patients respond to at least one of nine peptides
Perez et al., JI, 2008

PopCover - Searching in two dimensions

HIV class II case story
• Data
  – 396 full-length genomes with annotated tat, nef, gag and pol proteins covering A (50), B (104), C (156), D (40) and AE (46) strains
• HLA-DR frequencies taken from
  – 43 (allele frequency in at least one population > 2.5%) HLA class II alleles
    • 36 HLA-DRB1, HLA-DR3,4,5, and 4 HLA-DQ alleles
• Select predicted peptide binders
  – 5608 (tat), 20961 (nef), 31848 (gag), 42748 (pol)
• Select peptides from each protein with optimal genomic and HLA coverage
  – tat (4), nef (15), gag (15) and pol (15)

EpiSelect and PopCover
• EpiSelect
  S_j^G = \sum_i \frac{P_i^j}{C_i}
  The sum is over all genomes i. P_i^j is 1 if epitope j is present in genome i. C_i is the number of times genome i has been targeted in the already selected set of epitopes.
• PopCover
  S_j^{A,G} = \sum_i \sum_k \frac{R_{ki}^j f_k g_i}{E_{ki}}
  The sum is over all genomes i and HLA alleles k.
  R_{ki}^j is 1 if epitope j is present in genome i and is presented by allele k, E_{ki} is the number of times allele k has been targeted by epitopes in genome i in the already selected set of epitopes, f_k is the frequency of allele k, and g_i is the frequency of genome i.

Benchmark
• Create 10,000 virtual patients with a given HIV genomic sequence and HLA alleles as defined by the HLA allele frequencies and HIV genomic data
• Test how many of these patients are targeted by at least one of the selected peptides

HIV patient coverage
[Figure: HIV patient coverage (%Hit) for tat, nef, gag and pol; series: Random, Highest, PopCover]
• Selected peptide pools
  – tat (4), nef (15), gag (15) and pol (15)

So, we can find the needle in the haystack
• Given a protein sequence and an HLA molecule, we can accurately predict which peptides will bind (70-95%)
• 15-80% of these will in turn be epitopes
But, can we find the haystack?

Conclusions
• Rational epitope discovery is feasible
  – Prediction methods are an important guide for epitope identification
  – Given a protein sequence and an HLA molecule, we can predict the peptide binders (find the needle in the haystack)
• Pan-specific MHC prediction methods can deal with the immense MHC polymorphism
• Epitope selection strategies can deal with pathogen diversity
• For large pathogens, we still have no handle on how to select immunogenic proteins (we cannot find the haystack)

Acknowledgements
• Immunological Bioinformatics group, CBS, DTU
  – Ole Lund - Group leader
  – Claus Lundegaard - Databases, HLA binding predictions
• Collaborators
  – IMMI, University of Copenhagen
    • Søren Buus: MHC binding
  – La Jolla Institute of Allergy and Infectious Diseases
    • A. Sette, B. Peters: Epitope database
• and many, many more
www.cbs.dtu.dk/services
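
Appendix: sketch of the PopCover selection loop
The PopCover score described on the EpiSelect and PopCover slide lends itself to a greedy selection loop: score every candidate, pick the best, update the coverage counts, repeat. The Python below is a minimal sketch under stated assumptions, not the published implementation. The function name, the data layout, and the toy alleles, genomes and epitopes are hypothetical. In particular, the small offset `beta` in the denominator is an assumption added here so that pairs that have not yet been targeted (E_ki = 0) do not cause a division by zero; the formula as recovered from the slide shows no such term.

```python
def popcover_select(hits, allele_freq, genome_freq, n_select, beta=0.01):
    """Greedy epitope selection in the spirit of the PopCover score.

    hits[j] is the set of (allele k, genome i) pairs with R^j_ki = 1, i.e.
    epitope j occurs in genome i and is presented by allele k.
    beta is an ASSUMED offset that keeps the score finite when E_ki = 0.
    """
    covered = {}                 # E_ki: times pair (k, i) has been targeted so far
    candidates = set(hits)
    selected = []

    def score(j):
        # Sum over the pairs where R^j_ki = 1 of f_k * g_i / (beta + E_ki).
        return sum(
            allele_freq[k] * genome_freq[i] / (beta + covered.get((k, i), 0))
            for k, i in hits[j]
        )

    for _ in range(min(n_select, len(candidates))):
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for pair in hits[best]:
            covered[pair] = covered.get(pair, 0) + 1
    return selected


# Toy data (hypothetical): two alleles, two genomes, three candidate epitopes.
allele_freq = {"A": 0.5, "B": 0.5}
genome_freq = {"g1": 0.5, "g2": 0.5}
hits = {
    "e1": {("A", "g1"), ("A", "g2"), ("B", "g1")},
    "e2": {("A", "g1"), ("A", "g2")},
    "e3": {("B", "g2")},
}

print(popcover_select(hits, allele_freq, genome_freq, n_select=2))  # ['e1', 'e3']
```

In the toy run the second pick is e3 rather than e2, even though e2 covers more pairs: everything e2 covers was already targeted by e1, so its contribution is down-weighted. Restricting the data to a single allele with frequency 1 reduces the same loop to an EpiSelect-style selection over genomes only.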