MHC binding and MHC polymorphism Or Finding the needle in the haystack MHC-I molecules present peptides on the surface of most cells.


MHC binding and
MHC polymorphism
Or
Finding the needle in the haystack
MHC-I molecules present peptides
on the surface of most cells
HLA peptide binding
Figure by Anne Mølgaard
HLA binding motif
SLLPAIVEL
LLDVPTAAV
HLIDYLVTS
ILFGHENRV
LERPGGNEI
PLDGEYFTL
ILGFVFTLT
KLVALGINA
KTWGQYWQV
SLLAPGAKQ
ILTVILGVL
TGAPVTYST
GAGIGVAVL
KARDPHSGH
AVFDRKSDA
GLCTLVAML
VLHDDLLEA
ISNDVCAQV
YTAFTIPSI
NMFTPYIGV
VVLGVVFGI
GLYDGMEHL
EAAGIGILT
YLSTAFARV
FLDEFMEGV
AAGIGILTV
AAGIGILTV
YLLPAIVHI
VLFRGGPRG
ILAPPVVKL
ILMEHIHKL
ALSNLEVKL
GVLVGVALI
LLFGYPVYV
DLMGYIPLV
TITDQVPFS
KIFGSLAFL
KVLEYVIKV
VIYQYMDDL
IAGIGILAI
KACDPHSGH
LLDFVRFMG
FIDSYICQV
LMWITQCFL
VKTDGNPPE
RLMKQDFSV
LMIIPLINV
ILHNGAYSL
KMVELVHFL
TLDSQVMSL
YLLEMLWRL
ALQPGTALL
FLPSDFFPS
FLPSDFFPS
TLWVDPYEV
MVDGTLLLL
ALFPQLVIL
ILDQKINEV
ALNELLQHV
RTLDKVLEV
GLSPTVWLS
RLVTLKDIV
AFHHVAREL
ELVSEFSRM
FLWGPRALV
VLPDVFIRC
LIVIGILIL
ACDPHSGHF
VLVKSPNHV
IISAVVGIL
SLLMWITQC
SVYDFFVWL
RLPRIFCSC
TLFIGSHVV
MIMVKCWMI
YLQLVFGIE
STPPPGTRV
SLDDYNHLV
VLDGLDVLL
SVRDRLARL
AAGIGILTV
GLVPFLVSV
YMNGTMSQV
GILGFVFTL
SLAGGIIGV
DLERKVESL
HLSTAFARV
WLSLLVPFV
MLLAVLYCL
YLNKIQNSL
KLTPLCVTL
GLSRYVARL
VLPDVFIRC
LAGIGLIAA
SLYNTVATL
GLAPPQHLI
VMAGVGSPY
QLSLLMWIT
FLYGALLLA
FLWGPRAYA
SLVIVTTFV
MLGTHTMEV
MLMAQEALA
KVAELVHFL
RTLDKVLEV
SLYSFPEPE
SLREWLLRI
FLPSDFFPS
KLLEPVLLL
MLLSVPLLL
STNRQSGRQ
LLIENVASL
FLGENISNF
RLDSYVRSL
FLPSDFFPS
AAGIGILTV
MMRKLAILS
VLYRYGSFS
FLLTRILTI
AVGIGIAVV
VDGIGILTI
RGPGRAFVT
LLGRNSFEV
LLWTLVVLL
LLGATCMFV
VLFSSDFRI
RLLQETELV
VLQWASLAV
MLGTHTMEV
LMAQEALAF
IMIGVLVGV
GLPVEYLQV
ALYVDSLFF
LLSAWILTA
AAGIGILTV
LLDVPTAAV
SLLGLLVEV
GLDVLTAKV
FLLWATAEA
ALSDHHIYL
YMNGTMSQV
CLGGLLTMV
YLEPGPVTA
AIMDKNIIL
YIGEVLVSV
HLGNVKYLV
LVVLGLLAV
GAGIGVLTA
NLVPMVATV
PLTFGWCYK
SVRDRLARL
RLTRFLSRV
LMWAKIGPV
SLFEGIDFY
ILAKFLHWL
SLADTNSLA
VYDGREHTV
ALCRWGLLL
KLIANNTRV
SLLQHLIGL
AAGIGILTV
FLWGPRALV
LLDVPTAAV
ALLPPINIL
RILGAVAKV
SLPDFGISY
GLSEFTEYL
GILGFVFTL
FIAGNSAYE
LLDGTATLR
IMDKNIILK
CINGVCWTV
GIAGGLALL
ALGLGLLPV
AAGIGIIQI
GLHCYEQLV
VLEWRFDSR
LLMDCSGSI
YMDGTMSQV
SLLLELEEV
SLDQSVVEL
STAPPHVNV
LLWAARPRL
YLSGANLNL
LLFAGVQCQ
FIYAGSLSA
ELTLGEFLK
AVPDEIPPL
ETVSEQSNV
LLDVPTAAV
TLIKIQHTL
QVCERIPTI
KKREEAPSL
STAPPAHGV
ILKEPVHGV
KLGEFYNQM
ITDQVPFSV
SMVGNWAKV
VMNILLQYV
GLQDCTMLV
GIGIGVLAA
QAGIGILLA
PLKQHFQIV
TLNAWVKVV
CLTSTVQLV
FLTPKKLQC
SLSRFSWGA
RLNMFTPYI
LLLLTVLTV
GVALQTMKQ
RMFPNAPYL
VLLCESTAV
KLVANNTRL
MINAYLDKL
FAYDGKDYI
ITLWQRPLV
Peptide Binding motif
• Height of a column equals its information content, I
• Relative height of a letter equals its frequency, p
HLA-A0201
High information
positions
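The logo construction can be sketched in Python. This is a minimal sketch, assuming the information content of a column is I = log2(20) - H (Shannon entropy over a 20-letter amino acid alphabet) with no pseudocounts or small-sample correction. The function name `column_information` and the five-peptide sample are illustrative choices, not part of the slides.

```python
import math
from collections import Counter

def column_information(peptides, position, alphabet_size=20):
    """Return (I, freqs): information content in bits and letter frequencies of one column."""
    counts = Counter(p[position] for p in peptides)
    total = sum(counts.values())
    freqs = {aa: n / total for aa, n in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return math.log2(alphabet_size) - entropy, freqs

# A handful of the HLA-A0201 binders listed above (illustrative subset).
sample = ["SLLPAIVEL", "GLCTLVAML", "YLLPAIVHI", "KLVALGINA", "ILAPPVVKL"]

for pos in range(9):
    info, freqs = column_information(sample, pos)
    # Height of each letter in the logo = its frequency times the column height.
    heights = {aa: round(p * info, 2) for aa, p in freqs.items()}
    print(pos + 1, round(info, 2), heights)
```

In this small sample every peptide has L at position 2, so that column reaches the maximum height of log2(20) bits, which is what "high information position" means.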
NetMHC-3.2
www.cbs.dtu.dk/services/NetMHC-3.2
79 ANNs
Covering human, primate and mouse MHC
Predictions can be made for 8-11 mer peptides
HLA polymorphism
The IMGT/HLA Sequence Database currently encompasses more
than 1500 HLA class I proteins
Source: http://www.anthonynolan.com/HIG/index.html
HLA polymorphism
• < 70 HLA alleles are characterized by
binding data
• Reliable MHC class I binding predictions
(NetMHC-3.2) for 57 HLA A and B
molecules
• No methods for HLA-C and HLA-E
• Long way to over 1500!
More MHC molecules: more diversity in
the presented peptides
• 1% probability that a given MHC molecule presents a given peptide
• Different hosts sample different peptides from same pathogen.
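A back-of-the-envelope Python sketch of why more MHC molecules mean more presented peptides. It assumes each MHC molecule presents a given peptide independently with the 1% probability quoted above, and that a person carries up to six distinct HLA class I molecules (two each of HLA-A, -B and -C). The independence assumption and the function name `prob_presented` are illustrative, not from the slides.

```python
def prob_presented(n_molecules, p_single=0.01):
    """Probability that at least one of n independent MHC molecules presents a peptide."""
    return 1.0 - (1.0 - p_single) ** n_molecules

for n in (1, 2, 6):
    print(n, round(prob_presented(n), 4))
```

Under these assumptions a host with six different molecules presents roughly six times as many peptides from the same pathogen as a host with one.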
HLA polymorphism
• Few human beings will share the same set
of HLA alleles
– Different persons will react to a pathogen
infection in different ways
• A CTL based vaccine must include
epitopes specific for each HLA allele in a
population
– A CTL based vaccine must consist of ~1000
HLA class I epitopes
HLA specificity clustering
A0201
A0101
A6802
B0702
Logos of
HLA-A
alleles
O Lund et al., Immunogenetics. 2004 55:797-810
Coverage of HLA alleles
Supertype     Selected allele
A1            A*0101
A2            A*0201
A3            A*1101
A24           A*2401
A26 (new*)    A*2601
B7            B*0702
B8 (new*)     B*0801
B27           B*2705
B39 (new*)    B*3901
B44           B*4001
B58           B*5801
B62           B*1501
Clustering in: O Lund et al., Immunogenetics. 2004 55:797-810
Data
[Bar chart: number of entries per locus for each data source]
Source       HLA-A   HLA-B   HLA-C
Proteins       681    1165     569
SYFPEITHI       27      59       4
IEDB            34      28       0
• Alleles characterized with 5 or more data points
• 3% covered
Supertypes. What are they good
for?
• Alleles within a supertype present the
same set of peptides!
• Is this really so?
– Fewer than 50% of A6802 binders will bind to
A0201!
– Fewer than 33% of A0201 binders will bind to
A6802!
The truth about supertypes!
A3
A26
A24
A2
A1
HLA polymorphism!
B0807
A6601
B4058
A3401
B5124
B2728
B4411
B0729
A0265
B3526
A3602
A0254
B4038
B1302
B0714
B3902
B0826
B7804
B3509
B4404
B4808
A2907
A1109
A2313
B4018
B4046
B0818
B5103
A2606
A0209
A2444
B5101
B1502
A6803
A2441
B4804
A0268
B1803
B5106
B4103
A3404
A0220
B3537
B5203
B4445
B0805
B2702
A0304
B4021
B1303
A2503
B3926
B0718
A3306
A3015
A7407
B4431
B3558
B0706
B4403
A0106
B5806
B5109
B1578
B0806
B4430
B1308
B3935
A0278
B5126
B0710
B0817
B1527
B3912
B0811
A6820
B1510
A2314
A3013
A0216
A6808
A6815
A7408
A2909
B1566
B1536
A2428
B4446
A6602
B5704
B1809
A0252
B5134
B1534
B1550
B9507
B0724
B5604
B1538
B4418
B0739
B4406
A2312
A3004
A2426
B1513
B5002
B3801
B1525
B3927
A3107
A2433
B0734
B3530
B1539
B4505
A3201
B7805
B3933
B2714
A0302
A1114
B4905
B1504
B4437
A0222
B4102
B5139
B5138
A0317
B3505
B7802
B1575
A2504
A2454
A3006
B4015
B4441
B4606
A1102
A6817
B5602
A6826
B5703
B4104
A2430
B5512
B3702
B4701
A3308
B1544
B1570
B3549
B4408
B3923
A3209
A2414
B9509
B5611
B4427
B4031
A2601
A0289
B0803
B4432
B4016
B3561
A3007
B1813
A2902
B2724
A2309
A3307
B1574
A2446
B5130
B3811
B5606
B4402
A1110
A0235
B5306
A0214
B4061
A2455
A0285
A0255
B1503
B4105
B5801
A0205
A3301
A0112
A2904
B8101
B1511
A6825
B5121
A2429
B4433
B3922
B0728
A2627
B4407
B8301
B1818
B8102
B1592
B1535
A0307
A0204
B4810
B0725
B0733
B1553
A2914
B1540
B4805
A0316
A0206
A3108
B5708
B4420
B0727
A2439
B2715
A0239
A0256
B3535
B4002
B4429
B5116
B4208
B5507
B3551
A7410
B1585
B3536
A0244
B4057
A2418
B0720
B0703
B1583
B1554
B3503
A0103
B5603
A2901
A2621
B1301
B5114
A0269
B4814
B4605
B5402
B4033
A1120
B5508
B2719
B5131
B4054
A6604
A2447
B3901
B1564
B5608
A0271
A6810
B9505
B1509
B2730
A2437
B1556
B5520
A3103
B4813
B4803
B1820
A0318
A2415
B1530
A0110
B0711
B5115
B4004
B3934
A3102
B2710
B2725
B6701
B4435
B1815
B4108
A0219
A0262
B0825
B4029
B6702
A1103
A2406
B4201
B2705
B1405
B8201
B0822
B4030
B3805
B5307
A2903
B5514
B3557
B0708
B3909
A3001
B0740
B4415
B1586
A6603
B1599
A2620
B5510
B5206
A7411
A0310
A6901
A2405
B5129
A3405
A2602
A6805
A0308
B1807
B1572
B3928
B1515
B5110
A2407
B2713
A3303
A3012
B4604
B4812
A0272
A6824
B0723
A6812
B5133
A2427
B1588
B3929
A3111
A3205
B3907
A0102
B1573
B1521
A6819
B3930
B4037
B0730
B4007
B0801
B1315
A2413
B5201
B3563
B5901
A2417
A2408
B5601
B4422
B4501
B3547
B5804
A0319
B3513
A1113
A2608
B1545
A2456
A2419
B1587
B5208
B3524
A0250
B7803
A0212
B4023
B5102
A0259
B0810
B3707
B0702
A1104
B4056
B4034
B0827
B3517
B1821
A1119
A0305
A2906
B1811
A6827
A2301
B2720
B3550
B4013
B4008
B4503
B3809
B5518
B2723
A0275
B4060
A0277
A0225
A0234
B3936
B5204
A6804
B3511
B2717
A0207
B0804
B5137
A3011
B5702
A2622
B5205
B4806
B5001
A1116
A0260
B1402
B4036
B1304
A2452
B1517
B4101
B2727
A2410
A3003
A0208
B5207
B5403
B3803
A2913
B4417
B5308
B4703
B5311
B0715
B3519
A2420
B3520
A2603
B4507
B4444
B1548
B3932
A1123
A1107
B5607
B1310
B5615
A3402
B0731
B4410
A0270
B1589
B3501
B3542
B0824
B3506
A3304
B2706
B5119
A0230
B1531
B3529
A0313
A2619
A0114
B3559
B5605
B0745
B0743
B4603
B1804
B3528
B5120
B4502
A3002
A2616
B4802
B1822
B7801
B4504
B5805
A0218
A0314
B4053
A6605
A2450
B1314
A2502
A2612
B1576
A0113
B1306
B1552
A3010
B1819
B3904
A2617
B3514
A0231
B3548
B1547
B9506
B5519
B0709
A2442
B3523
A2610
A0251
B4807
A6813
B5401
B4044
A6823
A0246
B4602
B1404
B3527
B4405
B1516
B1309
A1111
B1563
B5509
B1542
B4601
B5710
A2425
A1101
B0726
B2726
A2910
A3110
B9502
B2721
A0322
B5616
B3545
A0263
B5305
B1812
B3502
A6802
A3106
A2438
B5709
B0707
B3709
A4301
B3534
B1598
A2435
B3512
A2305
B4704
B8202
A3008
B4005
B4107
B1507
A2303
A7404
B5501
A0273
A3204
B3533
B5613
B5128
A6816
B4051
B0732
B4205
A0261
B1562
A0236
A0227
A3202
A2404
A6801
B1312
B5515
A2453
B3915
B3917
A0228
A3112
A2614
B0814
B4438
B1403
B4426
B3806
A3104
B2707
B5406
B4811
B3531
A0233
B1546
B3552
B4428
B0717
B3504
B3808
B1551
B4059
A7402
A2615
A2458
A0274
A2424
B0802
A7406
B5135
B1590
B4439
A2609
B2729
B4702
B1596
B0813
A7405
B5301
B4052
A6830
A2623
A6822
B4440
A0117
B3911
B4003
A0201
B0736
B3905
B3802
B5404
A2403
B3924
A2911
B5112
B3918
B4421
B5504
A2501
A2310
B0741
A3601
B0744
B1567
A0258
B1561
B3554
B3810
B5118
A3305
B5113
B1520
A6829
B0823
B5610
B4042
A0202
B5122
B4032
A2421
A2605
B4902
A2423
B4409
A3105
A0267
A2912
B3539
A0108
B4035
A0241
B4001
B4436
B4020
B4901
A1117
B4047
B3701
B4012
B5310
A2618
A0245
A0238
B3708
B2711
A0237
B3920
B4904
A8001
A3009
B1805
B5503
A3206
B3914
A2443
B1505
B1581
B1549
B5808
B4062
B1529
B3510
B5511
B1524
B2701
B5132
B1597
A7403
B4009
B5706
B3546
HLA polymorphism!
B1513
B3811
A3106
B3912
B5102
A3107
B3709
A2314
A7411
A0216
A3108
A2405
B4052
B4408
B4426
A0302
B4036
B5901
A2904
A3015
B1515
B4422
A0273
B4403
B5207
B3514
B1578
A6824
B2724
B5605
A2458
B0709
A2442
Predicting the specificity
Align A3001 (365) versus A3002 (365). Aln score 2445.000 Aln len 365 Id 0.9890
A3001   0 MAVMAPRTLLLLLSGALALTQTWAGSHSMRYFSTSVSRPGSGEPRFIAVGYVDDTQFVRFDSDAA
          :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
A3002   0 MAVMAPRTLLLLLSGALALTQTWAGSHSMRYFSTSVSRPGSGEPRFIAVGYVDDTQFVRFDSDAA
A3001  65 SQRMEPRAPWIEQERPEYWDQETRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQIMYGCDVGSD
          :::::::::::::::::::::::::::: :::::  :::::::::::::::::::::::::::::
A3002  65 SQRMEPRAPWIEQERPEYWDQETRNVKAHSQTDRENLGTLRGYYNQSEAGSHTIQIMYGCDVGSD
A3001 130 GRFLRGYEQHAYDGKDYIALNEDLRSWTAADMAAQITQRKWEAARWAEQLRAYLEGTCVEWLRRY
          ::::::::::::::::::::::::::::::::::::::::::::: :::::::::::::::::::
A3002 130 GRFLRGYEQHAYDGKDYIALNEDLRSWTAADMAAQITQRKWEAARRAEQLRAYLEGTCVEWLRRY
A3001 195 LENGKETLQRTDPPKTHMTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPA
          :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
A3002 195 LENGKETLQRTDPPKTHMTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPA
A3001 260 GDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPKPLTLRWELSSQPTIPIVGIIAGLVLLGAVITGA
          :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
A3002 260 GDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPKPLTLRWELSSQPTIPIVGIIAGLVLLGAVITGA
A3001 325 VVAAVMWRRKSSDRKGGSYTQAASSDSAQGSDVSLTACKV
          ::::::::::::::::::::::::::::::::::::::::
A3002 325 VVAAVMWRRKSSDRKGGSYTQAASSDSAQGSDVSLTACKV
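The identity in the alignment header can be checked by hand. This is a minimal Python sketch, assuming a gap-free alignment so that identity is simply matching positions divided by alignment length. Only the block starting at position 65 is used, since it holds three of the four differences; the function name `identity` is an illustrative choice.

```python
def identity(seq_a, seq_b):
    """Fraction of identical positions in two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)

# Alignment positions 65-129 of the two alleles, copied from the alignment above.
a3001 = "SQRMEPRAPWIEQERPEYWDQETRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQIMYGCDVGSD"
a3002 = "SQRMEPRAPWIEQERPEYWDQETRNVKAHSQTDRENLGTLRGYYNQSEAGSHTIQIMYGCDVGSD"

# List every differing position together with the two residues.
diffs = [(i + 65, x, y) for i, (x, y) in enumerate(zip(a3001, a3002)) if x != y]
print(diffs)
print(identity(a3001, a3002))

# Whole alignment: 365 positions, 4 mismatches in total.
print(round(361 / 365, 4))
```

Two alleles can be almost 99% identical overall and still differ at residues that matter for peptide binding, which is the motivation for looking only at residues in contact with the peptide.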
HLA-A*3001
HLA-A*3002
NetMHCpan - the method
NetMHC
NetMHCpan
Pan-specific method
• Include polymorphic residues in potential
contact with the bound peptide
• The contact residues are defined as being
within 4.0 Å of the peptide in any of a
representative set of HLA-A and -B
structures with nonamer peptides.
• Only polymorphic residues from A, B, and C
alleles are included
• Pseudo-sequence consisting of 34
amino acid residues.
Example
Peptide     Amino acids of HLA pockets   HLA     Aff
VVLQQHSIA   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.131751
SQVSFQQPL   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.487500
SQCQAIHNV   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.364186
LQQSTYQLV   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.582749
LQPFLQPQL   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.206700
VLAGLLGNV   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.727865
VLAGLLGNV   YFAVWTWYGEKVHTHVDTLLRYHY     A0202   0.706274
VLAGLLGNV   YFAEWTWYGEKVHTHVDTLVRYHY     A0203   1.000000
VLAGLLGNV   YYAVLTWYGEKVHTHVDTLVRYHY     A0206   0.682619
VLAGLLGNV   YYAVWTWYRNNVQTDVDTLIRYHY     A6802   0.407855
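Each example row pairs a peptide with the pocket residues of its HLA molecule, which is what lets a single method cover many alleles. The Python sketch below uses only the pocket strings from the example. Counting differing pocket residues as a rough indicator of how far two alleles are from each other, and concatenating peptide and pocket string as the input, are illustrative assumptions and not the actual NetMHCpan encoding; the names `pocket_distance` and `network_input` are hypothetical.

```python
# Pocket residues copied from the example above.
pockets = {
    "A0201": "YFAVLTWYGEKVHTHVDTLVRYHY",
    "A0202": "YFAVWTWYGEKVHTHVDTLLRYHY",
    "A0203": "YFAEWTWYGEKVHTHVDTLVRYHY",
    "A0206": "YYAVLTWYGEKVHTHVDTLVRYHY",
    "A6802": "YYAVWTWYRNNVQTDVDTLIRYHY",
}

def pocket_distance(a, b):
    """Number of pocket positions at which two alleles differ."""
    return sum(x != y for x, y in zip(pockets[a], pockets[b]))

def network_input(peptide, allele):
    """Peptide followed by the pocket residues of the allele (one training input)."""
    return peptide + pockets[allele]

for allele in pockets:
    print(allele, pocket_distance("A0201", allele))

print(network_input("VLAGLLGNV", "A0206"))
```

Because the allele enters only through its pocket residues, a novel allele needs no binding data of its own: its pocket string is enough to form an input.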
NetMHCpan-2.2
www.cbs.dtu.dk/services/NetMHCpan
NetMHCpan-2.2
More than 1500 MHC
Predictions for 8-11 mer peptides
Predictions for novel HLA alleles
Specificity clustering
HLA-C
HLA-C
HLA-C*0102
Evaluation. MHC ligands from SYFPEITHI
Sort on
binding
Top Rank: F-rank=0.0
Random Rank: F-rank=0.5
SYFPEITHI benchmark
(1400 ligands restricted to 46 HLA molecules)
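A minimal Python sketch of the F-rank measure, assuming it is the fraction of competing peptides (for example, all other peptides from the ligand's source protein) that receive a higher predicted binding score than the known ligand. That definition, the function name `f_rank` and the scores are assumptions for illustration; they are consistent with the two reference values given above (0.0 for a top-ranked ligand, 0.5 for a random ranking).

```python
def f_rank(ligand_score, other_scores):
    """Fraction of competing peptides that score higher than the known ligand."""
    if not other_scores:
        raise ValueError("need at least one competing peptide")
    higher = sum(score > ligand_score for score in other_scores)
    return higher / len(other_scores)

# Hypothetical prediction scores for the other peptides of a source protein.
others = [0.10, 0.50, 0.40, 0.20]

print(f_rank(0.90, others))  # ligand sorted to the top
print(f_rank(0.30, others))  # ligand in the middle of the list
```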
Predicting primate MHCs
• Can we predict binding specificities for
non-human primates using the NetMHCpan
method trained on human specificity data
only?
Yes. Monkeys are just like humans
Patr B*0101
Patr A*0101
Sidney et al. (2006)
Sidney et al. (2006)
And even Pigs and Cows are (somewhat)
like humans
Horses, pigs, cows, and sheep
HLA class II polymorphism
• More than 2000 HLA class II allele
combinations
– HLA-DR
– HLA-DQ
– HLA-DP
• Only data for 14 of the more than 500
known HLA-DR alleles (< 3%)
• No data for HLA-DQ and HLA-DP
Class II MHC binding
• MHC class II binds
peptides in the class II
antigen presentation
pathway
• Binds peptides of
length 9-18 (even whole
proteins can bind!)
• Binding cleft is open
• Binding core is 9 aa
NN-align
Training cycle: predict binding affinity and core, calculate prediction
error, update method to minimize prediction error
PEPTIDE           Pred   Meas
VPLTDLRIPS        0.00   0.03
GWPYIGSRSQIIGRS   0.19   0.08
ILVQAGEAETMTPSG   0.07   0.24
HNWVNHAVPLAMKLI   0.77   0.59
SSTVKLRQNEFGPAR   0.15   0.19
NMLTHSINSLISDNL   0.17   0.02
LSSKFNKFVSPKSVS   0.81   0.97
GRWDEDGAKRIPVDV   0.39   0.45
ACVKDLVSKYLADNE   0.58   0.57
NLYIKSIQSLISDTQ   0.84   0.66
IYGLPWMTTQTSALS   1.00   0.93
QYDVIIQHPADMSWC   0.12   0.11
Candidate alignments for GRWDEDGAKRIPVDV (Meas 0.45):
GRWDEDGAKRIP        0.15
G RWDEDGAKRIPV      0.03
GR WDEDGAKRIPVD     0.39
GRW DEDGAKRIP VDV   0.05
Nielsen et al. BMC Bioinformatics 2009, 10:296
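The "predict binding affinity and core" step can be sketched in Python: slide a 9-residue window along the peptide, score every candidate core, and report the best one as the prediction. The scorer `toy_score` below is a hypothetical stand-in (it counts hydrophobic residues at core positions 1, 4, 6 and 9) and is not the NN-align network; the function names are illustrative as well.

```python
HYDROPHOBIC = set("AVILMFWY")

def candidate_cores(peptide, core_len=9):
    """All (offset, core) windows of length core_len in the peptide."""
    return [(i, peptide[i:i + core_len]) for i in range(len(peptide) - core_len + 1)]

def toy_score(core):
    """Hypothetical scorer: fraction of anchor positions (1, 4, 6, 9) that are hydrophobic."""
    anchors = (0, 3, 5, 8)
    return sum(core[i] in HYDROPHOBIC for i in anchors) / len(anchors)

def predict_with_core(peptide, score_core=toy_score, core_len=9):
    """Return (score, offset, core) of the highest-scoring core."""
    cores = candidate_cores(peptide, core_len)
    if not cores:
        raise ValueError("peptide is shorter than the binding core")
    offset, core = max(cores, key=lambda oc: score_core(oc[1]))
    return score_core(core), offset, core

for offset, core in candidate_cores("GRWDEDGAKRIPVDV"):
    print(offset, core, toy_score(core))

print(predict_with_core("GRWDEDGAKRIPVDV"))
```

During training, the prediction error would be computed between the best core's score and the measured value, and the scorer updated to reduce it; the toy scorer here is fixed and only illustrates the core search.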
NetMHCII (NN-align)
[Figure: performance comparison, with significance levels P<0.001 and P<0.05 marked]
Nielsen et al. BMC Bioinformatics 2009, 10:296
NetMHC-IIpan
• Train on peptide:MHC sequences
– As for the NetMHCpan for class I
• Need to identify peptide core!
– Use NetMHCII
Data
MHC pseudo sequence
• Include polymorphic residues in potential
contact with the bound peptide
• The contact residues are defined as being
within 4.0 Å of the peptide in any of a
representative set of HLA-DR, -DQ, and -DP
structures with peptides.
• Only polymorphic residues are included
• Pseudo-sequence consisting of 25
amino acid residues.
NetMHCIIpan server
MHC Class II pathway
Figure by Eric A.J. Reits
Evaluation. MHC ligands from SYFPEITHI
Performance details
Epitope based vaccines and diagnostics
• Challenges
• Identify epitopes in pathogen genome
• A small viral genome contains >> 1000 potential CTL
epitopes
• HLA diversity
• No two humans will mount the same response to a
pathogen infection
• Viral escape
• No two viral strains will “host” the same set of T
cell epitopes
Viral escape
The virus of today is different from the virus of
tomorrow (viral escape)
Figure courtesy Mette Voldby Larsen
Immune dominance
Dominance
• Highly immunogenic
peptides
• High variability = easily
escaped
• Immune response useless
Subdominance
• Weakly immunogenic
peptides
• Low variability = not
escapable
• Immune response highly
effective = good vaccine
candidates
Pathogen variability
Rational epitope selection
• We have more than 2000 MHC molecules
• We have more than 500 different
pathogenic strains
• How to design a method to select a small
pool of peptides that will cover both the
MHC polymorphism and the pathogen
diversity?
– No peptide will bind to all MHC molecules and
few (maybe even no) peptides will be present
in all pathogenic strains
Polyvalent vaccines
• The equivalent of this in epitope based
vaccines is to select epitopes in a way so
that they together cover all strains.
[Figure: epitopes mapped onto Strain 1 and Strain 2 in two panels. Uneven coverage, average coverage = 2. Even coverage, average coverage = 2.]
EpiSelect. Pathogen diversity
$$S_j^{G} = \sum_i \frac{P_i^{j}}{\beta + C_i}$$
(β: constant offset in the denominator)
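A minimal Python sketch of greedy selection with an EpiSelect-style score. It assumes the score of epitope j is the sum, over the genomes that contain it, of 1 / (β + C_i), where C_i counts how often genome i is already targeted by the selected epitopes. The constant `beta` and its value 0.1, the greedy loop, the function name `episelect` and the toy strain data are all assumptions for illustration.

```python
def episelect(presence, n_select, beta=0.1):
    """Greedily pick epitopes; presence maps epitope -> set of genomes containing it."""
    covered = {g: 0 for genomes in presence.values() for g in genomes}
    remaining = dict(presence)
    selected = []
    while remaining and len(selected) < n_select:
        def score(epitope):
            # Genomes that are already well covered contribute less to the score.
            return sum(1.0 / (beta + covered[g]) for g in remaining[epitope])
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        for g in remaining.pop(best):
            covered[g] += 1
    return selected, covered

# Toy data (hypothetical): which strains contain which epitope.
presence = {
    "E1": {"strain1", "strain2"},
    "E2": {"strain1", "strain2"},
    "E3": {"strain3"},
}

print(episelect(presence, n_select=2))
```

After E1 is chosen, E2 only adds coverage to strains that are already targeted, so the second pick goes to E3 and all three strains end up covered once. This is the "even coverage" behaviour that the score is designed to favour.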
Cross-clade immunogens
Table 3. Highly immunogenic epitopes and their cross-clade recognition. 21 HLA-supertype
restricted epitopes were highly immunogenic and induced a CTL-response in at least four subjects.
The table shows the subtype the responding subjects were infected with and at which frequency the
epitope sequence is found among the HIV-1 subtype reference strains.
Epitope sequence   HLA-supertype & protein region   Subtypes of the responders
QVPLRPMTY          A1-nef                           B, B, C, D, AE, nd
LTDTTNQKT          A1-pol                           B, B, B, C, C, AE
KIQNFRVYY          A1-pol                           B, D, AE, nd
FLGKIWPSHK         A2-gag                           A1, A1, A1, B, B, B, B, C, AE, nd
SLYNTVATL          A2-gag                           A1, B, B, B, C, C, C
GALDLSHFL          A2-nef, var. 1²                  A1, B, B, B, C, AG
AAVDLSHFL          A2-nef, var. 2                   A1, B, B, B, AG
ILKEPVHGV          A2-pol                           B, B, B, B, C, C, nd
QLTEAVQKI          A2-pol                           B, C
AVDLSHFLK          A3-nef, var. 1                   A1, B, D, nd
ALDLSHFLK          A3-nef, var. 2                   A1, B, D, nd
AFDLSFFLK          A3-nef, var. 3                   B, C, C, C, C, AE, AE
WYIKIFIII          A24-env                          B, B, B, C, C
HYMLKHLVW          A24-gag                          A1, B, B, C
IPRRIRQGL          B7-env, var. 1                   A1, B, C, AE
IPRRIRQGF          B7-env, var. 2                   A1, B, AE, CPX06
HPVHAGPVA          B7-gag                           A1, B, C, D
RALGPGATL          B7-gag                           A1, B, C, D
TPQDLNTML          B7-pol                           A1, B, C, C
SPAIFQSSM          B7-pol                           A1, A1, B, C, C, D, AE
QEILDLWVY          B44-nef                          A1, A1, B, B, B, C
Frequency of the epitope sequence in subtype¹: A, B, C, D, AE (color-coded columns, not reproduced)
¹ The color represents the frequencies of the exact epitope sequence in the different subtypes; blue:
0%, light blue: 1-24%, orange: 25-49% and red: >50%. ² Subtype variants of the same epitope. nd:
not determined
Perez et al., JI, 2008
All HIV responsive patients respond to at
least one of nine peptides
Perez et al., JI, 2008
PopCover - Searching in two dimensions.
HIV class II case story
• Data
– 396 full-length genomes with annotated tat, nef, gag and
pol proteins covering A(50), B(104), C(156), D(40) and
AE(46) strains
• HLA-DR frequencies taken from
– 43 (allele frequency in at least one population > 2.5%) HLA
class II alleles
• 36 HLA-DRB1, HLA-DR3,4,5, and 4 HLA-DQ alleles
• Select predicted peptide binders
– 5608(tat), 20961(nef), 31848(gag), 42748(pol)
• Select peptides from each protein with optimal
genomic and HLA coverage
– tat(4), nef(15), gag(15) and pol(15)
EpiSelect and PopCover
• EpiSelect
$$S_j^{G} = \sum_i \frac{P_i^{j}}{\beta + C_i}$$
The sum is over all genomes i. P_i^j is 1 if epitope j is present in genome i. C_i is
the number of times genome i has been targeted by the already selected
set of epitopes.
• PopCover
$$S_j^{AG} = \sum_i \sum_k \frac{R_{ki}^{j} \cdot f_k \cdot g_i}{\beta + E_{ki}}$$
The sum is over all genomes i and HLA alleles k. R_{ki}^j is 1 if epitope j is present
in genome i and is presented by allele k, E_{ki} is the number of times
allele k has been targeted by epitopes in genome i in the already selected
set of epitopes, f_k is the frequency of allele k, and g_i is the frequency of genome i.
β is a constant offset in the denominator.
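A minimal Python sketch of greedy selection with a PopCover-style score, extending the genome-only score to (genome, allele) pairs weighted by allele and genome frequency. It assumes each term has the form f_k · g_i / (β + E_ki). The constant `beta` and its value 0.1, the greedy loop, the function names and the toy frequencies are assumptions for illustration.

```python
def popcover_score(hits, allele_freq, genome_freq, covered, beta=0.1):
    """Score one epitope; hits is the set of (genome, allele) pairs it is present in and presented by."""
    return sum(
        allele_freq[k] * genome_freq[i] / (beta + covered.get((i, k), 0))
        for i, k in hits
    )

def popcover(epitope_hits, allele_freq, genome_freq, n_select, beta=0.1):
    """Greedily select epitopes that add the most frequency-weighted new coverage."""
    covered = {}  # (genome, allele) -> number of selected epitopes targeting it
    remaining = dict(epitope_hits)
    selected = []
    while remaining and len(selected) < n_select:
        best = max(
            sorted(remaining),  # sorted() makes tie-breaking deterministic
            key=lambda e: popcover_score(remaining[e], allele_freq, genome_freq, covered, beta),
        )
        selected.append(best)
        for pair in remaining.pop(best):
            covered[pair] = covered.get(pair, 0) + 1
    return selected

# Toy data (hypothetical genomes, alleles and frequencies).
allele_freq = {"DR1": 0.6, "DR4": 0.4}
genome_freq = {"g1": 0.5, "g2": 0.5}
epitope_hits = {
    "P1": {("g1", "DR1"), ("g2", "DR1")},
    "P2": {("g1", "DR1"), ("g2", "DR1")},
    "P3": {("g1", "DR4"), ("g2", "DR4")},
}

print(popcover(epitope_hits, allele_freq, genome_freq, n_select=2))
```

P2 is restricted to the more frequent allele, but once P1 is selected it adds nothing new, so the second pick is P3, which reaches carriers of the other allele.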
Benchmark
• Create 10,000 virtual patients with a given
HIV genomic sequence and HLA alleles as
defined by the HLA allele frequencies and
HIV genomic data
• Test how many of these patients are
targeted by at least one of the selected
peptides
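The benchmark can be sketched in Python as a small simulation. It assumes each virtual patient gets one viral genome and two alleles, all drawn independently according to their frequencies, and counts as targeted if any selected peptide is present in the genome and presented by one of the patient's alleles. The function name `simulate_coverage`, the two-allele simplification and the toy data are assumptions for illustration.

```python
import random

def simulate_coverage(selected, epitope_hits, allele_freq, genome_freq,
                      n_patients=10_000, alleles_per_patient=2, seed=0):
    """Fraction of virtual patients targeted by at least one selected peptide."""
    rng = random.Random(seed)
    genomes, g_weights = zip(*genome_freq.items())
    alleles, a_weights = zip(*allele_freq.items())
    hit = 0
    for _ in range(n_patients):
        # Draw the infecting genome and the patient's alleles by frequency.
        genome = rng.choices(genomes, weights=g_weights)[0]
        patient_alleles = set(rng.choices(alleles, weights=a_weights, k=alleles_per_patient))
        if any((genome, a) in epitope_hits[e] for e in selected for a in patient_alleles):
            hit += 1
    return hit / n_patients

# Toy data (hypothetical): (genome, allele) pairs each peptide is present in and presented by.
epitope_hits = {
    "P1": {("g1", "DR1"), ("g2", "DR1")},
    "P3": {("g1", "DR4"), ("g2", "DR4")},
}
allele_freq = {"DR1": 0.6, "DR4": 0.4}
genome_freq = {"g1": 0.5, "g2": 0.5}

print(simulate_coverage(["P1", "P3"], epitope_hits, allele_freq, genome_freq))
print(simulate_coverage(["P3"], epitope_hits, allele_freq, genome_freq))
```

With both peptides every (genome, allele) combination is covered, so all patients are targeted; with P3 alone only patients carrying DR4 are.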
HIV patient coverage
[Figure: bar charts of %Hit for tat, nef, gag and pol, comparing Random, Highest and PopCover selection]
• Selected peptide pools
– tat(4), nef(15), gag(15) and pol(15)
So, we can find the needle in the haystack
• Given a protein sequence and an HLA molecule, we can
accurately predict which peptides will bind (70-95%)
• 15-80% of these will in turn be epitopes
But, can we find the haystack?
Conclusions
• Rational epitope discovery is feasible
– Prediction methods are an important guide for epitope
identification
– Given a protein sequence and an HLA molecule, we can predict
the peptide binders (find the needle in the haystack)
• Pan-specific MHC prediction methods can deal with the
immense MHC polymorphism
• Epitope selection strategies can deal with pathogen
diversity
• For large pathogens, we still have no handle on how to
select immunogenic proteins (we cannot find the
haystack)
Acknowledgements
• Immunological Bioinformatics group,
CBS, DTU
– Ole Lund - Group leader
– Claus Lundegaard - Databases, HLA
binding predictions
• Collaborators
– IMMI, University of Copenhagen
• Søren Buus: MHC binding
– La Jolla Institute for Allergy and
Immunology
• A. Sette, B. Peters: Epitope
database
• and many, many more
www.cbs.dtu.dk/services