Immunological Bioinformatics

The Immunological Bioinformatics group
• Immunological Bioinformatics group, CBS, Technical University of Denmark (www.cbs.dtu.dk)
  – Ole Lund, Group Leader
  – Morten Nielsen, Associate Professor
  – Claus Lundegaard, Associate Professor
  – Jean Vennestrøm, post doc.
  – Thomas Blicher (50%), post doc.
  – Mette Voldby Larsen, PhD student
  – Pernille Haste Andersen, PhD student
  – Sune Frankild, PhD student
  – Sheila Tang, PhD student
  – Thomas Rask (50%), PhD student
  – Nicolas Rapin, PhD student
  – Ilka Hoff, PhD student
  – Jorid Sørli, PhD student
  – Hao Zhang, PhD student
  – MSc students

Collaborators
• IMMI, University of Copenhagen
  – Søren Buus: MHC binding
  – Mogens H Claesson: Elispot Assay
• La Jolla Institute of Allergy and Infectious Diseases
  – Alessandro Sette: Epitope database
  – Bjoern Peters
• Leiden University Medical Center
  – Tom Ottenhoff: Tuberculosis
  – Michel Klein
• Ganymed
  – Ugur Sahin: Genetic library
• University of Tübingen
  – Stefan Stevanovic: MHC ligands
• INSERM
  – Peter van Endert: TAP binding
• University of Mainz
  – Hansjörg Schild: Proteasome
• Schafer-Nielsen
  – Claus Schafer-Nielsen: Peptide synthesis
• ImmunoGrid: Simulation of the immune system
  – Elda Rossi
  – Vladimir Brusic
• University of Utrecht
  – Can Kesmir: Ideas

[Figure 1-20: Effectiveness of vaccines. 1958: start of smallpox eradication program]

The Immune System
• The innate immune system
• The adaptive immune system

The innate immune system
• Unspecific
• Antigen independent
• Immediate response
• No training/selection, hence no memory
• Pathogen independent (but response might be pathogen type dependent)

The adaptive immune system
• Pathogen specific
  – Humoral
  – Cellular
[Images: Parasite, http://tpeeaupotable.ifrance.com/ma%20photo/bilharzoze.jpg; Virus, http://en.wikipedia.org/wiki/Image:Aids_virus.jpg; Bacteria, http://www.uni-heidelberg.de/zentral/ztl/grafiken_bilder/bilder/e-coli.jpg]

Adaptive immune response
• Signal induced
  – Pathogens
• Antigens
  – Epitopes
[Figure labels: B Cell, T Cell]

Humoral immunity
[Figure: cartoon by Eric Reits]

Antibody - Antigen interaction
The antibody recognizes structural properties of the surface of the antigen
[Figure labels: Antigen, Fab, Epitope, Paratope, Antibody]

Cellular Immunity

MHC class I with peptide
[Figure label: Anchor positions]

HLA specificity clustering
[Figure labels: A0201, A0101, A6802, B0702]

Prediction of HLA binding specificity
Historical overview
• Simple Motifs
  – Allowed/non allowed amino acids
• Extended motifs
  – Amino acid preferences (SYFPEITHI)
  – Anchor/Preferred/other amino acids
• Hidden Markov models
  – Peptide statistics from sequence alignment
• SVMs and neural networks
  – Can take sequence correlations into account

Sequence information
SLLPAIVEL LLDVPTAAV HLIDYLVTS ILFGHENRV LERPGGNEI PLDGEYFTL ILGFVFTLT KLVALGINA KTWGQYWQV SLLAPGAKQ
ILTVILGVL TGAPVTYST GAGIGVAVL KARDPHSGH AVFDRKSDA GLCTLVAML VLHDDLLEA ISNDVCAQV YTAFTIPSI NMFTPYIGV
VVLGVVFGI GLYDGMEHL EAAGIGILT YLSTAFARV FLDEFMEGV AAGIGILTV AAGIGILTV YLLPAIVHI VLFRGGPRG ILAPPVVKL
ILMEHIHKL ALSNLEVKL GVLVGVALI LLFGYPVYV DLMGYIPLV TITDQVPFS KIFGSLAFL KVLEYVIKV VIYQYMDDL IAGIGILAI
KACDPHSGH LLDFVRFMG FIDSYICQV LMWITQCFL VKTDGNPPE RLMKQDFSV LMIIPLINV ILHNGAYSL KMVELVHFL TLDSQVMSL
YLLEMLWRL ALQPGTALL FLPSDFFPS FLPSDFFPS TLWVDPYEV MVDGTLLLL ALFPQLVIL ILDQKINEV ALNELLQHV RTLDKVLEV
GLSPTVWLS RLVTLKDIV AFHHVAREL ELVSEFSRM FLWGPRALV VLPDVFIRC LIVIGILIL ACDPHSGHF VLVKSPNHV IISAVVGIL
SLLMWITQC SVYDFFVWL RLPRIFCSC TLFIGSHVV MIMVKCWMI YLQLVFGIE STPPPGTRV SLDDYNHLV VLDGLDVLL SVRDRLARL
AAGIGILTV GLVPFLVSV YMNGTMSQV GILGFVFTL SLAGGIIGV DLERKVESL HLSTAFARV WLSLLVPFV MLLAVLYCL YLNKIQNSL
KLTPLCVTL GLSRYVARL VLPDVFIRC LAGIGLIAA SLYNTVATL GLAPPQHLI VMAGVGSPY QLSLLMWIT FLYGALLLA FLWGPRAYA
SLVIVTTFV MLGTHTMEV MLMAQEALA KVAELVHFL RTLDKVLEV SLYSFPEPE SLREWLLRI FLPSDFFPS KLLEPVLLL MLLSVPLLL
STNRQSGRQ LLIENVASL FLGENISNF RLDSYVRSL FLPSDFFPS AAGIGILTV MMRKLAILS VLYRYGSFS FLLTRILTI AVGIGIAVV
VDGIGILTI RGPGRAFVT LLGRNSFEV LLWTLVVLL LLGATCMFV VLFSSDFRI RLLQETELV VLQWASLAV MLGTHTMEV LMAQEALAF
IMIGVLVGV GLPVEYLQV ALYVDSLFF LLSAWILTA AAGIGILTV LLDVPTAAV
SLLGLLVEV GLDVLTAKV FLLWATAEA ALSDHHIYL YMNGTMSQV CLGGLLTMV YLEPGPVTA AIMDKNIIL YIGEVLVSV HLGNVKYLV
LVVLGLLAV GAGIGVLTA NLVPMVATV PLTFGWCYK SVRDRLARL RLTRFLSRV LMWAKIGPV SLFEGIDFY ILAKFLHWL SLADTNSLA
VYDGREHTV ALCRWGLLL KLIANNTRV SLLQHLIGL AAGIGILTV FLWGPRALV LLDVPTAAV ALLPPINIL RILGAVAKV SLPDFGISY
GLSEFTEYL GILGFVFTL FIAGNSAYE LLDGTATLR IMDKNIILK CINGVCWTV GIAGGLALL ALGLGLLPV AAGIGIIQI GLHCYEQLV
VLEWRFDSR LLMDCSGSI YMDGTMSQV SLLLELEEV SLDQSVVEL STAPPHVNV LLWAARPRL YLSGANLNL LLFAGVQCQ FIYAGSLSA
ELTLGEFLK AVPDEIPPL ETVSEQSNV LLDVPTAAV TLIKIQHTL QVCERIPTI KKREEAPSL STAPPAHGV ILKEPVHGV KLGEFYNQM
ITDQVPFSV SMVGNWAKV VMNILLQYV GLQDCTMLV GIGIGVLAA QAGIGILLA PLKQHFQIV TLNAWVKVV CLTSTVQLV FLTPKKLQC
SLSRFSWGA RLNMFTPYI LLLLTVLTV GVALQTMKQ RMFPNAPYL VLLCESTAV KLVANNTRL MINAYLDKL FAYDGKDYI ITLWQRPLV

Scoring a sequence to a weight matrix
• Score sequences to weight matrix by looking up and adding L values from the matrix

      1     2     3     4     5     6     7     8     9
A    0.6  -1.6   0.2  -0.1  -1.6  -0.7   1.1  -2.2  -0.2
R    0.4  -6.6  -1.3  -0.1  -0.1  -1.4  -3.8   1.0  -3.5
N   -3.5  -6.5   0.1  -2.0   0.1  -1.0  -0.2  -0.8  -6.1
D   -2.4  -5.4   1.5   2.0  -2.2  -2.3  -1.3  -2.9  -4.5
C   -0.4  -2.5   0.0  -1.6  -1.2   1.1   1.3  -1.4   0.7
Q   -1.9  -4.0  -1.8   0.5   0.4  -1.3  -0.3   0.4  -0.8
E   -2.7  -4.7  -3.3   0.8  -0.5  -1.4  -1.3   0.1  -2.5
G    0.3  -3.7   0.4   2.0   1.9  -0.2  -1.4  -0.4  -4.0
H   -1.1  -6.3   0.5  -3.3   1.2  -1.0   2.1   0.2  -2.6
I    1.0   1.0  -1.0   0.1  -2.2   1.8   0.6  -0.0   0.9
L    0.3   5.1   0.3  -1.7  -0.5   0.8   0.7   1.1   2.8
K    0.0  -3.7  -2.5  -1.0  -1.3  -1.9  -5.0  -0.5  -3.0
M    1.4   3.1   1.2  -2.2  -2.2   0.2   1.1  -0.5  -1.8
F    1.2  -4.2   1.0  -1.6   1.7   1.0   0.9   0.7  -1.4
P   -2.7  -4.3  -0.1   1.7   1.2  -0.4   1.3  -0.3  -6.2
S    1.4  -4.2  -0.3  -0.6  -2.5  -0.6  -0.5   0.8  -1.9
T   -1.2  -0.2  -0.5  -0.2  -0.1   0.4  -0.9   0.8  -1.6
W   -2.0  -5.9   3.4   1.3   1.7  -0.5   2.9  -0.7  -4.9
Y    1.1  -3.8   1.6  -6.8   1.5  -0.0  -0.4   1.3  -1.6
V    0.7   0.4   0.0  -0.7   1.0   2.1   0.5  -1.1   4.5

Peptide     Score   Measured affinity
RLLDDTPEV   11.9    84nM
GLLGNVSTV   14.7    23nM
ALAKAAAAL    4.3    309nM

Which peptide is most likely to bind? Which peptide second?
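The lookup-and-add scoring on this slide can be sketched in a few lines of Python. This is only an illustration, not the group's implementation: it assumes the matrix is read with one row per amino acid and one column per peptide position 1-9, and the names WEIGHTS and score_peptide are made up for the example. Under that reading the three example peptides reproduce the scores printed on the slide (11.9, 14.7 and 4.3), which supports the interpretation.

```python
# Position-specific weight matrix from the slide (assumed layout:
# key = amino acid, list index = peptide position 1..9).
WEIGHTS = {
    "A": [0.6, -1.6, 0.2, -0.1, -1.6, -0.7, 1.1, -2.2, -0.2],
    "R": [0.4, -6.6, -1.3, -0.1, -0.1, -1.4, -3.8, 1.0, -3.5],
    "N": [-3.5, -6.5, 0.1, -2.0, 0.1, -1.0, -0.2, -0.8, -6.1],
    "D": [-2.4, -5.4, 1.5, 2.0, -2.2, -2.3, -1.3, -2.9, -4.5],
    "C": [-0.4, -2.5, 0.0, -1.6, -1.2, 1.1, 1.3, -1.4, 0.7],
    "Q": [-1.9, -4.0, -1.8, 0.5, 0.4, -1.3, -0.3, 0.4, -0.8],
    "E": [-2.7, -4.7, -3.3, 0.8, -0.5, -1.4, -1.3, 0.1, -2.5],
    "G": [0.3, -3.7, 0.4, 2.0, 1.9, -0.2, -1.4, -0.4, -4.0],
    "H": [-1.1, -6.3, 0.5, -3.3, 1.2, -1.0, 2.1, 0.2, -2.6],
    "I": [1.0, 1.0, -1.0, 0.1, -2.2, 1.8, 0.6, -0.0, 0.9],
    "L": [0.3, 5.1, 0.3, -1.7, -0.5, 0.8, 0.7, 1.1, 2.8],
    "K": [0.0, -3.7, -2.5, -1.0, -1.3, -1.9, -5.0, -0.5, -3.0],
    "M": [1.4, 3.1, 1.2, -2.2, -2.2, 0.2, 1.1, -0.5, -1.8],
    "F": [1.2, -4.2, 1.0, -1.6, 1.7, 1.0, 0.9, 0.7, -1.4],
    "P": [-2.7, -4.3, -0.1, 1.7, 1.2, -0.4, 1.3, -0.3, -6.2],
    "S": [1.4, -4.2, -0.3, -0.6, -2.5, -0.6, -0.5, 0.8, -1.9],
    "T": [-1.2, -0.2, -0.5, -0.2, -0.1, 0.4, -0.9, 0.8, -1.6],
    "W": [-2.0, -5.9, 3.4, 1.3, 1.7, -0.5, 2.9, -0.7, -4.9],
    "Y": [1.1, -3.8, 1.6, -6.8, 1.5, -0.0, -0.4, 1.3, -1.6],
    "V": [0.7, 0.4, 0.0, -0.7, 1.0, 2.1, 0.5, -1.1, 4.5],
}


def score_peptide(peptide):
    """Sum the matrix entry for each residue at its position (9mers only)."""
    if len(peptide) != 9:
        raise ValueError("this matrix only scores 9mer peptides")
    return sum(WEIGHTS[aa][pos] for pos, aa in enumerate(peptide))


if __name__ == "__main__":
    peptides = ["RLLDDTPEV", "GLLGNVSTV", "ALAKAAAAL"]
    # Rank from highest to lowest score (higher score = more likely binder).
    for pep in sorted(peptides, key=score_peptide, reverse=True):
        print(pep, round(score_peptide(pep), 1))
```

Sorting by score ranks GLLGNVSTV first and RLLDDTPEV second, the same order as the measured affinities on the slide (23nM, then 84nM, then 309nM).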
Example from real life
• 10 peptides from MHCpep database
• Bind to the MHC complex
• Relevant for immune system recognition
• Estimate sequence motif and weight matrix
• Evaluate motif “correctness” on 528 peptides

ALAKAAAAM
ALAKAAAAN
ALAKAAAAR
ALAKAAAAT
ALAKAAAAV
GMNERPILT
GILGFVFTM
TLNAWVKVV
KLNEPVLLL
AVVPFIVSV

Prediction accuracy
[Figure: measured affinity vs. prediction score; Pearson correlation 0.45]

Predictive performance

Higher order sequence correlations
• Neural networks can learn higher order correlations!
  – What does this mean?
Say that the peptide needs one and only one large amino acid in positions P3 and P4 to fill the binding cleft. How would you formulate this to test if a peptide can bind? (S = small, L = large)
S S => 0
L S => 1
S L => 1
L L => 0
No linear function can learn this (XOR) pattern

Mutual information
[Figure panels: 313 binding peptides; 313 random peptides]

Sequence encoding (continued)
• Sparse encoding
V: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
L: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
V.L = 0 (unrelated)
• Blosum encoding
V:  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
L: -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
R: -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
V.L = 0.88 (highly related)
V.R = -0.08 (close to unrelated)

Evaluation of prediction accuracy

Network ensembles
• No single network with a particular architecture and sequence encoding scheme will consistently perform best
• Enlightened despotism fails for neural network predictions too
  – For some peptides, BLOSUM encoding with a four neuron hidden layer best predicts the peptide/MHC binding; for other peptides a sparse encoded network with zero hidden neurons performs best
  – Wisdom of the Crowd
• Never use just one neural network
• Use network ensembles

Evaluation of prediction accuracy
ENS: Ensemble of neural networks trained using sparse, Blosum, and hidden Markov model sequence encoding

NetMHC-3.0 update
• IEDB + more proprietary data
• Higher accuracy for
existing ANNs
• More Human alleles
• Non human alleles (Mice + Primates)
• Prediction of 8mer binding peptides for some alleles
• Prediction of 10- and 11mer peptides for all alleles
• Outputs to spread sheet

Prediction of 10- and 11mers using 9mer prediction tools
• Approach:
  – For each peptide of length L create 6 pseudo peptides deleting a sliding window of L-9, always keeping pos. 1, 2, 3, and 9
• Example: MLPQWESNTL =
  MLPWESNTL
  MLPQESNTL
  MLPQWSNTL
  MLPQWENTL
  MLPQWESTL
  MLPQWESNL
• Final prediction = average of the 6 log scores:
  (0.477+0.405+0.564+0.505+0.559+0.521)/6 = 0.505
• Affinity:
  Exp(log(50000)*(1 - 0.505)) = 211.5 nM
[Figure panels: Prediction using ANN trained on 10mer peptides; Prediction of 10- and 11mers using 9mer prediction tools]

Cellular Immunity

Proteasome specificity
• Low polymorphism
  – Constitutive & Immunoproteasome
• Evolutionary conserved
• Stochastic and low specificity
  – Only 70-80% of the cleavage sites are reproduced in repeated experiments
• NetChop is one of the best available cleavage prediction methods
  – www.cbs.dtu.dk/services/NetChop-3.0

Predicting TAP affinity
[Figure panels: 9 meric peptides; >9 meric]
ILRGTSFVYV: -0.11 + 0.09 - 0.42 - 0.3 = -0.74
Peters et al., 2003. JI, 171: 1741.

Integration?
Integrating all three steps (proteasomal cleavage, TAP transport and MHC binding) should lead to improved identification of peptides capable of eliciting CTL responses

Identifying CTL epitopes
                     HLA       Proteasomal cleavage                           TAP
Protein   Peptide    affinity  1    2    3    4    5    6    7    8    9     affinity  ...
EBN3_EBV  YQAYSSWMY  2.56      1.00 0.03 0.34 0.99 0.02 0.01 0.75 0.94 0.92   2.97     0  2.80
EBN3_EBV  QSDETATSH  2.22      0.01 0.28 0.88 0.04 0.83 0.51 0.30 0.11 0.99  -0.80     0  2.28
EBN3_EBV  PVSPAVNQY  1.55      0.01 0.97 0.01 0.22 0.21 1.00 0.02 0.04 1.00   2.63     0  1.78
EBN3_EBV  AYSSWMYSY  1.31      0.34 0.99 0.02 0.01 0.75 0.94 0.92 0.09 1.00   3.28     1  1.58
EBN3_EBV  LAAGWPMGY  1.02      1.00 0.97 0.22 0.01 0.18 0.01 0.06 0.01 1.00   3.01     0  1.27
EBN3_EBV  IVQSCNPRY  0.99      0.10 0.97 0.50 0.05 0.01 0.01 0.01 0.02 0.93   3.19     0  1.24
EBN3_EBV  FLQRTDLSY  0.94      0.46 0.99 0.02 0.82 0.07 0.01 0.63 0.01 0.96   2.79     0  1.18
EBN3_EBV  YTDHQTTPT  1.15      1.00 0.01 0.42 0.02 0.04 0.01 0.02 0.54 0.14  -0.87     0  1.12
EBN3_EBV  GTDVVQHQL  0.96      0.01 0.02 0.03 0.99 1.00 0.02 0.46 0.30 1.00   0.53     0  1.09

Large scale method validation

HIV A3 epitope predictions

Case I: SARS
Sylvester-Hvid et al., Tissue Antigens. 2004
SARS virus HLA ligands
75% of predicted peptides were binding with an IC50 <500 nM

Case II: Discovery of conserved Class I epitopes in Human Influenza Virus H1N1
Wang et al., Vaccine 2007

Pox

Strategy

Influenza
• We selected the Influenza peptides with the top 15 combined scores with conservation p9 > 70% for each of the 12 supertypes.
• 180 peptides selected
• 167 tested for binding and CTL response
• 89 (53%) of the influenza peptides tested have an affinity better than 500nM

Donors
• 35 normal healthy blood donors
• 35-65 years old
• Expected to have had influenza more than 3 times
• HLA typed by SBT for HLA A and B

ELISPOT assay
• Measures the number of white blood cells that in vitro produce interferon-γ in response to a peptide
• A positive result means that the immune system has earlier reacted to the peptide (during a response to a vaccine/natural infection)
[Figure: ELISPOT wells labeled FLDVMESM; annotation: Two spots]

Peptides positive in ELISPOT assay

Conservation of epitopes
• Number of 9mers 100% conserved:
  – 10/12 conserved in Influenza A virus (A/Goose/Guangdong/1/96(H5N1))
  – 11/12 conserved in Influenza A virus (A/chicken/Jilin/9/2004(H5N1))

EpiSelect
[Figure: Top Scoring Peptides against Genotype 1, Genotype 2, Genotype 3, Genotype 4, Genotype 5, Genotype 6]
• Select peptide with maximal coverage
• Select peptide with maximal coverage, preferring uncovered strains
• Select peptide with maximal coverage, preferring lowest covered strains
• Repeat until the desired number of peptides is selected

HCV Results - B7 Peptides
Peptide      Predicted affinity (nM)   Coverage
QPRGRRQPI     5                        5
SPRGSRPSW    43                        4
DPRRRSRNL*   66                        3
RARAVRAKL     6                        3
TPAETTVRL*   38                        3

Genome       Coverage
Genotype 1   4
Genotype 2   3
Genotype 3   2
Genotype 4   3
Genotype 5   3
Genotype 6   3
* Verified B7 supertype restricted CD8+ epitope in the Los Alamos HCV epitope database

Ongoing work
• Selection of epitopes covering host (HLA) and pathogen variability
• Selection of diagnostic peptides in TB
• Predict cross reactivity (T and B cell)
  – Applications in epitope prediction, autoimmune diseases, transplantation
• Virulence factor discovery by comparative genomics
• Function-antigenicity studies
• Bioinformatics immune system simulation
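The EpiSelect selection loop from the slides (pick the peptide with maximal coverage, prefer strains that are uncovered or least covered, repeat) can be sketched as a greedy procedure. This is a minimal sketch under stated assumptions, not the published EpiSelect algorithm: the tie-breaking rule, the function name epi_select, and the toy peptides P1-P4 and strains g1-g6 are all hypothetical. The slides do not say how "preferring" is weighed, so the sketch compares candidates lexicographically by how many strains they hit at each current coverage level, lowest level first.

```python
def epi_select(coverage, n_select):
    """Greedy peptide selection.

    coverage: dict mapping peptide -> set of strains (genotypes) it covers.
    Returns the selected peptides in order, plus how often each strain is hit.
    """
    strains = sorted(set().union(*coverage.values()))
    hits = {s: 0 for s in strains}      # current coverage count per strain
    candidates = list(coverage)
    selected = []

    while candidates and len(selected) < n_select:
        levels = sorted(set(hits.values()))  # lowest coverage level first

        def preference(pep):
            # Strains this peptide covers at each coverage level; tuples
            # compare lexicographically, so low levels dominate.
            return tuple(
                sum(1 for s in coverage[pep] if hits[s] == lvl)
                for lvl in levels
            )

        best = max(candidates, key=preference)  # first candidate wins ties
        selected.append(best)
        candidates.remove(best)
        for s in coverage[best]:
            hits[s] += 1

    return selected, hits


if __name__ == "__main__":
    # Hypothetical toy data: four candidate peptides, six genotypes.
    toy = {
        "P1": {"g1", "g2", "g3", "g4"},
        "P2": {"g1", "g2", "g3"},
        "P3": {"g5", "g6"},
        "P4": {"g4", "g5"},
    }
    chosen, hits = epi_select(toy, 3)
    print(chosen)
    print(hits)
```

In the toy run the first pick is the peptide with the widest coverage; the second pick is P3, because it covers the two genotypes that are still uncovered, even though P2 covers more genotypes overall.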