Immunological Bioinformatics
The Immunological Bioinformatics group
• Immunological Bioinformatics group, CBS, Technical University of Denmark (www.cbs.dtu.dk)
• Ole Lund, Group Leader
• Morten Nielsen, Associate Professor
• Claus Lundegaard, Associate Professor
• Jean Vennestrøm, post doc.
• Thomas Blicher (50%), post doc.
• Mette Voldby Larsen, PhD student
• Pernille Haste Andersen, PhD student
• Sune Frankild, PhD student
• Sheila Tang, PhD student
• Thomas Rask (50%), PhD student
• Nicolas Rapin, PhD student
• Ilka Hoff, PhD student
• Jorid Sørli, PhD student
• Hao Zhang, PhD student
• MSc students
• Collaborators
  – IMMI, University of Copenhagen: Søren Buus (MHC binding), Mogens H Claesson (ELISPOT assay)
  – La Jolla Institute for Allergy and Immunology: Alessandro Sette, Bjoern Peters (epitope database)
  – Leiden University Medical Center: Tom Ottenhoff, Michel Klein (tuberculosis)
  – Ganymed: Ugur Sahin (genetic library)
  – University of Tübingen: Stefan Stevanovic (MHC ligands)
  – INSERM: Peter van Endert (TAP binding)
  – University of Mainz: Hansjörg Schild (proteasome)
  – Schafer-Nielsen: Claus Schafer-Nielsen (peptide synthesis)
  – ImmunoGrid: Elda Rossi, Vladimir Brusic (simulation of the immune system)
  – University of Utrecht: Can Kesmir (ideas)
Figure 1-20: Effectiveness of vaccines (1958: start of the smallpox eradication program)
The Immune System
• The innate immune system
• The adaptive immune system
The innate immune system
• Unspecific
• Antigen independent
• Immediate response
• No training/selection, hence no memory
• Pathogen independent (but the response might be pathogen-type dependent)
The adaptive immune system
• Pathogen specific
  – Humoral
  – Cellular
[Images: parasite (http://tpeeaupotable.ifrance.com/ma%20photo/bilharzoze.jpg), virus (http://en.wikipedia.org/wiki/Image:Aids_virus.jpg), bacteria (http://www.uni-heidelberg.de/zentral/ztl/grafiken_bilder/bilder/e-coli.jpg)]
Adaptive immune response
• Signal induced
  – Pathogens
• Antigens
  – Epitopes
[Figure labels: B cell, T cell]
Humoral immunity
Cartoon by Eric Reits
Antibody - Antigen interaction
• The antibody recognizes structural properties of the surface of the antigen
[Figure labels: antigen, antibody, Fab, epitope, paratope]
Cellular Immunity
MHC class I with peptide
Anchor positions
HLA specificity clustering
[Figure: binding specificities of HLA alleles, including A0201, A0101, A6802 and B0702]
Prediction of HLA binding specificity
Historical overview
• Simple Motifs
– Allowed/non-allowed amino acids
• Extended motifs
– Amino acid preferences (SYFPEITHI)
– Anchor/Preferred/other amino acids
• Hidden Markov models
– Peptide statistics from sequence alignment
• SVMs and neural networks
– Can take sequence correlations into account
Sequence information
SLLPAIVEL
LLDVPTAAV
HLIDYLVTS
ILFGHENRV
LERPGGNEI
PLDGEYFTL
ILGFVFTLT
KLVALGINA
KTWGQYWQV
SLLAPGAKQ
ILTVILGVL
TGAPVTYST
GAGIGVAVL
KARDPHSGH
AVFDRKSDA
GLCTLVAML
VLHDDLLEA
ISNDVCAQV
YTAFTIPSI
NMFTPYIGV
VVLGVVFGI
GLYDGMEHL
EAAGIGILT
YLSTAFARV
FLDEFMEGV
AAGIGILTV
AAGIGILTV
YLLPAIVHI
VLFRGGPRG
ILAPPVVKL
ILMEHIHKL
ALSNLEVKL
GVLVGVALI
LLFGYPVYV
DLMGYIPLV
TITDQVPFS
KIFGSLAFL
KVLEYVIKV
VIYQYMDDL
IAGIGILAI
KACDPHSGH
LLDFVRFMG
FIDSYICQV
LMWITQCFL
VKTDGNPPE
RLMKQDFSV
LMIIPLINV
ILHNGAYSL
KMVELVHFL
TLDSQVMSL
YLLEMLWRL
ALQPGTALL
FLPSDFFPS
FLPSDFFPS
TLWVDPYEV
MVDGTLLLL
ALFPQLVIL
ILDQKINEV
ALNELLQHV
RTLDKVLEV
GLSPTVWLS
RLVTLKDIV
AFHHVAREL
ELVSEFSRM
FLWGPRALV
VLPDVFIRC
LIVIGILIL
ACDPHSGHF
VLVKSPNHV
IISAVVGIL
SLLMWITQC
SVYDFFVWL
RLPRIFCSC
TLFIGSHVV
MIMVKCWMI
YLQLVFGIE
STPPPGTRV
SLDDYNHLV
VLDGLDVLL
SVRDRLARL
AAGIGILTV
GLVPFLVSV
YMNGTMSQV
GILGFVFTL
SLAGGIIGV
DLERKVESL
HLSTAFARV
WLSLLVPFV
MLLAVLYCL
YLNKIQNSL
KLTPLCVTL
GLSRYVARL
VLPDVFIRC
LAGIGLIAA
SLYNTVATL
GLAPPQHLI
VMAGVGSPY
QLSLLMWIT
FLYGALLLA
FLWGPRAYA
SLVIVTTFV
MLGTHTMEV
MLMAQEALA
KVAELVHFL
RTLDKVLEV
SLYSFPEPE
SLREWLLRI
FLPSDFFPS
KLLEPVLLL
MLLSVPLLL
STNRQSGRQ
LLIENVASL
FLGENISNF
RLDSYVRSL
FLPSDFFPS
AAGIGILTV
MMRKLAILS
VLYRYGSFS
FLLTRILTI
AVGIGIAVV
VDGIGILTI
RGPGRAFVT
LLGRNSFEV
LLWTLVVLL
LLGATCMFV
VLFSSDFRI
RLLQETELV
VLQWASLAV
MLGTHTMEV
LMAQEALAF
IMIGVLVGV
GLPVEYLQV
ALYVDSLFF
LLSAWILTA
AAGIGILTV
LLDVPTAAV
SLLGLLVEV
GLDVLTAKV
FLLWATAEA
ALSDHHIYL
YMNGTMSQV
CLGGLLTMV
YLEPGPVTA
AIMDKNIIL
YIGEVLVSV
HLGNVKYLV
LVVLGLLAV
GAGIGVLTA
NLVPMVATV
PLTFGWCYK
SVRDRLARL
RLTRFLSRV
LMWAKIGPV
SLFEGIDFY
ILAKFLHWL
SLADTNSLA
VYDGREHTV
ALCRWGLLL
KLIANNTRV
SLLQHLIGL
AAGIGILTV
FLWGPRALV
LLDVPTAAV
ALLPPINIL
RILGAVAKV
SLPDFGISY
GLSEFTEYL
GILGFVFTL
FIAGNSAYE
LLDGTATLR
IMDKNIILK
CINGVCWTV
GIAGGLALL
ALGLGLLPV
AAGIGIIQI
GLHCYEQLV
VLEWRFDSR
LLMDCSGSI
YMDGTMSQV
SLLLELEEV
SLDQSVVEL
STAPPHVNV
LLWAARPRL
YLSGANLNL
LLFAGVQCQ
FIYAGSLSA
ELTLGEFLK
AVPDEIPPL
ETVSEQSNV
LLDVPTAAV
TLIKIQHTL
QVCERIPTI
KKREEAPSL
STAPPAHGV
ILKEPVHGV
KLGEFYNQM
ITDQVPFSV
SMVGNWAKV
VMNILLQYV
GLQDCTMLV
GIGIGVLAA
QAGIGILLA
PLKQHFQIV
TLNAWVKVV
CLTSTVQLV
FLTPKKLQC
SLSRFSWGA
RLNMFTPYI
LLLLTVLTV
GVALQTMKQ
RMFPNAPYL
VLLCESTAV
KLVANNTRL
MINAYLDKL
FAYDGKDYI
ITLWQRPLV
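One common way to summarise an alignment like the one above is the information content at each position: log2(20) minus the Shannon entropy of the residues observed there, so a fully conserved anchor position reaches the maximum of about 4.32 bits. The sketch below is illustrative only; it assumes equal-length, ungapped peptides, uses a small subset of the list, and applies no background frequencies or small-sample correction.

```python
import math
from collections import Counter


def information_content(peptides: list[str]) -> list[float]:
    """Per-position information content in bits: log2(20) minus the Shannon entropy."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    max_bits = math.log2(20)  # entropy of a uniform distribution over 20 amino acids
    n = len(peptides)
    result = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        result.append(max_bits - entropy)
    return result


# Illustrative subset of the peptides listed above
sample = ["SLLPAIVEL", "LLDVPTAAV", "ILFGHENRV", "KLVALGINA", "GLCTLVAML", "YLSTAFARV"]
for pos, bits in enumerate(information_content(sample), start=1):
    print(f"P{pos}: {bits:.2f} bits")
```

In this subset every peptide has L at P2, so P2 scores the maximum, while P1 (six different residues) scores much lower.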
Scoring a sequence to a weight matrix
• Score sequences to the weight matrix by looking up and adding the L values from the matrix

Pos      1     2     3     4     5     6     7     8     9
A      0.6  -1.6   0.2  -0.1  -1.6  -0.7   1.1  -2.2  -0.2
R      0.4  -6.6  -1.3  -0.1  -0.1  -1.4  -3.8   1.0  -3.5
N     -3.5  -6.5   0.1  -2.0   0.1  -1.0  -0.2  -0.8  -6.1
D     -2.4  -5.4   1.5   2.0  -2.2  -2.3  -1.3  -2.9  -4.5
C     -0.4  -2.5   0.0  -1.6  -1.2   1.1   1.3  -1.4   0.7
Q     -1.9  -4.0  -1.8   0.5   0.4  -1.3  -0.3   0.4  -0.8
E     -2.7  -4.7  -3.3   0.8  -0.5  -1.4  -1.3   0.1  -2.5
G      0.3  -3.7   0.4   2.0   1.9  -0.2  -1.4  -0.4  -4.0
H     -1.1  -6.3   0.5  -3.3   1.2  -1.0   2.1   0.2  -2.6
I      1.0   1.0  -1.0   0.1  -2.2   1.8   0.6  -0.0   0.9
L      0.3   5.1   0.3  -1.7  -0.5   0.8   0.7   1.1   2.8
K      0.0  -3.7  -2.5  -1.0  -1.3  -1.9  -5.0  -0.5  -3.0
M      1.4   3.1   1.2  -2.2  -2.2   0.2   1.1  -0.5  -1.8
F      1.2  -4.2   1.0  -1.6   1.7   1.0   0.9   0.7  -1.4
P     -2.7  -4.3  -0.1   1.7   1.2  -0.4   1.3  -0.3  -6.2
S      1.4  -4.2  -0.3  -0.6  -2.5  -0.6  -0.5   0.8  -1.9
T     -1.2  -0.2  -0.5  -0.2  -0.1   0.4  -0.9   0.8  -1.6
W     -2.0  -5.9   3.4   1.3   1.7  -0.5   2.9  -0.7  -4.9
Y      1.1  -3.8   1.6  -6.8   1.5  -0.0  -0.4   1.3  -1.6
V      0.7   0.4   0.0  -0.7   1.0   2.1   0.5  -1.1   4.5

Peptide     Score   Affinity
RLLDDTPEV    11.9    84 nM
GLLGNVSTV    14.7    23 nM
ALAKAAAAL     4.3   309 nM
Which peptide is most
likely to bind?
Which peptide second?
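The lookup-and-add rule can be checked directly. The sketch below transcribes the weight matrix above (rows are amino acids, columns are positions 1-9) into a dictionary and sums one value per residue. It reproduces the scores 11.9, 14.7 and 4.3, so GLLGNVSTV ranks first and RLLDDTPEV second, consistent with their listed affinities of 23 nM and 84 nM.

```python
# Log-odds (L) values from the weight matrix above: one list per amino acid,
# one entry per peptide position 1-9.
WEIGHTS = {
    "A": [0.6, -1.6, 0.2, -0.1, -1.6, -0.7, 1.1, -2.2, -0.2],
    "R": [0.4, -6.6, -1.3, -0.1, -0.1, -1.4, -3.8, 1.0, -3.5],
    "N": [-3.5, -6.5, 0.1, -2.0, 0.1, -1.0, -0.2, -0.8, -6.1],
    "D": [-2.4, -5.4, 1.5, 2.0, -2.2, -2.3, -1.3, -2.9, -4.5],
    "C": [-0.4, -2.5, 0.0, -1.6, -1.2, 1.1, 1.3, -1.4, 0.7],
    "Q": [-1.9, -4.0, -1.8, 0.5, 0.4, -1.3, -0.3, 0.4, -0.8],
    "E": [-2.7, -4.7, -3.3, 0.8, -0.5, -1.4, -1.3, 0.1, -2.5],
    "G": [0.3, -3.7, 0.4, 2.0, 1.9, -0.2, -1.4, -0.4, -4.0],
    "H": [-1.1, -6.3, 0.5, -3.3, 1.2, -1.0, 2.1, 0.2, -2.6],
    "I": [1.0, 1.0, -1.0, 0.1, -2.2, 1.8, 0.6, -0.0, 0.9],
    "L": [0.3, 5.1, 0.3, -1.7, -0.5, 0.8, 0.7, 1.1, 2.8],
    "K": [0.0, -3.7, -2.5, -1.0, -1.3, -1.9, -5.0, -0.5, -3.0],
    "M": [1.4, 3.1, 1.2, -2.2, -2.2, 0.2, 1.1, -0.5, -1.8],
    "F": [1.2, -4.2, 1.0, -1.6, 1.7, 1.0, 0.9, 0.7, -1.4],
    "P": [-2.7, -4.3, -0.1, 1.7, 1.2, -0.4, 1.3, -0.3, -6.2],
    "S": [1.4, -4.2, -0.3, -0.6, -2.5, -0.6, -0.5, 0.8, -1.9],
    "T": [-1.2, -0.2, -0.5, -0.2, -0.1, 0.4, -0.9, 0.8, -1.6],
    "W": [-2.0, -5.9, 3.4, 1.3, 1.7, -0.5, 2.9, -0.7, -4.9],
    "Y": [1.1, -3.8, 1.6, -6.8, 1.5, -0.0, -0.4, 1.3, -1.6],
    "V": [0.7, 0.4, 0.0, -0.7, 1.0, 2.1, 0.5, -1.1, 4.5],
}


def score(peptide: str) -> float:
    """Sum the matrix value of each residue at its position in a 9mer."""
    if len(peptide) != 9:
        raise ValueError("the matrix is defined for 9mers only")
    return sum(WEIGHTS[aa][pos] for pos, aa in enumerate(peptide))


peptides = ["RLLDDTPEV", "GLLGNVSTV", "ALAKAAAAL"]
for pep in sorted(peptides, key=score, reverse=True):
    print(pep, round(score(pep), 1))
# GLLGNVSTV 14.7
# RLLDDTPEV 11.9
# ALAKAAAAL 4.3
```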
Example from real life
• 10 peptides from the MHCpep database
• Bind to the MHC complex
• Relevant for immune system recognition
• Estimate sequence motif and weight matrix
• Evaluate motif “correctness” on 528 peptides

ALAKAAAAM
ALAKAAAAN
ALAKAAAAR
ALAKAAAAT
ALAKAAAAV
GMNERPILT
GILGFVFTM
TLNAWVKVV
KLNEPVLLL
AVVPFIVSV
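Estimating a weight matrix from these ten peptides can be sketched as log-odds scores log2(q/p), where q is the observed residue frequency at a position and p the background frequency. The assumptions here are deliberately simple and may differ from how the matrices in this talk were built: a uniform background of 1/20, a flat pseudocount of 1 per amino acid, and no sequence weighting. With only ten peptides, the pseudocount keeps unseen residues from scoring minus infinity.

```python
import math
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def log_odds_matrix(peptides: list[str], pseudocount: float = 1.0) -> dict[str, list[float]]:
    """Log-odds scores log2(q / p): q = smoothed observed frequency, p = background.

    Assumptions (illustrative only): uniform background p = 1/20, a flat pseudocount
    added to every amino acid count, and no sequence weighting.
    """
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix: dict[str, list[float]] = {aa: [] for aa in AMINO_ACIDS}
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)  # missing residues count as 0
        for aa in AMINO_ACIDS:
            q = (counts[aa] + pseudocount) / total
            matrix[aa].append(math.log2(q / background))
    return matrix


training = ["ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAR", "ALAKAAAAT", "ALAKAAAAV",
            "GMNERPILT", "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV"]
matrix = log_odds_matrix(training)
best_p2 = max(AMINO_ACIDS, key=lambda aa: matrix[aa][1])
print(best_p2, round(matrix[best_p2][1], 2))  # L 2.42
```

Seven of the ten peptides have L at position 2, so L gets the highest P2 score, while residues never seen there get negative scores.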
Prediction accuracy
[Figure: measured affinity vs. prediction score; Pearson correlation 0.45]
Predictive performance
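The Pearson correlation used above can be computed as the covariance of the two samples divided by the product of their standard deviations. As a small illustration on numbers from this talk, the sketch correlates the three weight-matrix scores shown earlier with the log of their affinities; the coefficient is strongly negative because a lower IC50 means stronger binding. Three points are far too few to measure predictive performance; this only demonstrates the formula.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Weight-matrix scores and affinities of the three example peptides shown earlier
scores = [11.9, 14.7, 4.3]            # RLLDDTPEV, GLLGNVSTV, ALAKAAAAL
affinities_nm = [84.0, 23.0, 309.0]
log_affinities = [math.log10(a) for a in affinities_nm]
print(round(pearson(scores, log_affinities), 2))
```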
Higher order sequence correlations
• Neural networks can learn higher order correlations!
  – What does this mean?
Say that the peptide needs one and only one large amino acid in positions P3 and P4 to fill the binding cleft. How would you formulate this to test if a peptide can bind?
(S = small, L = large amino acid)
P3 P4 => binds
S  S  => 0
L  S  => 1
S  L  => 1
L  L  => 0
No linear function can learn this (XOR) pattern.
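The XOR point can be made concrete. Encoding large as 1 and small as 0 (an assumption for the example), a network with one hidden layer of two threshold neurons and hand-set weights (not trained) reproduces the table, while a brute-force search over a grid of weights finds no single linear threshold unit that does. The grid search only illustrates the claim; the general statement holds because no straight line separates {LS, SL} from {SS, LL}.

```python
import itertools


def step(x: float) -> int:
    """Threshold activation: 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0


def xor_network(p3_large: int, p4_large: int) -> int:
    """Two hidden neurons with hand-set (not trained) weights."""
    h_or = step(p3_large + p4_large - 0.5)   # at least one large residue
    h_and = step(p3_large + p4_large - 1.5)  # both residues large
    return step(h_or - h_and - 0.5)          # OR but not AND = exactly one


# (P3 large?, P4 large?) -> binds
CASES = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}


def linear_unit_solves(w3: float, w4: float, bias: float) -> bool:
    """True if a single threshold unit reproduces every case in the table."""
    return all(step(w3 * a + w4 * b + bias) == y for (a, b), y in CASES.items())


grid = [i / 2 for i in range(-4, 5)]  # weights and bias from -2.0 to 2.0
found = any(linear_unit_solves(*params) for params in itertools.product(grid, repeat=3))

print("hidden-layer network:", [xor_network(a, b) for a, b in CASES])
print("single linear unit found on grid:", found)
```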
Mutual information
[Figure: mutual information between peptide positions for 313 binding peptides and 313 random peptides]
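Mutual information between positions i and j, MI = Σ P(a,b) log2[P(a,b) / (P(a)P(b))], is zero when the residues at the two positions are independent and positive when they co-vary, as in the P3/P4 example. The sketch below uses plain frequency estimates; with a few hundred sequences these are biased upward, so a background set of the same size, such as the 313 random peptides, gives a useful baseline. The toy peptides are made up.

```python
import math
from collections import Counter


def mutual_information(peptides: list[str], i: int, j: int) -> float:
    """Mutual information in bits between residues at 0-based positions i and j."""
    n = len(peptides)
    pair_counts = Counter((p[i], p[j]) for p in peptides)
    counts_i = Counter(p[i] for p in peptides)
    counts_j = Counter(p[j] for p in peptides)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        p_a, p_b = counts_i[a] / n, counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy peptides: P3 and P4 (indices 2 and 3) always hold exactly one large residue (L)
coupled = ["GLSLV", "GLLSV", "ALSLV", "ALLSV"]
# Toy peptides: every S/L combination occurs once, so P3 and P4 are independent
independent = ["GLSSV", "GLSLV", "GLLSV", "GLLLV"]
print(mutual_information(coupled, 2, 3))      # 1.0
print(mutual_information(independent, 2, 3))  # 0.0
```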
Sequence encoding (continued)
• Sparse encoding
V:0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
L:0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
V.L=0 (unrelated)
• Blosum encoding
V: 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
L:-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1
R:-1 5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3
V.L = 0.88 (highly related)
V.R = -0.08 (close to unrelated)
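Both encodings can be reproduced from the vectors above, with residues in the order A R N D C Q E G H I L K M F P S T W Y V. The slide does not say how V.L and V.R are normalised; this sketch assumes cosine similarity, which gives 0.87 and -0.10, close to but not identical to the values quoted. encode_peptide (a hypothetical helper) shows a common way to encode a whole peptide: concatenating one vector per position.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # residue order of the vectors above

# BLOSUM rows for V, L and R as listed above
BLOSUM = {
    "V": [0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4],
    "L": [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],
    "R": [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],
}


def sparse(aa: str) -> list[int]:
    """One-hot (sparse) vector: 1 at the residue's index, 0 elsewhere."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(aa)] = 1
    return vec


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: list[float], v: list[float]) -> float:
    """Dot product normalised by vector lengths (assumed normalisation)."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def encode_peptide(peptide: str) -> list[int]:
    """Hypothetical helper: concatenate per-position sparse vectors (9 x 20 for a 9mer)."""
    return [x for aa in peptide for x in sparse(aa)]


print(dot(sparse("V"), sparse("L")))               # 0
print(round(cosine(BLOSUM["V"], BLOSUM["L"]), 2))  # 0.87
print(round(cosine(BLOSUM["V"], BLOSUM["R"]), 2))  # -0.1
print(len(encode_peptide("SLLPAIVEL")))            # 180
```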
Evaluation of prediction accuracy
Network ensembles
• No single network with a particular architecture and sequence encoding scheme will consistently perform best
• Enlightened despotism fails for neural network predictions too
  – For some peptides, BLOSUM encoding with a four-neuron hidden layer best predicts the peptide/MHC binding; for other peptides, a sparse-encoded network with zero hidden neurons performs best
  – Wisdom of the crowd
• Never use just one neural network
• Use network ensembles
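In code, an ensemble prediction is the average of the member networks' outputs. The sketch below assumes each trained network is available as a function from peptide to score; the three members are hypothetical constant stand-ins named after the encodings mentioned above, not real trained models.

```python
from statistics import mean
from typing import Callable, Sequence

Predictor = Callable[[str], float]


def ensemble_predict(models: Sequence[Predictor], peptide: str) -> float:
    """Average the outputs of several independently trained predictors."""
    if not models:
        raise ValueError("need at least one model")
    return mean(model(peptide) for model in models)


# Hypothetical stand-ins; real members would be trained networks.
def sparse_no_hidden(peptide: str) -> float:
    return 0.2


def blosum_4_hidden(peptide: str) -> float:
    return 0.5


def hmm_encoded(peptide: str) -> float:
    return 0.8


members = [sparse_no_hidden, blosum_4_hidden, hmm_encoded]
print(ensemble_predict(members, "SLLPAIVEL"))
```

Averaging means no single member has to be the best on every peptide; where one member errs, the others pull the prediction back.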
Evaluation of prediction accuracy
ENS: Ensemble of neural networks trained using sparse,
Blosum, and hidden Markov model sequence encoding
NetMHC-3.0 update
• IEDB + more proprietary data
• Higher accuracy for existing ANNs
• More human alleles
• Non-human alleles (mice + primates)
• Prediction of 8mer binding peptides for some alleles
• Prediction of 10- and 11mer peptides for all alleles
• Outputs to spreadsheet
Prediction of 10- and 11mers using 9mer prediction tools
• Approach:
  – For each peptide of length L, create 6 pseudo-peptides by deleting a sliding window of L - 9 residues, always keeping positions 1, 2, 3 and the C-terminal residue (position 9 of the resulting 9mer)
• Example: MLPQWESNTL =
  – MLPWESNTL
  – MLPQESNTL
  – MLPQWSNTL
  – MLPQWENTL
  – MLPQWESTL
  – MLPQWESNL
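The deletion scheme can be written as a short function. The sketch below is an illustrative reading of the rule above (the function name is hypothetical): the deleted window of length L - 9 starts anywhere from position 4 up to the last start that leaves the C-terminal residue in place, which always yields six pseudo-peptides. It reproduces the MLPQWESNTL example.

```python
def pseudo_9mers(peptide: str) -> list[str]:
    """Reduce a peptide of length >= 10 to six 9mers by deleting a sliding window.

    The window has length len(peptide) - 9. It starts at position 4 at the earliest
    and never covers the C-terminal residue, so positions 1-3 and the last residue
    are always kept.
    """
    n = len(peptide)
    if n < 10:
        raise ValueError("expected a peptide of length 10 or more")
    gap = n - 9
    # 0-based window starts: index 3 (position 4) up to n - gap - 1 inclusive
    return [peptide[:start] + peptide[start + gap:] for start in range(3, n - gap)]


print(pseudo_9mers("MLPQWESNTL"))
# ['MLPWESNTL', 'MLPQESNTL', 'MLPQWSNTL', 'MLPQWENTL', 'MLPQWESTL', 'MLPQWESNL']
```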
Prediction of 10- and 11mers
using 9mer prediction tools
• Final prediction = average of the 6 log scores:
  (0.477 + 0.405 + 0.564 + 0.505 + 0.559 + 0.521)/6 = 0.505
• Affinity:
  exp(log(50000) * (1 - 0.505)) = 211.5 nM
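The arithmetic can be reproduced as below. Inverting the affinity formula shows that a log score s corresponds to s = 1 - log(IC50)/log(50000), so a score of 1 means 1 nM and a score of 0 means 50,000 nM. Averaging the six unrounded scores and converting gives about 211 nM, matching the value above up to rounding.

```python
import math
from statistics import mean

MAX_IC50_NM = 50000.0  # affinity corresponding to a log score of 0


def log_score_to_nm(score: float) -> float:
    """IC50 (nM) = exp(log(50000) * (1 - score))."""
    return math.exp(math.log(MAX_IC50_NM) * (1.0 - score))


def nm_to_log_score(ic50_nm: float) -> float:
    """Inverse transform: score = 1 - log(IC50) / log(50000)."""
    return 1.0 - math.log(ic50_nm) / math.log(MAX_IC50_NM)


# The six log scores from the example above
scores = [0.477, 0.405, 0.564, 0.505, 0.559, 0.521]
avg = mean(scores)
print(f"mean log score {avg:.3f} -> {log_score_to_nm(avg):.1f} nM")
```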
Prediction using ANN trained on
10mer peptides
Cellular Immunity
Proteasome specificity
• Low polymorphism
  – Constitutive & immunoproteasome
• Evolutionarily conserved
• Stochastic and low specificity
  – Only 70-80% of the cleavage sites are reproduced in repeated experiments
Proteasome specificity
• NetChop is one of the best available cleavage prediction methods
– www.cbs.dtu.dk/services/NetChop-3.0
Predicting TAP affinity
[Figure: TAP affinity scoring for 9-meric and >9-meric peptides]
Example (>9-mer): ILRGTSFVYV: -0.11 + 0.09 - 0.42 - 0.3 = -0.74
Peters et al., 2003. J Immunol, 171: 1741.
Integration?
Integrating all three steps (proteasomal cleavage, TAP transport and MHC binding) should lead to improved identification of peptides capable of eliciting CTL responses.
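One simple way to integrate the three predictions is a weighted sum in which MHC binding dominates and the cleavage and TAP scores act as smaller corrections. The sketch below assumes that form; the weights (0.15 and 0.05), the peptide names and the scores are hypothetical, not taken from this talk. In the toy example, pep1 outranks pep2 despite a slightly lower MHC score because of its cleavage and TAP scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    mhc: float       # predicted MHC class I binding score
    cleavage: float  # predicted proteasomal cleavage score
    tap: float       # predicted TAP transport score


# Hypothetical weights: MHC binding dominates; cleavage and TAP act as modifiers.
W_CLEAVAGE = 0.15
W_TAP = 0.05


def combined_score(c: Candidate) -> float:
    return c.mhc + W_CLEAVAGE * c.cleavage + W_TAP * c.tap


def rank(candidates: list[Candidate]) -> list[str]:
    """Peptides sorted from most to least likely CTL epitope under this scoring."""
    return [c.peptide for c in sorted(candidates, key=combined_score, reverse=True)]


# Made-up scores for three hypothetical peptides
toy = [
    Candidate("pep1", mhc=1.00, cleavage=0.90, tap=1.0),
    Candidate("pep2", mhc=1.05, cleavage=0.00, tap=0.0),
    Candidate("pep3", mhc=0.50, cleavage=1.00, tap=3.0),
]
print(rank(toy))  # ['pep1', 'pep2', 'pep3']
```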
Identifying CTL epitopes
[Table: the nine top-ranked 9mers from EBN3_EBV (YQAYSSWMY, QSDETATSH, PVSPAVNQY, AYSSWMYSY, LAAGWPMGY, IVQSCNPRY, FLQRTDLSY, YTDHQTTPT, GTDVVQHQL) with their predicted HLA affinity, proteasomal cleavage and TAP affinity scores]
Large scale method validation
HIV A3 epitope predictions
Case I: SARS
Sylvester-Hvid et al., Tissue Antigens, 2004
SARS virus HLA ligands
• 75% of the predicted peptides bound with an IC50 < 500 nM
Case II: Discovery of conserved class I epitopes in human influenza virus H1N1
Wang et al., Vaccine, 2007
Pox Strategy
Influenza
• We selected the influenza peptides with the top 15 combined scores and conservation (p9) > 70% for each of the 12 supertypes
• 180 peptides selected
• 167 tested for binding and CTL response
• 89 (53%) of the influenza peptides tested have an affinity better than 500 nM
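The selection rule can be sketched as a filter-then-rank step per supertype. Here conservation is assumed to be the fraction of strains containing the 9mer (the "p9" measure is not defined further above), and the peptide names and numbers are made up. With 12 supertypes and top_n=15, this yields at most 12 × 15 = 180 peptides, the number selected above.

```python
from collections import defaultdict


def select_per_supertype(predictions, top_n=15, min_conservation=0.70):
    """Keep the top_n highest-scoring, sufficiently conserved peptides per supertype.

    predictions: iterable of (peptide, supertype, combined_score, conservation) tuples,
    where conservation is a fraction between 0 and 1 (assumed definition).
    """
    by_supertype = defaultdict(list)
    for peptide, supertype, score, conservation in predictions:
        if conservation > min_conservation:
            by_supertype[supertype].append((score, peptide))
    return {
        supertype: [pep for _, pep in sorted(items, reverse=True)[:top_n]]
        for supertype, items in by_supertype.items()
    }


# Hypothetical toy input (peptide names and numbers are made up)
toy = [
    ("pepA", "A2", 0.9, 0.95),
    ("pepB", "A2", 0.8, 0.50),  # filtered out: conservation below 70%
    ("pepC", "A2", 0.7, 0.80),
    ("pepD", "B7", 0.6, 0.90),
]
print(select_per_supertype(toy, top_n=2))
```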
Donors
• 35 healthy blood donors
• 35-65 years old
• Expected to have had influenza more than 3 times
• HLA typed by sequence-based typing (SBT) for HLA-A and HLA-B
ELISPOT assay
• Measures the number of white blood cells that produce interferon-γ in vitro in response to a peptide
• A positive result means that the immune system has previously reacted to the peptide (during a response to a vaccine or a natural infection)
[Figure: ELISPOT wells for peptide FLDVMESM; example well showing two spots]
Peptides positive in ELISPOT assay
Conservation of epitopes
• Number of 9mers 100% conserved:
  – 10/12 conserved in Influenza A virus (A/Goose/Guangdong/1/96(H5N1))
  – 11/12 conserved in Influenza A virus (A/chicken/Jilin/9/2004(H5N1))
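Checking 100% conservation reduces to an exact substring search of each 9mer in the protein sequences of the strain in question. A minimal sketch with made-up sequences; reading real strain proteomes (for example from FASTA files) is not shown, and the function name is hypothetical.

```python
def count_conserved(epitopes: list[str], strain_proteins: list[str]) -> int:
    """Number of epitopes found verbatim (100% identical) in at least one strain protein."""
    return sum(1 for ep in epitopes if any(ep in protein for protein in strain_proteins))


# Made-up example: the second 9mer has a single substitution (E -> Q) in the strain
epitopes = ["KLVALGINA", "SLLPAIVEL"]
strain_proteins = ["MSTKLVALGINAVQR", "MAESLLPAIVQLGT"]
hits = count_conserved(epitopes, strain_proteins)
print(f"{hits}/{len(epitopes)} conserved")  # 1/2 conserved
```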
EpiSelect
• Start from the top-scoring peptides
• Select the peptide with maximal coverage
• Select the peptide with maximal coverage, preferring uncovered strains
• Select the peptide with maximal coverage, preferring the lowest-covered strains
• Repeat until the desired number of peptides is selected
[Figure: coverage of genotypes 1-6 by the selected peptides]
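The selection loop is a greedy, coverage-driven procedure. The sketch below is one possible reading of it, not the EpiSelect implementation: each candidate covers a set of strains, and a strain already covered c times contributes 1/(1 + c) to a candidate's gain. The first pick therefore maximises raw coverage, and later picks favour uncovered or least-covered strains. The weighting function and the coverage sets are assumptions.

```python
def gain(strains: set[int], counts: dict[int, int]) -> float:
    """Strains covered c times so far contribute 1 / (1 + c): uncovered strains count most."""
    return sum(1.0 / (1 + counts.get(s, 0)) for s in strains)


def greedy_select(coverage: dict[str, set[int]], n_select: int) -> list[str]:
    """Greedily pick peptides, re-weighting strains by how often they are already covered."""
    counts: dict[int, int] = {}
    remaining = dict(coverage)
    selected: list[str] = []
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda pep: gain(remaining[pep], counts))
        selected.append(best)
        for strain in remaining.pop(best):
            counts[strain] = counts.get(strain, 0) + 1
    return selected


# Made-up coverage: which strains (numbered 1-6) each hypothetical peptide occurs in
coverage = {
    "P1": {1, 2, 3, 4},
    "P2": {1, 2, 3},
    "P3": {5, 6},
    "P4": {4, 5},
}
print(greedy_select(coverage, 3))  # ['P1', 'P3', 'P2']
```

After P1 is chosen, P3 wins over P2 because it covers the two strains nobody covers yet, even though P2 covers more strains in total.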
HCV Results - B7
Peptide       Predicted affinity (nM)   Genome coverage
QPRGRRQPI       3
SPRGSRPSW      43                        4
DPRRRSRNL*     66                        3
RARAVRAKL       6                        3
TPAETTVRL*     38                        3
[Figure: coverage of HCV genotypes 1-6 by the selected peptides]
* Verified B7 supertype-restricted CD8+ epitope in the Los Alamos HCV epitope database
Ongoing work
• Selection of epitopes covering host (HLA) and pathogen variability
• Selection of diagnostic peptides in TB
• Predict cross-reactivity (T and B cell)
  – Applications in epitope prediction, autoimmune diseases, transplantation
• Virulence factor discovery by comparative genomics
• Function-antigenicity studies
• Bioinformatics immune system simulation