Assessing the usefulness of Markov models for various applications in bioinformatics

Jacek Leluk
Interdisciplinary Centre for Mathematical and Computational Modelling (Interdyscyplinarne Centrum Modelowania Matematycznego i Komputerowego), Warsaw University (Uniwersytet Warszawski)

Markov models in the identification and localization of coding sequences in the genome

Identification of coding regions in the genome

Methods based on reference coding DNA, using:
    oligonucleotide occurrence: codon usage, amino acid usage, codon preference, hexamer usage
    biases in the occupancy of codon positions: codon prototype
    dependencies in the occupancy of adjacent positions: Markov models
Methods independent of reference coding DNA, using:
    biases in the nucleotide occupancy of codon positions: position asymmetry
    periodic correlation between positions: periodic asymmetry index, average mutual information, Fourier spectra

Methods requiring reference coding DNA
Biases in the occupancy of consecutive adjacent positions
Markov models

In Markov models, the probability of finding a given nucleotide at a given codon position depends on the nucleotide(s) immediately preceding it in the sequence. The simplest example is the first-order Markov model. It is based on the probability of encountering each of the 4 nucleotides at each of the three codon positions, conditional on the nucleotide that precedes that position.
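A minimal sketch of how such a position-specific first-order model could be estimated, assuming in-frame coding sequences over the DNA alphabet A, C, G, T. The function names, the pseudocount and the frame convention are illustrative assumptions and do not come from the talk.

```python
import math

NUCLEOTIDES = "ACGT"

def transition_matrices(coding_seqs, pseudocount=1.0):
    """Estimate three 4x4 matrices, one per codon position.

    matrices[k][prev][cur] approximates P(cur | prev), where cur sits at
    index i with i % 3 == k and prev is the nucleotide directly before it.
    Assumption: every training sequence starts at codon position 1 (index 0).
    """
    counts = [{p: {c: pseudocount for c in NUCLEOTIDES} for p in NUCLEOTIDES}
              for _ in range(3)]
    for seq in coding_seqs:
        for i in range(1, len(seq)):
            prev, cur = seq[i - 1], seq[i]
            if prev in NUCLEOTIDES and cur in NUCLEOTIDES:
                counts[i % 3][prev][cur] += 1
    matrices = []
    for k in range(3):
        matrix = {}
        for prev in NUCLEOTIDES:
            total = sum(counts[k][prev].values())
            matrix[prev] = {c: counts[k][prev][c] / total for c in NUCLEOTIDES}
        matrices.append(matrix)
    return matrices

def log_likelihood(seq, matrices, frame=0):
    """Log-probability of seq (given its first base) read in a given frame.

    frame is the index at which a codon starts; comparing frames 0, 1, 2
    indicates which reading frame looks most like the training data.
    """
    return sum(math.log(matrices[(i - frame) % 3][seq[i - 1]][seq[i]])
               for i in range(1, len(seq)))

# Toy usage: train on one artificial "gene" and compare two reading frames.
toy_matrices = transition_matrices(["ATGATGATGATG"])
in_frame = log_likelihood("ATGATG", toy_matrices, frame=0)
shifted = log_likelihood("ATGATG", toy_matrices, frame=1)
```

In this toy case the in-frame reading scores higher than the shifted one, which is the kind of signal a coding-region detector would look for. A real application would train on many annotated genes and compare against a non-coding background model.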
The method uses three 4×4 transition matrices (F1, F2 and F3), one for each of the three codon positions. Markov models of order 1 to 5 are in use.

Genetic conditioning of the amino acid replacement probabilities and spectrum in molecular evolution

Do amino acids have a pedigree?
or...
Do they contain information about their history (genealogy)?
or...
Can amino acid mutational replacements be described as Markovian processes?

The Markov model assumes that the probability of substituting amino acid AA1 by AA2 is the same regardless of which residue (AAx or AAy) AA1 itself arose from:
    AAx → AA1 → AA2 with probability Pa
    AAy → AA1 → AA2 with probability Pb
    Pa = Pb
The statistical algorithms currently in use are based on the Markovian model of amino acid replacement (they directly use stochastic matrices of replacement frequency indices).

PAM250 matrix of amino acid replacements
Why is tryptophan the most conservative residue here?
(Upper triangle; each row starts at the diagonal. Column order: C S T P A G N D E Q H R K M I L V F Y W)
C   12 0 -2 -3 -2 -3 -4 -5 -5 -5 -3 -4 -5 -5 -2 -6 -2 -4 0 -8
S   2 1 1 1 1 1 0 0 -1 -1 0 0 -2 -1 -3 -1 -3 -3 -2
T   3 0 1 0 0 0 0 -1 -1 -1 0 -1 0 -2 0 -3 -3 -5
P   6 1 -1 -1 -1 -1 0 0 0 -1 -2 -2 -3 -1 -5 -5 -6
A   2 1 0 0 0 0 -1 -2 -1 -1 -1 -2 0 -5 -3 -6
G   5 0 1 0 -1 -2 -3 -2 -3 -3 -4 -1 -5 -5 -7
N   2 2 1 1 2 0 1 -2 -2 -3 -2 -4 -2 -4
D   4 3 2 1 -1 0 -3 -2 -4 -2 -6 -4 -7
E   4 2 1 -1 0 -2 -2 -3 -2 -5 -4 -7
Q   4 3 1 1 -1 -2 -2 -2 -5 -4 -5
H   6 2 0 -2 -2 -2 -2 -2 0 -3
R   6 3 0 -2 -3 -2 -4 -4 2
K   5 0 -2 -3 -2 -5 -4 -3
M   6 2 4 2 0 -2 -4
I   5 2 4 1 -1 -5
L   6 2 2 -1 -2
V   4 -1 -2 -6
F   9 7 0
Y   10 0
W   17

BLOSUM62 matrix of amino acid replacements
(Upper triangle; each row starts at the diagonal. Column order: A R N D C Q E G H I L K M F P S T W Y V)
A   4 -1 -2 -2 0 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -3 -2 0
R   5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3
N   6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3
D   6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3
C   9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q   5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2
E   5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2
G   6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3
H   8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3
I   4 2 -3 1 0 -3 -2 -1 -3 -1 3
L   4 -2 2 0 -3 -2 -1 -2 -1 1
K   5 -1 -3 -1 0 -1 -3 -2 -2
M   5 0 -2 -1 -1 -1 -1 1
F   6 -4 -2 -2 1 3 -1
P   7 -1 -1 -4 -3 -2
S   4 1 -3 -2 -2
T   5 -2 -2 0
W   11 2 -3
Y   7 -1
V   4

Replacement Arg/Lys according to the statistical interpretation using stochastic matrix indices
Matrix      Score
PAM250      3
BLOSUM62    2
BLOSUM35    2
BLOSUM45    3
BLOSUM100   3

Diagram of genetic relationships between amino acids
[Figure: diagram of amino acid genetic relationships]

Diagram of genetic relationships between codons
[Figure: diagram of codon genetic relationships; each codon is labeled with the amino acid it encodes, and the stop codons UAA, UAG and UGA are marked "–"]

Arginine-to-lysine mutational conversion pathways for arginines of different origin
[Figure; codon steps recoverable from the slide:]
    Met (AUG) → Arg (AGG) → Lys (AAG)
    Ser (AGC) → Arg (AGG) → Lys (AAG)
    Arg (CGG) → Gln (CAG)
    Pro (CCC) → Arg (CGC); His (CAC) → Asn (AAC) → ?

Possible single-point-mutational processing of serine with respect to its origin
    Trp (UGG) → Ser (UCG) → Thr, Ala, Pro, Ser, Trp, Leu or stop (UAG)
    Asn (AAU) → Ser (AGU) → Thr, Ile, Asn, Ser, Arg, Cys, Gly

Amino acid mutational substitution based on a single transition/transversion is NOT a Markovian process

Theoretical proof
The conversion pathways of arginine into lysine, glutamine and serine, for arginine arising from codons that encoded different amino acids.
Possible codons for arginine: AGA AGG CGA CGG CGC CGT

Conversion of arginine into lysine
[Figure; codon steps recoverable from the slide:]
    Met (ATG) → Arg (AGG) → Lys (AAG)
    Leu (CTR) → Arg (CGR) → Gln (CAR) or Arg (AGR) → Lys (AAR)
    His (CAY) → Arg (CGY) → Ser (AGY) or Arg (CGR) → Arg (AGR) → Lys (AAR)

Conversion of arginine into serine
[Figure; codon steps recoverable from the slide:]
    Met (ATG) → Arg (AGG) → Ser (AGY)
    Leu (CTR) → Arg (CGR) → Arg (AGR) or Arg (CGY) → Ser (AGY)
    His (CAY) → Arg (CGY) → Ser (AGY)

Conversion of arginine into glutamine
[Figure; codon steps recoverable from the slide:]
    Met (ATG) → Arg (AGG) → Lys (AAG) or Arg (CGG) → Gln (CAG)
    Leu (CTR) → Arg (CGR) → Gln (CAR)
    His (CAY) → Arg (CGY) → His (CAY) or Arg (CGR) → Gln (CAR)

then...
The probability of replacing one amino acid by another depends significantly on which amino acids occupied that position in the past.
There is a high risk that commonly used algorithms applying stochastic data matrices (MDM, PAM, BLOSUM) lead to a wrong interpretation of the mutational processes occurring in proteins.

Genetic relationships between Arg and Met/Gln
[Figure: diagram of amino acid genetic relationships with Arg, Met and Gln highlighted]

Arg-Met and Arg-Gln substitutions. "Two kinds" of arginine
(Residues observed at each position; * and # are the marks shown on the slide.)

Squash (Cucurbitaceae) inhibitors
1. RVMIG  2. RVMIGS  3. C  4. P  5. RKL  6. I  7. [LW][Y]  8. MNK  9. REKQP  10. C
11. KSQT  12. [KSRHQTY][V]  13. DN  14. RSDA  15. D  16. C  17. LFMP  18. ALTPGR  19. DEGQK  20. C
21. VITKR  22. C  23. LKGQMV  24. PKEQRSA  25. NHEQDS–  26. [I][D]–  27. GE  28. YFIH  29. C  30. G
Marks on the slide: * * # # #

Bowman-Birk type inhibitors
47. C  48. C  49. DRBSN  50. QHELZRSIFTK #  51. C  52. ASTKEMILRDVPF *  53. C  54. T  55. [KR][A]  56. S
57. NMIEKRDQ *#  58. P  59. P  60. QKZETI  61. C  62. [RHQS][V] #  63. C  64. STNVAEHR  65. DZBN  66. MILVTR *
67. R  68. L  69. NDE  70. SKTR  71. C  72. H  73. S  74. A  75. C  76. KSDEN  77. SLGRTFH  78. C
79. IAVLM  80. C  81. ATNR  82. LYFRK  83. S  84. YIEFMQDN  85. P  86. AGP  87. QKZM  88. C
89. FVRIHSQ  90. C  91. VTBGLAYF  92. DB  93. [IMTV][Q]  94. TNBKAHD  95. DBNKT  96. FSY  97. C  98. [YH][T]
99. EAKPD  100. PSAK  101. C

Ovomucoid domains (Kazal type)
1. VILE  2. NDH  3. C  4. [STR][D]  5. LPKQE  6. YF  7. ALPKQ  8. SQTK  9. GTRS–  10. IVKNT #
11. GVSTL  12. KRTQ–  13. DGN–  14. G–  15. TNRKE–  16. STLAQP  17. WMLIV–  18. VTI  19. [A][R]–  20. C
21. PT  22. [RM][F]  23. [NI][E]  24. [L][Y]  25. KSLQDV  26. [P][E]  27. [V][H]  28. C  29. GA  30. TS  31. DN  32. GS
33. SFV  34. T  35. Y  36. SDA  37. NS  38. [ED][R]  39. C  40. GSTF  41. ILF  42. C  43. [L][A][N]  44. [YH][A] #
45. NY  46. RAILV  47. EQ  48. HQLS  49. GHRN  50. ATR  51. [NHST][E]  52. VIL  53. ESKAGN  54. [K][L] *
55. ELKSRV  56. [YHS][K]  57. [DN][M]  58. GA  59. EKRA  60. C  61. RKE  62. PLQE  63. KERD  64. [ISV][H]  65. [VG][PT]  66. [MEK][PS]

PAM250 and BLOSUM62 scores for the replacements Arg-Lys, Lys-Gln, Lys-Glu, Arg-Gln and Arg-Glu
Replacement   PAM250   BLOSUM62
Arg/Lys       3        2
Lys/Gln       1        1
Arg/Gln       1        1
Lys/Glu       0        1
Arg/Glu       -1       0

Genetic relationships among Arg, Lys, Glu and Gln
[Figure: diagram of amino acid genetic relationships with Arg, Lys, Glu and Gln highlighted]

Arg-Glu and Lys-Glu substitutions (Arg/Lys/Gln/Glu replacements)
[The three consensus tables above (squash inhibitors, Bowman-Birk type inhibitors, ovomucoid Kazal-type domains) are shown again without the * and # marks; position 64 of the Bowman-Birk type inhibitors (STNVAEHR) is marked "!", and one further "!" appears on the slide.]

Multiple alignment of seven chicken ovoinhibitor domains obtained with Markovian and non-Markovian methods
[Figure: multiple alignment]

What part of the codon contains the information about the previous amino acid that occurred at a given position of the protein sequence?
At most 2/3 of the entire codon.
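The 2/3 figure, and the dependence of replacement options on a residue's origin, can be checked with a short sketch. It assumes the standard genetic code and considers only single-point mutations; the helper names are hypothetical, and AGG and CGG are the two arginine codons used in the pathway slides (AGG is one step from Met AUG, CGG is one step from Gln CAG).

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def single_point_neighbors(codon):
    """All 9 codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]

def reachable_amino_acids(codon):
    """Amino acids (or '*') reachable from `codon` by one point mutation."""
    return {CODON_TABLE[n] for n in single_point_neighbors(codon)}

def shared_positions(codon_a, codon_b):
    """Number of codon positions at which the two codons agree."""
    return sum(a == b for a, b in zip(codon_a, codon_b))

# Two codons that both encode arginine but have different neighborhoods.
via_agg = reachable_amino_acids("AGG")
via_cgg = reachable_amino_acids("CGG")

print("Arg AGG ->", sorted(via_agg))
print("Arg CGG ->", sorted(via_cgg))
print("Lys reachable:", "K" in via_agg, "K" in via_cgg)
print("Gln reachable:", "Q" in via_agg, "Q" in via_cgg)
```

Every single-point neighbor keeps exactly two of the three nucleotides of its parent codon, which is where the "at most 2/3" bound comes from. The two arginine codons also give different sets of one-step replacements: lysine is reachable from AGG but not from CGG, and glutamine from CGG but not from AGG. A single amino-acid-level replacement probability for "Arg" cannot capture that difference.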
Example: Ala → Val (GCG → GUG)

How long is the information about the codons of preceding amino acids stored?

The shortest storage period is 3 transitions/transversions:
    Ala → Val → Met → Ile (GCG → GUG → AUG → AUA)
    Ser → Ser → Thr → Ser (UCC → UCU → ACU → AGU)

Theoretically, the longest period is infinite:
    Lys → Asn → Asp → His → Gln → Glu → Asp (AAA → AAC → GAC → CAC → CAG → GAG → GAU)
    Tyr → His → Asn → Lys → Gln → His (UAU → CAU → AAU → AAG → CAG → CAC) ...

CONCLUSIONS
The analysis of genetic semihomology rules out the applicability of the Markov model to studies of protein variability at the amino acid level.
Amino acid codons do contain information about the "ancestral" amino acids whose codons were the starting point for the codon of the current residue.
This applies mainly to positions undergoing single-point mutations, the most basic mechanism of evolutionary variability.

Thank you for your attention!