Potentials and limits of haplotype trees in exploring population structure and pathogenicity of mutations
Hans-Jürgen Bandelt (Hamburg)
17th Annual Meeting of the Deutsche Gesellschaft für Humangenetik (German Society of Human Genetics)
Heidelberg, 8–11 March 2006
Human mtDNA: HVS-I, alias HVR1 (figure from MITOMAP)
The perception of evolution as seen through the
lenses of laboratories constitutes an overlay of
two different processes:
Perceived evolution =
Natural evolution (of the genome)
+ Artificial evolution (in the lab)
mtDNA
and evolution
α: Natural evolution
Migrational processes (prehistory)
ML tree of basal African mtDNA haplogroups
Coding-region variation displayed; time axis in years, from 200,000 (root, L) to 0, with ticks at 150,000, 100,000 and 50,000.
Internal nodes: L1’5 = L1’2’3’4’5’6’7; L2’5 = L2’3’4’5’6’7; L2’6 = L2’3’4’6’7; L4’6 = L3’4’6’7; L3’7 = L3’4’7; L3bd = L3bcd; L3ex = L3eix.
Haplogroups shown: L0, L0ak, L0af, L0a, L0a2, L0a2a; L1, L1c, L1c1’2, L1c2, L1c2a; L5; L2, L2d; L6; L4; L7; L3, L3a, L3b, L3c, L3d, L3d1, L3e, L3e5, L3f, L3f1, L3h, L3i, L3x; M, M1; N; R.
Branch mutations recoverable from the figure: M (10400, 14783, 15043); N (8701, 9540, 10398, 10873, 15301); R (12705); rCRS (750, 1438, 2706, 4769, 7028, 8860, 11719, 14766, 15326).
Sequences 1 to 25 sit at the tips; some are marked as Ethiopian samples.
Torroni et al. (TIG, June 2006)
One of the first views of the East Asian mtDNA phylogeny (Ozawa, Herz 1994)
Figure annotations: incorrect rooting; all mutations that distinguish haplogroups M and R (part of N)
Figure labels: CRS, R, M
Star-burst of autochthonous mtDNA lineages in Eurasia
(haplogroup N and its subhaplogroup R)
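Figure labels: N, R; N1, N5, N9, A, W, X, S, O; R1, R2, R5, R6, R7, R8, R9, R11, R30, R31, B, P, U, JT, preHV
Mutations labelled in the figure: 9140, 6755, 8404, 15607
Regions distinguished: West Eurasia, South Asia, East Asia, Oceania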
Palanichamy et al (Amer J Hum Genet, 2004)
... and a massive burst in haplogroup M, as e.g. seen in India:
Sun et al (Mol Biol Evol, March 2006)
An Out-of-Africa model based on mtDNA analysis
Kivisild et al (Springer-Verlag, April 2006)
Sketch of the phylogeny of basal
European mtDNA haplogroups
Haplogroups shown: N, N1, N1b, N1a’I, N1a, I, N2, W, X, R, JT, U, R0 = pre-HV, R0a = (pre-HV)1, HV, HV0 = pre-V, HV0a, V, H, H1, H3
Torroni et al (TIG, June 2006)
Spatial frequency distributions of haplogroups H1, H3,
V, and U5b reveal signature of post-LGM expansions
Torroni et al (TIG, June 2006)
mtDNA
and evolution
β: Artificial evolution
Laboratory-specific processes
(error and fraud)
Major sources of error in mtDNA sequence data
Artificial recombination: through contamination or sample mix-up (or targeting nuclear inserts of mtDNA)
Phantom mutations: sequencing errors at electrophoresis
Documentation errors: incurred by casual reading or writing
Impurifying selection is the driving force in artificial evolution,
inasmuch as incorrect data are more flexible to interpret and can support sexy stories,
seemingly told by DNA, which are then disseminated by journals with
high impact factors (e.g. Science and Nature).
Worst case: mtDNA in cancer research
(Salas et al, PLoS Medicine 2005)
Case of mtDNA sample mix-up, misinterpreted as somatic mutations;
data generated with MitoChip by Maitra et al (Genome Res, 2004)
Data re-analysis by Bandelt et al (J Med Genet, 2005)
A case of cross-over in the 672 human complete
mtDNA sequences from Tanaka et al (2004)
Figure, upper part: partial tree relating the rCRS to samples NDsq0015, NDsq0167, NDsq0168 and NDsq0178. Haplogroups labelled: R, R9, F, F1, F1a’c, F1a, F1a1, F1a1b on one side; N, L3, M, M7, M7a on the other. Back mutations are marked @9824 and @6455.
Figure, lower part: map of the mtDNA genome (positions 1 to 16569, ticks every 3000 bp, fragments A to F) on which segments of NDsq0167 and NDsq0168 are labelled alternately F1a1b and M7a.
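A minimal Python sketch of how such a mosaic could be screened for, assuming variants are recorded as positions relative to the rCRS. It uses nine positions at which an M sequence differs from the rCRS according to the trees in this talk (489, 10400, 14783, 15043 on the branch leading to M; 8701, 9540, 10398, 10873, 15301 on the branch leading to N) and borrows the 3000-bp spacing of the genome map as window size. The function names and the example variant set are hypothetical and are not data from Tanaka et al (2004); a flagged sample would still need checking against the full tree, since back mutations can also remove single motif sites.

```python
# Nine positions at which haplogroup M differs from the rCRS (from the tree labels).
M_MOTIF = (489, 8701, 9540, 10398, 10400, 10873, 14783, 15043, 15301)
WINDOW = 3000  # assumption: window size mirrors the 3000-bp ticks of the genome map


def motif_status_by_window(variants, motif=M_MOTIF, window=WINDOW):
    """Return {window_start: (hits, expected)} for every window holding motif sites."""
    status = {}
    for pos in motif:
        start = (pos // window) * window
        hits, expected = status.get(start, (0, 0))
        status[start] = (hits + (pos in variants), expected + 1)
    return status


def looks_mosaic(variants, motif=M_MOTIF, window=WINDOW):
    """True if some windows carry their full motif while others carry none of it."""
    status = motif_status_by_window(variants, motif, window)
    has_full = any(hits == expected for hits, expected in status.values())
    has_empty = any(hits == 0 for hits, _ in status.values())
    return has_full and has_empty


# Hypothetical sample: M motif present at both ends, absent between 6000 and 12000.
example = {489, 14783, 15043, 15301}
print(motif_status_by_window(example))
print(looks_mosaic(example))       # True
print(looks_mosaic(set(M_MOTIF)))  # False: the complete motif is present
```

The check is deliberately coarse: it only asks whether the motif is complete in some windows and wholly missing in others, which is the pattern a cross-over between two amplified fragments would leave.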
Prime example of a phantom mutation
(Brandstätter et al, Electrophoresis 2005)
Electropherogram from
Nasidze and Stoneking (2001)
generated 1997 / 1998
and for the first time presented
in Stoneking and Nasidze
(Ann Hum Genet, 2006)
rCRS
Phantom mutations can be found in
excess in the HVS-I Caucasus data of
Nasidze and Stoneking (2001).
In view of additional problems, this
may be regarded as the worst data set
ever published in the realm of
molecular anthropology;
see Bandelt and Kivisild (Ann Hum
Genet 2006) for data re-analysis
Sequences with phantom transitions at 16280-16281 in those Caucasus data

Code      Mutation (16000+)                Haplogroup
AR31      067 279G 280 281 355             HV1
AR483     069 126 145 280 281 367C         J
AZ2       280 281                          ?
AZ342     280 281 298                      pre-V
AZ6       154 168A 280 281 356 384         ?
CH444     111 214G 249 280 281 327 388     U1b
CH451     280 281 292                      ?
DAR23     129 223 278 280 281              ?
DAR36     258 280 281 384                  ?
KAB408    224 280 281 311                  K
This mutation pair has never been
observed in >40,000 HVS-I sequences!
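One way to surface such a pair is to count how often each pair of positions co-occurs within a data set; the sketch below does this in Python for the ten haplotypes listed above. The helper names are hypothetical, and the comparison against the >40,000 published HVS-I sequences is not reproduced here because that collection is not part of this talk.

```python
from collections import Counter
from itertools import combinations

# HVS-I haplotypes as listed on the slide (positions minus 16000;
# a trailing letter denotes a transversion).
CAUCASUS = {
    "AR31": "067 279G 280 281 355",
    "AR483": "069 126 145 280 281 367C",
    "AZ2": "280 281",
    "AZ342": "280 281 298",
    "AZ6": "154 168A 280 281 356 384",
    "CH444": "111 214G 249 280 281 327 388",
    "CH451": "280 281 292",
    "DAR23": "129 223 278 280 281",
    "DAR36": "258 280 281 384",
    "KAB408": "224 280 281 311",
}


def positions(haplotype):
    """Drop base suffixes and return the set of variant positions (16000+)."""
    return {
        16000 + int("".join(ch for ch in token if ch.isdigit()))
        for token in haplotype.split()
    }


def pair_counts(samples):
    """Count, over all samples, how often each pair of positions co-occurs."""
    counts = Counter()
    for haplotype in samples.values():
        for pair in combinations(sorted(positions(haplotype)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(CAUCASUS)
pair, n = counts.most_common(1)[0]
print(pair, n)  # (16280, 16281) 10
```

A pair that recurs across unrelated haplogroups within one data set, as 16280 with 16281 does here, is a candidate for a laboratory artefact rather than shared ancestry.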
Electropherogram
presented by
Stoneking and Nasidze
(Ann Hum Genet, 2006)
rCRS
Phantom mutations in the HVS-I data of Plaza et al (Ann Hum Genet, 2003) (267 samples)

Sample       Mutation (16000+)                                Haplogroup
Algeria      279N 285N                                        ?
Andalusia    129 182C 183C 189 223 249 311 359 371            M1
Andalusia    129 281                                          ?
Andalusia    281                                              ?
Catalonia    093 192 270 281 290A 304 311                     U5b
Catalonia    224 281 311                                      K
Morocco      093 224 242 311 371                              K
Morocco      124 223 284C 285T 300 319 374T                   L2d
Morocco      126 187 189 223 264 270 278 293 311 371 374      L1b
Morocco      126 284C 292 294                                 T2
Morocco      183C 189 223 278 382G                            X
Morocco      189 192 270 369T                                 U5b
Saharawi     093 172 185 223 327 382G                         L3e1
Saharawi     172 281 311                                      U6?
Saharawi     189 382G                                         ?
Comparison with 1624
complete sequences stored in
the mtDB database
Variation in 16279-16285:
Only 20 transitional variants at 16284
Variation in 16369-16389:
Only 1+1+6 transitional variants at
16371, 16380, and 16381
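The two windows named above can be scanned mechanically. The following Python sketch, with hypothetical helper names, counts how many of the 15 listed haplotypes from Plaza et al carry a variant in 16279-16285 or in 16369-16389; it covers only the haplotypes shown on the slide, not the full set of 267 samples, and it does not query mtDB.

```python
# The 15 haplotypes listed on the slide (positions minus 16000).
PLAZA = [
    ("Algeria", "279N 285N"),
    ("Andalusia", "129 182C 183C 189 223 249 311 359 371"),
    ("Andalusia", "129 281"),
    ("Andalusia", "281"),
    ("Catalonia", "093 192 270 281 290A 304 311"),
    ("Catalonia", "224 281 311"),
    ("Morocco", "093 224 242 311 371"),
    ("Morocco", "124 223 284C 285T 300 319 374T"),
    ("Morocco", "126 187 189 223 264 270 278 293 311 371 374"),
    ("Morocco", "126 284C 292 294"),
    ("Morocco", "183C 189 223 278 382G"),
    ("Morocco", "189 192 270 369T"),
    ("Saharawi", "093 172 185 223 327 382G"),
    ("Saharawi", "172 281 311"),
    ("Saharawi", "189 382G"),
]

# Inclusive windows from the slide, expressed as Python ranges.
WINDOWS = {
    "16279-16285": range(16279, 16286),
    "16369-16389": range(16369, 16390),
}


def variant_positions(haplotype):
    """Drop base suffixes and return the set of variant positions (16000+)."""
    return {
        16000 + int("".join(ch for ch in token if ch.isdigit()))
        for token in haplotype.split()
    }


def samples_hit(samples, window):
    """Indices of samples with at least one variant inside the window."""
    return [
        index
        for index, (_, haplotype) in enumerate(samples)
        if any(pos in window for pos in variant_positions(haplotype))
    ]


hits = {name: samples_hit(PLAZA, window) for name, window in WINDOWS.items()}
for name, indices in hits.items():
    print(name, len(indices))  # 8 samples in each window
print(len(set().union(*hits.values())))  # 15: every listed sample is hit
```

Every listed haplotype falls into at least one of the two windows, whereas the mtDB comparison above reports variation at only a handful of positions in those same stretches.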
Re-evaluation of the mtDNA data from the lab of Min-Xin Guan
Figure: tree relating the re-evaluated complete sequences to the rCRS.
Sample labels: WH6967, WH6980, WZ4, WZ5, WZ6, BJ101, BJ102, BJ103, BJ104, BJ105, BJ106, QJ383, LN7710, GD7817, SD10324, KAsq0089, HNsq0152, #078, #081, #101
Source tags: Qu, Wang, Miao, Yuan, Li and Zhao (dated 2004 and 2005)
Haplogroups labelled: L3; M, M8, CZ, C, M10, M10a, M11, M11a, M11b, D, D4, D4a, D4a1, D4b, D4b2, D4b2b; N, N9, N9a, N9a1, A; R, R9, F, F1, F3, F3b, B, B5, B5a, B5a2, pre-HV, HV, H, H2
Legend: missing mutations; misscored mutations in red
Yao et al (Hum
Genet, 2006)
Strategies of authors to deal with errors
1st: Publishing a corrigendum
[rare event]
2nd: No correction — but avoiding similar errors
in future work
[common practice]
3rd: No action — and committing the same
errors as before
[e.g. as Min-Xin Guan and colleagues do]
4th: Fraudulent action — performing fake
analyses and giving false statements
[as done by Mark Stoneking and Ivane Nasidze in
the Ann Hum Genet]
... only L strand, no H strand information shown!
Stoneking and Nasidze (2006)
Human Mitochondrial DNA and the Evolution
of Homo sapiens
Series: Nucleic Acids and Molecular Biology,
Vol.18
Volume package: Human Mitochondrial DNA
Bandelt, Hans-Jürgen; Richards, Martin;
Macaulay, Vincent (Eds.)
2006, Approx. 250 p., 31 illus., 2 in colour.,
Hardcover
ISBN: 3-540-31788-0
Springer-Verlag
Due: April 2006