Pitfalls with mtDNA analysis in medical genetics Hans-Jürgen Bandelt (Hamburg) EMBO World Programme Workshop on Human Evolution and Disease, Hyderabad, India.


Pitfalls with mtDNA analysis in
medical genetics
Hans-Jürgen Bandelt (Hamburg)
EMBO World Programme Workshop on Human Evolution and Disease,
Hyderabad, India. 6th – 9th December 2006
From the journal Oncogene, this year, one
could pick up the following exciting
news:
“Very strikingly, the mitochondrial sequences
in this study (tumor samples as well as
controls) with the Indian population revealed
a unique profile of eight sequence variants,
viz. A73G, A263G, A1438G, A2706G,
A4769G, C7028T, A8860G and A15326G
appeared at high frequencies in all samples
and could be of evolutionary significance.”
“... the profile could be specific
to the Indian population.”
Well, not quite ...
Leaving aside mutations that were
systematically missed (such as A750G and
C14766T, probably due to the use of a wrong
reference sequence), the claim would translate
into:
... Near-absence of haplogroup R0 could
be specific to the Indian population.
Note that haplogroup R0
is specific to the
West Eurasian mtDNA pool.
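To see why the "unique profile" says more about the reference sequence than about India, each reported variant can be compared with the base carried by the rCRS, which itself belongs to haplogroup H (within R0). The Python sketch below uses hypothetical helper names and is not code from any cited study. Its table lists only the ten positions named in this talk (the eight reported variants plus the missed A750G and C14766T), with the rCRS base read off the variant notation. A variant that departs from one of these rCRS states merely shows that the sample lies outside the reference's own branch of the tree.

```python
# Hypothetical helper, not taken from any cited study.
# rCRS base at each of the ten positions named in the talk: the eight
# reported "Indian" variants plus the systematically missed A750G and C14766T.
RCRS_STATES = {
    73: "A", 263: "A", 750: "A", 1438: "A", 2706: "A",
    4769: "A", 7028: "C", 8860: "A", 14766: "C", 15326: "A",
}


def parse_variant(variant):
    """Split a variant such as 'A73G' into (reference base, position, sample base)."""
    return variant[0], int(variant[1:-1]), variant[-1]


def rcrs_branch_variants(variants):
    """Return the variants that merely depart from the rCRS state at a listed position."""
    flagged = []
    for v in variants:
        ref, pos, _alt = parse_variant(v)
        if RCRS_STATES.get(pos) == ref:
            flagged.append(v)
    return flagged


def unreported_positions(variants):
    """Listed positions for which no variant was reported at all."""
    seen = {parse_variant(v)[1] for v in variants}
    return sorted(set(RCRS_STATES) - seen)


reported = ["A73G", "A263G", "A1438G", "A2706G",
            "A4769G", "C7028T", "A8860G", "A15326G"]
print(rcrs_branch_variants(reported) == reported)  # True: all eight flagged
print(unreported_positions(reported))              # [750, 14766]
```

For a sample already known to lie outside HV, an unreported 750 or 14766 points to an omission; a sample inside HV would legitimately lack the 14766 change, so the second helper is only a prompt for a closer look.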
Absence of phylogenetic knowledge
about human mitochondrial DNA is
characteristic of clinical genetics.
In a paper in Muscle and Nerve (2003), the
authors (from Taipei) identified, among 17
completely sequenced cases,
“a double mutation (A3243G and A14693G) in a
patient with MELAS syndrome, who had a diabetic
mother and normal siblings. The A14693G
substitution is significant from structural and
evolutionary points of view. This result indicates
that mtDNA should be sequenced in its entirety for
the complete evaluation of mitochondriopathy.”
However, the complete mtDNA sequence
of that MELAS patient was not reported ...
“A14693G is not identified in 205 human controls
and 76 randomly examined species.”
However, it has been
known since 2003 that
A14693G is a
characteristic mutation
of the East Asian
haplogroup Y.
[Figure: phylogeny of haplogroup Y (subclades Y1, Y1a, Y1b, Y2) within haplogroups N9 and N, with 14693 among the mutations defining Y; from Bandelt, Yao & Kivisild (2005)]
According to
Trejaut et al.
(2005),
haplogroup Y2
does occur in
parts of Taiwan
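Before a mutation is called novel or disease-associated, it can be checked against the known phylogeny rather than only against a disease database. The sketch below is hypothetical: `HAPLOGROUP_MARKERS` holds just the two markers mentioned in this talk (A14693G for haplogroup Y, T3394C for M9a'b) and stands in for a complete, current mtDNA phylogeny.

```python
# Hypothetical marker table: position -> haplogroup it helps define.
# Only the two markers mentioned in this talk are included; a real check
# would use a complete, current mtDNA phylogeny.
HAPLOGROUP_MARKERS = {
    14693: "Y",       # A14693G, East Asian haplogroup Y
    3394: "M9a'b",    # T3394C
}


def screen_candidate(position):
    """Flag a candidate 'pathogenic' position that is a known haplogroup marker."""
    haplogroup = HAPLOGROUP_MARKERS.get(position)
    if haplogroup is None:
        return f"{position}: not in this (incomplete) marker table"
    return (f"{position}: marker of haplogroup {haplogroup}; "
            "check the phylogeny before calling it pathogenic")


for pos in (14693, 3394, 3243):
    print(screen_candidate(pos))
```

A miss in such a table proves nothing on its own; only a hit is informative, which is why the table has to be as complete as the phylogeny allows.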
Suppose that you now found A14693G
in some disease context and consulted
MITOMAP before publishing your case:
Green light from MITOMAP, so here you go:
Dong-Ling Tang, Xin Zhou, Xia Li, Lei Zhao, Fang Liu
Diabetes Res Clin Pract. 2006 Jan 13
“… in a Chinese population, a total of 184 T2DM cases and 279 matched
healthy controls were recruited. … Our results suggest that the mutations
of T3394C and A14693G may contribute to genetic predisposition to
T2DM, with the T16189C variant being associated with insulin resistance.”
Dong-ling Tang, Xin Zhou, Ke-yuan Zhou, Xia Li, Lei Zhao,
Fang Liu, Fang Zheng, Song-mei Liu
Zhonghua Yi Xue Yi Chuan Xue Za Zhi. 2005 Dec 22
“A total of 184 cases of type 2 diabetes mellitus and 210 matched healthy
controls with normal glucose tolerance were recruited for the study. ….
The mutations of 3394 (T-->C) and 14693 (A-->G) may contribute to
the genetic predisposition to type 2 diabetes; 16189 (T-->C) variant is
associated with insulin resistance and risk factor of diabetes.”
Double publication does not seem to be
a rare phenomenon in China.
Another way of recycling is to publish
one and the same mutation as novel
multiple times – either found in one and
the same patient who was re-examined
again and again, or in several different
patients – as long as the mutation has
not found its way into MITOMAP.
The control group investigated by Tang
et al. (2006) has the peculiar feature
that in a sample of size 279 there was
not a single haplogroup M9a'b lineage
(T3394C). Other studies, however, would
rather suggest a frequency of 2.7%
(in Han Chinese). If this percentage were
the true frequency, the probability of
observing 0/279 M9a'b lineages would be
only about 0.05%!
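The 0.05% figure follows from a simple binomial model: if each of n unrelated controls independently carries an M9a'b lineage with probability p, the chance of seeing none is (1 - p)^n. A minimal Python check, assuming p = 0.027 and n = 279 as in the text (the function name is illustrative):

```python
def prob_no_carriers(freq, n):
    """Binomial probability that none of n independent controls carries
    a lineage whose population frequency is `freq`."""
    return (1.0 - freq) ** n


p = prob_no_carriers(0.027, 279)
print(f"expected carriers: {0.027 * 279:.1f}")  # expected carriers: 7.5
print(f"P(0/279) = {p:.5f} = {100 * p:.2f}%")   # P(0/279) = 0.00048 = 0.05%
```

About seven or eight M9a'b lineages would be expected in such a control group, which is what makes an empty count so conspicuous.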
There are further cases, in cancer
research, where control group data
are almost void of any variation
and thus are either cooked or
fabricated, e.g. taken over (via
copy-and-paste) from other studies
(which have a most peculiar error
spectrum themselves).
[Figure: mtDNA sequences from patients (UC-Case 1, UC-Case 2, UC-Case 3) and controls (Control 1, Control 2) placed in the mtDNA phylogeny, from L3, M and N down to the rCRS branch of haplogroup H2a (with branches for M7, M8, D4, B4, F1, R9, HV and H, among others); labels mark cooked data, missed mutations and false mutations]
In particular, mtDNA analysis in
cancer studies is riddled with all kinds
of errors – for example, sample mix-up
or contamination:
bundles of perceived somatic
mutations then actually trace parts of
phylogenetic pathways in the mtDNA
phylogeny (Salas et al. 2005).
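The idea can be illustrated in miniature: if the differences between a patient's normal and tumor mtDNA coincide with the mutations separating two branches of the phylogeny, sample mix-up or contamination is a more economical explanation than somatic mutation. The sketch below uses an invented toy tree (`PARENT`, `BRANCH` and the node names are hypothetical) and tests only the strictest case, an exact match to the path between two nodes; bundles that trace only parts of such pathways, as described above, would need a looser check.

```python
from itertools import combinations

# Invented toy phylogeny (hypothetical): node -> parent, and the mutations
# on the branch leading to each node. Not real haplogroups or positions.
PARENT = {"root": None, "A": "root", "A1": "A", "B": "root", "B1": "B"}
BRANCH = {"root": set(), "A": {100, 200}, "A1": {300}, "B": {400}, "B1": {500}}


def path_mutations(node):
    """Union of branch mutations from the root down to `node`."""
    muts = set()
    while node is not None:
        muts |= BRANCH[node]
        node = PARENT[node]
    return muts


def phylogenetic_explanation(somatic):
    """Return a pair of nodes whose root paths differ by exactly `somatic`,
    or None if no such pair exists in the toy tree."""
    for a, b in combinations(PARENT, 2):
        if path_mutations(a) ^ path_mutations(b) == somatic:
            return a, b
    return None


normal = path_mutations("A1")        # germline profile of the patient
tumor = path_mutations("B")          # profile from a mixed-up sample
apparent_somatic = normal ^ tumor    # {100, 200, 300, 400}
print(phylogenetic_explanation(apparent_somatic))  # ('A1', 'B')
print(phylogenetic_explanation({300, 500}))        # None
```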
How would authors of afflicted papers react?
Well, they could say that the revisited pathway
would just express the disease...
In one remarkable case it was proclaimed:
“This patient had a germline mitochondrial
haplotype J, which “shifted” in positions 185,
295, and 16126 back to the phylogenetically
older haplotype H, but shifted in position 195 to
haplotype W and in position 204 to nowhere.”
Pythonesque...*
*Said of a style of humour: bizarre and surreal
Monty Python is the collective name of the creators of
Monty Python's Flying Circus, a British television comedy
sketch show broadcast by the BBC from 1969 to 1974.
[Image: MP's foot]
The Dead Parrot sketch is one of the most famous
in the history of television comedy.
Information and links: Wikipedia
The Dead Parrot sketch portrays a conflict between a
disgruntled customer and a shopkeeper, who hold
contradictory positions on the vital state of a Norwegian
Blue parrot (an apparent absurdity in itself).
"I know a dead parrot when I see one, and I'm lookin' at one right now."
The customer complains that the parrot he has recently
purchased at the location is, in fact, dead. The
shopkeeper denies this and points out the beauty of its
plumage, further suggesting that the bird is merely asleep.
Monty Python's Dead Parrot sketch has
come to life in molecular anthropology:
The Caucasian King Size parrot
starring
Ivani Nasidze & Mark Stoneking
from the MP Institute EVA
Natal history of the Caucasian King Size parrot:
2001 Nasidze and Stoneking published a Caucasian
HVS-I data set (but did not show the data)
2003 At the International Symposium on Forensic
DNA Technologies in Münster this Caucasian
data set was accused of having an enormous
(king size) error spectrum
2004 Nasidze and colleagues made a false statement
(in the Annals of Human Genetics) about the
vital state of their old data set, by employing
a sort of filter analysis (not, however, adjusted
to the actual sequencing range):
“... (analysis not shown).
These reticulations were
just due to one
sequence*, as removal of
this sequence removed
the excess reticulations.”
False statement!
Here's the torso of the
reticulate network after
removal of that sequence
(Bandelt and Kivisild 2006).
* Corrected sequence not shown
(Network labels: positions minus 16000)
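The 2004 claim amounts to a leave-one-out test: remove one sequence and see whether the conflicts disappear. A rough stand-in for counting reticulations is the four-gamete test on pairs of biallelic sites. The sketch below (hypothetical helper names, invented toy data) recounts incompatible site pairs with each sequence removed in turn. As stressed above, such a filter is only meaningful if all sequences cover the same sequencing range, which the sketch simply assumes.

```python
from itertools import combinations


def incompatible_pairs(seqs):
    """Count pairs of biallelic sites failing the four-gamete test.

    If all four state combinations occur at two sites, no tree in which
    each site mutates only once fits them, so a network must reticulate.
    """
    count = 0
    for i, j in combinations(range(len(seqs[0])), 2):
        if len({s[i] for s in seqs}) != 2 or len({s[j] for s in seqs}) != 2:
            continue  # skip invariant or multistate sites
        if len({(s[i], s[j]) for s in seqs}) == 4:
            count += 1
    return count


def leave_one_out(seqs):
    """Incompatibility count after removing each sequence in turn."""
    return [incompatible_pairs(seqs[:k] + seqs[k + 1:]) for k in range(len(seqs))]


toy = ["CCC", "CTC", "TCC", "TTT", "CCT"]   # invented toy data
print(incompatible_pairs(toy))  # 3
print(leave_one_out(toy))       # [3, 1, 1, 0, 1]
```

In this toy table dropping the fourth sequence does clear every conflict, which is the situation the 2004 statement asserted; for the real Caucasian data, the network shown by Bandelt and Kivisild (2006) stays reticulate after the removal.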
Advertisement
This Caucasian HVS-I data set has now been
displayed publicly and can be inspected in the book:
Human Mitochondrial DNA and the
Evolution of Homo sapiens
Bandelt, Hans-Jürgen; Richards, Martin;
Macaulay, Vincent (Eds.)
Springer-Verlag, 2006, 117.65€
Those who wish to evaluate the idiosyncratic
variation of this Caucasian data set but have
no prior knowledge of natural mtDNA
variation may proceed as follows:
Take out all mutations ever observed in any
one of the complete sequences referred to
in Max Ingman's database
(http://www.genpat.uu.se/mtDB/)
and represent the surviving variation in the
form of a quasi-median network that
highlights the character conflicts,
and compare it with other data sets:
Beautiful plumage
From: Bandelt & Dür (2007)*
*Hans-Jürgen Bandelt & Arne Dür (2007) Translating DNA data
tables into quasi-median networks for parsimony analysis and error
detection. Molecular Phylogenetics and Evolution 42: 256–271.
In this article it is demonstrated that the
mtDB2005-filtered variation of the
Caucasian data is even messier than
corresponding randomised data tables.
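The filtering recipe above, and a comparison with randomised tables, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the toy table, the positions and the `known` set (standing in for all positions ever seen variable in mtDB) are invented, and the quasi-median network is built by a naive closure (iterating the quasi-median of all ordered triples to a fixed point), far slower than dedicated network software. Shuffling each column independently is just one simple way to randomise a table; it keeps per-site state counts but destroys any phylogenetic structure. The measure "edges minus vertices plus one" is zero for conflict-free biallelic data; multistate sites add triangles and inflate it.

```python
import random
from itertools import combinations, permutations


def filter_sites(rows, positions, known_variable):
    """Drop every column whose position occurs in `known_variable`."""
    keep = [k for k, p in enumerate(positions) if p not in known_variable]
    return ["".join(r[k] for k in keep) for r in rows]


def quasi_median(x, y, z):
    """Site-wise majority state; if all three states differ, keep x's state."""
    return "".join(b if b == c else a for a, b, c in zip(x, y, z))


def quasi_median_closure(seqs):
    """Naively iterate quasi-medians of all ordered triples to a fixed point."""
    closed = set(seqs)
    while True:
        new = {quasi_median(*t) for t in permutations(closed, 3)} - closed
        if not new:
            return closed
        closed |= new


def excess_cycles(seqs):
    """Edges (Hamming distance 1) minus vertices plus one in the closure."""
    verts = quasi_median_closure(seqs)
    edges = sum(
        1 for u, v in combinations(verts, 2)
        if sum(a != b for a, b in zip(u, v)) == 1
    )
    return edges - len(verts) + 1


def shuffled_columns(seqs, rng):
    """Permute each column independently, keeping per-site state counts."""
    cols = [list(c) for c in zip(*seqs)]
    for c in cols:
        rng.shuffle(c)
    return ["".join(r) for r in zip(*cols)]


rows = ["AAGT", "AGGC", "GAGC", "GGAT"]  # invented toy table
positions = [1, 2, 3, 4]                 # hypothetical site positions
known = {3}                              # hypothetical "seen in mtDB" set
kept = filter_sites(rows, positions, known)
observed = excess_cycles(kept)
rng = random.Random(1)
null = [excess_cycles(shuffled_columns(kept, rng)) for _ in range(200)]
print(kept, observed)  # ['AAT', 'AGC', 'GAC', 'GGT'] 5
print(sum(n >= observed for n in null) / len(null))
```

The last line gives the share of randomised tables at least as conflicted as the filtered data; a filtered data set that scores at or above most randomised tables is "messier than random" in the sense used above.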
The stone dead Caucasian King Size parrot:
Bandelt & Kivisild (2006) Quality assessment of
DNA sequence data: autopsy of a mis-sequenced
mtDNA population sample. Annals of Human
Genetics 70: 314–326.
Stoneking & Nasidze (2006) The patient is not
dead yet: premature autopsy of a mtDNA data set.
Annals of Human Genetics 70: 327–331.
Parson (2006) The art of reading sequence
electropherograms. Annals of Human Genetics
Stoneking & Nasidze (2006) Reply to Parson.
Annals of Human Genetics
Remarkable electropherogram!
What's wrong with it?
This electropherogram
don't enter into it.
That's what's wrong with it.
No one has ever claimed that the
transition C16168T is a phantom mutation
– instead, it's the transversion C16168A
that may be a phantom mutation.
In Nasidze et al. (2001) the 16168 transition always occurs
together with the 16343 transition: this constitutes a
confirmed motif within haplogroup U3 (Macaulay et al. 1999).
Eight of these paired electropherograms are irrelevant here
because they just show other sequences with the rCRS
nucleotide in question. Instead, the heavy strand
electropherograms could have been shown in these places!
Walther Parson has aptly demonstrated that those
electropherograms reflect well-known and reproducible
sequencer artifacts and were not interpreted properly by
Nasidze and Stoneking.
They have always asserted that they have read both strands
– but without ever giving any evidence for that:
“As stated both in the original papers (Nasidze & Stoneking 2001:
p.1198; Nasidze et al. 2004: p.207) and in the reply to Bandelt & Kivisild
(Stoneking & Nasidze, 2006: p.329), both strands were indeed
sequenced in all samples.”
Thus, they are reiterating a false statement.
You know a dead parrot when you see one:
... “(data not shown)” *
... “(analysis not shown)”
* and not submitted to GenBank either
In case you wish to register a
complaint, please address
yourself to the ombudsperson:
“Scientific honesty and the
observance of the principle of good
scientific practice are essential in
all scientific work which seeks to
expand our knowledge and which
is intended to earn respect from
the public.”
From Mark Stoneking’s website
(http://email.eva.mpg.de/~stonekg/files/ombud.htm):
“As elected Ombudsperson of the Max Planck
Institute for Evolutionary Anthropology,
I stand at your disposal in case you are
experiencing or observing any kind of scientific
misconduct, or if you need advice on the subject
of good scientific practice.”
“Scientific misconduct includes: false statements,
infringement of intellectual property, impairment
of the research work of others, joint
accountability.”
Acknowledgement:
Sincere thanks are due to
Martin Richards (reminding me of MP's
immortal Dead Parrot sketch)
and all collaborators (Anita Brandstätter,
Claudio Bravi, Mike Coble, Arne Dür,
Toomas Kivisild, Jüri Parik, Walther
Parson, Antonio Salas, Richard Villems,
Yong-Gang Yao)
Thank you
for listening