Diapositive 1 - Centre national de la recherche scientifique


Origin of Man, Language and
Languages
Central Asia
A common inquiry in genetics, linguistics and
anthropology
Funded by
European Science Foundation
CNRS
[Map: Central Asia within Eurasia. Turkic-speaking peoples: Kazakhs, Karakalpaks, Uzbeks, Turkmens, Kirghizes. Iranian-speaking peoples: Tajiks]
Expeditions
• 2001 Karakalpakstan (Karakalpak, Uzbek, Kazakh)
• 2002 Karakalpakstan (On Tort Uruw, Turkmen)
• 2003 Kyrgyzstan (North and South)
• 2004 Tajiks from Uzbekistan (Ferghana and Samarkand areas)
• 2005 Bukhara area: Kazakh, Uzbek and Tajik
• 2005 Tajiks and Uzbeks from Tajikistan (Gharm and Penjikent areas)
Ethnological questionnaire
N° :    Sex :
FULL NAME :
Location :    Age :
Localization (birth place)
  Individual’s location
  Father’s location: father of father, mother of father
  Mother’s location: father of mother, mother of mother
Language
  Individual’s language
  Father’s language: father of father, mother of father
  Mother’s language: father of mother, mother of mother
Tribe (or group)
  Individual’s GF (Group of Filiation: tribe, clan, lineage)
  Father’s GF: father of father, mother of father
  Mother’s GF: father of mother, mother of mother
Individual’s spouse
  Alive? Y/N
  Second marriage: Y/N (if yes, add a questionnaire)
  Why this partner? (cousin, same clan, same village…)
  Spouse’s location (origin): F (father of father, mother of father), M (father of mother, mother of mother)
  Spouse’s language: F (father of father, mother of father), M (father of mother, mother of mother)
  Spouse’s GF: F (father of father, mother of father), M (father of mother, mother of mother)
• Location, language spoken and tribe (if applicable)
of individuals and their 4 grandparents
• Information on
married children
Linguistic data
Blood, DNA
• 5 ml for each individual
• Informed consent
• In the field: blood is processed to isolate white
cells
• In the laboratory : DNA extraction
Main goals
• Trace back
population history
• Describe genetic
diversity in Central
Asia
• Compare genetic
and linguistic
distances
• History of Eurasia : Past demographic
expansion
– By Raphaelle Chaix (Former PhD Student)
Age of expansion
« Mismatch distribution »
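All 2 by 2 comparisons of sequences within each population, e.g. seq1 ATAATC vs seq2 ACATTC
[Figure: mismatch distributions for Pop 1 and Pop 2; x-axis: number of differences between sequences (0 to 8); y-axis: frequency (Fq)]
Estimation of τ = 2Tu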
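The pairwise comparison described on this slide can be sketched in a few lines of Python. This is only an illustration, not the software used in the study: the function names, the third sequence and the numeric values of τ and u are hypothetical, and the conversion assumes that u is the mutation rate of the whole sequence per generation, so that T = τ / (2u) is expressed in generations.

```python
from collections import Counter
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mismatch_distribution(sequences):
    """Frequency (Fq) of each number of differences over all 2 by 2 comparisons."""
    counts = Counter(
        count_differences(a, b) for a, b in combinations(sequences, 2)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


def expansion_age_in_generations(tau, u):
    """Invert tau = 2Tu, assuming u is per sequence and per generation."""
    return tau / (2 * u)


# seq1 and seq2 come from the slide; the third sequence is made up.
population = ["ATAATC", "ACATTC", "ATAATC"]
print(mismatch_distribution(population))

# Hypothetical values, only to show the direction of the effect:
# halving the assumed mutation rate doubles the estimated age.
print(expansion_age_in_generations(tau=6.0, u=0.003))
print(expansion_age_in_generations(tau=6.0, u=0.0015))
```

The last two lines illustrate why a date obtained this way is only as reliable as the mutation rate that goes into it.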
Age of expansion (τ) as a function of longitude
[Figure: mtDNA, N=133, r=0.7, p=0; Chr Y, N=77, r=0.3, p=0.01; x-axis: longitude (0 to 100); y-axis: tau ss migr]
Age of expansion (T), mtDNA / Chr Y (ky)
[Map values, mtDNA / Chr Y: 25 / 3; 17 / 5.6; 27 / 7.4; 25 / 6; 24 / 7; 30 / 7.2; 31 / 6.6]
mtDNA dating depends on the mutation rate:
30,000 yrs BP in China to 17,000 yrs BP in Europe (27,000 in CA)
62,000 yrs BP in China to 35,000 yrs BP in Europe (54,000 in CA)
Center of expansion ?
« intermatch distribution »
(Harpending et al 2000 – Excoffier et al 2004)
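All 2 by 2 comparisons between sequences of Pop 1 and sequences of Pop 2
[Figure: intermatch distribution for Pop 1 and Pop 2; x-axis: number of differences between sequences (0 to 8); y-axis: frequency (Fq)]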
Same center of expansion
for the 2 populations
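The intermatch distribution differs from the mismatch distribution only in which pairs are compared: each sequence of one population is compared with each sequence of the other. The sketch below is a minimal illustration under that definition; the function names and the sample sequences are hypothetical and it is not the implementation used by the cited authors.

```python
from collections import Counter
from itertools import product


def count_differences(seq_a, seq_b):
    """Number of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def intermatch_distribution(pop1, pop2):
    """Frequency (Fq) of each number of differences over all between-population pairs."""
    counts = Counter(count_differences(a, b) for a, b in product(pop1, pop2))
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


# Made-up sequences for two populations.
pop1 = ["ATAATC", "ATAATC", "ATATTC"]
pop2 = ["ACATTC", "ACATTC"]
print(intermatch_distribution(pop1, pop2))
```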
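Cultural expansion
[Figure: frequency (Fq) distributions for Europe, Central Asia and the Far East; x-axis: 0 to 8]
Demic expansion
[Figure: frequency (Fq) distributions for Europe, Central Asia and the Far East; x-axis: 0 to 8]
Cultural expansion with high migration rate =
“Demic expansion”
[Figure: frequency (Fq) distributions for Europe, Central Asia and the Far East; x-axis: 0 to 8]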
Past expansion in Eurasia
• Mitochondrial DNA: East to West in the Paleolithic
(from China to Central Asia and then to Europe;
from the Middle East to Europe) → no cultural
expansion
• Y chromosome: expansion during the Neolithic.
Two centers of expansion (China and ME,
Pakistan CA), perhaps 3 (Europe).
For Central Asia: same timing of expansion as the
Middle East, a little bit earlier but the difference is not statistically
significant.
– Differences explained by lower Ne for males
Central Asia diversity
• 463 individuals typed for two uniparental
markers :
– Y chromosome: microsatellites + SNPs
– Mitochondrial DNA: HVS-I + RFLP
• 400 to be typed
Y Chromosome diversity
code   N    diversity   description
kk1    54   0.97        Karakalpaks (Qongirat)
kz1    49   0.84        Kazakhs
otu1   54   0.89        Karakalpaks (On Tort Uruw)
tkm1   51   0.84        Turkmens
uz1    40   0.97        Uzbeks
kir2   37   0.91        Kirghizes of central Kirghizia (mé
kz2    14   0.86        Kazakhs: Almaty, Katon-Karagay,
tkm2   21   0.94        Turkmens of Ashgabat
td2    22   0.87        Tajiks of Penjikent
uz2    28   1           Uzbeks of Kashkadarya
ui2    33   0.98        Uighurs of Alma-Aty, Lavar
KRA    46   0.82        Kirghizes, Andijan
KRG    20   0.78        Kirghizes, North, Jankatalab
KRM    22   0.7         Kirghizes, North, Doboloo
TJK    30   0.98        Tajiks, Ferghana, Kamangaron
TJR    29   0.98        Tajiks, Ferghana, Richtan
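The slide does not say how the "diversity" column was computed. Assuming it is the usual unbiased haplotype (gene) diversity, H = n/(n-1) × (1 - Σ p_i²), the calculation can be sketched as follows; the function name and the sample haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).

    Assumption: this is the statistic behind the "diversity" column;
    the slide itself does not state the formula.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are needed")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Made-up Y-chromosome haplotypes (tuples of microsatellite repeat counts).
sample = [(14, 12, 10), (14, 12, 10), (15, 12, 10), (14, 11, 10)]
print(haplotype_diversity(sample))
```

Under this definition the value reaches 1 when every sampled individual carries a different haplotype and falls towards 0 when most individuals share the same one.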
Genetic distances among populations: Y chromosome
[Figure: plot of populations KZ, KK, UZ, KRG, KRM, KRA, OTU, TK, TJK, TJR]
r gen-geo = 0.85
• Comparison of genetic and ethnological data
From oral tradition: common tribe’s ancestor, common clan’s ancestor, common lineage’s ancestor
If a recent common ancestor: strong genetic kinship
If no recent common ancestor: low genetic kinship
Patrilineal filiation → Y chromosome study
1 – Ethnological questionnaire
250 men: genealogical information
2 – Patrilineal genetic kinship
12 microsatellites of the Y chromosome
[Figure: kinship coefficient (y-axis, from -0.2 to 0.9) by ethnological class (1: same lineage; 2: same clan; 3: same tribe; 4: different tribe) for KZ, TK, UZ, QN and OTU]
Mean genetic kinship coefficient for each ethnological class of the five
populations examined in this study.
KZ Kazakhs; TK Turkmen; UZ Uzbeks; QN Qongrat; OTU On Tort Uruw.
Dating
Ancestor of the clan or of the lineage
T = ASD / μ
T: age (in generations)
ASD = mean squared number of differences between the ancestor and the individuals
μ = mutation rate: 2.1 × 10⁻³ per generation
Tribe: mythical ancestor
Common clan’s ancestor: approx. 1500 years (50 generations)
Common lineage’s ancestor: approx. 450 years (15 generations)
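The dating formula T = ASD / μ can be illustrated with a short sketch. The haplotypes and the ancestral haplotype below are made up, the function names are hypothetical, and averaging the squared differences over loci is an assumption about how ASD is aggregated. The 30 years per generation is the value implied by the slide's own figures (450 years for 15 generations).

```python
MUTATION_RATE = 2.1e-3      # per generation, from the slide
YEARS_PER_GENERATION = 30   # implied by 450 years / 15 generations


def average_squared_distance(haplotypes, ancestor):
    """Mean squared difference in repeat number between the ancestral
    haplotype and each individual, averaged over microsatellite loci."""
    per_locus = []
    for locus, ancestral_repeats in enumerate(ancestor):
        squared = [(h[locus] - ancestral_repeats) ** 2 for h in haplotypes]
        per_locus.append(sum(squared) / len(squared))
    return sum(per_locus) / len(per_locus)


def age_in_generations(asd, mutation_rate=MUTATION_RATE):
    """T = ASD / mu."""
    return asd / mutation_rate


# Made-up example: three loci, four men, ancestor taken as (14, 12, 10).
ancestor = (14, 12, 10)
men = [(14, 12, 10), (15, 12, 10), (14, 12, 10), (14, 11, 10)]

asd = average_squared_distance(men, ancestor)
generations = age_in_generations(asd)
print(asd, generations, generations * YEARS_PER_GENERATION)
```

In practice the ancestral haplotype is not observed; a common stand-in is the most frequent haplotype of the group, which is a further assumption.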
Conclusion
Genetic data can help decipher social
organisation
Lineages and clans : people share a recent
common ancestor
Tribes : a conglomerate of clans who
subsequently invented a mythical ancestor to
strengthen group unity
Y chromosome
• Low diversity of some populations is
explained by social organisation
• Distances among populations related to
geographical distances
Mitochondrial DNA
MDS based on mtDNA – 12 populations – (Kimura 2P – α = 0.26)
[Figure: MDS plot of populations KZ, KK, UZ, OTU, TK, KRG, KRM, KRA, Tja, Tju, TJK, TJR]
r gen-geo = 0
Central Asian Populations
code   N    name of population              reference
kk1    55   Karakalpaks (Qongirat)          Present Study
kz1    50   Kazakhs                         Present Study
otu1   53   Karakalpaks (On Tort Uruw)      Present Study
tk1    51   Turkmens                        Present Study
uz1    40   Uzbeks                          Present Study
kz3    55   Kazakhs                         Comas 1998
kit3   48   Kirghizes of Talas              Comas 1998
kir3   47   Kirghizes of Sary-Tash          Comas 1998
kr4    20   Karakalpaks                     Comas 2004
kz4    20   Kazakhs                         Comas 2004
kuz4   20   Uzbeks of Khorezm               Comas 2004
kg4    20   Kirghizes                       Comas 2004
tu4    20   Turkmens                        Comas 2004
uz4    20   Uzbeks                          Comas 2004
td4    20   Tajiks                          Comas 2004
uz2    42   Uzbeks                          Quintana 2004
tk2    41   Turkmens                        Quintana 2004
krt2   32   Kurds of Turkmenistan           Quintana 2004
KRA    48   Kirghizes, Andijan              Present Study
KRG    20   Kirghizes, North, Jankatalab    Present Study
KRM    26   Kirghizes, North, Doboloo       Present Study
TJK    30   Tajiks, Ferghana, Kamangaron    Present Study
TJR    29   Tajiks, Ferghana, Richtan       Present Study
Tja    33   Tajiks, Samarkand, Agalic       Present Study
Tju    29   Tajiks, Samarkand, Urgut        Present Study
ui3    55   Uighurs                         Comas 1998
ui4    16   Uighurs                         Comas 2004
shu    44   Shugnan (Pamir - Tajikistan)    Quintana 2004
location: Karakalpakstan, Karakalpakstan, Karakalpakstan, Karakalpakstan, Karakalpakstan, Alma Ati, Talas, Sary-Tash, Nukus, Gasli, Urgench, Osh, Urgench, Samarkand, Samarkand, Samarkand, Andijan, Nukus, Nukus, Ferghana Valley, Ferghana Valley, Samarkand area, Samarkand area, Alma Ati, Tashkent
MDS based on mtDNA – 28 populations – (Kimura 2P – α = 0.26)
[Figure: MDS plot; labelled groups: Tajiks, Uighurs, Kazakhs, Karakalpaks (Qongirat), Karakalpaks (OTU), Uzbeks, Uzbeks (Khorezm), Turkmens, Kirghizes, Kurds of Turkmenistan, Shugnan (Pamir, Tajikistan)]
No gen-geo correlation
Mitochondrial DNA
Karakalpaks Uzbeks
N=4
N=3
Karakalpaks
Uzbeks
Kazakhs
Turkmen
Kirghizes
Tajiks
-0,00013
0,00951 0,00404
-0,00205 0,00478
0,00889 0,01291
0,00626 0,0203
0,02246 0,01517
Kazakhs
N=3
Turkmen Kirghizes
N=6
N=3
-0,00182
0,00835 -0,00079
0,0218
0,01182
0,02256
0,02408
0,0084
0,03533 0,02497
Mean distance (Fst) between populations
The diagonal shows intra-group distances
Exogamous populations
Endogamous populations
Tajiks
N=5
Mitochondrial DNA
• Distances among populations not related
to linguistic or geographical distances
• Exchanges among populations differ
between Turko-Mongol (exogamous)
populations and Indo-Iranian
(endogamous) populations
Conclusion
• Past history: clear movement from east to
west in the Paleolithic; strong population
growth in the Neolithic.
• Exchanges between populations clearly
differ between males and females
• Linguistic distances ?
Computational linguistics
• Background
Design of the sampling
Swadesh list
2 or 3 speakers for each sampling location (interspeaker variation)
Analyses
We are not interested in historical linguistics.
Central Asia about 1000 CE : language groups
Turkic: Oguz, Kipchak, Karluk (?)
Indo-Iranian: Ossetic, Khorasmian, Sogdian, Pamirian, Persian-Tajik, Dardic
We want to statistically compare genetic and
linguistic data
More linguistic differences among Iranian
populations than among Turkic
populations?
… we selected distance-based approaches
We have two major linguistic groups, Indo-Iranian and
Turkic.
We will focus on them separately, since each constitutes
a DIALECT CHAIN.
Borrowing, if it exists, is less of a problem since it reflects
CONTACT (migrations), a kind of information that is
embedded in genetic data. More than historical linguistics,
we look for a POPULATION LINGUISTICS.
Dialectometrical Computation of distances
(Kondrak 2004, Heeringa 2004)
Phonetic alignment:
•An alignment algorithm (string mapping)
•A metric for measuring distances between phonetic segments
Distance Matrices:
•Correlate linguistic and geographic distances
•Correlate linguistic and genetic distances (mt DNA)
From Ph Mennecier
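Correlating two distance matrices is often done with a Mantel-type permutation test, because the entries of a distance matrix are not independent. The slides do not name the test that was used, so the sketch below is only one common choice; the function names and the two small matrices are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix, order=None):
    """Off-diagonal entries above the diagonal, optionally after relabelling rows/columns."""
    n = len(matrix)
    idx = list(range(n)) if order is None else order
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Correlation between two symmetric distance matrices and a one-sided
    permutation p-value obtained by shuffling the labels of the second matrix."""
    rng = random.Random(seed)
    x = upper_triangle(dist_a)
    observed = pearson(x, upper_triangle(dist_b))
    n = len(dist_b)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if pearson(x, upper_triangle(dist_b, order)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Made-up distances between four sampling locations.
geographic = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
linguistic = [[0, 0.1, 0.3, 0.4], [0.1, 0, 0.2, 0.3], [0.3, 0.2, 0, 0.1], [0.4, 0.3, 0.1, 0]]
r, p = mantel_test(geographic, linguistic)
print(r, p)
```

The same function applies unchanged to a linguistic matrix and a genetic matrix, provided both are indexed by the same populations in the same order.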
What remains to be done in genetic
analysis
• Phylogeography of Y and mtDNA: geographic
patterns of genetic variation may reveal
migrations synchronous with linguistic phenomena
(replacement, borrowing, ...)
• Autosomal markers
• Samples from Tajikistan
Thanks to all the people who participated in this
study
In France :
Dr. François Jacquesson, linguist, CNRS
Pr. Evelyne Heyer, geneticist, MNHN, CNRS
Dr. Lluis Quintana, geneticist, CNRS, Inst. Pasteur
Dr. Philippe Mennecier, linguist, MNHN
Dr. Frederic Austerlitz, geneticist, CNRS
Dr. Svetlana Jacquesson, anthropologist, IFEAC
Dr. Franz Manni, geneticist, MNHN
Dr. R Chaix (former PhD student, in Oxford)
Dr. P Balaresque (former PhD student, in Leicester)
In Central Asia :
Dr. Tatiana Hegai, geneticist, Tashkent
Pr. Ruslan Ruzibakiev, geneticist, Tashkent
Dr. Aldashev, geneticist , Bishkek
Pr. Vadim Yagodin, archaeologist, Nukus
Dr. Bakyt Amanbaeva, archaeologist, Bishkek
Pr. Firuza Nasyrova, geneticist, Dushanbe
Alignment
[Figure: alignment matrix of INDUSTRY against INTEREST; cumulative cost along the alignment path: 0, 0, 0, 1, 2, 3, 4, 5, 6, 6, 6, 7, 8]
The indels are weighted as 1 instead of 2 in
a newer version of the algorithm
string        operation    cost
industry      Subst i/i    0
industry      Subst n/n    0
intdustry     Insert t     1
intedustry    Insert e     1
interdustry   Insert r     1
interustry    Delete d     1
interstry     Delete u     1
interestry    Insert e     1
interestry    Subst s/s    0
interestry    Subst t/t    0
interesty     Delete r     1
interest      Delete y     1
Total cost                 8
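The total cost in this example can be reproduced with a standard dynamic-programming edit distance. The sketch below is not the algorithm of the cited authors, which works on phonetic segments with graded distances. It assumes that an insertion or deletion costs 1, as stated on the slide, and that replacing one character by a different one costs 2 (the same as one deletion plus one insertion), which is consistent with the slide's alignment using only indels and identical matches. The function and parameter names are hypothetical.

```python
def edit_distance(source, target, indel=1, subst=2):
    """Weighted edit distance between two strings.

    indel: cost of one insertion or deletion.
    subst: cost of replacing a character by a different one
           (identical characters are matched at cost 0).
    """
    # previous[j] = cost of turning the processed prefix of source into target[:j]
    previous = [j * indel for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        current = [i * indel]
        for j, t in enumerate(target, start=1):
            match_or_subst = previous[j - 1] + (0 if s == t else subst)
            delete = previous[j] + indel
            insert = current[j - 1] + indel
            current.append(min(match_or_subst, delete, insert))
        previous = current
    return previous[-1]


print(edit_distance("industry", "interest"))
```

Lowering `subst` to 1 gives the classical Levenshtein distance, which yields a smaller value for this word pair because mismatched letters can then be substituted directly.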
Genetic distances among populations: Y chromosome
code   description
kk1    Karakalpaks (Qongirat)
kz1    Kazakhs
otu1   Karakalpaks (On Tort Uruw)
tkm1   Turkmens
uz1    Uzbeks
kir2   Kirghizes of central Kirghizia
kz2    Kazakhs: Almaty, Katon-Kara
tkm2   Turkmens of Ashgabat
td2    Tajiks of Penjikent
uz2    Uzbeks of Kashkadarya
ui2    Uighurs of Alma-Aty, Lavar
KRA    Kirghizes, Andijan
KRG    Kirghizes, North, Jankatalab
KRM    Kirghizes, North, Doboloo
TJK    Tajiks, Ferghana, Kamangaron
TJR    Tajiks, Ferghana, Richtan