
Linguistic and genetic relationships in Northern Cameroon
Brett C. Haberstick†, Gary P. Stetler†, Andrew Smolen†, John K. Hewitt†, Zygmunt Frajzyngier‡, Eric Johnston‡ and Erin Shay‡
†Institute for Behavioral Genetics and ‡Department of Linguistics
University of Colorado, Boulder, Colorado, USA
ABSTRACT
Cameroon (Central Africa) has about 270 languages, which belong to three linguistic families. In Northern Cameroon alone, more than 100 languages are spoken, and the differences among them are substantial. The object of this feasibility study was to collect DNA in the field and determine the degree of genetic diversity among individuals from six language groups in Northern Cameroon: HDi (N = 30), Mafa (N = 31), Mina (N = 29), and Gidar (N = 30) (all Central Chadic); Peve (N = 31) (Masa, Chadic); and Mambay (N = 31) (Niger-Congo). DNA samples were collected on buccal swabs and sent to the Institute for Behavioral Genetics for analysis. Here we report the preliminary analysis of data from genotyping 20 unlinked short tandem repeat (STR, microsatellite) markers across the genome with average heterozygosities greater than 0.70. Data were analyzed using the program STRUCTURE (Pritchard et al, 2000; Falush et al, 2003). Results from STRUCTURE indicated that the genotype of an individual can be considered to be drawn from their respective subgroup (alpha = 0.0461). Furthermore, there is genetic differentiation across language groups (FST range: 0.0001-0.1231). Our ability to predict language subgroup membership based on genetic information was modest, but our results indicate that the six disparate language groups exhibit genetic divergence and vary in genetic diversity. Whether the diversity within language groups resulted from six separate founding populations or from dispersion and isolation of the language groups from ancestral sources remains an open question. Efforts are in progress to genotype 12 Y-chromosome STR loci, and mitochondrial DNA typing is planned to address the question of the ancestral origin of the genetic diversity among the six language groups. This study demonstrates the feasibility of DNA collection in these populations, and our analyses suggest the potential value of more extensive genotyping of these samples and more extensive studies of the relationship between genetic and linguistic diversity in this region.
RESULTS
Hardy-Weinberg Equilibrium
Table 1 details the observed and expected heterozygosities for each of our 20 STR
markers characterized across our six populations. As shown, for most of the STR markers,
observed heterozygosity was lower than expected. Within each population, between 4
and 10 markers deviated significantly from HWE expectations. The highest
number of deviations was obtained in the HDi language group (n = 10). The increase
in homozygosity is consistent with the notion of divergence due to genetic drift or non-random mating.
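The comparison of observed and expected heterozygosity at a single locus can be sketched as follows. This is a minimal illustration, not the Arlequin computation behind Table 1: the function names and the toy genotypes (allele sizes in base pairs) are hypothetical, and the expected value assumes the usual small-sample correction, 2n/(2n - 1) × (1 - Σ p_i²).

```python
from collections import Counter

def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

def expected_heterozygosity(genotypes):
    """Expected heterozygosity under HWE with a small-sample correction."""
    alleles = [allele for pair in genotypes for allele in pair]
    n_genes = len(alleles)  # two gene copies per individual
    freqs = [count / n_genes for count in Counter(alleles).values()]
    return n_genes / (n_genes - 1) * (1.0 - sum(p * p for p in freqs))

# Hypothetical genotypes for six individuals at one STR locus.
toy_locus = [(150, 150), (150, 154), (154, 154),
             (158, 158), (150, 150), (154, 158)]

h_obs = observed_heterozygosity(toy_locus)
h_exp = expected_heterozygosity(toy_locus)
print(f"observed = {h_obs:.3f}, expected = {h_exp:.3f}")
```

In this toy sample the observed value falls well below the expected one, which is the pattern of excess homozygosity described above.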
Figure 2.
METHODS
DNA Samples
Genomic DNA was isolated from buccal cells collected with three cotton-tipped
swabs as described previously (Anchordoquy and Smolen, 2003). The yield
of DNA was quantified using PicoGreen® fluorescence, and an aliquot was
diluted to a concentration of 10 ng/µL or less for a working sample. The
average yield of DNA was 11 ± 0.4 µg.
Genotyping of Short Tandem Repeat Loci
A panel of 20 highly polymorphic, unlinked STR markers organized into
two multiplex reactions was characterized in this sample.
PCR reactions contained 1 µL of genomic DNA (10 ng or less), 1x GoldStar
buffer (Promega) and 1.5 units of AmpliTaq Gold® polymerase (ABI) in a
total volume of 20 µL. A 95 °C incubation for 10 min was followed by 10
cycles of 94 °C for 30 sec, 60 °C for 30 sec, and 70 °C for 45 sec; 28 cycles
of 90 °C for 30 sec, 60 °C for 30 sec, and 70 °C for 45 sec; and a final 30
min incubation at 60 °C. After amplification, an aliquot of PCR product
was combined with formamide and size standard (Genescan 500 Rox®)
and analyzed by capillary electrophoresis with an ABI Prism 3100 DNA
sequencer using protocols supplied by the company. Allele sizes were
scored by two investigators independently.
Subjects
The study population consisted of 182 individuals who were interviewed in
Cameroon during the summer of 2004. Subjects were males and females
from six language-specific villages. Eligible subjects were between 18 and
65 years of age. DNA samples were collected from only one person per
family.
INTRODUCTION
The genetic structuring of human populations is thought to be reflected in
patterns of linguistic, social and historical relationships. Relationships
between gene pools and linguistic diversity have been suggested
(Cavalli-Sforza, 1997; Diamond and Bellwood, 2003) and shown to be influenced by
patterns of human dispersion and settlement. The genetic structuring of
populations reflects the exchange of alleles between populations. The
patterns of gene flow have important consequences for the genetic makeup
of individuals and, more broadly, for the adaptability of populations to local
conditions as well as the evolution of new traits (Balloux and Lugon-Moulin, 2002). In
situations where populations are genetically isolated, by either human or
geographical boundaries, decreases in heterozygosity may result, which
can influence the evolutionary fitness of the population.
In the current report we examined the relationship between the
population-genetic structure of six African populations from Northern
Cameroon and their respective linguistic traditions. Cameroon is
unusual in that about 270 languages, belonging to three linguistic families,
are spoken there, more than 100 of them in Northern Cameroon alone. Four of
the six populations belonged to the Central branch of
the Chadic family (HDi, Mafa, Mina, and Gidar), one population belonged
to the Masa branch of the Chadic family (Peve), and a final language group,
Mambay, belonged to the Niger-Congo family. Geographically, Mafa and
HDi are immediate neighbors and intermarry during times of economic
hardship among the Mafa. Mina and Gidar are separated from each other by
as little as 15 km. Mina speakers have historically interacted with the surrounding
populations. Peve has no contact with any of these populations except
Mambay.
Figure 1. A Linguistic Map of Northern Cameroon
Statistical Methods
Population genetic indexes, including Hardy-Weinberg Equilibrium (HWE),
analysis of molecular variance (AMOVA) (Excoffier et al, 1992), and
pairwise coancestry coefficients (Reynolds et al, 1983), were determined
using the software package Arlequin (Version 3.01; Excoffier et al, 2005).
Deviations from HWE were tested using Fisher’s exact probability test (Guo
and Thompson, 1992). Nonparametric permutation tests (Excoffier et al, 1992)
examined the significance of variance components due to differences
between individuals within language groups, between language groups
within populations, and between populations. The significance of the genetic
distance, FST, between any pair of populations was examined using a
resampling permutation procedure (Excoffier et al, 1992).
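The resampling idea behind the pairwise FST test can be sketched as follows. This is a minimal single-locus illustration under simplifying assumptions, not Arlequin's procedure: it uses the simple estimator FST = (H_T - H_S) / H_T rather than the AMOVA-based estimate, and the function names and toy genotypes are hypothetical.

```python
import random
from collections import Counter

def gene_diversity(genotypes):
    """1 - sum of squared allele frequencies at one locus."""
    alleles = [allele for pair in genotypes for allele in pair]
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())

def fst(pop_a, pop_b):
    """Simple F_ST = (H_T - H_S) / H_T for two populations (illustrative)."""
    n_a, n_b = len(pop_a), len(pop_b)
    h_s = (n_a * gene_diversity(pop_a) + n_b * gene_diversity(pop_b)) / (n_a + n_b)
    h_t = gene_diversity(pop_a + pop_b)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

def permutation_p_value(pop_a, pop_b, n_perm=1000, seed=0):
    """Shuffle individuals between populations and count how often the
    permuted F_ST is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = fst(pop_a, pop_b)
    pooled = pop_a + pop_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical, fully differentiated toy populations (genotypes as allele pairs).
pop_a = [(1, 1)] * 10
pop_b = [(2, 2)] * 10
observed_fst, p_value = permutation_p_value(pop_a, pop_b)
print(observed_fst, p_value)
```

Permuting whole individuals rather than single alleles keeps each genotype intact, so the test is not distorted by within-individual departures from HWE.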
We adopted a second approach to understanding the population structure
among our 20 STR markers using the program STRUCTURE (Pritchard et
al, 2000). STRUCTURE infers population structure from multilocus genotypes and
allocates individuals to K populations based on allele frequencies
at each locus. We ran 20 independent replicates for each value of K
from 1 to 7. Each run consisted of an initial burn-in period of 100,000 steps
followed by 100,000 Markov chain Monte Carlo iterations, without any prior
population information. We used the admixture model with correlated allele
frequencies.
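The run design described above (20 replicates for each K from 1 to 7, equal burn-in and sampling lengths, admixture with correlated allele frequencies, no prior population information) can be laid out as a job list. This is only a sketch of the bookkeeping: the keys are modeled on STRUCTURE's parameter-file names and should be checked against the program documentation, and the seeds and output names are hypothetical.

```python
from itertools import product

K_VALUES = range(1, 8)   # K = 1 .. 7
N_REPLICATES = 20        # independent replicates per K

def build_runs():
    """Return one settings dictionary per (K, replicate) combination."""
    runs = []
    for k, rep in product(K_VALUES, range(1, N_REPLICATES + 1)):
        runs.append({
            "MAXPOPS": k,          # number of assumed clusters, K
            "BURNIN": 100_000,     # burn-in steps
            "NUMREPS": 100_000,    # MCMC iterations after burn-in
            "NOADMIX": 0,          # 0 = admixture model (assumed key)
            "FREQSCORR": 1,        # 1 = correlated allele frequencies (assumed key)
            "USEPOPINFO": 0,       # 0 = no prior population information (assumed key)
            "seed": 1000 * k + rep,                      # hypothetical seeding scheme
            "outfile": f"structure_K{k}_rep{rep:02d}",   # hypothetical naming
        })
    return runs

runs = build_runs()
print(len(runs))  # 7 values of K x 20 replicates = 140 runs
```

Giving every run its own seed keeps the replicates independent and makes any single run reproducible.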
Analysis of Molecular Variance (AMOVA)
As shown in Table 2, most of the variance in the sample is attributable to within-population
variation (97.7%) as opposed to between-population variation (2.3%). Combining all
six populations resulted in a global FST value of 0.0233 (P < 0.0001), while FST values
for individual populations ranged between 0.023 and 0.024. An AMOVA that included
language indicated there was little variance attributable to between-population variation
(-0.21%, P = 0.587). Estimates of both between-language group (2.46%, P < 0.0001) and
within-language group variation (97.75%, P < 0.0001) were significant. There was
increased homozygosity, both overall (FIT = 0.067) and within each subpopulation (FIS
= 0.088), which suggests that there is population substructure and admixture within
and between these six language groups. Pairwise coancestry coefficients, FST, were low
and significant for four of the six subpopulations. As shown in Table 3, this suggested
small but significant genetic differentiation between the GIDAR, MINA, PEVE and
MAMBAY language groups. Though non-significant, a value of zero for HDi and
MAFA is consistent with the Wahlund effect of reduced heterozygosity and suggests
that there are few restrictions upon mating between these two subpopulations (Balloux
and Lugon-Moulin, 2002).
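The link between the variance percentages and the global FST can be made explicit: in a two-level AMOVA, FST is the among-population variance component divided by the total. The sketch below assumes that simple two-level layout; the function name is hypothetical, and the two variance components are invented values chosen only to reproduce the reported 2.3% / 97.7% split, not figures from Table 2.

```python
def amova_summary(sigma_among, sigma_within):
    """Percent of variance at each level and F_ST for a two-level AMOVA."""
    total = sigma_among + sigma_within
    return {
        "pct_among": 100.0 * sigma_among / total,
        "pct_within": 100.0 * sigma_within / total,
        "fst": sigma_among / total,
    }

# Hypothetical variance components giving a 2.3% / 97.7% partition.
summary = amova_summary(sigma_among=0.23, sigma_within=9.77)
print(summary)
```

Under this layout an among-population share of about 2.3% corresponds to an FST of about 0.023, in line with the global value reported above.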
Population Structure
Because genetic differentiation between groups was
low, we investigated population structure using the
clustering algorithm implemented in the program
STRUCTURE. Results, shown in Figure 2 A-D,
indicated that there was a greater likelihood that
individuals descended from six (K = 6; Figure 2D)
ancestral populations (log likelihood = -12,387.4)
compared with five (-12,444.8), four (-12,564.1),
three (-12,604.9), two (-12,708.2) or one single
population (-13,140.7). Consistent with AMOVA
results, the correlation between genes within
subpopulations was low, suggesting that there is slight
genetic differentiation between groups (FST range:
0.0001 to 0.231). FST values were the highest for the
MINA, PEVE, and MAMBAY subpopulations. An
alpha value of 0.0461 from our K = 6 model suggests
that the genotype of an individual is almost
completely drawn from their respective population.
This is consistent with the inbreeding indexes, FIT and
FIS, from the AMOVA results and implicates genetic drift
or non-random mating as putative agents in the
genetic divergence of these groups.
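The model comparison above amounts to choosing the K with the highest log likelihood among the values reported. A minimal sketch using those reported values follows; it assumes the figure printed for K = 5 is -12,444.8, and the variable names are hypothetical.

```python
# Log likelihoods reported in the text for K = 1 .. 6.
log_likelihoods = {
    1: -13140.7,
    2: -12708.2,
    3: -12604.9,
    4: -12564.1,
    5: -12444.8,  # assumed reading of the value printed for K = 5
    6: -12387.4,
}

# K with the highest (least negative) log likelihood.
best_k = max(log_likelihoods, key=log_likelihoods.get)

# Improvement in log likelihood gained by each additional cluster.
gains = {
    k: round(log_likelihoods[k] - log_likelihoods[k - 1], 1)
    for k in range(2, 7)
}

print(best_k)
print(gains)
```

The gains are largest for the step from one to two clusters and smaller thereafter, so the preference for K = 6 rests on modest improvements and is best read alongside the membership plots.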
Figure 2. Inferred population structure based on 180 individuals and 20 STR markers. Each individual is represented by a thin line divided into K colored bars that represent that individual’s estimated membership coefficients, Q, in K clusters.
Note: A: K = 3; B: K = 4; C: K = 5; D: K = 6.
1 = GIDAR; 2 = MINA; 3 = PEVE; 4 = MAMBAY; 5 = HDi; 6 = MAFA
SUMMARY & CONCLUSIONS
We examined the relationship between the population-genetic structure of six African populations from
Northern Cameroon and their respective linguistic traditions. To do so, we adopted two approaches that
extracted indices of the extent and pattern of allelic diversity, inbreeding, and genetic structure. Evidence
from both approaches converged to suggest that, despite six different languages among an equal number of
populations, there is little population-genetic diversity. The increases in homozygosity and the significant FIT
and FIS values obtained in these data suggest that these six populations are diverging, possibly through random
genetic drift or non-random mating. Overall, the genetic relationships among the six Northern
Cameroon populations are more readily predictable from historical information and contemporary social
structure and cultural interaction than from the genetic relationships among the languages involved.
LITERATURE CITED
1. Balloux F and Lugon-Moulin N (2002). The estimation of population differentiation with microsatellite
markers. Molecular Ecology 11: 155-165.
2. Cavalli-Sforza LL (1997). Genes, peoples and languages. PNAS 94: 7719-7724.
3. Diamond J and Bellwood P (2003). Farmers and their languages: The first expansions. Science 300: 597-603.
4. Excoffier L, Smouse P, Quattro JM (1992). Analysis of molecular variance inferred from metric distances
among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131: 479-491.
5. Excoffier L, Laval G, Schneider S (2005). Arlequin version 3.0: An integrated software package for
population genetics data analysis. Evolutionary Bioinformatics Online 1: 47-50.
6. Guo S, Thompson E (1992). Performing the exact test of Hardy-Weinberg proportion for multiple alleles.
Biometrics 48: 361-372.
7. Falush D, Stephens M, Pritchard JK (2003). Inference of population structure using multilocus genotype
data: Linked loci and correlated allele frequencies. Genetics 164: 1567-1587.
8. Pritchard JK, Stephens M, Donnelly P (2000). Inference of population structure using multilocus
genotype data. Genetics 155: 945-959.