Linguistic and genetic relationships in Northern Cameroon

Brett C. Haberstick†, Gary P. Stetler†, Andrew Smolen†, John K. Hewitt†, Zygmunt Frajzyngier‡, Eric Johnston‡ and Erin Shay‡

†Institute for Behavioral Genetics and ‡Department of Linguistics, University of Colorado, Boulder, Colorado, USA

ABSTRACT

Cameroon (Central Africa) has about 270 languages, which belong to three linguistic families. In Northern Cameroon alone, over 100 languages are spoken, and the differences among them are substantial. The objective of this feasibility study was to collect DNA in the field and determine the degree of genetic diversity among individuals from six language groups in Northern Cameroon: Hdi (N = 30), Mafa (31), Mina (29), and Gidar (30) (all Central Chadic); Peve (31) (Masa, Chadic); and Mambay (31) (Niger-Congo). DNA samples were collected on buccal swabs and sent to the Institute for Behavioral Genetics for analysis. Here we report a preliminary analysis of data from genotyping 20 unlinked short tandem repeat (STR, microsatellite) markers across the genome with average heterozygosities greater than 0.70. Data were analyzed using the program STRUCTURE (Pritchard et al, 2000; Falush et al, 2003). Results from STRUCTURE indicated that the genotype of an individual can be considered to be drawn from their respective subgroup (alpha = 0.0461). Furthermore, there is genetic differentiation across language groups (FST range: 0.0001-0.1231). Our ability to predict language subgroup membership from genetic information was modest. However, our results indicate that the six disparate language groups exhibit genetic divergence and vary in genetic diversity. It remains an open question whether the diversity within language groups resulted from six separate founding populations or from the dispersal and isolation of the language groups from ancestral sources.
Efforts are in progress to genotype 12 Y-chromosome STR loci. Mitochondrial DNA typing is also planned, to address the question of the ancestral origin of the genetic diversity among the six language groups. This study demonstrates the feasibility of DNA collection in these populations. Our analyses suggest the potential value of more extensive genotyping of these samples, and of broader studies of the relationship between genetic and linguistic diversity in this region.

RESULTS

Nature and Extent of Allelic Diversity

Hardy-Weinberg Equilibrium

Table 1 details the observed and expected heterozygosities for each of our 20 STR markers across our six populations. As shown, for most of the STR markers observed heterozygosity was lower than expected. Within each population, between 4 and 10 markers deviated significantly from HWE expectations. The largest number of deviations was found in the HDi language group (n = 10). This increase in homozygosity is consistent with divergence due to genetic drift or non-random mating.

[Figure 2]

METHODS

DNA Samples

Genomic DNA was isolated from buccal cells collected with three cotton-tipped swabs, as described previously (Anchordoquy and Smolen, 2003). DNA yield was quantified using PicoGreen® fluorescence. An aliquot was then diluted to a concentration of 10 ng/µL or less as a working sample. The average yield of DNA was 11 ± 0.4 µg.

Genotyping of Short Tandem Repeat Loci

A panel of 20 highly polymorphic, unlinked STR markers, organized into two multiplex reactions, was genotyped in this sample. Each 20 µL PCR reaction contained:

- 1 µL of genomic DNA (10 ng or less);
- 1x GoldStar buffer (Promega);
- 1.5 units of AmpliTaq Gold® polymerase (ABI).

The cycling program was as follows:

1. An initial incubation at 95 °C for 10 min.
2. 10 cycles of 94 °C for 30 s, 60 °C for 30 s, and 70 °C for 45 s.
3. 28 cycles of 90 °C for 30 s, 60 °C for 30 s, and 70 °C for 45 s.
4. A final 30 min incubation at 60 °C.
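Table 1 compares observed and expected heterozygosity, and the heterozygote deficit it reveals can be illustrated for a single locus. The sketch below is a minimal Python illustration under stated assumptions. It is not the Arlequin computation used in the study. The function name `heterozygosity`, the encoding of genotypes as pairs of allele sizes, and the use of Nei's small-sample correction for expected heterozygosity are all assumptions. The inbreeding coefficient F = 1 - Ho/He summarizes the deficit: positive values mean fewer heterozygotes than Hardy-Weinberg proportions predict. A formal test, such as the exact test named under Statistical Methods, is still needed to judge significance.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He, F) for one locus (hypothetical helper).

    genotypes: list of (allele1, allele2) tuples, e.g. STR allele sizes in bp.
    He uses Nei's small-sample correction 2n / (2n - 1) (an assumption here).
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies over all 2n gene copies at this locus.
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    he = (2 * n / (2 * n - 1)) * (1 - sum(p * p for p in freqs))
    # F > 0 means a heterozygote deficit, i.e. excess homozygosity.
    f = 1 - ho / he if he > 0 else 0.0
    return ho, he, f


# Hypothetical genotypes (allele sizes in bp) for six individuals at one STR locus.
example = [(150, 154), (150, 150), (154, 158), (150, 154), (158, 158), (150, 150)]
ho, he, f = heterozygosity(example)
print(f"Ho={ho:.3f} He={he:.3f} F={f:.3f}")  # Ho=0.500 He=0.682 F=0.267
```

Repeating this per marker and per language group yields a table shaped like Table 1. Loci with consistently positive F are the ones that contribute to the homozygosity excess discussed in the text.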
After amplification, an aliquot of PCR product was combined with formamide and size standard (GeneScan 500 ROX®). It was then analyzed by capillary electrophoresis on an ABI Prism 3100 DNA sequencer, following the manufacturer's protocols. Allele sizes were scored independently by two investigators.

Subjects

The study population consisted of 182 individuals who were interviewed in Cameroon during the summer of 2004. Subjects were males and females from six language-specific villages. Eligible subjects were between 18 and 65 years of age. DNA was collected from only one person per family.

INTRODUCTION

The genetic structuring of human populations is thought to be reflected in patterns of linguistic, social, and historical relationships. Relationships between gene pools and linguistic diversity have been suggested (Cavalli-Sforza, 1997; Diamond and Bellwood, 2003). These relationships have been shown to be influenced by patterns of human dispersal and settlement. The genetic structure of populations reflects the exchange of alleles between them. Patterns of gene flow have important consequences for the genetic makeup of individuals. More broadly, they affect the adaptability of populations to local conditions and the evolution of new traits (Balloux and Lugon-Moulin, 2002). Where populations are genetically isolated by either human or geographical boundaries, heterozygosity may decrease, which can influence the evolutionary fitness of the population. In the current report we examined the relationship between the population-genetic structure of six African populations from Northern Cameroon and their respective linguistic traditions. The region is linguistically remarkable: Cameroon as a whole has about 270 languages belonging to three linguistic families, and more than 100 of them are spoken in Northern Cameroon alone.
Four of the six populations belong to the Central branch of the Chadic family (HDi, Mafa, Mina, and Gidar). One belongs to the Masa branch of Chadic (Peve), and the sixth, Mambay, belongs to the Niger-Congo family. Geographically, the Mafa and HDi are immediate neighbors and intermarry during times of economic hardship among the Mafa. Mina and Gidar are separated from each other by as little as 15 km. The Mina have historically interacted with surrounding populations. The Peve have no contact with any of these populations except the Mambay.

[Figure 1. A linguistic map of Northern Cameroon]

Statistical Methods

Population genetic indices were computed with the software package Arlequin (version 3.01; Excoffier et al, 2005). These included Hardy-Weinberg equilibrium (HWE), analysis of molecular variance (AMOVA; Excoffier et al, 1992), and pairwise coancestry coefficients (Reynolds et al, 1983). Deviations from HWE were tested using Fisher's exact probability test (Guo and Thompson, 1992). Nonparametric permutation tests (Excoffier et al, 1992) assessed the significance of variance components at three levels: between individuals within language groups, between language groups within populations, and between populations. The significance of pairwise genetic distances (FST) between populations was assessed with a resampling permutation procedure (Excoffier et al, 1992).

As a second approach, we inferred population structure from the 20 STR markers using the program STRUCTURE (Pritchard et al, 2000). STRUCTURE infers population structure from multilocus genotypes and assigns individuals to K clusters based on the allele frequencies at each locus. We ran 20 independent replicates for each value of K from 1 to 7. Each run consisted of a burn-in period of 100,000 steps followed by 100,000 Markov chain Monte Carlo iterations, without any prior population information.
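The permutation logic behind the pairwise significance tests can be sketched as follows. This is a hedged illustration only. It uses Nei's G_ST, computed from expected heterozygosities within and across two groups, as a stand-in statistic. It is not the AMOVA-based FST or the Reynolds coancestry distance that Arlequin reports. The function names and the data layout (each individual as a list of per-locus allele pairs) are hypothetical. Individuals are shuffled between the two groups many times. The P value is the proportion of permuted statistics at least as large as the observed one, counting the observed value itself.

```python
import random
from collections import Counter


def allele_freqs(alleles):
    """Relative frequency of each allele in a list of gene copies."""
    n = len(alleles)
    return {k: c / n for k, c in Counter(alleles).items()}


def gst(pop_a, pop_b):
    """Nei's G_ST over all loci (ratio of summed H_T - H_S to summed H_T).

    Each individual is a list of (allele1, allele2) tuples, one per locus.
    H_T uses the unweighted mean of the two groups' allele frequencies.
    """
    hs_total = ht_total = 0.0
    for locus in range(len(pop_a[0])):
        fa = allele_freqs([x for ind in pop_a for x in ind[locus]])
        fb = allele_freqs([x for ind in pop_b for x in ind[locus]])
        # Mean within-group expected heterozygosity.
        hs_total += 1 - (sum(p * p for p in fa.values()) + sum(p * p for p in fb.values())) / 2
        # Expected heterozygosity from the averaged allele frequencies.
        ht_total += 1 - sum(((fa.get(k, 0) + fb.get(k, 0)) / 2) ** 2 for k in set(fa) | set(fb))
    return (ht_total - hs_total) / ht_total if ht_total > 0 else 0.0


def permutation_test(pop_a, pop_b, n_perm=999, seed=1):
    """Return (observed G_ST, permutation P value) for two groups."""
    observed = gst(pop_a, pop_b)
    pooled = pop_a + pop_b  # new list, so shuffling does not alter the inputs
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if gst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical single-locus data: two completely differentiated groups of five.
group_a = [[(100, 100)] for _ in range(5)]
group_b = [[(200, 200)] for _ in range(5)]
print(permutation_test(group_a, group_b))
```

Group sizes are preserved in every shuffle. The null distribution therefore reflects only the random assignment of individuals to groups, which is the hypothesis of no differentiation.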
STRUCTURE runs used the admixture model with correlated allele frequencies (Falush et al, 2003).

Analysis of Molecular Variance (AMOVA)

As shown in Table 2, most of the variance in the sample is attributable to within-population variation (97.7%) rather than between-population variation (2.3%). Combining all six populations yielded a global FST of 0.0233 (P < 0.0001). FST values for individual populations ranged between 0.023 and 0.024. An AMOVA that included language indicated that little variance was attributable to between-population variation (-0.21%, P = 0.587). Estimates of both between-language-group variation (2.46%, P < 0.0001) and within-language-group variation (97.75%, P < 0.0001) were significant. Homozygosity was increased both overall (FIT = 0.067) and within each subpopulation (FIS = 0.088). This suggests population substructure and admixture within and between these six language groups.

Population Structure

Pairwise coancestry coefficients (FST) were low but significant for four of the six subpopulations. As shown in Table 3, this indicates small but significant genetic differentiation among the Gidar, Mina, Peve, and Mambay language groups. The value of zero for HDi and Mafa was non-significant. It is consistent with the Wahlund effect of reduced heterozygosity and suggests that there are few restrictions on mating between these two subpopulations (Balloux and Lugon-Moulin, 2002). Because genetic differentiation between groups was low, we also investigated population structure using the clustering algorithm implemented in STRUCTURE. The results are shown in Figure 2A-D. Individuals were more likely to descend from six ancestral populations (K = 6; Figure 2D; log likelihood = -12,387.4) than from five (-12,444.8), four (-12,564.1), three (-12,604.9), two (-12,708.2), or a single population (-13,140.7).
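One way to summarize the log likelihoods reported above is the ad hoc posterior probability of K described by Pritchard et al (2000). It assumes a uniform prior on K and normalizes exp(ln Pr(X|K)) across the values tested. The sketch below applies this to the six values quoted in the text for K = 1 to 6, taking K = 5 as -12,444.8. It is an illustrative calculation, not one reported by the authors. It also ignores variation among the 20 replicate runs and the K = 7 runs, for which no value is given. The names `log_lik` and `posterior_k` are hypothetical.

```python
import math

# Log likelihoods ln Pr(X|K) as quoted in the text for K = 1..6.
log_lik = {1: -13140.7, 2: -12708.2, 3: -12604.9, 4: -12564.1, 5: -12444.8, 6: -12387.4}


def posterior_k(log_lik):
    """Pr(K|X) under a uniform prior on K.

    Subtracting the maximum before exponentiating avoids overflow; very
    unlikely K values simply underflow to 0.0.
    """
    best = max(log_lik.values())
    weights = {k: math.exp(v - best) for k, v in log_lik.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


post = posterior_k(log_lik)
for k in sorted(post):
    print(f"K = {k}: Pr(K|X) = {post[k]:.3g}")
```

The gap between K = 6 and K = 5 is about 57 log units, so essentially all of the posterior mass falls on K = 6. Pritchard et al (2000) caution that this quantity should be interpreted informally, not as a rigorous posterior.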
Consistent with the AMOVA results, the correlation between genes within subpopulations was low. This suggests slight genetic differentiation between groups (FST range: 0.0001-0.231). FST values were highest for the Mina, Peve, and Mambay subpopulations. An alpha value of 0.0461 from our K = 6 model suggests that the genotype of an individual is drawn almost entirely from their respective population. This agrees with the AMOVA inbreeding indices FIT and FIS. Together they implicate genetic drift or non-random mating as putative agents in the genetic divergence of these groups.

Figure 2. Inferred population structure based on 180 individuals and 20 STR markers. Each individual is represented by a thin line divided into K colored bars, which show that individual's estimated membership coefficients, Q, in the K clusters. Panels: A, K = 3; B, K = 4; C, K = 5; D, K = 6. Populations: 1 = GIDAR; 2 = MINA; 3 = PEVE; 4 = MAMBAY; 5 = HDi; 6 = MAFA.

SUMMARY & CONCLUSIONS

We examined the relationship between the population-genetic structure of six African populations from Northern Cameroon and their respective linguistic traditions. To do so, we adopted two approaches that estimated indices of the extent and pattern of allelic diversity, inbreeding, and genetic structure. Evidence from both approaches converged. Despite six different languages among an equal number of populations, there is little population-genetic differentiation. The increased homozygosity and significant FIT and FIS values in these data suggest that these six populations are undergoing random genetic drift, possibly due to non-random mating. Overall, the genetic relationships among the six Northern Cameroon populations are more readily predictable from historical information, contemporary social structure, and cultural interaction than from the genetic relationships among the languages involved.

LITERATURE CITED

1. Balloux F and Lugon-Moulin N (2002).
The estimation of population differentiation with microsatellite markers. Molecular Ecology 11: 155-165.
2. Cavalli-Sforza LL (1997). Genes, peoples and languages. PNAS 94: 7719-7724.
3. Diamond J and Bellwood P (2003). Farmers and their languages: the first expansions. Science 300: 597-603.
4. Excoffier L, Smouse P, Quattro JM (1992). Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131: 479-491.
5. Excoffier L, Laval G, Schneider S (2005). Arlequin version 3.0: an integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1: 47-50.
6. Falush D, Stephens M, Pritchard JK (2003). Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567-1587.
7. Guo S and Thompson E (1992). Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 48: 361-372.
8. Pritchard JK, Stephens M, Donnelly P (2000). Inference of population structure using multilocus genotype data. Genetics 155: 945-959.