Transcript 0 - dimacs

Recombination based population genomics

Jaume Bertranpetit, Marta Melé, Francesc Calafell, Asif Javed, Laxmi Parida

Recall: IRiS

Identification of Recombinations in Sequences

IRiS
• is a computational method developed with biological insight
• detects evidence of historical recombinations
• minimizes the number of recombinations in the Ancestral Recombination Graph (ARG)

Recotypes

Two chromosomes share a recombination if the junction is co-inherited.

[Figure legend: mutation edge, recombination edge, extant sequence]

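[Figure: ARG with recombinations r1 and r2 and extant sequences a, b, c, alongside a recotype table with columns r1, r2, … and rows a, b, c, … (entries as extracted: 1 1 0 0 0 1)]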

Validity of inferred recombinations

• Comparison with sperm typing
• Computer simulated recombinations

in vitro

Jeffreys et al. 2005: 80 UK semen donors of North European origin
  - Sperm typing
  - LDhat and Phase (200 SNPs)
HapMap 2 CEU population: similar SNP density

[Figure: Chr 1 near the MS32 minisatellite, comparing sperm typing, LDhat, Phase and IRiS]

in silico

HapMap 3 X chromosome data

•Select 2 chromosomes at random.

•Pick a random breakpoint.

•Create a new chromosome.

•Check if it is unique, add to the dataset.

•Run IRiS on the dataset to see if the breakpoint is detected.
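The first four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: chromosomes are encoded as equal-length allele strings, the function names `make_recombinant` and `plant_recombinant` are hypothetical, and the final step (running IRiS) is not shown because the slides do not describe its interface.

```python
import random


def make_recombinant(left_parent, right_parent, cut):
    """Join the left part of one chromosome to the right part of another."""
    return left_parent[:cut] + right_parent[cut:]


def plant_recombinant(dataset, rng):
    """Try one round of the procedure; return (child, cut) or None.

    `dataset` is a list of equal-length allele strings (an assumed encoding).
    """
    left_parent, right_parent = rng.sample(dataset, 2)  # select 2 chromosomes at random
    cut = rng.randrange(1, len(left_parent))            # random breakpoint, both sides non-empty
    child = make_recombinant(left_parent, right_parent, cut)
    if child in dataset:                                # not unique: discard
        return None
    dataset.append(child)                               # unique: add to the dataset
    return child, cut
```

The returned breakpoint is the ground truth against which a detector's output would then be compared.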

• 69% of recombinations detected
• All detected recombinations identify the correct sequence
• No false positives

[Figure: Chromosomes → IRiS → recombination detected?]

Recombinomics

• Strong population structure
• Agreement with traditional methods
  - FST vs. recombinational distance
• More informative than SNPs
  - STRUCTURE
  - PCA

Regions

18 regions selected from HapMap 3
• X chromosome in males (to avoid phasing errors)
• 50 kb away from known CNV and SD (to avoid genotyping errors)
• 50 kb away from genes (to avoid selection)
• at least 80 SNPs

Chromosomes: LWK (43), MKK (88), YRI (88), ASW (42), GIH (42), CHB (40), CHD (21), JPT (25), MEX (21), CEU (74), TSI (40)

Analysis

• For each region, IRiS inferred recotypes for each chromosome
• 5166 recombinations were inferred
• 3459 co-occurred in at least two chromosomes

Recombination × chromosome matrix (columns: LK1, LK2, :, LK43, MK1, :, TI40)

r1      0 1 1 0 0
r2      1 0 0 1 0
r3      1 1 1 0 0
r4      0 1 0 0 0
r5      0 0 0 1 0
r6      0 0 0 1 1
…
r3459   0 0 1 0
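As a small illustration of the matrix above, the sketch below builds one 0/1 vector per chromosome and keeps only recombinations carried by at least two chromosomes. The input layout (a dict from recombination to the set of chromosomes carrying it) and both function names are assumptions made for this example, and the toy data uses only the columns visible on the slide.

```python
def recotypes_from_carriers(carriers, chromosomes):
    """Return {chromosome: [0/1 per recombination]} in the dict's key order.

    `carriers` maps a recombination id to the set of chromosomes carrying it
    (a hypothetical input layout chosen for this sketch).
    """
    recombinations = list(carriers)
    return {
        c: [1 if c in carriers[r] else 0 for r in recombinations]
        for c in chromosomes
    }


def shared_recombinations(carriers, min_carriers=2):
    """Keep recombinations that co-occur in at least `min_carriers` chromosomes."""
    return [r for r in carriers if len(carriers[r]) >= min_carriers]


# Toy data copied from the visible columns of the slide's matrix.
chromosomes = ["LK1", "LK2", "LK43", "MK1", "TI40"]
carriers = {
    "r1": {"LK2", "LK43"},
    "r2": {"LK1", "MK1"},
    "r3": {"LK1", "LK2", "LK43"},
    "r4": {"LK2"},
    "r5": {"MK1"},
    "r6": {"MK1", "TI40"},
}
recotypes = recotypes_from_carriers(carriers, chromosomes)
shared = shared_recombinations(carriers)
```

In this toy subset, r4 and r5 each appear in a single chromosome and are dropped by the co-occurrence filter.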

Recotype: one chromosome's column of the matrix (its 0/1 vector over recombinations)

Agreement with LDhat

Each point represents a short haplotype segment in HapMap CEU population

Spearman correlation = 0.711, p-value < $10^{-30}$

Correlation in hotspots: $\chi^2 = 38.39$, p-value < $6 \times 10^{-10}$

[Figure axis: number of recombinations inferred by IRiS]
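For readers unfamiliar with the statistic, a Spearman correlation is a Pearson correlation computed on ranks. The sketch below is a plain-Python illustration of that definition (ties get average ranks); it is not the analysis code behind the slide, and any per-segment values fed to it would be the reader's own.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, the statistic is unaffected by the different scales of a recombination count and an estimated recombination rate.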

Recombinational distance between populations

Two genetically closer populations will share a higher number of recombinations.

Recombinational distance:

$$D_{AB} = 1 - \frac{R_{AB}}{R_A + R_B - R_{AB}}$$

Correlation between FST distance and recombinational distance for the 18 regions: [0.35, 0.75], with p-values < 0.025
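A worked example may help with the distance. The sketch below assumes that R_A and R_B are the numbers of recombinations observed in populations A and B and that R_AB is the number observed in both; the slide does not define the symbols, so this reading and the function name are assumptions. Under it, the distance is one minus the shared fraction of the union of the two sets.

```python
def recombinational_distance(recs_a, recs_b):
    """D_AB = 1 - R_AB / (R_A + R_B - R_AB) for two sets of recombination ids."""
    shared = len(recs_a & recs_b)                 # R_AB (assumed: recombinations seen in both)
    union = len(recs_a) + len(recs_b) - shared    # R_A + R_B - R_AB
    if union == 0:
        raise ValueError("both populations have no recombinations")
    return 1 - shared / union


# Hypothetical populations: 3 recombinations each, 2 of them shared.
pop_a = {"r1", "r2", "r3"}
pop_b = {"r2", "r3", "r4"}
d_ab = recombinational_distance(pop_a, pop_b)  # 1 - 2 / (3 + 3 - 2)
```

Identical sets give a distance of 0 and disjoint sets give 1, matching the statement that closer populations share more recombinations.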

[Figure: MDS plot, all regions combined; stress = 6.1%]

PCA of population data

Recall recotypes (rows: recombinations; columns: chromosomes LK1, LK2, :, LK43, MK1, :, TI40)

r1      0 1 1 0 0
r2      1 0 0 1 0
r3      1 1 1 0 0
r4      0 1 0 0 0
r5      0 0 0 1 0
r6      0 0 0 1 1
…
r3459   0 0 1 0

Per-population counts (columns: LK, MK, :, TI)

r1      14 1 0
r2      7 4 1
r3      4 7 7
r4      9 0 1
r5      0 5 0
r6      1 7 0
…
r3459   0 24 1
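The step from the chromosome-level matrix to the per-population table can be sketched as a column sum within each population. The function name and input layout below are assumptions for illustration. The toy input contains only the five chromosomes visible on the slide, so its counts are smaller than the slide's, which cover every chromosome in each population.

```python
def population_counts(matrix, chromosome_ids, population_of):
    """Sum each recombination's 0/1 row within every population.

    `matrix` maps recombination id -> list of 0/1 values, one per chromosome,
    in the same order as `chromosome_ids` (hypothetical layout).
    """
    populations = []
    for c in chromosome_ids:
        if population_of[c] not in populations:
            populations.append(population_of[c])  # keep first-seen order
    counts = {}
    for rec, row in matrix.items():
        totals = dict.fromkeys(populations, 0)
        for c, present in zip(chromosome_ids, row):
            totals[population_of[c]] += present
        counts[rec] = [totals[p] for p in populations]
    return populations, counts


chromosome_ids = ["LK1", "LK2", "LK43", "MK1", "TI40"]
population_of = {"LK1": "LK", "LK2": "LK", "LK43": "LK", "MK1": "MK", "TI40": "TI"}
matrix = {
    "r1": [0, 1, 1, 0, 0],
    "r3": [1, 1, 1, 0, 0],
    "r6": [0, 0, 0, 1, 1],
}
populations, counts = population_counts(matrix, chromosome_ids, population_of)
```

The resulting recombination-by-population table is the kind of input a PCA of population data would take.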

The first two PCs capture 66.4% of the variance

PCA of recotypes

more on this later

Recotypes vs. SNPs

Due to ascertainment bias, gene diversity does not reflect population structure (results similar to Conrad 07).

Percentage of variance (in agreement with Lewontin 72)

             Across groups   Within groups   Within populations
SNPs         9%              4%              87%
Recotypes    6%              1%              93%

Normalized comparison, linearly scaled to [0,1], using 21 samples per population

from SNPs to haplotypes to recotypes (a STRUCTURE comparison)

[Figures: STRUCTURE results for K=2, K=3, K=4 and K=5, each with panels for SNPs, haplotypes and recotypes]

Africa within global genetic variation

Structure k=4

minority African specific component

Avg. number of recombinations in 21 random chromosomes

Out of Africa hypothesis
Founder effect

Genetic variation within Africa

Structure k=5

Maasai specific minor component

• Sub-Saharan Maasai are distinct among Africans.

• African Americans exhibit stronger recombinational affinity with African populations than with European populations (Parra 98).

Genetic variation outside Africa

Structure k=5

Avg. number of recombinations in 21 random chromosomes

• Outside Africa, Gujarati and Japanese exhibit the highest and lowest number of recombinations, respectively.

• Gujarati Indians occupy an intermediate position between Europeans and East Asians.

Venturing outside the X-chromosome

• Benefits
  - The bigger picture
  - More regions and hence more information
• Challenges
  - Higher number of recombinations makes the picture murkier
  - Phasing errors

Regions

81 regions selected from HapMap 3
• 50 kb away from known CNV and SD (to avoid genotyping errors)
• 50 kb away from genes (to avoid selection)
• at least 200 SNPs
• 25 samples per population (each sample has two chromosomes)

Analysis

• For each region, IRiS inferred recotypes for each chromosome
• 34140 recombinations were inferred
• For each sample, the two recotypes were merged.
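The slides do not say how the two recotypes of a sample were merged. The sketch below shows two plausible readings, labeled as assumptions rather than the authors' method: an element-wise sum (how many of the sample's two chromosomes carry each recombination) and an element-wise OR (whether either chromosome carries it). Both function names are hypothetical.

```python
def merge_by_sum(recotype_1, recotype_2):
    """Assumed reading 1: count carriers (0, 1 or 2) per recombination."""
    if len(recotype_1) != len(recotype_2):
        raise ValueError("recotypes must cover the same recombinations")
    return [a + b for a, b in zip(recotype_1, recotype_2)]


def merge_by_or(recotype_1, recotype_2):
    """Assumed reading 2: 1 if either chromosome carries the recombination."""
    if len(recotype_1) != len(recotype_2):
        raise ValueError("recotypes must cover the same recombinations")
    return [a | b for a, b in zip(recotype_1, recotype_2)]


# Hypothetical sample with two chromosomes over three recombinations.
summed = merge_by_sum([0, 1, 1], [1, 1, 0])
either = merge_by_or([0, 1, 1], [1, 1, 0])
```

Either way, the merged vector does not depend on which chromosome is listed first.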

[Figure: PCA plots for SNPs and recotypes]

Quantifying population structure

PCA followed by k nearest neighbors is used to predict the population of every sample.

[Figure: population tree marking each population as perfectly classified or classified with errors. Africans: ASW, YRI, LWK, MKK. Non-Africans: GIH, E. Asian (CHB+CHD, JPT), MEX, European (CEU, TSI). Misclassification counts given as (recotypes, SNPs): (0, 7) and (4, 3) among Africans, (3, 13) among East Asians, (8, 13) among Europeans.]
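The classification step can be illustrated with a minimal k-nearest-neighbors sketch over low-dimensional coordinates, such as the leading principal components of each sample. The value of k, the leave-one-out scheme and the function names are assumptions for this example; the slides do not give those details.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Majority label among the k training points closest to `query`."""
    by_distance = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_errors(points, labels, k=3):
    """Count samples misclassified when predicted from all the other samples."""
    errors = 0
    for i, (point, label) in enumerate(zip(points, labels)):
        others = points[:i] + points[i + 1:]
        other_labels = labels[:i] + labels[i + 1:]
        if knn_predict(others, other_labels, point, k) != label:
            errors += 1
    return errors


# Hypothetical 2-D coordinates for two well-separated populations.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["A", "A", "A", "B", "B", "B"]
errors = leave_one_out_errors(points, labels, k=3)
```

Running the same count on SNP-based and recotype-based coordinates would give a pair of error counts comparable in spirit to the (recotypes, SNPs) pairs on the slide.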

East Asian population

Recotypes are more informative of underlying population structure.

[Figure: PCA plots for SNPs and recotypes]

in conclusion …

Recotypes

• show strong agreement with in silico and in vitro recombination rate estimates
• are highly informative of the underlying population structure
• provide a novel approach to studying recombinational dynamics