No Slide Title


What is Haplotyping?

•Diploid organisms may carry two different alleles at some loci (highlighted in the slide figure)

•SNP sequencing would show the genotype as A/T, C, A/C, G

•This would still leave two possibilities for the true haplotype: ACCG/TCAG or ACAG/TCCG

•Single-molecule haplotyping can distinguish between chromosomes, and tell us that the true haplotype is ACCG/TCAG

•Haplotypes give more information, and have better predictive power than SNP sequences
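The ambiguity on this slide can be made concrete with a short enumeration. This is a minimal Python sketch, not part of the original slides; the function name `possible_haplotype_pairs` and the tuple encoding of the genotype are assumptions chosen for illustration.

```python
from itertools import product

def possible_haplotype_pairs(genotype):
    """Return every unordered haplotype pair consistent with an unphased genotype.

    genotype: one (allele, allele) tuple per SNP, e.g. ("A", "T").
    """
    pairs = set()
    # For each SNP, pick which of the two alleles sits on the first chromosome;
    # the other allele then sits on the second chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        first = "".join(g[c] for g, c in zip(genotype, choice))
        second = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((first, second))))
    return sorted(pairs)

# Genotype from the slide: A/T, C, A/C, G
genotype = [("A", "T"), ("C", "C"), ("A", "C"), ("G", "G")]
for first, second in possible_haplotype_pairs(genotype):
    print(f"{first}/{second}")
```

For this genotype the sketch lists ACAG/TCCG and ACCG/TCAG. More generally, n heterozygous SNPs leave 2^(n-1) candidate haplotype pairs, which is why genotype data alone becomes less informative as more heterozygous sites are included.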

Haplotypes: Confusion among heterozygotes

Assume we have a mother (M), father (F), and child (C), and we look at two SNPs in each. Then we may have, for instance:

         SNP 1   SNP 2
Mother:  A/C     G/G
Father:  A/C     G/T
Child:   A/C     G/T

The child can then have haplotype AG/CT or AT/CG. There is nothing to be done except collect more data, e.g. if there are two children or 4 grandparents. (Bringing in more data points, i.e., looking at more haplotypes, only hurts you.)

If one parent is homozygous:

         SNP 1        SNP 2
Mother:  A/C          G/G
Father:  C/C          G/G
Child:   A/C or C/C   G/G  ??
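The trio reasoning can be checked by brute force. The sketch below is an illustration only, not the authors' method: it assumes no recombination between the two SNPs, and all function names (`phasings`, `matches`, `child_haplotypes`) are hypothetical.

```python
from itertools import product

def phasings(genotype):
    """All ordered haplotype pairs (hap1, hap2) consistent with a genotype."""
    result = set()
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(g[c] for g, c in zip(genotype, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        result.add((hap1, hap2))
    return result

def matches(hap_a, hap_b, genotype):
    """True if the two haplotypes together give the observed genotype."""
    return all(sorted((a, b)) == sorted(g)
               for a, b, g in zip(hap_a, hap_b, genotype))

def child_haplotypes(mother, father, child):
    """Child haplotype pairs consistent with the trio (no recombination assumed)."""
    found = set()
    for m_pair, f_pair in product(phasings(mother), phasings(father)):
        # The child inherits one whole haplotype from each parent.
        for m_hap, f_hap in product(m_pair, f_pair):
            if matches(m_hap, f_hap, child):
                found.add(tuple(sorted((m_hap, f_hap))))
    return sorted(found)

# First example from the slide: the child stays ambiguous.
ambiguous = child_haplotypes(
    mother=[("A", "C"), ("G", "G")],
    father=[("A", "C"), ("G", "T")],
    child=[("A", "C"), ("G", "T")],
)
print(ambiguous)  # two candidate pairs: AG/CT and AT/CG
```

Under the same assumptions, replacing the father with the homozygous genotype C/C, G/G and taking the A/C, G/G child leaves a single candidate pair, AG/CG.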

Single Molecule Fluorescence Haplotyping

[Figure scale labels: 3 kbp, 17 kbp]

•In our approach, PCR products are site- and allele-specifically labeled with single dye molecules, and are imaged to establish a “barcode” which can be used to determine the haplotype at selected SNPs.

•Different alleles are labeled with different dyes (e.g. Cy3, Cy5), and can be distinguished by color

Three Underlying Technologies

[Figure: double-stranded sequence ACCTGTCAGG C GTACCA / TGGACAGTCC G CATGGT with the SNP base pair set apart; scale labels 3 kbp, 17 kbp]

•Molecular combing is used to stretch the DNA molecules on a surface prior to imaging.

•Padlock probe labeling is used to allele-specifically label the SNPs of interest.

•FIONA (Fluorescence Imaging with One Nanometer Accuracy) is used to localize the labels.

Combining all three gives barcoded DNA.

Haplotyping by single molecule “bar-coding”

Fig. 1

Unique "haplotype barcodes" for the two haplotypes, where each allele of 5 SNPs is labeled with either a red or a green fluorescent probe. The distances between labels along the blue-stained DNA backbone, and the colors of those labels, are determined by single-molecule fluorescence detection with TIRF microscopy. The haplotype can then be inferred from the “barcode”.
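Reading a haplotype off a barcode amounts to matching each detected label to the nearest known SNP site and recording its color. The sketch below illustrates that idea under stated assumptions: the SNP positions, the 500 bp tolerance, the function name `read_barcode`, and the example labels are all hypothetical, and the molecule is assumed to be already oriented from a known end.

```python
def read_barcode(labels, snp_positions, tolerance_bp=500):
    """Turn detected labels into a color string, one character per SNP site.

    labels: list of (position_bp, color) with color "R" (red) or "G" (green).
    snp_positions: expected SNP positions in bp from the reference end.
    Sites without a label within tolerance_bp are reported as "?".
    """
    barcode = ["?"] * len(snp_positions)
    for position, color in labels:
        # Find the SNP site closest to this label.
        nearest = min(range(len(snp_positions)),
                      key=lambda i: abs(snp_positions[i] - position))
        if abs(snp_positions[nearest] - position) <= tolerance_bp:
            barcode[nearest] = color
    return "".join(barcode)

# Hypothetical SNP sites and one hypothetical imaged molecule.
sites = [900, 3300, 4460, 6500]
molecule = [(950, "R"), (3200, "G"), (4600, "G"), (6400, "R")]
print(read_barcode(molecule, sites))  # RGGR
```

A real analysis would also have to handle molecule orientation, stretching variation, and missing or spurious labels; none of that is modeled here.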

Homozygous Person, at one position (rs12797)

Fig. 2A

Three channels: Green = Cy3, allele A; Red = Cy5, allele G; Blue = non-specific YoYo.

A composite image of all three channels. The alleles of the SNP rs12797 were labeled with green (Cy3) dye for the A allele and red (Cy5) dye for the G allele. The positions of labeled alleles are indicated with red arrows. Few red labels were observed, indicating this sample is A/A homozygous.

Statistics of Homozygous Person

Fig. 2B

[Histogram. x-axis: location of dye molecules from one end (0.0 to 0.5); y-axis: 0 to 14. Legend: Allele A (Cy3), Allele G (Cy5), calculated location.]

Histogram of the distance distribution of the results from Figure 2A. Red bars indicate the G allele and green bars the A allele. The Gaussian curve fitting shows a green peak at 3311 ± 161 bp from one end, which is consistent with the expected distance of 3291 bp.
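The consistency claim in the caption can be expressed as a simple check. This is a sketch only: it assumes the "±" value is one standard deviation of the fitted Gaussian, which the slide does not state, and the function name `consistent_with_expected` is hypothetical.

```python
def consistent_with_expected(fitted_bp, sigma_bp, expected_bp, n_sigma=1.0):
    """True if the expected position lies within n_sigma * sigma of the fitted peak."""
    return abs(fitted_bp - expected_bp) <= n_sigma * sigma_bp

# Values quoted in the Fig. 2B caption.
print(consistent_with_expected(fitted_bp=3311, sigma_bp=161, expected_bp=3291))  # True
```

Here the fitted peak differs from the expected position by 20 bp, well inside the quoted 161 bp.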

Heterozygous Person

Fig. 2C

A composite image of all three channels. The alleles of the SNP rs12797 were labeled with green (Cy3) dye for the A allele and red (Cy5) dye for the G allele. The positions of labeled alleles are indicated with red arrows. Both green and red labels were observed, indicating this sample is G/A heterozygous.

Three Channels: Green= Cy3; Red = Cy5; Blue = non-specific YoYo.

Heterozygous Person, at one position (rs12797)

Fig. 2D

[Histogram. x-axis: location of dye molecules from one end (0.0 to 0.5); y-axis: 0 to 16. Legend: Allele A (Cy3), Allele G (Cy5), calculated location of Cy3, calculated location of Cy5.]

Histogram of the distance distribution of the results in Figure 2C. Red indicates the G allele and green the A allele. The Gaussian curve fitting shows a green peak at 3459 ± 492 bp and a red peak at 3413 ± 372 bp from one end, both consistent with the actual distance of 3291 bp.

As expected 50% allele A, 50% allele G
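The homozygous and heterozygous calls on these slides rest on the relative number of Cy3 and Cy5 labels seen at the SNP. A minimal sketch of such a call is shown below; the 20% threshold, the function name `call_genotype`, and the example counts are assumptions for illustration and are not taken from the slides.

```python
def call_genotype(n_cy3, n_cy5, minor_fraction=0.2):
    """Call the rs12797 genotype from label counts.

    n_cy3: number of green (Cy3, allele A) labels at the SNP position.
    n_cy5: number of red (Cy5, allele G) labels at the SNP position.
    minor_fraction: hypothetical cutoff below which a color is treated as background.
    """
    total = n_cy3 + n_cy5
    if total == 0:
        return None  # no labels observed, no call
    fraction_g = n_cy5 / total
    if fraction_g < minor_fraction:
        return "A/A"
    if fraction_g > 1 - minor_fraction:
        return "G/G"
    return "G/A"

# Hypothetical counts, not measured data.
print(call_genotype(30, 2))   # A/A: few red labels
print(call_genotype(20, 22))  # G/A: roughly half of each color
```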

[Figure: map of the labeled fragment. Labels: End label; SNP 1 rs898706 (C/T); SNP 2 rs12797 (G/A); SNP 3 rs743242; SNP 4 rs745318. Distance markers: 900 bp, 3300 bp, 4460 bp, 6500 bp, 9300 bp.]

Heterozygote at 4 positions

Statistics on Heterozygote

Fig. 4

All eight possible heterozygous haplotypes with their scores. The arrow indicates the score of the highlighted haplotype, RGGR/GRRG.

Inset: Scores for Cy3 and Cy5 at each individual locus, showing that all four loci are heterozygous.

RRRR=GGGG: The two are equivalent because we can show that all four positions are heterozygous. So you could show each haplotype as a pair, RGGR/GRRG for instance. Because all four positions are heterozygous, the presence of one implies the presence of the other.
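The count of eight follows from pairing each of the 16 four-locus color patterns with its complement. The sketch below enumerates those pairs and scores them against a set of observed barcodes. The scoring rule used here (number of molecules matching either member of the pair) is an assumption, since the slides do not define the score, and the observed barcodes are made up for illustration.

```python
from collections import Counter
from itertools import product

SWAP = str.maketrans("RG", "GR")  # red <-> green

def heterozygous_pairs(n_loci):
    """Each color pattern paired with its complement, e.g. ('GRRG', 'RGGR')."""
    pairs = set()
    for colors in product("RG", repeat=n_loci):
        hap = "".join(colors)
        pairs.add(tuple(sorted((hap, hap.translate(SWAP)))))
    return sorted(pairs)

def score_pairs(observed_barcodes, n_loci=4):
    """Hypothetical score: molecules matching either haplotype of the pair."""
    counts = Counter(observed_barcodes)
    return {pair: counts[pair[0]] + counts[pair[1]]
            for pair in heterozygous_pairs(n_loci)}

# Made-up observations: mostly RGGR and GRRG, plus one stray molecule.
observed = ["RGGR"] * 5 + ["GRRG"] * 4 + ["RRRR"]
scores = score_pairs(observed)
best = max(scores, key=scores.get)
print(len(scores), best, scores[best])  # 8 ('GRRG', 'RGGR') 9
```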

Molecular Haplotyping

Why?

•Diploid organisms (e.g. humans) may be heterozygous

•Knowing the correlations of SNPs on each individual chromosome (the haplotype) confers more predictive power than SNPs alone

•Most genotyping techniques use bulk PCR products, and cannot distinguish chromosomes

•Single molecule analysis can allow us to distinguish discrete populations

How?

•First, DNA is allele-specifically labeled with single fluorescent probes using padlock-probe labeling

•Second, DNA is stretched onto a surface using molecular combing

•Third, the DNA backbone and fluorescent nucleotide labels are imaged using FIONA

Double-stranded DNA can be stretched with molecular combing onto a surface, as shown. Single-stranded DNA (ssDNA) can potentially be stretched if it is stabilized with RecA, a single-stranded DNA binding protein. Labeling with padlock probes can be more efficient with ssDNA.

Haplotyping and Genomics

•Current genotyping technologies work on bulk PCR products, and hence cannot distinguish heterozygous haplotypes, because they cannot distinguish products from different chromosomes.

•Haplotypes have much more predictive power than simple SNP sequencing for determining disease susceptibility and drug interactions.

•Once general patterns have been discerned using genome-wide SNP scans, haplotyping will be the dominant means of determining precise correlations between regions of genomic interest and disease.

•We will focus on a 500 kb region containing the HOXA locus on human chromosome 7, as part of the ongoing work of the International HapMap consortium.