Trio Phasing by Entropy Minimization Ion Mandoiu and


High-Throughput SNP Genotyping by SBE/SBH
2nd SECABC Fall Workshop on Biocomputing
Ion Mandoiu and Claudia Prajescu, CSE Department, University of Connecticut
Oct. 27, 2005
Abstract
The completion of the Human Genome Project has tremendously advanced
our understanding of the common genetic heritage shared by all of humanity.
However, much remains to be learned from the genomic diversity that exists
between individuals, which manifests itself predominantly in the form of
isolated changes called single nucleotide polymorphisms (SNPs). Large-scale SNP analyses promise to provide answers to fundamental problems
ranging from determining the genetic basis of disease to drug development
and uncovering the patterns of historical population migrations.
Determining the specific nucleotides present at a SNP locus in the genetic
DNA of an individual is called SNP genotyping. Although there are numerous
SNP genotyping platforms, current technologies still offer an insufficient
degree of multiplexing and often do not allow genotyping custom sets of
SNPs. In this poster we describe a new genotyping assay architecture that
combines a solution-phase multiplexed single-base extension (SBE) reaction
with sequencing by hybridization (SBH) via hybridization to universal arrays.
Simulation results show that SBE/SBH assays can deliver improved
genotyping throughput and flexibility compared to existing genotyping
platforms.
SBE/SBH Assay Steps
• Synthesize primers complementing the upstream sequence of target SNP loci
on either sense or antisense strands.
• Mix primers with a solution of PCR-amplified genomic DNA containing the
target SNP loci.
• Raise the temperature to denature double-stranded genomic DNA, then reduce
the temperature to allow primers to hybridize. Hybridization conditions
should be stringent enough to prevent primer hybridization at locations other
than SNP loci.
• Add DNA polymerase and fluorescently labeled dideoxynucleotides (ddNTPs) to
the solution. Primers are extended by single ddNTPs complementing SNP values.
This can be run as one reaction with ddNTPs labeled by 4 distinct colors, or
as 4 independent reactions, each of which uses only one type of ddNTP.
• Primers are separated from genomic DNA and hybridized to an array containing
all k-mers for k=8-10. For more reliable solid-phase hybridization, the array
can alternatively contain a set of isothermic probes (i.e., probes that have
similar melting temperatures), such as the set of all c-tokens for c=11-13
[Ben-Dor et al. 00].
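The extension and array-hybridization steps above can be illustrated with a short Python sketch. It is a simplified model rather than the poster's implementation: it assumes perfect Watson-Crick pairing, an array containing all k-mers, and that an extended primer hybridizes to exactly those probes that are reverse complements of its length-k substrings. All function names are hypothetical.

```python
# Simplified model of single-base extension followed by hybridization
# to an all-k-mer array. Names and the hybridization model are assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def single_base_extension(primer, template_allele):
    """Extend the primer by the one base that complements the SNP allele
    found on the template strand."""
    return primer + template_allele.translate(COMPLEMENT)


def kmer_spectrum(extended_primer, k):
    """Set of k-mer probes assumed to hybridize to the extended primer:
    the reverse complements of all its length-k substrings."""
    return {
        reverse_complement(extended_primer[i:i + k])
        for i in range(len(extended_primer) - k + 1)
    }


if __name__ == "__main__":
    extended = single_base_extension("GATA", "T")
    print(extended)                              # GATAA
    print(sorted(kmer_spectrum(extended, 4)))    # ['TATC', 'TTAT']
```

In a real assay the primers are about 20 bases long and k is 8-10, so each extended primer would light up roughly a dozen probes in this model; the toy values here are only chosen to keep the output readable.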
Genotyping Throughput
In the final assay step, the array hybridization pattern is analyzed to make
the genotype calls. To enable unique decoding, each primer should hybridize to
at least one array probe to which no other primer hybridizes.
We estimated the throughput achieved by the sequential greedy SNP assignment
algorithm for various array probe sets and redundancy requirements, using
randomly generated primers of length 20.
[Figure: # SNPs assignable per array (0 to 40,000) versus #SNPs (up to 181,000). Series: k=10 and c=13 probe sets, r=1 and r=5, each with 1 or 2 primers.]
Primer Length Selection
Genotyping throughput depends non-linearly on primer length. We used randomly
generated genomic sequences to determine the optimal primer length using the
sequential greedy SNP assignment algorithm.
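The poster does not spell out the sequential greedy SNP assignment algorithm, so the following Python sketch is only one plausible reading of it. It assumes each SNP comes with one or two candidate primers, each described by the set of array probes it hybridizes to, and that a SNP is accepted if one of its primers can be added while every assigned primer keeps at least r probes that no other assigned primer hybridizes to. The function names, the input format, and the interpretation of r as a count of unique probes are all assumptions.

```python
from collections import Counter


def unique_probe_count(spectrum, usage):
    """Number of probes in `spectrum` hit by exactly one assigned primer."""
    return sum(1 for probe in spectrum if usage[probe] == 1)


def greedy_assign(snps, r=1):
    """Sequentially assign SNPs to one array (hypothetical sketch).

    snps: list of (snp_id, candidates), where candidates is a list of
          probe sets, one per candidate primer (1 or 2 primers per SNP).
    r:    required number of unique probes per assigned primer.
    Returns a dict mapping each assigned snp_id to the chosen primer index.
    """
    usage = Counter()   # probe -> number of assigned primers hitting it
    assigned = {}       # snp_id -> probe set of the chosen primer
    choice = {}         # snp_id -> index of the chosen primer
    for snp_id, candidates in snps:
        for idx, spectrum in enumerate(candidates):
            usage.update(spectrum)          # tentatively add this primer
            feasible = unique_probe_count(spectrum, usage) >= r and all(
                unique_probe_count(s, usage) >= r for s in assigned.values()
            )
            if feasible:
                assigned[snp_id] = spectrum
                choice[snp_id] = idx
                break
            usage.subtract(spectrum)        # roll back and try the next primer
    return choice


if __name__ == "__main__":
    snps = [
        ("s1", [{"AA", "AC"}]),
        ("s2", [{"AA"}, {"AA", "GG"}]),
        ("s3", [{"AC", "AA"}]),
    ]
    print(greedy_assign(snps, r=1))   # {'s1': 0, 's2': 1}
    print(greedy_assign(snps, r=2))   # {'s1': 0}
```

The toy run shows why a second candidate primer can help: s2 cannot be decoded with its first primer, but its second primer contributes a probe that no other primer uses. The feasibility check rescans all assigned primers, which is quadratic overall; that is acceptable for a sketch but would need incremental bookkeeping at the scale of the experiments reported here.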
SNP Genotyping Approaches
There are numerous SNP genotyping platforms, combining a variety of allele
discrimination techniques (sequencing, direct hybridization, primer
extension, allele-specific PCR, ligation, cleavage, etc.), detection
mechanisms (fluorescence, mass spectrometry, etc.), and reaction formats
(solution phase, solid support, bead arrays); see, e.g., [Kwok01, Jenkins02].
Currently, the highest throughput is achieved by direct hybridization
platforms, e.g., the two-array Human Mapping GeneChip® manufactured by
Affymetrix can be used to genotype a fixed set of over 100k SNPs. However,
custom sets of SNPs cannot be easily accommodated in this technology.
Genotyping platforms that allow customization of the sets of SNPs include
tag arrays (ParAllele/Affymetrix universal arrays) and arrayed primer
extension (APEX) arrays, which currently have a throughput of 100s to
1000s of SNPs per assay.
Our approach combines the high genotyping sensitivity of solution-phase
single-base extension (SBE) reactions (already employed for SNP
genotyping in conjunction with tag arrays [Brenner97] and MALDI-TOF mass
spectrometry [Aumann03]) with the technique of sequencing by hybridization
(SBH). SBE/SBH assays require the synthesis of much shorter
oligonucleotides compared to tag arrays, and, unlike APEX arrays, do not
place overly restrictive constraints on PCR primer selection.
[Figure: % assignable SNPs per array versus primer length (10 to 30). Series: 2k, 10k, 20k, 100k, and 200k SNPs, each with 1 or 2 primers.]
Conclusions and Ongoing Work
Empirical studies show high genotyping throughput for the new
SBE/SBH assay, even with a simple SNP assignment algorithm.
In ongoing work we are investigating improved algorithms for the maximum
assignable SNP set problem, including methods for exploiting further
degrees of freedom such as the presence of disjoint SNP allele sets. We
are also exploring probabilistic decoding algorithms that model possible
hybridization errors.
References
Y. Aumann, E. Manisterski, and Z. Yakhini. Designing optimally multiplexed SNP genotyping
assays. Proc. 3rd Workshop on Algorithms in Bioinformatics, pp. 320-338, 2003.
A. Ben-Dor, R. Karp, B. Schwikowski, and Z. Yakhini. Universal DNA tag systems: a
combinatorial design scheme. Journal of Computational Biology, 7:503-519, 2000.
S. Brenner. Methods for sorting polynucleotides using oligonucleotide tags. US Patent
5,604,097, 1997.
K. Cameron. Induced matchings. Discrete Applied Mathematics, 24:97-102, 1989.
S. Jenkins and N. Gibson. High-throughput SNP genotyping. Comparative and Functional
Genomics, 3:57-66, 2002.
P.Y. Kwok. Methods for genotyping single nucleotide polymorphisms. Annual Review of
Genomics and Human Genetics, 2:235-258, 2001.
L.J. Stockmeyer and V.V. Vazirani. NP-completeness of some generalizations of the
maximum matching problem. Information Processing Letters, 15:14-18, 1982.