Real-Time Primer Design for
DNA Chips
Annie Hui
CMSC 838 Presentation
Use of primers in PCR and Microarrays
PCR (polymerase chain reaction):
to amplify a particular DNA fragment
Use: to test for the presence of nucleotide sequences
Test of PCR products:
Ladder: a mixture of fragments of known length
Lane 1: the PCR fragment is ~1850 bases long.
Lanes 2 and 4: the fragments are ~800 bases long.
Lane 3: no product is formed, so the PCR failed.
Lane 5: multiple bands are formed because one of
the primers binds at several different sites.
CMSC 838T – Presentation
Use of primers in PCR and Microarrays
DNA chips (Microarrays):
to analyse a large number of genes in parallel.
[Figure: microarray schematic, labelled "fluorescence", "bound to primer" and "fixed on chip"]
Primers:
20 to 100 bases long
Synthetically manufactured
Automated design of primers
A computational approach
Objective: to find primers that bind
well without self-hybridizing
Critique: how accurate?
Motivation:
This group uses the automated NucliSens extraction system (bioMerieux) to develop their primers.
Technique: The computational model
1. Select primers from the target sequence:
two primers, P (forward) and Q (reverse), for PCR; one primer for a DNA chip (microarray)
Using window size W, the number of possible primers with length between m and n within one window is:
$$S = \sum_{l=m}^{n} (W - l + 1)$$
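A minimal Python sketch of the count S, checked against a brute-force enumeration of every substring in the window. The function names, the 16-base window and the length range m = 5, n = 8 are hypothetical values chosen only for illustration.

```python
def count_candidates(w: int, m: int, n: int) -> int:
    """S = sum_{l=m}^{n} (W - l + 1): primers of length m..n in a window of size W."""
    if not 1 <= m <= n <= w:
        raise ValueError("expected 1 <= m <= n <= W")
    # a primer of length l can start at W - l + 1 positions
    return sum(w - l + 1 for l in range(m, n + 1))


def enumerate_candidates(window: str, m: int, n: int) -> list[str]:
    """Brute force: every substring of the window whose length lies in m..n."""
    return [
        window[start:start + l]
        for l in range(m, n + 1)
        for start in range(len(window) - l + 1)
    ]


window = "AGCGATATAGCCATAG"  # hypothetical 16-base window, so W = 16
print(count_candidates(len(window), 5, 8))      # 42
print(len(enumerate_candidates(window, 5, 8)))  # 42
```

For W = 16 the four lengths contribute 12 + 11 + 10 + 9 = 42 candidates.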
Technique: The computational model
2. For each primer pair, or single primer,
quantify 4 hybridization conditions:
a. Primer length
b. Melting temperature
c. GC content
d. Secondary structure
   i. Self annealing
   ii. Self end annealing
   iii. Pair annealing
   iv. Pair end annealing
We are starting here
Technique: quantifying hybridization conditions
a. Primer length len(P)
Affects melting temperature and hybridization.
b. Melting temperature Tm(P)
Temperature at which the bonds between primer and gene sequence break:
$$T_m(p) = \frac{\Delta H(p)}{\Delta S(p) + R \ln(C_T/4)} - T_0 - t$$
$$\Delta H(p) = \sum_{i=1}^{n-1} \Delta H(p_i, p_{i+1}), \qquad \Delta S(p) = \sum_{i=1}^{n-1} \Delta S(p_i, p_{i+1})$$
p: primer of length n; ΔH(p): enthalpy; ΔS(p): entropy
R = 1.987 cal/(°C·mol), C_T = 50 × 10^-9 (primer concentration, mol/L)
T_0 = 273.15 (Kelvin-to-Celsius offset), t = 21.6 °C
c. GC content GC(P)
$$GC(p) = \frac{\#G \text{ in } p + \#C \text{ in } p}{|p|} \times 100$$
G-C pairs are more stable than A-T pairs (because of more H-bonds).
What is this measure good for?
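The GC(p) and Tm(p) formulas can be sketched in Python as follows, under stated assumptions. The nearest-neighbour tables nn_dh and nn_ds are parameters. The uniform PLACEHOLDER_DH / PLACEHOLDER_DS values in the demo are made-up numbers that only exercise the arithmetic; they are not real thermodynamic data, and a real implementation would load a published nearest-neighbour table. The sketch also assumes t is subtracted. That is consistent with the common salt correction 16.6·log10[Na+], which is about -21.6 °C at 50 mM. C_T = 50 nM is read from the slide.

```python
import math

R = 1.987        # gas constant in cal/(K*mol), value from the slide
C_T = 50e-9      # primer concentration, 50 nM (reconstructed from the slide)
T0 = 273.15      # Kelvin-to-Celsius offset
T_CORR = 21.6    # constant correction t from the slide, ASSUMED to be subtracted


def gc_content(p: str) -> float:
    """GC(p): percentage of G and C bases in primer p."""
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def melting_temperature(p: str, nn_dh: dict[str, float], nn_ds: dict[str, float]) -> float:
    """Tm(p) in degrees Celsius from nearest-neighbour sums.

    nn_dh maps a dinucleotide such as "GC" to its enthalpy (cal/mol) and
    nn_ds to its entropy (cal/(K*mol)); the caller supplies both tables.
    """
    steps = [p[i:i + 2] for i in range(len(p) - 1)]  # the n - 1 neighbour pairs
    dh = sum(nn_dh[s] for s in steps)
    ds = sum(nn_ds[s] for s in steps)
    return dh / (ds + R * math.log(C_T / 4)) - T0 - T_CORR


# PLACEHOLDER tables: the same made-up value for every dinucleotide.
# They are NOT real thermodynamic parameters.
PLACEHOLDER_DH = {a + b: -8000.0 for a in "ACGT" for b in "ACGT"}
PLACEHOLDER_DS = {a + b: -22.0 for a in "ACGT" for b in "ACGT"}

print(gc_content("GCGATA"))  # 50.0
print(melting_temperature("ACGTACGTACGTACGTACGT", PLACEHOLDER_DH, PLACEHOLDER_DS))
```

Because ΔH and ΔS are sums over neighbouring pairs, a longer primer with similar composition gets a higher Tm. This is why primer length and Tm are scored separately but are not independent.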
Technique: quantifying hybridization conditions
d. Secondary structure
Study how likely a primer is to anneal with itself or with another primer.
P = {p1, p2, …, pn}, Q = {q1, q2, …, qm}
Scoring function:
$$S(p_i, q_j) = \begin{cases} 2 & \text{if } \{p_i, q_j\} = \{A, T\} \\ 4 & \text{if } \{p_i, q_j\} = \{C, G\} \\ 0 & \text{otherwise} \end{cases}$$
Example (q1 aligned with position i of primer P):
P: ...AGCTTTAGCCATAG
Q:        TCTTAGGATCGC...
score S(pi, q1) = 2 + 4 + 2 + 2 + 4 = 14
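A short sketch of the scoring function S and of the summed score for one alignment. It reproduces the example: aligning q1 under the 5th visible base of P gives 14. The helper names pair_score and alignment_score are hypothetical, and i counts from the first visible base of P rather than from the true start of the primer.

```python
def pair_score(a: str, b: str) -> int:
    """S(p_i, q_j): 2 for an A/T pair, 4 for a C/G pair, 0 otherwise."""
    pair = {a, b}
    if pair == {"A", "T"}:
        return 2
    if pair == {"C", "G"}:
        return 4
    return 0


def alignment_score(p: str, q: str, i: int) -> int:
    """Sum of S over the overlap when q_1 sits under p_i (1-based i)."""
    offset = i - 1
    overlap = min(len(p) - offset, len(q))
    return sum(pair_score(p[offset + j], q[j]) for j in range(overlap))


p = "AGCTTTAGCCATAG"  # visible part of P in the example
q = "TCTTAGGATCGC"    # visible part of Q in the example
print(alignment_score(p, q, 5))  # 14
```

The five non-zero terms come from the A/T, C/G, T/A, A/T and G/C pairs in the overlap. All other aligned positions score 0.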
Technique: quantifying hybridization conditions
Four measures of secondary structure:
i. Self annealing, SA(P, P’)
• P’ = reverse of P
$$SA(p, p') = \max_{k = 1-m, \ldots, m-1} \sum_{i=1}^{m} s(p_i, p'_{i+k})$$
  (m = length of P; pairs with i + k outside 1..m score 0)
ii. Self end annealing, SEA(P, P’)
• Like self annealing
• k ≥ 0
• Only count the longest continuous overlaps
iii. Pair annealing, PA(P, Q)
• P and Q are the forward and reverse primers
iv. Pair end annealing, PEA(P, Q)
• Similar to self end annealing
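All four measures can be sketched with one shift-and-sum routine. The slides are terse, so this sketch makes several assumptions:
- Positions that fall off either strand score 0.
- "Only count the longest continuous overlaps" is read as scoring the longest run of consecutive non-zero pairs at each shift, with ties broken by the higher score.
- The k ≥ 0 restriction is applied literally to the shift between p_i and q_{i+k}.

All function names are hypothetical.

```python
def pair_score(a: str, b: str) -> int:
    """s(x, y): 2 for A/T, 4 for C/G, 0 otherwise (scoring from the slides)."""
    pair = {a, b}
    if pair == {"A", "T"}:
        return 2
    if pair == {"C", "G"}:
        return 4
    return 0


def shift_scores(p: str, q: str, k: int) -> list[int]:
    """Scores s(p_i, q_{i+k}) for every i where q_{i+k} exists."""
    return [pair_score(p[i], q[i + k]) for i in range(len(p)) if 0 <= i + k < len(q)]


def longest_run_score(scores: list[int]) -> int:
    """Score of the longest run of consecutive non-zero pairs (ties: higher score)."""
    best = run = (0, 0)  # (run length, run score), compared lexicographically
    for s in scores:
        run = (run[0] + 1, run[1] + s) if s > 0 else (0, 0)
        best = max(best, run)
    return best[1]


def annealing(p: str, q: str) -> int:
    """Max over every shift k of the summed pair scores."""
    return max(sum(shift_scores(p, q, k)) for k in range(1 - len(p), len(q)))


def end_annealing(p: str, q: str) -> int:
    """Only shifts k >= 0, and only the longest continuous overlap counts."""
    return max(longest_run_score(shift_scores(p, q, k)) for k in range(len(q)))


def sa(p: str) -> int:
    return annealing(p, p[::-1])       # P' = reverse of P


def sea(p: str) -> int:
    return end_annealing(p, p[::-1])


def pa(p: str, q: str) -> int:
    return annealing(p, q)


def pea(p: str, q: str) -> int:
    return end_annealing(p, q)


print(sa("GAATTC"))  # 16: the palindromic primer pairs with itself at every position
```

A high SA or SEA flags a primer likely to form self-dimers. A high PA or PEA flags a forward/reverse pair likely to form primer-dimers.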
Technique: How to apply the model
For PCR:
$$SC_{PCR}(p, q) = [\,len(p),\ GC(p),\ T_m(p),\ SA(p),\ SEA(p),\ len(q),\ GC(q),\ T_m(q),\ SA(q),\ SEA(q),\ PA(p, q),\ PEA(p, q)\,]$$
P is the forward primer, Q is the reverse primer.
Ideally there is no annealing, and the length, GC content and melting temperature of P equal those of Q:
$$SC_{PCR}^{ideal}(p) = [\,len_p,\ GC_p,\ T_{m,p},\ 0,\ 0,\ len_p,\ GC_p,\ T_{m,p},\ 0,\ 0,\ 0,\ 0\,]$$
$$w = [\,0.5,\ 1,\ 1,\ 0.1,\ 0.2,\ 0.5,\ 1,\ 1,\ 0.1,\ 0.2,\ 0.1,\ 0.2\,]$$
The optimization is:
$$\min_{p}\ l_{PCR}(p), \qquad l_{PCR}(p) = \left(SC_{PCR}(p, q) - SC_{PCR}^{ideal}(p)\right) w^T$$
For DNA chips (microarrays):
Q doesn’t exist, so there is no pair annealing to study. Only 5 terms are left.
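A sketch of the weighted linear cost l_PCR using the slide's weight vector w. One deliberate deviation: the slide's formula shows a plain difference, but this sketch takes absolute differences. That way a primer that is too short or too cool is penalized just like one that is too long or too hot; treat this as an assumption. The target values passed to ideal_vector and the measured vector are hypothetical numbers.

```python
# Weight vector w from the slide, in the order of the SC_PCR components:
# len(p), GC(p), Tm(p), SA(p), SEA(p), len(q), GC(q), Tm(q), SA(q), SEA(q), PA(p,q), PEA(p,q)
W_PCR = (0.5, 1.0, 1.0, 0.1, 0.2, 0.5, 1.0, 1.0, 0.1, 0.2, 0.1, 0.2)

# ASSUMPTION: the slides do not list microarray weights; reuse the first five.
W_CHIP = W_PCR[:5]


def ideal_vector(len_target: float, gc_target: float, tm_target: float) -> tuple[float, ...]:
    """SC_PCR^ideal: same length/GC/Tm targets for P and Q, zero for every annealing term."""
    return (len_target, gc_target, tm_target, 0, 0,
            len_target, gc_target, tm_target, 0, 0, 0, 0)


def weighted_cost(sc, ideal, w=W_PCR) -> float:
    """sum_k w_k * |SC_k - ideal_k| (absolute value is an assumption)."""
    if not len(sc) == len(ideal) == len(w):
        raise ValueError("SC, ideal and w must have the same length")
    return sum(wk * abs(s - t) for s, t, wk in zip(sc, ideal, w))


ideal = ideal_vector(20, 50.0, 60.0)                       # hypothetical targets
sc = (22, 50.0, 60.0, 10, 4, 20, 45.0, 58.0, 0, 0, 6, 0)  # hypothetical measurements
print(weighted_cost(sc, ideal))
print(weighted_cost(sc[:5], ideal[:5], W_CHIP))            # microarray: 5 terms only
```

The small weights on the annealing terms (0.1 and 0.2) mean a few points of GC or Tm mismatch outweigh a moderate annealing score. This is the kind of trade-off the observations slide questions.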
Technique: parallelize SCPCR(p,q) calculation
Compute PA and PEA in parallel.
Calculate len, GC, Tm, SA and SEA in parallel.
Technique: details
Melting temperature and GC content:
• Simple adder + divider
• Use pipelining
  • 1st primer: O(m)
  • Each subsequent primer: O(1)
Example:
Whole window: AGCGATATA
i-th P primer: GCGATA
(i+1)-th P primer: CGATAT
• GC(P_{i+1}) = GC(P_i) - 1
• ΔH(P_{i+1}) = ΔH(P_i) - ΔH(GC) + ΔH(AT)
• Similar for ΔS
Annealing matrix (P = abc against Q = def; each row lists the base pairs compared at one shift):
cd
bd ce
ad be cf
ae bf
af
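The pipelined update can be sketched in software as a sliding window. The first primer costs O(m), and every later primer needs only one subtraction and one addition. In the sketch, nn is a hypothetical dinucleotide table standing in for the ΔH or ΔS values, and the function names are illustrative.

```python
def sliding_gc_counts(window: str, length: int) -> list[int]:
    """G+C count of every primer of the given length, updated in O(1) per step."""
    count = sum(b in "GC" for b in window[:length])  # first primer: O(length)
    counts = [count]
    for i in range(1, len(window) - length + 1):
        count -= window[i - 1] in "GC"               # base leaving on the left
        count += window[i + length - 1] in "GC"      # base entering on the right
        counts.append(count)
    return counts


def sliding_nn_sums(window: str, length: int, nn: dict[str, float]) -> list[float]:
    """Nearest-neighbour sum (e.g. dH) of every primer, updated in O(1) per step."""
    total = sum(nn[window[j:j + 2]] for j in range(length - 1))
    sums = [total]
    for i in range(1, len(window) - length + 1):
        total -= nn[window[i - 1:i + 1]]                 # dinucleotide leaving
        total += nn[window[i + length - 2:i + length]]   # dinucleotide entering
        sums.append(total)
    return sums


print(sliding_gc_counts("AGCGATATA", 6))  # [3, 3, 2, 1]
```

Going from GCGATA to CGATAT, the count drops by one, and the ΔH update removes the "GC" step and adds the "AT" step, as in the example.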
Complexity
Complexity for the sequential algorithm:
For PCR:
Number of choices of P (window size W_p): $S = \sum_{l=m_p}^{n_p} (W_p - l + 1)$
Number of choices of Q (window size W_q): $T = \sum_{l=m_q}^{n_q} (W_q - l + 1)$
Each distance SC_PCR(P, Q): $O(l_p^2 + l_q^2 + l_p l_q)$
Total: $O\left(S \cdot T \cdot (W_p^2 + W_q^2 + W_p W_q)\right)$
Complexity for the parallel algorithm:
For PCR:
Distance measure SC_PCR(P, Q) = O(1)
Total: O(S·T)
O(S·S·T·T) is a typo in the paper
Similar but simpler for microarrays
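A sequential sketch of the exhaustive search whose cost the complexity analysis counts. It evaluates the cost once per (P, Q) pair, S·T times in total. For brevity both windows use the same length range m..n. The windows are hypothetical, and toy_cost is a hypothetical stand-in for SC_PCR, not the paper's measure.

```python
from itertools import product
from typing import Callable, Optional


def candidates(window: str, m: int, n: int) -> list[str]:
    """All primers of length m..n in the window (S or T of them)."""
    return [window[s:s + l] for l in range(m, n + 1) for s in range(len(window) - l + 1)]


def exhaustive_search(win_p: str, win_q: str, m: int, n: int,
                      cost: Callable[[str, str], float]):
    """Evaluate `cost` once per (P, Q) pair and keep the cheapest pair."""
    best: Optional[tuple[float, str, str]] = None
    evaluations = 0
    for p, q in product(candidates(win_p, m, n), candidates(win_q, m, n)):
        c = cost(p, q)
        evaluations += 1
        if best is None or c < best[0]:
            best = (c, p, q)
    return best, evaluations


def gc_count(s: str) -> int:
    return s.count("G") + s.count("C")


def toy_cost(p: str, q: str) -> float:
    """HYPOTHETICAL stand-in for SC_PCR: penalize length and G+C count mismatch."""
    return abs(len(p) - len(q)) + abs(gc_count(p) - gc_count(q))


best, evals = exhaustive_search("AGCGATATAGCCATAG", "CTATGGCTATATCGCT", 5, 8, toy_cost)
print(evals)  # 1764 = 42 * 42 = S * T
print(best)
```

In software each cost evaluation is itself O(l_p^2 + l_q^2 + l_p l_q). The parallel hardware makes each evaluation O(1), which leaves the S·T loop as the total cost.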
Evaluation
Experimental environment:
512 primer pairs, |Wp| = |Wq| = 16
1. 500 MHz Celeron system with integrated hardware accelerator
2. Software implementation
Evaluation results:
1920 s for the software implementation
3.41 s using the hardware accelerator (about 563 times faster)
Related Work
Previous approach
DOPRIMER
Same computational model
Differs in how the dynamic programming is carried out
Sequential in nature
Other primer selection software
E.g.: Primer Premier 5, Primer3, PrimerGen, PrimerDesign
Similarities:
Criteria: Length, Temp range, GC range, GC Clamp, 3’ end stability,
uniqueness of 3’ end base, Dimer/hairpins, Degeneracy, Salt
concentration, Annealing Oligo Concentration, etc.
Differences:
Not a weighted linear sum of all criteria
Require substantial expert supervision;
the numerical criteria are used only as a guide
More Related Works
Case study
Burpo did a critical review of PCR primer design algorithms
Subject: Saccharomyces cerevisiae deletion strains
Conclusion:
no suitable program for the task of post-design PCR analysis,
especially for accurately predicting non-specific
hybridization events that impair PCR amplification.
Observations
My observations:
Minus side:
Is the computational model too simplistic?
Specifically, is a weighted linear sum justified?
Plus side:
The design of the parallel architecture is neat.
Since primers are typically 18-22 bases long,
current technology can certainly handle it.
When would you need fast primer selection?
Primer walking to connect contigs together quickly
To scan through a large number of sequences for possible
primers