MOLECULAR MARKER TECHNOLOGIES

Training Workshop on Forest Biodiversity
5-16 June 2006
Lee Soon Leong
Forest Research Institute Malaysia
Outline
• Organization and flow of genetic information
• Molecular techniques to reveal genetic variation
• Types of molecular markers
• Which marker for what purpose
• Microsatellite marker
• Case study 1: using microsatellites to estimate gene flow via pollen
• Case study 2: using microsatellites for individual-specific DNA fingerprints
FLOW OF GENETIC INFORMATION
Deoxyribonucleic Acid (DNA):
The molecule that encodes
genetic information
A pairs with T
C pairs with G
DNA molecule consists of
two strands that wrap
around each other to
resemble a twisted ladder
Nuclear DNA: Diploid; biparentally inherited; recombination occurs. It can be viewed as a
huge ocean of largely nongenic DNA, with some tens of thousands of genes and gene
clusters scattered around like small islands and archipelagos. A high proportion of this
apparently nonfunctional DNA consists of repeated motifs and may be considered
junk DNA or selfish DNA
Chloroplast DNA: Haploid; usually maternally inherited in
angiosperms and paternally inherited in gymnosperms; typically
135 to 160 kb in size; packed with genes and thus resembling the
streamlined configuration of its cyanobacterial ancestral genome
Mitochondrial DNA: Haploid; typically maternally inherited; about 370 to 490
kb. About 10% of these sequences represent genes, and another 10 to 26% are
made up of repetitive DNA, including retrotransposons. Thus, the
majority of plant mtDNA sequences lack any obvious features of information
• An organism's genomic DNA is subject to
mutation as a result of normal cellular operations or
interactions with the environment
• The rate of mutation depends on:
   • Biology of the organism
   • Genome under consideration
   • Type of mutation
• Mutations in genomic DNA can be classified into
several categories (original sequence first, mutated sequence second):
Base substitution
GATCCGAGTATCGCAATTAGCA
GATCCGAGTGTCGCAATTAGCA
Deletion
GATCCGAGTATCGCAATTAGCA
GATCCGAGTAATTAGCA
Insertion
GATCCGAGTATCGCAATTAGCA
GATCCGAGTATCGCAGCATTAGCA
Duplication
GATCCGAGTATCGCAATTAGCA
GATCCGAGTATCTCGCAATTAGCA
Inversion
GATCCGAGTATCGCAATTAGCA
GATGCCAGTATCGCAATTAGCA
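The five mutation categories can be reproduced with simple string operations. The following is a minimal sketch, not part of the original slides: the function names and the 0-based positions are assumptions chosen so that each result matches the mutated sequence shown for that category.

```python
REF = "GATCCGAGTATCGCAATTAGCA"  # reference sequence used on the slide

def substitute(seq, pos, base):
    """Replace the base at 0-based position `pos`."""
    return seq[:pos] + base + seq[pos + 1:]

def delete(seq, start, end):
    """Remove the bases in the half-open interval [start, end)."""
    return seq[:start] + seq[end:]

def insert(seq, pos, bases):
    """Insert `bases` before 0-based position `pos`."""
    return seq[:pos] + bases + seq[pos:]

def duplicate(seq, start, end):
    """Repeat the segment [start, end) immediately after itself."""
    return seq[:end] + seq[start:end] + seq[end:]

def invert(seq, start, end):
    """Reverse the base order of [start, end), as drawn on the slide.
    Note: this is a plain reversal, not a reverse complement."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]

mutants = {
    "base substitution": substitute(REF, 9, "G"),
    "deletion": delete(REF, 10, 15),
    "insertion": insert(REF, 15, "GC"),
    "duplication": duplicate(REF, 10, 12),
    "inversion": invert(REF, 3, 6),
}

for name, seq in mutants.items():
    print(f"{name:18s} {seq}")
```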
Through long evolutionary accumulation, many
different instances of mutation as mentioned above
should exist in any given species
The number and degree of the various types of
mutations define the genetic diversity within a
species
It has been widely recognized that loss of genetic
diversity is a major threat for the maintenance and
adaptive potential of species
Low genetic diversity
• Example: if genetic diversity is low, when a virulent
form of a disease arises, many individuals may be
susceptible and die
[Figure: population of 20 individuals, all susceptible (S); all die]
High genetic diversity
• But as a result of natural genetic diversity within local
plant populations, there may be some individuals that are
at least partially resistant and are therefore able to survive
and thus perpetuate the species
[Figure: population of 20 individuals, 6 partially resistant (R) and 14 susceptible (S)]
• For many plant species, ex situ and in situ
conservation strategies have been developed to
safeguard the extant genetic diversity
• To manage this genetic diversity effectively, the
ability to identify genetic diversity is
indispensable
• In addition, for this variation to be useful, it must
be heritable and discernible, either as recognizable
phenotypic variation or as genetic mutation
distinguishable through molecular marker
technologies
Definition of molecular markers
A sequence of DNA or protein that can be screened
to reveal key attributes of its state or composition
and thus used to reveal genetic variation
• Mutation gives rise to genetic variation at the DNA level,
which can be detected with DNA markers
• Subsequently, genetic variation at the DNA level causes
variation at the protein level, which can be detected with
protein markers
• Four major molecular techniques are commonly
applied to reveal genetic variation. These are:
   • Polymerase chain reaction (PCR)
   • Electrophoresis
   • Hybridization
   • DNA sequencing
POLYMERASE CHAIN REACTION
PCR is a procedure used to amplify (make multiple copies of)
a specific sequence of DNA
The method was invented by Kary Banks Mullis in 1983, for
which he received the Nobel Prize in Chemistry ten years later
Each PCR cycle consists of three temperature-controlled steps:
denaturation, primer annealing and extension
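The amplification achieved by PCR follows from repeated doubling. As a rough sketch (the function name and the `efficiency` parameter are assumptions for illustration, not part of the slides), the number of copies after n cycles is approximately N0 × (1 + E)^n, where E = 1 corresponds to perfect doubling in every cycle:

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Approximate copy number after `cycles` PCR cycles.

    efficiency = 1.0 means every template is copied in every cycle
    (perfect doubling); real reactions are usually somewhat below this.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles

# One template molecule after 30 ideal cycles, and at 90% efficiency
print(pcr_copies(1, 30))
print(pcr_copies(1, 30, efficiency=0.9))
```

The model ignores the plateau phase that real reactions reach once primers and nucleotides become limiting.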
ELECTROPHORESIS
The term 'electrophoresis' literally means "to carry with
electricity"
Technique for separating the components of a mixture of
charged molecules (proteins, DNAs, or RNAs) in an electric
field within a gel or other support
Migration rate depends on electrical charge and size
HYBRIDIZATION
One of the most commonly used
nucleic acid hybridization techniques
is Southern blot hybridization
Southern blotting was named after
Edward M. Southern, who developed
this procedure at the University of
Edinburgh in 1975
SEQUENCING
The process of determining the order of the nucleotide bases
along a DNA strand is called sequencing
In 1977, 24 years after the discovery of the structure of DNA,
two separate methods for sequencing DNA were developed: the
chain termination method (Sanger) and the chemical degradation
method (Maxam and Gilbert)
Principle: single-stranded DNA molecules that differ in length
by just a single nucleotide can be separated from one another
using polyacrylamide gel electrophoresis (PAGE)
Chain elongation proceeds until, by chance, DNA polymerase
inserts a dideoxynucleotide, blocking further elongation
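The chain termination principle can be illustrated with a toy simulation. This is a simplified sketch under stated assumptions: every possible termination product is assumed to be present, the four dideoxynucleotide reactions are modelled as four "lanes", and the function names and example sequence are hypothetical.

```python
def termination_fragments(new_strand):
    """For each base, list the lengths of fragments that end with that
    dideoxynucleotide (one 'lane' per base)."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read the sequence by ordering fragments from shortest to longest,
    since fragments differing by one nucleotide resolve on the gel."""
    by_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))

newly_synthesised = "GATCCGAG"  # hypothetical newly synthesised strand
lanes = termination_fragments(newly_synthesised)
print(lanes)
print(read_gel(lanes))
```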
Recent detection techniques
TaqMan – a probe used to detect specific sequences in PCR
products by employing 5’ to 3’ exonuclease activity of the Taq
DNA polymerase
Pyrosequencing – refers to sequencing by synthesis, a simple
to use technique for accurate analysis of DNA sequences
Microarray Technology – a high throughput screening
technique based on the hybridization between oligonucleotide
probes (genomic DNA or cDNA) and either DNA or mRNA
TYPES OF MOLECULAR MARKERS
• Due to rapid developments in the field of molecular genetics, a
variety of molecular markers has emerged during the last few
decades
Traditional marker systems
• Biochemical marker: Allozyme
• Non-PCR based marker: RFLP, Minisatellite (VNTR)
PCR generation (in vitro DNA amplification)
• PCR based marker: Microsatellite, RAPD, AFLP, CAPS
(PCR-RFLP), ISSR, SSCP, SCAR, SNP, etc.
Allozyme (biochemical marker)
• Alternative forms of a particular protein, visualized on a gel
as bands of different mobility. Polymorphism arises when a mutation
replaces an amino acid and thereby alters the net electric charge
of the protein
Technique: Electrophoresis and enzyme staining
RFLP (Non-PCR based marker)
• Targets variation in DNA restriction sites and in DNA restriction
fragments. Sequence variation affecting the occurrence (absence
or presence) of endonuclease recognition sites is considered to
be the main cause of length polymorphisms
Techniques: Electrophoresis and hybridization
RAPD (PCR-based marker)
Uses primers of random sequence to amplify DNA fragments by
PCR. Polymorphisms are considered to be primarily due to variation
in the primer annealing sites, but they can also be generated by
length differences in the amplified sequence between primer
annealing sites
Techniques: PCR and Electrophoresis
AFLP (PCR-based marker)
• A variant of RAPD. Following restriction enzyme digestion of
DNA, a subset of DNA fragments is selected for PCR amplification
and visualization
Techniques: PCR and Electrophoresis
Microsatellite (PCR based marker)
• Targets tandem repeats of a small (1-6 base pairs) nucleotide
repeat motif. Polymorphism due to the number of tandem
repeats
[Figure: capillary electropherograms of a microsatellite locus in seven samples (12_08.fsa to 18_03.fsa, green dye). x-axis: fragment size (bp); y-axis: peak height. Allele peaks fall between 153.01 and 165.12 bp]
Techniques: PCR and Electrophoresis
Other markers
• Cleaved Amplified Polymorphic Sequence (CAPS/PCR-RFLP)
• Inter Simple Sequence Repeat (ISSR)
• Single-strand conformation Polymorphism (SSCP)
• Sequence Characterized Amplified Region (SCAR)
More recent markers
• Single-Nucleotide Polymorphism (SNP)
• Retrotransposon-based markers
   • Sequence-Specific Amplified Polymorphism (S-SAP)
   • Inter-retrotransposon Amplified Polymorphism (IRAP)
   • Retrotransposon-Microsatellite Amplified Polymorphism (REMAP)
   • Retrotransposon-Based Insertional Polymorphism (RBIP)
Weising, K., Nybom, H., Wolff, K. and Kahl, G. 2005. DNA
Fingerprinting in Plants: Principles, Methods, and Applications. 2nd
Edition. CRC Press, Boca Raton, Florida, USA.
Spooner, D., van Treuren, R. and de Vicente, M.C. 2005.
Molecular markers for genebank management. IPGRI Technical
Bulletin No. 10. International Plant Genetic Resources Institute,
Rome, Italy.
Henry, R.J. 2001. Plant Genotyping: The DNA Fingerprinting of
Plants. CAB International Publishing, Wallingford, U.K.
Markers differ with respect to important
features:
• Genomic abundance
• Polymorphism level
• Locus specificity
• Reproducibility
• Technical requirements
• Financial investment
• Codominance or dominance
Codominant marker:
A marker in which both alleles are
expressed, thus heterozygous individuals
can be distinguished from either
homozygous state
Dominant marker:
A marker that shows dominant inheritance, with
homozygous dominant individuals
indistinguishable from heterozygous
individuals
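The practical difference between the two marker classes can be sketched as follows. This is an illustration only: the genotype notation ("A" for the allele that produces a band, "a" for the alternative allele) and the function names are assumptions.

```python
def codominant_score(genotype):
    """Alleles visible on the gel: both alleles of a heterozygote are seen."""
    return tuple(sorted(set(genotype)))

def dominant_score(genotype, band_allele="A"):
    """Band presence (1) or absence (0); allele dosage is not visible."""
    return 1 if band_allele in genotype else 0

for genotype in ("AA", "Aa", "aa"):
    print(genotype, codominant_score(genotype), dominant_score(genotype))
```

With the codominant scoring all three genotypes give different patterns, whereas the dominant scoring gives the same result for "AA" and "Aa".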
None of the available techniques is superior to all others across a
wide range of applications; the key question is rather
which marker to use in which situation
• Within and among population variation – Allozyme, SSR, AFLP and RAPD
• Mating system study – Allozyme or microsatellite
• Estimating gene flow via pollen and seed – Microsatellite (SSR)
• Phylogeography – cpSSR
• Clonal identification – AFLP or RAPD
• Polyploidy – multilocus dominant marker (AFLP)
• Genetic Linkage Mapping – AFLP, RAPD, Allozyme, RFLP, SSR, CAPS, SNP
• Phylogenetic study – regions conserved within species (DNA sequencing)
   • Intraspecific (among individuals) – markers should target less conserved regions
   • Interspecific (among species) – markers should target more conserved regions
• A framework for selecting appropriate
techniques for plant genetic resources
conservation is given in:
Karp, A., Kresovich, S., Bhat, K.V., Ayad, W.G. and Hodgkin, T.
1997. Molecular Tools in Plant Genetic Resources Conservation: A
Guide to the Technologies. IPGRI Technical Bulletin No. 2.
International Plant Genetic Resources Institute, Rome, Italy
Microsatellite marker
• What are microsatellites?
• Where are microsatellites found?
• How do microsatellites mutate?
• Abundance in genome
• Why do microsatellites exist?
• Models of mutation
• Development of microsatellite primers
• Genotyping procedure
• Advantages
• Disadvantages
• Applications
What are microsatellites?
• Tandemly repeated sequences with a repeat motif of 1-6 bp
   • Dinucleotide: (CT)6 = CTCTCTCTCTCT
   • Trinucleotide: (CTG)4 = CTGCTGCTGCTG
   • Tetranucleotide: (ACTC)4 = ACTCACTCACTCACTC
• Synonymous with simple sequence repeat (SSR) and short tandem repeat (STR).
Depending on the nature of the repeat tract, SSRs can be further divided into four
categories:
   • Perfect SSR, when the repeat tract is pure for one motif: CTCTCTCTCTCT
   • Compound SSR, when the repeat tract is pure for two motifs: CTCTCTCACACA
   • Imperfect SSR, if the tract contains a single base substitution: CTCTCTACTCTCT
   • Region of cryptic simplicity, if the structure is complex but repetitive: GTGTCACAGAGT
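Perfect repeat tracts of this kind can be located with a regular expression. The sketch below is illustrative rather than a published SSR-mining tool: the function name, the default threshold of four repeats and the restriction to the bases A, C, G and T are assumptions.

```python
import re

def find_ssrs(sequence, min_repeats=4, max_motif=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    The lazy quantifier makes the shortest motif win, so CTCTCTCT is
    reported as (CT)4 rather than (CTCT)2.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

print(find_ssrs("GGCTCTCTCTCTCTAA"))
print(find_ssrs("CTCTCTCACACA", min_repeats=3))
```

The second call shows how a compound SSR appears as two adjacent perfect tracts; imperfect SSRs and regions of cryptic simplicity would need a more tolerant search.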
Where are microsatellites found?
Majority are in non-coding regions
How do microsatellites mutate?
• Microsatellite alleles change rather quickly over time
   • E. coli – 10^-2 events per locus per replication
   • Drosophila – 6 × 10^-6 events per locus per generation
   • Human – 10^-3 events per locus per generation
• Mutation mechanisms: DNA polymerase slippage and unequal crossing over
Abundance in genome
• Microsatellites have been found in every organism
studied so far
• Most frequent in human > insect > plant > yeast >
nematode
• Most common dinucleotide:
   • Human: CA/GT
   • Conifer: GA/CT & CA/GT
   • Dipterocarp: GA/CT
Why do microsatellites exist?
• The majority are found in non-coding regions and are thought
to be under no selective pressure; "junk" DNA?
• They may regulate gene expression and protein function,
e.g., human diseases caused by expansions of
polymorphic trinucleotide repeats in genes (fragile X
and myotonic dystrophy)
• In plants, a high density of SSRs was found in close
proximity to coding regions, suggesting regulatory properties
• High level of polymorphism; a necessary source of
genetic variation
Models of Mutation
• Several statistics based on estimates of allele frequencies
(e.g., Fst & Rst) rely explicitly on a mutation model
• The mutation model is still unclear, but stepwise mutation
appears to be the dominant force creating new alleles in
the few model organisms studied to date
   • Stepwise Mutation Model (SMM): when SSRs mutate,
   they gain or lose only one repeat
   • Two alleles that differ by one repeat are more closely
   related than alleles that differ by many repeats
     CTCTCT
     CTCTCTCT
     CTCTCTCTCT
• Size matters when doing statistical tests of population
substructuring
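Under the stepwise mutation model, the distance between two alleles is naturally expressed in repeat units. A minimal sketch, assuming a dinucleotide motif and using the three example alleles above (the function name is hypothetical):

```python
def repeat_difference(size_a_bp, size_b_bp, motif_length_bp=2):
    """Number of repeat units separating two alleles of one locus."""
    diff = abs(size_a_bp - size_b_bp)
    if diff % motif_length_bp:
        raise ValueError("sizes do not differ by a whole number of repeats")
    return diff // motif_length_bp

alleles = {"(CT)3": len("CTCTCT"),
           "(CT)4": len("CTCTCTCT"),
           "(CT)5": len("CTCTCTCTCT")}

# (CT)3 is one step from (CT)4 but two steps from (CT)5
print(repeat_difference(alleles["(CT)3"], alleles["(CT)4"]))
print(repeat_difference(alleles["(CT)3"], alleles["(CT)5"]))
```

Statistics that assume stepwise mutation, such as Rst, build on differences in allele size of this kind, whereas statistics such as Fst treat all distinct alleles as equally different.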
Development of microsatellite primers
• Can be time consuming and expensive. Microsatellites may be
obtained by screening sequences in databases or
screening libraries of clones
• Standard method to isolate microsatellites from clones:
   1. Creation of a small-insert genomic library
   2. Library screening by hybridization
   3. DNA sequencing of positive clones
   4. Primer design and PCR analysis
   5. Identification of polymorphisms
• This approach can be extremely tedious and inefficient
for species with low microsatellite frequencies
• Alternative strategies to overcome this:
   • Selective hybridization using nylon membrane
   • Selective hybridization using streptavidin-coated beads
   • RAPD based
   • Primer extension
Genotyping procedure
1. PCR
2. Electrophoresis: agarose, PAGE, denaturing PAGE or capillary
3. Visualization: silver staining, SybrGreen staining, autoradiography or fluorescent dyes
• The use of fluorescently labeled primers, combined with an
automated electrophoresis system, greatly simplifies
the analysis of microsatellite allele sizes
[Figure: multiplexed capillary electropherograms; four primer pairs (Primer1 to Primer4) amplify Locus 1 to Locus 4, detected with blue and green dyes. x-axis: fragment size (bp); y-axis: peak height]
[Figure: electropherograms of six samples at one locus (yellow dye) with genotypes 120/120, 122/122, 120/122, 120/124, 120/126 and 120/128]
• Stutter: numerous bands that differ in size by 2 bp, caused by slippage of DNA polymerase
• Extra A: non-templated addition of an extra A to the 3’ end of PCR products
[Figure: further electropherogram panels for additional samples (blue, green and yellow dyes)]
Advantages
• Low quantities of template DNA required (10-100 ng)
• High genomic abundance
• Random distribution throughout the genome
• High level of polymorphism
• Band profiles can be interpreted in terms of loci and alleles
• Codominance of alleles
• Allele sizes can be determined with an accuracy of 1 bp,
allowing accurate comparison across different gels
• High reproducibility
• Different SSRs may be multiplexed in PCR or on gel
• Wide range of applications
• Amenable to automation
Disadvantages
• High development costs if primers are not yet
available. Primers might be species specific
• Heterozygotes may be misclassified as homozygotes when
null alleles occur due to mutation in the primer annealing
sites
• Stutter bands on gels may complicate accurate scoring of
polymorphisms
• Underlying mutation model (infinite alleles model or
stepwise mutation model) largely unknown
• Homoplasy due to different forward and backward
mutations may cause genetic divergence to be underestimated
Applications
Generally, the high mutation rate makes them informative and
suitable for intraspecific studies but unsuitable for studies
involving higher taxonomic levels
• Population genetics: investigations within a genus of centers
of origin, genetic diversity, population structures and
relationships among species
• Parentage analysis: seed orchard monitoring, mating systems
and gene flow via pollen & seed
• Fingerprinting: clone confirmation and individual-specific
fingerprints
• Genome mapping: constructing full coverage or QTL maps
• Comparative mapping: genome structure, framework maps,
or transferring trait and marker data among species
Case study 1: Using microsatellites to estimate gene flow via pollen
• Pollen flow distance?
• Outcrossing rate?
• Effective breeding unit?
Shorea leprosula
Shorea parvifolia
Methodology
Sample collection → DNA extraction → SSR development → SSR analysis → Data analysis
Data analysis
1. Gene flow: exclusion and likelihood approaches
2. Effective breeding unit: Nason et al. (1998)
3. Model of pollen dispersal to get maximum pollen
flow distance
Microsatellite Loci
No. of clones sequenced | No. of clones with SSR (%) | No. of unique SSR clones (%) | Core sequence (no. of clones; % & repeat times)
624 | 592 (94.9) | 315 (53.2) | CT/GA (266; 84.4 & 6-78); GT/CA (29; 9.2 & 8-46); Others (20; 6.4 & 6-40)

Locus | Primer sequence (5’–3’) | Repeat motif | Length | N | Size range | He | PIC
lep074a | F: ATC ACC AAG TAC CTA TCA TCA; R: GCA ATG GCA CAC AGT CTA TC | (CT)11 | 124 | 11 | 110-130 | 0.824 | 0.791
lep079 | F: GTT GTC TGT TCT TAC CAG GAA G; R: GCA TAA GTA TCG TCG CCA | (CT)11 | 162 | 13 | 155-198 | 0.830 | 0.798
lep111a | F: GGA AAC TAC TGG AGC AGA GAC; R: GGT GGG TTA TGG AGA ATG AG | (GA)14 | 152 | 12 | 138-154 | 0.855 | 0.821
lep118 | F: AAA GCG TAC AAA TTC ATC A; R: CTA TTG GTT GGG TCA GAA GG | (GA)16 | 170 | 15 | 145-176 | 0.892 | 0.861
lep280 | F: GCA ACT AAA ATG GAC CAG A; R: GAG TAA GGT GGC AGA TAT AGA G | (CT)7 | 119 | 11 | 107-137 | 0.851 | 0.816
lep384 | F: CCA AGA CAA CTC AAT CCT CA; R: AGA TGA AGG TGT TGC TGT G | (CT)13 | 206 | 14 | 191-219 | 0.657 | 0.632
lep562 | F: TGA TTT GGG TGG TTG TAG; R: TAT TAC ATT TTT CAA GTC AAG TC | (GT)8 | 164 | 12 | 154-180 | 0.883 | 0.852
Lee, S.L. et al. 2004. Isolation and characterization of 21 microsatellite loci in an important tropical
tree Shorea leprosula and their applicability to S. parvifolia. Molecular Ecology Notes 4: 222-225
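The He and PIC columns are both functions of the allele frequencies at a locus. The sketch below uses the common textbook definitions, He = 1 - Σp_i² and the polymorphic information content of Botstein et al. (1980); whether the cited paper used exactly these estimators (for example, with a small-sample correction) is an assumption, and the allele frequencies are invented for illustration.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphic information content:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_terms = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_terms

# Hypothetical allele frequencies at one locus (must sum to 1)
freqs = [0.4, 0.3, 0.2, 0.1]
print(round(expected_heterozygosity(freqs), 3), round(pic(freqs), 3))
```

PIC never exceeds He under these definitions, which agrees with the pattern in the table where every PIC value is slightly below the corresponding He.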
50 ha demographic plot in Pasoh Forest Reserve
[Figure: map of the 50-ha plot; both axes: distance (m), 0-1000 by 0-500]
Pasoh Forest Reserve 50-ha plot: 190 individuals of S. leprosula and 102
of S. parvifolia ≥ 27 cm dbh within the plot
• Shorea leprosula – 9 loci (Pe = 0.999)
   • lep074a, lep384, lep111a, lep118, lep280,
   lep267, lep294, lep475 & lep562
   • PCR (500 x 9 = 4500 reactions)
• Shorea parvifolia – 6 loci (Pe = 0.999)
   • lep074a, lep384, lep111a, lep118, lep280 &
   lep294
   • PCR (360 x 6 = 2160 reactions)
[Figure: S. leprosula (SL48); map of the 50-ha plot (1000 m x 500 m) showing mother tree MT48]
[Figure: S. parvifolia (SP35); map of the 50-ha plot (1000 m x 500 m) showing mother tree MT35]
Mother tree (no. of seed analyzed) | Mean distance between MT | % outcrossing (no. of seed) | % pollen outside plot (no. of seed) | Mean pollen flow distance (m)
Shorea leprosula
SL048 (45) | 267.1 ± 136.2 | 93.3 (42) | 20.0 (9) | 152.9 ± 99.6
SL062 (44) | 363.2 ± 151.6 | 88.6 (39) | 20.5 (9) | 302.6 ± 188.9
SL074 (48) | 259.2 ± 151.2 | 85.4 (41) | 18.8 (9) | 148.6 ± 187.2
SL075 (43) | 292.6 ± 145.8 | 67.4 (29) | 18.6 (8) | 173.1 ± 103.8
SL084 (46) | 512.6 ± 228.3 | 82.6 (38) | 23.9 (11) | 448.2 ± 245.3
SL109 (45) | 343.7 ± 158.8 | 95.6 (43) | 33.3 (15) | 285.0 ± 154.5
SL160 (44) | 567.1 ± 243.1 | 81.8 (36) | 31.8 (14) | 580.3 ± 288.4
Mean | 372.2 ± 121.6 | 85.0 ± 9.3 | 23.8 ± 6.2 | 298.7 ± 164.0
Shorea parvifolia
SP009 (32) | 309.0 ± 166.5 | 59.4 (19) | 9.4 (3) | 61.9 ± 100.5
SP014 (48) | 307.7 ± 165.1 | 62.5 (30) | 14.6 (7) | 105.1 ± 140.9
SP020 (42) | 348.7 ± 172.2 | 85.6 (36) | 33.3 (14) | 194.0 ± 146.7
SP022 (47) | 239.6 ± 133.2 | 72.3 (34) | 21.3 (10) | 148.2 ± 125.0
SP025 (46) | 376.2 ± 192.4 | 56.5 (26) | 19.6 (9) | 317.1 ± 277.0
SP035 (44) | 244.2 ± 139.9 | 22.7 (10) | 2.3 (1) | 185.0 ± 159.7
Mean | 304.2 ± 54.7 | 59.8 ± 21.1 | 16.8 ± 10.7 | 168.6 ± 88.1
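The outcrossing percentages follow directly from the seed counts. As a quick check (a sketch that assumes the reported "±" value is the sample standard deviation across mother trees), the S. leprosula figures can be recomputed from the counts of seeds analysed and outcrossed seeds:

```python
from statistics import mean, stdev

# Mother tree: (seeds analysed, outcrossed seeds), S. leprosula rows
seed_counts = {
    "SL048": (45, 42), "SL062": (44, 39), "SL074": (48, 41),
    "SL075": (43, 29), "SL084": (46, 38), "SL109": (45, 43),
    "SL160": (44, 36),
}

rates = [100.0 * outcrossed / analysed
         for analysed, outcrossed in seed_counts.values()]

for tree, rate in zip(seed_counts, rates):
    print(f"{tree}: {rate:.1f}% outcrossing")

# Mean and sample standard deviation across mother trees
print(f"Mean: {mean(rates):.1f} +/- {stdev(rates):.1f}")
```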
Breeding unit parameters
Mother tree (no. of seed analyzed) | Size (individual) | Area (ha) | Radius (m)
Shorea leprosula
SL048 (45) | 203.6 | 63.6 | 450.1
SL062 (44) | 208.0 | 65.0 | 454.9
SL074 (48) | 205.0 | 64.1 | 451.6
SL075 (43) | 221.0 | 69.0 | 468.8
SL084 (46) | 225.2 | 70.4 | 473.3
SL109 (45) | 245.7 | 76.8 | 494.4
SL160 (44) | 261.8 | 81.8 | 510.3
Mean | 224.3 ± 22.1 | 70.1 ± 6.9 | 471.9 ± 23.0
Shorea parvifolia
SP009 (32) | 81.9 | 59.4 | 434.7
SP014 (48) | 90.0 | 65.2 | 455.6
SP020 (42) | 112.9 | 81.8 | 510.3
SP022 (47) | 97.8 | 70.8 | 474.8
SP025 (46) | 105.5 | 76.5 | 493.4
SP035 (44) | 76.7 | 55.6 | 420.5
Mean | 94.1 ± 13.9 | 68.2 ± 10.1 | 464.9 ± 34.5
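The area and radius columns are consistent with a circular breeding unit, where radius = sqrt(area / π). This reading is an assumption inferred from the tabulated values, not something stated in the slides; the short check below converts hectares to square metres first.

```python
import math

def breeding_unit_radius_m(area_ha):
    """Radius (m) of a circle with the given area in hectares."""
    return math.sqrt(area_ha * 10_000 / math.pi)

# Areas (ha) and reported radii (m) for two of the tabulated mother trees
for tree, area_ha, reported_m in [("SL048", 63.6, 450.1),
                                  ("SL160", 81.8, 510.3)]:
    print(tree, round(breeding_unit_radius_m(area_ha), 1), reported_m)
```

Small differences from the reported radii are expected because the tabulated areas are rounded to one decimal place.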
[Figure: two panels plotting pollen dispersal frequency against distance, each with a fitted curve. Panel titles: A:\data\pollen curve testing sarang.xls; A:\data\pollen curve testing tembaga.xls. Panel headers: Rank 2 Eqn 8157 Exponential(a,b); Rank 29 Eqn 8157 Exponential(a,b). x-axis: Distance; y-axis: Frequency]
Fit statistics shown on the panels:
• r^2=0.8084237, DF Adj r^2=0.78588531, FitStdErr=0.02007574, Fstat=75.957342, a=0.16445904, b=346.58324
• r^2=0.81184414, DF Adj r^2=0.78289709, FitStdErr=0.046788411, Fstat=60.4064, a=1.3650821, b=42.410263
Negative exponential curve: y = a e^(-x/c)
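A negative exponential dispersal curve of the form y = a e^(-x/c) can be evaluated and fitted in a few lines. The sketch below fits a straight line to ln(y) by ordinary least squares; this is a simple stand-in, not necessarily the fitting procedure used to produce the curves in the slides, and the synthetic data are assumptions for illustration.

```python
import math

def neg_exp(x, a, c):
    """Negative exponential dispersal curve y = a * exp(-x / c)."""
    return a * math.exp(-x / c)

def fit_neg_exp(xs, ys):
    """Estimate (a, c) from a least-squares line through (x, ln y).
    Requires every y to be positive."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    slope = (sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_log - slope * mean_x
    return math.exp(intercept), -1.0 / slope

# Synthetic, noise-free frequencies at distance classes of 100 m
distances = [0, 100, 200, 300, 400, 500, 600, 700, 800]
frequencies = [neg_exp(x, 0.16, 350.0) for x in distances]
a_hat, c_hat = fit_neg_exp(distances, frequencies)
print(round(a_hat, 3), round(c_hat, 1))
```

With real data, distance classes with zero observed frequency would have to be excluded or handled with a nonlinear fit, since the logarithm is undefined at zero.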
Conclusion
• Moderate pollen flow (150 – 300 m), with thrips as
pollinators
• Predominant outcrossing in S. leprosula (85%) and mixed mating in S. parvifolia (60%)
• Model for pollen dispersal: negative exponential model
• Optimum population size for conservation: breeding
unit area and breeding unit size obtained (about 70 ha)
Case study 2: Using microsatellites for individual-specific DNA fingerprints
In forensic applications in forestry and chain of custody
certification, two types of databases are required:
• To trace an illegal log to its population of origin:
fingerprinting databases for population identification
• To match an illegal log to its original stump:
fingerprinting databases for individual identification
DNA markers to match an illegal log to its original stump
• Log stolen (illegal logging): collect sample for DNA extraction
• Stump left behind: collect sample for DNA extraction
• Perform DNA analysis using DNA markers
• Compare the DNA profiles of log and stump
• If they are the same, they are from the same tree
• However, in DNA testimony, it is necessary to provide an
estimate of the weight of the evidence
• Three possible outcomes of a DNA test: no match,
inconclusive, or MATCH between the samples examined
• If MATCH, it would not be scientifically justifiable to speak
of a match as proof of identity in the absence of
underlying data that permit some reasonable estimate of
how rare the matching characteristics actually are
• Therefore, in forensic casework, a population database
must be established for statistical evaluation of the
evidence, to estimate the probability of a random match
Random
MATCH!!
Neobalanocarpus heimii
Methodology
Sample collection
DNA extraction
SSRs screening
SSRs analysis
Data analysis
Comprehensive DNA fingerprinting databases of N.
heimii generated for individual identification
throughout Peninsular Malaysia
Sample collection
[Figure: map of Peninsular Malaysia showing the sampled populations, labelled by abbreviation]
• KEDAH: Bkt. Enggang (BEn), Sungkop (Sun)
• KELANTAN: Lebir (Leb), Jeli (Jel), G. Basor (GBa)
• PERAK: Piah (Pia), Bubu (Bub), Chikus (Chi)
• TERENGGANU: Rambai Daun (RDa), H. Terengganu C31 (HTA), H. Terengganu C14A (HTB), Pasir Raja (PRa)
• PAHANG: Lesong (Les), Bkt. Tinggi (BTi), Rotan Tunggal (RTu), Tersang (Ter), Lentang (Len), Lakum (Lak), Kemasul (Kem), Berkelah (Ber)
• SELANGOR: Sg Lalang (SLa), Ampang (Amp), Gombak (Gom)
• N. SEMBILAN: Pasoh (Pas), Pelangai (Pel)
• JOHOR: Labis (Lab), Panti C16 (PaA), Panti C68 (PaB), Lenggor C32 (LeA), Lenggor C76 (LeB)
SSRs screening
51 SSR primer pairs developed for dipterocarps
• Neobalanocarpus heimii (6) (Iwata et al. 2000)
• Shorea lumutensis (2) (Lee et al. 2006)
• Shorea leprosula (21) (Lee et al. 2004a)
• Hopea bilitonensis (15) (Lee et al. 2004b)
• Shorea curtisii (7) (Ujino et al. 1998)
Specific amplification
[Figure: capillary electropherograms (blue dye) for samples 017_01.fsa to 023_13.fsa and 24b.fsa to 32b.fsa. x-axis: fragment size (bp); y-axis: peak height]
Mode of inheritance
[Figure: electropherograms at one locus for samples 01a.fsa to 07a.fsa (blue dye). Maternal genotype: peaks at 146.83 and 150.89 bp. Half-sib genotypes: peaks at 146.71-146.83 bp and/or 150.77-150.88 bp. x-axis: fragment size (bp); y-axis: peak height]
Qualitative observations (each progeny possessed at least one maternal
allele) to support the postulation of single-locus mode of inheritance
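This qualitative check can be automated once peaks have been binned into allele calls. The sketch below is an illustration with hypothetical integer allele calls; the function name is an assumption.

```python
def shares_maternal_allele(mother, progeny):
    """True if the progeny genotype carries at least one maternal allele."""
    return bool(set(mother) & set(progeny))

mother = (147, 151)  # hypothetical binned maternal genotype
half_sibs = [(147, 147), (147, 151), (151, 151), (143, 151)]

consistent = [shares_maternal_allele(mother, g) for g in half_sibs]
print(consistent)
print(all(consistent))
```

A progeny genotype that shares no allele with the mother would point to a null allele, a scoring error or a sample mix-up rather than to ordinary single-locus inheritance.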
Null allele
• Test for homozygote excess (MICROCHECKER; Van
Oosterhout et al. 2004)
• Examine patterns of inheritance
• Check whether any individuals repeatedly fail to amplify any
alleles at just one locus while other loci amplify
normally
Allele 188
Allele 190
CTGAGCTATGAATGAAATAATTCAATATATATATATATAGAGAGAGAGAGAGA..GGAGGTGAGGCCCAC
CTGAGCTATGAATGAAATAATTCAATATATATATATATAGAGAGAGAGAGAGAGAGGAGGTGAGGCCCAC
G Sle605 (GA)n ? (GA)n(CA)n(GA)n
Allele
Allele
Allele
Allele
Allele
118a
118b
118c
119
121
CCCGAGGAAGGGGGCAGAGAGACACAGAGAGAGAGAGAGA....GGCAGATGGAGGGAC.GGCGACAGCA 70
CCCGAGGAAGGGGGCAGAGAGACACAGAGAGAGAGAGAGA....GGCAGATGGAGGGAC.GGCGACAGCA
CCCGAGGAAGGGGGCAGAGAGACACAGAGAGAGAGAGAGA....GGCAGATGGAGGGAC.GGCGACAGCA
CCCGAGGAAGGGGGCAGAGAGACACAGAGAGAGAGAGAGA....GGCAGATGGAGGGACCAGCGACAGCA
CCCGAGGAAGGGGGCAGAGA..CACAGAGAGAGAGAGAGAGAGAGGCAGATGGAGGGACCAGCGACAGCA
Repeat motif
H Shc09 (CT)n ? (A)n
Allele
Allele
Allele
Allele
Allele
186a
186b
186c
187
194
GGAAAAAAAAAAAAAAAAAA........TACGTACTTTTCGTTTTAGTTACGTTTTTCAATACCAAGAGA 70
GGAAAAAAAAAAAAAAAAAA........TACGTACTTTTCGTTTTAGTTACGTTTTTCAATACCAAGAGA
GGAAAAAAAAAAAAAAAAAA........TACGTACTTTTCGTTTTAGTTACGTTTTTCAATACCAAGAGA
GGAAAAAAAAAAAAAAAAAAA.......TACGTACTTTTCGTTTTATTTACGTTTTTCAATACCAAGAGA
GGAAAAAAAAAAAAAAAAAAAAAAAAAATACGTACTTTTCGTTTTAGTTACGTTTTTCAATACTAAGAGA
Dinucleotide repeats (CT)n to mononucleotide repeats (A)n
Allele117
Allele123
TGAATTGTTAGCAGCTTGAGCTTGAGCCTGATTTGAGCTCTCTCTCTCTCTCTCTCTCTCTCT......A
TGAATTGTTAGCAGCCTGAGCTTGAGCCTGATTTGAGCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTA
B Nhe011 (GA)n
Allele164a
Allele164b
Allele164c
Allele165
Allele174
HOMOPLASY
AAAAGAGAAACAACCATCTTTAAAGAG.AAAAAGGAGGGATAGAGAGAGAGAGAGAGA..........AG 70
AAAAGAGAAACAACCATCTTTAAAGAG.AAAAAGGAGGGAGAGAGAGAGAGAGAGAGA..........AG
AAAAGAGAAACAACCATCTTTAAAGAG.AAAAAGGAGGGAGAGAGAGAGAGAGAGAGA..........AG
AAAAGAGAAACAAGCATCTTTAAAGAGGAAAAAGAAAAGAGAGAGAGAGAGAGAGAGA..........AG
AGAAGAGAAACAAGCATCTTTAAAGAG.AAAAAGAAAAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAAG
C Nhe015 (TC)n(AC)n
Allele147a
Allele147b
Allele147c
Allele149
Allele153
Size homoplasy
HOMOPLASY
AAGACCAGGTCTCTCTCTCTCTCTCTCTCTCTGTCTCTC....ACACACACACACACAC......ATTCA 70
AAGACCAGGTCTCTCTCTCTCTCTCTCTCTCTGTCTCTC....ACACACACACACACAC......ATTCA
AAGACCAGGTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCACACACACACAC..........ATTCA
AAGACCAGGTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCACACACACACACAC........ATTCA
AAGACCAGGTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC....ACACACACACACACACACACACATTCA
D Nhe018 (CT)n → (CT)n(CTAT)n
Allele137a
Allele137b
Allele137c
Allele139
Allele149
CGCTCTCTCTCTCTCTCTCTCTCT..CTATCTATCTATCTAT................CTGTGTCTCTCC 70
CGCTCTCTCTCTCTCTCTCTCTCT..CTATCTATCTATCTAT................CTGTGTCTCTCC
CGCTCTCTCTCTCTCTCTCTCTCT..CTATCTATCTATCTAT................CTGTGTCTCTCC
CGCTCTCTCTCTCTCTCTCTCTCTCTCTATCTATCTATCTAT................CTGTGTCTCTCC
CGCTCTCTCTCTCTCTCTCT......CTATCTATCTATCTATCTATCTATCTATCTATCTGTGTCTCTCC
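Size homoplasy means that alleles of identical length differ in sequence, so fragment sizing alone cannot tell them apart. A minimal sketch of how sequenced alleles could be screened for it (the function name and the short example sequences are hypothetical; alignment gaps written as "." are removed before comparison):

```python
from collections import defaultdict

def size_homoplasy(allele_sequences):
    """Group alleles by ungapped length and return the sizes at which
    more than one distinct sequence occurs."""
    by_size = defaultdict(set)
    for seq in allele_sequences:
        ungapped = seq.replace(".", "")
        by_size[len(ungapped)].add(ungapped)
    return {size: sorted(seqs)
            for size, seqs in by_size.items() if len(seqs) > 1}

# Two hypothetical alleles of equal size that differ by one base,
# plus a longer allele with two extra repeat units
example = ["GGGATAGAGAGAG..", "GGGAGAGAGAGAG..", "GGGAGAGAGAGAGAG"]
print(size_homoplasy(example))
```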
51 SSR primer pairs screened for:
• Specific amplification
• Mode of inheritance
• Null allele
• Repeat motif
• Size homoplasy
16 SSR primer pairs selected:
Nhe004, Nhe005, Nhe011, Nhe015, Nhe018,
Nhe019, Hbi016, Hbi161, Sle111a, Sle392, Sle605,
Slu044a, Shc03, Shc04, Shc07, Shc09
• Which model to use: the product rule or a subpopulation model?
Pasoh Forest Reserve (231 individuals)
Perform statistical tests to check:
• Hardy-Weinberg equilibrium for allele independence
• Linkage equilibrium for locus independence
• Results clearly showed that the population deviates from HWE
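As an illustration of the HWE check, the sketch below computes a chi-square goodness-of-fit statistic for one locus, together with observed and expected heterozygosity. It is only a simplified stand-in: the function name and the toy genotypes are hypothetical, and for microsatellites with many rare alleles an exact test is usually preferred over the chi-square approximation.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Chi-square statistic for Hardy-Weinberg proportions at one locus.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    Returns (statistic, degrees_of_freedom, observed_het, expected_het).
    """
    n = len(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    statistic = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected genotype count under HWE: n*p^2 or n*2pq
        expected = n * (freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b])
        statistic += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(freqs)
    dof = k * (k - 1) // 2
    obs_het = sum(c for (a, b), c in observed.items() if a != b) / n
    exp_het = 1.0 - sum(p ** 2 for p in freqs.values())
    return statistic, dof, obs_het, exp_het


# Hypothetical sample with a heterozygote deficit (not the Pasoh data)
sample = [(176, 176)] * 40 + [(176, 186)] * 20 + [(186, 186)] * 40
stat, dof, ho, he = hwe_chi_square(sample)
print(stat, dof, ho, he)
```

In this toy sample the observed heterozygosity (0.2) is well below the expected value (0.5), which is the pattern produced by inbreeding or population substructuring.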
Clustering analysis on genetic distance via NJ method
[Figure: NJ tree of the sampled individuals (tip labels of the form PA049); annotation: population substructuring]
Relatedness among individuals using ML-Relate software
• Unrelated individuals (86.4%)
• Half-siblings (12.4%)
• Full-siblings (0.9%)
• Parent-offspring (0.3%)
[Annotation: inbreeding]
• The random match probability needs to be calculated using the subpopulation model and corrected for the coancestry (Fst) and inbreeding (Fis) coefficients
Homozygote:
$$P(A_iA_i \mid A_iA_i) = \frac{[F_{st} + (1-F_{st})p_i]\left\{F_{is}^2 + 2F_{is}(1-F_{is})\dfrac{2F_{st} + (1-F_{st})p_i}{1+F_{st}} + (1-F_{is})^2\dfrac{[2F_{st} + (1-F_{st})p_i][3F_{st} + (1-F_{st})p_i]}{(1+F_{st})(1+2F_{st})}\right\}}{F_{is} + (1-F_{is})[F_{st} + (1-F_{st})p_i]}$$
Heterozygote:
$$P(A_iA_j \mid A_iA_j) = 2(1-F_{is})\,\frac{[F_{st} + (1-F_{st})p_i][F_{st} + (1-F_{st})p_j]}{(1+F_{st})(1+2F_{st})}$$
Ayres and Overall (1999). Forensic Science International 103: 207-216
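The two expressions can be sketched in code as follows. This is an illustrative implementation of the formulas as reconstructed on this slide, not the authors' software. The function names and the allele frequencies in the example are hypothetical, and multiplying across loci assumes the loci are independent (linkage equilibrium).

```python
def theta_term(m, fst, p):
    """Recurring bracket in the formulas: m*Fst + (1 - Fst)*p."""
    return m * fst + (1.0 - fst) * p


def match_prob_homozygote(p_i, fst, fis):
    """P(AiAi | AiAi) corrected for coancestry (Fst) and inbreeding (Fis)."""
    a1 = theta_term(1, fst, p_i)
    a2 = theta_term(2, fst, p_i) / (1.0 + fst)
    a3 = theta_term(3, fst, p_i) / (1.0 + 2.0 * fst)
    numerator = a1 * (fis ** 2
                      + 2.0 * fis * (1.0 - fis) * a2
                      + (1.0 - fis) ** 2 * a2 * a3)
    denominator = fis + (1.0 - fis) * a1
    return numerator / denominator


def match_prob_heterozygote(p_i, p_j, fst, fis):
    """P(AiAj | AiAj) corrected for coancestry (Fst) and inbreeding (Fis)."""
    return (2.0 * (1.0 - fis)
            * theta_term(1, fst, p_i) * theta_term(1, fst, p_j)
            / ((1.0 + fst) * (1.0 + 2.0 * fst)))


def profile_match_prob(profile, freqs, fst, fis):
    """Multiply per-locus match probabilities (assumes independent loci)."""
    total = 1.0
    for locus, (a, b) in profile.items():
        if a == b:
            total *= match_prob_homozygote(freqs[locus][a], fst, fis)
        else:
            total *= match_prob_heterozygote(
                freqs[locus][a], freqs[locus][b], fst, fis)
    return total


# Hypothetical allele frequencies, for illustration only
freqs = {"Nhe004": {262: 0.2}, "Nhe011": {176: 0.3, 186: 0.1}}
profile = {"Nhe004": (262, 262), "Nhe011": (176, 186)}
print(profile_match_prob(profile, freqs, fst=0.0470, fis=0.1758))
```

With Fst = 0 and Fis = 0 both functions reduce to the product rule (pi squared for homozygotes and 2 pi pj for heterozygotes), which is a convenient sanity check.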
Population structure of N. heimii throughout P. Malaysia
[Figure: sampled populations grouped into Region A, Region B and Region C]
DNA fingerprinting databases of N. heimii throughout P. Malaysia
REGION A: BEn, Sun, Bub, Chi, SLa, Amp, Gom
REGION B: Pas, Pel, Lab, PaA, PaB, LeA, LeB, Les, BTi, RTu, Ter, Len, Lak, Kem, Ber, RDa
REGION C: HTA, HTB, PRa, Leb, Jel, GBa, Pia
For each regional database:
• Hardy-Weinberg equilibrium for allele independence
• Linkage equilibrium for locus independence
• Allele frequencies, Fst and Fis, then match probability
Region A: Fst = 0.0470; Fis = 0.1758
Region B: Fst = 0.0285; Fis = 0.1457
Region C: Fst = 0.0334; Fis = 0.1998
Applications of the databases
Locus     Genotypes  Genotypes
Nhe004    262/262    262/262
Nhe005    129/129    129/129
Nhe011    176/186    176/186
Nhe015    143/181    143/181
Nhe018    141/169    141/169
Nhe019    214/220    214/220
Hbi016    140/141    140/141
Hbi161    102/105    102/105
Sle111a   137/140    137/140
Sle392    187/189    187/189
Sle605    120/120    120/120
Slu044a   148/148    148/148
Shc03     131/139    131/139
Shc04     85/117     85/117
Shc07     169/169    169/169
Shc09     190/201    190/201
Using the database to estimate the probability of a random match
DNA fingerprinting database: Region A (allele frequencies)
Sub-population model (Fst = 0.0470; Fis = 0.1758)
99.9999999…% sure that the log originated from this stump
Provides legal evidence to convict the illegal loggers
To ensure conservation & sustainable utilization of FGRs