Massively Parallel Sequencing and STR Typing Basics

Massively Parallel Sequencing and STR Typing
Basics and Forensic Applications
Bruce Budowle
Institute of Applied Genetics
Department of Molecular and Medical Genetics,
University of North Texas Health Science Center
Fort Worth, Texas USA
Outline
• Describe the technology/methodology
• Focus primarily on STRs and results to date
• Provide brief discussion on other markers
• Consider transfer to the laboratory
• Suggest additional forensic applications
DNA Typing
• Identification
• Associations
• Investigative leads
• Databases
Demands
• High volume
• Lead to backlogs
• High throughput sample processing
• Special attention situations
• Expedites
• Technology developments and enhancements to meet demands
Current Forensic DNA Workflows
• CE-based systems
• Mainstay of DNA typing
• Limitations
  • Low capacity for marker multiplexing
  • Tend to multiplex one type of marker
  • One sample per reaction
  • Genotyping based on size only
  • Complex mixture analyses
  • Low depth sequencing coverage
Current Forensic DNA Workflows
• Decision trees required for marker set selection for various sample types
  • Autosomal STRs
  • Y STRs
  • X STRs
  • SNPs
  • mtDNA
• Markers provide no additional lead without a
suspect or a database hit
• Need tests that provide additional investigative leads
• Ex: facial reconstruction; phenotype and ancestry markers; familial
searching
The Human Genome Project
Scale
• 1990-2003
• 13 years
• ~40 institutions
• 8-9X coverage
• $3.8 billion
Human Genome Project - 10 years and cost $2.7 billion
Next Generation Sequencing
• NGS is no longer next generation, consider:
• Current Generation Sequencing
• Massively Parallel Sequencing (MPS)
Personalized Sequencing
Genome Center Capabilities
in the Forensic Laboratory
MPS Features
• Massively parallel sequencing
• Higher throughput
• Faster
• Depth of coverage
• Lower cost per nucleotide
• Reduction in error
MPS and “Gold Standard” Sanger Sequencing
• Difference in scale: hundreds of billions of bases (MPS) vs. hundreds of bases (Sanger)
• Each sequence is the product of a single molecule
(individually or clonally interrogated)
• No a priori sequence knowledge required unless
targeting specific regions
• Will likely target sequences (by PCR) for sensitivity of
detection
MPS and Forensic STR “Gold Standard”
• Capillary electrophoresis: fragments separated by size differences
  • ~13-24 markers
  • Length variations
• Targeted MPS
  • 100s of markers
  • Fragments can overlap in size
  • Read counts / noise
  • Length & sequence variations
[Figure: CE fragments plotted by increasing size (bp) vs. stacks of MPS reads of an STR amplicon sequence]
Size No Longer Matters
• With CE, marker amplicons tagged with the same fluor had to differ in size and/or rely on mobility modifiers
• Limits the number of loci that can be placed in a CE-based
multiplex
• With MPS size [for detection purposes] is no longer
an issue
• Each molecule is read independently
• Enables more markers to be analyzed simultaneously
Benefits of MPS in Forensics
• Real estate is no longer a concern
• Create a panel with all the markers to avoid having to
choose a particular set
• Exploit SNP variation in current markers and
flanking area of novel markers
• Integrate STRs and SNPs in the reference
database allowing the needs of the analysis (or
case evidence) to dictate the marker set used
• Increase success in typing results
Massively Parallel Sequencing
• Immediate goals
• Large battery of genetic markers can be analyzed
simultaneously
• Far exceeding the current capacity of 15-27 STRs of CE systems
• Autosomal STRs, Y STRs, X STRs, and SNPs (~400
markers)
• mtDNA
• Barcoding 16 to 384 (in theory) – multiple individuals
• Economies of scale
• Can rival current costs of typing a modicum of STRs
Types of SNPs
• Individual Identification SNPs:
• SNPs that collectively give very low probabilities of two individuals having the same multisite genotype (individualization); high heterozygosity, low Fst
• Ancestry Informative SNPs:
• SNPs that collectively give a high probability of an individual’s ancestry being
from one part of the world or being derived from two or more areas of the world
• Lineage Informative SNPs:
• Sets of tightly linked SNPs that function as multiallelic markers that can serve to
identify relatives with higher probabilities than simple di-allelic SNPs
• Phenotype Informative SNPs:
• SNPs that provide high probability that the individual has particular phenotypes,
such as a particular skin color, hair color, eye color, etc.
• Pharmacogenetic SNPs – molecular autopsy
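To make the individual identification SNP criteria concrete, here is a minimal Python sketch of the product rule, assuming the SNPs are unlinked and in Hardy-Weinberg equilibrium. It shows why high heterozygosity matters: a heterozygote at a SNP with allele frequency 0.5 has a genotype frequency of 0.5, so the match probability of a profile shrinks geometrically with each added SNP. The 40-SNP panel is hypothetical, and the sketch omits the population-substructure (Fst) corrections used in practice.

```python
from math import prod


def genotype_frequency(p: float, genotype: str) -> float:
    """Hardy-Weinberg frequency of a biallelic SNP genotype.

    p is the frequency of allele "A" (q = 1 - p is allele "B");
    genotype is "AA", "AB" (or "BA"), or "BB".
    """
    q = 1.0 - p
    if genotype == "AA":
        return p * p
    if genotype == "BB":
        return q * q
    if genotype in ("AB", "BA"):
        return 2.0 * p * q
    raise ValueError(f"unknown genotype: {genotype!r}")


def random_match_probability(allele_freqs: list[float], genotypes: list[str]) -> float:
    """Product rule: multiply single-SNP genotype frequencies.

    Assumes the SNPs are unlinked (independent) and in Hardy-Weinberg equilibrium.
    """
    if len(allele_freqs) != len(genotypes):
        raise ValueError("one allele frequency is needed per genotype")
    return prod(genotype_frequency(p, g) for p, g in zip(allele_freqs, genotypes))


def expected_heterozygosity(p: float) -> float:
    """2pq for a biallelic SNP; maximal (0.5) when p = 0.5."""
    return 2.0 * p * (1.0 - p)


# Hypothetical panel: 40 SNPs, each with allele frequency 0.5, all heterozygous.
rmp = random_match_probability([0.5] * 40, ["AB"] * 40)
print(f"{rmp:.2e}")
```

Running it prints 9.09e-13, i.e., 0.5 raised to the 40th power.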
General MPS Workflow
• Extract DNA
• Fragment genomic DNA or targeted
amplification
• Library preparation
• Cluster generation/clonal amplification
• Sequence
• Data analysis
• Bioinformatics
General Library Preparation
Fragmentation
End repair
Adapter ligation
Overview Of Ion Torrent PGM™
Technology and Workflow
• Personal Genome Machine
• Ion 314, 316, 318 Chip v2
• Up to 2Gb and 5.5M reads
• 2.3-7.3 hour runs
• Depending on read length and chip size
• Multiplex 96 samples
Concept
• To date, DNA sequencing required imaging
technology to support detection of
electromagnetic intermediates (light) and
specialized nucleotides or other reagents
• An alternative now exists that is based on nonoptical sequencing on integrated circuits
Library and Template Preparation
Clonal amplification using emulsion PCR
• Emulsion droplet containing an ISP, primer, dNTPs, polymerase, and MgCl2
• Isolate templated ISPs
• Final templated ISPs ready for sequencing
Sequence Detection by pH
Chemistry
Reduce sequencing errors by avoiding:
• Modified bases
• Fluorescent bases
• Laser detection
• Enzymatic amplification cascades
Improve read length by avoiding the limitations of:
• Unnatural bases
• Faulty synthesis
• Slow cycle time
Deliver highly uniform genome coverage
Sequencing: Flows
2_IonU_PGM Worfkflow_v13_la
Data Output is an Ionogram
• An “ionogram” is the output of the signals in flow space
• Must be read “up-and-down” along with “left-to-right”
• Height of bar indicates how many nucleotides were incorporated during the flow
[Figure: example ionogram; the key sequence TCAG is followed by incorporations such as AA and TTT]
Sequence: AATCTTCTG…
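The flow-to-base logic of an ionogram can be sketched in a few lines of Python. This is a simplified illustration, not Ion Torrent's base caller: the repeating flow order "TACG" and the signal values are assumptions chosen to reproduce the key (TCAG) and the sequence shown above, and real base calling also corrects for effects such as carry-forward and incomplete extension.

```python
from itertools import cycle

# Assumed flow order for illustration; the actual order is set by the run chemistry.
FLOW_ORDER = "TACG"
KEY = "TCAG"  # key sequence shown on the ionogram


def flows_to_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Convert per-flow signals into bases.

    Each flow exposes the template to one nucleotide; the signal is roughly
    proportional to how many of that nucleotide were incorporated, so it is
    rounded to the nearest whole number to give the homopolymer length.
    """
    bases = []
    for nucleotide, signal in zip(cycle(flow_order), signals):
        bases.append(nucleotide * max(0, round(signal)))
    return "".join(bases)


def strip_key(read: str, key: str = KEY) -> str:
    """Remove the known key sequence from the start of a read."""
    if not read.startswith(key):
        raise ValueError("read does not start with the expected key")
    return read[len(key):]


# Made-up signals for 24 flows (six TACG cycles).
signals = [1.02, 0.04, 0.97, 0.08, 0.03, 1.05, 0.06, 0.95,   # key: TCAG
           0.02, 2.07, 0.05, 0.01, 0.99, 0.03, 1.04, 0.07,
           1.93, 0.02, 1.01, 0.04, 0.98, 0.06, 0.03, 1.02]
read = flows_to_bases(signals)
print(read, strip_key(read))  # TCAGAATCTTCTG AATCTTCTG
```

Rounding works because a homopolymer of length n gives roughly n times the single-base signal; long homopolymers are where this approximation is least reliable.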
Overview of Illumina
Technology and Workflow
• MiSeq
• FGx- Forensic Genomics System
• Up to 15Gb and 50M reads
• 4-55 hour runs
• Depending on read length
• Multiplex 96 samples
The Illumina Sequencing Process
www.illumina.com
Preparing Libraries with Nextera® XT: “tagmentation” fragments the DNA and tags it with adapter sequences in one step
Cluster Generation
Cluster generation occurs on a flow cell
A flow cell is a thick glass slide with
channels or lanes
Each lane is randomly coated
with a lawn of oligos that are
complementary to the library
adapters
Clusters and subsequent
sequencing are performed in a
contained environment (reduces
contamination)
Illumina Sequencing Technology Overview
[Figure: DNA (0.1-5.0 μg) → library preparation → single molecule array → cluster growth → sequencing (cycles 1-9) → image acquisition → base calling (e.g., TG TACGAT…)]
Illumina Sequencing-by-Synthesis
Natural competition: reversible terminators for nucleotide incorporation
• Add 4 Fl-NTPs + polymerase
• Incorporated Fl-NTP is imaged
• Terminator and fluorescent dye are cleaved from the Fl-NTP
[Figure: cycles 1-9]
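A deliberately simplified Python sketch of four-channel base calling for one cluster: at each cycle the brightest of the four Fl-NTP channels is called, and a purity ratio (brightest / (brightest + second brightest)) is reported, similar in spirit to the chastity metric Illumina uses for cluster filtering. The intensities are made up; real base calling also corrects for dye cross-talk and phasing.

```python
def call_cycle(intensities: dict[str, float]) -> tuple[str, float]:
    """Call one base from four dye-channel intensities for a single cycle.

    Returns the brightest base and a purity ratio:
    brightest / (brightest + second brightest).
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    total = best + second
    purity = best / total if total > 0 else 0.0
    return best_base, purity


def call_read(cycles: list[dict[str, float]]) -> tuple[str, list[float]]:
    """Call a read cycle by cycle (one incorporated Fl-NTP per cycle)."""
    calls = [call_cycle(c) for c in cycles]
    return "".join(base for base, _ in calls), [purity for _, purity in calls]


# Made-up intensities for four cycles of one cluster.
cycles = [
    {"A": 40, "C": 35, "G": 60, "T": 880},
    {"A": 55, "C": 20, "G": 910, "T": 90},
    {"A": 30, "C": 45, "G": 70, "T": 760},
    {"A": 640, "C": 410, "G": 25, "T": 30},
]
sequence, purities = call_read(cycles)
print(sequence, [round(p, 2) for p in purities])
```

The four calls give TGTA; the last cycle, where the A and C intensities are close, has the lowest purity (about 0.61).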
Bioinformatics
• A science in itself
• Many science experiments are carried out with
bioinformatics
• “the new field that merges biology, computer science,
and information technology to manage and analyze the
data, with the ultimate goal of understanding and
modeling living systems."
• Source: Genomics and Its Impact on Medicine and Society - A 2001 Primer, U.S. Department of Energy Human Genome Program
Bioinformatics
• Allele calls
• Alignment
• Strand bias
• Coverage
• …
First Term
• Coverage
  • Number of times a base is sequenced
  • Sanger sequencing of mtDNA: 1X or 2X
  • Human Genome Project: 6X
  • MPS: 100s-1000sX
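Coverage as defined above can be computed per position from read alignments. A minimal Python sketch, assuming reads are already aligned and given as 0-based, end-exclusive (start, end) intervals on a single target; the intervals are hypothetical.

```python
def per_base_depth(reads: list[tuple[int, int]], target_length: int) -> list[int]:
    """Depth of coverage at every position of a target.

    reads are aligned intervals as (start, end), 0-based and end-exclusive.
    A difference array adds +1 where a read starts and -1 where it ends.
    """
    diff = [0] * (target_length + 1)
    for start, end in reads:
        start, end = max(0, start), min(target_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for position in range(target_length):
        running += diff[position]
        depth.append(running)
    return depth


def mean_depth(depth: list[int]) -> float:
    """Average coverage = total aligned bases / target length."""
    return sum(depth) / len(depth)


depth = per_base_depth([(0, 5), (2, 8), (4, 10)], target_length=10)
print(depth, mean_depth(depth))  # [1, 1, 2, 2, 3, 2, 2, 2, 1, 1] 1.7
```

The mean depth (1.7X) equals the total aligned bases (5 + 6 + 6 = 17) divided by the target length (10).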
Multiplexing/Demultiplexing
Barcoding/Index Sequences
• Multiplexing samples with index sequences
• Allows multiple samples to be sequenced together
• Index sequences allow reads to be sorted
[Figure: indexed reads from Samples 1-10 pooled and sorted by index sequence]
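Demultiplexing can be sketched as assigning each read to the sample whose index sequence it matches. In this hypothetical Python example, the index sequences, the one-mismatch tolerance, and the "undetermined" bin are assumptions for illustration, not the settings of any particular instrument software.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Sort (index_read, sequence) pairs into per-sample bins.

    A read is assigned only if exactly one sample index is within
    max_mismatches; otherwise it goes to "undetermined".
    """
    bins = defaultdict(list)
    for index_read, sequence in reads:
        hits = [sample for sample, index in sample_indexes.items()
                if len(index) == len(index_read)
                and hamming(index, index_read) <= max_mismatches]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(sequence)
    return dict(bins)


# Hypothetical 6-base indexes and reads.
indexes = {"Sample 1": "ACGTAC", "Sample 2": "TGCATG"}
reads = [("ACGTAC", "read1"), ("ACGTAA", "read2"),
         ("TGCATG", "read3"), ("GGGGGG", "read4")]
print(demultiplex(reads, indexes))
```

Requiring a unique match keeps a read with an ambiguous index out of both samples rather than guessing.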
STRs
• Current mainstay for identity testing
• High discrimination power
AATG AATG AATG AATG AATG
AATG AATG AATG AATG AATG AATG
AATG AATG AATG AATG AATG AATG AATG
Challenges of Sequencing STRs
• Read length
• Sampling
• Coverage
[Figure: 146-base reads spanning flanking region, repeat, and flanking region for STR alleles with increasing numbers of repeats]
STR Motifs
• Simple
• Compound
• Complex
STRait Razor
• STR Allele Identification Tool – Razor
• Tool for STR allele calling for concordance studies with CE data
• Linux-based Perl script that identifies alleles based on length
• Handles repeat motifs ranging from simple to complex
• Does not require a reference composed of extensive allelic sequence data
STRait Razor
• Produces a colon-delimited text file that lists the
alleles called at each STR locus, in order of the
number of reads in which each allele is detected
• Analyzes both single-end and paired-end data
• Accepts either 1 (single-end) or 2 (paired-end) input
FASTQ files
• Recognizes STR loci in both forward and reverse
complement forms
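STRait Razor's own Perl code and configuration format are not reproduced here. The following hypothetical Python sketch only illustrates the general length-based idea described above: locate flanking anchor sequences, count the bases between them, and convert to repeat units (with partial repeats written as, e.g., 9.3). Checking the reverse complement mirrors the forward/reverse recognition noted above. The anchors, motif, and reads are invented.

```python
from dataclasses import dataclass
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class LocusConfig:
    """Hypothetical locus definition (not STRait Razor's actual config format)."""
    name: str
    left_anchor: str    # flanking sequence immediately 5' of the repeat
    right_anchor: str   # flanking sequence immediately 3' of the repeat
    motif_length: int
    offset: int = 0     # non-repeat bases between the anchors, if any


def call_length_allele(read: str, locus: LocusConfig) -> Optional[str]:
    """Length-based allele call from the bases between two flanking anchors.

    Tries the read as given and as its reverse complement, so either strand works.
    """
    for seq in (read, reverse_complement(read)):
        left = seq.find(locus.left_anchor)
        if left == -1:
            continue
        start = left + len(locus.left_anchor)
        end = seq.find(locus.right_anchor, start)
        if end == -1:
            continue
        repeat_bases = end - start - locus.offset
        full, partial = divmod(repeat_bases, locus.motif_length)
        return f"{full}.{partial}" if partial else str(full)
    return None


locus = LocusConfig("demo", left_anchor="GGACT", right_anchor="TTGCA", motif_length=4)
read = "CCGGACT" + "AATG" * 9 + "ATG" + "TTGCAGG"
print(call_length_allele(read, locus))                      # 9.3
print(call_length_allele(reverse_complement(read), locus))  # 9.3
```

A production tool would also tolerate sequencing errors in the anchors and account for loci with non-repeat bases between the anchors (the `offset` field here).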
STRait Razor v2.0
STRait Razor includes the following new features:
• Expanded marker set (89 STRs*)
  • Autosomal STRs
  • X-STRs
  • Y-STRs
  • Amelogenin
  *Y-STR duplications considered single markers
• Ability to analyze all markers, X only, or Y only
STRait Razor v2.0
STRait Razor includes the following new features:
• Sorting and counting of intra-repeat variants (SNPs
within STRs)
• Ability to queue batches of samples for analysis, for
ease of use and improved throughput
• Improved locus configuration tool for creating your
own marker list
• “Electropherogram”-style data visualization
Sequence Data Sorting
Excel-based Data Analysis
• Length-based genotypes
• RazorGenotyper
• Sequence-based genotypes
• RazorAnalysis
• “Mock-electropherograms”
• RazorHistogram
• Easy data visualization
RazorAnalysis
• Sequence-based analysis of STRs generates
• Genotype table
• Top-20 sequences for each locus
• Multiple contributor mixtures
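Sequence-based genotyping keeps reads that share a length-based allele apart when their sequences differ. Below is a minimal Python sketch of a per-locus top-sequences table, in the spirit of the RazorAnalysis output described above but not its actual implementation. The reads are made up, modeled on the D8S1179 [TCTA]/[TCTG] variants reported later in this presentation.

```python
from collections import Counter


def length_allele(repeat_region: str, motif_length: int = 4) -> str:
    """Length-based (CE-like) allele designation from repeat-region length."""
    full, partial = divmod(len(repeat_region), motif_length)
    return f"{full}.{partial}" if partial else str(full)


def top_sequences(repeat_regions: list[str], motif_length: int = 4, top_n: int = 20):
    """Most frequent distinct sequences at one locus, each with its length allele."""
    counts = Counter(repeat_regions)
    return [(length_allele(seq, motif_length), seq, n)
            for seq, n in counts.most_common(top_n)]


# Made-up reads modeled on a D8S1179 length homozygote (13,13) that differs by sequence.
reads = (["TCTA" * 13] * 60
         + ["TCTA" + "TCTG" + "TCTA" * 11] * 55
         + ["TCTA" * 12] * 6)   # minus-stutter reads
for allele, seq, n in top_sequences(reads):
    print(allele, n, seq)
```

Both top sequences are called "13" by length, so a length-only genotype would be 13,13, while the sequence table shows two distinct alleles plus a minor stutter sequence.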
RazorHistogram
• Converts genotype tables into ‘mock-electropherograms’
• Easy data visualization
Prototype NGS STR Multiplex System (Promega)
• 17 STRs
• CODIS 13 core loci
• Penta D, Penta E loci
• D2S1338, D19S433 loci
• Amelogenin locus
• Amplicon lengths range is 176 – 332 bp
Amplicon Sizes
Locus      Smallest Known Allele (bp)  Largest Known Allele (bp)
Penta E    179                         284
D18S51     190                         277
D21S11     203                         273
TH01       220                         264
D3S1358    192                         240
FGA        176                         332
TPOX       196                         244
D8S1179    203                         255
vWA        202                         262
Penta D    192                         265
CSF1PO     185                         229
D16S539    198                         246
D7S820     211                         255
D13S317    209                         257
D5S818     191                         239
D2S1338    197                         269
D19S433    193                         253
PCR Conditions
• 1 min at 96°C for polymerase activation
• 30 cycles
• 10 s at 94°C for denaturation
• 1 min at 59°C for primer annealing
• 30 s at 72°C for primer extension
• 10 min at 60°C for final extension
• Total time <90 minutes
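A quick check of the cycling program: the hold times alone add up to about an hour. The Python below simply sums the listed holds; the remaining time within the <90-minute total would be temperature ramping, which depends on the thermal cycler and is not specified here.

```python
# Hold times from the PCR program above (seconds).
ACTIVATION_S = 60            # 1 min at 96 °C
CYCLES = 30
CYCLE_STEPS_S = {"denature_94C": 10, "anneal_59C": 60, "extend_72C": 30}
FINAL_EXTENSION_S = 10 * 60  # 10 min at 60 °C


def total_hold_minutes() -> float:
    """Sum of hold times only; ramping between temperatures is not included."""
    per_cycle_s = sum(CYCLE_STEPS_S.values())
    total_s = ACTIVATION_S + CYCLES * per_cycle_s + FINAL_EXTENSION_S
    return total_s / 60


print(total_hold_minutes())  # 61.0
```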
Sample Preparation and Sequencing
• Amplified products purified using the MinElute PCR
Purification Kit (Qiagen)
• Libraries were prepared using the TruSeq DNA LT Sample
Preparation Kit (Illumina)
• Indexed DNA library (up to 24)
• Normalized to 2 nM and pooled
• Pooled libraries diluted to 10 pM
• MiSeq v2 (2 x 250 bp) chemistry (Illumina)
• MiSeq re-sequencing protocol for small genome sequencing
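Normalizing libraries to 2 nM requires converting a mass concentration to molarity. Here is a Python sketch using the standard approximation of 660 g/mol per base pair of double-stranded DNA, plus a C1V1 = C2V2 dilution. The 5 ng/µl concentration and 400 bp mean fragment size are hypothetical, not values from this study.

```python
# Approximate mass of one base pair of double-stranded DNA (g/mol).
MASS_PER_BP = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/µl to nM."""
    return conc_ng_per_ul * 1e6 / (MASS_PER_BP * mean_fragment_bp)


def dilution_volumes(stock_nM: float, target_nM: float,
                     final_volume_ul: float) -> tuple[float, float]:
    """C1*V1 = C2*V2: return (µl of stock, µl of diluent)."""
    if target_nM > stock_nM:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical library: 5 ng/µl with a mean fragment size of 400 bp.
stock = ng_per_ul_to_nM(5.0, 400)            # about 18.9 nM
print(round(stock, 2), dilution_volumes(stock, 2.0, 20.0))
```

The same dilution function applies to the later 10 pM step (a target of 0.01 nM).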
Studies for Developing an Effective Protocol
• Size selection
• SPRI beads v MinElute
• Library input
• Amplified product range
• PCR input
• Analysis of 24 individuals at 62 and 250 pg
• Concordance with PowerPlex® Fusion System
(Promega)
Solid Phase Reversible Immobilization
• Purification/Size selection
• Selectively bind fragments based on ratio of beads to
sample
• Paramagnetic bead technology
• Adjusting the ratio eliminates smaller or larger
fragment sizes
• STRs have size ranges
• Balance between differential amplification of alleles
and size selection
https://www.beckmancoulter.com/wsrportal/bibliography?docname=SPRIselect.pdf
SPRI Results (n=3)
• Lower Bead Mixture Ratios (BMRs) recovered larger
amplicons better, while higher ratios were better for
recovering smaller amplicons
• No single BMR yielded superior allele coverage ratios (ACRs) at all loci
• Selected the BMR with the least spread of heterozygote ACRs
• Two out of three samples favored BMRs of 0.65 and
0.70
MinElute PCR Purification Kit
• Purification/Size selection
• Silica-membrane-based purification of PCR
products
• 70 bp – 4 kb size range
• May capture range of allele sizes more
effectively
http://www.qiagen.com/products/catalog/sample-technologies/dna-sample-technologies/dnacleanup/minelute-pcr-purification-kit
Average Allele Coverage Ratios (n=5)
BMRs (0.65, 0.70) v MinElute size selection
[Figure: average ACR (0-1) per STR locus for BMR 0.65, BMR 0.70, and MinElute]
Average Locus/Total Coverage Ratios (n=5)
BMRs (0.65, 0.70) v MinElute size selection
[Figure: locus coverage/total coverage (0-0.1) per STR locus for BMR 0.65, BMR 0.70, and MinElute]
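The two balance metrics used in these plots can be computed as follows. This Python sketch assumes the common definition of ACR as the lower-coverage allele divided by the higher-coverage allele at a heterozygous locus, and locus/total coverage as each locus's share of all STR reads. The read counts are made up, and the 0.6 threshold is only an example cutoff.

```python
def allele_coverage_ratio(coverage_a: int, coverage_b: int) -> float:
    """Lower-coverage allele / higher-coverage allele at a heterozygous locus."""
    high = max(coverage_a, coverage_b)
    if high == 0:
        raise ValueError("no coverage at this locus")
    return min(coverage_a, coverage_b) / high


def locus_coverage_fractions(locus_coverage: dict[str, int]) -> dict[str, float]:
    """Each locus's share of the total coverage across all loci (locus/total ratio)."""
    total = sum(locus_coverage.values())
    return {locus: cov / total for locus, cov in locus_coverage.items()}


def imbalanced_loci(acrs: dict[str, float], threshold: float = 0.6) -> list[str]:
    """Heterozygous loci whose ACR falls below a chosen threshold."""
    return sorted(locus for locus, acr in acrs.items() if acr < threshold)


# Made-up read counts for illustration.
acrs = {"TH01": allele_coverage_ratio(1200, 1500),
        "FGA": allele_coverage_ratio(400, 900)}
print(acrs, imbalanced_loci(acrs))
print(locus_coverage_fractions({"TH01": 2700, "FGA": 1300}))
```

Here TH01 has an ACR of 0.8 and FGA about 0.44, so only FGA is flagged.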
Size Selection
[Figure: library DNA concentration (ng/µl, 0-8) after size selection with 65% AMPXP, 70% AMPXP, and MinElute]
Results of Size Selection Evaluation
• Average ACRs for 0.65 and 0.70 BMRs and MinElute
method were comparable
• Locus-to-locus balance was substantially higher with the
MinElute method
• Especially notable at Amelogenin locus
• MinElute method yielded greater amounts of library for
sequencing
• MinElute method yield was 46.8 – 186 ng
• Largest yield for 0.65 and 0.70 BMR <29.6 ng
PCR Product for Library Preparation
• TruSeq LT protocol input was 800 - 1000 ng genomic
DNA
• TruSeq LT was developed and optimized for genomic
DNA
• The STR protocol substitutes single copy target with
amplified product
• Thus, library input can likely be reduced considerably when the STR loci are enriched by PCR
Allele Coverage Ratios
different quantities of PCR product
[Figure: ACR (0-1.2) per STR locus for 200, 100, 50, 25, 12.5, and 6 ng of PCR product]
Sample #17, 500 pg input DNA
The Amelogenin (X,X) and D18S51 (17,17) loci were homozygous
Comparison of Locus/Total Coverage
different quantities of PCR product
[Figure: locus coverage/total coverage (0-0.09) per STR locus for 200, 100, 50, 25, 12.5, and 6 ng of PCR product]
Sample #17, 500 pg input DNA
Results of Library Input Evaluation
• Six different library inputs
• 500 pg for initial PCR
• Only the 6 ng input failed to meet the 2 nM normalization target
• 1.33, 1.29, and 1.79 nM
• Equal volumes of these samples (1 µl) were used when
pooling libraries
• All heterozygous loci had an ACR ≥ 0.6
• Indicates that <500 pg starting DNA could be placed
into the PCR
STR Profile - 500pg Input DNA
50 ng library input
Sensitivity Study
• DNA samples (n=3) ranging from 16 to 500 pg
• 6 ng (and 50 ng) for library input
• 500 pg results were consistent with the library preparation study
• ACRs ≥0.58 at all loci
• Heterozygote imbalance increased with decreasing
template
• Similar results with CE-based methods
• Generally good results to about 62 pg
• Further testing at 250 pg and 62 pg
Allele Coverage Ratios (n=24)
with 250 pg Input DNA
Allele Coverage Ratios (n=24)
with 62 pg Input DNA
Distribution of Individual Locus ACRs
Average Locus Coverage (n=24)
with 250 pg of input DNA
50,000X
Average Locus Coverage (n=24)
with 62 pg of input DNA
Performance Results
• 250 pg of input DNA generated balanced ACR
at all loci
• Average ACR at 18 loci > 0.75
• 62 pg of input DNA produced more imbalance
• Average ACR > 0.5
• Except at the Amelogenin locus
• Majority of profiles were complete; some were partial
STR Profile – 62 pg Input DNA
50 ng library input
Loci with Intra-Repeat Variation (n=24)
Locus     Repeats  Observations
D2S1338   25       1
D3S1358   15       3
D3S1358   15       10
D3S1358   15       1
D3S1358   16       10
D3S1358   16       4
D3S1358   17       3
D3S1358   17       4
D8S1179   13       11
D8S1179   13       5
D8S1179   14       1
D8S1179   14       7
D8S1179   14       1
D8S1179   15       3
D8S1179   15       1
VWA       16       3
VWA       16       4
VWA       17       1
VWA       17       15
VWA       18       1
VWA       18       5
VWA       19       1
VWA       19       1
D21S11    29       8
D21S11    29       1
D21S11    30       3
D21S11    30       2
D21S11    30       5
D21S11    31       5
D21S11    31       2
D21S11    32.2     4
D21S11    32.2     1
D2S1338   20       1
D2S1338   20       2
D2S1338   20       1
D2S1338   22       1
D2S1338   22       2
D2S1338   23       1
D2S1338   23       5
D2S1338   25       5
D2S1338   25       1
Variation Example at D3S1358 Locus
Stutter sequence for 15 alleles
AGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGACAGATAGAT
AGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGATAGAT
Sequence for 15 alleles
AGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGACAGATAGAT
AGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGATAGAT
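Writing sequences in bracketed repeat notation makes allele/stutter comparisons like the one above easier to read. The Python sketch below collapses a repeat region into [motif]count blocks and lists every sequence that a single repeat-unit deletion could produce, as candidate minus-stutter products. The 15-repeat allele is our reading of the first D3S1358 allele sequence above; the sketch ignores flanking bases and assumes the region is an exact multiple of the motif length.

```python
def split_units(seq: str, motif_length: int = 4) -> list[str]:
    """Cut a repeat region into consecutive repeat units."""
    if len(seq) % motif_length:
        raise ValueError("sequence length is not a whole number of repeat units")
    return [seq[i:i + motif_length] for i in range(0, len(seq), motif_length)]


def bracket_notation(seq: str, motif_length: int = 4) -> str:
    """Collapse consecutive identical units, e.g. [AGAT]10 [AGAC]3 [AGAT]2."""
    blocks: list[list] = []
    for unit in split_units(seq, motif_length):
        if blocks and blocks[-1][0] == unit:
            blocks[-1][1] += 1
        else:
            blocks.append([unit, 1])
    return " ".join(f"[{unit}]{count}" for unit, count in blocks)


def minus_stutter_candidates(seq: str, motif_length: int = 4) -> set[str]:
    """Every distinct sequence obtained by deleting exactly one repeat unit."""
    units = split_units(seq, motif_length)
    return {"".join(units[:i] + units[i + 1:]) for i in range(len(units))}


allele = "AGAT" * 10 + "AGAC" * 3 + "AGAT" * 2   # a 15-repeat allele
print(bracket_notation(allele))
for candidate in sorted(minus_stutter_candidates(allele)):
    print(bracket_notation(candidate))
```

Deleting one AGAT from the first block gives [AGAT]9 [AGAC]3 [AGAT]2, the pattern of a 14-repeat stutter product. Stutter is generally reported to be more frequent in longer uninterrupted stretches of repeats, so the three candidates are not equally likely.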
Allele and Stutter Distribution
for D3S1358 Homozygote
Variation Example at D21S11 Locus
Allele 30
TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATGGATAGATAGATGATA
GATAGATAGATATAGATAGATAGACAGACAGACAGACAGACAGATAGATAGATAGATAGAT
AGATAGA
Minus Stutter Allele 31
TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATGGATAGATAGATGATA
GATAGATAGATATAGATAGATAGACAGACAGACAGACAGACAGACAGATAGATAGATAGAT
AGATAGA
Allele 31
TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATGGATAGATAGAT
GATAGATAGATAGATATAGATAGATAGACAGACAGACAGACAGACAGACAGATAGATAGAT
AGATAGATAGA
Plus Stutter Allele 30
TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATGGATAGATAGAT
GATAGATAGATAGATATAGATAGATAGACAGACAGACAGACAGACAGATAGATAGATAGAT
AGATAGATAGA
Allele and Stutter Distribution
for D21S11 Heterozygote
Minor and Major Contributor Alleles
Mixture of 2 people, both 11,12 at D5S818
[Figure: depth of coverage (0-6000) by nominal allele (9-13), split into major, shared, and minor sequences; the 12 alleles are distinguishable by sequence (AGAT...AGAT vs. AGAT...AGAG), the 11 alleles are indistinguishable, and stutter from allele 11 is marked]
Value of Sequence Variation
• Increased discrimination power
• Use more data for stutter calculations
• Potentially enhanced mixture deconvolution
• Stutter v minor contributor at some loci
• Note that the D5S818, D7S820, and D13S317 loci did not display allele sequence variation
• Variation at these loci has been detected by mass spectrometry
• The variants reside in flanking DNA (Oberacher et al., Hum. Mutat. 29 (2008) 427-432)
Conclusions
• Prototype NGS STR Multiplex System (18 loci) for MPS
• MinElute PCR Purification Kit was better for size selection
• A wide range of library input quantities could be used with no notable
differences in the generated STR profile
• High depth of coverage and balanced ACRs can be obtained with input
DNA ≥250 pg
• Similar to that typically observed in CE-based systems
• As little as 62 pg of input DNA could generate complete or nearly complete (i.e., limited allele drop-out) profiles
• Studies indicate that this STR multiplex and the Illumina MiSeq
system can generate reliable DNA profiles at a sensitivity of detection
level that is comparable to that of current CE-based approaches
• Think about selecting STRs that exhibit intra-allele variation!
Markers Included in
ForenSeq™ DNA Signature Prep Kit
• 59 STRs
• 27 autosomal
• 24 Y
• 8 X
• 173 SNPs
• 95 HID
• 54 ancestry
• 24 phenotypic (2 can be used for ancestry)
Beta Testing-32 samples
• Reproducibility- Control 2800M DNA in triplicate at 1ng
• Sensitivity- Control 2800M DNA prepared at 1ng, 500pg, 250pg, 100pg, and 50pg.
• CE concordance- Eleven reference samples were sequenced and typed for
autosomal STRs and Y-STRs, using multiplex PCR and CE to evaluate concordance.
• Casework-type samples- Ten difficult/challenged samples (i.e., degraded, inhibited,
or low-quantity (<1ng)) were sequenced. Traditional STR typing and CE were
performed for comparison.
• Mixture detection- Four mixture samples were prepared, each consisting of 2
individuals at a total of 1ng DNA. Each mixture was prepared in a 1:10 ratio. The
four mixture samples were as follows: 1:10 male/female; 1:10 female/male; 1:10
male/male; and 1:10 female/female.
Study Methods
• Samples: Reference samples and casework-type samples (aged buccal
swabs and human bone samples)
• Quantification: Determined using the Quantifiler® Human DNA
Quantification Kit (Thermo Fisher Scientific)
• Library preparation and sequencing: The ForenSeq™ DNA
Signature Prep Kit (Illumina) to barcode and create libraries;
Sequencing on the MiSeq FGx Sequencer (Illumina)
• Data analysis: ForenSeq™ UAS
• STR typing for concordance: AmpFLSTR® Identifiler® Plus PCR
Amplification Kit and the AmpFLSTR® Yfiler® PCR Amplification
Kit and CE was performed using the 3130xl Genetic Analyzer
• Data analysis: STR data analysis was performed using GeneMapper ID v3.2.1
Sensitivity - Alleles Observed
Average Depth of Coverage (DoC) for STRs
(1ng)
Allele Coverage Ratio (ACR) for STRs
(1ng)
MPS and CE Concordance
• 11 reference and 10 casework-type samples
• 546 shared loci
CE and MPS STR Loci Detection
(Sample 13)
[Figure: panels A and B]
Minimum Coverage Threshold - 30X
Total STR and SNP Loci Observed
for Casework-Type Samples
Detectable STR Alleles with CE and MPS
for Casework-Type Samples
Mixtures
• male/male, female/female, female/male, male/female
• 1:10 ratio
• Assessed by presence of alleles consistent with the
minor contributor’s type within each mixture
• Alleles lying within two repeats upstream (minus
stutter) or one repeat downstream (plus stutter) were
not considered for minor contributor assessment
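The minor-contributor rule above can be expressed as a filter. In this Python sketch, the exclusion window (two repeats below and one repeat above each major allele) follows the text, while the 30X minimum coverage is borrowed from the threshold shown for Sample 13 as an assumption; the coverage values are invented.

```python
def candidate_minor_alleles(allele_coverage: dict[float, int],
                            major_alleles: set[float],
                            min_coverage: int = 30) -> list[float]:
    """Alleles attributable to a minor contributor under simple exclusion rules.

    Excludes the major alleles themselves, positions one and two repeats below
    each major allele (minus stutter), one repeat above (plus stutter), and
    anything under the coverage threshold.
    """
    excluded = set()
    for a in major_alleles:
        excluded.update({round(a - 2, 1), round(a - 1, 1), round(a + 1, 1)})
    return sorted(a for a, cov in allele_coverage.items()
                  if cov >= min_coverage
                  and a not in major_alleles
                  and round(a, 1) not in excluded)


# Made-up coverage for one locus of a 1:10 mixture (major contributor 11,12).
coverage = {9: 40, 10: 300, 11: 5200, 12: 4800, 13: 450, 14: 380, 15: 20}
print(candidate_minor_alleles(coverage, {11, 12}))  # [14]
```

Allele 13 (450 reads) is excluded because it sits in the plus-stutter position of allele 12, even though it could be a true minor allele; this is the stutter-versus-minor-contributor ambiguity that sequence variation can help resolve.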
FGx Performance
• Population studies
• Phase I: Autosomal frequencies
• Hispanic, African American, Asian, and Caucasian
• 200 individuals each
• Phase II: Y-STR Haplotype frequencies
• ~1000 individuals across four populations
• Performance of first 29 population samples
Personal Genome Machine®
and Chip
Low cost, convenient,
single use device
Easy, automatic fluid connections
Reduced Footprint
Green Mountain Study
• Data from blind study
• Green Mountain Study
• 12 samples
• Used 1 ng template DNA
• Markers
  • HID SNPs
  • AIMs
  • Y SNPs
  • STRs
  • Whole genome mtDNA
SNP Panels
• Two SNP Panels
• HID-Ion AmpliSeq™ Identity Panel
• 90 autosomal SNPs
• 34 upper Y-clade SNPs
• HID-Ion AmpliSeq™ Ancestry Panel
• 165 autosomal SNPs
• Sequenced with HID-Ion PGMTM workflow
• Analyzed with HID SNP Genotyper Plugin
HID-Ion AmpliSeq™ Identity Panel
Sample  Gender
1       Male
3       Female
4       Male
5       Female
6       Male
7       Male
10      Female
13      Female
14      Female
15      Female
16      Male
17      Female
HID-Ion AmpliSeq™ Identity Panel
[Figure: Y-SNP genotypes of the male samples (1, 4, 6, 7, and 16) at the panel's upper Y-clade SNPs, e.g., rs4141886, rs2032595, M479, P202, P256, L298, and rs35284970]
No evidence of paternal relationships among the males
HID-Ion AmpliSeq™ Identity Panel
Haplogroup Informative Y-SNPs
Sample  Y-Clade  Region
1       R1b      West Asia, Russian Plain or Central Asia
4       Q        Central Asia, the Indian Subcontinent, Siberia
6       J        Arabian Peninsula
7       O2       Asia
16      E        Africa
HID-Ion AmpliSeq™ Ancestry Panel
Sample  Biogeographic Ancestry
1       European
3       European
4       Asian
5       European
6       European
7       Asian
10      European
13      African American
14      African admix
15      African admix
16      African
17      European
STR Panel
• 10-plex STR Panel
  • Amelogenin
  • D16S539
  • D3S1358
  • D5S818
  • CSF1PO
  • D7S820
  • D8S1179
  • TH01
  • TPOX
  • vWA
• Analyzed data
  • STRait Razor
  • STR Genotyper Plugin
• Compared data with genotypes generated on the 3130xl Genetic Analyzer
• Also evaluated sequence variants within alleles
STRait Razor: Warshauer et al. 2013
STR Panel
Sample  AMEL  CSF1PO  D16S539  D3S1358  D5S818  D7S820  D8S1179  TH01     TPOX   vWA
1       X,Y   11,11   8,12     16*,17   12,12   8,10    13,13    8,9.3    10,12  17,18
3       X,X   10,11   12,12    15,18    9,11    11,12   12*,13   9.3,9.3  11,11  16,17
4       X,Y   12,15   12,13    16*,17   10,12   10,10   11,15    9,9.3    8,8    16,17
5       X,X   11,11   10,12    15,15    11,12   10,11   12*,13   6,7      8,8    15,19
6       X,Y   11,12   10,11    15,17    11,13   8,11    12*,13   7,9.3    8,11   16*,17
7       X,Y   10,12   9,12     15*,16   9,10    11,11   12*,13   6,9      10,11  14,18
10      X,X   12,12   11,14    16,18    11,11   10,11   11,13    7,9      9,11   14*,15
13      X,X   12,12   11,12    16*,17   12,13   10,11   14*,14*  6,7      9,11   15,18
14      X,X   12,13   9,12     16*,17   12,12   8,8     13*,13*  6,8      8,9    15*,17
15      X,X   12,12   9,13     16,17    11,12   8,9     10,13    6,9      9,9    15,16
16      X,Y   11,13   9,9      15,17    12,13   8,11    12,13    6,8      8,11   15,17
17      X,X   10,12   11,12    15*,16   12,13   8,8     13*,13*  7,8      8,9    17,19
* Indicates the presence of a sequence variant for that allele
STR Genotyper Plugin and STRait Razor produced concordant analysis results
Resolving Same Size Alleles
STR Panel
Locus     Allele  Number of Varying Sequences
vWA       14      2
vWA       15      2
vWA       16      2
D3S1358   15      3
D3S1358   16      3
D8S1179   12      2
D8S1179   13      3
D8S1179   14      2
STR Panel
Allele variants detected by Mass Spectrometry – variant does not reside within repeats
PGM mtGenome Data
(initial study, Seo et al BMC Genomics, in press)
• Long PCR
• Whole genome
mtDNA Genome
Sample  Haplogroup
1       J1c5
3       H3b
4       U7b
5       H6a1b4
6       H33
7       M7b1a1c1
10      H5n
13      L2a1f
14      H1c
15      H1c
16      L3e1a1a
17      H1c
mtDNA Genome
Sample  Haplogroup  Population
1       J1c5        European
3       H3b         European
4       U7b         European
5       H6a1b4      European
6       H33         European
7       M7b1a1c1    Asian
10      H5n         European
13      L2a1f       African
14      H1c         European
15      H1c         European
16      L3e1a1a     African
17      H1c         European
Identifying Relationships
Genotypes from STRs and identity SNPs allow for expansion and refinement of the partial pedigree identified with the mitochondrial haplotypes
Identifying Relationships
STRs:
Likelihood Ratio Results:
Posterior probability = 0.999999996671495
Combined likelihood ratio = 300 million
SNPs:
Likelihood Ratio Results:
Combined likelihood ratio = 3.34 E46
Internal ThermoFisher kinship algorithm used for calculations
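Combined likelihood ratios and posterior probabilities are related by Bayes' theorem in odds form. The Python sketch below is a generic calculation, not the ThermoFisher kinship algorithm, and it assumes independent loci and a prior probability of 0.5. Under that prior, the reported posterior of 0.999999996671495 corresponds to an LR of about 3.0 × 10^8, consistent with the rounded figure of 300 million.

```python
from math import prod


def combined_likelihood_ratio(per_locus_lrs: list[float]) -> float:
    """Multiply per-locus LRs (assumes the loci are independent)."""
    return prod(per_locus_lrs)


def posterior_probability(lr: float, prior: float = 0.5) -> float:
    """Bayes' theorem in odds form: posterior odds = LR * prior odds."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


print(posterior_probability(3e8))                   # about 0.9999999967
print(combined_likelihood_ratio([2.0, 5.0, 10.0]))  # 100.0
```

With a prior of 0.5 the prior odds are 1, so the posterior probability is simply LR / (LR + 1).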
Family Reconstructions
Intra-allele Variation (D8S1179)
Sample 16 (12,13): [TCTA]2 [TCTG]1 [TCTA]9 / [TCTA]2 [TCTG]1 [TCTA]10
Sample 17 (13,13): [TCTA]13 / [TCTA]1 [TCTG]1 [TCTA]11
Sample 14 (13,13): [TCTA]2 [TCTG]1 [TCTA]10 / [TCTA]1 [TCTG]1 [TCTA]11
Sample 15 (10,13): [TCTA]10 / [TCTA]1 [TCTG]1 [TCTA]11
Granddaughter and Grandmother - 13 allele IBD
Applying This Technology
• Reference Samples for Databases
• Comprehensive typing
• Better for investigative leads
• Overall cost benefit
• Validation
• Data support reliability and robustness
• Complete studies
• Population data for statistical analyses
Applying This Technology
• Casework
• All types of samples
• Challenged samples
• Mixtures
• Seek additional STRs with intra-repeat/near flanking
area SNPs
• Investigative leads
Combining and Bringing On Line
MPS with Current Workflow
• Essentially the same concepts as current capabilities
• QA
• Pre and post areas
• Validation
• Interpretation
• Better dynamic range***
• Work on automation capabilities
• Develop workflow for barcoding to minimize contamination
• Data analyses workflows
• Run side-by-side with current technology for
verification
• Education and Training
UNTHSC MPS STUDIES
Identical Twins
MPS and Forensic Applications
• Human ID
  • mtDNA
  • STRs
  • SNPs
  • Ancestry
  • Phenotype
  • Mixtures
  • Identical twins
• Animal ID
• Pharmacogenetics
• Molecular autopsy
• Microbial Forensics
  • Biodefense
  • Human ID
  • Cause of death
  • Time since death
Molecular Autopsy
Pharmacogenetics
• Codeine
• Infant died of morphine overdose at 13 days old
• Mother was prescribed Tylenol #3 (acetaminophen and
codeine)
• Codeine is metabolized into morphine
• Mother was an ultra rapid metabolizer
http://www.ilike.com/user/codeine0vr
Microbial Forensics
• Biodefense
• Ex: 2001 Amerithrax case
• Real-time outbreak surveillance
• (Ex: 2011 Germany E. coli, 2013-2014 Ebola)
• Biocrime investigations
• Metagenomics
• Cause of death (ex: drowning); time since death
• Human ID
Human Microbiome
• Human body - 10 trillion cells
• Humans carry over 90 trillion
microbes
• Microbiome is unique to every
individual
• We die with more DNA than
we are born with
Nature 2010
Microbiome-Human ID
• Fierer et al, 2009
Whole Genome Sequencing
Privacy Concerns
ACKNOWLEDGMENTS
• Promega Corporation
• Ann MacPhetridge
• Jaynish Patel
• Douglas R. Storts
• Illumina, Inc.
• Cydne Holt
• Joe Valaro
• Kathy Stephens
Research Team
#1
• Thermo Fisher
• Robert Lagace
• Wenchi Liao
• Joseph Chang
• Narasimhan Rajagopalan
• Sharon Wootton
• Chien-Wei Chang
• Reina Langit
• Nnamdi Ihuegbu
• Carolina Dallett
• Gloria Lam
• Jianye Ge