Illumina_JGI.2011.04.01
Genome Biology for Programmers
Lecture Series: Illumina Sequencing
Chris Daum
JGI Illumina Group Lead
April 1, 2011
Outline
• Workflow Overview
• Process Science
– Sample Prep & qPCR quantification
– Cluster Generation
– Sequencing
• Sequencer instruments: GA & HiSeq
• Illumina Developments
• Illumina quality & continuous improvement
Illumina Workflow
Sample Preparation → Sample Quantification → Clustering → Sequencing → Analysis
Sample Preparation
Library Preparation – Main Goals:
• Prepares sample nucleic acids for sequencing
• Many library types and creation procedures exist
• However, all preparation results in the same general
template structure:
– Double-stranded DNA flanked by two different adapters
– Variables include:
• Sequencing Application & Starting material (e.g. gDNA,
mRNA, Mate Pair, Active Chromatin, ChIP-Seq)
• Insert Size
• Adapter type
• Index for multiplexing
Example Sample Prep Workflow:
TruSeq Paired-end Library
DNA
RNA
Library Quantification - qPCR
• Real-time qPCR allows accurate quantification of DNA
templates:
– qPCR is based on the detection of a fluorescent reporter
molecule that increases as PCR product accumulates with each
cycle of amplification
– By using primers specific to the Illumina universal adapters in a
qPCR reaction containing library template, only cluster-forming
templates will be amplified and quantified
Library Quantification - qPCR
• Threshold: the fluorescence level an amplicon must reach to produce a Cq
• Cq: quantification cycle (also called cycle threshold)
• Plot a standard curve (Cq vs. log initial concentration) using controls and determine the concentration of the library
• Phases of qPCR: Geometric phase
– Amplicons double every cycle; greatest precision & accuracy for quantitation
Take home: qPCR mimics what is happening on the
surface of the flowcell during cluster generation and
allows for determining optimal loading concentrations.
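The standard-curve step can be sketched in a few lines of Python. This is a minimal illustration, not the JGI pipeline: the control concentrations, Cq readings, and function names below are hypothetical, and the fit is an ordinary least-squares line of Cq against log10 of the control concentration.

```python
import math


def fit_standard_curve(concentrations, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """1.0 means amplicons double every cycle (slope close to -3.32)."""
    return 10 ** (-1.0 / slope) - 1.0


def concentration_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate a library concentration."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical control dilution series (pM) and measured Cq values.
standards_pm = [20.0, 2.0, 0.2, 0.02]
standards_cq = [10.0, 13.32, 16.64, 19.96]

slope, intercept = fit_standard_curve(standards_pm, standards_cq)
efficiency = amplification_efficiency(slope)

# Hypothetical Cq measured for a diluted library.
library_pm = concentration_from_cq(15.0, slope, intercept)

print(f"slope={slope:.2f} efficiency={efficiency:.1%} library={library_pm:.3f} pM")
```

A slope near -3.32 corresponds to the geometric phase described above, where product doubles each cycle; the library estimate is only trustworthy when its Cq falls inside the range spanned by the controls.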
Cluster Generation
• Process occurs on cBot instrument:
– Aspirates DNA samples into flow cell
– Automates the formation of amplified
clonal clusters from the DNA single
molecules
– 1000x amplification generates clusters
– Hybridizes sequencing primer(s)
Illumina cBot
• Cluster Generation 2.0
– Automated system significantly
reduces workload for
generation of flowcells
– Compact design saves lab space
– Reagent cartridge reduces prep
time
Flowcell
Cluster Generation Prep
• Prepare reagents and denature & dilute library:
• The goal is an optimal cluster density that maximizes yield (bp); this
is achieved via optimized loading concentrations as determined by qPCR
• Considerations:
– Too low density: Fewer clusters, less sequence generated
– Too high density: Overlapping clusters, removed by analysis filters, poor
quality
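Diluting a library to its loading concentration is a C1·V1 = C2·V2 calculation. The sketch below assumes hypothetical stock and target concentrations and a hypothetical final volume; actual loading targets depend on the instrument and chemistry and are not given in these slides.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock_volume, diluent_volume) satisfying C1 * V1 = C2 * V2."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Hypothetical values: 2000 pM library (from qPCR), 10 pM target, 1000 uL final.
stock_ul, diluent_ul = dilution_volumes(2000.0, 10.0, 1000.0)
print(f"library: {stock_ul:.1f} uL, diluent: {diluent_ul:.1f} uL")
```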
Cluster Generation Chemistry
• Cluster generation Chemistry:
– Hybridization
– Amplification
– Linearization
– Blocking
– Primer hybridization
Cluster Generation Chemistry
• Hybridize Sample fragments & extend:
Cluster Generation Chemistry
• Bridge Amplification:
Cluster Generation Chemistry
• Linearization, Blocking & Sequencing Primer
Hybridization:
Sequencing
• Main Goals:
– Translate the chemical information of the
nucleotides into fluorescence information
which can be captured optically
– The optical information is then
transformed into text, which can be
searched, aligned, or otherwise mined for
biologically relevant data
Sequencing Workflow
HiSeq Run Type      Approx. Run Days
1x50 Flowcells      2
2x100 Flowcells     9
2x150 Flowcells     13
Sequencing by Synthesis
• Clustered Flowcell is loaded on Illumina
sequencer:
Sequencing Chemistry:
First Cycle Base Incorporation
• To initiate the first sequencing cycle, add
all 4 fluorescently labeled reversible
terminators and DNA polymerase
enzyme to the flowcell.
• The complementary nucleotide will be
added to the first position of each
cluster.
• A laser is then used to excite the
attached fluorophore.
Sequencing Chemistry:
First Cycle Imaging
Sequencing Chemistry:
Cycle 2 and so on…
Sequencing Read 2
• Resynthesis of second strand for Read 2 occurs
on sequencer without removing flowcell:
Paired-End Sequencing:
When performing a paired-end run, after the initial cycles (Read 1), an additional cluster generation is performed on
the analyzer, and the template is sequenced in the opposite direction.
Index for Multiplex Sequencing
• Sample multiplexing involves 3
reads:
– A: Sample Read 1 is sequenced
– B: Read 1 product removed and
Index Read is sequenced
– C: Template strand used to generate
complementary strand, and sample
Read 2 is sequenced
• Analysis software identifies the
index sequence from each cluster so
that sample Reads 1 & 2 can be
assigned to a single sample
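The assignment step can be illustrated with a toy demultiplexer. This is a sketch under stated assumptions, not Illumina's analysis software: the index sequences, sample names, and the one-mismatch tolerance are all hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indexes, max_mismatches=1):
    """Return the one sample whose index matches within tolerance, else None."""
    hits = [name for name, idx in sample_indexes.items()
            if hamming(index_read, idx) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(clusters, sample_indexes, max_mismatches=1):
    """Group (read1, read2) pairs by the sample their index read points to."""
    bins = defaultdict(list)
    for read1, index_read, read2 in clusters:
        sample = assign_sample(index_read, sample_indexes, max_mismatches)
        bins[sample or "undetermined"].append((read1, read2))
    return dict(bins)


# Hypothetical 6-base indexes and clusters as (Read 1, Index Read, Read 2).
sample_indexes = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}
clusters = [
    ("AAAA", "ACGTAC", "TTTT"),  # exact match to sample_A
    ("CCCC", "TGCATC", "GGGG"),  # one mismatch from sample_B
    ("GGGG", "NNNNNN", "CCCC"),  # matches no index
]

result = demultiplex(clusters, sample_indexes)
print(result)
```

Reads whose index matches no sample, or matches more than one within the tolerance, go to an "undetermined" bin rather than being guessed.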
Illumina HiSeq2000 Sequencer
Nifty Lights
HiSeq2000 Reagents
1 HiSeq = 2 GAs
HiSeq2000 Fluidics
Fluidics were the Achilles' heel of the GA, and the HiSeq has 2X the fluidics
HiSeq2000 Fluidics
FY11 Service Metrics: Pareto
[Figure: Pareto chart of FY11 Service Requests. X-axis: Service Request Categories. Left y-axis: Incidents (0 to 9). Right y-axis: Cumulative Percent (0% to 100%).]
HiSeq: Temperature control
• 3 mechanisms:
– Heat extraction via liquid coolant
– Flow cell temperature control via Peltier
– Reagent temperature maintenance via cooled compartment
Flow cell sits on Peltier blocks, and is water
cooled (heat extraction from underneath)
Reagent Chiller:
• All reagents cooled to 4 °C
• Condensation Pump runs every 4 min
for 30 sec
HiSeq Flowcell Loading
HiSeq Imaging
HiSeq Optics
HiSeq Lasers
HiSeq Software Interface
HiSeq Software Interface
HiSeq – Real Time Metrics
HiSeq vs GA
Cost & Throughput Comparison
                             GAIIx                              HiSeq
Run Type                     1x36    2x36    2x76    2x150      1x50    2x50    2x100   2x150
Seq Prep Reagents ($)        2,292   4,012   4,012   4,012      2,442   3,747   3,747   3,737
Seq Reagents ($)               864   1,728   3,456   6,912      1,436   2,872   5,175   6,611
Seq Prep & Seq Total ($)     3,156   5,740   7,468  10,924      3,878   6,619   8,922  10,348
Avg. Bases (Gb)                8.0    19.1    35.9    70.4       20.8    41.6    83.3   124.9
Avg. Reads (Millions)        222.2   265.0   236.3   234.6      416.0   416.0   416.4   416.3
Cost per lane ($)              451     820   1,067   1,561        554     946   1,275   1,478
Cost per 1 Gb ($)              395     301     208     155        186     159     107      83
Cost per Million reads ($)      14      22      32      47          9      16      21      25
Notes:
• Throughput metrics are averages from runs performed in FY11 to date for each of the run types
• Italicized HiSeq Bases & Reads throughput metrics are estimates based on the 2x100 run type, since we have limited data on other run types
• Only vendor reagent costs are shown here; library creation and overhead costs are not included, but are roughly equal and are mostly
independent of run type
• Cost per million reads goes up with the longer run types, but the read length increases as well, which makes each read more valuable for
some assembly applications
• The HiSeq 2x150 run type is not yet supported, and the current HiSeq chemistry has worse quality beyond 80-100 bases than the GA
• The HiSeq platform is still new and we are experiencing a higher number of hardware failures than on the GA; Illumina does replace reagents for
failed runs, and we rerun failed flowcells immediately whenever possible.
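The derived rows of the comparison (total, cost per lane, cost per Gb, cost per million reads) follow from the reagent costs and average throughput. The sketch below recomputes them from the slide's figures. Two things are assumptions rather than statements from the slides: the divisor of 7 lanes per flowcell, which is inferred because total / 7 reproduces the cost-per-lane row (for example 3,156 / 7 ≈ 451), and the half-up rounding.

```python
LANES_PER_FLOWCELL = 7  # assumption: inferred from total / 7 matching the slide

# (platform, run type): (seq prep reagents $, seq reagents $, avg Gb, avg M reads)
runs = {
    ("GAIIx", "1x36"): (2292, 864, 8.0, 222.2),
    ("GAIIx", "2x36"): (4012, 1728, 19.1, 265.0),
    ("GAIIx", "2x76"): (4012, 3456, 35.9, 236.3),
    ("GAIIx", "2x150"): (4012, 6912, 70.4, 234.6),
    ("HiSeq", "1x50"): (2442, 1436, 20.8, 416.0),
    ("HiSeq", "2x50"): (3747, 2872, 41.6, 416.0),
    ("HiSeq", "2x100"): (3747, 5175, 83.3, 416.4),
    ("HiSeq", "2x150"): (3737, 6611, 124.9, 416.3),
}


def round_half_up(x):
    """Round a non-negative number to the nearest integer, halves going up."""
    return int(x + 0.5)


def run_metrics(prep_cost, seq_cost, gigabases, million_reads):
    """Derive the per-lane, per-Gb and per-million-read reagent costs."""
    total = prep_cost + seq_cost
    return {
        "total": total,
        "per_lane": round_half_up(total / LANES_PER_FLOWCELL),
        "per_gb": round_half_up(total / gigabases),
        "per_million_reads": round_half_up(total / million_reads),
    }


metrics = {key: run_metrics(*values) for key, values in runs.items()}
for (platform, run_type), m in metrics.items():
    print(platform, run_type, m)
```

The output makes the trade-off in the notes concrete: on either platform, cost per Gb falls as runs get longer while cost per million reads rises.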
HiSeq Development
Coming in early Summer:
HiSeq Development
HiSeq Development
Introducing MiSeq
MiSeq: all-in-one
MiSeq: Fast, low throughput
Providing Quality Sequence
• Incident Reporting & Resolution (JIRA)
• Troubleshooting Procedures
• Throughput Goals & Metrics
– [Figure: FY11 Cumulative Flowcells. Y-axis: Flowcells. Series: FC Cumulative, FC Goal. Labeled values: 489 and 163.]
– [Figure: FY11 Cumulative Bases (Gb). Y-axis: Total Bases (Gb). Series: Bases Cumulative, Base Goal. Labeled values: 39696.000 and 7993.547.]
• Continuous Improvement - Lean Six Sigma
• Failure Tracking & SPC Charts; RQC
– [Figure: Illumina Process Metrics by week: Cluster & Run failure rates. Y-axis: % of attempts that failed (0% to 60%). X-axis: weeks from 3-May to 2-Aug. Series: Cluster Failures, Run failures.]
• Instrument Status & real-time run monitoring
• Instrument Utilization & Efficiency
– [Figure: FY10 Q3 Illumina Utilization. Bars for Illumina01 through Illumina12 (Illumina01 and Illumina05 are starred), scale 0.0% to 100.0%. Chart note: Problematic instruments with multiple run failures; 06 is being replaced & 07 had significant service work.]
LLNL – Six Sigma Training
• Tools and methodologies to:
– Improve work quality
– Improve process efficiencies & eliminate waste
– Improve employee and customer satisfaction
• Lean Six Sigma is about:
– Eliminating waste and improving process flow
– Focusing on reducing variation and improving process yield
by following a problem-solving approach using statistical
tools
What is Six Sigma?
• A Six Sigma process is literally one that’s
statistically 99.99966% successful.
• This is not always cost effective to achieve, so as
a methodology it’s about gaining control of a
process and implementing improvements.
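The 99.99966% figure can be reproduced from the normal distribution. The sketch below assumes the conventional Six Sigma 1.5-sigma long-term shift, which is standard in the methodology but is not stated on these slides; with it, a "six sigma" process has its nearest specification limit 4.5 standard deviations from the shifted mean.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


SIGMA_LEVEL = 6.0
LONG_TERM_SHIFT = 1.5  # assumption: conventional Six Sigma drift allowance

defect_rate = upper_tail(SIGMA_LEVEL - LONG_TERM_SHIFT)
dpmo = defect_rate * 1_000_000            # defects per million opportunities
yield_percent = (1.0 - defect_rate) * 100.0

print(f"DPMO: {dpmo:.2f}, yield: {yield_percent:.5f}%")
```

This gives roughly 3.4 defects per million opportunities, which is the same statement as a 99.99966% success rate.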
What is Six Sigma?
• Six Sigma is a data driven problem solving approach
where process inputs (Xs) are identified and optimized to
impact the output (Y)
Y = f(X)
• The output is a function of the inputs and process
– Y: Output
– f: function
– X: variables that must be controlled to consistently predict Y