Targeted 454 sequencing

Next-Generation sequencing (NGS) technologies – overview
NGS targeted re-sequencing – fishing out the regions of interest
NGS workflow: data collection and processing – the exome sequencing pipeline

Next-Generation sequencing (NGS) technologies – overview

The automated Sanger method is considered a ‘first-generation’ technology, and newer methods are referred to as next-generation sequencing (NGS).


1953 Discovery of DNA double helix structure
1977
◦ A Maxam and W Gilbert "DNA seq by chemical degradation"
◦ F Sanger "DNA sequencing with chain-terminating inhibitors"
1984 DNA sequence of the Epstein-Barr virus, 170 kb
1987 Applied Biosystems - first automated sequencer
1991 Sequencing of human genome in Venter's lab
1996 P. Nyrén and M Ronaghi - pyrosequencing
2001 A draft sequence of the human genome
2003 Human genome completed
2004 454 Life Sciences markets first NGS machine
Random genome sequencing
• 25 Mb
• 300k reads
• 110 bp
Sanger sequencing
• Targeted
• 700-1000 bp

The newer technologies constitute various
strategies that rely on a combination of
◦ Library/template preparation
◦ Sequencing and imaging

Commercially available technologies
◦ Roche – 454
  • GS FLX Titanium
  • GS Junior
◦ Illumina
  • HiSeq 2000
  • MiSeq
◦ Life – SOLiD
  • 5500xl
  • Ion Torrent
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

Produce a non-biased source of nucleic acid
material from the genome

Current methods:
◦ Randomly break genomic DNA into smaller fragments
◦ Ligate adaptors
◦ Attach or immobilize the template to a solid surface or support
◦ The spatially separated template sites allow thousands to billions of sequencing reactions to be performed simultaneously
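
The library-preparation steps above can be illustrated with a toy simulation. This is only a sketch: the adaptor sequences, the fragment size range and the function names are hypothetical, and real protocols fragment DNA physically or enzymatically rather than in software.

```python
import random

# Hypothetical adaptor sequences, for illustration only.
ADAPTOR_A = "ACGTACGT"
ADAPTOR_B = "TGCATGCA"

def fragment(genome, min_len, max_len, rng):
    """Randomly break a sequence into consecutive fragments of varying size."""
    fragments = []
    pos = 0
    while pos < len(genome):
        size = rng.randint(min_len, max_len)
        fragments.append(genome[pos:pos + size])  # last piece may be shorter
        pos += size
    return fragments

def ligate_adaptors(fragments):
    """Flank every fragment with the two adaptors."""
    return [ADAPTOR_A + f + ADAPTOR_B for f in fragments]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(1000))
fragments = fragment(genome, 50, 150, rng)
library = ligate_adaptors(fragments)
print(len(library), "templates ready for immobilisation")
```

Each adaptor-flanked template would then be attached to its own site on the solid support, which is what allows the reactions to run in parallel.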

Clonal amplification
◦ Roche – 454
◦ Illumina – HiSeq
◦ Life – SOLiD

Single molecule sequencing
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

In solution – emulsion PCR (emPCR)
◦ Roche – 454
◦ Life – SOLiD

Solid phase – Bridge PCR
◦ Illumina – HiSeq
[Figures: sequencing and imaging on SOLiD, 454 (picotitre plate, pyrosequencing), HeliScope, PacBio and HiSeq]


The major advance offered by NGS is the ability to cheaply produce an enormous volume of data.
The arrival of NGS technologies in the marketplace has changed the way we think about scientific approaches in basic, applied and clinical research.

NGS targeted re-sequencing – fishing out the regions of interest
[Figure: random genome sequencing at one end and targeted Sanger sequencing (700-1000 bp) at the other, with "???" marking the gap between them]



Library/template preparation
Library enrichment for target
Sequencing and imaging
[Figure: target enrichment options between random genome sequencing and Sanger sequencing]
Hybrid capture
◦ In solution: Agilent, Nimblegen, ...
◦ Solid phase: Agilent, Nimblegen, Febit, ...
In solution versus solid phase, points listed on the slide:
• Relatively cheap
• Straightforward method
• High throughput is possible
• Flexible
• Small amounts of DNA sufficient
• Higher amounts of DNA
PCR based
◦ Uniplex or multiplex
◦ Fluidigm (48.48 Access Array), Raindance, Multiplicon, long-range PCR products

NGS workflow: data collection and processing – the exome sequencing pipeline

The human genome
◦ Genome = 3Gb
◦ Exome = 30Mb
◦ 180 000 exons

Protein coding genes
◦ constitute only approximately 1% of the human
genome
◦ It is estimated that 85% of the mutations with large
effects on disease-related traits can be found in
exons or splice sites
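
The "approximately 1%" figure follows directly from the sizes above. A quick check, using only the numbers on the slide (the variable names are illustrative):

```python
GENOME_BP = 3_000_000_000   # 3 Gb, from the slide
EXOME_BP = 30_000_000       # 30 Mb, from the slide
N_EXONS = 180_000           # from the slide

exome_fraction = EXOME_BP / GENOME_BP
mean_exon_length = EXOME_BP / N_EXONS

print(f"Exome share of the genome: {exome_fraction:.1%}")
print(f"Mean exon length: {mean_exon_length:.0f} bp")
```

With these inputs the exome is 1% of the genome and the average exon is roughly 167 bp long.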
[Figure: gDNA (3 Gb) → exome capture (exome, 38 Mb) → NGS (Seq: 2.5 Gbases)]
[Chart: total cost over time, with dates 1/01/2010, 1/08/2010 and 1/01/2011; values shown on the slide: 7000, 5900, 3460, 2600, 1100, 860, 1300, 300, 1000]

HiSeq specifications:
◦ 2 flow cells
◦ 16 lanes (8 per flow cell)
◦ 200-300 Gbases per flow cell
◦ 10 days for a single run

Exome throughput
◦ 96 @ 60x coverage per run
◦ 3000 @ 60x coverage per year
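
A back-of-the-envelope check of these throughput figures, as a sketch only. It assumes the 38 Mb capture target quoted earlier in the slides and back-to-back runs with no downtime; the variable names are illustrative.

```python
TARGET_BP = 38_000_000          # capture target size quoted in the slides (38 Mb)
COVERAGE = 60                   # mean coverage per exome
EXOMES_PER_RUN = 96
RUN_DAYS = 10
FLOW_CELLS = 2
GB_PER_FLOW_CELL = (200, 300)   # Gbases, low and high end

# Bases that must land on target to reach the desired coverage
on_target_gb_per_exome = TARGET_BP * COVERAGE / 1e9
on_target_gb_per_run = on_target_gb_per_exome * EXOMES_PER_RUN

# Total instrument output per run, low and high end
run_output_gb = [FLOW_CELLS * gb for gb in GB_PER_FLOW_CELL]

# Upper bound on yearly throughput (assumption: no downtime between runs)
max_runs_per_year = 365 // RUN_DAYS
max_exomes_per_year = max_runs_per_year * EXOMES_PER_RUN

print(f"On-target bases per exome: {on_target_gb_per_exome:.2f} Gbases")
print(f"On-target bases per run:   {on_target_gb_per_run:.1f} Gbases")
print(f"Instrument output per run: {run_output_gb[0]}-{run_output_gb[1]} Gbases")
print(f"Upper bound per year:      {max_exomes_per_year} exomes")
```

Under these assumptions each exome needs about 2.3 Gbases on target, close to the 2.5 Gbases quoted for sequencing, and the upper bound of 3456 exomes per year sits a little above the 3000 on the slide, which presumably leaves room for downtime.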
Data formatting & QC
Mapping & QC
Variant calling
Variant annotation
Variant filtering/comparison
DATA GENERATION
DATA PROCESSING
DATA STORAGE
INTERPRETATION
RESULTS
REPORTING & VALIDATION
Prepare sample library
Perform exome capture
Perform sequencing
Image processing
Base calling
Sequence data: 10-15 Gb / exome
1. Mapping
2. Duplicate marking
3. Local realignment
4. Base quality recalibration
5. Analysis-ready mapped reads
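
To make step 2 concrete, here is a minimal sketch of duplicate marking. It is a simplification under stated assumptions: reads are represented by a hypothetical `Read` tuple rather than real alignment records, and two reads count as duplicates when they share chromosome, start position and strand. Dedicated tools use more elaborate criteria.

```python
from collections import namedtuple

# Minimal stand-in for an aligned read; real pipelines work on SAM/BAM records.
Read = namedtuple("Read", "name chrom pos strand mean_qual")

def mark_duplicates(reads):
    """Return the set of read names flagged as duplicates.

    Reads sharing chromosome, start position and strand are treated as
    copies of one original fragment; the copy with the highest mean base
    quality is kept and the others are flagged.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in best or read.mean_qual > best[key].mean_qual:
            best[key] = read
    keepers = {r.name for r in best.values()}
    return {r.name for r in reads if r.name not in keepers}

reads = [
    Read("r1", "chr1", 100, "+", 30.0),
    Read("r2", "chr1", 100, "+", 35.0),  # same position and strand as r1
    Read("r3", "chr1", 100, "-", 28.0),  # opposite strand: not a duplicate
    Read("r4", "chr2", 100, "+", 31.0),
]
duplicates = mark_duplicates(reads)
print(sorted(duplicates))  # ['r1']
```

Flagged reads are marked rather than deleted, so later steps can choose to ignore them.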
QC sequencing
Mapping sequences
QC capture exp
QC NGS
Mapping
QC HC
Mapping results: 5 Gb / exome
Variant Calling
Variant Annotation
Variant calls: 100 Mb / exome
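
The per-exome storage figures add up quickly at the yearly throughput quoted earlier (3000 exomes). A rough estimate, assuming the slide's "Gb" means gigabytes and that all three data types are kept for every exome:

```python
SEQUENCE_GB = (10, 15)     # raw sequence data per exome, low and high end
MAPPING_GB = 5             # mapping results per exome
VARIANTS_GB = 0.1          # variant calls per exome (100 Mb)
EXOMES_PER_YEAR = 3000     # throughput figure quoted earlier in the slides

def yearly_storage_tb(sequence_gb):
    """Total storage per year in TB for a given raw-data size per exome."""
    per_exome_gb = sequence_gb + MAPPING_GB + VARIANTS_GB
    return per_exome_gb * EXOMES_PER_YEAR / 1000  # decimal units: 1 TB = 1000 GB

low_tb, high_tb = (yearly_storage_tb(gb) for gb in SEQUENCE_GB)
print(f"{low_tb:.1f} to {high_tb:.1f} TB per year")
```

Under these assumptions a single instrument fills roughly 45 to 60 TB per year, most of it raw sequence data.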
[Charts: variant counts by type and annotation class, shown at four scales]
◦ Scale 0-1,200,000: INDEL, SNP
◦ Scale 0-1,000,000: stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, non-coding, frameshift insertion, frameshift deletion
◦ Scale 0-20,000: synonymous SNV, stoploss SNV, stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion
◦ Scale 0-500: stoploss SNV, stopgain SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion
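
The indel classes in the chart legends differ in whether the length change is a multiple of three, which decides whether the reading frame is preserved. A minimal sketch, assuming the variant lies entirely within coding sequence; the function name and the example alleles are made up for illustration:

```python
def classify_indel(ref, alt):
    """Classify a coding indel by its effect on the reading frame."""
    length_change = len(alt) - len(ref)
    if length_change == 0:
        raise ValueError("not an indel: ref and alt have equal length")
    kind = "insertion" if length_change > 0 else "deletion"
    # A codon is 3 bases, so only multiples of 3 keep the frame intact.
    frame = "nonframeshift" if length_change % 3 == 0 else "frameshift"
    return f"{frame} {kind}"

print(classify_indel("A", "ACGT"))   # nonframeshift insertion (+3 bases)
print(classify_indel("ACG", "A"))    # frameshift deletion (-2 bases)
```

A real annotation tool also has to locate the variant relative to gene models before any such rule applies.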
Variant Filtering
Database of known variants (public & private)
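
Filtering against known variants amounts to a set lookup. A minimal sketch, with variants keyed as (chromosome, position, ref, alt); the function name, the key layout and all example values are assumptions for illustration:

```python
def filter_known(variants, known):
    """Keep only variants absent from the set of known variants."""
    return [v for v in variants if v not in known]

# Variants are keyed as (chromosome, position, ref, alt); all values are made up.
called = [
    ("chr1", 12345, "A", "G"),
    ("chr2", 67890, "C", "T"),
    ("chr7", 1357, "G", "GA"),
]
public_db = {("chr1", 12345, "A", "G")}    # stands in for a public database
private_db = {("chr7", 1357, "G", "GA")}   # stands in for an in-house database

novel = filter_known(called, public_db | private_db)
print(novel)  # [('chr2', 67890, 'C', 'T')]
```

Whether a known variant should be discarded depends on the study question, so in practice the filter is usually configurable.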
INTERPRETATION
RESULTS
Validated variants in candidate genes
REPORTING & VALIDATION