R. Piazza – NGS Sequencing 30/10/13


NEXT GENERATION SEQUENCING TECHNIQUES IN MEDICINE
Dr. R. Piazza
R. Piazza – NGS Sequencing 30/10/13
16th-17th century: human anatomy
19th century: microbiology
20th century: biochemistry and molecular biology
2008-2013: genetic revolution
SANGER SEQUENCING
+
DNA
NEXT GENERATION SEQUENCING
Flowcell
NEXT GENERATION SEQUENCING
DNA library
Genomic DNA
NEXT GENERATION SEQUENCING
The 4 nucleotides, labeled with fluorochromes and blocked at the 3’ end, are added simultaneously
Sequencing primer
Labeled and blocked nucleotides
NEXT GENERATION SEQUENCING
IMAGE ACQUISITION
REMOVAL OF THE 3’ BLOCK
REMOVAL OF THE FLUOROPHORE
HIGH-THROUGHPUT SEQUENCING
T = Clusters/Tile x Tiles/Lane x Lanes x Seq_Length x 2
T = 300000 x 120 x 8 x 76 x 2 ≈ 43.8 Gigabases!
76bp + 76bp (paired-end reads)
Human genome = 3 Gigabases
One analysis requires ~6000 Gigabytes for data storage!
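The throughput arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `run_throughput` and its parameter names are illustrative, and the numbers are the ones quoted on the slide.

```python
def run_throughput(clusters_per_tile, tiles_per_lane, lanes, read_length, reads_per_cluster=2):
    """Total bases per run: every cluster is read from both ends (factor 2)."""
    return clusters_per_tile * tiles_per_lane * lanes * read_length * reads_per_cluster

# Figures quoted on the slide
total_bases = run_throughput(300_000, 120, 8, 76)
human_genome_bases = 3_000_000_000

print(f"{total_bases / 1e9:.1f} Gigabases per run")
print(f"{total_bases / human_genome_bases:.1f} human-genome equivalents of raw sequence")
```

With these inputs the product is 43,776,000,000 bases, i.e. roughly 14.6 times the size of the human genome in raw (not yet aligned) sequence.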
SANGER SEQ vs. NGS
THROUGHPUT
PER-BASE COST
Allele #1 C A G C G A C A G C A G C A T T G G G A C
Allele #2 C A G C G A C A G C G G C A T T G G G A C
NGS Read #5 C A G C G A C A G C G G C A T T G G G A C
NGS Read #4 C A G C G A C A G C A G C A T T G G G A C
NGS Read #3 C A G C G A C A G C A G C A T T G G G A C
NGS Read #2 C A G C G A C A G C A G C A T T G G G A C
NGS Read #1 C A G C G A C A G C G G C A T T G G G A C
Coverage = 5
Allele #1 C A G C G A C A G C A G C A T T G G G A C
Allele #2 C A G C G A C A G C G G C A T T G G G A C
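The idea of coverage in this example can be made concrete with a short sketch. It takes the five reads and the two alleles shown above (spaces removed), finds the position where the alleles differ, and counts how many reads support each base there. Variable names are illustrative.

```python
from collections import Counter

allele_1 = "CAGCGACAGCAGCATTGGGAC"
allele_2 = "CAGCGACAGCGGCATTGGGAC"

reads = [
    "CAGCGACAGCGGCATTGGGAC",  # NGS Read #1
    "CAGCGACAGCAGCATTGGGAC",  # NGS Read #2
    "CAGCGACAGCAGCATTGGGAC",  # NGS Read #3
    "CAGCGACAGCAGCATTGGGAC",  # NGS Read #4
    "CAGCGACAGCGGCATTGGGAC",  # NGS Read #5
]

# Position(s) at which the two alleles differ (0-based)
variant_positions = [i for i, (a, b) in enumerate(zip(allele_1, allele_2)) if a != b]
pos = variant_positions[0]

# Base counts at the variant position across all reads
counts = Counter(read[pos] for read in reads)
coverage = sum(counts.values())
g_fraction = counts["G"] / coverage

print(pos, dict(counts), coverage, g_fraction)
```

Here the coverage at the variant position is 5, with 3 reads carrying A and 2 carrying G, so the G allele is observed at a fraction of 0.4. With Sanger sequencing the two alleles would be seen as a mixed peak; with NGS each read is counted individually.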
HIGH-THROUGHPUT SEQUENCING: APPLICATIONS
DNA
GENOMIC DNA SEQUENCING
RESEQUENCING
DE NOVO SEQUENCING
WHOLE-EXOME SEQUENCING
ChIP-Seq
DEEP SEQUENCING
METHYL-SEQ
RNA
mRNA SEQUENCING
TRANSCRIPTOME SEQUENCING (RNA-SEQ)
TAG SEQUENCING (DITAG)
MICRO-RNA STUDIES
WHOLE-GENOME, WHOLE-EXOME AND ULTRADEEP-SEQUENCING
COVERAGE
ULTRADEEP SEQUENCING – WHEN?
M
M
ABL kinase domain
WHOLE-EXOME SEQUENCING
R. Piazza - Catania - 11/7/13
ALIGNMENT DONE: WHAT’S NEXT ?
VARIANT CALLING
SINGLE NUCLEOTIDE POLYMORPHISM
VARIANT: MUTATION, SEQ ERROR OR SNP?
[Figure: read pileups for the CASE SAMPLE and the CONTROL SAMPLE over the reference ..ACTGAATTGCTGATTGTCAAGTCTGCTAGCG..]
VarScan 2 (http://massgenomics.org/varscan)
Koboldt DC et al., Genome Res. 2012 Mar;22(3):568-76
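One way to decide whether a variant seen in the case sample is somatic is to compare its read counts with those of the control sample. VarScan 2 bases its somatic p-value on Fisher's exact test; the sketch below is not VarScan's code, just a minimal pure-Python version of that test applied to hypothetical read counts (8 variant and 12 reference reads in the case, 0 variant and 20 reference reads in the control).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows are samples (case, control); columns are (variant reads, reference reads).
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x variant reads in the first row
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme (as improbable) as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts, for illustration only
p_somatic = fisher_exact_two_sided(8, 12, 0, 20)
p_germline_like = fisher_exact_two_sided(5, 15, 5, 15)
print(p_somatic, p_germline_like)
```

A small p-value means the variant allele is distributed differently in the two samples, which argues against a germline SNP (expected in both) and against a random sequencing error (expected at a low, similar rate in both).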
CASE
CONTROL
WHOLE-EXOME SEQUENCING GOES DIGITAL
LOSS OF HETEROZYGOSITY – ALLELIC IMBALANCE
[Figure: A and T alleles at a heterozygous position in the CONTROL and CASE samples]
WHOLE-EXOME SEQUENCING GOES DIGITAL: CEQer
COMPARATIVE EXONIC QUANTIFICATION ANALYZER
Piazza R. et al., PLoS One. 2013 Oct 4;8(10):e74825
Statistical module: Wilcoxon Signed-Rank test
Test statistic W:
$$W = \sum_{i=1}^{N_r} \left[ \operatorname{sgn}\left(x_i^{(case)} - x_i^{(control)}\right) \cdot R_i \right]$$
As sample size increases ($N_r > 10$) the Z-Score converges to a Gaussian distribution!
Estimating the error function of the normal distribution of W using the Abramowitz and Stegun approximation, equation 7.1.26:
$$\operatorname{erf}(x) \approx 1 - \left(a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + a_5 t^5\right) e^{-x^2}$$
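The two formulas on this slide can be combined into a small worked sketch. It is not the CEQer source code: it follows the textbook form of the test, in which zero differences are dropped, tied absolute differences receive average ranks, and the Z-score is W divided by sqrt(Nr(Nr+1)(2Nr+1)/6). In Abramowitz and Stegun 7.1.26, t = 1/(1 + p·x) with the constants listed in the code. Function names are illustrative.

```python
import math

def erf_as(x):
    """Abramowitz and Stegun 7.1.26 approximation of erf (abs. error about 1.5e-7)."""
    sign = 1.0 if x >= 0 else -1.0
    x = abs(x)
    p = 0.3275911
    a = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)
    t = 1.0 / (1.0 + p * x)
    poly = sum(coef * t ** (k + 1) for k, coef in enumerate(a))
    return sign * (1.0 - poly * math.exp(-x * x))

def wilcoxon_signed_rank(case, control):
    """Return (W, Z, two-sided p) using the normal approximation."""
    diffs = [c - k for c, k in zip(case, control) if c != k]  # drop zero differences
    n_r = len(diffs)
    if n_r == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n_r), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n_r
    i = 0
    while i < n_r:
        j = i
        while j + 1 < n_r and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ties share the mean of their ranks
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w = sum(math.copysign(r, d) for r, d in zip(ranks, diffs))
    sigma_w = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 6)
    z = w / sigma_w
    p_two_sided = 1.0 - erf_as(abs(z) / math.sqrt(2))
    return w, z, p_two_sided

# Hypothetical paired values: the case exceeds the control in all 12 pairs
control = [10.0] * 12
case = [10.0 + d for d in range(1, 13)]
w, z, p = wilcoxon_signed_rank(case, control)
print(w, round(z, 3), p)
```

The approximation of erf avoids any dependency on a numerical library; in Python itself `math.erf` would do the same job, and the test below compares the two.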
CML-BC PATIENT: CML001BC
Chr9
Log2 Ratio
HET POSITION
IN CONTROL
EXON
CDKN2A (p16)
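The Log2 Ratio axis of this plot can be illustrated with a minimal sketch. It assumes, purely for illustration, that each exon has a read count in the case and in the control sample, that counts are normalized by the total reads of each sample, and that a small pseudocount avoids taking the logarithm of zero. Exon names and counts are hypothetical, and this is not necessarily how CEQer computes its ratios.

```python
import math

def log2_ratios(case_counts, control_counts, pseudocount=0.5):
    """Per-exon log2(case / control) after normalizing by total reads per sample."""
    case_total = sum(case_counts.values())
    control_total = sum(control_counts.values())
    ratios = {}
    for exon, control_reads in control_counts.items():
        case_norm = (case_counts.get(exon, 0) + pseudocount) / case_total
        control_norm = (control_reads + pseudocount) / control_total
        ratios[exon] = math.log2(case_norm / control_norm)
    return ratios

# Hypothetical read counts: exon_D has lost about half of its reads in the case
control_counts = {"exon_A": 100, "exon_B": 100, "exon_C": 100, "exon_D": 100}
case_counts = {"exon_A": 100, "exon_B": 100, "exon_C": 100, "exon_D": 50}

ratios = log2_ratios(case_counts, control_counts)
for exon, value in ratios.items():
    print(exon, round(value, 2))
```

A ratio close to 0 indicates an unchanged copy number, while a clearly negative value, as for `exon_D` here, suggests a loss in the case sample.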
CML-BC PATIENT: CML004BC
Chr17
http://www.ngsbicocca.org/html/ceqer.html
p53
ANALYSIS OF ONCOGENIC FUSION PRODUCTS
ANALYSIS OF ONCOGENIC FUSION PRODUCTS
FRAGMENTATION
?
mRNA-seq – DRIVER FUSION TRANSCRIPTS IDENTIFICATION
Junction reads
Bridge reads
76bp
76bp
Piazza R. et al., Nucleic Acids Res. 2012 Sep;40(16):e123
ALIGNMENT TO HUMAN GENOME
CCDS / REFFLAT
EXOME DATASET
SAM
EXOME BUILDER
BAM
ABNORMAL PAIRS
HALF-MAPPED PAIRS
???
Genome
ABNORMAL PAIRS SCANNER
PUTATIVE TRANSLOCATIONS SET (PTS)
PREFILTERING ALGORITHM: Read Quality, Mapping Quality, Homology Filter, Threshold Filter, N Filter
BCR ex14
ABL ex2
FILTERED HALF-MAPPED PAIRS
FILTERED PTS
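The first part of this workflow (abnormal pairs scanner, mapping quality filter, threshold filter) can be sketched as follows. This is a simplified illustration, not the published tool: the `Mate` record, the function name and the thresholds `min_mapq` and `min_support` are assumptions, and each mate is reduced to the gene it maps to plus a mapping quality.

```python
from collections import Counter
from typing import NamedTuple, Optional

class Mate(NamedTuple):
    gene: Optional[str]  # gene the mate maps to; None when the mate is unmapped
    mapq: int            # mapping quality

def scan_pairs(pairs, min_mapq=30, min_support=2):
    """Split read pairs into a putative translocations set and half-mapped pairs."""
    support = Counter()
    half_mapped = []
    for first, second in pairs:
        if first.gene is None and second.gene is None:
            continue  # nothing mapped, nothing to learn
        if first.gene is None or second.gene is None:
            half_mapped.append((first, second))  # kept for junction search later
            continue
        # Abnormal pair: both mates mapped confidently, but to different genes
        if first.gene != second.gene and min(first.mapq, second.mapq) >= min_mapq:
            support[tuple(sorted((first.gene, second.gene)))] += 1
    # Threshold filter: keep gene pairs supported by enough independent pairs
    pts = {genes: n for genes, n in support.items() if n >= min_support}
    return pts, half_mapped

# Hypothetical read pairs
pairs = [
    (Mate("BCR", 60), Mate("ABL", 60)),
    (Mate("BCR", 60), Mate("ABL", 60)),
    (Mate("ABL", 60), Mate("BCR", 60)),
    (Mate("BCR", 60), Mate("BCR", 60)),        # normal pair
    (Mate("BCR", 60), Mate(None, 0)),          # half-mapped pair
    (Mate("TP53", 5), Mate("MYC", 60)),        # fails mapping quality
    (Mate("GENE_X", 60), Mate("GENE_Y", 60)),  # fails support threshold
]
pts, half_mapped = scan_pairs(pairs)
print(pts, len(half_mapped))
```

In a real pipeline the mates would come from the BAM file, and the homology and N filters listed on the slide would add further checks that are omitted here.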
FILTERED PTS
JUNCTION FINDER
BCR: Ex12 Ex13 Ex14
ABL: Ex2 Ex3 Ex4
JUNCTIONS LIST: 1, 2, 3, 4
FILTERED HALF-MAPPED PAIRS
???
ALIGNMENT ALGORITHM
JUNCTION READ: BCR Ex14 / ABL Ex2
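The junction search can be sketched in a simplified form: every exon of the 5' partner is joined to every exon of the 3' partner to form a list of candidate junctions, and the unmapped mates of the half-mapped pairs are then searched for a sequence spanning one of them. Exact substring matching stands in for the alignment step, and the exon sequences below are short, made-up strings, not real BCR or ABL sequence.

```python
def build_junction_probes(five_prime_exons, three_prime_exons, flank=6):
    """Candidate junctions: end of each 5' exon joined to start of each 3' exon."""
    probes = {}
    for name5, seq5 in five_prime_exons.items():
        for name3, seq3 in three_prime_exons.items():
            probes[(name5, name3)] = seq5[-flank:] + seq3[:flank]
    return probes

def find_junction_reads(reads, probes):
    """Map each candidate junction to the reads that span it exactly."""
    hits = {}
    for junction, probe in probes.items():
        spanning = [read for read in reads if probe in read]
        if spanning:
            hits[junction] = spanning
    return hits

# Hypothetical exon sequences, for illustration only
bcr_exons = {"BCR ex13": "ACGTTGCAAGGCTTAC", "BCR ex14": "TTGACCGGATCCATGA"}
abl_exons = {"ABL ex2": "GCCCTTCAGCGGAAGT", "ABL ex3": "CATTGGTCAACGTAGG"}

unmapped_mates = [
    "GGATCCATGAGCCCTTCAGC",  # spans the end of BCR ex14 and the start of ABL ex2
    "ACGTTGCAAGGCTTAC",      # lies entirely within BCR ex13
]

probes = build_junction_probes(bcr_exons, abl_exons)
hits = find_junction_reads(unmapped_mates, probes)
print(hits)
```

Restricting the search to exon-exon combinations of gene pairs already present in the filtered PTS keeps the number of candidate junctions small, which is why the scanner and prefilter run first.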
JUNCTION READ
FRAME ALGORITHM
DIRECTION ALGORITHM
RECIPROCAL TRANSLOCATION ALGORITHM
5’ BCR – ABL 3’
5’ ABL – BCR 3’
AML1-ETO t(8;21)
CBFB-MYH11 inv(16)
BCR-ABL1 p190 t(9;22)
BCR-ABL1 p210 e13a2 t(9;22)
BCR-ABL1 p210 e14a2 t(9;22)
CEP110-FGFR1 t(8;9)
EWSR1-ERG t(21;22)
MLL-MLLT1 t(11;19)
MLL-MLLT3 t(9;11)
MLLT10-PICALM t(10;11)
NCOA4-RET inv(10)
NPM-ALK t(2;5)
HIGH EXPRESSION LOW EXPRESSION
RNA-Seq
RNA-SEQ GOES DIGITAL
READ
EXON
RPKM = READS PER KILOBASE PER MILLION MAPPED READS
TPM = TRANSCRIPTS PER MILLION
TOPHAT (http://tophat.cbcb.umd.edu/)
CUFFLINKS (http://cufflinks.cbcb.umd.edu/)
Trapnell C, et al. Nat. Biotechnol. 2010;28:511–515.
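The two expression units named on this slide can be computed from read counts and transcript lengths as in the sketch below. Gene names, counts and lengths are hypothetical, and the total number of mapped reads is taken to be the sum of the counts provided, which is a simplification.

```python
def rpkm(read_counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_millions = sum(read_counts.values()) / 1e6
    return {
        gene: read_counts[gene] / (lengths_bp[gene] / 1e3) / total_millions
        for gene in read_counts
    }

def tpm(read_counts, lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {gene: read_counts[gene] / lengths_bp[gene] for gene in read_counts}
    total_rate = sum(rates.values())
    return {gene: rates[gene] / total_rate * 1e6 for gene in read_counts}

# Hypothetical data: same read count, but gene_b is twice as long as gene_a
read_counts = {"gene_a": 500_000, "gene_b": 500_000}
lengths_bp = {"gene_a": 1_000, "gene_b": 2_000}

rpkm_values = rpkm(read_counts, lengths_bp)
tpm_values = tpm(read_counts, lengths_bp)
print(rpkm_values)
print(tpm_values)
```

Both units correct for transcript length, so `gene_b` comes out at half the expression of `gene_a`. TPM values always sum to one million within a sample, which makes proportions easier to compare between samples.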
HIGH-THROUGHPUT SEQUENCING: APPLICATIONS
DNA
GENOMIC DNA SEQUENCING
RESEQUENCING
DE NOVO SEQUENCING
WHOLE-EXOME SEQUENCING
ChIP-Seq
DEEP SEQUENCING
METHYL-SEQ
RNA
mRNA SEQUENCING
TRANSCRIPTOME SEQUENCING (RNA-SEQ)
TAG SEQUENCING (DITAG)
MICRO-RNA STUDIES
METHYL-SEQ
NEXT GENERATION SEQUENCING
A LARGE NUMBER OF TOOLS HAVE BEEN DEVELOPED TO ANALYSE NGS DATA
THE LARGE MAJORITY OF THEM ARE COMPLETELY FREE
MANY TOOLS ARE OPEN-SOURCE
STANDARDIZED FILE FORMATS ARE NOW AVAILABLE FOR SEQUENCES AND ALIGNMENTS
IS THIS THE PERFECT NGS WORLD?
NEXT GENERATION SEQUENCING
THE LARGE MAJORITY OF NGS TOOLS ARE FAR FROM BEING USER-FRIENDLY
INSTALLATION IS CHALLENGING (DEPENDENCY HELL!)
MANY TOOLS RUN ONLY UNDER LINUX
THE SAME NGS DATA MUST OFTEN BE INTERROGATED MULTIPLE TIMES
THE MONZA HEMATOLOGY UNIT'S EXPERIENCE WITH NGS: ATYPICAL MYELOID LEUKEMIA
Atypical Chronic Myeloid Leukemia (aCML) is a clonal disorder belonging to the group of myelodysplastic/myeloproliferative syndromes (MDS/MPN).
aCML is characterized by clinical and laboratory features similar to those of classical CML; however, the absence of the Philadelphia chromosome and of the BCR/ABL fusion product suggests a different pathogenetic mechanism.
The prognosis of aCML is poor, with a median survival of 37 months.
The molecular cause of aCML is currently unknown.
With the aim of identifying recurrent molecular lesions in aCML, we performed exome sequencing on 8 aCML samples (genomic DNA from leukemic cells + germline DNA).
On average, 8 billion bases sequenced per exome
Mean exome coverage: 80x
84 somatic exonic variants, 63 of which are non-synonymous
GENE     PATIENT 1   PATIENT 2   MUTATIONS      FREQ.
SETBP1   CMLPh-003   CMLPh-005   G870S, G870S   2/8 (25%)
TARGETED RESEQUENCING
Germline
SGS
Somatic variants
aCML
SETBP1
p = 0.01
p = 0.008
SETBP1 WT
SETBP1 mutated
WT
MUT
WT
MUT
WT
MUT
Proteases
MYC
SETBP1
SET
SET
SET
SET
SET
pY307
PP2A
BetaCatenin
AKT
PEST DOMAIN
p
p
SESHSEETIPSDSGIGTDNNSTSDQAEKSSE
Beta-TRCP (F-box Protein)
Beta-TRCP
Ub
G870S
Beta-TRCP
SETBP1
Proteases
Proteasome
MYC
G870S
SETBP1
SET
SET
SET
SET
pY307
PP2A
BetaCatenin
AKT
P
P
SESHSEETIPSDSGIGTDN NSTS DQAEKSS E
Pept. WT: Biotin-S H S E E T I P S D PS G I G PT D N N S T S
Pept. G870S: Biotin-S H S E E T I P S D PS S I G PT D N N S T S
Piazza R. et al., Nat Genet. 2013 Jan;45(1):18-24
HIGH-THROUGHPUT SEQUENCING: TOMORROW
PATIENT
COMPLETE BLOOD COUNT
BLOOD CHEMISTRY TESTS
CULTURE TESTS
INSTRUMENTAL EXAMINATIONS
GENOME SEQUENCING