Bioinformatics pipeline for detection of immunogenic

Scalable Algorithms for Next-Generation Sequencing Data Analysis
Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
Next Generation Sequencing
Roche/454
SOLiD 5500
PacBio RS
Illumina HiSeq
Ion Proton
Oxford Nanopore
Ongoing Projects
• Transcriptome Analysis
- Transcriptome quantification and differential expression analysis
- Computational deconvolution of heterogeneous samples
- Transcriptome and meta-transcriptome assembly
• Viral quasispecies
- Quasispecies reconstruction from NGS reads
- IBV evolution and vaccine optimization
- Transmission graphs
• Immunoinformatics
- Genomics-guided immunotherapy
- Deep panning for early cancer detection
• Sequencing error correction, genome assembly and scaffolding, metabolomics,
biomarker selection, …
- More info & software at http://dna.engr.uconn.edu
Transcriptome Quantification
• IsoEM algorithm for isoform expression estimation
- Incorporates fragment length distribution, hexamer bias correction, …
• RNA-PhASE pipeline for allele-specific isoform expression
[Figure: R2 (y-axis, 0 to 0.8) on Ion Torrent MAQC datasets, comparing IsoEM and Cufflinks on HBR and UHR samples]
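The expectation-maximization idea behind isoform expression estimation can be sketched as follows. This is a minimal illustration, not the IsoEM implementation: it ignores the fragment length distribution and hexamer bias correction mentioned above, and the function name, input layout, and iteration count are assumptions made for the example.

```python
def em_isoform_frequencies(read_compat, lengths, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    read_compat: one list per read, holding the indices of the isoforms
                 the read is compatible with (hypothetical input layout).
    lengths:     effective length of each isoform.
    """
    k = len(lengths)
    theta = [1.0 / k] * k  # start from uniform read fractions
    for _ in range(n_iter):
        counts = [0.0] * k
        # E-step: split each read among its compatible isoforms in
        # proportion to theta[j] / lengths[j] (uniform start position).
        for isoforms in read_compat:
            weights = [theta[j] / lengths[j] for j in isoforms]
            total = sum(weights)
            for j, w in zip(isoforms, weights):
                counts[j] += w / total
        # M-step: renormalize the expected read counts.
        n = sum(counts)
        theta = [c / n for c in counts]
    return theta


def relative_abundance(theta, lengths):
    """Convert read fractions into length-normalized isoform abundances."""
    raw = [t / l for t, l in zip(theta, lengths)]
    total = sum(raw)
    return [r / total for r in raw]
```

With two isoforms of equal length, 30 reads unique to the first, 10 unique to the second, and 60 ambiguous reads, the ambiguous reads are split in the same 3:1 ratio as the unique ones.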
Differential Expression
• Fast estimation enables the use of accurate bootstrapping-based methods
MAQC 454 datasets: UHRR (SRX002934) vs. HBRR (SRX002935)
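A bootstrapping-based differential expression test might look like the sketch below. The slide does not spell out the method, so everything here is an assumption: reads are resampled with replacement, a simple read fraction stands in for a real expression estimator, and a gene is scored by the fraction of replicate pairs whose fold change passes a threshold.

```python
import math
import random


def bootstrap_expression(read_genes, gene, n_boot, rng, pseudo=0.5):
    """Resample reads with replacement; return the gene's read fraction
    in each bootstrap replicate (a stand-in for a real estimator)."""
    n = len(read_genes)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(read_genes, k=n)
        hits = sum(1 for g in sample if g == gene)
        # The pseudocount keeps the later ratio finite for unexpressed genes.
        estimates.append((hits + pseudo) / (n + pseudo))
    return estimates


def bootstrap_support(est_a, est_b, min_log2_fc=1.0):
    """Fraction of replicate pairs with |log2 fold change| >= threshold."""
    pairs = list(zip(est_a, est_b))
    passing = sum(1 for a, b in pairs if abs(math.log2(a / b)) >= min_log2_fc)
    return passing / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    cond_a = ["g1"] * 800 + ["g2"] * 200  # toy read-to-gene assignments
    cond_b = ["g1"] * 100 + ["g2"] * 900
    est_a = bootstrap_expression(cond_a, "g1", 50, rng)
    est_b = bootstrap_expression(cond_b, "g1", 50, rng)
    print(bootstrap_support(est_a, est_b))
```

Each replicate requires a full re-estimation of expression levels, which is why a fast estimator makes this kind of test practical.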
Computational Deconvolution of
Heterogeneous Samples
• Goal: characterize expression of mesoderm progenitor cells
– Whole-transcriptome expression data for NSB cell mixtures + single-cell qPCR data for a few genes
• Three-step approach
– Cluster single-cell qPCR data and infer “reduced” cell type signatures
– Infer mixing proportions based on reduced signatures using quadratic programming
– Infer full expression signatures based on mixing proportions, solving one quadratic program per gene
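The second and third steps can be sketched as two small least-squares problems. The slides call for quadratic programming; the sketch below swaps in projected gradient descent so that it runs without a QP solver, and all function names and the toy data layout are assumptions made for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection onto {p : p >= 0, sum(p) == 1}."""
    u = sorted(v, reverse=True)
    cumulative, tau = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            tau = t
    return [max(x - tau, 0.0) for x in v]


def project_nonnegative(v):
    """Euclidean projection onto the nonnegative orthant."""
    return [max(x, 0.0) for x in v]


def projected_gradient_ls(A, b, project, x0, n_iter=5000):
    """Minimize ||A x - b||^2 subject to the constraint set of `project`."""
    # 1 / (2 * squared Frobenius norm) is a safe step for this objective.
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    x = list(x0)
    for _ in range(n_iter):
        residual = [sum(a * xi for a, xi in zip(row, x)) - bi
                    for row, bi in zip(A, b)]
        grad = [2.0 * sum(row[j] * r for row, r in zip(A, residual))
                for j in range(len(x))]
        x = project([xi - step * g for xi, g in zip(x, grad)])
    return x


def mixing_proportions(signatures, mixture):
    """Step 2: signatures is genes x cell types, mixture is one sample."""
    k = len(signatures[0])
    return projected_gradient_ls(signatures, mixture,
                                 project_to_simplex, [1.0 / k] * k)


def gene_signature(proportions, gene_mixtures):
    """Step 3: proportions is samples x cell types; gene_mixtures holds
    one gene's expression in each mixed sample."""
    k = len(proportions[0])
    return projected_gradient_ls(proportions, gene_mixtures,
                                 project_nonnegative, [0.0] * k)
```

Step 2 constrains the proportions to be nonnegative and sum to one, while step 3 only requires nonnegative expression values, which is why the two calls use different projections.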
Reference-Guided Transcriptome
Reconstruction
[Figure: candidate transcripts t1 to t4 shown as chains of exons numbered 1 to 7]
TRIP: Transcriptome Reconstruction using Integer Programming
• Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution
– empirically determined during library preparation
– implied by “mapping” read pairs
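[Figure: a read pair mapped to exons 1 and 3 implies a fragment length of 500 on transcript 1-2-3 and 300 on transcript 1-3 (segments of length 200); fragment length distribution: mean 500, std. dev. 50]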
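The selection idea can be illustrated with a small sketch. It is not the TRIP integer program: it enumerates subsets by brute force instead of calling an ILP solver, and it replaces the statistical fit with a crude rule that every read pair must have an implied fragment length within k standard deviations of the mean on at least one selected transcript. The read-pair encoding and all names are assumptions.

```python
from itertools import combinations


def implied_length(transcript, exon_len, pair):
    """Fragment length a read pair would have on a given transcript.

    transcript: ordered list of exon ids.
    pair: (left_exon, left_offset, right_exon, right_offset); the fragment
          starts left_offset bases into left_exon and ends right_offset
          bases into right_exon (hypothetical encoding).
    Returns None when the pair cannot map to this transcript.
    """
    left_exon, left_off, right_exon, right_off = pair
    if left_exon not in transcript or right_exon not in transcript:
        return None
    i, j = transcript.index(left_exon), transcript.index(right_exon)
    if i > j:
        return None
    if i == j:
        return right_off - left_off
    inner = sum(exon_len[e] for e in transcript[i + 1:j])
    return (exon_len[left_exon] - left_off) + inner + right_off


def smallest_explaining_set(transcripts, exon_len, pairs, mean, std, k=2.0):
    """Smallest subset of candidate transcripts under which every read
    pair has an implied length within k standard deviations of the mean."""
    names = sorted(transcripts)

    def fits(name, pair):
        length = implied_length(transcripts[name], exon_len, pair)
        return length is not None and abs(length - mean) <= k * std

    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if all(any(fits(n, p) for n in subset) for p in pairs):
                return list(subset)
    return None
```

Brute-force enumeration is exponential in the number of candidates, which is one reason a real implementation would hand the same selection problem to an integer programming solver.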
De Novo (Meta)Transcriptome Assembly of
Bugula neritina and its Symbiont
• Uncultured bacterial symbiont produces bryostatins
- Symbiont absent in Northern Atlantic populations
• Developing scalable multi-sample metatranscriptome assembly pipeline based on differential-coverage clustering of reads
Acknowledgements
Sahar Al Seesi
Abdul Banday
Amir Bayegan
Gabriel Ilie
Caroline Jakuba
James Lindsay
Rahul Kanadia
Craig Nelson
Marius Nicolae
Adrian Caciula
Nicole Lopanik
Serghei Mangul
Yvette Temate Tiagueu
Alex Zelikovsky