Transcript Week 15

Tools and Algorithms in Bioinformatics
GCBA815, Fall 2015
Week15: CLC Genomics
Matthew Cserhati, Ph.D.
Bioinformatics Programmer
(Guda lab)
Department of Genetics, Cell Biology and Anatomy
University of Nebraska Medical Center
__________________________________________________________________________________________________
Fall 2015
GCBA 815
Introduction

CLC Genomics Workbench: a comprehensive, user-friendly package for analyzing, comparing, and visualizing next-generation sequencing data
Website: http://www.clcbio.com/products/clcgenomics-workbench/
Latest version: 8.5.1
Also available campus-wide via INBREweb in a virtual machine, which we will test in this class
__________________________________________________________________________________________________
Types of tools

Classical Sequence Analysis Tools
  - Alignments, sequence shuffling, motif search, nucleotide and protein analysis
Molecular Biology Tools
  - Primer design, restriction analysis
BLAST
  - Download databases, BLAST at NCBI, create database
NGS Core Tools
  - QC report, trim reads, read mapping, consensus sequence extraction
De novo sequencing
And much, much more! …
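
To make the "trim reads" step concrete, here is a minimal Python sketch of 3'-end quality trimming for a single read. It is an illustration only, not the Workbench's actual trimming algorithm; the function name, the Phred+33 encoding, and the cutoff of Q20 are assumptions chosen for the example.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim low-quality bases from the 3' end of one read.

    seq    -- base calls, e.g. "ACGT"
    qual   -- FASTQ quality string of the same length (Phred+33 assumed)
    min_q  -- illustrative cutoff; bases below it are trimmed from the end
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the cutoff.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII##")
print(trimmed_seq, trimmed_qual)
```

A read whose bases all fall below the cutoff comes back empty and would normally be discarded.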
__________________________________________________________________________________________________
Description of test files

Paired-end FASTQ files
  - X5: 3.4M reads
  - X8: 16.6M reads
Derived from whole-genome sequencing
Belonging to strains of the same microbial species
Goal: whole-genome assembly from these FASTQ files
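
Before importing paired-end data, it can help to confirm that both files of a pair hold the same number of reads. The sketch below counts FASTQ records assuming the common four-lines-per-record layout; the function name and the tiny in-memory example reads are made up for illustration.

```python
import io


def count_fastq_reads(handle):
    """Count records in an open FASTQ text handle.

    Assumes four lines per record (header, sequence, '+', qualities)
    and ignores blank lines such as a trailing newline.
    """
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // 4


# Two tiny in-memory files stand in for the forward and reverse read files.
r1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCC\n+\nIIII\n")
r2 = io.StringIO("@read1/2\nTTAA\n+\nIIII\n@read2/2\nCCGG\n+\nIIII\n")
n1, n2 = count_fastq_reads(r1), count_fastq_reads(r2)
print(n1, n2, "paired" if n1 == n2 else "mismatch")
```

For real files, pass a handle from `open(path)` or, for compressed data, `gzip.open(path, "rt")`.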
__________________________________________________________________________________________________
Exercises

Import data
  - Open read data
  - Reference genomes
Quality checks
  - QC report
  - Trimming
Guided assembly
De novo assembly
Remove duplicate reads
ORF prediction
Extra: runs with example data
  - mRNA secondary structure
  - Motif search
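
As a rough idea of what the ORF prediction exercise does, the following Python sketch scans all six reading frames for stretches that start with ATG and end at the first in-frame stop codon. It is a simplified illustration, not the Workbench's implementation; the function name, the `min_codons` parameter, and the example sequence are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf_seq) for ATG...stop ORFs in six frames.

    Coordinates are 0-based, end-exclusive, and refer to the strand that
    was scanned. min_codons counts the start and stop codons.
    """
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement)):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


for orf in find_orfs("CCATGAAATTTTAGGG"):
    print(orf)
```

ORFs that run off the end of the sequence without a stop codon are not reported in this sketch.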
__________________________________________________________________________________________________
Guided vs. de novo genome assembly

Guided
  - Aligns reads from FASTQ files to the genome of a related species
  - More efficient and precise than de novo assembly
  - Faster
  - Variant analysis is possible only with guided assembly
De novo
  - Done when no genome from a related species is available
  - Results in contigs, which must then be joined
  - Can be combined with mapping the contigs to the genome of a related species
  - Much slower
Analogy: putting together a jigsaw puzzle with (guided) or without (de novo) a similar puzzle as a template
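
The "puzzle without a template" idea can be sketched with a toy greedy overlap assembler: repeatedly merge the two sequences whose suffix and prefix overlap the most, until nothing overlaps any more. This is only a teaching sketch under simplifying assumptions (error-free reads, one strand, exact overlaps); production assemblers use more elaborate graph-based methods. The function names, `min_len`, and the example reads are invented for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # whatever is left are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads drawn from the toy genome ATGGCGTGCAATCC.
print(greedy_assemble(["ATGGCGT", "GCGTGCA", "TGCAATCC"]))
```

Reads that share no overlap stay as separate contigs, which is why de novo output usually still needs joining.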
__________________________________________________________________________________________________
Variant analysis

Basic
  - Germline and somatic variants
  - Detects any variants observed in reads
Fixed ploidy
  - Germline variants
  - For known ploidy (microbe => 1)
  - Discards variants which are due to sequencing error or mapping artefacts
Low frequency
  - Germline and somatic variants
  - For unknown/mixed ploidy
  - Discards variants which are due to sequencing error
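
The difference between the three approaches can be illustrated at a single genome position with a simplified Python sketch. Simple counts and a frequency threshold stand in here for the statistical error models a real variant detector would use; the mode names, `min_freq`, and the example pileup are assumptions made up for this illustration.

```python
from collections import Counter


def call_variants(ref_base, pileup, mode="basic", min_freq=0.05):
    """Return the non-reference alleles reported at one position.

    pileup -- string of bases observed in the reads covering the position
    mode   -- "basic":    report every non-reference allele seen
              "haploid":  ploidy 1; report the majority allele only if
                          it differs from the reference
              "low_freq": report non-reference alleles whose read
                          fraction is at least min_freq
    """
    counts = Counter(pileup.upper())
    depth = sum(counts.values())
    if depth == 0:
        return []
    alt = {base: n for base, n in counts.items() if base != ref_base}
    if mode == "basic":
        return sorted(alt)
    if mode == "haploid":
        top_base, _ = counts.most_common(1)[0]
        return [top_base] if top_base != ref_base else []
    if mode == "low_freq":
        return sorted(base for base, n in alt.items() if n / depth >= min_freq)
    raise ValueError(f"unknown mode: {mode}")


# 100 reads at a position whose reference base is A: 90 A, 9 G, 1 T.
pileup = "A" * 90 + "G" * 9 + "T"
for mode in ("basic", "haploid", "low_freq"):
    print(mode, call_variants("A", pileup, mode=mode))
```

In this toy case the single T read is treated as a likely sequencing error by the low-frequency mode, while the basic mode reports it.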
__________________________________________________________________________________________________

Sample outputs
(slide images not reproduced in this transcript)
__________________________________________________________________________________________________
Thanks for your attention!
__________________________________________________________________________________________________