Transcript Week 15
Tools and Algorithms in Bioinformatics
GCBA 815, Fall 2015
Week 15: CLC Genomics
Matthew Cserhati, Ph.D.
Bioinformatics Programmer
(Guda lab)
Department of Genetics, Cell Biology and Anatomy
University of Nebraska Medical Center
__________________________________________________________________________________________________
Fall 2015
GCBA 815
Introduction
A comprehensive and user-friendly package for analyzing, comparing, and visualizing next-generation sequencing (NGS) data
Website: http://www.clcbio.com/products/clcgenomics-workbench/
Latest version: 8.5.1
Also available campus-wide via INBREweb in a virtual machine, which we will test in this class
__________________________________________________________________________________________________
Types of tools
Classical Sequence Analysis Tools
Alignments, sequence shuffling, motif search, nucleotide and protein analysis
Molecular Biology Tools
Primer design, restriction analysis
BLAST
Download databases, BLAST at NCBI, create database
NGS Core Tools
QC report, trim reads, read mapping, consensus sequence extraction
De Novo sequencing
And much, much more! …
__________________________________________________________________________________________________
Description of test files
Paired-end FASTQ files
X5: 3.4M reads
X8: 16.6M reads
Derived from whole-genome sequencing
Both samples belong to strains of the same microbial species
Goal: whole-genome assembly from these FASTQ files
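A quick sanity check before importing paired-end data is to confirm that both files of a pair hold the same number of reads. The sketch below is only an illustration and not part of CLC Genomics Workbench. It assumes unwrapped FASTQ (exactly four lines per read), and the function names and the tiny in-memory example records are made up for the demonstration.

```python
import io
from typing import TextIO


def count_fastq_reads(handle: TextIO) -> int:
    """Count reads in a FASTQ stream, assuming four lines per read."""
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // 4


def pair_counts_match(forward: TextIO, reverse: TextIO) -> bool:
    """Return True when both files of a pair contain the same number of reads."""
    return count_fastq_reads(forward) == count_fastq_reads(reverse)


# Hypothetical two-read example standing in for the real forward/reverse files.
forward_text = "@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n"
reverse_text = "@r1/2\nTTAA\n+\nIIII\n@r2/2\nCCGG\n+\nIIII\n"

print(count_fastq_reads(io.StringIO(forward_text)))
print(pair_counts_match(io.StringIO(forward_text), io.StringIO(reverse_text)))
```

For real files, the same functions can be called on handles returned by `open(...)`, or by `gzip.open(..., "rt")` for compressed FASTQ.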
__________________________________________________________________________________________________
Exercises
Import data
Open read data
Reference genomes
Quality checks
QC report
Trimming
Guided assembly
De novo assembly
Remove duplicate reads
ORF prediction
Extra: runs with Example data
mRNA secondary structure
Motif search
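To give a feel for what the trimming exercise does to a read, here is a minimal sketch of 3' quality trimming. It is not the trimming algorithm used by CLC Genomics Workbench. It assumes Phred+33 quality encoding, and the cutoff of 20 and the function names are arbitrary choices for illustration.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Remove trailing bases whose quality is below min_q."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual)
    end = len(seq)
    # Walk back from the 3' end until a base meets the quality cutoff.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40; '#' encodes Q2; '!' encodes Q0.
print(trim_3prime("ACGTACGT", "IIIII#!#"))  # ('ACGTA', 'IIIII')
```

Only the low-quality tail is removed here; a low-quality base in the middle of a read is left alone, which is one reason production trimmers use more elaborate scoring.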
__________________________________________________________________________________________________
Guided vs. de novo genome assembly
Guided
Aligns reads from the FASTQ files to a reference genome from a related species
More efficient and precise than de novo assembly
Faster
Variant analysis relative to the reference is possible only with guided assembly
De novo
Used when no genome from a related species is available
Results in contigs, which must then be joined
Can be combined with mapping the contigs to a genome from a related species
Much slower
Analogy: putting together a jigsaw puzzle with (guided) or without (de novo) a similar puzzle as a template
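The difference between the two approaches can be sketched with a toy example. This is only an illustration under strong assumptions: error-free reads, exact string matching for the guided case, and greedy overlap merging for the de novo case. Production tools use indexed alignment and graph-based assembly instead, and all names and sequences below are made up.

```python
def map_reads(reference: str, reads: list[str]) -> dict[str, int]:
    """Guided: place each read at its first exact match in the reference (-1 if none)."""
    return {read: reference.find(read) for read in reads}


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """De novo: repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reference = "ATGGCGTACGTTAGC"
reads = ["ATGGCGTA", "CGTACGTT", "CGTTAGC"]

print(map_reads(reference, reads))  # {'ATGGCGTA': 0, 'CGTACGTT': 4, 'CGTTAGC': 8}
print(greedy_assemble(reads))       # ['ATGGCGTACGTTAGC']
```

With a template, each read is placed in a single lookup. Without one, every pair of pieces has to be compared again after each merge, which is why the de novo route is so much slower.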
__________________________________________________________________________________________________
Variant analysis
Basic
Germline and somatic variants
Detects any variants observed in reads
Fixed ploidy
Germline variants
For known ploidy (for a microbe, ploidy = 1)
Discards variants that are due to sequencing error or mapping artefacts
Low frequency
Germline and somatic variants
For unknown/mixed ploidy
Discards variants which are due to sequencing error
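The practical difference between the three modes can be illustrated with allele frequencies at a single reference position. The sketch below is not how CLC Genomics Workbench calls variants. A plain frequency cutoff stands in for the error filtering that the real tools perform, and the cutoff values, function names, and example pileup are assumptions chosen for illustration.

```python
from collections import Counter


def allele_frequencies(pileup: str) -> dict[str, float]:
    """Fraction of reads supporting each base at one reference position."""
    counts = Counter(pileup.upper())
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {base: n / depth for base, n in counts.items()}


def call_variants(pileup: str, ref_base: str, min_freq: float) -> list[str]:
    """Report non-reference bases whose frequency reaches min_freq."""
    freqs = allele_frequencies(pileup)
    return sorted(b for b, f in freqs.items() if b != ref_base and f >= min_freq)


# Hypothetical position covered by 100 reads: 90 A, 8 G, 2 T; reference base is A.
pileup = "A" * 90 + "G" * 8 + "T" * 2

print(call_variants(pileup, "A", 0.0))   # report everything seen: ['G', 'T']
print(call_variants(pileup, "A", 0.5))   # haploid-style cutoff (assumed): []
print(call_variants(pileup, "A", 0.05))  # low-frequency-style cutoff (assumed): ['G']
```

In a haploid genome a real variant should be carried by nearly all reads, so a high cutoff is reasonable there, while a mixed population needs a low cutoff that still sits above the expected sequencing error rate.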
__________________________________________________________________________________________________
Sample outputs
__________________________________________________________________________________________________
Thanks for your attention!