Genetic Variant Annotation
CBI Tech. Workshop - NGS Special Session
Lesson 5
Genetic Variant Annotation
Linlin Yan (颜林林)
Center for Bioinformatics, Peking University
Jun 13, 2011
Outline
Review & Overview
Thoughts & Methods
Variant Browsing
Variant Annotation
Association Study
More Beyond
Demos & Exercises
Part I: Review & Overview
Workshop Schedule
Lesson | Topic | Title | Speaker | Date
0 | Warm-up | Warm-up and Introduction | GaoG | 4-25
1 | Basic | File Format & Reads Mapping | YanLL | 5-9
2 | Basic | Solexa Pipeline | CaiT | 5-16
3 | Basic | Alignment File Manipulate | YeYX | 5-23
4 | Genetics | Genetic Variant Caller | LiuH | 5-30
5 | Genetics | Genetic Variant Annotation | YanLL | 6-13
6 | Genetics | Genome Assembling | LiZ | 6-20
7 | | | CaiT | 6-27
8 | Transcriptome (RNA-Seq) | Transcript Mapping | ZhaoHQ | 7-4
9 | Transcriptome (RNA-Seq) | Transcript Assembling | LiuXQ | 7-11
10 | Transcriptome (RNA-Seq) | Differential Expression Caller | ChenWB | 7-18
11 | ... | ChIP-Seq Peak Caller | TangX | 7-25
NGS Analysis Workflow
Sequencer -> Short Reads
Short Reads -> Assembling -> Contigs / Scaffolds
Short Reads -> Mapping -> Alignments
Alignments -> Call Variants -> SNV / CNV / SV -> Annotation
Alignments -> Calculate Expression -> Expression Profile
Alignments -> Call Peaks -> Peaks / Regions
Genetic Variant Analysis Workflow
Sequencer -> Short Reads -> Mapping -> Alignments -> Call Variants -> SNV / CNV / SV -> Annotation
Solexa Pipeline (Lesson 2)
File Format (Lesson 1): FASTQ / Quality / SAM / ...
Reads Mapping (Lesson 1): Maq / Bowtie / BWA
Alignment File Manipulate (Lesson 3): Samtools / BedTools / FASTX-Toolkit
Genetic Variant Caller (Lesson 4): GATK
Genetic Variant Annotation (Lesson 5): PolyPhen / SIFT / ANNOVAR / PLINK / ...
Part II: Thoughts & Methods
What Could Be Inferred from Variants
Genetic Variants (SNV / CNV / SV) + Genome Annotation -> Mutation Effects -> Phenotype / Disease
What is at these positions?
=> Genome Browser
How do the variants affect function?
=> Variant Annotation
Which variants are related to the phenotype?
=> Association Study
More beyond ...
=> Disease: CDCV vs. CDRV
Genome Browser
Online Browsers:
UCSC Genome Browser
http://genome.ucsc.edu/
Ensembl Genome Browser
http://www.ensembl.org/
DNAnexus
https://dnanexus.com/genomes/hg18/public_browse
Local Browsers:
IGV (Integrative Genomics Viewer)
http://www.broadinstitute.org/igv/
UCSC Genome Browser
(http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg19)
UCSC Genome Browser (cont.)
Supported Formats:
BED / bigBed
bedGraph
GFF
GTF
WIG / bigWig
MAF
BAM
BED detail
Personal Genome SNP
PSL
(http://genome.ucsc.edu/)
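Several of these formats, BED in particular, use 0-based, half-open coordinates, whereas variant callers usually report 1-based positions (as in VCF). A minimal sketch, assuming variant calls are available as simple (chrom, pos, ref, alt) tuples, that turns them into a BED custom track; the variants and the track name `my_variants` are hypothetical.

```python
# Minimal sketch: convert 1-based variant calls (VCF-style) into BED lines
# for a UCSC custom track. Input records and track name are hypothetical.

def variants_to_bed(variants, track_name="my_variants"):
    """variants: iterable of (chrom, pos, ref, alt), pos being 1-based."""
    lines = [f'track name={track_name} description="Called variants" visibility=2']
    for chrom, pos, ref, alt in variants:
        start = pos - 1          # BED start: 0-based
        end = start + len(ref)   # BED end: exclusive, covers all REF bases
        lines.append(f"{chrom}\t{start}\t{end}\t{ref}>{alt}")
    return lines


if __name__ == "__main__":
    calls = [("chr1", 1001, "A", "G"), ("chr2", 500, "CT", "C")]
    print("\n".join(variants_to_bed(calls)))
```

The SNV at position 1001 becomes `chr1 1000 1001`, which the browser displays as chr1:1,001; `visibility=2` asks for the track in full display mode.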
IGV (Integrative Genomics Viewer)
(http://www.broadinstitute.org/igv/)
UCSC: Table Browser & Public DB
Retrieve track data in batch
Retrieve sequences in specific regions
Combine regions and/or annotations
Query track data from the public MySQL database
(http://genome.ucsc.edu/cgi-bin/hgTables)
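The tables behind the Table Browser can also be queried directly. A sketch that only builds a parameterized SQL query for RefSeq genes (the `refGene` table) overlapping a region; the server name, user and database in the comments are assumptions to check against UCSC's current documentation, and the connection step uses the third-party `pymysql` driver, which is not part of the standard library.

```python
# Sketch: build a parameterized query against the UCSC public MySQL database
# for RefSeq genes (table refGene) overlapping a region.
# Assumed server: genome-mysql.soe.ucsc.edu, user "genome", no password,
# one database per assembly (e.g. "hg19"). Verify before use.

def genes_in_region_query(chrom, start, end):
    """Return (sql, params) for genes overlapping the 0-based, half-open
    region [start, end). Two half-open intervals overlap when each one
    starts before the other ends."""
    sql = (
        "SELECT name, name2, strand, txStart, txEnd "
        "FROM refGene "
        "WHERE chrom = %s AND txStart < %s AND txEnd > %s"
    )
    return sql, (chrom, end, start)


if __name__ == "__main__":
    sql, params = genes_in_region_query("chr17", 7_500_000, 7_600_000)
    print(sql)
    print(params)
    # With pymysql installed (third-party, not run here):
    #   conn = pymysql.connect(host="genome-mysql.soe.ucsc.edu",
    #                          user="genome", database="hg19")
    #   with conn.cursor() as cur:
    #       cur.execute(sql, params)
    #       rows = cur.fetchall()
```

Passing the region as parameters instead of formatting it into the string lets the driver handle quoting.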
These are KNOWN variants.
How about UNKNOWN variants?
Mutation Effects Prediction
SIFT (Sorting Intolerant From Tolerant)
http://sift.jcvi.org/
PolyPhen (Polymorphism Phenotyping)
http://genetics.bwh.harvard.edu/pph/
MAPP (Multivariate Analysis of Protein Polymorphism)
http://mendel.stanford.edu/SidowLab/downloads/MAPP/index.html
SNPs3D
http://www.snps3d.org/
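Tools such as SIFT predict whether a new amino-acid substitution is harmful from how conserved the position is among homologous proteins. The toy sketch below imitates the idea of a scaled probability (frequency of the new residue divided by that of the most frequent residue) with a flat pseudocount and a 0.05 cutoff; it is NOT the SIFT algorithm, which builds its alignment with PSI-BLAST and uses more elaborate pseudocounts, and the alignment column is hypothetical.

```python
# Toy conservation score in the spirit of SIFT's scaled probability.
# NOT the real SIFT algorithm: flat pseudocount, hand-made alignment column.
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def scaled_probabilities(column, pseudocount=0.1):
    """column: residues observed at one position across homologous proteins."""
    counts = Counter(column)
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    # Scale by the most frequent residue so the best residue scores 1.0
    return {aa: p / top for aa, p in probs.items()}

def predict(column, alt_aa, cutoff=0.05):
    """Scores at or below the cutoff are called deleterious."""
    score = scaled_probabilities(column)[alt_aa]
    return score, ("deleterious" if score <= cutoff else "tolerated")


if __name__ == "__main__":
    column = "LLLLLLLLLI"   # hypothetical: mostly Leu, one Ile among homologs
    for alt in "IP":
        print(alt, predict(column, alt))
```

Here Ile, already seen among the homologs, is tolerated, while Pro, never observed, falls below the cutoff.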
Automatic Variant Annotation
ANNOVAR (ANNOtate VARiation)
http://www.openbioinformatics.org/annovar/
Gene-based annotation
Whether SNPs/CNVs change protein-coding sequences
Region-based annotation
Variants in specific regions
Filter-based annotation
Variants reported in dbSNP or the 1000 Genomes Project
Filter by SIFT score
Others
Retrieve sequences or candidate gene lists in batch
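The gene-based and filter-based modes can be pictured on toy data. This sketch is not ANNOVAR's implementation: the gene model `GENE_A`, the known-variant set and the SIFT scores are hypothetical, and ANNOVAR itself reads its own database files and reports finer categories (e.g. splicing, UTR).

```python
# Sketch of gene-based and filter-based annotation on toy data.
# GENE_A, the known-variant set and the SIFT scores are hypothetical.

GENES = {
    # name: (chrom, sorted exons as 0-based, half-open intervals)
    "GENE_A": ("chr1", [(1000, 1100), (1500, 1650), (2000, 2200)]),
}

def gene_based(chrom, pos):
    """Classify a 1-based position as exonic / intronic / intergenic."""
    x = pos - 1  # convert to 0-based
    for name, (g_chrom, exons) in GENES.items():
        if g_chrom != chrom:
            continue
        if any(s <= x < e for s, e in exons):
            return name, "exonic"
        if exons[0][0] <= x < exons[-1][1]:   # inside the transcript span
            return name, "intronic"
    return None, "intergenic"

def filter_based(variants, known, sift, cutoff=0.05):
    """Keep variants absent from `known` whose SIFT score is <= cutoff."""
    kept = []
    for v in variants:
        if v in known:
            continue              # already reported (dbSNP-like filter)
        score = sift.get(v)
        if score is not None and score <= cutoff:
            kept.append(v)
    return kept


if __name__ == "__main__":
    variants = [("chr1", 1050, "A", "G"), ("chr1", 1520, "C", "T"),
                ("chr1", 2010, "G", "A")]
    known = {("chr1", 1050, "A", "G")}
    sift = {("chr1", 1520, "C", "T"): 0.01, ("chr1", 2010, "G", "A"): 0.40}
    for v in variants:
        print(v, gene_based(v[0], v[1]))
    print(filter_based(variants, known, sift))
```

Chaining the two steps mirrors the slide: first label where a variant falls, then discard known or likely benign variants to shorten the candidate list.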
Between Patients and Normals
Too many variants are detected
Most variants are not related to the target disease
Comparing the MAF (Minor Allele Frequency) between patients and normals can point to disease-related variants
SNP | MAF (Patients) | MAF (Normals) | Related
SNP1 | 5% | 5% | No
SNP2 | 40% | 10% | Yes
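A MAF difference is usually judged with a statistical test. A sketch of a 2x2 allelic chi-square test (minor vs. major allele counts in patients vs. normals, 1 degree of freedom, no continuity correction); the MAF is the minor-allele count divided by twice the number of individuals. The slide gives only MAFs, so the sample size of 100 patients and 100 normals is hypothetical.

```python
# Sketch: 2x2 allelic chi-square test (1 df, no continuity correction)
# comparing minor-allele counts between patients and normals.
# Sample sizes are hypothetical; the slide gives only the MAFs.
import math

def allelic_chi2(case_minor, case_total, ctrl_minor, ctrl_total):
    """Counts are alleles (2 per diploid individual). Returns (chi2, p)."""
    a, b = case_minor, case_total - case_minor
    c, d = ctrl_minor, ctrl_total - ctrl_minor
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Chi-square with 1 df is a squared standard normal: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    alleles = 200  # hypothetical: 100 patients and 100 normals
    for snp, maf_case, maf_ctrl in [("SNP1", 0.05, 0.05), ("SNP2", 0.40, 0.10)]:
        chi2, p = allelic_chi2(round(maf_case * alleles), alleles,
                               round(maf_ctrl * alleles), alleles)
        print(f"{snp}: chi2={chi2:.1f}, p={p:.2g}")
```

Under these assumed sample sizes SNP1 gives chi2 = 0 (p = 1) and SNP2 gives chi2 = 48, a very small p-value. PLINK's basic case/control test is of this allelic kind.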
Association Study Tools
PLINK
http://pngu.mgh.harvard.edu/~purcell/plink/
gPLINK
http://pngu.mgh.harvard.edu/~purcell/plink/gplink.shtml
Haploview
http://www.broadinstitute.org/scientific-community/science/programs/medical-and-population-genetics/haploview/haploview
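PLINK reads genotypes in a plain-text pair of files: a .map file (one line per SNP) and a .ped file (one line per individual, six leading columns followed by two alleles per SNP). A sketch that builds both for a case/control study; the sample IDs, SNP IDs and genotypes are hypothetical.

```python
# Sketch: build PLINK text-format input (.map and .ped) for a case/control
# study. Sample IDs, SNP IDs and genotypes below are hypothetical.

def plink_text(snps, samples):
    """snps: list of (chrom, snp_id, bp_position).
    samples: list of (sample_id, sex, affected, genotypes); sex is 1 (male)
    or 2 (female), genotypes is one two-letter string per SNP ("00" = missing).
    Returns (map_lines, ped_lines)."""
    # .map: chromosome, SNP ID, genetic distance (0 = unused), bp position
    map_lines = [f"{chrom}\t{snp_id}\t0\t{pos}" for chrom, snp_id, pos in snps]
    ped_lines = []
    for sample_id, sex, affected, genotypes in samples:
        pheno = 2 if affected else 1  # case/control coding: 2 = affected, 1 = unaffected
        alleles = " ".join(f"{g[0]} {g[1]}" for g in genotypes)
        # .ped: family ID, individual ID, father, mother (0 = unknown), sex, phenotype
        ped_lines.append(f"{sample_id} {sample_id} 0 0 {sex} {pheno} {alleles}")
    return map_lines, ped_lines


if __name__ == "__main__":
    snps = [("1", "rs0001", 1000), ("1", "rs0002", 2000)]
    samples = [("P1", 1, True, ["AG", "CC"]), ("N1", 2, False, ["AA", "CT"])]
    map_lines, ped_lines = plink_text(snps, samples)
    print("\n".join(map_lines))
    print("\n".join(ped_lines))
```

Saved as demo.map and demo.ped, the files can be passed to PLINK with `plink --file demo --assoc` for a basic case/control association test.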
More Beyond: Finding the Causal Gene
Two Disease Hypothesis Models:
CDCV: Common Disease, Common Variant
CDRV: Common Disease, Rare Variant
To Find Rare Variants:
From GWAS (Microarray) to Sequencing
More Samples
Methods that pool rare variants together
Rare Variant Analysis
Gene-Based Method
(PMID:17660818)
Pooling the Rare Variants
Fixed-Threshold Method (Li & Leal, 2008)
Weighted Approach (Madsen & Browning, 2009)
Variable-Threshold Method (VT-Test) (Price et al., 2010)
http://genetics.bwh.harvard.edu/rare_variants/
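Pooling methods collapse the rare variants in a gene into one score per individual and then compare scores between cases and controls. A simplified sketch: one score counts minor alleles at variants below a fixed MAF threshold, the other uses Madsen-Browning style weights estimated from controls. The genotype matrix is hypothetical, and the test statistic (difference in mean score, assessed by permutation) is a simplification rather than any of the published procedures.

```python
# Simplified sketch of pooling ("collapsing") rare variants within one gene.
# Hypothetical data; the mean-difference statistic is a simplification.
import math
import random

def fixed_threshold_scores(genos, mafs, threshold=0.01):
    """genos[i][j]: minor-allele count (0/1/2) of person i at variant j."""
    return [sum(g for g, f in zip(row, mafs) if f < threshold) for row in genos]

def weighted_scores(genos, is_case):
    """Madsen-Browning style: weight w = sqrt(n*q*(1-q)), q from controls."""
    n = len(genos)
    controls = [row for row, case in zip(genos, is_case) if not case]
    weights = []
    for j in range(len(genos[0])):
        m_u = sum(row[j] for row in controls)        # minor alleles in controls
        q = (m_u + 1) / (2 * len(controls) + 2)      # smoothed control frequency
        weights.append(math.sqrt(n * q * (1 - q)))
    # Rarer variants (smaller q) get smaller w, hence larger contribution g / w
    return [sum(g / w for g, w in zip(row, weights)) for row in genos]

def mean_difference(scores, labels):
    cases = [s for s, c in zip(scores, labels) if c]
    ctrls = [s for s, c in zip(scores, labels) if not c]
    return sum(cases) / len(cases) - sum(ctrls) / len(ctrls)

def permutation_p(scores, is_case, n_perm=1000, seed=1):
    """One-sided p-value: how often shuffled labels give a difference
    at least as large as the observed one."""
    observed = mean_difference(scores, is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mean_difference(scores, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    genos = [[1, 0, 0], [0, 1, 0], [1, 1, 0],   # cases
             [0, 0, 0], [0, 0, 1], [0, 0, 0]]   # controls
    is_case = [True, True, True, False, False, False]
    mafs = [0.005, 0.008, 0.2]
    for name, s in (("fixed", fixed_threshold_scores(genos, mafs)),
                    ("weighted", weighted_scores(genos, is_case))):
        print(name, [round(x, 2) for x in s], permutation_p(s, is_case))
```

The VT-Test goes one step further: it repeats the test at every candidate MAF threshold, keeps the maximum statistic, and assesses that maximum by permutation; that outer loop is not shown here.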
Part III: Demos & Exercises
Demos
Data Preparation
Reads Mapping
Variant Calling
BED/Wig generation
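For the WIG part of this step, per-base read depth can be written as a variableStep track. A sketch assuming the input is the three-column output of `samtools depth` (reference name, 1-based position, depth); the depth values are made up. Unlike BED, WIG positions are 1-based, so no coordinate shift is needed.

```python
# Sketch: turn per-base read depth into a variableStep WIG track.
# Assumes 3-column `samtools depth` lines (reference, 1-based position, depth);
# the example values are made up.

def parse_samtools_depth(lines):
    """Return {chrom: {position: depth}} from tab-separated depth lines."""
    depth = {}
    for line in lines:
        chrom, pos, dp = line.rstrip("\n").split("\t")[:3]
        depth.setdefault(chrom, {})[int(pos)] = int(dp)
    return depth

def depth_to_wig(chrom, depths, track_name="coverage"):
    """depths: {1-based position: depth}; WIG positions are also 1-based."""
    lines = [f"track type=wiggle_0 name={track_name}",
             f"variableStep chrom={chrom} span=1"]
    for pos in sorted(depths):
        if depths[pos] > 0:          # skip uncovered positions
            lines.append(f"{pos} {depths[pos]}")
    return lines


if __name__ == "__main__":
    raw = ["chr1\t100\t5", "chr1\t101\t3", "chr1\t102\t0"]
    for chrom, depths in parse_samtools_depth(raw).items():
        print("\n".join(depth_to_wig(chrom, depths)))
```

The resulting text can be uploaded to the UCSC Genome Browser as a custom track or opened in IGV alongside the BAM file.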
Demos (cont.)
UCSC Genome Browser
Uploading BAM/BED/Wig
IGV Genome Browser
Loading BAM/BED/Wig
UCSC Table Browser
Retrieve track data
Retrieve coding sequences
UCSC Public Database
Demos (cont.)
SIFT & PolyPhen
ANNOVAR
PLINK
VT-Test
Thanks for your attention!