Oncotator
Transcript selection and its importance for variant annotation
Overview
• Background
  – Transcript selection has a large effect on annotation results.
    • Oncotator has two selection modes: CANONICAL and EFFECT
  – Existing selection modes often fail to capture expected variant annotations
    • Crucial for clinical applications!
• Approach
  – Construct a list of transcripts that should be used for annotation
  – Override the selection mode when the list is provided on the command line
• Results
  – New list of optimal transcripts for each gene
    • Composed of transcripts with 100% sequence identity to the UniProt record
      – Plus some additional tweaks
  – We now “correctly” annotate...
    • ...all clinically actionable variants described in MyCancerGenome
    • ...all genes captured by MGH’s SNAPSHOT assay
Oncotator Transcript Selection modes
• CANONICAL (default)
  1. GENCODE level of curation
  2. variant classification score
  3. APPRIS
  4. transcript sequence length
  5. alphabetical
• EFFECT
  1. variant classification score
  2. GENCODE level of curation
  3. APPRIS
  4. transcript sequence length
  5. alphabetical
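The two priority orders above can be read as sort keys that differ only in their first two criteria. A minimal sketch, assuming a hypothetical `Transcript` record: the field names, the "lower rank is better" convention, and the preference for longer transcripts are illustrative assumptions, not Oncotator's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # All fields are hypothetical stand-ins for illustration.
    tx_id: str
    gencode_level: int  # assumed: 1 = best curated
    effect_rank: int    # assumed: 0 = most severe variant classification
    appris_rank: int    # assumed: 0 = best APPRIS tag
    length: int         # transcript sequence length


def sort_key(tx, mode):
    """Build a tuple that orders transcripts by the criteria of the given mode."""
    # Criteria 3-5 are shared by both modes (longer assumed better, then by name).
    tail = (tx.appris_rank, -tx.length, tx.tx_id)
    if mode == "CANONICAL":
        return (tx.gencode_level, tx.effect_rank) + tail
    if mode == "EFFECT":
        return (tx.effect_rank, tx.gencode_level) + tail
    raise ValueError(f"unknown mode: {mode}")


def select_transcript(transcripts, mode="CANONICAL"):
    """Return the best-ranked transcript under the given mode."""
    return min(transcripts, key=lambda tx: sort_key(tx, mode))


candidates = [
    Transcript("tx_A", gencode_level=1, effect_rank=5, appris_rank=0, length=900),
    Transcript("tx_B", gencode_level=2, effect_rank=1, appris_rank=1, length=1200),
]
print(select_transcript(candidates, "CANONICAL").tx_id)
print(select_transcript(candidates, "EFFECT").tx_id)
```

With these toy records the two modes pick different transcripts, which is the behaviour the comparison of CANONICAL and EFFECT annotations illustrates.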
Gene   Variant           Canonical Classification  Canonical Annotation  Effect Classification  Effect Annotation
CRLF2  chrX:1314966A>C   5'UTR                     -                     Missense_Mutation      p.F232C
EGFR   chr7:55259515T>G  Missense_Mutation         p.L813R               Missense_Mutation      p.L813R
• Manual transcript selection is necessary to get the EGFR p.L858R annotation
Approach
1. Compile a list of well-known variants and their expected annotations
  – mycancergenome.org
    • 212 variants
  – MGH SNAPSHOT assay
    • Additional 30 variants
2. “Reverse oncotate” to get expected genomic variants
  – e.g. BRAF p.V600E → g.chr7:140453136A>T
3. Compare reverse-oncotated variant annotations with expected results from Step 1.
  – e.g. g.chr7:140453136A>T → ???
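Step 3 amounts to counting how many genomic variants come back with the protein change expected from Step 1. A minimal sketch, assuming annotations are held in plain dictionaries keyed by genomic variant: the `concordance` helper is hypothetical, and the BRAF entry in the observed result is assumed for illustration only (the EGFR mismatch is the one shown in the table above).

```python
def concordance(expected, observed):
    """Count variants whose observed protein change equals the expected one.

    Both arguments map a genomic variant string to a protein change string.
    Returns (number of matches, number of expected variants).
    """
    matches = sum(
        1 for variant, protein in expected.items()
        if observed.get(variant) == protein
    )
    return matches, len(expected)


# Expected annotations from the curated variant list (Step 1 / Step 2).
expected = {
    "chr7:140453136A>T": "p.V600E",  # BRAF
    "chr7:55259515T>G": "p.L858R",   # EGFR
}

# Toy annotator output for one selection mode.
observed = {
    "chr7:140453136A>T": "p.V600E",  # assumed for illustration
    "chr7:55259515T>G": "p.L813R",   # EGFR result reported for CANONICAL mode
}

matched, total = concordance(expected, observed)
print(f"{matched}/{total} concordant ({100 * matched / total:.0f}%)")
```

Running the same comparison once per selection mode or transcript list gives one concordance figure per approach.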
Results
Tx Selection Approach             Concordance with Expected Annotation
Oncotator Modes: Canonical        86% (243/283)
Oncotator Modes: Effect           91% (257/283)
Transcript Lists: UniProt Exact   98% (279/283)
Transcript Lists: Clinical        100% (283/283)
• Transcript Lists
  – “UniProt Exact” (tx_exact_uniprot_matches.txt)
    • 24,000 transcripts with perfect sequence identity to the UniProt record sequence
      – UniProt record == gene’s canonical transcript
  – “Clinical”
    • “UniProt Exact” list + 3 additional transcripts
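One way to picture how a "UniProt Exact" style list could be assembled is to keep every transcript whose translated protein is identical to the UniProt sequence for its gene. A minimal sketch under stated assumptions: the `exact_uniprot_matches` helper, the input dictionaries, the toy IDs and sequences, and the stripping of a trailing `*` stop symbol are all hypothetical, and the slides do not describe how the actual file was built.

```python
def exact_uniprot_matches(tx_proteins, uniprot_seqs):
    """Return sorted transcript IDs whose protein equals the gene's UniProt sequence.

    tx_proteins:  transcript ID -> (gene symbol, translated protein sequence)
    uniprot_seqs: gene symbol -> UniProt record sequence
    """
    matches = []
    for tx_id, (gene, protein) in tx_proteins.items():
        reference = uniprot_seqs.get(gene)
        # Assumption: translations may end with '*' for the stop codon.
        if reference is not None and protein.rstrip("*") == reference:
            matches.append(tx_id)
    return sorted(matches)


# Toy inputs; IDs, genes, and sequences are made up.
tx_proteins = {
    "tx_1": ("GENE1", "MKTAYIAK*"),
    "tx_2": ("GENE1", "MKTAYK*"),
    "tx_3": ("GENE2", "MSLLQ"),
}
uniprot_seqs = {"GENE1": "MKTAYIAK", "GENE2": "MSLLQ"}

exact = exact_uniprot_matches(tx_proteins, uniprot_seqs)
print(exact)
```

A "Clinical" style list would then be this result plus a small number of manually chosen transcripts.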
Conclusions
• We now have a list of preferred transcripts that we recommend using in most settings
  – “-c” option on command line
  – 100% concordance with variant annotations in MyCancerGenome
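The override behaviour (a preferred-transcript list taking precedence over the selection mode) can be sketched as follows. This is an illustration only: the one-ID-per-line file format, the `load_preferred` and `choose` helpers, and the transcript IDs are assumptions, not Oncotator's actual file format or code.

```python
import io


def load_preferred(handle):
    """Read preferred transcript IDs, one per line (assumed format)."""
    return {
        line.strip()
        for line in handle
        if line.strip() and not line.startswith("#")
    }


def choose(candidate_ids, preferred, fallback):
    """Use a preferred transcript if one overlaps the variant; otherwise
    defer to the regular selection mode, passed in as `fallback`."""
    hits = sorted(tx for tx in candidate_ids if tx in preferred)
    if hits:
        return hits[0]
    return fallback(candidate_ids)


# Stand-in for the file given on the command line.
preferred = load_preferred(io.StringIO("# preferred transcripts\ntx_2\n\n"))

# `min` stands in for a real CANONICAL or EFFECT ranking.
print(choose(["tx_1", "tx_2"], preferred, fallback=min))
print(choose(["tx_1", "tx_3"], preferred, fallback=min))
```

Keeping the mode as a fallback means genes absent from the list are still annotated as before.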