Annotation for Gene Expression
Analysis with Reactome.db Package
Utah State University – Spring 2012
STAT 6570: Statistical Bioinformatics
Cody Tramp
1
References
Ligtenberg W. 2011. Reactome.db: How to
use the reactome.db package.
www.reactome.org
2
Reactome.db Overview
“Open source, open access, manually
curated, and peer-reviewed pathway
database” – www.reactome.org
Reactome.db is an R interface that allows
queries to the SQL database containing
pathway information
Contains functions for converting between
annotation IDs and names for GO, Entrez,
and Reactome
3
Getting Help on Specific Reactome.db
Functions
#Load the Reactome.db package
library(reactome.db)
#Check for main manual pages
?reactome.db #This won't get the actual manual
#List all reactome.db objects
ls("package:reactome.db")
# [1] "reactome"              "reactome_dbconn"       "reactome_dbfile"
# [4] "reactome_dbInfo"       "reactome_dbschema"     "reactomeEXTID2PATHID"
# [7] "reactomeGO2REACTOMEID" "reactomeMAPCOUNTS"     "reactomePATHID2EXTID"
#[10] "reactomePATHID2NAME"   "reactomePATHNAME2ID"   "reactomeREACTOMEID2GO"
#Look up specific manual for an object
?reactome_dbInfo #Still not very useful – poor documentation
4
How IDs and names are stored in
Reactome.db
The reactome.db package links to a SQL database
Functions are interfaces to the database
SQL databases are relational databases
(think of Excel spreadsheets, but better)
Data is stored as key:value pairs
Key     Value
15869   Homo sapiens: Metabolism of nucleotides
68616   Homo sapiens: Assembly of the ORC complex at the origin of replication
68689   Homo sapiens: CDC6 association with the ORC:origin complex
68827   Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
68867   Homo sapiens: Assembly of the pre-replicative complex
68874   Homo sapiens: M/G1 Transition
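As a rough illustration of the key:value idea, the sketch below stores three of the ID and name pairs from these slides in a plain R named vector and looks them up in both directions. It uses base R only. The object names `path.names` and `kv` are made up for this example; the package itself keeps the pairs in its SQL database.

```r
# Toy key:value store built from pairs shown in these slides.
# 'path.names' is a hypothetical name; reactome.db keeps the real pairs in SQL.
path.names <- c(
  "15869" = "Homo sapiens: Metabolism of nucleotides",
  "68616" = "Homo sapiens: Assembly of the ORC complex at the origin of replication",
  "68867" = "Homo sapiens: Assembly of the pre-replicative complex"
)

# Look up a value by its key
path.names[["15869"]]

# Reverse lookup: find the key that holds a given value
names(path.names)[path.names == "Homo sapiens: Assembly of the pre-replicative complex"]

# The same pairs as a two-column table, the shape that toTable() returns
kv <- data.frame(reactome_id = names(path.names),
                 path_name   = unname(path.names),
                 stringsAsFactors = FALSE)
kv
```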
5
Reactome.db Function Uses
(NOTE: all return a key:value list)
Converting Between Entrez and Reactome
reactomeEXTID2PATHID = Entrez ID to Reactome.db ID
reactomePATHID2EXTID = Reactome.db ID to Entrez ID
Use toTable() instead of the as.list() that is shown in the manuals
> xx <- toTable(reactomeEXTID2PATHID)
> head(xx)
  reactome_id gene_id
1      168253   10898
2      168254   10898
3      168253    8106
4      168254    8106
5      168253    5610
6      168254    5610
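The table form is easy to regroup into a list in either direction. The sketch below rebuilds the six rows shown on this slide as a toy data frame, so it runs without reactome.db installed, and uses base R `split()` to group IDs. The names `gene2path` and `path2gene` are invented for this example, and treating both columns as character strings is an assumption.

```r
# Toy copy of the head(xx) rows from this slide.
# With the package loaded, this would be: xx <- toTable(reactomeEXTID2PATHID)
xx <- data.frame(
  reactome_id = c("168253", "168254", "168253", "168254", "168253", "168254"),
  gene_id     = c("10898", "10898", "8106", "8106", "5610", "5610"),
  stringsAsFactors = FALSE
)

# Entrez gene ID -> Reactome IDs: one list element per gene
gene2path <- split(xx$reactome_id, xx$gene_id)
gene2path[["8106"]]

# Reactome ID -> Entrez gene IDs: the reverse direction from the same table
path2gene <- split(xx$gene_id, xx$reactome_id)
path2gene[["168253"]]
```

Working from one table means both directions stay consistent with each other, which is one reason to prefer `toTable()` here.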
6
Reactome.db Function Uses
(NOTE: all return a key:value list)
Converting Between GO ID and Reactome ID
reactomeREACTOMEID2GO = Reactome.db ID to GO IDs
reactomeGO2REACTOMEID = GO ID to Reactome.db ID
> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id      go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003
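Because one Reactome ID maps to many GO IDs, it can help to count the rows per ID before using the table. The sketch below is a base-R illustration on a toy data frame: six rows come from the head(xx) output on this slide and three come from the GO lookup for Reactome ID 15869 later in these slides. The name `go.counts` is made up for this example.

```r
# Toy rows copied from these slides (not the full table).
# With the package loaded, this would be: xx <- toTable(reactomeGO2REACTOMEID)
xx <- data.frame(
  reactome_id = c(rep("168276", 6), rep("15869", 3)),
  go_id = c("GO:0019054", "GO:0019048", "GO:0044068",
            "GO:0022415", "GO:0051701", "GO:0044003",
            "GO:0055086", "GO:0006139", "GO:0044281"),
  stringsAsFactors = FALSE
)

# Number of GO IDs attached to each Reactome ID
go.counts <- table(xx$reactome_id)
go.counts

# GO -> Reactome direction: which Reactome IDs carry a given GO ID
xx$reactome_id[xx$go_id == "GO:0006139"]
```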
7
Reactome.db Function Uses
(NOTE: all return a key:value list)
Retrieving Pathway Names from Reactome IDs
reactomePATHNAME2ID = Reactome.db Name to Reactome.db ID
reactomePATHID2NAME = Reactome.db ID to Reactome.db Name
> xx <- toTable(reactomePATHID2NAME)
> head(xx)
reactome_id path_name
1 15869 Homo sapiens: Metabolism of nucleotides
2 68616 Homo sapiens: Assembly of the ORC complex at the origin of replication
3 68689 Homo sapiens: CDC6 association with the ORC:origin complex
4 68827 Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
5 68867 Homo sapiens: Assembly of the pre-replicative complex
6 68874 Homo sapiens: M/G1 Transition
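Every name in this output starts with a species prefix, so a common first step is to split the prefix from the pathway title. The sketch below does that in base R on three rows copied from this slide. It assumes all names follow the "Species: title" pattern seen here; the column names `species` and `title` are invented for this example.

```r
# Toy copy of three head(xx) rows from this slide.
# With the package loaded, this would be: xx <- toTable(reactomePATHID2NAME)
xx <- data.frame(
  reactome_id = c("15869", "68689", "68874"),
  path_name = c("Homo sapiens: Metabolism of nucleotides",
                "Homo sapiens: CDC6 association with the ORC:origin complex",
                "Homo sapiens: M/G1 Transition"),
  stringsAsFactors = FALSE
)

# Everything before the first ": " is the species; the rest is the title.
# Anchoring on the first ": " leaves colons inside titles ("ORC:origin") intact.
xx$species <- sub(": .*$", "", xx$path_name)
xx$title   <- sub("^[^:]*: ", "", xx$path_name)

# Keep only human pathways, with the shorter title
human <- xx[xx$species == "Homo sapiens", c("reactome_id", "title")]
human
```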
8
Reactome.db Function Uses
(NOTE: all return a key:value list)
reactomeMAPCOUNTS = shows number of rows in each
function’s relational database (not
very useful unless error checking)
> xx <- as.list(reactomeMAPCOUNTS)
> xx
$reactomePATHID2NAME
[1] 13778
$reactomeEXTID2PATHID
[1] 28363
$reactomePATHNAME2ID
[1] 13876
$reactomeGO2REACTOMEID
[1] 3217
$reactomeREACTOMEID2GO
[1] 47575
$reactomePATHID2EXTID
[1] 8320
9
Ex: Find apoptosis induction-related ID
(compare to Notes 6.1 slide 10)
# Get a data.frame summarizing all reactome.db pathways whose name
# includes a certain string
xx <- toTable(reactomePATHNAME2ID)
all.pathways <- xx$path_name # get name of each reactome.db pathway
t <- grep('apoptosis', all.pathways) # get indices where the name includes 'apoptosis'
#use agrep() for approximate term searching
reactome.Term <- unlist(all.pathways[t])
reactome.IDs <- unlist(xx$reactome_id[t])
reactome.frame <- data.frame(reactome.ID=reactome.IDs,
reactome.Term=reactome.Term)
rownames(reactome.frame) <- 1:length(reactome.IDs)
reactome.frame # 13 terms
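A shorter route to the same kind of summary is to subset the table rows directly with `grepl()`, which keeps each ID next to its name. The sketch below runs on a toy data frame so it does not need reactome.db. The IDs and the two apoptosis names are made up for illustration and are not real Reactome records.

```r
# Toy stand-in for toTable(reactomePATHNAME2ID).
# IDs "1" to "3" and the two apoptosis names are hypothetical.
xx <- data.frame(
  reactome_id = c("1", "2", "3"),
  path_name = c("Homo sapiens: Apoptosis",
                "Homo sapiens: Regulation of apoptosis",
                "Homo sapiens: Metabolism of nucleotides"),
  stringsAsFactors = FALSE
)

# Logical index of matching names; subsetting rows keeps ID and name together
hits <- grepl("apoptosis", xx$path_name)
reactome.frame <- xx[hits, ]
rownames(reactome.frame) <- NULL   # renumber rows starting at 1
reactome.frame

# grep() and grepl() are case-sensitive by default, so "Apoptosis" is missed above
hits.ci <- grepl("apoptosis", xx$path_name, ignore.case = TRUE)
sum(hits.ci)
```

Whether a case-insensitive search is wanted depends on the question being asked; it will usually return more pathways than the case-sensitive one.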
10
Ex. Pathway Term Search Function
##Define Function to search for pathways with given key word
##agrep.bool is indicator to use agrep (TRUE) or grep (FALSE)
searchPathways2REACTOMEID <- function(term, agrep.bool)
{
  xx <- toTable(reactomePATHNAME2ID)
  all.pathways <- xx$path_name # get name of each reactome.db pathway
  # get indices where term is found (approximate match if agrep.bool is TRUE)
  if (agrep.bool) t <- agrep(term, all.pathways) else t <- grep(term, all.pathways)
  unlist(xx$reactome_id[t])
}
apop.IDs <- searchPathways2REACTOMEID("apoptosis", FALSE)
length(apop.IDs) #13 pathways matched
apop.IDs <- searchPathways2REACTOMEID("apoptosis", TRUE)
length(apop.IDs) #85 pathways matched
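The jump from 13 to 85 matches reflects that `agrep()` accepts names within a small edit distance of the search term, while `grep()` needs an exact, case-sensitive substring. The base-R sketch below shows one way to inspect what the approximate search adds. The pathway names are made up for illustration, and the variable names `exact` and `approx` are hypothetical.

```r
# Toy pathway names (hypothetical, for illustration only)
all.pathways <- c("Homo sapiens: Apoptosis",
                  "Homo sapiens: Regulation of apoptosis",
                  "Homo sapiens: Metabolism of nucleotides")

# Exact, case-sensitive substring match
exact <- grep("apoptosis", all.pathways)

# Approximate match; the default max.distance of 0.1 allows a small
# number of edits relative to the pattern length
approx <- agrep("apoptosis", all.pathways)

# Names found only by the approximate search; worth reading before use
all.pathways[setdiff(approx, exact)]
```

Listing the extra names this way makes it easier to judge whether the approximate matches are relevant or just near-misses in spelling.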
12
Getting GO Terms from single
Reactome ID
##Get List of GO Terms from Reactome ID
xx <- toTable(reactomeGO2REACTOMEID)
t <- xx$reactome_id == "15869"
GOTerms <- xx$go_id[t]
> GOTerms
[1] "GO:0055086" "GO:0006139" "GO:0044281"
[4] "GO:0034641" "GO:0044238" "GO:0008152"
[7] "GO:0006807" "GO:0044237" "GO:0008150"
[10] "GO:0009987"
> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id      go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003
13
Getting GO Terms from list of
Reactome IDs
##Define Function to get all GO Terms for all Reactome IDs in a list
getGOTerms <- function(list_reactome)
{
  xx <- toTable(reactomeGO2REACTOMEID)
  listGO <- list()
  for (i in seq_along(list_reactome))   # seq_along() is safe for an empty list
  {
    t <- xx$reactome_id == list_reactome[i]   # rows for this Reactome ID
    listGO <- c(listGO, xx$go_id[t])          # append its GO IDs
  }
  unlist(listGO)
}
GOTerms.all <- getGOTerms(apop.IDs)#From slide 10
length(GOTerms.all) #136 GO Terms from 13 apop.IDs
Should have yielded 169 terms (Notes 4.1 slide 10)
– reactome.db might not be complete
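When comparing term counts like these, one thing worth checking is whether a GO ID shared by several pathways is counted once or once per pathway. The sketch below is a vectorized base-R variant of the lookup, run on a toy table. `getGOTermsVec` is a hypothetical name, and although the GO IDs appear in these slides, their assignment to Reactome IDs in this toy table is illustrative only.

```r
# Toy stand-in for toTable(reactomeGO2REACTOMEID).
# The pairing of GO IDs with Reactome IDs here is illustrative, not real data.
xx <- data.frame(
  reactome_id = c("168276", "168276", "15869", "15869", "15869"),
  go_id = c("GO:0019054", "GO:0008152", "GO:0055086", "GO:0006139", "GO:0008152"),
  stringsAsFactors = FALSE
)

# Vectorized lookup: keep every row whose Reactome ID is in the list
getGOTermsVec <- function(list_reactome, map) {
  map$go_id[map$reactome_id %in% list_reactome]
}

ids <- c("168276", "15869")
all.terms <- getGOTermsVec(ids, xx)

length(all.terms)           # a shared GO ID is counted once per pathway
length(unique(all.terms))   # each distinct GO ID is counted once
```

This does not by itself explain the difference from the count in the course notes; it only shows how the two ways of counting can diverge.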
14
Reactome.org Online Tools
15
Pathway Viewer on reactome.org
http://www.reactome.org/userguide/Usersguide.html#Introduction
16
Pathway Viewer on reactome.org
Details Panel
17
Pathway Viewer on reactome.org
http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142
18
Reactome Pathway Symbols
Upregulation and
participating proteins
Inhibition
http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142
19
Reactome Database Assignment Method
Genes seem to be assigned to pathways in a
manner similar to the GO database
If a gene is up-regulated, it is included
Genes that are down-regulated in a condition are
NOT mapped to the condition/pathway
Haven’t received official response from
reactome.org, but from general browsing this
seems to be the case
20
Pathway Analysis Tool
http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage
21
Expression Set Data Analysis
23
Summary
Reactome.db provides an interface to the
SQL database containing IDs
Functions for converting between ID types
No functionality for gene testing through R
Online tools include pathway maps and ID
lookup tables
Some limited expression testing (with
unknown statistical methods)
25
Questions?
26