BEL Framework Resources
(namespaces, equivalences, documents)
August 2012
This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view
a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a
letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California,
94041, USA.
Overview
• The BEL Framework accesses files during compilation
– For checking and equivalencing namespace values
– For augmenting the KAM to increase connectivity
• A set of resource files is maintained by OpenBEL
• Flexible: resources can be substituted or augmented with user-provided
documents
Contents
• Resource locations
• Resources
– Namespaces
– Equivalences
– Annotations
– BEL Documents
• Creating and Using Custom Namespaces
Resource Locations
• Resources provided by the BEL Framework can be found here:
– http://resource.belframework.org/belframework/1.0/
• Can also be downloaded from GitHub
– https://github.com/OpenBEL/openbel-framework-resources
The BEL Framework Configuration Includes a Resource Index
• Provides locations for namespace, equivalence, and
augmentation documents
• The default index can be used as is, or modified to point to custom
namespaces, equivalences, etc.
• Default Resource Index:
– http://resource.belframework.org/belframework/1.0/index.xml
BEL Namespaces
• OpenBEL supports 32 namespaces for:
– genes/RNAs/proteins
– protein families
– named complexes
– biological processes
– chemicals
• Namespace documents (.belns) have a specific format
– Include entity encodings to enforce BEL function semantics
• Users can provide custom namespaces
Supported Namespaces
• Genes, RNAs, microRNAs, proteins (6 namespaces)
– Entrez Gene Ids (human, mouse, and rat only)
– HUGO gene symbols
– MGI gene symbols
– RGD gene symbols
– SwissProt accession numbers
– SwissProt names
• Affymetrix Probe Sets (9 namespaces)
– Human, mouse, and rat probe set identifiers
• Protein families (3 namespaces)
– Selventa Protein Families (human, mouse, rat)
Supported Namespaces
• Chemicals (3 namespaces)
– ChEBI names
– ChEBI Ids
– Selventa legacy chemicals
• Biological processes and pathologies (5 namespaces)
– GO names
– GO Ids
– MeSH Phenomena and Processes [G]
– MeSH Diseases [C]
– Selventa legacy diseases
Supported Namespaces
• Named Complexes (5 namespaces)
– Selventa Named Complexes (human, mouse, rat)
– GO Cellular Components names
– GO Cellular Components Ids
• Cellular locations (3 namespaces)
– MeSH Cellular structures [A11.284]
– GO Cellular Components names
– GO Cellular Components Ids
BEL Namespace Documents
• Namespaces are .belns files
– Text files with header information and values
• Values include encoding information
– Which BEL functions are valid to apply to this entity
Namespace Entity Encoding
Encoding Value   Valid BEL Functions
B                bp(), path()
O                path()
R                r(), m()
M                m()
P                p()
G                g()
A                a(), r(), m(), p(), g(), complex()
C                complex()
• Example value from the HGNC namespace
– A2ML1-AS1 (A2ML1 antisense RNA 1), encoded as "GR", is a valid value
for a gene or RNA abundance, but not for a protein abundance
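The encoding check can be sketched in a few lines of Python. This is only an illustration, not the BEL Framework's own implementation: the names ENCODING_FUNCTIONS, valid_functions, and is_valid are hypothetical, and the sketch assumes that a multi-letter encoding such as "GR" permits the union of the functions allowed by each letter.

```python
# Encoding letters and the BEL functions they permit, copied from the table above.
ENCODING_FUNCTIONS = {
    "B": {"bp", "path"},
    "O": {"path"},
    "R": {"r", "m"},
    "M": {"m"},
    "P": {"p"},
    "G": {"g"},
    "A": {"a", "r", "m", "p", "g", "complex"},
    "C": {"complex"},
}


def valid_functions(encoding):
    """Return every BEL function allowed by an encoding string such as 'GR'.

    Assumption: each letter contributes its own functions (set union).
    """
    functions = set()
    for letter in encoding:
        functions |= ENCODING_FUNCTIONS[letter]
    return functions


def is_valid(function, encoding):
    """Check whether a BEL function may be applied to a value with this encoding."""
    return function in valid_functions(encoding)


# A2ML1-AS1 is encoded as "GR": usable in g() and r(), but not in p().
print(is_valid("g", "GR"), is_valid("r", "GR"), is_valid("p", "GR"))
```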
BEL Equivalence Files
• A BEL Equivalence File (.beleq) is associated with each BEL
namespace
• Each namespace value in the equivalence file is associated
with a universally unique identifier (UUID)
– 32 hexadecimal digits
• Values with the same UUID are equivalenced
– Terms that apply the same function to equivalenced values are coalesced
to a single node during compilation
• Values in a namespace file are not required to be included in
the associated equivalence file
Example: Equivalences for MGI namespace
Examples of BEL Equivalencing
• The following three protein abundance terms are
equivalent:
– p(HGNC:AKT1)
• The abundance of the protein designated by HUGO gene symbol
‘AKT1’
– p(EGID:207)
• The abundance of the protein designated by EntrezGene Id 207
(AKT1 Human)
– p(SPAC:P31749)
• The abundance of the protein designated by SwissProt Id P31749
(AKT1 Human)
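A rough Python sketch of how UUID-based equivalencing could coalesce these terms into one node. It is not the compiler's actual code: the EQUIVALENCES table, node_key, and coalesce are hypothetical names, and the UUID is a randomly generated placeholder rather than a value from a real .beleq file.

```python
import uuid
from collections import defaultdict

# Placeholder UUID standing in for the identifier shared by AKT1 values.
AKT1_UUID = str(uuid.uuid4())

# Hypothetical in-memory equivalence table: (namespace, value) -> UUID.
EQUIVALENCES = {
    ("HGNC", "AKT1"): AKT1_UUID,
    ("EGID", "207"): AKT1_UUID,
    ("SPAC", "P31749"): AKT1_UUID,
}


def node_key(function, namespace, value):
    """Build the key under which a term is merged.

    Values without a UUID keep their own namespace:value identity, since a
    namespace value is not required to appear in the equivalence file.
    """
    uid = EQUIVALENCES.get((namespace, value))
    identity = uid if uid is not None else f"{namespace}:{value}"
    return (function, identity)


def coalesce(terms):
    """Group (function, namespace, value) terms that map to the same node."""
    nodes = defaultdict(list)
    for function, namespace, value in terms:
        key = node_key(function, namespace, value)
        nodes[key].append(f"{function}({namespace}:{value})")
    return dict(nodes)


terms = [
    ("p", "HGNC", "AKT1"),
    ("p", "EGID", "207"),
    ("p", "SPAC", "P31749"),
    ("r", "HGNC", "AKT1"),  # same value, different function: separate node
]
for members in coalesce(terms).values():
    print(members)
```

In this sketch the three p() terms end up in one group, while r(HGNC:AKT1) stays separate because only terms with the same function are merged.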
Examples of BEL Equivalencing
• The following two biological process terms are
equivalent:
– bp(MESHPP:apoptosis)
• The biological process designated by the MeSH Phenomena and
Processes heading ‘apoptosis’
– bp(GOID:0006915)
• The biological process designated by the GO Id 0006915 (apoptotic
process)
BEL Annotations
• BEL Annotations and BEL Terms are completely separate
• Annotations are associated with BEL Statements to express context
information about the statement
– Source of the knowledge
• Citation, Evidence
– Biological system
• Cell line, Body part, Species
• 22 Annotation Types are provided with the BEL Framework
– 2 reserved types: Citation and Evidence
– 20 additional defined by .belanno documents
• Additional Annotation Types can be defined by the user
– Each requires a name that is unique within the BEL document and a domain
of allowable values, given as a list, a .belanno document, or a regular expression
Annotations Can Be Applied to Individual BEL Statements or Groups of Statements
• Source: PMID 1234567
– Cell Type: Fibroblast, Tissue: Lung
• kin(p(X)) increases p(Z);
• p(X) increases r(Y);
• Causal relationships demonstrated in lung fibroblasts, reported in PMID 1234567
– Cell Type: Endothelial Cell, Tissue: Liver
• p(X) increases r(Y);
• Causal relationship demonstrated in liver endothelial cells, reported in PMID 1234567
• Each Statement is distinct: these Statements have different sets of contexts
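One way to picture why identical statement text yields distinct Statements is to model each Statement together with its set of annotations. The Python sketch below is purely illustrative: AnnotatedStatement and apply_annotations are hypothetical names, not part of the BEL Framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedStatement:
    """A BEL Statement paired with the context annotations applied to it."""
    statement: str
    annotations: frozenset  # set of (annotation type, value) pairs


def apply_annotations(statements, annotations):
    """Attach the same annotation set to every statement in a group."""
    context = frozenset(annotations.items())
    return [AnnotatedStatement(s, context) for s in statements]


source = {"Source": "PMID 1234567"}

lung = apply_annotations(
    ["kin(p(X)) increases p(Z)", "p(X) increases r(Y)"],
    {**source, "Cell Type": "Fibroblast", "Tissue": "Lung"},
)
liver = apply_annotations(
    ["p(X) increases r(Y)"],
    {**source, "Cell Type": "Endothelial Cell", "Tissue": "Liver"},
)

# The text of lung[1] and liver[0] is identical, but the contexts differ,
# so the two objects compare as different.
distinct = set(lung + liver)
print(len(distinct))
```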
Citation Annotation Format
• The Citation annotation is composed of a comma-separated list
of up to 6 fields.
– SET Citation = {"PubMed","Cell","16962653","2006-10-07","Jacinto E|Facchinetti V|Liu D|Soto N|Wei S|Jung SY|Huang Q|Qin J|Su B",""}
Field  Required  Contents
1      Yes       Type of Citation. This is one of the following strings: “Book”, “PubMed”, “Journal”, “Online Reference”, or “Other”.
2      Yes       Name of the Citation. This is typically the journal reference or book name.
3      Yes       Reference. This is an identifier that can be used to link to the citation. For books this is usually the ISBN; for PubMed citations it is the PubMed ID; for other types it could be a URL pointing to the reference, such as a Wikipedia page.
4      No        Date of publication in ISO 8601 format (YYYY-MM-DD).
5      No        Authors. This is a “|” delimited list of authors for the reference.
6      No        Comments. This is optional information, such as an abstract, that can be stored along with the reference. Limit is 4000 characters.
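The field rules above can be checked with a short Python sketch. This is an illustration under stated assumptions, not the BEL Framework parser: parse_citation, CITATION_TYPES, and FIELD_NAMES are hypothetical names, and the sketch assumes the citation sits on a single line with each field enclosed in double quotes.

```python
import csv
import re
from datetime import date

CITATION_TYPES = {"Book", "PubMed", "Journal", "Online Reference", "Other"}
FIELD_NAMES = ["type", "name", "reference", "date", "authors", "comment"]


def parse_citation(line):
    """Parse a one-line 'SET Citation = {...}' annotation into a dict."""
    match = re.fullmatch(r"\s*SET\s+Citation\s*=\s*\{(.*)\}\s*", line)
    if match is None:
        raise ValueError("not a SET Citation line")
    # The csv module handles the quoted, comma-separated fields.
    fields = next(csv.reader([match.group(1)], skipinitialspace=True))
    if not 3 <= len(fields) <= 6:
        raise ValueError("a citation has 3 required and up to 6 fields")
    fields += [""] * (6 - len(fields))  # pad missing optional fields
    citation = dict(zip(FIELD_NAMES, fields))

    if citation["type"] not in CITATION_TYPES:
        raise ValueError(f"unknown citation type: {citation['type']!r}")
    if not citation["name"] or not citation["reference"]:
        raise ValueError("name and reference are required")
    if citation["date"]:
        date.fromisoformat(citation["date"])  # raises ValueError if malformed
    if len(citation["comment"]) > 4000:
        raise ValueError("comment exceeds 4000 characters")

    # Authors are a "|" delimited list.
    authors = citation["authors"]
    citation["authors"] = authors.split("|") if authors else []
    return citation


example = (
    'SET Citation = {"PubMed","Cell","16962653","2006-10-07",'
    '"Jacinto E|Facchinetti V|Liu D|Soto N|Wei S|Jung SY|Huang Q|Qin J|Su B",""}'
)
parsed = parse_citation(example)
print(parsed["type"], parsed["reference"], len(parsed["authors"]))
```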
BEL Resource Documents
• BEL Resource Documents are used in compilation Phase III for network augmentation
– BEL documents
– Relevant assertions are identified and added to the network
• Include:
– Gene Scaffolding
• g(EG:123) transcribedTo r(EG:123)
• r(EG:123) translatedTo p(EG:123)
– Protein Family membership
• p(PFH:"AKT Family") hasMembers list(p(HGNC:AKT1), p(HGNC:AKT2), p(HGNC:AKT3))
– Named Complex components
• complex(NCH:"9-1-1 Complex") hasComponents list(p(HGNC:HUS1), p(HGNC:RAD1), p(HGNC:RAD9A))
– Orthology (2.0.0 and future)
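The statement patterns in these resource documents are regular enough to generate. The Python sketch below only shows the shape of the statements; gene_scaffolding and has_members are hypothetical helpers, not the tooling OpenBEL uses to build the resource documents.

```python
def gene_scaffolding(namespace, value):
    """Return the two scaffolding statements linking a gene, its RNA, and its protein."""
    gene = f"g({namespace}:{value})"
    rna = f"r({namespace}:{value})"
    protein = f"p({namespace}:{value})"
    return [
        f"{gene} transcribedTo {rna}",
        f"{rna} translatedTo {protein}",
    ]


def has_members(family_term, member_terms):
    """Return a protein family membership statement."""
    return f"{family_term} hasMembers list({', '.join(member_terms)})"


for statement in gene_scaffolding("EG", "123"):
    print(statement)

print(has_members(
    'p(PFH:"AKT Family")',
    ["p(HGNC:AKT1)", "p(HGNC:AKT2)", "p(HGNC:AKT3)"],
))
```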
Creating Custom Namespaces
• Allows use of a vocabulary not specifically supported by
the BEL Framework
– Including equivalencing to other namespaces
• Detailed directions can be found here:
– http://openbel-framework.readthedocs.org/en/latest/tutorials/building_custom_namespaces.html
• Requires:
– Namespace file in .belns format
– URL for the .belns file
– Customized resource index
– Updated BEL Framework configuration file pointing to new resource index
• Optional:
– Equivalence file in .beleq format