IntAct Presentation

13.04.2015
Molecular Interactions – the IntAct Database
Sandra Orchard
EMBL-EBI
EBI is an Outstation of the European Molecular Biology Laboratory.
Why is it useful to study protein-protein interactions (PPIs) and networks?
• Proteins are the workhorses of the cell – and all activities are controlled through interactions with other molecules.
• To understand the biology of a single protein, you have to study its interacting partners.
• One way to predict protein function is through identification of binding partners – Guilt by Association. If the function of at least one of the components with which the protein interacts is known, that should help us infer its function(s) and pathway(s).
• Hence, through the intricate network of these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation.
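The Guilt by Association idea above can be sketched as a simple vote among annotated partners. This is only a toy illustration: the protein names, the annotations and the `predict_function` helper are hypothetical, and real function prediction uses far more evidence than a majority vote.

```python
from collections import Counter

# Hypothetical toy data: binary interactions and known annotations.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
known_function = {"P2": "DNA repair", "P3": "DNA repair", "P4": "transport"}


def predict_function(protein, interactions, known_function):
    """Return the most common annotation among the protein's direct partners."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    # Only partners with a known annotation get a vote.
    votes = Counter(known_function[p] for p in partners if p in known_function)
    if not votes:
        return None  # no annotated partner, so nothing to infer
    return votes.most_common(1)[0][0]


print(predict_function("P1", interactions, known_function))
```

Here the unannotated protein P1 inherits the annotation shared by most of its partners; a protein with no annotated partners yields `None`.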
Why are there so many issues with interaction data?
1. Wide variety of methods for demonstrating molecular interactions – all have their strengths and weaknesses
2. No single method accurately defines an interaction as being a true binary interaction observed under physiological conditions
Why do we need interaction databases?
• Issues with all interaction data – true picture can only be
built up by combining data derived using multiple
techniques, multiple laboratories
• Problematic for any bench researcher to do – issues with
data formats, molecular identifiers, sheer volume of data
• Molecular interaction databases publicly funded to collect
this data and annotate in a format most useful to
researchers
Interaction Databases
Deep Curation
IntAct – active curation, broad species coverage, all molecule types
MINT – active curation, broad species coverage, PPIs
DIP – active curation, broad species coverage, PPIs
MPACT - ? curation, limited species coverage, PPIs
MatrixDB – active curation, extracellular matrix molecules only
BIND – ceased curating 2006/7, broad species coverage, all
molecule types – information becoming dated
Shallow curation
BioGRID – active curation, limited number of model organisms
HPRD – active curation, human-centric, modelled interactions
MPIDB – active curation, microbial interactions
Engineering 1850
• Nuts and bolts fit perfectly together, but only if they originate from the same factory
• Standardisation proposal in 1864 by William Sellers
• It was not generally accepted until after WWII, though …
Proteomics 2003
• Proteomics data are perfectly compatible, but only if they are from the same lab / database / software
• “Publish and vanish” by data producers
• Collecting all publicly available data requires huge effort
• Urgent need for standardisation
What constitutes a PSI standard
• Documents that make up each individual standard
• Minimal reporting requirements => MIAPE document
• XML Data exchange format
• Domain-specific controlled vocabulary
MIMIx (the minimum information required for reporting a molecular interaction experiment)
PSI-MI XML format
• Community standard for Molecular Interactions
• XML schema and detailed controlled vocabularies
• Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• Version 1.0 published in February 2004
The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data.
Henning Hermjakob et al, Nature Biotechnology 2004, 22, 176-183.
• Version 2.5 published in October 2007
Broadening the Horizon – Level 2.5 of the HUPO-PSI Format for Molecular Interactions.
Samuel Kerrien et al, BMC Biology 2007, 5:44.
PSI-MI XML benefits
• Collecting and combining data from different sources has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape.
Home page: http://www.psidev.info/MI
Controlled vocabularies: www.ebi.ac.uk/ols
Additional benefits
• MITAB format – released 2007 by popular demand. Tab-delimited
organisation of data.
• PSICQUIC – query access that runs across all interaction databases
using PSI formats
• PSISCORE – common scoring mechanism in development
• Access to R Bioconductor statistics packages
• Growth industry in “composite” databases – do no new curation but
merge the output of resources producing data in PSI format.
• IMEx
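To make the tab-delimited MITAB layout mentioned above concrete, here is a minimal parsing sketch. It assumes the 15-column MITAB 2.5 layout. The example line is invented for illustration, with placeholder accessions, author, PubMed ID and score, and the short column names are my own labels rather than official ones. Splitting multi-valued fields on `|` is a simplification that ignores pipes inside quoted text.

```python
# Short labels for the 15 MITAB 2.5 columns (labels are hypothetical names).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]

# Invented example line; all identifiers below are placeholders.
EXAMPLE_LINE = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2010)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "intact-miscore:0.40",
])


def parse_mitab_line(line):
    """Map one MITAB 2.5 line to {column: list of values}; '-' means empty."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    record = {}
    for name, value in zip(MITAB25_COLUMNS, fields):
        record[name] = [] if value == "-" else value.split("|")
    return record


record = parse_mitab_line(EXAMPLE_LINE)
print(record["id_a"], record["detection_method"])
```

Because every PSI-compliant resource emits the same columns, a reader like this can be pointed at output from any of them, which is what makes the "composite" databases feasible.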
IMEx
• Consortium of molecular interaction databases dedicated
to producing high quality, annotated data, curated to the
same standards
• Data will be curated once at a single centre then
exchanged between partners
• Users need only go to a single site to obtain all data
IntAct goals & achievements
1. Publicly available repository of molecular
interactions (mainly PPIs) - ~300K binary
interactions taken from >5,300 publications (May
2012)
2. Data is standards-compliant and available via our
website, for download at our ftp site or via PSICQUIC
http://www.ebi.ac.uk/intact
ftp://ftp.ebi.ac.uk/pub/databases/intact
www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml
3. Provide open-access versions of the software to
allow installation of local IntAct nodes.
IntAct Curation
“Lifecycle of an Interaction”
[Figure: curation workflow diagram. Labels: publication (full text); manual curation and annotation by a curator; check by a super curator (accept / reject); nightly sanity checks with reports; CVs; public web site; FTP site; IMEx partners MatrixDB, MINT and DIP.]
UniProt Knowledge Base
http://www.ebi.uniprot.org/
Interactions can be mapped to the canonical sequence, to splice variants, or to post-processed chains.
Relationship with UniProtKB
[Figure: data flow between IntAct and UniProtKB. Labels: interaction curation; other IMEx databases; protein sequence; data filters in place; high confidence PPIs; other DBs; early 2012.]
Data model
• Support for detailed features, i.e. definition of the interacting interface
Interacting domains
Overlay of ranges on sequence:
How to deal with Complexes
• Some experimental protocols generate complex data, e.g. tandem affinity purification (TAP)
• One may want to convert these complexes into sets of binary interactions; 2 algorithms are available:
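The slide does not name the two algorithms in the surviving text. The sketch below assumes they are the two expansion models commonly used for this purpose, spoke and matrix. Spoke expansion pairs the bait with each co-purified prey, while matrix expansion pairs every participant with every other. Function names and protein labels are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with each prey: n preys give n binary interactions."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants):
    """Pair every participant with every other: n members give n*(n-1)/2 pairs."""
    return list(combinations(participants, 2))


# Hypothetical TAP result: bait A co-purifies with B, C and D.
print(spoke_expansion("A", ["B", "C", "D"]))
print(matrix_expansion(["A", "B", "C", "D"]))
```

Neither model recovers the true topology of the complex, so expanded pairs are inferred rather than observed and should be read with that in mind.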
Performing and visualising a Simple Search
Data, Standards and Tools
EBI Walkthrough
May 2009
EBI
http://www.ebi.ac.uk/intact
IntAct – Home Page
Performing a Simple Search
From search to networkView…
Visualizing - networkView
Extend and Visualise your Search
Visualizing - networkView
Cytoscape Web
• Cytoscape Web - web-based network visualization tool
• Modeled after Cytoscape – open-source, interactive,
customizable and easily integrated into web sites.
• Contains none of the plugin architecture functionality of
Cytoscape
Visualization: opening the network in Cytoscape…
Visualization: applying a better graph layout…
Visualization: highlighting network properties…
Cytoscape Plugins
Exploring a single interaction in more
depth
First search from the home page…
Interaction detail
[Screenshot labels: details of interaction; choice of UniProtKB or Dasty view; UniProt; taxonomy; PubMed/IMEx ID; interaction score; expansion method.]
Interaction Score
• All evidence for Protein A interacting with Protein B is clustered.
• Each piece of evidence is scored according to
a. Interaction detection method
b. Interaction type
c. Number of publications the interaction has been observed in
The score is normalised to a 0-1 scale
Low score – low confidence interaction
High score – high confidence interaction
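The clustering and scoring steps above can be illustrated with a deliberately simplified sketch. It is not the scoring function IntAct uses: the weights, the publication cap and the plain averaging are all assumptions chosen only to show how three components can combine into a value between 0 and 1.

```python
from collections import defaultdict

# Hypothetical weights, NOT IntAct's published values.
METHOD_WEIGHT = {"x-ray crystallography": 1.0, "two hybrid": 0.5}
TYPE_WEIGHT = {"direct interaction": 1.0, "physical association": 0.6}
DEFAULT_WEIGHT = 0.1  # assumed fallback for unlisted terms


def cluster(rows):
    """Group (a, b, method, type, publication) rows by unordered protein pair."""
    clusters = defaultdict(list)
    for a, b, method, itype, pub in rows:
        clusters[tuple(sorted((a, b)))].append((method, itype, pub))
    return clusters


def score_pair(evidence, max_publications=5):
    """Average three components, each in [0, 1], so the result is in [0, 1]."""
    method = max(METHOD_WEIGHT.get(m, DEFAULT_WEIGHT) for m, _, _ in evidence)
    itype = max(TYPE_WEIGHT.get(t, DEFAULT_WEIGHT) for _, t, _ in evidence)
    n_pubs = len({pub for _, _, pub in evidence})
    pubs = min(n_pubs, max_publications) / max_publications
    return (method + itype + pubs) / 3


rows = [
    ("A", "B", "two hybrid", "physical association", "pub1"),
    ("B", "A", "x-ray crystallography", "direct interaction", "pub2"),
    ("A", "C", "two hybrid", "physical association", "pub1"),
]
scores = {pair: score_pair(ev) for pair, ev in cluster(rows).items()}
print(scores)
```

In this toy data the A-B pair, seen in two publications with a stronger method and interaction type, scores higher than A-C, which has a single two-hybrid observation.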
Changing the tabular view
Search result for ‘RAD1’
Participant information
Viewing Interaction Details
[Screenshot labels: details of interaction; additional information.]
IntAct – Home Page-Quick Search
Advanced search
Filtering options
Add more filtering options
Ontology search
Searching with MIQL
• Using the Molecular Interaction Query Language (MIQL), one can also build complex queries
• List of terms one can query on:
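The list of terms did not survive in this transcript. As a hedged illustration of how a complex MIQL query might be assembled, the sketch below combines two field terms with AND and URL-encodes the result for a PSICQUIC-style REST call. The field names `identifier` and `species` are standard MIQL fields for MITAB 2.5. The base URL and the helper names are assumptions, so check the service documentation before relying on them. Nothing is fetched here; the code only builds the URL.

```python
from urllib.parse import quote

# Assumed REST endpoint for the IntAct PSICQUIC service; verify before use.
PSICQUIC_BASE = (
    "http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/"
    "webservices/current/search/query/"
)


def miql_term(field, value):
    """Build one field-restricted term, e.g. identifier:"O60671"."""
    return f'{field}:"{value}"'


def miql_and(*terms):
    """Combine terms so that all of them must match."""
    return " AND ".join(terms)


def build_query_url(query, base=PSICQUIC_BASE, max_results=50):
    """URL-encode the MIQL query and request tab-delimited (MITAB 2.5) output."""
    encoded = quote(query, safe="")
    return f"{base}{encoded}?format=tab25&firstResult=0&maxResults={max_results}"


query = miql_and(miql_term("identifier", "O60671"), miql_term("species", "9606"))
print(query)
print(build_query_url(query))
```

The same query string can be pasted into the search box on the web site; the REST form is useful when the results feed a script or another tool.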
Binary view of o60671_human
Browsing – Molecule View
Browsing – extending your search
http://www.ebi.ac.uk/training/online/
Interactions, Pathways and Networks
Network analysis
Koh GC, Porras P, Aranda B, Hermjakob H, Orchard SE. Analyzing protein-protein interaction networks. J Proteome Res 2012, 11, 2014-31. PMID: 22385417