Exploring PPI networks using Cytoscape
EMBO Practical Course Session 8
Nadezhda Doncheva and Piet Molenaar
Course Outline
Lectures & Labs
Protein focus
Graph context
Demo & do-it-yourself use cases
Data from recent literature
Tips & Tricks
Biological questions
I have a protein
I have a list of proteins
Shared features, connections
I have data
Function, characteristics from known interactions
Derive causal networks
Network
Topology
Hubs
Clusters
New hypotheses
4/13/2015
Instructor Introductions
Nadezhda Doncheva
Max Planck Institute for Informatics, Saarbrücken, Germany
http://www.mpiinf.mpg.de/departments/d3
Graph analysis using Cytoscape
Developed a Cytoscape core plugin
Piet Molenaar
AMC Oncogenomics, Amsterdam, The Netherlands
http://humangenetics-amc.nl/
Network visualization and analysis using Cytoscape
Developing Cytoscape plugins in Java
Member of the Cytoscape development team
Aidan Budd
Computational Biologist, Gibson Team, EMBL Heidelberg
http://www.embl.de/~budd/
Course coordinator/organizer
Schedule
Timeslot       Course item
09:00-10:30    1. Introduction
                  • Networks and graph theory
                  • Cytoscape workflow
               2. Tutorial session 1
                  • Focus: network generation
10:30-11:00    Coffee break
11:00-12:30    3. Tutorial session 2
                  • Focus: network annotation and visualization
12:30-14:00    Lunch
14:00-15:30    4. Tutorial session 3
                  • Focus: network analysis
15:30-16:00    Tea break
17:30-18:30    Afternoon session: additional networking ;-)
Overview Introduction
Part I: Introduction to molecular networks and graph concepts
What are molecular networks?
Why are they useful?
What tools are available?
Part II: Introduction to Cytoscape
Network visualization
Plugins/Apps
Workflows
Why networks?
Complex systems are better described as networks of interacting components
The topology of a network characterizes the underlying complex system (global topology parameters) and its individual components (local topology parameters)
Network topology parameters are easily compared
Useful for discovering patterns in large data sets (better than tables in Excel)
Allow the integration of multiple data types
Biological networks
Nodes can represent proteins, genes, metabolites, etc.
Edges can be physical or functional interactions, such as:
Protein-Protein interactions
Protein-DNA interactions
Metabolic interactions
Co-expression relations
Genetic interactions
…
It is important to understand what the nodes and edges mean
Applications of network biology
"What do you want to do with your network?"
Gene function prediction based on connections to sets of genes/proteins involved in the same biological process
Detection of protein complexes by analyzing modularity and higher-order organization (motifs, feedback loops)
Identification of subnetworks that are transcriptionally active in a disease
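Gene function prediction from network connections is often introduced as "guilt by association". A minimal sketch, assuming the network is a plain adjacency dictionary and using a hypothetical annotation table: every neighbor of the query protein votes for its own annotations, and the most frequent terms become candidate functions. More elaborate methods may weight votes or look beyond direct neighbors; this version only counts.

```python
from collections import Counter

def predict_function(adjacency, annotations, protein):
    """Rank candidate annotations for `protein` by how many of its direct
    neighbors carry each annotation (simple neighbor majority vote)."""
    votes = Counter()
    for neighbor in adjacency.get(protein, set()):
        for term in annotations.get(neighbor, set()):
            votes[term] += 1
    return votes.most_common()

# Hypothetical toy network and annotations (P1 is unannotated)
adjacency = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1"},
}
annotations = {
    "P2": {"DNA repair"},
    "P3": {"DNA repair", "cell cycle"},
    "P4": {"transport"},
}
print(predict_function(adjacency, annotations, "P1"))
```

Here "DNA repair" ranks first for P1 because two of its three neighbors carry it.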
Network visualization
Network layouts
Force-directed: nodes repel and edges pull
Hierarchical: for tree-like networks
Manually adjust layout
Visually interpret a network
Global relationships
Dense clusters
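The force-directed idea ("nodes repel and edges pull") can be sketched in a few lines. This is a minimal Fruchterman-Reingold-style loop, not Cytoscape's own layout code: all node pairs repel with force k²/d, connected nodes attract with force d²/k, and a cooling "temperature" limits how far a node moves per iteration. The ideal edge length `k`, the starting temperature and the cooling factor are illustrative assumptions, and there is no bounding box.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=200, seed=1):
    """Spring-electrical layout sketch; returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / len(nodes))  # assumed ideal edge length (unit area)
    temperature = 0.1                # assumed maximum step at the start
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: force k^2 / d
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along each edge: force d^2 / k
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node along its net displacement, capped by the temperature
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temperature)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step
        temperature *= 0.98  # cool down so the layout settles
    return {n: tuple(p) for n, p in pos.items()}

# Two triangles joined by a single edge (hypothetical toy network)
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
print(force_directed_layout(nodes, edges))
```

Densely connected groups such as the two triangles tend to end up as visually separate clumps, which is what makes dense clusters easy to spot by eye.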
Visual features
Node and edge attributes represent, e.g., gene or interaction attributes
Map attributes to node and edge visual properties like color, shape or size
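Mapping an attribute to a visual property is essentially a function from attribute values to style values. The sketch below shows the two basic kinds, in the spirit of Cytoscape's continuous and discrete mappings but not using its API: a numeric attribute (a hypothetical `log2_fold_change`) is interpolated linearly between two colors, and a categorical attribute (a hypothetical `type`) is looked up in a table of shapes. The color range, category names and shape strings are assumptions chosen for illustration.

```python
def continuous_color(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Linearly interpolate an RGB hex color between `low` and `high`;
    values outside [vmin, vmax] are clamped to the end colors."""
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))
    r, g, b = (round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return f"#{r:02X}{g:02X}{b:02X}"

# Discrete mapping: one visual value per category (hypothetical categories)
SHAPE_BY_TYPE = {"kinase": "DIAMOND", "transcription factor": "TRIANGLE"}

def node_style(attrs):
    """Derive visual properties for one node from its attribute values."""
    return {
        "fill_color": continuous_color(attrs["log2_fold_change"], -2.0, 2.0),
        "shape": SHAPE_BY_TYPE.get(attrs["type"], "ELLIPSE"),  # default shape
    }

print(node_style({"log2_fold_change": 2.0, "type": "kinase"}))
# → {'fill_color': '#FF0000', 'shape': 'DIAMOND'}
```

Down-regulated nodes come out blue, up-regulated nodes red, and unchanged nodes purple.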
Common network analysis tasks
Network topology statistics, such as node degree, betweenness, node degree distribution, clustering coefficient, shortest paths between nodes and the robustness of the network to the random removal of single nodes.
Modularity refers to the identification of sub-networks of interconnected nodes that may represent physically or functionally linked molecules working together to achieve a specific function.
Motif analysis is the identification of small network patterns that are overrepresented compared with a randomized version of the same network. Discrete biological processes, such as those involving regulatory elements, are often composed of such motifs.
Network alignment and comparison tools can identify similarities between networks and have been used to study evolutionary relationships between the protein networks of different organisms.
Networks as graphs
Formal graph definition: a graph G is a pair of two sets, V (nodes) and E (edges): G = (V, E)
Two nodes n1 and n2 connected by an edge are neighbors
The neighborhood of node n is the set of all its neighbors
Connectivity $k_n$ is the size of the neighborhood of n
Degree $k$ is the number of edges incident on n
Note that $k \neq k_n$ is possible, e.g. when two nodes are joined by more than one edge!
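The difference between degree and connectivity shows up as soon as a network contains parallel edges (for example, the same protein pair reported by two experiments). A minimal sketch, assuming an undirected edge list without self-loops: degree counts incident edges, connectivity counts distinct neighbors.

```python
from collections import defaultdict

def degree_and_connectivity(edges):
    """Return {node: (degree k, connectivity k_n)} for an undirected edge
    list that may contain parallel edges (self-loops are not handled)."""
    degree = defaultdict(int)
    neighbors = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {n: (degree[n], len(neighbors[n])) for n in degree}

# A and B are linked by two parallel edges (hypothetical toy network)
edges = [("A", "B"), ("A", "B"), ("A", "C")]
print(degree_and_connectivity(edges))
# → {'A': (3, 2), 'B': (2, 1), 'C': (1, 1)}
```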
Node degree and shortest path
A hub is a node with an exceptionally high degree, larger than the average node degree (see red nodes).
A shortest path between the nodes n and m is a path between n and m of minimal length.
The shortest path length, or distance, between n and m is the length of a shortest path between n and m.
The characteristic path length is the average shortest path length, i.e. the expected distance between two connected nodes.
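In an unweighted network, shortest path lengths can be computed with breadth-first search, and the characteristic path length is then the mean over all connected pairs. The sketch below also flags hubs; since the slide does not give an exact cut-off, the rule "degree at least twice the average" (`factor=2.0`) is a hypothetical choice for illustration.

```python
from collections import deque
from statistics import mean

def bfs_distances(adjacency, source):
    """Unweighted shortest path lengths from `source` (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def characteristic_path_length(adjacency):
    """Average shortest path length over all ordered pairs of distinct,
    connected nodes; unreachable pairs are simply skipped."""
    lengths = [d for s in adjacency
               for t, d in bfs_distances(adjacency, s).items() if t != s]
    return mean(lengths)

def hubs(adjacency, factor=2.0):
    """Hypothetical hub rule: degree >= factor * average degree."""
    avg = mean(len(nbs) for nbs in adjacency.values())
    return {n for n, nbs in adjacency.items() if len(nbs) >= factor * avg}

# Hypothetical toy network: H is linked to everyone, plus one edge A-B
adjacency = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H"},
}
print(hubs(adjacency))                        # → {'H'}
print(characteristic_path_length(adjacency))  # → 1.5
```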
Small-world networks
A network is a small-world network if any two arbitrary nodes are connected by a small number of intermediate edges, i.e. the network has an average shortest path length much smaller than the number of nodes in the network (Watts, Nature, 1998).
Interaction networks have been shown to be small-world networks (Barabási, Nature Reviews Genetics, 2004)
Scale-free networks
The node degree distribution counts the number of nodes with degree k, for k = 0, 1, 2, …
If the node degree distribution of a network approximates a power law $P(k) \sim a k^{-b}$ with $b < 3$, the network is scale-free (Barabási, Science, 1999).
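The degree distribution is a normalized histogram of node degrees. Taking logarithms of $P(k) \sim a k^{-b}$ gives $\log P(k) \approx \log a - b \log k$, so a power law appears as a straight line with slope $-b$ on a log-log plot. The sketch below estimates that slope by ordinary least squares; this is only a crude check, and maximum-likelihood fitting is generally preferred for real, noisy data.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """P(k): fraction of nodes with degree k, sorted by k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}

def loglog_slope(distribution):
    """Least-squares slope of log P(k) versus log k, using only k > 0.
    For P(k) ~ a k^-b the slope approximates -b (crude estimate)."""
    pts = [(math.log(k), math.log(p))
           for k, p in distribution.items() if k > 0 and p > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

print(degree_distribution([1, 1, 2, 3]))  # → {1: 0.5, 2: 0.25, 3: 0.25}
```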
Many biological networks are scale-free.
Scale-free vs. random networks
Random networks are homogeneous (most nodes have approximately the same number of links)
not robust to arbitrary node failure
Scale-free networks have a number of highly connected nodes
robust to random failure, but very sensitive to hub failures
Implications for the robustness of PPI networks (Jeong, Nature, 2001)
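Robustness is usually assessed by removing nodes and tracking the size of the largest connected component. A minimal sketch comparing random removal with targeted (highest-degree-first) removal on a star-shaped toy network, which exaggerates the hub effect: removing the single hub shatters the network, whereas removing a leaf barely matters. The helper names and the toy network are hypothetical.

```python
import random

def largest_component_size(adjacency, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen, best = set(removed), 0
    for start in adjacency:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

def attack(adjacency, n_remove, targeted, seed=0):
    """Remove n_remove nodes, either highest-degree first or at random."""
    nodes = list(adjacency)
    if targeted:
        nodes.sort(key=lambda n: len(adjacency[n]), reverse=True)
    else:
        random.Random(seed).shuffle(nodes)
    return largest_component_size(adjacency, set(nodes[:n_remove]))

# Star network: one hub linked to ten leaves (hypothetical toy example)
star = {"hub": {f"L{i}" for i in range(10)},
        **{f"L{i}": {"hub"} for i in range(10)}}
print(attack(star, 1, targeted=True))   # → 1 (hub removed)
print(attack(star, 1, targeted=False))  # random node removed
```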
Clustering coefficient
The clustering coefficient of a node n is the ratio N/M, where N is the number of edges between the neighbors of node n, and M is the maximum number of edges that could possibly exist between the neighbors of n (for an undirected network where n has k neighbors, $M = k(k-1)/2$).
The network clustering coefficient is the average of the clustering coefficients for all nodes in the network.
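The definition translates directly into code for an undirected network. Nodes with fewer than two neighbors have no possible neighbor pairs, so their coefficient is undefined; this sketch reports 0 for them and includes them in the network average, which is one convention among several (some tools average only over nodes with k ≥ 2).

```python
from itertools import combinations

def clustering_coefficient(adjacency, n):
    """C(n) = N / M: N = edges among n's neighbors, M = k(k-1)/2."""
    nbs = adjacency[n]
    k = len(nbs)
    if k < 2:
        return 0.0  # assumed convention for undefined cases
    N = sum(1 for a, b in combinations(nbs, 2) if b in adjacency[a])
    M = k * (k - 1) / 2
    return N / M

def network_clustering_coefficient(adjacency):
    """Average of the node clustering coefficients over all nodes."""
    return sum(clustering_coefficient(adjacency, n)
               for n in adjacency) / len(adjacency)

# Triangle A-B-C plus a pendant node D attached to A (toy example)
adjacency = {"A": {"B", "C", "D"}, "B": {"A", "C"},
             "C": {"A", "B"}, "D": {"A"}}
print(network_clustering_coefficient(adjacency))  # (1/3 + 1 + 1 + 0) / 4
```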
Network clustering
Find subsets of nodes (modules or clusters) that satisfy some pre-defined quality measure
Benefits
Finding “natural” clusters
Classifying the data
Detecting outliers
Reducing the data
Downsides
Real data very rarely presents a unique clustering
Many different models exist; try out more than one
Several alternative solutions could exist
Interpretation of clusters can be difficult
Motifs
A small connected graph with a given number of nodes
Motif frequency is the number of different matches of a motif
Functionally relevant motifs in biological networks:
Feed-forward loop (1)
Bifan motif (2)
Single-input motif (3)
Multi-input motif (4)
Significance profiles of motifs
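Motif significance is judged against randomized networks, as noted under motif analysis. A minimal sketch for the feed-forward loop (X→Y, Y→Z, X→Z) in a directed network: count occurrences in the real network, then in networks randomized by degree-preserving edge swaps, and report a z-score. The edge-swap null model is one common choice rather than the only one; the number of randomizations and swaps per edge are illustrative assumptions, and real analyses usually also compare against counts of smaller subgraphs.

```python
import random
from statistics import mean, pstdev

def count_ffl(edges):
    """Count feed-forward loops X->Y, Y->Z, X->Z with X, Y, Z distinct."""
    edge_set = set(edges)
    successors = {}
    for u, v in edge_set:
        successors.setdefault(u, set()).add(v)
    count = 0
    for x, y in edge_set:
        if x == y:
            continue
        for z in successors.get(y, ()):
            if z not in (x, y) and (x, z) in edge_set:
                count += 1
    return count

def randomize(edges, n_swaps, rng):
    """Degree-preserving randomization: swap the targets of two random
    edges, rejecting swaps that create self-loops or duplicate edges."""
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def ffl_z_score(edges, n_random=100, seed=0):
    """Return (real count, mean random count, z-score)."""
    rng = random.Random(seed)
    real = count_ffl(edges)
    random_counts = [count_ffl(randomize(edges, 10 * len(edges), rng))
                     for _ in range(n_random)]
    mu, sd = mean(random_counts), pstdev(random_counts)
    z = (real - mu) / sd if sd > 0 else float("nan")
    return real, mu, z

# Hypothetical directed toy network containing one FFL (A->B->C, A->C)
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"), ("D", "E"), ("B", "E")]
print(ffl_z_score(edges))
```

Because every swap keeps each node's in- and out-degree, any excess of FFLs in the real network cannot be explained by degree alone.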
Network organization
The levels of organization of complex networks:
Node degree provides information about single nodes
Three or more nodes represent a motif
Larger groups of nodes are called modules or communities
Hierarchy describes how the various structural elements are combined
Available software tools
Cytoscape http://cytoscape.org/
BioLayout Express3D http://www.biolayout.org/
VisANT http://visant.bu.edu/
Ondex http://www.ondex.org/
Pajek http://pajek.imfm.si/
Ingenuity Pathway Analysis http://www.ingenuity.com/products/pathways_analysis.html
Pathway Studio http://www.ariadnegenomics.com/products/pathway-studio/
Why Cytoscape?
www.cytoscape.org
Visualization, Integration & Analysis
Free & open source software application (LGPL license)
Written in Java: can run on Windows, Mac, & Linux
Developed by a consortium (UCSD, ISB, Agilent, MSKCC, Pasteur, UCSF, Unilever, UToronto) that provides a permanent, dedicated team of developers
Active community: mailing lists, annual conferences
Tens of thousands of users, 3,000 downloads/month
Extensible through plugins developed by third parties
It is used! Lots of citations
Network analysis using Cytoscape
Cytoscape extended functionality
Cytoscape extends its functionality with plugins or apps
Developed by third parties
Listed at http://apps.cytoscape.org/
Usually available through the Plugin Manager
Can be downloaded from the plugins' websites
Cover many diverse areas of application
A typical Cytoscape workflow
1. Load networks
2. Load attributes
3. Analyze and visualize networks
4. Prepare for publication
Cline et al., "Integration of biological networks and gene expression data using Cytoscape", Nature Protocols, 2, 2366-2382 (2007).
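For steps 1 and 2 of the workflow, a network and its attributes are often prepared as plain text files. A minimal sketch, assuming the Simple Interaction Format (SIF: source, interaction type, target per line, here tab-delimited) for the network and a tab-delimited table with a header row for node attributes. The labels "pp" (protein-protein) and "pd" (protein-DNA) are commonly used SIF interaction types; the gene names, the `log2FC` column and the choice of "name" as the key column are hypothetical, and the exact import options depend on the Cytoscape version.

```python
import csv
import io

def to_sif(interactions):
    """Serialize (source, interaction_type, target) triples as SIF lines."""
    return "".join(f"{s}\t{t}\t{d}\n" for s, t, d in interactions)

def to_node_table(attributes, columns):
    """Tab-delimited node attribute table; the first column holds the node
    identifiers used to match rows to network nodes on import."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", *columns])
    for node, attrs in attributes.items():
        writer.writerow([node, *(attrs.get(c, "") for c in columns)])
    return buf.getvalue()

# Hypothetical interactions and expression values
interactions = [("GENE_A", "pp", "GENE_B"), ("GENE_A", "pd", "GENE_C")]
attributes = {"GENE_A": {"log2FC": 1.5}, "GENE_B": {"log2FC": -0.7}}
print(to_sif(interactions))
print(to_node_table(attributes, ["log2FC"]))
```

The two strings can be written to, e.g., `network.sif` and `nodes.txt` and loaded one after the other.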
Some useful Cytoscape links
Download: http://www.cytoscape.org/download.html
Tutorials: http://opentutorials.cgl.ucsf.edu/index.php/Portal:Cytoscape
Cytoscape Mailing lists: http://www.cytoscape.org/community.html
Plugins/Apps: http://apps.cytoscape.org/
Documentation: http://www.cytoscape.org/documentation_users.html
On to the first tutorial session
Any questions?