David Amar
http://tau.ac.il/~davidama/bioinfo_tutorials
Network biology
Overview: systems biology
Represent molecular entities
Represent interactions
Two main data types
Pathways
Interaction networks
Biological interaction networks
Nodes: genes or other molecules
Edges: evidence for some interaction; can include weights and directions
Magtanong et al. 2011 Nature
Biological interaction networks
Nodes: genes/proteins or other molecules
Edges based on evidence for interaction
Gene co-expression
Protein-protein interaction
Genetic interaction
Breker and Schuldiner 2009
Voineagu et al. 2011 Nature
Cytoscape
Cytoscape is open-source software for integrating, visualizing, and analyzing networks.
This tutorial describes the Cytoscape 3 user interface.
Outline
Basics
Load and visualize data
Customize
Applications
Clustering
Enrichment analysis
GeneMANIA
ModMap
Gene expression analysis
Initial window
Main Network View, initially blank.
The toolbar contains command buttons; a button's name is shown when the mouse pointer hovers over it.
Control Panel: lists the available networks by name
Network Overview Pane
Table Panel: can be used to display node, edge, and network table data
Load data: import from databases
The initial window lets you search the large public databases
Load data: import from databases
Search example: by gene name
Choose databases
Import result
The imported networks, by name
Basic statistics
Look at a network
Main Network View
Network Overview Pane: move around!
Table Panel: displays node, edge, and network table data
Search for a gene
Information about the marked nodes
Load data: import all interactions
Import result
The new network
Load data: from files
We sometimes have our own data
From papers
A special search in a database
Our experiment (e.g., correlation between genes)
Common formats
SIF
A table
OWL: for pathways; a “complex” text format
But easy to get and very informative once loaded
Load from files
Contains an interaction network of 331 genes from Ideker et al. 2001 Science
Load data: from SIF files
Text, one interaction per line: name1<space or tab>interaction_type<space or tab>name2
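A minimal Python sketch of reading SIF text outside Cytoscape, for example to sanity-check a file before loading it. It assumes the usual SIF conventions: a line with a single name is a node without edges, several targets after one interaction type give one edge each, and when a line contains a tab, tabs are the only delimiter. The parse_sif helper and the gene names are hypothetical.

```python
def parse_sif(lines):
    """Parse SIF text lines into a node list and (source, type, target) edges.

    Hypothetical helper: assumes tabs are the only delimiter when a line
    contains one; otherwise any whitespace separates the fields.
    """
    nodes, seen, edges = [], set(), []

    def add_node(name):
        if name not in seen:  # keep first-seen order, no repeats
            seen.add(name)
            nodes.append(name)

    for line in lines:
        fields = line.split("\t") if "\t" in line else line.split()
        fields = [f.strip() for f in fields if f.strip()]
        if not fields:
            continue  # blank line
        if len(fields) == 2:
            raise ValueError(f"interaction type without a target: {line!r}")
        source = fields[0]
        add_node(source)
        # fields[1] is the interaction type; every later field is a target
        for target in fields[2:]:
            add_node(target)
            edges.append((source, fields[1], target))
    return nodes, edges


nodes, edges = parse_sif([
    "geneA pp geneB",
    "geneB\tpd\tgeneC",
    "geneD pp geneA geneC",  # one source, two targets
    "geneE",                 # a node without edges
])
print(len(nodes), len(edges))  # 5 4
```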
Load data: from a table
From Excel files or tab-delimited text tables
Set which columns hold the source nodes, the target nodes, and the interaction type
Load data: from a table
OPTIONAL: click the columns that you want to keep as “attributes”
Result
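The table import boils down to choosing a source column, a target column, an optional interaction-type column, and the extra columns to keep as attributes. A minimal sketch of that mapping for a tab-delimited file (an Excel sheet would first be saved as tab-delimited text); the column names, the read_edge_table helper, and the "interacts" fallback type are hypothetical.

```python
import csv
import io


def read_edge_table(handle, source_col, target_col, type_col=None, keep=()):
    """Read a tab-delimited table into edge records.

    source_col/target_col/type_col name the columns chosen in the import
    dialog; columns listed in keep become edge attributes.
    """
    edges = []
    for row in csv.DictReader(handle, delimiter="\t"):
        edges.append({
            "source": row[source_col],
            "target": row[target_col],
            # hypothetical default when no interaction-type column is chosen
            "interaction": row[type_col] if type_col else "interacts",
            "attributes": {col: row[col] for col in keep},
        })
    return edges


table = io.StringIO(
    "gene1\tgene2\tevidence\tscore\n"
    "geneA\tgeneB\tpp\t0.9\n"
    "geneB\tgeneC\tpd\t0.4\n"
)
for edge in read_edge_table(table, "gene1", "gene2", "evidence", keep=["score"]):
    print(edge["source"], edge["interaction"], edge["target"], edge["attributes"])
```

Note that csv returns every cell as a string, so a numeric attribute such as score stays "0.9" until it is converted.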
Load data: OWL
Good for looking at pathways
This example: data from the Reactome database
Load data: result
Directed edges: signaling
Zoom
Focus on a selected region (nodes in yellow)
Zoom: result
Move around
Get a sub-network
The sub-network was created below the original network
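Assuming the sub-network built from selected nodes is the induced sub-network (the selected nodes plus every edge whose two endpoints are both selected), its content can be sketched in a few lines. The helper name and the toy edges are hypothetical.

```python
def induced_subnetwork(edges, selected):
    """Return the selected nodes and every edge with both ends selected."""
    keep = set(selected)
    sub_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, sub_edges


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
nodes, sub_edges = induced_subnetwork(edges, ["A", "B", "C"])
print(sorted(nodes), sub_edges)
# ['A', 'B', 'C'] [('A', 'B'), ('B', 'C'), ('A', 'C')]
```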
Save the session
We imported six networks
Before we start modifying them, let's save the session
File -> Save
Sanity check: close Cytoscape and load the session!
Remarks
At this point we know how to load data from databases and files
We can perform simple navigation, zoom, and save
We saved several networks, each with its own visualization ‘rules’
A good habit that saves trouble: save a session for each visualization type
Multiple networks, but keep a consistent visualization
Modifying and saving a visualization
Cytoscape supports countless options
Layouts
Node size, color, label…
Edge width, line type…
We will show the main examples, which are enough to get started
To save the graph as an image:
Change the layout
Organic layout
Circular layout
• Places all of the nodes in a circular arrangement.
• Very quick
• Partitions the network into disconnected parts and lays out those parts independently.
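The circular layout description can be sketched directly: split the network into connected components, then put each component's nodes evenly on its own circle. This is an illustration of the idea only; the fixed radius, the gap between circles, and the left-to-right placement are assumptions, not Cytoscape's actual parameters.

```python
import math


def connected_components(nodes, edges):
    """Split the network into its disconnected parts (iterative DFS)."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def circular_layout(nodes, edges, radius=100.0, gap=50.0):
    """Lay out each component on its own circle; circles sit side by side."""
    positions, x_offset = {}, 0.0
    for component in connected_components(nodes, edges):
        r = radius if len(component) > 1 else 0.0
        cx = x_offset + r  # centre of this component's circle
        for i, node in enumerate(component):
            angle = 2 * math.pi * i / len(component)
            positions[node] = (cx + r * math.cos(angle), r * math.sin(angle))
        x_offset = cx + r + gap
    return positions


pos = circular_layout(["A", "B", "C", "D", "E"], [("A", "B"), ("B", "C"), ("D", "E")])
```

Each node is visited once and each edge twice, which is why this kind of layout is very quick even on large networks.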
Force-directed
Uses a physical simulation that models the nodes as physical objects and the edges as springs connecting them.
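A minimal spring-embedder sketch in the Fruchterman-Reingold style, to show what "nodes as physical objects, edges as springs" means in practice: all node pairs repel, edges pull their endpoints together, and the step size shrinks over time. The constants (k, cooling rate, iteration count) are illustrative assumptions; Cytoscape's force-directed layouts have their own parameters.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """Spring-embedder sketch: nodes repel (k^2/d), edges attract (d^2/k).

    nodes must be a list; the result is deterministic for a given seed.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    temperature = 0.1 * math.sqrt(len(nodes))  # max step size, cools over time
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Spring attraction along each edge
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.98
    return {n: tuple(p) for n, p in pos.items()}
```

For two linked nodes the forces balance when k^2/d = d^2/k, so the edge settles near length k. The all-pairs repulsion makes each iteration quadratic in the number of nodes, which is why such layouts are slower than the circular one.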
Change layout scale
Change the scale
Before: scale is 1
After: scale is 8
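Scaling a layout spreads the nodes apart without changing their size. A minimal sketch, assuming the scale factor multiplies each node's distance from the centre of the layout (the helper name is hypothetical):

```python
def scale_layout(positions, factor):
    """Multiply every node's distance from the layout centre by factor."""
    cx = sum(x for x, _ in positions.values()) / len(positions)
    cy = sum(y for _, y in positions.values()) / len(positions)
    return {n: (cx + factor * (x - cx), cy + factor * (y - cy))
            for n, (x, y) in positions.items()}


print(scale_layout({"A": (0.0, 0.0), "B": (2.0, 0.0)}, 8))
# {'A': (-7.0, 0.0), 'B': (9.0, 0.0)}
```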
Style
Open and modify
The IntAct network: node color
Node color
Each column represents some information that we have
Discrete: set a style value (e.g., a color) for each distinct value in the column
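A discrete mapping is essentially a lookup table from column values to style values, with a default for values that are not listed. A minimal sketch; the interactor_type column, its values, and the colours are hypothetical.

```python
def discrete_mapping(column, value_to_style, default="#C0C0C0"):
    """Map each node's column value to a visual value (here a colour).

    Values missing from value_to_style fall back to default.
    """
    return {node: value_to_style.get(value, default) for node, value in column.items()}


interactor_type = {"P1": "protein", "P2": "protein", "M1": "small molecule", "X1": "rna"}
colors = discrete_mapping(interactor_type, {"protein": "#1F77B4", "small molecule": "#FF7F0E"})
print(colors)
# {'P1': '#1F77B4', 'P2': '#1F77B4', 'M1': '#FF7F0E', 'X1': '#C0C0C0'}
```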
Apps
Cytoscape also has many tools called ‘apps’
Install by going to Apps -> App Manager
Apps support
Advanced analysis
Biological analysis
Integrating data
Import special data
I) Find and annotate dense areas
Use an app that “clusters” the network
Biological assumption
We look for protein communities
Many interactions within each community
Members probably share a function
Application: gene function prediction
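"Many interactions within" can be made concrete with edge density: the fraction of possible node pairs in a candidate cluster that are actually linked. Clustering apps use their own algorithms and scores; this sketch only illustrates the density idea, and the toy network is hypothetical.

```python
def cluster_density(cluster, edges):
    """Fraction of the possible pairs inside cluster that are linked by an edge."""
    members = set(cluster)
    n = len(members)
    if n < 2:
        return 0.0
    # count each linked pair once, ignoring direction, duplicates and self-loops
    inside = {frozenset((u, v)) for u, v in edges
              if u != v and u in members and v in members}
    return len(inside) / (n * (n - 1) / 2)


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F")]
print(cluster_density(["A", "B", "C", "D"], edges))  # 1.0
print(cluster_density(["C", "D", "E", "F"], edges))  # 0.5
```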
Step 1: remove duplicated edges
Sometimes nodes are linked by more than one edge
(multiple lines of evidence for the same interaction)
Remove the duplicates for clustering and simpler visualization
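Removing duplicated edges in an undirected network means treating A-B and B-A as the same pair and keeping only the first copy. A minimal sketch (the helper name is hypothetical); the optional flag also drops self-loops.

```python
def remove_duplicate_edges(edges, drop_self_loops=False):
    """Keep one copy of each undirected edge (A-B is the same as B-A)."""
    seen, unique = set(), []
    for u, v in edges:
        if drop_self_loops and u == v:
            continue
        key = frozenset((u, v))  # order-independent key for the pair
        if key in seen:
            continue
        seen.add(key)
        unique.append((u, v))
    return unique


edges = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "C"), ("C", "C")]
print(remove_duplicate_edges(edges))                        # [('A', 'B'), ('B', 'C'), ('C', 'C')]
print(remove_duplicate_edges(edges, drop_self_loops=True))  # [('A', 'B'), ('B', 'C')]
```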
Step 2: use ClusterViz
Step 3: look at the results
All clusters
Sorted by size
Select a cluster
Step 3: look at the results
Step 4: biological function?
We discovered a cluster: a set of highly connected proteins
What biological processes/functions are enriched in this cluster?
Discover significantly over-represented biological functions
Compared to random clusters (gene sets) of the same size
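A common way to score over-representation against random gene sets of the same size is the one-sided hypergeometric test, which is one of the tests BiNGO offers. A minimal sketch; the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (e.g. the genome)
    K: background genes annotated with the GO term
    n: genes in the cluster
    k: cluster genes annotated with the term
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


print(enrichment_p_value(N=20, K=5, n=4, k=4))  # 5/4845, about 0.00103
```

The p-value is the chance that a random set of n background genes contains at least k annotated genes, i.e. the "random clusters" comparison.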
Step 4: BiNGO
Select all nodes (Ctrl+A)
Step 4: BiNGO
Give the cluster a name (“Cluster 1”)
Select human
Step 4: Results
Summary table
GO graph
Only corrected p-values matter!!!
Mark in the network
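Why corrected p-values: many GO terms are tested at once, so some raw p-values will be small by chance. A minimal sketch of the Benjamini-Hochberg FDR correction, one of the corrections BiNGO offers; the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The running minimum keeps the adjusted values monotone in the raw p-values, so a term never gets a worse adjusted value than a term with a larger raw p-value.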
II) Analyze a gene set
We have a set of genes we want to interpret
From papers
From data analysis
We want to discover
Functional enrichments
How they interact with each other and with similar genes
Use GeneMANIA
Resources and installation
Installing GeneMANIA may take more than 30 minutes
Steps
1. Apps -> App Manager
2. Install GeneMANIA
3. Open GeneMANIA (Apps -> GeneMANIA)
Then:
1. Confirm the data download
2. A new window will open: select human for this tutorial
GeneMANIA
Our input: a set of genes from Hauser et al. 2005
(http://archneur.ama-assn.org/cgi/pmidlookup?view=long&pmid=15956162)
HSPA1B, HSPA1A, DNAJC6, DNAJB2, UBE1, PARK5,
SLC25A5, COX5B, COX6C, NDUFA3, ATP5I, HK1, COX4I1,
ATP1B1, COX6B, SLC25A3, NDUFS5, ATP5O, UQCRH,
ATP5C1, NDUFB8, ATP5G3, ATP5C1, VDAC3, COX4I1,
COX7B, NDUFA9, ATP1B1, ATP6V0A1, ATP6V0D1, ATP6V0C,
ATP6V1B2, SLC9A6, ATP61P1, ATP6V1D, ATP6V0B,
ATP6V1A1, ATP6V1E1, GDI1, STXBP1, SYT1, VAMP1
GeneMANIA: input window
Paste the gene names (or IDs) here, separated by spaces (no commas)
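The gene list above is comma-separated and repeats some symbols (ATP5C1, COX4I1, and ATP1B1 each appear twice), while the input box expects space-separated names. A small sketch that converts such a list; the helper name is hypothetical.

```python
def to_genemania_query(text):
    """Turn a comma/whitespace separated gene list into unique, space-separated symbols."""
    symbols = text.replace(",", " ").split()
    unique = list(dict.fromkeys(symbols))  # drops repeats, keeps first-seen order
    return " ".join(unique)


genes = "ATP5C1, NDUFB8, ATP5G3, ATP5C1, VDAC3, COX4I1"
print(to_genemania_query(genes))  # ATP5C1 NDUFB8 ATP5G3 VDAC3 COX4I1
```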
GeneMANIA: input window
The recognized genes and their full names
The supported network types
For each interaction type there is a list of networks that can be marked
GeneMANIA: input window
Use physical interactions, pathways, and co-expression for our example
Results
The output network. Grey nodes are new genes that were added to improve the connectivity
Information tables, for example the detected functions
Results
The layout was changed to Organic for better visualization
Mark a function: the relevant nodes are marked automatically
VS.
Highlight specific interactions
III) Analyze different interaction types…
Members of a protein complex VS. members of parallel pathways
“Positive” interactions: expected within families
“Negative” interactions: expected between families
Some networks contain both
Analysis of network pairs
Interaction types can differ: within (“positive”) vs. between (“negative”) functional units
Input: networks H, G with the same vertex set
Goal: summarize both networks in a module map
Node: a module, i.e., a gene set highly connected in H
Link: two modules highly interconnected in G
Between-pathway models
Kelley and Ideker 2005
Ulitsky et al. 2008
Kelley and Kingsford 2011
Leiserson et al. 2011
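A counting sketch of the module-map idea: given modules (found, say, as dense gene sets in H), link two modules when enough G edges run between them. This is not ModMap's algorithm, which scores solutions with its own model; the simple min_links threshold, the helper name, and the toy data are all hypothetical.

```python
from collections import Counter


def module_map(modules, g_edges, min_links=2):
    """Link two modules when at least min_links edges of G run between them."""
    module_of = {node: name for name, members in modules.items() for node in members}
    counts = Counter()
    for u, v in g_edges:
        mu, mv = module_of.get(u), module_of.get(v)
        if mu is None or mv is None or mu == mv:
            continue  # skip unassigned nodes and within-module G edges
        counts[tuple(sorted((mu, mv)))] += 1
    return {pair: c for pair, c in counts.items() if c >= min_links}


modules = {"M1": ["A", "B"], "M2": ["C", "D"], "M3": ["E"]}
g_edges = [("A", "C"), ("B", "D"), ("A", "E"), ("C", "E")]
print(module_map(modules, g_edges))  # {('M1', 'M2'): 2}
```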
Solution: ModMap
Cytoscape app: under construction
Currently: run the command-line tool and load its solution into Cytoscape
We will show how to load a solution
Load ModMap analysis
Our example: combined analysis of yeast PPI and GI data
Find GI among complexes
1. Load the network, including the interaction types
2. Load the association of nodes to modules
3. Color the results and set the layout
Load the network
Load the YeastData.xlsx file
Important: we have several interaction types
Load the network
Load the YeastData.xlsx file
The network is large; we tell Cytoscape to generate it
Load a clustering solution
Modmap_modules.txt file format (text file):
Node module_name
Import Table: a way to add external information about the nodes
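Before importing the module file as a node table, it can help to check it in code. A minimal sketch that reads "node module_name" lines into both directions of the assignment; whether the file starts with a header line is an assumption (hence the has_header flag), module names are assumed to contain no spaces, and the gene and module names are hypothetical.

```python
import io
from collections import defaultdict


def read_modules(handle, has_header=True):
    """Read 'node module_name' lines into node->module and module->nodes maps."""
    node_to_module = {}
    module_to_nodes = defaultdict(list)
    for i, line in enumerate(handle):
        fields = line.split()
        if (has_header and i == 0) or len(fields) < 2:
            continue  # header line, blank or incomplete line
        node, module = fields[0], fields[1]
        node_to_module[node] = module
        module_to_nodes[module].append(node)
    return node_to_module, dict(module_to_nodes)


text = io.StringIO("Node module_name\ngeneA M1\ngeneB M1\ngeneC M2\n")
node_to_module, module_to_nodes = read_modules(text)
print(module_to_nodes)  # {'M1': ['geneA', 'geneB'], 'M2': ['geneC']}
```

Nodes that do not appear in the file get no module, which is what shows up as unclustered nodes after the import.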
Load a clustering solution
Right-click and give it a name
Load a clustering solution
Lay out a clustering solution
Lay out a clustering solution: results
A circle for each cluster
Unclustered nodes
Remove unclustered nodes
Mark the selected nodes and create a sub-network
Remove self-loops and duplicated edges
Zoom in on a part of the solution
Not informative enough: we cannot see the edge types…
Change the visualization style
IV) Overlay gene expression data
Class/Home exercise (data in the exp_data directory)
Load human PPI
Load gene fold changes from a gene expression experiment
Set node color and size by the fold change
Play with the layout
For example, group attribute layout
Run BiNGO on a selected sub-network
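Setting node color and size by fold change is a continuous mapping: each value is interpolated between end points. A minimal sketch, assuming the column holds log2 fold changes, a blue-white-red palette, and clipping at plus or minus 2; the palette, limits, sizes, and gene names are all hypothetical choices for illustration.

```python
def fold_change_style(fold_changes, max_abs=2.0, min_size=20.0, max_size=60.0):
    """Map log2 fold change to a blue-white-red colour and a node size.

    Values beyond +/-max_abs are clipped; size grows with |fold change|.
    """
    styles = {}
    for node, fc in fold_changes.items():
        t = max(-1.0, min(1.0, fc / max_abs))  # clip to [-1, 1]
        if t >= 0:  # white -> red for up-regulation
            rgb = (255, round(255 * (1 - t)), round(255 * (1 - t)))
        else:       # white -> blue for down-regulation
            rgb = (round(255 * (1 + t)), round(255 * (1 + t)), 255)
        size = min_size + abs(t) * (max_size - min_size)
        styles[node] = {"color": "#%02X%02X%02X" % rgb, "size": size}
    return styles


print(fold_change_style({"g1": 2.0, "g2": 0.0, "g3": -1.0}))
```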