Identifying functional subnetworks
in large-scale datasets
Benno Schwikowski
Institut Pasteur – Systems Biology Group
http://systemsbiology.fr
The three levels of this talk
1. Discovery of pathways active in HepC infection
2. Cytoscape plug-ins
3. Cytoscape platform
Hepatitis C infection
• One person out of 30 is infected
• No vaccine exists
• In 20% of chronic infections, liver fibrosis
and cirrhosis
• Frequently requires liver transplants
Studying HepC infection mRNA changes
• 50% of transplant livers become re-infected with
Hepatitis C
• Study expression of 7000 genes in re-infected
livers after transplantation
– 1-24 months post-transplant
– Samples at 3-6 month intervals
• 28 biopsies from 11 patients
– Mixture of hepatocytes, hepatic stellate cells, Kupffer
cells, various types of blood cells
• Compare against pre-transplant reference pool
Result of mRNA expression analysis
• Most genes (5968 of 7000)
were significantly under- or overexpressed
in one or more experiments
• High patient-to-patient variation
Our approach
1. Construct seed network
among known molecular players
2. Expand seed network
to include differentially expressed genes
3. Identify putative pathways
by the Active Modules approach
Seed network
Types of interactions
Protein-protein
Protein-DNA
Phosphorylation
Activation
Repression
Covalent bond
Methylation
InteractionFetcher plug-in
Purpose
• Dynamically retrieves remote information for selected nodes
– From SQL database
– Requests data via XML-RPC protocol
Currently implemented types
• Protein/gene synonyms
• Orthologs
• Sequences (DNA, protein, DNA upstream)
– Gene, protein,
• Interactions/associations
Options
• Cross-species queries
• Ortholog information from Homologene
• Inferred interactions (interologs)
• Interactive links to Source Web pages
100% open-source (client and server)
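As a rough illustration of the XML-RPC request path described above, the sketch below builds the kind of payload a client might send for a set of selected nodes and decodes it again locally. The method name `getSynonyms` and the argument layout are hypothetical placeholders, not the plug-in's actual server API, and no network call is made.

```python
import xmlrpc.client

# Hypothetical method name and arguments: the real InteractionFetcher
# server API is not specified in these slides.
selected_nodes = ["YDR216W", "YIL056W"]
payload = xmlrpc.client.dumps(
    (selected_nodes, "Saccharomyces cerevisiae"),
    methodname="getSynonyms",
)

# Decode the request locally to show what a server would receive.
params, method = xmlrpc.client.loads(payload)
print(method)
print(params)
```

In a real client the same call would typically go through `xmlrpc.client.ServerProxy` pointed at the server URL, which is not given here.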
2. Expand seed network
Purpose
• Bring significantly up-/downregulated genes
“into the picture”
Approach
• Add interactions with differentially
expressed genes (“in silico pull-down”)
– Use BIND, HPRD databases
– Only human-curated interactions
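The "in silico pull-down" step can be sketched as a simple filter over a curated interaction list: keep every interaction that links a seed-network gene to a differentially expressed gene. The function name and the toy gene identifiers below are hypothetical; the actual expansion used BIND and HPRD through the InteractionFetcher plug-in.

```python
def expand_seed_network(seed_genes, curated_interactions, diff_expressed):
    """Return curated interactions linking a seed gene to a
    differentially expressed gene (in either orientation)."""
    seed = set(seed_genes)
    hits = set(diff_expressed)
    added = []
    for a, b in curated_interactions:
        if (a in seed and b in hits) or (b in seed and a in hits):
            added.append((a, b))
    return added

# Hypothetical toy data, for illustration only.
seed_genes = ["SEED1", "SEED2"]
curated_interactions = [
    ("SEED1", "GENE_A"),   # seed <-> differentially expressed: kept
    ("GENE_B", "SEED2"),   # same, reversed orientation: kept
    ("GENE_A", "GENE_C"),  # no seed gene involved: dropped
    ("SEED1", "GENE_D"),   # partner not differentially expressed: dropped
]
diff_expressed = ["GENE_A", "GENE_B", "GENE_C"]

new_edges = expand_seed_network(seed_genes, curated_interactions, diff_expressed)
print(new_edges)
```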
Network after
InteractionFetcher
expansion
Identifying putative pathways
Why clustering can be problematic
• Many clustering methods are not model-based
→ significance of clusters is unclear
• Any given cluster may not be supported by all
experiments – noise problem
• Clusters tend to contain unrelated genes with
vaguely similar profiles
The three levels of this talk
1. Discovery of pathways active in HepC infection
2. Cytoscape plug-ins
3. Cytoscape platform
How can the clustering issues
be addressed? The ActiveModules Plug-in
• Define “up-/downregulated” on the basis of
a well-defined statistical model
• Allow clusters that are supported by only a subset
of the input experiments
• Use additional evidence to focus on
“plausible” clusters → protein interactions
Interaction networks
Schwikowski, Uetz, Fields
Nature Biotechnology (2000)
Modular organization of interaction networks
A lot of interaction data is becoming
available
Databases on...
• Protein-protein interactions
• Protein-DNA interactions
• Genetic interactions
• Metabolic pathways
• Cell signaling pathways, similarity
relationships, literature-based relationships
Multi-criteria detection of modules
1. Interaction network between genes/proteins
2. Differential gene/protein abundances/activities (experiments): genes × conditions

Gene      gal1D  gal2D  gal3D  gal4D  gal5D  gal6D  gal7D  gal80D
COX6      0.034  0.052  0.152  0.111  0.198  0.097  0.171  0.019
NDT80     0.09   0      0.041  0.007  0.157  0.035  0.037  0.18
PRS1      0.167  0.063  0.23   0.233  0.003  0.234  0.25   0.19
UPF3      0.245  0.415  0.253  0.471  0.115  0.111  0.061  0.328
OPI1      0.174  0.045  0.046  0.015  0.098  0.001  0.029  0.079
YGR145W   0.387  0      0.036  0.577  0.151  0.255  0.101  0.498
YGL041C   0.285  0.232  0.126  0.086  0.096  0.002  0.21   0.673
CRM1      0.018  0.009  0.07   0.001  0.052  0.028  0.017  0.002
HIS3      0.432  0.568  0.339  0.71   0.188  0.07   0.619  0.742
CIT2      0.085  0.272  0.038  0.392  0.168  0.077  0.416  0.252
KHS1      0.159  0.168  0.149  0.139  0.293  0.023  0.043  0
YBR026C   0.276  0.072  0.324  0.189  0.014  0.142  0.243  0.156
YMR244W   0.078  0      0.077  0.239  0.077  0.254  0.126  0.148
YMR317W   0.181  0.324  0.065  0.086  0.288  0.122  0.233  0.363
YAR047C   0.234  0.121  0.019  0.109  0.107  0.05   0.156  0.146
DAL7      0.289  0.168  0.09   0.161  0.017  0.041  0.091  0.183
YDL177C   0.002  0.295  0.041  0.367  0.183  0.205  0.085  0.042
YLR338W   0.216  0.091  0.051  0.096  0.07   0.044  0.082  0.143
YGR073C   0.125  0.394  0.056  0.126  0.218  0.088  0.122  0.01
YGR146C   0.189  0.308  0.345  0.067  0.432  0.014  0.116  0.127
ORT1      0.025  0.068  0.108  0.322  0.195  0.058  0.174  0.136

gal11: all listed values are 0
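If the gene-by-condition values listed above are read as per-gene p-values (an assumption: the slide does not label them), a candidate module can be scored for one condition by converting each p-value to a z-score and aggregating over the module's genes. The aggregation z_A = Σ z_i / √k used here follows the formulation in Ideker et al. (2002); the function names are illustrative, and the calibration against randomly sampled gene sets described in that paper is omitted.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def p_to_z(p, eps=1e-12):
    """Convert a p-value to a z-score; p is clamped away from 0 and 1
    because the inverse CDF is undefined at the endpoints."""
    p = min(max(p, eps), 1.0 - eps)
    return STD_NORMAL.inv_cdf(1.0 - p)

def aggregate_z(p_values):
    """Aggregate z-score of a candidate module with k genes."""
    z = [p_to_z(p) for p in p_values]
    return sum(z) / sqrt(len(z))

# First three values listed under gal1D, grouped purely for illustration;
# these genes are not claimed to form a connected subnetwork.
module_p = [0.034, 0.09, 0.167]
print(round(aggregate_z(module_p), 3))
```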
Scoring a module candidate
Perturbations/conditions
Final score
Rank adjustment: binomial summation
$P_z = 1 - \Phi(z_{A(j)})$
$p_{A(j)} = \sum_{h=j}^{m} \binom{m}{h} P_z^{h} (1 - P_z)^{m-h}$
$r_{A(j)} = \Phi^{-1}(1 - p_{A(j)})$
m = total number of conditions
j = size of subset of conditions
Ideker, Ozier, Schwikowski, Siegel
(2002): Bioinformatics 18 (Suppl. 1), S233-S240
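The rank adjustment above can be sketched in a few lines of standard-library Python. The sketch assumes that the per-condition z-scores of one candidate subnetwork are sorted from highest to lowest, so that the j-th entry is the j-th best condition, and that the final score is the maximum adjusted score over j. Both points follow the cited paper rather than the slide itself, and the function names are illustrative.

```python
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal CDF and its inverse

def rank_adjusted_scores(z_scores, eps=1e-12):
    """Return r_A(j) for j = 1..m, given one z-score per condition."""
    m = len(z_scores)
    ordered = sorted(z_scores, reverse=True)  # j-th highest score first
    scores = []
    for j, z in enumerate(ordered, start=1):
        p_z = 1.0 - STD_NORMAL.cdf(z)
        # Probability that at least j of m conditions score above z by chance.
        p_aj = sum(
            comb(m, h) * p_z**h * (1.0 - p_z) ** (m - h)
            for h in range(j, m + 1)
        )
        p_aj = min(max(p_aj, eps), 1.0 - eps)  # keep inv_cdf in its domain
        scores.append(STD_NORMAL.inv_cdf(1.0 - p_aj))
    return scores

def module_score(z_scores):
    """Assumed final score: the best rank-adjusted score over all j."""
    return max(rank_adjusted_scores(z_scores))

# A module active in all three conditions outscores one active in only one.
print(module_score([3.0, 3.0, 3.0]) > module_score([3.0, -1.0, -1.0]))
```

With a single condition (m = 1) the adjustment is the identity, which makes a convenient sanity check.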
Pathways in Rosetta’s compendium
(300 conditions)
The three levels of this talk
1. Discovery of pathways active in HepC infection
2. Cytoscape plug-ins
3. Cytoscape platform
Active Modules plug-in applied
to HCV re-infection data
• Iterative application results in four
significant highly overlapping subnetworks
• Repeat analysis only retaining “late-active”
re-infection experiments
– Eliminates pathways activated by transplant
operation
– Cutoff: 8 months
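The "late-active" restriction amounts to filtering the experiments by time since transplantation before re-running the analysis. The sketch below assumes that "late" means at or after the 8-month cutoff (the slide does not say whether the cutoff is inclusive) and uses hypothetical biopsy records and field names.

```python
def late_active(biopsies, cutoff_months=8):
    """Keep only biopsies taken at or after the cutoff (assumed inclusive)."""
    return [b for b in biopsies if b["months_post_transplant"] >= cutoff_months]

# Hypothetical records, for illustration only.
biopsies = [
    {"patient": "P1", "months_post_transplant": 3},
    {"patient": "P1", "months_post_transplant": 9},
    {"patient": "P2", "months_post_transplant": 8},
    {"patient": "P2", "months_post_transplant": 1},
]

late = late_active(biopsies)
print([(b["patient"], b["months_post_transplant"]) for b in late])
```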
Network after
InteractionFetcher
expansion
Which observations can we make locally?
Bold: Differentially regulated subnetwork
Red/Green: Late-active subnetwork
Cytotalk plug-in
• Overrepresentation analysis of genes in Gene Ontology classes, using the Cytotalk plug-in and R
• Cytotalk enables interactive communication with
– C/C++ programs
– Java processes
– Python
– UNIX shell scripts
– R, R scripts
• Can be run on the same machine or any other Internet-connected machine
• Can function as Cytoscape plug-in
• 100% open-source
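For readers unfamiliar with overrepresentation analysis, the sketch below shows one common formulation: a one-sided hypergeometric test for whether a Gene Ontology class is enriched in a subnetwork. The talk ran its analysis in R through Cytotalk; this plain-Python version is a swapped-in illustration, and the choice of test and the parameter names are assumptions.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, drawn, observed):
    """P(X >= observed) when `drawn` genes are sampled without replacement
    from `population` genes, `annotated` of which belong to the GO class."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 7000 genes measured, 100 in the GO class,
# a 50-gene subnetwork containing 6 of them.
print(hypergeom_enrichment_p(7000, 100, 50, 6))
```

In practice the p-values for many GO classes would also need a multiple-testing correction.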
The three levels of this talk
1. Discovery of pathways active in HepC infection
2. Cytoscape plug-ins
3. Cytoscape platform
Some Network Visualization Tools
• Pajek - Slovenia
• Osprey - SLRI, Toronto
• VisANT - BU
• Biolayout - EBI
• GraphViz
• PowerPoint
• Others
• Cytoscape (only open-source biology)
Cytoscape
Cytoscape Basic Concepts
• Objects
visualized as nodes
• Relationships
visualized as edges
• Attributes (name,
sequence, source,...)
• Mapping
attributes → drawing
customizable through
visual mapper
Cytoscape file formats
Sample interaction file
YDR216W pd YIL056W
YDR216W pd YKR042W
YDR216W pd YGL096W
YDR216W pd YDR077W
[...]
Sample expression file
GENE   DESC  exp0.sig  exp1.sig  exp0.sig  exp1.sig
GENE0  G0    0.0       0.0       23.2      11.5
GENE1  G1    0.0       0.0       34.6      5.2
GENE2  G2    0.0       0.0       10.0      28.0
GENE3  G3    0.0       0.0       1.64      4.77
[...]
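The interaction format shown above is simple enough to parse by hand. The sketch below reads whitespace-separated lines of the form "source interaction-type target [target ...]" into edge tuples; the function name is illustrative, and lines naming only a single isolated node are skipped for brevity.

```python
def parse_sif(text):
    """Parse simple interaction format text into (source, type, target) edges."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or isolated node: no edge to record
        source, interaction, *targets = fields
        edges.extend((source, interaction, target) for target in targets)
    return edges

sample = """YDR216W pd YIL056W
YDR216W pd YKR042W
YDR216W pd YGL096W
YDR216W pd YDR077W"""

edges = parse_sif(sample)
print(len(edges), edges[0])
```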
Cytoscape
Display
• gene & protein
expression
• protein interactions
(physical and
non-physical)
• protein classifications
Analysis plug-in
modules
http://www.cytoscape.org/
Java: platform
independent + webstart
• 100% open-source
Visual
Styles
Display gene expression
as clear text
Visual
Styles
Map expression values
to node colors using a
continuous mapper
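A continuous mapper of this kind boils down to interpolating between two visual values as the attribute moves across its range. The sketch below maps an expression value linearly to a color between green and red and clamps values outside the range; the color choice, function name, and clamping behavior are assumptions for illustration, not Cytoscape's actual implementation.

```python
def continuous_color(value, low, high, low_rgb=(0, 255, 0), high_rgb=(255, 0, 0)):
    """Linearly interpolate an RGB color for `value` within [low, high]."""
    if high == low:
        raise ValueError("low and high must differ")
    t = (value - low) / (high - low)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the mapped range
    rgb = tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))
    return "#{:02x}{:02x}{:02x}".format(*rgb)

# Hypothetical expression values mapped over the range [-2, 2].
for gene, ratio in [("GENE0", -2.0), ("GENE1", 0.5), ("GENE2", 3.1)]:
    print(gene, continuous_color(ratio, -2.0, 2.0))
```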
Visual
Styles
Expression data mapped
to node colors
Multidimensional attributes
Cytoscape, pre-release plug-in
Data from Ideker et al., Science (2001)
Layout
• 16 algorithms available through plug-ins
• Zooming, hide/show, alignment
yFiles Circular
Cytoscape Core –
Differences to most other approaches
• Emphasis on data analysis & integration
• No built-in semantics
(added by plug-ins)
• Very simple concepts
• Human-readable input formats
• Extensibility
Cytoscape extensibility
• Core: 100% open source Java
– Plug-in API
– Plug-ins are independently licensed
• “Just need to do the biology”
• Template code samples
Plug-in
Biomodules plug-in
Prinz S, Avila-Campillo I, Aldridge C, Srinivasan A,
Dimitrov K, Siegel AF, and Galitski T
Genome Res. 2004 14: 380-390
Modules in Complex
Networks
Iliana Avila-Campillo, Tim
Galitski
Cytoscape Plugins
Discovering Regulatory
and Signaling
Circuits in Molecular
Interaction Networks
Trey Ideker, Owen
Ozier,
Benno Schwikowski,
Andrew Siegel
Data Integration in Juvenile Diabetes
Research
Marta Janer, Paul Shannon
Benno Schwikowski
A network motif sampler
David Reiss, Benno Schwikowski
Cytoscape Core Features
• Visualize and lay out networks
• Display network data using visual styles
• Easily organize multiple networks
• Bird’s eye view navigation of large networks
• Supports SIF and GML, molecular profiling
formats, node/edge attributes
• Functional annotation from GO + KEGG
• Metanode support (hierarchical groupings)
• Extensible through plugins (20 developed)
Baliga et al.
Genome Research
June 2004
Collaborators: HCV
Institute for Systems Biology, Seattle, WA
• David Reiss
• Iliana Avila-Campillo
• Vesteinn Thorsson
• Tim Galitski
Collaborators: Cytoscape
• UCSD
Trey Ideker
Chris Workman
• Memorial Sloan-Kettering
Cancer Center
Chris Sander
Gary Bader
Ethan Cerami
• Pasteur
Melissa Cline
Andrea Splendiani
Tero Aittokallio
Benno Schwikowski
• ISB
Leroy Hood
Rowan Christmas
• Agilent Technologies
• Unilever PLC
• Long-term funding from
NIH and participating
institutions
Shannon, P., et al. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks.
Genome Res 13, 2498-504.
Collaborators: Active Networks
• Trey Ideker
• Owen Ozier
• Andrew Siegel
• Richard Karp
Levels of Biological Information
DNA
mRNA
Protein
Pathways
Networks
Cells
Tissues
Organs
Individuals
Populations
Ecologies