Introduction to molecular networks
Sushmita Roy
BMI/CS 576
www.biostat.wisc.edu/bmi576
[email protected]
Nov 6th, 2014
Understanding a cell as a system
• Measure: identify the parts of a system
– Parts: different types of bio-molecules
• genes, proteins, metabolites
– High-throughput assays to measure these molecules
• Model: how these parts are put together
– Clustering
– Network inference and analysis
Omics data provide a comprehensive description
of nearly all components of the cell
Joyce & Palsson, Nature Reviews Molecular Cell Biology, 2006
Key concepts in networks
• What are molecular networks?
– Different types of networks
– Graph-theoretic representation
• Computational problems in network biology
– Network structure and parameter learning
– Analysis of network properties
– Inference on networks: for data integration, interpretation and
hypothesis generation
• Classes of methods for expression-based network inference
• Probabilistic graphical models for networks
– Bayesian networks and dependency networks
– Evaluation of inferred networks
A network
• Describes connectivity patterns between the parts of a system
– Vertex/Nodes: parts
– Edges/Interactions: connections
• Edges can have sign and/or weight
• Connectivity is represented as a graph
– Node and vertex are used interchangeably
– Edge and interaction are used interchangeably
[Figure: example graph on vertices A, B, C, D, E and F, with one vertex/node and one edge labeled]
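The graph-theoretic representation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the `Network` class, its method names, and the edge list over vertices A to F are all hypothetical, since the slide's figure does not survive in the transcript.

```python
class Network:
    """Adjacency-list graph whose edges may carry a sign and a weight."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = {}  # node -> {neighbor: (sign, weight)}

    def add_edge(self, u, v, sign=1, weight=1.0):
        # Store u -> v; make sure v exists as a node even with no outgoing edges.
        self.adj.setdefault(u, {})[v] = (sign, weight)
        self.adj.setdefault(v, {})
        if not self.directed:
            # Undirected edges are stored in both directions.
            self.adj[v][u] = (sign, weight)

    def neighbors(self, u):
        return sorted(self.adj.get(u, {}))

    def degree(self, u):
        # For a directed network this is the out-degree.
        return len(self.adj.get(u, {}))


# Hypothetical edge list (assumption: the real figure's edges are unknown).
g = Network(directed=False)
for u, v in [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E"), ("D", "F")]:
    g.add_edge(u, v)

print(g.neighbors("D"))
print(g.degree("A"))
```

Setting `directed=True` and passing `sign=-1` or a non-unit `weight` would cover the directed, signed and weighted cases that appear later for regulatory networks.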
Why are networks important?
• Genes do not function independently
• Identifying these networks is a central challenge in systems biology
• Motivating applications
– Protein networks for cancer prognosis
– Networks for interpretation of genetic mutants
– Networks for gene prioritization
Protein networks for predicting cancer prognosis
[Figure: protein subnetworks including genes such as ESR1, ERBB2, TP53 and BRCA1; legend: expression level, downregulated to upregulated]
• Diamonds are differentially expressed genes
• Circles are not differentially expressed but are important for outcome
Chuang & Ideker, Annual Reviews 2010
Different types of molecular networks
• Depends on what
– the vertices represent
– the edges represent
– whether edges are directed or undirected
• Molecular networks
– Vertices are bio-molecules
• Genes, proteins, metabolites
– Edges represent interaction between molecules
Transcriptional regulatory networks
Nodes: regulatory protein like a TF, or target gene
Edges: TF A regulates C
Directed, signed, weighted
[Figure: transcription factors (TF) A and B regulating gene C]
E. coli: 153 TFs and 1319 target genes
S. cerevisiae: 157 TFs and 4410 target genes
[Figure: representation of the S. cerevisiae transcriptional regulatory network. Green circles represent transcription factors, brown circles denote …]
Vargas and Santillan, 2008
Detecting protein-DNA interactions
• ChIP-chip and ChIP-seq binding profiles for transcription factors
• Determine the (approximate) locations in the genome where a protein binds
Peter Park, Nature Reviews Genetics, 2009
Protein-protein interaction networks
Vertices: proteins
Edges: Protein U physically interacts with protein X
[Figure: a protein complex of U, X, Y and Z and the corresponding graph over U, X, Y and Z]
Undirected
Yeast protein interaction network
Barabasi et al. 2004
Metabolic networks
Vertices: enzymes
Edges: Enzymes M and N share a metabolite
[Figure: enzymes (proteins) M, N and O linked through metabolites a, b, c and d, and the resulting graph over M, N and O]
Undirected, weighted
Figure from KEGG database
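As a rough sketch of how such an enzyme graph could be derived, the snippet below connects two enzymes whenever they share at least one metabolite. The enzyme-to-metabolite assignments are hypothetical (the figure's actual reactions are not recoverable from the transcript), and using the number of shared metabolites as the edge weight is an assumption, one of several plausible weightings.

```python
from itertools import combinations


def enzyme_graph(enzyme_to_metabolites):
    """Return {(enzyme1, enzyme2): weight} for enzymes sharing a metabolite.

    Weight = number of shared metabolites (an assumed weighting scheme).
    """
    edges = {}
    for m, n in combinations(sorted(enzyme_to_metabolites), 2):
        shared = enzyme_to_metabolites[m] & enzyme_to_metabolites[n]
        if shared:
            edges[(m, n)] = len(shared)
    return edges


# Hypothetical assignment of metabolites a-d to enzymes M, N and O.
toy = {
    "M": {"a", "b"},
    "N": {"b", "c"},
    "O": {"c", "d"},
}
edges = enzyme_graph(toy)
print(edges)
```

With this toy input, M and N are linked through metabolite b and N and O through metabolite c, while M and O share nothing and stay unconnected.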
Signaling networks
Vertices: Enzymes and other proteins
Edges: Enzyme P modifies protein Q
[Figure: signal relayed from receptors through proteins P, Q and A to a TF, and the corresponding graph over P, Q and A]
Directed
Sachs et al., 2005, Science
Genetic interaction networks
Genetic interaction: two genes interact if the phenotype of
the double mutant is significantly different
from that of each single mutant alone
Vertices: Genes
Edges: Genetic interaction between
query (Q) and gene G
[Figure: query gene Q connected to gene G]
Undirected
Dixon et al., 2009, Annu. Rev. Genet
Yeast genetic interaction network
Costanzo et al, 2011, Science
Summary of different types of molecular networks
• Physical networks
– Transcriptional regulatory networks: interactions between
regulatory proteins (transcription factors) and genes
– Protein-protein: interactions among proteins
– Signaling networks: interactions between proteins and small
molecules, and among proteins that relay signals from outside the
cell to the nucleus
• Functional networks
– Metabolic: describe reactions through which enzymes convert
substrates to products
– Genetic: describe interactions among genes that, when
genetically perturbed together, produce a significantly different
phenotype than when perturbed individually
Computational problems in networks
• Network reconstruction
– Infer the structure and parameters of networks
– We will examine this problem in the context of “expression-based
network inference”
• Network evaluation
– Properties of networks
• Network applications
– Interpretation of gene sets
– Using networks to infer function of a gene
Network reconstruction
• Given
– A set of attributes associated with network nodes
– Typically attributes are mRNA levels
• Do
– Infer what nodes interact with each other
• Algorithms for network reconstruction vary in how they define an
interaction
– Similarity
– Mutual information
– Predictive ability
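Two of the interaction definitions listed above, similarity and mutual information, can be illustrated on a toy expression matrix. This is a hedged sketch, not the method used in the course: the function names, the threshold of 0.8, the 4-bin histogram estimator for mutual information, and the simulated data are all assumptions chosen for brevity. Predictive ability, the third definition, is not covered here.

```python
import numpy as np


def pearson_scores(expr):
    """expr: genes x samples array. Returns |Pearson correlation| per gene pair."""
    return np.abs(np.corrcoef(expr))


def mutual_information(x, y, bins=4):
    """Plug-in mutual information estimate (in nats) from a 2D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def infer_edges(expr, names, threshold=0.8):
    """Connect gene pairs whose similarity score reaches the threshold."""
    scores = pearson_scores(expr)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if scores[i, j] >= threshold:
                edges.append((names[i], names[j]))
    return edges


# Simulated mRNA levels: g2 closely tracks g1, g3 is independent of both.
rng = np.random.default_rng(0)
g1 = rng.normal(size=50)
g2 = g1 + 0.1 * rng.normal(size=50)
g3 = rng.normal(size=50)
expr = np.vstack([g1, g2, g3])

edges = infer_edges(expr, ["g1", "g2", "g3"])
print(edges)
print(mutual_information(g1, g2), mutual_information(g1, g3))
```

Both scores are symmetric, so the inferred edges are undirected; neither says which gene regulates which.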
Computational methods to infer networks
• We will focus on transcriptional regulatory networks
• These networks are inferred from gene expression data
• Many methods exist for network inference
– We will focus on probabilistic graphical models
Modeling a regulatory network
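• Structure: who are the regulators?
– Hot1 regulates HSP12
– HSP12 is a target of Hot1
• Function: how do they determine expression levels?
– Boolean, linear, differential equations, probabilistic
[Figure: regulators Hot1 and Sko1 with target HSP12; in the model, inputs X1 and X2 determine output Y through a function ψ(X1,X2)]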
Mathematical representations of networks
Models differ in the function f that maps
input system state to output state
• Input: expression of neighbors (X1, X2)
• Output: expression of node (X3)
• Boolean networks: truth table
    X1  X2 | X3
     0   0 |  0
     0   1 |  1
     1   0 |  1
     1   1 |  1
• Differential equations: rate equations
• Probabilistic graphical models: probability distributions
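The Boolean-network case can be made concrete with the truth table on this slide, read here as the rows (X1, X2 -> X3): 00 -> 0, 01 -> 1, 10 -> 1, 11 -> 1. The sketch below encodes that table as the function f mapping the state of the neighbors to the state of the node; the names `TRUTH_TABLE` and `f_boolean` are illustrative, not from the lecture.

```python
from itertools import product

# Truth table from the slide: inputs (X1, X2) -> output X3.
TRUTH_TABLE = {
    (0, 0): 0,
    (0, 1): 1,
    (1, 0): 1,
    (1, 1): 1,
}


def f_boolean(x1, x2):
    """Map the expression state of the neighbors to the state of X3."""
    return TRUTH_TABLE[(x1, x2)]


# Enumerate every input state and the output it produces.
rows = [(x1, x2, f_boolean(x1, x2)) for x1, x2 in product((0, 1), repeat=2)]
for row in rows:
    print(row)
```

This particular table is a logical OR: X3 is on whenever at least one regulator is on. A differential-equation or probabilistic model would keep the same inputs and output but replace the lookup with a rate equation or a conditional probability distribution.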
Network evaluation
• How accurate is the network?
– In silico validation
– Agreement with known interactions
– Agreement with experiments
• Do topological properties of the network represent important
biological functions?
– Degree distributions
– Network motifs
– Highly connected nodes and relationship to lethality
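Two of these checks are simple to sketch: agreement with known interactions, measured as precision and recall, and the degree distribution of a network. The code below is a minimal illustration under stated assumptions: edges are undirected, listed once each, and contain no self-loops, and both edge lists are made up for the example.

```python
from collections import Counter


def normalize(edges):
    """Treat (u, v) and (v, u) as the same undirected edge."""
    return {frozenset(e) for e in edges}


def precision_recall(predicted, known):
    """Compare an inferred edge list against a set of known interactions."""
    pred, gold = normalize(predicted), normalize(known)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def degree_distribution(edges):
    """Return Counter {degree: number of nodes with that degree}."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Hypothetical inferred network and hypothetical set of known interactions.
predicted = [("A", "B"), ("B", "C"), ("C", "D")]
known = [("B", "A"), ("C", "D"), ("D", "E"), ("A", "E")]

precision, recall = precision_recall(predicted, known)
dist = degree_distribution(known)
print(precision, recall)
print(dist)
```

Here two of the three predicted edges are known (precision 2/3) and two of the four known edges are recovered (recall 1/2). Nodes at the high end of the degree distribution are the highly connected nodes whose relationship to lethality the slide mentions.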
Network applications
• Interpretation of gene sets (for example from clustering)
• Integration of two or more different types of datasets
• Graph clustering
• Classification/Inference on graphs
Plan for next lectures
• Overview of Expression-based network inference
– Classes of methods
– Strengths and weaknesses of different methods
• Representing networks as probabilistic graphical models
– Bayesian networks
– Module networks
– Dependency networks
• Evaluation of inferred networks