Computational Biology
Lecture #11: Inferring Regulatory
Networks from Gene Expression Data

Bud Mishra
Professor of Computer Science and Mathematics
12 ¦ 3 ¦ 2001
11/6/2015
©Bud Mishra, 2001
Regulatory Networks
• All cells in an organism have the same genomic data,
but the proteins synthesized in each vary according to
cell type, time and environmental factors
• There is a network of interactions among various
biochemical entities in a cell (DNA, RNA, proteins,
small molecules)
• Can we infer the networks of interactions among
genes?
Gene Regulation
[Figure: gene regulation pathway. Labels: DNA; transcription; mRNA; transport to cytosol; nonphosphorylated protein; post-translational modifications; transport to nucleus.]
Regulatory Networks
• There are lots of regulatory interactions that
occur after transcription.
• But we will focus on transcriptional regulation:
– It plays a major role in the regulation of protein
synthesis
– We can measure mRNA levels relatively easily
Transcriptional Regulation:
Example: The lac Operon
[Figure: the lac operon. Legend: regions coding for proteins; regulatory regions; diffusible regulatory proteins. Labeled elements: RNA polymerase; lacI; P, P and O; lacZ, lacY and lacA; mRNA + ribosomes; proteins I, Z, Y and A.]
Transcriptional Regulation:
Example: The lac Operon
[Figure: the lac operon with lactose absent. Legend: regions coding for proteins; regulatory regions; diffusible regulatory proteins. Labeled elements: lacI, mRNA + ribosomes, protein I; P, P and O; lacZ, lacY and lacA with no mRNA; RNA polymerase, which binds but cannot move to transcribe.]
When lactose is absent, the protein encoded
by lacI represses transcription of the lac
operon
Transcriptional Regulation:
Example: The lac Operon
[Figure: the lac operon with lactose present. Legend: regions coding for proteins; regulatory regions; diffusible regulatory proteins. Labeled elements: lactose; conformational change; blocked; RNA polymerase; lacI; P, P and O; lacZ, lacY and lacA; mRNA + ribosomes; proteins I, Z, Y and A.]
Inferring Regulatory Network
• Given:
– Temporal expression data for a set of genes
• Infer:
– The network of regulatory relationships among the
genes
Regulatory Network Models
• Boolean Networks
– Kauffman ’93, Liang, Fuhrman & Somogyi ’98
• Differential Equations
– Chen, He & Church ’99
• Bayesian Networks
– Friedman et al. ’99
• Weight Matrices
– Weaver, Workman & Stormo ’99
Inferring Regulatory Networks
with Weight Matrices
• Overview:
– Assume discrete time steps
– u(t) is a vector representing the expression level of
n genes at time t
– Build a model for predicting u(t+1) given u(0),
u(1),…, u(t)
Overview of the model
• u(t): Input expression levels at time t
• r(t): Determine net regulation of each gene at time t
• x(t): Determine response of each gene at time t
• u(t+1): Predict input expression levels at time t+1
u(t) → r(t) → x(t) → u(t+1)
Determining the Net
Regulation of Each Gene
• Model regulative interactions among genes
with a weight matrix
$r_i(t) = \sum_j w_{ij} u_j(t)$
– $r_i(t)$ = Regulatory input to i
– $w_{ij}$ = Regulatory influence of j on i
– $u_j(t)$ = Expression level of j
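The weighted sum above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical three-gene network: the function name `net_regulation` and all values in `W` and `u` are made up here and do not come from the lecture. Reading positive weights as activation and negative weights as repression is the usual interpretation of such a model.

```python
def net_regulation(W, u):
    """Return r, where r[i] = sum_j W[i][j] * u[j]."""
    return [sum(w_ij * u_j for w_ij, u_j in zip(row, u)) for row in W]

# Hypothetical weights: W[i][j] is the influence of gene j on gene i.
# Positive entries activate, negative entries repress, zero means no interaction.
W = [
    [0.0, 2.0, 0.0],   # gene 0 is activated by gene 1
    [-1.0, 0.0, 0.0],  # gene 1 is repressed by gene 0
    [0.5, 0.5, 0.0],   # gene 2 is weakly activated by genes 0 and 1
]
u = [1.0, 2.0, 3.0]    # made-up expression levels at time t

r = net_regulation(W, u)
print(r)
```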
Determining the Response of
Each Gene
• The regulatory input to each
gene determines its response
through a sigmoid-like
(“squashing”) function.
$x_i(t+1) = [1 + \exp(-r_i(t) - b_i)]^{-1}$
[Figure: plot against r(t)]
Determining the Response of
Each Gene
• The $b_i$ parameter represents the predisposition
of the gene in the absence of any regulative
input (its basal rate)
• We can represent it as just another weight
connected to a “gene” that is always
completely on.
$x_i(t+1) = [1 + \exp\{-(\sum_j w_{ij} u_j(t) + b_i)\}]^{-1}$
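A small sketch of the squashing function, which also checks that folding the bias into the weight vector (as a weight on an always-on "gene") gives the same response. The function name `response` and all numeric values are assumptions chosen for illustration.

```python
import math

def response(weights, u, b):
    """x_i(t+1) = 1 / (1 + exp(-(sum_j w_ij * u_j(t) + b_i)))."""
    r = sum(w * u_j for w, u_j in zip(weights, u))
    return 1.0 / (1.0 + math.exp(-(r + b)))

w_i = [0.8, -1.2]   # hypothetical weights into gene i
u = [1.5, 0.5]      # made-up expression levels at time t
b_i = -0.3          # hypothetical bias (basal predisposition)

# Same computation with the bias folded in as a weight on an always-on "gene".
w_aug = w_i + [b_i]
u_aug = u + [1.0]

explicit = response(w_i, u, b_i)
folded = response(w_aug, u_aug, 0.0)
print(explicit, folded)
```

The two printed values agree up to floating-point rounding, which is why the bias needs no special treatment when the weights are learned.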
Predicting the Expression Level of
Each Gene at Time t+1
• The response of each gene is a value in [0,1].
• Convert this relative level into a real unit of
expression
• Allow different levels of maximal expression
for each gene
Predicting the Expression Level of
Each Gene at Time t+1
$u_i(t+1) = m_i x_i(t+1)$
• $u_i$ = Expression level of i
• $m_i$ = Maximal expression level for i
• $x_i$ = Response of i
Putting it Together
$u_i(t+1) = m_i / [1 + \exp\{-(\sum_j w_{ij} u_j(t) + b_i)\}]$
• $u_i(t+1)$ = Expression level of gene i at time t+1
• $m_i$ = Maximal expression level for i
• $\sum_j w_{ij} u_j(t)$ = Regulatory input to i
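The combined update can be sketched as a one-step predictor that is iterated to produce a trajectory. This is only an illustration under stated assumptions: the names `step` and `simulate`, the two-gene network, and every numeric value are hypothetical.

```python
import math

def step(W, b, m, u):
    """One time step: u_i(t+1) = m_i / (1 + exp(-(sum_j W[i][j] * u[j] + b[i])))."""
    u_next = []
    for w_row, b_i, m_i in zip(W, b, m):
        r_i = sum(w_ij * u_j for w_ij, u_j in zip(w_row, u))
        u_next.append(m_i / (1.0 + math.exp(-(r_i + b_i))))
    return u_next

def simulate(W, b, m, u0, steps):
    """Return the trajectory [u(0), u(1), ..., u(steps)]."""
    trajectory = [u0]
    for _ in range(steps):
        trajectory.append(step(W, b, m, trajectory[-1]))
    return trajectory

# Hypothetical two-gene network: gene 1 activates gene 0, gene 0 represses gene 1.
W = [[0.0, 1.0], [-1.0, 0.0]]
b = [0.0, 0.0]
m = [10.0, 5.0]  # made-up maximal expression levels
for t, u in enumerate(simulate(W, b, m, [1.0, 1.0], steps=3)):
    print(t, [round(v, 3) for v in u])
```

With zero net input the sigmoid returns 0.5, so a gene with no regulators and no bias sits at half its maximal level.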
Including Environmental
Variables
• One can represent environmental variables
(e.g., the concentration of lactose) as follows:
– Extend input vector to include n genes and p
environmental variables
– Extend weight matrix so that each gene is
connected to p environmental variables
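One way to picture the extension: append the p environmental values to the n expression levels and give each gene a row of n + p weights. The sketch below assumes a toy case with n = 2 and p = 1; the function name `net_regulation_with_env` and the weights are hypothetical, not taken from the lecture.

```python
def net_regulation_with_env(W_ext, u, env):
    """r_i = sum_k W_ext[i][k] * z[k], where z is u followed by env."""
    z = list(u) + list(env)
    for row in W_ext:
        if len(row) != len(z):
            raise ValueError("each row needs n + p weights")
    return [sum(w * z_k for w, z_k in zip(row, z)) for row in W_ext]

# Hypothetical example: n = 2 genes, p = 1 environmental variable (lactose).
W_ext = [
    [0.0, -1.0, 2.0],  # gene 0: repressed by gene 1, induced by lactose
    [0.0, 0.0, 0.0],   # gene 1: no regulatory input in this toy example
]
u = [1.0, 1.0]
print(net_regulation_with_env(W_ext, u, env=[0.0]))  # lactose absent
print(net_regulation_with_env(W_ext, u, env=[1.0]))  # lactose present
```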
Learning the Parameters of
the Model
• Given
– A time series of expression measurements
– u(0), …, u(t), u(t+1): Pairs ⟨u(t), u(t+1)⟩
• Find
– The $w_{ij}$ parameters so that the data are closely
modeled.
This model can be fit with the “back-propagation”
algorithm, as in a feed-forward neural network
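Because the model has no hidden layer, back-propagation here reduces to gradient descent through the sigmoid. The following is a rough sketch for a single gene, assuming squared error as the loss and a known maximal expression level; the function names, learning rate, epoch count and training data are all hypothetical choices for illustration.

```python
import math

def predict(w, b, m_i, u):
    """u_i(t+1) = m_i / (1 + exp(-(w . u + b)))."""
    r = sum(w_j * u_j for w_j, u_j in zip(w, u)) + b
    return m_i / (1.0 + math.exp(-r))

def squared_error(w, b, m_i, pairs):
    return sum((predict(w, b, m_i, u) - target) ** 2 for u, target in pairs)

def fit_gene(pairs, m_i, n, lr=0.05, epochs=5000):
    """Fit weights and bias for one gene by batch gradient descent.
    pairs is a list of (u(t), u_i(t+1)) training examples."""
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for u, target in pairs:
            pred = predict(w, b, m_i, u)
            s = pred / m_i
            # Chain rule through the sigmoid: d(pred)/d(r) = m_i * s * (1 - s).
            delta = 2.0 * (pred - target) * m_i * s * (1.0 - s)
            for j in range(n):
                grad_w[j] += delta * u[j]
            grad_b += delta
        w = [w_j - lr * g for w_j, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Made-up training data generated from hypothetical "true" parameters.
true_w, true_b, m_i = [1.0, -1.0], 0.0, 1.0
inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
pairs = [(u, predict(true_w, true_b, m_i, u)) for u in inputs]

before = squared_error([0.0, 0.0], 0.0, m_i, pairs)
w, b = fit_gene(pairs, m_i, n=2)
after = squared_error(w, b, m_i, pairs)
print(before, after)
```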
Learning the Parameters:
Linear Algebra Approach
• Weaver et al: Example of a linear algebraic
approach
• The model for each gene is independent
– So one can determine the best weights for gene i,
– Then the best weights for gene j
– etc…
• Set up a linear problem for determining the
weights for each gene i
Overview of the model
• u(t): Input expression levels at time t
• r(t): Determine net regulation of each gene at time t
• x(t): Determine response of each gene at time t
• u(t+1): Predict input expression levels at time t+1
u(t) → r(t) → x(t) → u(t+1)
Linear Algebra
• Learning the parameters:
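$$
\begin{bmatrix} u_1(0) & \cdots & u_n(0) \\ \vdots & \ddots & \vdots \\ u_1(t) & \cdots & u_n(t) \end{bmatrix}
\begin{bmatrix} w_{i1} \\ \vdots \\ w_{in} \end{bmatrix}
=
\begin{bmatrix} r_i(0) \\ \vdots \\ r_i(t) \end{bmatrix}
$$
• Alternatively: $U w_i = r_i \Rightarrow w_i = U^{-1} r_i$
• Use singular value decomposition to calculate
the inverse of U (the pseudo-inverse when U is not square).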
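A sketch of the linear step for one gene. The regulatory inputs are first recovered by inverting the squashing function, then the weights are found by least squares. To stay dependency-free, this sketch solves the normal equations by Gaussian elimination instead of using singular value decomposition; the two agree when U has full column rank. All names and data are hypothetical, and the bias is folded in as a trailing always-on column.

```python
import math

def unsquash(u_next_i, m_i):
    """Invert u = m / (1 + exp(-r)) to recover the regulatory input r."""
    return -math.log(m_i / u_next_i - 1.0)

def solve_least_squares(U, r):
    """Minimize ||U w - r|| via the normal equations (U^T U) w = U^T r,
    using Gaussian elimination with partial pivoting.
    Assumes U has full column rank (not SVD, unlike the slide)."""
    n = len(U[0])
    A = [[sum(row[a] * row[c] for row in U) for c in range(n)] for a in range(n)]
    y = [sum(row[a] * r_t for row, r_t in zip(U, r)) for a in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        y[col], y[pivot] = y[pivot], y[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for c in range(col, n):
                A[k][c] -= f * A[col][c]
            y[k] -= f * y[col]
    w = [0.0] * n
    for a in reversed(range(n)):
        w[a] = (y[a] - sum(A[a][c] * w[c] for c in range(a + 1, n))) / A[a][a]
    return w

# Made-up data: each row of U is u(t) plus a trailing 1.0 for the always-on "gene".
true_w = [1.0, -2.0, 0.5]  # hypothetical weights; the last entry acts as the bias
m_i = 10.0                 # assumed known maximal expression level
U = [[0.2, 0.1, 1.0], [1.0, 0.3, 1.0], [0.5, 0.9, 1.0], [0.8, 0.8, 1.0]]
u_next = [m_i / (1.0 + math.exp(-sum(w * x for w, x in zip(true_w, row)))) for row in U]

r = [unsquash(v, m_i) for v in u_next]
w_hat = solve_least_squares(U, r)
print([round(w, 6) for w in w_hat])
```

On noise-free data like this, the recovered weights match the generating ones up to rounding; with noisy or rank-deficient data, an SVD-based pseudo-inverse is the more robust choice.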
Experimental Methodology
• Generate random weight matrix models
• Use model to generate data
– ⟨u(t), u(t+1)⟩ pairs
• See how well the method recovers the “correct”
model
Experimental Methodology
• Generate random regulatory networks
– # Genes (n) ranged from 10 to 200
– Each had a set maximal expression level
– Several parameters to control the distribution of
weights
• Average % of non-zero weights in a row
• Max and min for absolute value of weights
• Normally distributed noise is introduced into
inputs.
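A possible way to generate such random networks is sketched below. The slide names only the controlling parameters, so the specific choices here (independent entries, uniformly distributed magnitudes, random signs, a fixed seed) are assumptions, as are the function names.

```python
import random

def random_weight_matrix(n, frac_nonzero, w_min, w_max, rng):
    """n x n matrix; each entry is non-zero with probability frac_nonzero,
    with |w| drawn uniformly from [w_min, w_max] and a random sign."""
    W = []
    for _ in range(n):
        row = []
        for _ in range(n):
            if rng.random() < frac_nonzero:
                row.append(rng.choice((-1.0, 1.0)) * rng.uniform(w_min, w_max))
            else:
                row.append(0.0)
        W.append(row)
    return W

def noisy(u, sigma, rng):
    """Add normally distributed noise to each input expression level."""
    return [u_j + rng.gauss(0.0, sigma) for u_j in u]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
W = random_weight_matrix(n=10, frac_nonzero=0.2, w_min=0.5, w_max=2.0, rng=rng)
nonzero = sum(1 for row in W for w in row if w != 0.0)
print(nonzero, "non-zero weights out of", 10 * 10)
print(noisy([1.0, 2.0, 3.0], sigma=0.1, rng=rng))
```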
Experimental Methodology
• Evaluated the method according to how well it identified non-zero weights (i.e., correctly identified gene interactions)
• Specifically, consider:
– Sensitivity = TP/(TP+FP), the fraction of predicted non-zero weights that are correct (this ratio is more commonly called precision)
– TP = true positives = # correctly predicted non-zero weights
– FP = false positives = # incorrectly predicted non-zero weights
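Counting true and false positives from a known and a recovered weight matrix might look like the sketch below. The tolerance `tol` used to decide whether a recovered weight counts as non-zero is an assumption, as are the function name and the example matrices.

```python
def count_tp_fp(W_true, W_pred, tol=1e-9):
    """Count predicted non-zero weights that are truly non-zero (TP)
    and predicted non-zero weights that are truly zero (FP)."""
    tp = fp = 0
    for row_true, row_pred in zip(W_true, W_pred):
        for w_true, w_pred in zip(row_true, row_pred):
            if abs(w_pred) > tol:
                if abs(w_true) > tol:
                    tp += 1
                else:
                    fp += 1
    return tp, fp

# Made-up example: one spurious interaction is predicted.
W_true = [[0.0, 1.0], [2.0, 0.0]]
W_pred = [[0.0, 0.9], [1.8, 0.3]]

tp, fp = count_tp_fp(W_true, W_pred)
score = tp / (tp + fp)  # the ratio TP/(TP+FP) defined above
print(tp, fp, score)
```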
Results
• More training data ⇒ More accurate models
• Sparse networks ⇒ More accurate models
• False positive (non-zero) weights about 10
times smaller than true positive…
Sensitivity > 90%
Limitations of Approach
• Assumption that all gene interactions are
independent of one another
• Assumption about regular discrete time
evolution
• Assumption that a gene’s maximal expression
level is known or can be estimated
• The model accounts only for transcriptional
regulation
Bayesian Networks
• Friedman, Linial, Nachman & Pe’er, 2000
• Learned Bayesian network models from
Stanford yeast cell-cycle data
– 76 measurements of 6177 genes
– Focused on 800 genes whose expression varied
over the cell-cycle stages
Bayesian Networks
[Figure: Bayesian network over nodes A, B, C, D and E. Nodes represent gene activities; edges represent dependencies.]
Conditional probability table for B:
E  A  Pr[B|E,A]  Pr[¬B|E,A]
0  0  0.3        0.7
0  1  0.4        0.6
1  0  0.7        0.3
1  1  0.1        0.9
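The conditional probability table for B can be held as a simple lookup. This sketch uses the values from the slide, taking the first entry as 0.3 (the complement of the 0.7 listed beside it); the names `CPT_B` and `prob_b` are made up for illustration.

```python
# Conditional probability table for node B given its parents E and A:
# (E, A) -> Pr[B = 1 | E, A], using the values shown on the slide.
CPT_B = {(0, 0): 0.3, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.1}

def prob_b(b, e, a):
    """Pr[B = b | E = e, A = a]; the B = 0 column is the complement."""
    p_true = CPT_B[(e, a)]
    return p_true if b == 1 else 1.0 - p_true

for (e, a) in sorted(CPT_B):
    print(e, a, prob_b(1, e, a), round(prob_b(0, e, a), 3))
```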
Representing Partial Models
• Since there is little data and many variables,
focus on finding “features” common to lots of
models that could explain the data
– Markov relations: Is Y in the Markov blanket of X?
• X, given its Markov blanket, is independent of the other
variables in the network
– Order relations: Is X an ancestor of Y?
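Both features can be read off a directed acyclic graph. In the sketch below, the Markov blanket of X is taken to be its parents, its children, and its children's other parents, which is the standard definition. The graph itself is hypothetical: it reuses node letters for convenience but is not claimed to match the edges of any figure in the lecture.

```python
def markov_blanket(parents, x):
    """parents maps each node to the set of its parents in the DAG."""
    children = {node for node, ps in parents.items() if x in ps}
    co_parents = set()
    for child in children:
        co_parents |= parents[child]
    return (parents[x] | children | co_parents) - {x}

def is_ancestor(parents, x, y):
    """True if there is a directed path from x to y."""
    stack, seen = list(parents[y]), set()
    while stack:
        node = stack.pop()
        if node == x:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

# Hypothetical DAG: E -> B, A -> B, B -> C, A -> D.
parents = {"E": set(), "A": set(), "B": {"E", "A"}, "C": {"B"}, "D": {"A"}}

print(sorted(markov_blanket(parents, "E")))  # Markov relation: is A in the blanket of E?
print(is_ancestor(parents, "E", "C"))        # Order relation: is E an ancestor of C?
```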
Estimating Confidence in
Features
• Bootstrap Method:
For i = 1 to m
– Sample (with replacement) expression experiments
– Learn a Bayesian network from this sample
• The confidence in a feature is the fraction of
the m models in which it was represented…
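The bootstrap loop can be sketched independently of how the network is learned. Here `learn_features` is a hypothetical stand-in for "learn a Bayesian network and list its features", and `toy_learner` is a deliberately trivial placeholder, not a Bayesian network learner; the data are made up.

```python
import random

def bootstrap_confidence(experiments, learn_features, m, rng):
    """Resample the experiments m times with replacement and return, for each
    feature, the fraction of the m learned models that contain it."""
    counts = {}
    for _ in range(m):
        sample = [rng.choice(experiments) for _ in experiments]
        for feature in learn_features(sample):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: count / m for feature, count in counts.items()}

def toy_learner(sample):
    """Placeholder for network learning: reports one 'feature' when gene X
    is on in more than half of the sampled experiments."""
    on = sum(1 for experiment in sample if experiment["X"] == 1)
    return {"X mostly on"} if 2 * on > len(sample) else set()

experiments = [{"X": 1}] * 7 + [{"X": 0}] * 3  # made-up expression experiments
confidence = bootstrap_confidence(experiments, toy_learner, m=200, rng=random.Random(1))
print(confidence)
```

A real learner would return a set of Markov or order relations per sample, and the same counting logic would apply unchanged.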
Biological Analysis
• Using confidence in order relations, the approach
identified “dominant genes”
– Several of these are known to be involved in cell-cycle
control
– Several have non-viable null mutants
– Many encode proteins involved in replication, sporulation,
budding
• Assessing confident Markov relations
– Most pairs are functionally related