Dependency networks
Sushmita Roy
BMI/CS 576
www.biostat.wisc.edu/bmi576
[email protected]
Nov 25th, 2014
RECAP
• Probabilistic graphical models provide a natural way
to represent biological networks
• So far we have seen Bayesian networks:
– Sparse candidates
– Module networks
• Today we will focus on dependency networks
What you should know
• What are dependency networks?
• How do they differ from Bayesian networks?
• GENIE3 algorithm for learning a dependency network
from expression data
• Different ways to represent conditional distributions
• Evaluation of various network inference methods
Graphical models for representing regulatory
networks
• Bayesian networks
• Dependency networks
Random variables encode expression levels
[Figure: regulators (e.g., Msb2, Sho1, Ste20), modeled as random variables X1, X2, point to a target Y3. The graph encodes the structure; Y3 = f(X1, X2) encodes the function.]
Edges correspond to some form of statistical dependency
Dependency network
• A type of probabilistic graphical model
• As in Bayesian networks, it has
– A graph component describing the dependency structure between random variables
– A prediction function fj associated with each variable Xj, used to predict Xj from the state of its neighbors
• Unlike a Bayesian network, it
– Can have cyclic dependencies
Dependency Networks for Inference, Collaborative Filtering, and Data Visualization
Heckerman, Chickering, Meek, Rounthwaite, Kadie, 2000
Notation
• Xi: the ith random variable
• X = {X1, ..., Xp}: set of p random variables
• xik: an assignment of Xi in the kth sample
• x-ik: set of assignments to all variables other than Xi in the kth sample
Learning dependency networks
[Figure: a set of candidate regulators (marked "?") feeds into a prediction function fj, which predicts Xj.]
• fj can be of different types
• Learning requires estimation of each of the fj functions
• In all cases, learning requires us to minimize an error of predicting Xj from its neighborhood, e.g. the squared error Σk (xjk − fj(x-jk))² over the N samples (a minimal sketch follows)
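A minimal sketch of this objective, assuming squared error and a scikit-learn-style regressor (the slides do not prescribe a library; learn_fj and the matrix X are illustrative names):

```python
# Minimal sketch: learn f_j by minimizing squared prediction error.
# X is a hypothetical (N samples x p genes) expression matrix.
import numpy as np
from sklearn.linear_model import LinearRegression

def learn_fj(X, j):
    """Fit a predictor for X_j from all other variables (its neighborhood)."""
    inputs = np.delete(X, j, axis=1)  # x_{-j}: all variables except X_j
    target = X[:, j]                  # x_j
    # Least squares minimizes sum_k (x_j^k - f_j(x_{-j}^k))^2
    return LinearRegression().fit(inputs, target)
```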
Different representations of the fj function
• If Xj is continuous
– fj can be a linear function
– fj can be a regression tree
– fj can be a random forest
• An ensemble of trees
• If Xj is discrete
– fj can be a conditional probability table
– fj can be a conditional probability tree
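For the continuous case, the three options above could look as follows in scikit-learn (an illustrative mapping, not the lecture's code):

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Three interchangeable representations of f_j for a continuous X_j.
candidate_fj = {
    "linear":          LinearRegression(),
    "regression_tree": DecisionTreeRegressor(),
    "random_forest":   RandomForestRegressor(n_estimators=100),  # ensemble of trees
}
# Each is trained the same way: estimator.fit(x_minus_j, x_j)
```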
GENIE3: GEne Network Inference with
Ensemble of trees
• Solves a set of regression problems
– One per random variable
• Uses an ensemble of regression trees to represent fj
– Models non-linear dependencies
• Outputs a directed, cyclic graph with a confidence of
each edge
• Focuses on generating a ranking over edges rather than a graph structure and parameters
Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. Van Anh Huynh-Thu, Alexandre Irrthum, Louis Wehenkel, Pierre Geurts. PLoS ONE, 2010
Recall our very simple regression tree example
[Figure: a regression tree predicting X3 from X2. The root tests X2 > e1: if NO, the tree outputs a leaf value of X3; if YES, a second test X2 > e2 selects between two further leaf values of X3.]
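A toy reconstruction of this tree, assuming synthetic data; the learned split thresholds play the role of e1 and e2:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: X3 is a step function of X2 with two change points.
rng = np.random.default_rng(0)
x2 = rng.uniform(0, 1, size=(200, 1))
x3 = np.where(x2[:, 0] > 0.7, 2.0, np.where(x2[:, 0] > 0.3, 1.0, 0.0))

tree = DecisionTreeRegressor(max_depth=2).fit(x2, x3)
# Thresholds at the internal nodes (the analogues of e1 and e2):
print(tree.tree_.threshold[tree.tree_.feature >= 0])
```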
An Ensemble of trees
• A single tree is prone to “overfitting”
• Instead of learning a single tree, Ensemble models
make use of a collection of trees
A Random forest: An Ensemble of Trees
[Figure: T regression trees t1, ..., tT, each with split nodes and leaf nodes, all applied to the same input x-j.]
– Prediction is the average of the individual tree predictions: (1/T) Σt ft(x-j)
Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
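The averaging step can be made explicit; a sketch assuming a fitted scikit-learn forest `rf` (names are illustrative):

```python
import numpy as np

def ensemble_predict(rf, x_minus_j):
    # rf.estimators_ holds the individual fitted trees t_1, ..., t_T.
    per_tree = np.stack([t.predict(x_minus_j) for t in rf.estimators_])
    return per_tree.mean(axis=0)  # prediction = (1/T) * sum_t t(x_{-j})
```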
GENIE3 algorithm sketch
• For each Xj, generate learning samples of
input/output pairs
– LSj = {(x-jk, xjk), k = 1..N}
– On each LSj, learn fj to predict the value of Xj
– fj is either a Random Forest or Extra-Trees
– Estimate wij for all genes i ≠ j
• wij quantifies the confidence of the edge between Xi and Xj
• Generate a global ranking of edges based on each wij
Note that depending on the interpretation of the weights wi,j, their aggregation to get a global ranking of regulatory links is not trivial. We will see in the context of tree-based methods that it requires normalizing each expression vector appropriately.
At each test node, a split is chosen on one input variable (selected in x-j), trying to reduce as much as possible the variance of the output variable (xj) in the resulting subsets of samples. Candidate splits for numerical variables compare the input variable values with a threshold that is determined during the tree growing.
GENIE3 algorithm sketch
Predictor ranking
Figure 1. GENIE3 procedure. For each gene j = 1, ..., p, a learning sample LSj is generated with expression levels of gene j as output values and expression levels of all other genes as input values. A function fj is learned from LSj and a local ranking of all genes except j is computed. The p local rankings are then aggregated to get a global ranking of all regulatory links.
Figure from Huynh-Thu et al.
doi:10.1371/journal.pone.0012776.g001
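Putting the sketch together, a minimal GENIE3-style loop might look like this (an illustration, not the authors' implementation; scikit-learn normalizes importances per model, which only approximates the paper's raw variance-reduction weights):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def genie3_sketch(X, n_trees=1000):
    """X: (N samples x p genes). Returns weight matrix W and ranked edge list."""
    _, p = X.shape
    X = X / X.std(axis=0)                 # unit variance per gene (see later slide)
    W = np.zeros((p, p))                  # W[i, j]: confidence of edge X_i -> X_j
    for j in range(p):                    # one regression problem per gene
        inputs = np.delete(X, j, axis=1)  # x_{-j}
        rf = RandomForestRegressor(n_estimators=n_trees).fit(inputs, X[:, j])
        W[np.arange(p) != j, j] = rf.feature_importances_
    # Global ranking of directed edges by decreasing weight
    edges = [(W[i, j], i, j) for i in range(p) for j in range(p) if i != j]
    return W, sorted(edges, reverse=True)
```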
Learning fj in GENIE3
• Random forest or Extra Trees to represent the fj
• Learning the Random forest
– Generate M=1000 bootstrap samples
– At each node to be split, search for best split among K randomly
selected variables
– K was set to p−1 or √(p−1), where p is the number of regulators/parents
• Learning the Extra-Trees
– Learn 1000 trees
– Each tree is built from the original learning sample
– At each test node, the best split is determined among K random
splits, each determined by randomly selecting one input
(without replacement) and a threshold
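In scikit-learn terms, the two settings above would roughly correspond to the following (an assumed mapping; the paper uses its own implementation):

```python
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor

p = 100                                      # number of candidate regulators (illustrative)
K_full, K_sqrt = p - 1, int((p - 1) ** 0.5)  # the two K settings tried

# Random Forest: 1000 trees, each grown on a bootstrap sample
rf = RandomForestRegressor(n_estimators=1000, max_features=K_full)
# Extra-Trees: 1000 trees, each built from the original learning sample
et = ExtraTreesRegressor(n_estimators=1000, max_features=K_sqrt, bootstrap=False)
```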
Computing the importance weight of a
predictor
• Importance is computed at each interior node
• Remember there can be multiple interior nodes per
regulator
• For an interior node, importance is given by the
reduction in variance if we make a split on that node
For an interior node N:
S: set of data samples that reach node N; #S: size of the set S
Var(S): variance of the output variable in set S
St: subset of S for which the test at N is true
Sf: subset of S for which the test at N is false
Importance of N: I(N) = #S · Var(S) − #St · Var(St) − #Sf · Var(Sf)
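As a function (S, St, Sf hold the output values xj of the samples reaching the node and its two branches; names are illustrative):

```python
import numpy as np

def node_importance(S, St, Sf):
    """Variance reduction achieved by the split at an interior node N."""
    return len(S) * np.var(S) - len(St) * np.var(St) - len(Sf) * np.var(Sf)
```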
Computing the importance weight of a
predictor
• For a single tree, the overall importance of a variable is then the sum of I(N) over all interior nodes N where that variable is used to split
• For an ensemble the importance is averaged over all
trees
• To avoid bias towards highly variable genes, normalize the expression of each gene to have unit variance
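A sketch of the aggregation, using scikit-learn tree internals (an assumption; compute_feature_importances sums the sample-weighted impurity reductions over all nodes that split on a variable):

```python
import numpy as np

def ensemble_importance(rf):
    """Average each variable's within-tree importance over the T trees."""
    per_tree = [t.tree_.compute_feature_importances(normalize=False)
                for t in rf.estimators_]
    return np.mean(per_tree, axis=0)
```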
Computational complexity of GENIE3
• Complexity per variable
– O(T·K·N·log N)
– T is the number of trees
– K is the number of random attributes selected per split
– N is the learning sample size
Evaluation of network inference methods
• Assume we know what the “right” network is
• One can use Precision-Recall curves to evaluate the
predicted network
• The area under the PR curve (AUPR) quantifies performance
Precision = (# of correct edges) / (# of predicted edges)
Recall = (# of correct edges) / (# of true edges)
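A minimal AUPR computation against a known reference network, assuming flattened arrays of edge labels and predicted weights (hypothetical values):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

def aupr(truth, weights):
    precision, recall, _ = precision_recall_curve(truth, weights)
    return auc(recall, precision)

truth   = np.array([1, 0, 1, 1, 0, 0])              # 1 = edge in the true network
weights = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])  # predicted edge confidences
print(aupr(truth, weights))
```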
AUPR-based performance comparison
Some comments about expression-based
network inference methods
• We have seen two types of algorithms to learn these
networks
– Per-gene methods
• Sparse candidate: learn regulators for individual genes
• GENIE3
– Per-module methods
• Module networks: learn regulators for sets of genes/modules
– Other implementations of module networks exist
• LIRNET: Learning a Prior on Regulatory Potential from eQTL Data
– Su-In Lee et al., PLoS Genetics, 2009
(http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjou
rnal.pgen.1000358)
• LeMoNe: Learning Module Networks
– Michoel et al 2007 (http://www.biomedcentral.com/1471-
Many implementations of per-gene methods
• Mutual Information
– Context Likelihood of relatedness (CLR)
– ARACNE
• Probabilistic methods
– Bayesian network: Sparse Candidates
• Regression
– TIGRESS
– GENIE3
DREAM: Dialogue for Reverse Engineering Assessments and Methods
Community effort to assess regulatory network inference
DREAM 5 challenge; previous challenges: 2006, 2007, 2008, 2009, 2010
Marbach et al. 2012, Nature Methods; Marbach et al. 2010
[Figure: performance comparison across methods, including a "Community" aggregate prediction and a "Random" baseline.]
Where do different methods rank?
Comparing module (LeMoNe) and per-gene
(CLR) methods