System approaches for complex diseases

Bayesian network and its
applications
Jun Zhu
Genetics
Rosetta Inpharmatics
Merck & Co.
Outline
• Methods
– Integration of genetics and gene expression
– Integration of data from multiple tissues
– Construction of causal graphic networks
– Integration of transcription factor binding sites
and protein-protein interaction (PPI) data
• Applications
– target selection and prioritization
– Integrate with siRNA screening data
– Integrate with proteomics data
Biological networks/pathways
[Figure: model types arranged by data required to train models and biological details revealed: gene sets, association networks, probabilistic causal networks, mechanism-based models]
4267 top genes in BxH liver female rescan qtl overlap
(num(p(GGC)<1e-15)>100 ~abs(cor)>0.5886)
1. How do genes in the same
module interact?
2. How do genes in different
modules interact?
3. Can we make causal
inferences to elucidate
signaling pathway for
disease targets?
A framework for data integration
knowledge
Medline
Biocarta/Biopathway
Biologists
High throughput
data
Microarray data
Proteomic data
How to integrate
them?
Database
Genomics
Hypothesis, test
Genetics
GUI
Bayesian network
• Decompose the joint distribution based on conditional
independence
$p(G) = p(X_1, X_2, \ldots, X_n) = \prod_i p(X_i \mid \mathrm{Pa}(X_i))$
• Find the graph G that maximizes the likelihood of the data D,
p(D|G)
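As a minimal sketch of the factorization above, the following computes a joint probability as the product of each node's probability given its parents. The three-gene network, the binary states and all probability values are assumptions made up for illustration; they are not from the talk.

```python
from itertools import product

# Hypothetical three-gene network A -> B, A -> C with binary states.
# Each table gives P(node = 1 | parent states); the numbers are made up.
parents = {"A": (), "B": ("A",), "C": ("A",)}
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.1, (1,): 0.8},
    "C": {(0,): 0.5, (1,): 0.2},
}

def joint_probability(state):
    """Multiply p(Xi | Pa(Xi)) over all nodes for one full assignment."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(state[p] for p in pa)]
        prob *= p_one if state[node] == 1 else 1.0 - p_one
    return prob

# P(A=1) * P(B=1 | A=1) * P(C=0 | A=1) = 0.3 * 0.8 * 0.8
example = joint_probability({"A": 1, "B": 1, "C": 0})

# Summing over all 8 assignments should give 1 for a valid factorization.
total = sum(
    joint_probability(dict(zip("ABC", s))) for s in product((0, 1), repeat=3)
)
print(example, total)
```

Because each node only needs a table over its own parents, the number of parameters grows with the size of the parent sets rather than with the full joint state space.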
Bayesian network
• How is it reconstructed?
Data is fixed
Search for the best model
p( D | G )
– Local search method (edge insertion, deletion,
reversal)
– Complexity penalty (BIC score)
– Bayesian average (1000 independent runs to
explore the possible space)
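The complexity penalty can be illustrated with a small BIC scorer for binary nodes. This is a sketch under stated assumptions, not the implementation used in the talk: the function name `bic_score`, the toy counts and the three candidate structures are all hypothetical. A local search would score each single-edge insertion, deletion or reversal in this way and keep the best move.

```python
from collections import Counter
from math import log

def bic_score(rows, parents):
    """BIC = maximized log-likelihood - (k / 2) * log(N), binary nodes only."""
    n = len(rows)
    score = 0.0
    for node, pa in parents.items():
        joint = Counter((tuple(r[p] for p in pa), r[node]) for r in rows)
        marg = Counter(tuple(r[p] for p in pa) for r in rows)
        # Log-likelihood at the maximum-likelihood estimates n_ijk / n_ij.
        score += sum(c * log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # One free parameter per parent configuration of a binary node.
        score -= 0.5 * log(n) * (2 ** len(pa))
    return score

# Hypothetical counts of (A, B, C): B depends on A, C is independent of A.
counts = {
    (0, 0, 0): 20, (0, 0, 1): 20, (0, 1, 0): 5, (0, 1, 1): 5,
    (1, 0, 0): 5, (1, 0, 1): 5, (1, 1, 0): 20, (1, 1, 1): 20,
}
rows = [dict(zip("ABC", k)) for k, c in counts.items() for _ in range(c)]

empty = {"A": (), "B": (), "C": ()}
a_to_b = {"A": (), "B": ("A",), "C": ()}
dense = {"A": (), "B": ("A",), "C": ("A",)}

scores = {name: bic_score(rows, g)
          for name, g in [("empty", empty), ("a_to_b", a_to_b), ("dense", dense)]}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy data the edge A -> B raises the likelihood by more than its penalty, while the extra edge A -> C adds a parameter without improving the fit, so the penalized score prefers the sparser graph.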
Bayesian network-practical issues
• How is it reconstructed?
– NP-hard problem
– Limit numbers of nodes
– Limit search space
BN: Markov equivalence
• A Bayesian network is just a graphical model
• By itself, it does not reveal causal information
A → B
B → A
$p(A, B) = p(B \mid A)\,p(A) = p(A \mid B)\,p(B)$
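A short numerical check of this equivalence: for two binary variables, the maximized likelihood of the structure A → B equals that of B → A, so data alone cannot orient the edge. The observations below are hypothetical.

```python
from collections import Counter
from math import log, isclose

# Hypothetical paired observations of two binary genes (A, B).
pairs = [(0, 0)] * 30 + [(0, 1)] * 10 + [(1, 0)] * 15 + [(1, 1)] * 45

def loglik(child, parent=None):
    """Maximized log-likelihood of one column, optionally given a parent column."""
    keys = [(p[parent] if parent is not None else None, p[child]) for p in pairs]
    joint = Counter(keys)
    marg = Counter(k[0] for k in keys)
    return sum(c * log(c / marg[k[0]]) for k, c in joint.items())

ll_a_to_b = loglik(0) + loglik(1, parent=0)  # p(A) p(B | A)
ll_b_to_a = loglik(1) + loglik(0, parent=1)  # p(B) p(A | B)
print(ll_a_to_b, ll_b_to_a)
```

Both orientations have the same number of free parameters as well, so a BIC-type penalty does not separate them either.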
Bayesian network: A, B and C are
correlated, but through different mechanisms.
[Figure: candidate network structures relating A, B and C, with one structure including a locus L]
BN: priors of causal information
• Break Markov equivalence by introducing
priors for structures
• Set priors so that p(A → B) is different from
p(B → A)
• Priors were derived from genetic
information
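A sketch of how a structure prior can separate two Markov-equivalent graphs: the likelihoods are identical, so the posterior ranking is decided entirely by the prior. The log-likelihood value and the prior weights (0.9 versus 0.1, imagining that A has a cis-acting eQTL) are hypothetical numbers chosen for illustration.

```python
from math import log, exp

# Hypothetical maximized log-likelihood, identical for A -> B and B -> A.
loglik = -120.0

# Hypothetical structure prior: A has a cis-acting eQTL, so A -> B is favored.
prior = {"A->B": 0.9, "B->A": 0.1}

# Log posterior up to a constant: log p(D | G) + log p(G).
log_post = {g: loglik + log(p) for g, p in prior.items()}
best = max(log_post, key=log_post.get)

# With equal likelihoods the posterior odds equal the prior odds.
ratio = exp(log_post["A->B"] - log_post["B->A"])
print(best, ratio)
```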
Integration of genetics and gene expression
Experimental Design
Experimental Data
• Genetic map
• Genotype
• Gene expression of
relevant tissues
• Clinical end points
Ingredients for inferring causality
• Perturbations with a causal anchor
– KOs/transgenics present a known perturbation (causal
anchor) where response can be studied
– Natural variation in a segregating population provides
the same type of causal anchor (ability to identify DNA
variations associated with response):
AACGGTT
AACAGTT
DNA Supporting
Gene X
Variation in DNA leads
to variation in mRNA
High expression,
alt splicing, codon
change, etc.
Low expression, no alt.
splicing, no codon
change, etc.
Variation in mRNA leads
to variation in protein,
which in turn can lead to
disease
Distinguishing Causal from Reactive
Genes
Causative model: L → T1 → T2 (example: ob/ob → leptin → obesity)
$P(L, T_1, T_2) = P(L)\,P(T_1 \mid L)\,P(T_2 \mid T_1)$
Reactive model: L → T2 → T1 (example: db/db → obesity → leptin)
$P(L, T_1, T_2) = P(L)\,P(T_2 \mid L)\,P(T_1 \mid T_2)$
Independent model: T1 ← L → T2 (example: Avy → eumelanin RNAs, Avy → obesity)
$P(L, T_1, T_2) = P(L)\,P(T_2 \mid L)\,P(T_1 \mid L)$
L: DNA locus controlling RNA levels and/or clinical traits
T1 (R): quantitative trait 1
T2 (C): quantitative trait 2
Schadt E, et al., Nature Genetics, 2005
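The three models can be compared by their maximized likelihoods. The sketch below simulates data from the causative chain L → T1 → T2 and scores each model with simple Gaussian linear regressions. The simulation settings, the linear-Gaussian form and the function names are assumptions for illustration, not the procedure of the cited paper. The factor P(L) is shared by all three models and is left out of the comparison.

```python
import random
from math import log, pi

def gaussian_loglik(y, x):
    """Maximized log-likelihood of y = a + b * x + Gaussian noise."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    var = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / n
    return -0.5 * n * (log(2 * pi * var) + 1)

# Simulated cross: genotype L in {0, 1, 2}, then L -> T1 -> T2 (assumed truth).
rng = random.Random(0)
L = [rng.choice((0, 1, 1, 2)) for _ in range(500)]
T1 = [l + rng.gauss(0, 0.5) for l in L]
T2 = [t + rng.gauss(0, 0.5) for t in T1]

scores = {
    "causal": gaussian_loglik(T1, L) + gaussian_loglik(T2, T1),
    "reactive": gaussian_loglik(T2, L) + gaussian_loglik(T1, T2),
    "independent": gaussian_loglik(T1, L) + gaussian_loglik(T2, L),
}
best = max(scores, key=scores.get)
print(best, scores)
```

All three models have the same number of parameters in this setup, so comparing raw likelihoods is equivalent to comparing a penalized score such as AIC or BIC.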
Inferring causal relationships
[Figure: physical locations of Gene A (Chr 1) and Gene D (Chr 9), with eQTL at Locus 1 on Chr 1 and at Loci 1, 2 and 3 on Chr 2]
Gene A with cis-acting QTL
• Gene expression of A and D correlate
• A and D have overlapping eQTL on Chr 1 at Locus 1
• Gene A controls Gene D
Genes with complex trans-acting QTLs
• Gene expression of B, C & E correlate
• B, C and E have overlapping eQTL on Chr 2
• Genes B & C control Gene E
Bayesian network-integrating
genetics
• Experimental Hsd11b1 signature: mice treated with an
Hsd1 inhibitor
• Predicted Hsd1 signatures based on BxD data
– Correlation to Hsd1
• 10% of the predicted signature overlaps with the
experimental one
– BN without genetics
• 20% of the predicted signature overlaps with the
experimental one
– BN with genetics
• 52% of the predicted signature overlaps with the
experimental one
Zhu J, et al, Cytogenet Genome Res. 2004
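The overlap percentages quoted above are presumably fractions of the predicted signature that also appear in the experimental one. A minimal sketch of that calculation follows; the gene identifiers are hypothetical placeholders.

```python
def overlap_fraction(predicted, experimental):
    """Fraction of the predicted signature also found in the experimental one."""
    predicted, experimental = set(predicted), set(experimental)
    return len(predicted & experimental) / len(predicted)

# Hypothetical gene identifiers, for illustration only.
experimental = {"g1", "g2", "g3", "g4", "g5"}
predicted = {"g2", "g3", "g6", "g7"}

frac = overlap_fraction(predicted, experimental)
print(f"{frac:.0%} of the predicted signature overlaps")
```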
BN: simulation study
BN: Genetics information is critical when
sample size is small
Zhu J, et al, PLoS Comput Biol. 2007
How to integrate protein-protein interaction
data?
Can we find overlapping information better?
[Figure: example 4-cliques, 3-cliques and a clique community (partial clique)]
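Cliques in a protein-protein interaction graph can be enumerated with the Bron-Kerbosch algorithm. The sketch below is one standard way to do this and is not necessarily the method used in the talk; the protein names and interactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def maximal_cliques(edges):
    """Enumerate maximal cliques of an undirected graph (Bron-Kerbosch)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already explored nodes.
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

# Hypothetical PPI data: a 4-clique and a 3-clique sharing protein P4.
edges = list(combinations(["P1", "P2", "P3", "P4"], 2))
edges += list(combinations(["P4", "P5", "P6"], 2))

cliques = maximal_cliques(edges)
for c in sorted(cliques, key=len, reverse=True):
    print(len(c), sorted(c))
```

A clique community relaxes this by merging cliques that share most of their members, which tolerates missing interactions in noisy PPI data.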
Comparing protein-protein interactions with gene
co-expression
[Figure values: 0.51, 0.50, 0.29, 0.19]
Integrating transcription factor (TF)
binding data and PPI
• Introducing scale-free priors for TF and
large PPI complexes
$p(T \to g) \propto w(T)$
$w(T) = \log\Big(\sum_{g_i \in R} \delta\big(r(T, g_i) \ge r_{\mathrm{cutoff}}\big)\Big)$
• Fixed prior for small PPI complexes
Application to yeast cross
BN | KO data | GO terms | TF data
w/o any priors | 125 | 55 | 26
w/ genetics priors | 139 | 59 | 34
w/ genetics, TF and PPI priors | 152 | 66 | 52
The network integrated genetics, TF and PPI has better prediction power.
Mechanism for a QTL hot spot
Red: TF
Green: PPI
Zhu J, et al. Nature Genetics, 2008
Applications
• How to use networks to prioritize
candidates?
• How to use networks to identify causal
genes in genome-wide association
studies?
Driver potential
Query
gene
Hypergeometric test
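A hypergeometric test of this kind asks whether the genes around a query gene in the network overlap a signature more than expected by chance. The sketch below computes the upper-tail p-value from first principles; the function name and the toy numbers are assumptions for illustration.

```python
from math import comb

def hypergeom_pvalue(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are in the signature."""
    total = comb(N, n)
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total

# Toy example: 10 genes in the network, 3 in the signature,
# 4 genes downstream of the query gene, all 3 signature genes among them.
p_small = hypergeom_pvalue(3, N=10, K=3, n=4)   # 7 / 210
p_all = hypergeom_pvalue(0, N=10, K=3, n=4)     # every outcome counted
print(p_small, p_all)
```

Candidates can then be ranked by this p-value, with smaller values indicating a neighborhood more strongly enriched for the signature.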
Validating connections in human cohorts
• Study of the genetics of gene expression in
pedigrees using blood samples.
• Blood was collected from 455 individuals from
51 Icelandic families (most families were dense
three-generation pedigrees).
• Samples were expression profiled against a
common reference pool.
• Samples were genotyped for 1000 markers
across the genome.
• Each of the 455 individuals was scored for 40
clinical traits.
RG1003 falls under linkage peak for obesity in females
RG1003 supported by obesity/diabetes linkages in
the published literature
RG1003 supported by Decode Linkage
Kissebah et al. 2000
Obese females
RG1003
RG1003 has a cis-acting QTL in Decode family blood
expression data
RG1003
RG1003
High-expressor allele for RG1003 associates with high BMI
Overlap between cQTL and eQTL
[Figure: LOD score curves (0 to 5) on C03 showing the cis eQTL for RG1003 (aka GPR105) overlapping the clinical QTL for BMI>35; markers LD1 and LD2 are indicated]
GPR105 expression: association analysis
The best single marker association
Clinical trait
Marker | Allele | p-value | RR | Aff freq | Ctrl freq
D3S1279 | 10 | 8 x 10^-6 | 2.2 | 0.21 | 0.11
Top 50% BMI: 205
Bottom 50% BMI: 205
Expression trait (GPR105)
Marker | Allele | p-value | R^2
D3S1279 | 10 | 1 x 10^-6 | 0.05
Expressor: high allele 10, low allele 6
[Figure: -log p (0 to 6) for the clinical trait and the expression trait at markers across the GPR105 locus, including LD1 and LD2]
ASO experiment in DIO mice
[Figure: weight gain in DIO C57BL/6 mice over 0 to 4.5 weeks for the vehicle, GPR105, SCD1 and scrambled treatment groups]
These same approaches can be used to functionally annotate the large
number of GWA studies now being released into the public domain
The WTCCC paper reports GWA results for 7 common diseases; published
alongside it was a paper focusing on the T1D associations, in which genes
corresponding to the associations are identified
In the T1D paper, genes corresponding to the associations in the
WTCCC paper are identified
• But what functional support is provided for these identifications?
• Consider the chr 12q13 association and the identification of ERBB3:
– The gene was closest to the associated SNP
– SH2B3 binds ERBB3, where ITAMs bind proteins like SH2B3 with SH2 signaling domains
involved in immune inflammatory events that lead to autoimmune pancreatic beta-cell
destruction in T1D
Genes Adjacent to rs11171739
[Figure: 1 Mb window around rs11171739 with the cis eSNP distribution (liver); regions holding > 10% of cis eSNPs are marked]
Rps26, but NOT ERBB3 Is Significantly
Associated with rs11171739 in Cis
snp_chr | snp_pos | gene_symbol | log10_kw_pvalue | gene_chr | cis_trans
12 | 54756892 | MMT00321 | 36.076895 | 8 | trans
12 | 54756892 | MMT17394 | 35.9099 | 18 | trans
12 | 54756892 | MMT12973 | 35.418233 | 10 | trans
12 | 54756892 | MMT21703 | 34.886122 | 5 | trans
12 | 54756892 | MMT00741 | 34.774948 | 9 | trans
12 | 54756892 | MMT09493 | 34.746478 | 1 | trans
12 | 54756892 | MMT15828 | 34.496097 | X | trans
12 | 54756892 | ERBB3 Proximal Gene | 34.418756 | 12 | cis
12 | 54756892 | MMT12163 | 34.01855 | 15 | trans
12 | 54756892 | MMT23083 | 33.982509 | 8 | trans
12 | 54756892 | MMT20493 | 32.994015 | 7 | trans
12 | 54756892 | MMT10434 | 32.594523 | 8 | trans
12 | 54756892 | MMT06311 | 32.578843 | 7 | trans
12 | 54756892 | MMT15103 | 24.875821 | 10 | trans
12 | 54756892 | ERBB3 | 0.39735 | 12 | cis
• ERBB3 expression activity has 2 suggestive trans eQTL, but is not at all linked to the T1D SNP
• The Rps26 expression trait is very strongly linked to the T1D SNP; nearly 40% of the in vivo
expression of this gene is explained by this SNP
• Other genes strongly linked to the T1D SNP in trans are homologs of the Rps26 gene
But now look at probabilistic causal networks
All crosses, all
tissues
• Liver
• Adipose
• Skeletal muscle
• Islets
• Whole brain
• Hypothalamus
Rps26
T1D
KEGG
pathway
genes
Schadt E, et al., PLoS Biology, 2008
Functional Enrichment of Rps26 Mouse
Bayesian Network Genes
Similar Set | Expectation | Input Identifiers
Major histocompatibility complex antigen | 3.59615679443374E-11 | H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2-Q2;MMT00082085;H2…
T-cell mediated immunity | 4.11814903693412E-11 | C2ta;Cd2;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2…
antigen processing | 4.35718665292356E-10 | Rmcs1;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1;MMT00082085;Hfe;Psmb8
MHCII-mediated immunity | 2.19156051592854E-09 | C2ta;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1;MMT00072401
antigen processing, exogenous antigen via | 1.31842207155735E-08 | Rmcs1;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1
antigen presentation, exogenous antigen | 1.48086534305264E-08 | Rmcs1;Fcgr3;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1;Psmb8
Type I diabetes mellitus | 2.60246885295535E-08 | H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2-Q2;MMT00082085;H2-T9;Hspd1
Antigen processing and presentation | 3.024258456011E-08 | C2ta;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2…
MHC class II receptor activity | 5.66821865604424E-07 | Rmcs1;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1
Cell adhesion molecules (CAMs) | 6.26076191318305E-07 | Cd2;H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2-Q2;MMT00082085;H2…
antigen presentation | 8.01315717611796E-07 | Rmcs1;Fcgr3;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1;MMT00082085;Hfe;Psmb8
antigen presentation, exogenous antigen | 1.42894483510369E-05 | H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1
What about ERBB3 network in mouse?
No functional enrichment in network genes (no T1D association)
AD sub-network
[Figure: AD sub-network containing APOE, NPTXR, VGF, CDK5R2, MAPT, BDNF, A2M and APBB1IP, with GO annotations: inflammation, anti-apoptosis, synaptic transmission (p-value = 1.3e-12)]
Legend:
Red: risk factor
Yellow: progression marker (proteomic candidates)
Rectangle: association marker (proteomic data)
How to understand phosphorylation
changes detected by proteomics?
• 16 proteins’ phosphorylation states
changed after inducing PIN1 siRNA (16
proteomic hits);
• Gene expression signature of PIN1 siRNA
is also defined;
• The phosphorylation change is the primary
signal; the gene expression change is the
amplified signature.
• Do the two types of signals match?
The two types of signals match
around PIN1
PIN1
Diamond: phosphorylation
Red: gene expression
MP
Rupert Vessey
Informatics, Biosoft, Biology
GEL, GEM
Genetics
Eric Schadt
Biology/New Targets
John Lamb
Pek Lum
Valur Emilsson
Jonathan Derry
Michael Coon
I-Ming Wang
Debraj GuhaThakurta
Tao Xie
Xia Yang
Network/Systems Biology
Jun Zhu
Bin Zhang
Radu Dobrin
Zhidong Tu
Dmitri Volfson
Mani Narayanan
Data management/HP computing
Andrew Kasarskis
Archie Russell
Xavier Schildwachter
Eugene Chudin
Statistical Genetics
Cliona Molony
Solly Sieberts
Josh Millstein
Ke Hao
Hunter Fraiser
finance/admin)
PMs: Sonia, Christine, and Rob*
Chunsheng Zhang*
Merck Collaborators
Obesity/Diabetes
Marc Reitman
Nancy Thornberry
Doug MacNeil
Charles Rosenblum
Su Chen
Shirly Pinto
Brian Kennedy
Joe Mancini
Joel Berger
Sajjad A. Qureshi
Cardiovascular
Sam Wright
Carl Sparrow
Marty Springer
Gerry Waters
Kenny Wong
Sleep
John Renger
Alzheimer’s
David Stone
Cancer
Stephen Friend
Theresa Zhang
Joseph Marszalek
Andrew Bloecher
Vinayak Kulkarni
ACSM
Jeff Sachs
Arthur Fridman
Matthew C. Wiener
Eric Minch
Metabolite/Toxicogenomics
Frank Sistare
Bill Scheffer
Ethan Xu
Qiuwei Xue
Other Merck Collaborators
Andy Plump
Larry Peterson
Erik Lund
External Collaborators
UW
Steve Schwartz
Roger Bumgarner
UWisc
Attie group
UCLA
Jake Lusis
UNL/UNC
Daniel Pomp
Decode
Kari Stefansson
NSI
Yanqing Chen
Harvard
Jun Liu
Berkeley
Rachel Brem
Princeton
Leonid Kruglyak