Simplex sigillum veri New approaches to the analysis of

Evidence networks
for the analysis of
biological systems
Rainer Breitling
IBLS – Molecular Plant Science group
Bioinformatics Research Centre
University of Glasgow, Scotland, UK
Background
Datasets and evidence
networks in post-genomic
biology
Genomics
Fully sequenced genomes (1995-2004):
18 archaea
163 bacteria
3 protozoa
24 yeast species and fungi
2 plants (Arabidopsis, rice)
2 insects (flies, honey bee)
2 worms (C.elegans, C. briggsae)
3 fish (fugu, puffer, zebrafish)
chicken, cow, dog, mouse, rat, chimp
human
→ lots of “lists” of genes
Transcriptomics
• microarrays measure gene expression levels (mRNA concentrations)
• relative or absolute values
• in organisms, tissues, cells
• produce gene lists (e.g., which genes are up-regulated by a disease, by drug treatment, in a certain tissue)
Proteomics
• 2D gels, liquid chromatography, and mass spectrometry measure protein concentrations
• in tissues, cells, organelles
• detect chemical modifications and processing of proteins
• produce lists of protein variants that differ among conditions
Metabolomics
• chromatography and mass spectrometry measure metabolite concentrations
• in tissues, cells, body fluids, cell culture medium
• produce lists of affected metabolites
Evidence networks
• relate items (genes, proteins, metabolites)
that “have something to do with each
other”
• relationship is based on objective
evidence
• represented as bipartite graphs
– two classes of nodes: items and evidence
– automated analysis of results possible
– intuitive visualization and links to literature
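A minimal sketch of such a bipartite structure, assuming a plain dictionary-based representation. The class name `EvidenceNetwork`, its methods, and the gene and evidence labels are hypothetical and chosen only for illustration:

```python
from collections import defaultdict


class EvidenceNetwork:
    """Bipartite graph with two node classes: items and evidence."""

    def __init__(self):
        self.items_by_evidence = defaultdict(set)  # evidence node -> item nodes
        self.evidence_by_item = defaultdict(set)   # item node -> evidence nodes

    def add(self, evidence, items):
        """Connect one evidence node to every item it supports."""
        for item in items:
            self.items_by_evidence[evidence].add(item)
            self.evidence_by_item[item].add(evidence)

    def neighbors(self, item):
        """Items that share at least one evidence node with `item`."""
        linked = set()
        for evidence in self.evidence_by_item.get(item, ()):
            linked |= self.items_by_evidence[evidence]
        linked.discard(item)
        return linked

    def shared_evidence(self, a, b):
        """Evidence nodes that connect items `a` and `b`."""
        return self.evidence_by_item.get(a, set()) & self.evidence_by_item.get(b, set())


# Hypothetical example: two pieces of evidence linking three genes.
net = EvidenceNetwork()
net.add("annotation: tricarboxylic acid cycle", ["geneA", "geneB"])
net.add("co-citation: hypothetical paper", ["geneB", "geneC"])
print(net.neighbors("geneB"))
print(net.shared_evidence("geneA", "geneB"))
```

Keeping the evidence as explicit nodes, rather than collapsing everything to item-item edges, is what makes it possible to report why two items are related.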
Types of evidence networks
• Relationship can be based on
– physical neighborhood
– phyletic pattern similarity
– expressional correlation
– biophysical similarity
– chemical transformation
– functional co-operation
– literature co-citations
A O M P K Z Y Q V D R L B C E F G H S N U J X I T W
phy: a o m p k z y - - d - l - - - - - - - - - - - i t -
22 aompkzy--d-l-----------it- NtpA [C] H+-ATPase subunit A
17 aompkzy--d-l-----------it- NtpB [C] H+-ATPase subunit B
17 aompkzy--d-l-----------it- NtpD [C] H+-ATPase subunit D
18 aompkzy--d-l-----------it- NtpI [C] H+-ATPase subunit I
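The four H+-ATPase subunits above share an identical phyletic pattern. A minimal sketch of how pattern similarity could be scored, assuming a simple presence/absence mismatch count. The function name and the scoring choice are illustrative assumptions, not necessarily the measure used for the network:

```python
def phyletic_distance(pattern_a, pattern_b):
    """Count genomes in which exactly one of the two genes is present.

    Each pattern has one character per genome; '-' means the gene is absent.
    """
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same set of genomes")
    return sum((a != "-") != (b != "-") for a, b in zip(pattern_a, pattern_b))


# Pattern shared by NtpA, NtpB, NtpD and NtpI on the slide.
NTP_PATTERN = "aompkzy--d-l-----------it-"

# Hypothetical second pattern: same as above but absent from genome 't'.
HYPOTHETICAL_PATTERN = NTP_PATTERN.replace("t", "-")

print(phyletic_distance(NTP_PATTERN, NTP_PATTERN))           # identical patterns
print(phyletic_distance(NTP_PATTERN, HYPOTHETICAL_PATTERN))  # one mismatch
```

Genes whose distance falls below a chosen threshold would then be connected through a shared evidence node.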
What is the big picture?
Graph-based iterative
Group Analysis for the
automated interpretation of
biological datasets
lists + graphs = understanding
What does this list mean?
#   Fold-Change  Gene Symbol  Gene Title
1   26.45        TNFAIP6      tumor necrosis factor, alpha-induced protein 6
2   25.79        THBS1        thrombospondin 1
3   23.08        SERPINE2     serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2
4   21.5         PTX3         pentaxin-related gene, rapidly induced by IL-1 beta
5   18.82        THBS1        thrombospondin 1
6   16.68        CXCL10       chemokine (C-X-C motif) ligand 10
7   18.23        CCL4         chemokine (C-C motif) ligand 4
8   14.85        SOD2         superoxide dismutase 2, mitochondrial
9   13.62        IL1B         interleukin 1, beta
10  11.53        CCL20        chemokine (C-C motif) ligand 20
11  11.82        CCL3         chemokine (C-C motif) ligand 3
12  11.27        SOD2         superoxide dismutase 2, mitochondrial
13  10.89        GCH1         GTP cyclohydrolase 1 (dopa-responsive dystonia)
14  10.73        IL8          interleukin 8
15  9.98         ICAM1        intercellular adhesion molecule 1 (CD54), human rhinovirus receptor
16  9.97         SLC2A6       solute carrier family 2 (facilitated glucose transporter), member 6
17  8.36         BCL2A1       BCL2-related protein A1
18  7.33         TNFAIP2      tumor necrosis factor, alpha-induced protein 2
19  6.97         SERPINB2     serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2
20  6.69         MAFB         v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (avian)
iterative Group Analysis (iGA)
iGA uses a simple hypergeometric distribution to obtain p-values
Breitling et al., BMC Bioinformatics, 2004, 5:34
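A minimal sketch of a hypergeometric group p-value in the spirit of iGA, under the assumption that the score of a group is the smallest tail probability over its members taken in rank order. The function names are illustrative and this is not the published implementation:

```python
from math import comb


def hypergeom_tail(n_total, group_size, n_top, k):
    """P(at least k group members among the top n_top of n_total ranked genes)."""
    upper = min(group_size, n_top)
    hits = sum(
        comb(group_size, i) * comb(n_total - group_size, n_top - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_top)


def iga_p_value(member_ranks, n_total):
    """Smallest tail probability over the group's members, taken in rank order.

    member_ranks: 1-based ranks of all group members in the sorted gene list.
    """
    ranks = sorted(member_ranks)
    group_size = len(ranks)
    return min(
        hypergeom_tail(n_total, group_size, rank, k)
        for k, rank in enumerate(ranks, start=1)
    )


# Hypothetical group of three genes occupying the top three of eight ranks.
print(iga_p_value([1, 2, 3], n_total=8))  # 1/56, about 0.018
```

Because the minimum is taken over all cut-offs, no fixed threshold for "changed" genes has to be chosen in advance.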
Graph-based iGA
Breitling et al., BMC Bioinformatics, 2004, 5:100
Step 1: build the network
Step 2: assign ranks to genes
Step 3: find local minima (p = 1/8 = 0.125; p = 6/8 = 0.75; p = 2/8 = 0.25)
Step 4: extend subgraph from minima (p = 0.014; p = 0.018; p = 0.125; p = 1)
Step 5: select p-value minimum (p = 0.014)
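The five steps can be sketched as follows. This is a minimal illustration that assumes the p-value of a subgraph with n members and maximum rank m among N ranked genes is C(m, n) / C(N, n), which gives 1/8 = 0.125 for a single top-ranked gene out of eight. The function names and the eight-node toy network are invented for illustration and are not the authors' implementation:

```python
from math import comb


def subgraph_p_value(n_members, max_rank, n_total):
    """Assumed formula: chance that n_members genes all rank at or above max_rank."""
    return comb(max_rank, n_members) / comb(n_total, n_members)


def giga(adjacency, rank):
    """Return (anchor, best p-value, members) for each local minimum, best first."""
    n_total = len(rank)
    results = []
    for node, neighbors in adjacency.items():
        # Step 3: a local minimum ranks better (lower) than all its neighbors.
        if any(rank[nb] < rank[node] for nb in neighbors):
            continue
        sub = {node}
        best_p, best_members = subgraph_p_value(1, rank[node], n_total), frozenset(sub)
        while True:
            frontier = {nb for v in sub for nb in adjacency[v]} - sub
            if not frontier:
                break
            # Step 4: admit the best-ranked neighbor, then everything reachable
            # whose rank does not exceed that neighbor's rank.
            m = min(rank[v] for v in frontier)
            grew = True
            while grew:
                new = {nb for v in sub for nb in adjacency[v]
                       if nb not in sub and rank[nb] <= m}
                grew = bool(new)
                sub |= new
            p = subgraph_p_value(len(sub), max(rank[v] for v in sub), n_total)
            # Step 5: remember the extension with the smallest p-value.
            if p < best_p:
                best_p, best_members = p, frozenset(sub)
        results.append((node, best_p, best_members))
    return sorted(results, key=lambda r: r[1])


# Steps 1 and 2 on a made-up chain of eight genes with made-up ranks.
names = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]
adjacency = {n: [] for n in names}
for a, b in zip(names, names[1:]):
    adjacency[a].append(b)
    adjacency[b].append(a)
rank = {"n1": 1, "n2": 3, "n3": 2, "n4": 4, "n5": 8, "n6": 6, "n7": 7, "n8": 5}

anchor, p, members = giga(adjacency, rank)[0]
print(anchor, round(p, 3), sorted(members))  # n1 0.014 ['n1', 'n2', 'n3', 'n4']
```

On this toy chain the subgraph grown from n1 passes through p = 0.125, 0.018, 0.014 and 1, and the minimum 0.014 is reported.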
Advantages of GiGA
• fast, unbiased and comprehensive analysis
• assignment of statistical significance values to
interpretation
• detection of significant changes even if data are
too noisy to reliably detect changed genes
• statistically meaningful interpretation even
without replicate experiments
• detection of patterns even for small absolute
changes
• flexible use of annotations + intuitive
visualization
Example 1
Microarrays
Gene expression
changes during the
yeast diauxic shift
Yeast diauxic shift study
DeRisi et al. (1997) Science 278: 680-6
Yeast diauxic shift study
Up-regulated functional classes (Gene Ontology number - name) at time points 0h, 9.5h, 11.5h, 13.5h, 15.5h, 18.5h, and 20.5h. Classes listed across the time points:
6144 - purine base metabolism
6099 - tricarboxylic acid cycle
3773 - heat shock protein activity
9277 - cell wall (sensu Fungi)
5749 - respiratory chain complex II (sensu Eukarya)
297 - spermine transporter activity
6950 - response to stress
6121 - oxidative phosphorylation, succinate to ubiquinone
5977 - glycogen metabolism
15846 - polyamine transport
8177 - succinate dehydrogenase (ubiquinone) activity
4373 - glycogen (starch) synthase activity
4129 - cytochrome c oxidase activity
6537 - glutamate biosynthesis
5353 - fructose transporter activity
7039 - vacuolar protein catabolism
5751 - respiratory chain complex IV (sensu Eukarya)
6097 - glyoxylate cycle
15578 - mannose transporter activity
5750 - respiratory chain complex III (sensu Eukarya)
9060 - aerobic respiration
8645 - hexose transport
GiGA results – diauxic shift
Down-regulated genes using GeneOntology-based network
locus    gene description ("anchor gene")                        p-value   members  max. rank
YHL015W  ribosomal protein S20                                   5.87E-86  39       48
YMR217W  GMP synthase                                            3.38E-13  9        172
YDR144C  aspartyl protease|related to Yap3p                      4.06E-08  6        242
YNL065W  multidrug resistance transporter                        4.02E-05  3        141
YLR062C  -                                                       6.41E-05  4        367
YGL225W  May regulate Golgi function and glycosylation in Golgi  1.12E-04  4        422
YPR074C  transketolase 1                                         1.44E-04  4        449
total genes measured in network: 4087.
Network figure labels: small ribosomal subunit; large ribosomal subunit; nucleolar rRNA processing; translational elongation
GiGA case study – diauxic shift
Up-regulated genes using metabolic network
locus    gene description                                            p-value   members  max. rank
YER065C  isocitrate lyase                                            4.96E-53  39       54
YGR088W  catalase T                                                  3.09E-10  11       106
YFR015C  glycogen synthase (UDP-glucose-starch glucosyltransferase)  2.08E-04  3        45
YJR073C  unsaturated phospholipid N-methyltransferase                3.85E-04  5        156
YDR001C  neutral trehalase                                           5.01E-04  3        60
YCR014C  DNA polymerase IV                                           5.44E-04  17       481
YIR038C  glutathione transferase                                     8.64E-04  5        183
total genes measured in network: 744.
Network figure labels: respiratory chain complex II; glyoxylate cycle; citrate (TCA) cycle; oxidative phosphorylation (complex V); respiratory chain complex III; respiratory chain complex IV
Example 2
Metabolomics
Changes in metabolic
profiles in drug-treated
trypanosomes
GiGA applied to metabolomics data
• Challenge: No
annotation available
• Solution: Build
evidence network
based on
hypothetical
reactions between
observed masses
(=mass differences)
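A minimal sketch of how such a network could be built: two observed masses are linked whenever their difference matches the mass shift of a known chemical transformation. The transformation table, the tolerance, the function name, and all masses other than 257.1028 are assumptions made for illustration:

```python
# Monoisotopic mass shifts (Da) of a few common transformations (illustrative subset).
TRANSFORMATIONS = {
    "methylation (CH2)": 14.01565,
    "oxidation (O)": 15.99491,
    "hydration (H2O)": 18.01056,
    "phosphorylation (HPO3)": 79.96633,
}


def build_mass_network(masses, transformations=TRANSFORMATIONS, tolerance=0.002):
    """Link pairs of masses whose difference matches a transformation.

    Returns a list of (lighter mass, heavier mass, transformation name) edges.
    """
    edges = []
    ordered = sorted(masses)
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            difference = heavy - light
            for name, shift in transformations.items():
                if abs(difference - shift) <= tolerance:
                    edges.append((light, heavy, name))
    return edges


# 257.1028 is the mass from the slides; the other masses are invented.
observed = [257.1028, 271.1185, 273.0977, 300.0000]
for edge in build_mass_network(observed):
    print(edge)
```

Each matched transformation acts as an evidence node connecting two metabolite signals, so the resulting graph can be analysed without any metabolite annotation.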
Metabolite tree of mass 257.1028 (glycerylphosphorylcholine)
Shown at 6, 4, and 2 generations
Colors indicate changes of metabolite signals compared to untreated samples after 60 min pentamidine (red = down, green = up)
GiGA metabolite trees for one experimental example
Choline tree found by GiGA (most significant subgraph, p < 10^-13)
extracted from
Summary
• post-genomic technologies produce “lists”
• neighborhood relationships yield “evidence networks” (graphs)
• lists + graphs = biological insights
• GiGA graph analysis highlights and connects relevant areas in the “evidence network”
Acknowledgements
• Pawel Herzyk – Sir Henry Wellcome Functional
Genomics Facility
• Anna Amtmann & Patrick Armengaud – IBLS
Molecular Plant Science group
• Mike Barrett – IBLS Parasitology Research group
• FGF academic users: Wilhelmina Behan, Simone Boldt,
Anna Casburn-Jones, Gillian Douce, Paul Everest,
Michael Farthing, Heather Johnston, Walter Kolch, Peter
O'Shaughnessy, Susan Pyne, Rosemary Smith, Hawys
Williams
Contact
Rainer Breitling
Bioinformatics Research Centre
Davidson Building A416
University of Glasgow, Scotland, UK
[email protected]
http://www.brc.dcs.gla.ac.uk/~rb106x