
From patterns to pathways.
Causal analysis of gene expression data
Alexander Kel
BIOBASE GmbH
Halchtersche Strasse 33
D-38304 Wolfenbuettel
Germany
[email protected]
www.biobase.de
Pathway builder
Array analyser
TRANSPATH
- mechanistic
- semantic
S/MARt DB
Patho DB
TRANSFAC
Match
Patch
Catch
CMFinder
TRANSCompel
Cytomer
TRANSGenome
TRANSPLORER
The TRANSFAC® System comprises 7
databases:
TRANSFAC®
Professional Suite
TRANSFAC®
Professional
Transcription factor database
TRANSCompel®
Professional
Composite elements database
PathoDB®
Professional
Pathologically altered transcription factors
TRANSPRO™
Professional
Collection of human promoter sequences
S/MARt DB™
Professional
Scaffold or Matrix Attached Regions databases
Cytomer®
Ontology of cells, structures, organs
TRANSPATH®
Professional
Signal transduction pathways
TRANSFAC® Professional
Transcription factor database
trans
cis
…
Human genes
Sequences and positions of AP-1 binding sites
glutathione Ptransferase
enhancer at -2500
hemoglobin,
epsilon
TGACTTT
-80 bp
TGACATC
Akt-2
IFN-
-100 bp
TGTCACC
-89 bp
Apo AII
TGACTCA
-792 bp
TGAGTCA
Melanotransferrin
-2013 bp
Collagenase
TGAGTCA
-72 bp
proto-oncogene
c-myc
porphobilinogen
deaminase
TGATTTA
-335 bp
TGACTCA
-162 bp
GM-CSF
TGACTCA
enhancer at -3500
Structure of regulatory regions of eukaryotic genes
AP-1
AP-1
AP-1
CBF
AP-1
NF-κB
NF-κB
c-Rel/p65 p50/p65
GM-CSF
Homo
sapiens
CBF
AP-1
TATTT
NFAT
NFAT
CE
NFAT
NFAT
CE
CE
T-cell specific inducible enhancer at –3500 bp
NFAT
HMG Y(I)
-114
-88
CD28 response element
-54
CE
Promoter
ST
+1
Protein-DNA and protein-protein interactions in gene
transcriptional regulation.
S1
S2
S3
TF 2
TF 3
TF 1
TFIID
TFIIF
TFIIA
TFIIB
TFIIE
TFIIH
RNA pol II
Histone
acetylase
Transcription factors
Sequence-specific DNA
binding
Non-DNA
binding
HAT
Layer III
Co-activator
Layer II
Layer I
DNA
adapter
TF1
TF2
TF3
TF4
TRANSFAC: relational scheme
CLASS
SPECIES
FEATURES
interacting
factor
SYNONYMS
FACTOR
MATRIX
CELL
gene
METHOD
expression
SITE
regulatory region
SEQUENCE
FUNCTIONAL ELEMENT
GENE
coding region
Manual annotation of the databases: input client
TRANSFAC: GENE table
TRANSFAC: SITE table
Structure of transcription factors
USF-1, dimer
Structure of transcription factors
oligomerization
domain
Ligand-binding
domain
Activation
domain
Protein-protein
interaction
domain
DNA binding
domain
TRANSFAC: FACTOR table, protein sequence
TRANSFAC: FACTOR table, protein domains
TRANSFAC: FACTOR table, structural and functional features
TRANSFAC: FACTOR table, links to other databases
TRANSFAC: classification of transcription factors
TRANSFAC: CLASS table
TRANSFAC 8.1 (2004-03-31): number of factor entries for
different species
[Bar chart: number of factor entries (axis 0 to 1400) for human, mouse, rat, other vertebrates, fruit fly, plants, fungi and other species.]
TRANSFAC 8.1 (2004-03-31): distribution of experimentally
known TFBS in 5' regions of genes.
[Histogram: number of known TFBS (axis 0 to 800) by position in the 5' region.]
TRANSFAC: FACTOR table, protein-DNA and protein-protein interactions
TRANSFAC: MATRIX table
TRANSCompel® Professional
Composite elements database
Mouse Interleukin-2
gene promoter
AP-1
COMPEL:C00050
NF-ATp
.......
tgccacacaggtagactcttTTGAAAATAtgTGTAATAtgtaaaa catcgtgaca cccccatatt… …
-96
-79
TGAGTCA
AP-1 consensus
ST
Composite elements
Minimal functional units where both protein-DNA and protein-protein
interactions contribute to a highly specific pattern of gene expression
and provide cross-coupling of different signal transduction pathways.
F2
F1
Low level
of transcription
Low level
of transcription
F1
F2
Synergistic activation of
transcription
F1
F2
Combinatorial regulation by the composite elements
N
1.
Gene
IgH ** , Mus
musculus
2.
IL-2, Homo
sapiens
Scheme of CE
Ets
-283
:
-268
:
NFAT
3.
4.
-142
:
NF-κB
-167
:
Il-2, Mus
musculus
6.
Serum
amyloid A1,
Rattus norv
IRF-1, Mus
musculus
7.
AP-1
-142
:
AP-1
IgH ** ,
Homo sapiens
AP-1
-167
:
IL-2, Homo
sapiens
5.
AP-1
Ets
Oct-2
CBF
-117
:
-73
:
C/EBP
-123
:
STAT-1
-113
:
NF-κB
-49
:
-40
:
NF-κB
Ternary complex NFATp - AP1 - DNA
flat files
Description of an
evidence (experiment,
cell type, two individual
interactions)
Link to the
TRANSFAC
GENE table
Link to EMBL
Link to the
TRANSFAC FACTOR
table
Cross-coupling of signal transduction pathways
Membrane receptor
Ca2+-dependent channel
Src
Ras
SH2
Ras
SH3
Phosphorylation
Ca2+
Ca2+
GTP
GDP
PLC
Adaptors
PI3-K
Ca2+
cytoplasm
IP3
Calcineurin
PKB/Akt
P
NFATp
ERK
JNK
NFATp
ERK
NFATp
Nucleus
c-Fos
P38MAPK
JNK
c-Fos
IL-2
P
P
c-Jun
c-Fos c-Jun
Composite element
P38MAPK
c-Jun
ATF-2
c-Jun ATF-2
ATF-2
Inducible/inducible
19 CEs ETS / AP-1, providing cross-coupling of
Ras/Raf- and PKC-dependent signalling pathways;
15 CEs NFATp / AP-1, providing cross-coupling of
Ca2+- and PKC-dependent signalling pathways;
Tissue-specific
32
Inducible
44
Cell-cycle
dependent
Dev. stage-dependent
Ubiquitous
constitutive
F1
F2
14 CEs NF-κB / C/EBP
NF-κB is inducible by IL-1 and TNF-α; C/EBP is
inducible by IL-6.
119
1
2
39
Tissue-specific
2
3
60
Inducible
2
Cell-cycle dep.
12
Dev. stage-dependent
Ubiquit.
constitut.
Inducible/constitutive
9 CEs ETS / Sp1
ETS factors are inducible through the Ras/Raf-dependent
signalling pathway;
5 CEs Smad / TEF3
Smads are inducible by TGF-β signalling.
Inducible/tissue-restricted
CEs Pit-1 / AP-1
Pit-1 is a pituitary-restricted transcription factor, whereas
AP-1 and Ets are ubiquitous inducible factors.
Mechanisms of functioning of synergistic composite elements
1)
F1
F2
S2
S1
F1
F2
S1
S2
2)
F1
F2
S2
S1
F1
F2
S1
S2
Cooperative binding to DNA
and ternary complex formation
A new protein surface for
DNA recognition could be
formed
3)
F1 F2
S1
S2
Simultaneous interaction
of activation domains with the
components of the basal complex
4)
F1
F2
S1
S2
Forming a new protein
surface for interaction with
the basal complex
5)
F1
F1
s1
F2
F2
s2
Relief of autoinhibition
as a result of protein-protein interactions
6)
DNA bending by one of the
transcription factors
F1
S1
F2
S2
7)
DNA wrapping around
a nucleosome allows
transcription factors to interact
F1
F2
8)
HAT complex
F1
F2
S1
S2
Recruitment of a HAT complex
by one of the transcription factors
Mechanisms of functioning of antagonistic composite elements
1)
HAT complex
Mutually exclusive binding of
factor F1(activator)
and F2 (repressor)
HDAC complex
2)
HAT complex
Binding of F2 (repressor)
results in the conformational
changes of F1 (activator)
HDAC complex
TRANSPATH® Professional
Database on signal transduction pathways
TRANSPATH: map of IFN pathway
TRANSPATH®
TRANSFAC®
TRANSPATH: molecules
Extracellular ligand
Membrane receptor
Adaptor
Second messenger
Kinase(s)
Transcription factor
Target gene
TRANSPATH: molecule hierarchy
IL-1/Toll receptor family
family
TLRs
family
complexes
TLR4
TLR4(h):MyD88(h)
TLR4(h)
TLR4(h)p
TLR5
TLR4(m)
ortholog
TLR5(h)
TLR4a(h)
basic
TLR4b(m)
isoform
modified form
TRANSPATH: reactions
Enzyme
Educts
Products
•Binding
•Phosphorylation
•Dephosphorylation
•Degradation
•Acetylation
•Dissociation
•Transregulation
•Expression
•Activation
•...
The elementary reaction step
C
R
A
B
Reaction R, catalyzed by catalyst C,
converts substance A into substance B.
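The elementary step described above can be sketched as a small record type. This is a minimal illustration only: the class name `Reaction`, its field names and the `describe` helper are hypothetical and are not taken from the TRANSPATH schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Reaction:
    """One elementary reaction step: educts are converted into products."""
    name: str
    educts: Tuple[str, ...]
    products: Tuple[str, ...]
    catalyst: Optional[str] = None   # enzyme or other catalyst, if any
    kind: str = "unspecified"        # e.g. "binding", "phosphorylation"

    def describe(self) -> str:
        """Render the step in the wording used on the slide."""
        by = f", catalyzed by {self.catalyst}," if self.catalyst else ""
        return (f"Reaction {self.name}{by} converts "
                f"{' + '.join(self.educts)} into {' + '.join(self.products)}.")


# The generic step from the slide: catalyst C, educt A, product B.
step = Reaction(name="R", educts=("A",), products=("B",), catalyst="C")
print(step.describe())
```

Keeping educts and products as tuples lets the same record hold a binding step (two educts, one complex) or a dissociation step (one educt, two products).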
TGF
1
R1
Pathway steps:
T:
TR2p
TGF
R-II
NTP
Pathway steps depict the
signaling in a more
biochemical way.
NDP
R2
TGF
R-I
T:
TR2p
:TR1p
R3
Smad
2
Smad
2p
R4
Smad
4
S2P:
S4
R5
gene
tc
In a semantic reaction, only the individual
key molecules are given.
Semantic:
TGF-β1 → TGF-β RII → TGF-β RI → Smad2 → Smad4 → gene
(arrows correspond to steps R1, R2, R3, R4, R5)
Info about a specific molecule
Many synonyms ensure
that you find your protein
External database links make it
easy to identify proteins
Parts of a molecule entry
Specific molecule (cont.)
Disease information
and GO terminology
localization of
human APP
Opens data entry of
a specific reaction
Parts of a molecule entry
Specific reaction of APP(h)
Evaluation of this
reaction is based on
experimental evidence
Part of a reaction entry
Signal transduction pathways
Extracellular ligand
Membrane receptor
Adaptor
Second messenger
Kinase(s)
Transcription factor
Target gene
Connecting path between two molecules
Connection between one specific molecule (magenta)
and a group of molecules (transcription factors in blue)
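Finding a connecting path between two molecules can be illustrated with a breadth-first search over a directed graph of reaction steps. This is a sketch under stated assumptions: the function name `connecting_path` and the edge-list format are hypothetical, and the example edges are the semantic TGF-β chain shown earlier in the slides, not a database export.

```python
from collections import deque


def connecting_path(edges, source, target):
    """Return the shortest chain of molecules from source to target, or None."""
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, []).append(downstream)

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:      # visit every molecule at most once
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Semantic chain from the slides (steps R1 to R5), written as directed edges.
semantic_chain = [
    ("TGF-beta1", "TGF-beta RII"),
    ("TGF-beta RII", "TGF-beta RI"),
    ("TGF-beta RI", "Smad2"),
    ("Smad2", "Smad4"),
    ("Smad4", "gene"),
]
print(connecting_path(semantic_chain, "TGF-beta1", "Smad4"))
```

Because the edges are directed, a search from a target gene back to a ligand returns `None` unless the edge list is reversed first.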
Oncostatin M pathway
B-cell antigen receptor pathway
PDGF pathway
Insulin pathway
Overview of a pathway – hand-drawn map
TRANSPATH: number of entries
[Bar chart: numbers of molecules, reactions and references (axis 0 to 12000) in Professional releases 2.1, 2.4 and 3.1.]
Statistics: TRANSPATH® 5.1 and NetPro 1.1
                                  TRANSPATH®    + NetPro
Main tables
– Molecule                        18029         + 7333
– Reaction                        20199         + 30316
– Reference                       8258          + 9582
Molecules of mammalian origin
– Human                           2503          3521
– Mouse                           1653          2025
– Rat                             810           1224
Prediction
26 588 predicted human gene products of which 30.8% (~9000) seem to be
signal transduction relevant
(Venter et al., 2001)
=> 28% coverage of predicted proteins in TRANSPATH®
TRANSFAC® System
From patterns to pathways
Array analysis
The starting point:
A set of induced genes from
microarray experiments
Array analysis
KEGG
The conventional analysis:
deduce the gene products
and map them to the
network of metabolic pathways
biochemical effects
Array analysis
TRANSPATH
Extension of
conventional analysis:
map the induced gene products to the
network of regulatory pathways
biological effects
Array analysis
Identification of
new targets
KEGG
TRANSPATH
Reasoning
of experimental findings:
promoter analysis of induced genes
connected to network mapping
Array analysis
Promoter analysis
identifies additional target genes
and extends the affected network
promoter model
TRANSGENOME
database
additional
predicted genes
extended
predicted network
Array analysis
new target
network analysis
Causes
TRANSPATH
promoter analysis
TRANSFAC
retrieval of upstream sequences
TRANSGENOME
microarray: set of
induced genes
assignment of gene products
KEGG
TRANSPATH
regulatory network mapping
metabolic network mapping
Effects
modeling of effects
indirect hints on causes
trans
cis
…
?
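Position weight matrix (nucleotide counts per position, with consensus):
Pos     1   2   3   4   5   6   7   8   9  10  11  12  13  14
A       9   2   1   0   1   0   0   0   0   1  15  13   7  13
C       8   3   1   1  13   3  29   0  22   8   9   4   8   1
G       4   2   2   2  15  26   0  29   7  17   3   9   8   7
T       8  22  25  26   0   0   0   0   0   3   2   3   6   8
Cons.   N   T   T   T   S   G   C   G   C   S   M   R   N   D

q = \frac{\sum_{i=1}^{l} I(i)\, f(b_i, i) - \sum_{i=1}^{l} I(i)\, f^{\min}(i)}{\sum_{i=1}^{l} I(i)\, f^{\max}(i) - \sum_{i=1}^{l} I(i)\, f^{\min}(i)}    (1)

I(i) = \sum_{b \in \{A,T,G,C\}} f(b, i) \ln(4 f(b, i))    (2)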
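The matrix score can be sketched in a few lines of Python. This is an illustration under assumptions, not the TRANSPLORER or Match implementation: the counts are the slide's matrix read position by position in the order A, C, G, T, the score is normalised between the worst and the best possible site, and the names `COUNTS`, `information` and `matrix_score` are hypothetical.

```python
import math

BASES = "ACGT"
# Nucleotide counts per position (A, C, G, T), as read from the slide.
COUNTS = [
    (9, 8, 4, 8), (2, 3, 2, 22), (1, 1, 2, 25), (0, 1, 2, 26),
    (1, 13, 15, 0), (0, 3, 26, 0), (0, 29, 0, 0), (0, 0, 29, 0),
    (0, 22, 7, 0), (1, 8, 17, 3), (15, 9, 3, 2), (13, 4, 9, 3),
    (7, 8, 8, 6), (13, 1, 7, 8),
]


def information(freqs):
    """I(i) = sum over b of f(b,i) * ln(4 * f(b,i)); zero frequencies add 0."""
    return sum(f * math.log(4 * f) for f in freqs if f > 0)


def matrix_score(site, counts=COUNTS):
    """Score q of a candidate site, normalised to the range 0..1."""
    if len(site) != len(counts):
        raise ValueError("site length must equal matrix length")
    current = lowest = highest = 0.0
    for base, column in zip(site.upper(), counts):
        total = sum(column)
        freqs = [c / total for c in column]
        info = information(freqs)
        current += info * freqs[BASES.index(base)]
        lowest += info * min(freqs)
        highest += info * max(freqs)
    return (current - lowest) / (highest - lowest)


print(round(matrix_score("TTTTGGCGCGAAAA"), 3))
```

Weighting each position by I(i) means that a mismatch in the conserved GCGC core costs far more than a mismatch in the nearly uniform flanking positions.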
TRANSPLORER (TRANScription exPLORER) is a software package for the analysis of transcription regulatory
sequences. Currently, the TRANSPLORER site prediction tool uses collections of position weight matrices (PWMs). It can
use several matrix sources: the largest and most up-to-date library of matrices, derived from the TRANSFAC® Professional
database; other matrix libraries; and any user-developed matrix libraries. It can therefore search for a great
variety of transcription factor binding sites. A search can use all matrices in the libraries or
subsets of them.
Search for most probable binding sites regulating gene expression
Search for binding sites coinciding with SNPs
Mouse c-fos promoter
(Matrix search for TF binding sites)
1
<------------V$IK1_01(0.86)
-----...V$CREBP1CJUN_01(0.85)
2
<-----------V$IK2_01(0.90)
-----...V$CREB_01(0.96)
3
----------->V$AP2_Q6(0.87)
<-------------V$GKLF_01(0.87)
4-->V$ATF_01(0.89)
<-------V$MZF1_01(0.99)
----...V$ELK1_01(0.87)
5
<-----------V$AP2_Q6(0.92)
<------------V$SP1_Q6(0.88)
6>V$AP1FJ_Q2(0.89)
<-------------V$GKLF_01(0.85)
7>V$AP1_Q2(0.87)
<-------------V$GKLF_01(0.86)
8->V$CREB_Q2(0.86)
<---------V$CETS1P54_01(0.90)
9->V$CREB_Q4(0.90)
<---------V$NRF2_01(0.90)
10
<-------------V$GC_01(0.88)
11
----------->V$CAAT_01(0.87)
12
<------------V$TCF11_01(0.87)
13
----------->V$AP2_Q6(0.87)
14
<---------V$USF_Q6(0.93)
16
--------...V$ATF_01(0.94)
17
-------...V$AP1FJ_Q2(0.95)
20
-------...V$CREBP1_Q2(0.93)
21
-------...V$CREB_Q2(0.95)
23
---...V$IK2_01(0.85)
MMCFOS_1
GAGCGCCCGCAGAGGGCCTTGGGGCGCGCTTCCCCCCCCTTCCAGTTCCGCCCAGTGACG
420
1-->V$CREBP1CJUN_01(0.85)
-------------->V$BARBIE_01(0.86)
2-->V$CREB_01(0.96)
-------------->V$TATA_01(0.95)
3
----------->V$CAAT_01(0.91)
--------->V$AP4_Q5(0.95)
4----------->V$ELK1_01(0.87)
--------------------->V$HEN1_01(0.87)
5
--------->V$AP4_Q5(0.88)
<---...V$CMYB_01(0.93)
6
<---------V$CDPCR3HD_01(0.93)
--...V$VMYB_02(0.89)
7
<--------------V$TATA_01(0.88)
8
--------------------->V$HEN1_02(0.87)
9
<---------------------V$HEN1_02(0.86)
10
<-----------------V$AP4_01(0.88)
11
----------->V$LMO2COM_01(0.93)
12
<-----------V$LMO2COM_01(0.93)
13
<-----------V$MYOD_01(0.88)
17--->V$AP1FJ_Q2(0.95)
<---------V$AP4_Q6(0.99)
20---->V$CREBP1_Q2(0.93)
<---------V$MYOD_Q6(0.96)
21---->V$CREB_Q2(0.95)
Transcription start
23-------->V$IK2_01(0.85)
24
<=========== E2F (0.80)
MMCFOS_1
TAGGAAGTCCATCCATTCACAGCGCTTCTATAAAGGCGCCAGCTGAGGCGCCTACTACTC
480
1
<-----------------V$CMYB_01(0.91)
-------...V$ER_Q6(0.86)
2
<-----------V$LMO2COM_01(0.90)
<----...V$TCF11_01(0.87)
3
--------->V$MYOD_Q6(0.90)
-------->V$STAT_01(0.93)
4
--------->V$VMYB_01(0.89)
<--------V$STAT_01(0.89)
5--------------V$CMYB_01(0.93)
-------->V$LMO2COM_02(0.93)
6------>V$VMYB_02(0.89)
<-----------V$CAAT_01(0.85)
7
-------->V$VMYB_02(0.88)
8
-------------->V$EVI1_04(0.86)
9
------------->V$GATA1_02(0.93)
12
<------------V$ZID_01(0.85)
13
<----------V$CP2_01(0.97)
14
---------->V$GATA_C(0.92)
15
----------------->V$CMYB_01(0.86)
16
--------->V$CREL_01(0.91)
24
<=========== E2F (0.82)
MMCFOS_1
CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA
540
Exon 2 sequence of human thyroid transcription factor-1
(TTF-1) gene (HS198161)
(Matrix search for TF binding sites)
1------------V$AHRARNT_01(0.90)
<-----------------V$NF1_Q6(0.85)
2--------V$NMYC_01(0.89)
--------->V$AP4_Q5(0.91)
3------>V$USF_Q6(0.89)
--------->V$AP4_Q6(0.85)
4------V$USF_C(0.86)
------------...V$YY1_02(0.86)
5 --------->V$AP4_Q5(0.91)
6 --------->V$AP4_Q6(0.86)
7
--------->V$AP4_Q5(0.92)
8
--------->V$AP4_Q6(0.86)
9
--------->V$AP4_Q5(0.86)
HS198161_1 ACGCGCAGCAGCAGGCGCAGCACCAGGCGCAGGCCGCGCAGGCGGCGGCAGCGGCCATCT
540
1
----------------->V$NF1_Q6(0.96)
2
<-----------------V$NF1_Q6(0.90)
3
--------->V$USF_Q6(0.87)
4------->V$YY1_02(0.86)
---------->V$CP2_01(0.88)
5
--------->V$AP4_Q5(0.92)
----------->V$CAAT_01(0.85)
6
--------->V$AP4_Q6(0.85)
--------->V$AP4_Q5(0.86)
7
------...V$CP2_01(0.86)
8
===========> E2F (0.81)
9
===========> E2F (0.90)
HS198161_1 CCGTGGGCAGCGGTGGCGCCGGCCTTGGCGCACACCCGGGCCACCAGCCAGGCAGCGCAG
600
1 <---------V$CETS1P54_01(0.89)
<--------...V$GATA_C(0.86)
2
----------------->V$NF1_Q6(0.85)
<-------...V$GATA1_02(0.90)
3
--------->V$CETS1P54_01(0.90)
<-------...V$GATA1_03(0.92)
4
<--------------------V$R_01(0.88) <-----...V$LMO2COM_02(0.90)
5
<---------------V$AHRARNT_01(0.86)
6
----------->V$AP2_Q6(0.95)
7---->V$CP2_01(0.86)
<-------...V$GATA1_04(0.87)
8
<----...V$CETS1P54_01(0.87)
9
===========> E2F (0.80)
HS198161_1 GCCAGTCTCCGGACCTGGCGCACCACGCCGCCAGCCCCGCGGCGCTGCAGGGCCAGGTAT 660
1--V$GATA_C(0.86)
<---------V$CETS1P54_01(0.89)
2------V$GATA1_02(0.90)
--------...V$DELTAEF1_01(0.96)
3------V$GATA1_03(0.92)
<---...V$CEBPB_01(0.88)
4---V$LMO2COM_02(0.90)
5
<-----------V$IK2_01(0.92)
6
<---------------V$E47_02(0.87)
7-----V$GATA1_04(0.87)
8-----V$CETS1P54_01(0.87)
9
<--------------V$E47_01(0.86)
10
---------->V$DELTAEF1_01(0.99)
11
<-----------V$LMO2COM_01(0.94)
12
<-----------V$MYOD_01(0.87)
13
--------->V$MYOD_Q6(0.91)
14
------->V$USF_C(0.93)
HS198161_1 CCAGCCTGTCCCACCTGAACTCCTCGGGCTCGGACTACGGCACCATGTCCTGCTCCACCT
720
Enhanceosome
Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the
W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined
W-binding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to
here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the
W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and
OBF-1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation
domains (AD), which contact the RNA polymerase II basal transcription machinery.
Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66
Recognition method for
T-cell specific Composite Elements NFAT/AP-1
AP-1
NFATp
5’
..WRGAAAA.. ..TGASTCA..3’
8-12 bp
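As a simplified stand-in for the matrix-based method, the pairing of the two consensus sequences can be sketched with regular expressions. Everything here is an assumption for illustration: the function name `find_composite`, the use of exact IUPAC consensus matching instead of weight matrices, scanning one strand only, and reading "8-12 bp" as the distance between the start positions of the two sites.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_hits(seq, motif):
    """Start positions of all (possibly overlapping) matches of an IUPAC motif."""
    pattern = "(?=" + "".join(IUPAC[ch] for ch in motif.upper()) + ")"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def find_composite(seq, first="WRGAAAA", second="TGASTCA",
                   min_dist=8, max_dist=12):
    """Pairs (i, j) of site starts with min_dist <= j - i <= max_dist."""
    first_hits = iupac_hits(seq, first)
    second_hits = iupac_hits(seq, second)
    return [(i, j) for i in first_hits for j in second_hits
            if min_dist <= j - i <= max_dist]


# Hypothetical test sequence: an NFAT-like site followed by an AP-1-like site.
example = "CCTGGAAAACCCTGAGTCAGG"
print(find_composite(example))
```

A realistic scan would score both strands with the two weight matrices and combine the scores, since functional composite elements often contain a weak AP-1 half-site that an exact consensus match misses.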
NFAT matrix (nucleotide counts per position):
Pos    1   2   3   4   5   6   7   8
A      5  12   2   0  26  25  25  15
C      5   1   0   0   0   0   1   5
G      8   2  23  26   0   1   0   2
T      8  11   1   0   0   0   0   4
NFAT = -log(1-scoreNFAT)
AP-1 matrix (nucleotide counts per position):
Pos    1   2   3   4   5   6   7   8   9
A     19   4   4  36   3   0   2  47   2
C      3   2   2   4  13   0  44   0   8
G     16   5  33   2  29   0   0   0  24
T      9  36   8   5   2  47   1   0  13
AP-1 = -log(1-scoreAP-1)
6,7
5,7
4,7
3,7
NFAT/AP-1 (training)
Random
2,7
Composite score
 1.47 AP1  4.7

wCE  17,0   NFAT
 NFAT  0.88 AP1  3.5
1,7
0,7
0,7
1,2
1,7
2,2
2,7
3,2
3,7
4,2
4,7
Selection of motifs with high frequency
in a window
motif: WSG
TTTGGCGCGAAA
window: [
]
Promoters of cell-cycle genes:
.............
Exon 2 sequences:
.............
}
}
Frequency
of the motifs
in the window
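Counting how often a degenerate motif occurs inside a window can be sketched as follows. The window convention is an assumption made for this illustration (inclusive, 0-based start positions measured from the beginning of the sequence passed in), and the name `motif_frequency` is hypothetical; the slide's own coordinates are relative to the E2F site.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_frequency(seq, motif, window):
    """Fraction of start positions in the window at which the motif matches."""
    first, last = window
    pattern = re.compile("".join(IUPAC[ch] for ch in motif.upper()))
    seq = seq.upper()
    last = min(last, len(seq) - len(motif))   # motif must fit in the sequence
    starts = range(first, last + 1)
    if len(starts) == 0:
        return 0.0
    hits = sum(1 for pos in starts if pattern.match(seq, pos))
    return hits / len(starts)


# Motif WSG in the sequence shown on the slide, over all ten start positions.
print(motif_frequency("TTTGGCGCGAAA", "WSG", (0, 9)))
```

Computing this frequency separately for the promoter set and for the exon 2 set gives the two values whose ratio separates positive from negative context characteristics.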
Motifs found in the local context of E2F sites in
promoters of cell cycle-related genes

N    Motif   Window (w) 1)   f̂_Y / f̂_N                   2)
Positive characteristics
1    MGCG    [27,34]         0.0048 / 0.0041 = 1.179     0.80
2    TTT     [39,41]         0.0112 / 0.0032 = 3.536     0.75
3    CGSK    [17,38]         0.0851 / 0.0341 = 2.499     0.90
4    HKCG    [13,16]         0.0675 / 0.0095 = 7.071     0.79
5    VDWW    [17,46]         0.1233 / 0.0536 = 2.299     0.72
6    DWTT    [21,26]         0.0337 / 0.0000             0.80
7    GSDM    [3,69]          0.0980 / 0.0559 = 1.754     0.82
Negative characteristics
8    VWS     [7,66]          0.1258 / 0.1932 = 0.651     0.91
9    HSWY    [26,65]         0.0413 / 0.0813 = 0.508     0.79
10   VTV     [19,34]         0.0427 / 0.1354 = 0.315     0.71
11   BAY     [7,65]          0.0274 / 0.0614 = 0.447     0.78

Utility values, in the order given:
-0.394, 0.9618, 0.5353, 0.5904, 0.223, 0.5036, 0.595, -0.095, -0.2297, -0.261, -0.566, = -5.6767
Score of context:
k
d ( X )     i  f (i , wi , X )
i 0
Human uracil DNA-glycosylase (E2F sites)
[Two plots of predicted E2F sites along the gene, positions -1000 to 9000 relative to the transcription start (+1): matrix score alone, and matrix score + score of context.]
ttTTTGCCGCGAAAag q=0.92 d=2.8 (known site)
SITEVIDEO system
Building of E2F site recognition program (step 2)
SITEVIDEO system
Building of E2F site recognition program (step 3)
Composite modules
w
(1)
1
s
( 2)
1
s
(1)
cut off
s
( 2)
2
( 2)
cut off
(k )
(k )
1 ... nk
q
q
 (1)
 ( 2)
...
C  max 
w
(k )
 q (w)
k 1, K
K - number of TF matrices
(k )
avr
s
... s
...
Start of
transcription
(k )
cut off
q
 (k )
...
Parameters of
the model to be
estimated
(k )
q
(
s
q (w)   i )
(k )
avr
i 1, nk
(k )
q ( si( k ) )  qcut
off
(k )
si w
Composite modules
w
(1)
1
s
(1)
cut off
( 2)
1
s
s
( 2)
2
( 2)
cut off
(k )
(k )
1 ... nk
s
... s
q
q
...
 (1)
 ( 2)
...
Start of
transcription
(k )
cut off
q
 (k )
Genetic Algorithms
...
Parameters of
the model to be
estimated
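One way to read the composite module score is as a weighted sum, over the TF matrices, of the average score of the best sites above the cut-off inside a window, maximised over all window positions. The sketch below encodes that reading only. It is an assumption, since the formula on the slide is not fully legible, and the names `module_score`, `params` and `n_best` are hypothetical.

```python
def module_score(sites, params, seq_len, window):
    """Best composite score over all windows of the given width.

    sites  : list of (matrix_id, position, score) for predicted sites
    params : {matrix_id: (weight, cutoff, n_best)}, the model parameters
    """
    best = 0.0
    for start in range(0, max(1, seq_len - window + 1)):
        total = 0.0
        for matrix_id, (weight, cutoff, n_best) in params.items():
            # Scores of this matrix inside the window that pass the cut-off.
            scores = sorted(
                (s for m, pos, s in sites
                 if m == matrix_id and start <= pos < start + window
                 and s >= cutoff),
                reverse=True,
            )[:n_best]
            if scores:
                total += weight * sum(scores) / len(scores)
        best = max(best, total)
    return best


# Weights and cut-offs for two matrices as listed on the cell-cycle module
# slide; the site positions, site scores and n_best values are invented.
params = {
    "V$E2F_19": (1.000000, 0.840072, 1),
    "V$TATA_01": (0.954483, 0.737637, 1),
}
sites = [("V$E2F_19", 10, 0.90), ("V$TATA_01", 40, 0.95),
         ("V$TATA_01", 300, 0.99)]
print(module_score(sites, params, seq_len=400, window=100))
```

In a setup like this, the weights, cut-offs and site counts are the parameters a genetic algorithm would tune so that the score separates the promoter set from the background set.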
Composite module in promoters of
cell cycle-related genes
Weight      q cut-off   TF matrix
1.000000    0.840072    V$E2F_19
0.954483    0.737637    V$TATA_01
0.888064    0.939687    V$CREB_01
0.816179    0.941583    V$SP1_Q6
0.039746    0.839702    V$TAL1BETAE47_01
[Histogram: number of sequences (axis 0 to 40) versus composite module score C (axis -0.5 to 4.0), summed over k = 1..5 matrices, for exon-2 sequences and cell cycle-related promoters.]
1
<------------V$IK1_01(0.86)
-----...V$CREBP1CJUN_01(0.
2
<-----------V$IK2_01(0.90)
-----...V$CREB_01(0.96)
3
----------->V$AP2_Q6(0.87)
<-------------V$GKLF_01(0.87)
4-->V$ATF_01(0.89)
<-------V$MZF1_01(0.99)
----...V$ELK1_01(0.87)
5
<-----------V$AP2_Q6(0.92)
<------------ V$SP1_Q6(0.88)
6>V$AP1FJ_Q2(0.89)
<-------------V$GKLF_01(0.85)
7>V$AP1_Q2(0.87)
<-------------V$GKLF_01(0.86)
8->V$CREB_Q2(0.86)
<---------V$CETS1P54_01(0.90)
9->V$CREB_Q4(0.90)
<---------V$NRF2_01(0.90)
10
<-------------V$GC_01(0.88)
11
----------->V$CAAT_01(0.87)
12
<------------V$TCF11_01(0.87)
13
----------->V$AP2_Q6(0.87)
14
<---------V$USF_Q6(0.93)
16
--------...V$ATF_01(0.94)
17
-------...V$AP1FJ_Q2(0.95)
20
------- ...V$CREBP1_Q2(0.93
21
------- ...V$CREB_Q2(0.95)
23
---...V$IK2_01(0.85)
MMCFOS_1
GAGCGCCCGCAGAGGGCCTTGGGGCGCGCTTCCCCCCCCTTCCAGTTCCGCCCAGTGACG
420
Mouse c-fos promoter
1-->V$CREBP1CJUN_01(0.85)
-------------->V$BARBIE_01(0.86)
2-->V$CREB_01(0.96)
--------------> V$TATA_01(0.95)
3
----------->V$CAAT_01(0.91)
--------->V$AP4_Q5(0.95)
4----------->V$ELK1_01(0.87)
--------------------->V$HEN1_01(0.87)
5
--------->V$AP4_Q5(0.88)
<---...V$CMYB_01(0.93)
6
<---------V$CDPCR3HD_01(0.93)
--...V$VMYB_02(0.89)
7
<-------------- V$TATA_01(0.88)
8
--------------------->V$HEN1_02(0.87)
9
<---------------------V$HEN1_02(0.86)
10
<-----------------V$AP4_01(0.88)
11
----------->V$LMO2COM_01(0.93)
12
<-----------V$LMO2COM_01(0.93)
13
<-----------V$MYOD_01(0.88)
17--->V$AP1FJ_Q2(0.95)
<---------V$AP4_Q6(0.99)
20----> V$CREBP1_Q2(0.93)
<---------V$MYOD_Q6(0.96)
21----> V$CREB_Q2(0.95)
Transcription start
23-------->V$IK2_01(0.85)
24
<----------- E2F (0.80)
MMCFOS_1
TAGGAAGTCCATCCATTCACAGCGCTTCTATAAAGGCGCCAGCTGAGGCGCCTACTACTC
480
Cell cycle composite module
1
<-----------------V$CMYB_01(0.91)
-------...V$ER_Q6(0.86)
2
<-----------V$LMO2COM_01(0.90)
<----...V$TCF11_01(0.87)
3
--------->V$MYOD_Q6(0.90)
-------->V$STAT_01(0.93)
4
--------->V$VMYB_01(0.89)
<--------V$STAT_01(0.89)
5--------------V$CMYB_01(0.93)
-------->V$LMO2COM_02(0.93)
6------>V$VMYB_02(0.89)
<-----------V$CAAT_01(0.85)
7
-------->V$VMYB_02(0.88)
8
-------------->V$EVI1_04(0.86)
9
------------->V$GATA1_02(0.93)
12
<------------V$ZID_01(0.85)
13
<----------V$CP2_01(0.97)
14
---------->V$GATA_C(0.92)
15
----------------->V$CMYB_01(0.86)
16
--------->V$CREL_01(0.91)
24
<----------- E2F (0.82)
MMCFOS_1
CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA
540
MMCFOS_1
1----------->V$ER_Q6(0.86)
2--------V$TCF11_01(0.87)
3
--------->V$AP4_Q5(0.91)
4
--------->V$AP4_Q6(0.87)
5 ---------->V$AP1FJ_Q2(0.93)
6 ---------->V$AP1_Q2(0.90)
7 ---------->V$AP1_Q4(0.87)
8
<-----------V$IK2_01(0.94)
GCAGTGACCGCGCTCCCACCCAGCTCTGCTCTGCAGCTCC
580
Computationally predicted E2F target genes
confirmed by in vivo footprint
EMBL
Gene
Chromatin crosslinking
c-fos, Hs
HSFOS
JunB, Hs
HS207341
tgf-β1, Hs
HSTGFB1P
R
p14ARF, Hs
AF082338
Immunoprecipitation
Mcm4
(Cdc21), Hs
mcm5 (P1cdc46), Hs
PCR
Von Hippel-Lindau
(VHL), Hs
B-myb, Hs
HSU63630
HS286B10
AF010238
HSBMYBD
NA
nucleolin,
Hs
nucleolin,
Cg
nucleolin,
Ms
HSNUCLEO
CSNUCLEO
MMNUCLEO
Score
,q
(+) aaGCTCGCGCCACTgc
(-) gcAGTGGCGCGAGCtt
(-) gtCTTCGCGCGCGCtc
Position rel.
start of
transcription
-165 .. -176
-92 .. –103
-90 .. –79
-78 .. –89
79 .. 90
91 .. 80
169 .. 158
-513 .. -502
-298 .. -287
28 .. 39
40 .. 29
85 .. 96
-1384 .. -1395
-1009 .. -1020
-739 .. -750
-589 .. -578
-265 .. -276
-491 .. -502
-409 .. -420
-377 .. -366
-175 .. -164
-93 .. -82
-187 .. -176
-175 .. -186
8 .. 19
20 .. 9
-270 .. -259
-258 .. -269
-28 .. 39
(-) gtCCTGGCGCGCGGgc
(+) cgCTTGGCGGGAGAta
-72 .. –83
-53 .. -42
0.83
0.87
1.18
-296 ->
+14 <-
(-) ttTTTGGCGCCGGCtg
(-) ccGTGGGCGCGCGGgt
-297 .. -308
-256 .. -267
0.97
0.81
2.91
-407 ->
-41 <-
(-) cgTTTGGCGCGGCTtg
-296 .. -307
0.97
6.67
-538 ->
-198 <-
(-) agTTTGGCGCGGCTtg
-306 .. -317
0.97
1.76
-531 ->
-232 <-
Sequence of the potential
sites
(-)
(-)
(+)
(-)
gcCTTGGCGCGTGTcc
ggGGTGGCGCGCGGgc
ccTCTGGCGCCACCgt
acGGTGGCGCCAGAgg
(+) gcTATCGCGCCAGAga
(-) tcTCTGGCGCGATAgc
(-) ggGCTGGCGCGGGCgg
(+)
(+)
(+)
(-)
(+)
ctGTTTGCGGGGCGga
ccCTTCGCGCCCTGgg
ctCTTGGCGCGACGct
agCGTCGCGCCAAGag
ccTTTGCCGCCGGGga
(-)
(-)
(-)
(+)
(-)
ctCTCCGCGCGCGGga
gtCTTGGCGACCGTtg
ggCCTGGCGCCGGAct
tgATTGGCGGATAGag
acTTTCCCGCCCTGtg
(-)
(-)
(+)
(+)
(+)
gtTTTCGCGGGAAAac
ctTTCAGCGCCCGTgc
gcAGTGGCGCCTCCcg
ggCGTGGCGCGGAGcc
ctTGTCGCGCAGGTac
(+)
(-)
(+)
(-)
agTTTCGCGCCAAAtt
aaTTTGGCGCGAAAct
ttTTTCCCGCGAAAct
agTTTCGCGGGAAAaa
0.92
0.84
0.88
0.83
0.89
0.91
0.82
0.80
0.91
0.93
0.83
0.85
0.81
0.81
0.81
0.83
0.86
0.93
0.82
0.80
0.83
0.86
0.99
1.00
0.89
0.93
0.81
0.84
0.92
Score of
context,
d
2.92
Positions
of PCR
primers
-201 ->
+96 <-
-27 ->
+313 <-
3.17
2.03
-122 ->
+210 <-
4.11
-404 ->
-143 <-
3.53
-667 ->
-330 <-
4.39
4.91
-211 ->
+88 <-
3.01
4.21
-137 ->
+123 <-
2.22
G1
G1/S
S
G2
G1
G1/S
S
G1/S-growth
G1/S-cycle
G2
Results of selecting a specific combination of sites that distinguishes
G1/S cycle from G1/S growth promoters (microarray data).
a)
Relative importance   Cut-off value q_cutoff^(k)   Matrix AC   Matrix ID
 0.141420             0.923077                     M10009      V$E2F_19
 0.389941             0.947434                     M00175      V$AP4_Q5
 0.905325             0.838106                     M00088      V$IK3_01
-0.595259             0.856055                     M00098      V$PAX2_01
-0.982593             0.997639                     M00253      V$CAP_01
-0.814943             0.734697                     M00137      V$OCT1_03
E2F and a set of additional factors can distinguish these two
sets of promoters. AP-4 is a ubiquitous factor whose DNA-binding
domain is similar in structure to those of E2F and Myc, the main
cell cycle regulators. IK3 belongs to a family of zinc finger TFs
(Ik-1 ... Ik-5) that play a role in the development of
lymphocytes. Pax-2 is known to be involved in regulating the cell
cycle by inhibiting p53 transcription. Oct-3 is known to be
differentially phosphorylated during the cell cycle and may have a
role in the regulation of the G1/S growth promoters. As for the Cap
site, it has already been speculated that the structure of the basal
promoter may play an important role in differential gene
expression during the cell cycle.
b)
Histogram of G1/S cycle vs. G1/S growth
[Histogram: number of observations (axis 0 to 5) versus site combination score (axis -1.8 to 1.6).]
...
Jun
Fos
TGASTCA
AP-1
NFAT
human TNF promoter
-107
AP-1
mast cells
-74
NFAT
T-cells
NF-kB
dendritic cells
VDR
AP-1
C/EBP
T-cells + ?
Fuzzy puzzle hypothesis of the multipurpose structure of
eukaryotic promoters: multiple regulatory messages are encoded in the
same DNA sequence. A, B, C and D, E, F: two sets of TFs; 1, 2: two
sites in DNA; BC: basal complex.
A
B
C
1
2
D
E
F
There's More Than One Way To Do It
(Convergent evolution)
AXX list of genes
RefSeq     LocusLink  Symbol    Synonyms                        Description
NM_002421  4312       MMP1      CLG, CN2                        matrix metalloproteinase 1 (interstitial collagenase)
NM_004530  4313       MMP2      CLG4, CLG4A                     matrix metalloproteinase 2 (gelatinase A, 72kD gelatinase, 72kD type IV collagenase)
NM_000611  966        CD59      MSK21, MIC11, MIN2, MIN1, MIN3  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
NM_001972  1991       ELA2      -                               elastase 2, neutrophil
NM_005317  3004       GZMM      LMET1, MET1                     granzyme M (lymphocyte met-ase 1)
NM_005532  3429       IFI27     P27                             interferon, alpha-inducible protein 27
NM_001548  3434       IFIT1     GARG-16, IFNAI1, G10P1, IFI56   interferon-induced protein with tetratricopeptide repeats 1
NM_000565  3570       IL6R      -                               interleukin 6 receptor
NM_001565  3627       SCYB10    -                               chemokine (C-X-C motif) ligand 10
NM_001572  3665       IRF7      IRF-7A                          interferon regulatory factor 7
NM_005564  3934       LCN2      NGAL                            lipocalin 2 (oncogene 24p3)
NM_005567  3959       LGALS3BP  90K, MAC-2-BP                   lectin, galactoside-binding, soluble, 3 binding protein
NM_002422  4314       MMP3      STMY, STMY1                     matrix metalloproteinase 3 (stromelysin 1, progelatinase)
NM_002423  4316       MMP7      MPSL1, PUMP-1                   matrix metalloproteinase 7 (matrilysin, uterine)
NM_004994  4318       MMP9      CLG4B                           matrix metalloproteinase 9 (gelatinase B, 92kD gelatinase, 92kD type IV collagenase)
NM_004995  4323       MMP14     MT1-MMP                         matrix metalloproteinase 14 (membrane-inserted)
NM_002428  4324       MMP15     MT2-MMP                         matrix metalloproteinase 15 (membrane-inserted)
NM_002534  4938       OAS1      IFI-4, OIASI, OIAS              2',5'-oligoadenylate synthetase 1 (40-46 kD)
NM_002787  5683       PSMA2     -                               proteasome (prosome, macropain) subunit, alpha type, 2
NM_004586  6197       RPS6KA3   -                               ribosomal protein S6 kinase, 90kD, polypeptide 3
NM_007315  6772       STAT1     STAT91                          signal transducer and activator of transcription 1, 91kD
NM_003254  7076       TIMP1     CLGI, EPO, TIMP                 tissue inhibitor of metalloproteinase 1 (erythroid potentiating activity, collagenase inhibitor)
NM_003255  7077       TIMP2     -                               tissue inhibitor of metalloproteinase 2
NM_000362  7078       TIMP3     SFD                             tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory)
NM_003684  8569       MKNK1     MNK1                            MAP kinase-interacting serine/threonine kinase 1
NM_006417  10561      IFI44     p44, MTAP44                     interferon-induced protein 44
Extract
promoters
using
TRANSGENOME
AXX promoter
set
>ELA2 elastase 2, neutrophil; chrom=19p13.3; LocusLink=1991; 15-AUG-2002;length=1200
ggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactga
ggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccg
tatcacagggccctgggtaaactgaggcaggtgacacagctgcatgtggccggtatcacggggccctggataaacagagg
caggcgacacagctgcatgtggccggtatcacggggccctgggtaaactgaggcaggcgaggccacccccatcaagtccc
tcaggtctaggtttggcaggtttggcaaaaacacagcaacgctcggttaaatctgaatttcgggtaagtatatcctgggc
ctcatttggaagagacttagattaaaaaaaaaacgtcgagaccagcccggccaacacggtgaaaccccgtctctactaaa
aatacaaaaaattagccaggcgcagtggctcacgcctgtgatcccagcactctgggaggctgaggcaggcggatcacccg
aggtcagatgttcaagaccagcctggccgacagggcgaaacactgtctctactacaaatacaaaaattagccgggagtgg
tggcaggtgcctgtaatctcagctattcaggaggctgaggcaggagaatcacttgaacctgggaggcggaggttgccgtg
agccgggatcacgccaccgcactccagcctgggcgatagagcaagactctgtctccaaaaaaataaattaaaaaacccac
attgattatctgacatttgaatgcgattgtgcatcctgaattttgtctggaggccccacccgagccaatccagcgtcttg
tcccccttctcccccttttcatcaacgccctgtgccaggggagaggaagtggagggcgctggccggccgtggggcaatgc
aacggcctcccagcacagggctataagaggagccgggcgggcacggaggggcagagaccccggagccccagccccaccat
gaccctcggccgccgactcgcgtgtcttttcctcgcctgtgtcctgccggccttgctgctggggggtgagtttttgagtc
caacctcccgctgctccctctgtcccgggttctgttcccacctctccatagagggccccaccagtgtgggtccctcatcc
>MMP3 matrix metalloproteinase 3 (stromelysin 1, progelatinase); chrom=11q22.3; LocusLink=4314; 15-AUG-2002;length=1200
aaagttttacaaaatgtcttcctctgaatatgtttagagtcttgcattcaagcatttattatacaccaataatgtgagca
acactttacttgacaaagaaacagaaaagaaaggaaaggaagaaaacagaagagcatgaagagaaaatttaggatggatt
ctgttcttcaacttcaaagcatctgctaatttgaatttagggaggaggggaaaaggttgaaagagaataagacatgtgta
gaagacaaggacagagagaatttcagtccggtaagcaatgtaattcatttcagttctacaactatttatggagcagctac
gtgggcccatcacccattaataaattggttacagaattaaaaccaacccaaagggaatatacttccttctttttcacaga
ccctctttgttctattctgcccatgaggttttcctcctcaagaaccagcaaatccaacgacagtcaatagcaggcattac
aaatcagattcagaaaaataaatcaccccttctaaatttcttctagatattatcttttatgttttgagtataattgtata
tagtatagactatagctatgtatgtacactttccacttacatcttttatttgcttttataatgtctttcttaaaataaaa
ctgcttttagaagttctgcacaattctgatttttaccaagtcaacctacttcttctctcaaaaggacaaacataaattgt
ctagtgaattccagtcaatttttccagaagaaaaaaaatgctccagttttctcctctaccaagacaggaagcacttcctg
gagattaatcactgtgttgccttgcaaaattgggaaggttgagagaaattagtaaagtaggttgtatcatcctactttga
atttggaatgtttggaaatggtcctgctgccatttggatgaaagcaaggatgagtcaagctgcgggtgatccaaacaaac
actgtcactctttaaaagctgcgctcccgaggttggacctacaaggaggcaggcaagacagcaaggcatagagacaacat
agagctaagtaaagccagtggaaatgaagagtcttccaatcctactgttgctgtgcgtggcagtttgctcagcctatcca
ttggatggagctgcaaggggtgaggacaccagcatgaaccttgttcaggtaattaacactaactgacctggccaggtggg
>IL6R interleukin 6 receptor; chrom=1; LocusLink=3570; 15-AUG-2002;length=1200
ttctctccttcctttccttccttcccctctatccctccttccctccctccctccctcctcccttccttttctttctttct
tttctttttttttttttctttccagacagggtctcactgtcatccaggctggagtagcagcccccaatcacggctcactg
taccctggatctcccggactcaagcaattttcccacctcagcttccctagtagctgggactataggtgtgtaccaccaca
cccagctaatttttaaatttttttatagaaatgggggtctcactttgttacacaggctggtctagaattcctggactgaa
gcaatccacccacccggctctcccaaagtgttggggttacaggcgtgagccactgcccctggtgttagtgtctgtctgtc
aagtcaggagggcagccatgaacgttctgatgtctactgagcacgtgtggcccagaccgtgtgtcaggtgtttaggtgcc
atccacagaaccttcctaataaccctgggcagcataggctttcttatctctgacagatgaggaaatggagactcagattc
tgaaccgaagtcacagacacagtagatggtaggtctaaatggggacccaggtctatctgactgcaaagtccaaaccgttt
ccttgcctctgctgcagcctgcgaggagcagctgggcagaaagactgtgcctttacggtggtgagtcttccgatgcccaa
gcctcaccccagaccgatgaaatcagaatctctggagacccgacccagacattggtgggttttagggctcctggctgatt
Composite module found in the AXX promoters
Importance  Core cut-off  Matr. cut-off  AC      Matrix
----------------------------------------------------------
0.917751    0.877000      0.930000       M00062  V$IRF1_01
0.323077    1.000000      0.948000       M00339  V$ETS1_B
0.640828    0.989000      0.982000       M00199  V$AP1_C
0.276923    0.840000      0.853000       M00037  V$NFE2_01
1.000000    0.756000      0.760000       M00481  V$AR_01
0.159172    0.869000      0.866000       M00699  V$ICSBP_Q6
[Figure: Histogram (tt1.STA 2v *188c); x-axis: VAR1, bins <= 0.423, (0.423; 0.847], (0.847; 1.27], (1.27; 1.694], (1.694; 2.117], > 2.117; y-axis: percent of obs, 0% to 100%; fitted curve: y = 13 * 0.42348 * normal(x; 1.503956; 0.895746)]
Interferon regulatory factor 1
Ets factors
AP-1
NF-E2, an erythroid-specific factor
Androgen receptor
Interferon Consensus Sequence binding protein
Sites in the AXX promoter set: Yes
Row  V$IRF1_01  V$ETS1_B  V$AP1_C   V$NFE2_01  V$AR_01   V$ICSBP_Q6  Score
0    -          0.951000  -         1.742000   -         -           0.78964
1    -          1.941000  0.984000  0.876000   -         -           1.50025
2    -          -         -         -          0.772000  -           0.77200
3    -          -         -         -          1.681000  -           1.68100
4    -          0.964000  -         0.856000   0.764000  1.764000    1.59327
5    -          -         1.000000  0.880000   1.644000  -           2.52852
6    -          -         0.984000  -          -         -           0.63057
7    1.860000   -         -         -          -         0.939000    1.85648
8    -          -         1.987000  1.850000   0.812000  -           2.59763
9    -          -         -         0.868000   1.548000  -           1.78836
10   -          -         0.985000  0.862000   1.575000  -           2.44492
11   -          -         -         -          0.780000  -           0.78000
12   -          -         1.966000  0.853000   -         -           1.49608
13   -          -         -         -          -         -           0.00000
14   -          1.921000  -         -          1.715000  -           2.33563
15   -          -         -         -          0.802000  -           0.80200
16   -          0.975000  -         -          1.766000  -           2.08100
17   1.866000   -         -         -          -         1.852000    2.00731
18   -          -         -         -          1.569000  1.892000    1.87015
19   -          -         -         -          0.760000  -           0.76000
20   1.886000   -         -         -          0.810000  -           2.54087
21   -          -         -         -          0.765000  -           0.76500
22   -          0.948000  -         0.873000   -         -           0.54803
23   1.892000   -         -         -          -         0.885000    1.87725
24   -          -         -         -          -         -           0.00000
Sites in the other human promoters: Not
Rows 0 to 15: no sites for V$IRF1_01, V$ETS1_B, V$AP1_C, V$NFE2_01, V$AR_01 or V$ICSBP_Q6; score 0.00000 in every row.
ELA2 elastase 2, neutrophil
MMP3 matrix metalloproteinase 3
IL6R interleukin 6 receptor
MMP2 matrix metalloproteinase 2
OAS1 2',5'-oligoadenylate synthetase 1
MMP1 matrix metalloproteinase 1
TIMP1 tissue inhibitor of metalloproteinas
STAT1 signal transducer and activator of t
MMP9 matrix metalloproteinase 9
MMP15 matrix metalloproteinase 15
MMP7 matrix metalloproteinase 7
MMP14 matrix metalloproteinase 14
CD59 CD59 antigen p18-20
LCN2 lipocalin 2 (oncogene 24p3)
GZMM granzyme M (lymphocyte met-ase 1)
IFI27 interferon, alpha-inducible protein
TIMP3 tissue inhibitor of metalloproteinas
IFIT1 interferon-induced protein with tetr
IFI44 interferon-induced protein 44
MKNK1 MAP kinase-interacting serine/threon
IRF7 interferon regulatory factor 7
TIMP2 tissue inhibitor of metalloproteinas
LGALS3BP lectin, galactoside-binding, solu
SCYB10
PSMA2
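The per-promoter scores on this slide are consistent with an importance-weighted sum of the per-matrix site scores: 0.984 × 0.640828 ≈ 0.63057, and 0.951 × 0.323077 + 1.742 × 0.276923 ≈ 0.78964. A minimal Python sketch under that assumption follows. The function name `module_score` is hypothetical, and the attribution of these site scores to V$AP1_C, V$ETS1_B and V$NFE2_01 is inferred from the arithmetic, not stated on the slide.

```python
# Importance weights copied from the composite module table of the AXX promoters.
IMPORTANCE = {
    "V$IRF1_01": 0.917751,
    "V$ETS1_B": 0.323077,
    "V$AP1_C": 0.640828,
    "V$NFE2_01": 0.276923,
    "V$AR_01": 1.000000,
    "V$ICSBP_Q6": 0.159172,
}


def module_score(site_scores):
    """Importance-weighted sum of per-matrix site scores (assumed form)."""
    return sum(IMPORTANCE[matrix] * score for matrix, score in site_scores.items())


# Two promoters from the AXX set; column attribution is an inference.
row0 = module_score({"V$ETS1_B": 0.951, "V$NFE2_01": 1.742})
row6 = module_score({"V$AP1_C": 0.984})
print(round(row0, 5), round(row6, 5))
```

Both values agree with the scores listed on the slide to within rounding. A promoter without any site above the cut-offs gets a score of 0, as in the set of other human promoters.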
InsR
Insulin pathway
?
Signaling network analysis
Insulin
Part of the insulin signaling network in TRANSPATH
InsR
STAT1
Ras
AhR targets
Gene expression: log(Experiment/Control)
Composite model correlates with the expression level
S41
distance = 0.417599 D2:0.658627 SIG:0.000000
MIN_LENGTH 300
0.000000
 3.581248 1.000000 0.933000 M00026 V$AHR_Q5
 2.942371 1.000000 0.917000 M00639 V$HNF6_Q6
 0.798865 0.844000 0.900000 M00220 V$SREBP1_01
 0.409376 0.962000 0.926000 M00173 V$AP1_Q2
 0.055716 0.959000 0.989000 M00726 V$USF2_Q6
-1.329975 1.000000 0.959000 M00235 V$AHRARNT_01
-0.713625 1.000000 0.918000 M00156 V$RORA1_01
-0.668375 0.903000 0.854000 M00201 V$CEBP_C
[Promoter diagram: region from -1000 to +1000 around the TSS; sites labeled V$AHR_Q5 and V$AHRARNT_01]
[Scatter plot: real expression vs. predicted expression, log(Experiment/Control); axis ticks from -4 to 10]
Composite module found in promoters of differentially expressed genes in the liver of growth hormone-deficient mice (Sma1).
0.1040 * V$CETS1P54_02(0.949) -50- V$TCF4_Q5(0.908)
0.0751 * V$TCF1P_Q6(0.726) -50- V$STAT6_01(0.861)
0.0728 * V$SF1_Q6(0.684) -50- V$SMAD3_Q6(0.833)
0.0419 * V$ELK1_02(0.862) -50- V$GRE_C(0.842)
0.0983 * V$TCF11MAFG_01(0.821)
0.0471 * V$FOXO4_01(0.961)
0.0301 * V$IPF1_Q4(0.852)
0.0410 * V$AR_01(0.851)
0.0766 * V$GR_Q6(0.971)
0.0482 * V$STAT1_02(0.995)
0.0508 * V$CEBPB_01(0.98)
0.0281 * V$STAT5A_02(0.826)
[Histogram: y-axis: No of obs (two scales, 0 to 450 and 0 to 40); x-axis from -0.1 to 0.5; labels: Non-changed genes, differentially expressed genes, Sma1, Norm]
Results of the ArrayAnalyzer™ search upstream of the TFs: growth hormone (GH) and receptor tyrosine kinases (RTK) are identified as potential key molecules involved in the differential expression of genes in the liver of growth hormone-deficient mice (Sma1).
4
TRANSPATH and tools,
ArrayAnalyzer and PathwayBuilder
At the next step, one can map the transcription factors found at the previous step onto the signaling network of TRANSPATH. If the factors found are parts of the same cascades that were suggested at step 1, then the probability increases that those factors are responsible for the coordinated gene regulation.
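One way to picture the mapping step is as a search in a directed graph: starting from each transcription factor, walk the signaling edges backwards and keep the molecules that lie upstream of all of them. The sketch below is only an illustration in Python. The edge list is a toy network with placeholder names, not TRANSPATH content, and the functions `upstream` and `common_upstream` are hypothetical, not part of ArrayAnalyzer.

```python
from collections import deque

# Toy signaling edges (source -> target); illustrative only, not TRANSPATH data.
EDGES = [
    ("receptor", "adaptor"),
    ("adaptor", "kinase_A"),
    ("adaptor", "kinase_B"),
    ("kinase_A", "TF1"),
    ("kinase_B", "TF2"),
    ("other_receptor", "TF3"),
]


def upstream(node, edges, max_steps=None):
    """Return all molecules from which `node` can be reached (breadth-first)."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    seen = {}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        if max_steps is not None and dist >= max_steps:
            continue  # do not expand beyond the step limit
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen[parent] = dist + 1
                queue.append((parent, dist + 1))
    return set(seen)


def common_upstream(tfs, edges):
    """Molecules that lie upstream of every transcription factor in `tfs`."""
    sets = [upstream(tf, edges) for tf in tfs]
    return set.intersection(*sets) if sets else set()


shared = common_upstream(["TF1", "TF2"], EDGES)
print(sorted(shared))
```

In the toy network, TF1 and TF2 share the adaptor and the receptor as upstream molecules, while TF1 and TF3 share none. Factors that converge on the same cascade are the ones the slide treats as more likely to explain coordinated regulation.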
Feedback loops in activating immune cells through NF-AT/AP-1
[Diagram labels: cytokines, chemokines; membrane receptors; adaptor proteins; PI3K; Ras, Raf; Calcineurin, Ca2+ binding proteins; ERK, JNK, MAPK; NF-ATs; Jun, Fos; NF-AT/Jun:Fos]
Groups that are statistically enriched in potential target genes for Jun:Fos and NFATs (as shown in the table above).
Other groups that contain potential target genes for Jun:Fos and NFATs.
Network controlling S phase entry in response to a proliferative signal
[Network diagram; edges are marked "+" or "_"; one node is labeled "?". Node labels: c-ras, htf9a, Ras, RanBP1, Ran, Raf, erk-1, Erk-1, MEK, JNK, c-myc, c-Myc, B-myb, B-Myb, c-ets, c-Ets, c-jun, c-Jun, c-fos, c-Fos, cdc2, cdk2, cdk4, cycD1, cycD3, cycE, rb1, pRB (with "p" mark), e2f-1, E2F-1, DP-1]
Enzymes of nucleotide metabolism: dhfr, tk, cad
ada, odc, ts
Factors and enzymes of replication
DNA pol , cdc6, ori1
cdc21, cdc46, p1 co-factor
Histones: H1, H2B-143, H3-143
Nucleolins
S-phase entry
TFBS identification via pattern search
Phylogenetic footprint of promoter regions of nucleolin genes
HSNUCLEO - Homo sapiens; CSNUCLEO - Cricetulus griseus; MMNUCLEO - Mus musculus; RNNUCIA1 - Rattus norvegicus
Matrix matches marked on the alignment: V$CREB_02 (0.85), V$CREB_01 (0.82), V$TH1E47_01 (0.85), V$DELTAEF1_01 (0.81 to 0.87), V$NKX25_02 (0.82 to 0.84), V$CETS1P54_01 (0.84 to 0.87), V$E2F_02 (1.00)
=============================================================================
MMNUCLEO TCTCCCCAC-CACACCAGGAAGTCACCTCTCTCA----------ACCTG---GAGTTATA 225
RNNUCIA1 TCTCCCACCACACACCAGGAAGTCACCTCTCTGA----------ACCTG---GAGTTATA 221
CSNUCLEO CCTCC-AGCACACACCAGGAAGTCACCTCTCCGAGACCGTCCCCATCAG---GAGTTAAA 229
HSNUCLEO TGGCCCTGT-GAGGCCAGAAAGTTACTTCTCCGAGGCCAGTTCCCCATGTCTGAGAAATA 229
=============================================================================
MMNUCLEO CCTACCG-CGAGAGGTCACCGACATTACATGGATCGCTTGTGCACTGCTCGTA--CACAC 282
RNNUCIA1 CCTACCG-CGTGAGGTCA--GAGATTAAATGGACTGTTTGTGCACTGCTCACA--CACAC 276
CSNUCLEO TCTACCG-CGCGAGGTTG--GACATTAAGCGAGCTGTTTGAGCACTGCACACAGGCGCGC 286
HSNUCLEO TCTCCCAACTTGAGGTTCT-GTGGGGTAGGGGAGGGTTCGTGACTTTCTCACAGAAAACC 288
=============================================================================
MMNUCLEO ACACACGCAC------------AACTGCTTTTATTAGGAGCT----CTCAGGAAAGCGGG 326
RNNUCIA1 ACACACGCGCGCGCGCGCGCGAAATTGCTTTTATTAGGAGCT----CTCAGGAAAGTGGT 332
CSNUCLEO ACACACGCACGC----------AACTGCCTTTATTGGGAGCTGTCTCTCAGGAGAACAGC 336
HSNUCLEO TCGTACAGACCC-------CGCCACTGCCTTTATTAACAGCT----CTCAGGAGACTGCC 337
=============================================================================
MMNUCLEO GACTCGCATCA---TAGCCAAG----AAGCCGTTCGCGAC-TCCGCGGAGAACAGGCCGA 378
RNNUCIA1 GGCTCGCATCAGGCTACCACAGCC--AAGAGGACCGCCACCTCTACCGAGGGCAGGCCAA 390
CSNUCLEO GGCCCGCGGCGCAACACTAGAGCCCCGGGATGTTCTCGGC-TCTGCCGAGGGCAG-CCGA 394
HSNUCLEO TGCAGGAGGGGGGTCGCTCCGGCC---CCATGCTCGCGGG-CAAGCAGGGATAAG--CTG 391
=============================================================================
MMNUCLEO GGCCCGCTCATCAGCCCGAGGGAACCCTAGG--CC------TTCCGGCGTTCT------ 423
RNNUCIA1 GGCCCACTAAACGGCCCGAATGAACTCTAGG--CC------TTCCGGCGCTCT------ 435
CSNUCLEO GGCC-GCGAGCTGGCCCCAGTGG-CTCTAGG--CCCTCAACTTCCGGCGCTCTCCGGCTC 450
HSNUCLEO TGCCTCCAAAAGGGCCAACGGGAACTCCGCGGTCCCTGAACTTCCGGTGCTGGAGG---A 448
=============================================================================
MMNUCLEO -TCAGCAGGACCACGCGGCG--------------------------------------- 442
RNNUCIA1 -CCAGCTCTTCAGCGCGGCGAACGTTCTAGGCCCCTGAGAAGTCCACCGGGAGGCGCAGG 494
CSNUCLEO CTCAGCGGGAACGCGCGGCGAGCAGTTGAGGCCGCCGCGGATTCCAACGGGTTGGGGACG 510
HSNUCLEO CTCCTCGCTCCAGGGCCACCAGGAGCCGCGGC---------------------GTGAGTG 487
=============================================================================
MMNUCLEO --------------GGGGGAAA-----GCACCGAGAAACGCCCAGACCACCTGAGCATCG 483
RNNUCIA1 TTTCCGCTACGCGAGGGGGAAA-----TCCCCGAGAAATGCCCAGACCACCTAAGCACAG 549
CSNUCLEO TTCGC----AGCGCGGGGGATGCTCGGGCCACCCACCACCCCCCCACCCCCCCGGCCACG 566
HSNUCLEO CGTGCCGGAACCGAGGGCGGGG-----TCTCTGAGGAACTCCAAGGCTGCCCAAGCCTAC 542
=============================================================================
MMNUCLEO CCGCCC--------ATGCTGCCTCGGAACACCTGAGGGAATCCGGGCCACGCCGCCACCT 535
RNNUCIA1 ACGTCC--------ATGCGGCGTACGGATACCTGAGGGAATCCGGGCCATACCGCCACCT 601
CSNUCLEO AGGCCCGGAGCTCCAGGTAGCAGTGCAGCACTAGGCGGCGTCCGGGCCACGCCGCCCAAT 626
HSNUCLEO GGACCC---------AGCCACATTGGCGAACC----GGAGACCGCCCGATTCCACCACC 588
=============================================================================
MMNUCLEO ACCCGCG--CCTCACACACAAGCCGCGCCAAACTCGCCCGTCCCACTGCGCAGGCGTGGG 593
RNNUCIA1 ACTCGCG--CCTCACTC--AAGCCGCGCCAAACTCGCGCGTTTCACTGCGCAGGCGTGTA 657
CSNUCLEO TCCCCCGAGCCCCTTCCACAAGCCGCGCCAAACGGGTCTG---CACCGCGCAGGCG--GC 681
HSNUCLEO -CCCGCGCTCCCCTCAC--AGCCGGCGCCAAAAACGCCAGTCCCACGACGCAGGC---- 640
=============================================================================
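The asterisk rows under the alignment mark columns that are identical in all four species, and runs of such columns are the candidate footprints. A minimal Python sketch of that marking follows. The 12-column window is copied from the alignment block that ends at positions 593/657/681/640, with window boundaries chosen here for illustration. The function names and the minimum run length of 6 are assumptions, not the settings used for the slide.

```python
def conservation_line(aligned):
    """Mark alignment columns where all sequences carry the same nucleotide."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        first = column[0]
        conserved = first != "-" and all(c == first for c in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


def conserved_runs(marks, min_len=6):
    """Return (start, end) index pairs of '*' runs at least min_len long."""
    runs, start = [], None
    for i, mark in enumerate(marks + " "):  # trailing blank closes a final run
        if mark == "*" and start is None:
            start = i
        elif mark != "*" and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


WINDOW = [
    "AGCCGCGCCAAA",  # MMNUCLEO
    "AGCCGCGCCAAA",  # RNNUCIA1
    "AGCCGCGCCAAA",  # CSNUCLEO
    "GCCGGCGCCAAA",  # HSNUCLEO
]
marks = conservation_line(WINDOW)
print(repr(marks), conserved_runs(marks))
```

The conserved run covers the shared GCGCCAAA stretch of this window. Gap columns are never marked as conserved, so an insertion in one species interrupts a run.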
[Figure: three panels, 1) to 3), each with nucleotide rows A, T, G, C]
Result of the comparison of four different pattern discovery programs on sets of simulated sequences with implanted TF binding sites for one matrix. y-axis: the averaged sum of squared differences between the revealed matrix and the original one; x-axis: values that are the probabilities of the “consensus nucleotide” in each position of the matrix.
[Plot: series Kernel, MEME, CONSENSUS, GIBBS; y-axis from 0.000 to 1.000; x-axis from 0.65 to 0.95]
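The error measure on the y-axis can be sketched in Python as follows. This is one reading of "averaged sum of squared differences": squared differences are summed over the four nucleotides and averaged over the matrix positions. The averaging could equally run over simulation replicates, which the slide does not specify. The helper `make_matrix`, the equal split of the remaining probability among the three non-consensus nucleotides, and the use of the AP-1 consensus TGACTCA are assumptions for illustration.

```python
NUCLEOTIDES = "ACGT"


def make_matrix(consensus, p):
    """Frequency matrix with the consensus nucleotide at probability p.

    The remaining 1 - p is split equally among the other three
    nucleotides (an assumption made for this sketch).
    """
    rest = (1.0 - p) / 3.0
    return [{n: (p if n == c else rest) for n in NUCLEOTIDES} for c in consensus]


def avg_squared_difference(found, original):
    """Sum of squared frequency differences, averaged over positions."""
    if len(found) != len(original):
        raise ValueError("matrices must have the same number of positions")
    total = sum(
        (f[n] - o[n]) ** 2
        for f, o in zip(found, original)
        for n in NUCLEOTIDES
    )
    return total / len(original)


original = make_matrix("TGACTCA", 0.8)   # implanted matrix
revealed = make_matrix("TGACTCA", 0.7)   # stand-in for a program's output
error = avg_squared_difference(revealed, original)
print(round(error, 6))
```

A program that recovers the implanted matrix exactly scores 0 under this measure, and larger values mean the revealed matrix is further from the original.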
Table 1. Comparison of the 3 programs performing best at low values of the consensus-nucleotide probability.
Program        0.65   0.7
Kernel         0.205  0.165
MULTIPROFILER  0.208  0.255
PROJECTION     0.260  0.304
Three mechanisms of biopolymer evolution
Gradual evolution by fixation of multiple substitutions (Protein functional centres)
Edited biopolymer by fixation of a small number of substitutions (Protein folding)
Evolution at once by fixation of single substitutions (Regulatory regions of eukaryotic genes)
Thank you!
www.biobase.de