Transcript Document
From patterns to pathways. Causal analysis of gene expression data
Alexander Kel
BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbuettel, Germany
[email protected]
www.biobase.de

[Figure labels: Pathway builder; Array analyser; TRANSPATH (mechanistic, semantic); S/MARt DB; Patho DB; TRANSFAC; Match; Patch; Catch; CMFinder; TRANSCompel; Cytomer; TRANSGenome; TRANSPLORER]

The TRANSFAC® System comprises 7 databases (TRANSFAC® Professional Suite):
TRANSFAC® Professional: Transcription factor database
TRANSCompel® Professional: Composite elements database
PathoDB® Professional: Pathologically altered transcription factors
TRANSPRO™ Professional: Collection of human promoter sequences
S/MARt DB™ Professional: Scaffold or Matrix Attached Regions databases
Cytomer®: Ontology of cells, structures, organs
TRANSPATH® Professional: Signal transduction pathways

TRANSFAC® Professional: Transcription factor database
[Figure labels: trans, cis, …]

Human genes: sequences and positions of AP-1 binding sites
[Figure labels, in extraction order: glutathione P-transferase enhancer at -2500; hemoglobin, epsilon; TGACTTT; -80 bp; TGACATC; Akt-2; IFN-; -100 bp; TGTCACC; -89 bp; Apo AII; TGACTCA; -792 bp; TGAGTCA; Melanotransferrin; -2013 bp; Collagenase; TGAGTCA; -72 bp; proto-oncogene c-myc; porphobilinogen deaminase; TGATTTA; -335 bp; TGACTCA; -162 bp; GM-CSF; TGACTCA; enhancer at -3500]

Structure of regulatory regions of eukaryotic genes
[Figure: GM-CSF, Homo sapiens. Labels: AP-1; CBF; NF-κB (c-Rel/p65, p50/p65); NFAT; HMG Y(I); CE; TATTT; T-cell specific inducible enhancer at –3500 bp; CD28 response element; positions -114, -88, -54; Promoter; ST +1]

Protein-DNA and protein-protein interactions in gene transcriptional regulation.
[Figure labels: S1, S2, S3; TF 1, TF 2, TF 3; TFIID, TFIIF, TFIIA, TFIIB, TFIIE, TFIIH; RNA pol II; Histone acetylase]
Transcription factors: sequence-specific DNA binding; non-DNA binding
[Figure labels: HAT; Layer III; Co-activator; Layer II; Layer I; DNA; adapter; TF1, TF2, TF3, TF4]

TRANSFAC: relational scheme
[Figure labels: CLASS; SPECIES; FEATURES; interacting factor; SYNONYMS; FACTOR; MATRIX; CELL; gene; METHOD; expression; SITE; regulatory region; SEQUENCE; FUNCTIONAL ELEMENT; GENE; coding region]
Manual annotation of the databases: input client
TRANSFAC: GENE table
TRANSFAC: SITE table
Structure of transcription factors: USF-1, dimer
Structure of transcription factors: oligomerization domain; ligand-binding domain; activation domain; protein-protein interaction domain; DNA binding domain
TRANSFAC: FACTOR table, protein sequence
TRANSFAC: FACTOR table, protein domains
TRANSFAC: FACTOR table, structural and functional features
TRANSFAC: FACTOR table, links to other databases
TRANSFAC: classification of transcription factors
TRANSFAC: CLASS table
TRANSFAC 8.1 (2004-03-31): number of factor entries for different species
[Bar chart, 0 to 1400 entries; categories: human, mouse, rat, other vertebrates, fruit fly, plants, fungi, other]
TRANSFAC 8.1 (2004-03-31): distribution of experimentally known TFBS in 5' regions of genes.
[Histogram, 0 to 800 sites; positions from -10000 to 5000]
TRANSFAC: FACTOR table, protein-DNA and protein-protein interactions
TRANSFAC: MATRIX table

TRANSCompel® Professional: Composite elements database
Mouse Interleukin-2 gene promoter
[Figure: COMPEL:C00050; NF-ATp and AP-1 sites; positions -96 and -79; AP-1 consensus TGAGTCA; ST]
....... tgccacacaggtagactcttTTGAAAATAtgTGTAATAtgtaaaa catcgtgaca cccccatatt… …

Composite elements
Minimal functional units where both protein-DNA and protein-protein interactions contribute to a highly specific pattern of gene expression and provide cross-coupling of different signal transduction pathways.
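A consensus string such as the AP-1 consensus above can be matched directly against a sequence. The following is a minimal sketch, not part of the talk: the function names are hypothetical, and it uses the degenerate form TGASTCA (S = C or G) that appears on later slides. Both strands are scanned, because a site can lie in either orientation.

```python
import re

# IUPAC nucleotide codes needed for the consensus strings on the slides.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "S": "[CG]", "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_consensus(seq, consensus):
    """Return sorted (position, strand, matched_text) for every match.

    Positions are 0-based and always refer to the forward strand.
    A lookahead is used so that overlapping matches are reported too.
    """
    seq = seq.upper()
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    n, k = len(seq), len(consensus)
    for m in pattern.finditer(reverse_complement(seq)):
        hits.append((n - m.start() - k, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_consensus("ccTGAGTCAgg", "TGASTCA"))
    print(find_consensus("TGACATC", "TGASTCA"))
```

Several of the AP-1 sites listed on the earlier slide (for example TGACATC or TGATTTA) deviate from the consensus and would be missed by such a search, which is one motivation for the weight-matrix methods described later in the talk.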
[Figure: F1 alone or F2 alone: low level of transcription; F1 and F2 together: synergistic activation of transcription]

Combinatorial regulation by the composite elements
[Table: N; Gene; Scheme of CE. Genes: IgH **, Mus musculus; IL-2, Homo sapiens; Il-2, Mus musculus; Serum amyloid A1, Rattus norv.; IRF-1, Mus musculus; IgH **, Homo sapiens; IL-2, Homo sapiens. Factors in the CE schemes: Ets, NFAT, NF-κB, AP-1, Oct-2, CBF, C/EBP, STAT-1. Positions: -283, -268, -167, -142, -123, -117, -113, -73, -49, -40]

Ternary complex NFATp - AP1 - DNA
[Screenshot: flat files. Callouts: description of an evidence (experiment, cell type, two individual interactions); link to the TRANSFAC GENE table; link to EMBL; link to the TRANSFAC FACTOR table]

Cross-coupling of signal transduction pathways
[Figure labels: Membrane receptor; Ca2+ dependent channel; Src; Ras; SH2; SH3; Phosphorylation; Ca2+; GTP; GDP; PLC; Adaptors; PI3-K; cytoplasm; IP3; Calcineurin; PKB/Akt; P; NFATp; ERK; JNK; Nucleus; c-Fos; P38MAPK; IL-2; c-Jun; Composite element; ATF-2]

Inducible/inducible
19 CEs ETS / AP-1, providing cross-coupling of Ras/Raf- and PKC-dependent signalling pathways;
15 CEs NFATp / AP-1, providing cross-coupling of Ca2+- and PKC-dependent signalling pathways;
14 CEs NF-κB / C/EBP: NF-κB is inducible by IL-1 and TNF-α; C/EBP is inducible by IL-6.
[Matrix of CE counts by functional class of F1 and F2. Classes: tissue-specific; inducible; cell-cycle dependent; developmental stage-dependent; ubiquitous constitutive. Extracted counts: 32, 44, 119, 1, 2, 39, 2, 3, 60, 2, 12]

Inducible/constitutive
9 CEs ETS / Sp1: ETS factors are inducible through the Ras/Raf-dependent signalling pathway;
5 CEs Smad / TEF3: Smads are inducible by TGF-β signalling.
[Matrix of CE counts by functional class of F1 and F2. Classes: tissue-specific; inducible; cell-cycle dependent; developmental stage-dependent; ubiquitous constitutive. Extracted counts: 32, 44, 119, 1, 2, 39, 2, 3, 60, 2, 12]
Inducible/tissue-restricted
CEs Pit-1 / AP-1: Pit-1 is a pituitary-restricted transcription factor, whereas AP-1 and Ets are ubiquitous inducible factors.
[Matrix of CE counts by functional class of F1 and F2. Classes: tissue-specific; inducible; cell-cycle dependent; developmental stage-dependent; ubiquitous constitutive. Extracted counts: 32, 44, 119, 1, 2, 39, 2, 3, 60, 2, 12]

Mechanisms of functioning of synergistic composite elements
(F1, F2: transcription factors; S1, S2: their sites in DNA)
1) Cooperative binding to DNA and ternary complex formation
2) A new protein surface for DNA recognition could be formed
3) Simultaneous interaction of activation domains with the components of the basal complex
4) Forming a new protein surface for interaction with the basal complex
5) Relief of autoinhibition as a result of protein-protein interactions
6) DNA bending by one of the transcription factors
7) DNA wrapping around a nucleosome allows transcription factors to interact
8) Recruitment of a HAT complex by one of the transcription factors

Mechanisms of functioning of antagonistic composite elements
[Figure labels: HAT complex; HDAC complex]
1) Mutually exclusive binding of factor F1 (activator) and F2 (repressor)
2) Binding of F2 (repressor) results in conformational changes of F1 (activator)

TRANSPATH® Professional: Database on signal transduction pathways
TRANSPATH: map of IFN pathway
[Figure labels: TRANSPATH®; TRANSFAC®]
TRANSPATH: molecules
Extracellular ligand; Membrane receptor; Adaptor; Second messenger; Kinase(s); Transcription factor; Target gene
TRANSPATH: molecule hierarchy
[Figure. Levels: family; complexes; ortholog; basic; isoform; modified form. Entries: IL-1/Toll receptor family; TLRs; TLR4; TLR5; TLR4(h):MyD88(h); TLR4(h); TLR4(h)p; TLR4(m); TLR5(h); TLR4a(h); TLR4b(m)]
TRANSPATH: reactions
[Figure labels: Enzyme; Educts; Products]
•Binding •Phosphorylation •Dephosphorylation •Degradation •Acetylation •Dissociation •Transregulation •Expression •Activation •...

The elementary reaction step
[Figure labels: C, R, A, B]
Reaction R, catalyzed by catalyst C, converts substance A into substance B.

Pathway steps depict the signaling in a more biochemical way.
[Figure, pathway steps R1 to R5: TGFβ1; TGFβR-II; T:TR2p; NTP; NDP; TGFβR-I; T:TR2p:TR1p; Smad2; Smad2p; Smad4; S2P:S4; gene tc]
In a semantic reaction, just individual key molecules are given.
Semantic: TGFβ1, TGFβ-RII, TGFβ-RI, Smad2, Smad4, gene (connected by reactions R1 to R5)

Info about a specific molecule
Many synonyms make sure that you find your protein
External database links allow easy identification of proteins
Parts of a molecule entry
Specific molecule (cont.)
Disease information and GO terminology; localization of human APP
Opens data entry of a specific reaction
Parts of a molecule entry
Specific reaction of APP(h)
Evaluation of this reaction is based on experimental evidence
Part of a reaction entry

Signal transduction pathways
Extracellular ligand; Membrane receptor; Adaptor; Second messenger; Kinase(s); Transcription factor; Target gene
Connecting path between two molecules
Connection between one specific molecule (magenta) and a group of molecules (transcription factors in blue)
Oncostatin M pathway
B-cell antigen receptor pathway
PDGF pathway
Insulin pathway
Overview of a pathway – hand-drawn map

TRANSPATH: number of entries
[Bar chart, 0 to 12000 entries; series: molecules, reactions, references; Release Professional 2.1, 2.4, 3.1]

Statistics: TRANSPATH® 5.1 and NetPro 1.1
                                  TRANSPATH® 5.1    + NetPro
Main tables
– Molecule                        18029             + 7333
– Reaction                        20199             + 30316
– Reference                       8258              + 9582
Molecules of mammalian origin
– Human                           2503              3521
– Mouse                           1653              2025
– Rat                             810               1224

Prediction: 26 588 predicted human gene products, of which 30.8% (~9000) seem to be signal transduction relevant (Venter et al., 2001) => 28% coverage of predicted
proteins in TRANSPATH®

TRANSFAC® System
From patterns to pathways

Array analysis
The starting point: a set of induced genes from microarray experiments

Array analysis (KEGG)
The conventional analysis: deduce the gene products and map them to the network of metabolic pathways (biochemical effects)

Array analysis (TRANSPATH)
Extension of conventional analysis: map the induced gene products to the network of regulatory pathways (biological effects)

Array analysis (KEGG, TRANSPATH; identification of new targets)
Reasoning of experimental findings: promoter analysis of induced genes connected to network mapping

Array analysis
Promoter analysis identifies additional target genes and extends the affected network
[Figure labels: promoter model; TRANSGENOME database; additional predicted genes; extended predicted network]

Array analysis
[Workflow figure. Start: microarray, set of induced genes. Effects: assignment of gene products; metabolic network mapping (KEGG); regulatory network mapping (TRANSPATH); modeling of effects; indirect hints on causes. Causes: retrieval of upstream sequences (TRANSGENOME); promoter analysis (TRANSFAC); network analysis (TRANSPATH); new target]

[Figure labels: trans, cis, …, ?]

Position   A    C    G    T    Consensus
1          9    8    4    8    N
2          2    3    2    22   T
3          1    1    2    25   T
4          0    1    2    26   T
5          1    13   15   0    S
6          0    3    26   0    G
7          0    29   0    0    C
8          0    0    29   0    G
9          0    22   7    0    C
10         1    8    17   3    S
11         15   9    3    2    M
12         13   4    9    3    R
13         7    8    8    6    N
14         13   1    7    8    D

\[ q = \frac{\sum_{i=1}^{l} I(i)\, f(b_i, i) - \min}{\max - \min}, \qquad \min = \sum_{i=1}^{l} I(i)\, f^{\min}(i), \qquad \max = \sum_{i=1}^{l} I(i)\, f^{\max}(i) \qquad (1) \]

\[ I(i) = \sum_{b \in \{A,T,G,C\}} f(b, i)\, \ln(4 f(b, i)) \qquad (2) \]

TRANSPLORER (TRANScription exPLORER) is a software package for the analysis of transcription regulatory sequences. Currently, the TRANSPLORER site prediction tool uses collections of position weight matrices (PWMs). It can use several matrix sources: the largest and most up-to-date library of matrices, derived from the TRANSFAC® Professional database; other matrix libraries; and any user-developed matrix libraries. It can therefore search for a great variety of transcription factor binding sites.
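The matrix score given in formulas (1) and (2) can be illustrated with a short sketch. This is not TRANSPLORER code: the function names are hypothetical, and the counts are the 14-position matrix shown on the slide (columns A, C, G, T; consensus NTTTSGCGCSMRND). It assumes that f(b, i) is the relative frequency of base b at position i and that f^min(i) and f^max(i) are the lowest and highest frequencies at that position.

```python
import math

BASES = "ACGT"

# Count matrix from the slide: one row per position, columns A, C, G, T.
E2F_COUNTS = [
    (9, 8, 4, 8), (2, 3, 2, 22), (1, 1, 2, 25), (0, 1, 2, 26),
    (1, 13, 15, 0), (0, 3, 26, 0), (0, 29, 0, 0), (0, 0, 29, 0),
    (0, 22, 7, 0), (1, 8, 17, 3), (15, 9, 3, 2), (13, 4, 9, 3),
    (7, 8, 8, 6), (13, 1, 7, 8),
]


def frequencies(counts):
    """Convert each row of counts into relative frequencies f(b, i)."""
    return [[c / sum(row) for c in row] for row in counts]


def information_vector(freqs):
    """Formula (2): I(i) = sum over b of f(b,i) * ln(4 f(b,i)).

    Bases with zero frequency contribute nothing to the sum.
    """
    return [sum(f * math.log(4 * f) for f in row if f > 0) for row in freqs]


def matrix_score(site, counts):
    """Formula (1): q = (current - min) / (max - min), between 0 and 1."""
    if len(site) != len(counts):
        raise ValueError("site and matrix must have the same length")
    freqs = frequencies(counts)
    info = information_vector(freqs)
    current = sum(i_val * row[BASES.index(b)]
                  for i_val, row, b in zip(info, freqs, site.upper()))
    lowest = sum(i_val * min(row) for i_val, row in zip(info, freqs))
    highest = sum(i_val * max(row) for i_val, row in zip(info, freqs))
    return (current - lowest) / (highest - lowest)


def scan(sequence, counts, cutoff):
    """Slide the matrix along one strand; keep (start, q) with q >= cutoff."""
    width = len(counts)
    hits = []
    for start in range(len(sequence) - width + 1):
        q = matrix_score(sequence[start:start + width], counts)
        if q >= cutoff:
            hits.append((start, round(q, 3)))
    return hits
```

Weighting each position by I(i) means that mismatches at strongly conserved positions (here the GCGC core) lower q far more than mismatches at nearly uniform positions such as position 1. The sketch accepts only the letters A, C, G and T and scans a single strand; a real search would also score the reverse complement.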
A search can be made using all or subsets of matrices from the libraries. Search for most probable binding sites regulating gene expression Search for binding sites coinsiding with SNPs Mouse c-fos promoter (Matrix search for TF binding sites) 1 <------------V$IK1_01(0.86) -----...V$CREBP1CJUN_01(0.85) 2 <-----------V$IK2_01(0.90) -----...V$CREB_01(0.96) 3 ----------->V$AP2_Q6(0.87) <-------------V$GKLF_01(0.87) 4-->V$ATF_01(0.89) <-------V$MZF1_01(0.99) ----...V$ELK1_01(0.87) 5 <-----------V$AP2_Q6(0.92) <------------V$SP1_Q6(0.88) 6>V$AP1FJ_Q2(0.89) <-------------V$GKLF_01(0.85) 7>V$AP1_Q2(0.87) <-------------V$GKLF_01(0.86) 8->V$CREB_Q2(0.86) <---------V$CETS1P54_01(0.90) 9->V$CREB_Q4(0.90) <---------V$NRF2_01(0.90) 10 <-------------V$GC_01(0.88) 11 ----------->V$CAAT_01(0.87) 12 <------------V$TCF11_01(0.87) 13 ----------->V$AP2_Q6(0.87) 14 <---------V$USF_Q6(0.93) 16 --------...V$ATF_01(0.94) 17 -------...V$AP1FJ_Q2(0.95) 20 -------...V$CREBP1_Q2(0.93) 21 -------...V$CREB_Q2(0.95) 23 ---...V$IK2_01(0.85) MMCFOS_1 GAGCGCCCGCAGAGGGCCTTGGGGCGCGCTTCCCCCCCCTTCCAGTTCCGCCCAGTGACG 420 1-->V$CREBP1CJUN_01(0.85) -------------->V$BARBIE_01(0.86) 2-->V$CREB_01(0.96) -------------->V$TATA_01(0.95) 3 ----------->V$CAAT_01(0.91) --------->V$AP4_Q5(0.95) 4----------->V$ELK1_01(0.87) --------------------->V$HEN1_01(0.87) 5 --------->V$AP4_Q5(0.88) <---...V$CMYB_01(0.93) 6 <---------V$CDPCR3HD_01(0.93) --...V$VMYB_02(0.89) 7 <--------------V$TATA_01(0.88) 8 --------------------->V$HEN1_02(0.87) 9 <---------------------V$HEN1_02(0.86) 10 <-----------------V$AP4_01(0.88) 11 ----------->V$LMO2COM_01(0.93) 12 <-----------V$LMO2COM_01(0.93) 13 <-----------V$MYOD_01(0.88) 17--->V$AP1FJ_Q2(0.95) <---------V$AP4_Q6(0.99) 20---->V$CREBP1_Q2(0.93) <---------V$MYOD_Q6(0.96) 21---->V$CREB_Q2(0.95) Transcription start 23-------->V$IK2_01(0.85) 24 <=========== E2F (0.80) MMCFOS_1 TAGGAAGTCCATCCATTCACAGCGCTTCTATAAAGGCGCCAGCTGAGGCGCCTACTACTC 480 1 <-----------------V$CMYB_01(0.91) 
-------...V$ER_Q6(0.86) 2 <-----------V$LMO2COM_01(0.90) <----...V$TCF11_01(0.87) 3 --------->V$MYOD_Q6(0.90) -------->V$STAT_01(0.93) 4 --------->V$VMYB_01(0.89) <--------V$STAT_01(0.89) 5--------------V$CMYB_01(0.93) -------->V$LMO2COM_02(0.93) 6------>V$VMYB_02(0.89) <-----------V$CAAT_01(0.85) 7 -------->V$VMYB_02(0.88) 8 -------------->V$EVI1_04(0.86) 9 ------------->V$GATA1_02(0.93) 12 <------------V$ZID_01(0.85) 13 <----------V$CP2_01(0.97) 14 ---------->V$GATA_C(0.92) 15 ----------------->V$CMYB_01(0.86) 16 --------->V$CREL_01(0.91) 24 <=========== E2F (0.82) MMCFOS_1 CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA 540 Exon 2 sequence of human thyroid transcription factor-1 (TTF-1) gene (HS198161) (Matrix search for TF binding sites) 1------------V$AHRARNT_01(0.90) <-----------------V$NF1_Q6(0.85) 2--------V$NMYC_01(0.89) --------->V$AP4_Q5(0.91) 3------>V$USF_Q6(0.89) --------->V$AP4_Q6(0.85) 4------V$USF_C(0.86) ------------...V$YY1_02(0.86) 5 --------->V$AP4_Q5(0.91) 6 --------->V$AP4_Q6(0.86) 7 --------->V$AP4_Q5(0.92) 8 --------->V$AP4_Q6(0.86) 9 --------->V$AP4_Q5(0.86) HS198161_1 ACGCGCAGCAGCAGGCGCAGCACCAGGCGCAGGCCGCGCAGGCGGCGGCAGCGGCCATCT 540 1 ----------------->V$NF1_Q6(0.96) 2 <-----------------V$NF1_Q6(0.90) 3 --------->V$USF_Q6(0.87) 4------->V$YY1_02(0.86) ---------->V$CP2_01(0.88) 5 --------->V$AP4_Q5(0.92) ----------->V$CAAT_01(0.85) 6 --------->V$AP4_Q6(0.85) --------->V$AP4_Q5(0.86) 7 ------...V$CP2_01(0.86) 8 ===========> E2F (0.81) 9 ===========> E2F (0.90) HS198161_1 CCGTGGGCAGCGGTGGCGCCGGCCTTGGCGCACACCCGGGCCACCAGCCAGGCAGCGCAG 600 1 <---------V$CETS1P54_01(0.89) <--------...V$GATA_C(0.86) 2 ----------------->V$NF1_Q6(0.85) <-------...V$GATA1_02(0.90) 3 --------->V$CETS1P54_01(0.90) <-------...V$GATA1_03(0.92) 4 <--------------------V$R_01(0.88) <-----...V$LMO2COM_02(0.90) 5 <---------------V$AHRARNT_01(0.86) 6 ----------->V$AP2_Q6(0.95) 7---->V$CP2_01(0.86) <-------...V$GATA1_04(0.87) 8 <----...V$CETS1P54_01(0.87) 9 
===========> E2F (0.80) HS198161_1 GCCAGTCTCCGGACCTGGCGCACCACGCCGCCAGCCCCGCGGCGCTGCAGGGCCAGGTAT 660 1--V$GATA_C(0.86) <---------V$CETS1P54_01(0.89) 2------V$GATA1_02(0.90) --------...V$DELTAEF1_01(0.96) 3------V$GATA1_03(0.92) <---...V$CEBPB_01(0.88) 4---V$LMO2COM_02(0.90) 5 <-----------V$IK2_01(0.92) 6 <---------------V$E47_02(0.87) 7-----V$GATA1_04(0.87) 8-----V$CETS1P54_01(0.87) 9 <--------------V$E47_01(0.86) 10 ---------->V$DELTAEF1_01(0.99) 11 <-----------V$LMO2COM_01(0.94) 12 <-----------V$MYOD_01(0.87) 13 --------->V$MYOD_Q6(0.91) 14 ------->V$USF_C(0.93) HS198161_1 CCAGCCTGTCCCACCTGAACTCCTCGGGCTCGGACTACGGCACCATGTCCTGCTCCACCT 720 Enhanceosome Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined Wbinding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery. Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66 Recognition method for T-cell specific Composite Elements NFAT/AP-1 AP-1 NFATp 5’ ..WRGAAAA.. 
..TGASTCA..3’ 8-12 bp
[NFAT count matrix; bases A, C, G, T; positions 1 to 8; extracted values: 5 5 8 8 12 1 2 11 2 0 26 0 0 0 23 26 0 1 0 0 25 0 1 0 25 1 0 0 15 5 2 4]
\[ \mathrm{NFAT} = -\log(1 - \mathrm{score}_{\mathrm{NFAT}}) \]
[AP-1 count matrix; bases A, C, G, T; positions 1 to 9; extracted values: 19 3 16 9 4 2 5 36 4 36 3 2 4 13 33 2 29 8 5 2 0 0 0 47 2 44 0 1 47 0 0 0 2 8 24 13]
\[ \mathrm{AP\text{-}1} = -\log(1 - \mathrm{score}_{\mathrm{AP\text{-}1}}) \]
[Scatter plot of the transformed NFAT and AP-1 scores, axes from 0.7 to 6.7 and 0.7 to 4.7; series: NFAT/AP-1 (training), Random. Extracted annotation: Composite score 1.47 AP1 4.7 wCE 17,0 NFAT; NFAT 0.88; AP1 3.5]

Selection of motifs with high frequency in a window
[Figure: motif WSG; site TTTGGCGCGAAA; window [ ]; promoters of cell-cycle genes versus exon 2 sequences; frequency of the motifs in the window]

Motifs found in the local context of E2F sites in promoters of cell cycle-related genes

N    Motif (μ)   Window (w) 1)   f_Y / f_N                  Utility 2)   α_i
Positive characteristics
1    MGCG        [27,34]         0.0048 / 0.0041 = 1.179    0.80         -0.394
2    TTT         [39,41]         0.0112 / 0.0032 = 3.536    0.75         0.9618
3    CGSK        [17,38]         0.0851 / 0.0341 = 2.499    0.90         0.5353
4    HKCG        [13,16]         0.0675 / 0.0095 = 7.071    0.79         0.5904
5    VDWW        [17,46]         0.1233 / 0.0536 = 2.299    0.72         0.223
6    DWTT        [21,26]         0.0337 / 0.0000            0.80         0.5036
7    GSDM        [3,69]          0.0980 / 0.0559 = 1.754    0.82         0.595
Negative characteristics
8    VWS         [7,66]          0.1258 / 0.1932 = 0.651    0.91         -0.095
9    HSWY        [26,65]         0.0413 / 0.0813 = 0.508    0.79         -0.2297
10   VTV         [19,34]         0.0427 / 0.1354 = 0.315    0.71         -0.261
11   BAY         [7,65]          0.0274 / 0.0614 = 0.447    0.78         -0.566
α_0 = -5.6767

Score of context:
\[ d(X) = \sum_{i=0}^{k} \alpha_i \, f(\mu_i, w_i, X) \]

Human uracil DNA-glycosylase (E2F sites)
[Two plots along the gene, positions -1000 to 9000 relative to +1: E2F sites, and E2F sites with score of context. Known site: ttTTTGCCGCGAAAag, q=0.92, d=2.8]

SITEVIDEO system: Building of E2F site recognition program (step 2)
SITEVIDEO system: Building of E2F site recognition program (step 3)

Composite modules
\[ C = \max_{w} \sum_{k=1}^{K} \varphi^{(k)} \, q_{avr}^{(k)}(w) \]
K - number of TF matrixes
[Diagram: window w upstream of the start of transcription containing sites s_1^{(1)}; s_1^{(2)}, s_2^{(2)}; ...; s_1^{(k)} ... s_{n_k}^{(k)}, with cut-offs q_{cutoff}^{(1)}, q_{cutoff}^{(2)}, ..., q_{cutoff}^{(k)} and weights \varphi^{(1)}, \varphi^{(2)}, ..., \varphi^{(k)}]
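The composite-module score can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the talk: the function names and the input format are hypothetical, q_avr is assumed to be the mean score of the (at most n_k) best sites of matrix k that lie inside the window and exceed that matrix's cut-off, and a site's position is taken to be its start coordinate. Fitting the weights, cut-offs and window size (done with a genetic algorithm in the talk) is not shown.

```python
def window_score(sites, model, start, window):
    """Weighted sum over matrices of the mean score of their best sites.

    sites: list of (matrix_id, position, score) tuples
    model: {matrix_id: (weight, cutoff, n_best)}  (hypothetical format)
    """
    total = 0.0
    for matrix_id, (weight, cutoff, n_best) in model.items():
        # Keep sites of this matrix that fall in the window and pass the cut-off.
        scores = sorted(
            (q for m, pos, q in sites
             if m == matrix_id and start <= pos < start + window and q > cutoff),
            reverse=True,
        )[:n_best]
        if scores:
            total += weight * sum(scores) / len(scores)
    return total


def composite_module_score(sites, model, window, seq_len):
    """Maximise the window score over all window positions in the sequence."""
    last_start = max(0, seq_len - window)
    return max(window_score(sites, model, s, window)
               for s in range(last_start + 1))


if __name__ == "__main__":
    # Toy numbers, not taken from the slides.
    model = {"A": (1.0, 0.8, 1), "B": (0.5, 0.7, 2)}
    sites = [("A", 10, 0.9), ("B", 15, 0.6), ("B", 20, 0.8),
             ("B", 30, 1.0), ("A", 500, 0.95)]
    print(composite_module_score(sites, model, window=100, seq_len=600))
```

Every window start is tried rather than only windows anchored at a site, because matrices with negative weights can make a window that leaves out some sites score higher.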
Parameters of the model to be estimated (k ) q ( s q (w) i ) (k ) avr i 1, nk (k ) q ( si( k ) ) qcut off (k ) si w Composite modules w (1) 1 s (1) cut off ( 2) 1 s s ( 2) 2 ( 2) cut off (k ) (k ) 1 ... nk s ... s q q ... (1) ( 2) ... Start of transcription (k ) cut off q (k ) Genetic Algorithms ... Parameters of the model to be estimated Composite module in promoters of cell cycle-related genes Weight: qcutoff TF matrix 1.000000 0.840072 V$E2F_19 0.954483 0.737637 V$TATA_01 0.888064 0.939687 V$CREB_01 0.816179 0.941583 V$SP1_Q6 0.039746 0.839702 V$TAL1BETAE47_01 4 0 Exon-2 sequences Cell cycle-related promoters Noofsequences 3 0 2 0 C 1 0 (k ) (k ) q cut off k 1,5 0 -0 ,5 0 ,0 0 ,5 1 ,0 1 ,5 2 ,0 2 ,5 3 ,0 3 ,5 4 ,0 1 <------------V$IK1_01(0.86) -----...V$CREBP1CJUN_01(0. 2 <-----------V$IK2_01(0.90) -----...V$CREB_01(0.96) 3 ----------->V$AP2_Q6(0.87) <-------------V$GKLF_01(0.87) 4-->V$ATF_01(0.89) <-------V$MZF1_01(0.99) ----...V$ELK1_01(0.87) 5 <-----------V$AP2_Q6(0.92) <------------ V$SP1_Q6(0.88) 6>V$AP1FJ_Q2(0.89) <-------------V$GKLF_01(0.85) 7>V$AP1_Q2(0.87) <-------------V$GKLF_01(0.86) 8->V$CREB_Q2(0.86) <---------V$CETS1P54_01(0.90) 9->V$CREB_Q4(0.90) <---------V$NRF2_01(0.90) 10 <-------------V$GC_01(0.88) 11 ----------->V$CAAT_01(0.87) 12 <------------V$TCF11_01(0.87) 13 ----------->V$AP2_Q6(0.87) 14 <---------V$USF_Q6(0.93) 16 --------...V$ATF_01(0.94) 17 -------...V$AP1FJ_Q2(0.95) 20 ------- ...V$CREBP1_Q2(0.93 21 ------- ...V$CREB_Q2(0.95) 23 ---...V$IK2_01(0.85) MMCFOS_1 GAGCGCCCGCAGAGGGCCTTGGGGCGCGCTTCCCCCCCCTTCCAGTTCCGCCCAGTGACG 420 Mouse c-fos promoter 1-->V$CREBP1CJUN_01(0.85) -------------->V$BARBIE_01(0.86) 2-->V$CREB_01(0.96) --------------> V$TATA_01(0.95) 3 ----------->V$CAAT_01(0.91) --------->V$AP4_Q5(0.95) 4----------->V$ELK1_01(0.87) --------------------->V$HEN1_01(0.87) 5 --------->V$AP4_Q5(0.88) <---...V$CMYB_01(0.93) 6 <---------V$CDPCR3HD_01(0.93) --...V$VMYB_02(0.89) 7 <-------------- V$TATA_01(0.88) 8 
--------------------->V$HEN1_02(0.87) 9 <---------------------V$HEN1_02(0.86) 10 <-----------------V$AP4_01(0.88) 11 ----------->V$LMO2COM_01(0.93) 12 <-----------V$LMO2COM_01(0.93) 13 <-----------V$MYOD_01(0.88) 17--->V$AP1FJ_Q2(0.95) <---------V$AP4_Q6(0.99) 20----> V$CREBP1_Q2(0.93) <---------V$MYOD_Q6(0.96) 21----> V$CREB_Q2(0.95) Transcription star 23-------->V$IK2_01(0.85) 24 <----------- E2F (0.80) MMCFOS_1 TAGGAAGTCCATCCATTCACAGCGCTTCTATAAAGGCGCCAGCTGAGGCGCCTACTACTC 480 Cell cycle composite module 1 <-----------------V$CMYB_01(0.91) -------...V$ER_Q6(0.86) 2 <-----------V$LMO2COM_01(0.90) <----...V$TCF11_01(0.87) 3 --------->V$MYOD_Q6(0.90) -------->V$STAT_01(0.93) 4 --------->V$VMYB_01(0.89) <--------V$STAT_01(0.89) 5--------------V$CMYB_01(0.93) -------->V$LMO2COM_02(0.93) 6------>V$VMYB_02(0.89) <-----------V$CAAT_01(0.85) 7 -------->V$VMYB_02(0.88) 8 -------------->V$EVI1_04(0.86) 9 ------------->V$GATA1_02(0.93) 12 <------------V$ZID_01(0.85) 13 <----------V$CP2_01(0.97) 14 ---------->V$GATA_C(0.92) 15 ----------------->V$CMYB_01(0.86) 16 --------->V$CREL_01(0.91) 24 <----------- E2F (0.82) MMCFOS_1 CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA 540 MMCFOS_1 1----------->V$ER_Q6(0.86) 2--------V$TCF11_01(0.87) 3 --------->V$AP4_Q5(0.91) 4 --------->V$AP4_Q6(0.87) 5 ---------->V$AP1FJ_Q2(0.93) 6 ---------->V$AP1_Q2(0.90) 7 ---------->V$AP1_Q4(0.87) 8 <-----------V$IK2_01(0.94) GCAGTGACCGCGCTCCCACCCAGCTCTGCTCTGCAGCTCC 580 Computationally predicted E2F target genes confirmed by in vivo footprint EMBL Gene Chromatin crosslinking c-fos, Hs HSFOS JunB, Hs HS207341 tgf-1, Hs HSTGFB1P R p14ARF, Hs AF082338 Immunoprecipitation Mcm4 (Cdc21), Hs mcm5 (P1cdc46), Hs PCR Von HippelLindau (VHL), Hs B-myb, Hs HSU63630 HS286B10 AF010238 HSBMYBD NA nucleolin, Hs nucleolin, Cg nucleolin, Ms HSNUCLEO CSNUCLEO MMNUCLE O Score ,q (+) aaGCTCGCGCCACTgc (-) gcAGTGGCGCGAGCtt (-) gtCTTCGCGCGCGCtc Position rel. start of transcription -165 .. -176 -92 .. –103 -90 .. 
–79 -78 .. –89 79 .. 90 91 .. 80 169 .. 158 -513 .. -502 -298 .. -287 28 .. 39 40 .. 29 85 .. 96 -1384 .. -1395 -1009 .. -1020 -739 .. -750 -589 .. -578 -265 .. -276 -491 .. -502 -409 .. -420 -377 .. -366 -175 .. -164 -93 .. -82 -187 .. -176 -175 .. -186 8 .. 19 20 .. 9 -270 .. -259 -258 .. -269 -28 .. 39 (-) gtCCTGGCGCGCGGgc (+) cgCTTGGCGGGAGAta -72 .. –83 -53 .. -42 0.83 0.87 1.18 -296 -> +14 <- (-) ttTTTGGCGCCGGCtg (-) ccGTGGGCGCGCGGgt -297 .. -308 -256 .. -267 0.97 0.81 2.91 -407 -> -41 <- (-) cgTTTGGCGCGGCTtg -296 .. -307 0.97 6.67 -538 -> -198 <- (-) agTTTGGCGCGGCTtg -306 .. -317 0.97 1.76 -531 -> -232 <- Sequence of the potential sites (-) (-) (+) (-) gcCTTGGCGCGTGTcc ggGGTGGCGCGCGGgc ccTCTGGCGCCACCgt acGGTGGCGCCAGAgg (+) gcTATCGCGCCAGAga (-) tcTCTGGCGCGATAgc (-) ggGCTGGCGCGGGCgg (+) (+) (+) (-) (+) ctGTTTGCGGGGCGga ccCTTCGCGCCCTGgg ctCTTGGCGCGACGct agCGTCGCGCCAAGag ccTTTGCCGCCGGGga (-) (-) (-) (+) (-) ctCTCCGCGCGCGGga gtCTTGGCGACCGTtg ggCCTGGCGCCGGAct tgATTGGCGGATAGag acTTTCCCGCCCTGtg (-) (-) (+) (+) (+) gtTTTCGCGGGAAAac ctTTCAGCGCCCGTgc gcAGTGGCGCCTCCcg ggCGTGGCGCGGAGcc ctTGTCGCGCAGGTac (+) (-) (+) (-) agTTTCGCGCCAAAtt aaTTTGGCGCGAAAct ttTTTCCCGCGAAAct agTTTCGCGGGAAAaa 0.92 0.84 0.88 0.83 0.89 0.91 0.82 0.80 0.91 0.93 0.83 0.85 0.81 0.81 0.81 0.83 0.86 0.93 0.82 0.80 0.83 0.86 0.99 1.00 0.89 0.93 0.81 0.84 0.92 Score of context, d 2.92 Positions of PCR primers -201 -> +96 <- -27 -> +313 <3.17 2.03 -122 -> +210 <- 4.11 -404 -> -143 <- 3.53 -667 -> -330 <- 4.39 4.91 -211 -> +88 <- 3.01 4.21 -137 -> +123 <2.22 G1 G1/S S G2 G1 G1/S S G1/S-growth G1/S-cycle G2 Results of selection of a specific combinations of sites that distinguish G1/S cycle and G1/S growth promoters. 
(microarray data)

a)
Relative importance   Cut-off value   Matrix AC   Matrix ID
0.141420              0.923077        M10009      V$E2F_19
0.389941              0.947434        M00175      V$AP4_Q5
0.905325              0.838106        M00088      V$IK3_01
-0.595259             0.856055        M00098      V$PAX2_01
-0.982593             0.997639        M00253      V$CAP_01
-0.814943             0.734697        M00137      V$OCT1_03

E2F and a set of additional factors can distinguish these two sets of promoters. AP-4 is a ubiquitous factor whose DNA-binding domain has a structure similar to those of E2F and Myc, the main cell cycle regulators. IK3 (Ik-1 ... Ik-5: a family of zinc finger TFs that play a role in the development of lymphocytes). Pax-2 is known to be involved in regulating the cell cycle by inhibiting p53 transcription. Oct-1 is known to be differentially phosphorylated during the cell cycle and may have a role in the regulation of the G1/S growth promoters. As for the Cap site, it has already been speculated that the structure of the basal promoter may play an important role in differentiating gene expression during the cell cycle.

b) Histogram of G1/S cycle vs. G1/S growth
[Histogram: number of observations (0 to 5) against site combination score (-1.8 to 1.6)]

[Figure: Jun and Fos on TGASTCA (AP-1); NFAT; human TNF promoter, positions -107 and -74. Labels: AP-1, mast cells; NFAT, T-cells; NF-kB, dendritic cells; VDR; AP-1; C/EBP; T-cells + ?]

Fuzzy puzzle hypothesis of the multipurpose structure of eukaryotic promoters: coding of multiple regulatory messages in the same DNA sequence. A, B, C and D, E, F: two sets of TFs; 1, 2: two sites in DNA; BC: basal complex.
A B C 1 2 D E F There‘s More Then One Way To Do It (Convergent evolution) AXX list of genes RefSeq LocusLink symbol synonyms NM_002421 4312 MMP1 CLG, CN2 matrix metalloproteinase 1 (interstitial collagenase) NM_004530 4313 MMP2 CLG4, CLG4A matrix metalloproteinase 2 (gelatinase A, 72kD gelatinase, 72kD type IV collagenase) NM_000611 NM_001972 NM_005317 NM_005532 966 1991 3004 3429 CD59 ELA2 GZMM IFI27 CD59 antigen p18-20 (antigen identified by monoclonal MSK21, MIC11, MIN2, MIN1, MIN3 antibodies 16.3A5, EJ16, EJ30, EL32 and G344) elastase 2, neutrophil LMET1, MET1 granzyme M (lymphocyte met-ase 1) P27 interferon, alpha-inducible protein 27 NM_001548 NM_000565 NM_001565 NM_001572 NM_005564 NM_005567 3434 3570 3627 3665 3934 3959 IFIT1 IL6R SCYB10 IRF7 LCN2 LGALS3BP GARG-16, IFNAI1, G10P1, IFI56 interferon-induced protein with tetratricopeptide repeats 1 interleukin 6 receptor chemokine (C-X-C motif) ligand 10 IRF-7A interferon regulatory factor 7 NGAL lipocalin 2 (oncogene 24p3) 90K, MAC-2-BP lectin, galactoside-binding, soluble, 3 binding protein NM_002422 NM_002423 4314 MMP3 4316 MMP7 STMY, STMY1 MPSL1, PUMP-1 NM_004994 NM_004995 NM_002428 NM_002534 4318 4323 4324 4938 CLG4B MT1-MMP MT2-MMP IFI-4, OIASI, OIAS NM_002787 NM_004586 NM_007315 5683 PSMA2 6197 RPS6KA3 6772 STAT1 NM_003254 NM_003255 7076 TIMP1 7077 TIMP2 NM_000362 NM_003684 NM_006417 7078 TIMP3 8569 MKNK1 10561 IFI44 MMP9 MMP14 MMP15 OAS1 STAT91 CLGI, EPO, TIMP SFD MNK1 p44, MTAP44 matrix metalloproteinase 3 (stromelysin 1, progelatinase) matrix metalloproteinase 7 (matrilysin, uterine) matrix metalloproteinase 9 (gelatinase B, 92kD gelatinase, 92kD type IV collagenase) matrix metalloproteinase 14 (membrane-inserted) matrix metalloproteinase 15 (membrane-inserted) 2',5'-oligoadenylate synthetase 1 (40-46 kD) proteasome (prosome, macropain) subunit, alpha type, 2 ribosomal protein S6 kinase, 90kD, polypeptide 3 signal transducer and activator of transcription 1, 91kD tissue inhibitor of metalloproteinase 
1 (erythroid potentiating activity, collagenase inhibitor) tissue inhibitor of metalloproteinase 2 tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory) MAP kinase-interacting serine/threonine kinase 1 interferon-induced protein 44 Extract promoters using TRANSGENOME AXX promoter set >ELA2 elastase 2, neutrophil; chrom=19p13.3; LocusLink=1991; 15-AUG-2002;length=1200 ggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactga ggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccg tatcacagggccctgggtaaactgaggcaggtgacacagctgcatgtggccggtatcacggggccctggataaacagagg caggcgacacagctgcatgtggccggtatcacggggccctgggtaaactgaggcaggcgaggccacccccatcaagtccc tcaggtctaggtttggcaggtttggcaaaaacacagcaacgctcggttaaatctgaatttcgggtaagtatatcctgggc ctcatttggaagagacttagattaaaaaaaaaacgtcgagaccagcccggccaacacggtgaaaccccgtctctactaaa aatacaaaaaattagccaggcgcagtggctcacgcctgtgatcccagcactctgggaggctgaggcaggcggatcacccg aggtcagatgttcaagaccagcctggccgacagggcgaaacactgtctctactacaaatacaaaaattagccgggagtgg tggcaggtgcctgtaatctcagctattcaggaggctgaggcaggagaatcacttgaacctgggaggcggaggttgccgtg agccgggatcacgccaccgcactccagcctgggcgatagagcaagactctgtctccaaaaaaataaattaaaaaacccac attgattatctgacatttgaatgcgattgtgcatcctgaattttgtctggaggccccacccgagccaatccagcgtcttg tcccccttctcccccttttcatcaacgccctgtgccaggggagaggaagtggagggcgctggccggccgtggggcaatgc aacggcctcccagcacagggctataagaggagccgggcgggcacggaggggcagagaccccggagccccagccccaccat gaccctcggccgccgactcgcgtgtcttttcctcgcctgtgtcctgccggccttgctgctggggggtgagtttttgagtc caacctcccgctgctccctctgtcccgggttctgttcccacctctccatagagggccccaccagtgtgggtccctcatcc >MMP3 matrix metalloproteinase 3 (stromelysin 1, progelatinase); chrom=11q22.3; LocusLink=4314; 15-AUG-2002;length=1200 aaagttttacaaaatgtcttcctctgaatatgtttagagtcttgcattcaagcatttattatacaccaataatgtgagca acactttacttgacaaagaaacagaaaagaaaggaaaggaagaaaacagaagagcatgaagagaaaatttaggatggatt ctgttcttcaacttcaaagcatctgctaatttgaatttagggaggaggggaaaaggttgaaagagaataagacatgtgta 
gaagacaaggacagagagaatttcagtccggtaagcaatgtaattcatttcagttctacaactatttatggagcagctac gtgggcccatcacccattaataaattggttacagaattaaaaccaacccaaagggaatatacttccttctttttcacaga ccctctttgttctattctgcccatgaggttttcctcctcaagaaccagcaaatccaacgacagtcaatagcaggcattac aaatcagattcagaaaaataaatcaccccttctaaatttcttctagatattatcttttatgttttgagtataattgtata tagtatagactatagctatgtatgtacactttccacttacatcttttatttgcttttataatgtctttcttaaaataaaa ctgcttttagaagttctgcacaattctgatttttaccaagtcaacctacttcttctctcaaaaggacaaacataaattgt ctagtgaattccagtcaatttttccagaagaaaaaaaatgctccagttttctcctctaccaagacaggaagcacttcctg gagattaatcactgtgttgccttgcaaaattgggaaggttgagagaaattagtaaagtaggttgtatcatcctactttga atttggaatgtttggaaatggtcctgctgccatttggatgaaagcaaggatgagtcaagctgcgggtgatccaaacaaac actgtcactctttaaaagctgcgctcccgaggttggacctacaaggaggcaggcaagacagcaaggcatagagacaacat agagctaagtaaagccagtggaaatgaagagtcttccaatcctactgttgctgtgcgtggcagtttgctcagcctatcca ttggatggagctgcaaggggtgaggacaccagcatgaaccttgttcaggtaattaacactaactgacctggccaggtggg >IL6R interleukin 6 receptor; chrom=1; LocusLink=3570; 15-AUG-2002;length=1200 ttctctccttcctttccttccttcccctctatccctccttccctccctccctccctcctcccttccttttctttctttct tttctttttttttttttctttccagacagggtctcactgtcatccaggctggagtagcagcccccaatcacggctcactg taccctggatctcccggactcaagcaattttcccacctcagcttccctagtagctgggactataggtgtgtaccaccaca cccagctaatttttaaatttttttatagaaatgggggtctcactttgttacacaggctggtctagaattcctggactgaa gcaatccacccacccggctctcccaaagtgttggggttacaggcgtgagccactgcccctggtgttagtgtctgtctgtc aagtcaggagggcagccatgaacgttctgatgtctactgagcacgtgtggcccagaccgtgtgtcaggtgtttaggtgcc atccacagaaccttcctaataaccctgggcagcataggctttcttatctctgacagatgaggaaatggagactcagattc tgaaccgaagtcacagacacagtagatggtaggtctaaatggggacccaggtctatctgactgcaaagtccaaaccgttt ccttgcctctgctgcagcctgcgaggagcagctgggcagaaagactgtgcctttacggtggtgagtcttccgatgcccaa gcctcaccccagaccgatgaaatcagaatctctggagacccgacccagacattggtgggttttagggctcctggctgatt Composite module found in the AXX promoters Importance Core cut-off Matr. 
Cut-off AC Matrix
------------------------------------------------------------------------------
0.917751   0.877000   0.930000   M00062   V$IRF1_01
0.323077   1.000000   0.948000   M00339   V$ETS1_B
0.640828   0.989000   0.982000   M00199   V$AP1_C
0.276923   0.840000   0.853000   M00037   V$NFE2_01
1.000000   0.756000   0.760000   M00481   V$AR_01
0.159172   0.869000   0.866000   M00699   V$ICSBP_Q6

[Figure: Histogram (tt1.STA 2v*188c). y-axis: Percent of obs (0% to 100%); x-axis: VAR1, bins <= ,423; (,423;,847]; (,847;1,27]; (1,27;1,694]; (1,694;2,117]; > 2,117. Fitted curve: y = 13 * 0,42348 * normal (x; 1,503956; 0,895746)]

V$IRF1_01: Interferon regulatory factor 1
V$ETS1_B: Ets factors
V$AP1_C: AP-1
V$NFE2_01: NF-E2, an erythroid-specific factor
V$AR_01: Androgen receptor
V$ICSBP_Q6: Interferon Consensus Sequence binding protein

Sites in the AXX promoter set: Yes
Matrices: V$IRF1_01 V$ETS1_B V$AP1_C V$NFE2_01 V$AR_01 V$ICSBP_Q6
No.   Site scores                            =
0     0.951000 1.742000                      0.78964
1     1.941000 0.984000 0.876000             1.50025
2     0.772000                               0.77200
3     1.681000                               1.68100
4     0.964000 0.856000 0.764000 1.764000    1.59327
5     1.000000 0.880000 1.644000             2.52852
6     0.984000                               0.63057
7     1.860000 0.939000                      1.85648
8     1.987000 1.850000 0.812000             2.59763
9     0.868000 1.548000                      1.78836
10    0.985000 0.862000 1.575000             2.44492
11    0.780000                               0.78000
12    1.966000 0.853000                      1.49608
13    -                                      0.00000
14    1.921000 1.715000                      2.33563
15    0.802000                               0.80200
16    0.975000 1.766000                      2.08100
17    1.866000 1.852000                      2.00731
18    1.569000 1.892000                      1.87015
19    0.760000                               0.76000
20    1.886000 0.810000                      2.54087
21    0.765000                               0.76500
22    0.948000 0.873000                      0.54803
23    1.892000 0.885000                      1.87725
24    -                                      0.00000

Sites in the other human promoters: Not
Matrices: V$IRF1_01 V$ETS1_B V$AP1_C V$NFE2_01 V$AR_01 V$ICSBP_Q6
No. 0 to 15: no sites for any of the six matrices; = 0.00000 in each of the 16 rows.
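The "=" values listed for the AXX promoters appear to be a weighted sum: each matrix's summed site score multiplied by its Importance value. The slide does not state this formula; it is an inference that reproduces several of the listed totals. A minimal Python sketch under that assumption follows. The function name `module_score` and the assignment of each score to a matrix column are hypothetical, since the column positions are not preserved in the transcript.

```python
# Importance values taken from the composite-module table on this slide.
IMPORTANCE = {
    "V$IRF1_01": 0.917751,
    "V$ETS1_B": 0.323077,
    "V$AP1_C": 0.640828,
    "V$NFE2_01": 0.276923,
    "V$AR_01": 1.000000,
    "V$ICSBP_Q6": 0.159172,
}


def module_score(site_scores):
    """Weighted sum of per-matrix site scores for one promoter.

    site_scores maps a matrix name to the summed score of its sites
    in that promoter. The weighting rule is an assumption inferred
    from the numbers, not a documented formula.
    """
    return sum(IMPORTANCE[name] * score for name, score in site_scores.items())


# Assumed column assignments (hypothetical) for three promoters of the AXX set.
row0 = {"V$ETS1_B": 0.951, "V$NFE2_01": 1.742}   # listed total: 0.78964
row6 = {"V$AP1_C": 0.984}                        # listed total: 0.63057
row22 = {"V$ETS1_B": 0.948, "V$NFE2_01": 0.873}  # listed total: 0.54803

for label, row in (("0", row0), ("6", row6), ("22", row22)):
    print(label, round(module_score(row), 5))
```

Under this reading, a promoter with no sites above the cut-offs scores 0, which matches the rows listed for the other human promoters.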
ELA2 elastase 2, neutrophil
MMP3 matrix metalloproteinase 3
IL6R interleukin 6 receptor
MMP2 matrix metalloproteinase 2
OAS1 2',5'-oligoadenylate synthetase 1
MMP1 matrix metalloproteinase 1
TIMP1 tissue inhibitor of metalloproteinas
STAT1 signal transducer and activator of t
MMP9 matrix metalloproteinase 9
MMP15 matrix metalloproteinase 15
MMP7 matrix metalloproteinase 7
MMP14 matrix metalloproteinase 14
CD59 CD59 antigen p18-20
LCN2 lipocalin 2 (oncogene 24p3)
GZMM granzyme M (lymphocyte met-ase 1)
IFI27 interferon, alpha-inducible protein
TIMP3 tissue inhibitor of metalloproteinas
IFIT1 interferon-induced protein with tetr
IFI44 interferon-induced protein 44
MKNK1 MAP kinase-interacting serine/threon
IRF7 interferon regulatory factor 7
TIMP2 tissue inhibitor of metalloproteinas
LGALS3BP lectin, galactoside-binding, solu
SCYB10
PSMA2

Signaling network analysis
Part of the insulin signaling network in TRANSPATH
[Figure: network diagram with node labels Insulin, InsR, STAT1, Ras; annotation: Insulin pathway ?]

AhR targets
[Figure: Gene expression, Log(Experiment/Control)]

Composite model correlates with the expression level log(Experiment/Control)
S41 distance = 0.417599 D2:0.658627 SIG:0.000000 MIN_LENGTH 300 0.000000
 3.581248   1.000000   0.933000   M00026   V$AHR_Q5
 2.942371   1.000000   0.917000   M00639   V$HNF6_Q6
 0.798865   0.844000   0.900000   M00220   V$SREBP1_01
 0.409376   0.962000   0.926000   M00173   V$AP1_Q2
 0.055716   0.959000   0.989000   M00726   V$USF2_Q6
-1.329975   1.000000   0.959000   M00235   V$AHRARNT_01
-0.713625   1.000000   0.918000   M00156   V$RORA1_01
-0.668375   0.903000   0.854000   M00201   V$CEBP_C
[Figure: promoter map from -1000 to +1000 around the TSS with V$AHRARNT_01 and V$AHR_Q5 sites; scatter plot with y-axis: predicted expression, x-axis: real expression]

Composite module found in promoters of differentially expressed genes in liver of growth hormone-deficient mice (Sma1).
0.1040 * V$CETS1P54_02(0.949) -50- V$TCF4_Q5(0.908)
0.0751 * V$TCF1P_Q6(0.726) -50- V$STAT6_01(0.861)
0.0728 * V$SF1_Q6(0.684) -50- V$SMAD3_Q6(0.833)
0.0419 * V$ELK1_02(0.862) -50- V$GRE_C(0.842)
0.0983 * V$TCF11MAFG_01(0.821)
0.0471 * V$FOXO4_01(0.961)
0.0301 * V$IPF1_Q4(0.852)
0.0410 * V$AR_01(0.851)
0.0766 * V$GR_Q6(0.971)
0.0482 * V$STAT1_02(0.995)
0.0508 * V$CEBPB_01(0.98)
0.0281 * V$STAT5A_02(0.826)
[Figure: histogram; y-axes: No of obs (0 to 450 and 0 to 40); x-axis: -0.1 to 0.5; series: Non-changed genes, differentially expressed genes; labels: Sma1, Norm]

Results of the ArrayAnalyzer™ search upstream of the TFs: growth hormone (GH) and receptor tyrosine kinases (RTK) are identified as potential key molecules involved in the differential expression of genes in the liver of growth hormone-deficient mice (Sma1).

4 TRANSPATH and tools, ArrayAnalyzer and PathwayBuilder
At the next step, one can map the transcription factors found at the previous step onto the TRANSPATH signaling network. If the factors found are parts of the same cascades that were suggested at step 1, the probability increases that those factors are responsible for the coordinated gene regulation.

Feedback loops in activating immune cells through NF-AT/AP-1
cytokines, chemokines
membrane receptors
adaptor proteins
PI3K
Ras, Raf
Calcineurin, Ca2+ binding proteins
ERK, JNK, MAPK
NF-ATs
Jun, Fos
NF-AT/Jun:Fos
Legend: Groups that are statistically enriched by potential target genes for Jun:Fos and NFATs (as shown in the table above). Other groups that contain potential target genes for Jun:Fos and NFATs.

[Figure: network diagram; node labels: c-ras, htf9a, Ras, RanBP1, Ran, Raf, erk-1]
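The mapping step described above (placing the transcription factors on the signaling network and looking upstream for molecules they share) can be sketched as a breadth-first search over reversed edges. This is a minimal Python sketch, not the ArrayAnalyzer algorithm: the toy network, the function names and the step limit are all hypothetical, and the edges are a simplified illustration rather than TRANSPATH data.

```python
from collections import deque

# Hypothetical toy network: each key signals to the molecules in its list.
TOY_NETWORK = {
    "GH": ["GHR"],
    "GHR": ["JAK2"],
    "JAK2": ["STAT5A", "STAT1"],
    "RTK": ["Ras"],
    "Ras": ["Raf"],
    "Raf": ["ELK1"],
}


def reverse_edges(network):
    """Build a map from each molecule to its direct upstream regulators."""
    reverse = {}
    for source, targets in network.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reverse


def upstream_nodes(network, factor, max_steps):
    """Return {molecule: distance} for molecules up to max_steps upstream."""
    reverse = reverse_edges(network)
    distance = {factor: 0}
    queue = deque([factor])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for parent in reverse.get(node, []):
            if parent not in distance:
                distance[parent] = distance[node] + 1
                queue.append(parent)
    del distance[factor]
    return distance


def key_node_candidates(network, factors, max_steps=4):
    """Rank upstream molecules by how many input factors they can reach."""
    counts = {}
    for factor in factors:
        for node in upstream_nodes(network, factor, max_steps):
            counts[node] = counts.get(node, 0) + 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranked = key_node_candidates(TOY_NETWORK, ["STAT5A", "STAT1", "ELK1"])
for molecule, hits in ranked:
    print(molecule, hits)
```

Molecules that sit upstream of several of the input factors rank first, which is the sense in which factors belonging to the same cascade support each other.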
[Figure: Network controlling S phase entry in response to a proliferative signal. Node labels: c-myc, c-Myc, B-myb, B-Myb, MEK, cdc2, c-ets, c-Ets, c-jun, c-Jun, c-fos, c-Fos, JNK, Erk-1, cycE, cdk2, cycD1, cycD3, cdk4, rb1, pRB, e2f-1, E2F-1, DP-1, co-factor. Target groups at S-phase entry: Enzymes of nucleotide metabolism: dhfr, tk, cad, ada, odc, ts; Factors and enzymes of replication: DNA pol, cdc6, ori1, cdc21, cdc46, p1; Histones: H1, H2B-143, H3-143; Nucleolines]

TFBS identification via pattern search
Phylogenetic footprint of promoter regions of nucleolin genes
1 <===========V$CREB_02(0.85)
=============================================================================
2 <=======V$CREB_01(0.82)
MMNUCLEO GGCCCGCTCATCAGCCCGAGGGAACCCTAGG--CC------TTCCGGCGTTCT------423
MMNUCLEO TCTCCCCAC-CACACCAGGAAGTCACCTCTCTCA----------ACCTG---GAGTTATA 225
RNNUCIA1 GGCCCACTAAACGGCCCGAATGAACTCTAGG--CC------TTCCGGCGCTCT------435
1 <===========V$CREB_02(0.85)
CSNUCLEO GGCC-GCGAGCTGGCCCCAGTGG-CTCTAGG--CCCTCAACTTCCGGCGCTCTCCGGCTC 450
2 <=======V$CREB_01(0.82)
HSNUCLEO TGCCTCCAAAAGGGCCAACGGGAACTCCGCGGTCCCTGAACTTCCGGTGCTGGAGG---A 448
RNNUCIA1 TCTCCCACCACACACCAGGAAGTCACCTCTCTGA----------ACCTG---GAGTTATA 221
*** * *** * * * * ** ****** * *
1 <===========V$CREB_02(0.85)
=============================================================================
2 <=======V$CREB_01(0.82)
MMNUCLEO -TCAGCAGGACCACGCGGCG---------------------------------------442
CSNUCLEO CCTCC-AGCACACACCAGGAAGTCACCTCTCCGAGACCGTCCCCATCAG---GAGTTAAA 229
RNNUCIA1 -CCAGCTCTTCAGCGCGGCGAACGTTCTAGGCCCCTGAGAAGTCCACCGGGAGGCGCAGG 494
1 <===============V$TH1E47_01(0.85)
CSNUCLEO CTCAGCGGGAACGCGCGGCGAGCAGTTGAGGCCGCCGCGGATTCCAACGGGTTGGGGACG 510
HSNUCLEO TGGCCCTGT-GAGGCCAGAAAGTTACTTCTCCGAGGCCAGTTCCCCATGTCTGAGAAATA 229
HSNUCLEO CTCCTCGCTCCAGGGCCACCAGGAGCCGCGGC---------------------GTGAGTG 487
** * **** **** ** **** * * *** * * * * ** *
=============================================================================
=============================================================================
MMNUCLEO --------------GGGGGAAA-----GCACCGAGAAACGCCCAGACCACCTGAGCATCG 483
1 <==========V$DELTAEF1_01(0.82)
RNNUCIA1 TTTCCGCTACGCGAGGGGGAAA-----TCCCCGAGAAATGCCCAGACCACCTAAGCACAG 549
MMNUCLEO CCTACCG-CGAGAGGTCACCGACATTACATGGATCGCTTGTGCACTGCTCGTA--CACAC 282
CSNUCLEO TTCGC----AGCGCGGGGGATGCTCGGGCCACCCACCACCCCCCCACCCCCCCGGCCACG 566
1 <======== ==V$DELTAEF1_01(0.87)
HSNUCLEO CGTGCCGGAACCGAGGGCGGGG-----TCTCTGAGGAACTCCAAGGCTGCCCAAGCCTAC 542
RNNUCIA1 CCTACCG-CGTGAGGTCA--GAGATTAAATGGACTGTTTGTGCACTGCTCACA--CACAC 276
*** * * * ** * ** **
1 <======== ==V$DELTAEF1_01(0.84)
=============================================================================
CSNUCLEO TCTACCG-CGCGAGGTTG--GACATTAAGCGAGCTGTTTGAGCACTGCACACAGGCGCGC 286
MMNUCLEO CCGCCC--------ATGCTGCCTCGGAACACCTGAGGGAATCCGGGCCACGCCGCCACCT 535
1 <========= =V$DELTAEF1_01(0.84)
RNNUCIA1 ACGTCC--------ATGCGGCGTACGGATACCTGAGGGAATCCGGGCCATACCGCCACCT 601
HSNUCLEO TCTCCCAACTTGAGGTTCT-GTGGGGTAGGGGAGGGTTCGTGACTTTCTCACAGAAAACC 288
CSNUCLEO AGGCCCGGAGCTCCAGGTAGCAGTGCAGCACTAGGCGGCGTCCGGGCCACGCCGCCCAAT 626
** ** * ***** * * * * * * * * * * *
HSNUCLEO GGACCC---------AGCCACATTGGCGAACC----GGAGACCGCCCGATTCCACCACC588
=============================================================================
** * * ** ** *** * * ** **
1 <=======V$NKX25_02(0.84)
2 =========>V$CETS1P54_01(0.87)
=============================================================================
1 <=======V$E2F_02(1.00)
MMNUCLEO ACACACGCAC------------AACTGCTTTTATTAGGAGCT----CTCAGGAAAGCGGG 326
MMNUCLEO ACCCGCG--CCTCACACACAAGCCGCGCCAAACTCGCCCGTCCCACTGCGCAGGCGTGGG 593
1 <=======V$NKX25_02(0.84)
1 <=======V$E2F_02(1.00)
2 =========>V$CETS1P54_01(0.87)
RNNUCIA1 ACTCGCG--CCTCACTC--AAGCCGCGCCAAACTCGCGCGTTTCACTGCGCAGGCGTGTA 657
RNNUCIA1 ACACACGCGCGCGCGCGCGCGAAATTGCTTTTATTAGGAGCT----CTCAGGAAAGTGGT 332
1 <=======V$E2F_02(1.00)
1 =======>V$NKX25_02(0.82)
TCCCCCGAGCCCCTTCCACAAGCCGCGCCAAACGGGTCTG---CACCGCGCAGGCG--GC 681
2
<==========V$DELTAEF1_01(0.81)CSNUCLEO
1 <=======V$E2F_02(1.00)
3 =========>V$CETS1P54_01(0.84)
HSNUCLEO -CCCGCGCTCCCCTCAC--AGCCGGCGCCAAAAACGCCAGTCCCACGACGCAGGC----640
CSNUCLEO ACACACGCACGC----------AACTGCCTTTATTGGGAGCTGTCTCTCAGGAGAACAGC 336
* * ** ** * * * * ******** * * *** *******
1 <=======V$NKX25_02(0.83)
2 <==========V$DELTAEF1_01(0.81)
3 =========>V$CETS1P54_01(0.86)
HSNUCLEO TCGTACAGACCC-------CGCCACTGCCTTTATTAACAGCT----CTCAGGAGACTGCC 337
* ** * * *** ****** **** ******* *
=============================================================================
MMNUCLEO GACTCGCATCA---TAGCCAAG----AAGCCGTTCGCGAC-TCCGCGGAGAACAGGCCGA 378
RNNUCIA1 GGCTCGCATCAGGCTACCACAGCC--AAGAGGACCGCCACCTCTACCGAGGGCAGGCCAA 390
CSNUCLEO GGCCCGCGGCGCAACACTAGAGCCCCGGGATGTTCTCGGC-TCTGCCGAGGGCAG-CCGA 394
HSNUCLEO TGCAGGAGGGGGGTCGCTCCGGCC---CCATGCTCGCGGG-CAAGCAGGGATAAG--CTG 391
* * * * * * * * * ** *

HSNUCLEO - Homo sapiens;
CSNUCLEO - Cricetulus griseus;
MMNUCLEO - Mus musculus;
RNNUCIA1 - Rattus norvegicus

[Figure: three panels, 1) to 3), with nucleotide labels A, T, G, C]

Result of comparison of four different pattern discovery programs on sets of simulated sequences with implanted TF binding sites for one matrix; y-axis: the averaged sum of squared differences between the revealed matrix and the original one; x-axis: the probability of the “consensus nucleotide” in each position of the matrix.
[Figure: chart; series: Kernel, MEME, CONSENSUS, GIBBS; y-axis: 0,000 to 1,000; x-axis: 0,65 to 0,95]

Table 1. Comparison of the 3 programs performing best at the low values (0,65 and 0,7).
                 0,65    0,7
Kernel           0,205   0,165
MULTIPROFILER    0,208   0,255
PROJECTION       0,260   0,304

Three mechanisms of biopolymer evolution
Gradual evolution by fixation of multiple substitutions (Protein functional centres)
Edited biopolymer by fixation of a small number of substitutions (Protein folding)
Evolution at once by fixation of single substitutions (Regulatory regions of eukaryotic genes)

Thank you!
www.biobase.de