Manolis Kellis
modENCODE analysis group
January 11, 2007

Part 1: Target identification: comparative vs. experimental (really the topic for today)
Part 2: Target validation (optional)
Part 3: Motif discovery (optional)
Part 4: Enhancer identification (optional)

Part 1
Identifying targets using comparative genomics

Evolutionary signatures of motif instances
• Allow for motif movements
  – Sequencing/alignment errors
  – Loss, movement, divergence
• Measure branch-length score (BLS)
  – Sum evidence along branches
  – Close species: little contribution
[Figure: Mef2 motif YTAWWWWTAR, example instances with BLS 25% and BLS 83%]

Motif confidence selects functional instances
[Figure panels, transcription factor motifs: confidence increases with BLS; confidence selects functional regions; confidence selects in vivo bound sites; high sensitivity]
[Figure panels, microRNA motifs: confidence increases with BLS; confidence selects functional regions; confidence selects positive strand]

Initial regulatory network for an animal genome
• ChIP-grade quality
  – Similar functional enrichment
  – High sensitivity, high specificity
• Systems-level
  – 81% of transcription factors
  – 86% of microRNAs
  – 8k + 2k targets
  – 46k connections
• Lessons learned
  – Pre- and post-transcriptional regulation are correlated (hi/hi, lo/lo)
  – Regulators are heavily targeted, feedback loop

Network captures literature-supported connections
Network captures co-expression supported edges
[Figure legend: Red = co-expressed; Grey = not co-expressed; Named = literature-supported; Bold = literature-supported]
• 46% of edges are supported (P = 10^-3)

ChIP vs. conservation: similar power / complementary
• Together: best (complementary)
• Bound but not conserved: reduced enrichment (selects functional)
• All-ChIP vs. All-cons: similar enrichment (similar power)
• Cons-only vs. ChIP-all: similar (additional sites)

Part 2
Cool story of miRNA targets for a new anti-sense miRNA

Surprise: miR-Anti-sense function
• A single miRNA locus transcribed from both strands
• Both processed to mature miRNAs: mir-iab-4, miR-iab-4AS (anti-sense)
• The two miRNAs show distinct expression domains (mutually exclusive)
• The two show distinct Hox targets: another Hox master regulator
[Figure: wing and haltere phenotypes with sensory bristles (WT, sense, antisense; wing, haltere, wing w/bristles). Note: C, D, E same magnification]
• Mis-expression of mir-iab-4S & AS: halteres-to-wings homeotic transformation
• Stronger phenotype for AS miRNA
• Sense/anti-sense pairs as general building blocks for miRNA regulation
• 9 new anti-sense miRNAs in mouse

Part 3 (optional)
Discovering motifs

Evolutionary signatures for regulatory motifs
[Figure: gene region from 5’-UTR to 3’-UTR with the known engrailed site (footprint), aligned across six species]

D.mel  CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sim  CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sec  CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.yak  CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.ere  CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
D.ana  CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG
       **       *    * *********** *   **** * **

[Tree labels: D.mel, D.ere, D.ana, D.pse]
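As a rough illustration of the branch-length score (BLS) from Part 1, applied to the aligned engrailed-site window above: the sketch below takes one common reading of BLS, namely the total length of the branches connecting the species that carry a motif instance, as a fraction of the whole tree. The tree topology, the branch lengths, the row-to-species mapping and all function names are assumptions made for illustration and are not taken from the talk. Motifs are matched as plain strings on the forward strand only, anywhere in the ungapped window, so an instance is allowed to move.

```python
# Aligned window from the slide. The row-to-species mapping follows the order
# of the species labels on the slide and is an assumption.
ALIGNMENT = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}

# Hypothetical tree, child -> (parent, branch length). Topology and lengths
# are placeholders for illustration, not values from the talk.
TREE = {
    "D.sim": ("sim_sec", 0.03),
    "D.sec": ("sim_sec", 0.03),
    "sim_sec": ("mel_grp", 0.04),
    "D.mel": ("mel_grp", 0.07),
    "D.yak": ("yak_ere", 0.10),
    "D.ere": ("yak_ere", 0.10),
    "mel_grp": ("subgroup", 0.06),
    "yak_ere": ("subgroup", 0.05),
    "subgroup": ("root", 0.30),
    "D.ana": ("root", 0.50),
}


def has_motif(aligned_seq, motif):
    """True if the motif occurs anywhere in the window once gaps are removed."""
    return motif in aligned_seq.replace("-", "")


def connecting_length(species):
    """Total length of the branches that connect the given species."""
    below = {node: 0 for node in TREE}
    for leaf in species:
        node = leaf
        while node in TREE:  # walk up until the root is reached
            below[node] += 1
            node = TREE[node][0]
    # A branch connects the set only if some members sit below it and some do not.
    return sum(length for node, (_, length) in TREE.items()
               if 0 < below[node] < len(species))


def branch_length_score(motif, reference="D.mel"):
    """Fraction of the tree spanned by species carrying the motif (0.0 to 1.0)."""
    carriers = [sp for sp, seq in ALIGNMENT.items() if has_motif(seq, motif)]
    if reference not in carriers:
        return 0.0  # an instance must exist in the reference species
    return connecting_length(carriers) / connecting_length(list(ALIGNMENT))
```

With these placeholder lengths, branch_length_score("TAATTA") is 100% because the core of the footprint is present in all six species, while an arbitrary 6-mer such as "ACTAAG", present only in D.mel, D.sim, D.sec and D.yak, scores about 30%. Closely related species add short branches, which matches the point that close species contribute little.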
• Individual motif instances are preferentially conserved
• Measure conservation across entire genome
  – Over thousands of motif instances (increased discovery power)
  – Couple to rapid enumeration and rapid string search (de novo discovery of regulatory motifs)

Power of evolutionary signatures for motif discovery

Rank  Consensus     MCS
1     CTAATTAAA     65.6
2     TTKCAATTAA    57.3
3     WATTRATTK     54.9
4     AAATTTATGCK   54.4
5     GCAATAAA      51
6     DTAATTTRYNR   46.7
7     TGATTAAT      45.7
8     YMATTAAAA     43.1
9     AAACNNGTT     41.2
10    RATTKAATT     40
11    GCACGTGT      39.5
12    AACASCTG      38.8
13    AATTRMATTA    38.2
14    TATGCWAAT     37.8
15    TAATTATG      37.5
16    CATNAATCA     36.9
17    TTACATAA      36.9
18    RTAAATCAA     36.3
19    AATKNMATTT    36
20    ATGTCAAHT     35.6
21    ATAAAYAAA     35.5
22    YYAATCAAA     33.9
23    WTTTTATG      33.8
24    TTTYMATTA     33.6
25    TGTMAATA      33.2
26    TAAYGAG       33.1
27    AAAKTGA       32.9
28    AAANNAAA      32.9
29    RTAAWTTAT     32.9
30    TTATTTAYR     32.9

Matches to known: engrailed (en), reversed-polarity (repo), araucan (ara), paired (prd), ventral veins lacking (vvl), Ultrabithorax (Ubx), apterous (ap), abdominal A (abd-A), fushi tarazu (ftz), broad-Z3 (br-Z3), Antennapedia (Antp), Abdominal B (Abd-B), extradenticle (exd), gooseberry-neuro (gsb-n), Deformed (Dfd)
Expression enrichment, Promoters: 25.4, 5.8, 11.7, 4.5, 13.2, 16, 7.1, 7, 20.1, 3.9, 17.9, 10.7, 19.5, 5.8, 14.1, 1.8, 5.4, 3.2, 3.6, 2.4, 57.2, 5.3, 6.3, 6.7, 8.9, 4.7, 7.6, 449.7, 11, 30.7
Enhancers: 2, 4.2, 2.6, 16.5, 0.3, 3.3, 1.7, 2.2, 4.3, 0.7, 1.2, 2, 5.4, 1.7, 2.8, 0, 4.6, -0.5, 0.6, 6, 1.7, 1.6, 2.7, 0.3, 0.8, 0.8

Ability to discover full dictionary of regulatory motifs de novo

Tissue-specific enrichment and clustering
Functional clusters emerge
• Infer candidate functions for novel motifs
• Reveal ‘modules’ of co-operating motifs

Discovered motifs show positional biases
• May represent new core promoter elements
• Show enrichment in distinct functional categories

Recognizing functional motifs in coding regions
[Figure labels: miRNAs; Top motifs]
• Challenge:
  – Overlapping selective pressures
  – Most ‘motifs’ from di-codon biases
  – Hundreds of motifs due to noise
• Solution:
  – Test each frame offset separately
  – Di-codon biases: frame biased
  – True motifs: frame unbiased
• Result:
  – Top 20 motifs: 11 miRNA seeds
  – (before: 11 seeds in 200+ motifs)
Ability to distinguish overlapping pressures
Evidence of miRNA targeting in coding regions

miRNA targeting in protein-coding regions
• MicroRNA seeds are specifically selected
• Coding & 3’UTRs show same conservation profile

Part 4 (optional)
Characterizing enhancers

Developmental enhancer identification in Drosophila
• Supported by tiling arrays and regulatory motifs (nucleotide resolution)
• Identify nearly all known enhancers (20 of 22 highly bound)
  – Bound in vivo
  – Conserved D/Tw/Sn motifs in 12 flies
  – Clear DV expression pattern (lacZ/end)
• Large number of novel enhancers (428 Dorsal/Twi/Sna). They validate!

Surprise 1: AP genes targeted by DV regulators
• Novel dorsoventral (DV) enhancers in known anteroposterior (AP) genes
  – Bound in vivo by DV genes (by all three DV master regulators)
  – Show evolutionarily conserved motifs for all three DV factors
  – Yet, found in known AP genes, with clear AP expression patterns
Integration of DV and AP patterning networks

Surprise 2: Some silent genes show Pol II binding
[Figure labels: Active; Repressed; Poised]
• Distinct modes of Pol II occupancy
  – Active genes (27%): Pol II throughout the gene, transcribing
  – Repressed genes (37%): Pol II simply absent, no expression
• Third class (12%): Pol II found only at the TSS, stalled
  – Qualitatively different: abundantly bound, but strongly punctate
  – Genes not expressed: known repressed genes, confirmed by arrays
  – Enriched in development, neurogenesis, ectoderm, muscle differentiation
• Hypothesis: Developmental genes poised for expression
  – Reminiscent of ‘bivalent’ K4/K27 domains in mammals

Surprise 3: Master regulators also bind downstream targets
• Abundant feed-forward loops in DV patterning
• Cooperation of master regulators & downstream regulators

Manolis Kellis - modENCODE analysis - summary
• Part 1: Target identification
  – Comparative vs. experimental: each has unique advantages
  – Bound & not conserved appear less functional!
• Part 2: Target validation (for anti-sense miRNA)
  – It’s nice when expected outcome comes true
  – Need more collaborations for target validation
• Part 3: Motif discovery
  – Methods for genome-wide motif discovery
  – Expect increased power in bound regions
• Part 4: Enhancer identification
  – Many new enhancers, with motifs & validation
  – AP / DV system cross-talk: expect dense network
  – Pol II stalling: spatial dynamics matter

Who’s actually doing the work
Main contributors: Alex Stark, Pouya Kheradpour, Julia Zeitlinger
Collaborators:
• Targets: Sushmita Roy @ UNM
• iab-4AS: Natascha Bushati, Steve Cohen @ EMBL; Julius Brennecke, Greg Hannon @ CSHL; Calvin Jan, David Bartel @ Whitehead
• Enhancers: Julia Zeitlinger, Rick Young @ Whitehead; Robert Zinzen, Mike Levine @ UC Berkeley
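
Backup sketch for the frame-offset test in Part 3 (coding regions): the idea that di-codon biases are frame biased while true motifs are frame unbiased can be illustrated by tallying where motif instances start relative to the reading frame. This is a simplified stand-in, not the method used in the talk: it counts plain occurrences rather than conserved instances, assumes every input sequence starts at codon position 0, and the function names and toy sequences are hypothetical.

```python
def frame_counts(cds_list, motif):
    """Count motif occurrences by start offset within the codon (0, 1 or 2).

    Assumes each sequence in cds_list begins at the first base of a codon.
    """
    counts = {0: 0, 1: 0, 2: 0}
    for cds in cds_list:
        start = cds.find(motif)
        while start != -1:
            counts[start % 3] += 1
            start = cds.find(motif, start + 1)  # allow overlapping hits
    return counts


def max_frame_fraction(counts):
    """Share of instances in the most populated frame.

    Values near 1.0 suggest a frame-biased signal (for example a di-codon
    bias); values near 1/3 suggest a frame-unbiased signal.
    """
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


# Toy, made-up sequences for illustration only.
biased = frame_counts(["ATGGCTGCTGCTTAA"], "GCTGCT")
unbiased = frame_counts(["ATGACATTCAAA", "ATGAACATTCAA", "ATGAAACATTCA"], "CATTC")
```

In this toy input the "GCTGCT" hits all fall in frame 0, while the "CATTC" hits are spread one per frame. A fuller version would score conservation separately in each of the three frames and keep only motifs that score well in all of them.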