Comparative genomics in flies and mammals
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory
Resolving power in mammals, flies, fungi
[Figure: species trees for 32 mammals, 12 flies, and ~20 fungi (9 yeasts, pre- and post-duplication; 8 Candida, haploid and diploid)]
Many species lead to high resolving power in close distances
Comparative genomics and evolutionary signatures
• Comparative genomics can reveal functional elements
– For example: exons are deeply conserved to mouse, chicken, fish
– Many other elements are also strongly conserved: exons / regulatory?
• Can we also pinpoint specific functions of each region? Yes!
– Patterns of change distinguish different types of functional elements
– Specific function → selective pressures → patterns of mutation/insertion/deletion
• Develop evolutionary signatures characteristic of each function
1. Evolutionary signature of protein-coding genes
• Revise protein-coding gene catalogue
Protein-coding evolution vs. nucleotide conservation
High protein-coding signal, low conservation
Evolutionary signatures highly sensitive
High conservation, but not protein-coding
Evolutionary signatures highly specific
Annotated FlyBase gene
Existing cDNA data
New predicted exon
cDNA validation (iPCR)
2. Evolutionary signatures of RNA genes
• Typical substitutions
– Compensatory changes
– G:C → G:U … G:U → A:U
• Prediction methodology
– Jakob Pedersen: EvoFold with very stringent parameters
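The substitution patterns listed above can be illustrated with a small classifier for stem base pairs. This is a minimal sketch and not EvoFold itself: the function name and the category labels are hypothetical, and the only assumption is that Watson-Crick pairs plus the G:U wobble count as pairing.

```python
# Base pairs treated as able to pair in an RNA stem:
# Watson-Crick pairs plus the G:U wobble (an assumption of this sketch).
PAIRING = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def classify_pair_change(ref_pair, other_pair):
    """Classify how a stem base pair differs between two species.

    Each argument is a (5' base, 3' base) tuple. The labels are illustrative.
    """
    if ref_pair == other_pair:
        return "unchanged"
    if other_pair not in PAIRING:
        return "disruptive"  # the pairing is lost
    changed = sum(a != b for a, b in zip(ref_pair, other_pair))
    # One changed base that keeps pairing (e.g. G:C -> G:U) is a compatible
    # single substitution; two changed bases (e.g. G:C -> A:U) is a
    # compensatory double substitution.
    return "compatible-single" if changed == 1 else "compensatory-double"


if __name__ == "__main__":
    for other in [("G", "U"), ("A", "U"), ("G", "A")]:
        print(("G", "C"), "->", other, classify_pair_change(("G", "C"), other))
```

A stem whose paired columns show many compatible or compensatory changes and few disruptive ones is the kind of pattern such a signature looks for.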
Reveal novel RNA genes and structures
• Intronic: enriched in A-to-I editing, also novel ncRNAs
• Coding: A-to-I editing, also translational regulation
• 3’UTRs: enriched in regulators of mRNA localization
• 5’UTRs: translational regulation, ribosomal proteins
– 3’ & 5’UTR structures mostly on coding strand (75% & 80%)
3. Structural and evolutionary signatures of miRNAs
• Recognize miRNA hairpin → discover novel miRNAs
– Length of hairpin & length of arms
– Fold stability, symm/asymm bulges
– Conservation profile: high|low|high
• Pinpoint mature miRNA 5’ end → revise existing miRNAs
– Perfect 8mer conservation at start
– Predominance of 5’U (78%)
– Number of paired bases is bound
– Complementary to 3’UTR motifs
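One of the criteria above, perfect 8mer conservation, can be sketched as a scan over a multiple alignment. This is an illustrative sketch only: the function name is hypothetical, and it assumes the input is a list of equal-length aligned sequences with `-` marking gaps.

```python
def conserved_kmer_starts(aligned_seqs, k=8):
    """Return 0-based column indices where k consecutive alignment columns
    are identical in every sequence and contain no gaps."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    # A column is 'perfect' if all sequences share the same non-gap base.
    perfect = [
        len({s[i] for s in aligned_seqs}) == 1 and aligned_seqs[0][i] != "-"
        for i in range(length)
    ]
    return [i for i in range(length - k + 1) if all(perfect[i:i + k])]


if __name__ == "__main__":
    toy_alignment = [
        "CCUGGAAUGUAAAGG",
        "CAUGGAAUGUAAAGA",
        "CGUGGAAUGUAAAGC",
    ]
    print(conserved_kmer_starts(toy_alignment))
```

Candidate 5’ ends would then be the window starts returned here, to be combined with the other criteria on the slide (5’U, pairing, 3’UTR motif complementarity).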
4. Evolutionary signatures for regulatory motifs
Known engrailed site (footprint)
D.mel CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sim CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sec CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.yak CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.ere CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
D.ana CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG
      **       *    * *********** *   **** * **
[Figure: species labels D.mel, D.ere, D.ana, D.pse]
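The row of asterisks under the alignment marks columns that are identical in all six species. A minimal sketch of how such a line can be computed is shown here; the function name is hypothetical, and the sequences are the six rows of the engrailed example as given on the slide.

```python
def conservation_line(aligned_seqs):
    """Return a string with '*' under columns where all sequences carry
    the same non-gap base, and ' ' elsewhere."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned_seqs):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


# The six aligned rows from the engrailed footprint example.
ALIGNMENT = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}

if __name__ == "__main__":
    for name, seq in ALIGNMENT.items():
        print(name, seq)
    print(" " * 5, conservation_line(list(ALIGNMENT.values())))
```

The longest fully conserved run covers the 11 columns reading CTCTAATTAGC, which contain the TAATTA core of the footprint.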
Motifs discovered
- Recover known regulators
- Many novel motifs
Evidence for novel motifs
- Tissue-specific enrichment
- Functional enrichment
- In promoters & enhancers
Surprises
- Core promoter elements
- miRNA motifs in coding exons
Functions of discovered motifs
Tissue-specific enrichment and clustering
Positional biases
miRNA targeting in coding regions
5. Evolutionary signatures of motif instances
• Allow for motif movements
– Sequencing/alignment errors
– Loss, movement, divergence
• Measure branch-length score
– Sum evidence along branches
– Close species little contribution
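A branch-length score of this kind can be sketched as the fraction of the total tree length spanned by the species in which a motif instance is found. The code below is a hedged illustration under that reading: the tree topology and branch lengths are made-up toy values, and the function names are hypothetical rather than taken from the talk.

```python
# Hypothetical toy tree: child -> (parent, branch length). Lengths are made up.
TREE = {
    "D.mel": ("n1", 0.1),
    "D.sim": ("n1", 0.1),
    "n1": ("n2", 0.2),
    "D.yak": ("n2", 0.3),
    "n2": ("root", 0.3),
    "D.pse": ("root", 1.0),
}


def leaves_below(tree, node):
    """Return the set of leaf names in the subtree rooted at node."""
    children = [c for c, (parent, _) in tree.items() if parent == node]
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_below(tree, child)
    return result


def branch_length_score(tree, species_with_motif):
    """Fraction of total branch length in the subtree that connects the
    species where the motif instance is present."""
    present = set(species_with_motif)
    total = sum(length for _, length in tree.values())
    connected = 0.0
    for child, (_, length) in tree.items():
        below = leaves_below(tree, child) & present
        # An edge lies on the connecting subtree only if motif-containing
        # species sit on both sides of it.
        if below and below != present:
            connected += length
    return connected / total


if __name__ == "__main__":
    print(branch_length_score(TREE, ["D.mel", "D.sim"]))  # close pair
    print(branch_length_score(TREE, ["D.mel", "D.pse"]))  # distant pair
```

With these toy lengths, a motif seen only in two sister species spans far less of the tree than one seen in two distant species, which is the sense in which close species contribute little.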
[Figure: example instances of the Mef2 motif (YTAWWWWTAR) with BLS of 25% and 83%]
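The Mef2 consensus YTAWWWWTAR is written with IUPAC degenerate codes (Y = C or T, W = A or T, R = A or G). A minimal sketch of finding instances of such a consensus in one sequence follows; the function name and the test sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif_instances(consensus, sequence):
    """Return (start, matched_text) for every forward-strand match of an
    IUPAC consensus, including overlapping matches."""
    pattern = re.compile("".join(IUPAC[base] for base in consensus.upper()))
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(consensus) + 1):
        match = pattern.match(sequence, start)
        if match:
            hits.append((start, match.group()))
    return hits


if __name__ == "__main__":
    # Made-up sequence containing one match to the Mef2 consensus.
    print(find_motif_instances("YTAWWWWTAR", "GGCTAAAAATAGCC"))
```

Each instance found this way in the reference species would then be scored across the alignment, for example with a branch-length score.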
Motif confidence selects functional instances
[Figure: confidence vs. increasing BLS (increasing BLS → increasing confidence), for transcription factor motifs and microRNA motifs]
Transcription factor motifs: confidence selects functional regions; confidence selects in vivo bound sites (high sensitivity)
microRNA motifs: confidence selects functional regions; confidence selects positive strand
6. Initial regulatory network for an animal genome
• ChIP-grade quality
– Similar functional enrichment
– High sensitivity, high specificity
• Systems-level
– 81% of transcription factors
– 86% of microRNAs
– 8k + 2k targets
– 46k connections
• Lessons learned
– Pre- and post-transcriptional regulation are correlated (hi-hi / lo-lo)
– Regulators are heavily targeted, feedback loops
Network captures literature-supported connections
Network captures co-expression supported edges
Red = co-expressed
Grey = not co-expressed
Named = literature-supported
Bold = literature-supported
7. ChIP vs. conservation: similar power / complementary
• Together: best → complementary
• Bound but not conserved: reduced enrichment → selects functional
• All-ChIP vs. All-cons: similar enrichment → similar power
• Cons-only vs. ChIP-all: similar → additional sites
[Figure: miRNA motif instances recovered (80% confidence) vs. total branch length of informant species. Species sets with total branch length: HMRD (0.74), pl-mam (3.36), mamm. (4.33), H+non-mamm. (6.36), HMRD+non-mam (6.96), All vertebr. (9.66). Annotations: ~6X; 11,000 instances]
Recovery of regulatory motif instances in mammals
• Performance increases with branch length (requires closely-related species)
- Measure number of recovered motif instances at a fixed confidence (80%) / FDR (20%)
- Discovery power: 6-fold higher than HMRD (Branch length also ~6-fold higher)
• With 20 currently-aligned mammals:
- Transcription factor motifs: 47 TFs | 16,000 instances | 340 targets on avg
- microRNA motifs: 21 miRNAs | 11,000 instances | 523 targets on avg
• An initial regulatory network for mammalian genomes
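Counting recovered instances at a fixed confidence (80%), equivalently a fixed FDR (20%), can be sketched as choosing a score cutoff. The sketch below assumes, without support from the slide, that the FDR is estimated from control motifs scored the same way as the real ones; the function names and the toy scores are hypothetical.

```python
def confidence_at_cutoff(real_scores, control_scores, cutoff):
    """Confidence = 1 - FDR, where the FDR is estimated as the number of
    control instances passing the cutoff over the number of real ones."""
    real = sum(s >= cutoff for s in real_scores)
    control = sum(s >= cutoff for s in control_scores)
    if real == 0:
        return None  # nothing recovered at this cutoff
    fdr = min(1.0, control / real)
    return 1.0 - fdr


def lowest_cutoff_for_confidence(real_scores, control_scores, target=0.8):
    """Return (cutoff, instances recovered) for the lowest cutoff that
    reaches the target confidence, or None if no cutoff does."""
    for cutoff in sorted(set(real_scores)):
        conf = confidence_at_cutoff(real_scores, control_scores, cutoff)
        if conf is not None and conf >= target:
            recovered = sum(s >= cutoff for s in real_scores)
            return cutoff, recovered
    return None


if __name__ == "__main__":
    # Made-up scores for real and control motif instances.
    real = [0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 1.0]
    control = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.95]
    print(lowest_cutoff_for_confidence(real, control))
```

Under this scheme, adding informant species raises the scores of truly conserved instances relative to controls, so more instances clear the same confidence level.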
New insights into animal biology
1. Large-scale evidence of translational read-through
[Figure: protein-coding conservation → stop codon read-through → continued protein-coding conservation → 2nd stop codon → no more conservation]
• New mechanism of post-transcriptional control.
– Hundreds of fly genes, handful of human genes.
– Enriched in brain proteins, ion channels.
– Experiments show ADAR necessary & sufficient (Reenan Lab).
• Many questions remain
– A-to-I editing of stop codon: TAG|TGA|TAA → TGG
– Cryptic splice sites? RNA secondary structure?
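The read-through pattern described above involves a first stop codon followed by an in-frame second stop. A minimal sketch of locating the two stops and the length of the region between them is given here; the function names and the example sequence are hypothetical, and the conservation test on the extension is not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq, frame=0):
    """Return 0-based start positions of stop codons in the given frame."""
    seq = seq.upper()
    return [
        i for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def readthrough_extension(seq, frame=0):
    """Return (first_stop, second_stop, extra_codons), where extra_codons
    counts the codons between the two in-frame stops, or None if fewer
    than two in-frame stops are present."""
    stops = in_frame_stops(seq, frame)
    if len(stops) < 2:
        return None
    first, second = stops[0], stops[1]
    extra_codons = (second - first) // 3 - 1
    return first, second, extra_codons


if __name__ == "__main__":
    # Made-up sequence: ATG GCC TAG GGA CTG AAA TAA
    # (the TGA inside "CTGAAA" is out of frame and is ignored).
    print(readthrough_extension("ATGGCCTAGGGACTGAAATAA"))
```

A candidate is then a gene where the region between the first and second stop still shows protein-coding conservation and the signal ends at the second stop.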
2. Stop codon read-through in mammals
Four candidates found: GPX2, OPRK1, OPRL1, GRIA2, mostly neuronal
A look at FOXP2 – Possible 3’UTR function (not in fish, yes in frog)
3. New insights into miRNA regulation: miRNA* function
• Both miRNA arms can be functional
– High scores, abundant processing, conserved targets
– Hox miRNAs miR-10 and miR-iab-4 as master Hox regulators
4. New insights into miRNA regulation: miR-AS function
• A single miRNA locus transcribed from both strands
• Both processed to mature miRNAs: mir-iab-4, miR-iab-4AS (anti-sense)
• The two miRNAs show distinct expression domains (mutually exclusive)
• The two show distinct Hox targets – another Hox master regulator
[Figure panels: WT, sense, antisense; labels: sensory bristles, wing, haltere, wing w/bristles. Note: C, D, E same magnification]
• Mis-expression of mir-iab-4S & AS: halteres → wings homeotic transformation
• Stronger phenotype for AS miRNA
• Sense/anti-sense pairs as general building blocks for miRNA regulation
• 9 new anti-sense miRNAs in mouse
5. New insights into miRNA regulation: miR-AS function
Summary of Contributions
• Evolutionary signatures specific to each function
– Protein-coding genes: Revised catalogue affects 10% of genes
– RNA: hundreds of new high-confidence structures discovered
– miRNAs: ~double number of genes, families, targeting density
– Motifs: ~double number of motifs, tissue & positional enrichment
– Targets: ChIP-grade quality, global scale, experimental support
• New insights on animal biology
– Genes: Abundant stop codon read-through in neuronal proteins
– RNA: Abundant structures in RNA editing, translational regulation
– Motifs: Coding regions show miRNA targeting
– miRNAs: miR/miR* and sense/anti-sense pairs: building blocks
– Networks: TF vs. miRNA targets redundancy and integration
• Methods are general, applicable in any species
Next steps: Drosophila and Human ENCODE
• modENCODE: White / Ren / Kellis / Posakony
– Hundreds of sequence-specific factors
– Dozens of chromatin / histone modifications
– Dozens of tissues / stages / conditions
• humENCODE: Bernstein / Lander / Kellis / Broad
– ChIP-seq for dozens of chromatin modifications
– Follow differentiation lineages – activation/inactivation
– Discover tissue-specific regulatory motifs
• Many open questions remain
– Dynamics of tissue-specific regulatory networks
– Sequence determinants of chromatin establishment & maintenance
– Global views of pre- & post-transcriptional regulation
• Many open positions remain (postdoc/grad/ugrad)
Acknowledgements
Alex Stark, Mike Lin, Pouya Kheradpour, Matt Rasmussen
Genes: FlyBase, BDGP, Bill Gelbart, Sue Celniker, Lynn Crosby
miRNAs: Leo Parts, Julius Brennecke, Greg Hannon, David Bartel
iab-4AS: Natascha Bushati, Steve Cohen, Julius, Greg Hannon
12-flies: Andy Clark, Mike Eisen, Bill Gelbart, Doug Smith
24 mammals: Sante Gnerre, Michele Clamp, Manuel Garber, Eric Lander