Transcript Document

Comparative genomics in flies and mammals
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory
Resolving power in mammals, flies, fungi
[Figure: phylogenies of 32 mammals, 12 flies, and ~20 fungi (9 yeasts spanning pre- and post-duplication species; 8 Candida species, haploid and diploid)]
Many species give high resolving power, even at close evolutionary distances
Comparative genomics and evolutionary signatures
• Comparative genomics can reveal functional elements
– For example: exons are deeply conserved to mouse, chicken, fish
– Many other elements are also strongly conserved: unannotated exons? regulatory elements?
• Can we also pinpoint specific functions of each region? Yes!
– Patterns of change distinguish different types of functional elements
– Specific function → selective pressures → patterns of mutation/insertion/deletion
• Develop evolutionary signatures characteristic of each function
1. Evolutionary signature of protein-coding genes
• Revise protein-coding gene catalogue
Protein-coding evolution vs. nucleotide conservation
High protein-coding signal, low conservation → evolutionary signatures highly sensitive
High conservation, but not protein-coding → evolutionary signatures highly specific
[Figure: annotated FlyBase gene with existing cDNA data, a new predicted exon, and its cDNA validation (iPCR)]
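A protein-coding signature depends on how changes fall relative to the reading frame. Coding regions favour gaps whose length is a multiple of 3 and codon changes that preserve the amino acid. The sketch below is a minimal illustration under stated assumptions. It is not the method used to revise the FlyBase catalogue. It assumes two aligned, in-frame sequences and counts frame-preserving gaps and synonymous versus nonsynonymous codon changes under the standard genetic code. The function name `coding_signature` and the toy sequences are hypothetical.

```python
import re
from itertools import product
from typing import Dict

# Standard genetic code; codons enumerated with bases in TCAG order,
# so AMINO[16*i + 4*j + k] is the amino acid for BASES[i] + BASES[j] + BASES[k].
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def coding_signature(ref: str, other: str) -> Dict[str, int]:
    """Count frame-preserving gaps and (non)synonymous codon changes.

    Hypothetical helper: assumes both aligned sequences start in frame and
    that alignment columns map one-to-one onto codon positions.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    gaps = [len(run) for seq in (ref, other) for run in re.findall(r"-+", seq)]
    counts = {
        "gaps": len(gaps),
        "frame_preserving_gaps": sum(1 for g in gaps if g % 3 == 0),
        "synonymous": 0,
        "nonsynonymous": 0,
    }
    for i in range(0, len(ref) - 2, 3):
        a, b = ref[i:i + 3], other[i:i + 3]
        # Skip identical codons and codons containing gaps or ambiguous bases.
        if a == b or a not in CODE or b not in CODE:
            continue
        key = "synonymous" if CODE[a] == CODE[b] else "nonsynonymous"
        counts[key] += 1
    return counts


# Toy alignment (hypothetical): one 3-nt gap, two synonymous, one nonsynonymous change.
print(coding_signature("ATGGCTAAACTGGGTTGA", "ATGGACAAG---GGCTGA"))
```

A full classifier would presumably weigh counts like these across all informant species against a non-coding background. The sketch only shows which events are being counted.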
2. Evolutionary signatures of RNA genes
• Typical substitutions
– Compensatory changes
– G:C → G:U … G:U → A:U
• Prediction methodology
– Jakob Pedersen: EvoFold with very stringent parameters
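Compensatory changes separate a conserved RNA structure from a conserved sequence. A base pair can change on both sides (G:C → A:U), or change one side through a G:U wobble, and still stay paired. EvoFold itself uses phylogenetic stochastic context-free grammars. The sketch below is not EvoFold. It is a minimal illustration of the signal and assumes the paired alignment columns are already known. For each pair of columns it labels the strongest evidence seen across species. The function name `classify_pairs` and the toy hairpin are hypothetical.

```python
from typing import Dict, List, Tuple

# Watson-Crick pairs plus the G:U wobble.
VALID_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def classify_pairs(
    alignment: Dict[str, str], pairs: List[Tuple[int, int]], ref: str
) -> Dict[Tuple[int, int], str]:
    """Label each paired column pair (i, j) by the strongest evidence seen in any species."""
    result = {}
    for i, j in pairs:
        ref_pair = (alignment[ref][i], alignment[ref][j])
        labels = set()
        for seq in alignment.values():
            pair = (seq[i], seq[j])
            if pair not in VALID_PAIRS:
                labels.add("broken")        # pairing lost in this species
            elif pair == ref_pair:
                labels.add("identical")
            elif pair[0] != ref_pair[0] and pair[1] != ref_pair[1]:
                labels.add("compensatory")  # both sides changed, still paired
            else:
                labels.add("consistent")    # one side changed, e.g. G:C -> G:U
        for label in ("broken", "compensatory", "consistent", "identical"):
            if label in labels:
                result[(i, j)] = label
                break
    return result


# Hypothetical 7-nt hairpin: columns (0,6) and (1,5) pair, (2,4) sits in the loop.
toy = {"sp1": "GCAAAGC", "sp2": "GCAAAGU", "sp3": "ACAAAGU"}
print(classify_pairs(toy, [(0, 6), (1, 5), (2, 4)], ref="sp1"))
```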
Reveal novel RNA genes and structures
• Intronic: enriched in A-to-I editing, also novel ncRNAs
• Coding: A-to-I editing, also translational regulation
• 3’UTRs: enriched in regulators of mRNA localization
• 5’UTRs: translational regulation, ribosomal proteins
- 3’ & 5’UTR structures mostly on coding strand (75% & 80%)
3. Structural and evolutionary signatures of miRNAs
• Recognize miRNA hairpin
– Length of hairpin & length of arms
– Fold stability, symmetric/asymmetric bulges
– Conservation profile: high|low|high
→ Discover novel miRNAs
• Pinpoint mature miRNA 5’ end
– Perfect 8mer conservation at start
– Predominance of 5’U (78%)
– Number of paired bases is bound
– Complementary to 3’UTR motifs
→ Revise existing miRNAs
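The exact mature 5' end matters because it defines the seed that pairs with 3'UTR motifs. The sketch below assumes the conventional seed definition, nucleotides 2-8 of the mature miRNA. That definition is our assumption; the slide does not state it. The sketch extracts the seed, checks for a 5' U, and returns the complementary DNA motif one would search for in 3'UTRs. It also shows why shifting the annotated 5' end by a single nucleotide changes the predicted target motif. The mature sequence and helper names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mature: str) -> str:
    """Nucleotides 2-8 of the mature miRNA (assumed seed definition)."""
    return mature[1:8]


def target_site(mature: str) -> str:
    """Seed-complementary 3'UTR motif, written 5'->3' in the DNA alphabet."""
    rc = "".join(COMPLEMENT[base] for base in reversed(seed(mature)))
    return rc.replace("U", "T")


mature = "UACGGAUCCAUGCAAGGCUUAA"  # hypothetical mature miRNA
print(mature[0] == "U", seed(mature), target_site(mature))
# A one-nucleotide error in the annotated 5' end shifts the seed entirely.
print(seed(mature[1:]), target_site(mature[1:]))
```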
4. Evolutionary signatures for regulatory motifs
Known engrailed site (footprint):
D.mel  CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sim  CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sec  CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.yak  CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.ere  CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
D.ana  CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG
       **       *    * *********** *   **** * **
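The conservation line under the alignment marks columns where all six species share the same base. The sketch below recomputes that line from the aligned sequences shown on the slide. It then checks whether a motif occurrence in D.mel sits in a column block that every species shares. Treating TAATTA as the engrailed core within the footprint is our assumption. The helper names are hypothetical, and only the first match in the reference is tested.

```python
from typing import Dict, Optional

# Alignment copied from the slide (D. melanogaster and five other Drosophila species).
ALIGNMENT: Dict[str, str] = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}


def conservation_line(alignment: Dict[str, str]) -> str:
    """'*' where every species has the same (non-gap) base, ' ' elsewhere."""
    columns = zip(*alignment.values())
    marks = ("*" if len(set(col)) == 1 and col[0] != "-" else " " for col in columns)
    return "".join(marks).rstrip()


def conserved_instance(alignment: Dict[str, str], ref: str, motif: str) -> Optional[int]:
    """Alignment column of the first motif match in `ref`, if all species share it."""
    start = alignment[ref].find(motif)
    if start < 0:
        return None
    end = start + len(motif)
    return start if all(seq[start:end] == motif for seq in alignment.values()) else None


print(conservation_line(ALIGNMENT))
print(conserved_instance(ALIGNMENT, "D.mel", "TAATTA"))
```

Requiring identical bases in every species at the same columns is deliberately strict. It cannot tolerate the motif movement, alignment errors, or losses discussed for motif instances.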
Motifs discovered
- Recover known regulators
- Many novel motifs
Evidence for novel motifs
-Tissue-specific enrichment
-Functional enrichment
-In promoters & enhancers
Surprises
- Core promoter elements
- miRNA motifs in coding exons
Functions of discovered motifs
Tissue-specific enrichment and clustering
Positional biases
miRNA targeting in coding regions
5. Evolutionary signatures of motif instances
• Allow for motif movements
– Sequencing/alignment errors
– Loss, movement, divergence
• Measure branch-length score
– Sum evidence along branches
– Close species contribute little
Example: Mef2 motif (YTAWWWWTAR) instances with BLS of 25% and 83%
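The sketch below is our reading of the slide's description ("sum evidence along branches; close species contribute little"). The branch-length score is the total branch length of the smallest subtree connecting the species that carry the motif, as a fraction of the whole tree. Close species sit on short branches, so they add little. Motif matching expands the IUPAC codes of the Mef2 consensus YTAWWWWTAR into a regular expression and, for brevity, scans only the forward strand. The tree, branch lengths, sequences and helper names are all hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

# Hypothetical rooted tree: node -> (parent, length of the branch above it).
TREE: Dict[str, Tuple[str, float]] = {
    "A": ("AB", 1.0), "B": ("AB", 1.0), "AB": ("root", 2.0),
    "C": ("CD", 1.0), "D": ("CD", 3.0), "CD": ("root", 1.0),
}


def motif_regex(motif: str):
    """Compile an IUPAC consensus into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in motif))


def path_to_root(node: str) -> List[str]:
    path = [node]
    while node in TREE:
        node = TREE[node][0]
        path.append(node)
    return path


def branch_length_score(present: Iterable[str]) -> float:
    """Fraction of total branch length in the subtree connecting `present` species."""
    paths = [path_to_root(s) for s in present]
    if len(paths) < 2:
        return 0.0
    shared = set(paths[0]).intersection(*paths[1:])
    lca = next(node for node in paths[0] if node in shared)  # deepest common ancestor
    covered = set()
    for path in paths:
        for node in path:
            if node == lca:
                break
            covered.add(node)  # each node stands for the branch above it
    total = sum(length for _, length in TREE.values())
    return sum(TREE[n][1] for n in covered) / total


# Hypothetical aligned sequences; D lacks a Mef2 match.
seqs = {"A": "GGCTATTTATAGCC", "B": "GGCTATTTATAGCC",
        "C": "GGCTATTTATAGCC", "D": "GGCTAGCTATAGCC"}
mef2 = motif_regex("YTAWWWWTAR")
present = [sp for sp, s in seqs.items() if mef2.search(s)]
print(present, round(branch_length_score(present), 3))
```

A single species carrying the motif scores 0, and the motif in every species scores 1. A loss in one species lowers the score only by the branches no longer spanned, rather than discarding the instance.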
Motif confidence selects functional instances
• Transcription factor motifs: increasing BLS → increasing confidence
– Confidence selects functional regions
– Confidence selects in vivo bound sites
– High sensitivity
• microRNA motifs: increasing BLS → increasing confidence
– Confidence selects functional regions
– Confidence selects positive strand
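The slide links confidence to BLS but does not give the exact procedure. One plausible way to estimate it is to compare how many real motif instances pass a BLS threshold with how many instances of a shuffled control motif pass it: confidence = 1 − controls / real. The sketch below assumes a single control motif with the same genome-wide number of instances as the real motif. All BLS values are made up. It then finds the lowest threshold that reaches 80% confidence.

```python
from typing import List, Optional, Sequence


def confidence_at(threshold: float, real_bls: Sequence[float],
                  control_bls: Sequence[float]) -> float:
    """1 - (control instances passing) / (real instances passing), floored at 0."""
    n_real = sum(1 for b in real_bls if b >= threshold)
    n_ctrl = sum(1 for b in control_bls if b >= threshold)
    if n_real == 0:
        return 0.0
    return max(0.0, 1.0 - n_ctrl / n_real)


def min_threshold(target: float, real_bls: Sequence[float],
                  control_bls: Sequence[float], grid: List[float]) -> Optional[float]:
    """Smallest BLS threshold on `grid` whose confidence reaches `target`."""
    for t in grid:
        if confidence_at(t, real_bls, control_bls) >= target:
            return t
    return None


real = [0.1, 0.2, 0.5, 0.8, 0.9, 0.95]      # hypothetical BLS values, real motif
control = [0.1, 0.15, 0.2, 0.3, 0.6, 0.85]  # hypothetical BLS values, shuffled control
grid = [i / 10 for i in range(11)]
for t in (0.0, 0.5, 0.8):
    print(t, round(confidence_at(t, real, control), 2))
print("80% confidence at BLS >=", min_threshold(0.8, real, control, grid))
```

Fixing confidence at 80% corresponds to a 20% false discovery rate under this estimate.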
6. Initial regulatory network for an animal genome
• ChIP-grade quality
– Similar functional enrichment
– High sensitivity, high specificity
• Systems-level
– 81% of transcription factors
– 86% of microRNAs
– 8k + 2k targets
– 46k connections
• Lessons learned
– Pre- and post-transcriptional targeting are correlated (high/high, low/low)
– Regulators are heavily targeted, forming feedback loops
Network captures literature-supported connections
Network captures co-expression supported edges
Red = co-expressed
Grey = not co-expressed
Named = literature-supported
Bold = literature-supported
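Once the network is stored as a list of regulator → target edges, the "lessons learned" reduce to simple graph statistics. The sketch below uses toy edges and hypothetical gene names, and the function name `summarize` is hypothetical too. It does not reproduce the 46k-connection network. It reports which regulators are themselves targeted and which two-node feedback loops exist (A regulates B and B regulates A). It also lists targets hit both by a miRNA and by a non-miRNA regulator, a crude proxy for combined pre- and post-transcriptional targeting.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def summarize(edges: List[Edge], mirnas: Set[str]) -> Dict[str, object]:
    edge_set = set(edges)
    regulators = {r for r, _ in edge_set}
    targets = {t for _, t in edge_set}
    # Two-node feedback loops: r -> t and t -> r.
    loops = {tuple(sorted((r, t))) for r, t in edge_set if r != t and (t, r) in edge_set}
    # Targets hit both by a miRNA and by a non-miRNA (transcriptional) regulator.
    by_target: Dict[str, Set[str]] = {}
    for r, t in edge_set:
        by_target.setdefault(t, set()).add(r)
    dual = sorted(t for t, regs in by_target.items() if regs & mirnas and regs - mirnas)
    return {
        "targeted_regulators": sorted(regulators & targets),
        "feedback_loops": sorted(loops),
        "dual_targets": dual,
        "out_degree": dict(Counter(r for r, _ in edge_set)),
    }


edges = [("tf1", "geneA"), ("tf1", "tf2"), ("tf2", "tf1"),
         ("mir1", "tf1"), ("mir1", "geneA"), ("tf2", "geneB")]
summary = summarize(edges, mirnas={"mir1"})
print(summary)
```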
7. ChIP vs. conservation: similar power / complementary
• Together: best → complementary
• Bound but not conserved: reduced enrichment → conservation selects functional sites
• All-ChIP vs. all-cons: similar enrichment → similar power
• Cons-only vs. ChIP-all: similar enrichment → additional sites
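This comparison partitions candidate sites into bound-and-conserved, bound-only and conserved-only groups, then compares functional enrichment across them. The slide does not state the exact statistic. The sketch below assumes enrichment is the fraction of sites in a functional category divided by a genome-wide background fraction. Site IDs, annotations, the 0.25 background and the helper names are hypothetical.

```python
from typing import Dict, Set


def enrichment(sites: Set[str], functional: Set[str], background: float) -> float:
    """Fraction of sites in a functional category, relative to the background fraction."""
    if not sites:
        return 0.0
    return (len(sites & functional) / len(sites)) / background


def compare(chip: Set[str], cons: Set[str], functional: Set[str],
            background: float) -> Dict[str, float]:
    groups = {
        "both": chip & cons,
        "chip_only": chip - cons,   # bound but not conserved
        "cons_only": cons - chip,   # conserved but not bound
        "all_chip": chip,
        "all_cons": cons,
    }
    return {name: enrichment(s, functional, background) for name, s in groups.items()}


chip = {"s1", "s2", "s3", "s4"}
cons = {"s3", "s4", "s5", "s6"}
functional = {"s1", "s3", "s4", "s5"}
print(compare(chip, cons, functional, background=0.25))
```

With this toy data the pattern mirrors the slide's claims. The intersection is most enriched, bound-only sites are less enriched, and the all-ChIP and all-conserved sets come out equal.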
Recovery of regulatory motif instances in mammals
[Figure: TF and miRNA motif instances recovered at 80% confidence vs. total branch length of informative species: HMRD (0.74), placental mammals (3.36), mammals (4.33), H + non-mammals (6.36), HMRD + non-mammals (6.96), all vertebrates (9.66); ~6X more instances than HMRD]
• Performance increases with branch length (requires closely-related species)
- Measure number of recovered motif instances at a fixed confidence (80%) / FDR (20%)
- Discovery power: 6-fold higher than HMRD (Branch length also ~6-fold higher)
• With 20 currently-aligned mammals:
- Transcription factor motifs: 47 TFs | 16,000 instances | 340 targets on avg
- microRNA motifs: 21 miRNAs | 11,000 instances | 523 targets on avg
• An initial regulatory network for mammalian genomes
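Simple arithmetic cross-checks two sets of numbers on this slide. First, the ~6-fold gain is consistent with the ratio of total branch lengths for the "mamm." and HMRD species sets in the figure (4.33 vs. 0.74). Second, the per-regulator averages are close to instances divided by regulators. Reading "targets on avg" as instances per regulator is our assumption. The slide's 523, against 524 here, is within the rounding of "11,000".

```python
branch_ratio = 4.33 / 0.74       # 'mamm.' vs. HMRD total branch length, from the figure
tf_per_motif = 16_000 / 47       # TF motif instances / TF motifs
mirna_per_motif = 11_000 / 21    # miRNA motif instances / miRNA motifs
print(f"{branch_ratio:.1f}x {tf_per_motif:.0f} {mirna_per_motif:.0f}")  # 5.9x 340 524
```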
New insights into animal biology
1. Large-scale evidence of translational read-through
[Figure: protein-coding conservation → stop codon read-through → continued protein-coding conservation → 2nd stop codon → no more conservation]
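In a read-through candidate, the stretch to test for continued protein-coding conservation runs from just after the first in-frame stop codon to the next in-frame stop. The sketch below locates that stretch. It assumes a transcript string and a known 0-based start-codon position. The function name and the sequence are hypothetical.

```python
from typing import Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def readthrough_region(seq: str, start: int) -> Optional[Tuple[int, int]]:
    """(begin, end) of the stretch between the first and second in-frame stop codons.

    Hypothetical helper: `start` is the 0-based position of the start codon, and
    the returned half-open interval excludes both stop codons.
    """
    stops = [i for i in range(start, len(seq) - 2, 3) if seq[i:i + 3] in STOPS]
    if len(stops) < 2:
        return None
    return stops[0] + 3, stops[1]


mrna = "ATGAAATAGCCCGGGTGAAAA"  # hypothetical: ATG AAA TAG | CCC GGG | TGA
region = readthrough_region(mrna, 0)
print(region, mrna[region[0]:region[1]])
```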
• New mechanism of post-transcriptional control.
– Hundreds of fly genes, handful of human genes.
– Enriched in brain proteins, ion channels.
– Experiments show ADAR necessary & sufficient (Reenan Lab).
• Many questions remain
– A-to-I editing of stop codon TAG|TGA|TAA → TGG
– Cryptic splice sites? RNA secondary structure?
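The ribosome reads inosine as guanosine, so A-to-I editing acts like an A→G change. The sketch below enumerates every combination of A→G edits at each stop codon and translates the result with the standard genetic code. The helper name is hypothetical. TAG and TGA each reach TGG (Trp) with a single edit. TAA needs both adenosines edited, because a single edit only produces another stop codon.

```python
from itertools import combinations, product
from typing import Dict, Tuple

# Standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def a_to_i_variants(codon: str) -> Dict[str, Tuple[int, str]]:
    """Every codon reachable by editing one or more A's (read as G).

    Maps edited codon -> (number of edits, amino acid; '*' = stop).
    """
    positions = [i for i, base in enumerate(codon) if base == "A"]
    variants = {}
    for k in range(1, len(positions) + 1):
        for chosen in combinations(positions, k):
            edited = "".join("G" if i in chosen else b for i, b in enumerate(codon))
            variants[edited] = (k, CODE[edited])
    return variants


for stop in ("TAG", "TGA", "TAA"):
    print(stop, a_to_i_variants(stop))
```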
2. Stop codon read-through in mammals
Four candidates found: GPX2, OPRK1, OPRL1, GRIA2, mostly neuronal
A look at FOXP2 – Possible 3’UTR function (not in fish, yes in frog)
3. New insights into miRNA regulation: miRNA* function
• Both miRNA arms can be functional
– High scores, abundant processing, conserved targets
– Hox miRNAs miR-10 and miR-iab-4 as master Hox regulators
4. New insights into miRNA regulation: miR-AS function
• A single miRNA locus transcribed from both strands
• Both processed to mature miRNAs: miR-iab-4, miR-iab-4AS (anti-sense)
• The two miRNAs show distinct expression domains (mutually exclusive)
• The two show distinct Hox targets – another Hox master regulator
[Figure: WT wing and haltere compared with sense and antisense mis-expression phenotypes (haltere, wing with sensory bristles); panels C, D, E at the same magnification]
• Mis-expression of mir-iab-4S & AS: halteres → wings (homeotic transformation)
• Stronger phenotype for AS miRNA
• Sense/anti-sense pairs as general building blocks for miRNA regulation
• 9 new anti-sense miRNAs in mouse
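Transcription from the opposite strand yields the reverse complement of the hairpin. The mature sequence and its seed therefore change completely, which is consistent with the sense and antisense miRNAs having distinct targets. The sketch below simplifies by assuming the antisense mature miRNA is exactly the reverse complement of the sense mature sequence; real processed ends need not line up. It takes the seed as nucleotides 2-8, a conventional definition assumed here. The sequence and helper names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed(mature: str) -> str:
    return mature[1:8]  # nucleotides 2-8, assumed seed definition


sense = "UACGGAUCCAUGCAAGGCUUAA"       # hypothetical mature sense miRNA
antisense = reverse_complement(sense)  # simplification: ends assumed to line up
print(antisense, seed(sense), seed(antisense))
```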
5. New insights into miRNA regulation: miR-AS function
Summary of Contributions
• Evolutionary signatures specific to each function
– Protein-coding genes: Revised catalogue affects 10% of genes
– RNA: hundreds of new high-confidence structures discovered
– miRNAs: ~double the number of genes, families, targeting density
– Motifs: ~double the number of motifs, tissue & positional enrichment
– Targets: ChIP-grade quality, global scale, experimental support
• New insights on animal biology
– Genes: Abundant stop codon read-through in neuronal proteins
– RNA: Abundant structures in RNA editing, translational regulation
– Motifs: Coding regions show miRNA targeting
– miRNAs: miR/miR* and sense/anti-sense pairs as building blocks
– Networks: TF vs. miRNA targets, redundancy and integration
• Methods are general, applicable in any species
Next steps: Drosophila and Human ENCODE
• modENCODE: White / Ren / Kellis / Posakony
– Hundreds of sequence-specific factors
– Dozens of chromatin / histone modifications
– Dozens of tissues / stages / conditions
• humENCODE: Bernstein / Lander / Kellis / Broad
– ChIP-seq for dozens of chromatin modifications
– Follow differentiation lineages: activation/inactivation
– Discover tissue-specific regulatory motifs
• Many open questions remain
– Dynamics of tissue-specific regulatory networks
– Sequence determinants of chromatin establishment & maintenance
– Global views of pre- & post-transcriptional regulation
• Many open positions remain (postdoc/grad/ugrad)
Acknowledgements
Alex Stark, Mike Lin, Pouya Kheradpour, Matt Rasmussen
Genes: FlyBase, BDGP, Bill Gelbart, Sue Celniker, Lynn Crosby
miRNAs: Leo Parts, Julius Brennecke, Greg Hannon, David Bartel
iab-4AS: Natascha Bushati, Steve Cohen, Julius, Greg Hannon
12 flies: Andy Clark, Mike Eisen, Bill Gelbart, Doug Smith
24 mammals: Sante Gnerre, Michele Clamp, Manuel Garber, Eric Lander