Ensembl. Going beyond A,T, G and C - Swiss


Ensembl.
Going beyond A, T, G and C
Ewan Birney
There is more to life than proteins
(but not much)
Ensembl
ENCODE
Reactome
Human/Mouse
Human
Reconcile with Genome
Project orthologous proteins onto genome
Mouse/Other Mammals
[Figure: "Increase in quality". Stacked bars on a 0% to 100% axis, one bar per protein set and release: Human-UniSw-33 to 35, Human-RefSeq-33 to 35, Mouse-UniSw-30, 32, 33, 34, Mouse-RefSeq-30, 32, 33, 34. Legend: Missing, Matching, Edge perfect, Identical.]
Chicken
Chicken
Mouse
Human
Reconcile with Genome
Project orthologous proteins onto genome
Chicken
• Extant dinosaur lineage
• Split from mammals 300 Mya
• Neutral rate of 1.5 substitutions per base
• No pseudogenes
• Good synteny to human
• Tested Ensembl Gene Build:
– 90% Perfect exon boundary prediction
– 4% within 10 base pairs
– 85% sensitivity
Stickleback
• “close” to Fugu/Tetraodon
• 21,135 Genes
• 97% Gene Loci sensitivity (held out cDNAs)
• 87% exact exon prediction, 6% overlapping
• 63% of cDNAs had a perfect prediction without cDNA evidence
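The exon-level figures quoted for the chicken and stickleback builds (exact, within 10 base pairs, overlapping) can be illustrated with a minimal sketch. It assumes exons are plain (start, end) pairs on a single strand and that a held-out reference set is available; the function names, the "near" label and the toy coordinates are hypothetical, and this is not the evaluation code used for the Ensembl gene build.

```python
from collections import Counter
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive coordinates on one strand


def classify_exon(ref: Exon, predicted: List[Exon], tolerance: int = 10) -> str:
    """Classify one reference exon as exact, near, overlapping or missing."""
    best = "missing"
    for start, end in predicted:
        if (start, end) == ref:
            return "exact"  # both boundaries predicted perfectly
        if abs(start - ref[0]) <= tolerance and abs(end - ref[1]) <= tolerance:
            best = "near"  # both boundaries within the tolerance
        elif best == "missing" and start <= ref[1] and end >= ref[0]:
            best = "overlapping"  # shares at least one base with the reference
    return best


def summarise(reference: List[Exon], predicted: List[Exon],
              tolerance: int = 10) -> Dict[str, float]:
    """Fraction of reference exons falling in each category."""
    counts = Counter(classify_exon(r, predicted, tolerance) for r in reference)
    total = len(reference) or 1  # avoid dividing by zero on an empty set
    return {k: counts.get(k, 0) / total
            for k in ("exact", "near", "overlapping", "missing")}


# Toy data, invented for illustration only.
reference_exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
predicted_exons = [(100, 200), (305, 400), (450, 560), (900, 950)]
summary = summarise(reference_exons, predicted_exons)
```

In this toy case each category receives one of the four reference exons, so every fraction is 0.25.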
19 species currently in Ensembl
8 to be added by the end of the year
* already in pre-site
[Figure: species tree with divergence times at the nodes, on a "Million years" axis running from 1000 to 100. Clade labels: Chordata, Urochordata, Vertebrata, Teleostei, Tetrapoda, Amphibia, Amniota, Aves, Mammalia, Metatheria, Eutheria, Arthropoda, Nematoda, Fungi.]
Eutheria: Human, Chimpanzee, Rhesus macaque, Mouse, Rat, Rabbit *, Dog, Cow, Armadillo *, Elephant *, Tenrec *
Metatheria: Opossum
Aves: Chicken
Amphibia: Xenopus
Teleostei: Zebrafish, Medaka, Stickleback, Tetraodon (GENOSCOPE), Fugu (IMCB)
Urochordata: C. savignyi *, C. intestinalis
Arthropoda: Fever mosquito * (VECTORBASE), Malaria mosquito (VECTORBASE), Fruitfly (FLYBASE), Honey bee
Nematoda: C. elegans (WORMBASE)
Fungi: Yeast (SGD)
Example of the Insulin cluster and data flattening
[Figure: gene tree of the insulin cluster. Relations marked on the tree: one2one, apparent one2one, one2many, many2many. Legend: Duplication node; Speciation node or leaf.]
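How a gene tree with duplication and speciation nodes is "flattened" into pairwise relations can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ensembl implementation: two genes are treated as orthologues when their last common ancestor is a speciation node, and the pair is then typed by how many partners each gene has. The `Node` class, the function names and the toy tree are hypothetical, and the "apparent one2one" case is not handled.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    kind: str  # "speciation", "duplication" or "leaf"
    children: List["Node"] = field(default_factory=list)
    species: Optional[str] = None  # set on leaves only
    gene: Optional[str] = None     # set on leaves only


def leaves(node: Node) -> List[Node]:
    """All leaf nodes below (or equal to) the given node."""
    if node.kind == "leaf":
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def ortholog_pairs(root: Node, sp_a: str, sp_b: str) -> Set[Tuple[str, str]]:
    """Pairs (gene in sp_a, gene in sp_b) whose last common ancestor is a speciation."""
    pairs: Set[Tuple[str, str]] = set()

    def visit(node: Node) -> None:
        if node.kind == "speciation":
            groups = [leaves(child) for child in node.children]
            # Leaves from different children meet for the first time at this node.
            for i, left in enumerate(groups):
                for right in groups[i + 1:]:
                    for x, y in product(left, right):
                        if (x.species, y.species) == (sp_a, sp_b):
                            pairs.add((x.gene, y.gene))
                        elif (x.species, y.species) == (sp_b, sp_a):
                            pairs.add((y.gene, x.gene))
        for child in node.children:
            visit(child)

    visit(root)
    return pairs


def orthology_types(pairs: Set[Tuple[str, str]]) -> Dict[Tuple[str, str], str]:
    """Label each pair one2one, one2many or many2many from partner counts."""
    partners_a = Counter(a for a, _ in pairs)
    partners_b = Counter(b for _, b in pairs)
    types = {}
    for a, b in pairs:
        many_a, many_b = partners_a[a] > 1, partners_b[b] > 1
        if many_a and many_b:
            types[(a, b)] = "many2many"
        elif many_a or many_b:
            types[(a, b)] = "one2many"
        else:
            types[(a, b)] = "one2one"
    return types


# Toy tree: one human insulin gene, two mouse copies from a later duplication.
toy_tree = Node("speciation", [
    Node("leaf", species="human", gene="INS"),
    Node("duplication", [
        Node("leaf", species="mouse", gene="Ins1"),
        Node("leaf", species="mouse", gene="Ins2"),
    ]),
])
toy_types = orthology_types(ortholog_pairs(toy_tree, "human", "mouse"))
```

In the toy tree both human/mouse pairs come out as one2many, because the single human gene has two mouse partners.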
Gene tree : 1st data assessment
Good concordance with the classical BRH/RHS paired species approach (RHS are based on gene order conservation)
Find more complex one-to-many and many-to-many relations

Human/Mouse (19,381)
                    RHS     BRH      NEW
many2many           177     113      1,439
one2many            725     1,309    2,815
one2one             205     10,736   109
apparent one2one    78      1,571    104
lost and totals (cell placement not recoverable from the slide text): 19,001, 2,060, 2,027

Human/Drosophila (11,443)
                    BRH     NEW
many2many           170     1,599
one2many            1,870   4,563
one2one             880     80
apparent one2one    2,040   241
lost                620
BRH total           5,580

To do: compare with ~1000 curated trees from TreeFam
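For contrast with the gene-tree relations, the classical BRH pairing can be sketched in a few lines. This is a minimal illustration that assumes a single symmetric similarity score per gene pair. The score table and gene names are invented, and nothing here reflects the actual Ensembl pipeline. Because each gene keeps at most one partner, this approach cannot report one-to-many or many-to-many relations.

```python
from typing import Dict, List, Tuple

# (gene in species A, gene in species B) -> similarity score (higher is better)
Scores = Dict[Tuple[str, str], float]


def best_reciprocal_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Return pairs where each gene is the other's top-scoring hit."""
    best_for_a: Dict[str, str] = {}
    best_for_b: Dict[str, str] = {}
    for (a, b), score in scores.items():
        # Track the best-scoring partner seen so far in each direction.
        if a not in best_for_a or score > scores[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or score > scores[(best_for_b[b], b)]:
            best_for_b[b] = a
    return sorted((a, b) for a, b in best_for_a.items() if best_for_b[b] == a)


# Invented scores for illustration only.
toy_scores: Scores = {
    ("hA", "mA"): 950.0,
    ("hA", "mB"): 400.0,
    ("hB", "mB"): 870.0,
    ("hC", "mB"): 860.0,
    ("hC", "mC"): 300.0,
}
brh_pairs = best_reciprocal_hits(toy_scores)
```

Here hC's best hit is mB, but mB's best hit is hB, so hC ends up unpaired even though it is nearly as similar to mB as hB is.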
Example of AlignSliceView between
Human/Mouse/Rat/Dog with MLAGAN
Transcript
SNP View
Ensembl Outreach
How do you get it?
• www.ensembl.org
– Pretty pictures for genomes and genes
– Web based data mining
• Open MySQL server - ensembldb
– Script across the internet in Perl, Java or Python
– 100% consistent semantics between genomes
• Extend via DAS
– At genome, protein or “gene” levels
• Full download
– Extend in house, run in-house DAS servers
• Send someone to us (geek for a week)
• Bring over Xosé to run a course (only travel costs need to be covered)
• Email [email protected] for more info.
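Scripting against the open MySQL server from Python might look like the sketch below. Several things are assumptions rather than statements from the slides: the host name `ensembldb.ensembl.org`, the `anonymous` user, port 3306, the `<species>_core_<release>` database naming, and the use of the third-party PyMySQL driver. The helper names are hypothetical. Nothing connects until `list_core_databases` is called.

```python
from typing import Dict, List


def connection_params(host: str = "ensembldb.ensembl.org",
                      user: str = "anonymous",
                      port: int = 3306) -> Dict[str, object]:
    """Connection settings for the public server (assumed values, see lead-in)."""
    return {"host": host, "user": user, "port": port}


def core_database_pattern(species: str) -> str:
    """SQL LIKE pattern for a species' core databases (assumed naming scheme)."""
    return species.strip().lower().replace(" ", "_") + "_core_%"


def list_core_databases(species: str) -> List[str]:
    """List matching core databases; needs network access and PyMySQL."""
    import pymysql  # third-party driver, imported lazily so the sketch loads without it

    conn = pymysql.connect(**connection_params())
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW DATABASES LIKE %s", (core_database_pattern(species),))
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()


human_pattern = core_database_pattern("Homo sapiens")
```

A call such as `list_core_databases("Homo sapiens")` would then return whatever core databases the server exposes at that moment.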
The ENCODE project
1% of the human genome
The Kitchen Sink of experimental methods
Protein coding loci are far more complex than we think
• On average 5 transcripts per locus
• Many do not encode proteins (as far as we can see)
• Even among those that do encode proteins, many of the proteins look “weird”
The Clade B Serpins
[Figure: serpin structures. Panel labels: (a) inactive, "stressed"; (b) active (beta inserted); (c); (d); (e); (f). Annotations: "Potential", "Missing fragments".]
Parsing the regulatory code
E2F2
PolII
H3K4Me3
Myc
Chromatin marks, Polymerase
In vivo Transcription Factors
[Diagram: functional genomics (FuncGen) data flow. Labels: Nimblegen Data; Tab2MAGE; Local; MAGE-ML; Import API (three times); Client; FuncGen DB (Archive?); Export DAS; API; Processed Data; Mirror; FuncGen DB (& Results?); Export API; Web API?; FuncGen Results DB?; Analysis Pipeline API; ?]
Browsers:
Wiggle Plot
Histone Glyphs
Raw Data?
Reactome
Pathways…
Insulin binds the insulin receptor, causing it to
dimerise. The dimerised form then autophosphorylates
on 6 cytoplasmic tyrosines. This phosphorylated
form recruits the IRS adaptor....
Reactome data model
[Diagram labels: Insulin Peptide; Insulin Receptor; x2; Reaction; Insulin Receptor dimer complex; CatalystActivity; GO:phosphorylation; PubMed:1543; Reaction; Insulin Receptor dimer complex, P-Tyr on 67...; Insulin Receptor Signalling; PubMed:5623]
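The data model in the diagram (entities, reactions that consume and produce them, a catalyst activity carrying a GO term, literature references, and a pathway grouping the reactions) can be sketched with plain dataclasses. This is an illustrative reading of the slide labels, not Reactome's actual schema. The class and field names are hypothetical, and which PubMed label attaches to which object is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    name: str  # a physical entity such as a peptide or a complex


@dataclass
class Reaction:
    name: str
    inputs: List[Entity]
    outputs: List[Entity]
    catalyst_activity: Optional[str] = None  # GO term label, as on the slide
    literature: List[str] = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    reactions: List[Reaction] = field(default_factory=list)
    literature: List[str] = field(default_factory=list)

    def is_chained(self) -> bool:
        """True if each reaction consumes an output of the one before it."""
        return all(
            any(out in nxt.inputs for out in prev.outputs)
            for prev, nxt in zip(self.reactions, self.reactions[1:])
        )


insulin = Entity("Insulin peptide")
receptor = Entity("Insulin receptor")
dimer = Entity("Insulin receptor dimer complex")
phospho_dimer = Entity("Insulin receptor dimer complex, P-Tyr")

binding = Reaction(
    name="Insulin binds receptor, receptor dimerises",
    inputs=[insulin, receptor, receptor],  # "x2" read as two receptor copies (assumed)
    outputs=[dimer],
    literature=["PubMed:1543"],  # placement of this label is assumed
)
autophosphorylation = Reaction(
    name="Dimer autophosphorylates",
    inputs=[dimer],
    outputs=[phospho_dimer],
    catalyst_activity="GO:phosphorylation",
)
signalling = Pathway(
    name="Insulin Receptor Signalling",
    reactions=[binding, autophosphorylation],
    literature=["PubMed:5623"],  # placement of this label is assumed
)
```

Keeping entities immutable and comparing them by name lets the pathway check that the output of one reaction is the input of the next, which is the property the narrative description of insulin signalling relies on.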
Lineage Deletion rates
Insulin Signalling
Trp Catabolism
DNA Repair
Redundant Paths
Pathway modules
Head or Tail
Back to Proteins
Proteins are the natural Hub
Regulation
Genome
Variation
Proteins
Pathways
Structures
Literature
Thanks: Ensembl
Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)
Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Jan-Hinnerk Vogel, Kevin Howe, Felix Kokocinski, Simon White
Database Schema and Core API: Glenn Proctor, Ian Longden, Craig Melsopp, Patrick Meidl
BioMart: Arek Kasprzyk, Syed Haider, Richard Holland, Damian Smedley
Distributed Annotation System (DAS): Andreas Kähäri, Eugene Kulesha
Outreach: Xosé M Fernández, Bert Overduin, Michael Schuster, Giulietta Spudlich
Web Team: James Smith, Fiona Cunningham, Anne Parker, Stephen Rice, Steve Trevanion, Matt Wood
Comparative Genomics: Abel Ureta-Vidal, Benoit Ballester, Kathryn Beal, Stephen Fitzgerald, Javier Herrero, Albert Vilella
Functional Genomics + Variation: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios
Zebrafish Annotation: Kerstin Jekosch, Mario Caccamo
Systems & Support: Guy Coates, Tim Cutts
Thanks: Reactome and Consortia
Reactome: EBI and CSHL
Reactome @ EBI: Ewan Birney, Imre Vastrik, Esther Schmidt, Bernard de Bono, Bijay Jassal
Reactome @ CSHL: Lincoln Stein, Peter D’Eustachio, Gopal Gopinathrao, Guanming Wu, Lisa Matthews, Marc Gillespie
ENCODE: 40 groups worldwide
Leaders: Zhiping Weng (BU), Mike Snyder (Yale), John Stam. (U. Wash), Roderic Guigo (Barcelona), Tom Gingeras (Affy), Elliott Margulies (NIH), Anindya Dutta (Duke), Manolis Dermitzakis (Sanger)
BioSapiens: 20 groups across Europe
Structural work of ENCODE: Alfonso Valencia (Madrid), Michael Tress (Madrid), Janet Thornton (EBI), Gabby Logan (EBI)