Genomics Human Genome Evolution 2012

DUF1220 Domains & the Search for the
Genes that Made Us Human
James M. Sikela, Ph.D.
Human Medical Genetics, Neuroscience, &
Comparative Genomics Programs,
Department of Biochemistry & Molecular Genetics,
University of Colorado School of Medicine
Genomics Course
February 28, 2012
Key Points
• First gene-based and first genome-wide study
of lineage-specific gene duplication and loss in
human and primate evolution
• Dramatic human-specific increase in copy
number of DUF1220 protein domains
• DUF1220 copy number linked to evolution of
brain size
• Selection of evolutionarily adaptive genome
sequences may be driving disease, e.g. 1q21.1
Primate Evolution
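Approximate divergence times (MYA = million years ago):
• Bonobo/Chimpanzee (B/C) = ~2 MYA
• Chimpanzee/Human (C/H) = ~5 MYA
• Human-Chimpanzee/Gorilla (HC/G) = ~8 MYA
• Human-Chimpanzee-Gorilla/Orangutan (HCG/O) = ~13 MYA
• Human-Chimpanzee-Gorilla-Orangutan/Gibbons (HCG/O/Gib) = ~20 MYA
• Hominoids/Old World Monkeys (Hom/OWM) = ~25 MYA
• Hominoids and Old World Monkeys/New World Monkeys (HomOWM/NW) = ~40 MYA
Species shown: Human, Chimpanzee, Bonobo, Gorilla, Orangutan, Gibbons,
Old World Monkeys (e.g. baboon, rhesus, etc.),
New World Monkeys (e.g. squirrel monkey, spider monkey)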
More Primates!
---- something has changed!
Human Characteristics
• Body shape and thorax
• Cranial properties
(brain case and face)
• Small canine teeth
• Skull balanced upright
on vertebral column
• Reduced hair cover
• Enhanced sweating
• Dimensions of the
pelvis
• Elongated thumb and
shortened fingers
• Relative limb length
• Neocortex expansion
• Enhanced language &
cognition
• Advanced tool making
modified from S. Carroll, Nature, 2005
Reports of “human-specific” genes
• FOXP2
– Mutated in family with language disability
• ASPM/MCPH
– Mutated in individuals with microcephaly
• HAR1F
– Gene sequence highly changed in humans
• DUF1220 protein domains
– Highly increased in copy number in humans;
expressed in important brain regions
Molecular Mechanisms Underlying
Genome Evolution
• Single nucleotide substitutions
- change gene expression & structure
• Genome rearrangements
• Gene duplication
- copy number change: gene dosage
- redundancy as a facilitator of
innovation
Gene Duplication & Evolutionary Change
•“There is now ample evidence that gene
duplication is the most important
mechanism for generating new genes and
new biochemical processes that have
facilitated the evolution of complex
organisms from primitive ones.”
- W. H. Li in Molecular
Evolution, 1997
•“Exceptional duplicated regions underlie
exceptional biology”
- Evan Eichler, Genome
Research 11:653-656, 2001
Interhominoid cDNA Array-Based
Comparative Genomic Hybridization (aCGH)
Fig 1. Measuring genomic DNA
copy number alteration using
cDNA microarrays (array CGH).
Fluorescence ratios are
depicted in a pseudocolor scale,
such that red indicates
increased, and green
decreased, gene copy number
in the test (right) compared to
reference sample (left).
Experimental Design
• Carry out pairwise cDNA aCGH comparisons
between human and other hominoid species
• Use a >39,000 cDNA microarray representing
>29,000 human genes
• Hybridize human genomic DNA (reference
sequence: cy3/green) and other hominoid genomic
DNAs (test sequence: cy5/red) simultaneously to
the microarray
• Visualize aCGH signals “gene-by-gene” along each
chromosome across five species: human (n=5),
bonobo (n=3), chimpanzee (n=4), gorilla (n=3) and
orangutan (n=3)
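The ratio readout behind this design can be sketched in a few lines. This is only an illustration: the two-fold cutoffs (0.5 and 2) and the function names are assumptions made for the example, not the calling criteria used in the study. Human DNA is the reference (cy3) and the other hominoid DNA is the test (cy5), so a ratio above 1 means more copies in the ape than in human.

```python
import math

# Hypothetical two-fold cutoffs chosen for illustration only;
# the study's actual calling criteria are not given in these slides.
LOSS_RATIO = 0.5
GAIN_RATIO = 2.0


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """Return log2(test / reference) for one array spot."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("fluorescence signals must be positive")
    return math.log2(test_signal / reference_signal)


def call_copy_number(test_signal: float, reference_signal: float) -> str:
    """Classify a spot as gain, loss, or no change in the test sample."""
    ratio = test_signal / reference_signal
    if ratio >= GAIN_RATIO:
        return "gain"
    if ratio <= LOSS_RATIO:
        return "loss"
    return "no change"
```

Working on the log2 scale makes gains and losses symmetric around zero, which is why array data are usually plotted that way.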
Whole Genome Caryoscope Image of Interhominoid aCGH Data
Human & Great Ape Genes Showing Lineage-Specific Copy Number Gain/Loss
Fortna, et al, PLoS Biol. 2004
Summary of Human/Primate
ArrayCGH Results
• First genome-wide and first gene-based aCGH
comparison of human and nonhuman primate gene copy
number variation (Fortna, et al 2004)
• 1,004 (4,159) genes identified that showed lineage-specific changes in copy number
• Time machine of evolutionary copy number change
• Gene candidates to underlie lineage-specific traits
• Genes identified represent most of the major lineage-specific gene duplications
and losses over the last 60 million years of human and primate evolution
(Dumas, et al 2007)
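One way to picture a "lineage-specific" call is as a consistency check across the pairwise comparisons. The sketch below is a simplified assumption, not the published method: the function name, the cutoffs, and the example ratios are all hypothetical. Because human DNA is the reference channel, a human-specific gain shows up as a low test/reference ratio in every ape comparison.

```python
from typing import Dict, Optional


def human_specific_change(
    ratios_by_species: Dict[str, float],
    gain_cutoff: float = 2.0,   # hypothetical cutoff
    loss_cutoff: float = 0.5,   # hypothetical cutoff
) -> Optional[str]:
    """Flag a gene whose copy number differs between human and every ape.

    ratios_by_species maps each ape to its test/reference ratio,
    with human as the reference sample.
    """
    ratios = list(ratios_by_species.values())
    if not ratios:
        return None
    if all(r <= loss_cutoff for r in ratios):
        # Every ape has fewer copies than human: gain on the human lineage.
        return "human-specific gain"
    if all(r >= gain_cutoff for r in ratios):
        # Every ape has more copies than human: loss on the human lineage.
        return "human-specific loss"
    return None


# Made-up ratios for a single gene, for illustration only.
example = {"bonobo": 0.4, "chimpanzee": 0.45, "gorilla": 0.3, "orangutan": 0.35}
result = human_specific_change(example)
```

The same pattern, restricted to other subsets of species, would flag changes specific to other lineages.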
Figure: whole-genome Caryoscope view of interhominoid aCGH data, plotted gene-by-gene
along each chromosome (1-22, X and Y; position in Mb, with cytogenetic bands).
Species shown: Human (Homo sapiens), Bonobo (Pan paniscus), Chimpanzee (Pan troglodytes),
Gorilla (Gorilla gorilla), Orangutan (Pongo pygmaeus).
Color scale: Test/Reference ratio from ≤0.5 to ≥2.
Human & Great Ape Genes Showing Lineage-Specific Copy Number Gain/Loss
Fortna, et al, PLoS Biol. 2004
“This (Fortna, et al, 2004) is the first time
that copy number changes among apes have
been assayed for the vast majority of
human genes, and we can expect that the
biological consequences of the 140 human-specific
copy number changes identified in
this study will be heavily investigated over
the coming years.”
---M. Hurles, PLoS Biol. 2004
DUF1220
Repeat Unit
Popesco, et al, Science 2006
InterPro-predicted DUF1220-containing proteins (NBPF family*)
*Vandepoele, et al, Mol. Biol. & Evol, 2005
Figure: Copy Number of DUF1220 (Q8IX62/17-33) Sequences in Primate Species.
Bar chart of Q-PCR Predicted Copy Number (y-axis, 0 to 70) for Baboon, Macaque,
Gibbon, Orangutan, Gorilla, Chimp, Bonobo and Human.
Summary of aCGH, Q-PCR and
BLAT results:
• DUF1220 domains are highly amplified in
human, reduced in great apes, further
reduced in Old & New World monkeys,
single or low copy in non-primate mammals
and absent in non-mammals
DUF1220 copy number in Animal Genomes
Genome           PDE4DIP DUF1220   Total DUF1220   NBPF genes
Euarchontoglires
Human            2                 268             21
Chimp            3                 125             15
Gorilla          3                 99              15
Orangutan        4                 92              11
Macaque          1                 35              10
Marmoset         1                 30              10
Rabbit           1                 8               3
Mouse            1                 1               0
Rat              1                 1               0
Guinea Pig       1                 1               0
Laurasiatheria
Cow              1                 6               2
Pig              1                 3               1
Horse            1                 8               3
Dog              1                 3               1
Panda            1                 2               1
Afrotheria
Elephant         1                 1               1
Metatheria
Opossum          1                 1               0
Prototheria
Platypus         1                 1               0
Other Vertebrates
Chicken          0                 0               0
Lizard           0                 0               0
Frog             0                 0               0
Zebrafish        0                 0               0
A total of 40 genomes were searched, but only the 22 with 4X
coverage or higher are displayed.
DUF1220 Copy Number Statistics in hg19 build
                                                             DUF1220 Copies
Total in Human Genome                                        272
Total amplified HLS DUF1220 Triplets                         129
Total DUF1220 in Last Common Ancestor of Homo/Pan            102
Total of Newly Added Copies in Human Lineage                 167
Total Copies Added via Domain Amplification                  146
Total Copies Added via Gene Duplication                      21
Average Number Added to Human Lineage every million years    28
This table shows the unprecedented DUF1220 copy number increase in
the human lineage. The primary mechanism for this expansion was
domain amplification via hyper-amplification of the HLS DUF1220 triplet.
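The figures in this table can be cross-checked with simple arithmetic. The sketch below uses only numbers stated on the slide; the variable names are illustrative, and the time span is not given on the slide, so it is derived here from the reported rate rather than assumed.

```python
# Values taken from the hg19 statistics table on this slide.
added_by_domain_amplification = 146
added_by_gene_duplication = 21
reported_rate_per_million_years = 28

# The two mechanisms together account for the newly added copies (167).
newly_added = added_by_domain_amplification + added_by_gene_duplication

# Fraction of new copies that arose by domain amplification.
share_domain_amplification = added_by_domain_amplification / newly_added

# Time span implied by the reported average rate, in millions of years.
implied_span_million_years = newly_added / reported_rate_per_million_years

print(f"Newly added copies: {newly_added}")
print(f"Share from domain amplification: {share_domain_amplification:.0%}")
print(f"Implied time span: {implied_span_million_years:.1f} million years")
```

Domain amplification accounts for roughly seven of every eight added copies, and the reported rate of 28 per million years corresponds to a span of about 6 million years for the human lineage.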
Sequences encoding DUF1220 domains
• Show a major copy number burst in primates
• Are increasingly amplified generally as a
function of a species' evolutionary proximity
to humans, where the greatest number of
copies (270) is found
• Show signs of positive selection
• Are highly expressed in brain regions
associated with higher cognitive function
• In brain show neuron-specific expression
preferentially in cell bodies and dendrites
Popesco, et al, Science 2006
1q21.1 Deletions* Linked to Microcephaly
1q21.1 Duplications* Linked to Macrocephaly
•Recurrent Reciprocal 1q21.1 Deletions and Duplications Associated
with Microcephaly or Macrocephaly and Developmental and
Behavioral Abnormalities
Brunetti-Pierri, et al, Nature Genetics 2008
•Recurrent Rearrangements of Chromosome 1q21.1 and Variable
Pediatric Phenotypes
Mefford, et al, N. Engl. J. Med. 2008
*Implies human brain size directly related to the dosage of one or
more genes in these 1q21.1 CNVs
We note that these CNVs encompass or are immediately flanked by
DUF1220 sequences (Dumas & Sikela, Cold Spring Harbor Symposium
Quant. Biol., 2009)
DUF1220/NBPF Sequences & Recurrent Disease-associated 1q21.1 CNVs
Association (p<0.0001) of human head circumference
(FOC Z-score) & DUF1220 copy number
Figure: Head Circumference (FOC Z-Score) vs. DUF1220 Copy Number.
Scatter plot of FOC Z-Score (y-axis, -6 to 6) against Q-PCR-Predicted DUF1220
Copy Number (x-axis, 20 to 80); points grouped as Class I Deletion,
Class II Deletion and Duplication.
Copy number of genes in
the 1q21.1-q21.2 region
versus brain size
• 46 1q21.1 genes compared
along with brain size across 5
primate species
• DUF1220 shows the most
dramatic human-specific copy
number increase.
• The evolutionary increase in
DUF1220 copy number
parallels the increase in brain
size.
                 Human   Chimp   Orangutan   Macaque   Marmoset
Brain Size (g)   1350    380     390         88        7
Copy #
DUF1220          272     125     92          35        30
PPIAL4           5       1       1           0         0
LOC728855        5       2       2           2         1
FAM72D           2       0       0           0         0
SRGAP            1       0       0           0         0
PDE4DIP          3       3       4           1         1
SEC22B           1       1       1           1         1
NOTCH2NL         1       1       1           1         1
HFE2             1       1       1           1         1
TXNIP            1       1       1           1         1
POLR3            2       2       2           2         2
ANKRD34          1       1       1           1         1
ANKRD35          1       1       1           1         1
LIX1L            1       1       1           1         1
RBM8A            1       1       1           1         1
GNRHR2           1       1       1           1         1
PEX11B           1       1       1           1         1
ITGA10           1       1       1           1         1
NUDT17           1       1       1           1         1
RNF115           1       1       1           1         1
CD160            1       1       1           1         1
PDZK1            3       1       1           1         1
GPR89            3       1       1           1         1
PRKAB2           1       1       1           1         1
PDIA3P           1       1       1           1         1
FMO5             1       1       1           1         1
CHD1L            1       1       1           1         1
BCL9             1       1       1           1         1
ACP6             1       1       1           1         1
GJA5             1       1       1           1         1
GJA8             1       1       1           1         1
LOC645166        1       0       0           0         0
FCGR1            2       1       1           1         1
SV2A             1       1       1           1         1
BOLA1            1       1       1           1         1
MTMR11           1       1       1           1         1
OTUD7B           1       1       1           1         0
SF3B4            1       1       1           1         1
VPS45            1       1       1           1         1
PLEKHO1          1       1       1           1         1
ANP32E           1       1       1           1         1
PRPF3            1       1       1           1         1
C1orf54          1       1       1           1         1
MRPS21           1       1       1           1         1
CA14             1       1       1           1         1
C1orf51          1       1       1           1         1
APH1A            1       1       1           1         1
DUF1220 Copy Number Versus Brain Size
* Neandertal DUF1220 copy number is an estimate based on sequence read
depth from the Neandertal genome (Green et al 2010).
-but correlation is not causation
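The parallel between DUF1220 copy number and brain size can be put into a single number with a Pearson correlation. The sketch below uses the five species values given in the 1q21.1 gene table (human, chimp, orangutan, macaque, marmoset); the function and variable names are illustrative. With only five data points, and with species that are not statistically independent because of shared ancestry, the coefficient is descriptive only.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Order: human, chimp, orangutan, macaque, marmoset (values from the slides).
duf1220_copies = [272, 125, 92, 35, 30]
brain_size_g = [1350, 380, 390, 88, 7]

r = pearson_r(duf1220_copies, brain_size_g)
print(f"Pearson r = {r:.3f}")
```

A high coefficient here restates the slide's observation and, as the slide notes, says nothing about causation.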
Factors that must be reconciled with model linking
1q21.1 instability, evolutionary adaptation &
recurrent disease
• Evolutionarily rapid DUF1220 copy number increase
– Estimate, on average, 28 more DUF1220 domains
added to human genome every 1 million years since
Homo/Pan split
• Underlying mechanism must account for continued,
recurrent DUF1220 increases
• Underlying mechanism must account for excess of
1q21.1 disease-associated CNVs containing dosage-sensitive genes
Proposed Mechanism Linking
DUF1220, Brain Evolution and Disease
Diagram elements:
• Increased 1q21.1 Instability
• Increase in DUF1220 Copy Number: Evolutionary Advantage (Increase in Brain Size?)
• 1q21.1 duplications: Macrocephaly; Autism*
• 1q21.1 deletions: Microcephaly; Schizophrenia*
*Diseases proposed as “Diametric
Opposites” (including brain size), Crespi,
Stead & Elliot, PNAS, 2009
DUF1220 Model*
DUF1220 model proposes that:
1) DUF1220 copy number is directly involved in
influencing human brain size, and
2) the evolutionary advantage of rapidly
increasing DUF1220 copy number in the
human lineage has resulted in favoring
retention of the high genomic instability of
the 1q21.1 region which, in turn, has
precipitated a spectrum of recurrent human
brain and developmental disorders
*Dumas & Sikela, Cold Spring Harbor
Symposium Quant. Biol., 2009
Concluding Thoughts
• DUF1220 domains show the largest HLS
protein coding copy number increase in the
genome
– But no one gene made us human
– DUF1220 genotyping challenges
• We know more about our genome than ever
– But there are vast areas of our genome
about which we know virtually nothing
– No mammalian genome has been completely
sequenced
Acknowledgements
Sikela Lab
Laura Dumas
Majesta O’Bleness
Maggie Popesco
Erik MacLaren
Andy Fortna
Jan Hopkins
Jonathon Keeney
Jack Davis
Jay Jackson
Megan Sikela
Michael Cox
Kriste Marshall
Matt Brenton
Sonya Burgers
Raquel Hink
Erin Dorning
Park McNair
Collaborators
Stanford
– Jon Pollack
– Young Kim
Univ. of Kansas
– Gerald Wyckoff
Univ. of Utah
– Lynn Jorde
Baylor College
– Pawel Stankiewicz
– Sau Wai Cheng
UCSOM
– Epidemiology
• Tasha Fingerlin
– Preventive Medicine &
Biometrics
• Anis Karimpour-Fard
– Neuroscience Program
• Rock Levinson
• John Caldwell
A Walk Through Our Genome
--All regions of the genome are not created equal