Genome Organization and Evolution
Download
Report
Transcript Genome Organization and Evolution
09_25_Chromosome22.jpg
Prokaryotes (Ch. 27)
characteristics
genome organization
reproduction
metabolism
representative groups
importance
All the strands
are part of the
one
chromosome,
making a loop.
Fig. 27.8. The single chromosome (DNA +
proteins) of a prokaryote contains all the directions
for making a living cell. The chromosome of a
prokaryopte is not organized into a nucleus,
although it is generally located in one part of the
cell).
Prokaryote genomes
●
Example: E. coli
●
89% coding
●
4,285 genes
●
122 structural RNA genes
●
Prophage remains
●
Insertion sequence elements
●
Horizontal transfers
Prokaryotic genome organization:
• Haploid circular genomes (0.5-10 MB, 500-10000 genes)
• Operons: polycistronic transcription units
• Environment-specific genes on plasmids and other types of
mobile genetic elements
• Usually asexual reproduction, great variety of
recombination mechanisms
• Transcription and translation take place in the same
compartment
EPFL Bioinformatics I – 09 Jan 2006
Some important concepts from Chapter 1 that you should review from
your high school biology course
1. Fig. 1.2 properties of life
2. Fig. 1.3 - levels of
biological
organization
3. A closer look at
cells - eukaryotic
and prokaryotic
(Fig. 1.8).
Prokaryotes often
also have a cell
wall around the
membrane.
Eukaryotic genome
●
Example: C. elegans
●
10 chromosomes
●
19,099 genes
●
Coding region – 27%
●
Average of 5 introns/gene
●
Both long and short duplications
Eukaryotic genome organization
•
•
Multiple genomes: nuclear, plastid genomes: mitochondria, chloroplasts
Plastid genomes resemble prokaryotic genomes
More about the nuclear genome:
•
•
•
•
•
•
•
•
Multiple linear chromosomes, total size 5-10’000 MB, 5000 to 50000
genes
Monocistronic transcription units
Discontinuous coding regions (introns and exons)
Large amounts of non-coding DNA
Transcription and translation take place in different compartments
Variety of RNA genes: rRNA, tRNA, snRNA (small nuclear), sno (small
nucleolar), microRNAs, etc.
Often diploid genomes and obligatory sexual reproduction
Standard mechanism of recombination: meiosis
EPFL Bioinformatics I – 09 Jan 2006
207
21
Complete genomes
>1387 projects
261 published
(01-03-05)
• 654 prokaryotes
33
• 472 eukaryotes
http://www.genomesonline.org/
Reconstructing Life’s Tree
●
Using all of the available information, we can
reconstruct relationships between organisms back
to the earliest forms of life
Hierarchy of gene organization
Gene – single unit of genetic function
Operon – genes transcribed in single transcript
Regulon – genes controlled by same
regulator
Modulon – genes modulated by
same stimilus
** order of ascending
complexity
Element – plasmid, chromosome,phage
Genome
Number of available completely sequenced
genomes
300
261
270
240
224
210
180
165
150
116
120
90
71
60
42
30
0
2
5
1
2
95 96
12
3
19
4
97 98
24
5
99
6
00
7
8
01 02
9
10
03 04
11
03-05
Huge amount of data
with exponential increase
Genomes to date
Polymorphic repeats
●
Can be found anywhere in the genome
●
The longer the repeat, the more polymorphic it is
●
●
Are the basis for DNA fingerprinting & paternity
testing
In silico estimation of potentially polymorphic
repeats suggest > 100,000 in the human genome
Polymorphic repeats in genes
●
●
●
●
Are found in genes less frequently
Have been implicated in several inherited genetic
disorders
Mutations at a rate of up to 100,000x that of SNPs
Have also been linked to roles in development,
cancer & morphology
First microbial genome was completely sequenced
in 1995 by The Institute for Genomic Research
(TIGR)
Genome of
Haemophilus
influenzae Rd
single circular
chromosome
1,860,137 bp
Outer circle – coding
sequences with
database matches
40% of genes at the
time had no match in
the databases
Fleishmann, R.D. et al. 1995. Science 269:496-512.
E. coli K12
Orange squares
-forward genes
Yellow squares
-complement genes
Red arrows
-rRNA
Green arrows
-tRNA
White ring
-REP sequences
Blue ring
-bacteriophage
protein similarity
Sunburst
-CAI (codon
adaptation index)
Blattner et al. 1995. Science 277:1453-
Comparative
genomics
Multipartite
genomes
Circular and
linear DNA
elements
Huge degree of
variation
Microorganism
Chromosome
(kbp)
Plasmid
(kbp)
Genome rRNA
(kbp)
Escherichia coli K12 MG1655*
4640
-
4640
7
Haemophilus influenzae Rd*
1830
-
1830
6
Burkholderia cepacia ATCC 25416
3650 +3170 +
1070
200
8090
4+1+1
Agrobacterium tumefaciens
3000 + 2100
(L)
200, 450
5750
1+1
Rhizobium meliloti
3400
1340, 1700
6440
3
Rhodobacter sphaeroides
3045 + 915
31, 42, 63, 97,
105, 110
4408
1+2
Thiobacillus cuprinus
3800
50 (L)
3850
1
Myxococcus xanthus
9455
-
9455
4
Bacillus subtilis*
4215
-
4215
10
Mycoplasma genitalium*
580
-
580
1
Bacillus cereus Fo837/76
2400
40, 230, 260,
360, 760, 960
5010
3
Streptomyces coelicolor
8000 (L)
350 (L)
8350
6
Synechocystis sp. strain PCC 6803*
3820
2.2, 5.2, 50,
100
3977
2
Borrelia burgdorferi B31*
910 (L)
9-53 (L and C)
>1250
2
Methanococcus jannaschii*
1660
16, 58
1734
2
Haloferax volcanii
2920
6.4, 86, 442,
690
4144
2
Some important concepts from Chapter 1 that you should review from
your high school biology course
1. Fig. 1.2 properties of life
2. Fig. 1.3 - levels of
biological
organization
3. A closer look at
cells - eukaryotic
and prokaryotic
(Fig. 1.8).
Prokaryotes often
also have a cell
wall around the
membrane.
Human Genome Organization
From: Dr Finbarr Hayes lec
HUMAN GENOME
Nuclear genome
3000 Mb
65-80000 genes
30%
Genes and generelated sequences
70%
Extragenic
DNA
Mitochondrial genome
16.6 kb
37 genes
Two rRNA
genes
Unique or moderately repetitive
10%
90%
Coding
DNA
Pseudogenes
Noncoding
DNA
Gene
fragments
Introns,
untranslated
sequences, etc.
22 tRNA
genes
13 polypeptideencoding genes
80%
20%
Unique or
low copy
number
Moderate to
highly
repetitive
Tandemly
repeated
or clustered
repeats
Interspersed
repeats
Human genome =
nuclear genome + mitochondrial genome
Mitochondrial genome
HUMAN
NUCLEAR GENOME
24 chromosomes (haploid)
3200 Mbp
30,000 genes
16569 bp
37 genes
H strand
enriched in G
L strand
enriched in C
Organization of the human genome
Limited autonomy of mt genomes
mt encoded
nuclear
NADH dehydrog
7 subunits
>41 subunits
Succinate CoQ red
0 subunits
4 subunits
Cytochrome b-c1 comp
1 subunit
10 subunits
Cytochrome C oxidase
3 subunits
10 subunits
ATP synthase complex
2 subunits
14 subunits
tRNA components
22 tRNAs
none
rRNA components
2 components
none
Ribosomal proteins
none
~80
Other mt proteins none
mtDNA pol, RNA pol
etc.
Human Mitochondrial Genome
Small (16.5 kb) circular DNA
rRNA, tRNA and protein encoding genes (37)
1 gene/0.45 kb
Very few repeats
No introns
93% coding;
Genes are transcribed as multimeric transcripts
Recombination not evident
Maternal inheritance
●
●
What are the mitochondrial genes?
24 of 37genes are RNA coding
–
22 mt tRNA
–
2 mit ribosomal RNA (23S, 16S)
13 of 37 genes are protein coding
(synthethized on ribosomes inside mitochondria)
some subunits of respiratory complexes
and oxidative phosphorylation enzymes
Two overlapping genes encoded by same strand
of mt DNA
(unique example)
Two independent AUG located in Frame-shift to each other,
second stop codon is derived from TA + A (from poly-A)
Mitochondrial codon table
22 tRNA
cover for
60 positions
via
third base
wobble
Human Nuclear Genome
3200 Mb
23 (XX) or 24 (XY) linear chromosomes
30-35,000 genes
1 gene/100kb
Introns in the most of the genes
1,5 % of DNA is coding
Genes are transcribed individually
Repetitive DNA sequences (45%)
Recombination at least once for each chrom.
Mendelian inheritance (X + auto, paternal Y)
Human Genome Organization
From: Dr Finbarr Hayes lec
HUMAN GENOME
Nuclear genome
3000 Mb
65-80000 genes
30%
Genes and generelated sequences
70%
Extragenic
DNA
Mitochondrial genome
16.6 kb
37 genes
Two rRNA
genes
Unique or moderately repetitive
10%
90%
Coding
DNA
Pseudogenes
Noncoding
DNA
Gene
fragments
Introns,
untranslated
sequences, etc.
22 tRNA
genes
13 polypeptideencoding genes
80%
20%
Unique or
low copy
number
Moderate to
highly
repetitive
Tandemly
repeated
or clustered
repeats
Interspersed
repeats
Human nuclear genome
Euchromatic portion
3000Mb
Constitutive
heterochromatine
200 Mb
chr Total DNA
1
279
Heterochromatin
2
251
is distributed
104
between chromosomes 16
unevenly
17
88
21
45
Hetero DNA
30
3
15
3
11
Gene-poor chromosomes
(With extra heterochromatin)
Short arms of acrocentric chromosomes
–13,
–14,
–15,
–21,
–22
Part of long arms of chr 1,9,16
Long arm of chromosome Y
Human genome base content
●
41% CG in average
38% CG for chromosomes 4 and 13
49% for chromosome 19
●
Regions with wide swings in GC content
(e.g. from 33,1% to 59,3%)
GC content is correlated with Giemsa staining;
Genes correlated too.
Gene density correlates with higher GC content
Location of CpG islands in the gene
CpG islands do NOT have a deficit
of CpG dinucelotides
REPEATS!!!!
3 Main Components in Eukaryotic Genomes
REPEATS
DNA purified from a human,
do not self-anneal
as a simple sigmoidal curve.
Instead we see a curve
which is the sum of the reannealings
of many different components
CoT curve
is a measure of
sequence complexity
NO REPEATS
C0 = the initial concentration of nucleotides, T – time in seconds
Satellite DNA is repetitive DNA that could be
separated by buoyant density
Equilibrium
density
gradient
centrifugation
Sheared DNA
in Cesium Chloride
gradient
Satellite DNA
Alpha –satellite
(Centromere DNA)
Microsatellites
Minisatellites
Are you still remember what it is?
If not please refer to previous lectures and to the book
Repetitive DNA
●
●
Moderately repeated DNA
–
Tandemly repeated rRNA, tRNA and histone genes (gene products
needed in high amounts)
–
Large duplicated gene families
–
Mobile DNA (transposons)
Simple-sequence DNA
–
Tandemly repeated short sequences
–
Found in centromeres and telomeres (and others)
= (MINI and MICROSATELLITES)
Genetic maps
●
●
●
Variable number tandem repeats (VNTRs –
minisatellites), 10-100 bp, are a sort of genetic
fingerprint
Short tandem repeat polymorphisms (STRPs –
microsatellites), 2-5 bp, are another kind of
marker
A sequence tagged site (STS), 200-600 bp, is a
known unique location in the genome
Human Mobile DNA (transposons)
●
Moves within genome
●
LINE (Long interspersed nuclear elements)
●
●
●
–
L1, L2, L3 LINE is ~21% of human DNA (~1,00,000 copies)
SINE (Short interspersed nuclear elements)
–
Alu is ~10,7% of human DNA (1,200, 000 copies)
–
MIR, MIR3 is 3% of hum DNA (500,000 copies)
LTR elements (Long Terminal Repeats)
–
ERV and MalR are 8% of human DNA (500,000 copies)
Transposons
–
MER1 (Charlie), MER2 (Tigger), others
(350, 000 copies), 2,8% of human DNA
TOTAL: approx; 45% of human DNA
RNA or DNA intermediate
• Transposon moves
using DNA
intermediate
• Retrotransposon
moves using RNA
intermediate
http://www.hos.ufl.edu/mooreweb/
LINEs
and
ERVs
Long interspersed nuclear elements (LINEs )
20% of genome
RNA binding
also endonuclease
Internal
promoter
●
LINE1 – active
(Also many truncated inactive sequences)
●
Line2 – inactive
●
Line 3 – inactive
LINEs prefer AT-rich euchromatic bands
IN everyone’s genome 60-100 copies of LINE1
are still capable of transposing,
and may occasionally cause the disease by gene disruption
Mechanism of LINE repeat jumps
Full length LINE transcript is generated from 5’-UTRbased promoter
5’
3’
ORF1 and ORF2 translated into proteins that stay bound
to LINE mRNA
orf2
5’
3’
orf1
ORF1/ORF2/mRNA complex moves back into the
nucleus
5’
Product of ORF2
3’ cut ds DNA
orf2
orf1
Freed 3’ serves as a primer for
LINE reverse transcription from 3’ UTR
5’
3’ 5’
3’
ORF2 and ORF1 function
●
ORF1 keeps ORF2 and LINE mRNA bound together and retracted
into nucleus
●
ORF2 (endonuclease) cut dsDNA to provide free 3’ end as a
primer to LINE 3’UTR
●
ORF2 (reverse transcriptase)
makes cDNA copy of LINE mRNA, which becomes integrated into
chromosomal DNA
(as it bound to it by former 3’ freed end)
TTTT A is ORF1 cleavage site,
that is why integration prefers AT rich regions
LINE replication is not very efficient
process
Reverse transcriptase of LINE elements is a “weak” enzyme
(have a low processivity)
Many insertions are truncational
(copies are not able to copy itself further)
Most insertions are only 900 bp
(instead of 6.1 kb),
only 1 of 100 insertions is successful
Illustration to full-size LINEs
and their fossil derivates
Short interspersed nuclear elements (SINE)
13% of genome
●
Non-autonomous (no revertase)
●
100-400 bp long;
●
No open reading frames
●
Derived from tRNA (transcribed with
RNA pol III, leaving internal promoter)
●
Share sequences with 3’ ends of LINEs
●
Depend on LINE machinery for its movement
AluI - elements
●
Derived from signal recognition particle 7SL
●
Does not share its 3’ end with a LINE
●
Internal promoter is active, but require appropriate
flanking sequence for activation – so it’s active only if
lucky with it’s integration site
●
Integrates in GC rich sequences
●
Only active SINE in the human genome
Mark A. Batzer and Prescott L.Deininger
As ALU repeats
do not have
open reading frames,
ALUs have to use
RT enzyme
and endonuclease
provided by
LINE repeats
or other transposons
After integration Alu copies rapidly mutate at
sites of their 24 CpGs
Alignment of Alu-subfamily consensus sequences.
Mark A. Batzer and Prescott L.Deininger
The expansion of Alu-elements in primate
lineage
Mark A. Batzer and Prescott L.Deininger
Potential Alu-mediated damage to human
genome
Insertional
mutagenesis
ALU-mediated
uneven recombination
Diseases that sometimes caused
by de novo Alu-integration
●
Neurofibromatosis (Shwann cell tumors),
●
haemophilia,
●
breast cancer,
●
●
●
Apert syndrome (distortions of the head and face and
webbing of the hands and feet),
cholinesterase deficiency
(congenital myasthenic syndrome)
complement deficiency
(hereditary angioedema)
●
●
Disease that sometimes caused
by Alu-mediated uneven recombination
insulin-resistant diabetes type II (InsReceptor)
Lesch–Nyhan syndrome (overproduction of uric acid
leading to neurologic syndrome),
●
Tay–Sachs disease,
●
complement component C3 deficiency,
●
Familial hypercholesterolaemia
●
α-thalassaemia
●
Several types of cancer, including Ewing sarcoma, breast
cancer, acute myelogenous leukaemia
Positive role of Alu repeats in evolution
Insertions of the repeat near gene
may change
Alu
its expression pattern,
gene structure,
Alu
or leads to
alternatively spliced
mRNA isoforms
Alu
LTRs contain promoters,
ALUs repeats contain TF binding sites
Human repeat distribution depends on GC
content of integration sites
●
Alu paradox
Alu repeats are found in GC-rich (gene rich) regions more
often than in AT rich;
●
De novo integration of ALU-repeats happens
in AT-rich areas
(as they hijacked ORF2 product of LINE)
ALUs are subject of positive selection
(as they CREATE new genes)
by supplying genome segments ready to become genes
with promoter like elements and exonic-like boundaries.
Also they are GC rich themselves,
so they transform AT-rich regions into GC rich
LTR transposons
●
Any trasposon flanked by Long Terminal Repeats;
●
DNA bases transposons and Retrotransposons;
Contain
Transposase;
Endogenous
Retroviral Sequences
(ERVs)
Already silent
in the human genome
Contain Gag and Pol genes
Fossils
(Charlie and Tigger types)
Only HERV-K look still OK
for moving
DNA transposons and retrotransposons
LINE
Kazazian, Science, Vol 303, Issue 5664
SINE
Human RNA genes
(non-coding RNA transcripts)
3000 RNA genes in human genome (rough)
●
rRNA
●
tRNA
●
Small nuclear RNA
●
Small nucleolar RNA
●
SRP RNA
●
MicroRNA
●
Antisense RNA
●
Non-coding gene mRNA isoforms;
●
RNAs form transcribed pseudogenes
miRNA and antisense RNA
are underestimated;
“other non-coding RNA”
are not represented
rRNA genes (1200 genes)
18S, 5.8S and 28S
are encoded
by single
transcription units;
Located in 5 clusters:
Chr. 13,14,15, 21,22
5S is in tandem arrays,
largest is on
Chr. 1q41-42
All this is to increase
a gene dosage
●
tRNA genes
(497 nuclear genes + 324 putative pseudogenes)
Humans have fewer tRNA genes that the worm (584), but
more than the fly (284);
●
Frog X.laevis have thousands of tRNA genes;
●
Number of tRNA genes correlates with size of the oocytes;
In large oocytes lots of protein needs to be sythethized
simultaneously….
tRNA genes
(497 nuclear genes + 324 putative pseudogenes)
49 families according to codon recognition;
(Should by 61 for every coding triplet)
Paradox is eliminated by codon wobbling
●
Very rough correlation between tRNA gene number and amino
acid frequency in the protein
●
●
●
280 out of 497 genes are on Chr.6, most are clustered in the same 4
Mb region; other are also more or less clustered (Chr. 1 and 7)
All chromosomes still carry at least one tRNA gene – chr.22 and Y
are exclusions
Representation of aminoacids by human tRNA
(examples)
Amino Acid
Alanine
Leucine
Tryptophan
Valine
Aspartate
Cysteine
Histidine
Selenocysteine
Frequency
7,06%
9,95%
1,30%
6,12%
4,78%
2,25%
2,56%
<0,01%
Number of tRNAs
40
35
7
44
10
30
12
1
Small nuclear RNA (snRNA)
●
Uridine rich;
●
Numbered U1, U2, U3 etc
●
Include spliceosomal RNAs U6 and U1
U6 (44 genes) and U1 (16 genes)
●
Sometimes clustered as very irregular or almost perfect
groups,
e.g. RNU1 locus at 1p36 and RNU2 at 17q21;
●
For U6 snRNA 1135 fragmental/pseudogenic sequences are
identified
Small nucleolar RNA (snoRNA)
●
Employed in nucleolus to guide
site-specific base modifications in rRNA;
●
Also can modify U6 RNA;
●
snoRNA genes often found in other gene’ introns
●
●
Generally not clustered except SNURF-SNRPN unit on 15 q which
possibly involved in Prader-Willi sydrome
C/D box snoRNA and
Site-specific
2’-O-ribose methylation
of rRNA (105-107 sites)
H/ACA snoRNA
Site-specific
Pseudouridylation
(95 sites)
SRP RNA (7SL RNA)
Protein export machinery of the endoplasmic reticulum binds a
protein RNA complex
(Signal Recongnition Particle) that contains 7SL RNA
four 7SL genes,
500 7SL pseudogenes
and all the Alu repeats that are derived form 7SL gene
Micro RNA (miRNA)
a family of 21–25-nucleotide small RNAs that negatively
regulate gene expression at the post-transcriptional level;
●
●
primary transcripts of miRNAs are processed sequentially
by two RNase-III enzymes, Drosha and Dicer, into a small,
imperfect dsRNA duplex (miRNA:miRNA*)
mature miRNA strand plus its complementary strand
(miRNA*).
●
RNA-induced silencing complex (RISC) is operated by
miRNA;miRNA* and Ago-proteins
Exonuclease III
Drosha
This form is exported
from the nucleus
By Exportin-5
Dicer cleaves microRNAs
into their mature form
miRNA incorporated into effector complexes
Elizabeth P Murchison
and Gregory J Hannon
miRNA is recognized
by the PAZ domain of an Ago protein,
and incorporated into RISC
facilitates transfer
of miRNAs into RISC.
ss miRNA
Depending on RISC components,
RISC may target homologous mRNA
for cleavage,
or stall mRNA translation
Non-coding mRNA with poly(A )tail transcribed
by RNA pol II
●
Mid-to-large size mRNA
●
For most of them function is unknown
●
Often overexpressed in tumors
●
●
●
7SK RNA decreases rate of RNA pol II
and inhibits the activity of
CDK9/cyclin T complexes;
SRA RNA co-activator of steroid receptors
XIST RNA – X-chromosome incativation in
female cells
elongation
Antisense RNA
●
TSIX regulates XIST gene
●
Antisense regulation of imprinted genes
●
aHIF: regulates hypoxia-inducible factor (HIF)-1alpha and
HIF-2alpha;
●
Makorin-2 gene as an antisense to the RAF1 oncogen
●
RFP2 CLL candidate gene and RFP2OS transcript
●
antisense beta myosin heavy chain RNA switches myosin
heavy chain gene expression from myosin beta to myosin
alpha in heart musc
Some more statistics
●
Gene density 1/100 kb (vary widely);
●
Averagely 9 exons per gene
●
363 exons in titin gene
●
Many genes are intronsless
●
Largest intron is 800 kb (WWOX gene)
●
Smallest introns – 10 bp
●
Average 5’ UTR 0,2-0,3 kb
●
Average 3’ UTR 0,77 kb but underestimated…
●
Largest protein: titin: 38,138 aa
INTRONLESS GENES
●
Interferon genes
●
Histone genes
●
Many ribonuclease genes
●
Heat shock protein genes
●
Many G-protein coupled receptors
●
Some genes with HMG boxes
●
Various neurotransmitters receptors and hormone receptors
Smallest human genes
Percentages
describe
exon content
to the length of the gene
Typical human genes
Extra Large human genes
IG genes are shown
as germline genes,
before rearrangements
Presumable functions of human genes
HUMAN genes
and their homology
to genes
from other organisms
Why so small amount of genes we,
humans, kings of nature, have?
Human 30,000 genes
Drosophila – 13,000
Nematode – 19,000
Potential of proteome and transcriptome diversity is so great
that it is no need for increase of amount of genes
●
●
●
Gene families
Functionally identical genes
-- Recently duplicated genes (Alpha-globins);
-- Histone genes (86 members, some are identical)
-- Ubiquitin-encoding genes (some are in polycistronc transcription
units)
Functionally similar genes
usually arise by duplications also, than diverge
Functionally related genes
belong to the same pathway or to encode subunits of protein
complex (usually non-related)
Chromosomal distribution
of human histone genes
●
●
●
Bidirectional and partially overlapping genes
Not very common in human genome as 1 gene/100 kb density allow
genes to be loose…
Provides possibility for common regulation of a gene pair.
Partially overlapping genes are usually encoded by opposite DNA
strands.
Found in dense gene areas,
as HLA class III complex on 6p21.3
Could represent sense-antisense pair
with one gene is coding mRNA, another is non-coding
HLA class III complex on 6p21.3:
an example of tightly packed genes
MHC Class III genes
Encoding complement proteins C4A and C4B, C2 and FACTOR B
TUMOUR NECROSIS FACTORS AND
Plus some Immunologically irrelevant genes
Genes encoding 21-hydroxylase, RNA Helicase, Casein kinase
Heat shock protein 70, Sialidase
An example of complex human gene locus
INK4a-ARF
From: Prof. Gordon Peters website
Genes within genes
Neurofibromatosis gene (NF1) intron 26 encode :
OGMP (oligodendrocyte myelin glycoprotein)
EVI2A and EVO2B
(homologues of ecotropic viral intergration sites in mouse)
●
●
●
Gene families
Classical gene families
(overall conservativeness)
Histones, alpha and beta-globines
Gene families with large conservative domains
(other parts could be low conservative)
HLH/bZIP box transcription factors
Gene families with short conservative motifs
e.g. DEAD box (Asp-Glu-Ala-Asp), WD repeat
Example of human gene families
clustered together
CS = chorionic somatomammotropin
four placenta-specific genes, primates only
serum albumin
alpha-albumin
vitamin D-binding protein
Example of human protein motifs
DEAD box proteins are involved in mRNA splicing
and translation initiation; 8 conservative boxes, DEAD is the most evident
WD proteins take part in a variety of regulatory functions,
GH (Gly-His should be at 23-41 aa distance from WD (Trp-Aps)
●
●
●
●
Gene superfamilies
Proteins that are functionally related in a general sense, but
show only weak homology
Immunoglobulin superfamily (IG genes, T- cell receptor
genes, HLA-genes….)
Globin superfamily (myoglobin, alpha and beta-globins,
neuroglobin etc….)
G-protein coupled receptor superfamily (seven
transmembrane domains, but low homology)
And so on….
Illustration to gene superfamily
●
●
Major mechanisms of gene family
spreading
Ancient gene or chromosomal segment
duplications
–
Tandem duplications
–
Duplications with gene transfer to another chromosome
Retrotransposition events
(processed copies with no introns only)
Some regions of genome are more prone to
rearrangements than others
Chromosome 22q
Human pseudogenes
Non-processed
pseudogenes
Contain introns;
Arise by duplications;
Frequency of transfer depend
on chromosomal context
(pericentromeral fragment are
transferred more often)
Complete
Partial
Processed
pseudogenes
Do not contain introns;
Arise by
retrotransposition;
Frequency of transfer
depends on initial level of
gene expression
(Highly expressed genes
are transferred more often)
Both types of pseudogenes are raw material for evolution
HLA type I cluster
Domain structure
of a typical HLA type I gene
Complete pseudogenes
Partial pseudogenes and their structure
NF1 gene and its pseudogenes on
different chromosomes
All NF1 pseudogenes are partial;
11 of them are found in the genome
Mechanism of processed pseudogene transfer
into new location
Could be very prolific: there are 95 functional ribosomal genes
and 2090 pseudogenes
Transcription from pseudogenes
Chr 1
Master gene
Chr 15
Chr 7
Partial duplication
with preservation of the promoter;
Expression is preserved in evolution,
If transcript encode partial protein (regulatory),
or if rare transcription factors sites
present in both promoters
LINE-mediated
inclusion of cDNA
copy of master gene;
Brought under
heterologous promoter
by chance,
could be antisense
Genetic maps
●
●
●
Variable number tandem repeats (VNTRs –
minisatellites), 10-100 bp, are a sort of genetic
fingerprint
Short tandem repeat polymorphisms (STRPs –
microsatellites), 2-5 bp, are another kind of
marker
A sequence tagged site (STS), 200-600 bp, is a
known unique location in the genome