Transcript BSA-V7.ppt

V7 Arabidopsis thaliana
Arabidopsis thaliana is a small flowering plant that
is widely used as a model organism in plant
biology.
Arabidopsis is a member of the mustard
(Brassicaceae) family, which includes
cultivated species such as cabbage and radish.
Arabidopsis is not of major agronomic significance,
but it offers important advantages for basic
research in genetics and molecular biology.
TAIR
Biological Sequence Analysis
SS 2009
lecture 7
1
Some useful statistics for Arabidopsis thaliana
– Its small genome (114.5 Mb sequenced out of 125 Mb total) was
sequenced in 2000.
– Extensive genetic and physical maps of all 5 chromosomes.
– A rapid life cycle (about 6 weeks from germination to mature
seed).
– Prolific seed production and easy cultivation in restricted
space.
– Efficient transformation methods utilizing Agrobacterium
tumefaciens.
– A large number of mutant lines and genomic resources, many
of which are available from Stock Centers.
– Multinational research community of academic, government
and industry laboratories.
TAIR
Such advantages have made Arabidopsis a model organism for studies of the
cellular and molecular biology of flowering plants.
TAIR collects and makes available the information arising from these efforts.
Arabidopsis thaliana chromosomes
red: Sequenced portions,
light blue: telomeric and centromeric
regions,
black: heterochromatic knobs,
magenta: rDNA repeat regions
Gene density (`Genes') ranges from 38
per 100 kb to 1 per 100 kb;
expressed sequence tag matches
(`ESTs') range from more than 200 per
100 kb to 1 per 100 kb.
Transposable element densities (`TEs')
range from 33 per 100 kb to 1 per 100
kb.
Black and green tick marks:
mitochondrial and chloroplast insertions
(`MT/CP').
Black and red tick marks: transfer RNAs
and small nucleolar RNAs (`RNAs').
DAPI-stained chromophores
Nature 408, 796 (2000)
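The per-100-kb densities quoted in the caption can be reproduced from a list of feature coordinates. The following is a minimal sketch, assuming 0-based start positions; the function name and the toy coordinates are illustrative and do not come from the paper.

```python
from collections import Counter

WINDOW = 100_000  # 100 kb, the window size quoted in the figure caption


def density_per_window(starts, chrom_length, window=WINDOW):
    """Count features whose start falls in each consecutive window."""
    n_windows = (chrom_length + window - 1) // window  # round up
    counts = Counter(pos // window for pos in starts)
    return [counts.get(i, 0) for i in range(n_windows)]


# Hypothetical gene start coordinates on a 350 kb toy chromosome
toy_starts = [1_200, 54_000, 99_999, 100_000, 260_500]
print(density_per_window(toy_starts, 350_000))  # → [3, 1, 1, 0]
```

The same function works for ESTs or transposable elements once their start positions are available.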
Arabidopsis thaliana genome sequence
The proportion of Arabidopsis proteins having related counterparts in eukaryotic genomes varies by a factor of
2 to 3 depending on the functional category. Only 8-23% of Arabidopsis proteins involved in transcription
have related genes in other eukaryotic genomes, reflecting the independent evolution of many plant
transcription factors.
In contrast, 48-60% of genes involved in protein synthesis have counterparts in the other eukaryotic
genomes, reflecting highly conserved gene functions. The relatively high proportion of matches between
Arabidopsis and bacterial proteins in the categories `metabolism' and `energy' reflects both the acquisition of
bacterial genes from the ancestor of the plastid and high conservation of sequences across all species. Finally,
a comparison between unicellular and multicellular eukaryotes indicates that Arabidopsis genes involved in
cellular communication and signal transduction have more counterparts in multicellular eukaryotes than in
yeast, reflecting the need for sets of genes for communication in multicellular organisms.
Nature 408, 796 (2000)
Many genes were duplicated
Nature 408, 796 (2000)
Segmental duplication
Segmentally duplicated regions in the Arabidopsis genome.
Individual chromosomes are depicted as horizontal grey bars (with chromosome 1
at the top), centromeres are marked black.
Coloured bands connect corresponding duplicated segments.
Similarities between the rDNA repeats are excluded. Duplicated segments in
reversed orientation are connected with twisted coloured bands.
Nature 408, 796 (2000)
Membrane channels and transporters
Transporters in the plasma and intracellular
membranes of Arabidopsis are responsible for the
acquisition, redistribution and compartmentalization of organic nutrients and inorganic ions, for
the efflux of toxic compounds and metabolic end
products, energy and signal transduction.
- almost half of the Arabidopsis channel proteins
are aquaporins which emphasizes the importance
of hydraulics in a wide range of plant processes.
- Compared with other sequenced organisms,
Arabidopsis has 10-fold more predicted peptide
transporters, primarily of the proton-dependent
oligopeptide transport (POT) family, emphasizing
the importance of peptide transport or indicating
that there is broader substrate specificity than
previously realized.
- nearly 1,000 Arabidopsis genes encode Ser/Thr
protein kinases, suggesting that peptides may
have an important role in plant signalling.
Nature 408, 796 (2000)
What is TAIR*?
• NSF-funded project begun in 1999
• Web resource for Arabidopsis data and stocks
• Literature-based manual annotation of gene function
• Genome annotation (gene structure, computational gene function)
URL
The following slides were borrowed
from a talk at the TAIR7 workshop
by Eva Huala & Donghui Li
Portals
Tools
Search
Names
Description
GO annotations
Expression
Sequences
Maps
Mutations
Seed lines
Seed lines
Links to other sites
Comments
References
GBrowse - coming soon
Overview of releases to date

                              Nature   TIGR1    TIGR2    TIGR3    TIGR4    TIGR5    TAIR6    TAIR7
                              (12/00)  (8/01)   (1/02)   (8/02)   (4/03)   (1/04)   (11/05)  (4/07)
Protein coding genes          25,498   25,554   26,156   27,117   27,170   26,207   26,541   26,819
Transposons and pseudogenes   NA       1,274    1,305    1,967    2,218    3,786    3,818    3,889
Alternatively spliced genes   NA       0        28       162      1,267    2,330    3,159    3,866
Gene density (kb per gene)    4.50     4.55     4.48     4.32     4.38     4.54     4.48     4.44
Avg. exons per gene           5.20     5.23     5.25     5.24     5.31     5.42     5.64     5.79
Avg. exon length (bp)         250      256      265      266      279      276      269      268
Avg. intron length (bp)       168      168      167      166      166      164      164      165

TAIR7:
26,819 protein coding genes
3,866 alternatively spliced
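To make the trend in the release statistics easier to read, the share of protein coding genes annotated as alternatively spliced can be computed per release. This is a small sketch using only the counts listed above; the dictionary layout and function name are illustrative choices.

```python
# (protein coding genes, alternatively spliced genes) per release,
# copied from the release overview; the Nature (12/00) column has no value (NA).
RELEASES = {
    "TIGR1 (8/01)": (25_554, 0),
    "TIGR2 (1/02)": (26_156, 28),
    "TIGR3 (8/02)": (27_117, 162),
    "TIGR4 (4/03)": (27_170, 1_267),
    "TIGR5 (1/04)": (26_207, 2_330),
    "TAIR6 (11/05)": (26_541, 3_159),
    "TAIR7 (4/07)": (26_819, 3_866),
}


def alt_spliced_percent(genes, alt_spliced):
    """Percentage of protein coding genes with annotated alternative splicing."""
    return 100.0 * alt_spliced / genes


for name, (genes, alt) in RELEASES.items():
    print(f"{name}: {alt_spliced_percent(genes, alt):.1f}%")
```

For TAIR7 this gives roughly 14.4%, which reflects improving annotation rather than any change in the genome itself.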
Plant epigenetics - review
The genomes of several plants have been sequenced, and those of many
others are under way.
But genetic information alone cannot fully address the fundamental question of
how genes are differentially expressed during cell differentiation and plant
development, as the DNA sequences in all cells in a plant are essentially the
same.
Several important mechanisms regulate transcription by affecting the
structural properties of the chromatin:
- DNA cytosine methylation,
- covalent modifications of histones, and
- certain aspects of RNA interference (RNAi).
They are referred to as “epigenetic” because they direct “the structural
adaptation of chromosomal regions so as to register, signal or perpetuate
altered activity states”.
Zhang, Science 320, 489 (2008)
The epigenetic landscape of A. thaliana
The relative abundance of genes,
repeats, cytosine methylation and
siRNAs is shown for the length of A.
thaliana chromosome 1, which is ~30
Mb long.
Diagram of the chromosome: euchromatic arms,
pericentromeric heterochromatin, centromeric core.
Henderson & Jacobson, Nature 447, 418 (2007)
DNA methylation
3 distinct DNA methylation pathways with overlapping functions have been
characterized in Arabidopsis.
1 The mammalian DNMT1 homolog METHYLTRANSFERASE 1 (MET1)
primarily maintains DNA methylation at CG sites (CG methylation).
2 The plant-specific CHROMOMETHYLASE3 (CMT3) interacts with the H3
Lys9 dimethylation (H3K9me2) pathway to maintain DNA methylation at CHG
sites (CHG methylation, H = A, C, or T).
3 The DNMT3a/3b homologs DOMAINS REARRANGED METHYLASE 1 and 2
(DRM1/2) maintain DNA methylation at CHH sites (CHH methylation), which
requires the active targeting of small interfering RNAs (siRNAs).
Zhang, Science 320, 489 (2008)
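The three sequence contexts named above (CG, CHG, CHH with H = A, C, or T) can be assigned to each cytosine by looking at the next two bases. The sketch below is illustrative only: it scans a single strand, skips cytosines too close to the 3' end to classify, and treats any non-G base (including ambiguity codes) as H, which is a simplifying assumption.

```python
# Pathway responsible for maintaining each context, as listed in the slide.
MAINTAINED_BY = {"CG": "MET1", "CHG": "CMT3", "CHH": "DRM1/2"}


def cytosine_contexts(seq):
    """Return a list of (position, context) for each C on the given strand."""
    seq = seq.upper()
    result = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        following = seq[i + 1:i + 3]
        if following[:1] == "G":
            result.append((i, "CG"))
        elif len(following) == 2:
            # First following base is not G, so it counts as H here.
            result.append((i, "CHG" if following[1] == "G" else "CHH"))
        # Otherwise the context is incomplete and the cytosine is skipped.
    return result


for pos, ctx in cytosine_contexts("ACGTCAGCTTC"):
    print(pos, ctx, MAINTAINED_BY[ctx])
```

For the toy sequence this reports a CG site at position 1, a CHG site at position 4, and a CHH site at position 7; the final C has no downstream bases and is left unclassified.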
DNA methylation
Methylated and unmethylated DNA can be distinguished by 3 major types
of experimental approaches:
(1) sodium bisulfite treatment that converts cytosine (but not methylcytosine)
to uracil,
(2) enzymatic digestion (using methylation-specific endonucleases or
methylation sensitive isoschizomers), and
(3) affinity purification or immunoprecipitation (with methyl-cytosine binding
proteins and antibodies to methyl-cytosine, respectively).
The methylated fraction of the genome is then visualized by hybridizing treated
DNA to microarrays.
Zhang, Science 320, 489 (2008)
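The chemistry behind approach (1) can be illustrated with a toy simulation: unmethylated cytosines become uracil, while methylated cytosines are left unchanged. This is a sketch under idealized assumptions (complete conversion, a single strand, 0-based positions); the function name and input format are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate ideal bisulfite treatment of one DNA strand.

    Unmethylated C becomes U; methylated C (positions given as a set of
    0-based indices) is protected and stays C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


# The C at position 1 is methylated, the Cs at positions 4 and 7 are not.
print(bisulfite_convert("ACGTCAGC", {1}))  # → ACGTUAGU
```

After amplification, uracil is read as thymine, so any cytosine that still reads as C after treatment is inferred to have been methylated.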
DNA methylation
Results from these microarray studies were largely consistent:
1 ~20% of the Arabidopsis genome is methylated.
2 Transposons and other repeats comprise the largest fraction of methylated
sequence. The promoters of endogenous genes are rarely methylated.
3 Surprisingly, methylation in the transcribed regions of endogenous genes is
rampant.
4 More than one-third of all genes contain methylation (called “body
methylation”) that is enriched in the 3′ half of the transcribed regions and
primarily occurs at CG sites.
Zhang, Science 320, 489 (2008)
DNA methylation
DNA methylation is critically important in silencing transposons and regulating
plant development.
Severe loss of methylation results in a massive genome-wide transcriptional
reactivation of transposons, and the quadruple mutant drm1 drm2 cmt3 met1
is embryo-lethal.
Interestingly, the role of DNA methylation in regulating transcription appears to
depend on the position of methylation relative to genes:
- Methylation in promoters appears to repress transcription.
- Paradoxically, however, body-methylated genes are usually transcribed at
moderate to high levels and are transcribed less tissue-specifically relative to
unmethylated genes.
Zhang, Science 320, 489 (2008)
DNA methylation: new paper
Recently, Cokus et al. combined sodium bisulfite treatment of genomic DNA with
ultrahigh-throughput sequencing (>20× genome coverage) to generate the first
DNA methylation map for any organism at single-base resolution.
This “BS-Seq” method has several advantages over microarray-based methods:
1 it can detect methylation in important genomic regions that are not covered by
any microarray platform (such as telomeres, ribosomal DNA, etc.).
2 it reveals the sequence contexts of DNA methylation (i.e., CG, CHG, and CHH)
and therefore provides important information regarding the epigenetic pathways
that function at any given locus. E.g., all 3 types of methylation colocalize to
transposons, but gene body methylation occurs exclusively at CG sites.
3 BS-Seq is more effective in detecting light methylation and subtle changes (e.g.,
in mutants).
4 the theoretically unlimited sequencing depth makes it possible to quantitatively
measure the percentage of cells in which any particular cytosine is methylated,
thereby offering important clues regarding potential cell-specific DNA methylation.
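Point 4 amounts to a simple ratio at each cytosine. The sketch below assumes that, among reads covering a reference cytosine, a read showing C came from a methylated copy and a read showing T from an unmethylated one; it ignores incomplete conversion and sequencing errors, and the function name is illustrative.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of covering reads that report a methylated cytosine.

    c_reads: reads showing C (protected from conversion, i.e. methylated)
    t_reads: reads showing T (converted, i.e. unmethylated)
    Returns None when the position has no coverage.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


# Hypothetical cytosine covered by 20 reads, 15 of which still show C
print(methylation_level(15, 5))  # → 0.75
```

With deeper sequencing the estimate becomes more precise, which is why coverage matters for detecting light methylation.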
RNA-directed DNA methylation
Putative pathway for RNA-directed DNA methylation in
A. thaliana. Target loci (in this case tandemly repeated
sequences; coloured arrows) recruit an RNA
polymerase IV complex consisting of NRPD1A and
NRPD2 through an unknown mechanism, and this
results in the generation of a single-stranded RNA
(ssRNA) species. This ssRNA is converted to double-stranded RNA (dsRNA)
by the RNA-dependent RNA polymerase RDR2.
The dsRNA is then processed into 24-nucleotide siRNAs by DCL3. The siRNAs are
subsequently loaded into the protein AGO4, which associates with another form of the RNA
polymerase IV complex, NRPD1B–NRPD2. AGO4 that is ‘programmed’ with siRNAs can
then locate homologous genomic sequences and guide the protein DRM2, which has de
novo cytosine methyltransferase activity. Targeting of DRM2 to DNA sequences also involves
the chromatin remodelling protein DRD1.
Henderson & Jacobson, Nature 447, 418 (2007)
DNA methyltransferase structure and function
Plant and mammalian
genomes encode
homologous cytosine
methyltransferases, of
which there are 3
classes in plants and 2
in mammals.
PWWP, Pro-Trp-Trp-Pro motif;
UBA, ubiquitin associated.
A. thaliana MET1 and Homo sapiens DNMT1 both function to maintain CG
methylation after DNA replication, through a preference for hemimethylated
substrates. Both have N-terminal BAH domains of unknown function.
De novo DNA methylation is carried out by the homologous proteins DRM2 (in A.
thaliana) and DNMT3A and DNMT3B (both in H. sapiens).
Despite their homology, these proteins have distinct N-terminal domains, and the
catalytic motifs present in the cytosine methyltransferase domain are ordered
differently in DRM2 and the DNMT3 proteins.
Plants also have another class of methyltransferase, which is not found in mammals.
CMT3 functions together with DRM2 to maintain non-CG methylation.
Henderson & Jacobson, Nature 447, 418 (2007)
Motif density along Arabidopsis chromosomes
Distribution of genes, repetitive sequences,
DNA methylation, siRNAs, H3K27me3, and
low nucleosome density (LND) regions.
Left: chromosomal distributions on chr 1.
The x axis shows chromosomal position.
Zhang, Science 320, 489 (2008)
Small RNAs
4 major endogenous RNAi pathways have been described in Arabidopsis.
Functioning at the posttranscriptional level through mRNA degradation and/or
translation inhibition are
- the microRNA (miRNA),
- trans-acting siRNA (ta-siRNA), and
- natural-antisense siRNA (nat-siRNA) pathways.
The siRNA pathway is involved in gene silencing both transcriptionally by
directing DNA methylation and posttranscriptionally by guiding mRNA cleavage.
Zhang, Science 320, 489 (2008)
Function of small RNAs
MicroRNAs (miRNAs) and trans-acting siRNAs (ta-siRNAs) are primarily involved
in regulating gene expression and plant development, whereas
siRNAs play a major role in defending the genome against the proliferation of
invading viruses and endogenous transposable elements.
The function of the fourth type of sRNAs, natural-antisense siRNAs
(nat-siRNAs), is not entirely clear but is likely related to plant stress responses.
Zhang et al., PNAS 104, 4536 (2007)
Small RNAs
Millions of 21- to 24-nucleotide (nt) siRNAs have been cloned and sequenced
from wild-type Arabidopsis plants and siRNA pathway mutants.
Most of these studies generated not only sequence information necessary to
map the siRNAs back to their originating genomic loci,
but also the length information of siRNAs that is indicative of the processing
enzymes involved (e.g., DICER-LIKE enzymes, DCLs).
Zhang, Science 320, 489 (2008)
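Because read length hints at the processing enzyme, a first look at a small RNA library is often a length histogram over the 21- to 24-nt range. The following is a minimal sketch; the read list is made up for illustration, and no adapter trimming or quality filtering is modelled.

```python
from collections import Counter


def length_distribution(reads, min_len=21, max_len=24):
    """Count reads of each length between min_len and max_len (inclusive)."""
    counts = Counter(len(r) for r in reads if min_len <= len(r) <= max_len)
    return {n: counts.get(n, 0) for n in range(min_len, max_len + 1)}


# Hypothetical cloned small RNA sequences; the 30-nt read is out of range.
toy_reads = ["A" * 21, "C" * 24, "G" * 24, "T" * 22, "A" * 30]
print(length_distribution(toy_reads))  # → {21: 1, 22: 1, 23: 0, 24: 2}
```

A strong 24-nt peak in such a histogram would be consistent with DCL3 processing, the enzyme the lecture associates with 24-nt siRNAs.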
Small RNAs
The majority of the siRNAs (>90%) are produced from double-stranded RNA
(dsRNA) precursors generated by RNA polymerase IV isoform a (Pol IVa) and
RNA-dependent RNA polymerase 2 (RDR2).
RNAP IV is a recently identified class of RNAP that is specific to plant genomes.
Unlike RNAP I, II, and III, RNAP IV appears to be specialized in siRNA
metabolism.
These dsRNA precursors are then processed by DCL3 to 24-nt siRNAs (with
partially redundant contributions from DCL2 and DCL4) and become
preferentially associated with ARGONAUTE4, which then interacts with Pol IVb to
direct DRM1/2-mediated CHH methylation.
Most of these siRNAs are derived from genomic loci corresponding to transposons
with high levels of CHH DNA methylation, and very few are found in
protein-coding genes.
Zhang, Science 320, 489 (2008)
Distribution patterns and transcription activity
detailed distribution
patterns and
transcription activity
(vertical blue bars) in a
gene-rich region (top)
and a repeat-rich region
(bottom).
Red boxes: genes;
Arrows indicate the
direction of transcription.
Zhang, Science 320, 489 (2008)
Positioning relative to Arabidopsis genes
(A) Distribution of DNA
methylation, siRNAs, and
H3K27me3 relative to
Arabidopsis genes.
Thick and thin horizontal
bars represent genes
and intergenic regions,
respectively.
(B) Distribution of
repetitive
sequences relative to
genes in Arabidopsis
(green) and rice (red).
Zhang, Science 320, 489 (2008)
Conclusions
2 major fractions of the Arabidopsis genome are associated with and regulated by
different epigenetic mechanisms:
(1) Genes are regulated by pathways such as H3K27me3, H3K4me2, and miRNAs/tasiRNAs/nat-siRNAs, whereas
(2) transposons and other repeats are silenced by DNA methylation, H3K9me2, and
siRNAs.
Such a functional distinction, however, is blurred when the 2 genetic fractions overlap,
which occurs much more frequently in larger and more complex genomes.
Zhang, Science 320, 489 (2008)
Conclusions
Although increasingly comprehensive, such an epigenomic picture remains static.
Relatively little is known about how the plant epigenome changes in response to
developmental or environmental cues.
A particularly interesting question may be how mechanisms that evolved to stably
silence transposons could offer the flexibility required for the developmental regulation
of endogenous genes.
In addition, we do not yet have a clear understanding of the nature and the
maintenance of the boundaries separating epigenetically distinct chromatin
compartments.
In some cases, genetic landmarks (such as the transcription unit) may serve as
borders; in other cases, the balancing acts of opposing epigenetic mechanisms may
help to stably maintain the epigenetic landscape of plant genomes.
Zhang, Science 320, 489 (2008)