Plant genome


Plant molecular genetics

• Plant genome
• Chromatin and DNA methylation
• RNA interference
• Genome of plastids and mitochondria
• Transposable elements
• Viruses
• Classical genetic mapping
• Transgenosis and reverse genetics
• Genomics, next generation sequencing
• Transcriptomics
• Proteomics

Components of plant genome

• nuclear genome = genome sensu stricto
• plastids – plastome
• mitochondria – chondriome

Plant genome sizes

• 54 Mbp – Cardamine amara
• 124 852 Mbp – Fritillaria
• 149 000 Mbp – Paris japonica, currently the largest known genome (not only among plants)

http://data.kew.org/cvalues/

Plant genome sizes

• 10 Mb – Ostreococcus (single-cell alga)
• 54 Mb – Cardamine amara
• 64 Mb – Genlisea aurea
• 125 Mb – Arabidopsis
• 500 Mb – Oryza
• 5 000 Mb – Hordeum
• 17 000 Mb – Triticum
• 84 000 Mb – Fritillaria (largest diploid)
• 143 000 Mb – Paris (octoploid)

(Figure: globes whose volumes differ 3000 times, illustrating the range of genome sizes)

- Angiosperms – size differences of up to almost 3 000 times
- Gymnosperms – genome sizes often around 10 000 Mb
- Differences in gene number are much lower (approx. 20 – 200 fold)

Plant genome sizes

What can we deduce?

- Genomes increase in size during evolution
- The average increase is higher in Monocots

C-value paradox

- there is no strong correlation between the complexity of an organism and the size of its genome
• C-value = size of the genome in a non-replicated gamete
  genome size (bp) = (0.910 × 10^9) × DNA content (pg)
  DNA content (pg) = genome size (bp) / (0.910 × 10^9)
  1 pg = approx. 910 Mbp; MW (1 bp) = approx. 660 Da
• genomes of related organisms often differ strongly in size
  causes:
  - duplications of whole genomes (polyploidization) or chromosome segments
  - replication of invasive DNA (transposable elements)
  - but reductions are also possible (recombination – diploid cotton sp.)
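The conversion factor above can be checked from the quoted mass of one base pair. The sketch below is only an illustration: the function names are made up for this example, the Avogadro constant is rounded, and 660 Da per base pair is the approximate value from the slide.

```python
AVOGADRO = 6.022e23        # molecules per mole (rounded)
BP_MASS_DA = 660.0         # approximate mass of one base pair (value from the slide)
SLIDE_FACTOR = 0.910e9     # bp per pg, as used on the slide


def bp_per_picogram(bp_mass_da: float = BP_MASS_DA) -> float:
    """Number of base pairs contained in 1 pg of double-stranded DNA."""
    grams_per_bp = bp_mass_da / AVOGADRO   # Da equals g/mol, so this is grams per bp
    return 1e-12 / grams_per_bp


def pg_to_bp(dna_content_pg: float, factor: float = SLIDE_FACTOR) -> float:
    """Genome size in bp from DNA content in pg."""
    return dna_content_pg * factor


def bp_to_pg(genome_size_bp: float, factor: float = SLIDE_FACTOR) -> float:
    """DNA content in pg from genome size in bp."""
    return genome_size_bp / factor


if __name__ == "__main__":
    print(f"1 pg corresponds to about {bp_per_picogram():.3e} bp")
    print(f"125 Mbp (Arabidopsis) corresponds to about {bp_to_pg(125e6):.3f} pg")
```

With 660 Da per base pair the computed value comes out close to the 910 Mbp per pg quoted on the slide. A different assumed base-pair mass would shift the factor accordingly.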

Sequences in plant genomes

Unique sequences – genes, but also non-coding (!)
Repetitive:
• Duplications of chromosomal regions
• Medium repetitive DNA
  – Tandem repeats of rRNA, tRNA and histone genes
  – Gene families with multiple members
  – Transposable elements – also highly repetitive
• Highly repetitive – low complexity DNA
  – Tandemly arranged simple sequence repeats (SSR)
  – Centromeres (180 bp repeat in Arabidopsis) and telomeres ((TTTAGGG)n)

Types of sequences in plant genomes

• Unique sequences – coding genes, but also non-coding regulatory sequences (!)
• Medium repetitive DNA
  – Tandem repeats of rRNA, tRNA and histone genes
  – Gene families with multiple members
  – Transposable elements – also highly repetitive
• Low complexity DNA (highly repetitive)
  – Tandemly arranged simple sequence repeats (SSR)
  – Centromeres (180 bp repeat in Arabidopsis) and telomeres ((TTTAGGG)n)
  - some behave as satellite DNA

Aside – term definition:

sequence complexity (~ the amount of information)

repetitive: AAAAAAAAAAAAAAAAAAAAA – complexity 1 (21 × A)
repetitive: ATCATCATCATCATCATCATC – complexity 3 (7 × ATC) (what is the complexity if it is a coding sequence?)
unique: ATCGTATCGCGATTTTAACGT – complexity 21 (1 × AT…)

- unique vs. repetitive – depends on the size of the evaluated frame (= size of the analyzed DNA fragments)
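The three examples above can be reproduced with a small sketch that treats complexity as the length of the shortest unit whose exact repetition rebuilds the sequence. This is a simplification chosen for illustration. The function name is hypothetical, and real complexity measures also tolerate imperfect repeats.

```python
def repeat_unit_length(seq: str) -> int:
    """Length of the shortest unit that, repeated end to end, reproduces seq."""
    n = len(seq)
    for unit in range(1, n + 1):
        # Only unit lengths that tile the sequence exactly are considered.
        if n % unit == 0 and seq[:unit] * (n // unit) == seq:
            return unit
    return n  # only reached for an empty sequence (returns 0)


if __name__ == "__main__":
    for s in ("AAAAAAAAAAAAAAAAAAAAA",
              "ATCATCATCATCATCATCATC",
              "ATCGTATCGCGATTTTAACGT"):
        print(s, "complexity", repeat_unit_length(s))
```

The dependence on the evaluated frame shows up directly: a 3-nt window cut from the ATC repeat has complexity 3, so it looks unique at that frame size.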

Sequence complexity of plant genomes

(Figure labels: highly repetitive, medium repetitive, unique; axis: sequence complexity)

Examples of repetitive DNA representation in soybean and Silene (clusters of related sequences)

(Figure: Silene latifolia)
Gypsy, copia = retrotransposon families
clDNA = chloroplast DNA (partially contamination, but also recent insertions)

Measuring genome complexity – reassociation kinetics

• DNA is fragmented to 300 - 500 bp and denatured
• Reassociation is monitored over time – (chromatographic) separation of ss and ds DNA
• Analysis of the kinetics (C0t curves) shows the representation of the various types of repetitive DNA – rare sequences reassociate more slowly than repetitive ones
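The slides do not give a rate equation. A common textbook idealisation treats reassociation as a second-order reaction, so the fraction still single-stranded is C/C0 = 1/(1 + k·C0t). The sketch below uses that assumption to show why repetitive fractions reanneal first. The three rate constants and genome fractions are hypothetical values chosen only for illustration.

```python
def single_stranded_fraction(cot: float, k: float) -> float:
    """Fraction of DNA still single-stranded at a given C0t (ideal second-order kinetics)."""
    return 1.0 / (1.0 + k * cot)


def cot_half(k: float) -> float:
    """C0t value at which half of a component has reassociated."""
    return 1.0 / k


def mixture_single_stranded(cot: float, components) -> float:
    """Single-stranded fraction for a genome made of (genome_fraction, k) components."""
    return sum(fraction * single_stranded_fraction(cot, k) for fraction, k in components)


# Hypothetical genome: 20 % highly repetitive, 30 % medium repetitive, 50 % unique.
GENOME = [(0.2, 1e3), (0.3, 1.0), (0.5, 1e-3)]

if __name__ == "__main__":
    for cot in (1e-4, 1e-2, 1.0, 1e2, 1e4):
        print(f"C0t = {cot:g}: {mixture_single_stranded(cot, GENOME):.3f} single-stranded")
```

In this model the C0t at half reassociation equals 1/k, so a component with a larger rate constant (more copies per genome) reaches it at a lower C0t.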

Reassociation kinetics depends on sequence complexity

Eukaryotic genomes usually contain three fractions of sequences with different complexity:
• Low complexity = highly repetitive
• Middle repetitive
• Unique sequences = high complexity

Reassociation kinetics of small and large genomes

(Figure labels: unique, medium repetitive, highly repetitive)

Repetitive sequences can be easily detected in situ
FISH = fluorescence in situ hybridization (possible even with unique sequences)

Figure panels:
• 180 bp repeat, A. thaliana
• copia, A. thaliana
• 45S rDNA, Crocus
• tandem repeats dp5a1, wheat
• subtelomeric repeats in rye
• telomeres in rye, (TTTAGGG)n
(Heslop-Harrison, Plant Cell 12:617, 2000)

Differences in small and large genome arrangements

large genomes: genes are present in "gene-rich islands" separated by long regions of repetitive DNA

Reconstruction of the gradual accumulation of transposable elements in the maize genome: in Panicum the presented region contains no transposable elements, in maize they make up 60 % of its size

Plant Genome Sequencing

http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes

April 13 – less complete in gray

Large Genome Sequencing

- sequencing per partes (separated chromosomes)
- sequencing of non-methylated DNA (= transcriptionally active)
- sequencing of ESTs

Aside – term definition: Expressed Sequence Tags (ESTs)
- short sequenced regions of cDNA (300-600 nt)
- mostly gene segments (primarily from mRNA)
- alternative source of coding sequences for large genomes (rapid and inexpensive)
Weak points:
- highly redundant, incomplete (!)
- problems: various transcript levels – gene expression is regulated spatially and temporally, developmentally, environmentally
- regulatory sequences not represented (promoters, introns, ...)

Expressed Sequence Tags (ESTs)
Preparation of an EST library:
- mRNA
- RT with oligo(dT) primer → cDNA
- cleavage of RNA from the heteroduplex with RNase H
- 2nd strand cDNA synthesis
- cleavage with restriction endonuclease
- adaptor ligation
- cloning
- sequencing

Aside: Arabidopsis thaliana, the most important model of plant biology
(Figure: plants at 1 week, 3 weeks, 4 weeks and 6 weeks)

Arabidopsis genome: 125 Mbp
(Figure: distribution of genes, ESTs and TEs along the chromosomes; scale from high density to low density)

Total gene number prediction in time

(after whole genome sequencing)

Genome of Arabidopsis – statistics

Feature                       Chr.1        Chr.2        Chr.3        Chr.4        Chr.5        SUM
DNA molecule
  Length (bp)                 29,105,111   19,646,945   23,172,617   17,549,867   25,935,409   115,409,949
  Top arm (bp)                14,449,213   3,607,091    13,590,268   3,052,108    11,132,192   -
  Bottom arm (bp)             14,655,898   16,039,854   9,582,349    14,497,759   14,803,217   -
Base composition (%GC)
  Overall                     33.4         35.5         35.4         35.5         34.5         -
  Coding                      44.0         44.0         44.3         44.1         44.1         -
  Non-coding                  32.4         32.9         33.0         32.8         32.5         -
Number of genes               6,543        4,036        5,220        3,825        5,874        25,498
Gene density (kb per gene)    4.0          4.9          4.5          4.6          4.4          -
Average gene length (bp)      2,078        1,949        1,925        2,138        1,974        -
Average peptide length (aa)   446          421          424          448          429          -
Exons
  Number                      35,482       19,631       26,570       20,073       31,226       132,982
  Total length (bp)           8,772,559    5,100,288    6,654,507    5,150,883    7,571,013    33,249,250
  Average per gene            5.4          4.9          5.1          5.2          5.3          -
  Average size (bp)           247          259          250          256          242          -
Number of genes
  With ESTs (%)               60.8         56.9         59.8         61.4         61.4         -
  Number of ESTs              30,522       14,989       20,732       16,605       22,885       105,773

+ hundreds of MIR genes – role in the regulation of gene expression

Gene function

The majority of plant genes form gene families

Number of paralogues

• gene families are often in tandem arrangement, but also spread throughout the genome
• tandem repeats are composed of near, but also far paralogues (recombinations)
• duplications of long chromosomal regions

Aside – terms definition:

Homologous genes – genes with similar sequences derived from the same ancestral gene (quantification – sequence identity, similarity)

Paralogous genes – genes with similar sequences derived from the same ancestral gene, present at different loci within the same genome

Orthologous genes – genes in different species that are similar to each other because they originated from a common ancestral gene in a common ancestor (if several paralogues are present, the genes serving the same function are regarded as orthologues)

Orthologues vs. paralogues

Orthologous genes: gene A of the ancestral species gives rise to gene A” in species A and gene A’ in species B.
Paralogous genes = genes duplicated within the species: genes A” and A’” in species A are paralogues; species B carries gene A’.

Mechanisms of gene duplications

(increase in paralogue number)
• tandem duplication
• transposition
• segmental duplications
• whole genome duplications

Differences in genes/gene families in genomes

(Figure: overlap of genes and of gene families between genomes)
• Arabidopsis vs. Populus – large overlap, about 1.5 times more paralogues in poplar
• (Arabidopsis + Populus) vs. Oryza – many genes specific for Monocots

Arabidopsis is an ancient tetraploid (as, probably, are the majority of plants)
• Duplicated chromosomal regions form about 60 % of the genome (67.9 Mb)
• Polyploidization significantly increases genome (and organism) plasticity and played a very important role in plant (genome) evolution
• About 30-80 % of plant species are polyploid

Polyploidization in Angiosperm evolution Fawcett et al. 2009

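Dating of whole genome duplications by the number of synonymous substitutions per synonymous site (Ks)

Example: Ks = 3/2.66 (3 synonymous differences / 2.66 synonymous sites)
amino acid:   Phe       Leu       Met     Val
sequence 1:   UUU       CUA       AUG     GUU
sequence 2:   UUC       UUG       AUG     GUU
syn. sites:   0 0 1/3   1/3 0 1   0 0 0   0 0 1   (number of synonymous sites at each position of sequence 1)

Comparisons of paralogue pairs: histogram of gene number against Ks; peaks indicate genome duplications (Fawcett et al. 2013)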
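The count of synonymous sites in the Ks example can be reproduced with a short sketch. It assumes the standard genetic code, counts a site as synonymous in proportion to the single-base changes that keep the amino acid, and treats changes to a stop codon as non-synonymous. The function name is hypothetical. The number of synonymous differences (3) is taken from the slide rather than computed.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Number of synonymous sites in one codon (between 0 and 3)."""
    codon = codon.upper().replace("U", "T")
    amino_acid = CODON_TABLE[codon]
    total = 0.0
    for pos in range(3):
        same = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == amino_acid:
                same += 1
        total += same / 3  # fraction of the three possible changes that are silent
    return total


SEQ1 = ["UUU", "CUA", "AUG", "GUU"]      # Phe Leu Met Val, as on the slide
SITES = sum(synonymous_sites(c) for c in SEQ1)
SYN_DIFFERENCES = 3                       # value taken from the slide
KS_UNCORRECTED = SYN_DIFFERENCES / SITES

if __name__ == "__main__":
    print(f"synonymous sites: {SITES:.2f}")
    print(f"uncorrected Ks:   {KS_UNCORRECTED:.3f}")
```

This is the uncorrected ratio only. Published estimates usually average the site counts over both sequences and correct for multiple substitutions at the same site.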

Polyploidization in plant evolution
• 35 % of species are neopolyploids
• most species went through polyploidization repeatedly in evolution
• viable aneuploid variants (frequently after allopolyploidization – hexaploid wheat): stable wheat lines with a missing chromosomal arm (of a homeologous chromosome)
(Figure legend: blue dots – duplications, asterisk – triplication, K-T; Fawcett et al. 2013)

Polyploidization: fusion of non-reduced gametes or endoreduplication
• autopolyploidy: n = x = 4 × n = x = 4, spontaneous duplication (endoreduplication) → 2n = 4x = 16
• allopolyploidy: n = x = 4 × n = x = 7, spontaneous duplication (endoreduplication) → 2n = 4x = 22
Both types occur with similar frequency among polyploid plant species

Chromosome doubling is necessary for meiosis in hybrids
species A × species B → sterile hybrid → genome duplication → fertile
Preferential pairing of homologous chromosomes; related chromosomes from different species (homeologous) can also pair

Allopolyploid genomes in the Brassica genus

Species          Karyotype       Genome
Brassica rapa    2n = 2x = 20    A
B. nigra         2n = 2x = 16    B
B. oleracea      2n = 2x = 18    C
B. juncea        2n = 4x = 36    AB
B. napus         2n = 4x = 38    AC
B. carinata      2n = 4x = 34    BC

(Figure: triangle of ancient interspecies hybrids – diploids Brassica rapa (AA), Brassica nigra (BB) and Brassica oleracea (CC); hybrids Brassica juncea (AA BB), Brassica napus (AA CC) and Brassica carinata (BB CC))
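The chromosome numbers of the three allotetraploids follow from their diploid parents: the hybrid receives one haploid set from each parent, and genome doubling restores pairs. A minimal sketch of that arithmetic, using the values listed for the Brassica species (the dictionary layout and function name are illustrative only):

```python
# Diploid progenitors: genome letter -> (species, 2n)
DIPLOIDS = {
    "A": ("Brassica rapa", 20),
    "B": ("B. nigra", 16),
    "C": ("B. oleracea", 18),
}


def allotetraploid_2n(genomes: str) -> int:
    """2n of an allopolyploid that combines the given parental genomes."""
    # Haploid sets are added in the hybrid, then the whole set is doubled.
    hybrid_n = sum(DIPLOIDS[g][1] // 2 for g in genomes)
    return 2 * hybrid_n


if __name__ == "__main__":
    for name, genomes in (("B. juncea", "AB"), ("B. napus", "AC"), ("B. carinata", "BC")):
        print(f"{name} ({genomes}): 2n = {allotetraploid_2n(genomes)}")
```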

Allopolyploid tobacco species

– DNA size changes

Fate of duplicated genes differs (gene dosage balance theory)
• genes encoding interacting proteins, "connected genes" (signal pathways, complex subunits, …), are easily preserved in the genome after duplication
  - loss or partial duplication of one component results in gene imbalance, decreasing fitness
  - the whole duplicated complex can be specialized for a new function and increase organism complexity
  - the secondary function was probably already present in the ancestral complex (pathway), but only duplication allowed adaptive evolution of both functions without selection constraints: the escape from adaptive conflict (EAC) model
• other "single genes" are more easily lost after genome duplication, but can be preserved after individual duplication

- most duplicated genes are lost after whole genome duplication
- the loss is not even between the two copies (↑)
- probably frequent epigenetic marks in one copy (methylation)
- preferential gene loss and mutagenesis of the methylated copy
- gene conversion and homogenization can occur (!)

de novo allopolyploids (~ rapeseed) – recombination preferentially between homeologous chromosomes, without preference for either parental genome
(homeologous = homologous chromosomes within one genome that originate from different parents)

Changes in newly formed allopolyploid genome:
- DNA methylation changes
- losses of parts of chromosomes or whole chromosomes (aneuploidy – decreased fertility)
- frequent activation of TEs
- expression of homeologous genes is usually not additive
  - transcriptome usually more reduced than genome
  - different regulation of expression
  - often organ-specific expression of genes from each parent, new sites of expression, new regulation
- "divergent resolution" – speciation (different gene loss in individuals – lethality in F2; absence of an essential gene = reproduction barrier)

Plants can also survive with a haploid genome!

- reprogramming of male or female gametophyte development in vitro – no gamete formation, but development resembling embryogenesis
- usually from immature microspores = androgenesis
- from the female gametophyte = gynogenesis
- haploid plants are sterile
- through endoreduplication (colchicine-induced or spontaneous) – completely homozygous plants – dihaploids

Androgenesis in rapeseed (pollen embryogenesis)

... But genomes are still similar

Colinearity, synteny

Paterson et al., Plant Cell 12: 1523-1539, 2000

"Synteny" is usually misused to describe colinearity

Synteny = orthologous loci in two species lie on the same chromosome
  Ancestral species: A B C
  Species A: A’ C’ B’
  Species B: C” B” A”

Colinearity = a group of loci in two species lies on a chromosome in the same order
  Ancestral species: A B C
  Species A: A’ B’ C’
  Species B: A” B” C”
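The distinction between the two terms can be made concrete with a small sketch. It assumes each genome is given as a mapping from chromosome name to an ordered list of loci, with orthologous loci sharing one name in both species. The function names and data layout are hypothetical. Treating a fully reversed order as colinear is also an assumption, made because chromosome orientation is arbitrary.

```python
def chromosome_of(genome, locus):
    """Return the name of the chromosome carrying the locus, or None if absent."""
    for chrom, loci in genome.items():
        if locus in loci:
            return chrom
    return None


def are_syntenic(genome_a, genome_b, loci):
    """True if all loci share one chromosome in A and one chromosome in B."""
    chroms_a = {chromosome_of(genome_a, locus) for locus in loci}
    chroms_b = {chromosome_of(genome_b, locus) for locus in loci}
    return (len(chroms_a) == 1 and len(chroms_b) == 1
            and None not in chroms_a and None not in chroms_b)


def are_colinear(genome_a, genome_b, loci):
    """True if the loci are syntenic and keep the same relative order."""
    if not are_syntenic(genome_a, genome_b, loci):
        return False
    chrom_a = chromosome_of(genome_a, loci[0])
    chrom_b = chromosome_of(genome_b, loci[0])
    order_a = [locus for locus in genome_a[chrom_a] if locus in loci]
    order_b = [locus for locus in genome_b[chrom_b] if locus in loci]
    # Assumption: a completely reversed order still counts as colinear.
    return order_a == order_b or order_a == order_b[::-1]


LOCI = ["A", "B", "C"]
# The two cases from the definitions, with primes dropped from the locus names.
SYNTENIC_A, SYNTENIC_B = {"chr1": ["A", "C", "B"]}, {"chr1": ["C", "B", "A"]}
COLINEAR_A, COLINEAR_B = {"chr1": ["A", "B", "C"]}, {"chr1": ["A", "B", "C"]}

if __name__ == "__main__":
    print(are_syntenic(SYNTENIC_A, SYNTENIC_B, LOCI), are_colinear(SYNTENIC_A, SYNTENIC_B, LOCI))
    print(are_syntenic(COLINEAR_A, COLINEAR_B, LOCI), are_colinear(COLINEAR_A, COLINEAR_B, LOCI))
```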

Changes in colinearity caused by chromosomal arm inversion

Colinearity of Poaceae genomes

Colinear regions differ mainly in repetitive DNA

Summary:

• Current plant genomes result from repeated cycles of partial and complete duplications, followed by reduction and modification of duplicated sequences.

• There are no genomes without redundancy.

• Plant genomes are still very dynamic.

• A large portion of the genome consists of repetitive DNA.