Transcript Slide 1

Global Analysis of Functional Units of Plant Chromosomes:
DNA Replication, Domain Structure, and Transcription
PI: Bill Thompson (a); Co-PIs: George Allen (b), Linda Hanley-Bowdoin (c), Doreen Main (d), Rob Martienssen (e), Bryon Sosinski (f), Matthew Vaughn (e)
(a) Plant Biology, Genetics, and Crop Science, NC State University, Raleigh, NC 27695; (b) Horticultural Science, NC State University; (c) Biochemistry and Genetics, NC State University
(d) Horticulture & Landscape Architecture, Washington State University; (e) Cold Spring Harbor Laboratory, New York; (f) Horticultural Science, NC State University
Analysis of DNA Replication Timing on Arabidopsis Chromosome 4
In this project we are identifying and characterizing regions of the genome that are replicated at different times during S phase. We have developed a
FACS-based procedure, combined with BrdU pulse-labeling and immunoprecipitation, to analyze replication timing of an asynchronous cell population.
This approach is intended to define functional domains of chromatin by determining their preferred time of replication.
Figure 1 shows a chromosome-wide view of genomic features and DNA replication in the early, mid and late S-phase for Arabidopsis Chr 4. Panel A
shows gene coverage (orange line) and TE coverage (purple line). Panel B shows GC percentage (calculated in 1-kb non-overlapping windows).
Panel C shows a schematic representation of chr 4, omitting the telomeres and NOR. The gene-rich euchromatic distal short and distal
long arms are shaded light gray, while the heterochromatic knob and pericentromere are shaded black. The proximal portions of both the short and long
arms have intermediate characteristics and are shaded dark gray. Panel D shows replication profiles, expressed as the log2 ratio of BrdU incorporation in
early (blue), mid (green) or late (red) S-phase cells relative to total DNA from the same cells. Gene and TE coverage, GC percentage, and replication
profiles are loess-smoothed using a 150-kb window.
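The windowed GC calculation described for panel B can be sketched in a few lines of Python. This is a minimal illustration, not the project's pipeline: the function name is hypothetical, the toy sequence is made up, and excluding ambiguous bases (such as N) from the denominator is an assumption.

```python
def gc_percent_windows(seq, window=1000):
    """Return the GC percentage of each full, non-overlapping window of seq.

    Assumption: ambiguous bases (e.g. N) are excluded from the denominator,
    and a trailing partial window is dropped.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        if acgt == 0:
            values.append(None)  # window contains no unambiguous bases
        else:
            gc = chunk.count("G") + chunk.count("C")
            values.append(100.0 * gc / acgt)
    return values


# Toy 3-kb sequence (made up): one GC-only, one AT-only, one mixed window.
toy = "GC" * 500 + "AT" * 500 + "GCAT" * 250
print(gc_percent_windows(toy))
```

The per-window values would then be smoothed (the text uses loess with a 150-kb window) before plotting; smoothing is left out of this sketch.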
Early replication is most prevalent in the distal long arm, a predominately euchromatic region. Late replication predominates in the heterochromatic
knob and pericentromere, whereas regions of late replication are dispersed in other portions of the chromosome, especially in the centromere-proximal
portions of the long and short arms.
A striking feature of these data is that the early and late replication profiles in panel D are strongly complementary (R = -0.83), while the
profiles for replication in early and mid S-phase cells are very similar to each other (R = 0.87). The most evident difference between the early and mid S
profiles is a broadening and merging of early-replicating regions in mid S. In other words, the DNA replicating in mid S-phase represents nearly the same
population of sequences as that replicating in early S-phase. The mid S-phase profile is also distinct from the late profile (R = -0.85). These data indicate
that replication in Arabidopsis is basically a two-phase process. The similarity of the early and mid S profiles is best explained by assuming considerable
heterogeneity in the order of replication in different cells.
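As an illustration of how the profiles and their correlations relate, the following Python sketch computes log2 ratios against total DNA and a correlation coefficient between two profiles. It assumes that R is a Pearson correlation, which the text does not state; the function names and intensity values are hypothetical and are not project data, and loess smoothing is omitted.

```python
import math


def log2_ratio(labeled, total):
    """Per-probe log2 ratio of BrdU-labeled signal to total-DNA signal."""
    return [math.log2(l / t) for l, t in zip(labeled, total)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up intensities for four probes (not project data).
total = [1.0, 1.0, 1.0, 1.0]
early = log2_ratio([4.0, 2.0, 1.0, 0.5], total)
late = log2_ratio([0.5, 1.0, 2.0, 4.0], total)
print(pearson_r(early, late))  # perfectly complementary toy profiles
```

A strongly negative coefficient, as in this toy case, corresponds to the complementarity reported for the early and late profiles.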
Figure 2 displays a schematic representation of replicons, replication timing and replication domains for chr 4. In the top panel, each vertical bar
represents a replicon, with the width of the bar proportional to the length of the replicon. Subdivisions within the bar indicate the percentage of probes
within the replicon with a given replication time. The middle panel illustrates the clustering of replicons with similar timing into replication domains. The
lower panel is a cartoon of the major regions of the chromosome, as in panel C of Figure 1. The complexity of replication timing within many replicons
likely reflects several factors, including time and efficiency of origin firing, the number of origins within initiation zones, and the speed of elongation by
DNA polymerase in specific contexts. Many of the replication domains we found in Arabidopsis chr 4 are considerably smaller than those observed in
mammalian cells. However, several larger replication domains do occur, including a 4.5-Mb late replication domain at coordinates 2.6 to 7.1 Mb and a 2.3-Mb early/mid replication domain at coordinates 16.2 to 18.5 Mb.
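One way to picture the subdivisions within each replicon bar is to assign every probe a replication time class and report the percentage of probes per class. The Python sketch below is only an assumption about how such a summary could be computed: the threshold of 0 on the log2 ratio, the function names, and the probe values are all hypothetical, and the poster does not describe the actual classification rule.

```python
from collections import Counter

PHASES = ("Early", "Mid", "Late")


def classify_probe(early, mid, late, threshold=0.0):
    """Label a probe by the S-phase fractions whose log2 ratio exceeds threshold.

    Assumption: a simple fixed threshold decides whether a probe is called
    as replicating in a given fraction.
    """
    called = [name for name, value in zip(PHASES, (early, mid, late))
              if value > threshold]
    if not called:
        return "Indeterminate"
    if len(called) == 3:
        return "Early, Mid and Late"
    return " and ".join(called)


def replicon_composition(probes, threshold=0.0):
    """Percentage of probes in each replication time class within one replicon."""
    counts = Counter(classify_probe(*p, threshold=threshold) for p in probes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Made-up (early, mid, late) log2 ratios for four probes in one replicon.
probes = [(1.0, 0.5, -1.0), (0.8, -0.2, -0.9), (-1.0, -0.5, 1.2), (-0.1, -0.1, -0.1)]
print(replicon_composition(probes))
```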
Figure 1 (panels A to D). Landscape of Arabidopsis chr 4 with replication timing profiles for early, mid, and late S-phase cells
Mapping Nuclear Matrix Attachment Regions (MARs)
Genomic DNA is packaged and organized
within the nucleus by histones. When the
histones are extracted, the DNA forms large
loops (nuclear haloes), which remain bound
by Matrix Attachment Regions, or MARs, to a
substructure composed of RNA and protein
called the nuclear matrix. While the biological
significance of MARs remains largely
unknown, several studies have shown that
MARs may function as origins of DNA
replication in higher eukaryotes. We have
used lithium diiodosalicylate (LIS) to extract
the histones from A. thaliana nuclei to produce
nuclear haloes, which were then digested with
EcoRI and HindIII. Matrix-associated DNA
was separated from unbound DNA by low-speed
centrifugation. The MAR DNA was then
amplified and labeled for microarray analysis.
Figure 2 legend (replication time classes): Early; Early and Mid; Mid; Mid and Late; Late; Early and Late; Early, Mid and Late; Indeterminate
Preliminary Mapping Results: We have carried out four array hybridization experiments that represent two biological replicates and two technical
replicates using our custom-designed NimbleGen tiling array for chromosome IV. Our "first pass" analysis used a combination of limma and
NimbleScan to resolve 933 putative MARs at an estimated FDR < 0.05. The median length of the putative MAR regions from this analysis is 800 bp.
Panel A shows that the median AT content of the putative MARs (histogram) is 71%, compared with a median AT content of 63% along
chromosome IV (red curve). These data are consistent with earlier studies showing that MARs are AT-rich. The distance between adjacent MARs
can be used to estimate loop size. Panel B depicts the uneven spacing of MARs, which encompasses a range of loop sizes across chromosome IV.
The frequency of MAR spacing shows a unimodal distribution. The largest peak contained MARs with spacing that ranged from 42 bp to 265 kb, with
an average loop size of 19 kb and a median loop domain size of 10 kb.
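The loop-size estimate can be sketched as the distance between adjacent MARs along the chromosome. In this minimal Python example the coordinates are made up, the function name is hypothetical, and measuring from the end of one MAR to the start of the next is an assumption (midpoint-to-midpoint distances would be an equally plausible definition). It also assumes the MAR intervals do not overlap.

```python
from statistics import mean, median


def loop_sizes(mars):
    """Gaps (bp) between adjacent MARs given as (start, end) on one chromosome.

    Assumption: intervals do not overlap, and the loop is measured from the
    end of one MAR to the start of the next.
    """
    ordered = sorted(mars)
    return [next_start - prev_end
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# Made-up MAR coordinates (each 800 bp long), not project data.
mars = [(1000, 1800), (11800, 12600), (42600, 43400), (45400, 46200)]
sizes = loop_sizes(mars)
print(sizes, mean(sizes), median(sizes))
```

Because a few long gaps pull the mean upward, the mean exceeds the median in a right-skewed distribution, which matches the pattern of a 19-kb average and a 10-kb median reported above.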
DNA Replication Timing in Rice
We are optimizing the essential
conditions for analysis of DNA
replication timing on rice chr 4L and
10L using a rice cell culture (cultivar
Nipponbare). All the technologies we
developed for Arabidopsis replication
timing will apply to this rice work with
the optimized conditions. Panel A:
The highest BrdU incorporation was
observed twelve to sixteen hours after
7-day-old cultures were supplemented with
fresh medium ("7-day split"). Panel B:
Analytical FACS profile of rice nuclei
isolated from cells pulse-labeled for 1 hour,
16 hours after the 7-day split. Gates are
defined for cell populations in G1, S
and G2/M phases.
Figure 2. Replication timing and replicon structure on Arabidopsis chr 4
Mapping of Short Nascent Strands
DNA replication is a strictly regulated process that preserves the genetic information necessary for future generations. Despite its
importance, very little is known of the regulation of DNA replication in higher eukaryotes.
Our goal is to define where DNA replication originates in the Arabidopsis thaliana genome. We are using newly
synthesized leading strands (short nascent strands, SNS), which are thought to initiate at the origins themselves. These SNS are being
analyzed using a custom-designed NimbleGen tiling array that covers the entire chromosome IV. We have developed two techniques to
achieve this. In the first (A), we isolate SNS by size using an alkaline sucrose gradient. DNA between 1 and 3 kb (including
SNS) is collected, then amplified, labeled and hybridized to the array.
The second technique was developed to enrich and purify SNS. During DNA synthesis, primase adds an RNA primer that
DNA polymerase recognizes and extends. To isolate SNS, we use lambda exonuclease (B), which digests unprotected DNA but is unable to digest
ssDNA that carries an RNA primer at its 5' end. This allows us to recover newly synthesized DNA that is close to the
origin. The recovered SNS are then amplified, labeled and hybridized to the array.
Profiling Histone Modifications
During the last year we have profiled histone modifications,
RNA Pol II occupancy and gene expression patterns in cell
suspension culture (samples Cells4 and Cells7, taken at
96 hrs and 16 hrs post culture split, respectively) and in the
young rosette leaf samples from the wild-type Col plants
using Illumina GA2 high-throughput DNA sequencers. We
used antibodies against histone H3K4me2, H3K4me3,
H3K9me2, H3K27me2, H3K27me3, H3K14ac and
H3K56ac, as well as antibodies against RNA Pol II to
immunoprecipitate DNA associated with these histone
modifications or with the initiating or elongating forms of
RNA Pol II. This generated, on average, over 1.5x genome
coverage per sequenced ChIP library. We have also
sequenced over 2M mRNA fragments (40-50 nt in length) in
a strand-specific manner from the same samples. In
addition, we have sequenced between 1.7M and 3M small
RNAs from leaf, Cells4 and Cells7 samples. We are currently
focusing our efforts on the analysis of ChIP-seq and RNA-seq data in the context of DNA replication timing by
computational identification of genomic regions significantly
enriched by immunoprecipitation over the control (input)
DNA in 100-bp windows. The windows showing enrichment
will be displayed as intensity maps in the Generic Genome
Browser 2 environment.
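The window-based comparison of ChIP and input reads can be illustrated with a small Python sketch. It is a simplification under stated assumptions: reads are reduced to start positions on one chromosome, library sizes are equalized by a simple scaling factor, and a fixed log2 fold-change cutoff stands in for the statistical test implied by "significantly enriched". All names, parameters, and read positions are hypothetical.

```python
import math
from collections import Counter


def window_counts(read_starts, window=100):
    """Count reads per window, keyed by window index."""
    return Counter(pos // window for pos in read_starts)


def enriched_windows(ip_starts, input_starts, window=100,
                     min_log2=1.0, pseudocount=1.0):
    """Return {window_start: log2(IP/input)} for windows at or above min_log2.

    Assumption: a fold-change cutoff is used here in place of a proper
    significance test; the pseudocount avoids division by zero.
    """
    if not ip_starts or not input_starts:
        raise ValueError("both read lists must be non-empty")
    ip = window_counts(ip_starts, window)
    ctrl = window_counts(input_starts, window)
    scale = len(input_starts) / len(ip_starts)  # equalize library sizes
    result = {}
    for w in sorted(set(ip) | set(ctrl)):
        ratio = math.log2((ip[w] * scale + pseudocount) / (ctrl[w] + pseudocount))
        if ratio >= min_log2:
            result[w * window] = ratio
    return result


# Made-up read start positions (not project data).
ip_reads = [105, 110, 150, 199, 320]
input_reads = [20, 130, 310, 450, 560]
print(enriched_windows(ip_reads, input_reads))
```

In this toy case only the window starting at position 100 passes the cutoff, because four of the five ChIP reads fall there against a single input read.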
(C) Comparison of the preliminary SNS data
obtained with the two different procedures.
The data show that 62% of the peaks
found with the lambda exonuclease method are
also found in SNS isolated by size. While this
suggests that we have putative SNS,
more analysis is needed to confirm our
results.
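A shared-peak percentage of this kind can be computed by counting the peaks from one method that overlap any peak from the other. The Python sketch below uses made-up peak coordinates and hypothetical function names, and it assumes that any overlap of half-open intervals counts as a common peak; the criterion used for the 62% figure is not stated on the poster.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b share at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_shared(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b."""
    shared = sum(1 for p in peaks_a if any(overlaps(p, q) for q in peaks_b))
    return shared / len(peaks_a)


# Made-up peak coordinates on one chromosome (not project data).
lambda_exo_peaks = [(100, 300), (1000, 1200), (5000, 5400), (9000, 9100)]
size_selected_peaks = [(250, 600), (1150, 1500), (7000, 7200)]
print(fraction_shared(lambda_exo_peaks, size_selected_peaks))
```

The measure is not symmetric: the fraction of size-selected peaks shared with the lambda exonuclease set is generally a different number, so the direction of the comparison should be stated with the result.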
ORC2 binding sites in Arabidopsis Ch4 euchromatin
often map with early replication timing peaks
Validation of ORC2 and MCM5 ChIP
(A) Proteins from the final ChIP input fraction
(40 µg, lane 1), whole cell extract (40 µg WCE,
lane 2) and volume equivalents from the non-chromatin-associated (S, lane 3) and
chromatin-bound (P, lane 4) fractions were
resolved by SDS-PAGE, and the blots were
probed with the indicated antibodies.
Chromatin was immunoprecipitated with the
indicated quantity of anti-ORC2 serum (B) or
anti-MCM5 serum (C). Proteins remaining in
the supernatant (depletion) and the
immunoprecipitated (enrichment) fractions
were resolved by SDS-PAGE, and the blots
were probed with anti-ORC2 serum (B) or anti-MCM5 serum (C). (D) Sheared chromatin.
Control (lane 2) and sonicated (lane 3) DNA
were resolved and visualized with ethidium
bromide.
The profile shows the loess-smoothed early S replication profile. Vertical lines indicate the termination zones for putative replicons. ORC2-binding sites
are marked by orange spheres. The vertical position of each ORC2 site is taken from the replication profile and does not indicate ORC2 enrichment.
Preliminary analysis of ORC2 binding sites: The experiment included three biological replicates, each with an IP technical replicate, for a total of 6 ChIP samples
corresponding to ORC2. The samples were hybridized to NimbleGen Ch4 tiling arrays. The raw data were loess-normalized and scaled in
limma, and peak-finding was performed with NimbleScan using an 800-bp window. This analysis identified 563 putative ORC2-binding sites at
an FDR ≤ 0.05, of which 289 were in constitutive heterochromatin. Euchromatic ORC2 binding sites tend to be intergenic: for example, the distal long arm is
only ~22% intergenic, but 56% of ORC2 binding sites in this region are intergenic. ORC2 binding sites are also AT-rich (68% for ORC2 binding
sites compared with 64% for sites not binding ORC2).
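The proportions quoted above can be checked with simple arithmetic. The short Python sketch below uses only the numbers given in the text; the function names are hypothetical, and the fold enrichment is approximate because the 22% intergenic figure is itself approximate.

```python
def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


def fold_enrichment(observed_pct, expected_pct):
    """Ratio of an observed percentage to the percentage expected by chance."""
    return observed_pct / expected_pct


total_sites = 563            # putative ORC2-binding sites (from the text)
heterochromatic_sites = 289  # sites in constitutive heterochromatin (from the text)

hetero_pct = percent(heterochromatic_sites, total_sites)
other_sites = total_sites - heterochromatic_sites
intergenic_fold = fold_enrichment(56.0, 22.0)  # distal long arm: sites vs. sequence

print(f"{hetero_pct:.1f}% of sites are in constitutive heterochromatin")
print(f"{other_sites} sites lie outside constitutive heterochromatin")
print(f"intergenic enrichment in the distal long arm: about {intergenic_fold:.1f}-fold")
```

About half of the sites fall in constitutive heterochromatin, and in the distal long arm ORC2 sites are intergenic roughly two and a half times more often than expected from the intergenic share of the sequence.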