FHI Biotechnology Approaches
Clonal testing
New varieties
Marker-aided breeding
Transgenics
Genome sequencing
GE trees
Chestnut Genome Research Team
John E. Carlson, PI, Schatz Center, Penn State University
DNA Sequencing
Stephan C. Schuster
Professor of Biochemistry and Molecular Biology, Penn State
Lynn P. Tomsho, Daniela Drautz, and Lindsay Kasson
Sequencing Specialists, Penn State
Tyler Wagner
Research Assistant, Penn State
Bioinformatics and Comparative Genomics
Webb Miller
Professor of Biology and Computer Science & Engineering, Penn State
Charles Addo-Quaye
Postdoctoral Fellow, Penn State
Meg Staton, Stephen Ficklin and Christopher Saski
Bioinformatics team at Clemson University Genomics Institute
Abdelali Barakat
Research Associate, Clemson University
FHI Cooperators: Bert Abbott, Sandra Anagnostakis, Kathleen Baier, Ali Barakat,
Nurul Faridi, Eric Feng, Stephen Ficklin, Fred Hebard, Thomas Kubisiak, Charles
Maynard, Scott Merkle, Joseph Nairn, William Powell, Dana Nelson
The Chinese Chestnut Genome Sequencing Project
Our Goals:
1) Develop a complete reference genome sequence for chestnut
2) Identify all genes in the three blight resistance QTL
3) Deliver candidate genes to the FHI Transgenics group and
the FHI Marker-aided breeding group
4) Provide the genome to the research community
5) Demonstrate the potential of genomics to address
forest health and ecosystem restoration.
The Chinese Chestnut Genome Sequencing Project
DELIVERABLES FOR YEAR ONE (all achieved)
1. The reference Castanea mollissima cv. Vanuxem
genome was sequenced to over 25-fold depth.
2. Preliminary de novo assemblies of the reference
genome sequence were conducted.
3. Commenced use of genetic and physical map
information (from the FHI genetic technologies
group) in genome assembly.
The Chinese Chestnut Genome Sequencing Project
DELIVERABLES FOR YEAR ONE, the details
• “Shotgun” sequencing completed by March 2010
- 18-fold* depth by 454 technology = 14.2 Gigabases
- 47-fold* depth by Illumina technology = 37.6 Gigabases
• Passed QC tests:
- mtDNA < 0.4% and cpDNA < 0.3% of sequence
- microbial DNA negligible
- sequence reads over 350 bp
- repetitive DNA manageable (conserved repeats at 9 to 12%)
• Preliminary assemblies of the genome sequence were promising,
totalling approx. 852 Mbp, but in smaller pieces than desired
* assumes a genome size for chestnut of approx. 800 Mbp
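The fold-depth figures above follow from dividing the number of sequenced bases by the assumed genome size. A minimal sketch of that arithmetic, using the approx. 800 Mbp genome size from the footnote; the function name `fold_depth` is illustrative only.

```python
GENOME_SIZE_BP = 800_000_000  # assumed chestnut genome size (approx. 800 Mbp, per the footnote)

def fold_depth(total_bases: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Return sequencing depth as total sequenced bases divided by genome size."""
    return total_bases / genome_size

runs = {"454": 14.2e9, "Illumina": 37.6e9}  # Gigabases reported for each platform
for platform, bases in runs.items():
    print(f"{platform}: {fold_depth(bases):.2f}-fold")
```

This gives 17.75-fold for 454 (rounded to 18-fold on the slide) and 47-fold for Illumina. If the genome is larger than 800 Mbp, the depths shrink in proportion.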
The Chinese Chestnut Genome Sequencing Project
WHAT WE LEARNED IN YEAR ONE
1. “Next Gen” sequencing technologies produce a
large amount of high quality data, very quickly.
2. Large amounts of high quality data take a long time
to assemble using currently available software.
3. Assembly of the reference genome will require more
than just “shotgun” Next Gen sequence data.
4. “Paired-end” data are required to pull contigs
together into chromosome scaffolds.
5. For assembly purposes, the chestnut genome may
be larger than 800 Mbp.
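To illustrate why paired-end data help: when the two reads of a pair land on different contigs, the pair is evidence that those contigs lie near each other in the genome. A minimal sketch of counting such links, assuming read-pair placements are already known as (read-1 contig, read-2 contig) tuples and using a hypothetical `min_support` threshold. Real scaffolders also use insert size and read orientation, which this sketch ignores.

```python
from collections import Counter

def contig_links(pair_placements, min_support=2):
    """Count read pairs whose mates map to different contigs.

    pair_placements: iterable of (contig_of_read1, contig_of_read2) tuples.
    Returns {(contig_a, contig_b): n_pairs} for links supported by at least
    min_support pairs (min_support is a hypothetical cut-off).
    """
    counts = Counter()
    for c1, c2 in pair_placements:
        if c1 != c2:  # mates on the same contig give no evidence for joining contigs
            counts[tuple(sorted((c1, c2)))] += 1
    return {link: n for link, n in counts.items() if n >= min_support}

# Hypothetical placements of six read pairs
pairs = [("ctg1", "ctg2"), ("ctg2", "ctg1"), ("ctg1", "ctg1"),
         ("ctg2", "ctg3"), ("ctg3", "ctg2"), ("ctg3", "ctg4")]
print(contig_links(pairs))  # {('ctg1', 'ctg2'): 2, ('ctg2', 'ctg3'): 2}
```

The single ctg3/ctg4 pair falls below the threshold, so only the ctg1-ctg2-ctg3 chain would be proposed as a scaffold.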
The Chinese Chestnut Genome Sequencing Project
DELIVERABLES ACHIEVED IN YEAR TWO
1. Produced paired-end sequence data.
2. Covered the physical map with BAC-end sequences.
3. Commenced gene identification and characterization:
• Transcripts aligned to the genome assembly
• Assembly searched for genes
• Preliminary annotations of genes conducted
4. Strategy for resistance gene discovery updated.
The Chinese Chestnut Genome Sequencing Project
DELIVERABLES FOR YEAR TWO, the details
1. Paired-end sequences from 454 sequencing at 4.5-fold
depth (3.6 Gb).
2. 43,143 BAC-end sequences obtained, “tiling” the physical
genome map to 1.5-fold depth, anchored to the genetic map.
3. New assemblies conducted using the paired-end data:
• 587,208,063 bp assembled into 51,766 scaffolds
• 925,312,071 bp assembled into 1,147,939 contigs
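The assembly totals above can be turned into mean fragment sizes, which makes the difference between scaffolds and contigs concrete. A minimal sketch using only the reported figures; the 800 Mbp genome size for the depth check is the Year One assumption.

```python
# Totals reported for the Year Two paired-end assemblies
scaffold_bp, n_scaffolds = 587_208_063, 51_766
contig_bp, n_contigs = 925_312_071, 1_147_939

mean_scaffold = scaffold_bp / n_scaffolds
mean_contig = contig_bp / n_contigs
print(f"mean scaffold length: {mean_scaffold:,.0f} bp")  # 11,344 bp
print(f"mean contig length:   {mean_contig:,.0f} bp")    # 806 bp

# Paired-end depth check, assuming the approx. 800 Mbp genome size from Year One
print(f"paired-end depth: {3.6e9 / 800e6:.1f}-fold")     # 4.5-fold
```

Means are only a coarse summary; the slides do not report the full length distribution of scaffolds or contigs.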
The Chinese Chestnut Genome Sequencing Project
DELIVERABLES FOR YEAR TWO
Gene Identification and Characterization
4. Chinese chestnut unigenes (transcripts) from the NSF project
aligned well to the current genome assembly:
• 97% of transcripts (46,954) aligned to the genome assembly
• 98% identity between transcript and genome sequences
5. Results of gene search with the preliminary assembly:
• 66,662 gene models predicted in the scaffolds
- certainly an over-estimate of gene number at this point
- mean gene length 2,761 bp, maximum length 43,203 bp
- mean number of genes per scaffold 12.8, maximum 58
6. Candidate gene sequences identified in genome contigs
• Coding sequences delivered to the transgenics team
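Summary statistics like those in item 5 (number of gene models, mean and maximum gene length, genes per scaffold) can be computed directly from gene predictions. A minimal sketch, assuming the predictions are stored as GFF3 with one `gene` feature per model; the slides do not say which gene predictor or file format was used, and the records below are made up.

```python
from collections import Counter

def gene_model_stats(gff3_lines):
    """Summarise 'gene' features from GFF3 text lines (tab-separated, 9 columns)."""
    lengths = []
    per_scaffold = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        seqid, _source, ftype, start, end = line.split("\t")[:5]
        if ftype != "gene":
            continue  # ignore mRNA, exon, CDS and other feature types
        lengths.append(int(end) - int(start) + 1)  # GFF3 coordinates are 1-based, inclusive
        per_scaffold[seqid] += 1
    return {
        "n_genes": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
        # averaged over scaffolds that carry at least one gene model
        "mean_per_scaffold": len(lengths) / len(per_scaffold),
        "max_per_scaffold": max(per_scaffold.values()),
    }

# Hypothetical records on two scaffolds
example = [
    "##gff-version 3",
    "scaffold1\tpred\tgene\t100\t2099\t.\t+\t.\tID=g1",
    "scaffold1\tpred\tgene\t5000\t5999\t.\t-\t.\tID=g2",
    "scaffold2\tpred\tgene\t1\t3000\t.\t+\t.\tID=g3",
    "scaffold2\tpred\tmRNA\t1\t3000\t.\t+\t.\tID=m3;Parent=g3",
]
print(gene_model_stats(example))
```

Note that "genes per scaffold" depends on whether scaffolds without any gene models are counted; this sketch counts only gene-bearing scaffolds.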
The Chinese Chestnut Genome Sequencing Project
The largest gene identified in the preliminary
Chinese chestnut genome assembly
Homolog of AT1G67120 (NP_176883.4): AAA ATPase, von Willebrand factor
type A domain-containing protein, with nucleoside-triphosphatase activity.
• Transcript length: 43,203 bases
• Number of exons: 71
• Scaffold ID: scaffold01252
The Chinese Chestnut Genome Sequencing Project
Most Arabidopsis single-copy genes have strong matches to the
current genome assembly (by BLAST alignment)
Figure: number of genes (N = 959) by E-value of the best BLAST match (strength of matches, e-05 to e-100), shown for All_Contigs, Large_Contigs and Scaffolds.
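The figure groups genes by the E-value of their best BLAST match; smaller E-values mean stronger matches. A minimal sketch of such binning, assuming each gene is placed in the most stringent threshold on the x-axis (e-05 through e-100) that its best E-value still passes. The slide does not state exactly how the bins were defined, and the E-values below are invented.

```python
from collections import Counter

# Thresholds from the figure's x-axis, from weakest (1e-5) to strongest (1e-100).
# float("1e-N") parses the exact decimal literal, avoiding pow() rounding surprises.
EXPONENTS = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
CUTOFFS = [(f"e-{e:02d}", float(f"1e-{e}")) for e in EXPONENTS]

def evalue_bin(evalue):
    """Return the label of the most stringent cutoff that evalue passes, or None."""
    label = None
    for name, cutoff in CUTOFFS:
        if evalue <= cutoff:
            label = name
        else:
            break  # cutoffs only get stricter, so stop at the first failure
    return label

# Hypothetical best-match E-values for four genes (BLAST reports 0.0 for very strong hits)
best_evalues = {"geneA": 3e-12, "geneB": 0.0, "geneC": 2e-45, "geneD": 1e-3}
counts = Counter(b for b in map(evalue_bin, best_evalues.values()) if b is not None)
print(counts)
```

Here geneD (E = 1e-3) fails even the weakest cutoff and is left out of the counts.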
The Chinese Chestnut Genome Sequencing Project
Best matches of proteins from the chestnut genome assembly
are to peach and other related species.
Only 1% of best matches are to Arabidopsis.
The peach genome is the most useful for chestnut gene discovery.
BLASTx alignments to model plant
genomes in Phytozome
Best matches:
• peach, 23%
• rice, 12%
• grapevine, 7%
• Eurosids 1 species, 56%
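Percentages like these come from taking each chestnut protein's single best hit and tallying the species it belongs to. A minimal sketch, assuming hits are available as (query, species, bitscore) tuples (for example, extracted from BLAST tabular output after mapping subject IDs to species) and that "best" means highest bitscore. The slides do not say how the best match was chosen, and the hits below are hypothetical.

```python
from collections import Counter

def best_hit_shares(hits):
    """Percent of query proteins whose single best hit falls in each species.

    hits: iterable of (query_id, subject_species, bitscore) tuples.
    The best hit per query is the one with the highest bitscore; ties keep the first seen.
    """
    best = {}
    for query, species, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (species, bitscore)
    counts = Counter(species for species, _ in best.values())
    total = len(best)
    return {sp: 100.0 * n / total for sp, n in counts.items()}

# Hypothetical BLASTx hits for four chestnut proteins
hits = [
    ("p1", "peach", 410.0), ("p1", "grapevine", 380.0),
    ("p2", "rice", 150.0),
    ("p3", "peach", 220.0), ("p3", "Arabidopsis", 230.0),
    ("p4", "peach", 90.0),
]
print(best_hit_shares(hits))  # {'peach': 50.0, 'rice': 25.0, 'Arabidopsis': 25.0}
```

Only the top hit per protein counts, so a species can be a close second for many proteins (grapevine for p1 here) without appearing in the tally.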
The Chinese Chestnut Genome Sequencing Project
The predicted chestnut
proteins are most similar to
species in the Eurosids 1
clade, which includes both
peach and chestnut.
Figure: clades labelled eurosids 1 and eurosids 2
Source: http://www.phytozome.net/
The Chinese Chestnut Genome Sequencing Project
However, the genome assembly is uneven and not yet good enough
to assemble all of the blight resistance QTL genes.
Figure: range of coverage among genome scaffolds
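Per-scaffold coverage is the number of sequenced bases aligned to a scaffold divided by its length, and the spread of that value across scaffolds is one way to see how uneven an assembly is. A minimal sketch, assuming aligned bases have already been summed per scaffold; the scaffold names and numbers are hypothetical.

```python
def scaffold_coverage(aligned_bases, scaffold_lengths):
    """Return {scaffold: mean depth}, i.e. aligned bases divided by scaffold length."""
    return {s: aligned_bases.get(s, 0) / length for s, length in scaffold_lengths.items()}

# Hypothetical scaffolds and aligned-base totals
lengths = {"scafA": 10_000, "scafB": 50_000, "scafC": 2_000}
aligned = {"scafA": 450_000, "scafB": 600_000, "scafC": 300_000}

cov = scaffold_coverage(aligned, lengths)
print(f"coverage range: {min(cov.values()):.1f}x to {max(cov.values()):.1f}x")  # 12.0x to 150.0x
```

Scaffolds with no aligned reads get a coverage of 0 rather than being dropped, so gaps in the data still show up in the range.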
The Chinese Chestnut Genome Sequencing Project
Our target is the blight resistance genes. We are sequencing the
resistance QTL themselves; this work is already in progress:
• Sets of BAC contigs covering the QTLs were identified.
• Sequencing of each QTL is underway, as pools of contigs.
• Genes will be identified using peach resistance QTL and Chinese chestnut (CC) transcripts.
QTL by linkage group | Physical map contig # | Estimated contig size | # Clones in minimum tiling path | Estimated clone lengths | DNA pool
G | 7039 | 4.51 Mb | 40 | 6.22 Mb | A
F | 403 | 5.13 Mb | 51 | 7.64 Mb | B
B | 9166 | 2.50 Mb | 24 | 3.47 Mb | C
B | 4269 | 2.31 Mb | 24 | 3.45 Mb | C
B | 3279 | 1.68 Mb | 19 | 2.37 Mb | D
B | 11956 | 3.65 Mb | 30 | 5.06 Mb | D
TOTALS | | 19.79 Mb | 188 | 28.2 Mb |
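The table's totals and the way clones are split across DNA pools can be checked with a few lines of arithmetic. A minimal sketch using only the row values shown; the variable names are illustrative.

```python
from collections import defaultdict

# Rows from the QTL table:
# (QTL linkage group, contig #, contig size Mb, clones in tiling path, clone lengths Mb, DNA pool)
rows = [
    ("G", 7039, 4.51, 40, 6.22, "A"),
    ("F", 403, 5.13, 51, 7.64, "B"),
    ("B", 9166, 2.50, 24, 3.47, "C"),
    ("B", 4269, 2.31, 24, 3.45, "C"),
    ("B", 3279, 1.68, 19, 2.37, "D"),
    ("B", 11956, 3.65, 30, 5.06, "D"),
]

contig_mb = sum(r[2] for r in rows)
clones = sum(r[3] for r in rows)
clone_mb = sum(r[4] for r in rows)
print(f"contigs {contig_mb:.2f} Mb, clones {clones}, clone DNA {clone_mb:.2f} Mb")
print(f"clone DNA / contig length: {clone_mb / contig_mb:.2f}")  # 1.43

# Total clone DNA per sequencing pool
pool_mb = defaultdict(float)
for *_, mb, pool in rows:
    pool_mb[pool] += mb
print({pool: round(mb, 2) for pool, mb in pool_mb.items()})  # {'A': 6.22, 'B': 7.64, 'C': 6.92, 'D': 7.43}
```

The rows sum to 19.78 Mb, so the slide's 19.79 Mb total presumably reflects rounding of the underlying contig sizes. Clone DNA exceeds contig length by about 1.4-fold, presumably because clones in a minimum tiling path still overlap. The four pools each hold roughly 6 to 8 Mb of clone DNA.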
The Chinese Chestnut Genome Sequencing Project
Year 3 - Gene discovery
Clonal testing
Marker-aided breeding
New varieties
Markers in QTL genes
Transgenics
Genome sequencing
Complete QTL sequences
GE trees
Candidate genes from the QTLs
Candidate gene validation
