Arrays as tools for Natural Variation studies:
Mapping, Haplotyping, and gene expression
Justin Borevitz
University of Chicago
naturalvariation.org
Talk Outline
• Single Feature Polymorphisms (SFPs)
– Potential deletions
• Bulk Segregant Mapping
– Extreme Array Mapping
• Haplotyping
– Selection
• Transcriptional profiling
– for QTL candidate genes
What is Array Genotyping?
• Affymetrix expression GeneChips contain
202,806 unique 25bp oligonucleotides.
• 11 features per probe set for 21,546 genes
• New arrays have even more
• Genomic DNA is randomly labeled with
biotin, product ~50bp.
• 3 independent biological replicates
compared to the reference strain Col
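One way to picture how replicate hybridizations can flag a Single Feature Polymorphism: for each 25bp feature, compare the log intensities of the reference strain Col with those of another accession. The sketch below computes a SAM-style moderated t statistic for a single feature. The function name `sam_like_d`, the constant `s0`, and all intensity values are illustrative assumptions, not values from this talk.

```python
import math
from statistics import mean

def sam_like_d(ref, alt, s0=0.1):
    """SAM-style statistic for one feature: mean difference / (pooled SE + s0).

    ref, alt: log2 intensities from replicate arrays (hypothetical values).
    s0: small positive constant that damps features with tiny variance (assumed).
    """
    n1, n2 = len(ref), len(alt)
    m1, m2 = mean(ref), mean(alt)
    # Pooled sum of squares across both groups
    ss = sum((x - m1) ** 2 for x in ref) + sum((x - m2) ** 2 for x in alt)
    pooled_var = ss / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / (se + s0)

col = [10.1, 10.3, 10.2]     # hypothetical Col replicates
ler = [8.0, 8.2, 7.9]        # hypothetical replicates, feature hybridizes poorly
same = [10.2, 10.0, 10.3]    # hypothetical replicates, no polymorphism

d_sfp = sam_like_d(col, ler)
d_null = sam_like_d(col, same)
print(round(d_sfp, 2), round(d_null, 2))
```

A feature whose statistic is far from zero is a candidate SFP; where to draw the line is what the false discovery threshold controls.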
GeneChip
Potential Deletions
Spatial Correction
Spatial Artifacts
Improved reproducibility
Next: Quantile Normalization
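Quantile normalization can be sketched in a few lines: sort each array, average the intensities at each rank across arrays, and give every feature the average for its rank. This is a minimal illustration with hypothetical intensities; ties are broken by position, which is a simplification rather than the behavior of any particular package.

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    arrays: list of equal-length lists, one list of feature intensities per array.
    """
    n_features = len(arrays[0])
    # Sort each array, then average across arrays at each rank
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays)
                  for r in range(n_features)]
    normalized = []
    for a in arrays:
        # Feature indices ordered from lowest to highest intensity
        order = sorted(range(n_features), key=lambda i: a[i])
        out = [0.0] * n_features
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

raw = [[5.0, 2.0, 3.0, 4.0],   # hypothetical intensities, array 1
       [4.0, 1.0, 4.5, 2.0],   # array 2
       [3.0, 4.0, 6.0, 8.0]]   # array 3
norm = quantile_normalize(raw)
print(norm[0])
```

After this step each array contains the same set of values, so remaining differences at a feature reflect its rank within the array rather than overall brightness.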
False Discovery and Sensitivity
Cereon may be a sequencing error; TIGR match is a match
PM only
SAM threshold: 5% FDR
                   Total   SFPs    nonSFPs
GeneChip                   3806    89118
Sequence           817     121     696
Polymorphic        340     117     223
Non-polymorphic    477     4       473
Cereon marker accuracy:   100%   90%   80%   70%
Sensitivity:              34%    41%   53%   85%
False Discovery rate: 3%
Test for independence of all factors:
Chisq = 177.34, df = 1, p-value = 1.845e-40
SAM threshold: 18% FDR
                   Total   SFPs    nonSFPs
GeneChip                   10627   82297
Sequence           817     223     594
Polymorphic        340     195     145
Non-polymorphic    477     28      449
Cereon marker accuracy:   100%   90%   80%   70%
Sensitivity:              57%    67%   85%   100%
False Discovery rate: 13%
Test for independence of all factors:
Chisq = 265.13, df = 1, p-value = 1.309e-59
3/4 Cvi markers were also confirmed in PHYB
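The sensitivity, false discovery rate, and chi-square values for the 817 sequenced features can be recomputed from the four counts at each threshold. The sketch below assumes the test is a Pearson chi-square on the 2x2 table without continuity correction; that assumption reproduces the reported 177.34 and 265.13. The function name `summarize` is illustrative.

```python
import math

def summarize(sfp_poly, non_poly, sfp_nonpoly, non_nonpoly):
    """Sensitivity, false discovery rate, and a 2x2 Pearson chi-square test."""
    a, b, c, d = sfp_poly, non_poly, sfp_nonpoly, non_nonpoly
    n = a + b + c + d
    sensitivity = a / (a + b)   # called SFPs among sequenced polymorphic features
    fdr = c / (a + c)           # non-polymorphic features among called SFPs
    # Pearson chi-square for a 2x2 table, no continuity correction (assumed)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail probability for chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return sensitivity, fdr, chi2, p_value

strict = summarize(117, 223, 4, 473)    # counts at the 5% FDR threshold
loose = summarize(195, 145, 28, 449)    # counts at the 18% FDR threshold
for name, (sens, fdr, chi2, p) in (("5% FDR", strict), ("18% FDR", loose)):
    print(f"{name}: sensitivity={sens:.2f} FDR={fdr:.2f} chisq={chi2:.2f} p={p:.3e}")
```

The looser threshold trades a higher false discovery rate (28 of 223 calls) for higher sensitivity (195 of 340 polymorphic features).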
Chip genotyping of a Recombinant Inbred Line
29kb interval
Discovery: 6 replicates × $500 / 12,000 SFPs = $0.25 per SFP
Typing: 1 replicate × $500 / 12,000 SFPs = $0.041 per SFP
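The per-marker cost is simply the array spend divided by the number of SFPs scored. A small sketch, with the function name `cost_per_sfp` chosen for illustration:

```python
def cost_per_sfp(replicates, array_cost, n_sfps):
    """Dollars spent per SFP when every array scores all SFPs at once."""
    return replicates * array_cost / n_sfps

discovery = cost_per_sfp(6, 500, 12_000)   # six replicate arrays to discover SFPs
typing = cost_per_sfp(1, 500, 12_000)      # one array to genotype a new line
print(f"discovery ${discovery:.3f}, typing ${typing:.4f}")
```

Typing works out to about $0.0417 per SFP, which the slide shows as $0.041.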
LIGHT1
NIL
Potential Deletions
>500 potential deletions
45 confirmed by Ler sequence
23 (of 114) transposons
Disease Resistance (R) gene clusters
Single R gene deletions
Genes involved in secondary metabolism
Unknown genes
Potential Deletions Suggest Candidate Genes
MAF1 natural deletion
FLOWERING1 QTL
Chr1 (bp)
MAF1
Flowering Time QTL caused by a natural deletion in MAF1
Fast Neutron deletions
FKF1 80kb deletion CHR1
Het
cry2 10kb deletion CHR1
Map bibb
100 bibb mutant plants
100 wt mutant plants
bibb mapping
Bulk segregant mapping using chip hybridization
bibb maps to Chromosome 2 near ASYMMETRIC LEAVES1 (AS1)
ChipMap
BIBB = ASYMMETRIC LEAVES1
AS1 (ASYMMETRIC LEAVES1) = MYB closely related to PHANTASTICA, located at 64cM
[Figure: bibb and as1 plants]
Sequenced AS1 coding region from bib-1 … found a g -> a change that would introduce a stop codon in the MYB domain
[Figure: MYB domain of AS1 with bib-1 (W49*) and as1-101 (Q107*) marked; bibb and as1-101 plants]
[Figure: bulk segregant mapping of stamenstay (Sarah Liljegren). Five panels, Chromosomes 1 to 5; x-axis: cM; y-axis: allele frequency (-0.5 to 0.5). Labels: stamenstaymut, stamenstay, Ler]
Mapping confirmed
[Figure: bulk segregant mapping of the ein6een double mutant (Ramlah Nehring). Five panels, Chromosomes 1 to 5; x-axis: cM; y-axis: allele frequency (-0.6 to 0.6). Label: ein6F2mut]
Mapping confirmed
eXtreme Array Mapping
[Figure: Histogram of Kas/Col RILs, Red light; x-axis: hypocotyl length (mm); y-axis: counts]
15 tallest RILs pooled vs
15 shortest RILs pooled
eXtreme Array Mapping
[Figure: Chromosome 2; x-axis: cM (0 to 100); y-axis: LOD; RED2 QTL marked]
Composite Interval Mapping
RED2 QTL 12cM
15 tallest RILs pooled vs
15 shortest RILs pooled
Allele frequencies determined by SFP
genotyping. Thresholds set by simulations.
Red light QTL RED2 from 100 Kas/Col RILs
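A threshold "set by simulations" can be illustrated as follows: with no QTL, the 15 tallest and 15 shortest RILs are effectively two random draws from the 100 lines, so the null distribution of the allele-frequency difference between pools can be simulated directly. This is a single-marker sketch under assumed 1:1 segregation. It ignores linkage between markers, array noise, and genome-wide multiple testing, and it is not the simulation used in the talk. All names and parameter defaults are illustrative.

```python
import random

def null_threshold(n_rils=100, pool_size=15, n_sims=2000, quantile=0.95, seed=1):
    """Simulate pooled allele-frequency differences at one unlinked marker."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        # Each RIL is homozygous: 1 = Kas allele, 0 = Col allele (assumed 1:1)
        genotypes = [rng.randint(0, 1) for _ in range(n_rils)]
        # Without a QTL, phenotype rank is unrelated to genotype, so the
        # two extreme pools are just disjoint random samples of lines
        picked = rng.sample(range(n_rils), 2 * pool_size)
        tall, short = picked[:pool_size], picked[pool_size:]
        f_tall = sum(genotypes[i] for i in tall) / pool_size
        f_short = sum(genotypes[i] for i in short) / pool_size
        diffs.append(abs(f_tall - f_short))
    diffs.sort()
    # Empirical quantile of the null distribution
    return diffs[int(quantile * n_sims) - 1]

threshold = null_threshold()
print(threshold)
```

An observed difference between the tall and short pools that exceeds this value at a marker would be unusual under the no-QTL model.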
Fine Mapping with Arrays
[Figure: genotype along each chromosome. Five panels titled Chromosome 1 (cM) to Chromosome 5 (cM); x-axis ticks in kb (0 to 600); y-axis: genotype (-1.0 to 1.0)]
Single Additive Gene
1000 F2s
Select recombinants by PCR, 1Mb region
Barley SFPs gDNA
• 9 arrays, random labeled genomic DNA
• 3 wild type, 3 parent 1, 3 parent 2
• Hope to verify some RNA SFPs
• Pairs plots, correlation matrix
• SFP table
Just better than permutations
delta   ori.data   perm.data   difference   FDR
0.10    2866       2114.2      751.8        0.74
0.15    1870       578.4       1291.6       0.31
0.20    1274       269.3       1004.7       0.21
0.25    991        174.7       816.3        0.18
0.30    816        126.8       689.2        0.16
0.35    660        95.8        564.2        0.15
0.40    554        75.8        478.2        0.14
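The FDR column is consistent with dividing the number of calls in permuted data by the number of calls in the original data at each delta. A short sketch that recomputes both derived columns from the table values (variable names are illustrative):

```python
# (delta, calls in original data, mean calls in permuted data) from the table
rows = [
    (0.10, 2866, 2114.2),
    (0.15, 1870, 578.4),
    (0.20, 1274, 269.3),
    (0.25, 991, 174.7),
    (0.30, 816, 126.8),
    (0.35, 660, 95.8),
    (0.40, 554, 75.8),
]

results = []
for delta, called, permuted in rows:
    difference = called - permuted   # calls beyond what permutation alone gives
    fdr = permuted / called          # expected fraction of false calls
    results.append((delta, round(difference, 1), round(fdr, 2)))
    print(f"delta={delta:.2f} difference={difference:.1f} FDR={fdr:.2f}")
```

At delta 0.10 roughly three quarters of the calls are expected by chance, which is why the slide describes the result as just better than permutations.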
Increase specific activity with other labeling methods
Perform more replicates
• Single Feature Polymorphisms
– Improve with replicates (easy)
– Improved statistical models
• Genotyping
– Precisely define recombination breakpoints
– Fine mapping
• Potential Deletions
– Candidate genes/ induced mutations
• Bulk segregant Mapping
– eXtreme Array Mapping, F2s etc
Array Haplotyping
• What about Diversity/selection across the
genome?
• A genome wide estimate of population
genetics parameters, θw, π, Tajima’s D, ρ
• LD decay, Haplotype block size
• Deep population structure?
• Col, Lz, Ler, Bay, Shah, Cvi, Kas, C24,
Est, Kin, Mt, Nd, Sorbo, Van, Ws2
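For reference, θw, π, and Tajima's D can be computed from a matrix of presence/absence calls, one row per accession. The sketch below uses the standard formulas on a tiny hypothetical matrix. Because array-based SFP calls miss some polymorphisms, values computed from real SFP data would be approximations of the sequence-based statistics. ρ is not estimated here.

```python
import math
from itertools import combinations

def diversity_stats(haplotypes):
    """haplotypes: list of equal-length 0/1 lists, one per accession."""
    n = len(haplotypes)
    sites = list(zip(*haplotypes))
    # Segregating sites: positions where the accessions are not all alike
    s = sum(1 for site in sites if 0 < sum(site) < n)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    # pi: mean number of differences over all pairs of accessions
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    # Variance constants from Tajima (1989)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    if s == 0:
        return theta_w, pi, float("nan")
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, pi, d

calls = [            # hypothetical SFP calls: 4 accessions x 5 features
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 1],
]
theta_w, pi, tajima_d = diversity_stats(calls)
print(theta_w, pi, tajima_d)
```

Applied in sliding windows along a chromosome, the same function would give window-by-window estimates.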
Pairwise Correlation between and within replicates
[Figure: correlation matrix of replicate arrays; rows and columns labeled by accession]
Array Haplotyping
Chromosome 1, ~500kb
Inbred lines
Low effective recombination due to partial selfing
Extensive LD blocks
Col Ler Cvi Kas Bay Shah Lz Nd
Distribution of T-stats
[Figure: histogram of T statistics, null (permutation) vs actual; x-axis: T statistic, bins from (-4,-3.5] to (3,3.5]; y-axis: frequency (0 to 8e+04). Labels: 208,729; 32,427 Calls; Not Col; 12,250 SFPs; NA; Col; NA duplications]
Sequence confirmation of SFPs
Accession   FDR    Sensitivity   SNP   Total
bay         0.0%   43%           51    563
c24         0.2%   39%           64    580
cvi         0.0%   38%           91    543
est         0.0%   59%           39    548
kas         1.9%   44%           66    577
kendl       3.1%   33%           57    545
ler         0.0%   49%           43    562
lz          0.0%   53%           51    573
mt          0.2%   61%           49    570
nd          0.0%   47%           49    568
shah        0.0%   24%           80    548
sorbo       0.0%   45%           55    526
van         0.2%   29%           92    571
ws2         0.0%   49%           57    514
SFPs for reverse genetics
14 Accessions 30,950 SFPs
http://naturalvariation.org/sfp
Chromosome Wide Diversity
Self Incompatibility-locus
Diversity 50kb windows
Tajima’s D like 50kb windows
RPS4
unknown
R genes vs bHLH Theta W
RPS4
R genes vs bHLH Tajima’s D
RPS4
R genes vs bHLH
Summary: Haplotyping
• Patterns of variation across accessions
• Natural reverse genetics
– Polymorphism database
• Increased polymorphism in centromere
• Selection on R genes
Transcription based cloning
• Look for gene expression differences
between genotypes
• Identify candidate genes that map to
mutation
• Downstream targets that map elsewhere
differences may be due to
expression or hybridization
PAG1 down regulated in Cvi
PALE GREEN1 knockout has long
hypocotyl in red light
SFPs from RNA
• Barley Affy array 22,801 probe sets
– Most probe sets have 11 probes
– Background correction “rma2”
– Quantile normalization
• 36 arrays total
– 3 replicates
– 6 tissues: leaf, crown, root, radicle, gem, col?
– 2 genotypes: Golden Promise (7,459 ESTs), Morex (52,695 ESTs)
Look at some plots raw data
Remove probe effect
Remove Tissue + Genotype effect
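The two removal steps can be pictured as successive centering of the log intensities for one probe set: subtract each probe's average across arrays (probe effect), then subtract each array's typical level across probes (the expression level for that tissue and genotype). What remains is a per-probe residual, and a probe whose residual differs consistently between genotypes is a candidate SFP. The additive model, the use of the median for the array level, and all values below are assumptions for illustration; the talk does not specify the model.

```python
from statistics import mean, median

def residuals(log_intensity):
    """log_intensity[array][probe] for one probe set -> residual matrix."""
    n_probes = len(log_intensity[0])
    # Probe effect: each probe's average signal across all arrays
    probe_means = [mean(row[p] for row in log_intensity) for p in range(n_probes)]
    centered = [[row[p] - probe_means[p] for p in range(n_probes)]
                for row in log_intensity]
    # Tissue + genotype effect: the probe-set level on each array.
    # The median keeps a single deviant probe from shifting the level.
    array_levels = [median(row) for row in centered]
    return [[value - level for value in row]
            for row, level in zip(centered, array_levels)]

data = [                 # hypothetical log2 intensities, 3 probes per array
    [10.0, 9.0, 8.0],    # genotype A, lower expression
    [11.0, 10.0, 9.0],   # genotype A, higher expression
    [10.0, 9.0, 6.0],    # genotype B, probe 3 hybridizes poorly
    [11.0, 10.0, 7.0],   # genotype B, probe 3 hybridizes poorly
]
res = residuals(data)
# Genotype difference in residuals for each probe (A minus B)
diff = [mean([res[0][p], res[1][p]]) - mean([res[2][p], res[3][p]])
        for p in range(3)]
print(diff)
```

Only the third probe retains a genotype difference after both effects are removed, even though expression varies between arrays.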
SAM False Discovery Rate
delta   ori.data   perm.data   difference   FDR
0.1     13210      1210.34     11999.66     0.091623013
0.2     7903       183.95      7719.05      0.023275971
0.3     5462       49.18       5412.82      0.009004028
0.4     4036       18.31       4017.69      0.004536670
0.5     3024       8.49        3015.51      0.002807540
0.6     2285       3.85        2281.15      0.001684902
Both + and – SFPs since no reference comparison
Need to compare with ESTs
Review
• Single Feature
Polymorphisms (SFPs) can be
used to identify recombination
breakpoints, potential
deletions, for eXtreme Array
mapping, and haplotyping
• Expression analysis to
identify QTL candidate genes
and downstream responses
that consider polymorphisms
Universal Whole Genome Array
RNA
Gene Discovery
Gene model correction
Non-coding/ micro-RNA
Antisense transcription
DNA
Chromatin
Immunoprecipitation
ChIP chip
Methylation
Transcriptome Atlas
Expression levels
Tissue specificity
Alternative Splicing
Polymorphism SFPs
Discovery/Genotyping
Comparative Genome
Hybridization (CGH)
Insertion/Deletions
~19 bp tile,
both strands
eliminate repeat regions
“good” binding oligos
Transcriptome Atlas
Improved Genome Annotation
[Figure: array features along Chromosome (bp); annotations: ORFa, ORFb, start, conservation, SFP, SNP, deletion]
ChipViewer: Mapping of transcriptional units of ORFeome
From 2000v At1g09750 (MIPS) to the latest AGI At1g09750
2000 v Annotation (MIPS)
The latest AGI Annotation
NaturalVariation.org
Syngenta
Hur-Song Chang
Tong Zhu
Salk
Jon Werner
Todd Mockler
Sarah Liljegren
Ramlah Nehring
Joanne Chory
Detlef Weigel
Joseph Ecker
UC Davis
Julin Maloof
UC San Diego
Charles Berry
University of Guelph, Canada
Dave Wolyn
Scripps
Sam Hazen
Elizabeth Winzeler