MadMapper And CheckMatrix:


Alexander Kozik and Richard Michelmore.
The Genome Center, University of California Davis, CA 95616.
Contemporary molecular marker techniques can generate mapping data for thousands of
molecular markers simultaneously. Construction and validation of high density genetic
maps is a challenge and requires robust, high-throughput approaches. As part of the
Compositae Genome Project, we developed a suite of Python scripts for quality control of
genetic markers, grouping and inference of linear order of markers in linkage groups.
These scripts can be used in conjunction with other mapping programs or can be used as
a stand-alone package. The suite consists of three programs: MadMapper_RECBIT,
MadMapper_XDELTA and CheckMatrix. MadMapper_RECBIT analyses raw marker
scores for recombinant inbred lines. MadMapper_RECBIT generates pairwise distance
scores for all markers, clusters based on pairwise distances, identifies genetic bins,
assigns new markers to known linkage groups, validates allele calls, and assigns quality
classes to each marker based on several criteria and cutoff values. MadMapper_XDELTA
utilizes a new algorithm, Minimum Entropy Approach and Best-Fit Extension, to infer the linear
order of markers. MadMapper_XDELTA analyzes two-dimensional matrices of all pairwise
scores and finds the map that has the minimal total sum of differences between adjacent
cells (the map with the lowest entropy). This approach scales well and can accommodate large
numbers of markers, unlike some commonly used mapping programs. CheckMatrix serves
as a visualization tool to validate constructed genetic maps. CheckMatrix generates
graphical genotypes and two-dimensional heat plots of pairwise scores. Visualization of
regions with positive and negative linkage as well as of allele fraction per marker simplifies
genetic map validation without applying statistical approaches.
Scripts are freely available at http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
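The pairwise distance score mentioned above can be illustrated with a minimal sketch. The exact scoring used by MadMapper_RECBIT is not given here, so this assumes the simplest definition: the fraction of recombinant inbred lines in which two markers carry different allele calls. The function name `pairwise_score`, the `-` symbol for a missing call, and the toy score strings are all hypothetical.

```python
def pairwise_score(marker_a, marker_b, missing="-"):
    """Fraction of lines in which two markers carry different alleles.

    Assumption: one character per line ('A' or 'B'); lines with a
    missing call in either marker are ignored.
    """
    informative = [(x, y) for x, y in zip(marker_a, marker_b)
                   if x != missing and y != missing]
    if not informative:
        return None  # no line was scored for both markers
    differing = sum(1 for x, y in informative if x != y)
    return differing / len(informative)


# Toy allele scores for two markers across 25 lines (illustrative only).
gm01 = "AAAAAAAAAAAAAAAABBBBBBBBB"
gm02 = "AAAAAAAAAAAAAAABBBBBBBBBB"
print(pairwise_score(gm01, gm02))  # 1 differing line out of 25
```

A score of 0 means the two markers were never separated by recombination in this population; a score near 0.5 is what unlinked markers would be expected to show.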
BRIEF DESCRIPTION OF RIL MAPPING PIPELINE:
1. Processing of raw marker scores and grouping: MadMapper_RECBIT generates multiple text files for further analysis.
2. Construction of genetic map (ordering of markers) per linkage group: MadMapper_XDELTA (or any other mapping program).
3. Visualization and validation of genetic maps: CheckMatrix generates heat plots of recombination scores and graphical genotyping.
MadMapper and CheckMatrix are Python scripts and can be used on any computer platform:
UNIX, Windows, Mac OS X. Grouping can be done on a set of ~2,000 markers; map construction
works in a reasonable timeframe with up to ~500 markers.
[Figure axis label: grouping cutoff stringency]
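The effect of the grouping cutoff in step 1 can be sketched as follows. The clustering rule used by MadMapper_RECBIT is not described here, so this is an assumption: markers are joined transitively (single linkage) whenever their pairwise score is at or below the cutoff. The function `group_markers`, the marker names and the scores are hypothetical.

```python
def group_markers(scores, markers, cutoff):
    """Single-linkage grouping: join markers whose score is <= cutoff."""
    parent = {m: m for m in markers}

    def find(m):
        # Follow parent links to the group representative (with path halving).
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), score in scores.items():
        if score <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


markers = ["M1", "M2", "M3", "M4", "M5"]
# Hypothetical pairwise scores; pairs not listed are treated as unlinked.
scores = {("M1", "M2"): 0.05, ("M2", "M3"): 0.18, ("M1", "M3"): 0.22,
          ("M4", "M5"): 0.10, ("M3", "M4"): 0.45}

print(group_markers(scores, markers, cutoff=0.1))  # stringent cutoff
print(group_markers(scores, markers, cutoff=0.2))  # relaxed cutoff
```

With the stringent cutoff M3 stays on its own; relaxing the cutoff merges it into the M1/M2 group, which is the kind of behavior the cutoff stringency controls.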
[Figure: CheckMatrix 2D plot of a map with partially wrong marker order]
CHECKMATRIX USAGE:
Three input files are required: a locus file, a map file and a matrix file.

Locus file:
;     1                 10                  20        25
;     |                 |                   |         |
GM01  A A A A A A A A A A A A A A A A B B B B B B B B B
GM02  A A A A A A A A A A A A A A A B B B B B B B B B B
GM03  A A A A A A A A A A A A A B B B B B B B B B B B B
GM04  A A A A A A A A A A A B B B B B B B B B B B B B B
GM05  A A A A A A A A A A B B B B B B B B B B B B B B B
GM06  A A A A A A A A A B B B B B B B B B B B B B B B B
GM07  A A A A A A A A A B B B B B B B B B B B B B B A A
GM08  A A A A A A A A A B B B B B B B B B B B B B A A A
GM09  A A A A A A A A A B B B B B B B B B B B A A A A A
GM10  B A A A A A A A A A B B B B B B B B B A A A A A A
GM11  B B A A A A A A A A B B B B B B B B A A A A A A A
GM12  B B B A A A A A A A B B B B B B B A A A A A A A A

Map file:
LG  GM01  0
LG  GM02  1
LG  GM03  2
LG  GM04  3
LG  GM05  4
LG  GM06  5
LG  GM07  6
LG  GM08  7
LG  GM09  8
LG  GM10  9
LG  GM11  10
LG  GM12  11
Matrix file (fragment):
...................
GM01  GM07  0.36
GM01  GM08  0.40
GM01  GM09  0.48
GM01  GM10  0.52
GM01  GM11  0.60
GM01  GM12  0.68
GM02  GM01  0.04
GM02  GM02  0.00
GM02  GM03  0.08
GM02  GM04  0.16
GM02  GM05  0.20
GM02  GM06  0.24
...................
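A three-column matrix file of this shape (marker, marker, score) could be loaded along the following lines. This is only a sketch: the delimiter is assumed to be whitespace, lines that do not have exactly three fields are skipped, and the function name `read_matrix` is hypothetical rather than part of CheckMatrix.

```python
import io


def read_matrix(handle):
    """Read 'marker marker score' rows into a symmetric lookup table."""
    scores = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a data row
        marker_a, marker_b, value = fields
        scores[(marker_a, marker_b)] = float(value)
        scores[(marker_b, marker_a)] = float(value)
    return scores


# A few rows in the same layout as the fragment above.
sample = io.StringIO(
    "GM02\tGM01\t0.04\n"
    "GM02\tGM02\t0.00\n"
    "GM02\tGM03\t0.08\n"
)
scores = read_matrix(sample)
print(scores[("GM01", "GM02")])
```

Storing both (a, b) and (b, a) makes later lookups independent of the order in which a pair was written to the file.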
Upon program execution, three output files are generated: a heat plot, graphical genotyping and a circular graph.
[Figure: visualization of numerical data using CheckMatrix. Two-dimensional matrix of pairwise recombination scores for linkage groups II to V, with the CheckMatrix color scheme. Annotations: main diagonal with linked markers; regions with negative linkage; high density of markers.]
[Figure: haplotypes per RIL (inbred line); red: Columbia; blue: Landsberg erecta]
MINIMUM ENTROPY APPROACH TO INFER LINEAR ORDER OF MARKERS:
MadMapper_XDELTA analyzes two-dimensional matrices of all pairwise scores and finds the map
that has the minimal total sum of differences between adjacent cells (the map with the lowest ‘entropy’).
[Figure: numerical data generated by MadMapper for the same markers in random order (high ‘entropy’) and in the right order (low ‘entropy’)]
[Figure: example of group analysis by MadMapper_RECBIT]
[MAP WAS RE-CONSTRUCTED USING MADMAPPER]
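The ‘entropy’ criterion can be made concrete with a small sketch. The precise formula used by MadMapper_XDELTA is not spelled out here, so this assumes the sum of absolute differences between horizontally and vertically adjacent cells of the pairwise matrix, arranged in a given marker order. The exhaustive search over permutations only stands in for the Best-Fit Extension step, which is not reproduced; it is feasible only for a handful of markers. All names and the toy scores (integer counts of recombinant lines) are hypothetical.

```python
from itertools import permutations


def entropy(order, score):
    """Sum of absolute differences between adjacent cells of the matrix
    whose rows and columns follow `order`."""
    n = len(order)
    total = 0
    for i in range(n):
        for j in range(n):
            here = score[order[i]][order[j]]
            if j + 1 < n:  # neighbor to the right
                total += abs(here - score[order[i]][order[j + 1]])
            if i + 1 < n:  # neighbor below
                total += abs(here - score[order[i + 1]][order[j]])
    return total


# Toy data: five markers on a line, score = number of recombinant lines,
# taken to be proportional to the distance between marker positions.
true_order = ["GM01", "GM02", "GM03", "GM04", "GM05"]
position = {m: k for k, m in enumerate(true_order)}
score = {a: {b: abs(position[a] - position[b]) for b in true_order}
         for a in true_order}

shuffled = ["GM03", "GM01", "GM05", "GM02", "GM04"]
print(entropy(true_order, score))  # right order: low value
print(entropy(shuffled, score))    # random order: higher value

# Brute-force search, usable only for very small marker sets.
best = min(permutations(true_order), key=lambda o: entropy(o, score))
print(best)
```

An order and its reverse always give the same value, so the search can recover the map only up to orientation.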
VISUALIZATION OF ARABIDOPSIS GENETIC MAP (DEAN AND LISTER, http://www.arabidopsis.info/) USING CHECKMATRIX
HEAT PLOT: helps validate the quality of the constructed genetic map and identify markers placed in the wrong position.
GRAPHICAL GENOTYPING: visualization of haplotypes per recombinant line (suspicious double crossovers are highlighted).
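One way to flag suspicious double crossovers in a haplotype is sketched below. How CheckMatrix decides what to highlight is not stated here, so this assumes a simple rule: a short run of one allele (by default a single marker) that is flanked on both sides by the other allele. The function name, the `max_run` parameter and the haplotype strings are hypothetical, and only `A`/`B` calls without missing data are considered.

```python
from itertools import groupby


def suspicious_double_crossovers(haplotype, max_run=1):
    """Return (first, last) marker indices of short interior allele runs.

    `haplotype` holds one allele call per marker, in map order, for a
    single recombinant line.
    """
    runs = []
    start = 0
    for allele, block in groupby(haplotype):
        length = len(list(block))
        runs.append((allele, start, length))
        start += length

    flagged = []
    # Interior runs have the other allele on both sides when only
    # two allele symbols are present.
    for allele, begin, length in runs[1:-1]:
        if length <= max_run:
            flagged.append((begin, begin + length - 1))
    return flagged


print(suspicious_double_crossovers("AAAAABAAAABBBB"))  # single B inside an A block
print(suspicious_double_crossovers("AAAAABBBBAAA"))    # longer block, not flagged
```

A flagged marker may reflect a genotyping error or a misplaced marker rather than two real recombination events, which is why such calls are worth inspecting.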
[Figure annotations: low density of markers; regions with quasi linkage; allele composition per marker; distinct linkage group #4]
MadMapper And CheckMatrix:
Python Scripts To Infer Orders Of Genetic Markers And
For Visualization And Validation Of Genetic Maps And Haplotypes.
[Figure: 2-D diagonal CheckMatrix heat plot of linkage groups I to V, all markers versus all markers; the color gradient reflects linkage scores between markers]
REFERENCES AND DATA SOURCES:
1. Dean and Lister Arabidopsis Genetic Map and Raw Data: http://www.arabidopsis.info/new_ri_map.html
2. MadMapper: http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
3. JoinMap: http://www.kyazma.nl/index.php/mc.JoinMap
4. RECORD: http://www.dpw.wau.nl/pv/pub/recORD/index.htm
5. GenoPix_2D_Plotter: http://www.atgc.org/GenoPix_2D_Plotter/
CREDITS:
This work was funded by NSF grant #0421630 to the Compositae Genome Consortium: http://compgenomics.ucdavis.edu/
PAG-14 POSTERS WITH EXAMPLES OF MADMAPPER USAGE:
#P751 High-Density Haplotyping With Microarray-Based Single Feature Polymorphism Markers In Arabidopsis
#P761 Gene Expression Markers: Using Transcript Levels Obtained From Microarrays To Genotype A Segregating Population
CIRCULAR GRAPH: helps validate the genetic map and identify markers with spurious linkage.
LINEAR ORDER OF MARKERS INFERRED BY THREE DIFFERENT METHODS:
Side-by-side comparison of the linear order of markers on the Arabidopsis genome inferred by three different
approaches (mapping programs) with the physical order of markers (Col-0 genomic sequence):
MadMapper_XDELTA (minimum entropy approach), JoinMap (maximum likelihood) and RECORD (minimum number
of recombination events). [Diagonal dot plots for MadMapper, JoinMap and RECORD were created using GenoPix_2D_Plotter.]
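The comparison of an inferred marker order with the physical order can also be summarized numerically. The sketch below is not how GenoPix_2D_Plotter works; it only assumes that each shared marker becomes one dot (rank in one order versus rank in the other) and, as an added illustration, reports the fraction of marker pairs that appear in the same relative order in both. Function names and marker names are hypothetical.

```python
from itertools import combinations


def dot_plot_points(order_x, order_y):
    """One (rank in order_x, rank in order_y) point per shared marker."""
    rank_y = {marker: k for k, marker in enumerate(order_y)}
    return [(k, rank_y[marker])
            for k, marker in enumerate(order_x) if marker in rank_y]


def concordance(order_x, order_y):
    """Fraction of marker pairs placed in the same relative order."""
    pairs = list(combinations(dot_plot_points(order_x, order_y), 2))
    if not pairs:
        return None  # fewer than two shared markers
    agree = sum(1 for (x1, y1), (x2, y2) in pairs
                if (x1 - x2) * (y1 - y2) > 0)
    return agree / len(pairs)


physical = ["m1", "m2", "m3", "m4", "m5"]   # hypothetical physical order
inferred = ["m1", "m3", "m2", "m4", "m5"]   # hypothetical inferred order

print(dot_plot_points(inferred, physical))
print(concordance(inferred, physical))
```

Points that fall on the diagonal correspond to markers placed in agreement with the physical order; a fully reversed map gives a concordance of 0 and simply needs to be flipped.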