Integration of Clustering and Multidimensional Scaling to
Determine Phylogenetic Trees as Spherical
Phylograms Visualized in 3 Dimensions
Introduction
Phylogenetic analysis is commonly used to analyze genetic sequence data from fungal
communities, while ordination and clustering techniques are commonly used to analyze sequence
data from bacterial communities. However, few studies have attempted to link these two
independent approaches. We propose a method, the spherical phylogram (SP), that displays
the phylogenetic tree within the clustering and visualization result produced by a pipeline called
DACIDR. Unlike traditional tree display methods, the SP allows the correlations between the tree
and the clustering to be observed directly. In addition, we propose an algorithm called
interpolative joining (IJ) to construct and visualize the SP in 3D space.
Figure 1. Screenshots of the visualization results after data clustering. Panels: mega region
visualization of the full data (446k fungi data); cluster visualization from Mega Region 0;
visualization of all the clusters found by recursive clustering.
Figure 2. Maximum likelihood phylogenetic tree from the reference sequences and the
representative sequences found in each cluster, collapsed into clades at the genus level as
denoted by colored triangles at the ends of the branches. Branch lengths denote levels of
sequence divergence between genera, and nodes are labeled with bootstrap confidence values.
Representative sequences from spores that are not part of another clade are denoted with the
label ‘454 sequence from spore’. This figure was generated with FigTree.
Spherical phylogram visualized using the phylogenetic tree generated by RAxML from the
representative sequences and reference sequences. The color scheme is the same as in Figure 2.
[Flowchart; box labels by panel]
DACIDR: Input Sequences; Pairwise Sequence Alignment; Dissimilarity Matrix; Pairwise Clustering; Multidimensional Scaling; Sample Clustering Result; Interpolation; Visualization
Recursive Clustering: Mega Region 0, Mega Region 1, ……, Mega Region N (each with a DACIDR box); Find Cluster Centers; Mega Region Result; Final Clustering Result; Visualization
3D Phylogenetic Tree Visualization: Representative Sequences; Reference Sequences; DACIDR; RAxML; Interpolative Joining; Spherical Phylogram; Visualization
Figure 3. Flowchart of Large Scale Data Clustering and Visualization. This is based on the MPI and MapReduce parallel computing frameworks.
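The step from a dissimilarity matrix to 3D coordinates named in the flowchart can be illustrated with a minimal sketch. It is not the DACIDR implementation: it swaps in classical (Torgerson) multidimensional scaling, which may differ from the MDS variant the pipeline uses, and it replaces pairwise sequence alignment with a simple mismatch fraction over equal-length strings. The function names `classical_mds` and `p_distance` and the four toy sequences are hypothetical and exist only for illustration.

```python
import numpy as np


def p_distance(a, b):
    """Fraction of mismatching positions between two equal-length strings.

    Assumption: a stand-in for an alignment-based dissimilarity.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def classical_mds(dissimilarity, dims=3):
    """Embed n items in `dims` dimensions from an n x n dissimilarity matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest `dims`.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]
    # Negative eigenvalues arise for non-Euclidean inputs; clip them to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy input: four made-up sequences (hypothetical data, not from the poster).
seqs = ["ACGTACGT", "ACGTACGA", "TCGAACGT", "TGCATGCA"]
dist = np.array([[p_distance(a, b) for b in seqs] for a in seqs])
coords = classical_mds(dist, dims=3)  # one 3D point per sequence
```

Each row of `coords` is a point that could be plotted in 3D. When the input dissimilarities are Euclidean distances between points in three dimensions, this embedding reproduces them up to rotation and reflection; for other inputs it is only an approximation.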
Contacts:
Yang Ruan ([email protected]), Saliya Ekanayake ([email protected]), Geoffrey Fox ([email protected])