Unsupervised clustering in mRNA expression profiles
D.K. Tasoulis, V.P. Plagianakos, and M.N. Vrahatis
Computational Intelligence Laboratory (CILAB), Department of Mathematics, University of
Patras, GR-26110 Patras, Greece
University of Patras Artificial Intelligence Research Center (UPAIRC), University of Patras,
GR-26110 Patras, Greece
Computers in Biology and Medicine
In Press, Corrected Proof, Available online 24 October 2005
K-Windows Clustering
• Adaptation of K-means, originally
proposed in 2002 by Vrahatis et al.
• Windowing technique improves speed and
accuracy
• Tries to place a d-dimensional window
(box) containing all patterns that belong to
a single cluster
K-Windows – Basic Concepts
• Move windows to find cluster centers (fig a)
1. Select k points as centers of d-windows of size a.
2. The mean of the points in each window becomes its new center.
3. Repeat until stopping criterion (movement of center).
• Enlarge windows to determine cluster edges (fig b)
1. Enlarge one dimension by a specified percent.
2. Relocate window as above.
3. Keep only if increase in instances in window exceeds threshold.
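The movement and enlargement steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of half-widths to describe a window, the `tol` and `max_iter` stopping values, and the relative-increase test against `threshold` are all assumptions made for the example.

```python
def points_in_window(points, center, half_widths):
    """Return the points that lie inside the axis-aligned window (box)."""
    return [p for p in points
            if all(abs(x - c) <= h for x, c, h in zip(p, center, half_widths))]


def move_window(points, center, half_widths, tol=1e-6, max_iter=100):
    """Movement: recentre the window on the mean of its points until it settles."""
    center = list(center)
    for _ in range(max_iter):
        inside = points_in_window(points, center, half_widths)
        if not inside:
            break  # an empty window has no mean to move to
        new_center = [sum(col) / len(inside) for col in zip(*inside)]
        shift = max(abs(n - c) for n, c in zip(new_center, center))
        center = new_center
        if shift < tol:  # stopping criterion: the centre barely moved
            break
    return center


def enlarge_window(points, center, half_widths, percent=0.5, threshold=0.8):
    """Enlargement: grow one dimension at a time and keep the growth only if
    the relative gain in captured points exceeds `threshold` (assumed test)."""
    center, half_widths = list(center), list(half_widths)
    for d in range(len(half_widths)):
        before = len(points_in_window(points, center, half_widths))
        trial = list(half_widths)
        trial[d] *= 1.0 + percent                  # enlarge one dimension
        trial_center = move_window(points, center, trial)  # relocate as above
        after = len(points_in_window(points, trial_center, trial))
        if before > 0 and (after - before) / before > threshold:
            center, half_widths = trial_center, trial
    return center, half_widths
```

Calling `move_window` once per initial center and then `enlarge_window` on each settled window gives one box per candidate cluster; how the paper interleaves the two phases is not specified on the slide.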
Unsupervised K-Windows (UKW)
• Start with sufficiently large number of windows
• Merge to automatically determine the number of
clusters
• For each pair of overlapping windows, calculate
proportion of overlap for each window.
a) Large overlap, considered same cluster, W1 is deleted.
b) Many points in common, considered the same cluster.
c) Low overlap, considered two different clusters.
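One possible reading of the three merge cases is sketched below. It assumes that "proportion of overlap" means the share of a window's points that also fall in the other window, and that case (b) is decided on the mean of the two proportions. The function name and both threshold values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def classify_overlap(points_w1, points_w2,
                     delete_threshold=0.9, merge_threshold=0.3):
    """Classify a pair of overlapping windows by their shared points.

    points_w1 / points_w2 are collections of point identifiers (e.g. indices).
    Threshold values are illustrative assumptions.
    """
    s1, s2 = set(points_w1), set(points_w2)
    shared = len(s1 & s2)
    p1 = shared / len(s1) if s1 else 0.0  # share of W1 covered by W2
    p2 = shared / len(s2) if s2 else 0.0  # share of W2 covered by W1
    if p1 >= delete_threshold:
        return "delete_w1"            # case (a): W1 is almost contained in W2
    if (p1 + p2) / 2 >= merge_threshold:
        return "same_cluster"         # case (b): many points in common
    return "different_clusters"       # case (c): low overlap
```

Repeating this test over all overlapping pairs, and grouping windows labelled as the same cluster, leaves a set of groups whose count serves as the estimated number of clusters.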
Experimental Setup
• Leukemia dataset – well characterized
• Default UKW parameters used
• Supervised dimension reduction
– Two previously published gene subsets and
their union
• Unsupervised dimension reduction
– Biclustering with UKW
– PCA
– PCA and UKW hybrid
Supervised Feature Selection
• Use two gene subsets selected in previously
published papers using supervised techniques.
• All algorithms did best on combined set, results
below.
Unsupervised Feature Selection
(Biclustering Technique)
• Apply UKW to cluster genes, select one
gene, closest to cluster center, as
representative from each cluster.
• Apply UKW to samples, using those genes
(239).
• UKW accuracy: 93.6% (ALL) and 76%
(AML)
• No results reported for other algorithms
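The gene-selection step can be illustrated with a short sketch. It assumes the cluster labels for the genes are already available (from UKW or any other clustering), takes the cluster center to be the mean expression profile of its members, and measures closeness with Euclidean distance; the function name and these choices are assumptions for the example.

```python
import math


def representative_genes(profiles, labels):
    """Map each cluster label to the index of the gene whose expression
    profile is closest (Euclidean) to the cluster's mean profile."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    n_dims = len(profiles[0])
    reps = {}
    for lab, members in clusters.items():
        # Cluster center taken as the mean profile of the member genes.
        center = [sum(profiles[i][j] for i in members) / len(members)
                  for j in range(n_dims)]
        reps[lab] = min(members, key=lambda i: math.dist(profiles[i], center))
    return reps
```

The selected gene indices then define the reduced feature set on which the samples are clustered in the second pass.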
Unsupervised Feature Selection
(PCA Techniques)
• PCA and scree plot to reduce features
– Poor Performance
• Hybrid PCA and UKW method
– Partition genes using UKW
– Transform each partition using PCA
– Select representative factors from each
cluster
– UKW accuracy: 97.87% (ALL) and 88% (AML)
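A rough sketch of the hybrid idea follows, under several assumptions: the gene partitions are given as lists of column indices, the "representative factor" of a partition is taken to be its first principal component, and that component is found by plain power iteration on the covariance matrix (a stand-in for a full PCA routine). All names are hypothetical.

```python
import math


def first_principal_component(rows, iters=200):
    """rows: samples x genes block. Returns (direction, per-sample scores)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration; the start vector is arbitrary and, in rare cases,
    # could be orthogonal to the leading eigenvector.
    start = [float(j + 1) for j in range(d)]
    start_norm = math.sqrt(sum(x * x for x in start))
    v = [x / start_norm for x in start]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in this block
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


def hybrid_features(samples, gene_partitions):
    """One new feature per gene partition: each sample's score on that
    partition's first principal component."""
    columns = []
    for genes in gene_partitions:
        block = [[s[g] for g in genes] for s in samples]
        _, scores = first_principal_component(block)
        columns.append(scores)
    return [list(row) for row in zip(*columns)]
```

The resulting matrix has one row per sample and one column per gene partition, and is what the samples would be clustered on in this reading of the method.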
UKW Results Summary
Dataset                               ALL Accuracy   AML Accuracy
Published Gene Subsets (Supervised)   90%            100%
UKW Biclustering (Unsupervised)       93.6%          76%
PCA (Unsupervised)                    N/A            N/A
PCA-UKW Hybrid (Unsupervised)         97.87%         88%
• Default parameters
– initial window size a=5
– enlargement threshold θe=0.8
– merging threshold θm=0.1
– coverage threshold θc=0.2
– variability threshold θv=0.02