BINF636 Clustering and Classification
Jeff Solka, Ph.D., Fall 2008

Gene Expression Data
• The data form a $G \times I$ matrix $X$, with rows indexed by genes and columns indexed by samples:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1I} \\ x_{21} & x_{22} & \cdots & x_{2I} \\ \vdots & \vdots & & \vdots \\ x_{G1} & x_{G2} & \cdots & x_{GI} \end{pmatrix}$$
where $x_{gi}$ is the expression for gene $g$ in sample $i$.

The Pervasive Notion of Distance
• We must be able to measure similarity or dissimilarity in order to perform clustering, dimensionality reduction, visualization, and discriminant analysis.
• How we measure distance can have a profound effect on the performance of these algorithms.

Distance Measures and Clustering
• Most of the common clustering methods, such as k-means, partitioning around medoids (PAM), and hierarchical clustering, depend on the calculation of a distance or an interpoint distance matrix.
• Some clustering methods, such as those based on spectral decomposition, have a less clear dependence on the distance measure.

Distance Measures and Discriminant Analysis
• Many supervised learning procedures (a.k.a. discriminant analysis procedures) also depend on the concept of a distance:
– nearest neighbors
– k-nearest neighbors
– mixture models

Two Main Classes of Distances
• Consider two gene expression profiles as expressed across $I$ samples. Each of these can be considered a point in $R^I$, and we can calculate the distance between these two points.
• Alternatively, we can view the gene expression profiles as manifestations of samples from two different probability distributions.

A General Framework for Distances Between Points
• Consider two m-vectors $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_m)$.
Define a generalized distance of the form
$$d(x, y) = F\big(d_1(x_1, y_1), \ldots, d_m(x_m, y_m)\big)$$
where the $d_k$ are themselves distances for each of the $k = 1, \ldots, m$ features.
• We call this a pairwise distance function, as the pairing of features within observations is preserved.

Minkowski Metric
• A special case of our generalized metric $d(x, y) = F(d_1(x_1, y_1), \ldots, d_m(x_m, y_m))$, with
$$d_k(x_k, y_k) = |x_k - y_k| \quad \text{and} \quad F(z_1, \ldots, z_m) = \Big( \sum_{k=1}^{m} z_k^{\lambda} \Big)^{1/\lambda}$$
• $\lambda = 1$ gives the Manhattan metric; $\lambda = 2$ gives the Euclidean metric.

Euclidean and Manhattan Metrics
• Euclidean metric: $d_{euc}(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$
• Manhattan metric: $d_{man}(x, y) = \sum_{i=1}^{m} |x_i - y_i|$

Correlation-based Distance Measures
• Championed for use within the microarray literature by Eisen.
• Types:
– Pearson's sample correlation distance.
– Eisen's cosine correlation distance.
– Spearman sample correlation distance.
– Kendall's $\tau$ sample correlation distance.

Pearson Sample Correlation Distance (COR)
$$d_{cor}(x, y) = 1 - r(x, y) = 1 - \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2 \sum_{i=1}^{m} (y_i - \bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the mean coordinates of $x$ and $y$ respectively.
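As a sketch of the metrics above (in Python rather than the R used later in these slides; all function names are my own), the Minkowski family and the Pearson correlation distance can be computed as follows:

```python
import math

def minkowski(x, y, lam):
    # Minkowski distance: (sum_k |x_k - y_k|^lam)^(1/lam)
    return sum(abs(a - b) ** lam for a, b in zip(x, y)) ** (1.0 / lam)

def d_euc(x, y):
    return minkowski(x, y, 2)   # lambda = 2: Euclidean metric

def d_man(x, y):
    return minkowski(x, y, 1)   # lambda = 1: Manhattan metric

def d_cor(x, y):
    # Pearson sample correlation distance: 1 - r(x, y)
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = math.sqrt(sum((a - xbar) ** 2 for a in x) *
                    sum((b - ybar) ** 2 for b in y))
    return 1 - num / den

x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(d_euc(x, y))  # sqrt(1 + 4 + 9) = sqrt(14)
print(d_man(x, y))  # 1 + 2 + 3 = 6
print(d_cor(x, y))  # y is perfectly positively correlated with x -> 0
```

Note the contrast: the Minkowski metrics see $x$ and $y$ as quite far apart, while the correlation distance is 0 because the profiles are linearly related.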
Eisen Cosine Correlation Distance (EISEN)
$$d_{eisen}(x, y) = 1 - \frac{x'y}{\|x\| \, \|y\|} = 1 - \frac{\sum_{i=1}^{m} x_i y_i}{\sqrt{\sum_{i=1}^{m} x_i^2 \sum_{i=1}^{m} y_i^2}}$$
a special case of the Pearson correlation distance with $\bar{x} = \bar{y} = 0$.

Spearman Sample Correlation Distance (SPEAR)
$$d_{spear}(x, y) = 1 - \frac{\sum_{i=1}^{m} (x'_i - \bar{x}')(y'_i - \bar{y}')}{\sqrt{\sum_{i=1}^{m} (x'_i - \bar{x}')^2 \sum_{i=1}^{m} (y'_i - \bar{y}')^2}}$$
where $x'_i = \operatorname{rank}(x_i)$ and $y'_i = \operatorname{rank}(y_i)$.

Kendall's $\tau$ Sample Correlation Distance (TAU)
$$d_{tau}(x, y) = 1 - \tau(x, y) = 1 - \frac{\sum_{i=1}^{m} \sum_{j=1}^{m} C_{xij} C_{yij}}{m(m-1)}$$
where $C_{xij} = \operatorname{sign}(x_i - x_j)$ and $C_{yij} = \operatorname{sign}(y_i - y_j)$.

Some Observations - I
• Since we subtract the correlation measures from 1, profiles that are perfectly positively correlated (correlation 1) have a distance of 0, and profiles that are perfectly negatively correlated (correlation −1) have a distance of 2.
• Correlation measures in general are invariant to location and scale transformations and tend to group together genes whose expression values are linearly related.

Some Observations - II
• The parametric methods (COR and EISEN) tend to be more negatively affected by the presence of outliers than the non-parametric methods (SPEAR and TAU).
• Under the assumption that the data have been standardized so that $x$ and $y$ are m-vectors with zero mean and unit variance, there is a simple relationship between the Pearson correlation coefficient $r(x, y)$ and the Euclidean distance:
$$d_{euc}(x, y) = \sqrt{2m\big(1 - r(x, y)\big)}$$

Mahalanobis Distance
$$d(x, y) = \sqrt{(x - y)' \Sigma^{-1} (x - y)}$$
• This allows the directional variability of the data to come into play when calculating distances.
• How do we estimate $\Sigma$?
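The relationship between the Pearson correlation and the Euclidean distance above can be checked numerically (a Python sketch; the data values are arbitrary, and the standardization uses the mean and the $1/m$-normalized standard deviation):

```python
import math

def standardize(v):
    # center to zero mean, scale to unit variance (1/m normalization)
    m = len(v)
    mu = sum(v) / m
    sd = math.sqrt(sum((a - mu) ** 2 for a in v) / m)
    return [(a - mu) / sd for a in v]

def pearson(x, y):
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = math.sqrt(sum((a - xbar) ** 2 for a in x) *
                    sum((b - ybar) ** 2 for b in y))
    return num / den

x = standardize([2.0, 5.0, 1.0, 7.0])
y = standardize([3.0, 1.0, 4.0, 6.0])
m = len(x)
d_euc = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
r = pearson(x, y)
# d_euc should equal sqrt(2m(1 - r)) after standardization
print(abs(d_euc - math.sqrt(2 * m * (1 - r))))  # ~0
```

The identity follows because, after standardization, $\sum x_i^2 = \sum y_i^2 = m$, so $d_{euc}^2 = 2m - 2\sum x_i y_i = 2m(1 - r)$.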
Distances and Transformations
• Assume that $g$ is an invertible, possibly non-linear transformation $g: x \mapsto x'$.
• This transformation induces a new metric $d'$ via
$$d'(x', y') = d\big(g^{-1}(x'), g^{-1}(y')\big)$$

Distances and Scales
• Original scanned fluorescence intensities
• Logarithmically transformed data
• Data transformed by the generalized logarithm

Experiment-specific Distances Between Genes
• One might like to use additional experimental design information in determining how one calculates distances between the genes.
• One might wish to use smoothed estimates or other sorts of statistical fits and measure distances between these.
• In time course data, distances that honor the time order of the data are appropriate.

Standardizing Genes
$$\tilde{x}_{gi} = \frac{x_{gi} - \operatorname{center}(x_{g\cdot})}{\operatorname{scale}(x_{g\cdot})}$$
• $\operatorname{center}(x_{g\cdot})$: a measure of the center of the distribution of the set of values $x_{gi},\ i = 1, \ldots, I$ (mean, median).
• $\operatorname{scale}(x_{g\cdot})$: a measure of scale (standard deviation, interquartile range, MAD).

Standardizing Arrays (Samples)
$$\tilde{x}_{gi} = \frac{x_{gi} - \operatorname{center}(x_{\cdot i})}{\operatorname{scale}(x_{\cdot i})}$$
• $\operatorname{center}(x_{\cdot i})$: a measure of the center of the distribution of the set of values $x_{gi},\ g = 1, \ldots, G$ (mean, median).
• $\operatorname{scale}(x_{\cdot i})$: a measure of scale (standard deviation, interquartile range, MAD).

Scaling and Its Implication to Data Analysis - I
• Types of gene expression data:
– Relative (cDNA)
– Absolute (Affymetrix)
• $x_{gi}$ is the expression of gene $g$ on sample $i$ as measured on a log scale.
• Let $y_{gi} = x_{gi} - x_{gA}$; patient A is our reference.
• The distance between patient samples is
$$d(y_{\cdot i}, y_{\cdot j}) = \sum_{g=1}^{G} d_g(y_{gi}, y_{gj}) = \sum_{g=1}^{G} d_g(x_{gi} - x_{gA}, x_{gj} - x_{gA})$$
where the sum is of course over all genes and $x_{gA}$ is the expression of gene $g$ on patient A.
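The gene standardization described above can be sketched as follows (Python rather than R; the function name is my own, and the mean and standard deviation are chosen here as the center and scale):

```python
import math

def standardize_genes(X):
    """Standardize each row (gene) of a G x I matrix:
    x_gi -> (x_gi - center(x_g.)) / scale(x_g.),
    using the mean as the center and the (1/I) standard deviation as the scale."""
    out = []
    for row in X:
        I = len(row)
        mu = sum(row) / I
        sd = math.sqrt(sum((v - mu) ** 2 for v in row) / I)
        out.append([(v - mu) / sd for v in row])
    return out

# two genes whose profiles differ only by a scale factor
X = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
Z = standardize_genes(X)
# after standardization each row has mean 0, and the two rows coincide
```

Because standardization removes location and scale, the two genes above become identical, which is exactly why standardized Euclidean distance behaves like a correlation-based distance.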
Scaling and Its Implication to Data Analysis - II
• If $d(x, y)$ is a function of $x - y$ alone, then $d(y_{\cdot i}, y_{\cdot j}) = d(x_{\cdot i}, x_{\cdot j})$, and it does not matter whether we look at relative (the $y$'s) or absolute (the $x$'s) expression measures.

Scaling and Its Implication to Data Analysis - III
• The distance between two genes is given by
$$d(y_{g\cdot}, y_{h\cdot}) = \sum_{i=1}^{I} d_i(y_{gi}, y_{hi}) = \sum_{i=1}^{I} d_i(x_{gi} - x_{gA}, x_{hi} - x_{hA})$$
where $x_{gA}$ is the expression of gene $g$ on patient A and $x_{hA}$ is the expression of gene $h$ on patient A.
• If $d(x, y)$ has the property that $d(x + c, y) = d(x, y)$ for any constant $c$, then the distance measure is the same for the absolute and relative expression measures.

Summary of Effects of Scaling on Distance Measures
• Minkowski distances:
– The distance between samples is the same for relative and absolute measures.
– The distance between genes is not the same for relative and absolute measures.
• Pearson correlation-based distance:
– The distance between genes is the same for relative and absolute measures.
– The distance between samples is not the same for relative and absolute measures.

What is Cluster Analysis?
• Given a collection of n objects, each of which is described by a set of p characteristics or variables, derive a useful division into a number of classes.
• Both the number of classes and the properties of the classes are to be determined. (Everitt 1993)

Why Do This?
• Organize
• Prediction
• Etiology (causes)

How Do We Measure Quality?
• Multiple Clusterings Are Possible
– Male, Female
– Low, Middle, Upper Income
• Neither True Nor False
• Measured by Utility

Difficulties in Clustering
• Cluster structure may be manifest in a multitude of ways.
• Large data sets and high dimensionality complicate matters.

Clustering Prerequisites
• A method to measure the distance between observations and clusters
– Similarity
– Dissimilarity
– This was discussed previously
• A method of normalizing the data
– We discussed this previously
• A method of reducing the dimensionality of the data
– We discussed this previously

The Number of Groups Problem
• How do we decide on the appropriate number of clusters?
• Duda, Hart and Stork (2001): form $J_e(2)/J_e(1)$, where $J_e(M)$ is the sum-of-squares error criterion for the M-cluster model. The distribution of this ratio is usually not known.
$$J_e(1) = \sum_{x \in D} \|x - m\|^2 \qquad J_e(2) = \sum_{i=1}^{2} \sum_{x \in D_i} \|x - m_i\|^2$$

Optimization Methods
• Minimize or maximize some criterion.
• Does not necessarily form hierarchical clusters.

Clustering Criteria
• The sum-of-squared-error criterion:
$$J_e = \sum_{i=1}^{c} \sum_{x \in D_i} \|x - m_i\|^2, \qquad m_i = \frac{1}{n_i} \sum_{x \in D_i} x$$

Spoofing of the Sum of Squares Error Criterion

Related Criteria
• With a little manipulation we obtain
$$J_e = \frac{1}{2} \sum_{i=1}^{c} n_i \bar{s}_i, \qquad \bar{s}_i = \frac{1}{n_i^2} \sum_{x \in D_i} \sum_{x' \in D_i} \|x - x'\|^2$$
• Instead of using the average squared distance between points in a cluster, as indicated above, we could use the median or maximum distance.
• Each of these will produce its own variant.

Scatter Criteria
$$m_i = \frac{1}{n_i} \sum_{x \in D_i} x \qquad m = \frac{1}{n} \sum_{x \in D} x$$
$$S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^t \qquad S_W = \sum_{i=1}^{c} S_i$$
$$S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^t \qquad S_T = \sum_{x \in D} (x - m)(x - m)^t$$

Relationship of the Scatter Criteria
$$S_T = S_W + S_B$$

Measuring the Size of
Matrices
• So we wish to minimize $S_W$ while maximizing $S_B$.
• We will measure the size of a matrix by its trace or its determinant.
• These are equivalent in the case of univariate data.

Interpreting the Trace Criterion
$$\operatorname{tr}(S_W) = \sum_{i=1}^{c} \operatorname{tr}(S_i) = \sum_{i=1}^{c} \sum_{x \in D_i} \|x - m_i\|^2 = J_e$$
$$\operatorname{tr}(S_T) = \operatorname{tr}(S_W) + \operatorname{tr}(S_B)$$

The Determinant Criterion
• $S_B$ will be singular if the number of clusters is less than or equal to the dimensionality.
$$J_d = |S_W| = \Big| \sum_{i=1}^{c} S_i \Big|$$
• Partitions based on $J_e$ may change under linear transformations of the data.
• This is not the case with $J_d$.

Other Invariant Criteria
• It can be shown that the eigenvalues $\lambda_i$ of $S_W^{-1} S_B$ are invariant under nonsingular linear transformations.
• We might choose to maximize
$$\operatorname{tr}(S_W^{-1} S_B) = \sum_{i=1}^{d} \lambda_i \qquad \text{or minimize} \qquad \frac{|S_W|}{|S_T|} = \prod_{i=1}^{d} \frac{1}{1 + \lambda_i}$$

k-means Clustering
1. Begin: initialize $n$, $k$, $m_1, m_2, \ldots, m_k$
2. Do: classify the $n$ samples according to the nearest $m_i$; recompute the $m_i$
3. Until: no change in the $m_i$
4. Return $m_1, m_2, \ldots, m_k$
5. End
• The complexity of the algorithm is $O(ndkT)$:
– $T$ is the number of iterations
– $T$ is typically << $n$

Example Mean Trajectories

Optimizing the Clustering Criterion
• $N(n, g)$ = the number of partitions of n individuals into g groups:
N(15, 3) = 2,375,101
N(20, 4) = 45,232,115,901
N(25, 8) = 690,223,721,118,368,580
N(100, 5) ≈ 10^68
• Note that 3.15 × 10^17 seconds is the estimated age of the universe.

Hill Climbing Algorithms
1 - Form an initial partition into the required number of groups.
2 - Calculate the change in the clustering criterion produced by moving each individual from its own cluster to another cluster.
3 - Make the change which leads to the greatest improvement in the value of the clustering criterion.
4 - Repeat steps (2) and (3) until no move of a single individual causes the clustering criterion to improve.
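The k-means loop described above can be sketched in a few lines (Python rather than R; all names are my own, and initialization here simply samples k data points as the starting means):

```python
import random

def kmeans(points, k, iters=100):
    # 1. initialize the k means m_1..m_k with randomly chosen data points
    means = [list(p) for p in random.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # 2. classify the n samples according to the nearest m_i
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, means[j])))
            clusters[i].append(p)
        # 3. recompute each m_i as the centroid of its cluster
        new_means = [[sum(c) / len(cl) for c in zip(*cl)] if cl else means[i]
                     for i, cl in enumerate(clusters)]
        # 4. stop when no m_i changes
        if new_means == means:
            break
        means = new_means
    return means, clusters

# two tight pairs of points; any initialization converges to the obvious split
pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
means, clusters = kmeans(pts, 2)
```

As with the hill-climbing scheme, each pass can only decrease $J_e$, so the loop terminates, but only at a local optimum determined by the starting means.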
• Guarantees a local, not global, optimum.

How Do We Choose c?
• Randomly "classify" points to generate the $m_i$'s.
• Randomly generate the $m_i$'s.
• Base the location of the c solution on the c−1 solution.
• Base the location of the c solution on a hierarchical solution.

Alternative Methods
• Simulated Annealing
• Genetic Algorithms
• Quantum Computing

Hierarchical Cluster Analysis
• From 1 cluster to n clusters.
• Agglomerative methods
– Fusion of the n data points into groups.
• Divisive methods
– Separate the n data points into finer groupings.

Dendrograms
[Figure: a dendrogram over points (1), (2), (3), (4), (5). Read bottom-up (agglomerative), the merges are (1,2), (4,5), (3,4,5), (1,2,3,4,5); read top-down gives the divisive direction.]

Agglomerative Algorithm (Bottom Up, or Clumping)
Start: clusters C1, C2, ..., Cn, each with 1 data point.
1 - Find the nearest pair Ci, Cj; merge Ci and Cj; delete Cj; and decrement the cluster count by 1.
2 - If the number of clusters is greater than 1, go back to step 1.

Inter-cluster Dissimilarity Choices
• Furthest Neighbor (Complete Linkage)
• Nearest Neighbor (Single Linkage)
• Group Average

Single Linkage (Nearest Neighbor) Clustering
• The distance between groups is defined as that of the closest pair of individuals, where we consider one individual from each group.
• This method may be adequate when the clusters are fairly well separated Gaussians, but it is subject to problems with chaining.

Example of Single Linkage Clustering
Initial distance matrix:
      1     2     3     4     5
1   0.0
2   2.0   0.0
3   6.0   5.0   0.0
4  10.0   9.0   4.0   0.0
5   9.0   8.0   5.0   3.0   0.0
The closest pair is (1, 2) at distance 2.0. After merging, the single-linkage distances are d(12)3 = 5.0, d(12)4 = 9.0, d(12)5 = 8.0:
       (1 2)   3     4     5
(1 2)   0.0
3       5.0   0.0
4       9.0   4.0   0.0
5       8.0   5.0   3.0   0.0

Complete Linkage (Furthest Neighbor) Clustering
• The distance between groups is defined as that of the most distant pair of individuals, one from each group.

Complete Linkage Example
Starting from the same distance matrix:
      1     2     3     4     5
1   0.0
2   2.0   0.0
3   6.0   5.0   0.0
4  10.0   9.0   4.0   0.0
5   9.0   8.0   5.0   3.0   0.0
(1, 2) is the first cluster. The updated complete-linkage distances are
d(12)3 = max[d13, d23] = d13 = 6.0
d(12)4 = max[d14, d24] = d14 = 10.0
d(12)5 = max[d15, d25] = d15 = 9.0
The smallest entry in the updated matrix is d45 = 3.0, so the next merger is the cluster (4, 5) rather than joining (1 2) with (3).

Group Average Clustering
• The distance between clusters is the average of the distances between all pairs of individuals, one from each of the two groups.
• A compromise between single linkage and complete linkage.

Centroid Clusters
• We use the centroid of a group once it is formed.
[Figure: schematic dendrograms in which each merged group is replaced by its centroid.]

Problems With Hierarchical Clustering
• It really gives us a continuum of different clusterings of the data.
• As stated previously, there are specific artifacts of the various methods.

Dendrogram

Data Color Histogram or Data Image
• Orderings of the data matrix were first discussed by Bertin.
• Wegman coined the term "data color histogram" in 1990.
• Mike Minnotte and Webster West subsequently coined the term "data image" in 1998.

Data Image Reveals Obfuscated Cluster Structure
[Figure panels: a subset of the pairs plot; the data image sorted on observations; and sorted on both observations and features.]
• 90 observations in $R^{100}$ drawn from a standard normal distribution.
• The first and second 30 rows were shifted by 20 in their first and second dimensions respectively.
• This data matrix was then multiplied by a 100 × 100 matrix of Gaussian noise.
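Returning to the linkage examples above: the single-linkage agglomeration on that 5-point distance matrix can be reproduced with a short sketch (Python rather than R; function names are my own, and the brute-force pair search is for clarity, not efficiency):

```python
# Distance matrix from the worked example (points 1..5)
D = {
    (1, 2): 2.0, (1, 3): 6.0, (1, 4): 10.0, (1, 5): 9.0,
    (2, 3): 5.0, (2, 4): 9.0, (2, 5): 8.0,
    (3, 4): 4.0, (3, 5): 5.0, (4, 5): 3.0,
}

def dist(D, a, b):
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def single_linkage(D, items):
    """Agglomerative clustering with single (nearest-neighbour) linkage.
    Returns the sequence of merges as (cluster_a, cluster_b, height)."""
    clusters = [frozenset([i]) for i in items]
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters under single linkage
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(D, a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = single_linkage(D, [1, 2, 3, 4, 5])
for a, b, h in merges:
    print(sorted(a), sorted(b), h)
# merges: (1,2) at 2.0, (4,5) at 3.0, (3)+(4,5) at 4.0, (1,2)+(3,4,5) at 5.0
```

Swapping the inner `min` for `max` turns this into complete linkage, which is all that separates the two worked examples.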
The Data Image in the Gene Expression Community
• Extracted from

Example Dataset

Complete Linkage Clustering

Single Linkage Clustering

Average Linkage Clustering

Pruning Our Tree
cutree(tree, k = NULL, h = NULL)
• tree: a tree as produced by hclust. cutree() only expects a list with components merge, height, and labels, each of appropriate content.
• k: an integer scalar or vector with the desired number of groups.
• h: a numeric scalar or vector with the heights where the tree should be cut.
• At least one of k or h must be specified; k overrides h if both are given.
Value returned: cutree returns a vector with group memberships if k or h are scalar; otherwise a matrix with group memberships is returned, where each column corresponds to the elements of k or h, respectively (which are also used as column names).

Example Pruning
> x.cl2 <- cutree(hclust(x.dist), k = 2)
> x.cl2[1:10]
 [1] 1 1 1 1 1 1 1 1 1 1
> x.cl2[190:200]
 [1] 2 2 2 2 2 2 2 2 2 2 2

Identifying the Number of Clusters
• As indicated previously, we really have no way of identifying the true cluster structure unless we have divine intervention.
• In the next several slides we present some well-known methods.

Method of Mojena
• Select the number of groups based on the first stage of the dendrogram that satisfies
$$a_{j+1} > \bar{a} + k s_a$$
• The $a_0, a_1, a_2, \ldots, a_{n-1}$ are the fusion levels corresponding to stages with $n, n-1, \ldots, 1$ clusters. $\bar{a}$ and $s_a$ are the mean and unbiased standard deviation of these fusion levels, and $k$ is a constant.
• Mojena (1977): 2.75 < k < 3.5
• Milligan and Cooper (1985): k = 1.25

Hartigan's k-means Theory
When deciding on the number of clusters, Hartigan (1975, pp. 90-91) suggests the following rough rule of thumb. If k is the result of kmeans with k groups and kplus1 is the result with k+1 groups, then it is justifiable to add the extra group when
(sum(k$withinss)/sum(kplus1$withinss) - 1)*(nrow(x) - k - 1)
is greater than 10.

kmeans Applied to Our Data Set

The 3-Term kmeans Solution

The 4-Term kmeans Solution

Determination of the Number of Clusters Using the Hartigan Criterion

MIXTURE-BASED CLUSTERING
$$f(x) = \sum_{i=1}^{g} \pi_i f_i(x; \theta_i), \qquad f_i(x; \theta_i) = N(m_i, \Sigma_i)$$

HOW DO WE CHOOSE g?
• Human Intervention
• Divine Intervention
• Likelihood Ratio Test Statistic
– Wolfe's Method
– Bootstrap
– AIC, BIC, MDL
• Adaptive Mixtures Based Methods
– Pruning
– SHIP (AKMM)

Akaike's Information Criterion (AIC)
• AIC(g) = −2 log L(g) + 2N(g), where N(g) is the number of free parameters in the model of size g.
• We choose g in order to minimize the AIC criterion.
• This criterion is subject to the same regularity conditions as the −2 log λ likelihood ratio statistic.

MIXTURE VISUALIZATION 2-D

MODEL-BASED CLUSTERING
• This technique takes a density function approach.
• Uses finite mixture densities as models for cluster analysis.
• Each component density characterizes a cluster.
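Hartigan's rule of thumb above can be evaluated directly (a Python transcription of the R expression; the within-cluster sums of squares below are made-up numbers for illustration):

```python
def hartigan(withinss_k, withinss_k1, n, k):
    """Hartigan's rule of thumb: moving from k to k+1 clusters is
    justified when this statistic exceeds 10.
    Mirrors (sum(k$withinss)/sum(kplus1$withinss) - 1) * (nrow(x) - k - 1)."""
    return (sum(withinss_k) / sum(withinss_k1) - 1.0) * (n - k - 1)

# hypothetical within-cluster sums of squares for k = 2 and k = 3, n = 100
stat = hartigan([120.0, 80.0], [60.0, 50.0, 40.0], n=100, k=2)
print(stat)  # (200/150 - 1) * 97 ~ 32.3 > 10, so add the third cluster
```

The `(n - k - 1)` factor penalizes the improvement in within-cluster scatter by the sample size remaining after fitting k means, so a small drop in scatter only justifies an extra cluster when n is large.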
Minimal Spanning Tree-Based Clustering
• Diansheng Guo, Donna Peuquet, and Mark Gahegan (2002), "Opening the black box: interactive hierarchical clustering for multivariate spatial patterns," Proceedings of the Tenth ACM International Symposium on Advances in Geographic Information Systems, McLean, Virginia, USA.

What is Pattern Recognition?
• From Devroye, Györfi and Lugosi:
– Pattern recognition or discrimination is about guessing or predicting the unknown nature of an observation, a discrete quantity such as black or white, one or zero, sick or healthy, real or fake.
• From Duda, Hart and Stork:
– The act of taking in raw data and taking an action based on the "category" of the pattern.

Isn't This Just Statistics?
• Short answer: yes.
• Breiman (Statistical Science, 2001) suggests there are two cultures within statistical modeling: stochastic modelers and algorithmic modelers.

Algorithmic Modeling
• Pattern recognition (classification) is concerned with predicting the class membership of an observation.
• This can be done from the perspective of (traditional statistical) data models.
• Often, the data are high dimensional, complex, and of unknown distributional origin.
• Thus, pattern recognition often falls into the "algorithmic modeling" camp.
• The measure of performance is whether it accurately predicts the class, not how well it models the distribution.
• Empirical evaluations often are more compelling than asymptotic theorems.

Pattern Recognition Flowchart

Pattern Recognition Concerns
• Feature extraction and distance calculation.
• Development of automated algorithms for classification.
• Classifier performance evaluation.
• Latent or hidden class discovery based on extracted feature analysis.
• Theoretical considerations.

Linear and Quadratic Discriminant Analysis in Action

Nearest Neighbor Classifier

SVM Training Cartoon

CART Analysis of the Fisher Iris Data

Random Forests
• Create a large number of trees based on random samples of our dataset.
• Use a bootstrap sample for each random sample.
• The variables used to create the splits are a random subsample of all of the features.
• All trees are grown fully.
• A majority vote determines the membership of a new observation.

Boosting and Bagging

Boosting

Evaluating Classifiers

Resubstitution

Cross Validation

Leave-k-Out

Cross-Validation Notes

Test Set

Some Classifier Results on the Golub ALL vs AML Dataset

References - I
• Richard O. Duda, Peter E. Hart, and David G. Stork (2001), Pattern Classification, 2nd Edition.
• Eisen MB, Spellman PT, Brown PO, and Botstein D (1998). Cluster Analysis and Display of Genome-Wide Expression Patterns. Proc Natl Acad Sci U S A 95, 14863-8.
• Brian S. Everitt, Sabine Landau, and Morven Leese (2001), Cluster Analysis, 4th Edition, Arnold.
• Gasch AP and Eisen MB (2002). Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biology 3(11), 1-22.
• Gad Getz, Erel Levine, and Eytan Domany (2000). Coupled two-way clustering analysis of gene microarray data. PNAS, vol. 97, no. 22, pp. 12079–12084.
• Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan WC, Botstein D, and Brown P (2000). 'Gene Shaving' as a Method for Identifying Distinct Sets of Genes with Similar Expression Patterns. GenomeBiology.com 1.

References - II
• A. K. Jain, M. N. Murty, and P. J. Flynn (1999). Data clustering: a review. ACM Computing Surveys (CSUR), Volume 31, Issue 3.
• John Quackenbush (2001). Computational analysis of microarray data. Nature Reviews Genetics, Volume 2, pp. 418–427.
• Ying Xu, Victor Olman, and Dong Xu (2002). Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees. Bioinformatics 18: 536–545.

References - III
• Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2001.
• Devroye, Györfi, and Lugosi, A Probabilistic Theory of Pattern Recognition, 1996.
• Ripley, Pattern Recognition and Neural Networks, 1996.
• Fukunaga, Introduction to Statistical Pattern Recognition, 1990.