Bregman Divergences in Clustering and Dimensionality Reduction
COMS 6998-4: Learning and Empirical Inference
Irina Rish, IBM T.J. Watson Research Center
Slide credits: Srujana Merugu, Arindam Banerjee, Sameer Agarwal

Outline
- Intro to Bregman divergences
- Clustering with Bregman divergences
  - k-means: quick overview
  - From Euclidean distance to Bregman divergences
  - Some rate-distortion theory
- Dimensionality reduction with Bregman divergences
  - PCA: quick overview
  - Probabilistic interpretation of PCA; the exponential family
  - From Euclidean distance to Bregman divergences
- Conclusions

Distance (distortion) measures in learning
- Euclidean distance is the most commonly used: nearest neighbor, k-means clustering, least-squares regression, PCA, distance metric learning, etc.
- But is it always an appropriate type of distance? No! Consider, for example:
  - nominal attributes (e.g., binary features)
  - distances between probability distributions
- Probabilistic interpretation: Euclidean distance corresponds to Gaussian data.
- Beyond Gaussian? Exponential-family distributions correspond to Bregman divergences:
  - squared Euclidean distance is a Bregman divergence;
  - relative entropy (i.e., KL divergence) is another Bregman divergence.

Recall Bregman divergences (a worked definition is sketched after the transcript).

Now, how about generalizing soft clustering algorithms using Bregman divergences?
(θ: natural parameter; μ: expectation parameter — see the exponential-family sketch below.)

Probabilistic interpretation of PCA (generative view):
- Add a bit of unit-variance Gaussian noise to each point.
- Now remove the original model... and try to recover it from the data alone (see the sketch below).

Remember the exponential family? Remember Bregman divergences?

Discussion
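
The "Recall Bregman divergences" slide survives here only as a title, so the following is a minimal reconstruction from the standard definition used in the Banerjee–Merugu line of work, together with the two examples the transcript names:

```latex
% Bregman divergence generated by a strictly convex, differentiable \varphi:
D_\varphi(x, y) \;=\; \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle .

% Example 1: \varphi(x) = \|x\|^2 gives squared Euclidean distance:
D_\varphi(x, y) = \|x\|^2 - \|y\|^2 - \langle 2y,\, x - y \rangle = \|x - y\|^2 .

% Example 2: \varphi(p) = \sum_i p_i \log p_i (negative entropy, on the
% probability simplex) gives relative entropy (KL divergence):
D_\varphi(p, q) = \sum_i p_i \log \frac{p_i}{q_i} .
```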
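The clustering part of the outline ("k-means: quick overview", "From Euclidean distance to Bregman divergences") compresses the main algorithmic point of Banerjee et al. (2005): k-means still works when squared Euclidean distance is replaced by any Bregman divergence, and the optimal cluster representative remains the arithmetic mean. A minimal sketch under that claim; the function names are mine, not from the slides:

```python
import numpy as np

def squared_euclidean(X, c):
    # Bregman divergence for phi(x) = ||x||^2: ordinary k-means distortion.
    return np.sum((X - c) ** 2, axis=-1)

def kl_divergence(X, c):
    # Bregman divergence for phi(p) = sum_i p_i log p_i; rows of X and c
    # are assumed to be strictly positive probability vectors.
    return np.sum(X * np.log(X / c), axis=-1)

def bregman_kmeans(X, k, divergence=squared_euclidean, n_iters=100, seed=0):
    """Hard clustering with an arbitrary Bregman divergence."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iters):
        # Assignment step: each point joins the centroid that is nearest
        # *under the chosen Bregman divergence*.
        D = np.stack([divergence(X, c) for c in centroids], axis=1)
        new_labels = D.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable -> converged
        labels = new_labels
        # Update step: the plain arithmetic mean is the optimal cluster
        # representative for *every* Bregman divergence (Banerjee et al.).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep old centroid for empty clusters
                centroids[j] = members.mean(axis=0)
    return centroids, labels
```

Calling `bregman_kmeans(H, k, divergence=kl_divergence)` on rows that are probability histograms, for instance, gives the information-theoretic clustering that the rate-distortion item in the outline refers to.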
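The fragment "(θ: natural parameter; μ: expectation parameter)" is the residue of the correspondence the lecture draws between exponential families and Bregman divergences. A sketch of the standard statement (Banerjee et al., JMLR 2005), which is what licenses swapping Gaussians for other exponential-family members in soft clustering:

```latex
% Exponential family density in the natural parameter \theta,
% with log-partition function \psi:
p_\psi(x \mid \theta) = \exp\!\big( \langle x, \theta \rangle - \psi(\theta) \big)\, p_0(x) .

% Let \varphi be the Legendre conjugate of \psi, and let
% \mu = \nabla \psi(\theta) = \mathbb{E}[X \mid \theta] be the expectation
% parameter. The same density rewrites as an exponentiated Bregman divergence:
p_\psi(x \mid \theta) = \exp\!\big( -D_\varphi(x, \mu) \big)\, b_\varphi(x) .
```

So maximizing exponential-family likelihood is the same as minimizing the matching Bregman divergence to the expectation parameter, which is exactly what makes the EM-style soft-clustering generalization go through.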
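The two fragments "Add a bit of unit-variance Gaussian noise to each point" and "Now remove the original model…" are the generative story behind probabilistic PCA (Tipping & Bishop, 1999): sample points on a low-dimensional subspace, corrupt them with isotropic Gaussian noise, then pretend the model is unknown and recover the subspace. A sketch under those assumptions; the ground-truth model and all variable names are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, q = 500, 10, 2          # n points in R^d, true latent dimension q

# Hypothetical ground-truth model: latent Gaussian coordinates mapped
# into R^d by a loading matrix W.
W = rng.normal(size=(d, q))
Z = rng.normal(size=(n, q))
X_clean = Z @ W.T             # noiseless points lying on a q-dim subspace

# "Add a bit of unit-variance Gaussian noise to each point"
X = X_clean + rng.normal(size=(n, d))

# "Now remove the original model...": recover the subspace from data alone.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W_hat = Vt[:q].T              # top-q principal directions

# span(W_hat) should approximate span(W); compare orthogonal projectors:
P_true = W @ np.linalg.pinv(W)
P_hat = W_hat @ W_hat.T
print(np.linalg.norm(P_true - P_hat))   # small if the subspace was recovered
```

Replacing the Gaussian noise with another exponential-family likelihood, and squared error with the matching Bregman divergence, gives the exponential-family generalization of PCA that the closing slides gesture at (Collins, Dasgupta & Schapire, 2001).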