Bregman Divergences in Clustering and Dimensionality Reduction

COMS 6998-4: Learning and Empirical Inference

Irina Rish
IBM T.J. Watson Research Center
Slide credits:
Srujana Merugu, Arindam Banerjee, Sameer Agarwal
Outline

- Intro to Bregman Divergences
- Clustering with Bregman Divergences
  - k-means: quick overview
  - From Euclidean distance to Bregman divergences
  - Some rate-distortion theory
- Dimensionality Reduction with Bregman Divergences
  - PCA: quick overview
  - Probabilistic interpretation of PCA; exponential family
  - From Euclidean distance to Bregman divergences
- Conclusions
Distance (distortion) measures in learning

- Euclidean distance – the most commonly used
  - e.g., nearest neighbor, k-means clustering, least-squares regression, PCA, distance metric learning
- But… is it always an appropriate type of distance? No!
  - Nominal attributes (e.g., binary)
  - Distances between distributions
- Probabilistic interpretation:
  - Euclidean distance ↔ Gaussian data
  - Beyond Gaussian? Exponential family distributions ↔ Bregman divergences
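To make the first correspondence concrete, here is the standard one-line derivation: for a unit-variance Gaussian centered at \mu in d dimensions,

\[
-\log p(x \mid \mu) \;=\; \tfrac{1}{2}\,\|x - \mu\|^2 \;+\; \tfrac{d}{2}\log(2\pi),
\]

so maximum-likelihood fitting under a Gaussian model is exactly least-squares fitting under squared Euclidean distance. The same computation for other exponential-family models replaces the squared distance with the corresponding Bregman divergence, as made precise below.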
Squared Euclidean distance is a Bregman divergence

Relative entropy (i.e., KL-divergence) is another Bregman divergence
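Both facts are instances of one definition (standard, following Bregman, 1967; see also Banerjee et al., 2005): for a strictly convex, differentiable function \varphi,

\[
D_\varphi(x, y) \;=\; \varphi(x) - \varphi(y) - \langle \nabla\varphi(y),\, x - y \rangle .
\]

Taking \varphi(x) = \|x\|^2 gives D_\varphi(x, y) = \|x - y\|^2, the squared Euclidean distance; taking \varphi(p) = \sum_j p_j \log p_j on the probability simplex gives D_\varphi(p, q) = \sum_j p_j \log(p_j / q_j), the relative entropy (KL divergence).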
Recall Bregman Divergences
Now, how about generalizing soft clustering algorithms using Bregman divergences?
(θ: natural parameter)
(θ: natural parameter, μ: expectation parameter)
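As a concrete illustration of such an algorithm, here is a minimal sketch of Bregman soft clustering in the spirit of Banerjee et al. (2005). It is an illustrative reconstruction, not the lecture's code: it assumes strictly positive data, fixes \varphi(x) = \sum_j x_j \log x_j (whose Bregman divergence is the generalized I-divergence), and all function names are hypothetical.

import numpy as np

def gen_i_divergence(X, M):
    """Generalized I-divergence D(x, m) = sum_j [x_j log(x_j/m_j) - x_j + m_j],
    computed for every point (row of X, shape (n, d)) against every
    center (row of M, shape (k, d)); returns an (n, k) matrix."""
    Xe = X[:, None, :]                 # (n, 1, d)
    Me = M[None, :, :]                 # (1, k, d)
    return np.sum(Xe * np.log(Xe / Me) - Xe + Me, axis=2)

def bregman_soft_cluster(X, k, n_iter=100, seed=0):
    """EM-style soft clustering where the Gaussian log-likelihood is
    replaced by a Bregman divergence; X must be strictly positive
    for this particular divergence."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    M = X[rng.choice(n, size=k, replace=False)].copy()  # init centers from data
    pi = np.full(k, 1.0 / k)                            # uniform mixing weights
    for _ in range(n_iter):
        # E-step: p(h | x) proportional to pi_h * exp(-D(x, m_h))
        logp = np.log(pi)[None, :] - gen_i_divergence(X, M)
        logp -= logp.max(axis=1, keepdims=True)         # numerical stabilization
        P = np.exp(logp)
        P /= P.sum(axis=1, keepdims=True)
        # M-step: each center is the posterior-weighted arithmetic mean;
        # this single update rule is optimal for every Bregman divergence
        M = (P.T @ X) / P.sum(axis=0)[:, None]
        pi = P.mean(axis=0)
    return M, pi, P

The notable part is the M-step: for any Bregman divergence, the representative minimizing the expected divergence from a set of points is their arithmetic mean, so the familiar k-means/EM center update carries over unchanged; only the E-step's notion of distance changes.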
Add a bit of unit-variance Gaussian noise to each point
Now remove the original model…
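In equations, this is the standard probabilistic-PCA construction (Tipping & Bishop, 1999), with the slide's unit-variance noise corresponding to \sigma^2 = 1:

\[
z \sim \mathcal{N}(0, I_q), \qquad x = W z + \mu + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d).
\]

Maximizing the likelihood of the observed x's over W recovers the principal subspace, and "removing the original model" leaves a latent-variable view of PCA: classical PCA is the \sigma^2 \to 0 limit.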
Remember the exponential family?
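In standard form (with \theta the natural parameter and \mu the expectation parameter, matching the annotations above), a regular exponential family has densities

\[
p_\psi(x \mid \theta) = \exp\big(\langle x, \theta \rangle - \psi(\theta)\big)\, p_0(x),
\]

where \psi is the convex log-partition function and \mu = \mathbb{E}[x] = \nabla\psi(\theta).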
Remember Bregman Divergences?
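A natural connection between the two questions (Banerjee et al., 2005): with \varphi = \psi^* the convex conjugate of the log-partition function, every regular exponential-family density can be rewritten as

\[
p_\psi(x \mid \theta) = \exp\big(-D_\varphi(x, \mu(\theta))\big)\, b_\varphi(x),
\]

so maximizing exponential-family likelihood is the same as minimizing a Bregman divergence to the expectation parameter. Exponential-family PCA (Collins, Dasgupta & Schapire, 2001) uses exactly this: constrain the natural parameters \theta_i to a low-dimensional subspace and minimize \sum_i D_\varphi(x_i, \nabla\psi(\theta_i)), which reduces to classical PCA in the Gaussian case.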
Discussion