Transcript ppt
Online and Batch Learning of Pseudo-Metrics
Shai Shalev-Shwartz, Hebrew University, Jerusalem
Joint work with Yoram Singer, Google Inc., and Andrew Y. Ng, Stanford University
Learning of Pseudo-Metrics. Slide 1

Motivating Example

Our Technique
• Map instances into a space in which distances correspond to labels

Outline
• Distance learning setting
• Large margin for distances
• An online learning algorithm
• Online loss analysis
• A dual version
• Experiments:
  • Online - document filtering
  • Batch - handwritten digit recognition

Problem Setting
• Training examples:
  • two instances
  • similarity label
• Hypothesis class: pseudo-metrics defined by a symmetric positive semi-definite matrix

Large Margin for Pseudo-Metrics
• Sample S is [...]-separated w.r.t. a metric

Batch Formulation
• [...] s.t. [...]
• [...] s.t. [...]

Pseudo-metric Online Learning Algorithm (POLA)
For each round:
• Get two instances
• Calculate distance
• Predict
• Get true label and suffer hinge-loss
• Update matrix and threshold
Callouts:
• If: [...] we want that [...]
• If: [...] we want that [...]

Core Update: Two Projections
• Start with the current matrix and threshold
• An example defines a half-space
• First, project the current solution onto this half-space
• Then, project the result onto the PSD cone

Online Learning
• Goal: minimize cumulative loss
• Why Online?
  • Online processing tasks (e.g. Text Filtering)
  • Simple to implement
  • Memory and run-time efficient
  • Worst-case bounds on the performance
  • Online to batch conversions

Online Loss Bound
• A sequence of examples s.t. [...]
• Any fixed matrix and threshold
• Then, [...], where the bound is annotated with two terms: “Complexity” of [...] and Loss suffered by [...]
• Loss bound does not depend on dimension
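
The POLA round and the "Core Update: Two Projections" slide can be sketched in NumPy as follows. The transcript does not preserve the formulas, so several details here are assumptions rather than the authors' exact definitions: the distance is taken as the squared form (x1 - x2)^T A (x1 - x2), labels are y = +1 for similar and y = -1 for dissimilar pairs, the hinge loss uses a margin of 1, and the threshold is kept at or above 1. All function names are hypothetical.

```python
import numpy as np

def squared_distance(A, x1, x2):
    """Squared pseudo-metric distance (x1 - x2)^T A (x1 - x2)."""
    v = x1 - x2
    return float(v @ A @ v)

def hinge_loss(A, b, x1, x2, y):
    """Assumed loss: zero iff y * (b - d^2) >= 1, with y = +1 for similar pairs."""
    return max(0.0, y * (squared_distance(A, x1, x2) - b) + 1.0)

def project_onto_halfspace(A, b, x1, x2, y):
    """Euclidean (Frobenius) projection of (A, b) onto the set where the loss is zero."""
    loss = hinge_loss(A, b, x1, x2, y)
    if loss == 0.0:
        return A, b
    v = x1 - x2
    # The constraint is linear in (A, b); its normal (y * v v^T, -y)
    # has squared norm ||v||^4 + 1.
    alpha = loss / (float(v @ v) ** 2 + 1.0)
    return A - y * alpha * np.outer(v, v), b + y * alpha

def project_onto_psd_cone(A, b, b_min=1.0):
    """Clip negative eigenvalues of A; keep b >= b_min (assumed constraint)."""
    eigvals, eigvecs = np.linalg.eigh((A + A.T) / 2.0)
    A_psd = (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T
    return A_psd, max(b, b_min)

def pola_round(A, b, x1, x2, y):
    """One online round: suffer the loss, then apply the two projections."""
    loss = hinge_loss(A, b, x1, x2, y)
    A_half, b_half = project_onto_halfspace(A, b, x1, x2, y)
    A_new, b_new = project_onto_psd_cone(A_half, b_half)
    return A_new, b_new, loss
```

Starting from `A = np.eye(2)` and `b = 2.0`, a dissimilar pair at Euclidean distance 1 incurs a positive loss under these assumptions; the first projection stretches `A` along the difference direction and lowers `b` just enough to bring the loss on that pair to zero.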

Incorporating Kernels
• Matrix A can be written as [...], where [...]
• Therefore: [...]

Online Experiments
• Task: Document filtering according to topics
• Dataset: Reuters-21578
  • 10,000 documents
  • Documents labeled as Relevant and Irrelevant
  • A few relevant documents (1% - 10% of entire set)
• Algorithms:
  • POLA
  • 1 Nearest Neighbor (1-NN)
  • Perceptron Algorithm
  • Perceptron Algorithm with Uneven Margins (PAUM) (Li, Zaragoza, Herbrich, Shawe-Taylor, Kandola)

POLA for Document Filtering
• Get a document
• Calculate distance to relevant documents observed so far using current matrix
• Predict: document is relevant iff the distance to the closest relevant document is smaller than the current threshold
• Get true label
• Update matrix and threshold

Document Filtering Results
• Each blue point corresponds to one topic
• Y-axis designates the error of POLA
• Points beneath the black diagonal line mean that POLA wins
[Figure: three scatter plots of POLA error (Y-axis) against PAUM error, Perceptron error, and 1-NN error]

Batch Experiments
• Task: Handwritten digits recognition
• Dataset: MNIST dataset
  • 45 binary classification problems (all pairs)
  • 10,000 training examples
  • 10,000 test examples
• Algorithms: Used k-NN with various metrics:
  • Pseudo-metric learned by POLA
  • Euclidean distance
  • Metric induced by Fisher Discriminant Analysis (FDA)
  • Metric learned by Relevant Component Analysis (RCA) (Bar-Hillel, Hertz, Shental, and Weinshall)

MNIST Results
• Each blue point corresponds to one binary classification problem
• Y-axis designates the error of POLA
• Points beneath the black diagonal line mean that POLA wins
• RCA was applied after using PCA as a preprocessing step
[Figure: three scatter plots of POLA error (Y-axis) against RCA error, FDA error, and Euclidean distance error]
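
The batch experiments use k-NN with different metrics. As a minimal sketch of what "k-NN with a learned pseudo-metric" could look like, the code below ranks training points by the squared form (x - x')^T A (x - x') for a given PSD matrix A. This is an illustration under assumptions, not the authors' experimental code: the function names, the default `k`, and the tie-breaking rule (the smallest label wins a tied vote) are all hypothetical choices.

```python
import numpy as np

def squared_distances(A, X, x):
    """Squared pseudo-metric distances from query x to every row of X."""
    diffs = X - x
    # Row-wise quadratic form: d2[i] = diffs[i] @ A @ diffs[i]
    return np.einsum("ij,jk,ik->i", diffs, A, diffs)

def knn_predict(A, X_train, y_train, x, k=3):
    """Majority vote among the k training points closest to x under A."""
    d2 = squared_distances(A, X_train, x)
    nearest = np.argsort(d2, kind="stable")[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    # np.unique returns sorted labels, so ties go to the smallest label.
    return labels[np.argmax(counts)]
```

Passing `A = np.eye(d)` recovers ordinary Euclidean k-NN, so the same function covers the Euclidean baseline and any learned matrix. A matrix that down-weights an uninformative coordinate can change which neighbor is closest, which is the effect the comparison plots measure.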

Toy problem
• A color-coded matrix of Euclidean distances between pairs of images

Metric found by POLA

Mapping found by POLA
• Our Pseudo-metrics: [...]

Mapping found by POLA

Summary and Extensions
• An online algorithm for learning pseudo-metrics
• Formal properties, good experimental results
Extensions:
• Alternative regularization schemes to the Frobenius norm
• “Learning to learn”:
  • Learning a metric from one set of classes and applying it to another set of related classes

• Hello bye = w · x
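
The "Mapping found by POLA" slides and the earlier "Our Technique" slide describe mapping instances into a space where distances correspond to labels. The formula on the slide is not preserved, but one standard way to read a PSD matrix A as a mapping is to factor it as A = W^T W, so that the pseudo-metric equals the Euclidean distance between W x and W x'. The sketch below assumes that reading; `mapping_from_metric` is a hypothetical helper name.

```python
import numpy as np

def mapping_from_metric(A):
    """Return W with W.T @ W == A for a symmetric PSD matrix A."""
    eigvals, eigvecs = np.linalg.eigh((A + A.T) / 2.0)
    # Tiny negative eigenvalues can appear from round-off; clip them to zero.
    sqrt_vals = np.sqrt(np.clip(eigvals, 0.0, None))
    # W = diag(sqrt(eigvals)) @ eigvecs.T, built by scaling the rows of eigvecs.T
    return sqrt_vals[:, None] * eigvecs.T

def mapped_squared_distance(W, x1, x2):
    """Squared Euclidean distance between the mapped points W @ x1 and W @ x2."""
    diff = W @ x1 - W @ x2
    return float(diff @ diff)
```

When A is rank-deficient, two distinct inputs can map to the same point and so have distance zero, which is why the result is a pseudo-metric rather than a metric.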