Learning to rank Web Science 2013

Jaspreet Singh

Overview

• Optimizing search engines using click through data. Thorsten Joachims, SIGKDD 2002.
• Large Scale learning to rank. D. Sculley.

[Diagram labels: Machine Learning Algorithm, Retrieval function]

Optimizing search engines using click through data.

• Explicit feedback vs. click through data.
• Click through data is recorded as triplets (q, r, c): q is the query, r is the ranking presented to the user, and c is the set of links the user clicked on.
• Assuming that the user scanned the ranking from top to bottom, a user who clicked on link 3 must have observed link 2 first and decided not to click on it. The click therefore indicates that link 3 is preferred over link 2.
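The top-to-bottom scanning assumption can be turned into preference pairs mechanically. The following is a minimal sketch, assuming a ranking is a list of link identifiers and the clicks are a collection of those identifiers; the function name `preferences_from_clicks` is hypothetical and not taken from the slides.

```python
def preferences_from_clicks(ranking, clicked):
    """Return (preferred, less_preferred) pairs implied by the clicks.

    A clicked link is taken to be preferred over every link that was
    ranked above it but not clicked.
    """
    clicked = set(clicked)
    prefs = []
    for i, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Links above position i were observed before the click on doc.
        for skipped in ranking[:i]:
            if skipped not in clicked:
                prefs.append((doc, skipped))
    return prefs


ranking = ["link1", "link2", "link3", "link4"]
prefs = preferences_from_clicks(ranking, ["link1", "link3"])
# prefs == [("link3", "link2")]
```

Note that the result is only a partial ordering: nothing is learned about links below the last click.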

Learning of retrieval functions

• Recovering the exact target ordering of documents is close to impossible.
• Measure the similarity between the optimal ordering and the given ordering using Kendall’s τ, which is related to average precision.
• For binary relevance, maximizing Kendall’s τ is equivalent to minimizing the average rank of the relevant documents.
• For a fixed but unknown distribution Pr(q, r*) of queries and target rankings on a document collection D with m documents, the goal is to learn a retrieval function f(q) for which the expected Kendall’s τ is maximal: $\tau_P(f) = \int \tau(r_{f(q)}, r^*) \, d\Pr(q, r^*)$.
• This is equivalent to a risk functional in which −τ is the loss.
• The empirical risk minimization principle states that the learning algorithm should choose the hypothesis that minimizes the empirical risk, i.e. the one that maximizes the average τ over the training queries.
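Kendall’s τ compares two orderings by counting concordant pairs P and discordant pairs Q, with τ = (P − Q) / (P + Q). A small sketch, assuming both rankings are strict orderings of the same documents given as Python lists (the function name is illustrative):

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau between two strict orderings of the same items."""
    pos_a = {doc: i for i, doc in enumerate(ranking_a)}
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    concordant = discordant = 0
    for d1, d2 in combinations(ranking_a, 2):
        # The pair is concordant if both rankings order it the same way.
        if (pos_a[d1] - pos_a[d2]) * (pos_b[d1] - pos_b[d2]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


tau = kendall_tau(["d1", "d2", "d3", "d4", "d5"],
                  ["d3", "d2", "d1", "d4", "d5"])
# 7 concordant and 3 discordant pairs, so tau is 0.4
```

Identical rankings give τ = 1 and exactly reversed rankings give τ = −1.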

Rank SVM

• Is it possible to design an algorithm and a family of ranking functions F so that finding the function f in F that maximizes τ is efficient, and so that this function generalizes well beyond the training data?
• Rank documents by a linear score: a weight vector w is applied to a feature mapping Φ(q, d) of query and document, and adjusting w changes the ranking.
• Maximizing τ directly is equivalent to minimizing the number of discordant pairs in the calculation of τ. This in turn is equivalent to finding the weight vector w that fulfills the maximum number of the following inequalities: $w \cdot \Phi(q_k, d_i) > w \cdot \Phi(q_k, d_j)$ for every pair $(d_i, d_j)$ in the target ranking $r_k^*$ of each training query $q_k$.
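To make the role of the weight vector concrete, the sketch below scores documents with a dot product and counts how many preference inequalities a given weight vector fulfills. The feature values and the names `score`, `count_fulfilled` and `rank` are made up for illustration.

```python
def score(w, phi):
    """Linear score: dot product of weight vector and feature vector."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def count_fulfilled(w, features, preferences):
    """Count pairs (better, worse) for which w scores better above worse."""
    return sum(
        1
        for better, worse in preferences
        if score(w, features[better]) > score(w, features[worse])
    )


def rank(w, features):
    """Order documents by descending score."""
    return sorted(features, key=lambda d: score(w, features[d]), reverse=True)


# Hypothetical two-dimensional features for three documents of one query.
features = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.5, 0.5]}
preferences = [("d1", "d2"), ("d3", "d2")]

print(count_fulfilled([1.0, 0.0], features, preferences))  # 2
print(rank([1.0, 0.0], features))  # ['d1', 'd3', 'd2']
```

A weight vector that fulfills more inequalities produces fewer discordant pairs on the training data.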

Rank SVM

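• Fulfilling the maximum number of inequalities exactly is NP-hard.
• As in SVM classification, approximate the solution by introducing slack variables and a regularization parameter that trades margin size against training error.
• Trained with SVM light.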
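For reference, the approximation can be written as a quadratic program in the style of a soft-margin SVM. This is a sketch of the standard Ranking SVM formulation; the symbols $\xi_{i,j,k}$ (slack variables) and $C$ (regularization parameter) are notation assumed here, since the slide does not show them.

```latex
\begin{aligned}
\text{minimize} \quad & V(w, \xi) = \tfrac{1}{2}\, w \cdot w + C \sum_{i,j,k} \xi_{i,j,k} \\
\text{subject to} \quad & w \cdot \Phi(q_k, d_i) \ge w \cdot \Phi(q_k, d_j) + 1 - \xi_{i,j,k}
  \quad \forall (d_i, d_j) \in r_k^*, \\
& \xi_{i,j,k} \ge 0 \quad \forall i, j, k .
\end{aligned}
```

Rearranging each constraint as $w \cdot (\Phi(q_k, d_i) - \Phi(q_k, d_j)) \ge 1 - \xi_{i,j,k}$ shows that this is a classification SVM on pairwise difference vectors.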

Experiments

• A meta search engine collects results from the best search engines and combines them into a single list by union.
• To compare the quality of different retrieval functions, the key idea is to present two rankings at the same time and then measure which ranking receives more clicks.

Ranking A: 1. D1, 2. D2, 3. D3
Ranking B: 1. D4, 2. D5, 3. D6
Union: 1. D1, 2. D4, 3. D2, 4. D5, 5. D3, 6. D6
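The combined list in the example alternates between the two rankings. A minimal sketch of such an alternating merge is shown below; it drops documents that appear in both rankings, and it is a simplification for illustration rather than the exact combination procedure used in the paper.

```python
from itertools import zip_longest


def interleave(ranking_a, ranking_b):
    """Alternate between two rankings, keeping each document once."""
    combined, seen = [], set()
    for a, b in zip_longest(ranking_a, ranking_b):
        for doc in (a, b):
            # zip_longest pads the shorter ranking with None.
            if doc is not None and doc not in seen:
                seen.add(doc)
                combined.append(doc)
    return combined


print(interleave(["D1", "D2", "D3"], ["D4", "D5", "D6"]))
# ['D1', 'D4', 'D2', 'D5', 'D3', 'D6']
```

Clicks on the combined list can then be attributed back to whichever ranking contributed the clicked document.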

Experiments

• Offline experiment: verify that the Ranking SVM can indeed learn a retrieval function maximizing Kendall’s tau on partial preference feedback.
• Split the collected queries into a training set and a test set, then train the Ranking SVM using SVM light.
• Result: the Ranking SVM can learn regularities in the preferences. The more training queries, the lower the error.
• Online experiment: verify that the learned retrieval function does improve retrieval quality as desired.
• The learned retrieval function is compared against Google, MSNSearch, and Toprank.
• Result: more links from the learned ranking were clicked on.

Conclusion

• The key insight is that click through data can provide training data in the form of relative preferences.
• The experimental results show that the Ranking SVM can successfully learn an improved retrieval function from click through data.
• Without any explicit feedback or manual parameter tuning, it automatically adapted to the particular preferences of a group of 20 users (112 queries).
• There is a trade-off between the amount of training data (i.e. a large group) and maximum homogeneity (i.e. a single user).

Overview

• Optimizing search engines using click through data. Thorsten Joachims, SIGKDD 2002.
• Large Scale learning to rank. D. Sculley.

Large scale learning to rank

• Pair-wise learning to rank methods such as Rank SVM give good performance, but suffer from the computational burden of optimizing an objective defined over O(n²) possible pairs for data sets with n examples.
• The super-linear dependence on training set size is removed by sampling pairs from an implicit pair-wise expansion and applying efficient stochastic gradient descent learners for approximate SVMs.
• The main approach of this paper is to adapt the pair-wise learning to rank problem to the stochastic gradient descent framework.

Optimization and stochastic gradient descent

• The paper is restricted to solving the classic Rank SVM optimization problem, first posed by Joachims: minimize the regularized hinge loss over pairs of examples.
• Stochastic gradient descent is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions.
• The generalization ability of stochastic gradient descent relies only on the number of stochastic steps taken, not on the size of the data set.
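One way to write the pairwise objective is sketched below. The notation is an assumption made for this illustration: $P$ is the set of candidate pairs of examples $(a, y_a, q_a)$ and $(b, y_b, q_b)$ that share a query and have different ranks, and $\lambda$ is the regularization parameter.

```latex
\min_{w} \;\; \frac{\lambda}{2} \lVert w \rVert_2^2
  + \frac{1}{|P|} \sum_{((a, y_a, q_a),\, (b, y_b, q_b)) \in P}
    \max\bigl(0,\; 1 - \operatorname{sign}(y_a - y_b)\, \langle w,\, a - b \rangle \bigr)
```

Because the objective is an average over pairs, a single randomly sampled pair gives an unbiased estimate of the loss term, which is what makes stochastic steps applicable without materializing all of $P$.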

Indexed Sampling - GetRandomPair

• Two-level nested hash map
• First level: the query is the key
• Second level: the rank is the key
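The nested hash map can be sketched with Python dictionaries. The example below assumes training examples arrive as (query_id, rank, features) tuples and that queries are sampled uniformly; both are assumptions made for illustration, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def build_index(examples):
    """Build index[query_id][rank] -> list of feature vectors."""
    index = defaultdict(lambda: defaultdict(list))
    for query_id, rank, features in examples:
        index[query_id][rank].append(features)
    return index


def get_random_pair(index, rng=random):
    """Sample two examples of the same query that have different ranks."""
    # Only queries with at least two distinct ranks can yield a pair.
    # (Recomputed on every call for brevity; cache it for real use.)
    candidates = [q for q, by_rank in index.items() if len(by_rank) >= 2]
    query_id = rng.choice(candidates)
    rank_a, rank_b = rng.sample(list(index[query_id]), 2)
    a = rng.choice(index[query_id][rank_a])
    b = rng.choice(index[query_id][rank_b])
    return (a, rank_a), (b, rank_b)
```

Each lookup touches only two dictionary levels, so a pair is drawn without ever enumerating the O(n²) candidate pairs.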

Stochastic gradient descent

• "Stochastic" implies sampling.
• Gradient descent is a step-wise process for finding a local minimum of a function.
• Rank SVM has a hinge loss function. The hinge loss is used for "maximum margin" classification.
• Hence we need to minimize this function to get a good classifier.
• The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it. Hence we can use SGD.
• There are many SGD variations, which differ in how they update the weight vector.
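As an illustration of one such variation, the sketch below runs stochastic (sub)gradient steps on the regularized pairwise hinge loss with a decaying step size of 1/(λt). The step-size schedule, the parameter defaults and the function names are choices made for this example, not details taken from the slides.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_rank_svm(pairs, dim, lam=0.01, steps=2000, seed=0):
    """Learn w from (better_features, worse_features) pairs.

    Each step samples one pair and takes a subgradient step on
    lam/2 * ||w||^2 + max(0, 1 - w . (better - worse)).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for t in range(1, steps + 1):
        better, worse = rng.choice(pairs)      # stochastic: sample one pair
        diff = [b - c for b, c in zip(better, worse)]
        eta = 1.0 / (lam * t)                  # decaying step size
        violated = dot(w, diff) < 1.0          # is the margin violated?
        # Step on the regularizer: shrink the weights.
        w = [(1.0 - eta * lam) * wi for wi in w]
        # Step on the hinge loss, only when it is active.
        if violated:
            w = [wi + eta * di for wi, di in zip(w, diff)]
    return w


# Hypothetical pairs in which the first feature indicates relevance.
pairs = [([1.0, 0.0], [0.0, 1.0]),
         ([0.9, 0.2], [0.1, 0.8]),
         ([0.7, 0.5], [0.3, 0.4])]
w = sgd_rank_svm(pairs, dim=2)
```

The cost of training depends on the number of steps, not on the number of pairs available for sampling.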

LETOR Experiment and Results

• LETOR: Learning to Rank for Information Retrieval
• Ranking performance: comparable, if not better
• Training speed: 100 times faster

Conclusion

• Click through data can be used as partial relevance feedback.
• We can learn a retrieval function that can improve mean average precision.
• Learning retrieval functions can be done on a large scale using stochastic gradient descent.
