Transcript
Beyond Position Bias: Examining Result Attractiveness as a Source of Presentation Bias in Clickthrough Data
WWW 2010
Yisong Yue Cornell Univ.
Rajan Patel Google Inc.
Hein Roehrig Google Inc.
User Feedback in Search Systems
• Cheap & representative feedback
  – Evaluation metrics
  – Optimization criterion
  – How to interpret feedback accurately?
• Clicks on (web) search results
  – Data plentiful
  – Important domain
Interpreting Clicks
• What does a click mean?
• Does a click mean the result was good?
• How good?
How Are Clicks Biased?
• In what ways do clicks not directly reflect user utility or preferences?
• Presentation bias
  – Users only click on what they pay attention to
  – E.g., position bias (more clicks at the top of the ranking)
• Understanding presentation bias is essential to interpreting feedback more accurately
• Maybe the 3rd result looked more relevant
  – i.e., judging a book by its cover
• Maybe the 3rd result attracted more attention
  – E.g., eye-catching
  – Many matching query terms (in bold)
Summary Attractiveness
• Goal: quantify the effect of summary attractiveness on click behavior
  – Web search context
• First study to conduct a rigorous statistical analysis of summary attractiveness bias
Controlling for Position
• Position bias is the largest biasing effect
• Need to control for it in order to analyze other biasing effects
• Use FairPairs randomization
  – [Radlinski & Joachims, 2006]
FairPairs Example
• Original: 1 2 3 4 5 6 7 8 9 10
• FairPair1: (1 2) (3 4) (5 6) (7 8) (9 10)
  – Swap: 2 1 3 4 6 5 8 7 9 10
• FairPair2: 1 (2 3) (4 5) (6 7) (8 9) 10
  – Swap: 1 2 3 5 4 7 6 9 8 10
• Randomly choose pairing scheme
• Randomly swap each intra-pair ordering independently
[Radlinski & Joachims, AAAI 2006]
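The randomization above can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the function name `fairpairs` is mine.

```python
import random

def fairpairs(ranking, rng=random):
    """Randomize a ranking with the FairPairs scheme of
    Radlinski & Joachims (AAAI 2006).

    One of two pairing schemes is chosen uniformly at random:
      scheme 1 pairs positions (1,2), (3,4), (5,6), ...
      scheme 2 pairs positions (2,3), (4,5), (6,7), ...
    Each pair is then independently swapped with probability 1/2.
    """
    result = list(ranking)
    start = rng.choice([0, 1])  # 0 -> scheme 1, 1 -> scheme 2
    for i in range(start, len(result) - 1, 2):
        if rng.random() < 0.5:
            result[i], result[i + 1] = result[i + 1], result[i]
    return result

# e.g. fairpairs(range(1, 11)) might produce 2 1 3 4 6 5 8 7 9 10,
# matching the FairPair1 swap shown above
```

Because only adjacent pairs are ever swapped, no result moves more than one position, so position bias within a pair is balanced out across impressions.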
Interpreting FairPairs Clicks
             A on top   B on top
Click on A     55%        40%
Click on B     45%        60%

Conclusion: B > A. Clicks indicate a pairwise preference (relative quality).
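The inference in the table can be made concrete: averaging each result's click share over the two presentation orders cancels the position effect. A minimal sketch (the function name and return convention are mine, not from the talk):

```python
def fairpairs_preference(clicks_a_top, clicks_b_top):
    """Infer a pairwise preference from FairPairs click counts.

    clicks_a_top / clicks_b_top: (clicks on A, clicks on B) under the
    two presentation orders.  Each result's click share is averaged
    over both orders, which balances out the position effect; the
    result with the higher average share is preferred.
    """
    a1, b1 = clicks_a_top
    a2, b2 = clicks_b_top
    share_a = (a1 / (a1 + b1) + a2 / (a2 + b2)) / 2
    share_b = 1.0 - share_a
    return "A > B" if share_a > share_b else "B > A"

# With the table above: A's share is 55% on top and 40% below,
# averaging 47.5%, so the conclusion is B > A.
```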
Thought Experiment
• Two results A & B
  – Equally relevant for some query
  – Ranked adjacently in search results
• AB and BA shown equally often (FairPairs)
• A has an attractive title; B does not.
• Who gets more clicks, A or B?
Click Data
• Ran FairPairs randomization
  – A portion of Google US web search traffic
  – 8/1/2009 to 8/20/2009
  – 439,246 clicks collected
Human Judged Ratings
• Sampled a subset of 1,150 FairPairs.
• Asked human raters to explicitly judge which of the pair is more relevant.
  – 5 judgments for each
• Human raters must navigate to the landing page.
Measuring Attractiveness
• Relative measure of attractiveness
  – Difference in bolded query terms in title & abstract
• Example: the bottom result has +2 bolded terms in the title and +2 bolded terms in the abstract
Measuring Attractiveness
• Clearly, query/title similarity is informative.
• Good results should have titles that strongly match the query.
• But would blindly counting clicks cause us to over-value query/title similarity?
Rated Clicks Model
Null Hypothesis
• Title & abstract bolding have zero effect
• Position and relative (judged) quality are the only factors affecting click probability.
Fitted Model
Param       Mean        95% Conf. Interv.
Base         0.653 **   +/- 0.183
Title        0.150 **   +/- 0.120
Abstract     0.039      +/- 0.120
Swap        -0.435 **   +/- 0.209
Human       -0.360 **   +/- 0.215
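To see what the fitted coefficients imply, we can plug them into a click-probability model. The slide does not show the model's functional form, so the logistic link below is an assumption, and the function name `p_click_bottom` and its argument conventions are mine:

```python
import math

# Fitted coefficients from the table above (** = significant at 95%).
COEF = {"base": 0.653, "title": 0.150, "abstract": 0.039,
        "swap": -0.435, "human": -0.360}

def p_click_bottom(title_diff, abstract_diff, swapped, human_pref):
    """Click probability for one result of a FairPair, assuming a
    logistic link (an assumption; the slide omits the model form).

    title_diff / abstract_diff: relative difference in bolded query
    terms; swapped: 1 if the pair was shown in swapped order;
    human_pref: human-judged relative quality.
    """
    z = (COEF["base"] + COEF["title"] * title_diff
         + COEF["abstract"] * abstract_diff
         + COEF["swap"] * swapped + COEF["human"] * human_pref)
    return 1.0 / (1.0 + math.exp(-z))
```

Because the Title coefficient (0.150) is positive and significant while quality is held fixed, extra title bolding raises predicted click probability, which rejects the null hypothesis for title bolding.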
Leveraging All Clicks
• Previous model required human judgments
• We need to calibrate against relative quality
• How to do this on all 400,000+ clicks?
• Make independence assumptions!
Intuition
• Virtually all search engines predict rankings using many attributes (or features).
• Query/title similarity is only one component.
• Example: a document with low query/title similarity might achieve a high ranking due to very relevant body text.
Example
• Each document is described by two features: 1st = query/title similarity, 2nd = query/body similarity
• Preferred document on the left of each pair:
  (1.5, 1.0) > (1.0, 1.2)
  (1.2, 2.0) > (1.5, 0.9)
  (2.0, 0.5) > (1.0, 1.5)
  (1.4, 1.9) > (1.7, 1.0)
Assumption
• Take pairs of adjacent documents at random
• Collect relative relevance ratings
  – Human-rated preferences
• Should be independent of title bolding difference
• Can check using a statistical model
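One simple way to check an independence assumption like this is to compare rater-agreement rates between pairs with and without a title-bolding difference. The paper's actual check is the Rated Agreement Model below; the two-proportion z-test here is just an illustrative stand-in, and all names and counts are mine:

```python
import math

def two_proportion_ztest(k1, n1, k2, n2):
    """Two-sided two-proportion z-test (illustrative; the talk's
    actual check is the fitted Rated Agreement Model).

    k1/n1, k2/n2: successes and trials in the two groups.
    Returns (z, p_value) for H0: both groups share one proportion.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical counts: if agreement is 60/100 with bolding difference
# and 58/100 without, the large p-value means we cannot reject
# independence.
```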
Rated Agreement Model
Fitted Model
Param       Mean        95% Conf. Interv.
Base         0.258 **   +/- 0.062
Title        0.018      +/- 0.060
Abstract     0.058      +/- 0.060
Assumption approximately satisfied for query/title similarity.
Title Bias Effect (All Clicks)
• Bars should be equal if not biased
[Bar chart omitted; y-axis from 0 to 0.6]
All Clicks Model
Evaluation Metrics & Optimization
• Pairwise preferences common for evaluation
  – E.g., maximize FairPairs agreement
• Goal: maximize pairwise relevance agreement
  – Want to be aligned with click agreement
• Danger: might conclude the current system is undervaluing query/title similarity
• Down-weight clicks on results with more title bolding
  – E.g., weight clicks by exp(-w_T * x_T)
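The proposed correction is simple to state in code. A minimal sketch, assuming w_T is the fitted title coefficient (0.150 from the Rated Clicks Model) and x_T is the title-bolding difference; the function name is mine and this is not the authors' exact procedure:

```python
import math

def deweighted_click(title_bolding_diff, w_title=0.150):
    """Down-weight a click on a result with extra title bolding by
    exp(-w_T * x_T), the correction the slide proposes.

    w_title defaults to the fitted Title coefficient from the talk;
    title_bolding_diff is the result's relative bolding advantage.
    """
    return math.exp(-w_title * title_bolding_diff)

# A click on a result with +2 extra bolded title terms counts as
# exp(-0.3), roughly 0.74 of a click; a click with no bolding
# advantage keeps weight 1.
```

This removes the attractiveness advantage from the click signal before it is used as an optimization criterion.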
Directions to Explore
• Other ways to measure summary attractiveness
  – Use other summary content
• Other forms of presentation bias
  – Anything that draws people's attention
• Ways to interpret and adjust for bias
  – More accurate ways to quantify bias
  – More accurate evaluation metrics
Extra Slides
Fitted Model (All Clicks)
Param           Mean        95% Conf. Interv.
Base             0.184 **   +/- 0.007
Top Title        0.060 **   +/- 0.008
Bot Title        0.061 **   +/- 0.009
Top Abstract     0.007      +/- 0.009
Bot Abstract     0.014 **   +/- 0.008
Swap @ 1         0.561 **   +/- 0.011
Swap @ 2         0.390 **   +/- 0.012
Swap @ 3         0.372 **   +/- 0.016
Swap @ 4-5       0.198 **   +/- 0.014
Swap @ 6-9       0.009      +/- 0.014
Swap @ 10+       0.054 **   +/- 0.009