Finding Dense and Isolated Submarkets in a Sponsored Search
Modelling Relevance and User Behaviour in
Sponsored Search using
Adarsh Prasad, IIT Delhi
Advisors: Dinesh Govindaraj
Group: Revenue and Relevance
*-Visiting Researcher from Purdue
• Click data seems to be the perfect source of information
when deciding which ads to show in answer to a query. It can
be thought of as the result of users voting in favour of the
documents they find interesting.
• This information can be fed into the ranker to tune search
parameters, or even used as training points for the ranker.
• The aim of the project is to develop a model which takes in
Click-Data and generates output in the form of constraints or
updated ranking score as input to the ranker.
• Quality of training points is of critical importance for learning a ranking
• Currently, labeled data is collected using human judges. Human labeling is
time-consuming and labor-intensive.
• Need to ensure "temporal relevance" of ads, i.e. something relevant
today might not be relevant 6 months later. Labeling must therefore be
repeated, and the labeling process needs to be automated.
Main Difficulty – Presentation Bias
• Results at lower positions are less likely to be clicked even if they
• Clicks depend on the other ads being shown (externalities).
URL = www.myspace.com
Market = U.K.
Pos 1: uk.myspace.com: ctr = 0.97
Pos 2: www.myspace.com: ctr = 0.11
Olivier Chapelle et al. A Dynamic Bayesian Network Click Model for Web Search Ranking
Pos 1 : www.myspace.com : ctr = 0.97
For learning a web search function, clicks can be used as a target or as a
• Use of Click Data as target : Useful for markets with few editorial
• Train on pairwise preferences: Two Sets of preferences:
PE from editorial judgments and PC coming from click modeling.
1. Deriving Preference Relations on
the basis of click-pattern and
feeding them as constraints to
• Position and Order-of-Click
• Aggregate Constraints
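One well-known way to derive preference relations from a click pattern is the "click > skip above" heuristic described by Joachims et al.: a clicked ad is preferred over every unclicked ad shown above it. The sketch below assumes that heuristic, which may differ from the rule used in this project; the function name and ad identifiers are hypothetical.

```python
def click_skip_above(ranking, clicked):
    """Return (preferred, less_preferred) pairs from one impression.

    ranking: ad ids in displayed order (top first).
    clicked: set of ad ids the user clicked.
    """
    prefs = []
    for i, ad in enumerate(ranking):
        if ad not in clicked:
            continue
        # Every unclicked ad shown above a clicked ad is treated as skipped.
        for above in ranking[:i]:
            if above not in clicked:
                prefs.append((ad, above))
    return prefs


# Hypothetical impression: four ads shown, only the third one clicked.
pairs = click_skip_above(["ad_a", "ad_b", "ad_c", "ad_d"], {"ad_c"})
```

Each pair could then be passed to a pairwise ranker as a constraint of the form "first ad should score higher than second ad".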
1. Sample Clicked Ads and label them as
2. Types of Sampling:
• Position-based weighted: a user clicking the
ml-4 ad is a stronger signal of relevance than
a user clicking the ml-1 ad
3. Feed them to the Binary Classifier
Joachims et al. Optimizing Search Engines Using Clickthrough Data
Agichtein et al. Improving Web Search Ranking by Incorporating User Behavior Information
Joachims et al. Accurately Interpreting Clickthrough Data as Implicit Feedback
Fisher Score = |μ₁ − μ₂| / √(σ₁² + σ₂²)
Log Loss (Label Based)
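As a rough illustration of the two metrics named above, the sketch below computes a Fisher score between the score distributions of two classes and a label-based log loss. It assumes the common form |μ₁ − μ₂| / √(σ₁² + σ₂²) for the Fisher score and binary labels in {0, 1}; the function names and sample numbers are made up for the example.

```python
import math
from statistics import mean, pvariance


def fisher_score(pos_scores, neg_scores):
    """Separation of two score distributions: |mu1 - mu2| / sqrt(var1 + var2)."""
    spread = math.sqrt(pvariance(pos_scores) + pvariance(neg_scores))
    return abs(mean(pos_scores) - mean(neg_scores)) / spread


def log_loss(labels, probs, eps=1e-12):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical model scores for relevant and non-relevant ads.
fs = fisher_score([0.8, 0.9, 0.7], [0.2, 0.3, 0.1])
ll = log_loss([1, 0, 1], [0.9, 0.2, 0.6])
```

A larger Fisher score means the two classes are better separated; a smaller log loss means the predicted probabilities agree better with the labels.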
Background on Click Models
• Use CTR (click-through rate) data.
• Pr(click) = Pr(examination) × Pr(click | examination)
• Need user browsing models to estimate Pr(examination)
• Φ(i) : result at position i
• Examination event: Ei = 1 if the user examined Φ(i), 0 otherwise
• Click event: Ci = 1 if the user clicked on Φ(i), 0 otherwise
Richardson et al, WWW 2007:
Pr(Ci = 1) = Pr(Ei = 1) Pr(Ci = 1 | Ei = 1)
• αi : position bias
• Depends solely on position.
• Can be estimated by looking at CTR of the same result in different
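The position-bias idea can be sketched in a few lines. Assuming Pr(Ci = 1) = αi · r, where r is the relevance of a fixed result, the ratio of that result's CTR at position i to its CTR at position 1 gives αi / α1, because r cancels. The function names and CTR values below are hypothetical.

```python
def relative_position_bias(ctr_by_position):
    """ctr_by_position: {position: CTR of the same result shown at that position}.

    Under Pr(C_i = 1) = alpha_i * r, the relevance r cancels in the ratio
    CTR_i / CTR_1, leaving alpha_i / alpha_1.
    """
    base = ctr_by_position[1]
    return {pos: ctr / base for pos, ctr in sorted(ctr_by_position.items())}


def debiased_relevance(ctr, alpha):
    """Recover Pr(C = 1 | E = 1) from an observed CTR and a position bias."""
    return ctr / alpha


# Hypothetical CTRs of one ad observed at positions 1 to 3.
alphas = relative_position_bias({1: 0.20, 2: 0.10, 3: 0.05})
```

With α1 normalised to 1, dividing an observed CTR by the bias of the position where it was measured gives a position-independent relevance estimate.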
Using Prior Clicks
Pr(E5 | C1) = 0.3
Pr(E5 | C1,C3) = 0.5
Examination depends on prior clicks
• Cascade model
• Dependent click model (DCM)
• User browsing model (UBM) [Dupret & Piwowarski, SIGIR
• More general and more accurate than Cascade, DCM.
• Conditions Pr(examination) on closest prior click.
• Bayesian browsing model (BBM) [Liu et al, KDD 2009]
• Same user behavior model as UBM.
• Uses Bayesian paradigm for relevance.
User browsing model (UBM)
• Use position of closest prior click to predict Pr(examination).
Pr(Ei = 1 | C1:i-1) = αi βi,p(i)
p(i) = position of closest prior click
Pr(Ci = 1 | C1:i-1) = Pr(Ei = 1 | C1:i-1) Pr(Ci = 1 | Ei = 1)
Prior clicks don’t
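The UBM equations above can be evaluated directly for one session. This is a minimal sketch, assuming p(i) = 0 when there is no prior click and that α, β and the relevance Pr(Ci = 1 | Ei = 1) are already known; all parameter values and names are invented for illustration.

```python
def closest_prior_click(clicks, i):
    """Position (1-based) of the closest click above position i, or 0 if none."""
    for j in range(i - 1, 0, -1):
        if clicks[j - 1]:
            return j
    return 0


def ubm_click_probs(clicks, alpha, beta, relevance):
    """Pr(C_i = 1 | C_1:i-1) for each position of one session.

    clicks: observed 0/1 clicks by position (top first).
    alpha[i], beta[(i, p)]: examination parameters.
    relevance[i]: Pr(C_i = 1 | E_i = 1) of the result at position i.
    """
    probs = []
    for i in range(1, len(clicks) + 1):
        p = closest_prior_click(clicks, i)
        exam = alpha[i] * beta[(i, p)]      # Pr(E_i = 1 | C_1:i-1)
        probs.append(exam * relevance[i])   # times Pr(C_i = 1 | E_i = 1)
    return probs


# Hypothetical parameters for a three-position session with a click at position 1.
alpha = {1: 1.0, 2: 0.8, 3: 0.6}
beta = {(1, 0): 1.0, (2, 1): 0.5, (3, 1): 0.25}
relevance = {1: 0.5, 2: 0.5, 3: 0.5}
probs = ubm_click_probs([1, 0, 0], alpha, beta, relevance)
```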
Other Related Work
• Examination depends on prior clicks and prior relevance
• Click chain model (CCM)
• General click model (GCM)
• Post-click models
• Dynamic Bayesian model
• Session utility model
User Browsing in Sponsored
• Is user browsing in sponsored search similar to browsing in web search?
• Generally, the assumption in organic search is that users examine and click in a
linear top-to-bottom fashion.
• We observed that in sponsored search, where only a few results are
returned, a fair share (~30%) of users click out of order.
• Users behaving in a non-linear fashion is a strong signal, which may contain
• Combining position and temporal behavior of user.
The statistic x counted here is the difference between the positions
of temporally consecutive clicks:
if the user clicks on ml1 and then ml2, then x = -1;
if the user clicks on ml2 and then ml1, then x = 1; and so on.
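The statistic can be computed from click logs as follows. This sketch assumes each session is recorded as the list of clicked ad positions in click order (ml1 as 1, ml2 as 2, and so on); the session data is made up.

```python
from collections import Counter


def click_order_differences(click_positions):
    """click_positions: ad positions in the order they were clicked, e.g. [1, 2].

    Returns x = (earlier click position) - (later click position) for each
    consecutive pair, so ml1 then ml2 gives -1 and ml2 then ml1 gives +1.
    """
    return [a - b for a, b in zip(click_positions, click_positions[1:])]


# Hypothetical sessions; a single-click session contributes no value of x.
sessions = [[1, 2], [2, 1], [1, 3, 2], [4]]
histogram = Counter(x for s in sessions for x in click_order_differences(s))

# Positive x means the user moved up the list, i.e. clicked out of order.
out_of_order = sum(n for x, n in histogram.items() if x > 0)
```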
A New Model
• Allow users to move in a non-linear fashion
• Also, incorporate the notion of externalities, i.e. perceived
relevance changes with other clicks.
For learning our parameters, we
can use the EM algorithm.
(1) In the E step, we estimate our
hidden parameters by a
(2) In the M step, we have closed-
form solutions to maximize the
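The slides do not spell out the new model's equations, so the EM sketch below is written for the simpler position model from earlier, Pr(C = 1) = α[pos] · r[ad], purely to show the shape of the two steps. Examination and relevance are the hidden variables; the data, names and starting values are assumptions.

```python
import math
from collections import defaultdict


def log_likelihood(data, alpha, rel):
    """Log-likelihood of (ad, position, click) records under Pr(C=1) = alpha * rel."""
    total = 0.0
    for ad, pos, click in data:
        p = alpha[pos] * rel[ad]
        total += math.log(p if click else 1.0 - p)
    return total


def em_step(data, alpha, rel):
    """One EM iteration for the position model (not the slides' new model)."""
    exam_sum, exam_n = defaultdict(float), defaultdict(int)
    rel_sum, rel_n = defaultdict(float), defaultdict(int)
    for ad, pos, click in data:
        a, r = alpha[pos], rel[ad]
        if click:
            # A click implies the ad was examined and found relevant.
            post_exam, post_rel = 1.0, 1.0
        else:
            # E step: posteriors of the hidden variables given no click.
            post_exam = a * (1.0 - r) / (1.0 - a * r)
            post_rel = r * (1.0 - a) / (1.0 - a * r)
        exam_sum[pos] += post_exam
        exam_n[pos] += 1
        rel_sum[ad] += post_rel
        rel_n[ad] += 1
    # M step: the closed-form updates are the averaged posteriors.
    new_alpha = {k: exam_sum[k] / exam_n[k] for k in exam_sum}
    new_rel = {k: rel_sum[k] / rel_n[k] for k in rel_sum}
    return new_alpha, new_rel


# Hypothetical log: (ad, position, clicked) with 10 impressions per cell.
data = ([("ad_a", 1, 1)] * 6 + [("ad_a", 1, 0)] * 4
        + [("ad_a", 2, 1)] * 3 + [("ad_a", 2, 0)] * 7
        + [("ad_b", 1, 1)] * 2 + [("ad_b", 1, 0)] * 8
        + [("ad_b", 2, 1)] * 1 + [("ad_b", 2, 0)] * 9)

alpha = {1: 0.5, 2: 0.5}
rel = {"ad_a": 0.5, "ad_b": 0.5}
history = [log_likelihood(data, alpha, rel)]
for _ in range(50):
    alpha, rel = em_step(data, alpha, rel)
    history.append(log_likelihood(data, alpha, rel))
```

Tracking the log-likelihood in `history` is a cheap sanity check: EM should never decrease it from one iteration to the next. Note that only the products α · r are identified by click data in this model, so the individual values depend on the starting point.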