Slides (PPT)

Transcript Slides (PPT)

Catching the Drift:
Learning Broad Matches
from Clickthrough Data
Sonal Gupta, Misha Bilenko, Matt Richardson
Introduction

Keyword-based online advertising: bidded keywords are extracted from context


Context: query (search ads) or page (content ads)
Broad matching: expanding keywords via keyword-to-keywords mapping
 Example: electric cars  tesla, hybrids, toyota prius, golf carts
kw1
kw11
kw12
Ad
kw1
Query
Broad

Selection
kw2
Keyword
or
Match

and

Extraction
Web Page
Expansion
kwn
Ranking
kwn
kwn1
kwn2
Selected Ads

Expanded Keywords
Extracted Keywords

Ad1
Ad2

Adk
Broad matching benefits advertisers (increased reach, less campaign tuning),
users (more relevant ads), ad platform (higher monetization)
Identifying Broad Matches

Good keyword mappings retrieve relevant ads that users click

How to measure what is relevant and likely to be clicked?


Human judgments: expensive, hard to scale
Past user clicks: provide data for kw → kw’ when user was shown
ad(kw') in context of kw


Highly available, less trustworthy
What similarity functions may indicate relevance of kw → kw' ?



Syntactic (edit distance, TF-IDF cosine, string kernels, …)
Co-occurrence (in documents, query sessions, bid campaigns, …)
Expanded representation (search result snippets, category bags, …)
Approach

Task: train a learner to estimate p(click | kw → kw' ) for any kw → kw'

Data


<kw, ad(kw' ), click> triples from clickthrough logs, where kw → kw' was
suggested by previous broad match mappings
Features

Convert each pair to a feature vector capturing similarities etc.
ϕ1(kw, kw' )
(kw → kw') →
…
where ϕi(kw, kw' ) can be any
function of kw, kw' or both
ϕn(kw, kw' )


For each triple <kw, ad(kw'), click>, create an instance: (ϕ(kw, kw' ), click)
Learner: max-margin averaged perceptron (strong theory, very efficient)
Example: Creating an Instance

Historical broad match clickthrough data: kw  kw'  ad(kw' ) click event
digital slr
seattle baseball

 canon rebel
 Canon Rebel Kit for $499  click
 mariners tickets  Mariners season tickets
 no click
Feature functions
Original kw

Broad match kw'
ϕ1
ϕ2
ϕ3
digital slr
canon rebel
0.78
0.001
0.9
seattle baseball
mariners tickets
0.05
0.02
0.2
Instances
[0.78 0.001 0.9], 1
[0.05 0.02 0.2], 0
5
Experiments

Data

2 months of previous broad match ads from Microsoft Content Ads logs



1 month for training, 1 month for testing
68 features (syntactic, co-occurrence based, etc.); greedy feature selection
Metrics

LogLoss:

LogLoss Lift: difference between obtained LogLoss and an oracle that has
access to empirical p(click | kw → kw' ) in test set.

CTR and revenue improvements in live test with users
Results
LogL Lift
-LogLoss
0.14
0.67
0.66
0.1224
0.6572
0.12
0.65
0.1
0.64
0.63
0.08
0.0685
0.62
0.06
0.61
0.6033
0.6
0.04
0.59
0.02
0.58
0
0.57
Prior
Feature
Selection+Online
Prior
Feature
Selection+Online
Live Test Results

Use CTR prediction to maximize expected revenue


Re-rank mappings to incorporate revenue
+18% revenue, -2% CTR
Revenue
CTR
120.0%
120.0%
100.0%
100.0%
80.0%
80.0%
60.0%
60.0%
40.0%
40.0%
20.0%
20.0%
0.0%
0.0%
BM1
BM2
BM6
Our
BM1
BM2
BM6
Our
Online Learning with Amnesia

Advertisers, campaigns, bidded keywords and delivery contexts
change very rapidly: high concept drift

Recent data is more informative


Goal: utilize older data while capturing changes in distributions
Averaged Perceptron doesn’t capture drift
w

avg
t
1
= (wt  wt 1  wt  2  ...)
t
Solution: Amnesiac Averaged Perceptron

Exponential weight decay when averaging hypotheses
wtavg =  (wt  (1   )wt 1  (1   )2 wt 2  ...)
Results
Model
-LogLoss
LogL Lift
Feature Selection + Online Learning +
Amnesia
0.5709
0.0361
Online+Feature Selection, No Amnesia
0.6033
0.0685
Online+Amnesia, No Feature Selection
0.6563
0.1215
Feature Selection+Amnesia, Weekly Batch
0.5948
0.0600
Prior
0.6572
0.1224
Contributions and Conclusions
learning broad matches from implicit feedback

Combining arbitrary similarity measures/features

Using clickthrough logs as implicit feedback

Amnesiac Averaged Perceptron
 Exponentially weighted averaging: distant examples “fade out”

Online learning adapts to market dynamics
Thank You!
Features and Feature Selection

Co-occurrence feature examples:


User search sessions: keywords searched within 10 mins
Advertiser campaigns: keywords co-bidded by the same advertiser

Past clickthrough rates of original and broad matched keywords
Various syntactic similarities
Various existing broad matching lists
and so on…

Feature Selection:





13
A total of 68 features
Greedy feature selection
Additional Information

Estimation of expected value of click over all the ads shown for a
broad match mapping E(p(click(ad(kw))|q))

Query Expansion vs. Broad Matching



Our broad matching algorithm can be extended for query expansion
But, broad matching is for a fixed set of bidded keywords
Forgetron vs. Amesiac Averaged Perceptron


Forgetron maintains a set of budget support vectors: stores examples
explicitly and does not take into account all the data
AAP: weighted average over all the examples, no need to store
examples explicitly
Results
Model
-LogLoss
LogL Lift
Prior
0.6572
0.1224
Feature Selection + Online Learning +
Amnesia
0.5709
0.0361
Online+Amnesia, No Feature Selection
0.6563
0.1215
Feature Selection+Amnesia, Weekly Batch
0.5948
0.0600
Online+Feature Selection, No Amnesia
0.6033
0.0685
Amnesiac Averaged Perceptron
16

Slides (PPT)

Transcript Slides (PPT)

Directory