
Bandits for Taxonomies: A Model-based Approach
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski
The Content Match Problem
[Diagram: Advertisers place ads into an Ads DB; the ad server shows them on webpages]
• Ad impression: showing an ad to a user
• Ad click: a user click leads to revenue for the ad server and the content provider
• The Content Match Problem: match ads to pages to maximize clicks
The Content Match Problem
Maximizing the number of clicks means:
• for each webpage, find the ad with the best Click-Through Rate (CTR),
• but without wasting too many impressions in learning this.
Online Learning
Maximizing clicks requires:
• Dimensionality reduction
• Exploration
• Exploitation
Exploration and exploitation must occur together.
Online learning is needed, since the system must continuously generate revenue.
Taxonomies for dimensionality reduction
[Diagram: a taxonomy tree with Root above classes such as Apparel, Computers, and Travel; pages and ads are mapped to taxonomy nodes]
• Taxonomies already exist
• They are actively maintained
• Existing classifiers map pages and ads to taxonomy nodes
Learn the matching from page nodes to ad nodes → dimensionality reduction
Online Learning
Maximizing clicks requires:
• Dimensionality reduction → Taxonomy
• Exploration
• Exploitation
Can taxonomies help in explore/exploit as well?
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions
Background: Bandits
[Diagram: bandit "arms" with unknown payoff probabilities p1, p2, p3]
Pull arms sequentially so as to maximize the total expected reward:
• Estimate the payoff probabilities p_i
• Bias the estimation process towards better arms
Background: Bandits
[Diagram: each webpage is one bandit, and the bandit "arms" are the ads; ~10^9 pages, ~10^6 ads]
Background: Bandits
[Diagram: a webpages × ads matrix]
Content Match = a matrix:
• Each row is a bandit
• Each cell has an unknown CTR
Background: Bandits
[Diagram: arms ranked by priority 1, 2, 3]
Bandit policy:
1. Assign a priority to each arm
2. Allocation: "pull" the arm with the maximum priority, and observe the reward
3. Estimation: update the priorities
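To make this loop concrete, here is a generic UCB1-style sketch in Python. It is an illustrative policy of the same priority/allocation/estimation shape, not the specific policy proposed in this talk; the function and variable names are our own.

```python
import math
import random

def ucb1(pull, n_arms: int, n_rounds: int) -> list[float]:
    """Generic priority -> allocation -> estimation bandit loop (UCB1)."""
    pulls = [0] * n_arms    # impressions per arm
    clicks = [0] * n_arms   # observed rewards per arm
    for t in range(1, n_rounds + 1):
        # 1. Priority: estimated payoff plus an exploration bonus;
        #    untried arms get infinite priority so each is pulled once.
        def priority(i: int) -> float:
            if pulls[i] == 0:
                return float("inf")
            return clicks[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])
        # 2. Allocation: "pull" the arm with maximum priority, observe the reward.
        arm = max(range(n_arms), key=priority)
        reward = pull(arm)  # 0/1 reward, e.g. click / no click
        # 3. Estimation: update the statistics that drive the priorities.
        pulls[arm] += 1
        clicks[arm] += reward
    return [clicks[i] / max(pulls[i], 1) for i in range(n_arms)]

# Toy usage: three ads with unknown CTRs; pulls concentrate on the best arm.
true_ctr = [0.02, 0.05, 0.01]
print(ucb1(lambda i: int(random.random() < true_ctr[i]), n_arms=3, n_rounds=10_000))
```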
Background: Bandits
Why not simply apply a bandit policy directly to our problem?
• Convergence is too slow: ~10^9 bandits, with ~10^6 arms per bandit
• Additional structure is available that can help → taxonomies
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions
Multi-level Policy
[Diagram: a matrix of webpage classes × ad classes]
Consider only two levels.
Multi-level Policy
[Diagram: ad parent classes (Apparel, Computers, Travel) above their ad child classes; a block of cells under one parent-class pair forms one bandit]
Consider only two levels.
Key idea: CTRs in a block are homogeneous.
Multi-level Policy
CTRs in a block are homogeneous:
• Used in allocation (picking an ad for each new page)
• Used in estimation (updating priorities after each observation)
Multi-level Policy (Allocation)
[Diagram: the page classifier maps a webpage into the page taxonomy; bandits then choose along the ad taxonomy (classes such as Apparel, Computers, Travel)]
• Classify the webpage → page class, parent page class
• Run a bandit on the ad parent classes → pick one ad parent class
• Run a bandit among the cells of that block → pick one ad class
• In general, continue from root to leaf → final ad
Multi-level Policy (Allocation)
Bandits at higher levels:
• use aggregated information
• have fewer bandit arms
→ Quickly figure out the best ad parent class
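A minimal sketch of this allocation step, assuming a toy two-level ad taxonomy and reusing a UCB-style priority; the taxonomy, class names, and statistics layout here are hypothetical, not the paper's exact policy:

```python
import math
from collections import defaultdict

# Hypothetical two-level ad taxonomy (illustrative class names only).
AD_TAXONOMY = {
    "Apparel": ["Shoes", "Shirts"],
    "Computers": ["Laptops", "Printers"],
    "Travel": ["Flights", "Hotels"],
}

def pick_max_priority(arms, stats, t):
    """UCB-style pick: observed CTR plus a confidence bonus; untried arms first."""
    def priority(arm):
        clicks, pulls = stats[arm]
        if pulls == 0:
            return float("inf")
        return clicks / pulls + math.sqrt(2 * math.log(t) / pulls)
    return max(arms, key=priority)

def allocate(page_class, page_parent_class, t, parent_stats, cell_stats):
    # 1. The page classifier has already produced page_class and page_parent_class.
    # 2. Bandit over ad parent classes -> pick one ad parent class.
    ad_parent = pick_max_priority(list(AD_TAXONOMY), parent_stats[page_parent_class], t)
    # 3. Bandit among the cells of that block -> pick one ad class.
    ad_class = pick_max_priority(AD_TAXONOMY[ad_parent], cell_stats[page_class], t)
    # In general, continue from root to leaf -> final ad.
    return ad_parent, ad_class

# Statistics per page (parent) class: arm -> (clicks, pulls).
parent_stats = defaultdict(lambda: defaultdict(lambda: (0, 0)))
cell_stats = defaultdict(lambda: defaultdict(lambda: (0, 0)))
print(allocate("Sneakers", "Sports", t=1, parent_stats=parent_stats, cell_stats=cell_stats))
```

Because the parent-level bandit has only a handful of arms and sees aggregated data, its statistics accumulate quickly, which is why the higher levels converge fast.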
Multi-level Policy (Estimation)
CTRs in a block are homogeneous:
• Observations from one cell also give information about others in the block
• How can we model this dependence?
Multi-level Policy (Estimation)
Shrinkage Model:
    S_cell | CTR_cell ~ Bin(N_cell, CTR_cell)
    CTR_cell ~ Beta(Params_block)
where S_cell is the # clicks in the cell and N_cell is the # impressions in the cell.
All cells in a block come from the same distribution.
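As a concrete illustration (our own sketch, with hypothetical numbers), taking Params_block to be a Beta(a_block, b_block) pair:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical block-level Beta parameters (Params_block);
# the implied block CTR is a_block / (a_block + b_block) = 0.02.
a_block, b_block = 2.0, 98.0

# CTR_cell ~ Beta(Params_block): all cells in a block share one distribution.
n_cells = 5
ctr_cell = rng.beta(a_block, b_block, size=n_cells)

# S_cell | CTR_cell ~ Bin(N_cell, CTR_cell): clicks given impressions.
N_cell = rng.integers(100, 1000, size=n_cells)  # impressions per cell
S_cell = rng.binomial(N_cell, ctr_cell)         # clicks per cell

for n, s, p in zip(N_cell, S_cell, ctr_cell):
    print(f"impressions={n:4d}  clicks={s:3d}  true CTR={p:.4f}")
```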
Multi-level Policy (Estimation)
Intuitively, this leads to shrinkage of cell CTRs towards block CTRs:
    E[CTR] = α · Prior_block + (1 − α) · S_cell / N_cell
where E[CTR] is the estimated CTR, Prior_block is the Beta prior (the "block CTR"), and S_cell / N_cell is the observed CTR.
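With a Beta(a_block, b_block) prior, the posterior mean has exactly this convex-combination form, with α = (a_block + b_block) / (a_block + b_block + N_cell). A small sketch with hypothetical numbers:

```python
def shrunk_ctr(s_cell: int, n_cell: int, a_block: float, b_block: float) -> float:
    """Posterior mean E[CTR | S_cell, N_cell] under a Beta(a_block, b_block) prior.

    (a_block + s_cell) / (a_block + b_block + n_cell) equals
    alpha * prior_mean + (1 - alpha) * observed_ctr
    with alpha = (a_block + b_block) / (a_block + b_block + n_cell),
    so sparsely observed cells are pulled strongly towards the block CTR.
    """
    return (a_block + s_cell) / (a_block + b_block + n_cell)

# Block CTR (prior mean) = 2 / (2 + 98) = 0.02.
# A cell with few impressions stays close to the block CTR...
print(shrunk_ctr(s_cell=1, n_cell=10, a_block=2.0, b_block=98.0))      # ~0.027
# ...while a well-observed cell is dominated by its own data (0.10).
print(shrunk_ctr(s_cell=100, n_cell=1000, a_block=2.0, b_block=98.0))  # ~0.093
```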
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions
Experiments
[Diagram: taxonomy structure; Root at depth 0, 20 nodes at depth 1, 221 nodes at depth 2, …, ~7000 leaves at depth 7]
We use the two levels at depths 1 and 2.
Experiments
• Data collected over a 1-day period
• Collected from only one server, under some other ad-matching rules (not our bandit)
• ~229M impressions
• CTR values have been linearly transformed for purposes of confidentiality
Experiments (Multi-level Policy)
[Plot: #clicks vs. number of pulls]
Multi-level gives a much higher #clicks.
Experiments (Multi-level Policy)
[Plot: mean-squared error vs. number of pulls]
Multi-level gives a much better mean-squared error → it has learnt more from its explorations.
Experiments (Shrinkage)
[Plots: #clicks and mean-squared error vs. number of pulls, with and without shrinkage]
Shrinkage → improved mean-squared error, but no gain in #clicks.
Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions
Related Work
• Typical multi-armed bandit problems
  - Do not consider dependencies
  - Very few arms
• Bandits with side information
  - Cannot handle dependencies among ads
• General MDP solvers
  - Do not use the structure of the bandit problem
  - Emphasis is on learning the transition matrix, which is random in our problem
Conclusions
• Taxonomies exist for many datasets
• They can be used for:
  - Dimensionality reduction
  - A multi-level bandit policy → higher #clicks
  - Better estimation via shrinkage models → better MSE