Bandits for Taxonomies: A Model-based Approach
Sandeep Pandey
Deepak Agarwal
Deepayan Chakrabarti
Vanja Josifovski
The Content Match Problem
[Diagram: advertisers place ads into an ads DB; the ad server shows an ad on a webpage]
Ad impression: Showing an ad to a user
The Content Match Problem
[Diagram: the user clicks the ad shown on the webpage]
Ad click: user click leads to revenue for ad server and content provider
The Content Match Problem
[Diagram: advertisers, ads DB, webpages]
The Content Match Problem: match ads to pages to maximize clicks
The Content Match Problem
Maximizing the number of clicks means:
For each webpage, find the ad with the best Click-Through Rate (CTR),
but without wasting too many impressions in learning this.
Online Learning
Maximizing clicks requires:
• Dimensionality reduction
• Exploration
• Exploitation
Exploration and exploitation must occur together
Online learning is needed, since the system must continuously generate revenue
Taxonomies for dimensionality reduction
[Diagram: taxonomy tree with Root → Apparel, Computers, Travel, …; pages and ads are mapped to taxonomy nodes]
• Taxonomies already exist
• Actively maintained
• Existing classifiers map pages and ads to taxonomy nodes
Learn the matching from page nodes to ad nodes → dimensionality reduction
Online Learning
Maximizing clicks requires:
• Dimensionality reduction → Taxonomy
• Exploration
• Exploitation
Can taxonomies help in explore/exploit as well?
Outline
Problem
Background: Multi-armed bandits
Proposed Multi-level Policy
Experiments
Related Work
Conclusions
Background: Bandits
Bandit "arms" with unknown payoff probabilities p1, p2, p3
Pull arms sequentially so as to maximize the total expected reward
• Estimate the payoff probabilities p_i
• Bias the estimation process towards better arms
Background: Bandits
[Diagram: one bandit per webpage (Webpage 1, Webpage 2, Webpage 3, …); the bandit "arms" = ads]
~10^9 pages, ~10^6 ads
Background: Bandits
[Diagram: matrix of webpages × ads; one row = one bandit, one cell = one unknown CTR]
Content Match = a matrix
• Each row is a bandit
• Each cell has an unknown CTR
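A minimal sketch of this matrix-of-bandits view (the class and field names below are illustrative assumptions, not from the talk): one bandit object per webpage row, with per-ad click and impression counts from which each cell's CTR is estimated.

```python
from collections import defaultdict

class PageBandit:
    """One row of the content-match matrix: a bandit whose arms are ads."""
    def __init__(self):
        self.clicks = defaultdict(int)       # ad_id -> clicks observed so far
        self.impressions = defaultdict(int)  # ad_id -> impressions observed so far

    def estimated_ctr(self, ad_id):
        """Empirical estimate of one cell's unknown CTR."""
        n = self.impressions[ad_id]
        return self.clicks[ad_id] / n if n > 0 else 0.0

# The content-match "matrix": one bandit per webpage (row), one cell per (page, ad) pair.
bandits = defaultdict(PageBandit)  # page_id -> PageBandit
```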
Background: Bandits
Bandit Policy:
Allocation:
1. Assign a priority to each arm
2. "Pull" the arm with max priority, and observe the reward
Estimation:
3. Update the priorities
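The slides leave the priority rule abstract; the sketch below instantiates the three steps with a UCB1-style priority purely as an illustration (that specific rule is an assumption, not necessarily the policy evaluated in the talk).

```python
import math
import random

def run_bandit(true_ctrs, num_pulls):
    """Generic bandit loop: assign priorities, pull the max-priority arm,
    observe a reward, update. The UCB1 priority here is an illustrative choice."""
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms
    clicks = [0] * n_arms
    total_clicks = 0

    for t in range(1, num_pulls + 1):
        # 1. Assign a priority to each arm (unpulled arms get top priority).
        def priority(arm):
            if pulls[arm] == 0:
                return float("inf")
            mean = clicks[arm] / pulls[arm]
            return mean + math.sqrt(2.0 * math.log(t) / pulls[arm])

        # 2. "Pull" the arm with max priority and observe the reward (click or not).
        best = max(range(n_arms), key=priority)
        reward = 1 if random.random() < true_ctrs[best] else 0

        # 3. Update the statistics that determine the priorities.
        pulls[best] += 1
        clicks[best] += reward
        total_clicks += reward

    return total_clicks
```

For example, `run_bandit([0.01, 0.03, 0.02], 100000)` simulates one row of the content-match matrix with three ads.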
Background: Bandits
Why not simply apply a bandit policy directly to our problem?
• Convergence is too slow: ~10^9 bandits, with ~10^6 arms per bandit
• Additional structure is available that can help: taxonomies
Outline
Problem
Background: Multi-armed bandits
Proposed Multi-level Policy
Experiments
Related Work
Conclusions
Multi-level Policy
[Diagram: webpages grouped into page classes, ads grouped into ad classes]
Consider only two levels
Multi-level Policy
[Diagram: matrix of page classes × ad classes; parent classes (Apparel, Computers, Travel) with ad child classes under each ad parent class; cells are grouped into blocks, and a block is one bandit]
Consider only two levels
Multi-level Policy
[Diagram: same matrix of parent and child classes, with a block of cells forming one bandit]
Key idea: CTRs in a block are homogeneous
Multi-level Policy
CTRs in a block are homogeneous
• Used in allocation (picking ad for each new page)
• Used in estimation (updating priorities after each observation)
Multi-level Policy (Allocation)
[Diagram: the page classifier maps the incoming webpage to a row of the matrix; rows and columns are the parent classes A(pparel), C(omputers), T(ravel)]
Classify webpage → page class, parent page class
Run bandit on ad parent classes → pick one ad parent class
Multi-level Policy (Allocation)
[Diagram: same matrix; drill down within the chosen ad parent class to pick an ad]
Classify webpage → page class, parent page class
Run bandit on ad parent classes → pick one ad parent class
Run bandit among cells → pick one ad class
In general, continue from root to leaf → final ad
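A sketch of this two-level allocation, under assumed interfaces (the helper names `page_classifier`, `parent_bandit_for`, `cell_bandit_for`, and the `pick()` method are hypothetical; the slides do not fix an API):

```python
def allocate_ad_class(page, page_classifier, parent_bandit_for, cell_bandit_for):
    """Two-level allocation sketch (interfaces are assumptions).

    page_classifier(page) -> (page_class, parent_page_class)
    parent_bandit_for(parent_page_class) -> bandit over ad parent classes for that row
    cell_bandit_for(page_class, ad_parent_class) -> bandit over the ad (child)
        classes in the corresponding block
    Each bandit exposes pick(), returning its current max-priority arm.
    """
    # Classify webpage -> page class, parent page class.
    page_class, parent_page_class = page_classifier(page)

    # Run bandit on ad parent classes -> pick one ad parent class.
    ad_parent_class = parent_bandit_for(parent_page_class).pick()

    # Run bandit among cells -> pick one ad (child) class.
    ad_class = cell_bandit_for(page_class, ad_parent_class).pick()

    # In general this continues from root to leaf -> final ad; two levels shown here.
    return ad_class
```

Each of the per-row and per-block bandits could be any standard policy, e.g. the priority-based loop sketched earlier.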
Multi-level Policy (Allocation)
[Diagram: same matrix]
Bandits at higher levels
• use aggregated information
• have fewer bandit arms
→ Quickly figure out the best ad parent class
Multi-level Policy
CTRs in a block are homogeneous
• Used in allocation (picking ad for each new page)
• Used in estimation (updating priorities after each observation)
Multi-level Policy (Estimation)
CTRs in a block are homogeneous
Observations from one cell also give information about others in the block
How can we model this dependence?
Multi-level Policy (Estimation)
Shrinkage Model
S_cell | CTR_cell ~ Bin(N_cell, CTR_cell)
CTR_cell ~ Beta(Params_block)
(S_cell = # clicks in the cell, N_cell = # impressions in the cell)
All cells in a block come from the same distribution
Multi-level Policy (Estimation)
Intuitively, this leads to shrinkage of cell CTRs towards block CTRs:
E[CTR_cell] = α · Prior_block + (1 − α) · S_cell / N_cell
(estimated CTR = weighted combination of the Beta prior, the "block CTR", and the observed CTR S_cell / N_cell)
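A minimal sketch of this posterior-mean shrinkage for one cell, assuming the block-level Beta parameters are already fitted (how they are estimated is not shown on this slide); the weight called α above corresponds to `w` below.

```python
def shrunk_ctr(s_cell, n_cell, alpha_block, beta_block):
    """Posterior mean of CTR_cell under S_cell ~ Bin(N_cell, CTR_cell),
    CTR_cell ~ Beta(alpha_block, beta_block). Block parameters are assumed given."""
    prior_mean = alpha_block / (alpha_block + beta_block)      # the "block CTR"
    observed_ctr = s_cell / n_cell if n_cell > 0 else 0.0      # S_cell / N_cell
    # Weight on the block prior; it shrinks as the cell accumulates impressions.
    w = (alpha_block + beta_block) / (alpha_block + beta_block + n_cell)
    return w * prior_mean + (1 - w) * observed_ctr
```

With few impressions the estimate stays close to the block CTR; as N_cell grows it approaches the observed cell CTR.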
Outline
Problem
Background: Multi-armed bandits
Proposed Multi-level Policy
Experiments
Related Work
Conclusions
Experiments
Taxonomy structure:
[Diagram: Root at depth 0; 20 nodes at depth 1; 221 nodes at depth 2; …; ~7000 leaves at depth 7]
We use these two levels (depth 1 and depth 2)
Experiments
Data collected over a 1-day period
Collected from only one server, under some
other ad-matching rules (not our bandit)
~229M impressions
CTR values have been linearly transformed
for purposes of confidentiality
Experiments (Multi-level Policy)
[Plot: #clicks vs. number of pulls]
Multi-level gives much higher #clicks
Experiments (Multi-level Policy)
[Plot: Mean-Squared Error vs. number of pulls]
Multi-level gives much better Mean-Squared Error → it has learnt more from its explorations
Experiments (Shrinkage)
[Plots: #clicks and Mean-Squared Error vs. number of pulls, with and without shrinkage]
Shrinkage improved Mean-Squared Error, but no gain in #clicks
Outline
Problem
Background: Multi-armed bandits
Proposed Multi-level Policy
Experiments
Related Work
Conclusions
Related Work
Typical multi-armed bandit problems
• Do not consider dependencies
• Very few arms
Bandits with side information
• Cannot handle dependencies among ads
General MDP solvers
• Do not use the structure of the bandit problem
• Emphasis on learning the transition matrix, which is random in our problem
Conclusions
Taxonomies exist for many datasets
They can be used for:
• Dimensionality reduction
• Multi-level bandit policy → higher #clicks
• Better estimation via shrinkage models → better MSE