Bandits for Taxonomies: A Model-based Approach

Sandeep Pandey Deepak Agarwal Deepayan Chakrabarti Vanja Josifovski

Ad impression: Showing an ad to a user

Ad click: user click leads to revenue for ad server and content provider

The Content Match Problem: Match ads to pages to maximize clicks

Maximizing the number of clicks means:
• For each webpage, find the ad with the best Click-Through Rate (CTR),
• but without wasting too many impressions in learning this.

Online Learning

Maximizing clicks requires:
• Dimensionality reduction
• Exploration and exploitation (both must occur together)
Online learning is needed, since the system must continuously generate revenue.

Taxonomies for dimensionality reduction

(Figure: taxonomy tree with Root at the top and children such as Apparel, Computers, Travel; each page or ad is mapped to a node.)
• Already exist
• Actively maintained
• Existing classifiers to map pages and ads to taxonomy nodes
Learn the matching from page nodes to ad nodes → dimensionality reduction

• Dimensionality reduction → Taxonomy
• Exploration, Exploitation → ?
Can taxonomies help in explore/exploit as well?


Outline
• Problem
• Background: Multi-armed bandits
• Proposed Multi-level Policy
• Experiments
• Related Work
• Conclusions

Background: Bandits

Bandit “arms” with unknown payoff probabilities $p_1$, $p_2$, $p_3$
Pull arms sequentially so as to maximize the total expected reward
• Estimate payoff probabilities $p_i$
• Bias the estimation process towards better arms

Bandit “arms” = ads
~$10^9$ pages, ~$10^6$ ads

Content Match = A matrix
• Each row is a bandit
• Each cell has an unknown CTR
(Figure: the matrix with ads as columns; one row is marked “One bandit” and one cell is marked “Unknown CTR”.)

Bandit Policy
(Figure: three arms labeled Priority 1, Priority 2, Priority 3.)
1. Assign priority to each arm
2. “Pull” arm with max priority, and observe reward (allocation)
3. Update priorities (estimation)
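The three steps of a bandit policy can be sketched in a few lines of Python. The slides do not say which priority rule is used, so this sketch assumes the standard UCB1 index (observed mean plus an exploration bonus); the function names `ucb1_priority` and `run_bandit` and the simulated payoff probabilities are hypothetical.

```python
import math
import random


def ucb1_priority(clicks, pulls, total_pulls):
    """UCB1 index (an assumed priority rule, not taken from the slides)."""
    if pulls == 0:
        return float("inf")  # unpulled arms get top priority, so each is tried once
    return clicks / pulls + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def run_bandit(true_ctrs, n_rounds, seed=0):
    """Simulate one bandit whose arms pay 1 with the given hidden probabilities."""
    rng = random.Random(seed)
    clicks = [0] * len(true_ctrs)
    pulls = [0] * len(true_ctrs)
    for t in range(1, n_rounds + 1):
        # Step 1: assign a priority to each arm.
        priorities = [ucb1_priority(c, n, t) for c, n in zip(clicks, pulls)]
        # Step 2: pull the arm with max priority and observe the reward.
        arm = max(range(len(true_ctrs)), key=priorities.__getitem__)
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        # Step 3: update the counts that the priorities are computed from.
        pulls[arm] += 1
        clicks[arm] += reward
    return clicks, pulls


if __name__ == "__main__":
    clicks, pulls = run_bandit([0.1, 0.5, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Over time the pull counts concentrate on the arm with the highest payoff probability, which is the "bias towards better arms" that a bandit policy is meant to produce.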

Why not simply apply a bandit policy directly to our problem?

• Convergence is too slow: ~$10^9$ bandits, with ~$10^6$ arms per bandit
• Additional structure is available that can help → Taxonomies


Proposed Multi-level Policy

Multi-level Policy

(Figure: matrix of webpage classes against ad classes; ad child classes are grouped under ad parent classes such as Apparel, Computers, Travel. One row is marked “One bandit” and one group of cells is marked “Block”.)
• Consider only two levels
• Key idea: CTRs in a block are homogeneous
  • Used in allocation (picking ad for each new page)
  • Used in estimation (updating priorities after each observation)

Multi-level Policy (Allocation)



• Classify webpage (page classifier) → page class, parent page class
• Run bandit on ad parent classes → pick one ad parent class
• Run bandit among cells → pick one ad class
• In general, continue from root to leaf → final ad
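The allocation steps above can be sketched for a single page class as follows. This is a minimal sketch under stated assumptions: the priority rule is UCB1 (the slides do not name one), the parent-level statistics are taken to be the summed clicks and pulls of the cells in each block, and the class `TwoLevelPolicy`, the helper `simulate`, and the child class names used in the demo are all hypothetical.

```python
import math
import random


def ucb1(clicks, pulls, total):
    """UCB1 index (assumed priority rule)."""
    if pulls == 0:
        return float("inf")
    return clicks / pulls + math.sqrt(2.0 * math.log(total) / pulls)


class TwoLevelPolicy:
    """Hypothetical two-level allocator for one page class."""

    def __init__(self, taxonomy):
        # taxonomy: dict mapping ad parent class -> list of ad child classes
        self.taxonomy = taxonomy
        # Per-cell statistics: child class -> [clicks, pulls]
        self.stats = {c: [0, 0] for kids in taxonomy.values() for c in kids}

    def _parent_stats(self, parent):
        # Assumption: a parent's statistics are the sums over its block.
        clicks = sum(self.stats[c][0] for c in self.taxonomy[parent])
        pulls = sum(self.stats[c][1] for c in self.taxonomy[parent])
        return clicks, pulls

    def choose(self):
        total = sum(n for _, n in self.stats.values())
        # Level 1: bandit over ad parent classes (few arms, aggregated counts).
        parent = max(self.taxonomy,
                     key=lambda p: ucb1(*self._parent_stats(p), total))
        # Level 2: bandit among the cells of the chosen block.
        _, block_pulls = self._parent_stats(parent)
        child = max(self.taxonomy[parent],
                    key=lambda c: ucb1(*self.stats[c], block_pulls))
        return parent, child

    def update(self, child, clicked):
        self.stats[child][0] += int(clicked)
        self.stats[child][1] += 1


def simulate(policy, true_ctrs, n_rounds, seed=0):
    """Feed the policy simulated clicks drawn from hidden per-child CTRs."""
    rng = random.Random(seed)
    for _ in range(n_rounds):
        _, child = policy.choose()
        policy.update(child, rng.random() < true_ctrs[child])
    return policy


if __name__ == "__main__":
    taxonomy = {"Apparel": ["shoes", "shirts"],
                "Computers": ["laptops", "monitors"],
                "Travel": ["flights", "hotels"]}
    ctrs = {"shoes": 0.05, "shirts": 0.10, "laptops": 0.05,
            "monitors": 0.10, "flights": 0.60, "hotels": 0.80}
    policy = simulate(TwoLevelPolicy(taxonomy), ctrs, n_rounds=3000)
    for parent in taxonomy:
        print(parent, policy._parent_stats(parent))
```

With more than two levels, the same `choose` step would be repeated from the root down to a leaf.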

Multi-level Policy (Allocation)

Bandits at higher levels
• use aggregated information
• have fewer bandit arms
→ Quickly figure out the best ad parent class

CTRs in a block are homogeneous
• Observations from one cell also give information about others in the block
• How can we model this dependence?

Shrinkage Model
$S_{\text{cell}} \mid CTR_{\text{cell}} \sim \mathrm{Bin}(N_{\text{cell}}, CTR_{\text{cell}})$
$CTR_{\text{cell}} \sim \mathrm{Beta}(\mathrm{Params}_{\text{block}})$
where $S_{\text{cell}}$ = # clicks in cell and $N_{\text{cell}}$ = # impressions in cell.
All cells in a block come from the same distribution.

Intuitively, this leads to shrinkage of cell CTRs towards block CTRs:
$E[CTR] = \alpha \cdot \mathrm{Prior}_{\text{block}} + (1 - \alpha) \cdot S_{\text{cell}} / N_{\text{cell}}$
• $E[CTR]$: estimated CTR
• $\mathrm{Prior}_{\text{block}}$: Beta prior (“block CTR”)
• $S_{\text{cell}} / N_{\text{cell}}$: observed CTR
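The shrinkage estimate can be illustrated with the standard Beta-Binomial posterior mean. This sketch assumes the block prior is a Beta(a, b) distribution whose parameters are already known; how the slides' method fits those parameters is not shown here. Under that assumption the prior mean is a / (a + b) and the weight is α = (a + b) / (a + b + N), which reproduces the formula E[CTR] = α · Prior + (1 − α) · S / N. The function name `shrunk_ctr` and the example numbers are hypothetical.

```python
def shrunk_ctr(clicks, impressions, prior_a, prior_b):
    """Posterior mean of a cell CTR under a Beta(prior_a, prior_b) block prior.

    Assumes the block-level prior parameters are given.
    """
    prior_mean = prior_a / (prior_a + prior_b)  # the "block CTR"
    # Weight on the prior: shrinks toward 0 as the cell gathers impressions.
    alpha = (prior_a + prior_b) / (prior_a + prior_b + impressions)
    observed = clicks / impressions if impressions else 0.0
    return alpha * prior_mean + (1.0 - alpha) * observed


if __name__ == "__main__":
    # Block prior Beta(2, 18) has mean 0.1.
    print(shrunk_ctr(0, 0, 2.0, 18.0))         # no data: falls back to the block CTR
    print(shrunk_ctr(5, 10, 2.0, 18.0))        # little data: pulled towards the block CTR
    print(shrunk_ctr(5000, 10000, 2.0, 18.0))  # much data: close to the observed CTR
```

A cell with few impressions therefore borrows most of its estimate from its block, while a heavily observed cell is dominated by its own click counts.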


Experiments


Taxonomy structure
• Depth 0: Root
• Depth 1: 20 nodes
• Depth 2: 221 nodes
• Depth 7: ~7000 leaves
We use 2 of these levels (Depth 1 and Depth 2).

• Data collected over a 1-day period
• Collected from only one server, under some other ad-matching rules (not our bandit)
• ~229M impressions
• CTR values have been linearly transformed for purposes of confidentiality

(Figure: plot against number of pulls.) Multi-level gives much higher #clicks

(Figure: plot against number of pulls.) Multi-level gives much better Mean-Squared Error → it has learnt more from its explorations

(Figure: two panels, without shrinkage and with shrinkage, each plotted against number of pulls.) Shrinkage → improved Mean-Squared Error, but no gain in #clicks


Related Work

• Typical multi-armed bandit problems
  • Do not consider dependencies
  • Very few arms
• Bandits with side information
  • Cannot handle dependencies among ads
• General MDP solvers
  • Do not use the structure of the bandit problem
  • Emphasis on learning the transition matrix, which is random in our problem.

Conclusions

• Taxonomies exist for many datasets
• They can be used for
  • Dimensionality reduction
  • Multi-level bandit policy → higher #clicks
  • Better estimation via shrinkage models → better MSE