Handling Advertisements of Unknown Quality in Search Advertising Sandeep Pandey Christopher Olston (CMU and Yahoo! Research)


Handling Advertisements of Unknown Quality in Search Advertising

Sandeep Pandey Christopher Olston (CMU and Yahoo! Research)

Sponsored Search

How does it work?

 Search engine displays ads next to search results
 Advertisers pay the search engine per click

Who benefits from it?

 Main source of funding for search engines
 Information flow from advertisers to users

Sponsored Search

[Figure: results page showing search query results alongside sponsored search results]

 Click-through rate (CTR): given an ad and a query, CTR = probability that the ad receives a click
 Optimal policy to maximize the search engine's revenue: display the ads with the highest (CTR x bid) value
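
As a toy illustration of this ranking rule (all CTRs and bids below are invented), the engine sorts candidate ads by CTR x bid and displays the top C:

```python
# Toy example of ranking ads by expected revenue per impression (CTR x bid).
# All CTR and bid values are invented for illustration.
ads = [
    {"name": "ad1", "ctr": 0.05, "bid": 1.00},  # expected revenue 0.050
    {"name": "ad2", "ctr": 0.02, "bid": 3.00},  # expected revenue 0.060
    {"name": "ad3", "ctr": 0.10, "bid": 0.40},  # expected revenue 0.040
]
C = 2  # number of ad slots

ranked = sorted(ads, key=lambda a: a["ctr"] * a["bid"], reverse=True)
shown = [a["name"] for a in ranked[:C]]  # -> ["ad2", "ad1"]
```

Note that the highest bid (ad2) and the highest CTR (ad3) need not coincide; the product decides.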

Challenges in Sponsored Search

 Problem: CTRs are initially unknown, and estimating CTRs requires going around the circle:
  show ads -> record clicks, earn revenue -> refine CTR estimates -> show ads ...
 Exploration/Exploitation tradeoff:
  explore ads to estimate CTRs
  exploit known high-CTR ads to maximize revenue

The Advertisement Problem

 Problem:
  Advertiser A_i submits ad a_{i,j} for query phrase Q_j, bidding b_{i,j} per click
  User clicks on a_{i,j} -> A_i pays b_{i,j}
  Queries arrive one after another
  Select ads to show for each query, in an online fashion
 Constraints:
  Show at most C ads per query
  Advertisers have daily budgets: A_i pays at most d_i
 Goal: Maximize the search engine's revenue

[Diagram: advertisers A_1, A_2, A_3 with daily budgets d_1, d_2, d_3; their ads a_{1,1}, a_{1,3}, a_{2,1}, a_{3,2} mapped to query phrases Q_1, Q_2, Q_3]

Our Approach

Unbudgeted Advertisement Problem

 Isomorphic to multi-armed bandit problem 

Budgeted Advertisement Problem

 Similar to the bandit problem, but with additional budget constraints that span arms
 Introduce the Budgeted Multi-armed Multi-bandit Problem (BMMP)

Unbudgeted Advertisement Problem as Multi-armed Bandit Problem

Bandit:

Classical example of online learning under the explore/exploit tradeoff
 K arms; arm i has an unknown payoff probability p_i and an associated reward r_i
 Pull C arms at each time instant to maximize the reward accrued over time

[Diagram: three arms with payoff probabilities p_1, p_2, p_3]

Isomorphism:

 query phrase <-> bandit instance
 ads <-> arms
 CTR <-> payoff probability
 bid <-> reward

Policy for Unbudgeted Problem

Policy “MIX” (adopted from [Auer et al., ML’02])

When query phrase Q_j arrives:

 Compute the priority p_{i,j} of each ad a_{i,j}, where

  p_{i,j} = (e_{i,j} + sqrt(2 ln n_j / n_{i,j})) . b_{i,j}

  e_{i,j}: MLE of the CTR value of a_{i,j}
  b_{i,j}: price or bid value of ad a_{i,j}
  n_{i,j}: # times ad a_{i,j} has been shown in the past
  n_j: # times query Q_j has been answered
 Display the C highest-priority ads
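
A minimal sketch of this priority rule in Python (function and variable names are mine; the slide gives only the formula). An ad that has never been shown gets infinite priority, so every ad is explored at least once:

```python
import math

def mix_priority(e_ij, b_ij, n_ij, n_j):
    # MIX priority: (CTR estimate + confidence bonus) x bid
    return (e_ij + math.sqrt(2.0 * math.log(n_j) / n_ij)) * b_ij

def select_ads(ads, n_j, C):
    # ads: dict ad_id -> (e_ij, b_ij, n_ij); unseen ads get infinite priority
    def prio(item):
        e, b, n = item[1]
        return float("inf") if n == 0 else mix_priority(e, b, n, n_j)
    ranked = sorted(ads.items(), key=prio, reverse=True)
    return [ad_id for ad_id, _ in ranked[:C]]
```

The sqrt term shrinks as an ad accumulates impressions, so the policy gradually shifts from exploring under-shown ads to exploiting ads with high estimated CTR x bid.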

Budgeted Multi-armed Multi-bandit Problem (BMMP)

 Finite set of bandit instances; each instance has a finite number of arms
 Each arm has an associated type
 Each type T_i has a budget d_i: an upper limit on the total amount of reward that can be generated by the arms of type T_i
 An external actor invokes a bandit instance at each time instant; the policy must choose C arms of the invoked instance

Meta Policy for BMMP

 Input: a BMMP instance and a policy POL for the conventional multi-armed bandit problem
 Output: the following policy BPOL:
  Run POL in parallel for each bandit instance B_i (call it POL_i)
  Whenever B_i is invoked:
   Discard arm(s) with depleted budget
   If one or more arms was discarded, restart POL_i
   Let POL_i decide which of the remaining arms to activate
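
A hedged sketch of this meta-policy (class and method names are mine; the slide specifies only the behavior). It assumes POL is given as a factory that builds a fresh policy over a set of arms, exposing `choose` and `update`:

```python
# Sketch of the BPOL meta-policy. Interface names (make_policy, choose,
# update) are assumptions for illustration, not from the paper.
class BPOL:
    def __init__(self, make_policy, budgets):
        self.make_policy = make_policy  # factory: list of arm ids -> fresh POL
        self.budgets = dict(budgets)    # remaining budget per type
        self.policies = {}              # POL_i per bandit instance B_i
        self.live_arms = {}             # non-depleted arms per instance

    def invoke(self, instance_id, arms, C):
        # arms: dict arm_id -> type_id for the invoked instance B_i
        live = {a: t for a, t in arms.items() if self.budgets[t] > 0}
        if set(live) != set(self.live_arms.get(instance_id, ())):
            # first invocation, or an arm was discarded: (re)start POL_i
            self.policies[instance_id] = self.make_policy(sorted(live))
            self.live_arms[instance_id] = live
        return self.policies[instance_id].choose(C)

    def record(self, instance_id, arm_id, reward):
        # charge the reward against the arm's type budget and inform POL_i
        self.budgets[self.live_arms[instance_id][arm_id]] -= reward
        self.policies[instance_id].update(arm_id, reward)
```

A type's budget can be overshot by at most one reward before its arms are discarded, which is where the (#types . max reward) slack in the analysis comes from.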

Performance Guarantee of BPOL

 OPT = algorithm that knows in advance:
  1. The full sequence of bandit invocations
  2. The payoff probabilities
 Claim: bpol(N) >= opt(N)/2 - O(f(N))
  bpol(N): total expected reward of the BPOL policy after N bandit invocations
  opt(N): total expected reward of OPT
  f(N): regret of POL after N invocations of the regular bandit problem

Proof of Performance Guarantee

 Divide the time instants into 3 categories:
  1: BPOL chooses an arm of higher expected reward than OPT
    opt_1(N) <= bpol_1(N)
  2: BPOL chooses an arm of lower expected reward because OPT's arm has run out of budget
    opt_2(N) <= bpol_2(N) + (#types . max reward)
  3: otherwise
    opt_3(N) = O(f(N))
 Claim (follows from the above bounds):
  opt(N) <= bpol(N) + bpol(N) + O(1) + O(f(N))
  bpol(N) >= opt(N)/2 - O(f(N))

Advertisement Policies

 BMIX: output of our generic BPOL policy when given MIX as input
 BMIX-E:
  Replace sqrt(2 ln n_j / n_{i,j}) in priority p_{i,j} by sqrt(min(0.25, V(n_{i,j}, n_j)) . ln n_j / n_{i,j}), where V(n_{i,j}, n_j) = e_{i,j} . (1 - e_{i,j}) + sqrt(2 ln n_j / n_{i,j})
  Suggested in [Auer et al., ML’02]
  Purpose: aggressive exploitation
 BMIX-T:
  Replace b_{i,j} in priority p_{i,j} by b_{i,j} . throttle(d_i'/d_i), where d_i' is the remaining budget of advertiser A_i
  Suggested in [Mehta et al., FOCS’05]
  Purpose: delay the depletion of advertisers’ budgets
 BMIX-ET: with both E and T modifications
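
Reading the E modification as the variance-tuned rule of [Auer et al., ML’02] (UCB1-TUNED), the replacement confidence term might be computed as below. This is my reconstruction from the slide's formula fragments, not the authors' code:

```python
import math

def tuned_bonus(e_ij, n_ij, n_j):
    # Variance-aware confidence term: V estimates the CTR's variance
    # e(1-e) plus a slack term; 0.25 caps it at the max Bernoulli variance.
    v = e_ij * (1.0 - e_ij) + math.sqrt(2.0 * math.log(n_j) / n_ij)
    return math.sqrt(min(0.25, v) * math.log(n_j) / n_ij)

def bmix_e_priority(e_ij, b_ij, n_ij, n_j):
    # Same shape as the MIX priority, with the tuned bonus swapped in
    return (e_ij + tuned_bonus(e_ij, n_ij, n_j)) * b_ij
```

For a well-explored low-CTR ad the tuned bonus is smaller than MIX's sqrt(2 ln n_j / n_{i,j}), which is what makes this variant exploit more aggressively.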

Experiments

 Simulations over real data
 Data:
  85,000 query phrases from a Yahoo! query log
  Yahoo! ads with daily budget constraints
  CTRs drawn from Yahoo!'s CTR distribution
  Simulated user clicks using the CTR values
 Time horizon = multiple days
 Policies carried over the CTR estimates from one day to the next

Results

 GREEDY: select ads with the highest current reward estimate (e_{i,j} . b_{i,j})
  Does not explore; only exploits.

*Revenue values scaled for confidentiality reasons

Conclusion

 Search advertisement problem
  Exploration/exploitation tradeoff
  Model as a multi-armed bandit
 Introduced a new bandit variant
  Budgeted Multi-armed Multi-bandit Problem (BMMP)
  New policy for BMMP with a performance guarantee
 In the paper:
  Variable set of ads (ads come and go)
  Prior CTR estimates