Handling Advertisements of Unknown Quality in Search Advertising
Sandeep Pandey Christopher Olston (CMU and Yahoo! Research)
Sponsored Search
How does it work?
- Search engine displays ads next to search results
- Advertisers pay the search engine per click
Who benefits from it?
- Main source of funding for search engines
- Information flow from advertisers to users
Sponsored Search
- [Figure: results page showing search query results alongside sponsored search results]
- Click-through rate (CTR): given an ad and a query, CTR = probability that the ad receives a click
- Optimal policy to maximize the search engine's revenue: display the ads of highest (CTR x bid) value
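With known CTRs, the optimal ranking rule is easy to state in code. A minimal sketch (the `ctr`/`bid` field names are our own, not from the talk):

```python
def rank_by_expected_revenue(ads, C):
    """Return the C ads with the highest expected revenue per
    impression, i.e. the highest (CTR x bid) value."""
    return sorted(ads, key=lambda a: a["ctr"] * a["bid"], reverse=True)[:C]
```

For example, an ad with CTR 0.02 and bid 10 outranks one with CTR 0.1 and bid 1, since its expected revenue per impression is 0.20 versus 0.10.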
Challenges in Sponsored Search
- Problem: CTRs are initially unknown; estimating them requires going around the circle: show ads -> record clicks -> earn revenue -> refine CTR estimates
- Exploration/exploitation tradeoff:
  - explore ads to estimate their CTRs
  - exploit known high-CTR ads to maximize revenue
The Advertisement Problem
- Advertiser A_i submits ad a_{i,j} for query phrase Q_j
- User clicks on a_{i,j} -> A_i pays b_{i,j}
- Queries arrive one after another; select ads to show for each query, in an online fashion
- Constraints:
  - show at most C ads per query
  - advertisers have daily budgets: A_i pays at most d_i in total
- Goal: maximize the search engine's revenue
[Figure: advertisers A_1, A_2, A_3 with daily budgets d_1, d_2, d_3; their ads a_{1,1}, a_{1,3}, a_{2,1}, a_{3,2}; and the query phrases Q_1, Q_2, Q_3 they target]
Our Approach
Unbudgeted Advertisement Problem
Isomorphic to multi-armed bandit problem
Budgeted Advertisement Problem
Similar to the bandit problem, but with additional budget constraints that span arms
Introduce the Budgeted Multi-armed Multi-bandit Problem (BMMP)
Unbudgeted Advertisement Problem as
Multi-armed Bandit Problem
Bandit:
- Classical example of online learning under the explore/exploit tradeoff
- K arms; arm i has an associated reward r_i and unknown payoff probability p_i
- Pull C arms at each time instant to maximize the reward accrued over time

[Figure: slot-machine arms with payoff probabilities p_1, p_2, p_3]

Isomorphism:
- query phrase <-> bandit instance
- ads <-> arms
- CTR <-> payoff probability
- bid <-> reward
Policy for Unbudgeted Problem
Policy “MIX” (adapted from [Auer et al., ML'02])

When query phrase Q_j arrives:
- Compute the priority p_{i,j} of each ad a_{i,j}, where
  p_{i,j} = (e_{i,j} + sqrt(2 ln n_j / n_{i,j})) · b_{i,j}
  - e_{i,j}: MLE of the CTR of a_{i,j}
  - b_{i,j}: price (bid) of ad a_{i,j}
  - n_{i,j}: # times ad a_{i,j} has been shown in the past
  - n_j: # times query Q_j has been answered
- Display the C highest-priority ads
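The priority computation can be sketched in Python. This is an illustrative sketch: the dict layout is our own, and giving never-shown ads infinite priority (so each is explored at least once, avoiding division by zero) is standard UCB practice rather than something stated on the slide:

```python
import math

def mix_priorities(ads, n_j):
    """Compute MIX priorities for the ads of one query phrase.

    Each ad dict holds:
      e - current CTR estimate (MLE: clicks / impressions)
      b - bid (price per click)
      n - number of times this ad has been shown (n_{i,j})
    n_j is the number of times the query phrase has been answered.
    """
    priorities = {}
    for name, ad in ads.items():
        if ad["n"] == 0:
            # Never-shown ads get infinite priority: explore each once.
            priorities[name] = float("inf")
        else:
            # Confidence bonus shrinks as the ad accumulates impressions.
            bonus = math.sqrt(2.0 * math.log(n_j) / ad["n"])
            priorities[name] = (ad["e"] + bonus) * ad["b"]
    return priorities

def select_ads(ads, n_j, C):
    """Display the C highest-priority ads."""
    p = mix_priorities(ads, n_j)
    return sorted(p, key=p.get, reverse=True)[:C]
```

Note how the bonus term lets an ad with a modest CTR estimate but few impressions outrank a well-measured ad, which is exactly the exploration behavior the tradeoff requires.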
Budgeted Multi-armed Multi-Bandit
problem (BMMP)
- Finite set of bandit instances; each instance has a finite number of arms
- Each arm has an associated type
- Each type T_i has a budget d_i: an upper limit on the total amount of reward that can be generated by the arms of type T_i
- An external actor invokes a bandit instance at each time instant; the policy must choose C arms of the invoked instance
Meta Policy for BMMP
- Input: a BMMP instance and a policy POL for the conventional multi-armed bandit problem
- Output: the following policy BPOL:
  - Run POL in parallel for each bandit instance B_i (call this copy POL_i)
  - Whenever B_i is invoked:
    - Discard arm(s) with depleted budget
    - If one or more arms were discarded, restart POL_i
    - Let POL_i decide which of the remaining arms to activate
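The meta-policy can be sketched as a thin wrapper around any inner bandit policy. All class and method names below are illustrative assumptions, not from the paper; `make_pol` stands in for a factory that builds a fresh POL instance over a given arm set, exposing `choose(C)` and `update(arm, reward)`:

```python
class BPOL:
    """Meta-policy sketch for BMMP: wraps a conventional bandit
    policy per instance, discarding and restarting on budget depletion."""

    def __init__(self, make_pol, budgets):
        self.make_pol = make_pol       # factory for the inner bandit policy
        self.budgets = dict(budgets)   # remaining budget per arm type
        self.state = {}                # instance -> (POL_i, arm set it was built for)

    def invoke(self, instance, arms, arm_type, C):
        # Step 1: discard arms whose type's budget is depleted.
        alive = frozenset(a for a in arms if self.budgets[arm_type[a]] > 0)
        pol, known = self.state.get(instance, (None, None))
        # Step 2: restart POL_i if arms were discarded (or on first use).
        if known != alive:
            pol = self.make_pol(alive)
            self.state[instance] = (pol, alive)
        # Step 3: let POL_i pick C of the remaining arms.
        return pol.choose(C)

    def reward(self, instance, arm, arm_type, r):
        # Charge the reward against the arm type's budget, inform POL_i.
        self.budgets[arm_type[arm]] -= r
        self.state[instance][0].update(arm, r)
```

Restarting POL_i after a discard keeps the inner policy's regret guarantee intact on the reduced arm set, which is what the performance analysis relies on.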
Performance Guarantee of BPOL
OPT = algorithm that knows in advance:
1. the full sequence of bandit invocations
2. the payoff probabilities

Claim: bpol(N) >= opt(N)/2 - O(f(N))
- bpol(N): total expected reward of the BPOL policy after N bandit invocations
- opt(N): total expected reward of OPT
- f(N): regret of POL after N invocations of the regular bandit problem
Proof of Performance Guarantee
Divide the time instants into 3 categories:
1. BPOL chooses an arm of higher expected reward than OPT: opt_1(N) <= bpol_1(N)
2. BPOL chooses an arm of lower expected reward because OPT's arm has run out of budget: opt_2(N) <= bpol_2(N) + (#types · max reward)
3. otherwise: opt_3(N) = O(f(N))

Claim (follows from the above bounds):
opt(N) <= bpol(N) + bpol(N) + O(1) + O(f(N)), hence bpol(N) >= opt(N)/2 - O(f(N))
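Summing the three per-category bounds gives the stated guarantee; a sketch of the arithmetic in LaTeX:

```latex
\begin{align*}
\mathrm{opt}(N)
  &= \mathrm{opt}_1(N) + \mathrm{opt}_2(N) + \mathrm{opt}_3(N) \\
  &\le \mathrm{bpol}_1(N) + \bigl(\mathrm{bpol}_2(N) + O(1)\bigr) + O(f(N)) \\
  &\le 2\,\mathrm{bpol}(N) + O(f(N)).
\end{align*}
```

Since bpol_1(N) and bpol_2(N) are each at most bpol(N), rearranging yields bpol(N) >= opt(N)/2 - O(f(N)).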
Advertisement Policies
- BMIX: output of our generic BPOL policy when given MIX as input
- BMIX-E: replace sqrt(2 ln n_j / n_{i,j}) in priority p_{i,j} by sqrt(min(0.25, V(n_{i,j}, n_j)) · ln n_j / n_{i,j}), where V(n_{i,j}, n_j) = e_{i,j} · (1 - e_{i,j}) + sqrt(2 ln n_j / n_{i,j})
  - Suggested in [Auer et al., ML'02]. Purpose: aggressive exploitation
- BMIX-T: replace b_{i,j} in priority p_{i,j} by b_{i,j} · throttle(d_i'/d_i), where d_i' is the remaining budget and d_i the daily budget of advertiser A_i
  - Suggested in [Mehta et al., FOCS'05]. Purpose: delay the depletion of advertisers' budgets
- BMIX-ET: with both E and T modifications
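The combined E and T modifications can be sketched for a single ad's priority. Two caveats: the variance-aware width follows the UCB-Tuned form from Auer et al., and the linear damping of the bid by the remaining budget fraction is our own simplification of the throttle (Mehta et al. use a smoother tradeoff function):

```python
import math

def bmix_et_priority(e, b, n_ij, n_j, spent, budget):
    """Priority of one ad under the E and T modifications (sketch).

    e: CTR estimate, b: bid, n_ij: impressions of this ad,
    n_j: answers of this query phrase, spent/budget: advertiser spend.
    """
    if n_ij == 0:
        return float("inf")  # explore never-shown ads first
    # E: variance-aware confidence width (UCB-Tuned style).
    v = e * (1 - e) + math.sqrt(2 * math.log(n_j) / n_ij)
    width = math.sqrt(min(0.25, v) * math.log(n_j) / n_ij)
    # T: damp the bid as the advertiser's budget is consumed
    # (simple linear throttle for illustration).
    remaining_frac = max(0.0, (budget - spent) / budget)
    return (e + width) * (b * remaining_frac)
```

The E term tightens exploration for low-variance (confidently estimated) ads, while the T term spreads an advertiser's budget over more of the day instead of exhausting it early.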
Experiments
Simulations over real data:
- Data: 85,000 query phrases from the Yahoo! query log; Yahoo! ads with daily budget constraints
- CTRs drawn from Yahoo!'s CTR distribution; user clicks simulated using the CTR values
- Time horizon = multiple days; policies carried over CTR estimates from one day to the next
Results
GREEDY: select ads with the highest current reward estimate (e_{i,j} · b_{i,j}); does not explore, only exploits
*Revenue values scaled for confidentiality reasons
Conclusion
- Search advertisement problem: exploration/exploitation tradeoff; modeled as a multi-armed bandit
- Introduced a new bandit variant: the Budgeted Multi-armed Multi-bandit Problem (BMMP)
- New policy for BMMP with a performance guarantee
- In the paper: variable set of ads (ads come and go); prior CTR estimates