
Hierarchical Exploration for
Accelerating Contextual Bandits
Yisong Yue, Sue Ann Hong and Carlos Guestrin
Personalized Recommender Systems
• Every day, a user visits a news portal
• We wish to personalize recommendations to her preferences
• We can only learn from feedback
  • E.g., the user clicks on or “likes” an article
• This leads to an exploration vs. exploitation dilemma
  • The goal is to satisfy the user
  • But the system must make exploratory recommendations to learn the user’s preferences
• This is formalized as a contextual bandit problem
CoFineUCB:
Coarse-to-Fine Hierarchical Exploration
• Two-tiered exploration:
  • First in a subspace
  • Then in the full space
Linear Stochastic Bandit Problem
• At each iteration t:
  • Set of available actions $X_t = \{x_{t,1}, \dots, x_{t,n}\}$ (available articles)
  • Algorithm chooses action $x_t$ from $X_t$ (recommends an article)
  • User provides feedback $\hat{y}_t$ (clicks on or “likes” the article)
  • Algorithm incorporates the feedback
• Assumption: $E[\hat{y}_t] = w^{*\top} x_t$ ($w^*$ is unknown to the system)
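The interaction protocol above can be simulated in a few lines. This is a minimal sketch, assuming zero-mean Gaussian feedback noise; the class `LinearUser` and the helper `run_round` are hypothetical names introduced for illustration, not part of the original work.

```python
import numpy as np


class LinearUser:
    """Hypothetical simulated user for the linear stochastic bandit setting.

    Observed feedback has expectation w_star @ x, matching the
    assumption E[y_hat_t] = w*^T x_t. The Gaussian noise is an assumption.
    """

    def __init__(self, w_star, noise_std=0.1, seed=0):
        self.w_star = np.asarray(w_star, dtype=float)
        self.noise_std = noise_std
        self.rng = np.random.default_rng(seed)

    def expected_feedback(self, x):
        # E[y_hat] = w*^T x (unknown to the recommender)
        return float(self.w_star @ np.asarray(x, dtype=float))

    def feedback(self, x):
        # Observed feedback = expected value + zero-mean Gaussian noise
        return self.expected_feedback(x) + self.noise_std * self.rng.standard_normal()


def run_round(user, actions, choose):
    """One iteration t: observe X_t, choose x_t, receive y_hat_t."""
    actions = np.asarray(actions, dtype=float)
    idx = choose(actions)        # algorithm picks an index into X_t
    y_t = user.feedback(actions[idx])  # user provides (noisy) feedback
    return idx, y_t
```

Setting `noise_std=0.0` makes the observed feedback equal its expectation $w^{*\top} x_t$, which is convenient for checking a simulation setup.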
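• Regret: $R(T) = \sum_{t=1}^{T} \left( w^{*\top} x_t^* - w^{*\top} x_t \right)$, where $x_t^*$ is the best available action in $X_t$
Theorem: with probability $1-\delta$, the regret is bounded by $R(T) = O\left( \left( S_\perp D + S\sqrt{K} \right) \sqrt{T} \right)$, i.e., the average regret $R(T)/T$ shrinks as $O(1/\sqrt{T})$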
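The regret definition can be computed directly when $w^*$ is known, as it is in a simulation. A minimal sketch; `cumulative_regret` is a hypothetical helper, and $x_t^*$ is taken to be the action in $X_t$ with the highest expected feedback $w^{*\top} x$.

```python
import numpy as np


def cumulative_regret(w_star, action_sets, chosen):
    """R(T) = sum_t (w*^T x_t^* - w*^T x_t).

    action_sets: list of arrays, each of shape (n_t, D), i.e. the sets X_t
    chosen: list of chosen row indices, one per round
    """
    w_star = np.asarray(w_star, dtype=float)
    total = 0.0
    for X_t, i in zip(action_sets, chosen):
        gains = np.asarray(X_t, dtype=float) @ w_star  # w*^T x for every x in X_t
        total += gains.max() - gains[i]                # best action minus chosen action
    return float(total)
```

For example, with $w^* = (1, 0)$ and two rounds over the same action set $\{(1,0), (0,1)\}$, choosing the first action and then the second gives $R(2) = 0 + 1 = 1$, i.e., an average regret of $1/2$.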
Constructing Feature Hierarchies
Using Prior Knowledge
• Given an empirical sample of learned user profiles W
• Can also be used to reshape the full space (use LearnU(W, D))
Balancing Exploration vs. Exploitation
“Upper Confidence Bound”
• At each iteration, choose
  $x_t = \arg\max_{x \in X_t} \; \underbrace{w_t^\top x}_{\text{estimated gain}} + \underbrace{C_t(x)}_{\text{uncertainty}}$
• In the example below, the algorithm selects the article on the economy
[Figure panels: Mean Estimate by Topic; Uncertainty of Estimate]
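The poster does not spell out how $w_t$ or $C_t(x)$ are computed here. The sketch below assumes a standard LinUCB-style choice: $w_t$ is a ridge-regression estimate and $C_t(x) = \alpha \sqrt{x^\top M_t^{-1} x}$, where $M_t$ is the regularized Gram matrix of past actions. The class name `LinUCBSketch` and the parameters `alpha` and `lam` are illustrative assumptions. It explores in the full space only, like a naïve LinUCB baseline, not with CoFineUCB's two-tiered exploration.

```python
import numpy as np


class LinUCBSketch:
    """Minimal LinUCB-style learner (illustrative; alpha and lam are assumed).

    Estimated gain: w_t^T x, with w_t from ridge regression.
    Uncertainty:    C_t(x) = alpha * sqrt(x^T M^{-1} x).
    """

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha
        self.M = lam * np.eye(dim)  # regularized Gram matrix: lam*I + sum x x^T
        self.b = np.zeros(dim)      # sum of y_t * x_t

    def select(self, actions):
        actions = np.asarray(actions, dtype=float)
        M_inv = np.linalg.inv(self.M)
        w_t = M_inv @ self.b                    # ridge estimate of the user profile
        gain = actions @ w_t                    # estimated gain for each action
        # x^T M^{-1} x for every row x of `actions`
        width = np.sqrt(np.einsum("ij,jk,ik->i", actions, M_inv, actions))
        return int(np.argmax(gain + self.alpha * width))  # optimistic choice

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        self.M += np.outer(x, x)
        self.b += y * x
```

Inverting $M_t$ at every step keeps the sketch short; a practical implementation would typically maintain $M_t^{-1}$ incrementally (for example with the Sherman-Morrison formula).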
News Recommender Simulations & User Study
[Figure legend: Naïve LinUCB; Reshaped Full Space; Subspace]
Feature Hierarchies
• Suppose “stereotypical users” span a K-dimensional space
  • E.g., “European vs. Asian news”
Coarse-to-Fine Approach
• Let U be a D × K matrix
• Define projection of articles into the subspace: $\tilde{x} = U^\top x$
• Define representation of user profile: $w^* = U\tilde{w}^* + w_\perp^*$
• Thus: $w^{*\top} x = \tilde{w}^{*\top}\tilde{x} + w_\perp^{*\top} x$
• If $w^{*\top} x \approx \tilde{w}^{*\top}\tilde{x}$, then it suffices to learn primarily in the subspace
• The K-dimensional space is much more efficient to explore
• Explore the full space as needed
[Figure: “All Users” vs. “Atypical Users”, showing the user profile $w^*$ and its component $w_\perp^*$ outside the subspace]
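The decomposition $w^* = U\tilde{w}^* + w_\perp^*$ and the identity $w^{*\top} x = \tilde{w}^{*\top}\tilde{x} + w_\perp^{*\top} x$ can be checked numerically. This is a minimal sketch under stated assumptions: `learn_subspace` is a hypothetical stand-in for LearnU(W, K) that takes the top-K left singular vectors of the profile matrix W (assumed to store one profile per column), and `decompose` uses the orthogonal projection $\tilde{w}^* = U^\top w^*$. It covers only the K-dimensional subspace case, not the full-space reshaping LearnU(W, D).

```python
import numpy as np


def learn_subspace(W, K):
    """Hypothetical stand-in for LearnU(W, K).

    W: D x N matrix whose columns are previously learned user profiles.
    Returns a D x K matrix U with orthonormal columns spanning the top-K
    left singular directions of W (one plausible choice, assumed here).
    """
    A, _, _ = np.linalg.svd(np.asarray(W, dtype=float), full_matrices=False)
    return A[:, :K]


def decompose(w_star, U):
    """Split w* = U w_tilde + w_perp, with w_perp orthogonal to span(U)."""
    w_star = np.asarray(w_star, dtype=float)
    w_tilde = U.T @ w_star          # coordinates in the K-dim subspace
    w_perp = w_star - U @ w_tilde   # residual outside the subspace
    return w_tilde, w_perp
```

Because the columns of U are orthonormal in this sketch, $w_\perp^*$ is orthogonal to the subspace, so $w^{*\top} x \approx \tilde{w}^{*\top}\tilde{x}$ exactly when $w_\perp^{*\top} x$ is small, which is the case where learning primarily in the subspace suffices.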
• Leave-one-out simulation validation
  • Compared against hierarchy-free baselines
  • CoFineUCB combines the efficiency of Subspace Learning with the flexibility of Full Space Learning
• Live user study
  • Showed real users real articles
  • 10 articles/day for 10 days
  • Counted the number of likes
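A leave-one-out protocol, as in the simulation validation above, could be organized so that each user's subspace is learned only from the other users' profiles; the held-out user is then never part of their own prior. The function `leave_one_out_subspaces` is hypothetical and reuses the SVD-based subspace assumption (one profile per column of W); the study's actual simulation setup may differ.

```python
import numpy as np


def leave_one_out_subspaces(W, K):
    """Yield (i, U_i): U_i is learned from every profile in W except column i.

    W: D x N matrix of user profiles (hypothetical input). The held-out
    user i would then be simulated against a bandit learner that uses U_i.
    """
    W = np.asarray(W, dtype=float)
    for i in range(W.shape[1]):
        W_rest = np.delete(W, i, axis=1)                      # drop user i
        A, _, _ = np.linalg.svd(W_rest, full_matrices=False)  # SVD-based subspace (assumed)
        yield i, A[:, :K]
```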
Comparison                  Win / Tie / Loss    Gain / Day
CoFineUCB vs. Naïve         24 / 1 / 3          0.69
CoFineUCB vs. Reshaped      21 / 3 / 6          0.27