
An Interactive Learning Approach to
Optimizing Information Retrieval Systems
CMU ML Lunch
September 27th, 2010
Yisong Yue
Carnegie Mellon University
Interactive Learning Setting
• Find the best ranking function (out of 1000)
– Show results, evaluate using click logs
– Clicks biased (users only click on what they see)
– Explore / exploit problem
• Technical issues
– How to interpret clicks?
– What is the utility function?
– What results to show users?
Team-Game Interleaving
(u=thorsten, q=“svm”)

f1(u,q) → r1:
1. Kernel Machines
   http://svm.first.gmd.de/
2. Support Vector Machine
   http://jbolivar.freeservers.com/
3. An Introduction to Support Vector Machines
   http://www.support-vector.net/
4. Archives of SUPPORT-VECTOR-MACHINES ...
   http://www.jiscmail.ac.uk/lists/SUPPORT...
5. SVM-Light Support Vector Machine
   http://ais.gmd.de/~thorsten/svm light/

f2(u,q) → r2:
1. Kernel Machines
   http://svm.first.gmd.de/
2. SVM-Light Support Vector Machine
   http://ais.gmd.de/~thorsten/svm light/
3. Support Vector Machine and Kernel ... References
   http://svm.research.bell-labs.com/SVMrefs.html
4. Lucent Technologies: SVM demo applet
   http://svm.research.bell-labs.com/SVT/SVMsvt.html
5. Royal Holloway Support Vector Machine
   http://svm.dcs.rhbnc.ac.uk

Interleaving(r1,r2):
1. Kernel Machines (T2)
   http://svm.first.gmd.de/
2. Support Vector Machine (T1)
   http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine (T2)
   http://ais.gmd.de/~thorsten/svm light/
4. An Introduction to Support Vector Machines (T1)
   http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References (T2)
   http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ... (T1)
   http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies: SVM demo applet (T2)
   http://svm.research.bell-labs.com/SVT/SVMsvt.html

Invariant: for all k, in expectation each team has the same number of members in the top k.
Interpretation: (r1 ≻ r2) ↔ clicks(T1) > clicks(T2)
[Radlinski, Kurup, Joachims; CIKM 2008]
Setting
• Find the best ranking function (out of 1000)
– Evaluate using click logs
– Clicks biased (users only click on what they see)
– Explore / exploit problem
• Technical issues
– How to interpret clicks?
– What is the utility function?
– What results to show users?
Interleave A vs B:
            Left wins   Right wins
A vs B          1           0
A vs C          0           0
B vs C          0           0

Interleave A vs C:
            Left wins   Right wins
A vs B          1           0
A vs C          0           1
B vs C          0           0

Interleave B vs C:
            Left wins   Right wins
A vs B          1           0
A vs C          0           1
B vs C          1           0

Interleave A vs B:
            Left wins   Right wins
A vs B          1           1
A vs C          0           1
B vs C          1           0
Dueling Bandits Problem
• Given K bandits b1, …, bK
• Each iteration: compare (duel) two bandits
– E.g., interleaving two retrieval functions
• Cost function (regret):
    R_T = Σ_{t=1..T} [ P(b* ≻ b_t) + P(b* ≻ b_t') − 1 ]
• (bt, bt’) are the two bandits chosen
• b* is the overall best one
• (% users who prefer best bandit over chosen ones)
[Yue & Joachims, ICML 2009] [Yue, Broder, Kleinberg, Joachims, COLT 2009]
    R_T = Σ_{t=1..T} [ P(b* ≻ b_t) + P(b* ≻ b_t') − 1 ]
• Example 1:
  – P(f* > f) = 0.9
  – P(f* > f') = 0.8
  – Incurred regret = 0.7
• Example 2:
  – P(f* > f) = 0.7
  – P(f* > f') = 0.6
  – Incurred regret = 0.3
• Example 3:
  – P(f* > f) = 0.51
  – P(f* > f') = 0.55
  – Incurred regret = 0.06
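As a sanity check, the per-step regret above is just P(b* ≻ b_t) + P(b* ≻ b_t') − 1, and a few lines of Python confirm the arithmetic of all three examples:

```python
def step_regret(p_best_vs_bt, p_best_vs_bt_prime):
    """Per-step regret: P(b* > b_t) + P(b* > b_t') - 1.
    Zero exactly when both chosen bandits are the best one."""
    return p_best_vs_bt + p_best_vs_bt_prime - 1.0

assert abs(step_regret(0.9, 0.8) - 0.7) < 1e-9     # Example 1
assert abs(step_regret(0.7, 0.6) - 0.3) < 1e-9     # Example 2
assert abs(step_regret(0.51, 0.55) - 0.06) < 1e-9  # Example 3
assert step_regret(0.5, 0.5) == 0.0                # dueling b* against itself
```

The last assertion shows why the exploit phase (comparing the best bandit with itself) incurs zero regret: P(b* ≻ b*) = 1/2.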
Assumptions
• P(bi > bj) = ½ + εij (distinguishability)
• Strong Stochastic Transitivity
  – For three bandits bi > bj > bk: εik ≥ max(εij, εjk)
  – Monotonicity property
• Stochastic Triangle Inequality
  – For three bandits bi > bj > bk: εik ≤ εij + εjk
  – Diminishing returns property
• Satisfied by many standard models
  – E.g., Logistic / Bradley-Terry
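A quick numerical check (the utility values here are made up for illustration) that a Bradley-Terry model with P(b_i ≻ b_j) = u_i / (u_i + u_j) satisfies both assumptions:

```python
def p_win(ui, uj):
    """Bradley-Terry model: P(b_i beats b_j) = u_i / (u_i + u_j)."""
    return ui / (ui + uj)

def eps(ui, uj):
    """Distinguishability: eps_ij = P(b_i > b_j) - 1/2."""
    return p_win(ui, uj) - 0.5

# A few made-up utility triples with b_i > b_j > b_k
for ui, uj, uk in [(3.0, 2.0, 1.0), (10.0, 1.5, 1.0), (2.0, 1.9, 1.8)]:
    e_ij, e_jk, e_ik = eps(ui, uj), eps(uj, uk), eps(ui, uk)
    assert e_ik >= max(e_ij, e_jk)  # strong stochastic transitivity
    assert e_ik <= e_ij + e_jk      # stochastic triangle inequality
```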
Explore then Exploit
• First explore
– Try to gather as much information as possible
– Accumulates regret based on which bandits we decide to
compare
• Then exploit
– We have a (good) guess as to which bandit is best
– Repeatedly compare that bandit with itself
• (i.e., interleave that ranking with itself)
    R_T = Σ_{t=1..T} [ P(b* ≻ b_t) + P(b* ≻ b_t') − 1 ]
Naïve Approach
• In deterministic case, O(K) comparisons to find max
• Extend to noisy case:
– Repeatedly compare until confident one is better
• Problem: comparing two awful (but similar) bandits
• Example:
  – P(A > B) = 0.85
  – P(A > C) = 0.85
  – P(B > C) = 0.51
  – Comparing B and C requires many comparisons!
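A back-of-the-envelope Hoeffding calculation shows why: the number of Bernoulli comparisons needed to distinguish P(win) = 1/2 + ε from 1/2 grows like 1/ε². The constant below follows the standard two-sided Hoeffding bound; the exact constants in the analysis may differ:

```python
import math

def comparisons_needed(eps, delta):
    """Two-sided Hoeffding bound (standard form, constants approximate):
    about log(2/delta) / (2 * eps**2) Bernoulli comparisons to distinguish
    P(win) = 1/2 + eps from 1/2 with probability 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

easy = comparisons_needed(0.35, 0.05)  # A vs B or A vs C: eps = 0.35
hard = comparisons_needed(0.01, 0.05)  # B vs C: eps = 0.01
```

With ε shrinking from 0.35 to 0.01, the required sample size grows by a factor of (0.35/0.01)² ≈ 1225, which is exactly the trap of dueling two awful but similar bandits.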
Interleaved Filter
• Choose candidate bandit at random
• Make noisy comparisons (Bernoulli trial)
against all other bandits simultaneously
– Maintain mean and confidence interval
for each pair of bandits being compared
• …until another bandit is better
– With confidence 1 – δ
• Repeat process with new candidate
– (Remove all empirically worse bandits)
• Continue until 1 candidate left
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
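The procedure can be sketched as a simulation. This is a condensed illustration (the confidence-interval bookkeeping is simplified, and `p_win_bt` is a made-up Bradley-Terry comparison oracle), not the paper's exact algorithm:

```python
import math, random

def interleaved_filter(bandits, p_win, T, rng=None):
    """Interleaved Filter (simplified simulation): a random candidate duels
    every remaining bandit in parallel Bernoulli comparisons; confidence
    intervals with delta = 1/(T*K^2) decide when a challenger beats the
    candidate (new round) or is confidently worse (pruned)."""
    rng = rng or random.Random(0)
    K = len(bandits)
    delta = 1.0 / (T * K * K)
    candidate = rng.choice(bandits)
    remaining = [b for b in bandits if b != candidate]
    while remaining:
        wins = {b: 0 for b in remaining}
        n = 0
        while remaining:
            n += 1
            for b in remaining:  # one duel of candidate vs each challenger
                if rng.random() < p_win(b, candidate):
                    wins[b] += 1
            c = math.sqrt(math.log(1.0 / delta) / n)  # confidence radius
            beaten_by = [b for b in remaining if wins[b] / n - c > 0.5]
            if beaten_by:
                new_candidate = beaten_by[0]
                # remove all empirically worse bandits, keep the rest
                remaining = [b for b in remaining
                             if b != new_candidate and wins[b] / n >= 0.5]
                candidate = new_candidate
                break
            # prune challengers that are confidently worse than the candidate
            remaining = [b for b in remaining if wins[b] / n + c > 0.5]
    return candidate

# Hypothetical Bradley-Terry comparison oracle; bandit 3 is clearly best
utilities = [1.0, 1.0, 1.0, 100.0]
def p_win_bt(a, b):
    return utilities[a] / (utilities[a] + utilities[b])

best = interleaved_filter(list(range(4)), p_win_bt, T=10_000)
```

With a clearly superior bandit, the simulation converges to it in a handful of rounds, mirroring the slide: a better bandit dethrones the candidate with confidence 1 − δ, empirically worse bandits are removed, and the process repeats until one candidate remains.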
Intuition
• Simulate comparing candidate with
all remaining bandits simultaneously
• Example:
  – P(A > B) = 0.85
  – P(A > C) = 0.85
  – P(B > C) = 0.51
  – B is candidate
  – # comparisons between B vs C bounded by comparisons between B vs A!
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
Regret Analysis
• Can model sequence of candidate bandits
as a random walk.
• Which will be the next candidate bandit?
  – (Diagram: each of the three remaining better bandits becomes the next candidate with probability 1/3)
• O(Log K) rounds
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
Regret Analysis
• After each round, we remove a constant
fraction of the remaining bandits.
(Diagram: successive rounds play 4, 2, then 0 matches)
• O(K) total matches
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
Regret Analysis
    E[R_T] = O( (K/ε) · log T )
• T – time horizon
• K – # bandits / retrieval functions
• ε – best vs 2nd best
• Average regret RT / T → 0
– Information-theoretically optimal
• Also need to prove correctness (see paper)
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
Summary
• Provably efficient online algorithm
– (In a regret sense)
– Also results for continuous (convex) setting
• Requires comparison oracle
– Reflects user preferences
– Independence / Unbiased
– Strong transitivity
– Triangle Inequality
Directions to Explore
• Relaxing assumptions
– E.g., strong transitivity & triangle inequality
• Integrating context
• Dealing with large K
– Assume additional structure on the retrieval functions?
• Other cost models
– PAC setting (fixed budget, find the best possible)
• Dynamic or changing user interests / environment
Improving Comparison Oracle
• Dueling Bandits Problem
– Interactive learning framework
– Provably minimizes regret
– Assumes idealized comparison oracle
• Can we improve the comparison oracle?
– Can we improve how we interpret results?
Determining Statistical Significance
• Each q, interleave A(q) and B(q), log clicks
• t-Test
– For each q, score: % clicks on A(q)
• E.g., 3/4 = 0.75
– Sample mean score (e.g., 0.6)
– Compute confidence (p value)
• E.g., want p = 0.05 (i.e., 95% confidence)
– More data, more confident
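The per-query scoring and the test statistic can be sketched as follows. The click data is invented for illustration; |t| would then be compared against a t-distribution critical value (about 2.26 for 9 degrees of freedom at the 95% two-sided level):

```python
import math

def t_statistic(scores):
    """One-sample t statistic against the null hypothesis that the mean
    per-query score is 0.5 (i.e. A and B split the clicks evenly)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    return (mean - 0.5) / math.sqrt(var / n)

# Invented data: per-query fraction of clicks on A(q) for 10 sessions
scores = [0.75, 0.6, 1.0, 0.5, 0.8, 0.6, 0.75, 0.5, 1.0, 0.6]
t = t_statistic(scores)
# |t| above ~2.26 rejects the null at p = 0.05 (two-sided, 9 dof)
```

Here the sample mean is 0.71 and t ≈ 3.6, so this (made-up) sample would already be significant; with noisier clicks, more queries are needed before the statistic clears the critical value.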
Limitation
• Example: query session with 2 clicks
  – One click at rank 1 (from A)
  – Later click at rank 4 (from B)
  – Normally would count this query session as a tie
  – But second click is probably more informative…
  – …so B should get more credit for this query
Linear Model
• Feature vector φ(q,c):

    φ(q,c) = ( 1 always,
               1 if click led to download,
               1 if last click,
               1 if higher rank than previous click,
               ... )
• Weight of click is wTφ(q,c)
[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
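A minimal sketch of the weighted-click idea; the dict keys encoding the slide's feature list are illustrative names, not from the paper:

```python
def phi(click):
    """Feature vector phi(q, c) for one click, following the slide's list.
    The dict keys are illustrative, not the paper's encoding."""
    return [
        1.0,                                                 # 1 always
        1.0 if click.get("led_to_download") else 0.0,        # led to download
        1.0 if click.get("is_last_click") else 0.0,          # last click
        1.0 if click.get("higher_rank_than_prev") else 0.0,  # higher rank than previous click
    ]

def click_weight(w, click):
    """The weight of a click is the inner product w^T phi(q, c)."""
    return sum(wi * xi for wi, xi in zip(w, phi(click)))

# With w = (1, 0, 0, 0), every click counts equally
plain = click_weight([1.0, 0.0, 0.0, 0.0], {"is_last_click": True})
# A nonzero weight on the download feature gives such clicks extra credit
rich = click_weight([1.0, 0.5, 0.5, 0.0],
                    {"led_to_download": True, "is_last_click": True})
```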
Example

    φ(q,c) = ( 1 if c is last click; 0 else,
               1 if c is not last click; 0 else )
• wTφ(q,c) differentiates last clicks and other clicks
• Interleave A vs B
– 3 clicks per session
– Last click 60% on result from A
– Other 2 clicks random
• Conventional w = (1,1) – has significant variance
• Only count last click w = (1,0) – minimizes variance
[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
Scoring Query Sessions
• Feature representation for query session:

    φ_q = (1/|q|) · [ Σ_{c ∈ A(q)} φ(q,c) − Σ_{c ∈ B(q)} φ(q,c) ]

• Weighted score for query:

    wᵀφ_q = (1/|q|) · [ Σ_{c ∈ A(q)} wᵀφ(q,c) − Σ_{c ∈ B(q)} wᵀφ(q,c) ]

• Positive score favors A, negative favors B
[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
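Putting the two-feature example together, a session score under this scheme might look like the sketch below; the 1/|q| normalization and feature encoding follow the formulas above, and the click data reproduces the earlier "Limitation" scenario:

```python
def session_score(w, clicks_on_A, clicks_on_B):
    """Weighted interleaving score for one query session:
    (1/|q|) * (sum of click weights on A - sum of click weights on B),
    where |q| is the total number of clicks in the session.
    Positive favors A, negative favors B."""
    def weight(c):
        return sum(wi * xi for wi, xi in zip(w, c))
    n = len(clicks_on_A) + len(clicks_on_B)
    return (sum(weight(c) for c in clicks_on_A)
            - sum(weight(c) for c in clicks_on_B)) / n

# Two-feature encoding from the example: (is last click, is not last click)
last, other = [1.0, 0.0], [0.0, 1.0]
# The "Limitation" scenario: an early click on A, the last click on B
conventional = session_score([1.0, 1.0], [other], [last])  # w = (1, 1)
last_only = session_score([1.0, 0.0], [other], [last])     # w = (1, 0)
```

Under the conventional weighting the session scores a tie (0), while weighting only last clicks credits B (−0.5), which is exactly the behavior the "Limitation" slide argues for.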
Supervised Learning
• Will optimize for z-Test: Inverse z-Test
  – Approximately equal to the t-Test for large samples
  – z-score = mean / standard deviation

    mean(w) = (1/n) · Σ_q wᵀφ_q

    std(w) = sqrt( (1/n) · Σ_q ( wᵀφ_q − mean(w) )² )

    w* = argmax_w  mean(w) / std(w)

  (Assumes A > B)
[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
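For two features, the inverse z-test can be approximated by a grid search over directions, since the z-score is invariant to the scale of w. This is a sketch with invented session data, not the paper's optimization method:

```python
import math

def z_score(w, sessions):
    """z-score of the weighted per-session scores: mean(w) / std(w).
    Each session is a pre-aggregated feature vector phi_q."""
    vals = [sum(wi * xi for wi, xi in zip(w, q)) for q in sessions]
    n = len(vals)
    mean = sum(vals) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
    return mean / std

def inverse_z_test_2d(sessions, steps=360):
    """Grid search over unit-vector directions for 2 features; the z-score
    does not change when w is rescaled, so directions are all that matter."""
    best_w, best_z = None, float("-inf")
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        w = (math.cos(theta), math.sin(theta))
        try:
            z = z_score(w, sessions)
        except ZeroDivisionError:
            continue  # all sessions scored identically in this direction
        if z > best_z:
            best_w, best_z = w, z
    return best_w, best_z

# Invented per-session vectors phi_q: feature 1 consistently favors A,
# feature 2 is noisy
sessions = [(1.0, 1.0), (1.0, -1.0), (1.0, 1.0),
            (1.0, -1.0), (0.9, 1.0), (1.1, -1.0)]
w_star, z_star = inverse_z_test_2d(sessions)
```

On this toy data the search puts nearly all weight on the consistent feature and almost none on the noisy one: it maximizes the confidence of the test by suppressing high-variance features, which is the point of the inverse z-test.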
Experiment Setup
• Data collection
– Pool of retrieval functions
– Hash users into partitions
– Run interleaving of different pairs in parallel
• Collected on arXiv.org
– 2 pools of retrieval functions
– Training Pool: (6 pairs) know A > B
– New Pool: (12 pairs)
Training Pool – Cross Validation
Experimental Results
• Inverse z-Test works well
  – Beats baseline on most of the new interleaving pairs
  – Direction of tests all in agreement
  – In 6/12 pairs, for p=0.1, reduces sample size by 10%
  – In 4/12 pairs, achieves p=0.05 when the baseline does not
• 400 to 650 queries per interleaving experiment
• Weights hard to interpret (features correlated)
• Largest weight: “1 if single click & rank > 1”
Interactive Learning
• Dueling Bandits Problem
– System learns “on-the-fly”.
– Maximize total user utility over time
– Exploration / exploitation tradeoff
• Interpreting Implicit Feedback
– Supervised learning to learn better interpretation
– How do we “close the loop”?
• Simple yet practical model
– Efficient & compatible with existing approaches
Thank You!
Slides, papers & software available at
www.yisongyue.com
Extra Slides
Regret Analysis
• Round: all the time steps for a particular
candidate bandit
– Halts when better bandit found …
– … with 1- δ confidence
– Choose δ = 1/(TK2)
• Match: all the comparisons between two
bandits in a round
– At most K matches in each round
– Candidate plays one match against each
remaining bandit
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
Regret Analysis
• O(log K) total rounds
• O(K) total matches
• Each match incurs regret
– Depends on δ = 1/(TK²)

    O( (1/ε) · log T )
• Finds best bandit w.p. 1-1/T
• Expected regret:

    E[R_T] ≤ (1 − 1/T) · O( (K/ε) · log T ) + (1/T) · O(T)
           = O( (K/ε) · log T )
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
Removing Inferior Bandits
• At conclusion of each round
– Remove any empirically worse bandits
• Intuition:
– High confidence that winner is better
than incumbent candidate
– Empirically worse bandits cannot be “much better” than
incumbent candidate
– Can show via Hoeffding bound that winner is also better than
empirically worse bandits with high confidence
– Preserves 1-1/T confidence overall that we’ll find the best bandit