User Modeling in Search Engine Logs
Hongning Wang, Advisor: ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
{wang296,czhai}@Illinois.edu
A Non-parametric Bayesian Approach [WSDM'14]
A Ranking Model Adaptation Approach [SIGIR'13]
In this work, we study the problem of user modeling in the search log data and propose a generative model,
dpRank, within a non-parametric Bayesian framework. By postulating generative assumptions about a user's
search behaviors, dpRank identifies each individual user's latent search interests and his/her distinct result
preferences in a joint manner. Experimental results on a large-scale news search log data set validate the
effectiveness of the proposed approach, which not only provides in-depth understanding of a user's search
intents but also benefits a variety of personalized applications.
Methods
$\mu_{ct} \sim N(\mu_0, \sigma_0^2)$
In this work, we propose a general ranking model adaptation framework for personalized search. The
proposed framework quickly learns to apply a series of linear transformations, e.g., scaling and shifting,
over the parameters of the given global ranking model such that the adapted model can better fit each
individual user's search preferences. Extensive experimentation based on a large set of search logs from
a major commercial Web search engine confirms the effectiveness of the proposed method compared to
several state-of-the-art ranking model adaptation methods.
Methods
$\sigma_{ct}^2 \sim \mathrm{Gamma}(\alpha_0, \beta_0)$, $\beta_{cv} \sim N(0, \sigma_0^2)$
• Adjust the generic ranking model's parameters with respect to each individual user's ranking preferences
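Dirichlet Process Prior
[Figure: latent user groups (Group 1, ..., Group k, ...) drawn from the Dirichlet Process prior. Each group has its own parameters, $(\mu_1, \sigma_1^2, \beta_1)$, ..., $(\mu_k, \sigma_k^2, \beta_k)$, ..., $(\mu_c, \sigma_c^2, \beta_c)$, which define a query distribution p(Q) and a result preference over ranking features f1, f2.]
Latent User Groups: $\{\pi_c\}_{c=1}^{\infty} \sim DP(\gamma, \cdot)$
Modeling of search interest: $q_i \sim N(\mu_c, \sigma_c^2 I)$
Modeling of result preferences: $P(D_i \mid q_i) = \prod_{y_{ij} > y_{it}} \frac{1}{1 + \exp(-\beta_c^{T}(d_{ij} - d_{it}))}$
Individual level: characterize user's own interest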
Aggregated level: information shared by all the users
[Figure: the global model $f^s(x) = {w^s}^{T} x$ is adapted into per-user models $f^{u_1}$, $f^{u_2}$, $f^{u_3}$ from each user's queries and clicks, for example:]
Timestamp            Query
5/29/2012 14:06:04   coney island Cincinnati
5/30/2012 12:12:04   drive direction to coney island
5/31/2012 19:40:38   motel 6 locations
5/31/2012 19:45:04   Cincinnati hotels near coney island
• Linear regression based model adaptation
$f^u(x) = (A^u \tilde{w}^s)^{T} x$, where $\tilde{w}^s = (w^s, 1)$
$A^u = \begin{pmatrix} a^u_{g(1)} & 0 & \cdots & 0 & b^u_{g(1)} \\ 0 & a^u_{g(2)} & \cdots & 0 & b^u_{g(2)} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & a^u_{g(V)} & b^u_{g(V)} \end{pmatrix}$
$\min_{A^u} L_{adapt}(A^u) = L(Q^u; f^u) + \lambda R(A^u)$
• $L(Q^u; f^u)$: loss function from any linear learning-to-rank algorithm, e.g., RankNet, LambdaRank, RankSVM
• $R(A^u)$: complexity of adaptation
• The induced optimization problem has the same complexity as the original problem
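As a concrete illustration of the scaling-and-shifting idea, here is a minimal Python sketch in which each global feature weight is multiplied by a per-group scale and offset by a per-group shift before scoring. The function names, the list-based `groups` mapping from feature to group, and the toy numbers are assumptions made for illustration; they are not taken from the poster.

```python
def global_score(x, w_global):
    """Score a feature vector with the shared (global) linear model."""
    return sum(w_v * x_v for w_v, x_v in zip(w_global, x))


def adapted_score(x, w_global, groups, scale, shift):
    """Score x with a per-user adapted linear model.

    Feature v is weighted by scale[g] * w_global[v] + shift[g], where
    g = groups[v] is the (hypothetical) group index of feature v.
    """
    if not (len(x) == len(w_global) == len(groups)):
        raise ValueError("x, w_global and groups must have the same length")
    total = 0.0
    for x_v, w_v, g in zip(x, w_global, groups):
        total += (scale[g] * w_v + shift[g]) * x_v
    return total


# Toy example: three features, the first two share a group.
w_s = [0.5, -1.0, 2.0]
groups = [0, 0, 1]
x = [1.0, 2.0, 3.0]

# Scale 1 and shift 0 leave the global model unchanged.
identity = adapted_score(x, w_s, groups, scale=[1.0, 1.0], shift=[0.0, 0.0])
# A user-specific adaptation with two scale and two shift parameters.
personal = adapted_score(x, w_s, groups, scale=[2.0, 0.5], shift=[0.0, 0.1])
```

Sharing one scale and one shift per feature group keeps the number of per-user parameters at twice the number of groups, which is the point of grouping when a user has only a few queries.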
• Instantiation of RankSVM
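RankSVM learns from pairwise preferences: for every pair of documents where one is preferred over the other, it penalizes a score margin smaller than 1. The following is a minimal Python sketch of that pairwise hinge loss for an arbitrary scoring function; the function name, the click-based labels, and the toy data are assumptions for illustration, and the regularization term is left out.

```python
def pairwise_hinge_loss(score, docs, labels):
    """Sum of max(0, 1 - (score(d_j) - score(d_l))) over all pairs
    where document j is labeled higher than document l."""
    loss = 0.0
    for d_j, y_j in zip(docs, labels):
        for d_l, y_l in zip(docs, labels):
            if y_j > y_l:
                margin = score(d_j) - score(d_l)
                loss += max(0.0, 1.0 - margin)
    return loss


def linear_score(d, w=(1.0, 0.0)):
    """A toy linear scorer standing in for the ranking model."""
    return sum(w_v * d_v for w_v, d_v in zip(w, d))


docs = [[1.0, 0.0], [0.5, 1.0], [0.0, 0.0]]
labels = [1, 0, 0]  # 1 = clicked, 0 = skipped (assumed labeling)
loss = pairwise_hinge_loss(linear_score, docs, labels)
```

Any scoring function can be passed in, so the same loss applies to a global model or to a per-user adapted one.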
A fully generative model for exploring users' search behaviors
1. Draw latent user groups from DP: $\mu_{ct} \sim N(\mu_0, \sigma_0^2)$, $\sigma_{ct}^2 \sim \mathrm{Gamma}(\alpha_0, \beta_0)$, $\beta_{cv} \sim N(0, \sigma_0^2)$
2. Draw group membership for each user from DP: $\{\pi_c\}_{c=1}^{\infty} \sim DP(\gamma, \cdot)$
3. To generate a query in user u:
3.1 Draw a latent user group c: $c_i \sim \pi_u$
3.2 Draw query $q_i$ for user u accordingly: $q_i \sim N(\mu_c, \sigma_c^2 I)$
3.3 Draw click preferences for $q_i$ accordingly: $P(D_i \mid q_i) = \prod_{y_{ij} > y_{it}} \frac{1}{1 + \exp(-\beta_c^{T}(d_{ij} - d_{it}))}$
Gibbs sampling for posterior inference
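The generative steps can be simulated with a short Python sketch. It is only an approximation under stated assumptions: the Dirichlet Process is replaced by a truncated stick-breaking construction over a fixed number of groups, Gamma is read as shape/scale, variances are drawn per query dimension, and all function names and hyperparameter values are hypothetical.

```python
import math
import random


def stick_breaking(gamma, num_groups, rng):
    """Truncated stick-breaking weights (a finite stand-in for a DP draw)."""
    weights, remaining = [], 1.0
    for _ in range(num_groups - 1):
        frac = rng.betavariate(1.0, gamma)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass goes to the last group
    return weights


def draw_group(dim_q, dim_d, mu0, sigma0, alpha0, beta0, rng):
    """Step 1: draw one group's (mu, sigma^2, beta) from the base distribution."""
    mu = [rng.gauss(mu0, sigma0) for _ in range(dim_q)]
    # Assumption: Gamma(alpha0, beta0) is interpreted as shape/scale.
    var = [rng.gammavariate(alpha0, beta0) for _ in range(dim_q)]
    beta = [rng.gauss(0.0, sigma0) for _ in range(dim_d)]
    return mu, var, beta


def preference_prob(beta, d_j, d_t):
    """Probability that d_j is preferred over d_t (pairwise logistic model)."""
    z = sum(b * (a - c) for b, a, c in zip(beta, d_j, d_t))
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
num_groups = 5
groups = [draw_group(2, 3, 0.0, 1.0, 2.0, 1.0, rng) for _ in range(num_groups)]
pi_u = stick_breaking(gamma=1.0, num_groups=num_groups, rng=rng)  # step 2

c_i = rng.choices(range(num_groups), weights=pi_u)[0]             # step 3.1
mu, var, beta = groups[c_i]
q_i = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)]       # step 3.2
p = preference_prob(beta, [1.0, 0.0, 0.5], [0.0, 1.0, 0.5])       # step 3.3
```

Posterior inference runs in the opposite direction: given observed queries and clicks, Gibbs sampling resamples each query's group assignment and the group parameters.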
• Document ranking: $s(d_{it}, q_i) = \sum_{c} P(c \mid q_i)\, \beta_c^{T} d_{it}$
Experimental Results
• Yahoo! News search logs
• May to July, 2011
• 65 ranking features for each Query-Document pair
• Document ranking performance
Method   MAP     P@1     P@3     MRR
URSVM    0.487   0.298   0.220   0.501
GRSVM    0.616   0.446   0.283   0.632
TRSVM    0.622   0.459   0.283   0.638
IRSVM    0.617   0.449   0.281   0.632
dpRank   0.642   0.485   0.290   0.658
• Query distribution in latent user groups
Group   Top Ranked Queries
10      today in history, nascar 2011 schedule, today history, this day in history
9       miami heat, los angeles lakers, liverpool football club, arsenal football, nfl lockout
8       los angeles lakers, arsenal football, the dark knight rises, transformers 3, manchester united
7       the titanic, the bachelorette, cars 2, hangover 2, the voice
6       tree of life, game of thrones, sonic the hedgehog, world of warcraft, mtv awards 2011
5       joplin missing, apple icloud, sony hackers, google subpoena, ford transmission
4       casey anthony trial, casey anthony jurors, casey anthony, crude oil prices, air france flight 447
3       fake tupac story, pbs hackers, alaska earthquake, southwest pilot, arizona wildfires
2       selena gomez, lady gaga, britney spears, jennifer aniston, taylor swift
1       iran, china, libya, vietnam, Syria
• Click preferences in latent user groups
[Figure: heat map of click preferences by Group ID and Feature ID; labeled features: site authority, proximity in title, query match in title, document age.]
Pairwise ranking model: $\frac{1}{1 + \exp(-\beta_c^{T}(d_{ij} - d_{it}))}$

RankSVM (primal):
$\min_{w, \xi_{ijl}} \frac{1}{2}\|w\|^2 + C \sum_{i,j,l} \xi_{ijl}$
s.t. $w^{T} \Delta x_{ijl} \ge 1 - \xi_{ijl}$, $\xi_{ijl} \ge 0$, $\forall q_i, x_{ij}, x_{il}$
where $y_{ij} > y_{il}$ and $\Delta x_{ijl} = x_{ij} - x_{il}$
Dual, with non-linear kernels and margin rescaling:
$\max_{\alpha} \sum_t \alpha_t (1 - f^s(x_t)) - \frac{1}{2} \alpha^{T} [K_1(x, x) + K_2(x, x)] \alpha$
s.t. $0 \le \alpha_t \le C$, $\forall t$
where $K_1(x_t, x_i) = \sum_{k} \big(\sum_{g(v)=k} w_v^s x_{tv}\big)\big(\sum_{g(v)=k} w_v^s x_{iv}\big)$, and $K_2$ is built the same way from the unweighted sums $\sum_{g(v)=k} x_{tv}$ and $\sum_{g(v)=k} x_{iv}$
Use cross-training to determine feature grouping

Experimental Results
• Bing query log: May 27, 2012 to May 31, 2012
• 1830 ranking features
                 # Users   # Queries   # Documents
Annotation Set   -         49,782      2,320,711
User Set         34,827    187,484     1,744,969

User Class   Queries   % Population
Heavy        [10, ∞)   6.8
Medium       [5, 10)   14.9
Light        (0, 5)    78.3
• Adaptation efficiency
per-user basis adaptation baseline
• Query-level improvement against global model
• User-level improvement against global model
User Class   Method   ΔMAP      ΔP@1      ΔP@3      ΔMRR
Heavy        RA       0.1843    0.3309    0.0120    0.1832
Heavy        Cross    0.1998    0.3523    0.0182    0.1994
Medium       RA       0.1102    0.2129    0.0025    0.1103
Medium       Cross    0.1494    0.2561    0.0208    0.1500
Light        RA       0.0042    0.0575    -0.0221   0.0041
Light        Cross    0.0403*   0.0894*   -0.0021   0.0406*
* Indicates p-value<0.01
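The results are reported in MAP, P@1, P@3 and MRR. As a reminder of how such metrics are commonly computed for one query, here is a minimal Python sketch that assumes binary relevance derived from clicks (1 = clicked, 0 = not clicked) over a ranked list. The function names and the example ranking are hypothetical, and the poster does not state the exact evaluation protocol.

```python
def precision_at_k(clicked, k):
    """Fraction of the top-k ranked results that were clicked."""
    return sum(clicked[:k]) / k


def reciprocal_rank(clicked):
    """1 / rank of the first clicked result, or 0 if nothing was clicked."""
    for rank, c in enumerate(clicked, start=1):
        if c:
            return 1.0 / rank
    return 0.0


def average_precision(clicked):
    """Mean of precision values taken at each clicked position."""
    hits, total = 0, 0.0
    for rank, c in enumerate(clicked, start=1):
        if c:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Ranked list for one query: results at ranks 2 and 4 were clicked.
ranking = [0, 1, 0, 1]
p_at_1 = precision_at_k(ranking, 1)
p_at_3 = precision_at_k(ranking, 3)
rr = reciprocal_rank(ranking)
ap = average_precision(ranking)
```

MAP and MRR are the means of `average_precision` and `reciprocal_rank` over all evaluated queries.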