Transcript Document

A Structured Approach to Query Recommendation
With Social Annotation Data
童薇
2010/12/3
Outline

- Motivation
- Challenges
- Approach
- Experimental Results
- Conclusions
Motivation

Query Recommendation
- Help users search
- Improve the usability of search engines
- Recommend what?

Existing Work
- Search interests: stick to the user's search intent
- Recommend equivalent or highly related queries
  (e.g., for "apple iphone": smartphones, nexus one, apple products, ipod touch, mobileme)
Anything Missing?

- Exploratory interests: vague or latent interests
  - Users are unaware of them until confronted with one
  - May be provoked within a search session
- Is the existence of exploratory interests common and significant?

Identified from search user behavior analysis


Make use of one-week log search data
Verified by Statistical Tests(Log-likehood Ratio Test)

Analyze the causality between initial queries and consequent queries

Results

In 80.9% of cases: Clicks on search results indeed affect
the formulation of the next queries
 In
43.1% of cases: Users would issue different next
queries if they clicked on different results
2010/12/3
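The log-likelihood ratio test mentioned above can be sketched on a toy 2x2 contingency table. The counts below are invented purely for illustration, not the study's data:

```python
import numpy as np

# Hedged sketch of a log-likelihood ratio (G) test of independence on a
# hypothetical 2x2 table: rows = which search result was clicked,
# columns = which next query was issued.
obs = np.array([[30.0, 10.0],
                [8.0, 32.0]])

# Expected counts under independence of clicks and next queries.
exp = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()

# G statistic; under the null it is approximately chi-square with 1 df,
# so G > 3.84 rejects independence at the 5% level.
G = 2.0 * np.sum(obs * np.log(obs / exp))
print(G > 3.84)  # True: clicks and next queries are dependent in this toy table
```

A significant G for many users' sessions is the kind of evidence the slide summarizes: the clicked result changes which query comes next.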
Two different heading directions of Query Recommendation

- Emphasize search interests:
  - Help users easily refine their queries and find what they need more quickly
  - Enhance the "search-click-leave" behavior
- Focus on exploratory interests:
  - Attract more user clicks and make search and browse more closely integrated
  - Increase the staying time and advertisement revenue

Goal: recommend queries that satisfy both the search and exploratory interests of users simultaneously
(Figure: for "apple iphone", equivalent or highly related queries such as nexus one, ipod touch, mobileme)
Challenges

- To leverage what kind of data resource?
  - Search logs: interactions between search users and search engines
  - Social annotation data: keywords assigned according to the content of the pages ("wisdom of crowds")
- How to present such recommendations to users?
  - Refine queries
  - Stimulate exploratory interests
Approach

- Query Relation Graph
  - A one-mode graph with the nodes representing all the unique queries and the edges capturing relationships between queries
- Structured Query Recommendation
  - Ranking using expected hitting time
  - Clustering with modularity
  - Labeling each cluster with social tags
Query Relation Graph

Query Formulation Model

The transition probability from query node i to URL node j is the click weight normalized over all URLs clicked for that query:

    P_{V_u|V_q}(j|i) = w_{qu}(i,j) / Σ_{k ∈ V_u} w_{qu}(i,k)

and symmetrically from a URL back to the queries that led to it. In the slide's toy example:

    P_{V_u|V_q}(2|2) = 5 / (3 + 5 + 4)
    P_{V_q|V_u}(1|1) = 2 / (2 + 3)
Query Relation Graph

- Query Formulation Model
- Construction of Query Relation Graph

Edges between queries are obtained by composing the conditional probabilities through the intermediate layers (URLs, and likewise tags); a query-to-query transition is a two-step walk:

    P_{V_q|V_q}(j|i) = Σ_u P_{V_q|V_u}(j|u) · P_{V_u|V_q}(u|i)

Toy example from the slide: P_{V_u|V_q}(2|2) = 5 / (3 + 5 + 4), P_{V_q|V_u}(1|1) = 2 / (2 + 3), P_{V_u|V_t}(1|2) = 1/2.
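The two-step composition above is a matrix product of the two conditional distributions. A sketch on the same guessed toy weights (tags would be folded in the same way):

```python
import numpy as np

# Sketch of constructing query-to-query edges by composing the two
# conditional distributions through the URL layer. Toy weights, not
# the paper's data.
W = np.array([[2.0, 1.0, 0.0],
              [3.0, 5.0, 4.0]])               # query x URL click weights

P_u_q = W / W.sum(axis=1, keepdims=True)      # P(url | query)
P_q_u = (W / W.sum(axis=0, keepdims=True)).T  # P(query | url)

# Two-step walk: query -> URL -> query.
P_qq = P_u_q @ P_q_u
assert np.allclose(P_qq.sum(axis=1), 1.0)     # each row is a distribution
print(P_qq)
```

Because both factors are (row-)stochastic, the composed matrix is a valid transition matrix over queries, which is what the random walk in the next section runs on.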
Ranking with Hitting Time

- Apply a Markov random walk on the graph
- Employ hitting time as a measure to rank queries
  - The expected number of steps before node j is visited, starting from node i
  - The hitting time T is the first time that the random walk is at node j from the start node i:

    P_{V_q|V_q}(j|i)[T = m] = Σ_{k=1}^{n} P_{V_q|V_q}(k|i) · P_{V_q|V_q}(j|k)[T = m - 1]

  - The mean hitting time h(j|i) is the expectation of T under the condition X_0 = i:

    h_{V_q|V_q}(j|i) = Σ_{m=1}^{∞} m · P_{V_q|V_q}(j|i)[T = m | X_0 = i]
Ranking with Hitting Time

- Apply a Markov random walk on the graph
- Employ hitting time as a measure to rank queries
  - The expected number of steps before node j is visited, starting from node i
  - Satisfies the following linear system (the standard hitting-time recurrence):

    h(j|i) = 0                                    if i = j
    h(j|i) = 1 + Σ_k P_{V_q|V_q}(k|i) · h(j|k)    otherwise
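The linear system for mean hitting times can be solved directly. A sketch with a made-up 3-node transition matrix (not the paper's query graph):

```python
import numpy as np

# Mean hitting times to a target node j for a Markov chain with
# transition matrix P, via h(j|j) = 0 and
# h(j|i) = 1 + sum_k P(k|i) h(j|k) for i != j.
def hitting_times(P, j):
    n = P.shape[0]
    others = [i for i in range(n) if i != j]
    Q = P[np.ix_(others, others)]           # walk among non-target nodes
    h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    out = np.zeros(n)
    out[others] = h
    return out

# Toy 3-state chain (rows sum to 1).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
print(hitting_times(P, 2))  # [8. 6. 0.]
```

Smaller hitting time from the input query means the candidate is reached sooner by the walk, which is the ranking signal.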
Clustering with Modularity

- Group the top k recommendations into clusters
  - It is natural to apply a graph clustering approach
  - Modularity function: based on the k x k matrix e, where e_{st} is the fraction of edge weight falling between clusters C_s and C_t, and a_s = Σ_t e_{st} is its row sum:

              C_1   C_2   ...  C_k
        C_1 [ e_11  e_12  ...  e_1k ]  a_1
        C_2 [ e_21  e_22  ...  e_2k ]  a_2
        ... [ ...   ...   ...  ...  ]
        C_k [ e_k1  e_k2  ...  e_kk ]  a_k

        e_{ij} = ( Σ_{V_q^i ∈ C_i} Σ_{V_q^j ∈ C_j} P_{ij} ) / ( Σ_{V_q^i ∈ V_q} Σ_{V_q^j ∈ V_q} P_{ij} )

Note: in a network in which edges fall between vertices without regard for the communities they belong to, we would have e_{st} = a_s · a_t.
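Assuming the standard Newman form Q = Σ_i (e_ii - a_i^2), which is exactly what the note's null model e_st = a_s · a_t leads to, modularity is a one-liner over the cluster-level matrix:

```python
import numpy as np

# Modularity of a partition from the cluster-level edge-fraction matrix
# e: Q = sum_i (e_ii - a_i**2), with a_i the row sums. High Q means
# more within-cluster edge weight than expected under random placement.
def modularity(e):
    a = e.sum(axis=1)
    return float(np.trace(e) - np.sum(a ** 2))

# Toy example: two clusters holding most of the edge weight inside.
e = np.array([[0.45, 0.05],
              [0.05, 0.45]])
print(modularity(e))  # 0.9 - (0.5**2 + 0.5**2) = 0.4
```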
Clustering with Modularity

- Group the top k recommendations into clusters
  - It is natural to apply a graph clustering approach
  - Modularity function
- Employ the fast unfolding algorithm to perform clustering
- Label each cluster explicitly with social tags
  - The expected tag distribution given a query
  - The expected tag distribution under a cluster
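The slide's tag-distribution formulas did not survive extraction. One plausible reading, consistent with the graph construction earlier, propagates query mass through URLs to tags and averages over a cluster's queries. Everything below (the matrices and the cluster assignment) is hypothetical toy data, not the paper's definition:

```python
import numpy as np

# Hypothetical toy distributions; a guess at the slide's definitions.
P_u_q = np.array([[0.4, 0.6],            # P(url | query), rows = queries
                  [0.7, 0.3]])
P_t_u = np.array([[0.5, 0.5, 0.0],       # P(tag | url), rows = URLs
                  [0.2, 0.3, 0.5]])

# Expected tag distribution given a query: propagate through URLs.
P_t_q = P_u_q @ P_t_u

# Expected tag distribution under a cluster: average over its queries.
cluster = [0, 1]
P_t_C = P_t_q[cluster].mean(axis=0)

# The cluster can then be labeled with its highest-probability tags.
print(P_t_C)
```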
Experimental Results

Data set
- Query Logs: Spring 2006 Data Asset (Microsoft Research)
  - 15 million records (from US users) sampled over one month in May 2006
  - 2.7 million unique queries and 4.2 million unique URLs
- Social Annotation Data: Delicious data
  - Over 167 million tag assignments sampled during October and November 2008
  - 0.83 million unique users, 57.8 million unique URLs, and 5.9 million unique tags
- Query Relation Graph: 538,547 query nodes

Baseline Methods
- BiHit: hitting-time approach based on query logs (Mei et al., CIKM '08)
- TriList: list-based approach to query recommendation considering both search and exploratory interests
- TriStructure: our approach

Examples of Recommendation Results

(Table: recommendations for the query "espn" under BiHit, TriList, and TriStructure. BiHit and TriList return flat lists such as espn radio, espn news, espn mlb, espn sports, espn nba, bill simmons, yahoo sports, nba news, cbs sportsline, sporting news, scout, sportsline, sports illustrated; TriStructure groups the same recommendations into clusters labeled with social tags, e.g. [sports espn news] for ESPN-specific queries and [sports news scores] for general sports sites.)
Examples of Recommendation Results

(Table: recommendations for the query "24" under BiHit, TriList, and TriStructure. BiHit and TriList return flat lists such as 24 season 5, 24 series, fox 24, kiefer sutherland, jack bauer, 24 tv show, 24 spoilers, tv guide, tv listings, fox, abc, grey's anatomy, desperate housewives, prison break, one tree hill; TriStructure groups them into clusters labeled [tv 24 entertainment], [tv televisions entertainment], and [tv television series].)
Manual Evaluation

- Comparison based on users' click behavior
  - A labeling tool to simulate the real search scenario
  - Label how likely the user would be to click (6-point scale)
  - Randomly sampled 300 queries, 9 human judges

Experimental Results (cont.)

- A non-zero label score is counted as a click
- Metrics: Clicked Recommendation Number (CRN), Clicked Recommendation Score (CRS), Total Recommendation Score (TRS)

(Figures: Overall Performance; Click Performance Comparison; Distributions of Labeled Score over Recommendations)
Experimental Results (cont.)

How Structure Helps
- How the structured approach affects users' click willingness
  - Click Entropy

(Figure: the Average Click Entropy over Queries under the TriList and TriStructure Methods)
Experimental Results (cont.)

How Structure Helps
- How the structured approach affects users' click patterns
  - Label Score Correlation

(Figure: Correlation between the Average Label Scores on Same Recommendations for Queries)
Conclusions

- Recommend queries in a structured way to better satisfy both the search and exploratory interests of users
- Introduce social annotation data as an important resource for recommendation
- Better satisfy users' interests and significantly enhance users' click behavior on recommendations

Future work
- Trade-off between diversity and concentration
- Tag propagation
Thanks!