Transcript Document
A Structured Approach to Query Recommendation With Social Annotation Data
童薇, 2010/12/3

Outline
- Motivation
- Challenges
- Approach
- Experimental Results
- Conclusions

Motivation: Query Recommendation
- Help users search and improve the usability of search engines.
- Recommend what?

Existing Work: Search Interests
- Stick to the user's search intent by recommending equivalent or highly related queries.
- Example: for "apple iphone", recommend smartphones, nexus one, apple products, ipod touch, mobileme.

Anything Missing? Exploratory Interests
- Vague or latent interests that users are unaware of until they are faced with one.
- May be provoked within a search session.

Is the existence of exploratory interest common and significant?
- Identified from an analysis of search user behavior over one week of search log data.
- Verified by statistical tests (log-likelihood ratio test) on the causality between initial queries and consequent queries.
- Results: in 80.9% of cases, clicks on search results indeed affect the formulation of the next queries; in 43.1% of cases, users would issue different next queries if they clicked on different results.

Two Different Directions of Query Recommendation
- Emphasize search interests: help users easily refine their queries and find what they need more quickly; this enhances the "search-click-leave" behavior.
- Focus on exploratory interests: attract more user clicks and make search and browsing more closely integrated; this increases staying time and advertisement revenue.
- Goal: recommend queries that satisfy both the search and exploratory interests of users simultaneously (as in the "apple iphone" example above).

Challenges
- What kind of data resource to leverage? Search logs capture the interactions between search users and search engines; social annotation data provides keywords assigned according to the content of the pages, the "wisdom of crowds".
- How to present such recommendations to users, so that they both help refine queries and stimulate exploratory interests?

Approach
- Query Relation Graph: a one-mode graph with the nodes representing all the unique queries and the edges capturing relationships between queries.
- Structured Query Recommendation: ranking with expected hitting time, clustering with modularity, and labeling each cluster with social tags.

Query Relation Graph: Query Formulation Model
- Queries (Vq), URLs (Vu), and tags (Vt) are connected by click-through and annotation weights.
- The transition probability from query i to URL j is P_{Vu|Vq}(j|i) = w_qu(i,j) / Σ_{k ∈ Vu} w_qu(i,k); P_{Vq|Vu}, P_{Vt|Vu}, and P_{Vu|Vt} are defined analogously from the same co-occurrence weights. In the slide's small example graph, P_{Vu|Vq}(2|2) = 5 / (3+5+4) and P_{Vq|Vu}(1|1) = 2 / (2+3).
- Construction of the Query Relation Graph: the query-to-query transition probability P_{Vq|Vq}(j|i) is obtained by chaining these one-step probabilities along walks that lead from query i through URLs and tags back to query j; a sketch of this construction follows below.
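The following sketch shows one plausible way to code this construction: build the one-step conditional probabilities from raw click and tag counts, then chain them into query-to-query transition weights. The helper names (transpose, normalize, query_to_query), the toy counts, and the specific query -> URL -> tag -> URL -> query path combination are assumptions made for illustration, not the exact formulation used in the paper.

```python
from collections import defaultdict

def transpose(counts):
    """Flip {src: {dst: weight}} into {dst: {src: weight}}."""
    flipped = defaultdict(dict)
    for src, row in counts.items():
        for dst, w in row.items():
            flipped[dst][src] = w
    return flipped

def normalize(counts):
    """Row-normalize raw co-occurrence counts into conditionals P(dst | src)."""
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: w / total for dst, w in row.items()} if total else {}
    return probs

# Hypothetical toy counts: w_qu from click-through logs, w_ut from Delicious tags.
w_qu = {"q1": {"u1": 2, "u2": 3}, "q2": {"u2": 5, "u3": 4}}
w_ut = {"u1": {"t1": 1}, "u2": {"t1": 2, "t2": 1}, "u3": {"t2": 3}}

p_u_q = normalize(w_qu)             # P_{Vu|Vq}(url | query)
p_q_u = normalize(transpose(w_qu))  # P_{Vq|Vu}(query | url)
p_t_u = normalize(w_ut)             # P_{Vt|Vu}(tag | url)
p_u_t = normalize(transpose(w_ut))  # P_{Vu|Vt}(url | tag)

def query_to_query(qi):
    """Chain the one-step conditionals along query -> URL -> tag -> URL -> query
    walks to obtain transition weights from qi to other queries; one plausible
    way of folding the tag data into the query relation graph."""
    out = defaultdict(float)
    for u, pu in p_u_q.get(qi, {}).items():
        for t, pt in p_t_u.get(u, {}).items():
            for u2, pu2 in p_u_t.get(t, {}).items():
                for qj, pq in p_q_u.get(u2, {}).items():
                    out[qj] += pu * pt * pu2 * pq
    return dict(out)

print(query_to_query("q1"))  # transition weights over {"q1", "q2"}, summing to 1
```

On real log and Delicious data the weight dictionaries would be large and sparse, so the same chaining would more naturally be expressed as sparse matrix products over the query-URL and URL-tag matrices.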
Ranking with Hitting Time
- Apply a Markov random walk on the query relation graph and employ the hitting time as the measure to rank queries: the expected number of steps before node j is visited when starting from node i.
- The hitting time T is the first time the random walk is at node j when starting from node i. The first-hit probabilities satisfy the recursion
  P_{Vq|Vq}(j|i)[T = m] = Σ_{k=1}^{n} P_{Vq|Vq}(k|i) · P_{Vq|Vq}(j|k)[T = m-1],
  with the base case P_{Vq|Vq}(j|k)[T = 0] = 1 if k = j and 0 otherwise.
- The mean hitting time h_{Vq|Vq}(j|i) is the expectation of T under the condition X_0 = i:
  h_{Vq|Vq}(j|i) = Σ_{m ≥ 1} m · P_{Vq|Vq}(j|i)[T = m | X_0 = i].
- It satisfies the following linear system:
  h_{Vq|Vq}(j|i) = 1 + Σ_{k ≠ j} P_{Vq|Vq}(k|i) · h_{Vq|Vq}(j|k), with h_{Vq|Vq}(j|j) = 0.
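A minimal sketch of solving this linear system, assuming the transition matrix of the query relation graph is available as a dense NumPy array; the function name, the toy matrix, and the direct dense solve are illustrative choices, not the paper's implementation.

```python
import numpy as np

def expected_hitting_time(P, j):
    """Mean hitting time h(j | i) for every start node i, where P[i, k] is
    the transition probability from node i to node k.

    Solves h(j|i) = 1 + sum_{k != j} P[i, k] * h(j|k), with h(j|j) = 0,
    by restricting the system to the non-target nodes."""
    n = P.shape[0]
    others = [i for i in range(n) if i != j]      # all nodes except the target j
    Q = P[np.ix_(others, others)]                 # transitions among non-target nodes
    h_others = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    h = np.zeros(n)
    h[others] = h_others
    return h                                      # h[i] = expected steps from i to j

# Hypothetical 3-node example; h[2] is 0 because node 2 is the target itself.
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]])
print(expected_hitting_time(P, j=2))
```

On a graph with hundreds of thousands of query nodes a dense solve is impractical, so one would more likely iterate the truncated recursion above over the sparse transition matrix; the dense version only makes the linear system concrete before the top-ranked queries are passed on to the clustering step.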
Clustering with Modularity
- Group the top k recommendations into clusters; it is natural to apply a graph clustering approach.
- Modularity function: let e_st be the fraction of the edge weight P_ij in the query relation graph that connects queries in cluster C_s to queries in cluster C_t,
  e_st = (Σ_{q_i ∈ C_s} Σ_{q_j ∈ C_t} P_ij) / (Σ_{i,j} P_ij),
  and let a_s = Σ_t e_st be the corresponding row sum of the k × k matrix (e_st).
- Note: in a network in which edges fall between vertices without regard for the communities they belong to, we would have e_st = a_s · a_t; modularity measures the deviation of the within-cluster fractions from this expectation, Q = Σ_s (e_ss - a_s^2).
- Employ the fast unfolding algorithm to perform the clustering.
- Label each cluster explicitly with social tags, using the expected tag distribution given a query and the expected tag distribution under a cluster.

Experimental Results: Data Set
- Query logs: Spring 2006 Data Asset (Microsoft Research); 15 million records (from US users) sampled over one month in May 2006; 2.7 million unique queries and 4.2 million unique URLs.
- Social annotation data: Delicious data; over 167 million taggings sampled during October and November 2008; 0.83 million unique users, 57.8 million unique URLs, and 5.9 million unique tags.
- Query relation graph: 538,547 query nodes.

Baseline Methods
- BiHit: hitting time approach based on query logs (Mei et al., CIKM ’08).
- TriList: list-based approach to query recommendation considering both search and exploratory interests.
- TriStructure: our approach.

Examples of Recommendation Results

Query = espn
- BiHit: espn magazine, espn go, espn news, espn sports, esonsports, baseball news espn, espn mlb, sports news, espn radio, espn 103.3, espn cell phone, espn baseball, sports mobile espn, espn hockey.
- TriList: espn radio, espn news, espn nba, espn mlb, espn sports, yahoo sports, nba news, cbs sportsline, bill simmons, sports, sporting news, scout, sportsline, sports illustrated, fox sports.
- TriStructure: the same recommendations as TriList, grouped under social-tag labels, e.g. [sports espn news] (espn radio, espn news, espn nba, espn mlb, espn sports) and [sports news scores] (yahoo sports, nba news, cbs sportsline, bill simmons, sporting news, scout, sportsline, sports illustrated, fox sports).

Query = 24
- BiHit: 24 season 5, 24 series, fox 24, kiefer sutherland, 24 tv show, 24 on fox, 24 fox, jack bauer, tv show 24, 24 hour, 24 spoilers, fox television network, fox broadcasting, fox sports net, fox sport, ktvi 2 fox, fox five news.
- TriList: fox 24, kiefer sutherland, 24 tv show, 24 fox, 24 on fox, jack bauer, 24 spoilers, tv guide, abc, tv listings, fox, fox tv, grey's anatomy, desperate housewives, prison break, one tree hill.
- TriStructure: the same recommendations as TriList, grouped under the labels [tv 24 entertainment], [tv televisions entertainment], and [tv television series].

Manual Evaluation
- Comparison based on users' click behavior.
- A labeling tool simulates the real search scenario; judges label how likely the user would be to click each recommendation on a 6-point scale.
- Randomly sampled 300 queries, 9 human judges.

Experimental Results (cont.)
- A recommendation with a non-zero label score is treated as a click.
- Metrics: Clicked Recommendation Number (CRN), Clicked Recommendation Score (CRS), Total Recommendation Score (TRS).
- Overall performance (figures: Click Performance Comparison; Distributions of Labeled Score over Recommendations).

How Structure Helps
- How the structured approach affects users' click willingness: click entropy (figure: the Average Click Entropy over Queries under the TriList and TriStructure Methods).
- How the structured approach affects users' click patterns: label score correlation (figure: Correlation between the Average Label Scores on the Same Recommendations for Queries).

Conclusions
- Recommend queries in a structured way to better satisfy both the search and exploratory interests of users.
- Introduce social annotation data as an important resource for recommendation.
- The approach better satisfies users' interests and significantly enhances users' click behavior on recommendations.
- Future work: the trade-off between diversity and concentration; tag propagation.

Thanks!