Transcript Slide 1
Microsoft adCenter Log ECIR 2009

Introduction
• Standard user intent categories of Web queries
  – Navigational
  – Informational
  – Transactional
• Sponsored search
  – Online commercial intention (purchase)
  – Noncommercial (research)

Intent Taxonomy
• Taxonomy:
  – A commercial query is a query with the underlying intention to make an immediate or future purchase of a specific product or service
  – All other queries are placed in the noncommercial category
  – A navigational query is a query with the underlying intention to locate a specific Web site or page
  – An informational query is everything else

Related Work
• Dai et al. propose a commercial query detector
  – frequent queries are more likely to have commercial intent
• Lee et al. predict user query goals in terms of navigational and informational intent
  – Past user-click behavior
  – Anchor-link distribution
• Regelson and Fain estimate the click through rate of new ads
  – Using the click through rates of existing ads with the same bid terms or topic clusters

Data Set
• Microsoft adCenter Logs
  – 100 million search impressions
  – an impression is defined as a single search result page
• Filter
  – removed any extra spaces (at the beginning, at the end, and between words)
  – removed queries occurring only once (mostly with no ads; 27 million queries are filtered)
  – removed impressions with a duplicate combination of impression id and user session id
  – kept only queries with at least four ad clicks
• Because the analysis deals with the empirical ad click through of queries, the measured rate may be wildly different from the true click through rate for queries with a small number of ads, leading to noise.
• Randomly partitioned the data into three equal-sized sets

Features and Classifier
• Classifier: SVM
• Features:
  – query based features
    • query strings
    • the content of search engine result pages returned (snippet, anchor text)
  – content of search result pages
    • submit each query to the Live search engine and download the first search engine result page (SERP) for that query (web page content)
  – ad click through features
    • extracted from the impression and click through data recorded for each query
• Ad text is not included, avoiding any possible distortion that ad keywords might produce in the classification.

Ground Truth
• Ground truth: 1700 queries were selected for manual classification
  – the query was contained in the training data
  – the ad click frequency of the query was greater than or equal to 11
• Each selected query was then manually labeled as
  – Commercial (42%), noncommercial (58%)
  – Navigational (60%), informational (40%)
  – by three independent annotators
• Agreement
  – commercial/noncommercial (81%)
  – navigational/informational (87%)

Intent Prediction Performance

Estimating Number of Ad Clicks for Queries
• Ai is a set of impressions, where i is the number of displayed ads for the impressions in that set
• The value |Ai| indicates the number of impressions with i ads displayed
• cij is 1 if an ad click resulted from the jth such impression, and 0 otherwise

Click to Impression Ratio

Estimating Number of Ad Clicks for Queries
• The number of ad clicks for a given query q can be estimated based on
  – the number of ads displayed for q
  – the number of unique impressions in which the query appears
• denote the number of times query q appears in the impressions with i ads

Conclusion
• Click through features, query features, and the content of search engine result pages are together effective in detecting query intent.
• Modeling query intent can improve the accuracy of predicting ad click through for previously unseen queries.
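
The quantities defined on the "Estimating Number of Ad Clicks for Queries" slides can be combined in a small sketch. This is only one plausible reading, not the authors' stated formula. It assumes the click to impression ratio for i ads is the sum of cij over j divided by |Ai|, and that the estimate for a query q sums, over i, the number of impressions of q with i ads multiplied by that ratio. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def click_to_impression_ratios(impressions):
    """Map i (ads displayed) to the assumed ratio sum_j(cij) / |Ai|.

    impressions: iterable of (num_ads, clicked) pairs, where clicked is
    1 if the impression led to an ad click and 0 otherwise (cij).
    """
    totals = defaultdict(int)  # |Ai|: impressions with i ads displayed
    clicks = defaultdict(int)  # sum of cij over the impressions in Ai
    for num_ads, clicked in impressions:
        totals[num_ads] += 1
        clicks[num_ads] += 1 if clicked else 0
    return {i: clicks[i] / totals[i] for i in totals}


def estimate_ad_clicks(query_counts, ratios):
    """Estimate ad clicks for one query (assumed form of the estimate).

    query_counts: {i: number of times the query appears in impressions
    with i ads}. A value of i with no observed ratio contributes 0.
    """
    return sum(n * ratios.get(i, 0.0) for i, n in query_counts.items())


# Toy log: four impressions with 1 ad (one click), four with 3 ads (two clicks).
impressions = [(1, 0), (1, 1), (1, 0), (1, 0), (3, 1), (3, 1), (3, 0), (3, 0)]
ratios = click_to_impression_ratios(impressions)

# Hypothetical unseen query: appears 8 times with 1 ad and 4 times with 3 ads.
estimated = estimate_ad_clicks({1: 8, 3: 4}, ratios)
print(ratios, estimated)
```

Under these assumptions the estimate needs only the ad counts and impression counts for a query, which is why it can be applied to queries with no click history of their own.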