Transcript: Slide 1

Microsoft adCenter Log
ECIR 2009
Introduction
• Standard user-intent categories for Web queries
– Navigational
– Informational
– Transactional
• Sponsored search
– Online commercial intention (purchase)
– Noncommercial (research)
Intent Taxonomy
• Taxonomy:
– A commercial query is defined as a query with the
underlying intention to make an immediate or future
purchase of a specific product or service
– Place all other queries into the noncommercial
category
– A navigational query is defined as a query with the
underlying intention to locate a specific Web site
or page
– An informational query is everything else
Related Work
• Dai et al. propose a commercial query detector
– frequent queries are more likely to have commercial intent
• Lee et al. predict user query goals in terms of
navigational and informational intent
– Past user-click behavior
– Anchor-link distribution
• Regelson and Fain estimate the click through rate of
new ads
– Using the click through rates of existing ads with the same
bid terms or topic clusters
Data Set
• Microsoft adCenter Logs
– 100 million search impressions
– an impression is defined as a single search result page
• Filter
– removed extra whitespace (leading, trailing, and between words)
– removed queries occurring only once (mostly with no ads; 27 million queries were filtered out)
– removed impressions with a duplicate combination of impression id and user
session id
– kept only queries with at least four ad clicks
• Because our analysis relies on the empirical ad click-through of queries, the
estimate may differ wildly from the true click-through rate for queries with
few ads, which introduces noise.
• Randomly partitioned the data into three equal-sized sets
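The filtering and partitioning steps above could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the record fields (`impression_id`, `session_id`, `query`, `ad_clicks`) are a hypothetical schema, duplicates are handled by keeping the first copy (the slide does not say whether all copies were dropped), and the three-way split is applied to whatever list is passed in.

```python
import random
import re
from collections import Counter

def normalize_query(query):
    """Collapse runs of whitespace and strip leading/trailing spaces."""
    return re.sub(r"\s+", " ", query).strip()

def filter_impressions(impressions, min_ad_clicks=4):
    """Apply the slide's filters to impression dicts (hypothetical schema)."""
    seen_ids = set()
    deduped = []
    for imp in impressions:
        key = (imp["impression_id"], imp["session_id"])
        if key in seen_ids:  # duplicate (impression id, session id) pair
            continue         # assumption: keep the first copy only
        seen_ids.add(key)
        deduped.append({**imp, "query": normalize_query(imp["query"])})

    query_counts = Counter(imp["query"] for imp in deduped)
    click_counts = Counter()
    for imp in deduped:
        click_counts[imp["query"]] += imp["ad_clicks"]

    return [
        imp for imp in deduped
        if query_counts[imp["query"]] > 1                # drop queries seen only once
        and click_counts[imp["query"]] >= min_ad_clicks  # at least four ad clicks
    ]

def partition_three(items, seed=0):
    """Randomly split items into three (near-)equal-sized sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k::3] for k in range(3)]

impressions = [
    {"impression_id": 1, "session_id": "a", "query": "  cheap   flights ", "ad_clicks": 3},
    {"impression_id": 2, "session_id": "a", "query": "cheap flights", "ad_clicks": 2},
    {"impression_id": 2, "session_id": "a", "query": "cheap flights", "ad_clicks": 2},
    {"impression_id": 3, "session_id": "b", "query": "weather", "ad_clicks": 0},
]
kept = filter_impressions(impressions)
parts = partition_three(range(9))
```

In this toy input, the repeated (2, "a") record is dropped as a duplicate and "weather" is dropped because it occurs only once, leaving the two "cheap flights" impressions.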
Features and Classifier
• Classifier: SVM
• Features:
– query based features
• query strings
• the content of search engine result pages returned (snippet, anchor
text)
– content of search result pages
• submit each query to the Live search engine and download the first
search engine result page (SERP) for that query (Web page content)
– ad click through features
• extracted according to the impression and click through data
recorded for each query.
• Ad text is not included, avoiding any possible
distortion that ad keywords might produce in the
classification.
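As a rough illustration of how the three feature groups might be combined into one input for a classifier, the sketch below builds a sparse feature dictionary per query. The function name, the prefixes (`q:`, `serp:`, `page:`), the whitespace tokenizer, and the single `ctr` feature are all assumptions made for the example; the slides do not specify the actual encoding. Ad text is deliberately left out, as stated above.

```python
from collections import Counter

def tokenize(text):
    """Very simple lowercase whitespace tokenizer (an assumption)."""
    return text.lower().split()

def build_features(query, serp_text, page_text, impressions, ad_clicks):
    """Combine query, SERP, page-content, and click-through features.

    All argument names are hypothetical; ad text is not used.
    """
    features = Counter()
    for tok in tokenize(query):        # query-string features
        features["q:" + tok] += 1
    for tok in tokenize(serp_text):    # SERP snippet / anchor-text features
        features["serp:" + tok] += 1
    for tok in tokenize(page_text):    # content of result pages
        features["page:" + tok] += 1
    # empirical ad click-through for the query
    features["ctr"] = ad_clicks / impressions if impressions else 0.0
    return dict(features)

f = build_features("buy laptop", "Laptop deals", "shop laptop online", 10, 2)
```

Dictionaries of this shape could then be vectorized and passed to any SVM implementation; which library and kernel were used is not stated in the slides.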
Ground Truth
• Ground truth: 1,700 queries were selected for
manual classification, each satisfying two conditions
– the query was contained in the training data
– the ad click frequency of the query was greater than or
equal to 11
• Each selected query was then manually labeled by
three independent annotators as
– commercial (42%) or noncommercial (58%)
– navigational (60%) or informational (40%)
• Agreement
– commercial/noncommercial (81%)
– navigational/informational (87%)
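One simple way to compute an agreement figure for three annotators is the mean pairwise percent agreement, sketched below. This is an assumption: the slides do not say which agreement measure produced the percentages above, and the labels in the example are made up.

```python
from itertools import combinations

def pairwise_agreement(labels_by_annotator):
    """Mean fraction of items on which each pair of annotators agrees."""
    rates = []
    for a, b in combinations(labels_by_annotator, 2):
        matches = sum(1 for x, y in zip(a, b) if x == y)
        rates.append(matches / len(a))
    return sum(rates) / len(rates)

# Toy labels for four queries ("c" = commercial, "n" = noncommercial).
annotator_1 = ["c", "n", "c", "n"]
annotator_2 = ["c", "n", "n", "n"]
annotator_3 = ["c", "c", "c", "n"]
agreement = pairwise_agreement([annotator_1, annotator_2, annotator_3])
```

Here the three pairs agree on 3/4, 3/4, and 2/4 of the queries, so the mean pairwise agreement is 2/3.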
Intent Prediction Performance
Estimating Number of Ad Clicks for
Queries
• A_i denotes the set of impressions in which
i ads were displayed
• The value |A_i| indicates the number of
impressions with i ads displayed
• c_ij = 1 represents that an ad click resulted
from the jth impression in A_i, and c_ij = 0
otherwise
Click to Impression Ratio
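A natural reading of this ratio, given the impression sets grouped by number of displayed ads and the 0/1 click indicators defined on the previous slide, is the fraction of impressions in each group that produced an ad click. The sketch below assumes that reading; the function name and the toy data are hypothetical, and the slide's own formula did not survive in the transcript.

```python
def click_to_impression_ratio(clicks_by_ad_count):
    """Map each ad count i to (sum of click indicators) / (number of impressions).

    clicks_by_ad_count[i] is the list of 0/1 click indicators, one per
    impression that displayed i ads.
    """
    return {
        i: sum(indicators) / len(indicators)
        for i, indicators in clicks_by_ad_count.items()
        if indicators  # skip empty groups to avoid division by zero
    }

# Toy data: click indicators grouped by number of displayed ads.
clicks_by_ad_count = {
    1: [0, 0, 1, 0],
    2: [1, 0],
    3: [1, 1, 0, 1],
}
ratios = click_to_impression_ratio(clicks_by_ad_count)
```

With this toy data, the ratios for one, two, and three displayed ads are 0.25, 0.5, and 0.75.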
Estimating Number of Ad Clicks for
Queries
• The number of ad clicks for a given query q
can be estimated based on
– the number of ads displayed for q
– the number of unique impressions in which the
query appears
•
denote the number of times query q
appears in the impressions with i number of
ads.
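One plausible way to combine these two quantities is to weight the number of times the query appears with i ads by the click-to-impression ratio observed for impressions with i ads, then sum over i. The sketch below assumes that form; the slides' actual formula is not present in the transcript, and the function and variable names are hypothetical.

```python
def estimate_ad_clicks(query_counts_by_ad_count, ratio_by_ad_count):
    """Estimate ad clicks for one query.

    query_counts_by_ad_count[i]: times the query appears in impressions with i ads.
    ratio_by_ad_count[i]: click-to-impression ratio for impressions with i ads.
    Ad counts with no known ratio contribute zero (an assumption).
    """
    return sum(
        count * ratio_by_ad_count.get(i, 0.0)
        for i, count in query_counts_by_ad_count.items()
    )

# Toy values: the query appeared 4 times with one ad and 8 times with three ads.
ratio_by_ad_count = {1: 0.25, 2: 0.5, 3: 0.75}
query_counts = {1: 4, 3: 8}
estimated_clicks = estimate_ad_clicks(query_counts, ratio_by_ad_count)
```

For the toy values, the estimate is 4 × 0.25 + 8 × 0.75 = 7 ad clicks.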
Estimating Number of Ad Clicks for
Queries
Conclusion
• Click through features, query features, and the
content of search engine result pages are
together effective in detecting query intent.
• Modeling query intent can improve the
accuracy of predicting ad click through for
previously unseen queries.