The Query-flow Graph: Model and Applications

Paolo Boldi, Francesco Bonchi, Carlos Castillo
Debora Donato, Aristides Gionis, Sebastiano Vigna
Qingyun Wu
Outline
• Introduction
• Basic Concepts
• Building the Query-flow Graph
• Applications
  Finding logical sessions
  Query Recommendation
• Conclusion
Introduction
• Query-Flow Graph
Two ways to define the weighting function
• Finding Logical sessions
• Query Recommendation
Basic Concepts
• Query log: records $(q_i, u_i, t_i, V_i, C_i)$; $L = \{(q_i, u_i, t_i)\}$
• Sessions: $S = \langle \langle q_{i_1}, u_{i_1}, t_{i_1} \rangle, \ldots, \langle q_{i_k}, u_{i_k}, t_{i_k} \rangle \rangle$
• Supersessions: sessions without a timeout threshold
• Chains: sequences of queries with a similar information need
• Query-flow graph: $G_{qf} = (V, E, \omega)$, with $V = Q \cup \{s, t\}$, $E \subseteq V \times V$, $\omega : E \to (0, 1]$
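The difference between sessions and supersessions can be sketched in a few lines of Python. This is only an illustration: the record layout (query, user, timestamp), the function name `split_sessions`, and the 1800-second timeout in the usage note are assumptions, not values taken from the slides.

```python
from collections import defaultdict

def split_sessions(log, timeout=None):
    """Group (query, user, timestamp) records into per-user sessions.

    timeout uses the same unit as the timestamps. With timeout=None no
    cut is made, which yields one supersession per user.
    """
    by_user = defaultdict(list)
    for query, user, ts in log:
        by_user[user].append((ts, query))

    sessions = []
    for records in by_user.values():
        records.sort()  # order each user's queries by timestamp
        current, last_ts = [], None
        for ts, query in records:
            # start a new session when the gap exceeds the timeout
            if current and timeout is not None and ts - last_ts > timeout:
                sessions.append(current)
                current = []
            current.append(query)
            last_ts = ts
        if current:
            sessions.append(current)
    return sessions

# Toy log, invented for illustration.
log = [("apple", "u1", 0), ("apple store", "u1", 60),
       ("jeep", "u1", 5000), ("ipod", "u2", 10)]
```

Calling `split_sessions(log, timeout=1800)` cuts the first user's history into two sessions, while `split_sessions(log)` keeps it as a single supersession.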
Building the Query-flow Graph
• Input: a set of sessions $S(L) = \{S_1, \ldots, S_m\}$
• Two queries $q, q' \in Q$ form a tentative edge if they are consecutive in at least one session of $S(L)$. The set of tentative edges $T$ is:
$T = \{(q, q') \mid \exists S_j \in S(L) \text{ s.t. } q = q_i \in S_j \land q' = q_{i+1} \in S_j\}$
• Key point: how to define the weighting function $\omega : E \to (0, 1]$
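As a minimal sketch, the set of tentative edges T can be collected by pairing each query with its successor in every session. Sessions are assumed here to be plain lists of query strings, and the function name is hypothetical.

```python
def tentative_edges(sessions):
    """Return the set T of (q, q') pairs that are consecutive in some session."""
    edges = set()
    for session in sessions:
        # zip pairs each query q_i with its successor q_{i+1}
        for q, q_next in zip(session, session[1:]):
            edges.add((q, q_next))
    return edges

# Toy sessions, invented for illustration.
sessions = [["apple", "apple store", "ipod"], ["apple", "apple store"]]
T = tentative_edges(sessions)
```

Because T is a set, a pair that occurs in several sessions is stored once; how often it occurs only matters later, when the weights are computed.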
Building the Query-flow Graph
• Weights based on chaining probabilities
Extracted features: textual features, session features, time-related features, etc.
Training data: a random sample of edges, labeled manually.
Machine learning models: a logistic regression model and a rule-based model.
The authors used the selected model to assign the weight $\omega(q, q')$ to each edge $(q, q')$.
Building the Query-flow Graph
• Weights based on relative frequencies
Turns the query-flow graph into a Markov chain.
$f(q)$: the number of times query $q$ appears in the query log.
$f(q, q')$: the number of times query $q'$ immediately follows $q$ in a session.
$f(s, q)$ and $f(q, t)$: the number of times query $q$ is the first and the last query of a session, respectively.
Building the Query-flow Graph
• Weights based on relative frequencies
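The weight we use is:
$$\omega'(q, q') = \begin{cases} \dfrac{f(q, q')}{f(q)} & \text{if } (\omega(q, q') > \theta) \lor (q = s) \lor (q = t) \\ 0 & \text{otherwise,} \end{cases}$$
where the chaining probabilities $\omega(q, q')$ serve to discard pairs whose probability of being part of the same chain does not exceed $\theta$.
Normalization: normalizing the weights of each node's outgoing edges gives the transition matrix $P$ of a Markov chain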
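The relative-frequency weighting can be sketched as follows. Several details are assumptions made for the example: the node labels `"<s>"` and `"<t>"`, the function names, and the `chain_prob` dictionary, which stands in for the output of the learned chaining model. The keep condition mirrors the one stated on the slide.

```python
from collections import Counter

START, END = "<s>", "<t>"  # hypothetical labels for the special nodes s and t

def relative_frequency_weights(sessions, chain_prob, theta):
    """Return {(q, q2): w'} with w' = f(q, q2) / f(q) for the kept edges.

    chain_prob stands in for the chaining probabilities w(q, q2); pairs
    missing from it are treated as probability 0 (an assumption).
    """
    f_node = Counter()  # f(q): occurrences of q; f(s) is the number of sessions
    f_edge = Counter()  # f(q, q2): times q2 immediately follows q
    for session in sessions:
        path = [START, *session, END]
        for q, q2 in zip(path, path[1:]):
            f_node[q] += 1
            f_edge[(q, q2)] += 1

    weights = {}
    for (q, q2), count in f_edge.items():
        # keep condition as stated on the slide; dropped edges have weight 0
        if chain_prob.get((q, q2), 0.0) > theta or q in (START, END):
            weights[(q, q2)] = count / f_node[q]
    return weights

def normalize_rows(weights):
    """Rescale outgoing weights of each node so they sum to 1."""
    totals = Counter()
    for (q, _q2), w in weights.items():
        totals[q] += w
    return {(q, q2): w / totals[q] for (q, q2), w in weights.items()}

# Toy data, invented for illustration.
sessions = [["a", "b"], ["a", "c"], ["a", "b"]]
chain_prob = {("a", "b"): 0.9, ("a", "c"): 0.2,
              ("b", END): 0.9, ("c", END): 0.9}
w = relative_frequency_weights(sessions, chain_prob, theta=0.5)
P = normalize_rows(w)
```

In this toy run the edge from `a` to `c` is discarded because its chaining probability is below the threshold, so after normalization all of the outgoing probability of `a` goes to `b`.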
Building the Query-flow Graph
Fig. 1: A portion of the query-flow graph using the weighting scheme based on relative frequencies
Application 1: Finding logical sessions
• Finding chains of queries in user sessions.
It is an important problem, as solving it improves query-log analysis, user profiling, mining of user behavior, and more.
• Challenge: the supersession $S$ may contain many intertwined chains
• Approach: split the problem into two subproblems, session reordering and session breaking
Application 2: Query Recommendation
• Suggest new queries that may be relevant to
the current user’s mission
• Use the weighting scheme based on relative
frequencies
• Three recommendation methods
Application 2: Query Recommendation
• Recommendation by maximum weight
For an input query $q$, choose the node $q'$ with the largest $\omega'(q, q')$
• Problem: it tends to “drift” toward queries that are popular in the query log but not necessarily related to the query at hand
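A minimal sketch of the maximum-weight recommendation, assuming the weights are stored as a dictionary keyed by (q, q') pairs. The function name and the toy numbers are invented for illustration.

```python
def recommend_max_weight(weights, q):
    """Return the successor of q with the largest weight, or None if q has none."""
    candidates = {q2: w for (q1, q2), w in weights.items() if q1 == q}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Toy weights, invented for illustration.
weights = {("apple", "apple store"): 0.4,
           ("apple", "ipod"): 0.3,
           ("jeep", "jeep parts"): 0.5}
best = recommend_max_weight(weights, "apple")
```

The method only looks one step ahead, which is why a globally popular successor can win even when it has little to do with the input query.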
Application 2: Query Recommendation
• Recommendation by random walk
Idea: more randomization
Random walk: a random surfer starts at the initial query $q$. At each step, with probability $\alpha < 1$, the surfer follows one of the outlinks of the current node, chosen proportionally to the weights; with probability $1 - \alpha$, the surfer instead jumps back to $q$.
Application 2: Query Recommendation
• Recommendation by random walk
Transition matrix:
$A = \alpha P + (1 - \alpha)\, \mathbf{1} e_q^T$
Stationary distribution vector $v$ (random-walk score relative to $q$):
$v^T A = v^T$
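One way to approximate the stationary distribution is power iteration, sketched below. The function name, the default value of `alpha`, the fixed iteration count, and the handling of nodes without outlinks (their mass is sent back to q) are all assumptions made for the example. `P` is a dictionary of row-normalized transition probabilities whose endpoints all appear in `nodes`.

```python
def random_walk_scores(P, nodes, q, alpha=0.85, iterations=100):
    """Approximate v with v^T A = v^T for A = alpha * P + (1 - alpha) * 1 e_q^T."""
    has_out = {src for (src, _dst) in P}
    v = {n: 0.0 for n in nodes}
    v[q] = 1.0  # the surfer starts at q
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        # with probability alpha, follow an outlink chosen by weight
        for (src, dst), p in P.items():
            nxt[dst] += alpha * v[src] * p
        # assumption: nodes without outlinks send their mass back to q
        dangling = sum(v[n] for n in nodes if n not in has_out)
        # with probability 1 - alpha, jump back to q
        nxt[q] += (1 - alpha) * sum(v.values()) + alpha * dangling
        v = nxt
    return v

# Toy chain, invented for illustration.
P = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "a"): 1.0, ("c", "a"): 1.0}
scores = random_walk_scores(P, ["a", "b", "c"], "a", alpha=0.5)
```

For this toy chain the fixed point can be checked by hand: the score of `a` satisfies v_a = 0.5 (v_b + v_c) + 0.5 with v_b = v_c = 0.25 v_a, which gives v_a = 2/3.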
Application 2: Query Recommendation
• Recommendation by random walk
Recommendations can be deduced from the random-walk score by taking either the single top-scoring query or the best queries up to a certain score threshold.
Problem: the same drift toward popular queries as with maximum weight.
Method: normalize the random-walk score $s_q(q')$ of $q'$ relative to $q$:
$\hat{s}_q(q') = \frac{s_q(q')}{r(q')}$
$\hat{\hat{s}}_q(q') = \sqrt{\hat{s}_q(q') \cdot s_q(q')}$
where $r(q')$ is the absolute random-walk score of $q'$ (computed using a uniform preference vector).
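The effect of the normalization can be illustrated with a small sketch. It reads the method as two scores: the relative score divided by the absolute score r(q'), and the geometric mean of that ratio with the relative score. The function name and all numbers are invented for illustration.

```python
import math

def normalized_scores(s_q, r):
    """Return (s_hat, s_hat_hat) for relative scores s_q and absolute scores r.

    s_hat[q2]     = s_q[q2] / r[q2]
    s_hat_hat[q2] = sqrt(s_hat[q2] * s_q[q2])
    Queries with a missing or zero absolute score are skipped (an assumption).
    """
    s_hat = {q2: s / r[q2] for q2, s in s_q.items() if r.get(q2, 0.0) > 0.0}
    s_hat_hat = {q2: math.sqrt(s_hat[q2] * s_q[q2]) for q2 in s_hat}
    return s_hat, s_hat_hat

# Toy scores, invented for illustration: "google" is globally popular.
s_q = {"ipod": 0.2, "google": 0.3}
r = {"ipod": 0.05, "google": 0.6}
s_hat, s_hat_hat = normalized_scores(s_q, r)
```

With these toy numbers the raw score ranks the popular query first, while both normalized scores rank the more specific query first.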
Application 2: Query Recommendation
• Recommendation with history
A step further: provide recommendations that depend not only on the last query input, but on some of the last queries in the user’s history.
$A = \alpha P + (1 - \alpha)\, \mathbf{1} e_{q_1, \ldots, q_k}^T$
Use the current query chain $\{q_1, \ldots, q_k\}$ instead of the original query $q$
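A sketch of the history-aware variant: the restart jump goes to a preference distribution over the chain instead of a single query. The slide does not say how the chain queries are weighted, so the uniform weighting below is an assumption, as are the function name, the default `alpha`, and the treatment of nodes without outlinks.

```python
def random_walk_with_history(P, nodes, chain, alpha=0.85, iterations=100):
    """Random walk whose restart distribution is spread over the query chain."""
    # assumption: every query in the chain gets the same restart weight
    pref = {n: 0.0 for n in nodes}
    for q in chain:
        pref[q] += 1.0 / len(chain)

    has_out = {src for (src, _dst) in P}
    v = dict(pref)
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for (src, dst), p in P.items():
            nxt[dst] += alpha * v[src] * p
        # assumption: mass at nodes without outlinks also restarts
        dangling = sum(v[n] for n in nodes if n not in has_out)
        restart = (1 - alpha) * sum(v.values()) + alpha * dangling
        for n in nodes:
            nxt[n] += restart * pref[n]
        v = nxt
    return v

# Toy chain, invented for illustration.
P = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "a"): 1.0, ("c", "a"): 1.0}
scores = random_walk_with_history(P, ["a", "b", "c"], ["b", "c"], alpha=0.5)
```

A non-uniform preference, for example one that favors the most recent queries, would only change how `pref` is filled.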
Results
Table 1: Top 10 recommendations for the queries “apple” and “jeep”, according to the baseline and the various random-walk scores proposed
Results
Table 2:
Recommendations for
two actual query chains
(Recommendation with history)
Conclusion
• Main contribution: introducing the query-flow
graph and providing a methodology for
constructing such a graph based on mining
query logs
• Applications:
1. Finding chains
2. Query recommendation
References
• Paolo Boldi, Francesco Bonchi, Carlos Castillo, Debora Donato, Aristides Gionis, Sebastiano Vigna, “The Query-flow Graph: Model and Applications”, CIKM’08, Oct. 26-30, 2008, Napa Valley, California, USA
Q&A