A Study of Poisson Query Generation Model for Information Retrieval
Qiaozhu Mei, Hui Fang, and ChengXiang Zhai
University of Illinois at Urbana-Champaign
1
Outline
• Background of query generation in IR
• Query generation with Poisson language model
• Smoothing in Poisson query generation model
• Poisson vs. multinomial in query generation IR
  – Analytical comparison
  – Empirical experiments
• Summary
2
Query Generation IR Model [Ponte & Croft 98]

[Figure: each document d_1 … d_N is represented by a language model θ_d1 … θ_dN; a query q is scored against each document by the likelihood p(q | θ_d)]

\mathrm{Score}(d, q) \propto p(q \mid \theta_d)

• Scoring documents with query likelihood
• Known as the language modeling (LM) approach to IR
• Different from document generation
3
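To make the scoring rule concrete, here is a minimal sketch (my own toy code, not the authors'): it ranks two toy documents by the query log-likelihood log p(q | θ_d), using a simple Jelinek-Mercer mix with the collection so that unseen words do not zero out the score. All names and the weight `lam` are hypothetical.

```python
import math
from collections import Counter

def query_log_likelihood(query_terms, doc_terms, collection_terms, lam=0.5):
    """log p(q | theta_d) under a multinomial doc model, JM-smoothed with the collection.
    Assumes every query term occurs somewhere in the collection."""
    doc_tf, col_tf = Counter(doc_terms), Counter(collection_terms)
    doc_len, col_len = len(doc_terms), len(collection_terms)
    score = 0.0
    for w in query_terms:
        p_d = doc_tf[w] / doc_len if doc_len else 0.0
        p_c = col_tf[w] / col_len
        score += math.log((1 - lam) * p_d + lam * p_c)
    return score

docs = {"d1": "text mining text model".split(),
        "d2": "image retrieval model".split()}
collection = [w for d in docs.values() for w in d]
query = "text mining".split()

# Score(d, q) grows with p(q | theta_d): rank documents by the likelihood of the query.
ranking = sorted(docs, key=lambda d: query_log_likelihood(query, docs[d], collection), reverse=True)
print(ranking)  # ['d1', 'd2']
```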
Interpretation of LM θ_d
• θ_d: a model for queries posed by users who like document d [Lafferty and Zhai 01]
  – Estimate θ_d using document d → use θ_d to approximate the queries posed by users who like d
• Existing methods differ mainly in the choice of θ_d and how θ_d is estimated (smoothing)
  – Multi-Bernoulli: e.g., [Ponte and Croft 98, Metzler et al. 04]
  – Multinomial (most popular): e.g., [Hiemstra et al. 99, Miller et al. 99, Zhai and Lafferty 01]
4
Multi-Bernoulli vs. Multinomial

Multi-Bernoulli: flip a coin for each word
  Doc d: text mining text mining model clustering text model text …
  Query q: “text mining”
  p(q \mid d) = \prod_{w \in q} p(w = 1 \mid d) \prod_{w \notin q} p(w = 0 \mid d)

Multinomial: toss a die to choose a word
  Doc d: text mining text mining model clustering text model text …
  Query q: “text mining”
  p(q \mid d) = \prod_{j=1}^{|V|} p(w_j \mid d)^{c(w_j, q)}
5
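To make the two generative assumptions concrete, here is a small sketch (toy data and helper names are mine) that computes both likelihoods for the slide's document and the query "text mining" using maximum-likelihood estimates; Ponte & Croft's actual multi-Bernoulli estimator is more refined, so treat the presence probabilities below as an assumption.

```python
from collections import Counter

doc = "text mining text mining model clustering text model text".split()
query = "text mining".split()

tf = Counter(doc)
vocab = set(doc)
doc_len = len(doc)

# Multi-Bernoulli: one presence/absence event per vocabulary word.
# p(w = 1 | d) is taken here as the word's MLE occurrence rate -- an assumption,
# not the estimator used by Ponte & Croft.
p_present = {w: tf[w] / doc_len for w in vocab}
bernoulli = 1.0
for w in vocab:
    bernoulli *= p_present[w] if w in query else (1.0 - p_present[w])

# Multinomial: one draw per query token, p(w | d) = c(w, d) / |d|.
multinomial = 1.0
for w in query:
    multinomial *= tf[w] / doc_len

print(f"multi-Bernoulli p(q|d) = {bernoulli:.4f}")
print(f"multinomial     p(q|d) = {multinomial:.4f}")
```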
Problems of Multinomial
• Does not model term absence
• Sum-to-one constraint over all terms
• Reality is harder than expected: term frequency has a skewed distribution
  – Empirical estimates: mean(tf) < variance(tf) (Church & Gale 95)
  – Estimates on AP88-89: all terms: μ = 0.0013, σ² = 0.0044; query terms: μ = 0.1289, σ² = 0.3918
  – Multinomial/Bernoulli imply mean > variance

[Figure: histograms of term frequency for "said" and "bush" (number of documents vs. term frequency), showing heavily skewed distributions]
6
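A quick numeric sanity check of the overdispersion point (toy counts invented for illustration, not the AP88-89 data): the per-document frequency of a bursty term typically has variance well above its mean, which a multinomial or Bernoulli count model cannot reproduce.

```python
from statistics import mean, pvariance

# Hypothetical per-document frequencies of one term across a small corpus:
# mostly zeros with a few bursty documents -- the typical skewed TF pattern.
tf_counts = [0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 7, 0, 1]

m, v = mean(tf_counts), pvariance(tf_counts)
print(f"mean = {m:.3f}, variance = {v:.3f}")  # variance > mean (overdispersion)
# A multinomial/Bernoulli count model implies variance <= mean, so it cannot fit such data well.
```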
Poisson?

p(c(w) = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}

• Poisson models frequency directly (including zero frequency)
• No sum-to-one constraint across different words w
• Mean = variance
• Poisson has been explored in document generation models, but not in query generation models
7
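A minimal sketch of the pmf above in plain Python (no external libraries; the rate value is made up): it shows that a zero count receives explicit probability mass and that the distribution's mean and variance coincide.

```python
import math

def poisson_pmf(k, rate):
    """p(c(w) = k) = exp(-rate) * rate**k / k!"""
    return math.exp(-rate) * rate ** k / math.factorial(k)

rate = 0.4  # hypothetical arrival rate of a term
probs = [poisson_pmf(k, rate) for k in range(20)]
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(f"p(k = 0) = {probs[0]:.3f}")                 # absence gets explicit probability
print(f"mean = {mean:.3f}, variance = {var:.3f}")   # both are (approximately) the rate
```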
Related Work
• Poisson has been explored in document generation models, e.g.,
  – 2-Poisson → Okapi/BM25 (Robertson and Walker 94)
  – Parallel derivation of probabilistic models (Roelleke and Wang 06)
• Our work adds to this body of exploration of Poisson
  – Within the query generation framework
  – Explores the specific features Poisson brings to LM-based retrieval
8
Research Questions
• How can we model query generation with a Poisson language model?
• How can we smooth such a Poisson query generation model?
• How is a Poisson model different from a multinomial model in the context of query generation retrieval?
9
Query Generation with Poisson

Poisson: each term acts as an emitter; the query is the receiver, observed over a span of length |q|

Document d: text mining model clustering text model text …

Rates of arrival λ_w: text 3/7, mining 2/7, model 1/7, clustering 1/7, …

Query q: “mining text mining systems”  (counts: text 1, mining 2, model 0, clustering 0, systems 1)

p(q \mid d) = \frac{e^{-\frac{3}{7}|q|}\left(\frac{3}{7}|q|\right)^{1}}{1!} \cdot \frac{e^{-\frac{2}{7}|q|}\left(\frac{2}{7}|q|\right)^{2}}{2!} \cdot \frac{e^{-\frac{1}{7}|q|}\left(\frac{1}{7}|q|\right)^{0}}{0!} \cdot \frac{e^{-\frac{1}{7}|q|}\left(\frac{1}{7}|q|\right)^{0}}{0!} \cdot \frac{e^{-\lambda|q|}\left(\lambda|q|\right)^{1}}{1!} \cdots

(the last factor uses λ, the rate of the query term “systems”, which does not appear in d)
10
Query Generation with Poisson (II)

q = ⟨c(w_1, q), c(w_2, q), …, c(w_n, q)⟩: the query is represented by its term counts, observed over a span of length |q|

Document model: \hat{\theta}_d = \{\hat{\lambda}_1, \ldots, \hat{\lambda}_n\}, one arrival rate per vocabulary word (w_1 = text: λ_1, w_2 = mining: λ_2, w_3 = model: λ_3, w_4 = clustering: λ_4, …, w_N: λ_N)

MLE:  \hat{\lambda}_i = \frac{\sum_{d \in D} c(w_i, d)}{\sum_{d \in D} \sum_{w' \in V} c(w', d)}   (for a single document d, this reduces to c(w_i, d) / |d|)

p(q \mid d) = \prod_{i=1}^{n} p(c(w_i, q) \mid \theta_d) = \prod_{i=1}^{n} \frac{e^{-\lambda_i |q|} (\lambda_i |q|)^{c(w_i, q)}}{c(w_i, q)!}
11
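Putting the two slides together, here is a sketch of the unsmoothed Poisson query likelihood (my own toy code; the document is chosen so that the MLE rates match the 3/7, 2/7, 1/7, 1/7 example above). Because "systems" never occurs in the document, its MLE rate is 0 and the whole product collapses to 0, which is exactly what the smoothing on the next slides addresses.

```python
import math
from collections import Counter

def poisson_query_likelihood(query, doc, vocab):
    """p(q | d) = prod_i Poisson(c(w_i, q); lambda_i * |q|) with unsmoothed MLE rates
    lambda_i = c(w_i, d) / |d|."""
    q_tf, d_tf = Counter(query), Counter(doc)
    q_len, d_len = len(query), len(doc)
    likelihood = 1.0
    for w in vocab:  # the product runs over the whole vocabulary, absent words included
        rate = (d_tf[w] / d_len) * q_len
        k = q_tf[w]
        likelihood *= math.exp(-rate) * rate ** k / math.factorial(k)
    return likelihood

doc = "text text text mining mining model clustering".split()   # MLE rates: 3/7, 2/7, 1/7, 1/7
query = "mining text mining systems".split()
vocab = set(doc) | set(query)
print(poisson_query_likelihood(query, doc, vocab))  # 0.0 -- "systems" has rate 0, so smoothing is needed
```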
Smoothing Poisson LM

\mathrm{Score}(d, q) = \sum_{w \in V} \log p(c(w, q) \mid \hat{\theta}_d)

Document d: text mining model clustering text model …
Query: “text mining systems”

λ_d^{MLE}:                 text 0.02,   mining 0.01,   model 0.02,   …, system 0
+ Background collection C: text 0.0001, mining 0.0002, model 0.0001, …, system 0.0001

Smoothed \hat{\lambda}_d, e.g.:
  text:   (1 − λ) · 0.02 + λ · 0.0001
  system: (1 − λ) · 0    + λ · 0.0001

Different smoothing methods lead to different retrieval formulae
12
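A minimal sketch of the interpolation idea on this slide (the weight `alpha` and the function names are mine, not the paper's notation): smooth each per-word rate with the collection rate, then score by the sum of log Poisson probabilities over the vocabulary, so unseen query words such as "systems" no longer force a zero score.

```python
import math
from collections import Counter

def smoothed_rates(doc, collection, alpha=0.8):
    """Per-word rate interpolation: alpha * rate_d(w) + (1 - alpha) * rate_C(w)."""
    d_tf, c_tf = Counter(doc), Counter(collection)
    d_len, c_len = len(doc), len(collection)
    vocab = set(collection) | set(doc)
    return {w: alpha * d_tf[w] / d_len + (1 - alpha) * c_tf[w] / c_len for w in vocab}

def poisson_score(query, rates):
    """Score(d, q) = sum over the vocabulary of log Poisson(c(w, q); rate_w * |q|)."""
    q_tf, q_len = Counter(query), len(query)
    score = 0.0
    for w, lam in rates.items():
        mean = lam * q_len
        k = q_tf[w]
        score += -mean + k * math.log(mean) - math.log(math.factorial(k))
    return score

doc = "text text text mining mining model clustering".split()
collection = doc + "systems model retrieval systems text".split()
query = "mining text mining systems".split()
print(poisson_score(query, smoothed_rates(doc, collection)))  # finite: "systems" now has a nonzero rate
```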
Smoothing Poisson LM
• Interpolation (JM): mix the document model \theta_d^{MLE} with the collection model \theta_C
  (1)  (1 - \lambda)\, p(c(w, q) \mid \theta_d^{MLE}) + \lambda\, p(c(w, q) \mid \theta_C)
• Bayesian smoothing with a Gamma prior on the rates:
  (2)  \hat{\lambda}_{d,w} = \int \lambda_{d,w}\, p(\lambda_{d,w} \mid d, C)\, d\lambda_{d,w}
• Two-stage smoothing: first smooth \theta_d^{MLE} with the collection C, then interpolate with a user background model U
13
Smoothing Poisson LM (II)
• Two-stage smoothing:
  – Similar to the multinomial two-stage smoothing (Zhai and Lafferty 02)
  – Verbose queries need to be smoothed more
  (3)  p(c(w, q) \mid \theta_d, U) = (1 - \lambda)\, p(c(w, q) \mid \theta_d) + \lambda\, p(c(w, q) \mid \theta_U)
  – \theta_d is a smoothed version of the document model (from \theta_d^{MLE} and \theta_C), e.g.
      \hat{\lambda}_{d,w} = \frac{c(w, d) + \mu\, \lambda_{C,w}}{|d| + \mu}
  – \theta_U is a background model of user query preference; use \theta_C when no user prior is known
14
Analytical Comparison: Basic Distributions

                          multi-Bernoulli       multinomial     Poisson
Event space               appearance/absence    words in V      frequency
Model absence?            Yes                   No              Yes
Model frequency?          No                    Yes             Yes
Model length
(document/query)?         No                    No              Yes
Sum-to-one constraint?    No                    Yes             No
15
Analytical: Equivalency of Basic Models
• Equivalent with the basic model and MLE:
  \mathrm{Score}(d, q) = \sum_{c(w,q) > 0} c(w, q) \log \frac{c(w, d)}{\sum_{w'} c(w', d)}
• Poisson + Gamma smoothing = multinomial + Dirichlet smoothing:
  \mathrm{Score}(d, q) = \sum_{w \in q \cap d} c(w, q) \log\left(1 + \frac{c(w, d)}{\mu \frac{c(w, C)}{|C|}}\right) + |q| \log \frac{\mu}{|d| + \mu}
• Basic model + JM smoothing behaves similarly (with a variant component of document length normalization)
16
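The second equivalence can be checked numerically. The sketch below (toy data; hypothetical names) computes the full Dirichlet-smoothed multinomial query log-likelihood and the closed form from this slide, and shows they differ only by a document-independent constant, i.e. they rank documents identically; per the slide, the Poisson + Gamma derivation yields the same formula.

```python
import math
from collections import Counter

MU = 2000.0  # Dirichlet/Gamma prior parameter (arbitrary for this check)

def dirichlet_loglik(query, doc, coll_tf, coll_len):
    """Full Dirichlet-smoothed multinomial query log-likelihood."""
    d_tf, d_len = Counter(doc), len(doc)
    return sum(c * math.log((d_tf[w] + MU * coll_tf[w] / coll_len) / (d_len + MU))
               for w, c in Counter(query).items())

def closed_form_score(query, doc, coll_tf, coll_len):
    """The rank-equivalent closed form shown on the slide."""
    d_tf, d_len = Counter(doc), len(doc)
    s = sum(c * math.log(1 + d_tf[w] / (MU * coll_tf[w] / coll_len))
            for w, c in Counter(query).items() if d_tf[w] > 0)
    return s + len(query) * math.log(MU / (d_len + MU))

docs = {"d1": "text mining text model".split(),
        "d2": "poisson model retrieval text".split()}
collection = [w for d in docs.values() for w in d] + "mining retrieval".split()
coll_tf, coll_len = Counter(collection), len(collection)
query = "text mining".split()

for name, doc in docs.items():
    full = dirichlet_loglik(query, doc, coll_tf, coll_len)
    closed = closed_form_score(query, doc, coll_tf, coll_len)
    print(name, round(full, 4), round(closed, 4), round(full - closed, 4))
# The difference (sum over query terms of c(w,q) * log p(w|C)) is the same for every
# document, so the closed form ranks documents exactly like the full likelihood.
```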
Benefits: Per-term Smoothing
• Poisson doesn’t require a “sum-to-one” constraint over different terms (different event space)
• Thus λ in JM smoothing and two-stage smoothing can be made term-dependent (per-term):
  (1 - \lambda_w)\, p(c(w, q) \mid \theta_d^{MLE}) + \lambda_w\, p(c(w, q) \mid \theta_C)
  – Multinomial cannot achieve per-term smoothing
• The λ_w's can be estimated with an EM algorithm
17
Benefits: Modeling Background
• Traditional: θ_C as a single model
  – Does not match the reality of term frequencies
• θ_C as a mixture model: increases variance
  – Multinomial mixtures (e.g., clusters, PLSA, LDA) are inefficient (no closed form; iterative estimation)
  – Poisson mixtures (e.g., Katz's K-Mixture, 2-Poisson, Negative Binomial) (Church & Gale 95) have closed forms and efficient computation
18
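As an illustration of one such closed-form Poisson mixture, here is a sketch of Katz's K-mixture pmf; the moment-style parameter fit from collection frequency (cf) and document frequency (df) follows the common textbook presentation of Church & Gale 95 and should be treated as an assumption, as should the toy statistics.

```python
def k_mixture_pmf(k, alpha, beta):
    """Katz's K-mixture: P(k) = (1 - alpha)*[k == 0] + (alpha/(beta+1)) * (beta/(beta+1))**k."""
    p = (alpha / (beta + 1.0)) * (beta / (beta + 1.0)) ** k
    return p + (1.0 - alpha) if k == 0 else p

def fit_k_mixture(cf, df, n_docs):
    """Moment-style fit from collection frequency cf and document frequency df
    (a common presentation of Church & Gale 95 -- treat as an assumption)."""
    beta = (cf - df) / df
    alpha = (cf / n_docs) / beta
    return alpha, beta

# Hypothetical term statistics: 120 occurrences spread over 80 of 10,000 documents.
alpha, beta = fit_k_mixture(cf=120, df=80, n_docs=10_000)
probs = [k_mixture_pmf(k, alpha, beta) for k in range(50)]
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(f"alpha = {alpha:.4f}, beta = {beta:.2f}")
print(f"mean = {mean:.4f}, variance = {var:.4f}")  # variance > mean: overdispersion, in closed form
```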
Hypotheses
• H1: With basic query generation retrieval models (JM smoothing and Gamma smoothing), Poisson behaves similarly to multinomial
• H2: Per-term smoothing with Poisson may outperform term-independent smoothing
  – Helps more on verbose queries
• H3: A background efficiently modeled as a Poisson mixture may perform better than a single Poisson
19
Experiment Setup
• Data: TREC collections and topics
  – AP88-89, Trec7, Trec8, Wt2g
• Query types:
  – Short keyword (keyword title)
  – Short verbose (one sentence)
  – Long verbose (multiple sentences)
• Measurement:
  – Mean average precision (MAP)
20
H1: Basic models behave similarly

[Figure: MAP of JM + Poisson plotted against MAP of JM + Multinomial]

• JM + Poisson ≈ JM + Multinomial
• Gamma/Dirichlet > JM (for both Poisson and Multinomial)
21
H2: Per-term outperforms term-independent smoothing

Data    Q    Gamma/Dirichlet   Per-term 2-stage
AP      SK   0.224             0.226
AP      SV   0.204             0.217*
AP      LV   0.291             0.304*
Trec-7  SK   0.186             0.185
Trec-7  SV   0.182             0.196*
Trec-7  LV   0.224             0.236*
Trec-8  SK   0.257             0.256
Trec-8  SV   0.228             0.246*
Trec-8  LV   0.260             0.274*
Web     SK   0.302             0.307
Web     SV   0.273             0.292*
Web     LV   0.283             0.311*

Per-term > non-per-term
22
Improvement Comes from Per-term

Data    Q    JM      JM + per-term   2-stage   2-stage + per-term
AP      SK   0.203   0.206           0.223     0.226*
AP      SV   0.183   0.214*          0.204     0.217*
Trec-7  SK   0.168   0.174           0.186     0.185
Trec-7  SV   0.176   0.198*          0.194     0.196
Trec-8  SK   0.239   0.227           0.257     0.256
Trec-8  SV   0.234   0.249*          0.242     0.246*
Web     SK   0.250   0.220*          0.291     0.307*
Web     SV   0.217   0.261*          0.273     0.292*

JM + per-term > JM; 2-stage + per-term > 2-stage
Significant improvement on verbose queries
23
H3: Poisson Mixture Background Improves Performance

Data    Query   θ_C = single Poisson   θ_C = Katz's K-Mixture
AP      SK      0.203                  0.204
AP      SV      0.183                  0.188*
Trec-7  SK      0.168                  0.169
Trec-7  SV      0.176                  0.178*
Trec-8  SK      0.239                  0.239
Trec-8  SV      0.234                  0.238*
Web     SK      0.250                  0.250
Web     SV      0.217                  0.223*

Katz's K-Mixture > single Poisson
24
Poisson Opens Other Potential Flexibilities
• Document length penalization?
  – JM smoothing introduces a variant component of document length normalization
  – Requires more expensive computation
• Pseudo-feedback?
  – p(c(w, q) | \theta_U) in the two-stage smoothing
  – Use feedback documents to estimate term-dependent λ_w's
• These lead to future research directions
25
Summary
• Poisson: another family of retrieval models based on query generation
• Basic models behave similarly to multinomial
• Benefits: per-term smoothing and an efficient mixture background model
• Many other potential flexibilities
• Future work:
  – Explore document length normalization and pseudo-feedback
  – Better estimation of per-term smoothing coefficients
26
Thanks!
27