Personal Information Retrieval (PIR)
Retrieval and Evaluation Techniques
for Personal Information
Jin Young Kim
7/26 Ph.D. Dissertation Seminar
Personal Information Retrieval (PIR)
The practice and study of supporting users in retrieving
personal information effectively
Personal Information Retrieval in the Wild
Everyone has unique information & practices
Different information and information needs
Different preferences and behaviors
Many existing software solutions
Platform-level: desktop search, folder structure
Application-level: email, calendar, office suites
Previous Work in PIR (Desktop Search)
Focus
User interface issues [Dumais03,06]
Desktop-specific features [Solus06] [Cohen08]
Limitations
Each based on a different environment and user group
None of them performed comparative evaluation
Research findings do not accumulate over the years
Our Approach
Develop general techniques for PIR
  Start from essential characteristics of PIR
  Applicable regardless of users and information types
Make contributions to related areas
  Structured document retrieval
  Simulated evaluation for known-item finding
Build a platform for sustainable progress
  Develop repeatable evaluation techniques
  Share the research findings and the data
Essential Characteristics of PIR
Many document types
Unique metadata for each type
People combine search and browsing [Teevan04]
Long-term interactions with a single user
People mostly find known-items [Elsweiler07]
Privacy concerns for the data set
These characteristics motivate the three parts of this work:
field-based search models, an associative browsing model,
and simulated evaluation methods.
Search and Browsing Retrieval Models
Challenge
Users may remember different things about the document
How can we present effective results for both cases?
[Diagram: the user's lexical memory drives keyword search (query: 'james
registration'), while associative memory drives browsing; both lead to a
ranked list of retrieval results]
Information Seeking Scenario in PIR
[Diagram: user input flowing through the search system to its output]
A user initiates a session with a keyword query ('james registration')
The user switches to browsing by clicking on an email document
The user switches back to search with a different query ('james registration 2011')
Simulated Evaluation Techniques
Challenge
The user's query originates from what she remembers.
How can we simulate the user's querying behavior realistically?
[Diagram: the user's lexical memory produces the query ('james registration')
for search; associative memory supports browsing over the retrieval results]
Research Questions
Field-based Search Models
  How can we improve the retrieval effectiveness in PIR?
  How can we improve the type prediction quality?
Associative Browsing Model
  How can we enable browsing support for PIR?
  How can we improve the suggestions for browsing?
Simulated Evaluation Methods
  How can we evaluate a complex PIR system by simulation?
  How can we establish the validity of simulated evaluation?
Field-based Search Models
Searching for Personal Information
An example of desktop search
Field-based Search Framework for PIR
Type-specific Ranking: rank documents within each document collection (type)
Type Prediction: predict the document type relevant to the user's query
Final Results Generation: merge into a single ranked list
Type-specific Ranking for PIR
Individual collections have type-specific features
  Thread-based features for emails
  Path-based features for documents
Most of these documents have rich metadata
  Email: <sender, receiver, date, subject, body>
  Document: <title, author, abstract, content>
  Calendar: <title, date, place, participants>
We focus on developing general retrieval techniques for
structured documents
Structured Document Retrieval
Field Operator / Advanced Search Interface
User’s search terms are found in multiple fields
Understanding Re-finding Behavior in Naturalistic Email Interaction Logs.
Elsweiler, D., Harvey, M., Hacker, M. [SIGIR'11]
Structured Document Retrieval: Models
Document-based Retrieval Model
  Score each document as a whole
Field-based Retrieval Model
  Combine evidence from each field
[Diagram: in document-based scoring, the field texts f1..fn are merged with
weights w1..wn and matched against the query terms q1..qm as one document;
in field-based scoring, each query term is matched against each field and
the field-level scores are combined with weights w1..wn]
Field Relevance Model for Structured IR
Field Relevance
  Different fields are important for different query terms
  'james' is relevant when it occurs in <to>
  'registration' is relevant when it occurs in <subject>
Estimating the Field Relevance: Overview
If User Provides Feedback
Relevant document provides sufficient information
If No Feedback is Available
Combine field-level term statistics from multiple sources
[Diagram: field-level term distributions (from/to, title, content) from the
Collection, combined with those from the Top-k retrieved documents,
approximate the distribution of the Relevant documents]
Estimating Field Relevance using Feedback
Assume a user who marked a document DR as relevant
Estimate field relevance from the field-level term distribution of DR
  e.g., <to> is relevant for 'james'; <content> is relevant for 'registration'
We can personalize the results accordingly
  Rank documents with similar field-level term distributions higher
This weight is provably optimal under the LM retrieval framework
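As a rough illustration, the feedback-based estimate can be sketched in Python. The function name, the per-field term-frequency estimate, and the uniform fallback are illustrative assumptions, not the dissertation's exact formulation:

```python
def field_relevance_from_feedback(query_terms, rel_doc_fields):
    """Estimate per-term field relevance from one relevant document DR.

    rel_doc_fields maps a field name to the list of terms in that
    field of DR (a simplified sketch of the feedback-based estimate).
    """
    relevance = {}
    for t in query_terms:
        # P(t | field of DR): term frequency normalized by field length
        raw = {f: terms.count(t) / len(terms)
               for f, terms in rel_doc_fields.items() if terms}
        total = sum(raw.values())
        if total > 0:
            relevance[t] = {f: s / total for f, s in raw.items()}
        else:  # term absent from DR: fall back to uniform field weights
            relevance[t] = {f: 1.0 / len(raw) for f in raw}
    return relevance
```

For the running example, a relevant email with 'james' in <to> and 'registration' in <subject> would concentrate each term's relevance on that field.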
Estimating Field Relevance without Feedback
Linear Combination of Multiple Sources
  Weights estimated using held-out training queries
Features
  Field-level term distribution of the collection
    (unigram and bigram LM; the unigram case is the same as PRM-S)
  Field-level term distribution of the top-k docs
    (unigram and bigram LM; a form of pseudo-relevance feedback)
  A priori importance of each field (wj)
    (similar to MFLM and BM25F)
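A minimal sketch of the linear combination, assuming each source has already been reduced to a per-field probability estimate for the query term (the function name and normalization are illustrative):

```python
def combine_sources(sources, weights):
    """Field relevance for one query term as a weighted linear combination.

    sources: list of {field: P(term | field)} estimates, e.g. one from
    collection statistics and one from the top-k retrieved documents.
    weights: one weight per source, learned from training queries.
    """
    fields = sources[0].keys()
    combined = {f: sum(w * src.get(f, 0.0) for w, src in zip(weights, sources))
                for f in fields}
    z = sum(combined.values()) or 1.0
    return {f: s / z for f, s in combined.items()}  # normalize over fields
```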
Retrieval Using the Field Relevance
Comparison with Previous Work
[Diagram: previous models sum per-field scores with fixed weights w1..wn;
the field relevance model multiplies per-term scores, weighting each field
by P(Fj|qi) for each query term qi]
Ranking in the Field Relevance Model
  Score(Q, D) = Πi Σj P(Fj|qi) · P(qi|fj)
  (per-term field weight × per-term field score)
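The field relevance ranking can be sketched as follows, using a simple smoothed field language model; the smoothing constants and the background probability are illustrative assumptions:

```python
import math

def frm_score(query_terms, doc_fields, field_rel, mu=10.0, bg=1e-4):
    """score(q, d) = sum_i log sum_j P(F_j|q_i) * P(q_i|f_j).

    doc_fields: {field: list of terms}; field_rel: {term: {field: P(F|q)}}.
    A sketch only: real implementations smooth against collection stats.
    """
    score = 0.0
    for t in query_terms:
        per_term = 0.0
        for f, terms in doc_fields.items():
            # Dirichlet-style smoothing toward a tiny background probability
            p = (terms.count(t) + mu * bg) / (len(terms) + mu)
            per_term += field_rel.get(t, {}).get(f, 0.0) * p
        score += math.log(per_term) if per_term > 0 else math.log(bg)
    return score
```

Documents whose field-level term placement matches the per-term field relevance (e.g. 'james' in <to>) score higher than documents without such matches.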
Evaluating the Field Relevance Model
Retrieval Effectiveness (Metric: Mean Reciprocal Rank)

            DQL     BM25F   MFLM    FRM-C   FRM-T   FRM-R
  TREC      54.2%   59.7%   60.1%   62.4%   66.8%   79.4%
  IMDB      40.8%   52.4%   61.2%   63.7%   65.7%   70.4%
  Monster   42.9%   27.9%   46.0%   54.2%   55.8%   71.6%

DQL, BM25F, and MFLM use fixed field weights; the FRM variants use
per-term field weights.
[Chart: the same MRR figures plotted per collection]
Type Prediction Methods
Field-based collection Query-Likelihood (FQL)
Calculate QL score for each field of a collection
Combine field-level scores into a collection score
Feature-based Method
Combine existing type-prediction methods
Grid Search / SVM for finding combination weights
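FQL can be sketched as below: a query-likelihood score per collection built from field-level language models, with the type prediction picking the highest-scoring collection. The field weights, the floor probability, and the data layout are illustrative assumptions:

```python
import math

def fql_score(query_terms, field_lms, field_weights):
    """Field-based collection query likelihood for one collection.

    field_lms: {field: {term: P(term | field, collection)}} -- the
    collection-level field language models.
    """
    score = 0.0
    for t in query_terms:
        # combine field-level likelihoods into a collection-level one
        p = sum(field_weights.get(f, 0.0) * lm.get(t, 1e-9)
                for f, lm in field_lms.items())
        score += math.log(p)
    return score

def predict_type(query_terms, collections, field_weights):
    # pick the collection (document type) with the highest FQL score
    return max(collections,
               key=lambda c: fql_score(query_terms, collections[c],
                                       field_weights))
```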
Type Prediction Performance
[Table: % of queries with the correct type prediction on the
Pseudo-desktop Collections and the CS Collection]
FQL improves performance over CQL
Combining features improves the performance further
Summary So Far…
Field relevance model for structured document retrieval
  Enables relevance feedback through field weighting
  Improves performance using linear feature-based estimation
Type prediction methods for PIR
  Field-based type prediction method (FQL)
  Combining features improves the performance further
We move on to the associative browsing model
  What happens when users can't recall good search terms?
Associative Browsing Model
Recap: Retrieval Framework for PIR
[Diagram: keyword search ('james registration') complemented by
associative browsing]
User Interaction for Associative Browsing
Users enter a concept or document page by search
The system provides a list of suggestions for browsing
Data Model: How can we build associations? Automatically? Manually?
User Interface: How would it match the user's preference?
"Participants wouldn't create associations beyond simple tagging operations"
- Sauermann et al. 2005
Building the Associative Browsing Model
1. Document Collection
2. Concept Extraction
3. Link Extraction (term similarity, temporal similarity, co-occurrence)
4. Link Refinement (click-based training)
Link Extraction and Refinement
Link Scoring
  Combination of link type scores: S(c1,c2) = Σi [ wi × Linki(c1,c2) ]
  Concept link types: term vector similarity, temporal similarity,
  tag similarity, string similarity, co-occurrence
  Document link types: term vector similarity, temporal similarity,
  tag similarity, path/type similarity, concept similarity
Link Presentation
  Ranked list of suggested items (e.g., for the concept 'Search Engine')
  Users click on them for browsing
Link Refinement (training wi)
  Maximize click-based relevance
  Grid Search: maximize retrieval effectiveness (MRR)
  RankSVM: minimize error in pairwise preference
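The scoring and presentation steps can be sketched as below. The link functions passed in are illustrative stand-ins for the similarity features listed above:

```python
def link_score(item_a, item_b, link_fns, weights):
    """S(c1, c2) = sum_i w_i * Link_i(c1, c2): a weighted combination
    of link-type scores."""
    return sum(w * fn(item_a, item_b) for w, fn in zip(weights, link_fns))

def suggest(item, candidates, link_fns, weights, k=5):
    """Rank candidate items by combined link score, highest first."""
    ranked = sorted(candidates,
                    key=lambda c: link_score(item, c, link_fns, weights),
                    reverse=True)
    return ranked[:k]
```

Training the weights wi (grid search or RankSVM over click data) slots in on top of this: only `weights` changes, not the scoring structure.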
Evaluating Associative Browsing Model
Data set: CS Collection
  Collect public documents in the UMass CS department
  CS dept. people competed in known-item finding tasks
Value of browsing for known-item finding
  % of sessions in which browsing was used
  % of sessions in which browsing was used & led to success
Quality of browsing suggestions
  Mean Reciprocal Rank using clicks as judgments
  10-fold cross-validation over the collected click data
Value of Browsing for Known-item Finding
Evaluation Type                      Total (#sessions)  Browsing used  Successful outcome
Simulation                           63,260             9,410 (14.8%)  3,957 (42.0%)
User Study (1): Document Only        290                42 (14.5%)     15 (35.7%)
User Study (2): Document + Concept   142                43 (30.2%)     32 (74.4%)

Comparison with Simulation Results
  Roughly matches the user study in overall usage and success ratio
The Value of Associative Browsing
  Browsing was used in 30% of all sessions (with concept browsing)
  When browsing was used, about 75% of those sessions succeeded
Quality of Browsing Suggestions
[Charts: MRR of concept browsing (features: title, content, tag, time,
string, co-occurrence) and document browsing (features: title, content,
tag, time, topic, path, type, concept) on CS/Top1 and CS/Top5, comparing
Uniform, Grid, and SVM feature weighting]
Simulated Evaluation Methods
Challenges in PIR Evaluation
Hard to create a ‘test-collection’
Each user has different documents and habits
People will not donate their documents and queries for research
Limitations of user study
Experimenting with a working system is costly
Experimental control is hard with real users and tasks
Data is not reusable by third parties
Our Approach: Simulated Evaluation
Simulate components of evaluation
Collection: user’s documents with metadata
Task: search topics and relevance judgments
Interaction: query and click data
Simulated Evaluation Overview
Simulated document collections
  Pseudo-desktop Collections: subsets of the W3C mailing list + other
  document types
  CS Collection: UMass CS mailing list / calendar items / crawl of homepages
Evaluation Methods
                         Controlled User Study           Simulated Interaction
  Field-based Search     DocTrack search game            Query generation methods
  Associative Browsing   DocTrack search+browsing game   Probabilistic user modeling
Controlled User Study: DocTrack Game
Procedure
  Collect public documents in the UMass CS dept. (CS Collection)
  Build a web interface where participants can find documents
  People in the CS department participated
DocTrack search game
  20 participants / 66 games played
  984 queries collected for 882 target documents
DocTrack search+browsing game
  30 participants / 53 games played
  290 + 142 search sessions collected
DocTrack Game
[Screenshot: a target item is shown and the participant must find it]
*Users can use both search and browsing
in the DocTrack search+browsing game
Query Generation for Evaluating PIR
Known-item finding for PIR
  A target document represents an information need
  Users would take terms from the target document
Query Generation for PIR
  Randomly select a target document
  Algorithmically take terms from the document
Parameters of Query Generation
  Choice of extent: Document [Azzopardi07] vs. Field
  Choice of term: Uniform vs. TF vs. IDF vs. TF-IDF [Azzopardi07]
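The term-selection step can be sketched as weighted sampling from the target document; the function shape and the idf lookup are illustrative assumptions, not the exact generator used in the dissertation:

```python
import random
from collections import Counter

def generate_query(doc_terms, idf, k=3, mode="tfidf", rng=random):
    """Sample k query terms from a target document, mimicking a user
    who recalls terms from it. `idf` is a {term: idf} map; the modes
    mirror the Uniform / TF / IDF / TF-IDF choices above."""
    tf = Counter(doc_terms)
    def weight(t):
        if mode == "uniform":
            return 1.0
        if mode == "tf":
            return float(tf[t])
        if mode == "idf":
            return idf.get(t, 0.0)
        return tf[t] * idf.get(t, 0.0)  # tf-idf
    vocab = list(tf)
    return rng.choices(vocab, weights=[weight(t) for t in vocab], k=k)
```

Restricting `doc_terms` to one field's text gives the field-based choice of extent.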
Validation of Generated Queries
Basic Idea
  Use the set of human-generated queries for validation
  Compare at the level of query terms and retrieval scores
Validation by Comparing Query Terms
  The generation probability of a manual query q under Pterm
Validation by Comparing Retrieval Scores [Azzopardi07]
  Two-sided Kolmogorov-Smirnov test
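The two-sample Kolmogorov-Smirnov statistic used for the score comparison can be sketched in plain Python (a library routine would also supply the p-value; this sketch computes only the statistic):

```python
import bisect

def ks_statistic(scores_a, scores_b):
    """Two-sample KS statistic: the maximum gap between the empirical
    CDFs of two retrieval-score samples."""
    a, b = sorted(scores_a), sorted(scores_b)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)  # empirical CDF of a at x
        fb = bisect.bisect_right(b, x) / len(b)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d
```

A small statistic means the generated queries induce retrieval-score distributions close to those of human queries.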
Validation Results for Generated Queries
Validation based on query terms
Validation based on retrieval score distribution
Probabilistic User Model for PIR
Query generation model
  Term selection from a target document
State transition model
  Use browsing when the results look marginally relevant
Link selection model
  Click on browsing suggestions based on perceived relevance
A User Model for Link Selection
User’s level of knowledge
Random: clicks randomly on the ranked list
Informed: more likely to click on more relevant items
Oracle: always clicks on the most relevant item
Relevance is estimated using the position of the target item
[Diagram: three ranked lists illustrating where each user model clicks]
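The three knowledge levels can be sketched as click policies over a ranked suggestion list; the per-position relevance estimates passed in are assumed to be derived elsewhere (e.g. from the target item's rank), and the function shape is illustrative:

```python
import random

def select_link(relevance, mode="informed", rng=random):
    """Pick a position in a ranked suggestion list.

    relevance: per-position relevance estimates. Modes mirror the
    random / informed / oracle knowledge levels above."""
    positions = list(range(len(relevance)))
    if mode == "random":
        return rng.randrange(len(relevance))
    if mode == "oracle":
        return max(positions, key=lambda i: relevance[i])
    # informed: click probability proportional to estimated relevance
    return rng.choices(positions, weights=relevance, k=1)[0]
```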
Success Ratio of Browsing
Varying the level of knowledge and the fan-out for simulation
Exploration is valuable for users with a low level of knowledge
[Chart: success ratio (roughly 0.3-0.48) for the random, informed, and
oracle user models at fan-out FO1-FO3; larger fan-out means more exploration]
Community Efforts using the Data Sets
Conclusions & Future Work
Major Contributions
  Field-based Search Models
    Field relevance model for structured document retrieval
    Field-based and combination-based type prediction methods
  Associative Browsing Model
    An adaptive technique for generating browsing suggestions
    Evaluation of associative browsing in known-item finding
  Simulated Evaluation Methods for Known-item Finding
    DocTrack game for controlled user studies
    Probabilistic user model for generating simulated interaction
Field Relevance for Complex Structures
Current work assumes documents with flat structure
Field Relevance for Complex Structures?
XML documents with hierarchical structure
Joined Database Relations with graph structure
Cognitive Model of Query Generation
Current query generation methods assume:
  Queries are generated from the complete document
  Query terms are chosen independently from one another
Relaxing these assumptions:
  Model the user's degradation in memory
  Model the dependency in query-term selection
Ongoing work
  Graph-based representation of documents
  Query terms can be chosen by random walk
Thank you for your attention!
Special thanks to my advisor, coauthors, and all of you here!
Are we closer to the superhuman now?
One More Slide: What I Learned…
Start from what's happening in the user's mind
  Field relevance / query generation, ...
Balance user input and algorithmic support
  Generating suggestions for associative browsing
Learn from your peers & make contributions
  Query generation method / DocTrack game
  Simulated test collections & workshop