Personalizing Information Search: Understanding Users and their Interests
Diane Kelly, School of Information & Library Science, University of North Carolina, [email protected]
IPAM | 04 October 2007
Background: IR and TREC
What is IR? Who works on problems in IR?
Where can I find the most recent work in IR?
Background: Personalization
Personalization is a process where retrieval is customized to the individual (not one-size-fits-all searching)
Hans Peter Luhn was one of the first people to personalize IR through selective dissemination of information (SDI), now called 'filtering'
Profiles and user models are often employed to 'house' data about users and represent their interests
Figuring out how to populate and maintain the profile or user model is a hard problem
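As a minimal sketch of the SDI/filtering idea (my own illustration; the profile terms, weights, and threshold below are hypothetical), a profile can be kept as a weighted term vector and matched against each incoming document:

```python
from collections import Counter
import math

def cosine(profile: dict, doc_terms: list) -> float:
    """Cosine similarity between a weighted profile and a document's term counts."""
    doc = Counter(doc_terms)
    dot = sum(profile.get(t, 0.0) * c for t, c in doc.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_d = math.sqrt(sum(c * c for c in doc.values()))
    return dot / (norm_p * norm_d) if norm_p and norm_d else 0.0

# Hypothetical profile representing one user's interests.
profile = {"personalization": 2.0, "retrieval": 1.5, "user": 1.0, "relevance": 1.0}

def disseminate(doc_terms: list, threshold: float = 0.3) -> bool:
    """SDI-style filtering: route the document to the user if it matches the profile."""
    return cosine(profile, doc_terms) >= threshold
```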
Major Approaches
Explicit Feedback
Implicit Feedback
User's desktop
Explicit Feedback
Term relevance feedback is one of the most widely used and studied explicit feedback techniques
Typical relevance feedback scenarios
Systems-centered research has found that relevance feedback works (including pseudo relevance feedback)
User-centered research has found mixed results about its effectiveness
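For reference, a minimal sketch of the classic Rocchio query-modification formula that underlies many term relevance feedback implementations (a textbook formulation, not the specific systems studied here; the parameter values are conventional defaults):

```python
def rocchio(query_vec, rel_vecs, nonrel_vecs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query vector toward the centroid of relevant
    documents and away from the centroid of non-relevant ones.
    All vectors are dicts mapping term -> weight."""
    new_q = {t: alpha * w for t, w in query_vec.items()}
    for vecs, coeff in ((rel_vecs, beta), (nonrel_vecs, -gamma)):
        for vec in vecs:
            for t, w in vec.items():
                new_q[t] = new_q.get(t, 0.0) + coeff * w / len(vecs)
    # Terms whose weight drops to zero or below are usually discarded.
    return {t: w for t, w in new_q.items() if w > 0}
```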
Explicit Feedback
Terms are not presented in context so it may be hard for users to understand how they can help
Quality of terms suggested is not always good
Users don't have the additional cognitive resources to engage in explicit feedback
Users are too lazy to provide feedback
Questions about the sustainability of explicit feedback for long-term modeling
Examples
Query Elicitation Study
Users typically pose very short queries
This may be because users have a difficult time articulating their information needs and because traditional search interfaces encourage short queries
Polyrepresentative extraction of information needs suggests obtaining multiple representations of a single information need (cf. the reference interview)
Motivation
Research has demonstrated that a positive relationship exists between query length and performance in batch-mode experimental IR
Query expansion is an effective technique for increasing query length, but research has demonstrated that users have some difficulty with traditional term relevance feedback features
Elicitation Form
[Already Know] [Why Know] [Keywords]
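As an illustration of how such form responses could be folded into a longer query (my own sketch; the function name, stopword list, and example responses are hypothetical and not taken from the study):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "i", "about", "by"}

def expand_query(baseline: str, responses: dict) -> list:
    """Pool terms from the baseline query and the elicitation-form responses,
    drop stopwords, and weight terms by how often they occur across responses."""
    pooled = Counter()
    for text in [baseline, *responses.values()]:
        for term in re.findall(r"[a-z]+", text.lower()):
            if term not in STOPWORDS:
                pooled[term] += 1
    return pooled.most_common()

# Hypothetical usage with the three form questions.
expanded = expand_query(
    "query expansion",
    {"already_know": "relevance feedback adds terms suggested by the system",
     "why_know":     "to improve retrieval performance for short queries",
     "keywords":     "relevance feedback, query length, term suggestion"})
```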
Results: Number of Terms
[Bar chart: mean number of terms contributed by each source (baseline query, Q2, Q3, Q4); N=45. Reported means: 16.18, 10.67, 9.33, and 2.33.]
Experimental Runs
Source of Terms | Run ID
Baseline | baseline
Baseline + Pseudo Relevance Feedback | pseudo05, pseudo10, pseudo20, pseudo50
Baseline + Elicitation Form Q2 | Q2
Baseline + Elicitation Form Q3 | Q3
Baseline + Elicitation Form Q4 | Q4
Baseline + Combination of Elicitation Form Questions | Q3Q4, Q2Q3, Q2Q4, Q234
Overall Performance
[Bar chart: MAP by Run ID; values range from 0.2843 to 0.3685.]
Query Length and Performance
[Scatter plot: MAP vs. query length for each run (baseline, Q2, Q3, Q4, Q2Q3, Q2Q4, Q3Q4, Q234); fitted regression line y = 0.263 + 0.000265(x), p = .000.]
Major Findings
Users provided lengthy responses to some of the questions
There were large differences in the length of users' responses to each question
In most cases responses significantly improved retrieval
Query length and performance were significantly related
Implicit Feedback
What is it?
Information about users, their needs and document preferences that can be obtained unobtrusively, by watching users’ interactions and behaviors with systems
What are some examples?
Examine: Select, View, Listen, Scroll, Find, Query, Cumulative measures
Retain: Print, Save, Bookmark, Purchase, Email
Reference: Link, Cite
Annotate/Create: Mark up, Type, Edit, Organize, Label
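A minimal sketch of how such behaviors might be logged as unobtrusive, timestamped events (all names here are hypothetical, not the instrumentation used in the study described below):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class BehaviorEvent:
    """One observed interaction, e.g. view, scroll, print, save, bookmark."""
    user: str
    url: str
    action: str                      # "view", "scroll", "print", "save", ...
    timestamp: datetime = field(default_factory=datetime.utcnow)

LOG: List[BehaviorEvent] = []

def record(user: str, url: str, action: str) -> None:
    """Append an implicit-feedback observation without interrupting the user."""
    LOG.append(BehaviorEvent(user, url, action))

def display_time(user: str, url: str) -> float:
    """Rough display-time estimate (seconds): span between the first and last
    events logged for this user on this page."""
    times = [e.timestamp for e in LOG if e.user == user and e.url == url]
    return (max(times) - min(times)).total_seconds() if len(times) > 1 else 0.0
```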
Implicit Feedback
Why is it important?
It is generally believed that users are unwilling to engage in explicit relevance feedback
It is unlikely that users can maintain their profiles over time
Users generate large amounts of data each time they engage in online information-seeking activities, and the things in which they are 'interested' are in this data somewhere
Implicit Feedback
What do we “know” about it?
There seems to be a positive correlation between selection (click-through) and relevance
There seems to be a positive correlation between display time and relevance
What is problematic about it?
Much of the research has been based on incomplete data and general behavior, and has not considered the impact of contextual variables, such as task and a user's familiarity with a topic, on behaviors
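As an illustration of how the display time-relevance correlation mentioned above can be checked from logged data (my own sketch; the numbers are made up, and spearmanr comes from SciPy):

```python
from scipy.stats import spearmanr

# Hypothetical paired observations for one user: seconds a page was displayed
# and the usefulness rating (1-7) the user later assigned to it.
display_seconds = [12, 45, 8, 120, 30, 5, 60, 90]
usefulness      = [ 3,  6, 2,   7,  5, 1,  6,  5]

rho, p_value = spearmanr(display_seconds, usefulness)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```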
Implicit Feedback Study
To investigate:
the relationship between behaviors and relevance
the relationship between behaviors and context
To develop a method for studying and measuring behaviors, context and relevance in a natural setting, over time
Method
Approach: naturalistic and longitudinal, but with some control
Subjects/Cases: 7 Ph.D. students
Study period: 14 weeks
Compensation: new laptops and printers
Data Collection
Relevance: Usefulness (of documents)
Behaviors: Display Time, Printing, Saving
Context: Tasks, Topics, Endurance, Frequency, Stage, Persistence, Familiarity
Protocol
[Timeline: 14-week study. Client- & server-side logging throughout; context evaluations at Week 1 (START) and Week 13; document evaluations collected through to the END.]
Results: Description of Data
Subject | Client log | Proxy log | URLs Requested | Docs Evaluated | Tasks | Topics
1 | 2.6 MB | 1.7 GB | 15,499 | 870 (5%) | 6 | 9
2 | 6.8 MB | 83 MB | 5,319 | 802 (14%) | 11 | 80
3 | 3.9 MB | 39 MB | 3,157 | 384 (12%) | 19 | 17
4 | 2.0 MB | 42 MB | 3,205 | 353 (11%) | 25 | 35
5 | 1.5 MB | 48 MB | 3,404 | 200 (6%) | 12 | 25
6 | 21.7 MB | 2.9 GB | 14,586 | 1,328 (8%) | 21 | 40
7 | 4.9 MB | 2.1 GB | 11,657 | 1,160 (10%) | 33 | 26
Relevance: Usefulness
[Bar chart: mean usefulness rating (1-7 scale) per subject, with standard deviations: Subject 1 = 4.8 (1.65), 2 = 6.1 (2.00), 3 = 5.3 (2.20), 4 = 6.0 (0.80), 5 = 5.3 (2.40), 6 = 4.6 (0.80), 7 = 5.0 (2.40).]
Relevance: Usefulness
[Plot: usefulness ratings (1-7) by week viewed, Weeks 1-14, for each subject.]
Display Time
[Histogram: distribution of client display times.]
Display Time & Usefulness
[Plot: mean display time by usefulness rating (1-7), shown separately for each of the 7 subjects.]
Display Time & Task
[Plot: mean display time by task number (1-6); n per task: 33, 127, 78, 17, 30, 16.]
Tasks
1. Researching Dissertation
2. Shopping
3. Read News
4. Movie Reviews & Schedules
5. Preparing Course
6. Entertainment
Major Findings
Behaviors differed for each subject, but in general:
most display times were low
most usefulness ratings were high
not much printing or saving
No direct relationship between display time and usefulness
Major Findings
Main effects for display time and all contextual variables:
Task (5 subjects)
Topic (6 subjects)
Familiarity (5 subjects)
Lower levels of familiarity were associated with higher display times
No clear interaction effects among behaviors, context and relevance
Personalizing Search
Using the display time, task and relevance information from the study, we evaluated the effectiveness of a set of personalized retrieval algorithms
Four algorithms for using display time as implicit feedback were tested:
1. User
2. Task
3. User + Task
4. General
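A minimal sketch of the underlying idea (my own illustration, not the exact algorithms evaluated in the study): pages whose display time exceeds a threshold are treated as implicitly relevant, and the threshold can be fit globally ('General'), per user, per task, or per user-task pair:

```python
from statistics import mean
from typing import Dict, List, Tuple

# One observation: (user, task, display time in seconds)
Observation = Tuple[str, str, float]

def _group_key(user: str, task: str, scheme: str) -> tuple:
    return {"general": (), "user": (user,), "task": (task,),
            "user+task": (user, task)}[scheme]

def fit_thresholds(obs: List[Observation], scheme: str = "general") -> Dict[tuple, float]:
    """Fit a display-time threshold (here, simply the mean display time) per group."""
    groups: Dict[tuple, List[float]] = {}
    for user, task, secs in obs:
        groups.setdefault(_group_key(user, task, scheme), []).append(secs)
    return {k: mean(v) for k, v in groups.items()}

def implicitly_relevant(user: str, task: str, secs: float,
                        thresholds: Dict[tuple, float], scheme: str = "general") -> bool:
    """Treat a page as implicitly relevant if its display time exceeds its group's threshold."""
    return secs > thresholds.get(_group_key(user, task, scheme), float("inf"))
```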
Results
[Plot: MAP by iteration (0-20) for each of the four algorithms.]
Major Findings
Tailoring display time thresholds based on task information improved performance, but doing so based on user information did not
There was a lot of variability between subjects, with the user-centered algorithms performing well for some and poorly for others
The effectiveness of most of the algorithms increased with time (and more data)
Some Problems
Relevance
What are we modeling? Does click = relevance?
Relevance is multi-dimensional and dynamic
A single measure does not adequately reflect 'relevance'
Most pages are likely to be rated as useful, even if the value or importance of the information differs
Examples: Definition; Recipe; Weather Forecast; Information about Rocky Mountain Spotted Fever; Paper about Personalization
Page Structure
Some behaviors are more likely to occur on some types of pages
A more 'intelligent' modeling function would know when and what to observe and expect
The structure of pages encourages or inhibits certain behaviors
Not all pages are equally useful for modeling a user's interests
[Screenshots of several different page types, each prompting: What types of behaviors do you expect here?]
The Future
Future
New interaction styles and systems create new opportunities for explicit and implicit feedback
Collaborative search features and query recommendation
Features/Systems that support the entire search process (e.g., saving, organizing, etc.)
QA systems
New types of feedback: Negative, Physiological
Thank You
Diane Kelly ([email protected])
Web: http://ils.unc.edu/~dianek/research.html
Collaborators: Nick Belkin, Xin Fu, Vijay Dollu, Ryen White
TREC [Text REtrieval Conference]
It’s not this …
What is TREC?
TREC is a workshop series sponsored by the National Institute of Standards and Technology (NIST) and the US Department of Defense. Its purpose is to build infrastructure for large-scale evaluation of text retrieval technology.
TREC collections and evaluation measures are the de facto standard for evaluation in IR. TREC consists of different tracks, each of which focuses on different issues (e.g., question answering, filtering).
TREC Collections
Central to each TREC Track is a collection, which consists of three major components: 1. A corpus of documents (typically newswire)
2. A set of information needs (called 'topics')
3. A set of relevance judgments.
Each Track also adopts particular evaluation measures:
Precision and Recall; F-measure
Average Precision (AP) and Mean AP (MAP)
Comparison of Measures
List 1: relevant documents (R) at ranks 1-5, non-relevant (NR) at ranks 6-10
Precision at each relevant document: 1/1, 2/2, 3/3, 4/4, 5/5 = 1.0 each
AP = 1.0

List 2: non-relevant documents (NR) at ranks 1-5, relevant (R) at ranks 6-10
Precision at each relevant document: 1/6 = .167, 2/7 = .286, 3/8 = .375, 4/9 = .444, 5/10 = .50
AP = .354
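The AP values above can be verified with a short computation (a sketch of the calculation as shown on the slide: precision is taken at each relevant document and averaged over the relevant documents):

```python
def average_precision(relevant_flags):
    """relevant_flags: booleans in rank order (True = relevant at that rank)."""
    hits, precisions = 0, []
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

list_1 = [True] * 5 + [False] * 5   # relevant documents at ranks 1-5
list_2 = [False] * 5 + [True] * 5   # relevant documents at ranks 6-10
print(average_precision(list_1))    # 1.0
print(average_precision(list_2))    # ~0.354
# MAP is simply the mean of AP across a set of topics.
```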
Learn more about TREC
http://trec.nist.gov
Voorhees, E. M., & Harman, D. K. (2005). TREC: Experiment and Evaluation in Information Retrieval, Cambridge, MA: MIT Press.
Example Topic
Learn more about IR
ACM SIGIR Conference
Sparck Jones, K., & Willett, P. (1997). Readings in Information Retrieval. Morgan Kaufmann Publishers.
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York, NY: ACM Press.
Grossman, D. A., & Frieder, O. (2004). Information Retrieval: Algorithms and Heuristics. The Netherlands: Springer.