Stuff I’ve Seen - Overview
Stuff I’ve Seen: A System for Personal Information Retrieval and Re-Use
Susan Dumais
Microsoft Research
http://research.microsoft.com/~sdumais
IDM 2003 Workshop
Outline
Search today
Search with Stuff I’ve Seen (SIS)
With: Edward Cutrell, JJ Cadiz, Gavin Jancke, Raman Sarin, Daniel Robbins
Experiences with SIS
Deployment
Usage data
UI innovations
Next steps for SIS
Search Today …
Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, notes)
“… the No.1 question we're trying to solve [in Longhorn] is ‘Where's my stuff?’ Right now, file space on any PC is a cesspool.”
Bill Gates, FORTUNE interview, June 23, 2002
Often slow
Search With SIS
Unified index of stuff you’ve seen
All types of information, e.g., files of all types, email, calendar, contacts, web pages, etc.
Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
Automatic and immediate update of index
Rich UI possibilities, since it’s your content
Get back to information you’ve seen
Re-use vs. initial discovery
Related Work
Several systems for improving access for specific sources
(e.g., web, mail, files, photos, music)
Some integration across sources
KFTF [Jones et al., 2002]
Lifestreams/Scopeware [Fertig, Freeman, Gelernter, 1996]
MyLifeBits [Gemmell et al., 2002]
Haystack [Adar et al., 1999; Huynh et al., 2002]
Commercial products
OS: Mac Sherlock, Windows Indexing Service
Apps: Enfish, 80-20 Retriever, dtSearch, X1, etc.
What’s new with SIS …
Full content and metadata for many different sources
Extensible architecture
Usage experiences and experimental data
UI focus
SIS Architecture
Indexing infrastructure uses MS Search components (an IR platform)
Gatherer – interface to content sources, e.g., files, HTTP, MAPI
Filters – decode different file types, e.g., Word, PowerPoint, HTML, PDF, Journal
Tokenizer – break into words, including date normalization, stemming, etc.
Indexer – standard inverted index
Retriever – Boolean, best match (Okapi)
User interface
Client side indexing and storage
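The gatherer, filter, tokenizer, indexer, and retriever stages can be illustrated with a toy in-memory version. This is a minimal sketch under stated assumptions, not the MS Search implementation: items are assumed to arrive already decoded to plain text (standing in for the gatherer and filters), the suffix stripper is a crude stand-in for real stemming, date normalization is omitted, and best-match ranking uses Okapi BM25 with a common non-negative idf variant and the usual default parameters k1 = 1.2, b = 0.75 (assumed values). All names (`tokenize`, `InvertedIndex`, `boolean_and`, `bm25`) are illustrative.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on non-alphanumerics, strip a few suffixes.

    The suffix stripping is a crude, assumed stand-in for a real stemmer.
    """
    stemmed = []
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        for suffix in ("ing", "es", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        stemmed.append(tok)
    return stemmed


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens

    def add(self, doc_id, text):
        """Index (or re-index) one item immediately, replacing any old entry."""
        self.remove(doc_id)
        counts = Counter(tokenize(text))
        for term, tf in counts.items():
            self.postings[term][doc_id] = tf
        self.doc_len[doc_id] = sum(counts.values())

    def remove(self, doc_id):
        """Drop an item; scans every posting list for simplicity, not speed."""
        if doc_id not in self.doc_len:
            return
        for docs in self.postings.values():
            docs.pop(doc_id, None)
        del self.doc_len[doc_id]

    def boolean_and(self, query):
        """Boolean retrieval: ids of items containing every query term."""
        terms = set(tokenize(query))
        if not terms:
            return set()
        return set.intersection(*(set(self.postings.get(t, {})) for t in terms))

    def bm25(self, query, k1=1.2, b=0.75):
        """Best-match retrieval: (doc_id, score) pairs, highest score first."""
        n = len(self.doc_len)
        if n == 0:
            return []
        avgdl = sum(self.doc_len.values()) / n
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term, {})
            df = len(docs)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            for doc_id, tf in docs.items():
                norm = tf + k1 * (1 - b + b * self.doc_len[doc_id] / avgdl)
                scores[doc_id] += idf * tf * (k1 + 1) / norm
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


idx = InvertedIndex()
idx.add("mail1", "Meeting notes from Susan about indexing")
idx.add("file1", "Draft paper on personal information retrieval")
idx.add("web1", "Indexing service documentation")
print(sorted(idx.boolean_and("indexing")))  # ['mail1', 'web1']
print(idx.bm25("indexing notes")[0][0])      # mail1
```

Re-indexing inside `add` mirrors the "automatic and immediate update" goal in miniature; a real client-side indexer would update postings incrementally rather than scanning them.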
SIS Design Principles
Indexing …
No additional work is required
User sees something, and it gets indexed
Retrieval …
Fast, flexible
Interactive refinement
Sort and filter on metadata
Note: Sort/filter automatically triggers query
UI experiments
Top/Side, Previews
Richer visualizations
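The principle that sorting or filtering on metadata automatically triggers a query can be sketched by treating query text, filters, and sort key as one query state that is re-evaluated on every change, rather than reordering a stale result list. The `Item` and `QueryState` records and `run_query` function are hypothetical, and the term-count score is an assumed stand-in for a real best-match ranker.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Item:
    # Hypothetical item record: text plus a few metadata attributes.
    title: str
    kind: str          # e.g. "email", "web", "file"
    seen: date
    text: str = ""


@dataclass
class QueryState:
    terms: str = ""
    kinds: set = field(default_factory=set)   # empty set means "any type"
    since: Optional[date] = None
    sort_by: str = "date"                     # "date" or "rank"


def run_query(items, state):
    """Match, filter on metadata, then sort; meant to run on every state change."""
    words = state.terms.lower().split()

    def rank(item):
        # Term-count score standing in for a best-match ranker.
        body = (item.title + " " + item.text).lower().split()
        return sum(body.count(w) for w in words)

    hits = [i for i in items if not words or rank(i) > 0]
    if state.kinds:
        hits = [i for i in hits if i.kind in state.kinds]
    if state.since is not None:
        hits = [i for i in hits if i.seen >= state.since]
    key = (lambda i: i.seen) if state.sort_by == "date" else rank
    return sorted(hits, key=key, reverse=True)


items = [
    Item("Budget review", "email", date(2003, 5, 2), "budget numbers"),
    Item("Budget draft", "file", date(2003, 4, 20), "budget budget"),
    Item("Team page", "web", date(2003, 5, 1), "team budget"),
]
state = QueryState(terms="budget")
print([i.title for i in run_query(items, state)])
# ['Budget review', 'Team page', 'Budget draft']
state.kinds = {"email", "file"}   # a UI change handler would re-run the query here
state.sort_by = "rank"
print([i.title for i in run_query(items, state)])
# ['Budget draft', 'Budget review']
```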
SIS Demo
Evaluating SIS
Internal deployment
~1500 downloads
Users include: program management, test, sales, development, administrative, executives, etc.
Research techniques
Free-form feedback
Questionnaires; Structured interviews
Usage patterns from log data
UI experiments (randomly deploy different versions)
Lab studies for richer UI (e.g., timeline, trends)
But even here must work with users’ own content
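Randomly deploying different UI versions can be sketched as a between-subjects assignment of each user to one variant. This is an illustrative sketch, not the study's actual procedure: it assumes the three factors compared in the SIS experiments (Top vs. Side view, Previews vs. none, default sort by Date vs. Rank) are fully crossed, and it hashes a hypothetical user id so each person keeps the same variant across sessions.

```python
import hashlib
import itertools

# Assumed factors, taken from the UI comparisons reported for SIS.
FACTORS = {
    "layout": ["top", "side"],
    "previews": [True, False],
    "default_sort": ["date", "rank"],
}
# All 2 x 2 x 2 = 8 combinations, in a fixed order.
VARIANTS = [dict(zip(FACTORS, combo)) for combo in itertools.product(*FACTORS.values())]


def assign_variant(user_id: str) -> dict:
    """Deterministically map a user id to one UI variant.

    Hashing (rather than drawing a fresh random number each launch)
    keeps a user's variant stable across sessions.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(VARIANTS)
    return VARIANTS[index]


print(assign_variant("user-0042"))
```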
Top vs. Side Views
Previews vs. Not
Sort By Date vs. Rank
SIS Usage Data
Detailed analysis of 234 people, 6 weeks of usage
Personal store characteristics
5k–100k items; index < 150 MB
Query characteristics
Short queries (1.59 words on average)
Few advanced operators or fielded searches in the query box (7.5%)
Frequent use of query iteration (48%)
50% of refined queries involve filters (type and date most common)
35% of refined queries involve changes to the query
13% of refined queries involve re-sorting
Query content
Compared with Spink et al.’s analysis of web queries
Importance of people
29% of the queries involve people’s names
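Figures such as mean query length and the split of refinements into filter, query, and re-sort changes come from log analysis. A minimal sketch of that analysis follows; the `LogEntry` record format is a hypothetical assumption (the SIS log schema is not described here), and in this sketch a single step that changes several things counts once in each affected category.

```python
from collections import Counter
from typing import NamedTuple


class LogEntry(NamedTuple):
    # Hypothetical log record: one row per query issued in a session.
    terms: str
    filters: frozenset   # e.g. frozenset({"type=email"})
    sort_by: str         # e.g. "date" or "rank"


def classify_refinement(prev: LogEntry, cur: LogEntry) -> set:
    """Label what changed between two consecutive queries."""
    changes = set()
    if cur.filters != prev.filters:
        changes.add("filter")
    if cur.terms.split() != prev.terms.split():
        changes.add("query")
    if cur.sort_by != prev.sort_by:
        changes.add("re-sort")
    return changes


def summarize(session):
    """Mean words per query and counts of refinement types for one session."""
    lengths = [len(e.terms.split()) for e in session]
    mean_len = sum(lengths) / len(lengths) if lengths else 0.0
    kinds = Counter()
    for prev, cur in zip(session, session[1:]):
        kinds.update(classify_refinement(prev, cur))
    return mean_len, kinds


session = [
    LogEntry("budget", frozenset(), "date"),
    LogEntry("budget", frozenset({"type=email"}), "date"),
    LogEntry("budget review", frozenset({"type=email"}), "date"),
    LogEntry("budget review", frozenset({"type=email"}), "rank"),
]
mean_len, kinds = summarize(session)
print(mean_len)  # 1.5
print(kinds)
```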
SIS Usage Data, cont’d
Characteristics of items opened
File types opened
76% Email
14% Web pages
10% Files
Age of items opened
7% today
22% within the last week
46% within the last month
log(Freq) = -0.68 × log(DaysSinceSeen) + 2.02
Ease of finding information
Easier after SIS for web, email, files
Non-SIS search decreases for web, email, files
[Figure: Frequency of opened items vs. Days Since Item First Seen (0–2500 days)]
[Figure: Ease of finding information, Pre-usage vs. Post-usage, for Files, Email, Web Pages]
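The fitted line for the age of opened items is linear in log-log space, i.e., a power law. Assuming base-10 logarithms (the base is not stated) and reading Freq as the number of opened items at a given age, the fit can be rewritten as Freq = 10^2.02 × DaysSinceSeen^(-0.68) and evaluated directly. The coefficients come from the fit above; the function name is illustrative.

```python
import math

SLOPE = -0.68      # from the fitted line
INTERCEPT = 2.02   # from the fitted line; base-10 logs assumed


def predicted_frequency(days_since_seen: float) -> float:
    """Evaluate log10(Freq) = -0.68 * log10(days) + 2.02."""
    if days_since_seen <= 0:
        raise ValueError("log is undefined for non-positive ages")
    return 10 ** (INTERCEPT + SLOPE * math.log10(days_since_seen))


for days in (1, 10, 100, 1000):
    print(days, round(predicted_frequency(days), 1))
```

Under these assumptions, each tenfold increase in age divides the predicted frequency by 10^0.68 ≈ 4.8, consistent with opened items clustering in the recent past.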
SIS Usage, cont’d
UI Usage
Small effects of Top/Side, Previews
Sort order
Date by far the most common sort field, even for people who had Okapi Rank as default
Importance of time
Few searches for “best” match; many other criteria …
[Figure: Number of Queries Issued by Starting Default Sort Column (Date vs. Rank), with series for Date and Rank]
SIS Usage, cont’d
Observations about unified access
Metadata quality is variable
Email: rich, pretty clean
Web: little, not very useful for retrieval
Files: some, but often wrong
Human annotation: don’t depend on it …
Need abstractions, e.g., “Useful date”
Initially, used ‘date seen’
But the useful date differs by item type:
Appointments: when they happen
Email and Web: when seen
Files: when changed
What do people remember about time?
Memory landmarks
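The “useful date” abstraction amounts to choosing a different timestamp per item type. A minimal sketch, assuming hypothetical field names (`start` for an appointment's time, `seen` for when an email or web page was seen, `modified` for a file's last change) and falling back to ‘date seen’, the original default, for any other type:

```python
from datetime import datetime
from typing import Optional


def useful_date(item: dict) -> Optional[datetime]:
    """Pick the timestamp a person is most likely to remember for an item.

    Appointments: when they happen; email and web: when seen;
    files: when changed; anything else: when seen (assumed fallback).
    """
    kind = item.get("kind")
    if kind == "appointment":
        return item.get("start")
    if kind in ("email", "web"):
        return item.get("seen")
    if kind == "file":
        return item.get("modified")
    return item.get("seen")


items = [
    {"kind": "appointment", "seen": datetime(2003, 5, 1), "start": datetime(2003, 6, 10, 9)},
    {"kind": "file", "seen": datetime(2003, 5, 20), "modified": datetime(2003, 4, 2)},
    {"kind": "email", "seen": datetime(2003, 5, 15)},
]
for item in sorted(items, key=useful_date, reverse=True):
    print(item["kind"], useful_date(item).date())
```

Note that the file sorts last even though it was seen most recently, because its useful date is when it changed.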
SIS, Timeline w/ Landmarks
Timeline interface
Augmented with landmarks as pointers into human memory
General: holidays, world events
Personal: important photos, appointments
Heuristics or Bayesian models to identify memorable events
SIS, Timeline w/ Landmarks
Distribution of Results Over Time
Search Results
Memory Landmarks
- General (world, calendar)
- Personal (appts, photos)
<linked by time to results>
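Linking landmarks to results by time can be sketched as a nearest-neighbor lookup on landmarks sorted by date. The landmark labels and result dates below are made-up illustrations, and picking the single nearest landmark is one simple policy among several; deciding which events are memorable (the heuristic or Bayesian step) is not modeled.

```python
import bisect
from datetime import date


def nearest_landmark(landmarks, when):
    """Return the (date, label) landmark closest in time to `when`.

    `landmarks` must be a non-empty list of (date, label) pairs sorted by date.
    """
    dates = [d for d, _ in landmarks]
    pos = bisect.bisect_left(dates, when)
    # Only the landmarks just before and just after `when` can be nearest.
    candidates = landmarks[max(pos - 1, 0): pos + 1]
    return min(candidates, key=lambda lm: abs(lm[0] - when))


landmarks = [
    (date(2002, 7, 4), "Independence Day"),          # general landmark
    (date(2002, 11, 28), "Thanksgiving"),            # general landmark
    (date(2003, 1, 15), "Team offsite (calendar)"),  # hypothetical personal landmark
]
for result_date in (date(2002, 7, 10), date(2002, 12, 30), date(2003, 3, 1)):
    _, label = nearest_landmark(landmarks, result_date)
    print(result_date, "->", label)
```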
SIS, Timeline Experiment
[Figure: Search Time (s) for Dates Only vs. Landmarks + Dates; legend: With Landmarks, Without Landmarks]
SIS, Visualizing Trends
Summarize the results of a search
Grid-based design
Abstraction beyond individual results
Axes represent topic, time, people
Cells encode frequency, recency
Supports activities like:
What newsgroups are active (on topic x)?
What people are active, authoritative (on topic x)?
When did I last interact w/ people?
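The grid summary can be sketched as an aggregation of search results into cells keyed by two axes, here person × month, where each cell stores how many results fall in it (frequency) and the latest date among them (recency). The `(person, date)` result tuples and names are hypothetical, and the visual encoding of the real grid is not reproduced.

```python
from collections import defaultdict
from datetime import date


def build_grid(results):
    """Aggregate (person, date) search hits into person x month cells.

    Returns {person: {"YYYY-MM": (count, most_recent_date)}}.
    """
    grid = defaultdict(dict)
    for person, when in results:
        month = f"{when.year:04d}-{when.month:02d}"
        count, latest = grid[person].get(month, (0, when))
        grid[person][month] = (count + 1, max(latest, when))
    return dict(grid)


def last_interaction(grid, person):
    """Answer 'When did I last interact with this person?' from the grid."""
    cells = grid.get(person, {})
    return max((latest for _, latest in cells.values()), default=None)


results = [
    ("Alice", date(2003, 4, 3)),
    ("Alice", date(2003, 4, 28)),
    ("Bob", date(2003, 3, 14)),
    ("Alice", date(2003, 5, 2)),
]
grid = build_grid(results)
print(grid["Alice"]["2003-04"])      # (2, datetime.date(2003, 4, 28))
print(last_interaction(grid, "Bob"))  # 2003-03-14
```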
SIS, Visualizing Trends
SIS, Grid vs. List Experiment
[Figure: Grid View vs. List View]
Next Steps
Continue explorations of rich UI
Augment index with “usage” data
SIS as service, with many entry points
“Contextualize” retrieval
Retrieve using Implicit Queries
Identify Stuff I Should See
Flat-land
Good search makes filing less important
Attributes rather than directory locations
SIS Summary
Unified index of stuff you’ve seen
Fast access to full-text and metadata
Heterogeneous content: files, email, web, etc.
Automatic and immediate update of index
Studied usage with several techniques
Ease of finding improves with SIS
Importance of people and time
Short queries, quick iteration
Novel UI to leverage personal memories
New capabilities for personal information management
More info: http://research.microsoft.com/~sdumais
Vannevar Bush’s Vision
Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
V. Bush (1945). As We May Think. Atlantic Monthly, 176, July 1945, 101–108.