Stuff I’ve Seen - Overview

Stuff I’ve Seen:
A System for Personal Information
Retrieval and Re-Use
Susan Dumais
Microsoft Research
http://research.microsoft.com/~sdumais
With: Edward Cutrell, JJ Cadiz, Gavin Jancke, Raman Sarin, Daniel Robbins
IDM 2003 Workshop

Outline

Search today
Search with Stuff I’ve Seen (SIS)
Experiences with SIS
 Deployment
 Usage data
 UI innovations
Next steps for SIS

Search Today …

Many locations, interfaces for
finding things (e.g., web, mail, local
files, help, history, notes)
“… the No.1 question we're trying to
solve [in Longhorn] is ‘Where's my
stuff?’ Right now, file space on any
PC is a cesspool.”
Bill Gates, FORTUNE interview, June 23, 2002

Often slow
Search With SIS

Unified index of stuff you’ve seen




All types of information, e.g., files of all types,
email, calendar, contacts, web pages, etc.
Full-text index of content plus metadata
attributes (e.g., creation time, author, title, size)
Automatic and immediate update of index
Rich UI possibilities, since it’s your content
 Get back to information you’ve seen
 Re-use vs. initial discovery
Related Work


Several systems for improving access for specific sources
(e.g., web, mail, files, photos, music)
Some integration across sources





KFTF [Jones et al., 2002]
Lifestreams/Scopeware [Fertig, Freeman, Gelernter, 1996]
MyLifeBits [Gemmell et al., 2002]
Haystack [Adar et al., 1999; Huynh et al. 2002]
Commercial products



OS: Mac Sherlock, Windows Indexing Service
Apps: Enfish, 80-20 retriever, dtSearch, X1, etc.
What’s new with SIS …




Full content and metadata for many different sources
Extensible architecture
Usage experiences and experimental data
UI focus
SIS Architecture

Indexing infrastructure uses MS Search
components (note: IR platform)







Gatherer – interface to content sources, e.g., files,
HTTP, MAPI
Filters – decode different file types, e.g., Word,
PowerPoint, HTML, PDF, Journal
Tokenizer – break into words, including date
normalization, stemming, etc.
Indexer – standard inverted index
Retriever – Boolean, best match (Okapi)
User interface
Client side indexing and storage
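
The tokenizer, indexer and retriever stages listed above can be illustrated with a toy example. This is only a sketch, not the MS Search components SIS is built on: the class name TinyIndex, the regex tokenizer, and the parameters k1 and b are assumptions, and the scoring uses a common textbook form of Okapi BM25 with a non-negative IDF. Stemming, date normalization, and the gatherer and filter stages are left out.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (no stemming or date handling)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index with BM25 ranking; names and defaults are illustrative."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1 = k1
        self.b = b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens
        self.metadata = {}                 # doc_id -> metadata attributes

    def add(self, doc_id, text, **metadata):
        """Index the full text of one item and keep its metadata alongside."""
        tokens = tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        self.metadata[doc_id] = metadata
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Return doc ids ordered by BM25 score, best match first."""
        n = len(self.doc_len)
        if n == 0:
            return []
        avg_len = sum(self.doc_len.values()) / n
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            idf = math.log(1 + (n - len(docs) + 0.5) / (len(docs) + 0.5))
            for doc_id, tf in docs.items():
                length_norm = 1 - self.b + self.b * self.doc_len[doc_id] / avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return sorted(scores, key=scores.get, reverse=True)


index = TinyIndex()
index.add("mail-1", "Budget review meeting with Susan", kind="email")
index.add("file-1", "Quarterly budget spreadsheet budget totals", kind="file")
index.add("web-1", "Weather forecast for Seattle", kind="web")
print(index.search("budget"))
```

Keeping metadata next to the postings is what lets a single index serve both full-text matching and the sort/filter operations described on later slides.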
SIS Design Principles

Indexing …



No additional work is required
User sees something, and it gets indexed
Retrieval …


Fast, flexible
Interactive refinement



Sort and filter on metadata
Note: Sort/filter automatically triggers query
UI experiments


Previews, Top/Side views
Richer visualizations
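
Interactive refinement by sorting and filtering on metadata might look roughly like the sketch below. The result schema (dicts with 'kind', 'date' and 'rank' keys) and the function name refine are hypothetical, chosen only for illustration; in SIS itself a sort or filter change re-issues the query rather than post-processing a list.

```python
from datetime import date


def refine(results, kind=None, sort_by="date", descending=True):
    """Filter results by item type, then sort on one metadata field.

    results: list of dicts with 'kind', 'date' and 'rank' keys (assumed schema).
    """
    kept = [r for r in results if kind is None or r["kind"] == kind]
    return sorted(kept, key=lambda r: r[sort_by], reverse=descending)


sample = [
    {"kind": "email", "date": date(2003, 5, 2), "rank": 0.4},
    {"kind": "file", "date": date(2003, 6, 1), "rank": 0.9},
    {"kind": "email", "date": date(2003, 6, 10), "rank": 0.1},
]
newest_mail_first = refine(sample, kind="email")
best_match_first = refine(sample, sort_by="rank")
```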
SIS Demo
Evaluating SIS

Internal deployment



~1500 downloads
Users include: program management, test, sales,
development, administrative, executives, etc.
Research techniques





Free-form feedback
Questionnaires; Structured interviews
Usage patterns from log data
UI experiments (randomly deploy different versions)
Lab studies for richer UI (e.g., timeline, trends)
 But even here must work with users’ own content
Top vs. Side Views
Previews vs. Not
Sort By Date vs. Rank
SIS Usage Data
Detailed analysis for 234 people, 6 weeks of usage
 Personal store characteristics
5k – 100k items; index < 150 MB
Query characteristics



Short queries (1.59 words)
Few advanced operators or fielded search in query box (7.5%)
Frequent use of query iteration (48%)




50% of refined queries involve filters – type, date most common
35% of refined queries involve changes to query
13% of refined queries involve re-sort
Query content


Vs. Spink et al.’s analysis of web queries
Importance of people

29% of the queries involve people’s names
SIS Usage Data, cont’d
Characteristics of items opened
 File types opened
76% Email
14% Web pages
10% Files
 Age of items opened
7% today
22% within the last week
46% within the last month
Log(Freq) = -0.68 * log(DaysSinceSeen) + 2.02
Ease of finding information


Easier after SIS for web, email, files
Non-SIS search decreases for web,
email, files
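
The fitted line Log(Freq) = -0.68 * log(DaysSinceSeen) + 2.02 can be evaluated directly. The slide does not state the base of the logarithm; the sketch below assumes base 10, which puts the predicted frequency at one day at roughly 105. The slope is independent of the base: a tenfold increase in age multiplies the predicted frequency by 10^-0.68, about 0.21. The function name is illustrative.

```python
import math

SLOPE = -0.68      # from the slide
INTERCEPT = 2.02   # from the slide


def predicted_frequency(days_since_seen):
    """Predicted number of opens for items first seen this many days ago.

    Assumes the fit on the slide uses base-10 logarithms.
    """
    if days_since_seen <= 0:
        raise ValueError("days_since_seen must be positive")
    return 10 ** (SLOPE * math.log10(days_since_seen) + INTERCEPT)


for days in (1, 7, 30, 365):
    print(days, round(predicted_frequency(days), 1))
```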
[Figure: Frequency of items opened vs. Days Since Item First Seen]
[Figure: Pre-usage vs. Post-usage, for Files, Email, Web Pages]
SIS Usage, cont’d
UI Usage
 Small effects of
Top/Side, Previews
 Sort order


Date by far the most
common sort field, even
for people who had Okapi
Rank as default
Importance of time
Few searches for “best”
match; many other
criteria …
[Figure: Number of Queries Issued, by sort column used (Date, Rank), grouped by Starting Default Sort Column (Date, Rank)]
SIS Usage, cont’d
Observations about unified access
 Metadata quality is variable





Email: rich, pretty clean
Web: little, not very useful for retrieval
Files: some, but often wrong
Human annotation: don’t depend on it …
Need abstractions, e.g., “Useful date”


Initially, used ‘date seen’
But …




Appointment, when it happens
Email and Web, seen
Files, changed
What do people remember about time?

Memory landmarks
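
The “useful date” abstraction above can be sketched as a single function that picks a different timestamp per item type: when it happens for appointments, when seen for email and web pages, when changed for files. The Item fields and the fallback to the seen date when a timestamp is missing are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Item:
    kind: str                            # "appointment", "email", "web" or "file"
    seen: datetime                       # when the user last saw the item
    modified: Optional[datetime] = None  # last change time, for files
    starts: Optional[datetime] = None    # start time, for appointments


def useful_date(item: Item) -> datetime:
    """Pick the date most likely to match what the user remembers."""
    if item.kind == "appointment" and item.starts is not None:
        return item.starts
    if item.kind == "file" and item.modified is not None:
        return item.modified
    return item.seen  # email, web pages, and anything lacking a better date


meeting = Item("appointment", seen=datetime(2003, 5, 1), starts=datetime(2003, 5, 20, 10, 0))
report = Item("file", seen=datetime(2003, 6, 3), modified=datetime(2003, 5, 28))
mail = Item("email", seen=datetime(2003, 6, 2))
```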
SIS, Timeline w/ Landmarks
Timeline interface
 Augmented with landmarks as
pointers into human memory

General: holidays, world events
 Personal: important photos, appointments
 Heuristics or Bayesian models to identify
memorable events
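
One simple way to tie landmarks to results by time is to look up, for each result date, the most recent landmark on or before it. This is only an illustration of the idea; the function name and the example landmarks are made up, and selecting which events are memorable (the heuristics or Bayesian models mentioned above) is not covered.

```python
from bisect import bisect_right
from datetime import date


def nearest_prior_landmark(landmarks, when):
    """Return the label of the latest landmark on or before `when`, or None.

    landmarks: list of (date, label) pairs, sorted by date.
    """
    dates = [d for d, _ in landmarks]
    i = bisect_right(dates, when)
    return landmarks[i - 1][1] if i else None


landmarks = [
    (date(2003, 1, 1), "New Year's Day"),
    (date(2003, 7, 4), "Independence Day"),
]
label = nearest_prior_landmark(landmarks, date(2003, 3, 15))
```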

SIS, Timeline w/ Landmarks
Distribution of Results Over Time
Search Results
Memory Landmarks
- General (world, calendar)
- Personal (appts, photos)
<linked by time to results>
SIS, Timeline Experiment
[Figure: Search Time (s) for Dates Only vs. Landmarks + Dates; panels: With Landmarks, Without Landmarks]
SIS, Visualizing Trends

Summarize the results of a search


Grid-based design



Abstraction beyond individual results
Axes represent topic, time, people
Cells encode frequency, recency
Supports activities like:



What newsgroups are active (on topic x)?
What people are active, authoritative (on topic x)?
When did I last interact w/ people?
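
The grid idea can be sketched as an aggregation over search results: one axis is people, the other is time (bucketed by month here), and each cell records frequency and recency. The input format, the month bucketing and the function names are assumptions for illustration, not the SIS design.

```python
from datetime import date


def build_grid(results):
    """Aggregate (person, date) results into {(person, 'YYYY-MM'): cell}.

    Each cell holds 'count' (frequency) and 'latest' (recency).
    """
    grid = {}
    for person, when in results:
        key = (person, when.strftime("%Y-%m"))
        cell = grid.setdefault(key, {"count": 0, "latest": when})
        cell["count"] += 1
        cell["latest"] = max(cell["latest"], when)
    return grid


def last_interaction(grid, person):
    """Answer 'when did I last interact with this person?' from the grid."""
    dates = [cell["latest"] for (p, _), cell in grid.items() if p == person]
    return max(dates) if dates else None


results = [
    ("alice", date(2003, 5, 1)),
    ("alice", date(2003, 5, 20)),
    ("bob", date(2003, 4, 2)),
    ("alice", date(2003, 3, 9)),
]
grid = build_grid(results)
```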
SIS, Grid vs. List Experiment
Grid View
List View
Next Steps



Continue explorations of rich UI
Augment index with “usage” data
SIS as service, with many entry points




“Contextualize” retrieval
Retrieve using Implicit Queries
Identify Stuff I Should See
Flat-land
Good search makes filing less important
 Attributes rather than directory locations

SIS Summary

Unified index of stuff you’ve seen




Fast access to full-text and metadata
Heterogeneous content: files, email, web, etc.
Automatic and immediate update of index
Studied usage with several techniques




Ease of finding improves with SIS
Importance of people and time
Short queries, quick iteration
Novel UI to leverage personal memories

New capabilities for personal information
management

More info, http://research.microsoft.com/~sdumais
Vannevar Bush’s Vision

Consider a future device for individual use,
which is a sort of mechanized private file and
library. It needs a name, and, to coin one at
random, "memex" will do. A memex is a device
in which an individual stores all his books,
records, and communications, and which is
mechanized so that it may be consulted with
exceeding speed and flexibility. It is an enlarged
intimate supplement to his memory.
V. Bush (1945). As we may think. Atlantic Monthly, 176, July 1945, 101-108.