Shiffrin Symposium - Indiana University
Organizing Search Results
Susan Dumais
Microsoft Research
Sackler – May 11, 2003
Organizing Search Results
Algorithms and interfaces that improve the
effectiveness of search
Beyond ranked lists
Main goal to support search
Also information analysis and discovery
Example applications
SWISH, results classification
GridViz, results summarization
SIS, personal landmarks for context
Searching with Information
Structured Hierarchically (SWISH)
Collaborators
Edward Cutrell, Hao Chen (Berkeley)
Key Themes
Going beyond long lists of results
Classification algorithms
UI techniques
More about it
http://research.microsoft.com/~sdumais
Organizing Search Results
Query: “jaguar”
List Organization
=> Shopping
=> Automotive
=> Computers
=> Automotive
SWISH Category Organization
Web Directory
LookSmart Directory Structure
~400k pages; 17k categories; 7 levels
13 top-level categories; 150 second-level categories
Top-level Categories
Automotive
Business & Finance
Computers & Internet
Entertainment & Media
Health & Fitness
Hobbies & Interests
Home & Family
People & Chat
Reference & Education
Shopping & Services
Society & Politics
Sports & Recreation
Travel & Vacations
Second-level Categories (Automotive)
Buy or Sell a Car
Chat
Finance & Insurance
Magazines & Books
Maintenance & Repair
Makes, Models & Clubs
Motorcycles
New Car Showrooms
Off-Road, 4X4 & RVs
Other Auto Interests
Shows & Museums
Trucks & Tractors
Vintage & Classic
SWISH System
Combines the advantages of
Directories - Manually crafted structure but small <~3 million pages>
Search engines - Broad coverage but limited metadata <~3 billion pages>
Project search engine results to category structure
Two main components
Text classification models
UI for integrating search results and structure
Context (category structure) plus focus (search results)
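A minimal sketch of projecting a ranked result list onto a category structure, as described above. The function names, the keyword-based stand-in classifier, and the rule for ordering categories (most matches first) are assumptions for illustration, not the SWISH implementation.

```python
from collections import defaultdict

def group_results(results, classify, per_category=3):
    """Group ranked results by category, keeping rank order inside each group.

    results: list of (title, snippet) in search-engine rank order.
    classify: function mapping text to a category name (a stand-in here).
    """
    groups = defaultdict(list)
    for rank, (title, snippet) in enumerate(results):
        groups[classify(title + " " + snippet)].append((rank, title))
    # Assumed ordering: categories with more matches first,
    # ties broken by the best-ranked hit in the category.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[1][0][0]))
    return [(cat, [t for _, t in hits[:per_category]]) for cat, hits in ordered]

def toy_classify(text):
    """Hypothetical keyword rules standing in for a trained classifier."""
    text = text.lower()
    if "car" in text or "dealer" in text:
        return "Automotive"
    if "mac" in text or "software" in text:
        return "Computers & Internet"
    return "Other"

# Hypothetical results for the query "jaguar".
results = [
    ("Jaguar Cars", "official car site"),
    ("Jaguar for Mac", "emulator software"),
    ("Jaguar dealers", "find a dealer near you"),
]
grouped = group_results(results, toy_classify)
```

Here the category names supply the context and the few top-ranked titles kept per category supply the focus.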
SWISH Architecture
[Diagram: Train (offline): manually classified web pages. Classify (online): SVM model; web search results, local search results, ...]
Learning & Classification
Support Vector Machine (SVM)
Accurate and efficient for text classification
(Dumais et al., Joachims)
Model = weighted vector of words
“Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche …
“Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ...
Hierarchical models for LS directory
1 model for top level; N models for second level
Very useful in conjunction w/ user interaction
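A minimal sketch of "model = weighted vector of words" with one top-level model and per-category second-level models. The weights below are made up for illustration (a trained linear SVM would learn them and would also include a bias term), and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical weights for illustration only; a trained SVM would learn these.
TOP_LEVEL = {
    "Automotive": {"car": 1.0, "auto": 0.9, "motorcycle": 0.8,
                   "vehicle": 0.7, "parts": 0.4},
    "Computers & Internet": {"software": 1.0, "windows": 0.8, "pc": 0.8,
                             "hosting": 0.6, "downloads": 0.5},
}
SECOND_LEVEL = {
    "Automotive": {
        "Motorcycles": {"motorcycle": 1.0, "harley": 0.9, "honda": 0.3},
        "Maintenance & Repair": {"parts": 1.0, "repair": 0.9},
    },
}

def score(weights, text):
    """Dot product of a word-weight vector with the text's word counts."""
    counts = Counter(text.lower().split())
    return sum(w * counts[word] for word, w in weights.items())

def best(models, text, threshold=0.0):
    """Return the highest-scoring label, or None if no score exceeds the threshold."""
    if not models:
        return None
    label, s = max(((name, score(w, text)) for name, w in models.items()),
                   key=lambda pair: pair[1])
    return label if s > threshold else None

def classify(text):
    """Pick a top-level category, then a second-level one within it."""
    top = best(TOP_LEVEL, text)
    second = best(SECOND_LEVEL.get(top, {}), text) if top else None
    return top, second
```

For example, `classify("harley motorcycle parts")` scores only the Automotive second-level models once Automotive wins at the top level, which keeps the online step cheap.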
User Interface Experiments
List Organization vs. Category Organization
[Figure: interface conditions compared. Labels recovered: Hover, Inline, Inline + Cat Names, No Cat Names, Browse. Panels: Group Interface, List Interface.]
Effect of Query Difficulty
Easy queries are faster (p<0.01)
Group faster than List (p<0.01)
Benefit is larger for hard queries (p<0.06)
[Chart: Group vs. List, by query difficulty (Easy, Hard)]
SWISH: Summary and Design Implications
Text Classification
Learn accurate category models
Classify new web pages on-the-fly
Organize search results
User Interface
Tightly couple search results with category structure
User manipulation of presentation of category structure
GridViz
Collaborators
George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech)
Key Themes
Abstract beyond individual results
Highly interactive interface to support understanding of trends and relationships
More about it
http://research.microsoft.com/~sdumais
GridViz
Summarize the results of a search
Grid-based design
Axes represent topic, time, people
Cells encode frequency, recency
Supports activities like:
What newsgroups are active (on topic x)?
What people are active, authoritative (on topic x)?
When did I last interact w/ people?
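A rough sketch of the grid summary described above: results are bucketed into cells by two chosen attributes, and each cell records frequency (a count) and recency (the latest date). The field names and sample posts are hypothetical, and this is not the GridViz code.

```python
from collections import defaultdict
from datetime import date

def build_grid(items, row_key, col_key):
    """Aggregate items into cells holding a count and the most recent date."""
    grid = defaultdict(lambda: {"count": 0, "latest": None})
    for item in items:
        cell = grid[(item[row_key], item[col_key])]
        cell["count"] += 1
        if cell["latest"] is None or item["date"] > cell["latest"]:
            cell["latest"] = item["date"]
    return dict(grid)

# Hypothetical newsgroup posts; field names are illustrative.
posts = [
    {"person": "alice", "topic": "svm", "date": date(2003, 4, 2)},
    {"person": "alice", "topic": "svm", "date": date(2003, 5, 1)},
    {"person": "bob", "topic": "ui", "date": date(2003, 3, 15)},
]
grid = build_grid(posts, row_key="person", col_key="topic")
```

Swapping `row_key` and `col_key` among person, topic, and a time bucket gives the different axis pairings the slide lists.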
GridViz Demo
User Interface Experiments
[Chart: GridViz vs. List-view; vertical axis 0 to 40, axis label not recoverable]
GridViz Summary
Abstracting beyond individual results
Highly interactive interface
Grid-based design
Axes represent people, topic, time
Cells encode frequency, recency
Preliminary but promising
Stuff I’ve Seen (SIS)
Collaborators
Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford)
Key Themes
Your content
Information re-use
Integration across sources
More about it
… internal for now
Search Today …
Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet)
Often slow
Search with SIS
Unified index of stuff you’ve seen
Unify access to information regardless of source – mail, archives, calendar, files, web pages, etc.
Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
Automatic and immediate update of index
Rich UI possibilities, since it’s your content
Architecture
Client side indexing and storage
Built using MS Search components
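To make the idea of a unified index concrete, here is a toy in-memory sketch: one inverted index over items from any source, with metadata kept alongside and updated at the moment an item is added. The class and field names are hypothetical, and this does not reflect the MS Search components the slide mentions.

```python
from collections import defaultdict
from datetime import date

class UnifiedIndex:
    """Toy in-memory index over items from several sources."""

    def __init__(self):
        self.items = {}                   # item id -> metadata dict
        self.postings = defaultdict(set)  # word -> ids of items containing it

    def add(self, item_id, source, text, **metadata):
        """Index an item as soon as it is seen (no batch rebuild)."""
        self.items[item_id] = {"source": source, **metadata}
        for word in text.lower().split():
            self.postings[word].add(item_id)

    def search(self, query, sort_by="date"):
        """Return ids matching every query word, newest first by default."""
        words = query.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(ids, key=lambda i: self.items[i][sort_by], reverse=True)

# Hypothetical items from three different sources.
idx = UnifiedIndex()
idx.add("m1", "mail", "budget review meeting", date=date(2003, 5, 1), author="alice")
idx.add("f1", "file", "budget spreadsheet", date=date(2003, 4, 1), author="bob")
idx.add("w1", "web", "meeting room map", date=date(2003, 3, 1), author="carol")
hits = idx.search("budget")
```

A single `search` call covers mail, files, and web pages at once, and the stored metadata is what allows sorting by a field such as date.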
SIS Demo
SIS Alpha Observations
800+ internal users
Usage logs (incl. different interfaces), survey data
File types opened
76% Email
14% Web pages
10% Files
Age of items accessed
7% today
22% within the last week
46% within the last month
Item Access Distribution
[Chart: Frequency (axis 0 to 120) vs. Days Since Item First Seen (axis 0 to 2500)]
SIS Alpha Observations
Use of other search tools
Non-SIS search for web, email, and files decreases
[Chart: Pre-usage vs. Post-usage for Files, Email, Web Pages; vertical axis 0 to 6]
Importance of people
25% of the queries involve people’s names
Importance of time
Date by far the most popular sort field, followed by rank, author, title
Even when rank is the default
SIS UI Innovations
Timeline w/ Landmarks
Importance of time
Timeline interface
Contextualize results using important landmarks as pointers into human memory
General: holidays, world events
Personal: important photos, appointments
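One way to picture a timeline with landmarks is to merge dated results with dated landmarks into a single newest-first list, so each result sits next to a memorable event. This is an illustrative sketch only; the function name and sample data are hypothetical, and the actual SIS interface is graphical.

```python
from datetime import date

def timeline_with_landmarks(results, landmarks):
    """Merge results and landmarks into one list, newest first.

    Both inputs are lists of (date, label). Each output entry is
    (date, kind, label), where kind is "landmark" or "result".
    """
    merged = [(d, "landmark", label) for d, label in landmarks]
    merged += [(d, "result", label) for d, label in results]
    # The sort is stable, so a landmark precedes results from the same day.
    return sorted(merged, key=lambda entry: entry[0], reverse=True)

# Hypothetical general and personal landmarks, plus two search results.
landmarks = [
    (date(2003, 1, 1), "New Year's Day"),
    (date(2003, 3, 10), "Project kickoff"),
]
results = [
    (date(2003, 3, 12), "kickoff notes.doc"),
    (date(2003, 2, 2), "budget mail"),
]
timeline = timeline_with_landmarks(results, landmarks)
```

Reading down the merged list, "kickoff notes.doc" appears just after the "Project kickoff" landmark in time, which is the kind of cue the slide describes.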
Milestones in Time Demo
Milestones in Timeline
[Chart: Search Time (s), axis 0 to 30; conditions: Landmarks + Dates vs. Dates Only]
SIS Summary
Unified index of stuff you’ve seen
Fast access to full-text and metadata, from heterogeneous sources
Automatic and immediate update of index
Rich UI possibilities
Next steps
Better support for tagging -> “flatland”
Implicit queries for finding related info, and identifying “Stuff I Should See”
Integration with richer activity-based info, Eve
Organizing Search Results
Algorithms and interfaces to improve search
Examples and key themes
Important attributes: People, topics, time
Interaction
Evaluation
More information
SWISH … grouping
GridViz … abstraction
SIS … personal content and landmarks
Also
Use structure and context
http://research.microsoft.com/~sdumais
[email protected]
Christopher Lee of (SIG)IR …
http://www.cdvp.dcu.ie/SIGIR/index.html