Shiffrin Symposium - Indiana University


Organizing Search Results
Susan Dumais
Microsoft Research
Sackler – May 11, 2003
Organizing Search Results

- Algorithms and interfaces that improve the effectiveness of search
  - Beyond ranked lists
  - Main goal to support search
  - Also information analysis and discovery
- Example applications
  - SWISH: results classification
  - GridViz: results summarization
  - SIS: personal landmarks for context

Searching with Information Structured Hierarchically (SWISH)

- Collaborators
  - Edward Cutrell, Hao Chen (Berkeley)
- Key Themes
  - Going beyond long lists of results
  - Classification algorithms
  - UI techniques
- More about it
  - http://research.microsoft.com/~sdumais
Organizing Search Results
Query: “jaguar”
List Organization
=> Shopping
=> Automotive
=> Computers
=> Automotive
SWISH Category Organization
Web Directory

LookSmart Directory Structure



~400k pages; 17k categories; 7 levels
13 top-level categories; 150 second-level categories
Top-level Categories
Automotive
Business & Finance
Computers & Internet
Entertainment & Media
Health & Fitness
Hobbies & Interests
Home & Family
People & Chat
Reference & Education
Shopping & Services
Society & Politics
Sports & Recreation
Travel & Vacations
Second-level Categories (Automotive)
Buy or Sell a Car
Chat
Finance & Insurance
Magazines & Books
Maintenance & Repair
Makes, Models & Clubs
Motorcycles
New Car Showrooms
Off-Road, 4X4 & RVs
Other Auto Interests
Shows & Museums
Trucks & Tractors
Vintage & Classic
SWISH System

- Combines the advantages of
  - Directories: manually crafted structure but small <~3 million pages>
  - Search engines: broad coverage but limited metadata <~3 billion pages>
- Project search engine results to category structure
- Two main components
  - Text classification models
  - UI for integrating search results and structure

Context (category structure) plus focus (search results)
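
The idea of projecting a ranked result list onto a category structure can be sketched as follows. This is a minimal illustration, not the SWISH implementation: `toy_classify` is a hypothetical keyword stand-in for the text classifier, and ordering categories by their best-ranked result is an assumption made here for the example.

```python
from collections import defaultdict

def group_by_category(results, classify):
    """Group a ranked result list under categories, keeping rank order within each group."""
    groups = defaultdict(list)
    for rank, result in enumerate(results):
        groups[classify(result)].append((rank, result))
    # Assumption: show categories in order of their best-ranked result.
    ordered = sorted(groups.items(), key=lambda kv: kv[1][0][0])
    return [(category, [r for _, r in items]) for category, items in ordered]

def toy_classify(result):
    """Hypothetical stand-in for a learned classifier: simple keyword lookup."""
    text = result.lower()
    if "car" in text or "dealer" in text:
        return "Automotive"
    if "mac os" in text or "software" in text:
        return "Computers & Internet"
    return "Other"

# Made-up result titles for the query "jaguar".
results = [
    "Jaguar Cars official site",
    "Mac OS X Jaguar software update",
    "Jaguar dealer locator",
    "Jaguar (animal) facts",
]

grouped = group_by_category(results, toy_classify)
for category, items in grouped:
    print(category)
    for item in items:
        print("   ", item)
```

Keeping the original rank order inside each group preserves the search engine's relevance signal (the focus) while the category headings supply the context.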
SWISH Architecture
[Diagram: Train (offline) on manually classified web pages to produce the SVM model; Classify (online) web search results, local search results, ... with the SVM model]
Learning & Classification

- Support Vector Machine (SVM)
  - Accurate and efficient for text classification (Dumais et al., Joachims)
  - Model = weighted vector of words
    - “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche …
    - “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ...
- Hierarchical models for the LookSmart (LS) directory
  - 1 model for top level; N models for second level
  - Very useful in conjunction w/ user interaction
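
A "weighted vector of words" model and the two-level hierarchy can be sketched as below. This shows only how such linear models are applied at classification time; it does not train an SVM. All weights and biases are hand-set, hypothetical values, and the second-level names under "Computers & Internet" are invented for the example.

```python
def score(weights, bias, tokens):
    """Linear model: sum of the weights of the distinct words present, plus a bias."""
    return bias + sum(weights.get(t, 0.0) for t in set(tokens))

def best_label(models, tokens):
    """Pick the label whose linear model scores highest."""
    return max(models, key=lambda label: score(*models[label], tokens))

# Hypothetical weights; a trained SVM would learn these from labeled pages.
TOP_LEVEL = {
    "Automotive": ({"car": 1.2, "auto": 1.0, "motorcycle": 0.9, "porsche": 0.8}, -0.5),
    "Computers & Internet": ({"software": 1.1, "windows": 0.9, "pc": 0.8, "downloads": 0.7}, -0.5),
}

# One set of second-level models per top-level category (the "N models").
SECOND_LEVEL = {
    "Automotive": {
        "Motorcycles": ({"motorcycle": 1.5, "harley": 1.3}, -0.5),
        "Vintage & Classic": ({"vintage": 1.4, "classic": 1.2}, -0.5),
    },
    "Computers & Internet": {  # hypothetical subcategory names
        "Software": ({"software": 1.0, "downloads": 1.0}, -0.5),
        "Hardware": ({"pc": 1.0, "hardware": 1.2}, -0.5),
    },
}

def classify(text):
    """Choose a top-level category first, then a second-level category within it."""
    tokens = text.lower().split()
    top = best_label(TOP_LEVEL, tokens)
    second = best_label(SECOND_LEVEL[top], tokens)
    return top, second

print(classify("Classic Harley motorcycle parts"))
```

Classifying top-down means only the second-level models under the chosen top-level category need to be evaluated for each result.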

User Interface Experiments
[Screenshots: List Organization and Category Organization]
[Figure: interface conditions for the Group Interface and the List Interface; condition labels: No Cat Names, Browse, Hover, Inline, Inline + Cat Names]
Effect of Query Difficulty
- Easy queries are faster (p<0.01)
- Group faster than List (p<0.01)
- Benefit is larger for hard queries (p<0.06)

[Figure: Hard vs. Easy queries for the Group and List interfaces]
SWISH: Summary and Design Implications

- Text Classification
  - Learn accurate category models
  - Classify new web pages on the fly
  - Organize search results
- User Interface
  - Tightly couple search results with category structure
  - User manipulation of presentation of category structure
GridViz

- Collaborators
  - George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech)
- Key Themes
  - Abstract beyond individual results
  - Highly interactive interface to support understanding of trends and relationships
- More about it
  - http://research.microsoft.com/~sdumais
GridViz


- Summarize the results of a search
- Grid-based design
  - Axes represent topic, time, people
  - Cells encode frequency, recency
- Supports activities like:
  - What newsgroups are active (on topic x)?
  - What people are active, authoritative (on topic x)?
  - When did I last interact w/ people?
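
The grid summary described above can be sketched as a simple aggregation. This is an illustration under assumptions, not the GridViz code: the message records and the field names (`person`, `topic`, `date`) are hypothetical, and each cell stores a count (frequency) and the latest date (recency).

```python
from collections import defaultdict
from datetime import date

def build_grid(items, row_key, col_key):
    """Aggregate items into cells keyed by (row, column).

    Each cell records how many items fall in it and the most recent item date.
    """
    grid = defaultdict(lambda: {"frequency": 0, "most_recent": None})
    for item in items:
        cell = grid[(item[row_key], item[col_key])]
        cell["frequency"] += 1
        if cell["most_recent"] is None or item["date"] > cell["most_recent"]:
            cell["most_recent"] = item["date"]
    return dict(grid)

# Hypothetical search results (e.g., newsgroup posts or mail messages).
messages = [
    {"person": "alice", "topic": "svm", "date": date(2003, 4, 1)},
    {"person": "alice", "topic": "svm", "date": date(2003, 5, 2)},
    {"person": "bob", "topic": "svm", "date": date(2003, 3, 15)},
    {"person": "bob", "topic": "ui", "date": date(2003, 5, 9)},
]

grid = build_grid(messages, row_key="person", col_key="topic")
for (person, topic), cell in sorted(grid.items()):
    print(person, topic, cell["frequency"], cell["most_recent"])
```

Passing different `row_key` and `col_key` values swaps the axes, which is one way to answer questions such as "who is active on topic x" from the same set of results.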

GridViz Demo
User Interface Experiments
[Figure: bar chart comparing GridViz and List View; vertical axis from 0 to 40, axis label not recoverable]
GridViz Summary



- Abstracting beyond individual results
- Highly interactive interface
- Grid-based design
  - Axes represent people, topic, time
  - Cells encode frequency, recency
- Preliminary but promising
Stuff I’ve Seen (SIS)

- Collaborators
  - Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford)
- Key Themes
  - Your content
  - Information re-use
  - Integration across sources
- More about it
  - … internal for now
Search Today …

- Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet)
- Often slow
Search with SIS

- Unified index of stuff you’ve seen
  - Unify access to information regardless of source: mail, archives, calendar, files, web pages, etc.
  - Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
  - Automatic and immediate update of index
  - Rich UI possibilities, since it’s your content
- Architecture
  - Client side indexing and storage
  - Built using MS Search components
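
A unified full-text index with metadata can be sketched in a few lines. This is a toy in-memory stand-in, not the MS Search components the slide refers to. The class name `UnifiedIndex`, its methods, and the sample items are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

class UnifiedIndex:
    """In-memory inverted index over items from any source, with metadata per item."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of items containing it
        self.items = {}                   # id -> metadata dict

    def add(self, item_id, source, text, **metadata):
        """Index an item as soon as it is seen, whatever its source."""
        self.items[item_id] = {"source": source, **metadata}
        for word in text.lower().split():
            self.postings[word].add(item_id)

    def search(self, query, sort_by="created"):
        """Return ids of items containing every query word, sorted descending by a metadata field."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits, key=lambda i: self.items[i][sort_by], reverse=True)

index = UnifiedIndex()
index.add("mail-1", "mail", "budget review meeting notes",
          author="alice", created=datetime(2003, 5, 1))
index.add("file-7", "files", "budget spreadsheet draft",
          author="bob", created=datetime(2003, 4, 12))
index.add("web-3", "web", "conference travel budget tips",
          author="carol", created=datetime(2003, 5, 9))

print(index.search("budget"))         # mail, file, and web items in one result list
print(index.search("budget review"))  # only the mail item matches both words
```

Because every item carries the same metadata fields regardless of source, one query can return mail, files, and web pages together and sort them on a shared attribute such as creation time.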
SIS Demo
SIS Alpha Observations

- 800+ internal users
  - Usage logs (incl. different interfaces), survey data
- File types opened
  - 76% Email
  - 14% Web pages
  - 10% Files
- Age of items accessed
  - 7% today
  - 22% within the last week
  - 46% within the last month

[Figure: Item Access Distribution; Frequency (0 to 120) against Days Since Item First Seen (0 to 2500)]
SIS Alpha Observations

- Use of other search tools
  - Non-SIS search for web, email, and files decreases
- Importance of people
  - 25% of the queries involve people’s names
- Importance of time
  - Date by far the most popular sort field, followed by rank, author, title
  - Even when rank is the default

[Figure: Pre-usage vs. Post-usage values (axis 0 to 6) for Files, Email, Web Pages]
SIS UI Innovations
Timeline w/ Landmarks

- Importance of time
  - Timeline interface
- Contextualize results using important landmarks as pointers into human memory
  - General: holidays, world events
  - Personal: important photos, appointments
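
One way to picture a timeline with landmarks is to merge dated results and dated landmarks into a single chronological list. The sketch below is only an illustration of that idea, not the SIS interface. The function name, the sample results, and the personal landmark are hypothetical.

```python
from datetime import date

def timeline(results, landmarks):
    """Merge dated search results and landmarks into one list, newest first."""
    entries = [(d, "result", title) for d, title in results]
    entries += [(d, "landmark", label) for d, label in landmarks]
    # Sort by date descending; on the same date, list the landmark before results.
    return sorted(entries, key=lambda e: (-e[0].toordinal(), e[1] != "landmark"))

# Hypothetical search results with the date each item was seen.
results = [
    (date(2003, 5, 8), "Slides draft"),
    (date(2003, 1, 2), "Travel receipt"),
    (date(2003, 2, 20), "Review comments"),
]

# General landmarks (holidays) and one hypothetical personal landmark.
landmarks = [
    (date(2003, 1, 1), "New Year's Day"),
    (date(2003, 2, 14), "Valentine's Day"),
    (date(2003, 4, 18), "Team offsite (calendar appointment)"),
]

merged = timeline(results, landmarks)
for when, kind, label in merged:
    marker = "*" if kind == "landmark" else " "
    print(marker, when.isoformat(), label)
```

Interleaving the two lists lets a user locate an item relative to a remembered event ("just after the offsite") rather than by an exact date.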

Milestones in Time Demo
Milestones in Timeline
[Figure: Search Time (s), axis 0 to 30, for Landmarks + Dates vs. Dates Only]
SIS Summary

- Unified index of stuff you’ve seen
  - Fast access to full-text and metadata, from heterogeneous sources
  - Automatic and immediate update of index
  - Rich UI possibilities
- Next steps
  - Better support for tagging -> “flatland”
  - Implicit queries for finding related info, and identifying “Stuff I Should See”
  - Integration with richer activity-based info, Eve

Organizing Search Results

- Algorithms and interfaces to improve search
  - Use structure and context
- Examples and key themes
  - SWISH … grouping
  - GridViz … abstraction
  - SIS … personal content and landmarks
  - Also
    - Important attributes: People, topics, time
    - Interaction
    - Evaluation
- More information
  - http://research.microsoft.com/~sdumais
  - [email protected]
Christopher Lee of (SIG)IR …

http://www.cdvp.dcu.ie/SIGIR/index.html