When/How/Why to use
Grouping/Categorizing/Clustering in
Search Interfaces
Marti Hearst
January 21, 2005
1
Main Points
• Grouping search results is desirable
• However, getting good groups is difficult
• Furthermore, incorporation of groups into
interfaces has not been done well
• Good news: improvements are happening
2
Talk Outline
• Definition of categories and clusters
• Studies showing failure of clustering in
interfaces
• New developments in results grouping
3
The Need to Group
• Interviews with lay users often reveal a desire
for better organization of retrieval results
• Useful for suggesting where to look next
– People prefer links over generating search terms
– But only when the links are for what they want
• Three main approaches for text and images:
– Group items according to pre-defined categories
– Group items into automatically-created clusters
– Group items according to common keywords (new!)
Ojakaar and Spool, Users Continue After Category Links, UIETips
Newsletter, http://world.std.com/~uieweb/Articles/, 2001
4
Categories
• Human-created
– But often automatically assigned to items
• Arranged in hierarchy, network, or facets
– Can assign multiple categories to items
– Or place items within categories
• Usually restricted to a fixed set
– So help reduce the space of concepts
• Intended to be readily understandable
– To those who know the underlying domain
– Provide a novice with a conceptual structure
• There are many already made up!
• However, until recently, their use in interfaces has
– Been under-investigated
– Not met its promise
5
Clustering
• “The art of finding groups in data”
– Kaufman and Rousseeuw
• Groups are formed according to associations
and commonalities among the data’s features.
– There are dozens of algorithms, more all the time
– Most need a way of determining similarity or
difference between a pair of items
– In text clustering, documents are usually represented as vectors of weighted features derived from the words
– Similarity between documents is a weighted measure of feature overlap
6
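As a concrete illustration of the representation just described, here is a minimal sketch: documents become weighted term vectors (TF-IDF here) and similarity is a weighted measure of feature overlap (cosine). The library choice (scikit-learn) and the toy documents are mine, not the talk's.

```python
# Hedged sketch, not the talk's implementation: TF-IDF is one common
# transformation on the words, and cosine similarity is one common
# weighted measure of feature overlap.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "prophylactic mastectomy as a preventative measure",
    "breast reconstruction and prostheses after surgery",
    "psychological effects of radical surgery",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)   # one weighted feature vector per document

sims = cosine_similarity(X)          # pairwise feature-overlap scores
print(sims.round(2))                 # sims[i, j] is the similarity of docs i and j
```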
Clustering
• Potential benefits:
– Find the main themes in a set of documents
• Potentially useful if the user wants a summary of the
main themes in the subcollection
• Potentially harmful if the user is interested in less
dominant themes
– More flexible than pre-defined categories
• There may be important themes that have not been
anticipated
– Disambiguate ambiguous terms
• Example: “ACL” (the computational linguistics association vs. the knee ligament)
– Clustering retrieved documents tends to group those
relevant to a complex query together
Hearst, Pedersen, Revisiting the Cluster Hypothesis, SIGIR’96
7
Scatter/Gather Clustering
• Developed at PARC in the late 80’s/early 90’s
• Top-down approach
– Start with k seeds (documents) to represent k clusters
– Each document is assigned to the cluster with the most similar seed
• To choose the seeds:
– Cluster in a bottom-up manner
– Hierarchical agglomerative clustering
• Start with n documents, compare all by pairwise similarity,
combine the two most similar documents to make a cluster
• Now compare both clusters and individual documents to find the
most similar pair to combine
• Continue until k clusters remain
• Use the centroid of each of these as seeds
– Centroid: average of the weighted vectors
• Can recluster a cluster to produce a hierarchy of clusters
Pedersen, Cutting, Karger, Tukey, Scatter/Gather: A Cluster-based
Approach to Browsing Large Document Collections, SIGIR 1992
8
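The Scatter/Gather procedure on the slide above can be approximated with off-the-shelf tools; a hedged sketch follows. It uses agglomerative clustering to pick k groups, takes their centroids as seeds, and assigns every document to its most similar seed. The original PARC system used the Buckshot/Fractionation routines described by Cutting et al., so treat this only as an illustration of the idea.

```python
# Rough approximation of Scatter/Gather's two phases, not the PARC code:
# (1) bottom-up (agglomerative) clustering chooses k clusters, whose
#     centroids become seeds; (2) every document is assigned to the
#     cluster whose seed it is most similar to.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

def scatter_gather(docs, k=3):
    X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()

    # Hierarchical agglomerative clustering until k clusters remain.
    hac_labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)

    # Seeds: the centroid (average weighted vector) of each cluster.
    seeds = np.vstack([X[hac_labels == c].mean(axis=0) for c in range(k)])

    # Assign each document to the most similar seed (cosine similarity).
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    Sn = seeds / np.maximum(np.linalg.norm(seeds, axis=1, keepdims=True), 1e-12)
    return (Xn @ Sn.T).argmax(axis=1)   # cluster id per document
```

Reclustering the documents of one cluster with the same function yields the hierarchy of clusters mentioned on the slide.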
Clustering Example:
Medical Text
• Query: “mastectomy” on a breast cancer collection
• 250 documents retrieved
• Summary of cluster themes (subjective):
– prophylactic mastectomy (preventative)
– prostheses and reconstruction
– conservative vs radical surgery
– side effects of surgery
– psychological effects of surgery
• The first two clusters found themes for which there was no corresponding MeSH category
Hearst, The Use of Categories and Clusters for Organizing Retrieval
Results, in Natural Language Information Retrieval, Kluwer, 1999
9
A Clustering Failure
• Query: “implant” and “prosthesis”
• Four clusters returned:
– use of implants to administer radiation dosages
– complications resulting from breast implants
– other issues surrounding breast implants
– other kinds of prostheses
• Reclustering clusters 2 and 3 does not find cohesive
subgroups
– An examination of the documents indicates that a valid
subdivision was possible
• type of surgical procedure
• risk factors
– This seems to happen when there are too many features in
common
– Perhaps a better clustering algorithm can help in this case
10
Clustering Interface Problems
• Big problem:
– Clusters used primarily as part of a visualization
• This just doesn’t work
– Every usability study says so
– Lots of dots scattered about the screen is meaningless to
users
– There is no inherent spatial relationship among the
documents
– Need text to understand content
• Another big problem:
– Clustering images according to an approximation of visual
similarity
• This just doesn’t work
– What limited studies have been done say so
– Instead: group according to textual categories
11
Visualizing Clustering Results
• Use clustering to map the entire huge
multidimensional document space into a huge
number of small clusters.
• Use dimension reduction and then project these onto a 2D/3D graphical representation
12
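A hedged sketch of the visualization pipeline this slide describes, with concrete choices (k-means for the many small clusters, PCA for the projection) that are mine rather than those of the systems shown next:

```python
# Sketch only: cluster the document space into many small clusters, then
# reduce dimensionality so the cluster centroids can be drawn in 2D.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def project_clusters_2d(docs, n_clusters=50):
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)        # many small clusters
    xy = PCA(n_components=2).fit_transform(km.cluster_centers_)
    return xy   # one (x, y) point per cluster, ready to scatter-plot
```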
Clustering Multi-Dimensional
Document Space
(image from Wise et al 95)
13
Clustering Multi-Dimensional
Document Space
(image from Wise et al 95)
14
Kohonen Feature Maps on Text
(from Chen et al., JASIS 49(7))
15
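For readers unfamiliar with the technique being evaluated: a Kohonen (self-organizing) map lays document vectors out on a 2D grid by repeatedly pulling the best-matching grid cell and its neighbours toward each document. The sketch below follows the textbook algorithm, not Lin's or Chen et al.'s implementations, and its parameters are illustrative.

```python
# Minimal self-organizing map sketch (textbook algorithm, illustrative
# parameters). X is a dense array of document feature vectors.
import numpy as np

def train_som(X, grid=(10, 10), epochs=20, lr=0.5, sigma=2.0):
    rng = np.random.default_rng(0)
    rows, cols = grid
    W = rng.random((rows, cols, X.shape[1]))     # one weight vector per grid cell
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    for _ in range(epochs):
        for x in X:
            # Best-matching unit: the cell whose weights are closest to x.
            d = np.linalg.norm(W - x, axis=-1)
            bmu = np.unravel_index(d.argmin(), d.shape)
            # Pull the BMU and its grid neighbours toward the document.
            dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
            W += lr * h * (x - W)
    return W   # mapping documents to their BMUs gives the 2D "category map"
```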
Is it useful?
• 4 Clustering Visualization Usability Studies
16
Clustering for Search Study 1
• This study compared
– a system with 2D graphical clusters
– a system with 3D graphical clusters
– a system that shows textual clusters
• Novice users
• Only textual clusters were helpful (and they
were difficult to use well)
Kleiboemer, Lazear, and Pedersen. Tailoring a retrieval system for naive
users. SDAIR’96
17
Clustering Study 2:
Kohonen Feature Maps
• Comparison: Kohonen Map and Yahoo
• Task:
– “Window shop” for interesting home page
– Repeat with other interface
• Results:
– Participants who started with the map could repeat the task in Yahoo (8/11)
– Participants who started with Yahoo were mostly unable to repeat it in the map (2/14)
Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept
Space Techniques. JASIS 49(7): 582-603 (1998)
18
Kohonen Feature Maps
(Lin 92, Chen et al. 97)
19
Study 2 (cont.)
• Participants liked:
– Correspondence of region size to # documents
– Overview (but also wanted zoom)
– Ease of jumping from one topic to another
– Multiple routes to topics
– Use of category and subcategory labels
Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept
Space Techniques. JASIS 49(7): 582-603 (1998)
20
Study 2 (cont.)
• Participants wanted:
– hierarchical organization
– other ordering of concepts (alphabetical)
– integration of browsing and search
– correspondence of color to meaning
– more meaningful labels
– labels at same level of abstraction
– fit more labels in the given space
– combined keyword and category search
– multiple category assignment (sports+entertain)
• (These can all be addressed with faceted hierarchical categories)
Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept
Space Techniques. JASIS 49(7): 582-603 (1998)
21
Clustering Study 3: NIRVE
Each rectangle is a cluster. Larger clusters are closer to the “pole”; similar clusters are near one another. Opening a cluster causes a projection that shows the titles.
22
Study 3
This study compared:
– 3D graphical clusters
– 2D graphical clusters
– textual clusters
• 15 participants, between-subject design
• Tasks
– Locate a particular document
– Locate and mark a particular document
– Locate a previously marked document
– Locate all clusters that discuss some topic
– List the most frequently represented topics
Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces
Sebrechts, Cugini, Laskowski, Vasilakis and Miller, SIGIR ‘99.
23
Study 3
• Results (time to locate targets)
– Text clusters fastest
– 2D next
– 3D last
– With practice (6 sessions) 2D neared text results; 3D still slower
– Computer experts were just as fast with 3D
• Certain tasks equally fast with 2D & text
– Find particular cluster
– Find an already-marked document
• But anything involving text (e.g., find title) much faster
with text.
– Spatial location rotated, so users lost context
• Helpful viz features
– Color coding (helped text too)
– Relative vertical locations
Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces
Sebrechts, Cugini, Laskowski, Vasilakis and Miller, SIGIR ‘99.
24
Clustering Study 4
• Compared several factors
• Findings:
– Topic effects dominate (this is a common finding)
– Strong difference in results based on spatial ability
– No difference between librarians and other people
– No evidence of usefulness for the cluster visualization
Aspect windows, 3-D visualizations, and indirect comparisons of information retrieval systems,
Swan and Allan, SIGIR 1998.
25
Summary:
Visualizing for Search Using Clusters
• Huge 2D maps may be an inappropriate focus for information retrieval
– cannot see what the documents are about
– space is difficult to browse for IR purposes
– (tough to visualize abstract concepts)
• Perhaps more suited for pattern discovery and
gist-like overviews
26
Clustering Algorithm Problems
• Doesn't work well if the data is too homogeneous or too heterogeneous
• Often is difficult to interpret quickly
– Automatically generated labels are unintuitive and
occur at different levels of description
• Often the top-level can be ok, but the
subsequent levels are very poor
• Need a better way to handle items that fall
into more than one cluster
27
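The labeling problem above is commonly attacked by naming each cluster with its highest-weighted centroid terms; the sketch below (my scikit-learn-based approximation, not any system discussed in this talk) also makes it easy to see why such labels can land at inconsistent levels of description.

```python
# Sketch of the common "top centroid terms" labeling heuristic.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def label_clusters(docs, k=5, terms_per_label=3):
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    vocab = np.array(vec.get_feature_names_out())
    labels = {}
    for c in range(k):
        top = km.cluster_centers_[c].argsort()[::-1][:terms_per_label]
        labels[c] = ", ".join(vocab[top])   # e.g. "implant, radiation, dose"
    return labels
```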
How do people want to search
and browse images?
• Ethnographic studies of people who use
images intensely find:
– Finding specific objects is easy
• Find images of the Empire State Building
– Browsing is hard
• In a usability study with architects, we were surprised to find that their response to an image-browsing interface mock-up was that they wanted to see more text (categories).
Elliott, A. (2001). "Flamenco Image Browser: Using Metadata to Improve Image Search During Architectural Design," in the Proceedings of CHI 2001.
28
An Alternative
• In the Flamenco project, we have shown that
hierarchical faceted metadata, paired with a
good interface, is highly effective for browsing
image collections
– Flamenco.berkeley.edu
• (But that’s a different talk)
29
Study 5: Comparing Textual Cluster
Interfaces to Category Interfaces
• DynaCat system
• Decide on important question types in advance
– What are the adverse effects of drug D?
– What is the prognosis for treatment T?
• Make use of MeSH categories
• Retain only those types of categories known to
be useful for this type of query.
Pratt, W., Hearst, M, and Fagan, L. A Knowledge-Based Approach to Organizing Retrieved Documents. AAAI-99
30
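To make the DynaCat idea concrete, here is a very rough sketch of grouping retrieved documents under only those MeSH category types known to be useful for the query type. The query types, category types, and data layout are illustrative placeholders, not the actual DynaCat knowledge base.

```python
# Illustrative sketch only: group results under MeSH terms whose category
# type is useful for the query type; every name here is hypothetical.
from collections import defaultdict

USEFUL_CATEGORY_TYPES = {
    "adverse-effects-of-drug": {"Adverse Effects", "Chemically Induced"},
    "prognosis-of-treatment": {"Prognosis", "Mortality"},
}

def organize(query_type, retrieved_docs):
    """retrieved_docs: list of (title, [(category_type, mesh_term), ...])."""
    keep = USEFUL_CATEGORY_TYPES[query_type]
    groups = defaultdict(list)
    for title, mesh_terms in retrieved_docs:
        for cat_type, term in mesh_terms:
            if cat_type in keep:            # retain only useful category types
                groups[term].append(title)
    return dict(groups)
```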
DynaCat Interface
Pratt, W., Hearst, M, and Fagan, L. A Knowledge-Based Approach to Organizing Retrieved Documents. AAAI-99
31
DynaCat Study
• Design
– Three queries
– 24 cancer patients
– Compared three interfaces
• ranked list, clusters, categories
• Results
– Participants strongly preferred categories
– Participants found more answers using categories
– Participants took same amount of time with all three
interfaces
Pratt, W., Hearst, M, and Fagan, L. A Knowledge-Based Approach to Organizing
Retrieved Documents. AAAI-99
32
Study 6: Categories vs. Lists
• One study found users preferred one level of categories
over lists, and were faster at finding answers
– Only 13 top-level categories shown
– Secondary-level categories not very accurate
• However, the queries appeared to be somewhat set up to optimize the usefulness of the clusters
– Example:
• Query word: “indian”
• Task: find Indian motorcycles
• Query: “alaska”
• Task: find yachting adventures in Alaska
Chen, Dumais, Bringing order to the web: Automatically categorizing search results.
CHI 2000
33
What about Textual Displays of
Clusters?
• Text-based clustering is more promising
• Text-based clustering on the Web
– In the early days, Excite (then called Architext) had a mockup on about 10 documents that pretended to do Scatter/Gather
• Quickly removed it and started providing standard search
– For a while NorthernLight had a clustering interface
• Didn’t really get anywhere
– The latest entry is Vivisimo
• Has a lot of problems
• BUT … there’s a new development from Vivisimo called
Clusty
• Seems to have much improved clustering and interface
34
An Analysis of Vivisimo
• Query: barcelona
• Query: dog pregnancy
35
(Slides 36-38: Vivisimo result screenshots for these queries)
An Analysis of Vivisimo
• Query: barcelona
– Hotels and Travel Guide are both at top level
– Also, Barcelona City
– But Travel Guide contains
• Hotels
• Spain, Spanish
– Not really helping to make useful distinctions
39
(Slides 40-41: Vivisimo result screenshots)
An Analysis of Vivisimo
• Query: pregnant dog
– What does the category pregnant mean here?
– Why does it have a subcategory of whelping, when
there is also a main category of whelping?
– And what is the relationship to Pregnancy and Birth?
– The pages shown don’t seem strongly related to one
another
• How to follow up?
– There is a “find in clusters” box, but it is not very helpful because there are no hints about which words might work
42
Search within Results
43
Then along came Clusty …
• Announced a few months ago
• Produced by Vivisimo
• Much better interface
• Much better clusters
44
(Slides 45-49: Clusty result screenshots)
Clusty Improvements
• Labels tend to be more at the same level of description
• Subcategories are more cautious, reflecting groups of
very similar documents
– Do a better job of really showing subcategories
• Nice interface touches
– Better use of color for distinguishing
– Small icons are inviting
– Incorporation of encyclopedia results high up
• Search results are better
– (Not always – pregnant dog not much better)
– Using metasearch
– May be throwing out some docs to get more distribution in
the types of results found
– Looks like they are focusing on term proximity to get more
meaningful grouping
– Don’t allow very many results
50
(Slides 51-53: Clusty result screenshots)
Clusty Improvements
• Doing sense disambiguation for abbreviations like ACL
– However, no good followup for how to make use of this
– E.g., to search on ACL (meaning comp ling) plus some
other concepts
– On the other hand, using multiple terms is how most
disambiguation is done now
• ACL + disambiguation
• Jaguar + prey
– So not clear if there is a net benefit
• Trying to approximate faceted queries
– Under the Jaguar query, for history, show the history of the band alongside the history of the car and the video game
54
(Slide 55: Clusty result screenshot)
Analysis
• Is it really helping? Or are the categories now
too general and overlapping?
• The main effect seems to be that the search
results are better due to the metasearch and
term proximity
56
(Slide 57: Clusty result screenshot)
More Analysis
• Reflects the frequency of topics in the data
– So no discussion of nukes in the Spain categories
– No discussion of hotels in the North Korea categories
– Is this good or bad? It depends.
58
Brand New Results!!
• Mika Kaki: “Findex: Search Result Categories
Help Users when Document Ranking Fails”
– To appear at CHI in April
• Two innovations:
– Used a very simple method to create the groupings, so that it is not opaque to users
• Based on frequent keywords
• Allows docs to appear in multiple categories
– Did a naturalistic, longitudinal study of use
• Other things done correctly:
– Took care to ensure good response time
– Analyzed the results in interesting ways
59
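A hedged sketch of the frequent-keyword grouping the slide describes: categories are simply the most frequent keywords over the retrieved results, and a result appears under every keyword category it contains. Tokenization, the stopword list, and the 15-category default are illustrative choices, not the authors' code.

```python
# Sketch of keyword-based, overlapping result categories (Findex-style).
from collections import Counter, defaultdict
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "to", "for", "is", "on"}

def keyword_categories(results, n_categories=15):
    """results: list of (title, snippet) pairs, kept in rank order."""
    tokenized = []
    for title, snippet in results:
        words = set(re.findall(r"[a-z]+", (title + " " + snippet).lower()))
        tokenized.append(words - STOPWORDS)

    # Categories = the most frequent keywords across the whole result set.
    counts = Counter(w for words in tokenized for w in words)
    categories = [w for w, _ in counts.most_common(n_categories)]

    # Overlapping assignment: a result goes under every category it mentions.
    groups = defaultdict(list)
    for rank, words in enumerate(tokenized):
        for cat in categories:
            if cat in words:
                groups[cat].append(rank)    # original rank order preserved
    return {c: groups[c] for c in categories}
```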
(Slides 60-61: Findex interface screenshots)
Study Design
• 16 academics
– 8F, 8M
– No CS
– Frequent searchers
• 2 months of use
• Special Log
– 3099 queries issued
– 3232 results accessed
• Two surveys (start and end)
• Google as search engine; rank order retained
62
Key Findings
(all significant)
• Category use takes almost 2 times longer
– First doc selected in 24.4 sec vs 13.7 sec
• No difference in average number of docs
opened per search (1.05 vs. 1.04)
• However, when categories used, users select
>1 doc in 28.6% of the queries (vs 13.6%)
• Number of searches with zero result selections is lower when the categories are used
• Median position of selected doc when:
– Using categories: 22 (sd=38)
– Just ranking: 2 (sd=8.6)
63
64
Key Findings
• Category Selections
– 1915 category selections in 817 searches
– Used in 26.4% of the searches
– During the last 4 weeks of use, the proportion of searches
using categories stayed above the average (27-39%)
– When categories used, selected 2.3 cats on average
– Labels of selected cats used 1.9 words on average
(average in general was 1.4 words)
– Out of 15 cats (default):
• First quartile at 2nd cat
• Median at 5th
• Third quartile at 9th
65
Survey Results
• Qualitative views improved over time
• Realization that categories useful only some of the time
• Freeform responses indicate that categories useful when
queries vague, broad or ambiguous
• Second survey indicated that people felt that their
search habits began to change
– Consider query formulation less than before (27%)
– Use less precise search terms (45%)
– Use less time to evaluate results (36%)
– Use categories for evaluating results (82%)
66
67
Conclusions from Kaki Study
• Simplicity of category assignment made
groupings understandable
– (my view, not stated by them)
• Keyword-based Categories:
– Are beneficial when result ranking fails
– Find results lower in the ranking
– Reduce empty results
– May make it easier to access multiple results
– Availability changed user querying behavior
68
Summary
• Grouping search results is desirable
– Often requested by lay users
– Very positive results for category interface
• However, until recently, getting good groups has been difficult
– Two main approaches:
• Predefined category sets – too hard to get, don't reflect the data
• Automatically created clusters – too hard to understand
– An alternative:
• Frequent keywords, overlapping categories
• Findex, and Clusty
• Finally, a believable, well-done study of category use for
search results reveals some insight!
– Not always useful, but not harmful if understandable (my
assertion) and fast
– Useful in the situations we have surmised
– Interesting result: people change behavior.
69
More Recent Attempts
• Analyzing retrieval results
– KartOO: http://www.kartoo.com/
– Grokker: http://www.groxis.com/service/grok
70
(Slides 71-74: KartOO and Grokker screenshots)
References
Chen, Houston, Sewell, and Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques, JASIS 49(7): 582-603, 1998.
Chen and Yu, Empirical studies of information visualization: a meta-analysis,
IJHCS 53(5),2000
Dumais, Cutrell, Cadiz, Jancke, Sarin and Robbins, Stuff I've Seen: A system
for personal information retrieval and re-use. SIGIR 2003.
Hearst, English, Sinha, Swearingen, Yee. Finding the Flow in Web Site Search,
CACM 45(9), 2002.
Hearst, User Interfaces and Visualization, Chapter 10 of Modern Information
Retrieval, Baeza-Yates and Ribeiro-Neto (Eds), Addison-Wesley 1999.
Johnson, Manning, Hagen, and Dorsey. Specialize Your Site's Search. Forrester
Research, (Dec. 2001), Cambridge, MA
75
References
Sebrechts, Cugini, Laskowski, Vasilakis and Miller, Visualization of search
results: a comparative evaluation of text, 2D, and 3D interfaces, SIGIR ‘99.
Swan and Allan, Aspect windows, 3-D visualizations, and indirect comparisons
of information retrieval systems, SIGIR 1998.
Yee, Swearingen, Li, Hearst, Faceted Metadata for Image Search and Browsing,
Proceedings of CHI 2003
76