Content Metadata and Search Remarks to the Dublin Core Workshop Marti Hearst SIMS, UC Berkeley September 28, 2003


Content Metadata and Search
Remarks to the Dublin Core Workshop
Marti Hearst
SIMS, UC Berkeley
September 28, 2003
Resource Finding and the Web
• Web search vs. collection search
– When a single page is all that’s needed,
web search is fine
• Although validity is an issue
– Unsolved problem:
• How to make source-focused search more
intuitive on the web?
• One idea (untested): task-based search
M. Hearst
Faceted Metadata in Search
What about Content?
• Dublin Core takes stances on the “content-neutral” aspects of metadata
• Q: What about content?
– The Metadata Marsh
• Getting agreement on metadata terms is difficult
• Even worse when talking about content!
• A: Domain-specific solutions
– Don’t worry about cross-domain consistency (a
necessary drawback)
– Success: b-to-b protocols
Hypothesis (as yet untested):
Assuming we’ve focused on a domain,
agreement on category assignment
can converge much more quickly by:
1. Focusing on the applications that will use
the category system.
2. Designing metadata to be used in
interfaces that show items represented
by many different categories in a highly
flexible, but intuitive, manner.
One Example: Flamenco Project
• Goal: create intuitive, inviting search
interfaces that make use of hierarchical
faceted metadata
• Challenge: How to provide flexibility
and power without overwhelming?
(Answer: careful interface design)
The Flamenco Project Team
Brycen Chun
Ame Elliott
Jennifer English
Kevin Li
Rashmi Sinha
Kirsten Swearingen
Ping Yee
http://flamenco.berkeley.edu
Research funded by:
NSF CAREER Grant IIS-9984741
IBM Faculty Fellowship
Our Approach
• Integrate the search seamlessly into the
information architecture.
– Use proper HCI methodologies.
• Use faceted metadata:
– More flexible than canned hyperlinks
– Less complex than full search
– Help users see where to go next and return to
what happened previously
• What’s new?
– Putting hierarchical facets into a useable interface.
Metadata: data about data
Facets: orthogonal categories
GeoRegion + Time/Date + Topic
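A minimal sketch of the idea on this slide, not the Flamenco implementation: each item is described independently along several facets, and any set of items can be summarized by counting items per facet value. The items, the facet values, and the `facet_counts` helper are hypothetical.

```python
from collections import Counter

# Hypothetical items; each facet is assigned independently of the others.
ITEMS = [
    {"title": "Harbor at dawn", "GeoRegion": "Europe",
     "Time/Date": "19th century", "Topic": "Ships"},
    {"title": "Market scene", "GeoRegion": "Asia",
     "Time/Date": "19th century", "Topic": "People"},
    {"title": "Fishing boats", "GeoRegion": "Asia",
     "Time/Date": "20th century", "Topic": "Ships"},
]

FACETS = ("GeoRegion", "Time/Date", "Topic")

def facet_counts(items):
    """Count how many items carry each value, separately for every facet."""
    return {facet: Counter(item[facet] for item in items) for facet in FACETS}

counts = facet_counts(ITEMS)
```

Because the facets are orthogonal, the same three items can be grouped by region, by date, or by topic without any one grouping constraining the others.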
Hierarchical Faceted Metadata
Example: Biological Subject Headings
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]
Hierarchical Faceted Metadata
1. Anatomy [A]
   – Body Regions [A01]
      • Abdomen [A01.047]
      • Back [A01.176]
      • Breast [A01.236]
      • Extremities [A01.378]
      • Head [A01.456]
      • Neck [A01.598]
      • ….
   – Musculoskeletal System [A02]
   – Digestive System [A03]
   – Respiratory System [A04]
   – Urogenital System [A05]
   – ……
2. [B]
3. [C]
4. [D]
5. [E]
6. [F]
7. [G]
8. Physical Sciences [H]
   – Electronics
      • Amplifiers
      • Electronics, Medical
      • Transducers
   – Astronomy
   – Nature
   – Time
   – Weights and Measures
      • Calibration
      • Metric System
      • Reference Standard
   – ….
9. [I]
10. [J]
11. [K]
12. [L]
13. [M]
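One way to work with a hierarchy like this, sketched under the assumption that the bracketed tree numbers encode ancestry (a top-level letter, then dot-separated levels, as the codes above suggest). The `HEADINGS` table holds a few entries copied from the slide; `is_descendant` and `subtree` are hypothetical helpers.

```python
# A few headings from the slide, keyed by tree number.
HEADINGS = {
    "A": "Anatomy",
    "A01": "Body Regions",
    "A01.047": "Abdomen",
    "A01.176": "Back",
    "A01.456": "Head",
    "A02": "Musculoskeletal System",
}

def is_descendant(code, ancestor):
    """True if `code` lies strictly below `ancestor` in the hierarchy."""
    if code == ancestor or not code.startswith(ancestor):
        return False
    # A top-level letter ("A") is followed directly by digits ("A01");
    # deeper levels are separated by dots ("A01.047").
    rest = code[len(ancestor):]
    return len(ancestor) == 1 or rest.startswith(".")

def subtree(ancestor):
    """All known tree numbers below `ancestor`, in sorted order."""
    return sorted(c for c in HEADINGS if is_descendant(c, ancestor))

body_regions = subtree("A01")
```

With this encoding, selecting a category in the interface can pull in everything filed under its subcategories by a prefix test, with no separate parent table.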
The Interface Design
• Chess metaphor
– Opening
– Middle game
– End game
The Interface Design
• Tightly Integrated Search
• Supports Expand as well as Refine
• Dynamically Generated Pages
– Paths can be taken in any order
– Links are idempotent
• Consistent Color Coding
• Consistent Backup and Bookmarking
• Standard HTML
– No JavaScript
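A rough sketch of one reading of "paths can be taken in any order" and "links are idempotent": if the navigation state is just a set of facet constraints, then refining in a different order reaches the same state, and following the same link twice changes nothing. The collection and the `refine`, `expand`, and `results` functions are hypothetical and are not taken from Flamenco.

```python
# Hypothetical collection: item id -> set of (facet, value) pairs.
ITEMS = {
    1: {("Media", "Woodcut"), ("Location", "US"), ("Date", "1930s")},
    2: {("Media", "Woodcut"), ("Location", "Japan"), ("Date", "1930s")},
    3: {("Media", "Etching"), ("Location", "US"), ("Date", "1940s")},
}

def refine(state, facet, value):
    """Narrow the result set by adding a constraint."""
    return state | {(facet, value)}

def expand(state, facet, value):
    """Broaden the result set by dropping a constraint."""
    return state - {(facet, value)}

def results(state):
    """Ids of items that satisfy every constraint in the state."""
    return sorted(i for i, tags in ITEMS.items() if state <= tags)

start = frozenset()
a = refine(refine(start, "Media", "Woodcut"), "Location", "US")
b = refine(refine(start, "Location", "US"), "Media", "Woodcut")
```

Here `a` and `b` are equal even though the constraints were added in opposite orders, and `expand(a, "Location", "US")` backs out one constraint without restarting the search.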
What is Tricky About This?
• It is easy to do it poorly
– Yahoo directory structure
• It is hard not to be overwhelming
– Most users prefer simplicity unless
complexity really makes a difference
• It is hard to “make it flow”
– Can it feel like “browsing the shelves”?
– Yes, but we iterated the design 3 times
Usability Study
• Participants & Collection
– 32 Art History Students
– ~35,000 images from SF Fine Arts Museum
• Study Design
– Within-subjects
• Each participant sees both interfaces
• Balanced in terms of order and tasks
– Participants assess each interface after use
– Afterwards they compare them directly
• Data recorded in behavior logs, server logs, and paper surveys; one or two experienced testers at each trial.
• Used 9-point Likert scales.
• Session took about 1.5 hours; pay was $15/hour
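The slide does not say how the balancing was done. As an illustration only, one common way to balance a within-subjects design is to rotate participants through every combination of interface order and task-set order. The task-set labels "A" and "B" and the `assign` function are assumptions.

```python
from itertools import cycle, product

INTERFACE_ORDERS = [("FC", "Baseline"), ("Baseline", "FC")]
TASK_SET_ORDERS = [("A", "B"), ("B", "A")]  # hypothetical task-set labels

def assign(n_participants):
    """Rotate participants through all order combinations in turn."""
    conditions = cycle(product(INTERFACE_ORDERS, TASK_SET_ORDERS))
    return [next(conditions) for _ in range(n_participants)]

schedule = assign(32)
```

With 32 participants and four combinations, each combination is used eight times, so neither interface is systematically seen first or paired with one task set.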
The Baseline System
• Floogle
• Take the best of the existing keyword-based image search systems
Hypotheses
• We attempted to design tasks to test the
following hypotheses:
– Participants will experience greater search
satisfaction, feel greater confidence in the results,
produce higher recall, and encounter fewer dead
ends using FC over Baseline
– FC will be perceived to be more useful and flexible
than Baseline
– Participants will feel more familiar with the
contents of the collection after using FC
– Participants will use FC to create multi-faceted
queries
Four Types of Tasks
– Unstructured (3): Search for images of interest
– Structured Task (11-14): Gather materials for an
art history essay on a given topic, e.g.
• Find all woodcuts created in the US
• Choose the decade with the most
• Select one of the artists in this period and show all of
their woodcuts
• Choose a subject depicted in these works and find
another artist who treated the same subject in a different
way.
– Structured Task (10): compare related images
• Find images by artists from 2 different countries that
depict conflict between groups.
– Unstructured (5): search for images of interest
Other Points
• Participants were NOT walked through the interfaces.
• The wording of Task 2 reflected the metadata; not the
case for Task 3
• Within tasks, queries were not different in difficulty
(t’s<1.7, p >0.05 according to post-task questions)
• Flamenco is an order of magnitude slower than
Floogle on average.
– In task 2 users were allowed 3 more minutes in FC than in
Baseline.
– Time spent in tasks 2 and 3 was significantly longer in FC
(about 2 min more).
Post-Interface Assessments
All significant at p<.05 except simple and overwhelming
Post-Test Comparison
Which Interface Preferable For:              Baseline   FC
Find images of roses                            15      16
Find all works from a given period               2      30
Find pictures by 2 artists in same media         1      29
Overall Assessment:
More useful for your tasks                       4      28
Easiest to use                                   8      23
Most flexible                                    6      24
More likely to result in dead ends              28       3
Helped you learn more                            1      31
Overall preference                               2      29
Study Results Summary
• Strongly positive results for the faceted
metadata interface.
• Moderate use of multiple facets.
• Strong preference over the current state of
the art.
– Chair of Architecture Dept: “It felt like I was
browsing the shelves!”
– This kind of enthusiasm is not seen in similarity-based image search interfaces.
• Hypotheses are supported.
Study Summary
• Usability studies done on 3 collections:
– Recipes: 13,000 items
– Architecture Images: 40,000 items
– Fine Arts Images: 35,000 items
• Conclusions:
– Users like and are successful with the dynamic
faceted hierarchical metadata, especially for
browsing tasks
– Very positive results, in contrast with studies on
earlier iterations
– Note: it seems you have to care about the
contents of the collection to like the interface
Advantages of the Approach
• Supports different search types
– Highly constrained known-item searches
– Open-ended, browsing tasks
– Can easily switch from one mode to the
other midstream
– Can both expand and refine
• Allows different people to add content
without breaking things
• Can make use of standard technology
Metadata Availability
• Many collections already have rich
metadata associated with them.
• Automated methods are improving.
• Have applied this to:
– Tobacco documents archive
– MEDLINE
Back to the Hypothesis
• This kind of tool may be helpful for
resolving metadata creation wars.
– Multiple paths to get to the same item
– Different views on different subsets of
items
– No need to force everything into one
hierarchy
• What do you think?