Making the Business Case for Taxonomy
Taxonomy Strategies LLC
Taxonomy Testing & Usability
Joseph A. Busch
March 25, 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Agenda
Qualitative methods
Quantitative methods
Taxonomy Strategies LLC The business of organized information
Qualitative taxonomy testing methods
Method | Process | Who | Requires | Validation
Walk-thru | Show & explain | Taxonomist; SME; Team | Rough taxonomy | Approach; Appropriateness to task
Walk-thru | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial Rules | Consistent look and feel
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Users | Rough taxonomy; Tasks & Answers | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Users | Rough Taxonomy; UI Mockup; Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
Tagging Samples | Tag sample content with taxonomy | Taxonomist; Team; Indexers | Sample content; Rough taxonomy (or better) | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
Walk-through method—
Show & explain
ABC Computers.com
Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Type
Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency
Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petrochemicals; Retail / Wholesale; Technology; Transportation; Other Industries
Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Non-Dell Brands
Audience: All; Business; Dell Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper; First Time; Experienced; Advanced; Supplier
Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
Region/Country: All; Asia-Pacific; Canada; Dell EMEA; Japan; Latin America & Caribbean; United States
Walk-through method—
Editorial rules consistency check
Abbreviations
Ampersands
Capitalization
General…, More…, Other…
Languages & character sets
Length limits
Multiple parents
Plural vs. singular form
Scope notes
Serial comma
Sources of terms
Spaces
Synonyms & acronyms
Term order (Alphabetic or …)
Term label order (Direct vs. inverted)
Rule Name | Editorial Rule
Abbreviations | Abbreviations, other than colloquial terms and acronyms, shall not be used in term labels. Example: “Public Information”, NOT “Public Info.”
Ampersands | The ampersand [&] character shall be used instead of the word ‘and’. Example: “Licensing & Compliance”, NOT “Licensing and Compliance”
Capitalization | Title case capitalization shall be used. Example: “Customer Service”, NOT “CUSTOMER SERVICE”, NOT “Customer service”, NOT “customer service”
General…, More…, Other… | The term labels “General…”, “More…”, and “Other…” shall be used for categories which contain content items that are not further classifiable. Examples: “Other Property”, “Other Services”, “General Information”, “General Audience”
… | …
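The three fully stated rules (abbreviations, ampersands, capitalization) lend themselves to an automated first pass before a taxonomist's manual walk-through. The following is a minimal sketch only: the function name check_label, the list of lowercase "minor words", and the trailing-period heuristic for spotting abbreviations are all assumptions made for illustration and are not part of the editorial rules themselves.

```python
import re

# Assumption: short function words that title case conventionally leaves in lowercase.
MINOR_WORDS = {"a", "an", "and", "for", "in", "of", "or", "the", "to"}


def check_label(label):
    """Return the names of the editorial rules that a term label appears to break."""
    problems = []
    words = label.split()

    # Ampersands rule: '&' is required in place of the word 'and'.
    if any(w.lower() == "and" for w in words):
        problems.append("ampersand")

    # Abbreviations rule (heuristic): a word ending in a period looks abbreviated.
    if any(re.fullmatch(r"[A-Za-z]+\.", w) for w in words):
        problems.append("abbreviation")

    # Capitalization rule: every significant word starts with a capital,
    # and a multi-word label must not be written entirely in capitals.
    badly_cased = any(
        w[0].isalpha() and w[0].islower() and not (i > 0 and w.lower() in MINOR_WORDS)
        for i, w in enumerate(words)
    )
    shouting = len(words) > 1 and label.isupper()
    if badly_cased or shouting:
        problems.append("capitalization")

    return problems


if __name__ == "__main__":
    for label in ["Public Info.", "Licensing and Compliance", "Customer service"]:
        print(label, "->", check_label(label))
```

A check like this only narrows the list for human review; rules such as scope notes or sources of terms still need editorial judgment.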
Usability testing method—
Task-based card sorting (1)
15 representative questions were selected
Perspective of various organizational units
Most frequent website searches
Most frequently accessed website content
Correct answers to the questions were agreed in advance by team.
15 users were tested
Did not work for the organization
Represented target audiences
Testers were asked “where would you look for …”
“under which facet… Topic, Commodity, or Geography?”
Then, “… under which category?”
Then, “…under which sub-category?”
Tester choices were recorded
Testers were asked to “think aloud”
Notes were taken on what they said
Pre- and post-test questions were asked
Tester answers were recorded
Usability testing method—
Task-based card sorting (2)
Question 3. What is the average farm income level in your state?
Facets:
1. Topics
2. Commodities
3. Geographic Coverage
1. Topics
1.1 Agricultural Economy
1.2 Agriculture-Related Policy
1.3 Diet, Health & Safety
1.4 Farm Financial Conditions
1.5 Farm Practices & Management
1.6 Food & Agricultural Industries
1.7 Food & Nutrition Assistance
1.8 Natural Resources & Environment
1.9 Rural Economy
1.10 Trade & International Markets
1.4 Farm Financial Conditions
1.4.1 Costs of Production
1.4.2 Commodity Outlook
1.4.3 Farm Financial Management & Performance
1.4.4 Farm Income
1.4.5 Farm Household Financial Well-being
1.4.6 Lenders & Financial Markets
1.4.7 Taxes
Analysis of task-based card sorting (1)
Find-it Tasks | User 1 | User 2 | User 3 | User 4 | User 5
1. Cotton | Cotton | Cotton | Asia | Cotton | Cotton
2. Mad cow | Cattle | Food Safety | Cattle | Cattle | Cattle
3. Farm income | Farm Income | Farm Income | US States | Farm Income | Farm Income
4. Fast food | Food Consumption | Diet Quality & Nutrition | Food Expenditures | Diet Quality & Nutrition | Diet Quality & Nutrition
5. WIC | WIC Program | WIC Program | WIC Program | WIC Program | WIC Program
6. GE Corn | Corn | Corn | Corn | Corn | Corn
7. Foodborne illness | Foodborne Disease | Foodborne Disease | Consumer Food Safety | Foodborne Disease | Foodborne Disease
8. Food costs | Retailing & Wholesaling | Food Prices | Market Structure | Market Analysis | Food Expenditures
9. Tobacco | Tobacco | Tobacco | Tobacco | Tobacco | Tobacco
10. Small Farms | Farm Structure | Farm Structure | Farm Structure | Farm Structure | Farm Structure
11. Traceability | Food System | Labeling Policy | Food Safety Innovations | Food Safety Policy | Food Prices
12. Hunger | Food Security | Food Security | Food Security | Food Security | Food Security
13. Trade balance | Commodity Trade | Trade & Intl Markets | Commodity Trade | Market Analysis | Commodity Trade
14. Conservation | Cropping Practices | Conservation Policy | Conservation Policy | Conservation Policy | Conservation Policy
15. Trade restrictions | Trade Policy | Food Safety & Trade | Market Analysis | Commodity Trade | WTO
Analysis of task-based card sorting (2)
In 80% of the trials, users looked for information under the
categories where we expected them to look.
Breaking-up topics into facets makes it easier to find
information, especially information related to
commodities.
Analysis of task-based card sorting (3)
Test Questions | % Correct | % Agree
1. Cotton | 91% | 82%
2. Mad cow | 73% | 64%
3. Farm income | 100% | 55%
4. Fast food | 91% | 73%
5. WIC | 100% | 100%
6. GE corn | 100% | 100%
7. Foodborne illness | 82% | 82%
8. Food costs | 55% | 27%
9. Tobacco | 100% | 100%
10. Small farms | 91% | 91%
11. Traceability | 36% | 18%
12. Hunger | 100% | 73%
13. Trade balance | 36% | 64%
14. Conservation | 91% | 91%
15. Trade restrictions | 55% | 36%
Legend: Possible change required. Change required.
Policy of “Traceability” needs to be clarified. Use quasi-synonyms.
On these trials, only 50% looked in the right category, & only 27-36% agreed on the category.
Possible error in categorization of this question because 64% thought the answer should be “Commodity Trade.”
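The two percentages in the table can be computed directly from the recorded tester choices. The sketch below assumes that "% Correct" is the share of testers who chose the category agreed in advance, and that "% Agree" is the share who chose the single most popular category, whether or not it was the expected one; the slides do not define either measure, so both definitions are assumptions. The choices come from the five-user table, but the expected answers shown here are hypothetical.

```python
from collections import Counter


def score_task(choices, expected):
    """Return (share correct, share agreeing) for one find-it task.

    Assumption: 'agree' means the share of testers who picked the most
    popular category, even if that category was not the expected one.
    """
    n = len(choices)
    correct = sum(1 for choice in choices if choice == expected) / n
    most_popular_count = Counter(choices).most_common(1)[0][1]
    return correct, most_popular_count / n


if __name__ == "__main__":
    # Tester choices from the five-user table; expected answers are hypothetical.
    trials = {
        "Cotton": (["Cotton", "Cotton", "Asia", "Cotton", "Cotton"], "Cotton"),
        "Trade balance": (
            ["Commodity Trade", "Trade & Intl Markets", "Commodity Trade",
             "Market Analysis", "Commodity Trade"],
            "Trade & Intl Markets",
        ),
    }
    for task, (choices, expected) in trials.items():
        correct, agree = score_task(choices, expected)
        print(f"{task}: {correct:.0%} correct, {agree:.0%} agree")
```

A task where agreement is clearly higher than correctness suggests that the expected answer, rather than the testers, may be the thing to revisit.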
User satisfaction method—
Card Sort Questionnaire (1)
Was it easy, medium or difficult to choose the appropriate
Topic?
– Easy
– Medium
– Difficult
Was it easy, medium or difficult to choose the appropriate
Commodity?
– Easy
– Medium
– Difficult
Was it easy, medium or difficult to choose the appropriate
Geographic Coverage?
– Easy
– Medium
– Difficult
User satisfaction method—
Card Sort Questionnaire (2)
[Chart: average difficulty rating by Facet (Topic, Commodity, Geography). The rating axis runs between Easy and Difficult, with ticks from 0.50 to 2.00.]
User interface survey—
Which search UI is ‘better’?
Criteria
User satisfaction
Success completing tasks
Confidence in results
Fewer dead ends
Methodology
Design tasks from specific to
general
Time performance
Calculate success rates
Survey subjective criteria
Pay attention to survey hygiene:
– Participant selection
– Counterbalancing
– T-scores
Source: Yee, Swearingen, Li, & Hearst
User interface survey — Results (1)
Which interface would you rather use for these tasks?
Task | Google-like Baseline | Faceted Category
Find images of roses | 15 | 16
Find all works from a certain period | 2 | 30
Find pictures by 2 artists in the same media | 1 | 29
…
Overall assessment:
Criterion | Google-like Baseline | Faceted Category
More useful for your usual tasks | 4 | 28
Easiest to use | 8 | 23
Most flexible | 6 | 24
More likely to result in dead-ends | 28 | 3
Helped you learn more | 1 | 31
Overall preference | 2 | 29
…
Source: Yee, Swearingen, Li, & Hearst
User interface survey — Results (2)
[Bar chart: ratings on a 0 to 9 scale for the Google-like Baseline and the Faceted Category interfaces, across eight attributes: Easy to Use, Simple, Flexible, Tedious, Interesting, Easy to Browse, Enjoyable, Overwhelming.]
Source: Yee, Swearingen, Li, & Hearst
Tagging samples—
How many items?
Goal | Number of Items | Criteria
Illustrate metadata schema | 1-3 | Random (excluding junk)
Develop training documentation | 10-20 | Show typical & unusual cases
Qualitative test of small vocabulary (<100 categories) | 25-50 | Random (excluding junk)
Quantitative test of vocabularies * | 3-10X number of categories | Use computer-assisted methods when more than 10-20 categories. Preexisting metadata is the most meaningful.
* Quantitative methods require large amounts of tagged content. This requires specialists, or software, to do tagging. Results may be very different than how “real” users would categorize content.
Tagging samples—
Manually tagged metadata sample
Attribute | Values
Title | Jupiter’s Ring System
URL | http://ringmaster.arc.nasa.gov/jupiter/
Description | Overview of the Jupiter ring system. Many images, animations and references are included for both the scientist and the public.
Content Types | Web Sites; Animations; Images; Reference Sources
Audiences | Educators; Students
Organizations | Ames Research Center
Missions & Projects | Voyager; Galileo; Cassini; Hubble Space Telescope
Locations | Jupiter
Business Functions | Scientific and Technical Information
Disciplines | Planetary and Lunar Science
Time Period | 1979-1999
Tagging samples—
Spreadsheet for tagging 10s-100s of items
1) Clickable URLs for sample content
2) Review small sample and describe
3) Drop-down for tagging (including ‘Other’ entry for the unexpected)
4) Flag questions
Rough Bulk Tagging—
Facet Demo (1)
Collections: 4 content sources
NTRS, SIRTF, Webb, Lessons Learned
Taxonomy
Converted MultiTes format into RDF for Seamark
Metadata
Converted from existing metadata on web pages, or
Created using simple automatic classifier (string matching with
terms & synonyms)
250k items, ~12 metadata fields, 1.5 weeks effort
OOTB Seamark user interface, plus logo
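The "simple automatic classifier (string matching with terms & synonyms)" can be illustrated with a short sketch. This is not the classifier used in the demo and it does not touch Seamark or MultiTes; the function names, the taxonomy dictionary format, and the sample terms and synonyms are all hypothetical.

```python
import re


def build_matchers(taxonomy):
    """Compile one case-insensitive pattern per term, covering the term and its synonyms.

    taxonomy maps each preferred term to a list of synonyms (hypothetical format).
    """
    matchers = {}
    for term, synonyms in taxonomy.items():
        labels = sorted([term] + list(synonyms), key=len, reverse=True)
        alternation = "|".join(re.escape(label) for label in labels)
        matchers[term] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return matchers


def tag(text, matchers):
    """Return the preferred terms whose label or synonym appears in the text."""
    return sorted(term for term, pattern in matchers.items() if pattern.search(text))


if __name__ == "__main__":
    # Hypothetical terms and synonyms for illustration only.
    taxonomy = {
        "Hubble Space Telescope": ["HST", "Hubble"],
        "Jupiter": ["Jovian"],
    }
    matchers = build_matchers(taxonomy)
    print(tag("New HST images of the Jovian ring system", matchers))
```

Matching on whole words keeps short synonyms from firing inside longer words, which matters when acronyms are part of the synonym list.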
Rough Bulk Tagging—
OOTB Facet Demo (2)
Agenda
Qualitative methods
Quantitative methods
How evenly does it divide the content?
Documents do not distribute uniformly across categories
Zipf (1/x) distribution is expected behavior
80/20 rule in action (actually 70/20 rule)
[Chart: Measured v Expected Distribution of Top 10 Content Types in Library of Congress Database. Vertical axis: Number of Records (0 to 350,000). Horizontal axis: Top 10 Content Types, including Statistics, Bibliography, Juvenile literature, Exhibitions, Fiction, Maps, Periodicals, Biography, and Congresses. Callouts: “Leading candidate for splitting” and “Leading candidates for merging”.]
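The expected curve in a measured-versus-expected comparison can be sketched as follows, assuming the simplest reading of "Zipf (1/x)": the category at rank r is expected to hold a share proportional to 1/r, normalized so the expected counts sum to the measured total. The function names and the sample counts are hypothetical; they are not the Library of Congress figures.

```python
def zipf_expected(total, k):
    """Expected counts for ranks 1..k if `total` items follow a 1/x curve."""
    harmonic = sum(1 / rank for rank in range(1, k + 1))
    return [total / (rank * harmonic) for rank in range(1, k + 1)]


def compare_to_zipf(counts):
    """Rank categories by size and pair each measured count with its expected count."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    expected = zipf_expected(sum(counts.values()), len(ranked))
    return [
        (name, measured, round(exp, 1))
        for (name, measured), exp in zip(ranked, expected)
    ]


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    sample = {"Type A": 300, "Type B": 90, "Type C": 60, "Type D": 50}
    for name, measured, expected in compare_to_zipf(sample):
        print(f"{name}: measured {measured}, expected {expected}")
```

Categories whose measured count sits far from the expected value are the ones worth a closer look when deciding what to split or merge.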
How evenly does it divide the content?
Methodology: 115 randomly selected URLs from corporate intranet
search index were manually categorized. Inaccessible files and ‘junk’
were removed.
Results: Slightly more uniform than Zipf distribution. Above the curve
is better than expected.
[Chart: Measured v Expected Intranet Content Type Distribution. Vertical axis: # Documents (0 to 25). Horizontal axis: Content Type, with categories Programs, Proposals, Plans & Schedules; Other & Unclassified; Papers & Presentations; Regulations, Policies, Procedures & Templates; Marketing & Sales; Operations & Internal Communications; Manuals & Learning Materials; News & Events; People, Groups & Places.]
How intuitive (repeatable) are the
categorizations?
Methodology: Closed Card Sort
For alpha test of a grocery site
15 Testers put each of 71 best-selling product types into one of 10
pre-defined categories
Products that fewer than 14 of 15 testers put into the
same category were flagged
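The flagging rule in this methodology can be sketched in a few lines. The 14-of-15 threshold comes from the slide; the function names, product names, and category names below are hypothetical.

```python
from collections import Counter


def top_agreement(placements):
    """Return (most popular category, number of testers who chose it)."""
    return Counter(placements).most_common(1)[0]


def flag_products(results, threshold=14):
    """List products whose most popular category drew fewer than `threshold` testers."""
    return sorted(
        product
        for product, placements in results.items()
        if top_agreement(placements)[1] < threshold
    )


if __name__ == "__main__":
    # Hypothetical placements by 15 testers for two product types.
    results = {
        "Milk": ["Dairy"] * 15,
        "Salsa": ["Snacks"] * 9 + ["Condiments"] * 6,
    }
    print(flag_products(results))
```

Flagged products are natural candidates for a second parent category, which is one way to read the poly-hierarchy comparison.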
How intuitive (repeatable) are the
categorizations?
How intuitive (repeatable) are the
categorizations?
% of Testers | Cumulative % of Products | With Poly-Hierarchy
15/15 | 54% | 69%
14/15 | 70% | 83%
13/15 | 77% | 93%
12/15 | 83% | 100%
11/15 | 85% | 100%
<11/15 | 100% | 100%
How does taxonomy “shape” match that of
content?
Background:
Hierarchical taxonomies allow
comparison of “fit” between content
and taxonomy areas
Methodology:
25,380 resources tagged with
taxonomy of 179 terms. (Avg. of 2
terms per resource)
Counts of terms and documents
summed within taxonomy hierarchy
Results:
Roughly Zipf distributed (top 20
terms: 79%; top 30 terms: 87%)
Mismatches between term% and
document% flagged
Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
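Flagging mismatches between term% and document% can be sketched as a simple ratio test. The slide does not say what size of gap counted as a mismatch, so the factor of 2 used below is an assumption, as are the function name and the "over"/"under" labels. The sample figures are taken from the table.

```python
def flag_mismatches(rows, factor=2.0):
    """Flag term groups whose share of documents and share of terms differ by `factor`.

    rows maps a term group to (percent of terms, percent of documents).
    'over' means more content than the taxonomy's level of detail suggests;
    'under' means more taxonomy detail than the content seems to need.
    """
    flags = {}
    for group, (term_pct, doc_pct) in rows.items():
        if doc_pct >= factor * term_pct:
            flags[group] = "over"
        elif doc_pct * factor <= term_pct:
            flags[group] = "under"
    return flags


if __name__ == "__main__":
    rows = {
        "Federal Funds Recipients and Applicants": (9.5, 34.4),
        "Researchers": (2.2, 3.6),
        "Students": (27.4, 7.0),
        "Teachers": (25.1, 11.4),
    }
    for group, flag in flag_mismatches(rows).items():
        print(f"{group}: {flag}")
```

An "over" group may need more terms to divide its content, while an "under" group may carry more terms than the content justifies.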
Pop Quiz
What is the #1 underused source of quantitative
information on how to improve your taxonomy?
Query Logs & Click Trails
Query Log & Click Trail Examination—
Who are the users & what are they looking for?
Only 30-40% of organizations regularly examine their
logs*.
Sophisticated software available, but don’t wait.
80% of value comes from basic reports
Query logs
UltraSeek Reporting
Top queries
Queries with no results
Queries with no click-through
Most requested documents
Query trend analysis
Complete server usage
summary
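The first three reports in this list need very little machinery. The sketch below does not use UltraSeek or its log format; it assumes a hypothetical log already reduced to (query, result count, clicked) tuples, and the function name basic_reports is made up for illustration.

```python
from collections import Counter


def basic_reports(log, top_n=3):
    """Build top-query, no-result, and no-click-through reports from a query log.

    log is a sequence of (query, result_count, clicked) tuples (hypothetical format).
    Queries are lowercased and trimmed so trivial variants are counted together.
    """
    log = list(log)
    all_queries = Counter(q.strip().lower() for q, _, _ in log)
    no_results = Counter(q.strip().lower() for q, n, _ in log if n == 0)
    no_click = Counter(
        q.strip().lower() for q, n, clicked in log if n > 0 and not clicked
    )
    return {
        "top_queries": all_queries.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_click.most_common(top_n),
    }


if __name__ == "__main__":
    # Hypothetical log entries for illustration only.
    log = [
        ("farm income", 12, True),
        ("Farm Income", 12, False),
        ("mad cow", 0, False),
        ("wic", 5, True),
    ]
    for name, rows in basic_reports(log).items():
        print(name, rows)
```

Queries with no results are often the quickest source of missing synonyms, and queries with no click-through can point to labels that do not match what users expect.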
Click Trail Packages
iWebTrack
NetTracker
OptimalIQ
SiteCatalyst
Visitorville
WebTrends
Start a “Measure & Improve” mindset
Taxonomy changes do not stand alone
Search system improvements
Navigation improvements
Content improvements
Process improvements
Taxonomy Strategies LLC
Questions
Joseph A. Busch
[email protected]
http://www.taxonomystrategies.com
March 25, 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Bibliography
K. Yee, K. Swearingen, K. Li, M. Hearst. "Searching and organizing:
Faceted metadata for image search and browsing." Proceedings of the
Conference on Human Factors in Computing Systems (April 2003)
http://bailando.sims.berkeley.edu/papers/flamenco-chi03.pdf
R. Daniel and J. Busch. "Benchmarking Your Search Function: A Maturity
Model." http://www.taxonomystrategies.com/presentations/maturity-200505-17%28as-presented%29.ppt