How to Validate a Taxonomy


Taxonomy Strategies LLC
ASIST Workshop Preview
October 7, 2009
Copyright 2009 Taxonomy Strategies LLC. All rights reserved.
What is a Taxonomy?
- A categorization framework, agreed upon by business and content owners (with the help of subject matter experts), that will be used to tag content.
- 6 broad, discrete divisions (called facets).
- 2-3 levels deep.
- Up to 15 terms at each level.
- 1,200 terms total.
- Some logic: hierarchical, equivalent and associative relationships between terms.
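The size guidelines above can be checked mechanically. A minimal sketch, assuming the taxonomy is held as a tree of hypothetical `Term` objects (one root per facet) and reading "up to 15 terms at each level" as up to 15 child terms under any one parent. The limits are parameters because a given project may set different targets.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A taxonomy term or facet (hypothetical class). `children` holds the hierarchical relationship."""
    label: str
    children: list["Term"] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)  # equivalent relationship
    related: list[str] = field(default_factory=list)   # associative relationship


def levels_below(term: Term) -> int:
    """Number of levels of terms under this node (0 for a leaf)."""
    return max((1 + levels_below(c) for c in term.children), default=0)


def widest(term: Term) -> int:
    """Largest number of sibling terms under any single parent in this subtree."""
    return max([len(term.children)] + [widest(c) for c in term.children])


def size(term: Term) -> int:
    """Number of terms under this node, not counting the node itself."""
    return sum(1 + size(c) for c in term.children)


def check_guidelines(facets: list[Term], max_facets: int = 6, max_levels: int = 3,
                     max_siblings: int = 15, max_terms: int = 1200) -> list[str]:
    """Return one warning per guideline exceeded; an empty list means all are met."""
    warnings = []
    if len(facets) > max_facets:
        warnings.append(f"{len(facets)} facets > {max_facets}")
    for facet in facets:
        if levels_below(facet) > max_levels:
            warnings.append(f"{facet.label}: {levels_below(facet)} levels > {max_levels}")
        if widest(facet) > max_siblings:
            warnings.append(f"{facet.label}: {widest(facet)} sibling terms > {max_siblings}")
    total = sum(size(f) for f in facets)
    if total > max_terms:
        warnings.append(f"{total} terms > {max_terms}")
    return warnings


# Hypothetical fragment: one facet, two levels deep.
topics = Term("Topics", [
    Term("Education & Career Development",
         [Term("Continuing Education"), Term("Engineering Education")]),
])
print(check_guidelines([topics]))  # prints []
```

Synonyms and related terms are stored but not checked here; they matter for tagging and search rather than for the size guidelines.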
Taxonomy Strategies LLC The business of organized information
Why is taxonomy important?
- Easier information management.
- Flexibility to respond to changing needs.
- Foundation for findability and usability.
How can you know if a taxonomy will work?
- Tagging content.
- Publishing content.
- Finding and using content in user-facing applications.
What types of taxonomy validation can you do?
- Validating taxonomy design.
- Validating taxonomy usability.
- Validating taxonomy use in a collection.
Taxonomy design validation methods
Method | Process | Who | Requires | Validation
Open card sort | Cards & markup | Representative users | Rough taxonomy | Card sort analysis (emergent patterns)
Delphi card sort | Cards & markup | Representative users | Rough taxonomy | End result (consensus)
Walk-thru | Show & explain | Stakeholders | Rough taxonomy | Approach; Appropriateness to task
Walk-thru | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial rules | Consistent look & feel
Closed card sort | Matrix or cards | End users (or surrogates) | Rough taxonomy; Top queries, etc. | Consistent results
User satisfaction | Survey | End users (or surrogates) | Draft taxonomy | Reaction to taxonomy
Tagging samples | Tag sample content with taxonomy | Taxonomist; Indexers | Sample content; Draft taxonomy (or better) | Content "fit"; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
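One common way to surface the emergent patterns an open card sort produces is a pairwise co-occurrence count: for every pair of cards, the share of participants who put them in the same group. A minimal sketch of that general technique (not necessarily the analysis used in the workshop), assuming each participant's sort is recorded as a list of sets of card labels; the groupings below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts: list[list[set[str]]]) -> dict[tuple[str, str], float]:
    """Share of participants who placed each pair of cards in the same group.

    `sorts` holds one entry per participant: that participant's list of groups,
    where each group is a set of card labels. Pairs never grouped together are omitted.
    """
    together: Counter = Counter()
    for groups in sorts:
        for group in groups:
            for pair in combinations(sorted(group), 2):  # sorted, so each pair has one key
                together[pair] += 1
    return {pair: count / len(sorts) for pair, count in together.items()}


# Two hypothetical participants sorting six cards.
sorts = [
    [{"Alcohol", "Opium", "Vicodin"}, {"Anxiety", "PTSD"}],
    [{"Alcohol", "Opium"}, {"Methadone", "Vicodin"}, {"Anxiety", "PTSD"}],
]
for pair, share in sorted(co_occurrence(sorts).items(), key=lambda kv: -kv[1]):
    print(pair, share)
```

Pairs near 1.0 suggest cards that belong in the same category; pairs that split participants are good discussion points for a follow-up Delphi round.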
Usability validation methods
Method | Process | Who | Requires | Validation
Usability testing | Task-based scenario testing (Find it, Tag it, etc.) | End users (or surrogates) | Draft taxonomy; Tasks & answers | Tasks are completed successfully; Time to complete task is reduced
User satisfaction | Survey | End users (or surrogates) | Draft taxonomy; UI mockup, Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
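The two usability-testing criteria, successful completion and reduced time to complete, can be summarized per task. A minimal sketch, assuming each attempt is recorded as a (succeeded, seconds) pair and that "reduced" is judged against a baseline run such as the current site; all numbers below are hypothetical. The median is used for time because a few slow attempts can skew a mean.

```python
from statistics import median


def task_metrics(results: list[tuple[bool, float]]) -> tuple[float, float]:
    """Return (success rate, median seconds among successful attempts).

    Each result is (completed_successfully, seconds_taken) for one attempt.
    """
    times = [seconds for ok, seconds in results if ok]
    rate = len(times) / len(results)
    return rate, (median(times) if times else float("nan"))


# Hypothetical attempts at the same task before and after the draft taxonomy.
baseline = [(True, 95.0), (False, 180.0), (True, 120.0), (True, 80.0)]
draft = [(True, 40.0), (True, 65.0), (True, 50.0), (True, 70.0)]
print(task_metrics(baseline))  # (0.75, 95.0)
print(task_metrics(draft))     # (1.0, 57.5)
```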
Collection analysis validation methods
Method | Process | Who | Requires | Validation
Distribution | Statistical analysis | Taxonomist; Analyst | Tagged collection | Do categories follow Zipf distribution?; Do % of categories equal % of documents?; Identify candidates for merging & splitting.
Query log & click trail examination | Clustering & statistical analysis | Taxonomist; Analyst | Search logs; Web analytics | Do (clustered) queries follow Zipf distribution?; Are top (clustered) queries in taxonomy?
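Two of the distribution questions can be sketched in code: whether document counts per category fall off roughly as 1/rank (Zipf), and which categories hold unusually few or many documents. A minimal sketch, assuming the tagged collection has already been reduced to a count of documents per category. Fitting a straight line on a log-log plot is a quick heuristic, not a formal goodness-of-fit test, and the merge/split thresholds are arbitrary placeholders. The same exponent check can be applied to query frequencies from search logs.

```python
import math


def zipf_exponent(doc_counts: list[int]) -> float:
    """Estimate the Zipf exponent s in frequency ~ 1 / rank**s.

    Fits a least-squares line to log(frequency) vs. log(rank) and returns
    the negated slope; a value near 1 is the classic Zipf pattern.
    """
    freqs = sorted((c for c in doc_counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two non-empty categories")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


def merge_split_candidates(doc_counts: dict[str, int], low: float = 0.01,
                           high: float = 0.25) -> tuple[list[str], list[str]]:
    """Flag categories tagging very few (merge?) or very many (split?) documents.

    The `low` and `high` share thresholds are illustrative, not standard values.
    """
    total = sum(doc_counts.values())
    merge = sorted(c for c, n in doc_counts.items() if n / total < low)
    split = sorted(c for c, n in doc_counts.items() if n / total > high)
    return merge, split


# Hypothetical counts: exactly proportional to 1/rank, so the exponent is 1.
print(round(zipf_exponent([720720 // r for r in range(1, 17)]), 3))  # 1.0
print(merge_split_candidates({"A": 500, "B": 300, "C": 195, "D": 5}))  # (['D'], ['A', 'B'])
```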
Full-day taxonomy validation workshop agenda
The day mixes lectures, hands-on exercises and breaks.
Time | Length | Session
9:00-9:30 | 30 min | Introduction
9:30-10:00 | 30 min | Taxonomy & validation overview
10:00-10:30 | 30 min | Taxonomy design validation
10:30-10:45 | 15 min | Coffee break
10:45-11:30 | 45 min | Delphi card sorting exercise
11:30-12:00 | 30 min | Closed card sorting
12:00-1:00 | 60 min | Lunch
1:00-1:45 | 45 min | Closed card sorting exercise
1:45-2:15 | 30 min | Online tools
2:15-2:45 | 30 min | Task-based validation methods
2:45-3:00 | 15 min | Coffee break
3:00-3:45 | 45 min | Task-based validation exercise
3:45-4:15 | 30 min | Collection analysis overview
4:15-5:00 | 45 min | Q&A, closing
Closed card sorting case study
- Project: Develop the Substance Abuse and Mental Health Services Administration (SAMHSA) taxonomy to be the backbone of a unified agency information service that collects, integrates, and reports on customer interaction data in ways that help the agency plan health communications products, messages and outreach.
- Task: Sort popular queries (words and phrases) from search logs into the most likely taxonomy facet.
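Choosing the popular queries for a sort like this starts from the search logs. A minimal sketch, assuming a log already reduced to one raw query per line (real log formats need parsing first) and using only exact matching after lowercasing and collapsing whitespace. The collection-analysis methods above also cluster similar queries, which this sketch does not attempt. The second function answers the related question of whether the top queries are already in the taxonomy.

```python
from collections import Counter


def top_queries(log_lines: list[str], n: int) -> list[tuple[str, int]]:
    """The n most frequent queries, after lowercasing and collapsing whitespace."""
    normalized = (" ".join(line.split()).lower() for line in log_lines)
    return Counter(q for q in normalized if q).most_common(n)


def not_in_taxonomy(queries: list[tuple[str, int]], terms: set[str]) -> list[str]:
    """Queries with no exact, case-insensitive match among the taxonomy terms."""
    known = {t.lower() for t in terms}
    return [q for q, _ in queries if q not in known]


# Hypothetical log: one raw query string per line.
log = ["Depression", "depression ", "PTSD", "ptsd", "ptsd", "grief  counseling", ""]
top = top_queries(log, 3)
print(top)  # [('ptsd', 3), ('depression', 2), ('grief counseling', 1)]
print(not_in_taxonomy(top, {"Depression", "PTSD", "Counseling"}))  # ['grief counseling']
```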
Term sorting data collection form
Columns (facets): Content Type, Audience, Population Groups, Substances, Conditions & Disorders, Intervention & Treatment Topics, Professional & Research Topics, Geographic Locations
Rows (terms): Alcohol, Anger, Anxiety, Autism, Binge Eating, Bipolar Disorder, Change Management, Child, Co-occurring Disorders, Counseling, Depression, Fulton County, Health Literacy, Immigrant Adolescents, Inhalants, Low Income, Methadone, Native American Culture, Ohio, Opium, Posters, PTSD, Residential Treatment, Rohypnol, Smoking, Stress, Substance Abuse, Suicide, University Students, Vicodin, Webcast
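Each completed form records one participant's facet choice per term. A minimal sketch of turning the forms into the kind of per-term counts reported in the summary of term sorting results, assuming each form is captured as a {term: facet} mapping; the responses and the answer key below are hypothetical.

```python
from collections import Counter, defaultdict


def tally(responses: list[dict[str, str]]) -> dict[str, Counter]:
    """Convert each participant's {term: facet} answers into per-term facet counts."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for answers in responses:
        for term, facet in answers.items():
            counts[term][facet] += 1
    return counts


def share_correct(counts: dict[str, Counter], answer_key: dict[str, str]) -> dict[str, float]:
    """Fraction of participants who put each term in its intended facet."""
    return {term: counts[term][facet] / sum(counts[term].values())
            for term, facet in answer_key.items() if term in counts}


# Four hypothetical participants and a hypothetical answer key.
responses = [
    {"Ohio": "Geographic Locations", "Posters": "Content Type"},
    {"Ohio": "Geographic Locations", "Posters": "Audience"},
    {"Ohio": "Geographic Locations", "Posters": "Content Type"},
    {"Ohio": "Geographic Locations", "Posters": "Content Type"},
]
key = {"Ohio": "Geographic Locations", "Posters": "Content Type"}
print(share_correct(tally(responses), key))  # {'Ohio': 1.0, 'Posters': 0.75}
```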
Summary of term sorting results
[Table: for each of the 31 terms above, the number of participants (n=12) who placed it in each facet (Content Type, Audience, Population Groups, Substances, Conditions & Disorders, Intervention & Treatment Topics, Professional & Research Topics, Geographic Locations, or None); cells are coded as correct category, frequently chosen related category, or frequently chosen incorrect category.]
[Chart: Percentage of popular search terms sorted correctly (0-100% scale)]
Blind sorting of popular search terms (n=12)
Results: Excellent. 84% of terms were correctly sorted 60-100% of the time.
Share of terms, by how often each term was sorted correctly:
- 60-100% of the time: 84% of terms
- 50-60%: 7%
- 25-50%: 6%
- < 25%: 3%
Difficulties
- For Methadone, there was confusion because, in this case, a substance is a treatment.
- For general terms such as Smoking, Substance Abuse and Suicide, there was confusion about whether these are Conditions or Research topics.
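The band breakdown is a histogram of per-term agreement. A minimal sketch, assuming each term's rate is the fraction of the 12 sorters who chose the correct facet. The slide's band edges overlap (60% appears in two bands), so treating each band as including its lower bound is an assumption; the term names and rates below are hypothetical.

```python
def band_shares(rates: dict[str, float]) -> dict[str, float]:
    """Share of terms whose correct-sort rate falls in each band."""
    bands = {"60-100%": 0, "50-60%": 0, "25-50%": 0, "<25%": 0}
    for rate in rates.values():
        if rate >= 0.60:
            bands["60-100%"] += 1
        elif rate >= 0.50:
            bands["50-60%"] += 1
        elif rate >= 0.25:
            bands["25-50%"] += 1
        else:
            bands["<25%"] += 1
    return {band: count / len(rates) for band, count in bands.items()}


# Four hypothetical terms, each sorted by 12 participants.
rates = {"term 1": 12 / 12, "term 2": 11 / 12, "term 3": 6 / 12, "term 4": 5 / 12}
print(band_shares(rates))  # {'60-100%': 0.5, '50-60%': 0.25, '25-50%': 0.25, '<25%': 0.0}
```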
Search terms sorting task user rating (n=12)
- Easy: 42%
- Easy/Medium: 25%
- Medium: 33%
No one rated the task difficult!
Task-based case study
- Project: Develop the American Society of Civil Engineers (ASCE) taxonomy to draw from existing databases and content sources, and provision the next-generation websites being built to support this non-profit membership association's activities.
- Task: Find specific content (web pages).
Find web pages
Target page: ASCE Continuing Education (http://www.asce.org/conted/)
Facets:
- T Topics
- A Audiences
- C Content Types
- E Event Types
- L Locations
- O Organizations
T Topics:
- T.1 Architectural Engineering
- T.2 Coasts & waterways
- T.3 Construction
- T.4 Cross-Cutting Topics
- T.5 Disaster & Hazard Management
- T.6 Education & Career Development
  - T.6.1 Continuing Education
  - T.6.2 Engineering Education
  - T.6.3 Management & Professional Development
  - T.6.4 Scholarships, Internships & Competitions
- T.7 Engineering Mechanics
- T.8 Energy
- T.9 Environment
- T.10 Geotechnical Engineering
- T.11 People, Projects & Heritage
- T.12 Planning & Development
- T.13 Professional Issues
- T.14 Project Management
- T.15 Structural Engineering
- T.16 Transportation
- T.17 Water & Wastewater
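The notation codes (T.6, T.6.1) appear to encode each term's position, with one dotted segment per level. Assuming that reading, a minimal sketch that expands a code into the breadcrumb a user would follow when navigating. The dictionary is a hand-entered fragment of the topics listed above; codes from other facets (such as A.27 or C.12 in the navigation results) would need their own labels, which are not given here.

```python
# Hand-entered fragment of the T (Topics) facet; the mapping itself is an illustration.
LABELS = {
    "T": "Topics",
    "T.6": "Education & Career Development",
    "T.6.1": "Continuing Education",
    "T.6.2": "Engineering Education",
}


def path(code: str, labels: dict[str, str]) -> str:
    """Expand a dotted notation code into its full label path, broadest first."""
    parts = code.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return " > ".join(labels[p] for p in prefixes)


print(path("T.6.1", LABELS))  # Topics > Education & Career Development > Continuing Education
```

An unknown code raises `KeyError`, which is a useful signal that the label table and the notation are out of sync.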
Summary of navigation results trial
[Table: navigation to the target page ASCE Continuing Education by 9 subjects (JF, KSS, MJ, KS, IM, VM, KSM, GSB, KCH). Columns: navigation path through the facets (e.g., C>A>T, C>E, A>C>T, A>O>C); the term chosen in each facet (Audience, Content Type, Event Type, Location, Organization, Topic; e.g., A.27, C.12, E.4, T.6.1); and self-rated difficulty (Easy, Medium, Difficult, or Gave up). Cells are coded as correct category, frequently chosen related category, or frequently chosen incorrect category.]
Overall navigation task performance (n=54)
- Correct: 41%
- Alternative: 46%
- Incorrect: 9%
- Gave up: 4%

- 87% navigated as predicted or used a reasonable alternative.
- In only 4% of the trials did the subject give up.
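The 87% figure combines two outcomes, predicted ("Correct") and reasonable "Alternative" paths. A minimal sketch of that arithmetic, computing the combined rate from raw counts rather than by adding rounded percentages, so rounding cannot drift. The trial counts are back-calculated from the reported percentages and n=54, so they are an inference, not data from the source.

```python
from collections import Counter


def outcome_summary(outcomes: list[str]) -> tuple[dict[str, int], int]:
    """Whole-number percentage per outcome, plus the combined success rate.

    Success counts both predicted ("Correct") and reasonable "Alternative" paths.
    """
    counts = Counter(outcomes)
    n = len(outcomes)
    shares = {outcome: round(100 * c / n) for outcome, c in counts.items()}
    success = round(100 * (counts["Correct"] + counts["Alternative"]) / n)
    return shares, success


# Counts consistent with the reported percentages for n=54 (inferred).
trials = ["Correct"] * 22 + ["Alternative"] * 25 + ["Incorrect"] * 5 + ["Gave up"] * 2
shares, success = outcome_summary(trials)
print(shares)   # {'Correct': 41, 'Alternative': 46, 'Incorrect': 9, 'Gave up': 4}
print(success)  # 87
```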
Overall user rating of navigation task (n=9)
- Easy: 33%
- Medium: 67%
No one rated the overall task Difficult!
Taxonomy Strategies LLC
Questions
ASIST Annual Meeting - Full-Day Workshop
November 7, Vancouver, BC
http://www.asist.org/Conferences/AM09/taxonomy.html
Gilbane Conference - Half-Day Workshop
December 1, Boston, MA
http://gilbaneboston.com/workshops.html#workshop
Joseph A. Busch
[email protected]
http://www.taxonomystrategies.com