Content and Taxonomy Quality for the Long-Haul
Taxonomy Strategies LLC
2006 Enterprise Search Summit
Taxonomy Fundamentals:
What you need to know about taxonomies (but
were afraid to ask)
May 22, 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Pop Quiz
On a blank piece of paper:
What questions did you want to have answered by
coming to today’s talks?
What new questions do you have, based on what
you’ve learned from the previous presentations?
Flag one question to be answered later.
You do NOT have to provide your name.
Please DO provide your job title, division, and either
company or company type.
Taxonomy Strategies LLC
The business of organized information
What this session will cover
What's involved in creating a taxonomy.
The bottom line benefits of an enterprise taxonomy.
How to calculate the ROI on taxonomy development.
How to convince managers and staff to take taxonomy
seriously, in the face of Google.
How to best implement, support, and maintain a taxonomy from
beginning to end.
How can taxonomies improve my search system? What are the
fundamental principles that dictate when to use metadata and
taxonomy to improve the overall search experience?
Taxonomy issues, problems, and concerns
Enormous volumes of information within
organizations
Diversity of assets
Content and technology
Complex and IT-oriented standards
.NET, SOAP, WSDL, etc.
Limited (if any) integration with applications:
Search engines
Information management applications
Back office transaction-based systems
Analytical systems
What's involved in creating a taxonomy?
A taxonomy includes:
Metadata scheme: the data fields used to describe content so that it can be found and used
Vocabularies: collections of terms used to fill in some of the metadata fields
Relationships between content, fields or terms
(hierarchical, equivalence, and associative)
What’s a taxonomy?
A taxonomy is not just a folder structure.
A folder structure is a view of a content collection that
can be constructed by using the taxonomy
A taxonomy is not just website navigation
Site navigation is a view of a collection of content that
can be constructed using the taxonomy
How do taxonomies actually improve
search?
Input (Query) Side
“Search” using a small set of pre-defined values instead
of trying to guess what word or words might have been
used in the content.
Have synonyms mapped together so searches for “car”
and “automobile” return the same things.
Output (Results) Side
Organize search results into groups of related items.
Sorting and filtering
Refining search results
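Both sides can be illustrated with a minimal sketch. It assumes a hypothetical in-memory synonym ring and result records already tagged with a `topic` facet; the names `SYNONYMS`, `normalize_query`, and `group_by_facet` and the sample data are illustrative only, not taken from any search product.

```python
from collections import defaultdict

# Hypothetical synonym ring: each variant maps to one preferred term.
SYNONYMS = {"car": "automobile", "auto": "automobile", "automobile": "automobile"}


def normalize_query(query: str) -> list[str]:
    """Input side: replace each query word with its preferred term, if one exists."""
    return [SYNONYMS.get(word, word) for word in query.lower().split()]


def group_by_facet(results: list[dict], facet: str) -> dict[str, list[str]]:
    """Output side: group result titles by their value for one taxonomy facet."""
    groups = defaultdict(list)
    for doc in results:
        groups[doc.get(facet, "Unclassified")].append(doc["title"])
    return dict(groups)


results = [
    {"title": "Recall notice", "topic": "Safety"},
    {"title": "Crash test report", "topic": "Safety"},
    {"title": "Fuel economy guide", "topic": "Energy"},
]
print(normalize_query("Car safety"))  # ['automobile', 'safety']
print(group_by_facet(results, "topic"))
# {'Safety': ['Recall notice', 'Crash test report'], 'Energy': ['Fuel economy guide']}
```

Because "car" and "automobile" both normalize to the same preferred term, the two searches hit the same index entries.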
Fundamentals of taxonomy ROI
Tagging content using a taxonomy is a cost, not a
benefit.
There is no benefit without exposing the tagged
content to users in some way that cuts costs or
improves revenues.
Putting taxonomy into operation requires UI changes
and/or backend system changes, as well as data
changes.
You need to determine those changes, and their
costs, as part of the ROI.
Usability research—
Taxonomy compared to search results lists
“We found that users preferred a browsing oriented
interface for a browsing task, and a direct search
interface when they knew precisely what they
wanted.”
Marti Hearst (and others)
“The category interface is superior to the list interface
in both subjective and objective measures.”
Hao Chen & Susan Dumais
Taxonomy compared to search result lists
[Chart: median search time in seconds (0 to 140), Category interface vs. List interface. When the target was in the top 20 results, Category is 36% faster; when it was not in the top 20 results, Category is 48% faster. Source: Chen & Dumais]
Time saved—
Taxonomy compared to search result lists
1 hour per day searching x 36% faster = 22 minutes
each day
22 minutes x 250 working days per year = 5500
minutes or 92 hours per year
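The arithmetic above can be reproduced in a few lines. The inputs are the slide's own assumptions (1 hour of searching per day, the 36% figure from Chen & Dumais, 250 working days); the constant names are illustrative.

```python
HOURS_SEARCHING_PER_DAY = 1
SPEEDUP = 0.36       # "Category is 36% faster" (Chen & Dumais)
WORKING_DAYS = 250

minutes_per_day = round(HOURS_SEARCHING_PER_DAY * 60 * SPEEDUP)  # 21.6 rounds to 22
minutes_per_year = minutes_per_day * WORKING_DAYS                  # 5,500
hours_per_year = round(minutes_per_year / 60)                      # 91.7 rounds to 92
print(minutes_per_day, minutes_per_year, hours_per_year)  # 22 5500 92
```

Rounding the daily figure to whole minutes first, as the slide does, gives 92 hours; carrying the unrounded 21.6 minutes per day through gives 90 hours per year.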
Time saved—
Taxonomy compared to search result lists
Benefit: Service efficiency increase
Number of FOIA requests & information calls per month: 50,000
Average cost per call: $6
Total FOIA & call costs per year: $3,600,000
Increase in productivity by browsing information: 36%
Service cost savings per year: $1,296,000
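The service-efficiency figures follow from two multiplications: monthly calls × cost per call × 12 months, then × the 36% productivity increase. A minimal sketch; the function name is illustrative.

```python
def service_savings(calls_per_month: int, cost_per_call: float,
                    productivity_gain: float) -> tuple[float, float]:
    """Return (total annual call costs, annual savings) for the service scenario."""
    annual_cost = calls_per_month * cost_per_call * 12
    return annual_cost, annual_cost * productivity_gain


annual_cost, savings = service_savings(50_000, 6, 0.36)
print(f"${annual_cost:,.0f}  ${savings:,.0f}")  # $3,600,000  $1,296,000
```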
Trusted advisers—
Taxonomy avoids costs
“The amount of time wasted in futile searching for
vital information is enormous, leading to staggering
costs …”
Sue Feldman
Poor classification costs a 10,000-user organization
$10M each year, about $1,000 per employee.
Jakob Nielsen, useit.com
Knowledge workers spend up to 2.5 hours each day looking for information …
[Chart: share of the workday spent communicating, searching, and creating]
… but find what they are looking for only 40% of the time.
Source: Kit Sims Taylor
Knowledge workers spend more time re-creating
existing content than creating new content
[Chart: share of the workday spent communicating, searching, re-creating existing content (25%), and creating new content (8%)]
Source: Kit Sims Taylor (cited by Sue Feldman in her original article)
Cost saved by not recreating content
Benefit: Increase in productivity
Number of employees: 100
Average employee salary: $50,000
Employee costs per year: $5,000,000
Increase in productivity from not recreating content: 25%
Employee cost savings per year: $1,250,000
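A short sketch of the same calculation, written as a function of the productivity assumption so the sensitivity of the result is visible. The 25% value is the slide's; the 10% value is a purely hypothetical alternative shown only for comparison.

```python
EMPLOYEES = 100
AVERAGE_SALARY = 50_000
employee_costs = EMPLOYEES * AVERAGE_SALARY  # $5,000,000 per year


def recreation_savings(productivity_gain: float) -> float:
    """Annual savings when productivity rises by `productivity_gain` of employee costs."""
    return employee_costs * productivity_gain


for gain in (0.10, 0.25):  # 0.10 is hypothetical; 0.25 is the slide's figure
    print(f"{gain:.0%}: ${recreation_savings(gain):,.0f}")
# 10%: $500,000
# 25%: $1,250,000
```

Because the savings scale linearly with the assumed gain, that percentage is the number most worth backing with research.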
Key Factors in ROI
Breadth
“How many people will metadata affect?”
Repeatability
“How many times a day will they use it?”
Cost/Benefit
“Is this a costly effort with little or no benefits?”
Source: Todd Stephens, Dublin Core Global Corporate Circle
Some common taxonomy ROI scenarios
Customer support
Cutting FOIA & information costs
Increased web statistics (page hits)
Higher ACSI (American Customer Satisfaction Index) score
Knowledge worker productivity
Less time searching, more time working
Avoiding re-creating information that already exists
Publication catalog
Increased self-service & use
Increased productivity
Compliance
Improved regulatory compliance
Improved enforcement
Research & regulatory accountability
Higher OMB PARS (Performance & Accountability Reports)
How to estimate costs—
Tagging
Consider the complexity of each facet and the ambiguity of the content to estimate the time per value.
Taxonomy Facet        Hier?  Typical CV Size  Time/Value (min)  Avg # Values/Item  $/Min  Cost/Element
Audience              N      10               0.25              2                  $0.42  $0.21
Content Type          N      20               0.25              1                  $0.42  $0.11
Organizational Unit   Y      50               0.5               2                  $0.42  $0.42
Products & Services   Y      500              1.5               4                  $0.42  $2.52
Geographic Region     Y      100              0.5               2                  $0.42  $0.42
Broad Topics          Y      400              2                 4                  $0.42  $3.36
TOTALS                       1,080            5                 15                        $7.04
Cost/Element = Time/Value × Avg # Values/Item × $/Min
Is this field worth the cost?
Estimated cost of tagging one item. This can be reduced with automation, but cannot be eliminated.
Inspired by: Ray Luoma, BAU Solutions
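The table can be recomputed with a small sketch. It uses `Decimal` with half-up rounding so the displayed cents match the spreadsheet (plain floats would show $0.10 for Content Type, because 0.105 is not exactly representable in binary). The `Facet` class and function names are illustrative.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


@dataclass
class Facet:
    name: str
    minutes_per_value: Decimal  # Time/Value (min)
    values_per_item: int        # Avg # Values/Item

    def cost(self, dollars_per_minute: Decimal) -> Decimal:
        """Cost/Element: time per value x values per item x labor rate."""
        return self.minutes_per_value * self.values_per_item * dollars_per_minute


FACETS = [
    Facet("Audience", Decimal("0.25"), 2),
    Facet("Content Type", Decimal("0.25"), 1),
    Facet("Organizational Unit", Decimal("0.5"), 2),
    Facet("Products & Services", Decimal("1.5"), 4),
    Facet("Geographic Region", Decimal("0.5"), 2),
    Facet("Broad Topics", Decimal("2"), 4),
]
RATE = Decimal("0.42")  # $/Min from the table, roughly $25 per hour


def cost_per_item(facets: list[Facet], rate: Decimal) -> Decimal:
    """Unrounded cost of tagging one item with every facet."""
    return sum((f.cost(rate) for f in facets), Decimal("0"))


for f in FACETS:
    print(f"{f.name:<20} ${f.cost(RATE).quantize(CENT, rounding=ROUND_HALF_UP)}")
total = cost_per_item(FACETS, RATE)
print(f"{'TOTALS':<20} ${total.quantize(CENT, rounding=ROUND_HALF_UP)}")  # $7.04
```

The unrounded total is $7.035 per item. Multiplied by 100,000 items it gives $703,500, which matches the Year 1 Legacy/Ongoing Tagging figure in the sample ROI table; that item count is inferred from the match, not stated in the deck.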
Sample ROI Calculations
Description                     Year 1        Year 2     Year 3      Year 4      Year 5
Costs
Software Licenses/Maintenance   $100,000      $15,000    $15,000     $15,000     $15,000
Implementation/Support          $200,000      $30,000    $30,000     $30,000     $30,000
Taxonomy Creation/Maintenance   $100,000      $15,000    $15,000     $15,000     $15,000
Legacy/Ongoing Tagging*         $703,500      $105,525   $105,525    $105,525    $105,525
Benefits
Productivity increases          -             $125,000   $1,250,000  $1,250,000  $1,250,000
Service efficiency gains        -             $129,600   $1,296,000  $1,296,000  $1,296,000
Yearly Net Benefits             $(1,103,500)  $89,075    $2,380,475  $2,380,475  $2,380,475
* Ongoing cost of tagging due to 15% content growth.
Payback period: 1.4 years after Year 1 until cumulative benefits equal cumulative costs (about 2.4 years from project start).
Inspired by: Todd Stephens, Dublin Core Global Corporate Circle
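The sample ROI table can be checked with the sketch below. The payback rule (linear interpolation within the break-even year) is one common convention, assumed here. With it, cumulative net benefit turns positive about 2.4 years from project start, which is about 1.4 years after Year 1; that appears to be how the 1.4-year figure is measured.

```python
from typing import Optional

GROWTH_TAGGING = 703_500 * 15 // 100  # 15% content growth -> $105,525 per year

COSTS = {
    "Software Licenses/Maintenance": [100_000] + [15_000] * 4,
    "Implementation/Support": [200_000] + [30_000] * 4,
    "Taxonomy Creation/Maintenance": [100_000] + [15_000] * 4,
    "Legacy/Ongoing Tagging": [703_500] + [GROWTH_TAGGING] * 4,
}
BENEFITS = {
    "Productivity increases": [0, 125_000] + [1_250_000] * 3,
    "Service efficiency gains": [0, 129_600] + [1_296_000] * 3,
}


def yearly_net(costs: dict[str, list[int]], benefits: dict[str, list[int]]) -> list[int]:
    """Yearly Net Benefits: total benefits minus total costs, year by year."""
    years = len(next(iter(costs.values())))
    return [
        sum(row[y] for row in benefits.values()) - sum(row[y] for row in costs.values())
        for y in range(years)
    ]


def payback_years(net: list[int]) -> Optional[float]:
    """Years from project start until cumulative net benefit reaches zero,
    interpolating linearly within the break-even year; None if it never does."""
    cumulative = 0
    for year, value in enumerate(net):
        if cumulative + value >= 0:
            return year + (-cumulative / value if value else 0.0)
        cumulative += value
    return None


net = yearly_net(COSTS, BENEFITS)
print(net)  # [-1103500, 89075, 2380475, 2380475, 2380475]
start = payback_years(net)
print(round(start, 1), round(start - 1, 1))  # 2.4 1.4
```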
ROI summary
Taxonomy Value Propositions
Find information faster
Avoid recreating information that already exists
Improve service
Improve regulatory compliance
Improve performance & accountability
Don’t sell “taxonomy”; sell the vision of what you want to be
able to do.
Do the calculus (costs and benefits)
Quantify the tangible & intangible benefits
Quantify the total cost of ownership including maintenance &
tagging
Support your calculations with research
Three problems of taxonomy governance
The Taxonomy Problem:
How to build and maintain the lists of pre-defined values that go
into some of the metadata elements.
The Tagging Problem:
How to populate metadata elements with complete and consistent
values.
What can be expected from automatic classifiers? What kind of
error detection and error correction procedures are needed?
The ROI (Return On Investment) Problem:
How to use content, metadata, and vocabularies in applications to
obtain business benefits.
Business Goals and Cultural Factors are major influences on
tagging and taxonomy. These must be acknowledged at the
start to avoid re-work.
Who should build the taxonomy?
The taxonomy (and metadata specification) should be
produced by a cross-functional team which includes
business, technical, information management, and
content creation stakeholders.
The team should plan on maintaining the taxonomy
as well as building it.
Maintenance will not (usually) be anyone’s full-time job.
Exact mix of people on team will change.
It should be built in an iterative fashion, with more
content and broader review for each iteration.
Controlled items the Taxonomy team will need to manage
Metadata Standard
Controlled Vocabularies
Editorial Rules
Tagger Training Materials
(manual and automatic)
Charter, Goals,
Performance Measures
Team Processes
Outreach & ROI
Website
Communication plan
Presentations
Announcements
Taxonomy Roadmap
Long-range plan for:
Development of controlled vocabularies
Integration with enterprise applications
Controlled item: Editorial rules
Akin to “Chicago Manual of Style”
Issues commonly addressed in the rules:
Abbreviations
Ampersands
Capitalization
Continuations (More… or Other…)
Duplicate Terms
Fidelity to External Source
Hierarchy and Polyhierarchy
Languages and Character Sets
Length Limits
“Other” – Allowed or Forbidden?
Plural vs. Singular Forms
Relation Types and Limits
Scope Notes
Serial Comma
Sources of Terms
Spaces
Spelling (British vs. American English)
Synonyms and Acronyms
Translations
Term Order (Alphabetic or …)
Term Label Order (Direct vs. Inverted)
What to do when rules conflict – how do people decide
which rule is more important?
Rule Name: Editorial Rule
Sources of Terms: Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands: The character ‘&’ is preferred to the word ‘and’ in Term Labels. Example: For Type, use “Manuals & Forms”, not “Manuals and Forms”.
Special Characters: Retain accented characters in Term Labels. Example: Use “España”, not “Espana”.
Serial Comma: If a category name includes more than two items, separate the items with commas. The last item is separated by the character ‘&’, which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization: Use title case (all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”.
…
…
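Some of these rules are mechanical enough to check automatically before a new label is accepted. A minimal sketch covering only the Ampersands, Serial Comma, and Capitalization rules; the function name and exact checks are assumptions, and rules such as Sources of Terms or Scope Notes still need human judgment.

```python
import re

ARTICLES = {"a", "an", "the"}


def check_label(label: str) -> list[str]:
    """Check one Term Label against a few of the editorial rules (illustrative subset)."""
    problems = []
    # Ampersands: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands: use '&', not 'and'")
    # Serial comma: the final '&' is NOT preceded by a comma.
    if re.search(r",\s*&", label):
        problems.append("Serial comma: remove the comma before '&'")
    # Capitalization: title case, so the label is not all caps, the first word
    # is capitalized, and every other non-article word is capitalized.
    words = re.findall(r"[^\W\d_]+", label)  # runs of letters, accents included
    if len(words) > 1 and label.isupper():
        problems.append("Capitalization: use title case, not all caps")
    for i, word in enumerate(words):
        if (i == 0 or word.lower() not in ARTICLES) and not word[0].isupper():
            problems.append(f"Capitalization: capitalize '{word}'")
    return problems


print(check_label("Education, Learning & Employment"))  # []
print(check_label("Education, Learning, & Employment"))
print(check_label("education, learning & employment"))
```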
Controlled item: Training materials
Staff will require training on:
UI they use to tag the content
Rules to follow when deciding what codes to apply
End-effect of the codes they apply
Structure of the taxonomy
Indexing rules
Rule: Description
Specificity rule: Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule: All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule: Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule: Anticipate how the asset will be searched for in the future, and how to make it easy to find. Remember that search engines can only operate on explicit information.
[Screenshot: Indexing UI]
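The Specificity rule follows from how a hierarchical facet is used at search time. In the sketch below, an item tagged with a specific term is still found by searches on any broader term, but an item tagged only with a broad term can never be matched to a narrower one. The place names are a hypothetical slice of a Geographic Region facet, and the sketch assumes a strict single-parent hierarchy (no polyhierarchy).

```python
# Hypothetical slice of the hierarchical Geographic Region facet (child -> parent).
PARENT = {
    "San Francisco": "California",
    "Los Angeles": "California",
    "California": "United States",
}


def broader_terms(term: str) -> list[str]:
    """Return the term followed by each broader term, nearest first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def matches(tag: str, query_term: str) -> bool:
    """True if a search on query_term should find an item tagged with tag."""
    return query_term in broader_terms(tag)


print(broader_terms("San Francisco"))  # ['San Francisco', 'California', 'United States']
print(matches("San Francisco", "California"))  # True: specific tag, broad search
print(matches("California", "San Francisco"))  # False: generic tag cannot be specialized
```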
Controlled item: Communications Plan
Stakeholders: Who are they and
what do they need to know?
Channels: Methods available to
send messages to stakeholders.
Need a mix of narrow vs. broad,
formal vs. informal, interactive vs.
archival, …
Messages: Communications to be
sent at various stages of project.
Bulk of the plan is here
Stakeholders       Info. Needed
Project Sponsors   Progress, Issues, Policies
Dept. Reps         Progress, Priorities, …
…                  …
Users              Progress, How-Tos
Vendors            RFPs & SOWs
Channel        Description
Demo           Live, or screen capture for download
Presentation   Tailored message for specific audience
Website        Overview info for all, link to files
Memo           Formal notification
…              …
Trigger      Msg. Descrip.      From         To    Chan.
Initiation   Project overview   Dept. head   All   Memo
…            …                  …            …     …
Controlled item: Team charter
Taxonomy Team is responsible for maintaining:
The Taxonomy, a multi-faceted classification scheme
Associated materials, including a website providing:
Corporate Metadata Standard
Editorial Style Guide
Taxonomy Training Materials
Team rules and procedures (subject to CIO review)
Team evaluates costs and benefits of suggested changes.
Taxonomy Team will:
Manage relationship between providers of source vocabularies and
consumers of the Taxonomy
Identify new opportunities for use of the Taxonomy across the
Enterprise to improve information management practices
Promote awareness and use of the Taxonomy
Remaining controlled items
Performance Measures to go along with Charter?
Team Processes (see later in this presentation)
Automatic Classifier Training Materials
Tagging Cost and ROI Spreadsheets
Website
Presentations and Announcements
Change Request List (see later in this presentation)
Taxonomy Roadmap
Taxonomy governance environment
[Diagram: Taxonomy Governance Environment. A Vocabulary Management System holds the Taxonomy’s controlled vocabularies (CVs), drawn from external sources (ISO 3166-1, other external vocabularies) and internal sources (custodians, other internal vocabularies), and publishes facets to consuming applications (Archives, Intranet Search, ERMS, ERP, Web CMS, Intranet Nav., DAM, …). The diagram also shows change requests & responses, notifications, and other controlled items.]
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within the Taxonomy.
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets are published to consuming applications.
CV (Controlled Vocabulary): the list of values for one facet in the Taxonomy.
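Steps 1, 2, and 4 amount to decoupling a source vocabulary’s schedule from the Taxonomy’s release schedule. The sketch below models that with a hypothetical `VersionedFacet` class: updates are staged as they arrive, the team publishes a numbered version when it chooses, and a diff between versions can feed change notifications. Step 3 (mappings, translations, synonyms) is not modeled, and the country values are ISO 3166-1 alpha-2 codes used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedFacet:
    """One controlled vocabulary (CV) of the Taxonomy, released in numbered versions."""
    name: str
    published: dict[int, frozenset[str]] = field(default_factory=dict)
    staged: frozenset[str] = frozenset()

    def stage(self, values: set[str]) -> None:
        """Step 1: record an update from the source vocabulary without releasing it."""
        self.staged = frozenset(values)

    def publish(self) -> int:
        """Steps 2 and 4: the team releases the staged values as a new version."""
        version = max(self.published, default=0) + 1
        self.published[version] = self.staged
        return version

    def diff(self, old: int, new: int) -> tuple[set[str], set[str]]:
        """(added, removed) between two versions, e.g. for a change notification."""
        before, after = self.published[old], self.published[new]
        return set(after - before), set(before - after)


country = VersionedFacet("Country")
country.stage({"DE", "FR"})
v1 = country.publish()
country.stage({"DE", "FR", "ES"})  # the external list changes on its own schedule
v2 = country.publish()             # the team decides when consumers see it
print(v1, v2, country.diff(v1, v2))  # 1 2 ({'ES'}, set())
```

Consuming applications can keep reading an older version until they are ready to move, which is what keeps the Taxonomy evolving "in a predictable way".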
Taxonomy governance can be viewed as a
standards process
Closely linked to organizational metadata standard
Taxonomy must evolve, but in predictable way
Team structure, with an appeals process
Taxonomy stewardship is part-time role at most organizations
Team needs to make decisions based on costs and benefits
Documentation and educational material on Taxonomy
and Metadata
Announcements
Comment-handling responsibilities (part of the error-correction process)
Issue Logs
Release Schedule
Where taxonomy changes come from
[Diagram: End users work through the application UI and tagging staff through the tagging UI; both sit on the content, application logic, tagging logic, and the Taxonomy (a firewall is also shown). Change ideas reach the Taxonomy Editor and the Taxonomy Team from tagging staff noting ‘missing’ concepts, from query log analysis, and from requests from other parts of the organization.]
Recommendations by Editor:
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content
Team considerations:
1. Business goals
2. Changes in user experience
3. Retagging cost
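Query log analysis is one of the sources above that lends itself to automation. A minimal sketch, assuming the editor has a list of logged query strings and a lowercased set of preferred terms plus synonyms: it surfaces frequent queries containing a word the taxonomy does not cover, as candidate ‘missing’ concepts. Exact word matching is a simplifying assumption; stemming and stop words are not handled.

```python
from collections import Counter


def missing_concepts(queries: list[str], known_terms: set[str],
                     min_count: int = 2) -> list[tuple[str, int]]:
    """Frequent queries containing a word that is neither a taxonomy term nor a synonym."""
    counts = Counter(" ".join(q.lower().split()) for q in queries)
    return [
        (query, n)
        for query, n in counts.most_common()
        if n >= min_count and not all(word in known_terms for word in query.split())
    ]


log = ["Car recall", "automobile", "car  recall", "EV charging", "ev charging", "automobile"]
terms = {"automobile", "car", "recall"}  # preferred terms plus synonyms, lowercased
print(missing_concepts(log, terms))  # [('ev charging', 2)]
```

The output is a list of candidates for the editor to review, not changes to apply automatically.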
Taxonomy maintenance processes
Different organizations will need to consider their own change
processes.
Organization 1: A custodian is responsible for the content, but
checks facts with department heads before making changes.
Organization 2: Analysts suggest changes, editors approve,
copyeditors verify consistency.
Organization 3: Marketing reps ask for a change, taxonomy editor
makes demo, web representative approves it.
Change process MUST also consider cost of implementing the
change
Retagging data
Reconfiguring auto-classifier
Retraining staff
Changes in user expectations
Other change processes
Change Request Process
Anyone can ask a team member for a change. Team members are responsible for figuring out the details and bringing the request to the team for a decision.
Pending changes list for low-priority/high-cost items.
Change Process
Includes preview of change on site and data mockup
Processes may be diagrammed or written
Change Request Process should call out decision criteria, e.g.:
Cost of retagging
Benefit of change
Conflict with editorial rules
Fast-Track Change Process
Provide an ‘emergency’ change process, because it will be needed.
How can emergency changes be requested? Who makes the change and who approves it?
Who are the backups for these people when they are out?
What is the escalation path for denied requests?
Example: Anyone can ask the editor, who gets team leader or deputy approval.
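Taxonomy maintenance workflow
[Diagram: swimlanes for Analyst, Editor, Copywriter, and Sys Admin. The Analyst suggests a new name/category; the Editor reviews the new name; the Copywriter copy edits the new name; the Sys Admin adds it to the enterprise Taxonomy using the Taxonomy Tool. A “Problem?” decision after review and after copy editing sends the name back for rework (Yes) or on to the next step (No).]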
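The workflow can be sketched as a pipeline of checks. Each check returns `None` when the name is acceptable or a short reason otherwise. Sending a rejected name back to the Analyst is an assumption, since the diagram’s loop target is not fully recoverable. The two sample checks (no duplicate terms, ‘&’ rather than ‘and’) are hypothetical stand-ins for real editorial review.

```python
from typing import Callable, Optional

# A check returns None when the name is acceptable, or a short reason for rework.
Check = Callable[[str], Optional[str]]


def run_workflow(name: str, editor_review: Check, copy_edit: Check,
                 taxonomy: set[str]) -> tuple[bool, str]:
    """Move a suggested name from the Analyst through Editor and Copywriter checks;
    the Sys Admin step adds it to the taxonomy only if no problem was found."""
    for role, check in (("Editor", editor_review), ("Copywriter", copy_edit)):
        problem = check(name)
        if problem:
            return False, f"{role}: {problem} (back to Analyst)"
    taxonomy.add(name)  # Sys Admin: add to the enterprise Taxonomy
    return True, "added"


taxonomy = {"Manuals & Forms"}


def not_duplicate(name: str) -> Optional[str]:
    return "duplicate term" if name in taxonomy else None


def uses_ampersand(name: str) -> Optional[str]:
    return "use '&', not 'and'" if " and " in name else None


print(run_workflow("Policies & Guidance", not_duplicate, uses_ampersand, taxonomy))
# (True, 'added')
print(run_workflow("Manuals & Forms", not_duplicate, uses_ampersand, taxonomy))
# (False, 'Editor: duplicate term (back to Analyst)')
```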
Basic Change Request form and process
Need a way to collect and evaluate change requests.
Need a way to track deferred change requests.
Legend: O – Originator, E – Editor, C – Committee
[Flowchart: O submits a change request. E decides whether it is simple; if yes, E makes the change as requested. If not, E researches and completes the Change Request form, and C decides whether to make the change. If C approves, C decides whether the change is immediate; if not, C assigns it a priority. E informs the originator, and the request is done.]
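The routing in this flowchart can be sketched as a single function. In the sketch, the Editor’s “simple?” judgment and the Committee’s decisions are taken as inputs; in practice they would rest on criteria such as cost of retagging, benefit of change, and conflict with editorial rules. The `deferred` list plays the role of the tracked deferred requests. Class, field, and originator names are hypothetical, and treating a lower priority number as more urgent is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    originator: str                 # O
    description: str
    simple: bool                    # E: can the editor just make the change?
    approved: bool = False          # C: committee decision after research
    immediate: bool = False         # C: make the change now?
    priority: Optional[int] = None  # C: lower number = more urgent (assumed)


def process(req: ChangeRequest, deferred: list[ChangeRequest]) -> str:
    """Route one request and return the note sent back to the originator."""
    if req.simple:
        outcome = "changed as requested"
    elif not req.approved:
        outcome = "declined"
    elif req.immediate:
        outcome = "approved and changed"
    else:
        deferred.append(req)  # tracked until its turn comes
        deferred.sort(key=lambda r: r.priority if r.priority is not None else float("inf"))
        outcome = f"approved, deferred at priority {req.priority}"
    return f"Inform {req.originator}: {outcome}"


deferred: list[ChangeRequest] = []
print(process(ChangeRequest("Ana", "Add synonym 'auto'", simple=True), deferred))
# Inform Ana: changed as requested
print(process(ChangeRequest("Raj", "Split Broad Topics", simple=False,
                            approved=True, priority=2), deferred))
# Inform Raj: approved, deferred at priority 2
```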
Process Document
Team structure and roles
Taxonomy change triggers
Items to be controlled by the
Team
Prioritization criteria
Cost/Benefit considerations for different types of changes
Basic change process
Fast-track change process
Situation-specific considerations
Finding information should not be about
“Feeling Lucky”
How do taxonomies actually improve
search?
Input (Query) Side
“Search” using a small set of pre-defined values
instead of trying to guess what word or words might
have been used in the content.
Have synonyms mapped together so searches for “car”
and “automobile” return the same things.
Output (Results) Side
Organize search results into groups of related items.
Sorting and filtering
Refinement
Taxonomy in action on the results side
[Screenshot: search results with taxonomy facets for Position Category, Company, City, State, and Salary]
[Screenshot: about 3,890,000 results]
[Screenshot: 2,199 results]
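On the results side, facets work by counting results per facet value and filtering on the values a user selects. A minimal sketch using two of the facet names from the screenshot; the job postings and function names are hypothetical.

```python
from collections import Counter

# Hypothetical job postings tagged with two of the facets shown in the screenshot.
jobs = [
    {"title": "Taxonomist", "Position Category": "Information Management", "State": "CA"},
    {"title": "Metadata Librarian", "Position Category": "Information Management", "State": "NY"},
    {"title": "Search Engineer", "Position Category": "Engineering", "State": "CA"},
]


def facet_counts(results: list[dict], facet: str) -> Counter:
    """Count results per value of one facet, as listed beside the results."""
    return Counter(r[facet] for r in results if facet in r)


def refine(results: list[dict], selected: dict[str, str]) -> list[dict]:
    """Keep only results that match every selected facet value."""
    return [r for r in results if all(r.get(k) == v for k, v in selected.items())]


print(facet_counts(jobs, "State"))  # Counter({'CA': 2, 'NY': 1})
in_ca = refine(jobs, {"State": "CA"})
print([j["title"] for j in in_ca])  # ['Taxonomist', 'Search Engineer']
print(facet_counts(in_ca, "Position Category"))
```

Recomputing the counts on the refined set is what lets each selection narrow a large result list step by step.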
Taxonomy Strategies LLC
Questions?
Joseph A. Busch
+ 415-377-7912
[email protected]
http://www.taxonomystrategies.com
Resources mentioned
The American Customer Satisfaction Index: The voice of the Nation’s consumer. http://www.theacsi.org/overview.htm
S. Feldman. “The high cost of not finding information.” KM World 13:3 (March 2004). http://www.kmworld.com/publications/magazine/index.cfm?action=readarticle&Article_ID=1725&Publication_ID=108
M. Hearst, A. Elliott, J. English, R. Sinha, K. Swearingen & K. Yee. “Finding the Flow in Website Search.” Communications of the ACM 45 (Sept 2002). http://bailando.sims.berkeley.edu/papers/cacm02.pdf
Memorandum M-04-20: Performance and Accountability Reports and Reporting Requirements (July 22, 2004). http://www.whitehouse.gov/omb/memoranda/fy04/m04-20.pdf
K.S. Taylor. “The brief reign of the knowledge worker.” 1998. http://online.bcc.ctc.edu/econ/kst/BriefReign/BRwebversion.htm