Taxonomy Governance


Taxonomy Strategies LLC
Creating a Governance
Structure for the Ongoing
Maintenance of the Taxonomy
Ron Daniel, Jr.
Sept. 28, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Agenda
10:15  Introduction
10:30  Background
10:35  Maintainable Taxonomies
10:45  Maintainable Metadata
10:50  ROI Estimation
11:00  Governance Environment
11:10  Controlled Items
11:30  Team Structures
11:45  Change Process
12:00  Exercises
12:15  Adjourn
Taxonomy Strategies LLC
The business of organized information
Creating a Governance Structure for the
Ongoing Maintenance of the Taxonomy
 Taxonomies must change if they are to remain
relevant. But what will it cost to make those
changes to the taxonomy and to the data which is
categorized by it? Organizations must have
appropriate maintenance processes so that the
taxonomy changes are based on rational cost/benefit
decisions, without becoming mired in endless
paperwork. This interactive workshop will highlight
the framework for creating taxonomy governance
teams and what their specific responsibilities should
be. Special attention will be given to defining
maintainable taxonomies and metadata for achieving
business needs.
Metadata and Taxonomy
Metadata Field   Data Type   Example
Title            String      “The Perl Directory”
Creator          String      The Perl Foundation
Identifier       URL         http://www.perl.org/
Date             DateTime    Jan. 12, 2006
Subject          List        Computers : Programming : Languages : Perl
(Metadata: the fields. Taxonomy: the source of the Subject value.)
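A minimal sketch of the record on this slide, assuming Python dataclasses as the container. The class name `MetadataRecord` and the helper `subject_labels` are hypothetical; only the field names and example values come from the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class MetadataRecord:
    """One item's metadata; field names follow the slide."""
    title: str                 # String
    creator: str               # String
    identifier: str            # URL
    date: date                 # DateTime on the slide; a plain date is enough here
    subject: List[List[str]]   # List of taxonomy paths, broadest term first


record = MetadataRecord(
    title="The Perl Directory",
    creator="The Perl Foundation",
    identifier="http://www.perl.org/",
    date=date(2006, 1, 12),
    subject=[["Computers", "Programming", "Languages", "Perl"]],
)


def subject_labels(rec: MetadataRecord) -> List[str]:
    """Render each taxonomy path the way the slide prints it."""
    return [" : ".join(path) for path in rec.subject]
```

Only the Subject field draws its values from the taxonomy; the other fields are ordinary metadata.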
Ron Daniel, Jr.
 Over 15 years in the business of metadata & automatic
classification
 Principal, Taxonomy Strategies
 Standards Architect, Interwoven
 Senior Information Scientist, Metacode Technologies
 Technical Staff Member, Los Alamos National Laboratory
 Metadata and taxonomies community leadership
 Chair, PRISM (Publishers Requirements for Industry Standard
Metadata) working group
 Acting Chair, XML Linking working group
 Member, RDF working groups
 Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2
reports.
Taxonomy Strategies’ Clients
Government
 Commodity Futures Trading Commission
 Defense Intelligence Agency
 ERIC
 Dept. Homeland Security
 Federal Aviation Administration
 Federal Reserve Bank of Atlanta
 Forest Service
 GSA Office of Citizen Services (www.firstgov.gov)
 Head Start
 Infocomm Development Authority of Singapore
 NASA (nasataxonomy.jpl.nasa.gov)
 Small Business Administration
 Social Security Administration
 USDA Economic Research Service
 USDA e-Government Program (www.usda.gov)
Commercial
 Allstate Insurance
 Blue Shield of California
 Debevoise & Plimpton
 Halliburton
 Hewlett Packard
 Motorola
 PeopleSoft
 PricewaterhouseCoopers
 Siderean Software
 Sprint
 Time Inc.
Commercial subcontracts
 Agency.com – Top financial services
 Critical Mass – 2 Fortune 50 retailers, Fortune 50 computer manufacturer
 Deloitte Consulting – Big credit card
 Gistics/OTB – Direct selling giant
International orgs & Non-profits
 CEN
 IDEAlliance
 IMF
 OCLC
Who are you? What do you want out of
today?
 What type of organization do you work for?
 Government / NGO / SME / Global 2000? What industry?
 What is the size (# employees) of your organization?
 ≤10, 11..100, 101..1k, 1k..10k, 10..100k, > 100k
 What part of the organization do you work in?
 IT / Library & IM / Public Affairs / Product Management /
Engineering / HR & Finance / Other?
 What is your job role within the organization?
 Webmaster / Technical / Researcher / Editorial / Supervisory /
Executive?
 Where is your organization with its taxonomy?
 Thinking / Serious Planning / First Implementation / Major
Revisions / Old-timers just looking for additional clues?
Exercise 1: Questions
What questions do you hope this session will answer?
What taxonomy/metadata/search questions do you have that go beyond
what this session could answer?
Three Problems
Taxonomy development and maintenance is the LEAST of three
problems:
 The Taxonomy Problem: How are we going to build and maintain the lists
of pre-defined values that can go into some of the metadata elements?
 The Tagging Problem: How are we going to populate metadata elements
with complete and consistent values?
 What can we expect to get from automatic classifiers? What kind of error
detection and error correction procedures do we need? What fields do we need?
 The ROI (Return On Investment) Problem: How are we going to use
content, metadata, and vocabularies in applications to obtain business
benefits?
 More sales? Lower support costs? Greater productivity? Risk avoidance?
 How much content? How big an operating budget? How to expose to users?
Business Goals and Cultural Factors are major influences on
tagging and taxonomy. These must be acknowledged at the start
to avoid rework.
There’s more to maintaining the Taxonomy
than just maintaining the Taxonomy
What must change when the Taxonomy changes?
 The master copy of the taxonomy.
 The data tagged with the taxonomy?
 The user interface which uses the taxonomy?
 Backend system software which uses the taxonomy?
 The training set for automatic classifiers?
 The educational material for users, catalogers, programmers, etc.?
 The information sent to downstream users of the taxonomy?
 The versions of the taxonomy distributed to others.
 The list of changes.
 Announcements for stakeholders?
DMOZ: A worst case example of a unified
‘subject’
 DMOZ has over 600k categories
 Most are a combination of common facets – Geography,
Organization, Person, Document Type, …
 (e.g.) Top: Regional: Europe: Spain: Travel and Tourism: Travel Guides
 (BTW – DMOZ Governance model is out of whack)
[Figure: sample DMOZ category paths under Business, Regional, Reference, Science, and Society, broken down into the facets they combine. Facet tally: Competency (discipline) 11, Geography 9, Audience 9, Topic 7, Organization 5, Doc Type 4, Industry 4, Process 4.]
The power of taxonomy facets
 Categorize in multiple, independent, categories.
 Allow combinations of categories to narrow the choice of items.
 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
Main Ingredients (10): Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
Meal Type (6): Breakfast, Brunch, Lunch, Supper, Dinner, Snack
Cuisines (11): African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
Cooking Methods (15): Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry
 Easier to maintain
 Can be easier to navigate
42 values to maintain (10+6+11+15)
9900 combinations (10x6x11x15)
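The arithmetic on this slide can be checked in a few lines. This is only a sketch: the dictionary name `facet_sizes` is an assumption, and the sizes are the counts shown for the four recipe facets.

```python
from math import prod

# Number of values in each facet, as counted on the slide.
facet_sizes = {
    "Main Ingredients": 10,
    "Meal Type": 6,
    "Cuisines": 11,
    "Cooking Methods": 15,
}

# Maintenance effort grows with the SUM of the facet sizes...
values_to_maintain = sum(facet_sizes.values())

# ...while discriminatory power grows with their PRODUCT.
combinations = prod(facet_sizes.values())

# The "4 facets of 10 nodes" claim is the same calculation.
four_by_ten = prod([10] * 4)
```

Adding one value to a facet costs one more term to maintain, but multiplies the number of reachable combinations.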
How do I get a good Taxonomy? – Seven
practical rules
1) Incremental, extensible process that identifies and enables
users, and engages stakeholders.
2) Quick implementation that provides measurable results as
quickly as possible.
3) Not monolithic—has separately maintainable facets.
4) Re-uses existing IP as much as possible.
5) A means to an end, and not the end in itself.
6) Not perfect, but it does the job it is supposed to do—such as
improving search and navigation.
7) Improved over time, and maintained.
Controlled vocabulary development
procedure
 Develop broad taxonomy outline
(1-3 levels deep)
 Review, revise, and approve
taxonomy outline with
stakeholders and subject matter
experts.
 Draft editorial rules to follow
 Fill in taxonomy outline (no more
than 1500 terms), following and
revising editorial rules
 Tag random samples from
content inventory
 Review, revise, and approve
draft taxonomy with stakeholders
and subject matter experts.
Some vocabulary construction rules
 Don’t just have names, also have identifiers
 This will reduce retagging later when names change
 When tagging content, use the most specific code. Let software handle the
hierarchy.
 Bonus: Use URIs for node IDs & publish on the web
 Develop scope notes
 Not just a definition, also say what kind of content the node applies to
 Metadata specification must state the vocabulary for an element.
 Gather data from multiple sources
 Talk with users and experts
 Analyze query logs and content
 Choose and arrange terms
 Test and finalize first version
 Shift into maintenance mode
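One way to read "use the most specific code, let software handle the hierarchy" is sketched below. The node IDs (`t1` to `t4`), the `PARENT` map, and both function names are hypothetical; the example path is the Perl one used earlier in the deck.

```python
# Hypothetical node IDs, each mapped to its parent (None marks a top-level node).
PARENT = {
    "t1": None,   # Computers
    "t2": "t1",   # Programming
    "t3": "t2",   # Languages
    "t4": "t3",   # Perl
}


def with_ancestors(node_id):
    """Return node_id followed by its ancestors, most specific first."""
    chain = []
    current = node_id
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def matches(item_tags, query_node):
    """True if any tag on the item is query_node or one of its descendants."""
    return any(query_node in with_ancestors(tag) for tag in item_tags)
```

An item tagged only with `t4` (Perl) is still found by a query on `t2` (Programming), so taggers never need to apply the broader terms by hand.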
What do I do with all these facets?
 Either expose them directly in the user interface (post-coordination), or
 Combine them in a minimal hierarchy (pre-coordination)
 Post-coordination takes software support, which may be fancy or basic.
 How many facets?
 log10(#documents) as a guide
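The rule of thumb above, written as a function. The function name and the choice to round to the nearest whole facet are assumptions; the slide only offers log10 of the document count as a guide.

```python
import math


def suggested_facet_count(num_documents: int) -> int:
    """Rule-of-thumb facet count: log10 of the collection size, at least 1."""
    if num_documents < 1:
        raise ValueError("num_documents must be a positive integer")
    # Rounding to the nearest integer is an assumption, not part of the slide.
    return max(1, round(math.log10(num_documents)))
```

Under this reading, a collection of 100,000 documents suggests about 5 facets and a million documents about 6.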
What could possibly go wrong with a little
edit?
 ERP (Enterprise Resource Planning) team made a change to
the product line data element in the product hierarchy.
 They did not know this data was used by downstream
applications outside of ERP.
 An item data standards council discovered the error.
 If the error had not been identified and fixed, the company’s
sales force would not be correctly compensated.
“Lack of the enterprise data standards process in
the item subject area has cost us at least 30 person
days of just ‘category’ rework.”
Source: Danette McGilvray, Granite Falls Consulting, Inc.
Maintainable Metadata
 Design metadata specification for future changes
 Lessons from the Dublin Core
 Provide metadata tagging and storage that will deal
with changes
Dublin Core: A little more complicated over time
Elements: 1. Identifier, 2. Title, 3. Creator, 4. Contributor, 5. Publisher, 6. Subject, 7. Description, 8. Coverage, 9. Format, 10. Type, 11. Date, 12. Relation, 13. Source, 14. Rights, 15. Language
Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CDTF
Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
Design Metadata Specification for future
changes
 Degree of future changes will depend on organization size,
sophistication of use, number of repositories and amount of
content.
 Don’t over-engineer
 For all organizations: start with the Dublin Core with a few
additions and deletions for specific needs
 At large/sophisticated organizations:
 “Refinements” will be unavoidable in the future.
 Start with “DatePublished” so that later additions of “DateModified”, “DateApproved”, “DateVerified”, etc. fit in easily.
 Identify broad “integration metadata” vs. division-specific fields.
Coordinate with others to set up a working understanding of a
corporate multi-level metadata standard.
Large, Sophisticated, Long-Term Plan? Multi-level Metadata Specification
Not all key metadata is taxonomic. Need to define other basic fields for general use. (Title, Description, Date, Creator, ...)
As audiences shrink, specialist needs increase. Start with a few elements that apply to all, gradually add division-specific, then group-specific standards.
[Diagram layers: Taxonomies, Vocabularies, Ontologies; Dublin Core and Similar; Per-Source Data Types, Access Controls, etc.]
Source: Todd Stephens, BellSouth
Provide metatagging and storage that will
deal with changes
 Tag with identifiers, not names.
 This will reduce retagging later when names change
 Not good if people need to view raw tagging, but usually software
will be involved to show labels.
 When tagging content, use the most specific concept. Let
software handle the hierarchy.
 Metadata is easier to manage if it is stored in a central
repository, instead of spread out in the individual files.
 Exception – when sending files out to other systems (e.g. photo
metadata)
 Warning – ‘metadata repositories’ are usually a different class of
software than what we are discussing.
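A minimal sketch of why tagging with identifiers reduces retagging, assuming a central store kept as two in-memory dictionaries. The IDs (`org-017`, `doc-1`, `doc-2`), the labels, and the function names are all hypothetical.

```python
# Central repository: one table of term labels, one table of item tags.
labels = {"org-017": "Public Affairs"}
item_tags = {
    "doc-1": ["org-017"],
    "doc-2": ["org-017"],
}


def display_tags(item_id):
    """Software resolves stored identifiers to the current labels."""
    return [labels[term_id] for term_id in item_tags[item_id]]


def rename_term(term_id, new_label):
    """Renaming touches the label table only; no item is retagged."""
    labels[term_id] = new_label
```

After `rename_term("org-017", "Communications")`, every tagged item displays the new name while `item_tags` is left untouched.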
Fundamentals of taxonomy ROI
 Tagging content using a taxonomy is a cost, not a
benefit.
 There is no benefit without exposing the tagged
content to users in some way that cuts costs,
improves revenues, reduces risk, or achieves some
other clear business goal.
 Putting taxonomy into operation requires UI
changes and/or backend system changes, as well
as data changes.
 You need to determine those changes, and their
costs, as part of the ROI.
Key Factors in ROI
 Breadth
 “How many people will metadata affect?”
 Repeatability
 “How many times a day will they use it?”
 Cost/Benefit
 “Is this a costly effort with little or no benefits?”
How to estimate costs — Tagging
Consider complexity of facet and ambiguity of content to estimate time per value.
Taxonomy Facet        Hier?  Typical CV Size  Time/Value (min)  Avg # values/Item  $/Min   Cost/Element
Audience              N      10               0.25              2                  $0.42   $0.21
Content Type          N      20               0.25              1                  $0.42   $0.11
Organizational Unit   Y      50               0.5               2                  $0.42   $0.42
Products & Services   Y      500              1.5               4                  $0.42   $2.52
Geographic Region     Y      100              0.5               2                  $0.42   $0.42
Broad Topics          Y      400              2                 4                  $0.42   $3.36
TOTALS                       1080             5                 15                         $7.04
Is this field worth the cost?
Estimated cost of tagging one item. This can be reduced with automation, but cannot be eliminated.
Inspired by: Ray Luoma, BAU Solutions
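The Cost/Element column appears to be time per value × average values per item × cost per minute. The sketch below reproduces it under that reading; the names are hypothetical, and rounding half up to the cent is an assumption chosen because it matches the slide's $0.11 and $7.04.

```python
from decimal import Decimal, ROUND_HALF_UP

COST_PER_MINUTE = Decimal("0.42")

# facet -> (minutes per value, average number of values per item), from the slide
FACETS = {
    "Audience": (Decimal("0.25"), 2),
    "Content Type": (Decimal("0.25"), 1),
    "Organizational Unit": (Decimal("0.5"), 2),
    "Products & Services": (Decimal("1.5"), 4),
    "Geographic Region": (Decimal("0.5"), 2),
    "Broad Topics": (Decimal("2"), 4),
}


def to_cents(amount):
    """Round a dollar amount half up to two decimal places (assumed rounding)."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def cost_per_element(minutes_per_value, values_per_item):
    """Tagging cost of one metadata element on one item."""
    return minutes_per_value * values_per_item * COST_PER_MINUTE


per_facet = {name: cost_per_element(m, n) for name, (m, n) in FACETS.items()}
cost_per_item = sum(per_facet.values(), Decimal(0))
```

The unrounded total is $7.035 per item, which the slide shows as $7.04.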
How to estimate costs — Assumptions
Your numbers will vary.
ASSUMPTIONS
Enterprise SW License    $100,000
Maintenance/Support      15%
SW Implementation        200%
Legacy Content Items     100,000
Content Growth Rate      15%
Tagging/Item             $7.04
Enterprise Taxonomy      $100,000
How to estimate costs — Total cost of ownership (TCO)
Description            Year 1       Year 2     Year 3     Year 4     Year 5
SW
  Licenses             $100,000
  Maintenance                       $15,000    $15,000    $15,000    $15,000
  Implementation       $200,000
  App Tech Support                  $30,000    $30,000    $30,000    $30,000
Tagging
  Legacy Content       $704,000
  Ongoing                           $105,600   $121,440   $139,656   $160,604
Taxonomy
  Creation             $100,000
  Maintenance                       $15,000    $15,000    $15,000    $15,000
TOTAL                  $1,103,500   $165,600   $181,440   $199,656   $220,604
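A sketch of how the yearly totals can be derived from the assumptions slide. Two readings here are assumptions rather than statements from the deck: that App Tech Support and Taxonomy Maintenance are each 15% of their Year 1 amounts, and that ongoing tagging compounds at the 15% growth rate. All names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

LICENSE = Decimal(100_000)
MAINTENANCE_RATE = Decimal("0.15")
IMPLEMENTATION_RATE = Decimal("2.00")   # 200% of the license
LEGACY_ITEMS = Decimal(100_000)
GROWTH_RATE = Decimal("0.15")
TAGGING_PER_ITEM = Decimal("7.04")
TAXONOMY = Decimal(100_000)


def yearly_costs(years=5):
    """Return total cost per year, rounded to whole dollars."""
    implementation = LICENSE * IMPLEMENTATION_RATE
    totals = [LICENSE + implementation + LEGACY_ITEMS * TAGGING_PER_ITEM + TAXONOMY]
    # Assumed: each recurring line is 15% of its Year 1 counterpart.
    recurring = (LICENSE + implementation + TAXONOMY) * MAINTENANCE_RATE
    ongoing_tagging = LEGACY_ITEMS * GROWTH_RATE * TAGGING_PER_ITEM
    for _ in range(2, years + 1):
        totals.append(recurring + ongoing_tagging)
        ongoing_tagging *= 1 + GROWTH_RATE   # content keeps growing 15% a year
    return [t.quantize(Decimal("1"), rounding=ROUND_HALF_UP) for t in totals]
```

With $7.04 per item this gives $1,104,000 for Year 1. The slide's $1,103,500 total matches the unrounded $7.035 per item instead, so small differences of this kind are expected.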
Sample ROI Calculations
Description                       Year 1         Year 2     Year 3       Year 4       Year 5
Costs
  Software Licenses/Maintenance   $100,000       $15,000    $15,000      $15,000      $15,000
  Implementation/Support          $200,000       $30,000    $30,000      $30,000      $30,000
  Taxonomy Creation/Maintenance   $100,000       $15,000    $15,000      $15,000      $15,000
  Legacy/Ongoing Tagging          $703,500       $105,600   $121,440     $139,656     $160,604
Benefits
  Productivity increases          $-             $125,000   $1,250,000   $1,250,000   $1,250,000
  Service efficiency gains        $-             $129,600   $1,296,000   $1,296,000   $1,296,000
Yearly Net Benefits               $(1,103,500)   $89,000    $2,364,560   $2,346,344   $2,325,396
Payback period                    1.4 (Years until Benefits = Costs)
Note on tagging row: Ongoing cost of tagging due to 15% content growth.
Inspired by: Todd Stephens, Dublin Core Global Corporate Circle
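The net-benefit row can be recomputed from the cost and benefit rows. In this sketch the variable names are hypothetical, and the reading of the 1.4-year payback as time measured from the end of Year 1 is an inference that happens to reproduce the slide's figure, not something the slide states.

```python
from itertools import accumulate

# Column totals taken from the slide (Year 1 .. Year 5).
costs = [1_103_500, 165_600, 181_440, 199_656, 220_604]
benefits = [
    0,
    125_000 + 129_600,
    1_250_000 + 1_296_000,
    1_250_000 + 1_296_000,
    1_250_000 + 1_296_000,
]

net = [b - c for b, c in zip(benefits, costs)]
cumulative = list(accumulate(net))

# First year in which cumulative net benefit is no longer negative.
breakeven_year = next(i + 1 for i, total in enumerate(cumulative) if total >= 0)

# Assumed reading of "payback period": one full year after Year 1, plus the
# fraction of Year 3 needed to cover what is still outstanding after Year 2.
payback_after_year_one = 1 + (-cumulative[1]) / net[2]
```

Under these numbers the cumulative position turns positive during Year 3.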
Where do the benefits come from?
Common taxonomy ROI scenarios
 Catalog site - ROI based on increased sales through improved:
 Product findability
 Product cross-sells and up-sells
 Customer loyalty
 Call center - ROI based on cutting costs through:
 Fewer customer calls due to improved website self-service
 Faster, more accurate CSR responses through better information access
 Compliance – ROI based on:
 Avoiding penalties for breaching regulations
 Following required procedures (e.g. Medical claims)
 Knowledge worker productivity - ROI based on cutting costs through:
 Less time searching for things
 Less time recreating existing materials, with knock-on benefits of less confusion and
reduced storage and backup costs
 Executive mandate
 No ROI at the start, just someone with a vision and the budget to make it happen
Generic, yet Important, Advice
 It’s not about the tools. It’s not about the taxonomy.
It’s about the business goals and the processes
people use to meet those goals.
 Metrics are grossly underused in metadata and
search.
Taxonomy governance overview
 Taxonomy governance can be viewed as a standards process
 Closely linked to organizational metadata standard
 Taxonomy must evolve, but in predictable way
 Take tips from other standards efforts (these practices are in rough order of implementation)
 Team structure, with an appeals process
  • Taxonomy stewardship is part-time role at most organizations
  • Team needs to make decisions based on costs and benefits
 Documentation and educational material on Taxonomy and Metadata
 Announcements
 Comment-handling responsibilities (part of error-correction process)
 Issue Logs
 Release Schedule
Taxonomy governance environment
[Diagram: Taxonomy Governance Environment. Labels: Change Requests & Responses; ISO 3166-1; Other External; ERP; Other Internal; Custodians; Vocabulary Management System; CVs; Other Controlled Items; Published Facets; Notifications; Consuming Applications (Web CMS, Archives, Intranet Search, ERMS, Intranet Nav., DAM, …)]
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within Taxonomy
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets published to consuming applications
CV (Controlled Vocabulary) – The list of values for one facet in the Taxonomy.
Controlled Items
 Taxonomy Team will have several items to manage:
 Controlled Vocabularies
 Metadata Standard
 Editorial Rules
 Tagger Training Materials (manual and automatic)
 Charter, Goals, Performance Measures
 Team Processes
 Outreach & ROI
  • Website
  • Communication plan
  • Presentations
  • Announcements
 “Roadmap”
 Advanced practice, requires long planning horizon for organization's IT
projects
 Even small taxonomy teams should develop many of these
items, although not to the same level of formality.
Controlled Vocabularies are not just
tabbed lists
Source: NASA Taxonomy Competencies Facet
http://nasataxonomy.jpl.nasa.gov/nascomp/index_tt.htm
Controlled Item: Metadata Specification
Element Name                      XML Map            Repeatable  Source            Purpose
General Purpose Metadata
Unique ID                         dc:identifier      1           System supplied   System identifier to retrieve item.
Owner                             dc:creator         ?           System supplied   POC for content maintenance
Title                             dc:title           1           User supplied     Text search & results display
Date                              dc:date            1           System supplied   Publish, feature, & review content.
Subject Metadata (Purpose for all rows: Search for, browse, group & filter search results.)
Organization                      x:corp             *           Corp Classif CV
Asset                             x:asset            *           Asset CV
Region/Country                    dc:coverage        *           Country CV
Basin/Platform/Well               x:well             *           B/P/Well CV
Content Type                      dc:type            ?           Content Types CV
Company/Client/Operator/Partner   x:company          *           Company CV
Project                           x:project          *           Project CV
Use Metadata
Discipline                        dcTerms:audience   *           Discipline CV     Target, personalize content.
Retention                         x:retention        1           System supplied   Remove expired content
Legend: ? – 1 or more; * – 0 or more
Controlled Item: Editorial Rules
 Akin to “Chicago Manual of Style”
 Issues commonly addressed in the rules:
Abbreviations
Ampersands
Capitalization
Continuations (More… or Other…)
Duplicate Terms
Fidelity to External Source
Hierarchy and Polyhierarchy
Languages and Character Sets
Length Limits
“Other” – Allowed or Forbidden?
Plural vs. Singular Forms
Relation Types and Limits
Scope Notes
Serial Comma
Sources of Terms
Spaces
Synonyms and Acronyms
Translations
Term Order (Alphabetic or …)
Term Label Order (Direct vs. Inverted)

 What to do when rules conflict – how do people decide which rule is more important?
Rule Name: Editorial Rule
Use Existing Vocabularies: Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands: The character '&' is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
Special Characters: Retain accented characters in Term Labels. Example: Use “España”, not “Espana”.
Serial comma: If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’ which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization: Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”.
…: …
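Editorial rules like the Ampersands, Serial comma, and Capitalization examples can be checked mechanically before a term is accepted. The sketch below is one hedged way to do it; the function name `check_label`, the regular expressions, and the simplified title-case test (which would also flag acronyms) are assumptions, not part of the deck.

```python
import re

ARTICLES = {"a", "an", "the"}


def check_label(label):
    """Return the names of the sample editorial rules that a term label breaks."""
    problems = []

    # Ampersands: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands")

    # Serial comma: the final '&' must not be preceded by a comma.
    if re.search(r",\s*&", label):
        problems.append("Serial comma")

    # Capitalization: title case, articles exempt. The word 'and' is skipped
    # here because the Ampersands rule already reports it.
    words = re.findall(r"[^\W\d_]+", label)
    checked = [w for w in words if w.lower() not in ARTICLES | {"and"}]
    if any(w != w.capitalize() for w in checked):
        problems.append("Capitalization")

    return problems
```

A label such as “Education, Learning & Employment” passes all three checks, while the NOT examples from the table are each reported under the rule they break.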
Controlled Item: Training Materials
 Staff will require training on
 The UI they use to tag the content
 The rules to follow when deciding what codes to apply
 The end-effect of the codes they apply
 The structure of the taxonomy
 Tagging examples come from earlier stages in taxonomy development process
 Hardcopies of the taxonomy, and yellow highlighters, are helpful during training
Indexing rules
Rule: Description
Specificity rule: Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule: All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule: Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule: Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.
[Screenshot: Indexing UI]
Controlled item: Communications Plan
 Stakeholders: Who are they and what do they need to know?
 Channels: Methods available to send messages to stakeholders.
 Need a mix of narrow vs. broad, formal vs. informal, interactive vs. archival, …
 Messages: Communications to be sent at various stages of project.
 Bulk of the plan is here
Stakeholders        Info. Needed
Project Sponsors    Progress, Issues, Policies
Dept. Reps          Progress, Priorities,
…                   …
Users               Progress, How-Tos
Vendors             RFPs & SOWs
Channel         Description
Demo            Live, or screen capture for download
Presentation    Tailored message for specific audience
Website         Overview info for all, link to files
Memo            Formal notification
…               …
Trigger      Msg. Descrip        From         To    Chan.
Initiation   Project overview    Dept. head   All   Memo
…            …                   …            …     …
Controlled Item: Team Charter
 Taxonomy Team is responsible for maintaining:
 The Taxonomy, a multi-faceted classification scheme
 Associated materials, including a website providing:
  • Corporate Metadata Standard
  • Editorial Style Guide
  • Taxonomy Training Materials
  • Team rules and procedures (subject to CIO review)
 Team evaluates costs and benefits of suggested changes.
 Taxonomy Team will:
 Manage relationship between providers of source
vocabularies and consumers of the Taxonomy
 Identify new opportunities for use of the Taxonomy
across the Enterprise to improve information
management practices
 Promote awareness and use of the Taxonomy
Remaining Controlled Items
 Performance Measures to go along with Charter?
 Team Processes (see later in this presentation)
 Automatic Classifier Training Materials
 Website
 Presentations and Announcements
 Change Request List (see later in this presentation)
 “Taxonomy Roadmap”
 Advanced practice, requires long planning horizon for
organization's IT projects
Exercise 2: Editorial Rules
 Look at sample taxonomy
 Think of ways to clean it up and make it ‘better’
 Smaller
 More professional looking
 Easy to use
 Write editorial rules for the cleanups.
 Provide an example with each rule:
Rule Name: Editorial Rule
Plumem: Lorne ipso ernum de jura fino el
Symosyit Esr: Dirgin a periso de forestima
Himerisf: Faleoin fi ribska firn eowkds
Capitalization: All terms in lowercase. “programming”, NOT “Programming”
Exercise 2: Sample Taxonomy
Source: http://del.icio.us/tag/
Exercise 2: Editorial Rules Worksheet
Provide a name for each rule, the rule itself, and an example of the rule of the form “X, not Y”.
Rule Name: Editorial Rule
Capitalization: All terms, except proper nouns, are lowercase. E.g. “programming”, NOT “Programming”. E.g. “Schwab”, not “schwab”.
Organization 1: Taxonomy Governance Team
Organization 1 – Internal portal for Fortune 50 Diversified Multinational.
 Executive Sponsor
 Advocate for the taxonomy team
 Business Lead
 Keeps team on track with larger business objectives
 Balances cost/benefit issues to decide appropriate levels of effort
 Specialists help in estimating costs
 Obtains needed resources if those in team can’t accomplish a particular task
 Technical Specialist
 Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
 Helps obtain data from various systems
 Content Specialist
 Team’s liaison to content creators
 Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
 Taxonomy Specialist
 Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
 Makes edits to taxonomy, installs into system with aid of IT specialist
 Content Owner
 Reality check on process change suggestions
Small-scale Metadata QA Responsibility
Changes
Taxonomy Strategist
Taxonomist
Information Architect 2
Communications Specialist*
Organization 2: Vocabulary Policy
Committee
 Organization 2 – A non-profit international
organization. Goal is to improve information
management practices to reduce overlap
between many similar vocabularies across
many systems.
Constraint: Even when number of
vocabularies reduced, some must still have
very close links.
 Business Lead
 Chairs group.
 Assures CVs fit with organization’s larger information management effort.
 Small group management experience, Information management background.
 Vocabulary Custodians (3)
 Responsible for content in a specific CV, typically based on organizational lines.
 Team lead experience, detail-oriented. Familiar with databases and organization processes
Other Relevant Staff
 IT Steering Group
 Oversees Vocabulary Policy Committee
 Stakeholders
 Managers of systems using the vocabularies, thus
affected by changes.
 They have a lot of visibility into the process.
 Control over CV changes is limited, but they
schedule their system’s adoption of changes.
 Additional Roles – available during startup of
team, and on an as-needed basis later
 Training Representative
 Develops communications plan, training materials
 Work Practices Representative
 Develops processes, monitors adherence
 IT Representative


Backups, admin of CV Tool
IT administration experience
Taxonomy Strategies LLC
The business of organized information
53
Organization 3: Taxonomy Team
 Organization 3 – Public catalog site for Fortune 50 Retailer. Data for products provided by manufacturers.
 Business Lead
   Chairs committee, resolves disputes
 Marketing Representatives
   Provide product marketing expertise
   Advocate for product manufacturers
   Represent data entry concerns
 Website Representative
   Provides input on search and navigation impacts
   Advocate for customers and other website users
   Provides search log and click trail analysis
 Taxonomy Specialist
   Maintains taxonomy and product catalog
   Provides data feeds to drive site
Likely Changes
 Fast-Track Process – A fast-track process exists, likely to be used very often. Representative will ask Taxonomy Specialist for a change and he will get approval from Website Representative.
 Larger team than many retailers, where a single person is responsible.
 A single person still makes the changes here, but there is some oversight.
What if I have to do it solo?
Realize:
 It’s not totally solo – IT help, Graphics & UI help, Business Goals help, Funding help, Review & QA help…
 You are the general contractor
 It needs to be part of your objectives
 Limit the objectives to what can be achieved by you, and by your organization
Concentrate:
 Resource allocation
   (i.e. Manage your time)
 Fundamental processes
   Query log examination
   Error correction procedure
 Cherry-pick from Roles
   Business Lead – align with organization goals, get needed resources, make cost/benefit decisions, report upstairs
   IT Liaison – Work with IT specialists to get software installed, logs gathered, content harvested, etc. Consider impact of changes on tools and data
   Taxonomy / Search Specialist – analyze behavior and suggest changes. Implement changes which pass cost/benefit muster
   Website/User Representative – consider impact of changes on users and job performance
 Communications!!!
Exercise 3: Team & Stakeholder Identification
Role | Applicable/Modify | Name(s)
Taxonomy Team Members
  Team Lead
  Taxonomy Editor(s)
  Vocabulary Custodian(s)
  Liaisons with external vocabularies
  Liaisons with applications using vocabularies
  User advocate(s)
  Training / Communications
  IT / Data & System Maintenance
External Stakeholders
  Team Supervisory Group
  Representatives of external vocabularies
  Representatives of consuming applications
  Representatives of users
  Other representatives of organization
Taxonomy editing tools
[Figure: quadrant chart of taxonomy editing tools. Vertical axis: Ability to Execute (low to high). Horizontal axis: Completeness of Vision. Quadrant labels shown: Niche Players, Visionaries.]
 Immature industry – no vendors in upper-right quadrant!
 Most popular taxonomy editor? MS Excel
 All upper-end tools are high functionality and high cost.
 Widely used, cheap, good reporting, bad IDs
Taxonomy editor functionality requirements
 Basic
   Standard and Custom Fields
   Standard and Custom Relations
   Data Typing and Restrictions
   Consistency Enforcement
   Flexible Reporting
   Flexible Importing?
 Midrange
   UNICODE
   Multiple Vocabulary Support
   Inter-Vocabulary Relations
   Unique IDs (ISO Codes not sufficient)
 Advanced
   Workflow
   Voting
   Change Request Mgmt.
   Stylistic rules enforcement
   Programmability
Term Editing
Hierarchy Browser
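One way to see why "Unique IDs" appears on the requirements list: if content is tagged with opaque term IDs rather than labels, a label can change without touching any tagged content. The sketch below is only an illustration under that assumption; the `Term` and `Vocabulary` classes, the ID format, and the sample page path are all hypothetical, not part of any tool named in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str                      # opaque identifier, never reused (assumption)
    label: str                        # display label, free to change
    synonyms: set = field(default_factory=set)


class Vocabulary:
    """Hypothetical minimal controlled vocabulary keyed by term ID."""

    def __init__(self):
        self._terms = {}

    def add(self, term_id, label):
        if term_id in self._terms:
            raise ValueError(f"duplicate ID: {term_id}")
        self._terms[term_id] = Term(term_id, label)

    def rename(self, term_id, new_label):
        term = self._terms[term_id]
        term.synonyms.add(term.label)  # keep the old label findable
        term.label = new_label

    def label_of(self, term_id):
        return self._terms[term_id].label


vocab = Vocabulary()
vocab.add("T0001", "Notebooks")

# Content is tagged with the ID, not the label.
page_tags = {"/products/x100": ["T0001"]}

# Renaming the term requires no change to page_tags.
vocab.rename("T0001", "Laptops")
display = [vocab.label_of(t) for t in page_tags["/products/x100"]]
```

Had the page been tagged with the string "Notebooks", the same rename would have required finding and retagging every affected item.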
Taxonomy governance: Where changes come from
[Diagram components: Firewall; Application UI; Application Logic; Tagging UI; Tagging Logic; Content; Taxonomy. Actors: End User, Tagging Staff, Taxonomy Editor, Taxonomy Team. Sources of change shown: query log analysis; staff notes ‘missing’ concepts; requests from other parts of the organization.]
Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content
Team considerations
1. Business goals
2. Changes in user experience
3. Retagging cost
Processes
 Different organizations will need to consider their own change processes.
   Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes
   Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency
   Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it.
 Change process MUST also consider cost of implementing the change
   Retagging data
   Reconfiguring auto-classifier
   Retraining staff
   Changes in user expectations
Taxonomy Change Cases
Case 1. Renaming a term
Case 2. Adding a new leaf term
Case 3. Inserting a new term
Case 4. Splitting a term
Case 5. Deleting a leaf term or subtree
Case 6. Deleting a term
Case 7. Moving a subtree
Case 8. Merging terms
Case 9. Adding a CV
Case 10. Deleting a CV
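The change cases differ mainly in how much tagged content they touch. The following is a rough sketch, not a prescribed method, of estimating that for a few of the cases. It assumes content is tagged by term ID and the taxonomy is stored as a child-to-parent map; the sample terms, documents, and the `affected_items` function are hypothetical.

```python
# Hypothetical taxonomy as child -> parent (None marks the root).
parents = {
    "computers": None,
    "programming": "computers",
    "languages": "programming",
    "perl": "languages",
    "python": "languages",
}

# Hypothetical tagging data: content item -> term IDs applied to it.
tags = {
    "doc1": ["perl"],
    "doc2": ["python", "programming"],
    "doc3": ["languages"],
    "doc4": ["perl", "python"],
}


def subtree(term):
    """Return the term plus all of its descendants."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in parents.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def items_tagged_with(terms):
    """Content items carrying at least one of the given terms."""
    return {item for item, applied in tags.items() if set(applied) & terms}


def affected_items(case, term):
    """Items a person may need to review for a given change case (assumed rules)."""
    if case == "rename":            # Case 1: IDs unchanged, nothing to retag
        return set()
    if case == "add_leaf":          # Case 2: `term` is the parent; its items may fit the new leaf
        return items_tagged_with({term})
    if case == "split":             # Case 4: every item on the term must be reassigned
        return items_tagged_with({term})
    if case == "delete_subtree":    # Case 5: every item tagged anywhere in the subtree
        return items_tagged_with(subtree(term))
    raise ValueError(f"case not covered by this sketch: {case}")


rename_cost = len(affected_items("rename", "perl"))
split_cost = len(affected_items("split", "perl"))
delete_cost = len(affected_items("delete_subtree", "languages"))
```

Counts like these give the team a first number to weigh against the benefit of a proposed change; the remaining cases would need their own rules.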
Taxonomy governance: Taxonomy maintenance workflow
[Flowchart. Roles: Analyst, Editor, Copywriter, Sys Admin. Steps: Problem? (Yes/No) → Suggest new name/category → Review new name → Problem? (Yes/No) → Copy edit new name → Add to enterprise Taxonomy. Other elements: Taxonomy Tool, Taxonomy.]
Other change processes
 Processes may be diagramed or written
 Provide an ‘emergency’ change process because it will be needed.
   How can emergency changes be requested? Who makes the change and who approves it?
   Who are backups for the people when they are out?
   Who are escalation points?
 Change Request Process should call out decision criteria, e.g.
   Cost of retagging
   Benefit of change
   Conflict with editorial rules
 Organization X:
   Change Request Process
     Anyone can ask a team member for a change. Team members responsible for figuring out details and bringing to team for decision.
     Pending changes list for low priority/high cost items.
   Change Process
     Includes preview of change on site and data mockup
   Fast-Track Change Process
     Anyone can ask editor, he gets team leader or deputy approval
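The decision criteria above (cost of retagging, benefit of change, conflict with editorial rules) can be written down explicitly so that requests are routed consistently. This is a minimal sketch under invented assumptions: the `ChangeRequest` fields, the 1 to 5 benefit score, the thresholds, and the outcome names are all hypothetical and would be set by each team.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    description: str
    retag_items: int                  # estimated number of content items to retag
    benefit: int                      # team's benefit score, 1 (low) to 5 (high)
    breaks_editorial_rules: bool = False


# Hypothetical thresholds; a real team would choose its own.
MINUTES_PER_ITEM = 2
FAST_TRACK_MAX_MINUTES = 30
PENDING_MINUTES_PER_BENEFIT_POINT = 240


def triage(req):
    """Route a request using the three criteria named on the slide."""
    if req.breaks_editorial_rules:
        return "revise request"       # conflict with editorial rules
    cost_minutes = req.retag_items * MINUTES_PER_ITEM
    if cost_minutes <= FAST_TRACK_MAX_MINUTES:
        return "fast-track"           # cheap enough for editor plus one approver
    if cost_minutes > req.benefit * PENDING_MINUTES_PER_BENEFIT_POINT:
        return "pending list"         # low priority / high cost
    return "team decision"


requests = [
    ChangeRequest("add synonym", retag_items=0, benefit=2),
    ChangeRequest("split term", retag_items=5000, benefit=2),
    ChangeRequest("move subtree", retag_items=300, benefit=4),
    ChangeRequest("rename against style guide", retag_items=0, benefit=3,
                  breaks_editorial_rules=True),
]
outcomes = [triage(r) for r in requests]
```

The point is less the particular numbers than having the criteria stated where everyone on the team can see and challenge them.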
Fundamental Processes & Outlooks
 Two fundamental processes every organization
should implement to maintain its metadata and
taxonomies:
 Query log / Click trail examination
 Error Correction
 What are the key outlooks a taxonomist should try to
instill in their organization?
 Integrated approach to Taxonomy, Metadata, Search,
and UI
 Measure & Improve Mindset
Fundamental process #1 – Query log examination
 How can we characterize users and what they are looking for?
 Query Log & Click Trail Examination
 Only 30-40% of organizations interested in Taxonomy Governance examine query logs*
 Basic reports provide plenty of real value
 Greatest value comes from:
   Identifying a person as responsible for search quality
   Starting a “Measure & Improve” mindset
 Greatest challenge:
   Getting a person assigned (≥ 10%)
   Getting logs turned back on
UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary
Click Trail Packages
• iWebTrack
• NetTracker
• OptimalIQ
• SiteCatalyst
• Visitorville
• WebTrends
* Source: Metadata Maturity Model Presentation, Ron Daniel, ESS’05
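The basic reports listed above (top queries, queries with no results, queries with no click-through) need very little machinery. The sketch below assumes a simplified log of (query, result count, clicked) records; that record layout and the sample data are invented for illustration and do not reflect the export format of UltraSeek or any of the packages named.

```python
from collections import Counter

# Hypothetical log records: (query text, number of results, did the user click a result?)
log = [
    ("perl tutorial", 12, True),
    ("Perl Tutorial", 12, False),
    ("taxonmy", 0, False),
    ("expense form", 40, False),
    ("expense form", 40, False),
    ("taxonmy", 0, False),
    ("perl tutorial", 12, True),
]


def normalize(query):
    """Fold case and collapse whitespace so variants count together."""
    return " ".join(query.lower().split())


# Report 1: top queries.
top_queries = Counter(normalize(q) for q, _, _ in log)

# Report 2: queries with no results (candidate synonyms or missing terms).
no_results = Counter(normalize(q) for q, n, _ in log if n == 0)

# Report 3: queries that returned results but never led to a click.
clicked = {normalize(q) for q, _, c in log if c}
no_click = Counter(
    normalize(q) for q, n, _ in log if n > 0 and normalize(q) not in clicked
)
```

In this sample the misspelling "taxonmy" surfaces in the no-results report, which is the kind of finding that becomes a small taxonomy change such as a new synonym.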
Fundamental process #2 – Error correction
 Errors will happen, and some will be found. What are you going
to do about them?
 Tagging errors, content errors, taxonomy errors, …
 Define an error correction process.
You have an error correction
process. Would you hate to
see it on paper?
 Process will accommodate questions like:
 Who looks at it? Is it an error? What are the costs to correct vs. not
correct? Does the correction need to be scheduled? etc.
 Once a tagging error is corrected, NEVER lose that fact.
 Manually reviewed pages are vital for training automatic classifiers
 Has implications for metadata specification and review procedures
 Over time, multiple error detection methods will be defined
 e.g. Statistical sampling of newly added pages
 Gradually, additional error correction processes may be defined to deal
with particular types of errors
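Two of the points above lend themselves to a short sketch: statistical sampling of newly added pages, and never losing the fact that a tagging error was corrected. The code below is one possible shape under stated assumptions; the function names, the 10% default sampling rate, and the record fields are hypothetical.

```python
import random


def sample_for_review(new_pages, rate=0.1, seed=None):
    """Pick a simple random sample of newly added pages for manual QA."""
    if not new_pages:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(new_pages) * rate))
    return rng.sample(new_pages, k)


# Append-only log: corrections are recorded, never overwritten or deleted.
corrections = []


def record_correction(page, old_tags, new_tags, reviewer):
    entry = {
        "page": page,
        "old_tags": list(old_tags),
        "new_tags": list(new_tags),
        "reviewer": reviewer,
    }
    corrections.append(entry)
    return entry


def reviewed_examples():
    """Latest human-verified tags per page, e.g. as classifier training data."""
    latest = {}
    for entry in corrections:
        latest[entry["page"]] = entry["new_tags"]
    return latest


pages = [f"/p/{i}" for i in range(50)]
to_review = sample_for_review(pages, rate=0.1, seed=1)
record_correction("/p/3", ["perl"], ["python"], reviewer="editor")
training = reviewed_examples()
```

Keeping the old tags alongside the corrected ones also makes it possible to measure error rates over time, which fits the measure-and-improve outlook.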
Fundamental Outlooks
 Measure & Improve Mindset
 Query logs and click trails are prime example
 Next place to instrument: Error correction and error
detection processes
 Integrated handling of Taxonomy, Metadata, UI, &
Search
 To be most effective, these must work together
 Governance structure must help that happen
 Cross-functional team structure is a start
Actions to define taxonomy governance
 Initial vocabularies should be selected for stability as
well as utility.
 Custodians of shared vocabularies must be identified,
educated re. impacts of changes.
 Group of custodians and stakeholders must be
established.
 (Simple) System for sharing the CVs and tracking the
update process must be established.
Exercise 4: Self-Diagnosis
1. Does your organization know what it is, or wants to be, doing around search & taxonomy yet?
2. Is the cost basis for the taxonomy ROI clear to you?
3. Is the benefits basis for the taxonomy ROI clear to you?
4. Is the cost basis for the taxonomy ROI clear to your CFO?
5. Is the benefits basis for the taxonomy clear to your CFO?
6. Do you know how content will be tagged?
7. Do you know how tagged content will be displayed to users?
8. Do you know how users will fetch the content?
9. Do users know how they should report errors in the tagging?
10. Do you know what information will be logged for later analysis?
11. Do you know what information has to be reported to management to justify the taxonomy team?
12. Does management expect the taxonomy team to justify its existence?
13. Is your organization planning a tightly focused taxonomy effort?
14. Is your organization planning a credible ‘Enterprise Taxonomy Strategy’?
15. Does your organization expect its taxonomies to change frequently?
16. Has your organization identified some facets as stable and some facets as volatile?
17. Does your organization have a plan for retagging data when the taxonomy is changed?
18. Do you have an identified taxonomy “team” with at least one person?
19. Is there at least one person working on taxonomy/metadata/search more than ½ time?
20. Does the team contain members who represent search, UI, and metadata tagging?
21. Does the organization have any hiring and training criteria for taxonomy, metadata, and search positions?
22. Does the team maintain Editorial Rules?
23. Does the team maintain a corporate metadata specification?
24. Does the team maintain educational materials?
25. Does the team have a communications plan?
26. Does the team examine query logs?
27. Does the team examine click trails?
28. Does the team have a documented error correction process?
29. Does the organization have a procedure to locate ROT (Redundant, Obsolete, or Trivial content)?
30. Does the organization have any qualitative or quantitative measures of data quality?
31. Do you use a tool other than MS Excel for editing and maintaining the Taxonomy?
32. Were taxonomy, metadata, search, or content management tools purchased with money other than “use it or lose it” funds?
Taxonomy Strategies LLC
Contact Info
Ron Daniel
925-368-8371
[email protected]
Sept. 28, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.