Taxonomy Governance
Taxonomy Strategies LLC
Creating a Governance Structure for the Ongoing Maintenance of the Taxonomy
Ron Daniel, Jr.
Sept. 28, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Agenda
10:15 Introduction
10:30 Background
10:35 Maintainable Taxonomies
10:45 Maintainable Metadata
10:50 ROI Estimation
11:00 Governance Environment
11:10 Controlled Items
11:30 Team Structures
11:45 Change Process
12:00 Exercises
12:15 Adjourn
Taxonomy Strategies LLC
The business of organized information
Creating a Governance Structure for the Ongoing Maintenance of the Taxonomy
Taxonomies must change if they are to remain relevant. But what will it cost to make those changes to the taxonomy and to the data categorized by it? Organizations need maintenance processes that base taxonomy changes on rational cost/benefit decisions, without becoming mired in endless paperwork. This interactive workshop presents a framework for creating taxonomy governance teams and defines what their specific responsibilities should be. Special attention is given to defining maintainable taxonomies and metadata that meet business needs.
Metadata and Taxonomy
Metadata:
Field | Data Type | Example
Title | String | “The Perl Directory”
Creator | String | The Perl Foundation
Identifier | URL | http://www.perl.org/
Date | DateTime | Jan. 12, 2006
Subject | List | Computers : Programming : Languages : Perl
Taxonomy: the Subject values are drawn from the taxonomy.
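As a minimal sketch (the class and field names are hypothetical, not part of any metadata standard), the record above can be held as a typed structure in which Subject is a path into the taxonomy rather than free text:

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical record type mirroring the Field / Data Type table above."""
    title: str                  # String
    creator: str                # String
    identifier: str             # URL, kept as a plain string in this sketch
    date: datetime.date         # DateTime (date only in this sketch)
    subject: list[str] = field(default_factory=list)  # List: segments of one taxonomy path

    def subject_path(self) -> str:
        """Render the taxonomy path in the "A : B : C" style used above."""
        return " : ".join(self.subject)


perl = MetadataRecord(
    title="The Perl Directory",
    creator="The Perl Foundation",
    identifier="http://www.perl.org/",
    date=datetime.date(2006, 1, 12),
    subject=["Computers", "Programming", "Languages", "Perl"],
)
```

Keeping the subject as a list of path segments, rather than one string, lets software reason about the hierarchy instead of parsing labels.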
Ron Daniel, Jr.
Over 15 years in the business of metadata & automatic
classification
Principal, Taxonomy Strategies
Standards Architect, Interwoven
Senior Information Scientist, Metacode Technologies
Technical Staff Member, Los Alamos National Laboratory
Metadata and taxonomies community leadership
Chair, PRISM (Publishing Requirements for Industry Standard Metadata) working group
Acting Chair, XML Linking working group
Member, RDF working groups
Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Taxonomy Strategies’ Clients
Government:
Commodity Futures Trading Commission
Defense Intelligence Agency
ERIC
Dept. Homeland Security
Federal Aviation Administration
Federal Reserve Bank of Atlanta
Forest Service
GSA Office of Citizen Services (www.firstgov.gov)
Head Start
Infocomm Development Authority of Singapore
NASA (nasataxonomy.jpl.nasa.gov)
Small Business Administration
Social Security Administration
USDA Economic Research Service
USDA e-Government Program (www.usda.gov)
Commercial:
Allstate Insurance
Blue Shield of California
Debevoise & Plimpton
Halliburton
Hewlett Packard
Motorola
PeopleSoft
PricewaterhouseCoopers
Siderean Software
Sprint
Time Inc.
Commercial subcontracts:
Agency.com – Top financial services
Critical Mass – 2 Fortune 50 retailers, Fortune 50 computer manufacturer
Deloitte Consulting – Big credit card
Gistics/OTB – Direct selling giant
International orgs & Non-profits:
CEN
IDEAlliance
IMF
OCLC
Who are you? What do you want out of today?
What type of organization do you work for?
Government / NGO / SME / Global 2000? What industry?
What is the size (# employees) of your organization?
≤10, 11..100, 101..1k, 1k..10k, 10k..100k, > 100k
What part of the organization do you work in?
IT / Library & IM / Public Affairs / Product Management /
Engineering / HR & Finance / Other?
What is your job role within the organization?
Webmaster / Technical / Researcher / Editorial / Supervisory /
Executive?
Where is your organization with its taxonomy?
Thinking / Serious Planning / First Implementation / Major
Revisions / Old-timers just looking for additional clues?
Exercise 1: Questions
What questions do you hope this session will answer?
What taxonomy/metadata/search questions do you have that go beyond
what this session could answer?
Three Problems
Taxonomy development and maintenance is the LEAST of three
problems:
The Taxonomy Problem: How are we going to build and maintain the lists
of pre-defined values that can go into some of the metadata elements?
The Tagging Problem: How are we going to populate metadata elements
with complete and consistent values?
What can we expect to get from automatic classifiers? What kind of error
detection and error correction procedures do we need? What fields do we need?
The ROI (Return On Investment) Problem: How are we going to use
content, metadata, and vocabularies in applications to obtain business
benefits?
More sales? Lower support costs? Greater productivity? Risk avoidance?
How much content? How big an operating budget? How to expose to users?
Business Goals and Cultural Factors are major influences on
tagging and taxonomy. These must be acknowledged at the start
to avoid rework.
There’s more to maintaining the Taxonomy than just maintaining the Taxonomy
What must change when the Taxonomy changes?
The master copy of the taxonomy.
The data tagged with the taxonomy?
The user interface which uses the taxonomy?
Backend system software which uses the taxonomy?
The training set for automatic classifiers?
The educational material for users, catalogers, programmers, etc.?
The information sent to downstream users of the taxonomy?
The versions of the taxonomy distributed to others.
The list of changes.
Announcements for stakeholders?
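Several items on this list (the list of changes, the information sent downstream, the announcements) start from knowing exactly what differs between two versions of a vocabulary. A minimal sketch, assuming each version is stored as a mapping from a stable term ID to its label; the IDs and labels below are hypothetical:

```python
def diff_vocabulary(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two versions of one controlled vocabulary.

    Each version maps a stable term ID to its current label (assumed format).
    Returns IDs that were added or removed, plus (id, old_label, new_label)
    triples for terms whose label changed.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    relabeled = sorted(
        (term_id, old[term_id], new[term_id])
        for term_id in old.keys() & new.keys()
        if old[term_id] != new[term_id]
    )
    return {"added": added, "removed": removed, "relabeled": relabeled}


v1 = {"T1": "Manuals and Forms", "T2": "Policies", "T3": "Newsletters"}
v2 = {"T1": "Manuals & Forms", "T2": "Policies", "T4": "Press Releases"}
changes = diff_vocabulary(v1, v2)
```

Removed terms are the expensive kind, since data tagged with them must be retagged; relabeled terms only affect displays and training material if content is tagged by ID.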
DMOZ: A worst case example of a unified ‘subject’
DMOZ has over 600k categories
Most are a combination of common facets – Geography, Organization, Person, Document Type, …
(e.g.) Top: Regional: Europe: Spain: Travel and Tourism: Travel Guides
(BTW – DMOZ Governance model is out of whack)
[Figure: a sample of DMOZ category paths from the Business, Regional, Reference, Science, and Society branches, with a tally of the facets they combine: Competency (discipline) 11, Geography 9, Audience 9, Topic 7, Organization 5, Doc Type 4, Industry 4, Process 4.]
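To make the point concrete, the example path can be pulled apart into the facets it mixes. This is a sketch only: the FACET_OF lookup is a hypothetical, hand-built table, and assigning each segment to a facet is an illustrative judgment rather than anything DMOZ provides.

```python
# Hypothetical lookup: which facet each path segment belongs to.
FACET_OF = {
    "Europe": "Geography",
    "Spain": "Geography",
    "Travel and Tourism": "Topic",
    "Travel Guides": "Doc Type",
}


def decompose(path: str) -> dict[str, list[str]]:
    """Split a colon-separated category path into facet values.

    Segments that are not in the lookup (e.g. "Top", "Regional") are treated
    as navigation only and skipped.
    """
    facets: dict[str, list[str]] = {}
    for segment in (s.strip() for s in path.split(":")):
        facet = FACET_OF.get(segment)
        if facet is not None:
            facets.setdefault(facet, []).append(segment)
    return facets


result = decompose("Top: Regional: Europe: Spain: Travel and Tourism: Travel Guides")
```

Once separated, each facet can be maintained as its own short list instead of being repeated inside thousands of combined categories.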
The power of taxonomy facets
Categorize in multiple, independent categories.
Allow combinations of categories to narrow the choice of items.
4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4).
Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack
Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry
Easier to maintain: 42 values to maintain (10+6+11+15)
Can be easier to navigate: 9900 combinations (10x6x11x15)
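The arithmetic behind the last two lines, plus a sketch of how combining facet values narrows a result set. The facet sizes come from the slide; the three recipes and the narrow() helper are hypothetical.

```python
from math import prod

# Facet sizes from the recipe example.
facet_sizes = {"Main Ingredients": 10, "Meal Type": 6, "Cuisines": 11, "Cooking Methods": 15}

values_to_maintain = sum(facet_sizes.values())  # each value is maintained once
combinations = prod(facet_sizes.values())       # one value chosen from every facet
flat_equivalent = 10 ** 4                       # 4 facets x 10 nodes vs. one 10,000-node tree

# Hypothetical items, each tagged with one value per facet.
recipes = [
    {"Main Ingredients": "Pasta", "Meal Type": "Dinner", "Cuisines": "Mediterranean", "Cooking Methods": "Quick"},
    {"Main Ingredients": "Fruits", "Meal Type": "Breakfast", "Cuisines": "American", "Cooking Methods": "No Cooking"},
    {"Main Ingredients": "Meat & Seafood", "Meal Type": "Dinner", "Cuisines": "Asian", "Cooking Methods": "Stir-fry"},
]


def narrow(items: list[dict[str, str]], selected: dict[str, str]) -> list[dict[str, str]]:
    """Keep only the items that match every selected facet value."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selected.items())]


dinners = narrow(recipes, {"Meal Type": "Dinner"})
mediterranean_dinners = narrow(recipes, {"Meal Type": "Dinner", "Cuisines": "Mediterranean"})
```

Each additional facet selection acts as an intersection, which is why a few short lists can discriminate as finely as one very large hierarchy.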
How do I get a good Taxonomy? – Seven practical rules
1) Incremental, extensible process that identifies and enables
users, and engages stakeholders.
2) Quick implementation that provides measurable results as
quickly as possible.
3) Not monolithic—has separately maintainable facets.
4) Re-uses existing IP as much as possible.
5) A means to an end, and not the end in itself.
6) Not perfect, but it does the job it is supposed to do—such as
improving search and navigation.
7) Improved over time, and maintained.
Controlled vocabulary development procedure
Develop broad taxonomy outline (1-3 levels deep)
Review, revise, and approve taxonomy outline with stakeholders and subject matter experts.
Draft editorial rules to follow
Fill in taxonomy outline (no more than 1500 terms), following and revising editorial rules
Tag random samples from content inventory
Review, revise, and approve draft taxonomy with stakeholders and subject matter experts.
Some vocabulary construction rules
Don’t just have names, also have identifiers
This will reduce retagging later when names change
When tagging content, use the most specific code. Let software handle the
hierarchy.
Bonus: Use URIs for node IDs & publish on the web
Develop scope notes
Not just a definition, also say what kind of content the node applies to
The metadata specification must state the vocabulary for each element.
Gather data from multiple sources
Talk with users and experts
Analyze query logs and content
Choose and arrange terms
Test and finalize first version
Shift into maintenance mode
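A minimal sketch of the first two rules, assuming a vocabulary stored as stable IDs with labels and broader-term links (all IDs, labels, and helper names here are hypothetical). Content carries only the most specific ID; labels and the path above it are resolved by software at display time, so renaming a node requires no retagging.

```python
# Hypothetical vocabulary: stable term IDs with current labels and broader terms.
labels = {"t1": "Computers", "t2": "Programming", "t3": "Languages", "t4": "Perl"}
broader = {"t2": "t1", "t3": "t2", "t4": "t3"}

# Content is tagged with the ID of the most specific term only.
tags = {"doc-42": ["t4"]}


def path_ids(term_id: str) -> list[str]:
    """Return the chain of IDs from the top of the hierarchy down to term_id."""
    chain = [term_id]
    while chain[-1] in broader:
        chain.append(broader[chain[-1]])
    return list(reversed(chain))


def display(doc_id: str) -> list[str]:
    """Resolve a document's tag IDs to human-readable paths at display time."""
    return [" : ".join(labels[t] for t in path_ids(tag)) for tag in tags[doc_id]]


before = display("doc-42")
labels["t3"] = "Programming Languages"  # rename a node; the stored tags are untouched
after = display("doc-42")
```

Publishing the IDs as URIs, as the bonus rule suggests, would simply replace the short strings used here.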
What do I do with all these facets?
Either expose them directly in the user interface (post-coordination)
or
Combine them in a minimal hierarchy (pre-coordination)
Post-coordination takes software support, which may be fancy or basic.
How many facets?
Log10(#documents) as a guide
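The rule of thumb is easy to compute. This sketch rounds to the nearest whole number and never suggests fewer than one facet; both choices are assumptions, since the guide itself gives only log10(#documents).

```python
import math


def suggested_facet_count(num_documents: int) -> int:
    """Rough guide: about log10(#documents) facets.

    Rounding to the nearest integer and the minimum of 1 are assumptions.
    """
    if num_documents < 1:
        raise ValueError("need at least one document")
    return max(1, round(math.log10(num_documents)))
```

For example, a collection of 100,000 items suggests about 5 facets, and 10,000 items about 4. Treat the result as a starting point for discussion, not a limit.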
What could possibly go wrong with a little edit?
ERP (Enterprise Resource Planning) team made a change to
the product line data element in the product hierarchy.
They did not know this data was used by downstream
applications outside of ERP.
An item data standards council discovered the error.
If the error had not been identified and fixed, the company’s
sales force would not be correctly compensated.
“Lack of the enterprise data standards process in
the item subject area has cost us at least 30 person
days of just ‘category’ rework.”
Source: Danette McGilvray, Granite Falls Consulting, Inc.
Maintainable Metadata
Design metadata specification for future changes
Lessons from the Dublin Core
Provide metadata tagging and storage that will deal
with changes
Dublin Core: A little more complicated over time
Elements:
1. Identifier
2. Title
3. Creator
4. Contributor
5. Publisher
6. Subject
7. Description
8. Coverage
9. Format
10. Type
11. Date
12. Relation
13. Source
14. Rights
15. Language
Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CTDF
Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
Design Metadata Specification for future changes
Degree of future changes will depend on organization size,
sophistication of use, number of repositories and amount of
content.
Don’t over-engineer
For all organizations: start with the Dublin Core with a few
additions and deletions for specific needs
At large/sophisticated organizations:
“Refinements” will be unavoidable in the future.
Start with “DatePublished” so that later additions of “DateModified”, “DateApproved”, “DateVerified”, etc. fit in easily.
Identify broad “integration metadata” vs. division-specific fields.
Coordinate with others to set up a working understanding of a
corporate multi-level metadata standard.
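Refinements fit in easily when each one declares the broader element it refines. Dublin Core's "dumb-down" principle works this way: a system that does not understand a refinement can still treat its value as the broader element. A minimal sketch, in which the refinement names follow the slide and the REFINES table and dumb_down() helper are hypothetical:

```python
# Hypothetical local refinements, each declared with the element it refines.
REFINES = {
    "DatePublished": "Date",
    "DateModified": "Date",
    "DateApproved": "Date",
    "DateVerified": "Date",
}


def dumb_down(record: dict[str, str]) -> dict[str, list[str]]:
    """Collapse refined fields to their broader element for simpler consumers.

    Fields with no declared parent are kept under their own name.
    """
    out: dict[str, list[str]] = {}
    for name, value in record.items():
        out.setdefault(REFINES.get(name, name), []).append(value)
    return out


record = {"Title": "Travel Policy", "DatePublished": "2005-09-28", "DateModified": "2005-10-03"}
simple = dumb_down(record)
```

Adding "DateApproved" later means one new row in the table; downstream systems that only know "Date" keep working.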
Large, Sophisticated, Long-Term Plan? Multi-level Metadata Specification
Not all key metadata is taxonomic. Need to define other basic fields for general use. (Title, Description, Date, Creator, ...)
As audiences shrink, specialist needs increase.
Start with a few elements that apply to all, gradually add division-specific, then group-specific standards.
[Figure: layered metadata specification: Taxonomies, Vocabularies, Ontologies; Dublin Core and Similar; Per-Source Data Types, Access Controls, etc.]
Source: Todd Stephens, BellSouth
Provide metadata tagging and storage that will deal with changes
Tag with identifiers, not names.
This will reduce retagging later when names change
Not good if people need to view raw tagging, but usually software
will be involved to show labels.
When tagging content, use the most specific concept. Let
software handle the hierarchy.
Metadata is easier to manage if it is stored in a central
repository, instead of spread out in the individual files.
Exception – when sending files out to other systems (e.g. photo
metadata)
Warning – ‘metadata repositories’ are usually a different class of
software than what we are discussing.
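A minimal sketch of a central store that keeps tags apart from the content files and applies the hierarchy at query time. The class, its methods, and the sample terms are hypothetical; a real system would sit on a database rather than in-memory dictionaries.

```python
from collections import defaultdict
from typing import Optional


class MetadataRepository:
    """Hypothetical central metadata store kept apart from the content files.

    Tags are stored as term IDs keyed by item ID, so a vocabulary change is
    handled in one place instead of inside every file. Assumes the hierarchy
    has no cycles.
    """

    def __init__(self, parent: dict[str, str]):
        self.parent = parent                          # narrower term ID -> broader term ID
        self.tags: dict[str, set[str]] = defaultdict(set)

    def tag(self, item_id: str, term_id: str) -> None:
        """Record the most specific term only; the hierarchy is applied at query time."""
        self.tags[item_id].add(term_id)

    def _is_under(self, term_id: str, broader_id: str) -> bool:
        """True if term_id equals broader_id or sits anywhere beneath it."""
        current: Optional[str] = term_id
        while current is not None:
            if current == broader_id:
                return True
            current = self.parent.get(current)
        return False

    def find(self, broader_id: str) -> list[str]:
        """IDs of items tagged with broader_id or with any narrower term."""
        return sorted(
            item for item, terms in self.tags.items()
            if any(self._is_under(t, broader_id) for t in terms)
        )


repo = MetadataRepository({"perl": "languages", "python": "languages",
                           "languages": "programming"})
repo.tag("doc-1", "perl")
repo.tag("doc-2", "python")
repo.tag("doc-3", "programming")
```

Because each item stores only its most specific term, a search on a broad term still finds it, and moving a term within the hierarchy changes results without touching any tags.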
Fundamentals of taxonomy ROI
Tagging content using a taxonomy is a cost, not a
benefit.
There is no benefit without exposing the tagged
content to users in some way that cuts costs,
improves revenues, reduces risk, or achieves some
other clear business goal.
Putting taxonomy into operation requires UI
changes and/or backend system changes, as well
as data changes.
You need to determine those changes, and their
costs, as part of the ROI.
Key Factors in ROI
Breadth
“How many people will metadata affect?”
Repeatability
“How many times a day will they use it?”
Cost/Benefit
“Is this a costly effort with little or no benefits?”
How to estimate costs — Tagging
Consider the complexity of each facet and the ambiguity of the content to estimate the time per value.
Taxonomy Facet | Hier? | Typical CV Size | Time/Value (min) | Avg # Values/Item | $/Min | Cost/Element
Audience | N | 10 | 0.25 | 2 | $0.42 | $0.21
Content Type | N | 20 | 0.25 | 1 | $0.42 | $0.11
Organizational Unit | Y | 50 | 0.5 | 2 | $0.42 | $0.42
Products & Services | Y | 500 | 1.5 | 4 | $0.42 | $2.52
Geographic Region | Y | 100 | 0.5 | 2 | $0.42 | $0.42
Broad Topics | Y | 400 | 2 | 4 | $0.42 | $3.36
TOTALS | | 1080 | 5 | 15 | | $7.04
Is this field worth the cost?
Estimated cost of tagging one item. This can be reduced with automation, but cannot be eliminated.
Inspired by: Ray Luoma, BAU Solutions
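Each Cost/Element in the table is Time/Value × Avg # Values/Item × $/Min, and the per-item cost is their sum. A short sketch that reproduces the table (the variable and function names are illustrative):

```python
# (facet, minutes per value, average values per item) from the table above.
FACETS = [
    ("Audience", 0.25, 2),
    ("Content Type", 0.25, 1),
    ("Organizational Unit", 0.5, 2),
    ("Products & Services", 1.5, 4),
    ("Geographic Region", 0.5, 2),
    ("Broad Topics", 2.0, 4),
]
DOLLARS_PER_MINUTE = 0.42


def cost_per_element(minutes_per_value: float, values_per_item: float,
                     dollars_per_minute: float = DOLLARS_PER_MINUTE) -> float:
    """Cost/Element = Time/Value x Avg # Values/Item x $/Min."""
    return minutes_per_value * values_per_item * dollars_per_minute


costs = {name: cost_per_element(t, n) for name, t, n in FACETS}
cost_per_item = sum(costs.values())
```

Unrounded, the total is $7.035 per item, which the table shows as $7.04 (Content Type is $0.105 before rounding). Sorting `costs` by value is a quick way to see which fields most deserve the "worth the cost?" question.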
How to estimate costs — Assumptions
Your numbers will vary.
ASSUMPTIONS
Enterprise SW License | $100,000
Maintenance/Support | 15%
SW Implementation | 200%
Legacy Content Items | 100,000
Content Growth Rate | 15%
Tagging/Item | $7.04
Enterprise Taxonomy | $100,000
How to estimate costs — Total cost of ownership (TCO)
Description | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
SW: Licenses | $100,000 | | | |
SW: Maintenance | | $15,000 | $15,000 | $15,000 | $15,000
SW: Implementation | $200,000 | | | |
SW: App Tech Support | | $30,000 | $30,000 | $30,000 | $30,000
Tagging: Legacy Content | $703,500 | | | |
Tagging: Ongoing | | $105,600 | $121,440 | $139,656 | $160,604
Taxonomy: Creation | $100,000 | | | |
Taxonomy: Maintenance | | $15,000 | $15,000 | $15,000 | $15,000
TOTAL | $1,103,500 | $165,600 | $181,440 | $199,656 | $220,604
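The TCO table can be regenerated from the assumptions. In this sketch the 15% Maintenance/Support rate is applied to the license, to implementation (as App Tech Support), and to taxonomy creation; that reading is an assumption, but it reproduces the table's $15,000, $30,000, and $15,000 rows. Ongoing tagging covers the new items added each year at the 15% growth rate.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    license: float = 100_000          # Enterprise SW License
    maintenance_rate: float = 0.15    # Maintenance/Support (assumed to apply to all three)
    implementation_rate: float = 2.0  # SW Implementation, 200% of license
    legacy_items: int = 100_000       # Legacy Content Items
    growth_rate: float = 0.15         # Content Growth Rate
    tagging_per_item: float = 7.04    # Tagging/Item
    taxonomy: float = 100_000         # Enterprise Taxonomy


def tco(a: Assumptions, years: int = 5) -> list[float]:
    """Yearly total cost of ownership, following the rows of the table above."""
    implementation = a.license * a.implementation_rate
    totals = [a.license + implementation + a.legacy_items * a.tagging_per_item + a.taxonomy]
    items = float(a.legacy_items)
    for _ in range(1, years):
        new_items = items * a.growth_rate          # content added this year
        items += new_items
        totals.append(
            a.license * a.maintenance_rate         # SW maintenance
            + implementation * a.maintenance_rate  # app tech support
            + new_items * a.tagging_per_item       # ongoing tagging
            + a.taxonomy * a.maintenance_rate      # taxonomy maintenance
        )
    return totals


yearly = tco(Assumptions())
```

With $7.04 per item, Years 2 to 5 match the table to the nearest dollar. The Year 1 total of $1,103,500 corresponds to the unrounded $7.035 per item (100,000 × $7.035 = $703,500 of legacy tagging).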
Sample ROI Calculations
Description | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
Costs
Software Licenses/Maintenance | $100,000 | $15,000 | $15,000 | $15,000 | $15,000
Implementation/Support | $200,000 | $30,000 | $30,000 | $30,000 | $30,000
Taxonomy Creation/Maintenance | $100,000 | $15,000 | $15,000 | $15,000 | $15,000
Legacy/Ongoing Tagging | $703,500 | $105,600 | $121,440 | $139,656 | $160,604
Ongoing cost of tagging due to 15% content growth.
Benefits
Productivity increases | $ - | $125,000 | $1,250,000 | $1,250,000 | $1,250,000
Service efficiency gains | $ - | $129,600 | $1,296,000 | $1,296,000 | $1,296,000
Yearly Net Benefits | $(1,103,500) | $89,000 | $2,364,560 | $2,346,344 | $2,325,396
Payback period | 1.4 | Years until Benefits = Costs
Inspired by: Todd Stephens, Dublin Core Global Corporate Circle
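The net-benefit row and the payback period follow from the cost and benefit rows. The slide does not state its payback convention; this sketch assumes the payback is counted in years of operation after the Year 1 investment, interpolating linearly within the year the running total turns positive, which reproduces the 1.4 figure.

```python
from typing import Optional


def net_benefits(costs: list[float], benefits: list[float]) -> list[float]:
    """Yearly net benefit = benefits - costs."""
    return [b - c for b, c in zip(benefits, costs)]


def payback_years(net: list[float]) -> Optional[float]:
    """Years after the first (investment) year until cumulative net benefit reaches zero.

    Interpolates linearly within the payback year; returns None if the
    investment is not recovered within the horizon.
    """
    cumulative = net[0]
    if cumulative >= 0:
        return 0.0
    for year, value in enumerate(net[1:], start=1):
        if cumulative + value >= 0:
            return year - 1 + (-cumulative / value)
        cumulative += value
    return None


costs = [1_103_500, 165_600, 181_440, 199_656, 220_604]
benefits = [0, 125_000 + 129_600] + [1_250_000 + 1_296_000] * 3
net = net_benefits(costs, benefits)
payback = payback_years(net)
```

Counting payback from the start of Year 1 instead would add one year (about 2.4), so it is worth stating the convention whenever a payback figure is reported.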
Where do the benefits come from?
Common taxonomy ROI scenarios
Catalog site - ROI based on increased sales through improved:
Product findability
Product cross-sells and up-sells
Customer loyalty
Call center - ROI based on cutting costs through:
Fewer customer calls due to improved website self-service
Faster, more accurate CSR responses through better information access
Compliance – ROI based on:
Avoiding penalties for breaching regulations
Following required procedures (e.g. medical claims)
Knowledge worker productivity - ROI based on cutting costs through:
Less time searching for things
Less time recreating existing materials, with knock-on benefits of less confusion and
reduced storage and backup costs
Executive mandate
No ROI at the start, just someone with a vision and the budget to make it happen
Generic, yet Important, Advice
It’s not about the tools. It’s not about the taxonomy.
It’s about the business goals and the processes
people use to meet those goals.
Metrics are grossly underused in metadata and
search.
Taxonomy governance overview
Taxonomy governance can be viewed as a standards
process
Closely linked to organizational metadata standard
Taxonomy must evolve, but in predictable way
Take tips from other standards efforts
Team structure, with an appeals process
Taxonomy stewardship is a part-time role at most organizations
Team needs to make decisions based on costs and benefits
Documentation and educational material on Taxonomy and Metadata
Announcements
Comment-handling responsibilities (part of error-correction process)
Issue Logs
Release Schedule
These practices are in rough order of implementation.
Taxonomy governance environment
[Figure: Taxonomy governance environment. Components: ISO 3166-1 and Other External vocabularies; Other Internal CVs and their Custodians; the Vocabulary Management System; Published Facets and Other Controlled Items; Consuming Applications (Web CMS, Archives, Intranet Search, Intranet Nav., ERMS, ERP, DAM, …); Change Requests & Responses; Notifications.]
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within Taxonomy.
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets published to consuming applications.
CV (Controlled Vocabulary) – The list of values for one facet in the Taxonomy.
Controlled Items
Taxonomy Team will have several items to manage:
Controlled Vocabularies
Metadata Standard
Editorial Rules
Tagger Training Materials (manual and automatic)
Charter, Goals, Performance Measures
Team Processes
Outreach & ROI
Website
Communication plan
Presentations
Announcements
“Roadmap”
Advanced practice, requires long planning horizon for organization's IT
projects
Even small taxonomy teams should develop many of these
items, although not to the same level of formality.
Controlled Vocabularies are not just tabbed lists
Source: NASA Taxonomy Competencies Facet
http://nasataxonomy.jpl.nasa.gov/nascomp/index_tt.htm
Controlled Item: Metadata Specification
Element Name | XML Map | Repeatable | Source | Purpose
General Purpose Metadata
Unique ID | dc:identifier | 1 | System supplied | System identifier to retrieve item.
Owner | dc:creator | ? | System supplied | POC for content maintenance
Title | dc:title | 1 | User supplied | Text search & results display
Date | dc:date | 1 | System supplied | Publish, feature, & review content.
Subject Metadata (Purpose: Search for, browse, group & filter search results.)
Organization | x:corp | * | Corp Classif CV
Asset | x:asset | * | Asset CV
Region/Country | dc:coverage | * | Country CV
Basin/Platform/Well | x:well | * | B/P/Well CV
Content Type | dc:type | ? | Content Types CV
Company/Client/Operator/Partner | x:company | * | Company CV
Project | x:project | * | Project CV
Use Metadata
Discipline | dcTerms:audience | * | Discipline CV | Target, personalize content.
Retention | x:retention | 1 | System supplied | Remove expired content
Legend: ? – 1 or more; * – 0 or more
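A specification in this form can be checked mechanically. This sketch reads the Repeatable codes as the legend defines them ("?" one or more, "*" zero or more) and assumes "1" means exactly one value; the record format (element name mapped to a list of values) and the validate() helper are hypothetical.

```python
# Cardinality per element, from the table above. "1" = exactly one (assumed),
# "?" = one or more, "*" = zero or more, as given in the legend.
SPEC = {
    "dc:identifier": "1", "dc:creator": "?", "dc:title": "1", "dc:date": "1",
    "x:corp": "*", "x:asset": "*", "dc:coverage": "*", "x:well": "*",
    "dc:type": "?", "x:company": "*", "x:project": "*",
    "dcTerms:audience": "*", "x:retention": "1",
}


def validate(record: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element, card in SPEC.items():
        n = len(record.get(element, []))
        if card == "1" and n != 1:
            problems.append(f"{element}: expected exactly 1 value, got {n}")
        elif card == "?" and n < 1:
            problems.append(f"{element}: expected at least 1 value, got {n}")
    for element in sorted(record.keys() - SPEC.keys()):
        problems.append(f"{element}: not in the metadata specification")
    return problems


good = {"dc:identifier": ["id-1"], "dc:creator": ["jdoe"], "dc:title": ["Well Report"],
        "dc:date": ["2005-09-28"], "dc:type": ["Report"], "x:retention": ["7y"]}
bad = {"dc:identifier": ["a", "b"], "dc:title": ["X"], "x:color": ["red"]}
```

Checking values against the named CVs (Country CV, Asset CV, and so on) would be the natural next step, using the same table.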
Controlled Item: Editorial Rules
Akin to “Chicago Manual of Style”
Issues commonly addressed in the rules:
Abbreviations
Ampersands
Capitalization
Continuations (More… or Other…)
Duplicate Terms
Fidelity to External Source
Hierarchy and Polyhierarchy
Languages and Character Sets
Length Limits
“Other” – Allowed or Forbidden?
Plural vs. Singular Forms
Relation Types and Limits
Scope Notes
Serial Comma
Sources of Terms
Spaces
Synonyms and Acronyms
Translations
Term Order (Alphabetic or …)
Term Label Order (Direct vs. Inverted)
What to do when rules conflict – how do people decide which rule is more important?
Rule Name | Editorial Rule
Use Existing Vocabularies | Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands | The character ‘&’ is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
Special Characters | Retain accented characters in Term Labels. Example: Use “España”, not “Espana”.
Serial comma | If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’, which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization | Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”.
… | …
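Rules written this precisely can be partly enforced by software before a change is approved. A minimal sketch of a label checker for the Ampersands, Serial comma, and Capitalization rules; the function is hypothetical, and its title-case test is deliberately simple (acronyms such as "HR" would be flagged, and the Special Characters rule is not checked).

```python
import re

ARTICLES = {"a", "an", "the"}
# 'and' is skipped by the capitalization check because the ampersand rule reports it.
SKIP_FOR_CAPITALS = ARTICLES | {"and"}


def check_label(label: str) -> list[str]:
    """Check one term label against three of the editorial rules above."""
    problems = []
    # Ampersands: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands: use '&' instead of 'and'")
    # Serial comma: the final '&' is not preceded by a comma.
    if ", &" in label:
        problems.append("Serial comma: remove the comma before '&'")
    # Capitalization: title case for every word except articles.
    for word in re.findall(r"[^\W\d_]+", label):
        if word.lower() in SKIP_FOR_CAPITALS:
            continue
        if not (word[0].isupper() and word[1:] == word[1:].lower()):
            problems.append(f"Capitalization: '{word}' should be title case")
            break  # report the first offending word only
    return problems
```

For example, `check_label("Education, Learning, & Employment")` reports only the serial-comma problem, while `check_label("España")` passes because accented letters count as ordinary letters.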
Controlled Item: Training Materials
Staff will require training on:
The UI they use to tag the content
The rules to follow when deciding what codes to apply
The end-effect of the codes they apply
The structure of the taxonomy
Tagging examples come from earlier stages in the taxonomy development process
Indexing rules:
Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.
[Figure: Indexing UI]
Hardcopies of the taxonomy, and yellow highlighters, are helpful during training
Controlled item: Communications Plan
Stakeholders: Who are they and what do they need to know?
Channels: Methods available to send messages to stakeholders.
Need a mix of narrow vs. broad, formal vs. informal, interactive vs. archival, …
Messages: Communications to be sent at various stages of project.
Bulk of the plan is here
Stakeholders | Info. Needed
Project Sponsors | Progress, Issues, Policies
Dept. Reps | Progress, Priorities, …
…
Users | Progress, How-Tos
Vendors | RFPs & SOWs
Channel | Description
Demo | Live, or screen capture for download
Presentation | Tailored message for specific audience
Website | Overview info for all, link to files
Memo | Formal notification
… | …
Trigger | Message Description | From | To | Channel
Initiation | Project overview | Dept. head | All | Memo
… | … | … | … | …
Controlled Item: Team Charter
Taxonomy Team is responsible for maintaining:
The Taxonomy, a multi-faceted classification scheme
Associated materials, including a website providing:
Corporate Metadata Standard
Editorial Style Guide
Taxonomy Training Materials
Team rules and procedures (subject to CIO review)
Team evaluates costs and benefits of suggested changes.
Taxonomy Team will:
Manage relationship between providers of source
vocabularies and consumers of the Taxonomy
Identify new opportunities for use of the Taxonomy
across the Enterprise to improve information
management practices
Promote awareness and use of the Taxonomy
Remaining Controlled Items
Performance Measures to go along with Charter?
Team Processes (see later in this presentation)
Automatic Classifier Training Materials
Website
Presentations and Announcements
Change Request List (see later in this presentation)
“Taxonomy Roadmap”
Advanced practice, requires long planning horizon for
organization's IT projects
Exercise 2: Editorial Rules
Look at sample taxonomy
Think of ways to clean it up and make it ‘better’
Smaller
More professional looking
Easy to use
Write editorial rules for the cleanups.
Provide an example with each rule:
Rule Name | Editorial Rule
Capitalization | All terms in lowercase. “programming”, NOT “Programming”
Exercise 2: Sample Taxonomy
Source: http://del.icio.us/tag/
Exercise 2: Editorial Rules Worksheet
Provide a name for each rule, the rule itself, and an example of the rule of the form “X, not Y”.
Rule Name | Editorial Rule
Capitalization | All terms, except proper nouns, are lowercase. E.g. “programming”, NOT “Programming”. E.g. “Schwab”, not “schwab”.
Organization 1: Taxonomy Governance Team
Organization 1 – Internal portal for Fortune 50 Diversified Multinational.
Executive Sponsor
Advocate for the taxonomy team
Business Lead
Keeps team on track with larger business objectives
Balances cost/benefit issues to decide appropriate levels of effort
Specialists help in estimating costs
Obtains needed resources if those in team can’t accomplish a particular task
Technical Specialist
Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
Helps obtain data from various systems
Content Specialist
Team’s liaison to content creators
Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Small-scale Metadata QA Responsibility
Taxonomy Specialist
Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
Makes edits to taxonomy, installs into system with aid of IT specialist
Content Owner
Reality check on process change suggestions
[Figure labels: Changes; Taxonomy Strategist; Taxonomist; Information Architect 2; Communications Specialist*]
Organization 2: Vocabulary Policy Committee
Organization 2 – A non-profit international
organization. Goal is to improve information
management practices to reduce overlap
between many similar vocabularies across
many systems.
Constraint: Even when number of
vocabularies reduced, some must still have
very close links.
Business Lead
Chairs group.
Assures CVs fit with organization’s larger
information management effort.
Small group management experience,
Information management background.
Vocabulary Custodians (3)
Responsible for content in a specific CV, typically
based on organizational lines.
Team lead experience, detail-oriented. Familiar with
databases and organization processes
Other Relevant Staff
IT Steering Group
Oversees Vocabulary Policy Committee
Stakeholders
Managers of systems using the vocabularies, thus
affected by changes.
They have a lot of visibility into the process.
Control over CV changes is limited, but they
schedule their system’s adoption of changes.
Additional Roles – available during startup of
team, and on an as-needed basis later
Training Representative
Develops communications plan, training materials
Work Practices Representative
Develops processes, monitors adherence
IT Representative
Backups, admin of CV Tool
IT administration experience
Organization 3: Taxonomy Team
Organization 3 – Public catalog site
for Fortune 50 Retailer. Data for
products provided by manufacturers.
Business Lead
Chairs committee, resolves disputes
Marketing Representatives
Provide product marketing expertise
Advocate for product manufacturers
Represent data entry concerns
Website Representative
Provides input on search and navigation impacts
Advocate for customers and other website users
Provides search log and click trail analysis
Taxonomy Specialist
Maintains taxonomy and product catalog
Provides data feeds to drive site
Larger team than many retailers, where a single person is responsible.
A single person still makes the changes here, but there is some oversight.
Likely Changes
Fast-Track Process – A fast-track process exists and is likely to be used very often. A Marketing Representative will ask the Taxonomy Specialist for a change, and the Taxonomy Specialist will get approval from the Website Representative.
What if I have to do it solo?
Realize:
It's not totally solo – IT help,
Graphics & UI help, Business
Goals help, Funding help, Review
& QA help…
You are the general contractor
It needs to be part of your
objectives
Limit the objectives to what can be
achieved by you, and by your
organization
Concentrate:
Resource allocation
(i.e. Manage your time)
Fundamental processes
Query log examination
Error correction procedure
Cherry-pick from Roles
Business Lead – align with
organization goals, get needed
resources, make cost/benefit
decisions, report upstairs
IT Liaison – Work with IT
specialists to get software installed,
logs gathered, content harvested,
etc. Consider impact of changes on
tools and data
Taxonomy / Search Specialist –
analyze behavior and suggest
changes. Implement changes
which pass cost/benefit muster
Website/User Representative –
consider impact of changes on
users and job performance
Communications!!!
Exercise 3: Team & Stakeholder
Identification
Role
Applicable/Modify
Name(s)
Taxonomy Team Members
Team Lead
Taxonomy Editor(s)
Vocabulary Custodian(s)
Liaisons with external vocabularies
Liaisons with applications using vocabularies
User advocate(s)
Training / Communications
IT / Data & System Maintenance
External Stakeholders
Team Supervisory Group
Representatives of external vocabularies
Representatives of consuming applications
Representatives of users
Other representatives of organization
Taxonomy editing tools
[Figure: quadrant chart of taxonomy editing tools, plotting Ability to Execute (low to high) against Completeness of Vision; labeled quadrants include Niche Players and Visionaries]
Immature industry – no vendors in upper-right quadrant!
All upper-end tools are high functionality and high cost.
Most popular taxonomy editor? MS Excel: widely used, cheap, good reporting, bad IDs.
Taxonomy editor functionality requirements
Basic
Term Editing
Hierarchy Browser
Standard and Custom Fields
Standard and Custom Relations
Data Typing and Restrictions
Consistency Enforcement
Flexible Reporting
Flexible Importing?
Midrange
Unicode
Multiple Vocabulary Support
Inter-Vocabulary Relations
Unique IDs (ISO Codes not sufficient)
Advanced
Workflow
Voting
Change Request Mgmt.
Stylistic rules enforcement
Programmability
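"Consistency Enforcement" and "Unique IDs" are exactly where a spreadsheet is weakest. As a rough illustration, here is a minimal Python sketch of checks a tool (or a script run against a spreadsheet export) might enforce before a taxonomy is loaded. The `TermRow` layout (ID, label, parent ID) and `check_consistency` are hypothetical names, not any product's format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TermRow:
    """One row of a hypothetical spreadsheet export of the taxonomy."""
    term_id: str              # stable identifier, never reused
    label: str                # display name; may change over time
    parent_id: Optional[str]  # None marks a top-level term


def check_consistency(rows: list[TermRow]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []

    # Unique IDs: each ID must identify exactly one term.
    for term_id, n in Counter(r.term_id for r in rows).items():
        if n > 1:
            problems.append(f"duplicate ID {term_id!r} ({n} rows)")

    by_id = {r.term_id: r for r in rows}

    # Every parent reference must point at a term that exists.
    for r in rows:
        if r.parent_id is not None and r.parent_id not in by_id:
            problems.append(f"{r.term_id!r} has unknown parent {r.parent_id!r}")

    # Sibling labels must differ (ignoring case), or users cannot tell them apart.
    siblings = Counter((r.parent_id, r.label.casefold()) for r in rows)
    for (parent, label), n in siblings.items():
        if n > 1:
            problems.append(f"label {label!r} appears {n} times under {parent!r}")

    # The hierarchy must be a tree: walking up from any term must never loop.
    for r in rows:
        seen: set[str] = set()
        current: Optional[str] = r.term_id
        while current is not None and current in by_id:
            if current in seen:
                problems.append(f"cycle in hierarchy above {r.term_id!r}")
                break
            seen.add(current)
            current = by_id[current].parent_id
    return problems
```

A spreadsheet will accept all four kinds of error silently; running a check like this on every export before it is imported catches them while they are still cheap to fix.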
Taxonomy governance: Where changes come from
[Figure: sources of taxonomy changes. Query log analysis of End User searches (made through the Application UI and Application Logic), Tagging Staff notes on ‘missing’ concepts (made through the Tagging UI and Tagging Logic), and requests from other parts of the organization flow to the Taxonomy Editor and the Taxonomy Team. Content, Taxonomy, and a Firewall also appear in the diagram.]
Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content
Team considerations
1. Business goals
2. Changes in user experience
3. Retagging cost
Processes
Different organizations will need to
consider their own change
processes.
Organization 1: A custodian is
responsible for the content, but checks
facts with department heads before
making changes
Organization 2: Analysts suggest
changes, editors approve, copyeditors
verify consistency
Organization 3: Marketing reps ask for
a change, taxonomy editor makes
demo, web representative approves it.
Change process MUST also
consider cost of implementing the
change
Retagging data
Reconfiguring auto-classifier
Retraining staff
Changes in user expectations
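A first-cut estimate of these costs can be simple arithmetic. The minimal Python sketch below assumes one blended hourly rate and a fixed review time per retagged item; the field names and any rates you plug in are illustrative assumptions, not figures from this workshop. Changes in user expectations are left out because they rarely reduce to a number.

```python
from dataclasses import dataclass


@dataclass
class ChangeCostInputs:
    """Hypothetical inputs for costing one proposed taxonomy change."""
    items_to_retag: int                # items tagged with the affected term(s)
    minutes_per_item: float            # manual review/retag time per item
    classifier_rework_hours: float     # reconfiguring the auto-classifier
    staff_to_retrain: int
    retraining_hours_per_person: float
    hourly_rate: float                 # one blended rate for all staff (assumption)


def estimate_change_cost(c: ChangeCostInputs) -> dict[str, float]:
    """Rough cost breakdown; changes in user expectations are not modeled."""
    retagging = c.items_to_retag * c.minutes_per_item / 60 * c.hourly_rate
    classifier = c.classifier_rework_hours * c.hourly_rate
    retraining = c.staff_to_retrain * c.retraining_hours_per_person * c.hourly_rate
    return {
        "retagging": retagging,
        "classifier": classifier,
        "retraining": retraining,
        "total": retagging + classifier + retraining,
    }
```

For example, 1200 items at half a minute each, 8 hours of classifier work, and 1 hour of retraining for each of 5 staff, all at a rate of 60, gives retagging 600, classifier 480, retraining 300, and a total of 1380. Even a crude figure like this lets the team compare a change's cost with its benefit.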
Taxonomy Change Cases
Case 1. Renaming a term
Case 2. Adding a new leaf term
Case 3. Inserting a new term
Case 4. Splitting a term
Case 5. Deleting a leaf term or subtree
Case 6. Deleting a term
Case 7. Moving a subtree
Case 8. Merging terms
Case 9. Adding a CV
Case 10. Deleting a CV
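To see why some change cases are cheap and others expensive, here is a minimal Python sketch covering Cases 1, 2, 5, 7, and 8. It assumes content is tagged with stable term IDs rather than with label paths; the `Taxonomy` class, its methods, and the in-memory tag store are hypothetical. Each operation returns the set of documents whose tags must be changed or reviewed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    label: str
    parent: Optional[str] = None  # parent term ID; None for a top-level term


@dataclass
class Taxonomy:
    terms: dict[str, Term] = field(default_factory=dict)
    # Hypothetical tag store: document ID -> set of term IDs applied to it.
    tags: dict[str, set[str]] = field(default_factory=dict)

    def subtree(self, root: str) -> set[str]:
        """IDs of `root` and all of its descendants."""
        found = {root}
        grew = True
        while grew:
            grew = False
            for tid, term in self.terms.items():
                if term.parent in found and tid not in found:
                    found.add(tid)
                    grew = True
        return found

    def docs_tagged_with(self, term_ids: set[str]) -> set[str]:
        return {doc for doc, applied in self.tags.items() if applied & term_ids}

    # Case 1: only the label changes; documents keep pointing at the same ID.
    def rename(self, tid: str, new_label: str) -> set[str]:
        self.terms[tid].label = new_label
        return set()

    # Case 2: a new leaf touches no existing tags (old content may still deserve it).
    def add_leaf(self, tid: str, label: str, parent: Optional[str]) -> set[str]:
        if tid in self.terms:
            raise ValueError(f"ID {tid!r} already in use")
        self.terms[tid] = Term(label, parent)
        return set()

    # Case 5: tags from the deleted subtree are removed; those documents need retagging.
    def delete_subtree(self, root: str) -> set[str]:
        doomed = self.subtree(root)
        affected = self.docs_tagged_with(doomed)
        for doc in affected:
            self.tags[doc] -= doomed
        for tid in doomed:
            del self.terms[tid]
        return affected

    # Case 7: one parent pointer changes; tags are untouched.
    def move_subtree(self, root: str, new_parent: Optional[str]) -> set[str]:
        if new_parent is not None and new_parent in self.subtree(root):
            raise ValueError("cannot move a term beneath its own descendant")
        self.terms[root].parent = new_parent
        return set()

    # Case 8: documents tagged with `source` are retagged to `target`.
    def merge(self, source: str, target: str) -> set[str]:
        if target in self.subtree(source):
            raise ValueError("cannot merge a term into itself or its descendant")
        affected = self.docs_tagged_with({source})
        for doc in affected:
            self.tags[doc].discard(source)
            self.tags[doc].add(target)
        for term in self.terms.values():
            if term.parent == source:
                term.parent = target
        del self.terms[source]
        return affected
```

If content were instead tagged with full label paths such as “Computers : Programming : Languages : Perl”, Cases 1 and 7 would touch every document beneath the changed term, which is one argument for the “Unique IDs” requirement.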
Taxonomy governance: Taxonomy maintenance workflow
[Figure: swimlane diagram with lanes for Analyst, Editor, Copywriter, and Sys Admin, ending in the Taxonomy Tool]
Analyst: Problem? Yes: suggest new name/category
Editor: Review new name. Problem? Yes: back to Analyst; No: pass to Copywriter
Copywriter: Copy edit new name
Sys Admin: Add to enterprise Taxonomy in the Taxonomy Tool
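The workflow can be tracked as a small state machine, so each request always has a known owner and history. The Python below is a minimal sketch: the stage names, the `ChangeRequest` class, and the rule that only the Editor's review can send a name back are assumptions drawn from one reading of the diagram.

```python
from enum import Enum


class Stage(Enum):
    SUGGESTED = "Analyst suggested a new name/category"
    APPROVED = "Editor reviewed the name and found no problem"
    COPY_EDITED = "Copywriter copy edited the name"
    IN_TAXONOMY = "Sys Admin added the name to the enterprise taxonomy"


# Hypothetical routing: which role acts at each stage and where success leads.
NEXT_STEP = {
    Stage.SUGGESTED: ("Editor", Stage.APPROVED),
    Stage.APPROVED: ("Copywriter", Stage.COPY_EDITED),
    Stage.COPY_EDITED: ("Sys Admin", Stage.IN_TAXONOMY),
}


class ChangeRequest:
    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.SUGGESTED
        self.needs_revision = False
        self.history: list[str] = [f"Analyst: suggested {name!r}"]

    def advance(self, role: str, problem: bool = False) -> Stage:
        """Move the request one step; only the Editor's review can send it back."""
        if self.stage not in NEXT_STEP:
            raise ValueError("request is already in the taxonomy")
        if self.needs_revision:
            raise ValueError("waiting for the Analyst to revise the name")
        expected, next_stage = NEXT_STEP[self.stage]
        if role != expected:
            raise PermissionError(f"{role} cannot act now; waiting for {expected}")
        if problem:
            if role != "Editor":
                raise ValueError("only the Editor's review has a 'Problem?' branch")
            self.needs_revision = True
            self.history.append("Editor: problem found, returned to Analyst")
            return self.stage
        self.stage = next_stage
        self.history.append(f"{role}: {next_stage.value}")
        return self.stage

    def revise(self, new_name: str) -> None:
        """Analyst submits a new name after the Editor found a problem."""
        if not self.needs_revision:
            raise ValueError("no revision was requested")
        self.name = new_name
        self.needs_revision = False
        self.history.append(f"Analyst: suggested {new_name!r}")
```

Keeping the `history` list means the team can later see how long requests wait at each step, which feeds the “Measure & Improve” outlook.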
Other change processes
Processes may be diagrammed or written.
Provide an ‘emergency’ change process because it will be needed.
How can emergency changes be requested? Who makes the change and who approves it?
Who are backups for the people when they are out?
Who are escalation points?
Change Request Process should call out decision criteria, e.g.
Cost of retagging
Benefit of change
Conflict with editorial rules
Organization X:
Change Request Process
Anyone can ask a team member for a change. Team members are responsible for figuring out the details and bringing the request to the team for a decision.
Pending changes list for low priority/high cost items.
Change Process
Includes preview of change on site and data mockup
Fast-Track Change Process
Anyone can ask the editor, who gets team leader or deputy approval.
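The decision criteria can be turned into a first-pass triage rule that the team can always override. The Python sketch below is hypothetical: the thresholds, the priority labels, the cost and benefit units, and the choice to send emergencies down the fast track are assumptions, not rules from any of the organizations above.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    RETURN = "return to requester: conflicts with editorial rules"
    FAST_TRACK = "editor makes it with team leader or deputy approval"
    PENDING = "add to pending changes list"
    TEAM_DECISION = "bring to the team for a decision"


@dataclass
class ProposedChange:
    description: str
    retagging_cost: float   # estimated effort, e.g. staff hours (assumed unit)
    benefit: float          # estimated benefit in the same unit (assumed)
    priority: str           # "low", "normal", "high", or "emergency"
    conflicts_with_editorial_rules: bool


def triage(change: ProposedChange, fast_track_limit: float = 1.0) -> Route:
    """First-pass routing based on the three decision criteria."""
    if change.conflicts_with_editorial_rules:
        return Route.RETURN
    # Assumption: emergencies and near-free changes (labels, synonyms) go fast-track.
    if change.priority == "emergency" or change.retagging_cost <= fast_track_limit:
        return Route.FAST_TRACK
    # Low priority, or cost exceeding benefit: park it on the pending list.
    if change.priority == "low" or change.retagging_cost > change.benefit:
        return Route.PENDING
    return Route.TEAM_DECISION
```

Writing the rule down, even this crudely, makes the criteria visible to requesters and gives the team something concrete to revise.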
Fundamental Processes & Outlooks
Two fundamental processes every organization
should implement to maintain its metadata and
taxonomies:
Query log / Click trail examination
Error Correction
What are the key outlooks a taxonomist should try to
instill in their organization?
Integrated approach to Taxonomy, Metadata, Search,
and UI
Measure & Improve Mindset
Fundamental process #1 – Query log examination
How can we characterize users and what they are looking for?
Query Log & Click Trail Examination
• Only 30-40% of organizations interested in Taxonomy Governance examine query logs*
• Basic reports provide plenty of real value
• Greatest value comes from identifying a person as responsible for search quality, and from starting a “Measure & Improve” mindset
• Greatest challenges: getting a person assigned (≥ 10%) and getting logs turned back on
UltraSeek Reporting
Top queries
Queries with no results
Queries with no click-through
Most requested documents
Query trend analysis
Complete server usage summary
Click Trail Packages
iWebTrack
NetTracker
OptimalIQ
SiteCatalyst
Visitorville
WebTrends
* Source: Metadata Maturity Model Presentation, Ron Daniel, ESS’05
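Several of the basic reports listed above (top queries, queries with no results, queries with no click-through) can be produced from a raw log with a few counters. This Python sketch assumes a hypothetical log record holding the query text, the number of results, and whether the user clicked a result; real search engines and click-trail packages each use their own log formats.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SearchLogEntry:
    """Hypothetical log record; real search engines use their own formats."""
    query: str
    result_count: int
    clicked: bool  # did the user click any result for this search?


def normalize(query: str) -> str:
    """Case-fold and collapse whitespace so trivial variants count together."""
    return " ".join(query.casefold().split())


def basic_reports(entries: Iterable[SearchLogEntry], top_n: int = 10) -> dict[str, list[tuple[str, int]]]:
    searches = Counter()
    no_results = Counter()
    no_clicks = Counter()
    for e in entries:
        q = normalize(e.query)
        searches[q] += 1
        if e.result_count == 0:
            no_results[q] += 1       # candidate missing terms or synonyms
        elif not e.clicked:
            no_clicks[q] += 1        # results shown, but none looked useful
    return {
        "top_queries": searches.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_clicks.most_common(top_n),
    }
```

A search that returns nothing is counted only under “no results”, so the two failure reports do not overlap. Queries that keep appearing in either report are natural candidates for the small taxonomy changes (labels, synonyms) and new “best bets” content described earlier.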
Fundamental process #2 – Error correction
Errors will happen, and some will be found. What are you going
to do about them?
Tagging errors, content errors, taxonomy errors, …
Define an error correction process.
You have an error correction
process. Would you hate to
see it on paper?
Process will accommodate questions like:
Who looks at it? Is it an error? What are the costs to correct vs. not
correct? Does the correction need to be scheduled? etc.
Once a tagging error is corrected, NEVER lose that fact.
Manually reviewed pages are vital for training automatic classifiers
Has implications for metadata specification and review procedures
Over time, multiple error detection methods will be defined
e.g. Statistical sampling of newly added pages
Gradually, additional error correction processes may be defined to deal
with particular types of errors
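Two of the practices above, never losing a corrected tagging error and statistical sampling of newly added pages, are easy to sketch. The Python below is a minimal illustration under assumed names (`TagCorrection`, `CorrectionLog`, `sample_for_review`); the 5% default sampling rate is an arbitrary placeholder, not a recommendation from this workshop.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TagCorrection:
    """One corrected tagging error; frozen so a record cannot be altered later."""
    doc_id: str
    removed: frozenset[str]   # term IDs taken off the document
    added: frozenset[str]     # term IDs put on the document
    reviewer: str
    corrected_at: datetime


class CorrectionLog:
    """Append-only history: corrections are recorded, never edited or deleted."""

    def __init__(self) -> None:
        self._entries: list[TagCorrection] = []

    def record(self, doc_id: str, removed: set[str], added: set[str], reviewer: str) -> TagCorrection:
        entry = TagCorrection(doc_id, frozenset(removed), frozenset(added), reviewer,
                              datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def history(self, doc_id: str) -> list[TagCorrection]:
        return [e for e in self._entries if e.doc_id == doc_id]

    def reviewed_doc_ids(self) -> set[str]:
        """Manually reviewed documents, e.g. as training data for an auto-classifier."""
        return {e.doc_id for e in self._entries}

    def __len__(self) -> int:
        return len(self._entries)


def sample_for_review(new_doc_ids: list[str], rate: float = 0.05,
                      seed: Optional[int] = None) -> list[str]:
    """Simple random sample of newly added documents for manual error detection."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    ids = sorted(new_doc_ids)
    if not ids:
        return []
    k = max(1, round(len(ids) * rate))
    return random.Random(seed).sample(ids, k)
```

Passing a fixed `seed` makes a sample reproducible, which helps when two reviewers need to check the same set of pages.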
Fundamental Outlooks
Measure & Improve Mindset
Query logs and click trails are prime example
Next place to instrument: Error correction and error
detection processes
Integrated handling of Taxonomy, Metadata, UI, &
Search
To be most effective, these must work together
Governance structure must help that happen
Cross-functional team structure is a start
Actions to define taxonomy governance
Initial vocabularies should be selected for stability as
well as utility.
Custodians of shared vocabularies must be identified,
educated re. impacts of changes.
Group of custodians and stakeholders must be
established.
(Simple) System for sharing the CVs and tracking the
update process must be established.
Exercise 4: Self-Diagnosis
1. Does your organization know what it is doing, or wants to be doing, around search & taxonomy yet?
2. Is the cost basis for the taxonomy ROI clear to you?
3. Is the benefits basis for the taxonomy ROI clear to you?
4. Is the cost basis for the taxonomy ROI clear to your CFO?
5. Is the benefits basis for the taxonomy ROI clear to your CFO?
6. Do you know how content will be tagged?
7. Do you know how tagged content will be displayed to users?
8. Do you know how users will fetch the content?
9. Do users know how they should report errors in the tagging?
10. Do you know what information will be logged for later analysis?
11. Do you know what information has to be reported to management to justify the taxonomy team?
12. Does management expect the taxonomy team to justify its existence?
13. Is your organization planning a tightly focused taxonomy effort?
14. Is your organization planning a credible ‘Enterprise Taxonomy Strategy’?
15. Does your organization expect its taxonomies to change frequently?
16. Has your organization identified some facets as stable and some facets as volatile?
17. Does your organization have a plan for retagging data when the taxonomy is changed?
18. Do you have an identified taxonomy “team” with at least one person?
19. Is there at least one person working on taxonomy/metadata/search more than ½ time?
20. Does the team contain members who represent search, UI, and metadata tagging?
21. Does the organization have any hiring and training criteria for taxonomy, metadata, and search positions?
22. Does the team maintain Editorial Rules?
23. Does the team maintain a corporate metadata specification?
24. Does the team maintain educational materials?
25. Does the team have a communications plan?
26. Does the team examine query logs?
27. Does the team examine click trails?
28. Does the team have a documented error correction process?
29. Does the organization have a procedure to locate ROT (Redundant, Obsolete, or Trivial content)?
30. Does the organization have any qualitative or quantitative measures of data quality?
31. Do you use a tool other than MS Excel for editing and maintaining the Taxonomy?
32. Were taxonomy, metadata, search, or content management tools purchased with money other than “use it or lose it” funds?
Taxonomy Strategies LLC
Contact Info
Ron Daniel
925-368-8371
[email protected]
Sept. 28, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.