Transcript Slide 1

Taxonomy Strategies LLC
Assorted Slides on Taxonomy &
Metadata Governance
Ron Daniel, Jr.
Copyright 2009 Taxonomy Strategies LLC. All rights reserved.
Creating a Governance Structure for the
Ongoing Maintenance of the Taxonomy
 Taxonomies must change if they are to remain
relevant. But what will it cost to make those
changes to the taxonomy and to the data which is
categorized by it? Organizations must have
appropriate maintenance processes so that the
taxonomy changes are based on rational cost/benefit
decisions, without becoming mired in endless
paperwork. This interactive workshop will highlight
the framework for creating taxonomy governance
teams and what their specific responsibilities should
be. Special attention will be given to defining
maintainable taxonomies and metadata for achieving
business needs.
Taxonomy Strategies LLC
The business of organized information
2
Agenda
10:15  Introduction
10:30  Background
10:35  Maintainable Taxonomies
10:45  Maintainable Metadata
10:50  ROI Estimation
11:00  Governance Environment
11:10  Controlled Items
11:30  Team Structures
11:45  Change Process
12:00  Exercises
12:15  Adjourn
Three Problems
Taxonomy development and maintenance is the LEAST of three
problems:
 The Taxonomy Problem: How are we going to build and maintain the lists
of pre-defined values that can go into some of the metadata elements?
 The Tagging Problem: How are we going to populate metadata elements
with complete and consistent values?
 What can we expect to get from automatic classifiers? What kind of error
detection and error correction procedures do we need? What fields do we need?
 The ROI (Return On Investment) Problem: How are we going to use
content, metadata, and vocabularies in applications to obtain business
benefits?
 More sales? Lower support costs? Greater productivity? Risk avoidance?
 How much content? How big an operating budget? How to expose to users?
Tolerance for poor data quality?
Business Goals and Cultural Factors are major influences on
tagging and taxonomy. These must be acknowledged at the start
to avoid rework.
There’s more to maintaining the Taxonomy
than just maintaining the Taxonomy
What must change when the Taxonomy changes?
 The master copy of the taxonomy.
 The data tagged with the taxonomy?
 The user interface which uses the taxonomy?
 Backend system software which uses the taxonomy?
 The training set for automatic classifiers?
 The educational material for users, catalogers, programmers, etc.?
 The information sent to downstream users of the taxonomy?
   The versions of the taxonomy distributed to others.
   The list of changes.
 Announcements for stakeholders?
This is a set of items that might be maintained by the taxonomy team and need to be updated. Few groups will have all of these under maintenance by the taxonomy team.
Metadata and Taxonomy
Metadata
Field        Data Type   Example
Title        String      “The Perl Directory”
Creator      String      The Perl Foundation
Identifier   URL         http://www.perl.org/
Date         DateTime    Jan. 12, 2006
Subject      List        Computers : Programming : Languages : Perl   (Taxonomy)
Big simple hierarchy has lots of nodes and is a lot of work to maintain.
DMOZ: A worst case example of a unified
‘subject’
 DMOZ has over 600k categories
 Most are a combination of common facets – Geography,
Organization, Person, Document Type, …
 (e.g.) Top: Regional: Europe: Spain: Travel and Tourism: Travel Guides
 (BTW – DMOZ Governance model is out of whack)
[Figure: sample DMOZ category paths from branches such as Business, Regional, Reference, Science, and Society, with each node classified by facet.]
Facet                     Count
Competency (discipline)   11
Geography                 9
Audience                  9
Topic                     7
Organization              5
Doc Type                  4
Industry                  4
Process                   4
 If you want to get technical here,
you can explain that lots of big
hierarchies are pre-coordinated
combinations of items that could
come from separate facets. This
introduces some arbitrary
choices (do we list content type
first and location second, or …).
It also leads to a lot of repeated
substructure which means there
have to be edits in many places
to make what is in concept a
pretty small change.
The power of taxonomy facets
 Categorize in multiple, independent, categories.
 Allow combinations of categories to narrow the choice of items.
 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
 Easier to maintain
 Can be easier to navigate
Main Ingredients (10): Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
Meal Type (6): Breakfast, Brunch, Lunch, Supper, Dinner, Snack
Cuisines (11): African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
Cooking Methods (15): Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry
42 values to maintain (10+6+11+15)
9900 combinations (10x6x11x15)
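The arithmetic behind the facet comparison can be checked with a short sketch. The facet sizes are taken from the recipe example on this slide; the function names are illustrative only.

```python
from math import prod

# Facet sizes from the recipe example on this slide.
FACET_SIZES = {
    "Main Ingredients": 10,
    "Meal Type": 6,
    "Cuisines": 11,
    "Cooking Methods": 15,
}


def values_to_maintain(facet_sizes):
    """Maintenance burden grows with the SUM of the facet sizes."""
    return sum(facet_sizes.values())


def combinations(facet_sizes):
    """Discriminatory power grows with the PRODUCT of the facet sizes."""
    return prod(facet_sizes.values())


print(values_to_maintain(FACET_SIZES), combinations(FACET_SIZES))
```

The same functions reproduce the "4 facets of 10 nodes" claim: 40 values to maintain against 10^4 combinations.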
How do I get a good Taxonomy? – Seven
practical rules
1) Incremental, extensible process that identifies and enables
users, and engages stakeholders.
2) Quick implementation that provides measurable results as
quickly as possible.
3) Not monolithic—has separately maintainable facets.
4) Re-uses existing IP as much as possible.
5) A means to an end, and not the end in itself.
6) Not perfect, but it does the job it is supposed to do—such as
improving search and navigation.
7) Improved over time, and maintained.
Some vocabulary construction rules
 Don’t just have names, also have identifiers
 This will reduce retagging later when names change
 When tagging content, use the most specific code. Let software handle the
hierarchy.
 Bonus: Use URIs for node IDs & publish on the web (See LINKED DATA
in the futures chapter)
 Develop scope notes
 Not just a definition, also say what kind of content the node applies to
 Metadata specification must state the vocabulary for an element.
 Gather data from multiple sources
 Talk with users and experts
 Analyze query logs and content
 Choose and arrange terms
 Test and finalize first version
 Shift into maintenance mode
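A minimal sketch of the first rule (identifiers as well as names, tag with the most specific code, let software handle the hierarchy). The IDs T1 to T4, the `Term` class, and the helper names are hypothetical; the labels reuse the "Computers : Programming : Languages : Perl" example from earlier in the deck.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Term:
    term_id: str                      # stable identifier, never shown to end users
    label: str                        # display name, free to change
    parent_id: Optional[str] = None   # None marks the top of a facet


# Hypothetical vocabulary; only the labels come from the deck.
VOCAB: Dict[str, Term] = {
    "T1": Term("T1", "Computers"),
    "T2": Term("T2", "Programming", "T1"),
    "T3": Term("T3", "Languages", "T2"),
    "T4": Term("T4", "Perl", "T3"),
}


def ancestors_and_self(term_id: str, vocab: Dict[str, Term]) -> List[str]:
    """Walk from the most specific term up to the root of its facet."""
    chain = []
    current: Optional[str] = term_id
    while current is not None:
        chain.append(current)
        current = vocab[current].parent_id
    return chain


def display_path(term_id: str, vocab: Dict[str, Term]) -> str:
    """Software, not the tagger, expands an ID into a readable path."""
    ids = reversed(ancestors_and_self(term_id, vocab))
    return " : ".join(vocab[i].label for i in ids)


# Content carries only the most specific ID.
item_tags = {"doc-1": ["T4"]}
print(display_path(item_tags["doc-1"][0], VOCAB))
```

Renaming a node changes `label` in one place; the stored tag `T4` stays valid, which is the retagging saving the slide refers to.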
What do I do with all these facets?
 Either expose them directly in
the user interface (post-coordination)
or
 Combine them in a minimal
hierarchy (pre-coordination)
 Post-coordination takes
software support, which may be
fancy or basic.
 How many facets?
(See elsewhere)
Maintainable Metadata
 Design metadata specification for future changes
 Lessons from the Dublin Core
 Provide metadata tagging and storage that will deal
with changes
Dublin Core: A little more complicated over time
Elements
1. Identifier
2. Title
3. Creator
4. Contributor
5. Publisher
6. Subject
7. Description
8. Coverage
9. Format
10. Type
11. Date
12. Relation
13. Source
14. Rights
15. Language
Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CTDF
Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
Design Metadata Specification for future
changes
 Degree of future changes will depend on organization size,
sophistication of use, number of repositories and amount of
content.
 Don’t over-engineer
 For all organizations: start with the Dublin Core with a few
additions and deletions for specific needs
 At large/sophisticated organizations:
 “Refinements” will be unavoidable in the future.
 Start with “DatePublished” so that later additions of “DateModified”,
“DateApproved”, “DateVerified”, etc. fit in easily.
 Identify broad “integration metadata” vs. division-specific fields.
Coordinate with others to set up a working understanding of a
corporate multi-level metadata standard.
Provide metatagging and storage that will
deal with changes
 Tag with identifiers, not names.
 This will reduce retagging later when names change
 Not good if people need to view raw tagging, but usually software
will be involved to show labels.
 When tagging content, use the most specific concept. Let
software handle the hierarchy.
 Metadata is easier to manage if it is stored in a central
repository, instead of spread out in the individual files.
 Exception – when sending files out to other systems (e.g. photo
metadata)
 Warning – ‘metadata repositories’ are usually a different class of
software than what we are discussing.
Fundamentals of taxonomy ROI
 Tagging content using a taxonomy is a cost, not a
benefit.
 There is no benefit without exposing the tagged
content to users in some way that cuts costs,
improves revenues, reduces risk, or achieves some
other clear business goal.
 Putting taxonomy into operation requires UI
changes and/or backend system changes, as well
as data changes.
 You need to determine those changes, and their
costs, as part of the ROI.
Key Factors in ROI
 Breadth
 “How many people will metadata affect?”
 Repeatability
 “How many times a day will they use it?”
 Cost/Benefit
 “Is this a costly effort with little or no benefits?”
How to estimate costs — Tagging
Consider complexity of facet and ambiguity of content to estimate time per value.
Taxonomy Facet        Hier?  Typical CV Size  Time/Value (min)  Avg # values/Item  $/Min   Cost/Element
Audience              N      10               0.25              2                  $0.42   $0.21
Content Type          N      20               0.25              1                  $0.42   $0.11
Organizational Unit   Y      50               0.5               2                  $0.42   $0.42
Products & Services   Y      500              1.5               4                  $0.42   $2.52
Geographic Region     Y      100              0.5               2                  $0.42   $0.42
Broad Topics          Y      400              2                 4                  $0.42   $3.36
TOTALS                       1080             5                 15                         $7.04
Is this field worth the cost?
Estimated cost of tagging one item. This can be reduced with automation, but cannot be eliminated.
Inspired by: Ray Luoma, BAU Solutions
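The Cost/Element column is time per value times average values per item times cost per minute. A small sketch reproducing the table follows; the function and constant names are illustrative, and rounding each row to the cent before summing is an assumption that happens to match the $7.04 total shown.

```python
from decimal import Decimal, ROUND_HALF_UP

COST_PER_MINUTE = Decimal("0.42")

# (facet, minutes per value, average values per item), copied from the table.
FACETS = [
    ("Audience", Decimal("0.25"), 2),
    ("Content Type", Decimal("0.25"), 1),
    ("Organizational Unit", Decimal("0.5"), 2),
    ("Products & Services", Decimal("1.5"), 4),
    ("Geographic Region", Decimal("0.5"), 2),
    ("Broad Topics", Decimal("2"), 4),
]


def cost_per_element(minutes_per_value, values_per_item, rate=COST_PER_MINUTE):
    """Tagging cost for one facet on one item, rounded to the cent."""
    raw = minutes_per_value * values_per_item * rate
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def cost_per_item(facets=FACETS):
    """Sum of the per-facet costs: the estimated cost of tagging one item."""
    return sum((cost_per_element(m, n) for _, m, n in facets), Decimal("0"))


for name, minutes, values in FACETS:
    print(f"{name:<20} ${cost_per_element(minutes, values)}")
print(f"{'TOTAL':<20} ${cost_per_item()}")
```

Dropping a facet from `FACETS` shows directly what that field costs per item, which is the "Is this field worth the cost?" question on the slide.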
How to estimate costs — Assumptions
Your numbers will vary.
ASSUMPTIONS
Enterprise SW License    $100,000
Maintenance/Support      15%
SW Implementation        200%
Legacy Content Items     100,000
Content Growth Rate      15%
Tagging/Item             $7.04
Enterprise Taxonomy      $100,000
How to estimate costs — Total cost of ownership (TCO)
Description           Year 1        Year 2      Year 3      Year 4      Year 5
SW
  Licenses            $100,000
  Maintenance                       $15,000     $15,000     $15,000     $15,000
  Implementation      $200,000
  App Tech Support                  $30,000     $30,000     $30,000     $30,000
Tagging
  Legacy Content      $704,000
  Ongoing                           $105,600    $121,440    $139,656    $160,604
Taxonomy
  Creation            $100,000
  Maintenance                       $15,000     $15,000     $15,000     $15,000
TOTAL                 $1,103,500    $165,600    $181,440    $199,656    $220,604
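The ongoing tagging row can be reproduced from the stated assumptions (100,000 legacy items, 15% content growth, $7.04 per item). This is a sketch under one reading of the table, namely that each year's new items are 15% of the running total; the function name is illustrative.

```python
from decimal import Decimal


def ongoing_tagging_costs(legacy_items, growth_rate, cost_per_item, years):
    """Cost of tagging each year's new content, assuming compound growth."""
    costs = []
    total_items = Decimal(legacy_items)
    for _ in range(years):
        new_items = total_items * growth_rate   # content added this year
        costs.append(new_items * cost_per_item)
        total_items += new_items
    return costs


legacy_cost = Decimal(100_000) * Decimal("7.04")
yearly = ongoing_tagging_costs(100_000, Decimal("0.15"), Decimal("7.04"), years=4)
print(legacy_cost, [round(c) for c in yearly])
```

Under these assumptions the legacy figure is 704,000. The deck also shows 703,500 for the same item, which matches the unrounded per-item cost of $7.035.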
Sample ROI Calculations
Description                      Year 1         Year 2      Year 3       Year 4       Year 5
Costs
Software Licenses/Maintenance    $100,000       $15,000     $15,000      $15,000      $15,000
Implementation/Support           $200,000       $30,000     $30,000      $30,000      $30,000
Taxonomy Creation/Maintenance    $100,000       $15,000     $15,000      $15,000      $15,000
Legacy/Ongoing Tagging           $703,500       $105,600    $121,440     $139,656     $160,604
Benefits
Productivity increases           $-             $125,000    $1,250,000   $1,250,000   $1,250,000
Service efficiency gains         $-             $129,600    $1,296,000   $1,296,000   $1,296,000
Yearly Net Benefits              $(1,103,500)   $89,000     $2,364,560   $2,346,344   $2,325,396
Payback period                   1.4            Years until Benefits = Costs
Ongoing cost of tagging due to 15% content growth.
Inspired by: Todd Stephens, Dublin Core Global Corporate Circle
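The Yearly Net Benefits row is benefits minus costs for each year. A short sketch using the figures from the table follows; the function names are illustrative. It also finds the first year in which cumulative net benefit turns positive, which is only one possible way to read "payback"; the slide does not say how its 1.4 figure was derived.

```python
from itertools import accumulate

COSTS = {
    "Software Licenses/Maintenance": [100_000, 15_000, 15_000, 15_000, 15_000],
    "Implementation/Support": [200_000, 30_000, 30_000, 30_000, 30_000],
    "Taxonomy Creation/Maintenance": [100_000, 15_000, 15_000, 15_000, 15_000],
    "Legacy/Ongoing Tagging": [703_500, 105_600, 121_440, 139_656, 160_604],
}
BENEFITS = {
    "Productivity increases": [0, 125_000, 1_250_000, 1_250_000, 1_250_000],
    "Service efficiency gains": [0, 129_600, 1_296_000, 1_296_000, 1_296_000],
}


def yearly_net(costs, benefits, years=5):
    """Benefits minus costs for each year."""
    return [
        sum(row[y] for row in benefits.values()) - sum(row[y] for row in costs.values())
        for y in range(years)
    ]


def break_even_year(net):
    """First year (1-based) whose cumulative net benefit is non-negative."""
    for year, running in enumerate(accumulate(net), start=1):
        if running >= 0:
            return year
    return None


net = yearly_net(COSTS, BENEFITS)
print(net, break_even_year(net))
```

With these numbers the cumulative total is still negative at the end of Year 2 and turns positive during Year 3.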
Where do the benefits come from?
Common taxonomy ROI scenarios
 Catalog site - ROI based on increased sales through improved:
 Product findability
 Product cross-sells and up-sells
 Customer loyalty
 Call center - ROI based on cutting costs through:
 Fewer customer calls due to improved website self-service
 Faster, more accurate CSR responses through better information access
 Compliance – ROI based on:
 Avoiding penalties for breaching regulations
 Following required procedures (e.g. Medical claims)
 Knowledge worker productivity - ROI based on cutting costs through:
 Less time searching for things
 Less time recreating existing materials, with knock-on benefits of less confusion and
reduced storage and backup costs
 Executive mandate
 No ROI at the start, just someone with a vision and the budget to make it happen
Generic, yet Important, Advice
 It’s not about the tools. It’s not about the taxonomy.
It’s about the business goals and the processes
people use to meet those goals.
 Metrics are grossly underused in metadata and
search.
Taxonomy governance overview
 Taxonomy governance can be viewed as a standards
process
Closely linked to organizational metadata standard
Taxonomy must evolve, but in predictable way
 Take tips from other standards efforts
Team structure, with an appeals process
 Taxonomy stewardship is part-time role at most organizations
 Team needs to make decisions based on costs and benefits
Documentation and educational material on Taxonomy and Metadata
Announcements
Comment-handling responsibilities (part of error-correction process)
Issue Logs
Release Schedule
These practices are in rough order of implementation.
Taxonomy governance environment
[Figure: Taxonomy Governance Environment. Labeled elements: ISO 3166-1 and other external vocabularies; other internal vocabularies; Custodians; Vocabulary Management System holding the CVs; Other Controlled Items; Published Facets; Consuming Applications (Web CMS, Archives, Intranet Search, ERMS, ERP, Intranet Nav., DAM, …); Change Requests & Responses; Notifications.]
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within Taxonomy
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets published to consuming applications
CV (Controlled Vocabulary) – The list of values for one facet in the Taxonomy.
Controlled Items
 Taxonomy Team will have several items to manage:
   Controlled Vocabularies
   Metadata Standard
   Editorial Rules
   Tagger Training Materials (manual and automatic)
   Charter, Goals, Performance Measures
   Team Processes
   Outreach & ROI
     Website
     Communication plan
     Presentations
     Announcements
   “Roadmap”
     Advanced practice, requires long planning horizon for organization's IT projects
 Even small taxonomy teams should develop many of these items, although not to the same level of formality.
Controlled Vocabularies are not just tabbed
lists
Source: NASA Taxonomy Competencies Facet
http://nasataxonomy.jpl.nasa.gov/nascomp/index_tt.htm
Controlled Item: Metadata Specification
Element Name                      XML Map            Repeatable  Source             Purpose
General Purpose Metadata
Unique ID                         dc:identifier      1           System supplied    System identifier to retrieve item.
Owner                             dc:creator         ?           System supplied    POC for content maintenance
Title                             dc:title           1           User supplied      Text search & results display
Date                              dc:date            1           System supplied    Publish, feature, & review content.
Subject Metadata (shared purpose: Search for, browse, group & filter search results.)
Organization                      x:corp             *           Corp Classif CV
Asset                             x:asset            *           Asset CV
Region/Country                    dc:coverage        *           Country CV
Basin/Platform/Well               x:well             *           B/P/Well CV
Content Type                      dc:type            ?           Content Types CV
Company/Client/Operator/Partner   x:company          *           Company CV
Project                           x:project          *           Project CV
Use Metadata
Discipline                        dcTerms:audience   *           Discipline CV      Target, personalize content.
Retention                         x:retention        1           System supplied    Remove expired content
Legend:
? – 1 or more
* - 0 or more
Controlled Item: Editorial Rules
 Akin to “Chicago Manual of Style”
 Issues commonly addressed in the rules:
   Abbreviations
   Ampersands
   Capitalization
   Continuations (More… or Other…)
   Duplicate Terms
   Fidelity to External Source
   Hierarchy and Polyhierarchy
   Languages and Character Sets
   Length Limits
   “Other” – Allowed or Forbidden?
   Plural vs. Singular Forms
   Relation Types and Limits
   Scope Notes
   Serial Comma
   Sources of Terms
   Spaces
   Synonyms and Acronyms
   Translations
   Term Order (Alphabetic or …)
   Term Label Order (Direct vs. Inverted)
 What to do when rules conflict – how do people decide which rule is more important?
Rule Name: Editorial Rule
Use Existing Vocabularies: Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands: The character '&' is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
Special Characters: Retain accented characters in Term Labels. Example: Use “España”, not “Espana”.
Serial comma: If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’ which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization: Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”.
…
Controlled Item: Training Materials
 Staff will require training on
   The UI they use to tag the content
   The rules to follow when deciding what codes to apply
   The end-effect of the codes they apply
   The structure of the taxonomy
 Tagging examples come from earlier stages in taxonomy development process
 Hardcopies of the taxonomy, and yellow highlighters, are helpful during training
Indexing rules
Rule: Description
Specificity rule: Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule: All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule: Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule: Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.
[Figure: Indexing UI]
Controlled item: Communications Plan
 Stakeholders: Who are they and what do they need to know?
 Channels: Methods available to send messages to stakeholders.
   Need a mix of narrow vs. broad, formal vs. informal, interactive vs. archival, …
 Messages: Communications to be sent at various stages of project.
   Bulk of the plan is here
Stakeholders       Info. Needed
Project Sponsors   Progress, Issues, Policies
Dept. Reps         Progress, Priorities, …
Users              Progress, How-Tos
Vendors            RFPs & SOWs
…                  …
Channel        Description
Demo           Live, or screen capture for download
Presentation   Tailored message for specific audience
Website        Overview info for all, link to files
Memo           Formal notification
…              …
Trigger      Msg. Descrip       From         To    Chan.
Initiation   Project overview   Dept. head   All   Memo
…            …                  …            …     …
Controlled Item: Team Charter
 Taxonomy Team is responsible for maintaining:
   The Taxonomy, a multi-faceted classification scheme
   Associated materials, including a website providing:
     Corporate Metadata Standard
     Editorial Style Guide
     Taxonomy Training Materials
     Team rules and procedures (subject to CIO review)
 Team evaluates costs and benefits of suggested changes.
 Taxonomy Team will:
   Manage relationship between providers of source vocabularies and consumers of the Taxonomy
   Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
   Promote awareness and use of the Taxonomy
Remaining Controlled Items
 Performance Measures to go along with Charter?
 Team Processes (see later in this presentation)
 Automatic Classifier Training Materials
 Website
 Presentations and Announcements
 Change Request List (see later in this presentation)
 “Taxonomy Roadmap”
 Advanced practice, requires long planning horizon for
organization's IT projects
Exercise 2: Editorial Rules
 Look at sample taxonomy
 Think of ways to clean it up and make it ‘better’
   Smaller
   More professional looking
   Easy to use
 Write editorial rules for the cleanups.
 Provide an example with each rule:
Rule Name        Editorial Rule
Plumem           Lorne ipso ernum de jura fino el
Symosyit Esr     Dirgin a periso de forestima
Himerisf         Faleoin fi ribska firn eowkds
Capitalization   All terms in lowercase. “programming”, NOT “Programming”
Exercise 2: Sample Taxonomy
Source: http://del.icio.us/tag/
Exercise 2: Editorial Rules Worksheet
Provide a name for each rule, the rule itself, and an example of the rule of the form “X, not Y”.
Rule Name        Editorial Rule
Plurals          Use plural form of names, not singular.
Capitalization   All terms, except proper nouns, are lowercase. E.g. “programming”, NOT “Programming”. E.g. “Schwab”, not “schwab”.
Organization 1: Taxonomy Governance Team
Organization 1 – Internal portal for Fortune 50 Diversified Multinational.
 Executive Sponsor
   Advocate for the taxonomy team
 Business Lead
   Keeps team on track with larger business objectives
   Balances cost/benefit issues to decide appropriate levels of effort. Specialists help in estimating costs
   Obtains needed resources if those in team can’t accomplish a particular task
 Technical Specialist
   Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
   Helps obtain data from various systems
 Content Specialist
   Team’s liaison to content creators
   Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
   Small-scale Metadata QA Responsibility
 Taxonomy Specialist
   Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
   Makes edits to taxonomy, installs into system with aid of IT specialist
 Content Owner
   Reality check on process change suggestions
Changes
Taxonomy Strategist
Taxonomist
Information Architect 2
Communications Specialist*
Organization 2: Vocabulary Policy Committee
 Organization 2 – A non-profit international organization. Goal is to improve information management practices to reduce overlap between many similar vocabularies across many systems.
Constraint: Even when number of vocabularies reduced, some must still have very close links.
 Business Lead
   Chairs group.
   Assures CVs fit with organization’s larger information management effort.
   Small group management experience, Information management background.
 Vocabulary Custodians (3)
   Responsible for content in a specific CV, typically based on organizational lines.
   Team lead experience, detail-oriented. Familiar with databases and organization processes
Other Relevant Staff
 IT Steering Group
   Oversees Vocabulary Policy Committee
 Stakeholders
   Managers of systems using the vocabularies, thus affected by changes.
   They have a lot of visibility into the process.
   Control over CV changes is limited, but they schedule their system’s adoption of changes.
 Additional Roles – available during startup of team, and on an as-needed basis later
   Training Representative
     Develops communications plan, training materials
   Work Practices Representative
     Develops processes, monitors adherence
 IT Representative
   Backups, admin of CV Tool
   IT administration experience
Organization 3: Taxonomy Team
 Organization 3 – Public catalog site for Fortune 50 Retailer. Data for products provided by manufacturers.
 Business Lead
   Chairs committee, resolves disputes
 Marketing Representatives
   Provide product marketing expertise
   Advocate for product manufacturers
   Represent data entry concerns
 Website Representative
   Provides input on search and navigation impacts
   Advocate for customers and other website users
   Provides search log and click trail analysis
 Taxonomy Specialist
   Maintains taxonomy and product catalog
   Provides data feeds to drive site
Likely Changes
 Fast-Track Process – A fast-track process exists, likely to be used very often. Representative will ask Taxonomy Specialist for a change and he will get approval from Website Representative.
Larger team than many retailers, where a single person is responsible.
A single person still makes the changes here, but there is some oversight.
What if I have to do it solo?
Realize:
 It’s not totally solo – IT help,
Graphics & UI help, Business
Goals help, Funding help, Review
& QA help…
 You are the general contractor
 It needs to be part of your
objectives
 Limit the objectives to what can be
achieved by you, and by your
organization
Concentrate:
 Resource allocation
 (i.e. Manage your time)
 Fundamental processes
 Query log examination
 Error correction procedure
 Cherry-pick from Roles
Business Lead – align with
organization goals, get needed
resources, make cost/benefit
decisions, report upstairs
IT Liaison – Work with IT
specialists to get software installed,
logs gathered, content harvested,
etc. Consider impact of changes on
tools and data
Taxonomy / Search Specialist –
analyze behavior and suggest
changes. Implement changes
which pass cost/benefit muster
Website/User Representative –
consider impact of changes on
users and job performance
 Communications!!!
Exercise 3: Team & Stakeholder
Identification
Role
Applicable/Modify
Name(s)
Taxonomy Team Members
Team Lead
Taxonomy Editor(s)
Vocabulary Custodian(s)
Liaisons with external vocabularies
Liaisons with applications using vocabularies
User advocate(s)
Training / Communications
IT / Data & System Maintenance
External Stakeholders
Team Supervisory Group
Representatives of external vocabularies
Representatives of consuming applications
Representatives of users
Other representatives of organization
Taxonomy editing tools
[Figure: quadrant chart. Vertical axis: Ability to Execute (low to high). Horizontal axis: Completeness of Vision. Quadrant labels shown: Niche Players, Visionaries.]
Immature industry – no vendors in upper-right quadrant!
Most popular taxonomy editor? MS Excel
Widely used, cheap, good reporting, bad IDs
All upper-end tools are high functionality and high cost.
Note: This slide is out of date. Don’t know if we want to include this.
Taxonomy editor functionality requirements
Basic
  Standard and Custom Fields
  Standard and Custom Relations
  Data Typing and Restrictions
  Consistency Enforcement
  Flexible Reporting
  Flexible Importing?
Midrange
  UNICODE
  Multiple Vocabulary Support
  Inter-Vocabulary Relations
  Unique IDs
    ISO Codes not sufficient
Advanced
  Workflow
  Voting
  Change Request Mgmt.
  Stylistic rules enforcement
  Programmability
Term Editing
Hierarchy Browse
Taxonomy governance: Where changes come from
[Figure: system diagram. Labels: Firewall, Application UI, Application Logic, Tagging UI, Tagging Logic, Content, Taxonomy, End User, Tagging Staff, Taxonomy Editor, Taxonomy Team.]
Sources of change requests
 Query log analysis
 Staff notes ‘missing’ concepts
 Requests from other parts of the organization
Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content
Team considerations
1. Business goals
2. Changes in user experience
3. Retagging cost
Note: I think three sources of change requests is a big concept to communicate to readers.
Processes
 Different organizations will need to
consider their own change
processes.
Organization 1: A custodian is
responsible for the content, but checks
facts with department heads before
making changes
Organization 2: Analysts suggest
changes, editors approve, copyeditors
verify consistency
Organization 3: Marketing reps ask for
a change, taxonomy editor makes
demo, web representative approves it.
 Change process MUST also
consider cost of implementing the
change
Retagging data
Reconfiguring auto-classifier
Retraining staff
Changes in user expectations
Taxonomy Change Cases
Case 1. Renaming a term
Case 2. Adding a new leaf term
Case 3. Inserting a new term
Case 4. Splitting a term
Case 5. Deleting a leaf term or subtree
Case 6. Deleting a term
Case 7. Moving a subtree
Case 8. Merging terms
Case 9. Adding a CV
Case 10. Deleting a CV
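One rough way to compare change cases by cost is to count the content items that would need retagging. The sketch below is illustrative only. The `retag_count` helper, the sample tags, the case names, and the rule that a rename needs no retagging (which assumes content is tagged by stable term ID rather than by label) are assumptions, not something stated on the slides.

```python
# Hypothetical tagged content: item ID -> set of term IDs (assumed data).
TAGS = {
    "doc1": {"T1", "T2"},
    "doc2": {"T2"},
    "doc3": {"T3"},
}

# Assumption: content is tagged by stable term ID, so renaming a term
# changes only its label and touches no existing tags.
NO_RETAG_CASES = {"rename"}


def retag_count(case, affected_terms, tags=TAGS):
    """Count items whose tags would need review for a given change case."""
    if case in NO_RETAG_CASES:
        return 0
    affected = set(affected_terms)
    return sum(1 for terms in tags.values() if terms & affected)
```

For example, `retag_count("split", ["T2"])` counts the two sample items tagged with `T2`. Such a count covers only the retagging part of the cost; reconfiguring classifiers and retraining staff would have to be estimated separately.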
Taxonomy governance: Taxonomy maintenance workflow
[Figure: workflow diagram with lanes for Analyst, Editor, Copywriter, and Sys Admin. Steps: Suggest new name/category; Review new name (Problem? Yes/No); Copy edit new name (Problem? Yes/No); Add to enterprise Taxonomy. Systems shown: Taxonomy Tool, Taxonomy.]
Can contrast this process with others that are less formal and/or less like a newsroom. Couple more are described on next slide.
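The maintenance workflow can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the stage names, the pairing of each stage with a role, and the rule that a "Problem? Yes" answer sends the proposal back one stage are inferred from the diagram labels, not confirmed by the slide.

```python
# Stages in order, with the role assumed to own each one.
STAGES = [
    ("suggest", "Analyst"),
    ("review", "Editor"),
    ("copy_edit", "Copywriter"),
    ("add_to_taxonomy", "Sys Admin"),
]


def next_stage(current, problem):
    """Return the stage that follows `current`.

    Assumption: a problem sends the proposal back one stage; otherwise it
    moves forward. Returns None once the term has been added.
    """
    names = [name for name, _ in STAGES]
    i = names.index(current)
    if problem:
        return names[max(i - 1, 0)]
    return names[i + 1] if i + 1 < len(names) else None
```

A less formal organization could use the same structure with fewer stages or with one person holding several roles.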
Other change processes
 Processes may be diagrammed or written
 Provide an ‘emergency’ change process because it will be needed.
 How can emergency changes be requested? Who makes the change and who approves it?
 Who are backups for the people when they are out?
 Who are escalation points?
 Change Request Process should call out decision criteria, e.g.
 Cost of retagging
 Benefit of change
 Conflict with editorial rules
 Organization X:
Change Request Process
 Anyone can ask a team member for a change. Team members responsible for figuring out details and bringing to team for decision.
 Pending changes list for low priority/high cost items.
Change Process
 Includes preview of change on site and data mockup
Fast-Track Change Process
 Anyone can ask editor, he gets team leader or deputy approval
Fundamental Processes & Outlooks
 Two fundamental processes every organization
should implement to maintain its metadata and
taxonomies:
 Query log / Click trail examination
 Error Correction
Another biggie
 What are the key outlooks a taxonomist should try to
instill in their organization?
 Integrated approach to Taxonomy, Metadata, Search,
and UI
 Measure & Improve Mindset
Fundamental process #1 – Query log examination
 How can we characterize users and what they are looking for?
 Query Log & Click Trail Examination
 Only 30-40% of organizations interested in Taxonomy Governance examine query logs*
 Basic reports provide plenty of real value
 Greatest value comes from:
 Identifying a person as responsible for search quality
 Starting a “Measure & Improve” mindset
 Greatest challenge:
 Getting a person assigned (≥ 10%)
 Getting logs turned back on
UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary
Click Trail Packages
• iWebTrack
• NetTracker
• OptimalIQ
• SiteCatalyst
• Visitorville
• WebTrends
* Source: Metadata Maturity Model Presentation, Ron Daniel, ESS’05
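Three of the basic reports (top queries, queries with no results, queries with no click-through) can be produced with very little code. The sketch below assumes a hypothetical log format of `(query, result_count, clicked)` rows; real search engines log in their own formats, and the sample data is invented for illustration.

```python
from collections import Counter

# Hypothetical log records: (query, result_count, clicked_any_result).
LOG = [
    ("vacation policy", 12, True),
    ("vacation policy", 12, False),
    ("401k", 0, False),
    ("travel form", 5, False),
]


def basic_reports(log, top_n=10):
    """Build three basic query-log reports from (query, hits, clicked) rows."""
    top = Counter(q for q, _, _ in log).most_common(top_n)
    no_results = sorted({q for q, hits, _ in log if hits == 0})
    clicked = {q for q, _, c in log if c}
    # Queries that returned results but were never clicked.
    no_click = sorted({q for q, hits, _ in log if hits > 0} - clicked)
    return {"top": top, "no_results": no_results, "no_click": no_click}
```

Queries with no results are candidates for new synonyms or terms; queries with results but no clicks may point to ranking or labeling problems.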
Fundamental process #2 – Error correction
 Errors will happen, and some will be found. What are you going
to do about them?
 Tagging errors, content errors, taxonomy errors, …
 Define an error correction process.
You have an error correction
process. Would you hate to
see it on paper?
 Process will accommodate questions like:
 Who looks at it? Is it an error? What are the costs to correct vs. not
correct? Does the correction need to be scheduled? etc.
 Once a tagging error is corrected, NEVER lose that fact.
 Manually reviewed pages are vital for training automatic classifiers
 Has implications for metadata specification and review procedures
 Over time, multiple error detection methods will be defined
 e.g. Statistical sampling of newly added pages
 Gradually, additional error correction processes may be defined to deal
with particular types of errors
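Two of the points above lend themselves to a short sketch: statistical sampling of newly added pages, and never losing the fact that a tag was corrected. The function names, the 10% default sampling rate, and the record layout below are assumptions made for illustration.

```python
import random


def sample_for_review(page_ids, rate=0.1, seed=None):
    """Pick a random sample of newly added pages for manual tag review."""
    rng = random.Random(seed)
    k = max(1, round(len(page_ids) * rate)) if page_ids else 0
    return rng.sample(list(page_ids), k)


def record_correction(history, page_id, field, old, new, reviewer):
    """Append a correction to the history instead of overwriting the tag.

    Keeping old and new values preserves the fact that a human reviewed the
    page, so it can later serve as training data for a classifier.
    """
    history.append({"page": page_id, "field": field, "old": old,
                    "new": new, "reviewer": reviewer})
    return history
```

Storing corrections as an append-only history is one simple way to meet the "never lose that fact" requirement; a metadata field such as "reviewed by" would be another.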
Fundamental Outlooks
 Measure & Improve Mindset
 Query logs and click trails are prime example
 Next place to instrument: Error correction and error
detection processes
 Integrated handling of Taxonomy, Metadata, UI, &
Search
 To be most effective, these must work together
 Governance structure must help that happen
 Cross-functional team structure is a start
Actions to define taxonomy governance
 Initial vocabularies should be selected for stability as
well as utility.
 Custodians of shared vocabularies must be identified,
educated re. impacts of changes.
 Group of custodians and stakeholders must be
established.
 (Simple) System for sharing the CVs and tracking the
update process must be established.
Exercise 4: Self-Diagnosis
1. Does your organization know what it is, or wants to be, doing around search & taxonomy yet?
2. Is the cost basis for the taxonomy ROI clear to you?
3. Is the benefits basis for the taxonomy ROI clear to you?
4. Is the cost basis for the taxonomy ROI clear to your CFO?
5. Is the benefits basis for the taxonomy clear to your CFO?
6. Do you know how content will be tagged?
7. Do you know how tagged content will be displayed to users?
8. Do you know how users will fetch the content?
9. Do users know how they should report errors in the tagging?
10. Do you know what information will be logged for later analysis?
11. Do you know what information has to be reported to management to justify the taxonomy team?
12. Does management expect the taxonomy team to justify its existence?
13. Is your organization planning a tightly focused taxonomy effort?
14. Is your organization planning a credible ‘Enterprise Taxonomy Strategy’?
15. Does your organization expect its taxonomies to change frequently?
16. Has your organization identified some facets as stable and some facets as volatile?
17. Does your organization have a plan for retagging data when the taxonomy is changed?
18. Do you have an identified taxonomy “team” with at least one person?
19. Is there at least one person working on taxonomy/metadata/search more than ½ time?
20. Does the team contain members who represent search, UI, and metadata tagging?
21. Does the organization have any hiring and training criteria for taxonomy, metadata, and search positions?
22. Does the team maintain Editorial Rules?
23. Does the team maintain a corporate metadata specification?
24. Does the team maintain educational materials?
25. Does the team have a communications plan?
26. Does the team examine query logs?
27. Does the team examine click trails?
28. Does the team have a documented error correction process?
29. Does the organization have a procedure to locate ROT (Redundant, Obsolete, or Trivial content)?
30. Does the organization have any qualitative or quantitative measures of data quality?
31. Do you use a tool other than MS Excel for editing and maintaining the Taxonomy?
32. Were taxonomy, metadata, search, or content management tools purchased with money other than “use it or lose it” funds?
I think a self-diagnosis quiz like this could be nice to have in the book. Also see the “Metadata Maturity Model” stuff in the next set of slides.
Taxonomy Strategies LLC
Data Governance Maturity:
When the business depends on clear
description of fuzzy objects
Presented to San Francisco DAMA
Sept. 10, 2008
Ron Daniel, Jr.
Copyright 2009 Taxonomy Strategies LLC. All rights reserved.
Goals for this talk
 Provide you with background on maturity models.
 Provide the results of our surveys of Search,
Metadata, & Taxonomy practices and discuss
interesting findings.
 Review the practices in use at stock photo houses,
and compare them to methods that may be used in
typical information management projects.
 Give you the tools to do a simple self-assessment of
your organization’s metadata maturity
Agenda
9:15
Metadata Definitions
9:30
Maturity Models
9:45
Metadata Maturity Model (ca. 2006)
10:15
Break
10:30
Stock Photo Business
10:40
Data Governance Practices in Stock Photo
Agencies
11:40
Summary
11:45
Questions
12:00
Adjourn
Taxonomy and metadata definitions
Metadata
 “Data about data”.
 Different communities have very different assumptions
about the types of data being described.
 I’m from the Information Science community, not the database,
statistics, or massive storage communities.
Taxonomy
1. The classification of organisms in an ordered system
that indicates natural relationships.
2. The science, laws, or principles of classification;
systematics.
3. Division into ordered groups, categories, or
hierarchies.
Examples of taxonomy used to populate metadata fields
Metadata: Title; Author; Department; Audience; Topic
Metadata Values (Facets within the overall Taxonomy)
Audience
 Internal: Executives; Managers
 External: Suppliers; Customers; Partners
Topics
 Employee Services: Compensation; Retirement; Insurance; Further Education
 Finance and Budget
 Products and Services
 Support Services: Infrastructure; Supplies
Example faceted taxonomy
ABC Computers.com
Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Type
Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency
Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petrochemicals; Retail / Wholesale; Technology; Transportation; Other Industries
Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Non-ABC Brands
Audience: All; Business; ABC Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper; First Time; Experienced; Advanced; Supplier
Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
Region-Country: All; Asia-Pacific; Canada; ABC EMEA; Japan; Latin America & Caribbean; United States
Manually tagged metadata sample
Attribute | Values
Title | Jupiter’s Ring System
URL | http://ringmaster.arc.nasa.gov/jupiter/
Description | Overview of the Jupiter ring system. Many images, animations and references are included for both the scientist and the public.
Content Types | Web Sites; Animations; Images; Reference Sources
Audiences | Educators; Students
Organizations | Ames Research Center
Missions & Projects | Voyager; Galileo; Cassini; Hubble Space Telescope
Locations | Jupiter
Business Functions | Scientific and Technical Information
Disciplines | Planetary and Lunar Science
Time Period | 1979-1999
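A tagged record like the one above can be checked against controlled vocabularies before it is published. The sketch below is a hedged illustration: the `VOCAB` contents are limited to the values visible in the sample and are not the real vocabularies, and the semicolon-separated value format is assumed from the sample layout.

```python
# Hypothetical controlled vocabularies, limited to values seen in the sample.
VOCAB = {
    "Audiences": {"Educators", "Students"},
    "Content Types": {"Web Sites", "Animations", "Images", "Reference Sources"},
}

RECORD = {
    "Title": "Jupiter's Ring System",
    "Audiences": "Educators; Students",
    "Content Types": "Web Sites; Animations; Images; Reference Sources",
}


def invalid_values(record, vocab=VOCAB):
    """Return {field: [values not in that field's controlled vocabulary]}."""
    errors = {}
    for field, allowed in vocab.items():
        raw = record.get(field, "")
        values = [v.strip() for v in raw.split(";") if v.strip()]
        bad = [v for v in values if v not in allowed]
        if bad:
            errors[field] = bad
    return errors
```

Free-text fields such as Title and Description are left unchecked here; only fields backed by a vocabulary are validated.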
Other things sometimes called Taxonomy
Type | Remarks
Synonym Ring
 Connects a series of terms together
 Treats them as equivalent for search purposes
e.g. (Dog, Canine, Pooch, Mutt) (Cat, Feline, Kitty), …
Authority File
 Used to control variant names with a preferred term
 Typically used for names of countries, individuals, organizations
e.g. (IBM, Big Blue, International Business Machines Inc.)
Classification Scheme
 A hierarchical arrangement of terms
 May or may not follow strict “is-a” hierarchy rules
 Usually enumerated; e.g., LC or Dewey
Thesaurus
 Expresses semantic relationships of:
• Hierarchy (broader & narrower terms)
• Equivalence (synonyms)
• Associative (related terms)
 May include definitions
Ontology
 Resembles faceted taxonomy but uses richer semantic relationships among terms and attributes and strict specification rules
 A model of reality, allowing inferences to be made.
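The difference between a synonym ring and an authority file shows up clearly in code. This is a minimal sketch using the examples from the table; the function names are hypothetical, and treating "IBM" as the preferred term is an assumption, since the slide lists the variants without saying which one is preferred.

```python
# Synonym rings: every term in a ring is treated as equivalent for search.
RINGS = [
    {"dog", "canine", "pooch", "mutt"},
    {"cat", "feline", "kitty"},
]

# Authority file: variant names map to one preferred term (assumed: "IBM").
AUTHORITY = {
    "ibm": "IBM",
    "big blue": "IBM",
    "international business machines inc.": "IBM",
}


def expand_query(term):
    """Return all terms in the same synonym ring, or just the term itself."""
    t = term.lower()
    for ring in RINGS:
        if t in ring:
            return sorted(ring)
    return [t]


def preferred_name(name):
    """Map a variant name to its preferred form; unknown names pass through."""
    return AUTHORITY.get(name.strip().lower(), name)
```

A ring has no preferred member and is used at query time, while an authority file picks one form and is typically applied when content is tagged.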
Organizational benchmarking
 A common goal of organizations is to ‘benchmark’
themselves against other organizations.
 Different organizations have:
 Different levels of sophistication in their planning,
execution, and follow-up for CMS, Search, Portal,
Metadata, and Taxonomy projects.
 Different reasons for pursuing Search, Metadata, and
Taxonomy efforts
 Different cultures
 Benchmarks should be to similar organizations.
Is unnecessary capability harmful?
 Tool Vendors continue to provide ever-more capable
tools with ever-more sophisticated features.
 But we live in a world where a significant fraction of
public, commercial, web pages don’t have a <title> tag.
 Organizations that can’t manage <title> tags stand a
very poor chance of putting an entity extractor to use,
which requires some ongoing management of the lists
of entities to be extracted.
 Organizations that can’t create and maintain clean
metadata can’t put a faceted search UI to good use.
 Unused capability is poor value-for-money.
 Organizations over-spend on tools and under-spend on
staff & processes.
Towards better benchmarking…
 Wanted a method to:
 Generally identify good and bad practices.
 Help clients identify the things they can do, and the things that
stand an excellent chance of failing.
 Predict likely sources of problems in engagements.
 We have started to develop a Metadata Maturity Model,
inspired by Maturity Models from the software industry.
 To keep the model tied to reality, we are conducting surveys to
determine the actual state of practice around search, metadata,
taxonomy, and supporting business functions such as staffing
and project management.
A Tale of Two Software
Maturity Models
CMMI (Capability Maturity Model Integration)
vs.
The Joel Test
CMMI structure
Maturity Models are collections of Practices.
Main differences in Maturity Models concern:
• Descriptivist or Prescriptivist Purpose
• Degree of Categorization of Practices
• Number of Practices (~400 in CMMI)
Source: http://chrguibert.free.fr/cmmi
22 Process Areas, keyed to 5 Maturity
Levels…
 Process Areas contain Specific
and Generic Practices,
organized by Goals and
Features, and arranged into
Levels
 Process Areas cover a broad
range of practices beyond
simple software development
 CMMI Axioms:
Individual processes at higher
levels are AT RISK from
supporting processes at lower
levels.
A Maturity Level is not achieved
until ALL the Practices in that level
are in operation.
CMMI Positives
 Independent audits of an organization’s level of maturity are a
common service
 Level 3 certification frequently required in bids
 “…compared with an average Level 2 program, Level 3 programs
have 3.6 times fewer latent defects, Level 4 programs have 14.5
times fewer latent defects, and Level 5 programs have 16.8 times
fewer latent defects”.
Michael Diaz and Jeff King – “How CMM Impacts Quality, Productivity, Rework, and the Bottom Line”
 ‘If you find yourself involved in product liability litigation you're going to
hear terms like "prevailing standard of care" and "what a reasonable
member of your profession would have done". Considering the fact
that well over a thousand companies world-wide have achieved level
3 or above, and the body of knowledge about the CMM is readily
available, you might have some explaining to do if you claim
ignorance’.
Linda Zarate in a review of A Guide to the CMM: Understanding the Capability
Maturity Model for Software by Kenneth M. Dymond
CMMI Negatives
 Complexity and Expense
 Reading and understanding the materials
 Putting it into action – identifying processes, mapping
processes to model, gathering required data, …
 Audits are expensive
 CMMI does not scale down well to small shops
 Has been accused of restraint of trade
At the other extreme, The Joel Test
 Developed by Joel
Spolsky as reaction to
CMMI complexity
 Positives - Quick, easy,
and inexpensive to use.
 Negatives - Doesn’t scale
up well:
Not a good way to assure the
quality of nuclear reactor
software.
Not suitable for scaring away
liability lawyers.
Not a longer-term
improvement plan.
 The Joel Test
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working
conditions?
9. Do you use the best tools money can
buy?
10. Do you have testers?
11. Do new candidates write code during
their interview?
12. Do you do hallway usability testing?
Scoring: 1 point for each ‘yes’. Scores below
10 indicate serious trouble.
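The scoring rule is simple enough to write down directly. The sketch below uses the threshold as stated on the slide (scores below 10 indicate serious trouble); the function name and the boolean answer format are assumptions.

```python
def joel_score(answers):
    """Score a Joel Test: one point per 'yes' across the 12 questions.

    Returns (score, in_trouble), where in_trouble follows the slide's rule
    that scores below 10 indicate serious trouble.
    """
    if len(answers) != 12:
        raise ValueError("the Joel Test has 12 questions")
    score = sum(1 for a in answers if a)
    return score, score < 10
```

The same one-point-per-yes pattern could be applied to a search and metadata self-assessment, with its own question list and threshold.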
What does software development
“Maturity” really mean?
 A low score on a maturity audit DOES NOT mean
that an organization can’t develop good software
 It DOES mean that whether the organization will do a
good job depends on the specific mix of people
assigned to the project
 In other words, it sets a floor for how bad an
organization is likely to do, not a ceiling on how good
they can do
 Probability of failure is a good thing to know before
spending a lot of time and money
Towards a Metadata
Maturity Model
Caveats:
 Maturity is not a goal, it is a characterization of an
organization’s methods for achieving its core
goals.
 Mature processes impose expenses which must
be justified by consequent cost savings, revenue
gains, or service improvements.
Nevertheless, Maturity Models are useful as collections
of best practices and stages in which to try to adopt
them.
Basis for initial maturity model
 CEN study on commercial adoption of Dublin Core
 Small-scale phone survey
 Organizations which have world-class search and
metadata externally
 Not necessarily the most mature overall processes or
the best internal search and metadata
 Literature review
 Client experiences
 Structure from software maturity models
Initial Metadata Maturity Model (ca. May, 2005)
37 Practices, Categorized by Area, Level, and Importance
Maturity Level: Basic; Intermediate; Advanced; Bleeding Edge
Practice Area: practices, in the order listed on the slide
Search Capabilities: Uniform Search Box; Query Log Exam.; Index Multiple Repos.; Best Bets; Simple Grouping; Intranet Facet Navigation; Improved Ranking
Metadata and taxonomy standards: System MD Stds.; Organization MD Std.; Reuse ERP; Multiple Repos Comply; Taxonomy Roadmap
Tools and tool selection: Requirements, then Tools; Bakeoff Datasets; Budget for Bakeoffs
Staff training and hiring: Search Analyst Role; Librarian Expertise; Pre-hire Testing; SME Catalogers
Data creation and QA: CM Introduced; ROT-Elimination; Hybrid Creation Model; Adaptive Qualification; Quality Measures
Project management: Project Plan; Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan; Early Termination
Executive support and ROI: External Search ROI; Intranet ROI Model; CEO knows Search ROI
Limiting: Highly Abstract Subject Taxos.; Unneeded Capabils.; Tools, then Reqs.; Use it or Lose It Budgets
Shortcomings of the initial model
 No idea of how it corresponds to actual practice across multiple
organizations
 Some indications that it over-emphasized the sophisticated
practices and under-emphasized beginning practices.
 The initial metadata maturity model can be regarded as a
hypothesis about how an organization progresses through
various practices as it matures
 How to test it? Let’s ask!
 Two surveys to date
 Surveys are being run in stages because of large number of
practices.
 Ask about future, current, and former practices to gather
information on progression
Survey 1: Search,
Metadata, & Taxonomy
Practices
The data in this section comes from a survey
conducted in the autumn of 2005.
[Charts: Participants by Organization Size; Participants by Job Role; Participants by Industry]
Search Practices
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Search Box in standard place on all web pages. | 20% (12) | 11% (7) | 62% (38) | 2% (1) | 5% (3)
Search engine indexes multiple repositories in addition to web sites. | 25% (15) | 21% (13) | 44% (27) | 2% (1) | 8% (5)
Spell Checking. | 31% (19) | 18% (11) | 38% (23) | 0% (0) | 13% (8)
Synonym Searching. | 41% (25) | 23% (14) | 30% (18) | 0% (0) | 7% (4)
Search results grouped by date, location, or other factors in addition to simple relevance score. | 37% (22) | 20% (12) | 37% (22) | 0% (0) | 7% (4)
Queries are logged and the logs are regularly examined. | 31% (19) | 25% (15) | 31% (19) | 5% (3) | 8% (5)
Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top. | 46% (28) | 25% (15) | 21% (13) | 0% (0) | 8% (5)
Advanced computation of relevance based on data in addition to the text of the document. | 43% (26) | 16% (10) | 25% (15) | 0% (0) | 16% (10)
A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search. | 68% (41) | 7% (4) | 10% (6) | 0% (0) | 15% (9)
A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal. | 57% (34) | 15% (9) | 17% (10) | 0% (0) | 12% (7)
Metadata Practices
These two questions were the only ones with much correlation to organization size
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Metadata standards are developed for the needs of each system with no overall attempt to unify them. | 22% (13) | 12% (7) | 37% (22) | 20% (12) | 10% (6)
An Organization-wide metadata standard exists and new systems consider it during development. | 37% (22) | 37% (22) | 20% (12) | 0% (0) | 7% (4)
The Organization-wide metadata standard is based on the Dublin Core. | 52% (30) | 16% (9) | 21% (12) | 0% (0) | 12% (7)
Multiple repositories comply with metadata standard. | 52% (31) | 20% (12) | 17% (10) | 0% (0) | 12% (7)
A Cataloging Policy document exists to teach people how to tag data in compliance with organizational metadata standard. | 48% (29) | 20% (12) | 20% (12) | 0% (0) | 12% (7)
The Cataloging Policy document is revised periodically. | 48% (29) | 15% (9) | 17% (10) | 0% (0) | 20% (12)
A centralized metadata repository exists to aggregate and unify metadata from disparate sources. | 57% (34) | 17% (10) | 17% (10) | 0% (0) | 10% (6)
Metadata is manually entered into web forms. | 15% (9) | 12% (7) | 61% (36) | 3% (2) | 8% (5)
Metadata is generated automatically by software. | 38% (23) | 18% (11) | 27% (16) | 2% (1) | 15% (9)
Metadata is generated automatically, then reviewed manually for correction. | 48% (29) | 18% (11) | 17% (10) | 2% (1) | 15% (9)
Taxonomy Practices
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
'Org Chart' Taxonomy - One based primarily on the structure of the organization. | 36% (21) | 10% (6) | 34% (20) | 5% (3) | 15% (9)
'Products' Taxonomy - One based primarily on the products and/or services offered by the organization. | 37% (22) | 10% (6) | 32% (19) | 5% (3) | 15% (9)
'Content Types' Taxonomy - One based primarily on the different types of documents. | 28% (16) | 21% (12) | 40% (23) | 5% (3) | 7% (4)
'Topical' Taxonomy - One based primarily on topics of interest to the site users. | 20% (12) | 36% (21) | 34% (20) | 3% (2) | 7% (4)
'Faceted' Taxonomy - One which uses several of the approaches above. | 32% (19) | 29% (17) | 34% (20) | 0% (0) | 5% (3)
The Taxonomy, or a portion of it, was licensed from an outside taxonomy vendor. | 75% (44) | 3% (2) | 14% (8) | 0% (0) | 8% (5)
The Taxonomy follows a written 'style guide' to ensure its consistency over time. | 47% (28) | 22% (13) | 20% (12) | 0% (0) | 10% (6)
The Taxonomy is maintained using a taxonomy editing tool other than MS Excel. | 35% (21) | 17% (10) | 40% (24) | 2% (1) | 7% (4)
The Taxonomy was validated on a representative sample of content during its development. | 28% (17) | 22% (13) | 33% (20) | 3% (2) | 13% (8)
A Roadmap for the future evolution of the Taxonomy has been developed. | 38% (23) | 40% (24) | 13% (8) | 0% (0) | 8% (5)
Survey 2: Business
Drivers, Processes, and
Staffing
The data in this section comes from a survey
conducted in the spring of 2006.
[Charts: Participants by Job Role; Participants by Tenure; Participants by Industry; Participants by Organization Size]
[Chart: Business Drivers: Search, Metadata, and Taxonomy (SMT) Applications]
Business Drivers: Desired Benefits
Other desired benefits:
1. Innovation
2. Core to our business product
3. Clients do all the above [From a consultant]
4. Better navigation to diverse State web sites
5. Increased knowledge sharing across the corporation
6. Interoperability
7. Dynamic web applications
8. Improved user search experience
9. Improve R&D
10. Higher value to members [From a non-profit membership org.]
11. For organization to have better understanding of their content
[Chart: ROI: Cost Estimation]
Processes
Use of search logs is improving
Surprisingly sophisticated
Basic data quality and communications need improvement
Many solo operators
[Chart: Team Structures & Staffing]
Salary Survey
Experience | 0.6 | Nice to see it really counts.
Geography | 0.5 | California and the Northeast have highest salaries.
Co. Size | 0.5 | Not very reliable, big changes from one datapoint
Education | 0.4 | Many taxonomists have MLS or above.
Industry | 0.4 | Surprisingly, retail has high salaries for taxonomists.
Role | 0.04 | Taxonomists paid about like Information Architects
Time at current job | -0.07 |
Notes from Participants
 There is the constant struggle with individual [magazine] titles
to hire trained librarians or data specialists instead of trying to
save money by hiring an editor who can build articles AND
create and assign metadata. This is a governance issue we
have been struggling with since we have no monetary stake in
the individual publications. We make recommendations, but
have no higher level authority to require titles to hire trained
staff for metadata.
 Reporting metrics have become a new area of confusion as we
move to portalized pages consisting of objects in portlets, each
with their own metadata.
 Key organizational issue is that the "problems" that stem from
lack of systematic metadata/taxonomy creation are not "owned"
by anyone, and consequently have no budget for their solution.
Interim Conclusions
Observations (1)
 Practices which a single person or a small group can
carry out are more commonly used
 Not surprising
 Very different than ERP/BPR, indicates that information
management is not being sold to the “C-level” staff.
 People need to question how inclusive their
“Organizational Metadata Standards” and “Taxonomy
Roadmaps” actually are.
 We have found Taxonomy Roadmaps to be an advanced
practice, due to a dependence on knowing upcoming IT
development schedule
Observations (2)
 Many of the basics are being skipped
 More organizations doing “Spell Checking” than “Query
Log Analysis”.
 69% have a taxonomy change plan, but only 41% have
a plan for revisiting data if the taxonomy changes.
 64% have a communications plan, but only 56% have a
website.
 This seems to be linked to the previous observation –
things that are easy for an individual get done before
things that need an organizational effort, despite their
level of ‘sophistication’.
Interim Metadata Maturity Model (ca. May, 2006)
Maturity Level: Basic; Intermediate; Advanced
Practice Area: practices, in the order listed on the slide
Search Capabilities: Uniform Search Box; Query Log Exam.; Index Multiple Repos.; Best Bets; Facet Navigation UI
Metadata and taxonomy standards: System MD Stds.; Organization MD Std.; Multiple Repos Comply w/ MD Std.; Reuse ERP Taxos; Taxo Maint. Doc; Taxonomy Roadmap; Highly Abstract Subject Taxos (e.g. “Moods”); Metadata Maint. Doc
Tools and tool selection: Requirements, then Tools; Bakeoff Datasets; Budget for Bakeoffs
Staff training and hiring: Librarian or IA Expertise; Search Analyst Role; Cross-Functional Taxonomy Creation; Cross-functional taxonomy maint.; SME Catalogers; Pre-hire Testing
Data creation and QA: CM Introduced; ROT-Elimination; Semi-auto tagging; Quality Measures
Project management: Project Plan; X-Functional Teams; Std. Proj. Methodol.; Multi-Year Plan; Communication Plan; SMT Business Manager, instead of IT Manager; Early Termination
Executive support and ROI: External Search ROI; SMT in separate silos; Intranet ROI Model; CEO knows Search ROI
Limiting: Tools, then Reqs.; Use it or Lose It Budgets
Search and Metadata Maturity Quick Quiz
Basic
1) Is there a process in place to examine query logs?
2) Is there a process for adding directories and content to the repository, or do people just
do what they want?
3) Is there an organization-wide metadata standard, such as an extension of the Dublin
Core, for use by search tools, multiple repositories, etc.?
Intermediate
4) Does the search engine index more than 4 repositories around the organization?
5) Does the search engine integrate with the taxonomy to improve searches and organize
results?
6) Are there hiring and training practices especially for metadata and taxonomy positions?
7) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete,
Trivial content)?
8) Are tools only acquired after requirements have been analyzed, or are major purchases
sometimes made to use up year-end money?
Advanced
9) Are there established qualitative and quantitative measures of metadata quality?
10) Can the CEO explain the ROI for search and metadata?
Agenda
9:15 Metadata Definitions
9:30 Maturity Models
9:45 Metadata Maturity Model (ca. 2006)
10:15 Break
10:30 Stock Photo Business
10:40 Data Governance Practices in Stock Photo Agencies
11:40 Summary
11:45 Questions
12:00 Adjourn
Stock Photo Business
 Advertising, Editorial Content, Corporate
Communications, and many other types of content
rely on images to convey information and moods.
 When time and/or budget does not allow a
commissioned shoot, stock photo houses can supply
images.
 Fundamental problem for users: How to search for an
image that conveys what you want?
 Fundamental problem for houses: How to describe
images so that users can find them?
How would you search for this image?
Tagging by emotions
“silence”
Image Rights Criteria
Objective criteria
Conceptual refinement
Clarification: Finger on Lips
Scrolling through results…
This is more of the mood I’m looking for…
More like this
Facets at gettyimages.com
Key Questions
 Getty Images (and Corbis) have put a lot of effort into
their websites for image purchase*.
 Internal staff at such organizations tell me that their
intranets are nowhere near as easy to use.
 ROI is the reason why.
 Recall that retail had high salaries for taxonomists,
because the ROI for a better shopping site is so clear.
 The front-ends are dependent on data. How is that
data governed? How does that differ from how their
intranets are governed?
*Licensing, not purchasing, to be pedantic.
Who are the users & what are they looking
for?
 Only 30-40% of organizations regularly examine their
logs.
 Sophisticated software available, but don’t wait.
 80% of value comes from basic reports
Query log & click trail examination: Click trail packages
 iWebTrack
 NetTracker
 OptimalIQ
 SiteCatalyst
 Visitorville
 WebTrends
Overkill
Query log & click trail examination: Query log
 UltraSeek Reporting
 Top queries
 Queries with no results
 Queries with no click-through
 Most requested documents
 Query trend analysis
 Complete server usage summary
Basic queries provide most of the value if the organization
has a process to review what is going on.
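The first three reports listed above can be approximated with very little code when no reporting package is available. The following is a minimal sketch only: the log layout (query text, result count, clicked flag) and the names LOG, normalize, and basic_reports are assumptions made for illustration, not the format of UltraSeek or any other search engine.

```python
from collections import Counter

# Hypothetical log records: (query, result_count, clicked).
# Real search engines write their own log formats; this layout is assumed.
LOG = [
    ("vacation policy", 12, True),
    ("vacation policy", 12, False),
    ("expense form", 0, False),
    ("expense form", 0, False),
    ("org chart", 5, False),
    ("Vacation  Policy", 12, True),
]


def normalize(query):
    """Lower-case and collapse whitespace so trivial variants group together."""
    return " ".join(query.lower().split())


def basic_reports(log, top_n=3):
    """Return top queries, zero-result queries, and queries never clicked."""
    all_queries = Counter()
    no_results = Counter()
    clicked = set()
    for query, result_count, was_clicked in log:
        q = normalize(query)
        all_queries[q] += 1
        if result_count == 0:
            no_results[q] += 1
        if was_clicked:
            clicked.add(q)
    # Queries that returned results but never led to a click.
    no_click = sorted(
        q for q in all_queries if q not in clicked and q not in no_results
    )
    return {
        "top_queries": all_queries.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_click,
    }


reports = basic_reports(LOG)
```

Queries with no results are candidates for new synonyms or new content; queries with results but no click-through suggest the results are poorly titled or poorly ranked.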
Key Governance Aspects
 Roles and Responsibilities –
 Managers
 Reviewers
 Policies –
 For naming
 Required Fields
 Procedures –
 For reviewing and approving metadata placement
 For acting on poor metadata application
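Naming and required-field policies are easier to review when they can be checked mechanically. The sketch below shows one possible check; the field names in REQUIRED_FIELDS, the lower-case hyphenated naming pattern, and the function review_record are all hypothetical and stand in for whatever the organization's own policy specifies.

```python
import re

# Assumed policy for illustration only: these field names and the naming
# pattern are hypothetical, not taken from any particular metadata standard.
REQUIRED_FIELDS = ("title", "creator", "date", "subject")
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # e.g. "travel-policy-2009"


def review_record(record):
    """Return a list of policy problems found in one metadata record (a dict)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {field}")
    name = record.get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append(f"name violates naming policy: {name!r}")
    return problems


good = {
    "name": "travel-policy-2009",
    "title": "Travel Policy",
    "creator": "HR",
    "date": "2009-01-15",
    "subject": "Travel",
}
bad = {
    "name": "Travel Policy FINAL",
    "title": "Travel Policy",
    "creator": "",
    "date": "2009-01-15",
}
good_problems = review_record(good)
bad_problems = review_record(bad)
```

A reviewer would still decide what to do about each problem; the check only produces the list that the review and follow-up procedures act on.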
Recommended Measure and Improve
Mindset
 Measure - Determine current situation and what is wrong.
• Too many documents in a category? Too many categories? People complaining
about not finding material that is on the site? People asking for materials not on
the site? Common searches without results?
 Decide – Decide how to change things to fix the problem.
• Change navigation list? Add new categories? Add synonyms to search? Create
new content?
 Confirm – Before rolling out changes, test them to make sure they will
actually fix the problem.
• Usability tests, Card sorts, Internal functionality tests, …
 Implement – Roll out the changes.
 Repeat – Monitor people’s behavior on the site and respond to
reported problems.
• Query log examination, Clicktrail examination, Google search result position,
Stakeholder feedback, User surveys, Site analytics, etc.
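Part of the Measure step can be sketched as a simple count over the tagged documents. This is an illustration under stated assumptions: the mapping of document id to category, the max_docs threshold, and the function measure_categories are hypothetical, and a sensible threshold depends on the site.

```python
from collections import Counter


def measure_categories(doc_categories, all_categories, max_docs=3):
    """Flag categories holding too many documents and categories holding none.

    doc_categories: dict mapping document id -> category (assumed layout).
    all_categories: every category defined in the taxonomy.
    max_docs: illustrative threshold above which a category is "overloaded".
    """
    counts = Counter(doc_categories.values())
    overloaded = sorted(c for c, n in counts.items() if n > max_docs)
    empty = sorted(c for c in all_categories if counts[c] == 0)
    return {"overloaded": overloaded, "empty": empty}


docs = {"d1": "HR", "d2": "HR", "d3": "HR", "d4": "HR", "d5": "Finance"}
categories = ["HR", "Finance", "Legal"]
findings = measure_categories(docs, categories)
```

Overloaded categories are candidates for splitting and empty ones for removal or new content, but those are Decide-step judgments that the counts inform rather than settle.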
Taxonomy team: Generic roles
Stakeholder Committee
Business Lead
 Keeps team on track with larger business objectives.
 Balances cost/benefit issues to decide appropriate levels of
effort.
 Obtains needed resources if those on committee can’t
accomplish a particular task.
Content Owners
 Reality check on process change suggestions.
Technical Specialist
 Estimates costs of proposed changes in terms of amount of
data to be retagged, additional storage and processing burden,
software changes, etc.
 Helps obtain data from various systems.
Content Specialist
 Committee’s liaison to content creators.
 Estimates costs of proposed changes in terms of editorial
process changes, additional or reduced workload, etc.
Taxonomy Specialist
 Suggests potential taxonomy changes based on analysis of
query logs, indexer feedback.
 Makes edits to taxonomy, installs into system with aid of IT
specialist.
Recommended Reading
 CMMI: http://chrguibert.free.fr/cmmi
(Official site is http://www.sei.cmu.edu/cmmi/, but that is not the most
comprehensible.)
 Joel Test
http://www.joelonsoftware.com/articles/fog0000000043.html
 EIA Roadmap
http://www.louisrosenfeld.com/presentations/031013-KMintranets.ppt
 Enterprise Search Report
http://www.cmswatch.com/EntSearch/
Fun Questions
The animals are divided into:
(a) belonging to the emperor, (b) embalmed, (c) tame,
(d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs,
(h) included in the present classification, (i) frenzied,
(j) innumerable, (k) drawn with a very fine camelhair brush,
(l) et cetera, (m) having just broken the water pitcher,
(n) that from a long way off look like flies.
Jorge Luis Borges, "THE ANALYTICAL LANGUAGE OF JOHN WILKINS"
Works in 3 volumes (in Russian). St. Petersburg, "Polaris", 1994. V. 2: 87.
This was created to be as bad a classification as possible.
What makes it so bad?
Taxonomy Strategies LLC
Contact Info
Ron Daniel, Jr.
925-368-8371
[email protected]
Copyright 2009 Taxonomy Strategies LLC. All rights reserved.