Taxonomy Strategies LLC
Benchmarking Your Search Function: A Metadata Maturity Model
Ron Daniel, Jr.
Taxonomy Strategies LLC
May 17, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Motivating Experiences
 Different organizations have different levels of sophistication in
their planning, execution, and follow-up for CMS, Search, Portal,
Metadata, and Taxonomy projects.
 Last year we had back-to-back engagements with clients who had
very different levels of sophistication.
 Tool Vendors continue to provide ever-more capable tools with
ever-more sophisticated features.
  We live in a world where a significant fraction of public commercial
web pages lack a <title> tag.
 Organizations that can’t manage <title> tags stand a very poor
chance of putting an entity extractor to use, which requires some
management of the lists of entities to be extracted.
 Taxonomy governance processes must fit the organization
 In terms of scale and complexity
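As a rough illustration of the <title> point above, the following is a minimal sketch of how a page could be checked for a non-empty <title> element using only the Python standard library. The names `TitleFinder` and `has_title` are hypothetical and are not part of the talk.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collects the text found inside a <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def has_title(html):
    """Return True if the page has a <title> with non-whitespace text."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return bool(finder.title.strip())
```

Running a check like this over a sample of pages gives a quick first reading on whether basic metadata is being managed at all.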
Taxonomy Strategies LLC
The business of organized information
2
Desiderata
 Wanted a method to:
 Predict likely sources of problems in engagements
 Help clients identify the things they can do, and the things that stand
an excellent chance of failing
 Generally identify good and bad practices
 These desiderata are not unique
 Such methods have been defined for software development and
other areas
 They are known as Maturity Models
Goals for this Talk
 Provide you with basic knowledge of maturity models
 Give you the tools to do a simple self-assessment of your
organization’s metadata maturity
 Suggest practices that are, and are not, likely next steps in your
organization’s development of:
 Processes to manage search, metadata, and taxonomy deployments.
 Overly-sophisticated processes will fail
 Expertise around search, metadata, and taxonomies
 Systems to create, manage, or use metadata and taxonomies
 Tool selection
 Overly-sophisticated tools will be very poor value-for-money
 Have some fun
A Tale of Two Maturity Models
CMMI (Capability Maturity Model Integration)
vs.
The Joel Test
CMMI’s Levels of Maturity, Translated
 1) Initial: You build software like you never have done it before and will never do it
again. One hero spits out code and you don't worry about maintaining or
documenting it. Whatever the programmer gives you is good enough for the end
users.
 2) Repeatable: You actually have a project plan, and the plan might even include
some quality assurance, documentation, and things like that.
 3) Defined: You follow the plan, which is at the organizational level rather than the
project level. You expect to train people, have compatible software, and follow
organizational standards. Think of skilled craftsmen following a blueprint and using
the standards of their trade.
 4) Managed: The organization follows the plan and measures the progress as it
goes, similar to an assembly line for software. Managers know what's happening as
it happens and the software is also monitored.
 5) Optimizing: The final phase is when the factory becomes self-aware. The
lessons learned on the project are used to prevent defects before they occur and
manage technological changes. There's a constant organized feedback mechanism
to improve the cycle time and product quality.
“Modeling Data Management” – A report on discussions of Metadata
Maturity at the 2002 DAMA Conference
Joe Celko
http://www.intelligententerprise.com/020726/512celko1_1.jhtml
22 Process Areas, Keyed to 5 Maturity Levels…
  Process Areas contain Specific and Generic Practices, organized by Goals and Features
  Maturity Model Axioms:
  A Maturity Level is not achieved until ALL the Practices in that level are in operation.
  Individual processes at higher levels are AT RISK from supporting processes at lower levels.
These axioms are very questionable for the Metadata Maturity Model
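The first axiom can be read as a simple rule for computing a level. The sketch below applies it strictly, as an assumption for illustration only; the slide itself calls these axioms very questionable for the Metadata Maturity Model. The function name `achieved_level` and the three-level ordering are hypothetical, and the sample practices are the Search Capabilities processes named later in this talk.

```python
# Ordered from lowest to highest; an assumed ordering for this sketch.
LEVELS = ["Basic", "Intermediate", "Advanced"]


def achieved_level(practices_by_level, in_operation):
    """Return the highest level whose practices, and those of every lower
    level, are all in operation. Returns None if the lowest level is not met."""
    achieved = None
    for level in LEVELS:
        required = practices_by_level.get(level, [])
        if all(practice in in_operation for practice in required):
            achieved = level
        else:
            # A gap at this level blocks every level above it.
            break
    return achieved


SEARCH_PRACTICES = {
    "Basic": ["Uniform Search Box", "Query Log Examination"],
    "Intermediate": ["Index Multiple Repositories", "Best Bets",
                     "Simple Results Grouping"],
    "Advanced": ["Improved Ranking", "Intranet Facet Navigation"],
}
```

Under this strict rule, an organization that runs "Intranet Facet Navigation" but does not index multiple repositories still rates as Basic, which is one reason the strict reading may fit metadata work poorly.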
CMMI Structure
[Diagram: CMMI structure. Callout: the previous diagram only shows these two levels.]
Maturity Models are collections of Practices.
Main differences in Maturity Models concern:
•Degree of Categorization of Practices
•Descriptivist or Prescriptivist Purpose
Source: http://chrguibert.free.fr/cmmi
CMMI Positives
 Independent audits of an organization’s level of maturity are a common
service
 Level 3 certification frequently required in bids
 “…compared with an average Level 2 program, Level 3 programs have
3.6 times fewer latent defects, Level 4 programs have 14.5 times fewer
latent defects, and Level 5 programs have 16.8 times fewer latent
defects”.
Michael Diaz and Jeff King – “How CMM Impacts Quality, Productivity, Rework,
and the Bottom Line”
 ‘If you find yourself involved in product liability litigation you're going to
hear terms like "prevailing standard of care" and "what a reasonable
member of your profession would have done". Considering the fact that
well over a thousand companies world-wide have achieved level 3 or
above, and the body of knowledge about the CMM is readily available,
you might have some explaining to do if you claim ignorance’.
Linda Zarate in a review of A Guide to the CMM: Understanding the Capability
Maturity Model for Software by Kenneth M. Dymond
CMMI Negatives
 Complexity and Expense
 Reading and understanding the materials
 Putting it into action – identifying processes, mapping
processes to model, gathering required data, …
 Audits are expensive
 CMMI does not scale down well to small shops
 Has been accused of restraint of trade
At the Other Extreme, The Joel Test
  Developed by Joel Spolsky as a reaction to CMMI complexity
  Positives - Quick, easy, and inexpensive to use.
  Negatives - Doesn’t scale up well:
  Not a good way to assure the quality of nuclear reactor software.
  Not suitable for scaring away liability lawyers.
  Not a longer-term improvement plan.
The Joel Test
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools money can buy?
10. Do you have testers?
11. Do new candidates write code during their interview?
12. Do you do hallway usability testing?
Scoring: 1 point for each ‘yes’. Scores of 10 or lower indicate serious trouble.
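The scoring rule is simple enough to sketch in a few lines. The function names `joel_score` and `in_serious_trouble` are hypothetical. The threshold follows Spolsky's article, which treats 12 as perfect, 11 as tolerable, and 10 or lower as a sign of serious problems.

```python
JOEL_QUESTIONS = [
    "Do you use source control?",
    "Can you make a build in one step?",
    "Do you make daily builds?",
    "Do you have a bug database?",
    "Do you fix bugs before writing new code?",
    "Do you have an up-to-date schedule?",
    "Do you have a spec?",
    "Do programmers have quiet working conditions?",
    "Do you use the best tools money can buy?",
    "Do you have testers?",
    "Do new candidates write code during their interview?",
    "Do you do hallway usability testing?",
]


def joel_score(answers):
    """One point for each 'yes'. `answers` holds one boolean per question."""
    if len(answers) != len(JOEL_QUESTIONS):
        raise ValueError("expected one answer per question")
    return sum(1 for answer in answers if answer)


def in_serious_trouble(score, threshold=10):
    """True when the score is at or below the threshold (assumed to be 10)."""
    return score <= threshold
```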
A Maturity Rant, in Bullet Points
 Metadata maturity may not be core to your business.
 Maturity is not automatically a good thing.
 Maturity is not a goal, it is a characterization of an organization’s
methods for achieving its core goals.
 Mature processes impose expenses which must be justified by
consequent cost savings, revenue gains, or service improvements.
 “Immature Processes” does not mean “can’t do good work”. It means
“Good results depend on whether the company’s star performers are
doing the job”.
 Maturity predicts the worst that an organization might do on a job, not
the best that it could do.
 Nevertheless, Maturity Models are useful as collections of best
practices and stages in which to try to adopt them.
Towards a Metadata Maturity Model
Caveats, Disclaimers, Provisos, Exclusions,
Exemptions, and Limitations on Liability
 Some maturity models are based on millions of dollars
of research and decades of industry experience.
 This isn’t one of them.
 Adjust your expectations accordingly.
Basis for Following Materials
 CEN study on commercial adoption of Dublin Core
 Small-scale phone survey
 Organizations which have world-class search and
metadata externally
 Not necessarily the most mature overall processes or the
best internal search and metadata
 Literature review
 Client experiences
Search and Metadata Maturity Quick Quiz

Basic
1) Is there a process in place to examine query logs?
2) Is there a process for adding directories and content to the repository, or do people just
do what they want?
3) Is there an organization-wide metadata standard, such as an extension of the Dublin
Core, for use by search tools, multiple repositories, etc.?

Intermediate
4) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete,
Trivial content)?
5) Does the search engine index more than 4 repositories around the organization?
6) Are system features and metadata fields added based on cost/benefit analysis, rather
than things that are easy to do with the current tools?
7) Are tools only acquired after requirements have been analyzed, or are major purchases
sometimes made to use up year-end money?
8) Are there hiring and training practices especially for metadata and taxonomy positions?

Advanced
9) Are there established qualitative and quantitative measures of metadata quality?
10) Can the CEO explain the ROI for search and metadata?
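The quiz does not state a scoring rule, so the sketch below only tallies, per tier, how many of the questions describe a practice that is already in place. The names `QUIZ_TIERS` and `tally_by_tier` are hypothetical, and a question counts as "in place" when the mature alternative in the question is true (for example, tools are acquired only after requirements analysis).

```python
# Question numbers grouped by the tiers used on the slide.
QUIZ_TIERS = {
    "Basic": [1, 2, 3],
    "Intermediate": [4, 5, 6, 7, 8],
    "Advanced": [9, 10],
}


def tally_by_tier(in_place):
    """Map each tier to (practices in place, questions in that tier).

    `in_place` is the set of question numbers whose practice is in place.
    """
    return {
        tier: (sum(1 for q in questions if q in in_place), len(questions))
        for tier, questions in QUIZ_TIERS.items()
    }
```

A result such as `{"Basic": (2, 3), "Intermediate": (2, 5), "Advanced": (0, 2)}` suggests finishing the Basic practices before investing in Advanced ones.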
Baseline for Comparison
[Bar chart: Frequency of Processes. Categories: Query Log Examination, Organization Metadata Standard, ROT Elimination, Multiple Repositories. Vertical axis: 0 to 80.]
14 Responses from 35 Attendees at a Taxonomy Workshop
Aspects of Search and Metadata Maturity
We are collecting and categorizing Processes by Area and Level.
“Limiting” Processes are harmful practices which interfere with maturity.
Maturity Levels: Limiting, Basic, Intermediate, Advanced, Bleeding Edge
  Search Capabilities
  Basic: Uniform Search Box; Query Log Exam.
  Intermediate: Index Multiple; Best Bets; Simple Grouping
  Advanced: Intranet Facet Navigation; Improved Ranking
  Metadata and taxonomy standards
  Basic: System MD Stds.
  Intermediate: Organization MD Std.; Reuse ERP
  Advanced: Multiple Repos Comply; Taxonomy Roadmap
  Bleeding Edge: Highly Abstract Subject Taxonomies
  Tools and tool selection
  Limiting: Unneeded Capabils.; Tools, then Reqs.
  Basic: Requirements, then Tools
  Intermediate: Bakeoff Datasets
  Advanced: Budget for Bakeoffs
  Staff training and hiring
  Basic: Search Analyst Role
  Intermediate: Librarian Expertise
  Advanced: Pre-hire Testing
  Bleeding Edge: SME Catalogers
  Data creation and QA
  Basic: CM Introduced
  Intermediate: ROT-Elimination
  Advanced: Hybrid Creation Model
  Bleeding Edge: Adaptive Qualification; Quality Measures
  Project management
  Basic: Project Plan
  Intermediate: Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan
  Advanced: Early Termination
  Executive support and ROI
  Limiting: Use it or Lose It Budgets
  Basic: External Search ROI
  Intermediate: Intranet ROI Model
  Advanced: CEO knows Search ROI
Search Capabilities
Processes, Categorized by Type and Level (Highly Valuable Processes in Orange)
  Basic:
  “Uniform Search Box”
  “Query Log Examination”
  Requires reporting functions and an identified staffer
  Intermediate:
  “Index Multiple Repositories”
  Beyond simple web spidering
  “Best Bets”
  “Simple Results Grouping”
 Advanced:
 “Improved Ranking from Link and Popularity Analysis”
 “Intranet Facet Navigation”
 See Rosenfeld’s EIA Roadmap for more details on search
capabilities staged over time.
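As an illustration of what "Query Log Examination" might involve, here is a minimal sketch that reports the most frequent queries and the most frequent queries that returned no results. The log format, a sequence of (query, result count) pairs, and the name `summarize_query_log` are assumptions; real search engines export logs in their own formats.

```python
from collections import Counter


def summarize_query_log(entries, top_n=3):
    """Summarize (query, result_count) pairs from a search log.

    Returns two lists of (query, count): the most frequent queries overall,
    and the most frequent queries that returned zero results.
    """
    all_queries = Counter()
    zero_result_queries = Counter()
    for query, result_count in entries:
        # Normalize case and whitespace so trivial variants count together.
        normalized = " ".join(query.lower().split())
        all_queries[normalized] += 1
        if result_count == 0:
            zero_result_queries[normalized] += 1
    return all_queries.most_common(top_n), zero_result_queries.most_common(top_n)
```

Frequent zero-result queries are natural candidates for "Best Bets" entries or for new synonyms in the taxonomy, which is why the slide pairs this process with an identified staffer.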
Rosenfeld’s EIA Roadmap
Metadata and Taxonomy Standards
 Basic:
 “System Metadata Standards”
 Intermediate:
 “Defined Organizational Metadata Standard”
 “Reuse of ERP Vocabularies”
 Advanced:
 “Multiple Repositories Comply with Metadata Standard”
 “Taxonomy Roadmap”
 A plan for adding facets over time, based on known upcoming projects
which can use them.
 Requires “Multi-Year Plan of Upcoming Projects”
 Bleeding Edge:
 “Highly Abstract Subject Taxonomies”
 e.g. categorization by Mood & Emotion
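To make "Multiple Repositories Comply with Metadata Standard" concrete, the sketch below checks a record against an organizational standard built as an extension of the Dublin Core. The fifteen element names are those of the Dublin Core Metadata Element Set. The extension field names, the required-field policy, and the function `check_record` are hypothetical and would differ in each organization.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical organizational extensions and required-field policy.
ORG_EXTENSIONS = {"doc_type", "products_services", "role"}
REQUIRED_FIELDS = {"title", "creator", "date", "identifier", "doc_type"}


def check_record(record):
    """Return (missing required fields, fields outside the standard)."""
    allowed = DUBLIN_CORE_ELEMENTS | ORG_EXTENSIONS
    populated = {name for name, value in record.items() if value}
    missing = sorted(REQUIRED_FIELDS - populated)
    unknown = sorted(set(record) - allowed)
    return missing, unknown
```

Running the same check against records from each repository gives a simple per-repository compliance count.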
“Organizational Metadata Standard” - How is Dublin Core extended?
[Bar chart, vertical axis 0% to 120%. Categories: Doc Types, Products & Services, Roles, Inconsistent Encoding. Data labels (highest to lowest): 100%, 86%, 57%, 57%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Tools and Tool Selection
 Limiting:
 “Use of Unneeded Tool Capabilities”
 e.g. autogenerated keywords
 “Tools, then Requirements”
 Related to “Use it or Lose it Budgeting”
 Basic:
 “Purpose, then Requirements, then Tools”
 Intermediate:
 “Datasets for Product Evaluations”
 Advanced:
 “Budgeted Evaluations”*
Staff Training and Hiring
 Basic:
 “Search Analyst Role”
 Related to “Query Log Examination”
 Intermediate:
 “Adding and Appointing Library Expertise”
 Advanced:
 “Pre-Hire Testing”
  Bleeding Edge:
 “Hiring Subject Matter Experts for Cataloging”
Data Creation and QA
 Basic:
 “Content Management Introduced”
 Intermediate:
 “ROT-Elimination”
 Advanced:
 “Hybrid Metadata Creation Models”
 Bleeding Edge:
 “Adaptive Qualification of End-User Feedback”
 “Qualitative and Quantitative Measures of Metadata
Quality”*
* Hypothetical, not yet observed in survey participants
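A "ROT-Elimination" pass could start with a mechanical screen like the sketch below, which flags documents as redundant (same text as an earlier document), obsolete (not modified within a cutoff), or trivial (very short). The document structure, both thresholds, and the name `flag_rot` are assumptions for illustration; a flagged document still needs human review before removal.

```python
import hashlib
from datetime import date


def flag_rot(docs, today, max_age_days=730, min_chars=50):
    """Flag candidate ROT. Each doc is a dict with 'id', 'text', 'modified'.

    Returns a dict mapping document id to a list of reasons.
    """
    first_seen = {}  # content digest -> id of the first document with that text
    flags = {}
    for doc in docs:
        reasons = []
        text = doc["text"].strip()
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in first_seen:
            reasons.append("redundant")
        else:
            first_seen[digest] = doc["id"]
        if (today - doc["modified"]).days > max_age_days:
            reasons.append("obsolete")
        if len(text) < min_chars:
            reasons.append("trivial")
        if reasons:
            flags[doc["id"]] = reasons
    return flags
```

Exact-text hashing only catches verbatim copies; near-duplicate detection would need a different technique.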
Methods used to create & maintain metadata: Note that Automation ≠ Maturity
[Bar chart, vertical axis 0% to 80%. Categories: Forms, Distributed Production, Centralized Production, Not Automated. Data labels (highest to lowest): 71%, 57%, 43%, 43%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Project Management
 Basic:
 “Project Plan”
 Intermediate:
 “Standard Project Methodology”
 “Cross-functional Teams”
 “Communication Plan”
 “Multi-Year Plan of Upcoming Projects”
 Advanced:
 “Early Termination of Projects”
 See Enterprise Search Report for much more on
managing a search project.
Executive Support and ROI
 Limiting:
 “Use It or Lose It Budgeting”
 Basic:
 “External Search ROI”
 Intermediate:
 “Intranet ROI Model”
 Advanced:
 “CEO knows Search ROI”
 See Enterprise Search Report for much more on
search ROI.
Conclusions
 Remember the rant – Maturity is a characterization of the way an
organization achieves its goals, not a goal in and of itself.
 Not all search needs are created equal.
 Stock photo agencies are tops at search on external site.
 Their intranets are no better than anyone else’s because the ROI is
not clear.
 Consulting agencies have better intranets and KM efforts because of
the clearer ROI.
 High Maturity really means a Metrics Emphasis
 Some organizations believe that is inappropriate for them
 Use this as a guide to decide where to improve, and to decide
which processes may be more sophisticated than your
organization can handle
 Keep in mind the difference between organizational and team
sophistication. A specific team may do some very advanced things,
even if the organization around them is not “mature”.
Recommended Reading
CMMI: http://chrguibert.free.fr/cmmi
(Official site is http://www.sei.cmu.edu/cmmi/, but that is not the most comprehensible.)
Joel Test: http://www.joelonsoftware.com/articles/fog0000000043.html
EIA Roadmap: http://www.louisrosenfeld.com/presentations/031013-KMintranets.ppt
Enterprise Search Report: http://www.cmswatch.com/EntSearch/
Taxonomy Strategies LLC
Contact Info
Ron Daniel
925-368-8371
[email protected]
Joseph Busch
415-377-7912
[email protected]