FAQs About Taxonomies & Metadata


Taxonomy Strategies LLC
FAQs About Taxonomies & Metadata
Joseph A. Busch & Ron Daniel, Jr.
May 16, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Agenda
9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
10:05 How do I associate the taxonomy with content?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn
Taxonomy Strategies LLC
The business of organized information
Who is Joseph Busch?
 Over 25 years in the business of organized information
 Founder, Taxonomy Strategies
 Director, Solutions Architecture, Interwoven
 VP, Infoware, Metacode Technologies
 Program Manager, Getty Foundation
 Manager, Pricewaterhouse
 Metadata and taxonomies community leadership
 President, American Society for Information Science & Technology
 Director, Dublin Core Metadata Initiative
 Adviser, National Research Council Computer Science and Telecommunications Board
 Reviewer, National Science Foundation Division of Information and Intelligent Systems
 Founder, Networked Knowledge Organization Systems/Services
Who is Ron Daniel, Jr.?
 Over 15 years in the business of metadata & automatic classification
 Principal, Taxonomy Strategies
 Standards Architect, Interwoven
 Senior Information Scientist, Metacode Technologies
 Technical Staff Member, Los Alamos National Laboratory
 Metadata and taxonomies community leadership
 Chair, PRISM (Publishing Requirements for Industry Standard Metadata) working group
 Acting Chair, XML Linking working group
 Member, RDF working groups
 Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Who has Taxonomy Strategies worked with?
Government
 Commodity Futures Trading Commission
 Defense Intelligence Agency
 ERIC
 Federal Aviation Administration
 Federal Reserve Bank of Atlanta
 Forest Service
 GSA Office of Citizen Services (www.firstgov.gov)
 Head Start
 Infocomm Development Authority of Singapore
 NASA (nasataxonomy.jpl.nasa.gov)
 Small Business Administration
 Social Security Administration
 USDA Economic Research Service
 USDA e-Government Program (www.usda.gov)
Commercial
 Allstate Insurance
 Blue Shield of California
 Debevoise & Plimpton
 Halliburton
 Hewlett Packard
 Motorola
 PeopleSoft
 PricewaterhouseCoopers
 Siderean Software
 Sprint
 Time Inc.
Commercial subcontracts
 Agency.com – Top financial services
 Critical Mass – Fortune 50 retailers
 Deloitte Consulting – Big credit card
 Gistics/OTB – Direct selling giant
International orgs & Non-profits
 CEN
 IDEAlliance
 IMF
 OCLC
What we do
Organize Stuff
Who are you? What do you want out of today?
 Government / NGO / SME / Global 2000?
 IT / Library & IM / Public Affairs / Product Management / Engineering / HR & Finance / Other?
 Webmaster / Technical / Researcher / Editorial / Supervisory / Executive?
 Competing session – Search & Content Management: Putting the Puzzle Pieces Together
 What brought you HERE instead of THERE?
What is metadata? Different definitions
 Library & Information Science
 Author/Title/Subject
 Controlled Vocabularies for Subject Codes (e.g. Dewey)
 Authority Files for Author Names
 Database
 Tables/Columns/Datatypes/Relationships
 References for some values
What is metadata? Another view of Dublin Core
Subject metadata – What & Why: Subject, Description, Coverage
Use metadata – How can it be used: Rights & Permissions
Asset metadata – Who, Where & When: Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language
Relational metadata – Links between and to: Relation
Better resource description = Better navigation & discovery
(Diagram labels: Difficult to Generate; Functionality)
Are there extensions to the Dublin Core?
Elements (15): Title, Creator, Contributor, Publisher, Subject, Description, Coverage, Format, Type, Date, Relation, Source, Rights, Language, Identifier
Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CTDF
Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
What is metadata: A scheme for recipes
Element | Data Type | Length | Source | Purpose
Asset Metadata
Unique ID | Integer | Fixed | System supplied | Basic accountability
Recipe Title | String | Variable | Licensed Content | Text search & results display
Recipe summary | String | Variable | Licensed Content | Text search & results display
Subject Metadata
Main Ingredients | List | Variable | Main Ingredients vocabulary | Key index to retrieve & aggregate recipes, & generate shopping list
Meal Types | List | Variable | Meal Types vocab | Browse or group recipes & filter search results
Cuisines | List | Variable | Cuisines | Browse or group recipes & filter search results
Courses | List | Variable | Courses vocab | Browse or group recipes & filter search results
Cooking Method | Flag | Fixed | Cooking vocab | Browse or group recipes & filter search results
Link Metadata
Recipe Image | Pointer | Variable | Product Group | Merchandize products
Use Metadata
Rating | String | Variable | Licensed Content | Filter, rank, & evaluate recipes
Release Date | Date | Fixed | Product Group | Publish & feature new recipes
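A scheme like this can be read as a record type whose subject fields draw on controlled vocabularies. The minimal Python sketch below covers only some of the fields and assumes tiny, hypothetical vocabularies (the real Meal Types, Cuisines, and Courses lists are not given). It shows how a record might be checked before it is stored:

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies standing in for the real
# Meal Types, Cuisines, and Courses vocabularies named in the table.
MEAL_TYPES = {"breakfast", "lunch", "dinner", "snack"}
CUISINES = {"italian", "mexican", "thai"}
COURSES = {"appetizer", "main", "dessert"}


@dataclass
class Recipe:
    unique_id: int                 # system supplied
    title: str                     # licensed content
    summary: str                   # licensed content
    main_ingredients: list[str]    # key index; at least one value expected
    meal_types: list[str] = field(default_factory=list)
    cuisines: list[str] = field(default_factory=list)
    courses: list[str] = field(default_factory=list)


def validate(recipe: Recipe) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not recipe.title.strip():
        problems.append("title is empty")
    if not recipe.main_ingredients:
        problems.append("at least one main ingredient is required")
    # Every subject value must come from its controlled vocabulary.
    for name, values, vocab in (
        ("meal_types", recipe.meal_types, MEAL_TYPES),
        ("cuisines", recipe.cuisines, CUISINES),
        ("courses", recipe.courses, COURSES),
    ):
        for value in values:
            if value not in vocab:
                problems.append(f"{name}: '{value}' is not in the vocabulary")
    return problems
```

Rejecting values that are not in the vocabulary at entry time is what makes the subject fields usable as browse and filter keys later.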
What is a taxonomy? Systematics vs. pragmatic view
Systematics: Biological taxonomy places an organism in one and only one place (Linnaeus …).
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Family: Canidae
Genus: Canis
Species: C. familiaris
Pragmatic view: But most of the time things belong to more than one category. For example, Dogs can sit under Pets, Mammals, and Farm Animals.
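One way to see the difference is to map each term to its broader terms. A strict tree allows at most one parent per term, while a polyhierarchy allows several. The minimal sketch below uses the slide's examples. The parent links for Dogs are an illustrative reading of the diagram:

```python
# Each term maps to its list of broader (parent) terms.
linnaean = {
    "Chordata": ["Animalia"], "Mammalia": ["Chordata"],
    "Carnivora": ["Mammalia"], "Canidae": ["Carnivora"],
    "Canis": ["Canidae"], "C. familiaris": ["Canis"],
}
pragmatic = {"Dogs": ["Pets", "Mammals", "Farm Animals"]}


def is_strict_tree(broader: dict[str, list[str]]) -> bool:
    """True if every term has at most one broader term."""
    return all(len(parents) <= 1 for parents in broader.values())


def paths_to_root(term: str, broader: dict[str, list[str]]) -> list[list[str]]:
    """All paths from a term up to a top-level term (assumes no cycles)."""
    parents = broader.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + path for p in parents for path in paths_to_root(p, broader)]
```

The Linnaean classification yields exactly one path for C. familiaris. Dogs yields three paths, one per category it belongs to.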
Are there other organizational schemes?
Type | Remarks
Synonym Ring | Connects a series of terms together; treats them as equivalent for search purposes
Authority File | Used to control variant names with a preferred term; typically used for names of countries, individuals, organizations
Classification Scheme | An arrangement of knowledge; does not follow taxonomy rules; usually enumerated, e.g., LC or Dewey
Thesaurus | Expresses semantic relationships of: hierarchy (broader & narrower terms); equivalence (synonyms); associative (related terms)
Ontology | Resembles a faceted taxonomy but uses richer semantic relationships among terms and attributes and strict specification rules
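In search, a synonym ring's job of treating terms as equivalent amounts to query expansion. The minimal sketch below assumes hypothetical rings and naive whitespace tokenization. A production search engine would apply this at index or query time with real text analysis:

```python
# Hypothetical synonym rings: every term in a ring is equivalent for search.
SYNONYM_RINGS = [
    {"car", "automobile", "auto"},
    {"attorney", "lawyer", "counsel"},
]


def expand(term: str) -> set[str]:
    """Return the term plus every term that shares a ring with it."""
    term = term.lower()
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    return expanded


def search(query: str, docs: dict[str, str]) -> list[str]:
    """IDs of documents containing the query term or any of its synonyms."""
    terms = expand(query)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if terms & set(text.lower().split())
    )
```

A search for "car" then also finds a document that only says "automobile".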
Another point of view …
(Diagram: vocabularies (Synonym Rings, Authority Files, Classification Schemes, Taxonomies, Thesauri, Ontologies) plotted against relationships ranging from simple to complex (Equivalence, Hierarchical, Associative).)
Source: Amy Warner. Metadata and Taxonomies for a More Flexible Information Architecture (http://www.lexonomy.com/presentations/metadataAndTaxonomies.ppt)
Taxonomic metadata – e-Forms example
Metadata elements: Agency, Form Type, Industry, Jurisdiction, BRM Impact, Keyword, Topic, Audience
Agency: 0001 Legislative; 1000 Judicial; 1100 Executive Office of Pres; 0003 Exec Depts (1200 Agriculture, 1300 Commerce, 9700 Defense, 9100 Education, 8900 Energy, 7500 HHS, 7000 DHS, 8600 HUD, 1400 Interior, 1500 Justice, 1600 Labor, 1900 State, 6900 Transport, 2000 Treasury, 3600 Veterans); Ind Agencies; Intl Orgs
Form Type: Application; Approval; Claim; Information request; Information submission; Instructions; Legal filing; Payment; Procurement; Renewal; Reservation; Service request; Test; Other input; Other transaction
Industry: 00 Generic; 11 Agriculture; 21 Mining; 22 Utilities; 23 Construct; 31-33 Manuf; 42 Wholesale; 44-45 Retail; 48-49 Trans; 51 Info; 52 Finance; 54 Profession; 55 Mgmt; 56 Support; 61 Education; 62 Health Care; 71 Arts; 72 Hospitality; 81 Other Services; 92 Public Admin
Jurisdiction: Federal; State +; Local +; Other +
BRM Impact: Citizen Srvcs; Social Srvs; Defense; Disasters; Econ Dev; Education; Energy; Env Mgmt; Law Enf; Judicial; Correctional; Health; Security; Income Sec; Intelligence; Intl Affairs; Nat Resour; Transport; Workforce; Science; Delivery; Support; Management
Topic: Agriculture & food; Commerce; Communications; Education; Energy; Env pro; Foreign rels; Govt; Health & safety; Housing & comm dev; Labor; Law; Named grps; National def; Nat resources; Recreation; Sci & tech; Social pgms; Transport
Audience: All; General; Citizen; Business; Govt; Employee; Native American; Nonresident; Tourist; Special group
Why use faceted taxonomies?
 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
 Easier to maintain
 Can be easier to navigate
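The arithmetic behind the first bullet is a product versus a sum. Four facets of 10 values give 10 × 10 × 10 × 10 = 10,000 distinct combinations, but an editor maintains only 10 + 10 + 10 + 10 = 40 terms. The minimal sketch below shows that arithmetic and a simple faceted filter. The form records and facet names in the usage lines are hypothetical, loosely modeled on the e-Forms example:

```python
import math


def combinations(facet_sizes: list[int]) -> int:
    """Distinct facet-value combinations: the product of the facet sizes."""
    return math.prod(facet_sizes)


def nodes_to_maintain(facet_sizes: list[int]) -> int:
    """Terms an editor actually maintains: the sum of the facet sizes."""
    return sum(facet_sizes)


def facet_filter(items: list[dict], **selected: str) -> list[dict]:
    """Keep items that match every selected facet value."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selected.items())]


print(combinations([10, 10, 10, 10]), nodes_to_maintain([10, 10, 10, 10]))  # 10000 40
```

Because each facet is independent, a user can narrow results one facet at a time. For example, `facet_filter(forms, agency="Labor", form_type="Claim")` selects Labor claim forms without anyone having to build a "Labor > Claims" branch in advance.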
Agenda
9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
 Can I get a taxonomy off-the-shelf or create one with software?
 How do you know it is good?
 How do you build or modify it to make it good?
10:05 How do I associate the taxonomy with content?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn
How do I get a good Taxonomy? – Seven practical rules
1) Incremental, extensible process that identifies and enables users, and engages stakeholders.
2) Quick implementation that provides measurable results as soon as possible.
3) Not monolithic; has separately maintainable facets.
4) Re-uses existing IP as much as possible.
5) A means to an end, and not the end in itself.
6) Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
7) Improved over time, and maintained.
Can I get a taxonomy off the shelf?
 Sure:
 www.taxonomywarehouse.com
 There are usually license fees, but they will be less than the cost of developing an equivalent taxonomy.
 The voice of experience says these will usually not be what you want.
 We recommend:
 Adopt a faceted approach.
 Reuse existing (esp. internal) vocabularies for as many of the facets as reasonable.
 Plan on doing full-custom “Content Type” and “Subject” taxonomies.
Sources for 8 common taxonomies
Taxonomy | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, Your records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed, or by whom it is intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, Your psychographics or personas, etc.
Products & Services | Names of products/programs & services. | ERP system, Your products and services, etc.
What about automatically created taxonomies?
 Documents can be ‘clustered’ based on similarities and differences.
 Problems:
 Typically only a single hierarchy
 No overall plan
 Results hard for people to navigate
What does “North” mean on this map?
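Clustering "based on similarities and differences" can be as simple as merging documents whose word sets overlap enough. The minimal sketch below uses single-link clustering with Jaccard similarity. The 0.3 threshold and whitespace tokenization are assumptions, and commercial tools use richer features. The output also illustrates the problems listed above: the clusters are unlabeled groups, and nothing plans how they relate to each other.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(docs: dict[str, str], threshold: float = 0.3) -> list[set[str]]:
    """Single-link clustering: repeatedly merge two clusters that contain
    a pair of documents whose similarity is at least `threshold`."""
    words = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    clusters = [{doc_id} for doc_id in docs]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(jaccard(words[a], words[b]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters
```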
What should I expect from automatic taxonomy construction software?
 Software can scan large quantities of content and extract statistically significant words and phrases.
 Example: Archive of 10 publications was analyzed for topics significant to ‘copyright’.
 Software does a poor job of
 de-duplication
 turning those significant words and phrases into a larger structure
 discriminating between gold and garbage
 Software is good for
 getting an understanding of the key phrases in a large amount of content
 providing test cases for evaluating a taxonomy
Source: Sample data courtesy of Randy Marcinko and nStein.
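TF-IDF is one common way to score "statistically significant" words. It rewards words that are frequent in a document but rare across the collection. The software discussed here may use other statistics. The minimal sketch below works on single words only and assumes whitespace tokenization:

```python
import math
from collections import Counter


def top_terms(docs: list[str], k: int = 3) -> list[str]:
    """Rank words by TF-IDF summed over the collection; return the top k."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    n_docs = len(docs)
    scores = Counter()
    for words in tokenized:
        for word, count in Counter(words).items():
            tf = count / len(words)
            idf = math.log(n_docs / doc_freq[word])  # 0 for words in every doc
            scores[word] += tf * idf
    return [word for word, _ in scores.most_common(k)]
```

A word that appears in every document, such as "the", scores zero. This also shows the "gold and garbage" problem in reverse: a topically central word that appears in most documents can be ranked below incidental ones.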
How can I test a Taxonomy? – Qualitative methods
Method | Process | Validation
Walk-throughs | Show and explain | Approach; Consistency to rules; Appropriateness to task
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Reaction to new interface; Reaction to search results
Tagging samples | Tag sample content with taxonomy | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
Quantitative Method – How evenly does it divide the content?
 Background:
 Documents do not distribute uniformly across categories
 Zipf (1/x) distribution is expected behavior
 80/20 rule in action (actually 70/20 rule)
(Chart: Measured and Expected Distribution of Top 10 Content Types in Library of Congress Database; y-axis: Number of Records)
 Methodology:
 Part of alpha test of ‘content type’ for corporate intranet
 115 URLs selected at random from search index were manually categorized. Inaccessible files and ‘junk’ were removed
 Results:
 Results were slightly more uniform than the Zipf distribution, which is better than expected
(Chart: Measured and Expected Distribution of Content Types in an Intranet; y-axis: # Documents; x-axis: Content Type. Categories: News & Events; Operations & Internal Communications; Manuals & Learning Materials; Marketing & Sales; Regulations, Policies, Procedures & …; Papers & Presentations; Other & Unclassified; Programs, Proposals, Plans & Schedules; People, Groups & Places)
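The "expected" series in charts like these can be computed directly. Under a Zipf (1/x) distribution, the category at rank r receives a share proportional to 1/r. The minimal sketch below computes those expectations. The measured counts in the test are made up, because the chart values cannot be recovered here.

```python
def zipf_expected(total: int, n_categories: int) -> list[float]:
    """Expected count per rank under a Zipf (1/x) distribution summing to total."""
    harmonic = sum(1 / rank for rank in range(1, n_categories + 1))
    return [total / (rank * harmonic) for rank in range(1, n_categories + 1)]


def compare(measured: list[int]) -> list[tuple[int, float]]:
    """Pair each measured count (sorted descending) with its Zipf expectation."""
    ranked = sorted(measured, reverse=True)
    expected = zipf_expected(sum(ranked), len(ranked))
    return list(zip(ranked, expected))
```

With four categories and 100 documents, the expected counts are 48, 24, 16, and 12. Measured counts that are flatter than this, as in the intranet test, are "more uniform than Zipf".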
Quantitative Method – How intuitive (repeatable) are the categorizations?
 Methodology: Closed Card Sort
 For alpha test of a grocery site
 15 testers put each of 100 best-selling products into one of 10 pre-defined categories
 Categorizations where fewer than 14 of 15 testers put a product into the same category were flagged
 Results:
Testers agreeing | Cumulative % of Products
15/15 | 54%
14/15 | 70%
13/15 | 77%
12/15 | 83%
11/15 | 85%
<11/15 | 100%
“Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.
In the trade, “Corn Tortillas” are a Dairy item!
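The results table can be derived from raw card-sort data. For each product, count how many testers chose its most popular category. Then report the cumulative share of products at each agreement level and flag those below the cutoff. The minimal sketch below uses hypothetical product data, except for the cocoa example taken from the slide:

```python
from collections import Counter


def agreement(sorts: dict[str, list[str]]) -> dict[str, int]:
    """For each product, how many testers chose its most popular category."""
    return {product: Counter(choices).most_common(1)[0][1]
            for product, choices in sorts.items()}


def cumulative_share(sorts: dict[str, list[str]], n_testers: int) -> dict[int, float]:
    """Percent of products whose top category was chosen by at least k
    testers, for k from n_testers down to 1."""
    counts = agreement(sorts)
    return {k: 100 * sum(c >= k for c in counts.values()) / len(counts)
            for k in range(n_testers, 0, -1)}


def flagged(sorts: dict[str, list[str]], min_agree: int) -> list[str]:
    """Products whose top category fell short of min_agree testers."""
    return sorted(p for p, c in agreement(sorts).items() if c < min_agree)
```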
Quantitative Method – How does taxonomy “shape” match that of content?
 Background:
 Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas
 Methodology:
 25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)
 Counts of terms and documents summed within taxonomy hierarchy
 Results:
 Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
 Mismatches between term% and document% flagged
Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
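Flagging mismatches can be automated by comparing each group's share of documents with its share of terms. The slide does not state its flagging rule. The sketch below assumes a ratio threshold of 2 in either direction and applies it to the figures in the table:

```python
# (% of terms, % of documents) per term group, from the table above.
SHAPE = {
    "Administrators": (7.8, 15.8),
    "Community Groups": (2.8, 1.8),
    "Counselors": (3.4, 1.4),
    "Federal Funds Recipients and Applicants": (9.5, 34.4),
    "Librarians": (2.8, 1.1),
    "News Media": (0.6, 3.1),
    "Other": (7.3, 2.0),
    "Parents and Families": (2.8, 6.0),
    "Policymakers": (4.5, 11.5),
    "Researchers": (2.2, 3.6),
    "School Support Staff": (2.2, 0.2),
    "Student Financial Aid Providers": (1.7, 0.7),
    "Students": (27.4, 7.0),
    "Teachers": (25.1, 11.4),
}


def mismatches(shape: dict[str, tuple[float, float]], factor: float = 2.0) -> dict[str, str]:
    """Groups whose document share is more than `factor` times their term
    share, or less than 1/factor of it (factor=2 is an assumed cutoff)."""
    result = {}
    for group, (pct_terms, pct_docs) in shape.items():
        ratio = pct_docs / pct_terms
        if ratio > factor:
            result[group] = "more documents than terms suggest"
        elif ratio < 1 / factor:
            result[group] = "fewer documents than terms suggest"
    return result
```

Groups with "more documents than terms suggest" (such as News Media) may need finer terms. Groups with the opposite flag (such as Students) may carry more terms than the content warrants.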
How do large corporations typically extend the Dublin Core?
(Chart: Doc Types 100%; Products & Services 86%; Roles 57%)
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core. Guidance information for the deployment of Dublin Core metadata in Corporate Environments (http://www.cenorm.be/cenorm/businessdomains/businessdomains/isss/cwa/cwa15247.asp)
Agenda
9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
10:05 How do I associate the taxonomy with content?
 How are we going to populate metadata elements with complete and consistent values?
 What can we expect to get from automatic classifiers?
 What kinds of tools do people use?
 How do different automatic classification tools compare?
 What else should I keep in mind?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn
General remarks on tagging
 Province of authors (SMEs) or editors?
 Taxonomy often highly granular to meet task and re-use needs.
 Vocabulary dependent on originating department.
 The more tags there are (and the more values for each tag), the more hooks to the content.
 If there are too many, authors will resist and use “general” tags (if available)
 Automatic classification tools exist, and are valuable, but results are not as good as what humans can achieve.
 “Semi-automated” is best.
 Degree of human involvement is a cost/benefit tradeoff.
What methods do large companies use to create & maintain metadata?
(Chart: methods reported: Centralized production; Not automated; Forms; Distributed production. Values shown: 71%, 57%, 43%, 43%)
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core. Guidance information for the deployment of Dublin Core metadata in Corporate Environments (http://www.cenorm.be/cenorm/businessdomains/businessdomains/isss/cwa/cwa15247.asp)
How do tools compare? Analyst viewpoint
(Chart: tools positioned by Content Volumes, low to high, against Accuracy Level, low to high)
What accuracy should we expect from an automatic classifier?
 Classification performance is measured by “inter-cataloger agreement”
 Trained librarians agree less than 80% of the time
 Errors are subtle differences in judgment, or big goofs
 Automatic classification struggles to match human performance
 Exception: Entity recognition can exceed human performance
 Classifier performance limited by algorithms available, which are limited by development effort
 Very wide variance in one vendor’s performance depending on who does the implementation, and how much time they have to do it
(Chart: Accuracy vs. Development Effort/Licensing Expense; labels: Regexps, Trained Librarians, potential performance gain)
1) 80/20 tradeoff where 20% of effort gives 80% of performance.
2) Smart implementation of inexpensive tools will outperform naive implementations of world-class tools.
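Inter-cataloger agreement is simple to measure once two catalogers (or a cataloger and a classifier) have tagged the same items. The sketch below computes raw percent agreement and Cohen's kappa. Kappa is a common chance-corrected variant, and the slide does not say which measure was used. The category labels in the test are hypothetical.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of items on which two catalogers assigned the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement corrected for the agreement expected by chance.
    Undefined (division by zero) if both use one identical category throughout."""
    n = len(a)
    observed = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

The same functions can score a classifier against a human: if classifier-to-human agreement approaches human-to-human agreement, the classifier is near the practical ceiling the slide describes.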
How do tools compare? Pragmatic viewpoint
(Chart: tools positioned by Content Volumes, low to high, against Accuracy Level, low to high)
What kind of metadata creation and maintenance process is needed?
 Even ‘purely’ automatic meta-tagging systems need a manual error correction procedure.
 Should add a QA sampling mechanism
 Tagging models:
 Author-generated
 Central librarians
 Hybrid – central auto-tagging service, distributed manual review and correction
(Diagram: Sample of ‘author-generated’ metadata workflow. Steps: Compose in Template; Automatically fill in metadata (Tagging Tool); Submit to CMS; Approve/Edit metadata; Review content; Copy Edit content; Hard Copy; Web site. Two “Problem?” decision points (Y/N) branch the flow. Roles: Analyst, Editor, Copywriter, Sys Admin.)
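The hybrid model and the QA-sampling recommendation can be sketched as a routing step. Low-confidence automatic tags go to manual review, and a random sample of the auto-accepted items is pulled back into review as a quality check. The confidence threshold, QA rate, and the `Tagged` record are assumptions for illustration. They do not describe any particular product.

```python
import random
from dataclasses import dataclass


@dataclass
class Tagged:
    doc_id: str
    tags: list[str]
    confidence: float  # classifier's confidence in its tags, 0.0 to 1.0


def route(items: list[Tagged], threshold: float = 0.8,
          qa_rate: float = 0.1, seed: int = 0) -> tuple[list[str], list[str]]:
    """Split auto-tagged items into (review queue, auto-accepted), then move a
    random QA sample of the auto-accepted items into the review queue."""
    review = [t.doc_id for t in items if t.confidence < threshold]
    accepted = [t.doc_id for t in items if t.confidence >= threshold]
    rng = random.Random(seed)
    qa_sample = rng.sample(accepted, round(len(accepted) * qa_rate))
    review += qa_sample
    accepted = [d for d in accepted if d not in qa_sample]
    return review, accepted
```

Raising the threshold or the QA rate adds human effort and catches more errors. This is the cost/benefit dial the tagging remarks describe.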
Tagging tool example: Interwoven MetaTagger
(Screenshot callouts: Manual form fill-in w/ check boxes, pull-down lists, etc.; Auto keyword & summarization)
Tagging tool example: Interwoven MetaTagger
(Screenshot callouts: Auto-categorization; Rules & pattern matching; Parse & lookup (recognize names))
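"Rules & pattern matching" and "Parse & lookup" are two distinct techniques. Rules assign a category when a pattern matches. Lookup recognizes names by checking text against an authority list. The minimal sketch below illustrates both with hypothetical rules and names. It is not MetaTagger's API or configuration format.

```python
import re

# Hypothetical category rules: a category applies if any of its patterns match.
RULES = {
    "Finance": [r"\bquarterly earnings\b", r"\brevenue\b"],
    "Legal": [r"\bcontract\b", r"\blitigation\b"],
}
# Hypothetical authority list of organization names to recognize.
KNOWN_ORGS = {"Interwoven", "Taxonomy Strategies"}


def categorize(text: str) -> list[str]:
    """Categories whose rule patterns match the text (case-insensitive)."""
    return sorted(cat for cat, patterns in RULES.items()
                  if any(re.search(p, text, re.IGNORECASE) for p in patterns))


def recognize_orgs(text: str) -> list[str]:
    """Known organization names that appear verbatim in the text."""
    return sorted(name for name in KNOWN_ORGS if name in text)
```

Hand-written regular expressions like these are cheap to build, which is why they appear at the low-effort end of the accuracy chart.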
Where do I put the metadata?
 Where can I store metadata?
 In the content – HTML Headers, File properties, etc.
 In a centralized repository – Search index, Metadata database, etc.
 Where should I store metadata? It depends.
 If you are moving files through a process, putting it in the file keeps it
from getting dropped at system borders.
 If you are doing search across multiple documents, it has to be at
least copied out of the files.
 If you make copies of files and modify them, consistent in-file
metadata will be impossible.
 Real question is not where to STORE the metadata, it is how to
MAINTAIN the metadata.
 Web CMS as an example
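For search across many documents, in-file metadata has to be copied out into a central store. The minimal sketch below reads `<meta name=… content=…>` pairs from HTML pages with Python's standard `html.parser` and collects them into one index keyed by URL. The `DC.title`-style names in the test follow a common convention for Dublin Core in HTML and are only an example.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name, content = attributes.get("name"), attributes.get("content")
            if name and content is not None:
                # Repeated names (e.g. several subjects) are kept in order.
                self.meta.setdefault(name, []).append(content)


def build_index(pages: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Copy in-file metadata out of each page into one central index keyed by URL."""
    index = {}
    for url, html in pages.items():
        parser = MetaExtractor()
        parser.feed(html)
        index[url] = parser.meta
    return index
```

This only copies the metadata. Keeping the index and the files consistent when either one changes is the maintenance problem the slide points to.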
Agenda
9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
10:05 How do I associate the taxonomy with content?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
 Does adding a taxonomy mean replacing my search engine?
 How are they used behind the scenes in a search implementation?
 How are they used in the Search UI to aid searching?
 How can we make our current search engine better?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn
How to fix search? … Add metadata to search on!
 “Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.”
 “Enriching content with structured metadata is critical for supporting search and personalized content delivery.”
 “Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.”
 “Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.”
How does Google do so well without metadata?
 They don’t; they just use particular types of metadata:
 Number of incoming links
 PageRank for each incoming link
 Text of incoming links
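PageRank turns incoming links into a score. A page is important if important pages link to it. The minimal power-iteration sketch below uses the textbook formulation with the usual 0.85 damping factor. It is not Google's production system, and it ignores the anchor-text signal listed above.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page with no outgoing links: spread its rank evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank
```

In a three-page graph where pages a and b both link to c, and c links back to a, c ends up ranked highest and b, which has no incoming links, lowest.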
Dublin Core framework for corporate use
 Not just 15 elements
 A framework to enable cross-resource exploration and use
Dublin Core is the framework for “integration metadata” at BellSouth
Source: Courtesy of Todd Stephens, BellSouth
What about Search? Integration Metadata
Element | Data Type | Length | Dublin Core | Req./Repeat | Source | Purpose
Asset Metadata
Unique ID | Integer | Fixed | dc:identifier | 1 | System supplied | Basic accountability
Recipe Title | String | Variable | dc:title | 1 | Licensed Content | Text search & results display
Recipe summary | String | Variable | dc:description | 1 | Licensed Content | Text search & results display
Subject Metadata
Main Ingredients | List | Variable | X | ? | Main Ingredients vocabulary | Key index to retrieve & aggregate recipes, & generate shopping list
Meal Types | List | Variable | X | * | Meal Types vocab | Browse or group recipes & filter search results
Cuisines | List | Variable | X | * | Cuisines | Browse or group recipes & filter search results
Courses | List | Variable | X | * | Courses vocab | Browse or group recipes & filter search results
Cooking Method | Flag | Fixed | X | * | Cooking vocab | Browse or group recipes & filter search results
Link Metadata
Recipe Image | Pointer | Variable | dcterms:hasPart | ? | Product Group | Merchandize products
Use Metadata
Rating | String | Variable | – | 1 | Licensed Content | Filter, rank, & evaluate recipes
Release Date | Date | Fixed | dc:date | 1 | Product Group | Publish & feature new recipes
Fixed values for every record: dc:type=“recipe”, dc:format=“text/html”, dc:language=“en”
Legend: ? – 1 or more; * – 0 or more
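One way this integration metadata might reach a search engine is as HTML `<meta>` elements in each recipe page. The minimal sketch below adds the table's fixed values to a record and renders every Dublin Core field. A list value, for a repeatable field, produces one element per value. The `DC.`/`DCTERMS.` name prefixes follow a common convention. A complete page would normally also declare those namespaces with `schema.DC`-style `<link>` elements, which this sketch omits.

```python
from html import escape

# Values the table fixes for every recipe record.
CONSTANTS = {"dc:type": "recipe", "dc:format": "text/html", "dc:language": "en"}
# Namespace prefix -> HTML meta-name prefix (DC.title, DCTERMS.hasPart, ...).
PREFIXES = {"dc": "DC", "dcterms": "DCTERMS"}


def to_meta_tags(record: dict) -> str:
    """Render a record's Dublin Core fields as HTML <meta> elements.
    A list value (a repeatable field) yields one element per value."""
    lines = []
    for name, value in {**CONSTANTS, **record}.items():
        prefix, element = name.split(":", 1)
        values = value if isinstance(value, list) else [value]
        for v in values:
            meta_name = f"{PREFIXES[prefix]}.{element}"
            lines.append(f'<meta name="{meta_name}" content="{escape(v)}">')
    return "\n".join(lines)
```

For example, `to_meta_tags({"dc:title": "Pad Thai"})` emits the three fixed elements plus `<meta name="DC.title" content="Pad Thai">`.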
How do I sell Management on a Taxonomy Project?
 Don’t sell “metadata” or “taxonomy”; sell the vision of what you want to be able to do.
 Clearly understand what the problem is and what the opportunities are.
 Do the calculus (costs and benefits)
 Design the taxonomy (in terms of level of effort, LOE) in relation to the value at hand.
Fundamentals of metadata ROI
 Tagging content using metadata and a taxonomy is a cost, not a benefit.
 There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
 Putting metadata and a taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
 You need to determine those changes, and their costs, as part of the ROI.
What are the typical metadata ROI scenarios?
 Catalog site
 Increased sales.
 Increased productivity.
 Customer support
 Cutting costs.
 Increased sales.
 Compliance
 Avoiding penalties.
 Knowledge worker productivity
 Less time searching, more time working.
Metadata ROI: Catalog site
Guided Navigation
 2-3 clicks to product
 No dead ends
http://www.tesco.com/winestore
Metadata ROI: Catalog site
 Increased sales
 Product findability.
 Product cross-sells and up-sells.
 Customer loyalty.
 1-5% increase in sales
 $57.6B sales (’04): $576M to $2.88B/year
 $2.1B net income (’04): $21M to $105M/year
 1-5% increase in productivity
 310,400 employees (’04) at $50K average cost per employee: $155M to $776M/year
 Enterprise portal cost
 $6M
Source: Proforma based on Hoover’s data.
Metadata ROI: Customer support model
(Screenshot callouts: Help on search page, not a click away; Type and go to search for specific policies; Policy categories for browsing; Refine search offered with results; Good search results for policy topics, e.g., “pets”)
Metadata ROI: Customer support model
 Self service
 Fewer customer calls.
 Faster, more accurate CSR responses through better information access.
 25-50% service efficiency increase
 300K customer service calls per month
 $6 cost per call
 $5.4M to $10.8M/yr
 1-5% increased sales
 $18.6B sales (’04): $186M to $930M/year
 ($761M) net income (’04): ($575M) to $169M/year
Manual processing
 100,000 documents
 2 pages per document
 $4 per page
 $800K
Source: Proforma based on Hoover’s data.
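The proforma figures on this slide follow from a few multiplications. Writing them out makes each assumption easy to change. The sketch below reproduces the slide's arithmetic in whole dollars. The constant names are illustrative, and the input figures are the slide's own.

```python
def pct_of(base: int, pct: int) -> int:
    """pct percent of base, in whole dollars."""
    return base * pct // 100


CALLS_PER_YEAR = 300_000 * 12      # 300K customer service calls per month
COST_PER_CALL = 6                  # $6 per call
SALES_04 = 18_600_000_000          # $18.6B sales ('04)
NET_INCOME_04 = -761_000_000       # ($761M) net income ('04)

call_costs = CALLS_PER_YEAR * COST_PER_CALL                    # $21.6M/yr
service_savings = [pct_of(call_costs, p) for p in (25, 50)]    # 25-50% efficiency
sales_gain = [pct_of(SALES_04, p) for p in (1, 5)]             # 1-5% more sales
net_income_after = [NET_INCOME_04 + g for g in sales_gain]
manual_processing = 100_000 * 2 * 4    # documents x pages x $ per page

print(service_savings, sales_gain, net_income_after, manual_processing)
# [5400000, 10800000] [186000000, 930000000] [-575000000, 169000000] 800000
```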
Metadata ROI: Compliance
 Avoiding penalties for breaching regulations
 SOX: up to 5 years in jail
 SOX: up to $5M
 Following required procedures
 Loss of company
 $100B revenue (’00): $100B
 Loss of partner companies
 Arthur Andersen
Source: Proforma based on Hoover’s data.
Knowledge workers spend up to 2.5 hours each day looking for information …
(Chart: time spent Communicating, Searching, Creating)
… But find what they are looking for only 40% of the time.
— Kit Sims Taylor
High cost of not finding information
 “The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …”
— Sue Feldman
High cost of poor classification
 Poor classification costs a 10,000-user organization $10M each year, about $1,000 per employee.
— Jakob Nielsen, useit.com
But “better search” itself is a weak ROI
Knowledge workers spend more time re-creating existing content than creating new content
(Chart: time spent Communicating; Searching; Recreating existing content, 26%; Creating new content, 9%)
— Kit Sims Taylor
Metadata ROI: Productivity
 Decreased cost to market
 Decreased development cost
 Increased R&D productivity
 Reduced time for sales & marketing
 1-5% decrease in drug development cost
 $800M/drug: $8M to $16M/drug
 5-10% increase in R&D productivity
 13% of revenue
 $39B in sales (’04): $254M to $507M/year
 10-20% decrease in time for sales & marketing
 13% of revenue
 $254M to $507M/year
 Enterprise document management system cost
 $10M
Source: Proforma based on Hoover’s data.
Metadata ROI: Executive Mandate
 There is no ROI out of the box
 Just someone with a vision
…and the budget to make it happen.
 What’s really needed?
 Demos and proofs of value.
 So that a stronger cost-benefit argument can be made for continuing the work
Productivity, loyalty, and revenue have provided the ROI
Intranet has provided the best ROI
(Chart: ROI by initiative: Intranet; Web/online customer sales; Web dev infrastructure; Web/online business sales; Middleware to link Web to ERP; Extranet/supply chain; ebilling/payment systems; Wireless Web access; e-marketplace/portal; None)
Taxonomy Strategies LLC
Contact Info
Ron Daniel
925-368-8371
[email protected]
Joseph Busch
415-377-7912
[email protected]