Four Myths about Taxonomies

Taxonomy Strategies LLC
4 Myths about Taxonomies
ITIMG – Industrial Technical
Information Managers Group Meeting
Newport Beach, CA
April 11, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Who I am
Over 25 years in the business of organized information
• Founder & Principal, Taxonomy Strategies
• Director, Solutions Architecture, Interwoven
• VP, Infoware, Metacode Technologies
• Program Manager, Getty Foundation
• Manager, Pricewaterhouse
• Assistant Director for Technical Services, Hampshire College
• Chief, Technical Services, Paul Weiss Rifkind Wharton & Garrison
Metadata & taxonomies community leadership
• President, American Society for Information Science & Technology
• Trustee, Dublin Core Metadata Initiative
• Co-Founder, Networked Knowledge Organization Systems/Services
• Adviser, National Research Council Computer Science and Telecommunications Board
• Reviewer, National Science Foundation Division of Information and Intelligent Systems
TAXONOMY STRATEGIES LLC The business of organized information
Recent & current projects
Government
• Commodity Futures Trading Commission
• Defense Intelligence Agency
• ERIC
• Federal Aviation Administration
• Federal Reserve Bank of Atlanta
• Forest Service
• GSA Office of Citizen Services (www.firstgov.gov)
• Head Start
• Infocomm Development Authority of Singapore
• NASA (nasataxonomy.jpl.nasa.gov)
• Small Business Administration
• Social Security Administration
• USDA Economic Research Service
• USDA e-Government Program (www.usda.gov)
Commercial
• Allstate Insurance
• Blue Shield of California
• Debevoise & Plimpton
• Halliburton
• Hewlett Packard
• Motorola
• PeopleSoft
• Pricewaterhouse Coopers
• Siderean Software
• Sprint
• Time Inc.
Commercial subcontracts
• Agency.com – Top financial services
• Critical Mass – Fortune 50 retailer
• Deloitte Consulting – Big credit card
• Gistics/OTB – Direct selling giant
NGOs
• CEN
• IDEAlliance
• IMF
• OCLC
What I do
Organize Stuff
Agenda
• Myth #1: The Web has changed everything
• Myth #2: Taxonomies are monolithic hierarchies
• Myth #3: Literary warrant
• Myth #4: Knowledge workers
Finding information should not be about “Feeling Lucky”
Something is wrong with this picture
“…search is so fundamental that people should have been focusing on it all along. The reality of the situation is that there was a great assumption that search was actually working just fine.”
— Harley Manning, Research Director
Why doesn’t search work?
• For search engines to work, they need better stuff to work on! Otherwise it’s garbage in … and garbage out.
• Correctly matching content with questions (regardless of the technology) requires better content to work on.
How to fix search … add metadata to search on
• “Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.”
• “Enriching content with structured metadata is critical for supporting search and personalized content delivery.”
• “Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.”
What is metadata? Another view of Dublin Core
• Subject metadata – What & Why: Subject, Description, Coverage
• Use metadata – How can it be used: Rights & Permissions
• Asset metadata – Who, Where & When: Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language
• Relational metadata – Links between and to: Relation
Better resource description = Better navigation & discovery
[Diagram axes: Difficult to Generate; Functionality]
Dublin Core is a little more complicated
Elements
1. Identifier
2. Title
3. Creator
4. Contributor
5. Publisher
6. Subject
7. Description
8. Coverage
9. Format
10. Type
11. Date
12. Relation
13. Source
14. Rights
15. Language
Refinements
Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings
Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CDTF
Types
Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
Metadata is a data model – A scheme for e-Forms
Element        Namespace            Source                  Purpose
Identifier     dc:identifier        System supplied         Basic accountability
Registrar      dc:creator           LDAP validated          Accountability & maintenance
Form Name      dc:title             User                    Text search, results display
Form Number    dcterms:alternative  User                    Text search, results display
Revision Date  dcterms:modified     User                    Filter or rank search results
Agency         dc:publisher         FIPS 95-2               Key index to retrieve & aggregate assets
Subject
Form Type      dc:type              Form Type vocabulary    Browse or group search results
Industry Code  us:naics             NAICS codes             Browse or group search results
Jurisdiction   dc:coverage          FIPS 5-2                Browse or group search results
Purpose        us:feabrm            FEA Business Ref Model  Browse or group search results
…              …                    …                       …
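One way to read the e-Forms scheme is as a small data model: each form field maps to a namespace-qualified property with a known source and purpose. The sketch below is illustrative only. `ElementSpec`, `SCHEME`, and `to_properties` are hypothetical names, only four rows of the scheme are included, and the sample values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementSpec:
    label: str    # field name shown on the form
    prop: str     # namespace-qualified property
    source: str   # where the value comes from
    purpose: str  # what the value is used for


# Four rows taken from the scheme; the remaining rows would follow the same pattern.
SCHEME = [
    ElementSpec("Identifier", "dc:identifier", "System supplied", "Basic accountability"),
    ElementSpec("Form Name", "dc:title", "User", "Text search, results display"),
    ElementSpec("Revision Date", "dcterms:modified", "User", "Filter or rank search results"),
    ElementSpec("Jurisdiction", "dc:coverage", "FIPS 5-2", "Browse or group search results"),
]


def to_properties(form_values):
    """Map form labels to namespace-qualified properties; reject labels not in the scheme."""
    by_label = {spec.label: spec.prop for spec in SCHEME}
    unknown = sorted(set(form_values) - set(by_label))
    if unknown:
        raise KeyError(f"not in scheme: {unknown}")
    return {by_label[label]: value for label, value in form_values.items()}


# Placeholder values for illustration.
record = to_properties({"Form Name": "Form SS-4", "Jurisdiction": "Federal"})
```

Keeping the scheme as data, rather than hard-coding field names, means new elements can be added without touching the mapping logic.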
How is Dublin Core used in corporate environments?
[Bar chart. Categories: De facto, Simple, Access enabler, Compliance. Values shown: 57%, 43%, 43%, 29%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Dublin Core framework for corporate use
• Not just 15 elements
• A framework to enable cross-resource exploration and use
Dublin Core is the framework for “integration metadata” at BellSouth
Agenda
• Myth #1: The Web has changed everything
• Myth #2: Taxonomies are monolithic hierarchies
• Myth #3: Literary warrant
• Myth #4: Knowledge workers
What is a taxonomy? Systematics view
Hierarchical classification of things into a tree structure
Linnaeus …
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Family: Canidae
Genus: Canis
Species: C. familiaris
UNSPSC …
Segment: 44-Office Equipment and Accessories and Supplies
Family: .12-Office Supplies
Class: .17-Writing Instruments
Commodity: .05-Mechanical pencils; .06-Wooden pencils; .07-Colored pencils
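The UNSPSC example can be sketched in code. The sketch assumes that the 44 / .12 / .17 / .05 parts concatenate into one eight-digit code with two digits per level, and that a higher-level node is written by zero-padding the remaining digits. The function names are hypothetical.

```python
LEVELS = ("Segment", "Family", "Class", "Commodity")


def split_code(code):
    """Split an eight-digit code into its four two-digit levels."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected 8 digits, got {code!r}")
    return {level: code[i * 2:i * 2 + 2] for i, level in enumerate(LEVELS)}


def ancestor_codes(code):
    """Return the path from the top-level segment down to the code itself."""
    split_code(code)  # validates the input
    return [code[:n].ljust(8, "0") for n in (2, 4, 6, 8)]


# 44 / .12 / .17 / .05 from the slide, read as one code (assumption).
mechanical_pencils = "44121705"
levels = split_code(mechanical_pencils)
path = ancestor_codes(mechanical_pencils)
```

Because every node's position is encoded in its identifier, walking up the tree needs no lookup table, which is characteristic of a strict single hierarchy.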
Taxonomic metadata – e-Forms example
Metadata Elements: Agency; Form Type; Industry Impact; Jurisdiction; BRM Impact; Keyword Topic; Audience
Taxonomies
Agency: 0001 Legislative; 1000 Judicial; 1100 Executive Office of Pres; 0003 Exec Depts; 1200 Agriculture; 1300 Commerce; 9700 Defense; 9100 Education; 8900 Energy; 7500 HHS; 7000 DHS; 8600 HUD; 1400 Interior; 1500 Justice; 1600 Labor; 1900 State; 6900 Transport; 2000 Treasury; 3600 Veterans; Ind Agencies; Intl Orgs
Form Type: Application; Approval; Claim; Information request; Information submission; Instructions; Legal filing; Payment; Procurement; Renewal; Reservation; Service request; Test; Other input; Other transaction
Industry Impact: 00 Generic; 11 Agriculture; 21 Mining; 22 Utilities; 23 Construct; 31-33 Manuf; 42 Wholesale; 44-45 Retail; 48-49 Trans; 51 Info; 52 Finance; 54 Profession; 55 Mgmt; 56 Support; 61 Education; 62 Health Care; 71 Arts; 72 Hospitality; 81 Other Services; 92 Public Admin
Jurisdiction: Federal; State +; Local +; Other +
BRM Impact: Citizen Srvcs; Social Srvs; Defense; Disasters; Econ Dev; Education; Energy; Env Mgmt; Law Enf; Judicial; Correctional; Health; Security; Income Sec; Intelligence; Intl Affairs; Nat Resour; Transport; Workforce; Science; Delivery; Support; Management
Keyword Topic: Agriculture & food; Commerce; Communications; Education; Energy; Env pro; Foreign rels; Govt; Health & safety; Housing & comm dev; Labor; Law; Named grps; National def; Nat resources; Recreation; Sci & tech; Social pgms; Transport
Audience: All; General; Citizen; Business; Govt; Employee; Native American; Nonresident; Tourist; Special group
The power of taxonomy facets
• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
• Easier to maintain
• Can be easier to navigate
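The arithmetic behind the facet claim can be checked with a few lines. This is a minimal sketch with placeholder facet names (A to D) and placeholder node labels. It counts how many distinct combinations four 10-node facets can express and how many nodes actually have to be maintained.

```python
from itertools import product

# Four hypothetical facets with 10 placeholder nodes each.
facets = {name: [f"{name}-{i}" for i in range(10)] for name in ("A", "B", "C", "D")}

# Every way of picking one node from each facet.
combinations = list(product(*facets.values()))

# Nodes that someone has to create and maintain.
nodes_to_maintain = sum(len(nodes) for nodes in facets.values())
```

Here `len(combinations)` is 10,000 while `nodes_to_maintain` is only 40, which is the maintenance advantage the slide points to.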
Taxonomic metadata example: Form SS-4. Employer Identification Number (EIN)
Facet                Values
Agency               IRS
Content Type         Information Submission
Industry Impact      Generic
Jurisdiction         Federal
Programs & Services  Support Delivery of Services/General Government/Taxation Management
Keyword Topic        Commerce/Employment taxes
Audience             Business
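A record tagged this way can be filtered on any combination of facets. The sketch below stores the Form SS-4 values as a plain dictionary and treats slash-separated values as paths, so a query for a broader term also matches its narrower terms. The `matches` function and the path convention are assumptions made for illustration.

```python
ss4 = {
    "Agency": "IRS",
    "Content Type": "Information Submission",
    "Industry Impact": "Generic",
    "Jurisdiction": "Federal",
    "Programs & Services": "Support Delivery of Services/General Government/Taxation Management",
    "Keyword Topic": "Commerce/Employment taxes",
    "Audience": "Business",
}


def matches(record, criteria):
    """True if each criterion equals the facet value or is an ancestor path of it."""
    for facet, wanted in criteria.items():
        value = record.get(facet, "")
        if value != wanted and not value.startswith(wanted + "/"):
            return False
    return True


broad_hit = matches(ss4, {"Audience": "Business", "Keyword Topic": "Commerce"})
miss = matches(ss4, {"Agency": "SSA"})
```

Each criterion narrows the result set independently, so users can start from whichever facet they know best.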
Methods used to create & maintain metadata
[Bar chart. Categories: Centralized production, Not Automated, Forms, Distributed Production. Values shown: 71%, 57%, 43%, 43%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Agenda
• Myth #1: The Web has changed everything
• Myth #2: Taxonomies are monolithic hierarchies
• Myth #3: Literary warrant
• Myth #4: Knowledge workers
Literary warrant
• The “literature” on which a controlled vocabulary is based.
• The “official names” of people, organizations, events, places, and things as they appear in published sources.
Type of Entity  Authoritative Sources
Author names    Title page
Places          US Board on Geographic Names, National Geospatial-Intelligence Agency, ISO 3166, UN Statistics Division
Subjects        Existing literature
Why vocabulary differences are necessary
• Terminology is needed before “literature” establishes warrant.
• Categories are needed for internal purposes such as sorting, analysis, and other ad hoc groupings.
• Organizations, places, and other entities change over time.
Folksonomies: Emergent topics
Some vocabulary differences are necessary: Grouping
ISO 3166-1  UN Code  Internal Code  Name        Official Name
AUT         40       122            Austria     Republic of Austria
BEL         56       124            Belgium     Kingdom of Belgium
DNK         208      128            Denmark     Kingdom of Denmark
FRA         250      132            France      French Republic
DEU         276      134            Germany     Federal Republic of Germany
SMR         674      135            San Marino  Republic of San Marino
ITA         380      136            Italy       Italian Republic
LUX         442      137            Luxembourg  Grand Duchy of Luxembourg
…           …        …              …           …
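The grouping table can be treated as a crosswalk between code systems. The sketch below copies the eight rows shown and compares the ordering produced by the UN code with the ordering produced by the internal code. Reading the internal code as a deliberate grouping order is an interpretation, and the names `ROWS`, `ISO_TO_INTERNAL`, and `names_sorted_by` are hypothetical.

```python
# (ISO 3166-1 code, UN code, internal code, name), copied from the table.
ROWS = [
    ("AUT", 40, 122, "Austria"),
    ("BEL", 56, 124, "Belgium"),
    ("DNK", 208, 128, "Denmark"),
    ("FRA", 250, 132, "France"),
    ("DEU", 276, 134, "Germany"),
    ("SMR", 674, 135, "San Marino"),
    ("ITA", 380, 136, "Italy"),
    ("LUX", 442, 137, "Luxembourg"),
]

# Crosswalk from the published standard code to the internal code.
ISO_TO_INTERNAL = {iso: internal for iso, _un, internal, _name in ROWS}


def names_sorted_by(column):
    """Return names ordered by a column index (1 = UN code, 2 = internal code)."""
    return [row[3] for row in sorted(ROWS, key=lambda row: row[column])]


by_un = names_sorted_by(1)
by_internal = names_sorted_by(2)
```

Sorted by UN code, San Marino falls at the end of this list; sorted by internal code, it sits directly before Italy. Keeping both codes lets content be exchanged with the standard identifier while reports use the local grouping.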
Some vocabulary differences are necessary: Entities change over time
Name                   Part of                         Effective Dates  Entity Type
Serbia and Montenegro  Europe                          2003-            Independent state
Serbia and Montenegro  Federal Republic of Yugoslavia  1991-2003        Republic
Yugoslavia             Europe                          1929-1991        Independent state
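Entities that change over time can be handled by attaching effective dates to each vocabulary entry and resolving names against a year. The sketch below uses the three rows as read from the table. The `Entry` type and `valid_in` function are hypothetical, and treating the end year as exclusive is an assumption made so that boundary years resolve to a single entry.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    part_of: str
    start: int
    end: Optional[int]  # None means still in effect
    entity_type: str


ENTRIES = [
    Entry("Serbia and Montenegro", "Europe", 2003, None, "Independent state"),
    Entry("Serbia and Montenegro", "Federal Republic of Yugoslavia", 1991, 2003, "Republic"),
    Entry("Yugoslavia", "Europe", 1929, 1991, "Independent state"),
]


def valid_in(year):
    """Entries in effect in `year`; the end year is treated as exclusive (assumption)."""
    return [e for e in ENTRIES if e.start <= year and (e.end is None or year < e.end)]


names_1950 = [e.name for e in valid_in(1950)]
```

Older content keeps the term that was valid when it was written, while a dated lookup lets search and navigation relate it to the current name.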
Sources for 7 common taxonomies
Organization
  Definition: Organizational structure.
  Potential sources: FIPS 95-2, U.S. Government Manual, your organizational structure, etc.
Content Type
  Definition: Structured list of the various types of content being managed or used.
  Potential sources: DC Types, AGLS Document Type, AAT Information Forms, records management policy, etc.
Industry
  Definition: Broad market categories such as lines of business, life events, or industry codes.
  Potential sources: FIPS 66, SIC, NAICS, etc.
Location
  Definition: Place of operations or constituencies.
  Potential sources: FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, etc.
Function
  Definition: Functions and processes performed to accomplish mission and goals.
  Potential sources: FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Topic
  Definition: Business topics relevant to your mission and goals.
  Potential sources: Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.
Audience
  Definition: Subset of constituents to whom a piece of content is directed or intended to be used.
  Potential sources: GEM, ERIC Thesaurus, IEEE LOM, etc.
Products and Services
  Definition: Names of products/programs & services.
  Potential sources: ERP system, your products and services, etc.
How is Dublin Core extended?
[Bar chart. Categories: Roles, Inconsistent Encoding, Doc Types, Products & Services. Values shown: 100%, 86%, 57%, 57%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Business process document types: Local document type lists are commonly invented
Oil & gas services company document types
analysis, appraisals, assessments, forecasts, predictions
agendas, plans, designs, schedules, workflow
applications, proposals, requests, requirements
permits, consents, approvals, rejections, certificates
work orders, correspondence
auditing, compliance, testing, inspections, operations reports
lessons learned, after-action reviews, meeting minutes, FAQs
policies, procedures, training manuals, standards, best practices
research notes, journal articles
newsletters, bulletins, press releases
ads, brochures, data sheets, technical notes, case studies, price lists
checklists, templates, forms, logos, branding
software, database forms
What controlled vocabularies are being used?
[Bar chart. Categories: ERP, LDAP, Business Process, ISO 3166, Language Codes. Values shown: 57%, 43%, 29%, 14%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Agenda
• Myth #1: The Web has changed everything
• Myth #2: Taxonomies are monolithic hierarchies
• Myth #3: Literary warrant
• Myth #4: Knowledge workers
Knowledge workers spend up to 2.5 hours each day looking for information …
… But find what they are looking for only 40% of the time.
— Kit Sims Taylor
[Pie chart segments: Communicating, Searching, Creating]
Knowledge workers spend more time re-creating existing content than creating new content
[Pie chart segments: Communicating; Recreating existing content, 26%; Searching; Creating new content, 9%]
— Kit Sims Taylor
High cost of not finding information
• “The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …”
— Sue Feldman,
High cost of poor classification
• Poor classification costs a 10,000 user organization $10M each year, about $1,000 per employee.
— Jakob Nielsen, useit.com
Opportunities and challenges
• 80% of enterprise data is unstructured.
• Outputs from back office systems are documents: queries & reports.
• Avoiding unnecessary recreation of content.
• Enabling decision-making transparency.
• Promulgating policies & guidelines.
• Managing intellectual property.
• Supporting products & services throughout their life cycle: development, marketing, sales & support.
Productivity, loyalty, and revenue have provided the ROI
Intranet has provided the best ROI
[Bar chart. Categories: Intranet; Web/online customer sales; Web dev infrastructure; Web/online business sales; Middleware to link Web to ERP; Extranet/supply chain; ebilling/payment systems; Wireless Web access; e-marketplace/portal; None]
Taxonomy Strategies LLC
Joseph A. Busch
+ 415-377-7912
[email protected]
http://www.taxonomystrategies.com
April 11, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.