Transcript USDA ERS

Taxonomy Strategies
Taxonomy & Metadata Strategies for
Effective Content Management
February 10, 2004
Copyright 2004 Taxonomy Strategies. All rights reserved.
Who I am
Over 25 years in the business of organized information
• Founder, Taxonomy Strategies
• Director, Solutions Architecture, Interwoven
• VP, Infoware, Metacode Technologies (acquired by Interwoven, November 2000)
• Program Manager, Getty Foundation
• Manager, Pricewaterhouse
Metadata and taxonomies community leadership
• President, American Society for Information Science & Technology
• Director, Dublin Core Metadata Initiative
• Adviser, National Research Council Computer Science and Telecommunications Board
• Reviewer, National Science Foundation Division of Information and Intelligent Systems
• Founder, Networked Knowledge Organization Systems/Services
TAXONOMY STRATEGIES The business of organized information
Agenda
• Remember: Dublin Core = Descriptive cataloging
• Myth #1: Taxonomies are monolithic hierarchies
• Myth #2: People retrieve content by topical subjects
• Myth #3: Nobody else can index content
• Myth #4: All a search engine can retrieve is a list
What is metadata? Another view of Dublin Core
[Diagram: axes "Complexity" and "Enabled Functionality"; benefit labels "More efficient editorial process" and "Better navigation & discovery"]
• Asset metadata (Who, Where & When): Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language
• Subject metadata (What & Why): Subject, Description, Coverage
• Relational metadata (Links between and to): Relation
• Use metadata (How can it be used): Rights & Permissions
http://dublincore.org/documents/dces/
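As a rough sketch of the grouping on this slide, the 15 Dublin Core elements can be held in a lookup table keyed by the four metadata categories. The names `DC_GROUPS` and `group_of` are illustrative only and are not part of any Dublin Core specification; the slide's "Rights & Permissions" is assumed to correspond to the Rights element.

```python
# Illustrative grouping of the 15 Dublin Core elements, following the slide.
# "Rights" stands in for the slide's "Rights & Permissions" (an assumption).
DC_GROUPS = {
    "asset": ["Title", "Creator", "Publisher", "Contributor", "Date",
              "Type", "Format", "Identifier", "Source", "Language"],
    "subject": ["Subject", "Description", "Coverage"],
    "relational": ["Relation"],
    "use": ["Rights"],
}


def group_of(element: str) -> str:
    """Return the metadata category a Dublin Core element falls under."""
    for group, elements in DC_GROUPS.items():
        if element in elements:
            return group
    raise KeyError(f"not a Dublin Core element: {element}")


print(group_of("Coverage"))                      # subject
print(sum(len(v) for v in DC_GROUPS.values()))   # 15
```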
Metadata is a data model: A scheme for e-Forms
Group | Element | Data Type | Req. / Repeat | Length | Source | Purpose
Asset | Identifier | Integer | 1 | Fixed | System supplied | Basic accountability
Asset | Registrar | String | 1 | Variable | LDAP validated | Accountability & maintenance
Asset | Form Name | String | ? | Variable | User | Text search, results display
Asset | Form Number | String | 1 | Variable | User | Text search, results display
Asset | Revision Date | Date | 1 | Fixed | User | Filter or rank search results
Asset | Agency | List | ? | Fixed | Organization vocabulary | Key index to retrieve & aggregate assets
Subject | Form Type | List | 1 | Variable | Form Type vocabulary | Browse or group search results
Subject | Industry Code | List | ? | Fixed | NAICS codes | Browse or group search results
Subject | Jurisdiction | List | * | Fixed | 2-letter USPS codes | Browse or group search results
Subject | Purpose | List | * | Variable | FEA Business Ref Model vocabulary | Browse or group search results
... | ... | ... | ... | ... | ... | ...
Legend: ? = 1 or more; * = 0 or more
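One way to read "metadata is a data model" is that the scheme itself can be written down as data and used to check records. The sketch below is a minimal illustration under stated assumptions: the `Element` class, `SCHEME`, `MIN_VALUES`, and `validate` are hypothetical names, only a few rows of the scheme are included, the sample record is invented, and treating "1" as "exactly one value" is an assumption because the legend defines only "?" and "*".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    data_type: str     # Integer, String, Date, or List
    length: str        # Fixed or Variable
    cardinality: str   # "1", "?" or "*", as used in the scheme
    source: str


# A few rows from the e-Forms scheme on the slide.
SCHEME = [
    Element("Identifier", "Integer", "Fixed", "1", "System supplied"),
    Element("Form Name", "String", "Variable", "?", "User"),
    Element("Agency", "List", "Fixed", "?", "Organization vocabulary"),
    Element("Jurisdiction", "List", "Fixed", "*", "2-letter USPS codes"),
]

# Minimum number of values per marker. "?" = 1 or more and "*" = 0 or more
# come from the slide's legend; "1" = exactly one is an assumption.
MIN_VALUES = {"1": 1, "?": 1, "*": 0}


def validate(record: dict) -> list:
    """Return a list of problems found in a record of element -> values."""
    problems = []
    for el in SCHEME:
        values = record.get(el.name, [])
        if len(values) < MIN_VALUES[el.cardinality]:
            problems.append(f"{el.name}: at least one value required")
        if el.cardinality == "1" and len(values) > 1:
            problems.append(f"{el.name}: not repeatable")
    return problems


# Hypothetical record, for illustration only.
record = {"Identifier": [1001],
          "Form Name": ["Application for benefits"],
          "Agency": ["1200 Agriculture"]}
print(validate(record))  # []
```

Keeping the scheme as data rather than hard-coded checks means a new element is one more row, not new validation logic.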
What is a taxonomy?
A Systematics view. Information system design view.
The specification of the names of people, places, things … and everything else that is needed to allow search engines and other content applications to work better.
Linnaeus …
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Family: Canidae
Genus: Canis
Species: C. familiaris
UNSPSC …
Segment: 44-Office Equipment and Accessories and Supplies
Family: .12-Office Supplies
Class: .17-Writing Instruments
Commodity: .05-Mechanical pencils; .06-Wooden pencils; .07-Colored pencils
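The UNSPSC example shows how a hierarchical code carries its own path. As a small sketch, assuming the standard 8-digit UNSPSC layout of two digits per level, a code can be split into its levels and rolled up to its broader categories. The function names `split_unspsc` and `ancestors` are illustrative.

```python
def split_unspsc(code: str) -> dict:
    """Split an 8-digit UNSPSC code into its four two-digit levels."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError("expected an 8-digit code")
    levels = ["Segment", "Family", "Class", "Commodity"]
    return {level: code[i * 2:i * 2 + 2] for i, level in enumerate(levels)}


def ancestors(code: str) -> list:
    """Return the code's broader categories, zero-padded to 8 digits."""
    return [code[:n].ljust(8, "0") for n in (2, 4, 6)]


# 44 + 12 + 17 + 05: the Mechanical pencils path on the slide.
print(split_unspsc("44121705"))
print(ancestors("44121705"))  # ['44000000', '44120000', '44121700']
```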
What is the purpose of a Taxonomy? To …
Discover, Classify, Create
[Diagram: example terms HORSE-DRAWN CARRIAGE, CAR, PLANE, SPACE SHUTTLE, WINGS, ENGINE, WHEELS]
• Content
• Assets
• Taxonomy
• Logical & Intuitive Filters
• Site Maps
• Search Engines
• Portals
• Content Integration Networks
… find the right information at the right time to solve the problem at hand
Taxonomy: e-Forms example
Controlled Vocabularies
Facets: Agency, Form Type, Industry Impact, Jurisdiction, BRM Impact, Keyword, Topic, Audience
Categories:
• Agency: 0001 Legislative; 1000 Judicial; 1100 Executive Office of Pres; 0003 Exec Depts; 1200 Agriculture; 1300 Commerce; 9700 Defense; 9100 Education; 8900 Energy; 7500 HHS; 7000 DHS; 8600 HUD; 1400 Interior; 1500 Justice; 1600 Labor; 1900 State; 6900 Transport; 2000 Treasury; 3600 Veterans; Ind Agencies; Intl Orgs
• Form Type: Application; Approval; Claim; Information request; Information submission; Instructions; Legal filing; Payment; Procurement; Renewal; Reservation; Service request; Test; Other input; Other transaction
• Industry Impact: 00 Generic; 11 Agriculture; 21 Mining; 22 Utilities; 23 Construct; 31-33 Manuf; 42 Wholesale; 44-45 Retail; 48-49 Trans; 51 Info; 52 Finance; 54 Profession; 55 Mgmt; 56 Support; 61 Education; 62 Health Care; 71 Arts; 72 Hospitality; 81 Other Services; 92 Public Admin
• Jurisdiction: Federal; State; Local; Other
• BRM Impact: Citizen Srvcs; Social Srvs; Defense; Disasters; Econ Dev; Education; Energy; Env Mgmt; Law Enf; Judicial; Correctional; Health; Security; Income Sec; Intelligence; Intl Affairs; Nat Resour; Transport; Workforce; Science; Delivery; Support; Management
• Topic: Agriculture & food; Commerce; Communications; Education; Energy; Env pro; Foreign rels; Govt; Health & safety; Housing & comm dev; Labor; Law; Named grps; National def; Nat resources; Recreation; Sci & tech; Social pgms; Transport
• Audience: All; General; Citizen; Business; Govt; Employee; Native American; Nonresident; Tourist; Special group
The power of taxonomy facets
• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
• Easier to maintain
• Can be easier to navigate
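The arithmetic behind this claim can be checked directly. The sketch below uses four hypothetical facets with ten placeholder nodes each (the facet names are borrowed from the e-Forms example; the node labels are invented) and compares the number of nodes to maintain with the number of distinct combinations they can express.

```python
from itertools import product
from math import prod

# Four hypothetical facets with 10 placeholder nodes each.
facets = {name: [f"{name}-{i}" for i in range(10)]
          for name in ("Agency", "Form Type", "Jurisdiction", "Audience")}

nodes_to_maintain = sum(len(nodes) for nodes in facets.values())
combinations = prod(len(nodes) for nodes in facets.values())

print(nodes_to_maintain)  # 40
print(combinations)       # 10000

# Enumerating every combination confirms the count.
assert sum(1 for _ in product(*facets.values())) == combinations
```

Forty nodes to maintain yield the same 10,000 distinctions that a single pre-combined hierarchy would need 10,000 leaf nodes to express.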
7 Common taxonomy facets
Personalized content delivery requires defining taxonomy facets
Facet | Definition | Example Source
Organization | Organizational structure. | FIPS 95-2, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | AGLS Document Type, AAT Information Forms, Records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, US Postal Service, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Products and Services | Names of products and services. | ERP system, Your products and services, etc.
… and re-use of existing vocabulary sources
Mapping facets to Dublin Core
Dublin Core Element | Definition | Vocabulary Source
Title | Resource name. | Not applicable
Creator | Content maker. | LDAP
Subject | Content topic. | Topic facet
Description | Description of content, summary. | Not applicable
Publisher | Publisher of this manifestation. | Agency facet
Contributor | Content contributor. | LDAP
Date | Content lifecycle event for this manifestation. | Not applicable
Type | Genre. | Form Type facet
Format | Format of this manifestation. | RFC 2045
Identifier | Reference for this manifestation, e.g., URL. | Not applicable
Source | Source from which this manifestation has been derived. | Not applicable
Language | Language of this manifestation. | ISO 639
Relation | Reference to related resource. | None
Coverage | Space, period, date, jurisdiction, etc. | Jurisdiction facet
Rights | Who has rights to use this manifestation. | Access security/privacy classification
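The facet rows of this mapping are simple enough to express as a lookup table. The following is a minimal sketch, not a standard crosswalk: `FACET_TO_DC` and `to_dublin_core` are hypothetical names, and the sample form record is invented using category labels from the e-Forms example.

```python
# Facet -> Dublin Core element, taken from the mapping on this slide.
FACET_TO_DC = {
    "Topic": "Subject",
    "Agency": "Publisher",
    "Form Type": "Type",
    "Jurisdiction": "Coverage",
}


def to_dublin_core(facet_values: dict) -> dict:
    """Re-key facet values by Dublin Core element; unmapped facets are skipped."""
    dc = {}
    for facet, values in facet_values.items():
        element = FACET_TO_DC.get(facet)
        if element is not None:
            dc.setdefault(element, []).extend(values)
    return dc


# Hypothetical form record, for illustration only.
form = {"Agency": ["1200 Agriculture"], "Form Type": ["Application"],
        "Jurisdiction": ["Federal"], "Audience": ["Business"]}
print(to_dublin_core(form))
# {'Publisher': ['1200 Agriculture'], 'Type': ['Application'], 'Coverage': ['Federal']}
```

Audience is dropped here because the slide's mapping gives it no Dublin Core element; a real crosswalk would need to decide where such facets go.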
Facets at work on FirstGov site
[Screenshot of the FirstGov site; callouts: Frequency, Organization, Audience, Content Type]
Indexing rules… Simplified for creator indexing
Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.
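The specificity rule works because a specific tag can be rolled up to its broader terms at search time, while a generic tag carries no information about which narrower term was meant. The sketch below illustrates this with the UNSPSC pencil example from earlier in the deck; `BROADER`, `generalize`, and `matches` are hypothetical names.

```python
# Broader-term links, taken from the UNSPSC example earlier in the deck.
BROADER = {
    "Mechanical pencils": "Writing Instruments",
    "Wooden pencils": "Writing Instruments",
    "Writing Instruments": "Office Supplies",
    "Office Supplies": "Office Equipment and Accessories and Supplies",
}


def generalize(term: str) -> list:
    """Return the term followed by all of its broader terms."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def matches(tag: str, query: str) -> bool:
    """True if an asset tagged `tag` should be found by a search for `query`."""
    return query in generalize(tag)


# A specific tag is found by a broader query...
print(matches("Mechanical pencils", "Office Supplies"))  # True
# ...but a generic tag cannot answer a specific query.
print(matches("Office Supplies", "Mechanical pencils"))  # False
```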
FAA Metadata Scheme 1.0
Attribute | Values
Content Types | Types of content.
Organizations | FAA organizations, partners, and industry associations.
Locations | Locations by geography and function.
Functions | Regulatory functions.
Topics | Web site categories.
Audiences | Audiences for a content item.
FAA-Office of Accident Investigation New NTSB Safety Recommendations: discovery indexing example
Attribute | Values
Content Types | Rules
Organizations | Office of Accident Investigation (AAI)
Locations | United States
Functions | Rulemaking
Topics | Regulations and Policies
Audiences | Air carriers
Blueprint for NAS Modernization 2002 Update: discovery indexing example
Attribute | Values
Content Types | Reports
Organizations | Office of System Architecture and Investment Analysis (ASD)
Locations | Airspace areas
Functions | Airspace
Topics | Airports and Air Traffic
Audiences | Government
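Once items are indexed this way, filtering and facet counts fall out of the data almost for free. The sketch below encodes the two FAA indexing examples as records and shows one possible way to filter by facet value and count values per facet. The helper names `filter_items` and `facet_counts` are illustrative and do not describe how the FAA site itself is implemented.

```python
from collections import Counter

# The two indexed FAA items from the discovery indexing examples.
ITEMS = [
    {"title": "New NTSB Safety Recommendations",
     "Content Types": ["Rules"],
     "Organizations": ["Office of Accident Investigation (AAI)"],
     "Locations": ["United States"],
     "Functions": ["Rulemaking"],
     "Topics": ["Regulations and Policies"],
     "Audiences": ["Air carriers"]},
    {"title": "Blueprint for NAS Modernization 2002 Update",
     "Content Types": ["Reports"],
     "Organizations": ["Office of System Architecture and Investment Analysis (ASD)"],
     "Locations": ["Airspace areas"],
     "Functions": ["Airspace"],
     "Topics": ["Airports and Air Traffic"],
     "Audiences": ["Government"]},
]


def filter_items(items, selected):
    """Keep items whose facet values include every selected facet value."""
    return [it for it in items
            if all(value in it.get(facet, []) for facet, value in selected.items())]


def facet_counts(items, facet):
    """Count how many items carry each value of a facet."""
    return Counter(value for it in items for value in it.get(facet, []))


hits = filter_items(ITEMS, {"Audiences": "Government"})
print([it["title"] for it in hits])
print(facet_counts(ITEMS, "Content Types"))
```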
Problems of “information space” design
Within a website
• Show extent of content available
• Filter large content sets into reviewable groups
• Find more content like this
• Show latest content
• Feature selected content
Outside a website
• Contribute content to portals
• Map to other taxonomies
• Integrate with web search engines
How to show the coverage of a large collection: NASA Taxonomy demo site
FirstGov search versus NASA Taxonomy search + facet navigation. Type in: Rover
How to show the coverage of a large collection: FAO collection
Agricola search versus FAO search + facet navigation. Type in: Thinning
Website search best practices
[Diagram: mockups of a search results page and a content page with numbered callouts 1-10, a Search button, and a Subscribe button]
Search Results Page
• Best Bets
• Meaningful results order
• Last updated
• Meaningful titles & descriptions
Content Page
• Search Box
• Facet browse w/ counts
• Site nav w/ extras
• Inline links
• Links to related content
• Subscriptions
Additional sources on faceted metadata
• Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., and Yee, P. “Finding the Flow in Web Site Search.” Communications of the ACM, 45 (9), September 2002, pp. 42-49. http://www.sims.berkeley.edu/~hearst/papers/cacm02.pdf
• Yee, K., Swearingen, K., Li, K., and Hearst, M. “Searching and organizing: Faceted metadata for image search and browsing.” Proceedings of the Conference on Human Factors in Computing Systems (April 2003). http://bailando.sims.berkeley.edu/papers/flamenco-chi03.pdf
• Taxonomy Strategies Website Bibliography > Click on taxonomy. http://www.taxonomystrategies.com/html/bibliography.htm#taxonomy
Taxonomy Strategies
Joseph A. Busch
+ 415-377-7912
[email protected]