Agenda - Taxonomy Strategies


Taxonomy Strategies LLC
Tagging, Interfaces & Content
Organization Infrastructures
Joseph A Busch, Principal
November 1, 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Who I am
• Over 25 years in the business of organized information
  – Founder & Principal, Taxonomy Strategies
  – Director, Solutions Architecture, Interwoven
  – VP, Infoware, Metacode Technologies
  – Program Manager, Getty Foundation
  – Manager, Pricewaterhouse
  – Assistant Director for Technical Services, Hampshire College
  – Chief, Technical Services, Paul Weiss Rifkind Wharton & Garrison
• Metadata & taxonomies community leadership
  – President, American Society for Information Science & Technology
  – Trustee, Dublin Core Metadata Initiative
  – Co-Founder, Networked Knowledge Organization Systems/Services
  – Adviser, National Research Council Computer Science and Telecommunications Board
  – Reviewer, National Science Foundation Division of Information and Intelligent Systems
TAXONOMY STRATEGIES LLC The business of organized information
Recent & current projects
Government
Commercial
Not-for-Profit
What I do
Organize Stuff
For us, taxonomy work includes:
• Metadata specification defines the properties needed to describe content so that it can be found & used.
• Vocabularies are collections of terms that are used to specify some of the metadata properties.
• Some vocabularies are big and hierarchical, some are small and flat.
• An application profile specifies what metadata & vocabularies are required, and then represents them formally.
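As a rough illustration of how an application profile might tie metadata properties to vocabularies, here is a minimal Python sketch. The property names, the vocabulary terms, and the `flatten` and `validate` helpers are all hypothetical, chosen for the example rather than taken from any particular profile or standard.

```python
# Hypothetical controlled vocabularies: one small and flat, one hierarchical.
CONTENT_TYPES = {"Report", "Invoice", "Presentation"}
TOPICS = {
    "Health": {"Food Safety", "Toxicity"},
    "Industry": {"Dry Cleaning", "Metal Finishing"},
}


def flatten(hierarchy):
    """Return every term in a two-level hierarchy, parents and children."""
    terms = set(hierarchy)
    for children in hierarchy.values():
        terms |= children
    return terms


# The (assumed) profile: which properties are required, and which
# vocabulary, if any, constrains the values of each property.
PROFILE = {
    "title": {"required": True, "vocabulary": None},  # free text
    "type": {"required": True, "vocabulary": CONTENT_TYPES},
    "subject": {"required": False, "vocabulary": flatten(TOPICS)},
}


def validate(record):
    """Return a list of problems found when checking a record against PROFILE."""
    problems = []
    for name, rule in PROFILE.items():
        values = record.get(name, [])
        if rule["required"] and not values:
            problems.append(f"missing required property: {name}")
        if rule["vocabulary"] is not None:
            for value in values:
                if value not in rule["vocabulary"]:
                    problems.append(f"{name}: '{value}' is not a controlled term")
    return problems


print(validate({"title": ["Q3 summary"], "type": ["Report"], "subject": ["Toxicity"]}))
print(validate({"type": ["Memo"]}))
```

The first record passes; the second is reported as missing a title and using an uncontrolled type. A real profile would normally be expressed in a formal schema language rather than a Python dictionary.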
Seven phases of taxonomy and metadata design
1. Identify Objectives: Conduct interviews
2. Inventory Content: ID sources, spider assets & extract metadata
3. Specify Metadata: Define fields & purpose
4. Model Content: Define content chunks & XML DTDs
5. Specify Vocabularies: Compile controlled vocabularies
6. Specify Procedures: Develop workflow, rules & procedures
7. Train Staff: Develop materials & train staff
Use metadata to support core purposes
• Metadata can be used to provide enough information for any user, tool, or program to find out everything needed to find and apply any piece of content.
[Diagram: four metadata groups plotted by Complexity against Enabled Functionality]
• Subject metadata – What, Where & Why: Subject, Type, Coverage
• Use metadata – When & How: Date, Language, Rights
• Asset metadata – Who: Identifier, Creator, Title, Description, Publisher, Format, Contributor
• Relational metadata – Links between and to: Source, Relation
http://dublincore.org/documents/dces/
Use metadata to support core purposes
• Subject metadata and Use metadata: Better navigation & discovery
• Asset metadata and Relational metadata: More efficient editorial process
Agenda
• Tagging
• Interface
• Content Organization
Tagging Overview
• Tagging is better than relying on the words that happen to occur in a piece of content.
• All tagging is useful
  – End user tagging
  – Tagging by librarians
  – Automated tagging by OS and algorithms
• Content should be tagged throughout its lifecycle, each time the content is handled and used so that it accrues value or its significance is diminished.
MS Office: File → Properties
Flickr: Organize
Four Tagging Rules
Rule | Description
Use specific terms | Apply the most specific terms when tagging content. Specific terms can always be generalized, but generic terms cannot be specialized.
Use multiple terms | Use as many terms as necessary to describe What the content is about & Why it is important.
Use appropriate terms | Only fill in the facets & values that make sense. Not all facets apply to all content.
Consider how content will be used | Anticipate how the content will be searched for in the future, & how to make it easy to find it. Remember that search engines can only operate on explicit information.
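The first rule can be made concrete with a small sketch. It assumes a hypothetical table of broader-term links (`BROADER`) built from product categories; the `generalize` and `matches` helpers are illustrative names, not part of any tagging tool. A specific tag can be rolled up to its broader terms automatically, but nothing in a generic tag says which narrower term was meant.

```python
# Hypothetical broader-term (BT) links: each term points to its parent.
BROADER = {
    "Color Laser Printers": "Printers",
    "Printers": "Peripherals",
    "LCD Monitors": "Monitors",
    "Monitors": "Peripherals",
}


def generalize(term):
    """Return the term followed by all of its broader terms, most specific first."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def matches(tag, query):
    """True if content tagged with `tag` should be found by a search for `query`."""
    return query in generalize(tag)


print(generalize("Color Laser Printers"))
print(matches("Color Laser Printers", "Printers"))   # specific tag, broader query
print(matches("Printers", "Color Laser Printers"))   # generic tag, narrower query
```

Content tagged "Color Laser Printers" is found by a search for "Printers", while content tagged only "Printers" cannot be found by a search for "Color Laser Printers".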
Agenda
• Tagging
• Interface
• Content Organization
Requirements for a tagging interface
• Automated form fill-in (automatically fills in known data)
• Tagging precedents (see tags already assigned by others)
• Controlled vocabularies, e.g., with pull-down list
• Multi-valued tags
• Geo-tagging
• Group tagging
• Clean-up tag tools, e.g., alpha list
• Batch editing
• Share/Don’t share (Public/Private)
• Identified owner (who can be emailed)
• Almost immediate feedback, e.g., tag cloud
Form fill-in: Automatically filled-in known data
Form fill-in: Automatically filled-in known data
• Manual form fill-in w/ check boxes, pull-down lists, etc.
• Auto keyword & summarization
Form fill-in: Automatically filled-in known data
• Auto-categorization
• Rules & pattern matching
• Parse & lookup (recognize names)
Tagging precedents: See tags assigned by others
Multi-valued group tagging
Group geo-tagging
Clean up tag tools: Alpha list
Batch edit
Share or don’t share tagging
Bulk Tagging
• Identify a collection of related content items by pattern or context
• Then apply the same attributes to all content items
Tag a folder
• Drag & drop content items into folder
• Then, content items inherit properties of folder
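Bulk tagging and folder inheritance can both be sketched in a few lines of Python. The function names, the dictionary representation of items, and the rule that an item's own values take precedence over the folder's are assumptions made for this example; a real content management system may resolve conflicts differently.

```python
def tag_folder(folder_properties, items):
    """Return copies of items with the folder's properties filled in."""
    tagged = []
    for item in items:
        merged = dict(folder_properties)  # start from the folder's properties
        merged.update(item)               # assumed: item-level values win
        tagged.append(merged)
    return tagged


def bulk_tag(items, predicate, attributes):
    """Apply the same attributes to every item that matches the predicate."""
    return [
        {**item, **attributes} if predicate(item) else dict(item)
        for item in items
    ]


folder = {"function": "Finance", "doc_type": "Invoices"}
items = [
    {"name": "inv-001.pdf"},
    {"name": "inv-002.pdf", "doc_type": "Credit Advices"},
]

for item in tag_folder(folder, items):
    print(item)

# Select related items by a name pattern, then tag them all at once.
for item in bulk_tag(items, lambda i: i["name"].startswith("inv-"), {"function": "Accounting"}):
    print(item)
```

Both helpers return new dictionaries rather than modifying the originals, which keeps the untagged items available for review before the tags are committed.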
Workflow
• Approve & improve mindset
[Workflow diagram: Create Content → Add Metadata → Review & Improve → Publish → Review & Improve]
Interactive rewards
• Almost instantaneous exposure of tags in simple user interfaces on the web provides positive reinforcement for user tagging that simply did not exist before.
• For example,
  – Most popular
  – Tag clouds
  – Alerts
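Two of these feedback mechanisms, "most popular" and tag clouds, are simple to compute from the tags themselves. The sketch below is one possible approach, not a description of how any specific website does it; the function names, the sample tags, and the linear font-size scaling between 10 and 30 points are assumptions.

```python
from collections import Counter


def tag_counts(tagged_items):
    """Count how many items carry each tag (each tag counted once per item)."""
    counts = Counter()
    for tags in tagged_items:
        counts.update(set(tags))
    return counts


def cloud_sizes(counts, min_size=10, max_size=30):
    """Map each tag to a font size scaled linearly between min_size and max_size."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {
        tag: min_size + (n - low) * (max_size - min_size) // span
        for tag, n in counts.items()
    }


items = [
    ["taxonomy", "metadata"],
    ["taxonomy", "tagging"],
    ["taxonomy", "metadata", "flickr"],
]
counts = tag_counts(items)
print(counts.most_common(2))  # the "most popular" list
print(cloud_sizes(counts))    # font sizes for a tag cloud
```

Because both views can be recomputed as soon as a tag is saved, the person tagging sees the effect of their work almost immediately.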
Most popular
• Another example is the "most emailed" list from, e.g., the NY Times.
Tag cloud
Alerts
• New (content selected by date)
• Subscriptions (content selected by tags)
• Interest (content selected by other people)
• Individual (content selected for you by other people)
Agenda
• Tagging
• Interface
• Content Organization
Content organization models: The Information Architect
• Saul Wurman’s 5 ways to categorize things
  – By location (spatially)
  – By alphabet (alphabetically)
  – By time (chronologically)
  – By category (subject)
  – By hierarchy (BT/NT, etc)
Richard Saul Wurman. Information Architects (1996)
Content organization models: The Records Manager
• Archives & business records
  – By function (business purpose)
  – By genre (document type)
Facets: Brands & Varieties, Events, Ingredients, Locations, Nutrients, Organizations
Functions: Accounting, Administration, Environment, Finance, Human Resources, Legal, Marketing & Sales, Plant Operations, Projects, Public Relations, Research & Development, Tax, Treasury
Doc Types: Account Listings, Acquisitions, Cash Disbursements, Cash Receipts, Contract Accounting Records, Credit Advices, Credit Card Charges, Donations, Employee Expense Reports, Invoices, Petty Cash Records, Permits & Licenses, Plans & Forecasts, Royalty Payments, Sales Receipts
Content organization models: The Product Manager
• Management (for general business operational purposes)
  – By products and services
Top level: Systems, Peripherals, Services, Support, My Account
Peripherals: Handhelds, Monitors, Printers, Projectors, TVs
Monitors: CRT Monitors, LCD Monitors
Printers: All-in-One & Photo Printers, B/W & Multifunction Laser Printers, Color Laser Printers, Ink & Printer Accessories
TVs: LCD TVs, Plasma TVs
Parts: All Electronics & Accessories, Desktop Accessories, Notebook Accessories, Digital Photography, Handhelds, Memory, Monitors, MP3 Players, Networking, Power, Printers & Ink, Projectors, Software & Games, Storage & Drives, TVs & Home Theater
Content organization models: Marketer
• Marketing & sales
  – By psycho social profiles such as lifestyle stages, personas, etc.
  – By industry
  – By location
Audience: Age Group, Aisles, Business, Consumer, Financial Risk, Service, Standard
Intention: Inquiry, Research, Support, Upgrade
Lifecycle: Pre-Sales, Early Life, Purchase Experience & Sales Process, Set Up / Installation, Billing Experience, Support, Retain & Renew
Industry: Construction & Building, Field Services, Finance & Insurance, Financial Services, Government, Healthcare, Higher Education, Hospitality Services, Insurance, K-12 Education, Manufacturing, Professional Services, Real Estate, Retail, Transportation & Distribution
Location: Regions, ZIP Code
Content organization models: Editor
• Editorial
  – By content lifecycle
Social Aspects of Digital Libraries: Final Workshop Report (Nov 1996)
http://is.gseis.ucla.edu/research/dl/UCLA_DL_Report.doc
Faceted taxonomy theory & practice
• How many terms are needed to provide sufficient granularity?
  – Not as many as you think
• Post-coordinate indexing allows several simple controlled vocabularies to be combined, rather than using a single large pre-coordinated vocabulary.
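Post-coordinate indexing can be illustrated with a short sketch: each item is tagged independently in a few small facets, and the combination happens at search time. The sample items, the facet names used as dictionary keys, and the `search` function are hypothetical and exist only for this example.

```python
# Hypothetical content items, each tagged with terms from separate small facets.
ITEMS = [
    {"id": 1, "audience": {"Kids"}, "health": {"Sun Protection"}, "substance": {"Ozone"}},
    {"id": 2, "audience": {"Small Business"}, "industry": {"Dry Cleaning"}, "substance": {"Carcinogen"}},
    {"id": 3, "audience": {"Researchers & Scientists"}, "health": {"Toxicity"}, "substance": {"Carcinogen"}},
]


def search(items, **criteria):
    """Return ids of items whose facets contain every requested term."""
    return [
        item["id"]
        for item in items
        if all(term in item.get(facet, set()) for facet, term in criteria.items())
    ]


print(search(ITEMS, substance="Carcinogen"))
print(search(ITEMS, substance="Carcinogen", health="Toxicity"))
```

No combined term such as "Carcinogen toxicity" ever has to exist in a vocabulary; the combination is formed when the query is run.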
The power of faceted taxonomy
• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
  – Easier to maintain
  – Easier to tag by content authors
  – Can be easier to navigate
Audience: Advocacy, Contractors & Grantees, Environmental Professionals, Federal Facilities, General Public, Industry, Kids, Researchers & Scientists, Small Business, Students
Health: Advisory, Exposure, Food Safety, Health Assessment, Health Effect, Health Risk, Occupational Health, Pesticide Effects, Sun Protection, Toxicity
Industry: Agriculture & Cattle, Automobile Repair, Chemical, Dry Cleaning, Electronics & Computer, Energy, Extractive Industries, Food Processing, Leather Tanning & Finishing, Metal Finishing
Substance: Allergen, Biological Contaminant, Carcinogen, Chemical, Explosive, Liquid Waste, Microorganism, Ozone, Pesticide, Radioactive Waste
Impact on collection size by increasing number of terms per facet
[Chart: Collection Size by Terms per Facet]
# Docs/Category | 20 | 20 | 20 | 20 | 20
# Facets | 4 | 4 | 4 | 4 | 4
# Terms/Facet | 10 | 20 | 30 | 40 | 50
Max Collection Size | 200,000 | 3,200,000 | 16,200,000 | 51,200,000 | 125,000,000
# Post-coord combos | 10,000 | 160,000 | 810,000 | 2,560,000 | 6,250,000
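The figures for terms per facet appear to follow a simple rule: the number of post-coordinate combinations is the terms per facet raised to the number of facets, and the maximum collection size is that number multiplied by the documents per category. The sketch below reproduces the arithmetic under that assumption; the function names are illustrative.

```python
def post_coordinate_combos(facets, terms_per_facet):
    """Number of distinct term combinations when one term is chosen per facet."""
    return terms_per_facet ** facets


def max_collection_size(docs_per_category, facets, terms_per_facet):
    """Largest collection that keeps each combination at docs_per_category items."""
    return docs_per_category * post_coordinate_combos(facets, terms_per_facet)


# Vary the terms per facet while holding 4 facets and 20 documents per category.
for terms in (10, 20, 30, 40, 50):
    combos = post_coordinate_combos(4, terms)
    size = max_collection_size(20, 4, terms)
    print(f"{terms:>2} terms/facet: {combos:>9,} combos, {size:>11,} documents")
```

For example, 4 facets of 10 terms give 10^4 = 10,000 combinations, and at 20 documents per combination that covers 200,000 documents.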
Impact on collection size by increasing number of facets
[Chart: Collection Size by Number of facets]
# Docs/Category | 20 | 20 | 20 | 20 | 20
# Terms/Facet | 10 | 10 | 10 | 10 | 10
# Facets | 4 | 5 | 6 | 7 | 8
Max Collection Size | 200,000 | 2,000,000 | 20,000,000 | 200,000,000 | 2,000,000,000
# Post-coord combos | 10,000 | 100,000 | 1,000,000 | 10,000,000 | 100,000,000
Sources for 7 common taxonomies
Taxonomy | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, Records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Products and Services | Names of products/programs & services. | ERP system, Your products and services, etc.
Faceted tagging
• How well can end users (content authors) do this?
• Incentives help, such as almost instantaneous feedback (AIF)
• Importance of workflow (new slide?)
  – Tagging & re-tagging throughout content life cycle
  – Show graphic of content lifecycle (from UCLA NSF workshop?)
• Approve & improve mindset
• Test & improve
Summary
• There are lessons to be learned from web tagging about how to get good metadata in document and content management applications.
• Document and content management system tagging must be simple, and it must make it easier, almost instantaneously, to find relevant work products.
Taxonomy Strategies LLC
Questions?
Joseph A. Busch
415-377-7912,
[email protected]
November 1, 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Tagging Overview
• Tagging, any kind of tagging, is better than relying on the words that happen to occur in a piece of content. End user tagging is useful, so is tagging by librarians, as are tags automatically assigned by operating systems and language processing algorithms. Content should be tagged throughout its lifecycle, each time the content is handled and used so that it accrues value or its significance is diminished.
• Almost instantaneous exposure of tags in simple user interfaces on the web provides positive reinforcement for user tagging that simply did not exist before. It should not be surprising that a good user interface improves usability.
• As content users flock to websites that help to organize the content on the web, advertisements and value added content services follow. The bottleneck in the semantic web has been a lack of tagged content. The end user tagging revolution may begin to address this shortcoming.
• There are lessons to be learned from web tagging about how to get good metadata in document and content management applications. Document and content management system tagging must be simple, and it must make it easier, almost instantaneously, to find relevant work products.