Tagging: It’s the Interface Stupid!

Taxonomy Strategies LLC
Tagging: It’s the Interface Stupid!
Joseph A. Busch
November 4, 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Who I am
 Over 25 years in the business of organized information
 Founder & Principal, Taxonomy Strategies
 Director, Solutions Architecture, Interwoven
 VP, Infoware, Metacode Technologies
 Program Manager, Getty Foundation
 Manager, Pricewaterhouse
 Assistant Director for Technical Services, Hampshire College
 Chief, Technical Services, Paul Weiss Rifkind Wharton & Garrison
 Metadata & taxonomies community leadership.
 President, American Society for Information Science & Technology
 Trustee, Dublin Core Metadata Initiative
 Co-Founder, Networked Knowledge Organization Systems/Services
 Adviser, National Research Council Computer Science and Telecommunications Board
 Reviewer, National Science Foundation Division of Information and Intelligent Systems
Taxonomy Strategies LLC The business of organized information
Recent & current projects
Government
 Chelan County Public Utilities District
 Commodity Futures Trading Commission
 Federal Aviation Administration
 Federal Reserve Bank of Atlanta
 Head Start
 Infocomm Development Authority of Singapore
 NASA (nasataxonomy.jpl.nasa.gov)
 U.S. Defense Intelligence Agency
 U.S.D.A. Economic Research Service
 U.S.D.A. e-Government Program (www.usda.gov)
 U.S. Dept of Education ERIC
 U.S. D.H.S. Citizenship and Immigration Services
 U.S. Environmental Protection Agency
 U.S. Forest Service
 U.S. GSA Office of Citizen Services (www.firstgov.gov)
 U.S. Small Business Administration
 U.S. Social Security Administration
Commercial
 Agency.com
 Amway
 Albertsons
 Allstate Insurance
 Baker Hughes
 BHP Billiton
 Blue Shield of California
 Campbell Soup Company
 Capital One
 Debevoise & Plimpton
 Dell
 Halliburton
 Hewlett Packard
 Microsoft
 Motorola
 PeopleSoft
 Pricewaterhouse Coopers
 Siderean Software
 Sprint
 Time Inc.
NGOs
 European Committee for Standardization
 IDEAlliance
 International Monetary Fund
 National Association of Realtors
 OCLC
What I do
Organize Stuff
For us, taxonomy work includes:
 A metadata specification defines the properties needed to describe content so that it can be found & used.
 Vocabularies are collections of terms that are used to specify some of the metadata properties.
 Some vocabularies are big and hierarchical, some are small and flat.
 An application profile specifies what metadata & vocabularies are required, and then represents them formally.
Agenda
 Tagging
 Tagging Interface
 Content Organization
Tagging Overview
 Tagging is better than relying on the words that happen to occur in a piece of content.
 All tagging is useful
 End user tagging
 Tagging by librarians
 Automated tagging by OS and algorithms
 Content should be tagged throughout its lifecycle, each time it is handled and used, so that the tags reflect whether it has accrued value or its significance has diminished.
MS Office: File > Properties
Organize
What is social tagging?
 End user tagging
 Easy, intuitive tagging interfaces
 Almost instantaneous feedback
 Enables people to tag & re-tag content
 … in response to seeing their tags in context with other tags.
 Emergent categories
 Resembles open card sort process in which patterns emerge
 … rather than validating categories using closed card sorts.
Social tagging innovators
 flickr founders
 Caterina Fake
 Stewart Butterfield
 del.icio.us founder
 Joshua Schachter
 del.icio.us & flickr are now both part of Yahoo!
 As of April 2006 flickr has 130 million photos posted by 3
million registered users.
Four tagging rules for end users
Rule | Description
Use specific terms | Apply the most specific terms when tagging content. Specific terms can always be generalized, but generic terms cannot be specialized.
Use multiple terms | Use as many terms as necessary to describe What the content is about & Why it is important.
Use appropriate terms | Only fill in the facets & values that make sense. Not all facets apply to all content.
Consider how content will be used | Anticipate how the content will be searched for in the future, & how to make it easy to find. Remember that search engines can only operate on explicit information.
Agenda
 Tagging
 Tagging Interface
 Content Organization
Requirements for a tagging interface
 Automated form fill-in (automatically fills in known data)
 Tagging precedents (see tags already assigned by others)
 Controlled vocabularies, e.g., with pull-down list
 Multi-valued tags
 Geo-tagging
 Group tagging
 Clean-up tag tools, e.g., alpha list
 Batch editing
 Share/Don’t share (Public/Private)
 Identified owner (who can be emailed)
 Almost immediate feedback, e.g., tag cloud
Form fill-in: Automatically filled-in known data
Form fill-in: Automatically filled-in known data
Manual form fill-in w/ check boxes, pull-down lists, etc.
Auto keyword & summarization
Form fill-in: Automatically filled-in known data
Auto-categorization
Rules & pattern matching
Parse & lookup (recognize names)
Tagging precedents:
See tags assigned by others
Multi-valued group tagging
Group geo-tagging
Clean up tag tools: Alpha list
Batch edit
Share or don’t share tagging
Bulk tagging
 ID collection of related content items by pattern or context
 Then, apply same attributes to all content items
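The two bulk tagging steps above can be sketched in a few lines. This is a minimal illustration, not any particular product's feature: the `bulk_tag` helper, the item dictionaries, and the file names are hypothetical, and matching on a file name pattern is only one assumed way to identify a collection of related items.

```python
from fnmatch import fnmatchcase


def bulk_tag(items, pattern, attributes):
    """Apply the same attributes to every item whose name matches the pattern.

    Returns the names of the items that were tagged.
    """
    tagged = []
    for item in items:
        # Step 1: identify related items by a name pattern (an assumption).
        if fnmatchcase(item["name"], pattern):
            # Step 2: apply the same attributes to each matching item.
            item.setdefault("tags", {}).update(attributes)
            tagged.append(item["name"])
    return tagged


# Hypothetical content items.
items = [
    {"name": "2006-q3-invoice-001.pdf"},
    {"name": "2006-q3-invoice-002.pdf"},
    {"name": "press-release.doc"},
]
matched = bulk_tag(
    items, "*invoice*", {"doc_type": "Invoices", "function": "Accounting"}
)
print(matched)
```

Only the two invoice files receive the tags; the press release is left untouched.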
Tag a folder
 Drag & drop content items into folder
 Then, content items inherit properties of folder
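Folder inheritance can be sketched as follows. The `Folder` class and its `drop` method are hypothetical names, and the rule that an item's own values win over inherited ones is an assumption; a real system might overwrite instead.

```python
class Folder:
    """A folder whose properties are inherited by the items dropped into it."""

    def __init__(self, name, properties):
        self.name = name
        self.properties = dict(properties)
        self.items = []

    def drop(self, item):
        """Add an item; folder properties fill in whatever the item lacks."""
        for key, value in self.properties.items():
            # Assumption: values already set on the item are kept.
            item.setdefault(key, value)
        self.items.append(item)
        return item


# Hypothetical folder and content item.
folder = Folder("Q3 invoices", {"doc_type": "Invoices", "function": "Accounting"})
doc = folder.drop({"name": "inv-001.pdf", "function": "Finance"})
print(doc)
```

Here the item inherits `doc_type` from the folder but keeps its own `function` value.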
Workflow
 Approve & improve mindset
Create Content -> Add Metadata -> Review & Improve -> Publish -> Review & Improve
Interactive rewards
 Almost instantaneous exposure of tags in simple user
interfaces on the web provides positive reinforcement for
user tagging that simply did not exist before.
 For example,
 Most popular
 Tag clouds
 Alerts
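A "most popular" list and a tag cloud both come down to counting tags. The sketch below is illustrative only: the function names, the sample tags, and the linear scaling of font size between `min_size` and `max_size` are assumptions, not a description of how any particular site computes its cloud.

```python
from collections import Counter


def tag_counts(tagged_items):
    """Count how often each tag appears across all items."""
    return Counter(tag for tags in tagged_items for tag in tags)


def most_popular(tagged_items, k=3):
    """Return the k most frequently used tags with their counts."""
    return tag_counts(tagged_items).most_common(k)


def tag_cloud(tagged_items, min_size=10, max_size=32):
    """Map each tag to a font size scaled linearly by its frequency."""
    counts = tag_counts(tagged_items)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        tag: round(min_size + (n - lo) * (max_size - min_size) / span)
        for tag, n in counts.items()
    }


# Hypothetical tags applied to three photos.
photos = [["sunset", "beach"], ["sunset", "dog"], ["sunset", "beach"]]
popular = most_popular(photos, k=2)
cloud = tag_cloud(photos)
print(popular)
print(cloud)
```

Because both views can be recomputed from the raw tags whenever a tag is added, they can give the near-immediate feedback described above.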
Most popular
 Another example is most emailed from, e.g., the NY Times.
Tag cloud
Alerts
 New (content selected by date)
 Subscriptions (content selected by tags)
 Interest (content selected by other people)
 Individual (content selected for you by other people)
Agenda
 Tagging
 Tagging Interface
 Content Organization
What are prevalent models? The Information Architect
 Richard Saul Wurman’s 5 ways to categorize things
 By location (spatially)
 By alphabet (alphabetically)
 By time (chronologically)
 By category (subject)
 By hierarchy (BT/NT, etc)
Richard Saul Wurman. Information Architects (1996)
Content organization models: The Records Manager
 Archives & business records
 By function (business purpose)
 By genre (document type)
Brands & Varieties
Events
Ingredients
Locations
Nutrients
Organizations
Functions
Accounting
Administration
Environment
Finance
Human Resources
Legal
Marketing & Sales
Plant Operations
Projects
Public Relations
Research & Development
Tax
Treasury
Doc Types
Account Listings
Acquisitions
Cash Disbursements
Cash Receipts
Contract Accounting Records
Credit Advices
Credit Card Charges
Donations
Employee Expense Reports
Invoices
Petty Cash Records
Permits & Licenses
Plans & Forecasts
Royalty Payments
Sales Receipts
Content organization models: The Product Manager
 Management (for general business operational purposes)
 By products and services
Systems
Peripherals
Services
Support
My Account
Handhelds
Monitors
Printers
Projectors
TVs
CRT Monitors
LCD Monitors
All-in-One & Photo Printers
B/W & Multifunction Laser Printers
Color Laser Printers
Ink & Printer Accessories
LCD TVs
Plasma TVs
Parts
All Electronics & Accessories
Desktop Accessories
Notebook Accessories
Digital Photography
Handhelds
Memory
Monitors
MP3 Players
Networking
Power
Printers & Ink
Projectors
Software & Games
Storage & Drives
TVs & Home Theater
Content organization models: The Marketer
 Marketing & sales
 By psycho social profiles such as lifestyle stages, personas, etc.
 By industry
 By location
Audience
Age Group
Aisles
Business
Consumer
Financial Risk
Service
Standard
Intention
Inquiry
Research
Support
Upgrade
Lifecycle
Pre-Sales
Early Life
Purchase Experience & Sales Process
Set Up / Installation
Billing Experience
Support
Retain & Renew
Industry
Construction & Building
Field Services
Finance & Insurance
Financial Services
Government
Healthcare
Higher Education
Hospitality Services
Insurance
K-12 Education
Manufacturing
Professional Services
Real Estate
Retail
Transportation & Distribution
Location
Regions
ZIP Code
Content organization models: The Editor
 Editorial
 By content lifecycle
Social Aspects of Digital Libraries: Final Workshop Report (Nov 1996)
http://is.gseis.ucla.edu/research/dl/UCLA_DL_Report.doc
Facet theory & practice
 How many terms are needed to provide sufficient
granularity?
 Not as many as you think
 Post-coordinate indexing allows several simple controlled
vocabularies to be combined, rather than using a single
large pre-coordinated vocabulary.
The power of facets
 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
 Easier to maintain
 Easier to tag by content authors
 Can be easier to navigate
Audience: Advocacy; Contractors & Grantees; Environmental Professionals; Federal Facilities; General Public; Industry; Kids; Researchers & Scientists; Small Business; Students
Health: Advisory; Exposure; Food Safety; Health Assessment; Health Effect; Health Risk; Occupational Health; Pesticide Effects; Sun Protection; Toxicity
Industry: Agriculture & Cattle; Automobile Repair; Chemical; Dry Cleaning; Electronics & Computer; Energy; Extractive Industries; Food Processing; Leather Tanning & Finishing; Metal Finishing
Substance: Allergen; Biological Contaminant; Carcinogen; Chemical; Explosive; Liquid Waste; Microorganism; Ozone; Pesticide; Radioactive Waste
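The four facets on this slide can be combined at search time rather than enumerated in advance. The sketch below uses the facet terms from the slide; the `combination_count` and `search` helpers and the three sample documents are hypothetical, and each document carries one value per facet only to keep the example short (the interface requirements earlier call for multi-valued tags).

```python
from math import prod

# Facets and terms as listed on the slide (10 terms each).
FACETS = {
    "Audience": ["Advocacy", "Contractors & Grantees",
                 "Environmental Professionals", "Federal Facilities",
                 "General Public", "Industry", "Kids",
                 "Researchers & Scientists", "Small Business", "Students"],
    "Health": ["Advisory", "Exposure", "Food Safety", "Health Assessment",
               "Health Effect", "Health Risk", "Occupational Health",
               "Pesticide Effects", "Sun Protection", "Toxicity"],
    "Industry": ["Agriculture & Cattle", "Automobile Repair", "Chemical",
                 "Dry Cleaning", "Electronics & Computer", "Energy",
                 "Extractive Industries", "Food Processing",
                 "Leather Tanning & Finishing", "Metal Finishing"],
    "Substance": ["Allergen", "Biological Contaminant", "Carcinogen",
                  "Chemical", "Explosive", "Liquid Waste", "Microorganism",
                  "Ozone", "Pesticide", "Radioactive Waste"],
}


def combination_count(facets):
    """Number of distinct one-term-per-facet combinations."""
    return prod(len(terms) for terms in facets.values())


def search(documents, **criteria):
    """Return titles of documents matching every facet=value criterion."""
    return [
        doc["title"]
        for doc in documents
        if all(doc["tags"].get(f) == v for f, v in criteria.items())
    ]


# Hypothetical documents, each tagged with one term per facet.
docs = [
    {"title": "Pesticide safety for kids",
     "tags": {"Audience": "Kids", "Health": "Pesticide Effects",
              "Industry": "Agriculture & Cattle", "Substance": "Pesticide"}},
    {"title": "Dry cleaner solvent guidance",
     "tags": {"Audience": "Small Business", "Health": "Occupational Health",
              "Industry": "Dry Cleaning", "Substance": "Chemical"}},
    {"title": "Farm worker exposure study",
     "tags": {"Audience": "Researchers & Scientists", "Health": "Exposure",
              "Industry": "Agriculture & Cattle", "Substance": "Pesticide"}},
]

print(combination_count(FACETS))
print(search(docs, Industry="Agriculture & Cattle", Substance="Pesticide"))
```

Forty terms maintained across four short lists give 10,000 possible combinations, and a query names only the facets it cares about.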
Impact on collection size by increasing number of terms per facet
[Chart: Collection Size vs. Terms per Facet]
# Docs/Category | 20 | 20 | 20 | 20 | 20
# Facets | 4 | 4 | 4 | 4 | 4
# Terms/Facet | 10 | 20 | 30 | 40 | 50
Max Collection Size | 200,000 | 3,200,000 | 16,200,000 | 51,200,000 | 125,000,000
# Post-coord combos | 10,000 | 160,000 | 810,000 | 2,560,000 | 6,250,000
Impact on collection size by increasing number of facets
[Chart: Collection Size vs. Number of facets]
# Docs/Category | 20 | 20 | 20 | 20 | 20
# Terms/Facet | 10 | 10 | 10 | 10 | 10
# Facets | 4 | 5 | 6 | 7 | 8
Max Collection Size | 200,000 | 2,000,000 | 20,000,000 | 200,000,000 | 2,000,000,000
# Post-coord combos | 10,000 | 100,000 | 1,000,000 | 10,000,000 | 100,000,000
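The figures in both collection size tables are consistent with a simple rule: the number of post-coordinate combinations is terms per facet raised to the number of facets, and the maximum collection size is that number times the documents per category. The slides do not state the formula, so the sketch below is an inference from the tabulated values, and the function names are hypothetical.

```python
def post_coord_combos(facets, terms_per_facet):
    """Distinct combinations when one term is chosen from each facet."""
    return terms_per_facet ** facets


def max_collection_size(docs_per_category, facets, terms_per_facet):
    """Largest collection in which each combination holds docs_per_category items."""
    return docs_per_category * post_coord_combos(facets, terms_per_facet)


# Vary terms per facet with 4 facets and 20 docs per category.
for terms in (10, 20, 30, 40, 50):
    print(terms, post_coord_combos(4, terms), max_collection_size(20, 4, terms))

# Vary the number of facets with 10 terms per facet and 20 docs per category.
for facets in (4, 5, 6, 7, 8):
    print(facets, post_coord_combos(facets, 10), max_collection_size(20, facets, 10))
```

Adding a facet multiplies capacity by the number of terms in that facet, which is why the second table grows much faster than the first.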
Is faceted indexing the future of social tagging?
Summary
 There are lessons to be learned from web tagging about how to get good metadata in document and content management applications.
 Document and content management system tagging must be simple, and it must almost instantaneously make relevant work products easier to find.
Questions?
Joseph A. Busch
415-377-7912,
[email protected]
Tagging Overview
 Tagging, any kind of tagging, is better than relying on the words that happen to occur in a piece of content. End user tagging is useful, as is tagging by librarians, as are tags automatically assigned by operating systems and language processing algorithms. Content should be tagged throughout its lifecycle, each time it is handled and used, so that the tags reflect whether it has accrued value or its significance has diminished.
 Almost instantaneous exposure of tags in simple user interfaces on the web provides positive reinforcement for user tagging that simply did not exist before. It should not be surprising that a good user interface improves usability.
 As content users flock to websites that help to organize the content on the web, advertisements and value-added content services follow. The bottleneck in the semantic web has been a shortage of tagged content. The end user tagging revolution may begin to address this shortcoming.
 There are lessons to be learned from web tagging about how to get good metadata in document and content management applications. Document and content management system tagging must be simple, and it must almost instantaneously make relevant work products easier to find.