Transcript USDA ERS
Taxonomy Strategies pvc
Taxonomy & Metadata Strategies for Effective Content Management
February 10, 2004
Copyright 2004 Taxonomy Strategies. All rights reserved.

Who I am
Over 25 years in the business of organized information
  Founder, Taxonomy Strategies
  Director, Solutions Architecture, Interwoven
  VP, Infoware, Metacode Technologies (acquired by Interwoven, November 2000)
  Program Manager, Getty Foundation
  Manager, Pricewaterhouse
Metadata and taxonomies community leadership
  President, American Society for Information Science & Technology
  Director, Dublin Core Metadata Initiative
  Adviser, National Research Council Computer Science and Telecommunications Board
  Reviewer, National Science Foundation Division of Information and Intelligent Systems
  Founder, Networked Knowledge Organization Systems/Services
TAXONOMY STRATEGIES The business of organized information

Agenda
  Remember: Dublin Core = Descriptive cataloging
  Myth #1: Taxonomies are monolithic hierarchies
  Myth #2: People retrieve content by topical subjects
  Myth #3: Nobody else can index content
  Myth #4: All a search engine can retrieve is a list

What is metadata? Another view of Dublin Core
  Subject metadata (What & Why): Subject, Description, Coverage
  Use metadata (How can it be used): Rights & Permissions
  Asset metadata (Who, Where & When): Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language
  Relational metadata (Links between and to): Relation
  [Diagram axes: Complexity, Enabled Functionality. Outcome labels: Better navigation & discovery; More efficient editorial process.]
  http://dublincore.org/documents/dces/

Metadata is a data model: A scheme for e-Forms
  Element | Data Type | Length | Req. / Repeat | Source | Purpose
  Asset Identifier | Integer | Fixed | 1 | System supplied | Basic accountability
  Registrar | String | Variable | 1 | LDAP validated | Accountability & maintenance
  Form Name | String | Variable | ? | User | Text search, results display
  Form Number | String | Variable | 1 | User | Text search, results display
  Revision Date | Date | Fixed | 1 | User | Filter or rank search results
  Agency | List | Fixed | ? | Organization vocabulary | Key index to retrieve & aggregate assets
  Subject:
  Form Type | List | Variable | 1 | Form Type vocabulary | Browse or group search results
  Industry Code | List | Fixed | ? | NAICS codes | Browse or group search results
  Jurisdiction | List | Fixed | * | 2-letter USPS codes | Browse or group search results
  Purpose | List | Variable | * | FEA Business Ref Model vocabulary | Browse or group search results
  ... | ... | ... | ... | ... | ...
  Legend: ? = 1 or more; * = 0 or more

What is a taxonomy?
  A systematics view: The specification of the names of people, places, things
  An information system design view: The specification of the names of people, places, things … and everything else that is needed to allow search engines and other content applications to work better.
  Linnaeus: Kingdom Animalia > Phylum Chordata > Class Mammalia > Order Carnivora > Family Canidae > Genus Canis > Species C. familiaris
  UNSPSC: Segment 44-Office Equipment and Accessories and Supplies > Family .12-Office Supplies > Class .17-Writing Instruments > Commodity .05-Mechanical pencils, .06-Wooden pencils, .07-Colored pencils

What is the purpose of a Taxonomy?
To … find the right information at the right time to solve the problem at hand
  [Diagram labels: Create, Classify, Discover; Content Assets; Taxonomy (Logical & Intuitive); example terms HORSE-DRAWN CARRIAGE, CAR, PLANE, SPACE SHUTTLE, WINGS, ENGINE, WHEELS; delivery through Filters, Site Maps, Search Engines, Portals, Content Integration Networks]

Taxonomy: e-Forms example
  Agency: 0001 Legislative; 1000 Judicial; 1100 Executive Office of Pres; 0003 Exec Depts; 1200 Agriculture; 1300 Commerce; 9700 Defense; 9100 Education; 8900 Energy; 7500 HHS; 7000 DHS; 8600 HUD; 1400 Interior; 1500 Justice; 1600 Labor; 1900 State; 6900 Transport; 2000 Treasury; 3600 Veterans; Ind Agencies; Intl Orgs
  Form Type: Application; Approval; Claim; Information request; Information submission; Instructions; Legal filing; Payment; Procurement; Renewal; Reservation; Service request; Test; Other input; Other transaction
  Industry Impact: 00 Generic; 11 Agriculture; 21 Mining; 22 Utilities; 23 Construct; 31-33 Manuf; 42 Wholesale; 44-45 Retail; 48-49 Trans; 51 Info; 52 Finance; 54 Profession; 55 Mgmt; 56 Support; 61 Education; 62 Health Care; 71 Arts; 72 Hospitality; 81 Other Services; 92 Public Admin
  Jurisdiction: Federal; State; Local; Other
  BRM Impact: Citizen Srvcs (Social Srvs; Defense; Disasters; Econ Dev; Education; Energy; Env Mgmt; Law Enf; Judicial; Correctional; Health; Security; Income Sec; Intelligence; Intl Affairs; Nat Resour; Transport; Workforce; Science); Delivery; Support; Management
  Topic: Agriculture & food; Commerce; Communications; Education; Energy; Env pro; Foreign rels; Govt; Health & safety; Housing & comm dev; Labor; Law; Named grps; National def; Nat resources; Recreation; Sci & tech; Social pgms; Transport
  Audience: All; General; Citizen; Business; Govt; Employee; Native American; Nonresident; Tourist; Special group
  [Diagram labels: Facet, Categories, Keyword, Controlled Vocabularies]

The power of taxonomy facets
  4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
  Easier to maintain
  Can be easier to navigate

7 Common taxonomy facets
Personalized content delivery requires defining taxonomy facets … and re-use of existing vocabulary sources
  Facet | Definition | Example Source
  Organization | Organizational structure. | FIPS 95-2, Your organizational structure, etc.
  Content Type | Structured list of the various types of content being managed or used. | AGLS Document Type, AAT Information Forms, Records management policy, etc.
  Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
  Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, US Postal Service, etc.
  Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
  Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, etc.
  Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
  Products and Services | Names of products and services. | ERP system, Your products and services, etc.

Mapping facets to Dublin Core
  Dublin Core Element | Definition | Vocabulary Source
  Title | Resource name. | Not applicable
  Creator | Content maker. | LDAP
  Subject | Content topic. | Topic facet
  Description | Description of content, summary. | Not applicable
  Publisher | Publisher of this manifestation. | Agency facet
  Contributor | Content contributor. | LDAP
  Date | Content lifecycle event for this manifestation. | Not applicable
  Type | Genre. | Form Type facet
  Format | Format of this manifestation. | RFC 2045
  Identifier | Reference for this manifestation, e.g., URL. | Not applicable
  Source | Source from which this manifestation has been derived. | Not applicable
  Language | Language of this manifestation. | ISO 639
  Relation | Reference to related resource. | None
  Coverage | Space, period, date, jurisdiction, etc. | Jurisdiction facet
  Rights | Who has rights to use this manifestation. | Access security/privacy classification

Facets at work on FirstGov site
  [Screenshot labels: Frequency, Organization, Audience, Content Type]

Indexing rules… Simplified for creator indexing
  Rule | Description
  Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
  Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
  Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
  Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.

FAA Metadata Scheme 1.0
  Attribute | Values
  Content Types | Types of content.
  Organizations | FAA organizations, partners, and industry associations.
  Locations | Locations by geography and function.
  Functions | Regulatory functions.
  Topics | Web site categories.
  Audiences | Audiences for a content item.

FAA-Office of Accident Investigation New NTSB Safety Recommendations: discovery indexing example
  Attribute | Values
  Content Types | Rules
  Organizations | Office of Accident Investigation (AAI)
  Locations | United States
  Functions | Rulemaking
  Topics | Regulations and Policies
  Audiences | Air carriers

Blueprint for NAS Modernization 2002 Update: discovery indexing example
  Attribute | Values
  Content Types | Reports
  Organizations | Office of System Architecture and Investment Analysis (ASD)
  Locations | Airspace areas
  Functions | Airspace
  Topics | Airports and Air Traffic
  Audiences | Government

Problems of “information space” design
  Within a website
    Show extent of content available
    Filter large content sets into reviewable groups
    Find more content like this
    Show latest content
    Feature selected content
  Outside a website
    Contribute content to portals
    Map to other taxonomies
    Integrate with web search engines

How to show the coverage of a large collection: NASA Taxonomy demo site

FirstGov search versus NASA Taxonomy search + facet navigation. Type in: Rover

How to show the coverage of a large collection: FAO collection

Agricola search versus FAO search + facet navigation. Type in: Thinning
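
The search + facet navigation demos above pair a text query with browsable facets and per-value counts. What follows is only a minimal Python sketch of that idea, assuming a tiny in-memory collection. The first two titles and their attribute values come from the FAA indexing examples in this deck; the third record, the `ITEMS` list, and the `filter_items` and `facet_counts` helpers are hypothetical and are not how the FirstGov, NASA, or FAO sites were built.

```python
from collections import Counter

# Hypothetical in-memory collection. Every facet holds a list of values,
# following the "Repeatable rule" (all attributes should be repeatable).
ITEMS = [
    {"title": "New NTSB Safety Recommendations",
     "Content Types": ["Rules"],
     "Audiences": ["Air carriers"],
     "Topics": ["Regulations and Policies"]},
    {"title": "Blueprint for NAS Modernization 2002 Update",
     "Content Types": ["Reports"],
     "Audiences": ["Government"],
     "Topics": ["Airports and Air Traffic"]},
    # Invented record, added only so the counts are not all 1.
    {"title": "Hypothetical airport safety report",
     "Content Types": ["Reports"],
     "Audiences": ["Government", "Air carriers"],
     "Topics": ["Airports and Air Traffic"]},
]


def filter_items(items, selections):
    """Keep items that carry every selected facet value.

    `selections` maps a facet name to the single value the user picked.
    """
    return [
        item for item in items
        if all(value in item.get(facet, [])
               for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(value for item in items for value in item.get(facet, []))


# Narrow by one facet, then show counts for another facet on what remains.
reports = filter_items(ITEMS, {"Content Types": "Reports"})
audience_counts = facet_counts(reports, "Audiences")
for value, count in sorted(audience_counts.items()):
    print(f"{value} ({count})")
```

Because each facet is filtered independently, a user can combine selections in any order, which is the practical side of the "power of taxonomy facets" point: a few short lists stand in for one very large hierarchy.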

Website search best practices
  [Diagram panels: Search Results Page, Content Page. Numbered callouts, in extraction order: Search box; Best Bets; Meaningful results order; Last updated; Meaningful titles & descriptions; Facet browse w/ counts; Site nav w/ extras; Inline links; Links to related content; Subscriptions]

Additional sources on faceted metadata
  Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., and Yee, P. "Finding the Flow in Web Site Search." Communications of the ACM, 45 (9), September 2002, pp. 42-49. http://www.sims.berkeley.edu/~hearst/papers/cacm02.pdf
  Yee, K., Swearingen, K., Li, K., and Hearst, M. "Searching and organizing: Faceted metadata for image search and browsing." Proceedings of the Conference on Human Factors in Computing Systems (April 2003). http://bailando.sims.berkeley.edu/papers/flamenco-chi03.pdf
  Taxonomy Strategies Website Bibliography > Click on taxonomy. http://www.taxonomystrategies.com/html/bibliography.htm#taxonomy

Taxonomy Strategies
Joseph A. Busch
+ 415-377-7912
[email protected]
February 10, 2004
Copyright 2004 Taxonomy Strategies. All rights reserved.