Taxonomy Development Workshop


Facets and Faceted Navigation
Development
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
  Two Case Studies
– Good and Bad
  Development Process
– Research Foundation
– Facet Design: Sources
– Integrated Solution
• Metadata Strategy – Technology and People
– Develop, Test, Monitor, Refine Application
  Conclusions
Enterprise Environment – Case Studies
  A Tale of Two Taxonomies
– It was the best of times, it was the worst of times
  Basic Approach
– Initial meetings – project planning
– High level K map – content, people, technology
– Contextual and Information Interviews
– Content Analysis
– Draft Taxonomy – validation interviews, refine
– Integration and Governance Plans
Enterprise Environment – Case One – Taxonomy, 7 facets
  Taxonomy of Subjects / Disciplines:
– Science > Marine Science > Marine microbiology > Marine toxins
  Facets:
– Organization > Division > Group
– Clients > Federal > EPA
– Instruments > Environmental Testing > Ocean Analysis > Vehicle
– Facilities > Division > Location > Building X
– Methods > Social > Population Study
– Materials > Compounds > Chemicals
– Content Type – Knowledge Asset > Proposals
Enterprise Environment – Case One – Taxonomy, 7 facets
  Project Owner – KM department – included RM, business process
  Involvement of library – critical
  Realistic budget, flexible project plan
  Successful interviews – build on context
– Overall information strategy – where taxonomy fits
  Good Draft taxonomy and extended refinement
– Software, process, team – train library staff
– Good selection and number of facets
  Final plans and hand off to client
Enterprise Environment – Case Two – Taxonomy, 4 facets
  Taxonomy of Subjects / Disciplines:
– Geology > Petrology
  Facets:
– Organization > Division > Group
– Process > Drill a Well > File Test Plan
– Assets > Platforms > Platform A
– Content Type > Communication > Presentations
Enterprise Environment – Case Two – Taxonomy, 4 facets
  Environment Issues
– Value of taxonomy understood, but not the complexity and scope
– Under budget, under staffed
– Location – not KM – tied to RM and software
• Solution looking for the right problem
– Importance of an internal library staff
– Difficulty of merging internal expertise and taxonomy
Enterprise Environment – Case Two – Taxonomy, 4 facets
  Project Issues
– Project mind set – not infrastructure
– Wrong kind of project management
• Special needs of a taxonomy project
• Importance of integration – with team, company
– Project plan more important than results
• Rushing to meet deadlines doesn’t work with semantics as well as software
Enterprise Environment – Case Two – Taxonomy, 4 facets
  Research Issues
– Not enough research – and wrong people
– Interference of non-taxonomy – communication
– Misunderstanding of research – wanted tinker toy connections
• Interview 1 implies conclusion A
  Design Issues
– Not enough facets
– Wrong set of facets – business not information
– Ill-defined facets – too complex internal structure
Taxonomy Development
Conclusion: Risk Factors
  Political-Cultural-Semantic Environment
– Not simple resistance – more subtle
• Re-interpretation of specific conclusions and of the sequence of conclusions / relative importance of specific recommendations
  Understanding project scope
  Access to content and people
– Enthusiastic access
  Importance of a unified project team
– Working communication as well as weekly meetings
Faceted Navigation: Development process
Overview
  Research Foundation – KA Audit
– Environment – Technology and People
– Users, Content, Information Behaviors and Needs
  Facet Design – Sources
– Selection of Facets and Facet Structure
  Integrated solution
– Metadata Strategy – Technology and People
  Application
– Design, Develop, Test, Refine
– Monitor and Refine
Faceted Navigation: Development process
Information / Knowledge Environment
  Strategic Foundation
– Info Problems – what, how severe
– Political environment – support, special interests
  Strategic Questions – why, what value from the taxonomy and facet classification, how are you going to use it
  Technology Environment – ECM, Enterprise Search
  High Level Content Map / Content Structures
  High Level Community Map – formal and informal
Faceted Navigation: Development process
Facet Design - Sources
  Facet Theory and Practice
– Broaden your perspective
  Domain Collection – metadata
– Database or Catalog
– Unstructured content – Much more difficult
  Content Structure – vocabularies, glossaries, etc.
  Building Facets – facetize the taxonomy
– Pull out facets:
• Chemistry – Agents/Compounds, Instruments
• Chemistry and Health – Methods
  Current or projected metadata as source
– Content Types – presentations, well reports, policy
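The "facetize the taxonomy" step on the Facet Design slide can be sketched in code. This is only an illustration under stated assumptions: the sample paths, the FACET_LABELS set, and the facetize function are hypothetical and not from the workshop. It assumes an analyst has already decided which node labels are really facets, and it simply pulls those branches out of a mixed subject taxonomy.

```python
from collections import defaultdict

# Hypothetical mixed taxonomy: subject branches with facet-like
# sub-branches (agents, instruments, methods) embedded in them.
MIXED_TAXONOMY = [
    "Chemistry > Organic Chemistry",
    "Chemistry > Agents/Compounds > Solvents",
    "Chemistry > Instruments > Spectrometers",
    "Health > Epidemiology",
    "Health > Methods > Population Study",
    "Chemistry > Methods > Titration",
]

# Assumption: these labels were identified as facets during analysis.
FACET_LABELS = {"Agents/Compounds", "Instruments", "Methods"}


def facetize(paths, facet_labels):
    """Split mixed taxonomy paths into subject paths and facet values."""
    subjects = []
    facets = defaultdict(list)
    for path in paths:
        parts = [p.strip() for p in path.split(">")]
        for i, label in enumerate(parts):
            if label in facet_labels:
                # Everything below the facet label becomes a facet value.
                value = " > ".join(parts[i + 1:])
                if value and value not in facets[label]:
                    facets[label].append(value)
                break
        else:
            # No facet label on this path: it stays in the subject taxonomy.
            subjects.append(" > ".join(parts))
    return subjects, dict(facets)


subjects, facets = facetize(MIXED_TAXONOMY, FACET_LABELS)
```

Note that "Methods" appears under both Chemistry and Health in the sample data; after facetizing, both values land in a single Methods facet, which is the point of pulling facets out of the subject hierarchy.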
Faceted Navigation: Development process
Research Foundation
  Users – formal and informal communities
– How do users think, categorize
– Information behaviors and needs
– Natural Level categories
  What labels do they use?
– Assets vs. Facilities and instruments / Processes vs. Activities
– Issue – labels that people use to describe their business and labels that they use to find information
  Suitability of Facets and Facet Labels
– Support for user tasks
  Interviews, surveys, search log analysis, folksonomies
Faceted Navigation: Development process
An Integrated Approach: Elements
  Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy – Subject matter / aboutness
  Technology – Search, Content Management
  Text analytics
– Entity extraction – feeds facets, signatures, ontologies
– Taxonomy & Auto-categorization – aboutness, subject
  People – tagging, evaluating tags, fine tune rules and taxonomy
  People – Users, social tagging, suggestions
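A small sketch may help show what "facet – orthogonal dimension of metadata" means in practice: each facet is an independent field on a document, so a user can filter on any combination and see counts for the remaining values. The documents, field names, and the refine and facet_counts helpers below are hypothetical illustrations, not part of any product named in this workshop.

```python
from collections import Counter

# Hypothetical tagged documents; each key is one facet (one dimension).
DOCS = [
    {"id": 1, "subject": "Marine toxins", "client": "EPA", "content_type": "Proposals"},
    {"id": 2, "subject": "Marine microbiology", "client": "EPA", "content_type": "Presentations"},
    {"id": 3, "subject": "Marine toxins", "client": "Other", "content_type": "Proposals"},
]


def refine(docs, **selected):
    """Keep documents that match every selected facet value."""
    return [d for d in docs if all(d.get(f) == v for f, v in selected.items())]


def facet_counts(docs, facet):
    """Count how many documents carry each value of one facet."""
    return Counter(d[facet] for d in docs if facet in d)


epa_docs = refine(DOCS, client="EPA")
remaining = facet_counts(epa_docs, "content_type")
epa_proposals = refine(DOCS, client="EPA", content_type="Proposals")
```

Because the dimensions are independent, the order in which facets are selected does not change the result set, only the path the user takes to it.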
Faceted Navigation: Development process
Integrated Solutions: Technology
  Search – Integrated features, facets and clusters and tag clouds and feedback
  Enterprise Content Management – tagging and Policy
– Place to add metadata, supported by policy
– Gather input from authors, tag clouds plus
  Text Analytics – Taxonomy management, entity extraction, categorization, sentiment
– Auto-populate variety of metadata – author, title, date, etc.
– Relevance – best bets to weights and classes of documents
Faceted Navigation: Development process
Software Tools – Auto-categorization
  Auto-categorization
– Training sets – Bayesian, Vector Machine
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Advanced – saved search queries (full search syntax)
– NEAR, SENTENCE, PARAGRAPH
– Boolean – X NEAR Y and Not-Z
  Advanced Features
– Facts / ontologies / Semantic Web – RDF +
– Sentiment Analysis – positive, negative, neutral
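The Boolean rule form "X NEAR Y and Not-Z" from the auto-categorization slide can be illustrated with a minimal sketch. This is not the syntax or behavior of any specific categorization product; the tokenize, near, and matches_rule functions and the 5-token window are assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(tokens, x, y, window=5):
    """True if x and y occur within `window` tokens of each other."""
    xs = [i for i, t in enumerate(tokens) if t == x]
    ys = [i for i, t in enumerate(tokens) if t == y]
    return any(abs(i - j) <= window for i in xs for j in ys)


def matches_rule(text, x, y, not_z, window=5):
    """Evaluate the rule: x NEAR y AND NOT not_z."""
    tokens = tokenize(text)
    return near(tokens, x, y, window) and not_z not in tokens


hit = matches_rule(
    "Marine toxins were found in the ocean samples",
    "marine", "toxins", "freshwater",
)
excluded = matches_rule(
    "Marine toxins were absent from the freshwater samples",
    "marine", "toxins", "freshwater",
)
too_far = matches_rule(
    "Marine biology is a broad field and this report also covers toxins",
    "marine", "toxins", "freshwater",
)
```

SENTENCE and PARAGRAPH operators would work the same way in principle, with the window replaced by a sentence or paragraph boundary check.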
Faceted Navigation: Development process
Software Tools – Entity Extraction
  Dictionaries – variety of entities, coverage, specialty
– Cost of update – service or in-house
– Inxight – 50+ predefined entity types
– Nstein – 800,000 people, 700,000 locations, 400,000 organizations
  Rules
– Capitalization, text – Mr., Inc.
– Advanced – proximity and frequency of actions, associations
– Need people to continually refine the rules
  Entities and Categorization
– Total number and pattern of entities = a type of aboutness of the document – Bar Code, Fingerprint
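The two extraction approaches on the entity extraction slide, dictionaries and rules keyed on capitalization and cue text such as "Mr." or "Inc.", can be sketched as follows. Everything here is a simplified assumption for illustration: the tiny ORG_DICTIONARY, the two regular expressions, and the fingerprint function are hypothetical and far cruder than a commercial extractor. The entity-type counts stand in for the slide's idea that the pattern of entities is a kind of fingerprint of the document.

```python
import re
from collections import Counter

# Hypothetical dictionary of known organizations.
ORG_DICTIONARY = {"EPA", "KAPS Group"}

# Rule: a title cue (Mr., Ms., Dr.) followed by one or two capitalized words.
PERSON_RULE = re.compile(r"\b(?:Mr|Ms|Dr)\.\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)?)")

# Rule: one or more capitalized words followed by the cue "Inc."
ORG_RULE = re.compile(r"\b((?:[A-Z][A-Za-z]+\s)+Inc\.)")


def extract_entities(text):
    """Return (entity_type, surface_text) pairs from dictionary and rules."""
    found = []
    for name in ORG_DICTIONARY:
        for _ in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append(("organization", name))
    for m in PERSON_RULE.finditer(text):
        found.append(("person", m.group(1)))
    for m in ORG_RULE.finditer(text):
        found.append(("organization", m.group(1)))
    return found


def fingerprint(text):
    """Count entities by type: a crude signature of what the text is about."""
    return Counter(kind for kind, _ in extract_entities(text))


SAMPLE = ("Dr. Jane Smith of Acme Labs Inc. briefed the EPA. "
          "The EPA later contacted Mr. Jones.")
entities = extract_entities(SAMPLE)
signature = fingerprint(SAMPLE)
```

Rules this simple will miss and mis-tag many cases, which is consistent with the slide's point that people are needed to keep refining them.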
Faceted Navigation: Development process
Integrated Solution: People
  Programmers, Librarians, Taxonomists, Metadata Specialists
– Integrate, design, develop rules, monitor activity & quality
  Authors, Subject Matter Experts
– Input into design (important facets), rules, activity meaning
  Users – Web 2.0
– Feedback – quality and usability
– Suggestions – missing terms, bad categorization & entity
– Tag Clouds & folksonomy – for social networking features, not for information retrieval
Faceted Navigation: Development process
Faceted Navigation Application
  Usability Studies
– Integration with browse/search – Findability
– Equal ranked facets or primary-secondary facets
– Granularity of Facets
– Ordering of the facets
– Sorting within facets
  Monitor usage and refine
– Unused facets / Preferred facets / facet combinations
– Map to user communities / information behaviors
  Refine auto-categorization and entity values
– Disambiguation
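The monitoring items on the application slide (unused facets, preferred facets, facet combinations) amount to simple counts over usage logs. The sketch below assumes a hypothetical log format in which each session is the list of facets a user clicked; the SESSIONS data and the usage_report function are made up for illustration, with facet names borrowed from Case Two.

```python
from collections import Counter
from itertools import combinations

# Facets offered in the application (names taken from Case Two).
ALL_FACETS = ["Organization", "Process", "Assets", "Content Type"]

# Hypothetical usage log: facets selected in each search session.
SESSIONS = [
    ["Organization", "Content Type"],
    ["Content Type"],
    ["Content Type", "Assets"],
    ["Organization", "Content Type"],
]


def usage_report(sessions, all_facets):
    """Return per-facet use, facet-pair use, and never-used facets."""
    facet_use = Counter(f for s in sessions for f in set(s))
    pair_use = Counter(
        pair for s in sessions for pair in combinations(sorted(set(s)), 2)
    )
    unused = [f for f in all_facets if facet_use[f] == 0]
    return facet_use, pair_use, unused


facet_use, pair_use, unused = usage_report(SESSIONS, ALL_FACETS)
preferred = facet_use.most_common(1)[0][0]
top_pair = pair_use.most_common(1)[0]
```

Grouping the same counts by user community would give the "map to user communities" view; that would need a community label on each session, which this sketch does not model.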
Conclusion - Development
  Design starts with self-knowledge – users, content, activities
  Integrated Solution is needed
– Multiple Knowledge structures, technology, people
– Search, Content management, text analytics
  Faceted navigation requires a lot of Metadata
  Text Analytics (Entity extraction and auto-categorization) is essential
  Monitor and Refine never ends – dedicated resources
  Semantic Projects are different
– Project management, software evaluation
Conclusions – Faceted Navigation
  The future is the combination of simple facets (name catalogs of entities) with rich taxonomies with complex semantics / ontologies
– Ontologies = Relationships of two facets
  Facets call for a new type of taxonomy
– Faceted taxonomies and/or simple taxonomies
  Future – new kinds of applications:
– Text Mining, research tools, sentiment
  Future of Search – smart ways to refine results, not better relevance
– Real problem with 10 million hits – no way to get to target
– Include facets, taxonomies, semantics, & lots of metadata
Questions?
Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Faceted Navigation Resources
  Articles
– Faceted Classification Resource Collection
• http://deyalexander.com/resources/faceted-classification.html
– A Simplified Model for Facet Analysis
• http://iainstitute.org/pg/a_simplified_model_for_facet_analysis.php
– Mailing List for Faceted Classification
• http://www.poorbuthappy.com/fcd/
– Study – Facets on the Web (75 ecommerce sites)
• http://mypage.iu.edu/%7Eklabarre/facetstudy.html
Faceted Navigation Resources
  Example Implementations
– Berkeley SIMS – Flamenco
• http://bailando.sims.berkeley.edu/flamenco.html
– Facetmap – demos – www.facetmap.com
  Tools
– Business Objects / Inxight – entity and fact extraction – www.inxight.com
– Teragram – www.teragram.com
– Lexalytics – www.lexalytics.com
– Data Harmony – www.dataharmony.com
– Smart Logic – www.smartlogic.com
Faceted Navigation Resources
  Vendors
– Most Search vendors now offer faceted navigation
– FAST, Autonomy, etc.
• Beware of parametric search sold as facets
– Most focused on facets – application and metrics:
• Endeca – http://www.endeca.com
Faceted Navigation Resources
  Articles
– How to Make a Faceted Classification and Put It On the Web
• http://www.miskatonic.org/library/facet-web-howto.html
– Putting Facets on the Web: An Annotated Bibliography
• http://www.miskatonic.org/library/facet-biblio.html
– Ecommerce – cooking and kitchen – Faceted Navigation
• http://www.
– Extended Faceted Taxonomies for Web Catalogs
• http://www.ercim.org/publication/Ercim_News/enw51/tzitzikas.html
– Webdesignpractices – study of ecommerce use of faceted navigation – Use of Faceted Classification
• http://www.webdesignpractices.com/navigation/facets.html