Taxonomy Development Workshop


Enterprise Semantic Infrastructure
Workshop
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
 Introduction
 Semantic Infrastructure
– Basic Concepts – Content, People, Business Processes, Technology
– Developing an Articulated Strategic Vision
– Benefits of an Infrastructure Approach
 Development and Maintenance of a Semantic Infrastructure
– Semantic Tools – Capabilities & Acquisition Strategy
– Development Processes & Best Practices
 Semantic Infrastructure Applications
– Enterprise Search
– Search Based Applications & Beyond
 Discussion & Questions / Lunch
KAPS Group: General
 Knowledge Architecture Professional Services
 Virtual Company: Network of consultants – 8-10
 Partners – SAS, Smart Logic, Microsoft, Concept Searching, etc.
 Consulting, Strategy, Knowledge architecture audit
 Services:
– Taxonomy/Text Analytics development, consulting, customization
– Technology Consulting – Search, CMS, Portals, etc.
– Evaluation of Enterprise Search, Text Analytics
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
 Applied Theory – Faceted taxonomies, complexity theory, natural categories
Questions?
Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Semantic Infrastructure
Basic Concepts & Benefits
Agenda
 Semantic Infrastructure – Basic Concepts
– Content & Content Structure
– People – Resources, Producers, Consumers
– Semantics in Business Processes
– Technology – Information, Text Analytics, Text Mining
 Semantic Infrastructure – Strategic Foundation
– Knowledge Audit Plus
 Semantic Infrastructure – Benefits of an Infrastructure Approach
– Infrastructure vs. Projects
– Semantics vs. Technology
 Conclusion
Semantic Infrastructure: 4 Dimensions
 Ideas – Content and Content Structure
– Map of Content – Tribal language silos
– Structure – articulate and integrate
 People – Producers & Consumers
– Communities, Users, Central Team
 Activities – Business processes and procedures
– Semantics, information needs and behaviors
 Technology
– CMS, Search, portals, text analytics
– Applications – BI, CI, Semantic Web, Text Mining
Semantic Infrastructure: 4 Dimensions
Content and Content Structure
 Map multiple types and sources of content
– Structured and unstructured, internal and external
 Beyond Metadata and Taxonomy
– Keywords – poor performance
– Dublin Core: hard to implement
– Dublin Core: Too formal and not formal enough
 Need structures that are more powerful and more flexible
– Model of framework and smart modules
 Framework
– Faceted metadata
– Simple taxonomies with intelligence – categorization & extraction
– Ontology and Semantic Web
– Best bets and user metadata
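
As a rough illustration of "simple taxonomies with intelligence", the sketch below pairs a two-category taxonomy with keyword rules (categorization) and a naive pattern for organization names (extraction). The category names, keywords, and regular expression are all hypothetical placeholders; commercial text analytics tools use far richer rules than this.

```python
import re

# Hypothetical two-category taxonomy; each category carries simple keyword rules.
TAXONOMY = {
    "Vaccines": ["vaccine", "immunization", "dose"],
    "Sales": ["sales", "revenue", "customer"],
}

# Hypothetical extraction rule: capitalized words ending in a company suffix.
ORG_PATTERN = re.compile(r"\b((?:[A-Z][\w&]*\s)+(?:Group|Inc|Corp))\b")


def categorize(text, min_hits=1):
    """Return the categories whose keyword rules match, with hit counts."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {}
    for category, keywords in TAXONOMY.items():
        hits = sum(words.count(keyword) for keyword in keywords)
        if hits >= min_hits:
            scores[category] = hits
    return scores


def extract_organizations(text):
    """Return organization-like names found by the pattern above."""
    return [match.strip() for match in ORG_PATTERN.findall(text)]


doc = "KAPS Group reviewed vaccine dose schedules before the sales meeting."
print(categorize(doc))
print(extract_organizations(doc))
```

The taxonomy itself stays small and easy to maintain; the "intelligence" lives in the rules attached to each node, which a central team can tune over time.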
Knowledge Structures
 List of Keywords (Folksonomies)
 Controlled Vocabularies, Glossaries
 Thesaurus
 Browse Taxonomies (Classification)
 Formal Taxonomies
 Faceted Classifications
 Semantic Networks / Ontologies
 Categorization Taxonomies
 Topic Maps
 Knowledge Maps
A Framework of Knowledge Structures
 Level 1 – keywords, glossaries, acronym lists, search logs
– Resources, inputs into upper levels
 Level 2 – Thesaurus, Taxonomies
– Semantic Resource – foundation for applications, metadata
 Level 3 – Facets, Ontologies, semantic networks, topic maps, Categorization Taxonomies
– Applications
 Level 4 – Knowledge maps
– Strategic Resource
Semantic Infrastructure: People
 Communities / Tribes
– Different languages
– Different cultures
– Different models of knowledge
 Two needs – support silos and inter-silo communication
 Types of Communities
– Formal and informal
– Variety of subject matters – vaccines, research, sales
– Variety of communication channels and information behaviors
 Individual People – tacit knowledge / information behaviors
– Consumers and Producers of information – In Depth
– Map major types
Semantic Infrastructure Dimensions
People: Central Team
 Central Team supported by software and offering services
– Creating, acquiring, evaluating taxonomies, metadata standards, vocabularies, categorization taxonomies
– Input into technology decisions and design – content management, portals, search
– Socializing the benefits of metadata, creating a content culture
– Evaluating metadata quality, facilitating author metadata
– Analyzing the results of using metadata, how communities are using it
– Researching metadata theory, user-centric metadata
– Facilitating knowledge capture in projects, meetings
Semantic Infrastructure Dimensions
People: Location of Team
 KM/KA Dept. – Cross Organizational, Interdisciplinary
 Balance of dedicated and virtual, partners
– Library, Training, IT, HR, Corporate Communication
 Balance of central and distributed
 Industry variation
– Pharmaceutical – dedicated department, major place in the organization
– Insurance – Small central group with partners
– Beans – a librarian and part time functions
 Which design – knowledge architecture audit
Semantic Infrastructure Dimensions
Technology Infrastructure
 Enterprise platforms: from creation to retrieval to application
– Semantic Infrastructure as the computer network
• Applications – integrated meaning, not just data
 Semantic Structure
– Text Analytics – taxonomy, categorization, extraction
 Integration Platforms – Content management, Search
– Add structure to content at publication
– Add structure to content at consumption
Infrastructure Solutions: Resources
Technology
 Text Mining
– Both a structure technology – taxonomy development
– And an application
 Search Based Applications
– Portals, collaboration, business intelligence, CRM
– Semantics add intelligence to individual applications
– Semantics add ability to communicate between applications
 Creation – content management, innovation, communities of practice (CoPs)
– When, who, how, and how much structure to add
– Workflow with meaning, distributed subject matter experts (SMEs) and centralized teams
Infrastructure Solutions: Elements
Business Processes
 Platform for variety of information behaviors & needs
– Research, administration, technical support, etc.
– Types of content, questions
 Subject Matter Experts – Info Structure Amateurs
 Web Analytics – Feedback for maintenance & refinement
 Enhance Basic Processes – Integrated Workflow
– Enhance Both Efficiency and Quality
 Enhance support processes – education, training
 Develop new processes and capabilities
– External Content – Text mining, smarter categorization
Semantic Infrastructure: The start and foundation
Knowledge Architecture Audit
 Knowledge Map – Understand what you have, what you are, what you want
– The foundation of the foundation
 Contextual interviews, content analysis, surveys, focus groups, ethnographic studies, Text Mining
 Category modeling – “Intertwingledness” – learning new categories influenced by other, related categories
 Natural level categories mapped to communities, activities
• Novices prefer higher levels
• Balance of informativeness and distinctiveness
 Living, breathing, evolving foundation is the goal
Knowledge Architecture Audit:
Knowledge Map
 Project Foundation
– Meetings, work groups
– General Outline
 Contextual Interviews
– Overview, High Level: Process, Community
– Broad Context
 Information Interviews
– Info behaviors of Business processes
– Deep Details
 App/Content Catalog
– Technology and content
– Deep Details
 User Survey
– All 4 dimensions
– Complete Picture
 Strategy Document
– Meetings, work groups
– New Foundation
Semantic Infrastructure
Enterprise Taxonomies: Wrong Approach
 Very difficult to develop – $100,000s
 Even more difficult to apply
– Teams of Librarians or Authors/SMEs
– Cost versus Quality
 Problems with maintenance
 Cost rises in proportion with granularity
 Difficulty of representing user perspective
 Social media requires a framework – doesn’t create one
– Tyranny of the majority, madness of crowds
Semantic Infrastructure
Content Structures: New Approach
 Simple Subject Taxonomy structure
– Easy to develop and maintain
 Combined with categorization capabilities
– Added power and intelligence
 Combined with Faceted Metadata
– Dynamic selection of simple categories
– Allow multiple user perspectives
• Can’t predict all the ways people think
• Monkey, Banana, Panda
 Combined with ontologies and semantic data
– Multiple applications – Text mining to Search
– Combine search and browse
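
To make "dynamic selection of simple categories" concrete, here is a minimal sketch of faceted selection. The documents, facet names (`subject`, `type`, `year`), and values are invented for illustration; the point is only that several small, independent facets let different users reach the same item by different routes, without one deep hierarchy that tries to predict how everyone thinks.

```python
# Hypothetical documents tagged with a simple subject category plus facets.
DOCUMENTS = [
    {"title": "Trial summary", "subject": "Vaccines", "type": "Report", "year": 2010},
    {"title": "Pricing deck", "subject": "Sales", "type": "Presentation", "year": 2011},
    {"title": "Dose study", "subject": "Vaccines", "type": "Presentation", "year": 2011},
]


def select(documents, **facets):
    """Keep the documents that match every chosen facet value."""
    return [
        doc for doc in documents
        if all(doc.get(name) == value for name, value in facets.items())
    ]


# Two users reach the same document from different starting points.
by_subject = select(DOCUMENTS, subject="Vaccines", year=2011)
by_type = select(DOCUMENTS, type="Presentation", subject="Vaccines")
print([doc["title"] for doc in by_subject])
print([doc["title"] for doc in by_type])
```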
Semantic Infrastructure Design:
People, Technology, Business Processes
 People (Central) – tagging, evaluating tags, fine tune rules and taxonomy
 People (Users) – social tagging, suggestions
 Software – Text analytics, auto-categorization, entity extraction
 Software – Search, Content Management, Portals-Intranets
– Hybrid model – combination of automatic and human
 Business Processes – integrated search with activities, text analytics based applications, intelligent routing
Semantic Infrastructure Benefits
Why Semantic Infrastructure
 Unstructured content = 80% or more of all content
 Limited Usefulness – database of unstructured content
 Need to add (infra) structure to make it useful
 Information is about meaning, semantics
 Search is about semantics, not technology
 Can’t Google do it?
– Link Algorithm – human act of meaning
– Doesn’t work in enterprise
– 1,000’s of editors adding meaning
 New technology makes it possible – Text Analytics
Semantic Infrastructure Benefits
General Time and Productivity
 Time Savings – Too Big to Believe?
– Lost time searching – $12M a year per 1,000
– Cost of recreating lost information – $4.5M per 1,000
– Cost of not finding the right information – Years?
– 10% improvement = $1.2M a year per 1,000
 Making Metrics Human
– Number of additional FTEs at no cost (enhanced productivity)
– Savings passed on to clients
– Spreadsheet of extra activities (e.g., training – working smarter)
– Build a more integrated, smarter organization
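
The arithmetic behind these figures can be checked with a short sketch. It assumes "per 1,000" means per 1,000 employees, and the loaded cost per FTE is a purely hypothetical number chosen to show how a dollar saving might be restated as "additional FTEs at no cost". Ten percent of the $12M-per-1,000 search cost works out to $1.2M a year per 1,000.

```python
# Figures from the slide, in US dollars per year per 1,000 people.
LOST_SEARCH_TIME = 12_000_000
RECREATING_LOST_INFO = 4_500_000

# Hypothetical loaded annual cost of one FTE (an assumption, not from the slide).
ASSUMED_COST_PER_FTE = 100_000


def annual_saving(cost_per_1000, headcount, improvement_pct):
    """Scale a per-1,000 annual cost to a headcount and apply a percent improvement."""
    return cost_per_1000 * headcount * improvement_pct // (1000 * 100)


saving = annual_saving(LOST_SEARCH_TIME, headcount=1000, improvement_pct=10)
equivalent_ftes = saving // ASSUMED_COST_PER_FTE

print(f"Annual saving: ${saving:,}")
print(f"Equivalent FTEs at the assumed cost: {equivalent_ftes}")
```

Changing `headcount` rescales the result linearly, so the same 10% improvement for 10,000 people would be ten times larger.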
Semantic Infrastructure Benefits
Return on Existing Technology
 Enterprise Content Management – $100K - $2M
– Underperforming – year after year, new initiative every 5 years
 ECM as part of a Platform
– Enhance search – improved metadata, especially keywords
 A Hybrid Model of ECM and Metadata
– Authors, editors-librarians, Text Analytics
– Submit a document -> TA generates metadata, extracts concepts, suggests categorization (keywords) -> author OK’s (easy task) -> librarian monitors for issues
– Use results as input into analytics
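
The hybrid workflow (software suggests, author approves, librarian monitors) can be sketched as three small steps. Everything named here is hypothetical: the `Submission` record, the four-term vocabulary standing in for a text analytics engine, and the rule a librarian might use to flag a document are illustrative assumptions, not a description of any specific ECM product.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    text: str
    suggested: list = field(default_factory=list)  # keywords proposed by software
    approved: list = field(default_factory=list)   # keywords the author accepted
    flagged: bool = False                          # True when a librarian should review


# Hypothetical controlled vocabulary standing in for a text analytics engine.
VOCABULARY = ["taxonomy", "metadata", "search", "ontology"]


def suggest_keywords(sub):
    """Step 1: software suggests vocabulary terms found in the text."""
    lowered = sub.text.lower()
    sub.suggested = [term for term in VOCABULARY if term in lowered]
    return sub


def author_review(sub, rejected=()):
    """Step 2: the author accepts the suggestions, minus any rejected ones."""
    sub.approved = [k for k in sub.suggested if k not in rejected]
    return sub


def librarian_monitor(sub, min_keywords=1):
    """Step 3: flag submissions that end up with too few approved keywords."""
    sub.flagged = len(sub.approved) < min_keywords
    return sub


sub = Submission("Faceted metadata improves enterprise search.")
sub = librarian_monitor(author_review(suggest_keywords(sub), rejected={"search"}))
print(sub.suggested, sub.approved, sub.flagged)
```

The author's step is deliberately the cheap one: confirming or rejecting a suggestion is much easier than inventing keywords from scratch, which is what makes the model practical.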
Semantic Infrastructure Benefits
Return on Existing Technology
 Enterprise Search – $100K - $2M
– Cost Effective and good quality keywords / categorization
– More metadata – faceted navigation
 Work with ECM or dynamically generate categorization at search results time
 Rich results – summaries, categorization, facets like date, people, organizations, etc. Tag clouds and related topics
 Foundation for Search Based Applications – all need semantics
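
One way to picture "rich results" is to count facet values over whatever the search engine returns, at results time. The hits and their metadata below are made up for illustration, and a real engine would normally compute these counts itself; this sketch only shows the shape of the data that facet lists and tag clouds are built from.

```python
from collections import Counter

# Hypothetical search hits, each already carrying extracted metadata.
RESULTS = [
    {"title": "Audit plan", "year": 2010, "organizations": ["KAPS Group"]},
    {"title": "Vendor review", "year": 2011, "organizations": ["SAS", "Microsoft"]},
    {"title": "Search pilot", "year": 2011, "organizations": ["Microsoft"]},
]


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    counts = Counter()
    for hit in results:
        values = hit.get(facet, [])
        if not isinstance(values, list):
            values = [values]  # treat single-valued facets like one-item lists
        counts.update(values)
    return counts


print(facet_counts(RESULTS, "year").most_common())
print(facet_counts(RESULTS, "organizations").most_common())
```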
Semantic Infrastructure Benefits
Infrastructure vs. Projects
 Strategic foundation vs. Short Term
 Integrated solution – CM and Search and Applications
– Better results
– Avoid duplication
 Semantics
– Small comparative cost
– Needed to get full value from all the above
 ROI – asking the wrong question
– What is ROI for having an HR department?
– What is ROI for organizing your company?
Semantic Infrastructure Benefits
Selling the Benefits
 CTO, CFO, CEO
– Doesn’t understand – wrong language
– Semantics is extra – harder work will overcome
– Not business critical
– Not tangible – accounting bias
– Does not believe the numbers
– Believes he/she can do it
 Need stories and figures that will connect
 Need to understand their world – every case is different
 Need to educate them – Semantics is tough and needed
Conclusion
 Semantic Infrastructure is not just a project
– Foundation and Platform for multiple projects
 Semantic Infrastructure is not just about search
– It is about language, cognition, and applied intelligence
 Strategic Vision (articulated by K Map) is essential
– Even for your under the radar vocabulary project
– Paying attention to theory is practical
 Benefits are enormous – believe it!
 Think Big, Start Small, Scale Fast
– Initial Project = +10%, All Other Projects = -50%
Questions?