Taxonomy Development Workshop


Taxonomy and Social Media
Social Taxonomies
Tom Reamy
Chief Knowledge Architect
KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
• Introduction
• It’s a Different World
– Content and Intent
• New Approaches
– To Taxonomy
– Text Analytics
• New Applications – and Opportunities
• Conclusion
Introduction: KAPS Group
• Knowledge Architecture Professional Services – Network of Consultants
• Applied Theory – Faceted & emotion taxonomies, natural categories
• Services:
– Strategy – IM & KM - Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics, Social Media development, consulting
– Text Analytics Quick Start – Audit, Evaluation, Pilot
• Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
• Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.
• Program Chair – Text Analytics World – March 29-April 1 - SF
• Presentations, Articles, White Papers – www.kapsgroup.com
• Current – Book – Text Analytics: How to Conquer Information Overload, Get Real Value from Social Media, and Add Smart Text to Big Data
New Content Characteristics
It’s a Very Different World
• Scale – orders of magnitude – 100s of millions, billions
• Speed – 20-100 million a day
• Size – Twitter, blogs, forums, email
– 140 characters to a few sentences
• Quality – misspellings, lack of structure, incoherence
• Conversations – not stand-alone docs
– Can’t tell what a “document” is about without reference to previous threads
• Purpose – communicate – social grooming, rant
– Not exchange of ideas, policies, etc.
• Simple Content Complexity – single thoughts, simplicity of emotion
New Content Characteristics
It’s a Very Different World – Search and Taxonomy
• i tried very slow, NO GOOGLE search, some apps not working.. This is not a "with GOOGLE" My friend has incredible, that is much batter.. Anyways i returned samsung, replace incredible. What's great about it: 4" LCD What's not so great: NOT A GOOGLE PHONE
• (nt 2.0)willie John ci to/for: wanted to know about charges for pic mail for ;bill date 4/5/2010 | repeat: no | auth: pin | ptns affected: 7777777777 | information/instructions given: sup gave pic mail for free and gave adj for $ 2.40 new bal is $ 147.53 | any mobile, anytime: n | ir: yes | ir-email: n |
New Content Characteristics
It’s a Very Different World – Topical Current Content
• Content not archived (for users)
• No real need for search (or just very simple search)
• Very poor (if any) metadata – not faceted search
• Focus on phrases, sentences – not documents
• Little need of a subject taxonomy
• About emotions, things, products, people
• Who are the users? They don’t need our help
• Taxonomies, we don’t need no stinking taxonomies!
It’s a Very Different World
• So why are we talking about it at a taxonomy boot camp?
• Taxonomy = structure (purists can leave now)
• All of this content is a rich source of research material
• Companies are mining this resource and they need to add structure to get deeper understanding
• Varieties of structure:
– Simple topical taxonomies, 2-3 levels
– Emotion taxonomies, Ontologies and Semantic Networks
– Dynamic taxonomies – built on public taxonomies, enterprise taxonomy – exposed in hierarchical triples.
• Need more automatic / semi-automatic solutions
– Advanced text analytics
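One way to read "exposed in hierarchical triples" is as (narrower term, relation, broader term) statements. The sketch below illustrates that reading only; the sample terms, the `broader` relation name, and the helpers `build_parent_index` and `ancestors` are hypothetical and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical triples: (narrower term, relation, broader term).
TRIPLES = [
    ("battery life", "broader", "hardware"),
    ("screen", "broader", "hardware"),
    ("hardware", "broader", "phone"),
    ("billing", "broader", "customer service"),
]

def build_parent_index(triples):
    """Map each term to the set of its directly broader terms."""
    parents = defaultdict(set)
    for child, relation, parent in triples:
        if relation == "broader":
            parents[child].add(parent)
    return parents

def ancestors(term, parents):
    """Return all broader terms, nearest first (breadth-first walk upward)."""
    seen, queue, order = set(), [term], []
    while queue:
        current = queue.pop(0)
        for parent in sorted(parents.get(current, ())):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order

PARENTS = build_parent_index(TRIPLES)
```

Because the hierarchy lives in flat triples, a small 2-3 level taxonomy can be extended or merged with a public one by appending rows rather than restructuring a tree.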
New Kinds of Social Taxonomies
• New Taxonomies – Appraisal
– Appraisal Groups – Adjective and modifiers – “not very good”
– Four types – Attitude, Orientation, Graduation, Polarity
– Supports more subtle distinctions than positive or negative
• Emotion taxonomies
– Joy, Sadness, Fear, Anger, Surprise, Disgust
– New Complex – pride, shame, embarrassment, love, awe
– New situational/transient – confusion, concentration, skepticism
• Beyond Keywords – Need Text Analytics
– Analysis of phrases, multiple contexts – conditionals, oblique
– Analysis of conversations – dynamic of exchange, private language
– Enterprise taxonomy rolled into a categorization taxonomy
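The appraisal-group idea (an adjective plus its modifiers, described by Attitude, Orientation, Graduation, and Polarity) can be sketched as a small data structure. This is a minimal illustration, assuming tiny hand-made lexicons; the word lists, the numeric multipliers, and the `score` formula are hypothetical choices, not the speaker's method.

```python
from dataclasses import dataclass

# Hypothetical mini-lexicons; a real system would use much larger ones.
HEADS = {
    "good": ("appreciation", 1),
    "bad": ("appreciation", -1),
    "happy": ("affect", 1),
    "rude": ("judgment", -1),
}
GRADUATION = {"very": 1.5, "extremely": 2.0, "somewhat": 0.5}
NEGATORS = {"not", "never"}

@dataclass
class AppraisalGroup:
    attitude: str      # kind of evaluation carried by the head adjective
    orientation: int   # +1 or -1 for the head adjective on its own
    graduation: float  # intensity multiplier contributed by modifiers
    negated: bool      # True when a negator flips the polarity

    @property
    def score(self) -> float:
        sign = -1 if self.negated else 1
        return sign * self.orientation * self.graduation

def parse_group(phrase: str) -> AppraisalGroup:
    """Parse a short 'modifier* adjective' phrase such as 'not very good'."""
    *modifiers, head = phrase.lower().split()
    attitude, orientation = HEADS[head]
    graduation, negated = 1.0, False
    for word in modifiers:
        if word in NEGATORS:
            negated = not negated
        elif word in GRADUATION:
            graduation *= GRADUATION[word]
    return AppraisalGroup(attitude, orientation, graduation, negated)
```

Keeping the four attributes separate, rather than collapsing them into one number up front, is what allows distinctions finer than positive versus negative.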
Case Study – Categorization & Sentiment
Taxonomy and Social Media: Applications
New Range of Applications
• Real Sentiment Analysis – Limited value of Positive and Negative
– Degrees of intensity, complexity of emotions and documents
– Contextual rules – “I would have loved X except for the battery”
• Expertise Analysis
– Experts think & write differently – process, chunks
– Categorization rules for documents, authors, communities
• Behavior Prediction – Text Analytics and Predictive Analytics, Social Analytics
• Crowd Sourcing – technical support to wikis
• Political – conservative and liberal minds/texts
– Disgust, shame, cooperation, openness
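The contextual-rule example from this slide ("I would have loved X except for the battery") shows why keyword polarity alone misleads: "loved" is positive, but the sentence is a complaint about the battery. A minimal sketch of one such rule follows, assuming a hand-written regular expression; the pattern, the word lists, and the `classify` helper are hypothetical.

```python
import re

# Hypothetical rule: counterfactual praise followed by an exception clause.
COUNTERFACTUAL = re.compile(
    r"\bwould have (?:loved|liked|enjoyed)\b.*?"
    r"\b(?:except for|but for)\s+(?:the\s+)?(?P<aspect>\w+)",
    re.IGNORECASE,
)
# Hypothetical fallback keyword list for plain positive mentions.
POSITIVE = re.compile(r"\b(?:love|loved|great)\b", re.IGNORECASE)

def classify(sentence: str):
    """Return (sentiment, aspect); the contextual rule wins over keywords."""
    match = COUNTERFACTUAL.search(sentence)
    if match:
        return ("negative", match.group("aspect").lower())
    if POSITIVE.search(sentence):
        return ("positive", None)
    return ("neutral", None)
```

The rule is checked before the keyword fallback so that the counterfactual reading takes precedence over the positive word it contains.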
Taxonomy and Social Media: Applications
Pronoun Analysis: Fraud Detection; Enron Emails
• Patterns of “Function” words reveal wide range of insights
• Function words = pronouns, articles, prepositions, conjunctions, etc.
– Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
• Areas: sex, age, power-status, personality – individuals and groups
• Lying / Fraud detection: Documents with lies have
– Fewer and shorter words, fewer conjunctions, more positive emotion words
– More use of “if, any, those, he, she, they, you”, less “I”
– More social and causal words, more discrepancy words
• Current research – 76% accuracy in some contexts
• Text Analytics can improve accuracy and utilize new sources
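A few of the cues listed on this slide can be turned into simple per-document rates. The sketch below only computes features; it does not decide whether a text is deceptive. The word lists are taken from the slide, while the tokenizer, the feature names, and the `function_word_profile` helper are assumptions made for illustration.

```python
import re

# Word lists from the slide; everything else here is illustrative.
OTHER_REFERENCE = {"if", "any", "those", "he", "she", "they", "you"}
FIRST_PERSON = {"i"}

def function_word_profile(text: str) -> dict:
    """Return per-word rates for a few of the cues listed on the slide."""
    # Simple tokenizer: contractions such as "I'm" stay whole and are
    # therefore not counted as "I" in this sketch.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty text
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / total,
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / total,
        "other_reference_rate": sum(w in OTHER_REFERENCE for w in words) / total,
    }
```

Rates like these would typically be fed into a statistical model alongside other features rather than thresholded on their own.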
Taxonomy and Social Media: Applications
Behavior Prediction – Telecom Customer Service
• Basic Rule
– (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"), (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
• Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
– ask about the contract expiration date as she wanted to cxl teh acct
• Combine sophisticated rules with sentiment statistical training and Predictive Analytics and behavior monitoring
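The basic rule on this slide can be approximated in a few lines. This is a sketch under stated assumptions: START_20 is read as "only the first 20 tokens count", DIST_n as "within n tokens of each other", and the members of each bracketed concept are hypothetical because the slide does not list them. The helper names `positions`, `within`, and `predicts_cancel` are also invented for illustration.

```python
import re

# Hypothetical concept vocabularies; the slide does not list their members.
CONCEPTS = {
    "[cancel]": {"cancel", "cancell", "cxl"},
    "[cancel-what-cust]": {"account", "acct", "act", "service"},
    "[one-line]": {"line"},
    "[restore]": {"restore", "reinstate"},
    "[if]": {"if"},
}

def positions(tokens, concept):
    """Indexes of the tokens that belong to a concept."""
    return [i for i, tok in enumerate(tokens) if tok in CONCEPTS[concept]]

def within(tokens, left, right_concepts, distance):
    """True if a `left` hit lies within `distance` tokens of any right-hand hit."""
    left_hits = positions(tokens, left)
    right_hits = [i for c in right_concepts for i in positions(tokens, c)]
    return any(abs(a - b) <= distance for a in left_hits for b in right_hits)

def predicts_cancel(text, start_window=20):
    """Assumed reading of the rule: evaluate only the first 20 tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())[:start_window]
    wants_cancel = within(tokens, "[cancel]", ["[cancel-what-cust]"], 7)
    excluded = within(tokens, "[cancel]", ["[one-line]", "[restore]", "[if]"], 10)
    return wants_cancel and not excluded
```

Listing misspellings and abbreviations such as "cancell" and "cxl" inside the concept vocabularies is one way to cope with the noisy call-center notes shown in the examples.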
Taxonomy, Text Analytics, and Social Media
Conclusions
• Social Media is a Different World
– Content, Scale, Questions
• New Types of Taxonomy
– Smaller, more dynamic subject taxonomies
– Appraisal, Emotion, Things, Motivations, Actions, etc.
• Taxonomists – Time to Explore new structures
– Ontologies, semantic networks, all of above
• Text Analytics – needs good taxonomy design – levels, etc.
– Adds a platform – flexible and powerful auto-tagging
• Result: New Types of Applications
– Stand alone and with standard search/taxonomy
– Merge data and text, external and internal
Questions?
Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com