Taxonomy Development Workshop
Taxonomy and Social Media
Social Taxonomies
Tom Reamy
Chief Knowledge Architect
KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
Introduction
It’s a Different World
– Content and Intent
New Approaches
– To Taxonomy
– Text Analytics
New Applications – and Opportunities
Conclusion
Introduction: KAPS Group
Knowledge Architecture Professional Services – Network of Consultants
Applied Theory – Faceted & emotion taxonomies, natural categories
Services:
– Strategy – IM & KM - Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics, Social Media development, consulting
– Text Analytics Quick Start – Audit, Evaluation, Pilot
Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST,
Concept Searching, Attensity, Clarabridge, Lexalytics
Clients: Genentech, Novartis, Northwestern Mutual Life, Financial
Times, Hyatt, Home Depot, Harvard Business Library, British Parliament,
Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.
Program Chair – Text Analytics World – March 29-April 1 - SF
Presentations, Articles, White Papers – www.kapsgroup.com
Current – Book – Text Analytics: How to Conquer Information Overload,
Get Real Value from Social Media, and Add Smart Text to Big Data
New Content Characteristics
It’s a Very Different World
Scale – orders of magnitude – 100’s of millions, Billions
Speed – 20-100 million a day
Size – Twitter, Blogs, forums, email
– 140 characters to a few sentences
Quality – misspellings, lack of structure, incoherence
Conversations – not stand alone docs
– Can’t tell what a “document” is about without reference to previous threads
Purpose – communicate – social grooming, rant
– Not exchange of ideas, policies, etc.
Simple Content Complexity – single thoughts, simplicity of emotion
New Content Characteristics
It’s a Very Different World – Search and Taxonomy
i tried very slow, NO GOOGLE search, some apps not working.. This is
not a "with GOOGLE" My friend has incredible, that is much batter..
Anyways i returned samsung, replace incredible. What's great about
it: 4" LCD What's not so great: NOT A GOOGLE PHONE
(nt 2.0)willie John ci to/for: wanted to know about charges for pic mail for
;bill date 4/5/2010 | repeat: no | auth: pin | ptns affected: 7777777777 |
information/instructions given: sup gave pic mail for free and gave adj for
$ 2.40 new bal is $ 147.53 | any mobile, anytime: n | ir: yes | ir-email: n |
New Content Characteristics
It’s a Very Different World – Topical Current Content
Content not archived (for users)
No real need for search (or just very simple search)
Very Poor (if any) metadata – not faceted search
Focus on phrases, sentences – not documents
Little need of a subject taxonomy
About emotions, things, products, people
Who are the users? They don’t need our help
Taxonomies, we don’t need no stinking taxonomies!
It’s a Very Different World
So why are we talking about it at a taxonomy boot camp?
Taxonomy = structure (purists can leave now)
All of this content is a rich source of research material
Companies are mining this resource and they need to add
structure to get deeper understanding
Varieties of structure:
– Simple topical taxonomies, 2-3 levels
– Emotion taxonomies, Ontologies and Semantic Networks
– Dynamic taxonomies – built on public taxonomies, enterprise taxonomy – exposed in hierarchical triples
Need more automatic / semi-automatic solutions
– Advanced text analytics
New Kinds of Social Taxonomies
New Taxonomies – Appraisal
Appraisal Groups – Adjective and modifiers – “not very good”
– Four types – Attitude, Orientation, Graduation, Polarity
– Supports more subtle distinctions than positive or negative
Emotion taxonomies
– Joy, Sadness, Fear, Anger, Surprise, Disgust
– New Complex – pride, shame, embarrassment, love, awe
– New situational/transient – confusion, concentration, skepticism
Beyond Keywords – Need Text Analytics
– Analysis of phrases, multiple contexts – conditionals, oblique
– Analysis of conversations – dynamic of exchange, private language
– Enterprise taxonomy rolled into a categorization taxonomy
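The appraisal-group idea on this slide (an adjective plus its modifiers, as in “not very good”) can be illustrated with a minimal Python sketch. The word lists, the numeric weights, and the names `AppraisalGroup` and `parse_appraisal_group` are all hypothetical and chosen only for illustration; the Attitude type is left out, and a real appraisal taxonomy would be far larger.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-lexicons (assumptions, not taken from the slides).
ORIENTATION = {"good": 1, "great": 1, "bad": -1, "slow": -1}
GRADUATION = {"very": 1.5, "slightly": 0.5, "incredibly": 2.0}
NEGATORS = {"not", "never", "no"}


@dataclass
class AppraisalGroup:
    head: str          # the head adjective
    orientation: int   # +1 or -1, from the head adjective
    graduation: float  # combined intensity of the modifiers
    negated: bool      # polarity: marked (negated) or unmarked

    @property
    def score(self) -> float:
        sign = -1 if self.negated else 1
        return sign * self.orientation * self.graduation


def parse_appraisal_group(phrase: str) -> Optional[AppraisalGroup]:
    """Parse 'modifier* adjective'; return None if the head is unknown."""
    tokens = phrase.lower().split()
    if not tokens or tokens[-1] not in ORIENTATION:
        return None
    graduation, negated = 1.0, False
    for modifier in tokens[:-1]:
        if modifier in GRADUATION:
            graduation *= GRADUATION[modifier]
        elif modifier in NEGATORS:
            negated = not negated  # each negator flips polarity
    return AppraisalGroup(tokens[-1], ORIENTATION[tokens[-1]], graduation, negated)
```

Under these assumed weights, “good”, “very good” and “not very good” receive three distinct scores, which is the kind of distinction a plain positive/negative label cannot express.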
Case Study – Categorization & Sentiment
Taxonomy and Social Media: Applications
New Range of Applications
Real Sentiment Analysis - Limited value of Positive and Negative
– Degrees of intensity, complexity of emotions and documents
– Contextual rules – “I would have loved X except for the battery”
Expertise Analysis
– Experts think & write differently – process, chunks
– Categorization rules for documents, authors, communities
Behavior Prediction – TA and Predictive Analytics, Social Analytics
Crowd Sourcing – technical support to Wikis
Political – conservative and liberal minds/texts
– Disgust, shame, cooperation, openness
Taxonomy and Social Media: Applications
Pronoun Analysis: Fraud Detection; Enron Emails
Patterns of “Function” words reveal wide range of insights
Function words = pronouns, articles, prepositions, conjunctions, etc.
– Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
Areas: sex, age, power-status, personality – individuals and groups
Lying / Fraud detection: Documents with lies have
– Fewer and shorter words, fewer conjunctions, more positive emotion words
– More use of “if, any, those, he, she, they, you”, less “I”
– More social and causal words, more discrepancy words
Current research – 76% accuracy in some contexts
Text Analytics can improve accuracy and utilize new sources
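The function-word signals listed on this slide can be turned into simple per-document features. The following Python sketch assumes its own small word lists: only “if, any, those, he, she, they, you” and “I” come from the slide, while the other entries and the function name `function_word_profile` are hypothetical. It computes rates only and makes no claim about thresholds or accuracy.

```python
import re
from collections import Counter

# "I" and the second list come from the slide; the rest are assumptions.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
OTHER_REFERENCE = {"if", "any", "those", "he", "she", "they", "you"}
CONJUNCTIONS = {"and", "but", "or", "because", "so", "although"}


def function_word_profile(text: str) -> dict[str, float]:
    """Return function-word rates and mean word length for one document."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {
            "first_person_rate": 0.0,
            "other_reference_rate": 0.0,
            "conjunction_rate": 0.0,
            "mean_word_length": 0.0,
        }
    counts = Counter(words)

    def rate(vocabulary: set[str]) -> float:
        return sum(counts[w] for w in vocabulary) / total

    return {
        "first_person_rate": rate(FIRST_PERSON),
        "other_reference_rate": rate(OTHER_REFERENCE),
        "conjunction_rate": rate(CONJUNCTIONS),
        "mean_word_length": sum(len(w) for w in words) / total,
    }
```

Features like these would normally feed a statistical model trained on labeled documents rather than be read as evidence on their own.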
Taxonomy and Social Media: Applications
Behavior Prediction – Telecom Customer Service
Basic Rule
– (START_20, (AND,
      (DIST_7, "[cancel]", "[cancel-what-cust]"),
      (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
– ask about the contract expiration date as she wanted to cxl teh acct
Combine sophisticated rules with statistical sentiment training, Predictive Analytics, and behavior monitoring
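To make the basic rule above concrete, here is a minimal Python sketch of how such a rule might be evaluated. The operator semantics are assumptions, not the vendor's documented behavior: START_20 is read as "the [cancel] term must occur within the first 20 words", and DIST_n as "within n words of each other". The term lists behind the bracketed concepts and the name `matches_cancel_rule` are hypothetical.

```python
import re

# Hypothetical term lists standing in for the bracketed concepts.
TERMS = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act", "service", "contract"},
    "one-line": {"line"},
    "restore": {"restore", "reinstate"},
    "if": {"if"},
}


def find(tokens: list[str], concept: str) -> list[int]:
    """Positions of every token that belongs to the given concept."""
    return [i for i, tok in enumerate(tokens) if tok in TERMS[concept]]


def near(anchors: list[int], others: list[int], max_gap: int) -> bool:
    """True if any anchor is within max_gap tokens of any other position."""
    return any(abs(a - o) <= max_gap for a in anchors for o in others)


def matches_cancel_rule(note: str) -> bool:
    tokens = re.findall(r"[a-z]+", note.lower())
    # START_20 (assumed): only [cancel] hits among the first 20 words count.
    cancel = [i for i in find(tokens, "cancel") if i < 20]
    target = find(tokens, "cancel-what-cust")
    blockers = find(tokens, "one-line") + find(tokens, "restore") + find(tokens, "if")
    # AND(DIST_7(cancel, target), NOT(DIST_10(cancel, OR(blockers))))
    return near(cancel, target, 7) and not near(cancel, blockers, 10)
```

Under these assumptions the first example note does not match, because “if” appears within ten words of “cancell”, while the second and third notes do match despite the misspellings, since “cxl” and “act” are in the assumed term lists.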
Taxonomy, Text Analytics, and Social Media
Conclusions
Social Media is a Different World
– Content, Scale, Questions
New Types of Taxonomy
– Smaller, more dynamic subject taxonomies
– Appraisal, Emotion, Things, Motivations, Actions, etc.
Taxonomists – Time to Explore new structures
– Ontologies, semantic networks, all of the above
Text Analytics – needs good taxonomy design – levels, etc.
– Adds a platform – flexible and powerful auto-tagging
Result: New Types of Applications
– Stand alone and with standard search/taxonomy
– Merge data and text, external and internal
Questions?
Tom Reamy
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com