Model of Taxonomy Development
Semantic Infrastructure for KM 2.0
A new approach to folksonomies and other knowledge
representations
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
2.0 Themes
“Tags are great because you throw caution to the wind, forget
about whittling down everything into a distinct set of categories
and instead let folks loose categorizing their own stuff on their
own terms.” - Matt Haughey - MetaFilter
“It’s MySpace meets YouTube meets Wikipedia meets Google –
on steroids.”
“It’s ignorance meets egotism meets bad taste meets mob rule –
on steroids.” – The Cult of the Amateur – Andrew Keen
“Things fall apart; the center cannot hold;
Mere anarchy is loosed upon the world,…
The best lack all conviction, while the worst
Are full of passionate intensity.” - The Second Coming – W.B.
Yeats
Agenda
Introduction
Essentials of Folksonomies
– Advantages, Disadvantages, and Dangers of Folksonomies
Improving the Quality of Folksonomies
– Facets and Flickr
– Del.icio.us – Topics, Popularity and Findability
Semantic Infrastructure Solution
– Elements of Semantic Infrastructure / KM 2.0
– Evolving Folksonomies
– Ontologies and Natural categories
Conclusion
KAPS Group
Knowledge Architecture Professional Services (KAPS)
Consulting, strategy recommendations
Knowledge architecture audits
Partners – Convera, Inxight, FAST, and others
Taxonomies: Enterprise, Marketing, Insurance, etc.
– Taxonomy customization, ontology development
Intellectual infrastructure for organizations
– Knowledge organization, technology, people and processes
– Search, content management, portals, collaboration,
knowledge management, e-learning, etc.
Essentials of Folksonomies?
Wikipedia: A folksonomy is an Internet-based information
retrieval methodology consisting of collaboratively
generated, open-ended labels that categorize content such
as Web pages, online photographs, and Web links.
A folksonomy is most notably contrasted with a taxonomy –
done by users, not professionals
Example sites – Del.icio.us and Flickr (not really – no
feedback)
It is just metadata that users add
Key – social mechanism for seeing other tags
Advantages of Folksonomies
Simple (no complex structure to learn)
– No need to learn difficult formal classification system
Lower cost of categorization
– Distributes cost of tagging over large population
Open ended – can respond quickly to changes
Relevance – User’s own terms
Support serendipitous form of browsing
Easy to tag any object – photo, document, bookmark
Better than no tags at all
Getting people excited about metadata!
Disadvantages of Folksonomies - Quality
They don’t work very well for finding
No structure, no conceptual relationships
– Flat lists do not an onomy make
Issues of scale – popular tags already showing a million hits
Limited applicability – only useful for non-technical or non-specialist domains
Either personal tags (others can’t find) or popularity tags – lose
interesting terms (Power law distribution)
– Most people can’t tag very well – learned skill
Errors – misspellings, single words or bad compounds, single use
or idiosyncratic use
Dangers of Folksonomies
Unwisdom of Crowds
– “We find that whole communities suddenly fix their minds
upon one object, and go mad in its pursuit; that millions of
people become simultaneously impressed with one delusion,
and run after it, till their attention is caught by some new folly
more captivating than the first.”
– From witch hunts to tulipomania to stock market crash
• Extraordinary Popular Delusions and the Madness of Crowds
Tyranny of the majority
– Popularity drowns quality
– Narrowing of choices, lost content
Better Folksonomies:
Will social networking make tags better?
Not so far – example of Del.icio.us – same tags
Quality and Popularity are very different things
Most people don’t tag, don’t re-tag
Study – folksonomies follow NISO guidelines – nouns, etc –
but do they actually work – see analysis
Most tags deal with computers and are created by people
that love to do this stuff – not regular users and infrequent
users – Beware true believers!
Flickr Facets
Basic Facets – over 90% of content
– Place – Amsterdam to Beach – 40%
– Events, Date, People, Things / Animals, Color
Subject Matter – less than 1%
Works on lower level scales:
– Artparade, tourofbritain, stgilesfair, hideoutblockparty (last weeks)
Faceted navigation – extremely powerful, easy to use to find
How to develop automatic facets?
– Design facet system – one time cost, some monitoring
– Entity Extraction, suggested placement
Del.icio.us Tags
Design blog software music tools reference art video
programming webdesign web2.0 mac howto linux
tutorial web free news photography shopping blogs
css imported education travel javascript food games
Development inspiration politics flash apple tips java google osx
business windows iphone science productivity books toread health funny
internet wordpress ajax ruby research humor fun technology search
opensource
Photoshop media recipes cool work article marketing security mobile jobs rails
lifehacks tutorials resources php social download diy ubuntu freeware portfolio
photo movies writing graphics youtube audio online
Del.icio.us - Topics, not Facets
High level topics - photography, news, education
Related terms are chosen by popularity, not by conceptual relationship
– Photography
• Synonyms - photo, photos
• Related – art, design, images, camera
• Related Facet – howto, tutorial, photoshop
Popularity is not quality
– Dominance of computer terms
– Tyranny of the majority – design (1 million), interior design – 3,909
Top 25 – same set, slight order shift – social inertia
– New terms - important – iphone, ipod, .net, ebooks, facebook
– Dropped terms – adult, babes, britney, naked, sex, sexy
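The popularity-based notion of "related" on this slide can be sketched as plain co-occurrence counting. This is only an illustration: the bookmark data and the function name `related_tags` are hypothetical, and nothing here claims to be how Del.icio.us computed its related tags.

```python
from collections import Counter

def related_tags(tag, bookmarks, n=3):
    """Return the n tags that most often co-occur with `tag`.

    `bookmarks` is an iterable of tag sets, one set per bookmarked item.
    The ranking is pure popularity: no conceptual relationship is involved.
    """
    co_counts = Counter()
    for tags in bookmarks:
        if tag in tags:
            co_counts.update(t for t in tags if t != tag)
    return co_counts.most_common(n)

# Hypothetical bookmarks, each represented by the set of tags a user applied.
bookmarks = [
    {"photography", "art", "camera"},
    {"photography", "art", "design"},
    {"photography", "art", "howto"},
    {"photography", "camera"},
    {"design", "css"},
]

print(related_tags("photography", bookmarks, n=2))
```

Because the ranking depends only on counts, a heavily used tag will show up as "related" to almost everything, which is the weakness the slide points at.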
Del.icio.us - Folksonomy Findability
Too many hits (where have we heard that before?)
– Design – 1 million, software – 931,259, sex – 129,468
No handling of plurals, no stemming (singular preferred)
– Folksonomy – 14,073, folksonomies – 3,843, both – 1,891
– Blog – 1.7M, blogs – 516,340, Weblog – 155,917, weblogs – 36,434,
blogging – 157,922, bloging – 697
– Taxonomy – 9,683, taxonomies – 1,574
Personal tags – cool, fun, funny, etc
– Good for social research, not finding documents or sites
How good for personal use? Funny is time dependent
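To make the cost of missing plural handling concrete, here is a minimal sketch that folds singular and plural tags together and sums their counts. The counts are the ones quoted on this slide, with 1.7M approximated as 1,700,000. The singularization rule is deliberately naive and is an assumption of this sketch, not a description of any real tagging service.

```python
from collections import defaultdict

def naive_singular(tag):
    """Lowercase a tag and strip a simple English plural ending."""
    tag = tag.lower()
    if tag.endswith("ies"):
        return tag[:-3] + "y"
    if tag.endswith("s") and not tag.endswith("ss"):
        return tag[:-1]
    return tag

def merge_variants(counts):
    """Sum tag counts under their singular form.

    Note: this adds tag uses, not distinct items. Items carrying both
    forms (the slide's "both" figure) are counted twice.
    """
    merged = defaultdict(int)
    for tag, count in counts.items():
        merged[naive_singular(tag)] += count
    return dict(merged)

tag_counts = {
    "Blog": 1_700_000, "blogs": 516_340,
    "Weblog": 155_917, "weblogs": 36_434,
    "blogging": 157_922, "bloging": 697,
    "Folksonomy": 14_073, "folksonomies": 3_843,
    "Taxonomy": 9_683, "taxonomies": 1_574,
}

merged = merge_variants(tag_counts)
for tag, total in sorted(merged.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {total:,}")
```

The misspelling "bloging" stays separate here, since spelling variants need a different mechanism than plural stripping.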
Del.icio.us - Improving the Quality
Bundle tags – if used?
– Types of relationships – ubuntu – tutorial, howto, reference, tips,
install
Ontology Clusters – grow with people and software
Taxonomy Clusters – software – Linux - ubuntu
Add broad general taxonomy of most popular tags
Tags as natural categories – build up and down
– Start – evolve a simple 2 level taxonomy
– People assign tags to a category, build numbers
Evolve quality of tags and emerging structure of tags
– Preferred term = popular (Blog/blogs – Books/book)
– Add mechanisms – rank tags, taggers, categories
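Two of the ideas on this slide, "preferred term = popular" and "people assign tags to a category, build numbers", can be sketched in a few lines. All function names, the book/books counts, and the category votes below are hypothetical; only the blog/blogs counts come from the earlier findability slide.

```python
from collections import Counter, defaultdict

def preferred_terms(variant_groups, counts):
    """Map every variant in a group to the group's most used variant."""
    mapping = {}
    for group in variant_groups:
        best = max(group, key=lambda t: counts.get(t, 0))
        for term in group:
            mapping[term] = best
    return mapping

def two_level_taxonomy(assignments):
    """Build {category: [tags]} from (tag, category) votes.

    Each tag is placed in the category that received the most votes for it.
    """
    votes = defaultdict(Counter)
    for tag, category in assignments:
        votes[tag][category] += 1
    taxonomy = defaultdict(list)
    for tag, categories in votes.items():
        winner, _ = categories.most_common(1)[0]
        taxonomy[winner].append(tag)
    return {cat: sorted(tags) for cat, tags in taxonomy.items()}

# blog/blogs counts are from the slides; book/books counts are made up.
counts = {"blog": 1_700_000, "blogs": 516_340, "books": 300, "book": 100}
mapping = preferred_terms([["blog", "blogs"], ["book", "books"]], counts)

# Hypothetical user votes assigning tags to top-level categories.
assignments = [
    ("ubuntu", "software"), ("ubuntu", "software"), ("ubuntu", "reference"),
    ("linux", "software"),
    ("photoshop", "design"),
]
taxonomy = two_level_taxonomy(assignments)

print(mapping)
print(taxonomy)
```

A central team could review the resulting categories and preferred terms rather than designing them up front, which is the "build up and down" approach the slide describes.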
Enterprise Environment – KM 2.0
From internet to intranet – we’ve done this before
– Remember early intranets built on the Internet model?
Smaller content repositories, more coherent
More precise targets – specific documents (the official
version) not web sites
More formal – from documents to publishing procedures
More control of publishing – corporate policy
More options for tagging – part of CM system, policy,
dedicated editor team, reward system
Semantic Infrastructure Solution
What won’t work:
– Recommendations about count / non-count nouns or singular / plural
– Link to online dictionary or Wikipedia – extra work, whole
focus is on ease of tagging – any help has to be immediate
and integrated - or done by a central group
New Relationship of Center and Crowd
– Not top down or bottom up
– Interpenetration of opposites
Integrated Solution: Content Structures, People,
Technology, Policies and Procedures
Semantic Infrastructure: People
KM 2.0 (or 3.0?)
– KM has always been concerned with social aspects of knowledge
New relationship of center and users – more sophisticated
support, more freedom, more suggestions, more user input
– New roles for users (taggers, part of a variety of communities –
both distributed and central)
– New roles for central – create feedback system, tweak the evolution
of the system, develop initial candidates
Communities of Practice – apply to tagging, ranking
– Community Maps – formal and informal
– Map tags to communities – more useful suggestions
– Use tags to uncover communities (see tech SNA)
Semantic Infrastructure - Technology
Enterprise Content Management
– Place to add metadata – of all kinds, not just keywords
– Policy support – important, part of job performance
– Add tag clouds to input page
– More sophisticated displays
• Tag clouds mapped to community map
• Tag clusters, taxonomy location
Semantic Software – Inxight, Teragram etc.
– Suggest terms based on text, on tag clouds
Social Networking – add semantics
– SNA – apply to people and tags
KM – platforms, CoPs – social tags
Semantic Infrastructure: Putting it all together
Complexity Theory and Folksonomies: Feedback
Ranking Methods
– Explicit – people rank directly
• Categories, tags, taggers
• Good tags, best bets for terms or categories?
– Implicit – software evaluation, reverse relevance
Ranking Roles
– Taggers – everyone (rewards, make it easy and fun)
– Meta-taggers – everyone (but levels of meta-taggers)
– Editors – tagging system, integration with taxonomy, resolve
disputes, Wikipedia model
Content Structures – Best of Both Worlds
Start and end with a formal taxonomy / Ontology
– Findability vastly superior
– Communication with others – share tags
– Take advantage of conceptual relationships
Tagging experience – folksonomies plus
– Users can type any word – system looks it up – plurals, synonyms,
preferred terms, spelling variations
– Software suggestions – based on content of bookmark, document
and on popular user tags – natural level not top down
– New terms flagged and routed to central team
Facets – for both things and documents (faceted taxonomy)
– Software suggests facet values, user override
– Cognitively simpler task than own value, complex hierarchy
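The "folksonomies plus" tagging experience described on this slide (type any word, have the system resolve plurals, synonyms and spelling variations to a preferred term, and flag new terms for the central team) can be sketched as follows. The class name `TagResolver`, the vocabulary, and the 0.8 similarity cutoff are all assumptions made for illustration. The only library call used is `difflib.get_close_matches` from the Python standard library.

```python
import difflib

class TagResolver:
    """Resolve free-text tags against a small controlled vocabulary."""

    def __init__(self, vocabulary):
        # vocabulary maps each known term or synonym to its preferred term
        self.vocabulary = dict(vocabulary)
        self.review_queue = []  # new terms routed to the central team

    @staticmethod
    def _singular(word):
        # Naive plural stripping; a real system would use a proper stemmer.
        if word.endswith("ies"):
            return word[:-3] + "y"
        if word.endswith("s") and not word.endswith("ss"):
            return word[:-1]
        return word

    def resolve(self, raw):
        word = raw.strip().lower()
        # 1. Exact match, then plural-stripped match (covers synonyms too).
        for candidate in (word, self._singular(word)):
            if candidate in self.vocabulary:
                return self.vocabulary[candidate]
        # 2. Spelling variation: closest known term above the cutoff.
        close = difflib.get_close_matches(word, list(self.vocabulary), n=1, cutoff=0.8)
        if close:
            return self.vocabulary[close[0]]
        # 3. Unknown term: accept it, but flag it for review.
        self.review_queue.append(word)
        return word

# Hypothetical vocabulary: key = accepted input, value = preferred term.
VOCAB = {
    "blog": "blog", "weblog": "blog",
    "taxonomy": "taxonomy", "folksonomy": "folksonomy",
    "photography": "photography", "photo": "photography",
}

resolver = TagResolver(VOCAB)
for typed in ["Blogs", "weblogs", "taxonmy", "photos", "zeitgeist"]:
    print(typed, "->", resolver.resolve(typed))
print("flagged for review:", resolver.review_queue)
```

The user still types whatever comes to mind. The lookup happens behind the scenes, so ease of tagging is preserved while the stored tags converge on preferred terms.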
Conclusions: Semantic Infrastructure for KM 2.0
Folksonomies can help – but they need help to evolve
better quality
Fundamental contradiction of ease of tagging and findability
will limit usefulness of Internet folksonomies
Enterprise (Intranets, KM) is where the benefits will happen
Semantic Infrastructure solution (people, policy, technology,
semantics) and feedback is best approach
Evolve folksonomies, taxonomies, ontologies – not central,
top-down design
Intelligent Design + Darwin = new job – Taxonomy Gods
Questions?
Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com