
Web 2.0, Tagging, Search Engines, RawSugar
Frank Smadja
RawSugar
May 2006
What is Web 2.0?
Tim O’Reilly:
Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications are those that make the most of the intrinsic advantages of that platform: delivering software as a continually-updated service that gets better the more people use it, consuming and remixing data from multiple sources, including individual users, while providing their own data and services in a form that allows remixing by others, creating network effects through an "architecture of participation," and going beyond the page metaphor of Web 1.0 to deliver rich user experiences.
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
What is Web 2.0?
Social Web – “Wisdom of Crowds”
– Users are publishers
– Network effect: SHARE, e.g., blogger.com, flickr, youtube, del.icio.us, tadalist.com, i4giveu.com
Technology:
– Software delivery in hours; users are testers
– AJAX (more later)
– E.g.: 30Boxes, Writely, Google Calendar
Business model:
– Free for users, paid by advertisements
– Share revenues with users
– E.g., Google AdSense, Simpy, RawSugar
– Pageviews => $$$$
Social Web – Wisdom of Crowds
(1) diversity of opinion
(2) independence of members from one another
(3) decentralization, and
(4) a good method for aggregating opinions
Show: Digg, amazon.com, Yahoo! Movies
What is Tagging?
From Gary Larson
Tagging Example
Before Tagging: Classification
• Too hard to classify
• Too expensive
• Not scalable
• Yahoo! directory
• Dmoz
• Semantic Web
Categorization is hard!!
Object worth remembering (article, image…) -> Multiple concepts activated -> Choose ONE of the activated concepts -> Categorize it!
Analysis paralysis!
From Rashmi Sinha
Tagging is simpler
Object worth remembering (article, image…) -> Multiple concepts are activated -> Tag it! Note all concepts.
From Rashmi Sinha
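To make the contrast between the two slides concrete: categorization stores exactly one folder per object, while tagging keeps every activated concept and can index them for later recovery. A minimal Python sketch, assuming in-memory dictionaries; the item names, tags, and the helpers `tag` and `recover` are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers for illustration; not RawSugar code.

# Categorization: each item is filed under exactly ONE category.
categories: dict[str, str] = {}
categories["sushi-article"] = "food"  # "sushi", "tokyo", "travel" are lost

# Tagging: each item keeps ALL activated concepts (many-to-many).
item_tags: dict[str, set[str]] = defaultdict(set)
tag_index: dict[str, set[str]] = defaultdict(set)  # inverted index: tag -> items


def tag(item: str, *tags: str) -> None:
    """Attach every given tag to the item and update the inverted index."""
    for t in tags:
        item_tags[item].add(t)
        tag_index[t].add(item)


def recover(*tags: str) -> set[str]:
    """Return the items that carry all of the given tags (AND query)."""
    if not tags:
        return set()
    return set.intersection(*(tag_index.get(t, set()) for t in tags))


tag("sushi-article", "food", "sushi", "tokyo", "travel")
tag("ramen-photo", "food", "tokyo")
print(sorted(recover("tokyo")))           # ['ramen-photo', 'sushi-article']
print(sorted(recover("tokyo", "sushi")))  # ['sushi-article']
```

The inverted index is what makes the "Recovery" benefit cheap: finding everything tagged "tokyo" is one dictionary lookup, and no single "right" folder had to be chosen up front.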
The Personal to the Social
From Rashmi Sinha
Tagging is a reality
• Bookmarkers tag:
– Delicious, RawSugar, Shadows, Simpy, Blinklist, …
• Bloggers tag:
– 27 million blogs, doubling every 6 months
– 1/3 of blog posts now use tags (or categories)
• Many more:
– BBC: news site
– Digg: news
– YouTube: video
– Flickr: photo publishing and tagging
– Enterprise? Museums? Cell phones?
Most user-generated content is tagged!
What Tagging is NOT
– NOT: Generous and altruistic people classifying the Web for the sake of the community
– NOT: Smart software automatically classifying Web pages and tagging them
– NOT: A collaborative way to classify the Web into a growing giant ontology (folksonomy)
So why do people tag?
– Recovery/sharing of personal information:
• Bookmarks
• Photos
• Videos, etc.
– Increased traffic and findability:
• Bloggers
– Social reward
– Advertising $
Tagging brings value to the tagger.
Why is Tagging successful?
• Tagging is free
• Tagging is easy
• Tagging brings value
[Marlow, Naaman, Boyd & Davis 2006]
                        Semantic Web               Tagging
Who classifies          Publishers or librarians   Everybody, consumers
Controlled vocabulary   Yes                        No
Imposed structure       Yes                        No
Classification cost     High                       Free
Recovery                N/A                        Yes
Searchability           Low                        Medium
Navigation              High                       Medium
RawSugar
• Covers the last mile of search
• Provides guided search on tagged pages
• Publish guided search:
– Provide guided search on your site or blog
– Get more traffic
– Receive advertising revenues!
• Search and explore:
– Navigate by topics, people, directories
– Find experts
Nothing to eat here!
Still no food here!
Bingo!
What’s Great? What’s Not So Great?
• Great:
– You know what you’re looking for, e.g., “Zibibbo restaurant”
• Not so great:
– You’re hungry!
– You want to browse: discover information, explore.
– You want to know what is popular (“restaurants”, “digital camera”, “Java tutorial”, “free games”, etc.)
State of the Art: The Last Mile of Search
• 83% unhappy with search results (WSJ survey)
– Most searches point to a list of content websites and directories
– Navigation of these sites is cumbersome and tedious
• Google’s two-step approach:
– Search “restaurants”
– while (true) { explore guide; }
– Change the query and repeat
“The last mile of search” examples: Digital Camera, Palo Alto bike, Daily Kos, Sprol dot Com
Where is the last mile?
Google stops here:
Human knowledge:
• Small and mid-size websites and blogs
• Content is organized manually by humans:
– Categorization
– Recommendations
• Poor search and navigation
• Each directory is an island of information and does not connect to related directories
What’s Missing?
Browsing with Facets
“Easy to discover information without prior knowledge of collection contents”
Faceted Search Paradigm
Not new:
• Library systems: “American history”, “Shakespeare”, etc.
• Search engines: Endeca, Shopping.com, Yahoo! Directories, Dmoz, etc.
• Google/MSN/Yahoo! Local Search: browse by location
• Current uses: e-commerce
Problems:
• Maintained by humans – Expensive
• Rely on a world order – Brittle
• Facets use a controlled vocabulary – Not easy to define
=> Not Scalable
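The faceted paradigm can be sketched as two steps: keep the items that match every facet value selected so far, then count the remaining values of each unselected facet so the user can refine without prior knowledge of the collection. A minimal sketch, assuming a small hypothetical catalog whose facets (`type`, `region`, `price`) were assigned by hand, i.e., exactly the controlled-vocabulary setup the slide calls expensive and brittle; `facet_search` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical helper and data for illustration; not RawSugar code.
FACETS = ("type", "region", "price")

# Hypothetical catalog: every item has a value for each human-maintained facet.
CATALOG = [
    {"name": "Tel Aviv guidebook", "type": "book", "region": "Middle East", "price": "under $20"},
    {"name": "Tel Aviv map", "type": "map", "region": "Middle East", "price": "under $20"},
    {"name": "Israel travel DVD", "type": "video", "region": "Middle East", "price": "$20-$50"},
    {"name": "Paris guidebook", "type": "book", "region": "Europe", "price": "under $20"},
]


def facet_search(items, selected):
    """Filter by the selected facet values, then count the values left in every other facet."""
    hits = [it for it in items if all(it[f] == v for f, v in selected.items())]
    counts = {f: Counter(it[f] for it in hits) for f in FACETS if f not in selected}
    return hits, counts


hits, counts = facet_search(CATALOG, {"region": "Middle East"})
print(len(hits))                  # 3
print(dict(counts["price"]))      # {'under $20': 2, '$20-$50': 1}
hits, _ = facet_search(CATALOG, {"region": "Middle East", "type": "book"})
print([h["name"] for h in hits])  # ['Tel Aviv guidebook']
```

Every value in `FACETS` and in each item had to be defined and maintained by someone, which is where the cost and brittleness listed above come from.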
Amazon – Faceted Search
Search for Tel Aviv
Shopping.com Faceted Search
Search for Tel Aviv
RawSugar Faceted Search
Refine your search
RawSugar Faceted Search
Juniorbonner on del.icio.us vs. Juniorbonner on RawSugar
RawSugar Into the Last Mile
RawSugar inside
RawSugar Faceted Search in the Last Mile
Daily Kos Blog
Search for Iran on RawSugar
RawSugar Technology
Problem 1: Searching the TagSpace
How would you tag this? How would you search for it?
Tags: Ikura, Uni, Ebi, Sushi, Nigiri, Japanese food, lunch in Tokyo,
Ezobafun-uni, Kitamurashiuni, Murasakiuni, Akazaebi, Tenagaebi, etc.
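Problem 1 can be read as a query-expansion problem: a search for "sushi" should also reach a photo tagged only "murasakiuni". A minimal sketch, assuming a hand-written parent-to-children map among the slide's tags (lowercased; the links are illustrative, not RawSugar data) and expanding the query tag to all of its descendants before matching; `expand` and `search` are hypothetical helpers.

```python
# Hypothetical helpers and data for illustration; not RawSugar code.

# Hypothetical parent -> children links among the tags on the slide.
HIERARCHY = {
    "sushi": ["nigiri", "uni", "ebi", "ikura"],
    "uni": ["ezobafun-uni", "murasakiuni", "kitamurashiuni"],
    "ebi": ["akazaebi", "tenagaebi"],
}

# Hypothetical pages, each tagged by a different user with a different vocabulary.
PAGES = {
    "photo-1": {"murasakiuni", "lunch in tokyo"},
    "photo-2": {"tenagaebi"},
    "photo-3": {"ramen"},
}


def expand(tag: str) -> set[str]:
    """Return the tag plus all of its descendants (iterative depth-first walk)."""
    seen, stack = set(), [tag]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(HIERARCHY.get(t, []))
    return seen


def search(tag: str) -> list[str]:
    """Match pages carrying the query tag or any narrower tag below it."""
    terms = expand(tag)
    return sorted(p for p, tags in PAGES.items() if tags & terms)


print(search("sushi"))  # ['photo-1', 'photo-2']
print(search("uni"))    # ['photo-1']
```

Without the hierarchy, a plain tag match for "sushi" would return nothing here, which is the gap the slide illustrates.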
Problem 2: Exploring the TagSpace
[Screenshot: flat tag list mixing locations, restaurant types, and morphological variants; one tag is flagged "Not a restaurant!"]
Problem 3: Exploring the TagSpace
Not usable!
RawSugar – Tag Hierarchy
Guided Navigation
[Screenshot: result tags grouped into Food, Origins, and Locations groups]
RawSugar Tag Hierarchy
• Key idea: some users (4%) define tag hierarchies (food>sushi, european>spanish, …)
• We mine this tag space to learn simple tag relations (ISA and RELATED) using statistics.
• At search time, we apply this learned knowledge to group the tags found in the results.
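The slide does not say which statistics RawSugar uses. A minimal sketch of one plausible approach, assuming hierarchical tags are written as `parent>child` strings and that a candidate ISA link is kept only when at least `min_users` distinct users assert it (the threshold, the helper names `mine_isa` and `group_results`, and the data are hypothetical; the RELATED relation is not covered). The second function applies the learned links at search time to group a flat list of result tags.

```python
from collections import defaultdict

# Hypothetical helpers and data for illustration; not RawSugar code.


def mine_isa(user_tags: dict[str, list[str]], min_users: int = 2) -> dict[str, str]:
    """Learn child -> parent (ISA) links from 'parent>child' tags.

    A link is kept only if at least `min_users` distinct users assert it; if a
    child has several candidate parents, the best-supported one wins.
    """
    support: dict[tuple[str, str], set[str]] = defaultdict(set)
    for user, tags in user_tags.items():
        for t in tags:
            parts = [p.strip().lower() for p in t.split(">")]
            for parent, child in zip(parts, parts[1:]):  # a>b>c yields a>b and b>c
                support[(parent, child)].add(user)
    isa: dict[str, str] = {}
    # sorted() is stable, so equally supported links keep their first-seen order.
    for (parent, child), users in sorted(support.items(), key=lambda kv: -len(kv[1])):
        if len(users) >= min_users and child not in isa:
            isa[child] = parent
    return isa


def group_results(result_tags: list[str], isa: dict[str, str]) -> dict[str, list[str]]:
    """Group the flat tags of a result set under their learned parents."""
    groups: dict[str, list[str]] = defaultdict(list)
    for t in result_tags:
        groups[isa.get(t, "other")].append(t)
    return dict(groups)


users = {  # hypothetical tagging data; "food>tokyo" is one user's noise
    "u1": ["food>sushi", "european>spanish", "food>tokyo"],
    "u2": ["food>sushi", "food>pizza", "european>spanish"],
    "u3": ["food>pizza", "location>tokyo"],
    "u4": ["location>tokyo", "location>paris"],
    "u5": ["location>paris"],
}
isa = mine_isa(users)
print(isa["tokyo"])  # location
print(group_results(["sushi", "tokyo", "pizza", "wifi"], isa))
# {'food': ['sushi', 'pizza'], 'location': ['tokyo'], 'other': ['wifi']}
```

Requiring agreement between several users is what lets the 4% of hierarchy-builders help everyone else while filtering out one-off links such as `food>tokyo`.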
RawSugar – Guided Search
Combining Hierarchy Fragments
[Figure: tag hierarchy fragments from Users 1 to 5 combined into one hierarchy; tags shown include food, cooking, recipes, vegetarian, Sushi, europe, UK, Scotland, Edinburgh, Spain, Italy, Asian, Chinese, Thai, Southwest, California, Bay Area, San Francisco, Texas]
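Combining fragments can be sketched as taking the union of the parent-child edges contributed by different users and printing the result as a forest. A minimal sketch, assuming each user's fragment is given as root-to-leaf paths and that the merged graph has no cycles; the paths are hypothetical readings of tags from the figure, not the edges actually drawn, and `combine`/`render` are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical helpers and data for illustration; not RawSugar code.


def combine(fragments: list[list[list[str]]]) -> dict[str, set[str]]:
    """Union the parent -> child edges from every user's root-to-leaf paths."""
    children: dict[str, set[str]] = defaultdict(set)
    for paths in fragments:  # one entry per user
        for path in paths:
            for parent, child in zip(path, path[1:]):
                children[parent].add(child)
    return children


def render(children: dict[str, set[str]]) -> list[str]:
    """Indent each node under its parent, starting from nodes that are nobody's child.

    Assumes the merged graph is acyclic; a cycle would recurse forever.
    """
    all_children = {c for cs in children.values() for c in cs}
    roots = sorted(set(children) - all_children)
    lines: list[str] = []

    def walk(node: str, depth: int) -> None:
        lines.append("  " * depth + node)
        for c in sorted(children.get(node, ())):
            walk(c, depth + 1)

    for r in roots:
        walk(r, 0)
    return lines


fragments = [  # hypothetical fragments from three users
    [["food", "cooking", "recipes"]],
    [["food", "vegetarian"], ["europe", "uk", "scotland"]],
    [["europe", "uk", "scotland", "edinburgh"], ["europe", "spain"]],
]
print("\n".join(render(combine(fragments))))
```

No single user defined the whole tree: "edinburgh" ends up under "scotland" and "uk" only because two users' fragments overlap on that path.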
RawSugar: Mining and Clustering
• Related tags: tags that are related (collocations, synonymy, antonymy, ISA, HASA, …)
• Related pages: pages tagged similarly
• Related people: people with similar interests
[Figure: the RawSugar TagSpace connecting Tags (e.g., sailing), Pages, and People]
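"Pages tagged similarly" and "people with similar interests" can both be scored as overlap between tag sets. A minimal sketch, assuming Jaccard similarity (size of the intersection over size of the union; the slide does not name a measure) and hypothetical users described by the tags they have applied; the same functions work unchanged for pages.

```python
# Hypothetical helpers and data for illustration; not RawSugar code.


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_related(target: str, profiles: dict[str, set[str]], k: int = 2) -> list[tuple[str, float]]:
    """Rank the other entries (people or pages) by tag-set similarity to the target."""
    scores = [(name, jaccard(profiles[target], tags))
              for name, tags in profiles.items() if name != target]
    scores.sort(key=lambda s: (-s[1], s[0]))  # highest score first, ties by name
    return scores[:k]


# Hypothetical users, each described by the union of tags they have applied.
people = {
    "alice": {"sailing", "boats", "sf", "photography"},
    "bob": {"sailing", "boats", "regatta"},
    "carol": {"sushi", "tokyo", "photography"},
    "dave": {"java", "programming"},
}
print(most_related("alice", people))  # [('bob', 0.4), ('carol', 0.16666666666666666)]
```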
Related work
Rashmi Sinha: “Tag Sorting: Another tool in an information architect's toolbox”
http://www.rashmisinha.com/archives/05_02/tag-sorting.html
Emanuele Quintarelli: “Hierarchical taxonomies from flat tag spaces”
http://www.infospaces.it/wordpress/topics/information-architecture/91
Paul Heymann (Stanford): “Tag Hierarchies”
http://i.stanford.edu/~heymann/taghierarchy.html
Brooks & Montanez, University of San Francisco: “Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering”
http://www.cs.usfca.edu/~brooks/papers/brooks-montanez-www06.pdf
Siderean fac.etio.us: “Faceted search on delicious tags”
http://www.siderean.com/delicious/facetious.jsp
Marti Hearst: “Clustering vs. Faceted Search”
http://bailando.sims.berkeley.edu/papers/cacm06.pdf
And more …
Conclusion
Questions?
Backup Technology Slides
What should we do?
Smart Backend – Easy Tagging
“Tag Relations improve searchability and exploration.”
Similar tags:
• Spelling and morphology: macos <-> mac_os <-> mac os; tagging <-> tags <-> tagged
• Synonyms: macos <-> tiger; films <-> movies; new york <-> nyc
• Related: cooking <-> recipes; software development <-> programming
Tag groups or subtags:
• Location -> san francisco, london, new york, etc.
• Food -> sushi, sashimi, pizza, etc.
• Programming -> html, java, css, etc.
Goal: discover them by mining the tag space
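Spelling and morphology variants can be caught before any statistics are run by mapping every tag to a normalized key. A minimal sketch, assuming case folding, removal of separators, and a deliberately crude English suffix stripper standing in for a real stemmer (it would also turn "news" into "new"); `normalize` and `group_variants` are hypothetical helpers. Synonyms such as films <-> movies and related pairs such as cooking <-> recipes share no spelling, so they still need mining of the tag space.

```python
import re
from collections import defaultdict

# Hypothetical helpers for illustration; a crude stand-in for a real stemmer.


def normalize(tag: str) -> str:
    """Map spelling/morphology variants of a tag to one key (crude, English-only)."""
    key = re.sub(r"[\s_.\-]+", "", tag.lower())  # "Mac OS", "mac_os" -> "macos"
    for suffix in ("ing", "ed"):
        if key.endswith(suffix) and len(key) - len(suffix) >= 3:
            key = key[: -len(suffix)]
            if key[-1] == key[-2] and key[-1] not in "aeiou":
                key = key[:-1]  # undouble: "tagg" -> "tag"
            return key
    if key.endswith("s") and not key.endswith(("ss", "us", "os", "is")):
        key = key[:-1]  # plural: "tags" -> "tag"
    return key


def group_variants(tags: list[str]) -> dict[str, list[str]]:
    """Bucket raw tags that share a normalized key; singletons are dropped."""
    groups: dict[str, list[str]] = defaultdict(list)
    for t in tags:
        groups[normalize(t)].append(t)
    return {k: v for k, v in groups.items() if len(v) > 1}


print(group_variants(["macos", "mac_os", "mac os", "tagging", "tags", "tagged",
                      "search_engines", "searchengine", "java"]))
# {'macos': ['macos', 'mac_os', 'mac os'], 'tag': ['tagging', 'tags', 'tagged'],
#  'searchengine': ['search_engines', 'searchengine']}
```

Applied to a noisy list such as search engine, searchengine, search_engines, searchengines, search_engine, rules like these fold all five into one key, while search, google, and buscadores stay apart.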
What should we do?
Smart Backend – Friendly Frontend
• Backend should not dictate frontend (Patrick Schmitz, Berkeley/Yahoo!)
• Smart processing is done by the backend, under the hood.
• Tagging should be as effortless as possible: assisted but not automatic. Fight analysis paralysis (Rashmi Sinha)
• Systems should be built to encourage people to tag: bring value to the tagger
Flickr – Clusters
Clustering – Step 1: Similarity among Tags
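Step 1 needs a similarity measure between tags. A minimal sketch, assuming cosine similarity between tags' co-occurrence vectors (how often each other tag appears on the same bookmark) followed by single-link grouping with union-find at a hypothetical threshold of 0.5; the slides do not specify RawSugar's measure, and the bookmarks and helper names are made up. Co-occurrence vectors let "films" and "movies" match even though no bookmark carries both.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical helpers and data for illustration; not RawSugar code.


def cooccurrence(bookmarks: list[set[str]]) -> dict[str, Counter]:
    """For each tag, count how often every other tag appears on the same bookmark."""
    vec: dict[str, Counter] = defaultdict(Counter)
    for tags in bookmarks:
        for a, b in combinations(sorted(tags), 2):
            vec[a][b] += 1
            vec[b][a] += 1
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster(bookmarks: list[set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Single-link clustering: merge two tags whenever their cosine >= threshold."""
    vec = cooccurrence(bookmarks)
    parent = {t: t for t in vec}

    def find(t: str) -> str:  # union-find root lookup with path halving
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(sorted(vec), 2):
        if cosine(vec[a], vec[b]) >= threshold:
            parent[find(a)] = find(b)
    groups: dict[str, set[str]] = defaultdict(set)
    for t in vec:
        groups[find(t)].add(t)
    return sorted(groups.values(), key=sorted)


bookmarks = [  # hypothetical: "films" and "movies" never co-occur, but share context
    {"films", "dvd", "reviews"},
    {"movies", "dvd", "reviews"},
    {"recipes", "food", "italian"},
    {"cooking", "food", "italian"},
]
vec = cooccurrence(bookmarks)
print(round(cosine(vec["films"], vec["movies"]), 3))  # 1.0
print([sorted(g) for g in cluster(bookmarks)])
# [['cooking', 'food', 'italian', 'recipes'], ['dvd', 'films', 'movies', 'reviews']]
```

Single-link grouping is the simplest choice but can chain loosely related tags together; a higher threshold or average-link merging trades recall for tighter clusters.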
Some good Clusters found
Tags that belong to the same clusters
Dmoz – World Order
Recommendations: dpreview
Faceted Search on TagSpace: Challenges
• Faceted search paradigm on the TagSpace:
– Not a controlled environment
– Large scale (1 facet for every 5 documents)
– Lots of noise: search, search engine, google, search_engines, searchengine, searchengines, search_engine, engine, web, internet, tools, reference, news, information, portal, engines, searching, tech, buscadores, tool …
Faceted Search on TagSpace: Challenges
How to rank facets? Which facets should be displayed? How should they be shown?
• Performance: reduce the search space
• Refining facets: tags that allow the user to refine (reduce) the search (depth)
• Related facets: tags that allow the user to explore (breadth)
• Group facets: cluster tags that are related
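The depth/breadth distinction can be sketched from tag counts over the current result set. A minimal sketch under assumed rules, not RawSugar's documented ranking: a tag that appears on every result cannot narrow the search, so refining facets are tags on some but not all results, ranked by how evenly they split the set; related facets are tags seen in the results that also lead to pages outside them, ranked by how many. The collection, `k`, and the helper `rank_facets` are hypothetical.

```python
from collections import Counter

# Hypothetical helper and data for illustration; not RawSugar code.


def rank_facets(docs: dict[str, set[str]], query: str, k: int = 3) -> tuple[list[str], list[str]]:
    """Return (refining, related) facets for a single-tag query over a tagged collection."""
    results = {d for d, tags in docs.items() if query in tags}
    n = len(results)
    in_results = Counter(t for d in results for t in docs[d] if t != query)

    def balance(t: str) -> float:
        """1.0 when a tag covers exactly half of the results, lower as the split gets lopsided."""
        return 1 - abs(2 * in_results[t] / n - 1)

    # Refining (depth): must narrow the results, so tags on every result are skipped.
    refining = sorted((t for t, c in in_results.items() if c < n),
                      key=lambda t: (-balance(t), t))[:k]

    # Related (breadth): tags seen with the query that also lead to pages outside the results.
    outside = Counter(t for d, tags in docs.items() if d not in results for t in tags)
    related = sorted((t for t in in_results if outside[t] > 0),
                     key=lambda t: (-outside[t], t))[:k]
    return refining, related


docs = {  # hypothetical tagged pages
    "p1": {"restaurants", "sushi", "sf"},
    "p2": {"restaurants", "pizza", "sf"},
    "p3": {"restaurants", "sushi", "tokyo"},
    "p4": {"restaurants", "reviews"},
    "p5": {"sushi", "recipes"},
    "p6": {"sushi", "recipes", "cooking"},
    "p7": {"sf", "travel"},
}
print(rank_facets(docs, "restaurants"))  # (['sf', 'sushi', 'pizza'], ['sushi', 'sf'])
```

"sushi" shows up in both lists: added to the query it halves the restaurant results (depth), and followed on its own it opens up recipe pages (breadth).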
Before RawSugar
With RawSugar
[Screenshot annotations: other users, navigation]
Searching the TagSpace with RawSugar: Suggestion Engine
Goals:
– Ease of tagging
– Cohesiveness of our tagspace: encourage users to reuse the same tags instead of creating endless variations (search engines, searchengine, search, search tools, search sites, etc.)
Key ideas:
– Always suggest the most popular tags first
– Use the tag hierarchy and tag context to find the most relevant tags
– Use information about the user and the other users to refine the suggestions
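The key ideas above can be combined into a single scoring function. A minimal sketch with hypothetical weights, data, and helper name (`suggest`), assuming three signals: global popularity (log-damped, so the shared vocabulary tends to win), co-occurrence with tags the user has already typed (tag context), and the user's own tagging history; the tag-hierarchy signal is left out to keep it short.

```python
import math
from collections import Counter

# Hypothetical helper, weights, and data for illustration; not RawSugar code.


def suggest(typed: list[str], user_history: Counter, global_counts: Counter,
            cooc: dict[str, Counter], k: int = 3,
            w_pop: float = 1.0, w_ctx: float = 2.0, w_user: float = 1.5) -> list[str]:
    """Score candidate tags by popularity, context and personal history; return the top k."""
    candidates = set(global_counts) | set(user_history)
    for t in typed:
        candidates |= set(cooc.get(t, {}))
    candidates -= set(typed)  # never re-suggest what was already typed

    def score(tag: str) -> float:
        pop = math.log1p(global_counts[tag])  # damp very popular tags
        ctx = sum(cooc.get(t, Counter())[tag] for t in typed)
        mine = math.log1p(user_history[tag])
        return w_pop * pop + w_ctx * ctx + w_user * mine

    return sorted(candidates, key=lambda t: (-score(t), t))[:k]


global_counts = Counter({"search": 500, "google": 300, "recipes": 250, "tools": 200,
                         "searchengine": 40, "search_engines": 5})
cooc = {"search": Counter({"google": 5, "searchengine": 1})}  # seen together with "search"
user_history = Counter({"tools": 3})                          # this user's past tags

print(suggest(["search"], user_history, global_counts, cooc))
# ['google', 'tools', 'searchengine']
```

Because popularity enters the score, the common spelling "searchengine" outranks the rare "search_engines", which nudges users toward the cohesive tagspace the goals call for.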
What’s Missing?
Human Meta Knowledge
Is it good or not? What is it about? Is it popular?
Not new:
• Guides: paloaltoonline.com, expedia.com, etc.
• Review sites: Zagat.com, dpreview.com, etc.
• Shopping sites: shopping.com, Amazon
Problems:
• Limited to small environments or verticals (digital camera, restaurants, etc.)
• Not real search across sites
• Manpower: hiring, training, etc.
=> Not Scalable