Topical search in the Twitter OSN
Saptarshi Ghosh
Collaborators:
Naveen Sharma, Parantapa Bhattacharya, Niloy Ganguly (IITKGP)
Muhammad Bilal Zafar, Krishna Gummadi (MPI-SWS)
Topical search in Twitter
Twitter has emerged as an important source of information & real-time news
Topical search
Searching for breaking news and trending topics
Searching for topical experts
Searching for information on specific topics
Primary requirement: identify the topical expertise of users
Profile of a Twitter user
Example tweets
Prior approaches to find topic experts
Research studies
Pal et al. (WSDM 2011) use 15 features from tweets and the network to identify topical experts
Weng et al. (WSDM 2010) use an ML approach
Application systems
Twitter Who To Follow (WTF), Wefollow, …
Methodology not fully public, but reported to use several features
Prior approaches use features extracted from
User profiles
Screen-name, bio, …
Tweets posted by a user
Hashtags, others retweeting a given user, …
Social graph of a user
Number of followers, PageRank, …
Problems with prior approaches
User profiles – screen-name, bio, …
Bio often does not give meaningful information
Tweets posted by a user
Tweets mostly contain day-to-day conversation
Social graph of a user – number of followers, PageRank
Helps to identify authoritative users, but …
Does not provide topical information
We propose …
Use a completely different feature to infer the topics of expertise of an individual Twitter user
Utilize social annotations
How does the Twitter crowd describe a user?
Social annotations obtained through Twitter Lists
Approach essentially relies on crowdsourcing
Twitter Lists
Primarily an organizational feature
Used to organize the people one is following
Create a named List, add an optional List description
Add related users to the List
Tweets posted by these users are grouped together as a separate stream
How Lists work
Using Lists to infer topics for users
If U is an expert / authority in a certain topic
U is likely to be included in several Lists
List names / descriptions provide valuable semantic cues to the topics of expertise of U
Inferring topical attributes of users
Dataset
Collected Lists of 55 million Twitter users who joined in or before 2009
88 million Lists collected in total
All studies consider the 1.3 million users who are included in 10 or more Lists
Most List names / descriptions are in English, but a significant fraction are also in French, Portuguese, …
Mining Lists to infer expertise
Collect Lists containing a given user U
List names / descriptions collected into a ‘topic document’ for the given user
Identify U’s topics from the document
Ignore domain-specific stopwords
Identify nouns and adjectives
Unify similar words based on edit distance, e.g., journalists and jornalistas, politicians and politicos (not unified by stemming)
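The mining steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stopword set and the relative edit-distance threshold `max_ratio` are hypothetical values chosen for the example, and the noun/adjective filtering step (which would need a part-of-speech tagger) is omitted, so words such as "top" and "more" survive.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; the actual domain-specific list is not given.
DOMAIN_STOPWORDS = {"list", "lists", "twitter", "my", "the", "of", "and", "people", "follow"}


def build_topic_document(lists_containing_user):
    """Concatenate the name and description of every List that includes the user
    into one token sequence (the user's 'topic document')."""
    tokens = []
    for name, description in lists_containing_user:
        text = f"{name} {description or ''}".lower()
        # [^\W\d_]+ matches runs of Unicode letters, so non-English words stay whole
        tokens.extend(re.findall(r"[^\W\d_]+", text))
    # Noun/adjective filtering (POS tagging) is omitted in this sketch
    return [t for t in tokens if t not in DOMAIN_STOPWORDS]


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def unify_similar(tokens, max_ratio=0.3):
    """Map each word to a more frequent word within a relative edit-distance threshold.
    max_ratio is an assumed value, not one taken from the slides."""
    canonical, mapping = [], {}
    for word, _ in Counter(tokens).most_common():
        for rep in canonical:
            if edit_distance(word, rep) <= max_ratio * max(len(word), len(rep)):
                mapping[word] = rep
                break
        else:
            canonical.append(word)
            mapping[word] = word
    return [mapping[t] for t in tokens]


doc = build_topic_document([("Journalists", "top journalists"),
                            ("Jornalistas", None),
                            ("Politicians", "politicos and more")])
print(unify_similar(doc))
# ['journalists', 'top', 'journalists', 'journalists', 'politicians', 'politicians', 'more']
```

Scaling the threshold by word length lets "politicians" and "politicos" (distance 3) merge while keeping short, unrelated words such as "gop" and "pop" apart; a plain stemmer would merge neither pair.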
Mining Lists to infer expertise
Unigrams and bigrams considered as topics
Extracted from the topic document of U:
Topics for user U
Frequencies of the topics in the document
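Counting unigram and bigram topics with their frequencies could look like the sketch below. It assumes that bigrams are formed only within a single List name (never across two Lists) and, for brevity, it takes raw List names rather than the cleaned, unified tokens a full pipeline would use; `topic_frequencies` and `top_k` are illustrative names.

```python
from collections import Counter


def topic_frequencies(list_names, top_k=10):
    """Count unigram and bigram candidate topics across a user's List names.
    Bigrams are formed only inside a single List name (an assumption of this sketch)."""
    counts = Counter()
    for name in list_names:
        tokens = name.lower().split()
        counts.update(tokens)                                        # unigrams
        counts.update(" ".join(p) for p in zip(tokens, tokens[1:]))  # bigrams
    return counts.most_common(top_k)


print(topic_frequencies(["pop culture", "Pop Culture news", "celebs", "celebs"]))
```

The most frequent entries then serve as the user's topics, and their counts indicate how strongly the crowd associates each topic with that user.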
Topics inferred from Lists
politics, senator, congress, government,
republicans, Iowa, gop, conservative
politics, senate, government, congress,
democrats, Missouri, progressive, women
celebs, actors, famous, movies, comedy,
funny, music, hollywood, pop culture
linux, tech, open, software, libre, gnu,
computer, developer, ubuntu, unix
Lists vs. other features
Profile bio
Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool
Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture
Lists vs. other features
Profile bio
Most common words from tweets: Fallon, happy, love, fun, video, song, game, hope, #fjoln, #fallonmono
Most common words from Lists: celeb, funny, humor, music, movies, laugh, comics, television, entertainers
Evaluation of inferred topics – 1
Evaluated through user-survey
Each evaluator was shown the top 30 topics for a chosen user
Are the inferred attributes (i) accurate, (ii) informative?
Binary response for both questions
More than 93% of evaluators judged the topics to be both accurate and informative
The few negative judgments were due to subjectivity
Evaluation of inferred topics – 2
Comparison with topics identified by Twitter WTF
Obtained the top 20 WTF results for about 200 queries
3495 distinct users
Topics inferred by us from Lists include the query topic for 2916 users (83.4%)
For the remaining 579 users
Case 1 – inferred topics include semantically very similar words, but not the exact query word (18%)
Case 2 – wrong results by WTF, unrelated to the query (58%)
Comparison with Twitter WTF
Case 1
Restaurant dineLA for query “dining”
Inferred topics – food, restaurant, recipes, los angeles
Space explorer HubbleHugger77 for query “hubble”
Inferred topics – science, tech, space, cosmology, nasa
Case 2
Comedian jimmyfallon for query “astrophysicist”
Inferred topics – celebs, comedy, humor, actor
Web developer ScreenOrigami for query “origami”
Inferred topics – webdesign, html, designers
Who-is-who service
Developed a Who-is-Who service for Twitter
Shows a word-cloud of the major topics for a user
http://twitter-app.mpi-sws.org/who-is-who/
Inferring Who-is-who in the Twitter Social Network, WOSN 2012
(Highest rated paper in the workshop)
Identifying topical experts
Topical experts in Twitter
400 million tweets posted daily
Quality of tweets posted by different users varies widely
News, pointless babble, conversational tweets, spam, …
Challenge: to find topical experts
Sources of authoritative information on specific topics
Basic methodology
Given a query (topic)
Identify experts on the topic using Lists
Discussed earlier
Rank identified experts w.r.t. their expertise on the given topic
Need a suitable ranking algorithm
Commonly used ranking metrics such as number of followers and PageRank do not consider the topic
Ranking experts
Two components of ranking user U w.r.t. query Q: relevance of U to Q, popularity of U
Relevance of user to query
Cover density ranking between the topic document T_U of user U and Q
Cover density ranking preferred for short queries
Popularity of user: number of Lists including the user
Ranking score = Topic relevance(T_U, Q) × log(#Lists including U)
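A rough sketch of this ranking, assuming a simplified form of cover density ranking: a cover is a shortest span of the topic document that contains every query term, each cover contributes 1 if it is at most `k` tokens long and `k / length` otherwise, and the constant `k = 16` is an assumed value. The function names and the toy users are hypothetical; only the relevance × log(#Lists) combination comes from the slide.

```python
import math


def covers(tokens, query_terms):
    """Shortest spans (p, q) of tokens containing every query term, scanned left to right."""
    positions = {t: [i for i, w in enumerate(tokens) if w == t] for t in set(query_terms)}
    if not positions or any(not ps for ps in positions.values()):
        return []
    found, u = [], 0
    while True:
        nexts = []
        for ps in positions.values():
            later = [i for i in ps if i >= u]
            if not later:
                return found
            nexts.append(later[0])
        q = max(nexts)  # the cover must extend at least this far
        # shrink from the left: latest occurrence of each term at or before q
        p = min(max(i for i in ps if i <= q) for ps in positions.values())
        found.append((p, q))
        u = p + 1


def cover_density(tokens, query_terms, k=16):
    """Each short cover contributes 1; longer covers contribute k / length."""
    total = 0.0
    for p, q in covers(tokens, query_terms):
        length = q - p + 1
        total += k / length if length > k else 1.0
    return total


def expert_score(topic_doc, query_terms, num_lists):
    """Topic relevance(T_U, Q) x log(#Lists including U)."""
    return cover_density(topic_doc, query_terms) * math.log(num_lists)


def rank_experts(users, query):
    """users maps a screen name to (topic_doc_tokens, number_of_lists)."""
    terms = query.lower().split()
    scored = [(expert_score(doc, terms, n), name) for name, (doc, n) in users.items()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scored]


users = {"alice": (["stem", "cell", "biology"], 50),
         "bob": (["stem", "cell", "stem", "cell"], 20),
         "carol": (["politics"], 500)}
print(rank_experts(users, "stem cell"))  # ['bob', 'alice']
```

Taking the logarithm of the List count keeps very heavily listed accounts from drowning out users whose topic documents match the query far better; here "bob" outranks "alice" despite fewer Lists, and "carol" is dropped as irrelevant.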
Cognos
Search system for topical experts in Twitter
Publicly deployed at http://twitter-app.mpi-sws.org/whom-to-follow/
Cognos: Crowdsourcing Search for Topic Experts in Microblogs, ACM International SIGIR Conference 2012
Cognos results for “politics”
Cognos results for “stem cell”
Cognos results for “earthquake”
Evaluation of Cognos
System evaluated ‘in-the-wild’
People were asked to try the system and give feedback
Evaluators were students & researchers from the researchers’ home institutes
Advantage – a wide variety of queries tried
Disadvantage – subjectivity in relevance judgments
User-evaluation of Cognos
Sample queries for evaluation
Evaluation results
Overall 2136 relevance judgments over 55 queries
1680 judged relevant (78.7%)
Large amount of subjectivity in evaluations
The same result for the same query received both relevant and non-relevant judgments
E.g., for query “cloud computing”, Werner Vogels got 4 relevant and 6 non-relevant judgments
Cognos vs Twitter Who-to-follow
Each evaluator was shown the top 10 results from both systems
27 distinct queries were asked at least twice
Result-sets anonymized
Evaluator judges which is better / both good / both bad
Queries chosen by evaluators themselves
In total, the queries were asked 93 times
Judgment by majority voting
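Per-query majority voting might be implemented as in the sketch below. The labels ("Cognos", "WTF", "both good", "both bad") and the rule that a tie for first place yields no majority (`None`) are assumptions for illustration; the slides do not say how split votes were resolved.

```python
from collections import Counter


def majority_judgment(votes):
    """Most frequent label among the votes; None if there are no votes or the top labels tie."""
    ranked = Counter(votes).most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def tally(votes_by_query):
    """Count how many queries each outcome won (None means no majority)."""
    return Counter(majority_judgment(v) for v in votes_by_query.values())


print(tally({"linux": ["Cognos", "Cognos", "both good"],
             "dell": ["WTF", "Cognos"]}))
```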
Cognos vs Twitter WTF
Cognos judged better on 12 queries
Computer science, Linux, mac, Apple, ipad, India, internet, windows phone, photography, political journalist
Twitter WTF judged better on 11 queries
Music, Sachin Tendulkar, Angelina Jolie, Harry Potter, metallica, cloud computing, IIT Kharagpur
Mostly names of individuals or organizations
Tie on 4 queries
Microsoft, Dell, Kolkata, Sanskrit as an official language
Topical content search
Challenges in topical content search
Services today are limited to keyword search
A search for ‘politics’ gets only tweets that contain the word ‘politics’
Knowing which keywords to search for is itself an issue
Individual tweets are too small to deduce topics
Scalability: 400M tweets posted per day
Tweets may contain spam / rumors / phishing URLs
Our approach
Look at tweets posted by a selected set of topical experts
Inferring the topic of tweets from the tweeters’ expertise
A large fraction of tweets posted by experts is only about day-to-day conversation
Solution: if multiple experts on a topic tweet about something, it is most likely related to the topic
Sampling Tweets from Experts
We capture all tweets from 585K topical experts
Identified through Lists
Expertise in a wide variety of topics
The experts generate 1.46 million tweets per day
Scalable: only 0.268% of all tweets on Twitter
Trustworthiness
Experts not likely to post spam / phishing URLs
Less chance of rumors in what is posted by several experts
Methodology at a Glance
Gather tweets from experts on given topic
Group tweets on the same news-story
Multi-level clustering (cluster: news-story)
We use a group of hashtags to represent a news-story
Cluster tweets based on the hashtags they contain
Cluster hashtags based on co-occurrence
Rank news-stories by popularity
Number of distinct experts tweeting on the story
Number of tweets on the story
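The multi-level clustering and ranking above can be sketched as follows. Assumptions, none of which come from the slides: hashtags are merged with a union-find structure whenever they co-occur in at least `min_cooccur` tweets (a hypothetical threshold), a tweet joins every hashtag cluster it touches, tweets without hashtags are dropped, and stories are ordered by distinct experts first, with the tweet count as a tie-breaker.

```python
from collections import Counter, defaultdict
from itertools import combinations


def _find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_stories(tweets, min_cooccur=2):
    """tweets: list of (expert_id, text, hashtags). Returns stories, most popular first."""
    norm = [(e, text, {h.lower() for h in tags}) for e, text, tags in tweets]
    parent = {h: h for _, _, tags in norm for h in tags}

    # Level 1: merge hashtags that appear together in at least min_cooccur tweets
    pair_counts = Counter(pair for _, _, tags in norm
                          for pair in combinations(sorted(tags), 2))
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            parent[_find(parent, a)] = _find(parent, b)

    # Level 2: a tweet joins every hashtag cluster it touches
    members = defaultdict(list)
    for tweet in norm:
        for root in {_find(parent, h) for h in tweet[2]}:
            members[root].append(tweet)

    stories = []
    for root, story_tweets in members.items():
        stories.append({
            "hashtags": sorted(h for h in parent if _find(parent, h) == root),
            "experts": len({e for e, _, _ in story_tweets}),
            "tweets": len(story_tweets),
        })
    # Popularity: distinct experts first, number of tweets as tie-breaker
    stories.sort(key=lambda s: (s["experts"], s["tweets"]), reverse=True)
    return stories


tweets = [("e1", "Debate tonight", ["#debate", "#election"]),
          ("e2", "Watching the debate", ["#debate", "#election"]),
          ("e3", "Polls close soon", ["#election"]),
          ("e1", "New bill passed", ["#healthcare"]),
          ("e1", "More on the bill", ["#healthcare"])]
for story in cluster_stories(tweets):
    print(story)
```

Ranking by distinct experts before raw tweet count reflects the idea that a story several experts tweet about is more likely on-topic than one a single expert tweets about repeatedly, as with the #healthcare story here.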
Results for the last week on Politics (a popular topic)
Hashtags that co-occur frequently are grouped together
Related tweets are grouped together by common hashtags
The most popular tweet in the story is shown
Our system especially excels for niche topics
Evaluation – Relevance
Evaluated using human feedback
Used Amazon Mechanical Turk for user evaluation
Evaluated top 10 clusters for 20 topics
Evaluators judged whether the tweet shown was relevant to the given topic
Options are Relevant / Not Relevant / Can’t Say
Evaluating Tweet Relevance
We obtained 3150 judgments
80% of tweets marked relevant by majority judgment
Non-relevant results primarily due to
Global events that were discussed by experts across all topics, e.g., Hurricane Sandy in the USA
Sometimes the topic is too specific and several experts tweet on a broader topic (e.g., baseball and ESPN Sports Update)
Effect of global events
Experts on all topics tweeting on #sandy
Most of these got negative judgments
Diversity of topics in Twitter
Topics in Twitter
Discovering thousands of experts on diverse topics enables characterizing the Twitter platform as a whole
On what topics is expert content available in Twitter?
Popular view – a few topics such as politics, sports, music, celebs, …
We find – lots of niche topics along with the popular ones
Topics in Twitter – major topics to niche ones
what Twitter is mostly known for
wide variety of niche topics
Thank You