PowerPoint Presentation


CLEar (Clairaudient Ear)

A Realtime Online Observatory for Bursty and Viral Events

A demonstration of the CLEar System

The Architecture of CLEar

To sum up, a meaningful event observatory should provide the following functions:

Detection of a bursty topic as soon as it emerges;

Early prediction of whether the bursty topic is likely to go viral;

Summarization of related bursty topics into semantically coherent events that can be monitored;

Contextualization of the events with their temporal evolution and corresponding coverage across other news media.

Recommended Materials


A Tutorial at WWW 2014: Towards a Social Media Analytics Platform: Event Detection and User Profiling for Twitter

A Tutorial at KDD 2009: Tutorial on Event Detection

Hila Becker: http://www.cs.columbia.edu/~hila/

Why Bursty and Viral?

Compared with traditional news media, Twitter has been recognized as a much more responsive and reliable source for picking up bursty events.

Bursty events trigger a surge of public interest within a short period of time.

Capable of handling both planned and unplanned events.

Topic Detection in Social Media

Document-pivot: for a new tweet, assign it to a similar existing event, or treat it as a new event if no similar event exists (this tweet is then called the first story of the event).

Sasa Petrovic et al. Streaming first story detection with application to Twitter. HLT '10.
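The document-pivot idea can be illustrated with a minimal sketch. It assumes whitespace tokenization, cosine similarity over raw term counts, and a hypothetical threshold of 0.5; it compares each tweet against every event by brute force, which is a simplification and not the streaming method of the cited paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def assign_tweets(tweets, threshold=0.5):
    """Assign each tweet to its most similar event or open a new one.

    Returns a list of events; each event is a list of tweets whose
    first element is the first story of that event.
    """
    events = []     # tweets grouped by event
    centroids = []  # summed term counts per event
    for text in tweets:
        vec = Counter(text.lower().split())
        best_idx, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= threshold:
            events[best_idx].append(text)
            centroids[best_idx].update(vec)
        else:
            # No similar event exists: this tweet is a first story.
            events.append([text])
            centroids.append(Counter(vec))
    return events
```

The threshold controls how readily a tweet opens a new event: a higher value yields more, smaller events.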

Feature-pivot: when an event is happening, some bursty features of the hidden event increase more sharply than expected.

Chen Lin et al. Generating event storylines from microblogs. CIKM '12.

Chenliang Li et al. Twevent: segment-based event detection from tweets. CIKM '12.
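One simple way to make "a sharper increase than expected" concrete is a z-score test against a term's counts in earlier time windows. The sketch below is an illustration under that assumption, not the method of either cited paper; the threshold, the minimum count, and the floor of 1.0 on the standard deviation are hypothetical choices.

```python
import statistics


def bursty_terms(history, current, z_threshold=3.0, min_count=5):
    """Return terms whose current count is far above their history.

    history: dict mapping term -> list of counts in past windows
             (terms never seen before are treated as having count 0).
    current: dict mapping term -> count in the current window.
    """
    bursty = {}
    for term, count in current.items():
        if count < min_count:
            continue  # too rare to be a reliable signal
        past = history.get(term) or [0]
        mean = statistics.mean(past)
        # Floor the deviation so near-constant terms do not explode the score.
        std = max(statistics.pstdev(past), 1.0)
        z = (count - mean) / std
        if z >= z_threshold:
            bursty[term] = z
    return bursty
```

A frequent but stable term such as "eat" produces a low score, while a term that jumps from near zero to dozens of mentions is flagged.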

Bursty Term Detection

#MH370 lives Southern #eat Tmr korean Sleep indian save

Bursty Term Grouping

MH370 Southern indian Korean save lives #eat tmr sleep

Candidate Event Filtering

MH370 Southern indian Korean save lives #eat tmr sleep

The Shortcomings of Existing Work

Existing work mostly focuses on event detection and extraction without any post-processing.

The lack of well-established analysis of an event limits its utility.

Many challenging research problems: popularity prediction, topic clustering, event summarization, event contextualization, … …

Popularity Prediction

User behaviors like replying and retweeting provide a new mechanism for information diffusion.

Topic popularity can be measured by the number of involved users.

Predicting topic popularity not only helps recognize event trends, but also removes noisy and spam bursty topics at an early stage.

The challenges of this problem come from the uncertainty in the information diffusion path and the insufficient information at the early stage of a burst, which offers little clue as to whether the detected bursty topic will sustain its virality or simply die down quickly.
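As a small illustration of the popularity measure, the sketch below counts the distinct users involved in a topic, optionally only up to an early cutoff time. The tuple layout `(timestamp, user_id, kind)` is a hypothetical input format chosen for the example.

```python
def topic_popularity(interactions, until=None):
    """Number of distinct users involved in a topic.

    interactions: iterable of (timestamp, user_id, kind), where kind is
                  e.g. "tweet", "reply" or "retweet".
    until: if given, only interactions with timestamp <= until count,
           which gives the popularity observed at an early stage.
    """
    users = {
        user_id
        for timestamp, user_id, _kind in interactions
        if until is None or timestamp <= until
    }
    return len(users)
```

Comparing the early value with the final value of the same topic gives the kind of target a popularity predictor would try to estimate.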

Topic Clustering

Because many duplicate and semantically close topics exist, it is desirable to remove duplicate topics and group related topics together to form a coherent event.

A single-pass incremental clustering problem.

The essential problem of clustering is to define a metric that measures the similarity between a topic and an existing event (cluster).

A metric based simply on the co-occurrence of bursty keywords is unreliable: shared keywords are likely to be absent because topics are much shorter than formal documents and depend largely on the detection algorithm.


Topic Clustering cont.

Measure the similarity between a topic and an event from the following perspectives: content similarity, user similarity, time similarity, volume similarity, and entity similarity.

How to combine these individual similarities?

An intuitive approach is to combine the individual similarities using different weights. However, the number of possible weight combinations is huge, and we have no prior knowledge about the weights.

Learn the weighting scheme through a classification model to form a unified similarity metric.
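The slides do not name the classification model. As one possible instance, the sketch below fits a logistic regression by plain gradient descent on labeled (topic, event) pairs, where each pair is described by the five similarities and the label says whether the topic belongs to the event. The learning rate, epoch count, and feature order are assumptions made for the example.

```python
import math

FEATURES = ("content", "user", "time", "volume", "entity")


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weights(samples, labels, lr=0.5, epochs=2000):
    """Learn one weight per similarity plus a bias.

    samples: list of 5-element similarity vectors, ordered as FEATURES.
    labels:  1 if the topic belongs to the event, otherwise 0.
    """
    w = [0.0] * len(FEATURES)
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def unified_similarity(x, w, b):
    """Combine the individual similarities into one score in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

The resulting score can serve as the topic-to-event metric in single-pass clustering: a topic joins the best-scoring event if the score exceeds a chosen cutoff, otherwise it starts a new event.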


Event Summarization

Traditional summarization methods mainly focus on content summarization, extracting representative tweets from an event-relevant tweet set.

In addition, we propose to summarize the event from the structure and user perspectives.

A fundamental problem is sub-event detection.

Sub-event Detection

An event usually contains several finer-grained stages, and detection algorithms generally cannot detect every stage of an event.

Detecting all possible sub-events provides a basis for studying deeper properties of the event. Both the volume [2,3] and the content [1] of the event provide signals of sub-event occurrence. Compared with the volume curve, we consider the content more trustworthy, because the volume curve depends largely on the retrieval results and on users' publishing patterns.

To solve this problem, we should overcome the following two difficulties:

Retrieval: How to retrieve high-quality tweets about this event?

Sub-event: How to detect all sub-events in an online manner?

[1] Akshaya Iyengar et al. Content-based prediction of temporal boundaries for events in Twitter. SocialCom 2011.

[2] Jeffrey Nichols et al. Summarizing sporting events using Twitter. IUI 2012.

[3] Arkaitz Zubiaga et al. Towards real-time summarization of scheduled events from Twitter streams.

1. How to retrieve high-quality tweets about this event?

Common practice: use the event keywords as a query to search the tweet collection.

Three factors remain major obstacles to employing standard retrieval methods:

A. Seemingly relevant tweets with good textual quality might not be truly relevant to the event;

B. Tweets highly relevant to the event might not contain any of the query keywords;

C. The query keywords might not represent the event comprehensively and might even be a noisy indicator.

To solve A, besides the relevance score returned by Elasticsearch, we can integrate other features, such as tweet-specific features and publisher features, to reorder the search results.
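A minimal sketch of such a reordering step follows. It takes search hits that already carry a relevance score and mixes that score with one tweet-specific feature (retweet count) and one publisher feature (follower count). The field names `es_score`, `retweet_count`, and `followers`, and the three weights, are hypothetical; the slides do not specify which features or weights are used.

```python
import math


def _normalize(values):
    """Scale values to [0, 1] by dividing by the maximum."""
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def rerank(hits, w_es=0.6, w_tweet=0.25, w_publisher=0.15):
    """Reorder hits by a weighted mix of relevance and extra features.

    hits: list of dicts with keys "es_score", "retweet_count", "followers".
    """
    es = _normalize([h["es_score"] for h in hits])
    tweet = _normalize([math.log1p(h["retweet_count"]) for h in hits])
    publisher = _normalize([math.log1p(h["followers"]) for h in hits])
    scored = [
        (w_es * e + w_tweet * t + w_publisher * p, h)
        for e, t, p, h in zip(es, tweet, publisher, hits)
    ]
    # Sort on the score only, so hits themselves are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored]
```

The log transform keeps a few very large accounts from dominating the publisher feature.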

To solve B and C, we can use event keyword expansion [1], taking the burstiness of a term into consideration, in addition to its traditional TF-IDF value, when selecting expansion terms.

[1] Metzler D, Cai C, Hovy E. Structured event retrieval over microblog archives. ACL 2012: 646-655.
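The expansion step can be sketched as scoring each candidate term from the initially retrieved tweets by TF-IDF multiplied by a burstiness factor. Here burstiness is assumed to be the ratio between the share of event tweets containing the term and its smoothed share in a background collection; this definition, the smoothing, and all parameter names are illustrative assumptions rather than the formula used in CLEar or in [1].

```python
import math
from collections import Counter


def expand_keywords(query_terms, event_tweets, background_df, n_background, k=3):
    """Pick the top-k expansion terms for an event query.

    query_terms:   set of original query keywords (never returned).
    event_tweets:  list of token lists retrieved with the original query.
    background_df: dict mapping term -> number of background tweets with it.
    n_background:  total number of background tweets.
    """
    if not event_tweets:
        return []
    tf = Counter(t for tweet in event_tweets for t in tweet)
    event_df = Counter(t for tweet in event_tweets for t in set(tweet))
    scores = {}
    for term, freq in tf.items():
        if term in query_terms:
            continue
        bg = background_df.get(term, 0)
        idf = math.log((n_background + 1) / (bg + 1))
        event_rate = event_df[term] / len(event_tweets)
        background_rate = (bg + 1) / (n_background + 1)
        burstiness = event_rate / background_rate
        scores[term] = freq * idf * burstiness
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Common words score low on both factors, while terms that are rare in the background but frequent in the event tweets rise to the top.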

2. How to detect all sub-events in an online manner?

Topic model: high complexity, and its output is usually a general topic.

Event boundary prediction: can only divide the event into before, during, and after.

We propose to first divide the event duration into equal-sized, non-overlapping timespans, then merge adjacent timespans into sub-events in chronological order.
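The divide-and-merge proposal can be sketched as follows. The slides do not state the merge criterion; this example assumes that a timespan is merged into the preceding sub-event when the cosine similarity of their term counts reaches a hypothetical threshold, and that an empty timespan simply extends the preceding sub-event.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def detect_sub_events(tweets, start, end, span, threshold=0.3):
    """Split [start, end) into equal timespans and merge similar neighbors.

    tweets: list of (timestamp, tokens).
    Returns a list of (sub_event_start, sub_event_end, term_counts).
    """
    n_spans = math.ceil((end - start) / span)
    bins = [Counter() for _ in range(n_spans)]
    for timestamp, tokens in tweets:
        if start <= timestamp < end:
            bins[int((timestamp - start) // span)].update(tokens)

    sub_events = []
    for i, terms in enumerate(bins):  # chronological order
        lo = start + i * span
        hi = min(start + (i + 1) * span, end)
        if sub_events and (
            not terms or cosine(sub_events[-1][2], terms) >= threshold
        ):
            prev_lo, _, prev_terms = sub_events[-1]
            sub_events[-1] = (prev_lo, hi, prev_terms + terms)
        else:
            sub_events.append((lo, hi, terms))
    return sub_events
```

Because each timespan is only compared with the sub-event immediately before it, the same loop body can be applied to timespans as they arrive, which suits online processing.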

Finally, we should verify each sub-event's popularity and reliability to filter out spurious sub-events. Reliability can be measured by the total number of followers of all publishers, while popularity can be reflected by the number of retweets.
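A minimal sketch of this verification step is shown below. The dictionary layout of a sub-event and both thresholds are hypothetical; each publisher's followers are counted once, even if that publisher posted several tweets.

```python
def filter_sub_events(sub_events, min_followers=1000, min_retweets=10):
    """Keep sub-events that are both reliable and popular.

    sub_events: list of dicts with a "tweets" list; each tweet is a dict
                with keys "user_id", "followers" and "retweets".
    """
    kept = []
    for sub in sub_events:
        # Reliability: total followers over distinct publishers.
        followers_by_user = {t["user_id"]: t["followers"] for t in sub["tweets"]}
        reliability = sum(followers_by_user.values())
        # Popularity: total number of retweets.
        popularity = sum(t["retweets"] for t in sub["tweets"])
        if reliability >= min_followers and popularity >= min_retweets:
            kept.append(sub)
    return kept
```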

Event Contextualization

Find a representative picture of this event.

Find some related news about this event.
