CLEar (Clairaudient Ear)

A Realtime Online Observatory for Bursty and Viral Events

A demonstration of the CLEar System

The Architecture of CLEar

To sum up, a meaningful event observatory should provide the following functions:

- Detection of a bursty topic as soon as it emerges;
- Early prediction of whether the bursty topic is likely to go viral;
- Summarization of related bursty topics into semantically coherent events that can be monitored;
- Contextualization of each event with its temporal evolution and corresponding coverage across other news media.

Recommended Materials

- A Tutorial at WWW 2014: Towards a Social Media Analytics Platform: Event Detection and User Profiling for Twitter
- A Tutorial at KDD 2009: Tutorial on Event Detection
- Hila Becker http://www.cs.columbia.edu/~hila/

Why Bursty and Viral?

- Compared with traditional news media, Twitter has been recognized as a much more responsive and reliable source for picking up bursty events.
- Trigger a surge of public interest within a short period of time.
- Capable of handling both planned and unplanned events.

Topic Detection in Social Media

Document-pivot: a new tweet is assigned to a similar existing event, or treated as a new event if no similar event exists (such a tweet is called the first story of the event).

- Sasa Petrovic et al. Streaming first story detection with application to Twitter. HLT '10

Feature-pivot: while an event is happening, some bursty features of the hidden event increase more sharply than expected.

- Chen Lin et al. Generating event storylines from microblogs. CIKM '12
- Chenliang Li et al. Twevent: segment-based event detection from tweets. CIKM '12

[Figure: feature-pivot pipeline in three stages: Bursty Term Detection, Bursty Term Grouping, Candidate Event Filtering. Example terms shown: #MH370, lives, Southern, #eat, tmr, korean, sleep, indian, save.]
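The first stage of this pipeline, bursty term detection, can be illustrated with a minimal sketch. It is not the detector used in CLEar; it assumes a simple z-score test of a term's count in the current time window against its counts in earlier windows. The function name `bursty_terms`, the thresholds, and the floor on the standard deviation are all illustrative assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def bursty_terms(history, current, z_threshold=3.0, min_count=5):
    """Return (term, z-score) pairs for terms far above their expected count.

    history: non-empty list of Counter objects, one per past time window.
    current: Counter of term counts for the current time window.
    The thresholds are illustrative assumptions, not values from the slides.
    """
    bursty = []
    for term, count in current.items():
        if count < min_count:
            continue  # ignore rare terms; their z-scores are unreliable
        past = [window.get(term, 0) for window in history]
        mu = mean(past)
        sigma = pstdev(past)
        # Floor sigma at 1.0 so a flat history does not cause division by zero.
        z = (count - mu) / max(sigma, 1.0)
        if z >= z_threshold:
            bursty.append((term, z))
    # Strongest bursts first.
    return sorted(bursty, key=lambda pair: pair[1], reverse=True)
```

A term such as `mh370` that was absent from earlier windows and suddenly appears dozens of times is flagged, while an everyday term such as `sleep` with a stable count is not.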

The Shortcomings of Existing Work

- Existing works mostly focus on event detection and extraction without any post-processing.
- The lack of a well-established analysis for an event limits its utility.

Many challenging research problems:

- Popularity prediction
- Topic clustering
- Event summarization
- Event contextualization
- …

Popularity Prediction

- User behaviors such as replying and retweeting provide new mechanisms for information diffusion.
- Topic popularity can be measured by the number of users involved.
- Predicting topic popularity not only reveals event trends, but also helps remove noisy and spam bursty topics at an early stage.
- The challenges of this problem come from the uncertainty in information diffusion paths and the insufficient information available at the early stage of a burst, which offers little clue as to whether the detected bursty topic will sustain its virality or simply die down quickly.

Topic Clustering

- Due to the existence of many duplicate and semantically close topics, it is desirable to remove duplicate topics and group related topics together to form a coherent event.
- This is a single-pass incremental clustering problem. The essential problem of clustering is to define a metric that measures the similarity between a topic and an existing event (cluster).
- A metric based simply on the co-occurrence of bursty keywords is unreliable: co-occurring keywords are likely to be absent, because topics are much shorter than formal documents and their keywords depend largely on the detection algorithm.
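A minimal sketch of single-pass incremental clustering follows. Each topic is compared once, in arrival order, against the events seen so far. The default `jaccard` similarity over keywords is only a placeholder (it is exactly the kind of keyword co-occurrence measure the slide cautions against); the `similarity` parameter, the threshold, and the event representation are illustrative assumptions.

```python
def jaccard(a, b):
    """Keyword-overlap similarity; a placeholder for a richer metric."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def single_pass_cluster(topics, similarity=jaccard, threshold=0.5):
    """Assign each topic, in arrival order, to its most similar event.

    topics: iterable of keyword collections, one per detected bursty topic.
    Returns a list of events; each event holds the merged keyword set and
    the indices of the topics it absorbed.
    """
    events = []
    for index, keywords in enumerate(topics):
        best, best_score = None, 0.0
        for event in events:
            score = similarity(keywords, event["keywords"])
            if score > best_score:
                best, best_score = event, score
        if best is not None and best_score >= threshold:
            # Similar enough: merge the topic into the existing event.
            best["keywords"] |= set(keywords)
            best["topics"].append(index)
        else:
            # No sufficiently similar event: the topic starts a new one.
            events.append({"keywords": set(keywords), "topics": [index]})
    return events
```

Because every topic is processed exactly once and never reassigned, the result depends on arrival order, which is the usual trade-off of a single-pass scheme.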

Topic Clustering cont.

Measure the similarity between a topic and an event from the following perspectives:

- Content similarity
- User similarity
- Time similarity
- Volume similarity
- Entity similarity

How should these individual similarities be combined?

- An intuitive approach is to combine them using different weights. However, the number of possible weight combinations is huge, and we have no prior knowledge about the weights.
- Instead, learn the weighting scheme through a classification model to form a unified similarity metric.
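One way to learn such a weighting scheme is sketched below. The slides do not name the classification model, so logistic regression is an assumption here, as are the function names `train_weights` and `unified_similarity`, the feature order, and the hyperparameters. Each training example is a vector of the five individual similarities for a (topic, event) pair, labelled 1 if the topic belongs to the event and 0 otherwise.

```python
import math


def unified_similarity(x, weights, bias):
    """Probability that topic and event match, used as the unified similarity."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_weights(pairs, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights with plain stochastic gradient descent.

    pairs: list of feature vectors [content, user, time, volume, entity],
           each value assumed to lie in [0, 1].
    labels: 1 if the topic belongs to the event, 0 otherwise.
    """
    n = len(pairs[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(pairs, labels):
            error = unified_similarity(x, weights, bias) - y
            # Gradient step on the log-loss for this single example.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias
```

The learned `weights` play the role of the hand-tuned weights in the intuitive approach, and the output of `unified_similarity` can be thresholded when deciding whether a topic joins an existing event.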

Event Summarization

- Traditional summarization methods mainly focus on content summarization, extracting representative tweets from the set of event-relevant tweets.
- In addition, we propose to summarize the event from the structure and user perspectives.
- A fundamental problem is sub-event detection.

Sub-event Detection

- An event usually contains several finer-grained stages, and detection algorithms generally cannot detect every stage of an event.
- Detecting all possible sub-events provides a basis for studying deeper properties of the event.
- Both the volume [2,3] and the content [1] of an event provide signals of sub-event occurrence. Compared with the volume curve, we consider the content more trustworthy, because the volume curve depends largely on the retrieval results and on users' publishing patterns.
- To solve this problem, we must overcome the following two difficulties:
  - Retrieval: How to retrieve high-quality tweets about the event?
  - Sub-event: How to detect all sub-events in an online manner?

[1] Akshaya Iyengar et al. Content-based prediction of temporal boundaries for events in Twitter. SocialCom 2011
[2] Jeffrey Nichols et al. Summarizing sporting events using Twitter. IUI 2012
[3] Arkaitz Zubiaga et al. Towards real-time summarization of scheduled events from Twitter streams

1. How to retrieve high-quality tweets about this event?

Common practice: use the event keywords as a query to search the tweet collection.

- The following three factors remain a large obstacle to employing standard retrieval methods:
  - A. Seemingly relevant tweets with good textual quality might not be truly relevant to the event;
  - B. Tweets highly relevant to the event might not contain any of the query keywords;
  - C. The query keywords might not represent the event comprehensively, and might even provide a noisy indicator.
- To solve A, besides the relevance score returned by Elasticsearch, we can integrate other features, such as tweet-specific features and publisher features, to reorder the search results.
- To solve B and C, we can use event keyword expansion [1], taking the burstiness of a term into consideration alongside the traditional TF-IDF value when selecting expansion terms.

[1] Metzler D, Cai C, Hovy E. Structured event retrieval over microblog archives. ACL 2012: 646-655.
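The expansion-term selection can be sketched as follows. This is an illustration under stated assumptions, not the scoring used in CLEar or in [1]: burstiness is taken to be the log ratio of a term's smoothed rate in the event window to its rate in an earlier background window, and it is mixed linearly with TF-IDF through a hypothetical parameter `alpha`. The function name and arguments are likewise illustrative.

```python
import math
from collections import Counter


def expansion_terms(event_tweets, background_tweets, query, top_k=5, alpha=0.5):
    """Rank candidate expansion terms by a mix of TF-IDF and burstiness.

    event_tweets: list of tokenized tweets retrieved for the event window.
    background_tweets: list of tokenized tweets from an earlier window.
    query: set of the original event keywords (never returned as expansions).
    """
    tf = Counter(t for tweet in event_tweets for t in tweet)
    bg = Counter(t for tweet in background_tweets for t in tweet)
    df = Counter(t for tweet in event_tweets + background_tweets for t in set(tweet))
    n_docs = len(event_tweets) + len(background_tweets)
    n_event, n_bg = sum(tf.values()), sum(bg.values())

    scores = {}
    for term, count in tf.items():
        if term in query:
            continue
        tfidf = (count / n_event) * math.log((1 + n_docs) / (1 + df[term]))
        # Burstiness: add-one smoothed rate in the event window relative to
        # the background window, on a log scale.
        rate_event = (count + 1) / (n_event + 1)
        rate_bg = (bg[term] + 1) / (n_bg + 1)
        burst = math.log(rate_event / rate_bg)
        scores[term] = alpha * tfidf + (1 - alpha) * burst
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:top_k]]
```

Terms that are frequent in the event window but also common in the background window receive a negative burstiness and fall to the bottom of the ranking, which is the effect that TF-IDF alone does not provide.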

2. How to detect all sub-events in an online manner?

- Topic model: high complexity, and its output is usually a general topic.
- Event boundary prediction: can only divide the event into before, during, and after.
- We propose to first divide the event duration into equal-sized, non-overlapping timespans, then merge adjacent timespans into sub-events in chronological order.
- Finally, we should verify each sub-event's popularity and reliability to filter out spurious sub-events. Reliability can be measured by the total number of followers of all publishers, while popularity can be reflected by the number of retweets.
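The split-then-merge step can be sketched as below. The slides do not state the merging criterion, so this sketch assumes that two adjacent timespans are merged when the cosine similarity of their term counts reaches a threshold; the function names, the threshold, and the choice to skip empty timespans are illustrative assumptions. The popularity and reliability filter is not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def detect_sub_events(tweets, start, end, span, threshold=0.3):
    """Split [start, end) into equal timespans and merge similar neighbours.

    tweets: list of (timestamp, tokens) pairs.
    Returns (span_start, span_end, term_counts) tuples in chronological order.
    """
    n_spans = math.ceil((end - start) / span)
    bins = [Counter() for _ in range(n_spans)]
    for ts, tokens in tweets:
        if start <= ts < end:
            bins[int((ts - start) // span)].update(tokens)

    sub_events = []
    for i, terms in enumerate(bins):
        if not terms:
            continue  # assumption: timespans without tweets are skipped
        lo, hi = start + i * span, min(start + (i + 1) * span, end)
        if sub_events and cosine(sub_events[-1][2], terms) >= threshold:
            # Content is close to the running sub-event: extend it.
            prev_lo, _, prev_terms = sub_events[-1]
            sub_events[-1] = (prev_lo, hi, prev_terms + terms)
        else:
            sub_events.append((lo, hi, terms))
    return sub_events
```

Since timespans are consumed strictly in chronological order and only the most recent sub-event is ever extended, the procedure can run incrementally as new timespans close.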

Event Contextualization

- Find a representative picture of the event.
- Find related news about the event.