Research poster 48 x 48 - H


Mining Cross-network Association for
YouTube Video Promotion
Ming Yan, Jitao Sang, Changsheng Xu*
1 Institute of Automation, Chinese Academy of Sciences, Beijing, 100190.
2 China-Singapore Institute of Digital Media, Singapore, 139951.
{ming.yan, jtsang, csxu}@nlpr.ia.ac.cn
• Large quantities of videos are consumed on YouTube, and the trend is growing year by year.
• YouTube exhibits limited propagation efficiency, and many videos remain unknown to the wider public due to its limited internal mechanism.
• External referrers such as social media websites have become important sources leading users to YouTube videos, among which Twitter has recently grown to be the top referrer.
• The followee-follower, user-centric architecture distinguishes Twitter with significant information propagation efficiency.
• Motivation: For a specific YouTube video, identify proper Twitter followees with the goal of maximizing video dissemination to their followers.
2. Regression-based Association
$\min_{A} \|U_o^T - A\,U_o^Y\|_2^2 + \lambda_1 \|A\|_q$
Generation process
• Sample $\theta \sim Dir(\theta \mid \alpha)$
• For each textual word $w_m$, $m \in \{1, \dots, M\}$:
  • Sample $z_m \sim Multi(\theta)$
  • Sample $w_m \sim p(w \mid z_m, \mu, \sigma)$ from a multivariate Gaussian distribution conditioned on $z_m$
• For each video keyframe $f_n$, $n \in \{1, \dots, N\}$:
  • Sample $y_n \sim Unif(1, \dots, M)$
  • Sample $f_n \sim p(f \mid y_n, \mathbf{z}, \beta)$ from a multinomial distribution conditioned on the $z_{y_n}$ factor.
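The generation process of the modified Corr-LDA model can be made concrete with a small sampler. This is only a minimal sketch under stated assumptions: the Gaussian over word features is taken to be diagonal, the visual vocabulary is a hypothetical list of integer codeword ids, and the function names (`sample_dirichlet`, `generate_video`) are illustrative rather than taken from the poster.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_video(alpha, mu, sigma, beta, M, N, seed=0):
    """Sample one synthetic video with M word vectors and N keyframe tokens.

    mu[k], sigma[k]: per-topic mean / std-dev lists (diagonal Gaussian, an assumption).
    beta[k]: per-topic multinomial over a hypothetical visual vocabulary.
    """
    rng = random.Random(seed)
    K = len(alpha)
    theta = sample_dirichlet(alpha, rng)

    z, words = [], []
    for _ in range(M):
        z_m = rng.choices(range(K), weights=theta)[0]               # z_m ~ Multi(theta)
        w_m = [rng.gauss(m, s) for m, s in zip(mu[z_m], sigma[z_m])]  # Gaussian word feature
        z.append(z_m)
        words.append(w_m)

    y, frames = [], []
    for _ in range(N):
        y_n = rng.randrange(M)                                       # y_n ~ Unif over the M words
        topic = z[y_n]                                               # keyframe reuses that word's topic
        f_n = rng.choices(range(len(beta[topic])), weights=beta[topic])[0]
        y.append(y_n)
        frames.append(f_n)
    return theta, z, words, y, frames


theta, z, words, y, frames = generate_video(
    alpha=[0.5, 0.5],
    mu=[[0.0, 0.0], [5.0, 5.0]],
    sigma=[[1.0, 1.0], [1.0, 1.0]],
    beta=[[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]],
    M=4,
    N=3,
)
```

Each keyframe is tied to one of the video's words through `y_n`, which is what lets the topics span both the textual and the visual space.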
Introduction
For our video promotion application, we claim that large exposure to more audiences alone is not enough; what video promotion cares about is the number of “effective” audiences, who are likely to show interest in the target video and have a higher probability of taking subsequent consuming actions. Therefore, video promotion should both match the interest of the Twitter followers and be cost-effective.
Since the properness of a Twitter followee is decided by the followers, we are interested in investigating the followee-follower architecture in Twitter. Therefore, we represent each Twitter user (document) by all his/her followees (words) and apply standard LDA on the user social graph for topic modeling.
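The "user as document, followees as words" representation can be sketched as below. This is an illustrative preprocessing step only, with hypothetical account names; it builds the bag-of-followees input that a standard LDA implementation would consume and does not run LDA itself.

```python
from collections import Counter


def build_followee_corpus(following):
    """Turn {user: [followee, ...]} into a vocabulary and bag-of-followees rows."""
    vocab = sorted({f for followees in following.values() for f in followees})
    index = {f: i for i, f in enumerate(vocab)}
    corpus = {}
    for user, followees in following.items():
        counts = Counter(index[f] for f in followees)
        corpus[user] = sorted(counts.items())  # sparse (followee_id, count) pairs
    return vocab, corpus


# Hypothetical toy social graph: each user "document" lists the accounts it follows.
following = {
    "alice": ["nasa", "bbc"],
    "bob": ["bbc", "espn", "nba"],
}
vocab, corpus = build_followee_corpus(following)
```

Here `vocab` plays the role of the word dictionary and each entry of `corpus` is one document in sparse form.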
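Topic Number Selection
$Perplexity(\mathcal{D}_{test}) = \exp\left\{ -\frac{\sum_{d \in \mathcal{D}_{test}} \ln p(\mathbf{w}_d)}{\sum_{d \in \mathcal{D}_{test}} N_d} \right\}$
Advantage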
Figure 4. Perplexities for different topic
numbers on YouTube and Twitter
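The perplexity criterion used for topic number selection can be sketched as follows. The per-document log-likelihoods are assumed to come from an already trained topic model; the candidate topic numbers and perplexity values in the usage lines are hypothetical placeholders, not results from the poster.

```python
import math


def perplexity(log_likelihoods, doc_lengths):
    """exp(-sum_d ln p(w_d) / sum_d N_d) over a held-out test set."""
    if len(log_likelihoods) != len(doc_lengths):
        raise ValueError("one log-likelihood and one length per document")
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        raise ValueError("test set contains no tokens")
    return math.exp(-sum(log_likelihoods) / total_tokens)


def select_topic_number(perplexity_by_k):
    """Pick the candidate topic number with the lowest held-out perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


# Hypothetical perplexities for three candidate topic numbers.
best_k = select_topic_number({20: 310.0, 50: 275.5, 100: 290.2})
```

A model that assigns every token a probability of 1/4 has perplexity 4, which is a quick sanity check for the function.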
Figures 5 and 6 visualize some of the discovered topics in YouTube and Twitter, respectively.
Figure 1. Problem Illustration.
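$\min_{D^Y, D^T, S^Y, S^T} \|U^Y - D^Y S^Y\|_2^2 + \|U^T - D^T S^T\|_2^2 + \lambda_3 \|S_o\|_1 + \lambda_4 \|S_{non}^Y\|_1 + \lambda_5 \|S_{non}^T\|_1$
$s.t.\ \|\mathbf{d}^Y\| \le 1,\ \|\mathbf{d}^T\| \le 1,\ \forall \mathbf{d} \in D$
$\mathbf{s}^* = \arg\min_{\mathbf{s}} \|\mathbf{u}^Y - D^Y \mathbf{s}\|_2^2 + \lambda \|\mathbf{s}\|_1$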
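Once the dictionaries $D^Y$ and $D^T$ are available, the transfer amounts to sparse-coding a YouTube distribution and re-projecting the code, i.e. $\mathbf{u}^T = D^T \mathbf{s}^*$. The sketch below solves the lasso step with plain ISTA (iterative soft-thresholding), which is a swapped-in generic solver chosen for brevity and not necessarily the one used by the authors; the dictionaries in the usage lines are toy values.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_code(u, D, lam, n_iter=500):
    """Approximately solve min_s ||u - D s||_2^2 + lam * ||s||_1 with ISTA.

    D is a list of rows (topics x latent factors).
    """
    n_rows, n_cols = len(D), len(D[0])
    # The gradient 2 D^T (D s - u) is Lipschitz with constant at most 2 * ||D||_F^2.
    lipschitz = 2.0 * sum(d * d for row in D for d in row)
    if lipschitz == 0.0:
        return [0.0] * n_cols
    step = 1.0 / lipschitz
    s = [0.0] * n_cols
    for _ in range(n_iter):
        residual = [sum(D[i][j] * s[j] for j in range(n_cols)) - u[i] for i in range(n_rows)]
        grad = [2.0 * sum(D[i][j] * residual[i] for i in range(n_rows)) for j in range(n_cols)]
        s = [soft_threshold(s[j] - step * grad[j], step * lam) for j in range(n_cols)]
    return s


def transfer(u_y, D_y, D_t, lam):
    """Map a YouTube topic distribution into the Twitter topic space."""
    s_star = sparse_code(u_y, D_y, lam)
    return [sum(row[j] * s_star[j] for j in range(len(s_star))) for row in D_t]


# Toy dictionaries: 2 latent factors, 2 YouTube topics, 3 Twitter topics.
D_y = [[1.0, 0.0], [0.0, 1.0]]
D_t = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
u_t = transfer([1.0, 0.1], D_y, D_t, lam=0.4)
```

With an identity dictionary the lasso solution is the input soft-thresholded at `lam / 2`, so the small second coefficient is driven to zero before re-projection.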
Cross-network Topic Association
Innovation
• To address the discrepancy issue, we propose a solution that first aggregates YouTube video distributions to the user level and then exploits the overlapped users among different networks as a bridge for association mining.
• If the same group of users is heavily involved with topic A in network X and topic B in network Y, it is very likely that topics A and B are closely associated.
• How to define the Ground-Truth (GT) properness for each video-followee pair in the training set?
Both the follower interest and the cost of asking a Twitter followee for help should be considered. Here, we treat the follower number of the Twitter followee as the virtual cost, i.e., the more popular a Twitter followee is, the more cost must be paid to ask him/her for help. To consider both, the GT properness is defined from the precision and recall between the users interested in the video and the followers of the followee.
Aggregation of video topic distributions to the user level:
$p(z_k^Y \mid u) = \sum_{v \in V_u} \frac{N_v(f) + N_v(w)}{N(f) + N(w)} \cdot p(z_k^Y \mid v)$
$N_v(f)$, $N_v(w)$: the total number of keyframes and words in video $v$.
$N(f)$, $N(w)$: the total number of keyframes and words in $u$'s video set $V_u$.
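The user-level aggregation weights each video's topic distribution by its share of the keyframes and words in the user's video set. A minimal sketch, assuming each video is given as a hypothetical `(n_keyframes, n_words, topic_distribution)` tuple:

```python
def aggregate_user_topics(videos):
    """Weighted average of p(z|v) over a user's video set V_u.

    videos: list of (n_keyframes, n_words, topic_distribution) tuples.
    """
    total = sum(n_f + n_w for n_f, n_w, _ in videos)  # N(f) + N(w)
    if total == 0:
        raise ValueError("video set has no keyframes or words")
    n_topics = len(videos[0][2])
    user_dist = [0.0] * n_topics
    for n_f, n_w, dist in videos:
        weight = (n_f + n_w) / total                  # (N_v(f) + N_v(w)) / (N(f) + N(w))
        for k in range(n_topics):
            user_dist[k] += weight * dist[k]
    return user_dist


# Two toy videos: a short one fully on topic 0 and a longer, mixed one.
user_dist = aggregate_user_topics([
    (2, 8, [1.0, 0.0]),
    (10, 20, [0.5, 0.5]),
])
```

Because the weights sum to one, the result is again a probability distribution over the YouTube topics.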
To discover the latent structure within YouTube video and Twitter user spaces, and facilitate the subsequent analysis and applications at the topic level. Through this stage, each YouTube video and Twitter user can be represented as distributions in the derived corresponding topic spaces.
YouTube Video Topic Modeling
• The video topics are expected to span both textual and visual spaces. We introduce a modification to the multi-modal topic model, Corr-LDA, as depicted in Figure 3.
Figure 3. The graphical representation of inverse Corr-LDA
Various Approaches for Association Mining via Overlapped Users
Goal
• To enable topic distribution transfer between different networks, i.e., given a user's topical interest in YouTube videos, we can infer his/her most probably followed Twitter followee topics.
1. Transition Probability-based Association
• We assume the YouTube and Twitter user sets share the overlapped users $U_o = U^Y \cap U^T$. Viewed as a probabilistic transition problem, the topic association can be calculated by aggregating over all the overlapped users:
$a_{ij} = p(z_j^T \mid z_i^Y) = \sum_{u \in U_o} p(z_j^T \mid u) \cdot p(u \mid z_i^Y)$
$p(z_j^T \mid u_t) = \sum_{i=1}^{K^Y} a_{ij} \cdot p(z_i^Y \mid u_t)$
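The transition probability-based association and the resulting transfer can be sketched as below. One assumption is made explicit: $p(u \mid z_i^Y)$ is obtained from $p(z_i^Y \mid u)$ by Bayes' rule with a uniform prior over the overlapped users, which the poster does not specify. Function names are illustrative.

```python
def topic_association(yt_user_topics, tw_user_topics):
    """a[i][j] = sum_u p(z_j^T | u) * p(u | z_i^Y) over the overlapped users.

    Both inputs hold one row per overlapped user, in the same user order:
    yt_user_topics[u][i] = p(z_i^Y | u), tw_user_topics[u][j] = p(z_j^T | u).
    """
    n_users = len(yt_user_topics)
    k_y, k_t = len(yt_user_topics[0]), len(tw_user_topics[0])
    assoc = [[0.0] * k_t for _ in range(k_y)]
    for i in range(k_y):
        col_sum = sum(yt_user_topics[u][i] for u in range(n_users))
        if col_sum == 0.0:
            continue  # topic i is unused by every overlapped user
        for u in range(n_users):
            p_u_given_zi = yt_user_topics[u][i] / col_sum  # uniform p(u) assumed
            for j in range(k_t):
                assoc[i][j] += tw_user_topics[u][j] * p_u_given_zi
    return assoc


def transfer_distribution(assoc, yt_dist):
    """p(z_j^T | u_t) = sum_i a_ij * p(z_i^Y | u_t)."""
    k_t = len(assoc[0])
    return [sum(assoc[i][j] * yt_dist[i] for i in range(len(assoc))) for j in range(k_t)]


# Two overlapped users, two YouTube topics, two Twitter topics (toy numbers).
A = topic_association([[1.0, 0.0], [0.0, 1.0]], [[0.2, 0.8], [0.6, 0.4]])
tw_dist = transfer_distribution(A, [0.5, 0.5])
```

Each row of `A` is a distribution over Twitter topics, so a valid YouTube distribution is mapped to a valid Twitter distribution.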
With the topical distribution transfer enabled, in this third stage we first transfer the test video distribution to the Twitter topic space. We then match the YouTube video with Twitter followees in the same Twitter topic space using a ranking-SVM-based scheme.
Two critical issues should be addressed when utilizing the ranking-SVM training scheme:
• How to extract the video-followee pair features?
We define the video-followee pair features as the element-wise product between the transferred video distribution and the Twitter followee distribution:
$\phi(\mathbf{v}_t^T, \mathbf{u}^T) = \mathbf{v}_t^T \odot \mathbf{u}^T = \{v_{t,1}^T \cdot u_1^T, \dots, v_{t,K}^T \cdot u_K^T\}$
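The pair feature is simply a per-topic product of the two distributions in the Twitter topic space. A minimal sketch with an illustrative function name and toy inputs:

```python
def pair_features(video_tw_dist, followee_tw_dist):
    """Element-wise product of the transferred video and followee topic distributions."""
    if len(video_tw_dist) != len(followee_tw_dist):
        raise ValueError("both distributions must live in the same Twitter topic space")
    return [v * u for v, u in zip(video_tw_dist, followee_tw_dist)]


# A video concentrated on topics 0 and 1, a followee concentrated on topics 0 and 2.
features = pair_features([0.5, 0.5, 0.0], [0.2, 0.0, 0.8])
```

A feature dimension is large only when both the video and the followee put weight on the same topic, which is what a linear ranking model can then score.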
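YouTube User-Topic Distribution Aggregation
$p(z_k^Y \mid u_i)$: the YouTube topic distribution of user $u_i$, aggregated from the topic distributions of his/her videos.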
We utilize Mean Absolute Error (MAE) on half of the overlapped users to evaluate the performance of topical distribution transfer between YouTube and Twitter. The result is shown in Figure 7.
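The transfer error can be computed as sketched below. Averaging over every user-topic entry is an assumption here; the poster does not state whether the error is averaged per entry or per user.

```python
def mean_absolute_error(predicted, actual):
    """Mean |predicted - actual| over all users and Twitter topics."""
    total, count = 0.0, 0
    for pred_row, true_row in zip(predicted, actual):
        if len(pred_row) != len(true_row):
            raise ValueError("distributions must have the same number of topics")
        for p, t in zip(pred_row, true_row):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("no entries to evaluate")
    return total / count


# One held-out user whose transferred distribution is off by 0.1 on each topic.
mae = mean_absolute_error([[0.5, 0.5]], [[0.4, 0.6]])
```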
Twitter Referrer Identification
Figure 5. Visualization of discovered
YouTube topics
$\mathbf{u}^T = D^T \mathbf{s}^*$
Evaluation of Transfer Error
Assumption
Goal
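• This objective is convex in each of $D^Y$, $D^T$, $S_o$, $S_{non}^Y$, $S_{non}^T$ separately when the others are fixed, so we design an iterative algorithm to obtain the optimal $D^Y$, $D^T$.
• With the derived $D^Y$, $D^T$, topical distribution transfer between the YouTube and Twitter spaces can be enabled by first solving for the sparse code of a user's YouTube distribution under $D^Y$ and then projecting that code with $D^T$.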
Two Challenges
• The heterogeneous knowledge
association between YouTube video and
Twitter followee;
• How to define the “properness” of
candidate Twitter followee for a specific
YouTube video.
Heterogeneous Topic Modeling
We assume that the latent structure is actually the user attribute, i.e., it is the same user's unique attribute values that give rise to his/her different activities and thus to the cross-network topic distributions.
A YouTube factor and a Twitter factor are coupled to the same user attribute, and the same user should have identical coefficients when projected onto the coupled user factors.
Objective Function
Discovered Topic Visualization
Figure 2. The framework
To discover the shared latent structure behind the YouTube and Twitter topic spaces at the user level.
Assumption
To address the challenges one by one, we propose a three-stage framework as our solution:
• Heterogeneous Topic Modeling: To discover the latent structure within YouTube video and Twitter user spaces, respectively;
• Cross-network Topic Association: To address the discrepancy issue between the heterogeneous YouTube video and Twitter user spaces by mining cross-network topic association at a collective user level;
• Referrer Identification: To define the “properness” of a candidate Twitter followee for a specific YouTube video and match videos to followees with a ranking-based method.
• No explicit association matrix is needed
• Non-linear
• Non-overlapped users can also be utilized
Innovation
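In the perplexity, $\sum_{d \in \mathcal{D}_{test}} \ln p(\mathbf{w}_d)$ is the numerator and $\sum_{d \in \mathcal{D}_{test}} N_d$ is the denominator of the exponent.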
When q = 1, this is a lasso problem and can be solved effectively by the feature-sign search algorithm.
When q = 2, this is a ridge regression problem and has an analytical solution.
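For the q = 2 case, the analytical solution can be written out explicitly. The derivation below is a sketch that assumes both the reconstruction error and the penalty are squared Frobenius norms, which is the usual ridge setting but is not spelled out on the poster; $\top$ denotes the matrix transpose, to keep it distinct from the superscript $T$ for Twitter.

```latex
% Assumed ridge form of the regression-based association objective
J(A) = \| U_o^T - A\,U_o^Y \|_F^2 + \lambda_1 \| A \|_F^2

% Setting the gradient with respect to A to zero
\frac{\partial J}{\partial A}
  = -2\,(U_o^T - A\,U_o^Y)\,(U_o^Y)^{\top} + 2\lambda_1 A = 0

% Rearranging
A \left( U_o^Y (U_o^Y)^{\top} + \lambda_1 I \right) = U_o^T (U_o^Y)^{\top}

% Closed-form solution (the inverse exists for any \lambda_1 > 0)
A = U_o^T (U_o^Y)^{\top} \left( U_o^Y (U_o^Y)^{\top} + \lambda_1 I \right)^{-1}
```

The matrix being inverted has size $K^Y \times K^Y$, so the cost depends on the number of YouTube topics rather than on the number of overlapped users.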
3. Latent Attribute-based Association
Twitter Followee Topic Modeling
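• To deal with the noisy user distribution issue, a regression-based optimization approach is proposed, which learns the association matrix $A$ by minimizing a regularized reconstruction error over the overlapped users.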
$precision(v, u) = \frac{|U_v \cap U_u^{followee}|}{|U_u^{followee}|}$
$recall(v, u) = \frac{|U_v \cap U_u^{followee}|}{|U_v|}$
$GT\text{-}properness(v, u) = \frac{2}{precision(v, u)^{-1} + recall(v, u)^{-1}}$
$U_v$: the set of users showing interest in video $v$;
$U_u^{followee}$: the follower set of $u$.
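The GT properness is the harmonic mean of precision and recall between the interested users and the followee's followers. A minimal sketch using Python sets; returning 0 when the two sets do not overlap is a convention assumed here, since the formula itself is undefined in that case.

```python
def gt_properness(interested_users, followers):
    """Harmonic mean of precision and recall for one video-followee pair.

    interested_users: set U_v of users showing interest in video v.
    followers: follower set of the candidate followee u.
    """
    overlap = len(interested_users & followers)
    if overlap == 0:
        return 0.0  # assumed convention: no shared users means zero properness
    precision = overlap / len(followers)
    recall = overlap / len(interested_users)
    return 2.0 / (1.0 / precision + 1.0 / recall)


# Toy example: 4 interested users, 6 followers, 2 users in common.
score = gt_properness({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})
```

A followee with a very large follower set gets a low precision, which is how the follower count acts as the virtual cost.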
Evaluation for the final video promotion
• We utilize NDCG@5 to evaluate the final video promotion performance. The result is shown in Figure 8. It demonstrates the advantage of our proposed framework in promoting YouTube videos via the Twitter network compared with the other baselines.
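NDCG@5 for one test video can be computed as sketched below. The linear gain with a log2 discount is one common formulation and is an assumption here; the poster does not state which gain function was used. The relevance values would be the GT properness of the followees in ranked order.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=5):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # no relevant followee at all
    return dcg_at_k(relevances, k) / ideal


# Relevance of the followees in the order the model ranked them (toy values).
perfect = ndcg_at_k([3, 2, 1, 0, 0])
swapped = ndcg_at_k([0, 1])
```

A ranking already sorted by relevance scores 1.0, and placing the only relevant followee second instead of first lowers the score.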
Figure 6. Visualization of discovered Twitter topics
Figure 7. MAE for distribution transfer in Stage 2. (Bar chart; y-axis: MAE; settings: TP, Regression_, Regression_, LA_overlap, LA_all. Values printed on the bars, with the bar-to-value mapping not recoverable from the extraction: 0.0135, 0.0132, 0.0129, 0.0122, 0.0118.)
Figure 8. NDCG@5 for different settings. (Bar chart; y-axis: NDCG@5; settings: Random, Popularity, Regression+Direct, Regression+Weighted, LA_all+Direct, LA_all+Weighted. Values printed on the bars, with the bar-to-value mapping not recoverable from the extraction: 0.0742, 0.1062, 0.1213, 0.1233, 0.1271, 0.1325.)