Transcript CatchSync! - Meng Jiang
CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS
Meng Jiang, Tsinghua University, Beijing, China
Joint work with Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang
August 26, 2014 – NYC, USA
Fraud Detection: Graph Analysis Problem [www.buyfollowz.org] [buymorelikes.com]
Fraud Detection: Graph Analysis Problem [buycheaplikes.com] [reviewsteria.com]
Our Goals • Given: A graph (large-scale, directed, etc.) • Find: Frauds = Anomalous edges • Goals: • G1. Find patterns that distinguish fraudsters from normal users • G2. Design algorithms that catch fraudsters
OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments
Anomalies in Degree Distributions • Power-law distribution
[Figure: degree distributions of three networks: DBLP author-publication, Flickr user-user, Twitter who-follows-whom. Source: konect.uni-koblenz.de/networks/]
Anomalies in Degree Distributions
[Figure: degree distribution; plot annotations: 41M, 2009, 3.17M, 0.41M, d=20]
Linear Classifier with “Degree”: Fail
[Figure: a classifier predicts the label (+1 = fraud, -1) from out-degree alone, asking "d=20?"; it fails (×). Plot annotations: 3.17M, 0.41M]
Graph Structure Distorted
[Figure: degree distribution; plot annotations: 117M, 2011, 1.91M, 0.44M, d=64]
Traditional Fraud Detection
[Figure: a classifier predicts the label (+1 = fraud, -1) by asking whether each feature is big or small. Features: out-degree, in-degree, #tweet, #url in tweets, #hashtag in tweets; content-based features]
Empty Profile?
Few Followers?
Many Followings?
Content: Unavailable? Look Normal?
[Figure: the same classifier (label +1, -1) with features out-degree, in-degree, #tweet, #url in tweets, #hashtag in tweets; the content-based features read 0, 0, 0… sorry]
Behavior is the Key • Monetary incentive • Content: how they appear to behave • Behavior/Links: how they have to behave
OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments
Behavior-based Features
• Follower behavior: ≈ out-degree; 1st left singular vector (hubness); 2nd … left singular vector
• Followee behavior: ≈ in-degree; 1st right singular vector (authoritativeness); 2nd … right singular vector
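The features listed above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function name `behavior_features`, the dense adjacency matrix, and the sign convention are all choices made for this example, and a real graph with millions of nodes would need a sparse solver instead.

```python
import numpy as np

def behavior_features(adj, k=2):
    """Degree and singular-vector features for a small directed graph.

    adj[i, j] = 1 means node i follows node j (assumed convention).
    Returns (out_degree, in_degree, U, V) where column r of U is the
    (r+1)-th left singular vector (follower side, hubness for r = 0) and
    column r of V is the (r+1)-th right singular vector (followee side,
    authoritativeness for r = 0).
    """
    adj = np.asarray(adj, dtype=float)
    out_degree = adj.sum(axis=1)   # number of followees per node
    in_degree = adj.sum(axis=0)    # number of followers per node
    u, _, vt = np.linalg.svd(adj, full_matrices=False)
    U, V = u[:, :k].copy(), vt[:k, :].T.copy()
    # Singular vectors are only defined up to sign. Flip each pair together
    # so that the follower-side vector has a non-negative sum.
    for r in range(k):
        if U[:, r].sum() < 0:
            U[:, r] *= -1
            V[:, r] *= -1
    return out_degree, in_degree, U, V

# Toy graph: nodes 0 and 1 follow nodes 3 and 4; node 2 follows node 3 only.
toy = [[0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]
out_deg, in_deg, U, V = behavior_features(toy)
```

In the toy graph, node 3 has more followers than node 4, so its entry in the first right singular vector is larger, which mirrors the "≈ in-degree" reading of authoritativeness on the slide.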
Behavior-based Feature Space Follower behavior Followee behavior
Fraudulent Behavior Patterns
Fraudulent Behavior Patterns • Synchronized • Abnormal
OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments
Synchronicity and Normality
• Synchronicity
• Normality
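The slides do not carry the formulas for these two measures, so the following is a sketch under explicit assumptions rather than the definition from the talk. It assumes that each followee is mapped to a cell of a grid laid over the followee feature space, that synchronicity of a user is the chance that two of its followees (drawn with replacement) share a cell, and that normality is the chance that one of its followees shares a cell with a node drawn from the whole graph. The names `grid_cells` and `sync_norm` are hypothetical.

```python
import numpy as np
from collections import Counter

def grid_cells(points, n_bins=10):
    """Map each feature point to a grid cell id (a tuple of bin indices)."""
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    idx = np.minimum(((points - lo) / span * n_bins).astype(int), n_bins - 1)
    return [tuple(int(x) for x in row) for row in idx]

def sync_norm(followees, cell_of, background):
    """Return (synchronicity, normality) for one user.

    followees:  node ids the user follows
    cell_of:    dict node id -> grid cell id
    background: dict grid cell id -> fraction of all nodes in that cell
    """
    d = len(followees)
    if d == 0:
        return 0.0, 0.0
    counts = Counter(cell_of[v] for v in followees)
    fractions = {g: c / d for g, c in counts.items()}
    sync = sum(f * f for f in fractions.values())
    norm = sum(f * background.get(g, 0.0) for g, f in fractions.items())
    return sync, norm

# Toy data: six nodes in a popular cell "a", two nodes in a rare cell "z".
cell_of = {1: "a", 2: "a", 3: "a", 4: "a", 5: "a", 6: "a", 7: "z", 8: "z"}
background = {"a": 0.75, "z": 0.25}
suspicious = sync_norm([7, 8], cell_of, background)     # all followees in the rare cell
ordinary = sync_norm([1, 2, 7], cell_of, background)    # followees spread out
cells = grid_cells([[0.0, 0.0], [0.05, 0.05], [1.0, 1.0]])
```

Under these assumptions the user whose followees all sit in the rare cell gets high synchronicity and low normality, which is the combination the slides associate with fraud.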
Synchronicity-Normality Plot
[Figure: synchronicity-normality plot; axes: synchronicity, normality]
Theorem
• For any distribution, there is a parabolic lower limit in the synchronicity-normality plot.
• Proof. See our paper.
CatchSync Algorithm • Distance-based anomaly detection • Fraudsters • Big synchronicity • Small normality • Away from the densest
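The three criteria on this slide can be turned into a small distance-based check. The sketch below is one plausible reading, not the procedure from the paper: it assumes the "densest" region is the fullest cell of a 2D histogram over the synchronicity-normality plane, and it flags points whose distance from that cell exceeds the mean distance by `n_sigma` standard deviations. The function name `flag_suspicious` and both parameters are hypothetical.

```python
import numpy as np

def flag_suspicious(sync, norm, n_bins=20, n_sigma=3.0):
    """Flag points with big synchronicity, small normality, far from the densest cell."""
    sync = np.asarray(sync, dtype=float)
    norm = np.asarray(norm, dtype=float)
    # Locate the densest cell of the synchronicity-normality plane.
    hist, n_edges, s_edges = np.histogram2d(norm, sync, bins=n_bins)
    i, j = np.unravel_index(np.argmax(hist), hist.shape)
    center_norm = 0.5 * (n_edges[i] + n_edges[i + 1])
    center_sync = 0.5 * (s_edges[j] + s_edges[j + 1])
    # Euclidean distance of every point from the center of that cell.
    dist = np.hypot(norm - center_norm, sync - center_sync)
    threshold = dist.mean() + n_sigma * dist.std()
    return (dist > threshold) & (sync > center_sync) & (norm < center_norm)

# Toy data: 200 ordinary points near (normality 0.5, synchronicity 0.1)
# followed by 3 points with small normality and big synchronicity.
k = np.arange(200)
norm_vals = np.concatenate([0.5 + 0.01 * ((k % 5) - 2), [0.05, 0.05, 0.05]])
sync_vals = np.concatenate([0.1 + 0.01 * ((k // 5 % 5) - 2), [0.95, 0.95, 0.95]])
flags = flag_suspicious(sync_vals, norm_vals)
```

The direction tests (`sync > center_sync`, `norm < center_norm`) keep the check one-sided, so a point that is far from the dense region but has low synchronicity is not flagged.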
OUTLINE
1. Background
2. Fraudulent Pattern Mining
3. The Algorithm
4. Experiments
Experiments
• Q1: Does CatchSync remove anomalies?
  • Degree distribution
  • Feature space
• Q2: Is CatchSync catching actually fraudulent users?
• Q3: Is CatchSync robust?
Q1: Does CatchSync Remove Anomalies?
[Figure: degree distribution; plot annotations: 41M, 2009, 3.17M, 0.41M, d=20]
[Figure: degree distribution; plot annotations: 117M, 2011]
[Figure: Before CatchSync; panels: Follower behavior, Followee behavior]
[Figure: After CatchSync; panels: Follower behavior, Followee behavior]
Q2: Is CatchSync Catching Actually Fraudulent Users?
[Figure: plot annotations: 173/1,000, 237/1,000]
[Figure: bar chart on a 0 to 1 scale comparing CatchSync+SPOT, CatchSync, SPOT, OutRank; bar values: 0.813, 0.751, 0.597, 0.412]
[Figure: bar chart on a 0 to 1 scale comparing CatchSync+SPOT, CatchSync, SPOT, OutRank; bar values: 0.785, 0.694, 0.653, 0.377]
• Recall = 80%
• Precision in Twitter: 83.5%
• Precision in Tencent Weibo: 79.4%
Q3: Is CatchSync Robust to Camouflage?
[Figure: Target, Popular camouflage, Random camouflage]
[Figure: results under Popular camouflage and Random camouflage]
Conclusion
• Goals
• G1. Find patterns that distinguish fraudulent user behavior from normal behavior
  • A1: Synchronized & Abnormal!
• G2. Design algorithms that catch fraudsters
  • A2: CatchSync!
    • Remove spikes
    • Content free
    • Robust to camouflage
Questions?
Meng Jiang [email protected]
http://www.meng-jiang.com