CatchSync! - Meng Jiang


CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS
Meng Jiang, Tsinghua University, Beijing, China
Joint work with Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang
August 26, 2014 – NYC, USA

Fraud Detection: Graph Analysis Problem
[www.buyfollowz.org] [buymorelikes.com] [buycheaplikes.com] [reviewsteria.com]

Our Goals
• Given: A graph (large-scale, directed, etc.)
• Find: Frauds = Anomalous edges
• Goals:
  • G1. Find patterns that distinguish fraudsters from normal users
  • G2. Design algorithms that catch fraudsters

OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments

Anomalies in Degree Distributions
• Power-law distribution

[Figure: degree distributions of three networks: DBLP author-publication, Flickr user-user, Twitter who-follows-whom. Source: konect.uni-koblenz.de/networks/]

Anomalies in Degree Distributions

[Figure: degree distribution with a spike at d=20. Plot annotations: 41M, 2009, 3.17M, 0.41M]
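A spike such as the one at d=20 can be surfaced by comparing the number of users at each degree with the counts at the neighboring degrees. The following is a minimal sketch of that idea, not a method from the talk: the function name `find_degree_spikes`, the `ratio` threshold, and the synthetic data are all hypothetical.

```python
from collections import Counter
import math

def find_degree_spikes(out_degrees, ratio=3.0):
    """Return degrees whose user count exceeds `ratio` times the geometric
    mean of the counts at the two adjacent degrees (d-1 and d+1).

    Illustrative heuristic only: on a smooth power law the geometric mean
    of the neighbors is close to the count at d itself.
    """
    counts = Counter(out_degrees)
    spikes = []
    for d, c in sorted(counts.items()):
        left, right = counts.get(d - 1, 0), counts.get(d + 1, 0)
        if left == 0 or right == 0:
            continue  # no neighbors on both sides to compare against
        expected = math.sqrt(left * right)
        if c > ratio * expected:
            spikes.append(d)
    return spikes

# Synthetic example: counts fall off as 1/d^2, plus 200 extra users at d=20.
degrees = [d for d in range(1, 51) for _ in range(10000 // d**2)]
degrees += [20] * 200
print(find_degree_spikes(degrees))  # [20]
```

A real degree distribution is noisier in the tail, so a practical version would likely bin degrees logarithmically before comparing neighbors.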

Linear Classifier with “Degree”: Fail

[Diagram: Out-degree → classifier → Label (+1, -1), with +1 = Fraud; the test “=20?” is marked ×. Plot annotations: 3.17M, 0.41M, d=20]

Graph Structure Distorted

[Figure: degree distribution with a spike at d=64. Plot annotations: 117M, 2011, 1.91M, 0.44M]

Traditional Fraud Detection

[Diagram: features → classifier → Label (+1, -1), with +1 = Fraud]
• Out-degree: Big?
• In-degree: Small?
• #tweet: Big?
• #url in tweets: Big?
• #hashtag in tweets: Big?
• Content-based features

Empty Profile?

Few Followers?

Many Followings?

Content: Unavailable? Look Normal?

[Diagram: classifier with inputs Out-degree, In-degree, #tweet, #url in tweets, #hashtag in tweets, and content-based features; output Label (+1, -1). Annotation: “0, 0, 0… sorry”]

Behavior is the Key
• Monetary incentive
• Content: how they appear to behave
• Behavior/Links: how they have to behave

OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments

Behavior-based Features
• Follower behavior ≈ out-degree, 1st left singular vector (hubness), 2nd … left singular vector
• Followee behavior ≈ in-degree, 1st right singular vector (authoritativeness), 2nd … right singular vector
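The features listed above can be computed from the adjacency matrix of the who-follows-whom graph. A minimal sketch, assuming a small dense matrix `A` with `A[i, j] = 1` when user i follows user j; the function name `behavior_features`, the parameter `k`, and the toy graph are hypothetical, and a graph of realistic size would need a sparse truncated SVD instead of `np.linalg.svd`.

```python
import numpy as np

def behavior_features(A, k=2):
    """Return (follower_feats, followee_feats) for adjacency matrix A.

    follower_feats[i] = [out-degree, first k left singular vector entries]
    followee_feats[j] = [in-degree,  first k right singular vector entries]
    """
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Singular vectors are defined up to a joint sign flip of (u, v);
    # pick the sign that makes each left vector sum to a non-negative value.
    signs = np.sign(U[:, :k].sum(axis=0))
    signs[signs == 0] = 1.0
    follower = np.column_stack([A.sum(axis=1), U[:, :k] * signs])
    followee = np.column_stack([A.sum(axis=0), Vt[:k, :].T * signs])
    return follower, followee

# Toy graph with 4 users; user 2 is followed by three others.
A = [[0, 1, 1, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 0, 1, 0]]
follower, followee = behavior_features(A)
print(follower[:, 0])  # out-degrees
print(followee[:, 1])  # authoritativeness (1st right singular vector)
```

In this toy graph user 2 has both the highest in-degree and the largest authoritativeness score, which is the sense in which the first right singular vector tracks in-degree.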

Behavior-based Feature Space

[Figure: feature-space plots for follower behavior and followee behavior]


Fraudulent Behavior Patterns • Synchronized • Abnormal

OUTLINE
1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments

Synchronicity and Normality
• Synchronicity
• Normality
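One way to make these two measures concrete is a grid-based reading, which is an assumption of this sketch rather than something the slides spell out: every followee is assigned to a cell of a grid over the followee feature space, and two followees count as close when they fall in the same cell. Under that assumption, the synchronicity of a follower is the chance that two of its followees share a cell, and its normality is the chance that one of its followees shares a cell with a node drawn from the whole graph. The function name `sync_and_norm` and the toy data are hypothetical.

```python
import numpy as np

def sync_and_norm(followee_cells, all_cells, n_cells):
    """Grid-based synchronicity and normality of one follower (a sketch).

    followee_cells: grid-cell id of each followee of this follower
    all_cells:      grid-cell id of every node in the graph (background)
    """
    # p[g]: fraction of this follower's followees in cell g
    p = np.bincount(followee_cells, minlength=n_cells) / len(followee_cells)
    # b[g]: fraction of all nodes in cell g
    b = np.bincount(all_cells, minlength=n_cells) / len(all_cells)
    synchronicity = float(p @ p)  # followees close to each other
    normality = float(p @ b)      # followees close to the rest of the graph
    return synchronicity, normality

all_cells = [0] * 6 + [1] * 3 + [2] * 1           # background over 3 cells
normal = sync_and_norm([0, 0, 1, 0, 1, 2], all_cells, 3)
fraud = sync_and_norm([2, 2, 2, 2], all_cells, 3)  # all followees in a rare cell
print(normal, fraud)
```

The second follower's followees all sit in one sparsely populated cell, so it gets high synchronicity and low normality, which matches the "synchronized and abnormal" pattern.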

Synchronicity-Normality Plot

Theorem
• For any distribution, there is a parabolic lower limit in the synchronicity-normality plot.
• Proof. See our paper.

[Figure: synchronicity-normality plot; axes: synchronicity, normality]
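The shape of such a bound can be illustrated under a grid-based assumption that is not stated on the slide: let p be the distribution of one follower's followees over M grid cells and b the distribution of all nodes over the same cells, with synchronicity s = Σ p_g², normality n = Σ p_g b_g, and s_b = Σ b_g². Minimizing s for fixed n with Lagrange multipliers gives s_min(n) = (M n² − 2n + s_b) / (M s_b − 1), a parabola in n. The sketch below checks this numerically on random distributions; `parabolic_lower_limit` and the example `b` are hypothetical names and values.

```python
import numpy as np

def parabolic_lower_limit(n, b):
    """Lower bound on synchronicity at normality n for background
    distribution b (assumed non-uniform, so that M * s_b > 1)."""
    b = np.asarray(b, dtype=float)
    M, s_b = len(b), float(b @ b)
    return (M * n**2 - 2 * n + s_b) / (M * s_b - 1)

rng = np.random.default_rng(0)
b = np.array([0.6, 0.3, 0.1])
violations = 0
for _ in range(1000):
    p = rng.dirichlet(np.ones(len(b)))      # random followee distribution
    sync, norm = float(p @ p), float(p @ b)
    if sync < parabolic_lower_limit(norm, b) - 1e-12:
        violations += 1
print(violations)  # 0
```

The bound is tight when p equals b: plugging n = s_b into the formula returns s_b, which is the synchronicity of a follower whose followees are spread exactly like the background.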

CatchSync Algorithm
• Distance-based anomaly detection
• Fraudsters
  • Big synchronicity
  • Small normality
  • Away from the densest
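The bullets above can be sketched as a simple outlier rule in the synchronicity-normality plane. This is a stand-in for illustration, not the CatchSync procedure itself: the robust z-score, the threshold `k`, and the name `flag_suspicious` are all assumptions of this sketch, with the per-axis median playing the role of the dense region.

```python
import numpy as np

def flag_suspicious(sync, norm, k=3.0):
    """Flag points far from the bulk with big synchronicity and small normality.

    Distances are measured in robust z-score units (median / MAD) so that
    a few extreme points do not shift the notion of "the bulk".
    """
    sync = np.asarray(sync, dtype=float)
    norm = np.asarray(norm, dtype=float)

    def robust_z(x):
        med = np.median(x)
        mad = np.median(np.abs(x - med)) or 1e-12  # guard against MAD == 0
        return (x - med) / mad

    z_sync, z_norm = robust_z(sync), robust_z(norm)
    dist = np.hypot(z_sync, z_norm)  # distance from the median point
    return (dist > k) & (z_sync > 0) & (z_norm < 0)

sync = [0.10, 0.12, 0.11, 0.09, 0.13, 0.95]
norm = [0.40, 0.42, 0.38, 0.41, 0.39, 0.05]
flags = flag_suspicious(sync, norm)
print(flags)
```

Only the last point, with synchronicity 0.95 and normality 0.05, lies far from the others in the direction the slide describes.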

OUTLINE
1. Background
2. Fraudulent Pattern Mining
3. The Algorithm
4. Experiments

Experiments
• Q1: Does CatchSync remove anomalies?
  • Degree distribution
  • Feature space
• Q2: Is CatchSync catching actually fraudulent users?
• Q3: Is CatchSync robust?

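Q1: Does CatchSync Remove Anomalies?

[Figure: degree distribution. Plot annotations: 41M, 2009, 3.17M, 0.41M, d=20]

[Figure: degree distribution. Plot annotations: 117M, 2011]

[Figure: Before CatchSync: follower behavior and followee behavior feature spaces]

[Figure: After CatchSync: follower behavior and followee behavior feature spaces]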

Q2: Is CatchSync Catching Actually Fraudulent Users?

[Figure annotations: 173/1,000 and 237/1,000]

[Bar chart, axis from 0 to 1. Methods: CatchSync+SPOT, CatchSync, SPOT, OutRank. Value labels: 0.813, 0.751, 0.412, 0.597]

[Bar chart, axis from 0 to 1. Methods: CatchSync+SPOT, CatchSync, SPOT, OutRank. Value labels: 0.377, 0.785, 0.694, 0.653]

Recall = 80%
• Precision in Twitter: 83.5%
• Precision in Tencent Weibo: 79.4%

Q3: Is CatchSync Robust to Camouflage?

[Diagram labels: Target, Popular camouflage, Random camouflage]
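A camouflage experiment of this kind can be set up by injecting synthetic followers into a graph. The sketch below assumes one particular reading of the labels above, which the slide itself does not define: injected followers all follow the same targets, and in addition follow either randomly chosen nodes ("random") or the nodes with the highest in-degree ("popular"). The function `inject_fraud` and all its parameters are hypothetical.

```python
import random
from collections import Counter

def inject_fraud(edges, n_nodes, targets, n_fraud, n_camouflage, mode, seed=0):
    """Return the new (follower, followee) edges for n_fraud injected followers.

    Each injected follower follows every node in `targets` plus
    `n_camouflage` extra nodes chosen according to `mode`:
      "random"  - uniformly random existing nodes
      "popular" - the existing nodes with the highest in-degree
    """
    rng = random.Random(seed)
    in_deg = Counter(dst for _, dst in edges)
    popular = [v for v, _ in in_deg.most_common(n_camouflage)]
    new_edges = []
    for i in range(n_fraud):
        u = n_nodes + i  # fresh node id for the injected follower
        if mode == "random":
            camo = rng.sample(range(n_nodes), n_camouflage)
        elif mode == "popular":
            camo = popular
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Use a set so a camouflage pick that is also a target is not doubled.
        for v in sorted(set(targets) | set(camo)):
            new_edges.append((u, v))
    return new_edges

edges = [(0, 1), (2, 1), (3, 1), (0, 2), (3, 2), (1, 0)]
popular_edges = inject_fraud(edges, 5, [4], 3, 2, "popular")
random_edges = inject_fraud(edges, 5, [4], 3, 2, "random")
print(popular_edges)
```

Running a detector on the original edges plus `popular_edges` or `random_edges` then shows whether the injected followers are still caught once part of their behavior looks ordinary.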

[Figure panels: Popular camouflage, Random camouflage]

Conclusion
• Goals
  • G1. Find patterns that distinguish fraudulent user behavior from normal behavior
    • A1: Synchronized & Abnormal!
  • G2. Design algorithms that catch fraudsters
    • A2: CatchSync!
      • Remove spikes
      • Content free
      • Robust to camouflage

Questions?

Meng Jiang

http://www.meng-jiang.com
