Mobius North: GD - University of South Florida


GDC: Group Discovery using Co-location Traces
Steve Mardenfeld
Daniel Boston
Susan Juan Pan
Quentin Jones†
Adriana Iamnitchi‡
Cristian Borcea
Department of Computer Science,
New Jersey Institute of Technology
†Department of Information Systems, NJIT
‡Department of Computer Science, USF
Physical Groups

• Informally: groups of people that meet face to face
  ◦ Formal definition: Homans’ sociology book “The Human Group”
• Groups can be used in social or socially aware applications
  ◦ Recommender systems: recommend concerts to people who go to concerts together
  ◦ Data forwarding in delay-tolerant ad hoc networks: give priority to members of the same group as the destination when selecting the next hop
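A minimal sketch of the group-aware forwarding rule above, assuming each node knows which detected groups its neighbors and the destination belong to. The function `choose_next_hop`, the `groups_of` mapping, and the tie-breaking (most shared groups wins, and None means "keep carrying the message") are hypothetical illustrations, not part of GDC.

```python
def choose_next_hop(neighbors, destination, groups_of):
    """Pick a next hop, preferring neighbors that share groups with the destination.

    `groups_of` maps a user to the set of group ids assigned to them
    (hypothetical data model). Returns None when no neighbor shares a group,
    meaning the current node keeps carrying the message.
    """
    if destination in neighbors:
        return destination  # direct delivery beats any relay
    dest_groups = groups_of.get(destination, set())
    best, best_shared = None, 0
    for node in neighbors:
        shared = len(groups_of.get(node, set()) & dest_groups)
        if shared > best_shared:  # strictly more shared groups wins
            best, best_shared = node, shared
    return best


# Hypothetical group memberships.
GROUPS_OF = {"dst": {"g1", "g2"}, "x": {"g3"}, "y": {"g1"}, "z": {"g1", "g2"}}

if __name__ == "__main__":
    print(choose_next_hop(["x", "y", "z"], "dst", GROUPS_OF))
```

Here "z" is chosen because it shares two groups with the destination, while "x" shares none.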
How to detect groups automatically?
Group Detection Using Location Traces
• Users carry mobile phones and upload their locations to a central server
• The server analyzes location traces to detect groups
• In previous work, we developed an algorithm for group/place detection
  ◦ Achieved 96% accuracy with few false positives
• Problems:
  ◦ Location privacy
  ◦ Battery power
GDC: Use Bluetooth Co-location Traces
[Figure: phones of users A, B, and C upload Bluetooth co-location records (User, Seen, Time) to a server over the Internet]
• Advantages
  ◦ Improved location privacy
  ◦ Low power consumption
  ◦ Practicality due to Bluetooth ubiquity in mobile phones
  ◦ Accuracy due to Bluetooth transmission range
Challenges

• Attendance at a group is variable
• People may be merely passing near a group, not remaining part of it
• Group members spend different lengths of time with the group
• Sampling frequency and user mobility can affect data completeness
• Each user may have a different perspective on the same meeting
Outline

• GDC Algorithm
• User Study Results
• Distributed GDC
• Conclusions
GDC in a Nutshell

• Transform raw Bluetooth records into meeting records between pairs of users
• Discover and record all combinations of users appearing at the same meeting (user clusters)
• Resolve differences in user perspectives on shared clusters
• Select all significant clusters and output them as user groups
Creating Pair-wise Meeting Records
[Figure: raw Bluetooth records (User, With, Time Stamp) logged by users mak43, djb38, and jp238 are merged into pair-wise meeting records (User, With, Start Time, End Time)]
• Decreasing Meeting Granularity (MG) from 5 min to 2½ min produces noticeable changes
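The merge step can be sketched as follows, assuming (the slides do not say so explicitly) that MG acts as a gap threshold: consecutive sightings of the same pair no more than MG apart extend one meeting, and a longer gap starts a new one. The function `build_meetings` and the sample sightings for users A and B are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_meetings(sightings, mg=timedelta(minutes=5)):
    """Merge raw (user, seen, timestamp) Bluetooth sightings into
    (user, with, start, end) meeting records.

    Assumption: MG is a gap threshold. Sightings of the same pair at most
    `mg` apart belong to one meeting; a larger gap opens a new meeting.
    """
    by_pair = defaultdict(list)
    for user, seen, ts in sightings:
        by_pair[(user, seen)].append(ts)

    meetings = []
    for (user, seen), stamps in sorted(by_pair.items()):
        stamps.sort()
        start = end = stamps[0]
        for ts in stamps[1:]:
            if ts - end <= mg:
                end = ts  # close enough: extend the current meeting
            else:
                meetings.append((user, seen, start, end))
                start = end = ts  # gap too large: start a new meeting
        meetings.append((user, seen, start, end))
    return meetings


def t(hms):
    """Parse an HH:MM:SS time stamp (the date part is irrelevant here)."""
    return datetime.strptime(hms, "%H:%M:%S")


# Hypothetical sightings: phone A sees phone B three times.
SIGHTINGS = [("A", "B", t("11:00:00")), ("A", "B", t("11:03:00")), ("A", "B", t("11:10:00"))]

if __name__ == "__main__":
    for mg in (timedelta(minutes=5), timedelta(minutes=2, seconds=30)):
        print(mg, len(build_meetings(SIGHTINGS, mg)))
```

With the sample sightings, a 5-minute MG yields two meetings (11:00:00 to 11:03:00, then a lone sighting at 11:10:00), while 2½ minutes also splits the first one, giving three. This mirrors the sensitivity to MG noted on the slide.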
Creating User Clusters
Pair-wise meeting records, from each user's perspective:
User   With   Start Time  End Time
mak43  jp238  11:01:30    11:10:10
mak43  djb38  11:01:30    11:07:05
djb38  jp238  11:02:01    11:07:50
djb38  mak43  11:01:30    11:07:05
jp238  mak43  11:01:30    11:10:10
jp238  djb38  11:02:01    11:07:05

User clusters derived from those records:
User   Users With     Time Spent
mak43  jp238, djb38   00:05:35
mak43  jp238          00:08:40
mak43  djb38          00:05:35
djb38  jp238, mak43   00:05:04
djb38  jp238          00:05:49
djb38  mak43          00:05:35
jp238  djb38, mak43   00:05:04
jp238  djb38          00:05:04
jp238  mak43          00:08:40
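One way to compute the user-cluster tables, sketched under the assumption that a cluster's time is the total time during which all of its members were simultaneously in a meeting with the table's owner: split the owner's timeline at every meeting start and end, then credit each elementary segment to every non-empty subset of the partners present. That subset enumeration is where the 2^L factor in the complexity analysis comes from. `user_clusters` is a hypothetical name; the input is the mak43/djb38/jp238 meeting records from this slide.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations


def user_clusters(meetings):
    """Build {user: {frozenset(partners): seconds}} from (user, with, start, end).

    Assumption: a cluster's time is how long all of its partners were in a
    meeting with the user at the same time (overlaps are not exclusive).
    """
    per_user = defaultdict(list)
    for user, other, start, end in meetings:
        per_user[user].append((other, start, end))

    clusters = {}
    for user, recs in per_user.items():
        # Elementary segments between consecutive start/end points.
        points = sorted({p for _, s, e in recs for p in (s, e)})
        table = defaultdict(int)
        for lo, hi in zip(points, points[1:]):
            present = sorted({o for o, s, e in recs if s <= lo and hi <= e})
            secs = int((hi - lo).total_seconds())
            # Credit every non-empty subset of the partners present (2^L).
            for k in range(1, len(present) + 1):
                for combo in combinations(present, k):
                    table[frozenset(combo)] += secs
        clusters[user] = dict(table)
    return clusters


def t(hms):
    return datetime.strptime(hms, "%H:%M:%S")


MEETINGS = [
    ("mak43", "jp238", t("11:01:30"), t("11:10:10")),
    ("mak43", "djb38", t("11:01:30"), t("11:07:05")),
    ("djb38", "jp238", t("11:02:01"), t("11:07:50")),
    ("djb38", "mak43", t("11:01:30"), t("11:07:05")),
    ("jp238", "mak43", t("11:01:30"), t("11:10:10")),
    ("jp238", "djb38", t("11:02:01"), t("11:07:05")),
]

if __name__ == "__main__":
    for user, table in user_clusters(MEETINGS).items():
        for partners, secs in table.items():
            print(user, sorted(partners), secs)
```

Running it on these records reproduces the slide's Time Spent column: for example, mak43's cluster {jp238, djb38} gets 335 s (00:05:35) and djb38's {jp238} gets 349 s (00:05:49).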
Creating Global Clusters

• Resolve Perspective Differences
  ◦ Use Minimum Group Time (MGT)
  ◦ Use Minimum Group Meeting Frequency (MGMF)
(Input: the per-user cluster tables for mak43, djb38, and jp238, as on the previous slide)
Cluster              Minimum Time  Min. Frequency
djb38, jp238, mak43  00:05:04      1
djb38, mak43         00:05:35      1
djb38, jp238         00:05:04      1
jp238, mak43         00:08:40      1
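Resolving perspectives can be sketched as follows: for each candidate cluster (a user plus the partners in one of their clusters), take the minimum time reported by any member, then drop clusters below MGT. This is an illustrative reading of the slide. `global_clusters` is a hypothetical name, a member who never reports a cluster is assumed to contribute 0 s, and meeting-frequency counting (MGMF) is left out because the slides do not say how meetings are counted. The input is the per-user table from the previous slide, converted to seconds.

```python
def global_clusters(user_tables, mgt=0):
    """Merge per-user cluster tables into {frozenset(group): seconds}.

    A group's time is the minimum over its members' perspectives; a member
    with no entry for the group contributes 0. Groups below MGT are dropped.
    """
    candidates = set()
    for user, table in user_tables.items():
        for partners in table:
            candidates.add(frozenset(partners) | {user})

    result = {}
    for group in candidates:
        secs = min(user_tables.get(m, {}).get(group - {m}, 0) for m in group)
        if secs >= mgt:
            result[group] = secs
    return result


# Per-user cluster times from the example, in seconds (00:05:35 = 335 s, ...).
USER_TABLES = {
    "mak43": {frozenset({"jp238", "djb38"}): 335, frozenset({"jp238"}): 520, frozenset({"djb38"}): 335},
    "djb38": {frozenset({"jp238", "mak43"}): 304, frozenset({"jp238"}): 349, frozenset({"mak43"}): 335},
    "jp238": {frozenset({"djb38", "mak43"}): 304, frozenset({"djb38"}): 304, frozenset({"mak43"}): 520},
}

if __name__ == "__main__":
    merged = global_clusters(USER_TABLES)
    for group, secs in sorted(merged.items(), key=lambda kv: sorted(kv[0])):
        print(", ".join(sorted(group)), secs)
```

The output matches the slide's Minimum Time column (304 s = 00:05:04, 335 s = 00:05:35, 520 s = 00:08:40). With the MGT = 2000 s used in the evaluation, none of these short example meetings would survive.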
Selecting the User Groups

• Identify and remove subgroups of significant groups
  ◦ Keep a subgroup only if it meets for at least double the time of the group that includes it
Cluster              Minimum Time
djb38, jp238, mak43  00:05:04
djb38, mak43         00:05:35
jp238, mak43         00:10:40

Group                Min. Time
djb38, jp238, mak43  00:05:04
jp238, mak43         00:10:40
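The subgroup rule can be sketched as below. `select_groups` is a hypothetical name, and comparing each cluster against every larger cluster that contains it is an assumption about how "the group that includes it" is chosen. The example uses this slide's three clusters in seconds (00:05:04 = 304 s, 00:05:35 = 335 s, 00:10:40 = 640 s).

```python
def select_groups(clusters, factor=2):
    """Drop a cluster if some strictly larger cluster contains it and the
    subgroup did not meet for at least `factor` times that cluster's time."""
    selected = {}
    for group, secs in clusters.items():
        supersets = [other for other in clusters if group < other]
        if all(secs >= factor * clusters[other] for other in supersets):
            selected[group] = secs
    return selected


CLUSTERS = {
    frozenset({"djb38", "jp238", "mak43"}): 304,
    frozenset({"djb38", "mak43"}): 335,
    frozenset({"jp238", "mak43"}): 640,
}

if __name__ == "__main__":
    for group, secs in select_groups(CLUSTERS).items():
        print(", ".join(sorted(group)), secs)
```

{djb38, mak43} is dropped because 335 s is less than 2 × 304 s = 608 s, while {jp238, mak43} survives with 640 s, matching the Group table.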
Complexity Analysis
• R: total number of Bluetooth records
• N: total number of users in the dataset
• L: maximum number of users in a group
  ◦ Small value because relatively few users are within Bluetooth transmission range (10 m)
  ◦ Our experiments: max = 15, avg = 6.8
Step                                 Complexity
Creating Pair-Wise Meeting Records   $O(R)$
Creating User Clusters               $O(R \cdot 2^{L})$
Creating Global Clusters             $O(N \cdot 2^{L})$
Selecting the User Groups            $O(R \cdot 2^{L})$
Total Complexity                     $O(R \cdot 2^{L})$, since $R \gg N$
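For a sense of scale (an illustrative calculation, not from the slides): a set of $L$ users has $2^{L}$ subsets, so the exponential factor stays modest at the group sizes observed in the experiments.

$$2^{15} = 32{,}768 \ (\text{maximum } L = 15), \qquad 2^{6} = 64,\; 2^{7} = 128 \ (\text{groups near the average of } 6.8 \text{ users})$$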
Evaluation

• Goals
  ◦ Analyze the effect of group meeting frequency and time
  ◦ Compare GDC and K-Clique
    ▪ K-Clique uses a time threshold to select graph edges and analyzes the graph for k-cliques
• Experiments
  ◦ Collect data from mobile phones carried by 100+ volunteer students on campus for one month
  ◦ Run GDC and K-Clique on the collected data
    ▪ Also tested on Reality Mining data from MIT
  ◦ Ask users to rate groups on a Likert scale
    ▪ 1 to 5, where 5 is best
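For comparison, the K-Clique baseline as described here can be sketched as: keep an edge between two users when their total pair-wise meeting time reaches the threshold, then report cliques. This sketch reports maximal cliques of at least k users using Bron–Kerbosch. It is a simplified stand-in, and the K-Clique implementation evaluated in the talk may differ (for example, clique percolation). `k_clique_groups` and the sample data are hypothetical.

```python
def k_clique_groups(pair_time, threshold=2000, k=3):
    """Simplified K-Clique baseline: threshold pair-wise meeting time (seconds)
    into graph edges, then return maximal cliques with at least k members."""
    adj = {}
    for (a, b), secs in pair_time.items():
        if secs >= threshold:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)

    cliques = []

    def bron_kerbosch(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored vertices
        if not p and not x:
            if len(r) >= k:
                cliques.append(frozenset(r))
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch(set(), set(adj), set())
    return cliques


# Hypothetical totals: each pair met for 2500 s, possibly never all three at once.
PAIR_TIME = {("A", "B"): 2500, ("B", "C"): 2500, ("A", "C"): 2500, ("C", "D"): 600}

if __name__ == "__main__":
    print(k_clique_groups(PAIR_TIME))
```

In the sample, A-B, B-C, and A-C each clear the 2000 s threshold, so {A, B, C} is reported even though pair-wise totals cannot show that the three ever met together. GDC clusters, by contrast, are built from time during which all members were present at once.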
Data Collection Details
[Figure: histogram of the number of users by number of hours of recorded data, with cumulative percentage]
• 78 users each contributed less than 24 hours of recorded data
• Sparse data: random volunteers; many students are commuters
• Demographics: 72% male, 28% female; 25% graduate, 75% undergraduate
Effect of Meeting Time and Frequency
[Figure: rating frequency (Very Bad, Bad, OK, Good, Very Good) by Group Meeting Frequency (1, 2, 3-4, 5 and greater) and by Minimum Group Time (2000-3000, 3000-5000, 5000-7000, >7000)]
• Detection accuracy increases significantly with meeting frequency and total meeting time
GDC vs. K-Clique
[Figure: percentage of total ratings per rating category (Very Bad, Bad, OK, Good, Very Good) for GDC and K-Clique]
GDC: MGT = 2000s, MGMF = 2
K-Clique: threshold = 2000s
• Overall, GDC groups were rated 30% better than groups from the popular K-Clique algorithm
  ◦ GDC groups are guaranteed to meet
  ◦ Not all K-Clique groups meet
• Some GDC groups are rated poorly because members don’t know each other’s names
GDC Groups: NJIT Dataset vs. Reality Mining Dataset
[Figure: percentage of total groups by group size (3 to 13) for the NJIT and Reality Mining datasets]
NJIT: MGT = 2000s, MGMF = 1
Reality Mining: MGT = 18000s, MGMF = 9 (normalized for 9 months)
• Group distributions as a function of size are relatively similar, even though Reality Mining is a denser dataset
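The Reality Mining thresholds are consistent with scaling the NJIT values by the ratio of collection periods, nine months against the one-month NJIT study; this reading is inferred from the numbers on the slide rather than stated there:

$$\text{MGT}_{\text{RM}} = 9 \times 2000\,\text{s} = 18000\,\text{s}, \qquad \text{MGMF}_{\text{RM}} = 9 \times 1 = 9$$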
Outline

• GDC Algorithm
• User Study Results
• Distributed GDC
• Conclusions
Distributed GDC (D-GDC)

• GDC executed on the phones
• Benefits
  ◦ Better privacy
    ▪ Avoid “Big Brother” scenario
    ▪ Ability to control message exchange on a per-case basis
  ◦ Resiliency: no bottleneck and no single point of failure
  ◦ Flexibility: each user controls how often to run D-GDC
D-GDC Implementation

• Collect Bluetooth records locally through message exchange
  ◦ No global aggregation as in GDC
• Control exchange with heuristic policies
  ◦ These policies can be specified by users
  ◦ Allows greater individual privacy control
• Run the remainder of GDC locally on the device
  ◦ Evaluated using replay simulation over our real traces
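The slides leave the heuristic policies unspecified. As a purely hypothetical illustration of a user-specified policy, the sketch below shares local meeting records only with peers met at least `min_meetings` times and withholds records involving users on a `never_share` list. `ExchangePolicy`, its fields, and the record layout are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExchangePolicy:
    """Hypothetical user-specified policy for D-GDC record exchange."""
    min_meetings: int = 2  # share only with peers met at least this often
    never_share: set = field(default_factory=set)  # users whose records stay on the phone

    def records_to_send(self, peer, meeting_counts, records):
        """Return the (user, with, start, end) records this phone may send to `peer`."""
        if meeting_counts.get(peer, 0) < self.min_meetings:
            return []  # unfamiliar peer: send nothing
        return [r for r in records
                if r[0] not in self.never_share and r[1] not in self.never_share]


if __name__ == "__main__":
    policy = ExchangePolicy(min_meetings=2, never_share={"C"})
    counts = {"B": 3, "D": 1}
    records = [("A", "B", 0, 600), ("A", "C", 0, 300)]
    print(policy.records_to_send("B", counts, records))
    print(policy.records_to_send("D", counts, records))
```

Keeping the policy as a small object on each phone means every user can tune it independently, which is the kind of per-user control the slide describes.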

Preliminary Results
                              D-GDC    Local only
Average similarity            77.33%   58.24%
Groups with similarity > 90%  59.77%   19.14%
• Overall similarity: compute the similarity of each user’s GDC groups against the closest matches in D-GDC and average the results
• Compared D-GDC with a version running only on data collected locally by phones
  ◦ D-GDC performs significantly better than the local-only version
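The similarity metric itself is not specified on the slide. Assuming Jaccard similarity between member sets (an assumption), the per-user comparison can be sketched as follows: match each GDC group to its most similar D-GDC group, then average the scores or count how many exceed 90%. Averaging across users would give the overall figure. All function names and the sample groups are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two member sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_scores(gdc_groups, dgdc_groups):
    """For each GDC group, the similarity of its closest D-GDC group."""
    return [max((jaccard(g, d) for d in dgdc_groups), default=0.0) for g in gdc_groups]


def average_similarity(gdc_groups, dgdc_groups):
    scores = best_match_scores(gdc_groups, dgdc_groups)
    return sum(scores) / len(scores) if scores else 0.0


def fraction_above(gdc_groups, dgdc_groups, cutoff=0.9):
    scores = best_match_scores(gdc_groups, dgdc_groups)
    return sum(s > cutoff for s in scores) / len(scores) if scores else 0.0


# Hypothetical groups for one user.
GDC = [frozenset({"A", "B", "C"}), frozenset({"D", "E"})]
DGDC = [frozenset({"A", "B"}), frozenset({"D", "E"})]

if __name__ == "__main__":
    print(average_similarity(GDC, DGDC), fraction_above(GDC, DGDC))
```

Here {A, B, C} best matches {A, B} at 2/3 and {D, E} matches exactly, so the average is 5/6 and half of the groups exceed the 90% cutoff.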
Conclusion

• Physical groups enable new socially aware features in applications
• GDC: practical, high accuracy, no location collection
  ◦ Validated by users; outperforms K-Clique by 30%
  ◦ Higher accuracy can be achieved by increasing the frequency and time parameters
• A decentralized version improves privacy and produces promising results
Thank You!

• Mobius project: http://www.cs.njit.edu/~borcea/mobius/
• Acknowledgement: NSF grants CNS-0831753 and CNS-0834585