Data-driven Modeling and Design of
Networked Mobile Societies:
A Paradigm Shift for Future Social Networking
Ahmed Helmy
Computer and Information Science and Engineering (CISE) Department
University of Florida
[email protected] , http://www.cise.ufl.edu/~helmy
Founder & Director: Wireless Mobile Networking Lab http://nile.cise.ufl.edu
Networked Mobile Societies Everywhere, Anytime
Transportation/Vehicular Networks
Sensor Networks
Disaster & Emergency alerts
Mobile Ad hoc, Sensor and Delay Tolerant Networks
Emerging Behavior-Aware Services
• Tight coupling between users, devices
– Devices can infer user preferences, behavior
– Capabilities: comm, comp, storage, sensing
• New generation of behavior-aware protocols
– Behavior: mobility, interest, trust, friendship,…
– Apps: interest-cast, participatory sensing, crowdsourcing, mobile social nets, alert systems, …
New paradigms of communication?!
Paradigm Shift in Protocol Design
Used to:
• Design general-purpose protocols
• Evaluate using models (random mobility, traffic, …)
• Deployment context: modify to improve performance and fix failures for the specific context
– May end up with suboptimal performance or failures due to lack of context in the design
Propose to:
• Analyze and model the deployment context
• Design ‘application class’-specific parameterized protocols
• Utilize insights from context analysis to fine-tune protocol parameters
Problem Statement
• How to gain insight into deployment context?
• How to utilize insight to design future services?
Approach
• Extensive trace-based analysis to identify dominant
trends & characteristics
• Analyze user behavioral patterns
– Individual user behavior and mobility
– Collective user behavior: grouping, encounters
• Integrate findings in modeling and protocol design
– I. User mobility modeling
– II. Behavioral grouping
– III. Information dissemination in mobile societies: profile-cast
The TRACE framework
[Figure: the TRACE pipeline. Trace (MobiLib) → Represent (t × n matrix with entries x_{i,j}) → Analyze → Characterize, Cluster → Employ (Modeling, Protocol Design)]
Community-wide Wireless/Mobility Library
• Library of
– Measurements from Universities, vehicular networks
– Realistic models of behavior (mobility, traffic, encounters)
– Simulation benchmarks
– Tools for trace data mining
• Available libraries:
– CRAWDAD (Dartmouth, ’05-) crawdad.cs.dartmouth.edu
– MobiLib (USC & UFL, ’04-) nile.cise.ufl.edu/MobiLib
• 60+ Traces from: USC, Dartmouth, MIT, UCSD, UCSB, UNC, UMass,
GATech, Cambridge, UFL, …
• Tools for mobility modeling (IMPORTANT, TVC), data mining
• Types of traces:
– Campuses (WLANs), Conference AP and encounter traces
– Municipal (off-campus) wireless APs, Bus & vehicular
IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis*
- 4 major campuses
- 30-day traces studied from 2+ years of traces
- Total users > 12,000
- Total Access Points > 1,300

Trace source | Trace duration | User type | Environment | Collection method | Analyzed part
MIT | 7/20/02 – 8/17/02 | Generic | 3 corporate buildings | Polling | Whole trace
Dartmouth | 4/01/01 – 6/30/04 | Generic w/ subgroup | University campus | Event-based | July ’03, April ’04
UCSD | 9/22/02 – 12/8/02 | PDA only | University campus | Polling | 09/22/02 – 10/21/02
USC | 4/20/05 – 3/31/06 | Generic | University campus | Event-based (Bldg) | 04/20/05 – 05/19/05
* W. Hsu, A. Helmy, “IMPACT: Investigation of Mobile-user Patterns Across University Campuses using
WLAN Trace Analysis”, two papers at IEEE Wireless Networks Measurements (WiNMee), April 2006 and
IEEE Transactions on Mobile Computing, 2010 (To appear).
Case study I – Individual Mobility
[Figure: study roadmap, Traces → Observation → Application. Microscopic behavior: individual user mobility → mobility model. Macroscopic behavior: user groups in the population → profile-cast protocol; encounter patterns in the network → SmallWorld-based message dissemination]
Classification of Mobility Models
[Figure: classification of mobility models along four dimensions: Mobility Space, Geographic Restriction, Temporal Correlation, Spatial Correlation]
* F. Bai, A. Helmy, "A Survey of Mobility Modeling and Analysis in Wireless Adhoc Networks", Book Chapter in the book "Wireless Ad Hoc and Sensor Networks", Kluwer Academic Publishers, June 2004.
Spatio-temporal Mobility in WLANs
• Skewed location preference
– 95% of on-line time at the 5 most visited APs
• On/off activity pattern
• Periodic re-appearance
– Periodic repetition peaks daily/weekly
• Simple existing models are very different from the spatio-temporal characteristics in WLANs
[Figure: Prob.(online time fraction > x)]
The TVC Model: Reproducing Mobility Characteristics
Time-Variant Community (TVC) Model (IEEE INFOCOM 2007; IEEE/ACM Trans. on Networking 2009):
1- Assigns communities (locations) to users to reproduce the skewed location visiting preference
2- Varies the temporal assignment of communities to reproduce the periodic re-appearance
[Figure: Skewed location visiting preference. Average fraction of online time associated with the AP (log scale, 1.E+00 down to 1.E-06) vs. APs sorted by total amount of time associated with them; curves: MIT, Model-simplified, Model-complex]
[Figure: Periodic re-appearance. Prob.(node re-appears at the same AP after the time gap) vs. time gap (days); curves: MIT, Model-simplified, Model-complex]
* Model-simplified: single community per node. Model-complex: multiple communities
** Similar matches achieved for USC and Dartmouth traces
Case study II – Encounter Patterns
Case Study II: Goal
• Understand inter-node encounter patterns from a
global perspective
– How do we represent encounter patterns?
– How do the encounter patterns influence network
connectivity and communication protocols?
• Encounter definition:
– In WLAN: When two mobile nodes access the same
AP at the same time they have an ‘encounter’
– In DTN: When two mobile nodes move within
communication range they have an ‘encounter’
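The WLAN encounter definition above can be sketched as an interval-overlap check. This is a minimal illustration, assuming a hypothetical session record of the form (user, AP, start, end); the record layout and the function name `find_encounters` are not from the traces themselves.

```python
from itertools import combinations

def find_encounters(sessions):
    """Return (user_a, user_b, ap, overlap_start, overlap_end) tuples.

    sessions: list of (user, ap, start, end) association records
    (hypothetical format; times are in arbitrary units such as seconds).
    Two users 'encounter' when their sessions at the same AP overlap in time.
    """
    by_ap = {}
    for user, ap, start, end in sessions:
        by_ap.setdefault(ap, []).append((user, start, end))

    encounters = []
    for ap, records in by_ap.items():
        for (u1, s1, e1), (u2, s2, e2) in combinations(records, 2):
            if u1 == u2:
                continue  # a user does not encounter itself
            lo, hi = max(s1, s2), min(e1, e2)
            if lo < hi:  # non-empty overlap at the same AP
                a, b = sorted((u1, u2))
                encounters.append((a, b, ap, lo, hi))
    return sorted(encounters, key=lambda e: e[3])

sessions = [
    ("alice", "AP1", 0, 50),
    ("bob", "AP1", 30, 80),
    ("carol", "AP2", 10, 60),
    ("bob", "AP2", 90, 120),  # arrives after carol has left AP2
]
encounters = find_encounters(sessions)
print(encounters)
```

Only alice and bob overlap at the same AP here; carol and bob visit AP2 at disjoint times, so no encounter is recorded for them.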
Observations: Nodal Encounters
[Figure: CCDF of unique encounter count, Prob.(unique encounter fraction > x) vs. fraction of user population (x); CCDF of total encounter count, Prob.(total encounter events > x); traces: Cambridge, UCSD, MIT, USC, Dart-03, Dart-04]
• In all the traces, the mobile nodes (MNs) encounter only a small fraction of the user population.
• On average, a user encounters 1.8%-6% of the user population.
• The total number of encounters per user follows a BiPareto distribution.
W. Hsu, A. Helmy, “On Nodal Encounter Patterns in Wireless LAN Traces”, IEEE Transactions on Mobile Computing (TMC), To appear
The Encounter graph
• Vertices: mobile nodes, Edges: node encounters
[Figure: the t × n trace matrix with entries x_{i,j} (Represent stage of TRACE)]
Daily encounter graphs for MIT trace
Small Worlds of Encounters
• Encounter graph: nodes are vertices, and edges link all vertices that encounter each other
[Figure: normalized CC and PL of the encounter graph; labels: Clustering Coefficient (CC), Av. Path Length, Regular graph, Small World, Random graph]
• The encounter graph is a Small World graph (high CC, low PL)
• Even for short time period (1 day) its metrics (CC, PL) almost saturate
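The two metrics used above, clustering coefficient (CC) and average path length (PL), can be computed on a small encounter graph as follows. This is a minimal sketch using only the standard library; the toy edge list is made up, and the normalization against regular and random graphs shown on the slide is omitted.

```python
from collections import deque

def build_graph(pairs):
    """Undirected encounter graph: vertices are nodes, edges are encounters."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def clustering_coefficient(graph):
    """Average over nodes of (links among neighbors) / (possible links)."""
    total = 0.0
    for node, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)

def average_path_length(graph):
    """Mean hop count over all connected ordered pairs (BFS from each node)."""
    dist_sum, pair_count = 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in graph[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        dist_sum += sum(dist.values())
        pair_count += len(dist) - 1
    return dist_sum / pair_count if pair_count else 0.0

# Toy example: a triangle (a, b, c) with one extra node d attached to c.
graph = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
cc = clustering_coefficient(graph)
pl = average_path_length(graph)
print(round(cc, 4), round(pl, 4))
```

A Small World graph in the sense used here would show CC close to that of a regular graph and PL close to that of a random graph of the same size and degree.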
Information Diffusion in DTNs via Encounters
• Epidemic routing (spatio-temporal broadcast) achieves almost complete delivery
– Robust to the removal of short encounters
– Robust to selfish nodes (up to ~40%)
[Figure (USC, trace duration = 15 days): unreachable ratio]
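Epidemic routing over encounters can be sketched as a replay of time-ordered encounter events, where a message copy spreads whenever a carrier meets another node. The event format (time, a, b) and the function names below are assumptions for illustration, not the evaluation code behind the figure.

```python
def epidemic_reach(encounters, source, start_time=0):
    """Replay (time, a, b) encounter events in time order.

    A node holding the message passes a copy to every node it encounters.
    Returns the set of nodes that eventually hold the message.
    """
    infected = {source}
    for t, a, b in sorted(encounters):
        if t < start_time:
            continue
        if a in infected or b in infected:
            infected.update((a, b))
    return infected

def unreachable_ratio(encounters, nodes, source):
    """Fraction of the population that the message never reaches."""
    reached = epidemic_reach(encounters, source)
    return 1.0 - len(reached) / len(nodes)

# c meets d at time 2, before c receives the message at time 3,
# so d (and therefore e) is never reached: event order matters.
encounters = [(1, "a", "b"), (2, "c", "d"), (3, "b", "c"), (4, "d", "e")]
nodes = {"a", "b", "c", "d", "e", "f"}
reached = epidemic_reach(encounters, "a")
ratio = unreachable_ratio(encounters, nodes, "a")
print(sorted(reached), ratio)
```

Dropping short encounters or selfish nodes, as in the robustness results above, would amount to filtering the event list before the replay.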
Encounter-graphs using Friends
• The distribution of the friendship index (FI) is exponential for all the traces
• Friendship between MNs is highly asymmetric
• Among all node pairs: < 5% have FI > 0.01, and < 1% have FI > 0.4
• Top-ranked friends form cliques, while low-ranked friends are key to providing random links (shortcuts) that reduce the degree of separation in the encounter graph.
Case study III – Groups in WLAN
Case Study III: Goal
• Identify similar users (in terms of long run mobility
preferences) from the diverse WLAN user population
– Understand the constituents of the population
– Identify potential groups for group-aware service
• Classify users based on their mobility trends and
location-visiting preferences
– Traces studied: semester-long USC trace (spring 2006, 94 days) and quarter-long Dartmouth trace (spring 2004, 61 days)
Representation of User Association Patterns
W. Hsu, D. Dutta, A. Helmy, “Mining Behavioral Groups in WLANs”, ACM MobiCom ‘07
• Summarize user association per day by a vector
– a = {a_j : fraction of online time user i spends at AP_j on day d}
– Example day: Office, 10AM – 12PM; Library, 3PM – 4PM; Class, 6PM – 8PM
– Association vector: (library, office, class) = (0.2, 0.4, 0.4)
• Summarize long-run mobility in an “association matrix”
– Each row represents the percentage of time spent at each location for a day
– Each column corresponds to a location
– An entry x_{i,j} represents the percentage of online time during day i at location j
$$
M = \begin{pmatrix}
0.5 & 0.4 & \cdots & 0.1 \\
x_{2,1} & \ddots & & \vdots \\
\vdots & & x_{i,j} & \vdots \\
x_{t,1} & \cdots & \cdots & x_{t,n}
\end{pmatrix}
$$
(the first two columns are labeled Office and Dorm)
Example association matrix describing a given user’s location visiting preference
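The per-day association vector and the association matrix can be built as follows. This is a minimal sketch that assumes each day is given as a list of (location, hours online) pairs; that input format and the function name are hypothetical.

```python
def association_vector(day_sessions, locations):
    """Fraction of a day's online time spent at each location.

    day_sessions: list of (location, hours_online) pairs for one user
    on one day (hypothetical input format).
    """
    time_at = {loc: 0.0 for loc in locations}
    for loc, hours in day_sessions:
        time_at[loc] += hours
    total = sum(time_at.values())
    if total == 0:
        return [0.0] * len(locations)  # user was offline that day
    return [time_at[loc] / total for loc in locations]

locations = ["library", "office", "class"]
# Day 1 matches the slide example: office 10AM-12PM, library 3PM-4PM, class 6PM-8PM.
day1 = [("office", 2), ("library", 1), ("class", 2)]
day2 = [("office", 3), ("class", 1)]

# One row per day, one column per location.
matrix = [association_vector(day, locations) for day in (day1, day2)]
print(matrix)
```

The first row reproduces the (0.2, 0.4, 0.4) vector from the example; each non-empty row sums to 1.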
Eigen-behaviors & Behavioral Similarity Distance
• Eigen-behaviors (EB): Vectors describing maximum
remaining power in assoc. matrix M (through SVD):
- Get Eigen-vectors:
- Get relative importance:
- Get Eigen-values:
• Eigen-behavior distance: weighted inner products of EBs
– $\mathrm{Sim}(U,V) = \sum_{i,j} w_i\, w_j\, (u_i \cdot v_j)$
• Assoc. patterns can be re-constructed with low rank & error
• For over 99% of users, < 7 vectors capture > 90% of M’s power
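The weighting and similarity computation can be sketched in a few lines once the eigen-behaviors are available. The sketch assumes the SVD has already been computed elsewhere, and it assumes the relative importance of a vector is its squared singular value divided by the sum of squared singular values; the slide's own formula for this was not recoverable, so treat that weighting as an assumption.

```python
def relative_importance(singular_values):
    """Assumed weighting: share of total power, w_i = s_i^2 / sum_k s_k^2."""
    power = [s * s for s in singular_values]
    total = sum(power)
    return [p / total for p in power]

def vectors_needed(weights, target=0.9):
    """Number of leading eigen-behaviors needed to capture `target` power."""
    captured = 0.0
    for count, w in enumerate(weights, start=1):
        captured += w
        if captured >= target:
            return count
    return len(weights)

def similarity(ebs_u, w_u, ebs_v, w_v):
    """Sim(U, V) = sum over i, j of w_i * w_j * (u_i . v_j)."""
    total = 0.0
    for u, wu in zip(ebs_u, w_u):
        for v, wv in zip(ebs_v, w_v):
            total += wu * wv * sum(x * y for x, y in zip(u, v))
    return total

weights = relative_importance([3.0, 1.0])
rank = vectors_needed(weights)

# Two hypothetical users over two locations, each with two eigen-behaviors.
user_u = [(1.0, 0.0), (0.0, 1.0)]
user_v = [(0.0, 1.0), (1.0, 0.0)]  # same vectors, opposite order of importance
same = similarity(user_u, weights, user_u, weights)
diff = similarity(user_u, weights, user_v, weights)
print(weights, rank, same, diff)
```

Because singular vectors are only defined up to sign, a practical version might take the absolute value of each inner product; the sketch follows the formula as written on the slide.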
Similarity-based User Classification
• Hierarchical clustering of similar behavioral groups
• High quality clustering:
– Inter-group vs. intra-group distance
– Significance vs. random groups
• 0.93 vs. 0.46 (USC), 0.91 vs. 0.42 (Dart)
– Unique groups based on Eigen Behaviors
[Figure (Dartmouth): CDF of distance between users; curves: inter-group and intra-group, for AMVD and Eigen-behavior distance]
*AMVD = Average Minimum Vector Distance

Significance score of top eigenbehavior for | USC | Dartmouth
Its own group | 0.779 | 0.727
Other groups | 0.005 | 0.004
User Groups in WLAN - Observations
• Identified hundreds of distinct groups of similar users
• Skewed group size distribution
– The largest 10 groups account for more than 30% of the population on campus
– Power-law distribution of group sizes
• Most groups can be described by a list of locations with a clear ordering of importance
• Some groups visit multiple locations with similar importance
– Taking the most important location for each user is not sufficient
[Figure: group size vs. user group size rank (log-log axes, 1 to 1000); fitted curves: Dartmouth 540*x^-0.67, USC 500*x^-0.75]
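A power-law fit of group size against rank, of the form c*x^alpha as in the curves above, can be obtained by least squares on log-log values. This is a minimal sketch on synthetic data; it is not necessarily the fitting procedure used for the figure, and `fit_power_law` is a hypothetical helper.

```python
import math

def fit_power_law(sizes):
    """Fit size = c * rank^alpha by least squares on log(size) vs. log(rank)."""
    ranked = sorted(sizes, reverse=True)  # rank 1 is the largest group
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return c, alpha

# Synthetic group sizes that follow the USC curve exactly.
synthetic = [500 * rank ** -0.75 for rank in range(1, 201)]
c, alpha = fit_power_law(synthetic)
print(round(c, 3), round(alpha, 3))
```

On real group sizes the points scatter around the line, so the recovered exponent depends on how many ranks are included in the fit.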
Behavioral Similarity: The Missing Link
[Figure: panels comparing Traces and Models]
Existing models produce behaviorally homogeneous users and lack the richness of behavioral structure in real traces. Richer models are needed!
Behavioral Similarity Graphs
Random and community models produce fully connected similarity graphs
G. Thakur, A. Helmy, W. Hsu, “Similarity analysis and modeling of similarity in mobile societies: The missing link”, UF Tech Report, Jun 2010
Profile-cast: A New Communication Paradigm
W. Hsu, D. Dutta, A. Helmy, ACM Mobicom 2007, WCNC 2008, Trans. Networking To appear
• Sending messages to others with similar behavior, without knowing their identity
– Announcements to users with specific behavioral profile V
– Interest-based ads, similarity resource discovery
• For Delay Tolerant Networks (DTNs)
[Figure: conventional packet (Payload | Dest Address) vs. profile-cast packet (Payload | Target Profile); nodes A, B, C, D, E, each deciding whether an encountered node is similar to the target profile V, e.g. “Is E similar to V?”]
Profile-cast Use Cases
• Mobility-based profile-cast (Target mode)
– Targeting a group of users who move in a particular pattern (lost-and-found, context-aware messages, moviegoers)
– Approach: use a “similarity metric” between users
[Figure: scoped message spread in the mobility space, from sender S via intermediate nodes N to destinations D]
• Mobility-independent profile-cast (Dissemination mode)
– Targeting people with a certain characteristic independent of mobility (classical music lovers)
– Approach: use “Small World” encounter patterns
Profile-cast Operation
1. Profiling
• Determining user similarity
– S sends Eigen-behaviors for the virtual profile to N
– N evaluates the similarity by weighted inner products of Eigen-behaviors: $\mathrm{Sim}(U,V) = \sum_{i,j} w_i\, w_j\, (u_i \cdot v_j)$
2. Forwarding decision
– Message forwarded if Sim(U,V) is high (the goal is to deliver messages to nodes with similar profile)
– Privacy conserving: N and S do not send information about their own behavior
Profile-cast CSI protocol: Target-mode
Sim (BP(A), P(T)) = similarity of node’s behavioral profile to the target profile
Mobility Profile-cast (intra-group)
[Figure: panels comparing dissemination schemes from sender S: Goal, Epidemic, Group-spread, Single long random walk, Multiple short random walks]
Mobility Profile-cast (inter-group)
[Figure: panels comparing schemes for reaching the target profile (T.P.) from sender S: Goal, Epidemic, Gradient-ascend, Single long random walk, Group-spread, Multiple short random walks]
Profile-cast Evaluation
* Results presented as the ratio to epidemic routing
- Over 96% delivery ratio
- Over 98% reduction in overhead w.r.t. Epidemic
- Random walk (RW): < 45% delivery
- Strikes a near-optimal balance between delivery, overhead and delay
- Other variants (e.g., multi-copy, simulated annealing) under investigation
Extending Interest, Behavior Beyond Mobility
• In addition to mobility, user’s web access and traffic patterns,
applications used (among others) represent other dimensions
of interest and behavior
• Further analysis of network measurements (e.g., Netflow) can
reveal behavioral characteristics in these dimensions
• Netflow traces are 3 orders of magnitude larger than WLAN traces
(WLANs: tens of millions, Netflows: tens of billions)
• New challenges in mining ‘big data’ to get information
S. Moghaddam, A. Helmy, S. Ranka, M. Somaya, “Data-driven Co-clustering Model of Internet Usage in Large Mobile Societies”, UF
Tech Report, May 2010
Web-usage Spatio-temporal multi-D Clustering
Clustering of Locations based on web access
(similar locations coded with same color)
- Users can be consistently modeled using few (~10) clusters with disjoint profiles.
- Access patterns from multiple locations show clustered distinct behavior.
Gender-based feature analysis in Campus-wide WLANs
U. Kumar, N. Yadav, A. Helmy, Mobicom 2007, Crawdad 2007
[Figure: average duration (sec) of visitors by area, University Campus traces, Male vs. Female]
[Figure: percentage of users by device manufacturer (Intel, Apple, Gem…, Ente…, Links…, ASKE…, D-Link, Ager…, Netg…), Male vs. Female]
- Able to classify users by gender using knowledge of campus map
- Users exhibit distinct on-line behavior, device preference and mobility based on gender
- On-going work
– How much more can we know?
– What is the “information-privacy trade-off”?
Future Directions (Applications)
• Behavior aware push/caching services (targeted
ads, events of interest, announcements)
• Caching based on behavioral prediction
• Detecting abnormal user behavior & access
patterns based on previous profiles
• Can we extend this paradigm to include social
aspects (trust, friendship, cooperation)?
• Privacy issues and mobile k-anonymity
• Participatory sensing, deputizing the community
Disaster Relief (Self-Configuring) Networks
[Figure: disaster relief scenario covered by a self-configuring network of sensors]
On-going and Future Directions: Utilizing Mobility
– Controlled mobility scenarios
• DakNet, Message Ferries, Info Station
– Mobility-Assisted protocols
• Mobility-assisted information diffusion:
EASE, FRESH, DTN, $100 laptop
– Context-aware Networking
• Mobility-aware protocols: self-configuring,
mobility-adaptive protocols
• Socially-aware protocols: security, trust,
friendship, associations, small worlds
– On-going Projects
• Next Generation (Boundless) Classroom
• Disaster Relief Self-configuring Survivable
Networks
The Next Generation (Boundless) Classroom
[Figure: the boundless classroom. Students and instructor connected through WLAN/adhoc and sensor-adhoc links to an embedded sensor network; multi-party conference and tele-collaboration tools; real-world group experiments (structural health monitoring)]
Challenges
- Integration of wired Internet, WLANs, Adhoc Mobile and Sensor Networks
- Will this paradigm provide a better learning experience for the students?
Future Directions: Technology-Human Interaction
The Next Generation Classroom
[Figure: interaction between Emerging Wireless & Multimedia Technologies (protocols, applications, services) and Human Behavior (mobility, load dynamics; educational/learning experience). Questions: How to Capture? How to Design? How to Evaluate? Elements: Measurements, Mobility Models, Traffic Models, Protocol Design, Context-aware Networking, Application Development, Service Provisioning]
Multi-Disciplinary Research
• Engineering
• Human Computer Interaction (HCI) & User Interface
• Social Sciences
• Cognitive Sciences
• Education
• Psychology
Thank you!
Ahmed Helmy [email protected]
URL: www.cise.ufl.edu/~helmy
MobiLib: nile.cise.ufl.edu/MobiLib
Outline
• Ad Hoc, Sensor Networks & DTNs
– The paradigm shift: trace-driven design
• The TRACE framework
• Small worlds of encounters
• Mining the mobile society: Similarity analysis
• Profile-cast
• Future directions
Background: Delay Tolerant Networks (DTN)
• DTNs are mobile networks with sparse,
intermittent nodal connectivity
• Encounter events provide the communication
opportunities among nodes
• Messages are stored and moved across the
network with nodal mobility
[Figure: nodes A, B and C]
Graphs, Path Length and Clustering
Small World Graph: Low path length, High clustering
Regular Graph
- High path length
- High clustering
Random Graph
- Low path length
- Low clustering
[Figure [Helmy’03]: clustering and path length (0 to 1) vs. probability of re-wiring (p), from 0.0001 to 1 on a log axis]
- In Small Worlds, a few shortcuts contract the diameter (i.e., path length) of a regular graph to resemble the diameter of a random graph without affecting the graph structure (i.e., clustering)
On Mobility & Predictability of VoIP & WLAN Users
J. Kim, Y. Du, M. Chen, A. Helmy, Crawdad 2007
Work in progress
[Figures: Markov O(2) predictor accuracy; VoIP user prediction accuracy]
- VoIP users are highly mobile and behave dramatically differently from WLAN users
- Prediction accuracy drops from an average of ~62% for WLAN users to below 25% for VoIP users
Motivates
- Revisiting mobility modeling
- Revisiting mobility prediction
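An order-2 Markov predictor guesses the next AP from the last two APs visited, using the most frequent successor seen so far. The sketch below is a simplified online version with made-up location names; it skips contexts it has not seen before rather than falling back to a lower order, which may differ from the predictor evaluated in the study.

```python
from collections import Counter, defaultdict

def markov_o2_accuracy(visits):
    """Online order-2 Markov prediction over a sequence of AP names.

    For each position, predict the most frequent successor of the previous
    two APs seen so far, then update the counts with the true next AP.
    Positions whose context has never been seen yield no prediction.
    """
    counts = defaultdict(Counter)
    correct = 0
    predictions = 0
    for i in range(2, len(visits)):
        context = (visits[i - 2], visits[i - 1])
        if counts[context]:
            guess = counts[context].most_common(1)[0][0]
            predictions += 1
            correct += guess == visits[i]
        counts[context][visits[i]] += 1
    return correct / predictions if predictions else 0.0

# A routine user who breaks the pattern once at the end.
visits = ["dorm", "class", "lib", "dorm", "class", "lib", "dorm", "class", "gym"]
accuracy = markov_o2_accuracy(visits)
print(accuracy)
```

Highly mobile users with irregular sequences give the predictor few repeated contexts, which is one way to read the accuracy drop reported for VoIP users.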
Profile-cast Operation
1. Profiling
• Profiling user mobility
– The mobility of a node is represented by an association matrix
– Singular value decomposition provides a summary of the matrix (a few eigen-behavior vectors are sufficient, e.g. for 99% of users at most 7 vectors describe 90% of the power in the association matrix)
– Each row represents an association vector for a time slot
– An entry x_{i,j} represents the percentage of online time during time slot i at location j
$$
M = \begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\
x_{2,1} & \ddots & & \vdots \\
\vdots & & x_{i,j} & \vdots \\
x_{t,1} & \cdots & \cdots & x_{t,n}
\end{pmatrix}
$$
Mobility Independent Profile-cast
[Figure: panels comparing dissemination schemes from sender S: Goal, Flooding, SmallWorld-based, Single long random walk, Multiple short random walks]
Implementation Details (in progress)
Future Work
– N-copy-per-clique in the “mobility space”
[Figure: sender S shown in the interest space, mobility space and physical space. Different legends represent nodes with different mobility trends; white nodes denote the target recipients]
– We expect this to work because similarity in mobility leads to frequent encounters
[Figure: encounter ratio (0 to 0.7) vs. user pair similarity (0 to 1)]
Future Work
– N-copy-per-clique in the “mobility space”
[Figure: sender S shown in the interest space, mobility space and physical space. Different legends represent nodes with different mobility trends; white nodes denote the target recipients]
– Challenge: From mobility to interest and other classifications
Netflow Trace Sample