MicroHash: An efficient Index Structure for Wireless

Dagstuhl Seminar 10042, Demetris Zeinalipour, University of Cyprus, 26/1/2010
Disclosure-free GPS Trace
Search in Smartphone Networks
Demetrios Zeinalipour-Yazti
Christos Laoudias
Maria I. Andreou
Dimitrios Gunopulos
12th IEEE International Conference on Mobile Data Management
(MDM’11), June 7th, 2011, Luleå, Sweden
http://www.cs.ucy.ac.cy/~dzeina/
MDM 2011 © Zeinalipour-Yazti, Laoudias, Andreou, Gunopulos
Smartphones
• Smartphone: A powerful sensing device!
– Processing: 1 GHz dual core
– RAM & Flash Storage: 1GB & 48GB, respectively
– Networking: WiFi, 3G (Mbps) / 4G (100Mbps–1Gbps)
– Sensing: Proximity, Ambient Light, Accelerometer, Microphone, Geographic Coordinates based on AGPS (fine), WiFi or Cellular Towers (coarse).
• Combining many of those “sensors” creates opportunities for Crowdsourced data acquisition in Urban Environments.
– Aka. Opportunistic/Participatory Sensing
A Word on GPS Trace Collection
• Popular smartphones already collect positional information. The same applies to Social Networking Applications (e.g., Latitude, Gowalla, Twitter, etc.)
• iPhone User Position Logging:
– iPhone collects coarse-grain positional information
(i.e., triangulated Cell tower) locally on your
smartphone (and iTunes backup).
– The unencrypted log file is even migrated between
devices!
– Displaying your iPhone trace history on a Map:
http://petewarden.github.com/iPhoneTracker/
Presentation Outline
• Introduction
• System Model and Problem
Formulation
• Background on Trajectory Similarity
• The SmartTrace Algorithm
• Experimental Evaluation
• Future Work
System Model and Problem Formulation
Find the K most similar trajectories to a query trajectory Q without pulling together all traces at the query node (QN).
Constraints and Objectives
A. Don’t Disclose the User’s Trajectory to QN
– Social sites are already undergoing significant privacy restructuring (e.g., Google Buzz, Facebook)
– Trajectories are large (270MB/year with 2s samples)
B. Minimize Net Traffic and Local Processing
– 3G/4G and WiFi traffic: i) depletes smartphone
battery and ii) degrades network health*
* In 2009, AT&T’s customers were affected by the iPhone release.
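As a rough sanity check of the 270MB/year figure, the following sketch derives the number of samples collected per year at one sample every 2 seconds and the per-sample size this implies. It assumes continuous logging and 1MB = 10^6 bytes; neither assumption is stated on the slide, and the constant names are illustrative.

```python
# Back-of-the-envelope check of "270MB/year with 2s samples".
SECONDS_PER_YEAR = 365 * 24 * 3600      # non-leap year
SAMPLE_PERIOD_S = 2                     # one GPS fix every 2 seconds (from the slide)
TRACE_BYTES_PER_YEAR = 270 * 10**6      # assumption: 1MB = 10^6 bytes

# Assumption: the device logs continuously for the whole year.
samples_per_year = SECONDS_PER_YEAR // SAMPLE_PERIOD_S
bytes_per_sample = TRACE_BYTES_PER_YEAR / samples_per_year

print(samples_per_year)                 # number of fixes logged in a year
print(round(bytes_per_sample, 1))       # implied storage cost of one fix
```

Under these assumptions a year of logging yields about 15.8 million samples at roughly 17 bytes each. The slide does not state the actual record layout.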
SmartTrace Applications
• Our framework finds applications in a
wide range of domains:
– Intelligent Transportation Systems: “Find
whether a new bus route is similar to the
trajectories of K other users.”
– Social Networks: “Find whether there is a cycling route from MOMA to the Juilliard”
• GeoLife, GPS-Waypoints, Sharemyroutes, etc. offer centralized counterparts.
– Habitat Monitoring: “Find zebras that moved most similarly to zebra X before it got injured.”
Trajectory Similarity Search
• Problem: Compare the query with all the distributed sequences and return the k most similar sequences to the query.
• Similarity between two objects A, B is associated with a distance function (see next).
[Figure: the query compared against trajectories at distances D = 7.3, 10.2, 11.8, 17 and 22; the K most similar are sought.]
Trajectory Similarity Search
• Lp-norms are the simplest way to compare trajectories (e.g., Euclidean, Manhattan, etc.)
\[ L_p(a, b) = \left( \sum_{i=1}^{n} |a[i] - b[i]|^{p} \right)^{1/p} \]
– p=1: Manhattan
– p=2: Euclidean
• Lp-norms are fast (i.e., O(n)), but inaccurate.
– No flexible matching in time. (miss out-of-phase)
– No flexible matching in space. (miss outliers)
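The Lp-norm above can be sketched in a few lines. The function name `lp_distance` and the use of plain one-dimensional lists are illustrative assumptions, not taken from the slides.

```python
def lp_distance(a, b, p):
    """Lp-norm distance between two equal-length numeric sequences."""
    if len(a) != len(b):
        # The point-by-point comparison is only defined for equal lengths.
        raise ValueError("sequences must have the same length")
    # Single pass over both sequences, hence O(n).
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


manhattan = lp_distance([0, 0], [3, 4], p=1)   # |0-3| + |0-4|
euclidean = lp_distance([0, 0], [3, 4], p=2)   # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```

Because each a[i] is compared only with b[i], a copy of the same trajectory shifted by a few samples can still receive a large distance. This is the lack of flexibility in time noted above.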
Trajectory Similarity Search
• LCSS: Given strings A and B, LCSS is the longest string that is a subsequence of both A and B; it ignores the majority of noise.
[Figure: trajectories A and B plotted over time, with matching segments marked and the matching window δ indicated.]
• A Dynamic Programming algorithm for this problem requires O(|A|*|B|) time. *
• It can be computed in O(δ(|A|+|B|)) if we limit the matching window within δ. => Still expensive
* Processing a trajectory with size |Ai|=1.8MB requires 111 seconds on a smartphone.
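The dynamic program can be sketched as follows for 2-D trajectories. This is a minimal illustration under stated assumptions: two points match when their indices differ by at most δ and each coordinate differs by at most ε, and the names `lcss`, `delta` and `eps` are hypothetical. For clarity it fills the full O(|A|*|B|) table instead of only the band of width δ.

```python
def lcss(a, b, delta, eps):
    """LCSS length of two sequences of (x, y) points.

    Assumed matching rule: points a[i] and b[j] match when
    |i - j| <= delta and both coordinates differ by at most eps.
    """
    n, m = len(a), len(b)
    # dp[i][j] = LCSS of the first i points of a and the first j points of b
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            close_in_time = abs(i - j) <= delta
            close_in_space = abs(ax - bx) <= eps and abs(ay - by) <= eps
            if close_in_time and close_in_space:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


q = [(0, 0), (1, 1), (2, 2)]
shifted = [(5, 5), (0, 0), (1, 1), (2, 2)]   # same path, one sample late
print(lcss(q, shifted, delta=0, eps=0.5))    # no time flexibility
print(lcss(q, shifted, delta=1, eps=0.5))    # one-sample window
```

With δ = 0 the shifted copy matches nowhere, while δ = 1 recovers all three points, which illustrates the flexible matching in time that Lp-norms lack.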
LCSS(MBEQ, Ai): Bounding Above LCSS*
ΜΒΕ: Minimum Bounding
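Envelope
[Figure: query Q enclosed in its envelope, which extends ε in space and 2δ in time, compared against trajectory A; labels: 40 pts, 6 pts.]
Theorem: \( LCSS_{\delta,\varepsilon}(Q, A) \le LCSS_{\delta,\varepsilon}(MBE(Q), A) \)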
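One way to obtain such an upper bound is sketched below. The slides do not spell out how the envelope is built, so this is an assumption: at each index i the envelope covers the minimum and maximum of Q's coordinates within [i−δ, i+δ], widened by ε, and the bound is the number of points of A that fall inside it. The function name `lcss_upper_bound` is hypothetical, and this naive form costs O(|A|·δ) because it rescans each window.

```python
def lcss_upper_bound(q, a, delta, eps):
    """Upper bound on LCSS(q, a): points of a inside the envelope of q.

    Assumption: the envelope at index i spans the min/max of q's
    coordinates over indices [i - delta, i + delta], widened by eps.
    """
    inside = 0
    for i, (x, y) in enumerate(a):
        window = q[max(0, i - delta): i + delta + 1]
        if not window:
            continue  # a extends beyond the envelope of q
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        if (min(xs) - eps <= x <= max(xs) + eps
                and min(ys) - eps <= y <= max(ys) + eps):
            inside += 1
    # A common subsequence can never be longer than q itself.
    return min(inside, len(q))


q = [(0, 0), (1, 1), (2, 2)]
a = [(0, 0), (1, 1), (9, 9)]
print(lcss_upper_bound(q, a, delta=1, eps=0.5))
```

Any point of A that LCSS would match to some point of Q necessarily lies inside the envelope at its own index, so the count cannot be smaller than the exact LCSS score under the same matching rule.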
SmartTrace Algorithm Outline
• An intelligent top-K processing algorithm for identifying the K most similar trajectories to Q in a distributed environment.
• Step A: Conduct the cheap linear-time LCSS(MBEQ,Ai) computation on the smartphones to approximate the answer.
• Step B: Exploit the approximation to identify the correct answer by iteratively asking specific nodes to conduct LCSS(Q, Ai).
SmartTrace Algorithm (1/2)
Input: Query Trajectory Q, m Target Trajectories, Result
Preference K (K << m), Iteration Step Increment λ.
Output: K trajectories most similar to Q.
At the query node QN:
1. Upper Bound (UB) Computation: Instruct each of the m smartphones to invoke a computation of the linear-time LCSS(MBEQ,Ai) (i ≤ m).
2. Collection of UB: Receive the UBs of all m trajectories participating in the query and add those scores to the METADATA vector stored at QN. Let METADATA be sorted in descending order based on the UB scores.
SmartTrace Algorithm (2/2)
3. Full Computation: Ask the λ (λ ≥ K) highest UB nodes to compute LCSS(Q,Ai) and then send back the computed “full” scores.
4. Termination Condition: If the next highest UB is smaller than the K-th largest full score then stop; else goto step 3 in order to identify the next λ candidates.
[Figure: two Top-2 examples comparing UB scores with FULL scores (values 22 and 23), one leading to STOP and one to CONTINUE.]
5. (Tentative) Ship Matching: If the termination condition has been met, tentatively ship the respective matches to QN, based on some local trace disclosure policy.
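The query-node side of steps 2 to 4 can be sketched as a small simulation. This is an illustration, not the authors' implementation: the name `smarttrace_topk` is hypothetical, and the callback `full_score` stands in for the network round trip that asks a node to compute LCSS(Q,Ai). The scores are the six-node UB/EXACT values from the deck's "SmartTrace Execution" example.

```python
import heapq


def smarttrace_topk(upper_bounds, full_score, k, lam):
    """Simulate the QN loop; returns (top-k list, number of nodes asked).

    upper_bounds: dict node_id -> UB score, as collected in step 2.
    full_score:   callable node_id -> exact LCSS(Q, Ai); hypothetical
                  stand-in for asking the node over the network.
    """
    # Step 2: METADATA sorted in descending UB order.
    metadata = sorted(upper_bounds.items(), key=lambda kv: kv[1], reverse=True)
    full = {}
    pos = 0
    while pos < len(metadata):
        # Step 3: ask the next lam highest-UB nodes for their full scores.
        for node_id, _ in metadata[pos:pos + lam]:
            full[node_id] = full_score(node_id)
        pos += lam
        # Step 4: stop once the next UB is smaller than the K-th full score.
        if len(full) >= k:
            kth = heapq.nlargest(k, full.values())[-1]
            if pos >= len(metadata) or metadata[pos][1] < kth:
                break
    top = sorted(full.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top, len(full)


ub = {"A4": 30, "A2": 27, "A0": 25, "A3": 20, "A9": 18, "A7": 12}
exact = {"A4": 23, "A2": 22, "A0": 16, "A3": 18, "A9": 15, "A7": 10}
top, asked = smarttrace_topk(ub, exact.__getitem__, k=2, lam=2)
print(top, asked)
```

In this example the first round (A4, A2) cannot terminate because the next UB (25) exceeds the second-best full score (22). After the second round (A0, A3) the next UB is 18, so the loop stops having contacted four of the six nodes for the full computation.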
SmartTrace Protocol (STP)
[Figure: message exchange between the Querying Node / Server (QN) and a Participating Node in three numbered steps, involving LCSS(MBEQ,Ai) and LCSS(Q,Ai).]
Text Protocol, RFC-like specification
Experimental Methodology
• Datasets & Queries
– Oldenburg (Realistic): IAPG Institute, Germany
• Dataset
http://iapg.jade-hs.de/personen/brinkhoff/generator/
– 2,000 Car Trajectories moving in the city of Oldenburg.
– Trajectory Length: 11,731 ±7,193 points
• Queryset
– Randomly sampled out of the original dataset with interpolated noise
– Trajectory Length: 100 points.
– GeoLife (Real): Microsoft Research Asia
• Dataset
http://research.microsoft.com/en-us/projects/geolife/
– 1,100 Human Trajectories over the city of Beijing in the time frame
2007-2009 (1 sample / 5 seconds or 1 sample / 10 meters)
– Trajectory Length: 190,110 ±126,590 points
• Queryset
– Randomly sampled out of the original dataset with interpolated noise
– Trajectory Length: 500 points
Experimental Methodology
• Algorithms:
– Centralized (C): 1) Ship Trajectories to QN; 2) Conduct
centralized LCSS(Q,Ai) computation;
– Decentralized (D): 1) Ship Q to all nodes; 2) Conduct the
LCSS(Q,Ai) computation locally;
– SmartTrace (ST): 1) Ship Q to all nodes; 2) Conduct the linear-time LCSS(MBEQ,Ai) computation; 3) Iteratively ask specific nodes to calculate LCSS(Q,Ai);
• Metrics:
– Execution Time (T): The total time to answer the query.
– Amortized Energy (E) per Device: average energy consumed
by a smartphone for answering the query (based on Powertutor
profile – Univ. of Michigan)
– δ and ε (temporal and spatial matching) parameters are kept
constant for all experiments. The values affect the matching
granularity, which is similar for all algorithms.
Experimental Results
(Execution Time)
Result I: ST and D are 1 order of magnitude (10x) faster than C.
Expl: ST and D rely mainly on processing while C relies on data transfer, which is slow!
Result II: ST is faster than D (i.e., 17% and 8%, respectively for the two datasets)
Expl: Attributed to the variable length of trajectories (i.e., D always compares against the longest trajectory while ST compares against it only if it belongs to the candidate S-set)
Experimental Results
(Energy Consumption)
Result III: C is network-intensive while ST and D are CPU-intensive.
Expl: ST and D have very little network activity (i.e., which accounts for 2.59mJ and 2.29mJ, respectively)
Result IV:
- ST is 67% more energy efficient than D
- ST is 81% more energy efficient than C
- Expl: ST doesn’t execute LCSS(Q,Ai) on all nodes.
Prototype System (GPS)
• SmartTrace: Implemented as a Client-Server text-based protocol
– Server implemented in JAVA (4,500 LOC)
– Client implemented in JAVA on Android (2,500 LOC + XML files)
[Figure: prototype views labeled Query, Device B and Device C.]
* “SmartTrace: Finding Similar Trajectories in Smartphone Networks without Disclosing the Traces”, C. Costa, C. Laoudias, D. Zeinalipour-Yazti, D. Gunopulos, Demo at the 27th IEEE Intl. Conf. on Data Engineering (ICDE’11), Hannover, Germany, 2011.
Prototype System (GPS)
[Figure: prototype screens labeled Privacy Setting, Answer With Trace, and Answer.]
Prototype System (RSS)
The SmartTrace algorithm works equally well for indoor environments (using Received Signal Strength, RSS)
[Figure: points labeled A, B, Γ, Δ, Ε, Ζ, Η.]
Future Work
• Evaluate the SmartTrace prototype system over
the SmartNet testbed we are developing.
• Develop extensions that do not require the
iterative execution of LCSS(Q,Ai) but can
postpone them to a final post-processing step.
• Develop new Similarity Measures for (Highly
Dimensional) RSS Trajectories.
• Develop a killer application for our algorithm and deploy the executable APK on Google Market to gain further experience with it. Possibly also develop a client for iPhone devices.
Future Work
SmartNet
Programming cloud for the
development of smartphone
network applications &
protocols as well as
experimentation with real
smartphone devices.
Install APK,
Upload File,
Reboot, …
Disclosure-free GPS Trace
Search in Smartphone Networks
Demetrios Zeinalipour-Yazti
Christos Laoudias
Maria I. Andreou
and Dimitrios Gunopulos
Thanks!
Questions?
http://www.cs.ucy.ac.cy/~dzeina/
www.modap.org
LCSS Definition
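One commonly used recursive definition of LCSS with a temporal window δ and a spatial threshold ε can be written as follows. This is a sketch under assumptions and not necessarily the exact formulation used in the talk: Head(A) denotes A without its last point, a_n and b_m are the last points of A and B, and the spatial test is applied to every coordinate.

\[
LCSS_{\delta,\varepsilon}(A, B) =
\begin{cases}
0 & \text{if } A \text{ or } B \text{ is empty} \\
1 + LCSS_{\delta,\varepsilon}(Head(A), Head(B)) & \text{if } |a_n - b_m| \le \varepsilon \text{ and } |n - m| \le \delta \\
\max\{ LCSS_{\delta,\varepsilon}(Head(A), B),\ LCSS_{\delta,\varepsilon}(A, Head(B)) \} & \text{otherwise}
\end{cases}
\]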
Experimental Results
(Varying K Parameter)
Result V: Performance results are the same when the preference K is constrained to within 1% of the answer set (typical for top-K query processing algorithms).
Experimental Results
(Varying the λ Parameter)
• The λ parameter defines how aggressively ST explores the top-k result set (Higher λ => Faster Convergence)
Theorem: ST requires O(m/λ) iterations in the worst case, where λ denotes the step increment and m the number of trajectories.
Result VI (λ-convergence): Our algorithm converges in 7.6 and 9.3 iterations, on average, for the Oldenburg and GeoLife datasets, respectively.
SmartTrace Execution
Query: Find the K=2 most similar trajectories to Q
METADATA at QN, sorted by UB, next to the EXACT scores:
id   UB   EXACT
A4   30   23
A2   27   22
A0   25   16
A3   20   18
A9   18   15
A7   12   10
....
– Ask A4 & A2 for the computation of LCSS (e.g., LCSS(Q,A4)=23).
– Stop if Kth LCSS >= Last UB, i.e., the UB at position λ+1 (here K=2; full scores so far: 23, 22).
SmartNet: Programming Cloud
• Currently, there are no testbeds (like MoteLab, PlanetLab) for realistically emulating and prototyping Smartphone Network applications and protocols at a large scale.
• Currently, applications are tested in emulators.
– Drawbacks:
• Sensors are not emulated.
• It is difficult to concurrently
re-program several devices
between the devices.
• The MobNet project (at UCY, 2010-2012) will develop an innovative cloud testbed of mobile sensor devices using 50+ Android devices.
Road Traffic Mapping (RTM): Past
Mapping Road Traffic is traditionally carried out with
fixed cameras & sensors mounted on roadsides
http://www.rta.nsw.gov.au/
RTM with Smartphone Networks: Future
Opportunistic (w/out user interaction) and Participatory Sensing (w/ user interaction): Mapping the road traffic by collecting WiFi signals.
Received Signal Strength (RSS): power present in WiFi radio signal
[Figure: road map with points labeled A, B, Γ, Δ, Ε, Ζ, Η.]
Graphics courtesy of: A. Thiagarajan et al., “VTrack: Accurate, Energy-Aware Road Traffic Delay Estimation Using Mobile Phones”, in SenSys’09, pages 85-98, ACM (Best Paper), MIT’s CarTel Group.