Privacy of Location Trajectory
Chi-Yin Chow
Department of Computer Science
City University of Hong Kong
Mohamed F. Mokbel
Department of Computer Science and Engineering
University of Minnesota
Outline
• Introduction
• Protecting Trajectory Privacy in Location-based Services
• Protecting Privacy in Trajectory Publication
• Future Research Directions
Data Privacy
• Example: Hospitals want to publish medical records for
public health research
• Records contain sensitive personal information
• Natural way: remove known identifiers (de-identify)
Medical Records: Gender, Zip Code, Date of Birth, Diagnosis, ...
Is De-identification Enough?
Medical Records: Gender, Zip Code, Date of Birth, Diagnosis, ...
Voter Registration Records: Name, ..., Gender, Zip Code, Date of Birth
Quasi-identifiers: Gender, Zip Code, and Date of Birth appear in both the medical records and the voter registration records, so joining the two on these attributes can link a name to a diagnosis.
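The linking step can be illustrated with a small sketch. All records, names, and column keys below are hypothetical toy data, and the join is a plain dictionary lookup rather than any particular tool's method.

```python
# Hypothetical toy data; every name and value here is made up for illustration.
medical = [
    {"gender": "F", "zip": "55455", "dob": "1970-01-02", "diagnosis": "flu"},
    {"gender": "M", "zip": "55414", "dob": "1985-07-30", "diagnosis": "asthma"},
]
voters = [
    {"name": "Alice", "gender": "F", "zip": "55455", "dob": "1970-01-02"},
    {"name": "Bob", "gender": "M", "zip": "55401", "dob": "1985-07-30"},
]
QUASI_IDENTIFIERS = ("gender", "zip", "dob")


def link(medical_rows, voter_rows, keys=QUASI_IDENTIFIERS):
    """Join the two tables on the quasi-identifier columns."""
    index = {}
    for v in voter_rows:
        index.setdefault(tuple(v[k] for k in keys), []).append(v["name"])
    matches = []
    for m in medical_rows:
        names = index.get(tuple(m[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], m["diagnosis"]))
    return matches


reidentified = link(medical, voters)
```

In this toy data only the first medical record has a unique match, so only that record is re-identified even though no name was ever published with it.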
Data Privacy-Preserving Techniques
• k-anonymity (Sweeney, IJUFKS’02)
• Each record is indistinguishable from at least k-1 other records
• l-diversity (Machanavajjhala et al., TKDD’07)
• At least l distinct values for the sensitive attribute in each equivalence class
• t-closeness (Li et al., TKDE’10)
• Distribution of the sensitive attribute in each equivalence class is close to its distribution in the entire data set
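A minimal sketch of how the first two properties might be checked on a generalized table. The column names and rows are hypothetical, and the l-diversity test is the simplest "distinct values" variant, not the entropy-based or recursive definitions.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_ids):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[q] for q in quasi_ids)].append(r)
    return groups


def is_k_anonymous(rows, quasi_ids, k):
    """True if every equivalence class holds at least k rows."""
    return all(len(g) >= k for g in equivalence_classes(rows, quasi_ids).values())


def is_l_diverse(rows, quasi_ids, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values."""
    return all(
        len({r[sensitive] for r in g}) >= l
        for g in equivalence_classes(rows, quasi_ids).values()
    )


# Hypothetical generalized table.
rows = [
    {"zip": "554**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "554**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "551**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "551**", "age": "40-49", "diagnosis": "flu"},
]
```

The toy table is 2-anonymous but not 2-diverse: the second equivalence class contains only one diagnosis, so membership in that class reveals the sensitive value.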
Location Privacy
• Location-Based Services (LBS)
• Untrusted LBS service provider: location privacy leakage
Location Privacy-Preserving Techniques
• False Location
• Users generate fake locations
• Space Transformation
• Transform locations into another space
• Spatial Cloaking
• Blur the user’s location into a cloaked region
More Challenging: Trajectory Privacy
• The hospital example
• Suppose the patients’ trajectories are to be published
• Trajectory T:
• De-identified
Suppose an adversary knows that a patient visited (1, 5) and (8, 10) at timestamps 2 and 5, respectively.
The adversary can then find the patient's record and learn that he has HIV (the sensitive attribute)!
Locations are powerful quasi-identifiers!
Two Kinds of Trajectory
• Real-time Trajectory -- Continuous LBS
• “Continuously inform me of the traffic conditions within 1 mile of my vehicle”
• “Let me know my friends’ locations if they are within 2 km of my location”
• Off-line Trajectory -- Historical Trajectory
• Publish trajectory data for public research
• Answer spatio-temporal range queries
Continuous Location-based Services
vs. Trajectory Publication
• Scalability Requirement
• Continuous LBS: Real-time
• Historical Trajectory: Off-line
• Applicability of Global Optimization
• Continuous LBS: Dynamic, Uncertain
• Historical Trajectory: Static
Protecting Trajectory Privacy in LBS
• Category-I LBS: Require consistent user identities.
• “Let me know my friends’ locations if they are within 2 km of my location”
• Category-II LBS: Do not require consistent user identities.
• “Send e-coupons to users within 1 km of my coffee shop”
Protecting Trajectory Privacy in LBS
• Spatial cloaking
• Mix-zones
• Vehicular mix-zones
• Path confusion
• Path confusion with mobility prediction and data caching
• Euler histogram-based on short IDs
• Dummy trajectories
Spatial Cloaking
• Main Idea: Blur the user’s location into a cloaked region
• k-anonymity
• Challenge: From snapshot location to continuous trajectory
• Trajectory tracing attack
• Anonymity-set tracing attack
• Supports consistent user identities
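A rough sketch of snapshot spatial cloaking with k-anonymity, under stated assumptions: coordinates are non-negative, the anonymizer knows all user positions, and the cloaked region is the smallest grid-aligned square (doubling in size) that contains the user and at least k users in total. The function name and parameters are hypothetical, not taken from any of the cited systems.

```python
import math


def cloak(user, all_users, k, cell=1.0):
    """Return a grid-aligned square (xmin, ymin, xmax, ymax) that contains
    `user` and at least k users from `all_users` (which includes `user`)."""
    if k > len(all_users):
        raise ValueError("fewer than k users exist")
    if any(x < 0 or y < 0 for x, y in all_users):
        raise ValueError("this sketch assumes non-negative coordinates")
    size = cell
    while True:
        # Lower-left corner of the grid cell of the current size holding the user.
        x0 = math.floor(user[0] / size) * size
        y0 = math.floor(user[1] / size) * size
        count = sum(
            1 for x, y in all_users
            if x0 <= x < x0 + size and y0 <= y < y0 + size
        )
        if count >= k:
            return (x0, y0, x0 + size, y0 + size)
        size *= 2  # not enough users yet: move to the enclosing larger cell


users = [(1.5, 1.5), (0.2, 0.3), (3.1, 0.4), (2.5, 2.5), (7.0, 7.0)]
region = cloak(users[0], users, k=3)
```

Aligning the region to a grid, instead of centering it on the user, avoids revealing the user's position as the center of the region. The sketch only handles a single snapshot, which is exactly the limitation that the two tracing attacks exploit.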
Trajectory Tracing Attack (1/2)
Suppose R1 and R2 are two cloaked regions for user U at t1 and t2, respectively.
Suppose attacker knows U’s maximum speed.
[Figure: users A, B, and C with cloaked region R1 at time t1 and cloaked region R2 at time t2, and the maximum bound; axes x, y, time]
Trajectory Tracing Attack (2/2)
Attacker could infer which user is U! (Here it is C)
[Figure: cloaked regions R1 (time t1) and R2 (time t2) with the maximum bound and users A, B, and C; axes x, y, time]
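The attack can be sketched as a rectangle computation. This is a simplified illustration: regions are axis-aligned rectangles (xmin, ymin, xmax, ymax), the maximum bound is approximated by expanding R1 by vmax * (t2 - t1) on every side, and the attacker is assumed to know the positions of the users inside R2. All function names and numbers are hypothetical.

```python
def expand(rect, r):
    """Grow a rectangle by r on every side (approximate maximum bound)."""
    x0, y0, x1, y1 = rect
    return (x0 - r, y0 - r, x1 + r, y1 + r)


def intersect(a, b):
    """Intersection of two rectangles, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 <= x1 and y0 <= y1 else None


def contains(rect, p):
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def tracing_attack(r1, t1, r2, t2, vmax, users_at_t2):
    """Return the feasible part of R2 and the users that can still be U."""
    bound = expand(r1, vmax * (t2 - t1))
    feasible = intersect(bound, r2)
    if feasible is None:
        return None, []
    names = [n for n, p in users_at_t2.items() if contains(feasible, p)]
    return feasible, names


# Toy numbers chosen so that only C remains feasible.
feasible, suspects = tracing_attack(
    r1=(0, 0, 2, 2), t1=0, r2=(3, 3, 8, 8), t2=1, vmax=2,
    users_at_t2={"A": (6, 7), "B": (7, 4), "C": (3.5, 3.5)},
)
```

With these toy values the feasible area shrinks to a small corner of R2 that contains only C, so the 3-anonymity of R2 no longer protects U.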
Trajectory Tracing Attack: Solution
[Figure: two panels, Patching Technique and Delaying Technique, each showing cloaked regions R1 and R2 with users A, B, and C and the maximum bound over times t1, t2, ..., tn; axes x, y, time]
(Cheng et al., PETS’06)
Anonymity-set Tracing Attack
[Figure: 3-anonymous cloaked spatial regions among users A to H at time t1 and at time t2; axes x, y]
Anonymity-set Tracing Attack:
Solution
• Solution 1: Group-based Approach
• Solution 2: Distortion-based Approach
• Solution 3: Prediction-based Approach
Solution 1: Group-based Approach
[Figure: 3-anonymous cloaked spatial regions among users A to H at times t1, t2, and t3; axes x, y]
• Group members are fixed
• All members need to report their locations to the anonymizer server periodically
(Chow et al., SSTD’07)
Solution 2: Distortion-based Approach
[Figure: cloaked region R1 with users A, B, and C at time t1, and regions R1, R2, ..., Rn-1, Rn at times t1, t2, ..., tn-1, tn; axes x, y, time]
• Other members do not need to report their locations periodically
• Use the members' initial directions and velocities to calculate distortion regions
• Use distortion regions as new cloaked regions
(Pan et al., SIGSPATIAL’09)
Solution 3: Prediction-based Approach
• Predict user’s trajectory
• Cloak it with other users’ historical trajectories
[Figure: expected trajectory and historical trajectories of users u1, u2, and u3, with labels p1 to p5 and C1 to C5]
(Xu et al., INFOCOM’08)
Mix-Zones (1/2)
• Main Idea:
• Users change pseudonyms when entering mix-zones
• Users do not reveal their locations while they are in mix-zones
• k-anonymity
• Does not support consistent user identities
Mix-Zones (2/2)
[Figure: a mix-zone with labels a, b, c and x, y, z]
• Ensuring k-anonymity
• At least k users are in the mix-zone at a certain point in time
• Each user spends a completely random duration of time in the mix-zone
• Each user is equally likely to leave through any exit point, regardless of the entry point used
(Freudiger et al., PETS’09)
Vehicular Mix-Zones (1/2)
• Mix-zones designed for Euclidean space are not secure enough for vehicle movements, which are affected by:
• Physical roads
• Vehicle directions
• Speed limits
• Traffic conditions
• Road conditions
[Figure: a mix-zone with labels a, b, c, d and road segments Seg1in, Seg1out, Seg2in, Seg2out, Seg3in, Seg3out]
Vehicular Mix-Zones (2/2)
• Adaptive mix-zones:
• Road intersection, together with outgoing road segments
[Figure: an adaptive mix-zone with labels a, b, c, d and road segments Seg1in, Seg1out, Seg2in, Seg2out, Seg3in, Seg3out]
(Palanisamy et al., ICDE’11)
Path Confusion
• Goal: Avoid linking consecutive location samples to individual vehicles
• Main Idea: A central server controls the release of location data to satisfy “time-to-confusion”
• Does not support consistent user identities
(Gruteser et al., MobiSys’03)
Path Confusion with Mobility Prediction and Data Caching
• Main Idea: The location anonymizer predicts vehicular movement paths, pre-fetches the spatial data on the predicted paths, and stores the data in a cache
• The service provider can only see queries for a series of interweaving paths
[Figure: user U on a road network with nodes a to f; the data on the predicted path are cached]
(Meyerowitz et al., MobiCom’09)
Euler Histogram-based on Short IDs
(EHSID)
• Goal: Privacy-aware traffic monitoring (answering aggregate queries over a given region)
• ID-based query (count of unique vehicles) (need ID?)
• Entry-based query (count of entries)
• Short ID: Partial ID information about objects
• Full ID: 1 1 0 1 1 1 0 1 1
• Bit Pattern: 1, 3, 4, 7
• Short ID: 1 0 1 0
• Euler Histogram: Answers aggregate queries
• Does not support consistent user identities
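The short-ID example can be reproduced with a few lines, assuming the bit pattern lists 1-indexed positions counted from the left of the full ID (an interpretation that matches the values shown above). The function name is hypothetical.

```python
def short_id(full_id, bit_pattern):
    """Keep only the bits of `full_id` at the 1-indexed positions in `bit_pattern`."""
    return [full_id[pos - 1] for pos in bit_pattern]


full_id = [1, 1, 0, 1, 1, 1, 0, 1, 1]
bit_pattern = [1, 3, 4, 7]
sid = short_id(full_id, bit_pattern)  # [1, 0, 1, 0]
```

Because many full IDs map to the same short ID, a short ID alone carries only partial identity information.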
(Xie et al., IEEE Trans. ITS’10)
Euler Histogram
Use an Euler histogram to count the distinct rectangles in a query region R:
Number of distinct rectangles = F + V – E = 6 + 1 – 5 = 2
• F is the sum of face counts inside R: F = 1+2+1+2 = 6
• V is the sum of vertex counts inside R (excluding its boundary): V = 1
• E is the sum of edge counts inside R (excluding its boundary): E = 1+1+1+2 = 5
[Figure: Euler histogram grid with face, edge, and vertex counts for rectangles A, B, and C, and the query region]
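A small sketch of the F + V – E computation, assuming rectangles and query regions are aligned to a grid of unit cells and given as inclusive cell ranges. The class and its method names are hypothetical, and the data below are toy values, not the rectangles A, B, and C from the slide.

```python
from collections import Counter


class EulerHistogram:
    """Face, edge, and vertex counts on a grid of unit cells (a sketch)."""

    def __init__(self):
        self.faces = Counter()     # (i, j): cell
        self.v_edges = Counter()   # (i, j): edge between cells (i, j) and (i+1, j)
        self.h_edges = Counter()   # (i, j): edge between cells (i, j) and (i, j+1)
        self.vertices = Counter()  # (i, j): corner shared by cells (i..i+1, j..j+1)

    def insert(self, x0, y0, x1, y1):
        """Add a rectangle covering cells x0..x1 by y0..y1 (inclusive)."""
        for i in range(x0, x1 + 1):
            for j in range(y0, y1 + 1):
                self.faces[(i, j)] += 1
                if i < x1:
                    self.v_edges[(i, j)] += 1
                if j < y1:
                    self.h_edges[(i, j)] += 1
                if i < x1 and j < y1:
                    self.vertices[(i, j)] += 1

    def count_distinct(self, x0, y0, x1, y1):
        """F + V - E over the query region, excluding its boundary."""
        f = v = e = 0
        for i in range(x0, x1 + 1):
            for j in range(y0, y1 + 1):
                f += self.faces[(i, j)]
                if i < x1:
                    e += self.v_edges[(i, j)]
                if j < y1:
                    e += self.h_edges[(i, j)]
                if i < x1 and j < y1:
                    v += self.vertices[(i, j)]
        return f + v - e


hist = EulerHistogram()
hist.insert(0, 0, 2, 1)
hist.insert(1, 1, 3, 3)
hist.insert(5, 5, 6, 6)
```

Each rectangle that overlaps the query region contributes exactly 1 to F + V – E, because its overlap is itself a rectangle of cells, so the histogram returns the number of distinct rectangles without storing any rectangle identity.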
Euler Histogram-based on Short IDs (EHSID)
• Answering four types of queries
• ID-based cross-border
• ID-based distinct-objects
• Entry-based cross-border
• Entry-based distinct-objects
[Figure: trajectories of V1 and V2 and a query region]
Query Answers
Query Types  | Cross-border | Distinct-object
ID-based     | 1            | 2
Entry-based  | 2            | 3
• How to calculate these answers using Euler Histogram?
Define Four Types of Vertices
[Figure: a query region Q over road segments a to f crossed by two trajectories, with per-short-ID counts (01, 10) stored at vertices (V) and edges (E); the four vertex types are labeled (JO), (OB), (JI), and (CI)]
Euler Histogram-based on Short IDs (EHSID)
[Figure: the query region Q, road segments a to f, and two trajectories, with per-short-ID counts (01, 10) at vertices (V) and edges (E) of types (JO), (OB), (JI), and (CI)]
Dummy Trajectories
• Main Idea: Users generate fake location trajectories
• How to choose dummy trajectories?
• How to measure the degree of privacy protection?
• Supports consistent user identities
(You et al., PALMS’07)
How to Choose Dummy Trajectories
• Snapshot disclosure (SD): Average probability of successfully inferring each true location
• Trajectory disclosure (TD): Probability of successfully identifying the true trajectory among all possible trajectories
• Distance deviation (DD): Average distance between the ith location samples of the real trajectory and each dummy trajectory
[Figure: real trajectory Tr and dummy trajectories Td1 and Td2, with labeled points s1 to s3, d1 to d3, I1, and I2; axes x (0 to 5) and y (1 to 4)]
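One possible reading of SD and DD can be sketched as follows. The assumptions are mine: all trajectories have the same number of samples, an attacker guesses uniformly among the distinct locations reported at each sample, and distance is Euclidean. TD is omitted because it depends on how trajectory intersections are counted, which the slide does not specify. The data are toy values.

```python
import math


def snapshot_disclosure(real, dummies):
    """Average over samples of 1 / (number of distinct reported locations)."""
    trajectories = [real] + dummies
    probs = []
    for i in range(len(real)):
        distinct = {t[i] for t in trajectories}
        probs.append(1 / len(distinct))
    return sum(probs) / len(probs)


def distance_deviation(real, dummies):
    """Average, over dummies, of the mean distance between matching samples."""
    per_dummy = [
        sum(math.dist(p, q) for p, q in zip(real, d)) / len(real)
        for d in dummies
    ]
    return sum(per_dummy) / len(per_dummy)


real = [(1, 1), (2, 2), (3, 3)]
dummies = [
    [(1, 2), (2, 2), (3, 4)],  # shares the second sample with the real trajectory
    [(1, 4), (2, 5), (3, 3)],  # shares the third sample with the real trajectory
]
sd = snapshot_disclosure(real, dummies)
dd = distance_deviation(real, dummies)
```

Under these assumptions, dummies that overlap the real trajectory reduce the number of distinct locations per sample and therefore raise SD, while dummies that stray far from it raise DD.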
Protecting Privacy in Trajectory
Publication
• Clustering-based Anonymization Approach
• Generalization-based Anonymization Approach
• Suppression-based Anonymization Approach
• Grid-based Anonymization Approach
Clustering-based Anonymization Approach
• Main Idea: Group k co-localized trajectories within the same time period to form a k-anonymized aggregate trajectory.
• Trajectory Uncertainty Model
[Figure: a trajectory and its trajectory volume, whose horizontal disks have radius d (the uncertainty threshold); axes x, y, time]
(Abul et al., ICDE’08)
Clustering-based Anonymization Approach
Aggregate trajectory of a set of 2-anonymized co-localized trajectories
[Figure: trajectory volumes of Tp and Tq (radius = d), the bounding trajectory volume of Tp and Tq (radius = d/2), and the aggregate trajectory; axes x, y, time]
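For the two-trajectory case, co-localization and the aggregate trajectory might be sketched as below. This assumes both trajectories are sampled at the same timestamps, treats two trajectories as co-localized when every pair of matching samples is within distance d, and uses the point-wise midpoint as the aggregate. These are simplifying assumptions for illustration, not the published algorithm.

```python
import math


def co_localized(tp, tq, d):
    """True if matching samples of the two trajectories are always within d."""
    return len(tp) == len(tq) and all(
        math.dist(p, q) <= d for p, q in zip(tp, tq)
    )


def aggregate(tp, tq):
    """Point-wise midpoint of two trajectories."""
    return [((px + qx) / 2, (py + qy) / 2) for (px, py), (qx, qy) in zip(tp, tq)]


tp = [(0, 0), (1, 0), (2, 0)]
tq = [(0, 1), (1, 1), (2, 1)]
agg = aggregate(tp, tq)
```

When the two samples at a timestamp are at most d apart, each lies within d/2 of their midpoint, which is consistent with the bounding volume of radius d/2 around the aggregate trajectory.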
Generalization-based Anonymization Approach
• Main Idea:
• Step 1: Generalize a trajectory data set into a sequence of k-anonymized regions
• Step 2: Uniformly select k atomic points from each anonymized region and reconstruct k trajectories
(Nergiz et al., TDP’09)
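The two steps can be sketched for a single group of k trajectories. The assumptions here are mine: the trajectories are already aligned sample by sample, each anonymized region is the bounding box of the group's samples at that step, and points are drawn uniformly from the box. Grouping and alignment, which the full method must also handle, are left out, and the function names are hypothetical.

```python
import random


def generalize(trajectories):
    """Step 1: one bounding box (xmin, ymin, xmax, ymax) per sample index."""
    regions = []
    for points in zip(*trajectories):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        regions.append((min(xs), min(ys), max(xs), max(ys)))
    return regions


def reconstruct(regions, k, rng):
    """Step 2: draw k points uniformly from each region and chain them."""
    return [
        [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for x0, y0, x1, y1 in regions]
        for _ in range(k)
    ]


group = [
    [(0, 0), (1, 2), (3, 3)],
    [(1, 1), (2, 2), (4, 5)],
    [(0, 2), (2, 4), (5, 4)],
]
regions = generalize(group)
published = reconstruct(regions, k=len(group), rng=random.Random(0))
```

The published trajectories stay inside the anonymized regions but no longer coincide with any original trajectory, which is the point of the reconstruction step.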
Suppression-based Anonymization Approach
• Main Idea: Iteratively suppress locations until the privacy constraint is met
• Privacy constraint
• Difference between transformed trajectories and original ones
(Terrovitis et al., MDM’08)
Suppression-based Anonymization Approach
The probability that an adversary can identify the actual user of any location pi
[Figure: suppress location a1]
Suppression-based Anonymization Approach
Calculate the difference between the transformed trajectory and the original
[Figure: suppress location a1]
Grid-based Anonymization Approach
• Main Idea: Replace locations with grid cells (cells may have different resolutions)
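As a rough illustration of the main idea only, the sketch below replaces each location with the index of the uniform grid cell that contains it. The cell size is a hypothetical parameter that stands in for the resolution, and the details of the cited method are not reproduced.

```python
import math


def to_cell(point, cell_size):
    """Index (column, row) of the grid cell that contains `point`."""
    x, y = point
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def anonymize(trajectory, cell_size):
    """Replace every location in a trajectory with its grid cell."""
    return [to_cell(p, cell_size) for p in trajectory]


trajectory = [(0.4, 1.7), (2.2, 3.9)]
fine = anonymize(trajectory, cell_size=1.0)
coarse = anonymize(trajectory, cell_size=2.0)
```

A larger cell size hides more detail about each location at the cost of less precise published data.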
(Gidofalvi et al., MDM’07)
Future Directions
• Personalized LBS (require more user semantics)
• User preferences and background information could be used as quasi-identifiers
• Trajectory publication supporting more complex queries
• Spatio-temporal queries
• Spatio-temporal data analysis