Privacy of Location Trajectory

Chi-Yin Chow
Department of Computer Science
City University of Hong Kong

Mohamed F. Mokbel
Department of Computer Science and Engineering
University of Minnesota

Outline
• Introduction
• Protecting Trajectory Privacy in Location-based Services
• Protecting Privacy in Trajectory Publication
• Future Research Directions

Data Privacy
• Example: Hospitals want to publish medical records for public health research
• The records contain sensitive personal information
• Natural approach: remove known identifiers (de-identify)
[Figure: medical records with attributes Gender, Zip Code, Date of Birth, Diagnosis, ...]

Is De-identification Enough?
[Figure: medical records (Gender, Zip Code, Date of Birth, Diagnosis, ...) alongside voter registration records (Name, ..., Gender, Zip Code, Date of Birth)]

Is De-identification Enough?
[Figure: voter registration records (Name, ...) and medical records (..., Diagnosis) overlapping on Gender, Zip Code, and Date of Birth, labeled "Quasi-identifiers"]

Data Privacy-Preserving Techniques
• k-anonymity (Sweeney, IJUFKS’02)
  • Each record is indistinguishable among at least k records
• l-diversity (Machanavajjhala et al., TKDD’07)
  • At least l values for sensitive attributes
• t-closeness (Li et al., TKDE’10)
  • Distribution of sensitive attributes (in an equivalence class vs. in the entire data set)

Location Privacy
• Location-Based Services (LBS)
• Untrusted LBS service provider
  • Location privacy leakage

Location Privacy-Preserving Techniques
• False Location
  • Users generate fake locations
• Space Transformation
  • Transform locations into another space
• Spatial Cloaking
  • Blur the user’s location into a cloaked region

More Challenging: Trajectory Privacy
• The hospital example
  • Suppose the trajectories of patients are to be published
  • Trajectory T:
  • De-identified
• Suppose the adversary knows that a patient visited (1, 5) and (8, 10) at timestamps 2 and 5, respectively
  • He has HIV! (sensitive attribute)
• Powerful quasi-identifiers!
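The linking attack on the hospital example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the published records, their field names (`trajectory`, `diagnosis`), and the helper `matching_records` are hypothetical and do not come from the slides. Only the adversary's two known points, (1, 5) at timestamp 2 and (8, 10) at timestamp 5, are taken from the example.

```python
from typing import Dict, List, Tuple

# A trajectory point is (x, y, timestamp).
Point = Tuple[int, int, int]


def matching_records(published: List[Dict], known_points: List[Point]) -> List[Dict]:
    """Return every published record whose trajectory contains all known points."""
    known = set(known_points)
    return [rec for rec in published if known <= set(rec["trajectory"])]


# Hypothetical de-identified release: no names, only trajectories and diagnoses.
published = [
    {"trajectory": [(1, 5, 2), (6, 7, 4), (8, 10, 5), (11, 8, 8)], "diagnosis": "HIV"},
    {"trajectory": [(1, 5, 2), (2, 7, 3), (3, 8, 6)], "diagnosis": "Flu"},
    {"trajectory": [(5, 6, 1), (8, 10, 5), (9, 9, 7)], "diagnosis": "Fever"},
]

# Adversary's background knowledge from the slide example.
known_points = [(1, 5, 2), (8, 10, 5)]

candidates = matching_records(published, known_points)
if len(candidates) == 1:
    # A single match means the two known points acted as a quasi-identifier.
    print("Re-identified; diagnosis:", candidates[0]["diagnosis"])
```

In this toy release, either known point alone still leaves two candidate records, but the pair narrows the match to one. That is the sense in which a few spatio-temporal points can act as a powerful quasi-identifier.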

Two Kinds of Trajectory
• Real-time Trajectory: Continuous LBS
  • “Continuously inform me of the traffic conditions within 1 mile of my vehicle”
  • “Let me know my friends’ locations if they are within 2 km of my location”
• Off-line Trajectory: Historical Trajectory
  • Publish trajectory data for public research
  • Answer spatio-temporal range queries

Continuous Location-based Services vs. Trajectory Publication
• Scalability Requirement
  • Continuous LBS: Real-time
  • Historical Trajectory: Off-line
• Applicability of Global Optimization
  • Continuous LBS: Dynamic, Uncertain
  • Historical Trajectory: Static

Outline
• Introduction
• Protecting Trajectory Privacy in Location-based Services
• Protecting Privacy in Trajectory Publication
• Future Research Directions

Protecting Trajectory Privacy in LBS
• Category-I LBS: Require consistent user identities
  • “Let me know my friends’ locations if they are within 2 km of my location”
• Category-II LBS: Do not require consistent user identities
  • “Send e-coupons to users within 1 km of my coffee shop”

Protecting Trajectory Privacy in LBS
• Spatial cloaking
• Mix-zones
• Vehicular mix-zones
• Path confusion
• Path confusion with mobility prediction and data caching
• Euler histogram-based on short IDs
• Dummy trajectories

Spatial Cloaking
• Main Idea: Blur the user’s location into a cloaked region
  • k-anonymity
• Challenge: From snapshot location to continuous trajectory
  • Trajectory tracing attack
  • Anonymity-set tracing attack
• Supports consistent user identity

Trajectory Tracing Attack (1/2)
• Suppose R1 and R2 are two cloaked regions for user U at t1 and t2, respectively.
• Suppose the attacker knows U’s maximum speed.
[Figure: cloaked regions R1 (at t1) and R2 (at t2) containing users A, B, and C, with the maximum bound; axes x, y, time]

Trajectory Tracing Attack (2/2)
• The attacker could infer which user is U!
  (Here, it is C.)
[Figure: cloaked regions R1 (at t1) and R2 (at t2) containing users A, B, and C, with the maximum bound; axes x, y, time]

Trajectory Tracing Attack: Solution
• Patching Technique
• Delaying Technique
(Cheng et al., PETS’06)
[Figure: two panels, one per technique, showing cloaked regions R1 and R2 with users A, B, and C, the maximum bound, and the maximum time bound; axes x, y, time]

Anonymity-set Tracing Attack
[Figure: 3-anonymous cloaked spatial regions over users A to H at time t1 and at time t2; axes x, y]

Anonymity-set Tracing Attack: Solution
• Solution 1: Group-based Approach
• Solution 2: Distortion-based Approach
• Solution 3: Prediction-based Approach

Solution 1: Group-based Approach
[Figure: 3-anonymous cloaked spatial region over users A to H at times t1, t2, and t3; axes x, y]
• Group members are fixed
• All members need to report their locations to the anonymizer server periodically
(Chow et al., SSTD’07)

Solution 2: Distortion-based Approach
[Figure: cloaked region R1 with users A, B, and C at time t1, and distortion regions R1, R2, ..., Rn at times t1, ..., tn; axes x, y, time]
• Other members do not need to report their locations periodically
• Use their initial directions and velocities to calculate distortion regions
• Use distortion regions as new cloaked regions
(Pan et al., SIGSPATIAL’09)

Solution 3: Prediction-based Approach
• Predict the user’s trajectory
• Cloak it with other users’ historical trajectories
[Figure: expected trajectory through points p1 to p5 with cloaked regions C1 to C5, and historical trajectories of users u1 to u3]
(Xu et al., INFOCOM’08)

Protecting Trajectory Privacy in LBS
• Spatial cloaking
• Mix-zones
• Vehicular mix-zones
• Path confusion
• Path confusion with mobility prediction and data caching
• Euler histogram-based on short IDs
• Dummy trajectories

Mix-Zones (1/2)
• Main Idea:
  • Users change pseudonyms when entering mix-zones
  • Users do not reveal their locations while they are in mix-zones
• k-anonymity
• Does not support consistent user identity

Mix-Zones (2/2)
(Freudiger et al., PETS’09)
[Figure: a mix-zone with labels a, b, c and x, y, z]
• Ensuring k-anonymity
  • At least k users are in the mix-zone at a certain time point
  • Each user spends a completely random duration of time in the mix-zone
  • Each user is equally likely to exit at any exit point, regardless of the entry point used

Vehicular Mix-Zones (1/2)
• Mix-zones designed for Euclidean space are not secure enough for vehicle movements, which are affected by:
  • Physical roads
  • Vehicle directions
  • Speed limits
  • Traffic conditions
  • Road conditions
[Figure: mix-zone at a road intersection with vehicles a to d and road segments Seg1in, Seg1out, Seg2in, Seg2out, Seg3in, Seg3out]

Vehicular Mix-Zones (2/2)
• Adaptive mix-zones:
  • Road intersection, together with outgoing road segments
[Figure: adaptive mix-zone with vehicles a to d and road segments Seg1in, Seg1out, Seg2in, Seg2out, Seg3in, Seg3out]
(Palanisamy et al., ICDE’11)

Protecting Trajectory Privacy in LBS
• Spatial cloaking
• Mix-zones
• Vehicular mix-zones
• Path confusion
• Path confusion with mobility prediction and data caching
• Euler histogram-based on short IDs
• Dummy trajectories

Path Confusion
• Goal: Avoid linking consecutive location samples to individual vehicles
• Main Idea: A central server controls the release of location data to satisfy “time-to-confusion”
• Does not support consistent user identity
(Gruteser et al., MobiSys’03)

Path Confusion with Mobility Prediction and Data Caching
• Main Idea: The location anonymizer predicts vehicular movement paths, pre-fetches the spatial data on the predicted paths, and stores the data in a cache
• The service provider can only see queries for a series of interweaving paths
[Figure: user U on a road network with nodes a to f, showing the predicted path and the paths whose data are cached]
(Meyerowitz et al., MobiCom’09)

Protecting Trajectory Privacy in LBS
• Spatial cloaking
• Mix-zones
• Vehicular mix-zones
• Path confusion
• Path confusion with mobility prediction and data caching
• Euler histogram-based on short IDs
• Dummy trajectories

Euler Histogram-based on Short IDs (EHSID)
• Goal: Privacy-aware traffic monitoring (answering aggregate queries for a given region)
  • ID-based query (count of unique vehicles) (need ID?)
  • Entry-based query (count of entries)
• Short ID: Partial ID information about objects
  • Full ID: 1 1 0 1 1 1 0 1 1
  • Bit Pattern: 1, 3, 4, 7
  • Short ID: 1 0 1 0
• Euler Histogram: Answers aggregate queries
• Does not support consistent user identity
(Xie et al., IEEE Trans. ITS’10)

Euler Histogram
• Use an Euler histogram to count distinct rectangles in a query region R:
  count = F + V - E = 6 + 1 - 5 = 2
  • F is the sum of face counts inside R
  • V is the sum of vertex counts inside R (excluding its boundary)
  • E is the sum of edge counts inside R (excluding its boundary)
• For the example query region: F = 1+2+1+2 = 6, V = 1, E = 1+1+1+2 = 5
[Figure: Euler histogram grid with face, edge, and vertex counts for rectangles A, B, and C, with the query region marked]

Euler Histogram-based on Short IDs (EHSID)
• Answering four types of queries
  • ID-based cross-border
  • ID-based distinct-objects
  • Entry-based cross-border
  • Entry-based distinct-objects
[Figure: vehicles V1 and V2 and a query region]

Query answers by query type:
              Cross-border   Distinct-object
ID-based           1               2
Entry-based        2               3

• How to calculate these answers using an Euler histogram?

Define Four Types of Vertices
[Figure: two trajectories over road segments with nodes a to f crossing query region Q; vertex (V) and edge (E) counts are keyed by short IDs 01 and 10; vertex types are labeled JO, OB, JI, and CI]

Euler Histogram-based on Short IDs (EHSID)
[Figure: the same road network, query region Q, and per-short-ID vertex and edge counts as on the previous slide]

Protecting Trajectory Privacy in LBS
• Spatial cloaking
• Mix-zones
• Vehicular mix-zones
• Path confusion
• Path confusion with mobility prediction and data caching
• Euler histogram-based on short IDs
• Dummy trajectories

Dummy Trajectories
• Main Idea: Users generate fake location trajectories
• How to choose dummy trajectories?
• How to measure the degree of privacy protection?
• Supports consistent user identity
(You et al., PALMS’07)

How to Choose Dummy Trajectories
• Snapshot disclosure (SD): Average probability of successfully inferring each true location
• Trajectory disclosure (TD): Probability of successfully identifying the true trajectory among all possible trajectories
• Distance deviation (DD): Average distance between the i-th location samples of the real trajectory and each dummy trajectory
[Figure: real trajectory Tr and dummy trajectories Td1 and Td2 on an x-y grid, with points labeled s1 to s3, d1 to d3, I1, and I2]

Outline
• Introduction
• Protecting Trajectory Privacy in Location-based Services
• Protecting Privacy in Trajectory Publication
• Future Research Directions

Protecting Privacy in Trajectory Publication
• Clustering-based Anonymization Approach
• Generalization-based Anonymization Approach
• Suppression-based Anonymization Approach
• Grid-based Anonymization Approach

Clustering-based Anonymization Approach
• Main Idea: Group k co-localized trajectories within the same time period to form a k-anonymized aggregate trajectory.
• Trajectory Uncertainty Model
[Figure: a trajectory and its trajectory volume, built from horizontal disks of radius d (the uncertainty threshold); axes x, y, time]
(Abul et al., ICDE’08)

Clustering-based Anonymization Approach
• Aggregate trajectory of a set of 2-anonymized co-localized trajectories
[Figure: trajectory volumes of Tp and Tq (radius = d), the bounding trajectory volume of Tp and Tq (radius = d/2), and the aggregate trajectory; axes x, y, time]

Protecting Privacy in Trajectory Publication
• Clustering-based Anonymization Approach
• Generalization-based Anonymization Approach
• Suppression-based Anonymization Approach
• Grid-based Anonymization Approach

Generalization-based Anonymization Approach
• Main Idea:
  • Step 1: Generalize a trajectory data set into a sequence of k-anonymized regions
  • Step 2: Uniformly select k atomic points from each anonymized region and reconstruct k trajectories
(Nergiz et al., TDP’09)

Protecting Privacy in Trajectory Publication
• Clustering-based Anonymization Approach
• Generalization-based Anonymization Approach
• Suppression-based Anonymization Approach
• Grid-based Anonymization Approach

Suppression-based Anonymization Approach
• Main Idea: Iteratively suppress locations until the privacy constraint is met
  • Privacy constraint
  • Difference between transformed trajectories and original ones
(Terrovitis et al., MDM’08)

Suppression-based Anonymization Approach
• The probability that the adversary can identify the actual user of any location pi
[Figure: example in which location a1 is suppressed]

Suppression-based Anonymization Approach
• Calculate the difference between the transformed trajectory and the original
[Figure: example in which location a1 is suppressed]

Protecting Privacy in Trajectory Publication
• Clustering-based Anonymization Approach
• Generalization-based Anonymization Approach
• Suppression-based Anonymization Approach
• Grid-based Anonymization Approach

Grid-based Anonymization Approach
• Main Idea: Replace locations with grid cells (which could have different resolutions)
(Gidofalvi et al., MDM’07)

Outline
• Introduction
• Protecting Trajectory Privacy in Location-based Services
• Protecting Privacy in Trajectory Publication
• Future Research Directions

Future Directions
• Personalized LBS (require more user semantics)
  • User preferences and background information could be used as quasi-identifiers
• Trajectory publication supporting more complex queries
  • Spatio-temporal queries
  • Spatio-temporal data analysis