Structural Data De-anonymization:
Quantification, Practice, and Implications
Shouling Ji, Weiqing Li, and Raheem Beyah
Georgia Institute of Technology
Mudhakar Srivatsa
IBM T. J. Watson Research Center
Narayanan-Shmatikov attack (IEEE S&P 2009)
• A. Narayanan and V. Shmatikov, De-anonymizing Social Networks, IEEE S&P 2009.
• Anonymized data: Twitter (crawled in late 2007)
– A microblogging service
– 224K users, 8.5M edges
• Auxiliary data: Flickr (crawled in late 2007/early 2008)
– A photo-sharing service
– 3.3M users, 53M edges
• Result: 30.8% of the users are successfully de-anonymized
Heuristics: eccentricity, edge directionality, node degree, revisiting nodes, reverse match
[Figure: user mapping between the Twitter and Flickr graphs]
S. Ji, W. Li, M. Srivatsa and R. Beyah
Structural Data De-anonymization
Srivatsa-Hicks Attacks (ACM CCS 2012)
• M. Srivatsa and M. Hicks, De-anonymizing Mobility Traces: Using Social Networks as a Side-Channel, ACM CCS 2012.
• Anonymized data
– Mobility traces: St Andrews, Smallblue, and Infocom 2006
• Auxiliary data
– Social networks: Facebook and DBLP
• Result: over 80% of users can be successfully de-anonymized
Motivation
• Question 1: Why can structural data be de-anonymized?
• Question 2: What are the conditions for successful data de-anonymization?
• Question 3: What portion of users can be de-anonymized in a structural dataset?
[1] P. Pedarsani and M. Grossglauser, On the Privacy of Anonymized Networks, KDD 2011.
[2] L. Yartseva and M. Grossglauser, On the Performance of Percolation Graph Matching, COSN 2013.
[3] N. Korula and S. Lattanzi, An Efficient Reconciliation Algorithm for Social Networks, VLDB 2014.
Our Contribution
Address the above three open questions under a practical data model.
Outline
• Introduction and Motivation
• System Model
• De-anonymization Quantification
• Evaluation
• Implication 1: Optimization based De-anonymization (ODA) Practice
• Implication 2: Secure Data Publishing
• Conclusion
System Model
• Anonymized Data
• Auxiliary Data
• De-anonymization
• Measurement
• Quantification
Configuration Model
– G is the conceptual underlying graph
– G can have an arbitrary degree sequence that follows any distribution
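A minimal sketch of how a configuration-model graph can be generated from an arbitrary degree sequence, using the standard stub-matching construction. The function names and the example degree sequence are illustrative assumptions, not taken from the slides.

```python
import random
from collections import Counter


def configuration_model(degrees, seed=None):
    """Return a multigraph edge list whose node degrees match `degrees`.

    Stub matching: node i contributes degrees[i] half-edges ("stubs"),
    and the stubs are paired uniformly at random. Self-loops and
    parallel edges may occur, as in the standard configuration model.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sequence must have an even sum")
    rng = random.Random(seed)
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Pair consecutive stubs after shuffling to form the edges.
    return list(zip(stubs[0::2], stubs[1::2]))


def degree_sequence(edges, n):
    """Recompute node degrees from an edge list (a self-loop counts twice)."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return [counts[i] for i in range(n)]


# Hypothetical degree sequence; any sequence with an even sum works.
degrees = [3, 2, 2, 2, 1]
edges = configuration_model(degrees, seed=7)
```

Whatever pairing the shuffle produces, the resulting graph realizes the requested degree sequence exactly, which is why the model places no restriction on the degree distribution.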
De-anonymization Quantification
• Perfect De-anonymization Quantification
– Structural Similarity Condition
– Graph/Data Size Condition
• (1 − ε)-Perfect De-anonymization Quantification
– Structural Similarity Condition
– Graph/Data Size Condition
Evaluation
• Datasets
• Perfect De-anonymization Condition
– Structural Similarity Condition
– Graph/Data Size Condition
• (1 − ε)-Perfect De-anonymization Condition
– Projection/Sampling Condition
– Structural Similarity Condition
– Graph/Data Size Condition
• (1 − ε)-Perfect De-anonymizability
– Structural Similarity Condition
– How many users can be successfully de-anonymized
Optimization based De-anonymization (ODA)
• Our quantification implies
– An optimum de-anonymization solution exists
– However, it is difficult to find it.
1. Select candidate users from the unmapped users with the top degrees
2. Map the candidate users by minimizing the Edge Error function
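The two steps above can be sketched roughly as follows. This is not the authors' implementation: the slides do not define the Edge Error function, so `edge_error` below assumes it counts edges among mapped users that appear in only one of the two graphs, and the brute-force search over permutations of a small batch is a stand-in for the actual optimization. All function names and the toy graphs are hypothetical.

```python
from itertools import permutations


def edge_error(mapping, anon_edges, aux_edges):
    """Assumed edge error: edges among mapped users present in only one graph."""
    mapped_anon = {frozenset((mapping[u], mapping[v]))
                   for u, v in anon_edges if u in mapping and v in mapping}
    targets = set(mapping.values())
    induced_aux = {frozenset((u, v)) for u, v in aux_edges
                   if u in targets and v in targets}
    return len(mapped_anon ^ induced_aux)


def top_degree(edges, excluded, k):
    """Return up to k not-yet-mapped nodes with the highest degrees."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    nodes = [n for n in deg if n not in excluded]
    return sorted(nodes, key=lambda n: (-deg[n], str(n)))[:k]


def oda_sketch(anon_edges, aux_edges, batch=3):
    """Seed-free greedy mapping: map top-degree batches by minimal edge error."""
    mapping = {}
    while True:
        cand_anon = top_degree(anon_edges, set(mapping), batch)
        cand_aux = top_degree(aux_edges, set(mapping.values()), batch)
        k = min(len(cand_anon), len(cand_aux))
        if k == 0:
            return mapping
        cand_anon = cand_anon[:k]
        # Try every assignment of the batch and keep the lowest-error one.
        best = min(
            permutations(cand_aux, k),
            key=lambda perm: edge_error(
                {**mapping, **dict(zip(cand_anon, perm))}, anon_edges, aux_edges),
        )
        mapping.update(zip(cand_anon, best))


# Toy graphs: the anonymized graph is a relabeled copy of the auxiliary one.
aux = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
anon = [(1, 2), (1, 3), (1, 4), (2, 3)]
mapping = oda_sketch(anon, aux, batch=4)
```

No seed mappings are supplied, which mirrors the cold-start setting. The batch size trades search cost (factorial in the batch) against how much structure each assignment can take into account.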
ODA Features
1. Cold start (seed-free)
2. Can be used by other attacks for landmark (seed) identification
3. Optimization based
Space complexity
Time complexity
ODA Evaluation
• Dataset
– Google+ (4.7M users, 90.8M edges): anonymized graphs and auxiliary graphs obtained by random sampling
– Gowalla:
• Anonymized graphs: constructed from 6.4M check-ins <UserID, latitude, longitude, timestamp, location ID> generated by 0.2M users
• Auxiliary graph: the Gowalla social graph of the 0.2M users (1M edges)
– Results: landmark identification
– Results: de-anonymization
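A minimal sketch of how an anonymized/auxiliary graph pair might be derived from one base graph by random sampling, and how de-anonymization accuracy could be scored against the ground truth. The slides do not say whether nodes or edges are sampled; independent edge sampling is assumed here, and all names, the toy graph, and the keep probability are hypothetical.

```python
import random


def sample_edges(edges, keep_prob, rng):
    """Keep each edge independently with probability keep_prob."""
    return [e for e in edges if rng.random() < keep_prob]


def anonymize(edges, nodes, rng):
    """Replace user IDs with random pseudonyms; also return the ground truth."""
    ids = list(range(len(nodes)))
    rng.shuffle(ids)
    pseudonym = dict(zip(nodes, ids))
    anon_edges = [(pseudonym[u], pseudonym[v]) for u, v in edges]
    truth = {p: n for n, p in pseudonym.items()}  # pseudonym -> real user
    return anon_edges, truth


def accuracy(mapping, truth):
    """Fraction of anonymized users that the mapping identifies correctly."""
    if not truth:
        return 0.0
    correct = sum(1 for a, u in mapping.items() if truth.get(a) == u)
    return correct / len(truth)


# Toy base graph standing in for a real dataset.
nodes = ["u1", "u2", "u3", "u4", "u5"]
base = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u4"), ("u4", "u5")]

rng = random.Random(0)
aux = sample_edges(base, 0.8, rng)
anon, truth = anonymize(sample_edges(base, 0.8, rng), nodes, rng)
```

An attack's output mapping (pseudonym to user) can then be passed to `accuracy` together with `truth` to obtain the fraction of users that were successfully de-anonymized.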
Secure Structural Data Publishing
• Structural information is important
• Based on our quantification
– Secure structural data publishing is difficult, at least theoretically
• Open problem …
Conclusion
– We proposed the first quantification framework for structural data de-anonymization under a practical data model
– We conducted a large-scale de-anonymizability evaluation of 26 real-world structural datasets
– We designed a cold-start optimization-based de-anonymization algorithm
Acknowledgement
We thank the anonymous reviewers very much for their valuable comments!
Thank you!
Shouling Ji
http://users.ece.gatech.edu/sji/