Probabilistic Privacy Analysis of Published Views Hui (Wendy) Wang Laks V.S. Lakshmanan
Hui (Wendy) Wang, Laks V.S. Lakshmanan
University of British Columbia, Vancouver, Canada
(Wang, Lakshmanan: Probabilistic Privacy Analysis of Published Views, IDAR'07)

Introduction

Publishing relational data containing personal information:

Name   Age  Job     Disease
Sarah  50   Artist  Heart disease
Alice  30   Artist  AIDS
John   50   Artist  Heart disease

Privacy concern: private associations, e.g., that Alice has AIDS.
Utility concern: public associations, e.g., at what ages might people have heart disease?

Approach 1: K-anonymity

K-anonymity (e.g., [Bayardo05], [LeFevre05]):

Name  Age      Job     Disease
*     [30,50]  Artist  Heart disease
*     [30,50]  Artist  AIDS
*     [30,50]  Artist  Heart disease

Privacy is guaranteed, but utility is compromised for privacy. Revisiting the example: from the anonymized table we can no longer tell at what ages people might have heart disease.

Approach 2: View Publishing

Publishing views (e.g., [Yao05], [Miklau04], [Xiao06]):

V1 = pi_{Name,Age}(T)    V2 = pi_{Age,Job,Disease}(T)
Name   Age               Age  Job     Disease
Sarah  50                50   Artist  Heart disease
Alice  30                30   Artist  AIDS
John   50

Utility is guaranteed, but there is a possibility of a privacy breach: joining V1 and V2 yields Prob("Alice", "AIDS") = 1.

Approach 2 (Cont.)

Different views yield different degrees of protection:

V1 = pi_{Name,Job}(T)    V2 = pi_{Age,Job,Disease}(T)
Name   Job               Age  Job     Disease
Sarah  Artist            50   Artist  Heart disease
Alice  Artist            30   Artist  AIDS
John   Artist

Now V1 join V2 gives Prob("Alice", "AIDS") < 1.

Problem Statement

Given a view scheme V and a set of private associations A, what is the probability of a privacy breach of A caused by publishing V?
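The join attack in the view-publishing example above can be sketched in a few lines. This is a minimal illustration using the toy rows from the slides; the variable names are ours, not part of the talk:

```python
# Published views from the example (duplicate-free, set semantics).
V1 = [("Sarah", 50), ("Alice", 30), ("John", 50)]        # pi_{Name,Age}(T)
V2 = [(50, "Artist", "Heart disease"),
      (30, "Artist", "AIDS")]                             # pi_{Age,Job,Disease}(T)

# Natural join on Age: every (name, disease) pairing the attacker can infer.
pairs = {(name, d) for (name, age) in V1 for (a, _, d) in V2 if age == a}

# Alice is the only 30-year-old, so she joins with the AIDS row alone.
diseases_for_alice = {d for (n, d) in pairs if n == "Alice"}
print(diseases_for_alice)  # {'AIDS'}  -> Prob("Alice", "AIDS") = 1
```

Because Alice's age value is unique in V1, the join pins her disease down with certainty, which is exactly the breach the slides point out.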
Our Contributions

- Define two attack models.
- Formally define the probability of privacy breach.
- Propose the connectivity graph as a synopsis of the database.
- Derive formulas for quantifying the probability of privacy breach.

Outline

- Introduction
- Security model & attack models
- Measurement of probability
- Conclusion & future work

Security Model

- Private association: of the form (ID = I, P = p), e.g., (Name = "Alice", Disease = "AIDS"); it can be expressed in SQL.
- Views: duplicate-free.
- Base table: satisfies the uniqueness property, i.e., every ID value is associated with a unique p value in the base table.

Attack Model 1: Unrestricted Model

- The attacker does NOT know that the uniqueness property holds.
- The attacker can access the view definitions and the view tables.
- The attack approach: construct the candidate base tables, then pick the ones that contain the private association.

Example of the Unrestricted Model

Base table T:    Attacker knows:
A   B   C        V1 = pi_{A,B}(T)    V2 = pi_{B,C}(T)
a1  b1  c1       A   B               B   C
a2  b1  c2       a1  b1              b1  c1
                 a2  b1              b1  c2

The attacker constructs every candidate base table (possible world) whose projections reproduce V1 and V2, e.g.:

#1: {(a1,b1,c1), (a2,b1,c2)}                contains (A=a1, C=c1)? yes
#2: {(a1,b1,c2), (a2,b1,c1)}                contains (A=a1, C=c1)? no
#3: {(a1,b1,c1), (a2,b1,c1), (a2,b1,c2)}    contains (A=a1, C=c1)? yes
...

In total these views admit 7 unrestricted possible worlds, of which 5 (the unrestricted interesting worlds) contain the private association. Probability of privacy breach of (A=a1, C=c1): 5/7.
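The world counting above can be checked mechanically. A short sketch, assuming nothing beyond the two published views (the names `candidates`, `worlds`, and `interesting` are ours): it generates every candidate base table whose projections give back exactly V1 and V2, then counts the ones containing (A=a1, C=c1):

```python
from itertools import combinations

V1 = {("a1", "b1"), ("a2", "b1")}   # pi_{A,B}(T), as published
V2 = {("b1", "c1"), ("b1", "c2")}   # pi_{B,C}(T), as published

# Any tuple (a, b, c) consistent with both views may appear in a candidate table.
candidates = [(a, b, c) for (a, b) in V1 for (b2, c) in V2 if b == b2]

# A possible world is a candidate table whose projections reproduce V1 and V2.
worlds = [w for r in range(1, len(candidates) + 1)
            for w in combinations(candidates, r)
            if {(a, b) for a, b, _ in w} == V1
            and {(b, c) for _, b, c in w} == V2]

# Interesting worlds contain the private association (A=a1, C=c1).
interesting = [w for w in worlds
               if any(a == "a1" and c == "c1" for a, _, c in w)]

print(len(worlds), len(interesting))  # 7 5  -> breach probability 5/7
```

This brute-force enumeration is exactly what the talk argues is too expensive in general, which motivates the connectivity-graph synopsis introduced next.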
Attack Model 2: Restricted Model

- The attacker knows that the uniqueness property holds.
- The attack approach is similar, but only the possible/interesting worlds that satisfy the uniqueness property are kept.

Example of the Restricted Model

Same base table T and views V1 = pi_{A,B}(T), V2 = pi_{B,C}(T) as before. Only two candidate worlds satisfy the uniqueness property:

#1: {(a1,b1,c1), (a2,b1,c2)}    contains (A=a1, C=c1)? yes
#2: {(a1,b1,c2), (a2,b1,c1)}    contains (A=a1, C=c1)? no

2 restricted possible worlds, 1 restricted interesting world. Probability of privacy breach of (A=a1, C=c1): 1/2.

How to Measure the Probability?

- Constructing all possible/interesting worlds is not efficient. Is there a better way to measure the probability?
- Our approach: connectivity graph + (interesting) covers.

Probability Measurement: Unrestricted Model

View scheme: V1 = pi_{A,B}(T), V2 = pi_{B,C}(T); private association: (A=a1, C=c1). The connectivity graph has one node per view tuple, with an edge between two nodes that join:

1: (a1, b1) --- 3: (b1, c1)
2: (a2, b1) --- 4: (b1, c2)
(every node pair across the two views is connected, since all tuples share b1)

- Unrestricted covers = unrestricted possible worlds, e.g., the cover {<1,3>, <2,4>, <1,4>}.
- Unrestricted interesting covers = unrestricted interesting worlds.
- Prob. = (# of unrestricted interesting covers) / (# of unrestricted covers).

Quantification Formulas: 2-table Case

For view schemes of 2 view tables, the connectivity graph is a complete bipartite graph on (m, n) nodes. For the unrestricted model:

Prob. = N_e / N, where

N_e = \sum_{i=2}^{m} \sum_{j=2}^{n} \binom{m-1}{m-i} \binom{n-1}{n-j} (-1)^{m+n-i-j} 2^{ij-1} + (-1)^{m-1} + (-1)^{n-1} + (-1)^{m+n-1}

N = \sum_{i=2}^{m} \sum_{j=2}^{n} \binom{m}{m-i} \binom{n}{n-j} (-1)^{m+n-i-j} 2^{ij} + (-1)^{m-1} m + (-1)^{n-1} n + (-1)^{m+n-1} (2mn - m - n + 1)

Here N is the number of unrestricted covers and N_e the number of unrestricted interesting covers; both follow by inclusion-exclusion on the sets of uncovered nodes on the two sides of the graph. For the running example (m = n = 2) they give N_e = 5 and N = 7, matching the 5/7 computed earlier.

Restricted model: Prob. = 1/n (under the uniqueness property each ID node is linked to exactly one of the n property nodes, and by symmetry each choice is equally likely).
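The two counting formulas above can be cross-checked against direct enumeration of covers of the complete bipartite graph K_{m,n}. A sketch under that setup; the function names `closed_form` and `by_enumeration` are ours, and the interesting edge is fixed as (0, 0) without loss of generality:

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def closed_form(m, n):
    """Unrestricted breach probability N_e / N for a complete bipartite
    connectivity graph with m and n nodes on the two sides."""
    num = sum(comb(m - 1, m - i) * comb(n - 1, n - j)
              * (-1) ** (m + n - i - j) * 2 ** (i * j - 1)
              for i in range(2, m + 1) for j in range(2, n + 1))
    num += (-1) ** (m - 1) + (-1) ** (n - 1) + (-1) ** (m + n - 1)
    den = sum(comb(m, m - i) * comb(n, n - j)
              * (-1) ** (m + n - i - j) * 2 ** (i * j)
              for i in range(2, m + 1) for j in range(2, n + 1))
    den += ((-1) ** (m - 1) * m + (-1) ** (n - 1) * n
            + (-1) ** (m + n - 1) * (2 * m * n - m - n + 1))
    return Fraction(num, den)

def by_enumeration(m, n):
    """Enumerate edge covers of K_{m,n} directly; a cover is interesting
    if it contains the fixed edge (0, 0)."""
    edges = [(u, v) for u in range(m) for v in range(n)]
    total = interesting = 0
    for r in range(1, len(edges) + 1):
        for w in combinations(edges, r):
            if ({u for u, _ in w} == set(range(m))
                    and {v for _, v in w} == set(range(n))):
                total += 1
                interesting += (0, 0) in w
    return Fraction(interesting, total)

assert closed_form(2, 2) == by_enumeration(2, 2) == Fraction(5, 7)
```

The closed form evaluates in O(mn) terms, while the enumeration is exponential in mn, which is the efficiency gap the connectivity-graph analysis is meant to close.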
Quantification Formulas: k-table Case

- For view schemes of k > 2 view tables, the connectivity graph is not complete as in the 2-view-table case.
- We do not have a general formula for the probability of privacy breach.
- We have no choice but to enumerate all possible/interesting worlds.

Related Work

- K-anonymity (e.g., [Bayardo05], [LeFevre05]): modifies data values.
- View publishing: [Miklau04] and [Deutsch05] focus on complexity, not computation of probability; [Yao05] measures privacy breach by k-anonymity, not probability; [Xiao06] targets utility of aggregate results, not public associations.

Conclusion

- We defined a general framework to measure the likelihood of privacy breach.
- We proposed two attack models.
- For the 2-view-table case, we derived formulas to calculate the probability of privacy breach.

Future Work

- For the 2-view-table case, find an approximation of the probability formulas for the unrestricted model.
- Continue working on the k-view-table case (k > 2).
- Extend the restricted model.
- Given a set of private/public associations and a base table, how do we design safe and useful views?

Q&A

References

[Bayardo05] Roberto J. Bayardo, Rakesh Agrawal. Data Privacy through Optimal k-Anonymization. ICDE, 2005.
[LeFevre05] Kristen LeFevre, David DeWitt, Raghu Ramakrishnan. Incognito: Efficient Full-Domain K-Anonymity. SIGMOD, 2005.
[Miklau04] Gerome Miklau, Dan Suciu. A Formal Analysis of Information Disclosure in Data Exchange. SIGMOD, 2004.
[Xiao06] Xiaokui Xiao, Yufei Tao. Anatomy: Simple and Effective Privacy Preservation. VLDB, 2006.
[Yao05] Chao Yao, X. Sean Wang, Sushil Jajodia. Checking for k-Anonymity Violation by Views. VLDB, 2005.