Probabilistic Privacy Analysis of Published Views

Hui (Wendy) Wang, Laks V.S. Lakshmanan
University of British Columbia
Vancouver, Canada

Wang, Lakshmanan. Probabilistic Privacy Analysis of Published Views, IDAR'07.
Introduction

Publishing relational data containing personal information:

  Name   Age  Job     Disease
  Sarah  50   Artist  Heart disease
  Alice  30   Artist  AIDS
  John   50   Artist  Heart disease

Privacy concern: private associations
  E.g., (Alice, AIDS)
Utility concern: public associations
  E.g., which ages are possibly associated with heart disease?
Approach 1: K-anonymity

K-anonymity (e.g., [Bayardo05], [LeFevre05]):

  Name  Age      Job     Disease
  *     [30,50]  Artist  Heart disease
  *     [30,50]  Artist  AIDS
  *     [30,50]  Artist  Heart disease

Guaranteed privacy
Compromises utility for privacy
  Revisit the example: which ages are possibly associated with heart disease?
Approach 2: View Publishing

Publishing views (e.g., [Yao05], [Miklau04], [Xiao06]):

  V1: Name   Age      V2: Age  Job     Disease
      Sarah  50           50   Artist  Heart disease
      Alice  30           30   Artist  AIDS
      John   50

Guaranteed utility
Possibility of privacy breach
  E.g., V1 join V2: Prob("Alice", "AIDS") = 1
Approach 2 (Cont.)

Different views yield different degrees of protection:

  V1: Name   Job      V2: Age  Job     Disease
      Sarah  Artist       50   Artist  Heart disease
      Alice  Artist       30   Artist  AIDS
      John   Artist

  V1 join V2: Prob("Alice", "AIDS") < 1
Problem Statement

Given a view scheme V and a set of private associations A, what is the probability of privacy breach of A when V is published?
Our Contributions

Define two attack models
Formally define the probability of privacy breach
Propose the connectivity graph as a synopsis of the database
Derive formulas quantifying the probability of privacy breach
Outline
Introduction
Security model & attack model
Measurement of probability
Conclusion & future work
Security Model

Private association
  Form: (ID=I, P=p)
  E.g., (Name="Alice", Disease="AIDS")
  Can be expressed in SQL
Views: duplicate-free
Base table
  Uniqueness property: every ID value is associated with a unique p value in the base table
Attack Model 1: Unrestricted Model

The attacker does NOT know the existence of the uniqueness property
The attacker can access the view definitions and the view tables
The attack approach:
  Construct the candidate base tables
  Pick the ones that contain the private association
Example of Unrestricted Model

Base table T:        Attacker knows:
  A   B   C            V1 = π_{A,B}(T):  A   B      V2 = π_{B,C}(T):  B   C
  a1  b1  c1                             a1  b1                       b1  c1
  a2  b1  c2                             a2  b1                       b1  c2

Attacker constructs the possible worlds (7 unrestricted possible worlds in total), e.g.:
  #1: {(a1,b1,c1), (a2,b1,c2)}                √
  #2: {(a1,b1,c2), (a2,b1,c1)}                X
  #3: {(a1,b1,c1), (a1,b1,c2), (a2,b1,c2)}    √
  ...

For (A=a1, C=c1), the attacker picks the worlds that contain it:
5 of the 7 unrestricted possible worlds are interesting.
Prob. of privacy breach of (A=a1, C=c1): 5/7
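The counts in this example can be reproduced with a short script. This is a sketch, not the authors' implementation: the view contents and the private association are taken from the slide, and a possible world is modeled as any subset of candidate base tuples whose projections reproduce both (duplicate-free) views exactly.

```python
from fractions import Fraction
from itertools import combinations

# Published views of the hidden base table T(A, B, C).
V1 = {("a1", "b1"), ("a2", "b1")}  # V1 = pi_{A,B}(T)
V2 = {("b1", "c1"), ("b1", "c2")}  # V2 = pi_{B,C}(T)

# Candidate base tuples: the natural join of V1 and V2 on B.
candidates = [(a, b, c) for (a, b) in V1 for (b2, c) in V2 if b == b2]

# A possible world is a subset of the candidates whose projections
# reproduce V1 and V2 exactly (the views are duplicate-free).
worlds = [
    set(s)
    for r in range(1, len(candidates) + 1)
    for s in combinations(candidates, r)
    if {(a, b) for a, b, c in s} == V1 and {(b, c) for a, b, c in s} == V2
]

# Interesting worlds: those containing the private association (A=a1, C=c1).
interesting = [w for w in worlds if any(a == "a1" and c == "c1" for a, b, c in w)]

print(len(worlds), len(interesting))            # 7 5
print(Fraction(len(interesting), len(worlds)))  # 5/7
```

Enumerating subsets is exponential in the number of candidate tuples, which is exactly the inefficiency the later slides address with the connectivity-graph synopsis.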
Attack Model 2: Restricted Model

The attacker knows the existence of the uniqueness property
Similar attack approach
  Only pick the possible/interesting worlds that satisfy the uniqueness property
Example of Restricted Model

Base table T:        Attacker knows:
  A   B   C            V1 = π_{A,B}(T):  A   B      V2 = π_{B,C}(T):  B   C
  a1  b1  c1                             a1  b1                       b1  c1
  a2  b1  c2                             a2  b1                       b1  c2

Attacker constructs only the worlds that satisfy the uniqueness property
(2 restricted possible worlds):
  #1: {(a1,b1,c1), (a2,b1,c2)}    √
  #2: {(a1,b1,c2), (a2,b1,c1)}    X

For (A=a1, C=c1), the attacker picks world #1 (1 restricted interesting world).
Prob. of privacy breach of (A=a1, C=c1): 1/2
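The restricted model only changes the filter on worlds: the attacker additionally discards any world violating the uniqueness property. A sketch, reusing the same example data (the helper name `satisfies_uniqueness` is mine):

```python
from itertools import combinations

V1 = {("a1", "b1"), ("a2", "b1")}  # pi_{A,B}(T)
V2 = {("b1", "c1"), ("b1", "c2")}  # pi_{B,C}(T)
candidates = [(a, b, c) for (a, b) in V1 for (bb, c) in V2 if b == bb]

def satisfies_uniqueness(world):
    """Uniqueness property: every ID value (A) maps to exactly one p value (C)."""
    mapping = {}
    return all(mapping.setdefault(a, c) == c for a, b, c in world)

# Restricted possible worlds: projections match AND uniqueness holds.
restricted = [
    s
    for r in range(1, len(candidates) + 1)
    for s in combinations(candidates, r)
    if {(a, b) for a, b, c in s} == V1
    and {(b, c) for a, b, c in s} == V2
    and satisfies_uniqueness(s)
]

# Restricted interesting worlds: those containing (a1, b1, c1).
hits = [w for w in restricted if ("a1", "b1", "c1") in w]
print(len(restricted), len(hits))  # 2 1  -> Prob. of breach = 1/2
```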
How to Measure the Probability?

Constructing all possible/interesting worlds is not efficient!
Is there a better way to measure the probability?
Our approach: connectivity graph + (interesting) covers
Probability Measurement: Unrestricted Model

View scheme: V1 = π_{A,B}(T), V2 = π_{B,C}(T)
Private association: (A=a1, C=c1)

Base table T:
  A   B   C
  a1  b1  c1
  a2  b1  c2

Connectivity graph (complete bipartite):
  V1-tuple nodes: 1 = (a1, b1), 2 = (a2, b1)
  V2-tuple nodes: 3 = (b1, c1), 4 = (b1, c2)
  Edges: 1-3, 1-4, 2-3, 2-4

Unrestricted covers = unrestricted possible worlds
  E.g., the cover {<1,3>, <2,4>, <1,4>}
Unrestricted interesting covers = unrestricted interesting worlds
Prob. = # of unrestricted interesting covers / # of unrestricted covers
Quantification Formulas: 2-Table Case

For view schemes of 2 view tables
  Equivalent to a complete bipartite connectivity graph K_{m,n}

Unrestricted model (counting edge covers of K_{m,n} by inclusion-exclusion; the numerator counts the covers containing the fixed edge that encodes the private association):

          Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} (-1)^{i+j} C(m-1,i) C(n-1,j) 2^{(m-i)(n-j)-1}
  Prob. = ---------------------------------------------------------------------------
          Σ_{i=0}^{m}   Σ_{j=0}^{n}   (-1)^{i+j} C(m,i)   C(n,j)   2^{(m-i)(n-j)}

  For m = n = 2 this gives 5/7, matching the running example.

Restricted model:
  Prob. = 1/n
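The two cover counts can be computed directly from these sums. A sketch (the function names are mine; both counts are standard inclusion-exclusion over the sets of uncovered vertices of K_{m,n}):

```python
from fractions import Fraction
from math import comb

def covers(m, n):
    """Number of edge covers of the complete bipartite graph K_{m,n}
    (= unrestricted possible worlds), by inclusion-exclusion over
    which left/right vertices are left uncovered."""
    return sum(
        (-1) ** (i + j) * comb(m, i) * comb(n, j) * 2 ** ((m - i) * (n - j))
        for i in range(m + 1)
        for j in range(n + 1)
    )

def interesting_covers(m, n):
    """Number of edge covers containing one fixed edge, i.e. the edge
    encoding the private association (= unrestricted interesting worlds)."""
    return sum(
        (-1) ** (i + j) * comb(m - 1, i) * comb(n - 1, j)
        * 2 ** ((m - i) * (n - j) - 1)
        for i in range(m)
        for j in range(n)
    )

def breach_prob(m, n):
    """Probability of privacy breach in the unrestricted model."""
    return Fraction(interesting_covers(m, n), covers(m, n))

print(breach_prob(2, 2))  # 5/7, matching the running example
```

For m = n = 2 this yields covers = 7 and interesting covers = 5, the same counts obtained earlier by enumerating possible worlds.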
Quantification Formulas: k-Table Case

For view schemes of k > 2 view tables
  The connectivity graph is not complete, unlike the 2-view-table case
  We do not have a general formula for the probability of privacy breach
  No choice but to enumerate all possible/interesting worlds
Related Work

K-anonymity (e.g., [Bayardo05], [LeFevre05])
  Modifies data values
View publishing
  [Miklau04], [Deutsch05]: focus on complexity, not computation of probability
  [Yao05]: measures privacy breach by k-anonymity, not probability
  [Xiao06]: utility of aggregate results, not public associations
Conclusion

We defined a general framework to measure the likelihood of privacy breach
We proposed two attack models
For the 2-view-table case, we derived formulas to calculate the probability of privacy breach
Future Work

For the 2-view-table case, find an approximation of the probability formulas for the unrestricted model
Continue work on the k-view-table case (k > 2)
Extend the restricted model
Given a set of private/public associations and a base table, how do we design safe and useful views?
Q&A
References

[Bayardo05] Roberto J. Bayardo, Rakesh Agrawal. Data Privacy through Optimal k-Anonymization. ICDE 2005.
[LeFevre05] Kristen LeFevre, David DeWitt, Raghu Ramakrishnan. Incognito: Efficient Full-Domain K-Anonymity. SIGMOD 2005.
[Miklau04] Gerome Miklau, Dan Suciu. A Formal Analysis of Information Disclosure in Data Exchange. SIGMOD 2004.
[Xiao06] Xiaokui Xiao, Yufei Tao. Anatomy: Simple and Effective Privacy Preservation. VLDB 2006.
[Yao05] Chao Yao, X. Sean Wang, Sushil Jajodia. Checking for k-Anonymity Violation by Views. VLDB 2005.