
Probabilistic Privacy Analysis of Published Views
Hui (Wendy) Wang
Laks V.S. Lakshmanan
University of British Columbia, Vancouver, Canada
Wang, Lakshmanan. Probabilistic Privacy Analysis of Published Views, IDAR'07.
Introduction

- Publishing relational data containing personal information:

  Name   Age  Job     Disease
  Sarah  50   Artist  Heart disease
  Alice  30   Artist  AIDS
  John   50   Artist  Heart disease

- Privacy concern: private associations
  - E.g., Alice has AIDS
- Utility concern: public associations
  - E.g., at what ages may people have heart disease?
Approach 1: K-anonymity

- K-anonymity (e.g., [Bayardo05], [LeFevre05]):

  Name  Age      Job     Disease
  *     [30,50]  Artist  Heart disease
  *     [30,50]  Artist  AIDS
  *     [30,50]  Artist  Heart disease

- Guaranteed privacy
- Compromises utility for privacy
  - Revisit the example: at what ages may people have heart disease? After generalization, only [30,50] can be answered.
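The generalized table above is 3-anonymous on the quasi-identifiers (Age, Job): every quasi-identifier combination appears at least 3 times. A minimal sketch of such a check, assuming rows as Python dicts (the helper name is illustrative, not from the paper):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """Check k-anonymity: every combination of quasi-identifier
    values must occur in at least k rows."""
    groups = Counter(tuple(row[a] for a in quasi_ids) for row in rows)
    return all(count >= k for count in groups.values())

table = [
    {"Name": "*", "Age": "[30,50]", "Job": "Artist", "Disease": "Heart disease"},
    {"Name": "*", "Age": "[30,50]", "Job": "Artist", "Disease": "AIDS"},
    {"Name": "*", "Age": "[30,50]", "Job": "Artist", "Disease": "Heart disease"},
]

print(is_k_anonymous(table, ["Age", "Job"], 3))  # True
```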
Approach 2: View Publishing

- Publishing views (e.g., [Yao05], [Miklau04], [Xiao06]):

  V1:           V2:
  Name   Age    Age  Job     Disease
  Sarah  50     50   Artist  Heart disease
  Alice  30     30   Artist  AIDS
  John   50

  (V2 is duplicate-free, so the tuple (50, Artist, Heart disease) appears only once.)

- Guaranteed utility
- Possibility of privacy breach
  - E.g., V1 join V2: Prob("Alice", "AIDS") = 1
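The breach via joining the two views can be reproduced directly. A small sketch with the slides' data, joining V1 and V2 on Age (variable names are illustrative):

```python
# Published views from the example.
v1 = [("Sarah", 50), ("Alice", 30), ("John", 50)]                # V1(Name, Age)
v2 = [(50, "Artist", "Heart disease"), (30, "Artist", "AIDS")]   # V2(Age, Job, Disease)

# Natural join on Age: every (Name, Disease) pair consistent with the views.
joined = [(name, disease)
          for (name, age) in v1
          for (age2, job, disease) in v2
          if age == age2]

# Alice's age (30) matches only the AIDS tuple, so the association is certain.
alice = {d for (n, d) in joined if n == "Alice"}
print(alice)  # {'AIDS'}
```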
Approach 2 (Cont.)

- Different views yield different degrees of protection:

  V1:            V2:
  Name   Job     Age  Job     Disease
  Sarah  Artist  50   Artist  Heart disease
  Alice  Artist  30   Artist  AIDS
  John   Artist

- V1 join V2: Prob("Alice", "AIDS") < 1
  - Every name now joins with both disease tuples, so the association is no longer certain.
Problem Statement

Given a view scheme V and a set of private associations A, what is the probability of a privacy breach of A when V is published?
Our Contributions

- Define two attack models
- Formally define the probability of privacy breach
- Propose the connectivity graph as a synopsis of the database
- Derive formulas for quantifying the probability of privacy breach
Outline

- Introduction
- Security model & attack model
- Measurement of probability
- Conclusion & future work
Security Model

- Private association
  - Form: (ID = I, P = p)
  - E.g., (Name = "Alice", Disease = "AIDS")
  - Can be expressed in SQL
- Views: duplicate-free
- Base table
  - Uniqueness property: every ID value is associated with a unique P value in the base table
Attack Model 1: Unrestricted Model

- The attacker does NOT know of the uniqueness property
- The attacker can access the view definitions and the view tables
- The attack approach:
  - Construct the candidate base tables (possible worlds)
  - Pick the ones that contain the private association (interesting worlds)
Example of Unrestricted Model

  Base table T:    V1 = Π_{A,B}(T):    V2 = Π_{B,C}(T):
  A   B   C        A   B               B   C
  a1  b1  c1       a1  b1              b1  c1
  a2  b1  c2       a2  b1              b1  c2

The attacker knows V1 and V2 and constructs every candidate base table whose projections reproduce them. The candidate tuples are those of V1 join V2, e.g.:

  Possible world #1: {(a1,b1,c1), (a2,b1,c2)}                          √
  Possible world #2: {(a1,b1,c2), (a2,b1,c1)}                          X
  Possible world #3: {(a1,b1,c1), (a1,b1,c2), (a2,b1,c1), (a2,b1,c2)}  √

For (A=a1, C=c1), the attacker picks the worlds that contain the association: there are 7 unrestricted possible worlds in total, of which 5 are unrestricted interesting worlds.

Prob. of privacy breach of (A=a1, C=c1): 5/7
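The counts of 7 possible and 5 interesting worlds can be reproduced by brute force: a possible world is any subset of the candidate tuples whose projections give back exactly V1 and V2. A small sketch (names are illustrative):

```python
from itertools import chain, combinations

# Candidate tuples: V1 ⋈ V2, from the slides' example.
candidates = [("a1", "b1", "c1"), ("a1", "b1", "c2"),
              ("a2", "b1", "c1"), ("a2", "b1", "c2")]
v1 = {("a1", "b1"), ("a2", "b1")}
v2 = {("b1", "c1"), ("b1", "c2")}

# Enumerate all non-empty subsets and keep those whose projections
# reproduce the published views exactly.
subsets = chain.from_iterable(combinations(candidates, r)
                              for r in range(1, len(candidates) + 1))
possible = [w for w in subsets
            if {(a, b) for (a, b, c) in w} == v1
            and {(b, c) for (a, b, c) in w} == v2]

# Interesting worlds: those containing the private association (A=a1, C=c1).
interesting = [w for w in possible if ("a1", "b1", "c1") in w]
print(len(interesting), len(possible))  # 5 7
```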
Attack Model 2: Restricted Model

- The attacker knows of the uniqueness property
- Similar attack approach
  - Only pick the possible/interesting worlds that satisfy the uniqueness property
Example of Restricted Model

The base table and views are the same as before. Under the uniqueness property, each A (ID) value must be associated with a unique C (P) value, so only two worlds survive:

  Possible world #1: {(a1,b1,c1), (a2,b1,c2)}   √
  Possible world #2: {(a1,b1,c2), (a2,b1,c1)}   X

2 restricted possible worlds; for (A=a1, C=c1) the attacker picks the 1 restricted interesting world.

Prob. of privacy breach of (A=a1, C=c1): 1/2
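The restricted counts can be checked the same way, filtering the worlds by the uniqueness property (a minimal sketch; helper names are illustrative):

```python
from itertools import chain, combinations

candidates = [("a1", "b1", "c1"), ("a1", "b1", "c2"),
              ("a2", "b1", "c1"), ("a2", "b1", "c2")]
v1 = {("a1", "b1"), ("a2", "b1")}
v2 = {("b1", "c1"), ("b1", "c2")}

def unique_id(world):
    """Uniqueness property: each A (ID) value maps to exactly one C (P) value."""
    seen = {}
    for a, b, c in world:
        if seen.setdefault(a, c) != c:
            return False
    return True

subsets = chain.from_iterable(combinations(candidates, r)
                              for r in range(1, len(candidates) + 1))
restricted = [w for w in subsets
              if {(a, b) for (a, b, c) in w} == v1
              and {(b, c) for (a, b, c) in w} == v2
              and unique_id(w)]
interesting = [w for w in restricted if ("a1", "b1", "c1") in w]
print(len(interesting), len(restricted))  # 1 2
```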
How to Measure Probability?

- Constructing all possible/interesting worlds explicitly is not efficient!
- Is there a better way to measure the probability?
- Our approach: connectivity graph + (interesting) covers
Probability Measurement: Unrestricted Model

  Base table T:    View scheme: V1 = Π_{A,B}(T), V2 = Π_{B,C}(T)
  A   B   C        Private association: (A=a1, C=c1)
  a1  b1  c1
  a2  b1  c2
  a1  b2  c1

Connectivity graph: one node per view tuple; an edge connects a V1 tuple to a V2 tuple when they agree on the join attribute B (e.g., nodes 1: (a1,b1), 2: (a2,b1), 3: (b1,c1), 4: (b1,c2)).

- Unrestricted covers (edge sets that touch every node) = unrestricted possible worlds, e.g., {<1,3>, <2,4>, <1,4>}
- Unrestricted interesting covers = unrestricted interesting worlds
- Prob. = # of unrestricted interesting covers / # of unrestricted covers
Quantification Formulas: 2-Table Case

- For view schemes of 2 view tables, the problem is equivalent to counting edge covers of a complete 2-partite connectivity graph with m and n nodes on the two sides.
- Unrestricted model (by inclusion-exclusion over uncovered nodes; the numerator counts the covers that contain the private association's edge, the denominator counts all covers):

  Prob. = [ Σ_{i=0..m-1} Σ_{j=0..n-1} (-1)^{i+j} C(m-1,i) C(n-1,j) 2^{(m-i)(n-j)-1} ]
          / [ Σ_{i=0..m} Σ_{j=0..n} (-1)^{i+j} C(m,i) C(n,j) 2^{(m-i)(n-j)} ]

  For m = n = 2 this gives 5/7, matching the earlier example.

- Restricted model:

  Prob. = 1/n
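The 2-table formulas above can be cross-checked against brute-force enumeration of edge covers on small complete bipartite graphs. A sketch, assuming the inclusion-exclusion form given above (function names are illustrative):

```python
from itertools import chain, combinations
from math import comb

def covers_brute(m, n):
    """Count edge covers of K_{m,n} by enumeration (the possible worlds)."""
    edges = [(i, j) for i in range(m) for j in range(n)]
    subsets = chain.from_iterable(combinations(edges, r)
                                  for r in range(1, len(edges) + 1))
    return sum(1 for s in subsets
               if {i for i, _ in s} == set(range(m))
               and {j for _, j in s} == set(range(n)))

def covers_formula(m, n):
    """All edge covers: inclusion-exclusion over uncovered left/right nodes."""
    return sum((-1) ** (i + j) * comb(m, i) * comb(n, j)
               * 2 ** ((m - i) * (n - j))
               for i in range(m + 1) for j in range(n + 1))

def interesting_formula(m, n):
    """Edge covers containing one fixed edge (the private association)."""
    return sum((-1) ** (i + j) * comb(m - 1, i) * comb(n - 1, j)
               * 2 ** ((m - i) * (n - j) - 1)
               for i in range(m) for j in range(n))

print(interesting_formula(2, 2), covers_formula(2, 2))  # 5 7
```

For m = n = 2 this reproduces the 5/7 probability of the running example.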
Quantification Formulas: k-Table Case

For view schemes of k > 2 view tables:

- The connectivity graph is not complete, unlike the 2-view-table case
- We don't have a general formula for the probability of privacy breach
- We have no choice but to enumerate all possible/interesting worlds
Related Work

- K-anonymity (e.g., [Bayardo05], [LeFevre05])
  - Modifies data values
- View publishing
  - [Miklau04], [Deutsch05]: focus on complexity, not on computing the probability
  - [Yao05]: measures privacy breach by k-anonymity, not probability
  - [Xiao06]: utility on aggregate results, not public associations
Conclusion

- We defined a general framework to measure the likelihood of privacy breach
- We proposed two attack models
- For the 2-view-table case, we derived formulas to calculate the probability of privacy breach
Future Work

- For the 2-view-table case, find an approximation of the probability formulas for the unrestricted model
- Continue work on the k-view-table case (k > 2)
- Extend the restricted model
  - Given a set of private/public associations and a base table, how to design safe and useful views?
Q&A
References

- [Bayardo05] Roberto J. Bayardo, Rakesh Agrawal. Data Privacy through Optimal k-Anonymization. ICDE, 2005.
- [LeFevre05] Kristen LeFevre, David DeWitt, Raghu Ramakrishnan. Incognito: Efficient Full-Domain K-Anonymity. SIGMOD, 2005.
- [Miklau04] Gerome Miklau, Dan Suciu. A Formal Analysis of Information Disclosure in Data Exchange. SIGMOD, 2004.
- [Xiao06] Xiaokui Xiao, Yufei Tao. Anatomy: Simple and Effective Privacy Preservation. VLDB, 2006.
- [Yao05] Chao Yao, X. Sean Wang, Sushil Jajodia. Checking for k-Anonymity Violation by Views. VLDB, 2005.