Center for Secure Information Systems Concordia Institute for Information Systems Engineering k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure Wen Ming Liu1, Lingyu Wang1,


Center for Secure Information Systems
Concordia Institute for Information Systems Engineering
k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure
Wen Ming Liu (1), Lingyu Wang (1), and Lei Zhang (2)
(1) Concordia University
(2) George Mason University
ICDT 2010
CIISE / CSIS
March 23, 2010
Agenda
- Background
- k-Jump Strategy
- Data Utility Comparison
- Conclusion
Agenda
- Background
  - Example
  - Algorithms a_naive and a_safe
- k-Jump Strategy
- Data Utility Comparison
- Conclusion
Example
Data Holder’s View
Example – Data Holder’s View
Micro-data table t0 (Name: identifier; DoB: quasi-identifier; Condition: sensitive attribute):
Name     DoB   Condition
Alice    1990  flu
Bob      1985  cold
Charlie  1974  cancer
David    1962  cancer
Eve      1953  headache
Fen      1941  toothache
Goal: release a generalization of t0 that satisfies 2-diversity.
Generalization algorithm: consider generalization function g1 first, then g2.
Generalization g1(t0): 2-diversity? No (both records in 1960~1979 have cancer).
DoB        Condition
1980~1999  flu, cold
1960~1979  cancer, cancer
1940~1959  headache, toothache
Generalization g2(t0): 2-diversity? Yes. Released!
DoB        Condition
1970~1999  flu, cold, cancer
1940~1969  cancer, headache, toothache
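The data holder’s procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: the names `T0`, `G1`, `G2`, `generalize`, `is_2_diverse`, and `naive_release` are hypothetical, each generalization function is modeled as a list of DoB ranges, and 2-diversity is read as "no condition makes up more than half of a group", an assumption that matches the ratio-based privacy property used later in the deck.

```python
from collections import Counter

# Micro-data table t0 from the example: (Name, DoB year, Condition).
T0 = [
    ("Alice", 1990, "flu"),
    ("Bob", 1985, "cold"),
    ("Charlie", 1974, "cancer"),
    ("David", 1962, "cancer"),
    ("Eve", 1953, "headache"),
    ("Fen", 1941, "toothache"),
]

# Hypothetical encoding: each generalization function is a list of inclusive DoB ranges.
G1 = [(1980, 1999), (1960, 1979), (1940, 1959)]
G2 = [(1970, 1999), (1940, 1969)]


def generalize(table, ranges):
    """Group the sensitive values by the DoB range that covers each record."""
    groups = {}
    for _name, dob, condition in table:
        lo, hi = next(r for r in ranges if r[0] <= dob <= r[1])
        groups.setdefault(f"{lo}~{hi}", []).append(condition)
    return groups


def is_2_diverse(groups):
    """Assumed reading of 2-diversity: no condition exceeds half of any group."""
    return all(2 * max(Counter(vals).values()) <= len(vals) for vals in groups.values())


def naive_release(table, functions):
    """Try the generalization functions in order; release the first that passes."""
    for name, ranges in functions:
        groups = generalize(table, ranges)
        if is_2_diverse(groups):
            return name, groups
    return None, None


released, groups = naive_release(T0, [("g1", G1), ("g2", G2)])
print(released)  # g2
print(groups)
```

The function list is ordered, so the first generalization that passes the check is the one released; this ordering is exactly what the adversary exploits on the following slides.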
Example (cont.)
Adversary’s View
Example (cont.) – Adversary’s View
Attacker knows:
- the released generalization
- public knowledge (each person’s Name and DoB)
- the privacy property
Goal: guess the micro-data.
Public knowledge (Condition unknown):
Name     DoB   Condition
Alice    1990  ???
Bob      1985  ???
Charlie  1974  ???
David    1962  ???
Eve      1953  ???
Fen      1941  ???
Released generalization g2(t0):
DoB        Condition
1970~1999  flu, cold, cancer
1940~1969  cancer, headache, toothache
What can the adversary infer? The three persons in each group may have the three conditions in any order. This yields the permutation set t1, t2, …, t36 (3! × 3! = 36 tables, with rows A–F standing for Alice–Fen); for instance, t1 assigns A flu, B cold, C cancer, D cancer, E headache, F toothache.
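The permutation set can be enumerated directly from the released groups. In this sketch the names `G2_RELEASE` and `permutation_set` are hypothetical; duplicate orderings are removed with `set`, so a group that repeated a condition would contribute fewer than 3! orderings.

```python
from itertools import permutations, product

# The released generalization g2(t0): each group pairs its members (known from
# public knowledge) with the multiset of conditions published for that group.
G2_RELEASE = [
    (("Alice", "Bob", "Charlie"), ("flu", "cold", "cancer")),
    (("David", "Eve", "Fen"), ("cancer", "headache", "toothache")),
]


def permutation_set(release):
    """Every table consistent with the release: any order within each group."""
    per_group = [
        [dict(zip(names, order)) for order in sorted(set(permutations(values)))]
        for names, values in release
    ]
    tables = []
    for combo in product(*per_group):
        table = {}
        for part in combo:
            table.update(part)
        tables.append(table)
    return tables


per = permutation_set(G2_RELEASE)
print(len(per))  # 36, i.e. 3! * 3!
```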
Example (cont.)
[Permutation set: candidate tables t1, t2, …, t36, each assigning conditions to persons A–F (abbreviated flu, col, can, hac, tac)]
This would be the adversary’s best guess of the micro-data table if the released generalization were his/her only knowledge. However …
Example (cont.) – Adversary Simulating the Algorithm
However, the adversary also knows the generalization algorithm, and can simulate it to exclude further invalid guesses.
Example (cont.) – Adversary Simulating the Algorithm
Mental image: simulating the algorithm.
[Figure: on the data holder’s side, the unknown micro-data table t0, its generalization g1(t0) (checked but unused), and the released generalization g2(t0); on the adversary’s side, each possible table ti from the permutation set t1, …, t36 with its generalization g1(ti), marked "Violate privacy!" or "Satisfy privacy!"]
Is this a valid guess of the micro-data table? Let’s try to check it using the algorithm! If g1(ti) satisfies the privacy property, the algorithm would have released g1(ti) instead of moving on to g2, so ti cannot be the micro-data table and is excluded. The guesses that survive form the disclosure set:
Name     DoB   t1         t3         t7         t9
Alice    1990  flu        cold       flu        cold
Bob      1985  cold       flu        cold       flu
Charlie  1974  cancer     cancer     cancer     cancer
David    1962  cancer     cancer     cancer     cancer
Eve      1953  headache   headache   toothache  toothache
Fen      1941  toothache  toothache  headache   headache
In every table of the disclosure set, Charlie and David both have cancer, even though the released g2(t0) satisfies 2-diversity on its own.
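The adversary’s check can be sketched as a filter over the permutation set. This is an illustrative sketch, not the authors’ code: it assumes the algorithm tries g1 before g2 (as on the slide), reads 2-diversity as "no condition exceeds half of a group", and uses the hypothetical helpers `permutation_set`, `g1_groups`, and `is_2_diverse`.

```python
from collections import Counter
from itertools import permutations, product

DOB = {"Alice": 1990, "Bob": 1985, "Charlie": 1974,
       "David": 1962, "Eve": 1953, "Fen": 1941}
G1_RANGES = [(1980, 1999), (1960, 1979), (1940, 1959)]
G2_RELEASE = [
    (("Alice", "Bob", "Charlie"), ("flu", "cold", "cancer")),
    (("David", "Eve", "Fen"), ("cancer", "headache", "toothache")),
]


def permutation_set(release):
    """All tables consistent with a release (any order within each group)."""
    per_group = [[dict(zip(names, order)) for order in set(permutations(values))]
                 for names, values in release]
    return [{name: cond for part in combo for name, cond in part.items()}
            for combo in product(*per_group)]


def g1_groups(table):
    """Conditions of a candidate table grouped by the DoB ranges of g1."""
    groups = {}
    for name, cond in table.items():
        key = next(r for r in G1_RANGES if r[0] <= DOB[name] <= r[1])
        groups.setdefault(key, []).append(cond)
    return list(groups.values())


def is_2_diverse(groups):
    """Assumed reading: no condition makes up more than half of any group."""
    return all(2 * max(Counter(g).values()) <= len(g) for g in groups)


# Simulate the algorithm on each guess: if g1 would already have passed, the
# data holder would have released g1(t), not g2(t), so t cannot be t0.
disclosure_set = [t for t in permutation_set(G2_RELEASE)
                  if not is_2_diverse(g1_groups(t))]
print(len(disclosure_set))                                   # 4
print({(t["Charlie"], t["David"]) for t in disclosure_set})  # {('cancer', 'cancer')}
```

Only the g1 group {Charlie, David} can hold a repeated condition, so the four surviving guesses are exactly those in which both have cancer.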
Decision Process of Safe and Unsafe Algorithms
Most existing generalization algorithms (a_naive, which do not consider this problem): at the i-th iteration, evaluate the privacy property on the permutation set per_i of gi(t0); if it holds (Y), release gi(t0); otherwise (N), move on to the next generalization function. The permutation set is the adversary’s mental image of the micro-data table without knowledge of the algorithm.
Safe generalization algorithms (a_safe; Zhang’07ccs, ….): evaluate the privacy property on the disclosure set ds_i instead. The disclosure set is the adversary’s mental image of the micro-data table after simulating the algorithm.
[Figure: evaluation paths of a_naive and a_safe over g1(t0), g2(t0), …, gi(t0), …, gn(t0). Box: the i-th iteration; diamond: an evaluation of the privacy property; per: permutation set; ds: disclosure set.]
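The two decision processes can be sketched generically in Python. Everything here is an illustrative assumption rather than the authors’ code: a table is a tuple of sensitive values indexed by record, a generalization function is a partition of the record indices, the privacy property `p` is the highest-ratio form (no record takes one value in more than half of the candidate tables), and the a_safe disclosure set is computed recursively as the tables in per(gi(t)) for which a_safe would not have stopped at any earlier gj. The names `per`, `p`, and `make_algorithms` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import permutations, product


def per(g, t):
    """Permutation set of g(t): permute the values inside each group of g."""
    options = [set(permutations([t[i] for i in group])) for group in g]
    tables = set()
    for combo in product(*options):
        new = list(t)
        for group, values in zip(g, combo):
            for i, v in zip(group, values):
                new[i] = v
        tables.add(tuple(new))
    return frozenset(tables)


def p(tables, bound=Fraction(1, 2)):
    """Assumed privacy property: in the candidate set, no record takes any
    single value in more than `bound` of the tables."""
    tables = list(tables)
    if not tables:
        return False
    for r in range(len(tables[0])):
        counts = {}
        for t in tables:
            counts[t[r]] = counts.get(t[r], 0) + 1
        if max(counts.values()) > bound * len(tables):
            return False
    return True


def make_algorithms(G):
    @lru_cache(maxsize=None)
    def ds(i, t):
        """a_safe disclosure set: tables in per(G[i], t) that a_safe would
        not have released at any earlier iteration j < i."""
        return frozenset(u for u in per(G[i], t)
                         if all(not p(ds(j, u)) for j in range(i)))

    def a_naive(t0):
        return next((i for i, g in enumerate(G) if p(per(g, t0))), None)

    def a_safe(t0):
        return next((i for i in range(len(G)) if p(ds(i, t0))), None)

    return a_naive, a_safe, ds


# Running example: records A..F = Alice..Fen; index 0 is g1, index 1 is g2.
T0 = ("flu", "cold", "cancer", "cancer", "headache", "toothache")
G1 = ((0, 1), (2, 3), (4, 5))
G2 = ((0, 1, 2), (3, 4, 5))
a_naive, a_safe, ds = make_algorithms((G1, G2))
print(a_naive(T0))     # 1: a_naive releases g2(t0)
print(a_safe(T0))      # None: a_safe releases nothing
print(len(ds(1, T0)))  # 4
```

The recursion in `ds` is what makes the safe strategy costly: each disclosure set requires the disclosure sets of all earlier iterations for every candidate table.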
Agenda
- Background
- k-Jump Strategy
  - The Algorithm Family a_jump(k)
  - Properties of a_jump(k)
- Data Utility Comparison
- Conclusion
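The Algorithm Family a_jump(k)
[Figure: evaluation path of a_jump(k) over g1(t0), g2(t0), …, g2+k(t0), …, gn(t0). At each iteration the privacy property is first evaluated on the permutation set per_i; if it fails (N), the algorithm moves to the next iteration; if it holds (Y), the property is evaluated on the disclosure set ds_i: Y releases gi(t0), while N jumps ahead (e.g., from g2 directly to g2+k).]
- naive strategy: evaluate the privacy property on the permutation set only (efficient but unsafe)
- safe strategy: evaluate the privacy property on the disclosure set directly (safe but costly)
- k-jump strategy: penalize a failed disclosure-set evaluation by jumping over the next k-1 iterations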
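The k-jump strategy can be sketched in Python under hypothetical encodings: tables are tuples of sensitive values, generalization functions are partitions of record indices, the privacy property bounds every record’s highest ratio by 1/2, and `k` is a jump distance vector with one entry per iteration. We assume a table belongs to ds_i exactly when a_jump(k), run on that table, would reach iteration i; this reading of the slides is ours, though it does give ds(g1(t0)) = per(g1(t0)) as stated on the properties slide. The names `per`, `p`, `make_jump`, `step`, and `reaches` are illustrative only.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import permutations, product


def per(g, t):
    """Permutation set of g(t): permute the values inside each group of g."""
    options = [set(permutations([t[i] for i in group])) for group in g]
    tables = set()
    for combo in product(*options):
        new = list(t)
        for group, values in zip(g, combo):
            for i, v in zip(group, values):
                new[i] = v
        tables.add(tuple(new))
    return frozenset(tables)


def p(tables, bound=Fraction(1, 2)):
    """Assumed privacy property: no record takes one value in more than
    `bound` of the candidate tables."""
    tables = list(tables)
    if not tables:
        return False
    for r in range(len(tables[0])):
        counts = {}
        for t in tables:
            counts[t[r]] = counts.get(t[r], 0) + 1
        if max(counts.values()) > bound * len(tables):
            return False
    return True


def make_jump(G, k):
    """Build a_jump(k) for the sequence G; k[i] is the jump used after a
    failed disclosure-set check at iteration i (0-based)."""
    n = len(G)

    def step(i, t):
        """One iteration on input t: returns (released, next_index)."""
        if not p(per(G[i], t)):
            return False, i + 1     # permutation set fails: next function
        if p(ds(i, t)):
            return True, i          # disclosure set passes: release G[i](t)
        return False, i + k[i]      # disclosure set fails: skip k[i]-1 functions

    def reaches(i, t):
        """Would a_jump(k), run on input t, evaluate iteration i?"""
        j = 0
        while j < i:
            released, j = step(j, t)
            if released:
                return False
        return j == i

    @lru_cache(maxsize=None)
    def ds(i, t):
        """Disclosure set: candidates in per(G[i], t) that would reach i."""
        return frozenset(u for u in per(G[i], t) if reaches(i, u))

    def a_jump(t0):
        i = 0
        while i < n:
            released, nxt = step(i, t0)
            if released:
                return i
            i = nxt
        return None

    return a_jump, ds


# Running example: records A..F = Alice..Fen; G[0] is g1, G[1] is g2.
T0 = ("flu", "cold", "cancer", "cancer", "headache", "toothache")
G1 = ((0, 1), (2, 3), (4, 5))
G2 = ((0, 1, 2), (3, 4, 5))
a_jump, ds = make_jump((G1, G2), k=(1, 1))
print(a_jump(T0))      # None: every table in ds of g2(t0) gives Charlie and David cancer
print(len(ds(1, T0)))  # 4
```

Because `G` is just a sequence, reusing a generalization function (as in the Proposition 2 construction) amounts to listing it twice.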
Properties of a_jump(k)
[Figure: evaluation path of a_jump(k) over g1(t0), g2(t0), …, g2+k(t0), …, gn(t0), with permutation-set checks per_i and disclosure-set checks ds_i]
- Computation of the disclosure set
  - a_safe: to compute ds(gi(t0)), one must first compute ds(gj(t)) for all t in per(gi(t0)) and j = 1, 2, …, i-1.
  - a_jump: to compute ds(gi(t0)) for 2 < i < 2+k, there is no longer any need to compute ds(g2(t)) for the tables t in per(gi(t0)).
- ds(g1(t0)) and ds(g2(t0))
  - ds(g1(t0)) = per(g1(t0)).
  - ds(g2(t0)) is independent of the distance vector.
- Size of the family
  - There are (n-1)! different jump distance vectors.
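One way to see where the count (n-1)! could come from, under an assumption that is ours rather than the slide’s: after iteration i (for i = 1, …, n-1) the jump k_i lands somewhere in i+1, …, n, so k_i ranges over 1, …, n-i and the number of vectors is (n-1)(n-2)…1. The helper `jump_vectors` is hypothetical.

```python
from itertools import product
from math import factorial


def jump_vectors(n):
    """Hypothetical enumeration of jump distance vectors for n generalization
    functions: k_i ranges over 1..n-i for i = 1..n-1."""
    return list(product(*(range(1, n - i + 1) for i in range(1, n))))


for n in range(1, 7):
    print(n, len(jump_vectors(n)), factorial(n - 1))
```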
Agenda
- Background
- k-Jump Strategy
- Data Utility Comparison
  - Construction for Theorem 1: 1-jump and i-jump (1 < i) are incomparable
  - Construction for Theorem 2: i-jump and j-jump (1 < i < j) are incomparable
  - Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) are incomparable
  - Construction for Proposition 2: reusing generalization functions
  - Results on a_safe and a_jump(1)
- Conclusion
Construction for Theorem 1: 1-jump and i-jump (1 < i) are incomparable
[Table: quasi-identifier values A–O, generalization functions g1, g2, g3, …, sensitive values C0–C9, and four disjoint sets of candidate tables S1, S2, S3, S4 with |S1| = 4320 and |S2| = |S3| = |S4| = 1152]
Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
To compute ds_3^k(t0):
1. Exclude any table t for which p(per1(t)) = true; each remaining table belongs to one of the four disjoint sets.
Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i)
[Table: the four disjoint sets S1–S4 (|S1| = 4320, |S2| = |S3| = |S4| = 1152) over quasi-identifier values A–O under g1, g2, g3, …]
To compute ds_3^k(t0):
1. Exclude any table t for which p(per1(t)) = true.
2. Consider generalizing these tables using g2: S2, S3, and S4 cannot be disclosed under g2.
Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i)
[Table: the set S1 (|S1| = 4320) and the sets S101, S102, and S103 (288 tables each) over quasi-identifier values A–O]
To compute ds_3^k(t0):
1. Exclude any table t for which p(per1(t)) = true.
2. Consider generalizing these tables using g2:
   a. Tables in S1 in which both N and O have C7, C8, or C9 cannot be disclosed under g2; these form S1', with |S1'| = 864.
Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i)
[Table: the set S1 (|S1| = 4320) and the sets S111, S112, and S113 (1152 tables each) over quasi-identifier values A–O]
To compute ds_3^k(t0):
1. Exclude any table t for which p(per1(t)) = true.
2. Consider generalizing these tables using g2:
   b. For a_jump(i), all tables in S1 \ S1' (|S1 \ S1'| = 3456) will be excluded from ds_3^i(t0), so
      ds_3^i(t0) = S1' ∪ S2 ∪ S3 ∪ S4.
Privacy property (the highest ratio of a sensitive value in a group is no greater than 1/2): satisfied!
Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i)
[Table: the sets S1 (4320 tables), S111 (1152), S1111 (576), and S1112 (576) over quasi-identifier values A–O]
To compute ds_3^k(t0):
1. Exclude any table t for which p(per1(t)) = true.
2. Consider generalizing these tables using g2:
   c. For a_jump(1), the disclosure sets under g2 of all tables in S1 \ S1' do not satisfy the privacy property, so
      ds_3^1(t0) = S1 ∪ S2 ∪ S3 ∪ S4.
Privacy property (the highest ratio of a sensitive value in a group is no greater than 1/2): violated! The ratio of I being associated with C6 is 5/9 > 1/2.
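The ratio test used in these constructions can be sketched as follows. The helper names `highest_ratio` and `satisfies` are hypothetical, and the 9-table candidate set is a made-up toy, not the slides’ construction; it only reproduces the arithmetic of a record associated with C6 in 5 of 9 candidate tables.

```python
from collections import Counter
from fractions import Fraction


def highest_ratio(tables, record):
    """Largest fraction of candidate tables in which `record` takes one value."""
    counts = Counter(t[record] for t in tables)
    return Fraction(max(counts.values()), len(tables))


def satisfies(tables, bound=Fraction(1, 2)):
    """Privacy property: every record's highest ratio is at most `bound`."""
    return all(highest_ratio(tables, r) <= bound for r in tables[0])


# Hypothetical 9-table candidate set (not the slides' construction) in which
# record "I" is associated with C6 in 5 tables.
toy = [{"I": "C6"}] * 5 + [{"I": "C7"}] * 2 + [{"I": "C8"}] * 2
print(highest_ratio(toy, "I"))  # 5/9
print(satisfies(toy))           # False
```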
Construction for Theorem 2: i-jump and j-jump (1 < i < j) are incomparable
The evaluation paths are shown as figures.
Construction for Theorem 2 (cont.): i-jump and j-jump (1 < i < j)
[Figure: evaluation paths over generalization functions g1, g2, g3, …, gj, gj+1, gj+2, …, with sensitive values C0–C9]
- The case where i-jump has better utility than j-jump is relatively easy to construct. We only show the construction for the other case.
- For this construction, generalization gj+2 will be released for j-jump, while gj+i+1 or a later one will be released for i-jump.
Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) are incomparable
Construction for Proposition 2: Reusing generalization functions
[Table: quasi-identifier values A–G with sensitive values C1–C5 under g1, g2, g3, and g2'; counts: 72 under g2, and 24, 8, and 8 for the three disjoint sets S1, S2, S3]
Without reusing g2, the table leads to disclosing nothing!
To compute ds2:
1. Exclude any table t for which p(per1(t)) = true; each remaining table (40 in total) belongs to one of the three disjoint sets.
2. Cannot be disclosed under g1(.) or g3(.).
- The jump distance is 1; ds2 = S1 ∪ S2 ∪ S3.
- Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2. Violated!
Construction for Proposition 2 (cont.): Reusing generalization functions
[Table: quasi-identifier values A–G under g1, g2, g3, and g2', with the disjoint sets S1, S2, S3 of 24, 8, and 8 tables (40 in total)]
g2 is reused as g2': to calculate ds2', the tables that can be disclosed under g1, g2, and g3 must be excluded from per2'.
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
- The jump distance is 1.
Construction for Proposition 2 (cont.): Reusing generalization functions
[Table: S1 (24 tables) divided into the disjoint subsets S11, S12, S13 of 16, 4, and 4 tables]
g2 is reused as g2': to calculate ds2', the tables that can be disclosed under g1, g2, and g3 must be excluded from per2'.
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
3. S1 can be further divided into three disjoint subsets.
   a. S12 and S13 cannot be disclosed under g3.
- The jump distance is 1.
Construction for Proposition 2 (cont.): Reusing generalization functions
g2 is reused as g2': to calculate ds2', the tables that can be disclosed under g1, g2, and g3 must be excluded from per2'.
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
3. S1 can be further divided into three disjoint subsets.
   b. The tables in subset S11 can be disclosed under g3.
To compute ds3(t0 in S11), take one instance tA of S11:
- Exclude any table t for which p(per1(t)) = true; each remaining table belongs to one of the two disjoint sets SA1 and SA2.
- These subsets cannot be disclosed under g2 (nor under g2).
[Table: columns S1, S11, tA, SA1, SA2 over quasi-identifier values A–G and sensitive values C1–C5; counts 24, 16, 120, 12, 36]
Construction for Proposition 2 (cont.): Reusing generalization functions
[Table: the sets S12, S13, S2, S3 with 4, 4, 8, and 8 tables over quasi-identifier values A–G]
g2 is reused as g2':
ds2' = S12 ∪ S13 ∪ S2 ∪ S3
The ratios of D and E being associated with C3 are 0.5, which is the highest ratio.
- The jump distance is 1.
- Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2. Satisfied!
Results on a_safe and a_jump(1)
1. When the privacy property is either set-monotonic or based on the highest ratio of sensitive values:
   - Lemma 3: p(per(t0)) = false ⇒ p(any of its subsets) = false.
   - Corollary 1: The algorithm a_safe has the same data utility as a_jump(1).
2. When the privacy property is of any other kind:
   - Lemma 4: The ds3 under a_safe is a subset of that under a_jump(1).
   - Theorem 5: The data utility of a_safe and a_jump(1) is generally incomparable.
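Lemma 3 can be illustrated (not proved) by brute force on a tiny hypothetical group of three records holding the values {a, a, b}, using the ratio-based property with bound 1/2. The helper names `p` and `nonempty_subsets` are ours. The permutation set fails the property, and so does every nonempty subset of it, because the ratios of value a across the three records always sum to 2 and so cannot all stay at or below 1/2.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations


def p(tables, bound=Fraction(1, 2)):
    """Ratio-based property (assumed reading): no record takes one value in
    more than `bound` of the candidate tables."""
    for r in range(len(tables[0])):
        if Fraction(max(Counter(t[r] for t in tables).values()), len(tables)) > bound:
            return False
    return True


def nonempty_subsets(items):
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


# Hypothetical single group of three records holding values {a, a, b}.
per_t0 = sorted(set(permutations(("a", "a", "b"))))
print(len(per_t0), p(per_t0))  # 3 False
assert all(not p(list(s)) for s in nonempty_subsets(per_t0))
```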
Agenda
- Background
- k-Jump Strategy
- Data Utility Comparison
- Conclusion
Conclusion
- We have proposed a novel k-jump strategy for micro-data disclosure.
- It transforms a given generalization algorithm into a large number of safe algorithms.
- We show that the data utility of these algorithms is generally incomparable by constructing counter-examples.
- Practical impact: make a secret choice among them.
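The secret choice could, for example, be realized by drawing the jump distance vector with a cryptographically secure random generator. This sketch uses Python’s standard `secrets` module; the encoding of the vector (k_i ranging over 1, …, n-i for i = 1, …, n-1) and the uniform distribution are our assumptions, not something the slides prescribe, and `secret_jump_vector` is a hypothetical name.

```python
import secrets


def secret_jump_vector(n):
    """Draw a jump distance vector uniformly at random with a CSPRNG.
    Assumes k_i ranges over 1..n-i for i = 1..n-1 (hypothetical encoding)."""
    return tuple(1 + secrets.randbelow(n - i) for i in range(1, n))


k = secret_jump_vector(5)
print(k)  # varies from run to run
```

Sampling each k_i independently and uniformly is equivalent to picking one of the (n-1)! vectors uniformly at random.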
Further Results and Future Work
- Further results in the extended version of this paper:
  - Computational complexity: O(|max(per)|^(n/k))
  - Making a secret choice among unsafe algorithms does not yield a safe solution.
- Future studies:
  - Study more efficient safe algorithms.
  - Employ statistical methods to compare different k-jump algorithms.
  - Further investigate the opportunity in reusing generalization functions.
Thank you!
Toy Example
Data holder: the micro-data table t0 (Name: identifier; DoB: quasi-identifier; Condition: sensitive attribute) is generalized into g2(t0), which satisfies 2-diversity.
Name     DoB   Condition
Alice    1990  flu
Bob      1985  cold
Charlie  1974  cancer
David    1962  cancer
Eve      1953  headache
Fen      1941  toothache
Generalization g2(t0):
DoB        Condition
1970~1999  flu, cold, cancer
1940~1969  cancer, headache, toothache
Attacker knows:
- the generalization
- external data (an external table listing each person’s Name and DoB, with Condition unknown)
- the privacy property
What can the attacker infer? The permutation set t1, t2, …, t36.