Recent Advances in Homomorphic Encryption
Shai Halevi – IBM Research February 13, 2012
Computing on Encrypted Data
• Wouldn’t it be nice to be able to…
  - Encrypt my data before sending it to the cloud
  - While still allowing the cloud to search/sort/edit/… this data on my behalf
  - Keeping the data in encrypted form, without shipping it back and forth to be decrypted
Computing on Encrypted Data
• Wouldn’t it be nice to be able to…
  - Encrypt my queries to the cloud
  - While still allowing the cloud to process them
  - Cloud returns encrypted answers that I can decrypt
Computing on Encrypted Data [figure: the data and the cloud's answers appear only as random-looking ciphertext strings]
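Privacy Homomorphisms
• Rivest-Adleman-Dertouzos 1978
• Plaintext space P and ciphertext space C, with $c_i \leftarrow \mathrm{Enc}(x_i)$: an operation # on ciphertexts mirrors an operation * on plaintexts, so for $y = x_1 * x_2$ and $d = c_1 \# c_2$ we get $y \leftarrow \mathrm{Dec}(d)$
  - Example: $\text{RSA-encrypt}_{(e,N)}(x) = x^e \bmod N$, and $x_1^e \cdot x_2^e = (x_1 x_2)^e \bmod N$
• “Somewhat Homomorphic” (SWHE): can compute some functions on encrypted data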
“Fully Homomorphic” Encryption (FHE)
• Encryption for which we can compute arbitrary functions on the encrypted data: $\mathrm{Enc}(x) \xrightarrow{\ \mathrm{Eval}_f\ } \mathrm{Enc}(f(x))$, with the output size independent of $f$'s complexity
• Enough to do $\mathrm{Enc}(x_1 \cdot x_2)$ and $\mathrm{Enc}(x_1 + x_2)$
  - Every function can be expressed as a polynomial (over bits, with + and × taken mod 2)
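Why ADD and MULT are enough: every Boolean function of n bits equals a unique polynomial over GF(2), its algebraic normal form. The minimal Python sketch below (an illustration only; `anf_coefficients` and `eval_anf` are hypothetical helper names) computes that polynomial from a truth table with the binary Möbius transform. It then evaluates the polynomial using nothing but XOR (ADD mod 2) and AND (MULT mod 2).

```python
def anf_coefficients(truth_table, n):
    """Binary Moebius transform. truth_table[x] = f(x), where bit i of x is x_i.
    Returns coef[S] = coefficient of the monomial prod_{i in S} x_i (S a bitmask)."""
    coef = list(truth_table)
    for i in range(n):
        for s in range(1 << n):
            if s & (1 << i):
                coef[s] ^= coef[s ^ (1 << i)]
    return coef


def eval_anf(coef, x):
    """Evaluate the GF(2) polynomial on the bit list x using only ADD and MULT mod 2."""
    total = 0
    for s, c in enumerate(coef):
        if c:
            term = 1
            for i in range(len(x)):
                if s & (1 << i):
                    term &= x[i]      # MULT mod 2
            total ^= term             # ADD mod 2
    return total
```

For instance, `anf_coefficients([1, 1, 1, 0], 2)` returns `[1, 0, 0, 1]`, i.e. NAND(x0, x1) = 1 + x0·x1. Since NAND is universal, this gives another way to see that the two operations are complete.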
How To Do It?
• Open for >30 years
• First plausible construction in [Gentry’09]
  - SWHE with security based on hard problems in (ideal) integer lattices
  - A transformation SWHE → FHE, which requires an SWHE that can evaluate its own decryption circuit
• Several other constructions since then
A Taste of Gentry’s SWHE Scheme
• Public key is two integers p, r; secret key is one integer w
• $\mathrm{Enc}_{p,r}(b \in \{0,1\})$ (an implicit encoding of a matrix and a vector): choose a high-degree polynomial with small coefficients (call it Q), output $c = 2Q(r) + b \bmod p$
• $\mathrm{Dec}_w(c)$: output $(c \cdot w \bmod p) \bmod 2$
• Addition and multiplication of ciphertexts are done modulo p (main catch: the integers are huge…)
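Generating p, r, w for the scheme above takes real care, so the sketch below swaps in a different but closely related integer scheme for illustration: the symmetric variant of van Dijk-Gentry-Halevi-Vaikuntanathan (DGHV, 2010). A bit b hides as the parity of small noise, c = p·q + 2e + b, with a secret odd p. It shows the same pattern as the slide: decrypt by reducing mod the secret and then mod 2, and compute by adding or multiplying ciphertexts as plain integers. Parameters are toy values and offer no security; all names are hypothetical.

```python
import random


def keygen(rng, p_bits=128):
    """Secret key: a random odd integer with exactly p_bits bits."""
    return rng.getrandbits(p_bits) | (1 << (p_bits - 1)) | 1


def encrypt(p, b, rng, noise_bits=8, q_bits=256):
    """c = p*q + 2e + b: the bit b is the parity of the small noise 2e + b."""
    q = rng.getrandbits(q_bits)
    e = rng.getrandbits(noise_bits)
    return p * q + 2 * e + b


def decrypt(p, c):
    """Reduce into (-p/2, p/2] to recover the noise, then take it mod 2."""
    r = c % p
    if r > p // 2:
        r -= p
    return r % 2

# Ciphertexts are plain integers: ADD is integer +, MULT is integer *.
# Adding ciphertexts adds their noise; multiplying them multiplies it.
```

Usage: with `rng = random.Random(7)` and `p = keygen(rng)`, `decrypt(p, encrypt(p, 1, rng) * encrypt(p, 1, rng))` gives 1. The product's noise is about twice as long as a fresh ciphertext's, which is why noise grows with every multiplication.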
Performance
• A little slow…
• First working implementation in [GH11]: ½-hour to compute a single gate
  - Because of homomorphic decryption
  - 13-14 orders of magnitude slowdown vs. computing on non-encrypted data
• The underlying SWHE is faster
  - Can evaluate polynomials of degree up to ~200
  - About ½-second for a single gate
Performance
• A little slow…
• Butler Lampson: “I don’t think we’ll see anyone using Gentry’s solution in our lifetimes […]” (Forbes, Dec-19, 2011)
• Rest of this talk: reasons to believe otherwise
Why is [G’09] So Slow?
• Ciphertext of SWHE is large
  - Ciphertext has dimension D and bitsize N (implicit in p, r, w), so its size is $|c| \sim D \cdot N$
  - $D > N \cdot \lambda$ to get security
• Ciphertext contains n-bit “noise”
  - Initial noise, say $n_0 = \log \lambda$
  - $N > n$ to be able to decrypt
• Mult(c1, c2) adds the noise bits in the $c_i$'s, e.g., from n bits each to 2n bits in the product
  - A degree-k polynomial has $\geq k n_0$-bit noise
  - Need $N > k n_0$ and $D > k n_0 \lambda$, so $|c| > \Omega(\lambda k^2)$
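The noise bookkeeping above can be checked numerically on a toy DGHV-style integer scheme (an assumption for illustration, not Gentry's lattice scheme): c = p·q + noise with a secret odd p. Multiplying k ciphertexts whose noise has n0 bits yields noise with between k·(n0 − 1) and k·n0 bits. The modulus must therefore exceed roughly k·n0 bits for the parity of the noise to survive. Function names are hypothetical.

```python
import random


def fresh_noise(rng, n0, bit):
    """Noise 2e + bit with exactly n0 bits (requires n0 >= 2)."""
    e = rng.getrandbits(n0 - 1) | (1 << (n0 - 2))
    return 2 * e + bit


def product_noise(k, n0, p_bits=512, seed=0):
    """Multiply k toy ciphertexts c_i = p*q_i + noise_i. Returns the bit length of
    the product's noise and whether its parity is still the AND of the bits."""
    rng = random.Random(seed)
    p = rng.getrandbits(p_bits) | (1 << (p_bits - 1)) | 1   # secret odd modulus
    c, expected = 1, 1
    for _ in range(k):
        bit = rng.getrandbits(1)
        c *= p * rng.getrandbits(p_bits) + fresh_noise(rng, n0, bit)
        expected &= bit
    noise = c % p
    if noise > p // 2:
        noise -= p
    return noise.bit_length(), noise % 2 == expected
```

With `p_bits=512` the noise of a degree-10 product (about 100 bits) stays far below p. Shrinking `p_bits` toward k·n0 makes the reduction mod p stop returning the true noise, and this is the toy counterpart of the requirement N > k·n0.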
Why is [G’09] So Slow?
• Ciphertext of SWHE has size $|c| \geq \Omega(\lambda k^2)$
• SWHE → FHE relies on homomorphic evaluation of the decryption function
  - Of degree $\Omega(\lambda)$, so $|c| \geq \Omega(\lambda^3)$
• Decryption takes $\geq \Omega(\lambda^3)$ operations
• Homomorphic evaluation of each operation on two ciphertexts takes time $\geq \Omega(\lambda^3)$
• Total complexity $\geq \Omega(\lambda^6)$
  - Can be improved to $\tilde{O}(\lambda^{3.5})$
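The two slides' bounds chain together as follows, with λ the security parameter, $n_0 = \log\lambda$ the initial noise and k the degree. This is only a restatement of the slides' arithmetic, not a tighter analysis:

```latex
\begin{align*}
N &> k\,n_0, \qquad D > N\lambda > k\,n_0\,\lambda,\\
|c| &\approx D \cdot N > k^2 n_0^2\,\lambda = \Omega(\lambda k^2),\\
k &= \Omega(\lambda)\ \text{(degree of decryption)} \;\Longrightarrow\; |c| = \Omega(\lambda^3),\\
\text{total} &\geq \underbrace{\Omega(\lambda^3)}_{\text{ops in decryption}} \cdot \underbrace{\Omega(\lambda^3)}_{\text{time per op}} = \Omega(\lambda^6).
\end{align*}
```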
Faster Homomorphic Evaluation
• Brakerski-Vaikuntanathan’11 and Brakerski-Gentry-Vaikuntanathan’12
  - From $\tilde{O}(\lambda^{3.5})$ to …
• Smart-Vercauteren‘11
  - SIMD operations, batching … ops at a time
• Gentry-Halevi-Smart’12
  - From SIMD to general-purpose computing
  - Evaluating a t-gate circuit in time $t \cdot \mathrm{polylog}(\lambda)$, assuming average width …
A Taste of [BV11, BGV12]
• Secret key is a vector s, ciphertext is a vector c
• $\mathrm{Dec}_s(c)$: output $(\langle c, s \rangle \bmod q) \bmod 2$, where q is a parameter
• Addition is vector addition modulo q
• Multiplication is tensor product mod q, followed by “dimension reduction”
• Parameter q evolves during computation
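A toy sketch of the vector-ciphertext view, assuming plain LWE over $\mathbb{Z}_q$ with tiny, insecure parameters. It is not the exact BV/BGV construction and it omits the “dimension reduction” step entirely. The key is s = (1, s'), and a fresh ciphertext satisfies ⟨c, s⟩ = 2e + b (mod q). The tensor product decrypts under s ⊗ s because ⟨c1 ⊗ c2, s ⊗ s⟩ = ⟨c1, s⟩·⟨c2, s⟩. All names and constants are hypothetical.

```python
import random

Q = 2**61 - 1   # toy odd modulus
DIM = 8         # toy dimension, far too small for security


def keygen(rng):
    return [1] + [rng.randrange(Q) for _ in range(DIM)]


def encrypt(s, bit, rng):
    a = [rng.randrange(Q) for _ in range(DIM)]
    e = rng.randint(-4, 4)
    b = (2 * e + bit - sum(ai * si for ai, si in zip(a, s[1:]))) % Q
    return [b] + a                      # <c, s> = 2e + bit (mod Q)


def decrypt(s, c):
    r = sum(x * y for x, y in zip(c, s)) % Q
    if r > Q // 2:                      # centered representative
        r -= Q
    return r % 2


def add(c1, c2):
    return [(x + y) % Q for x, y in zip(c1, c2)]


def tensor(u, v):
    return [(x * y) % Q for x in u for y in v]


def mult(c1, c2):
    """Decrypts under tensor(s, s); the dimension grows from n to n^2, which is
    what the omitted dimension-reduction step would undo."""
    return tensor(c1, c2)
```

Usage: decrypt a product with `decrypt(tensor(s, s), mult(c1, c2))`. The 81-entry result for two 9-entry ciphertexts shows why some form of dimension reduction is needed before the next multiplication.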
The [BV11, BGV12] Techniques
• A better multiplication technique: decrease bitsize rather than increase noise
  - If c1, c2 have bitsize N and noise bitsize n, then Mult(c1, c2) has bitsize N − n and noise bitsize n
  - Need $N_0 > (k+1)n$ to handle depth k, which gives degree $2^k$
• Decryption has depth $O(\log \lambda)$: $N_0 = \mathrm{polylog}(\lambda)$, $D = $ …, $|c| = \tilde{O}(\lambda)$
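The “decrease bitsize” step is usually called modulus switching. The sketch below applies it to a toy LWE ciphertext, under stated assumptions: secret s = (1, s') with entries of s' in {−1, 0, 1}, both moduli odd, and toy insecure sizes. It is an illustration of the idea, not the exact BGV procedure. Each entry is scaled by q'/q and rounded to the nearest integer with the same parity. The decrypted bit is unchanged and the noise shrinks by about log(q/q') bits. Here 30 bits of modulus are traded for 30 bits of noise. Names are hypothetical.

```python
import random

DIM = 16   # toy dimension (insecure)


def keygen(rng):
    """s = (1, s') with small entries, since the rounding error scales with |s|."""
    return [1] + [rng.choice((-1, 0, 1)) for _ in range(DIM)]


def encrypt(s, bit, q, e, rng):
    """Toy LWE ciphertext with <c, s> = 2e + bit (mod q); e is chosen by the caller."""
    a = [rng.randrange(q) for _ in range(DIM)]
    b = (2 * e + bit - sum(x * y for x, y in zip(a, s[1:]))) % q
    return [b] + a


def noise(s, c, q):
    """Centered representative of <c, s> mod q; its parity is the plaintext bit."""
    r = sum(x * y for x, y in zip(c, s)) % q
    return r - q if r > q // 2 else r


def switch_modulus(c, q, q_new):
    """Scale every entry by q_new/q, rounding to the nearest integer of the same
    parity (q and q_new both odd). The noise shrinks by roughly a factor q/q_new."""
    out = []
    for ci in c:
        y = (ci * q_new) // q        # floor(ci * q_new / q), since 0 <= ci < q
        if (y - ci) % 2:             # wrong parity: y + 1 is then the nearest match
            y += 1
        out.append(y % q_new)
    return out
```

Usage: with q = 2^80 + 1 and q' = 2^50 + 1, a ciphertext carrying about 41 bits of noise comes out with about 11 bits, plus a rounding term bounded by the number of nonzero key entries.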
Exploiting Parallelism
• SIMD: working on data arrays
  - ℓ-ADD: add two arrays of length ℓ slot by slot [figure: example arrays]
  - ℓ-MULT: multiply two arrays of length ℓ slot by slot [figure: example arrays]
• What’s the point?
  - Efficiency: ℓ-fold parallelism
Plaintext Algebra
• Some FHE variants use polynomial rings
  - Native plaintext space is $R_2 = \mathbb{Z}_2[X]/\Phi_m(X)$: binary polynomials modulo $\Phi_m(X)$, where $\Phi_m(X)$ is the m’th cyclotomic polynomial
  - Dimension is $D = \phi(m) \approx m$
• $\Phi_m(X)$ is irreducible over $\mathbb{Z}$, but not mod 2: $\Phi_m(X) = \prod_{j=1}^{\ell} F_j(X) \pmod 2$
  - The $F_j$'s are irreducible, and all have the same degree
  - For some m’s we can get $\ell = \Omega(D/\log D)$
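How many slots a given m yields follows from a standard fact about cyclotomic polynomials. For odd m, every $F_j$ has degree d equal to the multiplicative order of 2 modulo m, so $\ell = \phi(m)/d$. Because $2^d \equiv 1 \pmod m$ forces $d \geq \log_2 m$, this also shows $\ell \leq D/\log_2 m$. The sketch below only counts; it does not factor $\Phi_m$. Function names are hypothetical.

```python
def euler_phi(m):
    """Euler's totient by trial division."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result


def order_of_two(m):
    """Smallest d >= 1 with 2^d = 1 (mod m); requires odd m > 1."""
    d, x = 1, 2 % m
    while x != 1:
        x = (2 * x) % m
        d += 1
    return d


def slot_structure(m):
    """Returns (D, d, ell): Phi_m mod 2 has ell irreducible factors of degree d."""
    D = euler_phi(m)
    d = order_of_two(m)
    return D, d, D // d
```

For example, `slot_structure(8191)` returns `(8190, 13, 630)`: a dimension-8190 ring with 630 slots, each a degree-13 factor.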
Plaintext Slots
• Plaintext element $a \in R_2$ encodes ℓ values: $a \cong (\alpha_j)_{j=1}^{\ell}$ with $\alpha_j = a \bmod F_j$
  - Polynomial Chinese Remaindering
  - Each $\alpha_j$ can be a bit
• Ops ×, + work independently on the slots: $a \times a' \cong (\alpha_j \times \alpha'_j)_j$ and $a + a' \cong (\alpha_j + \alpha'_j)_j$
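A toy check of polynomial CRT for m = 7, where $\Phi_7(X) = (X^3+X+1)(X^3+X^2+1) \bmod 2$. That gives ℓ = 2 slots, each an element of $\mathbb{Z}_2[X]/F_j$. Binary polynomials are stored as Python ints (bit i is the coefficient of $X^i$). The sketch verifies that multiplying or adding in $R_2$ acts slot by slot. The helper names are hypothetical, and `encode` uses brute force, which only works for such a tiny ring.

```python
PHI7 = 0b1111111            # Phi_7(X) = X^6 + X^5 + ... + X + 1
F = [0b1011, 0b1101]        # X^3 + X + 1 and X^3 + X^2 + 1


def gf2_mul(a, b):
    """Carry-less product of binary polynomials stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf2_mod(a, f):
    """Remainder of a modulo f over GF(2)."""
    df = f.bit_length() - 1
    while a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a


def slots(a):
    """Polynomial CRT: a in Z_2[X]/Phi_7(X) corresponds to (a mod F_j)_j."""
    return [gf2_mod(a, f) for f in F]


def encode(bits):
    """Find the unique a of degree < 6 whose slots hold the given values."""
    return next(a for a in range(64) if slots(a) == list(bits))
```

Usage: `slots(gf2_mod(gf2_mul(encode([1, 0]), encode([1, 1])), PHI7))` equals `[1, 0]`, the slot-wise AND.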
Homomorphic SIMD [SV’11]
• Computing the same function on ℓ inputs at the price of one computation
• Pack the inputs into the slots
  - Bit-slice: inputs to the j’th instance go in the j’th slots
• Compute the function once
• After decryption, decode the ℓ output bits from the output plaintext polynomial
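A plaintext analogy of bit-slicing, with no encryption involved. Each Python int serves as an ℓ-bit array, XOR plays the role of ℓ-ADD and AND plays ℓ-MULT. One evaluation of the majority polynomial xy + yz + zx then serves all ℓ instances at once. Helper names are hypothetical.

```python
def majority(x, y, z):
    """maj = xy + yz + zx over GF(2); works on single bits or on bit-sliced words."""
    return (x & y) ^ (y & z) ^ (z & x)


def pack(column):
    """Bit-slice: the j'th instance's input goes into bit j."""
    return sum(bit << j for j, bit in enumerate(column))


def unpack(word, ell):
    """Read back the ell per-instance output bits."""
    return [(word >> j) & 1 for j in range(ell)]
```

Usage: `unpack(majority(pack(xs), pack(ys), pack(zs)), len(xs))` computes all instances with three ANDs and two XORs, regardless of ℓ.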
Aside: an ℓ-SELECT Operation
• $(x_1, \dots, x_7) \times (1,0,0,1,0,1,0) + (x_8, \dots, x_{14}) \times (0,1,1,0,1,0,1) = (x_1, 0, 0, x_4, 0, x_6, 0) + (0, x_9, x_{10}, 0, x_{12}, 0, x_{14}) = (x_1, x_9, x_{10}, x_4, x_{12}, x_6, x_{14})$
• We will use this later
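The ℓ-SELECT above takes u where the mask is 1 and v where it is 0, computing u × mask + v × (1 − mask). Over GF(2) the complement 1 − mask is just mask + 1, so only ADD and MULT are needed. The mask is public, so computing its complement costs nothing homomorphically. In the plaintext sketch below, ordinary integers stand in for the slot values $x_i$, and the function names are hypothetical.

```python
def l_add(u, v):
    """Slot-wise ADD of two ell-arrays."""
    return [a + b for a, b in zip(u, v)]


def l_mult(u, v):
    """Slot-wise MULT of two ell-arrays."""
    return [a * b for a, b in zip(u, v)]


def l_select(u, v, mask):
    """u where mask is 1, v where mask is 0: u*mask + v*(1 - mask)."""
    complement = [1 - b for b in mask]
    return l_add(l_mult(u, mask), l_mult(v, complement))
```

Usage: `l_select(list(range(1, 8)), list(range(8, 15)), [1, 0, 0, 1, 0, 1, 0])` returns `[1, 9, 10, 4, 12, 6, 14]`, matching the pattern $(x_1, x_9, x_{10}, x_4, x_{12}, x_6, x_{14})$.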
Low-Overhead HE [GHS’12]
• Start from the [BGV’12] cryptosystem
  - Overhead …
• Apply the [SV’11] SIMD trick
  - Overhead polylog(λ) when computing the same function f on … inputs
• Extend to computing a single instance of f
  - As long as the circuit of f has width …
  - Use internal parallelism inside this one circuit
So you want to compute some function… [figure: a circuit over the input bits $x_1, \dots, x_{26}$ and some constant inputs]
• ADD and MULT are a complete set of operations.
So you want to compute some function… using SIMD… [figure: the same circuit, with the input bits packed into arrays]
• ℓ-ADD and ℓ-MULT are not a complete set of operations!!!
  - … unless, of course, we use ℓ = 1…
Routing Values Between Levels
• We need to map this [figure: the values of one circuit level, in the slots where they sit]
• Into that [figure: the same values rearranged into the slots the next level expects], so we can use ℓ-ADD
ℓ-ADD, ℓ-MULT, ℓ-PERMUTE: a complete set of SIMD ops
• Example, starting from the array $(x_1, x_2, x_3, x_4, x_5, x_7, \dots)$ (∗ marks don't-care slots):
  - ℓ-PERMUTE(π) gives $(x_1, x_3, x_5, *, *, *, *)$; ℓ-MULT by $(1,1,1,0,0,0,0)$ gives $(x_1, x_3, x_5, 0, 0, 0, 0)$
  - Another ℓ-PERMUTE gives $(x_2, x_4, *, *, *, *, *)$; ℓ-MULT by $(1,1,0,0,0,0,0)$ gives $(x_2, x_4, 0, 0, 0, 0, 0)$
  - ℓ-ADD gives $(x_1 + x_2, x_3 + x_4, x_5, 0, 0, 0, 0)$
• Use ℓ-PERMUTE for routing between circuit levels
  - Not quite obvious
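A plaintext sketch of evaluating one level of XOR gates with ℓ-PERMUTE, ℓ-MULT and ℓ-ADD, in the spirit of the example above. Gate g's output lands in slot g. Plain bit lists stand in for encrypted ℓ-arrays, and all helper names are hypothetical. It assumes each wire feeds at most one gate on each side, since cloning high fan-out values is a separate step.

```python
def l_mult(u, v):
    """ell-MULT: slot-wise product."""
    return [a * b for a, b in zip(u, v)]


def l_add(u, v):
    """ell-ADD over bits: slot-wise sum mod 2."""
    return [(a + b) % 2 for a, b in zip(u, v)]


def l_permute(u, pi):
    """ell-PERMUTE: slot k of the result receives slot pi[k] of u."""
    return [u[pi[k]] for k in range(len(u))]


def complete(sources, ell):
    """Extend a list of distinct source slots to a full permutation of range(ell)."""
    used = set(sources)
    return list(sources) + [k for k in range(ell) if k not in used]


def xor_level(values, gates):
    """Evaluate XOR gates [(i, j), ...]; gate g's output values[i] + values[j]
    lands in slot g. Uses two permutes, two mults and one add."""
    ell = len(values)
    mask = [1] * len(gates) + [0] * (ell - len(gates))
    left = l_mult(l_permute(values, complete([i for i, _ in gates], ell)), mask)
    right = l_mult(l_permute(values, complete([j for _, j in gates], ell)), mask)
    return l_add(left, right)
```

Usage: with `vals = [1, 0, 1, 1, 0, 1, 0]`, `xor_level(vals, [(0, 1), (2, 3)])` returns `[1, 0, 0, 0, 0, 0, 0]`, the analogue of $(x_1 + x_2, x_3 + x_4, \dots)$ with the unused slots masked to zero.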
Routing Values Between Levels
1. How to implement ℓ-PERMUTE?
   - $a \in R_2$ encodes an ℓ-array using polynomial CRT
   - We are given an encryption of a
2. Fan-out: need to clone values from high fan-out gates before routing to the next level
3. Big permutation: for a width-W level, we need a permutation over 2W values
   - Implemented using ℓ-PERMUTE on ℓ-arrays
   - Even when $W \gg \ell$
Implementing ℓ-Permute
• Recall: the native plaintext is a binary polynomial modulo $\Phi_m(X)$, i.e. $a \in R_2 = \mathbb{Z}_2[X]/\Phi_m(X)$
  - $a \cong (\alpha_1, \dots, \alpha_\ell)$, where $\alpha_j = a \bmod F_j$
  - $a + a' \cong (\alpha_1 + \alpha'_1, \dots, \alpha_\ell + \alpha'_\ell)$
  - $a \times a' \cong (\alpha_1 \times \alpha'_1, \dots, \alpha_\ell \times \alpha'_\ell)$
• Is there a natural operation on polynomials that moves values between slots?
Consequences of Galois Theory
• Assuming that the slots contain single bits:
• For every $j \in \mathbb{Z}_m^*$, the mapping $\kappa_j(a(X)) = a(X^j) \bmod \Phi_m(X)$ induces a permutation on the slots
  - The permutation is the identity iff j is a power of two (mod m)
  - The $\kappa_j$'s form a group under composition
  - For every pair $(i_1, i_2)$, there exists j such that $\kappa_j$ sends the contents of slot $i_1$ to slot $i_2$
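Continuing the toy m = 7 case (ℓ = 2 slots; the powers of two mod 7 are 1, 2, 4), a short sketch checks the statements above on plaintext polynomials. $\kappa_j$ fixes bit-valued slots for j ∈ {1, 2, 4} and swaps them for j ∈ {3, 5, 6}. Polynomials are ints with bit i the coefficient of $X^i$, and the helper names are hypothetical.

```python
PHI7 = 0b1111111            # Phi_7(X) = X^6 + ... + X + 1
F = [0b1011, 0b1101]        # slot factors X^3 + X + 1 and X^3 + X^2 + 1


def gf2_mod(a, f):
    """Remainder of a modulo f over GF(2)."""
    df = f.bit_length() - 1
    while a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a


def kappa(a, j):
    """a(X) -> a(X^j) mod Phi_7(X): the coefficient of X^i moves to X^(i*j)."""
    b = 0
    for i in range(a.bit_length()):
        if (a >> i) & 1:
            b ^= 1 << (i * j)
    return gf2_mod(b, PHI7)


def slots(a):
    return [gf2_mod(a, f) for f in F]


def encode(bits):
    """Brute-force CRT encoding (fine for a 64-element ring)."""
    return next(a for a in range(64) if slots(a) == list(bits))
```

Usage: `slots(kappa(encode([1, 0]), 3))` is `[0, 1]`, while `slots(kappa(encode([1, 0]), 2))` stays `[1, 0]`.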
An Illustrative Special Case
• For some $\Phi_m$'s, the $\kappa_j$'s induce circular shifts on the ℓ-array of slots:
  - If $a \cong (\alpha_1, \alpha_2, \dots, \alpha_\ell)$, then $\kappa_j(a) \cong (\alpha_2, \dots, \alpha_\ell, \alpha_1)$ for some $j \in \mathbb{Z}_m^*$
  - It also means that for every h, with $j' = j^h \bmod m$, $\kappa_{j'}(a) \cong (\alpha_{h+1}, \dots, \alpha_\ell, \alpha_1, \dots, \alpha_h)$
• For these $\Phi_m$'s, all the $\kappa_j$'s induce rotations
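m = 31 is one such special case. $\Phi_{31} \bmod 2$ is the product of all six irreducible binary quintics, and $\mathbb{Z}_{31}^*$ modulo the powers of 2 is cyclic of order 6. The sketch finds where $\kappa_j$ sends each slot's bit. If $a \equiv b \pmod{F_i}$ with b a constant, then $a(X^j) \equiv b$ modulo every divisor of $F_i(X^j)$, so the bit moves to the slot k with $F_k \mid F_i(X^j)$. For j = 3 this is a single 6-cycle, so numbering the slots along the cycle turns $\kappa_3$ into a rotation by one. Helper names are hypothetical.

```python
PHI31 = (1 << 31) - 1       # Phi_31(X) = X^30 + ... + X + 1


def gf2_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf2_mod(a, f):
    df = f.bit_length() - 1
    while a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a


def irreducible_quintics():
    """Degree-5 binary polynomials with no factor of degree 1 or 2."""
    return [f for f in range(32, 64) if all(gf2_mod(f, g) for g in range(2, 8))]


def compose(f, j):
    """f(X) -> f(X^j), without reduction."""
    out = 0
    for i in range(f.bit_length()):
        if (f >> i) & 1:
            out ^= 1 << (i * j)
    return out


def slot_permutation(factors, j):
    """perm[i] = k such that F_k divides F_i(X^j): slot i's bit goes to slot k."""
    return [next(k for k, fk in enumerate(factors) if gf2_mod(compose(fi, j), fk) == 0)
            for fi in factors]


def cycle_lengths(perm):
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            n, k = 0, start
            while k not in seen:
                seen.add(k)
                k = perm[k]
                n += 1
            lengths.append(n)
    return sorted(lengths)
```

Usage: with `F = irreducible_quintics()`, `cycle_lengths(slot_permutation(F, 3))` is `[6]`, while `slot_permutation(F, 2)` is the identity, as expected for a power of two.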
Applying the $\kappa_j$'s Homomorphically
• Roughly, applying $\kappa_j$ to the ciphertext $c = \mathrm{Enc}(a)$ yields an encryption of $\kappa_j(a)$
  - With respect to a different secret key: $\kappa_j(s)$ rather than s
  - But this can be fixed
• So we can implement circular shifts
  - But we need arbitrary permutations in order to do intra-circuit routing
From Shifts to Arbitrary Permutations
• A naïve solution:
  - π is the permutation that we need to implement
  - For every i, let j(i) be an index such that the permutation of $\kappa_{j(i)}$ sends i to π(i)
  - Let $a_i = \kappa_{j(i)}(a)$
  - Use a big SELECT on all the $a_i$'s: pick slot π(i) from $a_i$
• Implements π, but takes Θ(ℓ) ops
  - Inefficient; we might as well not use SIMD
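A plaintext sketch of the naïve solution, assuming the special case where every $\kappa_j$ is a rotation, so that `rotate` stands in for $\kappa_{j(i)}$. Each of the ℓ rounds rotates the whole array so that slot i lands on π(i), then masks and accumulates that single slot. That is ℓ rotations plus ℓ SELECT-style steps, hence Θ(ℓ). Names are hypothetical.

```python
def rotate(u, h):
    """Stand-in for a kappa_j that rotates: slot k moves to slot k + h (mod ell)."""
    ell = len(u)
    return [u[(k - h) % ell] for k in range(ell)]


def naive_permute(u, pi):
    """Output slot pi[i] receives u[i]. Returns the result and the number of
    rotate-and-select rounds used (one per slot)."""
    ell = len(u)
    out = [0] * ell
    rounds = 0
    for i in range(ell):
        a_i = rotate(u, pi[i] - i)                  # sends slot i to slot pi[i]
        mask = [1 if k == pi[i] else 0 for k in range(ell)]
        out = [o + m * x for o, m, x in zip(out, mask, a_i)]   # keep only slot pi[i]
        rounds += 1
    return out, rounds
```

Usage: `naive_permute([10, 11, 12, 13, 14], [2, 0, 4, 1, 3])` returns `([11, 13, 10, 14, 12], 5)`. The round count grows linearly with ℓ, which is exactly the inefficiency noted above.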
From Shifts to Arbitrary Permutations
• Using Beneš/Waksman permutation networks: two back-to-back butterflies
  - Every exchange is controlled by a bit
  - Values are sent on either straight edges or cross edges
• Every permutation can be realized by an appropriate setting of the control bits
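A brute-force check of the last bullet on the smallest nontrivial case. A 4-input Beneš network, written as three columns of 2×2 exchanges at distances 2, 1, 2, has six control bits. The sketch tries all 64 settings and finds that they realize all 4! = 24 permutations. The stage layout and helper names are assumptions chosen for illustration.

```python
from itertools import product


def exchange_stage(u, distance, bits):
    """One column of switches pairing slots k and k + distance (k with the
    `distance` bit clear); bit 1 means cross, bit 0 means straight."""
    out = list(u)
    lows = [k for k in range(len(u)) if not k & distance]
    for k, bit in zip(lows, bits):
        if bit:
            out[k], out[k + distance] = u[k + distance], u[k]
    return out


def benes4_permutations():
    """All arrangements reachable by the 4-input network with stages 2, 1, 2."""
    reached = set()
    for bits in product((0, 1), repeat=6):
        u = [0, 1, 2, 3]
        for stage, d in enumerate((2, 1, 2)):
            u = exchange_stage(u, d, bits[2 * stage: 2 * stage + 2])
        reached.add(tuple(u))
    return reached
```

Usage: `len(benes4_permutations())` is 24. For larger ℓ the control bits are computed with the classical looping algorithm rather than by search.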
Realizing Permutation Networks
• Claim: every level of the Beneš network can be realized by two shifts and two SELECTs
• Example: exchanges at distance 2 on 0 1 2 3 4 5 6 7, with control bits 1 0 1 1 for the pairs (0,2), (1,3), (4,6), (5,7), give 2 1 0 3 6 7 4 5
Realizing Permutation Networks
• Input: $a_1$ = 0 1 2 3 4 5 6 7
• $a_2$ = shift(-2) of $a_1$ = 2 3 4 5 6 7 0 1
• $a_3$ = shift(2) of $a_1$ = 6 7 0 1 2 3 4 5
• $a_4$ = SELECT($a_1$, $a_2$) = 2 1 4 3 6 7 0 1
• Output: $a_5$ = SELECT($a_3$, $a_4$) = 2 1 0 3 6 7 4 5
Realizing Permutation Networks
• Proof of the claim: in every level, all the exchanges are between nodes at the same distance
  - Distance $2^i$ for some i
  - All these exchanges can be implemented using shift($2^i$), shift($-2^i$), and two SELECTs
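A plaintext sketch of the proof's construction: one network level built from exactly two rotations and two SELECTs. The shift convention follows the example, so shift(-d) pulls slot k + d into slot k. Because the masks only pick slots whose partner lies inside the array, the wrap-around of the rotations never leaks into the result. The intermediate arrays need not match any particular figure, and helper names are hypothetical.

```python
def shift(u, h):
    """Rotation: slot k of the result holds u[k - h] (indices mod len(u))."""
    n = len(u)
    return [u[(k - h) % n] for k in range(n)]


def select(u, v, mask):
    """SELECT: u where mask is 1, v where mask is 0 (u*mask + v*(1 - mask))."""
    return [a * m + b * (1 - m) for a, b, m in zip(u, v, mask)]


def benes_level(u, distance, control):
    """For each pair (k, k + distance), with the `distance` bit of k clear
    (distance a power of two), swap the pair if its control bit is 1."""
    n = len(u)
    lows = [k for k in range(n) if not k & distance]
    from_above = [0] * n   # slot k takes u[k + distance]
    from_below = [0] * n   # slot k takes u[k - distance]
    for k, bit in zip(lows, control):
        if bit:
            from_above[k] = 1
            from_below[k + distance] = 1
    a2 = shift(u, -distance)            # a2[k] = u[k + distance]
    a3 = shift(u, distance)             # a3[k] = u[k - distance]
    a4 = select(a2, u, from_above)      # first SELECT
    return select(a3, a4, from_below)   # second SELECT
```

Usage: `benes_level(list(range(8)), 2, [1, 0, 1, 1])` returns `[2, 1, 0, 3, 6, 7, 4, 5]`, the example output. Chaining one such call per level realizes the whole network.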
Realizing Permutation Networks
• Every level takes 2 shifts and 2 SELECTs
• There are 2log(ℓ) levels
• So any permutation on ℓ-arrays can be realized using 4log(ℓ) shifts and 4log(ℓ) SELECTs
• Some more complications when ℓ is not a power of two
  - But still only O(log ℓ) operations
Routing Values Between Levels
• Implementing ℓ-permute
  - Using $X \mapsto X^j$ to get simple shifts
  - Beneš network to get arbitrary permutations
  - Takes O(log ℓ) operations
• Cloning values from high fan-out gates, and permutations over $W \gg \ell$ elements
  - Both can be done in O(log W) operations
• Intra-level routing takes O(log W) work for a width-W level
Low Overhead HE
• Pack inputs into ℓ-arrays
  - ℓ can be made as large as …
• Route values to their place at the input to the next level
  - Takes O(W polylog W) work
• Apply SIMD operations to implement the level
• Total work for a size-t, width-… circuit is only $t \cdot \mathrm{polylog}(\lambda)$
So How Fast Is It?
• Not quite as slow as before
• Not blazing fast either…
  - Should be able to evaluate AES in time between 1 hour and 1 day
  - But can evaluate 20-100 AES blocks in this time, so the amortized per-block time is better
• Should be usable in niche applications within a year or two
Questions?
Handling Large Permutations
• Can we arbitrarily permute m × ℓ items, given in m arrays of size ℓ, using ℓ-ADD, ℓ-MULT, ℓ-PERMUTE?
• Theorem (Lev, Pippenger, Valiant ‘84): a permutation π over m × ℓ addresses (viewed as a rectangle) can be decomposed as $\pi = \pi_3 \circ \pi_2 \circ \pi_1$, where:
  - $\pi_1$ only permutes within the columns
  - $\pi_2$ only permutes within the rows
  - $\pi_3$ only permutes within the columns
• Within rows: use ℓ-PERMUTE on each row (array)
• Within columns: we swap elements with the same array index; this can be done using only ℓ-ADD and ℓ-MULT
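The decomposition can be computed with the classical Hall's-theorem argument, sketched below in Python. Assumptions: `grid[r][c]` holds an item, row r is an array, column c is a slot index, and `dest(item)` gives the item's final (row, column). Items currently in column c and headed for column c' form a regular bipartite multigraph. Peeling off m perfect matchings decides which row each item visits after π_1, so that every row then holds items with distinct target columns. This is not claimed to be how [GHS'12] computes it, and `perfect_matching` and `decompose` are hypothetical helpers.

```python
def perfect_matching(adj, n):
    """Kuhn's augmenting-path algorithm. adj[u] lists (v, item) edges from left
    vertex u to right vertex v (parallel edges allowed). Returns {u: (v, item)}."""
    match_right = {}  # right vertex v -> (u, item)

    def try_augment(u, seen):
        for v, item in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_right or try_augment(match_right[v][0], seen):
                match_right[v] = (u, item)
                return True
        return False

    for u in range(n):
        if not try_augment(u, set()):
            raise ValueError("no perfect matching")
    return {u: (v, item) for v, (u, item) in match_right.items()}


def decompose(grid, dest):
    """Column-row-column decomposition of the permutation that sends each item
    to dest(item). Returns the grids after pi_1, pi_2 and pi_3."""
    m, ell = len(grid), len(grid[0])
    # One edge per item: from its current column to its destination column.
    adj = [[(dest(grid[r][c])[1], grid[r][c]) for r in range(m)] for c in range(ell)]
    g1 = [[None] * ell for _ in range(m)]
    for k in range(m):
        # The remaining multigraph is regular, so a perfect matching exists (Hall).
        for c, (c2, item) in perfect_matching(adj, ell).items():
            g1[k][c] = item                 # pi_1: item moves to row k within column c
            adj[c].remove((c2, item))
    g2 = [[None] * ell for _ in range(m)]
    for k in range(m):
        for item in g1[k]:
            g2[k][dest(item)[1]] = item     # pi_2: within row k, to its target column
    g3 = [[None] * ell for _ in range(m)]
    for k in range(m):
        for item in g2[k]:
            r2, c2 = dest(item)
            g3[r2][c2] = item               # pi_3: within column c2, to its target row
    return g1, g2, g3
```

Usage: for the 20-item example with m = 4 arrays of ℓ = 5 slots, item v ends at `divmod(v - 1, 5)`. Only the final grid is forced, and the intermediate grids may differ from the slides' because the matchings are not unique.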
Decomposing Permutations
• Example: 20 items in m = 4 arrays of ℓ = 5 slots. Each group below lists the 4 items that share one array index, so $\pi_1$ and $\pi_3$ rearrange items within a group and $\pi_2$ rearranges them within an array (one position per group)
  - Start: 13 15 7 2 / 18 3 8 17 / 14 4 9 1 / 16 5 20 11 / 12 6 19 10
  - After $\pi_1$: 13 15 7 2 / 17 3 8 18 / 14 4 9 1 / 5 16 11 20 / 6 12 10 19
  - After $\pi_2$: 6 16 11 1 / 17 12 7 2 / 13 3 8 18 / 14 4 9 19 / 5 15 10 20
  - After $\pi_3$ (target): 1 6 11 16 / 2 7 12 17 / 3 8 13 18 / 4 9 14 19 / 5 10 15 20