Recent Advances in Homomorphic Encryption

Shai Halevi – IBM Research February 13, 2012

Computing on Encrypted Data
• Wouldn’t it be nice to be able to…
  o Encrypt my data before sending it to the cloud
  o While still allowing the cloud to search/sort/edit/… this data on my behalf
  o Keeping the data in encrypted form
    • Without shipping it back and forth to be decrypted

Computing on Encrypted Data
• Wouldn’t it be nice to be able to…
  o Encrypt my queries to the cloud
  o While still allowing the cloud to process them
  o Cloud returns encrypted answers
    • that I can decrypt

Computing on Encrypted Data
[Figure: sample ciphertexts shown as random-looking strings such as "$skj#hS28ksytA@ …"; this is all the cloud sees of the data and the queries.]

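Privacy Homomorphisms
• Rivest-Adleman-Dertouzos 1978
  [Diagram: a plaintext space with operation $\#$ and a ciphertext space $C$ with operation $*$. Encryption gives $c_i \leftarrow \mathrm{Enc}(x_i)$; computing $d = c_1 * c_2$ on ciphertexts and decrypting gives $y \leftarrow \mathrm{Dec}(d)$ with $y = x_1 \# x_2$.]
• Example: RSA-encrypt$_{(e,N)}(x) = x^e \bmod N$
  o $x_1^e \times x_2^e = (x_1 \times x_2)^e \bmod N$
• “Somewhat Homomorphic” (SWHE): can compute some functions on encrypted data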
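The RSA example can be checked numerically. The following is a minimal sketch with toy parameters (the primes, exponent, and function names are illustrative assumptions, and textbook RSA at this size is not secure); it only shows that multiplying ciphertexts multiplies the underlying plaintexts.

```python
# Toy textbook RSA (assumed parameters, insecure; for illustration only).
p, q = 61, 53
N = p * q                              # public modulus
e = 17                                 # public exponent
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (Python 3.8+)

def enc(x):
    """RSA-encrypt: x^e mod N."""
    return pow(x, e, N)

def dec(c):
    """RSA-decrypt: c^d mod N."""
    return pow(c, d, N)

x1, x2 = 7, 11
c_prod = (enc(x1) * enc(x2)) % N       # operate on ciphertexts only
assert dec(c_prod) == (x1 * x2) % N    # decrypts to the product of the plaintexts
```

Only multiplication is supported this way, which is why such a scheme is not yet fully homomorphic.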

“Fully Homomorphic” Encryption (FHE)
• Encryption for which we can compute arbitrary functions on the encrypted data
  [Diagram: $\mathrm{Enc}(x)$ and $f$ go into Eval, which outputs $\mathrm{Enc}(f(x))$; the output size is independent of $f$’s complexity.]
• Enough to do $\mathrm{Enc}(x_1 \times x_2)$, $\mathrm{Enc}(x_1 + x_2)$
  o Every function can be expressed as a polynomial
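To see why addition and multiplication suffice, here is a small sketch over bits (arithmetic mod 2), where ADD is XOR and MULT is AND. The helper names `NOT`, `OR`, and `MUX` are illustrative and not from the talk; each is written as a polynomial in its inputs.

```python
from itertools import product

def ADD(a, b):
    """Addition mod 2 (XOR)."""
    return (a + b) % 2

def MULT(a, b):
    """Multiplication mod 2 (AND)."""
    return (a * b) % 2

def NOT(a):
    """1 + a"""
    return ADD(a, 1)

def OR(a, b):
    """a + b + a*b"""
    return ADD(ADD(a, b), MULT(a, b))

def MUX(s, a, b):
    """s*a + (1+s)*b: returns a when s == 1, else b."""
    return ADD(MULT(s, a), MULT(NOT(s), b))

# Check every polynomial against the Boolean function it is meant to compute.
for a, b in product((0, 1), repeat=2):
    assert OR(a, b) == (a | b)
    assert NOT(a) == 1 - a
for s, a, b in product((0, 1), repeat=3):
    assert MUX(s, a, b) == (a if s else b)
```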

How To Do It?

• Open for >30 years
• First plausible construction in [Gentry’09]
  o SWHE with security based on hard problems in (ideal) integer lattices
  o A transformation SWHE → FHE
    • Requires SWHE that can evaluate its own decryption circuit
• Several other constructions since then

A Taste of Gentry’s SWHE Scheme
• Public key is two integers $p, r$
• Secret key is one integer $w$
  o (Implicit encoding of matrix, vector)
• $\mathrm{Enc}_{p,r}(b \in \{0,1\})$: choose high-degree polynomial with small coefficients (call it $Q$), output $c = 2Q(r) + b \bmod p$
• $\mathrm{Dec}_w(c)$: output $(c \cdot w \bmod p) \bmod 2$
• Addition, multiplication of ciphertexts modulo $p$ (main catch: integers are huge…)

Performance
• A little slow…
• First working implementation in [GH11]
  o ½-hour to compute a single gate
    • Because of homomorphic decryption
  o 13-14 orders of magnitude slowdown vs. computing on non-encrypted data
• The underlying SWHE is faster
  o Can evaluate polynomials of degree up to ~200
  o About ½-second for a single gate

Performance
• A little slow…
• Butler Lampson: “I don’t think we’ll see anyone using Gentry’s solution in our lifetimes […]” – Forbes, Dec-19, 2011
• Rest of this talk: reasons to believe otherwise

Why is [G’09] So Slow?
• Ciphertext of SWHE is large
  o Ciphertext has dimension $D$, bitsize $N$ (implicit in $p, r, w$)
    • Ciphertext size $|c| \sim D \cdot N$
    • $D > N \cdot \lambda$ to get security
  o Ciphertext contains $n$-bit “noise” (initial noise, say $n_0 = \log(\lambda)$)
    • $N > n$ to be able to decrypt
  o $\mathrm{Mult}(c_1, c_2)$ adds the noise bits in the $c_i$’s
    • e.g., from $n$ bits each to $2n$ bits in the product
  o Degree-$k$ polynomial has $\geq k n_0$-bit noise
• Need $N > k n_0$, $D > k n_0 \cdot \lambda$, $|c| > \Omega(\lambda k^2)$

Why is [G’09] So Slow?
• Ciphertext of SWHE has size $|c| \geq \Omega(\lambda k^2)$
• SWHE → FHE relies on homomorphic evaluation of the decryption function
  o Of degree $\Omega(\lambda)$, so $|c| \geq \Omega(\lambda^3)$
  → Decryption takes $\geq \Omega(\lambda^3)$ operations
  → Homomorphic evaluation of each operation on two ciphertexts takes time $\geq \Omega(\lambda^3)$
  → Total complexity $\geq \Omega(\lambda^6)$
  o Can be improved to $\tilde{O}(\lambda^{3.5})$
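For reference, the chain of bounds on these two slides can be written out in one place. This is only a restatement of the slide's own inequalities, with $\lambda$ the security parameter, $k$ the degree to be supported, $n_0$ the initial noise bitsize, and constants and logarithmic factors ignored.

```latex
\begin{align*}
|c| &\sim D \cdot N, \qquad N > k\,n_0, \qquad D > N\lambda > k\,n_0\,\lambda \\
|c| &> (k\,n_0\,\lambda)(k\,n_0) = \lambda\,k^2\,n_0^2 = \Omega(\lambda k^2) \\
k = \Omega(\lambda) \;&\Rightarrow\; |c| \ge \Omega(\lambda^3) \\
\text{total cost} &\ge
  \underbrace{\Omega(\lambda^3)}_{\text{operations in decryption}} \times
  \underbrace{\Omega(\lambda^3)}_{\text{time per homomorphic operation}}
  = \Omega(\lambda^6)
\end{align*}
```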

Faster Homomorphic Evaluation
• Brakerski-Vaikuntanathan’11 and Brakerski-Gentry-Vaikuntanathan’12
  o From $\tilde{O}(\lambda^{3.5})$ to $\tilde{O}(\lambda)$
• Smart-Vercauteren’11
  o SIMD operations, batching multiple ops at a time
• Gentry-Halevi-Smart’12
  o From SIMD to general-purpose computing
  o Evaluating $t$-gate circuit in time $t \cdot \mathrm{polylog}(\lambda)$
    • Assuming average width $\Omega(\lambda)$

A Taste of [BV11, BGV12]
• Secret key is a vector $s$
• Ciphertext is a vector $c$
• $\mathrm{Dec}_s(c)$: output $(\langle c, s \rangle \bmod q) \bmod 2$
  o $q$ is a parameter
• Addition is vector-addition modulo $q$
• Multiplication is tensor-product mod $q$, followed by “dimension reduction”
• Parameter $q$ evolves during computation
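The decryption rule, vector addition, and tensor-product multiplication can be tried on a toy secret-key scheme. This is a sketch under stated assumptions and not the [BV11, BGV12] construction itself: the sizes `q`, `n`, `B` are made up, the noise is kept non-negative so that a plain `mod q` suffices, and the "dimension reduction" step and the evolving modulus are omitted, so a product ciphertext is decrypted under the tensored key instead.

```python
import random

q = 1_000_003   # odd modulus (toy size, assumed)
n = 8           # length of the random part of the key (assumed)
B = 10          # noise bound (assumed); keeps all values far below q

def keygen():
    """Secret key s = (1, t) with t uniform in Z_q^n."""
    return [1] + [random.randrange(q) for _ in range(n)]

def encrypt(s, b):
    """Ciphertext c with <c, s> = b + 2e (mod q) for small noise e."""
    a = [random.randrange(q) for _ in range(n)]
    e = random.randrange(B + 1)
    c0 = (b + 2 * e - sum(ai * ti for ai, ti in zip(a, s[1:]))) % q
    return [c0] + a

def decrypt(s, c):
    """Dec_s(c) = (<c, s> mod q) mod 2."""
    return (sum(ci * si for ci, si in zip(c, s)) % q) % 2

def add(c1, c2):
    """Homomorphic addition: vector addition modulo q."""
    return [(x + y) % q for x, y in zip(c1, c2)]

def tensor(u, v):
    """Tensor product modulo q, flattened to a vector of length len(u)*len(v)."""
    return [(x * y) % q for x in u for y in v]

def mult(c1, c2):
    """Homomorphic multiplication; the result decrypts under tensor(s, s)."""
    return tensor(c1, c2)
```

With `s = keygen()`, `decrypt(s, add(c1, c2))` returns the XOR of the two encrypted bits and `decrypt(tensor(s, s), mult(c1, c2))` returns their AND. The squared ciphertext length is the reason a dimension-reduction step is needed in a real scheme.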

The [BV11, BGV12] Techniques
• A better multiplication technique
  o Decrease bitsize rather than increase noise
• $c_1, c_2$ have bitsize $N$, noise bitsize $n$
  → $\mathrm{Mult}(c_1, c_2)$ has bitsize $N - n$, noise bitsize $n$
  o Need $N_0 > (k+1)n$ to handle depth $k$
• This gives degree $2^k$
• Decryption has depth $O(\log \lambda)$
  o $N_0 = \mathrm{polylog}(\lambda)$, $D = \tilde{O}(\lambda)$, $|c| = \tilde{O}(\lambda)$

Exploiting Parallelism
• SIMD: Working on Data Arrays
  o ℓ-ADD on arrays of length ℓ:
       2  1  9  2  4  4  5  0  7  3  6  …  1  2
    +  8  2  0  6  1  5  9  3  8  0  1  …  4  4
    = 10  3  9  8  5  9 14  3 15  3  7  …  5  6

Exploiting Parallelism
• SIMD: Working on Data Arrays
  o ℓ-MULT on arrays of length ℓ:
       2  1  9  2  4  4  5  0  7  3  6  …  1  2
    ×  8  2  0  6  1  5  9  3  8  0  1  …  4  4
    = 16  2  0 12  4 20 45  0 56  0  6  …  4  8
• What’s the point?
  o Efficiency: ℓ-fold parallelism
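In plain (unencrypted) terms, the two SIMD operations are just element-wise arithmetic on equal-length arrays. A minimal sketch, using the leading sample values from the slide and illustrative function names:

```python
def l_add(a, b):
    """ℓ-ADD: slot-wise addition of two length-ℓ arrays."""
    assert len(a) == len(b)
    return [x + y for x, y in zip(a, b)]

def l_mult(a, b):
    """ℓ-MULT: slot-wise multiplication of two length-ℓ arrays."""
    assert len(a) == len(b)
    return [x * y for x, y in zip(a, b)]

a = [2, 1, 9, 2, 4, 4, 5, 0, 7, 3, 6]
b = [8, 2, 0, 6, 1, 5, 9, 3, 8, 0, 1]

sums = l_add(a, b)        # one ℓ-ADD handles all slots at once
products = l_mult(a, b)   # likewise for ℓ-MULT
```

In the homomorphic setting each call corresponds to a single ciphertext operation, which is where the ℓ-fold saving comes from.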

Plaintext Algebra
• Some FHE variants use polynomial rings
  o Native plaintext space is $R_2 = Z_2[X]/\Phi_m(X)$
  o Binary polynomials modulo $\Phi_m(X)$
    • $\Phi_m(X)$ is the $m$’th cyclotomic polynomial
  o Dimension is $D = \phi(m) \approx m$
• $\Phi_m(X)$ is irreducible over $Z$, but not mod 2
  o $\Phi_m(X) = \prod_{j=1}^{\ell} F_j(X) \pmod 2$
    • The $F_j$’s are irreducible, all have the same degree
  o For some $m$’s we can get $\ell = \Omega(D/\log D)$

Plaintext Slots
• Plaintext element $a \in R_2$ encodes $\ell$ values
  o $a \cong (\alpha_j)_{j=1}^{\ell}$, $\alpha_j = (a \bmod F_j)$
  o Polynomial Chinese Remainders
• Each $\alpha_j$ can be a bit
• Ops $\times, +$ work independently on the slots
  o $a \times a' \cong (\alpha_j \times \alpha'_j)_j$, $a + a' \cong (\alpha_j + \alpha'_j)_j$
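The slot structure can be checked on a small case. The sketch below assumes $m = 31$ (a choice made here for illustration, not taken from the talk), for which $\Phi_{31}(X) = 1 + X + \dots + X^{30}$ splits mod 2 into the six irreducible polynomials of degree 5, giving $\ell = 6$ slots. Binary polynomials are stored as Python integers, with bit $i$ holding the coefficient of $X^i$.

```python
def pmul(a, b):
    """Multiply two binary polynomials (carry-less multiplication)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    """Remainder of binary polynomial a modulo f."""
    while a.bit_length() >= f.bit_length():
        a ^= f << (a.bit_length() - f.bit_length())
    return a

PHI = (1 << 31) - 1                     # Phi_31(X) = 1 + X + ... + X^30
FACTORS = [0b100101, 0b101001, 0b101111,
           0b110111, 0b111011, 0b111101]  # the F_j: irreducible quintics mod 2

def slots(a):
    """Polynomial CRT: alpha_j = a mod F_j for every slot j."""
    return [pmod(a, f) for f in FACTORS]

def ring_add(a, b):
    """Addition in R_2 (coefficient-wise XOR)."""
    return a ^ b

def ring_mul(a, b):
    """Multiplication in R_2 = Z_2[X]/Phi_31(X)."""
    return pmod(pmul(a, b), PHI)

a, b = 0b1011001110001111010101, 0b110010101111000011
# One ring operation acts on all six slots independently.
assert slots(ring_add(a, b)) == [x ^ y for x, y in zip(slots(a), slots(b))]
assert slots(ring_mul(a, b)) == [pmod(pmul(x, y), f)
                                 for x, y, f in zip(slots(a), slots(b), FACTORS)]
```

Here the slots hold arbitrary elements of the degree-5 extension field; restricting each slot to the values 0 and 1 gives the bit-slots used in the rest of the talk.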

Homomorphic SIMD [SV’11]
• Computing same function on ℓ inputs at the price of one computation
  o Pack the inputs into the slots
    • Bit-slice, inputs to j’th instance go in j’th slots
  o Compute the function once
  o After decryption, decode the ℓ output bits from the output plaintext polynomial

Aside: an ℓ-SELECT Operation
    (x1 x2  x3  x4 x5  x6 x7 ) × (1 0 0 1 0 1 0) = (x1 0  0   x4 0   x6 0  )
    (x8 x9  x10 x11 x12 x13 x14) × (0 1 1 0 1 0 1) = (0  x9 x10 0  x12 0  x14)
    sum of the two products                       = (x1 x9 x10 x4 x12 x6 x14)
• We will use this later
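The SELECT operation is built from one mask, two slot-wise multiplications, and one slot-wise addition. A minimal plain-data sketch (the function name and the use of small integers in place of the symbolic x1 … x14 are illustrative assumptions):

```python
def l_select(mask, x, y):
    """Slot i of the result is x[i] where mask[i] == 1 and y[i] where mask[i] == 0.

    Computed as mask*x + (1 - mask)*y using only slot-wise MULT and ADD;
    over bits, 1 - mask is the same as 1 + mask.
    """
    return [m * xi + (1 - m) * yi for m, xi, yi in zip(mask, x, y)]

x = [1, 2, 3, 4, 5, 6, 7]           # stands for x1 .. x7
y = [8, 9, 10, 11, 12, 13, 14]      # stands for x8 .. x14
mask = [1, 0, 0, 1, 0, 1, 0]

selected = l_select(mask, x, y)     # picks x1, x9, x10, x4, x12, x6, x14
```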

Low-Overhead HE [GHS’12]
• Start from the [BGV’12] cryptosystem
  o Overhead $\tilde{O}(\lambda)$
• Apply the [SV’11] SIMD trick
  o Overhead $\mathrm{polylog}(\lambda)$ when computing the same function $f$ on ℓ inputs
• Extend to computing a single instance of $f$
  o As long as the circuit of $f$ has width $\Omega(\lambda)$
  o Use internal parallelism inside this one circuit

So you want to compute some function…
[Figure: a circuit over the input bits x1, …, x26 and the constants 0 and 1.]
• ADD and MULT are a complete set of operations.

So you want to compute some function… Using SIMD…
[Figure: the same circuit, with its input bits packed into arrays.]
• ℓ-ADD and ℓ-MULT are not a complete set of operations!
  o … unless, of course, we use ℓ = 1…

Routing Values Between Levels
• We need to map this
    (x1  x2  x3  x4  x5  0 x7  x8  x9  x10 x11 x12)
    (x15 x16 x17 x18 x19 1 x21 x22 x23 x24 x25 x26)
• Into that … so we can use ℓ-add
  [Figure: the same values rearranged into the slot positions expected by the next circuit level.]

ℓ-ADD, ℓ-MULT, ℓ-PERMUTE: a complete set of SIMD ops
• ℓ-PERMUTE(π), then ℓ-MULT by a 0/1 mask, applied to the array (x1 x2 x3 x4 x5 0 x7); * marks a don’t-care slot:
    ℓ-PERMUTE gives (x1 x3 x5 * * * *); × (1 1 1 0 0 0 0) = (x1 x3 x5 0 0 0 0)
    ℓ-PERMUTE gives (x2 x4 * * * * *); × (1 1 0 0 0 0 0) = (x2 x4 0 0 0 0 0)

ℓ-ADD, ℓ-MULT, ℓ-PERMUTE: a complete set of SIMD ops
• ℓ-ADD of the two masked arrays:
    (x1 x3 x5 0 0 0 0) + (x2 x4 0 0 0 0 0) = (x1+x2  x3+x4  x5  0 0 0 0)

ℓ-ADD, ℓ-MULT, ℓ-PERMUTE: a complete set of SIMD ops
[Figure: the circuit over the input bits x1, …, x26 and the constants 0 and 1.]
• Use ℓ-PERMUTE for routing between circuit levels
  o Not quite obvious

Routing Values Between Levels
1. How to implement ℓ-permute?
   o $a \in R_2$ encodes an ℓ-array using polynomial-CRT
   o We are given an encryption of $a$
2. Fan-out: need to clone values from high fan-out gates before routing to next level
3. Big permutation: For a width-$W$ level, we need a permutation over $2W$ values
   o Implemented using ℓ-permute on ℓ-arrays
   o Even when $W \gg \ell$

Implementing ℓ-Permute
• Recall: native plaintext is a binary polynomial modulo $\Phi_m(X)$, $a \in R_2 = Z_2[X]/\Phi_m(X)$
  o $a \cong (\alpha_1, \ldots, \alpha_\ell)$, $\alpha_j = (a \bmod F_j)$
  o $a + a' \cong (\alpha_1 + \alpha'_1, \ldots, \alpha_\ell + \alpha'_\ell)$
  o $a \times a' \cong (\alpha_1 \times \alpha'_1, \ldots, \alpha_\ell \times \alpha'_\ell)$
• Is there a natural operation on polynomials that moves values between slots?

Consequences of Galois Theory
• Assuming that slots contain single bits: for every $j \in Z_m^*$ the mapping $\kappa_j(a(X)) = a(X^j) \bmod \Phi_m(X)$ induces a permutation on the slots
  o permutation = identity iff $j$ is a power of two
  o The $\kappa_j$’s form a group under composition
  o For every pair $(i_1, i_2)$, there exists $j$ such that $\kappa_j$ sends the contents of slot $i_1$ to slot $i_2$

An Illustrative Special Case
• For some $\Phi_m$’s, the $\kappa_j$’s induce circular shifts on the ℓ-array of slots:
  o If $a \cong (\alpha_1, \alpha_2, \ldots, \alpha_\ell)$ then $\kappa_j(a) \cong (\alpha_2, \ldots, \alpha_\ell, \alpha_1)$
    • for some $j \in Z_m^*$
  o It also means that for every $j' = j^h \bmod m$ (applying $\kappa_j$ $h$ times), $\kappa_{j'}(a) \cong (\alpha_{h+1}, \ldots, \alpha_\ell, \alpha_1, \ldots, \alpha_h)$
• For these $\Phi_m$’s, all the $\kappa_j$’s induce rotations
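The effect of the maps $\kappa_j$ on bit-valued slots can be observed directly on plaintext polynomials. The sketch below again assumes $m = 31$ with $\ell = 6$ slots (an illustrative choice, as are all function names); binary polynomials are Python integers with bit $i$ holding the coefficient of $X^i$. Which slot a given $\kappa_j$ moves a value to depends on how the factors are ordered, so the demo prints the result instead of assuming it.

```python
def pmul(a, b):
    """Multiply two binary polynomials (carry-less multiplication)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    """Remainder of binary polynomial a modulo f."""
    while a.bit_length() >= f.bit_length():
        a ^= f << (a.bit_length() - f.bit_length())
    return a

M = 31
PHI = (1 << M) - 1                        # Phi_31(X) = 1 + X + ... + X^30
FACTORS = [0b100101, 0b101001, 0b101111,
           0b110111, 0b111011, 0b111101]  # irreducible quintics mod 2

def crt_basis():
    """Polynomials e_j with e_j = 1 mod F_j and e_j = 0 mod every other factor."""
    basis = []
    for f in FACTORS:
        g = 1
        for other in FACTORS:
            if other != f:
                g = pmul(g, other)        # g = PHI / F_j
        h, inv = pmod(g, f), 1
        for _ in range(30):               # h^30 = h^(-1) in the field GF(2^5)
            inv = pmod(pmul(inv, h), f)
        basis.append(pmod(pmul(g, inv), PHI))
    return basis

BASIS = crt_basis()

def encode(bits):
    """Pack one bit per slot into a single plaintext polynomial."""
    a = 0
    for bit, e in zip(bits, BASIS):
        if bit:
            a ^= e
    return a

def decode(a):
    """Read the slots back: alpha_j = a mod F_j."""
    return [pmod(a, f) for f in FACTORS]

def kappa(j, a):
    """a(X) -> a(X^j) mod Phi_m(X): the coefficient of X^i moves to X^(i*j mod m)."""
    out = 0
    for i in range(a.bit_length()):
        if (a >> i) & 1:
            out ^= 1 << ((i * j) % M)
    return pmod(out, PHI)

if __name__ == "__main__":
    a = encode([1, 0, 0, 0, 0, 0])
    for j in (2, 3, 9):
        print(j, decode(kappa(j, a)))     # j = 2 leaves the slots unchanged
```

Applying `kappa(3, ·)` repeatedly walks the single 1 through all six slots, and $h$ applications agree with `kappa(pow(3, h, 31), ·)`, which is the composition rule behind the rotations.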

Applying the $\kappa_j$’s homomorphically
• Roughly, applying $\kappa_j$ to the ciphertext $c = \mathrm{Enc}(a)$ yields an encryption of $\kappa_j(a)$
  o With respect to a different secret key
    • $\kappa_j(s)$ rather than $s$
  o But this can be fixed
• So we can implement circular shifts
• But we need arbitrary permutations
  o In order to do intra-circuit routing

From Shifts to Arbitrary Permutations
• A naïve solution: $\pi$ is the permutation that we need to implement
  o For every $i$, let $j(i)$ be an index such that the permutation of $\kappa_{j(i)}$ sends $i$ to $\pi(i)$
  o Let $a_i = \kappa_{j(i)}(a)$
  o Use a big SELECT on all the $a_i$’s
    • Pick the slot $\pi(i)$ from $a_i$
• Implements $\pi$, but takes $\Theta(\ell)$ ops
  o Inefficient, we might as well not use SIMD
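The naïve approach is easy to simulate on plain arrays, which makes its $\Theta(\ell)$ cost visible: one shift and one masked selection per slot. In this sketch, `rotate` stands in for a homomorphic circular shift, and the convention that `pi[i]` is the destination of slot `i` is an assumption made here.

```python
def rotate(a, k):
    """Circular shift: slot t of the result holds a[(t + k) mod n]."""
    n = len(a)
    return [a[(t + k) % n] for t in range(n)]

def naive_permute(a, pi):
    """Move the value in slot i to slot pi[i], using n shifts and n masked picks."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        shifted = rotate(a, i - pi[i])                     # brings slot i to slot pi[i]
        mask = [1 if t == pi[i] else 0 for t in range(n)]  # keep only that slot
        out = [o + m * s for o, m, s in zip(out, mask, shifted)]
    return out

values = [10, 11, 12, 13]
pi = [2, 0, 3, 1]
permuted = naive_permute(values, pi)    # 10 goes to slot 2, 11 to slot 0, ...
```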

From Shifts to Arbitrary Permutations
• Using Beneš/Waksman Permutation Networks:
  o Two back-to-back butterflies
  o Every exchange is controlled by a bit
  o Values sent on either straight edges or cross edges
• Every permutation can be realized by appropriate setting of the control bits

Realizing Permutation Networks
Claim: every level of the Benes network can be realized by two shifts and two SELECTs
Example:
    input:        0 1 2 3 4 5 6 7
    control bits: 1 0 1 1
    output:       2 1 0 3 6 7 4 5

Realizing Permutation Networks
[Figure: the input array 0 1 2 3 4 5 6 7 shown next to its two shifted copies, shift(-2) and shift(2), and the output array 2 1 0 3 6 7 4 5.]

Realizing Permutation Networks
[Figure: the same example, with the output 2 1 0 3 6 7 4 5 assembled from the input, shift(-2), and shift(2) arrays using SELECT(a1,a2) followed by SELECT(a3,a4).]

Realizing Permutation Networks
Claim: every level of the Benes network can be realized by two shifts and two SELECTs
Proof: In every level, all the exchanges are between nodes at the same distance
  o Distance $2^i$ for some $i$
Can implement all these exchanges using shift($2^i$), shift($-2^i$), and two SELECTs
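The claim can be replayed on the example with input 0 … 7, distance 2, and control bits 1 0 1 1. The sketch below works on plain arrays; the sign convention of `shift` and the order in which control bits are assigned to exchanges (lower node first, left to right) are assumptions made here, not taken from the slides.

```python
def shift(a, k):
    """Circular shift: slot i of the result holds a[(i + k) mod n]."""
    n = len(a)
    return [a[(i + k) % n] for i in range(n)]

def select(mask, x, y):
    """Slot-wise choice: x[i] where mask[i] == 1, otherwise y[i]."""
    return [m * xi + (1 - m) * yi for m, xi, yi in zip(mask, x, y)]

def benes_level(a, dist, control_bits):
    """Apply one network level whose exchanges pair slots i and i + dist.

    Uses exactly two shifts and two SELECTs, whatever the control bits are.
    Assumes len(a) is a multiple of 2 * dist.
    """
    n = len(a)
    lower = [i for i in range(n) if (i // dist) % 2 == 0]  # lower node of each pair
    take_up = [0] * n     # slots that receive the value from dist positions ahead
    take_down = [0] * n   # slots that receive the value from dist positions behind
    for i, bit in zip(lower, control_bits):
        if bit:           # control bit 1 means "cross", 0 means "straight"
            take_up[i] = 1
            take_down[i + dist] = 1
    partial = select(take_up, shift(a, dist), a)
    return select(take_down, shift(a, -dist), partial)

level_output = benes_level(list(range(8)), 2, [1, 0, 1, 1])
```

Running all $2\log(\ell)$ levels with suitable control bits would realize a full permutation; computing those bits (the routing step) is not shown here.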

Realizing Permutation Networks
• Every level takes 2 shifts and 2 SELECTs
• There are $2\log(\ell)$ levels
  → Any permutation on ℓ-arrays can be realized using $4\log(\ell)$ shifts and $4\log(\ell)$ SELECTs
• Some more complications when ℓ is not a power of two
  o But still only $O(\log \ell)$ operations

Routing Values Between Levels
• Implementing ℓ-permute
  o Using $X \mapsto X^j$ to get simple shifts
  o Benes network to get arbitrary permutation
    • Takes $O(\log \ell)$ operations
• Cloning values from high fan-out gates
• Permutations over $W \gg \ell$ elements
  o Both can be done in $O(\log W)$ operations
  → Intra-level routing takes $O(\log W)$ work
  o For a width-$W$ level

Low Overhead HE
• Pack inputs into ℓ-arrays
  o ℓ can be made as large as $\Omega(D/\log D)$
• Route values to their place at the input to next level
  o Takes $O(W\,\mathrm{polylog}\,W)$ work
• Apply SIMD operation to implement level
• Total work for a size-$t$, width-$\Omega(\lambda)$ circuit is only $t \cdot \mathrm{polylog}(\lambda)$

So How Fast Is It?

• Not quite as slow as before
• Not blazing fast either…
  o Should be able to evaluate AES in time between 1 hour and 1 day
  o But can evaluate 20-100 AES blocks in this time, so amortized per-block time is better
• Should be usable in niche applications within a year or two

Questions?

Handling Large Permutations
• Can we arbitrarily permute $m \times \ell$ items, given in $m$ arrays of size ℓ, using ℓ-ADD, ℓ-MULT, ℓ-PERMUTE?
• Theorem (Lev, Pippenger, Valiant ‘84): A permutation $\pi$ over $m \times \ell$ addresses (viewed as a rectangle) can be decomposed as $\pi = \pi_3 \circ \pi_2 \circ \pi_1$, where:
  o $\pi_1$ only permutes within the columns
  o $\pi_2$ only permutes within the rows
  o $\pi_3$ only permutes within the columns
• Within rows: Use ℓ-PERMUTE on each row (array).
• Within columns: We swap elements with same array index. Can do this using only ℓ-ADD and ℓ-MULT.
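A swap of same-index elements between two arrays needs no permutation at all. The following sketch works on bit arrays (arithmetic mod 2) and uses only slot-wise addition and multiplication; the mask marking which columns to swap and the function names are illustrative assumptions.

```python
def l_add(a, b):
    """Slot-wise addition mod 2."""
    return [(x + y) % 2 for x, y in zip(a, b)]

def l_mult(a, b):
    """Slot-wise multiplication mod 2."""
    return [(x * y) % 2 for x, y in zip(a, b)]

def swap_columns(row_a, row_b, mask):
    """Exchange row_a[i] and row_b[i] exactly where mask[i] == 1.

    diff holds a[i] + b[i] in the masked columns and 0 elsewhere; adding it
    to either row flips that row's bit into the other row's bit there.
    """
    diff = l_mult(mask, l_add(row_a, row_b))
    return l_add(row_a, diff), l_add(row_b, diff)

row_a = [1, 0, 1, 1]
row_b = [0, 0, 1, 0]
new_a, new_b = swap_columns(row_a, row_b, [1, 1, 0, 1])
```

A general permutation within columns over $m$ rows can then be built from such pairwise swaps, for example by arranging them in a permutation network across the rows.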