Transcript slides

HYDRA: A Flexible PQC Processor
Chen-Mou Cheng
National Taiwan University
November 16, 2012
Acknowledgment
• Joint work with Bo-Yin Yang (Academia Sinica)
and Andy Wu
Post-quantum cryptography
• Hash-based cryptography
• Code-based cryptography
• Lattice-based cryptography
• Multivariate cryptography
Multivariate cryptography
• Composition of maps
• Public quadratic polynomials
• F1 and Fk are affine (y = Ax + b)
Step 2. Encryption p ――――→ E ――――→ c
easy↑ ↓hard
Step 1. Generation p → F1 → F2 … → Fk → c
↓easy ↓easy easy↓
Step 3. Decryption p ← D1 ← D2 … ← Dk ← c
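The three steps can be illustrated with a toy trapdoor. This is a minimal sketch, not a scheme from the talk: the field size, the two affine maps (A1, B1) and (A2, B2), and the triangular central map are hypothetical choices made only so that every layer is easy to invert, and the result has no security whatsoever.

```python
P = 31  # small prime field, chosen only for illustration

# Hypothetical toy key: two invertible affine maps around a quadratic central map.
A1, B1 = [[1, 2], [3, 4]], [5, 6]
A2, B2 = [[2, 5], [1, 3]], [7, 8]

def affine(A, b, x):
    """y = Ax + b over GF(P), for 2x2 matrices."""
    return [(sum(A[i][j] * x[j] for j in range(2)) + b[i]) % P for i in range(2)]

def affine_inverse(A, b, y):
    """Solve y = Ax + b for x, using the 2x2 adjugate formula."""
    det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % P
    det_inv = pow(det, P - 2, P)  # Fermat inverse, valid because P is prime
    A_inv = [[A[1][1] * det_inv % P, -A[0][1] * det_inv % P],
             [-A[1][0] * det_inv % P, A[0][0] * det_inv % P]]
    d = [(y[i] - b[i]) % P for i in range(2)]
    return [sum(A_inv[i][j] * d[j] for j in range(2)) % P for i in range(2)]

def central(x):
    """Triangular quadratic map: easy to invert one coordinate at a time."""
    return [x[0], (x[1] + x[0] * x[0]) % P]

def central_inverse(y):
    return [y[0], (y[1] - y[0] * y[0]) % P]

def encrypt(p):
    """Public direction: the composition F3(F2(F1(p)))."""
    return affine(A2, B2, central(affine(A1, B1, p)))

def decrypt(c):
    """Private direction: undo each layer in reverse order."""
    return affine_inverse(A1, B1, central_inverse(affine_inverse(A2, B2, c)))
```

In a real scheme the public key is the expanded set of quadratic polynomials for the composition, so the individual layers stay hidden; the sketch keeps them separate only to show why decryption is easy for the key owner.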
Classification of multivariates
• Big-field multivariates
– Matsumoto-Imai derivatives
– SFLASH, HFE
• Small-field (or true) multivariates
– Unbalanced Oil-and-Vinegar derivatives
– Rainbow, TTS
Security of UOV
• MQ: Multivariate quadratics direct attacks
– Gröbner bases: XL, F4/F5 families
• EIP: Extended Isomorphism of Polynomials,
a.k.a. rank or linear algebra attacks
– Low rank attack
– High rank attack
– Reconciliation attack
–…
The HYDRA processor
• A scalable, programmable crypto coprocessor
• Accompanying toolchains and software
libraries
• API to raise abstraction level for developing
security applications
• Allowing aggressive experimentation with PKC,
especially PQC
Slogans
• Cheap PKC
– Hardware acceleration of core computation
– Customizable for multiple vertical markets, allowing
cost sharing
• Future-proof PKC
– Algorithm agility, allowing “BIOS upgrades”
– PQC to resist attacks by emerging quantum computers
• Management-free PKC
– Lower total cost of ownership via PKC
– Identity-based crypto ⇒ No more PKI!
• “If we build them [cheaply], they will come”
Target cryptosystems
Scheme
Low Security (2^80)
High Security (2^112, 2^128)
ECC
NIST 2K160
NIST 2K233 (112bit)
NIST P192
NIST P256, Curve25519
GLS1271
Surface1271 (HEC)
Pairings
NTRU
MQPKC
BN(Barreto-Naehrig)161
BN256
LD (Lopez-Dahab) 2^271
LD 2^1223, Beuchat 3^509
ees251ep7
ees347ep2 (112bit)
(q=2 instead of q=3)
ees397ep1 (128bit)
Rainbow(q=16 or 31;24,20,20)
Rainbow (q; 32, 32, 32)
TTS (q=16 or 31; 24,20,20)
3HFE(731)-p
3HFE(747)-p
ASIC prototyping of NTRU
ASIC prototyping of TTS
ASIC prototyping of Fp multiplications
The Hydra microarchitecture
[Block diagram: I$, D$, decoder, Axpy engine, μC, DMA, memory bus]
Design ingredients
• Axpy-style ISA for regular data movement
between cache & datapath, i.e., Y ← a•X + Y,
where |a| = w, |X| = lw, |Y| = lw or (l + 1)w
• Wide & flexible vector datapath
• DMA engine to (pre-)fetch and store data to
fill up vector datapath as much as possible
• General-purpose μC for complex I/O
Review: NTRU cryptosystem
• Core operation: Multiplication in Z[x]/(x^n − 1)
• Key generation
– Randomly choose f and g with small coefficients
– Find f_p, f_q such that f_p f = 1 mod p and f_q f = 1 mod q
– Public key: h = p f_q g
– Private key: f, f_p
• Encryption
– Randomly generate r with coefficients in [-1,1]
– c = rh + m
• Decryption
– a = fc, with coefficients in [-q/2,q/2]
– m = a f_p, with coefficients in [-p/2,p/2]
Multiplications in NTRU
      a4    a3    a2    a1    a0
x     b4    b3    b2    b1    b0
      a4b0  a3b0  a2b0  a1b0  a0b0
      a3b1  a2b1  a1b1  a0b1  a4b1
      a2b2  a1b2  a0b2  a4b2  a3b2
      a1b3  a0b3  a4b3  a3b3  a2b3
+     a0b4  a4b4  a3b4  a2b4  a1b4
      c4    c3    c2    c1    c0
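Multiplication in Z[x]/(x^n − 1) is a cyclic convolution: the product a_i b_j contributes to coefficient c_((i + j) mod n). A minimal schoolbook sketch follows; the function name and the optional modulus argument are illustrative choices, not part of the slides.

```python
def cyclic_multiply(a, b, modulus=None):
    """Multiply two polynomials in Z[x]/(x^n - 1).

    a and b are coefficient lists of equal length n, lowest degree first.
    If modulus is given, every coefficient is reduced modulo it.
    """
    n = len(a)
    assert len(b) == n
    c = [0] * n
    for j, b_j in enumerate(b):
        for i, a_i in enumerate(a):
            # x^i * x^j = x^(i+j), and x^n wraps around to 1
            c[(i + j) % n] += a_i * b_j
    if modulus is not None:
        c = [v % modulus for v in c]
    return c
```

For n = 5 this gives c0 = a0b0 + a4b1 + a3b2 + a2b3 + a1b4. Each row of partial products is one scalar b_j times a rotated copy of a, accumulated into c.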
NTRU ees397ep1
• p=2, q=307, n=397
• Message m: 397 bits
• Ciphertext c: (Z_307)^397, ~397×9 bits
• Public key h: Z_307[x]/(x^397 − 1), ~397×9 bits
• Private key
– f: Z_307[x]/(x^397 − 1), ~397×9 bits
- Contains 74 nonzero elements
– f_p: Z_2[x]/(x^397 − 1), = 397×1 bits
Review: TTS cryptosystem
• Message z: GF(31)^40, ~200 bits
• Signature w: GF(31)^64, ~320 bits
• Public key P: GF(31)^(40×2080), ~416 Kbits
– Bottleneck: Quadratic polynomial evaluation
• Private key: ~44244 bits
– Bottleneck: Linear maps and system solving
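The public-key bottleneck can be sketched as follows. The sizes are taken from the slide at 5 bits per GF(31) element. The monomial layout is an assumption: 2080 equals the number of monomials x_i x_j with i ≤ j in 64 variables, so the sketch treats each of the 40 public polynomials as homogeneous quadratic. The key here is random filler rather than a real TTS key, and all names are hypothetical.

```python
import random
from itertools import combinations_with_replacement

Q = 31        # field size
N_VARS = 64   # signature length
N_POLYS = 40  # message (digest) length

# All index pairs (i, j) with i <= j; one coefficient per pair (assumed layout).
MONOMIALS = list(combinations_with_replacement(range(N_VARS), 2))

def random_public_key(rng):
    """Random coefficients standing in for a real public key."""
    return [[rng.randrange(Q) for _ in MONOMIALS] for _ in range(N_POLYS)]

def evaluate_public_map(public_key, w):
    """Evaluate all public polynomials at the point w over GF(31)."""
    # Compute each monomial once, then reuse it for every polynomial.
    monomial_values = [(w[i] * w[j]) % Q for i, j in MONOMIALS]
    return [sum(c * v for c, v in zip(poly, monomial_values)) % Q
            for poly in public_key]

def verify(public_key, z, w):
    """A signature w is accepted when P(w) equals the message z."""
    return evaluate_public_map(public_key, w) == z
```

Each polynomial evaluation is a dot product of a coefficient row with the shared monomial vector, so verification is 40 scalar-times-vector accumulations over GF(31).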
Review: Elliptic curve pairing
• Core operations are finite-field arithmetic
• Bottleneck for prime fields: Modular multiplication
• Euclid’s division: y = qn + r, 0 ≤ r < n
• Hensel’s division: y + qn = p^k r, 0 ≤ r < 2n, p prime
• Montgomery method
– x ↦ p^k x mod n: ring homomorphism if gcd(p, n) = 1
– Precompute p′, n′ such that p^k p′ − n n′ = 1
– q ← (y mod p^k) n′
– q′ ← (q mod p^k) n
– r ← (y + q′)/p^k
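The Hensel division steps can be checked with a few lines of code. This is a sketch under stated assumptions: the function name is hypothetical, n′ is computed as −n^(−1) mod p^k (which satisfies p^k p′ − n n′ = 1 for a suitable integer p′), and the input is assumed to satisfy 0 ≤ y < p^k n so that the result lands in [0, 2n).

```python
def hensel_divide(y, n, p, k):
    """Return r such that y + q'*n = p**k * r, so r = y * p**(-k) mod n.

    Requires gcd(p, n) == 1. The modular inverse via pow() needs Python 3.8+.
    """
    pk = p ** k
    n_prime = (-pow(n, -1, pk)) % pk   # n * n' = -1 (mod p^k)
    q = (y % pk) * n_prime
    q_prime = (q % pk) * n             # multiple of n that cancels y mod p^k
    r, remainder = divmod(y + q_prime, pk)
    assert remainder == 0              # the division by p^k is exact
    return r
```

Unlike Euclid's division, the quotient is determined by the low-order digits of y, so no trial division by n is needed; the price is that the result is scaled by p^(−k) and may need one final subtraction of n.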
Montgomery method: More details
• Problem: Given A, B, M, compute AB mod M
• Idea: Work in an isomorphic ring
– A ↦ AR mod M and B ↦ BR mod M
– Need a way to compute ABR mod M
• Solution: (x, y) ↦_M (xy)/R mod M
– T ← (AR mod M)(BR mod M)
– Can add multiple of M since mod M
• T + xM = 0 mod R, therefore x = −M^(−1) T mod R
– (AR, BR) ↦_M (T + (−M^(−1) T mod R)M)/R = ABR mod M
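A round trip through the Montgomery domain can be sketched as below, assuming R = 2^R_bits and an odd modulus M < R. The helper names (make_montgomery, mont_mul, to_mont, from_mont) are hypothetical, and the final conditional subtraction is added to bring the result into [0, M).

```python
def make_montgomery(M, R_bits):
    """Build Montgomery helpers for an odd modulus M with R = 2**R_bits > M."""
    R = 1 << R_bits
    assert M % 2 == 1 and M < R
    M_neg_inv = (-pow(M, -1, R)) % R   # -M^(-1) mod R, needs Python 3.8+

    def mont_mul(x, y):
        """Return x*y/R mod M for 0 <= x, y < M."""
        T = x * y
        # Add the multiple of M that makes the sum divisible by R.
        t = (T + ((M_neg_inv * T) % R) * M) >> R_bits
        return t - M if t >= M else t   # t < 2M, so one subtraction suffices

    def to_mont(a):
        return (a * R) % M              # A -> AR mod M

    def from_mont(a_bar):
        return mont_mul(a_bar, 1)       # AR -> A mod M

    return mont_mul, to_mont, from_mont
```

Converting in and out costs extra work, so the method pays off when many multiplications are chained inside the Montgomery domain, as in pairing or exponentiation loops.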
Multi-precision Montgomery
• X = (x_(n−1) x_(n−2) … x_0), x_i in {0,…,2^w − 1}
• S ← 0
• for i in 0 .. n − 1
– q_i ← (s_0 + a_i b_0)(−M^(−1)) mod 2^w
– S ← (S + a_i B + q_i M)/2^w
– [loop invariant: S in {0,…,M + B − 1}]
• [post condition: 2^(nw) S = AB + QM]
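The word-serial loop translates almost line by line into code. This is a sketch using Python integers in place of w-bit word arrays; the function name is hypothetical, and M is assumed odd so that −M^(−1) mod 2^w exists.

```python
def montgomery_multiply_words(A, B, M, w, n):
    """Word-serial Montgomery product: returns S with 2**(n*w) * S = A*B + Q*M.

    A must fit in n words of w bits; M must be odd. Needs Python 3.8+ for pow().
    """
    mask = (1 << w) - 1
    m_neg_inv = (-pow(M, -1, 1 << w)) & mask   # -M^(-1) mod 2^w
    b0 = B & mask
    S = 0
    for i in range(n):
        a_i = (A >> (i * w)) & mask            # i-th word of A
        s0 = S & mask                          # lowest word of S
        q_i = ((s0 + a_i * b0) * m_neg_inv) & mask
        S = (S + a_i * B + q_i * M) >> w       # low word is zero, shift is exact
        assert 0 <= S < M + B                  # loop invariant from the slide
    return S
```

Each iteration is two scalar-times-vector accumulations into S (a_i times B, then q_i times M), followed by a one-word shift.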
The main Hydra ISA
• Recall: Y ← a•X + Y
– |a| = w, |X| = lw, |Y| = lw or (l + 1)w
• Type i (for pairing)
– a in {0,…,2^w − 1}, X in {0,…,2^(lw) − 1},
Y in {0,…,2^((l + 1)w) − 1}
– •,+: the usual integer multiplication and addition
• Type q (for TTS)
– a in F_q, X in F_q^l, Y in F_q^l, and q ≤ 2^w
– •,+: scalar multiplication and vector addition in l-dimensional vector spaces over F_q
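The semantics of the two instruction types can be written as small functional models. These are sketches only: the function names are hypothetical, they say nothing about how the hardware implements the operations, and the type q model handles prime q only (q = 31, for example), since GF(16) arithmetic is not integer arithmetic modulo 16.

```python
def axpy_type_i(a, X, Y, w, l):
    """Type i: integer a*X + Y with a one-word scalar and multi-word operands."""
    assert 0 <= a < (1 << w)
    assert 0 <= X < (1 << (l * w))
    assert 0 <= Y < (1 << ((l + 1) * w))
    return a * X + Y

def axpy_type_q(a, X, Y, q):
    """Type q: a*X + Y over GF(q) for prime q, with X and Y as length-l vectors."""
    assert len(X) == len(Y)
    assert 0 <= a < q
    return [(a * x + y) % q for x, y in zip(X, Y)]
```

For type i the sum can exceed 2^((l + 1)w) − 1 by a carry when Y is near its maximum; how that carry is handled is not specified in the slides.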
Type r Axpy instructions
• X in Z_q^l, Y in Z_q^l such that q ≤ 2^w
• a in Z_p^h such that h⌈lg p⌉ ≤ 2w
      a4    a3    a2    a1    a0
x     b4    b3    b2    b1    b0
      a4b0  a3b0  a2b0  a1b0  a0b0
      a3b1  a2b1  a1b1  a0b1  a4b1
      a2b2  a1b2  a0b2  a4b2  a3b2
      a1b3  a0b3  a4b3  a3b3  a2b3
+     a0b4  a4b4  a3b4  a2b4  a1b4
      c4    c3    c2    c1    c0
Next steps
• Prototype implementation
– Bulk of the work goes here
• SystemC-based ISA simulator
• Compiler construction
– Maybe to base on LLVM
Thank you!
• Questions or comments?