2012-01-3.presentationx


Ofer Schwarz, Winter 2012-2013
Advisor: Barukh Ziv
Elliptic Curve Cryptography
The EC Discrete Logarithm problem and Pollard’s Rho attack
Background
ECDLP; The ECDLP attack; Project goals
Elliptic Curves
• Elliptic curves may be defined over any field
• Solutions $(x, y)$ to the equation
$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$
• Obtain a simpler equation through variable change
o Over $F_p$: $y^2 = x^3 + ax + b$
o Over $F_{2^m}$: $y^2 + xy = x^3 + ax^2 + b$
• Define an additive group structure using geometry
o “Point at infinity” $\infty$ serves as the identity element
Calculating $(x_3, y_3) = (x_1, y_1) + (x_2, y_2)$ over $F_p$ (for $x_1 \neq x_2$):
$m = \frac{y_2 - y_1}{x_2 - x_1}$
$x_3 = m^2 - x_1 - x_2$
$y_3 = m(x_1 - x_3) - y_1$
ECDLP
• Elliptic Curve Discrete Logarithm Problem
• Computational hardness of DLP is the basis for many
cryptographic systems (e.g., DSA, ElGamal)
• Given a finite field $F$,
• An elliptic curve $E$ over $F$,
• A point $P \in E(F)$ of order $n$ [$nP = \infty$],
• And another point $Q = lP \in \langle P \rangle$
• The problem: find $l$
ECDLP using collisions
• The idea: find $(c_1, d_1)$, $(c_2, d_2)$ such that
$c_1 P + d_1 Q = c_2 P + d_2 Q$
• Then we have $(c_1 - c_2)P = (d_2 - d_1)Q = (d_2 - d_1)lP$,
so $l = (c_1 - c_2)(d_2 - d_1)^{-1} \bmod n$ whenever $d_2 - d_1$ is invertible modulo $n$
• Simple method to find a collision: birthday paradox
o Very heavy memory requirements
• Pollard’s Rho attack: same expected time, negligible memory
• The means: random functions
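As a small worked example of the last step, the logarithm can be read off a collision with one modular inversion. The helper name `log_from_collision` is hypothetical, and the sketch assumes $n$ is prime so that any nonzero $d_2 - d_1$ is invertible.

```python
def log_from_collision(c1, d1, c2, d2, n):
    """Recover l from c1*P + d1*Q == c2*P + d2*Q, where Q = l*P and P has prime order n.

    Returns None for a useless collision (d1 == d2 mod n); the search must then be retried.
    """
    dd = (d2 - d1) % n
    if dd == 0:
        return None
    return (c1 - c2) * pow(dd, -1, n) % n


# With n = 19 and l = 7: 3 + 5*7 and 5 + 2*7 are both 0 mod 19, so the pairs collide
example_l = log_from_collision(3, 5, 5, 2, 19)
```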
Pollard’s Rho
• Every function over a finite space
is composed of finite chains
• Each chain has a cycle, and a collision:
$x \neq y$ such that $f(x) = f(y)$
• In a random function:
o Expected tail length ≈ $\sqrt{\pi n/8}$
o Expected cycle length ≈ $\sqrt{\pi n/8}$
• Use any cycle-detection method
o E.g., Floyd’s algorithm: ~$3\sqrt{n}$ EC operations
• Use a specific family of functions for which, given $X = aP + bQ$,
it is easy to find $a', b'$ s.t. $f(X) = a'P + b'Q$
Additive walks
• Partition the curve into disjoint subsets $S_1, \ldots, S_m$
o E.g., according to the least $k = \log_2 m$ bits of the $x$ coordinate
• Choose random integers $a_i, b_i$ for $i = 1, \ldots, m$
• For $X \in S_i$, define $f(X) = X + a_i P + b_i Q$
• For the starting element, choose a random $aP + bQ$
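Putting the additive walk together with Floyd’s cycle detection gives a complete, if tiny, attack. The sketch below is an illustration under several assumptions: the names `ec_add`, `ec_mul` and `rho_ecdlp` are hypothetical, the partition uses `x mod m` rather than bit masking, the order `n` is assumed prime, and the toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with $P = (5, 1)$ of order 19 stands in for a real challenge.

```python
import random


def ec_add(P, Q, a, p):
    """Affine addition on y^2 = x^3 + a*x + b over F_p; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        m = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)


def ec_mul(k, P, a, p):
    """Double-and-add scalar multiplication k*P."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R


def rho_ecdlp(P, Q, n, a, p, m=4, rng=None):
    """Find l with Q = l*P, where P has prime order n, using an m-way additive walk."""
    rng = rng or random.Random()
    while True:
        # Random step table: entry i holds (a_i, b_i, a_i*P + b_i*Q)
        table = []
        for _ in range(m):
            ai, bi = rng.randrange(n), rng.randrange(n)
            table.append((ai, bi, ec_add(ec_mul(ai, P, a, p), ec_mul(bi, Q, a, p), a, p)))

        def step(state):
            X, c, d = state
            i = 0 if X is None else X[0] % m  # partition by the x coordinate
            ai, bi, Ri = table[i]
            return ec_add(X, Ri, a, p), (c + ai) % n, (d + bi) % n

        c0, d0 = rng.randrange(n), rng.randrange(n)
        start = (ec_add(ec_mul(c0, P, a, p), ec_mul(d0, Q, a, p), a, p), c0, d0)
        # Floyd: stop when both walkers sit on the same point
        tortoise, hare = step(start), step(step(start))
        while tortoise[0] != hare[0]:
            tortoise, hare = step(tortoise), step(step(hare))
        dd = (hare[2] - tortoise[2]) % n
        if dd:  # otherwise the collision is useless; retry with a fresh table
            l = (tortoise[1] - hare[1]) * pow(dd, -1, n) % n
            if ec_mul(l, P, a, p) == Q:
                return l


# Toy usage: recover l = 7 on the order-19 example curve
toy_P = (5, 1)
toy_Q = ec_mul(7, toy_P, 2, 17)
toy_l = rho_ecdlp(toy_P, toy_Q, 19, 2, 17, rng=random.Random(1))
```

Because the coefficients $(c, d)$ travel with each point, the walk never needs to locate the exact pair of colliding preimages; any two visits to the same point with different $d$ suffice.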
Pohlig-Hellman reduction
• Assume $n = p_1^{e_1} \cdots p_r^{e_r}$
• Reduces ECDLP of order $n$ to $e_i$ instances of order $p_i$
for $i = 1, \ldots, r$
• Uses Chinese remainder theorem and group
structure
• Significance: ECDLP of order $n$ is only as hard as ECDLP in a
subgroup whose order is the largest prime factor of $n$
• Usually the parameters are chosen so $n$ is prime
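The reduction can be written out in a few lines. The symbols $n_i$, $P_i$, $Q_i$, $l_i$ and the digits $z_k$ are notation introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
n_i &= n / p_i^{e_i}, \quad P_i = n_i P, \quad Q_i = n_i Q
  && \text{($P_i$ has order $p_i^{e_i}$)} \\
Q_i &= l P_i = (l \bmod p_i^{e_i})\, P_i
  && \text{(an ECDLP instance of order $p_i^{e_i}$)} \\
l_i &= z_0 + z_1 p_i + \cdots + z_{e_i-1} p_i^{e_i-1}
  && \text{(base-$p_i$ digits of $l_i = l \bmod p_i^{e_i}$)} \\
p_i^{e_i-1} Q_i &= z_0 \left(p_i^{e_i-1} P_i\right)
  && \text{(order-$p_i$ instance for the first digit)} \\
l &\equiv l_i \pmod{p_i^{e_i}}, \quad i = 1, \ldots, r
  && \text{(recombine with the CRT)}
\end{align*}
```

The remaining digits $z_1, z_2, \ldots$ follow the same way after subtracting the known part of $l_i$ from $Q_i$, which is where the $e_i$ instances of order $p_i$ come from.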
Project goals
• Implement a generic EC arithmetic library
• Implement the ECDLP attack
• Research and implement various improvements
and optimizations for the attack
• Ultimate goal: solve 64-bit ECDLP (i.e., $n \approx 2^{64}$)
Improvements and optimizations
Nivasch’s algorithm; Montgomery trick and distinguished
point method; Negation map
1. Nivasch’s algorithm
• Cycle detection using stacks
• The idea: find the smallest value in the cycle
o Keep a stack of values encountered so far
o For each new value, remove all values larger than it
o Stack is ordered by $(x_i, i)$, increasing in both
• Improvement: use $m$ stacks, with partitioning
o Look for smallest value on cycle in each subset separately
• Expected runtime: $\left(1 + \frac{1}{2(m+1)}\right)\sqrt{\pi n/2}$
• Expected memory: $O(m \log n)$
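The single-stack version of the idea above fits in a dozen lines. This is a generic sketch, not the project's templated C++ implementation; it assumes the values are totally ordered (tuples such as curve points compare lexicographically in Python), and the function name is made up here.

```python
def nivasch_cycle(f, x0):
    """Detect a cycle in x0, f(x0), f(f(x0)), ... with a single stack.

    Returns (i, j) with i < j and x_i == x_j, where x_j is the second occurrence
    of the smallest value on the cycle; j - i is the cycle length.
    """
    stack = []  # pairs (value, index), strictly increasing in both components
    x, j = x0, 0
    while True:
        # Remove every stacked value larger than the new one
        while stack and stack[-1][0] > x:
            stack.pop()
        if stack and stack[-1][0] == x:
            return stack[-1][1], j
        stack.append((x, j))
        x = f(x)
        j += 1


# Chain 0, 1, 2, 5, 4, 6, 4, 6, ...: the cycle {4, 6} has minimum 4
toy_f = lambda x: (x * x + 1) % 11
first, second = nivasch_cycle(toy_f, 0)
```

Unlike Floyd’s method, each value of the chain is computed only once, at the cost of the stack’s logarithmic expected memory.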
2. The Montgomery trick
• Inversion is the most expensive field operation
• Compute several inversions simultaneously
• The trick: use accumulating products:
$a_j^{-1} = \left(\prod_{i=1}^{j-1} a_i\right) \cdot \left(\prod_{i=1}^{m} a_i\right)^{-1} \cdot \left(\prod_{i=j+1}^{m} a_i\right)$
• Substitute $m$ inversions with $3(m - 1)$ multiplications
and 1 inversion
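A common way to realise the trick is with running prefix products, so that the suffix product never has to be formed explicitly. The sketch below uses that variant; the function name `batch_inverse` is an assumption, and all inputs are assumed nonzero modulo the prime `p`.

```python
def batch_inverse(values, p):
    """Invert every element of `values` modulo the prime p using one modular inversion."""
    m = len(values)
    # prefix[i] = values[0] * ... * values[i-1] mod p
    prefix = [1] * (m + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] * v % p
    inv = pow(prefix[m], -1, p)  # the only inversion: inverse of the full product
    result = [0] * m
    for i in range(m - 1, -1, -1):
        result[i] = prefix[i] * inv % p  # inverse of values[i]
        inv = inv * values[i] % p        # now the inverse of prefix[i]
    return result


inverses = batch_inverse([2, 3, 5, 7], 17)
```

In the attack, the values to invert would be the denominators $x_2 - x_1$ of several walks that each need one point addition in the same round.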
Local parallelization
• Montgomery’s trick requires several parallel
instances (all running locally)
• Naïve parallelization only results in a $\sqrt{m}$ speedup
• The distinguished point method yields a speedup
factor of $m$
• The result: we can use Montgomery’s trick without
losing efficiency!
Distinguished points
• Pollard’s Rho chains may intersect
• Use same function in all instances
• Keep a hash table of points
• Only insert “distinguished” points
• Common method: the $k$ least significant bits of
the $y$ coordinate are all 0
• Gives the same speedup factor,
but saves a factor of $2^k$ in memory
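The shared-table logic can be sketched independently of the curve arithmetic. Everything below is illustrative: the function and parameter names are hypothetical, walks are run one after another rather than truly in parallel, and the usage example replaces the curve with the integers modulo a prime ($P = 1$, $Q = l$), where the logarithm is trivial, purely to exercise the bookkeeping.

```python
import random


def dp_collision_search(step, random_start, is_distinguished, n, max_trail=1000):
    """Run walks that all use the same `step` and share one table of distinguished points.

    Each state is (X, c, d) with X = c*P + d*Q. Returns l with Q = l*P (n prime).
    """
    seen = {}  # distinguished point -> coefficients (c, d) it was first reached with
    while True:
        X, c, d = random_start()
        for _ in range(max_trail):  # abandon walks stuck in a cycle without such points
            if is_distinguished(X):
                if X in seen:
                    c2, d2 = seen[X]
                    if (d2 - d) % n:
                        return (c - c2) * pow((d2 - d) % n, -1, n) % n
                seen[X] = (c, d)
                break  # start a fresh walk after reporting a distinguished point
            X, c, d = step(X, c, d)


# Stand-in group: integers mod 101 with P = 1 and Q = 37 (so the answer is 37)
n, Q = 101, 37
rng = random.Random(0)
increments = [(3, 5), (7, 2), (1, 9), (4, 4)]  # the (a_i, b_i) table, m = 4


def toy_step(X, c, d):
    a_i, b_i = increments[X % 4]
    return (X + a_i + b_i * Q) % n, (c + a_i) % n, (d + b_i) % n


def toy_start():
    c, d = rng.randrange(n), rng.randrange(n)
    return (c + d * Q) % n, c, d


# Distinguished here means the 3 lowest bits of the stand-in "point" are zero
recovered = dp_collision_search(toy_step, toy_start, lambda X: X % 8 == 0, n)
```

Only distinguished points are stored, so the table holds roughly a $2^{-k}$ fraction of the points visited.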
3. Negation map
• Method for improving the attack by a factor of $\sqrt{2}$ (the search space is halved)
• The idea: given a point $P \in E(F)$, it’s very easy to
calculate $-P$
o In prime curves: $-(x, y) = (x, -y)$
• The method: “group” each point and its negative as a
single element
o E.g., use the one with an even $y$ coordinate
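Choosing the representative with an even $y$ coordinate might look like the sketch below. The function name `canonical` is an assumption, as is carrying the coefficients $(c, d)$ alongside the point so that the linear combination stays correct after negation; `p` is assumed to be an odd prime.

```python
def canonical(point, c, d, p, n):
    """Map a point and its negative to one representative: the one with even y.

    `point` equals c*P + d*Q; negating the point negates both coefficients modulo
    the group order n. The point at infinity (None) is its own negative.
    """
    if point is None:
        return point, c, d
    x, y = point
    if y % 2 == 1:
        # p is odd, so p - y is even whenever y is odd
        return (x, (-y) % p), (-c) % n, (-d) % n
    return point, c, d


rep = canonical((5, 1), 3, 4, 17, 19)
```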
Fruitless cycles
• Problem with negation map in additive walks
• If $X \in S_i$ and $f(X) = -(X + a_i P + b_i Q) \in S_i$, then
$f(f(X)) = \pm\left(-(X + a_i P + b_i Q) + a_i P + b_i Q\right) = \pm(-X)$,
which is the same element as $X$ under the negation map
• “Fruitless” because the linear combination is the same
• Happens with $\Pr = \frac{1}{2m}$ every step ($m$ = partition factor)
• Longer even-length cycles are also possible
o Probability decreases exponentially in the cycle length
Resolving fruitless cycles
• The simplest idea actually works: just check!
• Check for 2-cycles every $k_2$ steps
o When calculating $X_i = f(X_{i-1})$ for $i \equiv 0 \pmod{k_2}$
o Check if $X_{i-1} = X_{i-3}$
o If so, define $X_i = 2 \cdot \min\{X_{i-1}, X_{i-2}\}$
o Still easy to calculate the linear combination
• Do the same for larger even lengths
o Analysis shows that optimal $k_c \approx m^{c/2}$
o Only need to check up to $c_{max} = \frac{\log n}{\log m}$
Implementation and results
EC arithmetic library; Collision library; Challenges and results
Curve arithmetic library
• Generic EC arithmetic library in C++
• Support for various different curves and algorithms
o Extensible syntax that allows adding even more curves and algorithms
• Fast field arithmetic using GMP and NTL
o Incl. complex operations, e.g., Chinese remainders, modular square roots
Collision library
• Generic (templated) C++ library for finding collisions
• Only need to supply the function
• Currently implemented:
o Floyd’s algorithm
o Nivasch’s stack algorithm
o Distinguished point method for parallelization
Challenges
• 4 ECDLP challenges of increasing difficulty
o 30, 40, 50 and 64 bits
• 1 extra challenge with non-prime order for testing
Pohlig-Hellman reduction
Results!
• 64-bit challenge solved in ~16 hours, ~$2^{31}$ iterations
• Results from previous group: 60 bits in 5-6 days
• Best result to date: 112 bits in 3.5 months
o Used a cluster of 218 PlayStation 3 consoles
o Single-Instruction, Multiple-Data architecture
o Heavy optimizations on all levels
Results!
[Charts: “Average time”, runtime in seconds on a logarithmic scale, and “Average function calls”, log2(#calls), each plotted against challenge bits (30, 40, 50, 64)]
Optimization tests
• Check every improvement against vanilla version
• Nivasch: 2.16 times fewer iterations, 1.4 speedup factor
• Montgomery: 1.43 speedup factor for 40 bits, 1.33
factor for 30 bits
• Negation map: 1.1 times fewer iterations, no speedup
o (Actually about 1.07 times slower)
Improvement ideas
• Distributed attack
• Low-level optimizations
o Integer arithmetic
o Field arithmetic (probably harder since NTL is very good at that)
o In-place operations instead of constructors and copying
• Use SIMD architecture (e.g., GPUs)
The End