2012-01-3.presentationx

Ofer Schwarz, Winter 2012-2013
Advisor: Barukh Ziv
Elliptic Curve Cryptography
The EC Discrete Logarithm Problem and Pollard’s Rho attack
Background
ECDLP; The ECDLP attack; Project goals
Elliptic Curves
• Elliptic curves may be defined over any field
• Solutions $(x, y)$ to the equation
$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$
• Obtain a simpler equation through a change of variables
o Over $F_p$: $y^2 = x^3 + ax + b$
o Over $F_{2^m}$: $y^2 + xy = x^3 + ax^2 + b$
• Define an additive group structure using geometry
o “Point at infinity” $\infty$ serves as the identity element
Calculating $(x_3, y_3) = (x_1, y_1) + (x_2, y_2)$ over $F_p$ (for $x_1 \neq x_2$):
$m = \frac{y_2 - y_1}{x_2 - x_1}$
$x_3 = m^2 - x_1 - x_2$
$y_3 = m(x_1 - x_3) - y_1$
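A minimal Python sketch of the addition formulas above, under stated assumptions: the toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ and the names `ec_add`, `P_MOD`, `A` and `INF` are illustrative and are not taken from the project's C++ library. The doubling and point-at-infinity cases are added for completeness; the slide only shows the chord case. It needs Python 3.8+ for `pow(x, -1, p)`.

```python
# Toy curve y^2 = x^3 + A*x + b over F_p (illustrative parameters, an assumption).
P_MOD, A = 17, 2
INF = None  # represents the point at infinity


def ec_add(p1, p2):
    """Add two affine points on the toy curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    # P + (-P) = infinity (also covers doubling a point with y = 0)
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p1 == p2:
        # tangent slope (doubling case, not shown on the slide)
        m = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # chord slope: m = (y2 - y1) / (x2 - x1)
        m = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (m * m - x1 - x2) % P_MOD
    y3 = (m * (x1 - x3) - y1) % P_MOD
    return (x3, y3)
```

For example, with $P = (5, 1)$ on this curve, `ec_add((5, 1), (5, 1))` gives `(6, 3)` and `ec_add((5, 1), (6, 3))` gives `(10, 6)`.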
ECDLP
• Elliptic Curve Discrete Logarithm Problem
• Computational hardness of DLP is the basis for many
cryptographic systems (e.g., DSA, ElGamal)
• Given a finite field $F$,
• An elliptic curve $E$ over $F$,
• A point $P \in E(F)$ of order $n$ [$nP = \infty$],
• And another point $Q = lP \in \langle P \rangle$
• The problem: find $l$
ECDLP using collisions
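• The idea: find $(c_1, d_1)$, $(c_2, d_2)$ such that
$c_1 P + d_1 Q = c_2 P + d_2 Q$
• Then we have $(c_1 - c_2)P = (d_2 - d_1)Q = (d_2 - d_1)lP$, so $l = (c_1 - c_2)(d_2 - d_1)^{-1} \bmod n$ whenever $d_2 - d_1$ is invertible modulo $n$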
• Simple method to find a collision: birthday paradox
o Very heavy memory requirements
• Pollard’s Rho attack: same time, negligible memory
• The means: random functions
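The collision identity can be turned into the logarithm with one modular inversion. The sketch below assumes a prime order $n$; the function name `solve_from_collision` is hypothetical.

```python
def solve_from_collision(c1, d1, c2, d2, n):
    """Return l with Q = l*P, given c1*P + d1*Q == c2*P + d2*Q.

    Assumes n (the order of P) is prime, so every nonzero residue is invertible.
    """
    dd = (d2 - d1) % n
    if dd == 0:
        # Both sides carry the same multiple of Q: the collision tells us nothing.
        raise ValueError("useless collision: d1 == d2 (mod n)")
    return (c1 - c2) * pow(dd, -1, n) % n
```

With $n = 19$ and $l = 7$, the pairs $(3, 5)$ and $(5, 2)$ both satisfy $c + 7d \equiv 0 \pmod{19}$, and `solve_from_collision(3, 5, 5, 2, 19)` recovers 7.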
Pollard’s Rho
• Every function over a finite space is composed of finite chains
• Each chain has a cycle, and a collision: $x \neq y$ such that $f(x) = f(y)$
• In a random function:
o Expected tail length ≈ $\sqrt{\pi n/8}$
o Expected cycle length ≈ $\sqrt{\pi n/8}$
• Use any cycle-detection method
o E.g., Floyd’s algorithm: ~$3\sqrt{n}$ EC operations
• Use a specific family of functions for which, given $X = aP + bQ$, it is easy to find $a', b'$ s.t. $f(X) = a'P + b'Q$
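Floyd's tortoise-and-hare method can be sketched on any iterated function. This version is generic (plain integers, not curve points) and returns the tail and cycle lengths; the name `floyd` is illustrative.

```python
def floyd(f, x0):
    """Return (tail_length, cycle_length) of the sequence x0, f(x0), f(f(x0)), ..."""
    # Phase 1: the hare moves twice as fast until both meet inside the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))
    # Phase 2: restart the tortoise; they now meet at the cycle entry.
    mu = 0
    tortoise = x0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)
        mu += 1
    # Phase 3: walk once around the cycle to measure its length.
    lam = 1
    hare = f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return mu, lam
```

For $f(x) = (x^2 + 1) \bmod 11$ starting at 0, the sequence is 0, 1, 2, 5, 4, 6, 4, 6, ..., so the tail has length 4 and the cycle has length 2.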
Additive walks
• Partition the curve into disjoint subsets $S_1, \ldots, S_m$
o E.g., according to the $k = \log_2 m$ least significant bits of the $x$-coordinate
• Choose random integers $a_i, b_i$ for $i = 1, \ldots, m$
• For $X \in S_i$, define $f(X) = X + a_i P + b_i Q$
• For the starting element, choose a random $aP + bQ$
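The additive walk can be combined with Floyd's cycle detection into a small end-to-end attack. This is a sketch under stated assumptions, not the project's implementation: the toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ (where $P = (5, 1)$ has prime order 19), the partition by $x \bmod m$, and all function names are illustrative.

```python
import random

P_MOD, A = 17, 2   # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)
INF = None         # point at infinity


def ec_add(p1, p2):
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p1 == p2:
        m = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        m = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (m * m - x1 - x2) % P_MOD
    return (x3, (m * (x1 - x3) - y1) % P_MOD)


def ec_mul(k, pt):
    """Double-and-add scalar multiplication."""
    acc = INF
    while k > 0:
        if k & 1:
            acc = ec_add(acc, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return acc


def pollard_rho(P, Q, n, m=4, rng=random):
    """Find l with Q = l*P, where P has prime order n, using an m-way additive walk."""
    while True:  # retry with fresh randomness if the collision is useless
        steps = []
        for _ in range(m):
            a_i, b_i = rng.randrange(n), rng.randrange(n)
            steps.append((a_i, b_i, ec_add(ec_mul(a_i, P), ec_mul(b_i, Q))))

        def f(state):
            X, a, b = state
            i = 0 if X is INF else X[0] % m      # which subset S_i holds X
            a_i, b_i, R_i = steps[i]
            return ec_add(X, R_i), (a + a_i) % n, (b + b_i) % n

        a0, b0 = rng.randrange(n), rng.randrange(n)
        start = (ec_add(ec_mul(a0, P), ec_mul(b0, Q)), a0, b0)
        tortoise, hare = f(start), f(f(start))
        while tortoise[0] != hare[0]:            # compare points only
            tortoise, hare = f(tortoise), f(f(hare))
        (_, c1, d1), (_, c2, d2) = tortoise, hare
        dd = (d2 - d1) % n
        if dd != 0:
            return (c1 - c2) * pow(dd, -1, n) % n
```

Usage: `pollard_rho((5, 1), ec_mul(7, (5, 1)), 19)` returns 7. Tracking the coefficients $(a, b)$ alongside each point is what makes the final linear combination available when the two walkers meet.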
Pohlig-Hellman reduction
• Assume $n = p_1^{e_1} \cdots p_r^{e_r}$
• Reduces ECDLP of order $n$ to $e_i$ instances of order $p_i$ for $i = 1, \ldots, r$
• Uses the Chinese remainder theorem and the group structure
• Significance: ECDLP of order $n$ is only as hard as ECDLP in the subgroup whose order is the largest prime factor of $n$
• Usually the parameters are chosen so that $n$ is prime
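The recombination step of the reduction can be sketched as follows. This covers only the Chinese remainder part: the residues $l \bmod p_i^{e_i}$ are assumed to have been found already by solving the smaller subgroup instances, and the function name `crt` is illustrative.

```python
def crt(residues, moduli):
    """Combine x = r_i (mod m_i) for pairwise coprime m_i into x mod prod(m_i)."""
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        # Pick t so that x + M*t = r (mod m); M is invertible mod m by coprimality.
        t = (r - x) * pow(M, -1, m) % m
        x += M * t
        M *= m
    return x % M
```

For instance, the residues 2 mod 3, 3 mod 5 and 2 mod 7 combine to 23 mod 105.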
Project goals
• Implement a generic EC arithmetic library
• Implement the ECDLP attack
• Research and implement various improvements
and optimizations for the attack
• Ultimate goal: solve 64-bit ECDLP (i.e., $n \approx 2^{64}$)
Improvements and optimizations
Nivasch’s algorithm; Montgomery trick and distinguished point method; Negation map
1. Nivasch’s algorithm
• Cycle detection using stacks
• The idea: find the smallest value in the cycle
o Keep a stack of values encountered so far
o For each new value, remove all values larger than it
o Stack is ordered by $(x_i, i)$, increasing in both
• Improvement: use $m$ stacks, with partitioning
o Look for the smallest value on the cycle in each subset separately
• Expected runtime: $\left(1 + \frac{1}{2(m+1)}\right)\sqrt{\pi n/2}$
• Expected memory: $O(m \log n)$
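A minimal sketch of the stack algorithm on integer values, assuming the partition is simply `x % num_stacks`; the name `nivasch` and its parameters are illustrative. For curve points, any total order (e.g., on the $x$-coordinate) would play the role of the integer comparison.

```python
def nivasch(f, x0, num_stacks=1):
    """Return the cycle length of x0, f(x0), ... using Nivasch's stack method."""
    stacks = [[] for _ in range(num_stacks)]
    x, i = x0, 0
    while True:
        stack = stacks[x % num_stacks]       # one stack per subset
        # Remove every stored value larger than the new one.
        while stack and stack[-1][0] > x:
            stack.pop()
        # Seeing the top value again means we completed one full cycle.
        if stack and stack[-1][0] == x:
            return i - stack[-1][1]
        stack.append((x, i))                 # entries stay increasing in value and index
        x = f(x)
        i += 1
```

For $f(x) = (x^2 + 1) \bmod 11$ starting at 0 (sequence 0, 1, 2, 5, 4, 6, 4, ...), the algorithm stops at the second occurrence of 4, the smallest value on the cycle, and reports a cycle length of 2.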
2. The Montgomery trick
• Inversion is the most expensive field operation
• Compute several inversions simultaneously
• The trick: use accumulating products:
$a_j^{-1} = \left(\prod_{i=1}^{j-1} a_i\right) \cdot \left(\prod_{i=1}^{m} a_i\right)^{-1} \cdot \left(\prod_{i=j+1}^{m} a_i\right)$
• Substitute $m$ inversions with $3(m-1)$ multiplications and 1 inversion
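The trick can be sketched for integers modulo a prime. The function name `batch_inverse` is illustrative, and all inputs are assumed to be nonzero modulo `p`; instead of storing suffix products, the sketch peels one factor off the running inverse at each step, which is the usual way to reach roughly $3(m-1)$ multiplications.

```python
def batch_inverse(values, p):
    """Invert every element of `values` modulo prime p with a single inversion."""
    m = len(values)
    # prefix[i] = a_1 * ... * a_i  (prefix[0] = 1)
    prefix = [1] * (m + 1)
    for i, a in enumerate(values):
        prefix[i + 1] = prefix[i] * a % p
    inv = pow(prefix[m], -1, p)   # the only inversion: (a_1 * ... * a_m)^-1
    out = [0] * m
    for i in range(m - 1, -1, -1):
        # Here inv == (a_1 * ... * a_{i+1})^-1, so prefix[i] * inv == a_{i+1}^-1.
        out[i] = prefix[i] * inv % p
        inv = inv * values[i] % p  # drop a_{i+1} from the running inverse
    return out
```

For example, `batch_inverse([2, 3, 5], 17)` returns `[9, 6, 7]`, since $2 \cdot 9 \equiv 3 \cdot 6 \equiv 5 \cdot 7 \equiv 1 \pmod{17}$.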
Local parallelization
• Montgomery’s trick requires several parallel instances (all running locally)
• Naïve parallelization only results in a $\sqrt{m}$ speedup
• The distinguished point method yields a speedup factor of $m$
• The result: we can use Montgomery’s trick without losing efficiency!
Distinguished points
• Pollard’s Rho chains may intersect
• Use the same function in all instances
• Keep a hash table of points
• Only insert “distinguished” points
• Common method: the $k$ least significant bits of the $y$ coordinate are all 0
• Gives the same speedup factor, but saves a factor of $2^k$ in memory
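A small sketch of the bookkeeping this implies, with hypothetical names `is_distinguished` and `record`: each walk tests its current point, and only distinguished points go into the shared table together with their coefficients $(a, b)$.

```python
def is_distinguished(point, k):
    """True when the k least significant bits of the y coordinate are all 0."""
    _, y = point
    return y & ((1 << k) - 1) == 0


def record(table, point, a, b):
    """Store a distinguished point with its coefficients.

    Returns the previously stored (a, b) if another walk already reached the
    same point with different coefficients (a useful collision), else None.
    """
    earlier = table.get(point)
    if earlier is not None and earlier != (a, b):
        return earlier
    table[point] = (a, b)
    return None
```

When `record` returns an earlier pair, the two linear combinations of the same point give the collision needed to solve for $l$.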
3. Negation map
• Method for shrinking the search space by a factor of 2 (at best a $\sqrt{2}$ reduction in iterations)
• The idea: given a point $P \in E(F)$, it’s very easy to calculate $-P$
o In prime curves: $-(x, y) = (x, -y)$
• Then “group” each point and its negative as a single element
o E.g., use the one with an even $y$ coordinate
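Choosing the representative can be sketched as below, assuming an odd prime field size `p` and a point of order `n`; the name `canonical` is illustrative. When the point is negated, the coefficients of its linear combination must be negated too, so that the point still equals $aP + bQ$.

```python
def canonical(point, a, b, p, n):
    """Map (X, a, b) with X = a*P + b*Q to the representative of {X, -X} with even y."""
    x, y = point
    if y % 2 == 0:
        return (x, y), a, b
    # y is odd, hence nonzero, and p - y is even because p is odd.
    return (x, p - y), (-a) % n, (-b) % n
```

On the toy curve over $F_{17}$ with $n = 19$, `canonical((5, 1), 1, 0, 17, 19)` returns `((5, 16), 18, 0)`, i.e., $-P = 18P$.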
Fruitless cycles
• Problem with negation map in additive walks
• If $X \in S_i$ and $f(X) = -(X + a_i P + b_i Q) \in S_i$, then
$f(f(X)) = -(X + a_i P + b_i Q) + a_i P + b_i Q = -X$, which is the same element as $X$ under the negation map
• “Fruitless” because the linear combination is the same
• Happens with probability $\frac{1}{2m}$ at every step ($m$ = partition factor)
• Longer even-length cycles are also possible
o Probability decreases exponentially with cycle length
Resolving fruitless cycles
• The simplest idea actually works: just check!
• Check for 2-cycles every $k_2$ steps
o When calculating $X_i = f(X_{i-1})$ for $i \equiv 0 \pmod{k_2}$
o Check if $X_{i-1} = X_{i-3}$
o If so, define $X_i = 2 \cdot \min\{X_{i-1}, X_{i-2}\}$
o Still easy to calculate the linear combination
• Do the same for larger even lengths
o Analysis shows that optimal $k_c \approx m^{c/2}$
o Only need to check up to $c_{max} = \frac{\log n}{\log m}$
Implementation and results
EC arithmetic library; Collision library; Challenges and results
Curve arithmetic library
• Generic EC arithmetic library in C++
• Support for various different curves and algorithms
o Extensible syntax that allows adding even more curves and algorithms
• Fast field arithmetic using GMP and NTL
o Incl. complex operations, e.g., Chinese remainders, modular square roots
Collision library
• Generic (templated) C++ library for finding collisions
• Only need to supply the function
• Currently implemented:
o Floyd’s algorithm
o Nivasch’s stack algorithm
o Distinguished point method for parallelization
Challenges
• 4 ECDLP challenges of increasing difficulty
o 30, 40, 50 and 64 bits
• 1 extra challenge with non-prime order for testing the Pohlig-Hellman reduction
Results!
• 64-bit challenge solved in ~16 hours, ~$2^{31}$ iterations
• Results from previous group: 60 bits in 5-6 days
• Best result to date: 112 bits in 3.5 months
o Used a cluster of 218 PlayStation 3 consoles
o Single-Instruction, Multiple-Data architecture
o Heavy optimizations on all levels
Results!
[Figure: two charts plotted against challenge bits (30, 40, 50, 64). “Average time”: runtime in seconds on a logarithmic axis from 1 to 65536. “Average function calls”: log2(#calls) on an axis from 0 to 35.]
Optimization tests
• Check every improvement against vanilla version
• Nivasch: 2.16 times fewer iterations, 1.4 speedup factor
• Montgomery: 1.43 speedup factor for 40 bits, 1.33 factor for 30 bits
• Negation map: 1.1 times fewer iterations, no speedup
o (Actually about 1.07 times slower)
Improvement ideas
• Distributed attack
• Low-level optimizations
o Integer arithmetic
o Field arithmetic (probably harder since NTL is very good at that)
o In-place operations instead of constructors and copying
• Use SIMD architecture (e.g., GPUs)
The End
