p1011_diab_s


An automated pipeline balancing
in the SRC Reconfigurable Computer
and its application to the RC5 cipher breaking
Hatim Diab1, Miaoqing Huang1, Kris Gaj2,
Tarek El-Ghazawi1, Nikitas Alexandridis1
1 The George Washington University
2 George Mason University
Objectives
• Implement a pipelined RC5 key breaker on a single chip,
• Demonstrate automatic balancing of a pipeline by a compiler (SRC),
• Show the cost of the added pipelining.
Diab
1011/MAPLD'04
Requirements
• Given:
– A matching pair of plaintext message (M) and ciphertext (C)
• Find the correct corresponding secret key
– Test the possible secret keys exhaustively,
– Keys are 128 bits long, ranging from all 0s to all 1s.
• Requirements
– The processing element (PE) must be fed a new secret key (Ki) every cycle,
– Compare C with the output Ci corresponding to Ki.
RC5 Algorithm
• Mixing in the secret key:
    // S[0..25] is the array to be mixed for RC5 encryption
    // L[0..3] is the array converted from the secret key K[0..15]
    i = j = 0
    A = B = 0
    do 3*max(26,4) times                  // 78 iterations
        A = S[i] = (S[i] + A + B) <<< 3
        B = L[j] = (L[j] + A + B) <<< (A + B)
        i = (i + 1) mod 26
        j = (j + 1) mod 4
    // The output is the array S[0..25], which is used to encrypt the plaintext.
• Encryption:
    LE = A + S[0]                         // A is the upper part of the plaintext
    RE = B + S[1]                         // B is the lower part of the plaintext
    for i = 1 to 12 do
        LE = ((LE ⊕ RE) <<< RE) + S[2*i]
        RE = ((RE ⊕ LE) <<< LE) + S[2*i+1]
    The processed LE is the upper part of the ciphertext,
    the processed RE is the lower part of the ciphertext.
• All additions are modulo 2^32, and x <<< y rotates the 32-bit word x left by the low 5 bits of y.
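To make the pseudocode concrete, here is a minimal Python software model of RC5-32/12/16. It adds one step the slide leaves implicit: before mixing, S[0..25] is initialized from the standard RC5 magic constants P32 = 0xB7E15163 and Q32 = 0x9E3779B9, and the 16 key bytes are packed little-endian into L[0..3], as in Rivest's RC5 description. The function names (rotl, expand_key, encrypt) are our own; this is a sketch for checking results, not the authors' MAP macros.

```python
"""Minimal software model of RC5-32/12/16 (key mixing + encryption)."""

MASK = 0xFFFFFFFF          # all arithmetic is modulo 2**32 (word size w = 32)
P32 = 0xB7E15163           # RC5 magic constants used to initialize S
Q32 = 0x9E3779B9
ROUNDS = 12
T = 2 * (ROUNDS + 1)       # 26 words in S


def rotl(x: int, n: int) -> int:
    """Rotate the 32-bit word x left by the low 5 bits of n."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK


def expand_key(key: bytes) -> list[int]:
    """Key mixing: return S[0..25] for a 16-byte key."""
    if len(key) != 16:
        raise ValueError("this sketch only handles 16-byte keys")
    # L[0..3]: key bytes packed little-endian into 32-bit words
    L = [int.from_bytes(key[4 * k:4 * k + 4], "little") for k in range(4)]
    # S[0..25]: initialized from the magic constants, then mixed with L
    S = [(P32 + k * Q32) & MASK for k in range(T)]
    A = B = i = j = 0
    for _ in range(3 * max(T, len(L))):          # 78 iterations
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % T
        j = (j + 1) % len(L)
    return S


def encrypt(S: list[int], a: int, b: int) -> tuple[int, int]:
    """Encrypt the plaintext words (a, b) with the expanded key S."""
    le = (a + S[0]) & MASK
    re = (b + S[1]) & MASK
    for i in range(1, ROUNDS + 1):
        le = (rotl(le ^ re, re) + S[2 * i]) & MASK
        re = (rotl(re ^ le, le) + S[2 * i + 1]) & MASK
    return le, re


if __name__ == "__main__":
    S = expand_key(bytes(16))                    # all-zero 128-bit key
    print(" ".join(f"{w:08X}" for w in encrypt(S, 0, 0)))
```

With an all-zero key and an all-zero plaintext the model reproduces the published RC5-32/12/16 test vector EEDBA521 6D8F4B15, which is also the ciphertext in the first row of the benchmark table.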
Key-Breaking Flowchart
• Set the 128-bit key (Counter) to all 0s
• Counter → Ki → Key Generation → Encryption of M → Ci
• Ci = C?
– N: advance the counter and test the next key
– Y: stop and return to the main program
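The flowchart is a plain exhaustive search. The sequential sketch below follows it step by step; in hardware the same loop is unrolled into a pipeline that accepts one Ki per clock. Two things are assumptions, not taken from the slides: the hypothetical helper key_from_counter maps the 128-bit counter to key bytes in big-endian order, and the cipher is passed in as a callable so that any function encrypting M under a 16-byte key can be used.

```python
from typing import Callable, Optional, Tuple

Block = Tuple[int, int]  # (upper word, lower word) of a 64-bit block


def key_from_counter(counter: int) -> bytes:
    """HYPOTHETICAL mapping: 128-bit counter -> 16 key bytes, big-endian."""
    return counter.to_bytes(16, "big")


def break_key(encrypt_with_key: Callable[[bytes, Block], Block],
              m: Block, c: Block, limit: int = 1 << 128) -> Optional[int]:
    """Exhaustive search: return the first counter whose key maps M to C, else None."""
    for counter in range(limit):           # start from the all-0s key and count up
        ki = key_from_counter(counter)     # Ki fed to key generation
        ci = encrypt_with_key(ki, m)       # encrypt M under Ki
        if ci == c:                        # Ci = C ?
            return counter                 # Y: stop and return to the main program
    return None                            # N for every key in the searched range


if __name__ == "__main__":
    # Toy stand-in cipher for demonstration only: XOR the low key word into M.
    def toy_cipher(key: bytes, m: Block) -> Block:
        k = int.from_bytes(key, "big") & 0xFFFFFFFF
        return (m[0] ^ k, m[1])

    print(break_key(toy_cipher, (0, 0), (42, 0), limit=1000))
```

The limit parameter exists only so that small experiments terminate; the full 2^128 key space is far beyond any software search.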
Condition & Implementation
• RC5-32/12/16
– Ciphertext: 32 * 2 bits = 64 bits
– 12 rounds
– Key: 16 * 8 bits = 128 bits
• Implement RC5 encryption using
– 12 rounds of encryption macros, each with a latency of 6 clocks
– 78 iterations of key-generation macros, each with a latency of 3 clocks
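The sizes on this slide follow from the RC5 parameters w (word size), r (rounds) and b (key bytes). The sketch below derives them and, under the simplifying assumption that the 78 key-generation macros and the 12 encryption macros are chained strictly in series with the stated latencies (no extra stages for the comparator or I/O), estimates the fill latency of the pipeline as 78 * 3 + 12 * 6 = 306 clocks. The function name and this latency model are illustrative, not taken from the design.

```python
import math


def rc5_pipeline_params(w: int = 32, r: int = 12, b: int = 16,
                        mix_latency: int = 3, round_latency: int = 6) -> dict:
    """Derive RC5-w/r/b sizes and a naive series pipeline depth (assumed model)."""
    u = w // 8                       # bytes per word
    t = 2 * (r + 1)                  # words in the expanded key table S
    c = max(1, math.ceil(b / u))     # words in L
    mix_steps = 3 * max(t, c)        # key-mixing iterations = key-generation macros
    depth = mix_steps * mix_latency + r * round_latency   # clocks to fill the pipeline
    return {"block_bits": 2 * w, "key_bits": 8 * b, "t": t, "c": c,
            "mix_steps": mix_steps, "depth_clocks": depth}


print(rc5_pipeline_params())
```

Once the pipeline is full it still delivers one tested key per clock; the depth only adds a one-time start-up delay.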
Design & Bottleneck
• Pipelined design
– Process one key every clock cycle in a pipelined fashion
• Data dependencies
– One of the features of RC5 is its extensive use of data-dependent rotations,
– Each S value is needed again every 26th step,
– Each L value is needed again every 4th step.
• Manual HDL-based realization of the pipeline proved to be time-consuming and error-prone.
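The 26-step and 4-step distances can be checked by replaying the index pattern of the key-mixing loop: step k reads S[k mod 26] and L[k mod 4], plus the A and B produced by step k - 1. The helper below (a hypothetical name) records, for each step, which earlier step last wrote the S and L words it reads.

```python
T, C = 26, 4            # sizes of S and L for RC5-32/12/16
STEPS = 3 * max(T, C)   # 78 key-mixing steps


def producers(steps: int = STEPS) -> list[tuple[int, int]]:
    """For each mixing step k, return (step that last wrote S[k % T],
    step that last wrote L[k % C]); -1 means the value is still the initial one."""
    last_s = [-1] * T   # step that last wrote S[i]
    last_l = [-1] * C   # step that last wrote L[j]
    deps = []
    for k in range(steps):
        i, j = k % T, k % C
        deps.append((last_s[i], last_l[j]))
        last_s[i] = k   # step k overwrites S[i] ...
        last_l[j] = k   # ... and L[j]
    return deps


deps = producers()
print(deps[26], deps[77])   # S and L producers of steps 26 and 77
```

In a one-key-per-clock pipeline this means every S value has to be carried forward across 26 stages and every L value across 4, which is the kind of bookkeeping that is tedious to get right by hand.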
Data Dependencies in Each Iteration
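[Figure: data dependencies in each iteration. The 78 key-mixing steps are drawn in three rows (steps 0 to 25, 26 to 51, 52 to 77); each step updates one S word (S0 to S25 in every row) and one L word (L0 to L3, repeating), S values pass from one row to the next ("from 25"/"to 26", "from 51"/"to 52"), and the last row feeds the RC5 encryption.]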
Solution
• Implement concurrently on one FPGA chip
– 78 key-initialization macros
– 12 encryption macros
• Connect the macros in a linear pipeline.
• The SRC compiler will balance the pipeline by inserting delay channels to make all macros run synchronously.
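One common way to balance a pipeline is to schedule every operator as early as possible and then pad each shorter path with as many registers as it has slack. The slides do not say whether the SRC compiler uses exactly this algorithm; the sketch below (function name hypothetical, Python 3.9+ for the standard-library graphlib module) only illustrates the idea on a small dataflow graph whose nodes carry latencies in clocks.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def balance(latency: dict[str, int],
            edges: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Return the number of delay registers to insert on each edge so that
    all inputs of every node arrive in the same clock (ASAP scheduling)."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    # TopologicalSorter expects a mapping {node: predecessors}
    order = TopologicalSorter({n: preds[n] for n in latency}).static_order()
    start = {}
    for n in order:
        # a node can start once its slowest input is ready
        start[n] = max((start[p] + latency[p] for p in preds[n]), default=0)
    # slack on an edge = clocks the value would otherwise arrive too early
    return {(s, d): start[d] - (start[s] + latency[s]) for s, d in edges}


if __name__ == "__main__":
    lat = {"key": 0, "mix0": 3, "mix1": 3, "mix2": 3}
    e = [("key", "mix0"), ("mix0", "mix1"), ("mix1", "mix2"), ("key", "mix2")]
    print(balance(lat, e))
```

In the demo graph the key word reused by mix2 bypasses two 3-clock stages, so its edge needs a 6-register delay channel while the chained edges need none.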
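Delay Channels Added by SRC Compiler
[Figure: delay channels of different lengths; a delay of n is built from n registers (Delay 1 = 1 reg, Delay 2 = 2 reg, Delay 5 = 5 reg), and a connection that needs no delay is a plain wire.]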
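Behaviorally, a delay channel of length n is a chain of n registers: a value that enters at clock k leaves at clock k + n, and length 0 is just a wire. The cycle-level model below is a minimal sketch; the class name is hypothetical and the registers are assumed to reset to 0.

```python
from collections import deque


class DelayChannel:
    """n-register delay line: a value pushed at clock k appears at clock k + n.
    n == 0 degenerates to a plain wire."""

    def __init__(self, n: int, reset_value: int = 0):
        if n < 0:
            raise ValueError("delay must be non-negative")
        self.n = n
        self.regs = deque([reset_value] * n)   # register contents, oldest first

    def clock(self, value: int) -> int:
        """Advance one clock: shift `value` in and return the last register's output."""
        if self.n == 0:
            return value                        # wire: no register at all
        self.regs.append(value)
        return self.regs.popleft()


if __name__ == "__main__":
    d = DelayChannel(2)
    print([d.clock(v) for v in [1, 2, 3, 4]])
```

Each channel costs n registers per bit of width, which is where the area cost of the automatic balancing comes from.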
Detailed flow
[Figure: detailed flow of the balanced pipeline. Key-mixing macros skey000 to skey025 (steps 0 to 25), skey100 to skey125 (steps 26 to 51) and skey200 onward (steps 52 to 77) are chained and fed with the key words kkey000 to kkey003 (and kkey010), starting from zero initial values; S values pass from one row to the next ("from 25"/"to 26", "from 51"/"to 52"), compiler-inserted delay channels sit between the stages, and the last stage feeds the RC5 encryption.]
Compilation Result
• Device utilization summary:
    Number of External IOBs            594 out of  1104    53%
    Number of LOCed External IOBs      594 out of   594   100%
    Number of Slices                 33790 out of 33792    99%
    Number of BUFGMUXs                   1 out of    16     6%
• Maximum Clock Frequency
Effectiveness of the Benchmark
Cipher Text        Expected Key                          Found Key                             Time (SRC) (ms)   Time (PC) (ms)
EEDBA521 6D8F4B15  00000000 00000000 00000000 00000000   00000000 00000000 00000000 00000000            97,342                0
C53073A4 8AFAE310  00000000 00000000 00000000 00010000   00000000 00000000 00000000 00010000            98,028          359,000
07CEC757 C72BCAE9  00000000 00000000 00000000 10000000   00000000 00000000 00000000 10000000         2,781,980    1,847,105,000
2F68DC4A ADBFACC6  00000000 00000000 00000000 20000000   00000000 00000000 00000000 20000000         5,466,274    5,251,282,000
6643CACD D1EDD161  00000000 00000000 00000001 00000000   00000000 00000000 00000001 00000000        43,050,562    Too large to simulate
51C6514A 4EF0A99B  00000000 00000000 00000010 00000000   00000000 00000000 00000010 00000000       687,318,493    Too large to simulate
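The per-row speed-up is Time (PC) / Time (SRC), which is unit-free because both columns use the same unit. Computing it for the four rows where both times were measured shows how it grows with the size of the search: the fixed start-up cost visible in the first row (97,342 for a key that is found immediately) dominates short searches, and the ratio reaches roughly 660x and 960x only for the two longest measured searches. The dictionary below simply copies those table values; the names are our own.

```python
# Times copied from the benchmark table (same unit in both columns).
# Label = last two words of the expected key; value = (Time (SRC), Time (PC)).
MEASURED = {
    "00000000 00000000": (97_342, 0),
    "00000000 00010000": (98_028, 359_000),
    "00000000 10000000": (2_781_980, 1_847_105_000),
    "00000000 20000000": (5_466_274, 5_251_282_000),
}


def speedup(src_time: float, pc_time: float) -> float:
    """Software time divided by SRC time."""
    return pc_time / src_time


for key, (src, pc) in MEASURED.items():
    print(f"{key}: {speedup(src, pc):7.1f}x")
```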
Conclusion
• The objective was realized: one 128-bit key is pushed into the processing chain every clock cycle,
• A speed-up of 1000x over SW and 300x over serial HW implementations was achieved,
• Because RC5 is parameterized (word size, number of rounds, key length), different MAP routines can be designed to fit distinct area and throughput requirements,
• The automated pipeline balancing of the SRC compiler proved to substantially decrease the development time of complex pipelined designs.