
Design and Implementation of FPGA-based
systolic array for
LZ Data Compression
By
Mohamed Ahmed Abd El Ghany Ahmed
2006
Overview
Introduction to Data Compression
Data Compression Methods
Systolic Array Operation in LZ
Proposed design (Design-P)
FPGA Implementation
Testing Application
Software simulation
Conclusions
Introduction to Data Compression

Data compression is the process of
converting an input data stream into another
data stream with a reduced size.

Benefits of data compression
Reduction of data storage requirements
Reduction of data transfer cost

Data Compression Methods
Lossless Data Compression
The decompressed data must always be identical to the original data
Methods: Run-Length Encoding, Statistical Methods, Dictionary Methods
Lossy Data Compression
The decompressed data are some approximation of the original data
Methods: Transform coding schemes, Vector Quantization schemes, Sub-band coding schemes
Lempel Ziv Algorithms
LZ77: LZSS, LZH
LZ78: LZW, LZMW
LZSS Idea
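Window = dictionary followed by lookahead buffer
Before: ab [cbbacde | bbade] aa....
Output codeword (1, Ip, Lmax) = (1, 2, 3)
Ip = position of the match in the dictionary, Lmax = length of the match
Shifting by Lmax (3)
After: bb [acdebba | deaae] fg....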
Codeword length Lc
Lc = ceil(log2(dictionary length)) + ceil(log2(lookahead buffer length)) + 1 bits
In the example, Lc = ceil(log2(7)) + ceil(log2(5)) + 1 = 3 + 3 + 1 = 7 bits
bba (3 bytes = 24 bits) is compressed to the codeword (1, 2, 3) (7 bits)
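The codeword-length arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `codeword_length_bits` is made up for illustration, and it assumes each field is rounded up to a whole number of bits, which is what makes the slide's example come out to 7.

```python
from math import ceil, log2

def codeword_length_bits(dictionary_len: int, lookahead_len: int) -> int:
    """Bits in a match codeword: pointer field + length field + 1 flag bit.

    Assumption: each field is rounded up to a whole number of bits.
    """
    pointer_bits = ceil(log2(dictionary_len))
    length_bits = ceil(log2(lookahead_len))
    return pointer_bits + length_bits + 1

lc = codeword_length_bits(7, 5)   # the slide's example window
raw_bits = 3 * 8                  # "bba" stored as three 8-bit symbols
print(lc, raw_bits)               # 7 24
```

In this example the three matched symbols shrink from 24 bits to a single 7-bit codeword.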
Non-Match Case
Window = dictionary followed by lookahead buffer
Before: ab [cbbacde | facde] aa....
Output codeword (0, S) = (0, f)
S = first symbol of lookahead buffer
Shifting by 1
After: bc [bbacdef | acdea] a....
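The match and non-match cases can be sketched in software as one encoding step. This is an illustrative model, not the hardware design: the names `longest_match`, `encode_step` and `min_match` are hypothetical, the pointer is assumed to be 1-based as in the example codeword (1, 2, 3), and a match is assumed to be allowed to run on into the lookahead buffer.

```python
def longest_match(dictionary: str, lookahead: str):
    """Return (pointer, length) of the longest match.

    pointer is 1-based (assumption taken from the slide's example);
    it is 0 when nothing matches.
    """
    window = dictionary + lookahead
    best_pos, best_len = 0, 0
    for i in range(len(dictionary)):
        length = 0
        # Compare window[i + j] with lookahead[j] until the first mismatch.
        while (length < len(lookahead)
               and window[i + length] == lookahead[length]):
            length += 1
        if length > best_len:  # strict '>' keeps the earliest best position
            best_pos, best_len = i + 1, length
    return best_pos, best_len

def encode_step(dictionary: str, lookahead: str, min_match: int = 1):
    """Return (codeword, shift) for one step.

    min_match is an assumed threshold; the slides only show that a
    symbol with no match at all is emitted as (0, S).
    """
    pos, length = longest_match(dictionary, lookahead)
    if length >= min_match:
        return (1, pos, length), length   # match case: shift by Lmax
    return (0, lookahead[0]), 1           # non-match case: shift by 1

print(encode_step("cbbacde", "bbade"))  # ((1, 2, 3), 3)
print(encode_step("cbbacde", "facde"))  # ((0, 'f'), 1)
```

Both calls reproduce the codewords and shift amounts of the two slide examples.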
Systolic Array Operation in LZ
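[Figure: dependence graph of the LZ match search. Labels: dictionary (length = n-Ls), lookahead buffer (length = Ls), window symbols X0 to X8, lookahead symbols Y0 to Y2, comparison results E0 to E5, match lengths L0 to L5. Axis i spans n-Ls positions and axis j spans Ls positions.]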
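The comparisons that such an array performs can be written out as a grid in software. The sketch below is an assumption-laden model rather than the circuit: it takes the last Ls symbols of the window as Y0..Y(Ls-1), compares X(i+j) with Yj for every dictionary position i, and defines the match length at position i as the run of equal symbols starting at j = 0. The function names are hypothetical.

```python
def comparison_grid(x: str, ls: int):
    """grid[i][j] is True when X[i+j] equals Y[j].

    Assumption: Y is the last ls symbols of the window x.
    """
    n = len(x)
    y = x[n - ls:]
    return [[x[i + j] == y[j] for j in range(ls)] for i in range(n - ls)]

def match_lengths(grid):
    """Length of the leading run of matches in each row (one value per i)."""
    lengths = []
    for row in grid:
        run = 0
        for equal in row:
            if not equal:
                break
            run += 1
        lengths.append(run)
    return lengths

# Window from the LZSS example: dictionary "cbbacde" + lookahead "bbade".
grid = comparison_grid("cbbacdebbade", 5)
print(match_lengths(grid))  # [0, 3, 1, 0, 0, 0, 0]
```

The largest entry (3, at the second position) corresponds to the codeword (1, 2, 3) of the LZSS example.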
Interleaved Design (Design-i)
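[Figure: interleaved systolic array. Labels: processing elements PE0 to PE2, delay elements D, lookahead inputs Y0 to Y2, window symbols X0 to X8, output Li. Input sequence as drawn: X7 X4 X6 X3 X5 X2 X4 X1 X3 X0.]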
The Match Results Block
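[Figure: match results block. Labels: incoming match length Li; a register and multiplexer holding Lmax, updated through a comparator (a>b); a counter for the Xi position and a counter for the Xi + [n-Ls/2] position, selected by a multiplexer to give the position p; a register and multiplexer holding Ip; a comparator (a=b) against n-Ls that raises "Code word ready".]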
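The behaviour suggested by the block's labels can be approximated with a small software model. This is a hedged sketch: the class and attribute names are invented, it uses a single position counter (it does not model the second counter for the Xi + [n-Ls/2] position), and it assumes the a>b comparator means the first position with the largest length is kept.

```python
class MatchResultsBlock:
    """Software stand-in for the block; names are illustrative only."""

    def __init__(self, dictionary_len: int):
        self.dictionary_len = dictionary_len  # n - Ls
        self.reset()

    def reset(self):
        self.l_max = 0      # largest match length seen so far
        self.i_p = 0        # 1-based position of that match
        self.position = 0   # position counter

    def clock(self, l_i: int) -> bool:
        """Consume one match length; return True when the codeword is ready."""
        self.position += 1
        if l_i > self.l_max:          # comparator a > b
            self.l_max = l_i
            self.i_p = self.position
        return self.position == self.dictionary_len  # comparator a = b

mrb = MatchResultsBlock(dictionary_len=7)
ready_flags = [mrb.clock(l_i) for l_i in [0, 3, 1, 0, 0, 0, 0]]
print(mrb.i_p, mrb.l_max, ready_flags[-1])  # 2 3 True
```

Feeding the match lengths of the LZSS example yields Ip = 2 and Lmax = 3, with the ready flag raised only after the last dictionary position.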
Proposed Design (Design-P)
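[Figure: proposed systolic array (Design-P). Labels: processing elements PE0 to PE2, delay elements D, lookahead inputs Y0 to Y2, input sequence X7....X2 X1 X0, match signals E0 to E2 feeding an L-encoder that outputs Li.]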
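Design-P PE and Design-i PE
[Figure: the two processing elements side by side. Labels: inputs Yj and Xi (w bits each), registers, delay elements D, a comparator (a=b) producing Ei; an accumulator with output Li also appears.]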
L-Encoder
[Figure: L-encoder with inputs E0, E1, E2 and outputs Li0, Li1.]
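One plausible reading of the L-encoder, given three match signals and a two-bit output, is a circuit that counts how many signals are asserted in a row starting from E0 and emits that count in binary. The sketch below encodes that reading only; it is an assumption, not a description of the actual logic, and the function name is hypothetical.

```python
def l_encoder(e_bits):
    """Return (length, bits) where bits[0] is Li0, bits[1] is Li1, ...

    Assumption: the match length is the run of asserted signals
    starting at E0.
    """
    length = 0
    for e in e_bits:
        if not e:
            break
        length += 1
    width = max(1, len(e_bits).bit_length())  # 3 inputs -> 2 output bits
    bits = [(length >> k) & 1 for k in range(width)]
    return length, bits

print(l_encoder([1, 1, 0]))  # (2, [0, 1])
print(l_encoder([1, 1, 1]))  # (3, [1, 1])
print(l_encoder([0, 1, 1]))  # (0, [0, 0])
```

Two output bits are enough for three inputs because the length can only take the values 0 to 3.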
MRB of Design-P and MRB of Design-2i
[Figure: the two match results blocks side by side. Labels shared by both panels: incoming length Li, register and multiplexer holding Lmax with a comparator (a>b), register and multiplexer holding Ip, position p, and a comparator (a=b) against n-Ls that raises "Code word ready". Further labels: a counter for the Xi position, a counter for the Xi + [n-Ls/2] position, a comparator (a=b) against Ls, and a "done" signal.]
Parallel Compression
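[Figure: parallel compression with two systolic arrays. Each array has processing elements PE0 to PE2, delay elements D, lookahead inputs Y0 to Y2, match signals E0 to E2 and an L-encoder. Labels: input symbols X0 to X8; outputs LI and LII.]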
LZ Compression Chip
[Figure: block diagram of the LZ compression chip. Blocks: FIFO, SALZC component, host controller. Signals: input sequence, Xi, Yi, Li, Control_FIFO, Control, code word.]
First-in-First-out (FIFO)
[Figure: FIFO built around a block RAM. Labels: Input_sequence, Write_counter producing Write_address, Read_counter producing read_address, controls.]
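A FIFO organised this way, with a memory addressed by separate write and read counters, behaves like a ring buffer. The following is a minimal software sketch under that assumption; the class name, the `depth` parameter and the full/empty checks are illustrative and not taken from the slides.

```python
class Fifo:
    """Ring buffer addressed by a write counter and a read counter."""

    def __init__(self, depth: int):
        self.depth = depth
        self.ram = [None] * depth   # stands in for the block RAM
        self.write_counter = 0
        self.read_counter = 0

    def write(self, symbol):
        if self.write_counter - self.read_counter >= self.depth:
            raise OverflowError("FIFO full")
        # The write address is the write counter wrapped to the RAM depth.
        self.ram[self.write_counter % self.depth] = symbol
        self.write_counter += 1

    def read(self):
        if self.read_counter == self.write_counter:
            raise IndexError("FIFO empty")
        # The read address is the read counter wrapped to the RAM depth.
        symbol = self.ram[self.read_counter % self.depth]
        self.read_counter += 1
        return symbol

fifo = Fifo(depth=4)
for s in "abc":
    fifo.write(s)
print(fifo.read(), fifo.read())  # a b
```

Hardware counters would wrap at a fixed bit width; the unbounded Python integers with a modulo are a simplification.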
The implementation results of Design-P and Design-i

Design (parameters)                         | Slices    | Slice flip-flops | 4-input LUTs | BRAMs   | Maximum frequency
Available (row label missing in the slide)  | 2352      | 4704             | 4704         | 14      | 200 MHz
Design-P (n=512, Ls=16)                     | 302 (12%) | 401 (8%)         | 398 (8%)     | 1 (7%)  | 113.766 MHz
Design-2i (n=512, Ls=16)                    | 459 (19%) | 500 (10%)        | 619 (13%)    | 1 (7%)  | 79.815 MHz
Design-P (n=1024, Ls=16)                    | 310 (13%) | 408 (8%)         | 419 (8%)     | 2 (14%) | 104.308 MHz
Design-2i (n=1024, Ls=16)                   | 471 (20%) | 511 (10%)        | 650 (13%)    | 2 (14%) | 79.700 MHz
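The relative differences between the two designs can be worked out directly from the slice counts and maximum frequencies listed above. The short script below does only that arithmetic; the dictionary layout and the helper `relative_change` are illustrative, and the numbers are copied from the results as given.

```python
# Slice counts and maximum clock frequencies from the implementation results.
results = {
    ("Design-P", 512): {"slices": 302, "fmax_mhz": 113.766},
    ("Design-2i", 512): {"slices": 459, "fmax_mhz": 79.815},
    ("Design-P", 1024): {"slices": 310, "fmax_mhz": 104.308},
    ("Design-2i", 1024): {"slices": 471, "fmax_mhz": 79.700},
}

def relative_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0

for n in (512, 1024):
    p, i2 = results[("Design-P", n)], results[("Design-2i", n)]
    area = relative_change(p["slices"], i2["slices"])
    speed = relative_change(p["fmax_mhz"], i2["fmax_mhz"])
    print(f"n={n}: slices {area:+.1f}%, max frequency {speed:+.1f}%")
```

With these figures Design-P uses roughly a third fewer slices in both configurations, and its maximum frequency is higher by roughly 43% at n=512 and roughly 31% at n=1024.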
I/O Interface of LZ Compression Chip
[Figure: LZ compression chip with its ports. Inputs: data input (8 bits), control signals (6 bits). Outputs: codeword (16 bits), codeword ready, end.]
Testing Application
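[Figure: test setup connecting the parallel port interface to the LZ compression chip. Labels: parallel port interface, latch of input stream (En), latch of control signals (En), Mux with select lines S1 and S2, LZ compression chip. Bus widths of 8, 6, 5 and 3 appear on the connections.]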
Data Flow of Testing Application
[Figure: the PC sends the data stream to the LZ compression chip, and the chip returns the compressed data to the PC.]
Decompression Architecture
[Figure: decompression architecture. Labels: input codeword, code checker, pointer, length, direct symbol, shift control, selector logic, registers R1, R2, ..., R(n-Ls-1), R(n-Ls), MUX (select), enable signals En, output.]
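Decompression can be sketched in software as a dictionary that shifts in every decoded symbol, with a pointer selecting where to copy from. This is an illustrative model, not the register-level design: the function name `decode` is hypothetical, the pointer is assumed to be a 1-based position in the current dictionary (oldest symbol first), and the dictionary is assumed to start pre-filled and non-empty.

```python
def decode(codewords, initial_dictionary: str) -> str:
    """Decode LZSS-style codewords: (1, pointer, length) or (0, symbol)."""
    dictionary = list(initial_dictionary)
    size = len(dictionary)            # the dictionary length stays fixed
    output = []
    for cw in codewords:
        if cw[0] == 1:                # match: copy `length` symbols
            _, pointer, length = cw
            buf = dictionary[:]
            for j in range(length):
                # Copy one symbol at a time so a match may run on into
                # symbols that were decoded earlier in this same step.
                buf.append(buf[pointer - 1 + j])
            symbols = buf[size:]
        else:                         # direct symbol
            symbols = [cw[1]]
        output.extend(symbols)
        dictionary = (dictionary + symbols)[-size:]  # shift new symbols in
    return "".join(output)

print(decode([(1, 2, 3), (1, 3, 2)], "cbbacde"))  # bbade
```

Starting from the dictionary cbbacde of the LZSS example, the codeword (1, 2, 3) followed by (1, 3, 2) restores the lookahead contents bbade.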
The Compression Rate (Rc)
Rc = clk × (Ls × w) / (n - Ls + 1)

Example:
The dictionary size (n) = 1k (1024)
Ls = 16
w = 8
clk = 104.308 MHz
Rc ≈ 13 Mbit per second
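The example can be reproduced numerically, reading the formula as Rc = clk × (Ls × w) / (n - Ls + 1), which is the reading that gives the quoted 13 Mbit per second. The function name `compression_rate_bps` is made up for this sketch.

```python
def compression_rate_bps(clk_hz: float, n: int, ls: int, w: int) -> float:
    """Rc = clk * (Ls * w) / (n - Ls + 1), in bits per second."""
    return clk_hz * (ls * w) / (n - ls + 1)

rc = compression_rate_bps(clk_hz=104.308e6, n=1024, ls=16, w=8)
print(f"{rc / 1e6:.1f} Mbit/s")  # 13.2 Mbit/s
```

The numerator is the number of bits in a full lookahead buffer (Ls symbols of w bits) and the denominator is n - Ls + 1, so the result rounds to the 13 Mbit per second stated in the example.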
Software Simulation
Data Sets
Silesia corpus
Calgary corpus
Experiments on the Calgary corpus
[Figure: compression ratio (vertical axis, 0.4 to 1.2) versus Ls (horizontal axis: 4, 8, 16, 32, 64, 128), one curve per dictionary size n = 256, 512, 1024, 2048, 4096, 8192, 16384.]
Experiments on the Silesia Corpus
[Figure: compression ratio (vertical axis, 0.4 to 1.2) versus Ls (horizontal axis: 4, 8, 16, 32, 64, 128), one curve per dictionary size n = 256, 512, 1024, 2048, 4096, 8192, 16384.]
Conclusions
The proposed implementation is area and speed efficient. Compared with the interleaved design, the compression rate is increased by more than 40% and the design area is decreased by more than 30%.
The prototype is implemented on a Xilinx Spartan-II FPGA.
The chip can be integrated into real-time systems so that data can be compressed and decompressed on the fly.
Future Work
Studying the effect of combining the proposed architecture for LZ data compression and elliptic curve cryptography in a single chip.
Studying fast string-matching techniques to accelerate the compression process.
By modifying the host controller and adding, e.g., dictionaries, the chip can be used for other string-matching-based LZ algorithms, such as LZ78 and LZW.
Thanks