Transcript Slide 1
Design and Implementation of FPGA-based systolic array for LZ Data Compression
By Mohamed Ahmed Abd El Ghany Ahmed
2006

Overview
  Introduction to Data Compression
  Data Compression Methods
  Systolic Array Operation in LZ
  Proposed design (Design-P)
  FPGA Implementation
  Testing Application
  Software simulation
  Conclusions

Introduction to Data Compression
  Data compression is the process of converting an input data stream into another data stream with a reduced size.
  Benefits of data compression:
    Reduction of data storage requirements
    Reduction of data transfer cost

Data Compression Methods
  Lossless Data Compression: the decompressed data must always be identical to the original data.
    Run-Length Encoding
    Statistical Methods
    Dictionary Methods
  Lossy Data Compression: the decompressed data are some approximation of the original data.
    Transform coding schemes
    Vector Quantization schemes
    Sub-band coding schemes

Lempel Ziv Algorithms
  LZ77: LZSS, LZH
  LZ78: LZW, LZMW

LZSS Idea
  Window (passed symbols, dictionary, lookahead buffer, upcoming symbols): ab cbbacde bbade aa....
  Output codeword: (1, Ip, Lmax) = (1, 2, 3)
  Window after shifting by Lmax (3): bb acdebba deaae f g....
  Codeword length Lc:
    Lc = ceil(log2(dictionary length)) + ceil(log2(lookahead buffer length)) + 1 bits
  In the example, Lc = ceil(log2(7)) + ceil(log2(5)) + 1 = 3 + 3 + 1 = 7 bits.
  "bba" is 3 bytes = 24 bits; its codeword (1, 2, 3) is compressed to 7 bits.

Non-Match Case
  Window (passed symbols, dictionary, lookahead buffer, upcoming symbols): ab cbbacde f acde aa....
  Output codeword: (0, S), where S is the first symbol of the lookahead buffer; here (0, f).
  Window after shifting by 1: bc bbacdef acdea a....

Systolic Array Operation in LZ
  [Figure: dependence graph. Dictionary symbols X0..X8 (length n - Ls) are compared with lookahead buffer symbols Y0..Y2 (length Ls); the comparisons produce match signals E0..E5 and match lengths L0..L5, indexed by i and j.]

Interleaved Design (Design-i)
  [Figure: processing elements PE0..PE2 connected through delay elements D; the input sequence interleaves Y0..Y2 with X0..X8, and the array outputs Li.]

The Match Results Block
  [Figure: Lmax register with a multiplexer and an a>b comparator on Li and Lmax; counters for the Xi position and the Xi + [n-Ls/2] position selected by a multiplexer into the Ip register; an a=b comparator against n - Ls raises "Code word ready".]

Proposed Design (Design-P)
  [Figure: PE0..PE2 receive the sequence X7....X2 X1 X0 and hold Y0..Y2; their match signals E0..E2 feed an L-encoder that outputs Li.]

Design-P PE and Design-i PE
  [Figure: the Design-P PE holds Xi in a w-bit register and compares it with Yj (comparator a=b) to produce Ei. The Design-i PE has the same register and comparator, plus an accumulator and delay elements D that produce Li.]

L-Encoder
  [Figure: inputs E0, E1, E2; outputs Li0, Li1.]

MRB of Design-P and MRB of Design-2i
  [Figure: the MRB of Design-P uses an Lmax register, a multiplexer, a>b comparators on Li, Lmax and Ls, an Ip register, a single counter and an a=b comparator against n - Ls, with "done" and "Code word ready" outputs. The MRB of Design-2i uses an Lmax register, an a>b comparator, two counters (Xi position and Xi + [n-Ls/2] position), multiplexers, an Ip register and an a=b comparator against n - Ls, with a "Code word ready" output.]

Parallel Compression
  [Figure: two PE0..PE2 arrays, each with its own L-encoder, share the input sequence X0..X8 and produce LI and LII.]

LZ Compression Chip
  [Figure: the input sequence enters a FIFO that supplies Xi and Yi to the SALZC component; the SALZC component returns Li and the code word; the blocks are linked by Control and Control_FIFO signals to the host controller.]

First-in-First-out (FIFO)
  [Figure: Block RAM with Input_sequence, a Write_counter driving Write_address, a Read_counter driving read_address, and control signals.]

The implementation results of Design-P and Design-i

  Design                       Slices      Slice Flip Flops  4 input LUTs  BRAMs     Maximum Frequency
  Available                    2352        4704              4704          14        200 MHz
  Design-P (n=512, Ls=16)      302 (12%)   401 (8%)          398 (8%)      1 (7%)    113.766 MHz
  Design-2i (n=512, Ls=16)     459 (19%)   500 (10%)         619 (13%)     1 (7%)    79.815 MHz
  Design-P (n=1024, Ls=16)     310 (13%)   408 (8%)          419 (8%)      2 (14%)   104.308 MHz
  Design-2i (n=1024, Ls=16)    471 (20%)   511 (10%)         650 (13%)     2 (14%)   79.700 MHz

I/O Interface of LZ Compression Chip
  [Figure: LZ compression chip with the signals Data input, Control signals, codeword, Codeword ready and end; the bus widths shown are 8, 6 and 16.]

Testing Application
  [Figure: parallel port interface connected to the LZ compression chip through a latch of the input stream, a latch of the control signals and a multiplexer, with enable (En) and select (S1, S2) lines; the bus widths shown are 8, 6, 5 and 3.]

Data Flow of Testing Application
  [Figure: the PC sends the data stream to the LZ compression chip, which returns the compressed data to the PC.]

Decompression Architecture
  [Figure: the input codeword passes through a code checker that routes it to the pointer, the length or the direct symbol; selector logic and shift control drive the registers R1, R2, ..., R(n-Ls-1), R(n-Ls), and a multiplexer selects the output.]

The Compression Rate (Rc)
  Rc = clk × (Ls × w) / (n - Ls + 1)
  Example: the dictionary size (n) = 1k, Ls = 16, w = 8, clk = 104.308 MHz
  Rc = 13 Mbit per second

Software Simulation
  Data sets:
    Silesia corpus
    Calgary corpus

Experiments on the Calgary corpus
  [Figure: compression ratio (y-axis, 0.4 to 1.2) versus Ls values (x-axis: 4, 8, 16, 32, 64, 128), with one curve for each of n = 256, 512, 1024, 2048, 4096, 8192 and 16384.]

Experiments on the Silesia Corpus
  [Figure: compression ratio (y-axis, 0.4 to 1.2) versus Ls values (x-axis: 4, 8, 16, 32, 64, 128), with one curve for each of n = 256, 512, 1024, 2048, 4096, 8192 and 16384.]

Conclusions
  The proposed implementation is area and speed efficient.
  Compared with Design-2i in the implementation results, the compression rate is increased by more than 40% and the design area is decreased by more than 30%.
  The prototype is implemented on a Xilinx Spartan-II FPGA.
  The chip can be incorporated into real-time systems so that data can be compressed and decompressed on the fly.

Future Work
  Studying the effect of combining the proposed architecture for LZ data compression with elliptic curve cryptography in a single chip.
  Studying fast string-matching techniques to accelerate the compression process.
  By modifying the host controller and including, e.g., dictionaries, our chip can be used for other string-matching based LZ algorithms, such as LZ78 and LZW.

Thanks
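Worked check of the "LZSS Idea" and "The Compression Rate (Rc)" slides. The arithmetic on those two slides can be reproduced with a short Python sketch. It is only an illustration, not the hardware design: the function names are hypothetical, and the match search is a plain software loop that looks only inside the dictionary. Two assumptions are made. First, each logarithm in the codeword-length formula is rounded up to a whole number of bits, which is what gives the slide's 7 bits. Second, the rate formula is read as Rc = clk × Ls × w / (n - Ls + 1), which reproduces the slide's 13 Mbit per second. Pointers here are 0-based; the slides do not state their indexing convention.

```python
def ceil_log2(value: int) -> int:
    """Smallest number of bits b with 2**b >= value (value >= 1)."""
    return (value - 1).bit_length()


def codeword_length(dict_len: int, lookahead_len: int) -> int:
    """Bits in a match codeword: pointer field + length field + 1 flag bit."""
    return ceil_log2(dict_len) + ceil_log2(lookahead_len) + 1


def compression_rate(clk_hz: float, n: int, ls: int, w: int) -> float:
    """Assumed reading of the slide: Rc = clk * (Ls * w) / (n - Ls + 1), in bit/s."""
    return clk_hz * ls * w / (n - ls + 1)


def longest_match(dictionary: str, lookahead: str) -> tuple[int, int]:
    """Return (pointer, length) of the longest lookahead prefix found in dictionary.

    Simplification: a match may not run past the end of the dictionary.
    Returns (0, 0) when not even the first lookahead symbol matches.
    """
    best_ptr, best_len = 0, 0
    for ptr in range(len(dictionary)):
        length = 0
        while (length < len(lookahead)
               and ptr + length < len(dictionary)
               and dictionary[ptr + length] == lookahead[length]):
            length += 1
        if length > best_len:
            best_ptr, best_len = ptr, length
    return best_ptr, best_len


if __name__ == "__main__":
    # Window from the "LZSS Idea" slide: dictionary "cbbacde", lookahead "bbade".
    print(longest_match("cbbacde", "bbade"))
    print(codeword_length(7, 5))
    # Example from the compression rate slide: n = 1024, Ls = 16, w = 8.
    print(compression_rate(104.308e6, 1024, 16, 8) / 1e6)
```

With the match-case window the search finds a 3-symbol match ("bba"), and with the non-match window ("cbbacde" against a lookahead starting with "f") it finds no match, which corresponds to emitting the (0, S) codeword.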