Group Meeting_3.pptx

Fast Memory Addressing Scheme
for Radix-4 FFT Implementation
Authors:
Xin Xiao, Erdal Oruklu and Jafar Saniie
(Illinois Institute of Technology)
Source:
IEEE International Conference on
Electro/Information Technology, 2009 (EIT ’09)
Presented by Cheng-Chien Wu, Master's student, CSIE, CCU
1
Outline
– Introduction
– Radix4-FFT
– Related Work
– Proposed Method
– Experimental Results
– Conclusion
2
Introduction
• The Fast Fourier Transform (FFT) is
widely applied in speech
processing, image processing, and
communication systems.
• It is one of the key components of
various signal processing and
communications applications, such
as software-defined radio and
OFDM.
3
Introduction(cont’d)
4
Introduction(cont’d)
• The main objective
– This study is primarily concerned with
improving the performance of the
address generation unit of the FFT
processor by eliminating complex
components from the critical path.
5
Outline
– Introduction
– Radix4-FFT
– Related Work
– Proposed Method
– Experimental Results
– Conclusion
6
Introduction(cont’d)
• Important FFT issues
– High throughput
– FFT size
– Power consumption
– Low cost
– Area
7
Outline
– Introduction
– Radix4-FFT
– Related Work
– Proposed Method
– Conclusion
8
Radix-4
• The N-point discrete Fourier transform is
defined by
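In its standard form, with $x(n)$ the input sequence, $X(k)$ the output, and $W_N$ the twiddle factor (these symbol names are assumed here, following the usual convention):

$$
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j 2\pi / N}
$$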
9
Data Path of Radix-4
10
Butterfly Units
• The N-point FFT can be
decomposed into repeated
micro-operations called butterfly
operations. When the size of the
butterfly is r, the FFT operation is
called a radix-r FFT.
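For r = 4, one butterfly is a 4-point DFT, which needs only additions and multiplications by ±1 and ±j. A minimal Python sketch, with the twiddle-factor multiplications omitted and with function names chosen for illustration rather than taken from the paper:

```python
import cmath

def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) using only add/subtract and a multiply by -j."""
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = (b - d) * -1j  # multiplying by -j is a swap/negate in hardware
    return (t0 + t2, t1 + t3, t0 - t2, t1 - t3)

def dft4(x):
    """Reference 4-point DFT computed directly from the definition."""
    return tuple(
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 4) for n in range(4))
        for k in range(4)
    )

if __name__ == "__main__":
    x = (1 + 2j, -0.5j, 3, 2 - 1j)
    print(radix4_butterfly(*x))
    print(dft4(x))
```

The two printed tuples should agree up to floating-point rounding, since the butterfly is just a factored form of the 4-point DFT.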
11
Butterfly Units in Radix-4
12
Memory-based FFT
• In a memory-based FFT architecture,
only one butterfly unit is
implemented on the chip; this
butterfly unit executes all the
calculations recursively.
13
Execution Time
• If parallel and pipeline processing
techniques are used, an N-point
radix-r FFT can be executed in
$\frac{N}{r} \log_r N$ clock cycles.
• This indicates that a radix-4 FFT
can be four times faster than a
radix-2 FFT.
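The factor of four can be checked by evaluating (N/r)·log_r N for both radices. A small Python sketch (the helper name `fft_cycles` is an illustrative choice, and N is assumed to be an exact power of r):

```python
def fft_cycles(n, r):
    """Clock cycles (N/r) * log_r(N) for an N-point radix-r FFT."""
    passes = 0
    size = 1
    while size < n:
        size *= r
        passes += 1
    if size != n:
        raise ValueError("n must be an exact power of r")
    return (n // r) * passes

if __name__ == "__main__":
    n = 1024
    print(fft_cycles(n, 2))  # 512 butterflies per pass * 10 passes = 5120
    print(fft_cycles(n, 4))  # 256 butterflies per pass * 5 passes = 1280
```

Radix-4 halves the number of passes and halves the number of butterflies per pass, which is where the 4x ratio comes from.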
14
Outline
– Introduction
– Radix4-FFT
– Related Work
– Proposed Method
– Experimental Results
– Conclusion
15
Related Work
Year | Title | Source
1969 | Organization of Large Scale Fourier Processors | J. Assoc. Comput. Mach.
1976 | Simplified control of FFT hardware | IEEE Trans. Acoust., Speech, Signal Processing
1992 | Conflict free memory addressing for dedicated FFT hardware | IEEE Trans. Circuits Syst.
1999 | An effective memory addressing scheme for FFT processors | IEEE Trans. on Signal Process
2008 | An Efficient FFT Engine With Reduced Addressing Logic | IEEE Transactions on Circuits and Systems II
16
Data Path of Radix-2
17
Data Path of Radix-4
18
Outline
– Introduction
– Radix4-FFT
– Related Work
– Proposed Method
– Experimental Results
– Conclusion
19
Memory Banks
• Four memory banks are used to store the data.
20
Read Ports and Write Ports
• However, for pass 1 and pass 2, the four inputs
and four outputs of any butterfly stage belong
to the same memory bank.
• Since each memory bank is a two-port
memory, each memory bank can perform one
export (read) and one import (write) per
clock cycle.
• Therefore, four clock cycles are necessary to
perform four read and four write accesses in
pass 1 and pass 2.
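The cycle count follows from counting how many operands land in the same bank. A minimal Python sketch, assuming each bank serves exactly one read per clock cycle (the function name `read_cycles` and the bank numbering are illustrative, not from the paper):

```python
from collections import Counter

def read_cycles(operand_banks):
    """Cycles needed to read all operands when each bank allows one read per cycle."""
    return max(Counter(operand_banks).values())

if __name__ == "__main__":
    print(read_cycles([0, 1, 2, 3]))  # operands spread over four banks
    print(read_cycles([2, 2, 2, 2]))  # all four operands in one bank
```

The same count applies to the four writes, since each bank also accepts one write per cycle.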
21
Counter D
• Other main components of the
FFT processor are Counter D and
the barrel shifter. Counter D has
two parts:
– Pass counter P, which is $v = \log_4 N$
bits ($P_{v-1}$ to $P_0$)
– Butterfly counter B, which is
$m = \log_2 \frac{N}{4}$ bits ($B_{m-1}$ to $B_0$).
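As a worked example of the two widths, the sketch below computes v and m for a given N. It assumes N is a power of 4 and reads the butterfly counter width as m = log2(N/4), one count per butterfly in a pass; the function name is hypothetical.

```python
def counter_widths(n):
    """Return (v, m) with v = log4(N) for counter P and m = log2(N/4) for counter B."""
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of 4")
    log2_n = n.bit_length() - 1
    if log2_n % 2:
        raise ValueError("n must be a power of 4")
    v = log2_n // 2   # log4(N)
    m = log2_n - 2    # log2(N / 4), assumed reading of the slide
    return v, m

if __name__ == "__main__":
    for n in (16, 64, 1024):
        print(n, counter_widths(n))
```

For N = 64 this gives v = 3 and m = 4, so B counts the 16 butterflies of each pass.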
22
Barrel Shifter
• The barrel shifter generates all the
addresses for four memory banks
based on the pass number of the
FFT, which can be expressed as:
RR(counter B, 2p)
• where RR(counter B, 2p) means
rotate-right butterfly counter B by
2p bits, and p is the pass number
of FFT.
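The rotate-right operation RR(counter B, 2p) can be sketched in Python as follows. The counter width `m` is passed explicitly, and the names `rotate_right` and `bank_address` are illustrative; how the rotated value is then mapped onto the four banks is not shown here.

```python
def rotate_right(value, shift, width):
    """Rotate the low `width` bits of `value` right by `shift` positions."""
    if width == 0:
        return 0
    shift %= width
    mask = (1 << width) - 1
    value &= mask
    # Bits shifted out on the right re-enter on the left.
    return ((value >> shift) | (value << (width - shift))) & mask

def bank_address(b, p, m):
    """RR(counter B, 2p): rotate the m-bit butterfly counter right by 2p bits."""
    return rotate_right(b, 2 * p, m)

if __name__ == "__main__":
    m = 4  # e.g. a 4-bit butterfly counter
    for p in range(3):
        print(p, format(bank_address(0b1011, p, m), "04b"))
```

Because a rotation only rewires bits, no adder is needed on the address path, which is the property the scheme relies on.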
23
Twiddle Factor
• For twiddle factors $W_b$, $W_c$ and $W_d$, three
memory banks are used with the same address
generation logic. For pass p, this address is
given as:
• $B_{m-1} B_{m-2} \ldots B_{2p}\, 00 \ldots 0$ (the bits $B_{m-1}$ to $B_{2p}$ are followed by 2p zeros)
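Keeping the upper bits of B and forcing the lowest 2p bits to zero is a plain bit mask. A Python sketch under that reading of the slide (the function name `twiddle_address` and the explicit width parameter `m` are assumptions for illustration):

```python
def twiddle_address(b, p, m):
    """Keep bits B_{m-1}..B_{2p} of the m-bit counter B and zero the lowest 2p bits."""
    width_mask = (1 << m) - 1
    low_bits = (1 << (2 * p)) - 1
    return b & width_mask & ~low_bits

if __name__ == "__main__":
    m = 4
    for p in range(3):
        print(p, format(twiddle_address(0b1011, p, m), "04b"))
```

With a 4-bit counter holding 1011, pass 0 keeps all bits, pass 1 clears the lowest two, and pass 2 clears all four.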
24
For Larger FFT Size
• For FFTs of different lengths,
the control logic of
the multiplexers depends only on
the last three bits of the
counter, so the register and
multiplexer structures stay fixed,
resulting in a common
architecture for any
N-point FFT.
25
Logic Minimization
• After logic minimization, the logic
consists only of primitive gates, such as
AND/OR gates, driven by the least
significant bits of the butterfly
counter B.
26
Address Sequences(R0~R15)
27
Address Sequences(R16 ~R31)
28
Outline
– Introduction
– Radix4-FFT
– Related Work
– Proposed Method
– Experimental Results
– Conclusion
29
Experimental Results
30
Experimental Results
31
Outline
– Introduction
– Radix4-FFT
– Related Work
– Proposed Method
– Experimental Results
– Conclusion
32
Conclusions
• The proposed method for radix-4 FFT
avoids any addition in the address
generation, enabling a fast data path for
butterfly operations.
• The same concept can be extended to any
radix FFT, but the number of registers and
multiplexers differs with the radix:
a radix-r FFT needs $2r^2$ registers and
4r multiplexers.
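The resource formulas can be tabulated for a few radices. A short Python sketch (the function name and dictionary keys are illustrative):

```python
def addressing_resources(r):
    """Registers (2 * r^2) and multiplexers (4 * r) for a radix-r address generator."""
    return {"registers": 2 * r * r, "multiplexers": 4 * r}

if __name__ == "__main__":
    for r in (2, 4, 8):
        print(r, addressing_resources(r))
```

For radix-4 this gives 32 registers and 16 multiplexers, which appears consistent with the R0 to R31 address sequences listed earlier.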
33
Thanks for Listening
34