The FFT on a GPU
Graphics Hardware 2003
July 27, 2003

Kenneth Moreland, Sandia National Labs
Edward Angel, U. of New Mexico

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Overview
• Introduction
  – Motivation, FFT review.
• FFT Techniques
  – Exploitable FFT properties.
• Implementation
• Results
  – Performance, applications, conclusions.

Motivation
• The Fourier transform is a principal tool for digital image processing.
  – Filtering.
  – Correction.
  – Compression.
  – Classification.
  – Generation.
• As such, should our graphics hardware not support such a tool?

The Discrete Fourier Transform
• Converts data in the spatial or temporal domain into the frequencies the data comprise.

$$\mathcal{F}\{f(x)\} = F(u) = \frac{1}{N} \sum_{x=0}^{N-1} f(x)\, W_N^{ux}$$

$$\mathcal{F}^{-1}\{F(u)\} = f(x) = \sum_{u=0}^{N-1} F(u)\, W_N^{-ux}$$

$$W_N = e^{-j 2\pi / N}$$

The Discrete Fourier Transform
• The 2D transform can be computed by applying the transform in one direction, then the other.

DFT:

$$\mathcal{F}\{f(x,y)\} = F(u,v) = \frac{1}{MN} \sum_{y=0}^{N-1} \sum_{x=0}^{M-1} f(x,y)\, W_M^{ux}\, W_N^{vy}$$

IDFT:

$$\mathcal{F}^{-1}\{F(u,v)\} = f(x,y) = \sum_{v=0}^{N-1} \sum_{u=0}^{M-1} F(u,v)\, W_M^{-ux}\, W_N^{-vy}$$

The Fast Fourier Transform
• Divide and conquer algorithm.
  – The input sequence is divided into two subsequences consisting of the values at even and odd indices, respectively.

$$f_e(x) = f(2x) \qquad f_o(x) = f(2x+1)$$

$$F(u) = F_e(u) + W_N^{u}\, F_o(u) \quad \text{(up to the constant scale factor)}$$

Index Magic
• Do not use recursion.
  – Use dynamic programming: iterate over the entire array, computing all values for each recursive depth together, like mergesort.
• Indexing is non-obvious.
  – Unlike mergesort, the recursive step does not divide the array into contiguous chunks.
  – At any iteration, which partition does a given index belong to, and where can one find the applicable values of the sub-partitions?

Index Magic
• Common solution: rearrange data by reversing the bits of indices.
  – FFT can occur with contiguous partitions.
  – Requires an extra data copy.
• Our solution: determine indexing in place.

$$A_i(n) = A_{i-1}\!\left(n + u\frac{N}{2^i}\right) + W_{2^i}^{u}\, A_{i-1}\!\left(n + u\frac{N}{2^i} + \frac{N}{2^i}\right), \qquad u = n \ \mathrm{div}\ \frac{N}{2^i}$$

Indices into A_{i-1} are presumably taken modulo N, since n + uN/2^i can exceed N − 1.

Note that the paper has a typo.

Fourier Symmetry of Real Sequences
• In general, the frequency spectrum of even a real-valued function contains imaginary values.
  – Captures magnitude and phase shift of sinusoids.
• A brute-force FFT that treats real input as complex doubles computation and storage costs.
• But Fourier transforms of real functions have symmetry.

$$F(u) = F^{*}(N - u)$$

  – Values at F(0) and F(N/2) are real (because they are conjugates of themselves).

Fourier Transform of Real Functions
• Pick two real functions; let them be f(x) and g(x).
• Let h(x) = f(x) + j g(x).
  – Note that there is no loss of information.
• The FFT of h takes half the time of performing brute-force FFTs of f and g individually.
  – Simply point to one row of the image as the real components and another as the imaginary components.

[Figure: two image rows labeled f and g.]

Untangling Fourier Transform Pairs
• The Fourier transform is linear.
  – H(u) = F(u) + j G(u)
• We can “untangle” using the symmetry of F and G.
  – Add and subtract H(u) and H(N − u) to cancel out the conjugate terms of F and G.

$$H(u) + H(N-u) = F(u) + F(N-u) + j\,[G(u) + G(N-u)] = 2F(u)_R + 2j\,G(u)_R$$

$$H(u) - H(N-u) = 2j\,F(u)_I - 2\,G(u)_I$$

Untangling Fourier Transform Pairs

$$F(u)_R = \tfrac{1}{2}\,[H(u)_R + H(N-u)_R]$$

$$F(u)_I = \tfrac{1}{2}\,[H(u)_I - H(N-u)_I]$$

$$G(u)_R = \tfrac{1}{2}\,[H(u)_I + H(N-u)_I]$$

$$G(u)_I = \tfrac{1}{2}\,[H(N-u)_R - H(u)_R]$$

Packing Transforms of Real Functions
• We can store the Fourier transform in an array the same size as the input.
  – Throw away conjugate duplicates.
  – Throw away imaginary values known to be zero.

[Figure: packed array layout, indexed 0 to N − 1, with regions labeled Real Values and Imaginary Values.]

Column-wise FFT
• We have two columns with real values.
  – Use the same “tangled” approach.
• All other columns are complex numbers.
  – Use regular FFT.
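The in-place indexing rule and the tangle/untangle steps from the preceding slides can be sketched on the CPU as follows. This is a minimal illustration in Python, not the authors' Cg code: the function names are hypothetical, the two lists `prev` and `cur` stand in for the two frame buffers, the modulo-N wrap of the source index is an assumption derived from the indexing formula, and the 1/N scale is applied once at the end.

```python
import cmath


def fft_in_place_indexing(f):
    """Forward DFT, F(u) = (1/N) * sum f(x) * W_N^(u*x), W_N = exp(-2j*pi/N).

    Iterative radix-2 FFT with no bit-reversal pass. Each stage reads from
    `prev` and writes to `cur` (two buffers, like ping-pong rendering).
    """
    n = len(f)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    prev = [complex(v) for v in f]
    for i in range(1, n.bit_length()):       # stages i = 1 .. log2(N)
        step = n >> i                        # N / 2^i
        cur = [0j] * n
        for idx in range(n):
            u = idx // step                  # u = n div (N / 2^i)
            w = cmath.exp(-2j * cmath.pi * u / (1 << i))   # W_{2^i}^u
            src = (idx + u * step) % n       # assumed wrap modulo N
            cur[idx] = prev[src] + w * prev[(src + step) % n]
        prev = cur
    return [v / n for v in prev]             # apply the 1/N scale once


def fft_two_real(f, g):
    """Transform two real sequences with a single complex FFT."""
    n = len(f)
    # Tangle: h(x) = f(x) + j g(x)
    h = fft_in_place_indexing([complex(a, b) for a, b in zip(f, g)])
    big_f, big_g = [], []
    for u in range(n):
        hu, hn = h[u], h[(n - u) % n]        # H(u) and H(N - u)
        big_f.append(complex(0.5 * (hu.real + hn.real),
                             0.5 * (hu.imag - hn.imag)))
        big_g.append(complex(0.5 * (hu.imag + hn.imag),
                             0.5 * (hn.real - hu.real)))
    return big_f, big_g


def dft_naive(f):
    """O(N^2) reference DFT with the same scaling, for checking results."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n)
                for x in range(n)) / n
            for u in range(n)]
```

Comparing `fft_two_real(f, g)` against `dft_naive(f)` and `dft_naive(g)` for short power-of-two inputs is a quick way to confirm that the untangled spectra match the individually computed ones.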
[Figure: image columns labeled Real, Real, and Paired for Complex.]

Packing 2D Transforms of Real Functions
• Rows transformed from complex values are already packed appropriately.
• The two rows transformed from real values are untangled and packed to follow suit.

[Figure: packed 2D layout with axes indexed 0 to M − 1 and 0 to N − 1, with regions labeled Real Values and Imaginary Values.]

Available Resources
• nVidia GeForce FX 5800 Ultra.
  – Full 32-bit floating point pipeline and frame buffers.
  – Fully programmable vertex and fragment units.
• Cg
  – High level language for vertex and fragment programs.
• Traditional CPU: 1.7 GHz Intel Xeon
  – Freely available high performance FFT implementations.

Implementation
• Using a SIMD model for parallel computation.
  – Draw a quadrilateral parallel to the screen.
  – The rasterizer invokes the same fragment program “in parallel” over all pixels covered by the quadrilateral.
  – Inputs and outputs depend on the location of the pixel on which the fragment program is running.
• We require many rendering passes.
  – Use the “render to texture” extension.
  – Use two frame buffers: one for retrieving values from the last pass and one for storing results of the current computation.

Implementation

[Figure: data-flow diagram of the rendering passes between Images and Frequency Spectra. Stages are labeled FFT, Untangle, Scale, and Pass; buffers are labeled Real Tangled, Imaginary Tangled, Real Untangled, and Imaginary Untangled, holding Real F, Imag. F, Real G, and Imag. G.]

Fragment Programs
• Written in Cg, compiled for GeForce FX.

Program     Arithmetic Instructions   Texture Instructions
FFT         27                        3
Untangle    4                         2
Scale       1                         1
Tangle      1                         2
Pass        0                         1
Multiply    66                        4

Applications
• Digital image filtering.

Applications
• Texture generation.
• Volume rendering.
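The digital image filtering application listed above can be sketched as a multiplication in the frequency domain: transform, zero the unwanted frequencies, transform back. The following is a minimal CPU illustration in Python, assuming a simple ideal low-pass mask; the function names and the `cutoff` parameter are hypothetical, and a slow O(N²) DFT is used for brevity rather than the GPU FFT described in the talk.

```python
import cmath


def dft(seq, inverse=False):
    """1D DFT with the 1/N factor on the forward transform only."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(seq[x] * cmath.exp(sign * 2j * cmath.pi * u * x / n)
               for x in range(n))
           for u in range(n)]
    return out if inverse else [v / n for v in out]


def dft2(img, inverse=False):
    """2D DFT: transform every row, then every column."""
    rows = [dft(list(r), inverse) for r in img]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]     # transpose back to row-major


def low_pass(img, cutoff):
    """Zero all frequencies above `cutoff` in either direction."""
    spec = dft2(img)
    n_rows, n_cols = len(img), len(img[0])
    for v in range(n_rows):
        for u in range(n_cols):
            # Indices above N/2 hold the conjugate (negative) frequencies.
            freq_u = min(u, n_cols - u)
            freq_v = min(v, n_rows - v)
            if freq_u > cutoff or freq_v > cutoff:
                spec[v][u] = 0j
    # The input is real, so the filtered result is real up to rounding.
    return [[val.real for val in row] for row in dft2(spec, inverse=True)]


# A 4x4 checkerboard riding on a constant level of 2.
image = [[2.0 + (-1) ** (x + y) for x in range(4)] for y in range(4)]
smoothed = low_pass(image, cutoff=1)
```

In this example the checkerboard pattern sits entirely at the highest frequency, so a cutoff of 1 removes it and leaves only the constant level.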
Performance

Image Size   Rendering Rate (Hz)   Arithmetic (sec)   Texture Lookup (sec)
1024²        0.37                  1.9                0.6
512²         1.6                   0.44               0.13
256²         6.7                   0.09               0.03
128²         25                    0.01               0.007

• Computation speed: 2.5 GigaFLOPS
• Texture read rate: 3.4 GB/sec

Conclusions
• The Fourier transform on the GPU has many potential applications.
• A well-established FFT on the CPU (FFTW) still has an edge over the GPU implementation.
  – GPU software and hardware are both first generation.
  – Room for improvement.

Get the Cg Code
• http://www.cgshaders.org ?
• http://www.cs.unm.edu/~kmorel/documents/fftgpu
• [email protected]

Questions?