The FFT on a GPU Graphics Hardware 2003 July 27, 2003 Kenneth Moreland Sandia National Labs Edward Angel U.

The FFT on a GPU
Graphics Hardware 2003
July 27, 2003
Kenneth Moreland
Sandia National Labs
Edward Angel
U. of New Mexico
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,
for the United States Department of Energy’s National Nuclear Security Administration
under contract DE-AC04-94AL85000.
Overview
• Introduction
– Motivation, FFT review.
• FFT Techniques
– Exploitable FFT properties.
• Implementation
• Results
– Performance, applications, conclusions.
Motivation
• The Fourier transform is a principal tool for digital
image processing.
– Filtering.
– Correction.
– Compression.
– Classification.
– Generation.
• As such, should not our graphics hardware
support such a tool?
The Discrete Fourier Transform
• Converts data in the spatial or temporal domain
into frequencies the data comprise.
1 N 1
F  f x   F u    f x WNux
N x 0
N 1
F F u   f x    F u WNux
1
u 0
4
Graphics Hardware 2003
WN  e j 2
N
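The definitions above can be evaluated directly as a reference. The following is a minimal sketch, assuming the 1/N factor sits on the forward transform and that W_N = e^{-j2π/N}; the function names `dft` and `idft` are illustrative and not taken from the talk.

```python
import cmath


def dft(f):
    """Direct O(N^2) forward DFT, scaled by 1/N (assumed convention)."""
    n = len(f)
    w = cmath.exp(-2j * cmath.pi / n)  # W_N
    return [sum(f[x] * w ** (u * x) for x in range(n)) / n for u in range(n)]


def idft(F):
    """Direct O(N^2) inverse DFT; no scaling, conjugate exponent."""
    n = len(F)
    w = cmath.exp(-2j * cmath.pi / n)  # W_N
    return [sum(F[u] * w ** (-u * x) for u in range(n)) for x in range(n)]
```

With this scaling, `dft` followed by `idft` returns the original sequence up to rounding error.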
The Discrete Fourier Transform
• 2D transform can be computed by applying the
transform in one direction, then the other.
1 N 1 M 1
ux
vy


F  f x, y   F u, v  
f
x
,
y
W
W

M
N
MN y 0 x 0
N 1 M 1
F F u, v   f x, y    F u, v WMuxWNvy
1
v 0 u 0
DFT
IDFT
5
Graphics Hardware 2003
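The separability claim can be checked numerically. The sketch below assumes an image stored as a list of N rows of M values and the 1/(MN) forward scaling; `dft2_separable` and `dft2_direct` are hypothetical helper names used only for this illustration.

```python
import cmath


def dft(f):
    """1D forward DFT scaled by 1/N."""
    n = len(f)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(f[x] * w ** (u * x) for x in range(n)) / n for u in range(n)]


def dft2_separable(img):
    """2D DFT as a 1D transform along x, then a 1D transform along y."""
    rows = [dft(row) for row in img]                # rows[y][u]
    cols = [dft(list(col)) for col in zip(*rows)]   # cols[u][v]
    return [list(r) for r in zip(*cols)]            # result[v][u]


def dft2_direct(img):
    """2D DFT evaluated straight from the double sum, for comparison."""
    n, m = len(img), len(img[0])
    wm = cmath.exp(-2j * cmath.pi / m)  # W_M
    wn = cmath.exp(-2j * cmath.pi / n)  # W_N
    return [[sum(img[y][x] * wm ** (u * x) * wn ** (v * y)
                 for y in range(n) for x in range(m)) / (m * n)
             for u in range(m)]
            for v in range(n)]
```

Both functions should agree to rounding error; the separable form replaces one double sum per output value with two passes of 1D transforms.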
The Fast Fourier Transform
• Divide and Conquer Algorithm
– Input sequence is divided into subsequences
consisting of values from even and odd indices,
respectively.
f e x   f 2 x 
f o x  f 2 x  1
F u   F e u   WNu F o u 
6
Graphics Hardware 2003
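The even/odd split translates into a short recursive routine. This is a sketch under two assumptions: the input length is a power of two, and the 1/N scale is applied once at the end rather than inside the recursion. The names `fft_unscaled` and `fft` are illustrative.

```python
import cmath


def fft_unscaled(f):
    """Recursive decimation-in-time FFT without the 1/N factor."""
    n = len(f)
    if n == 1:
        return list(f)
    fe = fft_unscaled(f[0::2])  # transform of even-indexed samples
    fo = fft_unscaled(f[1::2])  # transform of odd-indexed samples
    half = n // 2
    # F_e and F_o are periodic with period N/2, hence the u % half lookups.
    return [fe[u % half] + cmath.exp(-2j * cmath.pi * u / n) * fo[u % half]
            for u in range(n)]


def fft(f):
    """Forward transform with the 1/N scale applied as a final step."""
    n = len(f)
    return [value / n for value in fft_unscaled(f)]
```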
Index Magic
• Do not use recursion.
– Use dynamic programming: iterate over entire array
computing all values for each recursive depth
together, like mergesort.
• Indexing is non-obvious.
– Unlike mergesort, recursive step does not divide
array into contiguous chunks.
– At any iteration, what partition does a given index
belong to, and where can one find the applicable
values of the sub-partitions?
Index Magic
• Common solution: rearrange data by reversing
the bits of indices.
– FFT can occur with contiguous partitions.
– Requires an extra data copy.
• Our solution: determine indexing in place.



Ai n  Ai 1 n  u N 2i  W2ui Ai 1 n  u N 2i  N 2i
u  n div N 2i
Note that the paper has a typo.
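The indexing rule can be exercised on the CPU. The sketch below reads the recurrence as A_i[n] = A_{i-1}[n + uN/2^i] + W_{2^i}^u A_{i-1}[n + uN/2^i + N/2^i] with u = n div (N/2^i). It additionally assumes that both source indices wrap modulo N; that wrap is an assumption of this sketch and is not stated on the slide. The function name is hypothetical, and each iteration reads one buffer and writes another, so no bit-reversal copy is needed.

```python
import cmath


def fft_direct_indexing(f):
    """Iterative FFT that computes source indices instead of bit-reversing."""
    n = len(f)
    k = n.bit_length() - 1
    if n < 1 or (1 << k) != n:
        raise ValueError("length must be a power of two")
    a = list(f)                           # A_0 is the input itself
    for i in range(1, k + 1):
        block = n >> i                    # N / 2^i
        nxt = [0j] * n                    # A_i, written from A_{i-1}
        for idx in range(n):
            u = idx // block              # u = n div (N / 2^i)
            even = (idx + u * block) % n  # assumed wrap modulo N
            odd = (even + block) % n
            twiddle = cmath.exp(-2j * cmath.pi * u / (1 << i))  # W_{2^i}^u
            nxt[idx] = a[even] + twiddle * a[odd]
        a = nxt
    return [value / n for value in a]     # final 1/N scale
```

After the last iteration the results sit in natural frequency order, which is what makes the extra reordering copy unnecessary.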
Fourier Symmetry of Real Sequences
• In general, even for real functions, the frequency
spectra contain imaginary values.
– Captures magnitude and phase shift of sinusoids.
• Brute force FFT doubles computation and storage
costs.
• But, Fourier transforms of real functions have
symmetry.
– $F(u) = F^*(N - u)$
– Values at $F(0)$ and $F(N/2)$ are real (because they are
conjugates of themselves).
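The symmetry is easy to confirm numerically. The following sketch uses a direct DFT and a hypothetical helper `is_conjugate_symmetric`; the tolerance value is an arbitrary choice for this illustration.

```python
import cmath


def dft(f):
    """Direct forward DFT scaled by 1/N."""
    n = len(f)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(f[x] * w ** (u * x) for x in range(n)) / n for u in range(n)]


def is_conjugate_symmetric(F, tol=1e-9):
    """True when F(u) equals the conjugate of F(N - u) for every u."""
    n = len(F)
    # (n - u) % n maps u = 0 onto itself, matching F(0) = F*(0).
    return all(abs(F[u] - F[(n - u) % n].conjugate()) < tol for u in range(n))
```

For real input the check holds and the imaginary parts at u = 0 and u = N/2 vanish; for a general complex input it usually fails.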
Fourier Transform of Real Functions
• Pick two functions, let them be
f(x) and g(x).
• Let h(x) = f(x) + j g(x).
– Note that there is no loss of
information.
• The FFT of h takes half the
time of brute-force FFTs of
f and g performed individually.
– Simply point to one row of
image as real components and
another as imaginary
components.
Untangling Fourier Transform Pairs
• Fourier transform is linear.
– H(u) = F(u) + j G(u)
• We can “untangle” using symmetry of F and G.
– Add and subtract H(u) and H(N – u) to cancel out
conjugate terms of F and G.
H u   H N  u   F u   F N  u   jGu   GN  u 
 2F u R  2 jGu R
H u   H N  u   2 jF u I  2Gu I
11
Graphics Hardware 2003
Untangling Fourier Transform Pairs
F u R 
H u R  H N  u R 
F u I  12 H u I  H  N  u I 
G u R  12 H u I  H  N  u I 
G u I   12 H u R  H  N  u R 
12
1
2
Graphics Hardware 2003
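The tangle and untangle steps can be sketched end to end: form h = f + jg, transform once, then separate F and G with the four formulas above. The direct `dft` stands in for the FFT, and the name `untangle` plus the convention that H(N) wraps to H(0) are assumptions of this sketch.

```python
import cmath


def dft(h):
    """Direct forward DFT scaled by 1/N (stand-in for the FFT)."""
    n = len(h)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(h[x] * w ** (u * x) for x in range(n)) / n for u in range(n)]


def untangle(H):
    """Recover F and G from H = F + jG, where f and g were real."""
    n = len(H)
    F, G = [], []
    for u in range(n):
        a = H[u]
        b = H[(n - u) % n]  # H(N - u), with H(N) taken as H(0)
        F.append(complex(0.5 * (a.real + b.real), 0.5 * (a.imag - b.imag)))
        G.append(complex(0.5 * (a.imag + b.imag), -0.5 * (a.real - b.real)))
    return F, G
```

A typical use is `F, G = untangle(dft([complex(a, b) for a, b in zip(f, g)]))`, which should match transforming `f` and `g` separately.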
Packing Transforms of Real Functions
• We can store Fourier
transform in an array
the same size as the
input.
– Throw away
conjugate duplicates.
– Throw away
imaginary values
known to be zero.
[Figure: packed array over indices 0 to N − 1, split around N/2 into a region of Real Values and a region of Imaginary Values.]
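One way to realize such packing is sketched below. The exact slot order is an assumption made for illustration and may differ from the layout in the talk: real parts of F(0) through F(N/2) go in slots 0 through N/2, and the imaginary part of F(u) for 0 < u < N/2 goes in slot N − u. The names `pack` and `unpack` are hypothetical.

```python
def pack(F):
    """Store the spectrum of a real sequence in N real numbers."""
    n = len(F)
    half = n // 2
    packed = [0.0] * n
    for u in range(half + 1):
        packed[u] = F[u].real      # real parts of F(0) .. F(N/2)
    for u in range(1, half):
        packed[n - u] = F[u].imag  # imaginary parts of F(1) .. F(N/2 - 1)
    return packed


def unpack(packed):
    """Rebuild the full complex spectrum from the packed form."""
    n = len(packed)
    half = n // 2
    F = [0j] * n
    F[0] = complex(packed[0], 0.0)        # known to be real
    F[half] = complex(packed[half], 0.0)  # known to be real
    for u in range(1, half):
        F[u] = complex(packed[u], packed[n - u])
        F[n - u] = F[u].conjugate()       # discarded duplicate restored
    return F
```

The count works out because N/2 + 1 real parts plus N/2 − 1 imaginary parts give exactly N stored values.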
Column-wise FFT
• We have two columns
with real values.
– Use same “tangled”
approach.
• All other columns are
complex numbers.
– Use regular FFT.
[Figure: columns of the row-transformed image; two columns are labeled Real, and the remaining columns are paired for complex values.]
Packing 2D Transforms of Real Functions
N 1
• Rows transformed
from complex values
are already packed
appropriately.
• The two rows
transformed from
real values are
untangled and
packed to follow
suite.
[Figure: packed 2D layout with one axis running from 0 to M − 1 and the other from 0 to N − 1, divided around M/2 and N/2 into regions of Real Values and Imaginary Values.]
Available Resources
• nVidia GeForce FX 5800 Ultra.
– Full 32-bit floating point pipeline and frame buffers.
– Fully programmable vertex and fragment units.
• Cg
– High level language for vertex and fragment
programs.
• Traditional CPU: 1.7 GHz Intel Xeon
– Freely available high performance FFT
implementations.
Implementation
• Using a SIMD model for parallel computation.
– Draw quadrilateral parallel to screen.
– Rasterizer invokes the same fragment program “in
parallel” over all pixels covered by quadrilateral.
– Inputs and outputs depend on the location of the pixel
on which the fragment program is running.
• We require many rendering passes.
– Use “render to texture” extension.
– Use two frame buffers: one for retrieving values of
last pass and one for storing results of current
computation.
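The two-buffer scheme can be emulated on the CPU to show the control flow. This is only a sketch of the idea, not Cg or OpenGL code: `run_passes` is a hypothetical name, each "kernel" plays the role of a fragment program, and the inner loop stands in for the rasterizer invoking it once per pixel.

```python
def run_passes(data, kernels):
    """Apply each kernel to every element, ping-ponging between two buffers."""
    src = list(data)         # holds the results of the previous pass
    dst = [None] * len(src)  # receives the results of the current pass
    for kernel in kernels:
        for i in range(len(src)):    # conceptually parallel over pixels
            dst[i] = kernel(src, i)  # output depends only on src and position
        src, dst = dst, src          # swap roles for the next pass
    return src
```

Because a kernel reads only from `src` and writes only its own slot in `dst`, the per-pixel invocations are independent of one another, which is what allows the hardware to run them in parallel.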
Implementation
[Figure: data flow from Images to Frequency Spectra. Tangled real and imaginary buffers move through FFT, Untangle, Scale, and Pass stages, yielding the untangled real and imaginary parts of F and G.]
Fragment Programs
• Written in Cg, compiled for GeForce FX.
Program     Arithmetic Instructions   Texture Instructions
FFT         27                        3
Untangle    4                         2
Scale       1                         1
Tangle      1                         2
Pass        0                         1
Multiply    66                        4
Applications
• Digital image filtering.
Applications
• Texture generation.
• Volume rendering.
Performance
Image Size   Rendering Rate (Hz)   Arithmetic (sec)   Texture Lookup (sec)
1024²        0.37                  1.9                0.6
512²         1.6                   0.44               0.13
256²         6.7                   0.09               0.03
128²         25                    0.01               0.007
• Computation speed: 2.5 GigaFLOPS
• Texture read rate: 3.4 GB/sec
Conclusions
• The Fourier transform on the GPU has many
potential applications.
• A well-established FFT on the CPU (FFTW) still
has an edge over the GPU implementation.
– Both the software and the hardware of the GPU are
first generation.
– Room for improvement.
Get the Cg Code
• http://www.cgshaders.org ?
• http://www.cs.unm.edu/~kmorel/documents/fftgpu
• [email protected]
Questions?