Document 7601927
Download
Report
Transcript Document 7601927
AHPCC/NCSA WORKSHOP
Fast Fourier Transform Using FFTw
Guobin Ma1 ([email protected]),
Sirpa Saarinen2 ([email protected]),
and Paul M. Alsing1 ([email protected]),
1AHPCC, 2NCSA
4/18/00
Spring 2000 FFTw workshop
1
Contents
FFT basic
(Paul)
What is FFT and why FFT
FFTw
Outline of FFTW
(Guobin)
Characteristics
C routines
Performance and C example codes
(Sirpa)
Fortran wrappers and example codes
(Guobin)
Exercises
4/18/00
(skipped)
Spring 2000 FFTw workshop
2
FFT Basic
What is FFT and why FFT
by Paul Alsing
4/18/00
Spring 2000 FFTw workshop
3
Fourier Transform: frequency analysis of time series data.
DFT: Discrete Fourier Transform (N time/freq points)
FFT: Fast Fourier Transform: efficient implementation ~O(Nlog2N)
H f ht e
2 i f t
1
ht
2
N 1
H n hk e 2 i k n / N
dt
k 0
H f e
2 i f t
d
1
hk
N
N 1
H e
n 0
2 i k n / N
n
hk ht k , t k kt , k 0,1, , N 1
fn
4/18/00
n
, n N / 2, , N / 2 1
Nt
Spring 2000 FFTw workshop
4
Aliasing issues:
Let fc = Nyquist Frequency
= 1/(2t). A sine wave
sampled at fc will be sampled at
2 points, the peak and the trough.
Frequency components f > | fc |
will be falsely folded back into
the range -fc < f < fc.
4/18/00
Spring 2000 FFTw workshop
5
Fourier Transform: radix 2, Danielson-Lanczos
N 1
H n e 2 i k n / N hk
k 0
N / 2 1
e
2 i 2 k n / N
k 0
N / 2 1
e
hk
2 i k n / N / 2
k 0
N / 2 1
2 i 2 k 1 n / N
e
hk
k 0
hk W
N / 2 1
n
2 i k n / N / 2
n
2 i / N
e
h
,
W
e
k
k 0
H ne W n H no
where H ne is the nth component of the FT of length N/2 formed from the
even components of the original hk ' s; H no is the nth component of the FT
of length N/2 formed from the odd components of the original hk ' s
4/18/00
Spring 2000 FFTw workshop
6
Fourier Transform: radix 2, Danielson-Lanczos (cont.)
H n H ne W n H no ,
H nee W 2 n H neo W n H noe W 2 n H noo ,
W n
H
H
Hn
H
H
H
e
n
, H no
ee
n
W 4 n H neeo W 2 n H neoe W 4 n H neoo
oee
n
W 4 n H noeo
, H neo , H noe , H noo
,H
2n
ooe
n
,
W 4 n H noooo
log 2 N 8 steps
is of length N
eee
eoe
n
n
4/18/00
W H
eee
n
are of length N / 2
are of length N / 4
, H noee , H nooe , H neeo , H neoo , H noeo , H nooo
Spring 2000 FFTw workshop
are of length N / 8
7
Fourier Transform: radix 2, butterfly Cooley-Tukey algorithm
We finally get down to 1-point transforms such as
H neoeeoeooee hm (some input valu e) for some 0 m N - 1
The question is: which value of m corresponds to which pattern
of e’s and o’s?
The answer is:
Let {e=0,o=1}. Reverse the pattern of e’s and o’s and you will
have the value of m in binary.
4/18/00
Spring 2000 FFTw workshop
8
Bit reversal:
The Cooley-Tukey
algorithm first
rearranges the data
in bit reversed form,
then builds up the
transform in
N log2N iterations
(decimation in time).
4/18/00
eee
eeo
eee
eeo
eoe
eoe
eoo
eoo
oee
oee
oeo
oeo
ooe
ooe
eee
eee
Spring 2000 FFTw workshop
9
Ordering of
time series
(coord space)
and frequencies
in fourier
(momentum)
space.
4/18/00
Spring 2000 FFTw workshop
10
Example Application: Quantum Mechanics
Propagation of (dimensionless) Schrodinger Wave Function
x, t
2 x, t
i
V x, t x, t H x, t 0
2
t
x
x, t t e i H t x, t ,
H T V
e iT t / 2 e iV t e i T t / 2 x, t
e iV x1 ,t t x1 , t
iV t
In coordinate space, e
x, t
iV x N ,t t
e
x
,
t
N
e i1/ 2 k1 t / 2 ˆ k1 , t
iT t
In fourier (momentum) space, e
ˆ k , t
i1/ 2 k N2 t / 2
e
ˆ k N , t
2
11
Serial FFTs
y
x
y
x
4/18/00
Transpose data to keep
x data is contiguous in
memory (Fortran) Spring 2000 FFTw workshop y transforms continguous
12
in memory.
transpose
Parallel FFTs
P0
x
P1 y
P2
P3
P0
P1
x P2
P3
y
In parallel, all x transforms
transpose
are local operations on each
processor (no communication)
In performing the transpose
processors must perform an
All-to-All communication.
Outline of FFTw
By Guobin Ma
4/18/00
Spring 2000 FFTw workshop
14
Characteristics of FFTw
C routines generated by Caml-Light ML
1D/nD, real/complex data
Arbitrary input size, not necessary 2n
Serial/Parallel, Share/Distributed Memory
Faster than all others, high performance
Portable, automatically adapt to machine
4/18/00
Spring 2000 FFTw workshop
15
Two Phases of FFTw
Hardware dependent algorithm
Planner
‘Learn’ the fast way on your machine
Produce a data structure --‘plan’
Reusable
Executor
Compute the transform
Apply to all FFTw operation modes
1D/nD, complex/real, serial/parallel
4/18/00
Spring 2000 FFTw workshop
16
C Routines of FFTw
Routines
1D/nD complex
1D/nD real
Corresponding parallel (MPI) ones
Arguments
Special notes
Data formats
4/18/00
Spring 2000 FFTw workshop
17
1D Complex Transform
Typical call
#include <fftw.h>
…
{
fftw_complex in[N], out[N];
fftw_plan p;
…
p = fftw_create_plan(int n, fftw_direction dir, int flags);
…
fftw_one(p, in, out);
…
fftw_destroy_plan(p);
}
4/18/00
Spring 2000 FFTw workshop
18
1D Complex Transform (cont.)
Routines
fftw_plan fftw_create_plan(int n,
fftw_direction dir, int flags);
void fftw_one(fftw_plan plan, fftw_complex *in,
fftw_complex *out);
fftw_plan fftw_create_plan_specific(int n,
fftw_direction dir, int flags,
fftw_complex *in, int istride,
fftw_complex *out, int ostride);
4/18/00
Spring 2000 FFTw workshop
19
1D Complex Transform (cont.)
Routines (cont.)
void fftw(fftw_plan plan, int howmany,
fftw_complex *in, int istride, int idist ,
fftw_complex *out, int ostride, int odist);
fftw_destroy_plan(fftw_plan plan);
4/18/00
Spring 2000 FFTw workshop
20
1D Complex Transform (cont.)
Arguments
plan: data structure containing all the information
n: data size
dir: FFTW_FORWARD (-1), FFTW_BACKWORD (+1)
flags: FFTW_MESURE, FFTW_ESTIMATE, FFTW_OUT_PLACE,
FFTW_IN_PLACE, FFTW_USE_WISDOM, separated by |
howmany: number of transforms / input arrays
in, istride, idist: input arrays, in[i*istride+j*idist]
out, ostride, odist: output arrays, ...
4/18/00
Spring 2000 FFTw workshop
21
1D Complex Transform (cont.)
Notes
out of place (default), in[N], out[N]
in place, save memory, cost more time
ignore ostride and odist; ignore out
in-order output, 0 frequency at out[0]
unnormalized, factor of N
4/18/00
Spring 2000 FFTw workshop
22
nD Complex Transform
Routines, similar to 1D case, except …
fftwnd_plan fftwnd_create_plan(int rank,
const *int n, fftw_direction dir, int flags);
void fftwnd_one(fftwnd_plan plan, , );
fftwnd_plan fftw_create_plan_specific(int rank,
const *int n, fftw_direction dir, , , , , );
void fftwnd(fftwnd_plan plan, , , , , , , );
fftwnd_destroy_plan(fftwnd_plan plan);
4/18/00
Spring 2000 FFTw workshop
23
nD Complex Transform (cont.)
Arguments
rank: dimensionality of the arrays to be transformed
n: pointer to an array of rank - size of each dimension, e.g. n[8,4,5]
row-major for C, column-major for Fortran
Special routines for 2D and 3D cases
nd -> 2d, 3d
n_dim -> nx, ny
4/18/00
or nx, ny, nz
Spring 2000 FFTw workshop
24
1D Real Transform
Routines, similar to 1D complex case, except …
rfftw_plan rfftw_create_plan( , , );
void rfftw_one(rfftw_plan plan, fftw_real *in,
fftw_real *out);
rfftw_plan rfftw_create_plan_specific(int n,
fftw_direction dir, int flags, fftw_real *in,
int istride, fftw_real *out, int ostride);
void rfftw(rfftwnd_plan plan, int howmany,
fftw_real *in, int istride, int idist,
fftw_real *out, int ostride, int odist);
rfftw_destroy_plan(rfftw_plan plan);
4/18/00
Spring 2000 FFTw workshop
25
1D Real Transform (cont.)
Arguments
dir: FFTW_REAL_TO_COMPLEX = FFTW_FORWARD = -1
FFTW_COMPLEX_TO_REAL = FFTW_BACK_WARD = 1
others have the same meaning as before
4/18/00
Spring 2000 FFTw workshop
26
nD Real Transform
Routines, similar to 1D real case, but …
rfftwnd_plan rfftwnd_create_plan(int rank,
const *int n, fftw_direction dir, int flags);
void rfftwnd_one_real_to_complex(rfftwnd_plan plan,
fftw_real *in, fftw_complex *out);
void rfftwnd_one_complex_to_real(rfftwnd_plan plan,
fftw_complex *in, fftw_real *out);
void rfftwnd_real_to_complex(rfftwnd_plan plan,
int howmany, fftw_real *in, int istride,
int idist, fftw_complex *out, int ostride,
int odist);
4/18/00
Spring 2000 FFTw workshop
27
nD Real Transform (cont.)
Routines (cont.)
void rfftwnd_complex_to_real(rfftwnd_plan plan,
int howmany, fftw_complex *in, int istride,
int idist, fftw_real *out, int ostride,
int odist);
rfftwnd_destroy_plan(rfftwnd plan);
Special 2D and 3D routines
4/18/00
Spring 2000 FFTw workshop
28
nD Array Format
nD arrays stored as a single contiguous block
C order, Row-major order
First index most slowly, last most quickly
Fortran order, Column-major order
First index most quickly, last most slowly
Static Array - no problem
Dynamic Array - may have problem in nD case
4/18/00
Spring 2000 FFTw workshop
29
Parallel FFTw
Multi-thread
Skipped
MPI
nD complex
Routines
Notes
Data Layout
1D complex
nD real
4/18/00
Spring 2000 FFTw workshop
30
nD Complex MPI FFTw
Routines, similar to uniprocessor case, except mpi…
fftwnd_mpi_plan fftwnd_create_plan(mpi_comm comm,
int rank, const *int n, fftw_direction dir, int flags);
void
int
int
int
int
fftwnd_mpi_local_size(fftwnd_mpi_plan p,
*local_first, int *local_first_start,
*local_second_after_transpose,
*local_second_start_after_transpose,
*total_local_size);
local_data = (fftw_complex*) malloc(sizeof(fftw_complex)
* total_local_size);
work = (fftw_complex*) malloc(sizeof(fftw_complex)
* total_local_size);
4/18/00
Spring 2000 FFTw workshop
31
nD Complex MPI FFTw (cont.)
Routines (cont.)
void fftwnd_mpi(fftwnd_mpi_plan p, int n_fields,
fftw_complex *local_data, fftw_complex *work,
fftw_mpi_output_order output_order);
void fftw_mpi_destroy_plan(fftwnd_mpi_plan p);
4/18/00
Spring 2000 FFTw workshop
32
nD Complex MPI FFTw (cont.)
Notets
First argument: comm - MPI communicator
Data layout
All fftw_mpi are in-place
work:
Optional,
Same size as local_data,
great efficiency by extra storage
output_order: normal/transposed
transposed: performance improvements, need to reshape the
data manually, may have problem sometimes
4/18/00
Spring 2000 FFTw workshop
33
nD Complex MPI FFTw (cont.)
Data layout
Distributed data
Divided according to row (1st dimension) in C
Divided according to column (last dimension) in Fortran
Given plan, all other parameters regarding to data layout are
determined by fftwnd_mpi_local_size
total_local_size = n1/np*n1*n2…*nk*n_fields
transposed_order: n2 will be the 1st dimension in output
inverse transform n[n2,n1,n3,...,nk]
4/18/00
Spring 2000 FFTw workshop
34
1D Complex MPI FFTw
Routines, similar to nD case, except no nd…
fftw_mpi_plan fftw_create_plan(mpi_comm comm,
int n, fftw_direction dir, int flags);
void
int
int
int
int
4/18/00
fftw_mpi_local_size(fftw_mpi_plan p,
*local_n, int *local_n_start,
*local_n_after_transpose,
*local_start_after_transpose,
*total_local_size);
Spring 2000 FFTw workshop
35
1D Complex MPI FFTw (cont.)
Routines (cont.)
void fftw_mpi(fftw_mpi_plan p, int n_fields,
fftw_complex *local_data, fftw_complex *work,
fftw_mpi_output_order output_order);
void fftw_mpi_destroy_plan(fftw_mpi_plan p);
Generally worse speedup than nD, fit large size
4/18/00
Spring 2000 FFTw workshop
36
nD Real MPI FFTw
Similar to that for uniprocessor and complex MPI
Speedup 2, save 1/2 space at the expense of more
complicated data format
Can have transposed-order output data
No 1D Real MPI FFTw
4/18/00
Spring 2000 FFTw workshop
37
FFTw Fortran-Callable Wrappers
Routine names, append _f77 in C routine names
fftw/fftwnd/rfftw/rfttwnd ->
fftw_f77/fftwnd_f77/rfftw_f77/rfttwnd_f77
fftw_mpi/fftwnd_mpi ->
fftw_f77_mpi/fftwnd_f77_mpi
e.g. fftwnd_create_plan(3, n_dim,
FFTW_FORWARD, FFTW_ESTIMATE | FFTW_IN_PLACE)
-> fftwnd_f77_create_plan(plan, 3, n_dim,
FFTW_FORWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)
4/18/00
Spring 2000 FFTw workshop
38
Break
4/18/00
Spring 2000 FFTw workshop
39
FFTw Performance
By Sirpa Saarinen
4/18/00
Spring 2000 FFTw workshop
40
C Example Codes
By Sirpa Saarinen
4/18/00
Spring 2000 FFTw workshop
41
FFTW Fortran Wrappers
and Example Codes
By Guobin Ma
4/18/00
Spring 2000 FFTw workshop
42
FFTw Fortran-Callable Wrappers
Notes
Any function that returns a value is converted into a subroutines with an
additional (first) parameter.
No null in Fortran, must allocate and pass an array for out.
nD arrays, column-major, Fortran order
plan variables: be declared as integer
Constants
FFTW_FORWARD, FFTW_BACKWARD, FFTW_IN_PLACE, …
separated by ‘+’ instead of ‘|’
In file fortran/fftw_f77.i, fftw_f90.i, fftw_f90_mpi.i
4/18/00
Spring 2000 FFTw workshop
43
Fortran Examples
Source codes at AHPCC (tested on Turing, BB, SGI):
~gbma/workshop/fftw/codes or
http://www.arc.unm.edu/~gbma/Workshop/FFTW/codes
Complex data
1D serial, fftw_1d.f90
1D parallel, fftw_1d_p.f90
nD serial, fftw_3d.f90
nD Parallel
Normal order, fftw_3d_p_n.f90
Transposed order, fftw_3d_p_t.f90
4/18/00
Spring 2000 FFTw workshop
44
Fortran Examples (cont.)
1D case
Input
f ( x)
N 2
keikx dk
N 2
Forward output F (k ) (0,1,2,..., k ,..., N 1, N , N 1,... 1)
2
2
2
Inverse output f (x )
nD case
Input
f ( x, y, z ) k x k y k z e
i(kx xk y ykz z )
dk x dk y dk z
Forward output F (k x , k y , k z ) (0,1,2,..., k x k y k z ,...,1)
Inverse output
f ( x, y , z )
4/18/00
Spring 2000 FFTw workshop
45
1D Serial Fortran Example
FFTw codes
...
call fftw_f77_create_plan(plan_forward,N,
FFTW_FORWARD, FFTW_ESTIMATE)
call fftw_f77_create_plan(plan_reverse,N,
FFTW_BACKWARD,FFTW_ESTIMATE)
...
call fftw_f77_one(plan_forward,in,out)
...
call fftw_f77_one(plan_reverse,out,in)
...
call fftw_f77_destroy_plan(plan_forward)
call fftw_f77_destroy_plan(plan_reverse)
4/18/00
Spring 2000 FFTw workshop
&
&
46
1D Parallel Fortran Example
FFTw codes
...
call fftw_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,N, &
FFTW_FORWARD,FFTW_ESTIMATE)
...
call fftw_f77_mpi_local_sizes(p_fwd, local_n, local_start, &
local_n_after_trans, local_start_after_trans,
total_local_size)
...
allocate( psi_local(0:total_local_size-1) )
...
allocate( work(0:total_local_size-1) )
4/18/00
Spring 2000 FFTw workshop
47
1D Parallel Fortran Example (cont.)
FFTw codes (cont.)
...
call fftw_f77_mpi(p_fwd,1,psi_local,work,USE_WORK)
...
call fftw_f77_mpi_destroy_plan(p_fwd)
...
call fftw_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD,N, &
FFTW_BACKWARD,FFTW_ESTIMATE)
...
call fftw_f77_mpi(p_rvs,1,psi_local,work,USE_WORK)
...
call fftw_f77_mpi_destroy_plan(p_rvs)
4/18/00
Spring 2000 FFTw workshop
48
nD Serial Fortran Example
FFTw codes
call fftwnd_f77_create_plan(p_fwd,nd,n_dim, &
FFTW_FORWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
call fftwnd_f77_one(p_fwd,psi,0)
call fftwnd_f77_destroy_plan(p_fwd)
call fftwnd_f77_create_plan(p_rvs,nd,n_dim, &
FFTW_BACKWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
call fftwnd_f77_one(p_rvs,psi,0)
call fftwnd_f77_destroy_plan(p_rvs)
4/18/00
Spring 2000 FFTw workshop
49
nD Parallel Fortran Example
FFTw codes, normal order, nD local array
n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,&
nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast,
&
local_last_start, local_nlast2_after_trans,
&
local_last2_start_after_trans, total_local_size)
allocate( psi_local(0:nx-1,0:ny-1,0:local_nlast-1) )
allocate( work(0:nx-1,0:ny-1,0:local_nlast-1) )
4/18/00
Spring 2000 FFTw workshop
50
nD Parallel Fortran Example (cont.)
FFTw codes, normal order, nD local array (cont.)
call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_fwd)
call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_rvs)
4/18/00
Spring 2000 FFTw workshop
51
nD Parallel Fortran Example (cont.)
FFTw codes, transposed order, 1D local array
n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,&
nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast,
&
local_last_start, local_nlast2_after_trans,
&
local_last2_start_after_trans, total_local_size)
allocate( psi_local(0:total_local_size-1) )
allocate( work(0:total_local_size-1) )
4/18/00
Spring 2000 FFTw workshop
52
nD Parallel Fortran Example (cont.)
FFTw codes, transposed order, 1D local array (cont.)
call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_fwd)
n_dim(1)=nx; n_dim(2)=nz; n_dim(3)=ny
call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_rvs)
4/18/00
Spring 2000 FFTw workshop
53
nD Parallel Fortran Example (cont.)
Notes
Normal order
Easy to code, ‘low’ performance
Transposed order
‘High’ performance, complicated to code, user reorder data
Use-work
High efficiency, large memory space
4/18/00
Spring 2000 FFTw workshop
54
Run the Examples at AHPCC
Copy files to your directory
cp ~gbma/workshop/fftw/codes/*.* .
Compile
make filename.tur
make filename.bb
make filename.sgi
with link specification -lfftw -lfftw_mpi (only for MPI)
Run
BB: qsub -I -l nodes=2
mpirun -np 2 -machinefile $PBS_NODEFILE filename.bb
Turing: filename.tur
SGI: mpirun -np 2 filename.sgi
4/18/00
Spring 2000 FFTw workshop
55
References
Numerical Recipe (FOTRAN)
by / William T. Vetterling et al., New York : Cambridge
University Press, 1992
Numerical integration
by P. J. Davis & P. Rabinowitz, Waltham, Mass., Blaisdell
Pub. Co. 1967
www.fftw.org
FFTW User’s manual
by M. Frigo & S. G. Johnson
4/18/00
Spring 2000 FFTw workshop
56
Acknowledgement
Brain Baltz
installation of FFTw at AHPCC
running MPI at AHPCC
John Greenfield
setting up the grid access
Andrew Pineda
computer work environment at AHPCC
Brain Smith & Susan Atlas
many stimulated discussions
Many others ...
4/18/00
Spring 2000 FFTw workshop
57