PFA should not be confused with the mixed-radix generalization of the popular Cooley-Tukey algorithm, which also subdivides a DFT of size $n = n_1 n_2$ into smaller transforms of size $n_1$ and $n_2$. The latter algorithm can use any factors (not necessarily relatively prime), but it has the disadvantage that it also requires extra multiplications by roots of unity called twiddle factors, in addition to the smaller transforms. On the other hand, PFA has the disadvantages that it only works for relatively prime factors (e.g., it is useless for power-of-two sizes) and that it requires a more complicated re-indexing of the data based on the Chinese Remainder Theorem (CRT). Note, however, that PFA can be combined with mixed-radix Cooley-Tukey, with the former factorizing $n$ into relatively prime components and the latter handling repeated factors.
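As a rough illustration of that combination, the sketch below splits a size into pairwise relatively prime prime-power components. The helper name `coprime_components` is hypothetical (it is not taken from any FFT library), and trial division is used only for brevity. Under this scheme, PFA would separate the components from one another, while each prime-power component would be left to Cooley-Tukey.

```python
def coprime_components(n):
    """Split n into pairwise relatively prime prime-power factors.

    Hypothetical helper for illustration: PFA could separate these
    components, while Cooley-Tukey would handle each prime power.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    parts = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            # Collect every repeated factor of p into one prime power.
            while n % p == 0:
                n //= p
                q *= p
            parts.append(q)
        p += 1
    if n > 1:
        # Whatever remains is a single prime factor.
        parts.append(n)
    return parts


print(coprime_components(360))   # [8, 9, 5]
print(coprime_components(1024))  # [1024]
```

A power-of-two size such as 1024 yields a single component, which is the case where PFA alone offers no decomposition.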
PFA is also closely related to the nested Winograd FFT algorithm, where the latter performs the decomposed $n_1$ by $n_2$ transform via more sophisticated two-dimensional convolution techniques. Some older papers therefore also call Winograd's algorithm a PFA FFT.
(Although the PFA is distinct from the Cooley-Tukey algorithm, it is interesting to note that Good's 1958 work on the PFA was cited as inspiration by Cooley and Tukey in their famous 1965 paper. In fact, it was the only prior FFT work cited by them, as they were not then aware of the earlier research by Gauss and others.)
Algorithm
The PFA involves a re-indexing of the input and output arrays, which, when substituted into the DFT formula, transforms it into two nested DFTs (a two-dimensional DFT).
Re-indexing
Recall that the DFT is defined by the formula:
$$f_j = \sum_{k=0}^{n-1} x_k e^{-\frac{2\pi i}{n} jk}, \qquad j = 0, \ldots, n-1.$$
Suppose that $n = n_1 n_2$, where $n_1$ and $n_2$ are relatively prime. In this case, we can define a bijective re-indexing of the input $k$ and output $j$ by:
$$k = k_1 n_2 + k_2 n_1 \bmod n,$$
$$j = j_1 n_2^{-1} n_2 + j_2 n_1^{-1} n_1 \bmod n,$$
where $n_1^{-1}$ denotes the multiplicative inverse of $n_1$ modulo $n_2$ and vice-versa for $n_2^{-1}$; the indices $j_a$ and $k_a$ run from $0, \ldots, n_a - 1$ (for $a = 1, 2$). These inverses only exist for relatively prime $n_1$ and $n_2$, and that condition is also required for the first mapping to be bijective.
This re-indexing of $k$ is called the Ruritanian mapping (also Good's mapping), while this re-indexing of $j$ is called the CRT mapping. The latter refers to the fact that $j$ is the solution to the Chinese remainder problem $j = j_1 \bmod n_1$ and $j = j_2 \bmod n_2$.
(One could instead use the Ruritanian mapping for the output $j$ and the CRT mapping for the input $k$, or various intermediate choices.)
A great deal of research has been devoted to schemes for evaluating this re-indexing efficiently, ideally in-place, while minimizing the number of costly modulo (remainder) operations (Chan, 1991, and references).
DFT re-expression
The above re-indexing is then substituted into the formula for the DFT, and in particular into the product $jk$ in the exponent. Because $e^{2\pi i} = 1$, this exponent is evaluated modulo $n$: any $n_1 n_2 = n$ cross term in the $jk$ product can be set to zero. (Similarly, $f_j$ and $x_k$ are implicitly periodic in $n$, so their subscripts are evaluated modulo $n$.) The remaining terms give:
$$f_{j_1 n_2^{-1} n_2 + j_2 n_1^{-1} n_1} = \sum_{k_1=0}^{n_1-1} \left( \sum_{k_2=0}^{n_2-1} x_{k_1 n_2 + k_2 n_1} e^{-\frac{2\pi i}{n_2} j_2 k_2} \right) e^{-\frac{2\pi i}{n_1} j_1 k_1}.$$
The inner and outer sums are simply DFTs of size $n_2$ and $n_1$, respectively.
(Here, we have used the fact that $n_1^{-1} n_1$ is unity when evaluated modulo $n_2$ in the inner sum's exponent, and vice-versa for the outer sum's exponent.)
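The re-indexing and the nested-DFT form can be checked numerically. The following is a minimal sketch, not an optimized implementation: the function names `naive_dft` and `pfa_dft` are hypothetical, the sign convention $e^{-2\pi i jk/n}$ is assumed, and the small transforms are evaluated by direct summation purely to keep the example short. It uses the Ruritanian mapping for the input and the CRT mapping for the output, and it assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import cmath
from math import gcd


def naive_dft(x):
    """Direct O(n^2) DFT: f_j = sum_k x_k * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def pfa_dft(x, n1, n2):
    """DFT of length n1*n2 (n1, n2 relatively prime) as two nested DFTs."""
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1*n2 with gcd(n1, n2) == 1")
    inv_n1 = pow(n1, -1, n2)  # inverse of n1 modulo n2 (Python 3.8+)
    inv_n2 = pow(n2, -1, n1)  # inverse of n2 modulo n1

    # Ruritanian mapping of the input: k = k1*n2 + k2*n1 mod n.
    grid = [
        [x[(k1 * n2 + k2 * n1) % n] for k2 in range(n2)]
        for k1 in range(n1)
    ]

    # Inner sums: one size-n2 DFT per row (k2 -> j2), no twiddle factors.
    rows = [naive_dft(row) for row in grid]

    # Outer sums: one size-n1 DFT per column (k1 -> j1).
    f = [0j] * n
    for j2 in range(n2):
        col = naive_dft([rows[k1][j2] for k1 in range(n1)])
        for j1 in range(n1):
            # CRT mapping of the output index.
            j = (j1 * inv_n2 * n2 + j2 * inv_n1 * n1) % n
            f[j] = col[j1]
    return f


# Compare against the direct DFT for n = 15 = 3 * 5.
x = [complex((k * k) % 7, k) for k in range(15)]
err = max(abs(a - b) for a, b in zip(pfa_dft(x, 3, 5), naive_dft(x)))
print(err < 1e-9)  # True
```

No multiplications by twiddle factors appear between the two stages; the cost of that saving is the modular index arithmetic in the two mappings.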
References: