High-Performance Sparse Fourier Transform on Parallel Architectures
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Fast Fourier Transform (FFT) is one of the most important numerical algorithms widely used in numerous scientific and engineering computations. With the emergence of big data problems, however, in which the size of the processed data can easily exceed terabytes, it is challenging to acquire, process and store a sufficient amount of data to compute the FFT in the first place. The recently developed \textit{sparse} FFT (sFFT) algorithm provides a solution to this problem. The sFFT can compute a compressed Fourier transform by using only a small subset of the input data, thus achieves significant performance improvement.
Modern homogeneous and heterogeneous multicore and manycore architectures are now part of the mainstream computing scene and can offer impressive performance for many applications. The computations that arise in sFFT lend it naturally to efficient parallel implementations. In this dissertation, we present efficient parallel implementations of the sFFT algorithm on three state-of-the-art parallel architectures, namely multicore CPUs, GPUs and a heterogeneous multicore embedded system. While the increase in the number of cores and memory bandwidth on modern architectures provide an opportunity to improve the performance through sophisticated parallel algorithm design, the sFFT is inherently complex, and numerous challenges need to address to deliver the optimal performance. In this dissertation, various parallelization and performance optimization techniques are proposed and implemented. Our parallel sFFT is more than 5x and 20x faster than the sequential sFFT on multicore CPUs and GPUs, respectively. Compared to full-size FFT libraries, the parallel sFFT achieves more than 9x speedup on multicore CPUs and 12x speedup on GPUs for a broad range of signal spectra.