Parallel I/O on Compressed Data Files






The increase in processing power of modern computing hardware has not been accompanied by a proportional increase in the performance of storage technology, leading to an imbalance in cluster and parallel computing architectures in which input/output (I/O) operations may bottleneck overall system performance. Overcoming these I/O limitations requires sophisticated software solutions. One approach is to apply specialized parallel I/O algorithms to optimize data transfer. Another is to use data compression to reduce the amount of data transferred between processing and storage units. An underexamined area of research is the intersection of parallel I/O and data compression, and how these two techniques can be combined in High Performance Computing (HPC) environments. This dissertation presents a general model for incorporating data compression within existing parallel I/O algorithms and evaluates the performance benefits of performing parallel I/O on compressed data files. In particular, the dissertation presents an Open MPI-I/O (OMPIO) implementation which incorporates arbitrary compression libraries within the two-phase I/O algorithm through a new file format. The results indicate significant performance and space-saving benefits from this approach, and the parallel compression semantics presented in this dissertation provide a theoretical basis for future research in parallel I/O and data compression.



Parallel I/O, Data compression, MPI, Open MPI


Portions of this document appear in: Singh, Siddhesh Pratap, and Edgar Gabriel. "Parallel I/O on Compressed Data Files: Semantics, Algorithms, and Performance Evaluation." In 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 192-201. IEEE, 2020.