Subhlok, Jaspal
2023-05-26
May 2022
2022-04-28
Portions of this document appear in: Singh, Siddhesh Pratap, and Edgar Gabriel. "Parallel I/O on Compressed Data Files: Semantics, Algorithms, and Performance Evaluation." In 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 192-201. IEEE, 2020.
https://hdl.handle.net/10657/14271
The increase in processing power of modern computing hardware has not been accompanied by a proportional increase in the performance of storage technology, leading to an imbalance in cluster and parallel computing architectures in which input-output (I/O) operations may bottleneck the overall performance of the system. Sophisticated software solutions are therefore needed to overcome limitations on I/O performance. One method is to apply specialized parallel I/O algorithms to optimize data transfer. Another is to use data compression to reduce the amount of data transferred between processing and storage units. An underexamined area of research is the intersection of parallel I/O and data compression, and how these two techniques can be combined in High Performance Computing (HPC) environments. This dissertation presents a general model for incorporating data compression within existing parallel I/O algorithms and evaluates the performance benefits obtained by performing parallel I/O on compressed data files. In particular, the dissertation presents an Open MPI-I/O (OMPIO) implementation that incorporates arbitrary compression libraries within the two-phase I/O algorithm through a new file format. The results indicate significant performance and space-saving benefits from this approach, and the parallel compression semantics presented in this dissertation provide a theoretical basis for future research in parallel I/O and data compression.
application/pdf
eng
The author of this work is the copyright owner.
UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
Parallel I/O
Data compression
MPI
Open MPI
Parallel I/O on Compressed Data Files
2023-05-26
Thesis
born digital