Performance of Serialization Libraries in a High Performance Computing Environment






High performance computing is a subset of distributed computing in which a cluster of interconnected machines performs operations in parallel. Distributing the work across multiple cluster nodes reduces the time needed to complete it. This approach depends heavily on internode communication: nodes coordinate by passing messages among themselves, so the messaging must be very efficient. Each message's contents are serialized prior to transmission and deserialized upon receipt. Several libraries have emerged to facilitate serialization and deserialization, including Protocol Buffers, FlatBuffers, and MessagePack. The goal of this thesis is to evaluate the performance of these libraries within the context of a high performance computing software package. A parallelized mass spectrometry tool currently under development at the University of Houston serves as the evaluation infrastructure, and a new serialization mechanism is contributed to this tool using each of the three libraries. The libraries are evaluated holistically within the context of this software package, with metrics including execution time, hardware utilization, and general ease of development.



Serialization, High performance computing