Challenges in Converting a Large Scale Proteomics Application to Another Programming Language

Date

2020-08

Abstract

Cross-linked mass spectrometry has recently gained popularity as a relatively cheap and versatile method for obtaining macromolecular structural data. However, the software that matches the ion fragments produced in these experiments scales poorly, which can lead to very long run times. The difficulty is that matching the spectra in the mass spectrometry data requires a database search, and a full database search is O(n^2) in the number of database entries. Reducing the number of entries can lead to inaccurate results, so it is desirable to perform the full search as quickly as possible and keep it from becoming a bottleneck for these experiments. Many applications exist for performing the spectra matching required for cross-linked mass spectrometry experiments. However, none of them is ready for a high-performance computing environment, so a proteomics search package that can run on a cluster of computers is needed. This project approaches the problem by converting an open-source proteomics search package from C# to C++, a language better suited to high-performance computing applications. Because the selected program is very large, this project details the conversion of only certain aspects of it: file input and output functionality, unit test functionality, and functions and classes that exist in C# but have no direct equivalent in C++. The converted functions and classes were evaluated using unit tests and execution time benchmarks. The unit tests were used to check the correctness of the converted code, while the benchmarks compared the execution time of the original C# code with that of the converted C++ code.
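
The quadratic cost mentioned in the abstract can be illustrated with a minimal C++ sketch. It assumes, for illustration only, that the cost arises from considering every pair of database entries as a possible cross-linked candidate. The function names, the mass-only matching rule, and the parameters are hypothetical and are not taken from the converted package.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns all index pairs (i, j) with i <= j whose
// combined mass plus the cross-linker mass lies within `tolerance` of the
// observed precursor mass. The nested loops visit every pair of entries,
// which is what makes a full search quadratic in the database size.
std::vector<std::pair<std::size_t, std::size_t>> findCrossLinkCandidates(
    const std::vector<double>& peptideMasses,
    double precursorMass,
    double linkerMass,
    double tolerance)
{
    std::vector<std::pair<std::size_t, std::size_t>> candidates;
    const std::size_t n = peptideMasses.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            const double total = peptideMasses[i] + peptideMasses[j] + linkerMass;
            if (std::fabs(total - precursorMass) <= tolerance) {
                candidates.emplace_back(i, j);
            }
        }
    }
    return candidates;
}

// Number of pairs examined by the loops above for n entries: n(n + 1) / 2.
std::size_t pairCount(std::size_t n)
{
    return n * (n + 1) / 2;
}
```

Under this assumption, a database of 1,000 entries already requires 500,500 pair comparisons per spectrum, which suggests why distributing the search across a cluster is attractive.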

Keywords

Programming Language, Conversion, Proteomics
