Challenges in Converting a Large Scale Proteomics Application to Another Programming Language

dc.contributor.advisor: Gabriel, Edgar
dc.contributor.committeeMember: Cheung, Margaret S.
dc.contributor.committeeMember: Alipour, M. Amin
dc.creator: Biddle, Nicholas
dc.date.accessioned: 2020-12-18T17:09:03Z
dc.date.available: 2020-12-18T17:09:03Z
dc.date.created: August 2020
dc.date.issued: 2020-08
dc.date.submitted: August 2020
dc.date.updated: 2020-12-18T17:09:04Z
dc.description.abstract: Cross-linked mass spectrometry has been gaining popularity lately as a relatively cheap and versatile method for providing macromolecular structural data. However, the software required for matching the ion fragments produced during the mass spectrometry experiments presents a scaling issue that can lead to very long run times. The problem is that matching the spectra present in the mass spectrometry data requires a database search. A full database search is O(n^2) in the number of entries in the database. Reducing the number of entries in the database can lead to inaccurate results. It is desirable to be able to perform a full database search as quickly as possible so that the database search is not such a bottleneck for these types of experiments. Many applications exist for performing the spectra matching required for cross-linked mass spectrometry experiments. However, none of these applications is ready for a high-performance computing environment. It is desirable to provide a proteomics search software package that can be executed on a cluster of computers. This project approaches this problem by converting an open-source proteomics search package from C# to C++, which is a more appropriate language for high-performance computing applications. As the program selected for this project is very large, this project only details the conversion of certain aspects of it. These aspects include file input and output functionality, unit test functionality, and providing functions and classes that exist in C# but are missing in C++. The converted functions and classes were evaluated using unit tests and execution time benchmarks. The unit tests were used to determine the correctness of the converted code, while the benchmarks were used to make a comparison between the original C# execution time and the converted C++ execution time.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10657/7272
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Programming Language, Conversion, Proteomics
dc.title: Challenges in Converting a Large Scale Proteomics Application to Another Programming Language
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Masters
thesis.degree.name: Master of Science

Files

Original bundle

Name: BIDDLE-THESIS-2020.pdf
Size: 811.76 KB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.43 KB
Format: Plain Text
Name: LICENSE.txt
Size: 1.82 KB
Format: Plain Text