Evaluation of mpi4py for Natural Language Processing Scenarios

dc.contributor.advisor: Gabriel, Edgar
dc.contributor.committeeMember: Solorio, Thamar
dc.contributor.committeeMember: Lindner, Peggy
dc.creator: Saxena, Manvi 1985-
dc.date.accessioned: 2018-06-22T21:52:36Z
dc.date.available: 2018-06-22T21:52:36Z
dc.date.created: May 2018
dc.date.issued: 2018-05
dc.date.submitted: May 2018
dc.date.updated: 2018-06-22T21:52:36Z
dc.description.abstract: Many Natural Language Processing (NLP) applications operating on large data sets are written in programming languages that do not have bindings in the Message Passing Interface (MPI) specification. Yet, with increasing problem sizes, these applications also need some form of parallel and distributed processing. The goal of this thesis is to evaluate the use of MPI with a non-traditional HPC programming language, Python, for NLP application scenarios. The thesis is divided into two parts. The first part evaluates the performance and functionality of mpi4py, a Python module providing MPI bindings, by comparing multiple point-to-point benchmarks against native C-based MPI benchmarks over InfiniBand and Gigabit Ethernet network interconnects. The results show that in many instances the communication performance of the Python benchmarks was on par with their C-based counterparts. In the second part, several NLP application scenarios, such as word count, n-gram count, and TF-IDF, were developed, and the mpi4py module was used to distribute data across nodes for these scenarios and to evaluate performance. The results demonstrate that applying the mpi4py module in NLP scenarios can greatly reduce execution time.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10657/3102
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: NLP
dc.subject: Python
dc.subject: MPI
dc.title: Evaluation of mpi4py for Natural Language Processing Scenarios
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Masters
thesis.degree.name: Master of Science

Files

Original bundle

Name: SAXENA-THESIS-2018.pdf
Size: 1.46 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.43 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text