
dc.contributor.advisor: Gabriel, Edgar
dc.creator: Saxena, Manvi 1985-
dc.date.accessioned: 2018-06-22T21:52:36Z
dc.date.available: 2018-06-22T21:52:36Z
dc.date.created: May 2018
dc.date.issued: 2018-05
dc.date.submitted: May 2018
dc.identifier.uri: http://hdl.handle.net/10657/3102
dc.description.abstract: Many Natural Language Processing (NLP) applications operating on large data sets are written in programming languages that do not have bindings in the Message Passing Interface (MPI) specification. Yet, with increasing problem sizes, these applications also necessitate some form of parallel and distributed processing. The goal of this thesis is to evaluate the utilization of MPI with a non-traditional HPC programming language, Python, for NLP application scenarios. The thesis is divided into two parts. The first part evaluates the performance and functionality of mpi4py, a Python module providing MPI bindings, by comparing multiple point-to-point benchmarks against native C-based MPI benchmarks over InfiniBand and Gigabit Ethernet network interconnects. The results show that in many instances the communication performance of the Python benchmarks was on par with their C-based counterparts. In the second part of the thesis, a few application scenarios used in NLP, such as word count, n-gram count, and TF-IDF, were developed, and the mpi4py module was used to distribute data across different nodes for these scenarios and to evaluate performance. The results demonstrate that applying the mpi4py module in NLP scenarios can greatly reduce execution time.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: NLP
dc.subject: Python
dc.subject: MPI
dc.title: Evaluation of mpi4py for Natural Language Processing Scenarios
dc.date.updated: 2018-06-22T21:52:36Z
dc.type.genre: Thesis
thesis.degree.name: Master of Science
thesis.degree.level: Masters
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.department: Computer Science, Department of
dc.contributor.committeeMember: Solorio, Thamar
dc.contributor.committeeMember: Lindner, Peggy
dc.type.dcmi: Text
dc.format.digitalOrigin: born digital
dc.description.department: Computer Science, Department of
thesis.degree.college: College of Natural Sciences and Mathematics

