Evaluation of mpi4py for Natural Language Processing Scenarios
dc.contributor.advisor | Gabriel, Edgar | |
dc.contributor.committeeMember | Solorio, Thamar | |
dc.contributor.committeeMember | Lindner, Peggy | |
dc.creator | Saxena, Manvi 1985- | |
dc.date.accessioned | 2018-06-22T21:52:36Z | |
dc.date.available | 2018-06-22T21:52:36Z | |
dc.date.created | May 2018 | |
dc.date.issued | 2018-05 | |
dc.date.submitted | May 2018 | |
dc.date.updated | 2018-06-22T21:52:36Z | |
dc.description.abstract | Many Natural Language Processing (NLP) applications operating on large data sets are written in programming languages for which the Message Passing Interface (MPI) specification defines no bindings. Yet, with increasing problem sizes, these applications also require some form of parallel and distributed processing. The goal of this thesis is to evaluate the use of MPI with a non-traditional HPC programming language, Python, for NLP application scenarios. The thesis is divided into two parts. The first part evaluates the performance and functionality of mpi4py, a Python module providing MPI bindings, by comparing multiple point-to-point benchmarks against native C-based MPI benchmarks over InfiniBand and Gigabit Ethernet network interconnects. The results show that in many instances the communication performance of the Python benchmarks was on par with that of their C-based counterparts. In the second part of the thesis, a few application scenarios used in NLP, such as word count, n-gram count, and TF-IDF, were developed, and the mpi4py module was used to distribute data across different nodes for these scenarios and to evaluate performance. The results demonstrate that applying the mpi4py module in NLP scenarios can greatly reduce execution time. | |
dc.description.department | Computer Science, Department of | |
dc.format.digitalOrigin | born digital | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/10657/3102 | |
dc.language.iso | eng | |
dc.rights | The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s). | |
dc.subject | NLP | |
dc.subject | Python | |
dc.subject | MPI | |
dc.title | Evaluation of mpi4py for Natural Language Processing Scenarios | |
dc.type.dcmi | Text | |
dc.type.genre | Thesis | |
thesis.degree.college | College of Natural Sciences and Mathematics | |
thesis.degree.department | Computer Science, Department of | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | University of Houston | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science | |