Optimizing the Performance of Directive-based Programming Model for GPGPUs

Xu, Rengan 1987-

dc.contributor.advisor	Chapman, Barbara M.
dc.contributor.committeeMember	Eick, Christoph F.
dc.contributor.committeeMember	Shah, Shishir Kirit
dc.contributor.committeeMember	Subhlok, Jaspal
dc.contributor.committeeMember	Calandra, Henri
dc.creator	Xu, Rengan 1987-
dc.date.accessioned	2018-07-10T18:52:10Z
dc.date.available	2018-07-10T18:52:10Z
dc.date.created	May 2016
dc.date.issued	2016-05
dc.date.submitted	May 2016
dc.date.updated	2018-07-10T18:52:10Z
dc.description.abstract	Accelerators have been deployed on most major HPC systems and are expected to improve the performance of many applications. Accelerators such as GPUs offer immense compute capacity, but programming these devices is a challenge. OpenCL, CUDA and other vendor-specific models for accelerator programming offer high performance, but they are low-level models that demand excellent programming skills; moreover, programs written in them are time-consuming to write and debug. To simplify GPU programming, several directive-based programming models have been proposed, including HMPP, the PGI accelerator model, and OpenACC. OpenACC has now become established as the de facto standard. We evaluate and compare these models using several scientific applications. To study the implementation challenges and the principles and techniques of directive-based models, we built an open source OpenACC compiler on top of a mainstream compiler framework (OpenUH, a branch of Open64). In this dissertation, we present the techniques required to parallelize and optimize applications ported with the OpenACC programming model. We apply both user-level optimizations in the applications and compiler- and runtime-driven optimizations. The compiler optimization focuses on the parallelization of reduction operations inside nested parallel loops. To fully utilize all GPU resources, we also extend the OpenACC model to support multiple GPUs in a single node. Our application porting experience also revealed the challenge of choosing good loop schedules. The default loop schedule chosen by the compiler may not produce the best performance, so the user has to try different loop schedules manually to improve performance.
To solve this issue, we developed a locality-aware auto-tuning framework, based on the proposed memory access cost model, that helps the compiler choose optimal loop schedules and guides the user in choosing appropriate loop schedules.
dc.description.department	Computer Science, Department of
dc.format.digitalOrigin	born digital
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10657/3211
dc.language.iso	eng
dc.rights	The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject	GPU
dc.subject	OpenACC
dc.subject	OpenMP
dc.subject	Directives
dc.subject	Parallel programming
dc.subject	HPC
dc.subject	Programming models
dc.title	Optimizing the Performance of Directive-based Programming Model for GPGPUs
dc.type.dcmi	Text
dc.type.genre	Thesis
thesis.degree.college	College of Natural Sciences and Mathematics
thesis.degree.department	Computer Science, Department of
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Houston
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy

Files

Original bundle

Name: XU-DISSERTATION-2016.pdf
Size: 3.3 MB
Format: Adobe Portable Document Format

Collections

Published ETD Collection