Contributor: Fu, Xin
Date accessioned: 2019-09-18
Date available: 2019-09-18
Date created: August 2017
Date issued: 2017-08
URI: https://hdl.handle.net/10657/4820

Abstract: Processing-in-memory (PIM) offers a viable solution to the memory wall crisis that has plagued memory systems for decades. Owing to recent advances in 3D stacking technology, PIM provides an opportunity to reduce both energy and data-movement overheads, which are primary concerns in the computer architecture community today. General-purpose GPU (GPGPU) systems, most of whose emerging applications are data-intensive, require large volumes of data to be transferred at a fast pace to keep the processing units busy, placing enormous pressure on the memory system. To explore the potential of PIM technology in solving the memory wall problem, in this research we integrate PIM technology with GPGPU systems and develop a mechanism that dynamically identifies candidate thread blocks and offloads them to PIM cores. Our offloading mechanism shows a significant performance improvement (30% on average and up to 2.1x) compared to the baseline GPGPU system without block offloading.

Format: application/pdf
Language: eng
Rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
Subjects: Processing in-memory; GPGPU; Offloading
Title: Integrating Processing In-Memory (PIM) Technology into General Purpose Graphics Processing Units (GPGPU) for Energy Efficient Computing
Date: 2019-09-18
Type: Thesis
Description: born digital
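The abstract above describes a mechanism that dynamically identifies candidate thread blocks and offloads them to PIM cores. As a rough illustration only, the C++ sketch below shows one way such a decision could be expressed: a per-block memory-intensity heuristic with a tunable threshold. The BlockProfile structure, the intensity metric, and the threshold value are assumptions made for illustration and are not taken from the thesis.

// Minimal sketch (not the thesis implementation): a host-side heuristic that
// flags thread blocks as PIM-offload candidates when their profiled memory
// intensity crosses a threshold. All names, the metric, and the 0.6 cutoff
// are illustrative assumptions.
#include <cstdio>
#include <vector>

struct BlockProfile {
    int  block_id;
    long mem_insts;    // dynamic memory instructions observed for this block
    long total_insts;  // total dynamic instructions observed for this block
};

// Returns true if the block looks memory-bound enough to run on a PIM core.
bool isOffloadCandidate(const BlockProfile& p, double threshold) {
    if (p.total_insts == 0) return false;
    double mem_intensity = static_cast<double>(p.mem_insts) / p.total_insts;
    return mem_intensity >= threshold;
}

int main() {
    // Hypothetical per-block profiles gathered during a sampling phase.
    std::vector<BlockProfile> profiles = {
        {0, 820, 1000},   // heavily memory-bound -> offload to PIM
        {1, 150, 1000},   // compute-bound        -> keep on GPU cores
        {2, 640, 1000},
    };

    const double kThreshold = 0.6;  // illustrative cutoff for "memory-intensive"
    for (const auto& p : profiles) {
        std::printf("block %d -> %s\n", p.block_id,
                    isOffloadCandidate(p, kThreshold) ? "PIM core" : "GPU core");
    }
    return 0;
}

The actual thesis mechanism operates dynamically inside the GPGPU system rather than from a host-side loop; this sketch only conveys the idea of classifying thread blocks by memory intensity before offloading.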