Architectural Approaches to Design Reliable and Energy-Efficient GPUs

dc.contributor.advisor: Fu, Xin
dc.contributor.committeeMember: Chen, Jinghong
dc.contributor.committeeMember: Chen, Yuhua
dc.contributor.committeeMember: Peng, Jiming
dc.contributor.committeeMember: Chen, Guoning
dc.contributor.committeeMember: Song, Shuaiwen Leon
dc.creator: Tan, Jingweijia
dc.date.accessioned: 2021-07-15T04:52:09Z
dc.date.available: 2021-07-15T04:52:09Z
dc.date.created: May 2016
dc.date.issued: 2016-05
dc.date.submitted: May 2016
dc.date.updated: 2021-07-15T04:52:10Z
dc.description.abstract: Modern graphics processing units (GPUs) support thousands of concurrent threads and provide high computational throughput, which makes them popular platforms for general-purpose high-performance computing (HPC) applications. However, this raises reliability and energy-efficiency challenges in GPU architecture design. Originally designed for graphics applications with relaxed requirements on execution correctness, GPUs lack error detection and fault-tolerance features. In contrast, HPC programs have rigorous demands on execution correctness, which poses serious reliability challenges for general-purpose computing on GPUs (GPGPUs). In addition, GPUs consume a large amount of energy to achieve their high computing power. The peak power consumption of a high-end GPU is more than twice that of its CPU counterparts, and the energy efficiency of GPUs fails to grow as fast as their performance. In this dissertation, we introduce several architectural approaches to design reliable and energy-efficient GPUs. We first propose several opportunistic techniques that recycle the idle time of streaming processors for soft-error detection and achieve good fault coverage with negligible performance degradation. Exploiting the benefits of resistive memory, we further propose leveraging it to enhance the soft-error robustness and reduce the power consumption of the GPU register file. We then explore techniques to mitigate the susceptibility of the GPU register file to process variations; the proposed techniques significantly improve GPU performance under process variations. After that, we propose an effective, low-cost mechanism that maintains register file reliability with negligible performance loss under process variations and low supply voltages, enabling substantial energy savings through aggressive supply voltage reduction.
Finally, we propose an energy-efficient GPU L2 cache design that leverages locality similarity to reduce L2 energy consumption with negligible performance degradation. Overall, these techniques efficiently address the reliability and energy-efficiency challenges in GPU architectures.
dc.description.department: Electrical and Computer Engineering, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: Portions of this document appear in: Tan, Jingweijia, and Xin Fu. "RISE: Improving the streaming processors reliability against soft errors in GPGPUs." In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 191-200. 2012. And in: Tan, Jingweijia, Zhi Li, and Xin Fu. "Soft-error reliability and power co-optimization for GPGPUs register file using resistive memory." In 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 369-374. IEEE, 2015. And in: Tan, Jingweijia, and Xin Fu. "Mitigating the susceptibility of GPGPUs register file to process variations." In 2015 IEEE International Parallel and Distributed Processing Symposium, pp. 969-978. IEEE, 2015.
dc.identifier.uri: https://hdl.handle.net/10657/7900
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: GPU, Reliability, Energy Efficiency
dc.title: Architectural Approaches to Design Reliable and Energy-Efficient GPUs
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: Cullen College of Engineering
thesis.degree.department: Electrical and Computer Engineering, Department of
thesis.degree.discipline: Electrical Engineering
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: TAN-DISSERTATION-2016.pdf
Size: 6.58 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text