Enabling Efficient Neural Network Computation Via Hardware And Software Co-Design

dc.contributor.advisorFu, Xin
dc.contributor.committeeMemberChen, Jinghong
dc.contributor.committeeMemberPan, Miao
dc.contributor.committeeMemberJackson, David R.
dc.contributor.committeeMemberWu, Xuqing
dc.creatorZhang, Xingyao
dc.creator.orcid0000-0002-8874-9520
dc.date.accessioned2022-06-30T23:37:55Z
dc.date.createdAugust 2020
dc.date.issued2020-08
dc.date.submittedAugust 2020
dc.date.updated2022-06-30T23:37:57Z
dc.description.abstractIn recent years, neural networks have achieved great success in many areas, e.g., autonomous driving, medical applications, and Intelligent Personal Assistants (IPAs). Among neural network models, the Long Short-Term Memory network (LSTM) and the Capsule Network (CapsNet) are popular but exhibit low efficiency when executed on hardware devices. In this dissertation, I introduce two hardware and software co-design approaches to efficiently execute the inference stage of the LSTM and the CapsNet. In the first work, we observe that LSTMs exhibit inefficient memory access patterns when executed on mobile GPUs, due to redundant data movement and limited off-chip bandwidth. To address the redundancy, we propose inter-cell level optimizations that improve data locality across cells with negligible accuracy loss. To relax the pressure on the limited off-chip memory bandwidth, we propose intra-cell level optimizations that dynamically skip the loads and computations of rows in the weight matrices that make only a trivial contribution to the outputs. We also introduce a lightweight module into the GPU architecture to perform runtime row skipping in the weight matrices. In the second work, we observe that CapsNet execution suffers from low efficiency due to the execution features of its routing procedure, including massive unshareable intermediate variables and intensive synchronizations. We propose a software-hardware co-designed optimization, SH-CapsNet, which includes software-level optimizations named S-CapsNet and a hybrid computing architecture design named PIM-CapsNet. At the software level, S-CapsNet reduces computation and memory accesses by exploiting the computational redundancy and data similarity of the routing procedure. At the hardware level, PIM-CapsNet leverages the processing-in-memory capability of today's 3D stacked memory to provide an off-chip in-memory acceleration solution for the routing procedure, while pipelining with the GPU's on-chip computing capability to accelerate the CNN-type layers in CapsNet.
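The intra-cell row-skipping idea described in the abstract can be sketched as follows. This is a minimal illustration, not the dissertation's exact mechanism: the contribution estimate (row L1 norm scaled by the mean input magnitude) and the threshold value are assumptions made for demonstration.

```python
import numpy as np

def skip_rows_matvec(W, x, threshold=0.1):
    """Sketch of intra-cell row skipping for a weight-matrix/vector product.

    Rows of W whose estimated contribution to the output falls below
    `threshold` are neither loaded nor computed; their outputs stay zero.
    The contribution estimate here is illustrative only.
    """
    out = np.zeros(W.shape[0])
    # Cheap per-row contribution estimate (assumption, not the thesis's criterion).
    contrib = np.abs(W).sum(axis=1) * np.abs(x).mean()
    keep = contrib >= threshold
    # Only the kept rows participate in the matrix-vector product.
    out[keep] = W[keep] @ x
    return out
```

With `threshold=0.0` this reduces to the full product; a larger threshold trades a small output error for fewer weight loads, which is the bandwidth-saving effect the intra-cell optimization targets.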
dc.description.departmentElectrical and Computer Engineering, Department of
dc.format.digitalOriginborn digital
dc.format.mimetypeapplication/pdf
dc.identifier.citationPortions of this document appear in: Zhang, Xingyao, et al. "Towards memory friendly long-short term memory networks (LSTMs) on mobile GPUs." 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2018; and in: Zhang, Xingyao, et al. "Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design." 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2020.
dc.identifier.urihttps://hdl.handle.net/10657/10256
dc.language.isoeng
dc.rightsThe author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subjectComputer Architecture
dc.subjectMachine Learning Acceleration
dc.subjectEmerging Technology
dc.subjectProcessing in Memory
dc.titleEnabling Efficient Neural Network Computation Via Hardware And Software Co-Design
dc.type.dcmiText
dc.type.genreThesis
local.embargo.lift2022-08-01
local.embargo.terms2022-08-01
thesis.degree.collegeCullen College of Engineering
thesis.degree.departmentElectrical and Computer Engineering, Department of
thesis.degree.disciplineElectrical Engineering
thesis.degree.grantorUniversity of Houston
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy

Files

Original bundle
- ZHANG-DISSERTATION-2020.pdf (10.1 MB, Adobe Portable Document Format)

License bundle
- PROQUEST_LICENSE.txt (4.43 KB, Plain Text)
- LICENSE.txt (1.81 KB, Plain Text)