Enabling Efficient Neural Network Computation Via Hardware And Software Co-Design
Abstract
In recent years, neural networks have achieved great success in many areas, e.g., autonomous driving, medical applications, and Intelligent Personal Assistants (IPAs). Among neural network models, the Long Short-Term Memory network (LSTM) and the Capsule Network (CapsNet) are popular but exhibit low efficiency when executed on hardware devices. In this dissertation, I introduce two hardware and software co-design approaches to efficiently execute the inference stage of the LSTM and the CapsNet.

In the first work, we observe that LSTMs exhibit inefficient memory access patterns when executed on mobile GPUs, due to redundant data movement and limited off-chip bandwidth. To address the redundancy, we propose inter-cell level optimizations that improve data locality across cells with negligible accuracy loss. To relieve the pressure on limited off-chip memory bandwidth, we propose intra-cell level optimizations that dynamically skip the loads and computations of rows in the weight matrices that contribute trivially to the outputs. We also introduce a lightweight module into the GPU architecture to perform this row skipping at runtime.

In the second work, we observe that CapsNet execution suffers low efficiency due to the execution features of its routing procedure, including massive unshareable intermediate variables and intensive synchronization. We propose software-hardware co-designed optimizations, SH-CapsNet, which comprise a set of software-level optimizations named S-CapsNet and a hybrid computing architecture named PIM-CapsNet. At the software level, S-CapsNet reduces computation and memory accesses by exploiting the computational redundancy and data similarity of the routing procedure.
At the hardware level, PIM-CapsNet leverages the processing-in-memory capability of today's 3D-stacked memory to provide an off-chip, in-memory acceleration solution for the routing procedure, while pipelining with the GPU's on-chip computing capability to accelerate the CNN-type layers in CapsNet.
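The intra-cell row-skipping idea from the first work can be illustrated with a minimal sketch. The specific skipping criterion and threshold below are assumptions for illustration only; the dissertation's runtime hardware module may use a different contribution estimate.

```python
import numpy as np

def row_skipping_matvec(W, x, threshold=0.01):
    """Matrix-vector product that dynamically skips weight-matrix rows
    with trivial contribution to the output.

    Illustrative sketch only: the cheap upper-bound test used here
    (max|row| * max|x| < threshold) is an assumed criterion, not the
    dissertation's actual hardware mechanism."""
    y = np.zeros(W.shape[0])
    x_peak = np.abs(x).max()
    for i, row in enumerate(W):
        # Cheap estimate of this row's maximum possible contribution;
        # if it is below the threshold, the row's load and dot product
        # are skipped entirely, saving off-chip bandwidth and compute.
        if np.abs(row).max() * x_peak < threshold:
            continue
        y[i] = row @ x
    return y

# Example: the second row contributes negligibly and is skipped.
W = np.array([[1.0, 2.0],
              [1e-6, 1e-6]])
x = np.array([1.0, 1.0])
y = row_skipping_matvec(W, x)
```

Setting `threshold=0.0` recovers the exact dense product, so accuracy loss can be traded directly against the number of rows skipped.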