Enabling Efficient Neural Network Computation Via Hardware And Software Co-Design

Date

2020-08

Abstract

In recent years, neural networks have achieved great success in many areas, e.g., autonomous driving, medical applications, and Intelligent Personal Assistants (IPAs). Among neural network models, the Long Short-Term Memory network (LSTM) and the Capsule Network (CapsNet) are popular but exhibit low efficiency when processed on hardware devices. In this dissertation, I introduce two hardware and software co-design approaches to efficiently execute the inference stage of the LSTM and the CapsNet. In the first work, we observe that LSTMs exhibit a quite inefficient memory access pattern when executed on mobile GPUs, due to redundant data movement and limited off-chip bandwidth. To address the redundancy, we propose inter-cell-level optimizations that improve data locality across cells with negligible accuracy loss. To relieve the pressure on the limited off-chip memory bandwidth, we propose intra-cell-level optimizations that dynamically skip the loads and computations of rows in the weight matrices that contribute trivially to the outputs. We also introduce a lightweight module into the GPU architecture to perform this runtime row skipping in the weight matrices. In the second work, we observe that CapsNet execution suffers low efficiency due to the execution features of its routing procedure, including massive unshareable intermediate variables and intensive synchronizations. We propose a software-hardware co-designed optimization, SH-CapsNet, which includes software-level optimizations named S-CapsNet and a hybrid computing architecture design named PIM-CapsNet. At the software level, S-CapsNet reduces computation and memory accesses by exploiting the computational redundancy and data similarity of the routing procedure.
At the hardware level, PIM-CapsNet leverages the processing-in-memory capability of today's 3D-stacked memory to provide an off-chip, in-memory acceleration solution for the routing procedure, pipelined with the GPU's on-chip computing capability, which accelerates the CNN-type layers in CapsNet.
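The intra-cell row-skipping idea above can be sketched in a few lines of NumPy. This is a minimal illustrative model only: the per-row L1-norm importance test and the threshold value are assumptions made here for clarity, not the lightweight hardware predictor the dissertation actually proposes.

```python
import numpy as np

def row_skip_matvec(W, x, threshold=0.1):
    """Matrix-vector product that skips rows of W judged to contribute
    trivially to the output, modeling the intra-cell optimization that
    avoids loading and computing those rows.

    NOTE: the L1-norm test below is an illustrative stand-in for the
    dissertation's runtime row-skipping module, not its actual criterion.
    """
    y = np.zeros(W.shape[0])
    row_norms = np.abs(W).sum(axis=1)   # cheap per-row importance proxy
    active = row_norms >= threshold     # rows worth loading and computing
    y[active] = W[active] @ x           # compute only the active rows
    return y, active

# Usage: a small weight matrix in which one row is near-zero and is skipped.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
W[2] *= 1e-4                            # trivial-contribution row
x = rng.standard_normal(3)
y, active = row_skip_matvec(W, x, threshold=0.1)
```

Skipped rows produce an output of exactly zero, which mirrors the trade-off in the abstract: a small, controlled accuracy loss in exchange for fewer off-chip weight loads.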

Keywords

Computer Architecture, Machine Learning Acceleration, Emerging Technology, Processing in Memory

Citation

Portions of this document appear in: Zhang, Xingyao, et al. "Towards memory friendly long-short term memory networks (LSTMs) on mobile GPUs." 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2018; and in: Zhang, Xingyao, et al. "Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design." 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2020.