Accurate deep neural network inference using computational phase-change memory
In this work, we introduced a methodology for deep neural network inference through in-memory computing with phase-change memory devices that achieves close to software-equivalent accuracy. We also developed a new training algorithm and compensation methods to improve accuracy retention over time.
Deep neural networks (DNNs) are revolutionizing the field of artificial intelligence as they continue to achieve unprecedented success in cognitive tasks such as image and speech recognition. However, running DNNs on current von Neumann computing architectures limits the achievable performance and energy efficiency. While there has been significant progress in the development of specialized hardware for inference, many of the existing architectures physically split the memory and processing units. This means that DNN models are typically stored in off-chip memory, and that computational tasks require a constant shuffling of data between the memory and computing units, a process that slows down computation and limits the maximum achievable energy efficiency.
Our research, featured in Nature Communications, exploits in-memory computing methods using resistance-based (memristive) storage devices as a promising non-von Neumann approach for developing hardware that can efficiently support DNN inference models. Specifically, we propose an architecture based on phase-change memory (PCM) that, like the human brain, has no separate compartments to store and compute data, and therefore consumes significantly less energy.
The challenge in using PCM devices, however, is achieving and maintaining computational accuracy. As PCM technology is analog in nature, computational precision is limited due to device variability as well as read and write conductance noise. To overcome this, we needed to find a way to train the neural networks so that transferring the digitally trained weights to the analog resistive memory devices would not result in significant loss of accuracy.
Our approach was to inject noise into the synaptic weights during the software training of DNNs, as a generic method to improve the network's resilience against the non-idealities of analog in-memory computing hardware. Our assumption was that injecting noise comparable to the device noise during training would make the models more robust.
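The idea can be illustrated with a minimal sketch. This is not the training code used in the study: it assumes a single linear layer, additive Gaussian noise whose standard deviation is a fixed fraction (`rel_noise`, a hypothetical parameter) of the largest weight magnitude, and plain stochastic gradient descent. The forward pass uses a noisy copy of the weights, while the update is applied to the clean weights.

```python
import random


def noisy_weights(weights, rel_noise, rng):
    """Return a copy of `weights` with additive Gaussian noise.

    Assumption: the noise standard deviation is rel_noise * max|w|.
    """
    w_max = max(abs(w) for w in weights)
    sigma = rel_noise * w_max
    return [w + rng.gauss(0.0, sigma) for w in weights]


def train_linear(data, n_features, rel_noise=0.05, lr=0.05, epochs=200, seed=0):
    """Fit y = w . x with SGD, injecting weight noise in every forward pass."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    for _ in range(epochs):
        for x, target in data:
            # Forward pass with perturbed weights, mimicking noisy analog devices.
            w_noisy = noisy_weights(weights, rel_noise, rng)
            pred = sum(w * xi for w, xi in zip(w_noisy, x))
            err = pred - target
            # Gradient of 0.5 * err**2 w.r.t. the weights is err * x.
            # The update is applied to the clean (noise-free) weights.
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights


# Toy data generated from hypothetical "true" weights [2.0, -1.0].
inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
data = [(x, 2.0 * x[0] - 1.0 * x[1]) for x in inputs]
trained = train_linear(data, n_features=2)
```

Because the model sees a different perturbation of its weights at every step, it is pushed towards solutions whose output changes little under small weight errors, which is the property needed when the weights are later stored on noisy devices.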
It turned out that our assumption was correct: training ResNet-type networks this way resulted in no considerable accuracy loss when transferring weights to PCM devices. We achieved an accuracy of 93.7% on the CIFAR-10 dataset and a top-1 accuracy of 71.6% on the ImageNet benchmark after mapping the trained weights to analog PCM synapses. After programming the trained weights of ResNet-32 on 723,444 PCM devices of a prototype chip, the CIFAR-10 accuracy computed from the measured hardware weights stayed above 92.6% over a period of one day. To the best of our knowledge, this is the highest accuracy experimentally reported to date on the CIFAR-10 dataset by any analog resistive memory hardware.
However, we still wanted to understand whether additional techniques could improve the accuracy retention over time. So, we developed an online compensation technique that exploits the batch normalization parameters to periodically correct the activation distributions during inference. This allowed us to improve the one-day CIFAR-10 accuracy retention on hardware up to 93.5%.
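A minimal sketch of this kind of compensation is shown below. It is an illustration under stated assumptions, not the exact procedure from the study: hardware drift is modeled as a uniform scaling of the pre-activations by a hypothetical factor of 0.8, and the correction consists of re-estimating the batch normalization mean and variance from a small calibration batch. The helper names `batch_norm` and `recalibrate_stats` are made up for this example.

```python
import statistics


def batch_norm(values, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize `values` with the given statistics, then scale and shift."""
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]


def recalibrate_stats(calibration_values):
    """Re-estimate mean and (population) variance from calibration data."""
    mean = statistics.fmean(calibration_values)
    var = statistics.pvariance(calibration_values, mu=mean)
    return mean, var


# Pre-activations right after programming the weights.
clean = [1.0, 2.0, 3.0, 4.0]
stored_mean, stored_var = recalibrate_stats(clean)

# Assumption: drift shrinks every pre-activation by the same factor.
drifted = [0.8 * v for v in clean]

# Stale statistics leave the normalized activations shifted and rescaled.
stale_out = batch_norm(drifted, stored_mean, stored_var)

# Periodic recalibration restores a zero-mean, unit-variance distribution.
new_mean, new_var = recalibrate_stats(drifted)
corrected_out = batch_norm(drifted, new_mean, new_var)
```

In this sketch only the normalization statistics are updated, so nothing is reprogrammed on the devices; the cost is a forward pass over the calibration batch each time the statistics are refreshed.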
In an era transitioning more and more towards AI-based technologies, including battery-powered internet-of-things devices and autonomous vehicles, such applications would benefit greatly from fast, low-power, and reliably accurate DNN inference engines. The strategies developed in our study show great potential for realizing AI hardware-accelerator architectures that support DNN inference in an energy-efficient manner. While several works had previously proposed injecting noise into the synaptic weights during training, our work is the first to exploit this technique to experimentally demonstrate high accuracies on ResNet-type networks with analog resistive memory devices.