Link to the paper: https://doi.org/10.1038/s41928-020-0397-9
The essence of a machine learning task, such as autonomous driving, can be considered a query process – passing a high volume of data through deep neural networks in search of an answer. Handling the computation inside each network layer in parallel can significantly improve processing throughput. However, CMOS-based processors such as GPUs and TPUs may not be the ultimate solution because each processing node is costly to implement (e.g. a full adder requires more than 10 transistors).
Alternatively, memristor-based systems are a promising candidate because each memristor is itself a highly scalable analogue multiply-and-add processing node. Conventionally, these individual devices are assembled in a grid-like 2D crossbar structure, where all the memristors operate in parallel for matrix operations. For machine learning tasks, each 2D array can also directly implement a fully connected network layer. However, real-world neural networks are much more complex than the classic fully connected structures, and it is increasingly difficult to efficiently map the complex structure and connectivity of a modern neural network onto a 2D system.
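The crossbar matrix operation described above can be sketched numerically: each memristor's conductance stores one weight, an input voltage is applied to each row wire, and the column currents sum the per-device contributions in a single analogue step. The function name and the example values below are purely illustrative, assuming an ideal (noise-free, linear) device model.

```python
import numpy as np

def crossbar_vmm(G, v):
    """Ideal crossbar vector-matrix multiply (a sketch, not the paper's model).

    G : (rows, cols) conductance matrix, one memristor per crosspoint (siemens).
    v : (rows,) input voltages applied to the row wires (volts).

    By Ohm's law each device contributes G[i, j] * v[i]; Kirchhoff's current
    law sums these contributions along each column wire, so the whole
    vector-matrix product emerges in one parallel analogue step.
    """
    return v @ G

# A fully connected layer maps directly: weights -> conductances,
# activations -> voltages (hypothetical values for illustration).
G = np.array([[1.0, 0.5],
              [0.2, 0.8],
              [0.3, 0.1]])       # 3 inputs, 2 outputs
v = np.array([0.1, 0.2, 0.3])    # input voltages
print(crossbar_vmm(G, v))        # column currents = v @ G
```

In a physical array the multiply and the accumulate happen simultaneously in every device and wire, which is where the throughput advantage over a transistor-built adder comes from.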
This is not a surprise after all. If we look at a biological brain, its neural networks are in fact built from complex 3D topologies! The added dimension vastly enriches the functionality of neuron-to-neuron connections and has given rise to many emergent brain functions. For example, a series of image-filtering processes has been found in the visual cortex that extracts high-level abstract features from the raw input; this inspired the development of the convolutional neural network (CNN) – the most popular deep neural network in machine vision. The topology of each convolutional layer consists of many highly interleaved local kernel connections, which cannot simply be fit into a 2D plane without unwanted intersections. A 3D circuit appears inevitable if all these complex neural networks are to be hosted practically.
Inspired by the 3D connectivity of the brain, we designed a novel 3D memristor circuit that can directly implement a convolutional neural network. By purposely orienting the top and bottom electrodes of the memristors quasi-vertically inside the 3D array, we form many highly parallel, locally connected processing pathways that match the topology of convolutional layers. As a result, our 3D circuit can naturally perform 2D convolutions. It turned out that the same design also brings four highly rewarding properties: (1) the local connectivity of the memristors is extremely effective in limiting sneak-path currents in large arrays; (2) the 2D input/output over the array's area interface is essential for high data throughput, e.g. for video streams; (3) the vertical alignment of the electrodes makes the design convenient to stack – adding another layer only requires a short connection to the top layer of the 3D array, whereas a conventional 3D design would require a new layout to make room for additional direct connections to the substrate; and therefore (4) only two sets of masks, used alternately for odd and even layers, are required to stack all the 3D layers.
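The key topological point above – that a 2D convolution is really a bundle of independent, locally connected dot products, one per output pixel – can be emulated in a few lines. This is a conceptual sketch, not the chip's signal path: the function name is hypothetical, and the kernel and image values are toy data. In the 3D array, each of these local patch-kernel products has its own quasi-vertical stack of memristors, so all of them run in one parallel analogue step.

```python
import numpy as np

def pixel_parallel_conv2d(image, kernel):
    """Sketch: 2D convolution as many independent local dot products.

    Each output pixel depends only on a small local patch of the input,
    so every iteration of this loop is independent of the others – the
    pattern the 3D array exploits by giving each output pixel its own
    locally connected processing pathway.
    """
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # one locally connected pathway per output pixel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 Laplacian-style kernel on a toy 4x4 ramp image (illustrative values).
img = np.arange(16.0).reshape(4, 4)
k = np.array([[0, 1, 0],
              [1, -4, 1],
              [0, 1, 0]])
print(pixel_parallel_conv2d(img, k))
```

Note that mapping these interleaved local connections onto a flat 2D crossbar would force the pathways to cross; the added vertical dimension is what lets them remain physically separate.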
The functionality of our 3D design was experimentally demonstrated on an in-house-built 3D memristor chip with eight monolithically integrated memristor layers – the tallest computing circuit to date. Faithful operations such as pixel-wise parallel convolutions for neural networks and video processing were demonstrated. Although the current work is only a proof-of-concept demonstration, we believe it is a good starting point that could open a new direction in neuromorphic computing hardware, one that places greater emphasis on functional connectivity and topology.