RCE-NN: a five-stage pipeline to execute neural networks (CNNs) on resource-constrained IoT edge devices
Sudharsan, Bharath ; Breslin, John G. ; Ali, Muhammad Intizar
Identifiers
http://hdl.handle.net/10379/16850
https://doi.org/10.13025/21052
Repository DOI
https://doi.org/10.13025/21052
Publication Date
2020-10-06
Type
Conference Paper
Citation
Sudharsan, Bharath, Breslin, John G., & Ali, Muhammad Intizar. (2020). RCE-NN: a five-stage pipeline to execute neural networks (CNNs) on resource-constrained IoT edge devices. Paper presented at the 10th International Conference on the Internet of Things (IoT 2020), Malmö, Sweden, 05-09 October, doi:10.1145/3410992.3411005
Abstract
Microcontroller Units (MCUs) in edge devices are resource-constrained due to their limited memory footprint, few computation cores, and low clock speeds. These limitations prevent the straightforward deployment and execution of machine learning models on MCUs. Fitting, deploying, and executing Convolutional Neural Networks (CNNs) for any IoT use case on small MCUs requires a complete design flow. Resource Constrained Edge - Neural Networks (RCE-NN) is our proposed design flow, a five-stage pipeline that developers can follow to execute CNNs on MCUs. In this pipeline, the initial model architecture and training stage consists of four well-defined tasks covering model size, workload, operations, and quantization awareness, which map the desired CNN, as captured in an executable specification, to a resource-constrained MCU's specification. The next quantization and conversion stage reduces model size, saves memory, and simplifies calculations with little impact on accuracy. In the third stage, the quantized model is translated into a C byte array, since MCUs lack native file-system support. The translated C byte array is fused with the main program of an IoT use case, and binaries are built using techniques from the fourth stage. Finally, the method presented in the last deployment stage is used to flash the built binaries onto MCUs; this method allows the CNN and its operations to fully utilize the MCU's memory. We evaluated RCE-NN using eight popular MCU boards. The results show that, when users realize all five pipeline stages, they can fit, deploy, and execute multiple CNNs across multiple open-source MCU boards. The RCE-NN pipeline components quantize and compress the CNNs to one tenth of their original size, enabling them to fit on MCUs with no or minimal loss in performance, both after quantization and compression and during runtime.
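The third pipeline stage, translating a quantized model into a C byte array so it can be compiled into firmware on file-system-less MCUs, is commonly done with tools such as `xxd -i`. As a minimal illustrative sketch (not the paper's actual tooling; the function name and placeholder bytes are our own), the conversion can be expressed as:

```python
def bytes_to_c_array(data: bytes, var_name: str = "model_data", cols: int = 12) -> str:
    """Render raw model bytes as C source declaring a const byte array.

    MCUs typically lack a native file system, so the model file is
    compiled directly into the firmware image as a constant array.
    """
    lines = [f"const unsigned char {var_name}[] = {{"]
    for i in range(0, len(data), cols):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + cols])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const unsigned int {var_name}_len = {len(data)};")
    return "\n".join(lines)

# Placeholder bytes standing in for a quantized model file
# (a real flow would read e.g. "model_quantized.tflite"):
snippet = bytes_to_c_array(b"\x1c\x00\x00\x00TFL3", "g_model")
print(snippet)
```

The emitted array and its length constant can then be referenced from the IoT application's main program and linked into the final binary, as in the pipeline's fourth stage.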
Publisher
Association for Computing Machinery (ACM)
Publisher DOI
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland