Publication

Ultra-fast machine learning classifier execution on IoT devices without SRAM consumption

Authors
Sudharsan, Bharath
Patel, Pankesh
Breslin, John G.
Ali, Muhammad Intizar
Citation
Sudharsan, Bharath, Patel, Pankesh, Breslin, John G., & Ali, Muhammad Intizar. (2021). Ultra-fast Machine Learning Classifier Execution on IoT Devices without SRAM Consumption. Paper presented at the IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), Kassel, Germany, 22-26 March.
Abstract
With the introduction of edge analytics, IoT devices are becoming smart and ready for AI applications. A few modern ML frameworks focus on generating small ML models (often in kBs) that can be flashed and executed directly on tiny IoT devices, particularly embedded systems. Edge analytics eliminates expensive device-to-cloud communications, thereby producing intelligent devices that can perform energy-efficient, real-time, offline analytics. However, any increase in training data results in a linear increase in the size and space complexity of the trained ML models, making them impossible to deploy on IoT devices with limited memory. To alleviate this memory issue, a few studies have focused on optimizing and fine-tuning existing ML algorithms to reduce their complexity and size, but such optimization is usually dependent on the nature of the IoT data being trained on. In this paper, we present an approach that preserves model quality without requiring any alteration to existing ML algorithms. We propose an SRAM-optimized implementation and efficient deployment of classifiers from widely used, stable ML frameworks (e.g., Python scikit-learn). Our initial evaluation results demonstrate that ours is a highly resource-friendly approach: it has a very limited memory footprint while executing large and complex ML models on MCU-based IoT devices, and it performs ultra-fast classifications while consuming 0 bytes of SRAM. When we tested our approach on a variety of MCU-based devices, the majority of the ported models produced 1-4x faster inference than models ported by the sklearn-porter, m2cgen, and emlearn libraries.
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Publisher DOI
Rights
Attribution 4.0 International (CC BY 4.0)