TinyML is bringing deep learning models to microcontrollers

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Deep learning models owe their initial success to large servers with large amounts of memory and clusters of GPUs. The promises of deep learning gave rise to an entire industry of cloud computing services for deep neural networks. Consequently, very large neural networks running on virtually unlimited cloud resources became very popular, especially among wealthy tech companies that can foot the bill.

But at the same time, recent years have also seen a reverse trend: a concerted effort to create machine learning models for edge devices. Called tiny machine learning, or TinyML, these models are suited for devices that have limited memory and processing power, and in which internet connectivity is either absent or limited.

The latest in these efforts, a joint effort by IBM and the Massachusetts Institute of Technology (MIT), addresses the peak-memory bottleneck of convolutional neural networks (CNNs), a deep learning architecture that is especially critical for computer vision applications. Detailed in a paper presented at the NeurIPS 2021 conference, the model is called MCUNetV2 and can run CNNs on low-memory and low-power microcontrollers.

Why TinyML?


While deep learning in the cloud has been tremendously successful, it is not applicable in all situations. Many applications require on-device inference. For example, in some settings, such as drone rescue missions, internet connectivity is not guaranteed. In other domains, such as healthcare, privacy requirements and regulations make it very difficult to send data to the cloud for processing. And the delay caused by the round trip to the cloud is prohibitive for applications that require real-time ML inference.

All these necessities have made on-device ML both scientifically and commercially attractive. Your iPhone now runs facial recognition and speech recognition on device. Your Android phone can run on-device translation. Your Apple Watch uses machine learning to detect movements and ECG patterns.

These on-device ML models have partly been made possible by advances in techniques used to make neural networks compact and more compute- and memory-efficient. But they have also been made possible thanks to advances in hardware. Our smartphones and wearables now pack more computing power than a server did 30 years ago. Some even have specialized co-processors for ML inference.

TinyML takes edge AI one step further, making it possible to run deep learning models on microcontrollers (MCU), which are much more resource-constrained than the small computers that we carry in our pockets and on our wrists.

Microcontrollers are cheap, with average sale prices under $0.50, and they’re everywhere, embedded in consumer and industrial devices. At the same time, they don’t have the resources found in generic computing devices. Most of them don’t have an operating system. They have a small CPU, are limited to a few hundred kilobytes of low-power memory (SRAM) and a few megabytes of storage, and don’t have any networking gear. They mostly don’t have a mains electricity source and must run on cell and coin batteries for years. Because MCUs are so cheap and ubiquitous, fitting deep learning models on them can open the way for many applications.
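To make these constraints concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the MCUNetV2 paper. The SRAM and flash budgets, the layer shapes, and the weight size are all made-up values for illustration, and the model assumes a plain layer-by-layer network with 8-bit values, in which each layer must hold its input and output tensors in memory at the same time.

```python
# Hypothetical budget for illustration only; real microcontrollers vary.
SRAM_BYTES = 320 * 1024       # working memory, holds activations
FLASH_BYTES = 1024 * 1024     # storage, holds the model weights


def tensor_bytes(height, width, channels, bytes_per_value=1):
    """Size of one activation tensor (8-bit values by default)."""
    return height * width * channels * bytes_per_value


def peak_activation_bytes(shapes):
    """Peak memory of a simple chain of layers.

    Assumes each layer needs its input and output tensors in memory
    together, and nothing else. Real runtimes add their own overhead.
    """
    sizes = [tensor_bytes(*shape) for shape in shapes]
    return max(a + b for a, b in zip(sizes, sizes[1:]))


# Made-up activation shapes (height, width, channels), input first.
shapes = [
    (160, 160, 3),
    (80, 80, 16),
    (40, 40, 32),
    (20, 20, 64),
    (10, 10, 128),
]
weight_bytes = 500_000  # hypothetical size of the 8-bit weights

peak = peak_activation_bytes(shapes)
fits_sram = peak <= SRAM_BYTES
fits_flash = weight_bytes <= FLASH_BYTES

print(f"peak activations: {peak / 1024:.0f} KB, fits in SRAM: {fits_sram}")
print(f"weights: {weight_bytes / 1024:.0f} KB, fits in flash: {fits_flash}")
```

In this toy example the peak comes from the first layer, where the tensors are largest. Doubling the input resolution would quadruple that layer's activation memory and push it past the assumed SRAM budget, even though the weights would still fit comfortably in flash.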

Memory bottlenecks in convolutional neural networks


Architecture of convolutional neural network (CNN)