Grimaldi, Matteo; Mocerino, Luca; Cipolletta, Antonio; Calimera, Andrea. "Dynamic ConvNets on Tiny Devices via Nested Sparsity." IEEE Internet of Things Journal, ISSN 2327-4662, vol. 10, no. 6 (2023), pp. 5073-5082. DOI: 10.1109/JIOT.2022.3222014
Dynamic ConvNets on Tiny Devices via Nested Sparsity
Matteo Grimaldi; Luca Mocerino; Antonio Cipolletta; Andrea Calimera
2023
Abstract
This work introduces a new training and compression pipeline to build nested sparse convolutional neural networks (ConvNets), a class of dynamic ConvNets suited for inference tasks deployed on resource-constrained devices at the edge of the Internet of Things. A nested sparse ConvNet consists of a single ConvNet architecture containing N sparse subnetworks with nested weight subsets, like a Matryoshka doll, and can trade accuracy for latency at runtime, using the model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient masking technique that optimally routes the learning signals across the nested weight subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse matrix compression format with dedicated compute kernels that exploit the characteristics of the nested weight subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM-M7 microcontroller unit (MCU), nested sparse ConvNets outperform variable-latency solutions naively built by assembling single sparse models trained as stand-alone instances, achieving 1) comparable accuracy; 2) remarkable storage savings; and 3) high performance. Moreover, when compared to state-of-the-art dynamic strategies, such as dynamic pruning and layer width scaling, nested sparse ConvNets are Pareto-optimal in the accuracy-versus-latency space.
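The two central ideas in the abstract, nested weight subsets and gradient masking, can be sketched in a few lines of code. The snippet below is a minimal, hypothetical PyTorch illustration, not the authors' actual pipeline: the names `make_nested_masks` and `NestedSparseLinear` are invented, the nested masks are derived here by simple magnitude ranking, and the paper's gradient-routing scheme is only approximated by multiplying the weights with the active mask, which masks the gradient as well.

```python
# Minimal sketch of nested sparsity with gradient masking (illustrative only;
# all names are hypothetical, not taken from the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_nested_masks(weight, sparsities):
    """Build nested binary masks: every sparser mask keeps a subset of the
    weights kept by the denser ones, like a Matryoshka doll."""
    order = weight.abs().flatten().argsort(descending=True)  # rank by magnitude
    masks = []
    for s in sorted(sparsities):  # increasing sparsity
        keep = int((1.0 - s) * weight.numel())
        m = torch.zeros(weight.numel())
        m[order[:keep]] = 1.0  # top-k survivors; smaller k nests inside larger k
        masks.append(m.view_as(weight))
    return masks

class NestedSparseLinear(nn.Module):
    """Linear layer holding N nested sparse subnetworks in one weight tensor."""
    def __init__(self, in_f, out_f, sparsities=(0.5, 0.75, 0.9)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.01)
        self.masks = make_nested_masks(self.weight.detach(), sparsities)
        self.active = 0  # runtime knob: which subnetwork to run

    def forward(self, x):
        m = self.masks[self.active].to(x.device)
        # The mask zeroes the pruned weights in the forward pass and, in the
        # backward pass, their gradients, so learning signals reach only the
        # weights belonging to the currently active subset.
        return F.linear(x, self.weight * m)
```

Cycling `layer.active` over the subnetworks across training batches updates all nested subsets over the shared weights; at inference, setting `active` once selects the sparsity level, i.e., the accuracy/latency knob the abstract describes.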
File | Type | Access | License | Size | Format
---|---|---|---|---|---
final.pdf | 2. Post-print / Author's Accepted Manuscript | Open access | Public - All rights reserved | 1.1 MB | Adobe PDF
Dynamic_ConvNets_on_Tiny_Devices_via_Nested_Sparsity.pdf | 2a. Post-print publisher's version / Version of Record | Restricted (available on request) | Non-public - Private/restricted access | 1.36 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2977752