
Grimaldi, Matteo; Mocerino, Luca; Cipolletta, Antonio; Calimera, Andrea. "Dynamic ConvNets on Tiny Devices via Nested Sparsity." IEEE Internet of Things Journal, ISSN 2327-4662, vol. 10, no. 6 (2023), pp. 5073-5082. DOI: 10.1109/JIOT.2022.3222014

Dynamic ConvNets on Tiny Devices via Nested Sparsity

Matteo Grimaldi; Luca Mocerino; Antonio Cipolletta; Andrea Calimera
2023

Abstract

This work introduces a new training and compression pipeline to build nested sparse convolutional neural networks (ConvNets), a class of dynamic ConvNets suited for inference tasks deployed on resource-constrained devices at the edge of the Internet of Things. A nested sparse ConvNet consists of a single ConvNet architecture containing N sparse subnetworks with nested weight subsets, like a Matryoshka doll, and can trade accuracy for latency at runtime, using the model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient masking technique that optimally routes the learning signals across the nested weight subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse matrix compression format with dedicated compute kernels that fruitfully exploit the characteristics of the nested weight subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM-M7 microcontroller unit (MCU), nested sparse ConvNets outperform variable-latency solutions naively built by assembling single sparse models trained as stand-alone instances, achieving 1) comparable accuracy; 2) remarkable storage savings; and 3) high performance. Moreover, when compared to state-of-the-art dynamic strategies, such as dynamic pruning and layer width scaling, nested sparse ConvNets turn out to be Pareto optimal in the accuracy versus latency space.
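The nesting property described in the abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical construction (not the paper's actual training pipeline): nested binary sparsity masks are obtained by magnitude thresholding a toy weight matrix at increasing sparsity targets, so that each sparser subnetwork keeps only weights also kept by every denser one, and the active mask can be swapped at runtime as the sparsity "knob". All names (`W`, `masks`, `sparsities`) are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of Matryoshka-style nested sparsity masks.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))          # toy weight matrix
sparsities = [0.5, 0.75, 0.9]        # increasing fraction of zeroed weights

# Magnitude thresholding at increasing quantiles: because the threshold
# grows monotonically with the sparsity target, each sparser mask is by
# construction a subset of the denser one (the nesting property).
masks = []
for s in sparsities:
    thr = np.quantile(np.abs(W), s)  # keep only the largest-magnitude weights
    masks.append(np.abs(W) > thr)

# Nesting check: a weight kept by the sparser mask is kept by the denser one.
for dense, sparse in zip(masks, masks[1:]):
    assert np.all(~sparse | dense)

# At inference time, a sparsity level is selected at runtime and the
# corresponding masked weights are used, trading accuracy for latency.
level = 1                            # the dynamic knob
W_active = W * masks[level]
```

Under this construction, a gradient-masking scheme like the one the paper proposes would route each weight's learning signal only through the subnetworks that actually contain it; the exact routing rule is the paper's contribution and is not reproduced here.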

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2977752