TCN Mapping Optimization for Ultra-Low Power Time-Series Edge Inference / Burrello, Alessio; Dequino, Alberto; Jahier Pagliari, Daniele; Conti, Francesco; Zanghieri, Marcello; Macii, Enrico; Benini, Luca; Poncino, Massimo. - ELECTRONIC. - 2021:(2021), pp. 1-6. (Paper presented at the 2021 IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2021, held in the USA in 2021) [10.1109/ISLPED52811.2021.9502494].
TCN Mapping Optimization for Ultra-Low Power Time-Series Edge Inference
Burrello, Alessio;Dequino, Alberto;Jahier Pagliari, Daniele;Macii, Enrico;Poncino, Massimo
2021
Abstract
Temporal Convolutional Networks (TCNs) are emerging lightweight Deep Learning models for Time Series analysis. We introduce an automated exploration approach and a library of optimized kernels to map TCNs on Parallel Ultra-Low Power (PULP) microcontrollers. Our approach minimizes latency and energy by exploiting a layer tiling optimizer to jointly find the tiling dimensions and select among alternative implementations of the causal and dilated 1D-convolution operations at the core of TCNs. We benchmark our approach on a commercial PULP device, achieving up to 103× lower latency and 20.3× lower energy than the Cube-AI toolkit executed on the STM32L4 and from 2.9× to 26.6× lower energy compared to commercial closed-source and academic open-source approaches on the same hardware target.

File | Access | Description | Type | License | Size | Format
---|---|---|---|---|---|---
ISPLED21___TCN_Library.pdf | Open access | Main article (post-print) | 2. Post-print / Author's Accepted Manuscript | PUBLIC - All rights reserved | 672.96 kB | Adobe PDF
TCN_Mapping_Optimization_for_Ultra-Low_Power_Time-Series_Edge_Inference.pdf | Not available | Main article (publisher's version) | 2a. Post-print, publisher's version / Version of Record | Non-public - Private/restricted access | 766.81 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2924896