Daghero, F.; Jahier Pagliari, D.; Poncino, M. Energy-efficient deep learning inference on edge devices. In: Shiho K., Ganesh C. D. (eds.), Hardware Accelerator Systems for Artificial Intelligence and Machine Learning (Advances in Computers). Elsevier, 2021, pp. 247-301. ISBN 978-0-12-823123-4. DOI: 10.1016/bs.adcom.2020.07.002

Energy-efficient deep learning inference on edge devices

Daghero F.; Jahier Pagliari D.; Poncino M.
2021

Abstract

The success of deep learning comes at the cost of very high computational complexity. Consequently, Internet of Things (IoT) edge nodes typically offload deep learning tasks to powerful cloud servers, an inherently inefficient solution: transmitting raw data to the cloud over wireless links incurs long latencies and high energy consumption. Moreover, pure cloud offloading does not scale, as it puts pressure on the network, and it raises security concerns tied to the transmission of user data. The straightforward solution to these issues is to perform deep learning inference at the edge. However, cost- and power-constrained embedded processors, with their limited processing and memory capabilities, cannot handle complex deep learning models. Even with hardware acceleration, a common approach to handling such complexity, embedded devices still cannot directly run models designed for cloud servers. It then becomes necessary to employ appropriate optimization strategies to enable deep learning processing at the edge. In this chapter, we survey the most relevant optimizations for embedded deep learning inference, focusing in particular on those that favor hardware acceleration, such as quantization and big-little architectures. We divide our analysis into two parts. First, we review classic approaches based on static (design-time) optimizations. We then show how these solutions are often suboptimal, as they produce models that are either over-optimized for complex inputs (yielding accuracy losses) or under-optimized for simple inputs (missing energy-saving opportunities). Finally, we review the more recent trend of dynamic (input-dependent) optimizations, which solve this problem by adapting the optimization to each processed input.
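To make the static-vs-dynamic distinction concrete, the following Python sketch (PyTorch) illustrates both ideas on a toy classifier: a quantized "little" network handles each input first, and a larger network is invoked only when the little one is not confident enough. All model names, architectures, and the 0.9 confidence threshold are illustrative assumptions for this sketch, not values or APIs prescribed by the chapter.

import torch
import torch.nn.functional as F

# Illustrative stand-ins for the two networks; the architectures and the
# names "little_model" / "big_model" are assumptions, not models from
# the chapter.
little_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
big_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Static optimization example: post-training quantization of the little
# model's Linear weights to 8-bit integers. (PyTorch calls this "dynamic"
# quantization because activations are quantized at runtime, but it is
# still a static, input-independent design-time optimization in the
# chapter's terminology.)
little_model = torch.quantization.quantize_dynamic(
    little_model, {torch.nn.Linear}, dtype=torch.qint8
)

CONFIDENCE_THRESHOLD = 0.9  # assumed value; tuned per application in practice


def big_little_inference(x: torch.Tensor) -> torch.Tensor:
    """Run the little model first; fall back to the big model only when
    the little model's top softmax probability is below the threshold."""
    with torch.no_grad():
        probs = F.softmax(little_model(x), dim=-1)
        confidence, prediction = probs.max(dim=-1)
        if confidence.item() >= CONFIDENCE_THRESHOLD:
            return prediction  # "easy" input: the cheap path suffices
        return big_model(x).argmax(dim=-1)  # "hard" input: pay for accuracy


# Usage on a dummy 28x28 single-channel input (e.g., an MNIST-like image).
sample = torch.randn(1, 1, 28, 28)
print(big_little_inference(sample).item())

The design point the sketch captures is confidence gating: easy inputs exit through the cheap quantized path, so the average energy per inference tracks input difficulty rather than the worst case, which is exactly what a static-only optimization cannot achieve.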
File in this record: preprint.pdf (pre-print / submitted version, restricted access, 1.22 MB, Adobe PDF).

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2851984