Daghero, F.; Jahier Pagliari, D.; Poncino, M. (2021). Energy-efficient deep learning inference on edge devices. In: Shiho, K.; Ganesh, C. D. (eds.), Hardware Accelerator Systems for Artificial Intelligence and Machine Learning (Advances in Computers). Elsevier, pp. 247-301. ISBN 978-0-12-823123-4. DOI: 10.1016/bs.adcom.2020.07.002
Energy-efficient deep learning inference on edge devices
Daghero F.; Jahier Pagliari D.; Poncino M.
2021
Abstract
The success of deep learning comes at the cost of very high computational complexity. Consequently, Internet of Things (IoT) edge nodes typically offload deep learning tasks to powerful cloud servers, an inherently inefficient solution. Transmitting raw data to the cloud through wireless links incurs long latency and high energy consumption, and pure cloud offloading does not scale, as it puts pressure on the network and raises security concerns related to the transmission of user data. The straightforward solution to these issues is to perform deep learning inference at the edge. However, cost- and power-constrained embedded processors, with their limited processing and memory capabilities, cannot handle complex deep learning models. Even with hardware acceleration, a common approach to coping with such complexity, embedded devices still cannot directly run models designed for cloud servers. It then becomes necessary to employ proper optimization strategies to enable deep learning processing at the edge. In this chapter, we survey the most relevant optimizations for embedded deep learning inference, focusing in particular on those that favor hardware acceleration, such as quantization and big-little architectures. We divide our analysis into two parts. First, we review classic approaches based on static (design-time) optimizations. We then show that these solutions are often suboptimal, as they produce models that are either over-optimized for complex inputs (yielding accuracy losses) or under-optimized for simple inputs (missing energy-saving opportunities). Finally, we review the more recent trend of dynamic (input-dependent) optimizations, which solve this problem by adapting the optimization to the processed input.
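The two optimization families named in the abstract can be made concrete with short sketches. First, quantization: replacing floating-point weights and activations with low-precision integers. The Python snippet below is a minimal illustration of uniform symmetric 8-bit post-training quantization, assuming only NumPy; the function names and the toy weight matrix are ours, not the chapter's.

```python
import numpy as np

def quantize_symmetric_int8(weights: np.ndarray):
    """Uniform symmetric post-training quantization to signed 8-bit."""
    # Choose the scale so the largest-magnitude weight maps to 127.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover a float approximation of the original weights.
    return q.astype(np.float32) * scale

# Toy example: quantize a random weight matrix and measure the error.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_symmetric_int8(w)
print(f"scale={s:.6f}, max abs error={np.abs(w - dequantize(q, s)).max():.6f}")
```

Second, big-little architectures, a representative dynamic (input-dependent) optimization: a small model classifies easy inputs, and the large model runs only when the small one is not confident. The sketch below shows one common confidence-gated formulation; `little_model`, `big_model`, and the 0.9 threshold are illustrative placeholders, not values from the chapter.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    e = np.exp(logits - logits.max())
    return e / e.sum()

def big_little_inference(x, little_model, big_model, threshold: float = 0.9):
    """Run the cheap model first; invoke the big one only on 'hard' inputs."""
    probs = softmax(little_model(x))
    if probs.max() >= threshold:      # confident enough: stop early
        return int(probs.argmax()), "little"
    probs = softmax(big_model(x))     # low confidence: pay for accuracy
    return int(probs.argmax()), "big"

# Toy stand-ins for trained models (hypothetical):
little = lambda x: np.array([4.0, 0.1, 0.2])  # very confident on this input
big = lambda x: np.array([0.3, 0.2, 0.1])
print(big_little_inference(np.zeros(4), little, big))  # -> (0, 'little')
```

The threshold directly trades accuracy for energy: raising it routes more inputs to the big model, which is exactly the per-input adaptation that static, design-time optimizations cannot provide.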
File: preprint.pdf (Preprint / submitted version, pre-review). 1.22 MB, Adobe PDF. Restricted access; copy available on request.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2851984