The success of deep learning comes at the cost of very high computational complexity. Consequently, Internet of Things (IoT) edge nodes typically offload deep learning tasks to powerful cloud servers, an inherently inefficient solution: transmitting raw data to the cloud over wireless links incurs long latencies and high energy consumption. Moreover, pure cloud offloading does not scale, as it puts pressure on the network, and it raises security concerns related to the transmission of user data. The straightforward solution to these issues is to perform deep learning inference at the edge. However, cost- and power-constrained embedded processors, with their limited processing and memory capabilities, cannot handle complex deep learning models. Even with hardware acceleration, a common approach to coping with such complexity, embedded devices are still unable to directly run models designed for cloud servers. It then becomes necessary to employ appropriate optimization strategies to enable deep learning processing at the edge. In this chapter, we survey the most relevant optimizations for supporting embedded deep learning inference, focusing in particular on optimizations that favor hardware acceleration (such as quantization and big-little architectures). We divide our analysis into two parts. First, we review classic approaches based on static (design-time) optimizations. We then show how these solutions are often suboptimal, as they produce models that are either over-optimized for complex inputs (yielding accuracy losses) or under-optimized for simple inputs (losing energy-saving opportunities). Finally, we review the more recent trend of dynamic (input-dependent) optimizations, which solve this problem by adapting the optimization to the processed input.
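To make the two optimization families named in the abstract concrete, the Python sketches below illustrate them in minimal form. All names and settings (per-tensor symmetric scaling, the 0.9 confidence threshold, the toy random "models") are illustrative assumptions, not the specific methods surveyed in the chapter.

The first sketch shows post-training uniform quantization, a classic static optimization: weights are mapped once, at design time, to low-bitwidth integers.

    import numpy as np

    def quantize_symmetric(weights, num_bits=8):
        # Uniform symmetric per-tensor quantization (an assumed scheme):
        # map float weights to signed num_bits integers with one scale factor.
        qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8 bits
        scale = np.abs(weights).max() / qmax           # single per-tensor scale
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover an approximation of the original floats.
        return q.astype(np.float32) * scale

    w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
    q, s = quantize_symmetric(w)
    print("max abs quantization error:", np.abs(w - dequantize(q, s)).max())

The second sketch illustrates the big-little idea behind dynamic, input-dependent optimization: a cheap model handles easy inputs, and the expensive model runs only when the cheap model is not confident.

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    W_little = rng.standard_normal((10, 64)) * 0.1    # toy "small" classifier
    W_big = rng.standard_normal((10, 64)) * 0.1       # toy "large" classifier

    def big_little_inference(x, threshold=0.9):
        # Run the cheap model first; invoke the expensive one only when the
        # cheap model's top-class confidence falls below the threshold.
        probs = softmax(W_little @ x)
        if probs.max() >= threshold:
            return int(probs.argmax())                # easy input: little model suffices
        return int(softmax(W_big @ x).argmax())       # hard input: fall back to big model

    x = rng.standard_normal(64)
    print("predicted class:", big_little_inference(x))

In such a scheme the threshold trades accuracy for energy: raising it routes more inputs to the big model, lowering it saves energy at the risk of misclassifying harder inputs.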
Energy-efficient deep learning inference on edge devices / Daghero, F.; Jahier Pagliari, D.; Poncino, M. - ELECTRONIC. - (In press).
Title: | Energy-efficient deep learning inference on edge devices |
Authors: | Daghero, F.; Jahier Pagliari, D.; Poncino, M. |
Publication date: | In press |
Book title: | Advances in Computers |
Series: | |
Appears in the categories: | 2.1 Contribution in a volume (Chapter or Essay) |
http://hdl.handle.net/11583/2851984