Recurrent Neural Networks (RNNs) such as those based on the Long Short-Term Memory (LSTM) architecture are state-of-the-art deep learning models for sequence analysis. Given the complexity of RNN-based inference, IoT devices typically offload this task to a cloud server. However, the complexity of RNN inference strongly depends on the length of the processed input sequence. Therefore, when communication time is taken into account, it may be more convenient to process short input sequences locally and only offload long ones to the cloud. In this paper, we propose a low-overhead runtime tool that makes this decision automatically. Results based on performance profiling of real edge and cloud devices show that our method is able to reduce the total execution time of the system by up to 20% compared to solutions that execute the RNN inference fully locally or fully in the cloud.
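The decision rule the abstract describes can be illustrated with a short sketch. The Python fragment below compares a predicted local latency against a predicted offloading latency (communication plus cloud inference) as a function of input sequence length; all constants, function names, and the linear cost models are illustrative assumptions for this sketch, not the authors' profiled values or actual tool.

```python
# Hypothetical sketch of an input-dependent edge/cloud decision rule for RNN
# inference, in the spirit of the abstract. All constants below are
# illustrative assumptions, not the paper's measured profiling data.

T_EDGE_PER_STEP = 4.0e-3   # assumed seconds per RNN timestep on the edge device
T_CLOUD_PER_STEP = 0.5e-3  # assumed seconds per RNN timestep on the cloud server
T_ROUND_TRIP = 30.0e-3     # assumed fixed network round-trip latency (s)
BYTES_PER_STEP = 256       # assumed payload size per input timestep (bytes)
BANDWIDTH = 1.0e6          # assumed uplink bandwidth (bytes/s)

def estimated_edge_time(seq_len: int) -> float:
    """Predicted latency of running inference locally on the edge device."""
    return T_EDGE_PER_STEP * seq_len

def estimated_cloud_time(seq_len: int) -> float:
    """Predicted latency of offloading: data transmission plus cloud inference."""
    comm = T_ROUND_TRIP + (BYTES_PER_STEP * seq_len) / BANDWIDTH
    return comm + T_CLOUD_PER_STEP * seq_len

def choose_target(seq_len: int) -> str:
    """Pick the execution target with the lower predicted total time."""
    if estimated_edge_time(seq_len) <= estimated_cloud_time(seq_len):
        return "edge"
    return "cloud"

if __name__ == "__main__":
    # Short sequences stay local; long ones are offloaded, since the fixed
    # communication cost is amortized as sequence length grows.
    for n in (4, 16, 64, 256):
        print(f"seq_len={n:4d} -> {choose_target(n)}")
```

With these example numbers the crossover falls between 4 and 16 timesteps, matching the qualitative behavior the abstract claims: below the crossover, local execution avoids the fixed round-trip cost; above it, the faster cloud hardware wins.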
Optimal Input-Dependent Edge-Cloud Partitioning for RNN Inference / Jahier Pagliari, Daniele; Chiaro, Roberta; Chen, Yukai; Macii, Enrico; Poncino, Massimo. - ELECTRONIC. - (2019), pp. 442-445. (Paper presented at the 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), held in Genova (Italy), 27-29 November 2019.)
Title: | Optimal Input-Dependent Edge-Cloud Partitioning for RNN Inference |
Authors: | Jahier Pagliari, Daniele; Chiaro, Roberta; Chen, Yukai; Macii, Enrico; Poncino, Massimo |
Publication date: | 2019 |
Abstract: | Recurrent Neural Networks (RNNs) such as those based on the Long Short-Term Memory (LSTM) architecture are state-of-the-art deep learning models for sequence analysis. Given the complexity of RNN-based inference, IoT devices typically offload this task to a cloud server. However, the complexity of RNN inference strongly depends on the length of the processed input sequence. Therefore, when communication time is taken into account, it may be more convenient to process short input sequences locally and only offload long ones to the cloud. In this paper, we propose a low-overhead runtime tool that makes this decision automatically. Results based on performance profiling of real edge and cloud devices show that our method is able to reduce the total execution time of the system by up to 20% compared to solutions that execute the RNN inference fully locally or fully in the cloud. |
ISBN: | 978-1-7281-0996-1 |
Appears in types: | 4.1 Contribution in Conference Proceedings |
Files in this product:
File | Description | Type | License | |
---|---|---|---|---|
08965079.pdf | Main article | 2a Post-print editorial version / Version of Record | Non-public - Private/restricted access | Administrator / Request a copy |
postprint.pdf | Post-print | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | Visible to all / View/Open |
http://hdl.handle.net/11583/2785765