Optimal Input-Dependent Edge-Cloud Partitioning for RNN Inference / Jahier Pagliari, Daniele; Chiaro, Roberta; Chen, Yukai; Macii, Enrico; Poncino, Massimo. - ELECTRONIC. - (2019), pp. 442-445. (Paper presented at the 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), held in Genova, Italy, 27-29 November 2019) [10.1109/ICECS46596.2019.8965079].
Optimal Input-Dependent Edge-Cloud Partitioning for RNN Inference
Jahier Pagliari, Daniele; Chiaro, Roberta; Chen, Yukai; Macii, Enrico; Poncino, Massimo
2019
Abstract
Recurrent Neural Networks (RNNs) such as those based on the Long Short-Term Memory (LSTM) architecture are state-of-the-art deep learning models for sequence analysis. Given the complexity of RNN-based inference, IoT devices typically offload this task to a cloud server. However, the complexity of RNN inference strongly depends on the length of the processed input sequence. Therefore, when communication time is taken into account, it may be more convenient to process short input sequences locally and only offload long ones to the cloud. In this paper, we propose a low-overhead runtime tool that performs this decision automatically. Results based on performance profiling of real edge and cloud devices show that our method is able to reduce the total execution time of the system by up to 20% compared to solutions that execute the RNN inference fully locally or fully in the cloud.
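The input-dependent decision described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that profiling yields per-device latency models that are linear in the sequence length; all coefficients, byte counts, and bandwidth values below are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of an input-dependent edge/cloud dispatch rule.
# Assumption (not from the paper): profiling gives latency models that
# are linear in the sequence length L, i.e. t_edge(L) = a_e*L + b_e and
# t_cloud(L) = t_comm(L) + a_c*L + b_c. All numeric defaults are
# illustrative placeholders.

def estimate_edge_time(seq_len: int, a_e: float = 2.0e-3,
                       b_e: float = 1.0e-3) -> float:
    """Estimated local (edge) inference time in seconds."""
    return a_e * seq_len + b_e

def estimate_cloud_time(seq_len: int, bytes_per_step: int = 256,
                        bandwidth_bps: float = 1.0e6, rtt_s: float = 0.05,
                        a_c: float = 0.2e-3, b_c: float = 1.0e-3) -> float:
    """Estimated offloading time: input transmission plus remote inference."""
    t_comm = rtt_s + (seq_len * bytes_per_step * 8) / bandwidth_bps
    return t_comm + a_c * seq_len + b_c

def choose_target(sequence) -> str:
    """Dispatch one input to whichever target is predicted to be faster."""
    length = len(sequence)
    if estimate_edge_time(length) <= estimate_cloud_time(length):
        return "edge"
    return "cloud"

# Example: short sequences stay local, long ones are offloaded.
print(choose_target(range(10)))    # likely "edge"
print(choose_target(range(5000)))  # likely "cloud"
```

Under these assumptions the decision reduces to a length threshold where the two latency curves cross, which is why such a runtime check can be made low-overhead.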
| File | Access | Description | Type | License | Size | Format | |
|---|---|---|---|---|---|---|---|
| 08965079.pdf | Restricted access | Main article | 2a Post-print editorial version / Version of Record | Non-public - Private/restricted access | 628.18 kB | Adobe PDF | View/Open (Request a copy) |
| postprint.pdf | Open access | Post-print | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 2.1 MB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2785765