Optimal Input-Dependent Edge-Cloud Partitioning for RNN Inference / Jahier Pagliari, Daniele; Chiaro, Roberta; Chen, Yukai; Macii, Enrico; Poncino, Massimo. - (2019), pp. 442-445. (Paper presented at the 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), held in Genova, Italy, 27-29 November 2019) [10.1109/ICECS46596.2019.8965079].

Optimal Input-Dependent Edge-Cloud Partitioning for RNN Inference

Jahier Pagliari, Daniele; Chiaro, Roberta; Chen, Yukai; Macii, Enrico; Poncino, Massimo
2019

Abstract

Recurrent Neural Networks (RNNs), such as those based on the Long Short-Term Memory (LSTM) architecture, are state-of-the-art deep learning models for sequence analysis. Given the complexity of RNN-based inference, IoT devices typically offload this task to a cloud server. However, the complexity of RNN inference strongly depends on the length of the processed input sequence. Therefore, when communication time is taken into account, it may be faster to process short input sequences locally and offload only long ones to the cloud. In this paper, we propose a low-overhead runtime tool that performs this decision automatically. Results based on performance profiling of real edge and cloud devices show that our method reduces the total execution time of the system by up to 20% compared to solutions that execute the RNN inference fully locally or fully in the cloud.
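
To make the partitioning criterion concrete, the following is a minimal sketch (not the authors' tool) of an input-dependent edge/cloud dispatch for RNN inference. The linear cost models, every coefficient, and the infer_locally/offload_to_cloud helpers are illustrative assumptions; in the paper, per-device costs come from performance profiling of real edge and cloud hardware.

from typing import Sequence

# Illustrative cost coefficients (assumed values, not taken from the
# paper); in practice these would be measured offline by profiling the
# edge device, the cloud server, and the network link.
T_STEP_EDGE = 2.0e-3      # seconds per RNN time step on the edge device
T_STEP_CLOUD = 0.2e-3     # seconds per RNN time step on the cloud server
T_RTT = 50e-3             # round-trip network latency, seconds
BYTES_PER_STEP = 256      # payload size per input time step, bytes
BANDWIDTH = 1.0e6         # uplink bandwidth, bytes per second

def edge_time(n_steps: int) -> float:
    # Local RNN inference cost grows with the input sequence length.
    return n_steps * T_STEP_EDGE

def cloud_time(n_steps: int) -> float:
    # Offloading pays round-trip latency and transfer time, followed by
    # a faster per-step inference on the server.
    transfer = n_steps * BYTES_PER_STEP / BANDWIDTH
    return T_RTT + transfer + n_steps * T_STEP_CLOUD

def infer_locally(x: Sequence) -> object:
    ...  # placeholder: run the RNN on the edge device

def offload_to_cloud(x: Sequence) -> object:
    ...  # placeholder: send the input to the cloud and await the result

def run_inference(x: Sequence) -> object:
    # Per-input decision: short sequences stay local, long ones are
    # offloaded, whichever cost estimate is lower.
    n = len(x)
    if edge_time(n) <= cloud_time(n):
        return infer_locally(x)
    return offload_to_cloud(x)

Because both estimates are affine in the sequence length, the decision reduces to comparing the input length against a single precomputed threshold (about 32 time steps with the assumed coefficients above), which is consistent with the low runtime overhead claimed in the abstract.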
ISBN: 978-1-7281-0996-1
Files in this record:

08965079.pdf (not available)
Description: Main article
Type: 2a Post-print editorial version / Version of Record
License: Non-public - private/restricted access
Size: 628.18 kB
Format: Adobe PDF (request a copy)

postprint.pdf (open access)
Description: Post-print
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 2.1 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2785765