
Machine Learning Performance at the Edge: When to Offload an Inference Task / Chukhno, Olga; Singh, Gurtaj; Campolo, Claudia; Molinaro, Antonella; Chiasserini, Carla Fabiana. - ELECTRONIC. - (2023). (Paper presented at the ACM MobiCom Workshop on Networked Sensing Systems for a Sustainable Society (NET4us 2023), held in Madrid, Spain, on 6 October 2023) [10.1145/3615991.3616403].

Machine Learning Performance at the Edge: When to Offload an Inference Task

Carla Fabiana Chiasserini
2023

Abstract

Machine Learning (ML) techniques play a crucial role in extracting valuable insights from the large amounts of data collected at scale through networked sensing systems. Given the increased capabilities of user devices and the growing demand for inference in mobile sensing applications, we are witnessing a paradigm shift where inference is executed at the end devices instead of burdening the network and cloud infrastructures. This paper investigates the performance of inference execution at the network edge and at end devices, using both a full and a pruned model. While pruning reduces model size, thus making the model amenable to execution at an end device and decreasing the communication footprint, trade-offs in time complexity, potential accuracy loss, and energy consumption must be accounted for. We tackle such trade-offs through extensive experiments under various ML models, edge load conditions, and pruning factors. Our results show that executing a pruned model provides time and energy (on the device side) savings of up to 40% and 53%, respectively, w.r.t. the full model. Also, executing inference at the end device may lead to 60% faster decision making compared to inference execution at a highly loaded edge.
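To make the pruning trade-off mentioned above concrete, the sketch below shows one common way to produce a pruned model: L1-magnitude unstructured pruning via PyTorch's torch.nn.utils.prune utilities. The toy model, the 40% pruning amount (chosen only to echo the savings figures quoted in the abstract), and the choice of PyTorch are illustrative assumptions; the paper's actual models and pruning method are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Small stand-in network; the paper's actual ML models are not specified here.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Zero out 40% of each Linear layer's weights by L1 magnitude
# (an assumed pruning factor, for illustration only).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Measure the resulting sparsity across all parameters.
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"Sparsity: {zeros / total:.2%}")

# The pruned model still runs inference on-device as usual.
with torch.no_grad():
    output = model(torch.randn(1, 128))
```

Note that unstructured pruning like this zeroes individual weights; realizing the time and energy savings the paper reports on real hardware typically also requires sparse-aware runtimes or structured pruning that shrinks the actual tensor shapes.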
ISBN: 979-8-4007-0365-2
Files in this record:

2023_NET4us_ADROIT6G-5.pdf (open access)
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 1.16 MB, Adobe PDF

Chiasserini-Machine.pdf (not available)
Type: 2a. Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 587.21 kB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2980928