
Towards Energy-Efficient Collaborative Inference and Fine-tuning: Matching Model Compression and Offloading with Resource Availability / Zhou, Yue-e; Ma, Lianbo; Wang, Xingwei; Li, Qing; Chiasserini, Carla Fabiana; Han, Guangjie. - In: IEEE TRANSACTIONS ON NETWORKING. - ISSN 2998-4157. - (2026).

Towards Energy-Efficient Collaborative Inference and Fine-tuning: Matching Model Compression and Offloading with Resource Availability

Carla Fabiana Chiasserini;
2026

Abstract

We consider the task of accelerating collaborative inference via cloud-edge-end collaboration, which involves a series of tightly coupled decisions: which DNN model to select, how much to compress it, how to partition it, and where to offload the resulting submodels. In practical deployments, these decisions jointly affect both fine-tuning and inference performance, and must account for aspects such as the model in use, the computational resources and local datasets available at each device, and network latencies, which significantly increases the complexity of the optimization problem. Yet, no existing studies address the joint optimization of these tightly coupled decisions. In this paper, we formulate the problem as a multi-dimensional optimization problem that jointly optimizes collaborative inference and fine-tuning by selecting the DNN model, the compression level, the partition strategy, and the computational resource allocation, with the objective of minimizing the overall energy consumption of the learning-inference process, subject to accuracy and latency constraints. To this end, we propose an algorithmic framework called JQODI, which combines a time-energy tree diagram representing the learning process, a dynamic programming solution strategy, and a data-driven theoretical approach to predict the expected total number of training epochs needed to meet the accuracy requirement. We prove that JQODI approximates the optimal solution with polynomial complexity. Numerical results demonstrate that JQODI surpasses state-of-the-art methods in both energy efficiency and latency.
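The abstract describes selecting a model, a compression level, and a partition/offloading strategy so as to minimize energy under latency and accuracy constraints. The following toy sketch illustrates the shape of that decision space; it is NOT the paper's JQODI algorithm (which uses a time-energy tree diagram and dynamic programming), and every cost model and number below is invented purely for illustration.

```python
from itertools import product

# Hypothetical decision variables and cost factors (all values are made up):
CONFIGS = {
    "model":       {"small": 1.0, "large": 2.5},                 # relative compute cost
    "compression": {0.25: 0.25, 0.5: 0.5, 1.0: 1.0},             # fraction of weights kept
    "partition":   {"device": 1.0, "edge": 0.6, "cloud": 0.4},   # device-side compute share
}

def energy(m, c, p):
    # Assumption: energy scales with the compute kept on the end device.
    return CONFIGS["model"][m] * CONFIGS["compression"][c] * CONFIGS["partition"][p]

def latency(m, c, p):
    # Assumption: offloading trades compute time for a fixed network delay.
    net_delay = {"device": 0.0, "edge": 0.3, "cloud": 0.8}[p]
    return CONFIGS["model"][m] * CONFIGS["compression"][c] + net_delay

def accuracy(m, c, p):
    # Assumption: heavier compression degrades accuracy; larger model helps.
    base = {"small": 0.85, "large": 0.92}[m]
    return base - 0.1 * (1.0 - CONFIGS["compression"][c])

def best_config(max_latency=2.0, min_accuracy=0.84):
    """Exhaustively search the joint decision space for the feasible
    configuration with minimum energy (brute force, for illustration only)."""
    feasible = [
        (energy(m, c, p), m, c, p)
        for m, c, p in product(CONFIGS["model"],
                               CONFIGS["compression"],
                               CONFIGS["partition"])
        if latency(m, c, p) <= max_latency and accuracy(m, c, p) >= min_accuracy
    ]
    return min(feasible) if feasible else None
```

Even this toy version shows why the decisions are coupled: a heavily compressed large model offloaded to the cloud can beat an uncompressed small model kept on-device, so the dimensions cannot be optimized independently. The paper's contribution is doing this jointly and efficiently (in polynomial time) rather than by brute force.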
Files in this product:
File: TON_manuscript.pdf (open access)
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 3.65 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3010272