Towards Energy-Efficient Collaborative Inference and Fine-tuning: Matching Model Compression and Offloading with Resource Availability

Zhou, Yue-e; Ma, Lianbo; Wang, Xingwei; Li, Qing; Chiasserini, Carla Fabiana; Han, Guangjie. In: IEEE Transactions on Networking, ISSN 2998-4157, 2026.
Abstract
We consider the task of accelerating collaborative inference via cloud-edge-end collaboration, which involves a series of tightly coupled decisions: which DNN model to select, how much to compress it, how to partition it, and where to offload the partitioned submodels. In practical deployments, these decisions jointly affect both fine-tuning and inference performance, and must account for aspects such as the model being used, the computational resources and local datasets available at each device, and network latencies, which significantly increases the complexity of the optimization problem. Yet, no existing studies address the joint optimization of these tightly coupled decisions. In this paper, we cast this problem as a multi-dimensional optimization problem, jointly optimizing collaborative inference and fine-tuning through the selection of the DNN model, compression level, partition strategy, and computational resource allocation, with the objective of minimizing the overall energy consumption of the learning-inference process subject to accuracy and latency constraints. To this end, we propose an algorithmic framework called JQODI, which combines a time-energy tree diagram representing the learning process, a dynamic programming solution strategy, and a data-driven theoretical approach for predicting the expected total number of training epochs needed to meet the accuracy requirement. We prove that JQODI approximates the optimal solution with polynomial complexity. Numerical results demonstrate that JQODI surpasses state-of-the-art methods in both energy efficiency and latency.
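As a reading aid, the joint problem the abstract describes can be written compactly as follows. The notation is ours, not the paper's: \(m\), \(q\), \(p\), and \(f\) stand for the chosen DNN model, compression level, partition strategy, and computational resource allocation; \(E_{\mathrm{tune}}\) and \(E_{\mathrm{inf}}\) for fine-tuning and inference energy; \(A\) for the resulting accuracy; and \(T_{\mathrm{inf}}\) for end-to-end inference latency.

```latex
% Hypothetical notation, not the paper's: m = model, q = compression level,
% p = partition strategy, f = computational resource allocation.
\begin{align*}
  \min_{m,\,q,\,p,\,f} \quad & E_{\mathrm{tune}}(m,q,p,f) + E_{\mathrm{inf}}(m,q,p,f) \\
  \text{s.t.} \quad & A(m,q) \ge A_{\min} && \text{(accuracy requirement)} \\
                    & T_{\mathrm{inf}}(m,q,p,f) \le T_{\max} && \text{(latency budget)} \\
                    & m \in \mathcal{M},\; q \in \mathcal{Q},\; p \in \mathcal{P},\; f \in \mathcal{F} && \text{(feasible choices)}
\end{align*}
```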
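The abstract's "time-energy tree" combined with dynamic programming suggests a resource-constrained shortest-path computation: each level of the tree fixes one decision (model, compression level, partition point, resource share), edges carry the time and energy that decision adds, and the DP picks the minimum-energy root-to-leaf path that fits the latency budget. The sketch below is ours, assuming that structure; the `Node` layout and cost fields are hypothetical, not JQODI's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One decision stage in a (hypothetical) time-energy tree. Each edge
    to a child fixes one choice and carries the time and energy it adds."""
    children: list = field(default_factory=list)  # entries: (child, time, energy)

def min_energy(node, budget, step=1.0, memo=None):
    """Minimum total energy of a root-to-leaf path whose total time fits
    within `budget`. Time is discretized to `step` so the state space is
    finite, giving the usual polynomial DP for constrained path problems."""
    if memo is None:
        memo = {}
    budget = int(budget // step) * step   # quantize so memo states are consistent
    key = (id(node), budget)
    if key in memo:
        return memo[key]
    if not node.children:                 # leaf: a complete decision vector
        memo[key] = 0.0
        return 0.0
    best = float("inf")
    for child, t, e in node.children:
        if t <= budget:                   # prune branches that blow the deadline
            best = min(best, e + min_energy(child, budget - t, step, memo))
    memo[key] = best
    return best

# Tiny example: two candidate models, each with two compression levels.
leaf = Node()
m1 = Node(children=[(leaf, 2.0, 5.0), (leaf, 1.0, 8.0)])
m2 = Node(children=[(leaf, 3.0, 4.0), (leaf, 2.5, 6.0)])
root = Node(children=[(m1, 1.0, 1.0), (m2, 1.0, 0.5)])
print(min_energy(root, budget=4.0))  # -> 4.5 (root->m2 edge 0.5 + feasible child 4.0)
```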
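Finally, the "data-driven theoretical approach" for predicting the number of training epochs is not specified in this record. A common choice, assumed here purely for illustration, is to fit the observed accuracy-vs-epoch curve to a saturating power law and invert it at the target accuracy.

```python
import numpy as np
from scipy.optimize import curve_fit

def acc_curve(k, a, b, c):
    """Saturating power law often used to model accuracy vs. epoch k."""
    return a - b * np.power(k, -c)

def predict_epochs(epochs_seen, acc_seen, target_acc):
    """Fit the curve to the epochs observed so far, then solve
    a - b * k^(-c) = target_acc for k. Returns None when the fitted
    asymptote `a` never reaches the target."""
    (a, b, c), _ = curve_fit(acc_curve, epochs_seen, acc_seen,
                             p0=(1.0, 1.0, 0.5), maxfev=10000)
    if target_acc >= a:
        return None
    return int(np.ceil((b / (a - target_acc)) ** (1.0 / c)))

# Synthetic check: curve 0.9 - 0.5 k^(-0.6) crosses 0.85 near k = 47.
ks = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
print(predict_epochs(ks, 0.9 - 0.5 * ks ** -0.6, 0.85))  # -> 47
```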
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| TON_manuscript.pdf | open access | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 3.65 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3010272
