
CLEAR: Scheduling of Multi-model Mobile Workloads on Chiplet Edge Platforms / Singhal, C.; Mendula, M.; Malandrino, F.; Levorato, M.; Chiasserini, C. F. (2026). IEEE WoWMoM 2026, Bologna (Italy), 16–19 June 2026.

CLEAR: Scheduling of Multi-model Mobile Workloads on Chiplet Edge Platforms

C. F. Chiasserini
2026

Abstract

To support multiple AI-based applications, mobile systems need to collaboratively execute DNN architectures on heterogeneous AI accelerators. At the same time, the increasing DNN complexity and high degree of diversity in workloads on multichip module (MCM) accelerators are pushing AI processing off mobile nodes onto the edge. This has made computationally intensive, edge-based solutions the dominant approach for the deployment of modern neural networks. However, the rigid structure of fully-executed DNNs fails to align with the modular nature of MCM architectures, limiting their potential for efficient execution. In this paper, we introduce CLEAR, a novel optimization framework based on geometric programming that leverages both transformer-based and more canonical DNNs with early exits. CLEAR enables fast, coordinated decision-making across DNN design, workload distribution, and resource allocation, with the overarching goal of minimizing inference energy consumption. To our knowledge, this is the first work to integrate dynamic DNN optimization with decisions at both the communication infrastructure and hardware accelerator levels. We evaluate CLEAR using real-world wireless measurements and dynamic DNNs applied to computer vision inference tasks. Our results demonstrate that CLEAR achieves near-optimal performance and reduces energy consumption and resource usage by over 80% and 70%, respectively, compared to its benchmark.
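The abstract does not detail CLEAR's geometric-programming formulation. As a minimal illustration of why energy-minimization problems of this kind fit the GP mold, consider a toy single-accelerator model: dynamic power scales roughly with the cube of clock frequency, so the energy for a fixed amount of work is a monomial in frequency, and a latency budget becomes a monomial constraint. All constants, units, and the function name below are hypothetical, not taken from the paper.

```python
# Toy geometric-program instance (illustrative sketch; not the paper's model).
# Running W normalized cycles at frequency f:
#   power  ~ f**3          (dynamic power model)
#   time   = W / f
#   energy = power * time = W * f**2   -> a monomial in f
# Latency constraint: W / f <= L_max, i.e. f >= W / L_max.
# Since energy is increasing in f, the GP optimum sits at the tightest
# feasible frequency f* = W / L_max.

def min_energy_frequency(W: float, L_max: float) -> tuple[float, float]:
    """Return (optimal frequency, minimal energy) for the toy GP above."""
    f_star = W / L_max           # latency constraint active at the optimum
    return f_star, W * f_star**2

f_star, e_star = min_energy_frequency(W=2.0, L_max=0.5)
# f_star == 4.0, e_star == 32.0 (normalized units)
```

In the multi-chiplet setting the paper targets, the workload-split fractions and per-chiplet frequencies would all become GP variables, but the same structure (posynomial energy objective, monomial latency constraints) is what makes the joint problem efficiently solvable.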
Files in this record:

Scheduling_dynamic_models_chiplet-5.pdf
Access: open access
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 635.34 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3008711