Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting / Wang, Zhihao; Cornacchia, Alessandro; Galante, Franco; Centofanti, Carlo; Sacco, Alessio; Jiang, Dingde. - ELECTRONIC. - (2025), pp. 1-3. (Paper presented at the 1st Workshop on Next-Generation Network Observability (NGNO), Coimbra, Portugal, September 8-11, 2025) [10.1145/3748496.3748990].
Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting
Wang, Zhihao; Galante, Franco; Sacco, Alessio
2025
Abstract
Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly finding application in network-related tasks, such as network configuration synthesis and dialogue-based interfaces to network measurements, among others. In this preliminary work, we restrict our focus to the application of AI agents to network troubleshooting and elaborate on the need for a standardized, reproducible, and open benchmarking platform on which to build and evaluate AI agents with low operational effort. This platform primarily aims to standardize and democratize experimentation with AI agents by enabling researchers and practitioners, including non-domain experts such as ML/AI engineers, to evaluate AI agents on curated problem sets without concern for the underlying operational complexities. We present a modular and extensible benchmarking framework that supports widely adopted network emulators. It targets an extensible set of network issues in diverse real-world scenarios (e.g., data centers, access networks, WANs) and orchestrates end-to-end evaluation workflows, including failure injection, telemetry instrumentation and collection, and agent performance evaluation. Agents can be easily connected to an emulation platform through a single Application Programming Interface (API) and rapidly evaluated. The code is publicly available at https://github.com/zhihao1998/LLM4NetLab.
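
As an illustration of the single-API connection model described in the abstract, the sketch below shows how an agent loop might interact with such a benchmarking platform: start a task (emulator setup and failure injection), pull telemetry, submit a diagnosis, and receive a score. All names (NetLabClient, start_task, get_telemetry, submit_diagnosis) are hypothetical placeholders and are not taken from the LLM4NetLab codebase; see the repository for the actual interface.

# Hypothetical Python sketch of an agent interacting with a troubleshooting benchmark.
# None of these names come from LLM4NetLab; they only illustrate the workflow
# (scenario setup, failure injection, telemetry collection, diagnosis, scoring).

from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NetLabClient:
    """Stand-in for the platform's single agent-facing API."""
    scenario: str                                   # e.g., "datacenter-leaf-spine"
    telemetry: Dict[str, str] = field(default_factory=dict)

    def start_task(self) -> str:
        # The real platform would spin up the emulator and inject a failure here.
        self.telemetry = {"bgp_summary": "neighbor 10.0.0.2 state Idle"}
        return "task-001"

    def get_telemetry(self, task_id: str) -> Dict[str, str]:
        # Return collected counters, logs, and probe results for this task.
        return self.telemetry

    def submit_diagnosis(self, task_id: str, diagnosis: str) -> float:
        # The platform would compare the diagnosis against the injected failure.
        return 1.0 if "bgp" in diagnosis.lower() else 0.0


def run_agent(client: NetLabClient) -> float:
    task_id = client.start_task()
    telemetry = client.get_telemetry(task_id)
    # A real agent would prompt an LLM with the telemetry; here we hard-code a guess.
    diagnosis = "BGP session down between leaf and spine"
    return client.submit_diagnosis(task_id, diagnosis)


if __name__ == "__main__":
    score = run_agent(NetLabClient(scenario="datacenter-leaf-spine"))
    print(f"agent score: {score}")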
https://hdl.handle.net/11583/3004654
