AutoPenBench: Benchmarking Generative Agents for Penetration Testing

Gioacchini, Luca; Mellia, Marco; Drago, Idilio; Delsanto, Alexander; Siracusano, Giuseppe; Bifulco, Roberto

doi:10.48550/arXiv.2410.03225

Generative AI agents, software systems powered by Large Language Models (LLMs), are emerging as a promising approach to automate cybersecurity tasks. Among these tasks, penetration testing is particularly challenging because of its complexity and the diverse strategies needed to simulate cyber-attacks. Despite growing interest and initial studies in automating penetration testing with generative agents, a comprehensive and standard framework for their evaluation and development is still missing. This paper introduces AutoPenBench, an open benchmark for evaluating generative agents in automated penetration testing. We present a comprehensive framework that includes 33 tasks, each representing a vulnerable system that the agent has to attack. Tasks are of increasing difficulty and include in-vitro and real-world scenarios. We assess agent performance with generic and specific milestones that allow us to compare results in a standardised manner and understand the limits of the agent under test. We show the benefits of AutoPenBench by testing two agent architectures: a fully autonomous agent and a semi-autonomous agent supporting human interaction. We compare their performance and limitations. For example, the fully autonomous agent performs unsatisfactorily, achieving a 21% Success Rate (SR) across the benchmark, solving 27% of the simple tasks and only one real-world task. In contrast, the assisted agent demonstrates substantial improvements, with a 64% SR. AutoPenBench also allows us to observe how different LLMs, such as GPT-4o or OpenAI o1, impact the ability of the agents to complete the tasks. We believe that our benchmark fills the gap with a standard and flexible framework to compare penetration testing agents on common ground. We hope to extend AutoPenBench together with the research community by making it available at https://github.com/lucagioacchini/auto-pen-bench.

AutoPenBench: Benchmarking Generative Agents for Penetration Testing / Gioacchini, L., Mellia, M., Drago, I., Delsanto, A., Siracusano, G., Bifulco, R. - ELECTRONIC. - (2024). [10.48550/arXiv.2410.03225]


Year of product: 2024
Appears in types: 5.15 Publication on portal

Files in this product:

File	Size	Format
2410.03225v2.pdf (open access; Description: Pre-Print; Type: 1. Preprint / submitted version [pre-review]; License: Public - All rights reserved)	2.51 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2994122

PORTO @ Archivio Istituzionale della Ricerca
