CFA-Bench: Cybersecurity Forensic LLM Agent Benchmark and Testing

De Santis, Francesco; Huang, Kai; Valentim, Rodolfo; Giordano, Danilo; Mellia, Marco; Houidi, Zied Ben; Rossi, Dario

doi:10.1109/eurospw67616.2025.00031

This paper investigates the capabilities and limitations of Large Language Model (LLM) agents in performing cybersecurity forensic tasks, including incident response, digital evidence correlation, and threat attribution. To enable a fair comparison of agents and LLMs, we introduce CFA-Bench, a novel benchmark designed to evaluate their forensic reasoning abilities. We leverage a controlled testbed where vulnerable services are instantiated, attacked, and monitored, generating forensic evidence in the form of packet captures and log traces. Using this setup, we generate 20 curated incidents targeting 13 distinct services, focusing on recent vulnerabilities. Each incident presents progressively complex checkpoints, culminating in the identification of the specific Common Vulnerabilities and Exposures (CVE) entry. We evaluate different LLM-powered agent architectures, equipping them with essential forensic tools such as a PCAP Reader and an Information Retriever. Each agent is asked to analyse the incidents, and we systematically track its performance across the different forensic checkpoints. While preliminary, our findings demonstrate the potential of LLM agents in cybersecurity forensics, revealing their strengths and critical areas for improvement. This study underscores the need for standardized benchmarks to rigorously assess LLM agents in cyber threat analysis. To this end, we make CFA-Bench open to the research community. Our results provide a foundation for future research aimed at refining agent architectures and enhancing their forensic reasoning capabilities.

CFA-Bench: Cybersecurity Forensic LLM Agent Benchmark and Testing / De Santis, Francesco; Huang, Kai; Valentim, Rodolfo; Giordano, Danilo; Mellia, Marco; Houidi, Zied Ben; Rossi, Dario. - (2025), pp. 217-225. (2025 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Venice (ITA), 30 June - 04 July 2025) [10.1109/eurospw67616.2025.00031].


Year of publication: 2025
ISBN: 979-8-3315-9546-3
Publication type: 4.1 Contribution in conference proceedings

Files in this item:

File	Access	Type	License	Size	Format
CFA-Bench_Cybersecurity_Forensic_Llm_Agent_Benchmark_and_Testing.pdf	Restricted access	2a Post-print, publisher's version / Version of Record (Description: Published)	Not public - Private/restricted access	3.94 MB	Adobe PDF
CFABench_Agent_Foresics.pdf	Open access	2. Post-print / Author's Accepted Manuscript	Public - All rights reserved	1.41 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3002951

PORTO @ Archivio Istituzionale della Ricerca
