CFA-Bench: Cybersecurity Forensic LLM Agent Benchmark and Testing / De Santis, Francesco; Huang, Kai; Valentim, Rodolfo; Giordano, Danilo; Mellia, Marco; Houidi, Zied Ben; Rossi, Dario. - (2025), pp. 217-225. (Paper presented at the 2025 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), held in Venice (ITA), 30 June - 04 July 2025) [10.1109/eurospw67616.2025.00031].
CFA-Bench: Cybersecurity Forensic LLM Agent Benchmark and Testing
De Santis, Francesco; Huang, Kai; Valentim, Rodolfo; Giordano, Danilo; Mellia, Marco; Houidi, Zied Ben; Rossi, Dario
2025
Abstract
This paper investigates the capabilities and limitations of Large Language Model (LLM) agents in performing cybersecurity forensic tasks, including incident response, digital evidence correlation, and threat attribution. To enable a fair comparison of agents and LLMs, we introduce CFA-Bench, a novel benchmark designed to evaluate their forensic reasoning abilities. We leverage a controlled testbed where vulnerable services are instantiated, attacked, and monitored, generating forensic evidence in the form of packet captures and log traces. Using this setup, we generate 20 curated incidents targeting 13 distinct services, focusing on recent vulnerabilities. Each incident presents progressively complex checkpoints, culminating in the identification of the specific Common Vulnerabilities and Exposures (CVE) entry. We evaluate different LLM-powered agent architectures, equipping them with essential forensic tools such as a PCAP Reader and an Information Retriever. Each agent is asked to analyse the incidents, which lets us systematically track its performance across the different forensic checkpoints. While preliminary, our findings demonstrate the potential of LLM agents in cybersecurity forensics, revealing their strengths and critical areas for improvement. This study underscores the need for standardized benchmarks to rigorously assess LLM agents in cyber threat analysis. To this end, we make CFA-Bench open to the research community. Our results provide a foundation for future research aimed at refining agent architectures and enhancing their forensic reasoning capabilities.

File | Size | Format | |
---|---|---|---|
CFA-Bench_Cybersecurity_Forensic_Llm_Agent_Benchmark_and_Testing.pdf (restricted access). Description: Published. Type: 2a Post-print, publisher's version / Version of Record. License: Non-public, private/restricted access | 3.94 MB | Adobe PDF | |
CFABench_Agent_Foresics.pdf (open access). Type: 2. Post-print / Author's Accepted Manuscript. License: Public, all rights reserved | 1.41 MB | Adobe PDF | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3002951