
Assessing LLMs models’ knowledge of automotive cyberthreats benchmarking autoISAC framework / Scarano, Nicola; Mannella, Luca; Savino, Alessandro; Di Carlo, Stefano. - ELETTRONICO. - (2025), pp. 1-7. (Intervento presentato al convegno CSCS '25: 2nd Cyber Security in CarS Workshop (CSC) tenutosi a Taipei (TWN) nel October 13-17, 2025) [10.1145/3736130.3762690].

Assessing LLMs models’ knowledge of automotive cyberthreats benchmarking autoISAC framework

Scarano, Nicola;Mannella, Luca;Savino, Alessandro;Di Carlo, Stefano
2025

Abstract

Large Language Models (LLMs) are gaining traction in cybersecurity applications, offering both promising opportunities and potential new risks. Their use in sub-domains such as automotive is still in its early stages. In this work-in-progress study, we use OpenAI's GPT-4o to generate a preliminary set of domain-relevant cybersecurity questions based on the Automotive Information Sharing and Analysis Center (Auto-ISAC) framework, which we then refined through manual validation. We used the final set of 25 questions to evaluate the performance of five LLMs. These questions were also administered through a survey to a group of 17 domain experts, allowing us to compare this human baseline with the results of the LLMs. Our preliminary findings show that the LLMs reached a mean of 91.2% correct answers on the test, while the human experts reached 64.7%. This study lays the groundwork for future investigations into the use of LLMs in the automotive-security domain and into their safe and trustworthy exploitation.
ISBN: 9798400719288
Files in this record:
3736130.3762690.pdf (open access)
Type: 2a Post-print, editorial version / Version of Record
License: Public - All rights reserved
Size: 835.96 kB (Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3004901