Scarano, Nicola; Mannella, Luca; Savino, Alessandro; Di Carlo, Stefano (2025). Assessing LLMs models’ knowledge of automotive cyberthreats benchmarking autoISAC framework. Paper presented at CSCS '25: Proceedings of the 2025 Cyber Security in CarS Workshop, Taipei. DOI: 10.1145/3736130.3762690
Assessing LLMs models’ knowledge of automotive cyberthreats benchmarking autoISAC framework
Scarano, Nicola; Mannella, Luca; Savino, Alessandro; Di Carlo, Stefano
2025
Abstract
Large Language Models (LLMs) are gaining traction in cybersecurity applications, offering both promising opportunities and potential new risks. The use of these models in sub-domains such as automotive is still in its early stages. In this work-in-progress study, we used OpenAI's GPT-4o to generate a preliminary set of domain-relevant cybersecurity questions based on the Automotive Information Sharing and Analysis Center (Auto-ISAC) framework, which we then refined through manual validation. We used the final set of 25 questions to evaluate the performance of five LLMs. We then administered the same questions through a survey to a group of 17 domain experts, allowing us to compare the LLMs' results against this human baseline. In our preliminary findings, the LLMs achieved a mean of 91.2% correct answers on the test, while the human experts reached 64.7%. This study lays the groundwork for future investigations into the use of LLMs in the automotive-security domain and into their safe and trustworthy exploitation.
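As an illustrative aside, the evaluation loop the abstract describes (posing multiple-choice questions to a model and scoring its answers against a key) can be sketched in a few lines of Python. This is a minimal sketch assuming the OpenAI Python SDK; the sample question, its options, and the answer key are hypothetical placeholders, not the actual Auto-ISAC-derived question set or the authors' code.

# Minimal sketch of a multiple-choice LLM benchmark loop, assuming the
# OpenAI Python SDK (openai>=1.0). The question below is a hypothetical
# placeholder, not one of the paper's 25 validated Auto-ISAC questions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical example standing in for the validated question set.
QUESTIONS = [
    {
        "text": "Which in-vehicle network protocol lacks built-in authentication?",
        "options": {"A": "CAN bus", "B": "TLS", "C": "SSH", "D": "IPsec"},
        "answer": "A",
    },
]

def ask(question: dict, model: str = "gpt-4o") -> str:
    """Pose one multiple-choice question and return the chosen option letter."""
    options = "\n".join(f"{k}) {v}" for k, v in question["options"].items())
    prompt = (
        f"{question['text']}\n{options}\n"
        "Answer with the single letter of the correct option."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()[:1].upper()

def accuracy(model: str) -> float:
    """Fraction of questions the model answers correctly."""
    correct = sum(ask(q, model) == q["answer"] for q in QUESTIONS)
    return correct / len(QUESTIONS)

if __name__ == "__main__":
    print(f"gpt-4o accuracy: {accuracy('gpt-4o'):.1%}")

Repeating this loop over several models, and tallying the survey responses the same way, yields the per-respondent accuracies whose means the study compares (91.2% for the LLMs vs. 64.7% for the experts).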
https://hdl.handle.net/11583/3004901
