Evaluating Large Language Models in Exercises of UML Class Diagram Modeling / De Bari, Daniele; Garaccione, Giacomo; Coppola, Riccardo; Torchiano, Marco; Ardito, Luca. - ELECTRONIC. - (2024), pp. 393-399. (Paper presented at ESEM '24: 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, held in Barcelona (ES), October 24-25, 2024) [10.1145/3674805.3690741].
Evaluating Large Language Models in Exercises of UML Class Diagram Modeling
Garaccione, Giacomo;Coppola, Riccardo;Torchiano, Marco;Ardito, Luca
2024
Abstract
Large Language Models (LLMs) have rapidly established themselves in recent years as a means to support or substitute human actors in a variety of tasks. LLM agents can generate valid software models thanks to their inherent ability to evaluate textual requirements provided to them in the form of prompts. The goal of this work is to evaluate the capability of LLM agents to correctly generate UML class diagrams in Requirements Modeling activities in the field of Software Engineering. Our aim is to evaluate LLMs in an educational setting, i.e., to understand how valuable the results of LLMs are when compared to results produced by human actors, and how valuable LLMs can be for generating sample solutions to provide to students. For that purpose, we collected 20 exercises from a diverse set of web sources and compared the models generated by a human and an LLM solver in terms of syntactic, semantic, and pragmatic correctness, as well as distance from a provided reference solution. Our results show that the solutions generated by an LLM solver typically present a significantly higher number of errors in terms of semantic quality and textual difference from the provided reference solution, while no significant difference is found in syntactic and pragmatic quality. We can therefore conclude that, with a limited number of errors mostly related to the textual content of the solution, UML diagrams generated by LLM agents have the same level of understandability as those generated by humans, and exhibit the same frequency of violations of UML Class Diagram rules.
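The abstract mentions measuring the textual distance of each generated diagram from a reference solution. As a purely illustrative sketch (the record does not specify the metric used, so the distance function, class names, and PlantUML snippets below are hypothetical), one simple way to quantify such a difference is a normalized edit-based distance over the PlantUML source of the two models:

```python
# Illustrative sketch only: quantify the textual difference between a
# reference class diagram and an LLM-generated one, both given as
# PlantUML source text, using a normalized similarity from difflib.
from difflib import SequenceMatcher

def normalized_textual_distance(reference: str, candidate: str) -> float:
    """Return a distance in [0, 1]: 0 = identical texts, 1 = no overlap."""
    similarity = SequenceMatcher(None, reference, candidate).ratio()
    return 1.0 - similarity

# Hypothetical reference solution for an exercise.
reference_model = """
@startuml
class Customer {
  +name: String
  +placeOrder(): Order
}
class Order
Customer "1" --> "*" Order
@enduml
"""

# Hypothetical LLM-generated solution for the same exercise.
llm_model = """
@startuml
class Customer {
  +fullName: String
}
class Order
Customer --> Order
@enduml
"""

print(f"distance = {normalized_textual_distance(reference_model, llm_model):.2f}")
```

A metric like this captures only surface-level textual divergence; the syntactic, semantic, and pragmatic quality dimensions discussed in the paper require separate, rule-based or manual assessment.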
| File | Type | License | Size | Format |
|---|---|---|---|---|
| 3674805.3690741.pdf (open access) | 2a Post-print / Version of Record | Creative Commons | 499.54 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2993437