Modelling Children’s Language: Vikidia-based Resources to Support Italian Text Simplification / Tirico, Andrea; Braga, Marco; Murgia, Emiliana; Pasi, Gabriella. - (In press). (IEEE CAI 2026, Granada (ES), May 8-10, 2026).
Modelling Children’s Language: Vikidia-based Resources to Support Italian Text Simplification
Braga, Marco
In press
Abstract
With the rapid advancement of Large Language Models, their role in educational contexts has gained increasing attention due to their text generation capabilities and their emerging value as tools for supporting students and teachers. One relevant application is Text Simplification (TS), which aims to reduce text complexity to enhance readability and accessibility for users with limited linguistic abilities, such as children. Many TS approaches that target children focus on the English language, while those for Italian remain limited. In this paper, to fill this gap, we introduce new resources designed for children aged 8 to 11, which support the definition of a TS approach. We build our dataset by collecting texts from both the Italian Wikipedia and Vikidia, an online encyclopedia written for a young audience. Subsequently, we fine-tune a pre-trained language model on the proposed resource to generate simplified versions of Wikipedia pages that align with Vikidia’s linguistic style. The simplification process is further guided by a vocabulary built from Vikidia: in a post-processing step, potentially complex terms in the output text are replaced with synonyms from the vocabulary that are accessible to children. The system was evaluated through a qualitative human evaluation conducted by elementary school teachers, which showed generally positive assessments, alongside some divergent views and suggestions for refinement before the system is introduced in educational contexts. We release code, prompts and the proposed dataset at https://github.com/marostiri/italian-children-text-simplification.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| CAI26_0557_FI (1).pdf | Restricted access | 2. Post-print / Author's Accepted Manuscript | Non-public - Private/restricted access | 187.14 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3009794
