Modelling Children’s Language: Vikidia-based Resources to Support Italian Text Simplification / Tirico, Andrea; Braga, Marco; Murgia, Emiliana; Pasi, Gabriella. - (In press). (IEEE CAI 2026, Granada (ES), May 8-10, 2026).

Modelling Children’s Language: Vikidia-based Resources to Support Italian Text Simplification

Braga, Marco
In press

Abstract

With the rapid advancement of Large Language Models, their role in educational contexts has gained increasing attention, owing to their text generation capabilities and their emerging value as tools for supporting students and teachers. One relevant application is Text Simplification (TS), which aims to reduce text complexity to enhance readability and accessibility for users with limited linguistic abilities, such as children. Many TS approaches that target children focus on English, while such approaches remain limited for Italian. In this paper, to fill this gap, we introduce new resources designed for children aged 8 to 11 that support the definition of a TS approach. We build our dataset by collecting texts from both the Italian Wikipedia and Vikidia, an online encyclopedia written for a young audience. We then fine-tune a pre-trained language model on the proposed resource to generate simplified versions of Wikipedia pages that align with Vikidia’s linguistic style. The simplification process is further guided by a vocabulary built from Vikidia: a post-processing step replaces potentially complex terms in the output text with synonyms from this vocabulary that are accessible to children. The effectiveness of the system was assessed through a qualitative human evaluation conducted by elementary school teachers, which yielded generally positive assessments, alongside some divergent views and suggestions for refinement before the system is introduced into educational settings. We release code, prompts, and the proposed dataset at https://github.com/marostiri/italian-children-text-simplification.
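The vocabulary-guided post-processing step described in the abstract can be pictured with a minimal sketch. Everything here is an illustrative assumption and not the authors' implementation: the tiny `VOCABULARY` set stands in for the Vikidia-derived vocabulary, the `SYNONYMS` table is hand-written (the abstract does not say how synonyms are obtained), and `simplify_lexicon` is a hypothetical name. Any word missing from the vocabulary is replaced with its first listed synonym that the vocabulary does contain.

```python
import re

# Hypothetical stand-in for a child-accessible vocabulary built from Vikidia.
VOCABULARY = {"il", "gatto", "mangia", "molto", "cibo", "la", "casa", "è", "grande"}

# Hypothetical synonym table; the real source of synonyms is not specified.
SYNONYMS = {
    "abitazione": ["dimora", "casa"],
    "consuma": ["divora", "mangia"],
    "enorme": ["smisurata", "grande"],
}


def simplify_lexicon(text: str, vocabulary: set[str], synonyms: dict[str, list[str]]) -> str:
    """Replace out-of-vocabulary words with the first in-vocabulary synonym, if any."""
    pieces = []
    # Split into alternating word / non-word chunks so spacing and punctuation survive.
    for token in re.findall(r"\w+|\W+", text):
        lowered = token.lower()
        if re.fullmatch(r"\w+", token) and lowered not in vocabulary:
            # First candidate synonym that the vocabulary marks as accessible.
            replacement = next(
                (s for s in synonyms.get(lowered, []) if s in vocabulary), None
            )
            if replacement is not None:
                # Keep an initial capital, e.g. at the start of a sentence.
                if token[0].isupper():
                    replacement = replacement[0].upper() + replacement[1:]
                token = replacement
        pieces.append(token)
    return "".join(pieces)


if __name__ == "__main__":
    print(simplify_lexicon("Il gatto consuma molto cibo. La casa è enorme.", VOCABULARY, SYNONYMS))
```

Running the example prints `Il gatto mangia molto cibo. La casa è grande.`. Words without an accessible synonym are left untouched. Because the substitution works one word at a time, it ignores Italian gender and number agreement and wider context. A real system would likely need lemmatization or part-of-speech information to handle those cases.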
Files in this item:
CAI26_0557_FI (1).pdf
Restricted access
Type: 2. Post-print / Author's Accepted Manuscript
License: Not public - Private/restricted access
Size: 187.14 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3009794