LegItBART: a summarization model for Italian legal documents / Benedetto, Irene; La Quatra, Moreno; Cagliero, Luca. - In: ARTIFICIAL INTELLIGENCE AND LAW. - ISSN 0924-8463. - (2025). [DOI: 10.1007/s10506-025-09436-y]
LegItBART: a summarization model for Italian legal documents
Benedetto, Irene; La Quatra, Moreno; Cagliero, Luca
2025
Abstract
The ever-increasing volume of electronic legal documents calls for effective, language-specific summarization and headline generation techniques to make legal content more accessible and easier to use. In the context of Italian law, existing summarization models are either extractive or focused on generating long-form abstractive summaries. As a result, the generated summaries have a low level of readability or are ill-suited to summarizing common legal documents such as norms. This paper proposes LegItBART, a new abstractive summarization model. It leverages a BART-based sequence-to-sequence architecture that is specifically pre-trained on Italian legal corpora. To enable the generation of concise summaries and headlines, we release two new annotated datasets tailored to the Italian legal domain, namely LawCodes and LegItConcepts. To handle input documents that exceed the maximum input length, such as verbose norms, codes, or legal articles, we also extend BART by integrating a global-sparse-local attention mechanism. We empirically analyze the performance of different combinations of pre-training and fine-tuning. The results show that mixing general-purpose and domain-specific pre-training yields substantial improvements in summarization performance. The fine-tuned version of LegItBART outperforms all the tested baselines, even those with a significantly larger number of model parameters.
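The global-sparse-local attention mentioned in the abstract is what lets the model read inputs longer than BART's usual token limit. The paper's implementation is not reproduced here; the snippet below is a minimal PyTorch sketch of what an attention mask of this family typically looks like (the function name `global_sparse_local_mask` and the `window`, `stride`, and `n_global` parameters are illustrative assumptions, not the authors' API):

```python
import torch

def global_sparse_local_mask(seq_len: int,
                             window: int = 4,
                             stride: int = 8,
                             n_global: int = 2) -> torch.Tensor:
    """Boolean mask (True = may attend) combining three patterns:
    local (a sliding window around each token), sparse (strided
    long-range links), and global (a few tokens visible everywhere).
    Hypothetical sketch; not the LegItBART implementation."""
    idx = torch.arange(seq_len)
    # Local band: token i attends to tokens j with |i - j| <= window.
    local = (idx[:, None] - idx[None, :]).abs() <= window
    # Sparse links: every token also attends to each stride-th position.
    sparse = (idx[None, :] % stride) == 0
    mask = local | sparse
    # Global tokens: the first n_global positions attend to, and are
    # attended by, every position in the sequence.
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    return mask

if __name__ == "__main__":
    # Visualize the pattern for a short sequence.
    print(global_sparse_local_mask(seq_len=16).int())
```

Because each row of the mask has only about window + seq_len/stride + n_global active entries, attention cost grows roughly linearly with input length rather than quadratically, which is what makes long norms and codes tractable. Sparse patterns of this kind are also used by models such as Longformer and BigBird.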

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| s10506-025-09436-y.pdf | Restricted access | 2a Post-print editorial version / Version of Record | Non-public - Private/restricted access | 1.43 MB | Adobe PDF |
https://hdl.handle.net/11583/2997744