
On the use of Pretrained Language Models for Legal Italian Document Classification / Benedetto, Irene; Sportelli, Gianpiero; Bertoldo, Sara; Tarasconi, Francesco; Cagliero, Luca; Giacalone, Giuseppe. - In: PROCEDIA COMPUTER SCIENCE. - ISSN 1877-0509. - ELECTRONIC. - (2023). (Paper presented at the 27th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, held in Athens, Greece, September 6-8, 2023).

On the use of Pretrained Language Models for Legal Italian Document Classification

Irene Benedetto; Luca Cagliero
2023

Abstract

Document classification helps law professionals browse and retrieve content more effectively. Pretrained Language Models, such as BERT, have become established tools for legal document classification. However, legal content is quite diversified: documents vary in length from very short maxims to relatively long judgements, and certain document types are rich in domain-specific expressions and can be annotated with multiple labels from domain-specific taxonomies. This paper studies to what extent existing pretrained models are suited to the legal domain. Specifically, we examine a real business case focused on Italian legal document classification. On a proprietary dataset with thousands of diversified categories spanning several content types (e.g., legal judgements, maxims, and legal news), we explore the use of Pretrained Language Models adapted to handle various content types. We collect both quantitative and qualitative results, highlighting best and worst cases, anomalous categories, and limitations of currently available models.
Files in this record:
showpdf.pdf

Open access

Type: 2. Post-print / Author's Accepted Manuscript
License: Creative Commons
Size: 279.75 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2982618