Extreme Classification of European Union Law Documents driven by Entity Embeddings

Benedetto, I.; Cagliero, L.; Tarasconi, F.

Extreme Multi-label Classification (XMC) is the task of labeling documents with one or more labels from a large set of classes. In the context of Legal Artificial Intelligence, XMC is relevant to the automatic categorization of legal documents, as they commonly address several orthogonal categorization schemes. Since retrieving a sufficient number of training examples per class is challenging, XMC models are expected to be particularly effective in zero-shot learning scenarios. Existing approaches rely on transformer-based classification models, which leverage the attention mechanism to attend to specific textual units. However, classical attention scores are not able to differentiate between domain-specific and generic textual units. In this paper, we propose a legal entity-aware approach to zero-shot XMC of European Union law documents. By integrating information about domain-specific legal entities, we ease the detection of label-sensitive information and prevent XMC models from attending to irrelevant or wrong text spans. The results achieved on the law documents of the EURLex benchmark show that our approach is superior to both previous transformer-based approaches and open-source Large Language Models.
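The abstract does not spell out the model, so the following is only a minimal, hypothetical sketch of the general idea it describes, not the authors' implementation. It assumes that token embeddings and label-description embeddings are already available, that legal entities have been detected upstream, and that entity tokens simply receive a fixed additive bonus in attention-based pooling. The function names, the `entity_bias` and `threshold` parameters, the label names and the toy 2-d vectors are all illustrative assumptions. Labels are assigned zero-shot by comparing the pooled document vector with each label's description embedding, so a label needs no training examples, and every label above the threshold is kept (multi-label output).

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def entity_aware_pool(
    token_vecs: Sequence[Vector],
    is_entity: Sequence[bool],
    query: Vector,
    entity_bias: float = 2.0,  # hypothetical fixed bonus for entity tokens
) -> Vector:
    """Scaled dot-product attention pooling; tokens flagged as legal
    entities get an additive score bonus so generic tokens weigh less."""
    d = len(query)
    scores = [
        dot(query, v) / math.sqrt(d) + (entity_bias if ent else 0.0)
        for v, ent in zip(token_vecs, is_entity)
    ]
    weights = softmax(scores)
    # Weighted average of the token vectors, one dimension at a time.
    return [sum(w * v[i] for w, v in zip(weights, token_vecs)) for i in range(d)]


def zero_shot_labels(
    doc_vec: Vector,
    label_vecs: Dict[str, Vector],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[str]:
    """Keep every label whose description embedding is similar enough
    to the document vector, sorted by decreasing cosine similarity."""
    scored = {name: cosine(doc_vec, vec) for name, vec in label_vecs.items()}
    return sorted((n for n, s in scored.items() if s >= threshold),
                  key=lambda n: -scored[n])


if __name__ == "__main__":
    # Two generic tokens and one legal-entity token (toy 2-d embeddings).
    tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    is_entity = [False, False, True]
    query = [1.0, 1.0]
    labels = {"general_provisions": [1.0, 0.0], "data_protection": [0.0, 1.0]}

    with_bias = entity_aware_pool(tokens, is_entity, query, entity_bias=2.0)
    no_bias = entity_aware_pool(tokens, is_entity, query, entity_bias=0.0)
    print(zero_shot_labels(with_bias, labels))  # ['data_protection']
    print(zero_shot_labels(no_bias, labels))    # ['general_provisions']
```

In the toy run, plain attention lets the two frequent generic tokens dominate the pooled vector, while the entity bonus shifts the attention mass onto the entity token and changes the predicted label. A real model would learn the query and the entity signal rather than fix them by hand.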

Extreme Classification of European Union Law Documents driven by Entity Embeddings / Benedetto, I.; Cagliero, L.; Tarasconi, F. - ELECTRONIC. - 3651:(2024). (Paper presented at the EDBT/ICDT 2024 Joint Conference, held in Paestum (IT), 25-29 March 2024).


Year of publication: 2024

Series: CEUR WORKSHOP PROCEEDINGS

Item type: 4.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
DARLI-AP-5.pdf (open access; type: 2a Post-print, publisher's version / Version of Record; license: Creative Commons)	1.28 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2990023

PORTO @ Archivio Istituzionale della Ricerca (Institutional Research Archive)
