Benchmarking Abstractive Models for Italian Legal News Summarization / Benedetto, Irene; Cagliero, Luca; Tarasconi, Francesco; Giacalone, Giuseppe; Bernini, Claudia. - ELECTRONIC. - 379:(2023), pp. 311-316. (Paper presented at the conference JURIX2023: 36th International Conference on Legal Knowledge and Information Systems, held in Maastricht (NLD), 18-20 December 2023) [10.3233/faia230980].
Benchmarking Abstractive Models for Italian Legal News Summarization
Benedetto, Irene; Cagliero, Luca
2023
Abstract
Automated text summarization is particularly important in the legal domain because the documents involved are long and inherently complex. The Legal AI community has already started to address the text summarization problem. However, most existing approaches focus on documents written in English, and so far only limited effort has been devoted to summarizing Italian legal documents. The existing approaches for Italian are extractive: they select portions of the original content without rephrasing them. To bridge this gap, in this work we aim to generate abstractive summaries of Italian legal news. We propose to condense the original news content into different summary types, i.e., an abstract, a title, or a subheader. We benchmark different state-of-the-art summarization models on this task. We also investigate the suitability of augmented models capable of handling long Italian documents. The experimental results achieved on a proprietary Italian dataset show the effectiveness of abstractive models in generating fairly accurate summaries and the importance of using larger contextual windows to generate news abstracts.

File | Size | Format |
---|---|---|
FAIA-379-FAIA230980.pdf (open access; type: 2a Post-print publisher's version / Version of Record; license: Creative Commons) | 177.11 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2987349