Keyword-based Annotation of Visually-Rich Document Content for Trend and Risk Analysis using Large Language Models

Gallipoli, Giuseppe; Papicchio, Simone; Vaiani, Lorenzo; Cagliero, Luca; Miola, Arianna; Borghi, Daniele

In the banking and finance sectors, members of the business units focused on Trend and Risk Analysis process internal and external visually-rich documents, including text, images, and tables, on a daily basis. Given a facet (i.e., a topic) of interest, they are particularly interested in retrieving the top trending keywords related to it and then using them to annotate the most relevant document elements (e.g., text paragraphs, images, or tables). In this paper, we explore the use of both open-source and proprietary Large Language Models to automatically generate lists of facet-relevant keywords, produce free-text descriptions of both the keywords and the multimedia document content, and then annotate documents by leveraging textual similarity approaches. The preliminary results, achieved on English and Italian documents, show that OpenAI GPT-4 achieves superior performance in keyword description generation and multimedia content annotation, while the open-source Meta AI Llama2 model proves highly competitive in generating additional keywords.
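The final annotation step, matching free-text keyword descriptions against free-text descriptions of document elements by textual similarity, can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: the similarity measure here is a plain bag-of-words cosine similarity chosen as a stand-in, and the function names (`tokenize`, `cosine_similarity`, `annotate`), the `threshold` value, and the example texts are all hypothetical. In the described pipeline, both sets of descriptions would be generated by an LLM rather than hard-coded.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    # Bag-of-words cosine similarity between two texts.
    # Hypothetical stand-in for the paper's (unspecified) similarity approach.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def annotate(
    keyword_descriptions: dict[str, str],
    element_descriptions: dict[str, str],
    threshold: float = 0.2,  # hypothetical cut-off, not taken from the paper
) -> dict[str, list[str]]:
    # For each document element, attach every keyword whose description is at
    # least `threshold`-similar to the element description, best match first.
    annotations: dict[str, list[str]] = {}
    for element_id, element_text in element_descriptions.items():
        scored = [
            (kw, cosine_similarity(desc, element_text))
            for kw, desc in keyword_descriptions.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        annotations[element_id] = [kw for kw, score in scored if score >= threshold]
    return annotations


# Hypothetical keyword descriptions and element descriptions (e.g., of a
# paragraph and a table), standing in for LLM-generated text.
keywords = {
    "inflation": "rising consumer prices and interest rate pressure",
    "cyber risk": "attacks on banking systems and data breaches",
}
elements = {
    "paragraph_1": "Consumer prices kept rising, adding pressure on interest rates.",
    "table_2": "Number of data breaches reported by banking institutions per quarter.",
}

print(annotate(keywords, elements))
# → {'paragraph_1': ['inflation'], 'table_2': ['cyber risk']}
```

Because the similarity function is isolated, an embedding-based measure could replace `cosine_similarity` without changing `annotate`; which approach suits the data is an empirical question the sketch does not settle.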

Keyword-based Annotation of Visually-Rich Document Content for Trend and Risk Analysis using Large Language Models / Gallipoli, G., Papicchio, S., Vaiani, L., Cagliero, L., Miola, A., Borghi, D. (2024), pp. 130-136. (The Joint Workshop of the 7th Financial Technology and Natural Language Processing (FinNLP), the 5th Knowledge Discovery from Unstructured Data in Financial Services (KDF), and the 4th Economics and Natural Language Processing (ECONLP) Workshop (FinNLP-KDF-ECONLP 2024), Turin (ITA), 20 May 2024).


Year of publication

2024

Appears in types

4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
2024.finnlp-1.13.pdf (open access; type: 2a Post-print, publisher's version / Version of Record; license: Creative Commons)	1.07 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2990378

PORTO @ Institutional Research Archive
