Retrieval Augmented Generation of Summarized Answers on Visually-Rich Documents for Trend and Risk Analysis / Gallipoli, Giuseppe; Cagliero, Luca; Mosca, Alessandro; Miola, Arianna; Borghi, Daniele. - ELECTRONIC. - 3946:(2025), pp. 1-7. (Paper presented at The 9th International Workshop on Data Analytics solutions for Real-LIfe APplications (DARLI-AP), held in Barcelona (ES) on March 25, 2025).
Retrieval Augmented Generation of Summarized Answers on Visually-Rich Documents for Trend and Risk Analysis
Gallipoli, Giuseppe; Cagliero, Luca; Mosca, Alessandro; Miola, Arianna; Borghi, Daniele
2025
Abstract
The analysis of Visually-Rich Documents (VRDs) is crucial in the banking sector to support Trend and Risk Analysis (TRA), because financial TRA documents are largely multimodal. Recently, Retrieval Augmented Generation (RAG) systems have enabled the effective use of Large Language Models (LLMs) to answer questions related to multimodal content. However, the inherent verbosity and complexity of financial documents can degrade the quality of the generated answers. In this work, we explore the use of text summarization techniques to condense the information retrieved from TRA-related VRDs. We analyze the level of synthesis of the original RAG answers, both with and without cascading an ad hoc summarization step. We apply summarization performance measures to compare standard RAG answers with the summarization outputs obtained directly on the retrieved passages. The results show that proprietary LLMs (GPT-4o) significantly improve the RAG's ability to sum up the retrieved passages, whereas integrating open-source LLMs or traditional summarizers is not beneficial, even when the summarization step is applied on top of the RAG answer.

File | Size | Format |
---|---|---|
DARLI-AP-6.pdf (open access). Type: 2a Post-print, publisher's version / Version of Record. License: Creative Commons | 1.34 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3002065