A Graph Attention Network Combining Multifaceted Element Relationships for Full Document-Level Understanding

Vaiani, Lorenzo; Napolitano, Davide; Cagliero, Luca

doi:10.3390/computers14090362

Question answering from visually rich documents (VRDs) is the task of retrieving the correct answer to a natural language question by considering the content of the textual and visual elements in the document, as well as the pages’ layout. Full document-level understanding (FDU) is the variant of this task that addresses closed-ended questions requiring a deep understanding of the hierarchical relationships between the elements. State-of-the-art graph-based approaches to FDU model the pairwise element relationships as a graph. Although they incorporate logical links (e.g., a caption refers to a figure) and spatial ones (e.g., a caption is placed below the figure), they currently disregard the semantic similarity among multimodal document elements, thus potentially yielding suboptimal scoring of the elements’ relevance to the input question. In this paper, we propose GATS-FDU, a new graph attention network tailored to FDU. GATS-FDU is trained to jointly consider multiple document facets, i.e., the logical, spatial, and semantic relationships among the elements. The results show that our approach achieves superior performance compared to several baseline methods.
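To make the three facets named in the abstract concrete, the following is a minimal sketch of how a typed element graph might be assembled and weighted. It is not the authors' implementation: the `Element` class, the `build_edges` and `neighbour_weights` helpers, the toy two-dimensional embeddings, the 0.8 similarity threshold, and the bounding-box rule for "below" are all assumptions made for illustration. A plain dot-product softmax stands in for the learned attention scoring of a real graph attention network.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    """A document element; every field here is an illustrative assumption."""
    name: str
    box: tuple   # (x0, y0, x1, y1), with y growing downward on the page
    emb: list    # toy embedding standing in for a multimodal encoder output


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def build_edges(elements, logical_links, sim_threshold=0.8):
    """Collect directed (source, target, relation) edges for the three facets."""
    edges = {(i, j, "logical") for i, j in logical_links}
    for i, a in enumerate(elements):
        for j, b in enumerate(elements):
            if i == j:
                continue
            # Spatial facet: a lies below b and overlaps it horizontally.
            if a.box[0] < b.box[2] and b.box[0] < a.box[2] and a.box[1] >= b.box[3]:
                edges.add((i, j, "below"))
            # Semantic facet: cosine similarity above a hypothetical threshold.
            if cosine(a.emb, b.emb) >= sim_threshold:
                edges.add((i, j, "semantic"))
    return edges


def neighbour_weights(elements, edges, source):
    """Softmax over dot-product scores of the nodes that `source` points to."""
    targets = sorted({j for i, j, _ in edges if i == source})
    scores = [dot(elements[source].emb, elements[j].emb) for j in targets]
    peak = max(scores, default=0.0)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(targets, exps)}


elements = [
    Element("figure", (0, 0, 100, 50), [1.0, 0.0]),
    Element("caption", (0, 55, 100, 65), [0.9, 0.1]),
    Element("paragraph", (0, 200, 100, 300), [0.0, 1.0]),
]
edges = build_edges(elements, logical_links=[(1, 0)])  # caption refers to figure
weights = neighbour_weights(elements, edges, source=2)
```

In this toy page the caption is linked to the figure by all three relation types, while the paragraph is connected to both only spatially. A trained model would presumably learn relation-specific attention parameters rather than use fixed rules and a fixed threshold as done here.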

A Graph Attention Network Combining Multifaceted Element Relationships for Full Document-Level Understanding / Vaiani, L., Napolitano, D., Cagliero, L. - In: COMPUTERS. - ISSN 2073-431X. - 14:9(2025). [10.3390/computers14090362]

Year: 2025
DOI: https://dx.doi.org/10.3390/computers14090362
Journal: COMPUTERS
Publication type: 1.1 Journal article

Files in this record:

File	Access	Type	License	Size	Format
computers-14-00362.pdf	Open access	2a Post-print, publisher's version / Version of Record	Creative Commons	421.1 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3003704

PORTO @ Archivio Istituzionale della Ricerca
