Multi Stage Retrieval for Web Search During Crisis

Tcaciuc, Claudiu Constantin; Rege Cambrin, Daniele; Garza, Paolo

doi:10.3390/fi17060239

During crisis events, digital information volume can increase by over 500% within hours, with social media platforms alone generating millions of crisis-related posts. This volume creates critical challenges for emergency responders, who need timely access to a concise subset of accurate, relevant information. Existing approaches rely heavily on large language models, which limit the scalability of the retrieval procedure and may introduce hallucinations. This paper introduces a multi-stage text retrieval framework to enhance information retrieval during crises. The framework employs a three-stage extractive pipeline in which (1) a topic modeling component filters candidates based on thematic relevance, (2) an initial high-recall lexical retriever identifies a broad candidate set, and (3) a dense retriever reranks the remaining documents. This architecture balances computational efficiency with retrieval effectiveness, prioritizing high recall in the early stages and refining precision in the later ones. Because the pipeline is purely extractive, it avoids introducing hallucinations, and it achieves a 15% improvement in BERT-Score over existing solutions without requiring any costly abstractive model. Moreover, the sequential approach accelerates the search process by 5% compared to a single-stage dense retrieval approach, with minimal effect on BERT-Score.
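The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the topic model, the lexical retriever, or the dense encoder, so every component here is an assumption chosen to keep the example dependency-free. Keyword overlap stands in for the topic model, a hand-written BM25 scorer stands in for the lexical retriever, and cosine similarity over character-trigram counts stands in for a neural dense encoder. The function names (`topic_filter`, `bm25_top_k`, `rerank`, `retrieve`) and the parameter `k` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def topic_filter(docs, topic_terms):
    """Stage 1 (placeholder for a topic model): keep documents that
    share at least one token with the given topic vocabulary."""
    topic = set(topic_terms)
    return [d for d in docs if topic & set(tokenize(d))]


def bm25_top_k(query, docs, k, k1=1.5, b=0.75):
    """Stage 2 (assumed lexical retriever): score documents with BM25
    and return the k best, favouring recall over precision."""
    tokenized = [tokenize(d) for d in docs]
    if not tokenized:
        return []
    avg_len = sum(len(t) for t in tokenized) / len(tokenized) or 1.0
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    scored = []
    for doc, toks in zip(docs, tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def embed(text):
    """Toy embedding: character-trigram counts. A real system would
    call a neural sentence encoder here."""
    s = " ".join(tokenize(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query, docs):
    """Stage 3 (stand-in for dense reranking): order the surviving
    candidates by vector similarity to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)


def retrieve(query, docs, topic_terms, k=10):
    """Chain the three stages: topic filter, lexical top-k, rerank."""
    candidates = topic_filter(docs, topic_terms)
    candidates = bm25_top_k(query, candidates, k)
    return rerank(query, candidates)
```

The point of the ordering is that the cheap stages shrink the candidate set before the most expensive step runs, so the similarity computation in `rerank` touches at most `k` documents rather than the whole collection. Every returned item is an original document, so nothing is generated and nothing can be hallucinated.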

Multi Stage Retrieval for Web Search During Crisis / Tcaciuc, Claudiu Constantin; Rege Cambrin, Daniele; Garza, Paolo. - In: FUTURE INTERNET. - ISSN 1999-5903. - 17:6(2025). [10.3390/fi17060239]


Publication year: 2025
DOI: https://dx.doi.org/10.3390/fi17060239
Journal title: FUTURE INTERNET
Document type: 1.1 Journal article

Files in this record:

File	Size	Format
futureinternet-17-00239.pdf (open access; Type: 2a Post-print, publisher's version / Version of Record; License: Creative Commons)	1.27 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3001610

PORTO @ Archivio Istituzionale della Ricerca
