During crisis events, digital information volume can increase by over 500% within hours, with social media platforms alone generating millions of crisis-related posts. This volume creates critical challenges for emergency responders who require timely access to the concise subset of accurate information they are interested in. Existing approaches strongly rely on the power of large language models. However, the use of large language models limits the scalability of the retrieval procedure and may introduce hallucinations. This paper introduces a novel multi-stage text retrieval framework to enhance information retrieval during crises. Our framework employs a novel three-stage extractive pipeline where (1) a topic modeling component filters candidates based on thematic relevance, (2) an initial high-recall lexical retriever identifies a broad candidate set, and (3) a dense retriever reranks the remaining documents. This architecture balances computational efficiency with retrieval effectiveness, prioritizing high recall in early stages while refining precision in later stages. The framework avoids the introduction of hallucinations, achieving a 15% improvement in BERT-Score compared to existing solutions without requiring any costly abstractive model. Moreover, our sequential approach accelerates the search process by 5% compared to the use of a single-stage based on a dense retrieval approach, with minimal effect on the performance in terms of BERT-Score.

Multi Stage Retrieval for Web Search During Crisis / Tcaciuc, Claudiu Constantin; Rege Cambrin, Daniele; Garza, Paolo. - In: FUTURE INTERNET. - ISSN 1999-5903. - 17:6(2025). [10.3390/fi17060239]

Multi Stage Retrieval for Web Search During Crisis

Tcaciuc, Claudiu Constantin;Rege Cambrin, Daniele;Garza, Paolo
2025

Abstract

During crisis events, digital information volume can increase by over 500% within hours, with social media platforms alone generating millions of crisis-related posts. This volume creates critical challenges for emergency responders who require timely access to the concise subset of accurate information they are interested in. Existing approaches strongly rely on the power of large language models. However, the use of large language models limits the scalability of the retrieval procedure and may introduce hallucinations. This paper introduces a novel multi-stage text retrieval framework to enhance information retrieval during crises. Our framework employs a novel three-stage extractive pipeline where (1) a topic modeling component filters candidates based on thematic relevance, (2) an initial high-recall lexical retriever identifies a broad candidate set, and (3) a dense retriever reranks the remaining documents. This architecture balances computational efficiency with retrieval effectiveness, prioritizing high recall in early stages while refining precision in later stages. The framework avoids the introduction of hallucinations, achieving a 15% improvement in BERT-Score compared to existing solutions without requiring any costly abstractive model. Moreover, our sequential approach accelerates the search process by 5% compared to the use of a single-stage based on a dense retrieval approach, with minimal effect on the performance in terms of BERT-Score.
2025
File in questo prodotto:
File Dimensione Formato  
futureinternet-17-00239.pdf

accesso aperto

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Creative commons
Dimensione 1.27 MB
Formato Adobe PDF
1.27 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/3001610