BitsAndBites at SemEval-2025 Task 9: Improving Food Hazard Detection with Sequential Multitask Learning and Large Language Models / Gensale, Aurora; Benedetto, Irene; Gioacchini, Luca; Bosca, Alessio; Cagliero, Luca. - Electronic. - (2025), pp. 718-725. (Paper presented at the 19th International Workshop on Semantic Evaluation (SemEval-2025), held in Vienna (AT), July 31 - August 1, 2025).
BitsAndBites at SemEval-2025 Task 9: Improving Food Hazard Detection with Sequential Multitask Learning and Large Language Models
Gensale, Aurora; Benedetto, Irene; Gioacchini, Luca; Bosca, Alessio; Cagliero, Luca
2025
Abstract
Automatic and early detection of foodborne hazards is crucial for preventing foodborne outbreaks. Existing AI-based solutions often cannot handle the complexity and noise of food recall reports, and they struggle to handle the dependency between product and hazard labels. We introduce a methodology for classifying reports on food-related incidents that addresses these challenges. Our approach combines LLM-based information extraction, used to minimize report variability, with a two-stage classification pipeline. The first model assigns coarse-grained labels that narrow the space of eligible fine-grained labels for the second model. This sequential process allows us to capture hierarchical label dependencies between products and hazards and between their respective categories. Additionally, we design each model with two classification heads that exploit the inherent relations between food products and their associated hazards. We validate our approach on two multi-label classification sub-tasks. Experimental results demonstrate its effectiveness, with improvements of +30% and +40% in classification performance over the baseline.

| File | Size | Format |
|---|---|---|
| 2025.semeval-1.99.pdf (open access; type: 2a Post-print, publisher's version / Version of Record; license: Creative Commons) | 353.69 kB | Adobe PDF |
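The coarse-to-fine narrowing described in the abstract can be illustrated with a minimal sketch. It assumes the coarse and fine scores have already been produced by two models; the label names, the scores, and the `narrow_and_pick` helper are hypothetical and are not taken from the paper, whose actual system relies on trained models with two classification heads.

```python
from typing import Dict, List, Tuple

# Hypothetical hierarchy (illustrative labels only):
# coarse category -> fine-grained labels that belong to it.
HAZARD_HIERARCHY: Dict[str, List[str]] = {
    "biological": ["salmonella", "listeria"],
    "allergens": ["milk", "peanuts"],
}


def narrow_and_pick(
    coarse_scores: Dict[str, float],
    fine_scores: Dict[str, float],
    hierarchy: Dict[str, List[str]],
) -> Tuple[str, str]:
    """Pick the best coarse label, then the best fine label it allows."""
    # Stage 1: the coarse prediction selects one category.
    coarse = max(coarse_scores, key=coarse_scores.get)
    # Stage 2: only fine labels under that category remain eligible.
    eligible = hierarchy[coarse]
    fine = max(eligible, key=lambda label: fine_scores.get(label, float("-inf")))
    return coarse, fine


# Made-up scores: "milk" has the highest fine score overall, but it is
# excluded because stage 1 selected the "biological" category.
coarse_scores = {"biological": 0.7, "allergens": 0.3}
fine_scores = {"salmonella": 0.2, "listeria": 0.3, "milk": 0.9, "peanuts": 0.1}

result = narrow_and_pick(coarse_scores, fine_scores, HAZARD_HIERARCHY)
print(result)
```

In this toy setup the second stage never considers labels outside the category chosen by the first stage, which is the narrowing effect the abstract refers to. The same helper could be applied separately to a product hierarchy.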
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3002572