SE-PQA: StackExchange Personalized Community Question Answering / Kasela, Pranav; Braga, Marco; Pasi, Gabriella; Perego, Raffaele. - 3802:(2024), pp. 99-102. (Paper presented at the 14th Italian Information Retrieval Workshop, held in Udine, Italy, September 5-6, 2024).

SE-PQA: StackExchange Personalized Community Question Answering

Marco Braga
2024

Abstract

Personalization in Information Retrieval (IR) has long been studied by the research community. Nevertheless, high-quality, real-world datasets for large-scale experiments and model evaluation remain scarce. This paper helps to fill this gap by introducing SE-PQA (StackExchange - Personalized Question Answering), a new curated dataset designed for the development and evaluation of personalized models in the domain of community Question Answering (cQA). SE-PQA encompasses over one million queries and two million answers, annotated with a rich set of features that capture the social interactions among users on a cQA platform. We provide reproducible baseline methods for the cQA task based on the resource, including deep learning and personalized approaches. Preliminary experiments show that SE-PQA is suitable for training effective cQA models; they also show that personalization remarkably improves the effectiveness of all the methods tested.
Files in this record:
File: sepqa_iir.pdf
Access: open access
Type: 2a Post-print, publisher's version / Version of Record
License: Creative Commons
Size: 990.79 kB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3002213