SE-PQA: StackExchange Personalized Community Question Answering / Kasela, Pranav; Braga, Marco; Pasi, Gabriella; Perego, Raffaele. - 3802:(2024), pp. 99-102. (Paper presented at the 14th Italian Information Retrieval Workshop, held in Udine (Italy), September 5-6, 2024).
SE-PQA: StackExchange Personalized Community Question Answering
Marco Braga
2024
Abstract
Personalization in Information Retrieval (IR) has long been studied by the research community. Nevertheless, high-quality, real-world datasets for large-scale experiments and model evaluation remain scarce. This paper helps fill this gap by introducing SE-PQA (StackExchange - Personalized Question Answering), a new curated dataset designed for the development and evaluation of personalized models in the domain of community Question Answering (cQA). SE-PQA encompasses over one million queries and two million answers, annotated with a rich set of features that capture the social interactions among users on a cQA platform. We provide reproducible baseline methods for the cQA task based on this resource, including deep learning and personalized approaches. Preliminary experiments show that SE-PQA is suitable for training effective cQA models; they also show that personalization remarkably improves the effectiveness of all the methods tested.

File | Type | License | Size | Format
---|---|---|---|---
sepqa_iir.pdf (open access) | 2a Post-print, publisher's version / Version of Record | Creative Commons | 990.79 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3002213