The proliferation of social media platforms has presented researchers with valuable avenues to examine language usage within diverse sociolinguistic frameworks. Italy, renowned for its rich linguistic diversity, provides a distinctive context for exploring diatopic variation, encompassing regional languages, dialects, and variations of Standard Italian. This paper presents our contributions to the GeoLingIt shared task, focusing on predicting the locations of social media posts in Italy based on linguistic content. For Task A, we propose a novel approach, combining data augmentation and contrastive learning, that outperforms the baseline in region prediction. For Task B, we introduce a joint multi-task learning approach leveraging the synergies with Task A and incorporate a post-processing rectification module for improved geolocation accuracy, surpassing the baseline and achieving first place in the competition.

baρtti at GeoLingIt: Beyond Boundaries, Enhancing Geolocation Prediction and Dialect Classification on Social Media in Italy / Koudounas, Alkis; Giobergia, Flavio; Benedetto, Irene; Monaco, Simone; Cagliero, Luca; Apiletti, Daniele; Baralis, Elena Maria. - Electronic. - 3473:(2023). (Paper presented at the EVALITA 2023 conference, held in Parma (Italy), September 7-8, 2023).

baρtti at GeoLingIt: Beyond Boundaries, Enhancing Geolocation Prediction and Dialect Classification on Social Media in Italy

Koudounas Alkis;Giobergia Flavio;Benedetto Irene;Monaco Simone;Cagliero Luca;Apiletti Daniele;Baralis Elena
2023

Files in this record:
File: paper16.pdf (open access)
Type: 2a Post-print, publisher's version / Version of Record
License: Creative Commons
Size: 1.03 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2982511