This paper presents an NLP research system designed to geolocate tweets within Italy, a country renowned for its diverse linguistic landscape. Our methodology consists of a two-step process involving pre-training and fine-tuning phases. In the pre-training step, we take a semi-supervised approach and introduce two additional tasks. The primary objective of these tasks is to provide the language model with comprehensive knowledge of language varieties, focusing on both the sentence and token levels. Subsequently, during the fine-tuning phase, the model is adapted explicitly for two subtasks: coarse- and fine-grained variety geolocation. To evaluate the effectiveness of our methodology, we participate in the GeoLingIt 2023 shared task and assess our model’s performance using standard metrics. Ablation studies demonstrate the crucial role of the pre-training step in enhancing the model’s performance on both tasks.
DANTE at GeoLingIt: Dialect-Aware Multi-Granularity Pre-training for Locating Tweets within Italy / Gallipoli, Giuseppe; La Quatra, Moreno; Rege Cambrin, Daniele; Greco, Salvatore; Cagliero, Luca. - 3473:(2023). (Paper presented at the EVALITA 2023 conference, held in Parma on September 7th-8th, 2023).
DANTE at GeoLingIt: Dialect-Aware Multi-Granularity Pre-training for Locating Tweets within Italy
Giuseppe Gallipoli;Moreno La Quatra;Daniele Rege Cambrin;Salvatore Greco;Luca Cagliero
2023
File | Size | Format |
---|---|---|
paper14.pdf (open access; Type: 2a Post-print, publisher's version / Version of Record; License: Creative Commons) | 1.52 MB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/11583/2981929